A false positive error, or false positive (also known as a false alarm), is a result that indicates a given condition exists when it does not.
You can read the number of false positives off the confusion matrix. For a binary classification problem, the matrix looks as follows:
|  | Predicted = 0 | Predicted = 1 | Total |
| --- | --- | --- | --- |
| Actual = 0 | True Negative (TN) | False Positive (FP) | N |
| Actual = 1 | False Negative (FN) | True Positive (TP) | P |
| Total | N* | P* | N + P |
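As a minimal sketch, the four cells can be computed with scikit-learn's `confusion_matrix` (the `y_true` and `y_pred` arrays here are hypothetical 0/1 labels):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted 0/1 labels.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# For binary 0/1 labels, scikit-learn lays the matrix out as
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=3, FP=1, FN=1, TP=3
```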
In statistical hypothesis testing, the false positive rate is equal to the significance level, \(\alpha\), and \(1 - \alpha\) is defined as the specificity of the test. Complementarily, the false negative rate is given by \(\beta\).
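For instance, at the conventional significance level \(\alpha = 0.05\) with a false negative rate of \(\beta = 0.2\):

\[\text{specificity} = 1 - \alpha = 0.95, \qquad \text{power} = 1 - \beta = 0.8\]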
The different measures for classification are:
| Name | Definition | Synonyms |
| --- | --- | --- |
| False positive rate (\(\alpha\)) | FP/N | Type I error, 1 - specificity |
| True positive rate (\(1 - \beta\)) | TP/P | 1 - Type II error, power, sensitivity, recall |
| Positive predictive value | TP/P* | Precision |
| Negative predictive value | TN/N* | |
| Overall accuracy | (TN + TP)/(N + P) | |
| Overall error rate | (FP + FN)/(N + P) | |
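As a sketch in plain Python, these measures can be computed directly from the four counts (reusing the hypothetical values from the earlier snippet):

```python
# Hypothetical counts read off a confusion matrix.
tn, fp, fn, tp = 3, 1, 1, 3

n, p = tn + fp, fn + tp            # actual negatives (N) and positives (P)
n_star, p_star = tn + fn, tp + fp  # predicted negatives (N*) and positives (P*)

fpr = fp / n                       # false positive rate (alpha)
tpr = tp / p                       # true positive rate / sensitivity / recall
ppv = tp / p_star                  # positive predictive value / precision
npv = tn / n_star                  # negative predictive value
accuracy = (tn + tp) / (n + p)     # overall accuracy
error_rate = (fp + fn) / (n + p)   # overall error rate
```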
Also note that the F1 score is the harmonic mean of precision and recall:
\[\text{F1 score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}\]

For example, in cancer detection, sensitivity is the fraction of patients with cancer that the test correctly flags, and specificity is the fraction of healthy patients that the test correctly clears. Precision is the fraction of positive test results that correspond to actual cancer cases, and recall is the same as sensitivity.
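A quick sketch checking the harmonic-mean formula against scikit-learn's `f1_score` (labels are the same hypothetical ones as above):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP/P* = 3/4
recall = recall_score(y_true, y_pred)        # TP/P  = 3/4
f1_manual = 2 * precision * recall / (precision + recall)

assert abs(f1_manual - f1_score(y_true, y_pred)) < 1e-12
print(f1_manual)  # 0.75
```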
Often we want to make a binary prediction, e.g., in predicting the quality of care a patient receives in a hospital: does the patient receive poor care or good care? If the model outputs a probability, we can convert it into a binary prediction using a threshold value \(t\).
Now the question arises: what value of \(t\) should we choose? A larger \(t\) makes positive predictions rarer, reducing false positives at the cost of more false negatives, while a smaller \(t\) does the opposite. Thus, the answer depends on which type of error is more costly for the problem you are trying to solve. With no preference between the errors, we normally select \(t = 0.5\).
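A minimal sketch of thresholding, assuming a hypothetical array `probs` of predicted probabilities of poor care:

```python
import numpy as np

# Hypothetical predicted probabilities of the positive class ("poor care").
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2])

t = 0.5                           # no preference between the two errors
preds = (probs >= t).astype(int)  # 1 = poor care, 0 = good care
print(preds)                      # [0 0 0 1 1 0]

# Raising the threshold yields fewer positive predictions
# (fewer false positives, more false negatives).
print((probs >= 0.7).astype(int))  # [0 0 0 1 0 0]
```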
The ROC (Receiver Operating Characteristic) curve plots the true positive rate against the false positive rate as the threshold \(t\) varies. The Area Under the ROC Curve gives the AUC score of a model: 1.0 for a perfect classifier, 0.5 for random guessing.
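A sketch with scikit-learn, reusing the hypothetical probabilities above together with made-up true labels:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 1, 0]               # hypothetical true labels
probs = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2]  # hypothetical predicted probabilities

auc = roc_auc_score(y_true, probs)
fpr, tpr, thresholds = roc_curve(y_true, probs)  # one (FPR, TPR) point per threshold
print(auc)  # ~0.889
```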
Below is an xkcd comic about the misinterpretation of p-values and false positives.
An explanation of the above comic is available on the explain xkcd wiki.