# Confusion Matrix

## Summary

In statistical classification, we create algorithms or models to predict or classify data into a finite set of classes. Since models are not perfect, some data points will be classified incorrectly. A confusion matrix is a tabular summary showing how well the model is performing.

One dimension of the matrix holds the actual classes and the other holds the predicted classes. Each entry is a count, so the matrix is essentially a two-dimensional histogram. For example, one entry records how many data points were predicted as "true" when they were actually "false".

A confusion matrix is useful in both binary and multiclass classification problems. Many performance metrics can be computed from the matrix, and learning them is handy for any statistician or data scientist.

## Milestones

## Discussion

What are the elements and terminology used in a confusion matrix? Consider the concrete example of a pregnancy test. Based on a urine test, we predict whether a person is pregnant. We assume that the ground truth (pregnant or not) is available to us. We therefore have four possibilities:

- **True Positive (TP)**: We predict a pregnant person is pregnant. This is a good prediction.
- **True Negative (TN)**: We predict a non-pregnant person is not pregnant. This is a good prediction.
- **False Positive (FP)**: We predict a non-pregnant person is pregnant. This type of error is also called *Type I Error*.
- **False Negative (FN)**: We predict a pregnant person is not pregnant. This type of error is also called *Type II Error*.
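To make the four outcomes concrete, here is a minimal Python sketch that labels each (actual, predicted) pair and tallies the results. The six people and their test results are hypothetical, made up purely for illustration.

```python
# Hypothetical ground truth and test results for six people.
# True means "pregnant" (actual) or "test says pregnant" (predicted).
actual    = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

def outcome(is_actual, is_predicted):
    """Label one prediction as TP, TN, FP or FN."""
    if is_predicted:
        return "TP" if is_actual else "FP"   # FP is a Type I error
    return "FN" if is_actual else "TN"       # FN is a Type II error

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for a, p in zip(actual, predicted):
    counts[outcome(a, p)] += 1

print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```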

When these are arranged in matrix form, the correct predictions lie along the main diagonal and the incorrect predictions lie in the off-diagonal cells. This makes it easy to see where predictions have gone wrong. The off-diagonal cells capture where the model confuses one class for another, hence the name "confusion" matrix.

What metrics are used for evaluating the performance of a prediction model? The common performance metrics derived from a confusion matrix are defined as follows:

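$$
\begin{aligned}
\text{Recall or Sensitivity} &= \frac{TP}{TP+FN} = \frac{TP}{\text{All Positives}}\\
\text{Specificity} &= \frac{TN}{TN+FP} = \frac{TN}{\text{All Negatives}}\\
\text{Precision} &= \frac{TP}{TP+FP} = \frac{TP}{\text{Predicted Positives}}\\
\text{Prevalence} &= \frac{TP+FN}{\text{Total}} = \frac{\text{All Positives}}{\text{Total}}\\
\text{Accuracy} &= \frac{TP+TN}{\text{Total}}\\
\text{Error Rate} &= \frac{FP+FN}{\text{Total}}
\end{aligned}
$$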
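The definitions above translate directly into code. The following is a minimal Python sketch; the function name `metrics` is our own, and it assumes every denominator is non-zero (for example, at least one actual positive and one predicted positive).

```python
def metrics(tp, fn, fp, tn):
    """Compute the confusion-matrix metrics defined above from the four counts."""
    total = tp + fn + fp + tn
    return {
        "recall": tp / (tp + fn),          # sensitivity: TP / all positives
        "specificity": tn / (tn + fp),     # TN / all negatives
        "precision": tp / (tp + fp),       # TP / predicted positives
        "prevalence": (tp + fn) / total,   # all positives / total
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
    }

print(metrics(tp=100, fn=5, fp=10, tn=50))
```

Returning a dictionary keeps each metric labelled, which makes it easy to compare models side by side.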

It's important to understand the significance of these metrics. Accuracy is an overall measure of correct prediction, regardless of the class (positive or negative). The complement of accuracy is error rate or misclassification rate.

High recall implies that very few positives are misclassified as negatives. High precision implies that very few negatives are misclassified as positives. There's a trade-off here. If the model is biased towards positives, we'll end up with high recall but low precision. If the model favours negatives, we'll end up with low recall but high precision.
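One common way a model becomes "biased towards positives" is by lowering the score threshold above which it predicts positive. The Python sketch below illustrates the trade-off on eight hypothetical scores and labels (invented for illustration, not taken from any real model): as the threshold drops, recall rises and precision falls.

```python
# Hypothetical model scores (probability of "positive") and true labels (1 = positive).
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    1,    0,    0,    0]

def precision_recall(threshold):
    """Predict positive when score >= threshold; return (precision, recall)."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(preds, labels))
    fp = sum(p and y == 0 for p, y in zip(preds, labels))
    fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
    return tp / (tp + fp), tp / (tp + fn)

for t in (0.8, 0.5, 0.2):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.8: precision=1.00, recall=0.50
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.2: precision=0.57, recall=1.00
```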

High specificity, like high precision, implies that very few negatives are misclassified as positives. If positive represents some disease, specificity measures the model's ability to clear disease-free people as disease-free, while sensitivity measures its ability to diagnose diseased people as diseased.

Ideally, recall, specificity, precision and accuracy should all be close to 1. The False Negative Rate (FNR = 1 - recall), the False Positive Rate (FPR = 1 - specificity) and the error rate should be close to 0.

Could you give a numerical example showing calculations of performance measures of a prediction model? This example has 165 samples: TP = 100, FN = 5, FP = 10 and TN = 50. We show the following calculations:

- Recall or True Positive Rate (TPR): TP/(TP+FN) = 100/(100+5) = 0.95
- False Negative Rate (FNR): 1 - TPR = 0.05
- Specificity or True Negative Rate (TNR): TN/(TN+FP) = 50/(50+10) = 0.83
- False Positive Rate (FPR): 1 - TNR = 0.17
- Precision: TP/(TP+FP) = 100/(100+10) = 0.91
- Prevalence: (TP+FN)/Total = (100+5)/165 = 0.64
- Accuracy: (TP+TN)/Total = (100+50)/165 = 0.91
- Error Rate: (FP+FN)/Total = (10+5)/165 = 0.09
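The arithmetic in this list can be checked with a short Python sketch that plugs in the example's four counts and prints each measure rounded to two decimals, matching the values listed.

```python
# Counts for the 165-sample example above.
TP, FN, FP, TN = 100, 5, 10, 50
total = TP + FN + FP + TN  # 165

tpr = TP / (TP + FN)   # recall
tnr = TN / (TN + FP)   # specificity
results = {
    "Recall (TPR)": tpr,
    "FNR": 1 - tpr,
    "Specificity (TNR)": tnr,
    "FPR": 1 - tnr,
    "Precision": TP / (TP + FP),
    "Prevalence": (TP + FN) / total,
    "Accuracy": (TP + TN) / total,
    "Error Rate": (FP + FN) / total,
}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```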

Why do we need so many performance measures when accuracy seems sufficient? If the dataset has 90% positives, a model can achieve 90% accuracy simply by predicting every sample as positive. Thus, accuracy is not a sufficient measure when the dataset is imbalanced. Accuracy also doesn't differentiate between Type I (False Positive) and Type II (False Negative) errors.
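The 90% scenario is easy to reproduce. In this Python sketch the dataset (90 positives, 10 negatives) and the "always positive" model are hypothetical; accuracy looks good while specificity reveals that not a single negative is recognized.

```python
# Hypothetical imbalanced dataset: 90 positives and 10 negatives.
actual = [1] * 90 + [0] * 10
predicted = [1] * len(actual)  # a "model" that always predicts positive

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy = (tp + tn) / len(actual)
specificity = tn / (tn + fp)
print(f"accuracy={accuracy:.2f}, specificity={specificity:.2f}")
# accuracy=0.90, specificity=0.00
```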

This is where the confusion matrix gives us more useful measures, FPR and FNR, or their complements, Specificity and Recall respectively.

Consider the multiclass problem of iris classification, which has three classes: setosa, versicolor and virginica. A classifier for this problem has an accuracy of 84% (32/38), but this doesn't tell us where the errors are happening. With the confusion matrix, it's easy to see that only versicolor samples are misclassified. The matrix also shows that versicolor is misclassified as virginica and never as setosa. We can also see that the recall for versicolor is 62.5% (10/16).
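A multiclass matrix is built the same way as a binary one: count every (actual, predicted) pair. The Python sketch below uses hypothetical labels chosen to match the figures quoted above (38 samples, 16 versicolor of which 6 are predicted as virginica); the split of the remaining samples into 13 setosa and 9 virginica is an assumption for illustration.

```python
from collections import Counter

CLASSES = ["setosa", "versicolor", "virginica"]

# Hypothetical labels: 13 setosa and 9 virginica are assumed counts.
actual    = ["setosa"] * 13 + ["versicolor"] * 16 + ["virginica"] * 9
predicted = (["setosa"] * 13
             + ["versicolor"] * 10 + ["virginica"] * 6
             + ["virginica"] * 9)

# Rows are actual classes, columns are predicted classes.
pairs = Counter(zip(actual, predicted))
matrix = [[pairs[(a, p)] for p in CLASSES] for a in CLASSES]
for name, row in zip(CLASSES, matrix):
    print(f"{name:>10}: {row}")

correct = sum(matrix[i][i] for i in range(len(CLASSES)))  # diagonal
accuracy = correct / len(actual)
# Per-class recall: diagonal entry divided by its row total.
recall = {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(CLASSES)}
print(f"accuracy={accuracy:.3f}, versicolor recall={recall['versicolor']:.3f}")
# accuracy=0.842, versicolor recall=0.625
```

Because `Counter` returns 0 for missing keys, class pairs that never occur still get a cell in the matrix.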

When classes are not evenly represented in the data, the raw confusion matrix doesn't give an adequate visual representation, because the larger classes dominate the counts. For this reason, we use a **normalized confusion matrix**, typically dividing each row by its total so that every class is shown on the same scale regardless of its size.

What are other performance metrics for a classification/prediction problem? The **F-measure** is the harmonic mean of Recall and Precision, (2 × Recall × Precision)/(Recall + Precision). Its value is closer to the smaller of the two. Applying this to our earlier example, we get F-measure = (2 × 0.95 × 0.91)/(0.95 + 0.91) = 0.93.

A commonly used graphical measure is the **ROC Curve**. It's generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as we vary the threshold for assigning observations to a given class.

How often will we be wrong if we always predict the majority class? The **Null Error Rate** gives us a measure for this. It's a useful baseline when evaluating a model. In our example, the null error rate would be 60/165 = 0.36: if the model always predicted positive, it would be wrong 36% of the time.

**Cohen's Kappa** tells us how well a classifier performs compared with classifying simply by chance. A high kappa indicates that the observed accuracy is well above the accuracy expected from chance agreement.

What's the procedure to make or use a confusion matrix? We need both the actual values and the predicted values. We can arrange the actual values by rows and the predicted values by columns, although some tools swap the two. It's therefore important to read the arrangement of the matrix correctly. For each actual class, count how many samples were predicted as each class, and fill these counts into the matrix.
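The normalized matrix, F-measure, null error rate and Cohen's kappa described above can all be computed from the running binary example's counts. This is a minimal Python sketch under our own naming; the row normalization follows the rows-are-actuals layout, and kappa uses the standard chance-agreement formula based on the row and column totals.

```python
# Running binary example: rows = actual (positive, negative),
# columns = predicted (positive, negative).
TP, FN, FP, TN = 100, 5, 10, 50
matrix = [[TP, FN],
          [FP, TN]]
total = TP + FN + FP + TN

# Row-normalized confusion matrix: each row sums to 1, so every actual
# class is shown on the same scale regardless of its size.
normalized = [[count / sum(row) for count in row] for row in matrix]

recall = TP / (TP + FN)
precision = TP / (TP + FP)
f_measure = 2 * recall * precision / (recall + precision)

# Null error rate: error of always predicting the majority class.
actual_pos, actual_neg = TP + FN, FP + TN
null_error_rate = min(actual_pos, actual_neg) / total

# Cohen's kappa: observed agreement vs. agreement expected by chance
# from the actual (row) and predicted (column) totals.
p_observed = (TP + TN) / total
pred_pos, pred_neg = TP + FP, FN + TN
p_chance = (actual_pos * pred_pos + actual_neg * pred_neg) / total**2
kappa = (p_observed - p_chance) / (1 - p_chance)

print([[round(x, 2) for x in row] for row in normalized])  # [[0.95, 0.05], [0.17, 0.83]]
print(f"F={f_measure:.2f}, null error rate={null_error_rate:.2f}, kappa={kappa:.2f}")
# F=0.93, null error rate=0.36, kappa=0.80
```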

There's no universal threshold for good accuracy, sensitivity or other measures. They should be interpreted in the context of the problem, the domain and the business.

Could you mention some tools and techniques in relation to the confusion matrix? In R, the package *caret: Classification and Regression Training* can be used to get a confusion matrix with all relevant statistical information. The function is `confusionMatrix(data=predicted, reference=expected)`. It arranges actuals (called reference) by columns and predictions by rows.

In Python, the package *sklearn.metrics* has an equivalent function, `confusion_matrix(actual, predicted)`. It arranges actuals by rows and predictions by columns. Other related and useful functions are `accuracy_score(actual, predicted)` and `classification_report(actual, predicted)`.
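A short scikit-learn sketch, assuming the package is installed, rebuilds the running binary example from label arrays (1 = positive, 0 = negative). With labels sorted as (0, 1), `confusion_matrix` returns `[[TN, FP], [FN, TP]]`, so the positive class ends up in the bottom-right cell; this ordering is easy to misread, and it differs from caret's layout.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Label arrays reproducing the running example: 1 = positive, 0 = negative.
actual    = [1] * 100 + [1] * 5 + [0] * 10 + [0] * 50
predicted = [1] * 100 + [0] * 5 + [1] * 10 + [0] * 50

# Rows are actual labels and columns are predicted labels, in sorted order (0, 1).
cm = confusion_matrix(actual, predicted)
print(cm)  # [[TN, FP], [FN, TP]] = [[50, 10], [5, 100]]

tn, fp, fn, tp = cm.ravel()
print(accuracy_score(actual, predicted))         # (TP + TN) / Total
print(classification_report(actual, predicted))  # per-class precision, recall, F1
```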

## Sample Code

## References

- Brownlee, Jason. 2016. "What is a Confusion Matrix in Machine Learning." Machine Learning Mastery, November 18. Accessed 2019-08-18.
- Caret. 2019. "confusionMatrix: Create a confusion matrix." Caret Docs, via rdrr, May 02. Accessed 2019-08-18.
- Idris, Awab. 2018. "Confusion Matrix." Medium, July 11. Accessed 2019-06-27.
- Kohavi, Ron and Foster Provost, eds. 1998. "Glossary of Terms." Special Issue on Applications of Machine Learning and the Knowledge Discovery Process, Machine Learning, vol. 30, pp. 271-274, Kluwer Academic Publishers. Accessed 2019-06-28.
- Krüger, Frank. 2016. "Activity, Context, and Plan Recognition with Computational Causal Behaviour Models." ResearchGate, December. Accessed 2019-08-20.
- Makhtar, Mokhairi and Daniel C. Neagu and Mick J. Ridley. 2011. "Comparing Multi-class Classifiers: On the Similarity of Confusion Matrices for Predictive Toxicology Applications." In: Yin H., Wang W., Rayward-Smith V. (eds), Intelligent Data Engineering and Automated Learning, Lecture Notes in Computer Science, vol. 6936, Springer, Berlin, Heidelberg. Accessed 2019-08-18.
- Markham, Kevin. 2014. "Simple guide to confusion matrix terminology." Data School, March 25. Accessed 2019-06-27.
- Narkhede, Sarang. 2018. "Understanding Confusion Matrix." Towards Data Science, via Medium, May 09. Accessed 2019-08-18.
- Parikh, R., A. Mathai, S. Parikh, G. Chandra Sekhar, and R. Thomas. 2008. "Understanding and using sensitivity, specificity and predictive values." Indian Journal of Ophthalmology, 56(1), 45–50, Jan-Feb. doi:10.4103/0301-4738.37595. Accessed 2019-08-20.
- Pearson, Karl. 1904. "On the theory of contingency and its relation to association and normal correlation." Drapers' Company Research Memoirs, Biometric Series I, Dept. of Applied Mathematics, University of London. Accessed 2019-06-28.
- Scikit-learn. 2019a. "sklearn.metrics.confusion_matrix." scikit-learn, v0.21.3, July 30. Accessed 2019-08-18.
- Scikit-learn. 2019b. "Confusion matrix." scikit-learn, v0.21.3, July 30. Accessed 2019-08-18.
- Scikit-learn. 2019c. "sklearn.metrics.classification_report." scikit-learn, v0.21.3, July 30. Accessed 2019-08-18.
- Sharma, Abhishek. 2017. "Confusion Matrix in Machine Learning." GeeksforGeeks, October 15. Updated 2018-02-07. Accessed 2019-08-18.
- Townsend, James. 1971. "Theoretical analysis of an alphabet confusion matrix." Attention Perception & Psychophysics 9(1):40-50, via ResearchGate, January. Accessed 2019-06-28.


## Tags

## See Also

- Hypothesis Testing and Types of Errors
- ROC Curve
- Machine Learning
- Contingency Table
- Statistical Classification
- Statistical Inference

## Further Reading

- Caret. 2019. "confusionMatrix: Create a confusion matrix." Caret Docs, via rdrr, May 02. Accessed 2019-08-18.
- Vanneti, Marco. 2007. "Confusion matrix online calculator." Accessed 2019-08-18.
- Mills, Peter. 2017. "Bayesian Learning for Statistical Classification." Stats and Bots, via Medium, September 26. Accessed 2019-08-18.
- Narkhede, Sarang. 2018. "Understanding Confusion Matrix." Towards Data Science, via Medium, May 09. Accessed 2019-08-18.