ROC Curve
In many applications, there's a need to decide between two alternatives. In the military, radar operators look at an approaching object and decide if it's a threat. Doctors look at an image and decide if it shows a tumour. For facial recognition, an algorithm has to decide if it's a match. In Machine Learning, we call this binary classification while in radar we call it signal detection.
The decision depends on a threshold. The Receiver Operating Characteristic (ROC) Curve is a graphical plot that shows the performance of a binary classifier or diagnostic test as the threshold is varied. Using the ROC Curve, we can select a threshold that best suits our application. The idea is to maximize correct classification or detection while minimizing false positives. The ROC Curve is also useful when comparing alternative classifiers or diagnostic tests.
Discussion
How do we define or plot the ROC Curve? Let's take a binary classification problem that has two distributions: one for positives and one for negatives. To classify subjects into one of these two classes, we select a threshold. Anything above the threshold is classified as positive. The accuracy of the classifier depends directly on the threshold we use. The ROC Curve is plotted by varying the threshold and recording the classifier's performance at each value.
The ROC curve plots True Positive Rate (TPR) versus False Positive Rate (FPR). TPR is also called recall or sensitivity. TPR is the probability that we detect a signal when it's present. FPR is the complement of specificity: (1 - specificity). FPR is the probability that we detect a signal when it's not present. Being based only on recall and specificity, the ROC curve is independent of prevalence, that is, of how common the condition is in the population.
A good classifier has an ROC curve that rises sharply from the origin, so that TPR is already high before FPR starts to rise. Each point on the ROC curve represents the performance of the classifier at one threshold value.
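The plotting procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `roc_points`, the toy labels and scores, and the convention that a score at or above the threshold counts as positive are all chosen for the example and do not come from any particular library.

```python
def roc_points(labels, scores):
    """Return one (FPR, TPR) pair per candidate threshold.

    labels: 1 for positive, 0 for negative (assumed encoding).
    scores: a higher score means "more likely positive".
    A subject is classified positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sweep from the strictest threshold to the most lenient one. Starting
    # above the maximum score makes the curve begin at the origin (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting TPR against FPR for these points gives the ROC curve. The strictest threshold yields the point (0, 0) and the most lenient one yields (1, 1).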
Which application domains are using ROC Curves? ROC started in radar applications. It was later applied in many other domains including psychology, medicine, radiology, biometrics, and meteorology. More recently, it's being used in machine learning and data mining.
In medical practice, it's used for assessing diagnostic biomarkers, imaging tests or even risk assessment. It's been used to analyse information processing in the brain during sensory difference testing.
In bioinformatics and computational genomics, ROC analysis is being applied. In particular, it's used to classify biological sequences and protein structures.
ROC has been used to describe the performance of instruments built to detect explosives. In engineering, it's been used to evaluate the accuracy of pipeline reliability analysis and predict the failure threshold value.
What is AUC and its significance? After plotting the ROC Curve, the area under it is called Area Under the ROC Curve (AUC), Area Under the Curve (AUC), or AUROC. It's been said that "ROC is a probability curve and AUC represents degree or measure of separability". In other words, AUC is a single metric that can be used to quantify how well two classes are separated by a binary classifier. It's also useful when comparing different classifiers.
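Since AUC is literally an area, it can be estimated from a list of ROC points with the trapezoidal rule. The sketch below assumes hypothetical ROC points for a toy classifier. The function name `area_under_curve` is made up for the example.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then by TPR for vertical steps
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points for a toy classifier, from (0, 0) to (1, 1).
roc = [(0, 0), (0, 2 / 3), (1 / 3, 2 / 3), (1 / 3, 1), (1, 1)]
auc_value = area_under_curve(roc)

# The diagonal, which corresponds to random guessing.
diagonal = area_under_curve([(0, 0), (1, 1)])

print(round(auc_value, 4), diagonal)
```

For these points the area works out to 8/9, while the diagonal gives exactly 0.5.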
AUC has some useful properties. It's scale-invariant. This means it tells how well predictions are ranked rather than their absolute values. AUC is also classification-threshold-invariant. We can objectively compare prediction models irrespective of classification thresholds used. However, these properties are not desirable for some applications.
AUC is also prevalence-invariant. Suppose a health condition is prevalent in only 1% of the population. A simple classifier can achieve 99% accuracy by predicting negative always. AUC however gives a more useful value of 0.5.
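These properties can be checked numerically. The sketch below relies on the well-known ranking interpretation of AUC: the fraction of (positive, negative) pairs in which the positive gets the higher score, with ties counted as half. The function name and all data are hypothetical, and giving every subject the same score is just one way to model an "always negative" classifier.

```python
def auc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Scale invariance: rescaling the scores monotonically keeps the ranking.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc_raw = auc_by_ranking(labels, scores)
auc_rescaled = auc_by_ranking(labels, [100 * s + 5 for s in scores])

# Prevalence: 1% positives and a classifier that always predicts negative.
rare_labels = [1] * 10 + [0] * 990
predicted = [0] * 1000            # hard decisions: always negative
constant_scores = [0.0] * 1000    # same score for every subject
accuracy = sum(p == y for p, y in zip(predicted, rare_labels)) / len(rare_labels)
auc_constant = auc_by_ranking(rare_labels, constant_scores)

print(auc_raw, auc_rescaled, accuracy, auc_constant)
```

The rescaled scores give the same AUC as the raw ones. The always-negative classifier reaches 99% accuracy, yet its AUC is 0.5, the same as random guessing.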
How do I interpret an AUC value? Since both axes of the ROC Curve range [0,1], AUC also ranges [0,1]. Some researchers map AUC to Gini Coefficient, which is 2*AUC-1, with range [-1,1].
More realistically, AUC has a range [0.5,1] since the ROC curve is expected to be above the diagonal. Value 0.5 implies very poor separation and is represented by the diagonal ROC curve. Value 1 implies perfect separation, where TPR is always 1 at all values of FPR. As a rule of thumb, we have an excellent classifier if AUC is >= 0.9 and a good classifier when it's >= 0.8.
Why do I need an ROC Curve when TPR and FPR may be adequate? ROC Curve is a useful tool to compare classification methods and decide which one is better. Suppose a computer algorithm is implemented to diagnose a medical condition. Using ROC curves, we can compare its performance against a doctor's diagnosis, and against a doctor's diagnosis aided by computer-assisted detection (CAD). As shown in the figure, a doctor using CAD gives the best performance. The other two approaches have the same AUC but the doctor has a higher specificity (lower FPR).
In any binary classification problem, it's not possible to agree on a single threshold and consequently on values of sensitivity and specificity. Take the case of diagnostic testing as an example. Threshold would be adjusted based on the context and available information, such as patient history, presence of symptoms, or even likelihood of getting sued for a missed cancer. If we just plot two points for two classifiers, it's hard to know which one is better. Once we plot entire ROC curves, it's easy to see which one is better.
For a binary classification problem, how do I select the optimum threshold on the ROC Curve? There are basically two methods of determining the optimum threshold:
 Minimum-d: This is the shortest distance of the curve from the top-left corner or (0,1) point.
 Youden index: This is the vertical distance from the curve to the diagonal. To find the optimum point on the curve, we should maximize the Youden index.
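Both criteria are easy to evaluate once TPR and FPR are known at each candidate threshold. The following is a minimal sketch on hypothetical data. The helper name `candidate_thresholds` and the "score >= threshold is positive" convention are assumptions of the example. The distance to (0,1) is computed as sqrt(FPR^2 + (1 - TPR)^2) and the Youden index as TPR - FPR.

```python
import math


def candidate_thresholds(labels, scores):
    """Yield (threshold, FPR, TPR), classifying score >= threshold as positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        yield t, fp / n_neg, tp / n_pos


# Hypothetical toy data: five positives and four negatives.
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]
rows = list(candidate_thresholds(labels, scores))

# Minimum-d: smallest distance from (FPR, TPR) to the top-left corner (0, 1).
min_d_threshold = min(rows, key=lambda r: math.hypot(r[1], 1 - r[2]))[0]

# Youden index: largest vertical distance above the diagonal, J = TPR - FPR.
youden_threshold = max(rows, key=lambda r: r[2] - r[1])[0]

print(min_d_threshold, youden_threshold)
```

On this toy data both criteria pick the threshold 0.5, where TPR is 1 and FPR is 0.25. In general the two methods need not agree.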
ROC Curve and AUC ignore prevalence or misclassification costs. For example, poor sensitivity means missed cancer and delayed treatment whereas poor specificity means unnecessary treatment. Likewise, a false positive on a blood test for HIV simply means a discarded blood sample but a false negative will infect the blood recipient. It's for this reason decision makers should consider financial costs, and combine ROC analysis with utility-based decision theory to find the optimum threshold.
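One simple way to bring costs and prevalence into threshold selection is to minimize an expected cost per subject. The sketch below assumes the common formulation cost = C_FN x prevalence x (1 - TPR) + C_FP x (1 - prevalence) x FPR. The cost values, the prevalence and the function names are hypothetical, and this is only one possible utility model rather than a prescribed method.

```python
def expected_cost(fpr, tpr, prevalence, cost_fn, cost_fp):
    """Expected misclassification cost per subject at one ROC point."""
    missed = cost_fn * prevalence * (1 - tpr)        # false negatives
    false_alarms = cost_fp * (1 - prevalence) * fpr  # false positives
    return missed + false_alarms


def cheapest_threshold(labels, scores, prevalence, cost_fn, cost_fp):
    """Return the observed score threshold with the lowest expected cost."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        cost = expected_cost(fp / n_neg, tp / n_pos, prevalence, cost_fn, cost_fp)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best[0]


# Hypothetical toy data and costs.
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]

equal_costs = cheapest_threshold(labels, scores, prevalence=0.5, cost_fn=1, cost_fp=1)
costly_fp = cheapest_threshold(labels, scores, prevalence=0.5, cost_fn=1, cost_fp=10)

print(equal_costs, costly_fp)
```

With equal costs the chosen threshold is 0.5. When a false positive is assumed to cost ten times as much as a false negative, the chosen threshold moves up to 0.8, trading some sensitivity for fewer false alarms.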
How do I apply ROC Curves to multiclass problems? Given \(c\) classes, the ROC space has \(c(c-1)\) dimensions. This makes it difficult to apply ROC Curve methodology to multiclass problems. However, some attempt has been made to apply it to 3 classes where AUC concept is extended to Volume Under the ROC Surface (VUS).
One approach is to reframe the problem into \(c\) one-vs-all binary classifiers. However, ROC Curve may not be suitable since FPR will be underestimated due to large number of negative data points. For this reason, Precision vs. Recall curve is more suitable.
For computing the AUC, one technique is to average pairwise comparisons. This equivalent AUC value is useful since we can ignore the costs associated with different kinds of misclassification errors.
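The pairwise averaging idea can be sketched as follows. For every pair of classes, a binary AUC is computed in both directions using only the samples of those two classes, and the results are averaged over all pairs. This is an illustration in the spirit of pairwise averaging, not a faithful reproduction of any published algorithm. The function names, the class labels 0 to c-1 and the probability vectors are all hypothetical.

```python
from itertools import combinations


def binary_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def pairwise_auc(labels, probs):
    """Average AUC over all class pairs.

    labels: class index (0 .. c-1) of each sample.
    probs:  per-sample list of predicted class probabilities.
    """
    pair_aucs = []
    for i, j in combinations(sorted(set(labels)), 2):
        # How well does the class-i probability separate class i from class j?
        a_ij = binary_auc([p[i] for y, p in zip(labels, probs) if y == i],
                          [p[i] for y, p in zip(labels, probs) if y == j])
        # And the class-j probability, separating class j from class i?
        a_ji = binary_auc([p[j] for y, p in zip(labels, probs) if y == j],
                          [p[j] for y, p in zip(labels, probs) if y == i])
        pair_aucs.append((a_ij + a_ji) / 2)
    return sum(pair_aucs) / len(pair_aucs)


# Hypothetical predictions for 3 classes; the fourth sample (true class 1)
# is wrongly given a high probability for class 2.
labels = [0, 0, 1, 1, 2, 2]
probs = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.15, 0.75],
    [0.10, 0.20, 0.70],
    [0.20, 0.20, 0.60],
]
multiclass_auc = pairwise_auc(labels, probs)
print(round(multiclass_auc, 4))
```

Here the pairs (0,1), (0,2) and (1,2) score 0.875, 1.0 and 0.5 respectively, so the average is about 0.79. The single confused sample mainly hurts the class 1 versus class 2 comparison.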
What are some pitfalls or drawbacks of using ROC Curve and AUC? In practice, AUC must be presented with a confidence interval, such as 95% CI, since it's estimated from a population sample. However, one study in clinical chemistry showed that many researchers failed to include CI or constructed them incorrectly.
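A confidence interval for AUC can be approximated in several ways. The sketch below uses a percentile bootstrap, a generic resampling technique chosen here for illustration. It is not necessarily the method used by any particular package, and the data, function names and number of resamples are hypothetical.

```python
import random


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (95% when alpha=0.05)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    cut = int(n_boot * alpha / 2)
    return estimates[cut], estimates[n_boot - cut - 1]


# Hypothetical sample of 20 subjects with scores from 1.0 down to 0.05.
labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [1 - 0.05 * i for i in range(20)]

point_estimate = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point_estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With only 20 subjects the interval is wide, which is exactly the information that a bare AUC value hides.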
AUC involves loss of information. Two ROC curves crossing each other can have the same AUC but each will have a range of thresholds at which it's better. Clinicians and patients interpret sensitivity and specificity but don't find AUC useful. They're not interested in performance across all thresholds. In ML, cost curves have been proposed as an alternative. Another alternative is H-measure.
AUC ignores the misclassification costs. A new test may be deemed worthless by using AUC alone. AUC also ignores prevalence but it's known that prevalence affects test results. While sensitivity and specificity are also independent of prevalence, prevalence can be considered during interpretation of the ROC curve.
Jorge M. Lobo et al. give many other reasons why AUC is not a suitable measure.
What software packages are available for ROC analysis? In R language, we can use the pROC package. Once we obtain the actual and predicted values, we can obtain the AUC along with confidence interval using the function ci.auc(). On GitHub, sachsmc/plotROC is an open source package for easily plotting ROC curves. It uses ggplot2, to which it adds handy functions for plotting: geom_roc, geom_rocci and style_roc.
In Python, a webpage on Scikit-learn gives code examples showing how to plot ROC curves and compute AUC for both binary and multiclass problems. It makes use of functions roc_curve and auc that are part of sklearn.metrics package.
Milestones
The idea of ROC starts in the 1940s with the use of radar during World War II. The task is to identify enemy aircraft while avoiding false detection of benign objects. ROC provides a suitable threshold for radar receiver operators. This also explains the origin of the term Receiver Operating Characteristic (ROC).
In the 1950s, psychologists start using ROC when studying the relationship between psychological experience and physical stimuli.
Peterson and Birdsall explain the ROC Curve in detail in the context of signal detection theory. They plot probability of signal detection versus probability of false alarm. Curve 1 represents optimum operation, curve 3 sets the lower limit, and curve 2 is by guessing. Curve 1 is produced by varying the operating level or threshold β, which is also the slope of the curve at that point. The value on the y-intercept is the one that needs to be maximized.
L.B. Lusted applies ROC methodology to compare different studies of chest film interpretations for detection of pulmonary tuberculosis. This is the first application of ROC to radiology. It subsequently inspires the use of ROC in many diagnostic imaging systems. Lusted himself publishes Decision-making studies in patient management in 1971.
Dorfman and Alf develop a method of curve fitting and use software to automate ROC analysis. A maximum likelihood approach under binomial assumption is developed. Many other programs written in FORTRAN are developed later: ROCFIT, CORROC, ROCPWR, and LABROC.
The concept of Free-Response Receiver Operating Characteristic (FROC) Curve is introduced in the auditory domain. In free-response analysis, in addition to detection, we also need to point out the location. The term "free-response" was coined in 1961. In 1978, FROC is applied for the first time in imaging. FROC can help where ROC can fail. For example, ROC can show location-level false positive and false negative that could "cancel" each other. This gives an image-level true positive: image shows cancer but wrong location is reported.
Although Area Under the ROC Curve (AUC) was previously used, Hanley and McNeil develop analytical techniques to bring out its statistical properties. They estimate the standard error for different underlying distributions and sample sizes.
In one of the earliest applications of the ROC Curve to machine learning, K.A. Spackman uses it to evaluate and compare ML algorithms.
Andrew Bradley notes that ROC curve is useful for visualizing a classifier's performance but not suitable for comparing multiple classification methods. A single performance measure is more desirable. He discusses how AUC can be used as a measure for comparing machine learning algorithms. He explains why AUC is a better measure than overall accuracy.
An article titled Better decisions through science appears in Scientific American. It brings ROC Curve to the attention of a wider audience. One example in this article talks about glaucoma diagnosis using eye fluid pressure. It defines the basic terms and shows hypothetical distribution curves, ROC Curves, and AUC. It states that AUC is a reflection of a test's accuracy.
Hand and Till generalize the concept of AUC for multiclass problems. In 2007, Landgrebe and Duin approximate the problem via pairwise analysis.
References
 Berrar, Daniel, and Peter Flach. 2012. "Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them)." Briefings in Bioinformatics, vol. 13, no. 1, pp. 83–97, January. Accessed 2019-07-23.
 Bradley, Andrew P. 1997. "The use of the area under the ROC curve in the evaluation of machine learning algorithms." Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, Elsevier Science Ltd. Accessed 2019-08-20.
 Chakraborty, D. P. 2013. "A brief history of free-response receiver operating characteristic paradigm data analysis." Academic Radiology, 20(7), 915–919, July. doi:10.1016/j.acra.2013.03.001. Accessed 2019-08-20.
 Döring, Matthias. 2018. "Performance Measures for Multi-Class Problems." Data Science Blog, December 04. Accessed 2019-07-23.
 Ekelund, Suzanne. 2011. "ROC curves – what are they and how are they used?" Acute Care Testing, January. Accessed 2019-07-23.
 Fawcett, Tom. 2006. "Introduction to ROC analysis." Pattern Recognition Letters, 27(8):861-874, June. Accessed 2019-08-20.
 Google Developers. 2019. "Classification: ROC Curve and AUC." Crash Course, Machine Learning, Google, March 05. Accessed 2019-07-23.
 Hajian-Tilaki, Karimollah. 2013. "Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation." Caspian Journal of Internal Medicine, vol. 4, no. 2, pp. 627-635. Accessed 2019-07-23.
 Halligan, Steve, Douglas G. Altman, and Susan Mallett. 2015. "Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: A discussion and proposal for an alternative approach." European Radiology, vol. 25, no. 4, pp. 932–939. Accessed 2019-07-23.
 Hand, David J. and Robert J. Till. 2001. "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning, vol. 45, pp. 171–186, Kluwer Academic Publishers. Accessed 2019-08-20.
 Hanley, James A., and Barbara J. McNeil. 1982. "The meaning and use of the area under a receiver operating characteristic (ROC) curve." Radiology, vol. 143, no. 1, pp. 29-36, April. Accessed 2019-08-20.
 Joy, Janet E, Edward E Penhoet, and Diana B Petitti, eds. 2005. "ROC Analysis: Key Statistical Tool for Evaluating Detection Technologies." Appendix C in Saving Women's Lives: Strategies for Improving Breast Cancer Detection and Diagnosis, National Academies Press. Accessed 2019-08-20.
 Kumar, Rajeev and Abhaya Indrayan. 2011. "Receiver Operating Characteristic (ROC) Curve for Medical Researchers." Indian Pediatrics, vol. 48, pp. 277-287, April 17. Accessed 2019-08-20.
 Landgrebe, Thomas C.W. and Robert P.W. Duin. 2007. "Approximating the multiclass ROC by pairwise analysis." Pattern Recognition Letters, vol. 28, pp. 1747–1758, Elsevier. Accessed 2019-08-20.
 Li, Bai. 2018. "Useful properties of ROC curves, AUC scoring, and Gini Coefficients." Lucky's Notes, April 04. Accessed 2019-08-22.
 Narkhede, Sarang. 2018. "Understanding AUC - ROC Curve." Towards Data Science, via Medium, June 26. Accessed 2019-07-23.
 Obuchowski, Nancy A., Michael L. Lieber, and Frank H. Wians. 2004. "ROC Curves in Clinical Chemistry: Uses, Misuses, and Possible Solutions." Clinical Chemistry, vol. 50, no. 7, pp. 1118-1125, June 30. Accessed 2019-08-22.
 Park, Seong Ho, Jin Mo Goo, and Chan-Hee Jo. 2004. "Receiver Operating Characteristic (ROC) Curve: Practical Review for Radiologists." Korean J Radiol., vol. 5, no. 1, pp. 11–18, Jan-Mar. Accessed 2019-08-22.
 Peterson, William Wesley, and Theodore G. Birdsall. 1953. "The theory of signal detectability." TR No. 13, Engineering Research Institute, Univ. of Michigan, June. Accessed 2019-08-20.
 Rickert, Joseph. 2019. "ROC Curves." R Views, RStudio, January 17. Accessed 2019-08-22.
 Sachs, Michael C. 2018. "Generate ROC Curve Charts for Print and Interactive Use." Via GitHub IO, June 01. Accessed 2019-08-22.
 Scikit-learn Docs. 2019. "Receiver Operating Characteristic (ROC)." Scikit-learn v0.21.3, July 30. Accessed 2019-08-22.
 Sonego, Paolo, András Kocsor, and Sándor Pongor. 2008. "ROC analysis: applications to the classification of biological sequences and 3D structures." Briefings in Bioinformatics, vol. 9, no. 3, pp. 198–209, May. Accessed 2019-08-20.
 Streiner, David L., and John Cairney. 2007. "What’s Under the ROC? An Introduction to Receiver Operating Characteristics Curves." The Canadian Journal of Psychiatry, vol. 52, no. 2, pp. 121-128, February. Accessed 2019-08-22.
 Swets, J.A., R.M. Dawes, and J. Monahan. 2000. "Better decisions through science." Scientific American, pp. 82–87, October. Accessed 2019-08-20.
 Tape, Thomas G. 2019a. "The Area Under an ROC Curve." Interpreting Diagnostic Tests, University of Nebraska Medical Center. Accessed 2019-07-23.
 Tape, Thomas G. 2019b. "Plotting and Intrepretating an ROC Curve." Interpreting Diagnostic Tests, University of Nebraska Medical Center. Accessed 2019-07-23.
 Tee, Kong Fah, Lutfor Rahman Khan, and Tahani Coolen-Maturi. 2015. "Application of receiver operating characteristic curve for pipeline reliability analysis." Proc IMechE Part O: Journal of Risk and Reliability, December. Accessed 2019-08-20.
 Treadway, Andrew. 2019. "How to get an AUC confidence interval." Open Source Automation, August 20. Accessed 2019-08-22.
 Wichchukit, Sukanya and Michael O'Mahony. 2010. "A transfer of technology from engineering: use of ROC curves from signal detection theory to investigate information processing in the brain during sensory difference testing." J Food Sci., 75(9):R183-93, Nov-Dec. Accessed 2019-08-20.
 Wikimedia Commons. 2015. "File:ROC curves colors.svg." Wikimedia Commons, December 05. Accessed 2019-08-20.
 Wikipedia. 2019. "Receiver operating characteristic." Wikipedia, August 17. Accessed 2019-08-21.
 Young, M., Wen Fan, Anna Raeva, and Jose Almirall. 2013. "Application of Receiver Operating Characteristic (ROC) Curves for Explosives Detection Using Different Sampling and Detection Techniques." Sensors (Basel, Switzerland), 13(12), 16867–16881. doi:10.3390/s131216867. Accessed 2019-08-20.
Further Reading
 Turner, David A. 1978. "An Intuitive Approach to Receiver Operating Characteristic Curve Analysis." J Nucl Med, vol. 19, no. 2, pp. 213-220. Accessed 2019-08-22.
 Bohne, Julien. 2018. "Beyond the ROC AUC: Toward Defining Better Performance Metrics." BCG Gamma, via Medium, November 01. Accessed 2019-08-22.
 Lobo, Jorge M., Alberto Jiménez-Valverde, and Raimundo Real. 2007. "AUC: a misleading measure of the performance of predictive distribution models." Global Ecology and Biogeography, Blackwell Publishing Ltd. Accessed 2019-07-23.
 Swets, J.A., R.M. Dawes, and J. Monahan. 2000. "Better decisions through science." Scientific American, pp. 82–87, October. Accessed 2019-08-20.
See Also
 Hypothesis Testing and Types of Errors
 Statistical Classification
 Data Visualization
 Confusion Matrix
 Detection Theory
 Machine Learning