# ROC Curve

In many applications, there's a need to decide between two alternatives. In the military, radar operators look at approaching objects and decide if it's a threat. Doctors look at an image and decide if it's a tumour. For facial recognition, an algorithm has to decide if it's a match. In Machine Learning, we call this binary classification while in radar we call it signal detection.

The decision depends on a threshold. Receiver Operating Characteristic (ROC) Curve is a graphical plot that helps us see the performance of a binary classifier or diagnostic test when the threshold is varied. Using the ROC Curve, we can select a threshold that best suits our application. The idea is to maximize correct classification or detection while minimizing false positives. ROC Curve is also useful when comparing alternative classifiers or diagnostic tests.

## Discussion

• How do we define or plot the ROC Curve?

Let's take a binary classification problem that has two distributions: one for positives and one for negatives. To classify subjects into one of these two classes, we select a threshold. Anything above the threshold is classified as positive. The accuracy of the classifier depends directly on the threshold we use. ROC Curve is plotted by varying the thresholds and recording the classifier's performance at each threshold.

ROC curve plots True Positive Rate (TPR) versus False Positive Rate (FPR). TPR is also called recall or sensitivity. TPR is the probability that we detect a signal when it's present. FPR is the complement of specificity: (1-specificity). FPR is the probability that we detect a signal when it's not present. Being based on only recall and specificity, ROC curve is independent of prevalence, that is, how common is the condition in the population.

An ideal classifier will have an ROC curve that rises sharply from origin until FPR rises when TPR is already high. Each point on the ROC curve represents the performance of the classifier at one threshold value.

• Which application domains are using ROC Curves?

ROC started in radar applications. It was later applied in many other domains including psychology, medicine, radiology, biometrics, and meteorology. More recently, it's being used in machine learning and data mining.

In medical practice, it's used for assessing diagnostic biomarkers, imaging tests or even risk assessment. It's been used to analyse information processing in the brain during sensory difference testing.

In bioinformatics and computational genomics, ROC analysis is being applied. In particular, it's used to classify biological sequences and protein structures.

ROC has been used to describe the performance of instruments built to detect explosives. In engineering, it's been used to evaluate the accuracy of pipeline reliability analysis and predict the failure threshold value.

• What is AUC and its significance?

After plotting the ROC Curve, the area under it is called Area Under the ROC Curve (AUC), Area Under the Curve (AUC), or AUROC. It's been said that "ROC is a probability curve and AUC represents degree or measure of separability". In other words, AUC is a single metric that can be used to quantify how well two classes are separated by a binary classifier. It's also useful when comparing different classifiers.

AUC has some useful properties. It's scale-invariant. This means it tells how well predictions are ranked rather than their absolute values. AUC is also classification-threshold-invariant. We can objectively compare prediction models irrespective of classification thresholds used. However, these properties are not desirable for some applications.

AUC is also prevalence-invariant. Suppose a health condition is prevalent in only 1% of the population. A simple classifier can achieve 99% accuracy by predicting negative always. AUC however gives a more useful value of 0.5.

• How do I interpret an AUC value?

Since both axes of the ROC Curve range [0,1], AUC also ranges [0,1]. Some researchers map AUC to Gini Coefficient, which is 2*AUC-1, with range [-1,-1].

More realistically, AUC has a range [0.5,1] since the ROC curve is expected to be above the diagonal. Value 0.5 implies very poor separation and is represented by the diagonal ROC curve. Value 1 implies perfect separation, where TPR is always 1 at all values of FPR. As a thumb rule, we have an excellent classifier if AUC is >=0.9 and a good classifier when it's >= 0.8.

• Why do I need an ROC Curve when TPR and FPR may be adequate?

ROC Curve is a useful tool to compare classification methods and decide which one is better. Suppose a computer algorithm is implemented to diagnose a medical condition. Using ROC curves, we can compare its performance against a doctor's diagnosis, and against doctor's diagnosis when aided with computer-assisted detection (CAD). As shown in figure, a doctor using CAD gives best performance. The other two approaches have the same AUC but the doctor has a higher specificity (lower FPR).

In any binary classification problem, it's not possible to agree on a single threshold and consequently on values of sensitivity and specificity. Take the case of diagnostic testing as an example. Threshold would be adjusted based on the context and available information, such as patient history, presence of symptoms, or even likelihood of getting sued for a missed cancer. If we just plot two points for two classifiers, it's hard to know which one is better. Once we plot entire ROC curves, it's easy to see which one is better.

• For a binary classification problem, how to I select the optimum threshold on the ROC Curve?

There are basically two methods of determining the optimum threshold:

• Minimum-d: This is the shortest distance of the curve from the top-left corner or (0,1) point.
• Youden index: This is the vertical distance from the curve to the diagonal. To find the optimum point on the curve, we should maximize the Youden index.

ROC Curve and AUC ignore prevalence or misclassification costs. For example, poor sensitivity means missed cancer and delayed treatment whereas poor specificity means unnecessary treatment. Likewise, a false positive on a blood test for HIV simply means a discarded blood sample but a false negative will infect the blood recipient. It's for this reason decision makers should consider financial costs, and combine ROC analysis with utility-based decision theory to find the optimum threshold.

• How do I apply ROC Curves to multiclass problems?

Given $$c$$ classes, the ROC space has $$c(c-1)$$ dimensions. This makes it difficult to apply ROC Curve methodology to multiclass problems. However, some attempt has been made to apply it to 3 classes where AUC concept is extended to Volume Under the ROC Surface (VUS).

One approach is to reframe the problem into $$c$$ one-vs-all binary classifiers. However, ROC Curve may not be suitable since FPR will be underestimated due to large number of negative data points. For this reason, Precision vs. Recall curve is more suitable.

For computing the AUC, one technique is to average pairwise comparisons. This equivalent AUC value is useful since we can ignore the costs associated with different kinds of misclassification errors.

• What are some pitfalls or drawbacks of using ROC Curve and AUC?

In practice, AUC must be presented with a confidence interval, such as 95% CI, since it's estimated from a population sample. However, one research in clinical chemistry showed that many researchers failed to include CI or constructed them incorrectly.

AUC involves loss of information. Two ROC curves crossing each other can have the same AUC but each will have a range of thresholds at which it's better. Clinicians and patients interpret sensitivity and specificity but don't find AUC useful. They're not interested in performance across all thresholds. In ML, cost curves have been proposed as an alternative. Another alternative is H-measure.

AUC ignores the misclassification costs. A new test may be deemed worthless by using AUC alone. AUC also ignores prevalence but it's known that prevalence affects test results. While sensitivity and specificity are also independent of prevalence, prevalence can be considered during interpretation of the ROC curve.

Jorge M. Lobo et al. give many other reasons why AUC is not a suitable measure.

• What software packages are available for ROC analysis?

In R language, we can use the pROC package. Once we obtain the actual and predicted values, we can obtain the AUC along with confidence interval using the function ci.auc(). On GitHub, sachsmc/plotROC is an open source package for easily plotting ROC curves. It uses ggplot2, to which it adds handy functions for plotting: geom_roc, geom_rocci and style_roc.

In Python, a webpage on Scikit-learn gives code examples showing how to plot ROC curves and compute AUC for both binary and multiclass problems. It makes use of functions roc_curve and auc that are part of sklearn.metrics package.

## Milestones

1940

The idea of ROC starts in the 1940s with the use of radar during World War II. The task is to identify enemy aircraft while avoiding false detection of benign objects. ROC provides a suitable threshold for radar receiver operators. This also explains the origin of the term Receiver Operating Characteristic (ROC).

1950

In the 1950s, psychologists start using ROC when studying the relationship between psychological experience and physical stimuli.

1953

Peterson and Birdsall explain the ROC Curve in detail in the context of signal detection theory. They plot probability of signal detection versus probability of false alarm. Curve-1 represents optimum operation, curve-3 sets the lower limit, and curve-2 is by guessing. Curve-1 is produced by varying the operating level or threshold β, which is also the slope of the curve at that point. The value on the y-intercept is the one that needs to be maximized.

1960

L.B. Lusted applies ROC methodology to compare different studies of chest film interpretations for detection of pulmonary tuberculosis. This is the first application of ROC to radiology. It subsequently inspires the use of ROC in many diagnostic imaging systems. Lusted himself publishes Decision-making studies in patient management in 1971.

1968

Dorfman and Alf develop a method of curve fitting and use software to automate ROC analysis. A maximum likelihood approach under binomial assumption is developed. Many other programs written in FORTRAN are developed later: ROCFIT, CORROC, ROCPWR, and LABROC.

1969

The concept of Free-Response Receiver Operating Characteristic (FROC) Curve is introduced in auditory domain. In free-response analysis, in addition to detection, we also need to point out the location. The term "free-response" was coined in 1961. In 1978, FROC is applied for the first time in imaging. FROC can help where ROC can fail. For example, ROC can show location-level false positive and false negative that could "cancel" each other. This gives an image-level true positive: image shows cancer but wrong location is reported.

1982

Although Area Under the ROC Curve (AUC) was previously used, Hanley and McNeil develop analytical techniques to bring out its statistical properties. They estimate the standard error for different underlying distributions and sample sizes.

1989

As one of the earliest application of ROC Curve to machine learning, K.A. Spackman uses it evaluate and compare ML algorithms.

1997

Andrew Bradley notes that ROC curve is useful for visualizing a classifier's performance but not suitable for comparing multiple classification methods. A single performance measure is more desirable. He discusses how AUC can be used as a measure for comparing machine learning algorithms. He explains why AUC is a better measure than overall accuracy.

2000

An article titled Better decisions through science appears in Scientific American. It brings ROC Curve to the attention of a wider audience. One example in this article talks about glaucoma diagnosis using eye fluid pressure. It defines the basic terms and shows hypothetical distribution curves, ROC Curves, and AUC. It states that AUC is a reflection of a test's accuracy.

2001

Hand and Till generalize the concept of AUC for multiclass problems. In 2007, Landgrebe and Duin approximate the problem via pairwise analysis.

Author
No. of Edits
No. of Chats
DevCoins
4
4
2203
3
5
379
1
0
163
1
0
59
2052
Words
3
Likes
14K
Hits

## Cite As

Devopedia. 2019. "ROC Curve." Version 9, August 22. Accessed 2024-06-25. https://devopedia.org/roc-curve
Contributed by
4 authors

Last updated on
2019-08-22 13:48:41
• Site Map