Area Under the Curve (AUC) in Plain English

AUC can be intuitively understood as: “the probability that the classifier will assign a higher score to a randomly chosen positive example than to a randomly chosen negative example.” – Wikipedia

Yeah, ok, nice, but what does that really mean? The intuition above is actually a bit tricky to parse, so let’s unpack it.

Suppose we have the following binary classification scenario: a dataset X whose instances are labeled either 0 or 1. You split the dataset into two parts: 1- a training set, 2- a test set. Next you train a classifier on the training set.
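Here is a minimal sketch of that setup in Python (scikit-learn and the synthetic dataset are my assumptions for illustration only; any probabilistic classifier works the same way):

```python
# A minimal sketch of the setup: a binary dataset, a train/test split,
# and a classifier trained on the training part.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Dataset X with instances labeled either 0 or 1.
X, Y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 1- training set, 2- test set.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Train a classifier on the training set.
clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
```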

Now we want to test the performance of the classifier. Normally your test set has the following form: X_{test} is the matrix of instances and Y_{test} is a vector that says whether each instance is 0 or 1 . You feed X_{test} into the classifier, let it classify each instance, and ask it for a confidence score for each class. For example, if instance number 15 in X_{test} has the true label 1 , then your classifier will probably give confidence scores like 0.97 for class 1 and 0.03 for class 0 . So after feeding X_{test} into the classifier, each instance ends up with two confidence scores. Let’s call Probs1 and Probs0 the vectors that hold these confidence scores for all the instances, so Probs1 holds the scores of all instances for class 1 and Probs0 for class 0 .
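Continuing the sketch from above (still assuming scikit-learn), asking the classifier for per-class confidence scores looks like this:

```python
# Each row of predict_proba is one instance; the two columns are the
# confidence scores for class 0 and class 1 respectively.
probs = clf.predict_proba(X_test)

Probs0 = probs[:, 0]  # score of every instance for class 0
Probs1 = probs[:, 1]  # score of every instance for class 1

# The scores of instance number 15, next to its true label.
print(Y_test[15], Probs1[15], Probs0[15])
```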

Let’s recall what we have up to this point: X_{test} , Y_{test} , Probs1 , Probs0 . When you compute the AUC you only use Y_{test} and Probs1 (why? later, my friend). Again, Y_{test} holds the true labels and Probs1 holds the confidence, or probability, of each instance being of class 1 (the positive class). Assume for a moment that you have a perfect classifier that classifies everything correctly. Then notice that in Probs1 all instances whose true label is 1 should have really high scores, while those whose true label is 0 should have really low scores.
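In practice you would just hand Y_test and Probs1 to a library routine; with scikit-learn (again, just my choice for the sketch) that looks like:

```python
from sklearn.metrics import roc_auc_score

# Only the true labels and the positive-class scores are needed.
auc = roc_auc_score(Y_test, Probs1)
print(auc)
```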

Now suppose you pick two instances from the test set X_{test} at random: x_1 with true label 1 and x_2 with true label 0 . Then you look up their scores in Probs1 . You might see that x_1 has 0.94 and x_2 has 0.1 . Notice that x_1 has a higher score than x_2 , because x_1 is truly positive and x_2 is truly negative. Now suppose you keep picking positive and negative instances at random from X_{test} and each time check whether the truly positive instance got a higher score than the truly negative one. Since our classifier is perfect, it will always give higher scores to truly positive instances than to truly negative instances, so its AUC will be 1.0 .
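You can simulate exactly this random-pair experiment on the sketch from above; the fraction of wins approaches the AUC as the number of pairs grows (NumPy is assumed here just for the sampling):

```python
import numpy as np

rng = np.random.default_rng(0)

pos_scores = Probs1[Y_test == 1]  # scores of the truly positive instances
neg_scores = Probs1[Y_test == 0]  # scores of the truly negative instances

# Keep picking one positive and one negative instance at random and
# check whether the positive one got the higher score.
n_pairs = 100_000
wins = 0
for _ in range(n_pairs):
    if rng.choice(pos_scores) > rng.choice(neg_scores):
        wins += 1

print(wins / n_pairs)  # close to roc_auc_score(Y_test, Probs1)
```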

Now consider a classifier that is not perfect: some instances that are truly positive will have low scores in Probs1 , and some instances that are truly negative will have high scores in Probs1 . In this case some of our comparisons will miss: given a randomly picked positive and negative pair from X_{test} , the positive instance will sometimes have a lower score than the negative instance. The more misses, the lower the AUC of the imperfect classifier drops below 1.0 ; a classifier that scores at random ends up around 0.5 .

Actually, suppose you made n comparisons in total, n' of them are correct (the positive instance got the strictly higher score) and n'' of them are ties (both instances got exactly the same score); the rest are misses and count for nothing. Then you can compute the AUC using the following formula:

AUC = \frac{n' + 0.5\, n''}{n}
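Instead of sampling, you can run every possible positive-versus-negative comparison and apply the formula directly; this little sketch (NumPy again, my assumption) gives the same number as roc_auc_score:

```python
import numpy as np

pos_scores = Probs1[Y_test == 1]
neg_scores = Probs1[Y_test == 0]

# Compare every positive instance against every negative instance.
diff = pos_scores[:, None] - neg_scores[None, :]

n = diff.size                 # total number of comparisons
n_correct = np.sum(diff > 0)  # n'  : positive scored strictly higher
n_ties = np.sum(diff == 0)    # n'' : both got exactly the same score

auc = (n_correct + 0.5 * n_ties) / n
print(auc)  # matches sklearn.metrics.roc_auc_score(Y_test, Probs1)
```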

Now go back and read the intuition at the beginning of the post and hopefully it will fully make sense. (Always keep in mind that we only use the probability vector for the positive class to compute the AUC.)

p.s. this post was written very quickly without editing, so I hope it is not confusing or has serious mistakes.
