Accuracy | Data Stories

We have already talked about accuracy in an earlier post. I just want to add a few more things after finishing a course. This post is an extension of that one, with some practical considerations.

We are claiming that accuracy is not always a good measure. When you are building an automated machine learning system, you must be able to trust it.

Case Study

You want to show positive reviews on your website.
Say 90% of the reviews in your dataset are negative.
A classifier can achieve 90% accuracy by predicting all of them as negative.
But what you are interested in is finding the remaining 10% and displaying them on your website.
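
The arithmetic of this case study can be checked with a minimal sketch. The dataset below is made up for illustration (100 reviews, 10 of them positive), and the helper names `accuracy` and `positives_found` are my own, not from any library.

```python
# Hypothetical dataset: 1 = positive review, 0 = negative review (90% negative).
labels = [0] * 90 + [1] * 10

# A "classifier" that predicts negative for every review.
predictions = [0] * len(labels)

def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def positives_found(y_true, y_pred):
    """Number of truly positive reviews that were also predicted positive."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

acc = accuracy(labels, predictions)
found = positives_found(labels, predictions)
print("accuracy:", acc)
print("positive reviews found:", found)
```

The accuracy looks respectable, yet the classifier finds none of the reviews we actually want to display.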

Precision = Of the reviews I showed, how many are really positive? (Did I show something negative?)

Recall = How good am I at finding positive reviews?
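
The two questions above correspond to the usual definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). Here is a small sketch that computes both from true and predicted labels. The function name `precision_recall` and the toy labels are assumptions for illustration only.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1).

    Either value is None when its denominator is zero, for example
    precision when nothing was predicted positive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall

# Toy example: 3 positive reviews, 7 negative ones.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# We show three reviews: two are really positive, one is negative.
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print("precision:", precision)
print("recall:", recall)
```

Here one of the three displayed reviews is negative (hurting precision) and one positive review was missed (hurting recall).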

Analogy with Optimist and Pessimist

The optimist labels every/most reviews as positive
- Very good recall, but lower precision
The pessimist labels every/most reviews as negative
- Bad recall, good precision

Trade-off

The trade-off comes while scoring, not while training
We can assign labels based on predicted probabilities
A decision tree gives a probability from the number of positive and negative samples at the leaf node
Logistic regression of course gives a probability
We can change the threshold to trade off between precision and recall
Positive only when prob > a threshold close to 1 => Pessimist
Positive when prob > a threshold close to 0 => Optimist
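
To make the threshold trade-off concrete, here is a rough sketch with made-up probabilities. The scores, labels, thresholds (0.9 and 0.05), and the helper `precision_recall_at` are all assumptions for illustration. Any model that outputs probabilities could supply the scores.

```python
# Hypothetical predicted probabilities of "positive" and the true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

def precision_recall_at(threshold, y_true, probs):
    """Label a review positive when its probability exceeds the threshold,
    then return (precision, recall); None when a denominator is zero."""
    y_pred = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall

# Pessimist: only very confident predictions count as positive.
pessimist = precision_recall_at(0.9, labels, scores)
# Optimist: almost everything counts as positive.
optimist = precision_recall_at(0.05, labels, scores)

print("pessimist (precision, recall):", pessimist)
print("optimist  (precision, recall):", optimist)
```

The model is the same in both cases. Only the threshold applied at scoring time changes, which is why the trade-off does not require retraining.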

A single number is not always useful

I am not a great fan of single numbers like F1 score and AUC
You cannot always choose a classifier just by AUC, because the ROC curves might intersect
- An intersection means that one classifier is better over some range of operating points and the other is better elsewhere
- But if they don’t intersect we choose the one with the higher AUC
From a business perspective we should be clear whether we want more precision or more recall
Another practical metric they talked about was precision at k
- Say I want to display 5 reviews on my website
- What is the precision among the 5 reviews I have chosen?
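
Precision at k can be sketched in a few lines: rank the reviews by predicted probability, keep the top k, and check how many of them are really positive. The function name `precision_at_k` and the example scores are hypothetical.

```python
def precision_at_k(y_true, probs, k):
    """Precision among the k reviews with the highest predicted probability."""
    if k <= 0 or k > len(probs):
        raise ValueError("k must be between 1 and the number of reviews")
    # Sort (probability, label) pairs from most to least confident.
    ranked = sorted(zip(probs, y_true), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    return sum(label for _, label in top_k) / k

# Hypothetical scores and true labels (1 = positive review).
scores = [0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.20]
labels = [1, 1, 0, 1, 1, 0, 0]

p_at_5 = precision_at_k(labels, scores, k=5)
print("precision at 5:", p_at_5)
```

With these numbers, four of the five displayed reviews are positive, so one negative review would still appear on the website.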


On Classification Accuracy – 2