Precision-Recall Tradeoff

We have to choose a balance between precision and recall because we can't maximize both at once: increasing one tends to decrease the other.

  • High precision: the model is conservative in predicting positives → more false negatives, since it misses some true positives (lower recall).
  • High recall: the model is liberal in predicting positives → more false positives, since it misclassifies some negatives as positives (lower precision).
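
For reference, both metrics are defined from confusion-matrix counts, where TP, FP, and FN are true positives, false positives, and false negatives:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```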

#Tuning the Tradeoff

  • We can assign a higher weight to the examples of a specific class, e.g. scikit-learn's SVC accepts class weights through its class_weight parameter (see the sketch after this list).
  • By tuning hyperparameters to maximize either precision or recall on the validation set.
  • By varying the decision threshold for algorithms that return prediction scores.
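
A minimal sketch of the class-weight approach, assuming a synthetic dataset from make_classification; the 5x weight on the positive class is an arbitrary illustrative choice:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

# Synthetic, imbalanced binary dataset (roughly 10% positives).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weighting the positive class 5x makes false negatives costlier during
# training, which typically raises recall at the cost of precision.
clf = SVC(class_weight={0: 1, 1: 5}).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
```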

#Example

Suppose we have a logistic regression model or a decision tree. We can increase precision at the cost of lower recall: we decide that the prediction is positive only if the score returned by the model is higher than 0.9 (instead of the default 0.5).
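
A sketch of this thresholding idea with scikit-learn's LogisticRegression, again on synthetic data (the 0.9 threshold mirrors the example above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=2000, random_state=0)
model = LogisticRegression().fit(X, y)

# predict_proba returns per-class probabilities; column 1 is P(positive).
scores = model.predict_proba(X)[:, 1]

# Raising the threshold from the default 0.5 to 0.9 trades recall for precision.
for threshold in (0.5, 0.9):
    y_pred = (scores >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y, y_pred):.2f}, "
          f"recall={recall_score(y, y_pred):.2f}")
```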

#Single Metric

To compare two or more models, we want a single number that summarizes the precision-recall tradeoff.

  • Optimizing and satisficing technique
    • Depending on the problem, choose either precision or recall as the optimizing metric and fix the other to an acceptable threshold (satisficing), e.g. spam classification: optimize precision while keeping the false-negative rate under 2%.
    • Generalization: set satisficing thresholds on n−1 metrics and optimize the nth.
  • F-score (see the formula after this list).
  • A simple or weighted average of the metrics.
  • Invent our own domain-specific metric.
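
The F-score combines both metrics into one number. The general F_β form weights recall β times as much as precision (β = 1 gives F1, the harmonic mean of the two):

```latex
F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2 P + R}
```

scikit-learn exposes these as f1_score and fbeta_score in sklearn.metrics.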