Distribution shift occurs when the distribution of the training data and the distribution of the test (production) data are not the same.
#Causes
- Not enough data, and/or not enough similarity between the training dataset and production data.
- Data in production changes over time
- New or outdated features and/or labels
- Consumer trends
- Weather patterns
- e.g. lots of labeled images from the web, but our goal is to train a classifier on Instagram photos.
#Types of Distribution Shift
- Covariate shift
- Shift in the distribution of the input features, while the relationship $p(Y \mid X)$ stays the same
- Prior probability shift (label shift)
- Shift in the distribution of the target labels, while $p(X \mid Y)$ stays the same
- Concept drift
- Shift in the relationship between the features and the label.
We can analytically detect that a shift has occurred, but understanding which type we are experiencing requires further testing.
#Detecting Distribution Shift
#Visual Inspection
- Histograms, density plots, or boxplots of features from training and validation/test sets
- Principal Component Analysis or t-SNE projections of both datasets.
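A minimal sketch of both visual checks, assuming the training and production features are available as NumPy arrays; the variable names and the synthetic data here are illustrative stand-ins, not a definitive recipe:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 5))   # stand-in training features
X_test = rng.normal(0.5, 1.2, size=(1000, 5))    # stand-in production features (shifted)

# Overlaid histograms of one feature from each dataset.
plt.hist(X_train[:, 0], bins=40, alpha=0.5, density=True, label="train")
plt.hist(X_test[:, 0], bins=40, alpha=0.5, density=True, label="test")
plt.legend()
plt.title("Feature 0: train vs. test")
plt.show()

# 2-D PCA projection fitted on the combined data; clearly separated clusters hint at a shift.
proj = PCA(n_components=2).fit_transform(np.vstack([X_train, X_test]))
plt.scatter(proj[:1000, 0], proj[:1000, 1], s=5, alpha=0.4, label="train")
plt.scatter(proj[1000:, 0], proj[1000:, 1], s=5, alpha=0.4, label="test")
plt.legend()
plt.title("PCA projection of train and test features")
plt.show()
```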
#Statistical distance metrics
Compare the distributions of individual features between the two datasets, e.g. with the Jensen-Shannon divergence (JSD); other statistical distances and two-sample tests are available for the more statistically savvy engineers.
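A rough sketch of the JSD check on a single feature; the feature arrays and the bin count are illustrative assumptions, and note that SciPy's `jensenshannon` returns the Jensen-Shannon *distance* (the square root of the divergence):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=5000)   # stand-in for one training feature
x_test = rng.normal(0.4, 1.0, size=5000)    # stand-in for the same feature in production

# Bin both samples on a shared grid so the histograms are comparable.
bins = np.histogram_bin_edges(np.concatenate([x_train, x_test]), bins=50)
p, _ = np.histogram(x_train, bins=bins)
q, _ = np.histogram(x_test, bins=bins)

# jensenshannon normalizes the histograms internally; with base=2 the result
# lies in [0, 1]: 0 = identical distributions, 1 = maximally different.
js = jensenshannon(p, q, base=2)
print(f"Jensen-Shannon distance: {js:.3f}")
```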
#Algorithmic ML
- Train a classifier to discriminate between datasets.
- Label all training data as 0 and validation data as 1.
- Train a classifier (e.g. logistic regression) to distinguish them.
- If the classifier performs much better than random guessing, a distribution shift exists.
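A minimal sketch of this classifier two-sample test (sometimes called adversarial validation); the synthetic data and variable names are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(2000, 10))   # stand-in training features
X_test = rng.normal(0.3, 1.0, size=(2000, 10))    # stand-in validation/production features

# Label the origin of each row: 0 = training data, 1 = validation data.
X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

# Cross-validated AUC of the discriminator: ~0.5 means the two datasets are
# indistinguishable; values well above 0.5 point to a distribution shift.
auc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc").mean()
print(f"Domain-classifier AUC: {auc:.3f}")
```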
#Identifying the Type of Distribution Shift
#Covariate Shift
- Train a discriminator classifier on the features, as described previously; performance well above chance indicates the feature distribution has shifted.
#Label Shift
- We assume that the way features are generated from labels doesn't change, i.e. $p(X \mid Y)$ stays fixed.
- Confusion matrix: $\hat{p}_{\text{test}}(Y) = C^{-1} \cdot p_{\text{test}}(\hat{Y})$
- where $C$ is the classifier's confusion matrix on the training set (column-normalized so that entry $(i, j)$ is $p(\hat{Y}=i \mid Y=j)$), and $\hat{Y}$ are the predicted classes on the test data (see the code sketch after this list)
- Black Box Shift Detection (BBSD)
- Train a classifier on the training set.
- Use this classifier to generate soft predictions on both training and test data.
- Compare the label distributions (e.g. histograms of predicted class probabilities) and check for significant differences that suggest label shift.
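A sketch of both ideas under the label-shift assumption. The synthetic data, the choice of logistic regression, and all variable names are illustrative, and the confusion matrix is column-normalized so the formula above applies directly:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
k = 3
y_all = rng.choice(k, size=6000, p=[0.5, 0.3, 0.2])        # training label distribution
X_all = rng.normal(0, 1, size=(6000, 5)) + y_all[:, None]  # p(X|Y) fixed by construction
X_tr, X_hold, y_tr, y_hold = train_test_split(X_all, y_all, test_size=0.3, random_state=0)

# Simulated production data: same p(X|Y), different p(Y)  ->  label shift.
y_test = rng.choice(k, size=3000, p=[0.2, 0.3, 0.5])
X_test = rng.normal(0, 1, size=(3000, 5)) + y_test[:, None]

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Confusion matrix on held-out training data, column-normalized so that
# C[i, j] = p(Y_hat = i | Y = j).
cm = confusion_matrix(y_hold, clf.predict(X_hold), labels=list(range(k))).T
C = cm / cm.sum(axis=0, keepdims=True)

# Distribution of hard predictions on the (unlabeled) test data, then the
# estimate p_test(Y) = C^{-1} . p_test(Y_hat).
p_test_pred = np.bincount(clf.predict(X_test), minlength=k) / len(X_test)
p_test_y = np.linalg.solve(C, p_test_pred)
print("training p(Y):      ", (np.bincount(y_tr, minlength=k) / len(y_tr)).round(3))
print("estimated test p(Y):", p_test_y.round(3))

# BBSD-style check: compare soft predictions on training vs. test data,
# e.g. with a two-sample KS test on the class-0 probabilities.
res = ks_2samp(clf.predict_proba(X_tr)[:, 0], clf.predict_proba(X_test)[:, 0])
print(f"KS test on class-0 scores: statistic={res.statistic:.3f}, p-value={res.pvalue:.3g}")
```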
#Concept Drift
- Hardest to detect, as it often requires access to true labels Y in the test data.
- Measure performance (e.g. accuracy, loss) of the original model on new data
- If performance drops but covariate and label shifts are ruled out, we suspect concept drift.
- Check if misclassification patterns change in unusual ways.
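A minimal monitoring sketch, assuming small labeled batches from production become available over time; the synthetic data, the alert threshold, and the names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(2000, 5))
y_train = (X_train[:, 0] > 0).astype(int)               # original feature-label relationship
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline = accuracy_score(y_train, model.predict(X_train))

# New labeled batches where the concept gradually changes: the label starts
# depending on feature 1 instead of feature 0 for a growing share of samples.
for t in range(4):
    X_new = rng.normal(0, 1, size=(500, 5))
    drifted = rng.random(500) < 0.2 * t                 # growing share of the "new concept"
    y_new = np.where(drifted, X_new[:, 1] > 0, X_new[:, 0] > 0).astype(int)
    acc = accuracy_score(y_new, model.predict(X_new))
    print(f"batch {t}: accuracy={acc:.3f} (baseline {baseline:.3f})")
    if baseline - acc > 0.05:                           # alert threshold is an assumption
        print("  -> drop detected; if covariate and label shift are ruled out, suspect concept drift")
```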