Used to evaluate the performance of the ML algorithm compared to simpler/naive/trivial solutions.
A model or algorithm that provides a reference point for comparison:
- rule-based algorithm
- heuristic algorithm
- a simple statistic about training data
- another ml algorithm
- etc
#Random Prediction
Throwing a dice but be careful and look at the probability distributions
- classification: randomly picking a class from all the classes of the problem.
- regression: randomly selecting a target from all unique target values in the training data.
#Zero Rule
- classification: predict the most common class in the Training Dataset, independent of the input value.E.g., 100 records, 80 non-span, 20 spam. Baseline: 80% accuracy.
- regression: predict the sample average of the target values in the training data.
#Machine Learning
- Text classification, machine translation: SVM with a linear kernel.
- General numerical dataset: Linear Regression or Logistic Regression or KNN (k=5)
- Image classification: SVM with linear kernel, convolutional [[neural network]].
#Rule-based or groups of humans
- e.g. doctors.