- Explainability: Do we need to understand/explain how/why an algorithm made a prediction? Opaque: DNNs, ensemble models; transparent: kNN, linear regression, decision tree learning.
- In-memory vs. out-of-memory: Can we load the entire dataset into RAM? If not, use an incremental (out-of-core) learning algorithm, e.g., Naive Bayes or Neural Network (NN) training algorithms.
- Number of records and features: What is the maximum dataset size and number of features the algorithm can handle? E.g., SVMs handle modest sizes; neural networks and random forests scale to millions of records.
- Nonlinearity of the data: Are the data linearly separable? Yes: SVM with a linear kernel, linear and logistic regression; no: DNNs or ensembles.
- Training speed: How much time do we have to (re)train the algorithm? Consider the retraining interval (e.g., every hour) and opportunities for parallelism (e.g., random forests) or GPUs (e.g., NNs).
- Prediction speed: How fast does a prediction/inference need to be? Consider throughput requirements. Shallow models are typically faster at inference than deeper or more complex ones (e.g., DNNs, kNN, large ensembles).
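The prediction-speed point above can be made concrete with a minimal stdlib sketch (illustrative, not a rigorous benchmark; the data shapes are arbitrary): a linear model does O(n_features) work per query, while kNN must compare the query against every training point.

```python
import random
import time

n_train, n_features = 5000, 20
X = [[random.random() for _ in range(n_features)] for _ in range(n_train)]
w = [random.random() for _ in range(n_features)]
query = [random.random() for _ in range(n_features)]

def linear_predict(x):
    # One dot product: O(n_features) work per query.
    return sum(wi * xi for wi, xi in zip(w, x)) > 0.5

def knn_predict(x, k=5):
    # Squared distance to every training point: O(n_train * n_features)
    # per query. Label voting is omitted for brevity; we just return the
    # k smallest distances.
    dists = sorted(sum((a - b) ** 2 for a, b in zip(row, x)) for row in X)
    return dists[:k]

t0 = time.perf_counter(); linear_predict(query); t_lin = time.perf_counter() - t0
t0 = time.perf_counter(); knn_predict(query);    t_knn = time.perf_counter() - t0
print(f"linear: {t_lin * 1e6:.1f} us/query, kNN: {t_knn * 1e6:.1f} us/query")
```

On this data shape the kNN query dominates, which is why instance-based methods can miss tight latency budgets even though they "train" instantly.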
# Shortlisting Algorithm Candidates (Spot Checking)
- Select algorithms based on different principles (sometimes called orthogonal), such as instance-based algorithms, kernel-based, shallow learning, deep learning, or ensembles.
- Try each algorithm with 3-5 different values of its most sensitive hyperparameters, e.g., the number of neighbors k in kNN, the penalty C in SVM, or the decision threshold in logistic regression.
- Use the same training/validation split for all experiments.
- If the learning algorithm is not deterministic (e.g., NN and random forests), run several experiments and then average the results.
- Once the project is over, note which algorithms performed the best. Use this information when working on a similar problem in the future.
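The spot-checking steps above can be sketched as follows, assuming scikit-learn is available; the candidate families, hyperparameter values, and synthetic dataset are illustrative placeholders, not a recommended configuration.

```python
# Spot-checking sketch: candidates from different families, each tried with
# a few values of its most sensitive hyperparameter, all evaluated on the
# same train/validation split so the scores are comparable.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# One entry per family: (name, constructor, hyperparameter name, 3-5 values).
candidates = [
    ("kNN",    KNeighborsClassifier, "n_neighbors", [1, 3, 5, 9, 15]),
    ("SVM",    SVC,                  "C",           [0.01, 0.1, 1, 10, 100]),
    ("LogReg", LogisticRegression,   "C",           [0.01, 0.1, 1, 10, 100]),
]

results = {}
for name, Model, param, values in candidates:
    # Same split for every experiment; keep the best score per family.
    scores = [Model(**{param: v}).fit(X_tr, y_tr).score(X_val, y_val)
              for v in values]
    results[name] = max(scores)

for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: best validation accuracy = {score:.3f}")
```

For non-deterministic learners (e.g., NNs, random forests) the inner loop would additionally repeat each configuration over several random seeds and average the scores, as noted above.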