# Why you can’t use all features
Not all features are equally relevant for a given problem. Features that appear in only a few examples:
- Increase dimensionality: the ratio of columns (features) to rows (examples) grows
- Increase sparsity: a lot of zeros in each example
- Increase computational cost: storage, computing time, dedicated technologies.
- Increase overfitting risk: the model fits noise instead of true patterns
- Reduce model interpretability: harder to analyze and understand.
- Distance metric degradation: in high dimensions, data points tend to be equidistant, reducing the effectiveness of distance-based algorithms (e.g. kNN, clustering)
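The last point is easy to verify empirically: sample random points in increasingly many dimensions and compare the nearest and farthest distances from a reference point. A minimal sketch (the dimensionalities and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    # 500 random points in a d-dimensional unit hypercube
    X = rng.random((500, d))
    # Euclidean distances from the first point to all the others
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    # As d grows, the gap between nearest and farthest shrinks relative to
    # the distances themselves, so "nearest neighbor" loses meaning.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:4d}  relative contrast={contrast:.3f}")
```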
# Main Techniques
- Cutting the Long Tail
  - Drop the features in the tail of the distribution of examples over features, i.e. features that occur in only a handful of examples (see the first sketch after this list).
- Boruta: an all-relevant feature selection method that compares each real feature against randomly shuffled "shadow" copies using random forest importances (see the second sketch after this list).
- L1-Regularization (Lasso Regression): the L1 penalty shrinks the coefficients of uninformative features to exactly zero, effectively removing them (see the third sketch after this list).
- Task-specific feature selection, e.g. for text (see the last sketch after this list):
  - Remove stop words
  - Replace uncommon words with a single label, such as RARE_WORDS
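A minimal sketch of cutting the long tail, assuming a (possibly sparse) example-by-feature matrix `X` and a hypothetical `min_count` threshold:

```python
import numpy as np
from scipy.sparse import random as sparse_random

def cut_long_tail(X, min_count=5):
    """Drop features that are non-zero in fewer than `min_count` examples."""
    # Document frequency: in how many examples each feature appears
    doc_freq = np.asarray((X != 0).sum(axis=0)).ravel()
    keep = doc_freq >= min_count      # boolean mask of features to retain
    return X[:, keep], keep

# Tiny demo on a random sparse matrix (1000 examples x 5000 features, 0.1% dense)
X = sparse_random(1000, 5000, density=0.001, format="csr", random_state=0)
X_reduced, kept = cut_long_tail(X, min_count=3)
print(X.shape, "->", X_reduced.shape)
```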
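For Boruta, a minimal sketch assuming the third-party `boruta` package (BorutaPy) together with scikit-learn; the dataset here is synthetic and the hyperparameters are placeholders:

```python
from boruta import BorutaPy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 informative features buried among 50
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=10, random_state=0)

forest = RandomForestClassifier(n_jobs=-1, max_depth=5)
selector = BorutaPy(forest, n_estimators='auto', random_state=42)
selector.fit(X, y)                      # BorutaPy expects NumPy arrays, not DataFrames

X_selected = X[:, selector.support_]    # keep only features confirmed as relevant
print(X.shape, "->", X_selected.shape)
```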
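For L1 regularization, a minimal sketch using scikit-learn's `Lasso` inside `SelectFromModel`; the `alpha` value and the synthetic dataset are placeholders:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only 5 of the 30 features actually drive the target
X, y = make_regression(n_samples=300, n_features=30,
                       n_informative=5, noise=1.0, random_state=0)

# The L1 penalty drives coefficients of uninformative features to exactly zero;
# alpha controls the penalty strength. Scaling matters because the penalty is
# scale-sensitive.
selector = make_pipeline(StandardScaler(), SelectFromModel(Lasso(alpha=0.1)))
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)
```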
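For the text-specific techniques, a minimal sketch that drops stop words and collapses uncommon words into the single RARE_WORDS label; the stop-word list and frequency threshold are placeholders:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}   # placeholder list
MIN_COUNT = 2                                                    # placeholder threshold

def preprocess(documents):
    # Tokenize and drop stop words
    tokenized = [[w for w in doc.lower().split() if w not in STOP_WORDS]
                 for doc in documents]
    # Corpus-wide word counts
    counts = Counter(w for doc in tokenized for w in doc)
    # Collapse uncommon words into a single shared label
    return [[w if counts[w] >= MIN_COUNT else "RARE_WORDS" for w in doc]
            for doc in tokenized]

docs = ["the cat sat on the mat", "the cat chased a rare newt"]
print(preprocess(docs))
```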