#Feature Discretization
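Discretizing a real-valued numerical feature (binning) can improve predictive accuracy when the dataset is small. Several binning strategies exist, such as uniform, quantile-based, and k-means-based binning. Whichever is used, the resulting bin labels are converted back to numerical values (for example, by one-hot encoding) for the ML algorithm.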
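A minimal sketch of one such strategy, quantile binning followed by one-hot encoding, using NumPy. The function name `quantile_bin_one_hot`, the `ages` data, and the choice of four bins are assumptions made for illustration.

```python
import numpy as np

def quantile_bin_one_hot(values, n_bins=4):
    """Discretize a 1-D numerical feature into quantile bins, then one-hot encode."""
    values = np.asarray(values, dtype=float)
    # Interior bin edges at the 1/n_bins, 2/n_bins, ... quantiles.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # np.digitize maps each value to a bin index in [0, n_bins - 1].
    bin_ids = np.digitize(values, edges)
    # One binary column per bin: the numerical form passed to the model.
    one_hot = np.eye(n_bins, dtype=int)[bin_ids]
    return bin_ids, edges, one_hot

ages = [18, 22, 25, 31, 38, 45, 52, 67]  # hypothetical feature values
bin_ids, edges, one_hot = quantile_bin_one_hot(ages, n_bins=4)
```

In practice the bin edges would be computed on the training set only and then reused unchanged for validation and test data.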
#Synthesizing features from relational data
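When the data is spread across several relational tables, reduce the related rows to single features per example by computing statistics (e.g. count, mean, standard deviation) over the data in the linked tables.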
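As a rough illustration with pandas, assume a hypothetical `users` table (one row per example) and an `orders` table (many rows per user). The table names, column names, and chosen statistics are all made up for this sketch.

```python
import pandas as pd

# Hypothetical tables: one row per user, many order rows per user.
users = pd.DataFrame({"user_id": [1, 2, 3]})
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# Collapse the many-side table to one row of statistics per user.
order_stats = (
    orders.groupby("user_id")["amount"]
    .agg(order_count="count", amount_mean="mean", amount_std="std")
    .reset_index()
)

# A left join keeps users without orders; their statistics come out as NaN.
features = users.merge(order_stats, on="user_id", how="left")
# For a user with no orders, a count of 0 is more meaningful than a missing value.
features["order_count"] = features["order_count"].fillna(0).astype(int)
```

The remaining NaN values (mean and standard deviation for users with no orders) still need an imputation strategy before most models can use them.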
#Synthesizing features from the data
Clustering is commonly used to synthesize one or more additional features, e.g. with k-means clustering:
- Choose k
- Apply clustering to the training data
- Add k additional features to your feature vectors. Each feature is binary: feature j is 1 if the vector belongs to cluster j, and 0 otherwise.
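The three steps above can be sketched with scikit-learn's `KMeans`. The toy data, `k = 2`, and the helper name `add_cluster_features` are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy training data: two well-separated blobs in 2-D (illustrative only).
X_train = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 2)),
])

k = 2  # Step 1: choose k.
# Step 2: fit the clustering on the training data only.
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)

def add_cluster_features(X, model):
    """Step 3: append one binary membership column per cluster to X."""
    labels = model.predict(X)
    membership = np.eye(model.n_clusters)[labels]  # one-hot cluster membership
    return np.hstack([X, membership])

X_train_aug = add_cluster_features(X_train, kmeans)
```

The same fitted `kmeans` object would be applied to validation and test vectors, so that cluster assignments are consistent with those seen during training.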
#Synthesizing features from other features
- (Deep) neural networks can learn complex features from simple ones, but they need large amounts of data.
- Otherwise, apply a simple transformation to one or a pair of existing features.
  - Single numerical feature:
    - discretization
    - squaring
    - mean/standard deviation of the feature over the example's k nearest neighbors (kNN), found by using Euclidean distance or cosine similarity
  - Pairs of numerical features: simple arithmetic operators (+, -, ×, ÷)
    - e.g. produce all the possible transformations for all the pairs and then select those that increase the quality of the model.
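A small NumPy sketch of the simple transformations listed above: squaring, the kNN mean and standard deviation of a feature, and arithmetic combinations of feature pairs. The helper names, the `eps` guard, and the toy matrix `X` are assumptions. Selecting the candidates that actually improve model quality is left out here.

```python
import numpy as np
from itertools import combinations

def knn_mean_std(X, j, k=3):
    """Mean and std of feature j over each example's k nearest neighbors (Euclidean)."""
    # Pairwise Euclidean distances between all examples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # an example is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    neighbor_vals = X[neighbors, j]  # shape (n_examples, k)
    return neighbor_vals.mean(axis=1), neighbor_vals.std(axis=1)

def pairwise_candidates(X, eps=1e-9):
    """Candidate features from +, -, *, / applied to every pair of columns."""
    out = {}
    for a, b in combinations(range(X.shape[1]), 2):
        out[f"x{a}+x{b}"] = X[:, a] + X[:, b]
        out[f"x{a}-x{b}"] = X[:, a] - X[:, b]
        out[f"x{a}*x{b}"] = X[:, a] * X[:, b]
        # Crude guard against division by zero (an assumption, not a general fix).
        out[f"x{a}/x{b}"] = X[:, a] / (X[:, b] + eps)
    return out

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 20.0]])  # toy data
squared = X[:, 0] ** 2
nn_mean, nn_std = knn_mean_std(X, j=0, k=2)
candidates = pairwise_candidates(X)
```

Each candidate column could then be added to the feature matrix one at a time and kept only if a validation metric improves. Note that subtraction and division are generated in one operand order only; the reverse order would double the number of candidates.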