The decision tree algorithm is a supervised machine learning method used for both #ml/classification and #ml/regression tasks. It works by recursively splitting the dataset into subsets based on feature values, creating a tree-like structure of decisions. Each internal node represents a decision based on a feature, each branch represents the outcome of that decision, and each leaf node represents a final prediction (class label or continuous value).
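As a quick orientation before the details below, here is a minimal sketch of training and inspecting a decision tree with scikit-learn (assuming scikit-learn is installed; the dataset and hyperparameters are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small labeled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a shallow tree; internal nodes hold feature/threshold decisions,
# leaves hold the predicted class.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))   # accuracy on held-out data
print(export_text(clf))            # text rendering of the learned tree
```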
#How it works
- Input Data:
    - A dataset with labeled examples (for classification) or continuous target values (for regression).
    - Each example is represented as a set of features.
- Tree Construction:
    - The algorithm starts at the root node and recursively splits the data into subsets based on feature values.
    - The goal is to create splits that maximize the homogeneity (purity) of the resulting subsets.
- Splitting Criteria:
    - The choice of feature and split point is determined by a splitting criterion (see the sketch after this list):
        - For #ml/classification:
            - Gini Impurity or Information Gain (Entropy): Splits are chosen to maximize the purity of the resulting subsets.
        - For #ml/regression:
            - Variance Reduction: Splits are chosen to minimize the variance of the target variable in the resulting subsets.
- Stopping Conditions:
    - The recursion stops when one of the following conditions is met:
        - All samples in a node belong to the same class (for classification).
        - The target values in a node are sufficiently homogeneous (for regression).
        - A predefined maximum depth is reached.
        - Further splits do not improve the model significantly (based on a threshold).
- Leaf Nodes:
    - For #ml/classification: the leaf node predicts the majority class of the samples in that node.
    - For #ml/regression: the leaf node predicts the average (or median) target value of the samples in that node.
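The splitting criteria above can be made concrete with a small NumPy sketch (the helper names `gini_impurity`, `variance`, and `best_split` are illustrative, not part of any library): it scores candidate thresholds on a single numeric feature by the weighted impurity of the two child subsets and keeps the best one.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def variance(values):
    """Variance of continuous targets, used for regression splits."""
    return np.var(values)

def split_score(feature, target, threshold, criterion):
    """Weighted impurity of the two children produced by a threshold split."""
    left = target[feature <= threshold]
    right = target[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return np.inf  # degenerate split, ignore
    n = len(target)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)

def best_split(feature, target, criterion):
    """Try midpoints between sorted feature values and keep the best one."""
    values = np.unique(feature)
    thresholds = (values[:-1] + values[1:]) / 2.0
    scores = [split_score(feature, target, t, criterion) for t in thresholds]
    best = int(np.argmin(scores))
    return thresholds[best], scores[best]

# Toy example: one numeric feature, binary class labels.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y, gini_impurity))  # threshold ~6.5, weighted impurity 0.0
```

A full tree builder would repeat this search over all features at every node, recursing until one of the stopping conditions above is met.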
#Evaluation
#Advantages
- Easy to interpret and visualize (white-box model).
- Handles both numerical and categorical data.
- Requires little data preprocessing (e.g., no need for feature scaling).
- Relatively robust to outliers in the input features.
#Disadvantages
- Prone to overfitting, especially with deep trees without pruning (see the pruning sketch after this list).
- Can be unstable; small changes in the data may lead to different trees.
- Biased toward features with more levels or categories.
- May not perform well on datasets with complex relationships (e.g., XOR problems).
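One common way to address the overfitting issue above is to cap tree depth or apply cost-complexity pruning and compare train vs. test accuracy, sketched here with scikit-learn (the depth limit and `ccp_alpha` value are arbitrary choices for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: tends to memorize the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Regularized tree: depth limit plus cost-complexity pruning (ccp_alpha).
pruned = DecisionTreeClassifier(max_depth=4, ccp_alpha=0.01,
                                random_state=0).fit(X_train, y_train)

for name, model in [("deep", deep), ("pruned", pruned)]:
    print(name, model.get_depth(),
          round(model.score(X_train, y_train), 3),   # train accuracy
          round(model.score(X_test, y_test), 3))     # test accuracy
```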
#Popular Decision Tree Algorithms
- ID3 (Iterative Dichotomiser 3):
- Uses information gain for splitting.
- Handles categorical features only.
- C4.5:
- An extension of ID3 that handles both categorical and numerical features.
- Uses information gain ratio to reduce bias toward features with many levels.
- CART (Classification and Regression Trees):
- Uses Gini impurity for classification and variance reduction for regression.
- Supports binary splits only.
- Random Forest:
- An ensemble method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting.
- Gradient Boosted Trees:
- Builds trees sequentially, with each tree correcting the errors of the previous one (see the ensemble sketch after this list).
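As a rough illustration of why the ensemble variants above are popular, this scikit-learn sketch (dataset and hyperparameters chosen arbitrarily) cross-validates a single tree against a random forest and gradient boosting on the same data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```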
#Applications
- Customer segmentation
- Fraud detection
- Medical diagnosis
- Predictive modeling
- Natural language processing (e.g., parsing sentences)