Use labeled datasets to perform tasks such as #ml/classification and #ml/regression.
#Preconditions
- Obtain a labeled dataset
- Split the dataset into training and holdout datasets (the holdout data is typically divided further into validation and test datasets)
- Ensure that records in the validation and test datasets are statistically similar and independent.
- Perform data imputation and feature engineering
- Convert all examples into numerical feature vectors
- Select a performance metric that returns a single number.
- Establish a baseline to compare models against
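The preconditions above can be sketched in a few lines of standard-library Python. This is a minimal illustration under assumptions, not a reference pipeline: the helper names (`train_holdout_split`, `impute_mean`, `accuracy`, `majority_baseline`) and the toy records are hypothetical, missing values are assumed to be encoded as `None`, and accuracy stands in for whichever single-number metric suits the problem.

```python
import random
from collections import Counter
from statistics import mean


def train_holdout_split(records, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, holdout)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def impute_mean(rows, fill_values=None):
    """Replace None with column means; reuse training means for holdout rows."""
    if fill_values is None:
        columns = list(zip(*rows))
        fill_values = [mean(v for v in col if v is not None) for col in columns]
    imputed = [
        [fill_values[i] if v is None else v for i, v in enumerate(row)]
        for row in rows
    ]
    return imputed, fill_values


def accuracy(y_true, y_pred):
    """Single-number metric: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels):
    """Baseline: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


# Hypothetical labeled records: ([feature_1, feature_2], label)
records = [
    ([1.0, 2.0], 0), ([2.0, None], 0), ([1.5, 1.0], 0), ([None, 2.5], 0),
    ([8.0, 9.0], 1), ([9.0, 8.5], 1), ([7.5, None], 1), ([8.5, 9.5], 0),
    ([1.2, 1.8], 0), ([9.5, 9.0], 1),
]

train, holdout = train_holdout_split(records)
train_X, fill_values = impute_mean([features for features, _ in train])
holdout_X, _ = impute_mean([features for features, _ in holdout], fill_values)

train_y = [label for _, label in train]
holdout_y = [label for _, label in holdout]

baseline_label = majority_baseline(train_y)
baseline_score = accuracy(holdout_y, [baseline_label] * len(holdout_y))
print("baseline accuracy on holdout:", baseline_score)
```

Computing the fill values on the training rows only and reusing them for the holdout rows keeps information from the holdout data out of training. Any model trained afterwards should beat `baseline_score` to be worth keeping.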
#Main Algorithms
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Gradient Boosting Machines
- SGD Regressor/Classifier
#Examples
- Handwritten Digit Recognition
- Spam Detection
- Customer Segmentation (when segment labels are available)
- Personalized Treatment
- Credit Scoring
- Churn Prediction
- Object Detection
- Sentiment Analysis
- Fraud Detection
- Learn Detection
#Limitations
- Needs a significant amount of labeled data, which is time-consuming and expensive to obtain.
- Training data must represent real-world scenarios and avoid biases.
#Classification
Predicts the category a data point belongs to, e.g. spam detection, churn prediction, sentiment analysis, dog breed detection.
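As a rough illustration of classification, here is a small K-Nearest Neighbors sketch for a toy spam-detection task. The function name `knn_predict`, the two features (number of links, number of exclamation marks), and the data are all made up for the example; a real project would more likely use a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training example's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical feature vectors: [number_of_links, number_of_exclamation_marks]
train_X = [[0, 0], [1, 0], [0, 1], [5, 4], [6, 5], [4, 6]]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_X, train_y, [5, 5]))      # message with many links
print(knn_predict(train_X, train_y, [0.5, 0.5]))  # message with few links
```

The output is a category label rather than a number, which is what separates classification from regression.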
#Regression
Predicts a numerical value based on previously observed data, e.g. house price prediction, stock price prediction, height-weight prediction.
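The height-weight case can be sketched with simple linear regression fitted by ordinary least squares. The function name `fit_line` and the measurements below are invented for illustration, and a single input feature is assumed.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations divided by the sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical observations: height in cm, weight in kg
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 74, 82]

slope, intercept = fit_line(heights, weights)
predicted_weight = slope * 175 + intercept  # prediction for an unseen height
print(f"predicted weight for 175 cm: {predicted_weight:.1f} kg")
```

Here the prediction is a continuous number on the same scale as the training targets.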
#Other Problems
Under specific conditions, supervised ML can solve problems beyond classification and regression.
- Ranking problems
- Metric learning
- Time-series forecasting
- Anomaly detection
- Structured prediction
- Imitation learning
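As one example of the conditions mentioned above, time-series forecasting becomes a supervised problem once the series is rearranged into (past values, next value) pairs. The sketch below shows that sliding-window conversion; `make_windows`, `n_lags`, and the series are hypothetical names and data chosen for the illustration.

```python
def make_windows(series, n_lags):
    """Turn a series into supervised pairs: n_lags past values -> next value."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])  # features: the n_lags preceding values
        y.append(series[i])             # target: the value that follows them
    return X, y


# Hypothetical series, e.g. daily demand
series = [10, 12, 14, 16, 18, 20]
X, y = make_windows(series, n_lags=3)

for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting pairs can be passed to any regression algorithm. The training/holdout split should respect time order, so that the model is never evaluated on values that precede its training data.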