# What it does
Based on [[Bayes' Theorem]].
It assumes that features are conditionally independent given the class label. This simplifies computation, and the model often performs well in practice despite the naive assumption.
$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$
$P(C|X)$ : Posterior probability (probability of the class given the features).
$P(X|C)$ : Likelihood (probability of the features given the class).
$P(C)$ : Prior probability (probability of the class occurring).
$P(X)$ : Evidence (probability of the features).
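A quick worked illustration in Python (the numbers are made up for the example, not taken from any dataset): estimating the probability that an email is spam given that it contains the word "free".

```python
# Hypothetical probabilities, invented purely for illustration
p_spam = 0.3             # P(C): prior probability of spam
p_free_given_spam = 0.6  # P(X|C): likelihood of "free" appearing in spam
p_free = 0.25            # P(X): overall probability of "free" appearing

# Posterior P(C|X) = P(X|C) * P(C) / P(X)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(p_spam_given_free)  # 0.72
```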
# How it works
Three common variants (sketched in code below):
- Gaussian: continuous features
- Multinomial: discrete count features (e.g., word counts)
- Bernoulli: binary features
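A minimal sketch of how the three variants map onto scikit-learn's classes; the toy arrays are placeholders, not data from this note.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # class labels for four toy examples

# Gaussian NB: continuous features
X_cont = np.array([[1.2, 3.4], [0.9, 2.8], [4.5, 6.1], [5.0, 5.9]])
print(GaussianNB().fit(X_cont, y).predict([[4.8, 6.0]]))

# Multinomial NB: discrete counts (e.g., word counts per document)
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [1, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 3, 2]]))

# Bernoulli NB: binary features (presence/absence)
X_bin = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 1]])
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]]))
```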
If a feature value never appears in the training data for a class, its likelihood estimate is zero, which zeroes out the entire posterior for that class. This can be addressed with Laplace smoothing.
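The standard fix is additive (Laplace) smoothing of the likelihood estimates. With smoothing parameter $\alpha$ and $K$ possible values for the feature:
$$
P(x_i \mid C) = \frac{\text{count}(x_i, C) + \alpha}{\text{count}(C) + \alpha \cdot K}
$$
Setting $\alpha = 1$ gives classic Laplace (add-one) smoothing; in scikit-learn this corresponds to the `alpha` parameter of `MultinomialNB` and `BernoulliNB`.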
For a new example, it calculates the posterior probability of each class and predicts the class with the highest posterior (a minimal sketch follows).
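A rough sketch of the prediction step for the Gaussian variant, assuming per-class priors, feature means, and variances have already been estimated from training data. The evidence $P(X)$ is dropped because it is the same for every class, so it does not change the argmax; log-probabilities are used to avoid numerical underflow.

```python
import numpy as np

def predict(x, priors, means, variances):
    """Pick the class with the highest log-posterior for feature vector x.

    priors[c]    : P(C=c)
    means[c]     : per-feature means for class c (array)
    variances[c] : per-feature variances for class c (array)
    """
    log_posteriors = {}
    for c in priors:
        # log P(C) + sum_i log P(x_i | C) under a Gaussian likelihood
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * variances[c])
            + (x - means[c]) ** 2 / variances[c]
        )
        log_posteriors[c] = np.log(priors[c]) + log_likelihood
    return max(log_posteriors, key=log_posteriors.get)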
# Preconditions
- The conditional independence assumption rarely holds exactly in real-world data, though the model is often robust to this violation.
- Requires minimal training data and has few parameters to tune.
# Evaluation
# Advantages
- Fast and efficient
- Simple to implement
- Performs well with small datasets
- Handles high-dimensional data
# Limitations
- Assumes independence of features
- Struggles with zero probabilities for unseen feature values (unless smoothing is applied)