Overview
- Naive Bayes is a probabilistic classifier built from Bayes’ theorem.
- It assumes features are conditionally independent given the class.
- Although this assumption rarely holds exactly, it is often a competitive baseline for classification tasks.
Core Idea
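- Bayes’ theorem updates a prior belief using observed evidence:
  $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
- For feature vector $x = (x_1, \dots, x_n)$ and class $y$:
  $$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$
- Naive Bayes uses the conditional independence assumption:
  $$P(x \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$
- So classification uses:
  $$P(y \mid x) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$
- Predict the class with maximum posterior probability:
  $$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$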
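As a small worked example of the posterior computation, the sketch below multiplies a prior by per-feature likelihoods and normalizes by the evidence. The class names, the probabilities, and the `posterior` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy numbers, chosen only for illustration.
priors = {"spam": 0.4, "ham": 0.6}

# P(x_i = 1 | y) for two binary features, treated as independent given the class.
likelihoods = {
    "spam": [0.8, 0.5],
    "ham": [0.1, 0.5],
}


def posterior(x):
    """Return P(y | x) for every class under the naive independence assumption."""
    joint = {}
    for y, prior in priors.items():
        p = prior
        for xi, theta in zip(x, likelihoods[y]):
            # Use theta when the feature is present, 1 - theta when it is absent.
            p *= theta if xi == 1 else 1.0 - theta
        joint[y] = p
    evidence = sum(joint.values())  # P(x), the normalizing constant
    return {y: p / evidence for y, p in joint.items()}


post = posterior([1, 1])
prediction = max(post, key=post.get)  # class with maximum posterior
```

Because the evidence is the same for every class, the `max` step would pick the same class even without normalizing.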
Step-by-Step Training and Inference
Step 1: Estimate class priors
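- Compute the class prior $P(y)$ from class frequencies in the training set: $P(y) = N_y / N$, where $N_y$ is the number of training samples in class $y$ and $N$ is the total number of samples.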
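A minimal sketch of this step, assuming the labels are available as a plain Python list. The `class_priors` helper and the example labels are hypothetical.

```python
from collections import Counter


def class_priors(labels):
    """Estimate the prior of each class as its relative frequency in the labels."""
    counts = Counter(labels)
    total = len(labels)
    return {y: n / total for y, n in counts.items()}


# Hypothetical training labels.
priors = class_priors(["spam", "ham", "ham", "spam", "ham"])
```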
Step 2: Estimate feature likelihoods
- Estimate the likelihood $P(x_i \mid y)$ for each class $y$ and feature $x_i$.
- The exact form depends on feature type and Naive Bayes variant.
Step 3: Apply smoothing
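- Use Laplace (additive) smoothing to avoid zero probabilities:
  $$P(x_i = v \mid y) = \frac{N_{v,y} + \alpha}{N_y + \alpha K}$$
- $N_{v,y}$ is the count of feature value $v$ in class $y$, $N_y = \sum_{v} N_{v,y}$ is the total count in class $y$, $K$ is the number of possible values, and $\alpha$ is the smoothing strength.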
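The sketch below applies additive smoothing to one categorical feature within one class. The `smoothed_likelihoods` helper, the color values, and the default `alpha=1.0` are assumptions made for illustration; setting `alpha=0` would give the unsmoothed relative frequencies.

```python
from collections import Counter


def smoothed_likelihoods(values, possible_values, alpha=1.0):
    """Estimate the probability of each possible value of one feature in one class.

    values: observed values of the feature among samples of the class.
    possible_values: every value the feature can take (K of them).
    alpha: smoothing strength added to each count.
    """
    counts = Counter(values)
    n = len(values)
    k = len(possible_values)
    return {v: (counts[v] + alpha) / (n + alpha * k) for v in possible_values}


# "green" never occurs in this class, yet it still gets a nonzero probability.
probs = smoothed_likelihoods(["red", "red", "blue"], ["red", "blue", "green"])
```

Without smoothing, a single unseen value would zero out the whole product of likelihoods for that class.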
Step 4: Score new samples
- For each class, compute posterior score and choose the largest.
- Use log-space for numerical stability:
  $$\hat{y} = \arg\max_{y} \left[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \right]$$
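The scoring step can be sketched as a sum of log-probabilities per class followed by an argmax. The table layout (one dictionary of log-likelihoods per feature), the `predict_log_space` helper, and the numbers are hypothetical.

```python
import math


def predict_log_space(x, log_priors, log_likelihoods):
    """Return the best class and the unnormalized log-posterior score of each class.

    log_likelihoods[y][i][v] holds the log-probability of feature i taking
    value v in class y.
    """
    scores = {}
    for y, log_prior in log_priors.items():
        scores[y] = log_prior + sum(
            log_likelihoods[y][i][v] for i, v in enumerate(x)
        )
    return max(scores, key=scores.get), scores


# Hypothetical tables for two binary features.
log_priors = {"spam": math.log(0.4), "ham": math.log(0.6)}
log_likelihoods = {
    "spam": [{0: math.log(0.2), 1: math.log(0.8)}, {0: math.log(0.5), 1: math.log(0.5)}],
    "ham": [{0: math.log(0.9), 1: math.log(0.1)}, {0: math.log(0.5), 1: math.log(0.5)}],
}

label, scores = predict_log_space([1, 1], log_priors, log_likelihoods)
```

The scores are not normalized probabilities, which is fine for choosing the largest one.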
Common Variants
Gaussian Naive Bayes
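- For continuous features, assume:
  $$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}\right)$$
- Estimate the mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$ for each class-feature pair.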
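A minimal sketch of the Gaussian case for a single class-feature pair. The helper names and sample values are hypothetical, and the `var_floor` guard against zero variance is an added assumption, not something the notes above prescribe.

```python
import math


def fit_gaussian(values, var_floor=1e-9):
    """Estimate the mean and (maximum-likelihood) variance of one feature in one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # Assumed safeguard: keep the variance strictly positive.
    return mean, max(var, var_floor)


def gaussian_log_pdf(x, mean, var):
    """Log-density of x under a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


# Hypothetical measurements of one feature for one class.
mean, var = fit_gaussian([4.9, 5.1, 5.0, 5.2, 4.8])
near = gaussian_log_pdf(5.0, mean, var)
far = gaussian_log_pdf(6.0, mean, var)
```

Summing `gaussian_log_pdf` over features and adding the log-prior gives the per-class score used for prediction.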
Multinomial Naive Bayes
- Best for count features (for example bag-of-words).
- Uses term frequencies and class-conditional token probabilities.
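The multinomial variant can be sketched on tokenized documents as follows. The function names, the toy corpus, and the choice to skip out-of-vocabulary tokens at prediction time are all assumptions for illustration.

```python
import math
from collections import Counter


def fit_multinomial(docs, labels, alpha=1.0):
    """Estimate log-priors and smoothed log token probabilities per class."""
    vocab = sorted({w for doc in docs for w in doc})
    class_counts = Counter(labels)
    token_counts = {y: Counter() for y in class_counts}
    for doc, y in zip(docs, labels):
        token_counts[y].update(doc)

    log_prior = {y: math.log(n / len(labels)) for y, n in class_counts.items()}
    log_theta = {}
    for y, counts in token_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_theta[y] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_theta


def predict_multinomial(doc, log_prior, log_theta):
    """Score each class by log-prior plus summed token log-probabilities."""
    scores = {}
    for y in log_prior:
        # Assumption: tokens outside the training vocabulary are ignored.
        scores[y] = log_prior[y] + sum(
            log_theta[y][w] for w in doc if w in log_theta[y]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus.
docs = [
    ["free", "prize", "now"],
    ["meeting", "at", "noon"],
    ["free", "free", "offer"],
    ["lunch", "at", "noon"],
]
labels = ["spam", "ham", "spam", "ham"]
log_prior, log_theta = fit_multinomial(docs, labels)
pred_spam = predict_multinomial(["free", "offer", "now"], log_prior, log_theta)
pred_ham = predict_multinomial(["lunch", "noon"], log_prior, log_theta)
```

A repeated token contributes its log-probability once per occurrence, which is how term frequency enters the score.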
Bernoulli Naive Bayes
- Best for binary features (feature present/absent).
- Models each feature as Bernoulli conditioned on class.
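A sketch of the Bernoulli variant for one class, assuming each sample is a list of 0/1 indicators. The helper names and data are hypothetical. Unlike the multinomial case, an absent feature contributes the probability of absence rather than being ignored.

```python
import math


def fit_bernoulli(rows, alpha=1.0):
    """Estimate the smoothed probability of each feature being 1 within one class."""
    n = len(rows)
    # Two possible outcomes per feature, hence 2 * alpha in the denominator.
    return [(sum(col) + alpha) / (n + 2.0 * alpha) for col in zip(*rows)]


def bernoulli_log_likelihood(x, theta):
    """Log-likelihood of a binary vector x given per-feature probabilities theta."""
    return sum(
        math.log(t) if xi else math.log(1.0 - t) for xi, t in zip(x, theta)
    )


# Hypothetical binary feature rows for one class.
theta = fit_bernoulli([[1, 0, 1], [1, 1, 0], [1, 0, 0]])
ll = bernoulli_log_likelihood([1, 0, 0], theta)
```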
Naive Bayes In Practice
When to Use Naive Bayes
- When you need a simple, fast baseline classifier.
- When features are high-dimensional and sparse (for example text).
- When training data is limited.
- When interpretability of probabilistic reasoning is useful.
When Not to Use Naive Bayes
- When feature dependencies are strong and central to the task.
- When well-calibrated probabilities are critical and post-hoc calibration is not an option.
- When nonlinear interactions dominate predictive signal.
- When richer models clearly outperform it and the latency budget allows them.
Practical Notes
Efficient for baseline deployment
- Naive Bayes trains and serves quickly with low memory overhead.
Assumption mismatch can hurt performance
- If the assumed likelihood (for example Gaussian) does not match the actual data distribution, calibration and accuracy may degrade.
Tune smoothing strength
- The smoothing parameter can significantly change the estimated probabilities of rare features, so treat it as a hyperparameter to tune.
Compute in log space
- Log-probability arithmetic helps avoid numerical underflow in long feature vectors.
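To make the underflow concrete, the sketch below multiplies many small likelihoods directly and compares the result with the sum of their logs. The 100 identical likelihoods of `1e-5` are hypothetical values picked to force the effect.

```python
import math

# Hypothetical per-feature likelihoods for one class.
probs = [1e-5] * 100

# Direct product: the true value, 1e-500, is below the smallest positive double.
direct = 1.0
for p in probs:
    direct *= p

# Log-space sum: the same quantity stays representable.
log_score = sum(math.log(p) for p in probs)
```

Here `direct` collapses to `0.0`, so every class would tie, while `log_score` remains a finite number that can still be compared across classes.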