Naive Bayes

A guide to Naive Bayes: probabilistic classification using Bayes' theorem under the assumption that features are conditionally independent given the class.

Syllabus Map


Overview


Core Idea

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

$$P(y \mid x_1,\dots,x_n) = \frac{P(y)\,P(x_1,\dots,x_n \mid y)}{P(x_1,\dots,x_n)}$$

$$P(x_1,\dots,x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$

$$P(y \mid x_1,\dots,x_n) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y)$$

$$\hat{y} = \arg\max_y P(y)\prod_{i=1}^{n}P(x_i \mid y)$$
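To make the decision rule concrete, here is a minimal sketch that evaluates the prior times the product of per-feature likelihoods for each class and takes the arg max. The class names, the two binary features, and every probability in the tables are hypothetical values chosen for illustration, not figures from these notes.

```python
from math import prod

# Hypothetical toy model (not from the source): two classes, two binary features.
priors = {"spam": 0.4, "ham": 0.6}

# likelihoods[c][i][v] stands for P(x_i = v | y = c); the numbers are made up.
likelihoods = {
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "ham": [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}


def unnormalized_posteriors(x):
    """Return P(y) * prod_i P(x_i | y) for every class y."""
    return {
        c: priors[c] * prod(likelihoods[c][i][v] for i, v in enumerate(x))
        for c in priors
    }


def predict(x):
    """Pick the class with the largest unnormalized posterior (arg max rule)."""
    scores = unnormalized_posteriors(x)
    return max(scores, key=scores.get)


# spam: 0.4 * 0.8 * 0.7, ham: 0.6 * 0.1 * 0.4
scores = unnormalized_posteriors((1, 0))
label = predict((1, 0))
```

The evidence term in the denominator is the same for every class, so it can be dropped when only the arg max is needed. Dividing each score by the sum of the scores recovers normalized posteriors; for this toy input the "spam" class gets roughly 0.90.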

Step-by-Step Training and Inference

Step 1: Estimate class priors

Step 2: Estimate feature likelihoods
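For categorical features, Steps 1 and 2 can be carried out by counting. The sketch below assumes a small list-of-tuples dataset; the function name `fit_counts`, the data layout, and the toy weather rows are all hypothetical. It returns unsmoothed relative frequencies, so any feature value never seen with a class is simply absent from the table (an implicit probability of zero), which is the problem Step 3 addresses.

```python
from collections import Counter, defaultdict


def fit_counts(X, y):
    """Estimate class priors and per-feature likelihoods by relative frequency."""
    n = len(y)
    class_counts = Counter(y)

    # Step 1: P(y = c) = (number of samples in class c) / (number of samples)
    priors = {c: class_counts[c] / n for c in class_counts}

    # value_counts[(i, c)][v] = number of class-c samples with x_i == v
    value_counts = defaultdict(Counter)
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1

    # Step 2: P(x_i = v | y = c) = count(i, c, v) / count(c), with no smoothing
    likelihoods = {
        (i, c): {v: cnt / class_counts[c] for v, cnt in counter.items()}
        for (i, c), counter in value_counts.items()
    }
    return priors, likelihoods


# Hypothetical toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
priors, likelihoods = fit_counts(X, y)
```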

Step 3: Apply smoothing

$$P(x_i=v \mid y=c) = \frac{N_{icv} + \alpha}{N_{ic} + \alpha K}$$
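The smoothing formula translates directly into code. In this sketch the function and argument names are hypothetical: `n_icv` is the count of class-`c` samples with feature `i` equal to `v`, `n_ic` is the total count of class-`c` samples, and `k` is the number of distinct values feature `i` can take.

```python
def smoothed_likelihood(n_icv, n_ic, k, alpha=1.0):
    """Additive (Laplace) smoothing: (N_icv + alpha) / (N_ic + alpha * K)."""
    return (n_icv + alpha) / (n_ic + alpha * k)


# Hypothetical counts for one feature with K = 3 values inside one class.
counts = [7, 3, 0]
total = sum(counts)
probs = [smoothed_likelihood(n, total, k=len(counts)) for n in counts]
# The value with zero observations now gets 1/13 instead of 0.
```

With `alpha = 1` this is Laplace (add-one) smoothing. The smoothed estimates still sum to one over the `k` values, and larger `alpha` pulls them toward the uniform distribution.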

Step 4: Score new samples

$$\log P(y \mid x) = \log P(y) + \sum_{i=1}^{n}\log P(x_i \mid y) + \text{const}$$

The constant is -log P(x), which is the same for every class and therefore does not change the arg max.
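Scoring in log space can be sketched as a sum of a log prior and per-feature log-likelihoods. The tables below are hypothetical (two classes, two binary features, made-up probabilities), and `log_scores` is an illustrative name rather than any library's API.

```python
import math

# Hypothetical model parameters, stored as probabilities for readability.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "ham": [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}


def log_scores(x):
    """Return log P(y) + sum_i log P(x_i | y) for every class y."""
    return {
        c: math.log(priors[c])
        + sum(math.log(likelihoods[c][i][v]) for i, v in enumerate(x))
        for c in priors
    }


scores = log_scores((1, 0))
prediction = max(scores, key=scores.get)
```

Because the logarithm is monotonic, the class that maximizes the log score is the same one that maximizes the original product. In practice the logs would usually be precomputed once at training time rather than on every call.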

Common Variants

Gaussian Naive Bayes

$$x_i \mid y=c \sim \mathcal{N}(\mu_{ic}, \sigma_{ic}^2)$$
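A minimal sketch of Gaussian Naive Bayes under this model: estimate one mean and one variance per feature and class, then score with the Gaussian log-density. The function names, the toy data, and the small `var_floor` added to each variance (to avoid division by zero when a feature is constant within a class) are assumptions made for illustration. Variances here are the maximum-likelihood estimates (dividing by the class count).

```python
import math
from collections import defaultdict


def fit_gaussian(X, y, var_floor=1e-9):
    """Return {class: (log_prior, [(mean, variance), ...])}."""
    by_class = defaultdict(list)
    for row, c in zip(X, y):
        by_class[c].append(row)

    params = {}
    for c, rows in by_class.items():
        n_c = len(rows)
        stats = []
        for col in zip(*rows):  # iterate over features
            mu = sum(col) / n_c
            var = sum((v - mu) ** 2 for v in col) / n_c + var_floor
            stats.append((mu, var))
        params[c] = (math.log(n_c / len(y)), stats)
    return params


def gaussian_log_pdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def predict(x, params):
    """Arg max over classes of log prior plus summed Gaussian log-densities."""
    def score(c):
        log_prior, stats = params[c]
        return log_prior + sum(
            gaussian_log_pdf(v, mu, var) for v, (mu, var) in zip(x, stats)
        )
    return max(params, key=score)


# Hypothetical toy data: two well-separated classes in two dimensions.
X = [(1.0, 2.0), (1.2, 1.8), (3.0, 4.1), (3.2, 3.9)]
y = ["a", "a", "b", "b"]
params = fit_gaussian(X, y)
```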

Multinomial Naive Bayes
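Multinomial Naive Bayes is typically applied to count features such as word counts in a document. The sketch below assumes documents are lists of tokens, estimates smoothed per-class word probabilities, and scores a document by adding one log-likelihood per token occurrence. The function names, the toy documents, and the choice to skip tokens outside the training vocabulary are illustrative assumptions.

```python
import math
from collections import Counter


def fit_multinomial(docs, labels, alpha=1.0):
    """Return {class: (log_prior, {word: log P(word | class)})}."""
    vocab = sorted({w for doc in docs for w in doc})
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)

    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values())  # all tokens seen in class c
        log_lik = {
            w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
        model[c] = (math.log(class_docs[c] / len(labels)), log_lik)
    return model


def predict(doc, model):
    """Arg max of log prior plus one log-likelihood term per token."""
    def score(c):
        log_prior, log_lik = model[c]
        # Assumption: tokens never seen in training are ignored.
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=score)


# Hypothetical toy corpus.
docs = [
    ["free", "prize", "free"],
    ["win", "prize"],
    ["meeting", "notes"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
model = fit_multinomial(docs, labels)
```

A repeated token contributes its log-likelihood once per occurrence, which is how the count information enters the score.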

Bernoulli Naive Bayes
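Bernoulli Naive Bayes models each feature as present or absent, so an absent feature also contributes a term, log(1 - p), to the score. The following sketch assumes binary feature vectors and applies additive smoothing with K = 2 outcomes per feature. The function names and the toy data are hypothetical.

```python
import math


def fit_bernoulli(X, y, alpha=1.0):
    """Return {class: (log_prior, [P(x_i = 1 | class), ...])}."""
    model = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        n_c = len(rows)
        # Smoothed fraction of class-c samples in which feature i is present.
        p = [(sum(col) + alpha) / (n_c + 2 * alpha) for col in zip(*rows)]
        model[c] = (math.log(n_c / len(y)), p)
    return model


def predict(x, model):
    """Arg max of log prior plus log p (present) or log(1 - p) (absent)."""
    def score(c):
        log_prior, p = model[c]
        return log_prior + sum(
            math.log(pi) if xi else math.log(1.0 - pi) for xi, pi in zip(x, p)
        )
    return max(model, key=score)


# Hypothetical toy data: three binary features per sample.
X = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1)]
y = ["spam", "spam", "ham", "ham"]
model = fit_bernoulli(X, y)
```

Unlike the multinomial variant, this model ignores how many times a feature occurs and explicitly penalizes a class when a feature it usually exhibits is missing.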


Naive Bayes In Practice

When to Use Naive Bayes

When Not to Use Naive Bayes

Practical Notes

Efficient for baseline deployment

Assumption mismatch can hurt performance

Tune smoothing strength

Compute in log space
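A short demonstration of why log space matters: multiplying many small probabilities underflows to zero in double precision, while the sum of their logarithms stays finite. The likelihood values and the helper `normalize_log_scores` (a log-sum-exp normalization) are hypothetical illustrations.

```python
import math

# Hypothetical: 1000 features, each with likelihood 0.01 under some class.
probs = [0.01] * 1000

direct = math.prod(probs)  # true value is 1e-2000, which underflows to 0.0
log_total = sum(math.log(p) for p in probs)  # finite, about -4605.17


def normalize_log_scores(log_scores):
    """Turn per-class log scores into posteriors using the log-sum-exp trick."""
    m = max(log_scores.values())
    # Subtracting the maximum keeps every exponent at or below zero.
    log_z = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))
    return {c: math.exp(s - log_z) for c, s in log_scores.items()}


posteriors = normalize_log_scores({"a": -4605.0, "b": -4606.0})
```

If every class score underflows to zero, the arg max becomes arbitrary, so comparing log scores is the safer default. The log-sum-exp step is only needed when calibrated posterior probabilities are required, not for picking the most likely class.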
