Overview
- Bias and variance are commonly used to describe underfitting (high bias) and overfitting (high variance).
- Decomposing loss into bias and variance helps interpret model performance.
Definitions (Expectation Over Training Sets)
- Let the true target be $y$ and the prediction be $\hat{y}$.
- Bias is the difference between the expected prediction and the true target: $\text{Bias} = E[\hat{y}] - y$.
- Variance measures how predictions vary around their expectation: $\text{Variance} = E[(\hat{y} - E[\hat{y}])^2]$.
- Unless stated otherwise, the expectation is taken over different training sets.
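The definitions above can be estimated by simulation. The following is a minimal sketch, not a reference implementation: the target function `true_f`, the constant-mean learner `fit_constant`, the test point `x0`, and all sample sizes are assumptions chosen only for illustration. It draws many training sets, records the prediction from each, and computes bias and variance from those predictions.

```python
import random
from statistics import fmean, pvariance


def true_f(x):
    # Hypothetical target function, chosen only for illustration.
    return x * x


def draw_training_set(rng, n, sigma):
    # n noisy observations y = f(x) + eps, with x uniform on [0, 1].
    return [(x, true_f(x) + rng.gauss(0.0, sigma))
            for x in (rng.random() for _ in range(n))]


def fit_constant(train):
    # Deliberately simple learner: always predict the mean training target.
    return fmean(y for _, y in train)


rng = random.Random(0)
x0 = 0.9  # test point at which bias and variance are measured

# One prediction per simulated training set.
predictions = [fit_constant(draw_training_set(rng, 20, 0.1))
               for _ in range(2000)]

expected_prediction = fmean(predictions)
bias = expected_prediction - true_f(x0)                   # E[y_hat] - y
variance = pvariance(predictions, mu=expected_prediction)  # E[(y_hat - E[y_hat])^2]

print(f"bias={bias:.3f}, variance={variance:.4f}")
```

Because the learner ignores the input entirely, the bias is large and the variance is small, which is the underfitting pattern.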
Squared Loss Decomposition (Regression)
- Squared loss: $S = (y - \hat{y})^2$.
- The expected squared loss decomposes into squared bias, variance, and a noise term (often called the irreducible error).
Total Error and Irreducible Error
- Total error (expected loss) can be viewed as having three components:
  - Bias (systematic error from wrong assumptions),
  - Variance (error from sensitivity to the training data),
  - Irreducible error (noise in the data that no model can remove).
- In the squared-loss decomposition, the irreducible error is the noise term, which is often omitted for simplicity.
Why Total Error = Bias^2 + Variance + Irreducible Error
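Notation:
- $x$: input features
- $y$: true target
- $\hat{y} = \hat{f}(x)$: model prediction from a training set
- $f(x)$: true underlying function
- $\varepsilon$: noise term (unobserved randomness), with $E[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, independent of the training set
- $E[\cdot]$: expectation over different training sets (and over $\varepsilon$ where it appears)
Assume the data-generating process:
$$y = f(x) + \varepsilon$$
Then:
$$E[(y - \hat{y})^2] = E[(f(x) + \varepsilon - \hat{y})^2]$$
Expand the expectation (the cross term $2E[\varepsilon (f(x) - \hat{y})]$ vanishes because $\varepsilon$ has zero mean and is independent of $\hat{y}$):
$$E[(y - \hat{y})^2] = E[(f(x) - \hat{y})^2] + E[\varepsilon^2] = E[(f(x) - \hat{y})^2] + \sigma^2$$
And decompose the first term by adding and subtracting $E[\hat{y}]$ (the cross term vanishes because $E[\hat{y} - E[\hat{y}]] = 0$):
$$E[(f(x) - \hat{y})^2] = (f(x) - E[\hat{y}])^2 + E[(\hat{y} - E[\hat{y}])^2]$$
So:
- Bias^2: $(E[\hat{y}] - f(x))^2$
- Variance: $E[(\hat{y} - E[\hat{y}])^2]$
- Irreducible error: $\sigma^2$
Therefore:
$$E[(y - \hat{y})^2] = \text{Bias}^2 + \text{Variance} + \sigma^2$$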
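The identity Total Error = Bias^2 + Variance + Irreducible Error can be checked numerically. The sketch below is illustrative only: the target `true_f`, the least-squares line learner `fit_line`, the noise level `sigma`, and the test point `x0` are all assumptions, not part of the notes. It compares a Monte Carlo estimate of the expected squared loss with the sum of the three estimated terms.

```python
import random
from statistics import fmean, pvariance


def true_f(x):
    # Hypothetical true function f(x), chosen only for illustration.
    return x * x


def fit_line(train):
    # Ordinary least-squares line through (x, y) pairs: returns (slope, intercept).
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(1)
sigma, x0, n_train, n_trials = 0.3, 0.9, 20, 4000

predictions, squared_errors = [], []
for _ in range(n_trials):
    # Fresh training set drawn from y = f(x) + eps.
    train = [(x, true_f(x) + rng.gauss(0.0, sigma))
             for x in (rng.random() for _ in range(n_train))]
    slope, intercept = fit_line(train)
    y_hat = slope * x0 + intercept
    # Fresh noisy target at the test point, independent of the training set.
    y = true_f(x0) + rng.gauss(0.0, sigma)
    predictions.append(y_hat)
    squared_errors.append((y - y_hat) ** 2)

total_error = fmean(squared_errors)        # estimate of E[(y - y_hat)^2]
bias = fmean(predictions) - true_f(x0)     # E[y_hat] - f(x0)
variance = pvariance(predictions)          # E[(y_hat - E[y_hat])^2]
decomposed = bias ** 2 + variance + sigma ** 2

print(f"total={total_error:.4f}, bias^2+variance+noise={decomposed:.4f}")
```

The two printed numbers agree only up to sampling error, since the noise and cross terms are estimated from a finite number of trials.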
0-1 Loss Decomposition (Classification)
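- For 0-1 loss, the “main prediction” is the mode of the predictions across training sets, not the mean.
- In one common formulation, with main prediction $\hat{y}_m$, bias is $1$ if $\hat{y}_m \neq y$ and $0$ otherwise, and variance is $P(\hat{y} \neq \hat{y}_m)$.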
- Decomposing 0-1 loss is less straightforward and has multiple formulations.
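As a rough illustration of one such formulation (main prediction as the mode, bias as a 0/1 indicator, variance as the fraction of predictions that differ from the main prediction), the sketch below works on the labels predicted for a single example by models trained on different training sets. The function name `zero_one_decomposition` and the sample labels are hypothetical.

```python
from collections import Counter


def zero_one_decomposition(predictions, y_true):
    # Main prediction: the most frequent predicted label (the mode).
    main_prediction = Counter(predictions).most_common(1)[0][0]
    # Bias: 1 if the main prediction is wrong, 0 otherwise.
    bias = int(main_prediction != y_true)
    # Variance: fraction of predictions that disagree with the main prediction.
    variance = sum(p != main_prediction for p in predictions) / len(predictions)
    # Expected 0-1 loss: fraction of predictions that disagree with the true label.
    loss = sum(p != y_true for p in predictions) / len(predictions)
    return main_prediction, bias, variance, loss


# Labels predicted for one example by five models, each from a different training set.
result = zero_one_decomposition([1, 1, 0, 1, 1], y_true=1)
print(result)  # (1, 0, 0.2, 0.2)
```

Here the main prediction is correct, so bias is 0 and the expected loss equals the variance.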
Practical Notes
Variance Reduction
- Bagging typically reduces variance compared to a single decision tree in the provided examples.
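The variance-reduction effect can be sketched without any external library. Note the substitution: instead of a decision tree, this example uses a 1-nearest-neighbour regressor as a stand-in high-variance base learner, and the target `true_f`, noise level, and sample sizes are assumptions for illustration. Bagging averages the base learner over bootstrap resamples of the same training set.

```python
import random
from statistics import fmean, pvariance


def true_f(x):
    # Hypothetical target function, chosen only for illustration.
    return x * x


def nn_predict(train, x0):
    # 1-nearest-neighbour regressor: return the target of the closest training point.
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(rng, train, x0, n_estimators):
    # Average the base learner over bootstrap resamples (sampling with replacement).
    preds = [nn_predict(rng.choices(train, k=len(train)), x0)
             for _ in range(n_estimators)]
    return fmean(preds)


rng = random.Random(0)
x0, sigma, n_train = 0.5, 0.3, 30

single, bagged = [], []
for _ in range(500):
    # Fresh training set for each trial.
    train = [(x, true_f(x) + rng.gauss(0.0, sigma))
             for x in (rng.random() for _ in range(n_train))]
    single.append(nn_predict(train, x0))
    bagged.append(bagged_predict(rng, train, x0, 25))

var_single = pvariance(single)
var_bagged = pvariance(bagged)
print(f"single variance={var_single:.4f}, bagged variance={var_bagged:.4f}")
```

In this setup the bagged predictor's variance across training sets comes out lower than the single learner's, because averaging over resamples spreads weight across several nearby training points.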
0-1 Loss Caveat
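- For 0-1 loss, if bias is 1 (the main prediction is wrong), increasing variance can reduce loss, because predictions that deviate from the main prediction may be correct (a counterintuitive edge case).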
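A short derivation makes the edge case concrete. It assumes a two-class problem, writes $\hat{y}_m$ for the main (mode) prediction, and takes variance to be $P(\hat{y} \neq \hat{y}_m)$. These symbols and the two-class restriction are assumptions of this sketch. If bias is 1, then $\hat{y}_m \neq y$, so with only two classes a prediction is wrong exactly when it equals the main prediction:

```latex
\begin{aligned}
E[L] &= P(\hat{y} \neq y) \\
     &= P(\hat{y} = \hat{y}_m) \\
     &= 1 - P(\hat{y} \neq \hat{y}_m) \\
     &= \text{Bias} - \text{Variance}
\end{aligned}
```

So when bias is 1, every increase in variance lowers the expected loss by the same amount. With more than two classes only part of the variance helps, since a prediction can differ from the main prediction and still be wrong.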