IOAI ML Notes Mathematics FundamentalsCalculus

Learning Rate and Convergence

Math fundamentals note for IOAI ML preparation.

16 February 2026

Syllabus Map

Study map: Syllabus Study Map

Core Idea

Convergence depends on step size, curvature, and gradient noise.

Practical Notes

Use schedules when training plateaus.

Decay or cosine schedules improve late-stage convergence.

Monitor both loss and validation metrics.

Convergence in training loss does not guarantee generalization.

Why This Matters for ML

Learning rate is one of the highest-impact hyperparameters in training.
Convergence diagnostics (oscillation, divergence, stagnation) guide schedule decisions.
Warmup/decay policies often determine whether large models train stably.
Generalization quality can depend strongly on the optimization path set by step-size policy.

← Back to Blog