IOAI ML Notes: Neural Network, Deep Learning

Weight Initialisation

Weight initialisation schemes and practical guidelines for stable training.

Syllabus Map


Overview


Distributions (Normal vs Uniform)

Normal distribution

Uniform distribution


Notation / Terminology


Core idea

Common schemes

Random initialisation

Constant initialisation

Xavier / Glorot

Uniform form

$$W \sim \mathcal{U}\left(-\sqrt{\frac{6}{n_{in}+n_{out}}},\ \sqrt{\frac{6}{n_{in}+n_{out}}}\right)$$

Normal form

$$W \sim \mathcal{N}\left(0,\ \frac{2}{n_{in}+n_{out}}\right)$$
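The two Xavier / Glorot forms target the same weight variance: a uniform distribution on (-a, a) has variance a²/3, so a = sqrt(6 / (n_in + n_out)) gives 2 / (n_in + n_out), which is the second argument of the normal form when that argument is read as a variance. Below is a minimal sketch that samples both forms and compares the empirical variance with that target. It uses only Python's standard library; the function names, the layer sizes, and the nested-list matrix layout are assumptions chosen for illustration, not part of the notes.

```python
import math
import random


def xavier_uniform(n_in, n_out, rng):
    """Sample an n_out x n_in matrix from U(-a, a) with a = sqrt(6 / (n_in + n_out))."""
    a = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_out)]


def xavier_normal(n_in, n_out, rng):
    """Sample an n_out x n_in matrix from N(0, 2 / (n_in + n_out))."""
    # gauss() expects a standard deviation, so take the square root of the variance.
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def sample_variance(matrix):
    """Population variance over all entries of a nested-list matrix."""
    flat = [w for row in matrix for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


rng = random.Random(0)        # fixed seed so the run is repeatable
n_in, n_out = 256, 128        # illustrative layer sizes (assumption)
target = 2.0 / (n_in + n_out)

w_u = xavier_uniform(n_in, n_out, rng)
w_n = xavier_normal(n_in, n_out, rng)

print(f"target variance : {target:.6f}")
print(f"uniform variance: {sample_variance(w_u):.6f}")
print(f"normal variance : {sample_variance(w_n):.6f}")
```

Both empirical variances should land close to the target; the uniform samples are additionally bounded by a, while the normal samples are not.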

He / Kaiming

Uniform form

$$W \sim \mathcal{U}\left(-\sqrt{\frac{6}{n_{in}}},\ \sqrt{\frac{6}{n_{in}}}\right)$$

Normal form

$$W \sim \mathcal{N}\left(0,\ \frac{2}{n_{in}}\right)$$
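The factor of 2 in the He / Kaiming variance compensates for ReLU zeroing roughly half of the pre-activations, so the mean square of a layer's output stays close to that of its input. The sketch below checks this for a single bias-free ReLU layer using both forms. It is a rough illustration under stated assumptions: the helper names, the layer width, and the standard-normal toy input are all hypothetical choices, and only the standard library is used.

```python
import math
import random


def he_uniform(n_in, n_out, rng):
    """Sample an n_out x n_in matrix from U(-a, a) with a = sqrt(6 / n_in)."""
    a = math.sqrt(6.0 / n_in)
    return [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_out)]


def he_normal(n_in, n_out, rng):
    """Sample an n_out x n_in matrix from N(0, 2 / n_in)."""
    std = math.sqrt(2.0 / n_in)  # standard deviation, not variance
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def relu_layer(weights, x):
    """Return ReLU(W x) for a nested-list matrix W and a list x (no bias)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def mean_square(values):
    """Average of the squared entries."""
    return sum(v * v for v in values) / len(values)


rng = random.Random(0)                            # fixed seed for repeatability
n_in, n_out = 512, 512                            # illustrative sizes (assumption)
x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]    # toy input (assumption)

results = {}
for name, init in [("he_uniform", he_uniform), ("he_normal", he_normal)]:
    y = relu_layer(init(n_in, n_out, rng), x)
    # Ratio of output to input mean square; a value near 1 means the scale is preserved.
    results[name] = mean_square(y) / mean_square(x)
    print(f"{name}: output/input mean-square ratio = {results[name]:.3f}")
```

Note that only n_in appears in these formulas, so the number of output units changes the shape of the matrix but not the scale of its entries.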

Practical Notes

Match initialisation to activation function
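As a rough illustration of why the pairing matters, the sketch below pushes the same input through a stack of bias-free ReLU layers twice: once with the He variance 2 / n_in and once with the Xavier variance 2 / (n_in + n_out). The depth, width, helper names, and toy input are assumptions made for the example; the common rule of thumb it illustrates is He for ReLU-family activations and Xavier for tanh or sigmoid.

```python
import math
import random


def init_normal(n_in, n_out, variance, rng):
    """Sample an n_out x n_in matrix from N(0, variance)."""
    std = math.sqrt(variance)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def relu(z):
    return max(0.0, z)


def forward(x, depth, width, variance, rng):
    """Apply `depth` square ReLU layers (no bias), each freshly initialised."""
    for _ in range(depth):
        w = init_normal(width, width, variance, rng)
        x = [relu(sum(wi * xi for wi, xi in zip(row, x))) for row in w]
    return x


def mean_square(values):
    return sum(v * v for v in values) / len(values)


width, depth = 64, 10                                  # illustrative sizes (assumption)
input_rng = random.Random(1)
x0 = [input_rng.gauss(0.0, 1.0) for _ in range(width)]  # toy input (assumption)

he_var = 2.0 / width                  # He: 2 / n_in
xavier_var = 2.0 / (width + width)    # Xavier: 2 / (n_in + n_out)

# Both runs reuse the same seed, so the only difference is the weight scale.
ms_he = mean_square(forward(x0, depth, width, he_var, random.Random(0)))
ms_xavier = mean_square(forward(x0, depth, width, xavier_var, random.Random(0)))

print(f"input mean square        : {mean_square(x0):.4f}")
print(f"after {depth} ReLU layers, He    : {ms_he:.4e}")
print(f"after {depth} ReLU layers, Xavier: {ms_xavier:.4e}")
```

With square layers the Xavier variance is half the He variance, so under ReLU the Xavier-scaled signal loses about a factor of two in mean square per layer relative to the He-scaled one, and the gap compounds with depth.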

Initialise biases simply (for example, to zero) unless task-specific priors exist
