
Masked Language Modeling

How MLM trains encoder models by predicting masked tokens from bidirectional context.

Syllabus Map


Overview


Core Objective

Definition

$$\mathcal{L}_{\text{MLM}} = -\sum_{i\in\mathcal{M}} \log P_\theta\left(x_i \mid x_{\backslash \mathcal{M}}\right)$$
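
To make the objective concrete, here is a minimal sketch that evaluates the loss on a toy sequence. The logits, the four-token vocabulary, and the helper names `log_softmax` and `mlm_loss` are made up for illustration; in practice the logits would come from the encoder's output layer. The point to notice is that only positions in the masked set contribute to the sum.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def mlm_loss(logits_per_position, target_ids, masked_positions):
    """Sum of -log P(x_i | unmasked context) over masked positions only."""
    loss = 0.0
    for i in masked_positions:
        log_probs = log_softmax(logits_per_position[i])
        loss -= log_probs[target_ids[i]]
    return loss

# Hypothetical toy setup: vocabulary of 4 token ids, sequence of 3 positions.
logits = [
    [2.0, 0.0, 0.0, 0.0],  # position 0 (not masked, ignored by the loss)
    [0.0, 0.0, 0.0, 0.0],  # position 1 (masked, model is maximally unsure)
    [0.0, 0.0, 3.0, 0.0],  # position 2 (not masked, ignored by the loss)
]
targets = [0, 1, 2]        # original token ids x_i
masked = [1]               # the masked set M

loss = mlm_loss(logits, targets, masked)
print(loss)
```

Because the logits at position 1 are uniform over four tokens, the loss here equals log 4. Frameworks usually average over the masked positions rather than summing, which changes the scale but not the minimizer.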

Why It Works


Masking Strategy

Standard BERT-Style Corruption
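
A minimal sketch of BERT-style corruption follows. It assumes the recipe from the original BERT setup: about 15% of tokens are selected for prediction, and of those 80% become `[MASK]`, 10% become a random vocabulary token, and 10% stay unchanged. The function name `bert_corrupt`, the toy vocabulary, and the per-token coin flip are illustrative simplifications; real pipelines also skip special tokens and often cap the number of predictions per sequence.

```python
import random

MASK = "[MASK]"

def bert_corrupt(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels).

    labels[i] holds the original token at positions selected for
    prediction and None elsewhere, so the loss can skip unselected
    positions.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # not selected: left untouched
        labels[i] = tok                       # selected: model must predict it
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: random token
        # remaining 10%: keep the original token in place
    return corrupted, labels

# Hypothetical toy input; a real pipeline would work on subword ids.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = bert_corrupt(tokens, vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

Returning the labels alongside the corrupted sequence keeps the selection decision and the prediction targets in sync, which is what the loss needs: positions with a `None` label are ignored.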

Purpose of Mixed Corruption


Step-by-Step Workflow

Step 1: Prepare corpus

Step 2: Build MLM training examples

Step 3: Pretrain encoder

Step 4: Adapt to downstream task


Practical Notes

Choose masking policy carefully

Use domain-adaptive pretraining when needed

Remember MLM limitations
