IOAI ML Notes Natural Language Processing

Encoder-Decoder Models

Sequence-to-sequence models for translation and multimodal tasks.

Syllabus Map


Overview


Core Idea

$$P(y_{1:T}\mid x)=\prod_{t=1}^{T}P(y_t\mid y_{<t},x)$$
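To make the factorization concrete, here is a minimal sketch that scores a target sequence by summing per-step conditional log-probabilities. The function `toy_next_token_probs` is a hypothetical stand-in for a trained decoder (it simply favors copying the source over a 4-token vocabulary); it is an assumption for illustration, not part of any real model.

```python
import math

def toy_next_token_probs(prefix, source):
    """Hypothetical decoder: returns P(y_t | y_<t, x) over a 4-token vocabulary."""
    vocab_size = 4
    t = len(prefix)                      # position being predicted
    probs = [0.1] * vocab_size           # small mass on every token
    favored = source[t] if t < len(source) else 0
    probs[favored] = 0.7                 # 0.7 + 3 * 0.1 = 1.0
    return probs

def sequence_log_prob(target, source, next_token_probs=toy_next_token_probs):
    """log P(y_{1:T} | x) = sum over t of log P(y_t | y_<t, x)."""
    log_prob = 0.0
    for t, token in enumerate(target):
        probs = next_token_probs(target[:t], source)  # condition on the prefix y_<t
        log_prob += math.log(probs[token])
    return log_prob

source = [2, 1, 3]
target = [2, 1, 0]   # matches the source twice, then deviates
lp = sequence_log_prob(target, source)
print(lp, math.exp(lp))
```

Working in log space turns the product into a sum, which avoids numerical underflow on long sequences.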

Transformer Encoder-Decoder Mechanics

Encoder

Decoder

$$\text{CrossAttn}(Q,K,V)=\text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$
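The cross-attention formula can be sketched in plain Python as follows. This is a simplified single-head version under stated assumptions: queries are taken to come from decoder states and keys/values from encoder outputs, and the learned projection matrices and multiple heads that real Transformers use are omitted. All names here are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """queries: T_dec x d_k; keys: T_enc x d_k; values: T_enc x d_v."""
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Scaled dot product of this decoder query with every encoder key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Weighted average of the encoder values.
        out = [sum(w_j * v[c] for w_j, v in zip(w, values)) for c in range(d_v)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

queries = [[1.0, 0.0], [0.0, 1.0]]                       # 2 decoder positions
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]              # 3 encoder positions
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs, weights = cross_attention(queries, keys, values)
for row in weights:
    print([round(w, 3) for w in row])
```

Each decoder position gets its own distribution over encoder positions, so the output has one row per query and one column per value dimension.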

Training Objective

$$\mathcal{L}=-\sum_{t=1}^{T}\log P_\theta(y_t^\star\mid y_{<t}^\star,x)$$
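As a rough sketch of this objective, the snippet below computes the negative log-likelihood of the gold tokens from per-step logits. It assumes teacher forcing, where the decoder input is the gold target shifted right by one position; `shift_right`, `teacher_forced_nll`, and the all-zero logits are hypothetical placeholders for what a real model would produce.

```python
import math

def log_softmax(logits):
    """Log-probabilities from raw scores, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]

def shift_right(target_ids, bos_id):
    """Decoder inputs under teacher forcing: BOS followed by the gold prefix."""
    return [bos_id] + target_ids[:-1]

def teacher_forced_nll(step_logits, target_ids):
    """Sum over t of -log P(y*_t | y*_<t, x); step_logits[t] scores position t."""
    if len(step_logits) != len(target_ids):
        raise ValueError("need one logit vector per target token")
    return -sum(log_softmax(logits)[y] for logits, y in zip(step_logits, target_ids))

target_ids = [3, 1, 2]
decoder_inputs = shift_right(target_ids, bos_id=0)   # what the decoder would be fed
step_logits = [[0.0, 0.0, 0.0, 0.0]] * len(target_ids)  # placeholder: uniform predictions
loss = teacher_forced_nll(step_logits, target_ids)
print(decoder_inputs, loss)
```

With uniform predictions over a vocabulary of size V, the loss is T times log V, which is a handy sanity check for an untrained model.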

Inference Workflow

Step 1: Encode source sequence

Step 2: Start decoding

Step 3: Search strategy

Step 4: Stop condition
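The four steps above can be sketched as a greedy decoding loop. Everything model-related here is a toy assumption: `toy_encode` and `toy_decoder_step` are hypothetical stand-ins that copy the source and then emit an end-of-sequence token, and the special token ids are arbitrary.

```python
def toy_encode(source_ids):
    """Hypothetical encoder: the 'memory' is just a copy of the source ids."""
    return list(source_ids)

def toy_decoder_step(memory, prefix, vocab_size, eos_id):
    """Hypothetical decoder step: scores that copy the source, then favor EOS."""
    t = len(prefix) - 1                  # tokens generated so far (prefix starts with BOS)
    scores = [0.0] * vocab_size
    next_id = memory[t] if t < len(memory) else eos_id
    scores[next_id] = 1.0
    return scores

def greedy_decode(source_ids, bos_id=0, eos_id=1, vocab_size=6, max_len=10):
    memory = toy_encode(source_ids)      # Step 1: encode the source once
    prefix = [bos_id]                    # Step 2: start from the BOS token
    while len(prefix) - 1 < max_len:     # Step 4: stop at the length cap
        scores = toy_decoder_step(memory, prefix, vocab_size, eos_id)
        # Step 3: greedy search picks the single highest-scoring token.
        next_id = max(range(vocab_size), key=scores.__getitem__)
        prefix.append(next_id)
        if next_id == eos_id:            # Step 4: stop when EOS is produced
            break
    return prefix[1:]                    # drop BOS

output = greedy_decode([4, 2, 5])
print(output)
```

Greedy search keeps only one hypothesis per step; beam search would instead keep several prefixes and expand each of them, trading extra compute for a wider search.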


Use Cases


Practical Notes

Decoding Tradeoffs

Training-Inference Mismatch

Long-Context Cost
