IOAI ML Notes: Audio Processing, Deep Learning

Pre-trained Audio Encoders

Self-supervised and supervised audio encoders for speech and general audio tasks.

Syllabus Map


Overview


Pre-trained Audio Encoders

Common Input Representations
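As a rough sketch of how the two common input types differ in length: wav2vec 2.0-style encoders consume the raw 16 kHz waveform and downsample it with a stack of strided 1-D convolutions, while Whisper-style encoders consume a log-mel spectrogram computed with a 10 ms hop. The kernel and stride values below are those reported for the wav2vec 2.0 feature encoder; the function names are made up for this illustration, and the log-mel count assumes framing that yields exactly one frame per hop.

```python
# Illustrative helpers (hypothetical names), standard library only.

# Kernel sizes and strides of the wav2vec 2.0 convolutional feature encoder.
W2V2_KERNELS = (10, 3, 3, 3, 3, 2, 2)
W2V2_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def conv_stack_frames(num_samples, kernels=W2V2_KERNELS, strides=W2V2_STRIDES):
    """Number of output frames after a stack of unpadded strided 1-D convolutions."""
    length = num_samples
    for k, s in zip(kernels, strides):
        if length < k:
            return 0  # input too short for this layer
        length = (length - k) // s + 1
    return length


def log_mel_frames(num_samples, hop_length=160):
    """Number of spectrogram frames for a given hop (160 samples = 10 ms at 16 kHz)."""
    return num_samples // hop_length


one_second = 16_000
print(conv_stack_frames(one_second))    # 49 frames, roughly one every 20 ms
print(log_mel_frames(30 * one_second))  # 3000 frames for a 30-second window
```

The practical consequence is that frame-level labels or masks have to be aligned to the encoder's output rate, not to the sample rate.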

Example Model Families

wav2vec 2.0

HuBERT

WavLM

Whisper Encoder


Self-Supervised Pretraining Idea

$$\mathcal{L}_{\text{SSL}} = \mathcal{L}_{\text{contrastive or classification}} + \lambda\,\mathcal{L}_{\text{diversity/regularization}}$$
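To make the two terms concrete, here is a toy sketch that assumes an InfoNCE-style contrastive term and a negative-entropy diversity term, loosely in the spirit of wav2vec 2.0. The function names, the temperature, and the default value of `lam` (standing in for λ) are illustrative assumptions, not values taken from any particular model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(context, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: identify the true target among distractors."""
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]  # -log softmax probability of the positive


def diversity_loss(code_probs):
    """Negative entropy of average codebook usage (lower means more even usage)."""
    return sum(p * math.log(p) for p in code_probs if p > 0)


def ssl_loss(context, positive, negatives, code_probs, lam=0.1):
    """Combined objective: contrastive term plus lam times the diversity term."""
    return (contrastive_loss(context, positive, negatives)
            + lam * diversity_loss(code_probs))
```

A classification-style objective (as in HuBERT-like masked prediction) would replace `contrastive_loss` with a cross-entropy over discrete target units; the weighted-sum structure stays the same.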

Step-by-Step Usage

Step 1: Choose encoder by task

Step 2: Prepare features

Step 3: Extract embeddings
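Encoders emit one vector per frame, so a common way to get a single utterance-level embedding is to average the frames while ignoring padding. The sketch below assumes the frame vectors are already available as plain lists and that the number of valid (non-padded) frames is known; `mean_pool` is a hypothetical helper name.

```python
def mean_pool(frames, num_valid):
    """Average the first num_valid frame vectors into one utterance vector."""
    if not 0 < num_valid <= len(frames):
        raise ValueError("num_valid must be between 1 and len(frames)")
    dim = len(frames[0])
    valid = frames[:num_valid]  # drop trailing padding frames
    return [sum(f[d] for f in valid) / num_valid for d in range(dim)]


frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # last frame is padding
print(mean_pool(frames, num_valid=2))  # [2.0, 3.0]
```

Averaging over padded frames as well would pull the embedding toward zero for short clips, which is why the valid length is passed explicitly.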

Step 4: Train downstream head
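One simple downstream head is a linear probe: the encoder stays frozen and only a small classifier is fit on its embeddings. The sketch below assumes a binary task and uses plain batch gradient descent on logistic regression; the function names, learning rate, and toy embeddings are illustrative only.

```python
import math


def train_linear_probe(embeddings, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression on fixed embeddings with batch gradient descent."""
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y
            for d in range(dim):
                grad_w[d] += err * x[d] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy utterance embeddings (made up), two per class.
X = [[1.0, 0.2], [0.9, 0.1], [-1.0, -0.3], [-0.8, 0.0]]
y = [1, 1, 0, 0]
w, b = train_linear_probe(X, y)
print([predict(w, b, x) for x in X])
```

In practice the same idea is usually expressed with a framework's linear layer and optimizer; the point here is only that the head is small and the embeddings are treated as fixed inputs.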

Step 5: Fine-tune selectively
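A common reading of selective fine-tuning is to keep the convolutional front end and lower transformer layers frozen and update only the top few layers plus the task head. The sketch below only decides which parameters to train, given their names; the naming scheme (`encoder.layers.<i>.`, `head.`) is a hypothetical convention for this example and will differ between libraries.

```python
import re


def trainable_names(param_names, num_layers, unfreeze_top_k):
    """Select parameters in the top-k transformer layers and in the task head."""
    first_trainable = num_layers - unfreeze_top_k
    selected = []
    for name in param_names:
        match = re.search(r"layers\.(\d+)\.", name)
        if match is None:
            # Not a transformer layer: train it only if it belongs to the head.
            if name.startswith("head."):
                selected.append(name)
        elif int(match.group(1)) >= first_trainable:
            selected.append(name)
    return selected


names = [
    "feature_extractor.conv0.weight",
    "encoder.layers.0.attn.weight",
    "encoder.layers.10.attn.weight",
    "encoder.layers.11.attn.weight",
    "head.weight",
]
print(trainable_names(names, num_layers=12, unfreeze_top_k=2))
```

With a real model, the returned names would be used to decide which parameters receive gradient updates; everything else stays at its pre-trained value.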


Evaluation
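For speech recognition, one widely used metric is word error rate (WER): the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; other tasks (classification, speaker verification) would use different metrics.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.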


Practical Notes

Data and Domain

Serving Constraints
