Overview
- Autoencoders learn to compress data into a latent code and reconstruct the original input.
- They are trained without labels, making them useful for representation learning.
- Common uses include dimensionality reduction, denoising, and anomaly detection.
- The bottleneck forces the model to capture the most salient structure in the data.
Core Idea
- Learn an encoder $f_\theta$ that maps input $x$ to a compact latent vector $z = f_\theta(x)$.
- Learn a decoder $g_\phi$ that reconstructs $\hat{x} = g_\phi(z)$ from $z$.
- Train by minimising reconstruction error so that $\hat{x} \approx x$.
How It Works
Step 1: Encode the input
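- Start with input vector $x \in \mathbb{R}^{d_x}$.
- Compute latent code $z = f_\theta(x) \in \mathbb{R}^{d_z}$ with the encoder $f_\theta$.
- The bottleneck dimension $d_z$ is typically smaller than the input dimension $d_x$.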
Step 2: Decode the latent code
- Reconstruct $\hat{x} = g_\phi(z)$ from the latent code $z$ with the decoder $g_\phi$.
- The decoder typically mirrors the encoder or uses a task-specific head.
- Output shape matches the input shape.
Step 3: Measure reconstruction error
- Compare the input $x$ and the reconstruction $\hat{x}$ with a suitable loss.
- Typical losses: MSE for real-valued data, BCE for binary data.
- The loss defines what details are prioritised in reconstruction.
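The two losses named above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function names `mse_loss` and `bce_loss`, the averaging over all elements, and the clipping constant `eps` are assumptions made for this example.

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared error, averaged over every element."""
    return float(np.mean((x - x_hat) ** 2))

def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy for targets in [0, 1] and predictions in (0, 1)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # keep log() finite
    per_element = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    return float(np.mean(per_element))

# Toy binary input and an imperfect reconstruction.
x = np.array([[0.0, 1.0, 1.0, 0.0]])
x_hat = np.array([[0.1, 0.9, 0.8, 0.2]])

print("MSE:", mse_loss(x, x_hat))
print("BCE:", bce_loss(x, x_hat))
```

MSE penalises squared distance uniformly, while BCE penalises confident wrong predictions much more heavily, which is one way the loss choice shapes what the model prioritises.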
Step 4: Update parameters
- Backpropagate the reconstruction loss through decoder and encoder.
- Update the encoder and decoder parameters $\theta$ and $\phi$ with gradient descent.
- Repeat over mini-batches until convergence.
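The four steps above can be put together in a small NumPy sketch. It assumes a purely linear encoder and decoder without biases, synthetic data lying near a 2-D subspace, and hand-written gradients for the MSE loss. All names (`W_enc`, `W_dec`, `reconstruct`) and hyperparameters are illustrative choices, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 samples in 8-D that lie near a 2-D linear subspace.
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(256, 2)) @ basis + 0.01 * rng.normal(size=(256, 8))

d_x, d_z = X.shape[1], 2
W_enc = rng.normal(scale=0.1, size=(d_x, d_z))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(d_z, d_x))  # decoder weights
lr, batch_size = 0.05, 32

def reconstruct(x):
    """Encode then decode with the current weights."""
    return (x @ W_enc) @ W_dec

loss_before = float(np.mean((reconstruct(X) - X) ** 2))

for epoch in range(200):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        x = X[order[start:start + batch_size]]
        z = x @ W_enc                            # Step 1: encode
        x_hat = z @ W_dec                        # Step 2: decode
        err = x_hat - x                          # Step 3: residual for MSE
        grad_out = 2.0 * err / err.size          # d(MSE)/d(x_hat)
        grad_W_dec = z.T @ grad_out              # Step 4: backprop through decoder
        grad_W_enc = x.T @ (grad_out @ W_dec.T)  # ... and through encoder
        W_dec -= lr * grad_W_dec                 # gradient descent updates
        W_enc -= lr * grad_W_enc

loss_after = float(np.mean((reconstruct(X) - X) ** 2))
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In practice an autodiff framework would compute the gradients and the layers would be nonlinear; the explicit version here only makes the data flow of each step visible.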
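Objective / Formula
- Minimise the average reconstruction loss over encoder parameters $\theta$ and decoder parameters $\phi$: $\min_{\theta, \phi} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(x_i, g_\phi(f_\theta(x_i))\big)$.
- With MSE, $\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2$.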
Practical Notes
Bottleneck Size
- Undercomplete autoencoders (small latent dimension $d_z$) encourage compact, informative codes.
- Overcomplete autoencoders usually need regularization (for example sparsity or dropout) to avoid learning the identity mapping.
Evaluation
- Strong reconstruction does not always imply useful representations for downstream tasks.
- Validate latent quality with task-relevant metrics, not only reconstruction loss.
Architecture Choice
- For image data, convolutional encoders/decoders usually outperform fully connected architectures.
- Match the architecture to data structure (spatial, sequential, tabular) for better representations.
Types of Autoencoders
Undercomplete
Core Mechanism
- Bottleneck forces compression.
- Encoder must discard redundant features and keep only information needed for reconstruction.
Strengths
- Good for dimensionality reduction and compact representations.
- Often learns a useful low-dimensional manifold with simple MLP/CNN blocks.
Limitations
- Strong compression can remove fine details.
- May underfit if the bottleneck is too narrow.
Overcomplete
Core Mechanism
- A latent dimension at least as large as the input ($d_z \ge d_x$) gives high latent capacity.
- Without constraints, the encoder-decoder pair can approximate identity mapping.
Strengths
- Can preserve more detail than strict bottlenecks.
- Useful when paired with strong regularization.
Limitations
- Needs regularization (sparsity, noise, weight decay, dropout) to avoid trivial copying.
- Representation quality can look good by reconstruction but be weak for downstream tasks.
Sparse
Core Mechanism
- Add a sparsity penalty (for example an $L_1$ penalty on the latent activations $z$, or a KL sparsity constraint on their mean activation).
- Encourages only a few active latent units per sample.
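Both kinds of sparsity penalty can be sketched in a few lines of NumPy. The function names, the target activation `rho = 0.05`, and the assumption that latent activations lie in (0, 1) (for example sigmoid outputs) for the KL variant are illustrative choices for this example.

```python
import numpy as np

def l1_penalty(z):
    """Mean absolute activation over batch and latent units."""
    return float(np.mean(np.abs(z)))

def kl_sparsity_penalty(z, rho=0.05, eps=1e-7):
    """Sum over latent units of KL(rho || rho_hat_j).

    rho_hat_j is the mean activation of unit j over the batch.
    Assumes activations in (0, 1), e.g. sigmoid outputs.
    """
    rho_hat = np.clip(z.mean(axis=0), eps, 1.0 - eps)
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return float(np.sum(kl))

z_sparse = np.full((4, 3), 0.05)  # units rarely active: matches the target
z_dense = np.full((4, 3), 0.5)    # units active half the time

print("KL penalty, sparse codes:", kl_sparsity_penalty(z_sparse))
print("KL penalty, dense codes: ", kl_sparsity_penalty(z_dense))
print("L1 penalty, dense codes: ", l1_penalty(z_dense))
```

Either penalty would be added to the reconstruction loss with a weight, which is the quantity that needs tuning.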
Strengths
- More interpretable latent codes.
- Better robustness to irrelevant features.
Limitations
- Overly strong sparsity can hurt reconstruction quality.
- Requires tuning sparsity weight carefully.
Denoising
Core Mechanism
- Corrupt the input $x$ into $\tilde{x}$ and reconstruct the clean $x$.
- Common corruption: Gaussian noise, masking, salt-and-pepper noise.
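The three corruption types listed above might look like this in NumPy. The function names, the default value range `[0, 1]` for salt-and-pepper noise, and the noise levels are assumptions for illustration only.

```python
import numpy as np

def add_gaussian_noise(x, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return x + sigma * rng.normal(size=x.shape)

def mask_inputs(x, drop_prob, rng):
    """Set each element to zero independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def salt_and_pepper(x, prob, rng, low=0.0, high=1.0):
    """Replace a fraction prob of elements: half with low, half with high."""
    u = rng.random(x.shape)
    out = x.copy()
    out[u < prob / 2] = low        # pepper
    out[u > 1 - prob / 2] = high   # salt
    return out

rng = np.random.default_rng(0)
x = rng.random((4, 6))                       # clean batch in [0, 1)
x_tilde = mask_inputs(x, drop_prob=0.3, rng=rng)

# A denoising autoencoder is fed x_tilde but its loss is computed against x.
print("fraction masked:", float(np.mean(x_tilde == 0)))
```

The key detail is the training pair: the corrupted array is the model input, and the clean array remains the reconstruction target.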
Strengths
- Learns features stable under perturbations.
- Improves robustness to noisy or partially missing inputs.
Limitations
- Performance depends on corruption type and strength.
- Mismatch between training corruption and real noise hurts transfer.
Contractive
Core Mechanism
- Penalize encoder sensitivity using a Jacobian norm term.
- Pushes nearby inputs to map to nearby latent codes.
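For a single sigmoid encoder layer, the Jacobian norm has a closed form, which the sketch below uses. It assumes an encoder of the form `z = sigmoid(x @ W + b)`; the function names and shapes are illustrative, and deeper encoders would generally need autodiff to obtain the Jacobian.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of dz/dx for z = sigmoid(x @ W + b).

    For this layer dz_j/dx_i = z_j * (1 - z_j) * W[i, j], so the squared norm
    is sum_j (z_j * (1 - z_j))^2 * sum_i W[i, j]^2. Averaged over the batch.
    """
    z = sigmoid(x @ W + b)             # (batch, d_z)
    dz = z * (1.0 - z)                 # sigmoid derivative per latent unit
    w_sq = np.sum(W ** 2, axis=0)      # (d_z,), summed over input dimensions
    return float(np.mean(np.sum(dz ** 2 * w_sq, axis=1)))

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
b = np.zeros(3)
x = rng.normal(size=(10, 5))
print("contractive penalty:", contractive_penalty(x, W, b))
```

The penalty would be added to the reconstruction loss with a weight; the closed form avoids building the full Jacobian, which is where the extra compute cost usually comes from.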
Strengths
- Learns locally invariant, smooth features.
- Can improve robustness around the data manifold.
Limitations
- Jacobian penalties add compute cost.
- Too much contraction can collapse useful variation.
Variational (VAE)
Core Mechanism
- Learn a latent distribution $q(z \mid x)$ with KL regularization toward a prior $p(z)$.
- Encoder outputs $\mu$ and $\sigma$; sample $z$ with reparameterization.
Reparameterization Trick
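- In a VAE, direct sampling $z \sim q(z \mid x)$ blocks standard backpropagation through stochastic nodes.
- Rewrite sampling as: $z = \mu + \sigma \odot \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$.
- Randomness is isolated in $\epsilon$, while $\mu$ and $\sigma$ stay differentiable.
- In practice, the encoder outputs $\mu$ and $\log \sigma^2$, with $\sigma = \exp(\tfrac{1}{2} \log \sigma^2)$.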
- This enables low-variance gradient estimates and practical VAE training.
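A minimal NumPy sketch of the reparameterized sample and the matching KL term is given below. It assumes a diagonal Gaussian posterior and a standard normal prior; the function names are illustrative. NumPy has no autodiff, so the code only shows the forward computation that a framework would differentiate through.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)   # all randomness lives here
    sigma = np.exp(0.5 * log_var)         # sigma recovered from log-variance
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.array([[1.0, -2.0]])
log_var = np.log(np.array([[0.25, 4.0]]))  # sigma = [0.5, 2.0]

samples = np.concatenate(
    [reparameterize(mu, log_var, rng) for _ in range(20000)]
)
print("empirical mean:", samples.mean(axis=0))
print("empirical std: ", samples.std(axis=0))
print("KL per sample: ", kl_to_standard_normal(mu, log_var))
```

Predicting the log-variance rather than the standard deviation keeps the encoder output unconstrained while guaranteeing a positive sigma.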
Strengths
- Enables sampling and generation.
- Latent space is smoother and more structured than deterministic AEs.
Limitations
- Reconstructions can be blurrier than deterministic models.
- KL-reconstruction balance can be hard to tune (posterior collapse risk).
Convolutional
Core Mechanism
- Use convolutional encoders/decoders for spatial data.
- Downsample with strided convolutions or pooling, then decode back.
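To see how strided downsampling and the matching decode step fit together, the sketch below tracks spatial sizes only. It uses the standard output-size formulas for convolution and transposed convolution without dilation; the helper names and the specific kernel, stride, and padding values are assumptions chosen for a 28x28 input.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_transpose_out(n, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (no dilation)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

# Encoder: two stride-2 convolutions halve the resolution twice.
enc_sizes = [28]
for _ in range(2):
    enc_sizes.append(conv_out(enc_sizes[-1], kernel=3, stride=2, padding=1))

# Decoder: two stride-2 transposed convolutions undo the downsampling.
dec_sizes = [enc_sizes[-1]]
for _ in range(2):
    dec_sizes.append(
        conv_transpose_out(dec_sizes[-1], kernel=3, stride=2,
                           padding=1, output_padding=1)
    )

print("encoder sizes:", enc_sizes)
print("decoder sizes:", dec_sizes)
```

Checking these sizes before building the model helps ensure the decoder output shape matches the input shape, which the reconstruction loss requires.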
Strengths
- Preserves locality and scales well to images.
- Learns hierarchical features (edges, textures, objects).
Limitations
- Can lose global context without attention or large receptive fields.
- Transposed-convolution ("deconvolution") layers can introduce checkerboard artifacts that affect output quality.
Sequence
Core Mechanism
- Use RNN or Transformer encoder-decoder for sequential inputs.
- Encode variable-length sequences into latent states or token embeddings.
Strengths
- Works well for text, speech, and time series.
- Attention-based variants preserve long-range dependencies better than plain RNNs.
Limitations
- Sequence compression can discard fine temporal/token details.
- Training can be memory-heavy for long contexts.
Adversarial (AAE)
Core Mechanism
- Match the aggregated latent distribution $q(z)$ to a target prior $p(z)$ with a discriminator.
- Replace explicit KL with adversarial latent regularization.
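The adversarial regularization can be sketched as two loss terms computed from discriminator outputs. The example assumes a discriminator that outputs the probability that a code came from the prior, and it uses the non-saturating encoder loss; both the function names and that loss variant are illustrative assumptions, and the discriminator network itself is omitted.

```python
import numpy as np

def discriminator_loss(d_prior, d_latent, eps=1e-7):
    """BCE with prior samples labelled 1 and encoder codes labelled 0."""
    d_prior = np.clip(d_prior, eps, 1.0 - eps)
    d_latent = np.clip(d_latent, eps, 1.0 - eps)
    return float(-np.mean(np.log(d_prior)) - np.mean(np.log(1.0 - d_latent)))

def encoder_adversarial_loss(d_latent, eps=1e-7):
    """Non-saturating loss: the encoder wants its codes classified as prior."""
    d_latent = np.clip(d_latent, eps, 1.0 - eps)
    return float(-np.mean(np.log(d_latent)))

# Discriminator outputs for a batch of prior samples and encoder codes.
confident = dict(d_prior=np.full(8, 0.99), d_latent=np.full(8, 0.01))
undecided = dict(d_prior=np.full(8, 0.5), d_latent=np.full(8, 0.5))

for name, d in [("confident", confident), ("undecided", undecided)]:
    print(name,
          "disc loss:", round(discriminator_loss(**d), 4),
          "encoder loss:", round(encoder_adversarial_loss(d["d_latent"]), 4))
```

Training would alternate between a reconstruction update, a discriminator update, and an encoder update on the adversarial term, which is where the balancing difficulty arises.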
Strengths
- Flexible prior matching and often sharper samples.
- Combines reconstruction quality with generative latent structure.
Limitations
- Inherits adversarial training instability.
- Requires balancing reconstruction and discriminator objectives.