Syllabus Map
- Study map: Syllabus Study Map
Overview
- Multi-layer perceptrons (MLPs) are fully connected neural networks that map input features to outputs through stacked linear layers and non-linear activations.
- They learn non-linear decision boundaries, making them a strong baseline for tabular data and small to mid‑sised feature sets.
- MLPs are simple, fast to train, and easy to scale, but they do not model spatial or sequential structure well.
- Key design levers are depth, width, activation functions, and regularisation.
Core idea
- An MLP composes multiple affine transformations with non-linear activations to approximate complex functions.
- It transforms an input vector into a prediction through layers:
. - Benefits:
- Universal function approximatour in theory (with sufficient width).
- Strong baseline for tabular classification and regression.
- Straightforward to implement, debug, and deploy.
- Drawbacks:
- Poor inductive bias for images, text, and sequences.
- Can overfit without careful regularisation.
- Dense layers scale poorly with very high‑dimensional inputs.
How it works
Step 1: Define the input and output
- Input (feature vector).
- Output is either:
- Regression: real‑valued vector, or
- Classification: class scores (logits).
Step 2: Compute the hidden layers
- Each layer applies:
- At neuron level:
- is a non‑linear activation (e.g., ReLU, tanh, GELU).
- Weights are fully connected, so every input connects to every neuron.
Step 3: Produce the output
- Final layer output:
- Regression:
- Classification: logits then softmax:
Step 4: Compute the loss
- Choose a loss function:
- MSE for regression:
- Cross‑entropy for classification:
- The loss measures prediction Error across the batch.
Step 5: Backpropagate gradients
- Use the chain rule to compute gradients: and .
- Gradients flow backwards from output to input.
- Optimisers (SGD, Adam) update parameters to reduce loss.
Step 6: Iterate with batches and epochs
- Training proceeds over mini‑batches for stable gradient estimates.
- Multiple epochs are used to fit the dataset.
- Monitour validation loss to detect overfitting.
Practical usage
Pros and cons
- Pros
- Simple and flexible baseline for many structured problems.
- Efficient on CPUs and small GPUs.
- Works well with engineered features.
- Cons
- Not data‑efficient for images or sequences.
- Can be sensitive to feature scaling.
- Dense layers can be memory‑heavy at scale.
Implications and usage
- Use MLPs for tabular classification/regression, embeddings, and small‑scale tasks.
- Pair with normalisation and regularisation for stability.
- For images or text, consider CNNs or transformers unless the input is already vectorised.