Syllabus Map
- Study map: Syllabus Study Map
k-Nearest Neighbours
Definition
- The k-nearest neighbours (KNN) algorithm is a non-parametric, supervised learning classifier.
- It uses proximity to make classifications or predictions about the grouping of an individual data point.
Distance Metrics
Euclidean Distance
- Calculates the straight-line distance of a point from another point , where is the -th dimension :
Manhattan Distance
- Calculates the absolute distance of a point from another point , where is the -th dimension :
Minkowski Distance
Inference
-
k-Nearest Neighbours is a lazy learner, meaning there is no process of discrete training and saving of weights.
-
This makes inference simpler at the cost of performance.
-
Consider the dataset :
- For the subset of points with class
K-Nearest Neighbours In Practice
When to Use K-Nearest Neighbours
- When the training data changes often and retraining cost should be minimal.
- When example-based explanations are useful for stakeholders.
- When local neighbourhoods reflect meaningful similarity.
- When you can trade slower inference for simple modelling.
When Not to Use K-Nearest Neighbours
- When features have incompatible scales that cannot be normalised.
- When class imbalance makes local voting unreliable.
- When memory constraints prevent storing the full dataset.
- When privacy constraints disallow retaining raw examples.
Practical Notes
Efficiency and Distance
- Use KD-tree or ball-tree search for speed in low dimensions.
- Weight neighbours by distance to reduce boundary noise.
Hyperparameters and Dimensionality
- Tune and the metric with cross-validation.
- Consider dimensionality reduction before KNN in high dimensions.