IOAI ML Notes Classical Machine LearningUnsupervised Learning

DBSCAN, Hierarchical & Spectral Clustering

A concise guide to density-based, hierarchical, and spectral clustering.

Syllabus Map


Overview


DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Core Idea

How It Works (Step-by-Step)

Step 1: Define Neighborhood and Density Threshold

Nε(xi)={xjd(xi,xj)ε}N_\varepsilon(x_i)=\{x_j \mid d(x_i,x_j)\le \varepsilon\}

Step 2: Identify Core Points

Nε(xi)min_samples|N_\varepsilon(x_i)| \ge \text{min\_samples}

Step 3: Expand Clusters from Core Points

Step 4: Attach Border Points

Step 5: Label Remaining Points as Noise

Key Hyperparameters

Practical Notes

Handles outliers well.

Struggles with varying densities.

Scale features before DBSCAN.


Hierarchical Clustering

Core Idea

How It Works (Step-by-Step)

Agglomerative Clustering (Bottom-Up)

Step 1: Compute Pairwise Distances

dij=d(xi,xj)d_{ij}=d(x_i,x_j)

Step 2: Initialise Singleton Clusters

C(0)={{x1},{x2},,{xn}}\mathcal{C}^{(0)}=\{\{x_1\},\{x_2\},\dots,\{x_n\}\}

Step 3: Measure Inter-Cluster Distance with a Linkage Rule

Step 4: Merge Closest Clusters Iteratively

Step 5: Cut the Dendrogram

Divisive Clustering (Top-Down)

Step 1: Start with One Cluster

C(0)={{x1,x2,,xn}}\mathcal{C}^{(0)}=\{\{x_1,x_2,\dots,x_n\}\}

Step 2: Select a Cluster to Split

Step 3: Split the Selected Cluster

Step 4: Repeat Recursive Splitting

Step 5: Read Final Clusters from the Dendrogram

Linkage Criteria

Practical Notes

No need to pre-specify number of clusters.

Can be expensive for large datasets.

Choice of linkage changes cluster shape.


Spectral Clustering

Core Idea

How It Works (Step-by-Step)

Step 1: Build an Affinity (Similarity) Matrix

Wij=exp(xixj22σ2)W_{ij}=\exp\left(-\frac{\|x_i-x_j\|^2}{2\sigma^2}\right)

Step 2: Compute Degree Matrix and Graph Laplacian

Dii=jWijD_{ii}=\sum_j W_{ij} L=DWL=D-W Lsym=ID1/2WD1/2L_{\text{sym}}=I-D^{-1/2}WD^{-1/2}

Step 3: Solve Eigenproblem

Step 4: Embed Points in Spectral Space

Step 5: Cluster in the Embedded Space

Why It Helps

Practical Notes

Good for non-convex clusters.

Requires choosing number of clusters.

Sensitive to graph construction.

← Back to Blog