
Overview

Linear regression is one of the simplest and most fundamental regression algorithms in machine learning.
It models the relationship between an input variable $x$ and an output variable $y$ by fitting the straight line that best represents the data. For the $i$-th example, the prediction is:

$$\hat{y}_i = w x_i + b$$

Where:
- $w$ : weight (slope)
- $b$ : bias (intercept)
- $\hat{y}_i$ : predicted value for input $x_i$
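
As a minimal sketch of the prediction formula above, the line can be evaluated for a few inputs. The function name `predict` and the parameter values `w = 2.0`, `b = 1.0` are assumptions chosen only for illustration.

```python
def predict(x, w, b):
    """Return the prediction y_hat = w * x + b for a single input x."""
    return w * x + b


# Hypothetical parameter values, chosen only for illustration.
w, b = 2.0, 1.0
xs = [0.0, 1.0, 2.0, 3.0]

# One prediction per input: each is slope * input + intercept.
y_hat = [predict(x, w, b) for x in xs]
print(y_hat)  # [1.0, 3.0, 5.0, 7.0]
```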

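To measure how well the model fits the data, we use the Mean Squared Error (MSE) cost function over the $m$ training examples. The extra factor of $\frac{1}{2}$ is a convenience: it cancels the 2 produced when the square is differentiated.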

$$C = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2$$

Substituting $\hat{y}_i = w x_i + b$:

$$C = \frac{1}{2m} \sum_{i=1}^{m} (w x_i + b - y_i)^2$$
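
The cost can be computed directly from its definition. The sketch below assumes a hypothetical helper named `cost` and a tiny made-up dataset generated from $y = 2x + 1$, so the cost is exactly zero at $w = 2$, $b = 1$.

```python
def cost(xs, ys, w, b):
    """Half mean squared error: C = 1/(2m) * sum((w*x_i + b - y_i)^2)."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Made-up data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

perfect = cost(xs, ys, 2.0, 1.0)  # the true line, so every error is 0
naive = cost(xs, ys, 0.0, 0.0)    # errors are -3, -5, -7, so C = (9 + 25 + 49) / 6

print(perfect)  # 0.0
print(naive)    # about 13.83
```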

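Gradient descent minimizes $C$ by repeatedly moving each parameter a small step against its gradient, where $\alpha$ is the learning rate:

$$w = w - \alpha \frac{\partial C}{\partial w}$$

$$b = b - \alpha \frac{\partial C}{\partial b}$$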

Derivative of the cost with respect to the prediction:

$$\frac{\partial C}{\partial \hat{y}_i} = \frac{1}{m} (\hat{y}_i - y_i)$$

Derivative of the prediction with respect to the parameters:

$$\frac{\partial \hat{y}_i}{\partial w} = x_i, \quad \frac{\partial \hat{y}_i}{\partial b} = 1$$

Applying the chain rule and summing over all $m$ examples:
- Gradient with respect to $w$: $\frac{\partial C}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i) x_i$
- Gradient with respect to $b$: $\frac{\partial C}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)$
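
One way to sanity-check the two gradient formulas is to compare them against a finite-difference approximation of the cost. The helper names `cost` and `gradients`, the dataset, and the step size `eps` below are assumptions made for this sketch.

```python
def cost(xs, ys, w, b):
    """Half mean squared error: C = 1/(2m) * sum((w*x_i + b - y_i)^2)."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradients(xs, ys, w, b):
    """Analytic gradients (dC/dw, dC/db) from the chain-rule result."""
    m = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]  # y_hat_i - y_i
    dw = sum(e * x for e, x in zip(errors, xs)) / m
    db = sum(errors) / m
    return dw, db


# Made-up data and starting point, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]
w, b = 0.0, 0.0

dw, db = gradients(xs, ys, w, b)

# Central finite differences approximate the same partial derivatives.
eps = 1e-6
dw_num = (cost(xs, ys, w + eps, b) - cost(xs, ys, w - eps, b)) / (2 * eps)
db_num = (cost(xs, ys, w, b + eps) - cost(xs, ys, w, b - eps)) / (2 * eps)

print(dw, dw_num)  # both close to -34/3
print(db, db_num)  # both close to -5
```

At $w = 0$, $b = 0$ the errors are $-3, -5, -7$, so the formulas give $\frac{\partial C}{\partial w} = \frac{-3 - 10 - 21}{3} = -\frac{34}{3}$ and $\frac{\partial C}{\partial b} = \frac{-15}{3} = -5$. Both are negative, so a gradient descent step increases both parameters.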

Substituting these gradients into the update rules gives the final gradient descent step:

$$w = w - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i) x_i$$

$$b = b - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)$$
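
Putting the pieces together, the update rules can be run in a loop. This is a minimal sketch rather than a tuned implementation: the function name `fit`, the zero initialization, the learning rate `alpha=0.05`, the iteration count `epochs=5000`, and the dataset are all assumptions chosen so that this small example converges.

```python
def fit(xs, ys, alpha=0.05, epochs=5000):
    """Fit y = w*x + b by batch gradient descent on the half-MSE cost."""
    w, b = 0.0, 0.0  # assumed starting point
    m = len(xs)
    for _ in range(epochs):
        # Prediction errors y_hat_i - y_i under the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        dw = sum(e * x for e, x in zip(errors, xs)) / m
        db = sum(errors) / m
        # Update both parameters simultaneously from the same gradients.
        w, b = w - alpha * dw, b - alpha * db
    return w, b


# Made-up data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
print(round(w, 4), round(b, 4))  # 2.0 1.0
```

Both gradients are computed before either parameter changes, so $w$ and $b$ are updated from the same set of predictions. A learning rate that is too large for the data makes the updates diverge, so `alpha` generally needs adjusting for each dataset.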