IOAI ML Notes | Computer Vision | Deep Learning

Convolutional Layers

Convolution basics and core CNN building blocks.

Syllabus Map


Overview


Convolution Basics

Core Idea

Kernel sliding over an image

Key Concepts

Kernels

Stride

Padding

Output size

H_{out} = \left\lfloor \frac{H + 2P - K}{S} \right\rfloor + 1

W_{out} = \left\lfloor \frac{W + 2P - K}{S} \right\rfloor + 1
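
As a quick check of the output-size formula, here is a minimal sketch in plain Python. It reads H (or W) as the input size along one axis, K as the kernel size, S as the stride, and P as the padding. The helper name `conv_output_size` is made up for illustration and is not part of any library.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output size along one axis: floor((size + 2P - K) / S) + 1."""
    # Integer floor division implements the floor in the formula.
    return (size + 2 * padding - kernel) // stride + 1


# 3x3 kernel, stride 1, padding 1: the axis length is preserved.
same = conv_output_size(32, kernel=3, stride=1, padding=1)

# 7x7 kernel, stride 2, padding 3 on a 224-pixel axis: the axis is halved.
strided = conv_output_size(224, kernel=7, stride=2, padding=3)

# 2x2 pooling with stride 2 follows the same formula.
pooled = conv_output_size(32, kernel=2, stride=2, padding=0)

print(same, strided, pooled)  # 32 112 16
```

The same function applies to height and width independently, so a non-square input only needs two calls.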

Channels and feature maps

Convolution operation

y_{i,j} = \sum_{u,v} w_{u,v}\,x_{i+u,\;j+v}
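
The sum above can be written out directly as nested loops. The sketch below is a deliberately naive, single-channel version with no padding and stride 1, using plain Python lists. The function name `conv2d_naive` is hypothetical. Note that the formula does not flip the kernel, so strictly speaking it is a cross-correlation, which is what deep learning frameworks compute under the name "convolution".

```python
def conv2d_naive(x, w):
    """y[i][j] = sum over u, v of w[u][v] * x[i + u][j + v] (no padding, stride 1)."""
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(w), len(w[0])
    out = []
    for i in range(in_h - k_h + 1):          # output rows
        row = []
        for j in range(in_w - k_w + 1):      # output columns
            total = 0
            for u in range(k_h):             # kernel rows
                for v in range(k_w):         # kernel columns
                    total += w[u][v] * x[i + u][j + v]
            row.append(total)
        out.append(row)
    return out


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, -1]]   # responds to change along the main diagonal

y = conv2d_naive(x, w)
print(y)  # [[-4, -4], [-4, -4]]
```

A 3x3 input with a 2x2 kernel gives a 2x2 output, which matches the output-size formula with P = 0 and S = 1.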

CNN Overview (General Structure)

Typical pipeline

Spatial hierarchy


CNN vs Fully Connected (FC)

Why CNNs dominate for images
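
One commonly cited reason is weight sharing: a convolution reuses the same small kernel at every spatial position, so its parameter count does not depend on image size. The sketch below compares parameter counts by arithmetic only. The 32x32 RGB input and the 32 output channels are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


height = width = 32  # assumed input resolution

# 3x3 convolution mapping 3 input channels to 32 feature maps.
conv_count = conv_params(in_ch=3, out_ch=32, kernel=3)

# Fully connected layer producing an output of the same size (32 x 32 x 32)
# from the flattened 3 x 32 x 32 input.
fc_count = fc_params(3 * height * width, 32 * height * width)

print(conv_count, fc_count)  # 896 100696064
```

Doubling the image resolution leaves `conv_count` unchanged, while `fc_count` grows roughly sixteenfold because both the input and output sizes quadruple.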

When FC makes sense


Residual Connections

Core idea

y = F(x) + x

Why residual learning helps

Identity vs projection shortcuts

Gradient flow
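
Because y = F(x) + x, the derivative of each block with respect to its input is F'(x) + 1, so the shortcut always contributes a direct path for the gradient. The toy sketch below illustrates this with autograd. It assumes a scalar "layer" F(h) = weight * h with a deliberately small weight, which is a simplification and not a real network. The function name `input_gradient` is hypothetical.

```python
import torch


def input_gradient(depth, weight, residual):
    """Gradient of the final output w.r.t. the input after `depth` toy layers."""
    x = torch.tensor(2.0, requires_grad=True)
    h = x
    for _ in range(depth):
        f = weight * h                  # toy layer F(h) = weight * h
        h = f + h if residual else f    # add the shortcut only when requested
    h.backward()
    return x.grad.item()


plain = input_gradient(depth=10, weight=0.01, residual=False)
skip = input_gradient(depth=10, weight=0.01, residual=True)

# Plain stack: gradient is 0.01 ** 10, which is vanishingly small.
# Residual stack: gradient is 1.01 ** 10, which stays close to 1.
print(plain, skip)
```

In the plain stack the per-layer factors multiply to almost nothing, while in the residual stack each factor is 1 plus a small term, so the signal reaching the input remains usable.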


PyTorch Implementation

Core building blocks

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)
bn = nn.BatchNorm2d(32)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2, stride=2)

A simple CNN block

class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

    def forward(self, x):
        return self.net(x)

Residual block (basic)

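class ResidualBlock(nn.Module):
    # Identity shortcut: input and output keep the same channel count and
    # spatial size (3x3 kernels with padding=1), so x can be added directly.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first conv -> BN -> ReLU
        out = self.bn2(self.conv2(out))           # second conv -> BN, no ReLU yet
        return self.relu(out + x)                 # y = F(x) + x, then ReLU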
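
The basic block above uses an identity shortcut, which only works when the input and output shapes match. When the channel count or spatial size changes, a projection shortcut can reshape x before the addition. The following is a minimal sketch of one common way to do this with a 1x1 convolution. The class name `ProjectionResidualBlock` and its arguments are hypothetical, and the `bias=False` choice is an assumption made because each convolution is followed by batch normalization.

```python
import torch
import torch.nn as nn


class ProjectionResidualBlock(nn.Module):
    """Residual block that projects the shortcut when shapes differ (illustrative)."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU()

        if stride != 1 or in_ch != out_ch:
            # Projection shortcut: a 1x1 conv matches channels and stride.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            # Identity shortcut: shapes already match.
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))


block = ProjectionResidualBlock(32, 64, stride=2)
x = torch.randn(4, 32, 16, 16)
y = block(x)
print(y.shape)  # torch.Size([4, 64, 8, 8])
```

With `stride=1` and equal channel counts the block falls back to the identity shortcut and behaves like the basic version.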

Dataset processing (PyTorch)

Basic image dataset

from torchvision import datasets, transforms

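transform = transforms.Compose([
    transforms.Resize((224, 224)),   # fixed input size for the network
    transforms.ToTensor(),           # PIL image -> float tensor in [0, 1], shape (C, H, W)
    # ImageNet per-channel means and standard deviations
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])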

train_ds = datasets.ImageFolder("data/train", transform=transform)
val_ds = datasets.ImageFolder("data/val", transform=transform)

DataLoader

from torch.utils.data import DataLoader

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=2)
val_loader = DataLoader(val_ds, batch_size=32, shuffle=False, num_workers=2)
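
To see what a loader yields without needing image folders on disk, the sketch below swaps in a small random `TensorDataset` as a stand-in for `ImageFolder`. The tensor sizes, the two-class labels, and the name `fake_ds` are assumptions for illustration only. `num_workers=0` keeps loading in the main process so the snippet runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 random "images" shaped like the transformed inputs,
# with random labels from 2 hypothetical classes.
images = torch.randn(10, 3, 224, 224)
labels = torch.randint(0, 2, (10,))
fake_ds = TensorDataset(images, labels)

loader = DataLoader(fake_ds, batch_size=4, shuffle=False, num_workers=0)

# Each iteration yields a (batch_of_images, batch_of_labels) pair.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
for image_shape, label_shape in batch_shapes:
    print(image_shape, label_shape)
# (4, 3, 224, 224) (4,)
# (4, 3, 224, 224) (4,)
# (2, 3, 224, 224) (2,)
```

The last batch is smaller because 10 is not a multiple of 4. Passing `drop_last=True` to `DataLoader` would discard it instead.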