IOAI ML Notes | Computer Vision | Deep Learning

Image Generation

GANs and diffusion models for generating images.

Syllabus Map


Overview


GAN (Generative Adversarial Network)

Core Idea

$$\min_G \max_D \; \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p(z)}[\log(1-D(G(z)))]$$
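
To make the objective above concrete, here is a minimal NumPy sketch that estimates the value function from a batch of discriminator outputs. The function name `gan_value` and the clipping constant `eps` are assumptions made for this illustration, not part of the original notes.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1).
    eps:    clipping constant (assumed) that keeps the logs finite.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives a value of -log(4).
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident, correct discriminator pushes the value up toward 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator tries to raise this number and the generator tries to lower it, which is what the min and max in the objective express.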

Step-by-Step GAN Training

Step 1: Sample data and latent vectors

Step 2: Update discriminator

Step 3: Update generator

Step 4: Alternate updates
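
The four steps above can be sketched on a toy 1-D problem with hand-written gradients. Everything specific here is an assumption chosen to keep the example small: the generator is `G(z) = a*z + b`, the discriminator is `D(x) = sigmoid(w*x + c)`, the real data is drawn from N(4, 1), and the generator uses the minimax loss from the objective as written. Real GANs use neural networks and an autodiff framework, and toy setups like this can oscillate rather than converge.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def d_step(p, x, z, lr):
    """Step 2: one gradient-ascent step on the value function for D."""
    fake = p["a"] * z + p["b"]
    d_real = sigmoid(p["w"] * x + p["c"])
    d_fake = sigmoid(p["w"] * fake + p["c"])
    # d/ds log(sigmoid(s)) = 1 - sigmoid(s); d/ds log(1 - sigmoid(s)) = -sigmoid(s)
    grad_w = np.mean((1.0 - d_real) * x) - np.mean(d_fake * fake)
    grad_c = np.mean(1.0 - d_real) - np.mean(d_fake)
    return {**p, "w": p["w"] + lr * grad_w, "c": p["c"] + lr * grad_c}

def g_step(p, z, lr):
    """Step 3: one gradient-descent step on E[log(1 - D(G(z)))] for G."""
    fake = p["a"] * z + p["b"]
    d_fake = sigmoid(p["w"] * fake + p["c"])
    # d/d(fake) log(1 - D(fake)) = -D(fake) * w, then chain rule into a and b
    grad_a = np.mean(-d_fake * p["w"] * z)
    grad_b = np.mean(-d_fake * p["w"])
    return {**p, "a": p["a"] - lr * grad_a, "b": p["b"] - lr * grad_b}

def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    p = {"a": 1.0, "b": 0.0, "w": 0.0, "c": 0.0}
    for _ in range(steps):                      # Step 4: alternate updates
        x = rng.normal(4.0, 1.0, size=batch)    # Step 1: real samples (toy data)
        z = rng.normal(0.0, 1.0, size=batch)    # Step 1: latent vectors
        p = d_step(p, x, z, lr)                 # Step 2: update discriminator
        p = g_step(p, z, lr)                    # Step 3: update generator
    return p

params = train(steps=50)
```

Keeping the two updates as separate functions makes it easy to change the ratio of discriminator to generator steps, which is one of the usual knobs for balancing training.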

Important GAN Variants

Practical Notes

Training stability is the main challenge

Balance generator and discriminator updates

Watch for mode collapse

Track quality with standard metrics


Diffusion Models

Core Idea

Denoising Process

Markov Formulation

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(\sqrt{1-\beta_t}\,x_{t-1}, \beta_t I\right)$$
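
As an illustration of the transition above, the following sketch draws one sample from it and then chains many steps. The linear `betas` schedule (1e-4 to 0.02 over 1000 steps) and the 4x4 stand-in image are assumptions for the example only.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Sample x_t from N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear noise schedule
x = np.ones((4, 4))                      # stand-in "image"
for beta_t in betas:
    # Each step depends only on the previous state: the Markov property.
    x = forward_step(x, beta_t, rng)
```

After enough steps the original signal is scaled down to almost nothing and `x` is close to a sample of pure Gaussian noise.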

Model Components

Text Conditioning

Step-by-Step Diffusion Training

Step 1: Sample image and timestep

Step 2: Add noise

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$$
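
The closed form above lets training jump straight to any timestep without simulating the chain. A minimal sketch, assuming the standard definition of the cumulative product, \bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s), and an assumed linear schedule; the name `q_sample` is a label chosen for this example.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative product of (1 - beta_s)

def q_sample(x0, t, eps):
    """Noise x0 directly to timestep t (0-indexed) using the closed form."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
x_early = q_sample(x0, 10, eps)    # still dominated by x0
x_late = q_sample(x0, 999, eps)    # almost entirely noise
```

The two coefficients are chosen so that their squares sum to 1, which keeps the overall variance stable when `x0` has unit variance.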

Step 3: Predict noise

Step 4: Optimize denoising loss

$$\mathcal{L}_{\text{simple}}= \mathbb{E}_{x_0,\epsilon,t}\left[ \|\epsilon-\epsilon_\theta(x_t,t,c)\|_2^2 \right]$$
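
Steps 1 to 4 can be put together in a short sketch of the loss computation for one batch. The `eps_model` callable is a hypothetical stand-in for the trained network (here a function that always predicts zeros), and the schedule is an assumed linear one; an actual training loop would backpropagate this loss through a real model.

```python
import numpy as np

def simple_loss(eps_model, x0, t, eps, c, alpha_bar):
    """Batch estimate of E[ ||eps - eps_model(x_t, t, c)||_2^2 ]."""
    batch = x0.shape[0]
    # Broadcast one alpha_bar value per sample across the remaining axes.
    ab = alpha_bar[t].reshape(batch, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps         # Step 2: add noise
    eps_hat = eps_model(x_t, t, c)                            # Step 3: predict noise
    per_sample = np.sum((eps - eps_hat) ** 2,
                        axis=tuple(range(1, x0.ndim)))        # squared L2 norm
    return float(np.mean(per_sample))                         # Step 4: loss

rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
x0 = rng.standard_normal((8, 3, 4, 4))                        # Step 1: images
t = rng.integers(0, len(alpha_bar), size=8)                   # Step 1: timesteps
eps = rng.standard_normal(x0.shape)

# Hypothetical placeholder model: ignores its inputs and predicts zero noise.
zero_model = lambda x_t, t, c: np.zeros_like(x_t)
loss = simple_loss(zero_model, x0, t, eps, c=None, alpha_bar=alpha_bar)
```

Passing `t` and `eps` in explicitly, rather than sampling them inside the function, keeps the computation deterministic and easy to check.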

Step 5: Sample images

Guidance and Samplers

Classifier Guidance (Older Approach)

Classifier-Free Guidance (CFG)

$$\hat{\epsilon}= \epsilon_\theta(x_t,t,\varnothing)+ w\left(\epsilon_\theta(x_t,t,c)-\epsilon_\theta(x_t,t,\varnothing)\right)$$
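
The guidance formula above is a linear extrapolation between two noise predictions. A small sketch, where `cfg_combine` is a name chosen for illustration and the two input arrays stand in for the network's unconditional and conditional outputs:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Blend unconditional and conditional noise predictions with scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 1.0])   # stand-in for eps_theta(x_t, t, empty prompt)
eps_cond = np.array([1.0, 3.0])     # stand-in for eps_theta(x_t, t, c)

no_guidance = cfg_combine(eps_uncond, eps_cond, w=0.0)   # unconditional prediction
plain_cond = cfg_combine(eps_uncond, eps_cond, w=1.0)    # conditional prediction
amplified = cfg_combine(eps_uncond, eps_cond, w=2.0)     # pushed past the conditional one
```

With this parameterization, `w = 1` recovers the ordinary conditional prediction and `w > 1` moves further in the direction the condition suggests.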

DDIM (Denoising Diffusion Implicit Models)

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\,\hat{\epsilon}_\theta + \sigma_t z,\quad z\sim\mathcal{N}(0,I)$$
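
A single update of the form above can be sketched as follows. The estimate `x0_hat` is obtained by solving the noising equation x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps for x_0, using the predicted noise in place of the true noise. The function name and argument names are assumptions for this example, and `eps_hat` stands in for the network output.

```python
import numpy as np

def ddim_step(x_t, eps_hat, ab_t, ab_prev, sigma_t, rng):
    """One DDIM update from timestep t to t-1.

    ab_t, ab_prev: alpha_bar at t and t-1.
    sigma_t:       noise level; 0 gives a deterministic step.
                   Must satisfy sigma_t**2 <= 1 - ab_prev.
    """
    # Invert the closed-form noising equation to estimate the clean image.
    x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
    z = rng.standard_normal(x_t.shape)
    return (np.sqrt(ab_prev) * x0_hat
            + np.sqrt(1.0 - ab_prev - sigma_t ** 2) * eps_hat
            + sigma_t * z)

# Sanity check with a perfect noise prediction and sigma_t = 0.
rng = np.random.default_rng(0)
x0 = np.array([1.0, -1.0])
eps = np.array([0.5, 2.0])
ab_t, ab_prev = 0.5, 0.8
x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev, sigma_t=0.0, rng=rng)
```

When the noise prediction is exact and `sigma_t` is 0, the step lands on the same `x0` and `eps` re-noised to the lower level `ab_prev`, which is why the update can skip timesteps.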

Inpainting

Practical Notes

Delivers strong image quality

Inference is usually slower than GANs

Latent diffusion improves efficiency

Tune sampler controls carefully


GANs vs Diffusion (Quick Comparison)
