Lecture 10

Diffusion Models

Learning to generate by denoising.

Two processes: a forward diffusion process that gradually adds noise to the input, and a reverse denoising process that learns to generate data by denoising.

Forward

q(x_t | x_{t-1}) = N(x_t; \sqrt{1-\beta_t}\,x_{t-1}, \beta_t I) -> q(x_{1:T}|x_0) = \prod_{t=1}^T q(x_t|x_{t-1})
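
A minimal sketch of one forward step, assuming NumPy; the names `forward_step` and `beta` are illustrative:

```python
import numpy as np

def forward_step(x_prev, beta, rng=None):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta) * x_prev + np.sqrt(beta) * noise
```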

Diffusion Kernel

Define \hat{\alpha}_t = \prod_{s=1}^t (1-\beta_s) -> q(x_t|x_0) = N(x_t; \sqrt{\hat{\alpha}_t}\,x_0, (1-\hat{\alpha}_t)I)

For sampling: x_t = \sqrt{\hat{\alpha}_t}\,x_0 + \sqrt{1-\hat{\alpha}_t}\,\epsilon where \epsilon \sim N(0, I)
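
A sketch of sampling x_t directly from x_0 with the diffusion kernel, so no loop over intermediate steps is needed (NumPy; `alpha_hat_t` is the cumulative product defined above):

```python
import numpy as np

def sample_xt(x0, alpha_hat_t, rng=None):
    """Sample x_t ~ q(x_t | x_0) in one shot via the diffusion kernel."""
    if rng is None:
        rng = np.random.default_rng()
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_hat_t) * x0 + np.sqrt(1.0 - alpha_hat_t) * eps
```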

The schedule of \beta_t values is designed so that \hat{\alpha}_t \rightarrow 0 as t \rightarrow T, hence q(x_T|x_0) \approx N(x_T; 0, I)
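
For example, with the linear schedule from the DDPM paper (endpoints 1e-4 and 0.02 over T = 1000 steps), \hat{\alpha}_T is effectively zero:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear beta schedule
alpha_hat = np.cumprod(1.0 - betas)      # \hat{\alpha}_t = prod_{s<=t} (1 - beta_s)
print(alpha_hat[-1])                     # ~4e-5, so q(x_T | x_0) is close to N(0, I)
```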

[Figure: forward distribution]

Denoising

Generation:

Sample x_T \sim N(x_T; 0, I)

Iteratively sample x_{t-1} \sim q(x_{t-1}|x_t)

In general, q(x_{t-1}|x_t) \propto q(x_{t-1})\,q(x_t|x_{t-1}) is intractable

We can approximate q(x_{t-1}|x_t) with a normal distribution if \beta_t is small in each forward diffusion step.
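
A sketch of the resulting ancestral sampling loop, assuming a trained noise-prediction network `eps_model(x, t)` and the standard DDPM update (see Parameterization below); all names are illustrative:

```python
import numpy as np

def sample(eps_model, shape, betas, rng=None):
    """Generate a sample by starting from pure noise and denoising step by step."""
    if rng is None:
        rng = np.random.default_rng()
    alphas = 1.0 - betas
    alpha_hat = np.cumprod(alphas)
    x = rng.standard_normal(shape)                       # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)                            # predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_hat[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0   # no noise at final step
        x = mean + np.sqrt(betas[t]) * noise
    return x
```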

Reverse
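
Since the true q(x_{t-1}|x_t) is intractable, the standard DDPM formulation approximates each reverse step with a Gaussian whose mean is learned (\sigma_t is typically fixed, e.g. \sigma_t^2 = \beta_t):

p_\theta(x_{t-1}|x_t) = N(x_{t-1}; \mu_\theta(x_t, t), \sigma_t^2 I), \quad p(x_T) = N(x_T; 0, I)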

Learning
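
Training maximizes a variational lower bound on \log p_\theta(x_0); in practice DDPM minimizes the simplified noise-prediction objective:

L_{simple} = E_{t, x_0, \epsilon}\left[\left\|\epsilon - \epsilon_\theta\left(\sqrt{\hat{\alpha}_t}\,x_0 + \sqrt{1-\hat{\alpha}_t}\,\epsilon,\; t\right)\right\|^2\right]

A minimal sketch of one Monte Carlo estimate of this loss (NumPy; `eps_model` stands in for the network being trained):

```python
import numpy as np

def training_loss(eps_model, x0, alpha_hat, rng=None):
    """One Monte Carlo sample of the simplified DDPM objective L_simple."""
    if rng is None:
        rng = np.random.default_rng()
    t = rng.integers(len(alpha_hat))                     # uniform random timestep
    eps = rng.standard_normal(x0.shape)                  # target noise
    x_t = np.sqrt(alpha_hat[t]) * x0 + np.sqrt(1.0 - alpha_hat[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)       # ||eps - eps_theta||^2
```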

Parameterization
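
The standard parameterization predicts the added noise \epsilon_\theta(x_t, t) rather than the mean directly:

\mu_\theta(x_t, t) = \frac{1}{\sqrt{1-\beta_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\hat{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)

This is exactly the mean used in the sampling loop sketched above.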
