Diffusion Models
Learning to generate by denoising.
Two processes: a forward diffusion process that gradually adds noise to the input, and a reverse denoising process that learns to generate data by denoising.
Forward
q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I) -> q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
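A minimal numpy sketch of the forward process, assuming a toy vector-valued x_0 and a linear beta schedule (the schedule values here are an assumption, not specified above):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

# Chain T steps: q(x_{1:T} | x_0) = prod_t q(x_t | x_{t-1})
T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear schedule
x = 5.0 * np.ones(10000)            # toy x_0, far from the origin
for beta in betas:
    x = forward_step(x, beta, rng)
# After many steps, x is approximately distributed as N(0, I).
```

Each step shrinks the signal by sqrt(1 - beta_t) and adds fresh Gaussian noise, so the chain gradually forgets x_0.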
Diffusion Kernel
define \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s) -> q(x_t \mid x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) I)
For sampling: x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, where \epsilon \sim \mathcal{N}(0, I)
The schedule of \beta_t values is designed such that \bar{\alpha}_T \to 0 and q(x_T \mid x_0) \approx \mathcal{N}(x_T; 0, I).
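The diffusion kernel lets us jump from x_0 to any x_t in one shot instead of iterating. A sketch, again assuming a linear beta schedule:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly via the diffusion kernel."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = 5.0 * np.ones(10000)
xT = q_sample(x0, T - 1, rng)
# alpha_bar[T-1] is close to 0, so xT is close to N(0, I) regardless of x0.
```

This one-shot sampling is what makes training efficient: a random t can be drawn per example without simulating the whole chain.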
Denoising
Generation:
Sample x_T \sim \mathcal{N}(x_T; 0, I)
Iteratively sample x_{t-1} \sim q(x_{t-1} \mid x_t)
In general, q(x_{t-1} \mid x_t) \propto q(x_{t-1})\, q(x_t \mid x_{t-1}) is intractable.
We can approximate q(x_{t-1} \mid x_t) with a normal distribution if \beta_t is small in each forward diffusion step.
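The generation loop above can be sketched with such a Gaussian approximation per step. The mean below uses the standard DDPM noise-prediction form, and `eps_model` is a hypothetical placeholder for a trained noise-prediction network (here it just returns zeros so the sketch runs):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def eps_model(x_t, t):
    # Placeholder for a learned noise predictor (assumption);
    # a real model would be trained to predict the added noise.
    return np.zeros_like(x_t)

def reverse_step(x_t, t, rng):
    """Approximate q(x_{t-1} | x_t) by a normal (valid when beta_t is small)."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_model(x_t, t)) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

# Generation: start from x_T ~ N(0, I) and iterate t = T-1, ..., 0.
x = rng.standard_normal(8)
for t in reversed(range(T)):
    x = reverse_step(x, t, rng)
```

With a trained model in place of the zero predictor, this loop is ancestral sampling through the learned reverse chain.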
Learning
Parameterization