From Video – Hung-Yi Lee
The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material. – Michelangelo.

- Reverse Process:
Given a noise image + step (the severity of the noise), the model will denoise and output an image with less noise.
In fact, the model is a Noise Predictor, it predict a noise, and then subtract the noisy image with this noise to generate the output image (less noisy). - Forward Process: Add noise and generate training data
Iteratively add randomly sampled “noise” to a clean image.
This “noise” would be the “Ground-truth” for Noise Predictor to learn.

Input to the Denoise model:
- Noisy image
- Noise step (how severe is the noise?)
Output:
- The image after denoising.

How to Train the Noise Predictor?


Text-to-Image Generative Model
First, you have to have text-image pair dataset.
Then, add one additional input to Denoise model: the text.
The rest is the same! That simple, huh?



The algorithm from original DDPM paper:

Thank you! Hope you enjoy the learning and see you next time!
