Image Generation
Model Families
Variational Autoencoders (VAEs)
Encode images into a compressed latent representation, then decode back to the original size, learning the distribution of the image data along the way.
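A minimal numpy sketch of the encode-compress-decode round trip. The layer sizes and random linear "networks" are assumptions for illustration only; a real VAE learns these weights by maximizing the evidence lower bound (ELBO).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (made up for illustration): 64-pixel image, 8-dim latent.
IMG_DIM, LATENT_DIM = 64, 8

# Randomly initialized linear "networks"; a real VAE trains these.
W_enc_mu = rng.normal(scale=0.1, size=(LATENT_DIM, IMG_DIM))
W_enc_logvar = rng.normal(scale=0.1, size=(LATENT_DIM, IMG_DIM))
W_dec = rng.normal(scale=0.1, size=(IMG_DIM, LATENT_DIM))

def encode(x):
    """Compress an image into the parameters of a Gaussian over latents."""
    return W_enc_mu @ x, W_enc_logvar @ x  # mean, log-variance

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients can flow through sampling."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent sample back to image space."""
    return W_dec @ z

x = rng.normal(size=IMG_DIM)          # stand-in "image"
mu, logvar = encode(x)
x_recon = decode(reparameterize(mu, logvar))
print(x_recon.shape)                  # (64,) — same size as the input image
```

The compression is visible in the shapes: a 64-dimensional image passes through an 8-dimensional latent bottleneck and back.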
Generative Adversarial Networks (GANs)
Pit two neural networks against each other: one (the generator) creates images, while the other (the discriminator) tries to discern whether the images are real or fake. As training proceeds, the discriminator gets better at distinguishing real images from fakes, and the generator learns to create more convincing fakes. Each network improves by competing with the other, which is how GANs can yield convincing 'deep fake' images.
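The two competing objectives can be sketched with toy stand-in networks. Everything here (the logistic discriminator, the linear generator, the sizes) is a made-up placeholder; only the loss structure reflects the actual GAN setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def bce(p, label):
    """Binary cross-entropy for a single probability."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Stand-in networks (assumptions for illustration): the discriminator maps an
# image to a "this is real" probability; the generator maps noise to an image.
def discriminator(img, w):
    return 1 / (1 + np.exp(-w @ img))     # logistic score in (0, 1)

def generator(z, w):
    return w @ z

w_d = rng.normal(scale=0.1, size=16)
w_g = rng.normal(scale=0.1, size=(16, 4))

real = rng.normal(size=16)                # stand-in real image
fake = generator(rng.normal(size=4), w_g) # image generated from noise

# Discriminator objective: label real images 1 and generated images 0.
d_loss = bce(discriminator(real, w_d), 1.0) + bce(discriminator(fake, w_d), 0.0)
# Generator objective: fool the discriminator into labeling fakes as 1.
g_loss = bce(discriminator(fake, w_d), 1.0)
print(d_loss > 0 and g_loss > 0)   # True
```

In real training, each step lowers `d_loss` with respect to the discriminator's weights and `g_loss` with respect to the generator's weights, which is what drives both networks to improve.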
Autoregressive Models
Generate images by treating an image as a sequence of pixels. Modern approaches draw on Large Language Model techniques.
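Pixel-by-pixel sampling can be sketched with a trivial conditional model. The transition table below is invented for illustration; real autoregressive models (e.g. PixelCNN-style networks or LLM-based approaches) learn p(pixel_i | earlier pixels) with a neural network.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up conditional: probability the next pixel is "on" given the last one.
P_NEXT_ON = {0: 0.2, 1: 0.8}

def sample_image(n_pixels=16):
    """Generate a binary image one pixel at a time, left to right."""
    pixels = [int(rng.random() < 0.5)]          # first pixel: coin flip
    for _ in range(n_pixels - 1):
        p_on = P_NEXT_ON[pixels[-1]]            # condition on the previous pixel
        pixels.append(int(rng.random() < p_on))
    return pixels

img = sample_image()
print(len(img))   # 16
```

The key property is the sequential dependence: each pixel is drawn from a distribution conditioned on the pixels generated so far, exactly as a language model conditions each token on the preceding ones.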
Diffusion Models
Draw on thermodynamics to generate images. The essential idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process (for example, adding noise to an image step by step). We then learn a reverse diffusion process that restores structure in the data (a way to de-noise and restore the image), yielding a highly flexible and tractable generative model of the data.
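The forward (structure-destroying) process has a convenient closed form: the noisy image at step t is a weighted mix of the original image and fresh Gaussian noise. A numpy sketch, using a simple linear variance schedule (a common choice, assumed here for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear variance schedule beta_1..beta_T (an assumed, commonly used choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)        # cumulative fraction of signal retained

def forward_diffuse(x0, t):
    """Jump straight to step t: x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

x0 = rng.normal(size=64)              # stand-in "image"
x_early = forward_diffuse(x0, 10)
x_late = forward_diffuse(x0, T - 1)
# Early steps barely perturb the image; by step T it is close to pure noise.
print(np.corrcoef(x0, x_early)[0, 1] > np.corrcoef(x0, x_late)[0, 1])  # True
```

The correlation check shows the gradual destruction of structure: `x_early` still resembles `x0`, while `x_late` is essentially uncorrelated noise.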
Unconditioned generation:
- Human face synthesis
- Super-resolution
Conditioned generation:
- Text-to-image
- Image inpainting
- Text-guided image-to-image
Denoising Diffusion Probabilistic Models (DDPMs) are models that can create novel images from noise. They are trained by running a forward diffusion process in which Gaussian noise is added to a training image step by step, and then learning the reverse diffusion process that removes that noise, reconstructing an image like the one the model was trained on. Once trained, the model can start from pure noise and run the learned reverse process to produce a new image; this sampling procedure is called DDPM generation.
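The generation loop can be sketched as follows. The noise predictor below is a made-up placeholder; in a real DDPM it is a trained neural network (typically a U-Net) that predicts the noise added at step t. The per-step update is the standard DDPM reverse mean with small added noise at every step except the last.

```python
import numpy as np

rng = np.random.default_rng(4)

# Short schedule so the loop runs quickly (real DDPMs use ~1000 steps).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def predict_noise(x_t, t):
    """Stand-in for the trained network eps_theta(x_t, t); a real DDPM trains
    a U-Net here. This placeholder is invented purely for illustration."""
    return 0.1 * x_t

def ddpm_sample(shape=(64,)):
    """Start from pure Gaussian noise and denoise step by step."""
    x = rng.normal(size=shape)                 # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        # Mean of the reverse step, computed from the predicted noise.
        x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                              # add noise except at the final step
            x = x + np.sqrt(betas[t]) * rng.normal(size=shape)
    return x

sample = ddpm_sample()
print(sample.shape)   # (64,)
```

Training and generation mirror each other: training teaches the network to predict the noise injected by the forward process, and generation repeatedly subtracts that predicted noise, walking from pure noise back to an image.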