Lecture 7

Generative Models

Supervised Learning: data and label, learn a function to map x -> y.

Unsupervised: just data, no labels. Learn some underlying hidden structure of data.

Discriminative vs Generative Models

Discriminative Model: learn a probability distribution p(y|x)

Generative Model: learn a probability distribution p(x)

Conditional Generative Model: learn p(x|y)

Density Function

p(x) assigns a positive number to each possible x; higher numbers mean x is more likely.

Normalized:

\int_X p(x)\,dx = 1
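As a concrete example of a normalized density (not from the lecture), a minimal sketch assuming NumPy that numerically checks a 1-D Gaussian integrates to 1:

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # p(x) > 0 for every x; larger values mean x is more likely
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

xs, dx = np.linspace(-10.0, 10.0, 100_001, retstep=True)
print(np.sum(gaussian_pdf(xs)) * dx)  # ~1.0: the density is normalized
```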

Different values of x compete for density.

Discriminative Model: the possible labels for each input compete for probability mass, but there is no competition between different images.

There is no way for the model to handle unreasonable inputs; it must produce a label distribution for every image.

Generative Model: all possible images compete with each other for probability mass.

Requires a deep understanding of images. The model can reject unreasonable inputs by assigning them small probability.

Conditional Generative Model: each possible label induces a competition among all images.

Recall Bayes rule:

P(x|y) = \frac{P(y|x)}{P(y)} P(x)

A conditional generative model can therefore be built from a discriminative model P(y|x), a prior over labels P(y), and an unconditional generative model P(x).
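To make this concrete, a toy sketch (assuming NumPy; the numbers are hypothetical) that assembles p(x|y) from a discriminative model, a generative model, and a label prior:

```python
import numpy as np

p_y_given_x = np.array([[0.9, 0.1],   # discriminative model: each row sums to 1
                        [0.2, 0.8],
                        [0.5, 0.5]])
p_x = np.array([0.5, 0.3, 0.2])        # generative model over the 3 images
p_y = p_y_given_x.T @ p_x              # label prior: p(y) = sum_x p(y|x) p(x)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y)
p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]
print(p_x_given_y.sum(axis=0))         # each column sums to 1: a distribution over images per label
```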

Discriminative -> Assign labels to data. Feature learning (with labels).

Generative -> Detect outliers. Feature learning (without labels). Sample to generate new data.

Conditional -> Assign labels, while rejecting outliers. Generate new data conditioned on input labels.

Taxonomy of Generative Models

(Figure: generative model taxonomy)

Autoregressive Model

Goal: write down an explicit function for p(x) = f(x, W)

Given a dataset x^{(1)}, x^{(2)}, \dots, x^{(N)}, train the model by solving:

W^* = \argmax_W \prod_i p(x^{(i)}) Maximize probability of training data

= \argmax_W \sum_i \log p(x^{(i)}) Log trick to exchange the product for a sum

= \argmax_W \sum_i \log f(x^{(i)}, W) This is the loss function; train with gradient descent.
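A minimal training-step sketch of this objective, assuming PyTorch and assuming a model whose forward pass returns log p(x) for each example in a batch (the model itself is not specified here):

```python
import torch

def train_step(model, optimizer, batch):
    # batch: tensor of training examples x^{(i)}
    optimizer.zero_grad()
    log_px = model(batch)   # assumed to return log f(x^{(i)}, W) = log p(x^{(i)}) per example
    loss = -log_px.mean()   # minimizing NLL == maximizing sum_i log p(x^{(i)})
    loss.backward()
    optimizer.step()
    return loss.item()
```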

Assume x consists of multiple subparts: x = (x_1, x_2, x_3, \dots, x_T)

Break down the probability using the chain rule: p(x) = p(x_1, x_2, x_3, \dots, x_T) = p(x_1)\, p(x_2|x_1)\, p(x_3|x_1, x_2) \dots

= \prod_{t=1}^T p(x_t|x_1, \dots, x_{t-1}) Probability of the next subpart given all previous subparts.
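A small sketch of how this factorization is used: log p(x) becomes a sum of per-step conditional log-probabilities. `cond_log_prob` below is a hypothetical stand-in for whatever model scores x_t given the prefix:

```python
def sequence_log_prob(x, cond_log_prob):
    """log p(x) = sum_t log p(x_t | x_1, ..., x_{t-1})."""
    total = 0.0
    for t in range(len(x)):
        # cond_log_prob is a hypothetical model returning log p(x_t | prefix)
        total += cond_log_prob(prefix=x[:t], target=x[t])
    return total
```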

Pixel RNN

Generate image pixels one at a time, starting at upper left corner.

Compute a hidden state for each pixel that depends on the hidden states and RGB values of the pixels to the left and above.

h_{x,y} = f(h_{x-1,y}, h_{x,y-1}, W)

At each pixel, predict red, then blue, then green: softmax over [0, 1, \dots, 255]

Each pixel depends implicitly on all pixels above and to the left.

Problem: slow during both training and testing; an N x N image requires 2N-1 sequential steps.
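A highly simplified generation sketch of the recurrence above, assuming NumPy and a single grayscale channel (the real model predicts the R, G, B channels per pixel); `f` and `sample_pixel` are hypothetical stand-ins for the learned recurrence and the per-pixel softmax:

```python
import numpy as np

def generate(N, D, f, sample_pixel):
    h = np.zeros((N, N, D))                 # one hidden state per pixel
    img = np.zeros((N, N), dtype=np.int64)
    for y in range(N):                      # top to bottom
        for x in range(N):                  # left to right
            h_left = h[y, x - 1] if x > 0 else np.zeros(D)
            h_up   = h[y - 1, x] if y > 0 else np.zeros(D)
            h[y, x] = f(h_left, h_up)            # h_{x,y} = f(h_{x-1,y}, h_{x,y-1}, W)
            img[y, x] = sample_pixel(h[y, x])    # sample from softmax over [0, ..., 255]
    return img
```

Written as a plain double loop this takes N^2 sequential steps; pixels on the same anti-diagonal do not depend on each other and can be computed in parallel, which is where the 2N-1 sequential-step count comes from.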
