# Lecture 7

## Generative Models

Supervised Learning: data and labels; learn a function to map $$x \to y$$.

Unsupervised: just data, no labels. Learn some underlying hidden structure of data.

**Discriminative vs Generative Models**

Discriminative Model: learn a probability distribution $$p(y|x)$$

Generative Model: learn a probability distribution $$p(x)$$

Conditional Generative Model: learn $$p(x|y)$$

**Density Function**

$$p(x)$$ assigns a positive number to each possible $$x$$; higher numbers mean $$x$$ is more likely.

Normalized:

$$\int\_X p(x)dx = 1$$

Different values of $$x$$ compete for density.

Discriminative: the possible labels for each input compete for probability mass, but there is no competition between images.

There is no way for the model to handle unreasonable inputs; it must produce a label distribution for every image.

Generative Model: all possible images compete with each other for probability mass.

This requires deep image understanding. The model can reject unreasonable inputs by assigning them small probability.

Conditional Generative Model: each possible label induces a competition among all images.

Recall Bayes rule:

$$
P(x|y) = \frac{P(y|x)}{P(y)}P(x)
$$

<figure><img src="/files/NEFgvegRZSW5QJI5lP15" alt=""><figcaption><p>Bayes</p></figcaption></figure>

Discriminative -> Assign labels to data. Feature learning (with labels).

Generative -> Detect outliers. Feature learning (without labels). Sample to generate new data.

Conditional -> Assign labels, while rejecting outliers. Generate new data conditioned on input labels.
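The Bayes' rule relationship above can be made concrete with a toy numeric example (all numbers hypothetical): given a discriminative model's $$p(y|x)$$, a prior $$p(x)$$, and the marginal $$p(y)$$, we recover the conditional generative $$p(x|y)$$.

```python
import numpy as np

# Toy setup: 3 images (rows), 2 labels (columns). These probabilities
# are made up purely for illustration.
p_y_given_x = np.array([[0.9, 0.1],   # discriminative model, p(y|x)
                        [0.3, 0.7],
                        [0.5, 0.5]])
p_x = np.array([0.5, 0.3, 0.2])       # prior over images, p(x)

p_y = p_y_given_x.T @ p_x             # marginal: p(y) = sum_x p(y|x) p(x)
p_x_given_y = (p_y_given_x * p_x[:, None]) / p_y[None, :]  # Bayes' rule

# Each column is now a distribution over all images for one label,
# so the columns sum to 1 -- the "competition among all images".
print(p_x_given_y.sum(axis=0))        # -> [1. 1.]
```

Note how normalizing over the image axis (columns) matches the statement that each label induces a competition among all images.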

Taxonomy of Generative Models

<figure><img src="/files/zDzfUYIM7nJ6ZabnEWrS" alt=""><figcaption><p>Generative Model taxonomy</p></figcaption></figure>

## Autoregressive Model

Goal: explicit function for $$p(x) = f(x, W)$$

Given dataset $$x^{(1)}, x^{(2)}, \dots, x^{(N)}$$, train the model by solving:

$$W^\* = \argmax\_W \prod\_i p(x^{(i)})$$ Maximize probability of training data

$$= \argmax\_W \sum\_i \log p(x^{(i)})$$ Log trick to exchange product for sum

$$=\argmax\_W \sum\_i \log f(x^{(i)}, W)$$ Loss function, train for GD.
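As a minimal sketch of this maximum-likelihood objective, take a hypothetical 1-D Gaussian density with a single learnable parameter $$w$$ (the mean); maximizing $$\sum_i \log f(x^{(i)}, w)$$ by gradient ascent recovers the sample mean.

```python
import numpy as np

# log f(x, w) = log N(x; w, 1); a stand-in for a learned density
def log_f(x, w):
    return -0.5 * (x - w) ** 2 - 0.5 * np.log(2 * np.pi)

data = np.array([1.0, 2.0, 3.0])
w = 0.0
lr = 0.1
for _ in range(100):
    grad = np.sum(data - w)   # d/dw of sum_i log f(x_i, w)
    w += lr * grad            # gradient *ascent* on the log-likelihood
print(w)                      # converges to the sample mean, 2.0
```

In practice the log-density is a deep network and the update is stochastic gradient descent on the negative log-likelihood, but the objective is the same.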

Assume $$x$$ consist of multiple subparts: $$x = (x\_1, x\_2, x\_3, \dots, x\_T)$$

Break down probability using chain rule: $$p(x) = p(x\_1, x\_2, x\_3, \dots, x\_T) = p(x\_1)p(x\_2|x\_1)p(x\_3|x\_1, x\_2)\dots$$

$$= \prod\_{t=1}^T p(x\_t|x\_1, \dots, x\_{t-1})$$ Probability of next subpart given all previous subparts.
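The chain-rule factorization can be sketched directly: for a short sequence over a tiny vocabulary, $$\log p(x)$$ is the sum of per-step conditional log-probabilities. The lookup table of conditionals below is a hypothetical stand-in for a learned model.

```python
import numpy as np

def sequence_log_prob(x, cond_prob):
    """log p(x) = sum_t log p(x_t | x_1, ..., x_{t-1})."""
    total = 0.0
    for t in range(len(x)):
        prefix = tuple(x[:t])              # all previous subparts
        total += np.log(cond_prob[(prefix, x[t])])
    return total

# Toy conditionals over 2-token sequences from the vocabulary {0, 1}.
cond_prob = {
    ((), 0): 0.6,  ((), 1): 0.4,           # p(x_1)
    ((0,), 0): 0.5, ((0,), 1): 0.5,        # p(x_2 | x_1 = 0)
    ((1,), 0): 0.9, ((1,), 1): 0.1,        # p(x_2 | x_1 = 1)
}
p = np.exp(sequence_log_prob([1, 0], cond_prob))
print(p)  # p(x_1 = 1) * p(x_2 = 0 | x_1 = 1) = 0.4 * 0.9 = 0.36
```

An autoregressive neural model replaces the table with a network that maps the prefix to a distribution over the next subpart.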

![](/files/dx4nG6Xr9VSYExAib4Xr)

### Pixel RNN

Generate image pixels one at a time, starting at upper left corner.

Compute hidden state for each pixel that depends on hidden states and RGB from left and above.

$$
h\_{x,y} = f(h\_{x-1,y}, h\_{x, y-1}, W)
$$

At each pixel, predict red, then blue, then green: softmax over $$\[0, 1, \dots, 255]$$

Each pixel depends implicitly on all pixels above and to the left.

Problem: slow during training and testing; an N x N image requires 2N - 1 sequential steps.
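The dependency pattern above can be sketched as follows (not a trained model; `f` is a hypothetical stand-in for an RNN cell): pixels on the same anti-diagonal can be computed in parallel, but the diagonals themselves are sequential, giving the 2N - 1 step count.

```python
import numpy as np

def f(h_left, h_up, w):
    # stand-in recurrence combining the states from left and above
    return np.tanh(w * (h_left + h_up))

N, w = 4, 0.5
h = np.zeros((N, N))
for step in range(2 * N - 1):          # anti-diagonals: 2N - 1 sequential steps
    for x in range(N):
        y = step - x
        if 0 <= y < N:
            left = h[x - 1, y] if x > 0 else 1.0   # boundary value (assumption)
            up = h[x, y - 1] if y > 0 else 1.0
            h[x, y] = f(left, up, w)

print(h.shape)  # every hidden state filled after 2N - 1 = 7 diagonal steps
```

In the real Pixel RNN each state is a vector and `f` is a learned recurrent cell, but the sequential diagonal sweep, and hence the slowness, is the same.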

