# YOLO

For object detection.

Object detection is framed as a regression problem to spatially separated bounding boxes and associated class probabilities, using a single neural network.

<figure><img src="/files/hUMIeB1D4lQg0l3liwrm" alt=""><figcaption></figcaption></figure>

Advantages:

1. Extremely fast, since detection is framed as a single regression problem.
2. Reasons globally about the image when making predictions.
3. Learns generalizable representations of objects.

Method/Process:

First, divide the input image into an $$S \times S$$ grid. If the center of an object falls into a grid cell, that cell is responsible for detecting the object. Each grid cell predicts $$B$$ bounding boxes and a confidence score for each box. The confidence score should equal the IOU between the predicted box and the ground truth when the cell contains an object, and zero otherwise.
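The IOU used for the confidence score can be sketched as below. This is a minimal illustration, assuming boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`; the function name `iou` and that box format are my own choices, not something the paper prescribes.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Assumed format (hypothetical): (x_min, y_min, x_max, y_max).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IOU is 1 / (4 + 4 - 1) = 1/7.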

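Each box consists of 5 predictions: $$x, y, w, h$$ and confidence. $$(x, y)$$ is the box center relative to the bounds of the grid cell, while $$w$$ and $$h$$ are relative to the whole image.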

Each grid cell also predicts $$C$$ conditional class probabilities $$Pr(Class\_i | Object)$$, one set per cell regardless of the number of boxes $$B$$.
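Put together, the predictions form an $$S \times S \times (B \cdot 5 + C)$$ tensor; with the paper's PASCAL VOC settings ($$S = 7$$, $$B = 2$$, $$C = 20$$) that is 7 x 7 x 30. A rough sketch of how a ground-truth box could be assigned to its responsible cell follows. The helper `encode_box` and its pixel-based input format are hypothetical, chosen only for illustration.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (PASCAL VOC values from the paper)

def encode_box(cx, cy, w, h, img_w, img_h, S=S):
    """Map a ground-truth box to its grid cell and normalized targets.

    Hypothetical helper: inputs are the box center and size in pixels.
    """
    # Box center measured in grid units.
    gx = cx / img_w * S
    gy = cy / img_h * S
    # The cell containing the center is responsible for the object.
    col = min(int(gx), S - 1)
    row = min(int(gy), S - 1)
    # x, y are offsets inside the cell; w, h are fractions of the whole image.
    x, y = gx - col, gy - row
    return row, col, (x, y, w / img_w, h / img_h)

# Empty target tensor with one slot per cell: B boxes of 5 values plus C class scores.
target = np.zeros((S, S, B * 5 + C))

# A 100x50 box centered at (224, 112) in a 448x448 image.
row, col, (x, y, w, h) = encode_box(224, 112, 100, 50, 448, 448)
```

Here the center lands in row 1, column 3, with offsets `x = 0.5` and `y = 0.75` inside that cell.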

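At test time, we multiply the conditional class probabilities by the individual box confidence predictions, which gives a class-specific confidence score for each box:

$$
Pr(Class\_i | Object) \* Pr(Object) \* IOU\_{pred}^{truth} = Pr(Class\_i) \* IOU\_{pred}^{truth}
$$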
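The test-time multiplication can be sketched with NumPy broadcasting. This is only an illustration under assumed array layouts: `class_probs` with shape `(S, S, C)` and `box_conf` with shape `(S, S, B)` are hypothetical names, filled with random values in place of real network output.

```python
import numpy as np

rng = np.random.default_rng(0)
S, B, C = 7, 2, 20

# Stand-ins for network output (random values, for illustration only).
class_probs = rng.random((S, S, C))                      # Pr(Class_i | Object), one set per cell
class_probs /= class_probs.sum(axis=-1, keepdims=True)   # normalize per cell
box_conf = rng.random((S, S, B))                         # Pr(Object) * IOU, one per box

# Broadcast (S, S, B, 1) * (S, S, 1, C) -> (S, S, B, C):
# every box in a cell is scored against every class of that cell.
scores = box_conf[..., :, None] * class_probs[..., None, :]

best_class = scores.argmax(axis=-1)  # most likely class per box, shape (S, S, B)
best_score = scores.max(axis=-1)     # its class-specific confidence, shape (S, S, B)
```

In practice these scores would then be thresholded and passed through non-maximum suppression to drop duplicate detections.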

<figure><img src="/files/cvqeI4IEVHQnzZwbrC1p" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/twRZGKUz6mxSF2G24Eke" alt=""><figcaption></figcaption></figure>

Training: see the paper for details. The final layer predicts both class probabilities and bounding box coordinates. The final layer uses a linear activation; all other layers use a leaky ReLU. Dropout is also used to reduce overfitting.

Loss function:

<figure><img src="/files/wDWxVvo1BOm5skwij6Wd" alt=""><figcaption></figcaption></figure>
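A rough NumPy sketch of this loss follows, assuming the figure shows the sum-squared-error loss from the YOLO paper with $$\lambda_{coord} = 5$$ and $$\lambda_{noobj} = 0.5$$. The function name `yolo_loss`, the array layouts, and the precomputed boolean mask `obj_ij` (marking which box predictor in each cell is responsible for an object) are all assumptions made for illustration.

```python
import numpy as np

def yolo_loss(pred_boxes, pred_cls, true_boxes, true_cls, obj_ij,
              lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared-error detection loss (sketch, assumed layouts).

    pred_boxes, true_boxes: (S, S, B, 5) as x, y, w, h, confidence
    pred_cls, true_cls:     (S, S, C) class probabilities
    obj_ij:                 (S, S, B) bool, True for the responsible predictor
    Assumes predicted w and h are non-negative so the square root is defined.
    """
    obj_i = obj_ij.any(axis=-1)   # cells that contain an object
    noobj_ij = ~obj_ij            # predictors with no object assigned

    # Center error and square-rooted size error per box.
    xy_err = ((pred_boxes[..., 0:2] - true_boxes[..., 0:2]) ** 2).sum(axis=-1)
    wh_err = ((np.sqrt(pred_boxes[..., 2:4])
               - np.sqrt(true_boxes[..., 2:4])) ** 2).sum(axis=-1)
    # Confidence error per box and class error per cell.
    conf_err = (pred_boxes[..., 4] - true_boxes[..., 4]) ** 2
    cls_err = ((pred_cls - true_cls) ** 2).sum(axis=-1)

    return (lambda_coord * (xy_err * obj_ij).sum()
            + lambda_coord * (wh_err * obj_ij).sum()
            + (conf_err * obj_ij).sum()
            + lambda_noobj * (conf_err * noobj_ij).sum()
            + (cls_err * obj_i).sum())
```

The square root on width and height makes the same absolute error count more for small boxes than for large ones, and the lower weight on no-object confidence keeps the many empty cells from dominating the gradient.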


