YOLO

https://arxiv.org/abs/1506.02640

For object detection.

YOLO frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities, using a single neural network.

Advantages:

  1. Extremely fast, since detection is framed as a single regression problem and needs no complex multi-stage pipeline.

  2. Reasons globally about the image when making predictions.

  3. Learns generalizable representations of objects.

Method/Process:

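First, divide the input image into an $S \times S$ grid. If the center of an object falls into a grid cell, that cell is responsible for detecting the object. Each grid cell predicts $B$ bounding boxes and a confidence score for each box. The confidence is defined as $Pr(Object) * IOU_{pred}^{truth}$: it should be zero when no object is in the cell, and otherwise equal the IOU between the predicted box and the ground truth.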
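The cell assignment and the IOU score can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's code: the function names, the normalized-center convention, and the corner-based box layout are all assumptions made here.

```python
# Hypothetical helpers for illustration; names and box layouts are assumptions.

def responsible_cell(cx, cy, S=7):
    """Return (row, col) of the grid cell that contains a box center.

    cx, cy are assumed to be normalized to [0, 1] by image width and height.
    """
    col = min(int(cx * S), S - 1)  # clamp so cx == 1.0 stays inside the grid
    row = min(int(cy * S), S - 1)
    return row, col


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, `responsible_cell(0.5, 0.5)` picks the middle cell of a 7×7 grid, and two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IOU is 1/7.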

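Each box consists of 5 predictions: $x, y, w, h$ and confidence. $(x, y)$ is the box center relative to the bounds of the grid cell, while $w$ and $h$ are predicted relative to the whole image.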
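To make the coordinate convention concrete, here is a small sketch that turns an image-normalized box into a per-cell target. The function name `encode_box` and the input convention (center and size already divided by image width and height) are assumptions for illustration.

```python
# Hypothetical encoder for illustration; not taken from the paper's code.

def encode_box(cx, cy, w, h, S=7):
    """Encode a normalized box as (row, col, (x, y, w, h)).

    x, y are offsets of the center within its grid cell, in [0, 1).
    w, h stay relative to the whole image.
    """
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    x = cx * S - col  # horizontal offset inside the cell
    y = cy * S - row  # vertical offset inside the cell
    return row, col, (x, y, w, h)
```

A box centered at (0.5, 0.5) lands in cell (3, 3) of a 7×7 grid with offsets (0.5, 0.5), since 0.5 × 7 = 3.5.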

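Each grid cell also predicts $C$ conditional class probabilities, $Pr(Class_i | Object)$. Only one set of class probabilities is predicted per cell, regardless of the number of boxes $B$.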
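Putting the per-cell predictions together gives the size of the network output. The sketch below just does that arithmetic; the helper name `output_shape` is hypothetical, and the defaults are the PASCAL VOC settings reported in the paper (S = 7, B = 2, C = 20).

```python
# Hypothetical helper; defaults follow the paper's PASCAL VOC configuration.

def output_shape(S=7, B=2, C=20):
    """Return the prediction tensor shape S x S x (B * 5 + C)."""
    per_cell = B * 5 + C  # 5 numbers per box, plus one class distribution
    return (S, S, per_cell)
```

With the defaults this is a 7 × 7 × 30 tensor, that is 1470 output values per image.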

At test time, we multiply the conditional class probabilities by the individual box confidence predictions, which gives a class-specific confidence score for each box:

$$Pr(Class_i | Object) * Pr(Object) * IOU_{pred}^{truth} = Pr(Class_i) * IOU_{pred}^{truth}$$
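As a rough illustration of this test-time multiplication for a single grid cell, assuming the class probabilities and box confidences are already available as plain lists (the function name is hypothetical):

```python
# Hypothetical helper for one grid cell; inputs are plain Python lists.

def class_specific_scores(class_probs, box_confidences):
    """Return one list of per-class scores for each predicted box.

    class_probs: C conditional probabilities Pr(Class_i | Object).
    box_confidences: B confidences Pr(Object) * IOU, one per box.
    """
    return [[p * conf for p in class_probs] for conf in box_confidences]
```

For a cell with class probabilities `[0.5, 0.25]` and box confidences `[0.8, 0.4]`, the first box scores about 0.4 and 0.2 for the two classes, and the second about 0.2 and 0.1. In a full detector these scores would then be thresholded and passed through non-maximal suppression.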

Training: details are in the paper. The final layer predicts both class probabilities and bounding box coordinates. The final layer uses a linear activation; all other layers use a leaky ReLU. Dropout is also used to reduce overfitting.
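For reference, the paper's leaky rectified linear activation passes positive inputs through unchanged and scales negative inputs by 0.1. A scalar sketch, with a hypothetical function name:

```python
# Scalar sketch of the leaky activation; the 0.1 slope is the paper's value.

def leaky_relu(x, slope=0.1):
    """Return x for positive inputs, slope * x otherwise."""
    return x if x > 0 else slope * x
```

So `leaky_relu(2.0)` stays 2.0, while `leaky_relu(-2.0)` becomes about -0.2.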

Loss function:
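The notes leave this blank. As a reminder, the paper optimizes a multi-part sum-squared error; the form below is transcribed as a reference and should be checked against the paper. Here $\mathbb{1}_{ij}^{obj}$ indicates that box $j$ in cell $i$ is responsible for an object, $\mathbb{1}_{i}^{obj}$ that an object appears in cell $i$, hatted symbols are predictions, and the paper sets $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$.

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$

The square roots on $w$ and $h$ make the same absolute error count for more in small boxes than in large ones, and $\lambda_{noobj}$ down-weights the confidence loss from the many cells that contain no object.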
