You Only Look Once

Original author(s)	Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
Initial release	2015
Written in	C, CUDA (original Darknet implementation)
Type	Object detection; Convolutional neural network; Computer vision
Website	https://pjreddie.com/darknet/yolo/

You Only Look Once (YOLO) is a series of real-time object detection systems based on convolutional neural networks. First introduced by Joseph Redmon et al. in 2015,^[1] YOLO has undergone several iterations and improvements, becoming one of the most popular object detection frameworks.^[2]

The name "You Only Look Once" refers to the fact that the algorithm needs only a single forward pass through the neural network to make its predictions, whereas earlier region-proposal-based techniques such as R-CNN require thousands of network evaluations for a single image.

Overview

Previous methods such as R-CNN and OverFeat^[3] apply a model to an image at multiple locations and scales. YOLO instead applies a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and probabilities for each region; the bounding boxes are then weighted by the predicted probabilities.
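The weighting step can be illustrated with a minimal sketch. It assumes each box carries one confidence value and a list of per-class probabilities, and that the two are simply multiplied. The function names and the threshold of 0.25 are hypothetical choices for illustration, not taken from any YOLO release.

```python
def weighted_scores(box_confidence, class_probs):
    """Weight a box's confidence by each predicted class probability."""
    return [box_confidence * p for p in class_probs]


def best_detection(box_confidence, class_probs, threshold=0.25):
    """Return (class_index, score) for the best class, or None if too weak.

    The threshold is an illustrative assumption.
    """
    scores = weighted_scores(box_confidence, class_probs)
    best_class = max(range(len(scores)), key=scores.__getitem__)
    if scores[best_class] < threshold:
        return None
    return best_class, scores[best_class]
```

For instance, a box with confidence 0.8 and class probabilities [0.1, 0.9] is assigned to class 1 with a score of about 0.72, while a box whose best weighted score falls below the threshold is discarded.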

Versions

The YOLO series falls into two parts. The original part comprises YOLOv1, v2, and v3, all released on a website maintained by Joseph Redmon.^[4]

YOLOv1

The original YOLO algorithm, introduced in 2015, divides the image into an S × S grid. If the center of an object falls into a grid cell, that cell is responsible for detecting the object. Each grid cell predicts B bounding boxes and a confidence score for each box. The confidence score reflects how confident the model is that the box contains an object and how accurate it believes the predicted box to be.
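The grid assignment and the notion of box accuracy can be sketched as follows. This is an illustrative sketch, not the authors' code: the helper names are hypothetical, box accuracy is measured here with intersection over union (IoU), and the defaults S = 7, B = 2 and C = 20 classes are assumed values corresponding to the configuration usually reported for the original paper. Each box is assumed to contribute five numbers (x, y, w, h, confidence).

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the object center."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centers on the bottom edge
    return row, col


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: B boxes of 5 values plus C class scores per cell."""
    return (S, S, B * 5 + C)
```

Under these assumptions, an object centered in the middle of a 448 × 448 image is assigned to cell (3, 3), and the network output has shape (7, 7, 30).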

YOLOv2

Released in 2016, YOLOv2 (also known as YOLO9000)^[5]^[6] improved upon the original model by incorporating batch normalization, using a higher-resolution classifier, and predicting bounding boxes with anchor boxes. It could detect over 9000 object categories. It was also released on GitHub under the Apache 2.0 license.^[7]
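Anchor-based prediction can be sketched as below. The sketch follows the parameterization commonly described for YOLOv2, in which the network outputs offsets (tx, ty, tw, th) relative to a grid cell and an anchor (prior) box rather than raw coordinates. The function and parameter names are illustrative assumptions, and coordinates are expressed in grid-cell units.

```python
import math


def sigmoid(x):
    """Logistic function, used to keep the center offset inside its cell."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Turn raw network offsets into a box (center x, center y, width, height).

    cell_x, cell_y: top-left corner of the grid cell (in grid units).
    anchor_w, anchor_h: size of the anchor box (in grid units).
    """
    bx = cell_x + sigmoid(tx)      # center stays within the cell
    by = cell_y + sigmoid(ty)
    bw = anchor_w * math.exp(tw)   # anchor is scaled, never made negative
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh
```

With all offsets equal to zero, the decoded box sits at the center of its cell and has exactly the anchor's size, which is why well-chosen anchors give the network a reasonable starting point.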

YOLOv3

YOLOv3, introduced in 2018, contained only "incremental" improvements, including the use of a more complex backbone network, multiple scales for detection, and a more sophisticated loss function.^[8]
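The effect of detecting at multiple scales can be sketched by counting the candidate boxes. The figures used here are assumptions based on a commonly cited YOLOv3 configuration rather than on the text above: a 416 × 416 input, three detection scales with strides 32, 16 and 8, and three anchor boxes per grid cell at each scale. The function names are hypothetical.

```python
def detection_grids(input_size=416, strides=(32, 16, 8)):
    """Grid side length at each detection scale (coarse to fine)."""
    return [input_size // s for s in strides]


def total_boxes(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Total number of candidate boxes predicted across all scales."""
    grids = detection_grids(input_size, strides)
    return sum(g * g * anchors_per_scale for g in grids)
```

Under these assumptions the grids are 13 × 13, 26 × 26 and 52 × 52, giving 10,647 candidate boxes per image; the finer grids are what allow smaller objects to be detected.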

YOLOv4 and beyond

Subsequent versions of YOLO (v4, v5, etc.) have been developed by different researchers, further improving performance and introducing new features. These versions are not officially associated with the original YOLO authors but build upon their work.^[4] As of 2023, the series extends up to YOLOv8.^[2]

References

[1] Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali (2016). "You Only Look Once: Unified, Real-Time Object Detection": 779–788.
[2] Terven, Juan; Córdova-Esparza, Diana-Margarita; Romero-González, Julio-Alejandro (2023-11-20). "A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS". Machine Learning and Knowledge Extraction. 5 (4): 1680–1716. doi:10.3390/make5040083. ISSN 2504-4990.
[3] Sermanet, Pierre; Eigen, David; Zhang, Xiang; Mathieu, Michael; Fergus, Rob; LeCun, Yann (2014-02-23). "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks". doi:10.48550/arXiv.1312.6229. Retrieved 2024-09-12.
[4] "YOLO: Real-Time Object Detection". pjreddie.com. Retrieved 2024-09-12.
[5] Redmon, Joseph; Farhadi, Ali (2017). "YOLO9000: Better, Faster, Stronger": 7263–7271.
[6] "YOLOv2: Real-Time Object Detection". pjreddie.com. Retrieved 2024-09-12.
[7] Rémy, Philippe (2024-09-05). philipperemy/yolo-9000. Retrieved 2024-09-12.
[8] Redmon, Joseph; Farhadi, Ali (2018-04-08). "YOLOv3: An Incremental Improvement". doi:10.48550/arXiv.1804.02767. Retrieved 2024-09-12.
