Object detection

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search
Objects detected with OpenCV's Deep Neural Network module (dnn) by using a YOLOv3 model trained on COCO dataset capable to detect objects of 80 common classes.

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.

Uses[edit]

It is widely used in computer vision tasks such as image annotation,[1] activity recognition,[2] face detection, face recognition, video object co-segmentation. It is also used in tracking objects, for example tracking a ball during a football match, tracking movement of a cricket bat, or tracking a person in a video.

Concept[edit]

Every object class has its own special features that helps in classifying the class – for example all circles are round. Object class detection uses these special features. For example, when looking for circles, objects that are at a particular distance from a point (i.e. the center) are sought. Similarly, when looking for squares, objects that are perpendicular at corners and have equal side lengths are needed. A similar approach is used for face identification where eyes, nose, and lips can be found and features like skin color and distance between eyes can be found.

Methods[edit]

Methods for object detection generally fall into either machine learning-based approaches or deep learning-based approaches. For Machine Learning approaches, it becomes necessary to first define features using one of the methods below, then using a technique such as support vector machine (SVM) to do the classification. On the other hand, deep learning techniques are able to do end-to-end object detection without specifically defining features, and are typically based on convolutional neural networks (CNN).

See also[edit]

References[edit]

  1. ^ Ling Guan; Yifeng He; Sun-Yuan Kung (1 March 2012). Multimedia Image and Video Processing. CRC Press. pp. 331–. ISBN 978-1-4398-3087-1.
  2. ^ Wu, Jianxin, et al. "A scalable approach to activity recognition based on object use." 2007 IEEE 11th international conference on computer vision. IEEE, 2007.
  3. ^ Dalal, Navneet (2005). "Histograms of oriented gradients for human detection" (PDF). Computer Vision and Pattern Recognition. 1.
  4. ^ Ross, Girshick (2014). "Rich feature hierarchies for accurate object detection and semantic segmentation" (PDF). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE: 580–587. arXiv:1311.2524. doi:10.1109/CVPR.2014.81. ISBN 978-1-4799-5118-5.
  5. ^ Girschick, Ross (2015). "Fast R-CNN" (PDF). Proceedings of the IEEE International Conference on Computer Vision: 1440–1448. arXiv:1504.08083. Bibcode:2015arXiv150408083G.
  6. ^ Shaoqing, Ren (2015). "Faster R-CNN". Advances in Neural Information Processing Systems. arXiv:1506.01497.
  7. ^ Liu, Wei (October 2016). SSD: Single shot multibox detector. European Conference on Computer Vision. Lecture Notes in Computer Science. 9905. pp. 21–37. arXiv:1512.02325. doi:10.1007/978-3-319-46448-0_2. ISBN 978-3-319-46447-3.
  8. ^ Redmon, Joseph (2016). "You only look once: Unified, real-time object detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1506.02640. Bibcode:2015arXiv150602640R.
  9. ^ Redmon, Joseph (2017). "YOLO9000: better, faster, stronger". arXiv:1612.08242 [cs.CV].
  10. ^ Redmon, Joseph (2018). "Yolov3: An incremental improvement". arXiv:1804.02767 [cs.CV].
  11. ^ Zhang, Shifeng (2018). "Single-Shot Refinement Neural Network for Object Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 4203–4212. arXiv:1711.06897. Bibcode:2017arXiv171106897Z.
  12. ^ Lin, Tsung-Yi (2017). "Focal Loss for Dense Object Detection". arXiv:1708.02002 [cs.CV].
  13. ^ Zhu, Xizhou (2018). "Deformable ConvNets v2: More Deformable, Better Results". arXiv:1811.11168 [cs.CV].
  14. ^ Dai, Jifeng (2017). "Deformable Convolutional Networks". arXiv:1703.06211 [cs.CV].

External links[edit]