Part-based models

Part-based models are a broad class of detection algorithms for images in which various parts of the image are used separately to determine whether and where an object of interest exists. Among these methods, a very popular one is the constellation model, which refers to schemes that detect a small number of features and their relative positions and then determine whether the object of interest is present.

These models build on the original idea of Fischler and Elschlager^[1] of using the relative positions of a few template matches, and they evolve in complexity in the work of Perona and others.^[2] These models are covered in the constellation models section. An example illustrates what is meant by a constellation model. Suppose we are trying to detect faces. A constellation model would use smaller part detectors, for instance mouth, nose and eye detectors, and judge whether an image contains a face based on the relative positions in which the components fire.
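The face example can be sketched in a few lines of Python. This is only an illustration of the idea, not the formulation used by Fischler and Elschlager or by Perona and others: the part names, the expected offsets in `EXPECTED_OFFSETS`, the tolerance `SIGMA` and the decision threshold are all hypothetical values chosen for the example, and the part detectors themselves are assumed to have already fired at the given positions.

```python
import math

# Hypothetical expected offsets (dx, dy) of each part from the nose, in units
# of inter-eye distance. Illustrative numbers only, not learned values.
EXPECTED_OFFSETS = {
    "left_eye": (-0.5, -0.6),
    "right_eye": (0.5, -0.6),
    "mouth": (0.0, 0.6),
}
SIGMA = 0.15  # assumed tolerance on each normalized offset


def constellation_score(parts):
    """Return a score in [0, 1]; 1 means the layout matches exactly.

    `parts` maps part names to the (x, y) pixel position where the
    corresponding part detector fired.
    """
    nose_x, nose_y = parts["nose"]
    lx, ly = parts["left_eye"]
    rx, ry = parts["right_eye"]
    scale = math.hypot(rx - lx, ry - ly)  # inter-eye distance sets the scale
    if scale == 0:
        return 0.0
    log_score = 0.0
    for name, (ex, ey) in EXPECTED_OFFSETS.items():
        px, py = parts[name]
        dx = (px - nose_x) / scale - ex
        dy = (py - nose_y) / scale - ey
        # Gaussian penalty on the deviation from the expected relative position
        log_score -= (dx * dx + dy * dy) / (2 * SIGMA ** 2)
    return math.exp(log_score)


def looks_like_face(parts, threshold=0.5):
    """Decide on a face purely from the relative positions of the parts."""
    return constellation_score(parts) >= threshold


face = {"nose": (100, 100), "left_eye": (75, 70),
        "right_eye": (125, 70), "mouth": (100, 130)}
scrambled = dict(face, mouth=(100, 40))  # mouth detected above the eyes
```

With these assumed values, `looks_like_face(face)` accepts the plausible layout and `looks_like_face(scrambled)` rejects the one whose mouth fires above the eyes, even though every individual part detector fired in both cases.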

Non-constellation models

Many overlapping ideas fall under the title part-based models, even after excluding models of the constellation variety. The uniting thread is the use of small parts to build up an algorithm that can detect or recognize an item (a face, a car, etc.). Early efforts, such as those by Yuille, Hallinan and Cohen,^[3] sought to detect facial features and fit deformable templates to them. These templates were mathematically defined outlines that sought to capture the position and shape of the feature. Yuille, Hallinan and Cohen's algorithm has trouble finding the global minimum fit for a given model, so templates did occasionally become mismatched.

Later efforts, such as those by Poggio and Brunelli,^[4] focus on building specific detectors for each feature. They use successive detectors to estimate scale, position, etc., each narrowing the search field used by the next detector. As such it is a part-based model; however, it seeks to recognize specific faces rather than to detect the presence of a face. It does so by using each detector to build a 35-element vector of characteristics of a given face. These characteristics can then be compared to recognize specific faces; however, cut-offs can also be used to detect whether a face is present at all.^[5]
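The comparison step can be sketched as a nearest-neighbour lookup with a cut-off. This is a minimal illustration under stated assumptions: the Euclidean distance, the function name `nearest_identity`, the gallery names and the cut-off value are all hypothetical, and the source does not say which comparison rule Poggio and Brunelli use.

```python
import math


def nearest_identity(query, gallery, cutoff):
    """Return (name, distance) for the closest stored face vector.

    `gallery` maps a person's name to a feature vector of the same length
    as `query`. If even the best match is farther than `cutoff`, the name
    is None, which can be read as "no known face here".
    """
    best_name, best_dist = None, math.inf
    for name, vec in gallery.items():
        dist = math.dist(query, vec)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > cutoff:
        return None, best_dist
    return best_name, best_dist


# Toy 35-element vectors standing in for measured facial characteristics.
gallery = {"alice": [0.0] * 35, "bob": [1.0] * 35}
query = [0.1] * 35
name, dist = nearest_identity(query, gallery, cutoff=2.0)
```

Here `name` is `"alice"` because the query lies much closer to her stored vector, while a query far from every stored vector returns `None`.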

Cootes, Lanitis and Taylor^[6] build on this work by constructing a 100-element representation of the primary features of a face. The model is more detailed and robust; however, given the additional complexity (100 elements compared to 35), this might be expected. The model essentially computes deviations from a mean face in terms of shape, orientation and gray level. The model is matched by minimizing an error function. These three classes of algorithms naturally fall within the scope of template matching.^[7]
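One simple way to picture "deviations from a mean" matched by minimizing an error function is a linear model with a few orthonormal modes of variation. The sketch below assumes such a model purely for illustration; the function name `fit_linear_model`, the toy mean and the toy modes are hypothetical, and the actual model of Cootes, Lanitis and Taylor is considerably richer.

```python
def fit_linear_model(sample, mean, modes):
    """Fit `sample` as mean + sum(b_i * mode_i) in the least-squares sense.

    `modes` are assumed to be orthonormal, so the error-minimizing
    parameters are the projections of the deviation onto each mode.
    Returns (params, error), where error is the squared residual.
    """
    deviation = [s - m for s, m in zip(sample, mean)]
    params = [sum(d * p for d, p in zip(deviation, mode)) for mode in modes]
    # Reconstruct the sample from the mean and the fitted parameters
    recon = list(mean)
    for b, mode in zip(params, modes):
        for i, p in enumerate(mode):
            recon[i] += b * p
    error = sum((s - r) ** 2 for s, r in zip(sample, recon))
    return params, error


# Toy 3-element "face" with two modes of variation.
mean = [0.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
params, error = fit_linear_model([2.0, 3.0, 4.0], mean, modes)
```

In this toy case the parameters are `[2.0, 3.0]` and the residual error is `16.0`, the part of the sample that the two modes cannot explain. A large residual suggests the sample is poorly described by the model.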

Of the non-constellation models, perhaps the most successful is that of Leibe and Schiele.^[8]^[9] Their algorithm finds templates associated with positive examples and records both the template (an average of the feature in all positive examples where it is present) and the position of the center of the item (a face, for instance) relative to the template. The algorithm then takes a test image and runs an interest point locator (preferably one of the scale-invariant variety). These interest points are then compared to each template, and the probability of a match is computed. All templates then cast votes for the center of the detected object, weighted by the probability of the match and the probability that the template predicts the center. These votes are summed, and if there are enough of them, sufficiently well clustered, the presence of the object in question (e.g. a face or a car) is predicted.
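The voting stage described above can be sketched as follows. This is a simplified illustration, not Leibe and Schiele's implementation: the Gaussian match probability, the fixed grid of bins used to cluster votes, and the names `vote_for_centers`, `detect`, `bin_size`, `sigma` and `min_votes` are all assumptions made for the example, and the interest points and codebook are assumed to be given.

```python
import math
from collections import defaultdict


def vote_for_centers(points, codebook, bin_size=10, sigma=0.5):
    """Accumulate weighted votes for the object center on a coarse grid.

    points:   list of ((x, y), descriptor) pairs from an interest point locator
    codebook: list of (template, (dx, dy), offset_prob) entries, where
              (dx, dy) is the object center relative to the template
    """
    votes = defaultdict(float)
    for (x, y), desc in points:
        for template, (dx, dy), offset_prob in codebook:
            # Assumed match probability: Gaussian in descriptor distance
            d = math.dist(desc, template)
            match_prob = math.exp(-d * d / (2 * sigma ** 2))
            cx, cy = x + dx, y + dy
            key = (int(cx // bin_size), int(cy // bin_size))
            votes[key] += match_prob * offset_prob
    return votes


def detect(points, codebook, min_votes=1.5, bin_size=10):
    """Return the center of the strongest bin, or None if support is too weak."""
    votes = vote_for_centers(points, codebook, bin_size)
    if not votes:
        return None
    best_bin, total = max(votes.items(), key=lambda kv: kv[1])
    if total < min_votes:
        return None
    return ((best_bin[0] + 0.5) * bin_size, (best_bin[1] + 0.5) * bin_size)


# Toy codebook: an "eye" template above the center, a "mouth" template below it.
codebook = [([1.0, 0.0], (0, 20), 1.0), ([0.0, 1.0], (0, -20), 1.0)]
points = [((50, 32), [1.0, 0.0]), ((52, 74), [0.0, 1.0])]
center = detect(points, codebook)
```

In this toy setup both interest points vote for the same grid cell, so `detect` reports its center, `(55.0, 55.0)`; with only one of the two points the accumulated vote stays below `min_votes` and no object is reported.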

The algorithm is effective because it imposes much less constellational rigidity than the constellation model does. Admittedly, the constellation model can be modified to allow for occlusions and other large abnormalities, but this model is naturally suited to them. That said, the more rigid structure of the constellation model is sometimes desired.

References

[1] Fischler, M.A.; Elschlager, R.A. (1973). "The Representation and Matching of Pictorial Structures". IEEE Transactions on Computers. C-22: 67–92. doi:10.1109/T-C.1973.223602.
[2] Fergus, R.; Perona, P.; Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2. pp. II–264. doi:10.1109/CVPR.2003.1211479. ISBN 0-7695-1900-8.
[3] Yuille, Alan L.; Hallinan, Peter W.; Cohen, David S. (1992). "Feature extraction from faces using deformable templates". International Journal of Computer Vision. 8 (2): 99. doi:10.1007/BF00127169.
[4] Brunelli, R.; Poggio, T. (1993). "Face recognition: Features versus templates". IEEE Transactions on Pattern Analysis and Machine Intelligence. 15 (10): 1042. doi:10.1109/34.254061.
[5] Simonite, Tom. "Photo Algorithms ID White Men Fine—Black Women, Not So Much". Wired. ISSN 1059-1028. Retrieved 2023-04-17.
[6] Lanitis, A.; Taylor, C.J.; Cootes, T.F. (1995). A unified approach to coding and interpreting face images. IEEE International Conference on Computer Vision. p. 368. doi:10.1109/ICCV.1995.466919. ISBN 0-8186-7042-8.
[7] Brunelli, R. (2009). Template Matching Techniques in Computer Vision: Theory and Practice. Wiley. ISBN 978-0-470-51706-2.
[8] Leibe, Bastian; Leonardis, Aleš; Schiele, Bernt (2007). "Robust Object Detection with Interleaved Categorization and Segmentation". International Journal of Computer Vision. 77 (1–3): 259–289. CiteSeerX 10.1.1.111.464. doi:10.1007/s11263-007-0095-3.
[9] Leibe, Bastian; Leonardis, Ales; Schiele, Bernt (2006). "An Implicit Shape Model for Combined Object Categorization and Segmentation". Toward Category-Level Object Recognition. Lecture Notes in Computer Science. Vol. 4170. p. 508. CiteSeerX 10.1.1.5.6272. doi:10.1007/11957959_26. ISBN 978-3-540-68794-8.
