A hierarchical classifier is a classifier that maps input data into defined subsumptive output categories. The classification occurs first on a low-level with highly specific pieces of input data. The classifications of the individual pieces of data are then combined systematically and classified on a higher level iteratively until one output is produced. This final output is the overall classification of the data. Depending on application-specific details, this output can be one of a set of pre-defined outputs, one of a set of on-line learned outputs, or even a new novel classification that hasn't been seen before. Generally, such systems rely on relatively simple individual units of the hierarchy that have only one universal function to do the classification. In a sense, these machines rely on the power of the hierarchical structure itself instead of the computational abilities of the individual components. This makes them relatively simple, easily expandable, and very powerful.
Many applications exist that are efficiently implemented using hierarchical classifiers or variants thereof. The clearest example[according to whom?] lies in the area of computer vision. Recognizing pictures is something that hierarchical processing can do well. The reason the model is so well fit to this application is that pictures can intuitively be viewed as a collection of components or objects. These objects can be viewed as collections of smaller components like shapes, which can be viewed as collections of lines, and so on. This coincides directly with the way hierarchical processing works. If a simple unit of the processing hierarchy can classify lines into shapes, then an equivalent unit could process shapes into objects (of course, there are some intermediate steps between these, but the idea is there). Thus, if you arrange these generic classifying units in a hierarchical fashion (using a directed acyclic graph), a full step-by-step classification can ensue from pixels of color all the way up to an abstract label of what is in the picture.
There are a lot of similar applications that can also be tackled by hierarchical classification such as written text recognition[clarification needed - ambiguous term], robot awareness, etc. It is possible that mathematical models and problem solving methods can also be represented in this fashion. If this is the case, future research in this area could lead to very successful automated theorem provers across multiple domain. Such developments would be very powerful,[according to whom?] but is yet unclear how exactly these models are applicable.
One similar model is the notion of graphical models where an input space is systematically broken down into subspaces, and those into smaller subspaces, and so on, creating a hierarchy of input spaces. This allows for predictions about behavior of inputs in various regions with statistical methods such as Bayesian networks allowing for easily computable conditional probabilities. Recently, there has been a lot of research in this area with respect to vision systems. Hierarchical classifiers are extremely similar to these models, but do not have to depend on statistical interpretation.
Another similar model is the simple neural network. Commonly, neural networks are a network of individual nodes that each tries to learn a function of input to output. The functionality of the network as a whole is dependent on the ability of the nodes to work together to yield the correct overall output. Neural networks can be trained to do lots of tasks and are often domain-specific. However, as in the case of graphical models, neural networks have shown great general-purpose behavior in computer vision even when tackling relatively general problems. Hierarchical classifiers can, in fact, be seen as a special case of neural networks where, instead of learning functions, discrete output classes are learned. Learning is then a pattern-match with an error threshold instead of an interpolation of an approximate function.
Neuroscience's perspective on the workings of the human cortex also serves as a similar model. The generally accepted view of the brain today is that the brain is a generic pattern machine that works to abstract information again and again until it relates to a broad stored concept. For instance, a familiar face is not stored as a collection of pixels, rather as a combination of very specific eyes, nose, mouth, ears, etc. In this way, when the data has been classified into those components, that collection of those components can then be classified into that face. Thus, neuroscience trends and data are very valuable to research in these areas as they are highly relevant to the inner workings of these models. This is especially true since the human brain is inherently very good at applications like facial recognition that these models strive to be good at. The brain is in a sense a benchmark of proficiency for hierarchical processing.
Tomaso Poggio of MIT does research in the area of computer vision and has recently been developing a hierarchical vision system that is both relatively simple and empirically comparable to the human brain. The research combines neural networks with lots of cognitive psychology and neuroscience data in an attempt to create the most realistic and human-like artificial vision system that exists.
Tom Dean of Brown University researches graphical models and Bayesian networks and is currently developing a hierarchical system that claims to have very good results in vision problems. This model is able to very simply produce properties such as rotational and translational invariance that a reliable vision system needs in order to yield non-trivial results.
Jeff Hawkins is the founder of Palm Computing and the Redwood Neuroscience Institute and is the author of On Intelligence in which he proposes his theories on the workings of the brain that center around hierarchical processing and the brain as a generic pattern machine that functions by continually abstracting and categorizing data.
Leslie Lamport is the author of the paper "How to write a proof", in which he proposes to write proofs in a hierarchical fashion with main ideas, sub-ideas, sub-sub-ideas, etc. The proofs are written in such a way that they mirror the structure of a tree. Some automated theorem provers of today have attempted to capitalize on this formalization of the structure of proofs as to more efficiently solve problems. However, none of these theorem provers have the capabilities to adequately solve problems across domains as current vision systems are beginning to be able to do. Thus, it is highly possible but still unknown whether similar tactics to the ones used in the vision system and specifically hierarchical processing can dramatically improve automated theorem provers.