Hierarchical classification

A hierarchical classifier is a classifier that maps input data into defined subsumptive output categories. The classification occurs first on a low-level with highly specific pieces of input data. The classifications of the individual pieces of data are then combined systematically and classified on a higher level iteratively until one output is produced. This final output is the overall classification of the data. Depending on application-specific details, this output can be one of a set of pre-defined outputs, one of a set of on-line learned outputs, or even a new novel classification that hasn't been seen before. Generally, such systems rely on relatively simple individual units of the hierarchy that have only one universal function to do the classification. In a sense, these machines rely on the power of the hierarchical structure itself instead of the computational abilities of the individual components. This makes them relatively simple, easily expandable, and very powerful.

Application

Many applications exist that are efficiently implemented using hierarchical classifiers or variants thereof. The clearest example^{[according to whom?]} lies in the area of computer vision. Recognizing pictures is something that hierarchical processing can do well.^{[citation needed]} The reason the model is so well fit to this application is that pictures can intuitively be viewed as a collection of components or objects. These objects can be viewed as collections of smaller components like shapes, which can be viewed as collections of lines, and so on. This coincides directly with the way hierarchical processing works. If a simple unit of the processing hierarchy can classify lines into shapes, then an equivalent unit could process shapes into objects (of course, there are some intermediate steps between these, but the idea is there). Thus, if you arrange these generic classifying units in a hierarchical fashion (using a directed acyclic graph), a full step-by-step classification can ensue from pixels of color all the way up to an abstract label of what is in the picture.

There are a lot of similar applications that can also be tackled by hierarchical classification such as written text recognition^{[clarification needed - ambiguous term]}, robot awareness, etc. It is possible that mathematical models and problem solving methods can also be represented in this fashion.^{[citation needed]} If this is the case, future research in this area could lead to very successful automated theorem provers across multiple domain. Such developments would be very powerful,^{[according to whom?]} but is yet unclear how exactly these models are applicable.

Similar models

One similar model is the notion of graphical models where an input space is systematically broken down into subspaces, and those into smaller subspaces, and so on, creating a hierarchy of input spaces. This allows for predictions about behavior of inputs in various regions with statistical methods such as Bayesian networks allowing for easily computable conditional probabilities. Recently, there has been a lot of research in this area with respect to vision systems. Hierarchical classifiers are extremely similar to these models, but do not have to depend on statistical interpretation.

Another similar model is the simple neural network. Commonly, neural networks are a network of individual nodes that each tries to learn a function of input to output. The functionality of the network as a whole is dependent on the ability of the nodes to work together to yield the correct overall output. Neural networks can be trained to do lots of tasks and are often domain-specific. However, as in the case of graphical models, neural networks have shown great general-purpose behavior in computer vision even when tackling relatively general problems. Hierarchical classifiers can, in fact, be seen as a special case of neural networks where, instead of learning functions, discrete output classes are learned.^{[citation needed]} Learning is then a pattern-match with an error threshold instead of an interpolation of an approximate function.

Neuroscience's perspective on the workings of the human cortex also serves as a similar model. The generally accepted view of the brain today is that the brain is a generic pattern machine that works to abstract information again and again until it relates to a broad stored concept. For instance, a familiar face is not stored as a collection of pixels, rather as a combination of very specific eyes, nose, mouth, ears, etc. In this way, when the data has been classified into those components, that collection of those components can then be classified into that face. Thus, neuroscience trends and data are very valuable to research in these areas as they are highly relevant to the inner workings of these models. This is especially true since the human brain is inherently very good at applications like facial recognition that these models strive to be good at. The brain is in a sense a benchmark of proficiency for hierarchical processing.