Novelty detection

From Wikipedia, the free encyclopedia

Novelty detection is the identification of new or unknown data that a machine learning system was not trained on and was not previously aware of,[1] using either statistical or machine-learning-based approaches.

Novelty detection is one of the fundamental requirements of a good classification system.[1] A machine learning system can never be trained on every possible object class, so its performance will be poor for classes that are under-represented in the training set.[2] A good classification system must therefore be able to differentiate between known and unknown objects during testing.[1] For this purpose, different models for novelty detection have been proposed.

Novelty detection is a hard problem in machine learning because it depends on the statistics of the information that is already known. No generally applicable, parameter-free method for outlier detection in a high-dimensional space is known yet. Novelty detection has a variety of applications, especially in signal processing, computer vision, pattern recognition, data mining and robotics.[1] Another important application is the detection of a disease or potential fault whose class may be under-represented in the training set.[2]

The statistical approaches to novelty detection may be classified into parametric and non-parametric approaches. Parametric approaches assume that the data follow a specific statistical distribution (such as a Gaussian distribution) and model the data using their mean and covariance, whereas non-parametric approaches make no assumption about the statistical properties of the data.[1]
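
The contrast between the two families can be illustrated with a minimal Python sketch. It is not taken from the cited sources. The parametric detector fits a two-dimensional Gaussian (mean and covariance) and flags points whose squared Mahalanobis distance exceeds a chi-square quantile. The non-parametric detector uses the distance to the k-th nearest training point, which is only one of several possible non-parametric choices. The function names, the synthetic training data, the 99% cut-off, and the choice k = 5 are all assumptions made for illustration.

```python
import math
import random


def fit_gaussian(points):
    """Estimate the mean and 2x2 covariance of 2-D training points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from the fitted Gaussian."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Uses the closed-form inverse of the symmetric 2x2 covariance matrix.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def knn_distance(point, points, k):
    """Distance from point to its k-th nearest neighbour in points."""
    return sorted(math.dist(point, p) for p in points)[k - 1]


def knn_threshold(points, k):
    """Largest leave-one-out k-NN distance observed inside the training set."""
    # Index k (not k - 1) skips the zero distance from each point to itself.
    return max(sorted(math.dist(p, q) for q in points)[k] for p in points)


# Synthetic "known" data (an assumption made purely for this illustration).
random.seed(0)
train = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]

# Parametric model: Gaussian fit plus a chi-square cut-off.
mean, cov = fit_gaussian(train)
# 99% quantile of the chi-square distribution with 2 degrees of freedom,
# whose CDF is 1 - exp(-x / 2). Approximate, since mean and cov are estimated.
gauss_limit = -2.0 * math.log(0.01)

# Non-parametric model: no distributional assumption, only neighbour distances.
knn_limit = knn_threshold(train, k=5)

for candidate in [(0.1, -0.2), (8.0, 8.0)]:
    parametric_novel = mahalanobis_sq(candidate, mean, cov) > gauss_limit
    non_parametric_novel = knn_distance(candidate, train, k=5) > knn_limit
    print(candidate, "parametric:", parametric_novel,
          "non-parametric:", non_parametric_novel)
```

In this sketch the parametric detector needs only the fitted mean and covariance at test time, while the non-parametric detector must keep the whole training set. Both thresholds are tunable parameters, which is consistent with the observation that no parameter-free method is known.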