
Anomaly detection

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Olexa Riznyk (talk | contribs) at 23:06, 21 June 2015 (→‎Popular techniques: a reference for replicator neural networks added). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In data mining, anomaly detection (or outlier detection) is the identification of items, events or observations that do not conform to an expected pattern or to other items in a dataset.[1] Typically, the anomalous items translate to some kind of problem, such as bank fraud, a structural defect, a medical problem or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions.[2]

In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular unsupervised methods) will fail on such data unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.[3]
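The effect of aggregation can be illustrated with a minimal Python sketch, assuming events arrive as numeric timestamps in seconds. It counts events per fixed-width time window and flags windows whose count lies far above the typical count, using a median/MAD robust score. A single event inside a burst looks ordinary, so only the per-window count exposes it. The function names, the 60-second window and the cut-off of 3.5 are illustrative assumptions rather than part of the cited work.

```python
from collections import Counter
from statistics import median


def window_counts(timestamps, width):
    """Count events per fixed-width window; window index = floor(t / width)."""
    counts = Counter(int(t // width) for t in timestamps)
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    # Include empty windows so that quiet periods are part of the baseline.
    return {w: counts.get(w, 0) for w in range(first, last + 1)}


def burst_windows(timestamps, width=60.0, threshold=3.5):
    """Return indices of windows whose event count is anomalously high."""
    counts = window_counts(timestamps, width)
    if not counts:
        return []
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # Most windows have identical counts: treat anything above the median as a burst.
        return [w for w, c in counts.items() if c > med]
    # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data.
    return [w for w, c in counts.items() if 0.6745 * (c - med) / mad > threshold]
```

For instance, with roughly four events per minute and one minute containing 40 events, `burst_windows` returns only that minute's window index. The median and MAD are used instead of the mean and standard deviation because the burst itself would otherwise inflate the baseline it is compared against.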

Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit the remainder of the data set least well. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood that a test instance was generated by the learned model.[citation needed]
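The unsupervised and semi-supervised settings can be contrasted in a short Python sketch. The unsupervised score uses no labels and ranks each point by the distance to its k-th nearest neighbour (one common choice among many). The semi-supervised model is fitted only on normal training data, here assumed to follow independent per-feature Gaussians, and flags a test instance whose log-likelihood under that model falls below a threshold. Both the choice of techniques and all names are illustrative assumptions.

```python
import math


def knn_outlier_scores(points, k=2):
    """Unsupervised: score each point by the distance to its k-th nearest neighbour.

    No labels are used; points far from the rest of the data get high scores.
    Requires len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


class GaussianNormalModel:
    """Semi-supervised: fit independent per-feature Gaussians to normal data only."""

    def __init__(self, normal_data, eps=1e-9):
        n = len(normal_data)
        dims = len(normal_data[0])
        self.means = [sum(x[d] for x in normal_data) / n for d in range(dims)]
        # Population variance plus a small epsilon so constant features do not divide by zero.
        self.variances = [
            sum((x[d] - self.means[d]) ** 2 for x in normal_data) / n + eps
            for d in range(dims)
        ]

    def log_likelihood(self, x):
        """Log-density of x under the fitted diagonal Gaussian."""
        return sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, self.means, self.variances)
        )

    def is_anomaly(self, x, threshold):
        """Flag x when its log-likelihood falls below the threshold."""
        return self.log_likelihood(x) < threshold
```

Taking the threshold as the lowest log-likelihood observed on the normal training data is one simple convention; a quantile of the training scores or a held-out validation set is a common alternative.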

Applications

Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and detecting ecosystem disturbances. It is often used in preprocessing to remove anomalous data from a dataset. In supervised learning, removing anomalous data from the dataset often results in a statistically significant increase in accuracy.[4][5]
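As an illustration of removing anomalous data before supervised learning, the sketch below applies an edited nearest-neighbour style filter. This is an assumed choice of technique, not necessarily the procedure used in the cited studies. A training instance is discarded when its label disagrees with the majority label of its k nearest neighbours, so isolated mislabeled or atypical points are removed before a classifier is trained. Points are assumed to be numeric tuples, and the function name is hypothetical.

```python
import math
from collections import Counter


def edited_nn_filter(X, y, k=3):
    """Return indices of training instances kept by an edited-NN style filter.

    An instance is kept only if the majority label among its k nearest
    neighbours (excluding itself) equals its own label.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Pair each other instance's distance with its label, nearest first.
        neighbours = sorted(
            ((math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        majority, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority == yi:
            keep.append(i)
    return keep
```

With an odd k and two classes the majority vote cannot tie. With more classes, ties are broken by `Counter.most_common`, which orders equal counts by first appearance, here nearest neighbour first.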

Several anomaly detection techniques have been proposed in the literature. Some of the popular techniques are:

Application to data security

Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[23] Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.[24] Types of statistics proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations.[25] The counterpart of anomaly detection in intrusion detection is misuse detection.
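A minimal sketch of a threshold-and-statistics profile of the kind described above follows. For each subject (for example a user or a host) it keeps a running count, mean and variance of one numeric activity measure using Welford's online algorithm, and flags a new observation lying more than d standard deviations from that subject's mean. The class names, the measure itself, the value d = 3 and the warm-up length are illustrative assumptions.

```python
import math
from collections import defaultdict


class Profile:
    """Running mean/variance of one measure for one subject (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; zero until at least two observations exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class ProfileDetector:
    """Hypothetical per-subject mean/standard-deviation threshold detector."""

    def __init__(self, d=3.0, min_obs=5):
        self.d = d              # number of standard deviations tolerated
        self.min_obs = min_obs  # warm-up: no alerts until a profile has this many points
        self.profiles = defaultdict(Profile)

    def observe(self, subject, x):
        """Return True if x is anomalous for this subject, then fold x into the profile."""
        p = self.profiles[subject]
        anomalous = False
        if p.n >= self.min_obs:
            sd = p.std()
            if sd > 0:
                anomalous = abs(x - p.mean) > self.d * sd
            else:
                anomalous = x != p.mean
        p.update(x)
        return anomalous
```

Folding every observation, including flagged ones, into the profile lets it adapt to gradual change in legitimate behaviour, but it also means that an intruder who escalates slowly can shift the baseline; excluding flagged values from the update is one alternative.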

Software

  • ELKI is an open-source Java data mining toolkit that contains several anomaly detection algorithms, as well as index acceleration for them.

See also

References

  1. ^ doi:10.1145/1541880.1541882
  2. ^ doi:10.1007/s10462-004-4304-y
  3. ^ Dokas, Paul; Ertoz, Levent; Kumar, Vipin; Lazarevic, Aleksandar; Srivastava, Jaideep; Tan, Pang-Ning (2002). "Data mining for network intrusion detection" (PDF). Proceedings NSF Workshop on Next Generation Data Mining.
  4. ^ doi:10.1109/TSMC.1976.4309523
  5. ^ doi:10.1109/IJCNN.2011.6033571
  6. ^ doi:10.1007/s007780050006
  7. ^ doi:10.1145/342009.335437
  8. ^ doi:10.1007/3-540-45681-3_2
  9. ^ doi:10.1145/335191.335388
  10. ^ doi:10.1007/s10618-012-0300-z
  11. ^ doi:10.1007/978-3-642-01307-2_86
  12. ^ doi:10.1109/ICDM.2012.21
  13. ^ doi:10.1002/sam.11161
  14. ^ doi:10.1162/089976601750264965
  15. ^ doi:10.1007/3-540-46145-0_17
  16. ^ doi:10.1016/S0167-8655(03)00003-5
  17. ^ doi:10.1145/1081870.1081891
  18. ^ doi:10.1007/978-3-642-12026-8_29
  19. ^ doi:10.1137/1.9781611972818.2
  20. ^ doi:10.1137/1.9781611972825.90
  21. ^ doi:10.1145/2594473.2594476
  22. ^ doi:10.1145/2618243.2618257
  23. ^ doi:10.1109/TSE.1987.232894
  24. ^ doi:10.1109/RISP.1990.63857
  25. ^ Jones, Anita K.; Sielken, Robert S. (1999). "Computer System Intrusion Detection: A Survey". Technical Report, Department of Computer Science, University of Virginia, Charlottesville, VA. CiteSeerX 10.1.1.24.7802.