
Concept drift: Difference between revisions

From Wikipedia, the free encyclopedia

==Researchers==
* [https://venus.tue.nl/ep-cgi/ep_detail.opl?fac_id=92&rn=20040766&taal=US&hash=GEEnA9HBATTnjJFFZWvB7uqTXU Jorn Bakker], Eindhoven University of Technology, the Netherlands
* [http://www.cs.waikato.ac.nz/~abifet/ Albert Bifet], University of Waikato, New Zealand
* [http://www.mat.ua.pt/gladys Gladys Castillo], University of Aveiro, Portugal
* [http://www.cs.kuleuven.be/~anton/ Anton Dries], Katholieke Universiteit Leuven, Belgium
* [http://www.liaad.up.pt/~jgama/ Joao Gama], University of Porto, Portugal
* [http://iaia.lcc.uma.es/~rfm Raúl Fidalgo], University of Málaga, Spain
* [http://www.math.bas.bg/~koychev/ Ivan Koychev], Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
* [http://www6.miami.edu/UMH/CDA/UMH_Main/1,1770,44604-1;45098-3,00.html Miroslav Kubat], University of Miami, USA
* [http://www.bangor.ac.uk/~mas00a/ Ludmila I. Kuncheva], University of Wales, Bangor, UK
* [http://www.cs.georgetown.edu/~maloof/ Mark Maloof], Georgetown University, USA
* [http://www.cs.bham.ac.uk/~flm/index.html Leandro L. Minku], University of Birmingham, UK
* [http://louisville.edu/~o0nasr01/ Olfa Nasraoui], University of Louisville, USA
* [http://lis2.huie.hokudai.ac.jp/~knishida/index_en.html Kyosuke Nishida], Hokkaido University, Japan

Revision as of 12:14, 12 October 2009

In predictive analytics and machine learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes.

The term concept refers to the quantity you are looking to predict. More generally, it can also refer to other phenomena of interest besides the target concept, such as an input, but, in the context of concept drift, the term commonly refers to the target variable.
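
One common way to make this precise, offered here as an assumption rather than a definition taken from this article, is to write $X$ for the inputs, $y$ for the target variable, and $P_t$ for their joint distribution at time $t$. Concept drift between two times $t_0$ and $t_1$ can then be stated as:

```latex
\exists\, X :\; P_{t_0}(X, y) \neq P_{t_1}(X, y),
\qquad \text{where} \qquad
P_t(X, y) = P_t(X)\, P_t(y \mid X).
```

Under this notation, drift in the target concept corresponds to a change in $P_t(y \mid X)$, while a change in $P_t(X)$ alone corresponds to the broader use of the term for inputs.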

Examples

In a fraud detection application the target concept may be a binary attribute FRAUDULENT with values "yes" or "no" that indicates whether a given transaction is fraudulent. Or, in a weather prediction application, there may be several target concepts such as TEMPERATURE, PRESSURE, and HUMIDITY.

The behavior of the customers in an online shop may change over time. Let's say you want to predict weekly merchandise sales, and you have developed a predictive model that works to your satisfaction. The model may use inputs such as the amount of money spent on advertising, promotions you are running, and other metrics that may affect sales. What you are likely to experience is that the model will become less and less accurate over time - you will be a victim of concept drift. In the merchandise sales application, one reason for concept drift may be seasonality, which means that shopping behavior changes seasonally. You will likely have higher sales in the winter holiday season than during the summer.

Possible remedies

To prevent deterioration in prediction accuracy over time, the model has to be refreshed periodically. One approach is to retrain the model using only the most recently observed samples (Widmer and Kubat, 1996). Another approach is to add new inputs that may better explain the causes of the concept drift. For the sales prediction application, you may be able to reduce concept drift by adding information about the season to your model. By providing information about the time of the year you will likely reduce the rate of deterioration of your model, but you will likely never be able to prevent concept drift altogether. This is because actual shopping behavior does not follow any static, finite model. New factors that influence shopping behavior may arise at any time, and the influence of the known factors or their interactions may change.
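
The idea of retraining on only the most recent samples can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of Widmer and Kubat: the "model" is simply the mean of a fixed-size window, and the class name `SlidingWindowRegressor`, the window size, and the synthetic sales figures are all hypothetical.

```python
from collections import deque
from statistics import mean


class SlidingWindowRegressor:
    """Predicts the mean target of the most recent `window_size` samples.

    Hypothetical illustration: the "model" is just a mean, and it is
    refitted every time a new observation arrives.
    """

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)  # old samples drop out automatically
        self.prediction = 0.0

    def predict(self):
        return self.prediction

    def update(self, y):
        self.window.append(y)
        self.prediction = mean(self.window)  # "retrain" on the window only


def mean_abs_error(model, stream):
    """Score each prediction before the true value is revealed to the model."""
    errors = []
    for y in stream:
        errors.append(abs(model.predict() - y))
        model.update(y)
    return mean(errors)


# Synthetic weekly sales (assumed data): the level jumps from 100 to 200 halfway.
sales = [100.0] * 50 + [200.0] * 50

windowed = SlidingWindowRegressor(window_size=5)
full_history = SlidingWindowRegressor(window_size=len(sales))  # never forgets

windowed_error = mean_abs_error(windowed, sales)
full_history_error = mean_abs_error(full_history, sales)
```

On this synthetic stream the windowed model recovers within a few steps of the jump, while the model that keeps the full history stays biased toward the old level. The window size is a trade-off: a short window adapts quickly but is noisier when the concept is stable.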

Concept drift cannot be avoided if you are looking to predict a complex phenomenon that is not governed by fixed laws of nature. All processes that arise from human activity, such as socioeconomic processes, and biological processes are likely to experience concept drift. Therefore, periodic retraining, also known as refreshing, of your model is inescapable.

Software

  • RapidMiner (formerly YALE, Yet Another Learning Environment): free open-source software for knowledge discovery, data mining, and machine learning, also featuring data stream mining, learning time-varying concepts, and tracking drifting concepts (if used in combination with its data stream mining plugin, formerly the concept drift plugin).
  • EDDM (Early Drift Detection Method): free open-source implementation of drift detection methods in Weka.
  • MOA (Massive Online Analysis): free open-source software specifically for mining data streams with concept drift. It contains a prequential evaluation method, the EDDM concept drift methods, a reader of ARFF real datasets, and artificial stream generators such as SEA concepts, STAGGER, rotating hyperplane, random tree, and random radius based functions. MOA supports bi-directional interaction with Weka.
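
Prequential evaluation, mentioned above for MOA, tests the model on each incoming example before training on it. The following Python sketch illustrates the idea only; it is not MOA's API (MOA is a Java framework), and the names `prequential_accuracy` and `MajorityClassLearner`, as well as the toy stream, are hypothetical.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream):
    """Interleaved test-then-train: predict first, then learn from the example."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


# Toy stream with an abrupt drift: the label switches from 0 to 1 halfway.
stream = [((i,), 0) for i in range(10)] + [((i,), 1) for i in range(10)]
accuracy = prequential_accuracy(MajorityClassLearner(), stream)
```

Because every example is used for testing before it is used for training, no separate hold-out set is needed, and a drop in the running accuracy points to the moment where the learner stopped matching the stream.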

Benchmark datasets

Real

  • Elec2, electricity demand, 2 classes, 45312 instances. Reference: M.Harries, Splice-2 comparative evaluation: Electricity pricing, Technical report, The University of New South Wales, 1999. Download from J.Gama webpage.
  • Text mining, a collection of text mining datasets with concept drift, maintained by I.Katakis. Download

Artificial

  • STAGGER, J.C.Schlimmer, R.H.Granger, Incremental Learning from Noisy Data, Mach. Learn., vol.1, no.3, 1986.
  • SEA concepts, N.W.Street, Y.Kim, A streaming ensemble algorithm (SEA) for large-scale classification, KDD'01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, 2001. Download from J.Gama webpage.
  • A toolbox, A.Narasimhamurthy, L.I.Kuncheva, A framework for generating data to simulate changing environments, Proc. IASTED, Artificial Intelligence and Applications, Innsbruck, Austria, 2007. Access.
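
As an illustration of how an artificial benchmark with abrupt drift can be produced, the sketch below generates a stream in the style of the SEA concepts. The details are assumptions based on how the benchmark is commonly described, not taken from this article: three attributes drawn uniformly from [0, 10], only the first two relevant, a label given by whether their sum stays below a threshold, thresholds 8, 9, 7 and 9.5 for the four concepts, and 10% class noise. Consult the Street and Kim paper before relying on these values. The function name `sea_stream` is hypothetical.

```python
import random

# Thresholds of the four concepts as commonly reported (assumption, see text above).
SEA_THRESHOLDS = (8.0, 9.0, 7.0, 9.5)


def sea_stream(n_per_concept, noise=0.1, seed=0):
    """Yield (features, label, concept_index) with an abrupt drift between concepts."""
    rng = random.Random(seed)
    for concept, theta in enumerate(SEA_THRESHOLDS):
        for _ in range(n_per_concept):
            features = tuple(rng.uniform(0.0, 10.0) for _ in range(3))
            # Only the first two attributes matter; the third is irrelevant.
            label = 1 if features[0] + features[1] <= theta else 0
            if rng.random() < noise:
                label = 1 - label  # class noise: flip the label
            yield features, label, concept


# Example: a noise-free stream with 100 examples per concept.
clean = list(sea_stream(100, noise=0.0, seed=1))
```

The concept index is returned only so that experiments can mark where the drift points are; a learner under test would see just the features and the label.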

Data generation frameworks

  • Lindstrom P, SJ Delany & B MacNamee (2008) Autopilot: Simulating Changing Concepts in Real Data. In: Proceedings of the 19th Irish Conference on Artificial Intelligence & Cognitive Science, D Bridge, K Brown, B O'Sullivan & H Sorensen (eds.), p272-263 PDF

  • Narasimhamurthy A., L.I. Kuncheva, A framework for generating data to simulate changing environments, Proc. IASTED, Artificial Intelligence and Applications, Innsbruck, Austria, 2007, 384-389 PDF

Researchers

Bibliographic references

Reviews

  • Gaber, M. M., Zaslavsky, A., and Krishnaswamy, S., Mining Data Streams: A Review, in ACM SIGMOD Record, Vol. 34, No. 1, June 2005, ISSN: 0163-5808
  • Jiang, J., A Literature Survey on Domain Adaptation of Statistical Classifiers. 2008. PDF
  • Kuncheva L.I. Classifier ensembles for detecting concept change in streaming data: Overview and perspectives, Proc. 2nd Workshop SUEMA 2008 (ECAI 2008), Patras, Greece, 2008, 5-10, PDF
  • Kuncheva L.I., Classifier ensembles for changing environments, Proceedings 5th International Workshop on Multiple Classifier Systems, MCS2004, Cagliari, Italy, in F. Roli, J. Kittler and T. Windeatt (Eds.), Lecture Notes in Computer Science, Vol 3077, 2004, 1-15, PDF.
  • Tsymbal, A., The problem of concept drift: Definitions and related work. Technical Report. 2004, Department of Computer Science, Trinity College: Dublin, Ireland. PDF
  • Widmer, G., Learning in Dynamically Changing Domains: Recent Contributions of Machine Learning. In Proceedings of the MLNet Workshop on Learning in Dynamically Changing Domains: Theory Revision and Context Dependence Issues, Prague, Czech Republic, 1997. PS
  • Zliobaite, I., Learning under Concept Drift: an Overview. Technical Report. 2009, Faculty of Mathematics and Informatics, Vilnius University: Vilnius, Lithuania. PDF

Other

2009

  • Koren, Y. Collaborative Filtering with Temporal Dynamics. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009. Pages 447-456 PDF

2008

  • Karnick M., Ahiskali M., Muhlbaier, M.D., and Polikar R., "Learning Concept Drift in Nonstationary Environments Using an Ensemble of Classifiers Based Approach," World Congress on Computational Intelligence / IEEE International Joint Conference on Neural Networks, Hong Kong, 1-6 June 2008. PDF

2007

  • Kolter, J.Z. and Maloof, M.A. Dynamic Weighted Majority: An ensemble method for drifting concepts. Journal of Machine Learning Research 8:2755-2790, 2007. PDF
  • Muhlbaier M.D. and Polikar, R. "Multiple Classifiers Based Incremental Learning Algorithm for Learning in Nonstationary Environments," IEEE International Conference on Machine Learning and Cybernetics, Volume 6, Page(s):3618 - 3623, 19-22 August 2007. PDF
  • Núñez M., Fidalgo R., and Morales R., Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners, Journal of Machine Learning Research, 8, (2007) 2595-2628 PDF
  • Scholz, Martin and Klinkenberg, Ralf: Boosting Classifiers for Drifting Concepts. In Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams, Vol. 11, No. 1, pages 3-28, March 2007.

2006

  • Nasraoui O., Cerwinske J., Rojas C., and Gonzalez F., "Collaborative Filtering in Dynamic Usage Environments", in Proc. of CIKM 2006 – Conference on Information and Knowledge Management, Arlington VA, Nov. 2006 PDF
  • Nasraoui O., Rojas C., and Cardona C., "A Framework for Mining Evolving Trends in Web Data Streams using Dynamic Learning and Retrospective Validation", Journal of Computer Networks - Special Issue on Web Dynamics, 50(10), 1425-1652, July 2006 PDF

2005

  • Kolter J.Z. and Maloof, M.A. Using additive expert ensembles to cope with concept drift. In Proceedings of the Twenty-second International Conference on Machine Learning, pages 449-456. New York, NY: ACM Press, 2005.
  • Scholz, Martin and Klinkenberg, Ralf: An Ensemble Classifier for Drifting Concepts. In Gama, J. and Aguilar-Ruiz, J. S. (editors), Proceedings of the Second International Workshop on Knowledge Discovery in Data Streams, pages 53-64, Porto, Portugal, 2005.

2004

  • Klinkenberg, Ralf: Learning Drifting Concepts: Example Selection vs. Example Weighting. In Intelligent Data Analysis (IDA), Special Issue on Incremental Learning Systems Capable of Dealing with Concept Drift, Vol. 8, No. 3, pages 281-300, 2004.
  • Maloof M.A. and Michalski R.S. Incremental learning with partial instance memory. Artificial Intelligence 154, 2004, pp. 95-126.

2003

  • Klinkenberg, Ralf. Predicting Phases in Business Cycles Under Concept Drift. In Hotho, Andreas and Stumme, Gerd (editors), Proceedings of LLWA-2003 / FGML-2003, pages 3-10, Karlsruhe, Germany, 2003.
  • Klinkenberg, Ralf and Rüping, Stefan: Concept Drift and the Importance of Examples. In Franke, Jürgen and Nakhaeizadeh, Gholamreza and Renz, Ingrid (editors), Text Mining - Theoretical Aspects and Applications, pages 55-77, Berlin, Germany, Physica-Verlag, 2003.
  • Kolter, J.Z. and Maloof, M.A. Dynamic Weighted Majority: A new ensemble method for tracking concept drift. Proceedings of the Third International IEEE Conference on Data Mining, pages 123-130, Los Alamitos, CA: IEEE Press, 2003.

2001

  • Klinkenberg, Ralf: Using Labeled and Unlabeled Data to Learn Drifting Concepts. In Kubat, Miroslav and Morik, Katharina (editors), Workshop notes of the IJCAI-01 Workshop on Learning from Temporal and Spatial Data, pages 16-24, IJCAI, Menlo Park, CA, USA, AAAI Press, 2001.

2000 and earlier

  • Carroll J. and Rosson M. B. The paradox of the active user. In J.M. Carroll (Ed.), Interfacing Thought: Cognitive Aspects of Human-Computer Interaction. Cambridge, MA, MIT Press, 1987.
  • Crabtree I. and Soltysiak S. Identifying and Tracking Changing Interests. International Journal of Digital Libraries, Springer Verlag, vol. 2, 38-53, 1998.
  • Harries M. B., Sammut C., Horn K. Extracting Hidden Context, Machine Learning 32, 1998, pp. 101-126.
  • Klinkenberg, Ralf and Joachims, Thorsten: Detecting Concept Drift with Support Vector Machines. In Langley, Pat (editor), Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 487-494, San Francisco, CA, USA, Morgan Kaufmann, 2000.
  • Klinkenberg, Ralf and Renz, Ingrid: Adaptive Information Filtering: Learning in the Presence of Concept Drifts. In Sahami, Mehran and Craven, Mark and Joachims, Thorsten and McCallum, Andrew (editors), Workshop Notes of the ICML/AAAI-98 Workshop Learning for Text Categorization, pages 33-40, Menlo Park, CA, USA, AAAI Press, 1998.
  • Koychev I. Gradual Forgetting for Adaptation to Concept Drift. In Proceedings of ECAI 2000 Workshop Current Issues in Spatio-Temporal Reasoning. Berlin, Germany, 2000, pp. 101-106.
  • Koychev I. and Schwab I., Adaptation to Drifting User’s Interests, Proc. of ECML2000 Workshop: Machine Learning in New Information Age, Barcelona, Spain, 2000, pp. 39-45.
  • Maloof M.A. and Michalski R.S. Selecting examples for partial memory learning. Machine Learning, 41(11), 2000, pp. 27-52.
  • Mitchell T., Caruana R., Freitag D., McDermott, J. and Zabowski D. Experience with a Learning Personal Assistant. Communications of the ACM 37(7), 1994, pp. 81-91.
  • Schlimmer J., and Granger R. Incremental Learning from Noisy Data, Machine Learning, 1(3), 1986, 317-357.
  • Schwab I., Pohl W. and Koychev I. Learning to Recommend from Positive Evidence, Proceedings of Intelligent User Interfaces 2000, ACM Press, 241 - 247.
  • Widmer G. Tracking Changes through Meta-Learning, Machine Learning 27, 1997, pp. 256-286.
  • Widmer G. and Kubat M. Learning in the presence of concept drift and hidden contexts. Machine Learning 23, 1996, pp. 69-101.

See also