Data stream mining

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by HelpUsStopSpam (talk | contribs) at 01:32, 26 May 2017 (Reverted 1 edit by 80.12.33.47 (talk) to last revision by Marko Tscherepanow. (TW)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications, can be read only once or a small number of times, using limited computing and storage capabilities.
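
The single-pass, limited-memory constraint can be illustrated with a running summary that is updated per instance and never stores the stream. The sketch below uses Welford's online algorithm for mean and variance; the names `RunningStats` and `summarize` are illustrative and not taken from any particular library.

```python
class RunningStats:
    """Mean and sample variance of a stream in O(1) memory (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of instances seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        # Each instance is read once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until two instances have arrived.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(stream):
    """Consume any iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    # A generator stands in for an unbounded source such as sensor readings.
    readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
    s = summarize(readings)
    print(s.n, s.mean, s.variance)
```

Because only three numbers are kept, memory use is independent of the stream length, which is the property that many stream mining algorithms aim for.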

In many data stream mining applications, the goal is to predict the class or value of new instances in the stream, given some knowledge about the class membership or values of previous instances. Machine learning techniques can be used to learn this prediction task from labeled examples in an automated fashion. Concepts from the field of incremental learning are often applied to cope with structural changes, online learning, and real-time demands. In many applications, especially those operating in non-stationary environments, the distribution underlying the instances or the rules underlying their labeling may change over time; that is, the goal of the prediction (the class or target value to be predicted) may change over time. This problem is referred to as concept drift.
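
The effect of concept drift on an incremental learner can be sketched with a toy stream whose labeling rule flips partway through. Everything in this example is an assumption made for illustration: the one-dimensional stream, the nearest-class-mean learner, and the forgetting factor `alpha`. It is not a method prescribed by the text above, only one simple way to show why forgetting old data helps after a drift.

```python
import random
from collections import deque


class ForgettingCentroidClassifier:
    """Nearest-class-mean classifier for 1-D inputs, updated one instance at a time."""

    def __init__(self, alpha=None):
        # alpha=None: plain running mean, so old instances are never forgotten.
        # 0 < alpha <= 1: exponential moving average, so old instances fade out.
        self.alpha = alpha
        self.means = {}
        self.counts = {}

    def predict(self, x):
        if not self.means:
            return 0  # arbitrary default before any label has been seen
        return min(self.means, key=lambda label: abs(x - self.means[label]))

    def learn(self, x, label):
        if label not in self.means:
            self.means[label] = x
            self.counts[label] = 1
            return
        self.counts[label] += 1
        step = self.alpha if self.alpha is not None else 1.0 / self.counts[label]
        self.means[label] += step * (x - self.means[label])


def drifting_stream(n_before, n_after, seed=0):
    """Yield (x, label); the labeling rule is inverted after n_before instances."""
    rng = random.Random(seed)
    for i in range(n_before + n_after):
        x = rng.random()
        label = int(x > 0.5) if i < n_before else int(x <= 0.5)
        yield x, label


def tail_accuracy(model, stream, tail=500):
    """Predict, then learn, on every instance; report accuracy on the last `tail`."""
    recent = deque(maxlen=tail)
    for x, label in stream:
        recent.append(model.predict(x) == label)
        model.learn(x, label)
    return sum(recent) / len(recent)


if __name__ == "__main__":
    adaptive = tail_accuracy(ForgettingCentroidClassifier(alpha=0.1),
                             drifting_stream(2000, 1000))
    static = tail_accuracy(ForgettingCentroidClassifier(),
                           drifting_stream(2000, 1000))
    print(f"with forgetting: {adaptive:.2f}  without: {static:.2f}")
```

In this setup the learner without forgetting keeps class means dominated by the old concept and so stays wrong long after the drift, while the forgetting variant recovers once its moving averages cross over.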

Examples of data streams include computer network traffic, phone conversations, ATM transactions, web searches, and sensor data. Data stream mining can be considered a subfield of data mining, machine learning, and knowledge discovery.

Software for data stream mining

  • MOA (Massive Online Analysis): free open-source software designed specifically for mining data streams with concept drift. It includes machine learning algorithms for classification, regression, clustering, outlier detection, and recommender systems. It also provides a prequential evaluation method, the EDDM concept drift detection method, a reader for real datasets in ARFF format, and artificial stream generators such as SEA concepts, STAGGER, rotating hyperplane, random tree, and random radial basis functions. MOA supports bi-directional interaction with Weka.
  • RapidMiner: commercial software for knowledge discovery, data mining, and machine learning. When used with its data stream mining plugin (formerly the Concept Drift plugin), it also supports data stream mining, learning time-varying concepts, and tracking drifting concepts.
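
The prequential ("test-then-train") evaluation mentioned for MOA can be sketched in plain Python. This is not MOA's API: `sea_like_stream`, `MajorityClass`, and `prequential` are hypothetical names. The generator only imitates the SEA concepts idea (three attributes in [0, 10], label decided by whether the first two sum to at most a threshold that changes per concept); the threshold values are assumed, and the label noise that some formulations add is omitted.

```python
import random
from collections import Counter


def sea_like_stream(n_per_concept=1000, thresholds=(8.0, 9.0, 7.0, 9.5), seed=0):
    """Yield (attributes, label); the threshold changes abruptly between concepts."""
    rng = random.Random(seed)
    for theta in thresholds:
        for _ in range(n_per_concept):
            x = [rng.uniform(0.0, 10.0) for _ in range(3)]
            yield x, int(x[0] + x[1] <= theta)  # third attribute is irrelevant


class MajorityClass:
    """Baseline learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return 0  # arbitrary default before any label has been seen
        return self.counts.most_common(1)[0][0]

    def learn(self, x, label):
        self.counts[label] += 1


def prequential(model, stream, window=1000):
    """Test on each instance first, then train on it.

    Returns (overall accuracy, list of accuracies per full window).
    A trailing partial window is not reported.
    """
    correct_total = seen = correct_window = 0
    per_window = []
    for x, label in stream:
        hit = model.predict(x) == label  # test first ...
        model.learn(x, label)            # ... then train
        correct_total += hit
        correct_window += hit
        seen += 1
        if seen % window == 0:
            per_window.append(correct_window / window)
            correct_window = 0
    overall = correct_total / seen if seen else 0.0
    return overall, per_window


if __name__ == "__main__":
    overall, per_window = prequential(MajorityClass(), sea_like_stream())
    print(f"overall accuracy: {overall:.3f}")
    print("per-concept accuracy:", [f"{a:.3f}" for a in per_window])
```

Since every instance is used for testing before it is used for training, no separate hold-out set is needed, and the per-window figures show how accuracy shifts as the concept changes. Any object with `predict` and `learn` methods could be substituted for the baseline learner.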

Books

  • Gama, João; Gaber, Mohamed Medhat, eds. (2007). Learning from Data Streams: Processing Techniques in Sensor Networks. Springer. p. 244. doi:10.1007/3-540-73679-4. ISBN 9783540736783.
  • Ganguly, Auroop R.; Gama, João; Omitaomu, Olufemi A.; Gaber, Mohamed M.; Vatsavai, Ranga R., eds. (2008). Knowledge Discovery from Sensor Data. Industrial Innovation. CRC Press. p. 215. ISBN 9781420082326.
  • Gama, João (2010). Knowledge Discovery from Data Streams. Data Mining and Knowledge Discovery. Chapman and Hall. p. 255. ISBN 9781439826119.
  • Lughofer, Edwin (2011). Evolving Fuzzy Systems - Methodologies, Advanced Concepts and Applications. Studies in Fuzziness and Soft Computing. Vol. 266. Heidelberg: Springer. p. 456. doi:10.1007/978-3-642-18087-3. ISBN 9783642180866.
  • Sayed-Mouchaweh, Moamar; Lughofer, Edwin, eds. (2012). Learning in Non-Stationary Environments: Methods and Applications. New York: Springer. p. 440. doi:10.1007/978-1-4419-8020-5. ISBN 9781441980199.
