RapidMiner

RapidMiner (formerly YALE (Yet Another Learning Environment)) is an environment for machine learning and data mining experiments. It allows experiments to be made up of a large number of arbitrarily nestable operators, described in XML files which are created with RapidMiner's graphical user interface. RapidMiner is used for both research and real-world data mining tasks.

The initial version was developed by the Artificial Intelligence Unit of the University of Dortmund, starting in 2001. It is distributed under a GNU license and has been hosted on SourceForge since 2004.

RapidMiner provides more than 500 operators for all main machine learning procedures, including input and output, and data preprocessing and visualization. It is written in the Java programming language and therefore can work on all popular operating systems. It also integrates learning schemes and attribute evaluators of the Weka learning environment.

What is RapidMiner?

What is it?

The Community Edition of RapidMiner (formerly "Yale") is an open source toolkit for data mining. Its strengths lie partly in how easily analytical steps can be defined (especially compared with R) and in producing graphs more easily than R or more effectively than MS Excel.

What is it for?

RapidMiner is well suited for analyzing data generated by high-throughput instruments, e.g., genotyping, proteomics, and mass spectrometry.

Example applications:

  • Bypassing its data mining functions and simply having RapidMiner generate figures.
  • Exploring data in "super-Excel" fashion ("knowledge discovery").
  • Constructing custom data analysis workflows.
  • Calling RapidMiner functions from programs written in other languages/systems (e.g., Perl).

Notable selected features of RapidMiner:

  • Broad collection of data mining algorithms such as decision trees and self-organizing maps.
  • Sophisticated graphics, such as overlapping histograms, tree charts and 3D scatter plots.
  • Many varied plugins, such as a text plugin for doing text analysis.

How does it work?

RapidMiner provides a GUI to design an analytical pipeline (the "operator tree" in RapidMiner parlance). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.

While these are running, the GUI can also be used to interactively control and inspect running processes.
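To make the idea of an operator tree stored as XML more concrete, here is a minimal sketch in Python. It is not RapidMiner's actual file format: the element names (`operator`, `parameter`), the attribute names, and the operator classes such as `DataSource` and `CrossValidation` are assumptions chosen only for illustration. The sketch builds a nested tree, serializes it to XML, reads it back, and walks the operators in order, roughly as a tool might do before running them.

```python
import xml.etree.ElementTree as ET


def operator(name, op_class, params=None, children=()):
    """Build one <operator> element; nested operators become child elements."""
    node = ET.Element("operator", {"name": name, "class": op_class})
    for key, value in (params or {}).items():
        ET.SubElement(node, "parameter", {"key": key, "value": str(value)})
    node.extend(children)
    return node


def walk(node, depth=0):
    """Yield (depth, name) for each operator, parents before children."""
    yield depth, node.get("name")
    for child in node.findall("operator"):
        yield from walk(child, depth + 1)


# Hypothetical process: load data, then cross-validate a learner.
process = operator("Root", "Process", children=[
    operator("Load", "DataSource", {"file": "data.csv"}),
    operator("Validate", "CrossValidation", {"folds": 10}, children=[
        operator("Learn", "DecisionTree"),
        operator("Evaluate", "Performance"),
    ]),
])

xml_text = ET.tostring(process, encoding="unicode")  # what a GUI might save
reloaded = ET.fromstring(xml_text)                   # what a runner might load
order = list(walk(reloaded))

for depth, name in order:
    print("  " * depth + name)
```

Because nesting in the XML mirrors nesting in the tree, an operator such as the cross-validation step can contain its own inner operators without any extra bookkeeping.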

RapidMiner can also be called from other programs, e.g., a Perl program. The Java application programming interface (API) provides clear interfaces for applying operators individually (i.e., without creating an operator tree), making it possible to bypass the GUI and control analytical processes directly.
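The difference between building an operator tree and applying operators one at a time can be illustrated with a small, self-contained Python sketch. This is not the RapidMiner Java API: the functions `normalize` and `threshold` are hypothetical stand-ins for operators, and the data is made up for the example. Each "operator" is called directly and its result is passed to the next one by hand.

```python
def normalize(values):
    """Hypothetical operator: min-max scale values to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def threshold(values, cutoff=0.5):
    """Hypothetical operator: flag values at or above the cutoff."""
    return [v >= cutoff for v in values]


data = [2.0, 4.0, 6.0, 10.0]

scaled = normalize(data)    # apply one operator on its own
flags = threshold(scaled)   # feed its output to the next operator

print(scaled)
print(flags)
```

Calling operators this way gives the calling program full control over the order of steps and over intermediate results, at the cost of writing the control flow that a process definition would otherwise describe.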

Finally, individual RapidMiner functions can be called directly from the command line.

Where to get it?

Because RapidMiner runs on Java, it can be installed on any computer on which Java runs.

RapidMiner's documentation appears thorough, especially its tutorial.

Software Versions

Although the core of RapidMiner is open source and offered free of charge as the "Community Edition", there is also an "Enterprise Edition" which, according to the site, consists of "Community Edition + More Features + Services + Guarantees".[1] The RapidMiner source is also offered under a proprietary commercial license to allow integration into closed-source solutions.

Who uses RapidMiner?

RapidMiner's flexibility allows its use in text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. RapidMiner is used in the electronics, energy, automotive, commerce, aviation, telecommunications, banking and insurance, production, IT, market research, and pharmaceutical industries, at universities, and in miscellaneous other settings (e.g., sports teams, a train station, a police station). Specific examples for each business area are given in reference [2].

Properties

Some properties of RapidMiner are:

  • written in Java
  • knowledge discovery processes are modeled as operator trees
  • internal XML representation ensures standardized interchange format of data mining experiments
  • scripting language allows for automatic large-scale experiments
  • multi-layered data view concept ensures efficient and transparent data handling
  • graphical user interface, command line mode (batch mode), and Java API for using RapidMiner from other programs
  • plugin and extension mechanisms; several plugins already exist
  • plotting facility offering a large set of high-dimensional visualization schemes for data and models
  • applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.

References

  • I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
