RapidMiner: Difference between revisions

Revision as of 06:48, 12 February 2011

RapidMiner (formerly YALE (Yet Another Learning Environment)) is an open source environment for machine learning, data mining, text mining, predictive analytics, and business analytics. Data mining processes in RapidMiner can be made up of a large number of arbitrarily nestable operators, described in XML files which are created with RapidMiner's graphical user interface. RapidMiner is used for research, education, training, rapid prototyping, application development, and industrial deployments. In a poll by the data-mining newspaper KDnuggets, RapidMiner ranked second among data mining/analytic tools used for real projects in 2009^[1] and first in 2010.^[2]

The RapidMiner open source project, formerly named YALE, was initiated by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer. The initial version of RapidMiner was developed by the Artificial Intelligence Unit of the University of Dortmund, starting in 2001. It is distributed under the AGPL license, and has been hosted by SourceForge since 2004. In 2006, Ingo Mierswa and Ralf Klinkenberg founded the company Rapid-I, which now supports the further development of RapidMiner as its main contributor. In addition, more than 30 developers worldwide contribute improvements and extensions to the software.

RapidMiner provides more than 600 operators for all main data mining and machine learning procedures, including import, export, data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. RapidMiner is written in the Java programming language and therefore runs on all popular operating systems. It also integrates learning schemes and attribute evaluators of the Weka machine learning environment and statistical modelling schemes of the R-Project. According to SourceForge, RapidMiner is used in more than 60 countries world-wide.

What is RapidMiner?

What is it?

The Community Edition of RapidMiner (formerly "Yale") is an open source toolkit for data mining. Its strengths reside in part in its ability to easily define analytical steps (especially when compared with R), and in generating graphs more easily^{[citation needed]} than e.g., R, or more effectively^{[citation needed]} than MS Excel.

What is it for?

RapidMiner is well suited^{[citation needed]} for analyzing data generated by high-throughput instruments, e.g., genotyping, proteomics, and mass spectrometry.

Example applications:

Bypassing its data mining functions and simply having RapidMiner generate figures.
Exploring data in "super-Excel" fashion ("knowledge discovery").
Constructing custom data analysis workflows.
Calling RapidMiner functions from programs written in other languages/systems (e.g., Perl).

Notable selected features of RapidMiner:

Broad collection of data mining algorithms such as decision trees and self-organizing maps.
Sophisticated^{[citation needed]} graphics, such as overlapping histograms, tree charts and 3D scatter plots.
Many varied plugins, such as a text plugin for doing text analysis.

How does it work?

RapidMiner provides a GUI to design an analytical pipeline (the "operator tree" in RapidMiner parlance). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.

While these are running, the GUI can also be used to interactively control and inspect running processes.
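The idea of a process file describing nested operators can be sketched as follows. This is a minimal illustration, not RapidMiner's actual file format: the element names, attribute names, operator classes, and parameter keys in the XML below are assumptions chosen for readability, and the helper functions `walk` and `outline` are hypothetical. The sketch only shows how a nested operator tree stored as XML can be read back and traversed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified process description. Element names, attribute
# names, operator classes, and parameter keys are assumptions made for
# illustration; they are not RapidMiner's exact schema.
PROCESS_XML = """
<operator name="Root" class="Process">
  <operator name="Load" class="ExampleSource">
    <parameter key="file" value="data.csv"/>
  </operator>
  <operator name="Validation" class="CrossValidation">
    <parameter key="folds" value="10"/>
    <operator name="Learner" class="DecisionTree"/>
    <operator name="Evaluation" class="Performance"/>
  </operator>
</operator>
"""


def walk(operator, depth=0):
    """Yield (depth, name, class, parameters) for an operator and,
    recursively, for every operator nested inside it."""
    params = {p.get("key"): p.get("value") for p in operator.findall("parameter")}
    yield depth, operator.get("name"), operator.get("class"), params
    for child in operator.findall("operator"):
        yield from walk(child, depth + 1)


def outline(xml_text):
    """Return one indented text line per operator in the tree."""
    root = ET.fromstring(xml_text)
    return [
        f"{'  ' * depth}{name} ({cls}) {params}"
        for depth, name, cls, params in walk(root)
    ]


if __name__ == "__main__":
    print("\n".join(outline(PROCESS_XML)))
```

Because the nesting is carried by the XML structure itself, a reader of the file needs no extra bookkeeping to recover which operator contains which.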

Other ways of using RapidMiner involve calling it from another program, e.g., one written in Perl. The Java application programming interface ("API") provides clear interfaces for applying operators individually (i.e., no need to create an operator tree), which makes it possible to bypass the GUI and control analytical processes directly.

Last, one can also call individual RapidMiner functions directly from the command line.
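Driving a command-line tool from another program can be sketched as below. The example is deliberately generic: the name of the RapidMiner launcher script and its argument conventions are not given here, so the launcher is passed in as a parameter and the demonstration uses a stand-in command (the Python interpreter echoing its argument). The function `run_batch` and the file name `my_process.xml` are hypothetical.

```python
import subprocess
import sys


def run_batch(launcher, process_file, timeout=None):
    """Run `launcher` with `process_file` appended as the last argument.

    `launcher` is a list such as ["/path/to/launcher-script"]; the real
    launcher name and its options are assumptions to be checked against
    the installed version. Returns (exit_code, captured_stdout).
    """
    result = subprocess.run(
        list(launcher) + [process_file],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in launcher so the sketch runs without RapidMiner installed:
    # it only prints the process file it was asked to run.
    stand_in = [sys.executable, "-c", "import sys; print('would run', sys.argv[1])"]
    code, out = run_batch(stand_in, "my_process.xml")
    print(code, out.strip())
```

Keeping the launcher as a parameter means the same wrapper works for a local installation or a server-side batch run once the actual command is known.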

Where to get it?

Because RapidMiner runs on Java, it can be installed on any computer on which Java runs. It can run as a GUI application or as a command-line tool on a server.

Software Versions

Although the core of RapidMiner is open source and offered free of charge as a "Community Edition", there is also an "Enterprise Edition", which is, according to the site, "Community Edition + More Features + Services + Guarantees".[1] The RapidMiner source is also offered under a proprietary commercial license to allow integration into closed-source solutions.

Who uses RapidMiner?

RapidMiner's flexibility allows its use in text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. RapidMiner is used in the electronics, energy, automotive, commerce, aviation, telecommunications, banking and insurance, production, IT, market research, and pharmaceutical industries, at universities, and in miscellaneous other organizations (e.g., sports teams, a train station, a police station). Specific examples for each business area are given in the reference.[2]

Properties

Some properties of RapidMiner are:

written in Java
knowledge discovery processes are modeled as operator trees
internal XML representation ensures standardized interchange format of data mining experiments
scripting language allows for automatic large-scale experiments
multi-layered data view concept ensures efficient and transparent data handling
graphical user interface, command line mode (batch mode), and Java API for using RapidMiner from other programs
plugin and extension mechanisms, several plugins already exist
plotting facility offering a large set of high-dimensional visualization schemes for data and models
applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.

Extensions

RapidMiner can easily be extended by additional plugins.

Today RapidMiner has more than 15 extensions, which broaden its scope of applicability to text mining, image processing, time series processing, web mining, statistics, visualization, semantics, parallelization of the computation process, automatic process design (PaREn Automatic System Construction Wizard), and others. Several of the extensions can be found directly in the application's extension manager. The others can be downloaded from the websites of their respective providers.

References

Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, and Timm Euler: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.

External links

RapidMiner project home page
YALE becomes RapidMiner
Stanford University Lane Medical Library article "What is RapidMiner?" - one of the sources of this Wikipedia page

[1] "Data Mining Tools Used Poll (May 2009)".

[2] "Data Mining / Analytic Tools Used Poll (May 2010)".

[1]

[2]

@@ Line 44: / Line 44: @@
 === Where to get it? ===
-Because RapidMiner runs on Java, it can be installed on any computer on which Java runs.
+RapidMiner runs on Java, it can be installed on any computer on which Java runs. I can run as a GUI application or also as a command-line tool on a server.
 == Software Versions ==