
RapidMiner

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 89.24.151.204 at 06:20, 12 February 2011. It may differ significantly from the current revision.

RapidMiner (formerly YALE, Yet Another Learning Environment) is an open source environment for machine learning, data mining, text mining, predictive analytics, and business analytics. Data mining processes in RapidMiner can be composed of a large number of arbitrarily nestable operators, which are described in XML files created with RapidMiner's graphical user interface. RapidMiner is used for research, education, training, rapid prototyping, application development, and industrial deployments. In polls by the data mining newsletter KDnuggets, RapidMiner ranked second among data mining/analytic tools used for real projects in 2009[1] and first in 2010.[2]

The RapidMiner open source project, formerly named YALE, was initiated by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer. RapidMiner has been developed since 2001, initially by the Artificial Intelligence Unit of the University of Dortmund. It is distributed under the AGPL and has been hosted on SourceForge since 2004. In 2006, Ingo Mierswa and Ralf Klinkenberg founded the company Rapid-I, which now supports the further development of RapidMiner as its main contributor. In addition, more than 30 developers worldwide contribute improvements and extensions to the software.

RapidMiner provides more than 600 operators covering all major data mining and machine learning procedures, including data import and export, data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. RapidMiner is written in the Java programming language and therefore runs on all major operating systems. It also integrates the learning schemes and attribute evaluators of the Weka machine learning environment and the statistical modelling schemes of the R Project. According to SourceForge, RapidMiner is used in more than 60 countries worldwide.

What is RapidMiner?

What is it?

The Community Edition of RapidMiner (formerly "Yale") is an open source toolkit for data mining. Its strengths lie partly in how easily analytical steps can be defined (especially compared with R) and partly in graph generation, which is easier[citation needed] than in R and more effective[citation needed] than in MS Excel.

What is it for?

RapidMiner is well suited[citation needed] for analyzing data generated by high-throughput instruments, e.g., genotyping, proteomics, and mass spectrometry.

Example applications:

  • Bypassing its data mining functions and simply having RapidMiner generate figures.
  • Exploring data in "super-Excel" fashion ("knowledge discovery").
  • Constructing custom data analysis workflows.
  • Calling RapidMiner functions from programs written in other languages/systems (e.g., Perl).

Selected notable features of RapidMiner:

  • Broad collection of data mining algorithms such as decision trees and self-organizing maps.
  • Sophisticated[citation needed] graphics, such as overlapping histograms, tree charts and 3D scatter plots.
  • A variety of plugins, such as a text plugin for text analysis.

How does it work?

RapidMiner provides a GUI to design an analytical pipeline (the "operator tree" in RapidMiner parlance). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.
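The following is a minimal sketch, in Python, of how a nested operator tree stored as XML could be read and executed. The element name `operator`, the `class` values (`generate_data`, `scale`, `filter_min`, `sum`, `subprocess`) and their parameters are hypothetical and deliberately simplified; they are not RapidMiner's actual process schema or operator names. Container operators simply run their children in document order, passing each child the output of the previous one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified process description. Operator classes and
# parameters are invented for illustration; this is NOT RapidMiner's schema.
PROCESS_XML = """
<operator class="process" name="Root">
  <operator class="generate_data" name="Generate" count="5"/>
  <operator class="subprocess" name="Preprocess">
    <operator class="scale" name="Double" factor="2"/>
    <operator class="filter_min" name="Keep large" min="4"/>
  </operator>
  <operator class="sum" name="Aggregate"/>
</operator>
"""


def run_operator(node, data):
    """Execute one operator element and return its output."""
    kind = node.get("class")
    if kind in ("process", "subprocess"):
        # Container operators run their children in document order,
        # feeding each child the previous child's output.
        for child in node:
            data = run_operator(child, data)
        return data
    if kind == "generate_data":
        return list(range(1, int(node.get("count")) + 1))
    if kind == "scale":
        factor = float(node.get("factor"))
        return [x * factor for x in data]
    if kind == "filter_min":
        threshold = float(node.get("min"))
        return [x for x in data if x >= threshold]
    if kind == "sum":
        return sum(data)
    raise ValueError(f"unknown operator class: {kind}")


root = ET.fromstring(PROCESS_XML.strip())
result = run_operator(root, None)
print(result)  # 28.0
```

Running the script prints 28.0: the values 1 to 5 are doubled, values below 4 are dropped, and the remaining 4, 6, 8 and 10 are summed. Because the `Preprocess` subprocess is itself just another operator, such containers could be nested to any depth.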

While these are running, the GUI can also be used to interactively control and inspect running processes.

RapidMiner can also be called from other programs, e.g., a Perl program. Its Java application programming interface (API) allows operators to be applied individually, without building an operator tree, so the GUI can be bypassed and analytical processes controlled directly.
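As a rough illustration of applying operators one at a time from a program instead of through an operator tree, the sketch below defines a few operator-like Python functions and calls them directly. All function names are hypothetical and are not part of RapidMiner's Java API; the nearest-centroid learner merely stands in for a modelling operator, and evaluating on the training data is done only to keep the example short.

```python
import csv
import io

# Hypothetical stand-ins for individual operators. Each is a plain function,
# so the calling program decides the order instead of an operator tree.


def read_csv(text):
    """Parse CSV text into (feature rows, labels); the last column is the label."""
    rows, labels = [], []
    for record in csv.reader(io.StringIO(text)):
        *features, label = record
        rows.append([float(v) for v in features])
        labels.append(label)
    return rows, labels


def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / s if s else 0.0 for v, lo, s in zip(r, lows, spans)]
            for r in rows]


def train_nearest_centroid(rows, labels):
    """Return a dict mapping each label to the mean of its feature vectors."""
    groups = {}
    for r, y in zip(rows, labels):
        groups.setdefault(y, []).append(r)
    return {y: [sum(c) / len(c) for c in zip(*rs)] for y, rs in groups.items()}


def predict(model, row):
    """Pick the label whose centroid has the smallest squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(row, model[y])))


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


DATA = """1.0,10.0,low
2.0,12.0,low
8.0,40.0,high
9.0,42.0,high
"""

rows, labels = read_csv(DATA)
rows = min_max_normalize(rows)
model = train_nearest_centroid(rows, labels)
print(accuracy(model, rows, labels))
```

The printed accuracy is 1.0 for this tiny, well-separated data set. In this simplified picture, bypassing the GUI simply means that the caller, not an operator tree, controls the order of the steps.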

Finally, individual RapidMiner functions can also be called directly from the command line.

Where to get it?

Because RapidMiner runs on Java, it can be installed on any computer on which Java runs.

Documentation

RapidMiner seems to be well documented, especially its tutorial.[weasel words]

Software Versions

Although the core of RapidMiner is open source and offered free of charge as the "Community Edition", there is also an "Enterprise Edition", which according to the vendor's website is "Community Edition + More Features + Services + Guarantees".[1] The RapidMiner source code is also offered under a proprietary commercial license, to allow integration into closed-source solutions.

Who uses RapidMiner?

RapidMiner's flexibility allows it to be used for text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. RapidMiner is used in the electronics, energy, automotive, aviation, telecommunications, banking and insurance, IT, and pharmaceutical industries, as well as in commerce, production, market research, universities, and other organizations (e.g., sports teams, train stations, police stations). Specific examples for each of these areas are given in [2].

Properties

Some properties of RapidMiner are:

  • written in Java
  • knowledge discovery processes are modeled as operator trees
  • internal XML representation ensures a standardized interchange format for data mining experiments
  • scripting language allows for automatic large-scale experiments
  • multi-layered data view concept ensures efficient and transparent data handling
  • graphical user interface, command line mode (batch mode), and Java API for using RapidMiner from other programs
  • plugin and extension mechanisms; several plugins already exist
  • plotting facility offering a large set of high-dimensional visualization schemes for data and models
  • applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.
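The multi-layered data view idea can be sketched as follows, under the assumption that it means stacking lightweight views (row subsets, attribute subsets) on top of a single stored table rather than copying the data at each processing step. The classes `Table` and `View` and their methods are hypothetical and are not RapidMiner's Java classes.

```python
class Table:
    """Bottom layer: stores the actual values exactly once."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self._rows = [tuple(r) for r in rows]

    def __len__(self):
        return len(self._rows)

    def value(self, i, column):
        return self._rows[i][self.columns.index(column)]


class View:
    """A layer over a Table or another View; it stores only row indices
    and column names, never copies of the values."""

    def __init__(self, parent, rows=None, columns=None):
        self.parent = parent
        self._ids = list(range(len(parent))) if rows is None else list(rows)
        self.columns = list(parent.columns) if columns is None else list(columns)

    def __len__(self):
        return len(self._ids)

    def value(self, i, column):
        if column not in self.columns:
            raise KeyError(column)
        # Translate the local row index into the parent's index and delegate.
        return self.parent.value(self._ids[i], column)

    def filter(self, predicate):
        """New view keeping the rows for which predicate(row_as_dict) is true."""
        keep = [i for i in range(len(self))
                if predicate({c: self.value(i, c) for c in self.columns})]
        return View(self, rows=keep)

    def select(self, *columns):
        """New view exposing only the given columns."""
        return View(self, columns=columns)

    def to_list(self):
        return [tuple(self.value(i, c) for c in self.columns) for i in range(len(self))]


table = Table(["age", "income", "label"],
              [(25, 30_000, "no"), (40, 80_000, "yes"),
               (35, 50_000, "yes"), (22, 20_000, "no")])
over_30 = View(table).filter(lambda r: r["age"] > 30)
slim = over_30.select("age", "label")
print(slim.to_list())  # [(40, 'yes'), (35, 'yes')]
```

Each lookup through `slim` is translated down through the filter view to the base table, so the values are stored only once no matter how many layers are stacked.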

Extensions

One advantage of RapidMiner is that it can easily be extended with additional features. At present, more than 15 extensions are available for RapidMiner.

  • Text processing (Extension Manager)
  • Weka extension (Extension Manager)
  • Web Mining Extension (Extension Manager)
  • Parallel Processing Extension (Extension Manager)
  • PMML (Extension Manager)
  • PaREN - automatic process design
  • WhiBo - Decision Trees Extension
  • R Extension (Extension Manager)
  • Community (Extension Manager)
  • Reporting Extension (Extension Manager)
  • Series Processing Extension (Extension Manager)
  • Image Processing RapidMiner Extension
  • OpenCV RapidMiner Extension
  • Hidden Markov Models RapidMiner Extension
  • Semweb RapidMiner Extension
  • Text Categorization RapidMiner Extension
  • INQLE - Intelligent Network of Querying and Learning Engines
  • Market Basket Analysis Extension
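A plugin mechanism like the one described above can be sketched as a registry into which both the core and later-loaded extensions add operators. The registry, the `operator` decorator, and the operator keys (`lowercase`, `tokenize`) below are hypothetical illustrations, not RapidMiner's extension API.

```python
from typing import Callable, Dict

# Hypothetical global registry mapping operator keys to implementations.
OPERATORS: Dict[str, Callable] = {}


def operator(key):
    """Decorator used by the core and by extensions to register an operator."""
    def register(func):
        if key in OPERATORS:
            raise ValueError(f"operator {key!r} already registered")
        OPERATORS[key] = func
        return func
    return register


# Core operator, registered when this module is loaded.
@operator("lowercase")
def lowercase(texts):
    return [t.lower() for t in texts]


def load_text_extension():
    """Simulates loading a hypothetical text extension that adds an operator."""
    @operator("tokenize")
    def tokenize(texts):
        return [t.split() for t in texts]


def run(key, data):
    """Look up an operator by key and apply it to the data."""
    return OPERATORS[key](data)


load_text_extension()
print(run("tokenize", run("lowercase", ["Data Mining", "Text Mining"])))
# [['data', 'mining'], ['text', 'mining']]
```

Registering the same key twice raises a `ValueError`, which is one simple way to keep extensions from silently replacing each other's operators.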

See also

  • Weka - machine learning algorithms that can be integrated into RapidMiner
  • R Project - statistical framework that can be integrated into RapidMiner

References

  • Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, and Timm Euler: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
  1. "Data Mining Tools Used Poll (May 2009)".
  2. "Data Mining / Analytic Tools Used Poll (May 2010)".