ELKI

From Wikipedia, the free encyclopedia
Revision as of 23:54, 26 August 2011

Environment for DeveLoping KDD-Applications Supported by Index-Structures
Developer(s): Ludwig Maximilian University of Munich
Stable release: 0.3.0 / March 30, 2010
Preview release: 0.4.0~beta1 / August 17, 2011
Written in: Java
Operating system: Microsoft Windows, Linux, Mac OS
Platform: Java platform
Type: Data mining
License: AGPL (since version 0.4.0)
Website: http://elki.dbs.ifi.lmu.de/

Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI) is a Knowledge Discovery in Databases (KDD, "data mining") software framework developed for use in research and teaching by the database systems research unit of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich, Germany. It aims at allowing the development and evaluation of advanced data mining algorithms and their interaction with database index structures.

Description

The ELKI framework is written in Java and built around a modular architecture. Most currently included algorithms belong to clustering, outlier detection[1] and database indexes. A key concept of ELKI is to allow the combination of arbitrary algorithms, data types, distance functions and indexes and evaluate these combinations. When developing new algorithms or index structures, the existing components can be reused and combined.

The university project is developed for use in teaching and research. The source code is written with extensibility, readability and reusability in mind, but it is not extensively optimized for performance. A scientific evaluation comparing run times is therefore only sound when both algorithms are implemented within ELKI, so that they share the same cost. It currently offers neither integration with business intelligence applications nor an interface to common database management systems via SQL. Applying the algorithms requires knowledge of their use and study of the documentation. The intended audience is students, researchers and software engineers.

The visualization modules use SVG for scalable graphics output, and Apache Batik for rendering of the user interface as well as lossless export into PostScript and PDF for easy inclusion in scientific publications in LaTeX.

Awards

ELKI started as an implementation[2] of the doctoral dissertation of Dr. Arthur Zimek,[3] which was awarded the "SIGKDD Doctoral Dissertation Award 2009 Runner-up"[4] by the Association for Computing Machinery for its contributions to correlation clustering. The algorithms published as part of the dissertation (4C, COPAC, HiCO, ERiC, CASH) are available in ELKI.[2]

Version 0.4, which included various methods for spatial outlier detection,[5] was presented at the "Symposium on Spatial and Temporal Databases" 2011 and won the conference's "best demonstration paper award".

Included algorithms

Select included algorithms[6]:

Licensing

Starting with version 0.4.0, ELKI is licensed under the AGPL. For earlier versions, neither the website nor the source code states an explicit license, so they should be considered copyrighted. The authors have stated that research use is acceptable but attribution is required. For commercial use, an explicit license is required.

Version history

Version 0.1 (July 2008) contained several algorithms from cluster analysis and anomaly detection, as well as some index structures such as the R*-tree. The focus of the first release was on subspace clustering and correlation clustering algorithms.[7]

Version 0.2 (July 2009) added functionality for time series analysis, in particular distance functions for time series.[8]

Version 0.3 (March 2010) extended the choice of anomaly detection algorithms and visualization modules.[9]

Version 0.4 (August 2011) added algorithms for geo data mining and support for multi-relational database and index structures.[5]

Related applications

External links

References

  1. Hans-Peter Kriegel, Peer Kröger, Arthur Zimek (2009). "Outlier Detection Techniques (Tutorial)" (PDF). 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2009). Bangkok, Thailand. Retrieved 2010-03-26.
  2. doi:10.1145/1656274.1656286.
  3. Zimek, Arthur (2008-06-30), Correlation Clustering (PDF), Munich, Germany: Ludwig Maximilian University of Munich, urn:nbn:de:bvb:19-87361
  4. "SIGKDD Doctoral Dissertation Award". ACM SIGKDD. Retrieved 30 May 2010.
  5. Elke Achtert, Achmed Hettab, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek (2011). "Spatial Outlier Detection: Data, Algorithms, Visualizations". 12th International Symposium on Spatial and Temporal Databases (SSTD 2011). Minneapolis, MN: Springer. doi:10.1007/978-3-642-22922-0_41.
  6. Excerpt from "Data Mining Algorithms in ELKI 0.4". Retrieved August 17, 2011.
  7. Elke Achtert, Hans-Peter Kriegel, Arthur Zimek (2008). "ELKI: A Software System for Evaluation of Subspace Clustering Algorithms" (PDF). Proceedings of the 20th international conference on Scientific and Statistical Database Management (SSDBM 08). Hong Kong, China: Springer. doi:10.1007/978-3-540-69497-7_41.
  8. Elke Achtert, Thomas Bernecker, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek (2009). "ELKI in time: ELKI 0.2 for the performance evaluation of distance measures for time series" (PDF). Proceedings of the 11th International Symposium on Advances in Spatial and Temporal Databases (SSTD 2009). Aalborg, Denmark: Springer. doi:10.1007/978-3-642-02982-0_35.
  9. Elke Achtert, Hans-Peter Kriegel, Lisa Reichert, Erich Schubert, Remigius Wojdanowski, Arthur Zimek (2010). "Visual Evaluation of Outlier Detection Models". 15th International Conference on Database Systems for Advanced Applications (DASFAA 2010). Tsukuba, Japan: Springer. doi:10.1007/978-3-642-12098-5_34.