Exploratory data analysis: Difference between revisions

From Wikipedia, the free encyclopedia
DixonDBot (talk | contribs)
No edit summary
* [http://oli.web.cmu.edu/openlearning/forstudents/freecourses/statistics Carnegie Mellon University - free online course on EDA]
* [http://www.tc.umn.edu/~zief0002/Notes/F08_02_EDA.pdf University of Minnesota - EDA class notes]
* [http://www.statistik.tuwien.ac.at/edavis - International Workshop on Data Analysis, Vienna 2010]


[[Category:Exploratory data analysis|*]]

Revision as of 15:55, 22 November 2010

Exploratory data analysis (EDA) is an approach to analysing data for the purpose of formulating hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses.[1] It was so named by John Tukey to contrast with confirmatory data analysis, the term used for the set of ideas about hypothesis testing, p-values, confidence intervals, etc., which formed the key tools in the arsenal of practising statisticians at the time.

EDA development

Tukey held that too much emphasis in statistics was placed on statistical hypothesis testing (confirmatory data analysis); more emphasis needed to be placed on using data to suggest hypotheses to test. In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.
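The bias from testing hypotheses suggested by the data can be illustrated with a small simulation. The sketch below is not taken from Tukey's work: it assumes pure-noise data, and the function names (`pearson`, `explore_then_confirm`, `mean_correlations`) and all parameter values are hypothetical choices made for illustration. It picks the variable that looks most related to an outcome in one half of the data, then measures the same relationship in the other, untouched half.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def explore_then_confirm(n_rows=40, n_vars=50, seed=0):
    """Return (|r| on the exploration half, |r| on the held-out half).

    Every variable is independent noise, so any apparent relationship
    with the outcome is produced by the selection step alone.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
    half = n_rows // 2

    # Exploratory step: choose the variable that looks best on the first half.
    best = max(
        range(n_vars),
        key=lambda j: abs(pearson(columns[j][:half], outcome[:half])),
    )
    r_explore = abs(pearson(columns[best][:half], outcome[:half]))

    # Confirmatory step: measure the same variable on the second half only.
    r_confirm = abs(pearson(columns[best][half:], outcome[half:]))
    return r_explore, r_confirm


def mean_correlations(trials=200):
    """Average both correlations over several independent simulated data sets."""
    results = [explore_then_confirm(seed=s) for s in range(trials)]
    mean_explore = sum(r[0] for r in results) / trials
    mean_confirm = sum(r[1] for r in results) / trials
    return mean_explore, mean_confirm


if __name__ == "__main__":
    explore, confirm = mean_correlations()
    print(f"mean |r| on data used to pick the variable: {explore:.2f}")
    print(f"mean |r| on held-out data:                  {confirm:.2f}")
```

Because every variable here is noise, the correlation found while exploring is on average much larger than the one seen on the held-out half. Reusing the exploration data for the confirmatory test would report the inflated figure.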

The objectives of EDA are to:

Many EDA techniques have been adopted into data mining and are being taught to young students as a way to introduce them to statistical thinking.[2]

Techniques

There are a number of tools that are useful for EDA, but EDA is characterized more by the attitude taken than by particular techniques.[3]
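As one illustration of such a tool, the sketch below computes a five-number summary and flags values lying far outside the middle of the data. The choice of example is an assumption of this sketch rather than something stated above, the function names are hypothetical, and the hinge and 1.5-step conventions used are one common variant among several.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Hinges are the medians of the lower and upper halves of the sorted
    data; when the count is odd, the middle value belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("five_number_summary needs at least one value")
    half = (n + 1) // 2
    lower_half = data[:half]
    upper_half = data[n - half:]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


def outside_values(values, steps=1.5):
    """Return values lying more than `steps` hinge-spreads beyond a hinge."""
    _, lower_hinge, _, upper_hinge, _ = five_number_summary(values)
    reach = steps * (upper_hinge - lower_hinge)
    low_fence = lower_hinge - reach
    high_fence = upper_hinge + reach
    return [v for v in values if v < low_fence or v > high_fence]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(five_number_summary(sample))
    print(outside_values(sample))
```

The summary relies only on ordering, so a single extreme value such as 100 in the sample changes the maximum but leaves the median and hinges untouched.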

The principal graphical techniques used in EDA are:

The principal quantitative techniques are:

Graphical and quantitative techniques are:

History

Many EDA ideas can be traced back to earlier authors, for example:

The Open University course Statistics in Society (MDST 242) took the above ideas and merged them with Gottfried Noether's work, which introduced statistical inference via coin-tossing and the median test.

Software

  • OpenSHAPA (modern open source successor to MacSHAPA), permits analysis of various media files (e.g. video, sound).
  • CMU-DAP (Carnegie-Mellon University Data Analysis Package, FORTRAN source for EDA tools with English-style command syntax, 1977).
  • Data Applied, a comprehensive web-based data visualization and data mining environment.
  • Fathom (for high-school and intro college courses).
  • JMP, an EDA package from SAS Institute.
  • KNIME (Konstanz Information Miner), an open-source data exploration platform based on Eclipse.
  • LiveGraph (open source real-time data series plotter).
  • Orange, an open-source data mining software suite.
  • SOCR, which provides a large number of free, Internet-accessible tools.
  • DASS-GUI, a data mining framework written in C++ and Qt.
  • TinkerPlots (for upper elementary and middle school students).

See also

References

  1. ^ "Conversation with John W. Tukey and Elizabeth Tukey", Luisa T. Fernholz and Stephan Morgenthaler, Statistical Science, Volume 15, Number 1 (2000), 79-94.
  2. ^ Konold, C. (1999). Statistics goes to school. Contemporary Psychology, 44(1), 81-82.
  3. ^ "Exploratory data analysis is an attitude, a flexibility, and a reliance on display, NOT a bundle of techniques, and should be so taught," John W. Tukey, The American Statistician, 34(1), (Feb., 1980), pp. 23-25.

Bibliography

  • Andrienko, N & Andrienko, G (2005) Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach. Springer. ISBN 3-540-25994-5
  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). Exploring Data Tables, Trends and Shapes. ISBN 0-471-09776-4.
  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). Understanding Robust and Exploratory Data Analysis. ISBN 0-471-09777-2.
  • Leinhardt, G., Leinhardt, S., Exploratory Data Analysis: New Tools for the Analysis of Empirical Data, Review of Research in Education, Vol. 8 (1980), pp. 85–157.
  • Theus, M., Urbanek, S. (2008), Interactive Graphics for Data Analysis: Principles and Examples, CRC Press, Boca Raton, FL, ISBN 978-1-58488-594-8
  • Tukey, John Wilder (1977). Exploratory Data Analysis. Addison-Wesley. ISBN 0-201-07616-0.
  • Velleman, P F & Hoaglin, D C (1981) Applications, Basics and Computing of Exploratory Data Analysis ISBN 0-87150-409-X
  • Young, F. W., Valero-Mora, P. and Friendly, M. (2006) Visual Statistics: Seeing your data with Dynamic Interactive Graphics. Wiley ISBN 978-0-471-68160-1