Exploratory data analysis

From Wikipedia, the free encyclopedia

Revision as of 03:55, 16 April 2008

Exploratory data analysis (EDA) is an approach to analysing data in order to form hypotheses worth testing, complementing the tools of conventional statistics for testing hypotheses.[1] It was so named by John Tukey.

EDA development

Tukey held that too much emphasis in statistics was placed on statistical hypothesis testing (confirmatory data analysis) and that more emphasis needed to be placed on using data to suggest hypotheses to test. In particular, he held that confusing the two types of analysis and employing them on the same set of data can lead to systematic bias, owing to the issues inherent in testing hypotheses suggested by the data.
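
The bias described above, and one common way of avoiding it, can be sketched in code. The following is a minimal illustration and not a procedure taken from Tukey's writings: the data are simulated pure noise, and the helper names (pearson, split_rows, explore_then_confirm) are hypothetical. The rows are split at random into two halves; the exploratory half is used to pick the column that looks most related to the outcome, and only that one hypothesis is then checked on the confirmatory half.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def split_rows(n_rows, rng):
    """Randomly partition row indices into exploratory and confirmatory halves."""
    idx = list(range(n_rows))
    rng.shuffle(idx)
    half = n_rows // 2
    return idx[:half], idx[half:]


def explore_then_confirm(columns, outcome, rng):
    """Pick the most promising column on one half, then re-check it on the other."""
    explore, confirm = split_rows(len(outcome), rng)

    def corr(col, rows):
        return pearson([col[i] for i in rows], [outcome[i] for i in rows])

    # Exploratory step: let the data suggest which column looks most related.
    best = max(range(len(columns)), key=lambda j: abs(corr(columns[j], explore)))
    # Confirmatory step: evaluate that single hypothesis on rows not yet seen.
    return best, corr(columns[best], explore), corr(columns[best], confirm)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows, n_cols = 200, 50
# Assumption for illustration: pure noise, so no column is truly related.
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]

best, r_explore, r_confirm = explore_then_confirm(columns, outcome, rng)
print(f"column {best}: exploratory r = {r_explore:+.3f}, "
      f"confirmatory r = {r_confirm:+.3f}")
```

Because the column is chosen for having the largest correlation among fifty candidates, its exploratory correlation tends to overstate any real relationship; the correlation on the held-out rows is not subject to that selection effect. Testing the hypothesis on the same rows that suggested it would simply reproduce the inflated value.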

The objectives of EDA are to:

Tukey's books were notoriously opaque, and so several attempts were made to popularise his EDA ideas. Prominent among these was the Statistics in Society (MDST 242) course of The Open University.

Many EDA techniques have been adopted into data mining and are being taught to young students as a way to introduce them to statistical thinking.[2]

Techniques

There are a number of tools that are useful for EDA, but EDA is defined more by the attitude taken than the techniques used.[3]

The principal graphical techniques used in EDA are:

The principal quantitative techniques are:

Graphical and quantitative techniques are:

History

Many EDA ideas can be traced back to earlier authors, for example:

The Open University course Statistics in Society (MDST 242) took the above ideas and merged them with Gottfried Noether's work, which introduced statistical inference via coin-tossing and the median test.

For details of the above, see John Bibby's book HOTS: History of Teaching Statistics.

Software

  • CMU-DAP (Carnegie-Mellon University Data Analysis Package, FORTRAN source for EDA tools with English-style command syntax, 1977)
  • Fathom (for high-school and intro college courses)
  • LiveGraph (free real-time data series plotter)
  • TinkerPlots (for upper elementary and middle school students)
  • SOCR (a large number of free Internet-accessible tools for EDA)
  • DataDesk (free-to-try commercial EDA software for Mac and PC)
  • GGobi (free interactive multivariate visualization software linked to R)
  • MANET (free Mac-only interactive EDA software)
  • Miner3D (EDA and visualization software)
  • Mondrian (free interactive software for EDA)
  • Orange (free component-based software for interactive EDA and machine learning)
  • ViSta (free interactive software based on Xlisp-Stat for EDA)
  • VisuMap (EDA software for high dimensional non-linear data)
  • Visulab (free interactive software for high dimensional non-spatial / non-temporal data with interactive EDA and visualization)
  • XLisp-Stat (free software and Lisp based EDA development framework for Mac, PC and X Window)
  • Experimental Data Analyst (Mathematica application package for EDA)
  • FactoMineR (free exploratory multivariate data analysis software linked to R)

See also

Bibliography

  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1985). Exploring Data Tables, Trends and Shapes. ISBN 0-471-09776-4.
  • Hoaglin, D C; Mosteller, F & Tukey, John Wilder (Eds) (1983). Understanding Robust and Exploratory Data Analysis. ISBN 0-471-09777-2.
  • Leinhardt, G. & Leinhardt, S. (1980). Exploratory Data Analysis: New Tools for the Analysis of Empirical Data. Review of Research in Education, Vol. 8, pp. 85-157.
  • Tukey, John Wilder (1977). Exploratory Data Analysis. Addison-Wesley. ISBN 0-201-07616-0.
  • Velleman, P F & Hoaglin, D C (1981). Applications, Basics and Computing of Exploratory Data Analysis. ISBN 0-87150-409-X.

References

  1. ^ "And roughly the only mechanism for suggesting questions is exploratory. And once they’re suggested, the only appropriate question would be how strongly supported are they and particularly how strongly supported are they by new data. And that’s confirmatory.", A conversation with John W. Tukey and Elizabeth Tukey, Luisa T. Fernholz and Stephan Morgenthaler, Statistical Science, Volume 15, Number 1 (2000), 79-94.
  2. ^ Konold, C. (1999). Statistics goes to school. Contemporary Psychology, 44(1), 81-82.
  3. ^ "Exploratory data analysis is an attitude, a flexibility, and a reliance on display, NOT a bundle of techniques, and should be so taught.", John W. Tukey, We need both exploratory and confirmatory, The American Statistician, 34(1), (Feb., 1980), pp. 23-25.