Talk:Statistics
This is the talk page for discussing improvements to the Statistics article. This is not a forum for general discussion of the article's subject.
Archives: 1, 2, 3, 4
Statistics has been listed as a level-2 vital article in Mathematics. If you can improve it, please do. This article has been rated as B-Class.
Statistics was one of the Mathematics good articles, but it has been removed from the list. There are suggestions below for improving the article to meet the good article criteria. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.
This page is for discussion of the article about statistics. Comments and questions about the special page about Wikipedia site statistics (number of pages, edits, etc.) should be directed to Wikipedia talk:Special pages.
This subject is featured in the Outline of statistics, which is incomplete and needs further development. That page, along with the other outlines on Wikipedia, is part of Wikipedia's Outline of Knowledge, which also serves as the table of contents or site map of Wikipedia.
New lead section
I've rewritten the lead, as requested; see if you people like it:
Statistics is the study of the collection, analysis, interpretation, presentation and organization of data.[1] In applying statistics to, e.g., a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.[1] When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine whether the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.
When analyzing data, it is possible to use one of two statistical methodologies: descriptive statistics, which summarizes data from a sample using indexes such as the mean or standard deviation, or inferential statistics, which draws conclusions from data that are subject to random variation, for example, observational errors or sampling variation.[2] Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. To be able to make an inference about unknown quantities, one or more estimators are evaluated using the sample. Standard statistical procedures involve the development of a null hypothesis, a general statement or default position that there is no relationship between two quantities. Rejecting or disproving the null hypothesis is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false. What statisticians call an alternative hypothesis is simply a hypothesis which contradicts the null hypothesis. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative"). A critical region is the set of values of the estimator which leads to refuting the null hypothesis. The probability of a Type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of a Type II error is the probability that the estimator doesn't belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.
Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.
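As an illustration of the definitions above (not part of the proposed lead), here is a minimal simulation sketch of Type I error, Type II error, and power. It assumes a two-sided z-test of the null hypothesis "mean = 0" on normally distributed data with a known standard deviation of 1; the function names, the sample size of 30, and the alternative mean of 0.5 are all hypothetical choices made for the example.

```python
import random
from statistics import NormalDist, mean

def rejects_null(sample, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: mu = 0, assuming sigma is known."""
    n = len(sample)
    z = mean(sample) / (sigma / n ** 0.5)            # standardized sample mean
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return abs(z) > critical                         # True if z is in the critical region

def rejection_rate(true_mu, n=30, trials=5000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(true_mu, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials

# H0 is true here, so every rejection is a Type I error (false positive).
type_i_rate = rejection_rate(true_mu=0.0)

# H0 is false here, so the rejection rate estimates the power of the test,
# and the remaining fraction estimates the Type II error rate.
power = rejection_rate(true_mu=0.5)
type_ii_rate = 1 - power

print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

Under these assumptions the estimated Type I rate should sit near the chosen significance level of 0.05, while the power depends on the sample size and on how far the true mean lies from the null value.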
Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. Ways to avoid misuse of statistics include using proper diagrams and avoiding bias. In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. If two variables are correlated, they may or may not be the cause of one another. The correlation could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable.
Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more heavily from calculus and probability theory. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.
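The repeated-sampling reading of a 95% confidence interval given above can be checked with a small simulation. This is only a sketch under stated assumptions: the data are normal, the population standard deviation is treated as known (so a z-interval is used rather than a t-interval), and the names `z_interval` and `coverage`, along with the population values 10.0 and 2.0, are made up for the example.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% level
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(true_mu=10.0, sigma=2.0, n=25, trials=4000, seed=1):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mu <= high           # counts 1 when the interval covers true_mu
    return hits / trials

observed_coverage = coverage()
print(f"Observed coverage: {observed_coverage:.3f}")
```

Each simulated dataset gives a different interval; the 95% figure describes how often the procedure captures the fixed true value across repetitions, not the probability that any single computed interval contains it.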
— Preceding unsigned comment added by Lbertolotti (talk • contribs) 1 October 2014 (UTC)
References
- ^ a b Dodge, Y. (2006) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9
- ^ Lund Research Ltd. "Descriptive and Inferential Statistics". statistics.laerd.com. Retrieved 2014-03-23.
Approve. I much prefer this version. The first two paragraphs are really good. I do think the other sections should be in separate sections. Mcshuffles (talk) 10:09, 5 April 2017 (UTC)
- Note: The original poster already made the proposed changes back in October 2014. This section should probably be archived, since the lead section has changed in several ways since then. - dcljr (talk) 03:12, 23 April 2017 (UTC)
Proposed merge with Mathematical statistics
As per rationale via Wikipedia:Articles for deletion/Mathematical statistics (2nd nomination). Appears to be the third suggestion to merge. Nordic Nightfury 16:10, 29 November 2016 (UTC)
Support. The book on non-mathematical statistics must be very thin. Remarkably, the article on Mathematical statistics is not actually very mathematical, though it certainly discusses mathematical concepts. Isambard Kingdom (talk) 17:10, 29 November 2016 (UTC)
Support. Statistics is a mathematical discipline. Jmc200 (talk) 14:06, 15 December 2016 (UTC)
Oppose. Sorry guys, the above reasons for support won't wash. It's obvious, innit, that statistics is a subset of mathematics. However, there are particular areas & methods within statistics that rely more directly on mathematical techniques than others, and these areas can be grouped and called 'mathematical statistics'. Look, for example, at the modules available from the Open University: among a number of statistical modules there is a level 3 one called 'mathematical statistics'. So don't deny that there is such a field within statistics, and don't claim that is the whole field. People using statistics recognise the validity of the description. Gravuritas (talk) 16:59, 15 December 2016 (UTC)
PS: Having just looked at the Mathematical statistics article, the confusion reigning is understandable. It starts off with some very wobbly definitions, and proceeds to fill much of the article with stuff which I would call stats, not mathematical stats. Please suspend the "merge" request for a month and I'll try to change the math stat article so that at least it does what it says on the tin. Gravuritas (talk) 17:14, 15 December 2016 (UTC)
Oppose. If there was no more to statistics than mathematical statistics, the 'mathematical' prefix would be redundant. But there is much more, such as the non-mathematical aspects of the design of surveys, experiments and observational studies, graphical and tabular display of data and results, the philosophy of statistics, plus the whole body of knowledge about how to choose which mathematical method to use. Qwfp (talk) 17:49, 15 December 2016 (UTC)
Support. "Statistics" as a scientific field is synonymous to "mathematical statistics", which should be distinct from the everyday use of "statistics" that refers to "summary statistics". The "Statistics" article should be either redirected to mathematical statistics or become a disambiguation page for "mathematical statistics" and "summary statistics". Delafé (talk) 22:35, 16 December 2016 (UTC)
Comment: the statement ' "Statistics" as a scientific field is synonymous to "mathematical statistics", ' is a numpty statement. Let's take two examples: 1. Much work with the normal distribution involves the user in nothing more than simple arithmetic, and so is not considered to be part of mathematical statistics. 2. Some work with likelihood functions involves the frequent use of differentiation, and so is considered mathematical statistics. The inability of pure mathematicians and non-mathematicians to recognise the difference is, frankly, irrelevant. Ask a non-academic statistician and s/he will understand the difference. Gravuritas (talk) 01:17, 17 December 2016 (UTC)
- I'm not sure what you mean by "user making simple arithmetic on the normal distribution" but this sounds like a very personal point of view based on a very personal perception of statistics and mathematics in general. You seem to imply that differentiation is allegedly closer to mathematics than some other analytical process just because it is (in your opinion) more difficult to apply on paper. If that is the case then I must say you haven't understood at all what mathematics is about (and it is probably why concepts that are common knowledge to a mathematician are "numpty" to you). The fact is that any trained statistician or specialised mathematician knows that the principles of statistical inference are pure mathematics in nature and have been established by means of mathematical proof, something which cannot be said for, e.g., the algorithmic inference made by predictive models in machine learning. Delafé (talk) 10:04, 21 December 2016 (UTC)
- Response. Let me try one more time. The sub-field of mathematics, which is statistics, is useful to a great number of people, of a very wide range of ability, training, and experience. Many elements of statistical thinking and reasoning are sufficiently easy to use that lots of people use them. Some elements of statistical reasoning demand a higher level of ability and/or training in maths than most people can cope with or have been educated in, and this disparate set of techniques can conveniently be called 'mathematical statistics'. If you wish, I can list the topics that the Open University consider to be mathematical statistics. There is absolutely no assertion on my part that the rest of stats is 'non-mathematical'. If you like, we could effectively split stats into 'easy stats' and 'less easy stats', with mathematical stats being the latter. @MaxEnt yes, non-academic statisticians would draw approximately the same line; for instance, none of the stats used by Six Sigma practitioners industrially would be mathematical statistics.
- Gravuritas (talk) 17:33, 5 April 2017 (UTC)
Support. Generally I prefer splitting over lumping, but the cuts need to be helpful, rather than obstructive. Gravuritas says "Ask a non-academic statistician and s/he will understand the difference." Unfortunately, the standard isn't "understand", it's explicate. Would any two of these intuitive non-specialists draw essentially the same line? If not, what we have here is a noddable of nonagreement, where everyone agrees internally that such a line exists, but when pressed no two people wish to carve it in the same place.
On the other side, does non-mathematical statistics even make sense? Imagine telling some math-hating child "oh, this isn't math, it's least squares". Statistics prior to least squares has about the relation to modern statistics that alchemy has to modern chemistry (defined error estimator about as central to the reformation as the periodic table).
For my own purposes, I might be tempted to lasso traditional statistics (distilled facts about national populations useful to government) under a page titled "statistics (public administration)". But I'm sure not going to invent an arbitrary wall to partition statistics into "statistics (alchemy)" and "statistics (chemistry)", as if that usefully aids the great unwashed who visit here.
After checking out Darwin, Galton and the Statistical Enlightenment, another page name for old-school statistics would apparently be "statistics (unenlightened)", and then I guess old-old-school statistics would be "statistics (applied phlebotium)" — look ma, no math at all! — MaxEnt 18:31, 11 January 2017 (UTC)
Oppose There is a lot of statistics that is outside the scope of mathematics, you can see my examples in Talk:Statistics#Definition_of_.22statistics.22. Mcshuffles (talk) 14:38, 4 April 2017 (UTC)
Oppose These are very different topics, and the combined article would be too varied to be useful to many people. Elliot321 (talk) 17:39, 4 April 2017 (UTC)
Support We should combine the articles because the articles are so similar and basically are the same. Not-a-parted-haired-libertarian (talk) 14:28, 10 April 2017 (UTC)
- The preceding comment (by Not-a-parted-haired-libertarian) was moved from the next section to this one, because I think the editor simply made a mistake when they placed it at the bottom of the page. I added the "Support" label for the convenience of readers. Note that this user has since been blocked for sock puppetry (not, AFAIK, involving any other user who has commented here) and disruptive editing, but this comment seems reasonable, so… - dcljr (talk) 10:02, 23 April 2017 (UTC)
Long comment. (First off, please note that the proposal is for Mathematical statistics to be merged into Statistics, not the other way around. Also, I should point out upfront that I created the Mathematical statistics article back in August 2004, when there was very little statistical information in Wikipedia.) This has come up repeatedly over the past decade (4 different links). Most of the arguments I've seen that MS should not be merged here seem to be based on a hypothetical MS article that has never actually existed — namely, a well developed one containing much material not already covered (or not more appropriately covered) in Statistics (or elsewhere… more on that in a moment). OTOH, many objections to having MS as a separate article seem to be based on a misunderstanding (or mischaracterization) of what the word "mathematical" implies about the topic.
Addressing the second point first: The distinction between statistics and mathematical statistics is quite similar to that between physics and mathematical physics. A lot of physics requires mathematical calculations, of course. And almost all of the physics in use nowadays (being learned by students and being applied by scientists and engineers) was developed after — and is in some way a direct result of — Newton's application of then-cutting-edge mathematics to the subject. But that doesn't mean all of "today's physics" can rightly be called "mathematical physics". Instead, the term (as I understand it) refers to current research in physics that employs various techniques from applied mathematics (especially from mathematical analysis and abstract algebra), as well as to upper-level undergraduate and graduate physics courses taught from the same perspective. (Note that as a math major in college I took 3 semesters of physics alongside physics majors, but none of what I learned was what I'd call "mathematical physics"!) Similarly, just because most statistics being used nowadays is mathematical in nature and follows the mathematical work of Legendre, Galton, Pearson, etc., doesn't make it all "mathematical statistics". Instead, that term is used mainly in academia to refer to research and undergrad/grad courses based on techniques of (mostly) mathematical analysis. There is a contrast to be made with physics, however, in that I think I would count a lot of calculus-based introductory statistics classes as (elementary) mathematical statistics, whereas I don't think most calculus-based introductory physics classes count as mathematical physics. (But perhaps that could be attributed to my own bias or ignorance.)
In any case, note that we also have a Statistical theory article, as well as a redirect at Theoretical statistics that points not to that article but to Mathematical statistics. With this in mind, it is perhaps instructive to consider other fields of study X that not only have an article or redirect at "Mathematical X" but also at "Theoretical X" or "X theory". These include (omitting variations that use "theory" in a different sense than the one under discussion):
- Chemistry: Mathematical chemistry, Theoretical chemistry
- Biology: Mathematical biology = Theoretical biology = Biological theory (all redirect to Mathematical and theoretical biology)
- Economics: Mathematical economics, Economic theory (latter redirects to Economics#Theory)
- Psychology: Mathematical psychology, Theoretical psychology
Looking through these, it would seem that "Mathematical X" ("MX") is typically described as involving the application of mathematical methods to the field X, whereas "Theoretical X" or "X theory" ("TX") is more concerned with providing theoretical explanations of observed phenomena in the field X. The distinction is a subtle one, but IP editor 5.151.82.74 explained it this way (paraphrasing remarks made at Talk:Theoretical physics): "MX" tends to be a branch of applied mathematics of interest to mathematicians, whereas "TX" tends to be a subfield of X. One might take the contrast to the extreme and say that "MX" investigates the properties of, and relationships between, mathematical objects that just happen to be inspired by the field X (i.e., to find mathematical truths), whereas "TX" exists to find better explanations of observations collected in field X (i.e., to find scientific "truths" [explanations]).
Now, despite the fact that many people call statistics a science, there isn't really the same level of interplay between theory and experimental observation as in the other sciences listed above. So any distinction between statistical theory and mathematical statistics is perhaps not a useful one for our purposes. Therefore, if we're going to merge and redirect one of them to this article, I would say merge and redirect both; and if we're going to keep them separate from this article, then I would say merge and redirect one to the other. (Note, BTW, that both ST and MS are linked to by a similar number of other articles. By my count, not counting template transclusions: 449 links to Statistical theory and 365 links to Mathematical statistics.)
Perhaps a way of moving forward on this is for one user to take it upon themselves to implement one solution (i.e., in their userspace, with the help of other interested parties) and another user to implement the other solution, and then it will come down to a more concrete decision: which solution seems better. (Although keep in mind that some interested parties might not be able to devote much time to this until the summer…) - dcljr (talk) 17:51, 23 April 2017 (UTC)
Definition of "statistics"
I take issue with the following that was written in this Wikipedia article (introduction section, fourth line from top):
Some popular definitions are: Merriam-Webster dictionary defines statistics as "classified facts representing the conditions of a people in a state – especially the facts that can be stated in numbers or any other tabular or classified arrangement[3]".
I've probably read the definition of "statistics" once or twice before, but I've never seen it specifically (yet so vaguely) attributed to "people in a state." So, I decided to pay Merriam-Webster.com a visit (the source cited) and this is what it actually has listed for the definition of "statistics":
Definition of statistics 1: a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data 2: a collection of quantitative data
Statistics is confusing enough as it is. Perhaps we shouldn't complicate it any further than it needs to be. I'm not trying to be rude, so I hope this is taken as a friendly piece of criticism. At the very least, though, citations should be used accurately. Perhaps the definition you used was true at the time of writing, although I find it hard to believe Merriam Webster would use such a definition for statistics. Stranger things have happened I suppose. Either way, I just thought I should let you know. Emerald Evergreen 18:14, 27 January 2017 (UTC)
Whether statistics is a branch of mathematics depends on your definition of mathematics. I'd argue that certain aspects of statistics are outside the scope of mathematics, like:
- certain principles that relate to real-world data (for example the Likelihood principle)
- visualising data
- biases
- statistical algorithms — Preceding unsigned comment added by Mcshuffles (talk • contribs) 14:05, 4 April 2017 (UTC)
Statistics is also treated as separate from mathematics in the scientific literature. For example, on arXiv it is not considered a branch of mathematics. In the world of scientific journals, math and statistics tend to be treated separately.
It would be better to say "statistics is a science" than "statistics is a branch of mathematics", since the latter takes sides in an ongoing dispute (you can, for example, see this dispute on CrossValidated: https://stats.stackexchange.com/questions/78579/stats-is-not-maths).
I much prefer the Oxford dictionary's definition:
The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
---Mcshuffles (talk) 13:59, 4 April 2017 (UTC)