Talk:Statistics

Former good article Statistics was one of the Mathematics good articles, but it has been removed from the list. There are suggestions below for improving the article to meet the good article criteria. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.
June 11, 2006 Good article reassessment Delisted
          This article is of interest to the following WikiProjects:
WikiProject Mathematics (Rated B-class, Top-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B-Class, Top-importance. Field: Probability and statistics
A vital article.
One of the 500 most frequently viewed mathematics articles.
WikiProject Statistics (Rated B-class, Top-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.
News This article has been mentioned by a media organisation:
Wikipedia Version 1.0 Editorial Team / v0.5 / Vital / Core
This article has been reviewed by the Version 1.0 Editorial Team.
This article has been selected for Version 0.5 and subsequent release versions of Wikipedia.




Hawthorne

Citation: Wickström, G.; Bendix, T. (2000). "Commentary". Scandinavian Journal of Work, Environment & Health 26 (4): 363. doi:10.5271/sjweh.555.  159.83.196.1 (talk) 20:22, 15 May 2012 (UTC)

Very unclear why this was added. The full title of the article appears to be: 'The "Hawthorne effect" - what did the original Hawthorne studies actually show?'. This might be relevant to some other article, but doesn't seem directly related to article content/intent. Melcombe (talk) 21:06, 15 May 2012 (UTC)
Offered in response to "citation needed". 159.83.196.1 (talk) 19:07, 17 May 2012 (UTC)

Mentioning statisticians in the lead

A statistician is someone who is particularly well versed in the ways of thinking necessary for the successful application of statistical analysis. Such people have often gained this experience through working in any of a wide number of fields. There is also a discipline called mathematical statistics that studies statistics mathematically.

This seems unnecessary. The lead summarizes the topic at hand, not the people who work in its field. I think it's more appropriate to relegate 'statisticians' to the See also section or - perhaps - have some other section, dedicated to describing what's required to work in the field, cover this.
Sowlos (talk) 14:46, 6 September 2012 (UTC)

how is statistics used — Preceding unsigned comment added by 72.27.91.178 (talk) 16:19, 4 December 2012 (UTC)

Proposed merge with Mathematical statistics

No consensus here. Number 57 11:15, 29 May 2014 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

The mathematical statistics article was WP:AFD'd. You can access its entry here
NOTE: I closed the above AfD as "defer to this discussion", per WP:IAR. Please carry on. --j⚛e deckertalk 04:26, 26 May 2014 (UTC)

The mathematical statistics article is a bit of a waif. There's a nice couple of sentences about the difference between descriptive and inferential statistics and also about the development of statistical theory. The data analysis section seems out of place there: if that were moved to an appropriate place in statistics, then there wouldn't be much left there. The original author of that article endorsed a redirect some years ago. Illia Connell (talk) 05:29, 25 April 2013 (UTC)

I agree that the current mathematical statistics article has little to offer. But, before we merge, let's ask ourselves one question: If Wikipedia were complete and perfect, would that article exist, separately from statistics? What would it cover? If the answer is "a lot", then maybe we should improve that article, instead of merging it. Mgnbar (talk) 12:58, 25 April 2013 (UTC)
In response to Mgnbar's question, I should say in a complete encyclopedia, Mathematical Statistics and Statistics should have different articles. Statistics includes many "qualitative" sub-fields that I think can be only covered in this article. Taha (talk) 02:01, 30 April 2013 (UTC)
I do believe that mathematical statistics is a separate entry, possibly even a discipline, than statistics (applied), but it should then be part of "probability theory". Limit-theorem (talk) 21:00, 9 June 2013 (UTC)

Merge. Statistics is a branch of mathematics. Having 'mathematical' in the title is redundant. The common terms that describe the difference are 'applied' or 'theoretical'. Science.philosophy.arts (talk) 00:04, 20 September 2013 (UTC)

Statistics uses mathematics fairly heavily, as do physics and engineering. None of these is simply a "branch of mathematics". Applied statistics involves many non-mathematical aspects, and even theoretical statistics goes beyond simply mathematical issues (e.g. the philosophy of inference). --Avenue (talk) 10:29, 27 November 2013 (UTC)
The mathematical statistics article is poor. The current article should be expanded. As far as statistics being a branch of mathematics, it sounds as though you are not a mathematician.

Statistics is a science in my opinion, and it is no more a branch of mathematics than are physics, chemistry and economics; for if its methods fail the test of experience--not the test of logic--they are discarded. - John Tukey

160.36.8.226 (talk) 17:41, 22 November 2013 (UTC)
  • Support - "Mathematical statistics" is completely redundant with "statistics," as I have no clue what "non-mathematical" statistics would be; a statistic is, by definition, just a function on a sample. Seppi333 (talk) 01:38, 27 November 2013 (UTC)
That is so far off base it's not even funny. Do you really need someone to explain how statistics (the field) is not just about statistics (the plural of statistic)? --Avenue (talk) 10:29, 27 November 2013 (UTC)
Avenue, I didn't notice your reply until now. I'm not sure what motivated you to be an asshole and write a rude, asinine response like that. I was (IMO - quite clearly) comparing the "mathematical" vs "non-mathematical" treatment of the field with the mathematical definition of statistical functions by juxtaposing those clauses, not suggesting something like "the field is defined as the exhaustive set of statistical functions" or whatever absurd proposition you're suggesting I'm asserting.
I'm going to copyedit scope within the next few weeks to expand it and fix the abhorrent lack of citations; I'll probably fix the WP:UNDUE problem while I'm at it. Seppi333 (Insert ) 12:21, 23 January 2014 (UTC)
  • Strongly oppose. Expand mathematical statistics, don't merge them. I agree there's not much there at present, but it's a big subject in its own right, and certainly worthy of a separate article. --Avenue (talk) 10:29, 27 November 2013 (UTC)

Could someone who opposes the merge explain how these topics are different? Seppi333 (talk) 13:05, 27 November 2013 (UTC)

  • Oppose. Statistics theory [1] may be merged with mathematical statistics. Statistics definitely no. Mathematical statistics, based on mathematical models of uncertainty (nowadays basically probability), concerns the study of principles of inference/learning in statistical models (selecting best models, model evaluation, and so on). It is basically a sub field of statistics that deals with decision under uncertainty in a mathematical framework (using mathematical structures). There are other subfields alike, in which there is no clear use of uncertainty though. What is known today as unsupervised learning is the basic example (Clustering, Topological Data Analysis, Association Rule Learning etc.). It is a field in its own right.

Statistics is larger: there is acquisition of data (methodology and ethics in survey), organization of data (database), presentation of data (visualization) etc. Some may argue that such things may be modelled on a mathematical basis. In fact they can, relational algebra/calculus in relational databases is an example, but this is not what is proposed by the mathematical area (a little in methodology in survey sampling yes). BrennoBarbosa (talk) 09:56, 23 January 2014 (UTC)

That statistics may be discussed without referring explicitly to math I understand, but if Clustering and Topological Data Analysis can't be considered math then what are they? Computer science? Also, how exactly does "if its methods fail the test of experience--not the test of logic--they are discarded" apply? Linear regression may work well on one dataset but not on another, so how does that stand as a "test of experience"? As far as I know, linear regression may be a poor model when its assumptions are violated, but then again that is the case with any mathematical model, since it's the assumptions that logically guarantee the validity of any theorem. Lbertolotti (talk) 21:32, 28 January 2014 (UTC)

  • What is (or what will be) the difference between probability theory and Mathematical statistics? Mathematical statistics sounds like other similar applied statistics subjects such as actuarial statistics, biostatistics, social statistics, etc., while it is of course Theoretical statistics (unless someone uses statistical methods to model mathematicians). In any case, I support moving the current duplicate (and slightly irrelevant) content on Mathematical statistics into Statistics and Probability theory, and waiting for someone to add content to Mathematical statistics or redirect it to probability theory. We don't want too much duplicate information. Sda030 (talk) 00:21, 26 February 2014 (UTC)
  • Oppose- Keep them as separate articles.  SAMI  talk 16:28, 13 May 2014 (UTC)

Earlier on, Seppi333 asked a fair question: is there anything in statistics that can be considered as "non-mathematical statistics"? The article about mathematical statistics seems to suggest that the non-mathematical part of statistics consists of organizing and planning (of data, of experiments). Though I doubt if many statisticians consider organizing and planning to be part of statistics at all. Therefore support. Marcocapelle (talk) 06:01, 14 May 2014 (UTC)

Wrapping up all opposing arguments:

  • Statistics includes many "qualitative" sub-fields that I think can be only covered in this article.
    • Like what?
  • I do believe that mathematical statistics is a separate entry, possibly even a discipline, than statistics (applied), but it should then be part of "probability theory".
    • Fair enough to make a distinction between application of statistics (as an activity) and statistical theory (as a piece of knowledge). However, everything about statistics on Wikipedia is about statistical theory anyway.
  • Statistics (the field) is not just about statistics (the plural of statistic).
    • I’ve no idea what the author means with regards to whether or not to merge the two articles.
  • Expand mathematical statistics, don't merge them. I agree there's not much there at present, but it's a big subject in its own right, and certainly worthy of a separate article.
    • Its own right, then how?
  • Mathematical statistics (…) is basically a sub field of statistics that deals with decision under uncertainty in a mathematical framework (using mathematical structures). There are other subfields alike, in which there is no clear use of uncertainty though.
    • Fair enough, though that is a distinction between descriptive and inferential statistics which is already explained in the Statistics article.
  • Statistics is larger: there is acquisition of data (methodology and ethics in survey), organization of data (database), presentation of data (visualization) etc...
    • Fair enough, it would be perfect if someone would write articles about all these fields (and some of those articles already exist). Meanwhile we have two articles that are both about statistics in a 'smaller' sense, why shouldn’t we merge them?

Bottom line, I haven't seen any convincing arguments to keep two articles and so the best thing is to merge (or delete what's now on Mathematical Statistics). Marcocapelle (talk) 19:06, 24 May 2014 (UTC)


  • note I have undone the inappropriate close of this discussion. I count four opposes and only two supports, which is either a consensus not to merge or no consensus to do so. Therefore a closure to merge was inappropriate. Further a contentious discussion should only be closed by an uninvolved editor or administrator. An editor who has already participated and !voted on one side or the other should not take it on themselves to do so. Especially not by ignoring the !votes and using their own reasoning to close the discussion.--JohnBlackburnewordsdeeds 20:17, 26 May 2014 (UTC)
@JohnBlackburne: This was not a merge related to this discussion. That page was a WP:POV FORK. I don't need consensus to remove that. Feel free to remake a CORRECT page with CITATIONS to that content. Not a page about mathematical statistics with 7 citations that said
"Mathematical statistics is XYZ." (no citation)
"Bob, Greg, Bill, and Rod used XYZ which was the fad in the 1970s." (7 citations)

If you restore this again, we're going to the NPOV noticeboard AND I'm STRICTLY holding you to WP:3RR. Just test me. Seppi333 (Insert  | Maintained) 03:09, 27 May 2014 (UTC)

  • Oppose Statistics was originally the science of the state; hence the name, which was introduced into English by Sir John Sinclair, replacing the previous phrase of political arithmetic. At that time, the essence of the topic was to collect facts which would assist government, such as a census. Application of mathematical methods to these facts came later. This seems to be why the phrase mathematical statistics is so common in book titles: it was a branch of the original science. The original concept is now back in vogue as big data - old wine in new bottles, eh? As the field is broad, the article statistics should present such a historical account and summary. The article mathematical statistics should focus upon the mathematical developments. Andrew (talk) 08:45, 28 May 2014 (UTC)
That article shouldn't even exist until it's large enough to merit its own page, per WP:SUMMARY STYLE. Did you even bother reading the crap that was on that page before you restored it? In any event, I've deleted almost the entire page and made it a stub since I expect more people will randomly drop it and restore the WP:POV FORK that the page was without bothering to look at it.Seppi333 (Insert  | Maintained) 15:25, 28 May 2014 (UTC)
That's two editors now that have undone your blanking of the article. Stop trying to preempt the outcome of this discussion, wait for it to conclude.--JohnBlackburnewordsdeeds 18:11, 28 May 2014 (UTC)
Per my transclusion, this discussion is now moot. Seppi333 (Insert  | Maintained) 21:32, 28 May 2014 (UTC)
That is not even a reason, and the state you left it in was a complete mess. A normal editor trying to edit that would be presented with incomprehensible (to normal editors) parser code that doesn't belong in an article. Someone using the visual editor would just see a template, not editable text. Two editors have restored it now, myself and Andrew Davidson, so a limited consensus in favour of that version. Stop repeatedly removing content against consensus.--JohnBlackburnewordsdeeds 21:50, 28 May 2014 (UTC)
Guess we're going to the noticeboard when I return home. I'd suggest you restore my citations before I do.Seppi333 (Insert  | Maintained) 22:00, 28 May 2014 (UTC)

@JohnBlackburne: I've decided to offer this compromise instead of going straight to the notice board, as I care more about addressing the POV fork than the irrelevant content on math stat: if you're ok with both pages as they currently are, I'll concede the coatrack issue on the other page. Seppi333 (Insert  | Maintained) 00:21, 29 May 2014 (UTC)

My problem with it now is the parser code. I've never seen anything like that in an article, so it's not covered by any guideline, but it renders the page uneditable by the majority of editors. Those editing source see the parser code, which only a small minority of editors understand. Other editors will either stay away from editing or make an attempt but easily break it not knowing how it works. The Visual Editor is even worse: it simply isn't editable. So much for the encyclopaedia that anyone can edit. It's worse here; you can't actually edit the text with either editor. It provides an edit link but a very non-standard one which looks like an external link. Just copying the text would be normal and easily understood: there's no storage limit or other reason for transcluding it.--JohnBlackburnewordsdeeds 00:39, 29 May 2014 (UTC)
WP:SELECTIVETRANSCLUSION is the page on template- or article-to-article transcluding. It's been done extensively on Adderall. Seppi333 (Insert  | Maintained) 01:08, 29 May 2014 (UTC)
I just realized the template is unnecessary, so that would solve the VE problem if removed. Just the only include tags are needed for 1 section. Seppi333 (Insert  | Maintained) 01:11, 29 May 2014 (UTC)
It's an improvement but still looks like parser code in source code view, and you end up with the template editor for the first two words in VE. You and I can look at it and immediately recognise what it's doing but most editors will have a much harder time editing it. As for Adderall the similar approach has been used by you judging by the history, and Amphetamine is the article that has impenetrable wikitext and broken VE editing because of it; I've not seen this anywhere else or used by any other editor. WP:SELECTIVETRANSCLUSION does not recommend this as a way to build articles: the examples it gives of transclusion are far simpler and more commonplace.--JohnBlackburnewordsdeeds 01:37, 29 May 2014 (UTC)
It mentions article-article transclusion in WP:SELTRANS#Target document markup. I'll simplify the source code more. Seppi333 (Insert  | Maintained) 02:07, 29 May 2014 (UTC)
It doesn't give that as an example how to use transclusion though. The three examples are #Composite pages, #Pages with a common section and #Repetition within a page. Anyway, guidelines only describe common practice, help pages are mostly howtos; neither is meant to be prescriptive. But if markup makes a page uneditable by a large portion of editors then it shouldn't be used if simpler markup would achieve the same. In this case (and at Adderall / Amphetamine) content should just be copied. There's no reason it has to be the same in both articles, so no need to use complex markup to make it so.--JohnBlackburnewordsdeeds 02:43, 29 May 2014 (UTC)
The relevant guideline is MOS:MARKUP: "Keep markup simple / The simplest markup is often the easiest to edit, the most comprehensible, and the most predictable."--JohnBlackburnewordsdeeds 02:50, 29 May 2014 (UTC)
@JohnBlackburne: I don't really care how the page is unforked as long as it's not a fork. If you want to copy/paste the lead into this article, I'm fine with that. Transclusion is just much simpler to maintain. Are we in agreement to just copy the lead then? Seppi333 (Insert  | Maintained) 03:02, 29 May 2014 (UTC)

I went ahead and converted it to plain text - I'm assuming you're in agreement since your comments only pertained to the transclusion as opposed to the text. Let me know if otherwise. Seppi333 (Insert  | Maintained) 04:33, 29 May 2014 (UTC)

I've asked at WP:ANRFC whether this can be reviewed and closed, as it has I think gone on long enough.--JohnBlackburnewordsdeeds 22:07, 28 May 2014 (UTC)


The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Statistics vs Data Science

Is statistics a subfield of data science? See discussion in

  1. REDIRECT [[2]]
Statistics | Data Science
Data Analysis (Inference) | Data Mining
Data Organization | Data Management
Data Collection | Data Acquisition
Data Presentation (Exploratory Analysis) | Data Visualization

The first column is just the beginning of the text. Regarding the second column, see:

  1. REDIRECT [[3]]

I "dramatically" suggest merging the two or, at least, give a meaningful distinction rather than "advances in computing with data". It is almost as if a science becomes something different just because you're using a tool.
I think the name for that is Computational Statistics. I agree that there are many methods used in data science that are not yet taught in many statistics degrees, but come on! Statistics is older and may maintain its name, but Data Science is more descriptive in my view. What do you think? Am I being too extreme? BrennoBarbosa (talk) 09:18, 23 January 2014 (UTC)

Restructuring the Statistics article

I noticed that in the Scope and in the Overview chapters of this article a lot of text had been inserted which doesn't really belong there. Besides the main text of the Overview wasn't very clear either. On my home page, see Marcocapelle, I made an attempt to rewrite the Scope and Overview, and besides I moved all stuff that doesn't really belong in a Scope or Overview to other chapters (see chapter 3.1, 6 and 8 as numbered on my home page). Can you all check if this restructuring makes sense to you? Marcocapelle (talk) 09:17, 17 May 2014 (UTC)

You could help us understand your rewrite by listing the items that you've moved. For example, you seem to have moved misuse of statistics out of Overview. It's hard to say what belongs in Overview, as the entire article is arguably an overview of statistics. So is Overview just an overview of this article? Usually the intro section does that. Maybe we should all discuss what these sections mean. Mgnbar (talk) 10:58, 17 May 2014 (UTC)

Good point! From my perspective, an overview should give a reader insight into what Statistics really is, rather than move in all possible side directions from the start. An overview should not be used as a sort of table of contents, and should even less be used for single remarks that aren't elaborated at a later stage.

  • So I moved the more detailed part of sampling to a section in a new chapter 'Data collection'
  • I moved the part about misuse to an already existing chapter about misuse.
  • I moved the misinterpretation of correlation to a paragraph in the chapter of misuse.
  • I moved the paragraph of applied statistics versus theoretical statistics as a section of the new Trivia chapter.
  • I moved the paragraph of machine learning and data mining as another section of the new Trivia chapter.
  • I moved a paragraph into another section of the new Trivia chapter and named it Statistics in society.

In addition, of the chapters that I otherwise didn't touch, I did make the title a bit clearer though (chapter 4, 5 and 7).

Just for your info, I don't think that the Statistics article as a whole is perfect after this restructuring, but at least the start of the article has improved a lot (I think).

Kind regards, Marcocapelle (talk) 11:18, 17 May 2014 (UTC)

Has anyone taken the effort to have a look or to think about the above? Marcocapelle (talk) 19:18, 24 May 2014 (UTC)
People don't seem too upset by your changes. So I suggest that you Be Bold and start making them. Mgnbar (talk) 13:04, 26 May 2014 (UTC)
All right then, thanks for the reaction! Marcocapelle (talk) 19:51, 26 May 2014 (UTC)

Fallacy of Transposed conditional

Is there any Wikipedia article which explains the fallacy of transposed conditional? Lbertolotti (talk) 16:45, 2 September 2014 (UTC)

Prosecutor's fallacy. Qwfp (talk) 17:36, 2 September 2014 (UTC)

New lead section

I've rewritten the lead, as requested; see if you people like it:

Statistics is the study of the collection, analysis, interpretation, presentation and organization of data.[1] In applying statistics to e.g. a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.[1] When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

When analyzing data, it is possible to use one of two statistical methodologies: descriptive statistics, which summarizes data from a sample using indexes such as the mean or standard deviation, or inferential statistics, which draws conclusions from data that are subject to random variation, for example, observational errors or sampling variation.[2] Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena. To be able to make an inference about unknown quantities, one or more estimators are evaluated using the sample. Standard statistical procedures involve the development of a null hypothesis, a general statement or default position that there is no relationship between two quantities. Rejecting or disproving the null hypothesis is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false. What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a "false positive") and Type II errors (null hypothesis fails to be rejected and an actual difference between populations is missed giving a "false negative"). A critical region is the set of values of the estimator that lead to rejecting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance) and the probability of type II error is the probability that the estimator doesn't belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false. Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.
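
As a rough illustration of the error rates described in the paragraph above, here is a minimal simulation sketch. Everything specific in it is an assumption chosen for illustration and is not part of the proposed lead: normally distributed data with known standard deviation, a two-sided z-test at the 5% level, samples of size 30, a true mean of 0.5 under the alternative, and the hypothetical helper names rejects_null and rejection_rate.

```python
import random
from statistics import NormalDist, mean


def rejects_null(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    # Boundary of the critical region for the standardized test statistic.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, trials=5000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if rejects_null(sample, mu0, sigma):
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (false positive).
type_i_rate = rejection_rate(true_mu=0.0)
# H0 is false here, so the rejection rate estimates the power of the test.
power = rejection_rate(true_mu=0.5)
# Failing to reject a false H0 is a Type II error (false negative).
type_ii_rate = 1 - power

print(type_i_rate, power, type_ii_rate)
```

Under these assumptions the estimated Type I error rate should land near the chosen significance level, and the estimated power should rise if the sample size or the true difference is increased.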

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunders, such as when an analyst reports incorrect units) can also be important. The presence of missing data and/or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. Ways to avoid misuse of statistics include using proper diagrams and avoiding bias. In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. If two variables are correlated, they may or may not be the cause of one another. The correlation could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable.
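
The repeated-sampling reading of a 95% confidence interval given above can be checked with a small simulation. This is only a sketch under assumptions that are mine rather than the draft's: normally distributed data, a normal-approximation interval for the mean, samples of size 100, and the hypothetical helper names confidence_interval and coverage.

```python
import random
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sample) / n ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sigma=2.0, n=100, trials=4000, seed=1):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = confidence_interval(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


observed_coverage = coverage()
print(observed_coverage)
```

The printed fraction should come out close to 0.95. Note that it describes the procedure over many repetitions, not the probability that any single computed interval contains the true value.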

Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more heavily from calculus and probability theory. Statistics continues to be an area of active research, for example on the problem of how to analyze Big data.

Lbertolotti (talk) 23:49, 1 October 2014 (UTC)

Typing Errors

One of the diagrams on this page reads "ovservation", not "observations", which may need attention. 114.78.37.19 (talk) 12:41, 17 November 2014 (UTC)

  1. Dodge, Y. (2006). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9.
  2. Lund Research Ltd. "Descriptive and Inferential Statistics". statistics.laerd.com. Retrieved 2014-03-23.