Talk:Statistics

This is the talk page for discussing improvements to the Statistics article.
This is not a forum for general discussion of the article's subject.

Statistics was one of the Mathematics good articles, but it has been removed from the list. There are suggestions below for improving the article to meet the good article criteria. Once these issues have been addressed, the article can be renominated. Editors may also seek a reassessment of the decision if they believe there was a mistake.

Article milestones
Date	Process	Result
June 11, 2006	Good article reassessment	Delisted

Mathematics B‑class Top‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
B	This article has been rated as B-class on Wikipedia's content assessment scale.
Top	This article has been rated as Top-priority on the project's priority scale.

Statistics B‑class Top‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
B	This article has been rated as B-class on Wikipedia's content assessment scale.
Top	This article has been rated as Top-importance on the importance scale.

This page is for discussion of the article about statistics. Comments and questions about the special page about Wikipedia site statistics (number of pages, edits, etc.) should be directed to Wikipedia talk:Special pages.

This article has been mentioned by a media organization:

Kathy Lange (December 1, 2006). "Differences Between Statistics and Data Mining". DM Review. http://www.dmreview.com/

Archives

1, 2, 3, 4, 5

This page has archives. Sections older than 90 days may be automatically archived when more than 2 sections are present.

Hawthorne

Citation: Wickström, G.; Bendix, T. (2000). "Commentary". Scandinavian Journal of Work, Environment & Health. 26 (4): 363. doi:10.5271/sjweh.555. 159.83.196.1 (talk) 20:22, 15 May 2012 (UTC)

It is very unclear why this was added. The full title of the article appears to be: 'The "Hawthorne effect" - what did the original Hawthorne studies actually show?'. This might be relevant to some other article, but doesn't seem directly related to this article's content or intent. Melcombe (talk) 21:06, 15 May 2012 (UTC)

Offered in response to "citation needed". 159.83.196.1 (talk) 19:07, 17 May 2012 (UTC)

Mentioning statisticians in the lead

A statistician is someone who is particularly well versed in the ways of thinking necessary for the successful application of statistical analysis. Such people have often gained this experience through working in any of a wide number of fields. There is also a discipline called mathematical statistics that studies statistics mathematically.

This seems unnecessary. The lead summarizes the topic at hand, not the people who work in its field. I think it's more appropriate to relegate 'statisticians' to the See also section or - perhaps - have some other section, dedicated to describing what's required to work in the field, cover this.
—Sowlos (talk) 14:46, 6 September 2012 (UTC)

How is statistics used? — Preceding unsigned comment added by 72.27.91.178 (talk) 16:19, 4 December 2012 (UTC)

Proposed merge with Mathematical statistics

No consensus here. Number 5 7 11:15, 29 May 2014 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

The mathematical statistics article is a bit of a waif. There's a nice couple of sentences about the difference between descriptive and inferential statistics and also about the development of statistical theory. The data analysis section seems out of place there: if that were moved to an appropriate place in statistics, then there wouldn't be much left there. The original author of that article endorsed a redirect some years ago. Illia Connell (talk) 05:29, 25 April 2013 (UTC)

I agree that the current mathematical statistics article has little to offer. But, before we merge, let's ask ourselves one question: If Wikipedia were complete and perfect, would that article exist, separately from statistics? What would it cover? If the answer is "a lot", then maybe we should improve that article, instead of merging it. Mgnbar (talk) 12:58, 25 April 2013 (UTC)

In response to Mgnbar's question, I would say that in a complete encyclopedia, Mathematical Statistics and Statistics should have different articles. Statistics includes many "qualitative" sub-fields that I think can be only covered in this article. Taha (talk) 02:01, 30 April 2013 (UTC)

I do believe that mathematical statistics is a separate entry, possibly even a discipline, from statistics (applied), but it should then be part of "probability theory". Limit-theorem (talk) 21:00, 9 June 2013 (UTC)

Merge. Statistics is a branch of mathematics. Having 'mathematical' in the title is redundant. The common terms that describe the difference are 'applied' and 'theoretical'. Science.philosophy.arts (talk) 00:04, 20 September 2013 (UTC)

Statistics uses mathematics fairly heavily, as do physics and engineering. None of these is simply a "branch of mathematics". Applied statistics involves many non-mathematical aspects, and even theoretical statistics goes beyond simply mathematical issues (e.g. the philosophy of inference). --Avenue (talk) 10:29, 27 November 2013 (UTC)

The mathematical statistics article is poor. The current article should be expanded. As far as statistics being a branch of mathematics, it sounds as though you are not a mathematician.

Statistics is a science in my opinion, and it is no more a branch of mathematics than are physics, chemistry and economics; for if its methods fail the test of experience--not the test of logic--they are discarded. - John Tukey

160.36.8.226 (talk) 17:41, 22 November 2013 (UTC)

Support - "Mathematical statistics" is completely redundant with "statistics," as I have no clue what "non-mathematical" statistics would be; a statistic is, by definition, just a function on a sample. Seppi333 (talk) 01:38, 27 November 2013 (UTC)

That is so far off base it's not even funny. Do you really need someone to explain how statistics (the field) is not just about statistics (the plural of statistic)? --Avenue (talk) 10:29, 27 November 2013 (UTC)

Avenue, I didn't notice your reply until now. I'm not sure what motivated you to be an asshole and write a rude, asinine response like that. I was (IMO - quite clearly) comparing the "mathematical" vs "non-mathematical" treatment of the field with the mathematical definition of statistical functions by juxtaposing those clauses, not suggesting something like "the field is defined as the exhaustive set of statistical functions" or whatever absurd proposition you're suggesting I'm asserting.

I'm going to copyedit scope within the next few weeks to expand it and fix the abhorrent lack of citations; I'll probably fix the WP:UNDUE problem while I'm at it. Seppi333 (Insert 2¢) 12:21, 23 January 2014 (UTC)

Strongly oppose. Expand mathematical statistics, don't merge them. I agree there's not much there at present, but it's a big subject in its own right, and certainly worthy of a separate article. --Avenue (talk) 10:29, 27 November 2013 (UTC)

Could someone who opposes the merge explain how these topics are different? Seppi333 (talk) 13:05, 27 November 2013 (UTC)

Oppose. I'm reminded of the discussion around whether to set up a 'Probability and statistics' workgroup of WikiProject Mathematics, or a separate WikiProject Statistics. Qwfp (talk) 19:16, 27 November 2013 (UTC)

Oppose. Statistics theory [1] may be merged with mathematical statistics. Statistics, definitely not. Mathematical statistics, based on mathematical models of uncertainty (nowadays basically probability), concerns the study of principles of inference/learning in statistical models (selecting the best models, model evaluation, and so on). It is basically a sub field of statistics that deals with decision under uncertainty in a mathematical framework (using mathematical structures). There are other subfields alike, in which there is no clear use of uncertainty though. What is known today as unsupervised learning is the basic example (Clustering, Topological Data Analysis, Association Rule Learning etc.). It is a field in its own right.

Statistics is larger: there is acquisition of data (methodology and ethics in survey), organization of data (database), presentation of data (visualization) etc... Some may argue that such things may be modelled on a mathematical basis. In fact they can; relational algebra/calculus in relational databases is an example, but this is not what is proposed by the mathematical area (a little in survey sampling methodology, yes). BrennoBarbosa (talk) 09:56, 23 January 2014 (UTC)

That statistics may be discussed without referring explicitly to math I understand, but if Clustering and Topological Data Analysis can't be considered math, then what are they? Computer science? Also, how exactly does "if its methods fail the test of experience--not the test of logic--they are discarded" apply? Linear regression may work well on one dataset but not on another, so how does that count as a "test of experience"? As far as I know, linear regression may be a poor model when its assumptions are violated, but then again that is the case with any mathematical model, since it's the assumptions that logically guarantee the validity of any theorem. Lbertolotti (talk) 21:32, 28 January 2014 (UTC)

What is (or what will be) the difference between probability theory and Mathematical statistics? Mathematical statistics sounds like other similar applied statistics subjects such as actuarial statistics, biostatistics, social statistics, etc., while it is of course theoretical statistics (unless someone uses statistical methods to model mathematicians). In any case, I support moving the current duplicate (and slightly irrelevant) content on Mathematical statistics into Statistics and Probability theory, and then either waiting for someone to add content to Mathematical statistics or redirecting it to probability theory. We don't want too much duplicate information. Sda030 (talk) 00:21, 26 February 2014 (UTC)

Oppose - Keep them as separate articles. SAMI ^talk 16:28, 13 May 2014 (UTC)

Earlier on, Seppi333 asked a fair question: is there anything in statistics that can be considered "non-mathematical statistics"? The article about mathematical statistics seems to suggest that the non-mathematical part of statistics consists of organizing and planning (of data, of experiments). However, I doubt that many statisticians consider organizing and planning to be part of statistics at all. Therefore, support. Marcocapelle (talk) 06:01, 14 May 2014 (UTC)

Wrapping up all opposing arguments:

Statistics includes many "qualitative" sub-fields that I think can be only covered in this article.
- Like what?
I do believe that mathematical statistics is a separate entry, possibly even a discipline, from statistics (applied), but it should then be part of "probability theory".
- Fair enough to make a distinction between application of statistics (as an activity) and statistical theory (as a piece of knowledge). However, everything about statistics on Wikipedia is about statistical theory anyway.
Statistics (the field) is not just about statistics (the plural of statistic).
- I’ve no idea what the author means with regards to whether or not to merge the two articles.
Expand mathematical statistics, don't merge them. I agree there's not much there at present, but it's a big subject in its own right, and certainly worthy of a separate article.
- Its own right, then how?
Mathematical statistics (…) is basically a sub field of statistics that deals with decision under uncertainty in a mathematical framework (using mathematical structures). There are other subfields alike, in which there is no clear use of uncertainty though.
- Fair enough, though that is a distinction between descriptive and inferential statistics which is already explained in the Statistics article.
Statistics is larger: there is acquisition of data (methodology and ethics in survey), organization of data (database), presentation of data (visualization) etc...
- Fair enough, it would be perfect if someone would write articles about all these fields (and some of those articles already exist). Meanwhile we have two articles that are both about statistics in a 'smaller' sense, why shouldn’t we merge them?

Bottom line, I haven't seen any convincing arguments to keep two articles and so the best thing is to merge (or delete what's now on Mathematical Statistics). Marcocapelle (talk) 19:06, 24 May 2014 (UTC)

Note: I have undone the inappropriate close of this discussion. I count four opposes and only two supports, which is either a consensus not to merge or no consensus to do so. Therefore a closure to merge was inappropriate. Further, a contentious discussion should only be closed by an uninvolved editor or administrator. An editor who has already participated and !voted on one side or the other should not take it on themselves to do so, especially not by ignoring the !votes and using their own reasoning to close the discussion.--JohnBlackburne^words_deeds 20:17, 26 May 2014 (UTC)

@JohnBlackburne: This was not a merge related to this discussion. That page was a WP:POV FORK. I don't need consensus to remove that. Feel free to remake a CORRECT page with CITATIONS to that content. Not a page about mathematical statistics with 7 citations that said
"Mathematical statistics is XYZ." (no citation)
"Bob, Greg, Bill, and Rod used XYZ which was the fad in the 1970s." (7 citations)

If you restore this again, we're going to the NPOV noticeboard AND I'm STRICTLY holding you to WP:3RR. Just test me. Seppi333 (Insert 2¢ | Maintained) 03:09, 27 May 2014 (UTC)

Oppose. Statistics was originally the science of the state; hence the name, which was introduced into English by Sir John Sinclair, replacing the previous phrase of political arithmetic. At that time, the essence of the topic was to collect facts which would assist government, such as a census. Application of mathematical methods to these facts came later. This seems to be why the phrase mathematical statistics is so common in book titles: it was a branch of the original science. The original concept is now back in vogue as big data - old wine in new bottles, eh? As the field is broad, the article statistics should present such a historical account and summary. The article mathematical statistics should focus upon the mathematical developments. Andrew (talk) 08:45, 28 May 2014 (UTC)

That article shouldn't even exist until it's large enough to merit its own page, per WP:SUMMARY STYLE. Did you even bother reading the crap that was on that page before you restored it? In any event, I've deleted almost the entire page and made it a stub since I expect more people will randomly drop it and restore the WP:POV FORK that the page was without bothering to look at it. Seppi333 (Insert 2¢ | Maintained) 15:25, 28 May 2014 (UTC)

That's two editors now that have undone your blanking of the article. Stop trying to preempt the outcome of this discussion; wait for it to conclude.--JohnBlackburne^words_deeds 18:11, 28 May 2014 (UTC)

Per my transclusion, this discussion is now moot. Seppi333 (Insert 2¢ | Maintained) 21:32, 28 May 2014 (UTC)

That is not even a reason, and the state you left it in was a complete mess. A normal editor trying to edit that would be presented with incomprehensible (to normal editors) parser code that doesn't belong in an article. Someone using the visual editor would just see a template, not editable text. Two editors have restored it now, myself and Andrew Davidson, so there is a limited consensus in favour of that version. Stop repeatedly removing content against consensus.--JohnBlackburne^words_deeds 21:50, 28 May 2014 (UTC)

Guess we're going to the noticeboard when I return home. I'd suggest you restore my citations before I do. Seppi333 (Insert 2¢ | Maintained) 22:00, 28 May 2014 (UTC)

@JohnBlackburne: I've decided to offer this compromise instead of going straight to the noticeboard, as I care more about addressing the POV fork than the irrelevant content on math stat: if you're OK with both pages as they currently are, I'll concede the coatrack issue on the other page. Seppi333 (Insert 2¢ | Maintained) 00:21, 29 May 2014 (UTC)

My problem with it now is the parser code. I've never seen anything like that in an article, so it's not covered by any guideline, but it renders the page uneditable by the majority of editors. Those editing source see the parser code, which only a small minority of editors understand. Other editors will either stay away from editing or make an attempt but easily break it not knowing how it works. The Visual Editor is even worse: it simply isn't editable. So much for the encyclopaedia that anyone can edit. It's worse here; you can't actually edit the text with either editor. It provides an edit link but a very non-standard one which looks like an external link. Just copying the text would be normal and easily understood: there's no storage limit or other reason for transcluding it.--JohnBlackburne^words_deeds 00:39, 29 May 2014 (UTC)

WP:SELECTIVETRANSCLUSION is the page on template- or article-to-article transcluding. It's been done extensively on Adderall. Seppi333 (Insert 2¢ | Maintained) 01:08, 29 May 2014 (UTC)

I just realized the template is unnecessary, so removing it would solve the VE problem. Only the onlyinclude tags are needed, for one section. Seppi333 (Insert 2¢ | Maintained) 01:11, 29 May 2014 (UTC)

It's an improvement but still looks like parser code in source code view, and you end up with the template editor for the first two words in VE. You and I can look at it and immediately recognise what it's doing but most editors will have a much harder time editing it. As for Adderall, a similar approach has been used by you, judging by the history, and Amphetamine is the article that has impenetrable wikitext and broken VE editing because of it; I've not seen this anywhere else or used by any other editor. WP:SELECTIVETRANSCLUSION does not recommend this as a way to build articles: the examples it gives of transclusion are far simpler and more commonplace.--JohnBlackburne^words_deeds 01:37, 29 May 2014 (UTC)

It mentions article-article transclusion in WP:SELTRANS#Target document markup. I'll simplify the source code more. Seppi333 (Insert 2¢ | Maintained) 02:07, 29 May 2014 (UTC)

It doesn't give that as an example how to use transclusion though. The three examples are #Composite pages, #Pages with a common section and #Repetition within a page. Anyway, guidelines only describe common practice, help pages are mostly howtos; neither is meant to be prescriptive. But if markup makes a page uneditable by a large portion of editors then it shouldn't be used if simpler markup would achieve the same. In this case (and at Adderall / Amphetamine) content should just be copied. There's no reason it has to be the same in both articles, so no need to use complex markup to make it so.--JohnBlackburne^words_deeds 02:43, 29 May 2014 (UTC)

The relevant guideline is MOS:MARKUP: "Keep markup simple / The simplest markup is often the easiest to edit, the most comprehensible, and the most predictable."--JohnBlackburne^words_deeds 02:50, 29 May 2014 (UTC)

@JohnBlackburne: I don't really care how the page is unforked as long as it's not a fork. If you want to copy/paste the lead into this article, I'm fine with that. Transclusion is just much simpler to maintain. Are we in agreement to just copy the lead then? Seppi333 (Insert 2¢ | Maintained) 03:02, 29 May 2014 (UTC)

I went ahead and converted it to plain text - I'm assuming you're in agreement since your comments only pertained to the transclusion as opposed to the text. Let me know if otherwise. Seppi333 (Insert 2¢ | Maintained) 04:33, 29 May 2014 (UTC)

I've asked at WP:ANRFC whether this can be reviewed and closed, as it has I think gone on long enough.--JohnBlackburne^words_deeds 22:07, 28 May 2014 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Statistics vs Data Science

Is statistics a subfield of data science? See discussion in [2].

Statistics	Data Science
Data Analysis (Inference)	Data Mining
Data Organization	Data Management
Data Collection	Data Acquisition
Data Presentation (Exploratory Analysis)	Data Visualization

The first column is just the beginning of the text. Regarding the second column, see [3].

I "dramatically" suggest merging the two or, at least, giving a meaningful distinction rather than "advances in computing with data". It is almost as if a science becomes something different just because you're using a tool.
I think the name for that is Computational Statistics. I agree that there are many methods used in data science that are not yet taught in many statistics degrees, but come on! Statistics is older and may keep its name, but Data Science is more descriptive in my view. What do you think? Am I being too extreme? BrennoBarbosa (talk) 09:18, 23 January 2014 (UTC)

Restructuring the Statistics article

I noticed that in the Scope and Overview chapters of this article a lot of text had been inserted which doesn't really belong there. Besides, the main text of the Overview wasn't very clear either. On my home page, see Marcocapelle, I made an attempt to rewrite the Scope and Overview, and I moved everything that doesn't really belong in a Scope or Overview to other chapters (see chapters 3.1, 6 and 8 as numbered on my home page). Can you all check if this restructuring makes sense to you? Marcocapelle (talk) 09:17, 17 May 2014 (UTC)

You could help us understand your rewrite by listing the items that you've moved. For example, you seem to have moved misuse of statistics out of Overview. It's hard to say what belongs in Overview, as the entire article is arguably an overview of statistics. So is Overview just an overview of this article? Usually the intro section does that. Maybe we should all discuss what these sections mean. Mgnbar (talk) 10:58, 17 May 2014 (UTC)

Good point! In my view, an overview should give a reader insight into what Statistics really is, rather than moving in all possible side directions from the start. An overview should not be used as a sort of table of contents, and even less for single remarks that aren't elaborated at a later stage.

So I moved the more detailed part of sampling to a section in a new chapter 'Data collection'
I moved the part about misuse to an already existing chapter about misuse.
I moved the misinterpretation of correlation to a paragraph in the chapter of misuse.
I moved the paragraph of applied statistics versus theoretical statistics as a section of the new Trivia chapter.
I moved the paragraph of machine learning and data mining as another section of the new Trivia chapter.
I moved a paragraph into another section of the new Trivia chapter and named it Statistics in society.

In addition, for the chapters that I otherwise didn't touch, I did make the titles a bit clearer (chapters 4, 5 and 7).

Just for your info, I don't think that the Statistics article as a whole is perfect after this restructuring, but at least the start of the article has improved a lot (I think).

Kind regards, Marcocapelle (talk) 11:18, 17 May 2014 (UTC)

Has anyone taken the effort to have a look or to think about the above? Marcocapelle (talk) 19:18, 24 May 2014 (UTC)

People don't seem too upset by your changes. So I suggest that you Be Bold and start making them. Mgnbar (talk) 13:04, 26 May 2014 (UTC)

All right then, thanks for the reaction! Marcocapelle (talk) 19:51, 26 May 2014 (UTC)

Fallacy of Transposed conditional

Is there any Wikipedia article which explains the fallacy of the transposed conditional? Lbertolotti (talk) 16:45, 2 September 2014 (UTC)

Prosecutor's fallacy. Qwfp (talk) 17:36, 2 September 2014 (UTC)
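For anyone reading this thread later, a small worked example may make the transposed conditional concrete. The sketch below applies Bayes' theorem to entirely hypothetical numbers (one true source among 1,000,000 candidates, a random-match probability of 1 in 100,000, and a test that always matches the true source); none of these figures come from the article or from any real case.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    p_evidence = prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / p_evidence


# Hypothetical numbers, chosen only for illustration:
prior_guilty = 1 / 1_000_000        # one true source among a million candidates
p_match_if_guilty = 1.0             # the true source always matches
p_match_if_innocent = 1 / 100_000   # random-match probability for an innocent person

# Transposed conditional: reporting P(match | innocent) as if it were P(innocent | match).
fallacious = p_match_if_innocent
correct = 1 - posterior(prior_guilty, p_match_if_guilty, p_match_if_innocent)

print(f"P(match | innocent) = {fallacious:.5f}")
print(f"P(innocent | match) = {correct:.3f}")
```

With these assumed inputs, about 10 of the 999,999 innocent candidates would be expected to match, so a matching person is innocent with probability of roughly 10/11 (about 0.909), even though P(match | innocent) is only 0.00001.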

New lead section

I've rewritten the lead, as requested; see if you like it:

Statistics is the study of the collection, analysis, interpretation, presentation and organization of data.^[1] In applying statistics to, e.g., a scientific, industrial, or societal problem, it is necessary to begin with a population or process to be studied. Populations can be diverse topics such as "all persons living in a country" or "every atom composing a crystal". Statistics deals with all aspects of data, including the planning of data collection in terms of the design of surveys and experiments.^[1] When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

When analyzing data, one of two main statistical methodologies can be used: descriptive statistics, which summarizes data from a sample using indexes such as the mean or standard deviation, or inferential statistics, which draws conclusions from data that are subject to random variation, for example, observational errors or sampling variation.^[2] Inferences in mathematical statistics are made within the framework of probability theory, which deals with the analysis of random phenomena. To make an inference about unknown quantities, one or more estimators are evaluated using the sample. Standard statistical procedure involves the development of a null hypothesis, a general statement or default position that there is no relationship between two quantities. Rejecting or disproving the null hypothesis is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false. What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (the null hypothesis is falsely rejected, giving a "false positive") and Type II errors (the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a "false negative"). A critical region is the set of values of the estimator that leads to rejecting the null hypothesis. The probability of a Type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (the significance level), and the probability of a Type II error is the probability that the estimator does not belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.
Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunders, such as when an analyst reports incorrect units) can also be important. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. Ways to avoid misuse of statistics include using proper diagrams and avoiding bias. In statistics, dependence is any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence. If two variables are correlated, one may or may not be the cause of the other. The correlation could instead be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable.

Statistics can be said to have begun in ancient civilization, going back at least to the 5th century BC, but it was not until the 18th century that it started to draw more heavily from calculus and probability theory. Statistics continues to be an area of active research, for example on the problem of how to analyze Big data.

References

^ a b Dodge, Y. (2006). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9
^ Lund Research Ltd. "Descriptive and Inferential Statistics". statistics.laerd.com. Retrieved 2014-03-23.
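The draft lead above defines Type I and Type II errors, the critical region, power, and the repeated-sampling reading of a 95% confidence interval. A minimal simulation sketch may help reviewers check those definitions. It assumes normally distributed data with a known standard deviation (so a two-sided z-test and a z-interval apply), and the sample size, effect size, seeds and repetition counts are arbitrary illustrative choices, not anything proposed for the article. Where the draft says "estimator", the sketch uses a standardized test statistic built from the sample mean.

```python
import random
from statistics import NormalDist, fmean

# Two-sided 5% critical value for a standard normal statistic (about 1.96).
Z_CRIT = NormalDist().inv_cdf(0.975)


def z_statistic(sample, mu0, sigma):
    """Test statistic for H0: mean == mu0, assuming sigma is known."""
    return (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, reps=10_000, seed=1):
    """Share of simulated samples whose statistic lands in the critical region |z| > Z_CRIT."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if abs(z_statistic(sample, mu0, sigma)) > Z_CRIT:
            rejections += 1
    return rejections / reps


def ci_coverage(true_mu=0.0, sigma=1.0, n=25, reps=10_000, seed=2):
    """Share of simulated 95% intervals, mean +/- Z_CRIT * sigma / sqrt(n), containing true_mu."""
    rng = random.Random(seed)
    half_width = Z_CRIT * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        m = fmean([rng.gauss(true_mu, sigma) for _ in range(n)])
        if m - half_width <= true_mu <= m + half_width:
            hits += 1
    return hits / reps


type_i = rejection_rate(true_mu=0.0)  # H0 true, so every rejection is a Type I error
power = rejection_rate(true_mu=0.5)   # H0 false, so rejections are correct decisions
coverage = ci_coverage()

print(f"Type I error rate: {type_i:.3f}")
print(f"Power at mu = 0.5: {power:.3f} (Type II error rate {1 - power:.3f})")
print(f"95% interval coverage: {coverage:.3f}")
```

Under these assumptions the three printed rates should land roughly near 0.05, 0.7 and 0.95; the exact digits depend on the seeds. The Type II error rate is simply one minus the power at the chosen alternative.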

Typing Errors

One of the diagrams on this page reads "ovservation", not "observations", which may need attention. 114.78.37.19 (talk) 12:41, 17 November 2014 (UTC)

Resolved

Lbertolotti (talk) 23:49, 1 October 2014 (UTC)

Modern developments in statistics?

I notice that both in the statistics article and in the linking box at the bottom, we don't currently link to existing Wikipedia articles that reflect the statistics of causal inference. Since causal inference is arguably one of the main purposes of statistics, it would be good if we could somehow link topics such as Mediation (statistics), Interaction (statistics), the Rubin causal model and structural equation modeling. And even though instrumental variables analysis is mentioned in the text, it would be useful to have it in the "statistics" box at the bottom of the page along with difference-in-differences, regression discontinuity design and propensity score matching. Anyway, I'm not very tech-savvy, so I would be grateful if someone knew how to do this. — Preceding unsigned comment added by 93.162.74.34 (talk) 23:10, 28 December 2014 (UTC)

Big data? A representative example?

I'd like to ask whether the inclusion of Big data, as recently introduced to the lead in this edit [4], is representative of the "active research" being done in the statistics community. Is there not, for example, also active research in, say, Bayesian analysis of small data sets? Or, for that matter, in other areas of statistics with which I am not familiar? Also, is "big data" an area of statistical research per se, or is it more accurately described as a field where existing statistical methods are being applied in a newish setting? So, I just wanted to ask. Isambard Kingdom (talk) 17:56, 19 June 2015 (UTC)

maths

Statics

Yakoobstk (talk) 04:18, 17 August 2016 (UTC)

Numerical Optimization in Statistics might be a big mistake

Statistics is not like mathematics. The mathematical theory of extreme values cannot simply be applied in statistics. This is because an optimizer constructed from sample data is a random variable, and the extreme value of the optimizer (minimum or maximum) cannot be more significant than its other values. We should take the expectation of the optimizer when making statistical decisions, e.g. model selection. Yuanfangdelang (talk) 20:16, 30 August 2016 (UTC)

Are you talking about what statistics as a field should be doing? Or what this Wikipedia article on statistics should be discussing? If the former, then this is the wrong venue. If the latter, then you need reliable sources. And even then numerical optimization will have a place in Wikipedia statistics articles, because it is common in computing maximum likelihood estimates, etc. (whether rightly or wrongly). Mgnbar (talk) 20:23, 30 August 2016 (UTC)

So, should we delete this section? I think the reliable source here is the property of an optimizer itself: it is a random variable. If the minimum or maximum of a random variable is not more significant than other sampling points, why should the minimum or maximum of an optimizer count for more than its other values in a statistical analysis? Maybe people would say that an optimizer will converge to its minimum or maximum. This is wrong, since a random variable does not converge to its extreme value, only to its expectation. Thanks! Yuanfangdelang (talk) 20:50, 30 August 2016 (UTC)

First, I'm not sure which section you're talking about. Statistical computing? Second, by the way, you have not cited a reliable source. Third, here are some facts, as I see them.

The likelihood is a function of the parameters.
The MLE is the max of this function.
Numerical optimization algorithms can find the max (when done well).
Finding MLEs in this way is a common practice in statistics.
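As a concrete instance of the four points above, the following minimal Python sketch finds an MLE numerically and checks it against the closed-form answer. It assumes i.i.d. exponential data, whose log-likelihood n·log(λ) − λ·Σx is maximised at λ = n/Σx. The data values and function names are hypothetical, and golden-section search is just one simple optimizer chosen for brevity, not what any particular statistics package uses.

```python
import math


def exp_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential observations for a given rate > 0."""
    return len(data) * math.log(rate) - rate * sum(data)


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        # Two interior points; for simplicity f is re-evaluated each pass.
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


# Hypothetical waiting times, used only for illustration.
waits = [0.8, 1.7, 0.3, 2.4, 1.1, 0.6]
numerical = golden_section_max(lambda r: exp_log_likelihood(r, waits), 1e-6, 10.0)
closed_form = len(waits) / sum(waits)  # analytic MLE: n / sum(x)
print(numerical, closed_form)
```

The exponential log-likelihood is strictly concave in λ, so it is unimodal on the search interval and golden-section search applies. For models without a closed-form MLE the same pattern holds with a different log-likelihood, although multi-parameter problems need a multivariate optimizer.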

Are any of these facts wrong? Fourth, you seem to be arguing that statisticians should not use MLE. But they do, and even if they shouldn't, it's not Wikipedia's job to correct them. Mgnbar (talk) 05:46, 31 August 2016 (UTC)

Wikipedia is not a statistics journal. Discussing what statisticians should or should not do is outside the scope of Wikipedia. Publish your opinion in relevant statistics journals instead and "fix" it there first. Wikipedia is an encyclopedia, which summarizes and references important prior work only and does not do original research. We literally do not care about what "might be a big mistake" (as long as the practice is common, e.g. in the literature): Wikipedia has an article on Flat Earth despite it being a "mistake", because it used to be a dominant concept. HelpUsStopSpam (talk) 09:48, 31 August 2016 (UTC)

I don't think those facts are wrong, but they are not ALL there is to the likelihood. The likelihood is not only a statistical function but also a random variable defined on the sampled data points. Therefore, using the MLE (maximum likelihood estimate) as an optimization in model selection doesn't make sense from the viewpoint of statistics. This might be why models determined by optimization usually overfit, so that cross-validation has to be used. Indeed, in 1962 someone pointed out that "Optimization is danger (in Statistics)" in the paper "The Future of Data Analysis", published in The Annals of Mathematical Statistics. However, nobody has heeded this warning since then. Unfortunately, the author didn't explain why it is dangerous, because he didn't realize that an optimizer is a random variable, too. Yuanfangdelang (talk) 17:54, 1 September 2016 (UTC)

About the Arithmetic mean in Statistics

Another of my opinions on the foundations of statistics is that the arithmetic mean cannot be used as an unbiased expectation estimate for a continuous random variable with a skewed single-peak distribution. It can only be used for exactly normal distributions, which are symmetric: the density changes on the two sides of the peak mirror each other. But in a skewed distribution, the density changes on the two sides of the peak are asymmetric, and they affect the location of the peak. Therefore, for a given normal distribution curve, the arithmetic mean will not always be at the peak if the normal curve is randomly changed to a skewed curve, and only the estimate at the peak can be called an "unbiased" expectation estimate for the skewed distribution. Obviously nobody can prove that the arithmetic mean will always be at the peak of a skewed distribution. Yuanfangdelang (talk) 18:33, 1 September 2016 (UTC)

I just put my personal opinions on the talk page of the "Statistics" article rather than trying to modify the article's contents. Is that proper? Yuanfangdelang (talk) 18:35, 1 September 2016 (UTC)

No, it is not proper. You are not discussing specific changes to be made to Wikipedia's statistics article, backed up by citations of reliable sources. You are discussing your original research on perceived problems in general statistical practice. You should take both of these discussions to something like the statistics section of Stack Overflow. Regards. Mgnbar (talk) 18:42, 1 September 2016 (UTC)

Let me make a backup before migrating it somewhere else. Thanks so much! Yuanfangdelang (talk) 19:08, 1 September 2016 (UTC)
