Wikipedia talk:WikiProject Statistics

2016 Community Wishlist Survey Proposal to Revive Popular Pages

Greetings WikiProject Statistics Members!

This is a one-time-only message to inform you about a technical proposal to revive your Popular Pages list in the 2016 Community Wishlist Survey that I think you may be interested in reviewing and perhaps even voting for:

If the above proposal gets in the Top 10 based on the votes, there is a high likelihood of this bot being restored so your project will again see monthly updates of popular pages.

Further, there are over 260 proposals in all to review and vote for, across many aspects of wikis.

Thank you for your consideration. Please note that voting for proposals continues through December 12, 2016.

Best regards, Stevietheman. Delivered: 18:08, 7 December 2016 (UTC)

Two similar categories

Do we need Category:Probability theorists and Category:Researchers in stochastics to be kept separated? Their scope looks pretty similar but I may well be overlooking something here. Marcocapelle (talk) 09:55, 9 December 2016 (UTC)

Should Wikipedia statistics articles be understandable by non-statisticians?

Isambard Kingdom has proposed deletion of material from several statistics project articles with the justification that the material is too detailed.

The justification raises a fundamental issue about the appropriate level of presentation in Wikipedia statistics articles. I believe the question would benefit from discussion among members of the Statistics WikiProject. (For examples of Isambard Kingdom's proposed deletions, see the talk pages for the Hosmer-Lemeshow test, Kaplan-Meier estimator, or Exponential distribution.)

A common criticism of Wikipedia statistics articles is that they are not comprehensible to non-statisticians. Here are examples from some of the most frequently-read statistics pages.

  • "Absolutely obtuse to a lay person"

  • "Utterly indecipherable to the lay-reader. If the general public is your audience, this article is a complete failure. I'm a reader with an advanced degree, and a well-rounded education, and I can't penetrate even the lede."

  • "The explanation is completely in theoretical terms. I'm trying to understand an article better, and this piece is absolutely no help in doing so."
  • "The first sentence is ridiculously complicated! Statistics is very poorly explained on wikipedia, and this is one of the worst examples."

  • "Who is this article for? Well it isn't for me. I understood NOTHING!"

  • "Accessibility: Would it be possible to write an introductory section that gives just a conceptual description of what the binomial distribution is about, before we enter the maths?"

  • "This article is quite technical. It would be nice to have a simpler layman's description too."

  • "Please, somebody, take pity on those of us who need more fundamental understanding, and write an introduction to this subject that would be useful and graspable by anybody with the basic interest to look it up. That's how to make Wikipedia better; make it useful."

  • "This page is utterly incomprehensible for the novice who just wants a basic idea of what logistic regression analysis *does*. The rigorous math is fine but before diving into it it would be nice to give a more comprehensible introduction and maybe a real world example that might illuminate the topic a bit."
  • "The point above is extremely relevant. Most people do not have a firm understanding of Applied Mathematics or Statistics in general. Quite a surprise that none of the contributing authors has ventured into making their knowledge understandable for the lay person. The ability to teach or communicate concepts to others is a distinction between an expert and an apprentice."
  • "Generally I've found that statistics articles not saying very much (although a few of them do) and consequently incomprehensible"
  • "As a novice, most wikipedia articles on statistics are useless. An encylopedia article should present basic information, and direct users to more detailed information at other entries. Someone has written a very fine statistics textbook, in wiki-form, that is useless to either laymen or novices."
  • "You MUST be joking. I don't think I am a dolt. However I am not a mathematician nor a statistician; I am a professional translator (also a linguist and also a contributor to Wikipedia but in language-related articles and such). I looked up this article today because I NEED to know, in a very basic LAYMAN's sort of way, what logistic regression is, what it is about, and ideally (for my purposes) an intelligible explanation of how it works which provides a model of the language that ought to be used when explaining this to someone."
  • "recognisable (faithful might be a better word) to those familiar working with logistic regression but completely opaque to neophytes. I cannot understand it and I'm really trying."

The key problem with many of the statistics articles in Wikipedia, as indicated by the quotes, is that they are incomprehensible to lay readers. The current level of writing is suitable for a person already familiar with probability and statistics, not for the average reader.

The need is for articles that provide sufficient detail and examples that the reader can readily understand. Editors of the statistics pages have more familiarity with these concepts than most readers. What statistician-editors find tedious, many readers will find informative and necessary to understanding.

If Isambard Kingdom and other readers find the material too detailed and tedious, then one solution would be to move the introductory material to the end of the articles. The material could be in a section with the title "Introduction to x for the novice". In that way, readers who wish for a brief, mathematical, highly technical explanation can get that first, while readers who wish for a more comprehensible lay-oriented explanation can find it at the end.

I would appreciate your thoughts and suggestions. If this is not the right forum for this discussion, please point me to a better one and accept my apologies. Michaelg2015 (talk) 19:58, 14 January 2017 (UTC)

Prodding articles for this reason is inappropriate, so the proposals should all be rejected. However, articles being too technical is a real problem. There are two ways to address it. First, put the introductory material at the start: the lede should be understandable to most readers. Second, for large articles or topics there could be an "Introduction to xxxx" article, linked right from the start of the technical article. I don't think the introductory material should be at the end, as newcomers to the topic will give up reading before they find it. Rather than a statistical specialist writing that material, we need someone like a maths teacher who knows statistics to work on it. Graeme Bartlett (talk) 01:17, 15 January 2017 (UTC)
"Too detailed" is not a reason for deletion, but I don't see evidence of a prod in the recent logistic regression history, either. How best to summarize a highly technical topic for the widest possible audience is a problem for many math articles. WP:TECHNICAL is the basic guideline here. Among its advice is to make early sections as simple as possible and to write "one level down". Looking at the logistic regression article, the first two sections contain (1) what it is used for, (2) a concrete example, and (3) the basic concepts behind LR. I think it does a pretty good job of explaining the basics without straying into textbook territory. The prose could probably be improved. But I don't think there is any magic prose that in a few paragraphs will allow general-audience readers without an understanding of basic notions like probability distribution or fitting data to a model to understand what LR is about. --Mark viking (talk) 04:55, 15 January 2017 (UTC)
This discussion prompted me to revisit the Logistic regression article and make some revisions, although they are inadequate. Michaelg2015's post is useful in pointing out the big problems, and this is the right forum. I accept that the goal should be to make a broad set of statistical articles accessible to the lay person. This is to be done by writing accessible introductory explanations (what the statistical method is, a good concrete example, and a good discussion of history of applications) and relegating complications to later sections or to advanced-level articles (which themselves should meet similar standards, but expecting readers to be informed on basic-level topics). A common complication is that there are multiple ways most statistical methods were derived, or could be derived:
  • E.g., ordinary linear regression can be explained from curve-fitting, visually, then by arbitrary adoption of the minimization of the least squared deviations as a convenient method of calculation, or it can be derived from maximum likelihood estimation assuming normally-distributed errors.
  • E.g., logistic regression can be explained from curve-fitting, and by the relative ease of calculation vs. the probit approach (see J.S. Cramer, The Origins of Logistic Regression, with an interesting tabulation of the number of papers by decade employing probit vs. logit), and by maximum likelihood.
The principles of the above-mentioned WP:TECHNICAL are good, but we need more guidance for statistics in particular. A general solution to the problem could be to develop a good working format of a standard statistics article, and refine that with practice. Here's a draft outline:
1. One- or two-paragraph introduction stating what the statistical method "is". This must define the method simply, and it may mention that extensions/complications exist but not go into them at all, to avoid cognitive overload.
2. A good concrete example. This should be well-chosen. To be encyclopedic, it should be a real example that has historical or practical importance; it should not be an implausible, made-up example of the kind found in bad textbooks.
3. Applications history and scope in various fields. Comments on growth and decline of use relative to alternatives.
4. How the model can be derived, with historical notes, perhaps in chronological order of actual derivation of the model, but perhaps better in order of simplicity of explanation.
    • Derivation 1 (e.g., curve-fitting, use of graph paper, perhaps lognormal graph paper as relevant)
    • Derivation 2 (e.g., maximum likelihood)
5. Calculation of the method: brief discussion of algorithms, mention of some software
6. Interpretation of the estimated model.
7. Alternative measures of fit. There are always ad hoc alternatives, unfortunately, some of which may be useful. (What is an ROC curve, anyhow?)
8. Extensions
    • E.g., when data is pair-matched
    • E.g., multiple categories, unordered or not, rather than just binary outcomes for logistic regression
Note the "Extensions" should usually be short discussions linking to a main article on the advanced topic.
--doncram 11:54, 16 January 2017 (UTC)
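
As a worked illustration of the linear regression point above (least squares versus maximum likelihood with normally distributed errors), the equivalence can be sketched in a few lines. The notation (α, β, σ², n) is assumed here for illustration and does not come from any particular article.

```latex
% Assumed model: simple linear regression with i.i.d. normal errors
y_i = \alpha + \beta x_i + \varepsilon_i, \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n

% Log-likelihood of the observed data
\ell(\alpha, \beta, \sigma^2)
  = -\frac{n}{2} \log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - \alpha - \beta x_i\right)^2

% For any fixed \sigma^2 > 0 the first term does not depend on (\alpha, \beta),
% so maximizing \ell over (\alpha, \beta) is the same as minimizing
S(\alpha, \beta) = \sum_{i=1}^{n} \left(y_i - \alpha - \beta x_i\right)^2
```

Under these assumptions the maximum likelihood estimates of α and β coincide with the least squares estimates, which is why the two derivations lead to the same fitted line.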

Main Wikipedia gamma distribution page - an error

Hi, I'm new to this, I know how to edit Wikipedia pages, but I am reluctant to do that on such a fundamental point, so I thought I would post this here where I can't do any damage.

The CDF of the Gamma Distribution is correct on this Wiki Statistics project page, or whatever this place is called, but is incorrect on the main Wikipedia page you go to for the Gamma Distribution. The summary box on the RHS of the top of that page gives 1 - F(x), not F(x) (I am not going to try to format this...). It is correct in the body of the Gamma Distribution page. Anyone familiar with this distribution will understand what I am talking about and will, I hope, change it. I have been working a lot with survival analysis lately and it is a disaster to get muddled on 1 - F(x) versus F(x).

Vandawk8 (talk) 14:58, 20 January 2017 (UTC)

Thanks for catching that. Looking at the article history, a recent editor changed the summary CDF to be in terms of the upper incomplete gamma function, whereas in the body of the text, the CDF is in terms of the lower incomplete gamma function. I think it is better to stick with the conventions in the body of the text for the summary template, so I will make the change. In the future, if you see a problem, be bold in making a correction! Others are watching the page and will often discuss and revert if they think your edit is a mistake. Cheers, --Mark viking (talk) 18:49, 20 January 2017 (UTC)
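
For anyone wanting to check the F(x) versus 1 - F(x) distinction numerically, here is a minimal sketch using only the Python standard library. The function names `gamma_cdf` and `gamma_survival` are made up for this illustration, and the series truncation (`terms=200`) is an assumption that is adequate only for moderate values of x/scale and shape. The CDF uses the regularized lower incomplete gamma function; the survival function 1 - F(x) corresponds to the regularized upper incomplete gamma function.

```python
import math


def gamma_cdf(x, shape, scale=1.0, terms=200):
    """CDF F(x) of a gamma distribution (shape/scale parameterization).

    Uses the power series for the regularized lower incomplete gamma
    function: P(a, z) = z^a * e^(-z) * sum_{n>=0} z^n / Gamma(a + n + 1).
    """
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / math.gamma(shape + 1.0)  # n = 0 term of the series
    total = term
    for n in range(1, terms):
        # Gamma(a + n + 1) = (a + n) * Gamma(a + n), so each term is the
        # previous one multiplied by z / (a + n).
        term *= z / (shape + n)
        total += term
    return math.exp(shape * math.log(z) - z) * total


def gamma_survival(x, shape, scale=1.0):
    """Survival function 1 - F(x), i.e. the regularized upper incomplete gamma."""
    return 1.0 - gamma_cdf(x, shape, scale)


if __name__ == "__main__":
    for x in (0.5, 2.0, 5.0):
        print(x, gamma_cdf(x, shape=3.0), gamma_survival(x, shape=3.0))
```

A quick sanity check for telling the two apart: F(x) rises from 0 towards 1 as x grows, while 1 - F(x) falls from 1 towards 0. For shape 1 the CDF should reduce to the exponential case, 1 - e^(-x/scale).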

WikiJournal of Science promotion

The WikiJournal of Science is a start-up academic journal which aims to provide a new mechanism for ensuring the accuracy of Wikipedia's scientific content. It is part of a WikiJournal User Group that includes the flagship WikiJournal of Medicine.[1][2] Like Wiki.J.Med, it intends to bridge the academia-Wikipedia gap by encouraging contributions by non-Wikipedians, and by putting content through peer review before integrating it into Wikipedia.

Since it is just starting out, it is looking for contributors in two main areas:

Editors

  • See submissions through external academic peer review
  • Format accepted articles
  • Promote the journal

Authors

  • Original articles on topics that don't yet have a Wikipedia page, or only a stub/start
  • Wikipedia articles that you are willing to see through external peer review (either solo or as a group, process analogous to GA / FA review)
  • Image articles, based around an important medical image or summary diagram

If you're interested, please come and discuss the project on the journal's talk page, or the general discussion page for the WikiJournal User group.

  1. ^ Shafee, T; Das, D; Masukume, G; Häggström, M. "WikiJournal of Medicine, the first Wikipedia-integrated academic journal". WikiJournal of Medicine. 4. doi:10.15347/wjm/2017.001. 
  2. ^ "Wikiversity Journal: A new user group". The Signpost. 2016-06-15. 

T.Shafee (Evo&Evo) talk 10:29, 24 January 2017 (UTC)