Wikipedia talk:WikiProject Statistics


This page is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


Audience considerations

I've just read the articles on Wiener and Gaussian processes. I am not a mathematician, but I am a social scientist with an interest in research methodologies. I was hoping to find a clear description of cases where an assumption of normal distribution is sound. I am working on a paper where qualitative interviews indicated heterogeneity in a key behavior, so we have looked at splitting our groups using k-means cluster analysis, based on continuous behavior observation data. We found that previously used groupings of observations, where agents had been assumed to have homogeneous behavior, had heterogeneous behavior and that individuals clustered together in multiple equilibria. We have had a lot of push-back from the statisticians and stats-trained researchers in the group, because they claimed at first not to understand the method and then said the findings were probably exaggerated.

I came to wikipedia with these concerns: what conceptual framework supports an assumption of homogeneity or heterogeneity? What tests are available to establish one or the other? What types of cause and effect relationships underlie equilibrium processes that exist in reality? Basically, I wanted to turn the argument around and ask them to question their assumptions in the same light they were questioning my work.

I searched the web for "empirical support for homogeneity and normal distributions" and saw the word "process" with wikipedia in the search results, and thought I was on the right track for finding information about the causal/conceptual framework, like an operational model, a process flow diagram or at least a textual description of what characteristics typify these sorts of processes, or something like that. But, I was completely unprepared to understand what I was reading. It was not helpful or useful to me at all.

I don't know in general about all of the articles in the math/stats project at Wikipedia, but these articles were not accessible to me. I think they would be inaccessible to any non-mathematician. The sort of 'text book talk' in proofs and formulas can be helpful. I've really appreciated the project's sensitivity and specificity articles. But in these articles there was nothing but 'text book talk'. I had no frame of reference to understand these articles.

Maybe it is my applied research background that cripples me in the more basic research and math theory arena, but it seems like the audience for wikipedia should be somewhat like that of an encyclopedia, not a text book. And definitely not an advanced undergraduate/graduate school level textbook.

So, all I can say in response to my colleagues, for now, is "your assumption contradicts the beliefs of the real people we are claiming to study" and "I've shown that there isn't a tendency toward an equilibrium between our three core behavioral indices, but toward multiple points of equilibrium". I am guessing they will reply "we know better than the people we are studying, they don't realize their equilibrium-seeking tendencies" and "all you've shown is something so confusing that we don't understand it and that you don't know how to do things the old-fashioned, tried and true way".

I thought the wikipedia articles would help explain how empirical single-equilibrium processes occur, what the standard approach is for supporting an assumption of equilibrium, and if and how homogeneity relates to the discussion. Instead, all I found were pieces written for an audience so specific that I didn't learn a single thing. The figures did say something to me, but I can't explain what, because the article didn't say.

I don't want this to be a place to settle a dogmatic/ideological score, but I do think the audience should be considered in a more meaningful way. I wanted to find information that could help me make sense of complicated math stuff, but it was over my head. I'm sorry to see that.

AfC submission - 04/06

Draft:Multiple factor analysis. FoCuSandLeArN (talk) 22:46, 4 June 2014 (UTC)

Leaflet For Wikiproject Statistics At Wikimania 2014

Hi all,

My name is Adi Khajuria and I am helping out with Wikimania 2014 in London.

One of our initiatives is to create leaflets to increase the discoverability of various wikimedia projects, and showcase the breadth of activity within wikimedia. Any kind of project can have a physical paper leaflet designed - for free - as a tool to help recruit new contributors. These leaflets will be printed at Wikimania 2014, and the designs can be re-used in the future at other events and locations.

This is particularly aimed at highlighting less discoverable but successful projects, e.g.:

• Active Wikiprojects: Wikiproject Medicine, WikiProject Video Games, Wikiproject Film

• Tech projects/Tools, which may be looking for either users or developers.

• Less known major projects: Wikinews, Wikidata, Wikivoyage, etc.

• Wiki Loves Parliaments, Wiki Loves Monuments, Wiki Loves ____

• Wikimedia thematic organisations, Wikiwomen’s Collaborative, The Signpost

For more information or to sign up for one for your project, go to:
Project leaflets
Adikhajuria (talk) 14:22, 13 June 2014 (UTC)

A draft at AFC needs some specialist attention

Please see Draft:Geometric-Poisson Distribution; it needs some help from a subject specialist to get it into acceptable shape. Roger (Dodger67) (talk) 14:45, 28 June 2014 (UTC)

Is this project dead?

None of the current topics on this page has received even a single reply. Does that mean there is no action here? Roger (Dodger67) (talk) 07:50, 29 June 2014 (UTC)

Quiet, but not entirely dead. A number of topics were requests for comments at talk pages or drafts. I usually go directly to the indicated pages rather than comment here. The article you requested comments for has already been accepted, which quenches any comments at this point. I will say that the acceptance of the article was a mistake; geometric Poisson distributions (a type of compound Poisson distribution) have been around since the 1970s and the present article seems a coatrack for a particular researcher's papers. --Mark viking (talk) 16:43, 29 June 2014 (UTC)


Hello, I just noticed an error in the confusion matrix: the denominators of FPR and FNR are switched. NB: just in the two cases at the bottom of the confusion matrix. In the list to its right things are ok. Regards, Ivo. Jul 11 2014.
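
For anyone checking the table against this report, here is a minimal sketch of the standard definitions, in which the false positive rate is taken over all actual negatives and the false negative rate over all actual positives. The function name `error_rates` and the counts are made up for illustration and are not taken from the article.

```python
def error_rates(tp, fp, fn, tn):
    """Return (FPR, FNR) from the four confusion-matrix counts."""
    fpr = fp / (fp + tn)  # false positives over all actual negatives
    fnr = fn / (fn + tp)  # false negatives over all actual positives
    return fpr, fnr


# Hypothetical counts: 50 actual positives (40 TP + 10 FN),
# 50 actual negatives (5 FP + 45 TN).
fpr, fnr = error_rates(tp=40, fp=5, fn=10, tn=45)
print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```

Swapping the two denominators would give FP/(FN+TP) and FN/(FP+TN), which are not rates within a single true class, so that is the symptom to look for in the table.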

Ben Geen, Colin Norris, Lucia de Berk

Are there people out there interested in the topic of unexplained clusters of cases at hospitals leading to miscarriages of justice? I recently got hired by the defence in the case of Ben Geen to take a look at some statistics in his case.

My report will be submitted to the CCRC in an attempt to get them to consider considering the case and who knows maybe even recommending a re-trial. So a very long, long way to go.

Since I'm working for the defence I should not be editing wikipedia pages on the topic. But maybe other people would like to. There is a lot going on, see

About the connections with Poisson variation (the law of small numbers) and data-analytic fishing expeditions (trawling) and cognitive biases in statistics, see

Here is a big connection with statistics. Prof. Jane Hutton wrote an expert report for the defence for the (failed) appeal in 2008. She was not allowed to present her arguments in court to the jury because according to the judge what she had to say was "barely more than common sense, anyway". Shades of Sally Clark, right? Similarly, an anaesthetist who had a lot of scathing things to say about the medical evidence was not allowed to present arguments in court either, for the same reason. Of course, the anaesthetist was only a US associate professor and his evidence contradicted that of a UK full professor, very eminent man, who had previously been very useful in the Harold Shipman case. Now Shipman was a serial killer, no doubt about that. But Ben Geen?

Richard Gill (talk) 11:45, 28 July 2014 (UTC)

Index (statistics)

I've created this stub as a new parent for Category:Index numbers (which perhaps could be renamed). The category's old parent was too limited (Index (economics)). There are indexes such as the Gender Gap Index and others I list at Measures of gender equality that are clearly not limited to the science of economics. I'll leave it to you to expand this article, or redirect it (I am not sure how relevant it is to the Indexed family...). If anyone would like a clarification for "what is this article about", it's about a type of object that could be linked from a sentence such as: A Human Development Index is an index that measures human development (and clearly the index (economics) was too narrow). --Piotr Konieczny aka Prokonsul Piotrus| reply here 06:14, 14 August 2014 (UTC)

Indicator (statistics)

It seems we are missing a key theoretical concept. Let's take a sample sentence: "The indicator is defined as a share of private sector employment of population aged 16". Where should the indicator link point to? I don't think our current disambig page has anything helpful... PS. Found a ref, will stub it - but help from more experienced editors in stats is appreciated. --Piotr Konieczny aka Prokonsul Piotrus| reply here 06:27, 14 August 2014 (UTC)

In statistics, an indicator variable (also called a Dummy variable (statistics)) is an auxiliary binary variable created to indicate membership in a specified set. Having created the variable, one can perform statistical analyses on it. Indicator function is a closely related concept. What you are talking about seems more like an Economic indicator, which is just a statistic of interest. Presumably other social sciences have borrowed the idea from economics. For instance, there is an article on Community indicators and a journal Social Indicators Research. We don't have an article on Social indicators, but probably should. --Mark viking (talk) 18:03, 14 August 2014 (UTC)
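
To make the two senses concrete, here is a minimal sketch; the helper `indicator` and the toy data are hypothetical. It builds a binary indicator variable for set membership and then averages it, which yields a share of the kind quoted in the sample sentence about private sector employment.

```python
def indicator(values, member_set):
    """Return 1 for each value that belongs to member_set, else 0."""
    return [1 if v in member_set else 0 for v in values]


# Hypothetical employment sector recorded for four people.
sectors = ["private", "public", "private", "none"]

# Indicator (dummy) variable: 1 if the person works in the private sector.
is_private = indicator(sectors, {"private"})

# The mean of an indicator variable is the share of the sample in the set,
# i.e. a summary statistic of the "economic indicator" kind.
share_private = sum(is_private) / len(is_private)
print(is_private, share_private)
```

So the indicator variable is the per-observation 0/1 column, while the indicator in the economic or social sense is a statistic computed from the data, here the mean of that column.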

Template: regression bar: Why OLS under models?

In the regression bar template, I feel that ordinary least squares (OLS) shouldn't go under "Models", as it is an estimation technique and not a model. Indeed, it is listed under "Estimators" too. — Preceding unsigned comment added by (talk) 11:41, 22 August 2014 (UTC)
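
To illustrate the distinction being drawn, here is a minimal sketch, assuming a simple linear model y = a + b*x + error. The model is the assumed relationship; OLS is one procedure for estimating a and b from data. The function name `ols_fit` and the data points are hypothetical.

```python
def ols_fit(x, y):
    """Estimate (intercept, slope) of y = a + b*x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

The same linear model could be fitted by a different estimator, for example least absolute deviations, which is the sense in which OLS is a technique applied to a model and not a model itself.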