Wikipedia talk:WikiProject Statistics/Archive 5



Morisita's overlap index

Morisita's overlap index has problems. I've commented at talk:Morisita's overlap index. Michael Hardy (talk) 01:56, 12 December 2010 (UTC)

Much improved now....... Michael Hardy (talk) 04:06, 24 December 2010 (UTC)

Principal stratification

Principal stratification is a severely stubby article that needs a lot of work if it is to be worth keeping. Not only within the article, but also if there are other articles that should link to it, that should get done. Michael Hardy (talk) 04:06, 24 December 2010 (UTC)


Canonical correspondence analysis was not mentioned on the disambiguation page about CCA, so I've added it there, but now a separate page about this method is needed. It is mentioned only very briefly on the page about correspondence analysis.--Sylwia Ufnalska (talk) 08:26, 4 January 2011 (UTC)

Bayesian information criterion

While perusing Special:RecentChangesLinked/Category:WikiProject Statistics articles I noticed this comment on the talk page of Bayesian information criterion. I've reverted the change to the first formula that was made over several edits on 15 November 2010, but for now I've left in some other material that was added by the same editor. I agree that this is an important article that is in need of attention from an expert in this area. That's not me though, so I've tagged it with {{expert-subject|Statistics}}. --Qwfp (talk) 10:19, 4 January 2011 (UTC)

Energy distance

Would anyone like to take a look at Energy distance to see if it should be merged with E-statistic (energy statistics)? Also, the energy distance is mentioned as a statistical distance. Would a statistical distance necessarily have to have the properties of a distance? (E-distance does not satisfy the triangle inequality.) Mathstat (talk) 21:27, 5 January 2011 (UTC)

Just to note here that the article "statistical distance" covers the point that things covered by the general term need not satisfy conditions of being, for example, metrics. Melcombe (talk) 09:58, 6 January 2011 (UTC)
Merge template now added. There is also some overlap with distance correlation, which was itself a merger of several other articles started by the same editor. But let discussion be on the talk pages of those articles. Melcombe (talk) 16:33, 6 January 2011 (UTC)

New Hot Articles subscription service for WikiProjects

Would you guys be interested in beta testing a new Hot Articles service? This service automatically updates a list of the most edited articles for your project from the last several days. An example can be seen at WikiProject Feminism. If you're interested, you can sign up on the subscriptions page. The initial trial is limited to 10 Projects and is for 7 days only, but then will likely be extended indefinitely if the trial is successful. Kaldari (talk) 23:22, 11 January 2011 (UTC)

'Missing values' or 'missing data'?

I've proposed renaming 'Missing values' to 'Missing data'. As the latter is a redirect with no history I could have just been bold, but I thought it might be controversial with some, so I decided to solicit discussion first at Talk:Missing values#Rename. --Qwfp (talk) 10:38, 15 January 2011 (UTC)


Would it be possible for more of this information to be written for the layman?

I.e., it seems that just looking up the two-tailed Mann-Whitney U-test requires a post-grad qualification in statistics to determine whether it is more suitable than a Student's t-test in science articles that need reviewing.

I would like to search for general information, and find general information on topics I can normally get background information on, by 'Wikiiing'. I do not like to search for general information and find highly complicated information which is too verbose or too specific to the jargon of the field.

The text is quite informative, and very detailed, but overly reliant on jargon. Please dumb it down for the basic scientist undergrad.

Jay-ace-n (talk) 11:13, 27 October 2010 (UTC)

A lot of statistics articles are overly technical; see section 17 just above, where I list articles that I've been working on and ask people to list other articles needing "dumbing down", but nobody did this.

Could you point to exactly which articles and sections you think are too technical, and say exactly what is too technical? Your comment above about the Mann-Whitney U-test is one example, and a good one -- which article are you referring to exactly? Thanks, Benwing (talk) 04:10, 28 October 2010 (UTC)

I have been noticing this. I would be willing to contribute some real-world examples that people could more easily relate to, possibly even with values given for the parameters (i.e., showing what happens with IQ test results of N(100, 15)). I think this could help. toll_booth (talk) 18:51, 5 January 2011 (UTC)

Hi "TB", if you are really willing to undertake this it would be great!
I recently asked for some suggestions for tiny datasets that might be useful, and came by some great resources. This might also be useful to you in what you propose:
Cheers, Talgalili (talk) 22:02, 5 January 2011 (UTC)
OK I'll try. :) toll_booth (talk) 04:59, 17 January 2011 (UTC)
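As a rough illustration of the kind of worked example suggested above, here is a minimal sketch using only Python's standard library. It assumes that "N(100, 15)" means a normal distribution with mean 100 and standard deviation 15 (not variance 15); the function names are made up for this illustration.

```python
from statistics import NormalDist

# Assumption: IQ scores follow N(100, 15), read here as mean 100 and
# standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def share_between(low, high):
    """Fraction of the population scoring between low and high."""
    return iq.cdf(high) - iq.cdf(low)


def share_above(score):
    """Fraction of the population scoring above the given value."""
    return 1.0 - iq.cdf(score)


if __name__ == "__main__":
    # Within one standard deviation of the mean (85 to 115).
    print(f"share between 85 and 115: {share_between(85, 115):.3f}")
    # More than two standard deviations above the mean.
    print(f"share above 130: {share_above(130):.4f}")
    # The score that 90% of the population falls below.
    print(f"90th percentile: {iq.inv_cdf(0.90):.1f}")
```

Concrete numbers like these (roughly two thirds of scores fall within 85 to 115) may be easier for a general reader to relate to than the density formula alone.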

student's t distribution/"Special cases" section

At some point there should be an explicit tie between the pdf and cdf in the box with the "Distribution function F" and the "Density function f" in the indicated section, even though it is implied and easily established. (talk) 22:47, 19 January 2011 (UTC)

de Moivre's law - request for feedback

Currently De moivre's law redirects to De Moivre's formula, but there is at least one important use of the term de Moivre's law in actuarial science for which the article de Moivre's formula is not relevant. I wrote a draft of a short article on de Moivre's law here User:Mathstat/De Moivre's law and would appreciate any feedback on the article, suggestions for categories, and about what to do next, since there is already a page by this name. Mathstat (talk) 18:23, 13 February 2011 (UTC)

Looks very nice to me. I'd move it to de Moivre's law and add a hatnote along the lines of — then edit de moivre's law to redirect to de Moivre's law --Qwfp (talk) 19:01, 13 February 2011 (UTC)
Thanks for the advice; moved it and edited the redirect. Mathstat (talk) 23:15, 13 February 2011 (UTC)

Log likelihoods: Logarithm article

The GA article on logarithms will soon be nominated for featured-article status. The main author, User:Jakob.scholbach, requests help with the section on probability and statistics, particularly with the paragraph on log-likelihoods.

There are short comments at the bottom of my talk page: I suggested his using the discussion from our article on likelihood functions, featuring the gamma distribution, or a binomial likelihood.

A member of the WP project on mathematics, Jakob has helped many of our articles also.

Thanks! Best regards,  Kiefer.Wolfowitz  (Discussion) 23:56, 15 February 2011 (UTC)

Wijsman's decomposition

I just needed a beautiful theorem which lots of people know but which is not written down anywhere (AFAIK) in an *accessible* way, including elementary examples. Suppose a probability space is invariant under a compact group of transformations acting on it. Suppose for simplicity that only the trivial subgroup leaves all elements of the space fixed (otherwise we must divide it out). Assume smoothness. Then the space is essentially the product of two independent probability spaces: one space carrying the maximal invariant, the other being the group itself with Haar measure. There is a neat elementary example of this situation in the Monty Hall problem!

The result is also much used in ergodic theory, where it's called the ergodic decomposition.

Question: what to call it, what to link it to?

There are connections to sufficiency, to invariance (in statistics), to experimental design, and so on. Everywhere where symmetry can be used to simplify statistical models or statistical reasoning. Multivariate normal distribution and multivariate analysis.


R. Wijsman (1990), Invariant measures on groups and their use in statistics

P. Diaconis (1988), Group representations and their applications in statistics and probability

I'd like to write an article on this, but I'll need some help. Richard Gill (talk) 11:13, 23 February 2011 (UTC)

Citation templates now support more identifiers

Recent changes were made to citation templates (such as {{citation}}, {{cite journal}}, {{cite web}}...). In addition to what was previously supported (bibcode, doi, jstor, isbn, ...), templates now support arXiv, ASIN, JFM, LCCN, MR, OL, OSTI, RFC, SSRN and Zbl. Before, you needed to place |id={{arxiv|0123.4567}} (or worse, |url=); now you can simply use |arxiv=0123.4567. Likewise, |jstor=0123456789 replaces |id={{JSTOR|0123456789}} and |url=.

The full list of supported identifiers is given here (with dummy values):

{{cite journal |author=John Smith |year=2000 |title=How to Put Things into Other Things |journal=Journal of Foobar |volume=1 |issue=2 |pages=3–4 |arxiv=0123456789 |asin=0123456789 |bibcode=0123456789 |doi=0123456789 |jfm=0123456789 |jstor=0123456789 |lccn=0123456789 |isbn=0123456789 |issn=0123456789 |mr=0123456789 |oclc=0123456789 |ol=0123456789 |osti=0123456789 |rfc=0123456789 |pmc=0123456789 |pmid=0123456789 |ssrn=0123456789 |zbl=0123456789 |id={{para|id|____}} }}

Obviously not all citations need all parameters, but this streamlines the most popular ones and gives both better metadata and a better appearance when printed. Headbomb {talk / contribs / physics / books} 03:23, 8 March 2011 (UTC)

Related article up for deletion

See Wikipedia:Articles for deletion/VisuMap for discussion of deletion of VisuMap which is apparently a package for exploratory data analysis... listed 18 March so you'd need to be quick. Melcombe (talk) 16:46, 22 March 2011 (UTC)

Data generating process

We don't currently have an article or redirect for the term 'data generating process' or 'data-generating process', though a search gives 12 hits. Would it be reasonable to redirect both to 'Generative model' and edit its lead accordingly? I confess I'm not familiar with the latter terminology. Qwfp (talk) 16:50, 21 February 2011 (UTC)

The most un-nerving statistical clichés (even worse than "the gold standard") are "the data generating process" (bad feeling in my stomach) and "letting the data speak" (shudder of horror). The cointegration book and other writings by econometrician Katarina Juselius use both clichés.
There are many caveats about "statistical models"---ranging from "all models are flawed" (Box) to "a statistician who thinks he knows the model is not a statistician" (Kempthorne, unsurprisingly!). I cannot remember any criticisms of the "data generating process" hubris, though I would love to read some. Sincerely,  Kiefer.Wolfowitz  (Discussion) 18:05, 21 February 2011 (UTC)
I think there is some need at wikipedia for explanation of "process" and other nouns such as model and system. Nominally the article generative model is more specific. It suffers from taking "model" for granted.--P64 (talk) 18:46, 21 March 2011 (UTC)

It may be most sensible to set up data generating process as a disambiguation article, since there seem to be (at least) 3 distinct usages found in a Google search: (i) meaning effectively the "data collection process", being routes and procedures by which data reach a database (particularly where these may change over time); (ii) a statistical/probabilistic model representing supposed random variations in observations, often in terms of explanatory and/or latent variables; (iii) a general but non-specific (in the sense of being not directly/explicitly modelled) term for all the random influences that combine together to lead to individual observations, where one instance would be the supposed justification of the "common occurrence" of the normal distribution in terms of a combination of multiple random additive effects. I don't think that Generative model is helpful here as that article requires one to make the distinction "Generative models contrast with discriminative models" (and besides which it is pretty vague about whatever it is actually about). Melcombe (talk) 13:36, 7 April 2011 (UTC)

Melcombe's comments constitute an excellent draft of a disambiguation page.  Kiefer.Wolfowitz  (Discussion) 16:21, 7 April 2011 (UTC)


I have nominated Monty Hall problem for a featured article review here. Please join the discussion on whether this article meets featured article criteria. Articles are typically reviewed for two weeks. If substantial concerns are not addressed during the review period, the article will be moved to the Featured Article Removal Candidates list for a further period, where editors may declare "Keep" or "Delist" the article's featured status. The instructions for the review process are here. Tijfo098 (talk) 22:57, 7 April 2011 (UTC)

ArbCom decision on Monty Hall Problem: Probability theory versus statistical decision theory

The proposed ArbCom decision on the Monty Hall Problem is being discussed.

There is an erroneous statement that the problem is one of probability theory, rather than one of (statistical) decision theory. This error also occurs in the first sentence of the article. (Recall Keith Devlin's imperialistic use of "Bayesian mathematics" or the ongoing efforts to replace "statistics" with "stochastics" by probabilists in Europe.)

Amusingly, Keith Devlin published a wrong (well: incomplete) solution to the Monty Hall problem. He skipped a little step in the argument, and didn't know Bayes' rule, so couldn't recover from his slip-up gracefully. Richard Gill (talk) 07:27, 15 April 2011 (UTC)

Other aspects of this decision---particularly about original research (bad) versus "arithmetic operations" (okay)---concern this project.  Kiefer.Wolfowitz  (Discussion) 12:40, 14 March 2011 (UTC)


The decision has been publicized.  Kiefer.Wolfowitz  (Discussion) 00:43, 25 March 2011 (UTC)

Branching random walk

Branching random walk is a stubby new article. Work on it. Michael Hardy (talk) 21:06, 22 April 2011 (UTC)

Doesn't MOS strongly favor Branching random-walk?  Kiefer.Wolfowitz  (Discussion) 21:18, 22 April 2011 (UTC)

Probability notations

On the suggestion of one of the editors interested in the arbitration on Monty Hall problem, I started a little essay on mathematical notation in probability theory and its applications. First draft is at essay on probability notation; you can talk about it at: probability notation essay-talk. Comments are welcome! Especially if you can tell me that this is all superfluous because it's been done, and done better, before.

Richard Gill (talk) 18:24, 21 March 2011 (UTC)

It's interesting, and if you made it a bit less breezy and wikilinked some of the terminology and maybe expanded it a little, it would make a nice wikibook. (talk) 11:41, 21 April 2011 (UTC)
Thanks for the feedback, I'll get back to it sometime and do some more work. Richard Gill (talk) 08:04, 3 May 2011 (UTC)

Who gave us Bayes' rule?

Does anyone here know who discovered Bayes' rule? (Posterior odds equals prior odds times likelihood ratio or Bayes' factor). Richard Gill (talk) 07:25, 15 April 2011 (UTC)

Pierre-Simon Laplace and Thomas Bayes, according to conventional accounts. Michael Hardy (talk) 17:28, 20 April 2011 (UTC)
Stephen Stigler has an article with this title. He is always entertaining and careful with his sources.  Kiefer.Wolfowitz  (Discussion) 23:21, 20 April 2011 (UTC)

Technically, this is a question for one of Wikipedia's reference desk pages rather than for the discussion page of a WikiProject, since the latter are intended for discussion on how to maintain and improve Wikipedia's articles on a topic. (In this case, it seems harmless.) Michael Hardy (talk) 21:07, 22 April 2011 (UTC)

A good point, but consider the possibility that Richard wants to improve some articles on Bayesian themes ....  Kiefer.Wolfowitz  (Discussion) 22:03, 22 April 2011 (UTC)
I asked Stephen Stigler and *he doesn't know*. Please note I am not talking about the usual formula for computing a posterior probability from prior probabilities etc etc along with the awful denominator (I call that "Bayes' theorem"); I am talking about "posterior odds equals prior odds times likelihood ratio" or if you prefer "posterior is proportional to prior times likelihood". According to Stephen this way of thinking did not arise till the 20th century. There might be some anticipation of it in Cournot. Haven't read it all, yet. Richard Gill (talk) 08:04, 3 May 2011 (UTC)
Stigler's translation from Laplace reveals (all in modern terms) thinking ^posterior proportional to likelihood^ and thinking ^uniform prior is proper specification of probability^. I agree that isn't the same as thinking ^posterior proportional to likelihood times prior^; the uniform prior is part of the general approach to probability, taken for granted.
There is a problem regarding Bayes' so-called Theorem, Law, or Rule(s). None of those terms really is free for the taking because all are used equivocally. I do think it's ok to write such a terminological distinction into a particular article as G has done.--P64 (talk) 16:16, 3 May 2011 (UTC)
Yes. For Laplace and the whole of the 19th C everyone was Bayesian and the prior was uniform, so the posterior was the likelihood (normalized). Richard Gill (talk) 06:54, 4 May 2011 (UTC)
This cannot be correct, because it would imply that many of the results that are attributed to Fisher appeared in Laplace, Edgeworth, etc. After all, WP's biography of Fisher begins with the quote that "Fisher was the genius who almost single-handedly created modern statistical theory". Sign me perplexed,  Kiefer.Wolfowitz 07:34, 4 May 2011 (UTC)
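For readers following the distinction drawn in this thread, the odds form referred to as "Bayes' rule" can be written out as below. This is only a sketch in generic notation: the symbols H_1, H_2 (two competing hypotheses) and D (the data) are chosen for illustration and do not come from the discussion.

```latex
% Bayes' theorem for each hypothesis (the form with the "awful denominator"):
%   P(H_i \mid D) = \frac{P(D \mid H_i)\, P(H_i)}{P(D)}, \qquad i = 1, 2.
% Dividing the i = 1 case by the i = 2 case cancels P(D) and gives the odds form:
\[
  \underbrace{\frac{P(H_1 \mid D)}{P(H_2 \mid D)}}_{\text{posterior odds}}
  \;=\;
  \underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds}}
  \times
  \underbrace{\frac{P(D \mid H_1)}{P(D \mid H_2)}}_{\text{likelihood ratio}}
\]
```

With a uniform prior the prior odds equal 1, so the posterior odds reduce to the likelihood ratio, which matches the remark above that the posterior was then simply the normalized likelihood.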

Rescue: Rexer's Annual Data Miner Survey

Hello. I am new to Wikipedia and would like to dispute the notability tag added by User:Melcombe to the Rexer's Annual Data Miner Survey article. Unlike statistics, data mining is a relatively young and interdisciplinary field and not as much writing has been done for it. I should think it significant that 735 participants from 60 countries participated in the most recent 2010 survey. Moreover, more people become involved with the survey each year. Compare this with other surveys that have articles on Wikipedia. Can you help? Thanks. Luke145 (talk) 20:07, 6 May 2011 (UTC)

This article looks like commercial promotion of Rexer Analytics. It seems to me that the information about these "surveys" ought to be included in another article about data mining, if at all. Richard Gill (talk) 14:57, 8 May 2011 (UTC)
I agree. Luke145, do you think Rexer Analytics or this survey is similar to some of the eight Survey#Organisations? Only two other Survey disambiguees are proper names and those evidently concern "surveys" in a different sense. (They enable the identification of land parcels in western Canada and mainly-western US.) --P64 (talk) 18:58, 8 May 2011 (UTC)

R programming

What are the constraints and weaknesses of "R programming"? —Preceding unsigned comment added by Memami60 (talk | contribs) 05:08, 23 May 2011 (UTC)

Statistical meaningfulness test

Opinions of the article titled Statistical meaningfulness test?

And what other articles should link to it? Michael Hardy (talk) 02:08, 26 May 2011 (UTC)

This doesn't seem to be too good. The title implies far too much generality in the application of this idea, which seems to be limited to trend estimation. It seems the idea is far too recent to have established notability, given the 2011 source. At best it might just about be worth including a minor paragraph about it in trend estimation, not a whole article to itself. But let's move discussion to the article's talk page ... I will copy this there. JA(000)Davidson (talk) 09:20, 26 May 2011 (UTC)

Scatter matrix

Does anyone have a suitable citation for the term "Scatter matrix" as used in Scatter matrix? ... I can't find it among several standard book sources, but Google finds it in quite a few relatively obscure technical papers. I found one book source calling the same thing the "corrected sum of squares and products (SSP) matrix". Melcombe (talk) 09:26, 2 June 2011 (UTC)

Life-time of correlations

Life-time of correlations is quite a vaguely written article as it stands. Michael Hardy (talk) 01:26, 7 June 2011 (UTC)

Medical "risk"

Hello. WikiProject Medicine is interested in your opinions here. Thanks. Axl ¤ [Talk] 17:46, 7 June 2011 (UTC)

Poll presentation

Hello, your insights would be much valued at Wikipedia:Reliable_sources/Noticeboard#CNN_Poll_2011. Thanks, unmi 08:43, 8 June 2011 (UTC)


Hey there! I'm from WP Elements, and we want to start using Bplus-class for our articles, but we have no idea how to introduce it. I noticed you use it, so could you help us to do it? Help is surely appreciated--R8R Gtrs (talk) 09:43, 16 June 2011 (UTC)

Type I Type II

I did some work cleaning up the errors page, but it keeps getting tweaked, and with the null formulations, the correct/incorrect and failure to accept/reject wording, and each example, it's getting jumbled. Can someone check this and make sure everything is as it should be? I blame you all for not just saying "Was false, tested false", "was true, tested false", .... Ocaasi c 19:19, 30 April 2011 (UTC)

As a user, I had a problem finding the article about type I and II errors. I think that search phrases such as "error type", "error type I" should redirect to that article, and that a relevant link should be added in the 'see also' section of the article "statistical error". I have only done the latter. Xzar (talk) 07:40, 23 June 2011 (UTC)
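The four outcomes that the errors page has to keep straight can be laid out as a small lookup, sketched below in Python. This is only an illustration: the function name and labels are invented here, and it assumes the first argument describes whether the null hypothesis is actually true and the second whether the test rejected it.

```python
def classify(null_is_true: bool, null_rejected: bool) -> str:
    """Name the outcome of a hypothesis test.

    Assumption for this sketch: both arguments refer to the null hypothesis.
    """
    if null_is_true and null_rejected:
        # A true null hypothesis was rejected.
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        # A false null hypothesis was not rejected.
        return "Type II error (false negative)"
    return "correct decision"


if __name__ == "__main__":
    for truth in (True, False):
        for rejected in (True, False):
            print(f"null true={truth}, rejected={rejected}: "
                  f"{classify(truth, rejected)}")
```

Stating each case in terms of the null hypothesis alone, as here, may be one way to avoid the jumble of "correct/incorrect" and "accept/reject" wording described above.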

Clean up

I have modified the address used in various project templates to point to a "clean up listing", so that an up-to-date version should now be found. The old one was a year old, while the new address should be automatically maintained for now. For info the list can be found at, which is where the templates now link. Melcombe (talk) 11:05, 28 June 2011 (UTC)

Deletion proposal - Sampling variogram

The AfD proposal for Sampling variogram has been relisted ... please contribute at Wikipedia:Articles for deletion/Sampling variogram. Melcombe (talk) 09:17, 15 July 2011 (UTC)

Two envelopes problem, two children problem

Having fought for two years on the Monty Hall problem, almost getting banned from wikipedia for OR and COI, I am looking for a new brawl, and am getting stuck into the two envelopes problem. Have been accused of gross arrogance and incivility within one day (practice makes perfect! I didn't want to waste time with ritual dances but went straight to the nitty gritty). There is a big problem with that page, that a lot of people have been writing up their own common sense solutions (both sensical and nonsensical) but almost no one actually reads the sources. I just wrote up two mainstream solutions to two main variants of the two envelopes problem, both "out of my head", i.e. without reliable sources. (Very evil of course, very un-wikipedian). After all, I have been talking about these problems with professional friends for close on forty years now, and setting them as exam questions, talking about them with students, without ever actually carefully reading published literature on the problems.

It's un-wikipedian only if some content is disputed or you aspire to class B (or C, may vary across wikipedia), or both. I believe you have both. ...

Maybe some of you folk here can get access to some of those papers in journals where you have to pay a big tax to the publishing company before you can actually read the pdf. That would be useful.

Looks like the two children problem is equally much a mess.

Of course I could be completely wrong that what I think are the solutions to the two main variants of the two envelopes problems are indeed the solutions, or for that matter, are correct at all... Richard Gill (talk) 08:12, 3 May 2011 (UTC)

The latest contributions to the Talk page for two envelopes problem are mostly by me. And two of the sections on the page itself were hurriedly and entirely written purely as uncited "own research" by me. [1], [2].

Well: I think I report the accepted wisdom / folklore in my community. What does the community think? Does the community know good references? Richard Gill (talk) 14:50, 3 May 2011 (UTC)

-The facts are telling: there is so much "accepted wisdom / folklore" and reference to reliable sources is challenging or impossible (and not primarily because some journal content is tax-sheltered, so to speak). Previously I have noted WP troubles handling pedagogy and illustration. Others have said the same regarding exposition (partly in and about Monty Hall arbitration). It is inevitable that distinction between original research and original pedagogy, original illustration, or original exposition is contentious. Yet we may all agree that Wikipedia should include entries such as "Monty Hall problem" rather than "'The Monty Hall Problem' in print".
-It's possible that the way is clear to writing "Bertrand's paradox" or "Newcomb's paradox" well and without much dispute. If so, I guess it's because there is a locus classicus in the writings of J.Bertrand or S.Newcomb, not only penned but also discussed in print before any WP editor was born.
-Consider the long "See also" sections of some articles on puzzles (problems, paradoxes, etc). No doubt this cornucopia has been generated partly by some teachers and columnists tweaking the corpus. Some have presented wholly new scenarios, with or without coining new titles, while others have presented their own revised versions for their own purposes yet under the old names. Now after a few generations of mass higher education, we have many editors who cut their own eye teeth on one puzzle or another. Indeed, we may have thousands who cut their eye teeth as economics, probability, statistics, logic, or philosophy students, along with hundreds for whom it was their eye teeth as teaching assistants or faculty members.
-I think we may agree that it can be good pedagogy to adapt one of these puzzles to a local purpose --specific to the study session, the course of lectures, or the article in a teaching journal (if not for wikipedia beyond Start class). Except in the teaching journal it may be good practice to do so either with or without changing the scenario (goats for asses; envelopes for wallets) and with various attitudes toward previous authors (credit Bertrand by calling the puzzle Bertrand's, nod to the revisions by not naming him).
-Collectively we have learned to think in terms of different versions of these different puzzles and their solutions. For some of us and some of the puzzles, that happened before anyone wrote any version into the literature.
-Today I don't have any optimistic or encouraging suggestion. Evidently I have some ideas for an Essay but I admit that I don't yet see it taking any particular shape. I'm not optimistic about that either.--P64 (talk) 17:57, 3 May 2011 (UTC)
Food for thought. No easy solution. Indeed worth an Essay! Richard Gill (talk) 06:52, 4 May 2011 (UTC)
I have collected my thinkings on the two envelopes problem on my Talk page at [3]. This must all be known. References??? Richard Gill (talk) 14:53, 8 May 2011 (UTC)

I've written a new essay on Two Envelopes Problem on my talk page: [4]. It's called TEP, Anna Karenina and the Aliens movie franchise. Richard Gill (talk) 16:42, 20 July 2011 (UTC)

"Tail" is nowhere defined

The term "tail" is used in many places in the project where distributions are discussed. For those of us with experience with distributions, the term is an everyday term like "lunch", but for many it is not well understood. I'm not sure where would be the best place to introduce the term and welcome suggestions. Could it merit its own article? Jojalozzo 02:56, 25 July 2011 (UTC)

There is the article Heavy-tailed distribution which might give some ideas. Dmcq (talk) 08:20, 25 July 2011 (UTC)
Thinking about it there is room I think for a special article. The standard deviation article seems to assume just the normal distribution and I think something should be done about its table showing things like probability of greater than 5 standard deviations. Dmcq (talk) 08:25, 25 July 2011 (UTC)
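To make the point about the standard-deviation table concrete, here is a minimal sketch comparing how much probability lies in the tails (more than k standard deviations from the mean) under an exactly normal distribution with the most that Chebyshev's inequality allows for any distribution with finite variance. It uses only Python's standard library; the function names are chosen for this illustration.

```python
from statistics import NormalDist


def normal_two_sided_tail(k):
    """P(|X - mu| > k*sigma) when X is exactly normally distributed."""
    # By symmetry, both tails together are twice the lower tail.
    return 2.0 * NormalDist().cdf(-k)


def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k*sigma) for any finite-variance X."""
    return min(1.0, 1.0 / k**2)


if __name__ == "__main__":
    print("k   normal tails   Chebyshev bound")
    for k in (1, 2, 3, 5):
        print(f"{k}   {normal_two_sided_tail(k):.3e}      "
              f"{chebyshev_bound(k):.3e}")
```

At 5 standard deviations the normal tails hold well under one millionth of the probability, while a general distribution may hold as much as 4 percent there, which is why a table computed for the normal case can mislead when the distribution is heavy-tailed.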

Renaming discussion regarding article Copula (statistics)

The proposed renaming being discussed at Talk:Copula (statistics)#Requested move may be of interest to members of this WikiProject. Favonian (talk) 08:21, 2 August 2011 (UTC)

Merge help requested

I was going to ask for comment from statistics folks about whether to merge Heavy-tailed distribution with Fat-tailed distribution at Talk:Heavy-tailed distribution so I could merge if you thought it was a good idea. Then it occurred to me that I wouldn't be able to do the merge anyway since I don't know statistics very well. So, I'll leave this whole issue to WikiProject Statistics, if I may. Thank you, D O N D E groovily Talk to me 01:41, 6 August 2011 (UTC)


Several other contributors have mentioned the inconsistency of notation for transposes. Sometimes we have ', sometimes $^T$. At one point (affine transformations) we have a 'dot product' for the same thing. Could we please have consistency? I personally like $^{\rm T}$, followed by $^T$. Econometricians like ', as T tends to indicate time for them. — Preceding unsigned comment added by (talk) 09:50, 13 July 2011 (UTC)

It's a fact of life that different *sources* use different notation. Sooner or later the reader is going to be confronted with this. Now if there is a small group of wikipedia articles on closely related topics that refer to one another a great deal, and which rely on much the same collection of sources, then uniformity of notation within this group would be sensible and an individual editor can seek consensus among the editors of those articles for a preferred notation. But in general I doubt we would ever agree on a preferred notation nor be able to enforce it; and anyway there will be readers coming to statistics articles from physics or from pure mathematics or wherever, who will have to be told specifically "where ... denotes transpose". Richard Gill (talk) 16:54, 20 July 2011 (UTC)
I am a big fan of notational consistency and I would like to see it across large rather than small groups of related articles and across wide rather than narrow reference. For example, all of the articles on named probability distributions might use plain, bold, and italic fonts consistently; lower and uppercase letters; Roman and Greek alphabets; a or k as an integer parameter, or not. For example of wide reference, articles that use particular Poisson distribution(s) should (try to) use the same notation as does Poisson distribution—same use of 'k' and lambda, for instance. The only question should be how hard to 'try'. --P64 (talk) 22:58, 20 July 2011 (UTC)
Maybe, but a major question arises in respect to sources ... and after all a major tenet of Wikipedia is provision of adequate reference to sources ... it may be best in some cases to match notation to a specific source reference, so that things can be easily checked. Of course this is only really relevant where there are several parameters where things like m, n, M, N are switched round in external sources. And, as for example WP:REFB#Good references suggests that it is not good to use other Wikipedia articles as sources, it may in some cases be best to restate things like the pmf of the Poisson distribution (in a local notation if necessary) rather than relying entirely on cross-refs to other articles. Melcombe (talk) 08:50, 21 July 2011 (UTC)
I certainly do support notational consistency when there are a lot of articles referring to one another a great deal. People who feel this is important should try on a few important cases while the lazy or indifferent watch to see how it is received by other editors. Richard Gill (talk) 12:48, 16 August 2011 (UTC)

Economic statistics

Please see Talk:Economic statistics for discussion of the future of the Economic statistics article. Melcombe (talk) 08:46, 5 September 2011 (UTC)

The Quartile article needs some help

The second sentence in the lead doesn't make sense. There are also some worrying concerns in the Talk page. -- Dandv(talk|contribs) 08:56, 8 September 2011 (UTC)

Featured article candidate

The article Shapley–Folkman_lemma is nominated for Featured Article status, and the usual Featured-Article reviewers are not mathematicians. Therefore, your comments would be especially valuable, particularly for deciding whether the article meets the FA criteria. Your criticism/suggestions/bold improvements would all be welcome.

Probabilists are especially needed! :)

  1. Please help with the 2-paragraph section on probability_and_measure_theory, which is written in summary style. Is it okay to say that the component measures are defined on the same probability space, or is it better to say that they are defined on the same (finitely) measurable space? (I vote for the latter ....)
  2. Probabilists may enjoy the applications to random sets!

The SF lemma was used recently for a statistical article in Econometrica, but it seems to be under-used/ignored in statistics.  Kiefer.Wolfowitz 19:15, 30 September 2011 (UTC)

Thanks for your help, in my hour of need ....  Kiefer.Wolfowitz 19:15, 30 September 2011 (UTC)

UPDATE  Kiefer.Wolfowitz 03:32, 1 October 2011 (UTC)

In advanced measure-theory, the Shapley–Folkman lemma has been used to prove Lyapunov's theorem, which states that the range of a vector measure is convex.[1] Here, the traditional term "range" (alternatively, "image") is the set of values produced by the function. A vector measure is a vector-valued generalization of a measure; for example, if p1 and p2 are probability measures defined on the same measurable space, then the product function (p1p2) is a vector measure, where (p1p2) is defined for every event ω by

(p1p2)(ω)=(p1(ω), p2(ω)).
  1. ^ Tardella (1990, pp. 478–479): Tardella, Fabio (1990). "A new proof of the Lyapunov convexity theorem". SIAM Journal on Control and Optimization. 28 (2): 478–481. doi:10.1137/0328026. MR 1040471. 

Market survey discussion

A group of which I am a member needs help regarding the reliability of market surveys and has posted a request for independent advice at Wikipedia:Reliable_sources/Noticeboard#Using_reports_of_market_research_surveys. Since market surveys are a branch of applied statistics, could I request input from members of the statistics group? Martinvl (talk) 07:30, 4 October 2011 (UTC)


The community is invited to participate in a request for comment about my editing: WP:Requests_for_comment/Kiefer.Wolfowitz.  Kiefer.Wolfowitz 20:54, 8 October 2011 (UTC)

I thank the project for past collaboration. I especially thank Qwfp, Melcombe, Michael Hardy for helping me improve my editing.
Good luck!
 Kiefer.Wolfowitz 18:48, 28 October 2011 (UTC)

2010 US census facts for each populated places in the US

Does anyone know if there is work happening to 'bot' fix the Demographics section of every US populated place article (state, county, city, town, township, village, etc, etc) with 2010 census info instead of the current 2000 census info that is present? Is this being discussed somewhere? Is there a techie to do this? Hmains (talk) 19:00, 28 October 2011 (UTC)

Assessment of non-mathematical articles

The quality assessment link on the project page is only relevant for mathematics related articles. What is the process for getting articles assessed that have no direct mathematical content? Specifically, the following articles:

--ChrisSteinbach (talk) 21:06, 30 October 2011 (UTC)

You can use the statistics project banner, which is of the form {{WPStatistics|class=?|importance=?}}, for articles that have relevant content. However the majority of stats project articles haven't been classified, but it does work in terms of assigning articles into categories such as Category:B-Class Statistics articles. There is also the possibility of using the banners for other relevant projects. At some stage the stats banner may be improved to also allow sub-classification into fields, but for now all 2200 or so articles are treated as a single group. Melcombe (talk) 09:39, 31 October 2011 (UTC)
Thanks for the reply. The WPStatistics banner has already been applied to these articles, but I'm unsure if it is OK to update the class and importance fields without requesting an assessment. --ChrisSteinbach (talk) 20:26, 31 October 2011 (UTC)
Just go ahead. It is up to individual editors to update these assessments, without any formality. If someone disagrees they can change it or start a discussion on the assessment. Some of the existing assessments are pretty extraordinary, but presumably reflect someone's impression from the standpoint of some particular field of endeavour. Melcombe (talk) 10:00, 1 November 2011 (UTC)

Statistical proof

I recently made some serious edit changes to an article titled statistical proof. When I arrived, the article, at that time[5], was a jumbled mess. Some editors feel that the changes have not been for the better and some have suggested that there is no such thing as "statistical proof". I would like to extend an invite out to people to join in the discussion and assist with this page. Thanks. Thompsma (talk) 18:57, 11 November 2011 (UTC)

Discussion about how to restructure the multivariate normal page

There is a discussion on the Multivariate_normal_distribution about how to restructure or split the article. The root of the discussion is that the multivariate normal commonly refers to the case where the covariance matrix is positive definite, but other authors refer to the more general case where the matrix is merely positive semi-definite as the multivariate normal. For the most part the current page reflects the more general situation. Some feel that this generality needlessly complicates the article. One proposal is to split the more general case to a different page. Other proposals are to integrate the two, with two separate definitions in the one article, or to cover only one or the other. Please offer advice on how to proceed. Marc.coram (talk) 05:05, 12 November 2011 (UTC)
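To make the distinction in the post above concrete, here is a minimal Python sketch restricted to the 2×2 case. The helper name `classify_covariance` and the tolerance `tol` are hypothetical choices made for illustration; they do not come from the article. The sketch uses the closed-form eigenvalues of a symmetric 2×2 matrix: if the smallest eigenvalue is strictly positive the matrix is positive definite and the usual density formula applies, while a zero eigenvalue gives the degenerate (merely positive semi-definite) case, where the distribution is concentrated on a line and has no density on the plane.

```python
import math


def classify_covariance(a, b, c, tol=1e-12):
    """Classify the symmetric 2x2 matrix [[a, b], [b, c]].

    Hypothetical helper for illustration only; `tol` is an arbitrary
    numerical tolerance, not a value taken from any source.
    """
    # Closed-form eigenvalues of a symmetric 2x2 matrix:
    # mean +/- radius, with the quantities defined below.
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    smallest = mean - radius
    if smallest > tol:
        return "positive definite"        # a density exists
    if smallest >= -tol:
        return "positive semi-definite"   # singular: no density on the plane
    return "not a covariance matrix"


print(classify_covariance(2.0, 0.5, 1.0))  # ordinary, non-degenerate case
print(classify_covariance(1.0, 1.0, 1.0))  # two perfectly correlated components
print(classify_covariance(1.0, 2.0, 1.0))  # has a negative eigenvalue
```

The second call is the kind of case the proposed split concerns: both components are the same random variable, so the covariance matrix is singular.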

Adding graphs to an infobox

I meant to ask this before the article went to DYK, but I am not sure how to get a graph I created in Excel into an infobox. In particular, I created a graph in Excel showing the bizarre log-Cauchy distribution under certain parameters, but I am not sure how to upload the resulting Excel page into Wikipedia or Commons. Rlendog (talk) 14:46, 2 November 2011 (UTC)

I figured something out, by converting the graph to a PDF, but it still doesn't show up as well as I'd like. Rlendog (talk) 18:57, 2 November 2011 (UTC)
I've redone them in gnuplot as this allows exporting as SVG, which is Wikimedia's preferred image format for vector graphics and displays better when resized. Qwfp (talk) 14:54, 13 November 2011 (UTC)

Overall and general population

I noted in the example on the 'age standardisation' entry that the heart disease rates amongst indigenous Australians are compared to the general population and to the overall population. Are the general population and the overall population the same?

This article seems confusingly written. How can patients who have not completed the design be included in the assessment of efficacy? How can patient dropout break the initial randomization? These are concepts that sound like nonsense unless further explained. Certainly, dropout may be greater in the treatment group due to more severe side effects, but the dropout rate should be the same in the control group, and in any case it is not clear that the side effects should be correlated with the treatment effect. Including untreated patients in the assessment of treatment efficacy is certain to bias against the efficacy measure, so this is an overreaction strategy that seems entirely unnecessary in a proper experimental design with a placebo control group. ----

(false date and sig to allow archiving) Melcombe (talk) 15:17, 15 November 2011 (UTC)

Should the curve and surface fitting web site be discussed anywhere in Wikipedia?

Should the curve and surface fitting web site be discussed anywhere in Wikipedia? It seems relevant to some of the statistics topics.

Links to are banned from Wikipedia. Such discussions are useless no matter how relevant.

(false date and sig to allow archiving) Melcombe (talk) 15:17, 15 November 2011 (UTC)

Category rename

There is a discussion of renaming category Category:Experimental design to Category:Design of experiments: see and contribute in this discussion.

(false date and sig to allow archiving) Melcombe (talk) 15:17, 15 November 2011 (UTC)

Proposed deletion of Testing in data mining applications

The article Testing in data mining applications has been proposed for deletion because of the following concern:

Very muddled article right from the lede. It's unclear which meaning of Wiktionary:Application it is referring to (software or usage) (I suspect IEP student doesn't realise word has >1 meaning). A Data mining applications article (or something with a less ambiguous title) would probably be needed before this article. The cite for "Lift chart" just shows that "Lift Chart" is the name of a tab/graph in a particular piece of software. I could go on ...

While all contributions to Wikipedia are appreciated, content or articles may be deleted for any of several reasons.

You may prevent the proposed deletion by removing the {{proposed deletion/dated}} notice, but please explain why in your edit summary or on the article's talk page.

Please consider improving the article to address the issues raised. Removing {{proposed deletion/dated}} will stop the proposed deletion process, but other deletion processes exist. In particular, the speedy deletion process can result in deletion without discussion, and articles for deletion allows discussion to reach consensus for deletion. DexDor (talk) 09:45, 20 November 2011 (UTC)

file deletion proposal

Cumulative distribution function

I have proposed deletion of this file: Wikipedia:Files_for_deletion/2011_November_22#File:TnormCDF.png The vertical axis says "probability density", and that is false, and cannot be edited. The values of a cumulative probability distribution function are probabilities, not probability densities. The values of a probability density function are probability densities. This is obviously not a probability density function. Michael Hardy (talk) 05:55, 22 November 2011 (UTC)

Michael Hardy, given that the R source code is right there on the page, it was pretty easy to edit. Other methods of changing might have included asking me (the content creator) to change it. I don't really get why this is a big deal. 018 (talk) 21:01, 26 November 2011 (UTC)

Help in correcting the diagram

Diagram showing the cumulative distribution function for the normal distribution with mean (µ) of 0 and variance (σ2) of 1. The prediction interval for any standard score corresponds numerically to (1-(1-Φµ,σ2(standard score))*2). For example, a standard score numerically being x = 1.96 gives Φµ,σ2(1.96)=0.9750 corresponding to a prediction interval of (1-(1-0.9750)*2) = 0.9500 = 95%.
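The arithmetic in the caption can be checked with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name `central_coverage` is a hypothetical label. Note that `NormalDist` takes the standard deviation (`sigma`), not the variance, which is the same distinction at issue in this section.

```python
from statistics import NormalDist


def central_coverage(z):
    """Coverage of the symmetric interval (-z, z) under the standard normal.

    Implements the caption's expression 1 - (1 - Phi(z)) * 2.
    `central_coverage` is a hypothetical name used for illustration.
    """
    # sigma is the standard deviation; for the standard normal it equals
    # the variance numerically (both are 1), which hides the mix-up.
    phi = NormalDist(mu=0.0, sigma=1.0).cdf(z)
    return 1 - (1 - phi) * 2


print(round(central_coverage(1.96), 4))
```

For z = 1.96 this reproduces the caption's figure of approximately 95%.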

As a perceptive user noticed [6], I had mixed up variance and standard deviation in the diagram at Standard_score#Percentile_ranks_and_prediction_intervals. The text is hopefully fixed now, but it seems that a correction in the diagram itself is necessary as well. I know how to easily change the letters in the image in Inkscape, but before I do so, can we conclude exactly how it needs to be changed? Is it enough to remove that little 2 above the σ at the top? Mikael Häggström (talk) 05:36, 30 November 2011 (UTC)

Yes, that should do it. Thanks! Qwfp (talk) 10:56, 30 November 2011 (UTC)
Thanks for the confirmation. The error is hopefully fixed now. Mikael Häggström (talk) 18:56, 30 November 2011 (UTC)


Please see Talk:Deviance (statistics) if you can help in a question of current terminology regarding "deviance" or "scaled deviance". Melcombe (talk) 10:10, 1 December 2011 (UTC)

Percentage Points

I disagree with this statement in the article: "Statements such as "between 1980 and 1990, the smoking rate decreased twice as much as the lung cancer rate" are ambiguous: it is not clear whether percentages or percentage points are being compared."

I don't see how it is ambiguous. By saying "the smoking rate decreased twice as much as the lung cancer rate" the speaker would merely be saying that the decrease of one was twice the value of the other. I see how a statement could be incorrect when a speaker misuses the term "percent" but that term is not mentioned in this sentence. When no specific unit is mentioned it is convention to assume the author is referring to absolute value. Therefore, one must also assume that if one were to nominate a unit of measurement to be applied to the value referred to as decreasing, then one would also have to apply the same unit of measurement to the value with which it is being compared. After all, you don't very often hear a beef farmer say "my herd of cows has decreased by twice as many chickens as last year" do you now?

Shoutatthesky (talk) 09:20, 4 January 2012 (UTC)

This discussion should take place at Talk:Percentage point ... I have put a copy there, with response. Melcombe (talk) 18:12, 6 January 2012 (UTC)
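For anyone following the thread, a small Python sketch may help show what the two readings of "decreased twice as much" compute. All rates below are hypothetical numbers invented purely for illustration, and the function names are likewise assumptions; nothing here is taken from the article or from real data.

```python
def point_change(before, after):
    """Absolute change in percentage points (both rates given in percent)."""
    return after - before


def relative_change(before, after):
    """Relative change, expressed as a percentage of the starting rate."""
    return 100.0 * (after - before) / before


# Hypothetical rates in percent (before, after), invented for illustration.
smoking = (40.0, 30.0)
cancer = (10.0, 5.0)

# Ratio of the two decreases under each reading.
points_ratio = point_change(*smoking) / point_change(*cancer)
relative_ratio = relative_change(*smoking) / relative_change(*cancer)

print(points_ratio)    # comparison in percentage points
print(relative_ratio)  # comparison in relative (percent) terms
```

With these made-up figures the smoking rate falls by twice as many percentage points as the lung cancer rate, yet by only half as much in relative terms, so the two readings need not agree.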

Park test

Park test is a new article by a new user that is a mess. It definitely needs a look over from an expert from Mathematics or Statistics. I will also leave a note on the Mathematics project page. Thanks. Safiel (talk) 16:33, 27 January 2012 (UTC)

two-way analysis of variance

Two-way analysis of variance is currently a mess. Lots of essentially non-editable material. Michael Hardy (talk) 21:20, 6 February 2012 (UTC)

list of statisticians

Please see Talk:List of statisticians for newly started discussion of who should be in the list. Melcombe (talk) 13:30, 13 February 2012 (UTC)

Missing stack in User:Tompw/bookshelf, Wikipedia:Size in volumes etc.

I posted the following at Wikipedia:Help desk#Missing stack in User:Tompw/bookshelf, Wikipedia:Size in volumes etc. yesterday, and they suggested I seek consensus in the relevant project, which I think is this.


The display of "how big is Wiki in terms of printed books", which is included in several places (notably Wikipedia:Size of Wikipedia), appears to have a problem in the way it calculates the size of the display. For example, the current display computes the size as equivalent to 1634 volumes, but then displays that as approximately 7 1/6 full stacks (shelves of books), rather than the correct (approximately) 8 1/6th stacks. It appears to be a relatively simple miscalculation (rounding the result to nearest rather than rounding up, see description below).

The problem is that this is hosted at User:Tompw/bookshelf. I have left a message regarding this at User_talk:Tompw#Missing stack in User:Tompw/bookshelf, Wikipedia:Size in volumes etc. (reproduced below).

Unfortunately Tompw appears to be inactive. They've made one edit since last July (in October), and they've not responded to my message in several days.

So I have two questions.

First, as a matter of etiquette, should I go ahead and fix the code under User:Tompw/bookshelf?

Second, given the general use of these pages/images, should this be moved out of user space, and perhaps set up as a template?

(the following was posted to User_talk:Tompw on Feb 16):

-- Missing stack in User:Tompw/bookshelf, Wikipedia:Size in volumes etc. --
I believe the number of stacks in the various "how big is Wiki in printed books" graphics is missing a stack.
It's currently 1634 volumes, which should be eight full stacks, plus a partial ninth stack.  It's displaying
seven full stacks plus a partial eighth.  I believe the problem is with the calculation in
User:Tompw/bookshelf/stacks.  It's currently:
 {{ #expr:  {{User:Tompw/bookshelf/volumes}}/200 round 0}}
It should probably be something like
{{ #expr:  ceil({{User:Tompw/bookshelf/volumes}}/200)}}
(I think I did the conversion of braces correctly, but if the above has ended up with a missing or extra brace,
I apologize in advance.)
The round function is not what you'd really want.  Round would convert 300-499 books (1.500 to 2.495 stacks) to
2, and 500-699 books (2.500 to 3.495 stacks) to 3.  Ceil will get the next highest integer.  Thus 1.005 through
2.000 (201-400 books) would get 2, but 2.005 through 3.000 (401-600 books) would get 3.
Likewise, the calculation in User:Tompw/bookshelf/volumes, should probably be changed from:
 {{#expr:({{NUMBEROFARTICLES:R}}/(500*2*2*80*50/(6*562)) round 0) + 1}}
Although that's only going to be off a book at worst.
I haven't quite traced through how the partial stack gets drawn, so I'm not sure if there's an impact there or not.
Rwessel (talk) 21:14, 16 February 2012 (UTC)

Rwessel (talk) 21:33, 21 February 2012 (UTC)
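The difference between the two calculations in the quoted message can be sketched in Python. This is only an illustration of the arithmetic, not a replacement for the parser-function code: it assumes that `round 0` rounds halves upward for positive values (which matches the 300-499 example in the message), and the names `stacks_rounded` and `stacks_ceil` are hypothetical.

```python
VOLUMES_PER_STACK = 200  # the divisor used in the #expr quoted above


def stacks_rounded(volumes):
    """Round to the nearest whole stack, halves upward.

    Assumed to mirror the current 'round 0' expression for positive input.
    """
    return (2 * volumes + VOLUMES_PER_STACK) // (2 * VOLUMES_PER_STACK)


def stacks_ceil(volumes):
    """Round up, so a partly filled stack still counts as a stack."""
    return -(-volumes // VOLUMES_PER_STACK)


for volumes in (300, 499, 500, 1634):
    print(volumes, stacks_rounded(volumes), stacks_ceil(volumes))
```

For 1634 volumes the rounded version gives 8 and the ceiling version gives 9, which is the one-stack discrepancy described above.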

Fat heavy long (right) tail

New section 15:50 quickly extended at least twice


Fat-tailed distribution
Heavy-tailed distribution
Long Tail
Power law —shares a lead diagram with Long Tail; captions differ
Right-tailed distribution —a redirect to "Skewness"
Long-tailed distribution —a redirect to "Heavy-tailed distribution"
long tail —disambiguation

The first entry in disambiguation long tail is "The Long Tail, a consumer demographic in business". That adequately conveys the main theme of Long Tail. Meanwhile, the first and last paragraphs of that article Long Tail, along with its first section "Statistical meaning", and its claim by WP:Statistics only, seem out of place because they emphasize mathematics rather than business.

Should there be any statistics article with a "Long" name? Alternatively, Long Tail might be rearranged to relegate the statistical theme. Judicious use of this qualification might help: "As a proper noun, the Long Tail ...". A WP:hatnote might be helpful because some users may capitalize for a wrong reason —and we may already have inappropriate links to Long Tail.

For what it's worth (and I cannot interpret), all five articles are claimed by WP:Statistics, only "Fat" and "Right"/Skewness by WP:Mathematics. --P64 (talk) 16:00, 23 February 2012 (UTC)

There has previously been some consolidation of articles on this sort of topic. There is a group of them under Category:Tails of probability distributions. Your use of "redirects" above is slightly misleading, as Heavy-tailed distribution actually contains formal maths-type definitions of "heavy-tail", "long-tail" and "subexponential" distributions, but without relating or comparing them in any useful way.... I suppose definitions of some of the other related types might be added, but this would highlight the need for some sort of ordering of the different lengths of tail and for expanding and improving the existing classification for common distributions according to under which definitions of length they fall. However, even in mathematics there are different meanings being attributed to these terms (as partly indicated in Heavy-tailed distribution), and the position is even more confused in the real world. Thus the other articles have grown around usages in a variety of applications and have citations to those uses. You haven't mentioned Long-tail traffic which is definitely related and needs cleaning up, but probably needs to be left as a separate article because of its specific relevance to signal processing. Basically, we can't hope to get things clean and tidy with very tight definitions etc. because we can't over-ride established usages in a wide range of applications. But we can try to get the maths and stats presented in a way that is correct and meaningful. I will add Heavy-tailed distribution to the maths project because of its strong maths content, and I'll add Long-tail traffic to the stats project. (My comment about "redirects" above was based on experience where a redirect of a term usually leads somewhere that doesn't even mention the starting term, so no offence intended.) Melcombe (talk) 08:50, 24 February 2012 (UTC)

A tool

Some time ago there was a request for a way of seeing recent changes to project article pages. The tool at (Wikiproject Watchlist) presently allows this (specifically, only the most recent edit). Type WPStatistics into the box next to the one containing "Template" (i.e. where it by default says "WikiProject Banner"). This tool is due to "Tim1357", not me. The tool also allows access to lists of "hot articles" (the list on the project page seems broken at present). Melcombe (talk) 23:43, 24 February 2012 (UTC)

WMF's "rate this page"

After answering a survey on an article, readers are invited to edit as part of WMF's strategy to recruit new editors.

I opened a discussion of the WMF's "rate this page" initiative to recruit editors from readers.

Such recruitment "surveys" are prohibited by the ethical code of public-opinion researchers.

Sincerely,  Kiefer.Wolfowitz 22:56, 27 February 2012 (UTC)

Hierarchy of Pareto distributions

Presently, there are at least four articles dealing with the hierarchy of Pareto distributions:

I think the relationships between these articles can be improved and would suggest the following:

  • Rename Pareto distribution to Pareto (type I) distribution
  • Treat the hierarchy of Pareto distributions in a section "Generalizations" in "Pareto (type I) distribution"
  • Redirect "Pareto distribution" to "Pareto (type I) distribution", with a hatnote mentioning other Pareto distributions
  • Lomax distribution should be renamed to Pareto (type II) distribution, mentioning the "Lomax distribution" as a special case
  • Clarify the relationship between Pareto (III) and the log-logistic distribution
  • Clarify the relationships between Pareto IV, Feller-Pareto and the "generalized Pareto distribution"

Comments from members of this project would be appreciated. Isheden (talk) 11:53, 13 March 2012 (UTC)
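As an aid to the discussion, the relationships in the proposal can be sketched in Python through survival functions. The parameterization below (location `mu`, scale `sigma`, inequality `gamma`, shape `alpha`) follows the hierarchy as it is commonly attributed to Arnold, written from memory rather than quoted, so it should be checked against the reference before being relied on. The function names are hypothetical.

```python
import math


def pareto4_sf(x, mu, sigma, gamma, alpha):
    """Survival function P(X > x) of a Pareto (IV) variable.

    Assumed parameterization: [1 + ((x - mu) / sigma) ** (1 / gamma)] ** (-alpha)
    for x >= mu, and 1 below the lower bound mu.
    """
    if x < mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto1_sf(x, sigma, alpha):
    """Type I as the special case mu = sigma, gamma = 1."""
    return pareto4_sf(x, mu=sigma, sigma=sigma, gamma=1.0, alpha=alpha)


def pareto2_sf(x, mu, sigma, alpha):
    """Type II as the special case gamma = 1."""
    return pareto4_sf(x, mu, sigma, 1.0, alpha)


def lomax_sf(x, sigma, alpha):
    """Lomax as Type II with the location fixed at mu = 0."""
    return pareto2_sf(x, 0.0, sigma, alpha)


def pareto3_sf(x, mu, sigma, gamma):
    """Type III as the special case alpha = 1."""
    return pareto4_sf(x, mu, sigma, gamma, 1.0)


# Type I reduces to the familiar (x / sigma) ** (-alpha).
print(math.isclose(pareto1_sf(3.0, 2.0, 1.5), (3.0 / 2.0) ** -1.5))
# Type III with mu = 0 and gamma = 1/b matches a log-logistic with shape b.
print(math.isclose(pareto3_sf(3.0, 0.0, 2.0, 0.5), 1.0 / (1.0 + (3.0 / 2.0) ** 2)))
```

Under these assumptions each named distribution is a one-line restriction of Type IV, which is one way to picture the proposed "Generalizations" section; it says nothing about the other, differently defined "generalized Pareto" families.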

Sounds plausible, but....
Is this based on a reliable source? Which?
WP has more probabilists than statisticians. I would suggest asking also at the mathematics project.  Kiefer.Wolfowitz 12:16, 13 March 2012 (UTC)
Arnold is often mentioned as a reliable source for Pareto distributions. For example, the first few pages in chapter 7 of this book [7] discuss how the various distributions are related. Isheden (talk) 13:09, 13 March 2012 (UTC)
There are enough sources (see the article) for these relationships; the question is how best to summarize them in an encyclopedia article. There are many problems with trying to unify all Pareto distributions under "generalized Pareto" because there is more than one definition of generalized Pareto (see Johnson, Kotz, and Balakrishnan chapter on Pareto).
In addition, Pareto named three types of distributions - so in this sense Types I, II, III are not "generalized Pareto" but exactly what they are named. A very good resource on the history and developments is Kleiber and Kotz (see references in the article). Due to the existence of more than one generalized Pareto, and the historical place of Types I, II, III, it may be best to leave generalized Pareto as a separate article. Lomax is a short article and it may be more directly informative for readers to land on the page with the most relevant information rather than be redirected and have to read carefully to find it. The article was recently improved with links to Pareto Type II, so the relation should be clear enough there.
One improvement would be to change notation of the parameter for the lower bound of the support set of Pareto I, because presumably that suggests "x minimum value". That is not the standard notation (usually is used) and makes more sense when trying to use similar notation for all of the various members of the hierarchy. Mathstat (talk) 15:28, 13 March 2012 (UTC)
I like the idea of keeping Lomax as it is; it seems to fulfill a nice need for a quick pdf reference that could easily get lost in a maze of Pareto relationships. Also it has a different parameterization and typical usage than Pareto II. Ditto for the Log-Logistic. How about a hatnote on Pareto saying something like "This article is about the Pareto I, for other Paretos see Pareto I-IV" or something. Then create a separate article discussing the Types I, II, III (maybe IV and Feller-Pareto, etc) as well. This can discuss the historical context, the relationship to Lomax, Log-Logistic, and Generalized Pareto. It seems that it is more than sufficient to warrant its own article instead of a subsection to Pareto I. (talk) 15:29, 16 March 2012 (UTC)
The issues with attempting to unify all Pareto distributions under a generalized Pareto include:
  • There is more than one definition in the literature and in use for generalized Pareto.
For example, the Pareto Type IV distribution is often referred to as the generalized Pareto distribution.
In Continuous Univariate Distributions Volume 1, Chapter 20, Section 12 (Johnson, Kotz, and Balakrishnan) two distinct definitions of generalized Pareto distributions are given. The first was described by Ljubo (1965) and the second described by Pickands (1975) and more recently by Hosking et al.
  • Some of the special cases are special cases of other families such as power-law distribution or Burr distribution.
For example, the log-logistic special case (related to Pareto Type III) is shown in equation (23.88) of Continuous Univariate Distributions Volume 2, Johnson, Kotz, and Balakrishnan 1995, Chapter 23, p. 151 to be a special case of Burr Type XII distributions, and has also been called Weibull-exponential distributions.
  • The standard reference texts on the subject do not treat the various types as special cases of either of the latter two GPD.
  • Some variants do not appear as special cases of GPD.
For example, the Zipf distribution (discrete Pareto) is not a special case of GPD. The bounded Pareto and symmetric Pareto may not be special cases of a GPD. (To add to the confusion, there are at least two Pareto Type III definitions and one of them is not going to fit as a special case.)
It would be good to expand GPD with appropriate identification of special cases, supporting references, and a table as suggested, as well as links to all related distributions. Alternate definitions of GPD should also be cited to avoid confusion.

Mathstat (talk) 18:01, 17 March 2012 (UTC)

Statistician and Statistics

Within hours of the pages Statistician and Statistics becoming unprotected several edits have been made by a single user similar to those that apparently motivated the page protection. Could members please review recent edits to Statistics which include adding/deleting paragraphs and references? Mathstat (talk) 16:01, 16 March 2012 (UTC)

Hi MathStat!
Thanks for the heads up.
The relevant articles have been protected for 3 months, following the resumption of automaton-like editing after the expiration of the 2-week protections.
The new account, Mark Vaughn, has been indefinitely blocked as a sock. The IPs cannot be blocked, since they move around Lahore, without blocking out many of the editors from Lahore.
Best regards,  Kiefer.Wolfowitz 11:59, 18 March 2012 (UTC)

Moment closure

Moment closure is a new article.

It's a complete orphan, i.e. no other articles currently link to it. Its orphaned status is something to work on. Michael Hardy (talk) 23:04, 18 March 2012 (UTC)

Restricted randomization

Deletion of Restricted randomization is proposed. Please comment on the page devoted to discussion of the proposed deletion. Michael Hardy (talk) 04:33, 25 March 2012 (UTC)

I've flagged it for rescue. Michael Hardy (talk) 18:20, 26 March 2012 (UTC)


Wikipedia:HighBeam describes a limited opportunity for Wikipedia editors to have access to HighBeam Research.
Wavelength (talk) 16:04, 5 April 2012 (UTC)

two-way ANOVA

Two-way analysis of variance is an article that needs a lot of work. Probably more than one person can do in one day. Michael Hardy (talk) 16:48, 3 May 2012 (UTC)

Relative Risk vs Risk Ratio

There is a tendency to use the terms Relative Risk and Risk Ratio interchangeably - what this article actually discusses is Risk Ratio. Relative Risk can mean either Risk Ratio or Odds Ratio dependent upon circumstances and it would be good for this to be clarified.


Regards, Alan — Preceding unsigned comment added by Alanmcleod (talkcontribs) 00:08, 13 May 2012 (UTC)

Standard score

I'd like to continue a discussion at Talk:Confidence interval#Correct to say that z in example is standard score? here, because it seems to involve several articles (standard score, confidence interval and prediction interval). In short, it started when I was uncertain whether the "number z" in Confidence_interval#Practical_example is a z-score (which, as I understand from the discussion, it is not). Consequently, it would be correct to have the section Standard_score#Confidence_intervals removed, as has been done.
However, I'm still rather sure that the z used in the formulas in the section Prediction interval#Known_mean,_known_variance (in the subsection "Standard score") is a z-score, and not a quantile, because it makes no sense to me to describe it as such. I'd appreciate some further explanation as to whether that z is a z-score or a quantile (or neither). Mikael Häggström (talk) 10:37, 8 May 2012 (UTC)

Is the confusion/worry that in one instance the z-score is defined as a quantity calculated from data, and in another it appears to be something looked-up in a table? To describe the value looked up in the table as a quantile is correct, but it is also the value that the z-score would have to be in order to be at the given percentage point of the distribution. In similar contexts, (testing) the value in the table would be more fully described as the critical value for the z-score or (confidence intervals) as the percentage point (or percentile or quantile) of the distribution of the z-score. Thus the value in the table might be described as a (theoretical) z-score if brevity were required, as in the heading of a table. Perhaps this could be done better in the article, but it might correspond to what is in the very limited references to that article. Melcombe (talk) 14:58, 8 May 2012 (UTC)
I interpret from your reply that standard score can be used to establish both prediction intervals and confidence intervals, and that is what I recall from previous statistics classes as well. Therefore, I think a reciprocal description is justified in the standard score article. Such a description was actually present, but recently removed ([8]). However, I think it should be reinserted, and perhaps a bit revised in order to present the subject in a way that most people agree with. Mikael Häggström (talk) 17:49, 11 May 2012 (UTC)
I think just using "standard score" would only be spreading the confusion you have found above, when there is no particular reason for the brevity that causes the confusion. In the particular case of prediction intervals that you quote, it would probably be better/clearer to use "quantile of the standard score", rather than just "standard score", or even just something like "percentage point of the standard normal distribution", which may actually be more useful to a general reader, or both. Melcombe (talk) 21:53, 11 May 2012 (UTC)
In any case, I think some form of "standard score" should be mentioned in articles, even if it's in the form of "z is not a standard score" because it seems to be often used as such in external sources. Mikael Häggström (talk) 19:06, 13 May 2012 (UTC)
On second thought, I'm not certain that it must be mentioned. The most important thing, IMPOV, is to have some kind of designation to the factor z, as used in Prediction_interval#Known_mean.2C_known_variance, whether it includes "standard score" or not. For now, I'm about to change it to "quantile of the standard score", but I'm open for alternatives. Mikael Häggström (talk) 07:38, 14 May 2012 (UTC)
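The distinction drawn in this thread, between a z-score calculated from data and a quantile looked up from the standard normal distribution, can be sketched in Python. This assumes Python 3.8 or later for `statistics.NormalDist`; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standard score of an observed value: computed from data."""
    return (x - mean) / sd


def normal_quantile(p):
    """Quantile (percentage point) of the standard normal: looked up."""
    return NormalDist().inv_cdf(p)


def prediction_interval(mean, sd, coverage):
    """Known mean and standard deviation: mean -/+ z * sd,
    where z is the quantile for the requested two-sided coverage."""
    z = normal_quantile(0.5 + coverage / 2.0)
    return mean - z * sd, mean + z * sd


# Hypothetical numbers, for illustration only.
print(z_score(130.0, 100.0, 15.0))            # a standard score of one observation
print(normal_quantile(0.975))                 # the tabulated value near 1.96
print(prediction_interval(100.0, 15.0, 0.95))
```

In `prediction_interval` the factor `z` never depends on an observation, which is the sense in which "quantile of the standard score" describes it more precisely than "standard score" alone.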

Power spectrum estimation

Our coverage of power spectrum estimation (spectral density estimation) seems very weak, with very little bringing together or contrasting the different methods, or giving historical perspective. Indeed, most articles looking for a signal processing treatment of the subject appear to have been redirected to spectrum analyzer, an article about a hardware box with very little discussion of the algorithms usually used in pure-software methods.

In particular, there is very little discussion of all-poles versus all-zeros methods. There appears to be nothing at all about John Parker Burg or the Burg algorithm. We have quite a detailed article on the Levinson recursion, but nothing to say that this is perhaps its most important application. Linear predictive coding appears to exist in a silo of its own, without even a link to ARMA modelling; while in turn the article on ARMA models doesn't appear to mention power spectrum estimation at all. Autoregressive model is a bit better, but doesn't give any sense how a pure AR fit is likely to compare to other fits.

It probably doesn't help that spectral analysis goes to a dab page, and that the top link there is spectrum analysis, which probably ought to be merged into spectroscopy.

This is very poor. Given the importance of this topic in signal processing and applications, we ought to be able to match at minimum the level of discussion in Numerical Recipes. But at the moment we're way short. Anybody out there willing to step up to the plate? Jheald (talk) 09:23, 26 May 2012 (UTC)

Cross-posted to WT:WPMATH, WT:PHYSICS Jheald (talk) 09:30, 26 May 2012 (UTC)
You forgot to mention that spectral density has a very short section on estimation that doesn't particularly overlap with spectral density estimation in terms of specific methods mentioned. There is also Maximum entropy spectral estimation, which is incomplete, but Burg's algorithm is often described by that terminology. Further, Multiple signal classification, again incomplete in terms of references, mentions several methods including Burg's and others I don't recognise from the names (presumably the originators'). Melcombe (talk) 11:57, 26 May 2012 (UTC)

Discussion moved to Talk:Spectral density estimation#Coverage. --TSchwenn (talk) 17:13, 26 May 2012 (UTC)
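As a possible starting point for the all-poles material discussed above, here is a minimal Python sketch of how the Levinson recursion is applied to power spectrum estimation. It is an assumption-laden illustration rather than anyone's reference implementation: it fits an autoregressive model by the Yule-Walker (autocorrelation) method, which is not Burg's algorithm, and the function names are hypothetical. The sign convention assumed is x[n] + a[1] x[n-1] + ... + a[p] x[n-p] = e[n], with frequencies in cycles per sample.

```python
import cmath


def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a sequence x."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1 and a[1..order] are the AR coefficients
    in the convention stated above; err is the estimated variance of
    the driving noise e[n].
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        padded = a + [0.0]
        # Order update: combine the current filter with its reversal.
        a = [padded[i] + reflection * padded[m - i] for i in range(m + 1)]
        err *= 1.0 - reflection ** 2
    return a, err


def ar_power_spectrum(a, err, freq):
    """All-poles spectrum estimate err / |A(f)|^2 at a single frequency."""
    denom = sum(a_k * cmath.exp(-2j * cmath.pi * freq * k)
                for k, a_k in enumerate(a))
    return err / abs(denom) ** 2


# Theoretical autocorrelation of an AR(1) process with coefficient 0.5
# and unit noise variance: r[k] = 0.5**k / (1 - 0.25).
r = [4 / 3, 2 / 3, 1 / 3]
a, err = levinson_durbin(r, 2)
print(a, err)                          # close to [1, -0.5, 0] and 1
print(ar_power_spectrum(a, err, 0.0))  # close to 4
```

In practice the autocorrelation would come from data via autocorrelation(x, order), and the choice of model order is the hard part; none of that is addressed here.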

Lester Dubins

I was surprised that we had no article on Lester Dubins. I've just created one. It needs further work, both within the article itself and in other articles that ought to link to it. Michael Hardy (talk) 17:41, 3 June 2012 (UTC)

A renaming

Please see Talk:Newman–Keuls method for discussion of renaming the corresponding article to have Student-Newman–Keuls ... Melcombe (talk) 09:27, 12 June 2012 (UTC)

Median: Administrative attention

An IP editor User: continued to engage in incivility and personal attacks, e.g. "you are larely ill-informed. stop making stupid changes to well-established articles without gaining consensus". The IP's previous median edit had the summary "opinionated dh who is keen to denigrate established sources without providing any new sources", whatever that meant. The IP restored the use of multiset rather than list, etc.

Previous incivility includes "only the most arrogant editors dump huge blocks of text and expect others to point out where sources are missing: supply at least some per standard".

I and editor Benwing have tried to have civil discussions with the IP. The IP's response to Benwing began with "Don't try your bombast with me" (or something similar).

(BTW, I introduced the citations to Kemperman's famous article and to the Vardi--Zhang method for the spatial median in recent edits, following my having introduced Oja's recent friendly monograph.) There is no point in my trying to improve the article by referring to e.g. Chris Small's survey until this IP editor is under control.

That edit restored citations to Wolfram's MathWorld for propositions that are not contained in the MathWorld "reference". (Also, the MathWorld site is not a reliable source for this topic.)  Kiefer.Wolfowitz 07:34, 7 May 2012 (UTC)

I've issued a level 1 civility warning, to the user.--Salix (talk): 08:20, 7 May 2012 (UTC)

I have checked the MathWorld citation and it does cover the two points made in the article: (1) use of the average of central values in the even-sample-size case; (2) explicit statement of the two notations cited ... so at least this seems retainable (after shifting one of the cite templates slightly closer). So you were wrong on your first point, were you any more careful on other points/edits? Looking through your edit comments in this sequence, you were not particularly civil yourself. I note you called MathWorld "unreliable", whereas it provides a reasonable summary of what a median is, and you didn't replace it with another reference. I haven't bothered to check many of the other changes, but it seems to me that the existing lead section is preferable to your version (which introduces undefined concepts and arguments that would take too much space to cover even minimally in a lead section). After all, the lead section was carefully honed by many editors over a substantial period of time, so perhaps this IP editor is correct that you should have discussed it on the Talk page. Of course I have experience where you like to demand this of others, as in this case, but don't think of doing this yourself. It seems common practice that if a change is reverted you explain/seek-consensus on the Talk page before putting it back in. Melcombe (talk) 11:51, 7 May 2012 (UTC)
Hi Melcombe!
I thought you would have an interest in this IP's edits, given the similarity of views (along with the somewhat gruffer tone). :)
I don't remember commenting on the triviality about an even number of data, so your first point is moot. I have no objection to a discussion of this triviality, if high quality leading sources discuss it.
You seem to misrepresent MathWorld, which does not have the disquisition on multiple notation for the median, although it may use two notations---again, a pointless triviality. Have you or the IP inserted a discussion on the different notation for the arithmetic mean in its article? The disquisition in the WP article was OR (perhaps by synthesis) or misrepresentation of the (unreliable) source.
Unreliable?? MathWorld discusses the mean for an arbitrary distribution, so it is not fit for a first course in statistics, as I noted before. It also cites old dull references rather than HQRS. I agree that it is not worthless.
Actually, the record shows that I have inserted standard HQRS. You can see that my emphases are consistent with recent survey articles on nonparametrics in JASA 2000 and Statistical Science, whereas Melcombe and the gruffer IP are consistent with ... each other! :)
Reverting 7 of my edits is hardly sporting. Sure, revert some, and keep the good ones. Let us discuss the edits on the talk page, per usual, while avoiding the use of "bombast", etc.
Cheers,  Kiefer.Wolfowitz 16:33, 7 May 2012 (UTC)
You seem to misunderstand the use of citations in Wikipedia. Each backs up (verifies) the information/opinion in the sentence or paragraph to which it is attached. The appearance of a citation in an article doesn't mean it can be expected to back-up any other information in the article and I don't know why you would expect it to do so. The relevance of "a first course in statistics" is beyond me, as that is certainly not what Wikipedia is meant to be. The point is that the MathWorld page provides a reliable source that an ill-informed reader can check easily and without being faced by heaps of irrelevant information, specifically for the two points being made. Perhaps the IP editor is a fan of MathWorld and you insulted his/her favourite reading, so it would be understandable if you were insulted back ... perhaps you should tone down your edit comments. Wikipedia has standards for what can be counted as reliable sources and MathWorld meets these... and you only need to do a simple search in Wikipedia to see just how many articles make use of that source. And yes, even "trivial" points need citations, particularly when they are basic, as Wikipedia has a reputation for containing misinformation slipped in when no one is watching, so any user would be well-advised to check as much as possible. You only have to look at the edit history to see how much the article has attracted edits that needed to be reverted.
You will have to discuss with the IP editor reasons for reverting changes. I put back in (some of ?) the citations you mentioned as having been omitted, but I'll steer clear of the rest as it is too boring to go through lots of before and after comparisons.
Melcombe (talk) 20:50, 7 May 2012 (UTC)
Melcombe, you justify insults of me--- of which the IP provided many, e.g. ill-informed, ignorant, dick-head ;) --- because I criticized a page on MathWorld!
Do you want to apologize for the incivility towards other editors, while you are at it?
You admit that MathWorld claims that every distribution has a mean? It is not a reliable source about the statistical concept of medians. I am familiar with WP:RS, which states that a source may be reliable for some topics and unreliable for others. You seem to be claiming that MathWorld is reliable for everything!
MathWorld is better on computer-algebra related topics and surrounding areas of mathematics. It has a fine precis of the Beauzamy-Degot identity.
Other editors can easily compare our respective edits for accuracy and veracity, and for the quality of sources (in particular for having a global perspective rather than representing Kendall's dictionaries, which were written in the 1950s, with slim updating since).  Kiefer.Wolfowitz 21:17, 7 May 2012 (UTC)
Why do you extend this discussion? I justify nothing, as it has nothing to do with me. I did point out that in that sequence of edits you were the first to be uncivil .... and then you call down opprobrium on someone else's head (a relatively new editor) for the same (perhaps slightly worse) fault. You seem to think that somewhere on MathWorld it makes some claim about the existence of the mean. If this is meant to be the page linked in the median article then there is no such "claim" ... in fact the mean is barely mentioned and there is no discussion of existence (and why would there be?). It is clear from the edit comments that you were the one saying that MathWorld is flawed... I have never indicated anything like that I am "claiming that MathWorld is reliable for everything" ... in fact I have only ever looked at a few pages that happen to have been linked from Wikipedia. You still misunderstand the purpose of citations... they are there for verification, they are not there to imply "this is the most up-to-date and technically sophisticated source possible". Citations are not a list of suggested reading material... there can be (and sometimes are) separate sections for that. As to supposed "quality" of sources, that is a matter of personal opinion but you should recall that Wikipedia is intended for a general readership and that there should be at least some basic-level sources to reflect this. Books containing what they will consider mathematical gibberish are somewhat useless to the general public. As to the age of sources ... well in mathematics if something is true it is always true, and the meaning of the most basic terms in statistics has not changed substantially if at all since the 60's. You say "Other editors can easily compare our respective edits ..." ... yes they can. Melcombe (talk) 14:24, 8 May 2012 (UTC)
The account may be new, Melcombe. I would not bet that the editor be new; would you?
The new account is uncivil and insulting to other editors, besides me. I put down an article on MathWorld, but not the new account's editor, who has as much to offer as you. ;)
The new account's revealed preference for "multiset" over "list", the mastication of even sample-sizes and uniqueness, etc., should concern you, given your stated concern with popular exposition.  Kiefer.Wolfowitz 17:01, 8 May 2012 (UTC)

Welcome to the real world. This is a generic problem in wikipedia, for high-importance, popular articles. My first experience was long ago, when some undergrad facing a mid-term exam asked me, personally, for help on orbital angular momentum or something like that. I spent hours fixing up the article. He read it, didn't understand it, and replaced it with total nonsense. Probably failed his mid-term. Think strategically: what should the WP policy be for these things? linas (talk) 23:26, 14 July 2012 (UTC)

Such strategies are already in place: WP:CITE, WP:NOR. Melcombe (talk) 08:22, 15 July 2012 (UTC)

Statistical data

At present, and since 2004, statistical data redirects to statistics, which doesn't really serve the purpose. (Specifically, I mean data, not data analysis and not "statistic".) Does anyone have a good source from which to start a reasonable article? The best I have found online is , which does start by defining statistical data. Melcombe (talk) 08:24, 17 July 2012 (UTC)

Help with the "False discovery rate" article

Hello dear editors, As part of my academic work I am currently working on improving the False discovery rate article. And I would like your help in ideas on how to improve the article:

  1. Do you think the order of the sections makes sense? (I thought a lot about it, and the current order makes the most sense to me - what do you think?)
  2. Are there any sections that are missing from this article, but should clearly be included?
  3. Can you help me grade the current quality and importance of the article? (since the main paper introducing the FDR is the 3rd most cited statistical paper ever, with over 14,000 citations, I believe it can be said that the importance of the topic is high - but I am not sure how much)
  4. What other general recommendations can you give me for making this a "featured level" article? (I've read the relevant Wikipedia articles for doing so - but I am wondering what YOU see is clearly missing/can be improved upon)

With regards, Tal Galili (talk) 21:09, 2 August 2012 (UTC)

A short list of articles to review/wikify

Hello all,

As part of a course assignment, students in my class have finished working on several articles, and they are ready to be reviewed with suggestions for future improvements (done by them and/or future editors). The articles are:

With regards, Tal Galili (talk) 20:49, 5 August 2012 (UTC)

Invitation to comment at Monty Hall problem RfC

You are invited to comment on the following RfC:

Talk:Monty Hall problem#Conditional or Simple solutions for the Monty Hall problem?

The Monty Hall problem is an especially interesting one because for many people it is their first exposure to probability calculations, and because it has a distinct psychological aspect; why do so many engineers, scientists and mathematicians get it wrong at first?

The question the RfC asks concerns the place conditional probability should have in the Monty Hall problem article. We could really use some informed opinions on this. --Guy Macon (talk) 01:13, 7 September 2012 (UTC)
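For editors weighing in, the following Python sketch shows what the "simple" (unconditional) and "conditional" calculations each compute. It is only an illustration under the standard assumptions, which are stated here as assumptions: the car is placed uniformly at random, the host always opens an unpicked door hiding a goat, and the host chooses uniformly when two such doors are available. The function names are hypothetical, and the sketch takes no position on the RfC question.

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def joint_outcomes(pick=1):
    """Yield (car, opened, probability) for a fixed initial pick,
    under the standard assumptions listed above."""
    for car in DOORS:
        allowed = [d for d in DOORS if d != pick and d != car]
        for opened in allowed:
            yield car, opened, Fraction(1, 3) * Fraction(1, len(allowed))


def simple_switch_probability(pick=1):
    """Unconditional probability that switching wins."""
    return sum(p for car, _, p in joint_outcomes(pick) if car != pick)


def conditional_switch_probability(pick=1, opened=3):
    """Probability that switching wins, given which door the host opened."""
    total = sum(p for _, o, p in joint_outcomes(pick) if o == opened)
    wins = sum(p for car, o, p in joint_outcomes(pick)
               if o == opened and car != pick)
    return wins / total


print(simple_switch_probability())       # 2/3
print(conditional_switch_probability())  # 2/3
```

The two answers agree only because of the uniform-host assumption; changing how the host chooses between two goat doors changes the conditional value but not the unconditional one.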

What is population pyramids — Preceding unsigned comment added by (talk) 15:03, 21 September 2012 (UTC)

Behrens–Fisher distribution

I've written this somewhat hastily scrawled user-space draft. I have in mind that with some further work it can evolve into something to be moved into the article space under the title Behrens–Fisher distribution (currently a redirect). In its early stages that will be maybe two or three times as long as the present draft. I'll be back to do more work on it. In the mean time, maybe others can improve it as well. Michael Hardy (talk) 03:41, 24 August 2012 (UTC)

PS: Behrens–Fisher distribution is now in the article space. Michael Hardy (talk) 17:31, 13 October 2012 (UTC)
Michael I think this is a really great article. Btyner (talk) 03:29, 7 September 2012 (UTC)
Thank you. Definitely further work could make it better. Michael Hardy (talk) 17:31, 13 October 2012 (UTC)

Help requested at Wikipedia talk:Articles for creation/Lord's paradox

An editor began Wikipedia talk:Articles for creation/Lord's paradox but is unable to finish it, and asked if anyone else would be able to neaten it up. Here's his request: Wikipedia:WikiProject_Articles_for_creation/Help_desk#Review_of_Wikipedia_talk:Articles_for_creation.2FLord.27s_paradox

If this is indeed an article we're lacking, hopefully this can be neatened up and published. Thanks for any help! MatthewVanitas (talk) 20:00, 15 October 2012 (UTC)

data point

Someone proposed deletion of the article titled data point for lack of references. It gives a better and clearer definition than I can recall having seen in any textbook. Is there an elementary textbook that can be cited that says the same thing? Michael Hardy (talk) 22:56, 21 October 2012 (UTC)