Wikipedia talk:WikiProject Statistics/Archive 1



Lists

The list of statistical topics is not, in its present form, very sophisticated. Lists are far more versatile than categories, but this list doesn't take advantage of all of that. Look at list of mathematics articles and lists of mathematics topics (the latter is a magnificent thing!). I'll have more to say on this later. Michael Hardy (talk) 19:25, 16 March 2008 (UTC)


I've archived the discussion of the proposal of this WikiProject that took place on Wikipedia:WikiProject Council/Proposals at Wikipedia:WikiProject Statistics/Proposal. Please do not edit that page but feel free to continue discussion here of any of the points that came up. Qwfp (talk) 23:37, 16 March 2008 (UTC)

Proposed merge of Kernel (statistics) and Kernel smoother

It has been proposed to merge the articles Kernel (statistics) and Kernel smoother. Weigh in on the discussion at Talk:Kernel (statistics)#Proposed merge of Kernel (statistics) and Kernel smoother.  --Lambiam 19:35, 17 March 2008 (UTC)

Project banner on talk pages

It's up to you guys whether you'd like to piggyback on the {{maths rating}} template or create your own. Either way, if you need help with setting up templates, customizing the assessment system, or need a bot to tag a bunch of talk pages, I should be able to lend a hand. — Carl (CBM · talk) 01:06, 17 March 2008 (UTC)

Thanks very much for offering your help, Carl. I think we might want to take you up on it at some later point but, as you say, we need to discuss the options first. I have created a WikiProject Statistics talk page banner {{WPStatistics}} but so far I've only put it on a dozen or so talk pages (including some of the most viewed statistics articles), and always in addition to {{maths rating}}. Although that could be done by a bot, the number of pages in "field=probability and statistics" (230ish) is not that vast, especially if several of us work on it, and some human judgement is useful, e.g. to decide if use of {{WikiProjectBannerShell}} would be a good idea. There may also be articles in "field=probability and statistics" on parts of the probability theory field that are not relevant to statistics, which we won't want to tag as being in the scope of WikiProject Statistics. And Category:Statistics would need a lot of clearing out and reorganising before we could think of tagging articles in that category or some of its sub-categories using a bot.
I was thinking of keeping the assessment side of things within the existing framework of WP:WikiProject Mathematics/Wikipedia 1.0, but then people pointed out (see discussion around the proposal archived at WP:WikiProject Statistics/Proposal) that there are articles on the non-mathematical side of statistics that would sensibly be considered to belong in statistics but not mathematics, so logically we might want a separate assessment system I guess. But I think it should be a higher priority to re-assess the articles currently in the "probability and statistics" field and make sure all the frequently viewed statistics articles are rated and then start to act on that info by starting to improve the high-importance but low-quality articles. If a few statistical but not-strictly-mathematical articles are included in the {{maths rating}} system I don't think it really matters, as long as WPM doesn't mind. Qwfp (talk) 11:43, 17 March 2008 (UTC)

I've been putting the WPStatistics tag above the WPMathematics tag when the article is clearly on statistics; someone else has been doing the opposite. Should there be some convention on this? Michael Hardy (talk) 19:35, 19 March 2008 (UTC)

Hmm, good question Michael. I've been putting it below but I never really thought about it — guess I was just being chronological. On second thoughts it seems to make sense that if it's definitely an article about statistics the {{WPStatistics}} tag can go first and I'll change my practice, though it doesn't seem worth going back to change the ones I've already done. But if it's more on the border (perhaps e.g. all the articles on probability distributions) then I think I'd stick with being chronological. Not sure it's too important if different people do it differently — more important that it's done, at least for frequently-viewed articles. Qwfp (talk) 20:20, 19 March 2008 (UTC)

missing statistics/statistician articles

Now that we have our own wikiproject (thanks for putting this together!), should we maintain a list of such articles? I mean besides Wikipedia:Requested articles/Mathematics#Statistics? Thanks Btyner (talk) 02:23, 20 March 2008 (UTC)

Do you mean Wikipedia:Requested articles/Statistics? Or did you want to put it somewhere else? Michael Hardy (talk) 03:38, 20 March 2008 (UTC)
Maybe we could explicitly list at least a few of them under Requests in WP:WikiProject Statistics#Article-related tasks? In particular, any you think should be created but don't know and can't find out enough about to just go ahead yourself and create a stub? Qwfp (talk) 05:58, 20 March 2008 (UTC)
Yes, Wikipedia:Requested articles/Statistics sounds good to me. I'm also wondering whether it would be justifiable to set up something like User:Mathbot/Most wanted redlinks but for stats. Thanks Btyner (talk) 19:06, 21 March 2008 (UTC)

Can we set up something like the page at Wikipedia:WikiProject Mathematics/Current activity? Michael Hardy (talk) 16:50, 20 March 2008 (UTC)

Personally I can't see much advantage of Wikipedia:Requested articles/Statistics over Wikipedia:Requested articles/Mathematics#Statistics. I'm not sure statistics gets enough article requests to justify an entire subpage of Wikipedia:Requested articles to itself. Yes, it would be nice to have an equivalent of Wikipedia:WikiProject Mathematics/Current activity, but that looks like it's taken quite a bit of work by some bot operator, so we might need to see if we can persuade a bot operator to set up something similar for statistics. I know nothing about bot operation myself.

Anyway, I'm happy for someone else to take the lead on this — I don't have any particular role, rights or responsibilities as the project proposer any different from the rest of the members. I have no intention of spending as much time on WikiProject Statistics on an ongoing basis as I did for a few days when setting it up. I'm going on a short break over the next few days and likely to be fairly busy in real life when I get back — I'll keep an interested eye on how it's going but I certainly don't feel I own WikiProject Statistics in any way and I'm very happy for other members to be bold and take the initiative. Regards, Qwfp (talk) 21:53, 21 March 2008 (UTC)

I guess what I'm trying to get at would be a list of stats-related redlinks; for example, what about something in the style of the lists linked from Wikipedia:Missing science topics#Mathematics? Btyner (talk) 14:37, 22 March 2008 (UTC)

merge probability and probability theory?

I was just looking over the top priority articles in the project and wondered why we have an article on probability and another article on probability theory. Just from a top level view, does this seem like a good idea to others? If you think it's a good idea, what is the difference between the two? Pdbailey (talk) 21:42, 22 March 2008 (UTC)

I suspect the reason we have two separate articles is that merging them is a lot of work, since they're necessarily fairly long. I'm inclined to say that they should get merged. Michael Hardy (talk) 01:05, 23 March 2008 (UTC)
I think Probability is meant to be more accessible, and also the article that deals with non-mathematical aspects such as applications, leaving the mathematically sophisticated stuff for probability theory. If they are to be merged, which I'm not convinced yet is a good idea, then we must make sure we save scary symbols like Ω for the end.  --Lambiam 14:16, 23 March 2008 (UTC)
I tend to agree it makes practical sense to continue having one article for a general audience and the other for a more advanced audience. Very similarly, there are separate Calculus and Real analysis articles, even though both articles note that they're basically the same subject covered at different levels of sophistication. (Physics does something similar in the various Mechanics articles). Best, --Shirahadasha (talk) 01:07, 24 March 2008 (UTC)
Interesting take, and I can't say that I would argue strongly against it. I have three questions (1) can we agree that the two should at least state this dichotomy explicitly? (2) do you really think that the two articles are at the levels you suggest? (3) what importance rating should they have given this separation? Pdbailey (talk) 02:40, 24 March 2008 (UTC)
Both articles in the Entropy/Introduction to entropy pair articulate the dichotomy very explicitly (For a generally accessible and less technical introduction to the topic, see Introduction to entropy). This seems a good idea. Best --Shirahadasha (talk) 22:13, 24 March 2008 (UTC)

Entropy (disambiguation)

I could use some more eyeballs on this page.

In my view, to help people get to the article they want most quickly, it is helpful to include structure in the page to group together meanings primarily related to Entropy in a thermodynamic sense, and those primarily related to Entropy in an Information Theory sense. However, because there is no provision for this in the WP:DAB guidelines, various editors specialising in disambiguation (who may know rather more about disambiguation than they do about entropy) would prefer to see all the links muddled together in a single (IMO much harder to navigate) long alphabetical list. Cf this diff: [1].

Since dab pages are supposed to help readers who do know something about the subject find the article they want, I'd greatly appreciate if members of this project could look at the two versions above, and then leave their thoughts on the talk page.

Thanks, Jheald (talk) 23:23, 23 March 2008 (UTC)

Took a while to look into this and it appears to be a tar pit of an edit war over useful vs following some interpretation of the rules. Nevertheless, I don't see what we can add since there already appears to be someone helping two editors figure this out. I think this is just a distraction to WP:Statistics. Pdbailey (talk) 02:37, 24 March 2008 (UTC)
Thanks for taking a look. But with respect, one of the most valuable aspects of a Wikiproject is to be somewhere where people with relevant knowledge can come together and seek self-defence against random rules-pushers, POV artists, and other threats.
Unfortunately, the row has become even more of (in your very apt phrase) a "tar pit", with the page currently locked down on the "wrong version" despite the efforts of various Maths editors...
One of the points at issue is whether the meaning of the word Entropy in information theory (and related links) should be considered a "primary meaning" of the word Entropy, as important as thermodynamic Entropy -- or whether it should be listed as an also-ran. It would therefore be useful to have the input of those who come to Entropy from the statistics direction, who I imagine would see the information-theory meaning of Entropy as quite as fundamental as the physics meaning, and deserving co-equal billing; to balance the input of those who may only have heard about entropy in a physical context.
I don't apologise for bringing the matter here. The only way to try to get a good outcome when something like this happens is to get enough people who do understand a subject to come to a page; otherwise everything gets railroaded by the views of those who don't. Jheald (talk) 22:04, 24 March 2008 (UTC)

Archiving discussion

I notice the new automatic archiving applied to this discussion page: (i) I think 7 days may be too short a time period (that's my interpretation of what's going on); (ii) Is there a good way of putting a link (or links) to the archived stuff on the discussion page? Melcombe (talk) 09:45, 26 March 2008 (UTC)

I half set that up last night as a bit of an experiment, as it looks like this page is going to host enough discussion to need some sort of archiving, which is good news, and archiving by hand is a fairly tedious business (I've done it a couple of times). I hadn't realised that anything would actually get archived so soon. Please accept my apologies for not discussing it here first. I think you're right that 7 days is too short; let's try 28 days. I've added an archive box. In case anyone wants to change things, I've followed the instructions at User:MiszaBot/Archive_HowTo. MiszaBot archives WT:WPM and seems to do a good job. Qwfp (talk) 10:14, 26 March 2008 (UTC)
I've just undone the actual archiving and commented out the {{archive box}} so everything's back here for now, but discussions older than 28 days will get archived. Does that sound reasonable? Qwfp (talk) 10:44, 26 March 2008 (UTC)
Please don't think that I was complaining. It's good that someone is prepared to do something. I think 28 days would be OK, but maybe start at around 2 months to allow for longer vacations? I did wonder whether it would be possible to mark certain threads as not to be archived automatically, but perhaps this would be better handled by putting the important points in the main article for the Project. As an example of the type of thing I mean, consider the stuff under "Lists" above. I have taken the liberty of copying this contribution by Michael Hardy into the main article. Melcombe (talk) 11:35, 26 March 2008 (UTC)

Other tags

I have come across this tag, {{Expert-subject|Statistics}}, which is not mentioned in the project article. Is this an advised thing to use? I have made limited use of [[Category:Statistics articles needing expert attention]], which seems not to be so blatant about complaining about an article's contents. Melcombe (talk) 12:15, 26 March 2008 (UTC)

{{expert-subject}} can be used with any WikiProject as argument. When there's a suitable WikiProject it's preferable to the generic {{expert}} tag. I wouldn't expect {{expert-subject|Statistics}} to be used by participants in WikiProject Statistics very often; it's more for use by others seeking expert statistical input and by members of WikiProject Expert Request Sorting. I agree it should be mentioned on the WikiProject Statistics page somewhere though.
Use of this tag places an article in Category:Statistics articles needing expert attention but an article can be placed in the category without using the tag. I'd agree that these complaining "hat" tags can be overused (see User:Shanes/Why tags are evil and its talk page). I guess the question is whether an article is inaccurate or potentially misleading and the reader needs to be warned, which is probably fairly rare.
Several of the articles in Category:Statistics articles needing expert attention got there because I went through Category:Mathematics articles needing expert attention and reallocated those that seemed clearly to fall within statistics. But I didn't consider at the time whether they deserved to be in the category or whether the tag was too blatant, so it may be useful for someone to revisit that. Qwfp (talk) 14:56, 26 March 2008 (UTC)


I know that there is the page Notation in probability and statistics covering some stuff about notation, but is it worth developing more on other aspects of conventional usage in statistics? Where I work, we generally use capital letters for all distributions, as in "Normal distribution" as opposed to "normal" distribution, and similarly for Exponential. I think that this is partly on the grounds that these are specific names for specific distributions and partly on the grounds that it makes it easier to spot where important assumptions are being stated. I note that some changes have been/are being made to try to enforce the other convention. I am not really against this, but it would be good to have something somewhere, specific to statistics (or probability and statistics), that would indicate some common conventions towards which things can be moved. There must be other points also ... so some form of guideline? Melcombe (talk) 11:06, 28 March 2008 (UTC)

I'd agree that it might be useful to develop and document some conventions. There's quite a bit already at Wikipedia:WikiProject_Mathematics#Conventions and it might be more helpful to add to those pages than start a new one specifically for statistics, though I'm not sure either way.
I don't think I've come across the convention of initial capitals for names of distributions though. Wikipedia:Manual of Style (capital letters) doesn't seem to cover this explicitly but does give the general guide that "unnecessary capitalization is avoided". Another general rule is that capitals are used for words derived from proper names, and the distribution wasn't named after Henry Normal! I don't much like the name "normal distribution" and sometimes prefer to use "Gaussian distribution" instead myself, but there's no chance that could become a general convention. Qwfp (talk) 13:41, 28 March 2008 (UTC)

I am accused of vandalism to design of experiments

Two very very confused people have been disputing the content I put at design of experiments. I have patiently explained to them why they are wrong at talk:design of experiments. If they would just go to the library and look up the literature that is referenced in the article, they would see that it backs me up. I have a Ph.D. in statistics and I care about the subject. Anyone who cares would have read my comments on that talk page and attempted to digest what they say. But someone came along and accused me of "vandalism" to the page and reinstated the mathematically erroneous edits. Anyone who would just check the math would see the point. Can others take a look and explain to these people that I'm not just some isolated crackpot? Michael Hardy (talk) 16:13, 28 March 2008 (UTC)

Redescending M-estimator

Redescending M-estimator is very clearly in need of attention. Michael Hardy (talk) 12:55, 3 April 2008 (UTC)

It seems necessary to think about the article M-estimator as well because, while a definition of the phi-function can be found there, it is not particularly prominent. But a proper definition is needed somewhere. Is the term phi-estimator extensively used? Perhaps there needs to be an article for that, before having a redescending version. The present text seems to imply that it is the phi-function that redescends, not the M-function ... the objective function for minimisation (the M- or rho-function) would flatten off for high values. Melcombe (talk) 15:24, 3 April 2008 (UTC)
Just a suggestion: I would support merging the redescending material into the main M-estimator article. Having the redescending article seems rather too specific. Baccyak4H (Yak!) 02:16, 4 April 2008 (UTC)
I see that the article Robust statistics has quite a bit about (and links to) M-estimator... it mentions "redescending ψ functions" but doesn't (yet) link to Redescending M-estimator. Melcombe (talk) 12:16, 7 April 2008 (UTC)
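As an illustration of the distinction drawn in this thread (it is the derivative function that redescends, while the objective function being minimised flattens off), here is a minimal sketch using Tukey's biweight, a commonly cited redescending choice. The names `tukey_rho` and `tukey_psi` are hypothetical, the tuning constant 4.685 is a conventional value assumed for the example, and nothing here is taken from the articles under discussion.

```python
def tukey_rho(u, c=4.685):
    """Objective (rho) function of Tukey's biweight: constant beyond |u| = c."""
    if abs(u) > c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (u / c) ** 2) ** 3)


def tukey_psi(u, c=4.685):
    """Derivative of rho (the psi function): returns to zero beyond |u| = c."""
    if abs(u) > c:
        return 0.0
    return u * (1.0 - (u / c) ** 2) ** 2


# Tabulate both functions on a small grid of standardised residuals.
for u in (0.5, 1.0, 2.0, 4.0, 6.0, 10.0):
    print(f"u={u:5.1f}  rho={tukey_rho(u):7.4f}  psi={tukey_psi(u):7.4f}")
```

In the printed table psi rises, falls back and is exactly zero for large residuals, whereas rho levels out at a constant, so gross outliers stop influencing the estimate.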

Margin of error

Margin of error was promoted to featured-article status during the 2004 election campaign. Then it was demoted on 3 March 2007. Now that we're heading into another campaign, should we see if we can get it promoted again? And maybe linked to from the main page at some point in the late summer or early fall? Michael Hardy (talk) 16:35, 5 April 2008 (UTC)

can I just add wpstatistics?

Can I just add {{WPStatistics}} as I did to Bias_of_an_estimator or do I have to add something somewhere else too? Sorry, I'm new to this whole project thing. Pdbailey (talk) 23:43, 5 April 2008 (UTC)

I don't think there's anything else that needs to be done. Michael Hardy (talk) 05:10, 6 April 2008 (UTC)



Can someone who knows how these things are best done sort out the recent overwriting of article Outlier in some acceptable way? Melcombe (talk) 08:55, 14 April 2008 (UTC)

I reverted it, but the article itself could do with a few changes. For one thing, defining an outlier in terms of standard deviations is poor form. -3mta3 (talk) 09:15, 14 April 2008 (UTC)
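One way to see the objection to a standard-deviation rule is that the outlier itself inflates the mean and standard deviation used to detect it. The sketch below compares a "more than k standard deviations from the mean" rule with a rule based on the median and the median absolute deviation (MAD). The function names, the sample data and the threshold k = 3 are all assumptions made for the example; the factor 1.4826 is the usual scaling that makes the MAD comparable to a standard deviation under normality.

```python
import statistics


def sd_outliers(data, k=3.0):
    """Flag points more than k sample standard deviations from the mean."""
    centre = statistics.mean(data)
    spread = statistics.stdev(data)
    return [x for x in data if abs(x - centre) > k * spread]


def mad_outliers(data, k=3.0):
    """Flag points more than k scaled MADs from the median."""
    centre = statistics.median(data)
    mad = statistics.median([abs(x - centre) for x in data])
    spread = 1.4826 * mad  # scaling assumed for comparability with the SD
    return [x for x in data if abs(x - centre) > k * spread]


# Hypothetical sample: eight readings near 10 and one gross error.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0]

print("SD rule flags: ", sd_outliers(sample))
print("MAD rule flags:", mad_outliers(sample))
```

With only nine observations no point can lie more than 8/3 sample standard deviations from the mean, so the standard-deviation rule cannot flag the value 50.0 at k = 3, while the median-based rule does.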

Guttman scale

The article titled Guttman scale is a profoundly terrible mess. One is led by various clues to suspect (and the fact that one can only suspect is part of what's so bad about the article in its present form) that this has something to do with statistics. Please see talk:Guttman scale. Michael Hardy (talk) 17:20, 22 April 2008 (UTC)

Note that there was some text in article Homogeneity (statistics) (now hidden) that implied that the Guttman scale was associated with this (and there is still presently a link). This article was also in a mess, but for info it was/is in category Psychometrics but not Statistics, while Guttman scale is in both as well as Market Research. Melcombe (talk) 17:35, 22 April 2008 (UTC)
See also Scale (social sciences)#Comparative scaling techniques which seems uninformative, but a google does find some stuff that seems understandable. Melcombe (talk) 17:58, 22 April 2008 (UTC)
I've added a lede. Please review and improve.  --Lambiam 08:47, 29 April 2008 (UTC)

Proportionality principle

Does the "proportionality principle" as described at [2] have a more well known name? I'm thinking about adding a section to Monty Hall problem with this analysis, but I'm a little hesitant without a better reference backing up the basic principle. -- Rick Block (talk) 16:09, 26 April 2008 (UTC)

It's a special case of the likelihood principle. I'm not sure if there's any standard name for this special case. Michael Hardy (talk) 16:20, 26 April 2008 (UTC)
It's not really a special case of the likelihood principle, which is more concerned with inference. The ref given indicates that it is really Bayes' Theorem presented in a way that allows the avoidance of some mathematical expressions. Melcombe (talk) 08:50, 28 April 2008 (UTC)

Except that Bayes' theorem is used in inference. The likelihood principle says identical inferences should be drawn from proportional likelihood functions; this is the case in which the inferences are the posterior probabilities. So it's a special case of the likelihood principle. Michael Hardy (talk) 15:08, 28 April 2008 (UTC)
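The point about proportional likelihood functions can be checked numerically. The sketch below applies Bayes' theorem to the Monty Hall setting under the standard assumptions (the player picks door 1, the host always opens a goat door and chooses at random when he has a choice, and here he opens door 3). The function name `posterior` and the scaling factor 10 are hypothetical choices for the example, and this is not the analysis from the reference cited in the question.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' theorem on a finite set: normalise prior times likelihood."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]


# Car equally likely behind doors 1, 2, 3 before anything is revealed.
prior = [Fraction(1, 3)] * 3

# Probability that the host opens door 3, given the car is behind door 1, 2, 3.
likelihood = [Fraction(1, 2), Fraction(1), Fraction(0)]

# A likelihood function proportional to the one above (arbitrary factor).
scaled = [10 * l for l in likelihood]

post = posterior(prior, likelihood)
post_scaled = posterior(prior, scaled)

print("posterior:            ", post)
print("posterior after scale:", post_scaled)
```

Both calls give the same posterior, since only the ratios of the likelihood values survive the normalisation step.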

a posteriori probability and Empirical probability

The article a posteriori probability is essentially a disambig which leads to both Bayesian stuff and to Empirical probability. Empirical probability is brief and seems to imply that a posteriori probability is covered by what is meant by Empirical probability without saying much else. This seems doubtful to me. Any thoughts on this? There seems to have been an attempt in the past to convert the article a posteriori probability which was then simply a redirect to Empirical probability into a redirect to posterior probability, but this was then changed to point both ways. Melcombe (talk) 10:38, 29 April 2008 (UTC)

The term is used on these slides in the slogan "Hypothesis testing compares a posteriori probability with a priori probability" – which seems based (in my opinion) on a misunderstanding. Hypothesis testing does compare a posterior probability P, not with a prior probability but with a previously selected confidence level. Here P is the posterior probability under the null hypothesis of an outcome deviating (one-sided or two-sided) at least as much from the null-hypothesis norm as the experimentally observed outcome. On the slides the term "a posteriori probability" is indeed construed as being the experimentally observed relative frequency. I haven't examined if this misuse of the term is sufficiently widespread to warrant inclusion of this mistaken meaning in Wikipedia.  --Lambiam 18:19, 30 April 2008 (UTC)
I suggest that Empirical probability should be sent to AfD. One of its two references is at! I challenge anyone to find a widely-used textbook of probability or statistics that has the phrase 'empirical probability' as a term in the index. The current article makes empirical probability simply a relative frequency. I think we can use the term 'relative frequency' for that. EdJohnston (talk) 19:10, 30 April 2008 (UTC)
I did find "empirical probability" in my dictionary of mathematics (Unwin) and it did define it as a posterior probability ... but without saying anything about a prior probability, so it may well be wrong. As for your challenge, I found "empirical probability" in the index of Mood & Graybill's Intro to the Theory of Statistics (2nd Edition)(1963), but the term doesn't seem to be in the text ... it uses "relative frequency" (only) in a section headed "A Posteriori or Frequency Probability". Melcombe (talk) 13:33, 14 May 2008 (UTC)

Maybe it should be redirected to empirical distribution function. Michael Hardy (talk) 20:09, 30 April 2008 (UTC)

I think Empirical probability can usefully be revised to fill the context where, if there is a continuous rv X being observed, there is the choice between (i) estimating Pr(X>x) by counting such events in the observed data set and (ii) fitting a parametric distribution function F and estimating Pr(X>x) as 1-F(x). But if no-one sees an equivalence between a posteriori probability and Empirical probability, then perhaps the simplest would be to redirect the former to posterior probability with a little rephrasing of the latter. Melcombe (talk) 09:42, 1 May 2008 (UTC)
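The two options (i) and (ii) can be sketched side by side. The example below assumes, purely for illustration, that the parametric family is exponential, fitted by maximum likelihood (rate equal to one over the sample mean); the data values and the function names are hypothetical.

```python
import math


def empirical_tail(data, x):
    """Option (i): the fraction of observations exceeding x."""
    return sum(1 for value in data if value > x) / len(data)


def exponential_tail(data, x):
    """Option (ii): fit an exponential F by maximum likelihood, return 1 - F(x)."""
    mean = sum(data) / len(data)  # MLE of the exponential mean (1 / rate)
    return math.exp(-x / mean)


# Hypothetical positive observations, e.g. waiting times.
data = [0.2, 0.5, 0.9, 1.1, 1.6, 2.3, 3.0, 4.4]
x = 2.0

print("empirical  Pr(X > x):", empirical_tail(data, x))
print("parametric Pr(X > x):", exponential_tail(data, x))
```

The empirical estimate can only take values that are multiples of 1/n, and it is zero beyond the largest observation, whereas the parametric estimate varies smoothly with x but depends on the assumed family being adequate.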

Given the above finding in Mood&Graybill, I have now left "a posteriori probability" to point to both places. I have revised "empirical probability" mainly by adding in some statistical context and to indicate alternatives to estimation using empirical probabilities. In that article I have said that the use of the term "a posteriori probability" is not directly related to Bayesian inference (simply "after the event"?). If someone wants to put in exactly how the empirical probability estimate can be obtained as a Bayesian estimate, they might well do so. Additionally, I note that where the article apparently links to "relative frequency" it actually goes to frequency (statistics). Melcombe (talk) 13:34, 14 May 2008 (UTC)


Does covariate need some work? Michael Hardy (talk) 17:53, 30 April 2008 (UTC)

All I see are the possibilities: (i) include other near-equivalent words such as "explanatory variable" for regression and exogenous and endogenous variables for econometrics; (ii) an example application where the term can reasonably be used. Melcombe (talk) 09:47, 1 May 2008 (UTC)
I have modified the article and it may now be clearer. I did not add exogenous and endogenous variables, as these are subtly different ideas. As usual, more might be done. Melcombe (talk) 14:02, 14 May 2008 (UTC)

Additive smoothing

The nearly orphaned article titled Additive smoothing could probably use some work. Michael Hardy (talk) 01:11, 16 May 2008 (UTC)


Eigenpoll is also deficient. Michael Hardy (talk) 01:38, 16 May 2008 (UTC)

Data matrix

Data matrix (lower-case m) now redirects to Data Matrix (capital M). The latter is about a topic in computer science. Several statistics articles link to the former and get inappropriately redirected. Some disambiguation work is needed. Michael Hardy (talk) 18:58, 10 April 2008 (UTC)

Changed Data matrix into a disambig page for Matrix (mathematics), Data matrix (statistics), and Data matrix (computer). For now, made Data matrix (statistics) a redirect to Matrix (mathematics) but this approach permits it to be built as a separate article when someone is ready. Best, --Shirahadasha (talk) 21:26, 10 April 2008 (UTC)
I've bypassed the disambig page for the 5 links to data matrix from article space, of which three now point to data matrix (statistics), namely Biplot, Origin of birds and Cluster analysis. Qwfp (talk) 07:35, 11 April 2008 (UTC)
For now, I've hidden the Data matrix (statistics) entry at Data matrix since it is just a redirect to Matrix (mathematics) as noted by Shirahadasha above. I also added Data matrix (statistics) to Category:Redirects with possibilities. Btyner (talk) 14:23, 26 May 2008 (UTC)
Those thinking of these pages might want to consider also the article Dataset, which seems close to implying that a dataset is a single data matrix. Melcombe (talk) 09:17, 11 April 2008 (UTC)

"Exact test"

At talk:exact test I've asked if someone can fill in certain items of information in the article that I could not. Further comments on that page are welcome. (Or on this page.) Michael Hardy (talk) 20:13, 16 May 2008 (UTC)

Things on boundary of scope

We probably need to have a policy about what to do about articles that have a statistical background/relevance but which are set in a different context. For example, I came across Evidence under Bayes theorem, which seems dedicated to a legal context. It is not (yet) listed in the list of statistical topics, and perhaps it shouldn't be? No doubt there are others that contain applications of statistical ideas but are not strictly about statistics. But perhaps these would be more distantly related and so more obvious. Should there be a "list of non-statistical topics related to statistics"?

Melcombe (talk) 17:08, 2 April 2008 (UTC)
I would consider "statistical thinking" and conceptual topics to be relevant. Best, --Shirahadasha (talk) 20:10, 3 April 2008 (UTC)
Another example might be statistical multiplexing. I tried adding it to Category:Statistics but this was reverted. Btyner (talk) 14:01, 26 May 2008 (UTC)
I have added it to Category:Queueing theory which does contain some telecoms stuff and which is under Category:Statistics. Melcombe (talk) 09:55, 30 May 2008 (UTC)

Category:Probability and statistics, Category:Probability, Category:Statistics

I'm sure this has been debated before, but what use does Category:Probability and statistics serve? Certainly there are articles that belong in both categories, but is the intersection of these categories really a useful category itself? Note that Probability and statistics, the "main article" for Category:Probability and statistics, is essentially a disambiguation page. Btyner (talk) 14:07, 26 May 2008 (UTC)

Category:Probability and Category:Statistics are both subcategories of Category:Probability and statistics, together with Category:Randomness. At present there are many articles listed directly under Category:Probability and statistics that might be better removed/moved to other categories. Are there any obvious other categories that should reasonably be added as subcategories to Category:Probability and statistics rather than just being subcategories of either Category:Probability or Category:Statistics? How is operations research dealt with? Melcombe (talk) 11:25, 28 May 2008 (UTC)
I have added this task, and revision of some articles mentioned above to the "Todo" lists in the project page. Melcombe (talk) 10:53, 29 May 2008 (UTC)


Can anyone help with the "SkewLogistic" distribution? It is used in the "Related distributions" sections of the Chi-square distribution, Gamma distribution and Exponential distribution articles, but doesn't have its own article and doesn't appear anywhere else. It seems it needs to be some type of Gumbel or extreme value distribution to fulfill what is in the articles where it appears. Melcombe (talk) 15:55, 29 May 2008 (UTC)

I was wrong about the extreme value distribution bit, but there are still problems. It seems that the "SkewLogistic" distribution here needs to be a generalized logistic distribution of Type I according to Johnson, Kotz & Balakrishnan terminology, whereas the "literature" (i.e. google) comes up with a very different distribution for "skew-logistic". Melcombe (talk) 15:36, 30 May 2008 (UTC)

"Temporal mean"

What should we do with the stub article titled temporal mean? Michael Hardy (talk) 16:00, 19 April 2008 (UTC)

Let's see, what are the options? Transwiki to wiktionary? Merge to mean? Both?? Qwfp (talk) 18:57, 19 April 2008 (UTC)
...or expand into a substantial article? Michael Hardy (talk) 23:10, 19 April 2008 (UTC)
Is there much more of substance to say than (essentially) "temporal mean means mean over time"? I don't know myself, as time series and related topics are not something I've ever really studied. Qwfp (talk) 10:58, 20 April 2008 (UTC)
Well, there may be something more to say using the context of space-time modelling and data, so that a temporal mean would often be spatially varying. Also, for "ordinary" time-series, there might be something relevant to say about reducing datasets of say daily data to monthly, using monthly means etc., so as to create time-series of temporal means. However, I have not found a relevant reference in which the phrase is used, although I did find "temporal autocorrelation". Melcombe (talk) 08:49, 21 April 2008 (UTC)
Based on the comment by Melcombe, I think the better option would be to move it into an article on temporal statistics or high frequency statistics. A brief search turned up nothing; if others find nothing either, I say delete it, and the concept must wait until the other article is written. Pdbailey (talk) 15:38, 21 April 2008 (UTC)
What about a redirect to Moving average?  --Lambiam 23:14, 28 April 2008 (UTC)
I think a deletion would be best at present, as there are many possible somewhat distinct meanings and any possible redirect is likely to be off-target. The article seems not to have any substantive articles linking to it (?) ... one guess is that it originated in a list of topics found on other general maths/stats websites. Melcombe (talk) 08:53, 29 April 2008 (UTC)
Given the present content of Temporal mean, the redirect is on the dot. Should different and notable meanings of the term "temporal mean" emerge later, we can always change this then into, for example, a disambiguation page.  --Lambiam 16:03, 30 April 2008 (UTC)

I note that, since this discussion, after replacement with a redirect the article was restored and briefly extended with a reference, if anyone here wants to take further interest. Melcombe (talk) 09:24, 16 June 2008 (UTC)

merge binomial test and sign test ?

Aren't these really the same thing? I proposed this merge in Nov. 2006, but forgot about it and the tags were removed in Oct. 2007 without any discussion for or against. Any comments from the crowd here? Btyner (talk) 23:16, 11 June 2008 (UTC)

They are the same thing in a mathematical sense eventually, but only after going through a layer or two of reduction from different contexts, and it is these different contexts that make it reasonable to have separate articles. I suggest putting a link to binomial test into sign test and expanding the latter to include either/both more discussion about nonparametric tests of shifts of location (which this isn't quite of course) and/or links to other such tests. If the article ever got particularly detailed, there could be discussion of the power of the test against shift-alternatives, which wouldn't really fit immediately into a more general article on binomial test. Melcombe (talk) 09:00, 12 June 2008 (UTC)
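The reduction described above can be illustrated with a minimal sketch. It assumes paired observations, drops ties, and then hands the count of positive differences to an exact two-sided binomial test with success probability 1/2. The function names `binomial_test_two_sided` and `sign_test` are hypothetical names chosen for illustration and are not taken from either article.

```python
from math import comb


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k (one common convention among several).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that equally likely outcomes are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-12)))


def sign_test(before, after):
    """Sign test for paired data, expressed as a binomial test.

    Ties (zero differences) are discarded, which is an assumption of
    this sketch; other tie-handling rules exist.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n_positive = sum(d > 0 for d in diffs)
    return binomial_test_two_sided(n_positive, len(diffs), 0.5)


if __name__ == "__main__":
    before = [1, 2, 3, 4, 5, 6, 7, 8]
    after = [2, 3, 4, 5, 6, 7, 8, 9]
    print(sign_test(before, after))
```

The only thing the sign test adds is the step from paired data to a count of signs, which is the "layer of reduction" at issue; everything after that step is the binomial test.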

"Statistical law"

What are we to make of the stub article titled Statistical law? As it stands, I'm not sure there's any precisely defined concept here. Michael Hardy (talk) 23:36, 6 June 2008 (UTC)

We do not have articles titled Mathematical law, Geometrical law, Topological law, etcetera, nor should we, for the simple reason that these are not established concepts. I likewise see no raison d'être for this article – which at best would be a dictionary definition.  --Lambiam 03:48, 7 June 2008 (UTC)
Should we also get rid of Category:Statistical laws? There seem to be a variety of ways of speaking that people have used in the past. I suppose we don't have to take notice of all of them. But is Zipf's law not a law? Is it not statistical? That article was put in the category Statistical laws in August, 2006 but our article Statistical law was only created this week. I agree that the current text of the article Statistical law doesn't seem right. EdJohnston (talk) 04:40, 7 June 2008 (UTC)
One role for the article Statistical law would be as a target link from the article Scientific law to act as another marker that not all scientific laws concern physics. There may be a need to make distinctions between probability-theory-based laws and statistical-observation-based laws: note that there are some "laws" under Category:Statistical theorems.
A specific suggestion is to place Category:Statistical laws not only directly under Category:Statistics but also under Category:Statistical theory, so that there would be the following sub-categories of this: Estimation theory; Hypothesis testing; Statistical inequalities; Probability interpretations; Statistical approximations; Statistical theorems. Thus "inequalities", "approximations", "theorems" and "laws" would form a natural grouping of categories.
Another suggestion is to make article Statistical law (renamed) a lead article for Category:Statistical laws, with a content saying something about the types of things "statistical laws" are, which might be something like... "types of empirical behaviour commonly observed across many different collections of data". Perhaps the article could then have a brief introduction, from an empirical point of view to things like the central limit theorem (to avoid having to place the theory under "laws"). And perhaps some of the articles under Category:Statistical laws could be moved to other subcategories.
Melcombe (talk) 11:55, 11 June 2008 (UTC)
As there are no articles of substance which link to it, I would say delete it. If it is to be kept, I agree that it should be renamed to something like empirical statistical laws to distinguish it from probability theorems like the law of large numbers. -3mta3 (talk) 11:21, 12 June 2008 (UTC)
It may be difficult to distinguish ... many things now backed-up by theorems may have started off as empirical observances. Melcombe (talk) 10:08, 13 June 2008 (UTC)

I have revised and moved the article to become Empirical statistical laws... this could still do with expansion, particularly if someone could enter some history of things like the law of large numbers being noticed. Melcombe (talk) 15:46, 19 June 2008 (UTC)

List of basic topics, Topical list

I see that the previous "List of basic statistics topics" has been co-opted (moved) to become Topical outline of statistics. This seems to now fall between two stools as there is no longer a list of basic statistics ideas and the "topical list" doesn't seem to have the required breadth of coverage of statistics to cover all (most) topics in statistics. I don't think the previous "List of basic statistics topics" was necessarily correct, but there is a question of what its intent should be: (i) an abbreviated version of the long list; (ii) topics covered in an introductory course on statistics for statisticians (or otherwise)? Melcombe (talk) 12:15, 17 June 2008 (UTC)

K-factor error

Can anyone help with the article titled K-factor error? It appears to assume the reader knows what a "k-factor" is, and the article titled k-factor is a disambiguation page that does not help. Michael Hardy (talk) 23:43, 22 June 2008 (UTC)

The article is still being edited by someone, but mostly to remove various templates that have been added. I have put something into the article's discussion page, where others may want to express their opinion or keep track. Melcombe (talk) 09:13, 24 June 2008 (UTC)

Rename proposal for the lists of basic topics

This project's subject has a page in the set of Lists of basic topics.

See the proposal at the Village pump to change the names of all those pages.

The Transhumanist 10:22, 4 July 2008 (UTC)

Analysis of variance in linear regression

I've just done a series of edits on the analysis of variance section in linear regression that amount to a semi-major rewrite. Some idiot claimed that the "regression sum of squares" was THE SAME THING AS the sum of squares of residuals, and the error sum of squares was NOT the same thing as the sum of squares of residuals. In other words, the section was basically nonsense. Michael Hardy (talk) 12:19, 1 July 2008 (UTC)

Michael Hardy, I'm not sure why you posted this note here, but in any case, let's focus on the article and not the editors (until the editors need focusing on). Is that the case here? Pdbailey (talk) 17:39, 7 July 2008 (UTC)
OK, but it's hard not to notice the editors when they appear to be typing without paying attention to what they're typing. This seemed like a case where only inattentiveness could explain the weird nature of the content.
The reason I posted it here is that the article may need attention. Michael Hardy (talk) 20:57, 7 July 2008 (UTC)
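For anyone checking the corrected section, the three sums of squares can be told apart with a small numerical sketch. It assumes simple least-squares regression with an intercept and made-up data; the function name `anova_simple_regression` is hypothetical. The error sum of squares is the sum of squared residuals, the regression sum of squares measures how far the fitted values lie from the mean, and the two add up to the total.

```python
def anova_simple_regression(x, y):
    """Return (SS_total, SS_regression, SS_error) for y = a + b*x by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    # Total: observed values about their mean.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    # Regression (explained): fitted values about the mean.
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    # Error: sum of squared residuals (observed minus fitted).
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_regression, ss_error


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]                # illustrative data only
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    sst, ssr, sse = anova_simple_regression(x, y)
    print(sst, ssr, sse, ssr + sse)
```

With an intercept in the model, the printed total should agree with the sum of the other two up to rounding, while the regression and error sums of squares are clearly different quantities.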

Orders of approximation

Has anyone ever heard of this topic? It should probably be in statistics or mathematics, not just dangling there. The page definitely needs some time if it isn't a dupe. Pdbailey (talk) 13:53, 8 July 2008 (UTC)

This is already in category Numerical analysis, so doesn't quite dangle alone. Possibly not really relevant to statistics apart from possibly a link from stuff on function fitting or model selection. Melcombe (talk) 16:05, 8 July 2008 (UTC)

Semantic mapping

How shall we address the issues I raised at User talk:Fc renato? Michael Hardy (talk) 16:37, 12 July 2008 (UTC)

The article title should definitely be lowercased. Gary King (talk) 18:19, 12 July 2008 (UTC)

So should it be semantic mapping (statistics) or just semantic mapping? If the former, then the latter should be a disambiguation page. Michael Hardy (talk) 03:07, 13 July 2008 (UTC)

Independent component analysis

On behalf of those of us who took Statistics1 in college 10 years ago and don't speak the jargon, I am appealing to the good folks of this WikiProject to do something to make this article readable. Try this simple experiment: get someone you know who does not actually know anything about this field and show them the lead sentence of this article: "Independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents supposing the mutual statistical independence of the non-Gaussian source signals. It is a special case of blind source separation." Now, I realize some of the terms are linked, so you could at least partially put it together, but there is just not enough information here for your average Joe to make any real sense of that statement, especially if they don't happen to know what a "non-Gaussian source signal" is. Thanks for your time Beeblbrox (talk) 16:37, 13 July 2008 (UTC)

Lots of statistics articles need lots of work, and it's moving slowly, but maybe in five years statistics will be among the subjects that Wikipedia treats clearly and thoroughly. Michael Hardy (talk) 16:46, 13 July 2008 (UTC)

"Estimator" and "Estimation theory"

The articles titled estimator and estimation theory are pretty weak in their current forms. A lot of work is needed. Michael Hardy (talk) 15:26, 7 July 2008 (UTC)

why have both? Pdbailey (talk) 17:35, 7 July 2008 (UTC)
Not sure, but we also have "estimation".
Maybe we should think about how to merge all three into one. Michael Hardy (talk) 20:59, 7 July 2008 (UTC)
Hmm, amazing how often Wikipedia ends up with multiple articles because someone red links something. Perhaps redirects should be preemptive. Back on topic, that is a lot of work, and convincing three sets of editors that they should give up on the text they worked on could take a while. But, that doesn't mean that it isn't the right thing to do. Pdbailey (talk) 01:10, 8 July 2008 (UTC)
But article Estimation is essentially a disambiguation page (which might be cleaned up a bit but otherwise should remain), while Estimation (statistics) already redirects to "Estimation theory". Melcombe (talk) 13:44, 8 July 2008 (UTC)
It may be useful to have two articles, one assuming technical background and a more accessible article written from a non-technical point of view. See e.g. Quantum mechanics and Introduction to quantum mechanics. Best, --Shirahadasha (talk) 03:45, 8 July 2008 (UTC)
Not sure I agree. I don't think there is much non-technical interest in estimation. The non-technical bit might be: you do your best for something that has some desirable properties that statisticians can argue over the merits of ad nauseam. Pdbailey (talk) 03:50, 8 July 2008 (UTC)
But that's because the statistics community has largely failed to educate the public on the value of statistical concepts and statistical thinking. Key properties like bias, variation, efficiency, optimality criteria, etc. can be explained non-technically. --Shirahadasha (talk) 12:21, 8 July 2008 (UTC)
Shirahadasha, I guess I think the concepts of bias and dispersion are much simpler to explain at multiple levels on one page. Far easier than all of what is wrapped up in and its various solutions. I guess I think we can put the simple explanation and all the details on one page and the reader can read until they don't want to. Pdbailey (talk) 13:37, 8 July 2008 (UTC)
It seems that estimation theory started off from the point of view of signal processing and still has much of that flavour (and possibly notation). I think that "estimation theory" is a better title for a statistics-based article than "estimator", so it may be most appropriate to move to a dual-article situation, having "estimation theory (signal processing)" and "estimation theory (statistics)", or some such, with much (all) of what is in "estimator" moved into the latter. Incidentally, I think the same problem of having things started from a signal processing POV has arisen elsewhere, in particular for cross-correlation, which may also benefit from splitting off the signal processing POV. However, I think I saw in some of the talk pages that someone was keen to make a firm distinction between "estimator" and "estimation", so it might be best to move this discussion to the articles' talk pages as the next step. Melcombe (talk) 08:55, 8 July 2008 (UTC)

I just looked at the "what links here" for estimation, estimator and estimation theory. They all appear to be well linked and even heavily linked via redirects. Isn't there a bot that fixes links to redirects? I ask because I was thinking of making estimation a redirect to estimator, but as I recall double redirects are a problem. Pdbailey (talk) 13:47, 8 July 2008 (UTC)

(copied from above in case missed) But article Estimation is essentially a disambiguation page (which might be cleaned up a bit but otherwise should remain), while Estimation (statistics) already redirects to "Estimation theory". Melcombe (talk) 13:44, 8 July 2008 (UTC) Melcombe (talk) 16:01, 8 July 2008 (UTC)
Melcombe, I really don't understand your comment. It doesn't claim to be a disambig page. If it is one, what is it disambiguating between? Why does it have links to it if it is a disambig page? Pdbailey (talk) 18:45, 8 July 2008 (UTC)
Well, I did say "essentially a disambiguation page": it at least tries to distinguish between maths and stats versions of "estimation", and there may be some other alternative meanings in the links under "see also". It doesn't have the disambig template, and an initial question is whether there is a need for a disambig page for "estimation"... it looks as if there could be. As to why there are links to the page ... you have history, laziness and bots as possible reasons. Melcombe (talk) 10:19, 9 July 2008 (UTC)

Proposed solution

I propose that we generate a new article that combines the three. I'd submit that we can start gentle and then include more math the farther down the page we go and have only one article, but am open to having two if that is not possible. I've created a stub at User:Pdbailey/Estimation. Pdbailey (talk) 18:10, 12 July 2008 (UTC)

I think you would be trying to cram too much into a single article, and that there is a lot more that is not yet mentioned. Probably you have not yet considered the largish number of closely associated topics for which there are already articles. And why start with "estimation theory"... why not start at "statistical theory" or "statistics" ...because it wouldn't be sensible to do so. Melcombe (talk) 09:18, 14 July 2008 (UTC)

User:Pdbailey/Estimation is so far a very very biased (pardon the pun) article, and fails to include anything like the simple definitions now found at estimation. Michael Hardy (talk) 17:24, 14 July 2008 (UTC)

Michael Hardy, the article is now just a copy and paste of the other articles. I was working on a lead, but gave up. Pdbailey (talk) 22:00, 14 July 2008 (UTC)

Question from Todo

"should Category:Systems of probability distributions and Category:Types of probability distributions be merged? " (Q by Btyner). Since I initiated these categories, I can say that I thought of "systems" as meaning things like the Pearson, Johnson, Burr systems (which are called systems in the literature) and others such as mixtures (concentrating on the system thing, rather than on individual probability distributions), while "types" was for other generic qualities or categories of distributions (or families of distributions) such as circular distributions, log-tailed distributions, location-shift, etc. These seem to be rather different ideas deserving of being treated separately. Melcombe (talk) 09:05, 14 July 2008 (UTC)

Fair enough--thanks for creating the categories! By the way I recently added Exponential family, Natural exponential family, Location-scale family, and Maximum entropy probability distribution to the "types" category, and Tweedie distributions to the "systems" category. I hope this is in accordance with the intended usage. Thanks again! Btyner (talk) 23:44, 15 July 2008 (UTC)


Does anyone else think Wikipedia would benefit from a statistics and/or probability portal? There are portals for algebra, analysis, category theory, cryptography, discrete math, geometry, topology, and set theory. Why not one for us? Btyner (talk) 23:52, 3 July 2008 (UTC)

I would weakly support this but I wouldn't know how to go about it. Perhaps something along the way could be considered by starting up a "topic list" for statistics that would be brief enough to go on a portal page but which would give a good indication of the range of things covered by statistics. Melcombe (talk) 12:41, 4 July 2008 (UTC)
Started Portal:Statistics. Let's all see what we can make of it! Btyner (talk) 22:08, 19 July 2008 (UTC)

Bayesian average

What should become of the article titled Bayesian average? It seems to be the same thing as a posterior expected value. In its present state, the article certainly needs work, but I'm wondering if there's something it should get merged into? Michael Hardy (talk) 03:44, 18 July 2008 (UTC)

It might be used as part of an expanded introduction in the article Empirical Bayes method but that would need quite some effort to make it fit (since that latter doesn't yet contain a "normal distribution" context). Another possibility is the presently extremely brief article Shrinkage (statistics) ... shrinking towards the mean is a standard sort of terminology. Melcombe (talk) 10:49, 18 July 2008 (UTC)

Normally distributed and uncorrelated does not imply independent

The merge proposals at Normally distributed and uncorrelated does not imply independent do not seem well thought-out. Maybe people here can add useful comments. Michael Hardy (talk) 22:45, 22 July 2008 (UTC)

Optimal classification

I'm not sure whether to consider Optimal classification to be within the scope of this WikiProject or not. But it's been nominated for deletion: Wikipedia:Articles for deletion/Optimal classification. It looks as if some of the people saying it should be deleted have no interest in or knowledge of the subject matter. Michael Hardy (talk) 15:21, 25 July 2008 (UTC)


I took a try at reorganizing Category:Statistics. Let me know if I need to undo anything or make additional changes. My goal was to have as few pages in the category, and assign pages to subcats. The subcats need TLC if anyone feels up to the task.

Also, what is the wikipolicy of round robin cats? I think that Category:Statistics, Category:Probability and statistics and Category:Probability should all be sub cats of each other -- at least until the cats are better organized?

G716 <T·C> 22:01, 25 July 2008 (UTC)

Please take this discussion to the section above. Btyner (talk) 14:53, 27 July 2008 (UTC)
Previous discussion now archived (but not much there anyway). Melcombe (talk) 09:29, 30 July 2008 (UTC)
According to Wikipedia:Categorization, cycles should usually be avoided. This seems like a situation in which {{See also}} should be used to avoid a category cycle. Stepheng3 (talk) 04:32, 29 July 2008 (UTC)
On the question of re-organising the statistics-related categories.... I would suggest this be done on the basis that categories are meant to be useful, with tidiness being much less important. While the present situation, in terms of articles directly under "Statistics", is a lot better than the 300 or so articles 2 or 3 months ago, there may now be rather too few. The same may apply to the categories directly under "Statistics", but to a lesser extent. We need a balance between the needs of those who know what they are looking for (i.e. would know the specific terminology to look for) and those who may know the type of thing they are looking for but not what it is called. As a next step I suggest adding back, as articles directly under "Statistics", a few (very few) articles on leading sub-topics, such as statistical inference, statistical graphics etc. Perhaps the article "Foundations of statistics" should be removed as it is only a rather grandiose name for part of statistics, whereas we all know that the true basis of statistics are the ideas about presenting information graphically. In terms of categories, I see that some were removed from directly under "Statistics" ... without necessarily revisiting those removed, can all here think whether there are any other obvious categories that should usefully be found directly under "Statistics"? Melcombe (talk) 09:29, 30 July 2008 (UTC)
If a biography is in a subcategory of Category:Statisticians by nationality, should it also be in Category:Statisticians? Seems that some pages are in one, or the other, or both; we should have some consistency.—G716 <T·C> 22:17, 25 July 2008 (UTC)
I think that articles can usefully be in both categories. Melcombe (talk) 09:29, 30 July 2008 (UTC)

"Cumulative density function"

You'd have to miss the point of the first two words in this absurd phrase completely in order not to see that they flatly contradict each other. But there it was in Rice distribution until I fixed it a few minutes ago. And I've seen it before. We need to search for it and expunge it. Michael Hardy (talk) 21:43, 28 July 2008 (UTC)

Hadamard variance

Hadamard variance was put up for speedy deletion. I removed the tag and gave it a category also used at Allan variance which may be wide of the mark. I'd appreciate it if someone from this project could take a look at it. Thanks. Ben MacDui 20:29, 2 August 2008 (UTC)

Some work has since been done on the article, some of it by me. But so far there's no actual statement of the definition. Michael Hardy (talk) 00:57, 3 August 2008 (UTC)
Thanks for this. There is an intro here if anyone is inspired to assist. Ben MacDui 08:25, 3 August 2008 (UTC)

chi-square again

I was wondering why the list of articles has so few X's when I saw in my dictionary of statistics "X²-statistic", and following this up elsewhere "X²-test" (Kendall and Stuart). K&S (1973) say "following recent practice, we write X² for the test statistic and reserve the symbol χ² for the distributional form ... Earlier writers confusingly wrote χ² for the statistic as well as the distribution." Presently, articles seem only to use χ², without mentioning X² at all (?). So, any thoughts on bringing the X² convention into play? It needs a mention somewhere at least, possibly using a redirect, but maybe the articles on chi-square tests, contingency tables etc., should be fully modified to reflect this usage (if X² actually does have a wide use). Melcombe (talk) 09:43, 5 August 2008 (UTC)

I'm not sure how much credence to put on conventions that old. [3] looks like it might be helpful, though I don't currently have journals access. Does anyone want to have a look at it?--Fangz (talk) 11:28, 5 August 2008 (UTC)
Well, my dictionary is a 2002 version and it references the book "The Analysis of Contingency Tables" (Everitt, 1992) for this topic. I don't have access to that, but I found "A Guide to Chi-Squared Testing" (Greenwood & Nikulin, 1996, Wiley) which does use X² for the test statistic, without calling it either X²-test or X²-statistic. Searching on-line is of little immediate use since X2 is a common text-replacement for the symbol chi-squared. Melcombe (talk) 13:26, 5 August 2008 (UTC)
I always thought exactly that: the "X" was used only because some typesetting cannot make a proper "χ". If so, I still am not sure what the best recommendation is. Universally using "χ" would certainly be appropriate, but as "X" is in common usage for whatever reason, I cannot see a strenuous objection to it either, as its usage here would mirror outside usage. Might a mention alongside the description using the χ symbol that demonstrates the "X" alternative make sense? Baccyak4H (Yak!) 13:41, 5 August 2008 (UTC)
An X is used in McCullagh and Nelder's "Generalized Linear Models", where it is called the "Pearson X² statistic" (see, e.g., page 34) or "Pearson's statistic" (page 121). I just edited deviance and realized I should probably link Pearson X-squared statistic to the appropriate article on chi-squared tests. Pdbailey (talk) 18:40, 8 August 2008 (UTC)
I guess I should also add that "Generalized Linear Models" is obviously typeset in TeX or some derivative typesetting system, and every symbol is selected to be just so. I have no doubt they fully intended an X and not a χ. Pdbailey (talk) 21:05, 9 August 2008 (UTC)
I've seen X² in a lot of places, and I don't think the reason is typographical. I believe the reason it is used in newer texts is that the Pearson statistic does not actually follow a χ² distribution, rather it approaches one asymptotically as the sample size goes to infinity. By contrast, the F statistic truly follows an F distribution, the t statistic a t distribution, and so on. Perhaps this discussion should be continued in Talk:Pearson's chi-square test. Perturbationist (talk) 14:14, 9 August 2008 (UTC)
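The distinction drawn in the last comment can be checked numerically. The sketch below is illustrative only and assumes the simplest possible setting: a two-category goodness-of-fit test on n binomial trials, where the Pearson statistic reduces to (k - np)² / (np(1 - p)) and the reference distribution is chi-square with 1 degree of freedom. The function names are hypothetical. It compares the exact tail probability of the discrete statistic with the chi-square tail at the nominal 5% critical value.

```python
from math import comb, erfc, sqrt

# Upper 5% point of the chi-square distribution with 1 degree of freedom
# (the square of the normal 97.5% quantile, about 1.96).
CRITICAL_5_PERCENT = 3.841458820694124


def chi2_1_upper_tail(c):
    """P(chi-square with 1 df > c), using chi2_1 = Z**2 for standard normal Z."""
    return erfc(sqrt(c / 2.0))


def exact_pearson_upper_tail(n, p, c):
    """Exact P(X^2 >= c) when the count k is Binomial(n, p).

    For two categories, Pearson's statistic is
    X^2 = (k - n*p)**2 / (n*p*(1 - p)).
    """
    total = 0.0
    for k in range(n + 1):
        x2 = (k - n * p) ** 2 / (n * p * (1 - p))
        if x2 >= c:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


if __name__ == "__main__":
    print("chi-square tail:", chi2_1_upper_tail(CRITICAL_5_PERCENT))
    for n in (10, 100):
        print(n, exact_pearson_upper_tail(n, 0.5, CRITICAL_5_PERCENT))
```

For n = 10 and p = 1/2 the exact tail is 22/1024, roughly 0.021, against the nominal 0.05 given by the χ² distribution, which is the kind of gap that motivates keeping X² for the statistic and χ² for its limiting distribution.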