Wikipedia talk:WikiProject Statistics/Archive 3

Statistics portal listed for peer review

I've listed the statistics portal for peer review here. G716 <T·C> 04:55, 22 May 2009 (UTC)

Fisher consistency

The introductory section of Fisher consistency is as opaquely and clumsily written as anything you'll see. I am still somewhat uncertain how best to rewrite it. Michael Hardy (talk) 16:14, 25 May 2009 (UTC)

Is this a different concept than Consistent estimator? —3mta3 (talk) 16:44, 25 May 2009 (UTC)
These slides by Steffen Lauritzen make clear there is a difference (slide 8): the sample variance is consistent for the population variance whether you use n or n–1 in the denominator, but only the former is Fisher consistent (while only the latter is unbiased in finite samples, of course). Robust Statistical Methods with R gives the same example on p. 11 (Google Books). Someone (maybe even me) should put this example in the article. Qwfp (talk) 19:19, 25 May 2009 (UTC)

I seem to recall that Fisher's way of writing about it made me wonder if he himself appreciated that it wasn't obvious that they're the same thing, and it seemed quite possible that they were not. I'm not quite sure the issue of n versus n − 1 is really a counterexample in one of the senses in which Fisher might reasonably have intended it, since he might have been thinking of an infinite population with a continuous distribution, and the ratio of n to n − 1 approaches 1.

I am suspicious of the article's assertion about the way in which the concept is "most often used".

And what can we make of the following words?

the probability it will match along the lines of interest, in terms of probability curves

That's one of the worst pieces of writing I've seen in a while. Michael Hardy (talk) 21:13, 25 May 2009 (UTC)

I made some edits to the body, but not the introduction. I hope it's an improvement. Skbkekas (talk) 00:50, 26 May 2009 (UTC)
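A minimal sketch of the n versus n − 1 example discussed above, assuming a hypothetical finite population in which each listed value is equally likely, so that the "population distribution" is itself a discrete distribution. Fisher consistency asks what an estimator returns when it is applied to the population distribution rather than to a sample. The divisor-n version is a function of the empirical distribution alone and returns the population variance exactly; the divisor-(n − 1) version does not, and even changes when every value's multiplicity is doubled. The function names are illustrative only.

```python
def plug_in_variance(xs):
    """Variance of the empirical distribution of xs (divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def unbiased_variance(xs):
    """Usual sample variance (divisor n - 1), unbiased under random sampling."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical population: four equally likely values, population variance 1.25.
population = [1.0, 2.0, 3.0, 4.0]

# Apply each estimator to the population distribution itself.
print(plug_in_variance(population))   # 1.25, the population variance
print(unbiased_variance(population))  # 5/3, not the population variance

# Doubling every multiplicity leaves the distribution unchanged, so an
# estimator that is a functional of the distribution must give the same value.
print(plug_in_variance(population * 2))   # still 1.25
print(unbiased_variance(population * 2))  # 10/7: depends on n, not only on F
```

Since n/(n − 1) → 1, both versions are consistent in the ordinary large-sample sense, which is the point about the limiting ratio raised above.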

Question, your consideration

Regarding ANOVA analysis: given that a first-level analysis (a t-test) is two-dimensional, and that a second-level ANOVA cube analysis is a functional consideration, I assume that the distance across analysis in indirectly correlatable factors would be reliant on either an assumed distance within three dimensions or a higher-order dimensional factor.

My first question is this: does ANOVA depend on higher-order dimensional relationships as expressed through set logic, or on three-dimensional distance, as expressed across three standard deviations, as an expression of accuracy reliant on three-dimensional order?

I have seen SPSS's ANOVA analysis and assume it is possible to define the analysis as, essentially, set logic (apparently a higher-order dimensional analysis), but would assume other software of greater quality would be able to correlate the expression of ideal three-dimensional error relationships between bodies as a factor of total difference, again in terms of a bell curve expressed across a three-dimensional map, whilst the portrayal of such error relationships may be best expressed in terms of a simple spectrum (direct line analysis). I assume the difference between subjects in scalar terms relative to the ideal population would be expressed in relation to a parallel ANOVA of accuracy; in this, the expression of two-way exceptions may be useful in highlighting worthwhile sets.

Does anyone know of a project that has worked on such a package or intends to produce one? Can someone advise me on best principles for the formulaic construction of SPSS set logic? —Preceding unsigned comment added by FirmBenevolence (talk · contribs) 02:51, 4 June 2009 (UTC)

category proposal

Would a category "Statistical metaphors" be useful? Possible articles to belong here might be

But I'd like to get others' opinions on whether you think such a category would be useful/have a well-defined scope, etc. Thanks, Btyner (talk) 02:17, 30 May 2009 (UTC)

This may fall afoul of Wikipedia:OC#SHAREDNAMES (or at least the principle behind it, that features of names are generally not a good basis for categorisation). Why not create a list instead? -- Avenue (talk) 01:46, 1 June 2009 (UTC)

abhorrent statistics

Regarding Fisher consistency: I believe it is relevant to noxious anti-cancer drugs, Freud's endemic, and genetic feedback.

Sig FirmBenevonce Start End Document. —Preceding unsigned comment added by FirmBenevolence (talk · contribs) 03:22, 4 June 2009 (UTC)

Another article considered for deletion

A probability/metrics type article is up for deletion. Please see Lukaszyk-Karmowski metric and comments at Wikipedia:Articles for deletion/Lukaszyk-Karmowski metric. An existing suggestion is that there may be something here worth keeping if someone can suggest a broader article into which it might be merged, so any ideas? Melcombe (talk) 09:20, 4 June 2009 (UTC)

Statistics Portal nominated as featured portal candidate

I have nominated the statistics portal to be a featured portal here. G716 <T·C> 19:02, 7 June 2009 (UTC)

Dormant article

Please consider article Interclass dependence. It seems meaningless to me and isn't linked by any real article. My dictionary says that "Interclass correlation" is the same as an ordinary correlation, which is essentially what Interclass correlation says. So is there anything to give a meaningful definition for "Interclass dependence" ... perhaps some maths to say what variables are being considered? Melcombe (talk) 09:13, 29 April 2009 (UTC)

Interclass dependence has been PRODed—G716 <T·C> 02:27, 12 June 2009 (UTC)

Number of articles in WikiProject Statistics

Some statistics on the number of articles in the project and their "cleanliness" from Wikipedia:WikiProject Statistics/Cleanup listing.

Date | Number of articles tagged with {{WPStatistics}} | Number (%) of articles requiring cleanup
14 July 2008 | 477 | 129 (27.0%)
8 October 2008 | 1258 | 289 (23.0%)
24 February 2009 | 1390 | 326 (23.5%)
6 March 2009 | 1405 | 339 (24.1%)
4 June 2009 | 1575 | 325 (20.6%)

G716 <T·C> 01:07, 9 June 2009 (UTC)
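The percentage column can be rechecked from the two counts in each row. A small sketch, with the counts copied from the table; the function name is illustrative:

```python
# (date, articles tagged with {{WPStatistics}}, articles requiring cleanup)
snapshots = [
    ("14 July 2008", 477, 129),
    ("8 October 2008", 1258, 289),
    ("24 February 2009", 1390, 326),
    ("6 March 2009", 1405, 339),
    ("4 June 2009", 1575, 325),
]


def cleanup_percent(tagged, needing_cleanup):
    """Share of tagged articles needing cleanup, as a percentage to one decimal place."""
    return round(100 * needing_cleanup / tagged, 1)


for date, tagged, cleanup in snapshots:
    print(f"{date}: {cleanup_percent(tagged, cleanup)}%")
```

Rounded to one decimal place this reproduces the table's figures; the drop to 20.6% in June reflects both more tagged articles (1575 against 1405 in March) and slightly fewer needing cleanup (325 against 339).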

Numerical smoothing and differentiation

This has been a member of Category:chemistry stubs for a while. I just added it to Category:statistics. Any thoughts? Btyner (talk) 01:52, 8 June 2009 (UTC)

Good work on linking it to statistics. It looks like a nice article, which will be useful for computational statistics. [Care should be taken with the polynomial fitting, because I couldn't find "spline" (or "local" model) there, although the language of "successive sets" gives some implicit wiggle-room. Specifying differentiability or polynomials can be problematic if the real need is Lipschitz- or Hölder-continuity; also rational splines are attractive alternatives to polynomial splines. It could use statistical references (Grace Wahba, I. J. Good on penalized likelihood, etc.) and links to generalized additive models (GAM).] Again, good work on linking it to statistics! Kiefer.Wolfowitz (talk) 16:30, 8 June 2009 (UTC)
There is not really any statistical content in the article as it stands, although it does have "least squares fitting" and it does mention "noise". The context is fairly similar to curve fitting and I have just now added it into the corresponding categories of Category:Numerical analysis and Category:Interpolation. As for stats categories, I would suggest "computational statistics" and "regression analysis". However, the article's concentration on the equally-spaced x-values case brings to mind the work in one of Kendall's books on finding coefficients for symmetric and non-symmetric moving averages for trend-fitting. There is some stuff in Kendall & Stuart's Vol. 3 (Chapter 46) with tables of weights, and I think one of Kendall's other books had tables covering end-points. So there may be some point in having it under a time-series analysis category. Melcombe (talk) 09:11, 9 June 2009 (UTC)
Why are these two concepts grouped together? Should they be split and merged into more mature articles? I have to imagine that applied math has an article on numerical differentiation and that somebody added a smoothing spline type article somewhere, right? PDBailey (talk) 21:36, 9 June 2009 (UTC)
In principle I agree, but this article was in some sense appropriated from the chemistry space, so I would hesitate to do this. Ideally, this article would focus on the use of smoothing, interpolation, and numerical differentiation in the analysis of chemistry data, and leave the technical and foundational issues to the more mature mathematical and statistical articles. Skbkekas (talk) 13:58, 10 June 2009 (UTC)
A possibly valid point, but there is little chemistry background at present in the article. So that the chemistry link is not overlooked I have added the article to Category:Mathematical chemistry and Category:Cheminformatics, which seem likely to be relevant. In addition, I have moved it from Category:Statistics to Category:Regression analysis. Melcombe (talk) 11:22, 11 June 2009 (UTC)

Well there are no chemistry related links to it[1]. In fact, most links seem due to its appearance on {{Least Squares and Regression Analysis}}.—3mta3 (talk) 12:24, 11 June 2009 (UTC)

It earlier had a chemistry stub tag. There is a related article at Savitzky–Golay smoothing filter which contains (only) chemistry references. X-refs have been put in the articles. One of them has a link to a Dutch wikipedia article which contains more stuff (in Dutch). Melcombe (talk) 08:49, 16 June 2009 (UTC)
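Because the article concentrates on equally spaced x-values, the local least-squares polynomial fit collapses to fixed convolution weights, as in the Savitzky–Golay filter mentioned above. A minimal sketch, assuming a 5-point window and a local quadratic fit; the weights used are the standard tabulated Savitzky–Golay values for that case, and the end points are simply left unestimated rather than handled with special end-point weights of the kind mentioned above. The function name is illustrative.

```python
# Standard 5-point Savitzky-Golay weights for a local quadratic fit.
SMOOTH_WEIGHTS = (-3, 12, 17, 12, -3)   # divide by 35
DERIV_WEIGHTS = (-2, -1, 0, 1, 2)       # divide by 10 * h


def savgol_5pt(y, h=1.0):
    """Smoothed values and first derivatives for equally spaced data y with spacing h.

    Only interior points are estimated; the first two and last two entries
    are left as None because no end-point weights are used here.
    """
    n = len(y)
    smoothed = [None] * n
    deriv = [None] * n
    for i in range(2, n - 2):
        window = y[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SMOOTH_WEIGHTS, window)) / 35
        deriv[i] = sum(w * v for w, v in zip(DERIV_WEIGHTS, window)) / (10 * h)
    return smoothed, deriv


# A local quadratic fit reproduces a quadratic exactly, so y = x^2 is a simple check.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
smoothed, deriv = savgol_5pt(ys)
print(smoothed[2:5], deriv[2:5])  # [4.0, 9.0, 16.0] [4.0, 6.0, 8.0]
```

Noisy data would be smoothed rather than reproduced; the exact-quadratic case just confirms that the weights implement the local polynomial fit.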

Smoothing categories

Where should smoothing articles be categorized? Below are some example smoothing articles and the currently assigned categories. Are there enough smoothing articles for a separate "Statistical smoothing" category?

Article | Category/ies
Additive smoothing | Statistical natural language processing
Kernel smoother | Non-parametric statistics
Local regression | Regression analysis
Numerical smoothing and differentiation | Statistics
Smoothing spline | Regression analysis, Splines, Statistical methods
Smoothing | Statistical charts and diagrams, Data analysis, Time series analysis, Signal processing

G716 <T·C> 05:21, 9 June 2009 (UTC)

There should also be something under a spatial analysis category, although it might be represented as something like "non-exact interpolation", possibly under "kriging". Melcombe (talk) 09:11, 9 June 2009 (UTC)

Let's not forget

Article | Category/ies
Savitzky–Golay smoothing filter | Filters (should be Filter theory!)

seeing as, at the moment, that is what the article describes (and no more than that).

We should probably also be considering smoothers based on radial basis functions, Gaussian processes, ARMA processes and local Fourier fits, and Tikhonov-style regression/regularization, including regression to differential equations.

Of course once past the surface there are common underlying structures that can be found between all of these. Jheald (talk) 21:08, 11 June 2009 (UTC)

A further different type of "filter" to which "smoothing" applies is of course a Kalman filter ... the article seems to have something about a fixed-lag smoother but possibly not the full forward-then-backward-pass smoothing. Melcombe (talk) 13:36, 15 June 2009 (UTC)
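For comparison with the fixed-lag smoother, a minimal sketch of the full forward-then-backward pass in the simplest setting, assuming a scalar local-level model (a random walk x_t observed as y_t = x_t + noise) with hypothetical noise variances q and r and a vague prior on x_0. The backward pass is the Rauch–Tung–Striebel recursion; the function name is illustrative and nothing here is taken from the Kalman filter article itself.

```python
def kalman_rts(ys, q, r, m0=0.0, p0=1e6):
    """Forward Kalman filter, then backward Rauch-Tung-Striebel pass, for
    x_t = x_{t-1} + w_t (variance q) and y_t = x_t + v_t (variance r).

    Returns filtered means/variances and smoothed means/variances.
    """
    n = len(ys)
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    for t, y in enumerate(ys):
        # Predict: the prior on x_0 is used as-is; later steps add process noise.
        if t > 0:
            p = p + q
        m_pred.append(m)
        p_pred.append(p)
        # Update with observation y_t.
        k = p / (p + r)
        m = m + k * (y - m)
        p = (1 - k) * p
        m_filt.append(m)
        p_filt.append(p)

    # Backward pass: start from the last filtered estimate and work back,
    # so every smoothed estimate uses all the observations.
    m_smooth = m_filt[:]
    p_smooth = p_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
        p_smooth[t] = p_filt[t] + g * g * (p_smooth[t + 1] - p_pred[t + 1])
    return m_filt, p_filt, m_smooth, p_smooth


ys = [1.0, 1.3, 0.8, 1.1, 1.2, 0.9]   # hypothetical noisy observations
m_f, p_f, m_s, p_s = kalman_rts(ys, q=0.01, r=0.1)
```

The last smoothed estimate equals the last filtered one, and at every earlier time point the smoothed variance is no larger than the filtered variance.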

"Revising opinions in statistics" proposed for deletion

Please see the discussion at Wikipedia:Articles for deletion/Revising opinions in statistics. Don't just say Keep or Delete or Redirect or whatever; give your arguments and answer others' points. Michael Hardy (talk) 03:15, 13 June 2009 (UTC)

In other news, edit summaries are also helpful. PDBailey (talk) 13:34, 13 June 2009 (UTC)

Another article to consider

I have come across the article (a,b,0) class of distributions, which has been untouched for some time and has essentially no articles linking to it. Questions are: Is there a better name for this? Is it just a special case of something else? Melcombe (talk) 09:04, 16 June 2009 (UTC)

Never heard of it before, but it appears to be the same thing as Panjer recursion#Claim number distribution. Merge? Note that Panjer class already redirects to this section. Google reveals that Panjer defined an (a,b,1) class too, but Wikipedia appears to have nothing on that at present. Qwfp (talk) 10:28, 16 June 2009 (UTC)
I've just formally proposed a merger. See Talk:Panjer recursion#Merger proposal. It appears the original authors of both articles are still editing, if not every day, so I'll give it a week or two to see if they can help as this really isn't my area. Qwfp (talk) 11:09, 16 June 2009 (UTC)
This has prompted me to look in Johnson, Kotz & Kemp's book on Univariate Discrete Distributions ... the "(a,b,0) class of distributions" seems to be identical to what they call the "Katz family" due to Katz's work over 1945-65, and which is a special case of a family worked on by Carver 1919-25. Ord's book "Families of Frequency Distributions" covers the general case of Carver and I think identifies the "(a,b,0) class of distributions" as a special case which he calls type III without associating anyone's name with it. It seems there may be enough known about these distributions to have results for moments and estimation in an article on the distribution. Melcombe (talk) 12:25, 16 June 2009 (UTC)
I have copied the above to the merge discussion, but having it here also might raise a little interest. Melcombe (talk) 12:29, 16 June 2009 (UTC)
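The defining property of the (a,b,0) class (the Panjer class) is the recursion p_k = (a + b/k) p_{k−1} for k = 1, 2, .... A minimal sketch checking the recursion against the closed-form Poisson, binomial and negative binomial probabilities; the parameter values are arbitrary and the function name is illustrative.

```python
import math


def ab0_pmf(a, b, p0, kmax):
    """Return [p_0, ..., p_kmax] built from p_k = (a + b / k) * p_{k-1}."""
    probs = [p0]
    for k in range(1, kmax + 1):
        probs.append((a + b / k) * probs[-1])
    return probs


# Poisson(lam): a = 0, b = lam, p_0 = exp(-lam).
lam = 2.0
poisson = ab0_pmf(0.0, lam, math.exp(-lam), 10)

# Binomial(n, p): a = -p/(1-p), b = (n+1)p/(1-p), p_0 = (1-p)^n.
n, p = 5, 0.3
binomial = ab0_pmf(-p / (1 - p), (n + 1) * p / (1 - p), (1 - p) ** n, n)

# Negative binomial (failures before the r-th success, failure probability q):
# a = q, b = (r-1)q, p_0 = (1-q)^r.
r, q = 3, 0.4
negbin = ab0_pmf(q, (r - 1) * q, (1 - q) ** r, 10)

# Compare with the closed-form probabilities.
for k in range(11):
    assert math.isclose(poisson[k], math.exp(-lam) * lam ** k / math.factorial(k))
    assert math.isclose(negbin[k], math.comb(k + r - 1, k) * (1 - q) ** r * q ** k)
for k in range(n + 1):
    assert math.isclose(binomial[k], math.comb(n, k) * p ** k * (1 - p) ** (n - k))
```

The sign of a separates the three cases (negative for binomial, zero for Poisson, positive for negative binomial), which is one reason the class is usually described by its recursion rather than by a single formula.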

Is it wrong if PCA (or PCR) comes out with some negative weightings for sensors with positive physical quantities?

An answer or referall to an expert appreciated:

Is it automatically unphysical to have a PCA reconstruction that has some stations negatively weighted? Would think that it could occur for both degeneracy and anticorrelation with the average (actual physical effects). Of course the summation must be positive, but is it automatically wrong if some of the stations have negative weights?

This is being debated on these blog threads. Unfortunately, the debate has muddled particular examination of the Steig Antarctic PCA-based recon with general absolute claims that negative weightings are bad, bad, bad.

Could you please adjudicate?

See here: (talk) 17:26, 19 June 2009 (UTC)

Try asking at one of the reference desks, mathematics probably would be best. Baccyak4H (Yak!) 17:39, 19 June 2009 (UTC)
Thank you. Have complied. (talk) 18:11, 19 June 2009 (UTC)
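A minimal sketch of the mechanism the question suggests, using two hypothetical, strictly positive station series that move in opposite directions: the leading principal component of their covariance matrix has loadings of opposite sign, even though every observation is positive. This illustrates only that negative weights can arise from anticorrelation; it says nothing either way about the specific Antarctic reconstruction being debated. The function names and data are illustrative.

```python
import math


def covariance_2x2(xs, ys):
    """Sample covariance matrix entries (var x, var y, cov xy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    b = sum((y - my) ** 2 for y in ys) / (n - 1)
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return a, b, c


def leading_pc(a, b, c):
    """Unit eigenvector for the largest eigenvalue of [[a, c], [c, b]], assuming c != 0.

    The overall sign of an eigenvector is arbitrary; only the relative sign
    of the two loadings is meaningful.
    """
    lam = (a + b) / 2 + math.sqrt(((a - b) / 2) ** 2 + c ** 2)
    vx, vy = c, lam - a
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


station_1 = [10.0, 12.0, 14.0, 16.0]   # hypothetical, all positive
station_2 = [20.0, 18.0, 16.0, 14.0]   # hypothetical, all positive, anticorrelated
w1, w2 = leading_pc(*covariance_2x2(station_1, station_2))
print(w1, w2)  # loadings of opposite sign
```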

New templates

New and revised templates have been made at Template:Experimental design and Template:Least squares and regression analysis. Comments have been requested on these specifically. See the templates' discussion pages. Melcombe (talk) 14:35, 25 June 2009 (UTC)

Is this a good article?

Does anyone have an opinion on if the article on Statistics Online Computational Resource is a good idea? It appears to have been created by someone from that group, and reads a bit like advertising to me. Is conflict of interest a problem here? PDBailey (talk) 02:36, 21 June 2009 (UTC)

Well, it's certainly not a WP:Good Article, but it's short and factual and I see no obvious issues with WP:NPOV or WP:peacock terms, and it's hard to be sure whether there's a WP:COI.
WP:Notability (web) is perhaps more of an issue though; the refs are clearly not independent of the subject and I can't quickly find anything that is, with the notable(?) exception of two paragraphs in Science magazine. I'm not sure it's doing any real harm myself, but up to you if you want to take it to WP:AfD, or perhaps even try WP:CSD#A7. Qwfp (talk) 14:33, 21 June 2009 (UTC)
This university's computational department has (for many years) provided on-line documentation of statistical computing procedures, like examples of SAS. I assume that this is a continuation of UCLA's commendable efforts. (Whether this page belongs on Wikipedia is a question for Wikipedia mavens.) Kiefer.Wolfowitz (talk) 15:28, 21 June 2009 (UTC)
The world is not exactly overrun with free statistical software, but the lack of third-party sources is a concern. I would not nominate this for deletion at present, but 'Whatlinkshere' shows that this package is being promoted in other articles and lists, and its linkage should probably not expand beyond its present limits. Anyone who has the patience to read some of the Software Reviews columns in the American Statistician can probably form an opinion of which software packages are the most widely used. If SOCR has been reviewed in that journal it could provide a third-party reference for this article. EdJohnston (talk) 19:27, 26 June 2009 (UTC)
There does not appear to be a review there, but thanks for the suggestion! (Searching the acronym returned nothing; searching the full title revealed articles that are not on this topic.) But that could be because it is relatively new. However, the article was started by someone apparently from UCLA who I came to know after she/he linked about 30 pages to SOCR. PDBailey (talk) 21:26, 26 June 2009 (UTC)

Statistics Reference Desk?

I notice we get stats questions from time to time on this page. Anyone think a formal reference desk would be useful/desirable, in the style of Wikipedia:Reference desk/Mathematics? Btyner (talk) 01:03, 27 June 2009 (UTC)

There's lots of stats questions at the math reference desk and they seem to be handled there well enough. If people are asking such questions here instead, they're just asking at the wrong place. (talk) 02:43, 27 June 2009 (UTC)
I would agree with that, that the math refdesk does a reasonable job. Baccyak4H (Yak!) 03:36, 27 June 2009 (UTC)

n = 1 fallacy

n = 1 fallacy is a fairly new article that might benefit from some work by the denizens of this project. It's also somewhat orphaned: only two non-list articles link to it. Michael Hardy (talk) 22:50, 24 June 2009 (UTC)

Seems very close to Ecological fallacy. Btyner (talk) 23:22, 24 June 2009 (UTC)
I've always known this as a 'unit of analysis error' (e.g. this glossary entry). I think it's distinct from ecological fallacy, though related. Qwfp (talk) 10:05, 25 June 2009 (UTC)
You're right, they are distinct but definitely related. Consider individuals nested in populations:
(1) Population A gets treatment X, population B gets treatment Y, but the nesting hinders inference on the effect of treatment on individual outcomes.
(2) Population A has infection rate X, population B has infection rate Y, but the nesting prevents us from saying what caused the difference between X and Y, whether we gave them different treatments or not.
Btyner (talk) 00:49, 26 June 2009 (UTC)
I have added it as a "see also" into Hypothesis which already had Ecological fallacy as a "see also". Perhaps something could be done to draw them both into the main part of that article. Melcombe (talk)
I've just looked up the primary reference on the web (doi:10.1016/, and most of the text of the wikipedia page appears to be a copyright violation of Box 1 of this article. There's perhaps a conflict of interest too, as the user name of the creator is similar to that of the first author of the paper. If they're the same person that doesn't avoid the copyright problem as they signed over the copyright to the publisher, Elsevier. I'll remind myself of the policy at WP:COPYVIO before taking appropriate action. Qwfp (talk) 17:24, 26 June 2009 (UTC)
Note: copyright issue potentially resolved now. See article's Talk. Melcombe (talk) 09:30, 29 June 2009 (UTC)

redirect request

Could someone please redirect Pitman-Yor process to Pitman–Yor_process? Thanks. (talk) 02:42, 27 June 2009 (UTC)

I can't find another example of it being set with an m-dash, why not just move the article? PDBailey (talk) 13:09, 27 June 2009 (UTC)
I think it is an en-dash, which apparently is correct according to MOS:ENDASH. Anyway, redirect created. —3mta3 (talk) 14:21, 27 June 2009 (UTC)

Bayesian average

Bayesian average is an extraordinarily badly written article, abusive to the reader. Is it salvageable at all? Can someone help? See my comments at talk:Bayesian average. Michael Hardy (talk) 06:04, 30 June 2009 (UTC)

Nuisance variables and nuisance parameters

Prior to March 2009 these two terms both redirected to the same article, viz Nuisance variable [2].

My understanding is that "nuisance variable" is most often just a Bayesian variant usage of "nuisance parameter", since to a Bayesian there is no great distinction between variables and parameters.

Either way they are quantities that are fundamental to the probabilistic model, but which are of no particular interest in themselves (or no longer of interest), yet must be taken into account in any analysis of the quantities which are of interest.

Melcombe (talk · contribs) has insisted the concept be split into two different articles (and added an example where there may be "nuisance variables" that from a frequentist perspective genuinely may be "variables" rather than "parameters" -- eg past values in a time-evolving process); but to me it makes no sense to separate out two such tightly related notions (in fact from a Bayesian perspective indistinguishable notions) into separate articles.

IMO it makes much more sense to have one article, with a note that some people distinguish "nuisance parameters" from "nuisance variables" and other people don't.

What do people here think? Jheald (talk) 17:22, 30 June 2009 (UTC)

Discussion is at Talk:Nuisance parameter#Merge Nuisance parameter and Nuisance variable. Melcombe (talk) 09:12, 1 July 2009 (UTC)

Western Electric Rules

On the page "Western Electric Rules", Rule 4 is stated as follows: "Nine consecutive points fall on the same side of the centerline (in zone C or beyond)"
I have a hunch that should be corrected to say, "Eight...". It seems like every web page I visit that mentions Western Electric rules, says that rule 4 is violated if just eight consecutive points fall on the same side of the centerline.-- (talk) 13:09, 3 July 2009 (UTC)

I've copied your question to Talk:Western Electric rules as I don't recognise any of the editors of that page as being members of WikiProject Statistics, and added a {{dubious}} tag to Rule 4 in the article. The edit history shows this has been altered several times. Looks like Richard Argentieri (talk · contribs) first "updated" it to nine and also created the image that shows nine, but that was in 2006 and he's not edited since. Qwfp (talk) 14:47, 3 July 2009 (UTC)
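Since sources disagree on whether rule 4 uses eight or nine points, a minimal sketch that takes the run length as a parameter rather than settling the question. It assumes that a point exactly on the centerline breaks a run; the function name and data are hypothetical.

```python
def same_side_run_violations(points, centerline, run_length):
    """Return indices of points that end a run of at least `run_length`
    consecutive points on the same side of the centerline.

    A point exactly on the centerline is assumed to break any run.
    """
    hits = []
    run = 0
    side = 0  # +1 above, -1 below, 0 on the line
    for i, x in enumerate(points):
        s = (x > centerline) - (x < centerline)
        if s != 0 and s == side:
            run += 1
        else:
            side = s
            run = 1 if s != 0 else 0
        if run >= run_length:
            hits.append(i)
    return hits


# Eight points above a centerline of 10, then one below.
data = [11] * 8 + [9]
print(same_side_run_violations(data, 10, 8))  # [7]
print(same_side_run_violations(data, 10, 9))  # []
```

The same data trip the "eight" version of the rule but not the "nine" version, which is why the discrepancy in the article matters in practice.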

Emission Probability

This seems a candidate to be a redirect to Hidden Markov model or moved to the wiki-dictionary. Any comments? —G716 <T·C> 00:25, 29 June 2009 (UTC)

Looks like it should just be deleted. Seems to be just someone defining a variable-name that happens to be in a bit of computer code in Hidden Markov model. Doesn't seem like a terminology that would be generally used. Melcombe (talk) 08:46, 29 June 2009 (UTC)
It's more than just a name that happened to be used in some computer code. It's a standard term widely used in books and papers on Hidden Markov Models. However, I agree that it does not merit its own article and can simply be defined in the Hidden Markov Model article. Skbkekas (talk) 18:41, 30 June 2009 (UTC)
I PRODed the article with a link here —G716 <T·C> 13:11, 10 July 2009 (UTC)

References for "logarithmic distribution"

Logarithmic distribution lacks references (and has just been so tagged). I think we'll find various things written by Fisher between maybe about 1930 and 1960. Maybe I'll look at this on Monday if no one beats me to it. Michael Hardy (talk) 18:40, 5 July 2009 (UTC)

Couldn't resist the challenge. The first description appears to be a 1943 article by Fisher, which I've added as a reference. In the process I discovered this is also described at relative species abundance#Logseries (Fisher et al 1943)[14], so I quickly added a wikilink from there, but could do with a bit more editing (and maybe move the citations from the section headings, which I assume is against WP:HEAD and makes them most awkward to link to). Time I stopped editing for today though. Qwfp (talk) 21:43, 5 July 2009 (UTC)
Google Scholar finds 33 articles with the phrase "logarithmic distribution" up to 1943, including articles by Benford and Yuan, the last referencing earlier work by Galton, etc. The reference volumes on continuous univariate distributions by Johnson, Kotz, and Balakrishnan (2nd edition 1994+) may be helpful. Kiefer.Wolfowitz (talk) 12:59, 6 July 2009 (UTC) WARNING: My suggestions were apparently unhelpful! Kiefer.Wolfowitz (talk) 17:23, 6 July 2009 (UTC)
Johnson et al. are not that helpful ...they indicate a paper in German in Biometrika of 1934 as using the terms of the distribution for something, but they don't say that they use the name "logarithmic distribution". They refer to workers on the "log-series distribution", without giving specific refs (except to later papers), as leading to Fisher et al. (1943). I am unable to see the Yuan ref above, but care would be needed that it is actually the same distribution. Melcombe (talk) 14:32, 6 July 2009 (UTC)
The paper by Benford is on Benford's law. The paper (an entire doctoral dissertation in fact) by Yuan is about the log-normal distribution. Unfortunately nomenclature isn't even standardised now: MathWorld calls this the log-series distribution and its entry for "logarithmic distribution" is about a continuous distribution on a finite interval with density proportional to log x, but notes that Benford's law and the log-series distribution are also sometimes known as the logarithmic distribution. Maybe we should add a similar note too. Fisher's 1943 paper seems to be the seminal paper, if not actually the first. I've removed the word "first" from the article, and added "log-series distribution" as another alternative name. Qwfp (talk) 17:09, 6 July 2009 (UTC)
Johnson et al. try to make a distinction between the log-series distribution and the logarithmic distribution as used here, but they seem unclear exactly what this is. For completeness, their 1934 reference is ... Lüders R (1934) Die Statistik der seltenen Ereignisse, Biometrika, 26, 108-128 ... can someone check if this should be cited? Melcombe (talk) 08:38, 7 July 2009 (UTC)

(outdent) I can see a few pages of Johnson et al. on Google Books. As far as I can see, the point they're making about "log-series" vs "logarithmic" is that Fisher's 1943 paper was talking about a frequency distribution, not a probability distribution, so the normalization is determined by the total number rather than fixed so that the pmf sums to one. Which seems a pretty minor distinction to me, and they go on to say that since the late 1970s the two terms have generally been used interchangeably to mean the probability distribution. I can access the Lüders paper (another doctoral dissertation – this one only 21 pages! Times have changed...) but I can't understand it as I don't read German, and it doesn't even have an English abstract. Google Translate isn't a great deal of help (due in part to OCR errors, to be fair), though it does at least translate the title as "The Statistics of Rare Events". I can't entirely follow Johnson et al.'s paragraph mentioning this paper, though it sounds to me that though Lüders used the terms of a log-series, he didn't interpret them as a probability distribution; Quenouille drew on his work in a way that did, but not until 1949. Qwfp (talk) 19:49, 7 July 2009 (UTC)
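The frequency-versus-probability distinction described above can be made concrete: Fisher's expected frequencies α x^k / k sum to S = −α ln(1 − x) species, and dividing by S gives a probability mass function that sums to one. A minimal sketch with arbitrary illustrative values of α and x; the function names are hypothetical.

```python
import math


def logseries_pmf(k, p):
    """P(K = k) for the logarithmic (log-series) distribution, k = 1, 2, ..."""
    return -p ** k / (k * math.log1p(-p))


def fisher_expected_species(k, alpha, x):
    """Fisher-style expected number of species represented by k individuals."""
    return alpha * x ** k / k


alpha, x = 5.0, 0.9                        # arbitrary illustrative values
total_species = -alpha * math.log1p(-x)   # S = -alpha * ln(1 - x)

# Dividing Fisher's frequencies by S gives the probability version.
for k in range(1, 6):
    print(k, fisher_expected_species(k, alpha, x) / total_species, logseries_pmf(k, x))
```

The two columns agree for every k, so the normalization is the only difference between the two readings of the distribution.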

Regression articles discussion July 2009

A discussion of content overlap of some regression-related articles has been started at Talk:Linear least squares#Merger proposal but it isn't really just a question of merging and no actual merge proposal has been made. Melcombe (talk) 11:17, 14 July 2009 (UTC)

Barnes interpolation

I think the article on Barnes interpolation is in need of improvement. The article uses a lot of variables without mentioning their meaning, and the explanation of the method offered in the text is also weak.

I would help, but I wouldn't need the wiki page if I knew the method myself :( —Preceding unsigned comment added by (talk) 15:19, 16 July 2009 (UTC)

I've copied the above comment to Talk:Barnes interpolation. Not even heard of Barnes interpolation before myself. Qwfp (talk) 16:25, 16 July 2009 (UTC)

mixed / fixed / random / etc

I just saw that mixed model redirected to multilevel model. As mixed models are a much broader class I broke the redir and started a stub article instead. I then saw that fixed effects and random effects redirect as well to quite specific articles, e.g., the fixed effects article assumes both panel data and a time covariate, way too narrow a scope.

It seems this whole family of related articles needs a good updating. comments and suggestions welcome. Baccyak4H (Yak!) 16:58, 18 June 2009 (UTC)

Baccyak4H, you are right. I would say that these articles (fixed and random effects) should be disambig pages, but until there is a second thing to link to (maybe there already is?) I think redirects are fine. PDBailey (talk) 21:30, 20 June 2009 (UTC)
Agreed. The fixed effect article in particular needs help. It either contradicts other articles or is written poorly enough to appear to. I made some superficial edits, but lack the expertise to tackle the heart of the matter. Executive Outcomes (talk) 18:02, 21 July 2009 (UTC)

Pareto interpolation

'nother one that definitely needs work is Pareto analysis. Michael Hardy (talk) 05:08, 27 July 2009 (UTC)

Contiguity space

Contiguity space is opaquely written in a different way from the ones I name above. Michael Hardy (talk) 13:20, 27 July 2009 (UTC)

Is there any difference

between the Kernel smoother and Kernel regression articles? If not, the former should probably be merged into the latter. ... stpasha » talk » 15:41, 29 July 2009 (UTC)

Well, the articles may seem to have some stuff in common, but notionally they are different. For example, simple kernel smoothing is applicable to spectral estimation in time-series analysis, to the estimation of rates of occurrence in point processes and to density estimation. In none of these is there necessarily a "regression" aspect and there is more specific theoretical interest in the bias competing with variance-reduction aspects of the smoothing problem than in "local regression". There are other points of difference too... for example in all these applications there would be good reason to ensure that all estimates produced as outputs are non-negative, a requirement that doesn't typically arise for "regression". Melcombe (talk) 17:02, 29 July 2009 (UTC)
At the moment though, the articles cover pretty much the same content. There are also kernel methods and kernel (statistics). —3mta3 (talk) 17:14, 29 July 2009 (UTC)
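To make the distinction above concrete, a minimal sketch (Gaussian kernel, hypothetical data and bandwidth h) of the same kernel weights used in two ways: as a Nadaraya–Watson kernel regression, which needs (x, y) pairs, and as a kernel density estimate, which smooths the x values alone and cannot go negative because it is a sum of non-negative kernels. The function names are illustrative.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_regression(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0]: a kernel-weighted mean of ys."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def kernel_density(x0, xs, h):
    """Kernel density estimate at x0: smooths xs alone, with no response variable."""
    return sum(gaussian_kernel((x0 - x) / h) for x in xs) / (len(xs) * h)


xs = [0.0, 1.0, 2.0, 3.0]   # hypothetical data
ys = [1.0, 3.0, 2.0, 4.0]
h = 0.7                      # hypothetical bandwidth
print(kernel_regression(1.5, xs, ys, h))
print(kernel_density(1.5, xs, h))  # a sum of non-negative terms, so never negative
```

The regression estimate is a weighted average of the responses, whereas the density estimate integrates to one and has no responses at all, which is roughly where the two articles' scopes part.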

Deletion nomination - Empirical statistical laws

Empirical statistical laws has been nominated for deletion, discussion at Wikipedia:Articles for deletion/Empirical statistical laws. Contributions with reasons please. You might like to note that there is archived discussion of an earlier version of this article under its previous name at Wikipedia talk:WikiProject Statistics/Archive 1#"statistical law". Melcombe (talk) 10:15, 31 July 2009 (UTC)

WPStatistics needs a quality rating

There are statistical concepts that are not really mathematical: Pareto principle, rule of thumb - just a few examples. Currently they are tagged as part of WPStatistics but cannot be assessed as Template:WPStatistics does not support article's quality assessment. --Piotr Konieczny aka Prokonsul Piotrus| talk 17:19, 6 August 2009 (UTC)

The template now accepts the full quality ratings including the math Bplus and also the standard importance ratings. -- Avi (talk) 17:55, 6 August 2009 (UTC)
Of course, if we want assessments, we now have to go back and add them by hand. Once a suitable amount are done, we can get the Assessment matrix put somewhere. -- Avi (talk) 18:56, 6 August 2009 (UTC)

I've been bold

I'm afraid I've been perhaps too bold and made a change to {{WPStatistics}} that is slowly rippling through Category:Wikiproject Statistics articles, moving them to Category:WikiProject Statistics articles. Now, this change was intentional, since I felt the inconsistency in the name ("Wikiproject" vs. "WikiProject") was... well, if not confusing, at least potentially troublesome. Still, I guess I should have asked here before I did it. Please see my note at the top of Category:Wikiproject Statistics articles and the explanation of the effect of my template change at Help:Category#Categories and templates before attempting to revert the changes. You can, of course, check my other recent edits at Special:Contributions/Dcljr, in case you're worried I've made other inadvisable edits. [g] - dcljr (talk) 05:20, 7 August 2009 (UTC)

I guess I should take some of the blame for unilaterally adding importance and quality options. -- Avi (talk) 05:52, 7 August 2009 (UTC)
I'm glad I didn't get yelled at... <g> Looks like the move has been completed. I've changed the note at Category:Wikiproject Statistics articles (accidentally referred to the wrong category title originally). I guess the "old" category title can now be deleted or "soft-redirected" in a more standard way to the new title? - dcljr (talk) 00:00, 20 August 2009 (UTC)
I've just nominated it for speedy deletion as an empty category WP:CSD#C1. Qwfp (talk) 08:28, 20 August 2009 (UTC)

more on WPStatistics template

Given the recent changes to the WPStatistics template, which seem basically OK to me, would it be worth taking the parallel to the maths template even further and adding a "Field" possibility? Of course someone would need to come up with a sensible list of fields within statistics to provide valid options. But there are about 2000 articles, so some subdivision is reasonable. We could at least have a "statisticians" field to have somewhere to put statisticians who aren't necessarily mathematicians. However, is there some possibility of providing some automatic coordination with the maths template? Melcombe (talk) 10:32, 10 August 2009 (UTC)

Category:Markov chains and Category:Markov models

There is a naming dispute concerning the correct name of the category for the main article Markov chain and related articles, see WP:CFD. (talk) 03:20, 28 August 2009 (UTC)

This may be a little late, but see Wikipedia:Categories for discussion/Log/2009 August 22#Category:Markov chains. Melcombe (talk) 09:37, 28 August 2009 (UTC)

Statistical disclosure control

I just noticed Wikipedia currently has nothing on statistical disclosure control, although the term gets plenty of Google hits and there's an entire book titled "Statistical Disclosure Control in Practice" (ISBN 978-0-387-94722-8). I thought about writing at least a stub, but it's not something I know anything about at present. Anyone feel like taking it on? Looks like there could be US government sources that may be public-domain (See also WP:Public domain resources).

I would think this is of wider interest than just us statisticians: it's relevant to a wider public, both as subjects and consumers of statistics. Just an idea in case someone feels like tackling a new article. Regards, Qwfp (talk) 15:33, 3 September 2009 (UTC)

PS Just checked WP:Requested articles/Mathematics#Statistics, and "Statistical disclosure" (or initially "Disclosure, statistical") has been listed since October 2006!

I think there are some brief mentions of the topic in some articles ... I found one at Microdata (statistics). Melcombe (talk) 09:24, 7 September 2009 (UTC) ...and another one at Exponential mechanism (differential privacy). Melcombe (talk) 14:45, 11 September 2009 (UTC) ... and Barnardisation and differential privacy. Melcombe (talk) 15:17, 18 September 2009 (UTC)


The article Meta-analysis seems to have been mauled, with some seemingly useful info arbitrarily deleted. (Also see article talk page.) Can someone with experience in this topic try to sort it out? Melcombe (talk) 09:29, 11 September 2009 (UTC)

Quality assessment

It seems we have a lot of statistics articles with quality ratings lower than they should have. This probably happens when editors improve the articles without changing the ratings later on. In this way we get lots of stub- or start-quality articles where the rating should in fact have been C or even B.

What we really need, but lack, is a centralized team of people who have a very good overview of the majority of statistics articles, with the goal to (1) review all the articles in the lower quality segment and see if they have adequate quality ratings, (2) set up a bot which will watch over the articles in List of statistical topics and list on a special page those articles which undergo substantial changes and thus are likely to need reassessment, (3) alter the current quality-importance table on the portal’s page to reflect the ratings from the {{WPStatistics}} template instead of the mathematics template.

/*hides*/ ... stpasha » talk » 07:26, 16 September 2009 (UTC)

The maths ratings tables don't recognise a C rating as it is not one of the recognised grades there ... and there is no description of what it means on the related pages e.g. . I believe a C rating is not handled well by the automatic processing.
I think it is a lot of work to set up the type of automatic processing you suggest. There is a section above where I tried to prompt discussion of paralleling the maths rating scheme by having a "field" component to the template but, if this were thought reasonable, some thought would be needed in choosing allowable values. One thought is that, if "fields" were to be stated in the template, then multiple "fields" might be allowed, each with its own quality-importance ratings. (For example Probability space is set as top priority ... but realistically it is only top priority for specialist probability, not for statistics.) The maths rating template also has a "comments" component leading to special discussion pages, but this seems to be little used in practice ... so perhaps this should not be carried over. Melcombe (talk) 09:08, 16 September 2009 (UTC)
It is possible to reconfigure the VeblenBot (which collects all the ratings) to recognize the C-class rating, and to produce a separate table for WPStatistics importance/quality scale. See for example User:VeblenBot/Economics/table:ECONOMICS — the table generated for the econ project.
However if your aspiration is to have a “field” multi-variable, and separate ratings for each possible value of the field, then things might get more complicated… If we were to create separate ratings then the more centralized approach would be to create a new template, for example called “Ratings”, which would supersede the existing quality assessment templates such as “Maths rating” and “WPStatistics”, and others. I think it’ll be better to keep things simple first :) ... stpasha » talk » 20:38, 16 September 2009 (UTC)
I have no particular aspirations here. But, before anyone starts a systematic attempt to set values in the templates, it would be good to get agreement that the contents are right, to save having to go through them all again. There were essentially no responses to the discussion of the template contents. You may not have seen earlier discussion about the coverage of this project, particularly whether it should be covering the topic of "probability", given that there is notionally a separate project for this and many would be interested in statistics and not probability, and vice versa. Given the apparent lack of activity on the probability project we have been covering probability articles here to some extent, at least by including them in List of statistical topics ... but also because it is not clear where to draw the line. One possibility would be to have two templates, one for probability and one for statistics, but both leading back to this project.
In your earlier posting you mentioned trying to make use of List of statistical topics ... you may not have seen that there was a decision (on the talk page for that article) not to include "statisticians" and "statistical organisations" on the list, but there are already stats project templates on some of these articles at least, so it may not be easy to coordinate things in the way you suggested while getting a complete coverage of articles with the stats template. But there may be another way.
Let's hope we get some contributions from others to this discussion so a decision can be reached soon.
Melcombe (talk) 10:05, 17 September 2009 (UTC)

So I went ahead and edited the logo used by WikiProject Statistics, primarily because I took issue with the horrible brownish color of the Gaussian curve. This change affects every single article within the project. Hopefully the new picture is uniformly more likable; otherwise the change can be reverted. ... stpasha » talk » 00:12, 22 September 2009 (UTC)

Article request

I think we could benefit from having the article “List of statistical tests”. The purpose of such an article would be to make the current Category:Statistical tests more accessible. For example, right now someone who wants to find out how to test equality of variances of two distributions has to look at the category and open every single link there (almost 100) to see which test suits him. With an outline article we could group similar tests in a tree-like structure, so that it is clear what exactly each test is testing and what additional assumptions, if any, each test requires. Like

... stpasha » talk » 19:14, 21 September 2009 (UTC)

What you suggest might be better named "Outline of ..." as it seems to fit within that type of name. See a heading template on Talk:Outline of regression analysis for info on a project that is trying to coordinate "outlines". Of course I think the suggestion of a coordinating article of some type is a good one. Melcombe (talk) 08:57, 22 September 2009 (UTC)

CSS code displays on top of some pages

On Normal distribution and other pages, before the actual article, I see CSS code at the top of the article, like this:

/* Infobox template style */
.infobox {
  border: 1px solid #aaaaaa;
  background-color: #f9f9f9;
  color: black;
  margin-bottom: 0.5em;
  margin-left: 1em;
  padding: 0.2em;
  float: right;
  clear: right;
}

wtf is this??? -- (talk) 10:07, 29 September 2009 (UTC)

This question is better addressed at Wikipedia:Village pump. Make sure you also tell them your browser version, since for modern browsers the CSS code is never shipped together with the page but only as links to external files.  … stpasha »  16:39, 29 September 2009 (UTC)

Some feedback

Hi everybody; just wanted to share this quotation from a talk by Eric Feigelson today at the IAU General Assembly in Rio de Janeiro, who was talking about interdisciplinary connections between statistics and astronomy. Feigelson, commenting on the fact that he had just displayed the opening sentence of the Statistics lemma on one of his slides: "Wikipedia for statistics is excellent. And quite reliable. You can learn quite a lot from Wikipedia." All the best from Rio, Markus Poessel (not logged-in) —Preceding unsigned comment added by (talk) 13:58, 14 August 2009 (UTC)

I could not find any better place to post this, so I'm posting it here: I'm not a statistician but it seems to me that the formula for calculating the Geometric Standard Deviation on [3] is wrong. The divisor should be (n - 1) instead of n. —Preceding unsigned comment added by Vikash.madhow (talkcontribs) 14:56, 3 November 2009 (UTC)

No, n is correct. Compare standard deviation and sample standard deviation. -- Avenue (talk) 03:26, 4 November 2009 (UTC)
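
To make the n versus n − 1 distinction concrete, here is a minimal Python sketch of the geometric standard deviation, computed as the exponential of the standard deviation of the log-values. The function name and the `ddof` parameter are our own (borrowed by analogy from NumPy's convention), not taken from the article: with `ddof=0` the divisor is n, as in the population formula; with `ddof=1` it is n − 1, as in the sample standard deviation.

```python
import math

def geometric_sd(values, ddof=0):
    """Geometric standard deviation: exp of the standard deviation of the log-values.

    ddof=0 divides by n (population form); ddof=1 divides by n - 1
    (sample form). All values must be positive.
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n  # log of the geometric mean
    ss = sum((x - mean_log) ** 2 for x in logs)
    return math.exp(math.sqrt(ss / (n - ddof)))

data = [1.0, math.e ** 2]
print(geometric_sd(data))          # divisor n
print(geometric_sd(data, ddof=1))  # divisor n - 1
```

For the two-point data set [1, e²] the log-values are 0 and 2, so the n divisor gives exp(1) = e, while the n − 1 divisor gives exp(√2). Which one is "right" depends on whether the formula is meant as a population quantity or a sample estimate, exactly as for the ordinary standard deviation.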

Correlation and Pearson Correlation

There has been a lot of confusion over the scope of the correlation and Pearson correlation articles. Recently, someone has been moving some material from the Pearson correlation article to the correlation article. I think this runs against a consensus that the correlation article should focus on general associations ("co-relations") while the Pearson correlation article should focus on the product moment type of statistic. I am inclined to move things back to Pearson correlation, but will wait to see if anyone has a good argument not to. Skbkekas (talk) 03:49, 2 November 2009 (UTC)

Rename correlation article to "correlation and dependence"

Melcombe has suggested renaming the correlation article as "correlation and dependence." I think this is a good idea, but since this is a high-profile article I thought I would raise it here before making the change. Here are the issues:

  • There has been a lot of trouble determining what material belongs in the correlation article and what material belongs in the Pearson correlation article. Currently there is a lot of duplication.
  • The opening of the correlation article indicates that the scope is more general than linear correlation, noting the origin of the term as "co-relation." Changing the article title would reinforce this interpretation.

If the rename goes forward, it would be good to remove some of the details about Pearson correlation from the correlation article, and to add some material about other methods for assessing correlation and association.
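
As a small illustration of why a general "correlation and dependence" article and a product-moment article cover different ground, the Python sketch below (function names are ours) compares Pearson's product-moment coefficient with Spearman's rank coefficient on a relationship that is perfectly monotone but not linear. The rank function assumes no tied values; ties would need average ranks.

```python
import math

def pearson(x, y):
    """Product-moment (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n; assumes no ties (tied values would need average ranks)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotone but nonlinear
print(pearson(x, y))   # about 0.943
print(spearman(x, y))  # 1.0
```

The dependence is perfect in the monotone sense, yet the linear (Pearson) measure stays below 1, which is the kind of distinction the broader article could explain.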

Skbkekas (talk) 15:02, 6 November 2009 (UTC)

Project scope question

Hi. I've recently added the articles Labour Force Survey and Census to this WikiProject. I now wonder if that was the right thing to do since most of the project's articles seem to be about statistics as a science rather than the institutional setting in which statistics are produced. Does anyone have any thoughts on this? Cordless Larry (talk) 16:40, 8 November 2009 (UTC)

Although I now see that Official statistics is included, which reassures me somewhat. Cordless Larry (talk) 16:42, 8 November 2009 (UTC)
The present status regarding these topics is at least partly a result of there being no active participants with an interest in them. An early thought on coverage of the project is "I would consider statistical thinking and conceptual topics to be relevant" in Wikipedia talk:WikiProject Statistics/Archive 1#Things on boundary of scope. Melcombe (talk) 10:49, 10 November 2009 (UTC)
Well I'm happy to join the project and will take an interest in this type of article, although I might not have that much time to dedicate to it. Cordless Larry (talk) 15:29, 10 November 2009 (UTC)

{{Population pyramid}}

Template:Population pyramid has been nominated for deletion. (talk) 06:00, 20 November 2009 (UTC)

Group testing

Group testing examines the problem from a combinatorialist's point of view. We should add a statistician's point of view. Michael Hardy (talk) 04:57, 25 November 2009 (UTC)

Article on a distribution lost

The previous article on ARGUS distribution has been redirected to much the same content in a much larger article. The previously existing stuff gave only a density function of an unusual form, with no other details. But it might be something someone wants to work on. Melcombe (talk) 12:54, 10 November 2009 (UTC)

This has now been restored and a little detail added, but it could still do with being brought up to standard for distribution articles. Melcombe (talk) 10:03, 27 November 2009 (UTC)

Higher-order statistics

Higher-order statistics is very vaguely written at best and I wonder if there's a legitimate concept there. Michael Hardy (talk) 03:48, 27 November 2009 (UTC)

Exponential Smoothing

The exponential smoothing article needs some serious work. It has extra 'padding' information now, and lacks all types of exponential smoothing except for simple exponential smoothing. Types to add include double (Brown), triple, linear trend (Holt), damped trend, and exponential trend, among others. JLT 15:41 16 Dec 2009 (CST) —Preceding unsigned comment added by (talk)
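
For whoever expands the article, here is a minimal Python sketch of two of the methods named above: simple exponential smoothing and Holt's linear-trend method. The function names and the initialisation (level = first observation, trend = first difference) are our own choices; several other initialisations are in use, and Brown's double smoothing, damped trend and Holt-Winters (triple) smoothing are not shown.

```python
def simple_exponential_smoothing(xs, alpha):
    """s_0 = x_0; s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [xs[0]]
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def holt_linear_forecast(xs, alpha, beta, horizon=1):
    """Holt's linear-trend method; returns the forecast `horizon` steps past the data.

    Initialisation (one common choice among several): level = x_0, trend = x_1 - x_0.
    """
    level, trend = xs[0], xs[1] - xs[0]
    for x in xs[1:]:
        previous_level = level
        # Level: weighted average of the new observation and the trend-adjusted old level
        level = alpha * x + (1 - alpha) * (level + trend)
        # Trend: weighted average of the latest level change and the old trend
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend

print(simple_exponential_smoothing([0.0, 10.0], 0.5))  # [0.0, 5.0]
print(holt_linear_forecast([1, 3, 5, 7, 9], alpha=0.5, beta=0.3))
```

On an exactly linear series, Holt's method reproduces the line (here the one-step forecast is 11), whereas simple smoothing lags behind any trend. That lag is the usual motivation for the trend-corrected variants.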

Strange that someone comments on this without linking to it. Here it is: exponential smoothing. Michael Hardy (talk) 14:55, 17 December 2009 (UTC)

Welch's t test

Hi! Not sure if this is the correct page, however: Welch's t test says that t is defined by ... - why is the power there? It seems very confusing, since I believe there should be no power in Welch's test. Best, Jakub —Preceding unsigned comment added by (talk) 23:18, 25 December 2009 (UTC)
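
The formula quoted in the question did not survive, so it cannot be checked here. For comparison, in the usual textbook form of Welch's test the only exponents in the statistic itself are the squared sample standard deviations and the square root in the denominator, t = (x̄₁ − x̄₂) / √(s₁²/N₁ + s₂²/N₂). Higher powers of the variance terms appear only in the Welch-Satterthwaite approximation to the degrees of freedom. A minimal Python sketch (function name ours):

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # s_1^2 / N_1 (sample variance, n - 1 divisor)
    vb = statistics.variance(b) / len(b)  # s_2^2 / N_2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(t, df)
```

For these two small samples, t works out to −√3 and the approximate degrees of freedom to 1875/425 ≈ 4.41, which is not an integer, as is typical for the Welch approximation.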

Histogram extension to continuous data

Hi, there is an unreferenced section in the histogram article which is not very well written and looks like OR to my non-expert eyes. Can some expert in the thing check it out? Thanks! --Cyclopiatalk 17:08, 2 January 2010 (UTC)

I think it should be deleted. It's nonsense, or at least completely unintelligible. Skbkekas (talk) 04:14, 3 January 2010 (UTC)

I doubt that it's actually nonsense. But the notation involved is not defined, so it's hard to tell what is meant. Michael Hardy (talk) 12:47, 3 January 2010 (UTC)

Well, I deleted it. If and when the original editor of that section comes out to explain it, we can reconsider putting it back. --Cyclopiatalk 13:31, 3 January 2010 (UTC)

I sent him an email recently, and it hasn't bounced. We'll see what happens. Michael Hardy (talk) 07:22, 4 January 2010 (UTC)

Geospatial topology

I have placed a prod template on Geospatial topology. This article is in several stats categories but hasn't been linked into the stats project in a way that would cause this to be noted on the project page. Melcombe (talk) 09:59, 13 January 2010 (UTC)

New category for statisticians, "Category:Wikipedians interested in statistics"

There was a category called "Category:Wikipedians interested in Bayesian methods", so I created a super-category for all statisticians.

(I did not link the category of members of this project to the statistics category, in case this was objectionable.)

Kiefer.Wolfowitz (talk) 01:34, 14 January 2010 (UTC)

Would anybody object to linking our category (Wikipedia Project Statisticians) to that category?
(I noticed that the "Category:Wikipedian statisticians" is listed as a subcategory of WP WikiProject Statistics, I believe.)
This might be a way to recruit talent and boost visibility for the project.... Kiefer.Wolfowitz (talk) 00:42, 17 January 2010 (UTC)

Wrapped Normal Distribution

The pdf for the wrapped normal doesn't appear correct to me. If I type it in Mathematica, I get imaginary values out. The Jacobi description that follows is a mix of variables that have the same name in different formulas, and is confusing at best. As stated it appears as such.


where is the Jacobi theta function:

My Proposal

where is the 3rd Jacobi theta function:

If I type this into Mathematica, it works, and matches known results that I have to compare with. Also, I propose deleting the Jacobi elliptic explanation. The summation form is also suspect, but I'll look this up later.

I would prefer someone to validate this; otherwise, in 2 weeks I will change it. (talk) 20:08, 20 January 2010 (UTC)
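
The formulas in the post above did not survive, so they cannot be checked directly, but a numerical comparison of the two standard forms may help whoever validates this. The Python sketch below (function names ours) evaluates the wrapped normal density both as a sum of normal densities shifted by multiples of 2π and as its Fourier series. The Fourier series equals (1/2π) ϑ₃((θ − μ)/2, q) with nome q = e^(−σ²/2), under the convention ϑ₃(z, q) = 1 + 2 Σ q^(n²) cos(2nz). Both forms are real and non-negative for real θ; a mix-up between the nome q and the lattice parameter τ is one possible way to end up with complex values, but that is an assumption, not a diagnosis of the article's formula.

```python
import math

def wrapped_normal_pdf_sum(theta, mu, sigma, k_max=20):
    """Wrapped normal density as a sum of normal densities shifted by 2*pi*k."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        d = theta - mu + 2 * math.pi * k
        total += math.exp(-d * d / (2 * sigma * sigma))
    return total / (sigma * math.sqrt(2 * math.pi))

def wrapped_normal_pdf_fourier(theta, mu, sigma, n_max=50):
    """Same density via its Fourier series, i.e. theta_3((theta - mu)/2, q) / (2*pi)
    with nome q = exp(-sigma**2 / 2) and theta_3(z, q) = 1 + 2*sum q**(n*n) * cos(2*n*z)."""
    q = math.exp(-sigma * sigma / 2)
    total = 1.0
    for n in range(1, n_max + 1):
        total += 2 * q ** (n * n) * math.cos(n * (theta - mu))
    return total / (2 * math.pi)

for theta in (0.0, 1.0, 2.5, 5.0):
    print(theta, wrapped_normal_pdf_sum(theta, 0.3, 0.7), wrapped_normal_pdf_fourier(theta, 0.3, 0.7))
```

The shifted-normal sum converges quickly when σ is small, and the Fourier (theta-function) form converges quickly when σ is large, so the truncation limits above are only reasonable for moderate σ.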

I have copied the above to Talk:Wrapped normal distribution which is a better place for discussion. Melcombe (talk) 09:44, 21 January 2010 (UTC)

Law of the unconscious statistician

Please see Talk:Law of the unconscious statistician for discussion of possible treatment of that article (deletion, redirection or trimming). Since the name includes "statistician" this may be of some interest. Melcombe (talk) 15:18, 21 January 2010 (UTC)

demographics population pyramids

I recently created a population pyramid animation for the United States population by gender from 1950-2010 at File:United States Population by gender 1950-2010.gif. Should I create such a population pyramid for all countries? Would it be of any benefit? Smallman12q (talk) 01:43, 22 January 2010 (UTC)

WP 1.0 bot announcement

This message is being sent to each WikiProject that participates in the WP 1.0 assessment system. On Saturday, January 23, 2010, the WP 1.0 bot will be upgraded. Your project does not need to take any action, but the appearance of your project's summary table will change. The upgrade will make many new, optional features available to all WikiProjects. Additional information is available at the WP 1.0 project homepage. — Carl (CBM · talk) 03:58, 22 January 2010 (UTC)

Tools to help your project with unreferenced Biographies of living people

List of cleanup articles for your project

If you don't already have this and are interested in creating a list of articles which need cleanup for your wikiproject, see: Cleanup listings. A list of examples is here

Moving unreferenced blp articles to a special "incubation pages"

If you are interested in moving unreferenced blp articles to a special "incubation page", contact me, User talk:Ikip

Watchlisting all unreferenced articles

If you are interested in watchlisting all of the unreferenced articles once you install Cleanup_listings, contact me, User talk:Ikip

Ikip 02:07, 26 January 2010 (UTC)

New articles need attention

A bunch of new articles created by user:Yuanfangdelang need a lot of attention. They thoroughly neglect conventions of WP:MOS and there may be notability questions. Michael Hardy (talk) 15:06, 24 January 2010 (UTC)

Of particular concern is continuity test. New statistical material of dubious merit was added to an existing page on continuity testing in electronics. Does anybody know if there are statistical methods for continuity testing? Presumably something exists in time series, change point analysis, or functional data analysis. If there is such a thing, we can add a disambiguation page for continuity test with links to separate pages on use of the term in electronics and statistics. Skbkekas (talk) 02:12, 25 January 2010 (UTC)

A big mess of many non-notable statistics articles

Please see Wikipedia:Articles for deletion/Ligong Chen. (Crossposted to Wikipedia talk:WikiProject Mathematics.) —David Eppstein (talk) 01:32, 29 January 2010 (UTC)

I have moved this contribution here, as it seems to be directly relevant to the sets of articles by user:Yuanfangdelang referred to above, and not to statistics articles more generally, as might otherwise appear. Melcombe (talk) 10:29, 29 January 2010 (UTC)


I've changed Studentization from a redirect page to something that is now barely more than a disambiguation page. If it's not a disambiguation page, then it's an orphan and other pages should link to it. Which leads us to a question: which ones? Michael Hardy (talk) 17:32, 3 February 2010 (UTC)

I have expanded Studentization and added links in Studentized residual and Studentized range. Those articles raise the question of whether the capital S should be used, as the term is almost a name. Melcombe (talk) 13:20, 4 February 2010 (UTC)

AfD for Object Oriented Quality Management

Seems to be a statistics topic. Pcap ping 09:22, 6 February 2010 (UTC)

New-ish notation

Is someone able to add a sensible definition to the disambig page (yes, this is a wikilink), which I think is a symbol relating to independence which has come into recent usage in some parts of stats. And an addition to Notation in probability and statistics would be good. Melcombe (talk) 10:35, 10 November 2009 (UTC)

I see that the immediate problem of definition has been solved ... thanks to those concerned. Melcombe (talk) 11:49, 16 December 2009 (UTC)
For info, I see that the article Conditional independence now has some material on this notation. Melcombe (talk) 11:18, 8 February 2010 (UTC)

Question: since when does ⊥ mean independent? It appears to be used interchangeably with the traditional independence symbol on conditional independence. What is going on? 018 (talk) 13:06, 8 February 2010 (UTC)

My impression (and it is no more than that) is that '⫫' is strictly correct (following Dawid 1979 JSTOR 2984718 ?) but '⊥' is sometimes used instead as it is more often available or easier to obtain (typing \perp\!\!\!\perp is a bit of a pain...). I think articles (WP or academic) using either should define it first, in which case I don't see a problem. Qwfp (talk) 13:51, 8 February 2010 (UTC)
I'm okay with that so long as (0) the article defines the notation (as you suggest), (1) each page uses one or the other form, (2) the strictly correct version is used when the article talks about both orthogonal and perpendicular. 018 (talk) 14:18, 8 February 2010 (UTC)
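
Since repeatedly typing \perp\!\!\!\perp is the pain point mentioned above, one common workaround is a preamble macro. A minimal sketch follows; the macro name \indep is our own choice, not a standard LaTeX command, and articles using it would still need to define the symbol in words.

```latex
% In the preamble: a double "up tack" built from two \perp symbols
\newcommand{\indep}{\perp\!\!\!\perp}

% In the body: "X is conditionally independent of Y given Z"
$X \indep Y \mid Z$
```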

Observation space

There are two articles that use "observation space" in a technical sense, yet there seems to be no definition of this, either in its own article or another, to provide a target for a link. Thus probability distribution even has some maths symbols associated with the term, while probability density function mentions "observation space" in passing. Melcombe (talk) 11:49, 8 February 2010 (UTC)

Convention for P

The two articles listing notation conventions, namely Table_of_mathematical_symbols and Notation in probability and statistics, seem to disagree about the notation for "Probability": the former mentions a "blackboard" P for display and non-italic P or Pr as alternatives (but not an italic P), while the latter gives only an italic P. This question is prompted by a recent edit by someone which switched from Pr to P in Probability mass function. It would be good for these statements of conventions to be brought into line. Of course there may be some hidden distinction being made in these notations that I haven't spotted. I will place this message also at Talk:Notation in probability and statistics.... please place discussion there. Melcombe (talk) 10:13, 11 February 2010 (UTC)
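
For anyone comparing the two pages, the four conventions mentioned above look like this in LaTeX. This is a sketch for side-by-side viewing only; it assumes the amsmath package (for \operatorname) and amssymb (for \mathbb), while \Pr is built into LaTeX.

```latex
% requires \usepackage{amsmath,amssymb}
$\Pr(A)$               % upright "Pr", a predefined operator name
$\operatorname{P}(A)$  % upright (non-italic) P
$\mathbb{P}(A)$        % blackboard-bold P
$P(A)$                 % plain italic P, typeset like a variable
```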

Least absolute deviations

The article about Least absolute deviations (LAD), in the section "Solving methods", omits a simple transformation that casts LAD problems as Linear Programs (LP), which can in turn be reliably and efficiently solved by general-purpose LP packages (for the transformation, see p. 294 of Boyd and Vandenberghe's book "Convex Optimization", which is freely available online). Most people would be better served by doing this simple and intuitive transformation and then applying one of the many Linear Programming packages available, instead of trying to code their own solution based on Barrodale and Roberts' paper.
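
For reference, one standard way to write that transformation is the following linear program. This is a sketch in our own notation: data pairs (x_i, y_i) for i = 1, …, n, a coefficient vector β of length p, and auxiliary variables t_i. It is not necessarily the exact form used in the book.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^{p},\; t \in \mathbb{R}^{n}} \quad & \sum_{i=1}^{n} t_i \\
\text{subject to} \quad & -t_i \le y_i - x_i^{\top}\beta \le t_i, \qquad i = 1, \dots, n.
\end{aligned}
```

At the optimum each t_i equals |y_i − x_iᵀβ|, so the objective is exactly the sum of absolute residuals. Because the objective and constraints are linear in (β, t), any general-purpose LP solver can handle the problem.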

I have never contributed to Wikipedia before, so I don't know what is the "adequate" way of doing this. I don't know if I have to ask permission from someone, so I decided to post here before and see if there was any feedback. I would be glad to write this myself.

--Gpfreitas (talk) 06:21, 9 March 2010 (UTC)

I have copied this to Talk:Least absolute deviations, which would be a good place for replies. Melcombe (talk) 09:42, 9 March 2010 (UTC)

create a history of statistics template box

I was reading Karl Pearson's page and saw he's in a philosophy of science template box, but not a statistics one. I think a good addition to Template:Statistics would be a template that mentioned the big names and developments in Statistics. e.g. Pearson, Fisher, Galton, Gauss, Spearman, Tukey etc. --Rajah (talk) 04:19, 9 March 2010 (UTC)

Useful Wikipedia resources include
Ian Hacking's recent books are also useful. Thanks for your initiative. Kiefer.Wolfowitz (talk) 23:02, 10 March 2010 (UTC)

The statistics behind the assessment of diagnostic tests

I think this area needs some reorganising. The following four articles use template Template:SensSpecPPVNPV to place the same example section into multiple articles. Perhaps they should be merged into a more coherent single article on the statistical analysis of diagnostic tests?

These three also could do with some consolidation (merging?). See also Inter-rater reliability.

Comments? Tayste (edits) 00:03, 11 March 2010 (UTC)

An alternative view is to wonder whether the same template can be included in more articles in Category:Summary statistics for contingency tables, so as to get more consistency in terminology/notation.... there are other articles here dealing only with 2 by 2 tables. But, anyway, some of these articles have grown out of different fields of application and may have articles linking to them that are hoping to find specific types of information that it would be easy to destroy in a merge. Melcombe (talk) 11:00, 11 March 2010 (UTC)
I can see some merit in having a single article on the statistics of diagnostic test accuracy, but it might be rather long, so I can also see some drawbacks. But merging Positive predictive value with Negative predictive value seems obvious and uncontroversial. (I proposed merging sensitivity and specificity a while back and I'm glad that's now been done by other(s). Afraid I don't have time just at the moment to add the merger proposal templates and start the discussion.) Qwfp (talk) 22:52, 11 March 2010 (UTC)
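
Since these articles share one worked example, a compact statement of the four quantities may help whoever consolidates them. The Python sketch below computes them from the cells of a 2×2 table of test result against true condition. The function name and the counts are hypothetical and chosen only for illustration.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table (test result vs. true condition)."""
    return {
        "sensitivity": tp / (tp + fn),  # P(test positive | condition present)
        "specificity": tn / (tn + fp),  # P(test negative | condition absent)
        "ppv": tp / (tp + fp),          # P(condition present | test positive)
        "npv": tn / (tn + fn),          # P(condition absent | test negative)
    }

# Hypothetical counts, for illustration only
print(diagnostic_summary(tp=20, fp=180, fn=10, tn=1820))
```

One point relevant to any merge: sensitivity and specificity are conditioned on the true condition, whereas PPV and NPV are conditioned on the test result, so the latter two also depend on the prevalence in the population tested. That dependence is a natural reason to treat PPV and NPV together.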

Randomness articles

I was contacted by someone who was complaining about the state of articles on randomness, and I am passing this on here for info. The specific articles are: Random sequence, Statistical randomness, Applications of randomness, Randomness. Melcombe (talk) 11:54, 11 March 2010 (UTC)

Request for comment on Biographies of living people

Hello Wikiproject! Currently there is a discussion which will decide whether Wikipedia will delete 49,000 unreferenced articles about living people, here:

Wikipedia:Requests for comment/Biographies of living people

Since biographies of living people cover so many topics, nearly all wikiproject topics will be affected.

The two opposing positions which have the most support are:

  1. supports the deletion of unreferenced articles about a living person, User:Jehochman
  2. opposes the deletion of unreferenced articles about a living person, except in limited circumstances, User:Collect

Comments are welcome. Keep in mind that by default, editor's comments are hidden. Simply press edit next to the section to add your comment.

Please keep in mind that at this point, it seems that editors support deleting articles that are not sourced, so your project may want to pursue the projects below.

Template:Statistics journals

This template needs to be cleaned up. Several journals that are listed as open access are, in fact, not OA (the Brazilian Journal of Probability and Statistics, for example). Unfortunately, I don't have the time to check all this (and I don't know enough about creating templates either) to do it myself. Perhaps somebody here can do this? Thanks. --Crusio (talk) 17:51, 15 March 2010 (UTC)

Normal distribution

Could I flag the current state of the Financial variables section of this article?

Two paragraphs have been added contradicting the first two paragraphs of the section, but they are unreferenced. I can't judge if the new paragraphs are OR or what, though they are certainly not patent nonsense. However, as a result of these changes, the section as a whole does not read well.

Cje (talk) 10:13, 15 March 2010 (UTC)

Clarifying - that is Financial variables in Normal distribution#Occurrence
Cje (talk) 15:09, 15 March 2010 (UTC)
I made a comment at talk:normal distribution about this. Skbkekas (talk) 16:28, 15 March 2010 (UTC)

That section, as a whole, was removed. The article normal distribution is not a proper place to discuss which quantities are not really normal and why they aren't.  // stpasha »  05:48, 19 March 2010 (UTC)

A definite improvement! Much better flow to the section now and easier for the novice (such as myself) to follow the main argument. Thanks Cje (talk) 10:00, 19 March 2010 (UTC)

Unreferenced living people articles bot

User:DASHBot/Wikiprojects provides a list, updated daily, of unreferenced living people articles (BLPs) related to your project. There has been a lot of discussion recently about deleting these unreferenced articles, so it is important that these articles are referenced.

The unreferenced articles related to your project can be found at >>>Wikipedia:WikiProject Statistics/Archive 3/Unreferenced BLPs<<<

If you do not want this wikiproject to participate, please add your project name to this list.

Thank you.

Update: Wikipedia:WikiProject Statistics/Archive 3/Unreferenced BLPs has been created. This list, which is updated by User:DASHBot/Wikiprojects daily, will allow your wikiproject to quickly identify unreferenced living person articles.
There may be few or no articles on this new Unreferenced BLPs page. To increase the overall number of articles in your project with another bot, you can sign up for User:Xenobot_Mk_V#Instructions.
If you have any questions or concerns, visit User talk:DASHBot/Wikiprojects. Okip 23:26, 27 March 2010 (UTC)

Silly but founded problem

Embarrassingly, I have just been looking for the name of a simple operation and found a set of orphaned, really basic-level topics. Basically I have an observed value (A) and an expected value (B), so to see how much A is "off" I computed (A-B)/B. I would normally call it change, but my divisor is an expected value and "change from the expected value" sounds odd, so, having no idea what this is called (ratio minus one?), I looked around for a name in Wikipedia. Change gives a disambiguation page, including fold change (A/B) and percentage change ((A-B)/B); the latter has as a see-also a set of topics covering the same thing: Relative difference, Percent difference and Percentage change, all of which seem to have been written by non-expert users. Going through the interlinked stats pages, coefficient of variation is the closest find, and ratio talks about ordinary division. Could someone take a look at these five pages? Thanks --Squidonius (talk) 07:52, 6 April 2010 (UTC)
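
To pin down the quantities being compared, here is a minimal Python sketch (the function names are ours). It takes the expected value B as the reference, as in the question, and shows that (A − B)/B is indeed the fold change A/B minus one. Note that some definitions of "percent difference" use the mean of the two values as the reference instead, which is part of why the articles overlap confusingly.

```python
def fold_change(observed, expected):
    """A / B."""
    return observed / expected

def relative_difference(observed, expected):
    """(A - B) / B, i.e. the fold change minus one."""
    return (observed - expected) / expected

def percentage_change(observed, expected):
    """Relative difference expressed in percent."""
    return 100 * relative_difference(observed, expected)

print(fold_change(120, 100))          # 1.2
print(relative_difference(120, 100))  # 0.2
print(percentage_change(120, 100))    # 20.0
```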

Gaussian minus exponential distribution

Gaussian minus exponential distribution needs work. In particular, it's an orphan: other pages should link to it. Michael Hardy (talk) 03:13, 10 February 2010 (UTC)

It seems to me that both the PDF and CDF in that article are wrong. You can get negative density values and cumulative probabilities outside (0,1) fairly easily. For example, let lambda approach 0 in the expression for the CDF and you get 1-2F, where F is the standard normal CDF. Does anyone know of a reference for this? I assume the construction is the difference between independent Gaussian and exponential values, although the article does not say so. Skbkekas (talk) 22:15, 6 April 2010 (UTC)
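For comparison, here is a density and CDF derived by convolving the two densities directly. This assumes, as Skbkekas does, that Z = X − Y with X ~ N(μ, σ²) and Y ~ Exp(λ) independent. The sketch is not taken from the article or from a published reference, so check it before relying on it; the function names are made up. Completing the square in the convolution integral gives

f(z) = λ exp(λ(z − μ) + λ²σ²/2) [1 − Φ((z − μ + λσ²)/σ)].

Integrating that density gives

F(z) = Φ((z − μ)/σ) + f(z)/λ.

Equivalently, −Z follows an exponentially modified Gaussian distribution. Both expressions are nonnegative, and the code checks numerically that the density integrates to 1.

```python
import math


def gme_pdf(z, mu=0.0, sigma=1.0, lam=1.0):
    """Density of Z = X - Y, X ~ N(mu, sigma^2), Y ~ Exp(lam), X and Y independent.

    Naive evaluation: fine for moderate z, but math.exp can overflow far in the right tail.
    """
    a = (z - mu + lam * sigma**2) / sigma
    upper_tail = 0.5 * math.erfc(a / math.sqrt(2.0))  # 1 - Phi(a)
    return lam * math.exp(lam * (z - mu) + 0.5 * (lam * sigma) ** 2) * upper_tail


def gme_cdf(z, mu=0.0, sigma=1.0, lam=1.0):
    """CDF of Z: Phi((z - mu) / sigma) + f(z) / lam."""
    u = (z - mu) / sigma
    return 0.5 * math.erfc(-u / math.sqrt(2.0)) + gme_pdf(z, mu, sigma, lam) / lam


def trapezoid(f, lo, hi, n=20000):
    """Plain trapezoidal rule, used only as a numerical sanity check."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    for lam in (0.5, 1.0, 5.0):
        mass = trapezoid(lambda z: gme_pdf(z, lam=lam), -80.0, 20.0)
        mean = trapezoid(lambda z: z * gme_pdf(z, lam=lam), -80.0, 20.0)
        # Total mass should be ~1 and the mean ~ mu - 1/lam (here mu = 0).
        print(f"lam={lam}: mass {mass:.6f}, mean {mean:.4f}, expected mean {-1 / lam:.4f}")
```

As λ → 0, this CDF tends to 1 at every fixed z. That is the expected behaviour, because Y then tends to be very large, and it contrasts with the 1-2F limit noted above.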

Orthogonal array testing

The article titled Orthogonal array testing never gets around to saying what orthogonal array testing is. Michael Hardy (talk) 18:17, 6 April 2010 (UTC)

Histogram and bar chart

Which of the two terms has the more general meaning: histogram or bar chart? What are the differences between them? The pages histogram and bar chart don't say. --Aushulz (talk) 08:38, 9 April 2010 (UTC)
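The thread does not answer this question. The following sketches one common way the distinction is drawn, not a settled definition. A histogram summarizes a numeric variable by counting observations in contiguous intervals (bins), so bin edges and widths matter. A bar chart shows one bar per category or labelled value, with no bins. The data and function names below are made up for illustration.

```python
from collections import Counter


def histogram_counts(values, edges):
    """Count numeric values in the half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # each value belongs to at most one bin
    return counts


def bar_chart_counts(categories):
    """Count occurrences of each category label; no bins or ordering are involved."""
    return dict(Counter(categories))


if __name__ == "__main__":
    heights_cm = [151.2, 158.0, 162.5, 163.1, 170.4, 171.9, 175.0, 188.3]  # made-up data
    print(histogram_counts(heights_cm, [150, 160, 170, 180, 190]))
    print(bar_chart_counts(["red", "blue", "red"]))
```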

Olympic Average

I would like to suggest a new article, but before I write it I would like to see whether others have heard of this. The Olympic Average is an average where the top and bottom quarters of a sample (the values above the upper quartile and below the lower quartile) are removed and the rest of the sample is averaged. Searching Google for this term returns almost nothing. Have others heard of this term? Does anyone have any additional information? That is all I have.-- (talk) 18:16, 13 April 2010 (UTC)

I entered "trimmed mean" into the search box, and it redirects to truncated mean. Michael Hardy (talk) 18:47, 13 April 2010 (UTC)
See also Interquartile mean. Qwfp (talk) 18:54, 13 April 2010 (UTC)
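The procedure described in the question matches the interquartile mean: discard the lowest and highest quarters, then average the rest. That is the 25% case of the truncated mean. Below is a minimal Python sketch. It assumes a simple convention of dropping a whole number of observations from each end, and the function names are hypothetical. SciPy users could instead look at scipy.stats.trim_mean.

```python
def truncated_mean(values, proportion):
    """Mean after dropping int(proportion * n) of the smallest and of the largest values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one observation")
    k = int(proportion * len(ordered))  # observations dropped from each end
    kept = ordered[k : len(ordered) - k]
    return sum(kept) / len(kept)


def interquartile_mean(values):
    """Interquartile mean: truncated mean with a quarter cut from each end."""
    return truncated_mean(values, 0.25)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 100]  # one outlier
    print(sum(data) / len(data), interquartile_mean(data))  # ordinary mean vs. IQM
```

When n is not a multiple of 4, the usual interquartile mean gives fractional weight to the two boundary observations. This sketch rounds the number dropped down instead, so it matches the interquartile mean exactly only when n is a multiple of 4.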

Standard normal table

Everywhere I look to convert a standard score into a percentile rank, it says to look in a standard normal table for the conversion. Nowhere can I find the equation used to generate this table. I believe that the equation is , but I can't find any sources. Does anyone know if this equation is accurate? Dude1818 (talk) 21:05, 18 April 2010 (UTC)

It's not. If the population is normally distributed, then the percentile corresponding to the standard score z is just 100% × Φ(z), where Φ is the cumulative distribution function of the standard normal distribution. How Φ is computed from scratch is a more complicated question, since it has no closed form.
But this isn't the most appropriate forum for your question, since this page is supposed to be about how to improve Wikipedia's statistics articles. Wikipedia:Reference desk/Mathematics would be an appropriate place to raise this matter. Michael Hardy (talk) 20:41, 19 April 2010 (UTC)
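Φ has no closed form in elementary functions, but it can be written in terms of the error function:

Φ(z) = (1 + erf(z/√2))/2

Most numerical libraries provide erf; in Python it is math.erf. The function names below are illustrative. The block cross-checks the result against statistics.NormalDist, which is available since Python 3.8.

```python
import math
from statistics import NormalDist


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_rank(z: float) -> float:
    """Percentile for standard score z, assuming a normally distributed population."""
    return 100.0 * normal_cdf(z)


# Cross-check against the standard library's implementation.
for z in (-1.0, 0.0, 1.0, 1.96):
    assert abs(normal_cdf(z) - NormalDist().cdf(z)) < 1e-12
```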

Skew-normal distribution by location, scale and slant parameters

If X1, X2 ~ N(0,1) are independent, then min(X1, X2) ~ SN(−1), with CDF F_min{X1,X2}(t) = 1 − Φ²(−t). —Preceding unsigned comment added by (talk) 19:24, 19 April 2010 (UTC)
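This claim can be checked numerically. For independent X1 and X2, P(min > t) = P(X1 > t) P(X2 > t) = Φ(−t)². Differentiating 1 − Φ²(−t) therefore gives 2φ(t)Φ(−t). That is the standard skew-normal density 2φ(x)Φ(αx) with slant α = −1 (location 0, scale 1). The sketch below compares the two and runs a Monte Carlo check. The function names are illustrative.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf_min_of_two(t):
    """P(min(X1, X2) <= t) for independent X1, X2 ~ N(0, 1): 1 - Phi(-t)^2."""
    return 1.0 - std_normal_cdf(-t) ** 2


def skew_normal_pdf(x, alpha):
    """Standard skew-normal density 2 * phi(x) * Phi(alpha * x)."""
    return 2.0 * std_normal_pdf(x) * std_normal_cdf(alpha * x)


if __name__ == "__main__":
    h = 1e-5
    for t in (-2.0, 0.0, 1.0):
        # Central-difference derivative of the CDF vs. the SN(-1) density.
        deriv = (cdf_min_of_two(t + h) - cdf_min_of_two(t - h)) / (2 * h)
        print(t, deriv, skew_normal_pdf(t, -1.0))
    random.seed(0)
    n = 100_000
    hits = sum(min(random.gauss(0, 1), random.gauss(0, 1)) <= 0.0 for _ in range(n))
    print("simulated P(min <= 0):", hits / n, "formula:", cdf_min_of_two(0.0))
```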

Apparently inaccurate description of weighted linear regression

See the discussion on the Linear regression talk page. Grevillea (talk) 04:16, 20 April 2010 (UTC)

should we have an article on SHAZAM?

The editor who added SHAZAM (software) works for the company. The question is, do we want the article? What should be added? 018 (talk) 18:39, 22 April 2010 (UTC)

There is also the question of whether we want it in the template of statistical software. [4] 018 (talk) 18:57, 22 April 2010 (UTC)

Can't see a problem with having an article on SHAZAM myself. I'm sure it'll improve given time. Google Scholar suggests it's probably notable. As for the question of whether we want it in Template:Statistical software... that makes me wonder if we want Template:Statistical software. Does it serve a useful purpose not served by List of statistical packages? It's much more selective, but it's not at all clear to me on what grounds (beyond excluding redlinks). Qwfp (talk) 21:02, 22 April 2010 (UTC)
Qwfp, it allows navigation between package pages. Who wants such navigation, I have no idea, but I'll bet someone really does. It is also nicer than having all the packages link to each other in "see also". 018 (talk) 01:46, 23 April 2010 (UTC)

Use of a 2-sample KS test for Sockpuppet analysis

I realise that this isn't directly related to the Statistics project, and for that I apologise. It is really intended to solicit comment from the domain experts who might be watching this page and who might be interested in wider Wikipedia policy enforcement. I have proposed using a two-sample KS test as a means of detecting putative sock puppetry. I have written up my approach on one of my user pages and discussed its use on the SPI discussion page. This might end up being used as an enforcement tool against users both innocent and guilty, so I feel it is appropriate to invite expert comment, either on the SPI discussion page or on the article's talk page. Are there any fundamental flaws here? Are there any pragmatic improvements that could be suggested? Let me also apologise in advance for my potential ignorance here: I only studied this to master's level, and that was over 30 years ago. TerryE (talk) 19:41, 6 May 2010 (UTC)
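TerryE's write-up is not reproduced here, so this is only a generic sketch of the two-sample Kolmogorov–Smirnov statistic itself. The samples being compared are an assumption; hypothetically, they might be the times of day at which two accounts edit. The function name is illustrative. For p-values, SciPy provides scipy.stats.ks_2samp.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)| over the empirical CDFs.

    Both empirical CDFs are right-continuous step functions that jump only at
    observed values, so the supremum is attained at one of the pooled observations.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample a that is <= x
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


if __name__ == "__main__":
    print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Sorting once and using binary search keeps the cost at O((n + m) log(n + m)).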

Codomain of a random variable

Codomain of a random variable: observation space – recently archived Talk at WikiProject Mathematics.

Codomain of a random variable – current Talk at WikiProject Mathematics, a continuation at my initiative.

By the way, I am a user registered here WP:WPSTAT and not there WP:WPMATH.

The discussion is general except for the term observation space and my remarks on the articles random variable and random sequence. See also random element, especially its list of internal links. --P64 (talk) 23:25, 12 April 2010 (UTC)

I think the treatment of the whole subject (or maybe the actual theory of the subject) has some gaps. I left a comment at talk:probability space#nature makes its move asking for clarification. (talk) 18:46, 14 April 2010 (UTC)
I believe the immediate problem is pedagogical, which is probably what you call the treatment of the subject rather than its actual theory. The comment at probability space talk doesn't help me. Have you read random sequence and its talk? --P64 (talk) 22:58, 7 May 2010 (UTC)

Rename suggestion

I suggest renaming the article “Linear regression” to “Linear regression model”, and “Discrete choice” to “Discrete choice models”. There are several arguments in favor of such a change:

  1. Uniformity. The articles which are close in spirit to “linear regression” and “discrete choice” are: General linear model, Generalized linear model, Errors-in-variables models, Hierarchical model, Probit model, etc.
  2. Unambiguity. The term “linear regression” is ambiguous. In modern use it can be understood in two ways. It can mean a regression which is linear; the word “regression” means an attempt to simplify and summarize the available data into a simple structure, here a linear function. Or it can mean a method for calculating the regression function, in which case it becomes a synonym for OLS. Examples of such confusion are:
    • the simple linear regression article, which should properly be named “simple least squares”;
    • the econometrics article, which talks about OLS but links to linear regression;
    • the German and French versions linked from the linear regression article, which are in fact devoted to OLS.
  3. Shorter doesn't mean better. We agreed at some point that the proliferation of different articles on linear regression and least squares should be put to an end. In particular, we decided that the topic and scope of each article should be made as clear and precise as possible. My suggestion aims at exactly this goal.

I’ll be back.  // stpasha »  23:52, 21 April 2010 (UTC)

In terms of Linear regression, that would be a repurposing. I don't fully agree. — Arthur Rubin (talk) 01:43, 22 April 2010 (UTC)
Arthur Rubin, can you expand on what you mean by "repurposing", and why this is an instance of it? 018 (talk) 02:02, 22 April 2010 (UTC)
The subjects appropriate for linear regression (and which seem to be presently in the article) are not the same subjects appropriate for linear regression model (or models). Stpasha's statement that the rename "aims" at the goal of making the topics precise may be accurate, but I believe it misses the mark. — Arthur Rubin (talk) 03:06, 22 April 2010 (UTC)
I don't think I understand you. What, in your opinion, are the topics appropriate for the title “linear regression model”? In my view, the linear regression model is the equation y = Xβ + ε. The article would cover topics such as how to estimate this equation, the fields in which it is used, etc.  // stpasha » 
What is unclear? It is certainly possible, and often done, to undertake a "linear regression" without starting from a "model" (meaning a statistical model in this instance). This includes an ordinary least squares analysis or a weighted least squares analysis. In fact, the very point of labelling an analysis as "ordinary least squares" is often that one knows the supposedly corresponding "OLS model" does not apply. More generally, linear regression does not start from a statistical model, but rather from a requirement to draw some conclusion from some data. Thus there is much more to "linear regression" than just some model, and such an article should not start by setting out an a-priori model. Melcombe (talk) 15:30, 12 May 2010 (UTC)
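To illustrate the distinction being debated, here is a sketch of an ordinary least squares line fit for a single regressor. Nothing in the computation refers to a statistical model. It just minimizes the sum of squared residuals for whatever data it is given, in line with Melcombe's point. Model assumptions such as y = Xβ + ε, with conditions on ε, only enter when the estimates are interpreted inferentially: standard errors, tests, confidence intervals. The function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y ~ a + b*x by ordinary least squares (closed form for one regressor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


if __name__ == "__main__":
    print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))
```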

Correction for attenuation and Disattenuation - are these articles basically the same?

As far as I can tell, these two articles present the topic differently but cover identical ground and are about the same core topic. I don't feel comfortable making a merge request unless somebody better qualified than me has looked them over first, but I am having trouble discerning what the difference between them is meant to be. TheGrappler (talk) 00:11, 19 March 2010 (UTC)

They look quite similar to me as well. The former is more of a stub, though, so it should probably be merged into the latter.  // stpasha »  00:26, 19 March 2010 (UTC)
Thanks, any other thoughts on this? Other than the fact that the former lists Spearman as the originator of the method (without a reference), I don't think the former actually has any content the latter lacks? TheGrappler (talk) 12:34, 19 March 2010 (UTC)
Spearman ref is:
C. Spearman. The Proof and Measurement of Association between Two Things. The American Journal of Psychology 15 (1):72-101, 1904. JSTOR 1412159
Just noticed that we have an article on Regression dilution which references this, and which might also be considered for merging with the two articles above. Qwfp (talk) 13:51, 19 March 2010 (UTC)
To me it appears that both of those articles could be merged into regression dilution. Is it worth setting up a merge request? TheGrappler (talk) 13:00, 22 March 2010 (UTC)

Merging these two articles may be a good place to start. Prefixes in this listing identify the wikiprojects that now claim the articles: Measurement (Me), Statistics (S), Psychology (P).

As a start, I have now added merge templates to Correction for attenuation and Disattenuation. One contains some extra notation that is not in the other, but both relate to adjusting correlations rather than regression coefficients. I will transfer the reference given above into the corresponding article. Melcombe (talk) 16:49, 13 May 2010 (UTC)
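Both articles concern the same adjustment to a correlation, Spearman's correction for attenuation (cited above). It divides the observed correlation r_xy by the square root of the product of the two measures' reliabilities, r_xx and r_yy:

r_xy / √(r_xx r_yy)

Below is a minimal sketch with a hypothetical function name.

```python
import math


def disattenuated_correlation(r_xy, reliability_x, reliability_y):
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(reliability_x * reliability_y)


if __name__ == "__main__":
    # Observed correlation 0.4; the two measures have reliabilities 0.8 and 0.5.
    print(disattenuated_correlation(0.4, 0.8, 0.5))
```

With perfectly reliable measures (both reliabilities equal to 1), the correlation is unchanged. With sampling error in the estimated reliabilities, the corrected value can exceed 1 in magnitude.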

projected changes

So, I'm planning to make several changes to the project main page, and I thought I'd announce them here first.

  • We need a list of topics within the project. The “list of statistics topics” has an altogether different purpose, as Melcombe emphasized. What we need is Category:WikiProject_Statistics_articles, only in a more human-readable form.
  • Guidelines need to be improved, probably by starting a new page, Manual of style for WikiProject Statistics, similar to the existing notation in probability and statistics article.
  • The participants list needs clean-up. People who have retired from the project should be moved into the “retired” list; retirement is defined by whether the user has made any edits to statistics-related topics within the past 6 months. This will give us a better idea of how many people are actually active in the project.

 // stpasha »  17:07, 11 May 2010 (UTC)

I like 1 and 3, but don't get the manual of style suggestion. Can you tell me (1) what would go into this and (2) how we would minimize the work that has to go into it? My work minimization concern also applies to (3). Perhaps a bot could help us out. 018 (talk) 17:16, 11 May 2010 (UTC)
The generic MoS for Wikipedia is very long, very clever, and not very useful for our purposes. We want to introduce some kind of standardization across the project’s articles, so that readers moving from one article to another don't have to relearn all the notation. I understand that such a task might be quite difficult, but we can at least try. Another thing (and I believe Michael Hardy would support me) is to explain how to write inline mathematical formulas. Then there is the dice vs. die thing, the suggested formatting for references, the order of subsections, etc.  // stpasha »  18:00, 11 May 2010 (UTC)
Okay, sounds good, provided it can start short and then grow as needed. I would be a little hesitant to impose too strict a notational uniformity across the project, though. 018 (talk) 21:18, 11 May 2010 (UTC)
I'm not a project member, but this kind of suggestion seems to have a vague sense of impending doom about it. 018 makes some good points about creating unnecessary work and imposing too strict guidelines. A better approach might be to look at the actual articles, and see if there are issues which guidelines might help with. After all, the purpose of the project is to develop articles, and guidelines are just a means to an end. For a human-readable topic list, there already is Template:Statistics. -- Radagast3 (talk) 11:59, 12 May 2010 (UTC)
I actually like the idea now that I see stpasha's specific ideas. All of those would be useful to many editors and not too difficult to write. 018 (talk) 23:23, 12 May 2010 (UTC)
I looked at starting Manual of style for WikiProject Statistics and wondered, what is wrong with Manual_of_Style_(mathematics) for all the issues you mentioned? At the very least, I'd suggest we would make those additions there. 018 (talk) 00:42, 13 May 2010 (UTC)
Could have mentioned that earlier, you know… It’s too late to turn back now. One problem with MOS:MATH is that it is not mentioned anywhere on this project’s page. Besides, other projects certainly have their own recommended MOSes, for example Wikipedia:WikiProject Logic/Standards for notation.  // stpasha »  03:05, 13 May 2010 (UTC)
It's never too late. There would need to be several stages of review before a manual of style could be adopted, since it cannot be written merely to reflect one person's whim. If something special really is needed for "statistics", the easiest approach would be to add a special section for statistics articles to the maths one. Such changes can be agreed on the talk page there. Melcombe (talk) 09:10, 13 May 2010 (UTC)
I've put up the (largely incomplete) preliminary version at the Wikipedia:WikiProject Statistics/Manual of Style. The guide would certainly benefit from the discussion on some of the more controversial topics, so please set a watch to its Talk: page.  // stpasha »  22:52, 13 May 2010 (UTC)