Wikipedia talk:WikiProject Statistics/Archive 2

Articles flagged for cleanup

Currently, 477 articles are assigned to this project, of which 129, or 27.0%, are flagged for cleanup of some sort. (Data as of 14 July 2008.) Are you interested in finding out more? I am offering to generate cleanup to-do lists on a project or work group level. See User:B. Wolterding/Cleanup listings for details. More than 150 projects and work groups have already subscribed, and adding a subscription for yours is easy - just place the following template on your project page:

{{User:WolterBot/Cleanup listing subscription|banner=WPStatistics}}

If you want to respond to this canned message, please do so at my user talk page; I'm not watching this page. --B. Wolterding (talk) 17:55, 5 August 2008 (UTC)

Not too surprising; some are good but quite a lot are fairly weak. Michael Hardy (talk) 18:52, 5 August 2008 (UTC)
Added template with hidden = 1 until Wikipedia:WikiProject Statistics/Cleanup listing is created. I'll come back later to check when the bot runs again and creates the page.—G716 <T·C> 01:56, 6 August 2008 (UTC)
Fixed a typo (above): Statisitcs -> Statistics. Since the above file name is a blue link now, the bot has run. EdJohnston (talk) 19:38, 9 August 2008 (UTC)

Two new afd

Hello - I have nominated two statistics articles for deletion. Please comment at discussions below. If this notification is considered canvassing, please accept my apologies.

Regards—G716 <T·C> 04:17, 13 August 2008 (UTC)

Would the reasons proposed apply equally to list of important publications in mathematics, list of important publications in computer science, list of important publications in biology, list of important publications in physics, list of important publications in economics, list of important publications in psychology etc.? Michael Hardy (talk) 06:17, 13 August 2008 (UTC)
I replied to your comment on the AfD discussion—G716 <T·C> 07:20, 13 August 2008 (UTC)

List of scientific journals in statistics

Any opinions on using the table here instead of the list in List of scientific journals in statistics?—G716 <T·C> 14:31, 16 August 2008 (UTC)

Lack-of-fit sum of squares

I just created Lack-of-fit sum of squares. Happy editing! Michael Hardy (talk) 18:43, 17 August 2008 (UTC)

This looks good. But there remains the question of relating it to Explained sum of squares, Total sum of squares, Residual sum of squares and Sum of squares. The new article and the first three of these would form a natural group, so linking between them would be worthwhile. But do they first need to be rethought, and possibly have the notation changed, so that they more readily match the extended needs in the context of the new article? Melcombe (talk) 15:21, 18 August 2008 (UTC)

That situation has been a disorganized mess for some time. Definitely it needs work. Michael Hardy (talk) 17:56, 19 August 2008 (UTC)

Proposed deletion of K-factor error

For info, I have added a proposal to delete template to K-factor error as there has been no improvement. Melcombe (talk) 15:54, 18 August 2008 (UTC)

Probability distribution

Here's the worst opening paragraph I've read in a long time, at probability distribution:

A probability distribution describes the values and probabilities associated with a random event. The values must cover all of the possible outcomes of the event, while the total probabilities must sum to exactly 1, or 100%. For example, a single coin flip can take values Heads or Tails with a probability of exactly 1/2 for each; these two values and two probabilities make up the probability distribution of the single coin flipping event. This distribution is called a discrete distribution because there are a countable number of discrete outcomes with positive probabilities.

Sigh.... Michael Hardy (talk) 16:53, 20 August 2008 (UTC)

New article bot

The new article bot has been set up for statistics by User:Colchicum. The rules obviously need tweaking. The bot's statistics pages are here: (Search result, Log, Rules). Any help will be welcome. —G716 <T·C> 03:13, 23 August 2008 (UTC)

Wikiquote Statistics

I am tempted to delete all the unsourced quotes on the wikiquote:statistics page. These appear to be trivial list cruft, more suitable for American late night talk shows, that do nothing to further the understanding of the importance of Statistics. If you have any favorite meaningful quotes please consider adding them. —G716 <T·C> 11:57, 26 August 2008 (UTC)

Hypoexponential distribution

Can someone look at Hypoexponential distribution in terms of improving/explaining/wikilinking the "phase-type" notation, and possibly also providing formulae for the skewness and kurtosis, which should be possible? Melcombe (talk) 14:55, 26 August 2008 (UTC)

Hypothesis of linear regression?

Hypothesis of linear regression is a badly written article that has sat there for two-and-a-half years. Is it worth trying to save? Michael Hardy (talk) 19:38, 26 August 2008 (UTC)

It could be tagged onto the end of Mathematical formalization of the statistical regression problem, which is what it says it relates to. Certainly the "hypothesis" part of the article name is unhelpful. Melcombe (talk) 08:59, 27 August 2008 (UTC)

That latter article is one of the worst-written things on Wikipedia. Michael Hardy (talk) 09:37, 27 August 2008 (UTC)

I agree with Michael. This is a terrible article, starting with the title. I'd favor deep-sixing it. If there's anything worth saving I suppose it could be put elsewhere, but I would think that anything worthwhile is probably already in another article. Bill Jefferys (talk) 02:03, 28 August 2008 (UTC)
Maybe. But the trouble is knowing what the context of any particular regression-related article is meant to be. I think at least one tries to be about the mathematical problem rather than about the statistical context. At least Mathematical formalization of the statistical regression problem has a title that makes it clear (clearer) what the contents are supposed to be about. Melcombe (talk) 09:21, 1 September 2008 (UTC)

Process Window Index

If ever an article needed to get cleaned up, Process Window Index does. Maybe I'll be back....... Michael Hardy (talk) 12:32, 4 September 2008 (UTC)

Laurence Baxter

Laurence Baxter has been nominated for AfD. —G716 <T·C> 03:35, 9 September 2008 (UTC)

Most probable number

Most probable number is an article that escaped this project's notice, and that of most people who work on statistics, for some months; I suspect it lacked suitable category tags. It could probably benefit from the attention of statisticians. Michael Hardy (talk) 19:37, 9 September 2008 (UTC)

List of basic statistics topics

List of basic statistics topics has been cleaned-up somewhat. Now would be a good time for reviewing the list and seeing if other basic topics are needed, and whether the articles already linked-to are appropriate. Melcombe (talk) 10:11, 11 September 2008 (UTC)


External links to Distribution Explorer

I would like to bring wider attention to the discussion at Talk:Probability distribution#Links to source code & a Statistical Distribution Explorer from all Wikipedia Distribution entries. See also User talk:Pabristow#Distribution Explorer. Please continue the conversation at the former location rather than here. Thanks, Qwfp (talk) 15:57, 13 September 2008 (UTC)

Statistical consultant

I have listed Statistical consultant for AfD: Wikipedia:Articles for deletion/Statistical consultant. —G716 <T·C> 12:39, 14 September 2008 (UTC)

Wikipedia 0.7

I would like to draw the attention of those who watch this page but not Wikipedia talk:WikiProject Mathematics to WT:WPM#Wikipedia 0.7 articles have been selected for Mathematics. Of the 31 selected maths articles with maintenance tags, it appears the only statistical ones are Central limit theorem (expert), Linear regression (cleanup) and Student's t-distribution (cleanup).

From a brief look at the selected articles for WikiProject Mathematics, there are rather more selected statistics articles of only Start quality, however, including Degrees of freedom (statistics), Average, Random walk, Statistical independence, Principal components analysis, Cumulative distribution function, Statistical hypothesis testing, Analysis of variance, Confidence interval, Chi-square distribution, Probability space, Arithmetic mean, Median, Random variable, Probability (I may well have missed some). I haven't checked how recently these were assessed, so it could be worth revisiting that as well as editing the articles themselves. --Qwfp (talk) 12:54, 16 September 2008 (UTC)

Wiki for collection of Statistics

I apologise if this question has come up before... Is there a wiki for the collection of statistics from reliable sources (e.g. government departments, NGOs, the UN etc.) - all I can find from a google search is the usage statistics for other wikis. If one does exist, could a link to it go on this page? If one doesn't exist, is there any good reason why not? I think it would be a very useful resource. Andipi (talk) 22:46, 21 September 2008 (UTC)

There are some collections of articles under Category:Statistical data sets ...perhaps these form the basis of a reasonable resource, but no doubt they could be improved. See also Official statistics. But these are probably more about some of the datasets, rather than the data themselves, which may be what Andipi was thinking of. Melcombe (talk) 08:43, 22 September 2008 (UTC)
Thanks for the link. I'm quite surprised there isn't a more comprehensive wiki for statistics actually. Any idea what it would take to set one up? Andipi (talk) 19:16, 24 September 2008 (UTC)

Standard normal deviate

The new article titled Standard normal deviate seems highly problematic at best. Michael Hardy (talk) 02:09, 22 September 2008 (UTC)

...and now I've drastically edited it so that for now it's not nonsense. Michael Hardy (talk) 02:13, 22 September 2008 (UTC)
There are many similar very short articles that might be collected together into Glossary of probability and statistics ... but it would be good if this had a structure such that it would be easy to redirect directly to the relevant part of the article. Melcombe (talk) 08:47, 22 September 2008 (UTC)

Minimum distance estimation

I think we need an article on minimum distance estimation. We have articles on some subtopics like Cramér-von Mises, Kolmogorov-Smirnov, etc. -- Avi (talk) 02:45, 24 September 2008 (UTC)

OK, so I started one. Corrections, improvements, etc. greatly appreciated at Minimum distance estimation. Thanks. -- Avi (talk) 19:08, 24 September 2008 (UTC)

Political Forecasting

Can someone take a look at that article? It was written by someone with an obvious WP:COI. VG 17:31, 24 September 2008 (UTC)

Template:Theory of probability distributions

I've just created Template:Theory of probability distributions as a navigational template (navbox). It includes pmf, cdf, pdf, several articles on moments, pgf, mgf, characteristic function and cumulant. This was partly inspired by BenFrantzDale's point at Template talk:VarDevSkewnessEtc about the desirability of making the relationship between (some of) these articles more explicit, although Template:VarDevSkewnessEtc didn't itself seem too popular.

I haven't yet transcluded this into the bottom of the relevant articles. Comments more than welcome at Template talk:Theory of probability distributions. Thanks, Qwfp (talk) 16:49, 25 September 2008 (UTC)

Uncertain data up for deletion

It's a stub written from a (too) narrow comp. sci. perspective, but I doubt deletion is the right action there. VG 22:42, 28 September 2008 (UTC)

I agree. I would prefer not to see that article deleted. SamanthaG (talk) 03:10, 1 October 2008 (UTC)

Statistical practice on AfD

Statistical practice is on AfD. —G716 <T·C> 03:00, 5 October 2008 (UTC)

Khmaladze transformation

The article titled Khmaladze transformation is a mess, but maybe it can be reorganized into a reasonable article. Michael Hardy (talk) 15:15, 9 October 2008 (UTC)

Pages needing attention/Statistics

Wikipedia:Pages needing attention/Statistics currently redirects to Wikipedia:Pages needing attention/Mathematics#Statistics which is woefully out of date. I suggest (a) deleting Wikipedia:Pages needing attention/Mathematics#Statistics, (b) redirecting Wikipedia:Pages needing attention/Statistics to Wikipedia:WikiProject Statistics/Cleanup listing, and (c) using Wikipedia:WikiProject Statistics/to do for pages that need special attention.—G716 <T·C> 13:25, 14 September 2008 (UTC)

It does look better, but unfortunately this other list is also out-of-date: June 2008 according to the bot's description. Melcombe (talk) 16:51, 15 September 2008 (UTC)
Apparently the database snapshots that the bot uses are no longer taken on a regular basis; I don't know if this is permanent or temporary. I've added {{see also|Wikipedia:WikiProject Statistics/Cleanup listing}} to Wikipedia:Pages needing attention/Mathematics#Statistics. —G716 <T·C> 03:34, 3 October 2008 (UTC)
Wikipedia:WikiProject Statistics/Cleanup listing has been updated. —G716 <T·C> 11:00, 13 October 2008 (UTC)

Wagaman Reference Lines

Wagaman Reference Lines is on AfD. —G716 <T·C> 22:35, 12 October 2008 (UTC)

Uniformly more powerful?

Should the term uniformly more powerful (as used in Bonferroni correction) point to uniformly most powerful test, or is there a better redirect? -- The Anome (talk) 14:22, 10 October 2008 (UTC)

I've directed "powerful" to statistical power and added a parenthetical explanation. Michael Hardy (talk) 03:41, 20 October 2008 (UTC)

New banner suggestion

I propose a redesign to the banner Template:WPStatistics/NewBanner as shown below. I think this is cleaner looking and provides more info than the current banner {{WPStatistics}}. Any comments? {{WPStatistics/NewBanner}} —G716 <T·C> 02:39, 26 October 2008 (UTC)

New template StatsTopicTOC

I created Template:StatsTopicTOC to mimic Template:MathTopicTOC, but simpler. I intentionally left out the TOC to List of statistics topics as that article is on a single page, rather than the multi-page math list. Comments? —G716 <T·C> 15:06, 26 October 2008 (UTC)

pseudo code in correlation article

I've suggested (here) removing pseudo-code from the correlation article. Any opinions? —G716 <T·C> 11:17, 11 November 2008 (UTC)

Category:Statistical lists and tables

It seems to me that Category:Statistical lists and tables is morphing into a statistics version of Category:Mathematics-related lists, with only a couple of articles on "tables." I suggest moving Category:Statistical lists and tables to "Category:Statistics-related lists", and recategorizing the two "tables" articles. Regards—G716 <T·C> 18:04, 4 October 2008 (UTC)

There was previously an article "t-table" that contained numerical values for the Student-t distribution, but this was deleted possibly because some similar content is/was in the Student t article. The present two articles on tables do not contain extensive numerical values, but are more descriptive. Two thoughts are: (i) there may be scope for an article on statistical tables, briefly listing the tables in student-level tables and going on to describe what is contained in more extensive sets of tables; (ii) it might be useful to see if there are any more "actuarial" tables that can usefully be covered in a combined category. Melcombe (talk) 09:37, 6 October 2008 (UTC)
I cannot find any other articles on statistical tables, so I am going to remove the two articles in this category and move the cat.—G716 <T·C> 15:48, 16 November 2008 (UTC)

Please comment on the CfD here. Regards—G716 <T·C> 16:02, 16 November 2008 (UTC)

(Wilcoxon–)Mann–Whitney(–Wilcoxon) (U)

I'm not a statistician and therefore hesitate to edit stats articles. But I wonder about the terminology in one article: please see my comments here. Thank you. Tama1988 (talk) 09:53, 17 November 2008 (UTC)

T-table up for afd

T-table is on AfD. —G716 <T·C> 05:08, 26 October 2008 (UTC)

This has been relisted to allow more opinions ... so still time to say what you think. Melcombe (talk) 15:28, 3 November 2008 (UTC)
Just as a side note, is this table correct? It gives a value of 1.96 at 97.5% at infinity, which I'm willing to bet a chance at an infinite amount of money is the value for 95%... SDY (talk) 00:54, 24 November 2008 (UTC)

Note that T-table no longer points to the article being discussed for deletion, which has now been deleted, see Wikipedia:Articles for deletion/T-table. Instead it is a redirect to a different table which was already in the Student t distribution article ... the difference being that one gave two-tailed and one one-tailed critical points. This may answer the "side note" above? Melcombe (talk) 09:55, 5 December 2008 (UTC)
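The one-tailed versus two-tailed distinction can be checked numerically. The following is a minimal sketch in Python, assuming that the "infinity" row of a t-table coincides with the standard normal distribution; the variable names are illustrative only.

```python
from statistics import NormalDist

# With infinite degrees of freedom the t distribution is the standard normal.
z = NormalDist()  # mean 0, standard deviation 1

# One-tailed critical point: the 97.5th percentile.
one_tailed_975 = z.inv_cdf(0.975)

# Two-tailed 95% critical point: 2.5% is left in each tail,
# so it is the same quantile as the one-tailed 97.5% point.
alpha = 0.05
two_tailed_95 = z.inv_cdf(1 - alpha / 2)

# One-tailed 95% critical point, for comparison.
one_tailed_95 = z.inv_cdf(0.95)

print(round(one_tailed_975, 2), round(two_tailed_95, 2), round(one_tailed_95, 3))
```

Both the one-tailed 97.5% point and the two-tailed 95% point come out at about 1.96, which is consistent with the two tables differing only in whether they report one-tailed or two-tailed critical points.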

"Binomial sign test"

Binomial sign test is a redlink (not a redirect); neither binomial test nor sign test (I think) mentions the other. Ummm? Tama1988 (talk) 11:13, 18 November 2008 (UTC)

New category

I have created a new category Category:Medical statistics, mainly on the grounds that it seems to be an obvious thing to find under "fields of application of statistics", but also because it seems that everything that is "medical statistics" may not be "epidemiology", which I think is the main alternative. So, can those with a relevant background think about adding existing articles about topics or special terminology used in medical statistics to the category. Melcombe (talk) 10:26, 18 December 2008 (UTC)

Hypothesis of linear regression

I have raised an AfD for this article: see Wikipedia:Articles for deletion/Hypothesis of linear regression. See also previous discussion here at Wikipedia talk:WikiProject Statistics/Archive 2#Hypothesis of linear regression?. Melcombe (talk) 17:13, 22 December 2008 (UTC)

How about adding Mathematical formalization of the statistical regression problem to the AfD as well? —G716 <T·C> 04:11, 23 December 2008 (UTC)

Deletion nomination

I've nominated this page:

Wikipedia:Articles for deletion/Mathematical formalization of the statistical regression problem

Don't just say Keep, Delete, Merge, etc. State your reasons and arguments. Michael Hardy (talk) 19:37, 30 December 2008 (UTC)

Proposal: always include examples

Proposal: in every article include an example section, with how the concept might be applied in real life —Preceding unsigned comment added by 89.181.64.44 (talk) 09:33, 5 January 2009 (UTC)

Article alerts

There's a new automated report for the project at Wikipedia:WikiProject Statistics/Article alerts.

It contains all articles in workflows which are tagged with {{WPStatistics}} on their talk page. See User:B. Wolterding/Article alerts

The following workflows are covered in this report. (Not necessarily all of them have active items, though.)

  • Proposed deletion
  • Articles for deletion
  • Miscellany for deletion
  • Templates for deletion
  • Categories for deletion
  • Good article nominations
  • Good article reassessment
  • Good topic candidates
  • Featured article candidates
  • Featured article reviews
  • Featured list candidates
  • Featured list removal candidates
  • Featured topic candidates
  • Peer review
  • Requests for comments
  • Requested moves
  • Did you know

I'll add a note on the project page to let folks know that this report exists. Regards —G716 <T·C> 03:44, 6 January 2009 (UTC)

Extreme value distribution needs fixing

The redirects at Gumbel distribution and log-Weibull distribution are currently a mess. I suggest renaming the current Fisher-Tippett distribution article as Gumbel distribution, and redirecting Fisher-Tippett distribution to Extreme value distribution or vice versa, with possible merging/rearranging of information. (Igny (talk) 02:29, 4 January 2009 (UTC))

Request to admins, can someone move Fisher-Tippett distribution to Gumbel distribution? Currently this article is primarily about the Gumbel distribution, and the more general case deserves a separate article, which is actually partially covered in generalized extreme value distribution. (Igny (talk) 18:16, 5 January 2009 (UTC))
I have put in a request for a rename (you could have done this yourself by following the instructions on the "move" tab of the page). Once this is done, will you either implement or check for a redirect of Fisher-Tippett to Generalized extreme value distribution, which seems the most appropriate target I think? There would also be a need to revise some of the text/headings in the various articles. Melcombe (talk) 10:26, 6 January 2009 (UTC)
I have done some of the above at least. But there is also a need to think how the articles Type-1 Gumbel distribution and Type-2 Gumbel distribution should be dealt with. They seem to be, respectively, a different article about the Gumbel distribution, and a special parameterisation of one of the cases of the generalized extreme value distribution. It may be enough just to record their alternative names in the other articles, with the references to the underlying webpages also included. Melcombe (talk) 14:55, 6 January 2009 (UTC)
In my opinion, we might as well delete Type-1 Gumbel distribution and Type-2 Gumbel distribution. These are just differently parameterized versions of the Gumbel distribution and Weibull distribution, as implemented in GSL. (Igny (talk) 16:52, 7 January 2009 (UTC))
But the question is: are these terms used sufficiently widely (i.e. anywhere outside this library of computer routines) that the terms need to be mentioned and/or redirected in Wikipedia? Equivalence to something else is not directly relevant to this question. Melcombe (talk) 09:59, 8 January 2009 (UTC)

Major revision of Fisher consistency

Please see Fisher consistency: the original was short and I have added stuff about two other meanings for the same term but there may be others. It is still short and could do with some clarification throughout, by someone who knows about these things. Melcombe (talk) 11:25, 9 January 2009 (UTC)

Help this user

User:Bassis has created several new statistics articles. He or she clearly knows the topics, but the articles are the absolute worst imaginable cases of how to ignore necessary context-setting. (There are also the usual newbie issues of adapting to conventions, etc.) Michael Hardy (talk) 17:03, 12 January 2009 (UTC)

Contents of list of topics

Please contribute at the end of Talk:List of statistics topics if you have any thoughts on whether things like journals should be included in the list of statistics topics. Melcombe (talk) 13:05, 16 January 2009 (UTC)

Null hypothesis - discussion of merge

Please see Talk:Null hypothesis for discussion of moving a large slice of the existing article into the article Statistical hypothesis testing which may be where it belongs. Melcombe (talk) 10:31, 29 January 2009 (UTC)

Gravity (social science methodology)

Gravity (social science methodology) is an article in need of further work. It's been "prodded", which seems a bit extreme and I may remove the tag. Michael Hardy (talk) 02:17, 6 February 2009 (UTC)

Gravity in meta-analysis nominated for deletion

It is proposed that Gravity in meta-analysis be deleted. Please write opinions at Wikipedia:Articles for deletion/Gravity in meta-analysis. Don't just say Keep or Delete; give reasons. Michael Hardy (talk) 15:11, 6 February 2009 (UTC)

Poincaré plot

Poincaré plot needs work! Michael Hardy (talk) 18:57, 9 February 2009 (UTC)

Stable distribution articles

There is a discussion at Talk:Stability (probability) regarding rearrangement of three existing articles on this topic. Please see this urgently if you are concerned about this. Melcombe (talk) 11:49, 23 February 2009 (UTC)

Average

It seems that an important insight has been deleted from this topic. There is a general approach to averages that explains why all the various kinds of averages are subsumed under the same name. In this article on averages, the approach is now only taken in regard to annualized return, but it is completely general for all types of averages of any number of terms (even the Heronian Mean of n terms): the f-type of average of n terms is defined, using the n-place function f, by: f(t1, t2, ..., tn) = f(A, A, ..., A). Thus, the arithmetic average, A, of two numbers is defined by the equation: N1 + N2 = A + A, and the geometric average, G, of three numbers is defined by the equation: N1*N2*N3 = G*G*G, and so on for all averages discussed in the article and elsewhere. Without such a discussion, the article gives no indication as to why all the various types of calculations it discusses are related. Please, let's move this discussion from taxonomy back to mathematics. (cf. "Advanced Portfolio Attribution Analysis" edited by Carl Bacon, chapter 6) 74.66.21.220 (talk) 07:55, 24 February 2009 (UTC)
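The defining equation f(t1, ..., tn) = f(A, ..., A) can be illustrated with a short sketch. The helper name f_average is hypothetical, and the bisection solver assumes that f(A, ..., A) is continuous and increasing in A over the range of the data, which holds for sums and for products of positive numbers.

```python
import math

def f_average(f, terms, tol=1e-12):
    """Solve f(t1, ..., tn) = f(A, ..., A) for A by bisection.

    Assumes f(A, ..., A) is continuous and increasing in A between
    min(terms) and max(terms).
    """
    n = len(terms)
    target = f(terms)
    lo, hi = min(terms), max(terms)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f([mid] * n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

numbers = [2.0, 4.0, 8.0]

# Arithmetic average: N1 + N2 + N3 = A + A + A
arithmetic = f_average(sum, numbers)

# Geometric average: N1*N2*N3 = G*G*G
geometric = f_average(math.prod, numbers)

# Annualized return: (1+r1)(1+r2)(1+r3) = (1+R)(1+R)(1+R)
returns = [0.10, -0.05, 0.20]
annualized = f_average(lambda rs: math.prod(1 + r for r in rs), returns)

print(arithmetic, geometric, annualized)
```

Only the function f changes between the three calls; the solver is the same, which is the sense in which the different averages share one definition.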

I think the above is meant to relate to the content of Average. There is already some discussion of a similar point on that article's talk page. Melcombe (talk) 09:54, 24 February 2009 (UTC)

Statistical mechanics

Does anyone have any thoughts on dealing with articles related to 'statistical mechanics', as these often aren't particularly related to "statistics"? I know that this project has covered a lot that is not "statistics" but rather "probability", but at least it is useful to statistics. I have taken the step of putting the category for statistical mechanics directly under "Probability and statistics" as well as leaving it under "Statistics", for now at least. Melcombe (talk) 10:55, 28 November 2008 (UTC)

I agree, statistical mechanics uses probability theory but it's not closely related to statistics. "Probabilistic mechanics" might be a more descriptive name really. I've changed the text at the top of category:statistical mechanics to agree with the lead of the statistical mechanics article and say that it "concerns the application of probability theory..." rather than "...of statistics" as it said before. I'd !vote for removing it from category:statistics. Qwfp (talk) 18:28, 28 November 2008 (UTC)

I just came across statistical finance which also seems to suffer from using "statistical" in this strange way. Melcombe (talk) 10:17, 23 December 2008 (UTC)

Following on from the above, I have made a new category Category:Statistical mechanics theorems and moved some articles into this from Category:Statistical theorems. If anyone sees any others that deserve to be moved, then they could move them. Melcombe (talk) 09:24, 4 February 2009 (UTC)

The paragraph on Solomon four group design in the page on Experimental Design is extremely unclear and needs revision. —Preceding unsigned comment added by Blinkman77 (talk · contribs) 15:12, 1 March 2009 (UTC)

Generalized distributions

There was a little discussion recently at Talk:Generalized_Gaussian_distribution (now moved on to Talk:Generalized normal distribution) about "generalized" distributions which it may be worth extending to other cases. How should "generalized" distributions be treated when there are several different versions of a distribution that have gone by the same name? One case in particular is the group going by the name "generalized logistic", of which Johnson et al Vol 2 list three distinct types (although they say 4 different "Types") and there is at least another one. I guess the same must happen for other developments of standard distributions. It may be difficult to retain the existing format of the "distribution" articles, with their boxes for details and plots, if there are multiple versions of distributions of the same name. Melcombe (talk) 10:31, 3 March 2009 (UTC)

Correspondence Analysis: New Entry

I would like to write an article on Correspondence Analysis; I see this topic is mentioned in various other entries (e.g., Principal Component Analysis, and Seriation (Archeology)). Is it possible to set up the entry so that I write the article? I am new to all this; apparently it is easier if some expert sets up the new entry title, then I can add to it. Thanks. Michael Greenacre michael.greenacre@gmail.com 141.76.179.243 (talk) 09:53, 17 March 2009 (UTC)

I've replaced the redirect we had for correspondence analysis with the text from the relevant section of our principal components analysis article. Please go ahead and add to or replace this as you see fit. -- Avenue (talk) 00:24, 18 March 2009 (UTC)

Donsker's theorem

Can someone address the question I raised at talk:Donsker's theorem? Michael Hardy (talk) 12:00, 21 March 2009 (UTC)

General linear model

Please see Talk:General linear model for a question of terminology. Melcombe (talk) 10:31, 23 March 2009 (UTC)

The BiPareto distribution

The BiPareto distribution (capitalisation taken from the inventors) was introduced in 2002 in this paper and has since been used in various other papers. Unfortunately, I fear I have not the time to write an article about it. Perhaps, by mentioning it here, someone interested in writing one will come up. --Pot (talk) 12:42, 26 March 2009 (UTC)

MOS for CDFs and PDFs

I think that there should be consistency in the plots of distribution functions (PDFs and CDFs). In particular, it would be nice if there were a single format for each, and a size recommendation. I don't intend to work out the details of that manual entry here, just propose it (and maybe get some help as to where it would go). Two examples of where this would make the encyclopedia more unified are Poisson distribution and Binomial distribution. In the case of the binomial, the pmf is shown with dots at each integer; the Poisson has lines connecting them. One CDF has lines connecting discrete jumps, suggesting the CDF is a relation; the other shows the CDF as a function.

It might also make sense to prefer those with the code to those that do not have the code (this allows others to edit the code and verify accuracy). I think it also makes sense to prefer those made with open source software to those made with proprietary software. I would also argue that R should be preferred to gnuplot because R is truly cross-platform: for example, CRAN has binary releases for Windows, OS X, Debian, SUSE, Ubuntu, and Red Hat, while SourceForge (the "primary download site" according to the gnuplot page) only has gnuplot binaries for Windows and OS/2. But maybe I am too far into the weeds here; I am not trying to open up a fight over the merits of these suggestions. I am mainly seeing if others are interested in having unification of style along this front. PDBailey (talk) 04:16, 28 March 2009 (UTC)
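The two plotting conventions for discrete distributions differ only in presentation: the pmf is defined only at the integers, and the CDF is a right-continuous step function. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not taken from any article's plotting code.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero off the support."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(x, n, p):
    """P(X <= x) for any real x: a step function that jumps at the integers."""
    return sum(binomial_pmf(k, n, p) for k in range(0, math.floor(x) + 1))

n, p = 4, 0.5

# The pmf consists of isolated points, one per integer in the support.
pmf_points = [(k, binomial_pmf(k, n, p)) for k in range(n + 1)]

# The CDF is flat between integers, so these three arguments give one value.
flat = [binomial_cdf(x, n, p) for x in (1.0, 1.5, 1.99)]

print(pmf_points)
print(flat)
```

A plot that joins pmf points with lines, or draws vertical segments at the CDF jumps, shows values the functions do not actually take.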

This would once have come under the remit of WP:WikiProject Probability but that's been essentially inactive for quite a while now. There was some discussion of the most appropriate way to plot discrete distributions at Template talk:Probability distribution#Discussion back in April 2005 (before my time). I'm not clear that a consensus was reached. Perhaps that's a good place to take up the discussion, and if a consensus is reached, guidelines could be included in Template:Probability distribution/doc, which is transcluded on Template:Probability distribution. Can't say I have any strong opinions on these issues myself — agree it's a good idea to include code in the image pages, though I think discussion of preferred software for plots could be a distraction. Qwfp (talk) 10:50, 28 March 2009 (UTC)

Conjoint analysis

I am surprised that the discussion of conjoint analysis makes no mention of the elegant work laying the axiomatic foundations of the multinomial logit model in 1974 by MIT professor Daniel McFadden. His paper "Conditional Logit Analysis of Qualitative Choice Behavior", in Frontiers in Econometrics is a classic in the field. —Preceding unsigned comment added by 203.118.130.60 (talk) 04:30, 5 April 2009 (UTC)

Proposed deletion

The recently added article Loyer's paradox has been proposed for deletion (neither thing by me). Is this something that is known by another name already or is it otherwise worth saving? Melcombe (talk) 08:40, 6 April 2009 (UTC)

From my comments on that article's talk page:
Arnie's "overall average" is based on his having faced left-handed pitchers 80 times and right-handed pitchers only 20 times. The article is not explicit about the probability that the pitcher is left-handed. It implies that the probability that Arnie is chosen is 1/2, regardless of the pitcher's handedness, since that number is used in finding the "unconditioned probability of a hit". Clearly Arnie's overall probability of getting a hit depends on how often he faces a left-handed pitcher.
Suppose the probability of a left-handed pitcher is p and that of a right-handed pitcher is 1 − p. What is the conditional probability that Arnie is chosen given that there's a left-handed pitcher? Is it still 1/2, i.e. the choice of batter is independent of the handedness of the pitcher? If so, then Arnie's overall probability of getting a hit is 0.3p + 0.1(1 − p). Do the same for Barney, with his numbers.
If they're not independent, similar questions arise, but the joint probability distribution must be chosen in a way that makes the aforementioned "1/2" correct.
At any rate, that the article's bottom-line conclusion is erroneous can be seen by invoking the law of total probability.
I'm a bit rushed right now. I'll be back. Michael Hardy (talk) 20:36, 6 April 2009 (UTC)

Solution and refutation

begin excerpt from article

          against left-handers   against right-handers   overall average
Arnie     24/80 = .300           2/20 = .100             26/100 = .260
Barney    12/50 = .240           24/50 = .480            36/100 = .360
P(H)      .270                   .290                    .310

The data produce a paradox wherein the unconditional probability of the pinch-hitter getting a hit is greater than any of the conditional probabilities. In formal notation,

  • The unconditioned probability of a hit is
P(H) = {\frac 1 2}(.260) + {\frac 1 2}(.360) = .310
  • The conditional probability of a hit given the pitcher is left-handed is
P(H \mid L) = {\frac 1 2}(.300) + {\frac 1 2}(.240) = .270
  • The conditional probability of a hit given the pitcher is right-handed is
P(H \mid R) = {\frac 1 2}(.100) + {\frac 1 2}(.480) = .290

end excerpt from article

After the words "In formal notation", the coefficients of 1/2 in the first line of TeX mean that the probability that Arnie is chosen is 1/2 and the probability that Barney is chosen is 1/2. In the second and third lines of TeX, the coefficients equal to 1/2 mean that the conditional probability that Arnie is chosen, given that the pitcher is left-handed, is 1/2, and the same for Barney, and the conditional probability that Arnie is chosen, given that the pitcher is right-handed, is 1/2, and the same for Barney. That implies that the choice of Arnie or Barney is independent of the handedness of the pitcher. This independence means the probability of getting a left-handed pitcher is the same for the two batters. But the table above treats them as different. That is the essential error. Once we know they're independent, then to complete the probability model, we need only know the probability p of getting a left-handed pitcher (and then 1 − p is the probability of a right-handed pitcher). Then we can say Arnie's "overall average", i.e. his probability of getting a hit, is


\begin{align}
\Pr(\text{hit} \mid \text{Arnie}) & = \Pr(\text{hit} \mid \text{LH and Arnie})\Pr(\text{LH}) + \Pr(\text{hit} \mid \text{RH and Arnie})\Pr(\text{RH}) \\
& = 0.3 p + 0.1(1-p) \\
& = 0.2p + 0.1.
\end{align}

Similarly Barney's overall probability of a hit is


\begin{align}
\Pr(\text{hit} \mid \text{Barney}) & = \Pr(\text{hit} \mid \text{LH and Barney})\Pr(\text{LH}) + \Pr(\text{hit} \mid \text{RH and Barney})\Pr(\text{RH}) \\
& = 0.24 p + 0.48(1-p) \\
& = 0.48 - 0.24p.
\end{align}

Then the unconditioned probability of a hit is


\begin{align}
\Pr(\text{hit}) & = \Pr(\text{hit} \mid \text{Arnie})\Pr(\text{Arnie}) + \Pr(\text{hit} \mid \text{Barney})\Pr(\text{Barney}) \\
& = \frac 1 2 (0.2p + 0.1) + \frac 1 2 (0.48 - 0.24p) \\
& = 0.29 - 0.02p.
\end{align}

The conditional probability of a hit given a left-handed pitcher is


\begin{align}
\Pr(\text{hit} \mid \text{LH}) & = \Pr(\text{Arnie})\Pr(\text{hit} \mid \text{LH and Arnie}) + \Pr(\text{Barney})\Pr(\text{hit} \mid \text{LH and Barney}) \\
& = \frac 1 2 (0.3) + \frac 1 2 (0.24) \\
& = 0.27.
\end{align}

Similarly, the conditional probability of a hit given a right-handed pitcher is


\begin{align}
\Pr(\text{hit} \mid \text{RH}) & = \Pr(\text{Arnie})\Pr(\text{hit} \mid \text{RH and Arnie}) + \Pr(\text{Barney})\Pr(\text{hit} \mid \text{RH and Barney}) \\
& = \frac 1 2 (0.1) + \frac 1 2 (0.48) \\
& = 0.29.
\end{align}

The question now is: Is the unconditional probability of a hit BETWEEN the conditional probability of a hit given LH, and the conditional probability of a hit given RH?

I.e. is 0.29 − 0.02p between 0.27 and 0.29?

The answer is "yes", since p is between 0 and 1. If p = 0, then the probability is 0.29. If p = 1, then the probability is 0.27.

So the claim in the article is false.

The error entered at the point where the article's author based the "overall average" on the historical frequency with which Arnie faced left- and right-handed pitchers. It should have been based instead on the probabilities of his facing left- and right-handed pitchers in the proposed scenario.

Notice that

 \Pr(\text{hit}) = \Pr(\text{hit} \mid \text{LH})\Pr(\text{LH}) + \Pr(\text{hit} \mid \text{RH})\Pr(\text{RH}), \,

and that is enough to prove that the unconditional probability of a hit must lie between the two conditional probabilities. In this connection, see also law of total probability. Michael Hardy (talk) 03:29, 7 April 2009 (UTC)
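The arithmetic above can be checked numerically. The following is only a minimal sketch, not part of the original comment: it takes the conditional batting averages from the article's table, assumes (as the article's own coefficients imply) that each batter is chosen with probability 1/2 independently of the pitcher's handedness, and treats the probability p of a left-handed pitcher as a free parameter. The names `HIT`, `P_BATTER`, `p_hit_given_hand`, and `p_hit` are illustrative only.

```python
from fractions import Fraction

# Conditional hit probabilities taken from the article's table.
HIT = {
    ("Arnie", "LH"): Fraction(24, 80),
    ("Arnie", "RH"): Fraction(2, 20),
    ("Barney", "LH"): Fraction(12, 50),
    ("Barney", "RH"): Fraction(24, 50),
}

# Assumption: each batter is chosen with probability 1/2,
# independently of the pitcher's handedness.
P_BATTER = {"Arnie": Fraction(1, 2), "Barney": Fraction(1, 2)}


def p_hit_given_hand(hand):
    """Pr(hit | pitcher's hand), averaging over the choice of batter."""
    return sum(P_BATTER[b] * HIT[(b, hand)] for b in P_BATTER)


def p_hit(p_left):
    """Pr(hit) by the law of total probability, for Pr(LH) = p_left."""
    p_left = Fraction(p_left)
    return (p_hit_given_hand("LH") * p_left
            + p_hit_given_hand("RH") * (1 - p_left))


if __name__ == "__main__":
    low = min(p_hit_given_hand("LH"), p_hit_given_hand("RH"))
    high = max(p_hit_given_hand("LH"), p_hit_given_hand("RH"))
    for k in range(11):
        p = Fraction(k, 10)
        # The unconditional probability never leaves [low, high].
        print(p, p_hit(p), low <= p_hit(p) <= high)
```

Exact fractions are used so that the comparison with 0.27 and 0.29 is not affected by floating-point rounding.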

False Positive Rate.

In your definition of the false positive rate, shouldn't it be defined as S/R instead of V/R? —Preceding unsigned comment added by 67.171.96.218 (talk) 15:29, 6 April 2009 (UTC)

Polynomial regression

Polynomial regression was written by a VERY CONFUSED person, as you will see if you look at the edit history. I'm not sure whether it's worth saving or not. Michael Hardy (talk) 05:15, 7 April 2009 (UTC)

Michael Hardy, I am not sure why the editor or the process matter, the article is fine and an okay topic to have. What exactly was the point of this section? PDBailey (talk) 02:16, 12 April 2009 (UTC)
The point was that the article needed rewriting, and it has been ... between the time of the first post and yours ... so there probably was a good effect of posting here. Although, as always, still more could be done. See article's talk page. Melcombe (talk) 08:57, 15 April 2009 (UTC)

Multiple meanings of cross-validation

Apparently there are uses of the term "cross-validation" (in analytical chemistry and psychology) that are completely unrelated to its meaning in statistics. Some edits have been made to the cross-validation page regarding one of the other uses. It's so completely unrelated that it seems to belong on a different page. I'm inclined to add a disambiguation page and move the current page to cross-validation (statistics). But I thought I would ask for comments first. Skbkekas (talk) 00:18, 12 April 2009 (UTC)

Skbkekas, what makes you think the one in chemistry is completely unrelated to the one in statistics? To me it seems obvious that it's a very similar idea. Michael Hardy (talk) 03:51, 14 April 2009 (UTC)
The idea in chemistry is to use two different assays to measure the same physical quantity. The rationale is that each assay suffers from its own (hopefully unique) biases, so if the finding is consistent in the two assays, it is less likely to be a technical artifact. The idea in psychology seems to be similar -- use two different assessment instruments to measure the same behavioral or personality characteristic, and hope that the limitations of one instrument are not shared by the other instrument. The idea in statistics seems quite distinct to me -- there is no analogue of the "two assays" or "two instruments". Rather, the concern is about overfitting a single data set using a single statistical model. Cross-validation in statistics isn't very easy to explain or understand, and I think that having to wade through remarks about other uses of the term can only make things more difficult for the reader. Skbkekas (talk) 01:30, 15 April 2009 (UTC)
Sounds like a good idea to me. PDBailey (talk) 02:12, 12 April 2009 (UTC)
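For readers unfamiliar with the statistical sense of the term discussed in this thread, here is a minimal sketch of k-fold cross-validation, one common variant. It is an illustration under stated assumptions, not anything taken from the cross-validation article: the model is an ordinary least-squares straight line, the folds are formed by interleaving indices, and the function names (`interleaved_folds`, `fit_line`, `cv_mse`) and the toy data are hypothetical.

```python
def interleaved_folds(n, k):
    """Partition indices 0..n-1 into k folds: fold i holds i, i+k, i+2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cv_mse(xs, ys, fit, k):
    """Mean squared prediction error over held-out points, using k folds."""
    n = len(xs)
    total = 0.0
    for fold in interleaved_folds(n, k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)  # fitted without the held-out fold
        total += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
    return total / n


if __name__ == "__main__":
    # Toy data (made up for illustration): a line plus alternating noise.
    xs = list(range(12))
    ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
    print(cv_mse(xs, ys, fit_line, 4))
```

The point of the design is that each observation is predicted by a model that never saw it, which is what guards against overfitting a single data set with a single model; there is no analogue here of the "two assays" or "two instruments".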

Now clean up the links

It's been moved to cross-validation (statistics). Many of the links to cross-validation, now a disambiguation page, should link to cross-validation (statistics). Some (e.g. cross validation, with no hyphen) should link to the disambiguation page. And maybe some should link to the other pages. Please help. Michael Hardy (talk) 03:49, 14 April 2009 (UTC)

Agreed. I fixed around 10 and will look for others. Skbkekas (talk) 01:30, 15 April 2009 (UTC)

Merger proposal (familywise error rates)

I have proposed to merge experimentwise error rate and comparisonwise error rate into familywise error rate. Skbkekas (talk) 18:04, 22 April 2009 (UTC)

I think that's a good idea. Jollyroger131 (talk) 03:01, 27 April 2009 (UTC)

I've now done the following: I created a new page called false positive rate and redirected comparisonwise error rate and pairwise error rate to it. I think "false positive rate" is the modern term among the three, and the other two terms are identical in meaning but are usually used in the context of ANOVA. I did include some discussion on the false positive rate page about this. I'm less sure about what to do with familywise error rate and experimentwise error rate. I think "familywise error rate" is the more modern term, and most of the time the two terms are used interchangeably. However there may be some situations in higher order ANOVA where the meanings diverge. Unfortunately, the experimentwise error rate page suggests that "familywise" refers to comparisons within a factor, which is inconsistent with its modern usage outside of ANOVA. I don't know how standard this usage is within ANOVA. Skbkekas (talk) 16:31, 30 April 2009 (UTC)
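The distinction between the per-comparison false positive rate and the familywise error rate can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that is not made in the comment above: the m tests are independent and every null hypothesis is true, in which case the familywise error rate is 1 − (1 − α)^m. The function names are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Pr(at least one false positive) among m independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-comparison level that keeps the familywise rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 10, 20):
        adjusted = bonferroni_level(alpha, m)
        print(m,
              familywise_error_rate(alpha, m),
              familywise_error_rate(adjusted, m))
```

With a single test the two rates coincide; as the number of comparisons grows, the familywise rate rises well above the per-comparison rate unless the per-comparison level is adjusted.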

Broken template

The link under "Review and assessment" to WikiProject Mathematics/Wikipedia 1.0/Probability and statistics seems broken at present, in that the template on that page does not produce the usual tables for this topic, whereas they are still produced for the other maths topics. Does someone here know whom to inform to try to get this fixed? Melcombe (talk) 09:30, 23 April 2009 (UTC)

I left a note for the bot owner here. G716 <T·C> 00:25, 24 April 2009 (UTC)
Somehow the page had become too large. I have recreated it and it seems to be OK now. Eventually this whole system will be merged into the WP 1.0 bot, which will solve these size problems. — Carl (CBM · talk) 01:45, 24 April 2009 (UTC)

truly large number of monkeys

Infinite monkey theorem and Law of Truly Large Numbers seem related in my opinion, so much that I thought of a possibility to merge. The articles did not even contain references to each other. (Igny (talk) 18:57, 25 April 2009 (UTC))

Least-squares estimation of linear regression coefficients

Least-squares estimation of linear regression coefficients is something of a mess. I think it's almost all by one person who thoroughly ignores WP:MOSMATH and has a really lousy expository style, overly notation-intensive and otherwise problematic. Michael Hardy (talk) 18:56, 27 April 2009 (UTC)

It seems to be yet another article along the lines of linear regression, linear least squares and linear model. While I don't think it is a problem having separate articles, they do seem to have a lot of overlap. —3mta3 (talk) 19:20, 27 April 2009 (UTC)

Op (statistics)

Op (statistics) is a very badly written article. Another article written by the same person, extremum estimator, suggests that "Op" is intended to be something akin to the big-O notation. But nothing in Op (statistics) explains that. Instead the article makes an assertion that is clearly not true as it stands. It never attempts to say what "Op" is. Can someone help? Michael Hardy (talk) 19:23, 29 April 2009 (UTC)

The idea is directly related to convergence in probability and might be best dealt with there perhaps (???) except that this redirects into a longer article (but this might still be suitable). Alternatively it could be put into the big-O notation article as it is almost a natural extension. The standard notation has a big or little O followed by a subscript p ... I think it is usually a lower case p but my dictionary uses a capital subscript P. A first problem would be to sort out a suitable article title even for a redirect .... the "p" or "P" stands for "probability" not "statistics". Melcombe (talk) 10:08, 30 April 2009 (UTC)
I have made some changes under the same title for the time being, but more work is needed, including finding a sensible title. Suggest move discussion to the article's talk page. Melcombe (talk) 11:41, 30 April 2009 (UTC)

Re-Organize Topics List on Statistics Portal

I propose a re-structuring of the topics listed on the statistics portal. The first two topic headers, "Descriptive Statistics" and "Inferential Statistics", are very general categories, but then we have headers of "Survival Analysis" and "Regression Analysis" which are individual methods. Also, Analysis of Variance is listed under "Inferential Statistics" while Regression Analysis gets its own header. This doesn't make sense - regression and ANOVA are closely related, both forms of the general linear model. Finally, I'm not sure about correlation getting its own header. You could view the sample correlation as a descriptive statistic, putting it under that header, but of course there are also methods to make inference on the population correlation, putting it under inferential statistics.

Overall I think the hierarchy of the topics needs to be reconsidered. I briefly looked at editing the portal page myself, but I don't see how to edit the topic listings.

I think it would make sense to have a topic header of "Linear Models", including linear regression and ANOVA. "Design of Experiments" is another important general category that could get its own header, and this could include articles on randomization, blocking, fixed and random factors, crossed and nested designs, effect size, power and sample size calculations, etc. Jollyroger131 (talk) 04:14, 25 April 2009 (UTC)

I just delved into editing the topics section of the portal, so if no one objects I'll try to reorganize it myself. Jollyroger131 (talk) 02:54, 27 April 2009 (UTC)
I agree it needed a revamp, and in the absence of any obvious progress I decided to be bold and have a go at it myself (hope you don't mind, Jollyroger131). I decided it has to include a "Probability" heading as there's no separate portal covering that. I've discovered that statistics isn't easy to classify into a small number of headings, so the placement of several topics (e.g. correlation) is fairly arbitrary. I've tried to keep it to general topics and key concepts rather than e.g. specific statistical tests. Please feel free to edit Portal:Statistics/Topics or discuss further here or start Portal talk:Statistics. Qwfp (talk) 14:17, 3 May 2009 (UTC)
I don't mind at all - looks pretty good. Jollyroger131 (talk) 03:51, 11 May 2009 (UTC)

Loyer's paradox on AfD

I've nominated the article titled Loyer's paradox for deletion. I hesitated for a few weeks before doing this because the article's author had said he would replace the content. Some time has gone by with no progress on this. I'll withdraw the nomination if he can do that. But for now, see the discussion at Wikipedia:Articles for deletion/Loyer's paradox. Don't just say Keep or Delete; give your arguments for your position. Michael Hardy (talk) 00:28, 8 May 2009 (UTC)

Interaction

Copy of my comments at talk:interaction variable:

The second sentence in this article says an interaction variable is formed by multiplying two predictor variables.

But then later in the article it's talking about categorical predictors, which clearly cannot be multiplied.

Now someone's proposed merging this with interaction (statistics). And that article also begins by talking about multiplication of two predictors, just as if it's about that particular form of interaction rather than about interaction in general.

What a mess.

Michael Hardy (talk) 16:47, 13 May 2009 (UTC)

Statistician stubs

After obtaining agreement (or at least lack of objection) at Wikipedia:WikiProject Stub sorting/Proposals#Statistician, I've created a new stub template {{Statistician-stub}} and Category:Statistician stubs to help locate statistician biography stubs in need of expansion or improvement. Cat Scan found 120 articles in both Category:Statisticians and Category:Scientist stubs or their subcategories (including Category:Mathematician stubs), and I believe such a template would be appropriate for a large majority of these articles. About 40 of these are American and 40 are British, so although there's no need for country-specific subcategories at present, I've created 'upmerged' templates {{US-statistician-stub}} and {{UK-statistician-stub}} with a view to the future – at present these all place articles directly in Category:Statistician stubs but it will make it easier to create separate categories in the future should numbers grow to justify it.

I've started going through the articles found by Cat Scan and changing the stub templates when appropriate but could do with some help! If you want to help please take the opportunity to consider whether the article is still a stub, and try to ensure the template tag is located in accordance with the guidelines at WP:STUB#How to mark an article as a stub. You might even feel inspired to edit the article text too... Thanks, Qwfp (talk) 20:15, 14 May 2009 (UTC)

Done. See Category:Statistician stubs. Now we just need to expand them... Qwfp (talk) 17:52, 16 May 2009 (UTC)
Nice work! I moved a handful of articles from Category:statistics stubs to Category:statistician stubs. G716 <T·C> 00:40, 17 May 2009 (UTC)

webapplets

A couple of users (one?) have been adding links like this one that are "interactive distributions." Is this link spam or useful? What should generally be included in the external links for a distribution? PDBailey (talk) 22:16, 16 May 2009 (UTC)

This discussion is now located at Category talk:Discrete distributions. PDBailey (talk) 01:42, 19 May 2009 (UTC)

GA Sweeps invitation

This message is being sent to WikiProjects with GAs under their scope. Since August 2007, WikiProject Good Articles has been participating in GA sweeps. The process helps to ensure that articles that have passed a nomination before that date meet the GA criteria. After nearly two years, the running total has just passed the 50% mark. In order to expedite the reviewing, several changes have been made to the process. A new worklist has been created, detailing which articles are left to review. Instead of reviewing by topic, editors can consider picking and choosing whichever articles they are interested in.

We are always looking for new members to assist with reviewing the remaining articles, and since this project has GAs under its scope, it would be beneficial if any of its members could review a few articles (perhaps your project's articles). Your project's members are likely to be more knowledgeable about your topic GAs than an outside reviewer. As a result, reviewing your project's articles would improve the quality of the review in ensuring that the article meets your project's concerns on sourcing, content, and guidelines. However, members can also review any other article in the worklist to ensure it meets the GA criteria.

If any members are interested, please visit the GA sweeps page for further details and instructions in initiating a review. If you'd like to join the process, please add your name to the running total page. In addition, for every member that reviews 100 articles from the worklist or has a significant impact on the process, s/he will get an award when they reach that threshold. With ~1,300 articles left to review, we would appreciate any editors that could contribute in helping to uphold the quality of GAs. If you have any questions about the process, reviewing, or need help with a particular article, please contact me or OhanaUnited and we'll be happy to help. --Happy editing! Nehrams2020 (talk · contrib) 06:17, 20 May 2009 (UTC)