Talk:Binomial proportion confidence interval

WikiProject Statistics (Rated Start-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.
 

suggestions for improvement

I wrote some of the material a few years ago. Maybe it is a bit too technical. I'll add an extra paragraph in the introduction that explains why there is more than one formula. Steve Simon (talk) 15:00, 9 September 2008 (UTC)

The article has been labelled too technical, but I don't see it as being that much more technical than a lot of other mathematical articles. One could leave out detail to make things more succinct, but that might make it more difficult to follow. Some suggestions:

  • remove the bit on inverting hypothesis tests, and just mention the normal-derived interval is called a Wald interval, with a link
  • add a section on continuity corrections for the normal interval (and score intervals?) 146.232.75.208 15:17, 22 September 2006 (UTC)

27 Nov 2006: I am not a statistician but I believe there may be an important error in the Wilson score interval. According to http://www.ppsw.rug.nl/~boomsma/confbin.pdf, the final term in the numerator under the square root sign should be (z squared)/(4n squared), not (z squared)/(4n) as is written. I don't have the mathematical capacity to determine which is correct, but for my data the former calculation makes a lot more sense than the latter, so I suspect that Wikipedia's entry is wrong. I hope a statistician reviews this at some point!


Actually I think the Wilson score interval was right the first time. The formula in the cited article only looks different because the expression inside the square root was multiplied out. 131.111.8.104 15:34, 29 May 2007 (UTC)
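
For anyone checking the two forms discussed above, here is a sketch of why they can agree. It assumes the article's formula has the usual Wilson form, with \hat p the observed proportion, n the number of trials and z the normal quantile; the exact layout in the article at the time is not reproduced here. The term under the square root reads z^2/(4n^2) when each term carries its own denominator, and z^2/(4n) when a common factor of 1/n is pulled out in front:

```latex
\[
\frac{\hat p + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat p(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
\]
\[
\sqrt{\frac{\hat p(1-\hat p)}{n} + \frac{z^2}{4n^2}}
\;=\;
\sqrt{\frac{1}{n}\left[\hat p(1-\hat p) + \frac{z^2}{4n}\right]}
\]
```

So a formula showing z^2/(4n) is only wrong if the factor 1/n outside the bracket is missing.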


  • There is a comment within the article that does not belong there. Look in the section "Wilson score interval" for the sentence "(The following formula may be wrong. It's identical to the way the Normal approximation is derived)". This statement has to be moved to the discussion. Can someone please check whether the formula is correct and then remove that comment? —Preceding unsigned comment added by 82.212.0.230 (talk) 13:52, 17 May 2008 (UTC)

I don't understand the comment that the Clopper-Pearson intervals are conservative due to the discreteness of the Binomial distribution; they are based on the beta distribution which IS continuous and well behaved in the interval. So, in fact, I think the comment is wrong (Fredrik x nilsson).

  • Fredrik, check out Brown, Cai, DasGupta 2001 in the references for a great illustration of the conservative performance of the Clopper-Pearson interval. I updated this section to help clarify. In short, by ensuring that the coverage is never below 95%, it is often much above 95%. MrYdobon (talk) 08:26, 29 September 2009 (UTC)
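
The conservativeness described above can be checked numerically. The following is a minimal sketch, not the method used in the cited paper: it finds the Clopper-Pearson bounds by bisection on the binomial tail probabilities (rather than through beta quantiles) and then sums the exact probability that the interval contains the true proportion. The function names, n = 20 and the chosen values of p are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); this is increasing in p."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def solve_increasing(g, target, iters=60):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact interval: each tail probability at the bound equals alpha / 2."""
    # Lower bound solves P(X >= x | p) = alpha / 2 (taken as 0 when x == 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: upper_tail(x, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2, which is the same as
    # P(X >= x + 1 | p) = 1 - alpha / 2 (taken as 1 when x == n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: upper_tail(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


def coverage(p, n, intervals):
    """Probability that the interval built from X ~ Binomial(n, p) contains p."""
    return sum(binom_pmf(x, n, p)
               for x, (lo, hi) in enumerate(intervals) if lo <= p <= hi)


if __name__ == "__main__":
    n = 20
    intervals = [clopper_pearson(x, n) for x in range(n + 1)]
    for p in (0.1, 0.3, 0.5):
        print(f"true p = {p}: coverage = {coverage(p, n, intervals):.4f}")
```

Because only n + 1 distinct intervals can ever be observed, the coverage jumps as p moves across an interval endpoint. Keeping it at or above 95% everywhere therefore pushes it above 95% at most values of p, even though the bounds themselves come from a continuous distribution.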


I don't agree with the last section in its summary of those papers. First, it makes no mention of the Wald interval, even though the papers state flatly that it generally does not produce coverage probabilities at the desired levels. Moreover, I think the description of "better" is ambiguous and misleading at best. The exact intervals are inherently good, since they always guarantee at least the desired coverage. However, they may produce an interval that is wider than desired. The approximations are "better" in the sense that they don't over-cover as much. In either case, however, both the exact and the non-Wald approximate intervals are generally better CI estimates for small n. —Preceding unsigned comment added by 134.174.140.216 (talk) 21:01, 11 June 2010 (UTC)

The entry is technical, not too technical, in my opinion. However, for readers unfamiliar with all the formulas (myself included) it would be helpful to see the calculation and result for an example case using each formula. — Preceding unsigned comment added by Tjrm (talk • contribs) 21:36, 9 January 2011 (UTC)

Please have a look at the German WP article on this topic. It includes an example and tries to illustrate the intervals and the coverage probabilities. -- KurtSchwitters (talk) 09:17, 4 February 2011 (UTC)
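
As a sketch of the kind of worked example requested above, the code below evaluates three of the closed-form intervals (normal approximation or Wald, Wilson score, and Agresti-Coull) for one case. The case of 2 successes in 10 trials and the function names are assumptions chosen for illustration; the formulas are the standard textbook forms and should be checked against the article before being copied into it.

```python
from math import sqrt
from statistics import NormalDist


def z_value(conf=0.95):
    """Two-sided standard normal quantile, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def wald(x, n, conf=0.95):
    """Normal approximation interval: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    z = z_value(conf)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson(x, n, conf=0.95):
    """Wilson score interval."""
    z = z_value(conf)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def agresti_coull(x, n, conf=0.95):
    """Agresti-Coull interval: a Wald interval on adjusted counts."""
    z = z_value(conf)
    n_adj = n + z * z               # roughly n + 4 at 95%
    p_adj = (x + z * z / 2) / n_adj  # roughly "add 2 successes and 2 failures"
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


if __name__ == "__main__":
    x, n = 2, 10  # illustrative case: 2 successes in 10 trials
    for name, method in (("Wald", wald), ("Wilson", wilson),
                         ("Agresti-Coull", agresti_coull)):
        lo, hi = method(x, n)
        print(f"{name:14s} ({lo:.4f}, {hi:.4f})")
```

For this small sample the Wald lower limit falls below zero, while the Wilson interval stays inside [0, 1]. The Wilson and Agresti-Coull intervals share the same centre, with Agresti-Coull slightly wider.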

I have made two amendments which I view as essential on this topic. The normal approximation section replicates a common error in explanations of the binomial proportion interval, one which inevitably leaves the reader totally confused. This error is to conflate the distribution of the likely position of an observation p about a population true value P (which is binomial) with the likely position of P about an observation p (which is not). Since the latter is what we wish to obtain, this is very serious indeed! Moreover, if you don't recognise this, the rest of the page makes no sense! Why should we bother with improvements if the actual interval is binomial (and almost-normal)? So, my suggested amendments are: (1) revise the explanation of the normal approximation to avoid this error and then (2) deal with it conclusively under the Wilson score interval section. Sean a wallis (talk) 21:58, 5 July 2013 (UTC)

Further amendments: I sharpened up the Normal approximation section to discourage casual and incorrect use of this problematic method, and added a new section on the Wilson interval with continuity correction. The latter could be a sub-section of the Wilson interval. Sean a wallis (talk) 14:36, 12 July 2013 (UTC)

I came across the lower bound of the Wilson Score Interval being used as a 'confidence' metric for decision tree nodes[1]. Perhaps adding a subsection to the Wilson Interval about how it can be applied to decision trees would be useful. Gkarthik92 (talk) 19:52, 29 December 2015 (UTC)
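
To make the suggestion above concrete, here is a hedged sketch of ranking decision tree leaves by the lower limit of the Wilson score interval. It is not taken from the linked blog post: the leaf data, the function name and the choice to return 0 for an empty leaf are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(successes, total, conf=0.95):
    """Lower limit of the Wilson score interval for successes / total."""
    if total == 0:
        return 0.0  # assumption: an empty node gets no confidence at all
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    spread = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    # Clamp at 0 to absorb floating-point rounding when successes == 0.
    return max(0.0, (centre - spread) / denom)


# Hypothetical leaves: (label, cases in the majority class, total cases).
leaves = [("A", 1, 1), ("B", 9, 10), ("C", 90, 100)]
ranked = sorted(leaves,
                key=lambda leaf: wilson_lower_bound(leaf[1], leaf[2]),
                reverse=True)

if __name__ == "__main__":
    for label, hits, total in ranked:
        print(label, f"{hits}/{total}",
              f"{wilson_lower_bound(hits, total):.4f}")
```

Under this metric a perfectly pure leaf that holds a single case ranks below a 90% pure leaf with 100 cases, which is the behaviour that makes the lower bound attractive as a confidence score.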

Why reverted?

Resolved

@Qwfp: Can you please elaborate on why you reverted my edit? The old version was certainly not clear enough for a layperson. I can see your objecting to my use of the word "likelihood" instead of "probability", though I was using it in what I thought was a layperson-friendly way; so I agree to changing it to "probability". But you also said "Added text contained incorrect interpretation of confidence interval" -- can you explain what's incorrect about it? The added text said, with "likelihood" replaced by "probability",

A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a (not necessarily fair) coin is flipped ten times. The observed binomial proportion is the fraction of the flips which turn out to be heads. Given this observed proportion, the confidence interval for the true proportion innate in that coin is the range of possible proportions which, with some specified probability such as 95%, contains the true proportion. Thus there is a 95% chance that the true probability of heads coming up on any throw is somewhere in that range.

What's wrong with that? Duoduoduo (talk) 14:40, 14 February 2011 (UTC)

See the lead of confidence interval: "A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained." This is a widespread misconception. Repeating it in Wikipedia could make it even more widespread, which would not be desirable. --Qwfp (talk) 19:48, 14 February 2011 (UTC)
That passage was put in in November, and it is true only if one uses a Bayesian definition of "probability". The article also says "How frequently the interval contains the parameter is determined by the confidence level or confidence coefficient." In other words, the "probability" in the frequentist sense that you've got the true parameter in the confidence interval is 95% or whatever. So it's not a "misconception"; rather it is a matter of one's preferred definition of "probability". But I think it is valid to use wording that avoids semantic debates.
The problem I was trying to address with my edit is that the article as it is written does not define the term in the title of the article. All it says is "a binomial proportion confidence interval is a confidence interval for a proportion in a statistical population." That's circular and about as uninformative as it could get! I'll rewrite it to put in the same definition that the other article uses, applied to this context. In doing so, the example also needs to be expanded as I did; as the example stands, it is uninformative since it says nothing about a proportion. Duoduoduo (talk) 21:17, 14 February 2011 (UTC)
I'm happy with the text of your contribution to the article now — many thanks for revising it. I apologise for reverting your edits rather than taking the time to revise the text myself. I don't entirely agree that it's a mere semantic debate but that's not relevant to improving the article, and it's a debate that's already been held at Talk:Confidence interval#Meaning of the term "confidence" and Talk:Confidence interval#second paragraph in lead. I didn't participate myself as I find such debates somewhat stressful, so I certainly don't wish to start another here. Best wishes, Qwfp (talk) 09:07, 15 February 2011 (UTC)
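
Without reopening the debate, the repeated-sampling statement quoted in this thread ("How frequently the interval contains the parameter is determined by the confidence level") can be illustrated by simulation. The sketch below fixes a true heads probability, repeats the ten-flip coin experiment many times, and counts how often the interval built from each experiment contains the true value. The Wilson interval, the true value 0.3, the number of repetitions and the seed are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile


def wilson(x, n, z=Z95):
    """Wilson score interval for x successes in n trials."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def simulated_coverage(true_p, n_flips, n_experiments, seed=0):
    """Fraction of repeated experiments whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        heads = sum(rng.random() < true_p for _ in range(n_flips))
        lo, hi = wilson(heads, n_flips)
        hits += lo <= true_p <= hi
    return hits / n_experiments


if __name__ == "__main__":
    print(simulated_coverage(true_p=0.3, n_flips=10, n_experiments=20000))
```

The fraction printed is a property of the procedure over many hypothetical experiments. For a sample this small it is close to, but not exactly, the nominal 95%, and it says nothing by itself about any single interval already computed from one data set.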

Extension of scope

The above discussion suggests that there may be a good reason to expand the scope of this article to become "Interval estimation for binomial proportions" and to allow an equal footing in the article for credible intervals, with distinctions being made where appropriate. But that might just get too confusing. Perhaps there could be a separate article called "Binomial proportion credible interval". Melcombe (talk) 10:06, 15 February 2011 (UTC)

Extending the scope seems a reasonable idea to me, especially as one confidence interval with reasonable frequentist properties that's not covered here at present is the credible interval resulting from a Jeffreys prior, so there's some overlap. The theory of credible intervals for a binomial proportion is simpler than that of confidence intervals (once you've decided on your prior), so it shouldn't take too long to explain. Much of the mathematical basis is already present at conjugate prior#Example but that doesn't mention credible intervals, while the credible interval article lacks any examples, so this could be a nice illustration. I'm not volunteering to write it though! Qwfp (talk) 11:05, 15 February 2011 (UTC)
I have just added a section Binomial proportion confidence interval#Jeffreys interval, however. If someone wishes to edit and extend it to include informative priors and thus 'fully' Bayesian credible intervals, that's fine with me. Qwfp (talk) 15:45, 15 February 2011 (UTC)
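
For readers of this thread, the Jeffreys interval mentioned above can be summarised as follows. The notation is an assumption made here, not the article's: \theta is the true proportion, x the number of successes in n trials, and B^{-1}_{q} the q-quantile of the posterior distribution.

```latex
\[
\theta \sim \mathrm{Beta}\!\left(\tfrac12, \tfrac12\right)
\quad\Longrightarrow\quad
\theta \mid x \sim \mathrm{Beta}\!\left(x + \tfrac12,\; n - x + \tfrac12\right)
\]
\[
\bigl(\theta_L,\ \theta_U\bigr) = \bigl(B^{-1}_{\alpha/2},\ B^{-1}_{1-\alpha/2}\bigr)
\]
```

An informative prior Beta(a, b) would give the posterior Beta(x + a, n - x + b) by the same conjugate update, which is the extension to 'fully' Bayesian credible intervals suggested above.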

Typo (?) at the end of the Agresti-Coull section

At the end of the section of the A-C interval, we read: "this is the "add 2 successes and 2 failures" interval in [7]."

The citation is superscripted. Because that quotation comes from the article, should the "[7]" be part of the sentence or should it be superscripted as in the present article?

Or one could use the full article quotation: …this is the "'add two successes and two failures' adjusted Wald interval." And then give the citation. (Also, it's at p. 122b.)

  1. ^ http://blog.bigml.com/2012/11/29/put-some-confidence-in-your-predictions/