Talk:Confidence interval

From Wikipedia, the free encyclopedia
Missing the forest for the trees

The CI article misses or obfuscates a few critical notions:

  • Confidence intervals are a way to express the mean and variability of a sample.
  • Confidence implies a sample---probability has the connotation of a distribution.

As much as I like things to be Right, the right-of-way in an encyclopedia goes to the pedestrian. At least make the introduction more clear. neffk (talk) 22:19, 26 February 2016 (UTC)

Agree, & why don't you take a crack at it? Please ? --Pete Tillman (talk) 16:56, 30 March 2016 (UTC)
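
As a concrete illustration of the first bullet in this thread, here is a minimal sketch of an interval built from a sample's mean and spread. The function name `approx_mean_ci` and the data are hypothetical, and the normal approximation is an assumption made for brevity; for a sample this small a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, stdev

def approx_mean_ci(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    # Two-sided critical value, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error: sample standard deviation scaled by sqrt(n).
    se = stdev(sample) / n ** 0.5
    m = mean(sample)
    return m - z * se, m + z * se

# Hypothetical measurements, for illustration only.
data = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
low, high = approx_mean_ci(data)
print(f"mean = {mean(data):.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

The interval is centred on the sample mean, and its width grows with the sample's variability and shrinks as the sample gets larger.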

With regards to the approachability of this article

Why not use the Simple English version of this complicated article (link below)? It seems more accessible for the average reader than the in-depth one here.

DC (talk) 14:26, 30 March 2016 (UTC)

A few carefully chosen words would make the article much clearer

For example:

"In each of the above, the following applies: If the true value of the parameter lies outside the 90% confidence interval once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance."

It took me a long time to understand why this was not an example of the common misconceptions detailed in the following section. Only when I'd understood that "an event" should be taken to mean "the action of performing sampling and calculating the specific interval", and not simply "the population mean lying outside the interval", did the sentence stop appearing to be at odds with a correct interpretation of the confidence interval. --Preceding unsigned comment added by (talk) 22:37, 18 July 2016 (UTC)


Misunderstandings Section

An anonymous editor has twice removed the text "nor that there is a 95% probability that the interval covers the population parameter" from the section under misunderstandings, with the claim that it is redundant. It is not redundant; I wish it were. There are erroneous accounts of confidence intervals which consider that it is incorrect to speak of the probability of a parameter lying within an interval but legitimate to speak of the probability of an interval covering the parameter. This is sometimes justified by saying that a parameter is a constant while the bounds of the interval are random variables. Both statements are in fact false and it is important to state this clearly. Dezaxa (talk) 16:42, 7 March 2017 (UTC)

I don't see how you've made a distinction between the two. How is saying a given constant value is within a given range any different from saying a given range includes (or "covers") a given constant value? And where in the cited source is that distinction made? (talk) 06:31, 10 March 2017 (UTC)

Update: I've changed "nor" to "i.e.". This way we are not giving the false impression that the two statements are distinct, but it is still clear that either way the statement is phrased it is false. This compromise should satisfy both of our concerns I think. (talk) 21:17, 10 March 2017 (UTC)

The Misunderstandings section (and the 2nd paragraph of the summary) harp on what seems to be a pointless distinction without a difference. Of course a past sampling event either contains the true parameter, or does not. But when we speak of "confidence" that's exactly what we mean: our confidence that the sample DOES contain the parameter. To use the true-coin analogy: a person flips a coin and keeps it hidden. I am justified in having 50% confidence in my claim that the coin landed "heads". It did or did not... but confidence is referring to my level of certainty about the real, unknown value.

Now to the confidence interval: if it is correct to say that, when estimated using a particular procedure, 95 out of 100 95%-confidence intervals will contain the true parameter, then surely it must follow that I may be 95% confident that the one interval actually calculated contains that parameter. Go with the hypothetical: if the procedure was conducted 100 times, 95 of those would contain the parameter. But we have selected a subset (size one) of those 100 virtual procedures. 95 times out of 100, taking that subset will yield a sample that includes the true parameter. So I am 95% confident that this is what has happened. It either did or didn't, obviously, which is true for all statistics regarding past events. But confidence applies not only to future events but also to unknown past ones.

Or one more way: I intend to do the procedure 100 times. It's expected that when I'm done, 95 of those 100 intervals will contain the true parameter. It then follows, since I have no reason to expect bias, that there's a 95% chance that the very first time I do the procedure, my interval will contain the true parameter. The fact I don't get around to doing 99 more procedures is irrelevant - I can be 95% confident that the one/first procedure performed does contain the true parameter.

How is this incorrect? (And, semi-related: I wasn't bold enough yet to edit down the pre-TOC introduction, but it unnecessarily duplicates this same Misunderstandings information). Grothmag (talk) 21:41, 6 April 2017 (UTC)
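
The long-run statement that both sides of this thread take as given (about 95 out of 100 intervals built by the procedure contain the true parameter) can be checked by simulation. The sketch below is only an illustration under stated assumptions: normally distributed data, a known standard deviation, and a z-interval for the mean. The names `z_interval` and `coverage` and all parameter values are hypothetical. It shows the frequency property of the procedure; it does not by itself settle what may be said about a single interval once computed.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when sigma is known (assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    m = mean(sample)
    return m - half_width, m + half_width

def coverage(true_mean=10.0, sigma=2.0, n=30, trials=10_000,
             level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and rebuild the interval each time.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(f"Observed coverage: {coverage():.3f}")
```

In each trial the true mean is fixed and the interval's endpoints vary with the sample, so the fraction reported describes how often the procedure succeeds over repeated sampling.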