Talk:Confidence interval


WikiProject Mathematics (Start-class, High-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as High-priority on the project's priority scale.

WikiProject Statistics (Unassessed)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the importance scale.

A question

Can anybody tell me why interval estimation is better than point estimation? (shamsoon) —Preceding unsigned comment added by Shamsoon (talkcontribs) 14:53, 23 February 2008 (UTC)[reply]

I have inserted a little about this point in the intro to the article. Melcombe (talk) 10:00, 26 February 2008 (UTC)[reply]

Disputed point

This "dispute" relates to the section "How to understand confidence intervals" and was put in by "Hubbardaie" (28 February 2008)

I thought I would start a section for discussing this.

Firstly there is the general question of whether comparisons of frequentist and Bayesian approaches belong in this article at all, when they might be better off in articles devoted specifically to that comparison. I would prefer that they not appear in this article. I don't know if the same points are included in the existing frequentist vs Bayesian pages. Melcombe (talk) 16:46, 28 February 2008 (UTC)[reply]

Secondly, it seems that some of the arguments here may be essentially the same as those used for the third type of statistical inference -- fiducial inference -- which isn't mentioned as such. So possibly the "frequentist vs Bayesian" idea doesn't stand up. Melcombe (talk) 16:46, 28 February 2008 (UTC)[reply]

That fiducial intervals in some cases differ from confidence intervals was proved by Bartlett in 1936. Michael Hardy (talk) 17:17, 28 February 2008 (UTC)[reply]
I agree that moving this part of the article somewhere else might be appropriate. I'll review Bartlett but, as I recall, he wasn't making this exact argument. I believe the error here is to claim that P(a<x<b) is not a "real" probability because x is not a random variable. Actually, that is not a necessary criterion. Both a and b are random in the sense that they were computed from a set of samples which were selected from a population by a random process. Hubbardaie (talk) 17:27, 28 February 2008 (UTC)[reply]
Also, a frequentist holds that the only meaning of "probability" is in regard to the frequency of occurrences over a large number of trials. Consider a simple experiment for the measurement of a known parameter by random sampling (such as random sampling from a large population where we know the mean). We compute a 90% CI based on some sample size and repeat this until we have a large number of separate 90% CIs, each based on its own randomly selected sample. We will find, after sufficient trials, that the known mean of the population falls within 90% of the computed 90% CIs. So, even in a strict frequentist sense, P(a<x<b) is a "real" probability (at least the distinction, if there is a real distinction, has no bearing on observed reality). Hubbardaie (talk) 17:32, 28 February 2008 (UTC)[reply]
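A minimal simulation sketch of the repeated-sampling experiment described above (this illustration is not part of the original comment; the population mean, standard deviation, sample size and trial count are arbitrary assumptions):

    import numpy as np
    from scipy.stats import t as student_t

    rng = np.random.default_rng(42)
    true_mean, true_sd = 50.0, 10.0          # assumed "known" population parameters
    n, n_trials = 25, 20_000
    t_crit = student_t.ppf(0.95, df=n - 1)   # two-sided 90% interval: 5% in each tail

    hits = 0
    for _ in range(n_trials):
        sample = rng.normal(true_mean, true_sd, n)
        half_width = t_crit * sample.std(ddof=1) / np.sqrt(n)
        hits += sample.mean() - half_width <= true_mean <= sample.mean() + half_width

    # The fraction of 90% CIs that cover the known mean comes out close to 0.90.
    print(hits / n_trials)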
I think one version of a fiducial derivation is to argue that it is sensible to make the conversion from (X random, theta fixed) to (X fixed, theta random) directly without going via Bayes' theorem. I thought that was what was being done in this section of the article. Melcombe (talk) 18:05, 28 February 2008 (UTC)[reply]
Another thing we want to be careful of is that while some of the arguments were being made by respected statisticians like R. A. Fisher, these were views that were not widely adopted. And we need to separate out where the expert has waxed philosophically about a meaning and where he has come to a conclusion with a proof. At the very least, if a section introduces these more philosophical issues, it should not present them as if they were uncontested and mathematically or empirically demonstrable facts. Hubbardaie (talk) 18:13, 28 February 2008 (UTC)[reply]
P(a<x<b)=0.9 is (more or less) the definition of a 90% confidence interval when a and b are considered random. But once you've calculated a particular confidence interval you've calculated particular values for a and b. In the example on the page at present, a = 82 - 1.645 = 80.355 and b = 82 + 1.645 = 83.645. 80.355 and 83.645 are clearly not random! So you can't say that P(80.355<x<83.645)=0.9, i.e. it's not true that the probability that x lies between 80.355 and 83.645 is 90%. This is related to the prosecutor's fallacy, as it involves confusing P(parameter|statistic) with P(statistic|parameter). To relate the two you need a prior for P(parameter). If your prior is vague compared to the information from the data then the prior doesn't make a lot of difference, but that's not always the case. I'm sure I read a particularly clear explanation of this fairly recently and I'm going to rack my brains to try to remember where. Qwfp (talk) 18:40, 28 February 2008 (UTC)[reply]
I think fiducial inference is a historical distraction and mentions of it in the article should be kept to a bare minimum. I think the general opinion is that it was a rare late blunder by Fisher (much like vitamin C was for Pauling). That's why I added this to the lead of fiducial inference recently (see there for ref):

In 1978, JG Pederson wrote that "the fiducial argument has had very limited success and is now essentially dead."

Qwfp (talk) 18:40, 28 February 2008 (UTC)[reply]
I think that you would find on a careful read of the prosecutor's fallacy that it addresses a very different point and it would be misapplied in this case. I'm not confusing P(x|y) with P(y|x). I'm saying that experiments show that P(a<x<b) is equal to the frequency F(a<x<b) over a large number of trials (which is consistent with a "frequentist" position). On another note, no random number is random *once* it is chosen, but clearly a and b were computed from a process that included a random variable (the selection from the population). If I use a random number generator to generate a number between 0 and 1 with a uniform distribution, there is a 90% chance it will generate a value over .1. Once I have this number in front of me, it is an observed fact but *you* don't yet know it. Would you say that P(x>.1) is not really a 90% probability because the actual number is no longer random (to me), or are you saying that it's not really random because the ".1" wasn't random? If we are going down the path of distinguishing what is "truly random" vs. "merely uncertain", I would say we would have to solve some very big problems in physics first. Even at a very fundamental level, is there a real difference between "random" and "uncertain" that can be differentiated by experiment? Apparently not. The "truly random" distinction won't matter if nothing really is random or if there is no way to experimentally distinguish "true randomness" from observer-uncertainty. Hubbardaie (talk) 20:05, 28 February 2008 (UTC)[reply]

I've tracked down at least one source that I was half-remembering in my (rather hurried) contributions above, namely

  • Lindley, D.V. (2000), "The philosophy of statistics", Journal of the Royal Statistical Society: Series D (The Statistician), 49: 293–337, doi:10.1111/1467-9884.00238

On page 300 he writes:

"Again we have a contrast similar to the prosecutor's fallacy:

  • confidence—probability that the interval includes θ;
  • probability—probability that θ is included in the interval.

The former is a probability statement about the interval, given θ; the latter about θ, given the data. Practitioners frequently confuse the two."

I don't find his 'contrast' entirely clear as the difference between the phrases "probability that the interval includes θ" and "probability that θ is included in the interval" is only that of active vs passive voice; the two seem to mean the same to me. The sentence that follows that ("The former...") is clearer and gets to the heart of it I think. I accept it's not the same as the prosecutor's fallacy, but as Lindley says, it's similar.

I think one way to make the distinction clear is to point out that it's quite easy (if totally daft) to construct confidence intervals with any required coverage that don't even depend on the data. For instance, to get a 95% CI for a proportion (say, my chance of dying tomorrow) that I know has a continuous distribution with probability 0 of being exactly 1:

  • Draw a random number between 0 and 1.
  • If it's less than 0.95, the CI is [0,1].
  • If it's greater than 0.95, the CI is [1,1], i.e. the single value 1 (the "interval" is a single point)

In the long run, 95% of the CIs include the true value, satisfying the definition of a CI. Say I do this and the random number happens to be 0.97. Can I say "there's a 95% probability that I'll die tomorrow" ?
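A toy sketch of the "daft" procedure above (my own illustration, assuming an arbitrary true proportion strictly below 1): long-run coverage is 95% by construction, yet any individual [1,1] interval is useless.

    import random

    def daft_ci() -> tuple[float, float]:
        # With probability 0.95 report [0, 1]; otherwise report the single point [1, 1].
        return (0.0, 1.0) if random.random() < 0.95 else (1.0, 1.0)

    true_p = 0.1   # hypothetical true proportion, assumed to be strictly less than 1
    intervals = [daft_ci() for _ in range(100_000)]
    coverage = sum(lo <= true_p <= hi for lo, hi in intervals) / len(intervals)
    print(coverage)   # about 0.95, whatever the true proportion is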

Clearly no-one would use such a procedure to generate a confidence interval in practice. But you can end up with competing estimation procedures giving different confidence intervals for the same parameter, both of which are valid and have the same model assumptions, but the width of the CIs from one procedure varies more than those from the other procedure. (To choose between the two procedures, you might decide you prefer the one with the smaller average CI width). For example, say (0.1, 0.2) and (0.12, 0.18) are the CIs from the two procedures from the same data. But you can't then say both "there's 95% probability that the parameter lies between 0.10 and 0.20" and "there's 95% probability that the parameter lies between 0.12 and 0.18" i.e. they can't both be valid credible intervals.

Qwfp (talk) 22:46, 28 February 2008 (UTC) (PS Believe it or not after all that, I'm not in practice a Bayesian.)[reply]

Good, at least now we have a source. But I have two other very authoritative sources: A.A. Sveshnikov, "Problems in Probability Theory, Mathematical Statistics and the Theory of Random Functions", 1968, Dover Books, pg 286, and a very good online source, Wolfram's MathWorld site http://mathworld.wolfram.com/ConfidenceInterval.html. The former source states on pg 286 the following (Note: I could not duplicate some symbols exactly as Sveshnikov shows them, but I replaced them consistently so that the meaning is not altered):
"A Confidence interval is an interval that with a given confidence a covers a parameter θ to be estimated. The width of a symmetrical confidence interval 2e is determined by the condition P{|θ - θ'|<=e}=a, where θ' is the estimate of the parameter θ and the probability {emphasis added} P{|θ - θ'|<=e} is determined by the distribution law for θ'"
Here Sveshnikov makes it clear that he is using the confidence interval as a statement of a probability. When we go to the MathWorld site, it defines the confidence interval as:
"A confidence interval is an interval in which a measurement or trial falls corresponding to a given probability." {emphasis added}
I find other texts such as Morris DeGroot's "Optimal Statistical Decisions" pg 192-3 and Robert Serfling's "Approximation Theorems in Mathematical Statistics" pg 102-7 where confidence intervals are defined as P(LB<X<UB) {replacing their symbols with LB and UB}. It is made clear earlier in each text that P(A) is the probability of A, and the use of the same notation for the confidence interval apparently merits no further qualification.
On the other hand, even though none of my books within arm's reach make the distinction Qwfp's source makes, I found some other online sources that do attempt to make this distinction. When I searched on ("confidence interval", definition, "probability that"), I found that a small portion of the sites that come up make the distinction Qwfp's source is proposing. Sites that make this distinction and sites that define a confidence interval as a probability both include sources from academic institutions and what may be laymen. Although I see the majority of sites clearly defining a CI as a '''probability''' that a parameter falls within an interval, I now see room for a possible dispute.
The problem is that the distinction made in this “philosophy of statistics” source and in other sources would seem to have no bearing on its use in practice. What specific decisions will be made incorrectly if this is interpreted one way or the other? Clearly, anyone can run an experiment on their own spreadsheet that shows that 90% of true means will fall within 90% CI’s when such CI’s are computed a large number of times. So what is the pragmatic effect?
I would agree to a section that, instead of matter-of-factly stating one side of this issue as the “truth”, simply points out the difference in different sources. To do otherwise would constitute original research.Hubbardaie (talk) 05:05, 29 February 2008 (UTC)[reply]
One maxim of out-and-out Bayesians is that "all probabilities are conditional probabilities", so if talking about probabilities, one should always make clear what those probabilities are conditioned on.
The key, I think, is to realise that Sveshnikov's statement means consideration of the probability P{(|θ - θ'|<=e) | θ }, i.e. the probability of that difference given θ, read off for the value θ = θ'. This is a different thing to the probability P{(|θ - θ'|<=e) | θ' }.
I think the phrase "read off for the value θ = θ' " is correct for defining a CI. But an alternative (more principled?) might be to quote the maximum value of |θ - θ'| s.t. P{(|θ - θ'|<=e) | θ } < a.
Either way, the wheels would still fall off in the example of the next section. Jheald (talk) 15:09, 29 February 2008
Sveshnikov said what he said. And I believe you are still mixing in some unrelated issues about the philosophical position of the Bayesian (which I prefer to call subjectivist) vs. frequentist view. Now we are going in circles, so just provide citations of the arguments you want to make. The only fair way to treat this is as a concept that lacks consensus even among authoritative sources. —Preceding unsigned comment added by Hubbardaie (talkcontribs) 29 February 2008

I have moved some of the section that I believe is correct to before the "Dispute" marker in the article, so that the "disputed bit" is more clearly identified. I hope the remaining is what was meant. Melcombe (talk) 18:02, 13 March 2008 (UTC)[reply]

Two questions: the first is just seeking clarification about the dispute. Is the issue in dispute analogous to this: I bought a lottery ticket yesterday and had a 1 in a million chance of winning last night. Today, I have heard nothing about what happened in the lottery last night. Some of you are saying that I can no longer assert that I have a 1 in a million chance of having won the lottery - I either have or I haven't; objective uncertainty no longer exists and therefore I cannot assign a probability. Others of you are saying that I can still say that there's a 1 in a million chance that I've won (on the basis that I have no information now that I didn't have yesterday). [For "winning the lottery", read "getting a confidence interval that really does contain the true value".] —Preceding unsigned comment added by 62.77.181.1 (talk) 16:21, 30 April 2008 (UTC)[reply]

For the first question ... strictly the "dispute" should be about the accuracy of what is in the article, or about the accuracy of how it represents different interpretations or viewpoints, where such differences exist. Unfortunately the discussion here has turned into a miasma with a different type of dispute ongoing. Your analogy is only partly related to the question here, since here there are three quantities involved ... the two end points of the interval, and the thing in the middle. In your analogy, you have something (the outcome of the lottery draw) that is at one stage random, at another stage fixed (once drawn) but unknown, and at another fixed and known. For CIs the endpoints are at one stage random, and at another fixed and known, while in the traditional CI the thing in the middle is always fixed (and unknown). For a Bayesian credible interval, the end-points are first random then fixed as before, while the thing in the middle is either random and unknown, or fixed and unknown, and in both cases the unknown-ness is dealt with by allowing the thing to be represented by a probability distribution; however, between the two stages, there is a change in the distribution used to represent the thing in the middle. So the difference is that, for traditional CIs, the probability statement relates to the end-points at the stage when they are random, with the thing in the middle treated as fixed, while, for Bayesian credible intervals, the probability statement relates to a stage where the end-points are treated as fixed. I expect that is as clear as mud, which is why having proper mathematical statements about what is going on is important. Melcombe (talk) 09:05, 1 May 2008 (UTC)[reply]

Second question: is the assertion in the main article (under methods of derivation) about a duality that exists between confidence intervals and significance testing universally accepted? That is, if a population parameter has a finite number of possible values, and if I can calculate, for each of these, the exact probability of the occurrence of the observed or a "more extreme" outcome, is it safe from dispute to assert that the 95% confidence interval consists precisely of those values of the population parameter for which this probability is not less than 0.05? Or does that duality just happen to hold in common circumstances? —Preceding unsigned comment added by 62.77.181.1 (talk) 16:21, 30 April 2008 (UTC)[reply]

For the second question ... the significance test inversion works in general but note that (i) you don't need to restrict yourself as in "if a population parameter has a finite number of possible values"; (ii) each different significance test would produce different confidence intervals, so you should not think of there being a "the" confidence interval. The "significance test inversion" justification, which is reasonably simple, is one of the things that still needs to be included in the article. Melcombe (talk) 16:39, 30 April 2008 (UTC)[reply]
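A rough sketch of the test-inversion idea for readers who want something concrete (my own illustration, assuming a binomial example and scipy's binomtest; the observed counts and the grid are arbitrary): the 95% confidence set is the set of parameter values not rejected at the 5% level.

    import numpy as np
    from scipy.stats import binomtest

    k, n = 7, 20                           # observed successes out of n trials (assumed data)
    grid = np.linspace(0.001, 0.999, 999)  # candidate values of the proportion p
    accepted = [p for p in grid if binomtest(k, n, p).pvalue >= 0.05]

    # Endpoints of the (grid-approximated) 95% CI obtained by inverting the exact test.
    print(min(accepted), max(accepted))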

Example of a CI calculation going terribly wrong

Here's an example I put up yesterday at Talk:Bayesian probability

Suppose you have a particle undergoing diffusion in a one degree of freedom space, so the probability distribution for its position x at time t is given by
p(x | t) = exp(−x^2/2t) / √(2πt).
Now suppose you observe the position of the particle, and you want to know how much time has elapsed.
It's easy to show that
t̂ = x^2
gives an unbiased estimator for t, since
E[x^2 | t] = t.
We can duly construct confidence limits, by considering for any given t what spread of values we would be likely (if we ran the experiment a million times) to see for t̂.
So for example for t = 1 we get a probability distribution of
t̂ ~ χ^2 with one degree of freedom,
from which we can calculate lower and upper confidence limits a and b, such that:
P(a ≤ t̂ ≤ b | t) = 0.95.
Having created such a table, suppose we now observe a particular x. We then calculate t̂ = x^2, read off the corresponding limits a and b at t = t̂, and report that we can state with 95% confidence that a ≤ t ≤ b, or that the "95% confidence range" is (a, b).
But that is not the same as calculating P(t|x).


The difference stands out particularly clearly if we think what answer the method above would give if the data came in that x = 0.
From x = 0 we find that t̂ = 0. Now when t = 0, the probability distribution for x is a delta-function at zero. So the distribution for t̂ is also a delta-function at zero. So a and b are both zero, and so we must report a 100% confidence range, t = 0.
On the other hand, what is the probability distribution for t given x? The particle might actually have returned to x = 0 at any time. The likelihood function, given x = 0, is
L(t) = p(x = 0 | t) = 1/√(2πt), i.e. proportional to 1/√t.
Conclusion: data can come in, for which confidence intervals decidedly do not match Bayesian credible intervals for θ given the data, even with a flat prior on θ. Jheald (talk) 15:26, 29 February 2008 (UTC)[reply]

What about a weaker proposition, that given a particular parameter value, t = t*, the CI would accurately capture the parameter 95% of the time? Alas, this also is not necessarily true.

What is true is that a confidence interval for the difference t̂ − t, calculated for the correct value of t, would cover the realised difference 95% of the time.
But that's not the confidence interval we're quoting. What we're actually quoting is the confidence interval that would pertain if the value of t were t̂. But t almost certainly does not have that value; so we can no longer infer that the difference will necessarily be in the CI 95% of the time, as it would if t did equal t̂.
This can be tested straightforwardly, as per the request above for something that could be tested and plotted on a spreadsheet. Simply work out the CIs as a function of t for the diffusion model above; and then run a simulation to see how well it's calibrated for t=1. If the CIs are calculated as above, it becomes clear that those CIs exclude the true value of t a lot more than 5% of the time. Jheald (talk) 15:38, 29 February 2008 (UTC)[reply]
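A minimal numerical check of this claim (my own sketch, reading the procedure above as "evaluate the limits a and b at t = t̂"; the seed and trial count are arbitrary):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    true_t, n_trials = 1.0, 100_000

    # For a given t, t_hat = x**2 is distributed as t times a chi-squared(1) variable,
    # so its central 95% range is [t * q_lo, t * q_hi].
    q_lo, q_hi = chi2.ppf([0.025, 0.975], df=1)

    x = rng.normal(0.0, np.sqrt(true_t), n_trials)   # diffusion: x ~ N(0, t)
    t_hat = x ** 2
    lower, upper = t_hat * q_lo, t_hat * q_hi        # plug-in interval at t = t_hat

    # Empirical coverage of the true t; it comes out far below the nominal 0.95.
    print(np.mean((lower <= true_t) & (true_t <= upper)))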
You make the same errors here that you made in the Bayesian talk. You show an example of a calculation for a particular CI, then after that simply leap to the original argument that the confidence of a CI is not a P(a<x<b). You don't actually prove that point and, as my citations show, you would have to contradict at least some authoritative sources in that claim. All we can do is a balanced article that doesn't present the claim "A 95% CI does not have a 95% probability of containing the true value" as an undisputed fact. It seems, again, more like a matter of an incoherent definition that can't possibly have any bearing on observations. But, again, let's just present all the relevant citations. Hubbardaie (talk) 16:18, 29 February 2008 (UTC)[reply]
Let's review things. I show an example where there is a 100% CI that a parameter equals zero; but the likelihood function is proportional to 1/√t.
Do you understand what that Likelihood function means? It means that the posterior probability P(t|data) will be different in all circumstances from a spike at t=0, unless you start off with absolute initial certainty that t=0.
That does prove the point that there is no necessity for the CI to have any connection with the probability P(a<t<b | data). Jheald (talk) 16:44, 29 February 2008 (UTC)[reply]
We are going in circles. I know exactly what the likelihood function means, but my previous arguments already refute your conclusion. I just can't keep explaining it to you. Just show a citation for this argument and we'll move on from there. Hubbardaie (talk) 18:03, 29 February 2008 (UTC)[reply]
A confidence interval will in general only match a Bayesian credible interval (modulo the different worldviews), if
  • (i) we can assume that we can adopt a uniform prior for the parameter;
  • (ii) the function P(θ'|θ) is a symmetric function that depends only on (θ'-θ), with no other dependence on θ itself;
  • and also (iii) that θ' is a sufficient statistic.
If those conditions are met (as they are, eg in Student's t test), then one can prove that P(a<t<b | data) = 0.95.
If they're not met, there's no reason at all to suppose P(a<t<b | data) = 0.95.
Furthermore, if (ii) doesn't hold, it's quite unlikely that you'll find, having calculated a and b given t', that {a<t<b} is true for 95% of cases. Try my example above, for one case in particular where it isn't. Jheald (talk) 18:45, 29 February 2008 (UTC)[reply]
Then, according to Sveshnikov and the other sources I cite, it is simply not a correctly computed CI, since the CI must meet the standard that the confidence IS the P(a<x<b). I repeat what I said above. Let's just lay out the proper citations and represent them all in the article. But I will work through your example in more detail. I think I can use an even more general way of describing possible solution spaces for CIs and whether, for the set of all CIs over all populations where we have no a priori knowledge of the population mean or variance, P(a<x<b) is identical to the confidence. Perhaps you have found a fundamental error in the writings of some very respected reference sources in statistics. Perhaps not. Hubbardaie (talk) 22:53, 29 February 2008 (UTC)[reply]
After further review of the previous arguments and other research, I think I will concede a point to Jheald after all. Although I believe as Sveshnikov states, that a confidence interval should express a probability, not all procedures for computing a confidence interval will represent a true P(a<x<b) but only for two reasons:
1) Even though a large number of confidence intervals will be distributed such that P(a<x<b) = the stated confidence, there will, by definition, be some that contradict prior knowledge. But in this case we will still find that such contradictions apply to a small and unlikely subset of the 95% CIs (by definition).
2) There are situations, especially with small samples, where 95% confidence intervals contradict prior knowledge even to the extent that a large number of 95% CIs will not contain the answer 95% of the time. In these cases it seems to be because prior knowledge contradicts the assumptions in the procedure used to compute the CI. In effect, an incorrectly computed CI. For example, suppose I have a population distributed between 0 and 1 by a function F(X^3) where X is a uniformly distributed random variable between 0 and 1. CIs computed from small samples using the t-stat will produce distributions that allow for negative values even when we know the population can't produce those. Furthermore, this effect cannot be made up for by computing a large number of CIs with separate random samples. Less than 95% of the computed 95% CIs will contain the true population mean (a quick simulation of this scenario is sketched after this comment).
The reason, I think, Sveshnikov and others still say that CIs do, in fact, represent a probability P(a<x<b) is because that is the objective, by definition, and where we choose sample statistics based on assumptions that a priori knowledge contradicts, we should not be surprised we produced the wrong answer. So Sveshnikov et al. would, I argue, just say we picked the wrong method to compute the CI and that the objective should always be to produce a CI that has a P(a<x<b) as stated. We have simply pointed out the key shortcoming of non-Bayesian methods when we know certain things about the population which don't match the assumptions of the sampling statistic. So, even though this concession doesn't exactly match everything Jheald was arguing, I can see where he was right in this area. Thanks for the interesting discussion. Hubbardaie (talk) 17:38, 1 March 2008 (UTC)[reply]
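A quick simulation of the scenario in point 2 above (my own sketch, assuming the population is U^3 with U uniform on (0, 1), whose mean is 0.25, and a t-based 95% interval from samples of size 5):

    import numpy as np
    from scipy.stats import t as student_t

    rng = np.random.default_rng(7)
    n, n_trials = 5, 50_000
    true_mean = 0.25                        # E[U**3] for U uniform on (0, 1)
    t_crit = student_t.ppf(0.975, df=n - 1)

    hits = 0
    for _ in range(n_trials):
        sample = rng.uniform(0.0, 1.0, n) ** 3
        half = t_crit * sample.std(ddof=1) / np.sqrt(n)
        hits += sample.mean() - half <= true_mean <= sample.mean() + half

    # Coverage typically comes out somewhat below the nominal 0.95 for such a small,
    # skewed sample, illustrating the point made above.
    print(hits / n_trials)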

Non-Statisticians

Another question: Can you easily calculate a confidence interval for a categorical variable? How is this done? For a percentage, a count...? There is an example in the opening of the article about CIs for political candidates, implying that it's no big deal. But I was reading somewhere else that CIs don't apply for categorical (ie which brand do you prefer?) type variables. SPSS is calculating them, but I'm having a hard time interpreting this information. Thanks for your help! —Preceding unsigned comment added by 12.15.60.82 (talk) 20:27, 21 April 2008 (UTC)[reply]

Easily? That depends on your background. I suggest you either look at the section of this article "Confidence intervals for proportions and related quantities", or go directly to Binomial proportion confidence interval for something more complete. Melcombe (talk) 08:53, 22 April 2008 (UTC)[reply]

I have to say that I came to this page to get an understanding of CIs and how they are calculated - but as the page presently stands it is simply too complex for anybody who doesn't have a good grounding in statistics to understand. Not a criticism per se, but I thought that needed to be pointed out.

I assume that this is being edited by statisticians but just as an example the opening paragraph: "In statistics, a confidence interval (CI) is an interval estimate of a population parameter. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. How likely the interval is to contain the parameter is determined by the confidence level or confidence coefficient. Increasing the desired confidence level will widen the confidence interval."

Is simply too complex and relies on too much previous knowledge of stats to make sense. Wikipedia is after all supposed to be written for the general reader. I know that to you guys this paragraph will seem easily understandable but trust me, for a non-statistician it is useless. Maras brother Ted (talk) 20:57, 27 March 2008 (UTC)[reply]

I agree completely and I'm a statistician. There appear to be some statistician-wannabes in here pontificating about the old Bayesian vs. frequentist debate and whether the term "confidence interval" refers to a range that has a stated probability of containing the value in question. There is no such practical difference in the real world and the entire discussion is moot. As others have pointed out on this discussion page, many mathematical texts on statistics (as the discussions cite more than once) explicitly define a confidence interval in terms of a range that has a stated probability of containing the true value. Any discussion that a confidence interval is NOT really about a probability of a value falling in a range is entirely irrelevant to the practical application of statistics. Anyone who disagrees should just start an article to answer the question "How Many Angels Can Dance on the Head of a Pin?". It would be every bit as productive. I just wish I had more time to rewrite this article. ProfHawthorne (talk) 19:47, 28 March 2008 (UTC)[reply]
Hear, hear! Um, now.... who's going to fix it? Listing Port (talk) 20:30, 28 March 2008 (UTC)[reply]

...and I am a statistician and I think "ProfHawthorne" is wrong. I note that this talk page comment is (so far) his or her only edit to Wikipedia. Michael Hardy (talk) 21:00, 28 March 2008 (UTC)[reply]

I'm not a statistician and I think the intro is terrible. For one thing, it has seven paragraphs, violating WP:LEAD, and for another its first sentence (and many other sentences) infringes on WP:JARGON, which is "especially important for mathematics articles." Listing Port (talk) 21:15, 28 March 2008 (UTC)[reply]
I agree the lead section needs improvement. I do not agree with "ProfHawthorne", though. I hope we can develop something that gets the gist across clearly to the lay reader, without becoming technically inaccurate. -- Avenue (talk) 00:42, 29 March 2008 (UTC)[reply]
When did I argue that we can't develop something that makes the point without being technically inaccurate? Everything I said was entirely accurate. See my comments to Hardy below.ProfHawthorne (talk) 14:04, 29 March 2008 (UTC)[reply]

Just wanted to say thanks for all of the replies and hope this begins some useful debate and changes. I also had to add that I am not ungrateful for the work people have done on this, but simply that it is not "understandable." To give you an idea, I found the answer much easier to understand in a book entitled "Advanced statistics in clinical trials" or something similar in my university library. Maras brother Ted (talk) 11:51, 29 March 2008 (UTC)[reply]

Hardy is correct that this talk page was my first comment in Wikipedia. But the conversation about confidence intervals is so misguided I could no longer sit on the sidelines. If Hardy thinks I'm wrong, point to the "error" specifically. Start by a general review of stats texts and note the overwhelming majority that define a confidence interval in terms of an interval having a stated probability of containing a parameter. Such a relationship is, in fact, the basis of Tchebysheff's inequality, a fundamental constraint on all probability distributions. If you are a statistician, you would already know this. ProfHawthorne (talk) 13:46, 29 March 2008 (UTC)[reply]

I make the following points:

  • This article has as much need to be technically correct as any other article on mathematics, so some complexity is necessary, although we can try for a simpler introduction;
  • The best way to promote misunderstanding is to use vaguely defined concepts and ill-informed versions of these, and a lot of supposed Bayesian-frequentist comparisons on these pages seem to suffer from this. "Confidence interval" does have a well-defined meaning in mathematical statistics and it is best not to cloud the issue of what "Confidence interval" means. If there is to be discussion and comparisons of other interpretations of interval estimates then this can most sensibly made in connection with either the interval estimation articles or the more general articles on statistical inference. There really is no point in having repeated "this is wrong because it isn't Bayesian" stuff put in every article that tries to give a description of what a particular statistical approach is doing;
  • I don't think that these pages should become a "cookbook" for statistics where people think they can find something to tell them how to analyse their data without actually thinking. There seems too much danger of just looking-up a summary table without remembering about all the assumptions that are built-in;
  • I wasn't aware that these pages were trying to replace existing text-books;
  • There needs to be some recognition of the fact that "statistics" is a difficult subject, so anything that implies that people can just look at these articles and do "statistics" for themselves should be avoided. If they haven't done a course in statistics at some appropriate level, then they should be encouraged to find a properly trained statistician, preferably one with relevant experience.

Melcombe (talk) 09:58, 31 March 2008 (UTC)[reply]

Yes, yes, yes, yada yada, that's all fair and good, but the important fact is that your changes are excellent! Thanks! :) Listing Port (talk) 22:13, 31 March 2008 (UTC)[reply]
I think Melcombe's additions are very reasonable. However, it now contradicts the "meaning & interpretation" section. I see that someone has modified that section back to the old misconceived Bayesian vs. frequentist distinction. The author removed the "no citations" and "controversy" flags but still offers no citations. It also contradicts the definition Melcombe aptly provided, so I think we can still call it controversial. Since no citations are provided, it's still only original research (even if it weren't controversial, which it is). I'll add the flags back to this section and we can all discuss this. Hubbardaie (talk) 15:25, 4 April 2008 (UTC)[reply]

Dispute tag back

I just added the dispute and citation flags back to the "meaning" section. Another point on the controversy is to note that the entire example given earlier in the article consistently uses notation of the form P(a<X<b), indicating a confidence interval is actually a range with a probability of containing a particular value - contrary to the uncited claims of the disputed section. —Preceding unsigned comment added by Hubbardaie (talkcontribs) 15:31, 4 April 2008 (UTC)[reply]

I removed the dispute flag having deleted a block that I thought was not necessary. Since the dispute flag was put back, I have rearranged the text so that material on similar topics is under a single section, meaning that there are now two dispute tags left in. I think we need to be clear about exactly what bits are of concern. I think that the subsection headed "How to understand confidence intervals" is/was OK, but the next subsection headed "Meaning of the term confidence" is newer and possibly not needed (or better off in some other article). I moved the material headed "Confidence intervals in measurement" into a separate section and left another dispute tag there. My immediate question is whether it is only these two chunks of text that there may be a problem with? Melcombe (talk) 16:30, 4 April 2008 (UTC)[reply]
Calling this a disputed issue would even be a bit too generous. The section about how to interpret the meaning of a confidence interval is simply wrong by a country mile. I work with a large group of statisticians and I mentioned this article to my peers today. One thought it was a joke. I'm on the board for a large conference of statisticians and I can tell you that this idea that a confidence interval is not really a range with a given probability of containing a value would probably (no pun) be news to most of them. The confidence coefficient of a confidence interval is even defined as the probability that a given confidence interval contains the estimated quantity. Most of the article and the examples given are consistent with this accepted view of confidence intervals, but this disputed section goes off on a different path. From my read of the other discussions, it appears that the only source anyone can find for the other side of this debate is a single philosophy book which was not quoted exactly. There are many other good sources provided and they all seem to agree with those who say the confidence coefficient really IS a probability. Or did I get taken by a late April Fool's joke? DCstats (talk) 19:43, 4 April 2008 (UTC)[reply]
Well said. I suggest the simplest fix is just the deletion of that entire section. It is inconsistent with the rest of the article anyway. ERosa (talk) 13:22, 6 April 2008 (UTC)[reply]

I suggest a major rewrite

I have read the informal and the theoretical definitions of "confidence interval" in this article and find the informal definition to be way too vague, and the theoretical one to almost completely miss the point. I strongly urge a major rewrite of this article.

By the way, I took some time to google "confidence interval" to find out what is readily available on the web via course notes or survey articles. I spent less than 30 minutes on this, but I must say that there is a huge amount of poor and misleading exposition out there, which may explain in part where this article got its inspiration from. Then again, maybe not. Daqu (talk) 02:49, 6 April 2008 (UTC)[reply]

I agree. The part of this article that discusses how to "properly" interpret the meaning of the confidence interval is being refuted by many in here for good reason. Someone has some very wrong ideas about confidence intervals, statistics and just basic probability. The previously noted lack of sources in that section should be our first clue. ERosa (talk) 13:20, 6 April 2008 (UTC)[reply]

IMO one fundamental problem is that confidence intervals don't actually mean what most people (including many scientists who use them) believe them to mean, and explaining this issue is non-trivial, especially when the potential audience comes with erroneous preconceptions and may not even have any suspicion that they are fundamentally wrong. Jdannan (talk) 02:43, 24 April 2008 (UTC)[reply]

I agree there are a lot of misconceptions, some shown here in this article. But the body of the article uses language quite consistent with every mathematical statistics text I have on my shelves. The article correctly explains that for a 95% confidence interval (a, b) for a parameter θ, the following must hold: P(a < θ < b) = 0.95.
In other words, at least for this part of the article, the author correctly describes a confidence interval as an interval with a stated probability of containing the value. Others in this discussion who attempt to directly refute this use of the term end up contradicting most of the math in this article and every reputable text on the topic (only one side of this discussion has provided citations). —Preceding unsigned comment added by ERosa (talkcontribs) 14:11, 24 April 2008 (UTC) Oops, forgot to sign. ERosa (talk) 14:15, 24 April 2008 (UTC)[reply]
Exactly. I think we have some problems with some flawed thinking about stats in here.DCstats (talk) 19:19, 25 April 2008 (UTC)[reply]
But even the language here is open to misinterpretation. Prior to making the observation and calculating the interval, one can say that a confidence interval will have a certain probability of containing the true value (ie, considering the as-yet-unknown interval as a sample from an infinite population - 95% of the confidence intervals will contain the parameter). But once you have the numbers, this is no longer true, and one cannot say that the parameter lies in the specific interval (once it has been calculated) with 95% probability. IME it is not trivial to explain this in simple but accurate terms. In fact numerous scientists in my field don't appear to understand it.Jdannan (talk) 00:29, 25 April 2008 (UTC)[reply]
But ... even "once it has been calculated" there is still the probability-linked interpretation that, (whatever the outcome) it is the outcome of a procedure that had a given probability of covering the "true" value". Melcombe (talk) 08:56, 25 April 2008 (UTC)[reply]
I suppose anything ever written is open to misinterpretation in the sense that some people will misinterpret it. But the expression used above by ERosa is entirely consistent with every credible reference used in universities (one in which I've taught). Whatever misinterpretation it might be open to by laymen, apparently the mathematical statisticians find it to be completely unambiguous.
Your claim that you only have a confidence interval "prior to making the observations and calculating the interval" but that this is no longer true "once you have the numbers" betrays some deep misunderstanding of the subject. The confidence interval based on samples can ONLY be computed after we have the observations and the calculation is made. What exactly do you think the confidence interval procedures for the z-stat and t-stat are about? The 90% confidence interval actually has a 90% probability of containing the estimated value. This is experimentally testable. DCstats (talk) 19:19, 25 April 2008 (UTC)[reply]
Not sure how you got that from what I wrote. I (obviously) never made the absurd statement that one only has a confidence interval prior to making the observations. What I was trying to convey was that the statistical property (e.g. 90% coverage) can only ever apply to a hypothetical infinite population of CIs calculated in the same way from hypothetical observations that haven't been made. Saying that a given confidence interval has a 90% chance of containing the parameter is equivalent to saying that a given coin toss, once you have already seen it come up heads, has a 50% chance of landing tails. Are you really saying that you would be happy with a statement that the confidence interval [0.3,0.8] has a 50% chance of containing a parameter, when that parameter is known to be an integer (e.g. the number of children of a given person)? Jdannan (talk) 22:55, 25 April 2008 (UTC)[reply]
I thought I was getting it from your direct quotes and, on further reading, it seems I quoted you accurately. Anyway, you are using an irrelevant example. When a 90% CI is computed, it is usually the case that all you know about it is the samples taken, not the true value. You are correct when you say that in an arbitrarily large series of 90% CIs computed from random samples, 90% will contain the estimated value. You are incorrect when you claim that saying a 90% CI has a 90% chance of containing the answer is like saying that a coin toss has a 50% chance of being tails once you have already seen it come up heads. The two are different because in the case of a 90% CI gathered from a random sample, you do not have certain knowledge of the parameter (e.g. the mean). That's why it's called a 90% CI and not a 100% certain point.
If you have additional information other than the samples themselves, then the methods that do not consider that knowledge do not apply and the answer they give wouldn't be the proper 90% CI. In those cases where other knowledge is available, the 90% CI still means the same thing, but you have to use a different method (discrete Bayesian inversions for your integer example) to get an interval that is actually a 90% CI. You seem to be confusing the situation where you somehow obtained knowledge of the true value with the situation where we are not certain of the value and can only estimate it from a random sample.
I would really like to see what source you are citing for this line of reasoning. It is truly contrary to all treatment of the concept in standard sources and is contrary even to the mathematical expressions used throughout this article, where a confidence interval is described as a probability that a value is between two bounds. DCstats (talk) 02:07, 26 April 2008 (UTC)[reply]
"If you have additional information other than the samples themselves, then the methods that do not consider that knowledge do not apply and the answer they give wouldn't be the proper 90% CI." Why on earth not? If the (hypothetical, infinite) population of confidence intervals calculated by a specific method have a given probability of coverage, then they are valid confidence intervals. Can you give any reference to support your assertion that this is not true when a researcher has prior knowledge? There are numerous simple examples where perfectly valid methods for generating confidence intervals with a specific coverage will produce answers where the researcher knows (after looking at a given interval so generated) whether the parameter does, or does not, lie in that specific interval.Jdannan (talk) 03:06, 26 April 2008 (UTC)[reply]
Because "on earth" you haven't met the conditions required for the procedure to produce the right answer. If you already know the exact mean of a population, THEN take a random sample, the interval will not be a 90% CI, by definition. You just answered your own question when you asked "If the (hypothetical, infinite) population of confidence intervals calculated by a specific method have a given probability of coverage, then they are valid confidence intervals." If those conditions really apply and the probability is valid, then the CI is valid. If you know in each of those cases what the true mean is or that the answer must fit other constraints (like being an integer) then the the standard t-stat or z-stat procedure is NOT your 90% CI. The hypothetically infinite population thought experiment will prove it. If you take a random sample of a population where you already know the mean to be greater than 5, then use the t-statistic to compute a "90%" CI of 2 to 4, then we know that the actual probability the interval contains the answer is 0% (because of our prior knowledge). The meaning of 90% CI didnt change, but using a procedure that incorrectly ignores known constraints will, of course produce the wrong answer. You just use a different method to compute the PROPER 90% CI that would meet the test over a large number of samples. You have simply misunderstood the entire point. Just like in any physics problem, if you leave out a critical factor in computing, say, a force, you have not proven that force means other than it does. You simply proved that you get the wrong answer when you leave out something. If you need a source for the fact that a 90% CI *really* does mean that an interval X, Y has a probability P(X<m<Y) where m is the paremeter, then you only need to pick up any mathematical stats text. I see references offered earlier in this discussion that look perfectly valid. Can you offer any reference that a 90% CI is NOT defined in this way? —Preceding unsigned comment added by DCstats (talkcontribs) 13:12, 26 April 2008 (UTC)[reply]
I forgot to sign that. Also, if you have prior knowledge and wish to compute a 90% CI where there really is a 90% chance the interval contains the answer, then you have to use Bayesian methods. The t-stat and z-stat only apply when you don't have other knowledge of constraints. That's why they are called non-Bayesian. So if you set up a test where you took a large number of samples and 90% of the 90% CIs contain the true value, and you want to account for other known constraints, then the non-Bayesian methods will produce the wrong answer. The definition of 90% CI has not changed, mind you. But the procedures that fail to take into account known constraints will not meet the test of repeated sampling.DCstats (talk) 13:28, 26 April 2008 (UTC)[reply]
I think you might want to clarify that you are referring to prior knowledge: in a large number of samples, you do have to know the true mean of the population in order to know that 90% of the 90% CIs contain the true population mean. That aside, DCstats is right. If we know that the mean is not between 2 and 4, then 2 to 4 cannot be the 90% CI according to how the term is defined. So I don't understand what Jdannan is saying. Are you saying that if you have knowledge that makes it 100% certain that the mean is not between 2 and 4, that 2 and 4 are still the 90% CI as long as you used a method that (incorrectly) ignored prior knowledge? That makes no sense. If a researcher knows that the answer cannot be within the interval she just computed, then she just computed it incorrectly. What's the confusion? ERosa (talk) 13:50, 26 April 2008 (UTC)[reply]
Oh dear. I asked for a reference, not an essay. Never mind, you have (both) demonstrated quite conclusively that you are hopelessly confused over this, so this will be my last comment on this topic. One specific error in what you are saying: a method that has 90% coverage is indeed a valid method for generating confidence intervals even though it may generate stupid intervals on occasion. Here are two references that support my point (many supposed definitions are rather vague and blur the distinction between frequentist and Bayesian probability): [1] [2]. One of them even specifically makes the point that a CI does not generally account for prior information. Let's go back to my trivial example about integers: if X is an unknown integer, and x is an observation of X that has N(0,1) error then it is trivial to construct a 25% (say) confidence interval centred on the observed x - ie (x-0.32,x+0.32) and these intervals WILL have 25% coverage (that is, if one draws more x from the same distribution and creates these intervals, 25% will contain X) and yet these intervals will not infrequently contain no integers at all. The fact that one can create OTHER intervals, or do a Bayesian analysis, has no bearing on whether these intervals are valid 25% confidence intervals for X, and I do not believe you will find a single credible reference to say otherwise.Jdannan (talk) 02:44, 28 April 2008 (UTC)[reply]
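A small sketch of Jdannan's integer example (my own illustration; X = 3 and the number of trials are arbitrary assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    X = 3                                    # the unknown integer, fixed for the simulation
    x = X + rng.normal(size=100_000)         # observations with standard normal error
    lo, hi = x - 0.32, x + 0.32              # the 25% confidence intervals

    coverage = np.mean((lo < X) & (X < hi))
    no_integer = np.mean(np.floor(hi) < np.ceil(lo))   # intervals containing no integer at all
    # Coverage is roughly 0.25, yet a sizeable share of the intervals contain no integer.
    print(coverage, no_integer)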
You are an excellent example of how a little knowledge (in this case, very little) can be a dangerous thing. First, it was already made clear that the citations provided previously in this discussion (Sveshnikov, MathWorld, etc.) are sufficient to make the point that if a and b are a 90% CI of x, then P(a<x<b)=.9, literally. That's why you will find that notation consistently throughout all (valid and widely accepted) sources on this topic. If an interval does not have a 90% chance of containing x, then it is simply not the 90% CI of x, period. Even the first of your own citations is consistent with this and it directly contradicts the definition of your second source. Did you even read these? Furthermore, the Bayesian vs. frequentist distinction has no bearing in practice and is only an epistemological issue. It can be (and has been) shown that where a Bayesian probability of 80% has been computed, over a large number of trials the frequency will be such that 80% of the events occur. The Bayesian vs. frequentist distinction simply has no relevance to the practical application here and your example is just as moot as it was before. I believe you are one of a group (hopefully a very small group) of laymen who continue to propagate a fundamental misconception about the nature of the Bayesian vs. frequentist debate and attempt to redefine what should be fairly straightforward concepts, like confidence intervals, with this confused concept. ERosa (talk) 13:29, 29 April 2008 (UTC)[reply]
You know, I think you wrote that first sentence without seeing even a scrap of irony! -- Avenue (talk) 15:09, 29 April 2008 (UTC)[reply]
There would only be irony if I were the one with the "very little knowledge". In this group, that is definitely not the case. I'm simply explaining that the literal meaning of the mathematical expression P(a<x<b) really is a probability that x will fall between a and b. This is contrary to one source provided by Jdannan and completely consistent with the other source provided by Jdannan. (a,b) is a confidence interval of x with C confidence if and only if P(a<x<b)=C. The second source Jdannan provided actually says (ironically, in complete disagreement with his first source) that a 95% CI does NOT actually have a 95% chance of containing the estimated parameter. In other words, that source is saying that where (a,b) is a 90% CI, P(a<x<b) does NOT equal .9. This is quite different from the Sveshnikov and MathWorld sources provided above, as well as from every graduate-level text you will find on the topic. Period. There are sources already provided (one of which is Jdannan's own source) showing that P(a<x<b) is equal to the stated confidence of the interval. I think it is fair to ask for a graduate-level text (not a website) that explicitly states that P(a<x<b)<>C. If you can produce a valid source, then we will have shown only that different valid sources take opposing positions (since the sources in favor of my position are certainly valid). If you can't produce one, then we will have settled who should be seeing which irony. Fair enough? ERosa (talk) 18:50, 29 April 2008 (UTC)[reply]
Certainly it is true that P(a<x<b) <> P(a<x<b) if the P,a,x or b on the left hand side are not the same as or do not have the same meaning as the P,a,x or b on the right hand side. Just because the same symbols are used does not make them the same. You do actually have to think about what these mean and how they are interpreted. Which is why the article here, and any other on statistics, needs to be specific about such things. Melcombe (talk) 09:20, 30 April 2008 (UTC)[reply]
Time for a little bit of actual stats knowledge in this conversation and, for that matter, a little bit of basic logic. When the values of a, x, b are held constant, P(a<x<b) means the same in both situations. This is why identical notation is used throughout every textbook on the topic, and it is used without any hint that somehow the meaning has changed from one chapter to the next. As ERosa pointed out, the second source provided by Jdannan is not consistent with most valid sources on this topic. This source stated that a confidence coefficient (that's the probability associated with a confidence interval) should not be interpreted as a probability. This is wrong. At least four citations have been given that support the definition that the confidence coefficient is actually a probability, and all notation used is consistent with this without ever explaining that P() on one page means probability and on another page it doesn't. It appears that the citation given by Avenue below was not in any peer-reviewed journal or text that would be used in any stats course. On a side note, I showed a petroleum-industry statistician this debate just a few minutes ago and we both agreed that Jdannan and Avenue (assuming they are different persons and not socks) must be some terribly confused armchair statisticians. I'll repeat the previous challenges in here to produce a source for the opposing argument that is not just an unpublished document or a web page written by a community college teacher. DCstats (talk) 18:22, 30 April 2008 (UTC)[reply]
You have a genius for missing the point. The symbol P is the one there appears to be confusion over, not a, x, and b. We don't know who your oil industry friend is, or what you told them, so that doesn't help the debate here. Accusing someone with a thousand times more edits than you (and that's not an exaggeration) of being a sock puppet is laughable. Yes, the link I gave is not from a textbook. I offered it to help clear up your confusion, not as something to put directly in the article. A better source is quoted to the left. This is entirely consistent with Jdannan's second link above. -- Avenue (talk) 00:23, 1 May 2008 (UTC)[reply]
To DCstats, Avenue also has a genius for missing the point. And I think you may be on to something about the sock puppets. Apparently Avenue believes that the number of edits somehow insulates him or her from that suspicion. I think it is more of a measure of how much time someone has on their hands (I'm not trying to be discouraging to all of the useful contributors of wikiland).
To Avenue, as to whether or not you are getting the point, there are at least four widely accepted (and published) references that directly contradict your position and the position of this source (if it is valid, it would be the first). They were offered by Hubbardaie much earlier in the discussion. But I will make a gesture in the spirit of community. I'll review the source you just provided, and if I can confirm it is valid then I will concede that there is a debate. We obviously can't just say that the debate is settled in favor of the single source you gave, because the other sources - a couple of which I have confirmed - obviously contradict it. So, those are the choices: either Hubbardaie's sources are right and a confidence interval is literally interpreted as a probability, or there is a debate and neither position can be held as a consensus agreement. Given the stature of the previously offered sources in favor of the "CI is a probability" position, there can be no other reasonable path.ERosa (talk) 01:06, 1 May 2008 (UTC)[reply]

(reset indent) If you want to pursue the sock puppet accusation, go ahead. From Jdannan's home page it seems we live in different countries, so checkuser should be clearcut.

It is not widely accepted that Hubbardaie's sources contradict the position taken by mine. The only comment he got after posting them suggested he didn't fully understand the main one he quotes, by Sveshnikov. If interpreted correctly, there is no apparent contradiction. He may easily have also misunderstood the last two, which he hasn't quoted. The Mathworld link uses odd language ("A confidence interval is an interval in which a measurement or trial falls corresponding to a given probability", my emphasis), and doesn't discuss interpretation at all. Not one to rely on, I think.

The source I posted is on Google Books,[3] if that helps. (The fact that you consider reviewing the source I posted such a magnanimous gesture seems a bit bizarre.) If you do look at it, you'll find Wasserman defines confidence intervals in a way consistent with Sveshnikov. You seem to be seeing contradictions where none exist. -- Avenue (talk) 04:11, 1 May 2008 (UTC)[reply]

Another area where there seems to be confusion (as you noted before) is the distinction between statements derived from probabilities (confidence intervals) and statements of probabilities. See page 4 here for an explicit explanation of this point. -- Avenue (talk) 15:02, 28 April 2008 (UTC)[reply]
Thanks for that link - it looks like a nice clear description, and I like the challenge to bet on the results!Jdannan (talk) 08:17, 29 April 2008 (UTC)[reply]
Your standards for a "source" are not very high. Well, perhaps that's why you merely refer to it as a "link" and not a source. That's probably a good idea. This is not published in any peer-reviewed journal or widely accepted text. A quick review of ITS sources reveals that many of them are also just unpublished web sources. Please provide a proper source. This reads like bad freshman philosophy. ERosa (talk) 19:05, 29 April 2008 (UTC)[reply]

Regarding more general comments about the contents of this article:

  • those who want to see a description of Bayesian versions of interval estimation would do well to help to specify how this works formally in the article Credible interval, where there is a sad lack of specificity.
  • those who want more discussion of comparisons of procedures, and of interpretations and usefulness and of alternatives would do well to contribute to the article Interval estimation.
  • those who are interested in more general philosophical questions associated with other ways of thinking about interval estimation may also want to see Fiducial inference.

Melcombe (talk) 08:56, 25 April 2008 (UTC)[reply]

As you suggested, I checked out the articles on credible interval and fiducial inference. Both are also seriously lacking in verifiable citations for specific claims in the text, but these articles seem to be more consistent with accepted thinking on the topic. Both correctly describe these distinctions as not widely accepted and, in the case of fiducial inference, even controversial. The credible interval article even correctly gives a definition of confidence interval that we should simply use for this article. The fiducial inference article uses the same mathematical expressions, describing a confidence interval as one that really does contain the value with the stated probability.
Since we can't use original research, we have to resort to claims based only on published citations. I suggest that we refrain from adding lengthy philosophical interpretations based on not a single in-text source, especially when the cited sources discussed elsewhere in this discussion directly contradict the position.DCstats (talk) 19:19, 25 April 2008 (UTC)[reply]

I don't know if this helps the discussion or not, but... If I do one test at 90% the chance that the interval will contain the real result is 90%. If I do 20 tests at 90%, and one returns a positive result, the chance that that interval contains the real result is about 12%. When multiple tests are done they are not independent of each other. Consequently, the real 90% confidence interval doesn't just depend on the result of one test, it depends on the number of tests done and the number of positive results found. This confusion of meanings over what the 90% means is I think at the heart of most of the world's dodgy statistics. It would be nice if this article could make these sorts of things clear. Actually, it would be nice if this and other mathematical articles in WP were actually meaningful to laymen. —Preceding unsigned comment added by 125.255.16.233 (talk) 22:26, 27 April 2008 (UTC)[reply]

I can answer your first question...it might have helped if you didn't make some mathematically incorrect (or not sufficiently explained) statements. The second half of your first sentence is correct (If I do one test at 90%...). The second sentence will require more explanation because, even though I'm a mathematician and statistician, you will have to explain your assumptions before I can see how they are correct (that's a nice way for me to say they don't make sense). If you mean that you conduct 20 trials of something that has a 90% probability, then the probability of getting exactly one success is on the order of 10 to the power of -18 (one millionth of one trillionth). This is a simple binomial distribution with a 90% success rate and 20 trials. So I will give you the benefit of the doubt and assume that you actually meant something other than that and that further explanation would clear it up. But one of your conclusions, oddly, is correct: that the meaning of a 90% CI must be that, over a very large number of trials, the probability times the number of trials equals the observed frequency.DCstats (talk) 00:16, 28 April 2008 (UTC)[reply]
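As a quick numerical check of the binomial figure quoted above, here is a short Python sketch (standard library only; the variable names are just for illustration):

 # Probability of exactly one success in 20 trials when each trial succeeds with p = 0.9
 from math import comb
 
 p, n = 0.9, 20
 p_exactly_one = comb(n, 1) * p**1 * (1 - p)**(n - 1)
 print(p_exactly_one)  # roughly 1.8e-18, i.e. on the order of 10**-18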

A confidence level of 90% means there's a 90% chance that you won't get a false +ve. In 20 trials, there is therefore a (.9)^20 chance that you won't get any false +ves. That's about 12%. So any of the remaining tests will have a real confidence level of 12%, 'real' meaning taking into account all the tests. —Preceding unsigned comment added by 125.255.16.233 (talkcontribs)
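For what it's worth, the "about 12%" figure above is just 0.9 raised to the 20th power; a minimal Python sketch of the arithmetic (names are illustrative only):

 # Chance of no false positives across 20 independent tests at the 90% level
 p_no_false_positive = 0.9
 p_none_in_20 = p_no_false_positive ** 20
 print(p_none_in_20)        # about 0.122
 print(1 - p_none_in_20)    # about 0.878: chance of at least one false positive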

First, this article is about confidence intervals, not significance tests. The size of the interval gives more information than a test result. An analogue of the problem of multiple comparisons still applies, but thinking about the false discovery rate is probably more useful here than the familywise error rate. To some extent it's implicit in the treatment here already, but explicitly mentioning the issue might be useful. I don't think we need to get into numerical details though. -- Avenue (talk) 12:14, 28 April 2008 (UTC)[reply]
Avenue is correct that this is about confidence intervals and not significance tests. Even if it were about significance tests, the unsigned contributor is still getting his math wrong - or, more precisely, using the wrong math to answer the question. A significance test is simply not defined as the probability of getting all correct results (i.e. the true value being within the interval) after 20 trials. The probability of a 90% CI bounding the true value is 90% for each individual test, no matter how many are conducted, and that is the only probability relevant to the CI. Furthermore, the unsigned contributor mentioned nothing of this calculation being some sort of significance test in the original statement.ERosa (talk) 14:46, 28 April 2008 (UTC)[reply]
I saw language like "the number of tests" and "false +ve" and assumed that they were thinking of each confidence interval as having an associated significance test. If I've misinterpreted them, I'm sorry. -- Avenue (talk) 15:19, 28 April 2008 (UTC)[reply]

If you look up confidence level you will see that it redirects to this article, meaning that the editors thought that information on both topics falls within the scope of this article. Further, the size of the interval is analogous to the smallness of the p value; a false positive with a really narrow interval is still a false positive. The width of the confidence interval is of no use in calculating the likelihood of a false positive, and so has no real meaning when judging whether a statistic is believable or not. —Preceding unsigned comment added by 125.255.16.233 (talk) 13:53, 28 April 2008 (UTC)[reply]

It seems natural to me that confidence level redirects here, and significance level takes you to Statistical significance, even though they might be equivalent in some mathematical sense. It certainly does not mean that everything covered there has to be covered here, and vice versa. I see you are now bringing in p values, another topic that needs only cursory mention here at most. All these things are related, but the focus here is on confidence intervals. -- Avenue (talk) 14:15, 28 April 2008 (UTC)[reply]

Dear DCstats and ERosa

Starting a new section here because the preceding is hopelessly indented.

DCstats and ERosa, you are mistaken in your thinking. Jdannan and Avenue are quite correct. I know this might be very difficult for you to accept. You claim to have references supporting your interpretation. However, I think you are misinterpreting the statements from the textbooks you are reading. I can't speak for your "petroleum industry statistician" friend. My guess is that you conveyed your misinterpretation of the situation to him/her. I have a Ph.D. in mathematics, and after reading this talk page and thinking I was almost losing my mind, I consulted with several other Ph.D. mathematician and statistician friends of mine, who confirmed that Jdannan and Avenue are indeed correct. And the interpretation given in the article is basically correct as well. In fact, it's the interpretation I've seen given in every stats book I've looked at.

Some references, since these seem to be very important to people:


  • "Warning! There is much confusion about how to interpret a confidence interval. A confidence interval is not a probability statement about θ since θ is a fixed quantity, not a random variable." — Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference, p. 92
  • "CAUTION! A 95% confidence interval does not mean that there is a 95% probability that the interval contains μ. Remember, probability describes the likelihood of undetermined events. Therefore, it does not make sense to talk about the probability that the interval contains μ, since the population mean is a fixed value. Think of it this way: Suppose I flip a coin and obtain a head. If I ask you to determine the probability that the flip resulted in a head, it would not be 0.5, because the outcome has already been determined. Instead the probability is 0 or 1. Confidence intervals work the same way. Because μ is already determined, we do not say that there is a 95% probability that the interval contains μ." — Michael Sullivan III, Statistics: Informed Decisions Using Data, p. 453
  • "The idea of interval estimation is complicated; an example is in order. Suppose that, for each λ, x is a real random variable normally distributed about λ with unit variance; then, as is very easy to see with the aid of a table of the normal distribution, if M(x) is taken to be the interval [x − 1.9600, x + 1.9600], then (1) P(λ in M(x) | λ) = α, where α is constant and almost equal to 0.95. It is usually thought necessary to warn the novice that such an equation as (1) does not concern the probability that a random variable λ lies in a fixed set M(x). Of course, λ is given and therefore not random in the context at hand; and given λ, α is the probability that M(x), which is a contraction of x, has as its value an interval that contains λ." — Leonard J. Savage, The Foundations of Statistics, p. 260


Your misunderstanding seems to be based on an inadequate grasp of a few facts:


  1. The population mean μ is a fixed constant, not a random variable.
  2. The endpoints of the confidence interval are random variables, not fixed constants.
  3. A statement about the level of confidence of a confidence interval is a probabilistic statement about the endpoints of the interval considered as random variables.


The issue about whether μ is "known" or not is a complete red herring. What matters is that μ is fixed, not whether it's "known". This also has absolutely nothing to do with probabilistic interpretations of quantum physics, good lord.

Let me go through an example which has the most simplifying assumptions. Suppose we have a normally distributed random variable X on a population with mean μ = 100 and standard deviation σ = 16, and suppose we select samples of size n = 100. Then the sample mean X-bar is a random variable defined on the space of all possible samples of size 100, and its sampling distribution has mean μ = 100 and standard deviation σ/sqrt(n) = 16/sqrt(100) = 16/10 = 1.6. A 95% confidence interval for μ based on this sampling is then given by (X-bar − z_(α/2)*σ/sqrt(n), X-bar + z_(α/2)*σ/sqrt(n)) = (X-bar − (1.96)*1.6, X-bar + (1.96)*1.6) = (X-bar − 3.136, X-bar + 3.136).

Now, note the following facts:


  1. The confidence interval is defined to be a random interval, i.e. the endpoints of the confidence interval are random variables defined in terms of the sample mean X-bar.
  2. We "know" what the population mean μ is, and yet we were still able to define the random interval that is the "95% confidence interval". Our "knowledge" of the value of μ has nothing to do with whether we can define this random interval.
  3. The probability that μ is contained in the random confidence interval is 95%. In other words, P(X-bar − 3.136 < μ < X-bar + 3.136) = 95%. Note, this is a probabilistic statement about the random sampling distribution X-bar.


Once we construct a particular confidence interval, the probability that μ is contained in that particular realization is either 0 or 1, not 95%. This is because our endpoints are no longer random variables, and our interval is no longer a random interval. For example, suppose we take a sample and get x-bar = 101. In that case, our particular realization of the confidence interval is (97.864, 104.136). The probability that μ = 100 is contained in this interval is one, not 95%. This is because 97.864, 100, and 104.136 are all just fixed constants, not random variables. If we take another sample and get x-bar = 91, our particular realization of the confidence interval is (87.864, 94.136), in which case the probability that μ = 100 is contained in this interval is zero, not 95%.
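The distinction drawn in the last few paragraphs can be illustrated with a small simulation (a sketch assuming NumPy is available; the parameters μ = 100, σ = 16, n = 100 and the margin 3.136 are taken from the example above):

 import numpy as np
 
 rng = np.random.default_rng(0)
 mu, sigma, n = 100, 16, 100
 margin = 1.96 * sigma / np.sqrt(n)   # 3.136
 trials = 100_000
 
 # Each row is one sample of size n; each sample yields one realized interval.
 xbars = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
 covered = (xbars - margin < mu) & (mu < xbars + margin)
 print(covered.mean())   # about 0.95: the random interval covers mu roughly 95% of the time
 print(covered[0])       # any single realized interval either covers mu or it does not

Over repeated sampling the coverage is about 95%, but each particular realized interval simply does or does not contain μ, as described above.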

"Your claim that you only have a confidence interval 'prior to making the observations and calculating the interval' but that is not longer true 'once you have the numbers' belies some deep misunderstanding of the subject. The confidence interval based on samples can ONLY be computed after we have the observations and the calculation is made. What exactly do you think the confidence interval procedure for the z-stat and t-stat are about? The 90% confidence interval actually has a 90% probability of containing the estimated value. This is experimentally testable."

Well, sure, just like it's experimentally testable by flipping a fair coin 1,000,000 times that it has 50% probability of landing heads. But once you've flipped the coin heads, the probability that particular coin toss is heads is 100%, not 50%. You seem to be very caught up on the fact that μ is "unknown". This is totally irrelevant. As I said above, what matters is that μ is fixed, not whether it's "known".

Here's a little thought experiment: Suppose I go into another room where you can't see me. I flip a coin in that room, and it comes up either heads or tails, but you can't see which happened. Now, you decide to flip a fair coin. It is true, the experiment of you flipping a fair coin can be modelled by a Bernoulli trial random variable, and the probability that the flip will agree with my flip is 50%. Now, suppose you actually flip the coin, and say it comes up heads. What is the probability that your heads flip agrees with my flip? It's not 50%, it's either 0% or 100%. The fact I don't have knowledge of whether your flip is heads or tails, and hence the fact I don't know which of the two cases is correct, whether the probability is 0% or 100%, is totally irrelevant. The two coins are specific physical objects. They either agree or they don't agree. There is no "50% probability" they agree.

Let's go one step further. Suppose the coin I flip that you can't see is a fair coin. Once I've flipped my fair coin, the probability you will flip a fair coin that agrees with mine is 50%. However, this has nothing to do with whether my coin is fair or biased. To see this, suppose my coin is biased and is weighted to flip heads 99% of the time and tails 1% of the time. Now, I flip the coin, and you can't see the result. The probability you will flip a fair coin that agrees with mine is still 50%. This is because my flip is a fixed constant, not a random variable. It's your flip that's random, not mine! Now, do you see why "knowledge" of μ is irrelevant?
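The biased-coin point is also easy to check numerically (a sketch in Python, standard library only; the 99% bias is the figure used in the paragraph above):

 import random
 
 random.seed(0)
 my_flip = random.random() < 0.99     # my hidden flip, heavily biased toward heads, made once
 trials = 100_000
 agree = sum((random.random() < 0.5) == my_flip for _ in range(trials))
 print(agree / trials)                # about 0.5, regardless of my coin's bias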

The fact that so many people who claim to be statisticians or scientists are confused on these points is a bit surprising to me. But then again, in a world in which mathematical idiots like MarkCC spout complete nonsense all the time and everyone thinks he's a math genius, and in which every scientist "knows" HIV causes AIDS, maybe it shouldn't surprise me. darin 69.45.178.143 (talk) 16:40, 1 May 2008 (UTC)[reply]