Talk:Confidence interval/Archive 3

From Wikipedia, the free encyclopedia


Archive proposal

This Talk page is getting clumsy and could do with archiving. I propose archiving everything prior to 2009 - any objections, or suggestions of a different cutoff point? seglea (talk) 22:25, 15 April 2009 (UTC)

I agree it's too big, but the discussions in June 2008 seem to overlap with some of the points raised recently. How about just archiving everything prior to that? -- Avenue (talk) 00:16, 16 April 2009 (UTC)
Done. -- Avenue (talk) 18:40, 2 May 2009 (UTC)

I'm a bit rusty in statistics, but......

The terms confidence interval, confidence level and degree of certainty need to be clarified. If they are synonyms, please list them. BTW, is there any APPROVED STATISTICS TERMINOLOGY? —Preceding unsigned comment added by (talk) 11:37, 10 May 2009 (UTC)

I recommend creating another article for confidence level, even if it has only a few words.-- (talk) 11:58, 10 May 2009 (UTC)

In my copy of Introduction to Business Statistics by Alan H. Kvanli, the relationship between confidence interval and confidence level is described as follows:
The higher the confidence level, the wider the confidence interval. The confidence level is written as (1 − α)·100%, where α= .01 for a 99% confidence interval, α= .05 for a 95% confidence interval, and so on.-- (talk) 12:12, 10 May 2009 (UTC)
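
As a rough illustration of the relationship quoted above, the following sketch computes the half-width of a normal-approximation interval at several confidence levels. The standard error of 1.0 and the names `z_critical` and `half_width` are assumptions made for illustration; they are not taken from Kvanli's book.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-sided standard normal critical value for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def half_width(alpha, standard_error=1.0):
    """Half-width of a (1 - alpha) * 100% normal-approximation interval."""
    return z_critical(alpha) * standard_error

# A smaller alpha means a higher confidence level and a wider interval.
for alpha in (0.10, 0.05, 0.01):
    level = (1 - alpha) * 100
    print(f"{level:.0f}% interval: estimate +/- {half_width(alpha):.3f}")
```

The same standard error gives a wider interval as the level rises, which is the trade-off the quoted passage describes.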

I tried to get the two terms separated some time back, but there are forces at work here that don't want to see that happen. (talk) 14:47, 7 June 2009 (UTC)

Avoid jargon words in the definition of the term.....

The definition of the term in the following

is more understandable than the one at the beginning of this article.-- (talk) 12:25, 10 May 2009 (UTC)

Meaning of the term "confidence"

Hello. I'll admit right away that I'm not a statistician, but this section had me so confused that I feel forced to question it.

"In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in twenty or less."

Wouldn't the researcher only see it if his observation is incorrect? If we reject hypothesis A with a confidence interval of 95%, we have not necessarily observed something that happens one time in twenty. If hypothesis A is CORRECT (which is not necessarily the case) and we STILL reject it due to our observations, then we have seen something that happens one time in twenty or less (and vice versa). This text says kind of the opposite.

A statistical test involves looking at the probability of an event and comparing it with the significance level. If the probability of the event is less than the significance level you reject the null hypothesis, which is that the event happened by chance. If the confidence level is 95%, the significance level is 5%, or 1 in 20. If you reorganise the mathematics so that you are testing on a confidence interval, the basis for the test is still that there is a 5% chance that the expected statistic will fall outside the interval. That's what the 95% means. So when you do a statistical test at 95%, you are looking to see if something has occurred that has a probability of happening of 5% or less. If the result is positive, then the question arises as to whether it is a false positive (a type I error) or a real result. If you only do one test, and it returns a positive result, then 95% of the time it will be a real result.
"If you only do one test, and it returns a positive result, then 95% of the time it will be a real result." NO this is flat-out wrong. It is simply not possible to infer a probability that an effect is "real" or not based on frequentist statistics. If the null hypothesis is true, then in 5% of experiments a test will generate a "significant" result (according to the test) and 95% of the time it will not. That's all. Jdannan (talk) 04:11, 8 June 2009 (UTC)

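Jdannan's point, that a 5% test rejects a true null hypothesis in about 5% of experiments and says nothing more, can be checked with a small simulation. This is only a sketch under stated assumptions: every experiment draws normal data with a true mean of 0 and a known standard deviation of 1, and the function name, seed, and sample sizes are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(n_experiments=5000, sample_size=20, alpha=0.05, seed=0):
    """Share of two-sided z-tests that reject when the null hypothesis is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_experiments):
        # The null hypothesis (mean 0) is true in every simulated experiment.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        z = (sum(sample) / sample_size) * sqrt(sample_size)  # sigma is known to be 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments

rate = false_positive_rate()
print(f"share of 'significant' results under a true null: {rate:.3f}")
```

The simulation gives the rate of rejections given that the null is true. It cannot give the probability that a particular significant result is "real", because that would also depend on how often real effects occur.
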
"If one were to roll two dice and get double six (which happens 1/36th of the time, or about 3%), few would claim this as proof that the dice were fixed, although statistically speaking one could have 97% confidence that they were."

Assuming you had specified your test before rolling the dice, this is true enough, but note that the 97% "confidence" here does not equate to believing the dice are fixed with probability 97%. Jdannan (talk) 04:11, 8 June 2009 (UTC)

The chance is also 1/36 of rolling a double one, two, three, four or five. Even if you get a mix, by the above reasoning you could say: "I got a 3 and a 5, the odds of that are only 2/36, or about 6%, so I am 94% confident that the dice are fixed". "Statistically speaking" this whole reasoning must be considered faulty, since you're very certain that the dice are wrong no matter what. At least this sounds to me as though the test is made "a posteriori". You need to, so to speak, FIRST set up a confidence interval, THEN look at the dice. Here you are purposefully constructing a confidence interval that does not include the sample outcome. Maybe I'm just reading it wrong; it would work if you first say, "if the dice are normal, I am 97% confident that they will turn up a result from 2 to 11", then roll the dice once and discover that they failed your test. If that's the case, I think it needs to be clarified.

The underlying assumption is that you are trying to roll a double six, and hence that the probability of doing so is what matters. If you use your 3 and 5 argument, you are really saying that if you roll the dice and get two numbers then the dice must be fixed. But the chance of that is 100%. Or maybe you are saying two different numbers; the chance of that is 5/6. The double six argument is two sixes.
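
The probabilities traded in this exchange can be checked by enumerating the 36 equally likely outcomes of two fair dice. The sketch below is illustrative only; the helper `prob` and the variable names are assumptions, not anything from the article.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on a (die1, die2) pair."""
    hits = sum(1 for roll in outcomes if event(roll))
    return Fraction(hits, len(outcomes))

p_double_six = prob(lambda r: r == (6, 6))              # the article's example
p_three_and_five = prob(lambda r: sorted(r) == [3, 5])  # "a 3 and a 5", either order
p_sum_2_to_11 = prob(lambda r: 2 <= sum(r) <= 11)       # region fixed before rolling
p_two_different = prob(lambda r: r[0] != r[1])          # "two different numbers"

for name, p in [("double six", p_double_six),
                ("a 3 and a 5", p_three_and_five),
                ("sum from 2 to 11", p_sum_2_to_11),
                ("two different numbers", p_two_different)]:
    print(f"{name}: {p} = {float(p):.3f}")
```

Every specific outcome is individually rare, which is why the rejection region has to be chosen before the roll rather than fitted to the result afterwards.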

"For example, say a study is conducted which involves 40 statistical tests at 95% confidence, and which produces 3 positive results. Each test has a 5% chance of producing a false positive, so such a study will produce 3 false positives about two times in three."

First of all, the outcome of the tests should depend on the things that are actually tested. Assume for instance that the actual value of the 40 things tested were all positive. Then you could have no "false positives", but in this example, you would have 37 false negatives. I guess the idea is that the tests should all come out negative (if the tests were 100% correct), but 3 come out as false positives. That is to say we do 40 coin tosses with a coin that's 95% likely to end up on one side and 5% on the other, and still we get the other 3 times out of 40. Then I must ask how it's been calculated. I would get

0.95^37 * 0.05^3 * 40! / (37! * 3!) = 0.185 as the likelihood of 3 false positives, so how is it "two times in three"?

Two times in three the study will have produced 3 false positives, there being no real effects. The next most likely outcome is two false positives and one real result; obviously the chance of that is less than one third. Within the third, there is a very minute chance that the study will have produced no false positives and 37 false negatives. I can't calculate the exact probability of that for you because I don't know the numbers involved, but it is insignificant. You can't start by making an assumption about which results are real because if you knew that why would you be doing statistics?
Just noticed... The chance of a positive being a false positive is 5%, because that's what the test is based on. The chance of a negative being a false negative is dependent on the strength of the effect and the number of tests. It's not 5%; unless you are dealing with small sample sizes it's a hell of a lot less. —Preceding unsigned comment added by (talk) 23:28, 7 June 2009 (UTC)
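
The binomial arithmetic in this exchange can be reproduced directly. The sketch assumes 40 independent tests at the 5% level with no real effects at all, so every positive is a false positive; the variable names are made up for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_tests, alpha = 40, 0.05

p_exactly_3 = binom_pmf(3, n_tests, alpha)   # about 0.185, the figure computed above
p_at_most_2 = sum(binom_pmf(k, n_tests, alpha) for k in range(3))
p_at_least_3 = 1 - p_at_most_2               # about 0.323

print(f"P(exactly 3 false positives)  = {p_exactly_3:.3f}")
print(f"P(at most 2 false positives)  = {p_at_most_2:.3f}")
print(f"P(at least 3 false positives) = {p_at_least_3:.3f}")
```

Under these assumptions, three or more false positives turn up about one time in three, and two or fewer about two times in three. The first figure is close to the "about 32%" quoted from the article in this thread, but the sketch does not settle which quantity the article's author had in mind.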

"Thus the confidence one can have that any of the study's results are real is only about 32%, well below the 95% the researchers have set as their standard of acceptance."

Well, the confidence that any single result is correct is 95%, obviously (what does "any of the study's results" mean if not these?). (talk) 23:20, 6 June 2009 (UTC)

Another way of explaining this is as follows: A statistical test is based on the probability of an event. When you do multiple statistical tests you are looking at multiple events. When you select the ones that are positive you are looking at multiple events and selecting. An underlying law of probability is that when you do this the events are not independent, and hence the probability of all the events must be taken into account when calculating the probability of any individual event. The mathematics used to calculate p-values, confidence intervals, and so on are derived on the assumption that there is only one event. Hence, when you do multiple tests you break that assumption and you get the wrong answers. In every case the actual probability will be much higher than the probability tested. In this case the tests were conducted at 95% confidence but because more than one was done the real confidence is about 32%. To get the 95% desired the individual tests would have had to be conducted at a much higher confidence level. (talk) 14:44, 7 June 2009 (UTC)
What is said about 'confidence' is at least partly wrong, and should only concern 'confidence' in relation to confidence intervals. Nijdam (talk) 16:18, 7 June 2009 (UTC)
One, what is wrong? You said yourself that you are not a statistician, so it would be useful to see what you regard as actually wrong and why. What you have shown above is a lack of understanding of the mathematics underlying statistical testing, which is understandable, so I assume you are not going to argue with me over that. If there's something wrong with the wording I'm happy to see if we can find a more acceptable phrasing. Two, Confidence Interval and Confidence Level both refer to this article. I tried to change that once and failed dismally. So this article, like it or not, has to cover both topics. (talk) 16:38, 7 June 2009 (UTC)
Well I'm not the one who wrote most of the above. And I am a statistician. And this article is about the term confidence interval, so an explanation about 'confidence' in this article should relate to confidence interval. There is of course a one-to-one relation between testing and confidence intervals. But: "In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in twenty or less." Well, that may not be wrong if interpreted in the right way, but it is at least hard to understand, and also not directly related to confidence intervals. Also the next sentence: "If one were to roll two dice and get double six (which happens 1/36th of the time, or about 3%), few would claim this as proof that the dice were fixed, although statistically speaking one could have 97% confidence that they were. Similarly, the finding of a statistical link at 95% confidence is not proof, nor even very good evidence, that there is any real connection between the things linked." doesn't contribute much to the understanding of 'confidence'. Nijdam (talk) 09:00, 8 June 2009 (UTC)
Firstly, the terms confidence interval and confidence level both redirect here. Hence, even though the title is confidence interval, the article is about both terms. I feel the two terms should be split into two separate articles. Good luck making it happen; I tried and got shouted down.
Secondly, the point of this section is to highlight the difference between the normal meaning of the word 'confidence' and 'confidence' as a statistical term. When a layman talks of being 50% confident of something he means that he is half sure that it is right. I hope you agree that the result of a statistical test that is reported with a 50% confidence level does not mean the same thing. I believe that it is entirely appropriate within an 'encyclopedia' supposedly for the masses to ensure that that point is clear. (talk) 12:01, 8 June 2009 (UTC)
Nuts! This section continues to conflate study-wise error and error in individual tests. How can it possibly be that if I conduct one hypothesis test and get result A and you conduct tests A and B that our results for A mean different things? It's nonsensical. There isn't some probability fairy out there who watches over what tests you do and changes the likelihood of a result on that basis. The joint probability of A and B is of course lower than the individual probability of A or B. You lower the critical value alpha for the tests on A and B not because anything has changed about those individual tests, but because you want to reduce the likelihood that any of the results you present, all together, are in error. But again, your decision with A is no more or less likely to be in error if you publish it alone than if you publish it along with B. -- (talk) 18:43, 30 June 2009 (UTC)
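
The distinction drawn here between the error rate of one test and the error rate of a whole study can be put in numbers. The sketch assumes independent tests with every null hypothesis true; the function names are hypothetical, and the Bonferroni correction is used as one common way of choosing a per-test level.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(target_alpha, n_tests):
    """Per-test level that keeps the family-wise rate at or below target_alpha."""
    return target_alpha / n_tests

single = familywise_error_rate(0.05, 1)       # one test on its own
study = familywise_error_rate(0.05, 40)       # 40 tests, each still at 5%
corrected = familywise_error_rate(bonferroni_alpha(0.05, 40), 40)

print(f"one test at 5%:             {single:.3f}")
print(f"40 tests at 5% each:        {study:.3f}")
print(f"40 tests at Bonferroni 5%:  {corrected:.3f}")
```

Each individual test keeps its 5% error rate whether or not other tests are run alongside it. What grows with the number of tests is the chance that at least one of the reported results is a false positive.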

Neutrality Dispute: It's clear from the above discussion that there is some debate about the 'Meaning of the term "confidence"'. Also, I have some doubts about the neutrality of this particular section and whether it is encyclopedic in its current form. For example, the statement that 95% confidence intervals aren't "even very good evidence" is clearly not neutral. (Indeed, when does statistical evidence become 'good' anyway?) Secondly, it doesn't really clear up the meaning of confidence intervals. Rather, it is devoted in the main to discrediting the 95% confidence interval. Yet this itself misleads the reader into thinking that all we need to do is up the confidence level -- when in fact most misconceptions about confidence intervals are to do with its interpretation, not with what level is chosen. It would be enough to mention that there is nothing special about the 95% confidence intervals so frequently used in research.

As for interpreting multiple statistical tests as giving confidence to the study itself... Who does this? In my experience of scientific research (where confidence limits are rife) I've never come across this.

All in all, too many slights at us wretched laymen and non-statisticians, and not enough cited facts to speak for themselves. Unfortunately, I'm one of these poor, unfortunate 'laymen', and so I don't have the confidence to edit the section and make it factually accurate myself (you've got to give me credit for that pun, folks! ;-) ) Robnpov (talk) 09:27, 24 July 2009 (UTC)

The aim of this section is to point out the difference between the common usage meaning of the word 'confidence' and its meaning as a technical term. Specifically, it is intended to make the point that 95% confidence does not mean 'unlikely to be wrong', but rather pretty much the opposite. If you think it needs rephrasing, have a go at it.
If you are claiming that this section is not neutral, then you are saying that it is slanted towards some position that is in dispute. What is that position? If it is that statistical evidence is not very good evidence, well, you've made the same point above.
I'll give you a week. If you don't respond I'll remove the warning. (talk) 12:25, 26 July 2009 (UTC)
It's now been 2 weeks since you tagged the section and you have neither responded nor made any attempt to modify the section. I assume therefore that your tagging of the section is nothing more than vandalism and am removing the tag. (talk) 03:00, 8 August 2009 (UTC)
Well, I'm sorry for not replying to your points. I only edit Wikipedia occasionally, when I spot something and think it needs attention. If I have the time to research and do a proper job of it, I do try to edit the article rather than just tag it. But sometimes I forget to go back and follow up.
So, can I assure you that the tag was added in good faith. Please don't forget the wikipedia principle, WP:Assume good faith. Several things should have indicated good faith too: I tagged only the subsection rather than the whole article, I didn't blank the section, I didn't add nonsense/offensive material, and I explained the edit on the talk page. Check out WP:Vandalism before throwing that term around in future.
Anyway, to the article. I think my original post already answered your questions, though. Here's what is in dispute -- the meaning of the term confidence in a statistical context as presented. In addition, as I said, I personally dispute that 95% confidence intervals "aren't very good evidence". There should be no value-judgement here. Nor do I see how this is relevant to the meaning of confidence intervals.
This needs reliable, published sources about the meaning of confidence in statistics. The problems are a mixture of neutrality, tone and relevance, and I am sorry if NPOV was a bad choice of tag. But previous editors on this talk page seem to back up my assessment.
I note that you are the original contributor of this section. It is very useful that you have clarified what the main point of this subsection is: to distinguish between the everyday and statistical usage of the term 'confidence'. Thank you for this. I will edit the article to emphasise that point as soon as I get chance. Hopefully a few statisticians will weigh in and judge my efforts on their accuracy...
Robnpov (talk) 15:39, 9 September 2009 (UTC)

Relation to testing

In the section about the relation with statistical testing I read twice: 'second' parameter. What is the meaning of this 'second'? Nijdam (talk) 13:24, 8 June 2009 (UTC)

If you measure the average height of a group of people and generate a confidence interval on it, then measure the height of a second group of people, if the second height does not fall in the confidence interval then you can reject the hypothesis that the true average heights of the two groups are the same. With lots of caveats about distributions. And buried in waffle. (talk) 10:41, 9 June 2009 (UTC)
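Maybe you mean: if X is a random sample from [distribution not preserved] and Y from [distribution not preserved], independent of each other and with the same standard deviation, then:
[formula not preserved] is a CI for [parameter not preserved]
[formula not preserved] is a CI for [parameter not preserved]
[formula not preserved] is a CI for [parameter not preserved]
But the relation??? Nijdam (talk) —Preceding undated comment added 15:43, 9 June 2009 (UTC).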
No, you can't, in general. The width of a confidence interval (CI) for the difference between two averages is larger than the width of the CI for either average (unless one group has zero variance). -- Avenue (talk) 11:01, 9 June 2009 (UTC)
Hence the caveats and the waffle. (talk) 14:19, 9 June 2009 (UTC)
I think this section is wrong. It seems to say: Suppose f(.|q) is a density with parameter q, and x1,...,xn is a random sample from this distribution, with X=t(x1,...,xn) and also (a,b) is a P-confidence interval for q, then if y1,...,ym is a random sample from f(.|w), with Y=t'(y1,...,ym), then the complement of (a,b) is the critical region for testing H0: w=X on the basis of Y. If so it is nonsense, and it seems to show the error often made by people who have difficulty in understanding CI, to give the CI the role of acceptance region, which it is not. Nijdam (talk) 15:27, 9 June 2009 (UTC)
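
Avenue's point above, that the CI for a difference between two averages is wider than the CI for either average, follows from how standard errors combine for independent groups. The sketch uses normal-approximation intervals, and the standard deviations and group sizes are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def half_width(standard_error, level=0.95):
    """Half-width of a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * standard_error

# Hypothetical summaries for two independent groups (heights in cm).
sd_x, n_x = 7.0, 50
sd_y, n_y = 6.0, 40

se_x = sd_x / sqrt(n_x)
se_y = sd_y / sqrt(n_y)
se_diff = sqrt(se_x**2 + se_y**2)  # variances add for independent samples

hw_x = half_width(se_x)
hw_y = half_width(se_y)
hw_diff = half_width(se_diff)

print(f"half-width, mean of X:  {hw_x:.2f}")
print(f"half-width, mean of Y:  {hw_y:.2f}")
print(f"half-width, difference: {hw_diff:.2f}")
```

Checking whether the second group's average falls inside the first group's interval uses only `hw_x` as the yardstick, which is narrower than `hw_diff`, so under these assumptions that shortcut would reject equality more often than the nominal level allows.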

"Interval" vs "length of an interval"

The second sentence of the article "Instead of estimating the parameter by a single value, an interval likely to include the parameter is given." could be improved by saying "the length of an interval" instead of "an interval". (I agree that an example in the article does make it clear that a confidence interval is not a specific interval with specific numerical endpoints. But why not start out on the right track?) Or you might want to resort to more lengthy language that says it is a conceptual interval centered about the unknown true value of the parameter being estimated.

Tashiro (talk) 19:22, 3 July 2009 (UTC)

A confidence interval does indeed have specific endpoints; it's not just a length. Michael Hardy (talk) 17:37, 24 July 2009 (UTC)

Relation to hypothesis testing

The article says: one general purpose approach to constructing confidence intervals is to define a 100(1−α)% confidence interval to consist of all those values θ0 for which a test of the hypothesis θ=θ0 is not rejected at a significance level of 100α%.

At least for Binomial, this is not true: for example the 95% "significance interval" for p arising from Bin(n=50,p=.5) will contain p=.5 about 96.7% of the time. Is that only because Binomial is discrete? Or is something more subtle going on?

What's the most extreme example of the difference in performance between CIs and "significance intervals" that the authors of this article can come up with? It might be worth publishing in the main article.

Red Ed (talk) 20:43, 13 October 2009 (UTC)
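
The 96.7% figure can be reproduced under one reading of the question: take the equal-tailed exact test of p = 0.5 with n = 50 at the 5% level, and ask how often a Bin(50, 0.5) count lands in its acceptance region. That reading, and the function names below, are assumptions; the original comment does not spell out how the "significance interval" was built.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def acceptance_region(n, p0, alpha):
    """Counts not rejected by an equal-tailed exact test of p = p0.

    Each tail of the rejection region is grown while its probability
    stays at or below alpha / 2.
    """
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    lo, tail = 0, 0.0
    while tail + pmf[lo] <= alpha / 2:
        tail += pmf[lo]
        lo += 1
    hi, tail = n, 0.0
    while tail + pmf[hi] <= alpha / 2:
        tail += pmf[hi]
        hi -= 1
    return lo, hi

n, p0, alpha = 50, 0.5, 0.05
lo, hi = acceptance_region(n, p0, alpha)
coverage = sum(binom_pmf(k, n, p0) for k in range(lo, hi + 1))

print(f"acceptance region: {lo}..{hi}")
print(f"probability of not rejecting p = 0.5: {coverage:.4f}")
```

Because the count is discrete, each tail cannot be made exactly 2.5%, so the test is conservative and the resulting coverage exceeds 95%. This supports the discreteness explanation suggested in the question, at least for this example.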


Needs Attention, Rewrite

I have a technical background - programming, databases, etc. - but it's been a few years since I studied stat or calculus. Still, I couldn't understand the first sentence, and I made several tries. Maybe I just need more coffee, but I don't think this article would be any use to anyone but a student of mathematics. All I want is to calculate confidence intervals for a data set, but this article does not help me. If there's anyone out there who can speak math-geek and also English, please share! (talk) 21:08, 9 October 2009 (UTC)

I agree; the plural and singular words don't agree and it's confusing. I would try to edit the section, but there's no edit button since the grammar problem is in the beginning. (talk) 01:19, 11 October 2010 (UTC)
"No edit button": either (i) use the general edit tab that allows the whole page to be edited.... I think the actual position of this changes depending on defaults and options chosen, or (ii) there is a personal-preference option that produces an edit option for the lead section in the same way as for other sections, but you would need to register and sign on for that to work. The remark to which you are replying is a year old now and changes may have been made since then. But go ahead and make "improvements" if you wish. Melcombe (talk) 08:46, 11 October 2010 (UTC)

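For the practical request at the top of this section (calculating a confidence interval for a data set), here is a minimal sketch using only the Python standard library. It relies on the normal approximation, which is an assumption that suits larger samples; for small samples a Student's t quantile would normally replace the normal one. The data values and the function name `mean_confidence_interval` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    centre = mean(data)
    standard_error = stdev(data) / sqrt(n)     # sample sd with n - 1 divisor
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical measurements.
data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9, 5.2, 4.7]
low, high = mean_confidence_interval(data)
print(f"sample mean {mean(data):.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

The interval describes the procedure rather than this one data set: over many repeated samples, about 95% of intervals built this way would contain the true mean.
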
By Chance

When doing a test you calculate the probability of something happening and (one way or another) compare it to your significance level. A probability is a measure of chance. To add "by chance" is therefore pointless. The argument associated with a test is that the probability that you've calculated is not the real probability of the event because the calculated probability was too small. That is, the researcher has seen something happen whose probability is less than 5% and is arguing that its probability of happening is actually higher, or if you like that the underlying distribution is not the real distribution. The actual probability of the event happening is unknown. (If you knew it you'd know the underlying distribution, and so why would you be doing testing?) You could rephrase the sentence as "In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in 20 or less but that the researcher wants to argue actually occurs more often." But surely this is needless pedantry, given that the point of testing has been made repeatedly in the preceding articles. 13:51, 14 June 2010 (UTC) —Preceding unsigned comment added by (talk)

likely? often?

I recently changed the article from

Instead of estimating the parameter by a single value, an interval likely to include the parameter is given.

to

Instead of estimating the parameter by a single value, an interval that often includes the parameter is given.

(emphasis added in both cases). I don't like the word, "likely" because I think it generally is read as, "probably." But a CI does not probably contain the real value, that would require Bayesian statistics. In contrast, "often" or "frequently" or "usually" all suggest that it is the typical way of things, though it might not be true. I'd suggest any of those would be preferable to "likely." 018 (talk) 02:34, 24 September 2010 (UTC)

I do prefer "likely" to the other terms you mention. I think it comes closest to what it means. And BTW it is a kind of probability. Nijdam (talk) 11:14, 24 September 2010 (UTC)
What "is given" does not probably include the true parameter. The method produces intervals that probably include the parameter. My point is that I think this sentence lends itself to misinterpretation. In contrast, emphasizing repetition with "often," "usually," or "frequently" captures both the probability aspect and the reality that you just don't know once the numbers have been handed to you. 018 (talk) 15:22, 24 September 2010 (UTC)
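
The distinction between the method and any one interval can be illustrated by simulation: build many intervals from fresh samples and count how often they contain the true value. The sketch assumes normal data with a known standard deviation, so the z-interval has exact 95% coverage; the true mean, sample size, seed, and function name are all made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, trials=4000, seed=1):
    """Share of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = sum(sample) / n
        # Each individual interval either contains true_mean or it does not.
        if centre - half <= true_mean <= centre + half:
            hits += 1
    return hits / trials

observed = coverage()
print(f"share of intervals containing the true mean: {observed:.3f}")
```

The 95% is a long-run property of the procedure. Once a single interval has been computed, the simulation has nothing further to say about whether that particular interval is one of the hits.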

second paragraph in lead

Michael Hardy recently [1] replaced the second paragraph in the lead, stating, "This paragraph as written is wrong." The paragraph in question was

Because the endpoints of the confidence interval (the confidence limits) are random variables, the only claim that can be made about it is that the method of generating confidence intervals usually results in an interval containing the true value, not that any particular confidence interval (i.e. range of values) has some probability of containing the true value. To make a statement like that would require Bayesian statistics.

I'd appreciate knowing what exactly he objects to.

In addition, the paragraph was added

A somewhat delicate epistemological issue arises in connection with confidence intervals. Before one takes a sample consisting of one or more (usually more) data points, one may say of a 90% confidence interval that there is a 90% probability that it will contain the parameter of interest. Can one assert the same probability after one sees the data and thus knows where the endpoints of the confidence interval are? In some cases, the answer is clearly no. Specifically, if there is a known probability distribution for the parameter itself before the data are known, and the confidence interval falls in a region where the parameter is very unlikely to be, then one can say that one has probably observed one of the 10% of cases in which the 90% confidence interval fails to contain the parameter of interest. In such cases, one uses instead a 90% posterior probability interval, found by using Bayes' theorem. In cases where one has not assigned a probability distribution to the parameter, may one say, after knowing the endpoints of a 90% confidence interval, that there is a 90% chance that the interval contains the parameter of interest? One cannot say that if one construes "90% probability" as meaning 90% of the cases in which one takes a sample of one or more data points. But can one say that one should be 90% sure? That is not strictly a mathematical problem and is philosophically problematic.[1]

There are several problems with this. The first is that it is... not a paragon of clear communication. I think the lead should follow the principle of KISS and more complicated explanations can be saved for later. This article is not about Bayesian statistics, so I think it should really just be mentioned in the lead that Bayesian statistics is what would give you what you might suspect you wanted.

The second is that it is wrong. It reads, "Before one takes a sample consisting of one or more (usually more) data points, one may say of a 90% confidence interval that there is a 90% probability that it will contain the parameter of interest. Can one assert the same probability after one sees the data and thus knows where the endpoints of the confidence interval are? In some cases, the answer is clearly no." No. You can't go from Pr(B|A) to Pr(A|B) without Bayesian statistics. Bayes' theorem could or could not exist, priors could or could not exist, and it wouldn't matter. A frequentist cannot transform Pr(B|A) to Pr(A|B)--end of story. 018 (talk) 18:25, 11 October 2010 (UTC)

I agree completely. While the previous version isn't perfect, I think it is much better than its replacement, and we should revert back to the previous version for now. --Avenue (talk) 01:38, 12 October 2010 (UTC)
Given the above, I have moved the offending paragraph out of the lead, where it does not seem to belong and where a general reader is likely to be put off by the lead phrase "A somewhat delicate epistemological issue arises", as who knows what "epistemological" means. I have left it in the article, as at least it has a citation for what it is trying to say, something that is sadly lacking everywhere else. I think the "previous version" of the lead mentioned above had only the single paragraph in it. But this then leaves nothing being said about "importance". Melcombe (talk) 11:22, 13 October 2010 (UTC)
Melcombe, I'm also adding back the paragraph that Michael Hardy removed. While he claimed it was wrong, he never said why he thought this. 018 (talk) 14:15, 13 October 2010 (UTC)
By "previous version", I meant the version including that removed paragraph, so I support O18's action. O18 has also explained why the offending paragraph is wrong. It's not clear what part of the paragraph is supported by the source cited (perhaps just the last sentence or two?), and we should fix or remove the clearly erroneous parts. --Avenue (talk) 15:35, 13 October 2010 (UTC)
I have replaced the offending paragraph for now with something better that may still reflect what was trying to be said. There is a question of whether a contrast with Bayesian stuff is appropriate at this point, but I did try to bring in some context of general statistical inference rather than being just Bayesian/frequentist. I will try to adjust the lead a bit. Melcombe (talk) 17:20, 13 October 2010 (UTC)
Melcombe, what you wrote is still wrong. The implication of, "To make a statement like that would require Bayesian inference and a Bayesian interpretation of probability." suggests that Bayesian probability gives you Pr(A), not Pr(A|B). Bayesian inference only tells you the assumptions (and that is why it is okay to make probability statements) it isn't that Bayesian inference is a magic wand.
Here is an example. Imagine if we each have private information about the value of something. Then we get shared data. You might get one range and I might get another. Say we both constructed 90% credibility intervals--but they don't overlap, then can I say, "the probability that my credibility interval contains the true value is 90%, and the probability that your credibility interval contains the true value is 90%." No, obviously not. Or, consider a situation where I am given data and a good prior from someone and do an analysis and then by the time I give it to them the real answer has been revealed (like who will win a football game), then I would be wrong to say, "the probability that Peru wins is 70%" because they lost. This is similarly an issue with the statement, "A particular confidence interval either does or does not contain the true value and no probability can be attached." Again, the same problem exists with the credibility interval.
Finally, this article is about confidence intervals, NOT credibility intervals. So the lead should focus on confidence intervals, not credibility intervals. 018 (talk) 17:39, 13 October 2010 (UTC)
I also want to add that I think you did a very good job with the philosophy section. 018 (talk) 17:42, 13 October 2010 (UTC)
I don't see a problem there. The sentence "To make a statement like that would require Bayesian inference and a Bayesian interpretation of probability" does not say that Bayesian inference alone is enough; it also points out that a Bayesian probability interpretation is required. In your example, one can say "my subjective probability that my credibility interval contains the true value is 90%, and your subjective probability that your credibility interval contains the true value is 90%." Once you allow Bayesian interpretations, it may not make sense to expect people to agree on the probability of something, because this depends on what information they have available and give credence too. In the Peru soccer game example, the probability should be updated once more information comes to light (e.g. a report from a reliable source that Peru has won). But the statement "the probability that Peru wins is 70%" still makes sense if we interpret "the probability" in terms of the information available when that statement was made. If we break your last statement up into (a) "A particular confidence interval either does or does not contain the true value" and (b) "no probability can be attached", then (a) also applies to credibility intervals, but (b) does not. I agree Melcombe did a nice job. --Avenue (talk) 22:41, 13 October 2010 (UTC)
I think the statement, "you can't say that the probability that the true value is in the confidence interval is 95%; that would require Bayesian..." is very misleading. I think it implies that Bayesian statistics allows you to make statements about Pr(A). Neither Bayesian statistics nor frequentist statistics allows that. Bayesian statistics does allow you to make probability statements about posteriors, but not unconditionally. I'd be okay with something like, confidence intervals allow you to say "foo." An alternative is Bayesian statistics that allow you to say, "bar." But it certainly doesn't deserve much space in the lead since, again, this article isn't a compare and contrast, it is an article about confidence intervals. Now, if we wanted to merge the two... that would be different (not that I'm proposing it). 018 (talk) 23:03, 13 October 2010 (UTC)
Fair enough. I think we should retain some coverage of Bayesian alternatives here, especially since confidence intervals are often misinterpreted as credibility intervals, but I agree this doesn't need to go in the lead section. --Avenue (talk) 02:30, 14 October 2010 (UTC)

In reference to the recent edits by Benwing. I think the lead would make more sense if it made positive statements, "a CI is foo" rather than negative statements, "a CI is not bar." I also think that the implication in the new text remains that Bayesian statistics magically allows you to make statements about Pr(A) and not just Pr(A|B). 018 (talk) 15:57, 14 October 2010 (UTC)

I edited this paragraph to try and find a compromise between your version and my version. Although you may object to "negative statements", the statement in question is in fact the most important point being made by the paragraph and deserves to be stated first off rather than cloaked three or four sentences down. The fact that confidence intervals cannot be interpreted in the "obvious" way is something that every beginning statistics student eventually runs up against, and invariably gets confused by. As a result, we owe it to our readers to make this point clearly. (IMO the fact that confidence intervals have such a tortured interpretation is strong evidence that the frequentist paradigm is not the right way to look at things.) Benwing (talk) 09:17, 16 October 2010 (UTC)
Benwing, seeing what you have there, I think you are right. You did a great job of improving the second paragraph. Thanks, 018 (talk) 17:55, 16 October 2010 (UTC)
You're welcome! Benwing (talk) 10:44, 17 October 2010 (UTC)

two mentions of credibility intervals that are not well linked

Right now the article mentions credibility intervals in two spots. The first is in the definition section and the second is in the "Alternatives and critiques" section. I think the information in both sections is good, and that they do belong in their respective sections... but it would be nice to mention prediction intervals in both places and to have something that ties them together. Maybe it just isn't possible. 018 (talk) 15:07, 18 October 2010 (UTC)

Prediction intervals are presently dealt with in the subsection headed "Intervals for random outcomes". Melcombe (talk) 09:10, 19 October 2010 (UTC)
I'm not quite sure what you mean here ... also, since credible intervals are the main alternative to confidence intervals, I don't see a problem in mentioning them in multiple places in the article.
I'm not saying that Bayesian CIs don't deserve a section or two, only that it would be nice if the two could be more directly related or work together somehow. Maybe they could be closer? I'm not sure it is possible, just a thought. 018 (talk) 04:09, 19 October 2010 (UTC)
They are just references, not sections. You seem determined to expunge Bayesian references from this article. I don't really understand why. It's very important to clarify how confidence intervals and credible intervals differ. Benwing (talk) 05:06, 19 October 2010 (UTC)

A major point is that this article is about confidence intervals, not credibility intervals. A discussion/comparison of the two could reasonably go in the articles for either topic. The trouble is that too many articles then get swamped with ill-informed and citation-less discussion of Bayesian-vs-classical stuff, so much so that the basic ideas of each individual article then become confused and lost. While there might be a point in having a completely separate article for Bayesian-vs-classical arguments across the whole of statistics, such discussion has previously been removed from statistical inference which is where it should logically be if it goes in a more general article ... but for the time being it may be worth seeing if discussion for interval estimation should/would fit into interval estimation. However, on the topic of Bayesian citations here ... wikilinks to the credibility interval article should be sufficient here unless citations are to material that actually does do a reasonably good job of comparing confidence and credibility intervals. (No I haven't looked to see exactly what was added/removed.) Melcombe (talk) 09:10, 19 October 2010 (UTC)

"Philosophical issues" section is confusing

I added a "confusing-section" tag on this section because it seems very confusing to me ... reading through it, I have a hard time understanding what exactly the section is trying to say.

Perhaps more useful than this section as written would be a section addressing the reason why confidence intervals are expressed the way they are and why the interpretation ends up tricky. The way I see it, frequentist statistics asserts that probabilities are real, objective numbers, and you can only speak probabilistically about phenomena that are actually random in nature. Hence you can talk about the probability of a random process such as selecting a random voter to poll, or about any quantity that is in some way derived from a random process, but you can't talk about the probability of an inherently non-random quantity such as the percent of voters who will vote for a specific party (assuming that the voters have made up their minds and won't change) -- the percent of voters is a specific, non-random quantity, since theoretically you could ask every voter what their vote will be and determine the answer with certainty. Confidence intervals are basically the "best you can do" given these philosophical tenets. The problem is that this viewpoint is fundamentally in conflict with human reasoning about uncertainty -- humans have no problems making probabilistic assessments about non-random phenomena (e.g. "I'm 75% sure you are lying"). Humans don't see any fundamental difference in uncertainty stemming from actual randomness and uncertainty stemming from lack of knowledge, and reason probabilistically about both phenomena in the same way. Bayesian statistics takes a philosophical stance that is in accordance with this viewpoint. This is why the natural tendency of humans is to think about confidence intervals in a way that actually meshes with the way that Bayesian credible intervals work.

I think that something similar to what I've just expressed is what should go into the philosophical issues section. We should also certainly include a short discussion of what the problems with the Bayesian standpoint are.

This section should be short, as it's getting into larger and more basic issues of Bayesian vs. frequentist statistics (and should refer to an article discussing these larger issues for further info). Benwing (talk) 03:20, 19 October 2010 (UTC)

Benwing, I feel like I keep harping on this point... you appear to understand it in the above, but the "gotcha" of frequentist CIs is that you can't say Pr(A|...)=foo. The "gotcha" of Bayesian CIs is that you can't say, Pr(A)=bar. It is important to note that the probability of A is only subjective and conditions on some prior and data. 018 (talk) 04:16, 19 October 2010 (UTC)
You need to explain this further. In Bayesian statistics, you can talk about the probability of anything. The probability may depend on subjective assumptions, but it certainly exists. AFAIK the point of frequentist statistics was to try and construct "objective" probabilities, with the trade-off that you can't talk about certain sorts of probabilities. Bayesian statistics does not limit what you can view probabilistically. Benwing (talk) 05:04, 19 October 2010 (UTC)
Perhaps you should explain why you want to impose a description of Bayesian inference in an article that is not about Bayesian inference. Melcombe (talk) 09:13, 19 October 2010 (UTC)
Melcombe, I think you have chosen a good path for the article in your recent edits: mentioning the existence of the Credibility intervals as an alternative and linking to them. 018 (talk) 15:50, 20 October 2010 (UTC)

"then the interval [a, b] consist of exactly the values θ0 of θ, for which the null hypothesis: θ = θ0 is not rejected at significance level 1 − γ."

This phrase doesn't make sense to me. Could someone clarify what are θ0 and θ, and what is the null hypothesis? — Preceding unsigned comment added by (talkcontribs) 09:58, 18 November 2010

Does that explain it? 018 (talk) 16:35, 18 November 2010 (UTC)

I think we should start with a simple, "if the null is 0" and then add the note about all theta-0 in the CI... 018 (talk) 17:09, 19 November 2010 (UTC)
I have no idea what you mean. Better formulate here your suggestion. Nijdam (talk) 10:50, 20 November 2010 (UTC)


"In statistics, a confidence interval ... is an interval that frequently includes the parameter of interest, if the experiment is repeated."

So when you repeat the experiment you will find that you always get the same interval but that the parameter varies each time. At least that is what this is saying. (talk) 16:58, 4 December 2010 (UTC)



I do not understand why there are the two sections "CI as random intervals" and "CI for inference". What is the intended difference? Nijdam (talk) 23:01, 4 January 2011 (UTC)

The intended difference is given by their titles. The first points out that in the "probability of" equations, it is the end points of the intervals that are random, or that the interval itself is a random outcome deriving from the randomness in the original sampling process. The second indicates the way in which the probability associated with the random interval covering the true value can be used for inference. Melcombe (talk) 15:29, 5 January 2011 (UTC)
I still do not understand what you're aiming at. The first section also is meant to show what inference there is in a CI. Nijdam (talk) 21:33, 6 January 2011 (UTC)
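The "random interval" reading described above can be illustrated with a small simulation. This is only a sketch under assumed conditions (normally distributed data with a known standard deviation; the function name and all numeric values are hypothetical, not taken from the article): the true mean stays fixed, each repetition yields a different interval, and about 95% of those intervals cover the fixed value.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, trials=2000, seed=0):
    """Fraction of repeated experiments whose interval covers the fixed true mean."""
    rng = random.Random(seed)
    # Two-sided normal quantile, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5  # sigma is assumed known here
    hits = 0
    for _ in range(trials):
        # A fresh sample gives a fresh (random) interval centre.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        # The parameter never moves; only the end points do.
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"observed coverage: {coverage():.3f}")
```

The coverage figure is a property of the procedure over many repetitions; it says nothing by itself about any one realised interval.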


"For example, a confidence interval can be used to describe how reliable survey results are. In a poll of election voting-intentions, the result might be that 40% of respondents intend to vote for a certain party. A 90% confidence interval for the proportion in the whole population having the same intention on the survey date might be 38% to 42%. From the same data one may calculate a 95% confidence interval, which might in this case be 36% to 44%."


"one general purpose approach to constructing confidence intervals is to define a 100(1 − α)% confidence interval to consist of all those values θ0 for which a test of the hypothesis θ = θ0 is not rejected at a significance level of 100α%."

Surely these two statements are contradictory? I'm not enough of an expert to know which one is wrong though... (talk) 03:07, 25 January 2011 (UTC)

No, I'm just confused... —Preceding unsigned comment added by (talk) 03:12, 25 January 2011 (UTC)

There's nothing contradictory there to my eyes. --Avenue (talk) 05:04, 25 January 2011 (UTC)
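For what it is worth, the test-inversion construction quoted above can be sketched numerically. The sketch assumes a two-sided z-test for a normal mean with known standard deviation; the function names, the grid step and the example numbers are all hypothetical. It collects every θ0 on a grid that is not rejected at level α and compares the result with the usual closed-form interval.

```python
from statistics import NormalDist


def ci_by_test_inversion(xbar, sigma, n, alpha=0.05, step=0.001):
    """Collect all theta0 (on a grid) that a two-sided z-test does not reject."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5
    k = int(5 * se / step) + 1  # scan five standard errors either side of xbar
    accepted = []
    for i in range(-k, k + 1):
        theta0 = xbar + i * step
        # Not rejected when the z-statistic is inside the critical region bounds.
        if abs(xbar - theta0) / se <= z_crit:
            accepted.append(theta0)
    return min(accepted), max(accepted)


def ci_closed_form(xbar, sigma, n, alpha=0.05):
    """Textbook interval: xbar plus or minus z * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5
    return xbar - z_crit * se, xbar + z_crit * se


if __name__ == "__main__":
    print(ci_by_test_inversion(40.0, 10.0, 100))
    print(ci_closed_form(40.0, 10.0, 100))
```

Up to the grid resolution the two intervals agree, which is the sense in which a 100(1 − α)% interval and a test at significance level 100α% describe the same set of values.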

Confused use of term significance level

Five lines down it says: "The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value:"

Then right below Use In Practice, it says: "The significance level is usually denoted by the Greek symbol α (lowercase alpha). Popular levels of significance are 10% (0.1), 5% (0.05), 1% (0.01) and 0.1% (0.001). If a test of significance gives a p-value lower than the α-level, the null hypothesis is thus rejected"

Why does it call both the p-value and α significance level? (talk) 16:02, 16 May 2011 (UTC)

Because α is the pre-set value that an observed p-value must be below for a test to be significant. Hence α = critical p-value as implied (note the "critical"), but α and "p-value" are not the same. Melcombe (talk) 16:11, 18 May 2011 (UTC)

Is this page unnecessarily confusing?

The current page focuses on the "true value" of a parameter. What is the meaning of that in the normal case of a "normal" distribution where values are distributed. For example, the boiling point of water under particular conditions has a "true value" and measurements can be used to narrow in on that, with variation due to error. But what is the "true value" of the height of Americans? Obviously there is no such thing, what can be estimated is the mean value, and the amount of variation through parameters like standard deviation and confidence intervals. For example, we might find that the mean value is 5'6" and an X% confidence interval might be 5' to 6'.

Talking about the true value produces confusion with the standard error of the mean as, in the case of a distribution with a "true value" the true value and the mean should be pretty much the same given large sampling. But for something like height, confidence intervals tell us about the spread of the data. The Standard Error of the Mean gives us confidence intervals on the mean but those are different than confidence intervals (by a factor of the square root of sample size). This page produces confusion on this point.

Basically, a confidence interval is the mean plus or minus a multiple of the SD. This means that it DOES tell you, for a normal distribution, that X% of values are within X% confidence intervals. In other words, if sampling shows that the 95% confidence intervals for American male height is 5'6" +/- six inches we can interpret that to mean that approximately 95% of Americans will have heights between 5' and 6'. Obviously this is an approximation, just like everything in the statistical realm, as only measurement of the entire population would provide absolute certainty.

Let me rephrase myself. If you sample a population and get mean M and standard deviation S and then construct from that a perfect (mathematical) normal distribution with these parameters we could say that 68% of measurements are between M-S and M+S. Given that M and S are estimates of the true mean and true standard deviation for a population then it seems obvious that approximately 68% of measurements are estimated to be between M-S and M+S. Why would this simple and useful interpretation be denied? I understand that there are many assumptions (the sample was sufficiently large, the sampling was sufficiently random, the underlying population is normally distributed and so on) but everything in real statistics is based on assumptions like that. — Preceding unsigned comment added by DavidRCrowe (talkcontribs) 16:59, 2 August 2011 (UTC)

I think you might want to talk about your theories at the mathematics reference desk. If you got this information from this page, then we do have expositional issues and you should come back here and we can edit the part of the page that made you believe this is what a confidence interval is. 018 (talk) 17:43, 4 August 2011 (UTC)

I think I have a better understanding of the problem with this page now and that it mixes up two distinct concepts. "Confidence interval" is a generic concept. For example, the Oxford English Dictionary defines it as, "a range of values so defined that there is a specific probability that the value of a parameter of a population lies within it". This could be any parameter of the population, not just mean. The other concept, is of course, "Confidence interval of the mean", which is the range of values with a specific probability of the mean lying within.

The opening sentence defines quite well the generic term, "In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate."

The problem occurs when the term "confidence interval" is misused to mean "confidence interval of the mean", as in the "Theoretical Example" section, which uses the equation for confidence interval of the mean. Note that the confidence interval of the population is a similar equation without the division by the square root of the sample size (i.e. just the mean plus or minus the SD multiplied by a constant depending on the probability desired (1.96 for 95% for example)).

Either the discussion of confidence interval of the mean should be deleted or it should be clarified to indicate that this is just one type of confidence interval.

DavidRCrowe (talk) 03:54, 14 August 2011 (UTC)
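The two formulas being contrasted in this thread can be put side by side. This is a rough sketch only, assuming simulated normal "heights" in inches (mean 66, SD 3, n = 400, all hypothetical) and the large-sample multiplier 1.96 rather than a t quantile. The first interval, mean ± 1.96·SD, describes the spread of individual values; the second, mean ± 1.96·SD/√n, is the confidence interval for the mean.

```python
import random
from statistics import mean, stdev


def two_intervals(sample, z=1.96):
    """Return (spread interval for individuals, confidence interval for the mean)."""
    m = mean(sample)
    s = stdev(sample)
    n = len(sample)
    spread = (m - z * s, m + z * s)                        # covers roughly 95% of individuals
    ci_mean = (m - z * s / n ** 0.5, m + z * s / n ** 0.5)  # uncertainty about the mean only
    return spread, ci_mean


def fraction_inside(sample, interval):
    """Share of the sample values that fall inside the given interval."""
    lo, hi = interval
    return sum(lo <= x <= hi for x in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(1)
    heights = [rng.gauss(66.0, 3.0) for _ in range(400)]  # hypothetical data
    spread, ci_mean = two_intervals(heights)
    print("spread interval:", spread, fraction_inside(heights, spread))
    print("CI for the mean:", ci_mean, fraction_inside(heights, ci_mean))
```

Only a small fraction of individual values falls inside the confidence interval for the mean, so reading it as "95% of the population lies in this range" does not hold.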

Is this clearer and correct? "A Confidence Interval is a claim that, if a population parameter measured across the full population falls within a specified interval, then that parameter measured on any single, perfectly random sample will also fall within that range.

For example, a political poll that places a candidate's approval rating between 36%-44% with 95% confidence is claiming that, if the actual percentage of approving voters falls within the 36%-44% interval and a single random sample of voters is selected, there is a 95% chance that the percentage of approving votes within the sample would also fall within the 36%-44% interval.

This claim does not directly address the chance that the approval rating is in the 36%-44% interval to begin with, or the chance that the claim is correct. It also does not claim to address samples that are not perfectly random, or one sample chosen from multiple samples." (talk) 17:36, 16 December 2011 (UTC)

unfortunately, no. There is even text about this in the article under, "Philosophical issues". 018 (talk) 21:48, 16 December 2011 (UTC)

Amazing writing.

"Mathematics can take over once the basic principles of an approach to inference have been established, but it has only a limited role in saying why one approach should be preferred to another."

What an "amazing" sentence this is. So precise and yet so deep... Who would have guessed such masters of literature and philosophy would come on our humble Wikipedia. Anonywiki (talk) 08:29, 24 November 2011 (UTC)


What more should be in this article

Following on from "Daqu"'s last comment from above, perhaps it is time to move on to what more needs to be in this article. I would not want any more on "explanations" and "interpretations" as this is not a text book. Topics that I think should be in, but are not yet included are:

  • one-sided intervals
  • equal-tail-area ways of determining limits
  • derivation via inversion of significance tests
  • confidence intervals derived from non-parametric tests
  • overlap of confidence intervals (to extend the limited content in existing figure)

In addition, I think the material in the subsection headed "Meaning of the term confidence" should be removed, possibly into a separate article ... as, given some other articles, there seems a need to discuss the layman's question of "how much confidence is there in that test" in terms of statistical power rather than being led through to "confidence intervals". So possibly an article "confidence in statistical results" would be justified.

This is an interesting discussion. I hope the following comments are helpful.
In considering "confidence in statistical results" it is important to distinguish between the concepts of probability and uncertainty. In some cases, but not always, uncertainty is the same as probability. Fisher, R. A. (Statistical Methods and Scientific Inference, 1956, p. 9) writes
"The prime difficulty lay in the uncertainty of such inferences, and it was a fortunate coincidence that the recognition of the concept of probability ... provided a possible means by which such uncertainty could be specified and made explicit."
and later on page 37 he says that
"... the concept of Mathematical Probability affords a means, in some cases, of expressing inferences from observational data, involving a degree of uncertainty, and of expressing them rigorously, in that the nature and degree of the uncertainty is specified with exactitude, yet it is by no means axiomatic that the appropriate inferences, though in all cases involving uncertainty, should always be rigorously expressible in terms of this same concept. ... in the vast majority of cases the work is completed without any statement of mathematical probability being made about the hypothesis under consideration. The simple rejection of a hypothesis, at an assigned level of significance ..."
In Bayesian inference, measures of uncertainty have a probability interpretation, that is, they have the properties of "Mathematical Probability" (with a collection of events, etc.). In the case of confidence intervals and significance tests, the measures of uncertainty, that is, confidence levels and observed significance levels, have been calculated from probability distributions, but as post-data concepts they do not have a probability interpretation. What would be the collection of events in these cases? In the case of an observed confidence interval, surely not the interval and its complement. And in the case of a significance test, surely not the hypothesis and its complement. In the case of a confidence interval, it is meaningless to assert that the complement of the observed interval has confidence level alpha, and in the same way, in the case of a significance test, it is not meaningful to assert that the significance of the complement of the hypothesis is one minus the observed significance level.
If confidence levels and significance levels had the interpretation of probabilities, there would be no need to use the words "confidence level" and "significance level". Because they are not probabilities of some events, but instead measures of uncertainty of certain inferential statements, these phrases are necessary to make the difference clear. --Esauusi (talk) 00:15, 16 August 2008 (UTC)

Any constructive thoughts about what should be in the article, as opposed to teaching each other statistics?

Melcombe (talk) 08:41, 5 June 2008 (UTC)

As mooted above, I have moved the subsection mentioned into a new article Confidence in statistical conclusions. Melcombe (talk) 11:07, 24 June 2008 (UTC)

If, as a layman, I type in the term Confidence Level, I want to see some discussion telling me what this actually means in terms of what real confidence I can have in the result being offered. You have removed this from the article and placed it where no layman looking for it can find it. I would suggest redirecting the term confidence level to significance level, which is more appropriate. In the mean time, I'm putting the information back here. (talk) 10:22, 29 June 2008 (UTC)

I concur with unnamed user. I love how readable and approachable this article is. Pure wikipedia style art. User:scottmas — Preceding unsigned comment added by (talk) 04:26, 14 April 2012 (UTC)

Confidence example incorrect?

I think this is the wrong way round -

"For example, a confidence interval can be used to describe how reliable survey results are. In a poll of election voting-intentions, the result might be that 40% of respondents intend to vote for a certain party. A 90% confidence interval for the proportion in the whole population having the same intention on the survey date might be 38% to 42%. From the same data one may calculate a 95% confidence interval, which might in this case be 36% to 44%."

I believe confidence of 90% would give a range of 36-44% and 95% ranging 38-42 - and not as said in the above section — Preceding unsigned comment added by (talk) 09:29, 3 May 2012 (UTC)

It's correct as it is. A 95% confidence interval will be wider than a 90% confidence interval. Qwfp (talk) 10:19, 3 May 2012 (UTC)

That isn't logical:

  - 90% of 40 is 36
  - 95% of 40 is 38

A 'wider' confidence interval would mean you were less sure of the accuracy of the data. 90% is less certain than 99%.

Look at it this way: I show you a coin and give you two intervals for the probability p of landing heads when thrown: A = [0.25, 0.75] and B = [0.45, 0.55]. Which of A and B do you think has the higher confidence of containing p? Nijdam (talk) 09:02, 6 May 2012 (UTC)

A 90% confidence interval does not mean calculate 90% of the number and add/subtract it. The fact that the numbers chosen for this example display that property is a coincidence. (talk) 08:15, 13 May 2012 (UTC)
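To make the point concrete, here is a small sketch of how such intervals are usually computed for a poll. It assumes the normal-approximation (Wald) interval for a proportion and a hypothetical sample size of 1000 respondents; the article's 38-42% and 36-44% figures are illustrative and are not reproduced exactly by these assumptions.

```python
from statistics import NormalDist


def proportion_ci(p_hat, n, level):
    """Normal-approximation (Wald) interval for a population proportion."""
    # Two-sided quantile: about 1.645 for 90%, about 1.960 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    for level in (0.90, 0.95):
        lo, hi = proportion_ci(0.40, 1000, level)  # n = 1000 is an assumption
        print(f"{level:.0%} CI: {lo:.3f} to {hi:.3f}")
```

The multiplier grows with the confidence level, so from the same data the 95% interval always contains the 90% interval. The level is not a percentage of the estimate itself.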

Correct to say that z in example is standard score?

It seems that many external sites treat z (as seen in Confidence interval#Practical example) as a standard score. However, the Standard score#Calculation from raw score section uses the standard deviation (σ) as the denominator, and not the standard error (σ/√n) as seen in this practical example. Is it still correct to link the z in this article to standard score? Mikael Häggström (talk) 05:42, 6 May 2012 (UTC)

Here lies the problem of just thinking in terms of formulas. The z-score is always related to some variable. It would be best to speak of the z-score of some X. Then:
$z = \frac{X - \operatorname{E}(X)}{\sigma_X}$
Hence it depends on X what has to be used as $\sigma_X$. In the practical example "X" is the sample mean, hence:
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Nijdam (talk) 09:13, 6 May 2012 (UTC)

Thanks! It's less confusing for me now. I've made a section about the relation at Standard_score#Confidence_intervals. I hope it can be looked over, as there may be errors, or perhaps it can be explained in a better way. Mikael Häggström (talk) 13:18, 6 May 2012 (UTC)

Note: not every z is a z-score. Nijdam (talk) 09:53, 7 May 2012 (UTC)

Ok... so what about the z in Prediction interval#Known mean, known variance? Is that really a standard score? Mikael Häggström (talk) 15:33, 7 May 2012 (UTC)

For a mathematician like me it sounds odd when you say "the z in a prediction interval". I guess you refer to the formula in the article, about a known distribution. In the referred case the distribution is a normal distribution and the z is a quantile, no z-score. Nijdam (talk) 21:39, 7 May 2012 (UTC)

I was referring to the Known-mean-known-variance-section as linked, but anyhow you've found the formula that I was thinking of. However, these are obviously issues that affect several statistics-articles, so I think we should continue this discussion at: Wikipedia talk:WikiProject Statistics#Standard score. Mikael Häggström (talk) 10:38, 8 May 2012 (UTC)


The section about the meaning of the term 'confident' is not correct

There is a difference in meaning between the common usage of the word "confidence" and its statistical usage, which is often confusing to the layman. In statistics, the word "confidence" is a technical term used to indicate how rare an outcome has to be before the writer will accept it as significant.

>>It seems here is confusion about the terms 'confidence' and 'level of significance'.

In general usage the word "confidence" indicates how likely the writer thinks it is that a statement is true. "I am 95% confident that Martians exist" means the speaker has little doubt that they do. In statistics, "significant" and "true" are not synonyms.

For example, suppose a tester reports after conducting a survey that she has found that the rate of lung cancer amongst consumers of peanut butter is significant at the 95% confidence level.

>>I very much doubt a serious tester will ever report this.

What this means is that the particular outcome produced by the study is one that happens by chance alone less than 5% of the time.

>>Most outcomes happen less than 5% of the time. For instance the outcome 1 in a N(0,1)-distributed population happens never. Likewise any other outcome

As the number of things that cause lung cancer is (presumably) actually very small compared to the number of things that can be tested, the likelihood that there is any real connection between the two is also very small.

>>I'm lost here.

If, for example, the probability that something chosen at random actually causes lung cancer is one in a million, the probability that her result means that peanut butter causes lung cancer is not 95%, but rather 0.002%.

>>And here! Nijdam (talk) 11:34, 13 May 2012 (UTC)

1. Confidence level is 100% minus significance level. The two terms are effectively equivalent names for the same thing. Quoting the confidence level indicates the level of significance the test was carried out at.

>>I've never heard of this. Confidence is used in the parameter space, level of significance in the sample space.

>>> From Modern Elementary Statistics (3e), John Freund, New Jersey:1952, p247: "The probability of committing a Type I error is generally referred to as the level of significance at which the test is conducted, and is denoted by the Greek letter alpha." ibid, p224: "An interval like this is called a confidence interval, its endpoints are called confidence limits, and the probability 1 - alpha [...] is called the degree of confidence." I was taught to call this the confidence level. Confidence level 95% means significance level 5%.

>>>>Proving what?? Nijdam (talk) 12:57, 17 May 2012 (UTC)

2. The point being made here is that "confidence" is not an indication of how likely a positive result is to be real, but how likely a chance result is to be positive (or negative if you prefer).

>>I don't know what point you make, you apparently do not see that it is nonsense to report: the rate of lung cancer amongst consumers of peanut butter is significant at the 95% confidence level .

>>> You know that feeling of despair you get when you tell someone that a third party has a few kangaroos loose in his top paddock, and they comment that they didn't know the guy owned any kangaroos? An example is not normally nor necessarily a statement of fact.

>>>>You still don't get it, but 'the rate of lung cancer' never can be significant, let alone at the '95% confidence level'. Nijdam (talk) 12:57, 17 May 2012 (UTC)

3. When one does a statistical test, either the result is chance (probability N) or it isn't (probability A, N+A=1). If chance, the probability of getting a positive result is the significance level (S), so the probability of getting a positive result by chance is NS. If it is not chance, the probability is similarly AP, for some value P. Thus the formula for the probability of a positive result being a real positive as opposed to a false positive is AP/(NS+AP). Substitute 5% for S, 100% for P (the largest possible value), and 1 in a million for A, and the result is 0.002%. This is all simple algebra.

>>Clearly you're not an expert in statistics. If in a z-test with null hypothesis μ=0 the observed z-score is 1, the probability to observe this is 0, yet it is not significant.

>>> For any probability density function (that is a probability function based on real numbers) the probability of any event is 0. Therefore it is meaningless to talk about the probability of particular events. Therefore the term "probability" is normally interpreted as meaning the integral of the pdf from the specified point to whichever infinity is appropriate. If the significance level of a test is 5%, it means that if you calculate the point at which the integral of the pdf from infinity to that point is 5%, events within the integrated region will be considered significant and those outside it will not.

>>>>I know my statistics, but maybe only some self appointed experts use such terms. You probably refer to the p-value. Nijdam (talk) 12:57, 17 May 2012 (UTC)

4. "Significant" means that an outcome is rare. "True" means that a statement is correct. "I got a positive result so that means its not chance" may well be a false statement, even though the result is significant.

>>If 'rare' is correctly interpreted. The rest I don't understand. Nijdam (talk) 22:16, 15 May 2012 (UTC)

>>> A type I error is a significant result, because it meets the standard of rarity set by the significance level. But it does not indicate a real connection between the things tested. A statement that there is such a connection must therefore be false. (talk) 14:30, 16 May 2012 (UTC)

>>>>I would not formulate: 'A type I error is a significant result', but I understand what you mean. I just don't get what you want to express in the context of the discussion. Nijdam (talk) 12:57, 17 May 2012 (UTC) (talk) 16:15, 14 May 2012 (UTC)

Okay, let's try again. When you do a statistical test, either you get a positive result or you get a negative one. When you do a statistical test, either the result is due to chance alone or it isn't. If you do a statistical test and get a positive result, either the result is due to chance (a false positive) or it isn't (a real positive). If you know the significance level of the test, the probability of the result being just chance, and you assume that you cannot get a false negative, then you can estimate the probability of a positive result being a real positive. Which I have done. (And before you jump in, yes I mean maximum probability -- the assumptions bias the calculation that way.)

"Overall, nonsmokers exposed to environmental smoke had a relative risk of coronary heart disease of 1.25 (95 percent confidence interval, 1.17 to 1.32) as compared with nonsmokers not exposed to smoke." The word confidence in this sentence means that the level of sigificance at which the test was conducted was 5%, that is that if the null hypothesis is true and there is no connection between environmental smoke and coronary heart disease then intervals that do not contain the statistic will occur by chance in about 5% of such surveys.

However, the layman on seeing the word "confidence" will assume that it bears its common usage meaning, ie that the writer is saying that the chance of the result being right is 95%, ie that the probability of this positive result being a real result and not a false positive is 95%. But that is not what the word means in this context. In fact the probability of the result being a real result is highly dependent on what proportion of testable things actually cause lung cancer, and so is a hell of a lot less than 95%.

Now do you understand the point of the section? (talk) 14:34, 17 May 2012 (UTC)

No, I don't. You're permanently confusing hypothesis testing and confidence intervals. There is an equivalence between them, in the sense that values of the parameter to be tested within the confidence interval would not have been rejected at a level of significance of 1 - confidence level. This, I presume, is what you're talking about. However this does not mean that level of significance is the same as 1 - confidence level. Nijdam (talk) 22:58, 17 May 2012 (UTC)
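The equivalence mentioned here can be illustrated with a small sketch. It assumes the simplest setting, a single observation x from N(μ,1) and a two-sided z-test; the function names and the grid of candidate values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()


def in_interval(x, mu0, level):
    """True if mu0 lies inside the realised interval [x - z, x + z]."""
    z = std_normal.inv_cdf(0.5 + level / 2.0)
    return x - z <= mu0 <= x + z


def not_rejected(x, mu0, alpha):
    """True if a two-sided z-test of H0: mu = mu0 does not reject at level alpha."""
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(x - mu0)))
    return p_value >= alpha


x = 2.0        # assumed single observation
level = 0.95   # confidence level
candidates = [i / 10 for i in range(-20, 61)]  # mu0 from -2.0 to 6.0

# Each candidate is inside the interval exactly when the test keeps it.
agree = all(
    in_interval(x, mu0, level) == not_rejected(x, mu0, 1.0 - level)
    for mu0 in candidates
)
print(agree)
```

In this setting the set of parameter values inside the interval coincides with the set of null values the test would not reject at a level of 1 minus the confidence level.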

"Overall, nonsmokers exposed to environmental smoke had a relative risk of coronary heart disease of 1.25 (95 percent confidence interval, 1.17 to 1.32) as compared with nonsmokers not exposed to smoke." If you were a layman, who had no familiarity with mathematical concepts, what would you assume the words "95 percent confidence" meant? (talk) 06:01, 18 May 2012 (UTC)

Being a layman I probably would not understand this at all, and for good reasons, because me too, being an expert, do not understand what the meaning is of a confidence interval compared with something else. Nijdam (talk) 08:48, 18 May 2012 (UTC)

The quote is taken from a scientific paper published in the New England Journal of Medicine, and is referenced from quite a large number of web sites, so I assume it is seen as an important paper. Bear with me here as I am trying to explain something. Would it be fair to suggest that a layman might take it to mean that the real relative risk must have a 95% chance of falling inside the interval, or that the speaker is 95% sure that the two things are linked, or that the claim that exposure to environmental smoke causes coronary heart disease is 95% likely to be right, or some such? (talk) 14:03, 18 May 2012 (UTC)

Doesn't impress me. My point is: what is it you actually want to explain? What's written now is definitely insufficient. Nijdam (talk) 20:11, 18 May 2012 (UTC)

If I were to make a list of 1000 different condiments, jams, nut butters, pickles, chutneys, and so on, how many of those condiments do you think are likely to actually cause lung cancer? Or are you unable to estimate? (talk) —Preceding undated comment added 22:14, 18 May 2012 (UTC).

I don't care about the lung cancer example. I want to know what you want to say about the term 'confidence'. Nijdam (talk) 13:16, 19 May 2012 (UTC)

I have repeatedly tried to explain the point I am making to you. Rather than try to understand what I am saying, you have repeatedly chosen to gainsay me. I can only conclude that either you are incapable of understanding what I am saying, perhaps because you lack any real mathematical knowledge or ability, or more likely that you are being perverse. Either way there is no point in me continuing to waste my time responding to you. So, either answer the questions I have posted here or this conversation is at an end. (talk) 16:21, 19 May 2012 (UTC)

Well, well. I hope you have learned in the mean time that "level of confidence" is NOT complementary to "level of significance". Nijdam (talk) 07:37, 20 May 2012 (UTC)

No, couldn't walk away from it. If nothing else this is helping me find ways of expressing myself more clearly. Let's try framing it as set theory.

You want to generate a confidence interval. That means that you have a statistic that you want to generate the interval for.

>>Rubbish, a confidence interval is generated for a parameter of a distribution.

The statistic is one possible outcome of an event.

>>A statistic is a function of the sample, that 'summarises' the sample. Outcome and event are specific terms in probability theory.

Call the set of all possible outcomes that the event could have produced A.

>>Perhaps you mean the sample space??

If you were to generate a confidence interval for each outcome in A then some of those intervals would contain the parameter and some would not.

>>Let's not speak of 'for each outcome in A', as I don't know exactly what you mean. Probably you mean all possible samples.

Partition A into two subsets, C and S, such that C contains all the possible outcomes for which the interval would contain the parameter and S contains all the possible outcomes for which it would not.

>>?? The endpoints U and V of the confidence interval are statistics, i.e. functions of the elements of the sample space. For every value of the parameter it is possible to indicate the intervals that contain this value. But then what? There is thus in a simple case a collection of C(parameter)'s giving rise to an interval that contains the parameter. As an example for the parameter μ of a N(μ,1)-distribution, the sample mean M is a sufficient statistic. A confidence interval will be [M-d,M+d], with realisation [m-d,m+d]. For μ=0 the intervals based on m=-d to m=d will contain 0. For μ=1 similar, etc. So what??
>>From here I have great difficulty to make your ideas understandable. Nijdam (talk) 19:43, 20 May 2012 (UTC)

Each outcome has a probability of occurring. When you generate a 95% confidence interval, or mention a confidence level of 95% or a degree of confidence of 95%, what you mean is that the sum of the probabilities of the outcomes that form C is 95%. When you perform a significance test and find an outcome significant at p=0.05, what you mean is that it is a member of the subset S, and the 0.05 or 5% refers to the fact that the sum of all the probabilities of outcomes in S is 5%. The whole point of specifying significance or confidence is to provide the reader with information about how the set of outcomes was partitioned. (talk) 13:57, 20 May 2012 (UTC)

Yep, that's what I expected.

30 years ago I did my Maths degree. Our Faculty of Mathematics had two major areas of research, statistics and computing, and so even though I wasn't much interested in it at the time I had to learn the bloody stuff. Being a Mathematics degree, I learnt it from the ground up: first set theory, then probability theory, then statistics. That is because in Mathematics you can't use any bit of maths unless you can demonstrate it from first principles. Recently a friend of mine did a chiropractic degree. As part of his degree he had to do a course in statistics, and I managed to get a copy of his course notes. And guess what. No set theory. Virtually no probability. Instead, he was taught a bunch of terms, and then taught the formulas based on them and how he should use those formulas. As a result he knows how to do things, but he doesn't understand what he is doing, and can't take on board ideas that conflict with what he has been taught.

I see the same thing in you. You can't understand simple set theory, or simple algebra. You can't understand a term unless I use the precise word or phrase that you have been taught, and even then you can't conceptualise beyond the bounds of your teaching. I would guess that you are a psychologist, or epidemiologist, or doctor, or something similar. You are what I would call a layman, albeit with a false sense of competence based on insufficient learning. If you want to understand what I am saying you need to first admit that you are improperly trained and then find someone to teach you the mathematics upon which statistics is based. (talk) 22:11, 20 May 2012 (UTC)

From Understanding Statistics (2e), Mendenhall & Ott, Duxbury Press, Massachusetts:1976.

"Constructing an interval estimate is like attempting to rope an immobile steer. In this case the parameter that you wish to estimate corresponds to the steer and the interval corresponds to the loop formed by the cowboy's lariat. Each time you draw a sample, you construct a confidence interval for a parameter and you hope to "rope it," that is, enclose it in the interval. You will not be successful for every sample. The probability that an interval will enclose the parameter is the confidence coefficient.

"To consider a practical example, suppose that you wish to estimate the mean number of bacteria per cubic centimeter of water in a polluted stream. We draw 10 samples, each containing n = 20 observations. Then we construct a confidence interval for the population mean µ from each sample. The 10 confidence intervals might appear as shown in Figure 8.8. The horizontal line segments represent the 10 intervals and the vertical line represents the location of the population mean number of bacteria per cubic centimeter. Note that the parameter µ is fixed and that the interval location and width vary from sample to sample. Thus we speak of the "probability that the interval encloses µ," not "the probability that µ falls in the interval," because µ is fixed. Having grasped the concept of a confidence interval, let us now consider how to find the confidence interval for a population mean µ based on a random sample of n measurements.

"The confidence interval for a population mean µ may be obtained [...] In fact, if n is large and the probability distribution of y is approximately normal, we would expect approximately 95 percent of the intervals obtained in repeated sampling to enclose the unknown population mean µ.

"Note that we will only calculate one such interval for a given problem, and it either will or will not cover µ. Since we know that approximately 95 percent of the intervals in repeated sampling will enclose the mean µ, we say we are 95 percent confident that the interval we calculate does enclose µ."

There is one parameter. There are multiple statistics that could represent that parameter, one for each sample you could take from the population. There is one confidence interval for each statistic. It is generated from the data that generated the statistic. Because there are many possible statistics, there are many possible confidence intervals. "The probability that an interval will enclose the parameter is the confidence coefficient." If you divide all the possible samples into two groups, those that produce an interval that encloses µ and those that do not, and sum the probabilities of those that do, the total will be 95%, and that is how the confidence coefficient is defined.

If you roll a dice there is one chance in six that it will roll a six. If you generate a 95% confidence interval from a sample, there is one chance in twenty that it will not enclose the parameter. Do you understand this, or do I need to dig out more books? (talk) 10:52, 22 May 2012 (UTC)
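The repeated-sampling reading in the Mendenhall & Ott passage can be sketched as a simulation. The population values below (mean 10, standard deviation 2, normal distribution, known standard deviation) are assumptions chosen for illustration and do not come from the book; only n = 20 and the 95% level follow the quoted text.

```python
import random
from statistics import NormalDist


def coverage(mu=10.0, sigma=2.0, n=20, level=0.95, trials=20000, seed=1):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that enclose mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of n observations and form its interval.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(rate)  # expected to be close to 0.95
```

The parameter mu stays fixed throughout; only the interval moves from sample to sample, and the long-run fraction of intervals that enclose mu is what the confidence coefficient describes.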

Well, I do, but apparently you don't. It's exactly this I wrote instead of your so called explanation. Nijdam (talk) 13:25, 22 May 2012 (UTC)
That's why I wrote: "how often the fixed, but unknown parameter will be in the observed interval". And I also added the equivalence between hypothesis testing and confidence intervals. From there, I guess, your misunderstanding originates. Nijdam (talk) 13:30, 22 May 2012 (UTC)
What you wrote: "the word "confidence" is a technical term used to indicate how rare an outcome has to be before the writer will accept it as significant. " is nonsense. This applies to the level of significance in hypothesis testing. Nijdam (talk) 13:40, 22 May 2012 (UTC)

"a confidence interval is generated for a parameter of a distribution." A confidence interval is generated for a statistic, not for a parameter. You add two numbers to the statistic and that gives you your interval. An interval may or may not include a parameter, in the same way that a dice may or may not roll a six. You should know that.

Sorry, no better way than this to show your ignorance. A confidence interval is an interval estimate of a parameter. What else should there be estimated. A statistic is an observed quantity. Statistics are used as estimators or as test statistic.

"A statistic is a function of the sample, that 'summarises' the sample. Outcome and event are specific terms in probability theory." From the above text: "A population is the set of all measurements of interest to the sample collector." "A sample is any subset of measurements selected from that population." "The objective of statistics is to make an inference about a population based on information contained in a sample."

Any contradiction so far? BTW inference about a population, means inference about the parameters.

"A random sample of one measurement from a population of N measurements is one in which each of the N measurements has an equal probability of being selected." "A random sample of n measurements is one in which every different sample of size n from the population has an equal probability of being selected." "Probability is the measure of one's belief in a particular outcome of an experiment." "The probabilist assumes that the population is known, then calculates the probability of observing a particular sample outcome." "The statistician on the other hand calculates the probability of the observed sample outcome and based on this probability tries to infer something about the nature of the population." I could go on. Statistics is based on probability theory, which is based on set theory. You should know that.

What is your point?

"Let's not speak of 'for each outcome in A', as I don't know exactly what you mean." But you should.


"The endpoints U and V of the confidence interval are statistics, i.e. functions of the elements of the sample space. For every value of the parameter it is possible to indicate the intervals that contain this value" A parameter does not take multiple values. You should know that.

Not multiple values, but possible values.

"When you generate a 95% confidence interval, or mention a confidence level of 95% or a degree of confidence of 95%, what you mean is that the sum of the probabilities of the outcomes that form C is 95%" "If you divide all the possible samples into two groups, those that produce an interval that encloses µ and those that do not,

That is not possible, as you do not know the value of the parameter. Therefore for each possible value of the parameter the sample space may be divided into two groups one for which the proposed confidence interval contains that value of the parameter and one that doesn't. But what's the use?

and sum the probabilities of those that do, the total will be 95%, and that is how the confidence coefficient is defined." These two statements are saying exactly the same thing, and yet you deny the first one and agree with the second. The difference is only in the terminology. If you actually understood the meanings of the terminology you are using you would know that.

For each possible value of the parameter the probability for the proposed interval to contain the parameter is the confidence level. No connection with the sum you mentioned above. You are confused again, I guess, with the acceptance region in testing.

"because me too, being an expert," Experts do not make the sort of mistakes you are making. (talk) 14:31, 22 May 2012 (UTC)

Right. Nijdam (talk) 17:09, 22 May 2012 (UTC)

"Overall, nonsmokers exposed to environmental smoke had a relative risk of coronary heart disease of 1.25 (95 percent confidence interval, 1.17 to 1.32) as compared with nonsmokers not exposed to smoke." "For each possible value of the parameter the probability for the proposed interval to contain the parameter is the confidence level." A possible value of the parameter is 1. What is the probability that [1.17,1.32] contains 1? Er, that would be 0%, not 95%. Another possible value for the parameter is 1.3. What is the probability that [1.17,1.32] contains 1.3? Er, that would be 100%, not 95%. Name any possible value of the parameter and the probability that the interval must contain it is always either 0% or 100%. What you should have said is that the confidence level is the probability of the confidence interval generated from a particular statistic containing the parameter. This point is mentioned three times in the opening section of the article and yet you still don't get it. (talk) 22:18, 22 May 2012 (UTC)

You still don't see the difference between a confidence interval and its realisation. Nijdam (talk) 10:04, 23 May 2012 (UTC)
I give you an example. X is N(μ,1)-distributed. Unknown to you μ=0. I give you one observation: x=2. Let's calculate a 0.90-CI. It is the interval [x-z,x+z], with: P(X-μ>z)=0.05. Hence z=1.645 and the observed confidence interval is: [0.355, 3.645]. It is not this observed interval that covers μ with probability 0.90. That would be nonsense, since either μ is in this interval or is not in this interval (as in this case). No, the CI [X-1.645, X+1.645] (notice the capital X) does cover μ (i.e. for all possible values) with probability 0.90. That's how it is constructed. And every time we observe a new value x for X, we get a new interval. Nijdam (talk) 10:25, 23 May 2012 (UTC)
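The numbers in this example can be reproduced with a short sketch; the variable names are illustrative only.

```python
from statistics import NormalDist

mu_true = 0.0      # the value "unknown to you" in the example
x_observed = 2.0   # the single observation
level = 0.90       # confidence level

# z such that P(X - mu > z) = 0.05 for X ~ N(mu, 1).
z = NormalDist().inv_cdf(0.5 + level / 2.0)

lower = x_observed - z
upper = x_observed + z

# The realised interval either contains mu or it does not.
covers = lower <= mu_true <= upper

print(round(z, 3), round(lower, 3), round(upper, 3), covers)
```

The realised interval comes out at roughly [0.355, 3.645] and does not contain μ=0, as stated in the comment.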

"It is not this observed interval that covers μ with probability 0.90. That would be nonsense" Indeed. The probability that the process would generate an interval that covered μ was 90%. In this case it didn't. So the probability that this observed interval contains μ is 0%. If you were to repeat the process with random values of x then 90% of the time the interval generated would cover μ. That is what the 90% means.

"X is N(μ,1)-distributed" Suppose I'm taking samples from the population of a country in order to estimate the average height of people in the country. I could instead measure the heights of all the people in the country and calculate the parameter. Suppose I'm sampling manufactured light bulbs to estimate what percentage are dodgy. I could instead test all the bulbs and calculate the parameter. The parameter is not distributed, it has only one value. It's the statistics that are drawn from the samples that are distributed around the parameter.

>>So far, so good.

Yes, I do understand what you are doing. You are generating an interval on the statistic and then transferring it to the parameter by assuming that if x minus X is z then X minus x must be -z. But all you've done is a transformation. You're saying that if the interval centred at the statistic contains μ then the interval centred at μ will contain the statistic. X is not a variable, it is a point whose value is μ.

>>No no, nothing of the kind. X is a random variable with N(μ,1)-distribution. I construct a (stochastic) confidence interval by standardising X into X-μ, which now has a completely known distribution, namely N(0,1). Hence I can find z, such that P(-z<X-μ<z)=confidence level. From this equation it follows that P(X-z<μ<X+z)=confidence level. Hence [X-z,X+z] is a confidence interval. By observing X=x, it takes a specific value.
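The step from the first probability statement to the second can be written out as follows. The symbol 1 − α for the confidence level is notation assumed here; the rest follows the construction described in the reply.

```latex
\begin{align*}
-z < X - \mu < z
  &\iff \mu - z < X < \mu + z \\
  &\iff X - z < \mu < X + z ,
\end{align*}
so the two events are the same event and therefore
\[
P_\mu\bigl(X - z < \mu < X + z\bigr)
  = P_\mu\bigl(-z < X - \mu < z\bigr)
  = 1 - \alpha
  \qquad \text{for every value of } \mu .
\]
```

The probability refers to the random endpoints X − z and X + z, not to μ, which is fixed.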

There is a distribution for which the parameter is x and the statistics are possible values of some variable X. Making an inference on that would involve trying to estimate a number you already know, which seems pointless.

>>I wouldn't say: "for which the parameter is x", but "for which the parameter takes the value x".

The reason this sort of fiddly distinction is important is that the probability of the generated interval containing the parameter is not just the probability of generating an interval that contains the parameter. Except in fairyland it also depends on other knowledge that you have about the parameter. (talk) 12:35, 23 May 2012 (UTC)

>>?? Don't get what you want to say here. Nijdam (talk) 16:22, 23 May 2012 (UTC)

Here's another cite for you:

From Statistical Analysis for Decision Making (4e), Morris Hamburg, Harcourt Brace Jovanovich:1987.

"We must be very careful how we interpret this confidence interval. It is incorrect to make a probability statement about this specific interval. For example, it is incorrect to state that the probability is 95% that the mean lifetime µ of all tires falls in this interval. The population mean is not a random variable; hence, probability statements cannot be made about it. The unknown population mean µ either lies in the interval or it does not. We must return to the line of argument used in explaining the method and indicate that the values of the random variable are the intervals of xbar ± 1.96 sigma(xbar), not µ. Thus, if repeated simple random samples of the same size were drawn from this population and the interval xbar ± 1.96 sigma(xbar) were constructed from each of them, then 95% of the statements that the interval contains the population mean µ would be correct. Another way of putting it is that in 95 samples out of 100, the mean µ would lie within intervals constructed by this procedure. The 95% figure is referred to as a confidence coefficient to distinguish it from the type of probability calculated when deductive statements are made about sample values from known population parameters."

The same as I said (of course, otherwise there would be a problem with this textbook). Nijdam (talk) 18:06, 23 May 2012 (UTC)

Have yet to see anything in any text book that agrees with you. (talk) 16:45, 23 May 2012 (UTC)

You should note the agreement of the above textbook with what I told you. Nijdam (talk) 18:06, 23 May 2012 (UTC)

Okay, I've picked you. You are not talking about a confidence interval but about a Bayesian credible interval, which is something different. The process you described above generates a credible interval, not a confidence interval. From Statistical Analysis for Decision Making: "This sort of distribution is thought of as appropriate subjective prior distribution when the decision maker has virtually no knowledge of the value of the parameter being estimated. Doubtless, such states of almost complete lack of knowledge about parameter values are rare." If you know something about the parameter, this procedure won't work. In any case, confidence intervals such as the one I quoted above are not credible intervals. (talk) 17:46, 23 May 2012 (UTC)

Absolute nonsense. Nijdam (talk) 16:19, 3 June 2012 (UTC)

Inequality definition of confidence interval

After some discussion with User:Scwarebang, I've added an alternative definition of confidence intervals which requires merely that the coverage probability matches *at least* the confidence level, along with a reference to Roussas. In order not to clutter the main definition, and also because this seems to be not the standard way of doing things, I've put it for now under the section "Approximate confidence intervals". Mtroffaes (talk) 08:18, 3 July 2012 (UTC)
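For reference, the difference between the two definitions can be stated compactly. This is a sketch in generic notation (endpoints U and V, parameter θ, confidence level 1 − α), not a quotation from Roussas or from the article.

```latex
% Exact coverage: the usual definition
\[
P_\theta\bigl(U(X) \le \theta \le V(X)\bigr) = 1 - \alpha
\qquad \text{for all } \theta .
\]
% Inequality version: coverage is at least the confidence level
\[
P_\theta\bigl(U(X) \le \theta \le V(X)\bigr) \ge 1 - \alpha
\qquad \text{for all } \theta .
\]
```

The inequality form is typically needed when exact coverage cannot be attained for every θ, for example with discrete data.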

Request for expert help in making this article more usable

I am an experienced user of statistics, but not a statistical expert. I came to this page for help because journal editors increasingly require authors to report confidence limits of population statistics, rather than standard deviations or standard errors, for reasons which I understand and find persuasive.

Obviously, I expect eventually to have to consult text books and if need be the original literature, and I am capable of doing that. But I expect an encyclopaedia article to give me an orientation to the subject, tell me the main terms to look out for, and if the matter is simple enough enable me to get on with my job. I also expect it to tell me where I can read more - the sneering tone above people's desire for references is way out of order. What I found was almost entirely useless for my purposes, which I suspect are like those of the majority of people who will want to use this article. The article told me a certain amount that I already knew, and then plunged me immediately into more mathematics, in more pedantic detail, than I needed or could easily cope with. It also contained a certain amount of what I strongly suspected was philosophical grandstanding - and a glance at the talk page confirmed that impression. We all know there are issues at the philosophical foundations of probability, and that there are differences of opinion between classical and Bayesian statistics; a page about a particular topic that lies essentially within the classical sphere is not the right place for expressing those disagreements. It is right to point out somewhere that they exist, and that a confidence interval though it occupies the same role as a credible interval is not the same; but after that we should hear no more of the matter.

Accordingly, I am going to try to reorganize the article to improve its usability. In doing so, I appeal for help to all the experts who have, from their different points of view, contributed to its content. Although I intend to rely largely on material that is already present, I am going to cut out (or relegate to later sections) some detail, and as a user rather than an expert, I am also almost bound to introduce solecisms. I am confident that despite the disagreements aired above, you will be able to help, and in a consensual way. I do not believe that expert statisticians really disagree about the kind of statement that needs to be included in an article at the level appropriate for Wikipedia; and if there are differences of view, they should be signalled in the article but we should not be trying to resolve them - that would be the dreaded Original Research.

Having written a lengthy comment, I now have no time to start the work of revision. I'll be back later to get on with it, and I look forward to your help. seglea (talk) 16:55, 12 April 2009 (UTC)

Well whether or not you consider it "philosophical grandstanding" your opening comment that "By definition, the confidence interval with probability P has that probability of including the true value of parameter which is being estimated." while perhaps defensible on some readings will certainly be misinterpreted by many readers, as was discussed ad nauseam on this page previously. Is there actually a reason why you want the page to be misleading? Jdannan (talk) 21:45, 12 April 2009 (UTC)
Hah, good, I see the previous commentators have not forgotten this page, so we have a chance to work on it. I have now done my best with the opening section, which I have reduced to a (fairly) short opening para and a longer section now called "Conceptual basis". It is important to get this correct and effective before moving on to tidying up the rest of the article, so I am glad you are opening the discussion up.
In answer to your specific point:
(a) That is the definition of CI that I have found in the only stats book I happen to have on my shelf at home - I will check others next time I am in my office, but my memory is that they say the same thing. If this definition is not accepted by all, we are going to have to give an alternative, equally brief and conceptual, definition, and give references to where they are each discussed. It is not appropriate to try to resolve within a Wikipedia article matters on which qualified experts disagree - all we can do is try to help readers be aware of and if possible understand the disagreement.
(b) Having read the discussions above, I think I can see the misinterpretation you are concerned about. I have now expanded the "Conceptual Basis" section to try to head them off. From the timing of your comment above, I think you caught this process half way through. Please have a look at the current version and see if it is addressing the point you are worried about. It is not very elegantly expressed yet, but if the point is right we can work on its expression later.
(c) Without disputing the need to get this correct, I have yet to see that if readers made the misinterpretation you are concerned about, they would then go on to make errors of inference about data. (I am perfectly open to persuasion on this, but I need to be shown an example before I'll believe it.) If the interpretation is important to the statistical theory of CIs but not to their practical use, then all we need do is note that at the theoretical level the definition needs further discussion, and move that discussion to the theoretical section of the article. seglea (talk) 22:08, 12 April 2009 (UTC)
The "errors of inference about data" are that even talking in a probabilistic way about (unknown but constant) parameters is a category error for a frequentist. It makes about as much sense for them to say there is a 90% probability that x lies in the interval [2.3,3.4] or whatever as it would for them to talk about the colour of x. Jdannan (talk) 02:01, 13 April 2009 (UTC)
Thank you for your response. I understand your concern and have modified the text to try to meet it. It doesn't seem appropriate to go into this matter in the opening paragraph, as (regrettably, no doubt) it will on first encounter be largely meaningless to most people who find themselves wanting to use confidence intervals. So in the opening para I have made clear that this isn't an ideal way of expressing things, and in the next section I have tried to put it precisely; and would welcome help in getting that bit right. seglea (talk) 15:56, 13 April 2009 (UTC)

"A confidence interval is always qualified by a particular probability (say, P), usually expressed as a percentage; thus one speaks of a "95% confidence interval"." P is not a probability, it is the Confidence Level. At best, it is the level of probability at which a result is accepted by a researcher, or perhaps the probability of not getting a type I error. The real probability of a particular interval containing the parameter may be much higher.

"By definition, the confidence interval with probability P has that probability of including the true value of the parameter which is being estimated." This is total bullshit. P is the percentage of confidence intervals that the generation process produces that will contain the parameter. Different thing. To get the actual probability of the interval containing the parameter you would have to know the details of the generation process and do some complex arithmetic.

More seriously, a general introduction should be for a general user, not a mathematician. Yours isn't. If you want to write a good general introduction to this article, go and read the entry in the Encyclopaedia Britannica. They do in about two paragraphs what this article fails to do in pages of waffle.

Until I see a good introduction, I'm putting the page back to where it was. It may have been hard to read, but at least it wasn't bullshit. —Preceding unsigned comment added by (talk) 23:58, 12 April 2009 (UTC)

I am afraid I have simply reverted your edit, which was inappropriate on several grounds:
(a) You really should get yourself a userid so that your contributions can be properly identified and your areas of expertise recognised.
(b) Your comments are falling short of the basics expected under WP:Civility.
(c) You have reintroduced into the opening para a basic error (that a 90% CI will be wider than a 95% CI).
(d) As I am not a mathematician (and have made that clear above), I would be quite incapable of writing an introduction that was suitable for one. I am attempting to write for the most likely user of this article - the person who encounters the term "Confidence interval" in a scientific text and wants to check what it means. If you find parts of the opening paragraph overly mathematical in expression, I suggest that you try some detailed editing to improve them, rather than wholesale reversion.
(e) As I have already indicated in my responses to the helpful interventions of Jdannan, I understand the issue about defining a CI in terms of a probability - but it happens to be the definition that is widely used, and as such it needs to be given. A formulation very like the one you propose already appears in the next section of the article, to explain what is actually meant, and I am afraid this leads me to believe that you may not have read what you were criticising. I agree that it would be better to introduce the P value as the Confidence Level rather than setting people off on the wrong track by calling it a probability, and I am just going to change the text to do that.
Where I do agree with you is that this article contains a great deal of unnecessary material, some of it repetitive. I am working from the top down to eliminate that and will not be able to do it all in one edit. I do not think that the 2 paragraphs you cite for the Britannica article are a necessary target; the present material includes some useful mathematical sections that should be retained, though not in the earliest parts of the article. seglea (talk) 15:56, 13 April 2009 (UTC)
And I see that Confidence Level redirects to this article, so that term should certainly be in the opening para, in bold - it now is. seglea (talk) 16:05, 13 April 2009 (UTC)
IP editors are welcome if their edits meet with consensus, as are you. But do you really think you should be throwing out accusations of incivility, after your first missive here? Allowing time for discussion of your concerns before making changes might have caused less fireworks and reduced the chance of your work being reverted. But I agree with you on (c), at least, and that there is plenty of scope for improvement. -- Avenue (talk) 18:02, 13 April 2009 (UTC)
Unreserved apologies to all if my opening remarks came across as uncivil; they were not intended to be; I was simply trying to state what I thought was wrong with the article from my point of view as a likely user, to set out a strategy for improving it, and to recruit editors of good will and greater expertise than mine (among whom I strongly suspect you can be counted) to help in the process. seglea (talk) 18:48, 13 April 2009 (UTC)
That was a classy apology – thank you. I'm sorry if I read a tone into your message that was not what you meant. -- Avenue (talk) 22:16, 13 April 2009 (UTC)

Suppose a researcher conducts a study in which she does ten statistical tests and gets one positive link. If each test is done at 95% confidence level then she has ten chances at 5% or over a 40% chance of getting at least one type I error. So the probability that the interval will contain the parameter is just under 60%, not 95%. Textbook definitions are based on the assumption that only one test is done, but in the real world that is very rarely the case.
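For reference, the "over a 40% chance" figure above can be checked with a short sketch. It assumes the ten tests are independent and that each has a 5% type I error rate; the function name is illustrative and not taken from the article.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one type I error across n_tests independent tests,
    each run at significance level alpha (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Ten tests, each at the 95% confidence level (alpha = 0.05).
rate = familywise_error_rate(0.05, 10)
print(f"{rate:.4f}")  # 0.4013
```

Under those assumptions, the chance that all ten tests avoid a type I error is 0.95 to the tenth power, which is just under 60%.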

The real problem with this article is one common to most of the mathematical articles in this monstrosity, which is that it is so overburdened with fiddly technical definitions and qualifications that no ordinary person has much hope of understanding it. Show the opening paragraph to your wife and then ask her to tell you what a confidence interval actually is; I doubt she'll get past "interval estimate". Then do the same with the Britannica article, and you'll see what I'm talking about.

"I am attempting to write for the most likely user of this article - the person who encounters the term "Confidence interval" in a scientific text and wants to check what it means." -- I'd really like to see an article here that such a user could understand, but it won't happen.

With reference to (c), that got snuck in on 31 March. It wasn't there last time I read it. :-) There are too many idiots editing this monstrosity.

With reference to (b), I've been through one revert war already, and have seen what 'civility' means here. :-) Get yourself a thicker skin. —Preceding unsigned comment added by (talk) 11:10, 14 April 2009 (UTC)

Now, if I wanted to be tiresome I could suggest that generalising from one bad experience is no way for someone who claims to know about statistics to behave. Having edited hundreds of Wikipedia articles, and seen many of them grow into fine and usable documents through good-humoured collaborative effort, my experience is different. Until proved wrong, I'll assume that the same process can work for this article, and I am doing my best to look through your dyspeptic way of expressing yourself to what you are saying. Indeed, I've just made a couple of edits that reflect the points you are making above. Actually I think we (and other contributors) have the same aim here - namely to reduce a sprawling article to a usable one. That should not mean sacrificing technical rigour, it just means getting it in the right place. However, I edit on this site for pleasure and my own education, and neither of those is added to by interacting with people who don't have any manners; so I have a general policy of ignoring people who are simply rude. seglea (talk) 21:56, 14 April 2009 (UTC)

I have made some more tweaks to the intro, and removed a couple of chunks lower down that seemed out of place. If anyone thinks they were needed, can you explain what their particular role was, please? seglea (talk) 21:56, 14 April 2009 (UTC)

Re: 283982996 -- this stays in until I see the point made elsewhere in the article, in language my wife could understand. :-) —Preceding unsigned comment added by (talk) 12:29, 15 April 2009 (UTC)

My concern about these paras is that, though the points are fair enough, I am not sure they belong in this particular article - they are very general issues about statistical inference. This kind of thing (i.e. general issues finding their way into specific articles) is a common cause of bloat in Wikipedia articles - and this article is by common consent suffering from bloat. I will have a look around and see if they can be placed somewhere more suitable and dealt with more briefly here, by a link. seglea (talk) 22:25, 15 April 2009 (UTC)

Firstly, given that both Confidence Level and Confidence Interval point here, where else is a more appropriate place to talk about what the word confidence means in a statistical context? Where else is someone looking up the word confidence going to end up? Secondly, given that the reason many people will look up confidence level/interval is because they've seen a published statistic and want to know what the 95% means, this is the appropriate place to tell them. Thirdly, in my experience "move it to a more suitable article" is a weasel way of saying delete it without appearing to. The wikipedia indexing function won't find the article "Meaning of the term confidence in a statistical sense" when someone types in confidence level, and so the content is buried. My suggestion is to have another section directly under the main section with a title like "Confidence Intervals and Statistical Tests" which tells laymen how to interpret confidence intervals when presented with the results of statistical tests. That to be followed by the technical definitions. But I'm happy so long as this remains. —Preceding unsigned comment added by (talk) 00:38, 16 April 2009 (UTC)

Let's recall that this stuff was deleted previously and put back unnecessarily by this same person who insists that only he is right. And now put back for a second time. The article Confidence in statistical conclusions was created so as to have a place for this stuff which clearly does not belong in this article but might have some small merit .. but by the tags added no one else seems to think so. Of course this wasn't good enough. The key to "making this article more usable" must be to first omit all the stuff not directly related to the article's title. Melcombe (talk) 08:53, 16 April 2009 (UTC)

I've come to this page 3 years on, and it is still pretty unusable. Especially so when compared with other Wikipedia articles on statistics. I'm not in a position to edit it, but it would be much better with a simple description, none of the Classical vs Bayesian grandstanding, and if maths must be used, less of the elitist long formulae when a much shorter formula could be used. Sorry this isn't very complimentary but I've never read a Wiki article so dense, about a straightforward subject. — Preceding unsigned comment added by (talk) 21:23, 5 July 2012 (UTC)

Intervals for random outcomes

I do not believe these intervals (prediction intervals) are also called confidence intervals. Nijdam (talk) 22:41, 7 January 2011 (UTC)

I strongly second this. Because the distinction between confidence and prediction intervals is a common misunderstanding, perhaps we should simply refer to their definition in the prediction interval article, and mention that they are not to be confused with confidence intervals. Mtroffaes (talk) 07:46, 4 July 2012 (UTC)
I went ahead and renamed the section to "Comparison to prediction intervals". I've also cleaned up the article with respect to the definition, so 1 − α is used consistently. I've also mentioned the term significance level in the definition. Mtroffaes (talk) 08:39, 4 July 2012 (UTC)
In hindsight, it is indeed better to mention significance level only in connection to hypothesis testing, so I agree on removing the term in the definition. I've reverted Nijdam's earlier revert to my formatting improvements, in good faith, taking care that the intended reverts were kept (i.e. that γ is used rather than 1-α). Mtroffaes (talk) 14:32, 11 July 2012 (UTC)


I added the dispute template at the article as some anonymous editor has entered the section "Meaning of the term "confidence"", which is not correct and in which the terms confidence and statistical significance are confusingly brought into relation. Nijdam (talk) 16:24, 3 June 2012 (UTC)

I put the template at the section of interest in order to bring attention specifically to this topic. Mikael Häggström (talk) 21:08, 26 June 2012 (UTC)
That section is an egregious misuse of statistics and is incorrect. I suggest removal. BYS2 (talk) 10:02, 7 July 2012 (UTC)
+1. Nijdam (talk) 22:12, 8 July 2012 (UTC)

The tag was placed in reference to the statement "In statistics, the word "confidence" is a technical term used to indicate how rare an outcome has to be before the writer will accept it as significant". The term 'significant' was poorly chosen; it was intended to mean meaningful, not statistically significant. The section has since been rephrased and can no longer be read to have the same meaning. According to wiki protocol Nijdam is free to remove the tag or leave it there for as long as he pleases. If you have any specific complaints about the section in question please detail them, preferably in a new section, and I will answer them. Statements such as "That section is [...] incorrect" without explanation aren't evidence of knowledge on your part, only of bias. (talk) 16:05, 8 July 2012 (UTC)

See my post in the 'Meaning and Interpretation of Confidence Interval' talk section below. BYS2 (talk) 16:16, 11 July 2012 (UTC)

Example is poorly constructed

It is highly unlikely that the standard deviation from the mean is known but the mean is not. Calculating the standard deviation from the mean requires information about the mean. The example reads in part: "The distribution of X is assumed here to be a normal distribution with unknown expectation μ and (for the sake of simplicity) known standard deviation σ = 2.5 grams." If the expectation is unknown, it seems contrived and unreasonable to present the variance of X as a known. AngleWyrm (talk) 18:18, 11 January 2010 (UTC)

Obviously the example is not meant to be particularly realistic, but rather to provide a case where both the setting and manipulations required are simple to understand. Melcombe (talk) 09:39, 11 January 2010 (UTC)
The current example adds confusion rather than illumination. X is presented as an unknown distribution, with an unknown expectation (mean of X). Then it is assigned the property of a Normal Distribution, a common practice, especially in examples. But the example as it is currently written goes on to say the variance of an unknown distribution is known, a rather extraordinary claim, even to the point of casting doubt on the validity of the methods being used. What can be known is the variance of the sampled 25 test cups. AngleWyrm (talk) 18:18, 11 January 2010 (UTC)
The article doesn't say "unknown distribution". There can certainly be practical situations, similar to that here where a standard deviation is effectively known exactly ... thus the sample of 25 being considered might be from one batch of many such batches, where experience has shown that the mean changes between batches but the variance does not ... combining information across batches can lead to an effectively exact value. In any case: (1) there is no need to be more complicated than necessary, and (2) the second example ("theoretical example") treats the case you seem to want. Melcombe (talk) 16:19, 12 January 2010 (UTC)
The example fails to specify the desired size of a cup:
"supposed to be adjusted so that the mean content of the cups is close to 250 grams of margarine." Close to is an arbitrary and subjective quantity. Is the cup supposed to be 250g +/- 100g? Probably not. Is it supposed to be 250g +/- 5g? That's a little more reasonable. I'm suggesting that the desired outcome be explicitly stated. With both μ and σ specified, the "For the sake of simplicity" cop-out can be removed. AngleWyrm (talk) 20:45, 13 January 2010 (UTC)

Proposed change to example I've created an image for the example. This picture represents the desired output of the factory equipment. The actual output can then be sampled, and then compared to see if it is an approximation of the desired curve, or if the sample differs significantly from the target. AngleWyrm (talk) 01:29, 23 January 2010 (UTC)

The cups are filled by a filling machine of which the expected quantity of margarine may be adjusted. The desired output is 250 gr, but due to unavoidable random effects it is normally distributed with (property of the filling machine) σ = 2.5 gr. The machine is adjusted to deliver on the average 250 gr, i.e. the expected value is set to μ = 250 gr. But in due course this value may change a little. That's why one is interested in the confidence interval. All quite realistic. Your picture also needs be adjusted, as it shows the wrong standard deviation. Nijdam (talk) 15:38, 23 January 2010 (UTC)
Image altered to represent standard deviation of 2.5g, and altered the initial paragraph to include σ = 2.5g as part of the problem definition. Two questions: 1. How was n=25 determined? (Required Sample Size) and 2. What is a good choice of confidence interval? Also, the problem should directly address some sort of conclusion about the sample from the factory floor. AngleWyrm (talk) 00:11, 24 January 2010 (UTC)
Let's not go too overboard with the example here. It seems from the comments that what someone is wanting is something that would be much better placed in another article, such as acceptance sampling. Let's not unbalance the article for confidence interval. Melcombe (talk) 11:01, 25 January 2010 (UTC)
The example states "To determine if the machine is adequately calibrated," so it's not too much to ask that the example directly address and answer that question. Otherwise it is not a practical example. If this article proposes the use of Confidence Intervals to deal with the factory example, then it should clearly demonstrate at the end of the problem a conclusion that "yes, the machine needs calibrating," or "No, the data does not support the expenditure of a maintenance call." So is this problem a demonstration of Confidence intervals in use? If it's not, then the entire example should be removed. AngleWyrm (talk) 00:28, 27 February 2010 (UTC)
At your service! Nijdam (talk) 11:17, 27 February 2010 (UTC)
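As a rough illustration of what such a conclusion could look like, here is a minimal sketch of the known-σ interval for the margarine example, using σ = 2.5 g, n = 25 and the 250 g target from the discussion above. The sample mean of 250.2 g is a made-up value for illustration only, and the 95% level and the variable names are likewise assumptions, not something taken from the article.

```python
from statistics import NormalDist

sigma = 2.5          # known standard deviation of the filling machine (grams)
n = 25               # number of cups sampled
target = 250.0       # desired mean content (grams)
sample_mean = 250.2  # HYPOTHETICAL observed mean, for illustration only
level = 0.95         # assumed confidence level

# Two-sided critical value of the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.5 + level / 2)
half_width = z * sigma / n ** 0.5

lower, upper = sample_mean - half_width, sample_mean + half_width
print(f"{level:.0%} CI for the mean: ({lower:.2f}, {upper:.2f})")
print("target inside interval:", lower <= target <= upper)
```

With these illustrative numbers the half-width is about 0.98 g and the target of 250 g falls inside the interval, so the sample would give no grounds for recalibrating; a different sample mean could of course lead to the opposite conclusion.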

Imagine for a moment that the testing process is destructive, and that they are not inexpensive margarine cups, but much more expensive car engine blocks. The testing questions that come up are "Is the machine manufacturing engine blocks to within the engineer's specifications?" and "How few can I destroy and still get a defensible answer to the first question?" Let's say that each engine block costs $25 in materials and 20 min of factory time. So: Is this scenario answerable with Confidence Intervals, or is there a better way? AngleWyrm (talk) 05:12, 14 March 2010 (UTC)

On one hand, a first example is ideally as simple as possible, using a normally distributed quantity as this (margarine) one does. On the other hand the example is unnecessarily complicated by the fact that it's not about the parameter of a tangible phenomenon, but rather about whether the machine is calibrated correctly. This adds a layer of abstraction that is not desirable in a first example.
I'd suggest that the first example be sampling from a population of voters to estimate what percentage is in favor of a particular voting measure (which need not be specified). And that it be stated outright that, for simplicity, the standard deviation will be assumed known as some fixed number.
The next example currently in the text is more realistic, not assuming σ to be known and so needing to use the Student t-distribution. So it is a bit ironic that it is called a "theoretical" example. There is no value, in my opinion, for a "theoretical" example to be the second one, so why not come up with a concrete framework within which the same example can be discussed. (Then the third example can be "theoretical".)Daqu (talk) 08:28, 8 August 2012 (UTC)
I rather differ in opinion here. There is nothing wrong with the introductory example. The suggested example about a poll is more difficult because of the discrete nature of the distribution. And finally, I do not understand the criticism of the term theoretical. Nijdam (talk) 15:01, 8 August 2012 (UTC)
As I said, the problem with the first example is that the idea of the *calibration* of a machine is not a tangible concept. This makes it harder for someone trying to understand the concept of confidence interval. The best examples would be a confidence interval for the mean of numbers or quantities of tangible things. And it should be stated explicitly, not just in passing, what is being *assumed* (like the value of σ). Also, this example appears to be taken verbatim from a Thai website with a PowerPoint download (google the phrase "Statistical Interval for a Single Sample").
I don't think anyone will have problems in understanding the context of this example, and ...
Did it cross your mind that the Thais got this from Wikipedia? (I'm the original author of this example!) Nijdam (talk) 08:01, 9 August 2012 (UTC)
For the second example, it is appropriate to use an unknown mean as well as an unknown σ — as currently in the second example. But instead of making it theoretical, it would be again better to use a concrete, tangible example. It should not be difficult or complicated to discuss finding a confidence interval for the percentage of people for some ballot measure. This is done all the time by polling companies, and reported in newspapers as something like "The poll shows 57% are in favor, with a sampling error of ±3%." Most people, including the newspaper reporters who write such articles, have no idea what this means. But if this is too complicated for anyone to write up, then why not use people's heights.
Then, for a third example it would be appropriate to be theoretical.Daqu (talk) 19:01, 8 August 2012 (UTC)

Meaning and Interpretation of Confidence Interval

I can see that there has been a lot of argument over the content of this article, and I'm reluctant to add to it, but as a newcomer I have to say that I find some of the statements about the interpretation or meaning of confidence intervals inconsistent. Here is my understanding of a confidence interval in the context of a sampling procedure. (If you disagree, please explain gently, I am trying hard and I'm not trolling). Suppose we have a population with an unknown mean and we sample it with a view to finding an interval estimate of its mean. We now compute a confidence interval from our sample, say 95%, as the sample mean plus or minus 1.96 times the standard error. We now form the hypothesis, H, that the true unknown mean lies within this interval. If E denotes the evidence that we have from the sample data, i.e. the computed confidence interval, then what we have calculated is that the probability of E given H is 95%. To make this clear, let's state it another way, it means that the probability of "If H then E" is 95%. Or again, if the hypothesis is correct then we can expect the sampling procedure to yield an interval that contains the true unknown mean 95% of the time. Frequentist statistics only ever gives the probability of "If H then E". The problem is that what people would naively like to know is, what is the probability of H given E? i.e. what is the probability that the true unknown mean lies within the computed interval, given the evidence we have found? This is a two-fold problem: firstly, because on a frequentist understanding of probability one cannot speak of the probability of the population mean lying within a range, because the population mean is not a random variable. And secondly, because on a Bayesian interpretation of probability, one cannot attempt to calculate the probability of H given E without other information, or at least assumptions, about the prior probability of H. 
The nasty consequence is that researchers reporting statistical results frequently confuse the probability of E given H with the probability of H given E and explain their results as meaning the latter. This really is surprisingly common: I’ve just performed an entirely unscientific sample of information about confidence intervals from a Google search and found some articles published by prestigious universities in which the two statements are conflated. Coming back to this WP article, there seem to me to be a few places here where the distinction is not made clear and where a non-expert reader might take away the wrong interpretation. Dezaxa (talk) 22:32, 26 June 2012 (UTC)

Non-expert readers may find that the confidence interval simulation applet at Rossman-Chance website is helpful for the (frequentist) interpretation of a confidence interval. Select "means" for method, and "z with s" for the interval that you described above. Enter the fixed value of mean and sample size n. Then click to simulate normal random samples with a specified mean and standard deviation. Now try entering 100 for the number of samples. The 100 confidence intervals are displayed in a graph. The green intervals cover the true value of the mean, and the red ones miss it. The black dot is the sample mean - it varies as do the endpoints of the interval. (The vertical black line is the population mean. Usually this is unknown, but here we simulated the data so we know it.) Count the number of red intervals. The number of red ones should be close to 5% if you set confidence level at 95%. In other words, there should be about 95 green and 5 red for 100 replications. You do not get exactly 95% green every time but the long run average will converge to 95% green. The running total will be displayed at the bottom.
The applet is provided for use with the Rossman and Chance textbook "Workshop Statistics 4e". More applets are here. Not sure if it is appropriate to link the applet in the main article, but the non-expert readers may benefit from the reference and link to applets. Mathstat (talk) 01:26, 27 June 2012 (UTC)
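For readers without access to the applet, the same long-run behaviour can be sketched in a few lines. This is only an illustration, not the applet's code: it assumes normally distributed data with a known standard deviation (a slightly simpler case than the "z with s" option mentioned above), and the values of mu, sigma, n and the number of replications are arbitrary choices.

```python
import random
import statistics


def coverage(mu, sigma, n, reps, z=1.96, seed=0):
    """Fraction of simulated z-intervals (known sigma) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and build its interval around the sample mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1  # a "green" interval: it covers the true mean
    return hits / reps


frac = coverage(mu=10.0, sigma=2.0, n=30, reps=10_000)
print(f"share of intervals covering the true mean: {frac:.3f}")
```

The printed share should sit close to 0.95 without being exactly 0.95 in any single run, which is the frequentist reading of the confidence level: it describes the procedure over many repetitions, not any one realised interval.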

Moved the tag back -- only the person placing it can delete it or (presumably) change it.

1. Frequentist = confidence interval; bayesian = credible interval; two different things.

2. I think most people who have a go at this section miss the point. Statistical estimation/testing is a method of using probability to try and relate a statistic to its corresponding parameter. Probability is not just some dry unchanging "feed it into a formula and get a result" thing. In calculating a probability you must always take into account all that you know about the event and its outcome. For example, the probability of getting a six when rolling a dice is 1 in 6, but if you've rolled the dice and seen that it has come up three then the chance of it being a six is 0%.

The maths used to generate the p value in a statistical test or the confidence interval is built on two assumptions, and those assumptions must be honoured or the results of the maths are worthless. Firstly, the maths assumes that the only thing you know about the probability of the outcome is what's in the sample data. If 1 in 20 intervals don't contain the parameter, and 1 in a million things cause Alzheimer's, then it's ridiculous to suggest that an interval is more likely to indicate a link to Alzheimer's than to be a dodgy interval.

Similarly, the maths assumes that you are generating one interval or doing one test. When you select from amongst multiple outcomes, the method of calculating the probability of the selected event changes; you can't treat them as independent events. If you roll 600 dice you expect to get 100 sixes. If you generate 1000 intervals at 95% you expect to get 50 dodgy intervals. You can't pick out a six and claim that the chance of getting it was 1 in 6, and you can't pick out an interval and claim that the chance of it being dodgy was 5%.

If someone said "I rolled 12 dice and got 2 sixes, so those two dice must have been loaded" you'd laugh at him. But if someone says "I generated 40 intervals and 2 did not contain the parameter..." (talk) 11:38, 27 June 2012 (UTC)

I placed the tag. Nijdam (talk) 12:18, 27 June 2012 (UTC)

In relation to the disputed section entitled “Meaning of the term Confidence”, my gloss of the explanation, from a Bayesian perspective, would be that even though the experiment yielded a 95% confidence interval of relative risk that did not contain the value 1.0, this should not be interpreted as meaning that it is highly probable that environmental smoke is linked with heart disease, because one must take into account the prior probability of such a link. This prior probability might well be assessed as being low because out of a very large number of possible factors that could have been tested, most are presumably not linked with heart disease. Stated thus, I don’t disagree. If the unnamed author meant something else, then it would be helpful if this could be clarified. Dezaxa (talk) 22:41, 27 June 2012 (UTC)

It's not about "Bayesian" or "Frequentist", it's about probability. If you roll a dice, there is a 1/6 chance of getting a six. If you roll two dice, and pick the highest, there is a nearly 1/3 chance of getting a six. If you generate a 95% confidence interval, there is a 1/20 chance of getting one that doesn't contain the parameter. If you generate 2, and pick the most abnormal, there is a nearly 1/10 chance of getting one that doesn't contain the parameter. So if you generate two 95% confidence intervals, and pick the most abnormal, you can only be 90% confident that the chosen interval contains the parameter. Same word, two different meanings.
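The "nearly 1/3" and "nearly 1/10" figures above can be checked exactly. The sketch below assumes fair dice and two independent intervals, and it reads "pick the most abnormal" as meaning that the selected interval misses the parameter whenever at least one of the two does; that reading is an assumption made for the calculation.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Probability that the higher of the two dice shows a six.
p_six_max = Fraction(sum(1 for a, b in outcomes if max(a, b) == 6), len(outcomes))

# Probability that at least one of two independent 95% intervals misses the parameter.
p_miss = 1 - Fraction(19, 20) ** 2

print(p_six_max, float(p_six_max))  # 11/36, about 0.306
print(p_miss, float(p_miss))        # 39/400 = 0.0975
```

So the selected die shows a six with probability 11/36 rather than 1/6, and the selected interval misses with probability 39/400 rather than 1/20, in line with the 90% figure quoted.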

And I would suggest, and I think you would agree with me, that there are a lot of laymen doing statistics who readily confuse one with the other, simply because they don't understand the underlying probability theory. (talk) 10:20, 28 June 2012 (UTC)

Try, try again.

You start with a sample. From the sample you generate a probability distribution based on the sample mean, sample variance, and sample distribution type (which you assume). From that you calculate (estimate) a statistic called a confidence interval. The 95% confidence level is the number you feed into the formula to get the statistic. It gives you a range of values into which 95% of the outcomes of events having that distribution must fall. But this distribution is not the interesting one, as it is only relevant to the particular sample.

You then examine a second distribution, being the distribution of samples around the population mean, based on the population mean, population variance, and population distribution type, none of which you know. You assume that the population variance and population distribution type are the same as those for the sample, giving you the same shape curve as in the first distribution, but shifted to be centred on the population mean. This curve has an associated parameter, the population confidence interval. You then use reflection to say that if your sample mean lies within the population confidence interval then the population mean must lie within the sample confidence interval, that is, for 95% of the outcomes you generate under the second distribution, the population confidence interval will contain the sample mean, and thus the sample confidence interval will contain the population mean, and consequently that there is a 95% chance that an interval generated under the first distribution will contain the population mean. The probability that the interval contains the population mean is the common sense definition of confidence.

(a) Conditional probability. The above logic assumes that outcome of the event you generate could fall anywhere within the distribution. But you may have information that negates that assumption. For example, if you know that the probability of the population mean being other than 1 is very small, then an interval that doesn't contain 1 obviously has little chance of containing the population mean. The common sense confidence must therefore be much lower than 95% in such a situation.

(b) Dependence. The above logic assumes that the second distribution has the same shape as the first, but is centred on the population mean. If it is not the same (in shape and/or centre) then the logic fails. For example, if the population mean is 1 and the second distribution is centred at 10, then most intervals generated under the second distribution will not contain the population mean. This would happen if for example you generated lots of intervals from a sample and selected the most abnormal ones. The distribution of those selected intervals will not be centred on the population mean. Thus the common sense confidence must therefore be much lower than 95% for those intervals. (talk) 12:37, 29 June 2012 (UTC)

Nijdam: I've reread and thought about your comment above ("You still don't see the difference between a confidence interval and its realisation...") and I think you are trying to say the same thing as I am. You just aren't taking enough care with your terminology. I think "CI [X-1.645, X+1.645]" is supposed to be a reference to the parametric confidence interval. And similarly "X is a random variable" is not referring to the parameter as being randomly distributed (which is what it looks like you are saying) but is rather referring to the distribution of samples around the parameter as opposed to the distribution inherent in a particular sample. Using what I think is your terminology, I can state the comment you objected to as follows: The values that make up the domain of X can be subdivided into two sets, those which will produce intervals that contain μ, and those which won't. The 90% refers to the probability that an interval will fall into the first set. Do you agree with this? Because so help me it looks like that is what you were trying to say. (talk) 14:48, 30 June 2012 (UTC)

Let's hope you are indeed trying to say the same thing. In that case your intentions are okay; you only fail to find the right wording. For example, I do not understand what a reference to the parametric confidence interval means. What you say about the values of X is problematic, as a subdivision has to be made for all possible values of μ. Nijdam (talk) 11:04, 1 July 2012 (UTC)

Referring to your comments above, [0.355, 3.645] is the confidence interval derived from the distribution found in the sample, which is peculiar to that sample. The parametric confidence interval is the same thing but derived from the distribution of samples around μ. (Picture the graph you always see where the confidence intervals are lined up to show that some contain μ and some don't. The implication being that the confidence intervals are distributed around the parameter.) In the above I think you are suggesting that you produce one from the other by applying a transformation.

I'm assuming that μ is the parameter, in which case there is only one possible value of μ, which is not known. Is this correct, or do you intend some other meaning? If it is, then every observed confidence interval either contains μ or it doesn't. Thus you can partition the set of all possible samples into two subsets, those where the associated observed interval contains μ and those where it doesn't. The sum of the probabilities of the first subset is 90%. That's what the 90% means. Thinking about it, I agree that partitioning the domain of X would be problematic, as the mapping between samples and points in the domain of X is not 1:1. If X were defined on confidence intervals rather than transformed values of statistics it would be possible. (talk) 15:04, 1 July 2012 (UTC)

Sorry, no offence, but it seems you're not really an expert on the matter. However, if you mean to make something clear, use formal terminology: a distribution, characterised by the unknown parameter θ, a sample of iid stochastic variables from this distribution, and observations of this sample. Nijdam (talk) 07:00, 2 July 2012 (UTC)

When I did my degree 30 years ago we were not taught the terminology you are using, and so I have to decode it. Anyway, I think I can progress.

A researcher collects a large set of data. He divides it arbitrarily into 10 samples, each of about equal size. For each sample he calculates a statistic, the relative risk of lung cancer amongst people over 175cm, as opposed to the sample as a whole, and for each statistic he calculates a 90% confidence interval. The statistics and intervals he gets are 1.3 [0.9,1.7], 1.2 [0.8,1.6], 0.8 [0.5,1.1], 0.98 [0.5,1.4], 1.2 [0.75,1.65], 1.4 [0.9,1.8], 0.95 [0.6,1.3], 0.7 [0.3, 1.05], 1.4 [0.95,1.85], and 3.4 [2.96,3.85]. He then writes and has published a paper in which he reports only the last statistic. He theorises that because smoke rises, tall people breathe more of it and so are more susceptible to cancer. Expressed as a percentage, how likely is it that the published interval contains the parameter? (talk) 11:17, 2 July 2012 (UTC)

Well, you need to be more specific. What data were collected, what was the size of the sample, what is the relative risk of lung cancer amongst people over 175cm, as opposed to the sample as a whole? And then: CIs are not calculated for statistics, but - we have been discussing this - for parameters. So? Nijdam (talk) 14:47, 2 July 2012 (UTC)

Nobody has ever measured the relative risk of lung cancer amongst people over 175cm for the population as a whole. If they had, the researcher would not be doing this research. The relative risks for the samples are quoted above. The researcher has not published his data nor his sample sizes, only the one result. The CIs published are what you call above the "observed CIS"; which is after all the normal practice amongst researchers. Expressed as a percentage, how likely is it that the published interval contains the parameter? (talk) 22:35, 2 July 2012 (UTC)

I’m not really sure where this discussion is heading, nor why the question the unnamed author poses is relevant to anything. For my part, my concern is that non-statisticians are apt to draw incorrect inferences from confidence intervals because of a misunderstanding of what they mean. I would be in favour of having a section in the article, perhaps entitled “Drawing inferences from confidence intervals” explaining the pitfalls. This could replace the disputed section if others continue to disagree with it. I’d be happy to draft such a section and post it here for discussion. Dezaxa (talk) 00:00, 3 July 2012 (UTC)

My understanding is that a 95% confidence interval is a range of values generated from a statistic such that there is a 95% chance that the process will have produced a range containing the associated parameter. This is what I was taught at university and it is what I have found in every text book I have consulted. Nijdam says this is incorrect, and has flagged the section because of it. I am trying to get him to explain himself, so far with no luck. Every time I think I have him pinned down he claims I don't understand him, which is true enough. Only he can remove the tag, and that won't happen until I understand his point and he understands mine. (talk) 11:53, 3 July 2012 (UTC)
You don't even understand my criticism. You're right in what you just mentioned, but this is already in the article. You're wrong where you link a CI to statistical significance. That's why I put the flag. The whole section only brings either redundant or incorrect information. Let's just delete it. Nijdam (talk) 07:32, 4 July 2012 (UTC)
Nijdam - the whole article is full of references to significance. There are at least a dozen. Do you object to all of them? Dezaxa (talk) 23:15, 4 July 2012 (UTC)
Well, I'll be willing to discuss this with you, if you explain to me what, in your opinion, is the connection between CI and statistical significance. Only if you do understand the subject is it worthwhile to discuss the matter. Nijdam (talk) 22:07, 5 July 2012 (UTC)
I wasn't planning to start a discussion, I'm just enquiring as to whether you think the whole article needs editing, or just the flagged section. Dezaxa (talk) 21:55, 6 July 2012 (UTC)
See for a set of slides. :-) (talk) 14:22, 6 July 2012 (UTC)
It follows from what I said that if you take a particular sample and calculate a confidence interval on a statistic derived from that sample then the confidence interval will either contain the parameter or it won't. So if you examine the set of all possible samples that can be drawn from a population, that set can be partitioned into two subsets, the first being the set of samples where the confidence interval contains the parameter and the second those where it does not. If you then define a function mapping each sample to its probability of being randomly selected, the sum of the values of that function across the first subset is equal to the confidence level used to generate the confidence intervals. The confidence level is thus an indication of the size of the subset of samples that you regard as "good". The higher the level, the larger that subset will be.
With regard to statistical testing, each sample gives rise to a p value that is either less than the significance level or is not. So you can similarly divide the set of all samples into two subsets. The sum of the probabilities of the samples which would give rise to positive results is the significance level. Again this number is an indication of the size of the set of samples that you regard as "bad". The smaller the level, the smaller that subset will be.
The actual samples that fall into the subset are not necessarily the same in both cases, obviously, but the two numbers are both serving the same purpose. (talk) 14:40, 4 July 2012 (UTC)
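
The partition described above can be sketched by simulation. This is only an illustration under stated assumptions: a normal population with known standard deviation, a z-interval for the mean, and a two-sided z-test of the true mean at significance level 1 minus the confidence level. The function name `partition_samples` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def partition_samples(trials=20000, n=30, mu=5.0, sigma=2.0, level=0.90, seed=0):
    """Classify simulated samples by interval coverage and by test outcome.

    Returns (fraction whose interval contains mu,
             fraction whose test of H0: mean == mu rejects,
             number of samples that fall in both or in neither subset).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    alpha = 1.0 - level
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)   # about 1.645 for a 90% level
    se = sigma / n ** 0.5                       # known-sigma standard error
    covered = rejected = mismatches = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Subset 1: samples whose interval contains the parameter.
        contains = (xbar - z * se) <= mu <= (xbar + z * se)
        # Subset 2: samples whose two-sided p-value falls below alpha.
        p = 2.0 * (1.0 - std_normal.cdf(abs(xbar - mu) / se))
        rejects = p < alpha
        covered += contains
        rejected += rejects
        mismatches += (contains == rejects)
    return covered / trials, rejected / trials, mismatches

if __name__ == "__main__":
    print(partition_samples())
```

In this special case the two subsets are complementary: a sample's interval contains μ exactly when the test of the true mean does not reject. That need not hold for other tests or other null values.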
Sorry, but in the case of testing, a particular sample does NOT give rise to a p-value. As an example, the sample x=1 is drawn from a normal population. What would be its p-value? I explained in the article the connection between CI and testing, and that's all there is. Nijdam (talk) 14:33, 6 July 2012 (UTC)
A sample is a set of data collected by a researcher. x is a statistic drawn from the sample. Many samples may give rise to the same statistic, x=1. Is this where you are failing to understand? If you collect a set of data you can draw a statistic from it and calculate a confidence interval on the statistic, or you can calculate a p-value from it and do a statistical test. (talk) 01:56, 7 July 2012 (UTC)
Sorry, but you seem to have no idea what you're talking about. You do not draw a statistic from a sample. In my simple example, the whole sample consists just of x=1, and x is a sufficient statistic. Now, you tell me what the p-value should be. Nijdam (talk) 12:25, 7 July 2012 (UTC)
"Sample. A subset of measurements selected from a population." (MendenHall & Ott, 1976) "In statistics, a sample is a subset of a population." (wikipedia) "The process of gathering data or of obtaining results from several performances of an experiment is called sampling. The results themselves are called observations, and the collection of observations is called a sample." (Lindgren, 1960) "A sample is a number of elements that does not include the entire group of elements in the population." (Cangelosi, Taylor & Rice, 1976) "A statistic is a function of the observations in a sample." (Lindgren, 1960) Actually, most texts define a statistic by example, by for example averaging a set of measurements: I think the meaning is supposed to be obvious. A sample of size 1 is not large enough to construct a confidence interval from. You need at least 30. Same applies for statistical testing. Actually, if your sample size is 1, how are you going to calculate the variance? :-) (talk) 13:25, 7 July 2012 (UTC)
X is normally distributed with variance 1 and unknown mean μ. Observed value for X: x=1. P(-1.96<X-μ<1.96)=0.95, hence CI=(X-1.96, X+1.96). Observed x=1, hence observed CI=(-0.96,2.96). Now it's your turn: give me the p-value. Nijdam (talk) 22:19, 8 July 2012 (UTC)
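
The arithmetic in this example can be checked with a short Python sketch. The interval follows the comment above (a single observation with variance 1). The p-value function is an assumption added for illustration: it needs a hypothesised mean `mu0`, which the example does not supply, and the function names are hypothetical.

```python
from statistics import NormalDist

def observed_ci(x, sigma=1.0, level=0.95):
    """Observed interval (x - z*sigma, x + z*sigma) for a single observation."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # about 1.96 for 95%
    return x - z * sigma, x + z * sigma

def two_sided_p(x, mu0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0; mu0 must be supplied by the user."""
    z = abs(x - mu0) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))

if __name__ == "__main__":
    print(observed_ci(1.0))          # approximately (-0.96, 2.96)
    print(two_sided_p(1.0, mu0=0.0)) # depends entirely on the chosen mu0
```

The sketch makes the dependence explicit: the interval is determined by the observation alone, whereas a p-value exists only once a null value has been chosen.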

In order to calculate a statistic I need a sample. You haven't given me one. Give me the sample I'll give you the statistic. In order to calculate the confidence interval or p value you will have to give me at least 30 points. What you appear to be giving me is a statistic calculated from some unknown sample and expecting me to guess what the sample was.

X is the simple sample, count 1. As you may have noticed I did already calculate the CI for the mean μ. Now I'm interested in your p-value.

Consider the two sets of data, (x = 100+i,i=-20,-19,-18,-17,...,20) and (x = 100+i, i=-10,-9.5,-9,-8.5,...,10). These are both samples that could be drawn from the same population. They have the same means, the same sample count, but different variances. So they give rise to two different confidence intervals (-187,387) and (28.25,171.75) (if I've done my numbers right). Obviously the distributions in these two samples are not the same curve with different means; they are different curves with the same mean. Knowing just the mean doesn't tell me the confidence interval; I need to know the whole sample.
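
For reference, the summary statistics of the two data sets can be computed as in the sketch below. The interval formula used (mean ± 1.96·s/√n, a normal approximation for the mean) is an assumption; the comment does not say which formula produced its figures, so the output need not match them. The function name `summarize` is hypothetical.

```python
from statistics import mean, variance

# The two data sets described above, 41 points each.
first = [100 + i for i in range(-20, 21)]        # step 1, from 80 to 120
second = [100 + j / 2 for j in range(-20, 21)]   # step 0.5, from 90 to 110

def summarize(data, z=1.96):
    """Return (n, mean, sample variance, assumed 95% interval for the mean)."""
    n = len(data)
    m = mean(data)
    var = variance(data)              # sample variance, divisor n - 1
    half = z * (var / n) ** 0.5       # assumed half-width: z * s / sqrt(n)
    return n, m, var, (m - half, m + half)

if __name__ == "__main__":
    print(summarize(first))
    print(summarize(second))
```

Both data sets have the same count and mean but different sample variances, so any interval built from the variance differs between them.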

Apparently I have to guess you calculate a CI for the population mean. And what is worse, you haven't said anything about the underlying distribution. But let's look at your first sample. Give me your supposed p-value. Nijdam (talk) 15:07, 9 July 2012 (UTC)
Normal distribution, n=41, 2-tailed, ybar=100, mu=90, s=11.979, alpha=0.05, z=5.345, p=0.0000000452, unless I've mucked up the arithmetic (talk) 17:55, 9 July 2012 (UTC)
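
The figures quoted here can be reproduced with a short Python sketch. It assumes the first data set from the earlier comment (100+i for i from -20 to 20), the hypothesised mean of 90 given above, and a normal (z) approximation; the function name `z_test` is hypothetical.

```python
import math

def z_test(ybar, mu0, s, n):
    """Return (z, one-sided tail probability, two-sided p-value)."""
    z = (ybar - mu0) / (s / math.sqrt(n))
    # erfc keeps precision for very small tail probabilities.
    one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return z, one_sided, 2.0 * one_sided

data = [100 + i for i in range(-20, 21)]
n = len(data)
ybar = sum(data) / n
# Sample standard deviation with divisor n - 1.
s = math.sqrt(sum((x - ybar) ** 2 for x in data) / (n - 1))

z, p_one, p_two = z_test(ybar, mu0=90.0, s=s, n=n)

if __name__ == "__main__":
    print(n, ybar, s, z, p_one, p_two)
```

Under these assumptions s and z agree with the values quoted. The one-sided tail probability comes out near the quoted p, and the two-sided value is twice that.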
Which is the unknown parameter? Is y1,...y41 the observed sample? How did you calculate p? Nijdam (talk) 10:21, 10 July 2012 (UTC)

Actually, what you are doing is defining a set of functions based on a parameter. You are not taking samples and generating statistics from them and then trying to estimate parameters. Rather, you have left the realm of statistics and moved into the functional analysis of normal curves. To a statistician, a statistic is the value obtained by applying a formula to a set of observations, not a point in the domain of a function. A confidence interval is a (pair of) statistic(s) calculated from a sample, not a function of a parameter or a range produced by substituting a value for that parameter. You've taken the jargon and redefined it, and are surprised that mathematics described in the original jargon no longer makes sense to you. Or at least that's what it looks like. (talk) 12:36, 9 July 2012 (UTC)
Do you work for a tobacco corporation or something? This section should definitely be deleted. This has gone on too long; you either have a very poor understanding of statistics, or are just trolling.
Your 'analysis' is a non-sequitur and shows a serious misunderstanding of statistical analysis.
How can the "relative risk" of the population be 1 as you state? What is your exposed and what is your non-exposed group if you take the entire 'population' as a group? Please have a read through the interpretation of RR. There must be a stratification of 'exposed' and 'non-exposed' if you are to use this measure (hence the term 'relative'). Saying that the population has a relative risk of 1 makes as much sense as saying the population has an odds ratio of 1 for bowel cancer - at best, it can be considered a vacuous truth.
Your interpretation of the result is done in a Bayesian manner (the probability that the null hypothesis is true given the data, P(H0 | Data)); however, the study was done using confidence intervals, which is a frequentist approach to inference (given that the null hypothesis is true, what is the probability of the data, P(Data | H0)). One is not equivalent to the other. Therefore, it makes absolutely no sense to draw conclusions as you did. Please have a read about the different approaches to probabilistic analysis, specifically Bayesian inference vs frequentist inference.
You implicitly assumed the prior probability of the null hypothesis H0 as 'vanishingly small' while setting H1 close to 1, which is incorrect. As the paper you cited is a meta-analysis, it was not a one-off study of a 'random hypothesis' or acquired through 'data mining' approaches to discovery (which could indeed be invalid). Your premise states that "Of the vast number of things that can be tested, very few actually are connected to coronary heart disease". This would be true if "second-hand smoking" was an arbitrary factor such as "the number of jelly beans I saw at the shopping mall last year". But you seem to forget/omit that passive smoking is far from just a 'random hypothesis'. In fact, there is significant pre-existing literature regarding the association between passive smoking and cardiovascular disease. Therefore this hypothesis was motivated by existing pathophysiological, experimental and epidemiological knowledge of the aetiology of atheromas and CVD.
This would clearly affect the 'a priori' probability for the null hypothesis being true, P(H0). Therefore, if you were to use a Bayesian approach to the interpretation of this study (which was actually presented with a frequentist approach), you would need to perform Bayesian inference and use the prior probability, P(H0), to find the posterior probability, P(H0 | Data). This would be done using the relationship P(H0 | Data) = [P(Data | H0) * P(H0)] / P(Data). The computation of that is another matter entirely - but suffice to say, the answer would be nothing like your conclusion.
Note that your 'result' does not even depend on the data, meaning your argument would have been the same even had it been a 99.999% confidence interval of [100000, 100100]. Clearly that is dubious.
All in all, your 'conclusions' about the meta-analysis are as atrocious as your knowledge of statistics and medicine. Your nonsensical arguments lead me to believe you have little formal training in those areas.
There is a fairly large amount of validated, peer-reviewed literature concerning the damaging effects of passive smoking. Perhaps you should go have a look at some meta-analyses on Cochrane. If you need some pointers on the pathophysiological and aetiological aspects of CVD, atheroma formation and cardiovascular physiology, I suggest starting with Guyton's Textbook of Medical Physiology, and then perhaps Robbins and Cotran's Pathologic Basis of Disease. BYS2 (talk) 16:07, 11 July 2012 (UTC)