Talk:Confidence interval
This is the talk page for discussing improvements to the Confidence interval article.
Archives: 1, 2
This subject is featured in the Outline of statistics, which is incomplete and needs further development. That page, along with the other outlines on Wikipedia, is part of Wikipedia's Outline of Knowledge, which also serves as the table of contents or site map of Wikipedia.
This article is the subject of an educational assignment at College of Engineering, Pune supported by Wikipedia Ambassadors through the India Education Program. Further details are available on the course page.
What more should be in this article
Following on from Daqu's last comment above, perhaps it is time to move on to what more needs to be in this article. I would not want any more on "explanations" and "interpretations", as this is not a textbook. Topics that I think should be in, but are not yet included, are:
- one-sided intervals
- equal-tail-area ways of determining limits
- derivation via inversion of significance tests
- confidence intervals derived from non-parametric tests
- overlap of confidence intervals (to extend the limited content in existing figure)
In addition, I think the material in the subsection headed "Meaning of the term confidence" should be moved out, possibly into a separate article, as, given some other articles, there seems to be a need to discuss the layman's question of "how much confidence is there in that test" in terms of statistical power rather than being led through to "confidence intervals". So possibly an article "confidence in statistical results" would be justified.
- This is an interesting discussion. I hope the following comments are helpful.
- In considering "confidence in statistical results" it is important to distinguish between the concepts of probability and uncertainty. In some cases, but not always, uncertainty is the same as probability. Fisher, R. A. (Statistical Methods and Scientific Inference, 1956, p. 9) writes
- "The prime difficulty lay in the uncertainty of such inferences, and it was a fortunate coincidence that the recognition of the concept of probability ... provided a possible means by which such uncertainty could be specified and made explicit."
- and later, on page 37, he says that
- "... the concept of Mathematical Probability affords a means, in some cases, of expressing inferences from observational data, involving a degree of uncertainty, and of expressing them rigorously, in that the nature and degree of the uncertainty is specified with exactitude, yet it is by no means axiomatic that the appropriate inferences, though in all cases involving uncertainty, should always be rigorously expressible in terms of this same concept. ... in the vast majority of cases the work is completed without any statement of mathematical probability being made about the hypothesis under consideration. The simple rejection of a hypothesis, at an assigned level of significance ..."
- In Bayesian inference, measures of uncertainty have a probability interpretation; that is, they have the properties of "Mathematical Probability" (with a collection of events, etc.). In the case of confidence intervals and significance tests, the measures of uncertainty, that is, confidence levels and observed significance levels, have been calculated from probability distributions, but as post-data concepts they do not have a probability interpretation. What would be the collection of events in these cases? In the case of an observed confidence interval, surely not the interval and its complement. And in the case of a significance test, surely not the hypothesis and its complement. For a confidence interval it is not meaningful to assert that the complement of the observed interval has confidence level alpha, and in the same way for a significance test it is not meaningful to assert that the significance of the complement of the hypothesis is one minus the observed significance level.
- If confidence levels and significance levels had the interpretation of probabilities, there would be no need to use the words "confidence level" and "significance level". Because they are not probabilities of some events but measures of the uncertainty of certain inferential statements, these phrases are necessary to make the difference clear. --Esauusi (talk) 00:15, 16 August 2008 (UTC)
Any constructive thoughts about what should be in the article, as opposed to teaching each other statistics?
Melcombe (talk) 08:41, 5 June 2008 (UTC)
- As mooted above, I have moved the subsection mentioned into a new article Confidence in statistical conclusions. Melcombe (talk) 11:07, 24 June 2008 (UTC)
If, as a layman, I type in the term Confidence Level, I want to see some discussion telling me what this actually means in terms of what real confidence I can have in the result being offered. You have removed this from the article and placed it where no layman looking for it can find it. I would suggest redirecting the term confidence level to significance level, which is more appropriate. In the meantime, I'm putting the information back here. 125.255.16.233 (talk) 10:22, 29 June 2008 (UTC)
table to be added
plus things could be centered and stuff... --Sigmundur (talk) 11:58, 6 September 2008 (UTC)
| probability % | std deviations |
|---|---|
| 50 | 0.676 |
| 75 | 1.154 |
| 90 | 1.653 |
| 95 | 1.972 |
| 99 | 2.601 |
| 99.9 | 3.341 |
| 99.99 | 3.973 |
| 99.999 | 4.536 |
On what distribution is this based? Clearly not the normal distribution, for which correct figures are given at Normal_distribution#Standard_deviation_and_confidence_intervals. Seems to be close to a t distribution with around 200 degrees of freedom. I can't immediately see the relevance of that particular distribution to the current version of the article, however. Maybe Sigmundur could clarify? Qwfp (talk) 13:38, 6 September 2008 (UTC)
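A rough way to check Qwfp's conjecture is to compute the two-sided critical values for both candidate distributions and set them beside the table. The sketch below uses only the Python standard library; the t quantile is obtained by numerically integrating the density and bisecting. The choice of exactly 200 degrees of freedom is an assumption taken from the "around 200" guess above, not something the table's author stated, and the function names are illustrative.

```python
import math
from statistics import NormalDist

# Values copied from the table above: confidence level (%) -> multiplier.
TABLE = {50: 0.676, 75: 1.154, 90: 1.653, 95: 1.972, 99: 2.601}

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p >= 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sided_normal(level):
    """Multiplier z such that P(|Z| <= z) = level for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def two_sided_t(level, df):
    """Multiplier t such that P(|T| <= t) = level for a t variable with df d.o.f."""
    return t_quantile(0.5 + level / 2, df)

if __name__ == "__main__":
    for pct, listed in TABLE.items():
        z = two_sided_normal(pct / 100)
        t = two_sided_t(pct / 100, 200)  # 200 d.o.f. is an assumed value
        print(f"{pct:>5}%  table={listed:.3f}  normal={z:.3f}  t(200)={t:.3f}")
```

Printing the three columns side by side makes it easy to see which distribution the listed multipliers track more closely.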
Confidence Level / Significance Level
(Message and first reply copied from users' talk pages for general info)
The latest change is better (though the link doesn't work). However, the article still needs a simple definition of confidence level (i.e. something a layman can understand), and more importantly, an indication that, when used with statistical testing, it should be interpreted as indicating the significance level, preferably with a link to that article. The reason for this is simply the obvious fact that, for the average layman, the reason they would be looking up confidence levels or confidence intervals is that they have seen something in print reporting the results of statistics-based research and want to know what the terms mean.
We can do the standard wikipedia "who can hold their breath longest" thing over this, but I'd rather be sensible about it. Perhaps if you changed the article to include these changes in the way you want, we can avoid the baby stuff.
Jim 125.255.16.233 (talk) 12:44, 12 October 2008 (UTC)
- The link to further down the article works for me, but only moves to a section containing the definition. Your earlier edits seemed to want to imply that it is standard to use "significance level" in the context of a confidence interval and "confidence level" in the context of a significance test. I don't think this is done; it is only of use when developing one or other via the significance-test inversion approach. You may be right that a more informal description of confidence levels is needed ... possibly a short separate article would be better than trying to fit it in within the article for confidence intervals. Melcombe (talk) 09:22, 14 October 2008 (UTC)
The problem is that, for laymen, the usual contact they have with statistics is of the kind where they see stated something like "scientists have discovered that eating onions increases your chance of getting brain cancer by 52%", which is inevitably nonsense. If they research this they will see the statistic printed as 0.52 (95% CI, 0.29-0.90). When they look up CI they will find it means confidence interval. When they look that up they will see that this means that 0.29-0.90 is the interval likely to contain the chance of getting brain cancer, with the confidence level 95% indicating how likely that is. And that's where they stop, "knowing" that the scientist is 95% likely to be right, i.e. that eating onions almost certainly causes brain cancer. If they look up significance level they will find all the warnings about what the 95% actually means and about multiple tests, but there is nothing here to guide them to that, and there are no warnings here. There needs to be something at the top of the article that leads to a layman's explanation of this problem. A definition of the confidence level in terms of the significance level is one way of doing that. Can you suggest another way? 125.255.16.233 (talk) 09:46, 15 October 2008 (UTC)
- There's one proof-by-contradiction-type argument that I've seen somewhere (but unfortunately can't remember where) that brought home to me that you can't have it both ways, i.e. you can't interpret frequentist confidence intervals as meaning there's 95% probability of the true parameter lying within the calculated values:
- Say two samples from the same population result in non-overlapping 95% confidence intervals. Correctly interpreted, this is perfectly possible, just very unlikely (1 in 800??). If you misinterpret the intervals as probability statements about a random parameter and those two fixed intervals, however, you find that the probability that the parameter lies in either interval is 0.95 + 0.95 = 1.90 (using the axioms of probability and the fact that it can't be simultaneously in two non-overlapping intervals). It's simplest to think about if the top end of one CI happens to coincide exactly with the bottom of the other: e.g. if the 95% CI for the mean is [1.2, 1.5] in the first experiment and [1.5, 1.8] in the second, then the probability the true population mean lies in [1.2, 1.8] is 1.9. But probabilities can't exceed 1, so something's gone badly wrong. This should be pretty convincing to any student of mathematics, but I couldn't call it a layman's explanation of the problem. Qwfp (talk) 11:18, 15 October 2008 (UTC)
- If you go back to the underlying mathematics, and remember that the confidence interval is an expression of the significance level and the probability of the observed event, you will realise that, because 5% of the time the event is just a random event that happens to be rare enough to satisfy the test, the confidence interval it produces could be anything. There's no reason why it should overlap a genuine confidence interval. 125.255.16.233 (talk) 16:02, 17 October 2008 (UTC)
- I think that at one stage there were article versions (for CIs and significance tests) that attempted to define confidence level as one (or 100) minus a significance level and also to define significance level as one (or 100) minus a confidence level, which would be circular. I think that the notions of confidence intervals and significance tests need to be treated separately, as practical applications do not always make any attempt to deal with them jointly. The basic definition of a confidence level is in terms of the coverage probability. If a 2-, 3- or 4-sentence layman's explanation in terms of coverage can be added to the introduction, then OK; if it needs to be much longer than this then either a separate article is needed or a new subsection might be made somewhere. Unfortunately, I haven't seen a good example of an article divided into moderately long layman's and technical portions. I would agree that the CI article needs to have something about constructing confidence intervals by inverting significance tests, but it shouldn't imply that this is always done, which would be the danger in a poorly written layman's explanation in terms of significance levels. Melcombe (talk) 15:11, 15 October 2008 (UTC)
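The "1 in 800??" figure in Qwfp's comment above is flagged there as a guess. Under one simple set of assumptions (two independent samples of equal size from a normal population with known standard deviation, each giving a z-based interval for the mean), the chance of non-overlap can be worked out directly: the intervals miss each other exactly when the two sample means differ by more than two half-widths. The sketch below covers only that special case, and the function name is illustrative; other designs will give other numbers.

```python
import math
from statistics import NormalDist

def non_overlap_probability(level=0.95):
    """P(two independent z-intervals for the same mean do not overlap).

    Assumes equal sample sizes and a known, common standard deviation,
    so both intervals have half-width z * se.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.5 + level / 2)
    # The difference of the two sample means has standard deviation sqrt(2) * se,
    # and the intervals are disjoint when |difference| > 2 * z * se.
    threshold = 2 * z / math.sqrt(2)
    return 2 * (1 - std_normal.cdf(threshold))

if __name__ == "__main__":
    p = non_overlap_probability()
    print(f"P(non-overlap) = {p:.4f}, i.e. about 1 in {1 / p:.0f}")
```

Under these assumptions the probability comes out at a little over half a percent, so non-overlap is rare but noticeably more common than 1 in 800.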
I'm not sure yet that I'm putting this right, so I'll try an analogy. If I were writing a book about the countries of the world I might include a section on Myanmar. But if someone looks for Burma they will be looking in the wrong place. So I would include a reference under Burma to Myanmar. If a layman looks up confidence in wp, seeking to understand what the result of a statistical test means, they will be taken to this article. There is nothing here to explain to them that they need to go look at the article on statistical significance. Either this article is divorced from statistical testing and so that reference needs to be here, or there needs to be a prominent section for laymen explaining the use of confidence intervals in understanding the results of statistical testing. 125.255.16.233 (talk) 16:02, 17 October 2008 (UTC)
- You may be expecting too much of what is only one of about 1500 statistics-related articles. However, I have added a new section at Confidence interval#Relation to hypothesis testing that may be a start on what you are thinking of. Ideally it should have a formal proof of the equivalence. Melcombe (talk) 12:14, 21 October 2008 (UTC)
This misses the point in that it is hardly something that laymen are going to understand. You must be an academic because you don't seem to have any idea of how to write for the general public. :-) Might I suggest the following to replace the last statement in the first section: "When the result of a statistical test is presented with a confidence interval, the confidence level of the interval indicates the significance level of the test (significance level is 100% minus confidence level)." With significance level linking to the appropriate article. This addresses my concerns; would you be happy with it? Jim 125.255.16.233 (talk) 14:12, 21 October 2008 (UTC)
- No. You want "When the result of a statistical test is presented with a confidence interval" ... but the result of a statistical test is either yes or no (accept/reject) and a confidence interval is a different thing. If both a test and a confidence interval are presented there need not be a connection between their associated probabilities. Possibly you could have the result of a statistical analysis being presented as a confidence interval, with the confidence interval either covering or not covering a particular value of special interest, and this could then be interpreted as providing a significance test. Note that "analysis" is not the same as "test" and that in statistics "test" has a particular meaning. Something like the following might do: "The parameter values inside a 100(1-α)% confidence interval can usually be regarded as being those values for which a significance test (of the null hypothesis that the true parameter value is the given value) would be accepted at a significance level of 100α%, while those outside would be values for which the test would be rejected." Melcombe (talk) 15:29, 21 October 2008 (UTC)
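The correspondence in Melcombe's proposed wording can be checked numerically in the simplest textbook case. The sketch below assumes a z-interval and a two-sided z-test for a normal mean with known standard deviation; the sample figures and function names are made up for illustration. It confirms, for a grid of candidate values, that a value lies inside the 100(1-α)% interval exactly when the test of that value is not rejected at level α.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% z-interval for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

def z_test_p_value(xbar, sigma, n, mu0):
    """Two-sided p-value for the null hypothesis that the mean equals mu0."""
    z = abs(xbar - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(z))

def duality_holds(xbar, sigma, n, candidates, alpha=0.05):
    """True if 'inside the interval' matches 'not rejected' for every candidate."""
    lower, upper = z_interval(xbar, sigma, n, alpha)
    return all(
        (lower <= mu0 <= upper) == (z_test_p_value(xbar, sigma, n, mu0) > alpha)
        for mu0 in candidates
    )

if __name__ == "__main__":
    # Hypothetical data summary: sample mean 10.3, sigma 2.0, n = 25.
    grid = [8 + 0.25 * k for k in range(19)]  # 8.00, 8.25, ..., 12.50
    print(z_interval(10.3, 2.0, 25))
    print(duality_holds(10.3, 2.0, 25, grid))
```

This only illustrates the case where the interval and the test are built from the same statistic; as noted in the comment above, a test and an interval reported together need not be linked in this way.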
And what layman will ever have a hope of understanding that? A definition for laymen means no jargon and no mathematical formulae beyond simple arithmetic. That's why I spelt out the word minus. I regularly read published papers that say things like: "Total energy was associated with increased risk for both local and regional/distant stage disease. The adjusted odds ratios [95% confidence intervals (CIs)] contrasting highest to lowest quintile of energy intake were 2.15 (95% CI, 1.35–3.43) for local and 1.96 (95% CI, 1.08–3.56) for regional/distant disease." In other words, they calculated the odds ratio to be 2.15, generated a confidence interval of 1.35 to 3.43, and because it does not contain 1 are saying that the result is positive. So we see that the result of a statistical test (that there was an association) is being presented with a confidence interval (95% CI, 1.35–3.43). I want a layman seeing this in a paper and coming here to look up confidence interval to go from this article to the one on significance levels, where they can read about the pitfalls of accepting this as scientific fact. How would you suggest I phrase it? Without using jargon or "complex" formulae? 125.255.16.233 (talk) 13:00, 22 October 2008 (UTC)
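To make the link between the quoted intervals and a significance level concrete, the following sketch backs out an approximate test statistic and p-value from a reported odds ratio and its 95% interval. It assumes the interval was built as a Wald interval, symmetric on the log-odds scale; the quoted excerpt does not say how the intervals were computed, so the results are indicative only, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def implied_z_and_p(estimate, lower, upper, level=0.95):
    """Approximate z statistic and two-sided p-value for 'odds ratio = 1'.

    Assumes log(estimate) +/- z_crit * se gives log(lower) and log(upper).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p

if __name__ == "__main__":
    # Figures quoted in the comment above.
    for label, est, lo, hi in [("local", 2.15, 1.35, 3.43),
                               ("regional/distant", 1.96, 1.08, 3.56)]:
        z, p = implied_z_and_p(est, lo, hi)
        print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```

Both intervals exclude 1, so both implied p-values fall below 0.05, which is the sense in which the 95% interval is being read as a test at the 5% level.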
Request for expert help in making this article more usable
I am an experienced user of statistics, but not a statistical expert. I came to this page for help because journal editors increasingly require authors to report confidence limits of population statistics, rather than standard deviations or standard errors, for reasons which I understand and find persuasive.
Obviously, I expect eventually to have to consult text books and if need be the original literature, and I am capable of doing that. But I expect an encyclopaedia article to give me an orientation to the subject, tell me the main terms to look out for, and if the matter is simple enough enable me to get on with my job. I also expect it to tell me where I can read more - the sneering tone above about people's desire for references is way out of order. What I found was almost entirely useless for my purposes, which I suspect are like those of the majority of people who will want to use this article. The article told me a certain amount that I already knew, and then plunged me immediately into more mathematics, in more pedantic detail, than I needed or could easily cope with. It also contained a certain amount of what I strongly suspected was philosophical grandstanding - and a glance at the talk page confirmed that impression. We all know there are issues at the philosophical foundations of probability, and that there are differences of opinion between classical and Bayesian statistics; a page about a particular topic that lies essentially within the classical sphere is not the right place for expressing those disagreements. It is right to point out somewhere that they exist, and that a confidence interval, though it occupies the same role as a credible interval, is not the same; but after that we should hear no more of the matter.
Accordingly, I am going to try to reorganize the article to improve its usability. In doing so, I appeal for help to all the experts who have, from their different points of view, contributed to its content. Although I intend to rely largely on material that is already present, I am going to cut out (or relegate to later sections) some detail, and as a user rather than an expert, I am also almost bound to introduce solecisms. I am confident that despite the disagreements aired above, you will be able to help, and in a consensual way. I do not believe that expert statisticians really disagree about the kind of statement that needs to be included in an article at the level appropriate for Wikipedia; and if there are differences of view, they should be signalled in the article but we should not be trying to resolve them - that would be the dreaded Original Research.
Having written a lengthy comment, I now have no time to start the work of revision. I'll be back later to get on with it, and I look forward to your help. seglea (talk) 16:55, 12 April 2009 (UTC)
- Well whether or not you consider it "philosophical grandstanding" your opening comment that "By definition, the confidence interval with probability P has that probability of including the true value of parameter which is being estimated." while perhaps defensible on some readings will certainly be misinterpreted by many readers, as was discussed ad nauseam on this page previously. Is there actually a reason why you want the page to be misleading? Jdannan (talk) 21:45, 12 April 2009 (UTC)
- Hah, good, I see the previous commentators have not forgotten this page, so we have a chance to work on it. I have now done my best with the opening section, which I have reduced to a (fairly) short opening para and a longer section now called "Conceptual basis". It is important to get this correct and effective before moving on to tidying up the rest of the article, so I am glad you are opening the discussion up.
- In answer to your specific point:
- (a) That is the definition of CI that I have found in the only stats book I happen to have on my shelf at home - I will check others next time I am in my office, but my memory is that they say the same thing. If this definition is not accepted by all, we are going to have to give an alternative, equally brief and conceptual, definition, and give references to where they are each discussed. It is not appropriate to try to resolve within a Wikipedia article matters on which qualified experts disagree - all we can do is try to help readers be aware of and if possible understand the disagreement.
- (b) Having read the discussions above, I think I can see the misinterpretation you are concerned about. I have now expanded the "Conceptual Basis" section to try to head them off. From the timing of your comment above, I think you caught this process half way through. Please have a look at the current version and see if it is addressing the point you are worried about. It is not very elegantly expressed yet, but if the point is right we can work on its expression later.
- (c) Without disputing the need to get this correct, I have yet to see that if readers made the misinterpretation you are concerned about, they would then go on to make errors of inference about data. (I am perfectly open to persuasion on this, but I need to be shown an example before I'll believe it.) If the interpretation is important to the statistical theory of CIs but not to their practical use, then all we need do is note that at the theoretical level the definition needs further discussion, and move that discussion to the theoretical section of the article. seglea (talk) 22:08, 12 April 2009 (UTC)
- The "errors of inference about data" are that even talking in a probabilistic way about (unknown but constant) parameters is a category error for a frequentist. It makes about as much sense for them to say there is a 90% probability that x lies in the interval [2.3,3.4] or whatever as it would for them to talk about the colour of x. Jdannan (talk) 02:01, 13 April 2009 (UTC)
- Thank you for your response. I understand your concern and have modified the text to try to meet it. It doesn't seem appropriate to go into this matter in the opening paragraph, as (regrettably, no doubt) it will on first encounter be largely meaningless to most people who find themselves wanting to use confidence intervals. So in the opening para I have made clear that this isn't an ideal way of expressing things, and in the next section I have tried to put it precisely; and would welcome help in getting that bit right. seglea (talk) 15:56, 13 April 2009 (UTC)
"A confidence interval is always qualified by a particular probability (say, P), usually expressed as a percentage; thus one speaks of a "95% confidence interval"." P is not a probability, it is the Confidence Level. At best, it is the level of probability at which a result is accepted by a researcher, or perhaps the probability of not getting a type I error. The real probability of a particular interval containing the parameter may be much higher.
"By definition, the confidence interval with probability P has that probability of including the true value of the parameter which is being estimated." This is total bullshit. P is the percentage of confidence intervals that the generation process produces that will contain the parameter. Different thing. To get the actual probability of the interval containing the parameter you would have to know the details of the generation process and do some complex arithmetic.
More seriously, a general introduction should be for a general user, not a mathematician. Yours isn't. If you want to write a good general introduction to this article, go and read the entry in the Encyclopaedia Britannica. They do in about two paragraphs what this article fails to do in pages of waffle.
Until I see a good introduction, I'm putting the page back to where it was. It may have been hard to read, but at least it wasn't bullshit. —Preceding unsigned comment added by 125.255.16.233 (talk) 23:58, 12 April 2009 (UTC)
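The distinction drawn in the comment above, that the confidence level describes the share of intervals produced by the procedure that contain the parameter, can be illustrated by simulation. The sketch below assumes a normal population with known standard deviation and a z-interval for the mean; the parameter values, seed and function name are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(true_mean=10.0, sigma=2.0, n=20, level=0.95, reps=5000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw a fresh sample and build its interval around the sample mean.
        xbar = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    print(f"Share of intervals covering the true mean: {coverage():.3f}")
```

The printed share should sit close to the nominal level, while any single interval from the run either contains the true mean or does not.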
- I am afraid I have simply reverted your edit, which was inappropriate on several grounds:
- (a) You really should get yourself a userid so that your contributions can be properly identified and your areas of expertise recognised.
- (b) Your comments are falling short of the basics expected under WP:Civility.
- (c) You have reintroduced into the opening para a basic error (that a 90% CI will be wider than a 95% CI).
- (d) As I am not a mathematician (and have made that clear above), I would be quite incapable of writing an introduction that was suitable for one. I am attempting to write for the most likely user of this article - the person who encounters the term "Confidence interval" in a scientific text and wants to check what it means. If you find parts of the opening paragraph overly mathematical in expression, I suggest that you try some detailed editing to improve them, rather than wholesale reversion.
- (e) As I have already indicated in my responses to the helpful interventions of Jdannan, I understand the issue about defining a CI in terms of a probability - but it happens to be the definition that is widely used, and as such it needs to be given. A formulation very like the one you propose already appears in the next section of the article, to explain what is actually meant, and I am afraid this leads me to believe that you may not have read what you were criticising. I agree that it would be better to introduce the P value as the Confidence Level rather than setting people off on the wrong track by calling it a probability, and I am just going to change the text to do that.
- Where I do agree with you is that this article contains a great deal of unnecessary material, some of it repetitive. I am working from the top down to eliminate that and will not be able to do it all in one edit. I do not think that the 2 paragraphs you cite for the Britannica article are a necessary target; the present material includes some useful mathematical sections that should be retained, though not in the earliest parts of the article. seglea (talk) 15:56, 13 April 2009 (UTC)
- And I see that Confidence Level redirects to this article, so that term should certainly be in the opening para, in bold - it now is. seglea (talk) 16:05, 13 April 2009 (UTC)
- IP editors are welcome if their edits meet with consensus, as are you. But do you really think you should be throwing out accusations of incivility, after your first missive here? Allowing time for discussion of your concerns before making changes might have caused less fireworks and reduced the chance of your work being reverted. But I agree with you on (c), at least, and that there is plenty of scope for improvement. -- Avenue (talk) 18:02, 13 April 2009 (UTC)
- Unreserved apologies to all if my opening remarks came across as uncivil; they were not intended to be; I was simply trying to state what I thought was wrong with the article from my point of view as a likely user, to set out a strategy for improving it, and to recruit editors of good will and greater expertise than mine (among whom I strongly suspect you can be counted) to help in the process. seglea (talk) 18:48, 13 April 2009 (UTC)
Suppose a researcher conducts a study in which she does ten statistical tests and gets one positive link. If each test is done at 95% confidence level then she has ten chances at 5% or over a 40% chance of getting at least one type I error. So the probability that the interval will contain the parameter is just under 60%, not 95%. Textbook definitions are based on the assumption that only one test is done, but in the real world that is very rarely the case.
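The "over a 40% chance" figure in the comment above is the chance of at least one false positive across the ten tests. A minimal sketch of that arithmetic follows, assuming the tests are independent and that every null hypothesis is true (neither assumption is stated in the comment); the function name is illustrative. It covers only this step of the argument, not the further inference about a single interval.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one type I error) over n independent tests, all nulls true."""
    return 1 - (1 - alpha) ** n_tests

if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} tests at 5%: {familywise_error_rate(k):.3f}")
```

For ten tests this gives 1 - 0.95^10, which is about 0.40.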
The real problem with this article is one common to most of the mathematical articles in this monstrosity, which is that it is so overburdened with fiddly technical definitions and qualifications that no ordinary person has much hope of understanding it. Show the opening paragraph to your wife and then ask her to tell you what a confidence interval actually is; I doubt she'll get past "interval estimate". Then do the same with the Britannica article, and you'll see what I'm talking about.
"I am attempting to write for the most likely user of this article - the person who encounters the term "Confidence interval" in a scientific text and wants to check what it means." -- I'd really like to see an article here that such a user could understand, but it won't happen.
With reference to (c), that got snuck in on 31 March. It wasn't there last time I read it. :-) There are too many idiots editing this monstrosity.
With reference to (b), I've been through one revert war already, and have seen what 'civility' means here. :-) Get yourself a thicker skin. —Preceding unsigned comment added by 125.255.16.233 (talk) 11:10, 14 April 2009 (UTC)
- Now, if I wanted to be tiresome I could suggest that generalising from one bad experience is no way for someone who claims to know about statistics to behave. Having edited hundreds of Wikipedia articles, and seen many of them grow into fine and usable documents through good-humoured collaborative effort, my experience is different. Until proved wrong, I'll assume that the same process can work for this article, and I am doing my best to look through your dyspeptic way of expressing yourself to what you are saying. Indeed, I've just made a couple of edits that reflect the points you are making above. Actually I think we (and other contributors) have the same aim here - namely to reduce a sprawling article to a usable one. That should not mean sacrificing technical rigour, it just means getting it in the right place. However, I edit on this site for pleasure and my own education, and neither of those is added to by interacting with people who don't have any manners; so I have a general policy of ignoring people who are simply rude. seglea (talk) 21:56, 14 April 2009 (UTC)
I have made some more tweaks to the intro, and removed a couple of chunks lower down that seemed out of place. If anyone thinks they were needed, can you explain what their particular role was, please? seglea (talk) 21:56, 14 April 2009 (UTC)
Re: 283982996 -- this stays in until I see the point made elsewhere in the article, in language my wife could understand. :-) —Preceding unsigned comment added by 125.255.16.233 (talk) 12:29, 15 April 2009 (UTC)
- My concern about these paras is that, though the points are fair enough, I am not sure they belong in this particular article - they are very general issues about statistical inference. This kind of thing (i.e. general issues finding their way into specific articles) is a common cause of bloat in Wikipedia articles - and this article is by common consent suffering from bloat. I will have a look around and see if they can be placed somewhere more suitable and dealt with more briefly here, by a link. seglea (talk) 22:25, 15 April 2009 (UTC)
Firstly, given that both Confidence Level and Confidence Interval point here, where else is a more appropriate place to talk about what the word confidence means in a statistical context? Where else is someone looking up the word confidence going to end up? Secondly, given that the reason many people will look up confidence level/interval is because they've seen a published statistic and want to know what the 95% means, this is the appropriate place to tell them. Thirdly, in my experience "move it to a more suitable article" is a weasel way of saying delete it without appearing to. The wikipedia indexing function won't find the article "Meaning of the term confidence in a statistical sense" when someone types in confidence level, and so the content is buried. My suggestion is to have another section directly under the main section with a title like "Confidence Intervals and Statistical Tests" which tells laymen how to interpret confidence intervals when presented with the results of statistical tests. That to be followed by the technical definitions. But I'm happy so long as this remains. —Preceding unsigned comment added by 125.255.16.233 (talk) 00:38, 16 April 2009 (UTC)
- Let's recall that this stuff was deleted previously and put back unnecessarily by this same person who insists that only he is right. And now put back for a second time. The article Confidence in statistical conclusions was created so as to have a place for this stuff which clearly does not belong in this article but might have some small merit ... but by the tags added no one else seems to think so. Of course this wasn't good enough. The key to "making this article more usable" must be to first omit all the stuff not directly related to the article's title. Melcombe (talk) 08:53, 16 April 2009 (UTC)
Archive proposal
This Talk page is getting clumsy and could do with archiving. I propose archiving everything prior to 2009 - any objections, or suggestions of a different cutoff point? seglea (talk) 22:25, 15 April 2009 (UTC)
- I agree it's too big, but the discussions in June 2008 seem to overlap with some of the points raised recently. How about just archiving everything prior to that? -- Avenue (talk) 00:16, 16 April 2009 (UTC)
I'm a bit rusty in statistics, but......
The terms confidence interval, confidence level and degree of certainty need to be clarified. If they are synonyms, please list them. BTW, is there any APPROVED STATISTICS TERMINOLOGY? —Preceding unsigned comment added by 124.78.212.232 (talk) 11:37, 10 May 2009 (UTC)
I recommend creating a separate article for confidence level, even if it is only a few words long.--124.78.212.232 (talk) 11:58, 10 May 2009 (UTC)
In a book I own, Introduction to Business Statistics by Alan H. Kvanli, the relationship between confidence interval and confidence level is described as follows:
The higher the confidence level, the wider the confidence interval. The confidence level is written as (1 − α)·100%, where α= .01 for a 99% confidence interval, α= .05 for a 95% confidence interval, and so on.--124.78.212.232 (talk) 12:12, 10 May 2009 (UTC)
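The relationship quoted above (a higher confidence level gives a wider interval) can be checked numerically. The following is only a minimal sketch: it assumes a two-sided interval for a normal mean with known standard deviation, and the function name `half_width` is made up for illustration.

```python
from statistics import NormalDist

def half_width(confidence, sigma, n):
    """Half-width of a two-sided z-interval for a normal mean (sigma assumed known)."""
    alpha = 1.0 - confidence                     # e.g. alpha = .05 for a 95% interval
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    return z * sigma / n ** 0.5

# Same data setting, three confidence levels: the interval widens as the level rises.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: mean +/- {half_width(level, sigma=1.0, n=1):.3f}")
```

With sigma = 1 and n = 1 the half-widths are just the familiar normal quantiles, so the 95% line shows roughly 1.96.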
I tried to get the two terms separated some time back, but there are forces at work here that don't want to see that happen. 125.255.16.233 (talk) 14:47, 7 June 2009 (UTC)
Avoid jargon words in the definition of the term.....
The definition of the term at
https://www.involvenursing.com/SADTN/web_101_glossary.jsp
is more understandable than the one at the beginning of this article.--124.78.212.232 (talk) 12:25, 10 May 2009 (UTC)
Meaning of the term "confidence"
Hello. I'll admit right away that I'm not a statistician, but this section had me so confused that I feel forced to question it.
"In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in twenty or less."
Wouldn't the researcher only see it if his observation is incorrect? If we reject hypothesis A with a confidence interval of 95%, we have not necessarily observed something that happens one time in twenty. If hypothesis A is CORRECT (which is not necessarily the case) and we STILL reject it due to our observations, then we have seen something that happens one time in twenty or less (and vice versa). This text says kind of the opposite.
- A statistical test involves looking at the probability of an event and comparing it with the significance level. If the probability of the event is less than the significance level you reject the null hypothesis, which is that the event happened by chance. If the confidence level is 95%, the significance level is 5%, or 1 in 20. If you reorganise the mathematics so that you are testing on a confidence interval, the basis for the test is still that there is a 5% chance that the expected statistic will fall outside the interval. That's what the 95% means. So when you do a statistical test at 95%, you are looking to see if something has occurred that has a probability of happening of 5% or less. If the result is positive, then the question arises as to whether it is a false positive (a type I error) or a real result. If you only do one test, and it returns a positive result, then 95% of the time it will be a real result.
- "If you only do one test, and it returns a positive result, then 95% of the time it will be a real result." NO this is flat-out wrong. It is simply not possible to infer a probability that an effect is "real" or not based on frequentist statistics. If the null hypothesis is true, then in 5% of experiments a test will generate a "significant" result (according to the test) and 95% of the time it will not. That's all. Jdannan (talk) 04:11, 8 June 2009 (UTC)
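The objection above can be made concrete. How often a "positive" is a real effect is not fixed by the significance level alone; it also depends on quantities the test does not supply. The sketch below applies Bayes' theorem under two explicit assumptions that are not part of the discussion: a prior proportion of real effects (`prior`) and a test power (`power`). Both are hypothetical inputs chosen only for illustration.

```python
def prob_effect_is_real(prior, power, alpha):
    """P(effect is real | test is significant), by Bayes' theorem.

    prior: assumed fraction of tested hypotheses with a real effect (hypothetical)
    power: assumed probability of a significant result when the effect is real (hypothetical)
    alpha: significance level, i.e. P(significant | no effect)
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)

# Same 5% significance level, very different answers depending on the assumed prior.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(prob_effect_is_real(prior, power=0.8, alpha=0.05), 3))
```

Under these assumptions the answer ranges from about 94% down to about 14%, so no single figure such as "95%" follows from the test on its own.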
"If one were to roll two dice and get double six (which happens 1/36th of the time, or about 3%), few would claim this as proof that the dice were fixed, although statistically speaking one could have 97% confidence that they were."
- Assuming you had specified your test before rolling the dice, this is true enough, but note that the 97% "confidence" here does not equate to believing the dice are fixed with probability 97%. Jdannan (talk) 04:11, 8 June 2009 (UTC)
The chance is also 1/36 of rolling a double one, two, three, four or five. Even if you get a mix, by the above reasoning you could say: "I got a 3 and a 5, the odds of that are only 2/36, or about 6%, so I am 94% confident that the dice are fixed". "Statistically speaking" this whole reasoning must be considered faulty, since you're very certain that the dice are wrong no matter what. At least this sounds to me as though the test is made "a posteriori". You need to, so to speak, FIRST set up a confidence interval, THEN look at the dice. Here you are purposefully constructing a confidence interval that does not include the sample outcome. Maybe I'm just reading it wrong; it would work if you first say, "if the dice are normal, I am 97% confident that they will turn up a result from 2 to 11", then roll the dice once and discover that they failed your test. If that's the case, I think it needs to be clarified.
- The underlying assumption is that you are trying to roll a double six, and hence that the probability of doing so is what matters. If you use your 3 and 5 argument, you are really saying that if you roll the dice and get two numbers then the dice must be fixed. But the chance of that is 100%. Or maybe you are saying two different numbers; the chance of that is 5/6. The double six argument is two sixes.
"For example, say a study is conducted which involves 40 statistical tests at 95% confidence, and which produces 3 positive results. Each test has a 5% chance of producing a false positive, so such a study will produce 3 false positives about two times in three."
First of all, the outcome of the tests should depend on the things that are actually tested. Assume for instance that the actual value of the 40 things tested were all positive. Then you could have no "false positives", but in this example, you would have 37 false negatives. I guess the idea is that the tests should all come out negative (if the tests were 100% correct), but 3 come out as false positives. That is to say we do 40 coin tosses with a coin that's 95% likely to end up on one side and 5% on the other, and still we get the other 3 times out of 40. Then I must ask how it's been calculated. I would get
0.95^37 * 0.05^3 * 40! / (37! * 3!) = 0.185 as the likelihood of 3 false positives, so how is it "two times in three"?
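The binomial figure just above can be reproduced in a few lines. This is only a check of the arithmetic, assuming 40 independent tests that each give a false positive with probability 0.05 and no real effects; `binom_pmf` is an illustrative helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 3 false positives out of 40 tests at the 5% level.
p_three = binom_pmf(3, 40, 0.05)
print(round(p_three, 3))  # 0.185
```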
- Two times in three the study will have produced 3 false positives, there being no real effects. The next most likely outcome is two false positives and one real result; obviously the chance of that is less than one third. Within the third, there is a very minute chance that the study will have produced no false positives and 37 false negatives. I can't calculate the exact probability of that for you because I don't know the numbers involved, but it is insignificant. You can't start by making an assumption about which results are real because if you knew that, why would you be doing statistics?
- Just noticed... The chance of a positive being a false positive is 5%, because that's what the test is based on. The chance of a negative being a false negative is dependent on the strength of the effect and the number of tests. It's not 5%; unless you are dealing with small sample sizes it's a hell of a lot less. —Preceding unsigned comment added by 125.255.16.233 (talk) 23:28, 7 June 2009 (UTC)
"Thus the confidence one can have that any of the study's results are real is only about 32%, well below the 95% the researchers have set as their standard of acceptance."
Well, the confidence that any single result is correct is 95%, obviously (what does "any of the study's results" mean if not these?). 213.100.32.24 (talk) 23:20, 6 June 2009 (UTC)
- Another way of explaining this is as follows: A statistical test is based on the probability of an event. When you do multiple statistical tests you are looking at multiple events. When you select the ones that are positive you are looking at multiple events and selecting. An underlying law of probability is that when you do this the events are not independent, and hence the probability of all the events must be taken into account when calculating the probability of any individual event. The mathematics used to calculate p-values, confidence intervals, and so on are derived on the assumption that there is only one event. Hence, when you do multiple tests you break that assumption and you get the wrong answers. In every case the actual probability will be much higher than the probability tested. In this case the tests were conducted at 95% confidence but because more than one was done the real confidence is about 32%. To get the 95% desired the individual tests would have had to be conducted at a much higher confidence level. 125.255.16.233 (talk) 14:44, 7 June 2009 (UTC)
- One, what is wrong? You said yourself that you are not a statistician, so it would be useful to see what you regard as actually wrong and why. What you have shown above is a lack of understanding of the mathematics underlying statistical testing, which is understandable, so I assume you are not going to argue with me over that. If there's something wrong with the wording I'm happy to see if we can find a more acceptable phrasing. Two, Confidence Interval and Confidence Level both refer to this article. I tried to change that once and failed dismally. So this article, like it or not, has to cover both topics. 125.255.16.233 (talk) 16:38, 7 June 2009 (UTC)
- Well I'm not the one who wrote most of the above. And I am a statistician. And this article is about the term confidence interval, so an explanation about 'confidence' in this article should relate to confidence interval. There is of course a one-to-one relation between testing and confidence intervals. But the sentence "In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in twenty or less." may not be wrong if interpreted in the right way, but it is at least not at all understandable, and also not directly related to confidence intervals. Also the next passage, "If one were to roll two dice and get double six (which happens 1/36th of the time, or about 3%), few would claim this as proof that the dice were fixed, although statistically speaking one could have 97% confidence that they were. Similarly, the finding of a statistical link at 95% confidence is not proof, nor even very good evidence, that there is any real connection between the things linked.", doesn't contribute much to the understanding of 'confidence'. Nijdam (talk) 09:00, 8 June 2009 (UTC)
- Firstly, the terms confidence interval and confidence level both redirect here. Hence, even though the title is confidence interval, the article is about both terms. I feel the two terms should be split into two separate articles. Good luck making it happen; I tried and got shouted down.
- Secondly, the point of this section is to highlight the difference between the normal meaning of the word 'confidence' and 'confidence' as a statistical term. When a layman talks of being 50% confident of something he means that he is half sure that it is right. I hope you agree that the result of a statistical test that is reported with a 50% confidence level does not mean the same thing. I believe that it is entirely appropriate within an 'encyclopedia' supposedly for the masses to ensure that that point is clear. 125.255.16.233 (talk) 12:01, 8 June 2009 (UTC)
- Nuts! This section continues to conflate study-wise error and error in individual tests. How can it possibly be that if I conduct one hypothesis test and get result A and you conduct tests A and B that our results for A mean different things? It's nonsensical. There isn't some probability fairy out there who watches over what tests you do and changes the likelihood of a result on that basis. The joint probability of A and B is of course lower than the individual probability of A or B. You lower the critical value alpha for the tests on A and B not because anything has changed about those individual tests, but because you want to reduce the likelihood that any of the results you present, all together, are in error. But again, your decision with A is no more or less likely to be in error if you publish it alone than if you publish it along with B. --161.136.5.10 (talk) 18:43, 30 June 2009 (UTC)
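The distinction drawn above between the error rate of one test and the error rate of a whole study can be put in numbers. The sketch below assumes independent tests with every null hypothesis true, which is a simplifying assumption and not something stated in the discussion; `familywise_error` is an illustrative name.

```python
def familywise_error(alpha, m):
    """P(at least one false positive) among m independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** m

m = 40
per_test = 0.05
print(round(familywise_error(per_test, m), 3))      # study-wise rate at 5% per test
print(round(familywise_error(per_test / m, m), 3))  # after lowering alpha to 0.05 / 40
```

The per-test error rate is whatever alpha is chosen, regardless of how many other tests are run; only the chance that at least one of the reported results is in error grows with the number of tests, which is why alpha is lowered for the set as a whole.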
Neutrality Dispute: It's clear from the above discussion that there is some debate about the 'Meaning of the term "confidence"'. Also, I have some doubts about the neutrality of this particular section and whether it is encyclopedic in its current form. For example, the statement that 95% confidence intervals aren't "even very good evidence" is clearly not neutral. (Indeed, when does statistical evidence become 'good' anyway?) Secondly, it doesn't really clear up the meaning of confidence intervals. Rather, it is devoted in the main to discrediting the 95% confidence interval. Yet this itself misleads the reader into thinking that all we need to do is up the confidence level -- when in fact most misconceptions about confidence intervals are to do with their interpretation, not with what level is chosen. It would be enough to mention that there is nothing special about the 95% confidence intervals so frequently used in research.
As for interpreting multiple statistical tests as giving confidence to the study itself... Who does this? In my experience of scientific research (where confidence limits are rife) I've never come across this.
All in all, too many slights at us wretched laymen and non-statisticians, and not enough cited facts to speak for themselves. Unfortunately, I'm one of these poor, unfortunate 'laymen', and so I don't have the confidence to edit the section and make it factually accurate myself (you've got to give me credit for that pun, folks! ;-) ) Robnpov (talk) 09:27, 24 July 2009 (UTC)
- The aim of this section is to point out the difference between the common usage meaning of the word 'confidence' and its meaning as a technical term. Specifically, it is intended to make the point that 95% confidence does not mean 'unlikely to be wrong', but rather pretty much the opposite. If you think it needs rephrasing, have a go at it.
- If you are claiming that this section is not neutral, then you are saying that it is slanted towards some position that is in dispute. What is that position? If it is that statistical evidence is not very good evidence, well, you've made the same point above.
- I'll give you a week. If you don't respond I'll remove the warning. 125.255.16.233 (talk) 12:25, 26 July 2009 (UTC)
- It's now been 2 weeks since you tagged the section and you have neither responded nor made any attempt to modify the section. I assume therefore that your tagging of the section is nothing more than vandalism and am removing the tag. 125.255.16.233 (talk) 03:00, 8 August 2009 (UTC)
- Well, I'm sorry for not replying to your points. I only edit Wikipedia occasionally, when I spot something and think it needs attention. If I have the time to research and do a proper job of it, I do try to edit the article rather than just tag it. But sometimes I forget to go back and follow up.
- So, can I assure you that the tag was added in good faith. Please don't forget the Wikipedia principle, WP:Assume good faith. Several things should have indicated good faith too: I tagged only the subsection rather than the whole article, I didn't blank the section, I didn't add nonsense/offensive material, and I explained the edit on the talk page. Check out WP:Vandalism before throwing that term around in future.
- Anyway, to the article. I think my original post already answered your questions, though. Here's what is in dispute -- the meaning of the term confidence in a statistical context as presented. In addition, as I said, I personally dispute that 95% confidence intervals "aren't very good evidence". There should be no value-judgement here. Nor do I see how this is relevant to the meaning of confidence intervals.
- This needs reliable, published sources about the meaning of confidence in statistics. The problems are a mixture of neutrality, tone and relevance, and I am sorry if NPOV was a bad choice of tag. But previous editors on this talk page seem to back up my assessment.
- I note that you are the original contributor of this section. It is very useful that you have clarified what the main point of this subsection is: to distinguish between the everyday and statistical usage of the term 'confidence'. Thank you for this. I will edit the article to emphasise that point as soon as I get the chance. Hopefully a few statisticians will weigh in and judge my efforts on their accuracy...
Relation to testing
In the section about the relation with statistical testing I read twice: 'second' parameter. What is the meaning of this 'second'? Nijdam (talk) 13:24, 8 June 2009 (UTC)
- If you measure the average height of a group of people and generate a confidence interval on it, then measure the height of a second group of people, if the second height does not fall in the confidence interval then you can reject the hypothesis that the true average heights of the two groups are the same. With lots of caveats about distributions. And buried in waffle. 125.255.16.233 (talk) 10:41, 9 June 2009 (UTC)
- Hence the caveats and the waffle. 125.255.16.233 (talk) 14:19, 9 June 2009 (UTC)
- I think this section is wrong. It seems to say: Suppose f(.|q) is a density with parameter q, and x1,...,xn is a random sample from this distribution, with X=t(x1,...,xn) and also (a,b) is a P-confidence interval for q, then if y1,...,ym is a random sample from f(.|w), with Y=t'(y1,...,ym), then the complement of (a,b) is the critical region for testing H0: w=X on the basis of Y. If so it is nonsense, and it seems to show the error often made by people who have difficulty in understanding CI, to give the CI the role of acceptance region, which it is not. Nijdam (talk) 15:27, 9 June 2009 (UTC)
"Interval" vs "length of an interval"
The second sentence of the article "Instead of estimating the parameter by a single value, an interval likely to include the parameter is given." could be improved by saying "the length of an interval" instead of "an interval". (I agree that an example in the article does make it clear that a confidence interval is not a specific interval with specific numerical endpoints. But why not start out on the right track?) Or you might want to resort to more lengthy language that says it is a conceptual interval centered about the unknown true value of the parameter being estimated.
Tashiro (talk) 19:22, 3 July 2009 (UTC)
- A confidence interval does indeed have specific endpoints; it's not just a length. Michael Hardy (talk) 17:37, 24 July 2009 (UTC)
Needs Attention, Rewrite
I have a technical background - programming, databases, etc. - but it's been a few years since I studied stat or calculus. Still, I couldn't understand the first sentence, and I made several tries. Maybe I just need more coffee, but I don't think this article would be any use to anyone but a student of mathematics. All I want is to calculate confidence intervals for a data set, but this article does not help me. If there's anyone out there who can speak math-geek and also English, please share! 12.168.25.203 (talk) 21:08, 9 October 2009 (UTC)
- I agree; the plural and singular words don't agree and it's confusing. I would try to edit the section, but there's no edit button since the grammar problem is in the beginning. 99.141.167.152 (talk) 01:19, 11 October 2010 (UTC)
- "No edit button": either (i) use the general edit tab that allows the whole page to be edited.... I think the actual position of this changes depending on defaults and options chosen, or (ii) there is a personal-preference option that produces an edit option for the lead section in the same way as for other sections, but you would need to register and sign on for that to work. The remark to which you are replying is a year old now and changes may have been made since then. But go ahead and make "improvements" if you wish. Melcombe (talk) 08:46, 11 October 2010 (UTC)
Relation to hypothesis testing
The article says: one general purpose approach to constructing confidence intervals is to define a 100(1−α)% confidence interval to consist of all those values θ0 for which a test of the hypothesis θ=θ0 is not rejected at a significance level of 100α%.
At least for Binomial, this is not true: for example the 95% "significance interval" for p arising from Bin(n=50,p=.5) will contain p=.5 about 96.7% of the time. Is that only because Binomial is discrete? Or is something more subtle going on?
What's the most extreme example of the difference in performance between CIs and "significance intervals" that the authors of this article can come up with? It might be worth publishing in the main article.
Red Ed (talk) 20:43, 13 October 2009 (UTC)
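The 96.7% figure quoted above can be reproduced exactly. The sketch below assumes one particular construction, an equal-tailed test of p = p0 in which each rejected tail has probability at most α/2; other tests give other regions. The helper names are made up for illustration. Because the binomial is discrete, the tails cannot be made exactly 2.5% each, so the test is conservative and the interval obtained by inverting it covers the true value more often than the nominal 95%.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def acceptance_region(n, p0, alpha):
    """Equal-tailed acceptance region [lo, hi] for H0: p = p0.

    Each rejected tail is the largest set of extreme counts whose
    total probability under H0 does not exceed alpha / 2.
    """
    lo, tail = 0, 0.0
    while tail + binom_pmf(lo, n, p0) <= alpha / 2:
        tail += binom_pmf(lo, n, p0)
        lo += 1
    hi, tail = n, 0.0
    while tail + binom_pmf(hi, n, p0) <= alpha / 2:
        tail += binom_pmf(hi, n, p0)
        hi -= 1
    return lo, hi

def coverage(n, p0, alpha):
    """P(test does not reject p0 | p = p0), i.e. how often the inverted interval contains p0."""
    lo, hi = acceptance_region(n, p0, alpha)
    return sum(binom_pmf(k, n, p0) for k in range(lo, hi + 1))

print(acceptance_region(50, 0.5, 0.05))
print(round(coverage(50, 0.5, 0.05), 3))
```

For a continuous statistic the tails can be set to exactly α/2 and the coverage equals 1 − α, which suggests that discreteness alone accounts for the gap in this example.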
Example is poorly constructed
It is highly unlikely that standard deviation from mean is known but the mean is not. Calculating the standard deviation from mean requires information that provides the mean. The example reads in part: "The distribution of X is assumed here to be a normal distribution with unknown expectation μ and (for the sake of simplicity) known standard deviation σ = 2.5 grams." If the expectation is unknown, it seems contrived and unreasonable to present the variance of X as a known. AngleWyrm (talk) 18:18, 11 January 2010 (UTC)
- Obviously the example is not meant to be particularly realistic, but rather to provide a case where both the setting and manipulations required are simple to understand. Melcombe (talk) 09:39, 11 January 2010 (UTC)
- The current example adds confusion rather than illumination. X is presented as an unknown distribution, with an unknown expectation (mean of X). Then it is assigned the property of a Normal Distribution, a common practice, especially in examples. But the example as it is currently written goes on to say the variance of an unknown distribution is known, a rather extraordinary claim, even to the point of casting doubt on the validity of the methods being used. What can be known is the variance of the sampled 25 test cups. AngleWyrm (talk) 18:18, 11 January 2010 (UTC)
- The article doesn't say "unknown distribution". There can certainly be practical situations, similar to that here where a standard deviation is effectively known exactly ... thus the sample of 25 being considered might be from one batch of many such batches, where experience has shown that the mean changes between batches but the variance does not ... combining information across batches can lead to an effectively exact value. In any case: (1) there is no need to be more complicated than necessary, and (2) the second example ("theoretical example") treats the case you seem to want. Melcombe (talk) 16:19, 12 January 2010 (UTC)
- The example fails to specify the desired size of a cup:
- "...is supposed to be adjusted so that the mean content of the cups is close to 250 grams of margarine." Close to is an arbitrary and subjective quantity. Is the cup supposed to be 250g +/- 100g? Probably not. Is it supposed to be 250g +/- 5g? That's a little more reasonable. I'm suggesting that the desired outcome be explicitly stated. With both μ and σ specified, the "For the sake of simplicity" cop-out can be removed. AngleWyrm (talk) 20:45, 13 January 2010 (UTC)
Proposed change to example: I've created an image for the example. This picture represents the desired output of the factory equipment. The actual output can then be sampled, and then compared to see if it is an approximation of the desired curve, or if the sample differs significantly from the target. AngleWyrm (talk) 01:29, 23 January 2010 (UTC)
- The cups are filled by a filling machine of which the expected quantity of margarine may be adjusted. The desired output is 250 gr, but due to unavoidable random effects it is normally distributed with (property of the filling machine) σ = 2.5 gr. The machine is adjusted to deliver on the average 250 gr, i.e. the expected value is set to μ = 250 gr. But in due course this value may change a little. That's why one is interested in the confidence interval. All quite realistic. Your picture also needs to be adjusted, as it shows the wrong standard deviation. Nijdam (talk) 15:38, 23 January 2010 (UTC)
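For readers following the margarine example, the interval under discussion can be sketched as below. It assumes the setting described in this thread (normally distributed fill weights, known σ = 2.5 g, a sample of n = 25 cups); the sample mean of 250.2 g is a made-up value used only to have something to compute with, and `z_interval` is an illustrative name.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    margin = z * sigma / n ** 0.5                     # z times the standard error
    return sample_mean - margin, sample_mean + margin

# 250.2 is a hypothetical sample mean, not a figure taken from this discussion.
lo, hi = z_interval(sample_mean=250.2, sigma=2.5, n=25)
print(f"95% interval: ({lo:.2f}, {hi:.2f})")
```

With these numbers the standard error is 2.5 / 5 = 0.5 g, so the interval extends about 0.98 g either side of the sample mean.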
- Image altered to represent a standard deviation of 2.5g, and altered the initial paragraph to include σ = 2.5g as part of the problem definition. Two questions: 1. How was n=25 determined? (Required Sample Size) and 2. What is a good choice of confidence interval? Also, the problem should directly address some sort of conclusion about the sample from the factory floor. AngleWyrm (talk) 00:11, 24 January 2010 (UTC)
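On the sample-size question, one conventional approach is to pick the largest margin of error one is willing to accept and solve for n. The sketch below assumes known σ and a two-sided z-interval; the target margin of 1 g is a hypothetical choice, and nothing here claims to be how the article's n = 25 was actually chosen.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the target margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil((z * sigma / margin) ** 2)

# Hypothetical target: estimate the mean fill to within 1 g at 95% confidence.
print(required_sample_size(sigma=2.5, margin=1.0))
```

Halving the margin roughly quadruples the required sample, which is the trade-off to weigh when each sampled item is costly.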
- Let's not go too overboard with the example here. It seems from the comments that what someone is wanting is something that would be much better placed in another article, such as acceptance sampling. Let's not unbalance the article for confidence interval. Melcombe (talk) 11:01, 25 January 2010 (UTC)
- The example states "To determine if the machine is adequately calibrated," so it's not too much to ask that the example directly address and answer that question. Otherwise it is not a practical example. If this article proposes the use of Confidence Intervals to deal with the factory example, then it should clearly demonstrate at the end of the problem a conclusion that "yes, the machine needs calibrating," or "No, the data does not support the expenditure of a maintenance call." So is this problem a demonstration of Confidence intervals in use? If it's not, then the entire example should be removed. AngleWyrm (talk) 00:28, 27 February 2010 (UTC)
Imagine for a moment that the testing process is destructive, and that they are not inexpensive margarine cups, but much more expensive car engine blocks. The testing questions that come up are "Is the machine manufacturing engine blocks to within the engineer's specifications?" and "How few can I destroy and still get a defensible answer to the first question?" Let's say that each engine block costs $25 in materials and 20 min of factory time. So: Is this scenario answerable with Confidence Intervals, or is there a better way? AngleWyrm (talk) 05:12, 14 March 2010 (UTC)
By Chance
When doing a test you calculate the probability of something happening and (one way or another) compare it to your significance level. A probability is a measure of chance. To add "by chance" is therefore pointless. The argument associated with a test is that the probability that you've calculated is not the real probability of the event because the calculated probability was too small. That is, the researcher has seen something happen whose probability is less than 5% and is arguing that its probability of happening is actually higher, or if you like that the underlying distribution is not the real distribution. The actual probability of the event happening is unknown. (If you knew it you'd know the underlying distribution, and so why would you be doing testing?) You could rephrase the sentence as "In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in 20 or less but that the researcher wants to argue actually occurs more often." But surely this is needless pedantry, given that the point of testing has been made repeatedly in the preceding articles. 13:51, 14 June 2010 (UTC) —Preceding unsigned comment added by 125.255.16.233 (talk)
likely? often?
I recently changed the article from
- Instead of estimating the parameter by a single value, an interval likely to include the parameter is given.
to
- Instead of estimating the parameter by a single value, an interval that often includes the parameter is given.
(emphasis added in both cases). I don't like the word, "likely" because I think it generally is read as, "probably." But a CI does not probably contain the real value, that would require Bayesian statistics. In contrast, "often" or "frequently" or "usually" all suggest that it is the typical way of things, though it might not be true. I'd suggest any of those would be preferable to "likely." 018 (talk) 02:34, 24 September 2010 (UTC)
- I do prefer "likely" to the other terms you mention. I think it comes closest to what it means. And BTW it is a kind of probability. Nijdam (talk) 11:14, 24 September 2010 (UTC)
- What "is given" does not probably include the true parameter. The method produces intervals that probably include the parameter. My point is that I think this sentence lends itself to misinterpretation. In contrast, emphasizing repetition with "often," "usually," or "frequently" captures both the probability aspect and the reality that you just don't know once the numbers have been handed to you. 018 (talk) 15:22, 24 September 2010 (UTC)
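The point that the probability belongs to the method rather than to any one computed interval can be illustrated by simulation. The sketch below assumes normally distributed data with known σ and reuses the margarine-style numbers (μ = 250, σ = 2.5, n = 25) purely as an illustration; the function name and the seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean

def simulate_coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of repeated-sample z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Each individual interval either contains mu or it does not.
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(simulate_coverage(mu=250.0, sigma=2.5, n=25, confidence=0.95, trials=4000))
```

Over many repetitions the fraction settles near 0.95, yet each single interval is simply right or wrong; the simulation can say which only because it knows the true mean, which a researcher does not.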
second paragraph in lead
Michael Hardy recently [1] replaced the second paragraph in the lead, stating, "This paragraph as written is wrong." The paragraph in question was
-
Because the endpoints of the confidence interval (the confidence limits) are random variables, the only claim that can be made about it is that the method of generating confidence intervals usually results in an interval containing the true value, not that any particular confidence interval (i.e. range of values) has some probability of containing the true value. To make a statement like that would require Bayesian statistics.
I'd appreciate knowing what exactly he objects to.
In addition, the following paragraph was added
-
A somewhat delicate epistemological issue arises in connection with confidence intervals. Before one takes a sample consisting of one or more (usually more) data points, one may say of a 90% confidence interval that there is a 90% probability that it will contain the parameter of interest. Can one assert the same probability after one sees the data and thus knows where the endpoints of the confidence interval are? In some cases, the answer is clearly no. Specifically, if there is a known probability distribution for the parameter itself before the data are known, and the confidence interval falls in a region where the parameter is very unlikely to be, then one can say that one has probably observed one of the 10% of cases in which the 90% confidence interval fails to contain the parameter of interest. In such cases, one uses instead a 90% posterior probability interval, found by using Bayes' theorem. In cases where one has not assigned a probability distribution to the parameter, may one say, after knowing the endpoints of a 90% confidence interval, that there is a 90% chance that the interval contains the parameter of interest? One cannot say that if one construes "90% probability" as meaning 90% of the cases in which one takes a sample of one or more data points. But can one say that one should be 90% sure? That is not strictly a mathematical problem and is philosophically problematic.[1]
There are several problems with this. The first is that it is... not a paragon of clear communication. I think the lead should follow the principle of KISS and more complicated explanations can be saved for later. This article is not about Bayesian statistics, so I think it should really just be mentioned in the lead that Bayesian statistics is what would give you what you might suspect you wanted.
The second is that it is wrong. It reads, "Before one takes a sample consisting of one or more (usually more) data points, one may say of a 90% confidence interval that there is a 90% probability that it will contain the parameter of interest. Can one assert the same probability after one sees the data and thus knows where the endpoints of the confidence interval are? In some cases, the answer is clearly no." No. You can't go from Pr(B|A) to Pr(A|B) without Bayesian statistics. Bayes' theorem could or could not exist, priors could or could not exist, and it wouldn't matter. A frequentist cannot transform Pr(B|A) to Pr(A|B)--end of story. 018 (talk) 18:25, 11 October 2010 (UTC)
- I agree completely. While the previous version isn't perfect, I think it is much better than its replacement, and we should revert back to the previous version for now. --Avenue (talk) 01:38, 12 October 2010 (UTC)
-
- Given the above, I have moved the offending paragraph out of the lead, where it does not seem to belong and where a general reader is likely to be put off by the lead phrase "A somewhat delicate epistemological issue arises", as who knows what "epistemological" means. I have left it in the article, as at least it has a citation for what it is trying to say, something that is sadly lacking everywhere else. I think the "previous version" of the lead mentioned above had only the single paragraph in it. But this then leaves nothing being said about "importance". Melcombe (talk) 11:22, 13 October 2010 (UTC)
- Melcombe, I'm also adding back the paragraph that Michael Hardy removed. While he claimed it was wrong, he never said why he thought this. 018 (talk) 14:15, 13 October 2010 (UTC)
- By "previous version", I meant the version including that removed paragraph, so I support O18's action. O18 has also explained why the offending paragraph is wrong. It's not clear what part of the paragraph is supported by the source cited (perhaps just the last sentence or two?), and we should fix or remove the clearly erroneous parts. --Avenue (talk) 15:35, 13 October 2010 (UTC)
-
- I have replaced the offending paragraph for now with something better that may still reflect what was trying to be said. There is a question of whether a contrast with Bayesian stuff is appropriate at this point, but I did try to bring in some context of general statistical inference rather than being just Bayesian/frequentist. I will try to adjust the lead a bit. Melcombe (talk) 17:20, 13 October 2010 (UTC)
- Melcombe, what you wrote is still wrong. The sentence "To make a statement like that would require Bayesian inference and a Bayesian interpretation of probability." suggests that Bayesian probability gives you Pr(A), not Pr(A|B). Bayesian inference only tells you the assumptions (and that is why it is okay to make probability statements); it isn't that Bayesian inference is a magic wand.
- Here is an example. Imagine that we each have private information about the value of something. Then we get shared data. You might get one range and I might get another. Say we both constructed 90% credibility intervals, but they don't overlap. Can I then say, "the probability that my credibility interval contains the true value is 90%, and the probability that your credibility interval contains the true value is 90%"? No, obviously not. Or consider a situation where I am given data and a good prior from someone and do an analysis, and by the time I give it to them the real answer has been revealed (like who will win a football game). Then I would be wrong to say, "the probability that Peru wins is 70%" because they lost. This is similarly an issue with the statement, "A particular confidence interval either does or does not contain the true value and no probability can be attached." Again, the same problem exists with the credibility interval.
- Finally, this article is about confidence intervals, NOT credibility intervals. So the lead should focus on confidence intervals, not credibility intervals. 018 (talk) 17:39, 13 October 2010 (UTC)
- I also want to add that I think you did a very good job with the philosophy section. 018 (talk) 17:42, 13 October 2010 (UTC)
- I don't see a problem there. The sentence "To make a statement like that would require Bayesian inference and a Bayesian interpretation of probability" does not say that Bayesian inference alone is enough; it also points out that a Bayesian probability interpretation is required. In your example, one can say "my subjective probability that my credibility interval contains the true value is 90%, and your subjective probability that your credibility interval contains the true value is 90%." Once you allow Bayesian interpretations, it may not make sense to expect people to agree on the probability of something, because this depends on what information they have available and give credence to. In the Peru soccer game example, the probability should be updated once more information comes to light (e.g. a report from a reliable source that Peru has won). But the statement "the probability that Peru wins is 70%" still makes sense if we interpret "the probability" in terms of the information available when that statement was made. If we break your last statement up into (a) "A particular confidence interval either does or does not contain the true value" and (b) "no probability can be attached", then (a) also applies to credibility intervals, but (b) does not. I agree Melcombe did a nice job. --Avenue (talk) 22:41, 13 October 2010 (UTC)
- I think the statement, "you can't say that the probability that the true value is in the confidence interval is 95%; that would require Bayesian..." is very misleading. I think it implies that Bayesian statistics allows you to make statements about Pr(A). Neither Bayesian statistics nor frequentist statistics allows that. Bayesian statistics does allow you to make probability statements about posteriors, but not unconditionally. I'd be okay with something like, confidence intervals allow you to say "foo." An alternative is Bayesian statistics, which allows you to say "bar." But it certainly doesn't deserve much space in the lead since, again, this article isn't a compare and contrast; it is an article about confidence intervals. Now, if we wanted to merge the two... that would be different (not that I'm proposing it). 018 (talk) 23:03, 13 October 2010 (UTC)
In reference to the recent edits by Benwing: I think the lead would make more sense if it made positive statements, "a CI is foo," rather than negative statements, "a CI is not bar." I also think that the implication in the new text remains that Bayesian statistics magically allows you to make statements about Pr(A) and not just Pr(A|B). 018 (talk) 15:57, 14 October 2010 (UTC)
- I edited this paragraph to try and find a compromise between your version and my version. Although you may object to "negative statements", the statement in question is in fact the most important point being made by the paragraph and deserves to be stated first off rather than cloaked three or four sentences down. The fact that confidence intervals cannot be interpreted in the "obvious" way is something that every beginning statistics student eventually runs up against, and invariably gets confused by. As a result, we owe it to our readers to make this point clearly. (IMO the fact that confidence intervals have such a tortured interpretation is strong evidence that the frequentist paradigm is not the right way to look at things.) Benwing (talk) 09:17, 16 October 2010 (UTC)
two mentions of credibility intervals that are not well linked
Right now the article mentions credibility intervals in two spots. The first is in the definition section and the second is in the "Alternatives and critiques" section. I think the information in both sections is good, and that they do belong in their respective sections... but it would be nice to mention prediction intervals in both places and to have something that ties them together. Maybe it just isn't possible. 018 (talk) 15:07, 18 October 2010 (UTC)
- I'm not quite sure what you mean here ... also, since credible intervals are the main alternative to confidence intervals, I don't see a problem in mentioning them in multiple places in the article.
A major point is that this article is about confidence intervals, not credibility intervals. A discussion/comparison of the two could reasonably go in the articles for either topic. The trouble is that too many articles then get swamped with ill-informed and citation-less discussion of Bayesian-vs-classical stuff, so much so that the basic ideas of each individual article then become confused and lost. While there might be a point in having a completely separate article for Bayesian-vs-classical arguments across the whole of statistics, such discussion has previously been removed from statistical inference which is where it should logically be if it goes in a more general article ... but for the time being it may be worth seeing if discussion for interval estimation should/would fit into interval estimation. However, on the topic of Bayesian citations here ... wikilinks to the credibility interval article should be sufficient here unless citations are to material that actually does do a reasonably good job of comparing confidence and credibility intervals. (No I haven't looked to see exactly what was added/removed.) Melcombe (talk) 09:10, 19 October 2010 (UTC)
"Philosophical issues" section is confusing
I added a "confusing-section" tag on this section because it seems very confusing to me ... reading through it, I have a hard time understanding what exactly the section is trying to say.
Perhaps more useful than this section as written would be a section addressing the reason why confidence intervals are expressed the way they are and why the interpretation ends up tricky. The way I see it, frequentist statistics asserts that probabilities are real, objective numbers, and you can only speak probabilistically about phenomena that are actually random in nature. Hence you can talk about the probability of a random process such as selecting a random voter to poll, or about any quantity that is in some way derived from a random process, but you can't talk about the probability of an inherently non-random quantity such as the percent of voters who will vote for a specific party (assuming that the voters have made up their minds and won't change) -- the percent of voters is a specific, non-random quantity, since theoretically you could ask every voter what their vote will be and determine the answer with certainty. Confidence intervals are basically the "best you can do" given these philosophical tenets. The problem is that this viewpoint is fundamentally in conflict with human reasoning about uncertainty -- humans have no problems making probabilistic assessments about non-random phenomena (e.g. "I'm 75% sure you are lying"). Humans don't see any fundamental difference in uncertainty stemming from actual randomness and uncertainty stemming from lack of knowledge, and reason probabilistically about both phenomena in the same way. Bayesian statistics takes a philosophical stance that is in accordance with this viewpoint. This is why the natural tendency of humans is to think about confidence intervals in a way that actually meshes with the way that Bayesian credible intervals work.
I think that something similar to what I've just expressed is what should go into the philosophical issues section. We should also certainly include a short discussion of what the problems with the Bayesian standpoint are.
This section should be short, as it's getting into larger and more basic issues of Bayesian vs. frequentist statistics (and should refer to an article discussing these larger issues for further info). Benwing (talk) 03:20, 19 October 2010 (UTC)
- Benwing, I feel like I keep harping on this point... you appear to understand it in the above, but the "gotcha" of frequentist CIs is that you can't say Pr(A|...)=foo. The "gotcha" of Bayesian CIs is that you can't say Pr(A)=bar. It is important to note that the probability of A is only subjective and conditions on some prior and data. 018 (talk) 04:16, 19 October 2010 (UTC)
- You need to explain this further. In Bayesian statistics, you can talk about the probability of anything. The probability may depend on subjective assumptions, but it certainly exists. AFAIK the point of frequentist statistics was to try and construct "objective" probabilities, with the trade-off that you can't talk about certain sorts of probabilities. Bayesian statistics does not limit what you can view probabilistically. Benwing (talk) 05:04, 19 October 2010 (UTC)
"then the interval [a, b] consist of exactly the values θ0 of θ, for which the null hypothesis: θ = θ0 is not rejected at significance level 1 − γ."
This phrase doesn't make sense to me. Could someone clarify what are θ0 and θ, and what is the null hypothesis? — Preceding unsigned comment added by 134.206.18.10 (talk • contribs) 09:58, 18 November 2010
Does that explain it? 018 (talk) 16:35, 18 November 2010 (UTC)
- I think we should start with a simple, "if the null is 0" and then add the note about all theta-0 in the CI... 018 (talk) 17:09, 19 November 2010 (UTC)
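The quoted phrase can be made concrete with a small numerical check. The sketch below is illustrative only and rests on assumptions not in the article: a two-sided z-test for a normal mean with known `sigma`, and `alpha` standing in for the 1 − γ of the heading. The hypothetical `interval_by_inversion` scans candidate values θ0 on a grid and keeps those the test does not reject, and `closed_form` gives the usual interval for comparison.

```python
from statistics import NormalDist

def p_value(xbar, theta0, sigma, n):
    """Two-sided z-test p-value for H0: theta = theta0 (sigma assumed known)."""
    z = (xbar - theta0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def interval_by_inversion(xbar, sigma, n, alpha=0.05, step=0.001):
    """Smallest and largest grid values theta0 NOT rejected at level alpha."""
    half_range = 10 * sigma / n ** 0.5       # scan well beyond the interval
    k = int(half_range / step)
    kept = [xbar + i * step for i in range(-k, k + 1)
            if p_value(xbar, xbar + i * step, sigma, n) >= alpha]
    return min(kept), max(kept)

def closed_form(xbar, sigma, n, alpha=0.05):
    """Usual interval: xbar plus or minus z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    h = z * sigma / n ** 0.5
    return xbar - h, xbar + h

if __name__ == "__main__":
    print(interval_by_inversion(5.0, 2.0, 100))
    print(closed_form(5.0, 2.0, 100))
```

Up to the grid resolution `step`, the two results agree. The special case "is the null value 0 inside the interval?" is then just a test of θ0 = 0.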
???
"In statistics, a confidence interval ... is an interval that frequently includes the parameter of interest, if the experiment is repeated."
So when you repeat the experiment you will find that you always get the same interval but that the parameter varies each time. At least that is what this is saying. 125.255.16.233 (talk) 16:58, 4 December 2010 (UTC)
Sections
I do not understand why there are the two sections "CI as random intervals" and "CI for inference". What is the intended difference? Nijdam (talk) 23:01, 4 January 2011 (UTC)
- The intended difference is given by their titles. The first points out that in the "probability of" equations, it is the end points of the intervals that are random, or that the interval itself is a random outcome deriving from the randomness in the original sampling process. The second indicates the way in which the probability associated with the random interval covering the true value can be used for inference. Melcombe (talk) 15:29, 5 January 2011 (UTC)
Intervals for random outcomes
I do not believe these intervals (prediction intervals) are also called confidence intervals. Nijdam (talk) 22:41, 7 January 2011 (UTC)
Inconsistency?
"For example, a confidence interval can be used to describe how reliable survey results are. In a poll of election voting-intentions, the result might be that 40% of respondents intend to vote for a certain party. A 90% confidence interval for the proportion in the whole population having the same intention on the survey date might be 38% to 42%. From the same data one may calculate a 95% confidence interval, which might in this case be 36% to 44%."
and
"one general purpose approach to constructing confidence intervals is to define a 100(1 − α)% confidence interval to consist of all those values θ0 for which a test of the hypothesis θ = θ0 is not rejected at a significance level of 100α%."
Surely these two statements are contradictory? I'm not enough of an expert to know which one is wrong though... 202.189.118.31 (talk) 03:07, 25 January 2011 (UTC)
No, I'm just confused... —Preceding unsigned comment added by 202.189.118.31 (talk) 03:12, 25 January 2011 (UTC)
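For readers comparing the two quoted passages, the relation between confidence level and interval width can be sketched numerically. The code below uses the normal-approximation (Wald) interval for a proportion, which is only one of several possible constructions, and a sample size of 1600 that is purely hypothetical since the quoted example gives none.

```python
from statistics import NormalDist

def wald_interval(p_hat, n, level):
    """Normal-approximation interval for a proportion (illustrative only)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided quantile for `level`
    se = (p_hat * (1 - p_hat) / n) ** 0.5       # estimated standard error
    return p_hat - z * se, p_hat + z * se

if __name__ == "__main__":
    n = 1600  # hypothetical sample size, not from the article
    for level in (0.90, 0.95):
        lo, hi = wald_interval(0.40, n, level)
        print(f"{level:.0%} interval: {lo:.3f} to {hi:.3f}")
```

With the same data, the higher confidence level always gives the wider interval, which is consistent with the test-inversion definition: a smaller significance level rejects fewer candidate values. Under this approximation the 95% half-width is about 1.19 times the 90% half-width.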
Confused use of term significance level
Five lines down it says: "The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value:"
Then right below Use In Practice, it says: "The significance level is usually denoted by the Greek symbol α (lowercase alpha). Popular levels of significance are 10% (0.1), 5% (0.05), 1% (0.01) and 0.1% (0.001). If a test of significance gives a p-value lower than the α-level, the null hypothesis is thus rejected"
Why does it call both the p-value and α significance level? 158.145.240.100 (talk) 16:02, 16 May 2011 (UTC)
- Because α is the pre-set value that an observed p-value must be below for a test to be significant. Hence α = critical p-value as implied (note the "critical"), but α and "p-value" are not the same. Melcombe (talk) 16:11, 18 May 2011 (UTC)
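The distinction in the reply above, between a pre-set α and a p-value computed from the data, can be shown in a few lines. This is a sketch assuming a two-sided z-test; the function names and the example statistic 2.5 are hypothetical, and the α levels are the ones quoted in the thread.

```python
from statistics import NormalDist

ALPHA_LEVELS = (0.1, 0.05, 0.01, 0.001)  # chosen before looking at the data

def two_sided_p(z_stat):
    """p-value of a two-sided z-test: computed from the observed statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

def decisions(z_stat, alphas=ALPHA_LEVELS):
    """Map each pre-set alpha to True if H0 is rejected (p below alpha)."""
    p = two_sided_p(z_stat)
    return {alpha: p < alpha for alpha in alphas}

if __name__ == "__main__":
    z = 2.5  # hypothetical observed test statistic
    print(two_sided_p(z))
    print(decisions(z))
```

The same observed p-value leads to rejection at some α levels and not at others, which is why α is a threshold (the "critical p-value") and not itself a p-value.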
Is this page unnecessarily confusing?
The current page focuses on the "true value" of a parameter. What is the meaning of that in the normal case of a "normal" distribution where values are distributed? For example, the boiling point of water under particular conditions has a "true value" and measurements can be used to narrow in on that, with variation due to error. But what is the "true value" of the height of Americans? Obviously there is no such thing; what can be estimated is the mean value, and the amount of variation through parameters like standard deviation and confidence intervals. For example, we might find that the mean value is 5'6" and an X% confidence interval might be 5' to 6'.
Talking about the true value produces confusion with the standard error of the mean as, in the case of a distribution with a "true value" the true value and the mean should be pretty much the same given large sampling. But for something like height, confidence intervals tell us about the spread of the data. The Standard Error of the Mean gives us confidence intervals on the mean but those are different than confidence intervals (by a factor of the square root of sample size). This page produces confusion on this point.
Basically, a confidence interval is the mean plus or minus a multiple of the SD. This means that it DOES tell you, for a normal distribution, that X% of values are within X% confidence intervals. In other words, if sampling shows that the 95% confidence intervals for American male height is 5'6" +/- six inches we can interpret that to mean that approximately 95% of Americans will have heights between 5' and 6'. Obviously this is an approximation, just like everything in the statistical realm, as only measurement of the entire population would provide absolute certainty.
Let me rephrase myself. If you sample a population and get mean M and standard deviation S and then construct from that a perfect (mathematical) normal distribution with these parameters we could say that 68% of measurements are between M-S and M+S. Given that M and S are estimates of the true mean and true standard deviation for a population then it seems obvious that approximately 68% of measurements are estimated to be between M-S and M+S. Why would this simple and useful interpretation be denied? I understand that there are many assumptions (the sample was sufficiently large, the sampling was sufficiently random, the underlying population is normally distributed and so on) but everything in real statistics is based on assumptions like that. — Preceding unsigned comment added by DavidRCrowe (talk • contribs) 16:59, 2 August 2011 (UTC)
- I think you might want to talk about your theories at the mathematics reference desk. If you got this information from this page, then we do have expositional issues and you should come back here so that we can edit the part of the page that made you believe this is what a confidence interval is. 018 (talk) 17:43, 4 August 2011 (UTC)
I think I have a better understanding of the problem with this page now and that it mixes up two distinct concepts. "Confidence interval" is a generic concept. For example, the Oxford English Dictionary defines it as, "a range of values so defined that there is a specific probability that the value of a parameter of a population lies within it". This could be any parameter of the population, not just mean. The other concept, is of course, "Confidence interval of the mean", which is the range of values with a specific probability of the mean lying within.
The opening sentence defines quite well the generic term, "In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate."
The problem occurs when the term "confidence interval" is misused to mean "confidence interval of the mean", as in the "Theoretical Example" section, which uses the equation for confidence interval of the mean. Note that the confidence interval of the population is a similar equation without the division by the square root of the sample size (i.e. just the mean plus or minus the SD multiplied by a constant depending on the probability desired (1.96 for 95% for example)).
Either the discussion of confidence interval of the mean should be deleted or it should be clarified to indicate that this is just one type of confidence interval.
DavidRCrowe (talk) 03:54, 14 August 2011 (UTC)
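The two formulas contrasted in this thread can be computed side by side. The sketch below assumes simulated, normally distributed heights in inches with hypothetical mean 66 and SD 3, and uses a large-sample normal quantile (a t quantile would be more appropriate for small samples). The name `two_intervals` is made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def two_intervals(sample, level=0.95):
    """Return (interval for the mean, range for individual values)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m, s, n = mean(sample), stdev(sample), len(sample)
    ci_mean = (m - z * s / n ** 0.5, m + z * s / n ** 0.5)  # uses s / sqrt(n)
    spread = (m - z * s, m + z * s)                          # uses s alone
    return ci_mean, spread

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10000):
        heights = [rng.gauss(66.0, 3.0) for _ in range(n)]  # hypothetical data
        ci_mean, spread = two_intervals(heights)
        print(n, ci_mean, spread)
```

As the sample grows, the first interval shrinks toward the population mean while the second stays roughly the same width. The first is an interval estimate of a parameter (the mean); the second describes where individual values tend to fall, which is closer to a reference range or prediction interval than to a confidence interval for any parameter.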
Is this clearer and correct? "A Confidence Interval is a claim that, if a population parameter measured across the full population falls within a specified interval, then that parameter measured on any single, perfectly random sample will also fall within that range.
For example, a political poll that places a candidate's approval rating between 36%-44% with 95% confidence is claiming that, if the actual percentage of approving voters falls within the 36%-44% interval and a single random sample of voters is selected, there is a 95% chance that the percentage of approving votes within the sample would also fall within the 36%-44% interval.
This claim does not directly address the chance that the approval rating is in the 36%-44% interval to begin with, or the chance that the claim is correct. It also does not claim to address samples that are not perfectly random, or one sample chosen from multiple samples." 98.175.26.25 (talk) 17:36, 16 December 2011 (UTC)
- Unfortunately, no. There is even text about this in the article under "Philosophical issues". 018 (talk) 21:48, 16 December 2011 (UTC)
Amazing writing.
"Mathematics can take over once the basic principles of an approach to inference have been established, but it has only a limited role in saying why one approach should be preferred to another."
What an "amazing" sentence this is. So precise and yet so deep... Who would have guessed such masters of literature and philosophy would come on our humble Wikipedia. Anonywiki (talk) 08:29, 24 November 2011 (UTC)