Talk:Confidence interval
WikiProject Statistics
WikiProject Mathematics (Rated Start-Class, High-priority; field: Probability and statistics). This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics. One of the 500 most frequently viewed mathematics articles.
WikiProject Measurement (Rated C-Class, High-importance)
What more should be in this article
Following on from "Daqu"'s last comment above, perhaps it is time to move on to what more needs to be in this article. I would not want any more on "explanations" and "interpretations", as this is not a textbook. Topics that I think should be in, but are not yet included, are:
- one-sided intervals
- equal-tail-area ways of determining limits
- derivation via inversion of significance tests
- confidence intervals derived from non-parametric tests
- overlap of confidence intervals (to extend the limited content in existing figure)
In addition, I think the material in the subsection headed "Meaning of the term confidence" should be removed, possibly into a separate article ... as, given some other articles, there seems a need to discuss the layman's question of "how much confidence is there in that test" in terms of statistical power rather than being led through to "confidence intervals". So possibly an article "confidence in statistical results" would be justified.
- This is an interesting discussion. I hope the following comments are helpful.
- In considering "confidence in statistical results" it is important to distinguish between the concepts of probability and uncertainty. In some cases, but not always, uncertainty is the same as probability. Fisher, R. A. (Statistical Methods and Scientific Inference, 1956, p. 9) writes
- "The prime difficulty lay in the uncertainty of such inferences, and it was a fortunate coincidence that the recognition of the concept of probability ... provided a possible means by which such uncertainty could be specified and made explicit."
- and later, on page 37, he says that
- "... the concept of Mathematical Probability affords a means, in some cases, of expressing inferences from observational data, involving a degree of uncertainty, and of expressing them rigorously, in that the nature and degree of the uncertainty is specified with exactitude, yet it is by no means axiomatic that the appropriate inferences, though in all cases involving uncertainty, should always be rigorously expressible in terms of this same concept. ... in the vast majority of cases the work is completed without any statement of mathematical probability being made about the hypothesis under consideration. The simple rejection of a hypothesis, at an assigned level of significance ..."
- In Bayesian inference, measures of uncertainty have a probability interpretation; that is, they have the properties of "Mathematical Probability" (with a collection of events, etc.). In the case of confidence intervals and significance tests, the measures of uncertainty (confidence levels and observed significance levels) are calculated from probability distributions, but as post-data concepts they do not have a probability interpretation. What would be the collection of events in these cases? In the case of an observed confidence interval, surely not the interval and its complement. And in the case of a significance test, surely not the hypothesis and its complement. For a confidence interval, it is meaningless to assert that the complement of the observed interval has confidence level alpha; in the same way, for a significance test, it is not meaningful to assert that the significance of the complement of the hypothesis is one minus the observed significance level.
-
- If confidence levels and significance levels could be interpreted as probabilities, there would be no need for the terms "confidence level" and "significance level". Because they are not probabilities of events but measures of the uncertainty of certain inferential statements, these terms are necessary to make the difference clear. --Esauusi (talk) 00:15, 16 August 2008 (UTC)
Any constructive thoughts about what should be in the article, as opposed to teaching each other statistics?
Melcombe (talk) 08:41, 5 June 2008 (UTC)
- As mooted above, I have moved the subsection mentioned into a new article Confidence in statistical conclusions. Melcombe (talk) 11:07, 24 June 2008 (UTC)
If, as a layman, I type in the term Confidence Level, I want to see some discussion telling me what this actually means in terms of what real confidence I can have in the result being offered. You have removed this from the article and placed it where no layman looking for it can find it. I would suggest redirecting the term confidence level to significance level, which is more appropriate. In the meantime, I'm putting the information back here. 125.255.16.233 (talk) 10:22, 29 June 2008 (UTC)
table to be added
plus things could be centered and stuff... --Sigmundur (talk) 11:58, 6 September 2008 (UTC)
| probability % | std deviations |
|---|---|
| 50 | 0.676 |
| 75 | 1.154 |
| 90 | 1.653 |
| 95 | 1.972 |
| 99 | 2.601 |
| 99.9 | 3.341 |
| 99.99 | 3.973 |
| 99.999 | 4.536 |
On what distribution is this based? Clearly not the normal distribution, for which correct figures are given at Normal_distribution#Standard_deviation_and_confidence_intervals. Seems to be close to a t distribution with around 200 degrees of freedom. I can't immediately see the relevance of that particular distribution to the current version of the article, however. Maybe Sigmundur could clarify? Qwfp (talk) 13:38, 6 September 2008 (UTC)
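As a rough check on the comparison with the normal distribution, the two-sided standard normal multipliers for the same coverage levels can be computed directly. This is a minimal sketch using only Python's standard library; the function name `normal_multiplier` is illustrative, and the `posted` values are copied from the table above.

```python
from statistics import NormalDist

def normal_multiplier(coverage):
    """Number of standard deviations either side of the mean that
    encloses the given central probability of a normal distribution."""
    return NormalDist().inv_cdf((1 + coverage) / 2)

# Values posted in the table above: (coverage in %, multiplier).
posted = [(50, 0.676), (75, 1.154), (90, 1.653), (95, 1.972),
          (99, 2.601), (99.9, 3.341), (99.99, 3.973), (99.999, 4.536)]

for pct, table_value in posted:
    z = normal_multiplier(pct / 100)
    print(f"{pct:>7}%  normal: {z:.3f}  posted: {table_value:.3f}")
```

Every posted value comes out slightly larger than the corresponding normal multiplier, which is the pattern one would expect from a t distribution with a large but finite number of degrees of freedom.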
Confidence Level / Significance Level
(Message and first reply copied from users' talk pages for general info)
The latest change is better (though the link doesn't work). However, the article still needs a simple definition of confidence level (i.e. something a layman can understand), and more importantly, an indication that, when used with statistical testing, it should be interpreted as indicating the significance level, preferably with a link to that article. The reason for this is simply that the average layman looking up confidence levels or confidence intervals will have seen something in print reporting the results of statistics-based research and will want to know what the terms mean.
We can do the standard wikipedia "who can hold their breath longest" thing over this, but I'd rather be sensible about it. Perhaps if you changed the article to include these changes in the way you want, we can avoid the baby stuff.
Jim 125.255.16.233 (talk) 12:44, 12 October 2008 (UTC)
- The link to further down the article works for me, but only moves to a section containing the definition. Your earlier edits seemed to imply that it is standard to use "significance level" in the context of a confidence interval and "confidence level" in the context of a significance test. I don't think this is done; it is only of use when developing one or the other via the significance-test inversion approach. You may be right that a more informal description of confidence levels is needed ... possibly a short separate article would be better than trying to fit it in within the article for confidence intervals. Melcombe (talk) 09:22, 14 October 2008 (UTC)
The problem is that, for laymen, the usual contact they have with statistics is of the kind where they see stated something like "scientists have discovered that eating onions increases your chance of getting brain cancer by 52%", which is inevitably nonsense. If they research this they will see the statistic printed as 0.52 (95% CI, 0.29-0.90). When they look up CI they will find it means confidence interval. When they look that up they will see that this means that 0.29-0.90 is the interval likely to contain the chance of getting brain cancer, with the confidence level 95% indicating how likely that is. And that's where they stop, "knowing" that the scientist is 95% likely to be right, i.e. that eating onions almost certainly causes brain cancer. If they look up significance level they will find all the warnings about what the 95% actually means and about multiple tests, but there is nothing here to guide them to that, and there are no warnings here. There needs to be something at the top of the article that leads to a layman's explanation of this problem. A definition of the confidence level in terms of the significance level is one way of doing that. Can you suggest another way? 125.255.16.233 (talk) 09:46, 15 October 2008 (UTC)
- There's one proof-by-contradiction-type argument that I've seen somewhere (but unfortunately can't remember where) that brought home to me that you can't have it both ways, i.e. you can't interpret frequentist confidence intervals as meaning there's a 95% probability of the true parameter lying within the calculated values:
- Say two samples from the same population result in non-overlapping 95% confidence intervals. Correctly interpreted, this is perfectly possible, just very unlikely (1 in 800??). If you misinterpret the intervals as probability statements about a random parameter and those two fixed intervals, however, you find that the probability that the parameter lies in either interval is 0.95 + 0.95 = 1.90 (using the axioms of probability and the fact that it can't be simultaneously in two non-overlapping intervals). It's simplest to think about if the top end of one CI happens to coincide exactly with the bottom of the other: e.g. if the 95% CI for the mean is [1.2, 1.5] in the first experiment and [1.5, 1.8] in the second, then the probability the true population mean lies in [1.2, 1.8] is 1.9. But probabilities can't exceed 1, so something's gone badly wrong. This should be pretty convincing to any student of mathematics, but I couldn't call it a layman's explanation of the problem. Qwfp (talk) 11:18, 15 October 2008 (UTC)
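How unlikely non-overlap is can be worked out in a simple special case. The sketch below assumes two independent samples of equal size from a normal population with known standard deviation, so that each 95% interval is the sample mean plus or minus z standard errors; these assumptions are illustrative and are not stated in the comment above.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
z = std_normal.inv_cdf(0.975)  # half-width of a 95% interval, in standard errors

# Intervals mean1 +/- z*se and mean2 +/- z*se fail to overlap when
# |mean1 - mean2| > 2*z*se. The difference of two independent sample means
# has standard deviation sqrt(2)*se, so in standardised units the
# threshold is 2*z / sqrt(2).
threshold = 2 * z / sqrt(2)
p_no_overlap = 2 * (1 - std_normal.cdf(threshold))

print(f"P(no overlap) = {p_no_overlap:.4f}, about 1 in {1 / p_no_overlap:.0f}")
```

Under these assumptions the chance is a little over half a percent, so non-overlap is rare but entirely possible, which is all the argument requires.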
-
- If you go back to the underlying mathematics, and remember that the confidence interval is an expression of the significance level and the probability of the observed event, you will realise that, because 5% of the time the event is just a random event that happens to be rare enough to satisfy the test, the confidence interval it produces could be anything. There's no reason why it should overlap a genuine confidence interval. 125.255.16.233 (talk) 16:02, 17 October 2008 (UTC)
- I think that at one stage there were article versions (for CI's and significance tests) that attempted to define confidence level as one (or 100) minus a significance level and also to define significance level as one (or 100) minus a confidence level, which would be circular. I think that the notions of confidence intervals and significance tests need to be treated separately as practical applications do not always make any attempt to deal with them jointly. The basic definition of a confidence level is in terms of the coverage probability. If a 2,3 or 4 sentence layman's explanation in terms of coverage can be added to the introduction, then OK, if it needs to be much longer than this then either a separate article is needed or a new subsection might be made somewhere. Unfortunately, I haven't seen a good example of an article divided in moderately long layman's and technical portions. I would agree that the CI article needs to have something about constructing confidence intervals by inverting significance tests, but it shouldn't imply that this is always done, which would be the danger in a poorly written layman's explanation in terms of significance levels. Melcombe (talk) 15:11, 15 October 2008 (UTC)
I'm not sure yet that I'm putting this right, so I'll try an analogy. If I were writing a book about the countries of the world I might include a section on Myanmar. But if someone looks for Burma they will be looking in the wrong place. So I would include a reference under Burma to Myanmar. If a layman looks up confidence in wp, seeking to understand what the result of a statistical test means, they will be taken to this article. There is nothing here to explain to them that they need to go look at the article on statistical significance. Either this article is divorced from statistical testing and so that reference needs to be here, or there needs to be a prominent section for laymen explaining the use of confidence intervals in understanding the results of statistical testing. 125.255.16.233 (talk) 16:02, 17 October 2008 (UTC)
- You may be expecting too much of what is only one of about 1500 statistics-related articles. However, I have added a new section at Confidence interval#Relation to hypothesis testing that may be a start on what you are thinking of. Ideally it should have a formal proof of the equivalence. Melcombe (talk) 12:14, 21 October 2008 (UTC)
This misses the point in that it is hardly something that laymen are going to understand. You must be an academic because you don't seem to have any idea of how to write for the general public. :-) Might I suggest the following to replace the last statement in the first section: "When the result of a statistical test is presented with a confidence interval, the confidence level of the interval indicates the significance level of the test (significance level is 100% minus confidence level)." With significance level linking to the appropriate article. This addresses my concerns; would you be happy with it? Jim 125.255.16.233 (talk) 14:12, 21 October 2008 (UTC)
- No. You want "When the result of a statistical test is presented with a confidence interval" ... but the result of a statistical test is either yes or no (accept/reject) and a confidence interval is a different thing. If both a test and a confidence interval are presented there need not be a connection between their associated probabilities. Possibly you could have the result of a statistical analysis being presented as a confidence interval, with the confidence interval either covering or not covering a particular value of special interest and this could then be interpreted as providing a significance test. Note that "analysis" is not the same as "test" and that in statistics "test" has a particular meaning. Something like the following might do: "The parameter values inside a 100(1-α)% confidence interval can usually be regarded as being those values for which a significance test (of the null hypothesis that the true parameter value is the given value) would be accepted at a significance level of 100α% , while those outside would be values for which the test would be rejected." Melcombe (talk) 15:29, 21 October 2008 (UTC)
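The correspondence described in the proposed wording can be illustrated in the simplest textbook setting. The sketch below assumes a two-sided z-test for a mean with known standard deviation; the function names and all the numbers are hypothetical, and "accepted" is rendered as "not rejected".

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, alpha):
    """Two-sided 100(1 - alpha)% interval for a mean, sigma known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def z_test_p_value(sample_mean, sigma, n, mu0):
    """Two-sided p-value for the null hypothesis that the true mean is mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers, chosen only for illustration.
sample_mean, sigma, n, alpha = 10.3, 2.0, 50, 0.05
low, high = z_interval(sample_mean, sigma, n, alpha)

for mu0 in (9.5, 10.0, 10.8, 11.0):
    inside = low <= mu0 <= high
    not_rejected = z_test_p_value(sample_mean, sigma, n, mu0) >= alpha
    print(f"mu0={mu0}: inside interval={inside}, not rejected={not_rejected}")
```

In this setting the two columns always agree: a candidate value lies inside the 95% interval exactly when the test of that value is not rejected at the 5% level. Intervals built in other ways need not match a given test so neatly, which is why the wording above says "usually".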
And what layman will ever have a hope of understanding that? A definition for laymen means no jargon and no mathematical formulae beyond simple arithmetic. That's why I spelt out the word minus. I regularly read published papers that say things like: "Total energy was associated with increased risk for both local and regional/distant stage disease. The adjusted odds ratios [95% confidence intervals (CIs)] contrasting highest to lowest quintile of energy intake were 2.15 (95% CI, 1.35–3.43) for local and 1.96 (95% CI, 1.08–3.56) for regional/distant disease." In other words, they calculated the odds ratio to be 2.15, generated a confidence interval of 1.35 to 3.43, and because it does not contain 1 are saying that the result is positive. So we see that the result of a statistical test (that there was an association) is being presented with a confidence interval (95% CI, 1.35–3.43). I want a layman seeing this in a paper and coming here to look up confidence interval to go from this article to the one on significance levels, where they can read about the pitfalls of accepting this as scientific fact. How would you suggest I phrase it? Without using jargon or "complex" formulae? 125.255.16.233 (talk) 13:00, 22 October 2008 (UTC)
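To make the reading of such a result concrete, the sketch below takes the quoted odds ratio 2.15 (95% CI, 1.35–3.43) and checks whether the interval excludes 1. It also backs out an approximate p-value, assuming the interval was computed as a symmetric normal-theory interval on the log odds ratio scale. That assumption is common for odds ratios but is not stated in the quoted passage, so the p-value is indicative only.

```python
from math import log
from statistics import NormalDist

odds_ratio, low, high = 2.15, 1.35, 3.43  # figures quoted above
z95 = NormalDist().inv_cdf(0.975)

# An odds ratio of 1 means "no association", so a test at the 5% level
# amounts to asking whether 1 lies outside the 95% interval.
excludes_one = not (low <= 1.0 <= high)

# Assumed: interval = exp(log(OR) +/- z95 * se), so se can be recovered
# from the interval's width on the log scale.
se_log_or = (log(high) - log(low)) / (2 * z95)
z = log(odds_ratio) / se_log_or
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"excludes 1: {excludes_one}, approximate p-value: {p_value:.4f}")
```

A small p-value for one comparison says nothing about how many other comparisons were made in the same study, which is the pitfall the comment above wants readers to be directed to.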
Request for expert help in making this article more usable
I am an experienced user of statistics, but not a statistical expert. I came to this page for help because journal editors increasingly require authors to report confidence limits of population statistics, rather than standard deviations or standard errors, for reasons which I understand and find persuasive.
Obviously, I expect eventually to have to consult textbooks and if need be the original literature, and I am capable of doing that. But I expect an encyclopaedia article to give me an orientation to the subject, tell me the main terms to look out for, and, if the matter is simple enough, enable me to get on with my job. I also expect it to tell me where I can read more - the sneering tone above towards people's desire for references is way out of order. What I found was almost entirely useless for my purposes, which I suspect are like those of the majority of people who will want to use this article. The article told me a certain amount that I already knew, and then plunged me immediately into more mathematics, in more pedantic detail, than I needed or could easily cope with. It also contained a certain amount of what I strongly suspected was philosophical grandstanding - and a glance at the talk page confirmed that impression. We all know there are issues at the philosophical foundations of probability, and that there are differences of opinion between classical and Bayesian statistics; a page about a particular topic that lies essentially within the classical sphere is not the right place for expressing those disagreements. It is right to point out somewhere that they exist, and that a confidence interval, though it occupies the same role as a credible interval, is not the same; but after that we should hear no more of the matter.
Accordingly, I am going to try to reorganize the article to improve its usability. In doing so, I appeal for help to all the experts who have, from their different points of view, contributed to its content. Although I intend to rely largely on material that is already present, I am going to cut out (or relegate to later sections) some detail, and as a user rather than an expert, I am also almost bound to introduce solecisms. I am confident that despite the disagreements aired above, you will be able to help, and in a consensual way. I do not believe that expert statisticians really disagree about the kind of statement that needs to be included in an article at the level appropriate for Wikipedia; and if there are differences of view, they should be signalled in the article but we should not be trying to resolve them - that would be the dreaded Original Research.
Having written a lengthy comment, I now have no time to start the work of revision. I'll be back later to get on with it, and I look forward to your help. seglea (talk) 16:55, 12 April 2009 (UTC)
- Well whether or not you consider it "philosophical grandstanding" your opening comment that "By definition, the confidence interval with probability P has that probability of including the true value of parameter which is being estimated." while perhaps defensible on some readings will certainly be misinterpreted by many readers, as was discussed ad nauseam on this page previously. Is there actually a reason why you want the page to be misleading? Jdannan (talk) 21:45, 12 April 2009 (UTC)
-
- Hah, good, I see the previous commentators have not forgotten this page, so we have a chance to work on it. I have now done my best with the opening section, which I have reduced to a (fairly) short opening para and a longer section now called "Conceptual basis". It is important to get this correct and effective before moving on to tidying up the rest of the article, so I am glad you are opening the discussion up.
- In answer to your specific point:
- (a) That is the definition of CI that I have found in the only stats book I happen to have on my shelf at home - I will check others next time I am in my office, but my memory is that they say the same thing. If this definition is not accepted by all, we are going to have to give an alternative, equally brief and conceptual, definition, and give references to where they are each discussed. It is not appropriate to try to resolve within a Wikipedia article matters on which qualified experts disagree - all we can do is try to help readers be aware of and if possible understand the disagreement.
- (b) Having read the discussions above, I think I can see the misinterpretation you are concerned about. I have now expanded the "Conceptual Basis" section to try to head them off. From the timing of your comment above, I think you caught this process half way through. Please have a look at the current version and see if it is addressing the point you are worried about. It is not very elegantly expressed yet, but if the point is right we can work on its expression later.
- (c) Without disputing the need to get this correct, I have yet to see that if readers made the misinterpretation you are concerned about, they would then go on to make errors of inference about data. (I am perfectly open to persuasion on this, but I need to be shown an example before I'll believe it.) If the interpretation is important to the statistical theory of CIs but not to their practical use, then all we need do is note that at the theoretical level the definition needs further discussion, and move that discussion to the theoretical section of the article. seglea (talk) 22:08, 12 April 2009 (UTC)
- The "errors of inference about data" are that even talking in a probabilistic way about (unknown but constant) parameters is a category error for a frequentist. It makes about as much sense for them to say there is a 90% probability that x lies in the interval [2.3,3.4] or whatever as it would for them to talk about the colour of x. Jdannan (talk) 02:01, 13 April 2009 (UTC)
- Thank you for your response. I understand your concern and have modified the text to try to meet it. It doesn't seem appropriate to go into this matter in the opening paragraph, as (regrettably, no doubt) it will on first encounter be largely meaningless to most people who find themselves wanting to use confidence intervals. So in the opening para I have made clear that this isn't an ideal way of expressing things, and in the next section I have tried to put it precisely; and would welcome help in getting that bit right. seglea (talk) 15:56, 13 April 2009 (UTC)
"A confidence interval is always qualified by a particular probability (say, P), usually expressed as a percentage; thus one speaks of a "95% confidence interval"." P is not a probability, it is the Confidence Level. At best, it is the level of probability at which a result is accepted by a researcher, or perhaps the probability of not getting a type I error. The real probability of a particular interval containing the parameter may be much higher.
"By definition, the confidence interval with probability P has that probability of including the true value of the parameter which is being estimated." This is total bullshit. P is the percentage of confidence intervals that the generation process produces that will contain the parameter. Different thing. To get the actual probability of the interval containing the parameter you would have to know the details of the generation process and do some complex arithmetic.
More seriously, a general introduction should be for a general user, not a mathematician. Yours isn't. If you want to write a good general introduction to this article, go and read the entry in the Encyclopaedia Britannica. They do in about two paragraphs what this article fails to do in pages of waffle.
Until I see a good introduction, I'm putting the page back to where it was. It may have been hard to read, but at least it wasn't bullshit. —Preceding unsigned comment added by 125.255.16.233 (talk) 23:58, 12 April 2009 (UTC)
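The reading of the confidence level given in the comment above, as the share of intervals produced by the procedure that contain the parameter, can be checked by simulation. This is a minimal sketch that assumes a normal population with known standard deviation; the function name `coverage`, the population values and the seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(true_mean, sigma, n, level, trials, seed=0):
    """Fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf((1 + level) / 2) * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mean 10, standard deviation 2, samples of 25.
rate = coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, trials=10_000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should come out close to 0.95. It describes the long-run behaviour of the procedure; on the frequentist reading discussed in this thread, any one computed interval either contains the fixed true mean or does not.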
- I am afraid I have simply reverted your edit, which was inappropriate on several grounds:
- (a) You really should get yourself a userid so that your contributions can be properly identified and your areas of expertise recognised.
- (b) Your comments are falling short of the basics expected under WP:Civility.
- (c) You have reintroduced into the opening para a basic error (that a 90% CI will be wider than a 95% CI).
- (d) As I am not a mathematician (and have made that clear above), I would be quite incapable of writing an introduction that was suitable for one. I am attempting to write for the most likely user of this article - the person who encounters the term "Confidence interval" in a scientific text and wants to check what it means. If you find parts of the opening paragraph overly mathematical in expression, I suggest that you try some detailed editing to improve them, rather than wholesale reversion.
- (e) As I have already indicated in my responses to the helpful interventions of Jdannan, I understand the issue about defining a CI in terms of a probability - but it happens to be the definition that is widely used, and as such it needs to be given. A formulation very like the one you propose already appears in the next section of the article, to explain what is actually meant, and I am afraid this leads me to believe that you may not have read what you were criticising. I agree that it would be better to introduce the P value as the Confidence Level rather than setting people off on the wrong track by calling it a probability, and I am just going to change the text to do that.
- Where I do agree with you is that this article contains a great deal of unnecessary material, some of it repetitive. I am working from the top down to eliminate that and will not be able to do it all in one edit. I do not think that the 2 paragraphs you cite for the Britannica article are a necessary target; the present material includes some useful mathematical sections that should be retained, though not in the earliest parts of the article. seglea (talk) 15:56, 13 April 2009 (UTC)
-
- And I see that Confidence Level redirects to this article, so that term should certainly be in the opening para, in bold - it now is. seglea (talk) 16:05, 13 April 2009 (UTC)
- IP editors are welcome if their edits meet with consensus, as are you. But do you really think you should be throwing out accusations of incivility, after your first missive here? Allowing time for discussion of your concerns before making changes might have caused less fireworks and reduced the chance of your work being reverted. But I agree with you on (c), at least, and that there is plenty of scope for improvement. -- Avenue (talk) 18:02, 13 April 2009 (UTC)
- Unreserved apologies to all if my opening remarks came across as uncivil; they were not intended to be; I was simply trying to state what I thought was wrong with the article from my point of view as a likely user, to set out a strategy for improving it, and to recruit editors of good will and greater expertise than mine (among whom I strongly suspect you can be counted) to help in the process. seglea (talk) 18:48, 13 April 2009 (UTC)
Suppose a researcher conducts a study in which she does ten statistical tests and gets one positive link. If each test is done at 95% confidence level then she has ten chances at 5% or over a 40% chance of getting at least one type I error. So the probability that the interval will contain the parameter is just under 60%, not 95%. Textbook definitions are based on the assumption that only one test is done, but in the real world that is very rarely the case.
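The arithmetic behind the "over 40%" figure can be sketched as follows, under the assumptions that the ten tests are independent and that all ten null hypotheses are true (neither assumption is stated in the comment).

```python
alpha = 0.05   # per-test chance of a type I error
n_tests = 10

# Assuming independent tests and every null hypothesis true, the chance
# that none of them gives a false positive is (1 - alpha) ** n_tests.
p_no_false_positive = (1 - alpha) ** n_tests
p_at_least_one = 1 - p_no_false_positive

print(f"P(at least one type I error) = {p_at_least_one:.3f}")
print(f"P(no type I error)           = {p_no_false_positive:.3f}")
```

The second figure is the chance of no false positive across the whole family of tests. Whether it can be read as a statement about any single interval is the point under dispute in this thread.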
The real problem with this article is one common to most of the mathematical articles in this monstrosity, which is that it is so overburdened with fiddly technical definitions and qualifications that no ordinary person has much hope of understanding it. Show the opening paragraph to your wife and then ask her to tell you what a confidence interval actually is; I doubt she'll get past "interval estimate". Then do the same with the Britannica article, and you'll see what I'm talking about.
"I am attempting to write for the most likely user of this article - the person who encounters the term "Confidence interval" in a scientific text and wants to check what it means." -- I'd really like to see an article here that such a user could understand, but it won't happen.
With reference to (c), that got snuck in on 31 March. It wasn't there last time I read it. :-) There are too many idiots editing this monstrosity.
With reference to (b), I've been through one revert war already, and have seen what 'civility' means here. :-) Get yourself a thicker skin. —Preceding unsigned comment added by 125.255.16.233 (talk) 11:10, 14 April 2009 (UTC)
- Now, if I wanted to be tiresome I could suggest that generalising from one bad experience is no way for someone who claims to know about statistics to behave. Having edited hundreds of Wikipedia articles, and seen many of them grow into fine and usable documents through good-humoured collaborative effort, my experience is different. Until proved wrong, I'll assume that the same process can work for this article, and I am doing my best to look through your dyspeptic way of expressing yourself to what you are saying. Indeed, I've just made a couple of edits that reflect the points you are making above. Actually I think we (and other contributors) have the same aim here - namely to reduce a sprawling article to a usable one. That should not mean sacrificing technical rigour, it just means getting it in the right place. However, I edit on this site for pleasure and my own education, and neither of those is added to by interacting with people who don't have any manners; so I have a general policy of ignoring people who are simply rude. seglea (talk) 21:56, 14 April 2009 (UTC)
I have made some more tweaks to the intro, and removed a couple of chunks lower down that seemed out of place. If anyone thinks they were needed, can you explain what their particular role was, please? seglea (talk) 21:56, 14 April 2009 (UTC)
Re: 283982996 -- this stays in until I see the point made elsewhere in the article, in language my wife could understand. :-) —Preceding unsigned comment added by 125.255.16.233 (talk) 12:29, 15 April 2009 (UTC)
- My concern about these paras are that, though the points are fair enough, I am not sure they belong in this particular article - they are very general issues about statistical inference. This kind of thing (i.e. general issues finding their way into specific articles) is a common cause of bloat in Wikipedia articles - and this article is by common consent suffering from bloat. I will have a look around and see if they can be placed somewhere more suitable and dealt with more briefly here, by a link. seglea (talk) 22:25, 15 April 2009 (UTC)
Firstly, given that both Confidence Level and Confidence Interval point here, where else is a more appropriate place to talk about what the word confidence means in a statistical context? Where else is someone looking up the word confidence going to end up? Secondly, given that the reason many people will look up confidence level/interval is because they've seen a published statistic and want to know what the 95% means, this is the appropriate place to tell them. Thirdly, in my experience "move it to a more suitable article" is a weasel way of saying delete it without appearing to. The wikipedia indexing function won't find the article "Meaning of the term confidence in a statistical sense" when someone types in confidence level, and so the content is buried. My suggestion is to have another section directly under the main section with a title like "Confidence Intervals and Statistical Tests" which tells laymen how to interpret confidence intervals when presented with the results of statistical tests. That to be followed by the technical definitions. But I'm happy so long as this remains. —Preceding unsigned comment added by 125.255.16.233 (talk) 00:38, 16 April 2009 (UTC)
- Let's recall that this stuff was deleted previously and put back unnecessarily by this same person who insists that only he is right. And now put back for a second time. The article Confidence in statistical conclusions was created so as to have a place for this stuff which clearly does not belong in this article but might have some small merit .. but by the tags added no one else seems to think so. Of course this wasn't good enough. The key to "making this article more usable" must be to first omit all the stuff not directly related to the article's title. Melcombe (talk) 08:53, 16 April 2009 (UTC)
Archive proposal
This Talk page is getting clumsy and could do with archiving. I propose archiving everything prior to 2009 - any objections, or suggestions of a different cutoff point? seglea (talk) 22:25, 15 April 2009 (UTC)
- I agree it's too big, but the discussions in June 2008 seem to overlap with some of the points raised recently. How about just archiving everything prior to that? -- Avenue (talk) 00:16, 16 April 2009 (UTC)
I'm a bit rusty in statistics, but......
The terms confidence interval, confidence level and degree of certainty need to be clarified. If they are synonyms, please list them. BTW, is there any APPROVED STATISTICS TERMINOLOGY? —Preceding unsigned comment added by 124.78.212.232 (talk) 11:37, 10 May 2009 (UTC)
I recommend creating a separate article for confidence level, even if it is only a few words long.--124.78.212.232 (talk) 11:58, 10 May 2009 (UTC)
In my copy of Introduction to Business Statistics by Alan H. Kvanli, the relationship between confidence interval and confidence level is described as follows:
The higher the confidence level, the wider the confidence interval. The confidence level is written as (1 − α)·100%, where α= .01 for a 99% confidence interval, α= .05 for a 95% confidence interval, and so on.--124.78.212.232 (talk) 12:12, 10 May 2009 (UTC)
I tried to get the two terms separated some time back, but there are forces at work here that don't want to see that happen. 125.255.16.233 (talk) 14:47, 7 June 2009 (UTC)
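The relationship quoted from Kvanli in this thread (a higher confidence level gives a wider interval, with the level written as (1 − α)·100%) can be illustrated with a minimal sketch. It assumes a two-sided z-interval for a mean with known standard deviation; the function name `z_interval` and the numbers (mean 170, sigma 10, n = 25) are hypothetical and appear nowhere in the article or the book.

```python
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Two-sided z-interval for a mean, assuming a known population sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * sigma / n ** 0.5
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical numbers, chosen only for illustration.
for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(sample_mean=170.0, sigma=10.0, n=25, confidence=confidence)
    print(f"{confidence:.0%} level (alpha = {1 - confidence:.2f}): width = {high - low:.2f}")
```

With the data held fixed, the only thing that changes between the three lines of output is the quantile, so the interval widens as the confidence level rises.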
Avoid jargon words in the definition of the term.....
The definition of the term at the following link
https://www.involvenursing.com/SADTN/web_101_glossary.jsp
is more understandable than the one at the beginning of this article.--124.78.212.232 (talk) 12:25, 10 May 2009 (UTC)
Meaning of the term "confidence"
Hello. I'll admit right away that I'm not a statistician, but this section had me so confused that I feel forced to question it.
"In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in twenty or less."
Wouldn't the researcher only see it if his observation is incorrect? If we reject hypothesis A with a confidence interval of 95%, we have not necessarily observed something that happens one time in twenty. If hypothesis A is CORRECT (which is not necessarily the case) and we STILL reject it due to our observations, then we have seen something that happens one time in twenty or less (and vice versa). This text says kind of the opposite.
- A statistical test involves looking at the probability of an event and comparing it with the significance level. If the probability of the event is less than the significance level you reject the null hypothesis, which is that the event happened by chance. If the confidence level is 95%, the significance level is 5%, or 1 in 20. If you reorganise the mathematics so that you are testing on a confidence interval, the basis for the test is still that there is a 5% chance that the expected statistic will fall outside the interval. That's what the 95% means. So when you do a statistical test at 95%, you are looking to see if something has occurred that has a probability of happening of 5% or less. If the result is positive, then the question arises as to whether it is a false positive (a type I error) or a real result. If you only do one test, and it returns a positive result, then 95% of the time it will be a real result.
- "If you only do one test, and it returns a positive result, then 95% of the time it will be a real result." NO this is flat-out wrong. It is simply not possible to infer a probability that an effect is "real" or not based on frequentist statistics. If the null hypothesis is true, then in 5% of experiments a test will generate a "significant" result (according to the test) and 95% of the time it will not. That's all. Jdannan (talk) 04:11, 8 June 2009 (UTC)
"If one were to roll two dice and get double six (which happens 1/36th of the time, or about 3%), few would claim this as proof that the dice were fixed, although statistically speaking one could have 97% confidence that they were."
- Assuming you had specified your test before rolling the dice, this is true enough, but note that the 97% "confidence" here does not equate to believing the dice are fixed with probability 97%. Jdannan (talk) 04:11, 8 June 2009 (UTC)
The chance is also 1/36 of rolling a double one, two, three, four or five. Even if you get a mix, by the above reasoning you could say: "I got a 3 and a 5, the odds of that are only 2/36, or about 6%, so I am 94% confident that the dice are fixed". "Statistically speaking" this whole reasoning must be considered faulty, since you're very certain that the dice are wrong no matter what. At least this sounds to me as though the test is made "a posteriori". You need to, so to speak, FIRST set up a confidence interval, THEN look at the dice. Here you are purposefully constructing a confidence interval that does not include the sample outcome. Maybe I'm just reading it wrong; it would work if you first say, "if the dice are normal, I am 97% confident that they will turn up a result from 2 to 11", then roll the dice once and discover that they failed your test. If that's the case, I think it needs to be clarified.
- The underlying assumption is that you are trying to roll a double six, and hence that the probability of doing so is what matters. If you use your 3 and 5 argument, you are really saying that if you roll the dice and get two numbers then the dice must be fixed. But the chance of that is 100%. Or maybe you are saying two different numbers; the chance of that is 5/6. The double six argument is two sixes.
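The dice probabilities quoted in this exchange can be checked by enumerating all 36 outcomes. This is a minimal sketch that assumes two fair, independent six-sided dice; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice (an assumption).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on (die1, die2)."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_double_six = prob(lambda o: o == (6, 6))               # 1/36
p_three_and_five = prob(lambda o: sorted(o) == [3, 5])   # 2/36 = 1/18
p_total_2_to_11 = prob(lambda o: 2 <= sum(o) <= 11)      # 35/36
p_two_different = prob(lambda o: o[0] != o[1])           # 5/6

for name, p in [("double six", p_double_six),
                ("a 3 and a 5", p_three_and_five),
                ("total from 2 to 11", p_total_2_to_11),
                ("two different numbers", p_two_different)]:
    print(f"{name}: {p} = {float(p):.3f}")
```

Every specific outcome is improbable on its own, which is why the event being tested has to be fixed before the roll rather than chosen after seeing it.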
"For example, say a study is conducted which involves 40 statistical tests at 95% confidence, and which produces 3 positive results. Each test has a 5% chance of producing a false positive, so such a study will produce 3 false positives about two times in three."
First of all, the outcome of the tests should depend on the things that are actually tested. Assume for instance that the actual value of the 40 things tested were all positive. Then you could have no "false positives", but in this example, you would have 37 false negatives. I guess the idea is that the tests should all come out negative (if the tests were 100% correct), but 3 come out as false positives. That is to say we do 40 coin tosses with a coin that's 95% likely to end up on one side and 5% on the other, and still we get the other 3 times out of 40. Then I must ask how it's been calculated. I would get
0.95^37 * 0.05^3 * 40! / (37! * 3!) = 0.185 as the likelihood of 3 false positives, so how is it "two times in three"?
- Two times in three the study will have produced 3 false positives, there being no real effects. The next most likely outcome is two false positives and one real result; obviously the chance of that is less than one third. Within the third, there is a very minute chance that the study will have produced no false positives and 37 false negatives. I can't calculate the exact probability of that for you because I don't know the numbers involved, but it is insignificant. You can't start by making an assumption about which results are real because if you knew that why would you be doing statistics?
- Just noticed... The chance of a positive being a false positive is 5%, because that's what the test is based on. The chance of a negative being a false negative is dependent on the strength of the effect and the number of tests. It's not 5%; unless you are dealing with small sample sizes it's a hell of a lot less. —Preceding unsigned comment added by 125.255.16.233 (talk) 23:28, 7 June 2009 (UTC)
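The binomial figure of 0.185 computed in this thread can be reproduced with a short sketch. It assumes 40 independent tests, each with a 5% false positive rate, and that every null hypothesis is true; the function name `binom_pmf` is hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_tests, alpha = 40, 0.05  # figures taken from the quoted example

p_exactly_3 = binom_pmf(3, n_tests, alpha)
p_at_most_2 = sum(binom_pmf(k, n_tests, alpha) for k in range(3))
p_at_least_3 = 1 - p_at_most_2

print(f"P(exactly 3 false positives)  = {p_exactly_3:.3f}")
print(f"P(at most 2 false positives)  = {p_at_most_2:.3f}")
print(f"P(at least 3 false positives) = {p_at_least_3:.3f}")
```

Under these assumptions the probability of exactly three false positives is about 0.185, matching the hand calculation. The other two values come out near 0.677 and 0.323, which resemble the "two times in three" and "about 32%" in the quoted text, although nothing in the discussion confirms that this is how those figures were derived.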
"Thus the confidence one can have that any of the study's results are real is only about 32%, well below the 95% the researchers have set as their standard of acceptance."
Well, the confidence that any single result is correct is 95%, obviously (what does "any of the study's results" mean if not these?). 213.100.32.24 (talk) 23:20, 6 June 2009 (UTC)
- Another way of explaining this is as follows: A statistical test is based on the probability of an event. When you do multiple statistical tests you are looking at multiple events. When you select the ones that are positive you are looking at multiple events and selecting. An underlying law of probability is that when you do this the events are not independent, and hence the probability of all the events must be taken into account when calculating the probability of any individual event. The mathematics used to calculate p-values, confidence intervals, and so on are derived on the assumption that there is only one event. Hence, when you do multiple tests you break that assumption and you get the wrong answers. In every case the actual probability will be much higher than the probability tested. In this case the tests were conducted at 95% confidence but because more than one was done the real confidence is about 32%. To get the 95% desired the individual tests would have had to be conducted at a much higher confidence level. 125.255.16.233 (talk) 14:44, 7 June 2009 (UTC)
- One, what is wrong? You said yourself that you are not a statistician, so it would be useful to see what you regard as actually wrong and why. What you have shown above is a lack of understanding of the mathematics underlying statistical testing, which is understandable, so I assume you are not going to argue with me over that. If there's something wrong with the wording I'm happy to see if we can find a more acceptable phrasing. Two, Confidence Interval and Confidence Level both refer to this article. I tried to change that once and failed dismally. So this article, like it or not, has to cover both topics. 125.255.16.233 (talk) 16:38, 7 June 2009 (UTC)
- Well I'm not the one who wrote most of the above. And I am a statistician. And this article is about the term confidence interval, so an explanation about 'confidence' in this article should relate to confidence interval. There is of course a one-to-one relation between testing and confidence intervals. But: "In statistics, a claim to 95% confidence simply means that the researcher has seen something occur that only happens one time in twenty or less." may not be wrong if interpreted in the right way, but it is at least not at all understandable, and also not directly related to confidence intervals. Also the next sentence, "If one were to roll two dice and get double six (which happens 1/36th of the time, or about 3%), few would claim this as proof that the dice were fixed, although statistically speaking one could have 97% confidence that they were. Similarly, the finding of a statistical link at 95% confidence is not proof, nor even very good evidence, that there is any real connection between the things linked.", doesn't contribute much to the understanding of 'confidence'. Nijdam (talk) 09:00, 8 June 2009 (UTC)
- Firstly, the terms confidence interval and confidence level both redirect here. Hence, even though the title is confidence interval, the article is about both terms. I feel the two terms should be split into two separate articles. Good luck making it happen; I tried and got shouted down.
- Secondly, the point of this section is to highlight the difference between the normal meaning of the word 'confidence' and 'confidence' as a statistical term. When a layman talks of being 50% confident of something he means that he is half sure that it is right. I hope you agree that the result of a statistical test that is reported with a 50% confidence level does not mean the same thing. I believe that it is entirely appropriate within an 'encyclopedia' supposedly for the masses to ensure that that point is clear. 125.255.16.233 (talk) 12:01, 8 June 2009 (UTC)
- Nuts! This section continues to conflate study-wise error and error in individual tests. How can it possibly be that if I conduct one hypothesis test and get result A and you conduct tests A and B that our results for A mean different things? It's nonsensical. There isn't some probability fairy out there who watches over what tests you do and changes the likelihood of a result on that basis. The joint probability of A and B is of course lower than the individual probability of A or B. You lower the critical value alpha for the tests on A and B not because anything has changed about those individual tests, but because you want to reduce the likelihood that any of the results you present, all together, are in error. But again, your decision with A is no more or less likely to be in error if you publish it alone than if you publish it along with B. --161.136.5.10 (talk) 18:43, 30 June 2009 (UTC)
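The distinction drawn in this thread between the error rate of one test and the study-wise error rate can be put in numbers. This is a minimal sketch assuming independent tests with every null hypothesis true; the function names are hypothetical, and Bonferroni and Šidák are named here as standard corrections, not as methods proposed by any of the editors.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive among m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(target, m):
    """Per-test level that keeps the study-wise rate at or below target."""
    return target / m

def sidak_alpha(target, m):
    """Per-test level giving exactly the target rate for independent tests."""
    return 1 - (1 - target) ** (1 / m)

m, alpha = 40, 0.05  # figures from the quoted example

print(f"per-test false positive rate: {alpha:.3f}")
print(f"study-wise rate over {m} tests: {familywise_error_rate(alpha, m):.3f}")
print(f"Bonferroni per-test level: {bonferroni_alpha(alpha, m):.5f}")
print(f"Sidak per-test level:      {sidak_alpha(alpha, m):.5f}")
```

The per-test rate stays at 5% however many tests are run; only the chance that at least one of the reported results is a false positive grows, to roughly 87% for 40 tests under these assumptions. Lowering the per-test level is a choice about the whole set of results and changes nothing about what any single test means.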
Relation to testing
In the section about the relation with statistical testing I read twice: 'second' parameter. What is the meaning of this 'second'? Nijdam (talk) 13:24, 8 June 2009 (UTC)
- If you measure the average height of a group of people and generate a confidence interval on it, then measure the height of a second group of people, if the second height does not fall in the confidence interval then you can reject the hypothesis that the true average heights of the two groups are the same. With lots of caveats about distributions. And buried in waffle. 125.255.16.233 (talk) 10:41, 9 June 2009 (UTC)
- Hence the caveats and the waffle. 125.255.16.233 (talk) 14:19, 9 June 2009 (UTC)
- I think this section is wrong. It seems to say: Suppose f(.|q) is a density with parameter q, and x1,...,xn is a random sample from this distribution, with X=t(x1,...,xn) and also (a,b) is a P-confidence interval for q, then if y1,...,ym is a random sample from f(.|w), with Y=t'(y1,...,ym), then the complement of (a,b) is the critical region for testing H0: w=X on the basis of Y. If so it is nonsense, and it seems to show the error often made by people who have difficulty in understanding CI, to give the CI the role of acceptance region, which it is not. Nijdam (talk) 15:27, 9 June 2009 (UTC)
"Interval" vs "length of an interval"
The second sentence of the article "Instead of estimating the parameter by a single value, an interval likely to include the parameter is given." could be improved by saying "the length of an interval" instead of "an interval". (I agree that an example in the article does make it clear that a confidence interval is not a specific interval with specific numerical endpoints. But why not start out on the right track?) Or you might want to resort to more lengthy language that says it is a conceptual interval centered about the unknown true value of the parameter being estimated.
