Talk:Type I and type II errors

WikiProject Statistics (Rated B-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale.
This article has been rated as High-importance on the importance scale.
 
WikiProject Mathematics (Rated B-class, High-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B Class, High Priority. Field: Probability and statistics.
One of the 500 most frequently viewed mathematics articles.
To-do list for Type I and type II errors:
  • Promote medical screening to a case study. Medical screening is one of the oldest uses of statistics.
  • Add a more rigorous mathematical foundation to the formula sections. Rice - Mathematical Statistics and Data Analysis sets up such a framework. See pp 16, 300.

Merge False negative, false positive, and type III error into this article

The reason that I have suggested the merge is that each of these three pieces desperately needs to be understood within the context of the other two.

I have done what I can to make the "matrix" which the combined three will inhabit as clear as I possibly can.

The further work, to some extent, needs someone with a higher level of statistical understanding than I possess. However, I am certain that, now that I have supplied all of the "historical" information, supported by the appropriate references, the statistical descriptions and merging will be a far easier task.

Also, I am certain that, once the articles are merged, and once all of the statistical discussions have become symmetrical in form and content, the average reader will find the Wiki far more helpful than it is at present. My best to you all. Lindsay658 22:01, 23 June 2006 (UTC)

What do others think about this merger? I wanted to point a friend to an article about "false positives". Even though the information in the article is statistically correct, it makes little sense if you talk about false positives in the context of e-mail spam filters or anti-virus checks. Imagine you want to explain to someone what a false positive is in the case of an anti-virus check and you point him to this site. He will see all the math talk and run, but that is not necessary: the concept of a false positive can be understood even if you have no clue about statistics. E.g. here is an explanation which has nothing to do with statistics, but is perfectly understandable for people who are not mathematically inclined: http://service1.symantec.com/sarc/sarc.nsf/info/html/what.false.positive.html

Wouldn't it be better to create an article about "false positives" and let it point to this article for further details? 79.199.225.197 (talk) 15:40, 4 May 2009 (UTC)

Prod?

This article was proposed for deletion on the basis of qualms about its name. I agree that it should be moved, and suggest Type I and Type II errors, but I see no reason to delete. Septentrionalis 21:32, 2 July 2006 (UTC)

Very well. The {{Prod}} was a mistake, as there is material here that should be retained in an Errors (statistical) or Type I and Type II errors article. Conversation with the primary author (on my talk page User talk:Arthur Rubin#Four types of error) suggests he/she intends to improve it. — Arthur Rubin | (talk) 12:17, 3 July 2006 (UTC)
Lindsay658's plans for that improvement seem reasonable. Lindsay, if you could use some assistance, especially in straightening out double redirects, let me know. Septentrionalis 12:56, 3 July 2006 (UTC)

Reorganization of article

Heeding the advice and guidance of Arthur Rubin and Septentrionalis, this article is being re-titled and re-organized in a way that far better represents the overall situation. Lindsay658 22:45, 3 July 2006 (UTC)

True negative and true positive

The term true negative is requested at Wikipedia:Requested_articles/Mathematics. I think it would be most appropriate to link the two terms true negative and true positive to this article. They are however not explained here, so they need to be added. I think it would be a good thing to create a confusion matrix at the beginning of this article, and explain each term with a reference to the matrix. I'm going to do the linkup when I finish writing this. If no opposition is uttered before Tuesday, I will do the proposed changes. Bfg 15:18, 16 July 2006 (UTC)

Bfg. The reason they were not included in this "original" article is due to the fact that the present article, as it stands, is a comprehensive rearrangement and reordering of information, and an amalgamation of already existing Wiki-articles that, in my view, should never have been existing separately in the first place, given that (a) each needed to be understood in the context of the others, and (b) the way each was written did not encourage the reader to visit any of the other associated articles. (In other words, there is no reason not to do what you suggest).
What you are suggesting sounds a very positive step to me, provided that:
(a) very clear distinctions are made between the terms "true negative" and "true positive" (i.e., how are they the same and how are they different?),
(b) a coherent story is included which clearly describes why these two (being "true") are considered to be in the same sort of domain as those already mentioned in the current article (being "false"),
(c) a clear identification of who was the first to use each of the terms, when, and to what purpose (and, also, whether they were first used separately or as a pair),
(d) a description of practical instances of each of the terms' application -- and, wherever possible, the examples of the usages are contrasted such that, depending upon the real-life applications you might be describing, you might finish up describing "true negative" contrasted with "false positive" in one situation, and "true negative" contrasted with "true positive" in another,
(e) an indication of the sorts of people that might use the two technical terms,
(f) whether all of those that use the two technical terms use them in the same way, and,
(g) (in my view, the most important of all), is it the case that these terms (and the concepts they represent) were derived from, and first used to contrast with, "false negative" and "false positive", or was it the other way around?
My view is that, if those simple points can be covered in your changes, the impact, importance and information value of the present article will be very greatly embellished by such a valuable inclusion. Best to you and your work Lindsay658 19:45, 16 July 2006 (UTC)

Lindsay. I admit to not reading the entire article, just skimming. I've done a reread of the article upon reading your comments. I'll try to cover your points:

  • (a),(b) Should be no problem.
  • (c)-(e) I do not know if these terms are the most commonly used, but they are easily understood by most practitioners in the context of false positive and false negative, as they explain the two remaining boxes in the confusion matrix of a two-class decision problem.
  • (f) I can not see how anyone could use them in any other way.
  • (g) From what I have already stated, I think it is pretty clear that these concepts are understood by being the complements of false negative and false positive.

After reading the article, I also propose that the article be reorganized in the following way:

  • Medical screening. Promote to case study. Medical screening is one of the oldest uses of statistics. Explanations could be done with respect to the case study. Expand and elaborate. Move this so it becomes chapter 1.
  • False negative vs. false positive. This is basically a listing of different uses. Rename it to Usage examples. Demote it so that it becomes the very last chapter before See also and the rest of the endnotes.
  • Critical false positive. I'm not familiar with this term, but for certain it should not be introduced in subchapter x.y.z.t etc. This needs to be promoted somehow. I'm not sure how, but either we could use spam filtering as the case study (no need to be conservative about the screening if not necessary), or we could make spam filtering the only case study. Other suggestions are welcome.
  • Statistics formula. I would propose adding a more rigorous mathematical foundation to the formula sections. Rice - Mathematical Statistics and Data Analysis sets up such a framework. See pp 16, 300.
  • Statistical treatment. I'm not sure if it is the best name, but I would like a major heading, with "Neyman and Pearson", "Type I and type II errors", "False negative rate", "False positive rate" and "The null hypothesis" as subheadings. "Bayes' theorem" is probably also best treated under this heading. I think some rearrangement could be done under this chapter too, perhaps reducing the number of subheadings and introducing a subchapter "definitions"?
  • Footnotes. There are far too many footnotes. Is there an easy way to distinguish between footnotes and references? The specific references should be treated as specific references and not as footnotes. Bfg 11:06, 17 July 2006 (UTC)
Bfg. Away for a couple of days. Will respond more fully then. All seems ok with what you have written, except for bit on footnotes. Can you please explain (at some length because I don't understand what you have written at all) what you mean in relation to the remarks on footnotes and references? Back in 2 days. Best to you Lindsay658 05:14, 18 July 2006 (UTC)
Lindsay. If you take a look at the notes in this revision, you will notice a difference in nature between e.g. notes 3-7, which are really very specific references, and e.g. notes 14-16, which are true footnotes. They should be separated in order to make a clear distinction between them. A typical reader (at least me) would typically read the footnotes, but only look up the references if they were of special interest. Having too many notes makes the text flow badly, while adding many references is usually not a big problem IMHO. Bfg 18:24, 18 July 2006 (UTC)
Bfg. Upon reflection your views are correct here. They are two entirely different sorts of "additional information". I will do my best to correct things so that everything now matches your design -- except that, I now think that the piece in note 8 ("O" not zero, etc) should appear in the text (simply on the basis that most of those prone to make the mistake will probably not look at the footnote). Lindsay658 01:25, 20 July 2006 (UTC)
Bfg. Forgot to mention that, overall, the article is looking much better with your input. Also far more informative. Hope that what I have done with the references now meets your approval. If it doesn't, can you please explain why (as I have done what I think you are asking). Also, I hope that you agree that the "O"/zero bit should appear in the text, rather than in the footnote. Best to you for the rest of your task. Lindsay658 01:46, 20 July 2006 (UTC)
Lindsay. I would say I agree with what you have done, but not how you have done it. I much preferred the old style of linking, when references were by numbers, to the new inline scheme which you now employ. I didn't do the changes myself since I am not sure how to do them, but what I would have liked is two separate numbering schemes, one for the footnotes and one for the references. I've done some research on the proper style for Wikipedia, but I can't figure it out properly, see (WP:Footnotes, Wikipedia:Footnote2, WP:Footnote3). Bfg 14:31, 21 July 2006 (UTC)
Bfg (I've gone "inwards" a little, so that our discussion, if it continues, as it well might, does not finish up being comprised of single word columns).
To the best of my knowledge the WP:Footnote3 method is now, essentially, obsolete (for reasons connected with its overall inflexibility: it cannot keep pace with other completely unrelated, but inter-connected, technical changes that might be made in overall Wiki-programming).
It also seems that Wikipedia:Footnote2 is moving towards being at least strongly discouraged, if not outright prohibited (and, once again, it seems that this is so on the grounds of flexibility in the face of other programming changes).
Thus, the currently recommended system is WP:Footnotes; and this system also allows multiple textual footnotes citing the same page of the same reference to be listed as a single footnote in the "NOTES" section.
Both my "first version" (all in "notes") and my "second version" (half in text and half in "notes") are symmetrical with the conventions of the WP:Footnotes system.
There is a very long and very convoluted and very hard-to-follow discussion at Wikipedia_talk:Footnote3 which -- as far as I can tell (and I stress "as far as I can tell") -- has a number of participants advocating a position that what they term "references" and what they term "footnotes" should be processed in such a way that they appear in separate lists (a distinction which, I think, matches the two different sorts of thing that you have very clearly identified).
It seems that this suggestion was rejected; and it seems that there were, essentially, two grounds to the objection:
(a) none of the available "style manuals" suggested such a practice, and
(b) none of the "objecting" participants had ever seen such a thing in their lives in any book or journal article.
I have not the slightest "personal" problem with your suggestion, which seems just as logical (if not more logical) as having them all combined; but it does seem that, according to the current Wiki convention, the "references" must be either in the "text" of the article, or in the "footnotes".
Perhaps, this is the way we should proceed:
(1) If you still feel that the "references" should be (a) separate from the "footnotes", and (b) not in the article's text -- and, as I have said, I have no aesthetic or intellectual objections to such a move -- then maybe you should request "HELP ME" and seek the advice and guidance of a Wiki "higher up" (because there may be some "higher level" of programming or formatting that will allow you to achieve what you are seeking).
(2) Otherwise, I am more than happy to set about changing things back to how they were, with one exception: upon reflection I am most certain that the piece about the difference between the "O" and ZERO subscript should remain in the text.
Please let me know your thinking in due course.
Best to you and your work Lindsay658 23:29, 21 July 2006 (UTC)
I have seen books which distinguish between footnotes (substantive) and endnotes (pure citations); but WP doesn't envisage anything that long. If you really want to make this distinction, Harvard referencing, the inline use of (Snikklefritz 1994), will do that; but it looks quite forbidding to readers who aren't used to it. (Some editors like it for that reason; they think it adds tone.). There's even a set of templates. Septentrionalis 15:46, 22 July 2006 (UTC)
The style I'm talking about is the one I know from most scientific journals. Eg. IEEE with more, for some examples see [1]. Except for the Trier article, they all use the same style. The Trier article has an interesting alternative style. FYI, I will probably not be doing so much editing for a while, since I have broken my wrist this weekend, and using the keyboard with one hand is not that easy. (Training is bad for the body ;-) ) Bfg 14:24, 24 July 2006 (UTC)
Bfg. Hope your wrist is better soon. I have followed the links you provide. The articles you have displayed certainly do separate the "references" from the "footnotes"; however, as I understood from your original request, you were well aware that Wiki articles did not have "pages" (in the same way that the articles you have offered as samples have individual pages) -- and, therefore, you were aware that it was impossible for the "footnotes" to appear at the bottom of each page (whilst the "references" appeared at the absolute end of the article).
Thus, the focus of my response was on the possibility of having both "footnotes" and "references" (a) in different locations at the end of the article, and (b) indicated in a different way (so that one could easily say "this [digit] indicates a footnote" or "this [lower case letter of the alphabet] indicates a reference").
It seems that this is impossible (and, it seems that the note, above, from Septentrionalis confirms this). As, I said before, I would be quite happy to change the text back to what it was, if that was what you thought best. Lindsay658 00:22, 26 July 2006 (UTC)
Lindsay. Got a lighter cast now, so it is now possible to write with both hands. Makes things a bit easier :) . (BTW, this discussion is becoming quite large now.). The links were in response to your point (b) above.
This discussion has become more and more about footnotes. To summarize: The special case in this article is the amount of footnotes and how we make references. This boils down to my original problem, references are good, footnotes too, but only in limited numbers. The amount of notes is so large that we should perhaps do something about it. I wonder if it is time to involve more people in this discussion. In particular to let them review the problem of separation in light of this? If more people agree, we might also need to involve developers. Thoughts? Bfg 14:39, 31 July 2006 (UTC)
Bfg. On the basis of all of the above, which asserts that what you desire is 100% impossible, I wonder if you could look at Amda Seyon I. Maybe it is using some sort of non-recommended Wiki-manipulation, but it certainly seems to meet your needs. Thoughts? Lindsay658 02:03, 4 August 2006 (UTC)
Additional note: When you carefully examine the structure of Amda Seyon I you will also find that the smart and innovative system used -- which allows distinctions to be made between "NOTES" and "CITATIONS" within the text of the main article -- also allows the use of (digit) "footnotes" within the (alphabet) "NOTES" section that direct the reader to the "CITATION" that is the source of the information contained within each individual "NOTE".
Once again, it seems that this mechanism would match your expressed needs 100%. If you agree, I will attempt to apply this system to this article. Thoughts? Lindsay658 00:07, 7 August 2006 (UTC)
Lindsay. From a reader's POV this is exactly what I'm looking for. From an editor's POV, there are some drawbacks I think, namely the lack of auto-numbering and sorting at the end. I think this is as close as we may reasonably get to my vision. If you find the style appealing, you have my blessing to go ahead and implement it. I'm going to hold off on the other edits until we resolve this issue. Bfg 12:19, 7 August 2006 (UTC)

Article improvements

I copied the article improvement suggestions here to separate them from the very lengthy discussion about footnotes. Bfg 12:14, 7 August 2006 (UTC)

  • Medical screening. Promote to case study. Medical screening is one of the oldest uses of statistics. Explanations could be done with respect to the case study. Expand and elaborate. Move this so it becomes chapter 1.
  • False negative vs. false positive. This is basically a listing of different uses. Rename it to Usage examples. Demote it so that it becomes the very last chapter before See also and the rest of the endnotes.
  • Critical false positive. I'm not familiar with this term, but for certain it should not be introduced in subchapter x.y.z.t etc. This needs to be promoted somehow. I'm not sure how, but either we could use spam filtering as the case study (no need to be conservative about the screening if not necessary), or we could make spam filtering the only case study. Other suggestions are welcome.
  • Statistics formula. I would propose adding a more rigorous mathematical foundation to the formula sections. Rice - Mathematical Statistics and Data Analysis sets up such a framework. See pp 16, 300.
  • Statistical treatment. I'm not sure if it is the best name, but I would like a major heading, with "Neyman and Pearson", "Type I and type II errors", "False negative rate", "False positive rate" and "The null hypothesis" as subheadings. "Bayes' theorem" is probably also best treated under this heading. I think some rearrangement could be done under this chapter too, perhaps reducing the number of subheadings and introducing a subchapter "definitions"?
  • Footnotes. There are far too many footnotes. Is there an easy way to distinguish between footnotes and references? The specific references should be treated as specific references and not as footnotes. Bfg 11:06, 17 July 2006 (UTC)

Medical screening

I am currently contemplating how this could best be done. An example should be useful both from a hypothesis testing perspective and from a Bayesian classification perspective. I am considering making up an example where we look at the hypothesis testing as a two-class Bayesian classification problem. Does that sound reasonable? Bfg 13:07, 7 August 2006 (UTC)

The following is a draft proposal for a case study, meant to come before Various proposals for further extension Bfg 13:36, 18 August 2006 (UTC)

Case study: Medical screening

In medicine one often discusses the possibility of large-scale medical screening (see below).

Given a hypothetical idealized screening situation, the following is true:

  • The total population is 1.000.000
  • The symmetric error is 1%
  • The occurrence of illness is 1 ‰

From these data, we can form a confusion matrix

                  Correct well (H0)          Correct ill (HA)
Classified well   989.010 (true negative)    10 (false negative)
Classified ill    9.990 (false positive)     990 (true positive)

From this we may read the following:

  • True negative: 989.010
  • False negative: 10
  • False positive: 9.990
  • True positive: 990
  • {\rm false\ positive\ rate} = \frac{\rm number\ of\ false\ positives}{\rm number\ of\ negative\ instances} = \frac{9.990}{989.010+9.990} = 0.01 = 1\%
  • etc...
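
The counts in the confusion matrix above follow directly from the three stated inputs. A minimal sketch in Python, assuming that "symmetric error" means the same 1% misclassification rate applies to well and to ill people; the function name `screening_counts` is hypothetical and used only for illustration:

```python
from fractions import Fraction


def screening_counts(population, prevalence, error_rate):
    """Return (TN, FN, FP, TP) for an idealized screening test.

    Assumption: the error is symmetric, i.e. the same error_rate
    applies to people who are well and to people who are ill.
    """
    ill = population * prevalence
    well = population - ill
    false_negative = ill * error_rate       # ill, but classified well
    true_positive = ill - false_negative    # ill, classified ill
    false_positive = well * error_rate      # well, but classified ill
    true_negative = well - false_positive   # well, classified well
    return true_negative, false_negative, false_positive, true_positive


# Inputs from the draft: 1.000.000 people, 1 per mille ill, 1% error.
tn, fn, fp, tp = screening_counts(1_000_000, Fraction(1, 1000), Fraction(1, 100))

false_positive_rate = fp / (fp + tn)   # false positives / all who are truly well
false_negative_rate = fn / (fn + tp)   # false negatives / all who are truly ill

print(tn, fn, fp, tp)
print(float(false_positive_rate), float(false_negative_rate))
```

Exact fractions are used here so that the counts come out as whole numbers rather than rounded floats.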

Under frequentist theory one would always refer to one hypothesis as the null hypothesis, and the other as the alternative hypothesis. Under Bayesian theory, however, there is no such preference among the hypotheses; the notions of false positive and false negative become tied to the hypothesis one is currently discussing. Referring to the confusion matrix above, the false negatives become the sum of the off-diagonal column elements belonging to the respective hypothesis, while the false positives become the sum of the off-diagonal row elements. The true positives become the on-diagonal element, while true negative makes little sense.
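
The per-hypothesis reading in the paragraph above can be sketched as follows. This is only an illustration, not part of the draft: it assumes the same layout as the table (rows are the classification, columns are the correct hypothesis), and `per_hypothesis_errors` is a hypothetical helper name.

```python
def per_hypothesis_errors(matrix, k):
    """Return (TP, FN, FP) for hypothesis k in a square confusion matrix.

    Assumed layout: matrix[i][j] is the number of cases classified as i
    whose correct hypothesis is j (rows: classified, columns: correct).
    """
    n = len(matrix)
    true_positive = matrix[k][k]
    # Off-diagonal entries of column k: truly k, classified as something else.
    false_negative = sum(matrix[i][k] for i in range(n) if i != k)
    # Off-diagonal entries of row k: classified as k, truly something else.
    false_positive = sum(matrix[k][j] for j in range(n) if j != k)
    return true_positive, false_negative, false_positive


# Screening example from the draft: index 0 = well, index 1 = ill.
screening = [[989_010, 10],
             [9_990, 990]]

ill_view = per_hypothesis_errors(screening, 1)    # "ill" as the hypothesis
well_view = per_hypothesis_errors(screening, 0)   # "well" as the hypothesis
print(ill_view)
print(well_view)
```

With "ill" as the hypothesis this gives 10 false negatives and 9.990 false positives, as in the table; with "well" as the hypothesis the two error counts swap roles. The same function works unchanged for more than two hypotheses.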

Confusion of false negative and false positive

In my opinion, the article as currently written systematically reverses the usual meanings of false negative and false positive.

In my experience, the usual notion of false positive is in situations such as this: One is testing to see if a patient has a disease, or if a person has taken illegal drugs. The null hypothesis is that no disease exists, or no drugs were taken. A test is administered, and a result is obtained. Suppose the test indicates that the disease condition exists or that the illegal drugs were taken. Then, if the patient is disease-free, or the drugs were not taken, this is a "false positive" in the usual meaning of the term. If for example under the null hypothesis, 5% of the subjects nonetheless test positive, then it is usually said that "the type I error rate is 0.05." That is, 5% of the subjects have falsely tested positive.

This observation is contrary to how the article is written.

Similarly, the notion of false discovery rate is connected with the rate of falsely "discovering" effects that are solely due to chance under the null hypothesis.

The article attempts to account for this by mentioning ambiguity of testing for innocence versus testing for guilt; while there is some rationale for this (especially under a Bayesian interpretation, where all hypotheses are treated the same), it is not usually the situation in frequentist theory, where one hypothesis is usually distinguished as the obvious null hypothesis. Usually this is the "default" hypothesis that one is trying to reject, e.g., that the accused is innocent, that the patient is disease-free, that the athlete is not using illegal drugs.

Therefore, I seriously propose that the article be edited to reflect properly the usual uses of these terms. As it stands, any Freshman statistics student would be immediately confused when comparing this article to what his statistics textbook states. Bill Jefferys 02:03, 11 August 2006 (UTC)
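
The statement above that "the type I error rate is 0.05" can be checked by simulation. The following is a rough sketch under stated assumptions, not a description of any test discussed in the article: it assumes a two-sided z-test of a zero mean with known standard deviation 1, and the function name, sample size, and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(trials=10_000, n=20, alpha=0.05, seed=0):
    """Fraction of two-sided z-tests that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Data are generated under H0: mean 0, known standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5
        if abs(z) > critical:
            rejections += 1   # H0 is true here, so every rejection is a false positive
    return rejections / trials


rate = type_one_error_rate()
print(rate)
```

Because every sample is drawn with the null hypothesis true, each rejection is a false positive in the sense described above, and the printed fraction should come out close to the chosen alpha.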

Reading the article more carefully, I find that it is inconsistent even within itself about "false negative" and "false positive". Whereas the lead-in (incorrectly) identifies a false negative as "rejecting a null hypothesis that should have been accepted", later on in the article, in the "medical screening" section, rejecting a null hypothesis that should have been accepted (i.e., deciding that the individual being screened has the condition when he does not -- the null hypothesis being that the individual does not have the condition), is correctly identified as a false positive.

This is going to take a lot of care to edit to make the whole article consistent with both usual terminology and to make it self-consistent.

I note in passing that under frequentist theory, one never "accepts" the null hypothesis. Rather, one fails to reject the null hypothesis. When the article is edited to remove these problems, this should also be fixed. Bill Jefferys 23:44, 11 August 2006 (UTC)

Bill. I feel a little bit guilty of negligence here. An anonymous user made such edits on the 31st of July and was quickly reverted by User:Arthur Rubin, so I concluded, without checking into the matter, that the interchange was vandalism. Upon investigating your claims, I agree that they have substance. There is a list of work that has to be done on this article, so I will put it on the to-do list; however, if you were able to do it yourself, nothing would be better. Bfg 07:36, 14 August 2006 (UTC)

HI, thanks for the feedback. I've been a bit busy of late, but if I can find the time I'll go at it. Some of the later stuff is fine, but the introduction and some of the earlier stuff needs to be fixed. Bill Jefferys 13:59, 14 August 2006 (UTC)

Hi, I've made the changes I am aware of that should be made. There was also some confusion on sensitivity and power which I've also fixed. However, as this was done in one grand edit, I may have missed some things or incorrectly changed some others. Part of the confusion is that the article is written in such a way that it's not always clear what one should consider the null hypothesis. Thus, when discussing computer identity authentication, I've presumed that a positive test result corresponds with identifying the subject logging in as authentic, whereas a negative result corresponds with identifying the subject as an imposter. Please check my work. Bill Jefferys 21:47, 14 August 2006 (UTC)

Bill. First off, I would like to thank you for your edits. I have two tips about form which I would like to bring to your attention. You forgot to use an edit summary. When doing such a major edit this is especially important, since the edit could easily be mistaken for vandalism or the like. I would have used an edit summary such as Corrected wrong use of false positive/false negative + some minor stuff, see talk. Another, less important thing is how you reply on this discussion page: you should really use indentation with an appropriate number of ":". In large discussions it makes it easier to follow branches. I'll try to find time to verify your actual edits some time later. Bfg 07:02, 15 August 2006 (UTC)
Yeah, I sometimes forget the edit summary and after I sent it off I had a "Doh!" moment. The only problem with repeated indents is that after a while it gets out of hand; so I was using the alternative convention where the original person doesn't indent. I've seen this frequently. But since you request it, I will follow the other convention here.
I look forward to your having a chance to check my edits. I do think that the article needs a lot of work, even now. For example, a preamble identifying examples of null hypotheses and positive/negative test results, at the very beginning, would help. As it is, the initial definitions are very abstract, which may have caused the confusion in the first place. Best wishes! Bill Jefferys 13:10, 15 August 2006 (UTC)
Hi, first time making a comment, so forgive me if this is in the wrong place. The article gives the impression that false positive and type I error are the same, and that false negative and type II error are the same. This is common, but incorrect. A false positive result is when the true status is known to be negative and the test returns positive. A false negative is the converse (known to be positive, test returns negative). In contrast, type I error is the probability of incorrectly rejecting the null hypothesis and making a wrong claim, whereas with a type II error we fail to reject the null hypothesis even though the alternative hypothesis is true. BTW: Other terms you may want to add include: the sensitivity of a test is P(test positive|has condition), and the specificity is P(test negative|without condition). The predictive value positive (which is most likely the piece of information physicians and patients actually want) is P(has condition|positive test), and the predictive value negative is P(without condition|negative test). --Dan (Daniels W.W, Biostatistics: A Foundation for Analysis in the Health Sciences 8th Ed,p74-5,217, Wiley:New York) —Preceding unsigned comment added by 128.231.88.5 (talk) 13:42, 11 March 2008 (UTC)
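
The four conditional probabilities named in the comment above can be computed from the cells of a two-by-two table. A small sketch, using the counts from the medical screening draft earlier on this page as example input; the function name `diagnostic_measures` and the dictionary keys are hypothetical choices made for illustration:

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Conditional probabilities estimated from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # P(test positive | has condition)
        "specificity": tn / (tn + fp),   # P(test negative | without condition)
        "ppv": tp / (tp + fp),           # P(has condition | positive test)
        "npv": tn / (tn + fn),           # P(without condition | negative test)
    }


# Counts from the screening draft: 990 TP, 9.990 FP, 10 FN, 989.010 TN.
measures = diagnostic_measures(tp=990, fp=9_990, fn=10, tn=989_010)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For these counts the sensitivity and specificity are both 99%, yet the predictive value positive is only about 9%, because the condition is rare. This is the distinction between the error rates of the test and the information a patient actually wants.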
I'd agree that the article is a bit confusing at present. In particular the terms "False positive" and "false negative" are used in a wider context than just statistical hypothesis testing and I don't think they should redirect to this article. I think the article is correct about their use in this particular context, but I think it gets rather confusing to try to discuss their wider use in testing in general, such as diagnostic testing, within the same article. I recently proposed a series of mergers of sensitivity (test), specificity (test), test sensitivity... — see Talk:Sensitivity_and_specificity#Merger_proposal and feel free to add your thoughts. I notice now that we also have two separate articles on positive predictive value and negative predictive value, which I think it would be sensible to merge into a single article. Maybe all of these could even be merged into one article? There's already a diagnostic test article but it says hardly anything on the statistics. In fact there's already an overview of these at Binary classification#Evaluation of binary classifiers, and perhaps that could be expanded to take in (positive and negative) likelihood ratios as well, or alternatively split off into a separate article. To be honest, Wikipedia's coverage of this topic seems in need of some major reorganisation, so if you want to contribute it would be much appreciated. (Registering a user name might be a good start). Regards, Qwfp (talk) 15:33, 11 March 2008 (UTC)
Hello everyone, I want to comment on several things. First of all, in order to understand the meaning of the error types, we need to understand when a hypothesis is eligible to be used as a "null hypothesis" from a statistical standpoint. For example, if I'm interested in proving that "my eyes are brown", some may argue that the null hypothesis can be stated like this:
"H0: the color of my eyes is brown"
and some may think it is more appropriate to use this one:
"H0: the color of my eyes is not brown"
So what is the right answer? Well, neither of them is valid, because the question is not suited to the kind of analysis that requires the testing of a hypothesis. Let me show a couple of examples that DO require us to make use of hypothesis testing:
I am a judge in a court; there is a murder suspect and he is trying to prove his alibi. What would be the null hypothesis? Let's assume I have no evidence that the suspect is guilty, but still: can I reject the validity of his alibi? If I do so, does it make him innocent? Does it make him guilty? My null here has to be:
"H0: the guy remains a suspect", vs "H1: the guy is no longer a suspect". Error type I would be not to believe the alibi when it was a true one. Error type II would be to believe the alibi when it was not true.
Notice that one can replace H1 by this one: "H1: the guy is innocent". Phrasing it like this shows that the hypotheses in a set are not independent, and one should be careful to always state both the null and the alternative together. Otherwise, a reader might think that the corresponding null of "H1: the guy is innocent" was "H0: the guy is guilty", which is not the case! Let's see another example:
A parent is worried that his teenage daughter has been lying to him. She said to him: "last night I went to Amanda's house". The father suspects that she went to a party instead. If he checks with Amanda (his daughter's friend) whether she was there indeed, there is some chance that she lies to cover the whole thing.
The father in this story can make two kinds of mistakes i) if he doesn't trust Amanda when she was saying the truth, or ii) if he trusts Amanda when she is in fact lying. The null here is:
"H0: it is PLAUSIBLE that the daughter went to the party", vs "H1: the daughter couldn't go to the party".
One last concern I have is that the whole thing with the medical examples is not helping to understand the statistical elements underlying the two types of errors. It's as if we are no longer dealing with conditional probabilities, and instead we are finding simple frequencies and classifying them in matrices, as if that were all that is needed to understand hypotheses and their tests.
So, in sum, my suggestion is to state these errors as being "consistent with the data" or "inconsistent with the data", instead of the "accepting" and "rejecting" terminology. See Arnold Zellner's chapter in the "Handbook of econometrics" of Elsevier for a reliable source on this.--Forich (talk) 21:19, 5 May 2008 (UTC)
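The conditional probabilities Dan lists at the top of this thread can be sketched in a few lines of Python. This is only an illustration: the function name and the counts (a hypothetical screening of 10,000 people with 1% prevalence) are assumptions made up for the example, not figures from the article or from Daniel's textbook.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Compute the four conditional probabilities from a 2x2 table of counts.

    tp/fn: people WITH the condition who test positive/negative.
    fp/tn: people WITHOUT the condition who test positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # P(test positive | has condition)
        "specificity": tn / (tn + fp),  # P(test negative | without condition)
        "ppv": tp / (tp + fp),          # P(has condition | positive test)
        "npv": tn / (tn + fn),          # P(without condition | negative test)
    }


# Hypothetical counts: 100 people with the condition, 9,900 without.
measures = diagnostic_measures(tp=90, fp=495, fn=10, tn=9405)

for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the test has high sensitivity and specificity, yet most positive results come from people without the condition, because the predictive values also depend on how common the condition is. That is one way to see that the matrix of frequencies and the conditional probabilities are not interchangeable.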

Double negative

For the paragraph:

Type II error, also known as a "error of the second kind", a β error, or a "false negative": the error not rejecting a null hypothesis that is not the true state of nature. In other words, this is the error of failing to accept an alternative hypothesis when you don't have adequate power.

Given that you seem to be saying that:

(a) null hypothesis is NOT rejected.
(b) null hypothesis is NOT "the true state of nature",

I am certain that it could easily be re-written in the form:

the error of accepting a null hypothesis that is not the true state of nature

Also, it seems that "the error not rejecting a null hypothesis" should either be:

"the error OF not rejecting a null hypothesis", or
"the error IN not rejecting a null hypothesis". (I suspect it should be the first; i.e., "of").

Best Lindsay658 03:05, 18 August 2006 (UTC)

I hear what you're saying and indeed the double negative bothers me as well. The problem is that according to standard (frequentist) hypothesis testing theory, one never "accepts" the null hypothesis; one only "fails to reject" it. I haven't been able to think of a good way around this that is both technically correct and easy to understand. Any help here would be appreciated.

In a Bayesian or decision-theoretic context, it wouldn't be an issue, but as long as we're considering standard hypothesis testing, it is. Bill Jefferys 11:36, 18 August 2006 (UTC)

In my opinion, your "the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature" has solved everything. Congratulations on a clear mind. Lindsay658 21:44, 18 August 2006 (UTC)

Security Screening

An anonymous editor [63.167.255.231] noted on the main page that:

Comment: I am not sure if "positive predictive value" is an appropriate method of measurement of security screening measures, because Type I and Type II errors are not of equal importance in this context. Traditional analysis of error rates assumes that Type I and Type II errors are equally undesirable outcomes. However, unlike many other forms of testing and measurement, the purpose of the security screening is, ostensibly, to reduce the incidence of Type II errors to zero or as close to zero as possible. Consequently, the rate of Type I error is of secondary importance. The only time this would be of primary relevance would be if you were comparing two or more different screening protocols, all of which had identical Type II error rates, but each of which had a different Type I error rate. The protocol that had the lowest incidence of Type I error would obviously be preferable under these circumstances, but of course this assumes complete equivalence among all of the measured protocols in eliminating Type II errors to begin with.

I have moved the comment here because comments don't belong on the main page. The editor here has some justice behind his comments. Indeed, this is a problem of decision theory since not only must the Type I/II error rate be taken into account, but also the loss function that applies, since the loss on detaining a harmless passenger is much less than the loss on allowing a terrorist passenger on board.

I think that the right way to approach this in the main article is to point out (and link to) the connection to decision theory, perhaps using this example as a springboard. But since this article is explaining only Type I/II errors, it isn't the place to give a full explication of decision theory. Perhaps this can be added to the list of "to-dos". Bill Jefferys 23:55, 21 August 2006 (UTC)

It isn't quite true that one normally tries to make the Type I and Type II error rates equal. Usually one tries to balance the two against each other, i.e., choosing a test that will have an appropriate Type I error rate, and then picking other factors (e.g., the number of cases tested) that will guarantee a desired Type II error rate. I'm not aware that anyone slavishly decides that a study will be designed so that Type I and Type II error rates will be equal.

That said, it is still the case that in decision problems, the Type I/II error rates are part, but not all of the problem. For one thing, Type I/II error rates assume that the null/alternative hypothesis is correct, and take no account of the probability that the null/alternative hypothesis is actually the state of nature. This is something that is only reflected in a Bayesian context, through the prior. Secondly, as I point out above, the cost of making a particular decision, given the probability that the state of nature is what it is (e.g., terrorist, innocuous traveller) has to be taken into account. None of these has anything to do with the definition of what is a Type I/II error. They are important considerations that have to be considered when one is making a decision, but they don't reflect directly on the definition of the terms being described by this article.

Thus, I think that it's appropriate to mention this issue briefly with links to the appropriate articles on decision theory and loss function, for example, but it is also appropriate to use the security checking example as one that describes what Type I/II errors are, which is of course the point of the article. Bill Jefferys 01:47, 22 August 2006 (UTC)
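The decision-theoretic point made above (error rates, prior probability and loss all enter the decision) can be sketched as a small Python calculation. Every number here is a made-up assumption for illustration, as are the names `expected_loss`, `strict` and `lenient`; nothing below comes from the article or from real screening data.

```python
def expected_loss(prior_threat, alpha, beta, loss_false_alarm, loss_missed_threat):
    """Expected loss per passenger for a screening rule.

    alpha: P(flagged | harmless), the Type I error rate.
    beta:  P(cleared | threat), the Type II error rate.
    prior_threat: assumed prior probability that a passenger is a threat.
    """
    return ((1 - prior_threat) * alpha * loss_false_alarm
            + prior_threat * beta * loss_missed_threat)


PRIOR = 1e-6  # hypothetical prior probability of a threat

# Two hypothetical protocols: (alpha, beta)
strict = (0.10, 0.01)   # many false alarms, few misses
lenient = (0.01, 0.20)  # few false alarms, more misses

for missed_loss in (1e4, 1e7):  # loss of a miss, in units of one false alarm
    loss_strict = expected_loss(PRIOR, *strict, 1.0, missed_loss)
    loss_lenient = expected_loss(PRIOR, *lenient, 1.0, missed_loss)
    better = "strict" if loss_strict < loss_lenient else "lenient"
    print(f"missed-threat loss {missed_loss:.0e}: prefer {better}")
```

Under these assumptions the preferred protocol flips as the loss attached to a missed threat grows, even though the Type I and Type II error rates of each protocol never change. The error rates describe the test; the prior and the loss function are what turn them into a decision.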

size

I removed the statement that size is equal to power. This is wrong; in fact size is the maximum probability of Type I error; that is, the smallest level of significance of a test. In many cases, size is equal to $\alpha$ (for example a test statistic T having a continuous distribution and point null hypothesis), but in general it is $\sup\{\Pr[T \ge c \mid \theta] : \theta \in \Theta_0\}$ where c is the critical value. Btyner 22:11, 3 November 2006 (UTC)
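The supremum definition above can be illustrated with a discrete test statistic, where the size generally differs from the nominal significance level. The sketch below is an assumption-laden example, not anything from the article: it takes T to be a Binomial(10, p) count, a composite null p ≤ 0.5, a critical value c = 9, and approximates the supremum by a maximum over a grid of null values.

```python
from math import comb


def rejection_prob(n, c, p):
    """Pr[T >= c | p] for T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def size(n, c, null_values):
    """Size of the test 'reject when T >= c': the largest rejection
    probability over the null parameter values supplied."""
    return max(rejection_prob(n, c, p) for p in null_values)


# Grid over the composite null hypothesis p <= 0.5 (hypothetical setup).
null_grid = [i / 100 for i in range(0, 51)]

n, c = 10, 9
test_size = size(n, c, null_grid)
print(f"size with c={c}: {test_size:.5f}")                  # attained at p = 0.5
print(f"size with c={c - 1}: {size(n, c - 1, null_grid):.5f}")
```

Here no critical value gives a size of exactly 0.05: c = 8 overshoots and c = 9 falls well below it, so a test run at a nominal 5% level with c = 9 has a size noticeably smaller than 5%.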

Truth table confusion

Once again, the truth table got changed to an invalid state. The problem is obviously that True and False are used here in two different senses (i.e. the name of the actual condition and the validity of the test result); and so some people look at the intersection of the False column and the Negative row and think the cell should contain "False Negative". I've tried to make this blatantly clear, though some will think this solution is belabored. If you can think of a better way to get this idea across, please leap in. Trevor Hanson 05:19, 13 December 2006 (UTC)

Capitalization of the terms type I and type II errors

There seems to be some inconsistent use of capitalization of the word type in this article. It seems there is confusion on the web too. Does anyone know if 'type' should be capitalized if appearing in the middle of a sentence? 194.83.138.183 10:40, 8 March 2007 (UTC)

There seems to be no reason to capitalize 'type'. Let's proceed with small letters. Loard (talk) 15:41, 13 April 2011 (UTC)

Statistics Heavy

This article, currently, is very math and statistics heavy and is not useful as a cross reference from other articles that talk about false negatives and false positives.—C45207 | Talk 20:45, 12 March 2007 (UTC)

Distinguishing between type II errors and false negatives

From the article:

Type II error, also known as an "error of the second kind", a β error, or a "false negative": the error of accepting a null hypothesis when the alternative hypothesis is the true state of nature.

It's my understanding that in a hypothesis test, you never accept the null. You either reject the null (i.e. accept the alternative) or you refrain from rejecting the null. So the above sentence incorrectly defines "type II error". However, a "false negative" can indeed refer to a situation where an assertive declaration is made, e.g. "You do not have cancer."

So a type II error and a false negative aren't necessarily the same thing.

Despite the Wikipedia guideline to be bold, I'm posting this to the talk page rather than making a correction to the article itself, in part because my knowledge of statistics isn't all that deep and there may well be further subtleties here that I'm missing. (In other words, I'd rather risk a type II error than a type I error. ;-) If you can affirm, qualify, or refute what I've said here, please do. Thanks. Fillard 16:40, 4 May 2007 (UTC)

I'd agree that I'd prefer "fail to reject the null" to "accept the null" (and I see it's been changed now). However I think the term "false negative" *is* appropriate as the situation in medical testing is pretty much analogous - a negative test result means e.g. "the test produced no evidence that you have cancer". It's a good analogy when the test result is really continuous (e.g. a chemical concentration) and is dichotomised at some cut-off that should be chosen with care. Qwfp (talk) 12:30, 21 January 2008 (UTC)
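The remark about dichotomising a continuous result at a cut-off can be sketched as follows. The two normal distributions (healthy readings centred at 0, diseased readings centred at 2, both with unit standard deviation) and the function name `error_rates` are assumptions chosen purely for illustration.

```python
from statistics import NormalDist

# Hypothetical distributions of a continuous test reading.
healthy = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=2.0, sigma=1.0)


def error_rates(cutoff):
    """Call the test 'positive' when the reading exceeds the cut-off.

    Returns (false positive rate, false negative rate).
    """
    false_positive_rate = 1 - healthy.cdf(cutoff)  # healthy but above cut-off
    false_negative_rate = diseased.cdf(cutoff)     # diseased but below cut-off
    return false_positive_rate, false_negative_rate


for cutoff in (0.5, 1.0, 1.5):
    fpr, fnr = error_rates(cutoff)
    print(f"cut-off {cutoff}: false positive rate {fpr:.3f}, "
          f"false negative rate {fnr:.3f}")
```

Raising the cut-off lowers the false positive rate and raises the false negative rate, which is the same trade-off as between Type I and Type II error rates when the critical value of a test statistic is moved.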

Null hypothesis tweaks

I've made some minor tweaks to the section on null hypotheses. If anyone wants to check out the background to these changes (it necessitated a bit of head scratching and a visit to a library!), there are comments over at Talk:Null hypothesis#Formulation of null hypotheses and User_talk:Lindsay658#Null_hypotheses. -- Sjb90 11:35, 14 June 2007 (UTC)

Phaedrus273 (talk) 04:14, 14 February 2014 (UTC) I have a major issue with the definitions of "Null Hypothesis" on this page and, less so, on the null hypothesis page. There are two types of null hypothesis:

Type 1 H0 defined as: the experiment is inconclusive

Type 2 H0 defined as: there is no correlation between the two parameters.

To assert a type 2 null hypothesis deductively implies that the experiment is "complete"; in other words, the methodology was perfect, the measuring instruments were 100% accurate, and every confounding factor was fully accounted for. In many areas of research we cannot say this, particularly in psychology. For example, Schernhammer et al. [1] state, “Findings from this study indicate that job stress is not related to any increase in breast cancer risk”. In fact their trial contained many serious flaws and so demonstrated nothing. A claim of rejection as a null hypothesis is predicated on a complete and flawless study, which is very often not the case. [1] Schernhammer, ES, Hankinson, SE, Rosner, B, Kroenke, CH, Willett, WC, Colditz, GA, Kawachi, I. Job Stress and Breast Cancer Risk. Am J Epidemiol, 2004 160(11):1079-1086

Phaedrus273 - Paul Wilson

Misleading example?

In one of the elaborations of the presence or absence of errors, the example of pregnancy testing is given.

On the basis that there is a well-established medical/physiological/psychological condition known as "false pregnancy" (see pseudocyesis) I suggest that it would be far better to choose a domain other than pregnancy for the example; because it could well be construed that, rather than testing for the presence/absence of a pregnancy (i.e., the presence of a foetus within the woman's uterus), it was actually testing for the presence/absence of a "false pregnancy" (viz., pseudocyesis with no foetus present). Lindsay658 01:13, 2 August 2007 (UTC)

Yes, good point. Somebody added that example long ago, and it seemed clear enough at the time, but something else would be better. I've changed it. Trevor Hanson 03:18, 2 August 2007 (UTC)

Removing the paragraph "Associated section" from the section Understanding Type I and Type II Errors

The section "Understanding Type I and Type II Errors" was created by Varuag doos on 10 December 2006 (19:27 edit). The second paragraph, starting with the lead-in "Associated section," was (at the time I deleted it) substantially the same as in that original contribution. To help the discussion, here is the deleted paragraph in its entirety:

Associated section - Statistics derives its power from random sampling. The argument is that random sampling will average out the differences between two populations and the differences between the populations seen post "treatment" could be easily traceable as a result of the treatment only. Obviously, life isn't as simple. There is little chance that one will pick random samples that result in significantly same populations. Even if they are the same populations, we can't be sure whether the results that we are seeing are just one time (or rare) events or actually significant (regularly occurring) events.

At this point one might have wanted to, say, provide a better segue into this paragraph, getting rid of the awkward "Associated section" lead in. One might also have corrected the fourth sentence (the one beginning with "There is little chance"), which makes it sound as if random samples can "result in" one or another kind of "population" (the term "population" has a technical meaning in statistics and is not synonymous with "statistical sample"). However, my real problem is that I simply don't understand three out of five sentences in this paragraph (I do understand sentences #1 and #3). There are some "differences between two populations" that random samples are supposed to be "averaging out." What are then the differences that persist "post treatment"? (While we're at it, what is this "treatment," enclosed in scare quotes?) I find the whole second sentence (the one beginning with "The argument is") very confusing. And finally, what is the point of this paragraph? Is it simply to warn that Type-I errors are always possible? Or is it that Type-II errors are always possible? In either case, it's too trivial a point to deserve a paragraph. If the intention is to say something else, then the whole paragraph should definitely be re-written by someone who understands what this something else is supposed to be. Reuqr 05:25, 14 October 2007 (UTC)

Alternate terms for Type I and Type II Error

In quality control applications, these are also known as producer's risk (the risk of rejecting a good item) and consumer's risk (the risk of accepting a bad item) —Preceding unsigned comment added by 70.168.79.54 (talk) 13:48, 29 July 2010 (UTC)

I have heard many statisticians complain about how "type I" and "type II" error are bad names for statistical terminology, because they are easily confused, and the names themselves carry no explanatory meaning. Has anyone encountered a published source making a statement to this effect? Are there any alternative terms that have been proposed in the literature? This would make an important topic to include in the article, if this sort of thing has gone on. It seems there may be a parallel here between this sort of thing and what has happened in other areas of mathematics, such as where the use of baire category terminology is being replaced by more descriptive terms, such as "of the first category" being replaced by meagre. Cazort (talk) 18:57, 7 December 2007 (UTC)

I'd agree that "Type I" and "Type II" isn't great terminology. I see little wrong with "false negative" and "false positive" as alternative more descriptive names myself and I've seen them used in medical statistics literature. Qwfp (talk) 12:45, 21 January 2008 (UTC)
I completely agree, Type I and Type II are merely legacy terminology, are easily confused, and are really meaningless. "false negative" and "false positive" are to the point. I see no benefit for any other term. Mingramh (talk) 14:09, 2 April 2008 (UTC)

I struggled with remembering which way round these are when I was doing courses in psychology, and discovered a mnemonic to help in this which I've added to the page. I hope they meet others' approval.

Meltingpot (talk) 21:03, 6 October 2009 (UTC)

Null hypotheses

I thought that the following should appear here (originally at [2]) for the ease of others. Lindsay658 (talk) 21:21, 22 February 2008 (UTC)

Hi there,

Over on the null hypothesis talk page, I've been canvassing for opinions on a change that I plan to make regarding the formulation of a null hypothesis. However I've just noticed your excellent edits on Type I and type II errors. In particular, in the null hypothesis section you say:

The consistent application by statisticians of Neyman and Pearson's convention of representing "the hypothesis to be tested" (or "the hypothesis to be nullified") with the expression Ho -- associated with an increasing tendency to incorrectly read the expression's subscript as a zero, rather than an "O" (for "original") -- has led to circumstances where many understand the term "the null hypothesis" as meaning "the nil hypothesis". That is, they incorrectly understand it to mean "there is no phenomenon", and that the results in question have arisen through chance.

Now I know the trouble with stats in empirical science is that everyone is always feeling their way to some extent -- it's an inexact science that tries to bring sharp definition to the real world! But I'm really intrigued to know what you're basing this statement on -- I'm one of those people who has always understood the null hypothesis to be a statement of null effect. I've just dug out my old undergrad notes on this, and that's certainly what I was taught at Cambridge; and it's also what my stats reference (Statistical Methods for Psychology, by David C. Howell) seems to suggest. In addition, whenever I've been an examiner for public exams, the markscheme has tended to state the definition of a null as being a statement of null effect.

I'm a cognitive psychologist rather than a statistician, so I'm entirely prepared to accept that this may be a common misconception, but was wondering whether you could point me towards some decent reference sources that try to clear this up, if so! —The preceding unsigned comment was added by Sjb90 (talk • contribs) 11:07, 16 May 2007 (UTC).

Sjb90 . . . There are three papers by Neyman and Pearson:
  • Neyman, J. & Pearson, E.S., "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I", reprinted at pp.1-66 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1928).
  • Neyman, J. & Pearson, E.S., "The testing of statistical hypotheses in relation to probabilities a priori", reprinted at pp.186-202 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1933).
  • Pearson, E.S. & Neyman, J., "On the Problem of Two Samples", reprinted at pp.99-115 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1930).
Unfortunately, I do not have these papers at hand and, so, I can not tell you precisely which of these papers was the source of this statement; but I can assure you that the statement was made on the basis of reading all three papers. From memory, I recall that they were quite specific in their written text and in their choice of mathematical symbols to stress that it was O for original (and not 0 for zero). Also, from memory, I am certain that the first use of the notion of a "null" hypothesis comes from:
  • Fisher, R.A., The Design of Experiments, Oliver & Boyd (Edinburgh), 1935.
And, as I recall, Fisher was adamant that whatever it was to be examined was the NULL hypothesis, because it was the hypothesis that was to be NULLIFIED.
I hope that is of some assistance to you.
It seems that it is yet one more case of people citing citations that are also citing a citation in someone else's work, rather than reading the originals.
The second point to make is that the passage you cite from my contribution was 100% based on the literature (and, in fact, the original articles).
Finally, and this comment is not meant to be a criticism of anyone in particular, simply an observation, I came across something in social science literature that mentioned a "type 2 error" about two years ago. It took me nearly 12 months to track down the source to Neyman and Pearson's papers. I had many conversations with professional mathematicians and statisticians and none of them had any idea where the notion of Type I and type II errors came from and, as a consequence, I would not be at all surprised to find that the majority of mathematicians and statisticians had no idea of the origins and meaning of "null" hypothesis.
I'm not entirely certain, but I have a feeling that Fisher's work -- which I cited as "Fisher (1935, p.19)", and that reference would be accurate -- was an elaboration and extension of the work of Neyman and Pearson (and, as I recall, Fisher completely understood that it was an oh, rather than a zero in the subscript). Sorry I can't be of any more help. The collection that contains the reprints of Neyman and Pearson's papers and the book by Fisher should be fairly easy for you to find in most university libraries. Lindsay658 22:37, 16 May 2007 (UTC)
Thanks for the references, Lindsay658 -- I'll dig them out, and have a bit of a chat with my more statsy colleagues here, and will let you know what we reckon. I do agree that it's somewhat non-ideal that such a tenet of experimental design is described rather differently in a range of texts!
As a general comment, I think it entirely acceptable for people working in a subject, or writing a subject-specific text book / course to read texts more geared towards their own flavour of science, rather than the originals. After all, science is built upon the principle that we trust much of the work created by our predecessors, until we have evidence to do otherwise, and most of these derived texts tend to be more accessible to the non-statistician. However I agree that, when writing for e.g. Wikipedia, it is certainly useful to differentiate between 'correct' and 'common' usage, particularly when the latter is rather misleading. This is why your contribution intrigued me so -- I look forward to reading around this and getting back to you soon -- many thanks for your swift reply! -- Sjb90 07:39, 17 May 2007 (UTC)
OK, I've now had a read of the references that you mentioned, as well as some others that seemed relevant. Thanks again for giving me these citations -- they were really helpful. This is what I found:
  • First of all, you are quite right to talk of the null hypothesis as the 'original hypothesis' -- that is, the hypothesis that we are trying to nullify. However Neyman & Pearson do in fact use a zero (rather than a letter 'O') as the subscript to denote a null hypothesis. In this way, they show that the null hypothesis is merely the original in a range of possible hypotheses: H0, H1, H2 ... Hi.
  • As you mentioned, Fisher introduced the term null hypothesis, and defines this a number of times in The Design of Experiments. When talking of an experiment to determine whether a taster can successfully discriminate whether milk or tea was added first to a cup, Fisher defines his null hypothesis as "that the judgements given are in no way influenced by the order in which the ingredients have been added ... Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."
  • Later, Fisher talks about fair testing, namely in ensuring that other possible causes of differentiation (between the cups of tea, in this case) are held fixed or are randomised, to ensure that they are not confounds. By doing this, Fisher explains that every possible cause of differentiation is thus now i) randomised; ii) a consequence of the treatment itself (order of pouring milk & tea), "of which on the null hypothesis there will be none, by definition"; or iii) an effect "supervening by chance".
  • Furthermore, Fisher explains that a null hypothesis may contain "arbitrary elements" -- e.g. in the case where H0 is "that the death-rates of two groups of animal are equal, without specifying what those death-rates actually are. In such cases it is evidently the equality rather than any particular values of the death-rates that the experiment is designed to test, and possibly to disprove."
  • Finally, Fisher emphasises that "the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution". He gives an example of a hypothesis that can never be a null hypothesis: that a subject can make some discrimination between two different sorts of object. This cannot be a null hypothesis, as it is inexact, and could relate to an infinity of possible exact scenarios.
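Fisher's requirement that the null hypothesis be exact, so that it supplies the distribution on which the test of significance rests, can be illustrated with the tea-tasting experiment itself. The sketch below assumes the usual textbook form of the design (eight cups, four with milk first, and a taster who knows there are four of each); the function name is made up for the example.

```python
from math import comb


def prob_at_least(correct, cups_each=4):
    """P(at least `correct` milk-first cups are picked correctly) under the
    null hypothesis that the taster's choices are unrelated to the order of
    pouring. The taster picks `cups_each` cups out of 2 * cups_each, so the
    number of correct picks is hypergeometric."""
    total = comb(2 * cups_each, cups_each)
    favourable = sum(
        comb(cups_each, k) * comb(cups_each, cups_each - k)
        for k in range(correct, cups_each + 1)
    )
    return favourable / total


print(f"all 4 correct:      {prob_at_least(4):.4f}")  # 1/70
print(f"at least 3 correct: {prob_at_least(3):.4f}")  # 17/70
```

Because the null hypothesis here is exact, these probabilities can be computed without knowing anything about how good the taster might be. The vague alternative ("the subject can make some discrimination") does not determine any such distribution, which is why Fisher says it can never serve as a null hypothesis.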
So, where does that leave us? I propose to make the following slight changes to the Type I and type II errors page and the null hypothesis page.
  • I will tone down the paragraph about original vs. nil hypotheses: the subscript is actually a zero, but it is entirely correct that the hypothesis should not be read as a "nil hypothesis" -- I agree that it is important to emphasise that the null hypothesis is that one that we are trying to nullify.
  • In the null hypothesis article, I will more drastically change the paragraph that suggests that, for a one-tailed test, it is possible to have a null hypothesis "that sample A is drawn from a population whose mean is lower than the mean of the population from which sample B is drawn". As I had previously suspected, this is actively incorrect: such a hypothesis is numerically inexact. The null hypothesis, in the case described, remains "that sample A is drawn from a population with the same mean as sample B".
  • I will tone down my original suggestion slightly: A null hypothesis isn't a "statement of no effect" per se, but in an experiment (where we are manipulating an independent variable), it logically follows that the null hypothesis states that the treatment has no effect. However null hypotheses are equally useful in an observation (where we may be looking to see whether the value of a particular measured variable significantly differs from that of a prediction), and in this case the concept of "no effect" has no meaning.
  • I'll add in the relevant citations, as these really do help to resolve this issue once and for all!
Thanks again for your comments on this. I will hold back on my edits for a little longer, in case you have any further comments that you would like to add!
-- Sjb90 17:33, 17 May 2007 (UTC)
I agree with your changes. As you can see from [[3]], [[4]], [[5]], and [[6]] I really didn't have a lot to work with.

I believe that it might be helpful to make some sort of comment to the effect that when statisticians work -- rather than scientists, that is -- they set up a question that is couched in very particular terms and then try to disprove it (and, if it can not be disproved, the proposition stands, more or less by default).
The way that the notion of just precisely how the issue of a "null hypothesis" is contemplated by "statisticians" and the way that this (to common ordinary people counter-intuitive notion) of, essentially, couching one's research question as the polar opposite of what one actually believes to be the case (by contrast with "scientists" who generally couch their research question in terms of what they actually believe to be the case) is something that someone like you could far better describe than myself -- and, also, I believe that it would be extremely informative to the more general reader. All the best in your editing. If you have any queries, contact me again pls. Lindsay658 21:49, 17 May 2007 (UTC)
Just a note to say that I have finally had the chance to sit down and word some changes to the Null hypothesis article and the section on Type_I_and_type_II_errors#The_null_hypothesis. Do shout and/or make changes if you think my changes are misleading/confusing! -- Sjb90 11:33, 14 June 2007 (UTC)

Computer Security

Are the 2 examples under the Computer Security section backwards?Mingramh (talk) 14:15, 2 April 2008 (UTC)

I don't see anything backward about them myself. Qwfp (talk) 18:06, 2 April 2008 (UTC)
I think I see what Mingramh is saying... it depends on how you define your positive test result. If a positive test indicates an intruder (which would be my intuition for a computer security test), then the example is correct. If a positive test indicates a user, then the example is reversed. I've actually seen "positive" indicating both definitions in this domain. The example given in the computer security section is clear to me because it fits my intuition, but perhaps it can be made a little clearer for those to whom it is counter-intuitive. If no one wants to try to modify it, I'll take a stab at it soon. WDavis1911 (talk) 23:20, 28 May 2008 (UTC)

Innocent / Not Innocent

I suggest removing this example table. I'm not clear on what legal system abjectly uses a test for "innocence" where the presumption (null hypothesis) is on "not innocent". It could be argued that civil law uses this; however, there it's not really consistent with a hypothesis test at all since the basis is more a "preponderance of doubt". It could also be argued that the de facto situation in a kangaroo court uses a "not innocent" null hypothesis, but in that case I don't believe it's consistent with a hypothesis test either since the "truth" is not what they're after. I never heard of someone being found "not innocent" by even a kangaroo court. Garykempen (talk) 19:41, 22 March 2009 (UTC)

Minor Edits

I'm going to be studying this article, and in the process will clean up a few minor things. The references to a "person" getting pregnant are silly, for example. Check the wiki page on pregnancy: only women get pregnant. Etc.. Brad (talk) 01:50, 25 August 2008 (UTC)

I've changed the two "In other words" lines that try to put the false negative and false positive hypotheses in layman's terms, in order to clarify them a bit. In their previous form they said "In other words, a false positive indicates a Positive Inference is False." This is both poor grammar and very confusing. A false positive doesn't "indicate" anything; a false positive is a statistical mistake - a misreading of the auguries, if you will. I've also taken out the Winnie-the-Pooh capitalization and inserted "actually" to make it clear that the false positive (or negative) stands in contradiction to reality. Ivan Denisovitch (talk) 13:41, 29 November 2010 (UTC)

Merge proposal

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
The consensus was not to merge Qwfp (talk) 08:52, 9 November 2008 (UTC)

Sensitivity and Specificity are essentially measures of type-I and type-II error - I think the articles could be merged. Wjastle (talk) 14:43, 10 October 2008 (UTC)

- Sensitivity and Specificity are also used to characterize the performance of physical or biological tests. In that case nobody calls them type-I or type-II errors, which is statistical jargon. I think we should leave them separate. Frederic Y Bois (talk) 12:53, 14 October 2008 (UTC)
I agree with Frederic. Many people in the health sciences need to learn about sensitivity and specificity but do not need to learn about Type I and Type II errors and would be confused by that nomenclature (which is pretty awful, IMHO, but it's far too entrenched to change). Although the mathematical definitions are closely related, the contexts are quite different, so it's less confusing to keep them separate. Qwfp (talk) 23:07, 14 October 2008 (UTC)
If this is a vote, I agree with Qwfp and Frederic Y Bois--the terms Type I and Type II errors are widely used in statistics and (at least) the social sciences. Were a student to look up Type I errors and be redirected to Sensitivity, she would be enormously confused, since the concepts are only similar, not identical.--Anthon.Eff (talk) 02:26, 15 October 2008 (UTC)
Agreed, definitely no merging. While the concepts are similar, they are applied to very different situations. Kjbeath (talk) 05:01, 3 November 2008 (UTC)
Strongly oppose suggestion of a merger. In epidemiology and diagnostic testing, the issue is Sensitivity and Specificity, and type I and type II errors are barely mentioned. SNALWIBMA ( talk - contribs ) 11:03, 3 November 2008 (UTC)

Both are jargon, and are essentially the same thing, but one takes a positive view (how good the test is) and the other negative (how bad the test is). --Rumping (talk) 17:39, 8 November 2008 (UTC)

Strongly oppose: they are used in completely different disciplinary domains, and are used to denote entirely different entities as their respective referents. A merge would simply encourage an unproductive and misleading sloppiness of speech and thought in those who are less well informed and who come to Wikipedia to be set straight. Lindsay658 (talk) 01:09, 9 November 2008 (UTC)

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.


AIDS testing

Is home testing for AIDS accurate? —Preceding unsigned comment added by 70.119.131.29 (talk) 23:15, 30 July 2009 (UTC)

computer database searches

As far as I remember, the null hypothesis in database searches is that documents are NOT relevant unless proven otherwise. This is thus exactly the opposite of what is said in the article. I am inclined to remove the whole section, but I think I will wait for some time to see whether citations for one or the other interpretation can be given. 194.94.96.194 (talk) 08:43, 9 December 2009 (UTC)

I agree. It's garbage. And in any case this specific example of an application of type I/type II errors etc adds nothing of value to the article. Remove the lot. SNALWIBMA ( talk - contribs ) 09:49, 9 December 2009 (UTC)
Removed section for reasons stated above since no one has disputed in a year. Dthvt (talk) 21:38, 7 December 2010 (UTC)

Apple example

Imagine yourself in a market, wanting to buy an apple. Seeing an apple at a fruit shop, you have to decide whether the apple is healthy or unhealthy. There are four cases:

  1. You buy the apple thinking it is healthy, and it turns out to be healthy. Your decision was right: no error.
  2. You buy the apple thinking it is healthy, and it turns out to be unhealthy. Your decision was wrong: one kind of error.
  3. You leave the apple thinking it is unhealthy, and it turns out to be healthy. Again your decision was wrong: the other kind of error.
  4. You leave the apple thinking it is unhealthy, and it turns out to be unhealthy. Your decision was right: no error.

In cases 1 and 4 you made no mistake; your decision was correct. But in cases 2 and 3 there is an error. The error in case 2 is more harmful; it affects you more because you have bought an unhealthy apple. So this is a TYPE II error. The error in case 3 is less crucial and less harmful, because you have left the apple. So this is a TYPE I error.

I moved this from the lead. The 4-option matrix is helpful (there's actually a chart of this in the body of the article; maybe it should go there). It probably doesn't belong in the lead, since there are already examples, but let's discuss that. Also, whether a type I or type II error is more harmful depends on what the null hypothesis is. Sometimes being too conservative is ok (walking along a cliff), but sometimes being too conservative is very harmful (not pulling the trigger enough). Sometimes being too open is ok (meeting new friends), but sometimes being too open is very dangerous (falling for cons). Any example must make clear that context matters. I'll see if I can incorporate this into the body. Ocaasi (talk) 21:36, 18 March 2011 (UTC)
What makes an apple healthy or unhealthy? If you don't buy it, you won't know until someone else makes a bad decision. It's a selfish decision example. It takes an objective outsider to help. Four cases for the individual, extra cases to get the statistical errors. The simple single-event example neglects the multi-event rates required to define Type I and Type II errors. Zulu Papa 5 * (talk) 07:37, 20 March 2011 (UTC)
I agree that it's not a great example. Although, the single simple event is implied to have scale effects; for example, a common type II error, and one used in the article, is the legal case where the court lets a guilty man go free to avoid the type I error of convicting an innocent one. That single case is not altogether different from the apple example, and if many such people make many such choices, the error becomes statistically significant. Maybe I should mention that while single-case examples can help illustrate the types, they do not become statistically relevant until there is a group of them. Otherwise, what do you think of the current lead as I updated it yesterday? Ocaasi (talk) 12:45, 20 March 2011 (UTC)
Defining statistical errors has nothing to do with their relative "harmfulness". We seem to have (too) many good examples already.

Loard (talk) 15:49, 13 April 2011 (UTC)
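
To make the point about rates concrete: the error types are defined over many decisions, not one purchase. A minimal sketch follows. It assumes the null hypothesis implied by the labelling in the apple example (H0: "the apple is healthy", with leaving the apple counting as rejecting H0), and the tallies are made-up numbers for illustration only, not data from any source.

```python
# Hypothetical tallies over many apple decisions (made-up numbers, illustration only).
# Assumed null hypothesis H0: "the apple is healthy"; leaving the apple = rejecting H0.
counts = {
    ("healthy", "bought"): 85,    # case 1: correct decision
    ("unhealthy", "bought"): 2,   # case 2: type II error (H0 false, not rejected)
    ("healthy", "left"): 5,       # case 3: type I error (H0 true, rejected)
    ("unhealthy", "left"): 8,     # case 4: correct decision
}

healthy_total = counts[("healthy", "bought")] + counts[("healthy", "left")]
unhealthy_total = counts[("unhealthy", "bought")] + counts[("unhealthy", "left")]

# Each rate is conditional on the true state of the apple.
type_i_rate = counts[("healthy", "left")] / healthy_total
type_ii_rate = counts[("unhealthy", "bought")] / unhealthy_total

print(f"type I rate:  {type_i_rate:.3f}")
print(f"type II rate: {type_ii_rate:.3f}")
```

Neither rate says anything about which error is more harmful; that depends on the context, as noted above.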

Interview

I have removed the following:

"A false positive in hiring job candidates would mean turning down a good candidate which performed poorly in an interview, while a false negative means hiring a bad candidate that performs well in an interview."

This scenario provided by User 174.21.120.17 is an excellent example of (so to speak) "perfect choice" consequent upon a "flawed" selection process, and has nothing to do with type I and type II errors.

By contrast, a type I or type II error would be one that is consequent upon a flawed selection choice that was made from a (so to speak) "perfect" selection process. Lindsay658 (talk) 22:32, 27 March 2011 (UTC)

Reorganization 2011

There are two articles discussing related, but non-synonymous terms: this one and Sensitivity and specificity. Consensus (see Merge proposal) seems to be that they should stay separate. This article takes care of the technical side of things (formulas, alternative definitions), while Sensitivity and specificity discusses issues arising in the application of statistical methods to fields such as medicine, physics and "airport security check science". I have edited the summary accordingly, also adding a (slightly clumsy) "about" template redirecting "applied" readers to Sensitivity and specificity. The rest of the article should follow.

I would like to discuss the following improvements:

  1. My introduction
  2. Creation of a new paragraph "Definitions", which would clearly link (also using the aforementioned table) often confused but related terms such as type I and II errors, false negative, false positive, alpha, beta, sensitivity, specificity, false positive and negative rate, etc. This should of course not appear in Sensitivity and specificity, where application has priority
  3. Clean-up of "Further extension" - some examples are anecdotal [Type III errors should be elucidated in more detail, especially with relation to drawing true conclusion from false premisses as a definition of invalid argument]
  4. Stressing that talking about type I and type II errors is meaningless without knowing "the real state of things" (we need a clear term for this one too) and clear relation to null hypothesis; as well as that null hypothesis is never accepted, one can only fail to reject it
  5. Rewriting of "Statistical error" in a non-textbook manner
  6. Accommodation of true positives and true negatives (false/true negatives/positives all redirect here, maybe 'trues' should redirect to Statistical hypothesis testing?)
  7. Expansion of the table under type II error to include "true positive" and "true negative" under "right decision"
  8. Merging of "Consequences" paragraph with "Examples"
  9. Shortening of "Examples" (they should be more prominent in specificity and sensitivity)
  10. Removing "false positive rate", "false negative rate", "null hypothesis", "Bayes' theorem" and "error in science" from their respective paragraphs and putting them into "definitions" in another, proper place

Loard (talk) 17:47, 13 April 2011 (UTC)

Problematic article, overall

While there are portions of great substance in this article, overall it leaves this reader -- who has seen many presentations of this subject over the years, and three further today -- with the impression of a flailing about with the subject, or perhaps experimenting with it. It is like a teacher converting their lecture notes into a textbook after the first time teaching through a course, rather than after teaching the subject for many years.

So, though rarely would I say this, I feel less solidly informed on this subject (honestly, feeling more confused) after perusing the wiki article than I did when first coming to the site (after having looked at 2-4 min YouTube videos to see state of that pedagogic art). Here, one is left with the sense of the results of gulping large mouthfuls and swallowing with only brief chewing -- which gets the job done, but is not great in terms of nutrition (or avoiding GI discomfort). I ascribe the impression as likely being due to the polyglot authorship.

So, I'd suggest that the wiki group that's created this page:

(1) Agree to have one significant and authoritative contributor take on the task of a major edit;

(2) That the individual do that major edit by first distilling the article down to the best available single pedagogic approach to presenting the T-1/T-2 definitions and fundamentals, **including standard graphics/tables the best teachers use** (!);

(3) Expanding to add a second standard but alternative (textbook-worthy) approach to the explanation, but one clearly tied to the first approach so that readers get two different attempts ("angles") at explaining the fundamental concepts;

(4) That what is provided next be a **limited**, clear, authoritative set of examples, with a focus on standard examples appearing in at least one major source text, which can then be cited so that deeper discussion and understanding can be pursued;

(5) That all other extraneous subject matter be limited or eliminated, e.g., in taking the reader from standard Type I/II thinking to the current state of the statistical art (what most statisticians believe and apply, right now), rather than listing encyclopedically every additional proposed error type, even the humorous -- one short insight-filled paragraph (please!);

(6) That historical aspects be reduced to one such brief paragraph, rather than presented as an alternative parallel development of the fundamentals (every chemist mentions its origin with Kekule when teaching about benzene's aromaticity; no one spends any more than a line or two on it, because our understanding of the concept is all the deeper for the passed time, and because the target audience for his writing is not like current audiences in any way); and

(7) That referencing and notes be reviewed for focus, relevance, and scholarly value; e.g., the citation of the high school AP stats prep website should probably go.

These are my suggestions as a teacher and writer and as a knowledgeable outsider on this, and are prompted by my inability to suggest the article, as is, to young people needing to understand these errors.

Note: The only thing I did editorially was to remove as tangential the reference to types of error in science, because that section begged the question of the relation of those types to this article's Type I and II errors, where the connection (explanatory text) was completely missing. (!)

Prof D.Meduban (talk) 00:45, 9 June 2011 (UTC)

Thanks for your insightful comments. One key point. Wikipedia functions as a reference encyclopedia not a pedagogic tool. We include all relevant aspects and do not do so in a how-to format. You can help us improve specific examples and phrasing if you like, but please take note of the difference between this approach and a teaching guide. Cheers, Ocaasi t | c 02:35, 9 June 2011 (UTC)

this concept is totally wrong

The type I and type II definitions are totally wrong according to more books and other websites. — Preceding unsigned comment added by K2k1984 (talk · contribs) 10:05, 15 July 2011 (UTC)

Hopefully the parts you are referring to are now correct. Kernel.package (talk) 06:56, 13 October 2011 (UTC)

In the section "Type II error", under the section "Statistical test theory", rejecting the null hypothesis is equated with proving it false. This is inaccurate:

"A statistical test can either reject (prove false) or fail to reject (fail to prove false) a null hypothesis, but never prove it true (i.e., failing to reject a null hypothesis does not prove it true)" — Preceding unsigned comment added by 97.251.38.192 (talk) 12:36, 9 November 2012 (UTC)

New intro

I rewrote the introduction, firstly because the terms "false positive" and "false negative" come from other test areas, and are not specifically used in statistical test situations. And secondly because the example that was given could hardly be understood as an example of a statistical test. Nijdam (talk) 09:15, 13 October 2011 (UTC)

Related terms

I've the impression that big parts of the article treat type I and type II errors as synonymous with false positives and false negatives. Imo they belong to different fields of research. Nijdam (talk) 11:15, 14 November 2011 (UTC)

I've noticed that "false positive" and "false negative" redirect to this article. This means either this article should clearly treat all the terms and the different contexts they refer to, or a separate article should treat "false positive" and "false negative". Nijdam (talk) 20:28, 14 November 2011 (UTC)

3.4 Parts Per Million

In the section "Understanding Type I and Type II errors," the last sentence includes "3.4 parts per million (0.00002941 %)," in which the parenthetical percentage seems to be in error. If you calculate 3.4/1000000, the answer is 3.4E-6, or 0.00034%. Gilmore.the.Lion (talk) 15:06, 21 October 2011 (UTC)
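
The arithmetic can be checked directly. A minimal sketch (the variable names are purely illustrative):

```python
# Convert 3.4 parts per million to a plain fraction and to a percentage.
ppm = 3.4
fraction = ppm / 1_000_000   # parts per million -> fraction of the whole
percent = fraction * 100     # fraction -> percent

print(f"{fraction:.1e}")     # 3.4e-06
print(f"{percent:.5f} %")    # 0.00034 %
```

So the parenthetical should read 0.00034 %, not 0.00002941 %.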

Too technical

I got to this article via the false positive redirect, hoping for some basic explanation of what a false positive is and what the significance of them is. Instead I get something about a "Type I error" and an "error of the first kind" that just makes my brain hurt trying to understand what it is saying - and I've taken an undergraduate-level course in philosophical logic in the past and have been told on several occasions I have above average comprehension. Wikipedia is a general encyclopaedia not an encyclopaedia for professional statisticians. Thryduulf (talk) 13:11, 9 December 2011 (UTC)

Totally agree with Thryduulf. I came here using the same search terms and was dismayed by the hyper-technicality of the article. For the lay reader it is, IMO, mostly rubbish. (anon, 15 June 2012)
Agree... Can something be done about the triple negative in the opening sentence please "incorrect rejection of a true null hypothesis." 122.150.178.86 (talk) 11:42, 1 July 2013 (UTC)
Here is a suggested rewording. Because I am a statistician I may be kidding myself that this is simpler for lay people, but here goes...
A type I error (or error of the first kind) happens when a statistician mistakenly suggests a new cause and effect relationship. The correct conclusion in such a situation should have been to accept the "null hypothesis" (i.e. the hypothesis that the relationship observed by predecessors is valid).
A type II error (or error of the second kind) is the failure to recognise a new relationship. In such a situation the null hypothesis was wrongly accepted by the statistician.
Sqgl (talk) 14:42, 1 July 2013 (UTC)

New Introduction

As the original contributor, I have written a new introduction, and have moved the previous introduction down the page. I think that, as I have written it, most of the "complaints" about the technical complexity of the article will stop -- given that we can now say that the other stuff is connected with the imprecise catachrestical extension of the technical terms (by people that are too lazy to create their own) into these other areas in which, it seems, it is not desirable to speak in direct plain English, and that one must be as distant, obscure, and as jargon-laden as possible.Lindsay658 (talk) 00:48, 21 December 2011 (UTC)

Wolf

There has been editing over and over about the example of the wolf.

H_0: "no wolf"

Is rejected when someone cries "wolf".

Type I error is made when H_0 is rejected and there is actually no wolf.
Type II error is made when H_0 is not rejected but there is a wolf.

So far so good. But what about the terms (excessive) "skepticism" and "credulity"? Normally a test is skeptical, i.e. one only believes there is a wolf if one sees one. Because of this skepticism a type I error is not made easily. As a consequence a type II error is at hand, and may only be avoided by opting for credulity, i.e. choosing a reliable investigator, who only cries "wolf" when he is pretty sure there is one. If we want to use the words credulity and skepticism in connection with the types of error, we may say: as a consequence of the "credulity" an error of type I is sometimes made, and as a consequence of the "skepticism" an error of type II is sometimes made. Nijdam (talk) 07:58, 12 April 2012 (UTC)

The reference given does not say that Type I error is an "error of credulity." It discusses these terms in the extremes: "believe everything!" and "believe nothing!" The phrase "Type I error is an error of credulity" is vague - does it mean that one makes an error by believing that the null is true? That is not Type I error. Also an "error of credulity" could go either way - too much or too little. There is no good way to fix these sentences using the same words because they can be interpreted different ways. What is an "error of skepticism"? Was one too skeptical? Not skeptical enough? Skeptical when they should have been credulous? Finally I cannot see why anyone would want to rely on an "Encyclopedia of Pseudoscience" for a definition of a statistical concept. I could find ten other good definitions in statistical references. I don't think that we want pseudoscience but rather science. Mathstat (talk) 19:23, 12 April 2012 (UTC)
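
For reference, the four outcomes implied by the definitions at the top of this thread (H_0: "no wolf", rejected when someone cries "wolf") can be enumerated mechanically. This is only a sketch; the function name `classify` and its parameters are made up here for illustration and do not come from the article or its sources.

```python
def classify(h0_true: bool, h0_rejected: bool) -> str:
    """Label one test outcome given the true state and the decision."""
    if h0_true and h0_rejected:
        return "type I error"       # H0 rejected although it is true
    if not h0_true and not h0_rejected:
        return "type II error"      # H0 not rejected although it is false
    return "correct decision"

# H0: "no wolf"; H0 is rejected when someone cries "wolf".
for wolf_present in (False, True):
    for cried_wolf in (False, True):
        outcome = classify(h0_true=not wolf_present, h0_rejected=cried_wolf)
        print(f"wolf={wolf_present!s:5} cry={cried_wolf!s:5} -> {outcome}")
```

The labels depend only on the true state and the decision, which is why words like "credulity" and "skepticism" describe a tendency of the tester rather than the errors themselves.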

Recent edits

It is never good practice to make many edits at the same time, as it is rather difficult to see which ones are acceptable and which are not. So, make suggestions here on the talk page before changing the article. Nijdam (talk) 07:04, 5 September 2012 (UTC)

Dear N,
Can you please have a look at those edits and tell me which ones you think are incorrect?
They are each also, I think, easily revertable one by one with reasons, rather than en masse...
Talk here would essentially be recapitulating them. Concisely: the edits removed unnecessary anecdotes and unreferenced claims for alternative names, and focussed on clear examples (as requested in the heading, noting that the article is currently unapproachable). No substantive content was removed. Tim bates (talk) 11:25, 5 September 2012 (UTC)

I wouldn't know how to reject one edit and accept a later one. Nijdam (talk) 07:56, 6 September 2012 (UTC)

Then you edit and stop reverting. Curb Chain (talk) 00:44, 11 September 2012 (UTC)
You're the one who wants to change some parts. Nijdam (talk) 05:48, 12 September 2012 (UTC)

Mnemonic Device to remember Type I and Type II Errors

A few years ago, I came up with a mnemonic device to help people remember what Type I and Type II errors are. Just remember "apparition" and "blind".

Apparition starts with an "A", as does "alpha", which is another term for Type I error. An apparition is a ghost, i.e. you're seeing something (a defect or a difference) that isn't there.

Blind starts with a "B", as does "beta", which is another term for Type II error. Blind means you're not seeing something (a defect or a difference) that is there. — Preceding unsigned comment added by WaltGary (talk · contribs) 08:38, 6 October 2012 (UTC)

Consequences section about NASA

I personally do not agree with the section called "Consequences", where the article discusses NASA. You guys wrote: "For example, NASA engineers would prefer to throw out an electronic circuit that is really fine (null hypothesis H0: not broken; reality: not broken; action: thrown out; error: type I, false positive) than to use one on a spacecraft that is actually broken (null hypothesis H0: not broken; reality: broken; action: use it; error: type II, false negative). In that situation a type I error raises the budget, but a type II error would risk the entire mission."

I thought the definitions were the following. Type I error: rejecting the null hypothesis when it is really true. Type II error: accepting the null hypothesis when it is really not true. Null hypothesis: the hypothesis the researcher is testing is not true; there is no statistical significance. Researcher hypothesis: there is a statistical significance.

Basically the section about NASA could have been interpreted with two scenarios, for instance:

Scenario 1: alternative scenario

Electronic circuit is good: researcher hypothesis.

Electronic circuit is not good: null hypothesis.

Type I error: electronic circuit is judged good, when it's really not good. Rejecting the null hypothesis.

Type II error: electronic circuit is judged not good, when it's really good. Accepting the null hypothesis.


Scenario 2: your scenario

Electronic circuit is broken: researcher hypothesis.

Electronic circuit is not broken: null hypothesis.

Type I error: electronic circuit is judged broken, when it's really not broken. Rejecting the null hypothesis.

Type II error: electronic circuit is judged not broken, when it's really broken. Accepting the null hypothesis.


I was totally confused by this section of the article. Please inform me if I am wrong, or maybe I didn't understand what you meant by this section. — Preceding unsigned comment added by 67.80.92.202 (talk) 22:26, 3 April 2013 (UTC)
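
The two scenarios above differ only in which statement is taken as the null hypothesis, and swapping the null swaps the labels on the same two mistakes. A minimal sketch of that bookkeeping follows; the function `error_type` and its parameter names are hypothetical, introduced here only for illustration.

```python
def error_type(null_is_broken: bool, really_broken: bool, judged_broken: bool) -> str:
    """Label a decision about a circuit, given which statement is the null hypothesis."""
    h0_true = (really_broken == null_is_broken)      # does reality match H0?
    h0_rejected = (judged_broken != null_is_broken)  # does the decision contradict H0?
    if h0_true and h0_rejected:
        return "type I"
    if not h0_true and not h0_rejected:
        return "type II"
    return "no error"

# The same two mistakes, labelled under each choice of null hypothesis.
mistakes = {
    "good circuit thrown out": dict(really_broken=False, judged_broken=True),
    "broken circuit used": dict(really_broken=True, judged_broken=False),
}
for null_is_broken, name in ((True, "Scenario 1, H0: not good"),
                             (False, "Scenario 2, H0: not broken")):
    for mistake, state in mistakes.items():
        print(f"{name}: {mistake} -> {error_type(null_is_broken, **state)}")
```

Under Scenario 2 this reproduces the labels quoted from the article: throwing out a good circuit is a type I error and using a broken one is a type II error. Under Scenario 1 the labels are exchanged, so both readings are internally consistent once the null hypothesis is stated.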

The Wolf Thing Again

The moral of The Boy Who Cried Wolf is that if you abuse people's trust they won't trust you when it matters. It is about deliberate deceit, not being mistaken (i.e. an error!). It's not intuitive and it's a terrible metaphor. It really doesn't help make it clearer. Why not use an example which is actually about making an error?

Lucaswilkins (talk) 17:05, 25 October 2013 (UTC)