Talk:Sensitivity and specificity
The content of Sensitivity (tests) was merged into Sensitivity and specificity. That page now redirects here. For the contribution history and old versions of the redirected page, please see ; for the discussion at that location, see its talk page.
The content of Specificity (tests) was merged into Sensitivity and specificity. That page now redirects here. For the contribution history and old versions of the redirected page, please see ; for the discussion at that location, see its talk page.
- 1 merge
- 2 action-required: new-image
- 3 Merger proposal
- 4 Order
- 5 Specificity/Sensitivity in Bioinformatics
- 6 Suggested edits to be made under 'Specificity'
- 7 Simple summary or interpretation
- 8 Sufficient sample size for sensitivity?
- 9 Confidence intervals
- 10 Denominators of definitions
- 11 How "highly" sensitive or specific does a test have to be before "SNOUT" and "SPIN" apply?
- 12 Type I/II errors, and FP and FN's
merge
Request merge of "negative predictive value", "positive predictive value" and "Sensitivity and specificity". These terms are intimately related and should be in one place, possibly with a discussion of ROC. Further, I suggest modeling on <http://www.musc.edu/dc/icrebm/sensitivity.html>; this is a great exposition of this complicated stuff. Cinnamon colbert (talk) 22:25, 16 November 2008 (UTC) PS: the three articles are a great start
I also think that the entries on precision and recall should somehow be linked to this page. Recall is the same thing as sensitivity; furthermore, specificity is merely recall with regard to negative data points. (Wyatt) —Preceding unsigned comment added by 18.104.22.168 (talk) 15:17, 5 November 2010 (UTC)
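The point that specificity is just recall computed on the negative class can be checked with a small sketch. The `recall` function and the toy label lists below are hypothetical, written only for illustration: swapping which label counts as "positive" turns recall (sensitivity) into specificity.

```python
def recall(y_true, y_pred, positive):
    """Fraction of items whose true label is `positive` that were also predicted `positive`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(1 for p in relevant if p == positive) / len(relevant)

# Hypothetical toy data: 1 = has the condition, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

sensitivity = recall(y_true, y_pred, positive=1)  # TP / (TP + FN): recall on the positive class
specificity = recall(y_true, y_pred, positive=0)  # TN / (TN + FP): recall on the negative class
print(sensitivity, specificity)
```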
action-required: new-image
OK, I give up. Adding Images to Wiki is a nightmare.
I made a new image for this page that I think is more intuitive. http://s15.postimg.org/yoykdv34r/sensitivity_vs_specificity.png
Someone with more edits to their credit (most of mine have been made anon due to laziness) should add that image to the page. Cheers, -- Dave — Preceding unsigned comment added by Ddopson (talk • contribs) 18:12, 6 September 2013 (UTC)
Order
Common usage should play no role in article name, as normally-present concerns (e.g. unfindability) are compensated for by redirects. The same holds true with respect to layman's terminology (gingiva vs. gums). By reversing the order, the article inverts intuitiveness by presenting and elucidating that which relates to type 2 errors prior to that which relates to type 1 errors. DRosenbach (Talk | Contribs) 12:40, 1 October 2009 (UTC)
The Wikipedia:Naming conventions policy, specifically the section Wikipedia:Naming conventions#Use common names, says otherwise: "Articles are normally titled using the most common English-language name of a person or thing that is the subject of the article". 'Gingiva' is different as 'Gums' is ambiguous (see Gum) so the next section Wikipedia:Naming conventions#Be precise when necessary comes into play, but in the case of this article neither order is more or less ambiguous or precise than the other. Presenting specificity first because it relates to Type I error is more logical only to those who already know about Type I and Type II errors. To most non-statisticians, these terms are more confusing and less familiar than sensitivity and specificity. Qwfp (talk) 07:22, 3 October 2009 (UTC)
Specificity/Sensitivity in Bioinformatics
In bioinformatics literature, the terms 'specificity' and 'sensitivity' are used, but are different from what is shown in this article. The terms are used as synonyms for precision and recall. In other words, sensitivity is used correctly, but specificity seems to be used for positive predictive value. I haven't been able to find a source explicitly exposing this difference, though, so I haven't edited the article to include this caveat. —Preceding unsigned comment added by Kernco (talk • contribs) 17:53, 9 April 2010 (UTC)
In fact, the terms 'specificity' and 'sensitivity' have the same meaning in the bioinformatics literature. Perhaps some authors used the wrong formula in one bioinformatics paper; someone pointed out the error in a comment.
- I suspect that, because it is virtual, informatics has trouble establishing the real and correct (ground) truth when operationally defining "actual positives" and "actual negatives". Perhaps a section on informatics definitions would be beneficial in this article. Zulu Papa 5 * (talk) 02:11, 6 November 2010 (UTC)
In the year since I posted this and continued my PhD research in bioinformatics, it has become apparent to me that there's no consensus in the community on what the formula for specificity should be, though my impression is still that the most common usage of specificity in the literature is as a synonym for positive predictive value. It's probably a case of a wrong usage in the past propagating forward, since you must use the same evaluation metrics to compare your own results with previous ones, so I'm not pushing for this page to be changed in any way. I think it's an interesting anomaly, though. Kernco (talk) 20:34, 19 April 2011 (UTC)
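For reference, the two quantities being conflated differ only in their denominator. Writing TP, FP, TN and FN for true/false positives/negatives (notation assumed here, not taken from the comments), the clinical definition of specificity and the precision (positive predictive value) that, per the comments above, some bioinformatics papers call "specificity" are:

```latex
\text{specificity} = \frac{TN}{TN + FP},
\qquad
\text{precision} = \text{PPV} = \frac{TP}{TP + FP}
```

The first is computed over the actual negatives and the second over the test positives, so a figure reported under one definition cannot be compared directly with a figure reported under the other.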
Suggested edits to be made under 'Specificity'
Suggested edits to be made under 'Specificity':
Hi, I'm new to editing here and don't know how to do it properly. However, I hope that somebody who cares can see whether the following edits help. Because I'm unfamiliar with the 'code' used by Wikipedia, please accept the improvised 'formatting' I've devised, as follows.
- Text in italics is original text that should be deleted.
- Text in bold is what I think should be added.
Rgds email kingnept(at) singnet.com.sg
Specificity: "...A specificity of 100% means that the test recognizes all actual negatives - for example in a test for a certain disease, all healthy disease free people will be recognized as healthy disease free. Because 100% specificity means no positives disease free persons are erroneously tagged as diseased. , a positive result in a A high specificity test is normally used to confirm the disease. The maximum can trivially be achieved by a test that claims everybody healthy regardless of the true condition. Unfortunately a 100%-specific test standard can also be ascribed to a 'bogus' test kit whereby nobody, not even those who are truly diseased, ever get tagged as diseased. Therefore, the specificity alone does not tell us how well the test recognizes positive cases. We also need to know the sensitivity of the test. A test with a high specificity has a low type I error rate." —Preceding unsigned comment added by 22.214.171.124 (talk) 21:49, 9 July 2010 (UTC)
Postscript: IMHO, 'sensitivity' and 'specificity' are both VERY BAD misnomers that serve confusion rather than clarity. More accurate descriptions would be 'True-Positive-Rate' and 'True-Neg-Rate' respectively; alas, 'sensitivity' and 'specificity' seem to be poor colloquialisms that have long served to confuse and mislead. Too bad they now seem to be convention, but perhaps someone with clout and commitment should nonetheless clarify this ambiguity. 126.96.36.199 (talk) 00:48, 10 July 2010 (UTC) Rgds, Kingnept
- Coming to this article by searching for False positive, I too wonder whether there is scope for describing the simplest concepts in terms that will make obvious sense to an interested layman. False positive and false negative are both important and, if grasped, allow the layperson to understand quite a lot of the more important issues in real life. I would hope to see them in the lede. Richard Keatinge (talk) 18:59, 6 August 2012 (UTC)
Simple summary or interpretation
I tried to summarise with
"==Medical example== Eg. a medical diagnostic criteria quoted as having sensitivity = 43% and specificity = 96% means that 43% of the people with the criteria have the disease, and 96% of the people without the criteria do not have the disease. Hence 'sensitive' tests help confirm a diagnosis whilst 'specific' tests help exclude a diagnosis."
but it seems to conflict with the Worked Example. Can anyone confirm or correct my summary above, please? Does it mean that 43% of the people with the disease have the criteria and 96% of the people without the disease do not have the criteria? If that's the case, wouldn't we more usefully characterise tests by PPV and NPV rather than by sensitivity and specificity? Rod57 (talk) 13:33, 17 August 2010 (UTC)
- I believe the point lies in the context of how the test is to be applied. Sensitivity and specificity are the standard route to receiver operating characteristics, which are great for optimizing diagnostic performance but say little about how a test is applied to benefit a decision. The predictive values focus on the test outcome, while sensitivity and specificity describe the quality of the test's performance as a standard. The example given is meant to illustrate that a test may be good at "confirming" or "excluding" a diagnosis. In this example, the 96% specificity means the test is better at excluding. How the diagnosis is framed for benefit then sets the next question to ask of this test. Zulu Papa 5 * (talk) 02:23, 6 November 2010 (UTC)
[First-time user] The above summary confuses Sensitivity/Specificity with Positive Predictive Value (PPV)/Negative Predictive Value (NPV). Correction of the above summary: A medical test is 34% sensitive; if we are given a person with the condition, then the test has a 34% chance of being positive. A medical test is 69% specific; if we are given a person without the condition, then the test has a 69% chance of being negative.
A test has a 43% PPV; if we are given a positive test, then there is a 43% chance that the person has the condition. If we are given 100 positive tests (one per person), it is likely that 43% of the people have the condition. A test has a 96% NPV; if we are given a negative test, then there is a 96% chance that the person does not have the condition. If we are given 100 negative tests, it is likely that 96% of the people do not have the condition.
The original page also confuses Specificity with PPV, with the following: "If a test has high specificity, a positive result from the test means a high probability of the presence of disease." It should read, "For a test with a high specificity, if we are given a person without the condition then there is a high chance that the test is negative." The original page must be corrected, by an experienced editor. No sources to cite. 184.108.40.206 (talk) 09:25, 2 November 2011 (UTC)
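The distinction drawn above (sensitivity and specificity condition on the true disease status; PPV and NPV condition on the test result) can be made concrete with a 2×2 table. This is a minimal sketch: the `TwoByTwo` class is hypothetical, and the counts (20, 10, 180, 1820) are an assumption reconstructed from the figures quoted for the article's bowel-cancer worked example (30 cancers, 66.7% sensitivity, 91% specificity, 10% PPV).

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    tp: int  # condition present, test positive
    fn: int  # condition present, test negative
    fp: int  # condition absent, test positive
    tn: int  # condition absent, test negative

    def sensitivity(self):  # P(test positive | condition present)
        return self.tp / (self.tp + self.fn)

    def specificity(self):  # P(test negative | condition absent)
        return self.tn / (self.tn + self.fp)

    def ppv(self):  # P(condition present | test positive)
        return self.tp / (self.tp + self.fp)

    def npv(self):  # P(condition absent | test negative)
        return self.tn / (self.tn + self.fn)

# Counts assumed from the worked example's quoted figures.
fob = TwoByTwo(tp=20, fn=10, fp=180, tn=1820)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(fob, name)():.3f}")
```

Sensitivity and specificity divide by the totals of people with and without the disease; PPV and NPV divide by the totals of positive and negative test results. That is why a test can have 91% specificity and still only a 10% PPV.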
Sufficient sample size for sensitivity?
Responding to this comment here:
- Hence with large numbers of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%) and further investigations must be undertaken, it will, however, pick up 66.7% of all cancers (the sensitivity).
In: the worked example.
Only three people were tested, so if the test were done on, let's say, 100 people with bowel cancer, there might be a different proportion than 66.7%.
So is it correct to say "the sensitivity of the test is 66.7%"? Wouldn't we need to test it on more people who have bowel cancer?
Although perhaps we could have said something like "the sample sensitivity is 66.7%" as contrasted with the theoretical sensitivity.
Wolfram's MathWorld, at least, defines "sensitivity" as the probability that a positive value tests positive, so we may not have enough samples to get a good estimate of that probability.
- Although there were only three people who tested positive, the test was done on all two hundred and three people. More numbers would give you a better estimate of the probability. I think the most you can say is that although the sensitivity appears to be 66.7%, the confidence limits on that figure would necessarily be very wide. A larger sample may or may not show a changed figure, but the more cases, the narrower the confidence interval should be. I guess that when talking about a test, the numbers come from samples of the whole population and are always an estimate. Egmason (talk) 23:47, 30 March 2011 (UTC)
- I removed the clarify tag -- it looks like somebody changed the numbers around so that it is now thirty people who have bowel cancer rather than only three people. I just changed the wording around a little bit consistent with the idea that one study does not necessarily determine the performance of the particular test. There may be further uncertainty about the performance.
Confidence intervals
Would it be useful to have something about confidence intervals (or credible intervals for Bayesians) here? You would presumably calculate the Binomial proportion confidence interval, or would find the posterior distribution given an uninformative Beta(1,1) prior. In the given example you would compute a 95% confidence interval for the sensitivity of (0.5060, 0.8271) and a 95% credible interval of (0.4863, 0.8077). You would compute a 95% confidence interval for the specificity of (0.8971, 0.9222) and 95% credible interval of (0.8967, 0.9218). These calculations assume that prevalence is exactly that in the data. Ts4079 (talk) 14:42, 1 November 2011 (UTC)
- Such confidence intervals could be easily misinterpreted since sensitivity and specificity are often closely related, as demonstrated in the ROC curve. Something more advanced is probably necessary. Ts4079 (talk) 15:14, 1 November 2011 (UTC)
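A binomial proportion interval for each rate can be sketched with the standard library alone. This uses the Wilson score interval as one choice among several (Clopper-Pearson, Wald, Bayesian Beta posteriors); the comment above does not say which method produced its figures, so this sketch need not reproduce them exactly. The counts (20 of 30 cancers detected, 1820 of 2000 cancer-free people testing negative) are an assumption reconstructed from the worked example's quoted rates, and `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Assumed counts from the worked example.
sens_ci = wilson_interval(20, 30)      # sensitivity: true positives / people with cancer
spec_ci = wilson_interval(1820, 2000)  # specificity: true negatives / people without cancer
print("sensitivity 95%% CI: (%.4f, %.4f)" % sens_ci)
print("specificity 95%% CI: (%.4f, %.4f)" % spec_ci)
```

The sensitivity interval comes out far wider than the specificity interval because it rests on only 30 diseased people, which is the sample-size concern raised in the previous section.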
Denominators of definitions
The section Sensitivity provides the following definition:
I do not understand how the denominator "number of positives" is related to the denominator that precedes it: "number of true positives + number of false negatives". I thought the number of positives would instead equal the number of true positives + number of false positives. Thus, I believe this to be a typo and that "number of positives" should be replaced with something like "number of ill people", which is number of true positives + number of false negatives.
As the page currently stands, it appears that the first and last expressions in this three part equation correspond to "sensitivity" as defined here. The middle expression, which I am questioning, instead appears to correspond to the Positive predictive value as defined on that page. (The above talk section on Sensitivity in Bioinformatics suggests that sometimes the word "sensitivity" is used for positive predictive value. While that might be true, I think we should not switch between the two definitions mid equation.)
I have a similar concern about the definition of "specificity" in the section Specificity. There I believe that the denominator "number of negatives" should be something like "number of well people".
- I think this is a valid point; both the sensitivity and specificity equations use "positive" to mean both a positive test result (which is either true or false relative to the population ground truth) or a positive (i.e., actual) occurrence of illness in a member of the population without sufficiently distinguishing between the two meanings. I agree that changing "number of positives" to something like "number of ill people" or "number of actual positives" (and making an analogous change for the specificity equation) would clarify matters. 220.127.116.11 (talk) 19:12, 18 June 2013 (UTC)
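Written out with the actual-status and test-result counts kept separate (TP, FP, TN, FN notation assumed here), the denominators under discussion are:

```latex
\text{sensitivity} = \frac{TP}{TP + FN} = \frac{TP}{\text{number of actual positives (ill people)}},
\qquad
\text{specificity} = \frac{TN}{TN + FP} = \frac{TN}{\text{number of actual negatives (well people)}}

\text{PPV} = \frac{TP}{TP + FP} = \frac{TP}{\text{number of positive test results}}
```

Writing "number of positives" for the sensitivity denominator is only correct if it means actual positives; read as positive test results, it turns the expression into the PPV.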
How "highly" sensitive or specific does a test have to be before "SNOUT" and "SPIN" apply?
The article states (under the "Sensitivity" and "Specificity" sections respectively) that "negative results in a high sensitivity test are used to rule out the disease" (referred to by the mnemonic "SNOUT" later in the article) and that "a positive result from a test with high specificity means a high probability of the presence of disease" (described by the mnemonic "SPIN"). However, the example calculation (for a test with a specificity of 91% and a sensitivity of 67%) demonstrates a case in which a positive result from a high specificity test (SPIN) clearly does not correspond to a high probability of the presence of disease (PPV is 10% in the example). Although this depends on prevalence, it seems to indicate that the numerous SNOUT/SPIN-type statements throughout the article are inaccurate as written. (Another such statement is in the "Medical Examples" section, which states that "[a] highly specific test is unlikely to give a false positive result: a positive result should thus be regarded as a true positive".)
Accordingly, I think these statements should either be modified, to give some idea of exactly how high sensitivity or specificity has to be before SNOUT/SPIN apply and to note the effect of disease prevalence on these assertions, or be removed entirely. (As noted above, the worked example uses a specificity of 91%, which is fairly high, yet a positive test result clearly does not correspond to a high probability of the disease being present.) 18.104.22.168 (talk) —Preceding undated comment added 19:40, 18 June 2013 (UTC)
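The prevalence dependence can be shown with Bayes' theorem. This is a minimal sketch: the `ppv` and `npv` helpers are hypothetical, the sensitivity (2/3) and specificity (0.91) are the worked example's quoted figures, and the baseline prevalence of 30/2030 is an assumption reconstructed from that example's counts.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value, P(disease | positive test), via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value, P(no disease | negative test), via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

sens, spec = 2 / 3, 0.91  # worked-example figures
for prev in (30 / 2030, 0.05, 0.20, 0.50):
    print(f"prevalence {prev:.3f}: PPV {ppv(sens, spec, prev):.3f}, NPV {npv(sens, spec, prev):.3f}")
```

With the specificity held at 91%, the PPV rises from 10% at the worked example's prevalence to about 88% when half of the tested population is ill. So whether a positive result from a highly specific test "rules in" the disease depends on the prevalence as well as on the specificity.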
Type I/II errors, and FP and FN's
It seems to me that the big table near the top of the page has Type I and Type II errors reversed. According to other sources (both my stats book and the explicit links to Type I and Type II errors in the table itself), a Type I error is "test rejects when hypothesis is actually true", in other words, a false negative; and a Type II error is "accept when actually false", i.e., a false positive. This is exactly backwards from where Type I and Type II are located in the table. I think the table should be corrected, but I'll leave it to the experts.
-- Wayne Hayes Associate Professor of Computer Science, UC Irvine.
PS: This is exactly why I *HATE* the terms "type I" and "type II". Let's just call them false positives and false negatives, for Pete's sake!! (Yeah, I'm trying to change decades of nomenclature here... :-)
- You are wrong and the article is right. In the phrase "test rejects when hypothesis is actually true", the hypothesis refers to the null hypothesis, i.e. that there is nothing special going on. Rejecting the null hypothesis means triggering an alarm, i.e. saying that something special is going on. Therefore this is the false positive (false alarm) case.
- These are the quirks of statistical hypothesis testing. They all have things kind of "backwards" as they use frequentist concepts as opposed to Bayesian ones. Qorilla (talk) 16:13, 6 May 2015 (UTC)
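Taking the null hypothesis to be "the person does not have the condition" (the "nothing special going on" framing in the reply above), the two error rates map onto the table as follows (TP, FP, TN, FN notation assumed here):

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) = \frac{FP}{FP + TN} = 1 - \text{specificity},
\qquad
\beta = P(\text{retain } H_0 \mid H_0 \text{ false}) = \frac{FN}{TP + FN} = 1 - \text{sensitivity}
```

Under this framing, a Type I error is a false positive and a Type II error is a false negative, consistent with the table.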