Talk:Sensitivity and specificity

merge[edit]

Request merge of "negative predictive value", "positive predictive value", and "Sensitivity and specificity". These terms are intimately related and should be in one place, possibly with a discussion of ROC. Further, I suggest modeling on http://www.musc.edu/dc/icrebm/sensitivity.html, which is a great exposition of this complicated stuff. Cinnamon colbert (talk) 22:25, 16 November 2008 (UTC) PS: the three articles are a great start

I also think that the entries on precision and recall should somehow be linked to this page. Recall is the same thing as sensitivity; furthermore, specificity is merely recall with regard to the negative data points. (Wyatt) —Preceding unsigned comment added by 156.56.93.239 (talk) 15:17, 5 November 2010 (UTC)
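For reference, the usual textbook mapping between the two vocabularies is as follows, with TP, FP, TN, FN denoting true/false positives/negatives (notation assumed here, not quoted from the article):

\begin{align}
\text{recall} &= \text{sensitivity} = \frac{TP}{TP + FN} \\
\text{precision} &= \text{positive predictive value} = \frac{TP}{TP + FP} \\
\text{specificity} &= \frac{TN}{TN + FP} \quad \text{(i.e., recall computed on the negative class)}
\end{align}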

action-required: new-image[edit]

OK, I give up. Adding Images to Wiki is a nightmare.

I made a new image for this page that I think is more intuitive. http://s15.postimg.org/yoykdv34r/sensitivity_vs_specificity.png

Someone with more edits to their credit (most of mine have been made anon due to laziness) should add that image to the page. Cheers, -- Dave — Preceding unsigned comment added by Ddopson (talkcontribs) 18:12, 6 September 2013 (UTC)

Merger proposal[edit]

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
Decision was merge

I don't believe I can be the only person who thinks it would be better to have a single page for sensitivity and specificity than separate pages for Sensitivity (tests) and Specificity (tests). At present Sensitivity and specificity is a redirect to Binary classification. One section of that, Binary classification#Evaluation of binary classifiers is covering the same ground again. One possibility would be to locate the merged page here at Sensitivity and specificity, replacing the redirect. Binary classification#Evaluation of binary classifiers could then have a "main article on this topic: ..." link to here too. Thoughts? --Qwfp (talk) 08:28, 28 February 2008 (UTC)

I've just realised (thanks to WhatamIdoing) that Test sensitivity should also be included in this discussion. (There's no corresponding test specificity article thank goodness.) --Qwfp (talk) 08:55, 28 February 2008 (UTC)

I agree with merging the two articles, as interpreting either one requires the other. —Preceding unsigned comment added by 92.249.193.252 (talk) 04:36, 11 March 2008 (UTC)

I also agree with merging the discussed articles; I think one should know about both values. —Preceding unsigned comment added by 89.37.10.99 (talk) 15:28, 20 March 2008 (UTC)

I also agree with the proposal. Most of the time these two notions are taught, calculated, and used as a pair. --16:36, 21 March 2008 (UTC)

I concur. —Preceding unsigned comment added by 130.88.232.43 (talk) 17:54, 29 March 2008 (UTC)

When describing medical diagnostic tests, sensitivity and specificity always appear as a pair. I am all for merging the two articles. —Preceding unsigned comment added by 74.12.238.106 (talk) 01:04, 7 April 2008 (UTC)

I'm going to go one step further and say that it doesn't make sense to talk about either sensitivity or specificity without the other. Why make it hard on the user by cross-referencing them instead of just putting them together? -- Jon Miller —Preceding comment added by Sighthndman (talk) 16:53, 2 May 2008 (UTC)

I absolutely agree with the previous comments. It does not make sense to talk about one without the other. —Preceding unsigned comment added by 198.153.57.100 (talk) 18:07, 9 May 2008 (UTC)

Make it so. —Preceding unsigned comment added by 206.117.152.201 (talk) 18:37, 23 May 2008 (UTC)

Yes I completely agree —Preceding unsigned comment added by 152.78.213.56 (talk) 15:03, 31 May 2008 (UTC)

Okay, merging done! Still needs a bit of work though. For the old talk pages, see Talk:Sensitivity (tests) and Talk:Specificity (tests). -3mta3 (talk) 11:55, 12 June 2008 (UTC)

There is an error in the text. A test that is high in sensitivity has a low type I error, not type II, and vice versa. It was recorded wrong, but I caught it after much thought. —Preceding unsigned comment added by 69.143.32.107 (talk) 15:26, 14 June 2010 (UTC)


The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.


Order[edit]

Common usage should play no role in article naming, as the usual concerns (e.g. unfindability) are compensated for by redirects. The same holds true with respect to layman's terminology (gingiva vs. gums). By reversing the order, the article inverts intuitiveness by presenting and elucidating that which relates to type 2 errors prior to that which relates to type 1 errors. DRosenbach (Talk | Contribs) 12:40, 1 October 2009 (UTC)

The Wikipedia:Naming conventions policy, specifically the section Wikipedia:Naming conventions#Use common names, says otherwise: "Articles are normally titled using the most common English-language name of a person or thing that is the subject of the article". 'Gingiva' is different as 'Gums' is ambiguous (see Gum) so the next section Wikipedia:Naming conventions#Be precise when necessary comes into play, but in the case of this article neither order is more or less ambiguous or precise than the other. Presenting specificity first because it relates to Type I error is more logical only to those who already know about Type I and Type II errors. To most non-statisticians, these terms are more confusing and less familiar than sensitivity and specificity. Qwfp (talk) 07:22, 3 October 2009 (UTC)

Specificity/Sensitivity in Bioinformatics[edit]

In the bioinformatics literature, the terms 'specificity' and 'sensitivity' are used, but with meanings different from those shown in this article. The terms are used as synonyms for precision and recall. In other words, sensitivity is used correctly, but specificity seems to be used for positive predictive value. I haven't been able to find a source explicitly discussing this difference, though, so I haven't edited the article to include this caveat. —Preceding unsigned comment added by Kernco (talkcontribs) 17:53, 9 April 2010 (UTC)


In fact, the terms 'specificity' and 'sensitivity' mean the same thing in the bioinformatics literature as elsewhere. Maybe some authors used the wrong formula definition in one bioinformatics paper, and someone pointed out the error in a comment.

I suspect that, being virtual, informatics has trouble establishing the real and correct (ground) truth when operationally defining what "actual positives" and "actual negatives" are. Perhaps a section on informatics definitions would be beneficial in this article. Zulu Papa 5 * (talk) 02:11, 6 November 2010 (UTC)


In the year since I posted this, and continued my PhD research in bioinformatics, it's definitely apparent to me that there's no consensus in the community on what the formula for specificity should be, though my impression is still that the most common usage of specificity in the literature is as a synonym for positive predictive value. It's probably just a case of a wrong usage in the past propagating forward, since you must use the same evaluation metrics to compare your own results with previous ones, so I'm not pushing for this page to be changed in any way. I think it's an interesting anomaly, though. Kernco (talk) 20:34, 19 April 2011 (UTC)
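To make the conflation concrete, the two formulas at issue are the standard ones below (TP/FP/TN/FN notation assumed):

\begin{align}
\text{specificity} &= \frac{TN}{TN + FP} && \text{(this article's definition)} \\
\text{positive predictive value} &= \frac{TP}{TP + FP} && \text{(what some bioinformatics papers call "specificity")}
\end{align}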

Suggested edits to be made under 'Specificity'[edit]

Hi, I'm new to editing here and don't know how to do it properly. However, I hope that somebody who cares can see whether the following edits help. Because I'm unfamiliar with the markup used by Wikipedia, please accept my improvised 'formatting', which I've devised as follows.

  • What is in italics is original text that should be deleted.
  • What is in bold is what I think should be added.

Rgds email kingnept(at) singnet.com.sg


Specificity: "...A specificity of 100% means that the test recognizes all actual negatives - for example in a test for a certain disease, all ''healthy'' '''disease free''' people will be recognized as ''healthy'' '''disease free'''. Because 100% specificity means no ''positives'' '''disease free persons''' are erroneously tagged '''as diseased'''''', a positive result in a''''' . A''' high specificity test is normally used to confirm the disease. ''The maximum can trivially be achieved by a test that claims everybody healthy regardless of the true condition.'' '''Unfortunately a 100%-specific test standard can also be ascribed to a 'bogus' test kit whereby nobody, not even those who are truly diseased, ever get tagged as diseased.''' Therefore, the specificity alone does not tell us how well the test recognizes positive cases. We also need to know the sensitivity of the test. A test with a high specificity has a low type I error rate." —Preceding unsigned comment added by 220.255.64.42 (talk) 21:49, 9 July 2010 (UTC)
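The 'bogus' test point can be illustrated with a minimal sketch in Python (hypothetical counts, loosely modeled on the article's worked example):

# A degenerate test that labels everyone "negative" achieves 100% specificity
# while detecting no disease at all, so specificity alone is not informative.
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical population: 30 diseased, 2000 healthy; the test calls everyone negative.
tp, fn = 0, 30    # all diseased people are missed (false negatives)
tn, fp = 2000, 0  # all healthy people are correctly called negative

print(specificity(tn, fp))  # 1.0 -- "perfect" specificity
print(sensitivity(tp, fn))  # 0.0 -- the test never detects the disease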

Postscript[edit]

Postscript: IMHO, 'sensitivity' and 'specificity' are both VERY BAD misnomers that serve confusion rather than clarity. More accurate descriptions would be 'True-Positive-Rate' and 'True-Negative-Rate' respectively; alas, the terms 'sensitivity' and 'specificity' are poor colloquialisms that have long served to confuse and mislead. Too bad that they now seem to be convention, but perhaps someone with clout and commitment should nonetheless clarify this ambiguity. 119.74.145.154 (talk) 00:48, 10 July 2010 (UTC) Rgds, Kingnept

Coming to this article by searching for False positive, I too wonder whether there is scope for describing the simplest concepts in terms that will make obvious sense to an interested layman. False positives and false negatives are both important and, if grasped, allow the layperson to understand quite a lot of the more important issues in real life. I would hope to see them in the lede. Richard Keatinge (talk) 18:59, 6 August 2012 (UTC)

Simple summary or interpretation[edit]

I tried to summarise with

"==Medical example==
Eg. a medical diagnostic criteria quoted as having sensitivity = 43% and specificity = 96%
means that 43% of the people with the criteria have the disease, and 96% of the people without
the criteria do not have the disease. Hence 'sensitive' tests help confirm a diagnosis whilst
'specific' tests help exclude a diagnosis."

but it seems to conflict with the Worked Example. Can anyone confirm or correct my summary above, please? Does it mean that 43% of the people with the disease have the criteria and 96% of the people without the disease do not have the criteria? If that's the case, wouldn't we more usefully characterise tests by PPV and NPV rather than sensitivity and specificity? Rod57 (talk) 13:33, 17 August 2010 (UTC)

I believe the point is in the context of how the test is to be applied. Sensitivity and specificity are the standard route to Receiver operating characteristics, which are great for optimizing diagnostic performance but say little about how they are applied to benefit a decision. The predictive values are focused on the test outcome, while the sensitivity and specificity describe the quality of the test as to its performance as a standard. The example given is meant to illustrate that a test may be good at "confirming" or "excluding" a diagnosis. In this example, the 96% specificity means the test is better at excluding. How the diagnosis is framed to benefit sets up the next question for this given test. Zulu Papa 5 * (talk) 02:23, 6 November 2010 (UTC)

[First-time user] The above summary confuses Sensitivity/Specificity with Positive Predictive Value (PPV)/Negative Predictive Value (NPV). Correction of the above summary: A medical test is 34% sensitive; if we are given a person with the condition, then the test has a 34% chance of being positive. A medical test is 69% specific; if we are given a person without the condition, then the test has a 69% chance of being negative.

A test has a 43% PPV; if we are given a positive test, then there is a 43% chance that the person has the condition. If we are given 100 positive tests (one per person), it is likely that 43% of the people have the condition. A test has a 96% NPV; if we are given a negative test, then there is a 96% chance that the person does not have the condition. If we are given 100 negative tests, it is likely that 96% of the people do not have the condition.

The original page also confuses Specificity with PPV, with the following: "If a test has high specificity, a positive result from the test means a high probability of the presence of disease.[1]" It should read, "For a test with a high specificity, if we are given a person without the condition then there is a high chance that the test is negative." The original page must be corrected by an experienced editor. No sources to cite. 122.57.152.49 (talk) 09:25, 2 November 2011 (UTC)
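In conditional-probability notation (with D = condition present and T = test positive; notation assumed here, not from the article), the four quantities under discussion are:

\begin{align}
\text{sensitivity} &= P(T^+ \mid D^+) & \text{PPV} &= P(D^+ \mid T^+) \\
\text{specificity} &= P(T^- \mid D^-) & \text{NPV} &= P(D^- \mid T^-)
\end{align}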

Sufficient sample size for sensitivity?[edit]

Responding to this comment here:

Hence with large numbers of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%) and further investigations must be undertaken, it will, however, pick up 66.7% of all cancers (the sensitivity).

In the worked example.

Only three people with bowel cancer were tested, so if the test were done on, let's say, 100 people with bowel cancer, then maybe there would be a different proportion than 66.7%.

So is it correct to say "the sensitivity of the test is 66.7%"? Wouldn't we need to test it on more people who have bowel cancer?

Although perhaps we could have said something like "the sample sensitivity is 66.7%" as contrasted with the theoretical sensitivity.

At least Wolfram's MathWorld calls "sensitivity" the probability that a positive value tests positive -- so we may not have enough samples to get an estimate of the probability.

MathWorld entry for sensitivity Jjjjjjjjjj (talk) 08:24, 8 March 2011 (UTC)

Although there were only three people who tested positive, the test was done on all two hundred and three people. Larger numbers would give you a better estimate of the probability - I think the most you can say is that although the sensitivity appears to be 66.7%, the confidence limits on that figure would necessarily be very wide. A larger sample may or may not show a changed figure, but the more cases, the narrower the confidence interval should be. I guess when talking about a test, the numbers come from samples of the whole population and are always an estimate. Egmason (talk) 23:47, 30 March 2011 (UTC)
I removed the clarify tag -- it looks like somebody changed the numbers so that it is now thirty people who have bowel cancer rather than only three. I just changed the wording a little, consistent with the idea that one study does not necessarily determine the performance of the particular test; there may be further uncertainty about the performance.
Jjjjjjjjjj (talk) 05:20, 10 May 2011 (UTC)

Confidence intervals[edit]

Would it be useful to have something about confidence intervals (or credible intervals for Bayesians) here? You would presumably calculate the Binomial proportion confidence interval, or would find the posterior distribution given an uninformative Beta(1,1) prior. In the given example you would compute a 95% confidence interval for the sensitivity of (0.5060, 0.8271) and a 95% credible interval of (0.4863, 0.8077). You would compute a 95% confidence interval for the specificity of (0.8971, 0.9222) and 95% credible interval of (0.8967, 0.9218). These calculations assume that prevalence is exactly that in the data. Ts4079 (talk) 14:42, 1 November 2011 (UTC)
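A minimal sketch of both calculations in Python, assuming counts of 20 true positives out of 30 diseased people (as in the current worked example); this will not necessarily reproduce the exact figures quoted above, which depend on the interval method used:

from scipy.stats import beta
from statsmodels.stats.proportion import proportion_confint

# Assumed counts: 20 of the 30 diseased people test positive.
tp, n_diseased = 20, 30

# Frequentist: Clopper-Pearson "exact" binomial proportion confidence interval.
lo, hi = proportion_confint(tp, n_diseased, alpha=0.05, method="beta")
print("95%% CI for sensitivity: (%.4f, %.4f)" % (lo, hi))

# Bayesian: an uninformative Beta(1,1) prior gives a Beta(tp+1, n-tp+1) posterior.
post = beta(tp + 1, n_diseased - tp + 1)
print("95%% credible interval: (%.4f, %.4f)" % (post.ppf(0.025), post.ppf(0.975)))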

Such confidence intervals could be easily misinterpreted since sensitivity and specificity are often closely related, as demonstrated in the ROC curve. Something more advanced is probably necessary. Ts4079 (talk) 15:14, 1 November 2011 (UTC)

Maybe if it had sources? Zulu Papa 5 * (talk) 02:10, 2 November 2011 (UTC)

Denominators of definitions[edit]

The section Sensitivity provides the following definition:

\begin{align}
\text{sensitivity} &= \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} = \frac{\text{number of true positives}}{\text{number of positives}} \\
&= \text{probability of a positive test, given that the patient is ill}
\end{align}

I do not understand how the denominator "number of positives" is related to the denominator that precedes it: "number of true positives + number of false negatives". I thought the number of positives would instead equal the number of true positives + number of false positives. Thus, I believe this to be a typo and that "number of positives" should be replaced with something like "number of ill people", which is number of true positives + number of false negatives.

As the page currently stands, it appears that the first and last expressions in this three part equation correspond to "sensitivity" as defined here. The middle expression, which I am questioning, instead appears to correspond to the Positive predictive value as defined on that page. (The above talk section on Sensitivity in Bioinformatics suggests that sometimes the word "sensitivity" is used for positive predictive value. While that might be true, I think we should not switch between the two definitions mid equation.)

I have a similar concern about the definition of "specificity" in the section Specificity. There I believe that the denominator "number of negatives" should be something like "number of well people".

Mikitikiwiki (talk) 00:27, 18 June 2013 (UTC)

I think this is a valid point; both the sensitivity and specificity equations use "positive" to mean either a positive test result (which is true or false relative to the population ground truth) or an actual occurrence of illness in a member of the population, without sufficiently distinguishing between the two meanings. I agree that changing "number of positives" to something like "number of ill people" or "number of actual positives" (and making an analogous change for the specificity equation) would clarify matters. 142.20.133.199 (talk) 19:12, 18 June 2013 (UTC)
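Written out, the disambiguated definitions being proposed would read as follows (the denominator wording is the suggestion above):

\begin{align}
\text{sensitivity} &= \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} = \frac{\text{number of true positives}}{\text{number of actual positives (ill people)}} \\
\text{specificity} &= \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}} = \frac{\text{number of true negatives}}{\text{number of actual negatives (well people)}}
\end{align}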

How "highly" sensitive or specific does a test have to be before "SNOUT" and "SPIN" apply?[edit]

The article states (under the "Sensitivity" and "Specificity" sections respectively) that "negative results in a high sensitivity test are used to rule out the disease" (referred to by the mnemonic "SNOUT" later in the article) and that "a positive result from a test with high specificity means a high probability of the presence of disease" (described by the mnemonic "SPIN"). However, the example calculation (for a test with a specificity of 91% and a sensitivity of 67%) demonstrates a case in which a positive result from a high specificity test (SPIN) clearly does not correspond to a high probability of the presence of disease (PPV is 10% in the example). Although this depends on prevalence, it seems to indicate that the numerous SNOUT/SPIN-type statements throughout the article are inaccurate as written. (Another such statement is in the "Medical Examples" section, which states that "[a] highly specific test is unlikely to give a false positive result: a positive result should thus be regarded as a true positive".)

Accordingly, I think these statements should either be modified to give some idea of exactly how high sensitivity/specificity have to be before SNOUT/SPIN apply (as noted above, the worked example uses a specificity of 91%, which is fairly high, yet a positive test result clearly does not correspond to a high probability of the disease being present), noting the effect of disease prevalence on these assertions, or be removed entirely. 142.20.133.199 (talk) —Preceding undated comment added 19:40, 18 June 2013 (UTC)
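To quantify the prevalence effect, PPV follows from Bayes' theorem. A minimal sketch in Python using the figures quoted above (sensitivity 0.667, specificity 0.91), with prevalence as a free parameter:

def ppv(sens, spec, prev):
    # Bayes' theorem: P(disease present | positive test)
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# 0.0148 assumes roughly 30 diseased among 2030 tested, which is consistent
# with the worked example's quoted PPV of 10%.
for prev in (0.0148, 0.10, 0.50):
    print("prevalence %.4f: PPV = %.3f" % (prev, ppv(0.667, 0.91, prev)))
# PPV is ~0.10 at the example's prevalence and rises toward ~0.88 at 50% prevalence,
# so how "high" specificity must be before SPIN applies depends strongly on prevalence.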