
Coverage probability: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
more clearly
further clarification
Line 1: Line 1:
{{Use dmy dates|date=December 2013}}
{{Use dmy dates|date=December 2013}}
In statistics, the '''coverage probability''', or '''coverage''' for short, is the [[probability]] that a [[confidence interval]] or [[confidence region]] will include the [[Statistical parameter|true value]] of interest. It can be defined as the [[Empirical probability|proportion of instances]] where the interval surrounds the true value as assessed by [[Frequentist probability|long-run frequency]].<ref>Dodge, Y. (2003). ''The Oxford Dictionary of Statistical Terms.'' OUP, {{ISBN|0-19-920613-9}}, p. 93.</ref>
In statistics, the '''coverage probability''', or '''coverage''' for short, is the [[probability]] that a [[confidence interval]] or [[confidence region]] will include the [[Statistical parameter|true value]] (parameter) of interest. It can be defined as the [[Empirical probability|proportion of instances]] where the interval surrounds the true value as assessed by [[Frequentist probability|long-run frequency]].<ref>Dodge, Y. (2003). ''The Oxford Dictionary of Statistical Terms.'' OUP, {{ISBN|0-19-920613-9}}, p. 93.</ref>


== Concept ==
For example, suppose the interest is in the [[Expected value|mean]] number of months that people with a particular type of [[cancer]] remain in [[Remission (medicine)|remission]] following successful treatment with [[chemotherapy]]. The confidence interval aims to contain the unknown mean remission duration with a given probability. This is called the ''confidence level'' or ''confidence coefficient'' of the constructed interval, which is effectively the '''nominal coverage probability''' of the procedure for constructing confidence intervals. Hence, referring to a "nominal confidence level" (e.g., as a synonym for ''nominal coverage probability'') has to be considered a nonsensical or [[Tautology (language)|tautological]] [[malapropism]]. The nominal coverage probability is often set at 0.95. The coverage probability is the ''actual'' probability that the interval contains the true mean remission duration in this example.
The fixed [[Certainty#Degrees of certainty|degree of certainty]] pre-specified by the analyst, referred to as the ''confidence level'' or ''confidence coefficient'' of the constructed interval, is effectively the '''nominal coverage probability''' of the procedure for constructing confidence intervals. Hence, referring to a "nominal confidence level" or "nominal confidence coefficient" (e.g., as a synonym for ''nominal coverage probability'') generally has to be considered [[Tautology (language)|tautological]] and misleading, as the notion of ''confidence level'' itself inherently implies [[Real versus nominal value|nominality]] already.<ref group="note">However, some textbooks use the terms ''nominal confidence level'' or ''nominal confidence coefficient'', and ''actual confidence level'' or ''actual confidence coefficient'' in the sense of "nominal" and "actual coverage probability"; cf., for instance, {{cite book |mode=cs2 |last=Wackerly |first=Dennis |last2=Mendenhall |first2=William |last3=Schaeffer |first3=Richard L. |title=Mathematical Statistics with Applications |year=2008 |publisher=Cengage Learning |page=437 |isbn=978-1-111-79878-9 |url=https://books.google.de/books?id=lTgGAAAAQBAJ&pg=PA437}}.</ref> The nominal coverage probability is often set at 0.95. By contrast, the (true) coverage probability is the ''actual'' probability that the interval contains the parameter.


If all assumptions used in deriving a confidence interval are met, the nominal coverage probability will equal the coverage probability (termed "true" or "actual" coverage probability for emphasis). If any assumptions are not met, the actual coverage probability could either be less than or greater than the nominal coverage probability. When the actual coverage probability is greater than the nominal coverage probability, the interval is termed a '''conservative (confidence) interval'''; if it is less than the nominal coverage probability, the interval is termed '''anti-conservative''', or '''permissive'''.
If all assumptions used in deriving a confidence interval are met, the nominal coverage probability will equal the coverage probability (termed "true" or "actual" coverage probability for emphasis). If any assumptions are not met, the actual coverage probability could either be less than or greater than the nominal coverage probability. When the actual coverage probability is greater than the nominal coverage probability, the interval is termed a '''conservative (confidence) interval'''; if it is less than the nominal coverage probability, the interval is termed '''anti-conservative''', or '''permissive'''. For example, suppose the interest is in the [[Expected value|mean]] number of months that people with a particular type of [[cancer]] remain in [[Remission (medicine)|remission]] following successful treatment with [[chemotherapy]]. The confidence interval aims to contain the unknown mean remission duration with a given probability. In this example, the coverage probability would be the real probability that the interval actually contains the true mean remission duration.


A discrepancy between the coverage probability and the nominal coverage probability frequently occurs when approximating a [[Probability distribution#Discrete probability distribution|discrete distribution]] with a [[Continuous probability distribution#Absolutely continuous probability distribution|continuous one]]. The construction of [[Binomial proportion confidence interval|binomial confidence intervals]] is a classic example where coverage probabilities rarely equal nominal levels.<ref>{{cite journal | last = Agresti| first = Alan |author2=Coull, Brent | year = 1998 | title = Approximate Is Better than "Exact" for Interval Estimation of Binomial Proportions | journal = The American Statistician | volume = 52 | pages = 119–126 | jstor=2685469 | doi = 10.2307/2685469 | issue = 2}}</ref><ref>{{cite journal | last=Brown | first=Lawrence | author2=Cai, T. Tony | author3=DasGupta, Anirban | title=Interval Estimation for a binomial proportion | journal=Statistical Science | year=2001 | volume=16 | issue=2 | pages=101–117 | url=http://www-stat.wharton.upenn.edu/~tcai/paper/Binomial-StatSci.pdf | doi=10.1214/ss/1009213286 | doi-access=free | access-date=17 July 2009 | archive-date=23 June 2010 | archive-url=https://web.archive.org/web/20100623070611/http://www-stat.wharton.upenn.edu/~tcai/paper/Binomial-StatSci.pdf | url-status=live }}</ref><ref>{{cite journal | last = Newcombe| first = Robert | year = 1998 | title = Two-sided confidence intervals for the single proportion: Comparison of seven methods. 
| journal = Statistics in Medicine | volume = 17 | number = 2, issue 8 |pages = 857–872 | url=http://www3.interscience.wiley.com/journal/3156/abstract | archive-url=https://archive.today/20130105132032/http://www3.interscience.wiley.com/journal/3156/abstract | url-status=dead | archive-date=2013-01-05 | doi = 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E | pmid = 9595616}}</ref> For the binomial case, several techniques for constructing intervals have been created. The Wilson score interval is one well-known construction based on the [[normal distribution]]. Other constructions include the Wald, exact, Agresti-Coull, and likelihood intervals. While the Wilson score interval may not be the most conservative estimate, it produces average coverage probabilities that are equal to nominal levels while still producing a comparatively narrow confidence interval.
A discrepancy between the coverage probability and the nominal coverage probability frequently occurs when approximating a [[Probability distribution#Discrete probability distribution|discrete distribution]] with a [[Continuous probability distribution#Absolutely continuous probability distribution|continuous one]]. The construction of [[Binomial proportion confidence interval|binomial confidence intervals]] is a classic example where coverage probabilities rarely equal nominal levels.<ref>{{cite journal | last = Agresti| first = Alan |author2=Coull, Brent | year = 1998 | title = Approximate Is Better than "Exact" for Interval Estimation of Binomial Proportions | journal = The American Statistician | volume = 52 | pages = 119–126 | jstor=2685469 | doi = 10.2307/2685469 | issue = 2}}</ref><ref>{{cite journal | last=Brown | first=Lawrence | author2=Cai, T. Tony | author3=DasGupta, Anirban | title=Interval Estimation for a binomial proportion | journal=Statistical Science | year=2001 | volume=16 | issue=2 | pages=101–117 | url=http://www-stat.wharton.upenn.edu/~tcai/paper/Binomial-StatSci.pdf | doi=10.1214/ss/1009213286 | doi-access=free | access-date=17 July 2009 | archive-date=23 June 2010 | archive-url=https://web.archive.org/web/20100623070611/http://www-stat.wharton.upenn.edu/~tcai/paper/Binomial-StatSci.pdf | url-status=live }}</ref><ref>{{cite journal | last = Newcombe| first = Robert | year = 1998 | title = Two-sided confidence intervals for the single proportion: Comparison of seven methods. 
| journal = Statistics in Medicine | volume = 17 | number = 2, issue 8 |pages = 857–872 | url=http://www3.interscience.wiley.com/journal/3156/abstract | archive-url=https://archive.today/20130105132032/http://www3.interscience.wiley.com/journal/3156/abstract | url-status=dead | archive-date=2013-01-05 | doi = 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E | pmid = 9595616}}</ref> For the binomial case, several techniques for constructing intervals have been created. The Wilson score interval is one well-known construction based on the [[normal distribution]]. Other constructions include the Wald, exact, Agresti-Coull, and likelihood intervals. While the Wilson score interval may not be the most conservative estimate, it produces average coverage probabilities that are equal to nominal levels while still producing a comparatively narrow confidence interval.
Line 19: Line 20:
* [[False coverage rate]]
* [[False coverage rate]]
* [[Interval estimation]]
* [[Interval estimation]]

== Notes ==
{{reflist|group=note}}


== References ==
== References ==

Revision as of 22:07, 19 July 2023

In statistics, the coverage probability, or coverage for short, is the probability that a confidence interval or confidence region will include the true value (parameter) of interest. It can be defined as the proportion of instances where the interval surrounds the true value as assessed by long-run frequency.[1]

Concept

The fixed degree of certainty pre-specified by the analyst, referred to as the confidence level or confidence coefficient of the constructed interval, is effectively the nominal coverage probability of the procedure for constructing confidence intervals. Hence, referring to a "nominal confidence level" or "nominal confidence coefficient" (e.g., as a synonym for nominal coverage probability) generally has to be considered tautological and misleading, as the notion of confidence level itself inherently implies nominality already.[note 1] The nominal coverage probability is often set at 0.95. By contrast, the (true) coverage probability is the actual probability that the interval contains the parameter.

If all assumptions used in deriving a confidence interval are met, the nominal coverage probability will equal the coverage probability (termed "true" or "actual" coverage probability for emphasis). If any assumptions are not met, the actual coverage probability could either be less than or greater than the nominal coverage probability. When the actual coverage probability is greater than the nominal coverage probability, the interval is termed a conservative (confidence) interval; if it is less than the nominal coverage probability, the interval is termed anti-conservative, or permissive. For example, suppose the interest is in the mean number of months that people with a particular type of cancer remain in remission following successful treatment with chemotherapy. The confidence interval aims to contain the unknown mean remission duration with a given probability. In this example, the coverage probability would be the real probability that the interval actually contains the true mean remission duration.
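As a rough illustration of how a violated assumption can make an interval anti-conservative, the sketch below estimates the coverage of a nominal 95% interval for a mean that uses a normal quantile together with the sample standard deviation on very small samples. The remission-time values (mean 24 months, standard deviation 6 months, five patients per study), the function names and the tolerance used for classification are all hypothetical choices made for this example, not taken from any source.

```python
import random
from statistics import NormalDist, fmean, stdev


def plug_in_z_coverage(n, level=0.95, reps=20000, seed=1,
                       true_mean=24.0, true_sd=6.0):
    """Estimate the actual coverage of mean +/- z * s / sqrt(n).

    The interval treats the sample standard deviation s as if it were the
    known population value, an assumption that fails for small n.
    All numeric defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        centre = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / reps


def classify(actual, nominal, tol=0.01):
    """Label an interval procedure; tol absorbs simulation noise (assumed)."""
    if actual > nominal + tol:
        return "conservative"
    if actual < nominal - tol:
        return "anti-conservative"
    return "approximately nominal"


coverage_small_n = plug_in_z_coverage(n=5)
print(coverage_small_n, classify(coverage_small_n, 0.95))
```

With only five observations per sample the estimated coverage falls clearly below 0.95, so this procedure would be labelled anti-conservative under the terminology above.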

A discrepancy between the coverage probability and the nominal coverage probability frequently occurs when approximating a discrete distribution with a continuous one. The construction of binomial confidence intervals is a classic example where coverage probabilities rarely equal nominal levels.[2][3][4] For the binomial case, several techniques for constructing intervals have been developed. The Wilson score interval is one well-known construction based on the normal distribution. Other constructions include the Wald, exact, Agresti–Coull, and likelihood intervals. While the Wilson score interval may not be the most conservative choice, its average coverage probability is close to the nominal level, and the resulting interval remains comparatively narrow.
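For a binomial proportion the actual coverage can be computed exactly rather than simulated, by summing the binomial probabilities of every outcome whose interval contains the true proportion. The following sketch does this for the Wald and Wilson score intervals using their standard textbook formulas; the function names and the chosen values n = 20 and p = 0.1 are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(x, n, z):
    """Wilson score interval for a binomial proportion."""
    p_hat = x / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def exact_coverage(interval, n, p, level=0.95):
    """Sum the binomial pmf over all outcomes x whose interval contains p."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n, z)
        if lower <= p <= upper:
            total += comb(n, x) * p ** x * (1 - p) ** (n - x)
    return total


# Hypothetical setting: 20 trials, true proportion 0.1, nominal level 0.95.
wald_cov = exact_coverage(wald_interval, 20, 0.1)
wilson_cov = exact_coverage(wilson_interval, 20, 0.1)
print(f"Wald: {wald_cov:.3f}  Wilson: {wilson_cov:.3f}")
```

In this particular setting the Wald interval covers noticeably less often than the nominal 95%, while the Wilson interval stays much closer to it; other choices of n and p give different discrepancies, because the coverage of a discrete-data interval varies irregularly with both.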

The "probability" in coverage probability is interpreted with respect to a set of hypothetical repetitions of the entire data collection and analysis procedure. In these hypothetical repetitions, independent data sets following the same probability distribution as the actual data are considered, and a confidence interval is computed from each of these data sets; see Neyman construction. The coverage probability is the fraction of these computed confidence intervals that include the desired but unobservable parameter value.
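The hypothetical repetitions described above can be mimicked by simulation. The sketch below draws many independent data sets from one fixed distribution, computes an interval from each, and reports the fraction of intervals that contain the parameter. It assumes normally distributed data with a known standard deviation, so that the usual z-interval for the mean meets all of its assumptions; the parameter values and the function name are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_coverage(true_mean, sigma, n, level=0.95, reps=20000, seed=0):
    """Fraction of repeated z-intervals (known sigma) containing true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5  # same width in every repetition
    hits = 0
    for _ in range(reps):
        # One hypothetical repetition of data collection and analysis.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = fmean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / reps


# Hypothetical values: mean 24, standard deviation 6, 10 observations.
estimated_coverage = simulate_coverage(true_mean=24.0, sigma=6.0, n=10)
print(estimated_coverage)
```

Because every assumption behind the interval holds here, the estimated fraction should lie close to the nominal 0.95, differing only by simulation noise.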

Formula

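Writing $\theta$ for the true parameter, $(L, U)$ for the sample-dependent interval, and $\gamma$ for the confidence level, the construction of the confidence interval ensures that the probability of finding $\theta$ in $(L, U)$ is (at least) $\gamma$:

$$P(L \le \theta \le U) \ge \gamma$$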

See also

Notes

  1. ^ However, some textbooks use the terms nominal confidence level or nominal confidence coefficient, and actual confidence level or actual confidence coefficient in the sense of "nominal" and "actual coverage probability"; cf., for instance, Wackerly, Dennis; Mendenhall, William; Schaeffer, Richard L. (2008), Mathematical Statistics with Applications, Cengage Learning, p. 437, ISBN 978-1-111-79878-9.

References

  1. ^ Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP, ISBN 0-19-920613-9, p. 93.
  2. ^ Agresti, Alan; Coull, Brent (1998). "Approximate Is Better than "Exact" for Interval Estimation of Binomial Proportions". The American Statistician. 52 (2): 119–126. doi:10.2307/2685469. JSTOR 2685469.
  3. ^ Brown, Lawrence; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a binomial proportion" (PDF). Statistical Science. 16 (2): 101–117. doi:10.1214/ss/1009213286. Archived (PDF) from the original on 23 June 2010. Retrieved 17 July 2009.
  4. ^ Newcombe, Robert (1998). "Two-sided confidence intervals for the single proportion: Comparison of seven methods". Statistics in Medicine. 17 (8): 857–872. doi:10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E. PMID 9595616. Archived from the original on 5 January 2013.