Jump to content

P–P plot

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by JHunterJ (talk | contribs) at 21:59, 22 September 2020 (Disambiguating links to Probability plot (intentional link to DAB) using DisamAssist.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In statistics, a P–P plot (probability–probability plot or percent–percent plot or P value plot) is a probability plot for assessing how closely two data sets agree, which plots the two cumulative distribution functions against each other. P-P plots are vastly used to evaluate the skewness of a distribution.

The Q–Q plot is more widely used, but they are both referred to as "the" probability plot, and are potentially confused.

Definition

A P–P plot plots two cumulative distribution functions (cdfs) against each other:[1] given two probability distributions, with cdfs "F" and "G", it plots as z ranges from to As a cdf has range [0,1], the domain of this parametric graph is and the range is the unit square

Thus for input z the output is the pair of numbers giving what percentage of f and what percentage of g fall at or below z.

The comparison line is the 45° line from (0,0) to (1,1) – the distributions are equal if and only if the plot falls on this line – any deviation indicates a difference between the distributions.[2]

Example

As an example, if the two distributions do not overlap, say F is below G, then the P–P plot will move from left to right along the bottom of the square – as z moves through the support of F, the cdf of F goes from 0 to 1, while the cdf of G stays at 0 – and then moves up the right side of the square – the cdf of F is now 1, as all points of F lie below all points of G, and now the cdf of G moves from 0 to 1 as z moves through the support of G. (need a graph for this paragraph)

Use

As the above example illustrates, if two distributions are separated in space, the P–P plot will give very little data – it is only useful for comparing probability distributions that have nearby or equal location. Notably, it will pass through the point (1/2, 1/2) if and only if the two distributions have the same median.

P–P plots are sometimes limited to comparisons between two samples, rather than comparison of a sample to a theoretical model distribution.[3] However, they are of general use, particularly where observations are not all modelled with the same distribution.

However, it has found some use in comparing a sample distribution from a known theoretical distribution: given n samples, plotting the continuous theoretical cdf against the empirical cdf would yield a stairstep (a step as z hits a sample), and would hit the top of the square when the last data point was hit. Instead one only plots points, plotting the observed kth observed points (in order: formally the observed kth order statistic) against the k/(n + 1) quantile of the theoretical distribution.[3] This choice of "plotting position" (choice of quantile of the theoretical distribution) has occasioned less controversy than the choice for Q–Q plots. The resulting goodness of fit of the 45° line gives a measure of the difference between a sample set and the theoretical distribution.

A P–P plot can be used as a graphical adjunct to a tests of the fit of probability distributions,[4][5] with additional lines being included on the plot to indicate either specific acceptance regions or the range of expected departure from the 1:1 line. An improved version of the P–P plot, called the SP or S–P plot, is available,[4][5] which makes use of a variance-stabilizing transformation to create a plot on which the variations about the 1:1 line should be the same at all locations.

See also

References

Citations

  1. ^ Nonparametric statistical inference by Jean Dickinson Gibbons, Subhabrata Chakraborti, 4th Edition, CRC Press, 2003, ISBN 978-0-8247-4052-8, p. 145
  2. ^ Derrick, B; Toher, D; White, P (2016). "Why Welchs test is Type I error robust". The Quantitative Methods for Psychology. 12 (1): 30–38. doi:10.20982/tqmp.12.1.p030.
  3. ^ a b Testing for Normality, by Henry C. Thode, CRC Press, 2002, ISBN 978-0-8247-9613-6, Section 2.2.3, Percent–percent plots, p. 23
  4. ^ a b Michael J.R. (1983) "The stabilized probability plot". Biometrika, 70(1), 11–17. JSTOR 2335939
  5. ^ a b Shorack, G.R., Wellner, J.A (1986) Empirical Processes with Applications to Statistics, Wiley. ISBN 0-471-86725-X p248–250

Sources