Extreme value theory
Extreme value theory, or extreme value analysis (EVA), is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any previously observed. Extreme value analysis is widely used in many disciplines, such as structural engineering, finance, earth sciences, traffic prediction, and geological engineering. For example, EVA might be used in hydrology to estimate the magnitude of an unusually large flooding event, such as the 100-year flood. Similarly, for the design of a breakwater, a coastal engineer would seek to estimate the 50-year wave and design the structure accordingly.
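The notion of a "100-year" or "50-year" event can be made concrete with a little arithmetic. The following is a minimal sketch, assuming that a T-year event is one exceeded with probability 1/T in any given year and that years are independent; the function names are illustrative and do not come from any library.

```python
def annual_exceedance_probability(return_period_years: float) -> float:
    """Probability that the T-year event is exceeded in any single year."""
    return 1.0 / return_period_years


def prob_at_least_one_exceedance(return_period_years: float, horizon_years: int) -> float:
    """Chance of one or more exceedances over the horizon, assuming independent years."""
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


if __name__ == "__main__":
    # A 100-year flood over a 100-year horizon: about 0.634, not 1.
    print(prob_at_least_one_exceedance(100, 100))
    # A 50-year wave over a hypothetical 30-year design life.
    print(prob_at_least_one_exceedance(50, 30))
```

Under these assumptions, a 100-year flood has roughly a 63% chance of occurring at least once in any 100-year span, which is why design horizons and return periods should not be read as guarantees.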
Data sampling
Two approaches exist for fitting the tail of a sample's empirical cumulative distribution function (ECDF) to one of the three possible distribution functions. The first relies on approximating a distribution from a so-called block maxima (or minima) series. In practice, it is customary and convenient to extract the annual maxima, which generates an "Annual Maxima Series" (AMS). The second relies on sampling the points in the data set that exceed a certain threshold (or fall below a certain floor). This method is generally referred to as the "Peak Over Threshold" (POT) method. The two approaches can be characterized as follows:
- The basic theory approach, as described in the book by Burry (1975). In general, this conforms to the first theorem in extreme value theory (Fisher and Tippett, 1928; Gnedenko, 1943).
- The tail-fitting approach, currently the most common, which is based on the second theorem in extreme value theory (Pickands, 1975; Balkema and de Haan, 1974).
The difference between the two theorems lies in how the data are generated. Under Theorem I the data are generated over their full range, while under Theorem II data are only generated when they surpass a certain threshold; models of the latter kind are called Peak Over Threshold (POT) models. The POT approach was developed largely in the insurance business, where only losses (payouts) above a certain threshold are accessible to the insurance company. This approach is nevertheless often used in cases where Theorem I applies, which creates problems with the basic model assumptions.
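The two sampling schemes can be sketched in a few lines. The example below is illustrative only: the data are synthetic draws from an exponential distribution, the block size of 365 and the threshold of 5.0 are arbitrary assumptions, and the threshold function simply keeps every exceedance without the declustering that is often applied in practice.

```python
import random


def block_maxima(values, block_size):
    """Maximum of each complete block of `block_size` consecutive observations."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def peaks_over_threshold(values, threshold):
    """Excesses (value minus threshold) for every observation above the threshold."""
    return [v - threshold for v in values if v > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical record: 20 "years" of 365 daily values from an exponential parent.
    daily = [rng.expovariate(1.0) for _ in range(20 * 365)]
    ams = block_maxima(daily, 365)  # one value per year
    excesses = peaks_over_threshold(daily, threshold=5.0)
    print(len(ams), len(excesses))
```

The block maxima series always has one value per block, whereas the number of threshold excesses depends on where the threshold is placed, so the two schemes generally use different subsets of the same record.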
Extreme value distributions are the limiting distributions for the minimum or the maximum of a very large collection of independent random variables from the same arbitrary distribution. Emil Julius Gumbel (1958) showed that for any well-behaved initial distribution (i.e., F(x) is continuous and has an inverse), only a few models are needed, depending on whether the maximum or the minimum is of interest and on whether the observations are bounded above or below.
Applications
Applications of extreme value theory include predicting the probability distribution of:
- Extreme floods
- The amounts of large insurance losses
- Equity risks
- Day to day market risk
- The size of freak waves
- Mutational events during evolution
- Large wildfires[1]
- The maxima of incomes, as in surveys conducted by virtually all national statistical offices
- The fastest time in which humans are capable of running the 100 metres sprint[2]
- Pipeline failures due to pitting corrosion
History
The field of extreme value theory was pioneered by Leonard Tippett (1902–1985). Tippett was employed by the British Cotton Industry Research Association, where he worked to make cotton thread stronger. In his studies, he realized that the strength of a thread was controlled by the strength of its weakest fibers. With the help of R. A. Fisher, Tippett obtained three asymptotic limits describing the distributions of extremes. The German mathematician and anti-Nazi activist Emil Julius Gumbel codified this theory in his 1958 book Statistics of Extremes, including the Gumbel distributions that bear his name.
A summary of historically important publications relating to extreme value theory can be found in the article List of publications in statistics.
Univariate theory
Classical extreme value theory and models
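Let $X_1, \dots, X_n$ be a sequence of independent and identically distributed variables with distribution function $F$ and let $M_n = \max(X_1, \dots, X_n)$ denote the maximum.
In theory, the exact distribution of the maximum can be derived:
$$\Pr(M_n \le z) = \Pr(X_1 \le z, \dots, X_n \le z) = \Pr(X_1 \le z) \cdots \Pr(X_n \le z) = (F(z))^n.$$
In practice, we might not have the distribution function $F$, but the Fisher–Tippett–Gnedenko theorem provides the following asymptotic result.
If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that
$$\Pr\left(\frac{M_n - b_n}{a_n} \le z\right) \to G(z)$$
as $n \to \infty$, and $G$ is a non-degenerate distribution, then $G$ belongs, up to a change of location and scale, to one of the following families:
Gumbel: $$G(z) = \exp\{-\exp(-z)\}, \quad z \in \mathbb{R};$$
Fréchet: $$G(z) = \begin{cases} 0, & z \le 0, \\ \exp\{-z^{-\alpha}\}, & z > 0; \end{cases}$$
Weibull: $$G(z) = \begin{cases} \exp\{-(-z)^{\alpha}\}, & z < 0, \\ 1, & z \ge 0, \end{cases}$$
where $\alpha > 0$.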
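The convergence can be checked numerically in a case where everything is available in closed form. The sketch below assumes a standard exponential parent, F(x) = 1 - exp(-x), chosen purely for illustration; with a_n = 1 and b_n = ln n, the exact distribution of the normalized maximum is F(z + ln n)^n, and its limit is the standard Gumbel form exp(-exp(-z)). The function names and the evaluation grid are illustrative choices.

```python
import math


def exact_max_cdf(z: float, n: int) -> float:
    """P(M_n - ln n <= z) for n iid standard exponential variables: F(z + ln n)^n."""
    x = z + math.log(n)
    if x <= 0.0:
        return 0.0  # the exponential distribution has no mass below zero
    return (1.0 - math.exp(-x)) ** n


def gumbel_cdf(z: float) -> float:
    """Standard Gumbel distribution function exp(-exp(-z))."""
    return math.exp(-math.exp(-z))


def max_abs_gap(n: int, grid=None) -> float:
    """Largest gap between the exact and limiting CDFs on a grid of z values."""
    if grid is None:
        grid = [i / 10.0 for i in range(-30, 81)]  # z from -3.0 to 8.0
    return max(abs(exact_max_cdf(z, n) - gumbel_cdf(z)) for z in grid)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, max_abs_gap(n))
```

The printed gap shrinks as n grows, which is the behaviour the theorem describes for this particular parent distribution; other parents need different normalizing constants and may converge to the Fréchet or Weibull families instead.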
See also
- Generalized extreme value distribution
- Pareto distribution
- Large deviation theory
- Weibull distribution
- Extreme risk
- Extreme weather
- Fisher–Tippett–Gnedenko theorem
Citations
- [1] Alvarado (1998, p. 68)
- [2] "Ultimate 100m World Records Through Extreme-Value Theory", CentER Discussion Paper, Tilburg University, 57, 2009, http://arno.uvt.nl/show.cgi?fid=95436, retrieved 2009-08-12
References
- Abarbanel, H.; Koonin, S.; Levine, H.; MacDonald, G.; Rothaus, O. (January 1992), "Statistics of Extreme Events with Application to Climate" (PDF), JASON JSR-90-30S, http://www.fas.org/irp/agency/dod/jason/statistics.pdf, retrieved 2011-10-11
- Alvarado, Ernesto; Sandberg, David V.; Pickford, Stewart G. (Special Issue 1998), "Modeling Large Forest Fires as Extreme Events" (PDF), Northwest Science 72: 66–75, http://www.vetmed.wsu.edu/org_nws/NWSci%20journal%20articles/1998%20files/Special%20addition%201/v72%20p66%20Alvarado%20et%20al.PDF, retrieved 2009-02-06
- Balkema, A., and Laurens de Haan (1974). Residual life time at great age, Annals of Probability, 2, 792–804.
- Burry K.V. (1975). Statistical Methods in Applied Science. John Wiley & Sons.
- Castillo, E. (1988). Extreme value theory in engineering. New York: Academic Press.
- Embrechts, P., C. Klüppelberg, and T. Mikosch (1997). Modelling extremal events for insurance and finance. Berlin: Springer-Verlag.
- Fisher, R.A., and L. H. C. Tippett (1928). Limiting forms of the frequency distribution of the largest and smallest member of a sample, Proc. Cambridge Phil. Soc., 24, 180–190.
- Gnedenko, B.V. (1943), Sur la distribution limite du terme maximum d'une série aléatoire, Annals of Mathematics, 44, 423–453.
- Gumbel, E.J. (1935), "Les valeurs extrêmes des distributions statistiques" (PDF), Ann. Inst. H. Poincaré 5 (2): 115–158, http://archive.numdam.org/article/AIHP_1935__5_2_115_0.pdf, retrieved 2009-04-01
- Gumbel, Emil J. (1958), Statistics of Extremes, Columbia University Press, ISBN 0-483-43604-7, http://books.google.com/?id=kXCg8B5xSUwC&lpg=PP1&dq=Statistics%20of%20Extremes%20gumbel&pg=PP1#v=onepage&q=
- Leadbetter, M. R., G. Lindgren, and H. Rootzén (1982). Extremes and related properties of random sequences and processes. New York: Springer-Verlag.
- Lindgren, G., and H. Rootzén (1987). Extreme values: Theory and technical applications. Scandinavian Journal of Statistics, Theory and Applications, 14, 241–279.
- Pickands, J. (1975). Statistical inference using extreme order statistics, Annals of Statistics, 3, 119–131.


