Heavy-tailed distribution
In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded:[1] that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class.
Usage of the term heavy-tailed is not uniform, and two other definitions are in use. Some authors use the term for distributions that do not have all their power moments finite; others use it for distributions that do not have a finite variance. The definition given in this article is the most general in use: it includes all distributions covered by the alternative definitions, as well as distributions such as the log-normal that possess all their power moments yet are generally acknowledged to be heavy-tailed. (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)
Definition of heavy-tailed distribution
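The distribution of a random variable $X$ with distribution function $F$ is said to have a heavy right tail if[1]
$$\lim_{x \to \infty} e^{\lambda x}\Pr[X>x] = \infty \quad \text{for all } \lambda>0.$$
This is also written in terms of the tail distribution function
$$\overline{F}(x) \equiv \Pr[X>x]$$
as
$$\lim_{x \to \infty} e^{\lambda x}\,\overline{F}(x) = \infty \quad \text{for all } \lambda>0.$$
This is equivalent to the statement that the moment generating function of $F$, $M_F(t)$, is infinite for all $t > 0$.[2]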
The definitions of a heavy left tail, and of a distribution with both tails heavy, are analogous.
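As a rough numerical illustration of the definition, the sketch below evaluates $e^{\lambda x}\Pr[X>x]$ at a few levels for two example distributions. The Pareto tail with minimum value 1, the exponential tail with rate 1, the choice $\lambda = 0.5$, and all function names are assumptions made for illustration only; a finite set of evaluations can suggest the limiting behaviour but cannot prove it.

```python
import math

def pareto_tail(x, alpha=2.0):
    """Tail Pr[X > x] of a Pareto distribution with minimum 1 (illustrative choice)."""
    return x ** (-alpha) if x >= 1.0 else 1.0

def exponential_tail(x, rate=1.0):
    """Tail Pr[X > x] of an exponential distribution with the given rate."""
    return math.exp(-rate * x)

def scaled_tail(tail, lam, x):
    """Return e^{lam * x} * Pr[X > x], the quantity appearing in the definition."""
    return math.exp(lam * x) * tail(x)

lam = 0.5  # any fixed lambda > 0; for the exponential it is chosen below the rate
levels = [10.0, 50.0, 100.0, 200.0]

pareto_scaled = [scaled_tail(pareto_tail, lam, x) for x in levels]
exponential_scaled = [scaled_tail(exponential_tail, lam, x) for x in levels]

for x, p, e in zip(levels, pareto_scaled, exponential_scaled):
    print(f"x={x:6.1f}  Pareto: {p:.3e}  exponential: {e:.3e}")
```

For the Pareto tail the scaled quantity keeps growing, as the definition requires for every $\lambda > 0$, whereas for the exponential tail it shrinks towards zero whenever $\lambda$ is below the rate.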
Definition of long-tailed distribution
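The distribution of a random variable $X$ with distribution function $F$ is said to have a long right tail[1] if for all $t > 0$,
$$\lim_{x \to \infty} \Pr[X>x+t \mid X>x] = 1,$$
or equivalently
$$\overline{F}(x+t) \sim \overline{F}(x) \quad \text{as } x \to \infty.$$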
For a quantity with a long right tail, this has an intuitive interpretation: if the quantity exceeds some high level, the probability approaches 1 that it will also exceed any other, higher level. If you know the situation is bad, it is probably worse than you think.
All long-tailed distributions are heavy-tailed, but the converse is false, and it is possible to construct heavy-tailed distributions that are not long-tailed.
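The long-tail condition can be checked numerically for simple examples. The sketch below computes the conditional exceedance probability $\Pr[X>x+t \mid X>x] = \overline{F}(x+t)/\overline{F}(x)$ for a Pareto tail (assumed here to have minimum 1 and shape 2) and for an exponential tail with rate 1; the function names and parameter values are illustrative assumptions.

```python
import math

def pareto_tail(x, alpha=2.0):
    """Tail Pr[X > x] of a Pareto distribution with minimum 1 (illustrative choice)."""
    return x ** (-alpha) if x >= 1.0 else 1.0

def exponential_tail(x, rate=1.0):
    """Tail Pr[X > x] of an exponential distribution with the given rate."""
    return math.exp(-rate * x)

def conditional_exceedance(tail, x, t):
    """Return Pr[X > x + t | X > x] computed from a tail function."""
    return tail(x + t) / tail(x)

t = 10.0
levels = [10.0, 100.0, 500.0, 1e6]

pareto_ratios = [conditional_exceedance(pareto_tail, x, t) for x in levels]
# The exponential tail underflows for very large x, so only moderate levels are used.
exponential_ratios = [conditional_exceedance(exponential_tail, x, t) for x in levels[:3]]

print(pareto_ratios)
print(exponential_ratios)
```

The Pareto ratios climb towards 1 as the level grows, while the exponential ratios stay fixed at $e^{-t}$ (the memoryless property), so the exponential distribution is not long-tailed.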
Subexponential distributions
Subexponentiality is defined in terms of convolutions of probability distributions. For two independent, identically distributed random variables $X_1, X_2$ with common distribution function $F$, the convolution of $F$ with itself, $F^{*2}$, is defined, using Lebesgue–Stieltjes integration, by:
$$\Pr[X_1+X_2 \leq x] = F^{*2}(x) = \int_{-\infty}^{\infty} F(x-y)\,dF(y).$$
The $n$-fold convolution $F^{*n}$ is defined in the same way. The tail distribution function $\overline{F}$ is defined as $\overline{F}(x) = 1 - F(x)$.
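A distribution $F$ on the positive half-line is subexponential[1] if
$$\overline{F^{*2}}(x) \sim 2\,\overline{F}(x) \quad \text{as } x \to \infty.$$
This implies[3] that, for any $n \geq 1$,
$$\overline{F^{*n}}(x) \sim n\,\overline{F}(x) \quad \text{as } x \to \infty.$$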
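The probabilistic interpretation[3] of this is that, for a sum of $n$ independent random variables $X_1, \ldots, X_n$ with common distribution $F$,
$$\Pr[X_1+ \cdots +X_n>x] \sim \Pr[\max(X_1, \ldots, X_n)>x] \quad \text{as } x \to \infty.$$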
This is often known as the principle of the single big jump.[4]
A distribution $F$ on the whole real line is subexponential if the distribution $F\,I([0,\infty))$ is.[5] Here $I([0,\infty))$ is the indicator function of the positive half-line. Alternatively, a random variable $X$ supported on the real line is subexponential if and only if $X^+ = \max(0, X)$ is subexponential.
All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.
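The principle of the single big jump can be explored by simulation. The following is a minimal Monte Carlo sketch, assuming Pareto-distributed summands (shape 1.5, minimum 1) drawn with Python's standard `random.paretovariate`; the function name, the level, the sample size, and the seed are illustrative assumptions, and the output is only an estimate at one finite level, not a statement about the limit.

```python
import random

def single_big_jump_counts(alpha, n_terms, level, n_samples, seed=0):
    """Count how often the sum and the maximum of n_terms Pareto draws exceed level."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sum_exceeds = 0
    max_exceeds = 0
    for _ in range(n_samples):
        draws = [rng.paretovariate(alpha) for _ in range(n_terms)]
        if sum(draws) > level:
            sum_exceeds += 1
        if max(draws) > level:
            max_exceeds += 1
    return sum_exceeds, max_exceeds

sum_count, max_count = single_big_jump_counts(
    alpha=1.5, n_terms=2, level=50.0, n_samples=200_000
)
print("sum exceeds level:", sum_count)
print("max exceeds level:", max_count)
print("ratio:", sum_count / max_count)
```

Because the summands are positive, the sum exceeds the level whenever the maximum does, so the ratio is at least 1. For a subexponential distribution it should approach 1 as the level increases, meaning that a large sum is typically caused by one large summand.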
Common heavy-tailed distributions
All commonly used heavy-tailed distributions are subexponential.[3]
Those that are one-tailed include:
- the Pareto distribution;
- the log-normal distribution;
- the Lévy distribution;
- the Weibull distribution with shape parameter less than 1;
- the Burr distribution;
- the log-gamma distribution;
- the log-Cauchy distribution, sometimes described as having a "super-heavy tail" because it exhibits logarithmic decay producing a heavier tail than the Pareto distribution.[6][7]
Those that are two-tailed include:
- the Cauchy distribution, itself a special case of both the stable distribution and the t-distribution;
- the family of stable distributions,[8] excepting the special case of the normal distribution within that family (some stable distributions, such as the Lévy distribution, are one-sided, that is, supported on a half-line; see also financial models with long-tailed distributions and volatility clustering);
- the t-distribution;
- the skew lognormal cascade distribution.[9]
Estimating the tail-index
Pickands tail-index
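Let $(X_n, n \geq 1)$ be a sequence of independent, identically distributed random variables with distribution function $F \in D(H(\xi))$, the maximum domain of attraction of the generalized extreme value distribution $H$, where $\xi \in \mathbb{R}$. If $\lim_{n\to\infty} k(n) = \infty$ and $\lim_{n\to\infty} k(n)/n = 0$, then the Pickands tail-index estimator is:[3]
$$\xi^{\text{Pickands}}_{(k(n),n)} = \frac{1}{\ln 2} \ln\left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}} \right),$$
where $X_{(i,n)}$ denotes the $i$-th smallest of $X_1, \ldots, X_n$. This estimator converges in probability to $\xi$.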
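A Pickands-type estimate can be computed directly from the sorted sample. The sketch below assumes the usual form of the estimator built from the $k$-th, $2k$-th and $4k$-th largest observations; the function name `pickands_estimator`, the Pareto test data (shape 2, so the true tail index is $\xi = 0.5$), and the choice of $k$ are illustrative assumptions rather than recommendations.

```python
import math
import random

def pickands_estimator(sample, k):
    """Pickands-type tail-index estimate from the k-th, 2k-th and 4k-th largest values."""
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("k must satisfy 1 <= k and 4 * k <= len(sample)")
    xs = sorted(sample)        # ascending: xs[i - 1] is the i-th smallest value
    upper = xs[n - k]          # k-th largest observation
    middle = xs[n - 2 * k]     # 2k-th largest observation
    lower = xs[n - 4 * k]      # 4k-th largest observation
    return math.log((upper - middle) / (middle - lower)) / math.log(2.0)

rng = random.Random(0)  # fixed seed for reproducibility
sample = [rng.paretovariate(2.0) for _ in range(40_000)]
xi_pickands = pickands_estimator(sample, k=1000)
print("Pickands estimate:", xi_pickands)
```

The estimate depends noticeably on $k$: small values give high variance, while large values reach into the body of the distribution, so in practice the estimate is often inspected over a range of $k$.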
Hill tail-index
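Let $(X_n, n \geq 1)$ be a sequence of independent, identically distributed random variables with distribution function $F \in D(H(\xi))$, the maximum domain of attraction of the generalized extreme value distribution $H$, where $\xi > 0$. If $\lim_{n\to\infty} k(n) = \infty$ and $\lim_{n\to\infty} k(n)/n = 0$, then the Hill tail-index estimator is:[3]
$$\xi^{\text{Hill}}_{(k(n),n)} = \frac{1}{k(n)} \sum_{i=n-k(n)+1}^{n} \ln X_{(i,n)} - \ln X_{(n-k(n)+1,n)},$$
where $X_{(i,n)}$ denotes the $i$-th smallest of $X_1, \ldots, X_n$. This estimator converges in probability to $\xi$.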
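A Hill-type estimate is the average log-excess of the $k$ largest observations over the $k$-th largest one. The sketch below assumes that form and requires the threshold observation to be positive; the function name `hill_estimator`, the Pareto test data (shape 2, so the true tail index is $\xi = 0.5$), and the choice of $k$ are illustrative assumptions.

```python
import math
import random

def hill_estimator(sample, k):
    """Hill-type tail-index estimate from the k largest observations."""
    n = len(sample)
    if k < 1 or k > n:
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    xs = sorted(sample)        # ascending: xs[i - 1] is the i-th smallest value
    threshold = xs[n - k]      # k-th largest observation
    if threshold <= 0.0:
        raise ValueError("the k largest observations must be positive")
    mean_log = sum(math.log(x) for x in xs[n - k:]) / k
    return mean_log - math.log(threshold)

rng = random.Random(0)  # fixed seed for reproducibility
sample = [rng.paretovariate(2.0) for _ in range(20_000)]
xi_hill = hill_estimator(sample, k=500)
print("Hill estimate:", xi_hill)
```

As with other tail-index estimators, the result is sensitive to $k$, and plotting the estimate against $k$ is a common way to look for a stable region.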
References
- [1] Asmussen, Søren (2003). Applied Probability and Queues. Berlin: Springer. ISBN 978-0-387-00211-8.
- [2] Rolski, Schmidli, Schmidt, Teugels (1999). Stochastic Processes for Insurance and Finance.
- [3] Embrechts, Paul; Klüppelberg, Claudia; Mikosch, Thomas (1997). Modelling Extremal Events for Insurance and Finance. Berlin: Springer. ISBN 978-3-540-60931-5.
- [4] Foss, Konstantopoulos, Zachary (2007). "Discrete and continuous time modulated random walks with heavy-tailed increments". Journal of Theoretical Probability 20 (3): 581–612.
- [5] Willekens, E. (1986). Subexponentiality on the real line. Technical Report, K.U. Leuven.
- [6] Falk, M., Hüsler, J. & Reiss, R. (2010). Laws of Small Numbers: Extremes and Rare Events. Springer. p. 80. ISBN 978-3-0348-0008-2.
- [7] Alves, M.I.F., de Haan, L. & Neves, C. (March 10, 2006). "Statistical inference for heavy and super-heavy tailed distributions".
- [8] Nolan, John P. (2009). "Stable Distributions: Models for Heavy Tailed Data" (PDF). Retrieved 2009-02-21.
- [9] Lihn, Stephen (2009). "Skew Lognormal Cascade Distribution".