Truncated normal distribution

From Wikipedia, the free encyclopedia
Probability density function
Probability density function of the truncated normal distribution for different sets of parameters. In all cases, a = −10 and b = 10. Black: μ = −8, σ = 2; blue: μ = 0, σ = 2; red: μ = 9, σ = 10; orange: μ = 0, σ = 10.
Cumulative distribution function
Cumulative distribution function of the truncated normal distribution for different sets of parameters. In all cases, a = −10 and b = 10. Black: μ = −8, σ = 2; blue: μ = 0, σ = 2; red: μ = 9, σ = 10; orange: μ = 0, σ = 10.
Notation: \xi=\frac{x-\mu}{\sigma},\ \alpha=\frac{a-\mu}{\sigma},\ \beta=\frac{b-\mu}{\sigma},\ Z=\Phi(\beta)-\Phi(\alpha)
Parameters:
  μ ∈ R, mean (location)
  σ² ≥ 0, variance (squared scale)
  a ∈ R, minimum value
  b ∈ R, maximum value
Support: x ∈ [a, b]
PDF: f(x;\mu,\sigma,a,b) = \frac{1}{\sigma Z}\phi(\xi)
CDF: F(x;\mu,\sigma,a,b) = \frac{\Phi(\xi) - \Phi(\alpha)}{Z}
Mean: \mu + \frac{\phi(\alpha)-\phi(\beta)}{Z}\sigma
Mode: \left\{\begin{array}{ll}a, & \mathrm{if}\ \mu<a \\ \mu, & \mathrm{if}\ a\le\mu\le b\\ b, & \mathrm{if}\ \mu>b\end{array}\right.
Variance: \sigma^2\left[1+\frac{\alpha\phi(\alpha)-\beta\phi(\beta)}{Z}-\left(\frac{\phi(\alpha)-\phi(\beta)}{Z}\right)^2\right]
Entropy: \ln(\sqrt{2 \pi e}\, \sigma Z) + \frac{\alpha\phi(\alpha)-\beta\phi(\beta)}{2Z}
MGF: e^{\mu t + \sigma^2 t^2 / 2} \left[ \frac{\Phi(\beta - \sigma t) - \Phi(\alpha - \sigma t)}{\Phi(\beta) - \Phi(\alpha)} \right]

In probability and statistics, the truncated normal distribution is the probability distribution of a normally distributed random variable whose value is bounded below, above, or both. The truncated normal distribution has wide applications in statistics and econometrics. For example, it is used to model the probabilities of the binary outcomes in the probit model and to model censored data in the Tobit model.

Definition

Suppose X \sim N(\mu, \sigma^{2}) has a normal distribution and lies within the interval (a, b), with -\infty \leq a < b \leq \infty. Then X conditional on a < X < b has a truncated normal distribution.

Its probability density function, ƒ, for  a \leq x \leq b , is given by


f(x;\mu,\sigma,a,b) = \frac{\frac{1}{\sigma}\phi(\frac{x - \mu}{\sigma})}{\Phi(\frac{b - \mu}{\sigma}) - \Phi(\frac{a - \mu}{\sigma}) }

and by ƒ=0 otherwise.

Here, \phi(\xi)=\frac{1}{\sqrt{2 \pi}}\exp\left(-\frac{1}{2}\xi^2\right) is the probability density function of the standard normal distribution and \Phi(\cdot) is its cumulative distribution function. By convention, if b=\infty, then \Phi(\frac{b - \mu}{\sigma}) = 1, and similarly, if a=-\infty, then \Phi(\frac{a - \mu}{\sigma}) = 0.
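The density and distribution function above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names truncnorm_pdf and truncnorm_cdf are hypothetical, and the standard normal φ and Φ are taken from the standard library class statistics.NormalDist.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi


def _Phi(z: float) -> float:
    """Standard normal CDF with the convention Phi(-inf) = 0, Phi(+inf) = 1."""
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    return _STD.cdf(z)


def truncnorm_pdf(x, mu, sigma, a=-math.inf, b=math.inf):
    """Density of N(mu, sigma^2) conditioned on a < X < b (hypothetical helper)."""
    if sigma <= 0 or not a < b:
        raise ValueError("need sigma > 0 and a < b")
    if x < a or x > b:
        return 0.0  # f = 0 outside the support
    Z = _Phi((b - mu) / sigma) - _Phi((a - mu) / sigma)
    return _STD.pdf((x - mu) / sigma) / (sigma * Z)


def truncnorm_cdf(x, mu, sigma, a=-math.inf, b=math.inf):
    """Distribution function (Phi(xi) - Phi(alpha)) / Z, clipped to [0, 1]."""
    if sigma <= 0 or not a < b:
        raise ValueError("need sigma > 0 and a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    Phi_alpha = _Phi((a - mu) / sigma)
    Z = _Phi((b - mu) / sigma) - Phi_alpha
    return (_Phi((x - mu) / sigma) - Phi_alpha) / Z


if __name__ == "__main__":
    # Parameters of the red curve in the figure captions: mu = 9, sigma = 10.
    print(truncnorm_pdf(0.0, 9.0, 10.0, -10.0, 10.0))
    print(truncnorm_cdf(0.0, 9.0, 10.0, -10.0, 10.0))
```

When Z is extremely small (both bounds far in the same tail), the division loses precision; the sketch does not attempt to handle that case.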

Moments

Two-sided truncation:[1]

 \operatorname{E}(X \mid a<X<b) = \mu + \frac{\phi(\frac{a-\mu}{\sigma})-\phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma})-\Phi(\frac{a-\mu}{\sigma})}\sigma
 \operatorname{Var}(X \mid a<X<b) = \sigma^2\left[1+\frac{\frac{a-\mu}{\sigma}\phi(\frac{a-\mu}{\sigma})-\frac{b-\mu}{\sigma}\phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma})-\Phi(\frac{a-\mu}{\sigma})}-\left(\frac{\phi(\frac{a-\mu}{\sigma})-\phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma})-\Phi(\frac{a-\mu}{\sigma})}\right)^2\right]
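As a minimal sketch, the two-sided mean and variance can be evaluated directly from these expressions. The function name truncnorm_mean_var is hypothetical, and the sketch assumes both a and b are finite (with an infinite bound the product of the standardized bound and φ would evaluate to NaN in floating point).

```python
from statistics import NormalDist


def truncnorm_mean_var(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) conditioned on a < X < b.

    Hypothetical helper; assumes finite a < b and sigma > 0.
    """
    std = NormalDist()                 # standard normal phi / Phi
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    Z = std.cdf(beta) - std.cdf(alpha)
    # d is the ratio (phi(alpha) - phi(beta)) / Z that appears in both formulas
    d = (std.pdf(alpha) - std.pdf(beta)) / Z
    mean = mu + sigma * d
    var = sigma ** 2 * (
        1.0 + (alpha * std.pdf(alpha) - beta * std.pdf(beta)) / Z - d ** 2
    )
    return mean, var


if __name__ == "__main__":
    # A standard normal truncated symmetrically to (-1, 1): the mean stays
    # at 0 and the variance shrinks below 1.
    print(truncnorm_mean_var(0.0, 1.0, -1.0, 1.0))
```

Truncation to an interval that is symmetric about μ leaves the mean unchanged, which gives a quick sanity check on any implementation.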

One-sided truncation (of the lower tail, X > a):[2]

 \operatorname{E}(X \mid X>a) = \mu +\sigma\lambda(\alpha)
 \operatorname{Var}(X \mid X>a) = \sigma^2[1-\delta(\alpha)],

where \alpha=(a-\mu)/\sigma,\; \lambda(\alpha)=\phi(\alpha)/[1-\Phi(\alpha)]\; and  \; \delta(\alpha) = \lambda(\alpha)[\lambda(\alpha)-\alpha].

One-sided truncation (of the upper tail, X < b):

 \operatorname{E}(X \mid X<b) = \mu -\sigma\frac{\phi(\beta)}{\Phi(\beta)}
 \operatorname{Var}(X \mid X<b) = \sigma^2\left[1-\beta \frac{\phi(\beta)}{\Phi(\beta)}- \left(\frac{\phi(\beta)}{\Phi(\beta)} \right)^2\right],

where \beta=(b-\mu)/\sigma.
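The two one-sided cases can be sketched as follows. The function names moments_given_above and moments_given_below are hypothetical; λ is the ratio φ(α)/[1 − Φ(α)] defined above, and the half-normal case (μ = 0, σ = 1, a = 0) gives a convenient check, since its mean is √(2/π) and its variance is 1 − 2/π.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal phi / Phi


def moments_given_above(mu, sigma, a):
    """Mean and variance of X | X > a for X ~ N(mu, sigma^2)."""
    alpha = (a - mu) / sigma
    lam = _STD.pdf(alpha) / (1.0 - _STD.cdf(alpha))  # lambda(alpha)
    delta = lam * (lam - alpha)                      # delta(alpha)
    return mu + sigma * lam, sigma ** 2 * (1.0 - delta)


def moments_given_below(mu, sigma, b):
    """Mean and variance of X | X < b for X ~ N(mu, sigma^2)."""
    beta = (b - mu) / sigma
    r = _STD.pdf(beta) / _STD.cdf(beta)              # phi(beta) / Phi(beta)
    return mu - sigma * r, sigma ** 2 * (1.0 - beta * r - r ** 2)


if __name__ == "__main__":
    mean, var = moments_given_above(0.0, 1.0, 0.0)
    # Compare with the half-normal values sqrt(2/pi) and 1 - 2/pi.
    print(mean, math.sqrt(2.0 / math.pi))
    print(var, 1.0 - 2.0 / math.pi)
```

For a far in the upper tail, 1 − Φ(α) underflows toward zero and the ratio becomes unreliable; the sketch makes no attempt to guard against that.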

Barr and Sherrill (1999) give a simpler expression for the variance of one-sided truncations. Their formula is in terms of the chi-square CDF, which is implemented in standard software libraries. Bebu and Mathew (2009) provide formulas for (generalized) confidence intervals around the truncated moments.

Differential equation


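On the interval (a, b), the density f satisfies the first-order linear differential equation

\sigma^2 f'(x) + (x-\mu) f(x) = 0,

with the initial condition (valid when a \leq 0 \leq b)

f(0) = \frac{\sqrt{\frac{2}{\pi}}\, e^{-\frac{\mu^2}{2\sigma^2}}}{\sigma\left(\operatorname{erf}\left(\frac{\mu-a}{\sqrt{2}\sigma}\right)-\operatorname{erf}\left(\frac{\mu-b}{\sqrt{2}\sigma}\right)\right)}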
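A short check that the density from the Definition section satisfies this equation, using only the identity \phi'(\xi) = -\xi\phi(\xi) for the standard normal density and writing \xi = (x-\mu)/\sigma and Z = \Phi(\frac{b-\mu}{\sigma}) - \Phi(\frac{a-\mu}{\sigma}):

```latex
\begin{aligned}
f(x)  &= \frac{1}{\sigma Z}\,\phi(\xi), \qquad \xi = \frac{x-\mu}{\sigma}, \\
f'(x) &= \frac{1}{\sigma Z}\,\phi'(\xi)\,\frac{1}{\sigma}
       = -\frac{\xi}{\sigma}\cdot\frac{1}{\sigma Z}\,\phi(\xi)
       = -\frac{x-\mu}{\sigma^{2}}\,f(x), \\
\sigma^{2} f'(x) + (x-\mu)\,f(x) &= 0 .
\end{aligned}
```

The constant Z depends only on \mu, \sigma, a and b, so it passes through the derivative; the truncation bounds enter the problem only through the initial condition.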

Recursive formula

As in the non-truncated case, the truncated moments satisfy a recursive formula; see Orjebin.[3]
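The cited document is not reproduced here. As an illustration only, one such recursion can be obtained by integration by parts from the density given in the Definition section; it is stated here as an assumption of this sketch rather than as a quotation of the reference. With m_k = \operatorname{E}(X^k \mid a<X<b), m_{-1} = 0 and m_0 = 1, it reads m_k = (k-1)\sigma^2 m_{k-2} + \mu m_{k-1} - \sigma\frac{b^{k-1}\phi(\beta) - a^{k-1}\phi(\alpha)}{Z}. The function name truncnorm_raw_moments is hypothetical, and finite bounds are assumed.

```python
from statistics import NormalDist


def truncnorm_raw_moments(mu, sigma, a, b, order):
    """Return [m_0, ..., m_order] with m_k = E(X^k | a < X < b).

    Hypothetical helper implementing the integration-by-parts recursion
    stated in the lead-in; assumes finite a < b and sigma > 0.
    """
    std = NormalDist()
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    Z = std.cdf(beta) - std.cdf(alpha)
    phi_a, phi_b = std.pdf(alpha), std.pdf(beta)

    moments = [1.0]  # m_0
    for k in range(1, order + 1):
        # Boundary term from integrating by parts.
        boundary = sigma * (b ** (k - 1) * phi_b - a ** (k - 1) * phi_a) / Z
        # m_{k-2}; for k = 1 it is multiplied by (k - 1) = 0 anyway.
        m_km2 = moments[k - 2] if k >= 2 else 0.0
        moments.append((k - 1) * sigma ** 2 * m_km2 + mu * moments[k - 1] - boundary)
    return moments


if __name__ == "__main__":
    m = truncnorm_raw_moments(1.0, 2.0, -1.0, 4.0, 3)
    print("mean:", m[1])
    print("variance:", m[2] - m[1] ** 2)
```

For k = 1 the recursion reduces to the two-sided mean given in the Moments section, and m_2 − m_1² reproduces the two-sided variance.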

Simulating

A random variate x defined as x = \Phi^{-1}( \Phi(\alpha) + U\cdot(\Phi(\beta)-\Phi(\alpha)))\sigma + \mu, where \Phi is the standard normal cumulative distribution function, \Phi^{-1} is its inverse, and U is a uniform random number on (0, 1), follows the normal distribution truncated to the range (a, b). This is the inverse transform method. It is exact in theory, but evaluating \Phi and \Phi^{-1} in floating-point arithmetic can introduce numerical errors, particularly when the truncation interval lies far in a tail, so in practice other methods are often needed.
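The inverse transform draw described above can be sketched with the Python standard library. The function name sample_truncnorm is hypothetical; statistics.NormalDist supplies Φ (cdf) and Φ⁻¹ (inv_cdf). The guard clauses are additions of this sketch that make the floating-point limitation explicit.

```python
import random
from statistics import NormalDist


def sample_truncnorm(mu, sigma, a, b, rng=random):
    """One draw from N(mu, sigma^2) truncated to (a, b) by inverse transform."""
    std = NormalDist()
    p_a = std.cdf((a - mu) / sigma)   # Phi(alpha)
    p_b = std.cdf((b - mu) / sigma)   # Phi(beta)
    if not p_a < p_b:
        # Both bounds map to the same float (interval too far in a tail):
        # the method has no resolution left here.
        raise ArithmeticError("interval has no representable probability mass")
    while True:
        p = p_a + rng.random() * (p_b - p_a)
        if 0.0 < p < 1.0:             # inv_cdf is undefined at 0 and 1
            break
    x = mu + sigma * std.inv_cdf(p)
    return min(max(x, a), b)          # absorb rounding at the boundaries


if __name__ == "__main__":
    rng = random.Random(12345)
    draws = [sample_truncnorm(0.0, 2.0, -10.0, 10.0, rng) for _ in range(5)]
    print(draws)
```

Calling the function with an interval such as (10, 11) for a standard normal raises the ArithmeticError, because Φ(10) and Φ(11) both round to 1.0 in double precision; this is the kind of numerical failure the paragraph above refers to.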

For more on simulating a draw from the truncated normal distribution, see Robert (1995), Lynch (2007, Section 8.1.3, pages 200–206), and Devroye (1986). The msm package in R has a function, rtnorm, that generates draws from a truncated normal distribution. The truncnorm package in R also has functions to draw from a truncated normal distribution.
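As an example of an alternative to the inverse transform, the following is a sketch of an accept-reject sampler with a shifted exponential proposal for the one-sided case X ≥ a, in the spirit of the scheme in Robert (1995). It is an illustration under stated assumptions, not a transcription of that paper: the function names are hypothetical, and the proposal rate (a + √(a² + 4))/2 is taken as the rate that maximizes the acceptance probability.

```python
import math
import random


def sample_std_normal_above(a, rng=random):
    """Draw Z from the standard normal conditioned on Z >= a.

    Accept-reject with proposal a + Exponential(rate). Valid for any a,
    but only efficient when a is positive (a tail region).
    """
    rate = 0.5 * (a + math.sqrt(a * a + 4.0))
    while True:
        z = a + rng.expovariate(rate)
        # Target/proposal ratio, normalized so that its maximum is 1.
        if rng.random() <= math.exp(-0.5 * (z - rate) ** 2):
            return z


def sample_normal_above(mu, sigma, a, rng=random):
    """Draw from N(mu, sigma^2) conditioned on X >= a by standardizing."""
    return mu + sigma * sample_std_normal_above((a - mu) / sigma, rng)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Ten standard deviations into the tail, where the inverse transform
    # fails in double precision.
    print([sample_normal_above(0.0, 1.0, 10.0, rng) for _ in range(3)])
```

Unlike the inverse transform, this approach never evaluates Φ or Φ⁻¹, so it keeps working arbitrarily far in the tail.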

Chopin proposed an algorithm inspired by the Ziggurat algorithm of Marsaglia and Tsang (1984, 2000), which is usually considered the fastest Gaussian sampler. Chopin's algorithm is also very close to Ahrens's algorithm (1995). Implementations are available in C, C++, Matlab and Python.

Sampling from the multivariate truncated normal distribution is considerably more difficult. Damien and Walker (2001) introduce a general methodology for sampling truncated densities within a Gibbs sampling framework. Their algorithm introduces one latent variable and is more computationally efficient than the algorithm of Robert (1995).
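To illustrate what sampling within a Gibbs framework looks like, here is a minimal sketch for a bivariate normal truncated to a rectangle. It is not the latent-variable algorithm of Damien and Walker (2001), nor the algorithm of Robert (1995): it is a plain Gibbs sampler that alternates draws from the two univariate truncated normal conditionals, each made by inverse transform. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def _draw_truncnorm(mu, sigma, a, b, rng):
    """Univariate truncated normal draw by inverse transform (finite a < b)."""
    p_a = _STD.cdf((a - mu) / sigma)
    p_b = _STD.cdf((b - mu) / sigma)
    p = p_a + rng.random() * (p_b - p_a)
    # Keep p strictly inside (0, 1); this slightly distorts extreme tails.
    p = min(max(p, 1e-15), 1.0 - 1e-15)
    return min(max(mu + sigma * _STD.inv_cdf(p), a), b)


def gibbs_bivariate_truncnorm(mean, sd, rho, lower, upper, n,
                              rng=random, burn_in=100):
    """Gibbs sampler for a bivariate normal restricted to a box.

    mean, sd, lower, upper are pairs; rho is the correlation, |rho| < 1.
    Returns n (x1, x2) pairs after discarding burn_in sweeps.
    """
    (m1, m2), (s1, s2) = mean, sd
    scale = math.sqrt(1.0 - rho * rho)          # conditional sd factor
    x2 = min(max(m2, lower[1]), upper[1])       # start inside the box
    out = []
    for i in range(n + burn_in):
        # X1 | X2 = x2 is normal with shifted mean, truncated to its range.
        x1 = _draw_truncnorm(m1 + rho * s1 / s2 * (x2 - m2), s1 * scale,
                             lower[0], upper[0], rng)
        # X2 | X1 = x1, likewise.
        x2 = _draw_truncnorm(m2 + rho * s2 / s1 * (x1 - m1), s2 * scale,
                             lower[1], upper[1], rng)
        if i >= burn_in:
            out.append((x1, x2))
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    pts = gibbs_bivariate_truncnorm((0.0, 0.0), (1.0, 1.0), 0.8,
                                    (0.0, 0.0), (2.0, 2.0), 5, rng)
    print(pts)
```

Successive draws are correlated, and mixing slows as |ρ| approaches 1, which is one reason more refined schemes are of interest.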

References

  1. ^ Johnson, N.L., Kotz, S., Balakrishnan, N. (1994) Continuous Univariate Distributions, Volume 1, Wiley. ISBN 0-471-58495-9 (Section 10.1)
  2. ^ Greene, William H. (2003). Econometric Analysis (5th ed.). Prentice Hall. ISBN 0-13-066189-9. 
  3. ^ Document by Eric Orjebin, "http://www.smp.uq.edu.au/people/YoniNazarathy/teaching_projects/studentWork/EricOrjebin_TruncatedNormalMoments.pdf"
  • Greene, William H. (2003). Econometric Analysis (5th ed.). Prentice Hall. ISBN 0-13-066189-9. 
  • Johnson, Norman L.; Kotz, Samuel (1970). Continuous Univariate Distributions, Volume 1, Chapter 13. John Wiley & Sons.
  • Lynch, Scott (2007). Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. New York: Springer. ISBN 978-1-4419-2434-6. 
  • Robert, Christian P. (1995). "Simulation of truncated normal variables". Statistics and Computing 5 (2): 121–125. doi:10.1007/BF00143942. 
  • Barr, Donald R.; Sherrill, E.Todd (1999). "Mean and variance of truncated normal distributions". The American Statistician 53 (4): 357–361. doi:10.1080/00031305.1999.10474490. 
  • Bebu, Ionut; Mathew, Thomas (2009). "Confidence intervals for limited moments and truncated moments in normal and lognormal models". Statistics and Probability Letters 79: 375–380. doi:10.1016/j.spl.2008.09.006. 
  • Damien, Paul; Walker, Stephen G. (2001). "Sampling truncated normal, beta, and gamma densities". Journal of Computational and Graphical Statistics 10 (2): 206–215. doi:10.1198/10618600152627906. 
  • Chopin, Nicolas (2011). "Fast simulation of truncated Gaussian distributions". Statistics and Computing 21 (2): 275–288.