# Half-normal distribution

*(Infobox plots of the probability density function and cumulative distribution function, shown for ${\displaystyle \sigma =1}$, are not reproduced here.)*

| | |
| --- | --- |
| Parameters | ${\displaystyle \sigma >0}$ (scale) |
| Support | ${\displaystyle x\in [0,\infty )}$ |
| PDF | ${\displaystyle f(x;\sigma )={\frac {\sqrt {2}}{\sigma {\sqrt {\pi }}}}\exp \left(-{\frac {x^{2}}{2\sigma ^{2}}}\right)\quad x>0}$ |
| CDF | ${\displaystyle F(x;\sigma )=\operatorname {erf} \left({\frac {x}{\sigma {\sqrt {2}}}}\right)}$ |
| Quantile | ${\displaystyle Q(F;\sigma )=\sigma {\sqrt {2}}\operatorname {erf} ^{-1}(F)}$ |
| Mean | ${\displaystyle {\frac {\sigma {\sqrt {2}}}{\sqrt {\pi }}}}$ |
| Median | ${\displaystyle \sigma {\sqrt {2}}\operatorname {erf} ^{-1}(1/2)}$ |
| Mode | ${\displaystyle 0}$ |
| Variance | ${\displaystyle \sigma ^{2}\left(1-{\frac {2}{\pi }}\right)}$ |
| Skewness | ${\displaystyle {\frac {{\sqrt {2}}(4-\pi )}{(\pi -2)^{3/2}}}\approx 0.9952717}$ |
| Ex. kurtosis | ${\displaystyle {\frac {8(\pi -3)}{(\pi -2)^{2}}}\approx 0.869177}$ |
| Entropy | ${\displaystyle {\frac {1}{2}}\log _{2}\left(2\pi e\sigma ^{2}\right)-1}$ |

In probability theory and statistics, the half-normal distribution is a special case of the folded normal distribution.

Let ${\displaystyle X}$ follow an ordinary normal distribution, ${\displaystyle N(0,\sigma ^{2})}$; then ${\displaystyle Y=|X|}$ follows a half-normal distribution. Thus, the half-normal distribution arises by folding a zero-mean normal distribution about its mean.
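The fold ${\displaystyle Y=|X|}$ can be checked by simulation. The following is an illustrative sketch using only the Python standard library (the variable names are ours, not part of the article); it compares the sample mean of ${\displaystyle |X|}$ with the half-normal mean ${\displaystyle \sigma {\sqrt {2/\pi }}}$ derived below.

```python
import math
import random

# Simulate the fold Y = |X| with X ~ N(0, sigma^2) and compare the
# sample mean of Y with the theoretical half-normal mean sigma*sqrt(2/pi).
random.seed(0)
sigma = 2.0
n = 200_000
ys = [abs(random.gauss(0.0, sigma)) for _ in range(n)]

sample_mean = sum(ys) / n
theoretical_mean = sigma * math.sqrt(2.0 / math.pi)
print(sample_mean, theoretical_mean)  # the two agree to about two decimals
```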

## Properties

Using the ${\displaystyle \sigma }$ parametrization of the normal distribution, the probability density function (PDF) of the half-normal is given by

${\displaystyle f_{Y}(y;\sigma )={\frac {\sqrt {2}}{\sigma {\sqrt {\pi }}}}\exp \left(-{\frac {y^{2}}{2\sigma ^{2}}}\right)\quad y\geq 0,}$

where ${\displaystyle E[Y]=\mu ={\frac {\sigma {\sqrt {2}}}{\sqrt {\pi }}}}$.
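As a quick numerical sanity check (not from the article itself), the density above can be coded directly and integrated by the trapezoid rule; it should integrate to one over ${\displaystyle [0,\infty )}$. The function name `halfnormal_pdf` is our own.

```python
import math

def halfnormal_pdf(y, sigma):
    """Half-normal PDF for y >= 0 in the sigma parametrization."""
    return math.sqrt(2.0) / (sigma * math.sqrt(math.pi)) * math.exp(-y * y / (2.0 * sigma * sigma))

# Trapezoid rule on [0, 10*sigma]; the tail beyond that is negligible.
sigma = 1.5
n, hi = 100_000, 10.0 * sigma
h = hi / n
total = sum(halfnormal_pdf(i * h, sigma) for i in range(1, n)) * h
total += 0.5 * (halfnormal_pdf(0.0, sigma) + halfnormal_pdf(hi, sigma)) * h
print(total)  # ≈ 1.0
```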

Alternatively using a scaled precision (inverse of the variance) parametrization (to avoid issues if ${\displaystyle \sigma }$ is near zero), obtained by setting ${\displaystyle \theta ={\frac {\sqrt {\pi }}{\sigma {\sqrt {2}}}}}$, the probability density function is given by

${\displaystyle f_{Y}(y;\theta )={\frac {2\theta }{\pi }}\exp \left(-{\frac {y^{2}\theta ^{2}}{\pi }}\right)\quad y\geq 0,}$

where ${\displaystyle E[Y]=\mu ={\frac {1}{\theta }}}$.
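The two parametrizations describe the same distribution; a short sketch (helper names are ours) converts ${\displaystyle \sigma }$ to ${\displaystyle \theta ={\sqrt {\pi }}/(\sigma {\sqrt {2}})}$ and confirms that the densities and the means ${\displaystyle 1/\theta =\sigma {\sqrt {2/\pi }}}$ coincide.

```python
import math

def halfnormal_pdf_theta(y, theta):
    # Density under the precision-style parametrization theta = sqrt(pi) / (sigma*sqrt(2))
    return (2.0 * theta / math.pi) * math.exp(-(y * y * theta * theta) / math.pi)

def halfnormal_pdf_sigma(y, sigma):
    return math.sqrt(2.0) / (sigma * math.sqrt(math.pi)) * math.exp(-y * y / (2.0 * sigma * sigma))

sigma = 0.7
theta = math.sqrt(math.pi) / (sigma * math.sqrt(2.0))
# Same density under either parametrization ...
print(halfnormal_pdf_theta(1.3, theta), halfnormal_pdf_sigma(1.3, sigma))
# ... and the mean is 1/theta = sigma * sqrt(2/pi)
print(1.0 / theta, sigma * math.sqrt(2.0 / math.pi))
```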

The cumulative distribution function (CDF) is given by

${\displaystyle F_{Y}(y;\sigma )=\int _{0}^{y}{\frac {1}{\sigma }}{\sqrt {\frac {2}{\pi }}}\,\exp \left(-{\frac {x^{2}}{2\sigma ^{2}}}\right)\,dx}$

Using the change-of-variables ${\displaystyle z=x/({\sqrt {2}}\sigma )}$, the CDF can be written as

${\displaystyle F_{Y}(y;\sigma )={\frac {2}{\sqrt {\pi }}}\,\int _{0}^{y/({\sqrt {2}}\sigma )}\exp \left(-z^{2}\right)dz=\operatorname {erf} \left({\frac {y}{{\sqrt {2}}\sigma }}\right),}$

where erf is the error function, a standard function in many mathematical software packages.
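In Python, for example, erf is available as `math.erf`; the closed form can be cross-checked against a direct midpoint-rule integral of the PDF (an illustrative sketch, with names of our choosing):

```python
import math

def halfnormal_cdf(y, sigma):
    # Closed form: F(y; sigma) = erf(y / (sigma * sqrt(2)))
    return math.erf(y / (sigma * math.sqrt(2.0)))

# Cross-check against a midpoint-rule integral of the PDF on [0, y].
sigma, y = 1.0, 1.0
n = 100_000
h = y / n
pdf = lambda x: math.sqrt(2.0) / (sigma * math.sqrt(math.pi)) * math.exp(-x * x / (2.0 * sigma * sigma))
integral = sum(pdf((i + 0.5) * h) for i in range(n)) * h
print(halfnormal_cdf(y, sigma), integral)
```

For ${\displaystyle \sigma =1}$, ${\displaystyle F(1)=\operatorname {erf} (1/{\sqrt {2}})\approx 0.6827}$, the familiar probability that a standard normal variate lies within one standard deviation of zero.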

The quantile function (or inverse CDF) is written:

${\displaystyle Q(F;\sigma )=\sigma {\sqrt {2}}\operatorname {erf} ^{-1}(F)}$

where ${\displaystyle 0\leq F\leq 1}$ and ${\displaystyle \operatorname {erf} ^{-1}}$ is the inverse error function.
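The Python standard library has no `erfinv`, but ${\displaystyle \operatorname {erf} ^{-1}(F)=\Phi ^{-1}((1+F)/2)/{\sqrt {2}}}$, where ${\displaystyle \Phi ^{-1}}$ is the standard normal quantile function, so the half-normal quantile reduces to ${\displaystyle Q(F;\sigma )=\sigma \,\Phi ^{-1}((1+F)/2)}$. A sketch using `statistics.NormalDist` (our helper name):

```python
import math
from statistics import NormalDist

def halfnormal_quantile(F, sigma):
    # Q(F; sigma) = sigma * sqrt(2) * erfinv(F) = sigma * Phi^{-1}((1 + F) / 2)
    return sigma * NormalDist().inv_cdf((1.0 + F) / 2.0)

# Round trip through the CDF F(y) = erf(y / (sigma * sqrt(2)))
sigma, F = 2.0, 0.75
y = halfnormal_quantile(F, sigma)
print(math.erf(y / (sigma * math.sqrt(2.0))))  # recovers F = 0.75
```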

The expectation is then given by

${\displaystyle E[Y]=\sigma {\sqrt {2/\pi }}.}$

The variance is given by

${\displaystyle \operatorname {var} (Y)=\sigma ^{2}\left(1-{\frac {2}{\pi }}\right).}$

Since this is proportional to the variance ${\displaystyle \sigma ^{2}}$ of ${\displaystyle X}$, ${\displaystyle \sigma }$ can be seen as a scale parameter of the new distribution.
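A Monte Carlo sketch (not part of the article) of the scale-parameter property: the sample variance of ${\displaystyle |X|}$ tracks ${\displaystyle \sigma ^{2}(1-2/\pi )}$, so doubling ${\displaystyle \sigma }$ quadruples the variance.

```python
import math
import random

# Estimate var(Y) for Y = |N(0, sigma^2)| and compare with sigma^2 * (1 - 2/pi).
random.seed(1)

def sample_var(sigma, n=200_000):
    ys = [abs(random.gauss(0.0, sigma)) for _ in range(n)]
    m = sum(ys) / n
    return sum((y - m) ** 2 for y in ys) / n

for sigma in (1.0, 2.0):
    print(sample_var(sigma), sigma ** 2 * (1.0 - 2.0 / math.pi))
```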

The differential entropy of the half-normal distribution is exactly one bit less than the differential entropy of a zero-mean normal distribution with the same second moment about 0. This can be understood intuitively since the magnitude operator reduces information by one bit (if the probability distribution at its input is even). Alternatively, since a half-normal distribution is always positive, the one bit it would take to record whether a standard normal random variable were positive (say, a 1) or negative (say, a 0) is no longer necessary. Thus,

${\displaystyle h(Y)={\frac {1}{2}}\log _{2}\left({\frac {\pi e\sigma ^{2}}{2}}\right)={\frac {1}{2}}\log _{2}\left(2\pi e\sigma ^{2}\right)-1.}$
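The two forms of the entropy formula are algebraically identical: subtracting 1 bit divides the argument of the logarithm by 4, because of the factor ${\displaystyle 1/2}$ outside the log. A one-line numerical confirmation:

```python
import math

# Both expressions for the half-normal differential entropy (in bits):
# (1/2) log2(pi e sigma^2 / 2)  ==  (1/2) log2(2 pi e sigma^2) - 1
sigma = 1.7
h1 = 0.5 * math.log2(math.pi * math.e * sigma ** 2 / 2.0)
h2 = 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2) - 1.0
print(h1, h2)
```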

## Applications

The half-normal distribution is commonly utilized as a prior probability distribution for variance parameters in Bayesian inference applications.[1][2]

## Parameter estimation

Given numbers ${\displaystyle \{x_{i}\}_{i=1}^{n}}$ drawn from a half-normal distribution, the unknown parameter ${\displaystyle \sigma }$ of that distribution can be estimated by the method of maximum likelihood, giving

${\displaystyle {\hat {\sigma }}={\sqrt {{\frac {1}{n}}\sum _{i=1}^{n}x_{i}^{2}}}}$

The bias is equal to

${\displaystyle b\equiv \operatorname {E} {\bigg [}\;({\hat {\sigma }}_{\mathrm {mle} }-\sigma )\;{\bigg ]}=-{\frac {\sigma }{4n}}}$

which yields the bias-corrected maximum likelihood estimator

${\displaystyle {\hat {\sigma \,}}_{\text{mle}}^{*}={\hat {\sigma \,}}_{\text{mle}}-{\hat {b\,}}.}$
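The estimator and its first-order bias correction can be sketched as follows (an illustrative example with names of our choosing; since ${\displaystyle {\hat {b}}=-{\hat {\sigma }}_{\text{mle}}/(4n)}$, subtracting it inflates the estimate slightly):

```python
import math
import random

# Fit sigma by maximum likelihood from half-normal draws, then apply the
# first-order bias correction sigma* = sigma_hat - b_hat, b_hat = -sigma_hat/(4n).
random.seed(2)
sigma_true = 3.0
n = 50
xs = [abs(random.gauss(0.0, sigma_true)) for _ in range(n)]

sigma_mle = math.sqrt(sum(x * x for x in xs) / n)
sigma_corrected = sigma_mle + sigma_mle / (4 * n)  # subtracting b_hat = -sigma_mle/(4n)
print(sigma_mle, sigma_corrected)
```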

## Modification

| | |
| --- | --- |
| Notation | ${\displaystyle {\text{MHN}}\left(\alpha ,\beta ,\gamma \right)}$ |
| Parameters | ${\displaystyle \alpha >0,\beta >0{\text{ and }}\gamma \in \mathbb {R} }$ |
| Support | ${\displaystyle x>0}$ |
| PDF | ${\displaystyle f(x)={\frac {2\beta ^{\frac {\alpha }{2}}x^{\alpha -1}\exp(-\beta x^{2}+\gamma x)}{\Psi {\left({\frac {\alpha }{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}}}}$, where ${\displaystyle \Psi (\alpha ,z)={}_{1}\Psi _{1}\left({\begin{matrix}\left(\alpha ,{\frac {1}{2}}\right)\\(1,0)\end{matrix}};z\right)}$ denotes the Fox-Wright Psi function |
| CDF | ${\displaystyle F_{_{\text{MHN}}}(x\mid \alpha ,\beta ,\gamma )={\frac {2\beta ^{\frac {\alpha }{2}}}{\Psi {\left({\frac {\alpha }{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}}}\sum _{i=0}^{\infty }{\frac {\gamma ^{i}}{2i!}}\beta ^{-{\frac {\alpha +i}{2}}}\gamma ({\frac {\alpha +i}{2}},\beta x^{2})}$, where ${\displaystyle \gamma (s,y)}$ denotes the lower incomplete gamma function |
| Mean | ${\displaystyle E(X)={\frac {\Psi \left({\frac {\alpha +1}{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}{\beta ^{\frac {1}{2}}\Psi \left({\frac {\alpha }{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}}}$, with ${\displaystyle X_{\text{mode}}\leq E(X)\leq {\frac {\gamma +{\sqrt {\gamma ^{2}+8\alpha \beta }}}{4\beta }}{\text{ if }}\alpha >1}$ |
| Mode | ${\displaystyle X_{\text{mode}}={\frac {\gamma +{\sqrt {\gamma ^{2}+8\beta (\alpha -1)}}}{4\beta }}{\text{ if }}\alpha >1}$ |
| Variance | ${\displaystyle {\text{Var}}(X)={\frac {\Psi \left({\frac {\alpha +2}{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}{\beta \Psi \left({\frac {\alpha }{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}}-\left[{\frac {\Psi \left({\frac {\alpha +1}{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}{\beta ^{\frac {1}{2}}\Psi \left({\frac {\alpha }{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}}\right]^{2}}$ |

The modified half-normal distribution (MHN)[3] is a three-parameter family of continuous probability distributions supported on the positive part of the real line. The truncated normal distribution, half-normal distribution, and square-root of the Gamma distribution are special cases of the MHN distribution.

The MHN distribution is used as a probability model; additionally, it appears in a number of Markov chain Monte Carlo (MCMC)-based Bayesian procedures, including the Bayesian modeling of directional data,[4][5][6] Bayesian binary regression,[7] and Bayesian graphical models.[8] The MHN distribution occurs in diverse areas of research,[9][10][11][12] signifying its relevance to contemporary statistical modeling and the associated computation. The moments and other moment-based statistics (including the variance and skewness) can be represented via the Fox-Wright Psi function, denoted by ${\displaystyle \Psi (\cdot ,\cdot )}$, and there exists a recursive relation between three consecutive moments of the distribution.

### Moments

• Let ${\displaystyle X\sim {\text{MHN}}(\alpha ,\beta ,\gamma )}$. Then, for ${\displaystyle k\geq 0}$ such that ${\displaystyle \alpha +k}$ is a positive real number, ${\displaystyle E(X^{k})={\frac {\Psi \left({\frac {\alpha +k}{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}{\beta ^{\frac {k}{2}}\Psi \left({\frac {\alpha }{2}},{\frac {\gamma }{\sqrt {\beta }}}\right)}}}$
• If ${\displaystyle \alpha +k>0}$, then ${\displaystyle E(X^{k+2})={\frac {\alpha +k}{2\beta }}E(X^{k})+{\frac {\gamma }{2\beta }}E(X^{k+1})}$[3]
• The variance of the distribution is ${\displaystyle {\text{Var}}(X)={\frac {\alpha }{2\beta }}+E(X)\left({\frac {\gamma }{2\beta }}-E(X)\right)}$
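The moment recursion can be verified numerically without evaluating the Fox-Wright function: using unnormalized moments ${\displaystyle m_{j}=\int _{0}^{\infty }x^{\alpha -1+j}\exp(-\beta x^{2}+\gamma x)\,dx}$, the normalizing constant cancels. An illustrative sketch (parameter values and helper names are ours):

```python
import math

# Check E(X^{k+2}) = (alpha + k)/(2 beta) * E(X^k) + gamma/(2 beta) * E(X^{k+1})
# for k = 0, via midpoint-rule integration of the unnormalized MHN density.
alpha, beta, gam = 2.5, 1.0, 0.8

def m(j, n=200_000, hi=12.0):
    """Midpoint-rule integral of x^(alpha-1+j) * exp(-beta x^2 + gam x) on [0, hi]."""
    h = hi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x ** (alpha - 1 + j) * math.exp(-beta * x * x + gam * x)
    return total * h

m0, m1, m2 = m(0), m(1), m(2)
k = 0
lhs = m2 / m0                                                   # E(X^2)
rhs = (alpha + k) / (2 * beta) + gam / (2 * beta) * (m1 / m0)   # recursion, E(X^0) = 1
print(lhs, rhs)
```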

### Modal characterization of MHN

Consider the MHN${\displaystyle (\alpha ,\beta ,\gamma )}$ with ${\displaystyle \alpha >0}$, ${\displaystyle \beta >0}$ and ${\displaystyle \gamma \in \mathbb {R} }$.[3]

• The probability density function of the distribution is log-concave if ${\displaystyle \alpha \geq 1}$.
• The mode of the distribution is located at ${\displaystyle {\frac {\gamma +{\sqrt {\gamma ^{2}+8\beta (\alpha -1)}}}{4\beta }}{\text{ if }}\alpha >1}$.
• If ${\displaystyle \gamma >0}$ and ${\displaystyle 1-{\frac {\gamma ^{2}}{8\beta }}\leq \alpha <1}$, then the density has a local maximum at ${\displaystyle {\frac {\gamma +{\sqrt {\gamma ^{2}+8\beta (\alpha -1)}}}{4\beta }}}$ and a local minimum at ${\displaystyle {\frac {\gamma -{\sqrt {\gamma ^{2}+8\beta (\alpha -1)}}}{4\beta }}}$.

• The density function is monotonically decreasing on ${\displaystyle \mathbb {R} _{+}}$ and the mode of the distribution does not exist if either ${\displaystyle \gamma >0}$ and ${\displaystyle 0<\alpha <1-{\frac {\gamma ^{2}}{8\beta }}}$, or ${\displaystyle \gamma <0}$ and ${\displaystyle \alpha \leq 1}$.[3]
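The mode formula for ${\displaystyle \alpha >1}$ follows from setting the derivative of the log-density ${\displaystyle (\alpha -1)\log x-\beta x^{2}+\gamma x}$ to zero; a grid-search sketch (parameter values are ours) confirms it numerically:

```python
import math

# For alpha > 1, the unnormalized MHN log-density (alpha-1) log x - beta x^2 + gam x
# is maximized at x = (gam + sqrt(gam^2 + 8 beta (alpha - 1))) / (4 beta).
alpha, beta, gam = 3.0, 1.0, 0.5

def log_density(x):
    return (alpha - 1.0) * math.log(x) - beta * x * x + gam * x

x_mode = (gam + math.sqrt(gam ** 2 + 8.0 * beta * (alpha - 1.0))) / (4.0 * beta)
grid = [i / 10_000 for i in range(1, 100_000)]  # fine grid over (0, 10)
x_best = max(grid, key=log_density)
print(x_mode, x_best)
```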

### Additional properties involving mode and expected values

Let ${\displaystyle X\sim {\text{MHN}}(\alpha ,\beta ,\gamma )}$ for ${\displaystyle \alpha \geq 1}$, ${\displaystyle \beta >0}$ and ${\displaystyle \gamma \in \mathbb {R} }$, and let ${\displaystyle X_{\text{mode}}={\frac {\gamma +{\sqrt {\gamma ^{2}+8\beta (\alpha -1)}}}{4\beta }}}$ denote the mode of the distribution. For all ${\displaystyle \gamma \in \mathbb {R} }$, if ${\displaystyle \alpha >1}$, then ${\displaystyle X_{\text{mode}}\leq E(X)\leq {\frac {\gamma +{\sqrt {\gamma ^{2}+8\alpha \beta }}}{4\beta }}.}$ The difference between the upper and lower bounds in this inequality approaches zero as ${\displaystyle \alpha }$ grows, so it also provides a high-precision approximation of ${\displaystyle E(X)}$ when ${\displaystyle \alpha }$ is large. On the other hand, if ${\displaystyle \gamma >0}$ and ${\displaystyle \alpha \geq 4}$, then ${\displaystyle \log(X_{\text{mode}})\leq E(\log(X))\leq \log \left({\frac {\gamma +{\sqrt {\gamma ^{2}+8\alpha \beta }}}{4\beta }}\right)}$. For all ${\displaystyle \alpha >0,\beta >0{\text{ and }}\gamma \in \mathbb {R} }$, ${\displaystyle {\text{Var}}(X)\leq {\frac {1}{2\beta }}}$. An implication of the fact that ${\displaystyle E(X)\geq X_{\text{mode}}}$ is that the distribution is positively skewed.[3]
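The mode-mean inequality can be checked numerically for a particular parameter choice (values and helper names are ours), computing ${\displaystyle E(X)}$ as a ratio of integrals of the unnormalized density:

```python
import math

# Check X_mode <= E(X) <= (gam + sqrt(gam^2 + 8 alpha beta)) / (4 beta)
# for one MHN(alpha, beta, gam) with alpha > 1, integrating the
# unnormalized density x^(alpha-1) exp(-beta x^2 + gam x) by midpoint rule.
alpha, beta, gam = 3.0, 1.0, 0.5

def m(j, n=200_000, hi=12.0):
    h = hi / n
    return sum(((i + 0.5) * h) ** (alpha - 1 + j)
               * math.exp(-beta * ((i + 0.5) * h) ** 2 + gam * (i + 0.5) * h)
               for i in range(n)) * h

mean = m(1) / m(0)
x_mode = (gam + math.sqrt(gam ** 2 + 8.0 * beta * (alpha - 1.0))) / (4.0 * beta)
upper = (gam + math.sqrt(gam ** 2 + 8.0 * alpha * beta)) / (4.0 * beta)
print(x_mode, mean, upper)  # mode <= mean <= upper bound
```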

## References

1. Gelman, A. (2006), "Prior distributions for variance parameters in hierarchical models", Bayesian Analysis, 1 (3): 515–534, doi:10.1214/06-ba117a
2. Röver, C.; Bender, R.; Dias, S.; Schmid, C.H.; Schmidli, H.; Sturtz, S.; Weber, S.; Friede, T. (2021), "On weakly informative prior distributions for the heterogeneity parameter in Bayesian random-effects meta-analysis", Research Synthesis Methods, 12 (4): 448–474, arXiv:2007.08352, doi:10.1002/jrsm.1475, PMID 33486828, S2CID 220546288
3. Sun, Jingchao; Kong, Maiying; Pal, Subhadip (22 June 2021). "The Modified-Half-Normal distribution: Properties and an efficient sampling scheme". Communications in Statistics - Theory and Methods: 1–23. doi:10.1080/03610926.2021.1934700. ISSN 0361-0926. S2CID 237919587.
4. Pal, Subhadip; Gaskins, Jeremy (23 May 2022). "Modified Pólya-Gamma data augmentation for Bayesian analysis of directional data". Journal of Statistical Computation and Simulation: 1–22. doi:10.1080/00949655.2022.2067853. ISSN 0094-9655. S2CID 249022546.
5. Chakraborty, Saptarshi; Khare, Kshitij (2017). "Convergence properties of Gibbs samplers for Bayesian probit regression with proper priors". Electronic Journal of Statistics. 11 (1): 177–210. arXiv:1602.08558. doi:10.1214/16-EJS1219. ISSN 1935-7524. S2CID 54829238.
6. Hernandez-Stumpfhauser, Daniel; Breidt, F. Jay; Woerd, Mark J. van der (2017). "The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference". Bayesian Analysis. 12 (1): 113–133. doi:10.1214/15-BA989. ISSN 1936-0975.
7. Pal, Subhadip; Khare, Kshitij; Hobert, James P. (2 October 2015). "Improving the Data Augmentation Algorithm in the Two-Block Setup". Journal of Computational and Graphical Statistics. 24 (4): 1114–1133. doi:10.1080/10618600.2014.955177. ISSN 1061-8600. S2CID 6869127.
8. Finegold, Michael; Drton, Mathias (2014). "Robust Bayesian Graphical Modeling Using Dirichlet t-Distributions". Bayesian Analysis. 9 (3): 521–550. doi:10.1214/13-BA856. ISSN 1936-0975.
9. Altun, Emrah; Korkmaz, Mustafa Ç; El-Morshedy, M.; Eliwa, M. S. (2021). "The extended gamma distribution with regression model and applications". AIMS Mathematics. 6 (3): 2418–2439. doi:10.3934/math.2021147. ISSN 2473-6988.
10. M. Olmos, Neveka; Venegas, Osvaldo (30 May 2018). "Modified Generalized Half-Normal Distribution with Application to Lifetimes". Applied Mathematics & Information Sciences. 12 (3): 637–643. doi:10.18576/amis/120320. ISSN 2325-0399.
11. Cordeiro, Gauss M.; Pescim, Rodrigo R.; Ortega, Edwin M. M.; Demétrio, Clarice G. B. (31 December 2013). "The Beta Generalized Half-Normal Distribution: New Properties". Journal of Probability and Statistics. 2013: 1–18. doi:10.1155/2013/491628.
12. Johnson, Norman; Kotz, Samuel; Balakrishnan, N. (21 October 1994). Continuous Univariate Distributions (2nd ed.). New York: John Wiley & Sons. ISBN 978-0-471-58495-7.