Exponentially modified Gaussian distribution

From Wikipedia, the free encyclopedia
EMG
[Figure: probability density function for the EMG distribution]
[Figure: cumulative distribution function for the EMG distribution]
Parameters μ ∈ ℝ: mean of Gaussian component
σ² > 0: variance of Gaussian component
λ > 0: rate of exponential component
Support x ∈ ℝ
pdf \frac{\lambda}{2} e^{\frac{\lambda}{2} (2 \mu + \lambda \sigma^2 - 2 x)}
             \operatorname{erfc} (\frac{\mu + \lambda \sigma^2 - x}{ \sqrt{2} \sigma})
CDF 
  \Phi(u, 0, v) - e^{-u + v^2/2+ \log(\Phi(u, v^2, v))\; }
where

  \Phi(x, \mu, \sigma) is the CDF of a Gaussian distribution

  u = \lambda(x - \mu)

  v = \lambda \sigma
Mean \mu + 1/\lambda
Variance \sigma^2 + 1/\lambda^2
Skewness \frac{2}{\sigma^3 \lambda^3} \left(  1 + \frac{1}{\sigma^2 \lambda^2}  \right)^{-3/2}
Ex. kurtosis \frac{3 (1 + \frac{2}{\sigma^2 \lambda^2} + \frac{3}{\lambda^4 \sigma^4})}{\left( 1 + \frac{1}{\lambda^2 \sigma^2} \right)^2  } - 3
MGF \left(1 - \frac{t}{\lambda}\right)^{-1}\,\exp \{ \mu t + \frac{1}{2}\sigma^2 t^2 \}
CF \left(1 - \frac{it}{\lambda}\right)^{-1}\,\exp \{ i\mu t - \frac{1}{2}\sigma^2 t^2 \}

In probability theory, an exponentially modified Gaussian (EMG) distribution, also called an ExGaussian distribution, describes the sum of independent normal and exponential random variables. An ExGaussian random variable Z may be expressed as Z = X + Y, where X and Y are independent, X is Gaussian with mean μ and variance σ², and Y is exponential with rate λ. The exponential component gives the distribution its characteristic positive skew.

It may also be regarded as a weighted function of a shifted exponential with the weight being a function of the normal distribution.

Definition

The probability density function of the exponentially modified normal distribution is[1]

f(x;\mu,\sigma,\lambda) = \frac{\lambda}{2} e^{\frac{\lambda}{2} (2 \mu + \lambda \sigma^2 - 2 x)}
             \operatorname{erfc} \left(\frac{\mu + \lambda \sigma^2 - x}{ \sqrt{2} \sigma}\right)

where erfc is the complementary error function defined as

\begin{align}
             \operatorname{erfc}(x) & = 1-\operatorname{erf}(x) \\
                                    & = \frac{2}{\sqrt{\pi}} \int_x^\infty e^{-t^2}\,dt.
       \end{align}

This density function is derived via convolution of the normal and exponential probability density functions.
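
As an illustration, the density above can be evaluated directly with the Python standard library. This is a minimal sketch, not a reference implementation: the names emg_pdf and trapezoid are hypothetical, and the parameter values are chosen only for the check. It evaluates the formula as written, so the exponential factor may overflow far in the left tail, where a numerically stable variant would be needed.

```python
import math

def emg_pdf(x, mu, sigma, lam):
    """EMG density at x (Gaussian mean mu, std sigma > 0, exponential rate lam > 0)."""
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x)
    arg = (mu + lam * sigma**2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(exponent) * math.erfc(arg)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for f on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Illustrative parameters (assumed, not from the article): mu = 0, sigma = 1, lambda = 0.5.
mu, sigma, lam = 0.0, 1.0, 0.5

# The density should integrate to about 1, and its mean should be mu + 1/lambda.
area = trapezoid(lambda x: emg_pdf(x, mu, sigma, lam), -10.0, 40.0)
mean = trapezoid(lambda x: x * emg_pdf(x, mu, sigma, lam), -10.0, 40.0)
print(area, mean)
```

The integration limits are wide enough for these parameter values that the neglected tails are negligible; other parameter choices would need different limits.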

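Differential equation

The density f(x) satisfies the following second-order linear differential equation, with the given initial conditions at x = 0: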


\left\{\begin{array}{l}
\sigma ^2 f''(x)+f'(x) \left(\lambda  \sigma ^2-\mu +x\right)+\lambda  f(x) (x-\mu)=0, \\[12pt]
f(0)=\frac{1}{2} \lambda  e^{\frac{1}{2} \lambda  \left(\lambda  \sigma ^2+2 \mu
   \right)} \text{erfc}\left(\frac{\lambda  \sigma ^2+\mu }{\sqrt{2} \sigma}\right), \\[12pt]
f'(0)=\frac{\lambda  e^{-\frac{\mu ^2}{2 \sigma ^2}} \left(\sqrt{2}-\sqrt{\pi }
   \lambda  \sigma  e^{\frac{\left(\lambda  \sigma ^2+\mu \right)^2}{2 \sigma ^2}}
   \text{erfc}\left(\frac{\lambda  \sigma ^2+\mu }{\sqrt{2} \sigma }\right)\right)}{2 \sqrt{\pi
   } \sigma } \end{array} \right\}

Alternative form

An alternative but equivalent form of the probability density function is used in chromatography:[2]


f(x; y_0, A, x_c, w, t_0 )=y_0+\frac{A}{t_0} \exp \left( \frac {1}{2} \left( \frac {w}{t_0} \right)^2 - \frac {x-x_c}{t_0} \right) \left( \frac{1}{2} + \frac{1}{2} \operatorname{erf} \left( \frac {z}{\sqrt{2}} \right) \right) ,

where

y_0 = the initial value,
A = the amplitude,
x_c = the center of the peak,
w = the width of the peak,
t_0 = the modification factor (skewness, t_0 > 0),
z = \frac {x-x_c}{w} - \frac {w}{t_0}
 \operatorname{erf} \left( \frac {z}{\sqrt{2}} \right) = the error function evaluated at  \frac {z}{\sqrt{2}} .
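
The two forms can be compared numerically. The sketch below assumes the parameter correspondence y_0 = 0, A = 1, x_c = μ, w = σ and t_0 = 1/λ, which follows from matching the two formulas term by term (using erfc(−y) = 1 + erf(y)); the function names emg_pdf and emg_chrom and the test values are hypothetical.

```python
import math

def emg_pdf(x, mu, sigma, lam):
    """Density in the (mu, sigma, lambda) parameterisation."""
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x)
    arg = (mu + lam * sigma**2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(exponent) * math.erfc(arg)

def emg_chrom(x, y0, A, xc, w, t0):
    """Chromatographic form with baseline y0, amplitude A, centre xc, width w, factor t0."""
    z = (x - xc) / w - w / t0
    decay = math.exp(0.5 * (w / t0) ** 2 - (x - xc) / t0)
    return y0 + (A / t0) * decay * (0.5 + 0.5 * math.erf(z / math.sqrt(2.0)))

# Illustrative parameters (assumed): mu = 2, sigma = 0.7, lambda = 1.5.
mu, sigma, lam = 2.0, 0.7, 1.5
xs = [i * 0.1 for i in range(-20, 81)]  # grid from -2.0 to 8.0

# Largest absolute difference between the two forms on the grid.
max_diff = max(
    abs(emg_pdf(x, mu, sigma, lam) - emg_chrom(x, 0.0, 1.0, mu, sigma, 1.0 / lam))
    for x in xs
)
print(max_diff)
```

With a non-zero baseline y_0 or an amplitude A other than 1, the chromatographic form is a shifted and scaled copy of the density rather than a density itself.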

Example in alternative form

[Figure: Emg.png]

Parameter estimation

There are three parameters: the mean of the normal distribution (μ), the standard deviation of the normal distribution (σ) and the exponential parameter (ν = 1/λ). A fourth parameter, the shape K = ν/σ, is also sometimes used to characterise the distribution. Depending on the values of the parameters, the distribution may vary in shape from almost normal to almost exponential.

The parameters of the distribution can be estimated from the sample data with the method of moments as follows:[3][4]

 m = \mu + \nu
s^2 = \sigma^2 + \nu^2
 \gamma_1 = \frac{ 2 \nu^{ 3 } } { ( \sigma^2 + \nu^2 )^{3/2} }

where m is the sample mean, s is the sample standard deviation and γ1 is the sample skewness.

Solving these for the parameters gives

  \overline{ \mu } = m - s \left( \frac{\gamma_1} {2} \right)^{ 1 / 3 }
 \overline{ \sigma^2 } = s^2 \left[ 1 - \left( \frac{ \gamma_1 } { 2 } \right)^{ 2 / 3 } \right]
 \overline{ \nu } = s \left( \frac{ \gamma_1 } { 2 } \right)^{ 1 / 3 }.
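
The moment estimators above translate directly into code. The following is a minimal sketch with a hypothetical function name; it takes the sample mean, standard deviation and skewness as inputs and assumes the skewness lies strictly between 0 and 2, since outside that range the formulas give a non-positive ν or a negative σ².

```python
import math

def emg_method_of_moments(m, s, skew):
    """Return (mu, sigma, nu) from sample mean m, std s and skewness skew."""
    if not 0.0 < skew < 2.0:
        # Assumption of this sketch: reject inputs the formulas cannot handle.
        raise ValueError("skewness must lie in (0, 2) for these estimators")
    r = (skew / 2.0) ** (1.0 / 3.0)
    nu = s * r                          # exponential mean, nu = 1/lambda
    mu = m - nu                         # Gaussian mean
    sigma = s * math.sqrt(1.0 - r * r)  # Gaussian standard deviation
    return mu, sigma, nu

# Round-trip check with exact moments for assumed values mu = 1, sigma = 2, nu = 3.
mu0, sigma0, nu0 = 1.0, 2.0, 3.0
m = mu0 + nu0
s = math.sqrt(sigma0**2 + nu0**2)
g1 = 2.0 * nu0**3 / (sigma0**2 + nu0**2) ** 1.5
mu_hat, sigma_hat, nu_hat = emg_method_of_moments(m, s, g1)
print(mu_hat, sigma_hat, nu_hat)
```

Here the inputs are the exact population moments, so the original parameters are recovered up to rounding; with real data the sample moments, and the skewness in particular, are noisy.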

Recommendations

Ratcliff has suggested that there be at least 100 data points in the sample before the parameter estimates should be regarded as reliable.[5] Vincent averaging may be used with smaller samples as this procedure only modestly distorts the shape of the distribution.[6] These point estimates may be used as initial values that can be refined with more powerful methods including maximum likelihood.

Confidence intervals

There are currently no published tables available for significance testing with this distribution. The distribution can be simulated by forming the sum of two random variables, one drawn from a normal distribution and the other from an exponential distribution.
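
The simulation described above can be sketched with the Python standard library. The function name sample_emg, the seed and the parameter values are illustrative assumptions.

```python
import math
import random

def sample_emg(n, mu, sigma, lam, seed=None):
    """Draw n EMG variates as the sum of a normal and an independent exponential draw."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(lam) for _ in range(n)]

# Illustrative parameters (assumed): mu = 0, sigma = 1, lambda = 2.
draws = sample_emg(100_000, mu=0.0, sigma=1.0, lam=2.0, seed=12345)

# Compare sample moments with the theoretical mean mu + 1/lambda = 0.5
# and variance sigma^2 + 1/lambda^2 = 1.25.
sample_mean = math.fsum(draws) / len(draws)
sample_var = math.fsum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(sample_mean, sample_var)
```

Simulated samples of this kind can be used to approximate sampling distributions of the estimators when no tables are available.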

Skew

The value of the nonparametric skew

 \frac{\text{mean} - \text{median}}{\text{standard deviation}}

of this distribution lies between 0 and 0.31.[7][8] The lower limit is approached when the normal component dominates and the upper when the exponential component dominates.
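
As a quick check of the upper limit, consider the exponential-dominated case σ → 0, in which Z behaves like μ + Y with Y exponential of rate λ. This is a sketch for the limiting case only, not a derivation for general parameter values:

\begin{align}
\text{mean} &= \mu + \frac{1}{\lambda}, \qquad \text{median} = \mu + \frac{\ln 2}{\lambda}, \qquad \text{standard deviation} = \frac{1}{\lambda}, \\
\frac{\text{mean} - \text{median}}{\text{standard deviation}} &= 1 - \ln 2 \approx 0.307.
\end{align}

In the opposite limit the distribution approaches a normal distribution, whose mean and median coincide, so the ratio tends to 0.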

Usage

The distribution is used as a theoretical model for the shape of chromatographic peaks.[9][10] It has been proposed as a statistical model of intermitotic time in dividing cells.[11][12] It is also used in modelling cluster ion beams.[13] It is commonly used in psychology in the study of response times.[14][15]

Related distributions

This family of distributions is a special or limiting case of the normal-exponential-gamma distribution. The distribution is a compound probability distribution in which the mean of a normal distribution varies randomly as a shifted exponential distribution.

References

  1. ^ Grushka, Eli (1972). "Characterization of Exponentially Modified Gaussian Peaks in Chromatography". Analytical Chemistry 44 (11): 1733–1738. doi:10.1021/ac60319a011. 
  2. ^ Kalambet, Y.; Kozmin, Y.; Mikhailova, K.; Nagaev, I.; Tikhonov, P. (2011). "Reconstruction of chromatographic peaks using the exponentially modified Gaussian function". Journal of Chemometrics 25 (7): 352. doi:10.1002/cem.1343.
  3. ^ http://books.google.com/books?id=x8tGby300QMC&lpg=PA20&ots=vSSgLqAEiQ&dq=exponentially%20modified%20gaussian%20function&pg=PA27#v=onepage&q=exponentially%20modified%20gaussian%20function&f=false
  4. ^ Olivier J and Norberg MM (2010) Positively skewed data: Revisiting the Box−Cox power transformation. Int J Psych Res 3 (1) 68−75
  5. ^ Ratcliff R (1979). Group reaction time distributions and an analysis of distribution statistics. Psych Bull 86, 446−461
  6. ^ Vincent SB (1912) The functions of the vibrissae in the behaviour of the white rat. Behaviour Monographs 1, 7−81
  7. ^ Heathcote A (1996). RTSYS: A DOS application for the analysis of reaction time data. Behavioural Research Methods, Instruments, & Computers 28, 427−445
  8. ^ Ulrich R, & Miller J (1994) Effects of outlier exclusion on reaction time analysis. J Exp Psych: General 123, 34−80
  9. ^ Gladney, HM; Dowden, BF; Swalen, JD (1969). "Computer-Assisted Gas-Liquid Chromatography". Anal. Chem. 41 (7): 883–888. doi:10.1021/ac60276a013. 
  10. ^ http://onlinelibrary.wiley.com/doi/10.1002/cem.1343/abstract
  11. ^ Golubev, A. (2010). "Exponentially modified Gaussian (EMG) relevance to distributions related to cell proliferation and differentiation". Journal of Theoretical Biology 262 (2): 257–266. doi:10.1016/j.jtbi.2009.10.005. PMID 19825376.
  12. ^ Tyson, D. R.; Garbett, S. P.; Frick, P. L.; Quaranta, V. (2012). "Fractional proliferation: A method to deconvolve cell population dynamics from single-cell data". Nature Methods 9 (9): 923–928. doi:10.1038/nmeth.2138. PMC 3459330. PMID 22886092.
  13. ^ Nicolaescu, D.; Takaoka, G. H.; Ishikawa, J. (2006). "Multiparameter characterization of cluster ion beams". Journal of Vacuum Science & Technology B: Microelectronics and Nanometer Structures 24 (5): 2236. doi:10.1116/1.2335433.
  14. ^ Palmer EM, Horowitz TS, Torralba A, Wolfe JM (2011) What are the shapes of response time distributions in visual search? J Exp Psychol 37(1) 58–71
  15. ^ Rohrer D, Wixted JT (1994) An analysis of latency and interresponse time in free recall. Memory & Cognition 22 (5) 511–524