Ratio distribution

From Wikipedia, the free encyclopedia

A ratio distribution (or quotient distribution) is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two (usually independent) random variables X and Y, the distribution of the random variable Z that is formed as the ratio

Z = X/Y

is a ratio distribution.

The Cauchy distribution is an example of a ratio distribution: the random variable associated with it arises as the ratio of two Gaussian (normally) distributed variables with zero mean. Thus the Cauchy distribution is also called the normal ratio distribution.[citation needed] A number of researchers have considered more general ratio distributions.[1][2][3][4][5][6][7][8][9] Two distributions often used in test statistics, the t-distribution and the F-distribution, are also ratio distributions: a t-distributed random variable is a Gaussian random variable divided by an independent chi-distributed random variable (i.e., by the square root of a chi-squared random variable), while an F-distributed random variable is the ratio of two independent chi-squared distributed random variables.

Ratio distributions are often heavy-tailed, and it may be difficult to work with them or to develop an associated statistical test. A method based on the median has been suggested as a "work-around".[10]

Algebra of random variables

The ratio is one type of algebra of random variables: related to the ratio distribution are the product distribution, sum distribution and difference distribution. More generally, one may speak of combinations of sums, differences, products and ratios. Many of these distributions are described in Melvin D. Springer's 1979 book The Algebra of Random Variables.[8]

The algebraic rules known for ordinary numbers do not apply to the algebra of random variables. For example, if a product is C = AB and a ratio is D = C/A, it does not necessarily follow that D and B have the same distribution. Indeed, a peculiar effect is seen for the Cauchy distribution: the product and the ratio of two independent Cauchy random variables (with the same scale parameter and with location parameter zero) have the same distribution.[8] This becomes evident when the Cauchy distribution is itself regarded as a ratio distribution of two Gaussian distributions: consider two Cauchy random variables C_1 and C_2, each constructed from two Gaussian random variables, C_1 = G_1/G_2 and C_2 = G_3/G_4; then

\frac{C_1}{C_2} = \frac{{G_1}/{G_2}}{{G_3}/{G_4}} = \frac{G_1 G_4}{G_2 G_3} = \frac{G_1}{G_2} \times \frac{G_4}{G_3} = C_1 \times C_3,

where C_3 = G_4/G_3. The first expression is the ratio of two Cauchy random variables, while the last is the product of two such variables.
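This coincidence of the product and ratio distributions can be checked by simulation. The following Python sketch (illustrative, not from the cited sources) compares empirical quantiles of ratios and products of independent standard Cauchy samples; for standard Cauchy inputs the upper quartile of both is 1, since P(0 < Z < 1) = 1/4 by the symmetry Z ↔ 1/Z.

```python
import math
import random

random.seed(6)

def cauchy():
    """Draw a standard Cauchy variate via the inverse CDF (tan of a uniform angle)."""
    return math.tan(math.pi * (random.random() - 0.5))

n = 200000
ratios = sorted(cauchy() / cauchy() for _ in range(n))
prods = sorted(cauchy() * cauchy() for _ in range(n))

# The two empirical distributions should agree; e.g. both upper quartiles are near 1.
q75_ratio = ratios[int(0.75 * n)]
q75_prod = prods[int(0.75 * n)]
print(q75_ratio, q75_prod)
```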

Derivation

The ratio distribution of Z can be derived from the joint distribution of X and Y by an integration of the following form[3]

p_Z(z) = \int^{+\infty}_{-\infty} |y|\, p_{X,Y}(zy, y) \, dy.

This is not always straightforward.
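As a numerical illustration of this integral (a sketch, not from the source), the trapezoidal rule applied to two independent standard normal variables must reproduce the standard Cauchy density 1/(π(1 + z²)):

```python
import math

def ratio_density(z, n=20001, ylim=8.0):
    """Evaluate p_Z(z) = integral of |y| p_X(z*y) p_Y(y) dy by the trapezoidal
    rule, for independent standard normal X and Y."""
    def phi(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    h = 2 * ylim / (n - 1)
    total = 0.0
    for i in range(n):
        y = -ylim + i * h
        w = 0.5 if i in (0, n - 1) else 1.0  # trapezoid endpoint weights
        total += w * abs(y) * phi(z * y) * phi(y)
    return total * h

# Should match the standard Cauchy density 1/(pi*(1+z^2)) at each z.
for z in (0.0, 0.5, 2.0):
    print(ratio_density(z), 1.0 / (math.pi * (1 + z * z)))
```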

The Mellin transform has also been suggested for derivation of ratio distributions.[8]

Gaussian ratio distribution

When X and Y are independent Gaussian random variables with zero means, the form of their ratio distribution is fairly simple: it is a Cauchy distribution. However, when the two distributions have non-zero means, the form of the distribution of the ratio is much more complicated. In 1969 David Hinkley found a form for this distribution.[6] In the absence of correlation (cor(X, Y) = 0), the probability density function of the ratio Z = X/Y of the two normal variables X = N(μX, σX²) and Y = N(μY, σY²) is given by the following expression:

 p_Z(z)= \frac{b(z) \cdot c(z)}{a^3(z)} \frac{1}{\sqrt{2 \pi} \sigma_x \sigma_y} \left[2 \Phi \left( \frac{b(z)}{a(z)}\right) - 1 \right] + \frac{1}{a^2(z) \cdot \pi \sigma_x \sigma_y } e^{- \frac{1}{2} \left( \frac{\mu_x^2}{\sigma_x^2} + \frac{\mu_y^2}{\sigma_y^2} \right)}

where

 a(z)= \sqrt{\frac{1}{\sigma_x^2} z^2 + \frac{1}{\sigma_y^2}}
 b(z)= \frac{\mu_x }{\sigma_x^2} z + \frac{\mu_y}{\sigma_y^2}
 c(z)= e^{\frac {1}{2} \frac{b^2(z)}{a^2(z)} - \frac{1}{2} \left( \frac{\mu_x^2}{\sigma_x^2} + \frac{\mu_y^2}{\sigma_y^2} \right)}
 \Phi(z)= \int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} e^{- \frac{1}{2} u^2}\, du
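Hinkley's density transcribes directly into code (an illustrative sketch; the function and variable names are mine), using the error function for Φ:

```python
import math

def hinkley_pdf(z, mu_x, mu_y, s_x, s_y):
    """Density of Z = X/Y for independent X ~ N(mu_x, s_x^2), Y ~ N(mu_y, s_y^2),
    following Hinkley's uncorrelated-case formula."""
    Phi = lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))  # standard normal CDF
    a = math.sqrt(z * z / s_x**2 + 1.0 / s_y**2)
    b = mu_x * z / s_x**2 + mu_y / s_y**2
    k = 0.5 * (mu_x**2 / s_x**2 + mu_y**2 / s_y**2)
    c = math.exp(0.5 * b * b / (a * a) - k)
    term1 = b * c / a**3 / (math.sqrt(2 * math.pi) * s_x * s_y) * (2 * Phi(b / a) - 1)
    term2 = math.exp(-k) / (a * a * math.pi * s_x * s_y)
    return term1 + term2

# With zero means and unit variances the formula collapses to the Cauchy density.
print(hinkley_pdf(0.5, 0, 0, 1, 1), 1 / (math.pi * (1 + 0.5**2)))
```

For non-zero means the density still integrates to 1, which makes a convenient numerical sanity check.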

The above expression becomes even more complicated if the variables X and Y are correlated. It can also be shown that p_Z(z) is the standard Cauchy distribution if μX = μY = 0 and σX = σY = 1. In that case b(z) = 0, and

p(z)= \frac{1}{\pi} \frac{1}{1 + z^2}

If μ_X = μ_Y = 0 but \sigma_X \neq 1, \sigma_Y \neq 1 or \rho \neq 0, a more general Cauchy distribution is obtained

p_Z(z) = \frac{1}{\pi} \frac{\beta}{(z-\alpha)^2 + \beta^2},

where ρ is the correlation coefficient between X and Y and

\alpha = \rho \frac{\sigma_x}{\sigma_y},
\beta = \frac{\sigma_x}{\sigma_y} \sqrt{1-\rho^2}.
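These parameters can be verified by simulation (a sketch with arbitrarily chosen values σx = 2, σy = 1, ρ = 0.5). For a Cauchy(α, β) distribution the median is α and the quartiles are α ± β, so empirical quantiles of simulated ratios should recover both parameters:

```python
import math
import random

random.seed(0)
s_x, s_y, rho = 2.0, 1.0, 0.5
alpha = rho * s_x / s_y
beta = (s_x / s_y) * math.sqrt(1 - rho**2)

n = 200000
zs = []
for _ in range(n):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    x = s_x * g1                                       # X ~ N(0, s_x^2)
    y = s_y * (rho * g1 + math.sqrt(1 - rho**2) * g2)  # Y ~ N(0, s_y^2), cor(X, Y) = rho
    zs.append(x / y)
zs.sort()

med = zs[n // 2]
q1, q3 = zs[n // 4], zs[3 * n // 4]
print(med, (q3 - q1) / 2)  # should be close to alpha and beta
```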

This complicated distribution has also been expressed in terms of Kummer's confluent hypergeometric function and the Hermite function.[9]

A transformation to Gaussianity

A transformation has been suggested so that, under certain assumptions, the transformed variable t would have approximately a standard Gaussian distribution:[1]

t = \frac{\mu_y z - \mu_x}{\sqrt{\sigma_y^2 z^2 - 2\rho \sigma_x \sigma_y z + \sigma_x^2}}

The transformation has been called the Geary–Hinkley transformation,[7] and the approximation is good if Y is unlikely to assume negative values.
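The quality of the approximation is easy to probe by simulation. The sketch below (with illustrative parameter choices μx = 5, μy = 10, unit variances, ρ = 0, so that Y is very unlikely to be negative) applies the transformation to simulated ratios and checks that the results have mean near 0 and standard deviation near 1:

```python
import math
import random

random.seed(1)
mu_x, mu_y = 5.0, 10.0           # Y has a large mean: negative values are very unlikely
s_x, s_y, rho = 1.0, 1.0, 0.0

ts = []
for _ in range(100000):
    z = random.gauss(mu_x, s_x) / random.gauss(mu_y, s_y)
    # Geary-Hinkley transformation of the observed ratio z
    t = (mu_y * z - mu_x) / math.sqrt(s_y**2 * z**2 - 2 * rho * s_x * s_y * z + s_x**2)
    ts.append(t)

mean_t = sum(ts) / len(ts)
sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in ts) / len(ts))
print(mean_t, sd_t)  # should be close to 0 and 1
```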

Uniform ratio distribution

With two independent random variables following a uniform distribution, e.g.,

p_X(x) = \begin{cases} 1 \qquad 0 < x < 1 \\ 0 \qquad \mbox{otherwise}\end{cases}

the ratio distribution becomes

p_Z(z) = \begin{cases}
1/2            \qquad & 0 < z < 1 \\ 
\frac{1}{2z^2} \qquad & z \geq 1  \\
0              \qquad & \mbox{otherwise} \end{cases}
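The two branches of this density imply P(0 < Z < 1) = 1/2 and P(1 ≤ Z < 2) = ∫₁² dz/(2z²) = 1/4, which a short Monte Carlo sketch (illustrative, not from the source) confirms:

```python
import random

random.seed(2)
n = 200000
# Ratio of two independent U(0, 1) variables
zs = [random.random() / random.random() for _ in range(n)]

p_lt1 = sum(z < 1 for z in zs) / n       # should be near 1/2
p_12 = sum(1 <= z < 2 for z in zs) / n   # should be near 1/4
print(p_lt1, p_12)
```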

Cauchy ratio distribution

If two independent random variables X and Y each follow a Cauchy distribution with median equal to zero and scale factor a,

p_X(x|a) = \frac{a}{\pi (a^2 + x^2)}

then the ratio distribution for the random variable Z = X/Y is [11]

p_Z(z|a) = \frac{1}{\pi^2(z^2-1)} \ln \left(z^2\right).

Interestingly, this distribution does not depend on a; note also that the result stated by Springer[8] (p. 158, Question 4.6) is not correct. The ratio distribution is similar to, but not the same as, the product distribution of the random variable W = XY:

p_W(w|a) = \frac{a^2}{\pi^2(w^2-a^4)} \ln \left(\frac{w^2}{a^4}\right).[8]

More generally, if two independent random variables X and Y each follow a Cauchy distribution with median equal to zero and scale factors a and b respectively, then:

1. The ratio distribution for the random variable Z = X/Y is [11]

p_Z(z|a,b) = \frac{ab}{\pi^2(b^2z^2-a^2)} \ln \left(\frac{b^2 z^2}{a^2}\right).

2. The product distribution for the random variable W = XY is [11]

p_W(w|a,b) = \frac{ab}{\pi^2(w^2-a^2b^2)} \ln \left(\frac{w^2}{a^2b^2}\right).

The result for the ratio distribution can be obtained from the product distribution by replacing b with \frac{1}{b}.
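Both the closed form and its normalization can be sanity-checked numerically for a = b = 1 (an illustrative sketch, not from the cited sources). Since Z and 1/Z have the same law and the density is symmetric about zero, the probability mass on (0, 1) must be exactly 1/4:

```python
import math
import random

def p_ratio(z):
    """Density of Z = X/Y for X, Y iid standard Cauchy (a = b = 1)."""
    if abs(abs(z) - 1.0) < 1e-12:
        return 1.0 / math.pi**2  # limit of ln(z^2)/(z^2 - 1) as z -> +-1
    return math.log(z * z) / (math.pi**2 * (z * z - 1))

# Midpoint rule: the density integrated over (0, 1) should give 1/4.
n = 100000
h = 1.0 / n
mass = sum(p_ratio((i + 0.5) * h) for i in range(n)) * h
print(mass)

# Monte Carlo agreement: standard Cauchy via tan of a uniform angle.
random.seed(3)
def cauchy():
    return math.tan(math.pi * (random.random() - 0.5))
hits = sum(0 < cauchy() / cauchy() < 1 for _ in range(200000))
print(hits / 200000)
```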

Ratio of standard normal to standard uniform

Main article: Slash distribution

If X has a standard normal distribution and Y has a standard uniform distribution, then Z = X / Y has a distribution known as the slash distribution, with probability density function

p_Z(z) = \begin{cases}
\left[ \phi(0) - \phi(z) \right] / z^2 \quad &  z \ne 0 \\
\phi(0) / 2 \quad &  z = 0,  \\
\end{cases}

where φ(z) is the probability density function of the standard normal distribution.[12]
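The piecewise density is easy to implement and to check against simulation (an illustrative sketch; the probability P(|Z| < 1) is estimated both ways and the two estimates compared):

```python
import math
import random

def slash_pdf(z):
    """Density of Z = X/Y for X ~ N(0, 1) and Y ~ U(0, 1) (slash distribution)."""
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    if z == 0:
        return phi(0) / 2
    return (phi(0) - phi(z)) / (z * z)

# Monte Carlo estimate of P(|Z| < 1) ...
random.seed(4)
n = 200000
mc = sum(abs(random.gauss(0, 1) / random.random()) < 1 for _ in range(n)) / n

# ... compared with midpoint-rule integration of the pdf over (-1, 1).
steps = 10000
h = 2.0 / steps
quad = sum(slash_pdf(-1 + (i + 0.5) * h) for i in range(steps)) * h
print(mc, quad)
```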

Other ratio distributions

Let X follow a standard normal distribution, and let Y and Z follow chi-squared distributions with m and n degrees of freedom respectively, with X, Y and Z mutually independent. Then

 \frac{X}{\sqrt{Y/m}} \sim t_m
 \frac{Y/m}{Z/n} \sim F_{m,n}
 \frac{Y}{Y+Z} \sim \mathrm{beta}(m/2, n/2)

where t_m is Student's t-distribution with m degrees of freedom, F_{m,n} is the F-distribution with m and n degrees of freedom, and beta(m/2, n/2) is the beta distribution.
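These identities can be spot-checked by simulating chi-squared variables as sums of squared Gaussians. The sketch below (illustrative, with arbitrarily chosen m = 4, n = 6) checks the beta relation through its mean, which for beta(m/2, n/2) is (m/2)/((m+n)/2) = m/(m+n):

```python
import random

random.seed(5)

def chi2(k):
    """Chi-squared variate with k degrees of freedom as a sum of squared normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

m, n = 4, 6
N = 100000
total = 0.0
for _ in range(N):
    y, z = chi2(m), chi2(n)
    total += y / (y + z)

# Y/(Y+Z) ~ beta(m/2, n/2), whose mean is m/(m+n).
print(total / N, m / (m + n))
```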

Ratio distributions in multivariate analysis

Ratio distributions also appear in multivariate analysis. If the random matrices X and Y follow a Wishart distribution, then the ratio of the determinants

\phi = |\mathbf{X}|/|\mathbf{Y}|

is proportional to the product of independent F random variables. If X and Y are from independent standardized Wishart distributions, then the ratio

\Lambda = {|\mathbf{X}|/|\mathbf{X}+\mathbf{Y}|}

has a Wilks' lambda distribution.

References

  1. ^ a b Geary, R. C. (1930). "The Frequency Distribution of the Quotient of Two Normal Variates". Journal of the Royal Statistical Society 93 (3): 442–446. doi:10.2307/2342070. JSTOR 2342070. 
  2. ^ Fieller, E. C. (November 1932). "The Distribution of the Index in a Normal Bivariate Population". Biometrika 24 (3/4): 428–440. doi:10.2307/2331976. JSTOR 2331976. 
  3. ^ a b Curtiss, J. H. (December 1941). "On the Distribution of the Quotient of Two Chance Variables". The Annals of Mathematical Statistics 12 (4): 409–421. doi:10.1214/aoms/1177731679. JSTOR 2235953. 
  4. ^ George Marsaglia (April 1964). Ratios of Normal Variables and Ratios of Sums of Uniform Variables. Defense Technical Information Center.
  5. ^ Marsaglia, George (March 1965). "Ratios of Normal Variables and Ratios of Sums of Uniform Variables". Journal of the American Statistical Association 60 (309): 193–204. doi:10.2307/2283145. JSTOR 2283145. 
  6. ^ a b Hinkley, D. V. (December 1969). "On the Ratio of Two Correlated Normal Random Variables". Biometrika 56 (3): 635–639. doi:10.2307/2334671. JSTOR 2334671. 
  7. ^ a b Hayya, Jack; Armstrong, Donald; Gressis, Nicolas (July 1975). "A Note on the Ratio of Two Normally Distributed Variables". Management Science 21 (11): 1338–1341. doi:10.1287/mnsc.21.11.1338. JSTOR 2629897. 
  8. ^ a b c d e f Springer, Melvin Dale (1979). The Algebra of Random Variables. Wiley. ISBN 0-471-01406-0. 
  9. ^ a b Pham-Gia, T.; Turkkan, N.; Marchand, E. (2006). "Density of the Ratio of Two Normal Random Variables and Applications". Communications in Statistics - Theory and Methods (Taylor & Francis) 35 (9): 1569–1591. doi:10.1080/03610920600683689. 
  10. ^ Brody, James P.; Williams, Brian A.; Wold, Barbara J.; Quake, Stephen R. (October 2002). "Significance and statistical errors in the analysis of DNA microarray data". Proc Natl Acad Sci U S A 99 (20): 12975–12978. doi:10.1073/pnas.162468199. PMC 130571. PMID 12235357. 
  11. ^ a b c Kermond, John (2010). "An Introduction to the Algebra of Random Variables". Mathematical Association of Victoria 47th Annual Conference Proceedings - New Curriculum. New Opportunities (The Mathematical Association of Victoria): 1–16. ISBN 978-1-876949-50-1. 
  12. ^ "SLAPPF". Statistical Engineering Division, National Institute of Standards and Technology. Retrieved 2009-07-02. 
