# Product distribution

A product distribution is a probability distribution constructed as the distribution of the product of random variables having two other known distributions. Given two statistically independent random variables X and Y, the distribution of the random variable Z that is formed as the product

${\displaystyle Z=XY}$

is a product distribution.

## Algebra of random variables

The product is one type of algebra for random variables: related to the product distribution are the ratio distribution, the sum distribution (see List of convolutions of probability distributions) and the difference distribution. More generally, one may speak of combinations of sums, differences, products and ratios.

Many of these distributions are described in Melvin D. Springer's 1979 book *The Algebra of Random Variables*.[1]

## Derivation for independent random variables

If ${\displaystyle X}$ and ${\displaystyle Y}$ are two independent, continuous random variables, described by probability density functions ${\displaystyle f_{X}}$ and ${\displaystyle f_{Y}}$ then the probability density function of ${\displaystyle Z=XY}$ is[2]

${\displaystyle f_{Z}(z)=\int _{-\infty }^{\infty }f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{|x|}}\,dx.}$

### Proof [3]

We first write the cumulative distribution function of ${\displaystyle Z}$ starting with its definition

${\displaystyle {\begin{array}{lcl}F_{Z}\left(z\right)&{\overset {\underset {\mathrm {def} }{}}{=}}&\mathbb {P} (Z\leq z)\\&=&\mathbb {P} (XY\leq z)\\&=&\mathbb {P} (XY\leq z,X\geq 0)+\mathbb {P} (XY\leq z,X\leq 0)\\&=&\mathbb {P} (Y\leq z/X,X\geq 0)+\mathbb {P} (Y\geq z/X,X\leq 0)\\&=&\int _{0}^{\infty }f_{X}\left(x\right)\int _{-\infty }^{z/x}f_{Y}\left(y\right)\,dy\,dx+\int _{-\infty }^{0}f_{X}\left(x\right)\int _{z/x}^{\infty }f_{Y}\left(y\right)\,dy\,dx\end{array}}}$

We find the desired probability density function by taking the derivative of both sides with respect to ${\displaystyle z}$. Since on the right hand side, ${\displaystyle z}$ appears only in the integration limits, the derivative is easily performed using the fundamental theorem of calculus and the chain rule. (Note the negative sign that is needed when the variable occurs in the lower limit of the integration.)

${\displaystyle {\begin{array}{lcl}f_{Z}(z)&=&\int _{0}^{\infty }f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{x}}\,dx-\int _{-\infty }^{0}f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{x}}\,dx\\&=&\int _{0}^{\infty }f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{|x|}}\,dx+\int _{-\infty }^{0}f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{|x|}}\,dx\\&=&\int _{-\infty }^{\infty }f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{|x|}}\,dx.\end{array}}}$

where the absolute value is used to conveniently combine the two terms.
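
As a sanity check of this formula (not part of the original derivation), the integral can be evaluated numerically and compared against simulated products. The particular choices below, X uniform on [0.5, 1.5] and Y exponential with unit rate, are arbitrary illustrative assumptions.

```python
# Numerical sanity check of f_Z(z) = ∫ f_X(x) f_Y(z/x) / |x| dx.
# Illustrative assumption: X ~ Uniform(0.5, 1.5), Y ~ Exponential(1).
import numpy as np
from scipy import integrate, stats

f_X = stats.uniform(loc=0.5, scale=1.0).pdf   # support [0.5, 1.5] keeps x away from 0
f_Y = stats.expon().pdf

def f_Z(z):
    """Product density evaluated by quadrature over the support of X."""
    val, _ = integrate.quad(lambda x: f_X(x) * f_Y(z / x) / abs(x), 0.5, 1.5)
    return val

# Monte Carlo comparison: local histogram estimates of the density of Z = XY
rng = np.random.default_rng(0)
z_samples = rng.uniform(0.5, 1.5, 200_000) * rng.exponential(1.0, 200_000)
for z0 in (0.5, 1.0, 2.0):
    mc = np.mean(np.abs(z_samples - z0) < 0.05) / 0.10
    print(z0, f_Z(z0), mc)    # quadrature and simulation agree up to MC error
```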

### Alternate proof

A faster, more compact proof begins with the same step of writing the cumulative distribution function of ${\displaystyle Z}$, starting from its definition:

${\displaystyle {\begin{array}{lcl}F_{Z}\left(z\right)&{\overset {\underset {\mathrm {def} }{}}{=}}&\mathbb {P} (Z\leq z)\\&=&\mathbb {P} (XY\leq z)\\&=&\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f_{X}\left(x\right)f_{Y}\left(y\right)u\left(z-xy\right)\,dy\,dx\end{array}}}$

where ${\displaystyle u\left(.\right)}$ is the Heaviside step function and serves to limit the region of integration to values of ${\displaystyle x}$ and ${\displaystyle y}$ satisfying ${\displaystyle xy\leq z}$.

We find the desired probability density function by taking the derivative of both sides with respect to ${\displaystyle z}$.

${\displaystyle {\begin{array}{lcl}f_{Z}(z)&=&\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f_{X}\left(x\right)f_{Y}\left(y\right)\delta \left(z-xy\right)\,dy\,dx\\&=&\int _{-\infty }^{\infty }f_{X}\left(x\right)f_{Y}\left(z/x\right)\left[\int _{-\infty }^{\infty }\delta \left(z-xy\right)\,dy\right]\,dx\\&=&\int _{-\infty }^{\infty }f_{X}\left(x\right)f_{Y}\left(z/x\right){\frac {1}{|x|}}\,dx.\end{array}}}$

where we utilize the translation and scaling properties of the Dirac delta function ${\displaystyle \delta \left(.\right)}$.

A more intuitive description of the procedure is illustrated in the figure below. The joint pdf ${\displaystyle f_{X}(x)f_{Y}(y)}$ exists in the x-y plane and an arc of constant z value is shown as the shaded line. To find the marginal probability ${\displaystyle f(z)}$ on this arc, integrate over increments of area ${\displaystyle dxdy\;f(x,y)}$ on this contour.

*Diagram to illustrate the product distribution of two variables.*

At fixed ${\displaystyle x}$, moving from the contour ${\displaystyle xy=z}$ to the contour ${\displaystyle xy=z+dz}$ changes ${\displaystyle y}$ by ${\displaystyle |dy|=dz/|x|}$, so the increment of probability contained in the strip above the interval ${\displaystyle [x,x+dx]}$ is ${\displaystyle \delta p=f(x,y)\,dx\,|dy|=f_{X}(x)f_{Y}(z/x){\frac {dz}{|x|}}\,dx}$. Integrating over ${\displaystyle x}$ then gives ${\displaystyle f_{Z}(z)\,dz=\left[\int _{-\infty }^{\infty }f_{X}(x)f_{Y}(z/x){\frac {1}{|x|}}\,dx\right]dz}$, which is the integral above.

### A Bayesian interpretation

Let ${\displaystyle X\sim f_{x}(x)}$ be a random sample drawn from the probability distribution ${\displaystyle f_{x}(x)}$. Scaling ${\displaystyle X}$ by ${\displaystyle \theta }$ generates a sample from the scaled distribution ${\displaystyle \theta X\sim {\frac {1}{|\theta |}}f_{x}\left({\frac {x}{\theta }}\right)}$, which can be written as a conditional distribution ${\displaystyle g_{x}(x|\theta )={\frac {1}{|\theta |}}f_{x}\left({\frac {x}{\theta }}\right)}$.

Letting ${\displaystyle \theta }$ be a random variable with pdf ${\displaystyle f_{\theta }(\theta )}$, the joint density of the scaled sample and ${\displaystyle \theta }$ is ${\displaystyle g_{x}(x|\theta )f_{\theta }(\theta )}$, and integrating out ${\displaystyle \theta }$ we get ${\displaystyle h_{x}(x)=\int _{-\infty }^{\infty }g_{x}(x|\theta )f_{\theta }(\theta )\,d\theta }$, so ${\displaystyle \theta X}$ is drawn from this distribution: ${\displaystyle \theta X\sim h_{x}(x)}$. However, substituting the definition of ${\displaystyle g}$ we also have ${\displaystyle h_{x}(x)=\int _{-\infty }^{\infty }{\frac {1}{|\theta |}}f_{x}\left({\frac {x}{\theta }}\right)f_{\theta }(\theta )\,d\theta }$, which has the same form as the product distribution above. Thus this Bayesian marginal distribution ${\displaystyle h_{x}(x)}$ is the distribution of the product of the two independent random samples ${\displaystyle \theta }$ and ${\displaystyle X}$.

For the case of one variable being discrete, let ${\displaystyle \theta }$ have probability ${\displaystyle P_{i}}$ at levels ${\displaystyle \theta _{i}}$ with ${\displaystyle \sum _{i}P_{i}=1}$. The conditional density is ${\displaystyle f_{x}(x\mid \theta _{i})={\frac {1}{|\theta _{i}|}}f_{x}\left({\frac {x}{\theta _{i}}}\right)}$. Therefore the density of ${\displaystyle \theta X}$ is ${\displaystyle f_{\theta X}(x)=\sum _{i}{\frac {P_{i}}{|\theta _{i}|}}f_{x}\left({\frac {x}{\theta _{i}}}\right)}$.
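
This discrete case can be checked directly by simulation: draw ${\displaystyle \theta }$ from the levels ${\displaystyle \theta _{i}}$ with probabilities ${\displaystyle P_{i}}$, scale an independent sample of ${\displaystyle X}$, and compare against the mixture density above. The particular levels and base density below are illustrative assumptions.

```python
# Discrete scale mixture: the density of θX is Σ_i P_i / |θ_i| · f_x(x / θ_i).
# Illustrative assumptions: f_x standard normal, three levels θ_i.
import numpy as np
from scipy import stats

levels = np.array([0.5, 1.0, 2.0])        # θ_i
probs  = np.array([0.2, 0.5, 0.3])        # P_i, summing to 1
f_x = stats.norm().pdf                    # base density

def mixture_pdf(x):
    return sum(p / abs(t) * f_x(x / t) for p, t in zip(probs, levels))

rng = np.random.default_rng(1)
n = 500_000
theta = rng.choice(levels, size=n, p=probs)     # draw θ at its levels
samples = theta * rng.normal(size=n)            # scaled samples θX

for x0 in (-1.0, 0.0, 1.5):
    mc = np.mean(np.abs(samples - x0) < 0.05) / 0.10   # local density estimate
    print(x0, mixture_pdf(x0), mc)                     # agree up to MC error
```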

## Expectation of product of random variables

When two random variables are statistically independent, the expectation of their product is the product of their expectations. This can be proved from the Law of total expectation:

${\displaystyle \operatorname {E} (XY)=\operatorname {E} _{Y}(\operatorname {E} _{XY\mid Y}(XY\mid Y))}$

In the inner expression, Y is a constant. Hence:

${\displaystyle \operatorname {E} _{XY\mid Y}(XY\mid Y)=Y\cdot \operatorname {E} _{X\mid Y}[X]}$
${\displaystyle \operatorname {E} (XY)=\operatorname {E} _{Y}(Y\cdot \operatorname {E} _{X\mid Y}[X])}$

This is true even if X and Y are statistically dependent. However, in general ${\displaystyle \operatorname {E} _{X\mid Y}[X]}$ is a function of Y. In the special case in which X and Y are statistically independent, it is a constant independent of Y. Hence:

${\displaystyle \operatorname {E} (XY)=\operatorname {E} _{Y}(Y\cdot \operatorname {E} _{X}[X])}$
${\displaystyle \operatorname {E} (XY)=\operatorname {E} _{X}(X)\cdot \operatorname {E} _{Y}(Y)}$
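
A minimal Monte Carlo illustration of this identity (the particular distributions are arbitrary assumptions):

```python
# E[XY] = E[X]·E[Y] for independent X and Y (Monte Carlo illustration)
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(2.0, 1_000_000)    # E[X] = 2
y = rng.normal(3.0, 1.0, 1_000_000)    # E[Y] = 3
print(np.mean(x * y), np.mean(x) * np.mean(y))   # both ≈ 6
```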

## Characteristic function of product of random variables

Assume X, Y are independent random variables. The characteristic function of X is ${\displaystyle \phi _{X}(t)}$, and the distribution of Y is known. Then from the Law of total expectation, we have

{\displaystyle {\begin{aligned}\phi _{Z}(t)&=\operatorname {E} (e^{itXY})\\&=\operatorname {E} _{Y}(\operatorname {E} _{XY\mid Y}(e^{itXY}\mid Y))\\&=\operatorname {E} _{Y}(\operatorname {E} _{X\mid Y}(e^{itXY}\mid Y))\\&=\operatorname {E} _{Y}(\phi _{X}(tY))\end{aligned}}}

If the characteristic functions and distributions of both X and Y are known, then alternatively, ${\displaystyle \phi _{Z}(t)=\operatorname {E} _{X}(\phi _{Y}(tX))}$ also holds.
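
As a sketch (assuming X ~ N(0,1), so that ${\displaystyle \phi _{X}(t)=e^{-t^{2}/2}}$, and an arbitrary choice for Y), the relation can be checked against a direct Monte Carlo estimate of ${\displaystyle \operatorname {E} (e^{itXY})}$:

```python
# Check phi_Z(t) = E_Y[ phi_X(tY) ] with X ~ N(0,1), i.e. phi_X(t) = exp(-t^2/2).
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x = rng.normal(size=n)
y = rng.uniform(0.0, 2.0, n)      # Y chosen uniform on [0, 2] purely for illustration

t = 1.3                           # arbitrary test point
phi_X = lambda s: np.exp(-s ** 2 / 2)

direct = np.mean(np.exp(1j * t * x * y))   # E[e^{itXY}] from joint samples
via_Y  = np.mean(phi_X(t * y))             # E_Y[phi_X(tY)]
print(direct, via_Y)                       # agree up to Monte Carlo error
```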

## Mellin transform

The Mellin transform of a probability density function ${\displaystyle f(x)}$ with support only on ${\displaystyle x\geq 0}$, from which a random sample ${\displaystyle X}$ is drawn, is

${\displaystyle {\mathcal {M}}f(x)=\varphi (s)=\int _{0}^{\infty }x^{s-1}f(x)\,dx=\mathbb {E} [X^{s-1}].}$

The inverse transform is

${\displaystyle {\mathcal {M}}^{-1}\varphi (s)=f(x)={\frac {1}{2\pi i}}\int _{c-i\infty }^{c+i\infty }x^{-s}\varphi (s)\,ds.}$

If ${\displaystyle X{\text{ and }}Y}$ are two independent random samples from different distributions, then the Mellin transform of their product is equal to the product of their Mellin transforms:

${\displaystyle {\mathcal {M}}_{XY}(s)={\mathcal {M}}_{X}(s){\mathcal {M}}_{Y}(s)}$

If s is restricted to integer values, a simpler result is

${\displaystyle \mathbb {E} [(XY)^{n}]=\mathbb {E} [X^{n}]\;\mathbb {E} [Y^{n}]}$

Thus the moments of the random product ${\displaystyle XY}$ are the products of the corresponding moments of ${\displaystyle X{\text{ and }}Y}$, and this extends to non-integer moments; for example,

${\displaystyle \mathbb {E} [{\sqrt[{p}]{XY}}]=\mathbb {E} [{\sqrt[{p}]{X}}]\;\mathbb {E} [{\sqrt[{p}]{Y}}]}$.

The pdf of a random variable can be reconstructed from its moments using the saddlepoint approximation method.

To illustrate how the product of moments yields a much simpler result than finding the moments of the distribution of the product, let ${\displaystyle X,Y}$ be sampled from two Gamma distributions with density ${\displaystyle \Gamma (\theta )^{-1}x^{\theta -1}e^{-x}}$ and shape parameters ${\displaystyle \theta =\alpha }$ and ${\displaystyle \theta =\beta }$ respectively, whose moments are ${\displaystyle E[X^{p}]=\int _{0}^{\infty }x^{p}\,\Gamma (\theta )^{-1}x^{\theta -1}e^{-x}\,dx={\frac {\Gamma (\theta +p)}{\Gamma (\theta )}}}$.

Multiplying the corresponding moments gives the Mellin Transform result

${\displaystyle E[(XY)^{p}]=E[X^{p}]\;E[Y^{p}]={\frac {\Gamma (\alpha +p)}{\Gamma (\alpha )}}\;{\frac {\Gamma (\beta +p)}{\Gamma (\beta )}}}$

Independently, it is known that the product of two Gamma samples has the distribution ${\displaystyle f(z,\alpha ,\beta )=2\Gamma (\alpha )^{-1}\Gamma (\beta )^{-1}z^{{\frac {\alpha +\beta }{2}}-1}K_{\alpha -\beta }(2{\sqrt {z}}),\;z\geq 0}$.

To find the moments of this, make the change of variable ${\displaystyle y=2{\sqrt {z}}}$, simplifying similar integrals to:

${\displaystyle \int _{0}^{\infty }z^{p}K_{\nu }(2{\sqrt {z}})dz=2^{-2p-1}\int _{0}^{\infty }y^{2p+1}K_{\nu }(y)dy}$

thus the ${\displaystyle p}$-th moment is obtained from

${\displaystyle \int _{0}^{\infty }z^{p+{\frac {\alpha +\beta }{2}}-1}K_{\alpha -\beta }(2{\sqrt {z}})\,dz=2^{-(\alpha +\beta )-2p+1}\int _{0}^{\infty }y^{(\alpha +\beta )+2p-1}K_{\alpha -\beta }(y)\,dy}$

The definite integral ${\displaystyle \int _{0}^{\infty }y^{\mu }K_{\nu }(y)dy=2^{\mu -1}\Gamma \left({\frac {1+\mu +\nu }{2}}\right)\Gamma \left({\frac {1+\mu -\nu }{2}}\right)}$ is well documented, and, restoring the factor ${\displaystyle 2/\left(\Gamma (\alpha )\Gamma (\beta )\right)}$ from the density, we have finally

{\displaystyle {\begin{aligned}E[Z^{p}]&={\frac {2^{-(\alpha +\beta )-2p+1}\;2^{(\alpha +\beta )+2p-1}}{\Gamma (\alpha )\;\Gamma (\beta )}}\Gamma \left({\frac {(\alpha +\beta +2p)+(\alpha -\beta )}{2}}\right)\Gamma \left({\frac {(\alpha +\beta +2p)-(\alpha -\beta )}{2}}\right)\\\\&={\frac {\Gamma (\alpha +p)\,\Gamma (\beta +p)}{\Gamma (\alpha )\,\Gamma (\beta )}}\end{aligned}}}

which agrees with the moment-product result above.
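
A short simulation (with illustrative shape parameters) confirms this moment formula:

```python
# Monte Carlo check of E[(XY)^p] = Γ(α+p)Γ(β+p) / (Γ(α)Γ(β)) for unit-scale Gamma samples
import numpy as np
from scipy.special import gamma as G

alpha, beta, p = 2.5, 1.7, 0.6            # illustrative; p need not be an integer
rng = np.random.default_rng(4)
x = rng.gamma(alpha, 1.0, 2_000_000)
y = rng.gamma(beta, 1.0, 2_000_000)

print(np.mean((x * y) ** p),
      G(alpha + p) * G(beta + p) / (G(alpha) * G(beta)))   # agree up to MC error
```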

## Special cases

The distribution of the product of two independent random variables which have lognormal distributions is again lognormal. This is itself a special case of a more general set of results where the logarithm of the product can be written as the sum of the logarithms. Thus, in cases where a simple result can be found in the list of convolutions of probability distributions, where the distributions to be convolved are those of the logarithms of the components of the product, the result might be transformed to provide the distribution of the product. However, this approach is only useful where the logarithms of the components of the product belong to some standard families of distributions.

This works for the product of two independent variables each uniformly distributed on the interval [0,1]. Making the transformation ${\displaystyle u=\ln(x)}$, each is distributed on ${\displaystyle u}$ as

${\displaystyle p_{u}(u)=p_{x}(x)/|du/dx|=x=e^{u},\quad u\leq 0.}$

The convolution of the two distributions is the autoconvolution

${\displaystyle c(z)=\int _{z}^{0}e^{u}e^{z-u}\,du=\int _{z}^{0}e^{z}\,du=-ze^{z},\quad z\leq 0.}$

Next retransform the variable to ${\displaystyle x=e^{z}}$, yielding the distribution

${\displaystyle c_{x}(x)=c_{z}(z)/|dx/dz|={\frac {-ze^{z}}{e^{z}}}=-z=\ln(1/x)}$

on the interval [0,1].
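The resulting density ${\displaystyle \ln(1/x)}$ is easy to confirm by simulation (a minimal sketch):

```python
# Product of two independent Uniform(0,1) samples has density ln(1/x) on (0, 1].
import numpy as np

rng = np.random.default_rng(5)
z = rng.uniform(size=1_000_000) * rng.uniform(size=1_000_000)

for x0 in (0.1, 0.3, 0.7):
    mc = np.mean(np.abs(z - x0) < 0.01) / 0.02     # local density estimate
    print(x0, np.log(1 / x0), mc)                  # agree up to MC error
```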

The density of the product of two independent standard normal samples is a modified Bessel function. Let ${\displaystyle x,y}$ be independent samples from a Normal(0,1) distribution and ${\displaystyle z=xy}$. Then ${\displaystyle p_{z}(z)={\frac {K_{0}(|z|)}{\pi }}.}$
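
This Bessel-function form can be verified numerically; `scipy.special.k0` provides ${\displaystyle K_{0}}$ (a sketch):

```python
# Product of two independent N(0,1) samples: density K_0(|z|) / π
import numpy as np
from scipy.special import k0

rng = np.random.default_rng(6)
z = rng.normal(size=1_000_000) * rng.normal(size=1_000_000)

for z0 in (0.25, 1.0, 2.0):
    mc = np.mean(np.abs(z - z0) < 0.02) / 0.04     # local density estimate
    print(z0, k0(abs(z0)) / np.pi, mc)             # agree up to MC error
```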

The product of two independent Gamma samples, ${\displaystyle z=x_{1}x_{2}}$, defining ${\displaystyle \Gamma (x;k_{i},\theta _{i})={\frac {x^{k_{i}-1}e^{-x/\theta _{i}}}{\Gamma (k_{i})\theta _{i}^{k_{i}}}}}$, follows ${\displaystyle p_{z}(z)={\frac {2}{\Gamma (k_{1})\Gamma (k_{2})\theta _{1}\theta _{2}}}y^{{\frac {k_{1}+k_{2}}{2}}-1}K_{k_{1}-k_{2}}\left(2{\sqrt {y}}\right)}$ where ${\displaystyle y={\frac {z}{\theta _{1}\theta _{2}}}}$

The distribution of the product of a random variable having a uniform distribution on (0,1) with a random variable having a gamma distribution with shape parameter equal to 2, is an exponential distribution.[4] A more general case of this concerns the distribution of the product of a random variable having a beta distribution with a random variable having a gamma distribution: for some cases where the parameters of the two component distributions are related in a certain way, the result is again a gamma distribution but with a changed shape parameter.[4]
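
A quick check of the first statement (taking unit scale for the gamma distribution, an illustrative choice) with a Kolmogorov–Smirnov test:

```python
# Uniform(0,1) × Gamma(shape=2, scale=1) is Exponential(1): a quick KS check
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
z = rng.uniform(size=200_000) * rng.gamma(2.0, 1.0, 200_000)
print(stats.kstest(z, "expon"))   # small KS statistic: consistent with Exponential(1)
```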

The K-distribution is an example of a non-standard distribution that can be defined as a product distribution (where both components have a gamma distribution).

## In theoretical computer science

In computational learning theory, a product distribution ${\displaystyle {\mathcal {D}}}$ over ${\displaystyle \{0,1\}^{n}}$ is specified by the parameters ${\displaystyle \mu _{1},\mu _{2},\dots ,\mu _{n}}$. Each parameter ${\displaystyle \mu _{i}}$ gives the marginal probability that the ith bit of ${\displaystyle x\in \{0,1\}^{n}}$ sampled as ${\displaystyle x\sim {\mathcal {D}}}$ is 1; i.e. ${\displaystyle \mu _{i}=\operatorname {Pr} _{\mathcal {D}}[x_{i}=1]}$. In this setting, the uniform distribution is simply a product distribution with every ${\displaystyle \mu _{i}=1/2}$.

Product distributions are a key tool used for proving learnability results when the examples cannot be assumed to be uniformly sampled.[5] They give rise to an inner product ${\displaystyle \langle \cdot ,\cdot \rangle }$ on the space of real-valued functions on ${\displaystyle \{0,1\}^{n}}$ as follows:

${\displaystyle \langle f,g\rangle _{\mathcal {D}}=\sum _{x\in \{0,1\}^{n}}{\mathcal {D}}(x)f(x)g(x)=\mathbb {E} _{\mathcal {D}}[fg]}$

This inner product gives rise to a corresponding norm as follows:

${\displaystyle \|f\|_{\mathcal {D}}={\sqrt {\langle f,f\rangle _{\mathcal {D}}}}}$
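
A small sketch (with illustrative marginals ${\displaystyle \mu _{i}}$ and example functions ${\displaystyle f,g}$) showing how to sample from such a product distribution and evaluate the inner product and norm, both exactly by enumeration and by Monte Carlo:

```python
# Product distribution over {0,1}^n: sampling, and the induced inner product / norm
import numpy as np

mu = np.array([0.1, 0.5, 0.8, 0.3])           # marginal probabilities mu_i (illustrative)
n = len(mu)
rng = np.random.default_rng(8)

def sample(size):
    """Draw x ~ D: bit i is 1 independently with probability mu[i]."""
    return (rng.random((size, n)) < mu).astype(int)

def D(x):
    """Probability of one bitstring x under the product distribution."""
    return np.prod(np.where(x == 1, mu, 1 - mu))

def inner(f, g):
    """<f, g>_D = E_D[f g], computed exactly by enumerating {0,1}^n (n is small here)."""
    xs = [np.array(list(np.binary_repr(i, n)), dtype=int) for i in range(2 ** n)]
    return sum(D(x) * f(x) * g(x) for x in xs)

f = lambda x: (-1) ** x[0]                     # two example functions on {0,1}^n
g = lambda x: x.sum()

exact = inner(f, g)
mc = np.mean([f(x) * g(x) for x in sample(100_000)])    # Monte Carlo estimate of E_D[fg]
print(exact, mc, np.sqrt(inner(f, f)))                  # <f,g>_D, its MC estimate, ||f||_D
```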