Joint probability distribution

From Wikipedia, the free encyclopedia

In the study of probability, given two random variables X and Y that are defined on the same probability space, the joint distribution for X and Y gives the probability of events defined in terms of both X and Y. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The formula for a joint probability takes a different form depending on whether the variables are dependent or independent.

Example

Consider the roll of a fair die and let A = 1 if the number is even (i.e. 2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e. 2, 3, or 5) and B = 0 otherwise. Then the joint distribution of A and B is


  \mathrm{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\; \mathrm{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6}

  \mathrm{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\; \mathrm{P}(A=1,B=1)=P\{2\}=\frac{1}{6}
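The four probabilities above can be reproduced by enumerating the six faces. The following is a minimal sketch that assumes a fair die; the names `PRIMES` and `joint_pmf_die` are illustrative and not part of the article.

```python
from collections import defaultdict
from fractions import Fraction

PRIMES = {2, 3, 5}  # prime faces of a six-sided die

def joint_pmf_die():
    """Joint pmf of (A, B) for one roll of a fair six-sided die."""
    pmf = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for face in range(1, 7):
        a = 1 if face % 2 == 0 else 0   # A = 1 when the face is even
        b = 1 if face in PRIMES else 0  # B = 1 when the face is prime
        pmf[(a, b)] += Fraction(1, 6)   # assumption: each face has probability 1/6
    return dict(pmf)

joint = joint_pmf_die()
for (a, b), p in sorted(joint.items()):
    print(f"P(A={a}, B={b}) = {p}")
```

Exact fractions are used so that the entries can be compared with the values in the text without rounding error.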

Cumulative distribution

The cumulative distribution function for a pair of random variables is defined in terms of their joint probability distribution:

F(x,y)=P(X \le x, Y \le y) .

Discrete case

The joint probability mass function of two discrete random variables is equal to


\begin{align}
\mathrm{P}(X=x\ \mathrm{and}\ Y=y) & {} = \mathrm{P}(Y=y \mid X=x) \cdot \mathrm{P}(X=x) \\
& {} = \mathrm{P}(X=x \mid Y=y) \cdot \mathrm{P}(Y=y).
\end{align}

In general, the joint probability distribution of n discrete random variables X_1, X_2, \dots, X_n is equal to


\begin{align}
\mathrm{P}(X_1=x_1,\dots,X_n=x_n) & = \mathrm{P}(X_1=x_1) \\
& \qquad \times \mathrm{P}(X_2=x_2 \mid X_1=x_1) \\
& \qquad \times \mathrm{P}(X_3=x_3 \mid X_1=x_1,X_2=x_2) \\
& \qquad \times \dots \\
& \qquad \times \mathrm{P}(X_n=x_n \mid X_1=x_1,X_2=x_2,\dots,X_{n-1}=x_{n-1}).
\end{align}

This identity is known as the chain rule of probability.
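The chain rule can be checked numerically on a small table. The sketch below assumes a hypothetical joint pmf over three binary variables (the weights are arbitrary); each conditional probability is computed as a ratio of two marginal probabilities, and the product is compared with the joint probability.

```python
from itertools import product

# Hypothetical joint pmf over three binary variables: arbitrary positive weights, normalized.
weights = {xs: 1 + i for i, xs in enumerate(product((0, 1), repeat=3))}
total = sum(weights.values())
joint3 = {xs: w / total for xs, w in weights.items()}

def prefix_prob(prefix):
    """P(X_1 = prefix[0], ..., X_k = prefix[k-1]), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for xs, p in joint3.items() if xs[:k] == tuple(prefix))

def chain_rule_product(xs):
    """Product over k of P(X_k = x_k | X_1 = x_1, ..., X_{k-1} = x_{k-1})."""
    result = 1.0
    for k in range(1, len(xs) + 1):
        # conditional probability = P(first k values) / P(first k-1 values)
        result *= prefix_prob(xs[:k]) / prefix_prob(xs[:k - 1])
    return result

max_gap = max(abs(chain_rule_product(xs) - p) for xs, p in joint3.items())
print(max_gap)  # expected to be at the level of floating-point rounding
```

All weights are positive here, so no conditional probability involves division by zero; a table with zero-probability prefixes would need a guard.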

Since these are probabilities, we have

\sum_x \sum_y \mathrm{P}(X=x\ \mathrm{and}\ Y=y) = 1.\;

Generalizing to n discrete random variables X_1, X_2, \dots, X_n, this becomes

\sum_{x_1} \sum_{x_2} \dots \sum_{x_n} \mathrm{P}(X_1=x_1,X_2=x_2, \dots, X_n=x_n) = 1.\;

Continuous case

Similarly, for continuous random variables, the joint probability density function can be written as f_{X,Y}(x,y), and it satisfies

f_{X,Y}(x,y) = f_{Y|X}(y|x)f_X(x) = f_{X|Y}(x|y)f_Y(y)\;

where f_{Y|X}(y|x) and f_{X|Y}(x|y) give the conditional distributions of Y given X = x and of X given Y = y respectively, and f_X(x) and f_Y(y) give the marginal distributions for X and Y respectively.

Again, since these are probability distributions, one has

\int_x \int_y f_{X,Y}(x,y) \; dy \; dx= 1.
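As an illustration of the factorization and of the normalization condition, the sketch below assumes the density f(x, y) = x + y on the unit square, chosen here only as an example. The marginal is obtained by numerical integration with a midpoint rule, and the conditional is the ratio of joint to marginal.

```python
def f_xy(x, y):
    """Illustrative joint density: f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def midpoint_integral(g, n=400):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

def f_x(x):
    """Marginal density of X: integrate the joint density over y."""
    return midpoint_integral(lambda y: f_xy(x, y))

def f_y_given_x(y, x):
    """Conditional density of Y given X = x, i.e. joint divided by marginal."""
    return f_xy(x, y) / f_x(x)

total_mass = midpoint_integral(f_x)  # double integral of the joint density
print(total_mass)
print(f_y_given_x(0.2, 0.3) * f_x(0.3), f_xy(0.3, 0.2))  # the two values should agree
```

For this particular density the marginal is f_X(x) = x + 1/2, so the numerical results can be compared with a closed form.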

Mixed case

In some situations X is continuous but Y is discrete. For example, in a logistic regression, one may wish to predict the probability of a binary outcome Y conditional on the value of a continuously distributed X. In this case, (X, Y) has neither a probability density function nor a probability mass function in the sense defined above. On the other hand, a "mixed joint density" can be defined in either of two ways:


\begin{align}
f_{X,Y}(x,y) &= f_{X|Y}(x|y)\mathrm{P}(Y=y)\\
             &= \mathrm{P}(Y=y \mid X=x) f_X(x)
\end{align}

Formally, f_{X,Y}(x,y) is the probability density function of (X, Y) with respect to the product measure on the respective supports of X and Y. Either of these two decompositions can then be used to recover the joint cumulative distribution function:


\begin{align}
F_{X,Y}(x,y)&=\sum\limits_{t\le y}\int_{s=-\infty}^x f_{X,Y}(s,t)\;ds
\end{align}

The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.
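The two decompositions and the mixed cumulative distribution function can be sketched for a simple assumed model: Y is binary with hypothetical probabilities `P_Y`, and X given Y = y is normal with unit variance and hypothetical mean `MEAN[y]`. None of these values come from the article.

```python
import math

P_Y = {0: 0.7, 1: 0.3}     # hypothetical P(Y = y)
MEAN = {0: -1.0, 1: 2.0}   # hypothetical mean of X given Y = y (unit variance assumed)

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x, mu):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def mixed_density(x, y):
    """First decomposition: f_{X,Y}(x, y) = f_{X|Y}(x | y) * P(Y = y)."""
    return normal_pdf(x, MEAN[y]) * P_Y[y]

def marginal_x(x):
    """f_X(x), obtained by summing the mixed density over the values of Y."""
    return sum(mixed_density(x, y) for y in P_Y)

def prob_y_given_x(y, x):
    """P(Y = y | X = x), the factor appearing in the second decomposition."""
    return mixed_density(x, y) / marginal_x(x)

def joint_cdf(x, y):
    """F_{X,Y}(x, y): sum over t <= y of the integral of f_{X,Y}(s, t) for s <= x."""
    return sum(normal_cdf(x, MEAN[t]) * P_Y[t] for t in P_Y if t <= y)

print(joint_cdf(0.0, 0), joint_cdf(0.0, 1))
print(prob_y_given_x(1, 0.5))
```

The inner integral is available in closed form here through the normal distribution function; for other conditional densities it would have to be computed numerically.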

General multidimensional distributions

The cumulative distribution function for a vector of random variables is defined in terms of their joint probability distribution:

F(x_1,\dots,x_n)=P(X_1 \le x_1,\dots, X_n \le x_n) .


The joint distribution for two random variables can be extended to many random variables X_1, \dots, X_n by adding them sequentially with the identity

\begin{align} f_{X_1, \ldots X_n}(x_1, \ldots x_n) =& f_{X_n | X_1, \ldots X_{n-1}}( x_n | x_1, \ldots x_{n-1}) f_{X_1, \ldots X_{n-1}}( x_1, \ldots x_{n-1} )\\
=& f_{X_1} (x_1) \\
 & \cdot f_{X_2|X_1} (x_2|x_1)\\
 & \cdot \dots \\
 & \cdot f_{X_{n-1}| X_1 \ldots X_{n-2}}(x_{n-1}| x_1, \ldots x_{n-2} ) \\
 & \cdot f_{X_n | X_1, \ldots X_{n-1}}( x_n | x_1, \ldots x_{n-1}),\end{align}

where

\begin{align}
f_{X_i| X_1, \ldots X_{i-1}}(x_i | x_1, \ldots x_{i-1})=
  &\frac{f_{X_1, \dots X_i}(x_1,\dots x_i)}{\int f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i) \mathrm{d} u_i}\\
= &\frac{\int \dots \int f_{X_1, \dots X_n}(x_1,\dots x_i,u_{i+1}, \dots u_n) \mathrm{d} u_{i+1}\dots \mathrm{d}u_n}{\int \dots \int \int f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i, \dots u_n) \mathrm{d} u_i \,\mathrm{d} u_{i+1}\dots \mathrm{d}u_n}
\end{align}

and

f_{X_1,\dots X_i}(x_1,\dots x_i) = \int \dots \int f_{X_1,\dots X_n}(x_1,\dots x_i,x_{i+1},\dots x_n) \mathrm{d} x_{i+1} \dots \mathrm{d} x_n

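Note that these latter identities can be used to generate a random vector (X_1, \dots, X_n) with a given joint density f(x_1, \dots, x_n), by drawing each coordinate in turn from its conditional distribution given the coordinates already drawn. The density of the marginal distribution of X_i is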

f_{X_i}(x_i) = \int \dots \int \int \dots \int f_{X_1,\dots X_n}(x_1,\dots x_{i-1},x_i,x_{i+1},\dots x_n) \mathrm{d} x_1\dots \mathrm{d}x_{i-1} \, \mathrm{d}x_{i+1} \dots \mathrm{d}x_n.
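The remark on generating a random vector from its marginal and conditional distributions can be sketched as follows. The example assumes the illustrative density f(x, y) = x + y on the unit square, for which the marginal and conditional distribution functions can be inverted in closed form; the function name `sample_pair` is hypothetical.

```python
import math
import random

def sample_pair(u, v):
    """Map two independent Uniform(0, 1) draws (u, v) to one draw of (X, Y).

    Assumes the joint density f(x, y) = x + y on the unit square.
    """
    # Step 1: invert the marginal distribution function F_X(x) = (x**2 + x) / 2.
    x = (-1.0 + math.sqrt(1.0 + 8.0 * u)) / 2.0
    # Step 2: invert the conditional distribution function
    # F_{Y|X}(y | x) = (x*y + y**2 / 2) / (x + 1/2).
    y = -x + math.sqrt(x * x + 2.0 * v * (x + 0.5))
    return x, y

rng = random.Random(0)  # fixed seed so that the run is reproducible
samples = [sample_pair(rng.random(), rng.random()) for _ in range(20000)]
mean_x = sum(x for x, _ in samples) / len(samples)
print(mean_x)  # should be close to E[X] = 7/12 for this density
```

When the distribution functions cannot be inverted analytically, the same two-step scheme still applies, with each inversion carried out numerically.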

The joint cumulative distribution function is

F_{X_1,\dots X_n}\left( x_1, \dots x_n\right)= \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} f_{X_1,\dots X_n}\left(u_1,\dots u_n\right) \mathrm{d} u_1 \dots \mathrm{d}u_n,

and the conditional distribution function is accordingly

\begin{align}
F_{X_i| X_1, \ldots X_{i-1}}(x_i| x_1, \ldots x_{i-1})=
  &\frac{\int_{-\infty}^{x_i}f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i)\mathrm{d}u_i}{\int_{-\infty}^\infty f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i) \mathrm{d} u_i}\\
= &\frac{\int_{-\infty}^\infty \dots \int_{-\infty}^\infty \int_{-\infty}^{x_i} f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i, \dots u_n) \mathrm{d} u_i\dots \mathrm{d}u_n}{\int_{-\infty}^\infty \dots \int_{-\infty}^\infty \int_{-\infty}^\infty f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i,\dots u_n) \mathrm{d} u_i \dots \mathrm{d} u_n}.
\end{align}


The expectation of a function h of the random variables reads

\mathbb{E}\left[h(X_1,\dots X_n) \right]=\int_{-\infty}^\infty \dots \int_{-\infty}^\infty h(x_1,\dots x_n) f_{X_1,\dots X_n}(x_1,\dots x_n) \mathrm{d} x_1 \dots \mathrm{d} x_n;

suppose that h is smooth enough and h(u_1,\dots u_n)=h(x_1,\dots x_n) for u_1 \ge x_1, \dots u_n\ge x_n, then, by iterated integration by parts,

\begin{align}\mathbb{E}\left[h(X_1,\dots X_n) \right]=& h(x_1,\dots x_n)+ \\
& (-1)^n \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} F_{X_1,\dots X_n}(u_1,\dots u_n) \frac{\partial^n}{\partial u_1 \dots \partial u_n} h(u_1,\dots u_n) \mathrm{d} u_1 \dots \mathrm{d} u_n.\end{align}
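The defining integral for the expectation can be approximated numerically. The sketch below assumes the illustrative density f(x, y) = x + y on the unit square and a two-dimensional midpoint rule restricted to that square, since the density is zero elsewhere; it does not implement the integration-by-parts formula.

```python
def f_xy(x, y):
    """Illustrative joint density f(x, y) = x + y, used only inside the unit square."""
    return x + y

def expectation(h, n=200):
    """Midpoint-rule approximation of E[h(X, Y)] over the unit square."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(n):
            y = (j + 0.5) * step
            total += h(x, y) * f_xy(x, y)  # integrand h(x, y) * f(x, y)
    return total * step * step

e_one = expectation(lambda x, y: 1.0)   # total probability mass
e_xy = expectation(lambda x, y: x * y)  # E[XY]; the exact value for this density is 1/3
print(e_one, e_xy)
```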

Joint distribution for independent variables

If, for discrete random variables, P(X = x \ \mathrm{and}\ Y = y) = P(X = x) \cdot P(Y = y) for all x and y, or, for absolutely continuous random variables, f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) for all x and y, then X and Y are said to be independent.
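The discrete criterion can be tested directly on a joint table. The sketch below applies it to the die example (A indicates an even face, B a prime face) and to two fair coins, the latter being an added illustration; the helper names are hypothetical.

```python
from fractions import Fraction

# Joint pmf of (A, B) from the die example.
die_joint = {(0, 0): Fraction(1, 6), (1, 0): Fraction(2, 6),
             (0, 1): Fraction(2, 6), (1, 1): Fraction(1, 6)}

# Joint pmf of two fair coins tossed separately (added for comparison).
coin_joint = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

def marginal(joint, axis):
    """Marginal pmf of the variable at position `axis` (0 for the first, 1 for the second)."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], Fraction(0)) + p
    return out

def is_independent(joint):
    """True if the joint pmf equals the product of its marginals at every pair of values."""
    p_first, p_second = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((a, b), Fraction(0)) == p_first[a] * p_second[b]
               for a in p_first for b in p_second)

print(is_independent(die_joint))   # False: P(A=1, B=1) = 1/6 but P(A=1) * P(B=1) = 1/4
print(is_independent(coin_joint))  # True
```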

Joint distribution for conditionally independent variables

If the variables X_1,\cdots,X_n are split into two subsets A and B, the joint distribution P(X_1,\dots,X_n) can always be written as P(B)\cdot P(A|B). If, in addition, the variables in A are conditionally independent given B, the factor P(A|B) itself splits into a product of lower-dimensional conditional distributions, so the joint distribution can be represented efficiently by P(B) and these smaller factors. Such conditional independence relations can be represented with a Bayesian network.
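As a sketch of how conditional independence yields a compact representation, assume a hypothetical model with three binary variables in which A = {X_1, X_2} and B is a single variable, with X_1 and X_2 conditionally independent given B. All probabilities below are made up for illustration.

```python
from itertools import product

P_B = {0: 0.6, 1: 0.4}                                     # hypothetical P(B = b)
P_X1_GIVEN_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(X1 = x | B = b), indexed [b][x]
P_X2_GIVEN_B = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}  # P(X2 = x | B = b), indexed [b][x]

def joint_prob(x1, x2, b):
    """P(X1 = x1, X2 = x2, B = b) = P(B) * P(X1 | B) * P(X2 | B)."""
    return P_B[b] * P_X1_GIVEN_B[b][x1] * P_X2_GIVEN_B[b][x2]

# The full joint table has 8 entries but is determined by the three small tables above.
full_table = {(x1, x2, b): joint_prob(x1, x2, b)
              for x1, x2, b in product((0, 1), repeat=3)}

def cond_given_b(x1, x2, b):
    """P(X1 = x1, X2 = x2 | B = b), recovered from the full joint table."""
    p_b = sum(p for (_, _, bb), p in full_table.items() if bb == b)
    return full_table[(x1, x2, b)] / p_b

print(sum(full_table.values()))
print(cond_given_b(1, 1, 1), P_X1_GIVEN_B[1][1] * P_X2_GIVEN_B[1][1])  # should agree
```

With more variables the saving grows: a full table over n binary variables has 2^n entries, whereas a factorized model of this kind needs only one small table per variable.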
