Central limit theorem for directional statistics

From Wikipedia, the free encyclopedia

In probability theory, the central limit theorem states conditions under which the average of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.[1]

Directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in ℝⁿ), axes (lines through the origin in ℝⁿ) or rotations in ℝⁿ. Because directional quantities are bounded (each component of a unit vector lies between -1 and 1), their means and variances are always finite, so the central limit theorem may be applied to the particular case of directional statistics.[2]

This article deals only with unit vectors in two-dimensional space (ℝ²), but the method described can be extended to the general case.

The central limit theorem

A sample of angles \theta_i is measured. Because each angle is defined only up to an integer multiple of 2\pi, the unambiguous complex quantity z_i=e^{i\theta_i}=\cos(\theta_i)+i\sin(\theta_i) is used as the random variate. The probability distribution from which the sample is drawn may be characterized by its moments, which may be expressed in Cartesian and polar form:

m_n=E(z^n)= C_n +i S_n = R_n e^{i \theta_n}\,

It follows that:

C_n=E(\cos (n\theta))\,
S_n=E(\sin (n\theta))\,
R_n=|E(z^n)|=\sqrt{C_n^2+S_n^2}\,
\theta_n=\arg(E(z^n))\,
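To make these definitions concrete, the Python sketch below approximates m_n for a circular distribution specified by a density on [0, 2\pi), using the midpoint rule for the expectation. The helper names population_moment and cardioid, and the choice of the cardioid density as the example, are illustrative assumptions rather than part of the source. For a cardioid with parameters \rho and \mu, m_1=\rho e^{i\mu} and m_n=0 for n\ge 2, which gives a simple check.

```python
import cmath
import math


def population_moment(density, n, num_points=2048):
    """Approximate m_n = E(z^n) for a circular density f on [0, 2*pi).

    Returns (C_n, S_n, R_n, theta_n). The integral of f(theta) * exp(i n theta)
    is evaluated with the midpoint rule, which is very accurate for smooth
    periodic integrands.
    """
    h = 2 * math.pi / num_points
    nodes = [(k + 0.5) * h for k in range(num_points)]
    m_n = h * sum(density(t) * cmath.exp(1j * n * t) for t in nodes)
    return m_n.real, m_n.imag, abs(m_n), cmath.phase(m_n)


def cardioid(rho, mu):
    """Cardioid density (1 + 2 rho cos(theta - mu)) / (2 pi), for 0 <= rho <= 1/2."""
    return lambda theta: (1 + 2 * rho * math.cos(theta - mu)) / (2 * math.pi)


C1, S1, R1, theta1 = population_moment(cardioid(0.3, 0.5), 1)
print(round(R1, 6), round(theta1, 6))  # 0.3 0.5
```

The midpoint rule is used because the integrand is periodic, so equally spaced nodes converge very quickly; any other quadrature over one full period would serve equally well.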

Sample moments for N trials are:

\overline{m_n}=\frac{1}{N}\sum_{i=1}^N z_i^n =\overline{C_n} +i \overline{S_n} = \overline{R_n} e^{i \overline{\theta_n}}

where

\overline{C_n}=\frac{1}{N}\sum_{i=1}^N\cos(n\theta_i)
\overline{S_n}=\frac{1}{N}\sum_{i=1}^N\sin(n\theta_i)
\overline{R_n}=|\overline{m_n}|=\sqrt{\overline{C_n}^2+\overline{S_n}^2}
\overline{\theta_n}=\arg(\overline{m_n})
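A minimal Python sketch of the sample moments, assuming the angles are given in radians; the function name sample_moment is hypothetical. The complex average is formed first, and its real part, imaginary part, modulus and argument are read off, so that \overline{m_n}=\overline{R_n}e^{i\overline{\theta_n}} holds by construction. The second example shows why the angles are mapped to z_i before averaging: the arithmetic mean of 0.1 and 2\pi-0.1 is \pi, while both directions lie close to 0.

```python
import cmath
import math


def sample_moment(thetas, n):
    """Sample moment of order n for angles in radians.

    Returns (C_n, S_n, R_n, theta_n), where R_n and theta_n are the modulus and
    argument of the complex average (1/N) * sum(z_i**n), so that
    mean = R_n * exp(i * theta_n).
    """
    m_n = sum(cmath.exp(1j * n * t) for t in thetas) / len(thetas)
    return m_n.real, m_n.imag, abs(m_n), cmath.phase(m_n)


# Two directions a quarter turn apart: the mean direction lies halfway between them.
C1, S1, R1, theta1 = sample_moment([0.0, math.pi / 2], 1)
print(round(C1, 6), round(S1, 6), round(R1, 6), round(theta1, 6))
# 0.5 0.5 0.707107 0.785398

# Angles just either side of 0: their arithmetic mean is pi, but the
# mean direction from the complex average is (numerically) 0.
print(sample_moment([0.1, 2 * math.pi - 0.1], 1)[3])
```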

The vector [\overline{ C_1 },\overline{ S_1 }] may be used as a representation of the sample mean (\overline{m_1}) and may be taken as a 2-dimensional random variate.[2] The bivariate central limit theorem states that, as the sample size N becomes large, the joint probability distribution of \overline{ C_1 } and \overline{ S_1 } approaches:

[\overline{C_1},\overline{S_1}] \xrightarrow{d} \mathcal{N}([C_1,S_1],\Sigma/N)

where \mathcal{N}() is the bivariate normal distribution and \Sigma is the covariance matrix for the circular distribution:


\Sigma = \begin{bmatrix} \sigma_{CC} & \sigma_{CS} \\ \sigma_{SC} & \sigma_{SS} \end{bmatrix}

\sigma_{CC}=E(\cos^2\theta)-E(\cos\theta)^2\,
\sigma_{CS}=\sigma_{SC}=E(\cos\theta\sin\theta)-E(\cos\theta)E(\sin\theta)\,
\sigma_{SS}=E(\sin^2\theta)-E(\sin\theta)^2\,
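In practice \Sigma is often unknown and must be estimated. A simple plug-in estimate replaces each expectation in these definitions with an average over the sample, dividing by N. The Python sketch below is one way to do this, assuming angles in radians; covariance_matrix is a hypothetical helper name.

```python
import math


def covariance_matrix(thetas):
    """Plug-in estimate of Sigma from angles in radians.

    Every expectation in the definitions of sigma_CC, sigma_CS and sigma_SS is
    replaced by a sample average (division by N, not N - 1).
    """
    N = len(thetas)
    c = [math.cos(t) for t in thetas]
    s = [math.sin(t) for t in thetas]
    mean_c, mean_s = sum(c) / N, sum(s) / N
    s_cc = sum(x * x for x in c) / N - mean_c ** 2
    s_ss = sum(y * y for y in s) / N - mean_s ** 2
    s_cs = sum(x * y for x, y in zip(c, s)) / N - mean_c * mean_s
    return [[s_cc, s_cs], [s_cs, s_ss]]


# Two opposite directions on the x-axis: all of the spread is in the cosine.
print(covariance_matrix([0.0, math.pi]))  # approximately [[1, 0], [0, 0]] up to rounding
```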

Note that the bivariate normal distribution is defined over the entire plane, while the sample mean is confined to the unit disk (on or inside the unit circle). The integral of the limiting bivariate normal distribution over the unit disk is therefore less than unity, and approaches unity only as N approaches infinity.
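The size of this effect can be illustrated by Monte Carlo: draw points from \mathcal{N}([C_1,S_1],\Sigma/N) and count how often they land outside the unit circle. The sketch below assumes, purely for illustration, the moments of a wrapped normal distribution with mean direction 0 and mean resultant length \rho=0.9, for which C_1=\rho, C_2=\rho^4 and S_1=S_2=0; with the moment form of \Sigma given in the next section this yields \sigma_{CC}=0.01805, \sigma_{SS}=0.17195 and \sigma_{CS}=0. The function name, draw count and seed are arbitrary choices.

```python
import math
import random


def mass_outside_unit_disk(mean, sigma, N, draws=20000, seed=0):
    """Monte Carlo estimate of the probability that a draw from
    N(mean, sigma / N) lies outside the unit disk."""
    rng = random.Random(seed)
    # Cholesky factor L of sigma / N, so that mean + L (u, v) ~ N(mean, sigma / N).
    a, b, d = sigma[0][0] / N, sigma[0][1] / N, sigma[1][1] / N
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    outside = 0
    for _ in range(draws):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean[0] + l11 * u
        y = mean[1] + l21 * u + l22 * v
        if x * x + y * y > 1.0:
            outside += 1
    return outside / draws


# Assumed moments: wrapped normal with mean direction 0 and rho = 0.9.
mean = (0.9, 0.0)
sigma = [[0.01805, 0.0], [0.0, 0.17195]]
for N in (1, 10, 100):
    print(N, mass_outside_unit_disk(mean, sigma, N))
```

Because each call reuses the same seed, the same standard normal pairs are scaled by 1/\sqrt{N}, so the estimated mass outside the disk can only shrink as N grows.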

It is convenient to state the limiting bivariate distribution in terms of the moments of the circular distribution.

Covariance matrix in terms of moments

Using the double-angle trigonometric identities,[2]

C_2= E(\cos(2\theta)) = E(2\cos^2\theta-1)=E(1-2\sin^2\theta)\,
S_2= E(\sin(2\theta)) = E(2\cos\theta\sin\theta)\,
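Since \cos 2\theta=2\cos^2\theta-1=1-2\sin^2\theta and \sin 2\theta=2\cos\theta\sin\theta, the second-order expectations can be solved for directly. Substituting them, together with E(\cos\theta)=C_1 and E(\sin\theta)=S_1, into the definitions of \sigma_{CC}, \sigma_{CS} and \sigma_{SS} is the only step needed to reach the covariance entries:

E(\cos^2\theta)=\frac{1}{2}\left(1+C_2\right)
E(\sin^2\theta)=\frac{1}{2}\left(1-C_2\right)
E(\cos\theta\sin\theta)=\frac{1}{2}S_2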

It follows that:

\sigma_{CC}=E(\cos^2\theta)-E(\cos\theta)^2 =\frac{1}{2}\left(1 + C_2 - 2C_1^2\right)
\sigma_{CS}=E(\cos\theta\sin\theta)-E(\cos\theta)E(\sin\theta)=\frac{1}{2}\left(S_2 - 2 C_1 S_1   \right)
\sigma_{SS}=E(\sin^2\theta)-E(\sin\theta)^2 =\frac{1}{2}\left(1   - C_2 - 2S_1^2\right)
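Because the double-angle identities hold for every angle, the moment form of \Sigma agrees exactly with the defining expectations, even when the expectations are replaced by averages over a finite sample. The Python sketch below checks this numerically; the function names and the sample angles are illustrative, not from the source.

```python
import cmath
import math


def sigma_from_moments(C1, S1, C2, S2):
    """Sigma expressed with the first two trigonometric moments."""
    s_cs = 0.5 * (S2 - 2 * C1 * S1)
    return [[0.5 * (1 + C2 - 2 * C1 ** 2), s_cs],
            [s_cs, 0.5 * (1 - C2 - 2 * S1 ** 2)]]


def sigma_direct(thetas):
    """Sigma from its definition, with expectations replaced by sample averages."""
    N = len(thetas)
    Ec = sum(math.cos(t) for t in thetas) / N
    Es = sum(math.sin(t) for t in thetas) / N
    Ecc = sum(math.cos(t) ** 2 for t in thetas) / N
    Ess = sum(math.sin(t) ** 2 for t in thetas) / N
    Ecs = sum(math.cos(t) * math.sin(t) for t in thetas) / N
    return [[Ecc - Ec ** 2, Ecs - Ec * Es], [Ecs - Ec * Es, Ess - Es ** 2]]


thetas = [0.3, 1.1, 2.0, 5.9, 6.1]  # arbitrary illustrative angles (radians)
m1 = sum(cmath.exp(1j * t) for t in thetas) / len(thetas)
m2 = sum(cmath.exp(2j * t) for t in thetas) / len(thetas)
A = sigma_from_moments(m1.real, m1.imag, m2.real, m2.imag)
B = sigma_direct(thetas)
# The largest entrywise difference is at the level of floating-point rounding.
print(max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2)))
```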

The covariance matrix is now expressed in terms of the moments of the circular distribution.
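A Monte Carlo sketch of the theorem using the moment form of \Sigma. It assumes a wrapped normal parent distribution (angles drawn as \mu+sZ with Z standard normal), whose moments have the closed form m_n=e^{in\mu-n^2s^2/2}; the parameter values, helper names and replicate count are illustrative choices, not part of the source. Two things are checked: N times the empirical covariance of [\overline{C_1},\overline{S_1}] should match \Sigma, and the fraction of sample means falling inside the 95% ellipse of \mathcal{N}([C_1,S_1],\Sigma/N) should be close to 0.95.

```python
import math
import random


def wrapped_normal_moments(mu, s):
    """(C1, S1, C2, S2) of a wrapped normal: m_n = exp(i n mu - n^2 s^2 / 2)."""
    r1, r2 = math.exp(-s ** 2 / 2), math.exp(-2 * s ** 2)
    return r1 * math.cos(mu), r1 * math.sin(mu), r2 * math.cos(2 * mu), r2 * math.sin(2 * mu)


def sigma_from_moments(C1, S1, C2, S2):
    """Covariance matrix of [cos(theta), sin(theta)] in terms of moments."""
    s_cs = 0.5 * (S2 - 2 * C1 * S1)
    return [[0.5 * (1 + C2 - 2 * C1 ** 2), s_cs],
            [s_cs, 0.5 * (1 - C2 - 2 * S1 ** 2)]]


def simulate(mu, s, N, replicates=4000, seed=1):
    """Draw `replicates` samples of size N and summarise their sample means.

    Returns (N * empirical covariance of [C1_bar, S1_bar],
             fraction of means inside the 95% ellipse of N([C1, S1], Sigma / N)).
    """
    C1, S1, C2, S2 = wrapped_normal_moments(mu, s)
    (a, b), (_, d) = sigma_from_moments(C1, S1, C2, S2)
    a, b, d = a / N, b / N, d / N
    det = a * d - b * b
    q95 = -2 * math.log(0.05)  # chi-square (2 d.o.f.) 95% quantile, about 5.991
    rng = random.Random(seed)
    devs, inside = [], 0
    for _ in range(replicates):
        # Wrapped normal draws; the wrap modulo 2*pi is implicit in cos/sin.
        thetas = [rng.gauss(mu, s) for _ in range(N)]
        x = sum(math.cos(t) for t in thetas) / N - C1
        y = sum(math.sin(t) for t in thetas) / N - S1
        devs.append((x, y))
        # Squared Mahalanobis distance using the inverse of [[a, b], [b, d]].
        if (d * x * x - 2 * b * x * y + a * y * y) / det <= q95:
            inside += 1
    # Deviations are about the true mean [C1, S1], so divide by `replicates`.
    cov = [[N * sum(p[i] * p[j] for p in devs) / replicates for j in range(2)]
           for i in range(2)]
    return cov, inside / replicates


mu, s, N = 0.7, 0.8, 50  # hypothetical parameters
print(sigma_from_moments(*wrapped_normal_moments(mu, s)))
cov, coverage = simulate(mu, s, N)
print(cov)       # close to the matrix above, up to Monte Carlo error
print(coverage)  # close to 0.95
```

The covariance check alone does not test the theorem, because the covariance of a mean of independent draws is \Sigma/N for every N. The ellipse coverage depends on the shape of the distribution, so that is where the normal limit enters.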

The central limit theorem may also be expressed in terms of the polar components of the mean. If P(\overline{C_1},\overline{S_1})d\overline{C_1}d\overline{S_1} is the probability of finding the mean in the area element d\overline{C_1}d\overline{S_1}, then, since the same area element in polar coordinates is \overline{R_1}d\overline{R_1}d\overline{\theta_1}, that probability may also be written as P(\overline{R_1}\cos(\overline{\theta_1}),\overline{R_1}\sin(\overline{\theta_1}))\overline{R_1}d\overline{R_1}d\overline{\theta_1}.
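The change of variables can be checked numerically. The Python sketch below assumes the limiting density is the bivariate normal \mathcal{N}([C_1,S_1],\Sigma/N), evaluates P(R\cos\theta,R\sin\theta)R on a polar grid, and integrates it over the unit disk with the midpoint rule. The mean and covariance values are hypothetical, chosen only to exercise the code, and the function names are illustrative. For a concentrated distribution the integral is close to 1; for a broad one near the unit circle it is visibly less than 1, as noted earlier for the mass inside the unit disk.

```python
import math


def normal_pdf_2d(x, y, mean, cov):
    """Bivariate normal density N(mean, cov) at (x, y)."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x - mean[0], y - mean[1]
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def polar_density(R, theta, mean, cov):
    """Density of (R1_bar, theta1_bar): P(R cos theta, R sin theta) * R."""
    return normal_pdf_2d(R * math.cos(theta), R * math.sin(theta), mean, cov) * R


def integrate_polar(mean, cov, r_max=1.0, nr=300, nt=300):
    """Midpoint-rule integral of the polar density over 0 <= R <= r_max, 0 <= theta < 2*pi."""
    hr, ht = r_max / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        R = (i + 0.5) * hr
        for j in range(nt):
            total += polar_density(R, (j + 0.5) * ht, mean, cov)
    return total * hr * ht


# Hypothetical values: Sigma / N with Sigma = [[0.2, 0.05], [0.05, 0.3]] and N = 100.
mean, cov = (0.5, 0.2), [[0.002, 0.0005], [0.0005, 0.003]]
print(integrate_polar(mean, cov))  # close to 1

# A broad hypothetical case near the unit circle: noticeably less than 1.
print(integrate_polar((0.9, 0.0), [[0.05, 0.0], [0.0, 0.1]]))
```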

References

  1. Rice (1995).
  2. Jammalamadaka, S. Rao; SenGupta, A. (2001). Topics in Circular Statistics. New Jersey: World Scientific. ISBN 981-02-3778-2. Retrieved 2011-05-15.