= Hopkins statistic =

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a measure of the cluster tendency of a data set. It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test in which the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed. If the points are aggregated, its value approaches 1; if they are randomly distributed, its value tends to 0.5.

==Preliminaries==

A typical formulation of the Hopkins statistic follows.
Let $X$ be the set of $n$ data points.
Generate a random sample $\overset{\sim}{X}$ of $m \ll n$ data points sampled without replacement from $X$.
Generate a set $Y$ of $m$ data points distributed uniformly at random over the region occupied by $X$.
Define two distance measures,
$u_i,$ the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
$w_i,$ the minimum distance of $\overset{\sim}{x}_i \in \overset{\sim}{X}\subseteq X$ to its nearest neighbour $x_j \in X,\, \overset{\sim}{x}_i\ne x_j.$

==Definition==
With the above notation, if the data are $d$-dimensional, then the Hopkins statistic is defined as:

$H=\frac{\sum_{i=1}^m{u_i^d}}{\sum_{i=1}^m{u_i^d}+\sum_{i=1}^m{w_i^d}} \,$
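
The construction and the formula above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names `nearest_distance` and `hopkins` are hypothetical, the metric is assumed to be Euclidean, and the uniform points $Y$ are assumed to be drawn from the axis-aligned bounding box of $X$, which is only one possible choice of sampling region.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest element of `others`."""
    return min(math.dist(point, q) for q in others)


def hopkins(X, m, rng=None):
    """Hopkins statistic of the list of points X, using m sampled points."""
    rng = rng or random.Random()
    n, d = len(X), len(X[0])

    # Assumption: the sampling region is the axis-aligned bounding box of X.
    lo = [min(p[k] for p in X) for k in range(d)]
    hi = [max(p[k] for p in X) for k in range(d)]

    # w_i: for m data points sampled without replacement, the distance
    # to the nearest *other* data point.
    sample_idx = rng.sample(range(n), m)
    w = [nearest_distance(X[i], X[:i] + X[i + 1:]) for i in sample_idx]

    # u_i: for m uniformly generated points, the distance to the
    # nearest data point.
    Y = [tuple(rng.uniform(lo[k], hi[k]) for k in range(d)) for _ in range(m)]
    u = [nearest_distance(y, X) for y in Y]

    # Distances are raised to the power d, the dimension of the data.
    sum_u = sum(ui ** d for ui in u)
    sum_w = sum(wi ** d for wi in w)
    return sum_u / (sum_u + sum_w)


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = [(rng.random(), rng.random()) for _ in range(500)]
    clustered = [
        (rng.gauss(cx, 0.02), rng.gauss(cy, 0.02))
        for cx, cy in [(0.2, 0.2), (0.8, 0.7)]
        for _ in range(250)
    ]
    print("uniform:  ", hopkins(uniform, 25, rng))
    print("clustered:", hopkins(clustered, 25, rng))
```

In this sketch the nearest-neighbour search is a brute-force scan, which is adequate for small data sets; a spatial index would be the usual replacement for larger ones.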

Under the null hypothesis, this statistic has a $\mathrm{Beta}(m,m)$ distribution.
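
Because the null distribution is a Beta distribution with integer parameters, a tail probability for an observed value of $H$ can be computed from a finite binomial sum, using the standard identity that relates the Beta distribution function to the binomial distribution. The sketch below is illustrative only: the function name `hopkins_p_value` is hypothetical, and the choice of a one-sided test against clustering (large $H$) is an assumption rather than something prescribed by the definition.

```python
import math


def hopkins_p_value(h, m):
    """Upper-tail probability P(H >= h) when H ~ Beta(m, m), m a positive integer."""
    n = 2 * m - 1
    # For integer shape parameters, the Beta(m, m) distribution function at h
    # equals P(B >= m) with B ~ Binomial(2m - 1, h). The upper tail of H is
    # therefore P(B <= m - 1), a finite sum.
    return sum(
        math.comb(n, j) * h ** j * (1 - h) ** (n - j)
        for j in range(m)
    )


if __name__ == "__main__":
    # By symmetry of Beta(m, m) about 0.5, the tail probability at 0.5 is 0.5.
    print(hopkins_p_value(0.5, 10))
    # A value of H far above 0.5 gives a small tail probability.
    print(hopkins_p_value(0.9, 10))
```

A small value returned for an observed $H$ would count as evidence against the null hypothesis of uniform randomness, in favour of clustering.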
