# Horvitz–Thompson estimator

In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson,[1] is a method for estimating the total[2] and mean of a pseudo-population in a stratified sample. It applies inverse probability weighting to account for the difference in the sampling distribution between the collected data and a target population. The Horvitz–Thompson estimator is frequently applied in survey analyses and can be used to account for missing data, as well as many sources of unequal selection probabilities.

## The method

Formally, let ${\displaystyle Y_{i},i=1,2,\ldots ,n}$ be an independent sample from n of N ≥ n distinct strata with a common mean μ. Suppose further that ${\displaystyle \pi _{i}}$ is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the ith stratum. The Horvitz–Thompson estimator of the total is given by:[3]: 51

${\displaystyle {\hat {Y}}_{HT}=\sum _{i=1}^{n}\pi _{i}^{-1}Y_{i},}$

and the Horvitz–Thompson estimate of the mean is given by:

${\displaystyle {\hat {\mu }}_{HT}=N^{-1}{\hat {Y}}_{HT}=N^{-1}\sum _{i=1}^{n}\pi _{i}^{-1}Y_{i}.}$
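The two formulas above can be illustrated with a minimal Python sketch. The function names, the sample values, the inclusion probabilities, and the population size are all hypothetical and chosen only so that the arithmetic is easy to follow; the sketch assumes the inclusion probabilities are known and strictly positive.

```python
from typing import Sequence


def horvitz_thompson_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz-Thompson estimate of the total: sum of Y_i / pi_i over the sample.

    y  : observed values Y_i for the n sampled units
    pi : inclusion probabilities pi_i for the same units, each in (0, 1]
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))


def horvitz_thompson_mean(
    y: Sequence[float], pi: Sequence[float], population_size: int
) -> float:
    """Horvitz-Thompson estimate of the mean: the estimated total divided by N."""
    return horvitz_thompson_total(y, pi) / population_size


# Hypothetical sample of n = 3 units from a population of N = 10 (illustrative only).
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.5]
N = 10

total = horvitz_thompson_total(y, pi)   # 10/0.5 + 20/0.25 + 30/0.5 = 160.0
mean = horvitz_thompson_mean(y, pi, N)  # 160.0 / 10 = 16.0
print(total, mean)
```

Units with a small inclusion probability receive a large weight: the second observation, sampled with probability 0.25, contributes 80 of the estimated total of 160.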

In a Bayesian probabilistic framework ${\displaystyle \pi _{i}}$ is considered the proportion of individuals in a target population belonging to the ith stratum. Hence, ${\displaystyle \pi _{i}^{-1}Y_{i}}$ could be thought of as an estimate of the complete sample of persons within the ith stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.[4]

For post-stratified study designs, estimation of ${\displaystyle \pi }$ and ${\displaystyle \mu }$ is done in distinct steps. In such cases, computing the variance of ${\displaystyle {\hat {\mu }}_{HT}}$ is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to gain consistent estimates of the variance of the Horvitz–Thompson estimator.[5] The "survey" package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.[6]

## Proof of Horvitz–Thompson unbiased estimation of the mean

The Horvitz–Thompson estimator can be shown to be unbiased by evaluating its expectation, ${\displaystyle \mathbf {E} {\bar {X}}_{n}^{HT}}$, as follows:

${\displaystyle \mathbf {E} {\bar {X}}_{n}^{HT}=\mathbf {E} {\frac {1}{N}}\sum _{i=1}^{n}{\frac {\mathbf {X} _{I_{i}}}{\pi _{I_{i}}}}}$
${\displaystyle =\mathbf {E} {\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}1_{i\in D_{n}}}$
${\displaystyle =\sum _{b=1}^{B}P(D_{n}^{(b)})\left[{\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}1_{i\in D_{n}^{(b)}}\right]}$
${\displaystyle ={\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}\sum _{b=1}^{B}1_{i\in D_{n}^{(b)}}P(D_{n}^{(b)})}$
${\displaystyle ={\frac {1}{N}}\sum _{i=1}^{N}\left({\frac {X_{i}}{\pi _{i}}}\right)\pi _{i}}$
${\displaystyle ={\frac {1}{N}}\sum _{i=1}^{N}X_{i}}$
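where ${\displaystyle D_{n}=\{x_{1},x_{2},\ldots ,x_{n}\}}$ denotes the drawn sample, ${\displaystyle D_{n}^{(b)},b=1,\ldots ,B}$ enumerates the possible samples, and the step to the fifth line uses the definition of the inclusion probability, ${\displaystyle \pi _{i}=\sum _{b=1}^{B}1_{i\in D_{n}^{(b)}}P(D_{n}^{(b)})}$.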
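The derivation above can be checked numerically on a small example by enumerating every possible sample. The following Python sketch does this under stated assumptions: the population values, the sample size, and the unequal-probability design (each sample's probability is taken proportional to the sum of its 1-based unit labels) are hypothetical and serve only as an illustration. Exact rational arithmetic is used so that the equality holds without rounding error.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite population of N = 4 values (illustrative only).
X = [Fraction(3), Fraction(7), Fraction(10), Fraction(20)]
N = len(X)
n = 2

# Enumerate all possible samples D^(b) of size n, as tuples of unit indices.
samples = list(combinations(range(N), n))

# Assumed design: P(D^(b)) is proportional to the sum of the 1-based labels
# of the units in the sample, so the samples are not equally likely.
weights = [sum(i + 1 for i in s) for s in samples]
P = [Fraction(w, sum(weights)) for w in weights]

# Inclusion probabilities: pi_i = sum over b of 1{i in D^(b)} * P(D^(b)).
pi = [sum(p for s, p in zip(samples, P) if i in s) for i in range(N)]


def ht_mean(sample):
    """Horvitz-Thompson estimate of the mean computed from one sample."""
    return sum(X[i] / pi[i] for i in sample) / N


# Expectation of the estimator: sum over b of P(D^(b)) * estimate on D^(b).
expected = sum(p * ht_mean(s) for s, p in zip(samples, P))
true_mean = sum(X) / N

print(expected == true_mean)
```

Individual samples give estimates that differ from the population mean; only their probability-weighted average over all possible samples equals it, which is what unbiasedness asserts.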

The Hansen–Hurwitz (1943) strategy is known to be inferior to the Horvitz–Thompson (1952) strategy associated with a number of inclusion probabilities proportional to size (IPPS) sampling procedures.[7]

## References

1. ^ Horvitz, D. G.; Thompson, D. J. (1952) "A generalization of sampling without replacement from a finite universe", Journal of the American Statistical Association, 47, 663–685. JSTOR 2280784
2. ^ William G. Cochran (1977), Sampling Techniques, 3rd Edition, Wiley. ISBN 0-471-16240-X
3. ^ Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan Håkan (1992). Model Assisted Survey Sampling. ISBN 9780387975283.
4. ^ Roderick J.A. Little, Donald B. Rubin (2002) Statistical Analysis With Missing Data, 2nd ed., Wiley. ISBN 0-471-18386-5
5. ^ Quatember, A. (2014). "The Finite Population Bootstrap - from the Maximum Likelihood to the Horvitz-Thompson Approach". Austrian Journal of Statistics. 43 (2): 93–102. doi:10.17713/ajs.v43i2.10.
6. ^ "CRAN - Package survey". 19 July 2021.
7. ^ Prabhu-Ajgaonkar, S. G. (1987). "Comparison of the Horvitz-Thompson Strategy with the Hansen-Hurwitz Strategy". Survey Methodology: 221.