Horvitz–Thompson estimator

From Wikipedia, the free encyclopedia

In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson,[1] is a method for estimating the total[2] and mean of a superpopulation in a stratified sample. Inverse probability weighting is applied to account for different proportions of observations within strata in a target population. The Horvitz–Thompson estimator is frequently applied in survey analyses and can be used to account for missing data.

The method

Formally, let Y_i, i = 1, 2, \ldots, n be an independent sample from n of N \geq n distinct strata with a common mean \mu. Suppose further that \pi_i is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the ith stratum. The Horvitz–Thompson estimate of the total is given by:

\hat{Y}_{HT} = \sum_{i=1}^n \pi_i ^{-1} Y_i,

and the estimate of the mean is given by:

\hat{\mu}_{HT} = N^{-1}\hat{Y}_{HT} = N^{-1}\sum_{i=1}^n \pi_i ^{-1} Y_i.
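The two formulas above translate almost directly into code. The following is a minimal sketch, not a reference implementation: the function names, the argument names, and the example data are all hypothetical, and the inclusion probabilities are assumed to be known and strictly positive.

```python
from typing import Sequence


def horvitz_thompson_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Estimate the total as the sum of Y_i / pi_i over the sampled units."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))


def horvitz_thompson_mean(
    y: Sequence[float], pi: Sequence[float], population_size: int
) -> float:
    """Estimate the mean by dividing the estimated total by N."""
    return horvitz_thompson_total(y, pi) / population_size


# Hypothetical sample of n = 3 observations (values chosen for easy arithmetic).
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.5]

total = horvitz_thompson_total(y, pi)  # 10/0.5 + 20/0.25 + 30/0.5
mean = horvitz_thompson_mean(y, pi, population_size=8)

print(total)  # 160.0
print(mean)   # 20.0
```

In this toy example the observation with \pi_i = 0.25 receives a weight of 4, so it contributes four times its observed value to the estimated total.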

In a Bayesian probabilistic framework, \pi_i is considered the proportion of individuals in a target population belonging to the ith stratum. Hence, \pi_i^{-1} Y_i could be thought of as an estimate of the complete sample of persons within the ith stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.[3]
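One way to see why inverse weights are used is a short expectation argument. The sketch below assumes the standard finite-population setting, which is not spelled out above: a population of N units with fixed values Y_1, \ldots, Y_N, and an indicator I_i (a symbol introduced here purely for illustration) that equals 1 if unit i is sampled and 0 otherwise, with E[I_i] = \pi_i > 0. Under those assumptions the estimator of the total can be written as a sum over the whole population:

\hat{Y}_{HT} = \sum_{i=1}^N I_i \pi_i^{-1} Y_i,

so that, by linearity of expectation,

E[\hat{Y}_{HT}] = \sum_{i=1}^N E[I_i] \pi_i^{-1} Y_i = \sum_{i=1}^N \pi_i \pi_i^{-1} Y_i = \sum_{i=1}^N Y_i.

In this setting the estimator is unbiased for the population total, whatever the individual inclusion probabilities are, provided each one is strictly positive.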

For post-stratified study designs, estimation of \pi and \mu is done in distinct steps. In such cases, computing the variance of \hat{\mu}_{HT} is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the Horvitz–Thompson estimator.[citation needed] The survey package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.
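As a rough illustration of the resampling idea, the sketch below applies a naive bootstrap to the estimate of the mean. It rests on strong assumptions that should be stated plainly: the (Y_i, \pi_i) pairs are resampled with replacement as if they were independent and identically distributed, the inclusion probabilities are treated as known rather than estimated, and no stratification or post-stratification step is repeated inside each replicate. Bootstrap procedures designed for complex surveys differ from this. All function and parameter names are hypothetical.

```python
import random
from typing import Sequence


def ht_mean(y: Sequence[float], pi: Sequence[float], population_size: int) -> float:
    """Horvitz-Thompson estimate of the mean: (1/N) * sum of Y_i / pi_i."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi)) / population_size


def bootstrap_variance_ht_mean(
    y: Sequence[float],
    pi: Sequence[float],
    population_size: int,
    n_replicates: int = 2000,
    seed: int = 0,
) -> float:
    """Naive bootstrap variance of the HT mean (assumes i.i.d. resampling of pairs)."""
    rng = random.Random(seed)
    n = len(y)
    replicates = []
    for _ in range(n_replicates):
        # Draw n indices with replacement and recompute the estimate.
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(
            ht_mean([y[i] for i in idx], [pi[i] for i in idx], population_size)
        )
    center = sum(replicates) / n_replicates
    # Sample variance of the replicate estimates.
    return sum((r - center) ** 2 for r in replicates) / (n_replicates - 1)


# Hypothetical data, for illustration only.
y = [10.0, 20.0, 30.0, 15.0, 25.0]
pi = [0.5, 0.25, 0.5, 0.25, 0.5]
variance_estimate = bootstrap_variance_ht_mean(y, pi, population_size=14, seed=1)
print(variance_estimate)
```

Fixing the seed makes the result reproducible. In practice the number of replicates and the resampling scheme would be chosen to match the actual sampling design.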


  1. ^ Horvitz, D. G.; Thompson, D. J. (1952) "A generalization of sampling without replacement from a finite universe", Journal of the American Statistical Association, 47, 663–685. JSTOR 2280784
  2. ^ William G. Cochran (1977), Sampling Techniques, 3rd Edition, Wiley. ISBN 0-471-16240-X
  3. ^ Roderick J.A. Little, Donald B. Rubin (2002) Statistical Analysis With Missing Data, 2nd ed., Wiley. ISBN 0-471-18386-5

External links