Probit model

From Wikipedia, the free encyclopedia

In statistics, a probit model is a type of regression where the dependent variable can only take two values, for example married or not married.

A probit model is a popular specification for an ordinal[1] or a binary response model that employs a probit link function. The model is most often estimated by the standard maximum likelihood procedure; such an estimation is called a probit regression.

Probit models were introduced by Chester Bliss in 1935, and a fast method for computing maximum likelihood estimates for them was proposed by Ronald Fisher in an appendix to the same article.

Introduction

Suppose the response variable Y is binary, that is, it can take only two possible outcomes, which we denote as 1 and 0. For example, Y may represent the presence or absence of a certain condition, the success or failure of some device, a yes or no answer on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form


    \Pr(Y=1 \mid X) = \Phi(X'\beta),

where Pr denotes probability, and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood.
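As a small illustration of the model equation, the following sketch evaluates Φ(X'β) for a single observation using only the Python standard library. The function name probit_probability and the numerical values of x and beta are hypothetical and chosen only for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_probability(x, beta):
    """Return Pr(Y = 1 | X = x) = Phi(x'beta) for one observation."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    index = sum(xj * bj for xj, bj in zip(x, beta))  # linear index x'beta
    return STD_NORMAL.cdf(index)


# Hypothetical coefficients: an intercept and one regressor.
beta = [-0.5, 1.0]
p_zero = probit_probability([1.0, 0.5], beta)  # index is 0, so probability 0.5
p_high = probit_probability([1.0, 2.5], beta)  # index is 2
```

A zero index always maps to probability one half, and larger indices map to probabilities closer to one, which is what makes Φ usable as a link between the linear index and the response probability.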

It is also possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable

 Y^\ast = X'\beta + \varepsilon, \,

where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive:

 Y = \mathbf{1}_{\{Y^\ast>0\}} = \begin{cases} 1 & \text{if }Y^\ast > 0 \ \text{ i.e. } - \varepsilon < X'\beta, \\
0 &\text{otherwise.} \end{cases}
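The latent variable formulation can be checked by simulation: drawing ε from N(0, 1) and recording whether X'β + ε is positive should reproduce the probability Φ(X'β). The sketch below assumes a hypothetical index value of 0.5 and a fixed random seed; the function name is illustrative.

```python
import random
from statistics import NormalDist


def simulate_probit_outcomes(index, n_draws, seed=0):
    """Draw Y = 1{index + eps > 0} with eps ~ N(0, 1), n_draws times."""
    rng = random.Random(seed)
    return [1 if index + rng.gauss(0.0, 1.0) > 0 else 0 for _ in range(n_draws)]


index = 0.5  # hypothetical value of x'beta
draws = simulate_probit_outcomes(index, n_draws=20000)
simulated_share = sum(draws) / len(draws)   # fraction of draws with Y = 1
model_probability = NormalDist().cdf(index)  # Phi(x'beta)
```

With this many draws the simulated share of ones should be close to the model probability, up to sampling noise.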

Maximum likelihood estimation

Suppose the data set \{y_i,x_i\}_{i=1}^n contains n independent statistical units corresponding to the model above. Then their joint log-likelihood function is

 \ln\mathcal{L}(\beta) = \sum_{i=1}^n \bigg( y_i\ln\Phi(x_i'\beta) + (1-y_i)\ln\!\big(1-\Phi(x_i'\beta)\big) \bigg)

The estimator \hat\beta that maximizes this function is consistent, asymptotically normal and efficient, provided that E[XX'] exists and is nonsingular. It can be shown that this log-likelihood function is globally concave in β, so standard numerical optimization algorithms converge rapidly to the unique maximum.
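The following is a minimal sketch of how the log-likelihood might be maximized numerically. It uses Fisher scoring (Newton-type steps with the expected information matrix), which is one possible choice among the standard algorithms and not necessarily the one used by any particular package. The data are simulated from hypothetical coefficients, and the clipping constant EPS and all function names are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()
EPS = 1e-12  # keeps Phi away from 0 and 1 so logs and ratios stay finite


def clipped_cdf(z):
    return min(max(N01.cdf(z), EPS), 1.0 - EPS)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_likelihood(beta, xs, ys):
    """Sum of y*ln(Phi) + (1 - y)*ln(1 - Phi) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = clipped_cdf(dot(x, beta))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_probit(xs, ys, max_iter=50, tol=1e-10):
    """Fisher scoring: beta <- beta + (expected information)^{-1} * score."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y in zip(xs, ys):
            z = dot(x, beta)
            p, d = clipped_cdf(z), N01.pdf(z)
            s = d * (y - p) / (p * (1.0 - p))  # score contribution
            w = d * d / (p * (1.0 - p))        # information weight
            for i in range(k):
                score[i] += s * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)
        beta = [b + h for b, h in zip(beta, step)]
        if max(abs(h) for h in step) < tol:
            break
    return beta


# Simulated data from hypothetical coefficients (intercept and one regressor).
rng = random.Random(0)
true_beta = [-0.3, 0.8]
xs = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(2000)]
ys = [1 if dot(x, true_beta) + rng.gauss(0.0, 1.0) > 0 else 0 for x in xs]
beta_hat = fit_probit(xs, ys)
```

Because the log-likelihood is globally concave, the starting point (here the zero vector) should not affect the solution, only the number of iterations needed.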

The asymptotic distribution of \hat\beta is given by

\sqrt{n}(\hat\beta - \beta)\ \xrightarrow{d}\ \mathcal{N}(0,\,\Omega^{-1}),

where

\Omega = \operatorname{E}\bigg[ \frac{\varphi^2(X'\beta)}{\Phi(X'\beta)(1-\Phi(X'\beta))}XX' \bigg], \qquad
  \hat\Omega = \frac{1}{n}\sum_{i=1}^n \frac{\varphi^2(x'_i\hat\beta)}{\Phi(x'_i\hat\beta)(1-\Phi(x'_i\hat\beta))}x_ix'_i

and φ = Φ' is the probability density function (PDF) of the standard normal distribution.
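The sample analogue \hat\Omega can be computed directly from its formula, and approximate standard errors follow as the square roots of the diagonal of \hat\Omega^{-1}/n. The sketch below assumes a two-regressor design and treats beta_hat as already estimated; the design points, the coefficient values and the helper standard_errors_2x2 are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def omega_hat(beta_hat, xs):
    """Average over observations of phi^2 / (Phi (1 - Phi)) times x x'."""
    k = len(beta_hat)
    omega = [[0.0] * k for _ in range(k)]
    for x in xs:
        z = sum(a * b for a, b in zip(x, beta_hat))
        p = N01.cdf(z)
        weight = N01.pdf(z) ** 2 / (p * (1.0 - p))
        for i in range(k):
            for j in range(k):
                omega[i][j] += weight * x[i] * x[j] / len(xs)
    return omega


def standard_errors_2x2(omega, n):
    """Square roots of the diagonal of omega^{-1} / n, for k = 2 only."""
    (a, b), (c, d) = omega
    det = a * d - b * c
    return [math.sqrt(d / det / n), math.sqrt(a / det / n)]


# Hypothetical design: an intercept and 41 evenly spaced regressor values.
xs = [[1.0, v / 10.0] for v in range(-20, 21)]
beta_hat = [-0.3, 0.8]  # treated as an estimate obtained elsewhere
omega = omega_hat(beta_hat, xs)
se = standard_errors_2x2(omega, len(xs))
```

For an intercept-only model evaluated at β = 0 the weight reduces to φ(0)²/0.25 = 2/π, which gives a quick sanity check of the weight formula.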

Berkson's minimum chi-square method

This method can be applied only when there are many observations of the response variable y_i sharing the same value of the vector of regressors x_i (such a situation may be referred to as “many observations per cell”). More specifically, the model can be formulated as follows.

Suppose that among the n observations \{y_i,x_i\}_{i=1}^n there are only T distinct values of the regressors, denoted \{x_{(1)},\ldots,x_{(T)}\}. Let n_t be the number of observations with x_i = x_{(t)}, and r_t the number of those observations that also have y_i = 1. We assume that there are indeed “many” observations in each “cell”: for each group t, n_t/n → c_t as n → ∞, for some constant c_t > 0.

Denote

 \hat{p}_t = r_t/n_t
 \hat\sigma_t^2 = \frac{1}{n_t} \frac{\hat{p}_t(1-\hat{p}_t)}{\varphi^2\big(\Phi^{-1}(\hat{p}_t)\big)}

Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of \Phi^{-1}(\hat{p}_t) on x_{(t)} with weights \hat\sigma_t^{-2}:

 \hat\beta = \Bigg( \sum_{t=1}^T \hat\sigma_t^{-2}x_{(t)}x'_{(t)} \Bigg)^{-1} \sum_{t=1}^T \hat\sigma_t^{-2}x_{(t)}\Phi^{-1}(\hat{p}_t)

It can be shown that this estimator is consistent (as n→∞ with T fixed), asymptotically normal and efficient.[citation needed] Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available and only the aggregated counts r_t and n_t for each x_{(t)} are known (for example in the analysis of voting behavior).
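The closed-form estimator can be sketched as a weighted least squares computation on grouped data. The cell values and counts below are made up for illustration, and the function names are assumptions; the code requires 0 < r_t < n_t in every cell so that \Phi^{-1}(\hat{p}_t) is finite.

```python
from statistics import NormalDist

N01 = NormalDist()


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def berkson_probit(cells, n_t, r_t):
    """Regress Phi^{-1}(p_t) on x_(t) with weights 1 / sigma_t^2."""
    k = len(cells[0])
    lhs = [[0.0] * k for _ in range(k)]  # sum of weight * x x'
    rhs = [0.0] * k                      # sum of weight * x * Phi^{-1}(p_t)
    for x, n, r in zip(cells, n_t, r_t):
        if not 0 < r < n:
            raise ValueError("each cell needs 0 < r_t < n_t")
        p = r / n
        q = N01.inv_cdf(p)                              # Phi^{-1}(p_t)
        weight = n * N01.pdf(q) ** 2 / (p * (1.0 - p))  # 1 / sigma_t^2
        for i in range(k):
            rhs[i] += weight * x[i] * q
            for j in range(k):
                lhs[i][j] += weight * x[i] * x[j]
    return solve(lhs, rhs)


# Made-up grouped data: four cells, each with an intercept and one regressor.
cells = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
n_t = [50, 60, 55, 40]  # observations per cell
r_t = [8, 22, 35, 36]   # observations with y = 1 per cell
beta_hat = berkson_probit(cells, n_t, r_t)
```

If the observed proportions lie exactly on the curve Φ(x'β), the transformed values \Phi^{-1}(\hat{p}_t) are exactly linear in x_{(t)} and the regression recovers β regardless of the weights.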

References

  • Bliss, C.I. (1935). "The calculation of the dosage-mortality curve". Annals of Applied Biology 22: 134–167. doi:10.1111/j.1744-7348.1935.tb07713.x
  • Bliss, C.I. (1938). "The determination of the dosage-mortality curve from small numbers". Quarterly Journal of Pharmacology 11: 192–216.
  • McCullagh, Peter; Nelder, John (1989). Generalized Linear Models. London: Chapman and Hall. ISBN 0-412-31760-5.
  • Albert, J.H.; Chib, S. (1993). "Bayesian Analysis of Binary and Polychotomous Response Data". Journal of the American Statistical Association 88 (422): 669–679. http://www.jstor.org/stable/2290350

Notes

  1. ^ "Ordinal probit regression model". UCLA Academic Technology Services. http://www.ats.ucla.edu/stat/stata/dae/ologit.htm