Probit model


In statistics, a probit model is a type of regression where the dependent variable can only take two values, for example married or not married.

A probit model is a popular specification for an ordinal[1] or a binary response model that employs a probit link function. This model is most often estimated using the standard maximum likelihood procedure; such an estimation is called a probit regression.

Probit models were introduced by Chester Bliss in 1935, and a fast method for computing maximum likelihood estimates for them was proposed by Ronald Fisher in an appendix to the same article.

Introduction

Suppose a response variable Y is binary, that is, it can have only two possible outcomes, which we denote as 1 and 0. For example, Y may represent the presence or absence of a certain condition, the success or failure of some device, a yes/no answer on a survey, and so on. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form


    \Pr(Y=1 \mid X) = \Phi(X'\beta),

where Pr denotes probability, and Φ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood.
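As a minimal sketch of the formula above, the following Python snippet evaluates Φ(X'β) for a single observation using only the standard library. The function name `probit_probability` and the numerical values of β and X are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist

def probit_probability(x, beta):
    """Return Pr(Y = 1 | X = x) = Phi(x'beta) for one observation."""
    index = sum(xj * bj for xj, bj in zip(x, beta))  # linear index x'beta
    return NormalDist().cdf(index)                   # standard normal CDF

# Hypothetical coefficients: an intercept plus one regressor.
beta = [-1.0, 0.5]
p_zero = probit_probability([1.0, 2.0], beta)  # index x'beta = 0
p_high = probit_probability([1.0, 6.0], beta)  # index x'beta = 2
```

An index of zero gives a probability of exactly one half, and the probability rises towards 1 as the index grows.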

It is also possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable

 Y^\ast = X'\beta + \varepsilon, \,

where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive:

 Y = \mathbf{1}_{\{Y^\ast>0\}} = \begin{cases} 1 & \text{if }Y^\ast > 0 \ \text{ i.e. } - \varepsilon < X'\beta, \\
0 &\text{otherwise.} \end{cases}
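The two formulations agree because the normal distribution is symmetric: Pr(Y* > 0 | X) = Pr(ε > −X'β) = 1 − Φ(−X'β) = Φ(X'β). The simulation below is only an illustrative check of this identity, not part of the model itself. The index value 0.7, the seed and the number of draws are arbitrary assumptions.

```python
import random
from statistics import NormalDist

def simulate_latent_probit(index, draws, seed=0):
    """Share of draws in which Y* = index + eps is positive, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws) if index + rng.gauss(0.0, 1.0) > 0)
    return hits / draws

index = 0.7                                  # hypothetical value of x'beta
simulated = simulate_latent_probit(index, 200_000)
exact = NormalDist().cdf(index)              # Phi(x'beta) from the probit formula
```

With this many draws the simulated share should differ from Φ(X'β) only by sampling noise.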

Maximum likelihood estimation

Suppose the data set \{y_i,x_i\}_{i=1}^n contains n independent statistical units corresponding to the model above. Then their joint log-likelihood function is

 \ln\mathcal{L}(\beta) = \sum_{i=1}^n \bigg( y_i\ln\Phi(x_i'\beta) + (1-y_i)\ln\!\big(1-\Phi(x_i'\beta)\big) \bigg)

The estimator \hat\beta which maximizes this function will be consistent, asymptotically normal and efficient provided that E[XX'] exists and is not singular. It can be shown that this log-likelihood function is globally concave in β, and therefore standard numerical algorithms for optimization will converge rapidly to the unique maximum.
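One way to maximize the log-likelihood numerically is Fisher scoring, that is, Newton's method with the expected information matrix in place of the Hessian. The sketch below is an illustration under stated assumptions rather than the procedure of any particular package: the helper names, the simulated data, the true coefficients (−0.5, 1.0), and the fixed count of 25 iterations are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps Phi away from 0 and 1 so logs and ratios stay finite

def cdf_and_pdf(x, beta):
    """Return (Phi(x'beta), phi(x'beta)) with Phi clamped to (EPS, 1 - EPS)."""
    index = sum(a * b for a, b in zip(x, beta))
    p = min(max(STD_NORMAL.cdf(index), EPS), 1.0 - EPS)
    return p, STD_NORMAL.pdf(index)

def log_likelihood(beta, xs, ys):
    """Sum over i of y_i ln Phi(x_i'beta) + (1 - y_i) ln(1 - Phi(x_i'beta))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p, _ = cdf_and_pdf(x, beta)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_probit(xs, ys, iterations=25):
    """Fisher scoring: beta <- beta + information^{-1} * score, from beta = 0."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y in zip(xs, ys):
            p, d = cdf_and_pdf(x, beta)
            g = (y - p) * d / (p * (1.0 - p))  # score contribution per unit of x
            w = d * d / (p * (1.0 - p))        # expected-information weight
            for j in range(k):
                score[j] += g * x[j]
                for l in range(k):
                    info[j][l] += w * x[j] * x[l]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

# Simulated data from the latent-variable form, with hypothetical true coefficients.
rng = random.Random(1)
true_beta = [-0.5, 1.0]
xs = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(2000)]
ys = [1 if true_beta[0] + true_beta[1] * x[1] + rng.gauss(0.0, 1.0) > 0 else 0
      for x in xs]
beta_hat = fit_probit(xs, ys)
```

Because the log-likelihood is globally concave, the estimate attains a log-likelihood at least as high as that of any other coefficient vector, including the true one used to simulate the data.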

The asymptotic distribution of \hat\beta is given by

\sqrt{n}(\hat\beta - \beta)\ \xrightarrow{d}\ \mathcal{N}(0,\,\Omega^{-1}),

where

\Omega = \operatorname{E}\bigg[ \frac{\varphi^2(X'\beta)}{\Phi(X'\beta)(1-\Phi(X'\beta))}XX' \bigg], \qquad
  \hat\Omega = \frac{1}{n}\sum_{i=1}^n \frac{\varphi^2(x'_i\hat\beta)}{\Phi(x'_i\hat\beta)(1-\Phi(x'_i\hat\beta))}x_ix'_i

and φ = Φ' is the probability density function (PDF) of the standard normal distribution.
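In practice the sample analogue \hat\Omega is used to approximate the variance of \hat\beta by \hat\Omega^{-1}/n, which yields standard errors. The sketch below computes \hat\Omega and the implied standard errors for a two-parameter model. The regressor grid and the coefficient values are hypothetical placeholders standing in for real data and a real estimate, and the explicit 2×2 inverse is a simplification for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def omega_hat(xs, beta):
    """(1/n) * sum of phi^2 / (Phi (1 - Phi)) * x x', evaluated at beta."""
    n, k = len(xs), len(beta)
    omega = [[0.0] * k for _ in range(k)]
    for x in xs:
        index = sum(a * b for a, b in zip(x, beta))
        p = STD_NORMAL.cdf(index)
        w = STD_NORMAL.pdf(index) ** 2 / (p * (1.0 - p))
        for j in range(k):
            for l in range(k):
                omega[j][l] += w * x[j] * x[l] / n
    return omega

def standard_errors_2x2(omega, n):
    """sqrt(diag(omega^{-1}) / n) for a model with exactly two parameters."""
    det = omega[0][0] * omega[1][1] - omega[0][1] * omega[1][0]
    inv_diag = [omega[1][1] / det, omega[0][0] / det]  # diagonal of the inverse
    return [math.sqrt(v / n) for v in inv_diag]

# Hypothetical regressors (intercept plus one covariate) and a hypothetical estimate.
xs = [[1.0, -2.0 + 4.0 * i / 99] for i in range(100)]
beta_hat = [0.2, 0.8]
omega = omega_hat(xs, beta_hat)
se = standard_errors_2x2(omega, len(xs))
```

The weight φ²/(Φ(1−Φ)) is largest when the index is zero, where it equals 2/π, so observations with predicted probabilities near one half contribute the most information.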

Berkson's minimum chi-square method

This method can be applied only when there are many observations of the response variable y_i sharing the same value of the vector of regressors x_i (such a situation may be referred to as “many observations per cell”). More specifically, the model can be formulated as follows.

Suppose among n observations \{y_i,x_i\}_{i=1}^n there are only T distinct values of the regressors, which can be denoted as \{x_{(1)},\ldots,x_{(T)}\}. Let n_t be the number of observations with x_i=x_{(t)}, and r_t the number of observations with x_i=x_{(t)} and y_i=1. We assume that there are indeed “many” observations per “cell”: for each group t, n_t/n → c_t > 0 as n→∞, where c_t is a constant.

Denote

 \hat{p}_t = r_t/n_t
 \hat\sigma_t^2 = \frac{1}{n_t} \frac{\hat{p}_t(1-\hat{p}_t)}{\varphi^2\big(\Phi^{-1}(\hat{p}_t)\big)}

Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of \Phi^{-1}(\hat{p}_t) on x_{(t)} with weights \hat\sigma_t^{-2}:

 \hat\beta = \Bigg( \sum_{t=1}^T \hat\sigma_t^{-2}x_{(t)}x'_{(t)} \Bigg)^{-1} \sum_{t=1}^T \hat\sigma_t^{-2}x_{(t)}\Phi^{-1}(\hat{p}_t)

It can be shown that this estimator is consistent (as n→∞ with T fixed), asymptotically normal and efficient.[citation needed] Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, but only their aggregated counts r_t, n_t, and x_{(t)} (for example in the analysis of voting behavior).
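The closed-form estimator can be sketched in a few lines for the special case of an intercept plus one scalar regressor, where the weighted normal equations form a 2×2 system. The function name `berkson_probit` and the cell counts below are hypothetical; the counts were chosen to lie close to Φ(−0.5 + x) so that the result is easy to check. Cells with r_t = 0 or r_t = n_t must be excluded, since Φ^{-1} is undefined at 0 and 1.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def berkson_probit(cells):
    """Minimum chi-square estimate of [intercept, slope] from grouped data.

    cells: list of (x_t, n_t, r_t) with a scalar regressor and 0 < r_t < n_t.
    """
    # Accumulate sum of w * x x' (s00, s01, s11) and sum of w * x * z (t0, t1),
    # where each cell's regressor vector is (1, x_t).
    s00 = s01 = s11 = t0 = t1 = 0.0
    for x, n, r in cells:
        p = r / n                                           # p_hat_t
        z = STD_NORMAL.inv_cdf(p)                           # Phi^{-1}(p_hat_t)
        var = p * (1.0 - p) / (n * STD_NORMAL.pdf(z) ** 2)  # sigma_hat_t^2
        w = 1.0 / var                                       # GLS weight
        s00 += w
        s01 += w * x
        s11 += w * x * x
        t0 += w * z
        t1 += w * x * z
    det = s00 * s11 - s01 * s01
    return [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]

# Hypothetical aggregated data: (x_t, n_t, r_t) for T = 4 cells.
cells = [(-1.0, 1000, 67), (0.0, 1000, 309), (1.0, 1000, 691), (2.0, 1000, 933)]
intercept, slope = berkson_probit(cells)
```

No iteration is needed here, in contrast to maximum likelihood: the estimate comes from a single weighted least squares solve on the transformed cell frequencies.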

References

  • Bliss, C.I. (1935). "The calculation of the dosage-mortality curve". Annals of Applied Biology 22: 134–167. doi:10.1111/j.1744-7348.1935.tb07713.x
  • Bliss, C.I. (1938). "The determination of the dosage-mortality curve from small numbers". Quarterly Journal of Pharmacology 11: 192–216.
  • McCullagh, Peter; Nelder, John (1989). Generalized Linear Models. London: Chapman and Hall. ISBN 0-412-31760-5.
  • Albert, J.H.; Chib, S. (1993). "Bayesian Analysis of Binary and Polychotomous Response Data". Journal of the American Statistical Association 88 (422): 669–679. http://www.jstor.org/stable/2290350

Notes

  1. Ordinal probit regression model. UCLA Academic Technology Services. http://www.ats.ucla.edu/stat/stata/dae/ologit.htm