Tobit model

The Tobit model is a statistical model proposed by James Tobin (1958)[1] to describe the relationship between a non-negative dependent variable y_i and an independent variable (or vector) x_i. The term Tobit was derived from Tobin's name by truncating and adding -it by analogy with the probit model.[2]

The model supposes that there is a latent (i.e. unobservable) variable y_i^*. This variable linearly depends on x_i via a parameter (vector) \beta which determines the relationship between the independent variable (or vector) x_i and the latent variable y_i^* (just as in a linear model). In addition, there is a normally distributed error term u_i to capture random influences on this relationship. The observable variable y_i is defined to be equal to the latent variable whenever the latent variable is above zero and zero otherwise.

y_i = \begin{cases} 
    y_i^* & \textrm{if} \; y_i^* >0 \\ 
    0     & \textrm{if} \; y_i^* \leq 0
\end{cases}

where y_i^* is a latent variable:

 y_i^* = \beta x_i + u_i, \quad u_i \sim N(0,\sigma^2)
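
As a minimal sketch of the data-generating process described above, the following Python snippet simulates a latent variable and its censored counterpart. The intercept `beta0`, the uniform design for `x`, the parameter values, and the function name `simulate_tobit` are illustrative assumptions and not part of the model definition.

```python
import random


def simulate_tobit(n, beta0, beta1, sigma, seed=0):
    """Draw (x, y_star, y) triples from a Tobit model censored from below at zero.

    beta0 (an intercept) and the uniform distribution of x are assumptions
    made only for this illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        u = rng.gauss(0.0, sigma)            # normally distributed error term
        y_star = beta0 + beta1 * x + u       # latent variable
        y = y_star if y_star > 0 else 0.0    # observed variable
        data.append((x, y_star, y))
    return data


data = simulate_tobit(n=1000, beta0=0.5, beta1=1.0, sigma=1.0)
share_censored = sum(1 for _, _, y in data if y == 0.0) / len(data)
print(f"share of observations censored at zero: {share_censored:.2f}")
```

Only `x` and `y` would be available to an analyst; `y_star` is returned here purely so that the censoring rule can be inspected.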

Consistency

If the relationship parameter \beta is estimated by regressing the observed y_i on x_i, the resulting ordinary least squares regression estimator is inconsistent. It will yield a downward-biased estimate of the slope coefficient and an upward-biased estimate of the intercept. Takeshi Amemiya (1973) proved that the maximum likelihood estimator suggested by Tobin for this model is consistent.
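
The direction of the bias can be illustrated with a small simulation. This is a sketch under assumed values (true intercept 0, true slope 1, \sigma = 1, x uniform on [-2, 2]); the helper `ols_line` is a hypothetical name for a plain least-squares fit of one regressor.

```python
import random


def ols_line(xs, ys):
    """Return (intercept, slope) of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(1)
beta0, beta1, sigma = 0.0, 1.0, 1.0          # assumed true parameters
xs = [rng.uniform(-2.0, 2.0) for _ in range(20000)]
y_star = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
y_obs = [max(v, 0.0) for v in y_star]        # censoring from below at zero

latent_fit = ols_line(xs, y_star)            # infeasible in practice: y* is unobserved
observed_fit = ols_line(xs, y_obs)           # what OLS on the censored data gives

print("OLS on latent y*  (intercept, slope):", latent_fit)
print("OLS on observed y (intercept, slope):", observed_fit)
```

In this design the fit on the observed data has a slope well below 1 and an intercept well above 0, while the fit on the latent data stays close to the assumed parameters.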

Interpretation

The \beta coefficient should not be interpreted as the effect of x_i on y_i, as one would with a linear regression model; this is a common error. Instead, \beta measures the effect of x_i on the latent variable y_i^*, and the effect on the observed y_i should be interpreted as the combination of (1) the change in y_i of those above the limit, weighted by the probability of being above the limit; and (2) the change in the probability of being above the limit, weighted by the expected value of y_i if above.[3]
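
The two components can be checked numerically. The sketch below assumes censoring from below at zero and uses the standard closed forms for a normal error: with z = x\beta/\sigma and \lambda(z) = \phi(z)/\Phi(z), the expected value above the limit is x\beta + \sigma\lambda(z) and the probability of being above the limit is \Phi(z). The function name `mcdonald_moffitt` and the example values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides pdf() and cdf()


def mcdonald_moffitt(xb, beta_k, sigma):
    """Split d E[y|x] / d x_k into the two parts described in the text.

    xb is the linear index x*beta, beta_k the coefficient of interest.
    Assumes censoring from below at zero.
    """
    z = xb / sigma
    prob_above = STD_NORMAL.cdf(z)                       # P(y > 0 | x)
    lam = STD_NORMAL.pdf(z) / prob_above                 # inverse Mills ratio
    mean_above = xb + sigma * lam                        # E[y | y > 0, x]
    d_mean_above = beta_k * (1.0 - z * lam - lam ** 2)   # change in E[y | y > 0, x]
    d_prob_above = beta_k * STD_NORMAL.pdf(z) / sigma    # change in P(y > 0 | x)
    part_1 = prob_above * d_mean_above                   # component (1)
    part_2 = mean_above * d_prob_above                   # component (2)
    return part_1, part_2


part_1, part_2 = mcdonald_moffitt(xb=0.5, beta_k=1.0, sigma=1.0)
total = part_1 + part_2
print(part_1, part_2, total)
```

Under these assumptions the two parts add up to \beta_k \Phi(z), which is smaller in magnitude than \beta_k whenever some probability mass lies at the limit.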

Variations of the Tobit model

Variations of the Tobit model can be produced by changing where and when censoring occurs. Amemiya (1985, p. 384) classifies these variations into five categories (Tobit type I - Tobit type V), where Tobit type I stands for the first model described above. Schnedler (2005) provides a general formula to obtain consistent likelihood estimators for these and other variations of the Tobit model.

Type I

The Tobit model is a special case of a censored regression model, because the latent variable y_i^* cannot always be observed while the independent variable  x_i is observable. A common variation of the Tobit model is censoring at a value  y_L different from zero:

 y_i = \begin{cases} 
    y_i^* & \textrm{if} \; y_i^* >y_L \\ 
    y_L   & \textrm{if} \; y_i^* \leq y_L.
\end{cases}

Another example is censoring of values above  y_U.

 y_i = \begin{cases} 
    y_i^* & \textrm{if} \; y_i^* <y_U \\ 
    y_U   & \textrm{if} \; y_i^* \geq y_U.
\end{cases}

Yet another model results when  y_i is censored from above and below at the same time.

 y_i = \begin{cases} 
    y_i^* & \textrm{if} \; y_L<y_i^* <y_U \\ 
    y_L   & \textrm{if} \; y_i^* \leq y_L \\
    y_U   & \textrm{if} \; y_i^* \geq y_U.
\end{cases}
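
The three censoring rules above can be expressed by one small function. This is an illustrative sketch; the name `censor` and the example values are assumptions, with `lower` playing the role of y_L and `upper` the role of y_U.

```python
def censor(y_star, lower=None, upper=None):
    """Map a latent value to its observed value under optional censoring.

    lower corresponds to y_L and upper to y_U; None means no bound on that side.
    """
    if lower is not None and y_star <= lower:
        return lower
    if upper is not None and y_star >= upper:
        return upper
    return y_star


latent_values = [-1.5, 0.2, 2.0, 4.7]
from_below = [censor(v, lower=0.0) for v in latent_values]
from_above = [censor(v, upper=3.0) for v in latent_values]
two_sided = [censor(v, lower=0.0, upper=3.0) for v in latent_values]

print(from_below)  # [0.0, 0.2, 2.0, 4.7]
print(from_above)  # [-1.5, 0.2, 2.0, 3.0]
print(two_sided)   # [0.0, 0.2, 2.0, 3.0]
```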

The rest of the models will be presented as being bounded from below at 0, though this can be generalized as we have done for Type I.

Type II

Type II Tobit models introduce a second latent variable.

 y_{2i} = \begin{cases} 
    y_{2i}^* & \textrm{if} \; y_{1i}^* >0 \\ 
    0   & \textrm{if} \; y_{1i}^* \leq 0.
\end{cases}

Heckman (1987) falls into the Type II Tobit category. In Type I Tobit, the latent variable absorbs both the process of participation and the 'outcome' of interest. Type II Tobit allows the process of participation/selection and the process of 'outcome' to be independent, conditional on x.
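
A short simulation can make the Type II observation rule concrete. This is only a sketch: the two latent equations contain no regressors, and the correlation `rho` between their error terms and the mean `mu_2` are assumed values. Setting `rho` to zero makes the selection and outcome processes independent.

```python
import random

rng = random.Random(2)
rho = 0.8     # assumed correlation between the two latent error terms
mu_2 = 1.0    # assumed mean of the latent outcome y2*

latent = []              # all latent outcomes y2* (never fully observed in practice)
observed_selected = []   # outcomes y2 for units with y1* > 0

for _ in range(20000):
    e1 = rng.gauss(0.0, 1.0)
    e2 = rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    y1_star = e1                            # latent selection variable
    y2_star = mu_2 + e2                     # latent outcome variable
    y2 = y2_star if y1_star > 0 else 0.0    # Type II observation rule
    latent.append(y2_star)
    if y1_star > 0:
        observed_selected.append(y2)

mean_latent = sum(latent) / len(latent)
mean_selected = sum(observed_selected) / len(observed_selected)
print(f"mean of latent outcome:     {mean_latent:.3f}")
print(f"mean among selected sample: {mean_selected:.3f}")
```

With a positive `rho` the selected sample has a noticeably higher mean than the latent outcome, which is why the selection equation cannot simply be ignored in this setting.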

Type III

Type III introduces a second observed dependent variable.

 y_{1i} = \begin{cases} 
    y_{1i}^* & \textrm{if} \; y_{1i}^* >0 \\ 
    0   & \textrm{if} \; y_{1i}^* \leq 0.
\end{cases}
 y_{2i} = \begin{cases} 
    y_{2i}^* & \textrm{if} \; y_{1i}^* >0 \\ 
    0   & \textrm{if} \; y_{1i}^* \leq 0.
\end{cases}

The Heckman model falls into this type.

Type IV

Type IV introduces a third observed dependent variable and a third latent variable.

 y_{1i} = \begin{cases} 
    y_{1i}^* & \textrm{if} \; y_{1i}^* >0 \\ 
    0   & \textrm{if} \; y_{1i}^* \leq 0.
\end{cases}
 y_{2i} = \begin{cases} 
    y_{2i}^* & \textrm{if} \; y_{1i}^* >0 \\ 
    0   & \textrm{if} \; y_{1i}^* \leq 0.
\end{cases}
 y_{3i} = \begin{cases} 
    y_{3i}^* & \textrm{if} \; y_{1i}^* \leq 0 \\ 
    0   & \textrm{if} \; y_{1i}^* > 0.
\end{cases}

Type V

Similar to Type II, in Type V we only observe the sign of y_{1i}^*.

 y_{2i} = \begin{cases} 
    y_{2i}^* & \textrm{if} \; y_{1i}^* >0 \\ 
    0   & \textrm{if} \; y_{1i}^* \leq 0.
\end{cases}
 y_{3i} = \begin{cases} 
    y_{3i}^* & \textrm{if} \; y_{1i}^* \leq 0 \\ 
    0   & \textrm{if} \; y_{1i}^* > 0.
\end{cases}

The likelihood function

Below are the likelihood and log likelihood functions for a type I Tobit. This is a Tobit that is censored from below at  y_L when the latent variable  y_j^* \leq y_L . In writing out the likelihood function, we first define an indicator function  I(y_j) where:

 I(y_j) = \begin{cases} 
    0  & \textrm{if} \; y_j = y_L \\ 
    1   & \textrm{if} \; y_j \neq y_L.
\end{cases}

Next, let \Phi denote the standard normal cumulative distribution function and \phi the standard normal probability density function. For a data set with N observations the likelihood function for a type I Tobit is

 \mathcal{L}(\beta, \sigma) = \prod_{j=1}^N \left(\frac{1}{\sigma}\phi\left(\frac{y_j - X_j\beta}{\sigma}\right)\right)^{I(y_j)} \left(1 - \Phi\left(\frac{X_j\beta - y_L}{\sigma}\right)\right)^{1 - I(y_j)}

and the log likelihood is given by


 \log \mathcal{L}(\beta, \sigma) = \sum_{j=1}^N \left[ I(y_j) \log\left( \frac{1}{\sigma} \phi\left( \frac{y_j - X_j\beta}{\sigma} \right) \right) + (1 - I(y_j)) \log\left( 1 - \Phi\left( \frac{X_j\beta - y_L}{\sigma} \right) \right) \right]
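
The log likelihood translates directly into code. The following is a sketch for a single regressor plus an intercept, evaluated on simulated data; the function name `tobit_loglik`, the parameter values, and the coarse grid search are illustrative assumptions, and a real application would use a numerical optimizer over all parameters.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf()


def tobit_loglik(beta0, beta1, sigma, xs, ys, y_lower=0.0):
    """Log likelihood of a type I Tobit censored from below at y_lower."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = beta0 + beta1 * x
        if y != y_lower:
            # I(y_j) = 1: log of (1/sigma) * phi((y_j - X_j beta) / sigma)
            z = (y - xb) / sigma
            total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        else:
            # I(y_j) = 0: log of 1 - Phi((X_j beta - y_L) / sigma), written as
            # Phi((y_L - X_j beta) / sigma), which is the same quantity
            total += math.log(STD_NORMAL.cdf((y_lower - xb) / sigma))
    return total


# Simulated data with assumed true values beta0 = 0, beta1 = 1, sigma = 1.
rng = random.Random(3)
xs = [rng.uniform(-2.0, 2.0) for _ in range(2000)]
ys = [max(0.0 + 1.0 * x + rng.gauss(0.0, 1.0), 0.0) for x in xs]

ll_true = tobit_loglik(0.0, 1.0, 1.0, xs, ys)   # at the assumed true parameters
ll_off = tobit_loglik(0.6, 0.5, 1.0, xs, ys)    # at values far from them

# Coarse grid over the slope only, holding the other parameters fixed.
grid = [0.5 + 0.1 * k for k in range(11)]
best_beta1 = max(grid, key=lambda b: tobit_loglik(0.0, b, 1.0, xs, ys))
print(ll_true, ll_off, best_beta1)
```

The log density of the uncensored observations is computed in closed form rather than as the logarithm of a density value, which avoids taking the logarithm of zero when a residual is very large.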

References

  1. ^ Tobin, James (1958). "Estimation of relationships for limited dependent variables". Econometrica 26 (1): 24–36. doi:10.2307/1907382. JSTOR 1907382. 
  2. ^ International Encyclopedia of the Social Sciences (2008)
  3. ^ McDonald, John F.; Moffitt, Robert A. (1980). "The Uses of Tobit Analysis". The Review of Economics and Statistics (The MIT Press) 62 (2): 318–321. doi:10.2307/1924766. 
