Quantile regression

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 186.214.12.93 (talk) at 23:03, 28 October 2011 (→‎Implementations). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Quantile regression is a type of regression analysis used in statistics. Whereas the method of least squares results in estimates that approximate the conditional mean of the response variable given certain values of the predictor variables, quantile regression results in estimates approximating either the median or other quantiles of the response variable.

Advantages and applications

Quantile regression is used when an estimate of the various quantiles (such as the median) of a population is desired. One advantage of using quantile regression to estimate the median, rather than ordinary least squares regression to estimate the mean, is that quantile regression is more robust to large outliers. Quantile regression can be seen as a natural analogue in regression analysis to the practice of using different measures of central tendency and statistical dispersion to obtain a more comprehensive and robust analysis.[1] Another advantage of quantile regression is that any quantile can be estimated, not only the median.

In ecology, quantile regression has been proposed and used as a way to discover more useful predictive relationships between variables in cases where there is no relationship or only a weak relationship between the means of such variables. The need for and success of quantile regression in ecology has been attributed to the complexity of interactions between different factors leading to data with unequal variation of one variable for different ranges of another variable.[2]

Mathematics

The mathematical forms arising from quantile regression are distinct from those arising in the method of least squares. The method of least squares leads to a consideration of problems in an inner product space, involving projection onto subspaces, and thus the problem of minimizing the squared errors can be reduced to a problem in numerical linear algebra. Quantile regression does not have this structure, and instead leads to problems in linear programming that can be solved by the simplex method. The fact that the algorithms of linear programming appear more esoteric to users may explain why quantile regression is not as widely used as the method of least squares.[3]

Quantiles

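Let $Y$ be a real valued random variable with distribution function $F_Y(y) = P(Y \le y)$. The $\tau$th quantile of $Y$ is given by

$$Q_Y(\tau) = F_Y^{-1}(\tau) = \inf\{\, y : F_Y(y) \ge \tau \,\}$$

where $\tau \in [0, 1]$.

Define the loss function as $\rho_\tau(y) = y\,(\tau - \mathbb{I}(y < 0))$, where $\mathbb{I}$ is the indicator function. A specific quantile can be found by minimizing the expected loss of $Y - u$ with respect to $u$:[4]

$$\min_u E(\rho_\tau(Y - u)) = \min_u \left\{ (\tau - 1)\int_{-\infty}^{u} (y - u)\,dF_Y(y) + \tau \int_{u}^{\infty} (y - u)\,dF_Y(y) \right\}.$$

This can be shown by setting the derivative of the expected loss function to 0 and letting $q_\tau$ be the solution of

$$0 = (1 - \tau)\int_{-\infty}^{q_\tau} dF_Y(y) - \tau \int_{q_\tau}^{\infty} dF_Y(y).$$

This equation reduces to

$$0 = F_Y(q_\tau) - \tau,$$

and then to

$$F_Y(q_\tau) = \tau.$$

Hence $q_\tau$ is the $\tau$th quantile of the random variable $Y$.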

Example

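Let $Y$ be a discrete random variable that takes the values 1, 2, ..., 9 with equal probabilities. The task is to find the median of $Y$, and hence the value $\tau = 0.5$ is chosen. The expected loss, $L(u)$, is

$$L(u) = \frac{\tau - 1}{9}\sum_{y_i < u}(y_i - u) + \frac{\tau}{9}\sum_{y_i \ge u}(y_i - u) = \frac{0.5}{9}\left(-\sum_{y_i < u}(y_i - u) + \sum_{y_i \ge u}(y_i - u)\right).$$

Since $0.5/9$ is a constant, it can be taken out of the expected loss function (this is only true if $\tau = 0.5$). Then, at $u = 3$,

$$L(3) \propto \sum_{i=1}^{2} -(i - 3) + \sum_{i=3}^{9}(i - 3) = (2 + 1) + (0 + 1 + 2 + \dots + 6) = 24.$$

Suppose that $u$ is increased by 1 unit. Then the expected loss will be changed by $3 - 6 = -3$ on changing $u$ to 4: the three values at or below 3 each contribute one unit more, and the six values above 3 each contribute one unit less. If $u = 5$, the expected loss is

$$L(5) \propto \sum_{i=1}^{4} -(i - 5) + \sum_{i=5}^{9}(i - 5) = (4 + 3 + 2 + 1) + (0 + 1 + 2 + 3 + 4) = 20,$$

and any change in $u$ will increase the expected loss. Thus $u = 5$ is the median. The table below shows the expected loss (divided by $0.5/9$) for different values of $u$.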

u 1 2 3 4 5 6 7 8 9
Expected loss 36 29 24 21 20 21 24 29 36
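
The table can be reproduced numerically. The following Python sketch is only an illustration of the example above; the helper names `check_loss` and `scaled_loss` are chosen here and do not come from the cited sources.

```python
def check_loss(r, tau):
    """Quantile loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

values = range(1, 10)  # Y takes the values 1, ..., 9 with equal probability
tau = 0.5              # the median

def scaled_loss(u):
    # The expected loss is sum(...) / 9; dividing it by 0.5 / 9
    # (as in the table) leaves sum(...) / 0.5.
    return sum(check_loss(y - u, tau) for y in values) / 0.5

table = [scaled_loss(u) for u in values]
median_u = min(values, key=scaled_loss)

print(table)     # scaled expected loss for u = 1, ..., 9
print(median_u)  # value of u with the smallest expected loss
```

The smallest entry is attained at u = 5, in agreement with the argument above.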

Intuition

Consider $\tau = 0.5$ and let $q$ be an initial guess for $q_\tau$. The expected loss evaluated at $q$ is

$$L(q) = -0.5\int_{-\infty}^{q}(y - q)\,dF_Y(y) + 0.5\int_{q}^{\infty}(y - q)\,dF_Y(y).$$

In order to minimize the expected loss, we move the value of $q$ a little bit to see whether the expected loss will rise or fall. Suppose we increase $q$ by 1 unit. Then, up to the constant factor 0.5, the change of expected loss would be

$$\int_{-\infty}^{q} 1\,dF_Y(y) - \int_{q}^{\infty} 1\,dF_Y(y).$$

The first term of the expression is $F_Y(q)$ and the second term is $1 - F_Y(q)$. Therefore the change of the expected loss function is negative if and only if $F_Y(q) < 0.5$, that is, if and only if $q$ is smaller than the median. Similarly, if we reduce $q$ by 1 unit, the change of the expected loss function is negative if and only if $q$ is larger than the median.

In order to minimize the expected loss function, we would increase (decrease) $q$ if $q$ is smaller (larger) than the median, until $q$ reaches the median. The idea behind the minimization is to count the number of points (weighted with the density) that are larger or smaller than $q$ and then move $q$ to a point where $q$ is larger than $100\tau$% of the points.

Sample quantile

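The $\tau$ sample quantile can be obtained by solving the following minimization problem

$$\hat{q}_\tau = \arg\min_{q \in \mathbb{R}} \sum_{i=1}^{n} \rho_\tau(y_i - q) = \arg\min_{q \in \mathbb{R}} \left[ (\tau - 1)\sum_{y_i < q}(y_i - q) + \tau \sum_{y_i \ge q}(y_i - q) \right].$$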

The intuition is the same as for the population quantile.
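
A minimal Python sketch of this minimization is given below. It relies on the fact that the objective is piecewise linear and convex in q with kinks at the observations, so a minimizer can always be found among the sample points. The function name `sample_quantile` and the data are illustrative assumptions, not part of the cited sources.

```python
def check_loss(r, tau):
    """Quantile loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

def sample_quantile(ys, tau):
    """Return a sample point q minimizing sum_i rho_tau(y_i - q).

    If the minimizer is not unique, the smallest minimizing
    sample point is returned.
    """
    def total_loss(q):
        return sum(check_loss(y - q, tau) for y in ys)
    return min(sorted(ys), key=total_loss)

data = list(range(1, 10))
median = sample_quantile(data, 0.5)
lower_quartile = sample_quantile(data, 0.25)
upper_tail = sample_quantile([2.0, 7.0, 1.0, 9.0, 4.0], 0.9)
print(median, lower_quartile, upper_tail)
```

Searching only over the observed values keeps the sketch short; it is not meant as an efficient algorithm for large samples.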

Conditional Quantile and Quantile Regression

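Suppose the $\tau$th conditional quantile function is $Q_{Y|X}(\tau) = X\beta_\tau$. Given the distribution function of $Y$, $\beta_\tau$ can be obtained by solving

$$\beta_\tau = \arg\min_{\beta \in \mathbb{R}^k} E(\rho_\tau(Y - X\beta)).$$

Solving the sample analog gives the estimator of $\beta_\tau$:

$$\hat{\beta}_\tau = \arg\min_{\beta \in \mathbb{R}^k} \sum_{i=1}^{n} \rho_\tau(Y_i - X_i\beta).$$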

Computation

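The minimization problem can be reformulated as a linear programming problem

$$\min_{(\beta^+, \beta^-, u^+, u^-) \in \mathbb{R}_+^{2k} \times \mathbb{R}_+^{2n}} \left\{ \tau 1_n' u^+ + (1 - \tau) 1_n' u^- \;\middle|\; X(\beta^+ - \beta^-) + u^+ - u^- = Y \right\},$$

where

$\beta_j^+ = \max(\beta_j, 0)$, $\beta_j^- = -\min(\beta_j, 0)$, $u_j^+ = \max(u_j, 0)$, $u_j^- = -\min(u_j, 0)$,

and $u = Y - X\beta$ is the vector of residuals.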

Simplex methods[5] or interior point methods[6] can be applied to solve the linear programming problem.
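
As an illustration, the linear program can be handed to a general-purpose solver. The sketch below assumes that NumPy and SciPy are installed and uses `scipy.optimize.linprog`; the function name `quantile_regression_lp` and the data are made up for this example, and dedicated quantile regression software uses more specialized algorithms.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, tau):
    """Estimate beta_tau by solving the linear program with a generic solver."""
    n, k = X.shape
    # Decision variables, all non-negative:
    # [beta_plus (k), beta_minus (k), u_plus (n), u_minus (n)]
    c = np.concatenate([np.zeros(2 * k),
                        tau * np.ones(n),
                        (1 - tau) * np.ones(n)])
    # Equality constraint: X (beta_plus - beta_minus) + u_plus - u_minus = y
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k] - res.x[k:2 * k]

# Made-up data: y = 1 + 2x exactly, except for one large outlier.
x = np.arange(7.0)
y = 1.0 + 2.0 * x
y[3] = 100.0
X = np.column_stack([np.ones_like(x), x])

beta_median = quantile_regression_lp(X, y, 0.5)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_median)  # median regression coefficients
print(beta_ols)     # least squares coefficients, for comparison
```

On this data the median regression line passes through the six points that lie on y = 1 + 2x, while the least squares intercept is pulled upward by the outlier.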

Asymptotic properties

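For $\tau \in (0, 1)$, under some regularity conditions, $\hat{\beta}_\tau$ is asymptotically normal:

$$\sqrt{n}\,(\hat{\beta}_\tau - \beta_\tau) \xrightarrow{d} N\!\left(0,\; \tau(1 - \tau)\, D^{-1} \Omega_x D^{-1}\right),$$

where

$$D = E\!\left(f_Y(X\beta_\tau)\, X'X\right)$$

and

$$\Omega_x = E(X'X).$$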

Equivariance

See invariant estimator for background on invariance and equivariance.

Scale equivariance

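Write $\hat{\beta}(\tau; Y, X)$ for the estimate computed from response $Y$ and design $X$. For any $a > 0$ and $\tau \in [0, 1]$,

$$\hat{\beta}(\tau; aY, X) = a\,\hat{\beta}(\tau; Y, X),$$

$$\hat{\beta}(\tau; -aY, X) = -a\,\hat{\beta}(1 - \tau; Y, X).$$

Shift equivariance

For any $\gamma \in \mathbb{R}^k$ and $\tau \in [0, 1]$,

$$\hat{\beta}(\tau; Y + X\gamma, X) = \hat{\beta}(\tau; Y, X) + \gamma.$$

Equivariance to reparameterization of design

Let $A$ be any $k \times k$ nonsingular matrix and $\tau \in [0, 1]$. Then

$$\hat{\beta}(\tau; Y, XA) = A^{-1}\hat{\beta}(\tau; Y, X).$$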
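
These properties can be checked numerically in the simplest setting of a single regressor without intercept (k = 1), where the scalars `gamma` and `A` below play the roles of the vector and the matrix. The Python sketch is only an illustration: the helper `fit_slope` and the data are assumptions made here. It uses the fact that the objective is piecewise linear in the slope, so a minimizer lies at one of the ratios y_i / x_i.

```python
import math

def check_loss(r, tau):
    """Quantile loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

def fit_slope(x, y, tau):
    """Quantile regression of y on a single regressor x, no intercept."""
    def total_loss(b):
        return sum(check_loss(yi - xi * b, tau) for xi, yi in zip(x, y))
    candidates = sorted(yi / xi for xi, yi in zip(x, y))
    return min(candidates, key=total_loss)

# Made-up data with all x_i > 0.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 7.5, 7.0, 11.0]
tau, a, gamma, A = 0.3, 3.0, 0.5, 2.0

b = fit_slope(x, y, tau)
b_upper = fit_slope(x, y, 1 - tau)

scale_pos = fit_slope(x, [a * yi for yi in y], tau)     # expect a * b
scale_neg = fit_slope(x, [-a * yi for yi in y], tau)    # expect -a * b_upper
shift = fit_slope(x, [yi + xi * gamma for xi, yi in zip(x, y)], tau)  # b + gamma
reparam = fit_slope([xi * A for xi in x], y, tau)       # expect b / A

print(b, b_upper, scale_pos, scale_neg, shift, reparam)
```

The data were chosen so that each minimizer is unique; when the minimizer is not unique, the identities hold for the sets of minimizers rather than for any particular solution returned by a solver.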

Invariance to monotone transformations

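If $h$ is a nondecreasing function on $\mathbb{R}$, the following invariance property applies:

$$h(Q_{Y|X}(\tau)) \equiv Q_{h(Y)|X}(\tau).$$

Example 1

Let $W = \exp(Y)$ and $Q_{Y|X}(\tau) = X\beta_\tau$, then $Q_{W|X}(\tau) = \exp(X\beta_\tau)$. The mean regression does not have the same property since $E(\ln(Y)) \ne \ln(E(Y))$.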
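
The contrast between quantiles and means under a monotone transformation can be seen on a small sample. The Python sketch below is an illustration with made-up numbers; it uses the unconditional sample median (no regressors) and the exponential transformation, and the helper `sample_quantile` is an assumption of this example.

```python
import math
import statistics

def check_loss(r, tau):
    """Quantile loss rho_tau(r) = r * (tau - I(r < 0))."""
    return r * (tau - (1 if r < 0 else 0))

def sample_quantile(ys, tau):
    """Return a sample point minimizing sum_i rho_tau(y_i - q)."""
    def total_loss(q):
        return sum(check_loss(y - q, tau) for y in ys)
    return min(sorted(ys), key=total_loss)

y = [0.1, 0.5, 1.0, 2.0, 3.0]
w = [math.exp(v) for v in y]  # W = exp(Y), a nondecreasing transformation

median_y = sample_quantile(y, 0.5)
median_w = sample_quantile(w, 0.5)

# The median commutes with exp; the mean does not.
quantile_commutes = (median_w == math.exp(median_y))
mean_gap = statistics.mean(w) - math.exp(statistics.mean(y))

print(quantile_commutes)
print(mean_gap)
```

Here the median of the transformed sample is exactly the transformed median, whereas the mean of the transformed sample is larger than the transformed mean.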

Example 2

Let $Y^c = \max\{0, Y\}$ and $Q_{Y|X}(\tau) = X\beta_\tau$, then $Q_{Y^c|X}(\tau) = \max\{0, X\beta_\tau\}$. This is the censored quantile regression model: estimated values can be obtained without making any distributional assumptions, but at the cost of computational difficulty,[7] some of which can be avoided by using a simple three step censored quantile regression procedure as an approximation.[8]

Implementations

Some statistics packages include implementations of quantile regression, among them R, EViews (ver. 6), Stata (via qreg), gretl, SAS (through proc quantreg, ver. 9.2), and RATS. R implements it through Roger Koenker's quantreg package.

Notes

  1. ^ Koenker (2005)
  2. ^ Brian S. Cade, Barry R. Noon, (2003) "A gentle introduction to quantile regression for ecologists", Frontiers in Ecology and the Environment, 1 (8), 412–420.
  3. ^ Roger Koenker, Kevin F. Hallock, (2001) "Quantile Regression", Journal of Economic Perspectives, 15 (4), 143–156
  4. ^ Koenker (2005) p.5-p.6
  5. ^ Koenker (2005) p.181
  6. ^ Koenker (2005) p.190
  7. ^ James L. Powell, (1986) "Censored regression quantiles", Journal of Econometrics, 32 (1), 143–155.
  8. ^ Victor Chernozhukov, Han Hong, (2002) "Three-Step Censored Quantile Regression and Extramarital Affairs", Journal of the American Statistical Association, 97 (459), 872–882.

References

  • Koenker, Roger (2005) Quantile Regression, Cambridge University Press. ISBN 0-521-60827-9

External links

  • Quantile LOWESS – A method to perform Local Quantile regression (with R code)