# Granger causality

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another.[1] Ordinarily, regressions reflect "mere" correlations, but Clive Granger, who won a Nobel Prize in Economics, argued that a certain set of tests reveal something about causality.

A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.

## Intuitive explanation

The "correlation equals causation" fallacy is the mistake of treating an association between two things as proof that one causes the other. Consider the claim that "increased education spending leads to better outcomes for children." A simple correlation of spending on schools versus education outcomes would give a positive result: places that spent more also had better outcomes. We can control for confounding variables, such as income, at any given time, but current spending may not affect outcomes until later. The idea of Granger causality is that if a "surprise" in the explanatory variable is regularly followed by a later increase in the outcome variable, we call the explanatory variable "Granger causal."

In the education example, assume there were some places where education spending spiked to an unusual level some years while the confounders did not significantly change. If every time this spike happened there was a corresponding increase in performance in the future relative to when these spikes didn't happen, education spending "Granger causes" higher performance. This is not the only definition of causality but in many applications it is useful.

## Method

If the time series are stationary, the test is performed using the level values of two (or more) variables. If the variables are non-stationary, then the test is done using first (or higher) differences. The number of lags to be included is usually chosen using an information criterion, such as the Akaike information criterion or the Schwarz information criterion. Any particular lagged value of one of the variables is retained in the regression if (1) it is significant according to a t-test, and (2) it and the other lagged values of the variable jointly add explanatory power to the model according to an F-test. The null hypothesis of no Granger causality is then not rejected if and only if no lagged values of an explanatory variable have been retained in the regression.

In practice it may be found that neither variable Granger-causes the other, or that each of the two variables Granger-causes the other.
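The differencing and lag-selection steps can be sketched with NumPy. This is a minimal illustration rather than a reference implementation: the helper names `ar_aic` and `select_lag`, the simulated series, and the choice of `max_lag=6` are assumptions made for the example, and the AIC is written in its Gaussian least-squares form up to an additive constant.

```python
import numpy as np

def ar_aic(series, n_lags, start):
    """AIC (up to a constant) of an OLS autoregression with n_lags lags.

    Observations before index `start` are dropped so that models with
    different lag lengths are compared on the same sample.
    """
    n = len(series)
    target = series[start:]
    cols = [np.ones(len(target))]  # intercept
    cols += [series[start - k : n - k] for k in range(1, n_lags + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    n_obs = len(target)
    # For the Schwarz criterion, replace 2 * k with k * log(n_obs).
    return n_obs * np.log(resid @ resid / n_obs) + 2 * X.shape[1]

def select_lag(series, max_lag):
    """Return the lag length in 1..max_lag with the smallest AIC."""
    aics = {m: ar_aic(series, m, start=max_lag) for m in range(1, max_lag + 1)}
    return min(aics, key=aics.get), aics

# Hypothetical data: a non-stationary level series whose first
# difference follows a stationary AR(2) process.
rng = np.random.default_rng(0)
n = 500
shocks = rng.standard_normal(n)
d = np.zeros(n)
for t in range(2, n):
    d[t] = 0.3 * d[t - 1] + 0.4 * d[t - 2] + shocks[t]
levels = np.cumsum(d)

diffs = np.diff(levels)  # first differences of the level series
best_lag, aics = select_lag(diffs, max_lag=6)
print("lag length chosen by AIC:", best_lag)
```

Fitting every candidate model on the same observations (here, dropping the first `max_lag` points) keeps the criterion values comparable across lag lengths.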

## Limitations

As its name implies, Granger causality is not necessarily true causality. If both X and Y are driven by a common third process with different lags, one might still reject the null hypothesis of no Granger causality, even though manipulating one of the variables would not change the other. Indeed, the Granger test is designed to handle pairs of variables, and may produce misleading results when the true relationship involves three or more variables. A similar test involving more variables can be applied with vector autoregression.
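The common-driver problem can be illustrated with a small simulation. In this sketch, which rests on invented data and hypothetical helper names (`lagged`, `ssr`, `f_stat`), a process `z` drives `x` with a one-period lag and `y` with a two-period lag, and `x` has no effect on `y`. The pairwise F statistic is compared with one from a regression that also includes lags of `z`, which corresponds to a single equation of a three-variable vector autoregression.

```python
import numpy as np

def lagged(series, n_lags, start):
    """Matrix whose columns are series[t-1], ..., series[t-n_lags]."""
    n = len(series)
    return np.column_stack(
        [series[start - k : n - k] for k in range(1, n_lags + 1)]
    )

def ssr(target, X):
    """Sum of squared residuals from an OLS fit of target on X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return float(resid @ resid)

def f_stat(y, x, controls=(), n_lags=2):
    """F statistic for adding n_lags lags of x to a regression of y on
    its own lags and on lags of any control series."""
    start = n_lags
    target = y[start:]
    blocks = [np.ones((len(target), 1)), lagged(y, n_lags, start)]
    blocks += [lagged(c, n_lags, start) for c in controls]
    X_r = np.hstack(blocks)                           # restricted model
    X_u = np.hstack([X_r, lagged(x, n_lags, start)])  # plus lags of x
    ssr_r, ssr_u = ssr(target, X_r), ssr(target, X_u)
    df_denom = len(target) - X_u.shape[1]
    return ((ssr_r - ssr_u) / n_lags) / (ssr_u / df_denom)

# Hypothetical data: z drives x after one period and y after two.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)
x = np.zeros(n)
y = np.zeros(n)
x[1:] = z[:-1] + 0.5 * rng.standard_normal(n - 1)
y[2:] = z[:-2] + 0.5 * rng.standard_normal(n - 2)

f_pair = f_stat(y, x)                # x appears to Granger-cause y
f_cond = f_stat(y, x, controls=[z])  # after conditioning on lags of z
print(f"pairwise F = {f_pair:.1f}, conditional F = {f_cond:.1f}")
```

Because `x` at time t-1 carries noisy information about `z` at time t-2, the pairwise statistic is large even though intervening on `x` would leave `y` unchanged; once lags of `z` are included, the lags of `x` add little.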

## Mathematical statement

Let y and x be stationary time series. To test the null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y:

$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_my_{t-m} + \mathrm{residual}_t.$

Next, the autoregression is augmented by including lagged values of x:

$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_my_{t-m} + b_px_{t-p} + \cdots + b_qx_{t-q} + \mathrm{residual}_t.$

One retains in this regression all lagged values of x that are individually significant according to their t-statistics, provided that collectively they add explanatory power to the regression according to an F-test (whose null hypothesis is no explanatory power jointly added by the x's). In the notation of the above augmented regression, p is the shortest, and q is the longest, lag length for which the lagged value of x is significant.

The null hypothesis that x does not Granger-cause y is not rejected if and only if no lagged values of x are retained in the regression.
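The joint F-test on the lagged values of x can be sketched as a comparison of the univariate and augmented regressions. The code below is a simplified illustration under stated assumptions: it tests all lags 1 through q of x jointly (that is, it takes p = 1 and skips the individual t-test screening), it requires m ≥ 1, and the function names and simulated data are invented for the example.

```python
import numpy as np

def lagged(series, n_lags, start):
    """Matrix whose columns are series[t-1], ..., series[t-n_lags]."""
    n = len(series)
    return np.column_stack(
        [series[start - k : n - k] for k in range(1, n_lags + 1)]
    )

def ssr(target, X):
    """Sum of squared residuals from an OLS fit of target on X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return float(resid @ resid)

def granger_f(y, x, m, q):
    """F statistic for H0: b_1 = ... = b_q = 0 in the augmented
    regression, with its numerator and denominator degrees of freedom."""
    start = max(m, q)
    target = y[start:]
    const = np.ones((len(target), 1))
    X_r = np.hstack([const, lagged(y, m, start)])  # univariate autoregression
    X_u = np.hstack([X_r, lagged(x, q, start)])    # augmented with lags of x
    ssr_r, ssr_u = ssr(target, X_r), ssr(target, X_u)
    df_denom = len(target) - X_u.shape[1]
    f = ((ssr_r - ssr_u) / q) / (ssr_u / df_denom)
    return f, q, df_denom

# Hypothetical data in which lagged x enters the equation for y,
# while y has no effect on x.
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + e[t]

f_xy, df1, df2 = granger_f(y, x, m=2, q=2)  # does x Granger-cause y?
f_yx, _, _ = granger_f(x, y, m=2, q=2)      # does y Granger-cause x?
print(f"x -> y: F({df1}, {df2}) = {f_xy:.1f}")
print(f"y -> x: F({df1}, {df2}) = {f_yx:.1f}")
```

A p-value follows from the upper tail of the F distribution with `df1` and `df2` degrees of freedom, for example via `scipy.stats.f.sf(f_xy, df1, df2)` if SciPy is available.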

## Extensions

A method for Granger causality has been developed that is not sensitive to deviations from the assumption that the error term is normally distributed.[2] This method is especially useful in financial economics since many financial variables are non-normally distributed.[3] Recently, asymmetric causality testing has been suggested in the literature in order to separate the causal impact of positive changes from the negative ones.[4]

## References

1. ^ Granger, C. W. J. (1969). "Investigating Causal Relations by Econometric Models and Cross-spectral Methods". Econometrica 37 (3): 424–438. doi:10.2307/1912791. JSTOR 1912791.
2. ^ Hacker, R. S.; Hatemi-J, A. (2006). "Tests for causality between integrated variables using asymptotic and bootstrap distributions: theory and application". Applied Economics 38 (13): 1489–1500.
3. ^ Mandelbrot, Benoit (1963). "The variation of certain speculative prices". Journal of Business 36 (4): 394–419. doi:10.1086/294632.
4. ^ Hatemi-J, A. (2012). "Asymmetric causality tests with an application". Empirical Economics 42 (6): forthcoming. doi:10.1007/s00181-011-0484-x.