# Granger causality

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another.[1] Ordinarily, regressions reflect "mere" correlations, but Clive Granger argued that a certain kind of causality in economics could be tested for by measuring the ability of one time series to predict the future values of another. Since the question of "true causality" is deeply philosophical, econometricians assert that the Granger test finds only "predictive causality".[2]

Granger also stressed that some studies using the "Granger causality" test in fields outside economics reached "ridiculous" conclusions. "Of course, many ridiculous papers appeared", he said in his Nobel Lecture of December 8, 2003.[3]

A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.

## Intuitive explanation

As the dictum "correlation does not imply causation" warns, the mere fact that one thing precedes another cannot by itself prove causation. Consider the claim that "increased education spending leads to better outcomes for children." A simple correlation of spending on schools against education outcomes would be positive: places that spent more also had better outcomes. We can control for confounding variables, such as income, at any given time, but current spending may not affect outcomes until later. The idea behind Granger causality is that if a "surprise" in the explanatory variable is consistently followed by a later increase in the outcome variable, we call that variable "Granger-causal."

In the education example, assume there were some places where education spending spiked to an unusual level some years while the confounders did not significantly change. If every time this spike happened there was a corresponding increase in performance in the future relative to when these spikes didn't happen, education spending "Granger causes" higher performance. This is not the only definition of causality but in many applications it is useful.

## Method

If a time series is a stationary process, the test is performed using the level values of two (or more) variables. If the variables are non-stationary, then the test is done using first (or higher) differences. The number of lags to be included is usually chosen using an information criterion, such as the Akaike information criterion or the Schwarz information criterion. Any particular lagged value of one of the variables is retained in the regression if (1) it is significant according to a t-test, and (2) it and the other lagged values of the variable jointly add explanatory power to the model according to an F-test. Then the null hypothesis of no Granger causality is not rejected if and only if no lagged values of an explanatory variable have been retained in the regression.

In practice it may be found that neither variable Granger-causes the other, or that each of the two variables Granger-causes the other.
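As an illustration of the lag-selection step, here is a minimal Python/NumPy sketch (not from any cited source; the function name, simulated series, and all parameter values are invented for the example) that fits autoregressions of increasing order to a simulated AR(2) series and picks the order minimizing the Schwarz (BIC) criterion:

```python
import numpy as np

def ar_bic(y, p, max_p):
    """Fit an AR(p) model by OLS and return its Schwarz (BIC) criterion.
    All orders condition on the first max_p observations, so that models
    of different order are compared on the same sample."""
    n = len(y)
    t = y[max_p:]  # targets y_t for t = max_p .. n-1
    X = np.column_stack(
        [np.ones(n - max_p)] + [y[max_p - j : n - j] for j in range(1, p + 1)]
    )
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    rss = np.sum((t - X @ beta) ** 2)
    n_eff = len(t)
    k = p + 1  # AR coefficients plus intercept
    return n_eff * np.log(rss / n_eff) + k * np.log(n_eff)

# Simulated AR(2) series: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + noise
rng = np.random.default_rng(0)
y = np.zeros(1000)
for i in range(2, 1000):
    y[i] = 0.6 * y[i - 1] - 0.3 * y[i - 2] + rng.standard_normal()

max_p = 6
bics = {p: ar_bic(y, p, max_p) for p in range(1, max_p + 1)}
best_p = min(bics, key=bics.get)  # order with the smallest BIC
```

Conditioning every candidate model on the same initial `max_p` observations is a common convention; comparing information criteria computed on different effective samples would bias the selection.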

## Limitations

As its name implies, Granger causality is not necessarily true causality. If both X and Y are driven by a common third process with different lags, one might still reject the null hypothesis of no Granger causality, even though manipulating one of the variables would not change the other. Indeed, the Granger test is designed for pairs of variables, and may produce misleading results when the true relationship involves three or more variables. A similar test involving more variables can be applied with vector autoregression.

## Mathematical statement

Let y and x be stationary time series. To test the null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y:

$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_my_{t-m} + \mathrm{residual}_t.$

Next, the autoregression is augmented by including lagged values of x:

$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_my_{t-m} + b_px_{t-p} + \cdots + b_qx_{t-q} + \mathrm{residual}_t.$

One retains in this regression all lagged values of x that are individually significant according to their t-statistics, provided that collectively they add explanatory power to the regression according to an F-test (whose null hypothesis is no explanatory power jointly added by the x's). In the notation of the above augmented regression, p is the shortest, and q is the longest, lag length for which the lagged value of x is significant.

The null hypothesis that x does not Granger-cause y is not rejected if and only if no lagged values of x are retained in the regression.
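The restricted-versus-unrestricted comparison above can be sketched with ordinary least squares. The following NumPy illustration (simulated data; names and parameter values are invented for the example, and a full analysis would also select lag lengths as described under Method and compute a p-value from the F distribution) computes the F statistic for the null hypothesis that x does not Granger-cause y:

```python
import numpy as np

def rss(target, X):
    """Residual sum of squares from an OLS fit of target on X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.sum((target - X @ beta) ** 2)

def granger_f(y, x, lags=1):
    """F statistic for H0: x does not Granger-cause y, using `lags` lags of
    each series. The restricted model drops all lagged-x terms."""
    n = len(y)
    t = y[lags:]
    ones = np.ones(n - lags)
    y_lags = [y[lags - j : n - j] for j in range(1, lags + 1)]
    x_lags = [x[lags - j : n - j] for j in range(1, lags + 1)]
    rss_r = rss(t, np.column_stack([ones] + y_lags))           # restricted
    rss_u = rss(t, np.column_stack([ones] + y_lags + x_lags))  # unrestricted
    k = 1 + 2 * lags  # parameters in the unrestricted model
    return ((rss_r - rss_u) / lags) / (rss_u / (len(t) - k))

# Simulation: x drives y with one lag; y does not feed back into x.
rng = np.random.default_rng(42)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for i in range(1, n):
    x[i] = 0.5 * x[i - 1] + rng.standard_normal()
    y[i] = 0.5 * y[i - 1] + 0.4 * x[i - 1] + rng.standard_normal()

f_x_to_y = granger_f(y, x)  # should be large: reject H0
f_y_to_x = granger_f(x, y)  # should be small: fail to reject H0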

## Extensions

A method for Granger causality has been developed that is not sensitive to deviations from the assumption that the error term is normally distributed.[4] This method is especially useful in financial economics since many financial variables are non-normally distributed.[5] Recently, asymmetric causality testing has been suggested in the literature in order to separate the causal impact of positive changes from the negative ones.[6]

### Granger causality in neuroscience

A long-held belief about neural function maintained that different areas of the brain are task-specific, and that the structural connectivity local to a certain area somehow dictates its function. Drawing on work accumulated over many years, there has been a move toward a different, network-centric approach to describing information flow in the brain: explanations of function increasingly invoke networks existing at different levels and in different locations throughout the brain.[7] The behavior of these networks is non-deterministic and evolves through time; given the same input stimulus, the network will not produce the same output. Because their dynamics are governed by probabilities, we treat them as stochastic (random) processes in order to capture these kinds of dynamics between different areas of the brain.

Different methods of obtaining some measure of information flow from the firing activities of a neuron and its surrounding ensemble have been explored in the past, but they are limited in the kinds of conclusions that can be drawn and reveal little about the directional flow of information, its strength, or how it changes with time.[8] Recently, Granger causality has been applied to address some of these issues. Put plainly, one examines how best to predict the future of a given neuron: using either the entire ensemble, or the entire ensemble excluding a certain target neuron. If the prediction is made worse by excluding the target neuron, then the target neuron is said to have a "G-causal" relationship with the neuron being predicted.

#### Extensions to point process models

Previous Granger-causality methods could only operate on continuous-valued data, so analyzing neural spike-train recordings required transformations that altered the stochastic properties of the data and thereby undermined the validity of the conclusions that could be drawn from it. Recently, however, a general-purpose Granger-causality framework was proposed that can operate directly on any modality, including neural spike trains.[8]

Neural spike-train data can be modeled as a point process. A temporal point process is a stochastic time series of binary events occurring in continuous time: at each point in time it takes one of two values, indicating whether or not an event has occurred. This binary-valued representation suits the activity of neural populations because a single neuron's action potential has a typical waveform; what carries the information a neuron outputs is the occurrence of a "spike", together with the times between successive spikes. On this view, the flow of information in a neural network can be abstracted to the spiking times of each neuron over an observation period. A point process can be represented by the spike times themselves, by the waiting times between spikes, by a counting process, or, if time is discretized finely enough that each window can contain at most one event, as a binary sequence of 1s and 0s.
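The binary discretization just described can be illustrated in a few lines of Python (`bin_spikes` is a hypothetical helper name; the bin width must be chosen small enough that no bin receives two spikes):

```python
def bin_spikes(spike_times, t_end, dt):
    """Discretize spike times (in seconds) into a binary sequence: bin i
    covers [i*dt, (i+1)*dt) and holds 1 if a spike occurred there, else 0."""
    n_bins = int(round(t_end / dt))
    bins = [0] * n_bins
    for t in spike_times:
        bins[min(int(t / dt), n_bins - 1)] = 1
    return bins

# Three spikes over 30 ms, binned at 1 ms: at most one event fits per bin.
train = bin_spikes([0.0012, 0.0087, 0.0251], t_end=0.03, dt=0.001)
```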

One of the simplest types of neural-spiking model is the Poisson process. It is limited, however, by being memoryless: it does not account for any spiking history when calculating the current probability of firing. Neurons, by contrast, exhibit a fundamental (biophysical) history dependence through their relative and absolute refractory periods. To address this, a conditional intensity function is used to represent the probability of a neuron spiking, conditioned on its own history. The conditional intensity function expresses the instantaneous firing probability and implicitly defines a complete probability model for the point process. It defines a probability per unit time, so if the unit of time is taken small enough that at most one spike can occur in the window, the conditional intensity function completely specifies the probability that a given neuron fires in a given time bin.
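A discrete-time sketch of such history dependence (a toy model, not the cited framework; the function name and all rate and refractory values are invented for the example) can be written with only the standard library:

```python
import random

def simulate_spike_train(n_bins, dt=0.001, rate=20.0, refractory_bins=3, seed=1):
    """Discrete-time approximation of a history-dependent point process:
    in each small bin the spike probability is lambda(t | history) * dt,
    and lambda is clamped to zero for refractory_bins bins after a spike
    (a crude absolute refractory period)."""
    random.seed(seed)
    train = []
    last_spike = -refractory_bins  # allow the neuron to fire at bin 0
    for i in range(n_bins):
        lam = 0.0 if i - last_spike < refractory_bins else rate
        spike = 1 if random.random() < lam * dt else 0
        if spike:
            last_spike = i
        train.append(spike)
    return train

# 10 seconds at 1 ms resolution; baseline rate 20 Hz.
train = simulate_spike_train(10_000)
```

Unlike a memoryless Poisson process, this model can never place two spikes within the refractory window, which is exactly the kind of structure the conditional intensity function is meant to capture.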

#### Reconstructing a sample network

One may wish, however, to account for covariates beyond a neuron's own history when analyzing neural data, such as the spiking history of surrounding neurons or the onset of some external stimulation. Using a generalized linear model (GLM), one can construct a parametric model of the conditional intensity functions of these point processes, allowing the relationships and biophysical properties thought to be relevant in a given situation to be modeled intuitively. Under the GLM, the log of the conditional intensity function is modeled as a linear combination of functions of the postulated covariates; the linearity refers to how these covariate functions are combined, not to the functions themselves. There are many families of generalized linear models, and different techniques have been proposed for fitting them efficiently, both statistically and computationally. The model order, that is, how far back one should look in a neuron's own history as well as those of its ensemble, can be chosen using a measure of statistical fit such as the Akaike information criterion.
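Under the GLM, the log of the conditional intensity is a linear combination of covariate functions; a toy illustration (all coefficient values and names here are invented for the example, not taken from the cited study):

```python
import math

def conditional_intensity(own_history, other_history, b0, b_own, b_other):
    """GLM point-process intensity: log lambda is a linear combination of
    the neuron's own recent spike history (self-suppression) and a
    neighboring neuron's history (coupling)."""
    log_lam = b0
    log_lam += sum(b * s for b, s in zip(b_own, own_history))
    log_lam += sum(b * s for b, s in zip(b_other, other_history))
    return math.exp(log_lam)

b0 = math.log(20.0)   # baseline rate of 20 Hz
b_own = [-5.0, -2.0]  # strong self-suppression (refractoriness)
b_other = [1.0]       # excitatory coupling from the neighbor

baseline = conditional_intensity([0, 0], [0], b0, b_own, b_other)
excited = conditional_intensity([0, 0], [1], b0, b_own, b_other)      # neighbor just fired
suppressed = conditional_intensity([1, 0], [0], b0, b_own, b_other)   # own recent spike
```

The log-link guarantees the intensity stays positive, and the sign of each coefficient directly encodes excitation or inhibition from that covariate.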

##### Simulation

In the original study, three simple networks of three neurons each were simulated using a simple GLM. These 9 neurons were simulated according to these simple connection graphs for 100 seconds, resulting in 100,000 samples per neuron (a 1 ms bin width).

##### Reconstruction

The raw data were then analyzed using the Granger-causality framework without supplying any information about the underlying connections. The three simple networks were analyzed as one large network, in order to verify both that directional information within each network could be recovered and that the absence of any connection between them could be detected. In other words, as far as the Granger-causality analysis is concerned, it was performed on recordings from 9 neurons over 100 seconds with nothing known about them. The neurons are numbered 1 to 9, with network A comprising neurons 1–3, network B neurons 4–6, and network C neurons 7–9.

After the full Granger-causality analysis is performed to estimate the relative strengths and directions of interaction in the network, the result (shown as a figure in the original study) is a map of the relative strengths of the causal interactions between neurons, estimating the extent to which each "causal source" neuron affects each "causal sink" neuron compared to the other interconnections.

This estimate alone involves no statistical significance calculation, so hypothesis testing was then performed with the false discovery rate controlled at 1%. In the resulting figure, red, blue, and green denote excitatory, inhibitory, and absent interactions, respectively, from causal source neuron to causal sink. In other words, reading the bottom row as the start of an arrow: a blue square in a column indicates an inhibitory directed arrow, a red square an excitatory directed arrow, and a green square no connection.

With this thresholding, the networks are indeed recovered correctly.

## References

1. ^ Granger, C. W. J. (1969). "Investigating Causal Relations by Econometric Models and Cross-spectral Methods". Econometrica 37 (3): 424–438. doi:10.2307/1912791. JSTOR 1912791.
2. ^ Diebold, Francis X. (2001). Elements of Forecasting (2nd ed.). Cincinnati: South Western. p. 254. ISBN 0-324-02393-6.
3. ^ Granger, Clive W. J. (2003). Nobel Prize Lecture, December 8, 2003. http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/2003/granger-lecture.pdf
4. ^ Hacker, R. S.; Hatemi-J, A. (2006). "Tests for causality between integrated variables using asymptotic and bootstrap distributions: theory and application". Applied Economics 38 (13): 1489–1500.
5. ^ Mandelbrot, Benoit (1963). "The variation of certain speculative prices". Journal of Business 36 (1): 394–419. doi:10.1086/294632.
6. ^ Hatemi-J, A. (2012). "Asymmetric causality tests with an application". Empirical Economics 43 (1): 447–456.
7. ^ Knight, R. T. (2007). "Neural Networks Debunk Phrenology". Science 316 (5831): 1578–1579. doi:10.1126/science.1144677.
8. ^ a b Kim, S.; Putrino, D.; Ghosh, S.; Brown, E. N. (2011). "A Granger Causality Measure for Point Process Models of Ensemble Neural Spiking Activity". PLoS Comput Biol 7 (3): e1001110. doi:10.1371/journal.pcbi.1001110.