# Cointegration

Cointegration is a statistical property of time series variables. Two or more time series are cointegrated if they share a common stochastic drift.

## Introduction

If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated (I(1)) but some (cointegrating) vector of coefficients exists to form a stationary linear combination of them. For instance, a stock market index and the price of its associated futures contract move through time, each roughly following a random walk. The hypothesis that there is a statistically significant connection between the futures price and the spot price can then be tested by testing for the existence of a cointegrated combination of the two series. If such a combination has a low order of integration, in particular if it is I(0), this can signify an equilibrium relationship between the original series.
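The definition can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the coefficient `beta = 2.0`, the sample size, the seed, and the helper `variance` are all hypothetical choices made for illustration, and the data are simulated rather than taken from any market.

```python
import random


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(0)   # fixed seed so the run is repeatable
n = 2000         # assumed sample size
beta = 2.0       # hypothetical cointegrating coefficient

# x_t is a random walk, i.e. an I(1) series.
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0.0, 1.0))

# y_t shares the stochastic trend of x_t and adds stationary noise,
# so y_t is also I(1).
y = [beta * xt + random.gauss(0.0, 1.0) for xt in x]

# The linear combination y_t - beta * x_t removes the common trend.
spread = [yt - beta * xt for xt, yt in zip(x, y)]

print("variance of x:     ", round(variance(x), 2))
print("variance of y:     ", round(variance(y), 2))
print("variance of spread:", round(variance(spread), 2))
```

Both simulated series wander without bound, while the spread stays close to zero with roughly constant variance. That is the behaviour the I(0) linear combination in the definition describes.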

Before the 1980s, many economists used linear regressions on (de-trended) non-stationary time series data, which Nobel laureate Clive Granger and Paul Newbold showed to be a dangerous approach that could produce spurious correlation,[1] [2] since standard detrending techniques can result in data that are still non-stationary.[3] Granger's 1987 paper with Nobel laureate Robert Engle formalized the cointegrating vector approach and coined the term.[4]

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).[1]

The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had initially been differenced. This method is incorrect if the non-stationary variables are cointegrated, because differencing discards the long-run relationship between the levels of the series. Cointegration measures may be calculated over sets of time series using fast routines.[5]

## Test

The main methods for testing for cointegration include:

### Engle–Granger two-step method

If two time series $x_t$ and $y_t$ are cointegrated, a linear combination of them must be stationary. In other words:

$y_t - \beta x_t = u_t \,$

where $u_t$ is stationary.

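If $\beta$ were known, $u_t$ could be computed directly and tested for stationarity with, for example, a Dickey–Fuller or Phillips–Perron test. Because $\beta$ is generally unknown, it must first be estimated, usually by ordinary least squares, and the stationarity test is then run on the estimated residual series, often denoted $\hat{u}_t$. This is the Engle–Granger two-step method. Because $\hat{u}_t$ comes from an estimated regression, the test statistic must be compared with critical values tabulated for residual-based tests rather than with the standard Dickey–Fuller values.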
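The two steps can be sketched in plain Python. This is an illustrative sketch on simulated data, not a reference implementation. The helpers `ols` and `dickey_fuller_t` are hypothetical names, the regression includes an intercept (an assumption that goes beyond the equation above), and the Dickey–Fuller regression uses no lagged differences.

```python
import random


def ols(x, y):
    """Slope and intercept of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def dickey_fuller_t(u):
    """t-statistic on rho in du_t = rho * u_{t-1} + e_t (no constant, no lags)."""
    lagged = u[:-1]
    diffs = [u[t] - u[t - 1] for t in range(1, len(u))]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, diffs)) / sxx
    resid = [d - rho * v for v, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in resid) / (len(diffs) - 1)
    return rho / (s2 / sxx) ** 0.5


# Simulated cointegrated pair (all parameters are illustrative assumptions).
random.seed(1)
n = 500
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0.0, 1.0))
y = [1.0 + 2.0 * xt + random.gauss(0.0, 1.0) for xt in x]

# Step 1: estimate the cointegrating regression by OLS and form residuals.
beta_hat, alpha_hat = ols(x, y)
u_hat = [yt - alpha_hat - beta_hat * xt for xt, yt in zip(x, y)]

# Step 2: test the residuals for a unit root.
t_stat = dickey_fuller_t(u_hat)

# Residual-based critical values are more negative than the standard
# Dickey-Fuller ones. The asymptotic 5% value for two variables with a
# constant is roughly -3.34 in MacKinnon's tables; treat this figure as
# an assumption to verify against a published table.
print("beta_hat:", round(beta_hat, 3))
print("DF t-statistic on residuals:", round(t_stat, 2))
print("reject 'no cointegration' at about 5%:", t_stat < -3.34)
```

In applied work the second step usually includes lagged differences (an augmented Dickey–Fuller regression) to absorb serial correlation in the residuals. They are left out here to keep the sketch short.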

### Johansen test

The Johansen test is a test for cointegration that allows for more than one cointegrating relationship, unlike the Engle–Granger method. However, the test relies on asymptotic (large-sample) properties: if the sample size is too small, the results will not be reliable, and autoregressive distributed lag (ARDL) models should be used instead.[6] [7]
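For orientation, the Johansen procedure is usually stated in terms of a vector error correction model. The notation here is not taken from the text above and follows the common textbook presentation: $\mathbf{z}_t$ is the vector of series under study, $k$ is the lag order of the underlying vector autoregression, and deterministic terms are omitted for brevity.

$\Delta \mathbf{z}_t = \Pi \mathbf{z}_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \Delta \mathbf{z}_{t-i} + \varepsilon_t \,$

The rank $r$ of the matrix $\Pi$ equals the number of linearly independent cointegrating relationships. The test examines hypotheses about $r$, which is why it can detect more than one such relationship.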

### Notes

In practice, cointegration is often used for two I(1) series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects). Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated at different orders.

However, these tests for cointegration assume that the cointegrating vector is constant during the period of study. In reality, the long-run relationship between the underlying variables may change, that is, shifts in the cointegrating vector can occur. Possible reasons include technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments. Such shifts are especially likely if the sample period is long. To take this issue into account, tests have been introduced for cointegration with one unknown structural break,[8] and tests for cointegration with two unknown breaks are also available.[9]

## Stochastic trends

Cointegration has become an important property in contemporary time series analysis for the following reasons. Time series often have trends, either deterministic or stochastic. The R-squared statistic used in assessing the adequacy of regressions gives substantially misleading results for time series with trends. To verify this, pick any consumption series for any country and regress it against GNP for any other country. You will very likely find a strong correlation and a regression with a very high R-squared. This is called spurious regression: even though there is no relationship between the two series, the regression results suggest that there is a strong relationship.

When both series have deterministic trends, the problem can be solved by detrending the series prior to running the regression. In a seminal paper, Nelson and Plosser (1982) showed that many macroeconomic time series have stochastic trends; such series are also called unit root processes, or processes integrated of order 1, I(1). For I(1) processes, Granger and Newbold showed that detrending does not eliminate the problem of spurious regression. A superior alternative is to check for cointegration. Two series with I(1) trends can be cointegrated only if there is a genuine relationship between the two.

Thus the standard current methodology for time series regressions is as follows. Check all series involved for integration. If there are I(1) series on both sides of the regression relationship, the regression may give misleading results, so check for cointegration between all the I(1) series. If cointegration holds, the regression results are not spurious.
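The spurious regression effect described above can be reproduced by simulation. The sketch below regresses pairs of independent random walks on each other, first in levels and then in first differences, and compares the average R-squared. The helper names (`r_squared`, `random_walk`, `diff`), the sample size, the number of trials, and the seed are all illustrative assumptions.

```python
import random


def r_squared(x, y):
    """R-squared of the simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def random_walk(n):
    """Cumulative sum of n independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += random.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


random.seed(2)
n, trials = 200, 200
r2_levels, r2_diffs = [], []
for _ in range(trials):
    # The two walks are independent by construction: no true relationship.
    x, y = random_walk(n), random_walk(n)
    r2_levels.append(r_squared(x, y))
    r2_diffs.append(r_squared(diff(x), diff(y)))

mean_levels = sum(r2_levels) / trials
mean_diffs = sum(r2_diffs) / trials
print("mean R-squared, levels:     ", round(mean_levels, 3))
print("mean R-squared, differences:", round(mean_diffs, 3))
```

The regressions in levels show a sizeable average R-squared even though the series are unrelated, while the regressions in differences show almost none. A cointegration test is designed to separate this situation from one in which the levels are genuinely linked.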

## References

1. Granger, C.; Newbold, P. (1974). "Spurious Regressions in Econometrics". Journal of Econometrics 2 (2): 111–120. doi:10.1016/0304-4076(74)90034-7.
2. Mahdavi Damghani, Babak (2012). "The Misleading Value of Measured Correlation". Wilmott 2012 (1): 64–73. doi:10.1002/wilm.10167.
3. Granger, Clive (1981). "Some Properties of Time Series Data and Their Use in Econometric Model Specification". Journal of Econometrics 16 (1): 121–130. doi:10.1016/0304-4076(81)90079-8.
4. Engle, Robert F.; Granger, Clive W. J. (1987). "Co-integration and error correction: Representation, estimation and testing". Econometrica 55 (2): 251–276. JSTOR 1913236.
5. Yang, Michael (August 29, 2013). "A Patch for scipy.spatial.distance for cointegration". Retrieved 29 August 2013.
6. Giles, David. "ARDL Models - Part II - Bounds Tests". Retrieved 4 August 2014.
7. Pesaran, M.H.; Shin, Y.; Smith, R.J. (2001). "Bounds testing approaches to the analysis of level relationships". Journal of Applied Econometrics 16: 289–326. doi:10.1002/jae.616.
8. Gregory, Allan W.; Hansen, Bruce E. (1996). "Residual-based tests for cointegration in models with regime shifts". Journal of Econometrics 70 (1): 99–126. doi:10.1016/0304-4076(69)41685-7.
9.