Autoregressive integrated moving average

From Wikipedia, the free encyclopedia

In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). They are applied in some cases where data show evidence of non-stationarity; in such cases an initial differencing step (corresponding to the "integrated" part of the model) can be applied to remove the non-stationarity.

The model is generally referred to as an ARIMA(p,d,q) model, where the parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively. ARIMA models form an important part of the Box–Jenkins approach to time-series modelling.

When one of the three terms is zero, it is usual to drop "AR", "I" or "MA" from the acronym describing the model. For example, ARIMA(0,1,0) is I(1), and ARIMA(0,0,1) is MA(1).

Definition

Given a time series of data X_t, where t is an integer index and the X_t are real numbers, an ARMA(p', q) model is given by:


\left( 1 - \sum_{i=1}^{p'} \alpha_i L^i \right) X_t = \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \varepsilon_t \,

where L is the lag operator, the \alpha_i are the parameters of the autoregressive part of the model, the \theta_i are the parameters of the moving average part and the \varepsilon_t are error terms. The error terms \varepsilon_t are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.
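As a concrete illustration (not part of the original article), the ARMA recursion above can be simulated directly. This is a minimal sketch assuming NumPy and i.i.d. standard normal errors; the function name and coefficient values are illustrative:

```python
import numpy as np

def simulate_arma(alpha, theta, n, seed=0):
    """Simulate an ARMA(p, q) process X_t with i.i.d. standard normal errors.

    alpha: AR parameters (alpha_1 .. alpha_p), theta: MA parameters (theta_1 .. theta_q).
    """
    rng = np.random.default_rng(seed)
    p, q = len(alpha), len(theta)
    eps = rng.standard_normal(n + q)   # error terms, with q presample values
    x = np.zeros(n + p)                # series, with p presample zeros
    for t in range(n):
        ar = sum(alpha[i] * x[p + t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[q + t - 1 - j] for j in range(q))
        x[p + t] = ar + ma + eps[q + t]
    return x[p:]

# An ARMA(1,1) with alpha_1 = 0.5 and theta_1 = 0.3:
x = simulate_arma([0.5], [0.3], 500)
```

Because the AR polynomial 1 - 0.5L has its root outside the unit circle, the simulated series fluctuates around zero rather than wandering off.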

Assume now that the polynomial \left( 1 - \sum_{i=1}^{p'} \alpha_i L^i \right) has a unit root of multiplicity d. Then it can be rewritten as:


\left( 1 - \sum_{i=1}^{p'} \alpha_i L^i \right) = \left( 1 - \sum_{i=1}^{p'-d} \phi_i L^i \right) \left( 1 - L \right)^{d} .

An ARIMA(p,d,q) process expresses this polynomial factorisation property with p=p'−d, and is given by:


\left( 1 - \sum_{i=1}^{p} \phi_i L^i \right) \left( 1 - L \right)^{d} X_t = \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \varepsilon_t \,

and thus can be thought of as a particular case of an ARMA(p+d, q) process whose autoregressive polynomial has d unit roots. (For this reason, no ARIMA model with d > 0 is wide-sense stationary.)
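The factorisation can be checked numerically by multiplying the two polynomial factors, here using NumPy's coefficient convolution (a sketch; the value phi_1 = 0.5 is illustrative):

```python
import numpy as np

# AR polynomial of an ARIMA(1,1,0) with phi_1 = 0.5:
# (1 - 0.5 L)(1 - L) should expand to 1 - 1.5 L + 0.5 L^2,
# i.e. the AR polynomial of an ARMA(2, q) with one unit root.
stationary_part = np.array([1.0, -0.5])   # coefficients of 1 - 0.5 L (ascending powers)
difference_part = np.array([1.0, -1.0])   # coefficients of 1 - L
full_ar_poly = np.convolve(stationary_part, difference_part)
print(full_ar_poly)                        # [ 1.  -1.5  0.5]

# The unit root shows up directly: L = 1 is a zero of the expanded polynomial.
assert abs(np.polyval(full_ar_poly[::-1], 1.0)) < 1e-12
```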

The above can be generalized as follows.


\left( 1 - \sum_{i=1}^{p} \phi_i L^i \right) \left( 1 - L \right)^{d} X_t = \delta + \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \varepsilon_t \,

This defines an ARIMA(p,d,q) process with drift \delta / \left( 1 - \sum_{i=1}^{p} \phi_i \right).
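The drift formula can be verified by simulation (a sketch assuming NumPy; the parameter values are illustrative). For an ARIMA(1,1,0) with constant δ, the differenced series Y_t = (1 - L) X_t follows Y_t = δ + φ Y_{t-1} + ε_t, whose long-run mean, the average step of X_t per period, is δ/(1 - φ):

```python
import numpy as np

rng = np.random.default_rng(42)
phi, delta, n = 0.5, 1.0, 100_000
eps = rng.standard_normal(n)

# Simulate the differenced series Y_t = delta + phi * Y_{t-1} + eps_t.
y = np.zeros(n)
for t in range(1, n):
    y[t] = delta + phi * y[t - 1] + eps[t]

x = np.cumsum(y)   # integrate the differences back to the level series X_t

print(y.mean())    # close to the drift delta / (1 - phi) = 2.0
```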

Other special forms

The explicit identification of the factorisation of the autoregressive polynomial into factors as above can be extended to other cases, firstly to apply to the moving average polynomial and secondly to include other special factors. For example, having a factor \left( 1 - L^s \right) in a model is one way of including non-stationary seasonality of period s in the model; this factor has the effect of re-expressing the data as changes from s periods ago. Another example is the factor \left( 1 - \sqrt{3} L + L^2 \right), whose roots lie on the unit circle at angles ±π/6 and which therefore introduces a non-stationary cyclic component with a period of 12 observations. The effect of the first type of factor is to allow each season's value to drift separately over time, whereas with the second type values for adjacent seasons move together.

Identification and specification of appropriate factors in an ARIMA model can be an important step in modelling, as it can reduce the overall number of parameters to be estimated while imposing on the model types of behaviour that logic and experience suggest should be there.
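The effect of the seasonal factor \left( 1 - L^s \right) is easy to see numerically. In this sketch (assuming NumPy; the data are synthetic), a series with a linear trend and an exact period-12 seasonal pattern is reduced to a constant by seasonal differencing:

```python
import numpy as np

s = 12                                     # seasonal period (e.g. monthly data)
t = np.arange(5 * s)
pattern = np.sin(2 * np.pi * t / s)        # period-12 seasonal component
x = 0.5 * t + 10 * pattern                 # trend + seasonality

# Apply the factor (1 - L^12): changes from 12 periods ago.
seasonal_diff = x[s:] - x[:-s]

# The seasonal component cancels; only the trend increment over 12 periods
# remains, so every entry is (up to floating point) 0.5 * 12 = 6.0.
print(seasonal_diff[:3])
```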

Forecasts using ARIMA models

The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:


Y_t = \left( 1 - L \right)^{d} X_t

while the second is wide-sense stationary:


\left( 1 - \sum_{i=1}^{p} \phi_i L^i \right) Y_t = \left( 1 + \sum_{i=1}^{q} \theta_i L^i \right) \varepsilon_t \, .

Now forecasts can be made for the process Y_t, using a generalization of the method of autoregressive forecasting.
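The cascade can be sketched end-to-end in a few lines (an illustration assuming NumPy, not a production forecaster): difference X_t once (d = 1) to obtain a stationary Y_t, fit an AR(1) to Y_t by least squares, iterate the AR recursion forward, then undo the differencing by cumulative summation:

```python
import numpy as np

# Build an ARIMA(1,1,0) realization: an AR(1) in differences, then integrate.
rng = np.random.default_rng(0)
n, phi = 1000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()
x = np.cumsum(y)

# Stage 1: recover the stationary series Y_t = (1 - L) X_t.
y_obs = np.diff(x)

# Stage 2: least-squares AR(1) fit, then iterate the recursion forward.
phi_hat = np.dot(y_obs[1:], y_obs[:-1]) / np.dot(y_obs[:-1], y_obs[:-1])
h = 5                                  # forecast horizon
y_fc = [y_obs[-1]]
for _ in range(h):
    y_fc.append(phi_hat * y_fc[-1])

# Undo the differencing: cumulative sums turn Y-forecasts into X-forecasts.
x_fc = x[-1] + np.cumsum(y_fc[1:])

print(phi_hat)                         # close to the true value 0.7
```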

Examples

Some well-known special cases arise naturally. For example, an ARIMA(0,1,0) model (or I(1) model) is given by

X_t = X_{t-1} + \varepsilon_t

—which is simply a random walk.
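This case is simple enough to check numerically (a sketch assuming NumPy): a random walk is the cumulative sum of its errors, and differencing it once recovers those errors exactly.

```python
import numpy as np

# ARIMA(0,1,0): X_t = X_{t-1} + eps_t, i.e. a random walk.
rng = np.random.default_rng(1)
eps = rng.standard_normal(1000)
x = np.cumsum(eps)                        # the walk, built from its errors

# Differencing inverts the integration: (1 - L) X_t = eps_t for t >= 1.
assert np.allclose(np.diff(x), eps[1:])
```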

Variations and extensions

A number of variations on the ARIMA model are commonly employed. If multiple time series are used, then the X_t can be thought of as vectors and a VARIMA model may be appropriate. Sometimes a seasonal effect is suspected in the model; in that case, it is generally considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model. If the time series is suspected to exhibit long-range dependence, then the d parameter may be allowed to take non-integer values in an autoregressive fractionally integrated moving average model, also called a fractional ARIMA (FARIMA or ARFIMA) model.
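For non-integer d, the operator (1 - L)^d is defined through its binomial expansion, (1 - L)^d = Σ_k π_k L^k, with π_0 = 1 and the standard recursion π_k = π_{k-1}(k - 1 - d)/k. A short sketch of the expansion weights (the function name is illustrative; assuming NumPy):

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n binomial-expansion weights of (1 - L)^d.

    Uses the recursion pi_k = pi_{k-1} * (k - 1 - d) / k with pi_0 = 1.
    """
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return np.array(w)

# With integer d the expansion terminates: d = 1 recovers (1 - L).
print(frac_diff_weights(1.0, 4))

# With fractional d (long-range dependence) the weights decay slowly
# instead of cutting off.
print(frac_diff_weights(0.4, 4))
```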

Implementations in statistics packages

Various statistical packages implement Box–Jenkins-style methods for selecting and estimating the parameters of an ARIMA model.

  • In R, the standard stats package includes an arima function, which is documented in "ARIMA Modelling of Time Series". Besides the ARIMA(p,d,q) part, the function also handles seasonal factors, an intercept term, and exogenous variables (xreg, called "external regressors"). The CRAN task view on Time Series contains many more links. The "forecast" package in R can automatically select an ARIMA model for a given time series with the auto.arima() function, and can also simulate seasonal and non-seasonal ARIMA models with its simulate.Arima() function. It also has a function Arima(), a wrapper for arima from the "stats" package.
  • IBM SPSS includes ARIMA modeling in its Statistics and Modeler statistical packages. The default Expert Modeler feature evaluates a range of seasonal and non-seasonal autoregressive (p), integrated (d), and moving average (q) settings and seven exponential smoothing models to determine which model produces the smallest normalized Bayesian Information Criterion (BIC) value, a goodness-of-fit measure. The Expert Modeler will also transform the target time-series data into its square root or natural log if it lowers the BIC with the best fit model. The user also has the option to restrict the Expert Modeler to ARIMA models, or to manually enter ARIMA nonseasonal and seasonal p, d, and q settings without Expert Modeler. Automatic outlier detection is available for seven types of outliers, and the detected outliers will be accommodated in the time-series model if this feature is selected.
  • The APO-FCS package[1] in SAP ERP from SAP allows creation and fitting of ARIMA models using the Box-Jenkins methodology.
  • SAS includes extensive ARIMA processing in its Econometric and Time Series Analysis system: SAS/ETS.
  • Stata includes ARIMA modelling (using its arima command) as of Stata 9.
  • SQL Server Analysis Services from Microsoft includes ARIMA as a Data Mining algorithm.
  • Mathematica includes the ARIMAProcess function.
  • EViews has extensive ARIMA and SARIMA capabilities.
  • MATLAB's Econometrics Toolbox includes ARIMA models and regression with ARIMA errors.

References

  1. ^ "Box Jenkins model". SAP. Retrieved 8 March 2013. 

Further reading

  • Asteriou, Dimitros; Hall, Stephen G. (2011). "ARIMA Models and the Box–Jenkins Methodology". Applied Econometrics (Second ed.). Palgrave Macmillan. pp. 265–286. ISBN 978-0-230-27182-1.
  • Mills, Terence C. (1990). Time Series Techniques for Economists. Cambridge University Press. ISBN 0-521-34339-9. 
  • Percival, Donald B.; Walden, Andrew T. (1993). Spectral Analysis for Physical Applications. Cambridge University Press. ISBN 0-521-35532-X. 
