Lee–Carter model

The Lee–Carter model is a numerical algorithm used in mortality forecasting and life expectancy forecasting.[1] The input to the model is a matrix of age-specific mortality rates ordered chronologically, usually with ages in columns and years in rows. The output is a matrix of forecast mortality rates with the same layout.

The model uses the singular value decomposition (SVD) to find three quantities: a univariate time series vector "kt" that captures 80–90% of the mortality trend (the subscript "t" refers to time); a vector "bx" that describes the amount of mortality change at a given age for a unit change in the overall level of mortality (the subscript "x" refers to age); and a scaling constant, the first singular value (referred to here as s1 but unnamed in the literature). Before being input to the SVD, age-specific mortality rates are transformed into "ax,t" by taking their logarithms and then centering them, that is, subtracting their age-specific means calculated over time. (The subscript "x,t" indicates that ax,t spans both age and time; the vector of subtracted means, indexed by age alone, is written "ax".) In practice kt is usually close to linear, implying that gains to life expectancy are fairly constant year after year in most populations. Many researchers adjust the kt vector by fitting it to empirical life expectancies for each year, using the ax and bx just obtained; when adjusted using this approach, changes to kt are usually small.
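As a rough illustration of the decomposition described above, the following Python sketch applies NumPy's SVD to a small synthetic rate matrix. The data, the variable names, and the sign convention are assumptions made for this example and are not part of the published method.

```python
import numpy as np

# Synthetic input (made up for illustration): 6 years in rows, 4 age groups in columns.
base = np.array([0.002, 0.005, 0.02, 0.08])      # hypothetical starting rates by age
decline = np.array([0.03, 0.02, 0.015, 0.01])    # hypothetical yearly decline in log rates
t = np.arange(6)[:, None]
mort = base * np.exp(-decline * t)               # shape (years, ages)

# Take logarithms, then center each age column on its mean over time.
log_m = np.log(mort)
a_x = log_m.mean(axis=0)                         # age-specific mean log rates
centered = log_m - a_x                           # the matrix passed to the SVD

# Reduced SVD: centered = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
s1 = s[0]            # first singular value (the scaling constant)
k_t = U[:, 0]        # time index, one value per year
b_x = Vt[0, :]       # age pattern, one value per age group

# Singular vectors are only defined up to sign. Assumed convention for this
# sketch: make b_x sum to a positive number so a falling k_t means falling mortality.
if b_x.sum() < 0:
    k_t, b_x = -k_t, -b_x

rank_one = s1 * np.outer(k_t, b_x)               # rank-one approximation of `centered`
print(np.abs(rank_one - centered).max())
```

Because the synthetic log rates decline exactly linearly, the rank-one product reproduces the centered matrix up to rounding error and kt is a straight line; real data leave a residual.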

To forecast mortality, the above kt (either adjusted or not) is projected into the future using ARIMA time series methods. Because kt is close to linear, it is generally modeled as a random walk with drift. The corresponding future ax,t+n is recovered by multiplying kt+n by bx and by the appropriate diagonal element of S (where [U S V] = svd(mort), with mort denoting the centered log mortality matrix). Adding back the age-specific means and taking exponentials then yields regular mortality rates, from which life expectancy and other life table measures can be calculated.
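The forecasting step can be sketched in Python as follows, assuming the simplest common choice of a random walk with drift (an ARIMA(0,1,0) model with a constant). All numeric values below are hypothetical placeholders rather than fitted results, and the variable names are chosen for this example only.

```python
import numpy as np

# Hypothetical fitted quantities (illustrative numbers, not from real data).
a_x = np.log(np.array([0.002, 0.005, 0.02, 0.08]))   # age-specific mean log rates
b_x = np.array([0.7, 0.5, 0.4, 0.3])                 # age pattern
k_t = np.array([0.5, 0.3, 0.1, -0.1, -0.3, -0.5])    # historical time index
s1 = 0.1                                             # scaling constant

# Random walk with drift: the drift estimate is the mean first difference of k_t.
drift = np.diff(k_t).mean()

# Point forecast n steps ahead: last observed value plus n times the drift.
horizon = 3
steps = np.arange(1, horizon + 1)
k_future = k_t[-1] + drift * steps

# Rebuild centered log rates, add back the age means, then exponentiate.
log_rates = a_x + s1 * np.outer(k_future, b_x)       # shape (horizon, ages)
rates = np.exp(log_rates)
print(rates)
```

Each row of `rates` is one forecast year and each column one age group, matching the years-in-rows layout of the input.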

In most implementations, confidence intervals for the forecasts are generated by simulating multiple mortality forecasts using Monte Carlo methods; the band of mortality between the 5th and 95th percentiles of the simulated results is taken as the forecast interval. These simulations extend kt into the future with random shocks scaled by the standard error of kt derived from the input data.
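A minimal Python sketch of such a simulation is given below. It assumes a random walk with drift for kt and draws normally distributed shocks with the sample standard deviation of the historical first differences; this captures only innovation uncertainty and ignores uncertainty in the drift estimate. The kt values, horizon, and number of simulations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

# Hypothetical historical time index (illustrative numbers only).
k_t = np.array([0.52, 0.28, 0.11, -0.08, -0.31, -0.52])

# Drift and shock size estimated from the first differences of k_t.
diffs = np.diff(k_t)
drift = diffs.mean()
sigma = diffs.std(ddof=1)

# Simulate many future paths: each step adds the drift plus a random shock.
horizon, n_sims = 5, 2000
shocks = rng.normal(0.0, sigma, size=(n_sims, horizon))
paths = k_t[-1] + np.cumsum(drift + shocks, axis=1)   # shape (n_sims, horizon)

# 5th and 95th percentiles across simulations, one pair per forecast year.
lower, upper = np.percentile(paths, [5, 95], axis=0)
print(lower)
print(upper)
```

Each simulated kt path can then be converted to mortality rates in the same way as a point forecast, and the percentiles taken over the resulting rates or life expectancies.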

In outline and Matlab-style pseudocode, the algorithm is as follows:

  1. Create ax,t by taking logarithms of the mortality rates and centering the results by subtracting the average log mortality at each age (ax).
  2. Derive kt, the scaling singular value s1, and bx from U(:,1), S(1,1), and V(:,1), where [U S V] = svd(mort) and mort is the centered matrix ax,t with years in rows.
  3. Forecast kt with standard univariate ARIMA methods.
  4. Use the forecast kt with the original s1, bx, and ax to calculate log mortality rates for each forecast year.
  5. Recover regular mortality rates by calculating the exponential of the forecast log mortality rates.

Without SVD or some other method of dimension reduction, the table of mortality data is a highly correlated multivariate data series, and the complexity of such multidimensional time series makes them almost impossible to forecast. SVD has become widely used as a method of dimension reduction in many disparate fields; closely related eigenvector computations underlie Google's PageRank algorithm.
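One way to see the dimension reduction at work is to compute how much of the total variation each singular value accounts for. The Python sketch below does this on synthetic data consisting of one shared trend plus small noise; the data and noise level are assumptions for illustration, so the resulting shares say nothing about any real population.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the sketch is repeatable

# Synthetic centered log rates: 20 years x 4 ages, one shared trend plus noise.
years = np.arange(20)
age_slopes = np.array([0.03, 0.02, 0.015, 0.01])     # hypothetical decline by age
signal = -np.outer(years - years.mean(), age_slopes)
noise = rng.normal(0.0, 0.01, size=signal.shape)     # hypothetical noise level
centered = signal + noise
centered -= centered.mean(axis=0)                    # re-center columns after adding noise

# Singular values only; their squares split the total sum of squares.
s = np.linalg.svd(centered, compute_uv=False)
share = s**2 / np.sum(s**2)
print(np.round(share, 3))
```

When the first share is close to one, a single time series kt summarizes nearly all of the movement in the table, which is what makes a univariate forecast feasible.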

The Lee–Carter model was introduced by Ronald D. Lee and Lawrence Carter in 1992 with the article "Modeling and Forecasting the Time Series of U.S. Mortality," (Journal of the American Statistical Association 87 (September): 659–671).[2] The model grew out of their work in the late 1980s and early 1990s attempting to use inverse projection to infer rates in historical demography.[3] The model has been used by the United States Social Security Administration, the US Census Bureau, and the United Nations. It has become the most widely used mortality forecasting technique in the world today.[4]

There have been extensions to the Lee–Carter model, most notably to account for missing years, correlated male and female populations, and large-scale coherency in populations that share a mortality regime (western Europe, for example). Many related papers can be found on Professor Ronald Lee's website.

Relatively few software packages exist for forecasting with the Lee–Carter model. LCFIT is a web-based package with interactive forms. Professor Rob J. Hyndman provides an R package for demography that includes routines for creating and forecasting a Lee–Carter model. Professor German Rodriguez provides code for the Lee–Carter model using Stata.

References