Generalized additive model for location, scale and shape

From Wikipedia, the free encyclopedia

In statistics, the generalized additive model for location, scale and shape (GAMLSS) is a class of statistical models that extends the simpler generalized linear models and generalized additive models. Those simpler models relate the typical value of the quantity being modelled to whatever explanatory variables are available. Here the "typical value" is, more formally, a location parameter, which describes only a limited aspect of the probability distribution of the dependent variable. The GAMLSS approach allows other parameters of the distribution to be related to the explanatory variables as well. These other parameters are often interpreted as scale and shape parameters of the distribution, although the approach is not limited to such parameters.

Overview of the model

The generalized additive model for location, scale and shape (GAMLSS) is a statistical model developed by Rigby and Stasinopoulos,[citation needed] and later expanded,[1] to overcome some of the limitations of the popular generalized linear models (GLMs) and generalized additive models (GAMs).[2]

In GAMLSS, the exponential family assumption for the distribution of the response variable y, which is essential in GLMs and GAMs, is relaxed and replaced by a general distribution family that includes highly skewed and/or kurtotic continuous and discrete distributions.

The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of other parameters of the distribution of y, as linear and/or nonlinear, parametric and/or additive non-parametric functions of explanatory variables and/or random effects.

GAMLSS is especially suited to modelling response variables that are leptokurtic or platykurtic and/or positively or negatively skewed. For count-type response data, it deals with over-dispersion by using suitable over-dispersed discrete distributions. Heterogeneity is also dealt with by modelling the scale or shape parameters using explanatory variables. There are several packages written in R related to GAMLSS models.[3]
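The remark on over-dispersed counts can be illustrated with a small numerical sketch. It compares a Poisson distribution, whose variance must equal its mean, with a negative binomial distribution that has an extra parameter controlling the variance. The parameterization used here (variance equal to mu + sigma * mu**2), the function names and the numerical values are assumptions made for this illustration only; they are not taken from any particular GAMLSS implementation.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu (variance equals mu)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y, mu, sigma):
    """Negative binomial probability of count y, parameterized (as an
    assumption for this sketch) so that the mean is mu and the variance
    is mu + sigma * mu**2."""
    size = 1.0 / sigma
    p = size / (size + mu)
    log_pmf = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
               + size * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)

def mean_and_variance(pmf, upper=500):
    """Moments by direct summation over 0..upper-1 (tail mass is negligible here)."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, sigma = 4.0, 0.5  # hypothetical values, chosen only for illustration
pois_mean, pois_var = mean_and_variance(lambda y: poisson_pmf(y, mu))
nb_mean, nb_var = mean_and_variance(lambda y: negbin_pmf(y, mu, sigma))
print(pois_mean, pois_var)  # both close to mu
print(nb_mean, nb_var)      # mean close to mu, variance close to mu + sigma * mu**2
```

Both distributions share the same mean, but only the second can represent counts whose variance exceeds the mean. Letting sigma itself depend on explanatory variables is what the text describes as modelling the scale parameter.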

A GAMLSS model assumes independent observations y_i for i = 1, 2, \dots , n with probability (density) function f (y_i | \mu_i , \sigma_i , \nu_i , \tau_i ) conditional on (\mu_i , \sigma_i , \nu_i , \tau_i ), a vector of four distribution parameters, each of which can be a function of the explanatory variables. The first two population distribution parameters, \mu_i and \sigma_i, are usually characterized as location and scale parameters, while the remaining parameter(s), if any, are characterized as shape parameters, e.g. skewness and kurtosis parameters. The model may nevertheless be applied more generally to the parameters of any population distribution with up to four distribution parameters, and it can be generalized to more than four distribution parameters.



\begin{align}
g_1(\mu) &= \eta_1 = X_1 \beta_1 + \sum_{j=1}^{J_1} h_{j1}(x_{j1}) \\
g_2(\sigma) &= \eta_2 = X_2 \beta_2 + \sum_{j=1}^{J_2} h_{j2}(x_{j2}) \\
g_3(\nu) &= \eta_3 = X_3 \beta_3 + \sum_{j=1}^{J_3} h_{j3}(x_{j3}) \\
g_4(\tau) &= \eta_4 = X_4 \beta_4 + \sum_{j=1}^{J_4} h_{j4}(x_{j4})
\end{align}

where μ, σ, ν, τ and \eta_k are vectors of length n, each g_k is a known monotonic link function, \beta^{T}_k = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{J'_k k}) is a parameter vector of length J'_k, X_k is a fixed known design matrix of order n \times J'_k, and h_{jk} is a smooth non-parametric function of the explanatory variable x_{jk}, for j = 1, 2, \ldots, J_k and k = 1, 2, 3, 4.
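The structure of these predictors can be made concrete with a minimal sketch for a two-parameter case. It assumes a normal response, an identity link for the location and a log link for the scale; the helper names, the data, the coefficients and the sine curve standing in for a fitted smoother h_{jk} are all hypothetical. Nothing is estimated here: the code only evaluates the predictors and the resulting log-likelihood for given coefficients.

```python
import math

def predictor(design_rows, beta, smooth_terms=()):
    """Return eta with eta_i = x_i^T beta + sum_j h_j(x_ij).

    design_rows  -- rows of the design matrix X_k
    beta         -- coefficient vector beta_k
    smooth_terms -- pairs (h, values): a callable standing in for a fitted
                    smoother h_jk and the covariate values x_jk it acts on
    """
    eta = [sum(x * b for x, b in zip(row, beta)) for row in design_rows]
    for h, values in smooth_terms:
        eta = [e + h(v) for e, v in zip(eta, values)]
    return eta

def normal_log_likelihood(y, mu, sigma):
    """Sum of log N(y_i | mu_i, sigma_i^2) over independent observations."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((yi - m) / s) ** 2
        for yi, m, s in zip(y, mu, sigma)
    )

# Hypothetical data and coefficients, chosen only for illustration.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1.1, 1.8, 3.4, 3.7, 5.6]
X = [[1.0, xi] for xi in x]  # intercept and linear term

# Location: identity link, so mu = eta_1.
eta_1 = predictor(X, beta=[1.0, 2.0])
mu = eta_1

# Scale: log link, so sigma = exp(eta_2) is always positive.
eta_2 = predictor([[1.0] for _ in x], beta=[-1.0],
                  smooth_terms=[(lambda v: 0.5 * math.sin(v), x)])
sigma = [math.exp(e) for e in eta_2]

print(mu)
print(sigma)
print(normal_log_likelihood(y, mu, sigma))
```

The log link is a common choice for a scale parameter because it keeps sigma positive whatever value the predictor takes. A fitting routine would search over the coefficients and smoothers to make the log-likelihood (usually with a roughness penalty) as large as possible.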

For centile estimation, the WHO Multicentre Growth Reference Study Group has recommended GAMLSS and the Box-Cox power exponential (BCPE) distribution[4] for the construction of the WHO Child Growth Standards.[5][6]
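Once distribution parameters have been fitted at a given age, centiles follow by inverting the fitted distribution. The sketch below does not implement the BCPE distribution. As a plainly simpler stand-in it uses the three-parameter LMS (Box-Cox) formula of Cole and Green, in which L is the Box-Cox power, M the median and S the coefficient of variation. The function name and the parameter values are hypothetical and are not taken from any published growth standard.

```python
import math
from statistics import NormalDist

def lms_centile(p, L, M, S):
    """Centile at probability p under the LMS (Box-Cox) form.

    L -- Box-Cox power (skewness), M -- median, S -- coefficient of variation.
    """
    z = NormalDist().inv_cdf(p)  # standard normal quantile for p
    if abs(L) < 1e-12:
        # Limiting case L -> 0: log-normal form.
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

L, M, S = -0.35, 9.6, 0.11  # hypothetical values at one age
for p in (0.03, 0.15, 0.50, 0.85, 0.97):
    print(p, round(lms_centile(p, L, M, S), 3))
```

In a growth-curve application L, M and S would each be smooth functions of age, so evaluating the formula across ages traces out one centile curve per probability p.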

What distributions can be used

The form of the distribution assumed for the response variable y is very general. For example, an implementation of GAMLSS in R[7] has around 50 different distributions available. Such implementations also allow the use of truncated distributions and censored (or interval) response variables.[7]
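How truncation and censoring enter a likelihood can be sketched in a few lines. A censored observation is known only to lie in an interval, so it contributes the probability of that interval instead of a density value. A truncated distribution is the original density restricted to a range and renormalized. The standard normal distribution and the function names below are assumptions for illustration; they do not reproduce the interface of the R packages.

```python
import math
from statistics import NormalDist

def log_lik_term(dist, lower, upper):
    """Log-likelihood contribution of one observation known to lie in
    [lower, upper]. lower == upper means the value was observed exactly."""
    if lower == upper:
        return math.log(dist.pdf(lower))
    return math.log(dist.cdf(upper) - dist.cdf(lower))

def truncated_pdf(dist, y, a, b):
    """Density of dist restricted to [a, b] and renormalized to integrate to 1."""
    if y < a or y > b:
        return 0.0
    return dist.pdf(y) / (dist.cdf(b) - dist.cdf(a))

dist = NormalDist(mu=0.0, sigma=1.0)  # stand-in for any fitted response distribution

print(log_lik_term(dist, 0.3, 0.3))             # exact observation
print(log_lik_term(dist, 0.0, math.inf))        # right-censored at 0
print(log_lik_term(dist, -1.0, 1.0))            # interval-censored
print(truncated_pdf(dist, 0.3, 0.0, math.inf))  # left-truncated at 0
```

The same two functions work for any distribution object that exposes a density and a cumulative distribution function, which is why these extensions combine naturally with a large family of response distributions.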

Notes

  1. ^ Rigby, R. A.; Stasinopoulos, D. M. (2005) "Generalized additive models for location, scale and shape (with discussion)". Appl. Statist., 54 (3), pp. 507–554.
  2. ^ For an overview of these limitations see Nelder and Wedderburn, 1972[full citation needed] and Hastie and Tibshirani, 1990.[full citation needed]
  3. ^ Stasinopoulos, D. M.; Rigby, R. A. (2007) "Generalized additive models for location scale and shape (GAMLSS) in R". Journal of Statistical Software, 23 (7), December 2007.
  4. ^ Rigby, R. A.; Stasinopoulos, D. M. (2004) "Smooth centile curves for skew and kurtotic data modelled using the Box-Cox Power Exponential distribution". Preprint.
  5. ^ Borghi, E.; De Onis, M.; Garza, C.; Van Den Broeck, J.; Frongillo, E. A.; Grummer-Strawn, L.; Van Buuren, S.; Pan, H.; Molinari, L.; Martorell, R.; Onyango, A. W.; Martines, J. C.; WHO Multicentre Growth Reference Study Group (2006). "Construction of the World Health Organization child growth standards: Selection of methods for attained growth curves". Statistics in Medicine 25 (2): 247–265. doi:10.1002/sim.2227. PMID 16143968.
  6. ^ WHO Multicentre Growth Reference Study Group (2006) WHO Child Growth Standards: Length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: Methods and development. Geneva: World Health Organization.
  7. ^ a b R packages for GAMLSS can be downloaded from here

Further reading

  • Beyerlein, A., Fahrmeir, L., Mansmann, U., Toschke, A. M. (2008), "Alternative regression models to assess increase in childhood BMI", BMC Medical Research Methodology, 8(59). doi:10.1186/1471-2288-8-59
  • Cole, T. J., Stanojevic, S., Stocks, J., Coates, A. L., Hankinson, J. L., Wade, A. M. (2009), "Age- and size-related reference ranges: A case study of spirometry through childhood and adulthood", Statistics in Medicine, 28(5), 880-898.
  • Fenske, N., Fahrmeir, L., Rzehak, P., Hohle, M. (25 September 2008), "Detection of risk factors for obesity in early childhood with quantile regression methods for longitudinal data", Department of Statistics: Technical Reports, No. 38.
  • Hudson, I. L., Kim, S. W., Keatley, M. R. (2010), "Climatic Influences on the Flowering Phenology of Four Eucalypts: A GAMLSS Approach". In Phenological Research, Irene L. Hudson and Marie R. Keatley (eds), Springer Netherlands.
  • Hudson, I. L., Rea, A., Dalrymple, M. L., Eilers, P. H. C. (2008), "Climate impacts on sudden infant death syndrome: a GAMLSS approach", Proceedings of the 23rd International Workshop on Statistical Modelling, pp. 277–280.
  • Nott, D. (2006), "Semiparametric estimation of mean and variance functions for non-Gaussian data", Computational Statistics, 21(3-4), 603-620.
  • Serinaldi, F. (2011), "Distributional modeling and short-term forecasting of electricity prices by Generalized Additive Models for Location, Scale and Shape", Energy Economics, 33(6), 1216-1226, doi:10.1016/j.eneco.2011.05.001
  • Serinaldi, F., Cuomo, G. (2011) "Characterizing impulsive wave-in-deck loads on coastal bridges by probabilistic models of impact maxima and rise times", Coastal Engineering, 58(9), 908-926, doi:10.1016/j.coastaleng.2011.05.010
  • Serinaldi, F., Villarini, G., Smith, J. A., Krajewski, W. F. (2008), "Change-Point and Trend Analysis on Annual Maximum Discharge in Continental United States", American Geophysical Union Fall Meeting 2008, abstract #H21A-0803
  • van Ogtrop, F. F., Vervoort, R. W., Heller, G. Z., Stasinopoulos, D. M., Rigby, R. A. (2011) "Long-range forecasting of intermittent streamflow", Hydrology and Earth System Sciences Discussions, 8(1), 681-713. doi:10.5194/hessd-8-681-2011
  • Villarini, G., Serinaldi, F. (2011), "Development of statistical models for at-site probabilistic seasonal rainfall forecast", International Journal of Climatology. doi:10.1002/joc.3393
  • Villarini, G., Serinaldi, F., Smith, J. A., Krajewski, W. F. (2009), "On the stationarity of annual flood peaks in the continental United States during the 20th century", Water Resources Research, 45(8).
  • Villarini, G., Smith, J. A., Napolitano, F. (2010), "Nonstationary modeling of a long record of rainfall and temperature over Rome", Advances in Water Resources. doi:10.1016/j.advwatres.2010.03.013

External links