In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani [1] to blend properties of generalized linear models with additive models.

The model relates a univariate response variable, Y, to some predictor variables, xi. An exponential family distribution is specified for Y (for example normal, binomial or Poisson distributions) along with a link function g (for example the identity or log functions) relating the expected value of Y to the predictor variables via a structure such as

$g(\operatorname{E}(Y))=\beta_0 + f_1(x_1) + f_2(x_2)+ \cdots + f_m(x_m).\,\!$

The functions fi(xi) may be functions with a specified parametric form (for example a polynomial, or a coefficient depending on the levels of a factor variable) or may be specified non-parametrically, or semi-parametrically, simply as 'smooth functions', to be estimated by non-parametric means. So a typical GAM might use a scatterplot smoothing function, such as a locally weighted mean, for f1(x1), and then use a factor model for f2(x2). This flexibility to allow non-parametric fits with relaxed assumptions on the actual relationship between response and predictor, provides the potential for better fits to data than purely parametric models, but arguably with some loss of interpretability.

Estimation

The original GAM estimation method was the backfitting algorithm,[1] which provides a very general modular estimation method capable of using a wide variety of smoothing methods to estimate the fᵢ(xᵢ). A disadvantage of backfitting is that it is difficult to integrate with well founded methods for choosing the degree of smoothness of the fᵢ(xᵢ). As a result alternative methods have been developed in which smooth functions are represented semi-parametrically, using penalized regression splines,[2] in order to allow computationally efficient estimation of the degree of smoothness of the model components using generalized cross validation[3] or similar criteria.

Overfitting can be a problem with GAMs.[4] The number of smoothing parameters can be specified, and this number should be reasonably small, certainly well under the degrees of freedom offered by the data. Cross-validation can be used to detect and/or reduce overfitting problems with GAMs (or other statistical methods).[citation needed] Other models such as GLMs may be preferable to GAMs unless GAMs improve predictive ability substantially (in validation sets) for the application in question.