# Regression-kriging

In applied statistics and geostatistics, regression-kriging (RK) is a spatial prediction technique that combines a regression of the dependent variable on auxiliary variables (such as parameters derived from digital elevation modelling, remote sensing/imagery, and thematic maps) with interpolation (kriging) of the regression residuals. It is mathematically equivalent to the interpolation method variously called universal kriging and kriging with external drift, where auxiliary predictors are used directly to solve the kriging weights.

## BLUP for spatial data

Regression-kriging is an implementation of the best linear unbiased predictor (BLUP) for spatial data, i.e. the best linear interpolator assuming the universal model of spatial variation. Matheron (1969) proposed that a value of a target variable at some location can be modeled as a sum of the deterministic and stochastic components:

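$Z(\mathbf {s} )=m(\mathbf {s} )+\varepsilon '(\mathbf {s} )+\varepsilon ''$ which he termed the universal model of spatial variation. Here $m(\mathbf {s} )$ is the deterministic component, $\varepsilon '(\mathbf {s} )$ is the spatially correlated stochastic component and $\varepsilon ''$ is pure noise. The deterministic and stochastic components of spatial variation can be modeled separately. By combining the two approaches, we obtain: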

${\hat {z}}(\mathbf {s} _{0})={\hat {m}}(\mathbf {s} _{0})+{\hat {e}}(\mathbf {s} _{0})=\sum \limits _{k=0}^{p}{{\hat {\beta }}_{k}\cdot q_{k}(\mathbf {s} _{0})}+\sum \limits _{i=1}^{n}\lambda _{i}\cdot e(\mathbf {s} _{i})$ where ${\hat {m}}(\mathbf {s} _{0})$ is the fitted deterministic part, ${\hat {e}}(\mathbf {s} _{0})$ is the interpolated residual, ${\hat {\beta }}_{k}$ are estimated deterministic model coefficients (${\hat {\beta }}_{0}$ is the estimated intercept), $\lambda _{i}$ are kriging weights determined by the spatial dependence structure of the residual and where $e(\mathbf {s} _{i})$ is the residual at location ${\mathbf {s} }_{i}$ . The regression coefficients ${\hat {\beta }}_{k}$ can be estimated from the sample by some fitting method, e.g. ordinary least squares (OLS) or, optimally, using generalized least squares (GLS):

$\mathbf {\hat {\beta }} _{\mathtt {GLS}}=\left(\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {q} \right)^{-\mathbf {1} }\cdot \mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {z}$ where $\mathbf {\hat {\beta }} _{\mathtt {GLS}}$ is the vector of estimated regression coefficients, $\mathbf {C}$ is the covariance matrix of the residuals, ${\mathbf {q} }$ is a matrix of predictors at the sampling locations and $\mathbf {z}$ is the vector of measured values of the target variable. The GLS estimation of regression coefficients is, in fact, a special case of geographically weighted regression. In this case, the weights are determined objectively to account for the spatial auto-correlation between the residuals.
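The GLS estimator above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the exponential covariance function, its parameter values, the synthetic data and the helper names `exp_cov` and `gls_coefficients` are all hypothetical and do not come from the source.

```python
import numpy as np

def exp_cov(a, b, nugget=0.1, psill=1.0, range_=2.0):
    """Exponential covariance between point sets a and b (assumed model).

    The nugget is added only where the separation distance is exactly zero.
    """
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return psill * np.exp(-h / range_) + nugget * (h == 0)

def gls_coefficients(q, C, z):
    """beta_GLS = (q^T C^-1 q)^-1 q^T C^-1 z, using solves instead of inverses."""
    Ci_q = np.linalg.solve(C, q)
    Ci_z = np.linalg.solve(C, z)
    return np.linalg.solve(q.T @ Ci_q, q.T @ Ci_z)

# Synthetic data: n sampling locations, an intercept column and one predictor.
gen = np.random.default_rng(0)
n = 30
coords = gen.uniform(0, 10, size=(n, 2))
q = np.column_stack([np.ones(n), gen.normal(size=n)])
z = q @ np.array([2.0, 0.5]) + gen.normal(scale=0.3, size=n)

C = exp_cov(coords, coords)                    # covariance matrix of the residuals
beta_gls = gls_coefficients(q, C, z)
beta_ols = gls_coefficients(q, np.eye(n), z)   # identity covariance gives plain OLS
print(beta_gls, beta_ols)
```

Passing an identity matrix as the covariance reproduces the OLS solution, which makes the role of $\mathbf {C}$ as a weighting matrix explicit.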

Once the deterministic part of variation has been estimated (regression-part), the residual can be interpolated with kriging and added to the estimated trend. The estimation of the residuals is an iterative process: first the deterministic part of variation is estimated using OLS, then the covariance function of the residuals is used to obtain the GLS coefficients. Next, these are used to re-compute the residuals, from which an updated covariance function is computed, and so on. Although many geostatisticians recommend this as the proper procedure, Kitanidis (1994) showed that using the covariance function derived from the OLS residuals (i.e. a single iteration) is often satisfactory, because it does not differ enough from the function derived after several iterations to have much effect on the final predictions. Minasny and McBratney (2007) report similar results: it seems that using more, higher-quality data is more important than using more sophisticated statistical methods.
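The iterative scheme described above might be sketched as follows. The sketch assumes an exponential variogram model and fits it with a deliberately crude grid search; a real analysis would use a dedicated variogram-fitting routine. All function names, parameter grids and the synthetic data are hypothetical.

```python
import numpy as np

def empirical_variogram(coords, resid, n_lags=8):
    """Bin half squared residual differences by separation distance."""
    i, j = np.triu_indices(len(resid), k=1)
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    g = 0.5 * (resid[i] - resid[j]) ** 2
    edges = np.linspace(0.0, h.max() / 2, n_lags + 1)   # limit to half the max distance
    idx = np.digitize(h, edges) - 1
    keep = [b for b in range(n_lags) if np.any(idx == b)]
    lags = np.array([h[idx == b].mean() for b in keep])
    gamma = np.array([g[idx == b].mean() for b in keep])
    return lags, gamma

def fit_exponential(lags, gamma, ranges=np.linspace(0.2, 5.0, 25)):
    """Crude fit: grid search over the range, least squares for nugget and partial sill."""
    best = None
    for r in ranges:
        A = np.column_stack([np.ones_like(lags), 1.0 - np.exp(-lags / r)])
        # Clip to small positive values so the implied covariance stays positive definite.
        coef = np.clip(np.linalg.lstsq(A, gamma, rcond=None)[0], 1e-6, None)
        sse = np.sum((A @ coef - gamma) ** 2)
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], r)
    return best[1:]   # nugget, partial sill, range

def gls(q, C, z):
    return np.linalg.solve(q.T @ np.linalg.solve(C, q), q.T @ np.linalg.solve(C, z))

def iterate_rk_trend(coords, q, z, n_iter=3):
    """Start from OLS, then alternate variogram fitting and GLS re-estimation."""
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    beta = np.linalg.lstsq(q, z, rcond=None)[0]          # iteration 0: OLS
    history = [beta]
    for _ in range(n_iter):
        resid = z - q @ beta
        nugget, psill, r = fit_exponential(*empirical_variogram(coords, resid))
        C = psill * np.exp(-h / r) + nugget * np.eye(len(z))
        beta = gls(q, C, z)
        history.append(beta)
    return np.array(history), (nugget, psill, r)

# Synthetic data with spatially correlated residuals.
gen = np.random.default_rng(42)
n = 60
coords = gen.uniform(0, 10, size=(n, 2))
q = np.column_stack([np.ones(n), gen.normal(size=n)])
h_true = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
L = np.linalg.cholesky(np.exp(-h_true / 2.0) + 0.1 * np.eye(n))
z = q @ np.array([2.0, 0.5]) + L @ gen.normal(size=n)

history, (nugget, psill, range_fit) = iterate_rk_trend(coords, q, z)
print(history)   # row 0 is OLS, later rows are successive GLS estimates
```

Comparing successive rows of `history` gives a rough feel for how much the coefficients move after the first iteration on a given data set.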

In matrix notation, regression-kriging is commonly written as:

${\hat {z}}_{\mathtt {RK}}(\mathbf {s} _{0})=\mathbf {q} _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {\hat {\beta }} _{\mathtt {GLS}}+\mathbf {\lambda } _{\mathbf {0} }^{\mathbf {T} }\cdot (\mathbf {z} -\mathbf {q} \cdot \mathbf {\hat {\beta }} _{\mathtt {GLS}})$ where ${\hat {z}}({\mathbf {s} }_{0})$ is the predicted value at location ${\mathbf {s} }_{0}$ , ${\mathbf {q} }_{\mathbf {0} }$ is the vector of $p+1$ predictors and $\mathbf {\lambda } _{\mathbf {0} }$ is the vector of $n$ kriging weights used to interpolate the residuals. The RK model is considered to be the Best Linear Predictor of spatial data. It has a prediction variance that reflects the position of new locations (extrapolation) in both geographical and feature space:

${\hat {\sigma }}_{\mathtt {RK}}^{2}(\mathbf {s} _{0})=(C_{0}+C_{1})-\mathbf {c} _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }+\left(\mathbf {q} _{\mathbf {0} }-\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }\right)^{\mathbf {T} }\cdot \left(\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {q} \right)^{-\mathbf {1} }\cdot \left(\mathbf {q} _{\mathbf {0} }-\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }\right)$ where $C_{0}+C_{1}$ is the sill variation and ${\mathbf {c} }_{\mathbf {0} }$ is the vector of covariances of residuals at the unvisited location.
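As a rough illustration, the matrix form of the RK predictor and its prediction variance can be evaluated together. The sketch assumes that the kriging weights are obtained as $\mathbf {\lambda } _{\mathbf {0} }=\mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }$, that the covariance function is exponential with known parameters, and that the nugget applies only at zero distance. The names `exp_cov` and `rk_predict`, the parameter values and the data are hypothetical.

```python
import numpy as np

NUGGET, PSILL, RANGE = 0.1, 1.0, 2.0   # C0, C1 and range: hypothetical values

def exp_cov(a, b):
    """Exponential covariance between point sets a and b (assumed model)."""
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return PSILL * np.exp(-h / RANGE) + NUGGET * (h == 0)

def rk_predict(coords, q, z, s0, q0):
    """Return the RK prediction and prediction variance at one location s0."""
    C = exp_cov(coords, coords)
    c0 = exp_cov(coords, s0[None, :])[:, 0]
    Ci_q = np.linalg.solve(C, q)                 # C^-1 q
    Ci_c0 = np.linalg.solve(C, c0)               # kriging weights lambda_0
    qCq = q.T @ Ci_q                             # q^T C^-1 q
    beta = np.linalg.solve(qCq, Ci_q.T @ z)      # GLS coefficients
    z_hat = q0 @ beta + Ci_c0 @ (z - q @ beta)   # trend + kriged residual
    d = q0 - q.T @ Ci_c0                         # q0 - q^T C^-1 c0
    var = (NUGGET + PSILL) - c0 @ Ci_c0 + d @ np.linalg.solve(qCq, d)
    return z_hat, var

gen = np.random.default_rng(3)
n = 30
coords = gen.uniform(0, 10, size=(n, 2))
q = np.column_stack([np.ones(n), gen.normal(size=n)])
z = q @ np.array([2.0, 0.5]) + gen.normal(scale=0.3, size=n)

# Prediction at a new location and, as a sanity check, at a sampled location.
z_new, var_new = rk_predict(coords, q, z, np.array([5.0, 5.0]), np.array([1.0, 0.2]))
z_at_sample, var_at_sample = rk_predict(coords, q, z, coords[0], q[0])
print(z_new, var_new, z_at_sample, var_at_sample)
```

Under these assumptions the predictor reproduces the observed value at a sampled location with zero variance, while the variance at a new location combines the kriging term and the trend-estimation term of the formula.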

Many (geo)statisticians believe that there is only one Best Linear Unbiased Prediction model for spatial data (e.g. regression-kriging), and that all other techniques, such as ordinary kriging, environmental correlation, averaging of values per polygon or inverse distance interpolation, can be seen as its special cases. If the residuals show no spatial auto-correlation (pure nugget effect), regression-kriging converges to pure multiple linear regression, because the covariance matrix ($\mathbf {C}$ ) becomes a multiple of the identity matrix. Likewise, if the target variable shows no correlation with the auxiliary predictors, the regression-kriging model reduces to the ordinary kriging model because the deterministic part equals the (global) mean value. Hence, pure kriging and pure regression should be considered as only special cases of regression-kriging (see figure).
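Both limiting cases can be checked from the matrix form of the predictor. The following is a short sketch, assuming the kriging weights are $\mathbf {\lambda } _{\mathbf {0} }=\mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }$; the symbols $\sigma ^{2}$ (a constant residual variance), $\mathbf {1}$ (a vector of ones) and ${\hat {\mu }}$ (the estimated mean) are introduced here for illustration only. With a pure nugget effect, $\mathbf {C} =\sigma ^{2}\cdot \mathbf {I}$ and $\mathbf {c} _{\mathbf {0} }=\mathbf {0}$ at any unvisited location, so that:

$\mathbf {\hat {\beta }} _{\mathtt {GLS}}=\left(\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {q} \right)^{-\mathbf {1} }\cdot \mathbf {q} ^{\mathbf {T} }\cdot \mathbf {z} =\mathbf {\hat {\beta }} _{\mathtt {OLS}},\qquad {\hat {z}}_{\mathtt {RK}}(\mathbf {s} _{0})=\mathbf {q} _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {\hat {\beta }} _{\mathtt {OLS}}$

With no auxiliary predictors, $\mathbf {q} =\mathbf {1}$ and $\mathbf {q} _{\mathbf {0} }=1$, so that:

${\hat {\mu }}=\left(\mathbf {1} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {1} \right)^{-\mathbf {1} }\cdot \mathbf {1} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {z} ,\qquad {\hat {z}}_{\mathtt {RK}}(\mathbf {s} _{0})={\hat {\mu }}+\mathbf {c} _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \left(\mathbf {z} -{\hat {\mu }}\cdot \mathbf {1} \right)$

The first result is the multiple linear regression prediction; the second is the ordinary kriging predictor written as simple kriging around the GLS estimate of the mean.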

## RK and UK/KED

The geostatistical literature uses many different terms for what are essentially the same or at least very similar techniques. This confuses users and distracts them from using the right technique for their mapping projects. In fact, universal kriging, kriging with external drift, and regression-kriging are basically the same technique.

Matheron (1969) originally termed the technique Le krigeage universel; however, the technique was intended as a generalized case of kriging where the trend is modelled as a function of coordinates. Thus, many authors reserve the term universal kriging (UK) for the case when only the coordinates are used as predictors. If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than the coordinates, the term kriging with external drift (KED) is preferred (according to Hengl 2007, "About regression-kriging: From equations to case studies"). In the case of UK or KED, the predictions are made as with kriging, with the difference that the covariance matrix of residuals is extended with the auxiliary predictors. However, the drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed et al. (1987), and Odeh et al. (1995) later named it regression-kriging, while Goovaerts (1997) uses the term kriging with a trend model to refer to a family of interpolators, and refers to RK as simple kriging with varying local means. Minasny and McBratney (2007) simply call this technique the Empirical Best Linear Unbiased Predictor, i.e. E-BLUP.

In the case of KED, predictions at new locations are made by:

${\hat {z}}_{\mathtt {KED}}(\mathbf {s} _{0})=\sum \limits _{i=1}^{n}w_{i}^{\mathtt {KED}}(\mathbf {s} _{0})\cdot z(\mathbf {s} _{i})$ subject to

$\sum \limits _{i=1}^{n}w_{i}^{\mathtt {KED}}(\mathbf {s} _{0})\cdot q_{k}(\mathbf {s} _{i})=q_{k}(\mathbf {s} _{0})$ for $k=1,\ldots ,p$ or in matrix notation:

${\hat {z}}_{\mathtt {KED}}(\mathbf {s} _{0})=\mathbf {\delta } _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {z}$ where $z$ is the target variable, $q_{k}$ 's are the predictor variables i.e. values at a new location $({\mathbf {s} }_{0})$ , ${\mathbf {\delta } }_{\mathbf {0} }$ is the vector of KED weights ($w_{i}^{\mathtt {KED}}$ ), $p$ is the number of predictors and $\mathbf {z}$ is the vector of $n$ observations at primary locations. The KED weights are solved using the extended matrices:

$\mathbf {\lambda } _{\mathbf {0} }^{\mathtt {KED}}=\left\{w_{1}^{\mathtt {KED}}(\mathbf {s} _{0}),\ldots ,w_{n}^{\mathtt {KED}}(\mathbf {s} _{0}),\varphi _{0}(\mathbf {s} _{0}),\ldots ,\varphi _{p}(\mathbf {s} _{0})\right\}^{\mathbf {T} }=\mathbf {C} ^{{\mathtt {KED}}-1}\cdot \mathbf {c} _{\mathbf {0} }^{\mathtt {KED}}$ where ${\mathbf {\lambda } }_{\mathbf {0} }^{\mathtt {KED}}$ is the vector of solved weights, $\varphi _{p}$ are the Lagrange multipliers, ${\mathbf {C} }^{\mathtt {KED}}$ is the extended covariance matrix of residuals and ${\mathbf {c} }_{\mathbf {0} }^{\mathtt {KED}}$ is the extended vector of covariances at new location.

In the case of KED, the extended covariance matrix of residuals looks like this (Webster and Oliver, 2007; p. 183):

$\mathbf {C} ^{\mathtt {KED}}=\left[{\begin{array}{ccccccc}C(\mathbf {s} _{1},\mathbf {s} _{1})&\cdots &C(\mathbf {s} _{1},\mathbf {s} _{n})&1&q_{1}(\mathbf {s} _{1})&\cdots &q_{p}(\mathbf {s} _{1})\\\vdots &&\vdots &\vdots &\vdots &&\vdots \\C(\mathbf {s} _{n},\mathbf {s} _{1})&\cdots &C(\mathbf {s} _{n},\mathbf {s} _{n})&1&q_{1}(\mathbf {s} _{n})&\cdots &q_{p}(\mathbf {s} _{n})\\1&\cdots &1&0&0&\cdots &0\\q_{1}(\mathbf {s} _{1})&\cdots &q_{1}(\mathbf {s} _{n})&0&0&\cdots &0\\\vdots &&\vdots &\vdots &\vdots &&\vdots \\q_{p}(\mathbf {s} _{1})&\cdots &q_{p}(\mathbf {s} _{n})&0&0&\cdots &0\end{array}}\right]$ and $\mathbf {c} _{\mathbf {0} }^{\mathtt {KED}}$ like this:

$\mathbf {c} _{\mathbf {0} }^{\mathtt {KED}}=\left\{C(\mathbf {s} _{0},\mathbf {s} _{1}),\ldots ,C(\mathbf {s} _{0},\mathbf {s} _{n}),q_{0}(\mathbf {s} _{0}),q_{1}(\mathbf {s} _{0}),\ldots ,q_{p}(\mathbf {s} _{0})\right\}^{\mathbf {T} };q_{0}(\mathbf {s} _{0})=1$ Hence, KED looks exactly like ordinary kriging, except that the covariance matrix and vector are extended with values of the auxiliary predictors.
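The extended system can be assembled and solved numerically, which also gives a quick check that KED and RK return the same prediction when both use the same covariance function. The sketch below assumes an exponential covariance with hypothetical parameters and synthetic data, and takes the RK kriging weights as $\mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }$; the variable names are illustrative.

```python
import numpy as np

def exp_cov(a, b, nugget=0.1, psill=1.0, range_=2.0):
    """Exponential covariance between point sets a and b (assumed model)."""
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return psill * np.exp(-h / range_) + nugget * (h == 0)

gen = np.random.default_rng(1)
n = 25
coords = gen.uniform(0, 10, size=(n, 2))
q = np.column_stack([np.ones(n), gen.normal(size=n)])   # columns: 1, q_1
z = q @ np.array([1.0, 0.8]) + gen.normal(scale=0.4, size=n)
s0 = np.array([[5.0, 5.0]])                             # new location
q0 = np.array([1.0, 0.3])                               # (1, q_1(s0))

C = exp_cov(coords, coords)
c0 = exp_cov(coords, s0)[:, 0]

# KED: extend the covariance matrix and vector with the predictors.
m = q.shape[1]
C_ked = np.block([[C, q], [q.T, np.zeros((m, m))]])
c0_ked = np.concatenate([c0, q0])
lam_ked = np.linalg.solve(C_ked, c0_ked)
w_ked, phi = lam_ked[:n], lam_ked[n:]    # KED weights and Lagrange multipliers
z_ked = w_ked @ z

# RK: GLS trend plus kriged residuals with the same covariance function.
Ci_q = np.linalg.solve(C, q)
beta = np.linalg.solve(q.T @ Ci_q, Ci_q.T @ z)
z_rk = q0 @ beta + np.linalg.solve(C, c0) @ (z - q @ beta)

print(z_ked, z_rk)   # the two predictions agree up to rounding error
```

The solved KED weights also satisfy the constraints $\sum w_{i}^{\mathtt {KED}}\cdot q_{k}(\mathbf {s} _{i})=q_{k}(\mathbf {s} _{0})$, including the unit-sum condition carried by the column of ones.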

Although KED seems, at first glance, to be computationally more straightforward than RK, the parameters of the variogram for KED must also be estimated from regression residuals, thus requiring a separate regression modelling step. This regression should be GLS because of the likely spatial correlation between residuals. Note that many analysts instead use the OLS residuals, which may not be too different from the GLS residuals. However, they are not optimal if there is any spatial correlation, and indeed they may be quite different for clustered sample points or if the number of samples is relatively small ($\ll 200$ ).

A limitation of KED is the instability of the extended matrix in the case that the covariate does not vary smoothly in space. RK has the advantage that it explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily-complex forms of regression, rather than the simple linear techniques that can be used with KED. In addition, it allows the separate interpretation of the two interpolated components. The emphasis on regression is important also because fitting of the deterministic part of variation (regression) is often more beneficial for the quality of final maps than fitting of the stochastic part (residuals).

## Software to run regression-kriging

Regression-kriging can be automated, e.g. in the R statistical computing environment, by using the gstat and/or geoR packages. Typical inputs/outputs include:

INPUTS:

• Interpolation set (point map): $z(\mathbf {s} _{i})$ , $i=1,\ldots ,n$ , at primary locations;
• Minimum and maximum expected values and measurement precision ($\Delta z$ );
• Continuous predictors (raster map): $q(\mathbf {s} )$ at new unvisited locations;
• Discrete predictors (polygon map);
• Validation set (point map): $z^{*}(\mathbf {s} _{j})$ , $j=1,\ldots ,l$ (optional);
• Lag spacing and limiting distance (required to fit the variogram).

OUTPUTS:

• Map of predictions and relative prediction error;
• Best subset of predictors and correlation significance (adjusted R-square);
• Variogram model parameters (e.g. $C_{0}$ , $C_{1}$ , $R$ );
• GLS drift model coefficients;
• Accuracy of prediction at validation points: mean prediction error (MPE) and root mean square prediction error (RMSPE).
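The two validation measures in the last output item can be computed from paired predictions and validation observations. The sketch below assumes the common definitions $\mathrm {MPE} ={\frac {1}{l}}\sum _{j=1}^{l}\left[{\hat {z}}(\mathbf {s} _{j})-z^{*}(\mathbf {s} _{j})\right]$ and $\mathrm {RMSPE} ={\sqrt {{\frac {1}{l}}\sum _{j=1}^{l}\left[{\hat {z}}(\mathbf {s} _{j})-z^{*}(\mathbf {s} _{j})\right]^{2}}}$; the sign convention (predicted minus observed) and the function name are assumptions, and the numbers are made up.

```python
import math

def validation_errors(predicted, observed):
    """Return (MPE, RMSPE) for paired predictions and validation observations."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for p, o in zip(predicted, observed)]   # predicted minus observed
    mpe = sum(errors) / len(errors)                          # bias of the predictions
    rmspe = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mpe, rmspe

# Made-up predictions and validation observations at three points.
mpe, rmspe = validation_errors([2.0, 3.0, 5.0], [1.0, 3.0, 7.0])
print(mpe, rmspe)
```

An MPE close to zero suggests unbiased predictions, while the RMSPE summarises the typical size of the prediction error in the units of the target variable.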

## Application of regression-kriging

Regression-kriging is used in various applied fields, including meteorology, climatology, soil mapping, geological mapping and species distribution modeling. The only requirement for using regression-kriging rather than e.g. ordinary kriging is that one or more covariate layers exist that are significantly correlated with the feature of interest. Some general applications of regression-kriging are:

• Geostatistical mapping: Regression-kriging allows for use of hybrid geostatistical techniques to model e.g. spatial distribution of soil properties.
• Downscaling of maps: Regression-kriging can be used as a framework to downscale various existing gridded maps. In this case the covariate layers need to be available at a better resolution (which corresponds to the sampling intensity) than the original point data.
• Error propagation: Simulated maps generated by using a regression-kriging model can be used for scenario testing and for estimating propagated uncertainty.

Figure: Simulations of zinc concentrations derived using a regression-kriging model. This model uses one continuous (distance to the river) and one categorical (flooding frequency) covariate.

Regression-kriging-based algorithms play an increasingly important role in geostatistics because the number of possible covariates is increasing every day. For example, digital elevation models (DEMs) are now available from a number of sources. Detailed and accurate images of topography can now be ordered from remote sensing systems such as SPOT and ASTER; SPOT5 offers the High Resolution Stereoscopic (HRS) scanner, which can be used to produce DEMs at resolutions of up to 5 m. Finer differences in elevation can also be obtained with airborne laser-scanners. Data are either free or dropping in price as technology advances. NASA recorded most of the world's topography in the Shuttle Radar Topography Mission in 2000. Since the summer of 2004, these data have been available (e.g. via USGS FTP) for almost the whole globe at a resolution of about 90 m (for the North American continent at a resolution of about 30 m). Likewise, MODIS multispectral images are freely available for download at resolutions of 250 m. A large free repository of Landsat images is also available for download via the Global Land Cover Facility (GLCF).