Regression-kriging

In applied statistics and geostatistics, regression-kriging (RK) is a spatial prediction technique that combines a regression of the dependent variable on auxiliary variables (such as parameters derived from digital elevation modelling, remote sensing/imagery, and thematic maps) with interpolation (kriging) of the regression residuals. It is mathematically equivalent to the interpolation method variously called universal kriging and kriging with external drift, where auxiliary predictors are used directly to solve the kriging weights.[1]

BLUP for spatial data

Schematic of the universal model of spatial variation.

Regression-kriging is an implementation of the best linear unbiased predictor (BLUP) for spatial data, i.e. the best linear interpolator assuming the universal model of spatial variation. Matheron (1969) proposed that a value of a target variable at some location can be modeled as a sum of the deterministic and stochastic components:[2]

${\displaystyle Z(\mathbf {s} )=m(\mathbf {s} )+\varepsilon '(\mathbf {s} )+\varepsilon ''}$

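where ${\displaystyle m(\mathbf {s} )}$ is the deterministic component, ${\displaystyle \varepsilon '(\mathbf {s} )}$ is the spatially correlated stochastic component and ${\displaystyle \varepsilon ''}$ is the spatially uncorrelated noise (residual error). Matheron termed this the universal model of spatial variation. The deterministic and stochastic components can be modeled separately: the former by regression on auxiliary variables and the latter by kriging of the regression residuals. Combining the two approaches gives: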

${\displaystyle {\hat {z}}(\mathbf {s} _{0})={\hat {m}}(\mathbf {s} _{0})+{\hat {e}}(\mathbf {s} _{0})=\sum \limits _{k=0}^{p}{{\hat {\beta }}_{k}\cdot q_{k}(\mathbf {s} _{0})}+\sum \limits _{i=1}^{n}\lambda _{i}\cdot e(\mathbf {s} _{i})}$

where ${\displaystyle {\hat {m}}(\mathbf {s} _{0})}$ is the fitted deterministic part, ${\displaystyle {\hat {e}}(\mathbf {s} _{0})}$ is the interpolated residual, ${\displaystyle {\hat {\beta }}_{k}}$ are the estimated deterministic model coefficients (${\displaystyle {\hat {\beta }}_{0}}$ is the estimated intercept, so that ${\displaystyle q_{0}(\mathbf {s} _{0})=1}$), ${\displaystyle q_{k}(\mathbf {s} _{0})}$ are the values of the ${\displaystyle p}$ auxiliary predictors at the new location, ${\displaystyle \lambda _{i}}$ are kriging weights determined by the spatial dependence structure of the residual, and ${\displaystyle e(\mathbf {s} _{i})}$ is the residual at the sampled location ${\displaystyle {\mathbf {s} }_{i}}$, ${\displaystyle i=1,\ldots ,n}$. The regression coefficients ${\displaystyle {\hat {\beta }}_{k}}$ can be estimated from the sample by a fitting method such as ordinary least squares (OLS) or, optimally, generalized least squares (GLS):[3]

${\displaystyle \mathbf {\hat {\beta }} _{\mathtt {GLS}}=\left(\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {q} \right)^{-\mathbf {1} }\cdot \mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {z} }$

where ${\displaystyle \mathbf {\hat {\beta }} _{\mathtt {GLS}}}$ is the vector of estimated regression coefficients, ${\displaystyle \mathbf {C} }$ is the covariance matrix of the residuals, ${\displaystyle {\mathbf {q} }}$ is the matrix of predictors at the sampling locations and ${\displaystyle \mathbf {z} }$ is the vector of measured values of the target variable. GLS estimation of the regression coefficients is, in fact, a special case of geographically weighted regression in which the weights are determined objectively to account for the spatial autocorrelation between the residuals.
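The GLS estimator can be sketched numerically. The snippet below is a minimal illustration, not taken from any cited source: it assumes NumPy, an exponential covariance model with made-up nugget, partial sill and range values, and synthetic data drawn from the universal model with one hypothetical auxiliary predictor. It computes both OLS and GLS coefficients; when the covariance matrix is the identity, the two coincide.

```python
import numpy as np


def exp_cov(d, c0, c1, r):
    """Exponential covariance: nugget + partial sill (c0 + c1) at zero distance,
    c1 * exp(-d / r) at positive distances."""
    return np.where(d == 0, c0 + c1, c1 * np.exp(-d / r))


def gls_coefficients(q, C, z):
    """beta_GLS = (q^T C^-1 q)^-1 q^T C^-1 z, computed with linear solves
    rather than explicit matrix inverses."""
    Ci_q = np.linalg.solve(C, q)              # C^-1 q  (n x (p+1))
    Ci_z = np.linalg.solve(C, z)              # C^-1 z  (n,)
    return np.linalg.solve(q.T @ Ci_q, q.T @ Ci_z)


# Synthetic data drawn from the universal model Z = m + eps' + eps''.
# Every number below is an illustrative assumption.
rng = np.random.default_rng(42)
n = 150
coords = rng.uniform(0, 100, size=(n, 2))    # sampling locations s_i
aux = rng.normal(size=n)                     # one hypothetical auxiliary predictor q_1
q = np.column_stack([np.ones(n), aux])       # q_0 = 1 (intercept), q_1
beta_true = np.array([10.0, 2.0])

d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C = exp_cov(d, c0=0.2, c1=1.0, r=20.0)       # covariance of eps' + eps''
z = q @ beta_true + np.linalg.cholesky(C) @ rng.normal(size=n)

beta_ols = np.linalg.lstsq(q, z, rcond=None)[0]
beta_gls = gls_coefficients(q, C, z)
print("OLS:", beta_ols)
print("GLS:", beta_gls)
```

Solving linear systems with `np.linalg.solve` instead of forming the inverse of the covariance matrix explicitly is a numerical-stability choice, not part of the method itself.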

Once the deterministic part of variation (the regression part) has been estimated, the residuals can be interpolated with kriging and added to the estimated trend. Estimating the residuals is an iterative process: first the deterministic part of variation is estimated using OLS, then the covariance function of the residuals is used to obtain the GLS coefficients. These are then used to re-compute the residuals, from which an updated covariance function is computed, and so on. Although many geostatisticians recommend this as the proper procedure, Kitanidis (1994) showed that using the covariance function derived from the OLS residuals (i.e. a single iteration) is often satisfactory, because it differs little from the function derived after several iterations and therefore has little effect on the final predictions. Minasny and McBratney (2007) report similar results, suggesting that using more and higher-quality data is more important than using more sophisticated statistical methods.[4]
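The iterative procedure described above might be sketched as follows. This is a simplified illustration under stated assumptions: the residual covariance is taken to be exponential with its range parameter `r` fixed in advance, so that only the nugget and partial sill are fitted, by least squares on a binned semivariogram built from a lag spacing and a limiting distance. `fit_exp_variogram` and `iterative_gls` are hypothetical helper names and the data are synthetic; dedicated geostatistical software fits all variogram parameters, usually with more careful weighting of the bins.

```python
import numpy as np


def exp_cov(d, c0, c1, r):
    """Exponential covariance with nugget c0, partial sill c1 and range parameter r."""
    return np.where(d == 0, c0 + c1, c1 * np.exp(-d / r))


def fit_exp_variogram(d, e, lag, cutoff, r):
    """Fit nugget c0 and partial sill c1 of gamma(h) = c0 + c1 * (1 - exp(-h / r))
    to binned semivariances of the residuals e; the range r is held fixed."""
    iu = np.triu_indices(len(e), k=1)                  # every pair of points once
    h = d[iu]
    g = 0.5 * (e[iu[0]] - e[iu[1]]) ** 2               # semivariance of each pair
    edges = np.arange(0.0, cutoff + lag, lag)          # lag spacing up to the limiting distance
    idx = np.digitize(h, edges) - 1                    # bin index; pairs beyond cutoff are skipped
    hb, gb = [], []
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            hb.append(h[in_bin].mean())
            gb.append(g[in_bin].mean())
    hb, gb = np.array(hb), np.array(gb)
    A = np.column_stack([np.ones_like(hb), 1.0 - np.exp(-hb / r)])
    c0, c1 = np.linalg.lstsq(A, gb, rcond=None)[0]     # linear in c0, c1 once r is fixed
    return max(c0, 1e-6), max(c1, 0.0)                 # keep C positive definite


def iterative_gls(coords, q, z, lag, cutoff, r, n_iter=5):
    """OLS -> residual variogram -> GLS -> new residuals -> ..., repeated n_iter times."""
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    beta = np.linalg.lstsq(q, z, rcond=None)[0]        # first pass: OLS
    history = [beta]
    for _ in range(n_iter):
        e = z - q @ beta                               # current residuals
        c0, c1 = fit_exp_variogram(d, e, lag, cutoff, r)
        C = exp_cov(d, c0, c1, r)
        Ci_q = np.linalg.solve(C, q)
        beta = np.linalg.solve(q.T @ Ci_q, Ci_q.T @ z)  # GLS update
        history.append(beta)
    return beta, (c0, c1), history


# Illustrative synthetic data (all values assumed).
rng = np.random.default_rng(1)
n = 120
coords = rng.uniform(0, 100, size=(n, 2))
q = np.column_stack([np.ones(n), rng.normal(size=n)])
d_true = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C_true = exp_cov(d_true, 0.2, 1.0, 15.0)
z = q @ np.array([5.0, 1.5]) + np.linalg.cholesky(C_true) @ rng.normal(size=n)

beta, (c0, c1), history = iterative_gls(coords, q, z, lag=5.0, cutoff=50.0, r=15.0)
print("OLS (first pass):", history[0])
print("GLS after one variogram fit:", history[1])
print("GLS after last iteration:", beta)
```

Comparing `history[1]` (GLS after a single variogram fit on the OLS residuals) with the final `beta` gives an informal, dataset-specific check of the observation attributed to Kitanidis (1994).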

In matrix notation, regression-kriging is commonly written as:[5]

${\displaystyle {\hat {z}}_{\mathtt {RK}}(\mathbf {s} _{0})=\mathbf {q} _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {\hat {\beta }} _{\mathtt {GLS}}+\mathbf {\lambda } _{\mathbf {0} }^{\mathbf {T} }\cdot (\mathbf {z} -\mathbf {q} \cdot \mathbf {\hat {\beta }} _{\mathtt {GLS}})}$

where ${\displaystyle {\hat {z}}_{\mathtt {RK}}({\mathbf {s} }_{0})}$ is the predicted value at location ${\displaystyle {\mathbf {s} }_{0}}$, ${\displaystyle {\mathbf {q} }_{\mathbf {0} }}$ is the vector of the ${\displaystyle p+1}$ predictor values at ${\displaystyle {\mathbf {s} }_{0}}$ (including the constant ${\displaystyle q_{0}=1}$) and ${\displaystyle \mathbf {\lambda } _{\mathbf {0} }}$ is the vector of ${\displaystyle n}$ kriging weights used to interpolate the residuals. The RK model is considered to be the best linear unbiased predictor of spatial data.[5][6] Its prediction variance reflects the position of new locations (extrapolation) in both geographical and feature space:

${\displaystyle {\hat {\sigma }}_{\mathtt {RK}}^{2}(\mathbf {s} _{0})=(C_{0}+C_{1})-\mathbf {c} _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }+\left(\mathbf {q} _{\mathbf {0} }-\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }\right)^{\mathbf {T} }\cdot \left(\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {q} \right)^{\mathbf {-1} }\cdot \left(\mathbf {q} _{\mathbf {0} }-\mathbf {q} ^{\mathbf {T} }\cdot \mathbf {C} ^{-\mathbf {1} }\cdot \mathbf {c} _{\mathbf {0} }\right)}$

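where ${\displaystyle C_{0}+C_{1}}$ is the sill (nugget ${\displaystyle C_{0}}$ plus partial sill ${\displaystyle C_{1}}$) and ${\displaystyle {\mathbf {c} }_{0}}$ is the vector of covariances between the residuals at the sampled locations and the residual at the unvisited location.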
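A minimal sketch of the RK predictor and its prediction variance at a single new location, written directly from the two matrix formulas above. It assumes NumPy, an exponential covariance whose nugget `c0`, partial sill `c1` and range `r` are already known (in practice they would come from the fitted residual variogram), and synthetic data; `rk_predict` is a hypothetical helper name. The kriging weights are computed as lambda_0 = C^-1 c_0, i.e. simple kriging of the GLS residuals.

```python
import numpy as np


def exp_cov(d, c0, c1, r):
    """Exponential covariance: c0 + c1 at zero distance, c1 * exp(-d / r) otherwise."""
    return np.where(d == 0, c0 + c1, c1 * np.exp(-d / r))


def rk_predict(coords, q, z, s0, q0, c0, c1, r):
    """Regression-kriging prediction and prediction variance at a single location s0.

    coords: (n, 2) sample locations; q: (n, p+1) predictors with q[:, 0] = 1;
    z: (n,) observations; q0: (p+1,) predictor values at s0.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = exp_cov(d, c0, c1, r)                                      # residual covariance matrix
    c_0 = exp_cov(np.linalg.norm(coords - s0, axis=1), c0, c1, r)  # covariances to s0
    Ci_q = np.linalg.solve(C, q)
    A = q.T @ Ci_q                                                 # q^T C^-1 q
    beta = np.linalg.solve(A, Ci_q.T @ z)                          # GLS coefficients
    lam = np.linalg.solve(C, c_0)                                  # kriging weights lambda_0
    z_hat = q0 @ beta + lam @ (z - q @ beta)                       # trend + kriged residual
    u = q0 - q.T @ lam                                             # q_0 - q^T C^-1 c_0
    var = (c0 + c1) - c_0 @ lam + u @ np.linalg.solve(A, u)
    return z_hat, var


# Illustrative synthetic data (all values assumed).
rng = np.random.default_rng(7)
n = 80
coords = rng.uniform(0, 100, size=(n, 2))
aux = rng.normal(size=n)
q = np.column_stack([np.ones(n), aux])
z = 3.0 + 2.0 * aux + rng.normal(scale=0.5, size=n)

params = dict(c0=0.1, c1=1.0, r=20.0)                              # assumed variogram
z_new, v_new = rk_predict(coords, q, z, np.array([150.0, 150.0]), np.array([1.0, 0.3]), **params)
z_at_1, v_at_1 = rk_predict(coords, q, z, coords[0], q[0], **params)
print(z_new, v_new)
print(z_at_1, v_at_1)
```

Predicting at a sampled location returns the observed value with zero variance, because the nugget is included in the covariance at zero distance, so the kriging weights collapse onto that sample. A location well outside the sampled area receives a larger variance, driven by the sill and by the trend-uncertainty (feature-space) term.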

Decision tree for selecting a suitable spatial prediction model.

Many (geo)statisticians believe that there is only one best linear unbiased prediction model for spatial data (e.g. regression-kriging), of which all other techniques, such as ordinary kriging, environmental correlation, averaging of values per polygon or inverse distance interpolation, can be seen as special cases. If the residuals show no spatial autocorrelation (pure nugget effect), regression-kriging converges to pure multiple linear regression, because the covariance matrix (${\displaystyle \mathbf {C} }$) becomes a scaled identity matrix and GLS reduces to OLS. Likewise, if the target variable shows no correlation with the auxiliary predictors, the regression-kriging model reduces to the ordinary kriging model, because the deterministic part equals the (global) mean value. Hence, pure kriging and pure regression should be considered special cases of regression-kriging (see figure).
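Both special cases can be checked numerically. The sketch below assumes NumPy, synthetic data, assumed covariance parameters and hypothetical helper names (`rk_predict`, `ok_predict`). It shows that with a pure-nugget covariance the RK prediction at an unsampled location equals the OLS regression prediction, and that with an intercept-only trend RK reproduces ordinary kriging solved in its usual Lagrange-multiplier form.

```python
import numpy as np


def exp_cov(d, c0, c1, r):
    """Exponential covariance: c0 + c1 at zero distance, c1 * exp(-d / r) otherwise."""
    return np.where(d == 0, c0 + c1, c1 * np.exp(-d / r))


def pairwise(coords):
    """Matrix of Euclidean distances between all sample locations."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def rk_predict(coords, q, z, s0, q0, cov):
    """RK prediction: GLS trend at s0 plus simple kriging of the GLS residuals."""
    C = cov(pairwise(coords))
    c_0 = cov(np.linalg.norm(coords - s0, axis=1))
    Ci_q = np.linalg.solve(C, q)
    beta = np.linalg.solve(q.T @ Ci_q, Ci_q.T @ z)
    lam = np.linalg.solve(C, c_0)
    return q0 @ beta + lam @ (z - q @ beta)


def ok_predict(coords, z, s0, cov):
    """Ordinary kriging: solve [[C, 1], [1^T, 0]] [w; mu] = [c_0; 1] and return w^T z."""
    n = len(z)
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = cov(pairwise(coords))
    K[:n, n] = 1.0
    K[n, :n] = 1.0
    rhs = np.append(cov(np.linalg.norm(coords - s0, axis=1)), 1.0)
    w = np.linalg.solve(K, rhs)[:n]
    return w @ z


def nugget_only(d):
    return exp_cov(d, 0.5, 0.0, 1.0)     # no spatial correlation at positive distances


def spatial(d):
    return exp_cov(d, 0.1, 1.0, 20.0)    # assumed spatially correlated residuals


rng = np.random.default_rng(3)
n = 60
coords = rng.uniform(0, 100, size=(n, 2))
aux = rng.normal(size=n)
z = 4.0 + 1.2 * aux + rng.normal(scale=0.3, size=n)
s0 = np.array([50.5, 49.5])              # assumed new location
q0 = np.array([1.0, 0.7])                # assumed predictor values at s0

# Case 1: pure nugget -> C is a scaled identity, RK equals the OLS regression prediction.
q = np.column_stack([np.ones(n), aux])
rk_nugget = rk_predict(coords, q, z, s0, q0, nugget_only)
ols_pred = q0 @ np.linalg.lstsq(q, z, rcond=None)[0]

# Case 2: intercept-only trend -> RK equals ordinary kriging.
rk_mean = rk_predict(coords, np.ones((n, 1)), z, s0, np.array([1.0]), spatial)
ok_pred = ok_predict(coords, z, s0, spatial)
print(rk_nugget, ols_pred)
print(rk_mean, ok_pred)
```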

RK and UK/KED

The geostatistical literature uses many different terms for what are essentially the same, or at least very similar, techniques. This can confuse users and distract them from choosing the right technique for their mapping projects. In fact, universal kriging, kriging with external drift and regression-kriging are essentially the same technique.

Matheron (1969) originally termed the technique Le krigeage universel; however, the technique was intended as a generalized case of kriging in which the trend is modelled as a function of coordinates. Thus, many authors reserve the term universal kriging (UK) for the case when only the coordinates are used as predictors. If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than of the coordinates, the term kriging with external drift (KED) is preferred (according to Hengl et al. 2007, "About regression-kriging: From equations to case studies"). In UK or KED, predictions are made as with kriging, except that the covariance matrix of residuals is extended with the auxiliary predictors. However, the drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed and de Marsily (1987), and Odeh et al. (1995) later named it regression-kriging, while Goovaerts (1997) uses the term kriging with a trend model to refer to a family of interpolators and refers to RK as simple kriging with varying local means. Minasny and McBratney (2007) simply call this technique the empirical best linear unbiased predictor, i.e. E-BLUP.[7][8][9][4]

In the case of KED, predictions at new locations are made by:

${\displaystyle {\hat {z}}_{\mathtt {KED}}(\mathbf {s} _{0})=\sum \limits _{i=1}^{n}w_{i}^{\mathtt {KED}}(\mathbf {s} _{0})\cdot z(\mathbf {s} _{i})}$

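subject to the unbiasedness constraints

${\displaystyle \sum \limits _{i=1}^{n}w_{i}^{\mathtt {KED}}(\mathbf {s} _{0})\cdot q_{k}(\mathbf {s} _{i})=q_{k}(\mathbf {s} _{0})}$

for ${\displaystyle k=0,\ldots ,p}$ (with ${\displaystyle q_{0}(\mathbf {s} )=1}$, so that the weights sum to one), or in matrix notation: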

${\displaystyle {\hat {z}}_{\mathtt {KED}}(\mathbf {s} _{0})=\mathbf {\delta } _{\mathbf {0} }^{\mathbf {T} }\cdot \mathbf {z} }$

where ${\displaystyle z}$ is the target variable, ${\displaystyle q_{k}}$ are the predictor variables (with ${\displaystyle q_{k}(\mathbf {s} _{0})}$ their values at the new location ${\displaystyle \mathbf {s} _{0}}$), ${\displaystyle {\mathbf {\delta } }_{\mathbf {0} }}$ is the vector of KED weights (${\displaystyle w_{i}^{\mathtt {KED}}}$), ${\displaystyle p}$ is the number of predictors and ${\displaystyle \mathbf {z} }$ is the vector of ${\displaystyle n}$ observations at the primary (sampled) locations. The KED weights are obtained by solving the extended system:

${\displaystyle \mathbf {\lambda } _{\mathbf {0} }^{\mathtt {KED}}=\left\{w_{1}^{\mathtt {KED}}(\mathbf {s} _{0}),\ldots ,w_{n}^{\mathtt {KED}}(\mathbf {s} _{0}),\varphi _{0}(\mathbf {s} _{0}),\ldots ,\varphi _{p}(\mathbf {s} _{0})\right\}^{\mathbf {T} }=\mathbf {C} ^{{\mathtt {KED}}-1}\cdot \mathbf {c} _{\mathbf {0} }^{\mathtt {KED}}}$

where ${\displaystyle {\mathbf {\lambda } }_{\mathbf {0} }^{\mathtt {KED}}}$ is the vector of solved weights, ${\displaystyle \varphi _{0},\ldots ,\varphi _{p}}$ are the Lagrange multipliers, ${\displaystyle {\mathbf {C} }^{\mathtt {KED}}}$ is the extended covariance matrix of residuals and ${\displaystyle {\mathbf {c} }_{\mathbf {0} }^{\mathtt {KED}}}$ is the extended vector of covariances at the new location.

In the case of KED, the extended covariance matrix of residuals looks like this (Webster and Oliver, 2007; p. 183):[10]

${\displaystyle \mathbf {C} ^{\mathtt {KED}}=\left[{\begin{array}{ccccccc}C(\mathbf {s} _{1},\mathbf {s} _{1})&\cdots &C(\mathbf {s} _{1},\mathbf {s} _{n})&1&q_{1}(\mathbf {s} _{1})&\cdots &q_{p}(\mathbf {s} _{1})\\\vdots &&\vdots &\vdots &\vdots &&\vdots \\C(\mathbf {s} _{n},\mathbf {s} _{1})&\cdots &C(\mathbf {s} _{n},\mathbf {s} _{n})&1&q_{1}(\mathbf {s} _{n})&\cdots &q_{p}(\mathbf {s} _{n})\\1&\cdots &1&0&0&\cdots &0\\q_{1}(\mathbf {s} _{1})&\cdots &q_{1}(\mathbf {s} _{n})&0&0&\cdots &0\\\vdots &&\vdots &\vdots &\vdots &&\vdots \\q_{p}(\mathbf {s} _{1})&\cdots &q_{p}(\mathbf {s} _{n})&0&0&\cdots &0\end{array}}\right]}$

and ${\displaystyle \mathbf {c} _{\mathbf {0} }^{\mathtt {KED}}}$ like this:

${\displaystyle \mathbf {c} _{\mathbf {0} }^{\mathtt {KED}}=\left\{C(\mathbf {s} _{0},\mathbf {s} _{1}),\ldots ,C(\mathbf {s} _{0},\mathbf {s} _{n}),q_{0}(\mathbf {s} _{0}),q_{1}(\mathbf {s} _{0}),\ldots ,q_{p}(\mathbf {s} _{0})\right\}^{\mathbf {T} };q_{0}(\mathbf {s} _{0})=1}$

KED therefore looks exactly like ordinary kriging, except that the covariance matrix and vector are extended with the values of the auxiliary predictors.
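The equivalence between KED and RK discussed in this section can be illustrated by solving the extended KED system directly and comparing the result with RK that uses GLS trend coefficients and the same covariance model. This is a sketch under assumptions: NumPy, an exponential covariance with fixed, made-up parameters, and synthetic data with p = 2 hypothetical predictors. `ked_predict` and `rk_predict` are illustrative names, not functions of any geostatistics package.

```python
import numpy as np


def exp_cov(d, c0, c1, r):
    """Exponential covariance: c0 + c1 at zero distance, c1 * exp(-d / r) otherwise."""
    return np.where(d == 0, c0 + c1, c1 * np.exp(-d / r))


def pairwise(coords):
    """Matrix of Euclidean distances between all sample locations."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def ked_predict(coords, q, z, s0, q0, cov):
    """KED: solve C_KED [w; phi] = c0_KED and return w^T z, the weights and multipliers.

    q must contain the constant column q_0 = 1 followed by q_1..q_p.
    """
    n, m = q.shape                                   # m = p + 1
    K = np.zeros((n + m, n + m))
    K[:n, :n] = cov(pairwise(coords))                # residual covariances C(s_i, s_j)
    K[:n, n:] = q                                    # columns 1, q_1(s_i), ..., q_p(s_i)
    K[n:, :n] = q.T                                  # matching constraint rows; lower-right block stays 0
    rhs = np.concatenate([cov(np.linalg.norm(coords - s0, axis=1)), q0])
    sol = np.linalg.solve(K, rhs)
    w, phi = sol[:n], sol[n:]                        # KED weights, Lagrange multipliers
    return w @ z, w, phi


def rk_predict(coords, q, z, s0, q0, cov):
    """RK: GLS trend at s0 plus simple kriging of the GLS residuals."""
    C = cov(pairwise(coords))
    c_0 = cov(np.linalg.norm(coords - s0, axis=1))
    Ci_q = np.linalg.solve(C, q)
    beta = np.linalg.solve(q.T @ Ci_q, Ci_q.T @ z)   # GLS coefficients
    lam = np.linalg.solve(C, c_0)
    return q0 @ beta + lam @ (z - q @ beta)


def cov(d):
    return exp_cov(d, 0.1, 1.0, 25.0)                # assumed residual covariance model


rng = np.random.default_rng(11)
n = 70
coords = rng.uniform(0, 100, size=(n, 2))
q = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(0, 1, size=n)])  # p = 2
z = q @ np.array([1.0, 0.8, -2.0]) + rng.normal(scale=0.4, size=n)
s0 = np.array([30.0, 70.0])                          # assumed new location
q0 = np.array([1.0, -0.2, 0.5])                      # assumed predictor values at s0

z_ked, w, phi = ked_predict(coords, q, z, s0, q0, cov)
z_rk = rk_predict(coords, q, z, s0, q0, cov)
print(z_ked, z_rk)
```

The returned weights also satisfy the unbiasedness constraints (the weighted sum of each predictor at the sampled locations equals its value at the new location, for k = 0, ..., p), which can be checked with `w @ q`.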

Although KED seems, at first glance, to be computationally more straightforward than RK, the parameters of the variogram for KED must also be estimated from regression residuals, which requires a separate regression modelling step. This regression should be GLS because of the likely spatial correlation between residuals. Many analysts instead use the OLS residuals, which may not differ much from the GLS residuals. However, OLS residuals are not optimal if there is any spatial correlation, and they may be quite different for clustered sample points or if the number of samples is relatively small (${\displaystyle \ll 200}$).

A limitation of KED is the instability of the extended matrix when the covariate does not vary smoothly in space. RK has the advantage that it explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily complex forms of regression rather than only the simple linear techniques that can be used with KED. In addition, it allows the two interpolated components to be interpreted separately. The emphasis on regression is also important because fitting the deterministic part of variation (regression) is often more beneficial for the quality of the final maps than fitting the stochastic part (residuals).

Software to run regression-kriging

Example of a generic framework for spatial prediction of soil variables based on regression-kriging.[9]

Regression-kriging can be automated, e.g. in the R statistical computing environment, using the gstat and/or geoR packages. Typical inputs and outputs include:

INPUTS:

• Interpolation set (point map): ${\displaystyle z(\mathbf {s} _{i})}$, ${\displaystyle i=1,\ldots ,n}$ at primary locations;
• Minimum and maximum expected values and measurement precision (${\displaystyle \Delta z}$);
• Continuous predictors (raster map): ${\displaystyle q(\mathbf {s} )}$ at new, unvisited locations;
• Discrete predictors (polygon map);
• Validation set (point map): ${\displaystyle z^{*}(\mathbf {s} _{j})}$, ${\displaystyle j=1,\ldots ,l}$ (optional);
• Lag spacing and limiting distance (required to fit the variogram).

OUTPUTS:

• Map of predictions and relative prediction error;
• Best subset of predictors and correlation significance (adjusted R-square);
• Variogram model parameters (e.g. nugget ${\displaystyle C_{0}}$, partial sill ${\displaystyle C_{1}}$ and range ${\displaystyle R}$);
• GLS drift model coefficients;
• Accuracy of prediction at validation points: mean prediction error (MPE) and root mean square prediction error (RMSPE).
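The two validation statistics in the last output item can be computed as below. This sketch assumes NumPy and defines errors as predicted minus observed; sign conventions for MPE vary between authors, so under this convention a negative MPE means the predictions are, on average, too low. `validation_stats` is a hypothetical helper name.

```python
import numpy as np


def validation_stats(predicted, observed):
    """Mean prediction error (MPE) and root mean square prediction error (RMSPE)
    at validation points; errors are taken as predicted - observed."""
    err = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    mpe = err.mean()                      # systematic bias
    rmspe = np.sqrt(np.mean(err ** 2))    # overall accuracy
    return mpe, rmspe


mpe, rmspe = validation_stats([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mpe, rmspe)   # mpe = -0.667, rmspe = 1.155 (rounded)
```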

Application of regression-kriging

Regression-kriging is used in various applied fields, including meteorology, climatology, soil mapping, geological mapping and species distribution modeling. The only requirement for using regression-kriging instead of, e.g., ordinary kriging is that one or more covariate layers exist that are significantly correlated with the feature of interest. Some general applications of regression-kriging are:

• Geostatistical mapping: Regression-kriging allows the use of hybrid geostatistical techniques to model, e.g., the spatial distribution of soil properties.
• Downscaling of maps: Regression-kriging can be used as a framework to downscale various existing gridded maps. In this case the covariate layers need to be available at a finer resolution than that of the original point data, whose effective resolution corresponds to the sampling intensity.[11]
• Error propagation: Simulated maps generated by using a regression-kriging model can be used for scenario testing and for estimating propagated uncertainty.
Simulations of zinc concentrations derived using a regression-kriging model. This model uses one continuous (distance to the river) and one categorical (flooding frequency) covariate.

Regression-kriging-based algorithms play an increasingly important role in geostatistics because the number of available covariates keeps growing.[1] For example, DEMs are now available from a number of sources. Detailed and accurate images of topography can now be ordered from remote sensing systems such as SPOT and ASTER; SPOT5 offers the High Resolution Stereoscopic (HRS) scanner, which can be used to produce DEMs at resolutions of up to 5 m.[12] Finer differences in elevation can also be obtained with airborne laser scanners. Such data are either free or becoming cheaper as technology advances. NASA recorded most of the world's topography in the Shuttle Radar Topography Mission in 2000.[13] Since the summer of 2004, these data have been available (e.g. via USGS ftp) for almost the whole globe at a resolution of about 90 m (and for the North American continent at about 30 m). Likewise, MODIS multispectral images are freely available for download at resolutions of up to 250 m. A large free repository of Landsat images is also available for download via the Global Land Cover Facility (GLCF).

References

1. Pebesma, Edzer J (1 July 2006). "The Role of External Variables and GIS Databases in Geostatistical Analysis" (PDF). Transactions in GIS. 10 (4): 615–632. doi:10.1111/j.1467-9671.2006.01015.x. S2CID 22146107.
2. Matheron, Georges (1969). Le krigeage universel. Cahiers du Centre de morphologie mathématique de Fontainebleau, Part 1. École nationale supérieure des mines de Paris.
3. Cressie, Noel (2012). Statistics for spatio-temporal data. Hoboken, N.J.: Wiley. ISBN 9780471692744.
4. Minasny, Budiman; McBratney, Alex B. (31 July 2007). "Spatial prediction of soil properties using EBLUP with the Matérn covariance function". Geoderma. 140 (4): 324–336. Bibcode:2007Geode.140..324M. doi:10.1016/j.geoderma.2007.04.028.
5. Christensen, Ronald (2001). Advanced linear modeling: multivariate, time series, and spatial data; nonparametric regression and response surface maximization (2nd ed.). New York, NY: Springer. ISBN 9780387952963.
6. Goldberger, A.S. (1962). "Best Linear Unbiased Prediction in the Generalized Linear Regression Model". Journal of the American Statistical Association. 57 (298): 369–375. doi:10.1080/01621459.1962.10480665. JSTOR 2281645.
7. Ahmed, Shakeel; De Marsily, Ghislain (1 January 1987). "Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity". Water Resources Research. 23 (9): 1717. Bibcode:1987WRR....23.1717A. doi:10.1029/WR023i009p01717.
8. Odeh, I.O.A.; McBratney, A.B.; Chittleborough, D.J. (31 July 1995). "Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging". Geoderma. 67 (3–4): 215–226. Bibcode:1995Geode..67..215O. doi:10.1016/0016-7061(95)00007-B.
9. Hengl, Tomislav; Heuvelink, Gerard B.M.; Stein, Alfred (30 April 2004). "A generic framework for spatial prediction of soil variables based on regression-kriging" (PDF). Geoderma. 120 (1–2): 75–93. Bibcode:2004Geode.120...75H. doi:10.1016/j.geoderma.2003.08.018.
10. Webster, Richard; Oliver, Margaret A. (2007). Geostatistics for environmental scientists (2nd ed.). Chichester: Wiley. ISBN 9780470028582.
11. Hengl, Tomislav; Bajat, Branislav; Blagojević, Dragan; Reuter, Hannes I. (1 December 2008). "Geostatistical modeling of topography using auxiliary maps" (PDF). Computers & Geosciences. 34 (12): 1886–1899. Bibcode:2008CG.....34.1886H. doi:10.1016/j.cageo.2008.01.005.
12. Toutin, Thierry (30 April 2006). "Generation of DSMs from SPOT-5 in-track HRS and across-track HRG stereo data using spatiotriangulation and autocalibration". ISPRS Journal of Photogrammetry and Remote Sensing. 60 (3): 170–181. Bibcode:2006JPRS...60..170T. doi:10.1016/j.isprsjprs.2006.02.003.
13. Rabus, Bernhard; Eineder, Michael; Roth, Achim; Bamler, Richard (31 January 2003). "The shuttle radar topography mission—a new class of digital elevation models acquired by spaceborne radar". ISPRS Journal of Photogrammetry and Remote Sensing. 57 (4): 241–262. Bibcode:2003JPRS...57..241R. doi:10.1016/S0924-2716(02)00124-7.

• Hengl T., Heuvelink G. B. M., Rossiter D. G. (2007). "About regression-kriging: from equations to case studies". Computers & Geosciences. 33 (10): 1301–1315. Bibcode:2007CG.....33.1301H. doi:10.1016/j.cageo.2007.05.001.