2.2.1. IDW Interpolation
IDW is one of the oldest spatial interpolation methods used in different research fields [57,58,59], with various versions aiming to improve parameter selection [60,61,62,63]. IDW estimates values at unsampled locations using the data recorded at neighboring locations. The formula used for this purpose is
$$\hat{z}(s_0) = \frac{\sum_{i=1}^{n} z(s_i)\, d_i^{-\beta}}{\sum_{i=1}^{n} d_i^{-\beta}} \quad (1)$$
where:
n is the number of sampling points,
$\hat{z}(s_0)$ is the estimated value at the site $s_0$,
$z(s_i)$ is the value recorded at the site $s_i$,
$d_i$ is the distance from $s_i$ to $s_0$, and
β > 1 is a parameter [64].
According to Equation (1), the closer the location $s_i$ is to $s_0$, the higher its contribution in computing the estimated value [64]. The default β utilized in most applications is 2, giving the inverse squared distance (ISD) interpolator [65].
The main disadvantage of this method is the arbitrary choice of β and its low performance for clustered or unevenly distributed points. Such points located at similar distances from the target location will have approximately the same weight, and so contribute almost equally to the estimated value [66].
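To make Equation (1) concrete, a minimal base-R implementation is sketched below; the function name idw_predict() and the toy values are ours for illustration, not part of the original analysis.

```r
# Minimal IDW predictor following Equation (1); idw_predict() is a
# hypothetical helper, not a function from the original study.
idw_predict <- function(x0, y0, x, y, z, beta = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)     # distances to the target site
  if (any(d == 0)) return(z[d == 0][1])  # exact hit: return the observed value
  w <- d^(-beta)                         # inverse-distance weights
  sum(w * z) / sum(w)                    # weighted average of Equation (1)
}

# Toy example: three sampling points in a row, target at the origin
idw_predict(0, 0, x = c(1, 2, 3), y = c(0, 0, 0), z = c(10, 20, 30), beta = 2)
```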
In optimizing the IDW model, the focus lies on selecting the most effective power parameter β using LOOCV. This optimization evaluates a range of power parameters (from 1 to 10) to determine their influence on the model’s accuracy. During each iteration, for a specific β, LOOCV is conducted: the model’s predictive performance is assessed by temporarily omitting all records from one point at a time from the dataset [67].
Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) were used to assess the model’s performance. These metrics provide a comprehensive model evaluation, with RMSE as the primary criterion for selecting the optimal β. The power parameter that results in the lowest RMSE is identified as optimal, ensuring precision and reliability in the model’s spatial predictions.
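A minimal sketch of this selection loop in base R is shown below; the data frame pts with columns x, y, and z is a hypothetical stand-in for the study’s dataset.

```r
# Hypothetical LOOCV grid search for the IDW power parameter beta,
# assuming a data frame pts with columns x, y, z
betas <- 1:10
rmse <- sapply(betas, function(beta) {
  errs <- sapply(seq_len(nrow(pts)), function(i) {
    d <- sqrt((pts$x[-i] - pts$x[i])^2 + (pts$y[-i] - pts$y[i])^2)
    w <- d^(-beta)
    sum(w * pts$z[-i]) / sum(w) - pts$z[i]  # prediction error at left-out point
  })
  sqrt(mean(errs^2))
})
best_beta <- betas[which.min(rmse)]         # beta with the lowest LOOCV RMSE
```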
2.2.2. Spline Smoothing
The key idea behind spline interpolation is to fit a function consisting of local polynomials of degree p through the data points. These polynomials describe pieces of a line or a surface and can be constrained to pass through all the known data points while maintaining smoothness. Among the types of splines used in spatial interpolation, the most common is the cubic spline. It is defined over small intervals, and its cubic polynomials are stitched together at specific points, called knots. These knots are the known data points through which the curve must pass. Alternatively, knots away from the data points can be fitted using least squares or other methods to produce smoothing splines [68].
The most significant advantage of cubic spline interpolation (CSI) is the smoothness of the curve, capturing the overall data shape. The basic minimum curvature technique makes CSI more convenient for gently varying surfaces. However, errors can be expected when the data points are unevenly distributed; in such cases, the tension spline interpolation technique can be applied [68,69].
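As a brief one-dimensional illustration, base R distinguishes the two spline flavors mentioned above: spline() interpolates through the points, while smooth.spline() fits a smoothing spline with knots chosen away from the data; the sample values below are invented for demonstration.

```r
# Cubic spline interpolation through known data points (toy values)
x <- c(0, 1, 2, 3, 4)
y <- c(1.0, 2.7, 1.8, 3.5, 2.2)
interp_fit <- spline(x, y, n = 101, method = "natural")  # natural cubic spline

# Smoothing spline: the curve need not pass through the data points
smooth_fit <- smooth.spline(x, y, df = 3)
```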
BSS is an extension of CSI for interpolating data points on a two-dimensional regular grid. The interpolated surface is defined by a piecewise polynomial function in two variables, x and y. In each rectangular cell formed by the data points, the interpolating function is a bicubic polynomial:
$$f(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij}\, x^{i} y^{j},$$
where $a_{ij}$ are the coefficients of the polynomial.
We utilized the ‘interp()’ function of the akima package in R [70] with the parameter linear = FALSE to implement bicubic interpolation. The method offers smooth surface fitting for our spatial dataset. Despite setting extrap = TRUE to enable the extrapolation of values beyond the convex hull of our data points, we encountered instances of NA results, particularly at the edge points of the rectangular grid (points 1, 3–9). Note that, in this context, ‘NA’ signifies missing interpolated values at some locations because BSS lacks the nearby data points it requires to compute a value. This limitation arises because the interpolation relies on data that fall within the range of observed values, and points at the periphery may lack sufficient neighboring data to guide the extrapolation process. The issue is particularly pronounced during LOOCV when edge points are omitted, leading to significant uncertainty and potentially less accurate predictions as the algorithm stretches beyond its reliable bounds.
To address this challenge, our R code introduces a solution by creating a boundary buffer: a series of synthetic points added around the grid’s perimeter. This technique extends the convex hull to include these new points, effectively converting extrapolation scenarios into interpolation ones. The synthetic points are placed just beyond the original grid’s extent, and their associated values are derived from the mean of the nearest actual data points, ensuring a smooth gradient that aligns with the known data distribution. By integrating this boundary buffer into our LOOCV process, we facilitate the ‘interp()’ function’s ability to perform interpolation across the entire grid, including the previously problematic edge points. This approach mitigates the occurrence of NA results and enhances the precision of our spatial predictions, leveraging the strengths of bicubic interpolation across the augmented dataset.
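A condensed sketch of this buffering step is given below; the buffer width, ring resolution, k-nearest averaging, and the data frame pts (columns x, y, z) are illustrative assumptions rather than the study’s exact code.

```r
library(akima)

# Hypothetical boundary buffer: synthetic points just outside the grid,
# valued as the mean of their k nearest actual observations
pad <- 0.5                                   # assumed buffer distance
gx <- seq(min(pts$x) - pad, max(pts$x) + pad, length.out = 10)
gy <- seq(min(pts$y) - pad, max(pts$y) + pad, length.out = 10)
ring <- rbind(data.frame(x = gx,      y = min(gy)),
              data.frame(x = gx,      y = max(gy)),
              data.frame(x = min(gx), y = gy),
              data.frame(x = max(gx), y = gy))
ring <- unique(ring)                         # drop duplicated corner points

k <- 3
ring$z <- apply(ring, 1, function(p) {
  d <- sqrt((pts$x - p["x"])^2 + (pts$y - p["y"])^2)
  mean(pts$z[order(d)[1:k]])                 # mean of nearest real observations
})

aug <- rbind(pts[, c("x", "y", "z")], ring)  # augmented dataset
# Bicubic spline interpolation now stays inside the extended convex hull
surf <- interp(aug$x, aug$y, aug$z, linear = FALSE, extrap = TRUE)
```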
2.2.3. Spatio-Temporal Kriging
Kriging represents a comprehensive suite of generalized least-squares regression algorithms for spatial interpolation. Unlike deterministic methodologies, this approach optimizes the weights in a linear predictor to achieve the lowest possible average interpolation error [71]. STK expands upon the foundational principles of traditional kriging, enabling the analysis of data that encompass both spatial and temporal aspects, which can be particularly advantageous. STK aims to proficiently forecast values at unobserved spatio-temporal points, utilizing the intricate patterns of spatial and temporal correlations within the dataset. STK’s strength lies in its ability to provide statistically robust interpolation, even at boundary points: it can make better-informed predictions at the edges of a grid by considering the spatial relationship of points within the dataset. However, its complexity requires a thorough understanding of the underlying statistical models and assumptions, making it more challenging to implement than simpler methods such as IDW or BSS.
At the core of STK lies the concept of Gaussian processes, which can be thought of as a generalization of the normal distribution to functions. It is a probabilistic model assuming that every point in some continuous input space is associated with a normally distributed random variable [72]. Understanding Gaussian processes helps in comprehending how STK makes predictions at new spatio-temporal points by considering both the mean and variability of the data, providing a statistical framework for our analysis [73].
Considering that a sequence of observations is sampled from a process that can be decomposed into a true spatio-temporal process (assumed Gaussian) and an observational error, the process is expressible through spatio-temporal fixed effects attributed to various covariates. The observational error is modeled as a spatio-temporal-dependent random process. In this case, the process can be modeled by
$$Z(s_{ij}; t_j) = Y(s_{ij}; t_j) + \varepsilon(s_{ij}; t_j), \quad i = 1, \ldots, m_j, \; j = 1, \ldots, T,$$
where for each time $t_j$ there are $m_j$ observations [66].
This formulation effectively separates the observed data into a deterministic component influenced by specific covariates and a stochastic component that accounts for the randomness inherent in the observation process.
The overarching objective is to employ the dataset z (which includes time- and space-specific observations) to build a model of the random field Z [74], aiming to predict values at unobserved spatio-temporal locations or to enable simulations based on its conditional distribution. It assumes that Z maintains stationarity and exhibits spatial isotropy, allowing it to be defined by a mean function and a covariance function. With an aptly chosen covariance function, one can ascertain the covariance matrices essential for the linear predictor. By applying algebraic methods such as those used in established spatial analyses, it becomes feasible to accurately predict the values of the intended random field [75]. In this article, one covariance model was implemented from the gstat package, as described in [74]. The separable model assumes that the global spatio-temporal covariance can be expressed as the product of a spatial and a temporal term, with a variogram given by
$$\gamma_{\mathrm{sep}}(h, u) = \mathrm{sill} \cdot \left( \bar{\gamma}_s(h) + \bar{\gamma}_t(u) - \bar{\gamma}_s(h)\, \bar{\gamma}_t(u) \right),$$
where $\bar{\gamma}_s(h)$ and $\bar{\gamma}_t(u)$ are the standardized spatial and temporal variograms with separate nugget effects and a (joint) sill.
For this study, we explored diverse variogram types (exponential, spherical, Gaussian, and Matérn) that best fit the data series and evaluated them in relation to the covariance model under consideration.
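These candidates can be specified through gstat; a minimal sketch follows, in which the sills, ranges, and nuggets are placeholder values, and stv_emp denotes an empirical spatio-temporal variogram assumed to have been computed earlier (e.g., with variogramST()).

```r
library(gstat)

# Separable spatio-temporal variogram model; the marginal types (Exp, Sph,
# Gau, Mat) are swapped in turn when searching for the best-fitting combination
sep_model <- vgmST("separable",
                   space = vgm(psill = 0.9, model = "Exp", range = 200, nugget = 0.1),
                   time  = vgm(psill = 0.9, model = "Sph", range = 30,  nugget = 0.1),
                   sill  = 1)

# stv_emp: empirical spatio-temporal variogram (assumed precomputed)
sep_fit <- fit.StVariogram(stv_emp, sep_model)
```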
A direct implementation of the above procedure was not possible, given the particularities of the series set. Outliers in spatial and temporal datasets can markedly distort the predictive modeling in kriging, leading to skewed semivariograms and, thus, unreliable interpolation results. Data transformation techniques play a pivotal role in counteracting this. Transformations such as the logarithmic, square root, or more adaptive methods such as the Box–Cox transformation re-scale the data, diminishing the influence of outliers. These transformations compress the range of extreme values, reducing their leverage on the analysis. By normalizing the data distribution and minimizing the variance introduced by outliers, these transformations facilitate a more accurate representation of spatial and temporal autocorrelation, which is central to kriging.
Moreover, pre-processing the datasets, particularly removing spatio-temporal trends to focus on residuals, is an indispensable step that significantly improves the interpretability and precision of kriging outcomes. The detrending process separates the core spatial structure and inherent temporal dynamics from the observed variability, enabling kriging to tune into the intrinsic autocorrelation of the residuals without the confounding influence of broader trends. Working directly with residuals circumvents the complications introduced by non-stationary behavior attributable to deterministic trends, leading to enhanced prediction accuracy and a better understanding of the stochastic elements of the data. Therefore, we adopted a modeling approach using generalized additive mixed models (GAMM) and linear mixed effects (LME) models to remove spatio-temporal trends and work with the residuals.
To address the normality issue that remained after the log10 transformation (Section 2.1), we initially adopted a straightforward approach using linear models by implementing the lm() function, which fits linear surfaces to the data. This first step involves exploring simple relationships, such as first-order (planar) or second-order (curved) surfaces, to understand the basic trends in the series set. However, the data series is collected over sea and desert areas, and these basic linear models often fail: they do not adequately capture the complex relationships inherent in the data, such as non-linear patterns and the unique characteristics of different sampling points. Therefore, we transitioned to more advanced statistical models such as generalized additive models (GAM), GAMM, and LME [76].
GAMs are particularly adept at uncovering complex, non-linear relationships within a data series. For instance, they can help us understand how the concentration levels might increase non-uniformly in response to changing environmental factors such as humidity or temperature. This aspect is particularly relevant when considering the diverse nature of the data from various geographical locations. Building on this, GAMM adds another layer of sophistication by incorporating ‘random effects’. These are useful in our scenario, where each of the 70 distinct points presents unique environmental characteristics, allowing these differences to be modeled while analyzing the overall data trends.
LME models come into play due to their strength in handling linear relationships and their capability to address fixed effects (common trends across all points) and random effects (unique characteristics of each point). In the context of our dataset, which exhibits a clear seasonal pattern with peaks every six months, LMEs are particularly suitable. They can effectively capture these predictable temporal trends while accounting for spatial variations across locations. By focusing on the residuals from the LME component, we can discern aspects of the seasonal variation and the site-specific differences that simpler linear models do not fully explain.
GAMM from the ‘mgcv’ package in R was utilized to model the spatial field and temporal trends. Tensor product smooths (‘t2()’) built on thin plate regression splines allowed for flexible shaping of the spatial and temporal effects, capturing the complex underlying patterns in the data. The ‘gamm()’ function was used to fit these models; internally, it uses the ‘lme()’ function to estimate random effects, providing a robust framework for detrending the dataset. After fitting the GAMM, we extracted the residuals representing the detrended data: the pure spatio-temporal stochastic component we aim to model with kriging. The residuals follow a normal distribution almost perfectly, with adequate, stable variance, as indicated by D = 0.013273 and a p-value = 0.2279 of the K–S test and by visual verification of the diagnostic plots in Figure 3.
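A minimal sketch of this detrending step is shown below, assuming a data frame df with hypothetical columns lon, lat, t (numeric time index), site (factor), and logz (log10-transformed values); the smooth specification is illustrative, not the study’s exact formula.

```r
library(mgcv)

# Detrending sketch: tensor product smooth over space and time (thin plate
# marginal bases) with a random intercept per monitoring site
fit <- gamm(logz ~ t2(lon, lat, t, d = c(2, 1), bs = c("tp", "tp")),
            random = list(site = ~ 1), data = df)

res <- resid(fit$gam)              # detrended residuals for kriging
res_std <- as.numeric(scale(res))
ks.test(res_std, "pnorm")          # K-S check of residual normality
```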
For LOOCV, we systematically removed each station from the dataset and used the remaining stations to predict the left-out station’s values. We performed spatio-temporal kriging on these residuals, ensuring that the predictions were based on the stochastic properties rather than any underlying deterministic trends.
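The station-wise LOOCV can be sketched with gstat and spacetime as below; stfdf (an STFDF object holding the residuals in a column res) and vst (the fitted spatio-temporal variogram model) are assumed names, not objects from the study’s code.

```r
library(gstat)
library(spacetime)

# Leave-one-station-out CV: drop station k at all time steps and krige its
# residuals from the remaining stations
n_sta <- length(stfdf@sp)
cv_pred <- vector("list", n_sta)
for (k in seq_len(n_sta)) {
  train  <- stfdf[-k, ]   # every station except k
  target <- stfdf[k, ]    # station k, all time steps
  cv_pred[[k]] <- krigeST(res ~ 1, data = train, newdata = target,
                          modelList = vst)
}
```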
The final step involved back-transforming the predictions to the original series scale to interpret the results in their natural units. This was accomplished by first predicting the trend component for each data point using the ‘predict()’ function on the fitted GAMM model. Then, we combined this trend component with the kriged residuals to obtain the final predictions on the log10 scale. To correct the bias introduced by the log transformation, a correction factor of $\mathrm{CF} = \exp\!\left(\sigma^2 \ln^2(10)/2\right)$ was calculated, where $\sigma^2$ is the variance of the log10-transformed predictions. The final predictions were then scaled back to the original series units by reversing the log10 transformation and applying the correction factor. This back-transformation is crucial as it allows the results to be presented on the same scale as the original measurements, making them directly comparable and interpretable in the context of the study.
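Continuing the same hypothetical names as in the previous sketches, the back-transformation step could look as follows; the bias-correction form is our assumption, based on the standard lognormal correction for base-10 logs.

```r
# Recombine trend and kriged residuals on the log10 scale, then back-transform
trend_k  <- predict(fit$gam, newdata = target_df)  # GAMM trend at station k
log_pred <- trend_k + cv_pred[[k]]$var1.pred       # prediction on log10 scale
s2 <- var(log_pred)                                # variance of log10 predictions
cf <- exp(0.5 * s2 * log(10)^2)                    # assumed bias-correction factor
final_pred <- 10^log_pred * cf                     # original measurement units
```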
The entire LOOCV process was automated in R software, where we evaluated various variogram models, including spherical, exponential, Gaussian, and Matérn, to accurately capture the dataset’s complex spatio-temporal dynamics. This systematic approach involved fitting each model to the detrended data to identify the ones that best represent the spatio-temporal correlations. We kept the spatio-temporal combination of variogram models that yielded the lowest MSE.
To gauge the overall performance of the models, MAE, RMSE, and MAPE were calculated for each station and then averaged across all periods. MAE evaluates the systematic error, while RMSE better reflects the random errors; RMSE is valuable where large errors are particularly undesirable, as it flags models with occasional large deviations from the actual values. MAPE is beneficial when the magnitude of the data varies significantly or when comparing model accuracy across different data scales. Values close to zero indicate good model performance.
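For completeness, a plain-R transcription of these three metrics is given below; obs and sim denote vectors of observed and predicted values and are our notation, not the study’s.

```r
# Error metrics for observed (obs) and simulated (sim) value vectors
mae  <- mean(abs(sim - obs))                 # Mean Absolute Error
rmse <- sqrt(mean((sim - obs)^2))            # Root Mean Squared Error
mape <- 100 * mean(abs((sim - obs) / obs))   # MAPE in percent; assumes obs != 0
```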
Three more efficiency indicators were also used for comparing the models:
Nash–Sutcliffe Efficiency (NSE) is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance. It is defined by
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (O_i - S_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2},$$
where
$O_i$ = observed data,
$S_i$ = simulated data,
$\bar{O}$ = mean of observed data, and
N = number of time steps.
The range of NSE is (−∞, 1). NSE = 1 indicates a perfect match between the model and the observations. A value of 0 shows that the model predictions are as accurate as the average of the observed data. Still, NSE is sensitive to both high and low values.
Kling–Gupta Efficiency (KGE) decomposes the NSE into components representing correlation, variability, and bias, providing a more comprehensive measure of model performance. The formula is
$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2},$$
where
r = correlation between observed and simulated data,
α = ratio of the standard deviation of simulated to observed data, and
β = ratio of the mean of simulated to observed data.
The benefits of using KGE are as follows:
(1) The decomposition allows us to diagnose which aspect of the model is contributing to inefficiencies.
(2) KGE balances different performance aspects, providing a more holistic view of model performance.
(3) Using it together with NSE supports informed decisions about model improvements.
Index of Agreement (dIndex) is a standardized measure of model prediction error, representing the ratio of the mean square error to the potential error. It is given by
$$d = 1 - \frac{\sum_{i=1}^{N} (O_i - S_i)^2}{\sum_{i=1}^{N} \left( |S_i - \bar{O}| + |O_i - \bar{O}| \right)^2},$$
with the same notation as above. dIndex measures the magnitude of the error together with its pattern and distribution. It is effectively an agreement measure, assessing how well the predicted data match the observed data range. Moreover, it can be used across different disciplines, including hydrology, climatology, and ecology.
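The three indicators can be computed directly with the same obs/sim vectors as before; this is a plain-R transcription of the formulas above, not the study’s code.

```r
# Efficiency indicators for observed (obs) and simulated (sim) vectors
nse <- 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)

r     <- cor(obs, sim)       # correlation component
alpha <- sd(sim) / sd(obs)   # variability ratio
beta  <- mean(sim) / mean(obs)  # bias ratio
kge <- 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)

d_index <- 1 - sum((obs - sim)^2) /
  sum((abs(sim - mean(obs)) + abs(obs - mean(obs)))^2)
```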
Finally, we present a comprehensive workflow diagram (Figure 4) and its corresponding algorithmic description. Algorithm 1 details the step-by-step process of analyzing data series through STK, including data preprocessing, transformation, and validation stages, culminating in the prediction and error quantification phases.
Algorithm 1. The algorithm for the data series interpolation using STK has the following stages:
0. Input: raw dataset
1. Perform Exploratory Data Analysis (EDA) on the raw dataset
2. Assess the EDA results: IF the dataset passes the normality and homoscedasticity tests, THEN GOTO Step 3; ELSE GOTO Step 6
3. Perform LOOCV for STK
4. Back-transform the predicted data
5. Calculate the error metrics: MAE, RMSE, and MAPE. END Algorithm
6. Data transformation (IF needed after Step 2)
7. Simple linear detrending (IF Step 6 fails to normalize the data)
8. Complex model detrending (IF Step 7 fails to normalize the data)