3.2. Linear Model Outcomes
The correlation matrix between SOC and the 25 covariates (23 geophysical variables plus the geographical coordinates) was first computed, and different sets of highly correlated covariates were derived and used to fit SOC data.
The following equation shows the first attempt to model SOC with the most correlated variables:
The five-point summary statistics and the coefficients of the linear model are reported in
Table 3 and
Table 4. The outcomes seemed to indicate a larger contribution of the GPR data than of the EMI sensor data. The covariates related to the higher-frequency antenna (1600 MHz) were therefore excluded.
In particular, the GPR data representations for both frequencies showed a first discontinuity in the radar signal at 0.1 m depth, a high level of spatial continuity along the soil profile to at least 0.30 m, and a second discontinuity below 0.30 m depth. The selected covariates were therefore representative of information derived from two different layers.
The model was significant (F-statistic: 4.80 on 3 and 67 DF,
p-value: 0.004) and showed a residual standard error of 0.26 with 67 degrees of freedom; multiple R-squared and adjusted R-squared were 0.177 and 0.14, respectively. Analysing
Table 4, it was evident that there was a unique significant covariate, ckAmp0.1m_600MHz. This result showed that the distribution of SOC was significantly affected by the shallower layer, probably because this layer was comparable with the portion of sampled soil.
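The reported diagnostics (F-statistic on 3 and 67 DF, residual standard error, multiple and adjusted R-squared) can be reproduced with a minimal ordinary-least-squares sketch. The data below are synthetic stand-ins: random columns take the place of the real covariates such as ckAmp0.1m_600MHz, and the coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 71                       # 3 and 67 residual DF imply 71 observations
X = rng.normal(size=(n, 3))  # stand-ins for the three selected covariates
soc = 1.5 + 0.4 * X[:, 0] + rng.normal(scale=0.26, size=n)

A = np.column_stack([np.ones(n), X])           # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, soc, rcond=None)
resid = soc - A @ beta
p = X.shape[1]                                 # number of covariates
rss = resid @ resid
tss = ((soc - soc.mean()) ** 2).sum()
r2 = 1 - rss / tss
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
rse = np.sqrt(rss / (n - p - 1))               # residual standard error
print(f"F = {f_stat:.2f} on {p} and {n - p - 1} DF, "
      f"R2 = {r2:.3f}, adj. R2 = {r2_adj:.3f}, RSE = {rse:.3f}")
```

With real data, the same quantities would match the values reported in Table 4 for the corresponding model.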
After many other attempts (not reported), a model was developed with the following optimal arrangement of the covariates:
This model included the geographical coordinates and a unique geophysical covariate, ckAmp0.35m_600MHz (see
Table 5 and
Table 6). This model was better than the previous one, with all covariates significant, a higher R-squared (multiple R-squared: 0.26, adjusted R-squared: 0.22), and a more significant F-statistic
p-value (F = 7.9 on 3 and 67 DF,
p-value: 0.00018). Residual standard error was 0.24 with 67 degrees of freedom.
The model’s residuals were then analysed. The Shapiro–Wilk normality test showed a nonsignificant departure from the normal distribution (W = 0.98567, p-value = 0.598); as a consequence, the Gaussian hypothesis was accepted. Afterwards, a spatial autocorrelation analysis was performed to check to what extent the linear model had filtered out the autocorrelation present in the raw data.
From
Table 7, it was evident that in the linear model’s residuals, there was still a significant quantity of spatial autocorrelation (
p-value = 0.0012). Therefore, it made sense to apply regression kriging (RK) to exploit the residual autocorrelation with the aim of improving the goodness of fit.
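The residual autocorrelation check can be sketched with a Moran's I statistic. The exact test used in the paper is not specified here, so the inverse-distance weighting scheme below is an assumption, and the coordinates and residuals are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(71, 2))  # stand-in sampling locations (m)
resid = rng.normal(size=71)                 # stand-in linear-model residuals

# Inverse-distance spatial weight matrix with a zero diagonal
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
w = np.where(d > 0, 1.0 / np.where(d > 0, d, 1.0), 0.0)

# Moran's I = (n / S0) * (z' W z) / (z' z) on the centred residuals
z = resid - resid.mean()
n, s0 = len(resid), w.sum()
moran_i = (n / s0) * (z @ w @ z) / (z @ z)
print(f"Moran's I = {moran_i:.4f}")
```

Under the null hypothesis of no autocorrelation, I is expected near −1/(n−1); a significantly larger value, as found here (p-value = 0.0012), justifies exploiting the residual structure with kriging.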
3.4. MARS Model Assessment
The original dataset was split into two complementary subsets, namely, training and test, corresponding to 80% and 20% of the original data, respectively.
Since the model is calibrated on the training dataset with the aim of predicting the test data, the two subsets should be statistically similar to some extent. For this reason, after the split, the subsets were subjected to the t-test for homogeneity of means and the Levene test for homogeneity of variances. In addition, a univariate cluster analysis, carried out to assess the presence of clusters in the data, showed that the observations could be split into four groups. This imposed a further constraint on the splitting: the training and test subsets should contain a balanced number of elements drawn from all the clusters. Both subsets were checked for Gaussianity by means of the Shapiro–Wilk test; the results showed a nonsignificant departure from the normal distribution for both (W = 0.99, p-value = 0.90 for the training set; W = 0.97, p-value = 0.81 for the test set).
A Welch two-sample t-test showed that the means of the two subsets were not statistically different (t = −0.25, df = 20.36,
p-value = 0.81). In addition, a boxplot confirmed the equality of the two means of the SOC variable subsets (
Figure 8).
A Levene test, based on the absolute deviations from the median with a modified structural zero removal method and correction factor, showed the homogeneity of the group variances (test statistic = 0.059,
p-value = 0.81). In
Figure 9, the placement of the observations for the training (red points) and the test (green points) sets is reported.
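The battery of similarity checks on the two subsets can be sketched with scipy. The SOC values below are synthetic stand-ins, and the 57/14 split approximates the 80/20 proportion; the cluster-balancing constraint is omitted for brevity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
soc = rng.normal(loc=1.6, scale=0.28, size=71)  # stand-in SOC values
idx = rng.permutation(71)
train, test = soc[idx[:57]], soc[idx[57:]]      # ~80/20 split

t_stat, t_p = stats.ttest_ind(train, test, equal_var=False)  # Welch t-test
w_stat, w_p = stats.levene(train, test, center='median')     # Brown-Forsythe
sh_w, sh_p = stats.shapiro(train)                            # Gaussianity
print(f"Welch t-test p = {t_p:.2f}, Levene p = {w_p:.2f}, "
      f"Shapiro-Wilk (training) p = {sh_p:.2f}")
```

Nonsignificant p-values for all three tests would support treating the subsets as statistically similar, as concluded in the text.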
In summary, the two subsets could be considered similar according to the distribution, mean value, and variance comparisons. Therefore, the training set seemed to be appropriate to calibrate the model and the test set to check for overfitting.
The MARS model selected only 4 out of 25 predictors, namely, ckAmp0.35m_600MHz, X, ckECaVer, and ckAmp0.1m_600MHz.
The model included the main GPR covariates selected previously. Regarding EMI data, only the apparent electrical conductivity measured in vertical polarization was selected because the two electrical conductivity variables were strongly correlated and therefore redundant. In addition, the sensor in vertical polarization had a maximum sensitivity approximately at a depth of 0.40 m, which was comparable with the time slices of GPR repeatedly selected (0.35 m).
From
Table 10, it can be seen that the MARS model comprised four terms: apart from the intercept, the first was linear, and the remaining two were interactions between pairs of covariates. An importance analysis based on the GCV and raw residual sum of squares (RSS) indices was then applied, and the selected predictors were ranked accordingly (
Table 11).
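A MARS prediction is a weighted sum of hinge basis functions, max(0, x − c) or max(0, c − x), and of their products for interaction terms. The sketch below mirrors the structure described above (intercept, one linear term, two two-way interactions); the knots and coefficients are hypothetical, not the fitted values from Table 10.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """MARS hinge basis: max(0, x - knot) if direction=1, max(0, knot - x) if -1."""
    return np.maximum(0.0, direction * (x - knot))

def mars_predict(amp035, x_coord, eca_ver, amp01):
    """Hypothetical MARS surface: intercept + linear term + two interactions."""
    return (1.50
            + 0.30 * amp035                                      # linear term
            + 0.12 * hinge(x_coord, 40.0) * hinge(eca_ver, 15.0, -1)
            - 0.08 * hinge(amp01, 0.5) * hinge(amp035, 0.2))

pred = mars_predict(np.array([0.4]), np.array([55.0]),
                    np.array([12.0]), np.array([0.9]))
print(pred)
```

The piecewise-linear hinges let the covariate effects switch on only past their knots, which is what distinguishes MARS from the purely additive linear model of the previous section.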
As a first step, the Gaussianity of the residuals after training was tested using the Shapiro–Wilk test; the residual distribution could be considered Gaussian (W = 0.98, p-value = 0.50).
By applying a blind cross-validation with k = 10 folds, the resulting goodness-of-fit value was 0.51. It should be borne in mind that this was a pessimistic result, as the extraction of blocks (k-fold with k = 10) from the original dataset was performed 200 times in a purely random fashion, neglecting the similarity constraints on the subsets. Moreover, the original dataset was relatively small and represented a rather heterogeneous reality. Finally, the goodness-of-fit results were averaged.
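The repeated, purely random k-fold scheme can be sketched as follows. An OLS fit stands in for the MARS model (the point here is the cross-validation loop, not the learner), and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 71
X = rng.normal(size=(n, 4))  # stand-ins for the four selected predictors
y = 1.5 + X @ np.array([0.4, 0.1, -0.2, 0.05]) + rng.normal(scale=0.2, size=n)

def kfold_r2(X, y, k=10, repeats=200, rng=rng):
    """Repeated purely random k-fold CV; returns the mean out-of-fold R2."""
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            A = np.column_stack([np.ones(len(train)), X[train]])
            beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
            At = np.column_stack([np.ones(len(fold)), X[fold]])
            resid = y[fold] - At @ beta
            scores.append(1 - resid @ resid
                          / ((y[fold] - y[fold].mean()) ** 2).sum())
    return float(np.mean(scores))

print(f"mean CV R2 = {kfold_r2(X, y):.2f}")
```

Because the folds are drawn without enforcing the similarity constraints used for the train/test split, the averaged score tends to understate the model's performance, consistent with the pessimistic 0.51 reported above.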
Then, the correlation between the predicted and observed values for the training set was checked; the results showed a certain agreement (r = 0.72, p-value ≈ 0). In addition, the correlation between the residuals and the predicted values of the training subset was checked and, as expected, was close to zero.
Afterwards, the MARS model calibrated on the training set was applied to predict SOC data from the test set, which was independent from the model calibration (training) set.
First, the correlation between the observations and the (test-set) predictions was analysed, which resulted in a highly significant correlation (r = 0.87, p-value ≈ 0). The correlation obtained in the validation step surprisingly outperformed that of the training set, which is a rare event. The correlation between the residuals and the (test-set) predictions was not significant.
According to the Shapiro–Wilk test, the residuals were Gaussian (W = 0.93, p-value = 0.23).
Computing the Lin concordance correlation coefficient (CCC) between the observations and the predictions showed very good agreement (overall CCC, 0.81; overall precision, 0.88; overall accuracy, 0.93).
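Lin's CCC decomposes into a precision term (the Pearson correlation) and an accuracy term (the bias-correction factor Cb), matching the three values quoted above. A minimal implementation, run here on synthetic stand-in data:

```python
import numpy as np

def lin_ccc(obs, pred):
    """Lin's concordance correlation: CCC = precision (Pearson r) x accuracy (Cb)."""
    mo, mp = obs.mean(), pred.mean()
    vo, vp = obs.var(), pred.var()           # population variances
    cov = ((obs - mo) * (pred - mp)).mean()
    ccc = 2 * cov / (vo + vp + (mo - mp) ** 2)
    precision = cov / np.sqrt(vo * vp)       # Pearson correlation
    accuracy = ccc / precision               # bias-correction factor Cb
    return ccc, precision, accuracy

rng = np.random.default_rng(4)
obs = rng.normal(1.6, 0.3, size=15)          # stand-in test-set observations
pred = obs + rng.normal(0, 0.12, size=15)    # stand-in test-set predictions
print(lin_ccc(obs, pred))
```

A CCC of 1 requires both perfect correlation and no location or scale shift, which is why it is stricter than Pearson's r for assessing predictive agreement.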
Since the observations were available, it was possible to compute the error metrics, which are reported in
Table 12.
The error indices were good overall; in particular, MAPE was below 10%, a value indicated in the literature as a critical threshold. Another very interesting result concerned the ratio between MAE and RMSE, which was larger than that obtained with regression kriging (0.8 vs. 0.76). In conclusion, the MARS model can be considered effective whenever the coefficients of the covariates are not constant over the study domain and the covariates interact in more complex ways than additively.
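The error metrics used above are straightforward to compute; the sketch below uses invented observation/prediction values. Note that MAE ≤ RMSE always holds, so the MAE/RMSE ratio lies in (0, 1], with values nearer 1 indicating errors of more uniform magnitude (fewer large outliers).

```python
import numpy as np

def error_metrics(obs, pred):
    """MAE, RMSE, and the MAE/RMSE ratio discussed in the text."""
    err = pred - obs
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    mape = 100 * np.abs(err / obs).mean()   # percent; assumes obs has no zeros
    return mae, rmse, mape, mae / rmse

obs = np.array([1.2, 1.5, 1.8, 2.1])        # stand-in observed SOC
pred = np.array([1.1, 1.6, 1.7, 2.3])       # stand-in predictions
mae, rmse, mape, ratio = error_metrics(obs, pred)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}, MAPE = {mape:.1f}%, "
      f"MAE/RMSE = {ratio:.2f}")
```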
By comparing the error indices and Lin’s coefficients of both methods, it became evident that MARS performed better than RK. The two methods are linear (RK) and nonlinear (MARS), respectively. The main difference concerns the interaction terms: the MARS model has one linear term and two multiplicative (interaction) terms, which represent the added value that improved the predictive capability of MARS with respect to RK.
In
Figure 10, the map of SOC predictions obtained with the MARS model is reported. Comparing the RK and MARS maps, they showed overall agreement, with a cluster of lower values in the northern part of the study area, a central part with the lowest values, and finally, a southern part with two clusters of larger values and a cluster of lower values.
Finally, to quantitatively compare the maps obtained by the two methods, a cross-correlogram was computed; it yielded a value of 0.67 at distance 0. Therefore, the map obtained from RK can be considered a first approximation of that from MARS. This result underlines the reliability of the SOC spatial distribution predicted by the MARS model.
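At distance (lag) 0, the cross-correlogram reduces to the plain correlation of co-located predictions from the two maps, which can be sketched as follows on synthetic stand-in prediction grids (flattened to vectors).

```python
import numpy as np

rng = np.random.default_rng(5)
common = rng.normal(size=200)                        # shared spatial signal
map_rk = common + rng.normal(scale=0.7, size=200)    # stand-in RK predictions
map_mars = common + rng.normal(scale=0.7, size=200)  # stand-in MARS predictions

# Cross-correlogram at lag 0: correlation of co-located predicted values
rho0 = np.corrcoef(map_rk, map_mars)[0, 1]
print(f"cross-correlation at distance 0: {rho0:.2f}")
```

Nonzero lags would instead correlate values of one map against values of the other at locations separated by that lag, tracing how the agreement decays with distance.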