Article

Interpretable Digital Soil Organic Matter Mapping Based on Geographical Gaussian Process-Generalized Additive Model (GGP-GAM)

College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
*
Author to whom correspondence should be addressed.
Agriculture 2024, 14(9), 1578; https://doi.org/10.3390/agriculture14091578
Submission received: 13 June 2024 / Revised: 7 September 2024 / Accepted: 8 September 2024 / Published: 11 September 2024

Abstract

Soil organic matter (SOM) is a key soil component. Determining its spatial distribution is necessary for precision agriculture and for understanding the ecosystem services that soil provides. However, field SOM studies are severely limited by time and cost. To obtain a spatially continuous distribution map of SOM content, it is necessary to conduct digital soil mapping (DSM). In addition, SOM mapping needs both accuracy and interpretability, a combination that is difficult to achieve with conventional DSM models. To address these issues, a spatially varying coefficient (SVC) regression model, the Geographical Gaussian Process Generalized Additive Model (GGP-GAM), was used to map SOM content. The root mean squared error (RMSE), mean absolute error (MAE), and adjusted coefficient of determination (adjusted R²) of this model for SOM mapping in the Leizhou area are 7.79 g kg−1, 6.01 g kg−1, and 0.33, respectively. GGP-GAM is more accurate than the three comparison models (i.e., Geographical Random Forest, Geographically Weighted Regression, and Regression Kriging). Moreover, the patterns of covariates affecting SOM are interpreted by mapping the coefficients of each predictor individually. The results show that GGP-GAM can be used for the high-precision mapping of SOM content with good interpretability. This DSM technique will in turn contribute to agricultural sustainability and decision making.

1. Introduction

Organic matter is a key soil property and a crucial factor in agriculture [1,2,3,4]; thus, it is important to study the spatial distribution of soil organic matter (SOM). In soil surveys, SOM content is generally determined by measuring soil organic carbon (SOC) content and multiplying it by the Van Bemmelen factor. SOM content and other soil properties can only be obtained by establishing sampling points in a specific study area and by collecting and testing the soil at these points. However, to obtain the continuous distribution trend of such soil properties, spatial interpolation must be conducted on the soil sampling points to transform discrete sampling point data into a continuous distribution map [5], i.e., to transform geospatial data from point features to a raster image. Digital soil mapping (DSM) is a series of methods used to predict and obtain soil maps using correlated predictor variables based on soil spatial prediction functions with spatially autocorrelated errors (SSPFe) [6]; it has become a research hotspot and has been widely applied in research on the spatial distribution of soil properties [7,8,9]. Interpolation methods for soil property digital mapping can be mainly divided into three types, namely deterministic interpolation [10,11,12], geostatistics [11,13,14,15], and machine learning (ML) [16,17,18]. The representative models of deterministic interpolation are Inverse Distance Weighting [10] and Radial Basis Function [12]. Deterministic interpolation models have simple forms and few parameters, and are computationally inexpensive. However, they fail to model spatial distributions with satisfactory accuracy. Geostatistical interpolation, or kriging-based interpolation, is commonly used in digital soil mapping [19,20,21].
In addition to the geostatistical methods that only consider the spatial autocorrelation of the soil sampling points themselves, the modeling of the spatial variability of soil properties through their correlation with auxiliary variables to improve geostatistical prediction has also been studied [13,14,15]. With the increasing availability of environmental covariate datasets such as remote sensing images and digital elevation models (DEMs), ML-based digital soil mapping using environmental covariates has become a new research hotspot [16,22,23,24,25,26]. A variety of covariates are used as predictors; ML models are trained on soil sampling points and then predict the soil properties at unknown locations. This method treats soil property interpolation as a spatial regression prediction. There are two types of ML models for soil spatial interpolation. The first are non-spatial models, such as the random forest models used by Grimm et al. (2008) [16] to predict SOC content from environmental factors, and the artificial neural network interpolation of SOC and nitrogen content established by Kalambukattu et al. (2018) [22]. These models essentially link environmental covariates with soil properties without considering the spatial autocorrelation and spatial heterogeneity inherent in soil as a natural geographical element. To address this problem, a series of ‘spatial’ ML models have been proposed. For example, Gao et al. (2022) [23] proposed a two-point machine learning method for soil pollution interpolation. The method computes the differences in covariates and in response variables between observation points, and then uses the differences in covariates between interpolation points and neighboring points to predict the differences in response variables.
This method considers both the heterogeneity of soil properties themselves and the heterogeneity of the covariates’ influence on those properties, thus modeling spatial autocorrelation and attribute similarity well. ML interpolation has high accuracy and can consider the synergistic effects between the environment and soil, but it requires a trade-off between interpretability and model accuracy [27]. High-precision models often have poor interpretability [16], making it difficult to interpret their results and to reveal how environmental covariates affect soil properties at the local spatial level. Therefore, it is necessary to explore spatial interpolation models for SOM content that are both highly accurate and interpretable, so that the interpolation results can be widely trusted.
For complex ML models whose specific processes we cannot understand, a common method is to establish interpretable models outside of the original models and use post hoc means to achieve the interpretation of the trained models, exploring the impact patterns of factors on interpolation results. The Shapley Additive Explanation Value (SHAP) based on game theory [28] serves as an excellent post hoc interpretation method, which has been widely researched and applied in the field of geoscience and is a common model interpretation tool [29]. For example, Padarian et al. (2020) [30] used SHAP to interpret artificial neural network models for SOC mapping, quantifying the influence levels of different covariates at different values and analyzing the influence levels of covariates in different spatial locations through visualization, revealing spatial autocorrelation and spatial variability characteristics through SHAP value mapping. While SHAP has excellent explanatory power, it also has some drawbacks. For example, despite applying certain optimizations in the calculation process, it still involves a large computational burden and slow execution speed [31].
As an ML model with a complexity that lies between ordinary linear regression and deep neural networks, the Generalized Additive Model (GAM) offers both nonlinear fitting capabilities and interpretability similar to linear models. The Geographical Gaussian Process Generalized Additive Model (GGP-GAM) [32] is a spatially varying coefficient regression model that combines the generalized additive model with Gaussian process regression. It uses two-dimensional Gaussian process regression smoothing curves as basis functions to construct the GAM. Since Gaussian process regression has certain advantages in expressing the spatial autocorrelation of geographical phenomena [32], a GGP-GAM built from discrete observation points should be able to model the spatial distribution trends of soil properties. Because it is a spatially varying coefficient (SVC) model constructed within the GAM framework, the spatial interpolation result obtained with GGP-GAM can be visualized through coefficient mapping, which makes the interpolation interpretable. GGP-GAM has not yet been applied to the digital mapping of SOM.
The study area of this paper is Leizhou, China. As a typical hilly region in southern China, Leizhou’s farmland is characterized by fragmented and dispersed features (as shown in Figure 1 [33]), making it difficult for traditional soil sampling to cover different characteristics of agricultural areas. Due to the fragmented nature of agricultural land in southern China, DSM can significantly reduce the number of soil samples required for surveys.
Based on the above analysis, the aim of this study is to apply an interpretable spatially varying coefficient model, GGP-GAM, to digital SOM mapping for the first time. The study is motivated by the accuracy and interpretability requirements of organic matter DSM, as well as by the difficulty of directly interpreting commonly used models such as random forests. The accuracy of GGP-GAM was compared with that of three commonly used models, and the influence patterns of environmental covariates on the interpolation results were explained and analyzed through coefficient mapping and effective degrees of freedom analysis. The experimental results demonstrated that the interpolation accuracy of GGP-GAM was higher than that of the comparison models. The key innovation and contribution of this paper lies in the first application of GGP-GAM within the DSM framework. Unlike conventional ML models, GGP-GAM offers superior interpretability. This study provides a novel approach to high-precision and interpretable soil prediction mapping by integrating cutting-edge ML techniques with soil science principles. This study also discusses the impact of the number of knots on the accuracy of GGP-GAM in the interpolation of SOM.

2. Materials and Methods

2.1. Study Area

The study area in this paper is the county-level city of Leizhou, Guangdong Province, China. Leizhou is situated between 109°42′12″ E and 110°23′34″ E longitude and between 20°26′08″ N and 21°11′06″ N latitude. It is located in the central part of the Leizhou Peninsula, bordered by the South China Sea to the east and the Beibu Gulf to the west. Leizhou falls within a tropical monsoon climate zone. The average annual temperature in 2021 was 24.3 °C, with an annual rainfall of 1468.6 mm and 2098.3 h of sunshine per year. Based on the report of the local government (refer to http://www.leizhou.gov.cn/lzsq/dlhj/content/post_1780166.html (accessed on 3 May 2024)), the total area of Leizhou is 3709.33 square kilometers, with coastlines on the east and west sides totaling 406 km. Most of the inland areas in Leizhou are characterized by flat tableland terrain, with some low hills. The terrain generally slopes from south to north, with valleys running in a north–south direction. The coastal areas in the east and west slope slightly towards the sea. The elevation ranges from 65 m to 174 m above sea level, with slopes generally ranging from 5 to 10 degrees. The overall terrain of Leizhou is relatively flat, with low elevations. There is a distinct plain zone in Leizhou formed by alluvial deposits in the middle and lower reaches of the Nandu River. The main soil types in Leizhou are red soil, coastal saline–alkali marsh soil, coastal sandy soil, coastal saline soil, and marsh soil. The cultivated land area in the city is about 1424 km², with a forest coverage rate of 21.3%. Figure 1 illustrates the land cover in Leizhou. Figure 2 shows the Landsat 8 satellite image of Leizhou (https://www.gscloud.cn/sources/index (accessed on 3 May 2024)).

2.2. Data

The soil sampling data were obtained from the Guangdong Province Soil Testing and Fertilization Project (sampled during 2018–2020). The sampling area was divided into multiple sampling units based on soil type, land use, and administrative divisions. Within each sampling unit, sampling points were established with the aim of ensuring consistent soil characteristics. Each sampling point consisted of a composite sample collected from 15 to 20 individual sub-samples. Soil samples were analyzed for organic carbon content in a laboratory using the Walkley–Black method [34], and the result was multiplied by the Van Bemmelen factor (1.724) to obtain SOM content. To remove outliers potentially introduced during sampling and testing, we filtered the raw data based on the PauTa criterion [35] within a spatial domain. Specifically, we divided the study area into 5 km grid cells, calculated the mean and standard deviation of organic matter content for each cell, and removed sample points whose organic matter content deviated from the cell mean by more than three times the standard deviation. After outlier processing, the number of valid sampling points was 590 (Table 1). Figure 3 shows the frequency histogram of the SOM sampling dataset.
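The grid-based outlier screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pauta_filter`, the tuple layout, the use of projected coordinates in metres, and the choice of the sample standard deviation are hypothetical and are not taken from the study.

```python
import statistics
from collections import defaultdict

def pauta_filter(samples, cell_size=5000.0):
    """Drop points deviating more than 3 standard deviations from their cell mean.

    samples: list of (x, y, som) tuples; x and y are assumed to be projected
    coordinates in metres, so cell_size=5000.0 corresponds to a 5 km grid.
    """
    # Group the points by the grid cell they fall into.
    cells = defaultdict(list)
    for x, y, som in samples:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y, som))

    kept = []
    for pts in cells.values():
        values = [p[2] for p in pts]
        if len(values) < 2:
            # A standard deviation is undefined for a single point; keep it.
            kept.extend(pts)
            continue
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample SD (assumption)
        kept.extend(p for p in pts if abs(p[2] - mean) <= 3 * sd)
    return kept
```

Note that with the three-sigma rule a cell needs roughly a dozen points or more before any single value can be flagged, so sparsely sampled cells pass through unchanged in this sketch.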
This study utilized six environmental covariates, namely elevation, aspect, plan curvature, topographic wetness index (TWI), normalized difference vegetation index (NDVI), and enhanced vegetation index (EVI). Terrain covariates were derived from a digital elevation model (DEM) (https://www.eorc.jaxa.jp/ALOS/en/dataset/aw3d30/aw3d30_e.htm (accessed on 9 January 2024)) through terrain analysis. EVI was calculated from Landsat 8 (https://www.gscloud.cn/sources/index (accessed on 28 December 2023)) imagery bands.
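As an illustration of the two vegetation covariates, the following Python sketch applies the commonly used NDVI and EVI formulas to per-pixel reflectances. It assumes surface-reflectance inputs scaled to the 0–1 range; the function names and example values are hypothetical, and the study does not describe its exact processing chain.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical reflectances for a single vegetated pixel.
print(round(ndvi(0.5, 0.1), 4), round(evi(0.5, 0.1, 0.05), 4))
```

For Landsat 8 imagery, the blue, red, and near-infrared inputs correspond to bands 2, 4, and 5, respectively.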
Terrain is an important factor affecting soil nutrients [36]. Elevation affects temperature and precipitation, thereby influencing vegetation coverage and soil properties [37]. Different slope aspects receive varying sunlight lengths and angles, leading to differences in solar radiation received by the soil [38]. Therefore, aspect can affect soil properties by influencing soil water and thermal conditions [37,39]; curvatures are the primary attributes of the land surface that impact the energy fluxes and matter distribution within a watershed and can be considered a measure of the disequilibrium state prevailing in the landscape [40]. In some areas, it has been observed that the percentage of organic carbon has a significant correlation coefficient with the plan curvature [40]. Therefore, it can be considered as a factor influencing SOM content.
TWI can serve as a proxy for soil moisture content [41]. Soil moisture can affect soil nutrient levels through various mechanisms, such as influencing plant nutrient absorption and organic matter mineralization processes [42,43].
There is also a correlation between vegetation and soil nutrients. Differences in soil nutrients can lead to differences in vegetation types, as well as the extent of foliage growth [44]. On the other hand, different plant species have varying nutrient requirements, and differences in vegetation types can result in differences in soil nutrient content [45]. EVI was selected in this study as a covariate related to vegetation.
The above-mentioned factors have been widely used in SOM interpolation and mapping [9,46,47,48]. Therefore, this study selected them for SOM content spatial modeling. All environmental variables were organized as rasters with a pixel size of 30 m, and their spatial distribution is shown in Figure 4. To test the relationship between the six covariates and SOM, an importance analysis based on random forest was conducted. Compared to linear models, nonlinear models such as random forests can better model the relationships between soil properties and covariates [49]. In this study, the importance ranking was based on permuting out-of-bag (OOB) data. The parameters n-tree and m-try for this model were set to 500 and 2, respectively. The results are shown in Table 2, where the six environmental variables are ranked in descending order of importance: elevation, curvature, NDVI, EVI, aspect, TWI. The results indicate that both topography and vegetation are related to SOM, with elevation being the primary influencing factor.

2.3. Methods

2.3.1. Geographical Gaussian Process-Generalized Additive Model

GGP-GAM, proposed by Comber et al. (2024) [32], is an SVC model. It can be expressed in the following form:
$$y = \alpha + f_0(z) + x_1 f_1(z) + x_2 f_2(z) + \cdots + x_m f_m(z) + \epsilon$$
where each function $f_j$ is a nonlinear, spatially varying coefficient on the corresponding predictor $x_j$, $\epsilon$ is the random error, $\alpha$ is the offset, and $z$ is the spatial location to be predicted. As in any GAM, each $f_j$ is generated by adding up Gaussian process (GP) basis splines:
$$f_j(z) = \sum_{i=1}^{k} \gamma_i b_i(z), \quad \gamma \sim N\left(0, s_j^2\right)$$
where each $b_i$ is a GP basis with zero mean and the $\gamma_i$ are the corresponding regression coefficient estimates. The $n$-th GP basis satisfies the following covariance function:
$$k_n(d) = \mathrm{Cov}\left(b_n(z),\, b_n(z + d)\right)$$
where d is the distance between two locations. In spatial modeling, the covariance function k is important, as it essentially reflects the spatial heterogeneity of the target variable; its estimation is therefore critical in GGP-GAM modeling. The covariance function in GGP-GAM is analogous to the semivariogram in kriging interpolation, as it captures spatial autocorrelation. Since the mean of each Gaussian process basis function is 0, a fixed offset constant α must be introduced. A GAM also has a crucial smoothing parameter λ, which controls the smoothness of the combined basis functions by introducing penalties into the model. In the multivariate form of a GAM, each smooth term has its own λ. Compared to linear regression, GGP-GAM can fit smooth local nonlinear relationships. Gaussian process basis functions embody the first law of geography and can effectively reveal spatial autocorrelation. Therefore, when used for the spatial regression modeling of soil properties and other attributes, GGP-GAM can achieve better fitting results. The Gaussian process smooths in the GAM can capture the spatial trends of the target variable from sample points. Additionally, the GAM framework endows GGP-GAM with good interpretability. In this study, GGP-GAM was implemented using the R package ‘mgcv’, and smoothing parameters were selected automatically through generalized cross-validation (GCV). GGP-GAM has two key parameters: the number of knots (k-value) and the type of covariance function. The k-value determines how many basis functions are added to fit the coefficient surface for each covariate. The covariance function type is a crucial parameter in Gaussian process regression modeling; in spatial modeling, it determines how spatial heterogeneity is expressed. In this study, the k-value and covariance function type of GGP-GAM were set to 43 and the exponential function, respectively.
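To make the structure of the model concrete, the following Python sketch evaluates a spatially varying coefficient prediction from exponential-covariance basis functions centred on knots. It is a simplified illustration and not the ‘mgcv’ implementation used in the study: the range parameter `rho`, the knot locations, and all function names are hypothetical, the coefficients are taken as given rather than estimated, and the penalties and smoothing-parameter selection are omitted.

```python
import math

def exp_cov(d, rho):
    """Exponential covariance k(d) = exp(-d / rho); rho is an assumed range parameter."""
    return math.exp(-d / rho)

def coef_surface(z, knots, gammas, rho):
    """f_j(z) = sum_i gamma_i * b_i(z), with b_i(z) = k(|z - knot_i|)."""
    return sum(g * exp_cov(math.dist(z, knot), rho)
               for g, knot in zip(gammas, knots))

def svc_predict(z, x, alpha, knots, gammas_by_term, rho):
    """y = alpha + f_0(z) + sum_j x_j * f_j(z).

    gammas_by_term[0] holds the coefficients of f_0; gammas_by_term[j]
    holds those of the surface multiplying predictor x[j - 1].
    """
    surfaces = [coef_surface(z, knots, g, rho) for g in gammas_by_term]
    return alpha + surfaces[0] + sum(xj * fj for xj, fj in zip(x, surfaces[1:]))

# Two hypothetical knots 1 km apart and a single predictor.
knots = [(0.0, 0.0), (1000.0, 0.0)]
gammas_by_term = [[1.0, 0.0], [2.0, 0.0]]
print(svc_predict((0.0, 0.0), [3.0], 10.0, knots, gammas_by_term, rho=1000.0))
```

In this toy setting, the coefficient on the predictor decays with distance from the first knot, which is how nearby locations end up with similar coefficients.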

2.3.2. Model Selection for Accuracy Comparison

To validate the performance of the GGP-GAM in the spatial interpolation of SOM, three comparison models were selected to interpolate the same dataset and compare their accuracies. Firstly, we chose Regression Kriging (RK) [13]. RK is a hybrid approach that combines regression modeling with kriging interpolation. It starts by establishing a relationship between the target variable and predictor variables using regression techniques. Then, it employs kriging to predict the residuals from the regression model. By incorporating environmental covariates into the prediction process, RK generally achieves a higher accuracy compared to other geostatistical interpolators. The second comparison model is Geographically Weighted Regression (GWR) [50], an extension of ordinary linear regression in spatial modeling. GWR decomposes global linear regression into spatially local regressions based on geographical principles. It can be applied to soil property spatial interpolation [51,52] and is a relatively simple and interpretable model. The final comparison model is Geographical Random Forest (GRF) [53]. Digital soil mapping based on random forest is a recent research hotspot in the field of soil property spatial interpolation and has been proven to achieve a high accuracy in interpolating SOM and other nutrient contents [54,55]. GRF is a local spatial ML model that combines random forest with GWR. It outperforms ordinary random forest in spatial modeling [53] and was thus chosen as the representative complex ML model for SOM content interpolation in this study.
The RK model was constructed using the R package ‘gstat’. GWR and GRF were implemented using R package ‘GWmodel’ and ‘SpatialML’, respectively. For the RK, GRF, and GWR models, the parameters were set as follows:
For RK: ordinary linear regression was selected as the regression model. For the semivariogram of the kriging model used for residual interpolation, the stable model was chosen, and the sill, range, and nugget were set to 5, 7000, and 15, respectively.
For GRF: n-trees, m-try, local weight, and global weight were set to 500, 2, 0.6, and 0.4, respectively.
For GWR: kernel type was Gaussian and selected bandwidth was 9843.53.
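The role of the GWR bandwidth can be illustrated with a minimal Python sketch of a locally weighted regression with one predictor. It assumes one common form of the Gaussian kernel, w = exp(−0.5 (d/b)²), and a bandwidth expressed in the same units as the coordinates; the function names and data layout are hypothetical, and the study itself used the R package ‘GWmodel’.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight; decays smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_linear_fit(target, points, bandwidth):
    """Weighted least squares of y on a single predictor x around `target`.

    points: list of ((px, py), x, y). Returns (intercept, slope) at `target`.
    """
    w = [gaussian_weight(math.dist(target, loc), bandwidth) for loc, _, _ in points]
    sw = sum(w)
    # Weighted means of the predictor and the response.
    xm = sum(wi * x for wi, (_, x, _) in zip(w, points)) / sw
    ym = sum(wi * y for wi, (_, _, y) in zip(w, points)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, (_, x, _) in zip(w, points))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, (_, x, y) in zip(w, points))
    slope = sxy / sxx
    return ym - slope * xm, slope
```

Repeating the fit at every prediction location yields one intercept and one slope per location, which is what makes the GWR coefficients mappable.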

2.3.3. Accuracy Evaluation

To evaluate the accuracy of the interpolation results, we conducted a 5-fold cross-validation. The training sample sets were used as the inputs for training the model, and the test sample sets were input into the trained model to compare the interpolation results with the actual values. Three accuracy evaluation metrics were selected: root mean squared error (RMSE), mean absolute error (MAE), and adjusted coefficient of determination (adjusted R²).
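$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$
$$R^2 = 1 - \frac{\sum_i \left(\hat{y}_i - y_i\right)^2}{\sum_i \left(\bar{y} - y_i\right)^2}$$
$$R_{adj}^2 = 1 - \frac{\left(1 - R^2\right)\left(n - 1\right)}{n - k - 1}$$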
where m is the number of sample points in the test set, $y_i$ is the observed value for the i-th point, $\hat{y}_i$ is the predicted value for the i-th point, and $\bar{y}$ is the mean SOM content of all sampling points. n is the number of points and k is the number of independent variables. Smaller values of RMSE and MAE indicate better accuracy. R² cannot exceed 1 and becomes negative when the model predicts worse than the mean; larger values indicate better interpolation performance.
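The four metrics can be computed directly from paired observed and predicted values, as in the following Python sketch. The function names are hypothetical, and the mean in R² is taken over the observed values passed in, which is an assumption about how the mean SOM content is obtained.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    sst = sum((mean_obs - o) ** 2 for o in obs)
    return 1.0 - sse / sst

def adjusted_r2(r2_value, n, k):
    """Adjusted R^2 for n points and k independent variables."""
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - k - 1)

# Hypothetical observed and predicted SOM values (g kg-1).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```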

3. Results

3.1. Mapping Result of SOM Content in Leizhou

In this study, SOM maps were obtained using R software (version 4.2.2). Firstly, the values of the environmental covariates at the location of each sample point were extracted from the original rasters for model training. Subsequently, the raster datasets were converted into an array of point features for SOM prediction. Based on these input data, the trained model predicted SOM values for each location. The prediction results were converted back into a raster, and the SOM maps were finalized via visualization and by adding essential map elements, i.e., scale bars, north arrows, and legends. The mapping results and statistics of SOM in the study area are shown in Figure 5 and Table 3, respectively. The figure clearly illustrates the spatial heterogeneity of SOM, with average predicted values varying across different regions. This variation corresponds to the natural geographic conditions that differ on a large scale within the study area. The details in the distribution map also depict significant heterogeneity of SOM on a smaller scale, which may be influenced by factors such as topographic and hydrological conditions, as well as human activities.
The comparison of the mapping results reveals that all four models capture the general trend of soil organic matter distribution in Leizhou. However, the details of each map show differences in their ability to capture local patterns. GGP-GAM and GRF are better at capturing local details. GGP-GAM’s mapping results clearly highlight the differences in SOM between subregions while producing smoother maps. This model effectively captures spatial patterns across different scales. GWR can illustrate the heterogeneity of SOM content between subregions, but its spatial variation is not very smooth. RK’s map exhibits the strong characteristics of kriging interpolation, failing to reveal sufficient detail and leading to a less realistic representation of the actual distribution. These mapping results demonstrate that, owing to the inherent advantages of Gaussian process regression in modeling spatial autocorrelation and the flexible, smooth additive structure of the additive model, GGP-GAM can reasonably express the overall distribution trends of the target variable within the region while also capturing fine details. It is therefore well suited to interpolating SOM content. More importantly, this model achieves high accuracy while maintaining good interpretability.
Compared with the observed values of the sampled dataset, GGP-GAM produced soil maps with a minimum value closest to the observed minimum. RK and GRF obtained minimum values higher than the observed minimum, while GWR produced negative predicted values. Among the four models, GGP-GAM and GWR achieved first quartiles that were closer to the observed value. For the median, mean, and third quartile, the mapping results of all four models were relatively close to the observed values. RK and GWR obtained more accurate maximum values for the soil maps.

3.2. Accuracy

The RMSE, MAE, and adjusted R² for the spatial interpolation of SOM content in Leizhou using GGP-GAM and the three comparison models are shown in Table 4. For RMSE and adjusted R², GGP-GAM outperformed the comparison models, while its MAE was very close to that of GRF. Among the three comparison models, the RK model performed the best. The accuracy of the black-box model GRF was between that of RK and GWR.
Figure 6 shows the scatter plots of observed and predicted SOM values for the test data from GGP-GAM and the comparison models. A brown fitted line and a gray reference line are added. The closer the fitted line is to the reference line, the higher the accuracy. The observed versus predicted plots are consistent with the aforementioned accuracy validation results. Compared to the comparison models, the scatter points of GGP-GAM are noticeably closer to the reference line, and the fitted line is also closer to the reference line, demonstrating that this model performs well in the spatial interpolation of SOM content in Leizhou.
The validation results demonstrate that the GGP-GAM model achieves adequate accuracy for SOM content interpolation. Compared to the other models, it can more accurately estimate the attribute values at unknown locations, producing interpolation results that are closer to reality and generating high-quality SOM maps.

3.3. Model Interpretation

To better understand the SOM interpolation results, the coefficients of the six covariates were mapped by setting the predictor of interest to a matrix of ones and all other inputs to 0. The statistics and spatial distributions of the SVCs are shown in Table 5 and Figure 7, respectively.
To enhance interpretability, we normalized the coefficients and mapped them using the same color scale. This allowed for a direct visual comparison of the relative influence of different coefficients at the same spatial location.
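The coefficient extraction described above can be illustrated with a toy Python sketch. It assumes a model whose prediction can be decomposed into per-term contributions x_j · f_j(z), so that feeding 1 for the predictor of interest and 0 for the others returns the coefficient surface itself; the function names and the two example surfaces are hypothetical, and the normalization step is not shown because its exact form is not specified.

```python
def term_contributions(z, x, surfaces):
    """Per-term contributions x_j * f_j(z) at location z."""
    return [xj * f(z) for xj, f in zip(x, surfaces)]

def coefficient_at(z, j, surfaces):
    """Recover f_j(z) by setting predictor j to 1 and all others to 0."""
    x = [0.0] * len(surfaces)
    x[j] = 1.0
    return term_contributions(z, x, surfaces)[j]

# Two hypothetical coefficient surfaces: one varying eastwards, one constant.
surfaces = [lambda z: 0.01 * z[0], lambda z: -2.0]
print(coefficient_at((100.0, 0.0), 0, surfaces))
```

Evaluating `coefficient_at` on every raster cell produces one coefficient map per covariate.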
The distribution of elevation coefficients showed significant regional heterogeneity. Firstly, elevation in the southern part of Leizhou had a positive impact on SOM, while in the northern part, the impact was mainly negative. Secondly, the influence of elevation formed two peaks in the southeast and southwest, where the impact was positive, while a negative peak was observed directly in the north.
Aspect shows an overall correlation with SOM in Leizhou. The map illustrates that the influence of aspect on SOM is higher in the southwest and lower in the northeast. However, the variation trend is relatively smooth and stable, with changes occurring in only one direction. This indicates that the local spatial heterogeneity of the aspect’s influence on SOM is not significant. The influence of aspect on SOM mainly comes from the different amounts of solar radiation received by land with different aspects. Therefore, the spatial variability of aspect in Leizhou may result from varying solar radiation levels in different regions.
The distribution of plan curvature coefficients shows more evident changes in the north–south direction, with negative values mainly in the south and positive values mainly in the north. Most of Leizhou’s terrain is relatively flat, with some differences in elevation between the north and south. The spatial differences in plan curvature coefficients can partly reflect how, in flat terrain with significant elevation stratification at a larger scale, the impact of plan curvature on SOM varies across different elevation regions.
EVI shows a positive correlation with SOM in most areas. However, in the central-western and southeastern parts of the study area, it has a significantly negative impact. The evident differences in EVI’s influence on SOM, as shown in the coefficient map, are related to the distribution patterns of EVI values within the study area. Areas with negative coefficients have more farmland with high EVI values, where human activities (such as farming) significantly affect SOM. The intensive cultivation of crops that demand high organic matter content might be the reason why EVI’s impact on SOM differs significantly in these areas compared to others.
The impact of TWI on SOM content exhibits noticeable heterogeneity, with coefficient values varying between positive and negative ones, showing different trends across various locations in the study area. In the central northern valley regions, TWI mainly shows a positive impact, while in the high-altitude southeastern regions, it shows a significantly negative impact. The southwestern region shows a slightly negative impact. This indicates that the relationship between soil moisture and SOM is not simply linear but involves a synergistic effect with various factors, such as elevation, terrain, and land type, demonstrating characteristics of local spatial autocorrelation.
The impact of NDVI is mostly positive in the southeastern region of Leizhou and negative in the northern part. Compared to EVI, its spatial variation trend is unidirectional and very smooth. This may be related to the insignificant difference in vegetation cover within the study area reflected by NDVI. This map also indicates that NDVI has low effective degrees of freedom in the established GAM.
Environmental covariates can illustrate various geographical features that have complex associations with SOM. The coefficient distribution maps indicate that the influence of each covariate on SOM varies significantly within the study area. Therefore, using SVC models in SOM content spatial interpolation can lead to better DSM performance.

3.4. Effective Degrees of Freedom

Effective degrees of freedom (EDF) is an important indicator for interpreting GAMs. It is related to the degree of nonlinearity of smoothing terms. The larger the effective degrees of freedom indicator, the higher the nonlinearity between the explanatory variables and the target variables; conversely, the smaller the effective degrees of freedom, the lower the nonlinearity. Table 6 shows the effective degrees of freedom for each environmental covariate in the GGP-GAM model established for the interpolation of SOM in Leizhou. There is a stronger nonlinear relationship between SOM and elevation and TWI, while the nonlinearity between SOM and NDVI, aspect, and plan curvature is weaker.
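How the smoothing parameter drives the EDF can be shown with a small Python sketch. It relies on a simplifying assumption that is not part of the study: the basis has been re-parameterized so that the model matrix is orthonormal and the penalty is diagonal with eigenvalues d_i, in which case the EDF of a smooth equals the sum of 1 / (1 + λ d_i). The function name and eigenvalues are hypothetical.

```python
def edf(penalty_eigenvalues, lam):
    """Effective degrees of freedom of a penalized smooth.

    Assumes an orthonormal basis with a diagonal penalty, so the trace of
    the influence matrix reduces to sum_i 1 / (1 + lam * d_i).
    """
    return sum(1.0 / (1.0 + lam * d) for d in penalty_eigenvalues)

# Hypothetical penalty eigenvalues; the zero marks an unpenalized direction.
eigenvalues = [0.0, 1.0, 4.0]
for lam in (0.0, 1.0, 100.0):
    print(lam, round(edf(eigenvalues, lam), 3))
```

With no penalty, the EDF equals the number of basis functions; as λ grows, it shrinks towards the number of unpenalized directions, which is why a smooth with many knots can still report a small EDF.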

4. Discussion

4.1. The Influence of Knot Numbers on GGP-GAM SOM Interpolation

The number of knots is an important concept in GAM modeling. Basis functions are joined at the knots to form the nonlinear smooth terms of a GAM, so the number of knots sets the dimension of each term's basis. The number and positions of the knots can therefore alter how well the model fits the target variable. More knots permit a more pronounced nonlinear relationship, producing a more “wiggly” surface in two-dimensional spatial modeling (or a more “wiggly” curve in the one-dimensional case). Conversely, too few knots can leave the model with insufficient effective degrees of freedom, resulting in underfitting [56]. The number of knots must be large enough to provide the effective degrees of freedom needed to capture the nonlinear relationships in the sampled data. However, too many knots can lead to overfitting, reducing the accuracy of the fit.

This section analyzes the effective degrees of freedom for each factor in the GGP-GAM for the spatial interpolation of SOM and discusses the impact of knot numbers. As Table 6 shows, the EDFs for EVI, aspect, and plan curvature are noticeably smaller than those of the other predictors, indicating that a knot number of 43 may not be sufficient for these smooth terms to capture their full trends with SOM. An experiment was therefore conducted with these three predictors. Their knot number was increased incrementally, starting from 1, while the knot number for the other predictors remained fixed at 43 and all other model parameters were left unchanged; a model was then established for interpolation at each setting. The RMSE, MAE, and R² of the interpolation results on the test set for the different knot numbers are shown in Figure 8.

As the number of knots increases, RMSE and MAE decrease significantly, while R² increases significantly. The model’s accuracy peaks when the number of knots reaches around 60. Increasing the number of knots beyond this point leads to a slow decline in accuracy, followed by a subsequent rise. These results demonstrate that increasing the number of knots for the selected smooth terms with low effective degrees of freedom allows the model to better approximate real conditions, thereby improving its accuracy. However, too many knots can lead to overfitting, which degrades performance in accuracy validation. Although interpolation accuracy improves again at still larger knot numbers, this does not imply that more knots are always beneficial for SOM interpolation. On the contrary, an excessive number of knots may greatly increase computational costs.
The experiment’s results show that selecting the number of knots is crucial in GAM modeling. Knots and smoothing coefficients together control the “wiggliness” of the model’s smooth terms. Unlike the smoothing coefficients, which are calculated during model fitting, the knot number is usually specified by the model user. Both excessively high and excessively low knot numbers can lead to suboptimal models. Therefore, an appropriate knot number should be selected for each factor based on diagnostic information, such as the effective degrees of freedom of the terms in the GAM, to achieve better interpolation accuracy and to ensure that the results reflect actual conditions.
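The kind of knot-number sweep described above can be sketched on synthetic data. The example below is not the authors' experiment: it fits a lightly penalized one-dimensional Gaussian-basis regression to simulated data for several basis sizes `k` and reports the hold-out RMSE. The data-generating function, the ridge penalty `lam`, and the helper names are all hypothetical choices for illustration.

```python
import numpy as np

def gaussian_basis(x, k):
    """Evaluate k evenly spaced Gaussian bumps on [0, 1] (requires k >= 2)."""
    centers = np.linspace(0.0, 1.0, k)
    width = 1.0 / (k - 1)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def holdout_rmse(x_tr, y_tr, x_te, y_te, k, lam=1e-3):
    """Fit a ridge-penalized basis regression and return the test RMSE."""
    X_tr, X_te = gaussian_basis(x_tr, k), gaussian_basis(x_te, k)
    beta = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(k), X_tr.T @ y_tr)
    resid = y_te - X_te @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic data: a smooth signal with two full oscillations plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)
y = np.sin(4.0 * np.pi * x) + rng.normal(0.0, 0.3, 300)
x_tr, y_tr, x_te, y_te = x[:200], y[:200], x[200:], y[200:]

# Sweep the basis size while every other setting stays fixed.
rmse_by_k = {k: holdout_rmse(x_tr, y_tr, x_te, y_te, k) for k in (3, 5, 10, 20, 40)}
for k, value in rmse_by_k.items():
    print(f"k = {k:2d}  test RMSE = {value:.3f}")
```

With too few basis functions the fit cannot follow the signal and the hold-out error is large; beyond a moderate `k` the error stops improving. Selecting `k` from such a validation curve, rather than assuming that more knots are always better, is the point made in the text.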
The method for automatically selecting the number of knots in additive models has been addressed in related research [57]. However, establishing a GAM on a two-dimensional spatial plane and appropriately selecting the number of knots is more complex and must conform to geographical principles. Further research is needed in this area.

4.2. Limitations

This study has some limitations. Firstly, it uses a limited number of factors for prediction. Besides topography and vegetation, lithology, soil type, parent material, and climate are also strongly correlated with SOM content, and soil type and lithology in particular can provide rich, detailed information. Introducing more comprehensive environmental covariates, especially attributes of the soil itself, is therefore a worthwhile direction for further research.

Secondly, the only covariance function used for GGP-GAM in this paper was the exponential function; other types of covariance functions were not discussed. Since the covariance function describes how covariance changes with distance, it is closely linked to modeling the spatial autocorrelation of the interpolated property and significantly affects the interpolation accuracy. Future research should examine the selection of covariance function types and parameters in GGP-GAM soil property interpolation more deeply. Because SOM interpolation involves spatial modeling with two-dimensional Gaussian process regression basis functions, specifying each GGP-GAM smooth term through geostatistical empirical variogram fitting is also feasible and warrants further exploration.

Thirdly, this study discussed the impact of the number of knots on modeling accuracy but only compared interpolation accuracy while increasing the number of knots for smooth terms with low effective degrees of freedom. The specific mechanism by which the number of knots affects the model was not thoroughly investigated. Since the number of knots controls how closely the model fits the data [56] and each covariate influences the target variable at a different scale, further research is needed to determine the appropriate number of knots for each smooth term. The selection of knot positions in the two-dimensional GAM should also be considered, which is a potential direction for future research.
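To illustrate why the choice of covariance function matters, the sketch below compares the exponential form used in this study with two other commonly used families. The parameterizations and the names `sill` and `length` are assumptions made for this example; the exact parameterization used by the GGP-GAM software may differ.

```python
import numpy as np

def exponential_cov(d, sill=1.0, length=1.0):
    """Exponential covariance: sill * exp(-d / length)."""
    return sill * np.exp(-np.asarray(d, dtype=float) / length)

def gaussian_cov(d, sill=1.0, length=1.0):
    """Gaussian (squared-exponential) covariance: sill * exp(-(d / length)^2)."""
    return sill * np.exp(-(np.asarray(d, dtype=float) / length) ** 2)

def matern32_cov(d, sill=1.0, length=1.0):
    """Matern covariance with smoothness 3/2."""
    a = np.sqrt(3.0) * np.asarray(d, dtype=float) / length
    return sill * (1.0 + a) * np.exp(-a)

# Covariance at a few separation distances (in units of the length parameter).
distances = np.array([0.0, 0.1, 0.5, 1.0, 2.0])
for name, cov in (("exponential", exponential_cov),
                  ("gaussian", gaussian_cov),
                  ("matern 3/2", matern32_cov)):
    print(f"{name:12s}", np.round(cov(distances), 3))
```

At short distances the exponential form loses correlation faster than the other two, which corresponds to rougher fitted surfaces. This is one reason the covariance family and its parameters could change both the coefficient maps and the interpolation accuracy.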
A further limitation of this study is the conversion factor (Van Bemmelen factor) used to estimate SOM content, because using a fixed constant to calculate SOM can be inaccurate [58].

5. Conclusions

This study provides a novel approach for acquiring key soil information, potentially reducing soil survey costs in typical hilly regions of southern China. In this paper, GGP-GAM was applied in the DSM of organic matter content in Leizhou, followed by an accuracy comparison, interpretation of mapping results, and parameter discussion. Through cross-validation, we obtained an RMSE of 7.79 g kg−1, an MAE of 6.01 g kg−1, and an adjusted R² of 0.33 for GGP-GAM, demonstrating its higher accuracy compared to the other three models studied here. Furthermore, coefficient mapping allowed us to analyze the influence patterns of covariates on SOM content, demonstrating the good interpretability of GGP-GAM.
The findings of this study suggest that GGP-GAM is suitable for high-precision SOM prediction mapping. Compared to mainstream machine learning DSM models, it offers superior interpretability, aiding in understanding the relationships between environmental covariates and soil properties. This enhanced interpretability lends greater credibility to the resulting maps. Thus, the model can better support agricultural decision making. Moreover, the methodology employed can be extended to the predictive mapping of other soil properties, such as soil nutrients and soil moisture. A key limitation of this research lies in the limited types of environmental variables considered. Incorporating a wider range of high-resolution covariate data could further enhance GGP-GAM’s interpolation accuracy. This will be a direction for our further research.

Author Contributions

Conceptualization, L.C. and J.X.; methodology, L.C.; software, M.Y.; validation, W.Z., W.G. and L.Z.; formal analysis, L.C.; investigation, L.C.; resources, J.X.; data curation, W.Z.; writing—original draft preparation, L.C.; writing—review and editing, L.C.; visualization, L.C.; supervision, J.X.; project administration, J.X.; funding acquisition, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors are grateful to the editors and reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wood, S.A.; Tirfessa, D.; Baudron, F. Soil organic matter underlies crop nutritional quality and productivity in smallholder agriculture. Agric. Ecosyst. Environ. 2018, 266, 100–108.
  2. Kane, D.A.; Bradford, M.A.; Fuller, E.; Oldfield, E.E.; Wood, S.A. Soil organic matter protects US maize yields and lowers crop insurance payouts under drought. Environ. Res. Lett. 2021, 16, 044018.
  3. Dias, P.M.S.; Portela, J.C.; Gondim, J.E.F.; Batista, R.O.; Rossi, L.S.; Medeiros, J.L.F.; Farias, P.K.P.; Mota, P.J.; Bandeira, D.J.D.C.; Filho, L.C.D.A.L.; et al. Soil Attributes and Their Interrelationships with Resistance to Root Penetration and Water Infiltration in Areas with Different Land Uses in the Apodi Plateau, Semiarid Region of Brazil. Agriculture 2023, 13, 1921.
  4. Phiwdaeng, N.; Polpinit, P.; Poltanee, A.; Kaewpradit, W. Land use change from paddy rice to sugarcane under long-term no-till conditions: Increase P balance, soil organic matter and sugarcane productivity. Arch. Agron. Soil Sci. 2024, 70, 1–17.
  5. He, W.; Xiao, Z.; Lu, Q.; Wei, L.; Liu, X. Digital Mapping of Soil Particle Size Fractions in the Loess Plateau, China, Using Environmental Variables and Multivariate Random Forest. Remote Sens. 2024, 16, 785.
  6. McBratney, A.; Mendonça Santos, M.; Minasny, B. On digital soil mapping. Geoderma 2003, 117, 3–52.
  7. Wang, S.; Zhuang, Q.; Wang, Q.; Jin, X.; Han, C. Mapping stocks of soil organic carbon and soil total nitrogen in Liaoning Province of China. Geoderma 2017, 305, 250–263.
  8. Martínez Pastur, G.; Aravena Acuña, M.C.; Chaves, J.E.; Cellini, J.M.; Silveira, E.M.O.; Rodriguez-Souilla, J.; von Müller, A.; La Manna, L.; Lencinas, M.V.; Peri, P.L. Nitrogenous and Phosphorus Soil Contents in Tierra del Fuego Forests: Relationships with Soil Organic Carbon, Climate, Vegetation and Landscape Metrics. Land 2023, 12, 983.
  9. Zhang, W.; Cheng, L.; Xu, R.; He, X.; Mo, W.; Xu, J. Assessing Spatial Variation and Driving Factors of Available Phosphorus in a Hilly Area (Gaozhou, South China) Using Modeling Approaches and Digital Soil Mapping. Agriculture 2023, 13, 1541.
  10. Saffari, M.; Yasrebi, J.; Fathi, H.; Karimian, N.; Moazallahi, M.; Gazni, R. Evaluation and Comparison of Ordinary Kriging and Inverse Distance Weighting Methods for Prediction of Spatial Variability of Some Soil Chemical Parameters. Res. J. Biol. Sci. 2009, 4, 93–102.
  11. Hani, A.; Abari, S.A.H. Determination of Cd, Zn, K, pH, TNV, Organic Material and Electrical Conductivity (EC) Distribution in Agricultural Soils using Geostatistics and GIS (Case Study: South- Western of Natanz-Iran). Int. J. Agric. Biosyst. Eng. 2011, 5, 852–855.
  12. Bhunia, G.S.; Shit, P.K.; Maiti, R. Spatial variability of soil organic carbon under different land use using radial basis function (RBF). Model. Earth Syst. Environ. 2016, 2, 17.
  13. Odeh, I.O.A.; McBratney, A.B.; Chittleborough, D.J. Further results on prediction of soil properties from terrain attributes: Heterotopic cokriging and regression-kriging. Geoderma 1995, 67, 215–226.
  14. Hengl, T.; Heuvelink, G.B.M.; Stein, A. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 2004, 120, 75–93.
  15. Wälder, K.; Wälder, O.; Rinklebe, J.; Menz, J. Estimation of soil properties with geostatistical methods in floodplains. Arch. Agron. Soil Sci. 2008, 54, 275–295.
  16. Grimm, R.; Behrens, T.; Märker, M.; Elsenbeer, H. Soil organic carbon concentrations and stocks on Barro Colorado Island—Digital soil mapping using Random Forests analysis. Geoderma 2008, 146, 102–113.
  17. Hengl, T.; Leenaars, J.G.B.; Shepherd, K.D.; Walsh, M.G.; Heuvelink, G.B.M.; Mamo, T.; Tilahun, H.; Berkhout, E.; Cooper, M.; Fegraus, E.; et al. Soil nutrient maps of Sub-Saharan Africa: Assessment of soil nutrient content at 250 m spatial resolution using machine learning. Nutr. Cycl. Agroecosyst. 2017, 109, 77–102.
  18. Saygın, F.; Aksoy, H.; Alaboz, P.; Dengiz, O. Different approaches to estimating soil properties for digital soil map integrated with machine learning and remote sensing techniques in a sub-humid ecosystem. Environ. Monit. Assess. 2023, 195, 1061.
  19. Burgess, T.M.; Webster, R. Optimal Interpolation and Isarithmic Mapping of Soil Properties: I The Semi-Variogram and Punctual Kriging. J. Soil Sci. 1980, 31, 315–331.
  20. Karydas, C.G.; Gitas, I.Z.; Koutsogiannaki, E.; Lydakis-Simantiris, N.; Silleos, G.N. Evaluation of spatial interpolation techniques for mapping agricultural topsoil properties in Crete. EARSeL eProc. 2009, 8, 26–39.
  21. John, K.; Lawani, S.; Ayito, E.; Kebonye, N.; Ogeh, J.; Penížek, V. Predictive Mapping of Soil Properties for Precision Agriculture Using Geographic Information System (GIS) Based Geostatistics Models. Mod. Appl. Sci. 2019, 10, 60–77.
  22. Kalambukattu, J.G.; Kumar, S.; Arya Raj, R. Digital soil mapping in a Himalayan watershed using remote sensing and terrain parameters employing artificial neural network model. Environ. Earth Sci. 2018, 77, 203.
  23. Gao, B.; Stein, A.; Wang, J. A two-point machine learning method for the spatial prediction of soil pollution. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102742.
  24. Pereira, G.W.; Valente, D.S.M.; de Queiroz, D.M.; Santos, N.T.; Fernandes-Filho, E.I. Soil mapping for precision agriculture using support vector machines combined with inverse distance weighting. Precis. Agric. 2022, 23, 1189–1204.
  25. Suleymanov, A.; Gabbasova, I.; Komissarov, M.; Suleymanov, R.; Garipov, T.; Tuktarova, I.; Belan, L. Random Forest Modeling of Soil Properties in Saline Semi-Arid Areas. Agriculture 2023, 13, 976.
  26. Wang, J.; Feng, C.; Hu, B.; Chen, S.; Hong, Y.; Arrouays, D.; Peng, J.; Shi, Z. A novel framework for improving soil organic matter prediction accuracy in cropland by integrating soil, vegetation and human activity information. Sci. Total Environ. 2023, 903, 166112.
  27. Marcinkevičs, R.; Vogt, J.E. Interpretable and explainable machine learning: A methods-centric overview with concrete examples. WIREs Data Min. Knowl. Discov. 2023, 13, e1493.
  28. Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
  29. Qiu, H.; Xu, Y.; Tang, B.; Su, L.; Li, Y.; Yang, D.; Ullah, M. Interpretable Landslide Susceptibility Evaluation Based on Model Optimization. Land 2024, 13, 639.
  30. Padarian, J.; McBratney, A.B.; Minasny, B. Game theory interpretation of digital soil mapping convolutional neural networks. Soil 2020, 6, 389–397.
  31. Lin, K.; Gao, Y. Model interpretability of financial fraud detection by group SHAP. Expert Syst. Appl. 2022, 210, 118354.
  32. Comber, A.; Harris, P.; Brunsdon, C. Multiscale spatially varying coefficient modelling using a Geographical Gaussian Process GAM. Int. J. Geogr. Inf. Sci. 2024, 38, 27–47.
  33. Yang, J.; Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 2021, 13, 3907–3925.
  34. Walkley, A.; Black, I.A. An Examination of the Degtjareff Method for Determining Soil Organic Matter, and a Proposed Modification of the Chromic Acid Titration Method. Soil Sci. 1934, 37, 29–38.
  35. Jiang, H.; Zou, Q.; Zhou, B.; Hu, Z.; Li, C.; Yao, S.; Yao, H. Susceptibility Assessment of Debris Flows Coupled with Ecohydrological Activation in the Eastern Qinghai-Tibet Plateau. Remote Sens. 2022, 14, 1444.
  36. Gou, Y.; Chen, H.; Wu, W.; Liu, H.B. Effects of slope position, aspect and cropping system on soil nutrient variability in hilly areas. Soil Res. 2015, 53, 338–348.
  37. Zhu, M.; Feng, Q.; Qin, Y.; Cao, J.; Zhang, M.; Liu, W.; Deo, R.C.; Zhang, C.; Li, R.; Li, B. The role of topography in shaping the spatial patterns of soil organic carbon. Catena 2019, 176, 296–305.
  38. McCune, B.; Keon, D. Equations for potential annual direct incident radiation and heat load. J. Veg. Sci. 2002, 13, 603–606.
  39. Sigua, G.C.; Coleman, S.W.; Albano, J.; Williams, M. Spatial distribution of soil phosphorus and herbage mass in beef cattle pastures: Effects of slope aspect and slope position. Nutr. Cycl. Agroecosyst. 2011, 89, 59–70.
  40. Khanifar, J.; Khademalrasoul, A. Multiscale computation of different plan curvature forms to enhance the prediction of soil properties in a low-relief watershed. Acta Geophys. 2024, 72, 933–944.
  41. Riihimäki, H.; Kemppinen, J.; Kopecký, M.; Luoto, M. Topographic Wetness Index as a Proxy for Soil Moisture: The Importance of Flow-Routing Algorithm and Grid Resolution. Water Resour. Res. 2021, 57, e2021WR029871.
  42. Brown, R.L.; Hangs, R.; Schoenau, J.; Bedard-Haughn, A. Soil Nitrogen and Phosphorus Dynamics and Uptake by Wheat Grown in Drained Prairie Soils under Three Moisture Scenarios. Soil Sci. Soc. Am. J. 2017, 81, 1496–1504.
  43. Benbi, D.K.; Khosa, M.K. Effects of Temperature, Moisture, and Chemical Composition of Organic Substrates on C Mineralization in Soils. Commun. Soil Sci. Plant Anal. 2014, 45, 2734–2753.
  44. Ng, C.W.W.; Tasnim, R.; Capobianco, V.; Coo, J.L. Influence of soil nutrients on plant characteristics and soil hydrological responses. Géotech. Lett. 2018, 8, 19–24.
  45. Johnson, B.G.; Verburg, P.S.J.; Arnone, J.A. Plant species effects on soil nutrients and chemistry in arid ecological zones. Oecologia 2016, 182, 299–317.
  46. Nketia, K.A.; Asabere, S.B.; Erasmi, S.; Sauer, D. A new method for selecting sites for soil sampling, coupling global weighted principal component analysis and a cost-constrained conditioned Latin hypercube algorithm. MethodsX 2019, 6, 284–299.
  47. Zeng, P.; Song, X.; Yang, H.; Wei, N.; Du, L. Digital Soil Mapping of Soil Organic Matter with Deep Learning Algorithms. ISPRS Int. J. Geo-Inf. 2022, 11, 299.
  48. Arroyo, I.; Tamariz-Flores, V.; Castelan, R. Mapping Forest Cover and Estimating Soil Organic Matter by GIS-Data and an Empirical Model at the Subnational Level in Mexico. Forests 2023, 14, 539.
  49. Lu, Q.; Tian, S.; Wei, L. Digital mapping of soil pH and carbonates at the European scale using environmental variables and machine learning. 2023, 856, 159171.
  50. Brunsdon, C.; Fotheringham, S.; Charlton, M. Geographically Weighted Regression. J. R. Stat. Soc. Ser. D (Stat.) 1998, 47, 431–443.
  51. Chen, L.; Ren, C.; Li, L.; Wang, Y.; Zhang, B.; Wang, Z.; Li, L. A Comparative Assessment of Geostatistical, Machine Learning, and Hybrid Approaches for Mapping Topsoil Organic Carbon Content. ISPRS Int. J. Geo-Inf. 2019, 8, 174.
  52. Wang, S.; Zhuang, Q.; Jin, X.; Yang, Z.; Liu, H. Predicting Soil Organic Carbon and Soil Nitrogen Stocks in Topsoil of Forest Ecosystems in Northeastern China Using Remote Sensing Data. Remote Sens. 2020, 12, 1115.
  53. Georganos, S.; Grippa, T.; Niang Gadiaga, A.; Linard, C.; Lennert, M.; Vanhuysse, S.; Mboga, N.; Wolff, E.; Kalogirou, S. Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int. 2021, 36, 121–136.
  54. Wiesmeier, M.; Barthold, F.; Blank, B.; Kögel-Knabner, I. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant Soil 2011, 340, 7–24.
  55. Wang, L.; Zhou, Y. Combining Multitemporal Sentinel-2A Spectral Imaging and Random Forest to Improve the Accuracy of Soil Organic Matter Estimates in the Plough Layer for Cultivated Land. Agriculture 2023, 13, 8.
  56. Wood, S.N. Generalized Additive Models: An Introduction with R, 2nd ed.; Chapman & Hall/CRC Texts in Statistical Science; CRC Press/Taylor & Francis Group: Boca Raton, FL, USA, 2017.
  57. Ruppert, D. Selecting the Number of Knots for Penalized Splines. J. Comput. Graph. Stat. 2002, 11, 735–757.
  58. Pribyl, D.W. A critical review of the conventional SOC to SOM conversion factor. Geoderma 2010, 156, 75–83.
Figure 1. Land cover map of study area.
Figure 2. Satellite Image of study area (Landsat 8, accessed on 3 May 2024) and distribution of soil sample points.
Figure 3. Frequency histogram of SOM dataset.
Figure 4. Environmental covariates. (a) EVI. (b) TWI. (c) NDVI. (d) Aspect. (e) Plan curvature. (f) Elevation.
Figure 5. SOM map of Leizhou based on GGP-GAM and comparison models. (a) GGP-GAM, (b) GWR, (c) GRF, (d) RK.
Figure 6. Scatter plots of observed and predicted SOM content from the validation data based interpolation using (a) GGP-GAM, (b) RK, (c) GRF, and (d) GWR.
Figure 7. The spatially varying coefficients of SOM interpolation from GGP-GAM. (a) Elevation, (b) aspect, (c) plan curvature, (d) EVI, (e) TWI, (f) NDVI.
Figure 8. RMSE, MAE, and R² for SOM interpolation with different knot numbers. (a) RMSE, (b) MAE, (c) R².
Table 1. Major statistical moments (maximum, minimum, mean, 1st quartile, 3rd quartile, and standard deviation).

| Statistic | Max | Min | Mean | 1st Quartile | 3rd Quartile | Std |
|---|---|---|---|---|---|---|
| SOM Content (g/kg) | 48.5 | 2.5 | 21.4 | 14.5 | 27.7 | 2.6 |
Table 2. Importance of covariates based on RF.

| Environment Variables | Elevation | Curvature | NDVI | EVI | Aspect | TWI |
|---|---|---|---|---|---|---|
| Importance | 9168.286 | 7437.733 | 7168.667 | 6888.217 | 4590.151 | 3296.906 |
Table 3. Summaries of SOM mapping results.

| Model | Min (g/kg) | 1st Quartile (g/kg) | Median (g/kg) | Mean (g/kg) | 3rd Quartile (g/kg) | Max (g/kg) |
|---|---|---|---|---|---|---|
| Observed | 2.50 | 14.53 | 21.35 | 21.35 | 27.68 | 48.50 |
| GGP-GAM | 0.14 | 16.67 | 21.02 | 21.01 | 25.33 | 40.59 |
| RK | 7.01 | 18.31 | 22.60 | 22.70 | 27.15 | 46.37 |
| GRF | 7.49 | 17.67 | 21.37 | 21.48 | 24.96 | 41.92 |
| GWR | −1.12 | 16.62 | 21.30 | 21.24 | 25.50 | 50.31 |
Table 4. RMSE, MAE, and adjusted R² of SOM interpolation results using GGP-GAM and comparison models.

| Model | RMSE (g kg−1) | MAE (g kg−1) | R² | Adjusted R² |
|---|---|---|---|---|
| GGP-GAM | 7.7957 | 6.0119 | 0.3425 | 0.3343 |
| RK | 7.8663 | 6.1299 | 0.3382 | 0.3314 |
| GRF | 7.9253 | 5.9502 | 0.3170 | 0.3085 |
| GWR | 8.0754 | 6.1791 | 0.2897 | 0.2809 |
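The accuracy measures reported in Table 4 can be computed from paired observed and predicted values as in the minimal sketch below. It uses the standard textbook definitions; the paper does not state its exact R² formula, so the residual-sum-of-squares form used here is an assumption, and the four observed/predicted values are made up purely for illustration (they are not study data).

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(obs - pred)))

def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def adjusted_r2(r2_value, n, p):
    """Adjusted R2 for n samples and p predictors."""
    return 1.0 - (1.0 - r2_value) * (n - 1) / (n - p - 1)

# Made-up SOM values in g/kg, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
print("R2:", r2(observed, predicted))
print("Adjusted R2:", adjusted_r2(r2(observed, predicted), n=4, p=1))
```

RMSE and MAE carry the units of the target variable (g kg−1 here), whereas R² and adjusted R² are dimensionless, which is why only the first two columns of Table 4 have units.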
Table 5. Summaries of GGP-GAM coefficients for each SOM predictor.

| Predictor | Min | 1st Quartile | Median | 3rd Quartile | Average | Max |
|---|---|---|---|---|---|---|
| aspect | −0.27 | −0.04 | 0.09 | 0.23 | 0.09 | 0.43 |
| plan curvature | −6239.5 | −2430.2 | −212.70 | 2042.8 | −137.7 | 6288.9 |
| elevation | −0.19 | −0.13 | −0.03 | 0.04 | −0.04 | 0.16 |
| EVI | −18.66 | −8.55 | −5.54 | −3.13 | −5.59 | 6.35 |
| NDVI | −15.55 | −4.20 | 0.96 | 6.16 | 0.99 | 17.71 |
| TWI | −3.09 | −0.26 | −0.07 | 0.30 | −0.03 | 1.27 |
Table 6. Effective degrees of freedom of each predictor.

| Predictor | EDF |
|---|---|
| Aspect | 2.50 |
| Elevation | 11.00 |
| Plan curvature | 2.00 |
| TWI | 18.641 |
| EVI | 2.50 |
| NDVI | 6.579 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cheng, L.; Yan, M.; Zhang, W.; Guan, W.; Zhong, L.; Xu, J. Interpretable Digital Soil Organic Matter Mapping Based on Geographical Gaussian Process-Generalized Additive Model (GGP-GAM). Agriculture 2024, 14, 1578. https://doi.org/10.3390/agriculture14091578
