1. Introduction
Soil is a significant natural resource for human production and life and an important carrier of the human living environment. As the main material basis of land resources, soil is inherently non-renewable, which limits its carrying capacity [1]. Soil nutrient content is one of the critical factors reflecting farmland quality, and it is closely related to the normal growth and development of plants [2]. In the era of rapidly developing digital agriculture, accurate, fast and dynamic on-demand acquisition of soil information underpins modern precision agriculture, and understanding the spatial distribution of nutrients is a basic requirement for sustainable agriculture and agro-ecological development [2,3,4]. Total nitrogen (TN) is a key indicator of soil nitrogen availability and one of the measures of soil fertility, and nitrogen is required in large amounts during vegetation growth [3,5,6]. With the progress of modern agriculture, an important problem for researchers has been how to obtain the required soil information in a limited time, so as to assess land fertility in a timely manner and guide scientific fertilization in agricultural production.
Traditional nutrient testing is expensive and time-consuming and relies mainly on laboratory analysis. Because it yields nutrient contents only at the point scale, it is poorly suited to supporting sustainable agriculture, whereas remote-sensing technology is an important technique for obtaining the spatial characteristics of soil nutrients [3,7]. Combining remote sensing images to predict the spatial distribution of soil nutrients therefore meets the current requirements of sustainable agricultural development. Most models currently used for spatial prediction of TN fall into two categories. The first is linear models, which describe a linear relationship between the reflectance of remote sensing image bands and TN content and use it to invert TN; examples include partial least squares regression (PLSR) [8,9,10] and multiple linear regression (MLR) [11,12]. However, because the relationships between multispectral band reflectance and soil nutrient content are multiple and complex, linear models cannot fully capture the spatial distribution of nutrients and lack prediction accuracy. This motivates the second category, machine learning methods. Machine learning techniques have been reported to perform best in spatial inversion of soil nutrients [13]: they compensate for the deficiencies of linear models, capture the complex nonlinear relationship between band reflectance and nutrient content, and reflect the spatial distribution of nutrients in the study area well. The models commonly used to predict the spatial distribution of soil nutrients include neural networks (NN) [4,14], support vector machines (SVM) [15,16], and random forest (RF) [3,17]. Nonlinear models such as the back-propagation neural network (BPNN) and SVM handle nonlinear problems well, which raises prediction accuracy and yields more accurate information on the spatial distribution of nutrients. BPNN and SVM in particular have been widely used to predict soil moisture [18,19], soil organic matter [20], soil heavy metals [21], and soil quality [22].
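To make the linear case concrete, the sketch below fits an MLR of TN on band reflectance by ordinary least squares, solving the normal equations with Gaussian elimination in plain Python. The helper names (`fit_mlr`, `predict_mlr`, `solve_linear_system`) and any data passed to them are hypothetical illustrations, not the models or data of this study.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies rows so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(bands: Sequence[Sequence[float]], tn: Sequence[float]) -> List[float]:
    """Fit TN = b0 + b1*R1 + ... + bk*Rk by ordinary least squares.

    Returns [b0, b1, ..., bk] by solving the normal equations (X^T X) b = X^T y.
    """
    x = [[1.0] + list(row) for row in bands]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, tn)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coef: Sequence[float], bands: Sequence[Sequence[float]]) -> List[float]:
    """Apply fitted MLR coefficients to rows of band reflectance."""
    return [coef[0] + sum(c * r for c, r in zip(coef[1:], row)) for row in bands]
```

Because such a model is a weighted sum of band reflectances, it cannot represent curved or interacting responses; that limitation is what nonlinear models such as BPNN and SVM are meant to address.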
Existing studies on TN typically extract sensitive bands through correlation analysis and select the best inversion model based on the deviation between measured and predicted values [23]; these include both linear and nonlinear models, both of which can be used for regional spatial prediction of soil nutrients. In black soil, the BPNN model has been reported to invert TN content with 6.5% higher prediction accuracy than PLSR [24]. Xiao et al. [25] used MODIS data to construct a random forest model that inverted the spatial distribution of TN content in Shandong Province with a coefficient of determination of 0.570. Lin et al. [26] estimated soil total nitrogen based on the synthetic color learning machine (SCLM) method. Machine learning models show promise for predicting the spatial distribution of TN, but their performance varies with the nutrient and the study area. It is therefore especially important to develop a spatial prediction model of soil TN suited to this study area, which will greatly help in predicting the spatial distribution of soil TN here.
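Correlation-based band screening can be sketched as follows, assuming Pearson's r as the correlation measure and a fixed |r| threshold as the selection rule. Both are assumptions: in practice a significance test (e.g. p < 0.01) is often used instead of, or alongside, a fixed threshold. The function names are hypothetical, and a band with zero variance would cause a division by zero.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_sensitive_bands(
    bands: Dict[str, Sequence[float]],
    tn: Sequence[float],
    min_abs_r: float,
) -> List[str]:
    """Return band names whose |r| with TN is at least min_abs_r, strongest first."""
    scores = {name: pearson_r(values, tn) for name, values in bands.items()}
    chosen = [name for name, r in scores.items() if abs(r) >= min_abs_r]
    return sorted(chosen, key=lambda name: abs(scores[name]), reverse=True)
```

The absolute value is used so that strongly negative correlations, which are common between soil properties and reflectance, are kept as sensitive bands too.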
The spatial characteristics of soil nutrients are an urgent problem in agriculture and ecology, and constructing a model suited to predicting the spatial distribution of TN in this study area is equally important. Therefore, in this study, one linear model (PLSR) and two nonlinear models (BPNN and SVM) were constructed from Landsat 5 TM multispectral remote sensing images, using both the original reflectance and mathematically transformed reflectance, to analyze the spatial distribution of soil TN in Datong County, Xining City, Qinghai Province, China. The prediction accuracy of the three models was compared, and the optimal model was selected to predict the spatial distribution of TN, providing data support and a basis for soil quality evaluation, grain yield estimation and sustainable agriculture in Datong County.
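This section does not list which mathematical transformations were applied to the reflectance. The sketch below assumes a few point-wise transforms that are common in spectral studies (reciprocal, logarithm, logarithm of the reciprocal, square root, square); each transformed series could then be screened against TN with the same correlation analysis used for the raw bands. `TRANSFORMS` and `transform_band` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumed set of point-wise transforms of reflectance R; not necessarily
# the set used in this study. The reciprocal and log forms require R > 0.
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "R": lambda r: r,
    "1/R": lambda r: 1.0 / r,
    "log10(R)": math.log10,
    "log10(1/R)": lambda r: -math.log10(r),
    "sqrt(R)": math.sqrt,
    "R^2": lambda r: r * r,
}


def transform_band(reflectance: Sequence[float]) -> Dict[str, List[float]]:
    """Apply every transform in TRANSFORMS to one band's reflectance values."""
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance must be positive for 1/R and log transforms")
    return {name: [f(r) for r in reflectance] for name, f in TRANSFORMS.items()}
```

Keeping the untransformed series under the key `"R"` lets the raw and transformed reflectance be compared side by side in the same correlation screen.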
4. Discussion
Soil nutrients are an important factor in measuring soil fertility. Traditional farm management and agricultural systems have led to polarization of soil nutrients in farmland: excess nutrients in fertile soils reduce the utilization efficiency of chemical fertilizers, while vegetation on nutrient-poor soils does not receive adequate nutrients. Soil nutrient monitoring has evolved from qualitative to quantitative studies [35], and determining TN inversion models plays an important role in understanding the spatial distribution characteristics of TN, guiding the implementation of precision agriculture, and promoting agricultural production [2,36]. Methods for monitoring TN content have progressed from traditional laboratory analysis to large-scale spatial prediction at the regional scale. The most important of these is remote sensing, which offers accuracy, speed and economy [35] and can predict the spatial distribution of soil total nitrogen in a study area well. Prediction models have gradually developed from linear to nonlinear ones. For example, Liu et al. [28] established a PLSR model to predict soil total nitrogen content in Shaanxi Province. Li et al. [37] established a stepwise linear regression model to invert soil total nitrogen content based on vegetation information. Xu et al. [38] used Landsat remote sensing images to construct a regression kriging (RK) model to predict the soil total nitrogen content of a farm. Wang et al. [39] mapped the distribution of TN content in northeastern Liaoning Province, China, based on a random forest model with three different kernel functions and a multiple linear stepwise regression (MLSR) model. Ye et al. [40] predicted soil total nitrogen content based on a radial basis function neural network (RBFNN). These studies show the continued development of prediction models.
The spatial distribution of TN is strongly influenced by anthropogenic and natural factors and exhibits spatial heterogeneity [16,40]. Spatial prediction of TN should therefore not rely solely on image band reflectance as the independent variables, but should also consider environmental and anthropogenic factors such as topography [36], climate [41], soil type [6], tillage practices [4,42], and crop type [43]. Nonlinear models play an important role in regions subject to anthropogenic disturbance [40]. Sun et al. [44] used a random forest model to analyze the relative importance of factors affecting soil nutrients and found that temperature, precipitation and elevation were significant factors for soil nutrient prediction. Soil type [45] and topographic trends directly affect the spatial distribution of TN. Dong et al. [17] showed that predicting soil nutrients from environmental and anthropogenic factors yields better prediction accuracy than common interpolation techniques. Zhang et al. [43] found that the spatial pattern of TN is related to elevation. Peri et al. [41] found that TN decreased with increasing soil desertification, indicating that TN content was closely related to vegetation density. Because vegetation growth depends on climatic conditions, climate ultimately also affects the spatial distribution of TN content. In summary, the spatial distribution of TN content is influenced by many interrelated factors. Future research will focus on spatial prediction of TN under the combined effects of environmental and anthropogenic factors, on understanding the relationships among these factors, and on using higher-resolution remote sensing data to extract TN-sensitive bands and highly correlated factors so as to predict soil TN content accurately.
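Sun et al. [44] ranked influencing factors using random forest relative importance. A model-agnostic way to estimate such importance for any fitted predictor, for example one combining band reflectance with covariates such as elevation or precipitation, is permutation importance: shuffle one covariate column and measure how much the prediction error grows. The sketch below is that generic technique, not necessarily the procedure used by Sun et al.; `permutation_importance` and its `predict` argument are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    x: List[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is randomly shuffled.

    Larger values mean the fitted model relies more on that feature.
    `predict` can wrap any fitted model that maps a list of rows to predictions.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(x))
    importances = []
    for j in range(len(x[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in x]
            rng.shuffle(column)
            # Copy each row, replacing only column j with the shuffled values.
            x_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(x, column)]
            increases.append(mse(y, predict(x_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

Because it only calls `predict`, the same routine could compare covariate importance across PLSR, BPNN and SVM models; importances should be computed on held-out samples to avoid rewarding overfitting.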
5. Conclusions
Based on field sampling data from Datong County, this paper used statistical software to correlate the TN of the sampling points with the reflectance of the corresponding TM image bands, and constructed PLSR, BPNN and SVM models to invert the spatial distribution of TN in the study area. The average TN content in Datong County is 1.864 g/kg with a coefficient of variation of 30.596%, indicating a moderate degree of variation. After the raw spectral reflectance was preprocessed with mathematical transformations, the correlation improved significantly (r = 0.584, p < 0.01) and the prediction accuracy of the models also improved. Ranked by prediction accuracy from highest to lowest, the three models were SVM (R2 = 0.676, RMSE = 0.296), BPNN (R2 = 0.560, RMSE = 0.305) and PLSR (R2 = 0.374, RMSE = 0.334). The nonlinear models therefore outperformed the linear model, and among the nonlinear models SVM outperformed BPNN, so the SVM model can accurately predict the spatial distribution of soil TN in Datong County. The SVM prediction shows an overall spatial trend of high TN in the north and low TN in the south, with slightly lower values in the middle of the study area. Soil TN is an important indicator of soil fertility, and predicting its spatial distribution supports soil quality evaluation and the effective implementation of precision agriculture. Predicting the spatial distribution of TN with the SVM algorithm is an effective technical tool for sustainable agriculture at the county scale and can effectively map the distribution of soil TN, providing a basis and technical support for precision fertilization and sustainable agricultural management.
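The summary statistic and accuracy measures reported above can be computed as sketched below. Two points are assumptions: the coefficient of variation uses the sample standard deviation, and the variability classes use the commonly cited thresholds of 10% and 100% (weak below 10%, moderate from 10% to 100%, strong above 100%). R2 is computed here as 1 - SS_res / SS_tot; some studies instead report the squared Pearson correlation between measured and predicted values, which can differ. All function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV in percent, using the sample standard deviation (n - 1 denominator)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def variation_class(cv_percent: float) -> str:
    """Classify variability with the assumed 10% / 100% thresholds."""
    if cv_percent < 10.0:
        return "weak"
    if cv_percent <= 100.0:
        return "moderate"
    return "strong"


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination R2 = 1 - SS_res / SS_tot."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the units of TN (g/kg)."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))
```

For example, `variation_class(30.596)` returns `"moderate"`, consistent with the medium degree of variation reported for Datong County. Higher R2 and lower RMSE on validation samples indicate better accuracy, which is the basis for the SVM > BPNN > PLSR ranking.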