Spatial Prediction of Total Nitrogen in Soil Surface Layer Based on Machine Learning

Liu, Zunfang; Lei, Haochuan; Lei, Lei; Sheng, Haiyan

doi:10.3390/su141911998


by Zunfang Liu ¹, Haochuan Lei ¹,*, Lei Lei ¹ and Haiyan Sheng ²

¹ Department of Geological Engineering, Qinghai University, Xining 810016, China
² College of Agriculture and Animal Husbandry, Qinghai University, Xining 810016, China
* Author to whom correspondence should be addressed.

Sustainability 2022, 14(19), 11998; https://doi.org/10.3390/su141911998

Submission received: 3 August 2022 / Revised: 6 September 2022 / Accepted: 20 September 2022 / Published: 22 September 2022

(This article belongs to the Special Issue Sustainable Agricultural Engineering Technologies and Applications)


Abstract:

Understanding the spatial distribution of soil total nitrogen (TN) content is important for sustainable agricultural development, because it guides precise fertilization and thereby helps increase grain yield. To this end, this paper constructs three inversion models, partial least squares regression (PLSR), back propagation neural network (BPNN) and support vector machines (SVM), from remote sensing data to predict the TN content in Datong County, Xining City, Qinghai Province, China. The results showed that the average TN content was 1.864 g/kg and the coefficient of variation (CV) was 30.596%. Among the three inversion models, the SVM model had the highest prediction accuracy (R² = 0.676, RMSE = 0.296), followed by the BPNN model (R² = 0.560, RMSE = 0.305) and the PLSR model (R² = 0.374, RMSE = 0.334). The most accurate model was used to predict the spatial distribution of TN: TN content was high in the northwest and low in the southeast, and gradually decreased from north to south. This study provides a reference basis and support for soil fertility evaluation and sustainable agricultural development.

Keywords:

soil total nitrogen; BP neural network; support vector machines; spatial distribution; remote sensing

1. Introduction

Soil is a significant natural resource for human production and life and an important carrier of the human living environment. As the main material basis of land resources, soil is inherently non-renewable, which limits its carrying capacity [1]. Soil nutrient content is one of the critical factors reflecting farmland quality, and the normal growth and development of plants is closely related to soil nutrients [2]. In an era of rapidly developing digital agriculture, accurate, fast and dynamic acquisition of soil information on demand underpins modern precision agriculture, and understanding the spatial distribution of nutrients is a basic requirement for sustainable agriculture and plays an important role in sustainable agro-ecological development [2,3,4]. Total nitrogen (TN) is a key indicator of soil nitrogen availability and one of the measures of soil fertility, and nitrogen is a nutrient element required in large amounts during vegetation growth [3,5,6]. As modern agriculture advances, obtaining the required soil information within a limited time, so that land fertility can be assessed promptly and fertilization guided scientifically, has become an important problem for researchers.

Traditional nutrient content testing is expensive and time-consuming and relies largely on laboratory analysis. Nutrient content measured at the point scale is of limited use for sustainable agriculture, whereas remote-sensing technology is an important means of obtaining the spatial characteristics of soil nutrients [3,7]. Combining remote sensing images to predict the spatial distribution of soil nutrients therefore meets the current requirements for sustainable agricultural development. At present, most models used for spatial prediction of TN fall into two categories. The first comprises linear models, which build an inversion model from the linear relationship between the reflectance of remote sensing image bands and TN content; examples include partial least squares regression (PLSR) [8,9,10] and multiple linear regression (MLR) [11,12]. However, because the relationships between multispectral band reflectance and soil nutrient content are multiple and complex, linear models do not reflect the spatial distribution of nutrients well and lack prediction accuracy. The second category, machine learning methods, addresses this gap. Machine learning techniques have been reported to perform best in soil nutrient spatial inversion [13]: they compensate for the deficiencies of linear models, capture the complex nonlinear relationship between band reflectance and nutrient content, and reflect the spatial distribution of nutrients in a study area well. Models commonly used to predict soil nutrient spatial distribution include neural networks (NN) [4,14], support vector machines (SVM) [15,16] and random forest (RF) [3,17]. Models such as BPNN and SVM handle nonlinear problems well, which raises prediction accuracy and yields more accurate spatial distribution information. These nonlinear models have been widely applied to soil moisture [18,19], soil organic matter [20], soil heavy metals [21] and soil quality [22].

Existing studies on TN generally extract sensitive bands through correlation analysis and select the best inversion model based on the deviation between measured and predicted values [23]. These studies cover both linear and nonlinear models, either of which can be used for spatial prediction of soil nutrients in a region. The prediction accuracy of the BPNN model for inverting TN content in black soil was reported to be 6.5% higher than that of PLSR [24]. Xiao et al. [25] used MODIS data to construct a random forest model to invert the spatial distribution of TN content in Shandong Province, with a model coefficient of determination of 0.570. Lin et al. [26] estimated soil total nitrogen with the synthetic color learning machine (SCLM) method. Machine learning models therefore have promising applications in predicting the spatial distribution of TN, but their results differ between nutrient species and study areas. It is thus important to identify a spatial prediction model of soil total nitrogen suited to the study area considered here.

Characterizing the spatial distribution of soil nutrients is an urgent problem in agriculture and ecology, and so is constructing a model suited to predicting the spatial distribution of TN in the study area. Therefore, in this study, three prediction models, one linear (PLSR) and two nonlinear (BPNN and SVM), were constructed from Landsat 5 TM multispectral remote sensing images, using both the original reflectance and the mathematically transformed reflectance, to analyze the spatial distribution of soil TN in Datong County, Xining City, Qinghai Province, China. The prediction accuracy of the three models was compared, and the optimal model was selected to predict the spatial distribution of TN, providing data support and a basis for soil quality evaluation, grain yield estimation and sustainable agriculture in Datong County.

2. Materials and Methods

2.1. Study Area

Datong County, Xining City, lies in the Hehuang Valley in the eastern part of Qinghai Province (100°51′–101°56′ E, 36°43′–37°23′ N), at the southern foot of the Qilian Mountains and in the Beichuan River basin in the upper reaches of the Huangshui River. It is a transition zone between the Qinghai–Tibet Plateau and the Loess Plateau. Elevation ranges from 2178 to 4444 m, and the terrain is high in the northwest and low in the southeast. The study area has a typical plateau continental climate with a dry spring and a humid summer: the maximum and minimum temperatures are 35.6 °C and −26.1 °C, the annual average temperature is 4.9 °C, and precipitation peaks in August.

2.2. Soil Sampling and Analysis

Soil samples were collected in September 2012 at a sampling depth of 0–20 cm, approximately 2 kg per sample, for a total of 73 samples. The latitude and longitude of each sampling point were recorded during sampling with a handheld Global Positioning System (GPS) receiver (Figure 1). The samples were air-dried and finely ground in the laboratory and then used for chemical analysis of soil composition. Soil total nitrogen mass fraction was determined by the semi-micro Kjeldahl method. The minimum and maximum values of TN in the study area were 0.639 g/kg and 3.375 g/kg, the mean value was 1.864 g/kg, the skewness was 0.326 and the kurtosis was −0.106, so the data were consistent with a normal distribution. The coefficient of variation (CV) was 30.596%, indicating a moderate degree of dispersion (Table 1). The soil samples were randomly divided into modeling and validation samples in a ratio of 7:3.
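The coefficient of variation and the 7:3 random split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of the sample (n − 1) standard deviation, and the random seed are hypothetical choices, since the paper does not say how these quantities were computed or how the split was drawn.

```python
import math
import random


def describe(values):
    """Return mean, sample standard deviation and CV (%) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample variance (n - 1 denominator); the paper does not specify.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(variance)
    return mean, sd, 100.0 * sd / mean


def split_samples(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into modeling and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is a hypothetical choice
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


modeling_idx, validation_idx = split_samples(73)
print(len(modeling_idx), len(validation_idx))  # 51 22
```

With 73 samples, a 7:3 split rounded down gives 51 modeling and 22 validation samples.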

2.3. Image Acquisition and Processing

The Landsat 5 TM remote sensing image data used in this paper were obtained from the Google Earth Engine (GEE) remote-sensing cloud platform (https://earthengine.google.com/ (accessed on 21 April 2021)). The image observation parameters and waveband characteristics are shown in Table 2. To ensure accurate prediction of the spatial distribution of TN content, the imaging time of the selected remote sensing data is consistent with the collection time of the soil samples. Moreover, a month after crop harvest was chosen, when little vegetation or crop cover shades the soil surface and soil spectral information is easier to obtain. Correlation analysis between the TN data from field sampling and laboratory assay and the image bands was used to identify the best inversion band for soil total nitrogen content.

Geometric correction, radiometric correction, image mosaicking and cropping are the principal steps of remote sensing image preprocessing. Preprocessing converts the Digital Number (DN) of the original image into true Surface Reflectance (SR). The Google Earth Engine (GEE) platform provides both a top-of-atmosphere (TOA) image set and an SR image set, which saves considerable data processing time. In the preprocessing stage, Landsat 5 SR images (LANDSAT/LT05/C02/T1_L2) were called from the GEE platform with the function ee.ImageCollection, so that the retrieved images only needed cloud removal, mosaicking and cropping. Previous studies have found that characteristic bands are difficult to extract from the original reflectance alone, but that total nitrogen characteristic bands can be extracted effectively after transforming the original reflectance R (1/R, log(1/R), 1/log R) [27,28].
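The three reflectance transformations can be written out explicitly. The sketch below is illustrative only: the function name is hypothetical, and a base-10 logarithm is assumed because the paper does not state the base. Reflectance is restricted to the open interval (0, 1) so that both 1/R and 1/log R are defined.

```python
import math


def transform_reflectance(r):
    """Return the three transforms of a surface reflectance value r, 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("reflectance must lie strictly between 0 and 1")
    return {
        "1/R": 1.0 / r,
        # Assumption: base-10 logarithm; the source does not state the base.
        "log(1/R)": math.log10(1.0 / r),
        "1/log R": 1.0 / math.log10(r),
    }


print(transform_reflectance(0.1))
```

For reflectance in (0, 1), log R is negative, so 1/log R is negative as well, whereas 1/R and log(1/R) are positive. This sign difference is worth keeping in mind when comparing correlation coefficients across the transformed bands.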

2.4. Methods

The technical roadmap of this study is shown in Figure 2. First, remote sensing images were obtained from GEE and preprocessed, then combined with the soil sampling data, and the image band reflectance was mathematically transformed. Next, correlation analysis was performed to extract the feature bands used to construct three prediction models: PLSR, BPNN and SVM. Finally, model prediction accuracy was evaluated with the coefficient of determination (R²) and root mean square error (RMSE), and the model with the highest accuracy was selected for spatial prediction of soil total nitrogen.

2.4.1. Partial Least Squares Regression

The PLSR algorithm was first proposed by Wold et al. [29]. It combines multiple regression analysis, principal component analysis and correlation analysis while retaining the advantages of all three, and it improves on ordinary linear regression. In this paper, the original reflectance or the mathematically transformed reflectance of the TM multispectral bands was used as the basis, and each principal component was obtained by the Karhunen–Loeve (KL) transformation. Then, with the soil total nitrogen content of the study area as the dependent variable and the surface reflectance of 51 sampling points, or its mathematically transformed form, as the independent variables, the PLSR model was established using SPSS software. The regression equation is shown in Equation (1).

Y = a_{1} X_{1} + a_{2} X_{2} + \dots + a_{m} X_{m} \qquad (1)

where Y represents the TN content, X_i (i = 1, …, m) represents the i-th independent variable in the regression equation, that is, a characteristic band, and a_i represents the coefficient of the i-th characteristic band, which is determined by the regression results.

2.4.2. Neural Networks

A neural network is a system that uses engineering techniques to simulate the structure and function of the neuronal network in the human brain, and it consists of a large number of simple nonlinear processing units [30]. Through the interaction of groups of neurons in the network, an artificial neural network can process fuzzy, nonlinear and noisy information. It is especially suitable for nonlinear problems and is widely used in pattern recognition, image processing and automatic control. A neural network model consists of one input layer, one or more hidden layers, and one output layer. The BPNN model used in this paper is one of the more commonly used neural network models.

The neural network model constructed in this paper contains three layers, of which one is a hidden layer. The number of nodes in the input layer equals the number of feature bands extracted by Pearson correlation analysis. The output layer has one node, which is the TN content, and the hidden layer has 100 nodes. The learning rate of the model is set to 0.001, the maximum number of iterations is 8000, and the transfer function of the BP neural network model is the sigmoid function. Each parameter of the model was determined by comparing the results of repeated trials.
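The forward pass of such a three-layer network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the random weight initialization range, and the linear output node are assumptions (the paper specifies only the sigmoid transfer function, 100 hidden nodes and one output node), and the back-propagation training loop with learning rate 0.001 and up to 8000 iterations is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_network(n_inputs, n_hidden=100, seed=0):
    """Create random weights for one hidden layer and a single output node."""
    rng = random.Random(seed)
    # Assumption: small uniform initial weights; the paper does not specify.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(x, params):
    """Predict TN from one vector of band reflectances x."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Assumption: linear output node, as is common for regression.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


params = init_network(n_inputs=4)
print(forward([0.10, 0.12, 0.15, 0.20], params))
```

The input vector here holds four made-up reflectance values; with untrained weights the output is arbitrary and only shows the data flow from feature bands to a single TN estimate.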

2.4.3. Support Vector Machines

The support vector machine (SVM) is a supervised machine learning method based on statistical learning theory, and it handles nonlinear problems well. The key step of the algorithm is the selection of the kernel function, for which four options are available: polynomial, sigmoid, linear and radial basis function (RBF) [31]. The kernel function can be chosen according to the research problem at hand. The RBF is the most commonly used kernel function in soil mapping studies [32], and it involves two parameters, the penalty (cost) and the kernel width (sigma) [5]. The Python language was used to determine the optimal parameters in order to obtain a better model.
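For reference, the RBF kernel with a kernel width sigma can be sketched as below. The function name and the exact parameterization exp(−‖x − y‖² / (2σ²)) are assumptions made for illustration; the paper does not give the formula it used, and some libraries express the same kernel through gamma = 1/(2σ²) instead.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Radial basis function kernel between two feature vectors."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Identical vectors give 1.0; the value decays towards 0 with distance.
print(rbf_kernel([0.1, 0.2], [0.1, 0.2]))
print(rbf_kernel([0.1, 0.2], [0.4, 0.6], sigma=0.5))
```

A small sigma makes the kernel decay quickly, so the model responds to very local structure, while a large sigma gives a smoother fit. The cost parameter is not part of the kernel itself; it controls how strongly the SVM penalizes training errors.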

2.4.4. Pearson Correlation Coefficient

Correlation analysis is a statistical analysis that determines the linear relationship between two variables [33]. In this paper, the Pearson correlation coefficient is applied in order to determine the correlation between TN and band reflectance, in order to extract the TN characteristic band.
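As a sketch of this band-selection step, the Pearson coefficient can be computed for each band against TN and the band with the largest absolute coefficient retained. The function names and the numbers below are illustrative assumptions and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def most_correlated_band(band_values, tn):
    """Return (band name, r) for the band with the largest |r| against TN."""
    scores = {name: pearson_r(values, tn) for name, values in band_values.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]


# Illustrative, made-up numbers (not data from the study).
tn = [1.2, 1.8, 2.1, 2.9]
bands = {
    "B1": [0.12, 0.11, 0.13, 0.12],
    "B2": [0.20, 0.17, 0.15, 0.11],
}
name, r = most_correlated_band(bands, tn)
print(name, round(r, 3))
```

Ranking by the absolute value matters because a strongly negative correlation is as informative for band selection as a strongly positive one.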

2.4.5. Model Validation

In this paper, R² from standard regression evaluation and RMSE from error index evaluation are used to assess the stability and prediction accuracy of the models. R² takes values in [0, 1]; the better the model fits, the closer the predicted values are to the measured values and the closer R² is to 1. Conversely, the smaller the RMSE, the smaller the deviation of the predicted values from the measured values, the higher the prediction accuracy, and the smaller the inversion error of TN content.
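Both metrics can be sketched directly from their usual definitions. The paper does not state which formula for R² it used, so the version below assumes the common form 1 − SS_res/SS_tot; under that definition the value can fall below 0 for a model that predicts worse than the mean of the measured values, whereas a squared correlation coefficient always stays within [0, 1]. The function names are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assuming R² = 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Illustrative, made-up TN values in g/kg (not data from the study).
measured = [1.2, 1.8, 2.1, 2.9]
predicted = [1.4, 1.7, 2.0, 2.6]
print(r_squared(measured, predicted), rmse(measured, predicted))
```

RMSE carries the unit of the target variable, here g/kg, so it can be read directly against the observed TN range of the samples.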

3. Results and Analysis

3.1. Correlation Analysis

Pearson correlation analysis showed that TN was correlated with band reflectance (Table 3). The original band reflectance was negatively correlated with soil TN content, and the B2 band had the highest correlation with TN (r = −0.534). Spectral transformation improved the correlation between reflectance and TN, and the correlation was higher in the visible bands than in the other bands. The highest correlation between reflectance and TN was found after the reciprocal transformation, 1/R (r = 0.584). In relative terms, the highest correlation for the transformed reflectance was 9.4% higher than that for the original reflectance (|r| = 0.584 versus 0.534). This analysis indicates that the bands most sensitive to TN are the visible bands.

3.2. Partial Least Squares Regression

Characteristic bands identified by Pearson correlation analysis were used for model construction. The highest correlation visible band was used to construct the PLSR model. The regression coefficients of determination (R²) and root mean square errors (RMSE) of the best principal components, modeling datasets and validation dataset, are shown in Table 4. The constructed models all had R² greater than 0.5. The model constructed with the inverse of reflectance (1/R) had the highest accuracy (R² = 0.604, RMSE = 0.285). The accuracy of the model constructed with the reflectance after mathematical transformation was higher than that of the model constructed with the original reflectance, indicating that appropriate mathematical transformation can improve the prediction accuracy of the model. The model can perform a rough inversion of the spatial distribution of TN in Datong County.

3.3. BP Neural Network

A BPNN spatial inversion model for TN was constructed with the reflectance of four bands as input variables, and its inversion accuracy was evaluated (Table 5). The model constructed with the reciprocal of reflectance (1/R) as the independent variable had the highest inversion accuracy (R² = 0.882, RMSE = 0.218), followed by the model constructed with the original reflectance (R² = 0.806, RMSE = 0.273). The R² of the models constructed with each of the four forms of reflectance was greater than 0.7. Therefore, the model constructed in this paper can quantitatively invert the TN content of soil in Datong County and provides favorable conditions for precise nutrient mapping of farmland.

Scatter plots of the model inversion results were also drawn (Figure 3). For the BPNN model built with the reciprocal of reflectance as the input variable, the sampling points lie around the 1:1 line and are more tightly clustered. The models built with the other transformations can also predict the spatial distribution of TN, but their prediction accuracy is less satisfactory and their scatter plots are more dispersed. Moreover, these models are suited to predicting TN content in the range of 1.5 to 3.0 g/kg; outside this range the prediction accuracy is low. In general, the PLSR model has weaker prediction ability than the BPNN model, and the inversion results of the BPNN model have smaller errors and higher accuracy. The R² of the validation dataset was lower than that of the modeling dataset, probably because of the uneven distribution of TN at the sampling points and the small number of sampling points used for validation. In summary, the predictive ability of the 1/R model is high. Compared with the PLSR model, the nonlinear model has a greater advantage in predictive power and stability, which is consistent with the findings of other studies [3,34].

3.4. Support Vector Machines

Judged by the modeling and validation samples together, the TN inversion model constructed with the SVM algorithm has higher prediction accuracy than the BPNN model (Table 6). The model constructed with 1/log R, that is, the logarithm of reflectance followed by the reciprocal, has the highest inversion accuracy (R² = 0.879, RMSE = 0.233), which differs from the BPNN case. The ranking is 1/log R (R² = 0.879, RMSE = 0.233) > 1/R (R² = 0.837, RMSE = 0.288) > log(1/R) (R² = 0.824, RMSE = 0.296) > REF, the original reflectance (R² = 0.763, RMSE = 0.336). The validation R² values of the SVM models are all greater than 0.6, whereas those of the BPNN models are all below 0.6, indicating that the BPNN model has the advantage in fitting the modeling samples while the SVM model is superior in predictive inversion.

As can be seen from the scatter plot (Figure 4), the model constructed by the original reflectance has a higher prediction than the measured value when the TN content is lower than 2.0 g/kg, and a lower prediction when it is greater than 2.0 g/kg; all other models have similar prediction errors. However, in general, the model prediction and the actual value are basically around the 1:1 line, with less deviation, better tightness of prediction points and fewer outliers.

3.5. Predicting the Spatial Distribution of TN

When the three models, PLSR, BPNN and SVM, are compared, SVM has the higher prediction accuracy and reflects the spatial distribution of TN content in the study area well. Figure 5 shows the spatial distribution of TN predicted by the three models for Datong County. The spatial distribution predicted by the PLSR model differs greatly from that of the other two models and its predictions are higher overall, with a maximum of 3.724 g/kg and a minimum of 1.456 g/kg. The maximum and minimum values predicted by the BPNN model are 3.163 g/kg and 0.124 g/kg, respectively, and the content predicted by the SVM model ranges from 0.028 to 2.955 g/kg. By comparison, the SVM model predicts TN content in Datong County most accurately and gives a good picture of the spatial distribution of TN. All three models show the same spatial trend, with TN being high in the northwest and low in the southeast overall.

4. Discussion

Soil nutrients are an important factor in measuring soil fertility. Traditional farm management and agricultural systems have led to a polarization of soil nutrients in farmland: excess nutrients in fertile soils reduce the utilization of chemical fertilizers, while vegetation on nutrient-poor soils does not receive adequate nutrients. Soil nutrient monitoring has evolved from qualitative to quantitative studies [35], and establishing TN inversion models plays an important role in understanding the spatial distribution of TN, guiding the implementation of precision agriculture, and promoting agricultural production [2,36]. Methods for monitoring total nitrogen content have likewise advanced from traditional laboratory analysis to spatial prediction at the regional scale. The most important of these advances is the application of remote-sensing technology, which has the advantages of accuracy, speed and economy [35] and can predict the spatial distribution of soil total nitrogen in a study area well. Prediction models have gradually developed from linear to nonlinear forms. For example, Liu et al. [28] established a PLSR model to predict soil total nitrogen content in Shaanxi Province. Li et al. [37] established a stepwise linear regression model to invert soil total nitrogen content based on vegetation information. Xu et al. [38] used Landsat remote sensing images to construct a regression kriging (RK) model to predict the soil total nitrogen content of a farm. Wang et al. [39] mapped the distribution of TN content in northeastern Liaoning Province, China, based on a random forest model with three different kernel functions and a multiple linear stepwise regression (MLSR) model. Ye et al. [40] predicted soil total nitrogen content based on a radial basis function neural network (RBFNN). These studies illustrate the continuing development of prediction models.

The spatial distribution of TN is largely influenced by anthropogenic and natural factors and is spatially heterogeneous [16,40]. Therefore, spatial prediction of TN should not rely on image band reflectance alone as the model independent variable, but should also consider environmental and anthropogenic factors such as topography [36], climate [41], soil type [6], tillage practices [4,42] and crop type [43]. Nonlinear models play an important role in regions subject to anthropogenic disturbance [40]. Sun et al. [44] used a random forest model to analyze the relative importance of factors affecting soil nutrients and found that temperature, precipitation and elevation were significant factors for soil nutrient prediction. Different soil types [45] and topographic trends directly affect the spatial distribution of TN. Dong et al. [17] showed that predicting soil nutrients from environmental and anthropogenic factors gives better prediction accuracy than common interpolation techniques. Zhang et al. [43] found that the spatial pattern of TN is related to elevation. Peri et al. [41] found that TN decreased with increasing soil desertification, indicating that TN content is closely related to vegetation density. Vegetation growth and development depend on climatic conditions, so climate ultimately affects the spatial distribution of TN content. In summary, the spatial distribution of TN content is influenced by many factors that interact in complex ways. Future research will focus on spatial prediction of TN under the combined effects of environmental and anthropogenic factors, on understanding the relationships among the influencing factors, and on extracting TN-sensitive bands and highly correlated factors from higher-resolution remote sensing data to predict soil TN content accurately.

5. Conclusions

Based on field sampling data from Datong County, this paper uses statistical software to correlate the TN of the sampling points with the reflectance of the corresponding TM image bands, and constructs PLSR, BPNN and SVM models to invert the spatial distribution of TN in the study area. The average TN content in Datong County is 1.864 g/kg with a coefficient of variation of 30.596%, which indicates a medium degree of variation. Mathematical transformation of the raw spectral reflectance significantly improved the correlation (r = 0.584, p < 0.01) and improved the prediction accuracy of the models. The prediction accuracy of the three models, from highest to lowest, was SVM (R² = 0.676, RMSE = 0.296), BPNN (R² = 0.560, RMSE = 0.305) and PLSR (R² = 0.374, RMSE = 0.334). The nonlinear models therefore predict better than the linear model, and among the nonlinear models the SVM model is more accurate than the BPNN model and can accurately predict the spatial distribution of soil TN in Datong County. The TN spatial distribution predicted by the SVM model shows an overall trend of high in the north and low in the south, with slightly lower values in the middle of the study area. Soil TN is an important indicator of soil fertility, and predicting its spatial distribution benefits soil quality evaluation and the effective implementation of precision agriculture. Predicting the spatial distribution of TN with the SVM algorithm is an effective technical tool for sustainable agriculture at the county scale and can effectively map the distribution of soil TN. It provides a basis and technical support for precision fertilization and sustainable agricultural management.

Author Contributions

Conceived and designed the research, Z.L. and H.L.; data curation, Z.L. and L.L.; writing of the original draft, Z.L.; visualization, Z.L.; supervision, H.S.; funding acquisition, H.L. The other authors supported the writing of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (U20A20115).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The remote sensing images used in this paper were obtained from Google Earth Engine (https://earthengine.google.com/ (accessed on 21 April 2021)); additional data may be obtained with the consent of the corresponding author.

Acknowledgments

We acknowledge the anonymous reviewers for their valuable comments on our study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Chen, B.; Jiang, Q.; Wang, K. Application and progress in estimating soil organic matter content based on remote sensing. J. Shandong Agr. Univ. Nat. Sci. Ed. 2011, 42, 317–321. [Google Scholar]
Wang, R.; Zou, R.; Liu, J.; Liu, L.; Hu, Y. Spatial distribution of soil nutrients in farmland in a hilly region of the pearl river delta in China based on geostatistics and the inverse distance weighting method. Agriculture 2021, 11, 50. [Google Scholar] [CrossRef]
Song, Y.; Zhao, X.; Su, H.; Li, B.; Hu, Y.; Cui, X. Predicting Spatial Variations in Soil Nutrients with Hyperspectral Remote Sensing at Regional Scale. Sensors 2018, 18, 3086. [Google Scholar] [CrossRef]
Alemu, L.; Mesfin, B. Performance of mid infrared spectroscopy to predict nutrients for agricultural soils in selected areas of Ethiopia. Heliyon 2022, 8, e09050. [Google Scholar]
Zhou, T.; Geng, Y.; Chen, J.; Pan, J.; Haase, D.; Lausch, A. High-resolution digital mapping of soil organic carbon and soil total nitrogen using DEM derivatives, Sentinel-1 and Sentinel-2 data based on machine learning algorithms. Sci. Total Environ. 2020, 729, 138244. [Google Scholar] [CrossRef]
Mobasheri, M.; Amani, M.; Ranjbaran, M.; Mahdavi, S.; Zabihi, H.R. Introducing an index in determination of doil total nitrogen content in an agricultural soil using laboratory spectrometry. Commun. Soil Sci. Plan. 2020, 51, 288–296. [Google Scholar] [CrossRef]
Gulhane, V.; Rode, S.; Pande, C. Wavelet for predicting soil nutrients using remotely sensed satellite images. Int. J. Comput. Appl. 2017, 174, 35–38.
Yu, S.; Bu, H.; Dong, W.; Jiang, Z.; Zhang, L.; Xia, Y. Construction and evaluation of prediction model of main soil nutrients based on spectral information. Appl. Sci. 2022, 12, 6298.
Xu, L.; Xie, D.; Wei, C.; Li, B. Prediction of total nitrogen and total phosphorus concentrations in purple soil using hyperspectral data. Spectrosc. Spect. Anal. 2013, 33, 723–727.
Wang, S.; Shi, P.; Zhang, H.; Wang, X. Retrieval of soil total nitrogen content in reclaimed farmland of mining area based on hyperspectral imaging. Chin. J. Ecol. 2019, 38, 294–301.
Komolafe, A.A.; Olorunfemi, I.E.; Oloruntoba, C.; Akinluyi, F.O. Spatial prediction of soil nutrients from soil, topography and environmental attributes in the northern part of Ekiti State, Nigeria. Remote Sens. Appl. Soc. Environ. 2021, 21, 100450.
Miran, N.; RasouliSadaghiani, M.H.; Feiziasl, V.; Sepehr, E.; Rahmati, M.; Mirzaee, S. Predicting soil nutrient contents using Landsat OLI satellite images in rain-fed agricultural lands, northwest of Iran. Environ. Monit. Assess. 2021, 193, 607.
Swapna, B.; Manivannan, S.; Kamalahasan, M. Prognostic of soil nutrients and soil fertility index using machine learning classifier techniques. Int. J. Collab. 2022, 18, 3.
Li, Y.; Zhao, Z.; Wei, S.; Sun, D.; Yang, Q.; Ding, X. Prediction of regional forest soil nutrients based on Gaofen-1 remote sensing data. Forests 2021, 12, 1430.
Xu, Y.; Li, B.; Shen, X.; Li, K.; Cao, X.; Cui, G.; Yao, Z. Digital soil mapping of soil total nitrogen based on Landsat 8, Sentinel 2, and WorldView-2 images in smallholder farms in Yellow River Basin, China. Environ. Monit. Assess. 2022, 194, 282.
Dharumarajan, S.; Lalitha, M.; Niranjana, K.; Hegde, R. Evaluation of digital soil mapping approach for predicting soil fertility parameters—A case study from Karnataka Plateau, India. Arab. J. Geosci. 2022, 15, 386.
Dong, W.; Wu, T.; Luo, J.; Sun, Y.; Xia, L. Land parcel-based digital soil mapping of soil nutrient properties in an alluvial-diluvia plain agricultural area in China. Geoderma 2019, 340, 234–248.
Wang, X.; Lü, H.; Crow, W.T.; Zhu, Y.; Wang, Q.; Su, J.; Zheng, J.; Gou, Q. Assessment of SMOS and SMAP soil moisture products against new estimates combining physical model, a statistical model, and in-situ observations: A case study over the Huai River Basin, China. J. Hydrol. 2021, 598, 126468.
Zhu, Q.; Wang, Y.; Luo, Y. Improvement of multi-layer soil moisture prediction using support vector machines and ensemble Kalman filter coupled with remote sensing soil moisture datasets over an agriculture dominant basin in China. Hydrol. Process. 2021, 35, 14154.
De Santana, F.B.; Otani, S.K.; De-Souza, A.M.; Poppi, R.J. Comparison of PLS and SVM models for soil organic matter and particle size using vis-NIR spectral libraries. Geoderma Reg. 2021, 27, e00436.
Zhang, H.; Yin, S.; Chen, Y.; Shao, S.; Wu, J.; Fan, M.; Chen, F.; Gao, C. Machine learning-based source identification and spatial prediction of heavy metals in soil in a rapid urbanization area, eastern China. J. Clean. Prod. 2020, 273, 122858.
Niu, Y.; Ye, S. Data Prediction Based on Support Vector Machine (SVM)—Taking Soil Quality Improvement Test Soil Organic Matter as an Example. IOP Conf. Ser. Earth Environ. Sci. 2019, 295, 012021.
Qiu, H. Hyperspectral Remote Sensing Inversion of Organic Matter, Available Nitrogen, Phosphorus and Potassium Contents in Cropland Soil. M.D. Thesis, Fujian Agriculture and Forestry University, Fuzhou, China, 2017.
Yang, Y.; Zhao, J.; Qin, K.; Zhao, N.; Yang, C.; Zhang, D.; Cui, X. Prediction of black soil nutrient content based on airborne hyperspectral remote sensing. Trans. Chin. Soc. Agric. Eng. 2019, 35, 94–101.
Xiao, W.; Chen, W.; He, T.; Ruan, L.; Guo, J. Multi-Temporal mapping of soil total nitrogen using Google Earth Engine across the Shandong province of China. Sustainability 2020, 12, 10274.
Lin, L.; Gao, Z.; Liu, X. Estimation of soil total nitrogen using the synthetic color learning machine (SCLM) method and hyperspectral data. Geoderma 2020, 380, 114664.
Zhang, S.; Lu, X.; Nie, G.; Li, Y.; Shao, Y.; Tian, Y.; Fan, L.; Zhang, Y. Estimation of soil organic matter in coastal wetlands by SVM and BP based on hyperspectral remote sensing. Spectrosc. Spect. Anal. 2020, 40, 556–561.
Liu, J.; Dong, Z.; Chen, X. Study on hyperspectral estimation model of total nitrogen content in soil of Shaanxi province. IOP Conf. Ser. Earth Environ. Sci. 2018, 108, 042025.
Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
Irmak, A.; Jones, J.W.; Batchelor, W.D.; Irmak, S.; Boote, K.J.; Paz, J.O. Artificial neural network model as a data analysis tool in precision farming. Trans. ASABE 2006, 49, 2027–2037.
Pradhan, B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 2012, 51, 350–365.
Taghizadeh-Mehrjardi, R.; Neupane, R.; Sood, K.; Kumar, S. Artificial bee colony feature selection algorithm combined with machine learning algorithms to predict vertical and lateral distribution of soil organic matter in South Dakota, USA. Carbon Manag. 2017, 8, 277–291.
Yang, Z.; Chen, X.; Jing, F.; Guo, B.; Lin, G. Spatial variability of nutrients and heavy metals in paddy field soils based on GIS and Geostatistics. Chin. J. Appl. Ecol. 2018, 29, 1893–1901.
Morellos, A.; Pantazi, X.E.; Moshou, D.; Alexandridis, T.; Whetton, R.; Tziotzios, G.; Wiebensohn, J.; Bill, R.; Mouazen, A.M. Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosyst. Eng. 2016, 152, 104–116.
Ji, W.; Liu, Y. Research on Quantitative Evaluation of Remote Sensing and Statistics Based on Wireless Sensors and Farmland Soil Nutrient Variability. Comput. Intell. Neurosc. 2022, 2022, 3646264.
Wang, S.; Adhikari, K.; Wang, Q.; Jin, X.; Li, H. Role of environmental variables in the spatial distribution of soil carbon (C), nitrogen (N), and C:N ratio from the northeastern coastal agroecosystems in China. Ecol. Indic. 2018, 84, 263–272.
Li, Y.; Pan, X.; Wang, C.; Liu, Y.; Zhao, Q. Monitoring changes of soil organic matter and total nitrogen in cultivated land in Guangxi by remote sensing. Acta Ecol. Sin. 2014, 34, 5283–5291.
Xu, Y.; Smith, S.E.; Grunwald, S.; Abd-Elrahman, A.; Wani, S.P.; Nair, V.D. Estimating soil total nitrogen in smallholder farm settings using remote sensing spectral indices and regression kriging. Catena 2018, 163, 111–122.
Wang, S.; Jin, X.; Adhikari, K.; Li, W.; Yu, M.; Bian, Z.; Wang, Q. Mapping total soil nitrogen from a site in northeastern China. Catena 2018, 166, 134–146.
Ye, Y.; Jiang, Y.; Kuang, L.; Han, Y.; Xu, Z.; Guo, X. Predicting spatial distribution of soil organic carbon and total nitrogen in a typical human impacted area. Geocarto Int. 2022, 37, 4465–4482.
Peri, P.L.; Rosas, Y.M.; Ladd, B.; Toledo, S.; Lasagno, R.G.; Pastur, G.M. Modeling soil nitrogen content in South Patagonia across a climate gradient, vegetation type, and grazing. Sustainability 2019, 11, 2707.
Ojoyi, M.M.; Mutanga, O.; Odindi, J.; Kahinda, J.M.M.; Abdelrahman, E.M. Implications of land use transitions on soil nitrogen in dynamic landscapes in Tanzania. Land Use Policy 2017, 64, 95–100.
Zhang, Y.; Sui, B.; Shen, H.; Ouyang, L. Mapping stocks of soil total nitrogen using remote sensing data: A comparison of random forest models with different predictors. Comput. Electron. Agr. 2019, 160, 23–30.
Sun, M.; Hou, E.; Wu, J.; Huang, J.; Huang, X.; Xu, X. Spatial patterns and drivers of soil chemical properties in typical hickory plantations. Forests 2022, 13, 457.
Xiao, S.; He, Y.; Dong, T.; Nie, P. Spectral analysis and sensitive waveband determination based on nitrogen detection of different soil types using near infrared sensors. Sensors 2018, 18, 523.

Figure 1. Study area and sampling points.

Figure 2. Methodology flow chart.

Figure 3. Inversion scatter plots of the BPNN model: (a) REF; (b) 1/R; (c) 1/log(R); (d) log(1/R).

Figure 4. SVM model constructed with the four reflectance transformations: (a) REF; (b) 1/R; (c) 1/log(R); (d) log(1/R).

Figure 5. Predicted maps of TN using the (a) PLSR, (b) BPNN, and (c) SVM models.

Table 1. Descriptive statistics of TN content.

Variable	Number of Samples	Max/(g/kg)	Min/(g/kg)	Mean/(g/kg)	CV/%	Skewness	Kurtosis
TN	73	3.375	0.639	1.864	30.596	0.326	−0.106

Table 2. Landsat 5 TM observation parameters and band characteristics.

Band	Name	Wavelength/µm	Spatial Resolution/m	Band Characteristics
B1	Blue	0.45–0.52	30	Penetrates water bodies well, which helps in detecting water depth.
B2	Green	0.52–0.60	30	Used to distinguish tree species and vegetation.
B3	Red	0.63–0.69	30	Main chlorophyll absorption band; responds to the health status of vegetation.
B4	Near infrared	0.77–0.90	30	Crop yield estimation and identification of green vegetation.
B5	Shortwave infrared 1	1.55–1.75	30	Moisture absorption band, commonly used for soil moisture surveys.
B6	Thermal infrared	10.40–12.50	120	Records surface thermal radiation.
B7	Shortwave infrared 2	2.08–2.35	30	Strong moisture absorption band; used to distinguish the main rock types.

Table 3. Pearson correlation coefficients (r) and p-values between TN content and each band under the four spectral transformations.

Spectral Transformation	Indicators	B1	B2	B3	B4	B5	B7
REF	r	−0.483 **	−0.534 **	−0.517 **	−0.231	−0.310 *	−0.378 **
REF	p	0.000	0.000	0.000	0.106	0.028	0.007
1/R	r	0.566 **	0.584 **	0.514 **	0.185	0.284 *	0.349 *
1/R	p	0.000	0.000	0.000	0.199	0.046	0.013
log(1/R)	r	0.527 **	0.562 **	0.518 **	0.209	0.300 *	0.366 **
log(1/R)	p	0.000	0.000	0.000	0.145	0.034	0.009
1/log(R)	r	0.499 **	0.541 **	0.517 **	0.243	0.310 *	0.375 **
1/log(R)	p	0.000	0.000	0.000	0.088	0.029	0.007

Notes: ** = p ≤ 0.01; * = p ≤ 0.05.
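
The four spectral transformations and the correlation coefficient reported in Table 3 can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the base-10 logarithm, and the sample values are assumptions made for the example, and reflectance is assumed to lie strictly between 0 and 1.

```python
import numpy as np

def spectral_transforms(reflectance):
    """Return REF, 1/R, log(1/R) and 1/log(R) for reflectance values in (0, 1)."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0) or np.any(r >= 1):
        raise ValueError("reflectance must lie strictly between 0 and 1")
    log_r = np.log10(r)  # assumption: base-10 logarithm
    return {
        "REF": r,
        "1/R": 1.0 / r,
        "log(1/R)": -log_r,
        "1/log(R)": 1.0 / log_r,
    }

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Hypothetical values for illustration only (not the study's data).
band_reflectance = np.array([0.08, 0.12, 0.15, 0.21, 0.26])
tn = np.array([2.9, 2.4, 2.1, 1.6, 1.2])  # TN content in g/kg

correlations = {
    name: pearson_r(values, tn)
    for name, values in spectral_transforms(band_reflectance).items()
}
```

For reflectance in (0, 1), the three transformed quantities all decrease as reflectance increases, which is consistent with the sign change between the REF rows and the other rows of Table 3.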

Table 4. Evaluation results of the PLSR model for TN.

Spectral Transformation	Principal Component Number	Modeling R²	Modeling RMSE	Validation R²	Validation RMSE
REF	3	0.564	0.299	0.356	0.315
1/R	3	0.604	0.285	0.374	0.334
log(1/R)	4	0.598	0.287	0.357	0.331
1/log(R)	4	0.557	0.334	0.255	0.430

Table 5. Accuracy evaluation results of the BPNN model.

Spectral Transformation	Modeling R²	Modeling RMSE	Validation R²	Validation RMSE
REF	0.806	0.273	0.522	0.313
1/R	0.882	0.218	0.560	0.305
log(1/R)	0.771	0.291	0.425	0.349
1/log(R)	0.787	0.282	0.466	0.331

Table 6. Prediction results for TN using the SVM model.

Spectral Transformation	Modeling R²	Modeling RMSE	Validation R²	Validation RMSE
REF	0.763	0.336	0.626	0.385
1/R	0.837	0.288	0.635	0.302
log(1/R)	0.824	0.296	0.645	0.312
1/log(R)	0.879	0.233	0.676	0.296
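
The two accuracy measures used in Tables 4 to 6 can be computed as in the sketch below. It assumes the common definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared error); the source section does not state which variant of R² was used, so the function names and definitions here are assumptions. The `validation` dictionary copies the best validation result of each model from the tables.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((o - p) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Best validation (R², RMSE) per model, copied from Tables 4, 5 and 6.
validation = {
    "PLSR": (0.374, 0.334),  # 1/R
    "BPNN": (0.560, 0.305),  # 1/R
    "SVM": (0.676, 0.296),   # 1/log(R)
}

# Pick the model with the highest validation R².
best_model = max(validation, key=lambda name: validation[name][0])
```

With the tabulated values, `best_model` is the SVM entry, which also has the lowest validation RMSE of the three.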

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
