1. Introduction
Soil fertility is the ability of soil to supply and coordinate nutrients, water, air and heat for crop growth, and it is the most important indicator of the quality of soil resources [1]. Among the various nutrients in the soil, nitrogen (N), phosphorus (P) and potassium (K) are essential elements that crops require in large amounts and are usually replenished by fertilization [2]. In particular, soil N, the main component of protein in crops, plays an important role in the growth of stems and roots and the development of fruits [
3]. Soil nitrogen occurs in many forms. Among them, total nitrogen (TN) is the sum of all forms of nitrogen in the soil and represents the total storage capacity and nitrogen supply potential of the soil. TN can be divided into two categories: organic nitrogen and inorganic nitrogen. Organic nitrogen includes water-soluble nitrogen, hydrolyzable nitrogen and non-hydrolyzable nitrogen; inorganic nitrogen includes ammonium nitrogen, nitrate nitrogen and nitrite nitrogen. Available nitrogen (AN) refers to the nitrogen in the soil that can be easily absorbed and utilized by crops and reflects the recent nitrogen supply capacity of the soil [4], mainly including ammonium nitrogen (NH4-N), nitrate nitrogen (NO3-N), amino nitrogen, amide nitrogen and some simple polypeptide and protein compounds. The application of N fertilizers is one of the most important management tools to ensure and increase yield in agricultural systems [
5]. As common synthetic N fertilizers, NH4-N, NO3-N and amide nitrogen fertilizers all supply available nitrogen. Different types of nitrogen fertilizer differ in their mechanism of action and in their effect on plants. For example, NH4-N dissolves readily in water to produce ammonium ions and the corresponding anions, which crops can absorb and use directly, so it acts quickly as a topdressing. NO3-N can promote the absorption of calcium, magnesium and potassium by crops, but it is easily lost with water, so it should not be used in paddy fields [6]. Amide nitrogen generally needs to be converted into ammonium nitrogen by soil microorganisms before being absorbed by crops. Therefore, the real-time determination of the content of different types of soil N is of great significance for agricultural production. In addition, a surplus or deficiency of soil nitrogen changes the quality and yield of crops to a certain extent [7], and the loss of nitrogen leads to eutrophication of water bodies and contributes to the greenhouse effect [8]. Therefore, the effective and accurate measurement of soil nitrogen content can guide rational fertilization and precision agriculture.
The traditional methods of detecting N are mainly based on chemical analysis, such as the Kjeldahl method [9], the Dumas method [10], ultraviolet (UV) spectrophotometry [11] and chromatography. Chemical analysis requires sample preparation in advance, which is time-consuming and labor-intensive and is prone to human error. In addition, large amounts of strong acids and strong alkalis are used in the experimental process, which poses safety hazards. Researchers have explored many rapid and nondestructive methods for nitrogen detection and found that the hyperspectral imaging (HSI) technique coupled with visible (vis) and/or near-infrared (NIR) spectroscopy is generally used for the quantitative detection of fertilizer nitrogen, plant nitrogen and soil nitrogen [
12]. For soil nitrogen detection, researchers have studied the effect of soil sample pretreatment methods on the NIR detection of TN and found that the detection accuracy was better for soil that had been dried and sieved [13]. Moreover, detection accuracy is influenced by the choice of machine learning algorithm and characteristic wavelength selection method [14]. The main function of machine learning is to generate a model from empirical data by computation, that is, to use a learning algorithm to achieve rapid judgment [15,16]. A machine learning model built on spectral data can qualitatively or quantitatively detect nitrogen, phosphorus, potassium, pesticide residues and other substances in complex soil environments. For example, Kawamura et al. used vis-NIR spectroscopy to estimate soil TN and found that the model using the characteristic wavelengths showed better prediction accuracy for TN than the full-band model [
17]. Morellos et al. explored NIR spectroscopy combined with different machine learning algorithms to predict soil TN and found that the Cubist model provided the best prediction results [18]. Xu et al. used HSI combined with machine learning algorithms to detect TN, AN, NO3-N and NH4-N in soil profiles and found that the SVM model performed best [19]. Previous studies have shown the potential of NIR or HSI coupled with multivariate data analysis for the detection of soil nitrogen content. In addition, HSI is faster than point-based techniques, as many samples can be scanned and analyzed at the same time with an HSI camera [20]. However, many studies have focused only on the detection of TN content and, to the best of our knowledge, no study has yet used HSI to simultaneously detect NH4-N, NO3-N and urea-N in different types of soils, so this research gap needs to be filled. Moreover, we also want to further explore the possibility of improving the accuracy of soil nitrogen detection.
In view of the shortcomings of existing studies, this paper addressed 2 problems: (1) to explore the spectral responses of different nitrogen fertilizers and different kinds of soils in the NIR band and to illustrate the feasibility of detection in principle; and (2) to compare the impact of different machine learning algorithms and characteristic wavelength selection algorithms on soil nitrogen detection and to select the optimal model. It was hypothesized that (1) laboratory-based HSI spectroscopy would be capable of capturing the spectral properties of low nitrogen levels in soils, and (2) HSI spectroscopy combined with machine learning techniques could also perform well for simultaneously predicting the various forms of soil N at the fine scale. To test these hypotheses, 2 soils with different properties were collected from different areas and 3 nitrogen fertilizers were purchased. The research comprised the following parts: (1) analyzing the NIR reflectance spectral characteristics of the 2 soils and the 3 nitrogen fertilizer standards (ammonium bicarbonate, sodium nitrate and urea); (2) comparing the detection results of 2 preprocessing algorithms and 2 machine learning models for the different types of nitrogen fertilizers using the full-band spectra; and (3) using 2 characteristic wavelength selection algorithms to select characteristic wavelengths from the spectra processed by the optimal preprocessing method, and comparing the resulting models with the full-band models to obtain the optimal models. This research aimed to provide theoretical and experimental guidance for the realization of precision fertilization.
2. Results and Discussion
2.1. Soil Chemical Properties Analysis
Because soil composition is complex and both organic matter and water in the soil respond in the NIR spectra, the main chemical properties of the collected soils were measured and analyzed before the spectral characteristics of soils with different nitrogen contents were examined. It can be seen from Table 1 that soil 1 is acidic (pH = 4.69) and soil 2 is alkaline (pH = 8.85); the conductivity of soil 1 (44.3 μS/cm) is much lower than that of soil 2 (346 μS/cm); the available nitrogen of soil 1 (31.45 mg/kg) is slightly lower than that of soil 2 (42.19 mg/kg); the available potassium (8.595 mg/kg) and available phosphorus (1.45 mg/kg) contents of soil 1 are much lower than those of soil 2; and the organic matter content of soil 1 (0.59%) is slightly lower than that of soil 2 (0.63%). Overall, the chemical compositions of the 2 soils were quite different.
2.2. Soil Spectral Feature Analysis
Figure 1a shows the NIR spectral reflectance curves of the 2 soils. Both soils have absorption peaks around 1410 nm, which can be assigned to the O-H first overtone. This peak reflects the crystal water of minerals [21], and it was more pronounced in soil 1 than in soil 2, which suggests a higher mineral content in soil 1.
The reflectance of soil 2 is lower than that of soil 1 owing to the higher organic matter content of soil 2, which is consistent with previous studies [22].
Figure 1b shows the NIR spectral reflectance curves of the three nitrogen fertilizer standards, which differed clearly between 900 and 1700 nm. Specifically, NH4-N and NO3-N had no obvious absorption peaks, while urea-N had multiple obvious absorption peaks. According to the literature [23], the absorption peak of the urea molecule at 1160 nm can be assigned to the C=O stretch fourth overtone, and the absorption peaks at 1460, 1490 and 1520 nm can be ascribed to the Sym N−H stretch first overtone, Sym N−H stretch first overtone and N−H stretch first overtone, respectively.
Figure 2 shows the average NIR reflectance spectra of the soil sample sets at different nitrogen content levels. Compared with Figure 1, the absorption peaks of the 3 nitrogen fertilizers themselves did not appear in these spectra, possibly because they were masked at the low concentrations used.
For each sample set, the spectral reflectance changed with nitrogen concentration, which indicated a certain correlation between nitrogen concentration and reflectance. It was concluded that the NIR spectra responded to soil nitrogen concentration. To further explore the relationship between spectral reflectance and nitrogen concentration, data mining methods were needed to analyze the high-dimensional data. Because of the obvious noise at the beginning and end of the spectra, the 975–1645 nm bands, which had a high signal-to-noise ratio, were selected for the subsequent data analysis.
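The band restriction described above can be illustrated with a minimal sketch. The wavelength grid (900 to 1700 nm in 5 nm steps), the reflectance values and the function name `select_band` are assumptions made for illustration only; the actual sensor grid is not given in this section.

```python
def select_band(wavelengths, spectrum, low=975.0, high=1645.0):
    """Keep only the (wavelength, reflectance) pairs inside [low, high] nm."""
    kept = [(w, r) for w, r in zip(wavelengths, spectrum) if low <= w <= high]
    return [w for w, _ in kept], [r for _, r in kept]


# Hypothetical grid and made-up reflectance values, for illustration only.
wavelengths = [900.0 + 5.0 * i for i in range(161)]
spectrum = [0.30 + 0.0001 * i for i in range(161)]

band_wl, band_refl = select_band(wavelengths, spectrum)
print(band_wl[0], band_wl[-1], len(band_wl))
```

The noisy ends of each spectrum are simply discarded, so every later step works on the same reduced set of bands.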
2.3. Models Based on Full-Wavelength Spectra
Taking the reflectance spectra as the independent variables and the 12 nitrogen concentrations of the soil as the dependent variables, partial least squares regression (PLSR) and least squares support vector machine (LSSVM) models were established using the raw spectra and the spectra preprocessed by multiplicative scatter correction (MSC) and wavelet transform (WT).
Table 2 shows the average results of 100 runs of the models under the optimal preprocessing method, with the standard deviation (SD) in parentheses. For soil1_NH4-N, MSC preprocessing gave the best results; for soil2_NH4-N, soil2_NO3-N, soil1_urea-N and soil2_urea-N, WT preprocessing gave the best results; and for soil1_NO3-N, the raw spectra gave satisfactory results.
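As an illustration of the MSC preprocessing mentioned above, the following is a minimal sketch of the textbook form of the method, in which each spectrum is regressed on the mean spectrum and then corrected with the fitted intercept and slope. It is not necessarily the exact implementation used in this study, and the function name `msc` and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x = intercept + slope * ref by least
    squares, then corrected to (x - intercept) / slope.
    """
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Toy data: one base spectrum with different made-up offsets and gains.
base = [0.2, 0.4, 0.3, 0.5]
spectra = [base, [1.5 * x + 0.1 for x in base], [0.5 * x - 0.05 for x in base]]
corrected = msc(spectra)
```

In this toy case the three spectra differ only by an additive offset and a multiplicative gain, so after correction they collapse onto the same curve, which is the scatter-removal effect MSC is intended to provide.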
In terms of model performance, the RPD values of the PLSR and LSSVM models for the 6 data sets were all above 2, indicating that the models had reliable prediction accuracy. Among them, the results for urea-N were the best. In soil1, the prediction-set R2p of PLSR and LSSVM reached 0.94 and 0.91, and the RMSEP values were 0.66% and 0.75%, respectively. In soil2, the R2p of both PLSR and LSSVM was 0.92, and the RMSEP values were 0.79% and 0.73%, respectively. For NH4-N, the LSSVM model (R2p = 0.86, RMSEP = 1.07%) outperformed PLSR in soil1, while PLSR (R2p = 0.88, RMSEP = 0.98%) performed clearly better than LSSVM (R2p = 0.80, RMSEP = 1.21%) in soil2. For NO3-N, PLSR performed slightly better than LSSVM in both soil1 and soil2, with R2p of 0.88 and 0.78 and RMSEP of 1.08% and 1.31%, respectively.
In summary, NIR-HSI combined with the PLSR and LSSVM algorithms could effectively detect nitrogen content in soils. The predictions for urea-N were better than those for NH4-N and NO3-N, which might be related to the obvious characteristic peaks of the urea molecule in the NIR band.
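The three indicators reported above can be computed as in the following minimal sketch. The formulas are the commonly used definitions and are assumptions here, since this section does not state them: R2p as one minus the ratio of residual to total sum of squares, RMSEP as the root mean squared prediction error, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function name and the example values are hypothetical.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return (R2p, RMSEP, RPD) for a prediction set (assumed definitions)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2p = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))  # sample SD of the reference values
    rpd = sd / rmsep
    return r2p, rmsep, rpd


# Made-up reference and predicted nitrogen contents (%), for illustration only.
y_true = [0.0, 2.0, 4.0, 6.0, 8.0]
y_pred = [0.5, 1.5, 4.5, 5.5, 8.5]
r2p, rmsep, rpd = prediction_metrics(y_true, y_pred)
print(r2p, rmsep, rpd)
```

Under these definitions, RPD grows as the prediction error shrinks relative to the spread of the reference values, which is why a threshold such as RPD above 2 is used in the text as a sign of reliable prediction.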
2.4. Characteristic Wavelengths Selection
The full-band NIR spectral dataset contains 200 wavelengths, so the amount of data was large and included much redundant information irrelevant to the spectral response of soil nitrogen. Therefore, the competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA) wavelength selection algorithms were used to find the characteristic wavelengths related to soil nitrogen content. The number of characteristic wavelengths selected for each of the 6 data sets and its ratio to the number of full wavelengths are shown in Table 3.
Overall, after the selection of characteristic wavelengths, the number of variables was greatly reduced in the 6 datasets. CARS selected between 16 and 44 wavelengths, accounting for 8–24.5% of the full band, and SPA selected between 8 and 14 wavelengths, accounting for 4–7% of the full band. On every dataset, SPA selected fewer characteristic wavelengths than CARS. For soil1_urea-N and soil2_urea-N, the characteristic wavelengths were reduced to 8% of the full band by CARS and to 5% by SPA.
To show the positions of the characteristic wavelengths more intuitively, we took the first-order derivative of the average spectra of the 6 data sets and marked the wavelengths selected by CARS and SPA with vertical lines of different colors in Figure 3. The peaks and troughs of the first-order derivative spectra show the differences between the spectra at different nitrogen concentrations. On the whole, for both CARS and SPA, the selected characteristic wavelengths basically covered the positions where these differences appeared, and the positions selected by the 2 algorithms overlapped to some extent. Specifically, for the same type of nitrogen, the spectra in the soil1 dataset differed markedly around 1400 nm, and the characteristic wavelengths selected by the two algorithms were mainly concentrated there; the differences at 1000 nm and 1600 nm were not obvious, so fewer characteristic wavelengths were selected there. For soil2, the overall differences between the spectra were not as obvious as those for soil1. Although the selected characteristic wavelengths were also mainly around 1000, 1400 and 1600 nm, fewer were selected than for soil1. In addition, for different types of nitrogen in the same soil, the positions and numbers of characteristic wavelengths were obviously different.
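The first-order derivative of an average spectrum can be approximated as in the minimal sketch below, which uses central finite differences in the interior and one-sided differences at the two ends. This is an assumed, simplified stand-in: the derivative method actually used for Figure 3 is not specified in this section, and smoothing-based derivatives are also common. The function name and the example values are hypothetical.

```python
def first_derivative(wavelengths, spectrum):
    """Finite-difference derivative of reflectance with respect to wavelength."""
    n = len(spectrum)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append(
            (spectrum[hi] - spectrum[lo]) / (wavelengths[hi] - wavelengths[lo])
        )
    return deriv


# Made-up average reflectance around an absorption dip, for illustration only.
wavelengths = [1390.0, 1395.0, 1400.0, 1405.0, 1410.0]
mean_spectrum = [0.50, 0.48, 0.40, 0.47, 0.50]
derivative = first_derivative(wavelengths, mean_spectrum)
print(derivative)
```

The derivative is negative on the falling side of the dip and positive on the rising side, so peaks and troughs of the derivative curve mark where the reflectance changes fastest.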
Therefore, the characteristic wavelengths found by the CARS and SPA methods were closely related to the nitrogen content in the soil. However, in the previous literature [24,25], characteristic wavelengths of soil nitrogen were rarely selected, and even when a characteristic wavelength selection algorithm was used, the locations of the selected bands were not analyzed specifically or visually. To verify whether the selected wavelengths were reliable, further analysis through machine learning modeling was conducted.
2.5. Models Based on Characteristic Wavelengths
Table 4 shows the performance of the PLSR and LSSVM models based on the characteristic wavelengths selected by CARS. For soil1_NH4-N and soil2_NH4-N, the results of PLSR were better than those of LSSVM, with R2p reaching 0.93 and 0.91, which were 0.07 and 0.03 higher than with full-band modeling, respectively. Similarly, for NO3-N, the results of CARS-PLSR were also better. The R2p values in soil1 and soil2 were 0.96 and 0.88, which were 0.12 and 0.10 higher than before, respectively, so the increases were more obvious. Among the 3 types of nitrogen, the detection result for urea-N was the best. In soil1_urea-N, the R2p (0.96) of the CARS-PLSR and CARS-LSSVM models was the same, 0.02 higher than that of the full-band model, but the RMSEP (0.53) of CARS-LSSVM was lower and its RPD (5.65) was higher, so its performance was slightly better. In soil2_urea-N, the result of CARS-PLSR (R2p = 0.95) was better, 0.03 higher than before.
Table 5 shows the detection results for soil nitrogen content based on the SPA-PLSR and SPA-LSSVM models. For NH4-N, the best result on soil1 (R2p = 0.87) was only 0.01 higher than that of the full-band model, and the best result on soil2 (R2p = 0.86) was 0.02 lower than that of the full-band model. It can be seen from Table 5 that the number of characteristic wavelengths selected by SPA in this dataset was only 4% of the full band. Although the redundancy of the spectral information was greatly reduced, some useful features related to NH4-N in soil2 might have been lost. On the soil1_NO3-N and soil2_NO3-N datasets, SPA-PLSR performed better in both cases, with R2p of 0.89 and 0.84, respectively. Similarly, urea-N had the best predicted results, and SPA-PLSR performed slightly better on both the soil1 (R2p = 0.95, RMSEP = 0.62, RPD = 4.43) and soil2 (R2p = 0.96, RMSEP = 0.60, RPD = 4.82) datasets. Compared with the full-band models, the improvements were 0.01 and 0.04, respectively.
Combining Table 4 and Table 5, after the selection of characteristic wavelengths by CARS and SPA, the prediction accuracy of the models improved to some extent on most datasets, and the results on only a few datasets were similar or slightly lower.
2.6. Predictive Fit of the Optimal Model
By comprehensively comparing the average R2p, RMSEP and RPD of all models, the best model for each type of nitrogen and soil could be obtained. On the soil1_NH4-N, soil2_NH4-N, soil1_NO3-N, soil2_NO3-N, soil1_urea-N and soil2_urea-N datasets, the models with the best prediction results were CARS-PLSR, CARS-PLSR, CARS-PLSR, CARS-PLSR, CARS-LSSVM and SPA-PLSR, respectively. The predictions of the best models on the prediction sets are plotted against the actual values as scatter plots with fitted lines in Figure 4.
For the detection of NH4-N in the 2 soils, the average R2p of the CARS-PLSR model was 0.92, the average RMSEP was 0.77% and the average RPD was 3.63, which indicated that the model proposed in this study could make reliable predictions when the NH4-N content in soil was higher than 0.77%. Compared with the result of Xu et al. [19] using the SVMR model (R2p = 0.70), this was an improvement of 0.22. On the soil1_NO3-N and soil2_NO3-N datasets, the average R2p of CARS-PLSR was 0.92, the average RMSEP was 0.74% and the average RPD was 4.17, which meant that the model could make accurate predictions when the NO3-N content in the soil was higher than 0.74%. Xu et al. [19] used the SVMR model to detect NO3-N with an R2p of 0.82, which was improved by 0.10 in this study. For urea-N, the best models had an average R2p of 0.96 across both soils, an average RMSEP of 0.57% and an average RPD of 5.24. That is, when the urea-N content in the soil was higher than 0.57%, the models could achieve good predictions. The R2p reached by He et al. [25] using the PLS model was 0.94, which was improved by 0.02 in this study. Comparing the 3 types of nitrogen, the detection performance for NH4-N and NO3-N was very close, and the prediction ability for urea-N was clearly better, which is shown in Figure 4e,f as a close fit of the actual and predicted values on both sides of the trend line. Moreover, the detection accuracy for all 3 types of nitrogen was improved compared with the previous literature, and the PLSR model was found to be more stable than the LSSVM model.
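As a worked check of the urea-N averages quoted above, the sketch below takes the per-soil values of the two best urea-N models as reported in Section 2.5 and averages them. It assumes the reported averages are simple arithmetic means over the two soils; the dictionary layout and the function name are hypothetical.

```python
# Best urea-N model on each soil, with the values quoted in Section 2.5.
best_urea = {
    "soil1 (CARS-LSSVM)": {"r2p": 0.96, "rmsep": 0.53, "rpd": 5.65},
    "soil2 (SPA-PLSR)": {"r2p": 0.96, "rmsep": 0.60, "rpd": 4.82},
}


def mean_metric(results, key):
    """Arithmetic mean of one metric over all entries (assumed averaging rule)."""
    values = [metrics[key] for metrics in results.values()]
    return sum(values) / len(values)


avg_r2p = mean_metric(best_urea, "r2p")      # about 0.96
avg_rmsep = mean_metric(best_urea, "rmsep")  # about 0.565, reported as 0.57%
avg_rpd = mean_metric(best_urea, "rpd")      # about 5.235, reported as 5.24
print(avg_r2p, avg_rmsep, avg_rpd)
```

Under this assumption the means agree with the reported urea-N averages once rounded to two decimal places.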
4. Conclusions
In this study, an NIR-HSI instrument was used to detect NH4-N, NO3-N and urea-N added as solutions to two kinds of tableted soils (laterite soil and loess). Machine learning models were established based on the full band and on the characteristic bands, and the optimal prediction models for the different types of nitrogen were obtained by comprehensively comparing the evaluation indicators. First, we compared the results of PLSR and LSSVM modeling on spectra with no preprocessing and with MSC and WT preprocessing, and obtained the most suitable preprocessing method for detecting each type of nitrogen in each soil, which was used for the subsequent data analysis. Second, this study significantly reduced the spectral dimension required for modeling by using two characteristic wavelength selection algorithms. The numbers and positions of characteristic wavelengths differed markedly between datasets, while the wavelengths selected by the different algorithms had a certain degree of similarity. Finally, the optimal models were obtained by comprehensively analyzing the performance of the different models on the different datasets, and the average results were calculated. For NH4-N, R2p was 0.92, RMSEP was 0.77% and RPD was 3.63; for NO3-N, R2p was 0.92, RMSEP was 0.74% and RPD was 4.17; for urea-N, R2p was 0.96, RMSEP was 0.57% and RPD was 5.24.