1. Introduction
Soil is a fundamental, natural, non-renewable resource on which people rely for food, fiber, and energy production. It is also a habitat for microorganisms and earthworms and the foundation for buildings and other structures [1]. Fundamentally, soil is a complex matrix of organic matter, inorganic mineral matter, water, and air. The organic material in soils ranges from decomposed, stable humus to fresh particulate residues of various origins. The distribution of these organic pools influences biological activity, nutrient availability and dynamics, soil structure and aggregation, and water-holding capacity [2]. The inorganic mineral fraction is often described by its particle size distribution (proportions of sand, silt, and clay) and by additional subclasses defined in various classification systems [3]. Nitrogen (N) is one of the essential elements for vegetative growth and plant development because it plays a central role in all metabolic processes, as well as in cellular structure and genetic coding [4]. Nitrogen also interacts with phosphorus (P), and together the two elements play a vital role in plant growth. It is therefore essential to investigate the N content of soils and to obtain information on its spatial distribution, which would improve the efficiency of field nitrogen management and the economic return from agricultural production, and contribute to sustainable agriculture [5].
The goal of the research was to determine total soil nitrogen (Ntot), expressed as a percentage of the dry weight of soil (%). Total nitrogen comprises all plant-accessible and inaccessible forms, which can transform into one another. The natural nitrogen cycle depends on the ability of organisms to fix and convert inert atmospheric nitrogen (N2) and to decompose proteins. The basic mineral forms, ammonium (NH4+) and nitrate (NO3−), are also important plant-accessible forms of nitrogen and indicators of soil viability. Both forms move within the soil sorption complex and indicate the degree of eutrophication and the efficiency of the nitrogen cycle.
Soil nitrogen content is usually measured using laboratory methods. However, these methods are often time consuming, expensive, and destructive. Therefore, new techniques such as laboratory spectroscopy are being developed to overcome the disadvantages of traditional laboratory methods [6]. Examples of popular spectrometers include the ASD FieldSpec 3, the Perstorp NIRSystems 6500, and the FOSS XDS Rapid Content Analyzer [5,7,8,9]. These devices mostly measure with spectral bandwidths of 1 to 2 nm over a wavelength range of 300 to 2500 nm.
Another option is to use remote sensing methods, which can be divided into aerial and satellite techniques. Aerial remote sensing is generally used to acquire data for smaller areas (national scale), while satellite remote sensing is used to cover larger areas (global scale). In addition, there are three types of optical data. The first is panchromatic data, which have high or very high spatial resolution and a single band formed by the total light energy in the visible spectrum. Multispectral data have a higher spectral resolution, with multiple spectral bands of broader bandwidth, but a lower spatial resolution than panchromatic data. Therefore, pan-sharpening is sometimes employed to improve the spatial resolution of multispectral data. For example, Sentinel-2A has 13 spectral bands in the visible, NIR (near-infrared), and SWIR (short-wavelength infrared) regions, with a spatial resolution of 10, 20, or 60 m depending on the band. Bandwidths range from 15 to 175 nm [10]. Hyperspectral data contain many more spectral bands than multispectral data; however, their main advantage is that the bands are contiguous and narrow. As a result, the spectral curve of a surface is continuous. Hyperspectral images have been used for soil mapping [11] and for identifying types of iron and clay minerals [12].
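Pan-sharpening can be illustrated with a minimal Brovey-style sketch in Python. It assumes that the multispectral bands are co-registered with the panchromatic band, that the resolution ratio is an integer (`factor`), and that nearest-neighbour upsampling is acceptable; the function names are hypothetical, and the Brovey ratio is only one of many pan-sharpening methods, shown here purely for illustration.

```python
import numpy as np


def upsample_nearest(ms: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each multispectral pixel `factor` times along rows and columns.

    ms has shape (bands, rows, cols); the result has shape
    (bands, rows * factor, cols * factor).
    """
    return np.repeat(np.repeat(ms, factor, axis=1), factor, axis=2)


def brovey_pansharpen(ms: np.ndarray, pan: np.ndarray, factor: int,
                      eps: float = 1e-12) -> np.ndarray:
    """Brovey-style pan-sharpening (illustrative sketch).

    Each upsampled band is multiplied by the ratio between the panchromatic
    band and the mean of the upsampled bands, which injects the spatial
    detail of the panchromatic image while keeping the band ratios.
    """
    ms_up = upsample_nearest(ms, factor).astype(float)
    if ms_up.shape[1:] != pan.shape:
        raise ValueError("pan must match the upsampled multispectral grid")
    intensity = ms_up.mean(axis=0)        # low-resolution intensity image
    ratio = pan / (intensity + eps)       # per-pixel detail ratio
    return ms_up * ratio                  # broadcast over the band axis


rng = np.random.default_rng(0)
ms = rng.uniform(0.05, 0.4, size=(3, 4, 4))   # 3 bands at coarse resolution
pan = rng.uniform(0.05, 0.4, size=(8, 8))     # panchromatic band, 2x finer
sharpened = brovey_pansharpen(ms, pan, factor=2)
print(sharpened.shape)  # (3, 8, 8)
```

By construction, the band mean of the sharpened image reproduces the panchromatic band, so spatial detail comes from the panchromatic data and spectral ratios from the multispectral data.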
Aerial and satellite hyperspectral images enable relatively large areas to be mapped in a short period. However, the raw data must be corrected to eliminate the influence of the atmosphere before soil properties can be determined [13]. Hyperspectral imaging combines conventional spectroscopy with imaging techniques to acquire both spectral and spatial information for detecting physical, chemical, and biological attributes of samples [14,15]. A soil spectrum is generated by directing radiation containing all relevant frequencies at the sample. Depending on the constituents present in the soil, the radiation causes individual molecular bonds to vibrate, by either bending or stretching, and to absorb light to various degrees. The resulting absorption spectrum has a characteristic shape that can be used for analytical purposes [16]. The visible and near-infrared (vis–NIR) regions, encompassing wavelengths between 400 and 2500 nm, contain useful information on organic and inorganic materials in the soil. Absorptions in these regions can be used to detect minerals associated with iron, as well as soil organic matter, clay, carbonates, and water [17,18,19,20]. They can also be used to detect soil constituents such as soil organic carbon (SOC) and total nitrogen, owing to the stretching and bending of N–H, C–H, and C–O bonds [21,22,23,24,25]. Viscarra Rossel and Behrens [26] present a summary of important fundamental absorptions in the mid-infrared (mid-IR) region and of the occurrence of their overtones and combinations in the vis–NIR regions, which can help with the interpretation of soil constituents.
Diffuse reflectance spectra of soil in the vis–NIR regions are largely nonspecific because of the overlapping absorptions of soil constituents. This inherent lack of specificity is compounded by scattering effects caused by soil structure or by specific components such as quartz. All these factors result in complex absorption patterns that need to be mathematically extracted from the spectra and correlated with soil properties. Hence, the analysis of soil diffuse reflectance spectra requires sophisticated statistical techniques to discern the response of soil attributes from spectral characteristics [27]. The most common calibration methods for soil applications are based on linear regression, namely stepwise multiple linear regression (SMLR), principal component regression (PCR), and partial least squares regression (PLSR) [28]. The main reasons for using SMLR are the inadequacy of more conventional regression techniques such as multiple linear regression (MLR) and a lack of awareness among soil scientists of full-spectrum data compression techniques such as PCR and PLSR. Both PCR and PLSR can cope with data containing large numbers of highly collinear predictor variables. The two techniques are related, and in most situations their prediction errors are similar. However, analysts often prefer PLSR because it relates the response and predictor variables so that the model explains more of the variance in the response with fewer components, and the algorithm is therefore computationally faster.
The use of data mining techniques such as neural networks (NN) [23,24], multivariate adaptive regression splines (MARS) [25], and boosted regression trees [26] is increasing. Viscarra Rossel et al. [29] combined PLSR with bootstrap aggregation (bagging-PLSR) to improve the robustness of PLSR models and to produce predictions with uncertainty estimates. MLR, PCR, and PLSR are linear models, whereas data mining techniques can handle nonlinear data. Viscarra Rossel and Lark [28] used wavelets combined with polynomial regression to reduce the spectral data, account for nonlinearity, and produce accurate and parsimonious calibrations based on selected wavelet coefficients. Mouazen et al. [30] compared NN with PCR and PLSR for predicting selected soil properties and found that combined PLSR-NN models provided better predictions than PLSR and PCR. Viscarra Rossel and Behrens [26] compared PLSR with several data mining algorithms and feature selection techniques for predicting clay, organic carbon, and soil acidity (pH). The comparison included MARS, random forests (RF), boosted trees (BT), support vector machines (SVM), NN, and the wavelet transform. Their results suggest that the data mining algorithms produced more accurate results than PLSR. Some of these algorithms also report the importance of specific wavelengths in the models, which can aid model interpretation.
The objectives of the present study were to evaluate the suitability of airborne hyperspectral imaging, to determine the soil nitrogen content, and to produce a pixel-wise soil nitrogen map usable for precision agriculture. Emphasis was placed on the use of geoinformation technologies for processing the hyperspectral data and performing spatial analyses in a GIS environment.
3. Results
Data containing the spectral signatures of the soil samples were obtained as reflectance over the spectral range of the scanning instrument. Each sampling site (soil sample) was measured three times (Table 2). Based on notes on passing high clouds recorded during the measurements and on graphical inspection of the spectral curves, a single measurement was selected for further analysis.
The processed data were then combined with the soil data (Table 3), so that each of the 22 sampling points had information on its location, its total nitrogen content measured in the laboratory, and its spectral reflectance characteristics.
Several pre-treatment methods were tested to achieve the best results with the PLSR method. The results (Table 4 and Table 5) differed considerably depending on the selected variant: (a) without resampling of the spectra, intended for possible new data from the spectrometer, or (b) with spectra resampled to build a model for prediction from the analyzed hyperspectral data.
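Variant (b) requires the handheld spectra to be resampled to the bands of the hyperspectral sensor. The source does not state the resampling method; a common choice, assumed in this Python sketch, is to model each sensor band as a Gaussian spectral response function defined by its centre wavelength and full width at half maximum (FWHM). The band centres and FWHM values are hypothetical placeholders, not the parameters of the sensor used in the study.

```python
import numpy as np


def resample_gaussian(wl, spectra, centers, fwhm):
    """Resample spectra onto sensor bands with Gaussian response functions.

    wl: (n_wl,) source wavelengths in nm (e.g. a 1 nm grid).
    spectra: (n_samples, n_wl) reflectance values.
    centers, fwhm: (n_bands,) band centre wavelengths and FWHM in nm.
    Returns an array of shape (n_samples, n_bands).
    """
    wl = np.asarray(wl, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sigma = np.asarray(fwhm, dtype=float) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Response of every band (rows) at every source wavelength (columns).
    srf = np.exp(-0.5 * ((wl[None, :] - centers[:, None]) / sigma[:, None]) ** 2)
    srf /= srf.sum(axis=1, keepdims=True)  # each band's weights sum to 1
    return np.atleast_2d(spectra) @ srf.T


wl = np.arange(325, 1076)                      # handheld range, 1 nm steps
spectra = 0.1 + 2e-4 * (wl - 325)[None, :]     # toy linear spectrum, 1 sample
centers = np.array([500.0, 600.0, 700.0, 800.0])   # hypothetical band centres
fwhm = np.full(centers.shape, 10.0)                # hypothetical FWHM, nm
resampled = resample_gaussian(wl, spectra, centers, fwhm)
print(resampled.shape)  # (1, 4)
```

Normalizing each response function to unit sum means a flat spectrum stays flat after resampling, which is a quick sanity check on any resampling routine.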
Table 4 shows the characteristics of the developed models after fitting the original spectral information, from which the noise was removed without further spectral resampling.
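The source does not detail how the noise was removed. One simple step, assumed here and consistent with the note in the Discussion that the FieldSpec HandHeld 2 signal is noisy below 450 nm and above 950 nm, is to keep only the reliable part of each spectrum. The cut-offs are parameters of this hypothetical helper; in practice they would be set by inspecting the spectra.

```python
import numpy as np


def trim_noisy_ends(wl, spectra, lower=450.0, upper=950.0):
    """Keep only wavelengths within [lower, upper] nm.

    wl: (n_wl,) wavelengths in nm; spectra: (n_samples, n_wl).
    Returns the trimmed wavelength vector and spectra.
    """
    wl = np.asarray(wl, dtype=float)
    keep = (wl >= lower) & (wl <= upper)
    return wl[keep], np.asarray(spectra)[:, keep]


wl = np.arange(325, 1076)                  # 325-1075 nm at 1 nm steps
spectra = np.random.default_rng(0).uniform(0.05, 0.5, size=(22, wl.size))
wl_t, spectra_t = trim_noisy_ends(wl, spectra)
print(wl_t[0], wl_t[-1], spectra_t.shape)  # 450.0 950.0 (22, 501)
```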
Table 5 presents the evaluation of models built to predict total soil nitrogen content for a specific hyperspectral dataset; their spectral information was therefore resampled to the spectral resolution of the hyperspectral images. Missing values indicate models for which the cross-validated R² was negative. This statistical indicator describes how much of the variance in the dependent variable is explained by the model and how much remains unexplained [42].
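Why a cross-validated R² can become negative follows from its definition, R² = 1 − SS_res / SS_tot: whenever the cross-validated predictions have a larger squared error than simply predicting the mean of the measured values, SS_res exceeds SS_tot. A minimal Python sketch with made-up values (not the study's data), equivalent to scikit-learn's `r2_score`:

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


measured = np.array([0.10, 0.12, 0.14, 0.16])   # illustrative Ntot values, %
print(r_squared(measured, measured))                     # 1.0, perfect fit
print(r_squared(measured, np.full(4, measured.mean())))  # 0.0, mean predictor
print(r_squared(measured, measured[::-1]))               # about -3.0, worse than the mean
```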
For determining total soil nitrogen content from data collected with the spectrometer, the best results were achieved by the model using four latent components and reflectance without pre-treatment (R², 0.36; RMSEP, 0.0195; RPD, 1.25). Other variants for both elements resulted in lower predictive power. When predicting total soil nitrogen from hyperspectral data, the best predictive ability of all 48 models built was achieved using absorbance as the pre-treatment (R², 0.44; RMSEP, 0.01; RPD, 1.34). The developed application and the PLSR prediction procedure were subsequently tested on this model.
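The reported statistics and the absorbance pre-treatment can be written out explicitly. This Python sketch assumes the usual definitions: absorbance A = log10(1/R), RMSEP as the root mean square of the prediction residuals, and RPD as the standard deviation of the measured values divided by RMSEP. The source does not state whether the sample (n − 1) or population standard deviation was used, so the sample form here is an assumption, and the values are illustrative.

```python
import numpy as np


def to_absorbance(reflectance):
    """Absorbance pre-treatment: A = log10(1 / R), reflectance as a fraction (0-1]."""
    return np.log10(1.0 / np.asarray(reflectance, dtype=float))


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: sample SD of measured values / RMSEP."""
    sd = np.std(np.asarray(y_true, dtype=float), ddof=1)
    return float(sd / rmsep(y_true, y_pred))


measured = np.array([0.10, 0.12, 0.14, 0.16])    # illustrative Ntot, %
predicted = np.array([0.11, 0.12, 0.13, 0.17])   # illustrative predictions, %
print(to_absorbance([0.1, 0.5, 1.0]))
print(rmsep(measured, predicted), rpd(measured, predicted))
```

An RPD close to 1, as in the models above, means the prediction error is nearly as large as the natural spread of the measured values.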
Since the dataset included only 22 samples, the maximum number of components was about 20, depending on the applied pre-treatment method. This is important because the best model required six components, almost one-third of the maximum possible number of components.
Figure 6 shows the basic predictive ability of the six-component model, as determined by cross-validation. The closer the individual values lie to the line of best fit, the better the model performance.
The regression vectors in Figure 7 show that a seventh component contributes to the prediction, but this contribution is not significant and, according to cross-validation (Figure 8), it slightly degrades the predictive ability. Although using seven components would not be wrong, only six were used to avoid overfitting the model and thereby fitting noise.
The decision to use six components is also supported by the score plot (Figure 8), which shows how much of the data variance is captured by each of the individual components [46]. The most important component of the model was the second, explaining 87% of the relevant variance in the data. This is a rather surprising result because, in most models built this way, the first component is the most important. The sixth component is the most important for the second part of the modeling and prediction. The values described by the sixth component have the least variance and, except for two outliers, describe the data very well.
The main result of the prediction is a map displaying the spatial distribution of total soil nitrogen. Because the output is spatially explicit, it can be applied directly in precision agriculture for spatially variable nitrogen management. The prediction results were refined by removing several negative values, which may have been caused by the low concentration or low variability of the investigated element. Further refinement consisted of removing pixels representing surfaces (buildings and roads) for which the model was not built and for which the predicted values are therefore irrelevant.
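Producing the map pixel by pixel and applying the two refinements described above (dropping negative predictions and masking non-soil surfaces) can be sketched in Python with NumPy. The model may be any fitted regressor with a scikit-learn-style `predict` method; the function names, the use of NaN for removed pixels, the stand-in linear model, and the source of the non-soil mask are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def predict_cube(model, cube):
    """Apply a fitted regressor to every pixel of a (rows, cols, bands) cube."""
    rows, cols, bands = cube.shape
    pred = model.predict(cube.reshape(rows * cols, bands))
    return np.ravel(pred).reshape(rows, cols)


def clean_prediction_map(pred_map, non_soil_mask):
    """Set negative predictions and non-soil pixels (e.g. buildings, roads) to NaN."""
    out = np.array(pred_map, dtype=float)   # copy, so the input is not modified
    out[out < 0] = np.nan
    out[non_soil_mask] = np.nan
    return out


rng = np.random.default_rng(3)
# Stand-in model fitted on synthetic absorbance spectra (hypothetical data).
X_train = rng.uniform(0.2, 1.0, size=(22, 6))
y_train = X_train @ np.array([0.05, -0.02, 0.03, 0.0, 0.01, -0.01])
model = LinearRegression().fit(X_train, y_train)

cube = rng.uniform(0.2, 1.0, size=(4, 5, 6))   # tiny 4 x 5 pixel image, 6 bands
non_soil = np.zeros((4, 5), dtype=bool)
non_soil[0, :] = True                           # e.g. a road along the first row
n_map = clean_prediction_map(predict_cube(model, cube), non_soil)
print(np.round(n_map, 3))
```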
Table 6 shows that the predicted values were slightly higher than the values obtained by pedological measurements. This is attributed to the generally higher reflectance values in the hyperspectral image compared with those measured by the handheld spectrometer.
Figure 9 shows a map of the spatial distribution of nitrogen values, with higher N concentrations observed in the northwest area.
4. Discussion
Determining total nitrogen in soil is very complex and involves several processes. Ideally, all processes of the nitrogen cycle should be included to help understand the evolution of nitrogen; however, such models will always be subject to a certain degree of error. Pedologists assess nitrogen mainly to improve the management of nitrogen fertilizers, as it is in the general interest to ensure high yields with minimal soil degradation [8].
Soil properties are most often determined in the laboratory. Unfortunately, classical laboratory methods are often time consuming, expensive, and destructive, so new methods are being developed to eliminate these disadvantages [6]. One of them is laboratory spectroscopy. Another option is remote sensing, which is probably the least expensive approach and is non-destructive to the soil. Aerial and satellite multispectral and hyperspectral images can map a relatively large area in a short time. However, when soil properties are determined this way, the influence of the atmosphere must be eliminated to obtain pure surface reflectance [13]. For example, aerial imaging with a hyperspectral camera has previously been used to detect nitrogen [47]. Among satellite systems, the Hyperion hyperspectral sensor [48] and the Landsat 5 and 7 multispectral systems [9,49] are widely used.
The wavelengths suitable for examining soil nitrogen content vary considerably among published works. Dalal and Henry [22] determined the most suitable range to be 1100 to 2500 nm (more specifically, 1702, 1870, and 2052 nm). In contrast, Stenberg et al. [50] considered 1100, 1600, 1700, 1800, 2000, and 2200 nm to be the most appropriate wavelengths. Shi et al. [51] compared their results with, and incorporated, previous works, arriving at the wavebands 1450, 1850, 2250, 2330, and 2400 nm. That work points out that the selection of suitable wavelengths depends mainly on the overall processing and the method used. These latter bands are especially suitable for nitrogen (although they can also be used for other soil properties) and for the popular PLSR method. In general, the water absorption bands from 1300 to 3000 nm are the most suitable and most widely used for determining soil nitrogen content [51,52]. This region is associated with water content, which changes the total carbon content in the soil, suggesting that soil nitrogen content can be indirectly related to carbon content through these wavelengths [53]. Chang et al. [7] also pointed to a high correlation between the amounts of these two elements in the soil.
Several studies have also compared the vis–NIR and MIR portions of the spectrum, and MIR performed better in most of them [52,54,55]. The PLSR method or its variants (e.g., CARS-PLSR) were mostly used with MIR data. The wavenumbers suitable for measuring nitrogen in the MIR were determined to be 1676–1672 cm−1 and 1260–1036 cm−1 [52].
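Because these MIR ranges are given as wavenumbers, converting them helps compare them with the wavelengths quoted for vis–NIR: λ (nm) = 10^7 / ν̃ (cm⁻¹). A minimal Python sketch (the helper name is hypothetical):

```python
def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm1


for low, high in [(1672, 1676), (1036, 1260)]:
    # A higher wavenumber corresponds to a shorter wavelength.
    print(f"{high}-{low} cm^-1 -> "
          f"{wavenumber_to_nm(high):.0f}-{wavenumber_to_nm(low):.0f} nm")
```

The two ranges correspond to roughly 5.97–5.98 µm and 7.94–9.65 µm, well beyond the 2500 nm limit of vis–NIR instruments.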
In the present work, the PLSR method was used to develop a model for predicting total soil nitrogen at selected test locations. Unfortunately, the resulting parameters and prediction capabilities did not reach those of some previous studies reviewed by the authors, which have shown that this method can achieve an R² higher than 0.8 [9,43]. The coefficient of determination of the PLSR model presented in this paper reached an R² of 0.44, indicating that, after cross-validation, the model explains 44% of the variance in the data from which it was created. Areas for improvement are listed below.
The primary disadvantage in developing the PLSR model concerns the spectral range of the handheld spectrometer used. The FieldSpec HandHeld 2 spectrometer measures wavelengths in the range of 325–1075 nm, with excessive noise affecting the signal below 450 nm and above 950 nm. According to Vohland et al. [52], the critical wavelengths for building a highly accurate model based on the spectral behavior of soil properties lie in the NIR (near-infrared) and MIR (mid-infrared) regions starting at 750 nm. Using spectral information extending into the infrared part of the electromagnetic spectrum, Zornoza et al. [45] achieved an R² of up to 0.95 for total soil nitrogen content.
Another issue that possibly influenced the quality of the developed model was the different acquisition dates of the hyperspectral imaging (8 November 2016) and the field data collection with the handheld spectrometer (7 October 2016), as the weather conditions on these two dates differed. Hyperspectral imaging was performed at lower temperatures, with possible ground frost but minimal precipitation, whereas the handheld spectrometer data were collected at higher temperatures with greater cloud cover.
Finally, the small sample size was identified as affecting the development of the model. The number of samples collected in our study (22 in total) was small and did not allow a second, independent validation step. With such a small sample size, a more rigorous statistical evaluation of the dataset and the selection of optimal samples were impossible. The small number of samples from only two sites also resulted in low variability of total soil nitrogen, limiting the ability to capture key relationships between the soil property and the relevant spectral information. Studies that used hundreds of samples [47,56] achieved R² values higher than 0.8. Some degree of uncertainty could also stem from the laboratory measurements of the samples.
The results indicate that close attention should be paid to the datasets, as each can exhibit some degree of inaccuracy. In general, meteorological conditions, sample size, and the parameters of the spectrometer used for field data collection should be evaluated when preparing the test. The problems encountered could be significantly reduced by using a larger dataset and a public soil spectral library for the Czech region. Unfortunately, the only suitable soil spectral library, created at the Czech University of Life Sciences Prague [57], was not publicly available. Therefore, field data were collected with the handheld spectrometer available to the authors.