1. Introduction
Human-induced land degradation has become a problem on a global scale [1]. In arid and semi-arid regions of different countries, such as northeastern Brazil, salinization and alkalization resulting from irrigation are the main processes leading to a decline in soil quality. These processes have produced large tracts of salt-affected land [2]. Salts impair the development of agricultural crops, reducing their productivity [3]. Monitoring the spatial distribution of salinity is therefore vital for soil and agricultural management as a whole [4].
Remote sensing is one alternative for studying salt-affected soils, as it provides spatial information over large areas [5]. With recent advances in sensor technology and the prospect of new satellites carrying hyperspectral instruments, further studies are needed to better understand the spectral response of saline soils [6,7,8,9]. One state-of-the-art example is the German Environmental Mapping and Analysis (EnMAP) mission, scheduled for launch in 2019, carrying a sensor with more than 200 bands (400–2500 nm), a spatial resolution of 30 m, and an imaging swath width of 30 km [10]. However, in the current absence of satellites with high spectral resolution and radiometric quality, airborne hyperspectral instruments can be used to study soil salinity. For example, the ProSpecTIR-VS sensor (SpecTIR Advanced Hyperspectral & Geospatial Solutions) acquires 357 spectral bands in the visible, near infrared (NIR), and shortwave infrared (SWIR) (400–2500 nm), and its spatial resolution can be controlled by flight altitude. It therefore has the necessary technical specifications, with a spectral resolution close to that of laboratory equipment. Despite the potential of hyperspectral sensors to detect soil salinization, to our knowledge no studies have evaluated the effects of sensor bandwidth and band positioning on such detection. Hyperspectral data are well suited to this purpose because they allow the spectral resolution of different multispectral sensors to be simulated using their filter functions.
One way to measure soil salinity is through electrical conductivity (EC), either in the field or in the laboratory. Because most salts strongly reflect the solar energy incident on the soil surface back toward satellite sensors, there is a direct relationship between the reflectance recorded in the images and salt concentration, especially for NaCl. In contrast, the relationship between soil reflectance and the EC of exposed saline soils is generally indirect [2]: reflectance at certain wavelengths is correlated with EC, although there is no direct physical cause-and-effect relationship between them. Computational models calibrated in the laboratory can therefore be applied to image reflectance data to estimate EC pixel by pixel in areas of exposed soil.
Some studies have estimated soil EC with linear models, especially Ordinary Least Squares (OLS) and Partial Least Squares Regression (PLSR). Many studies favor linear models for this purpose because of the results obtained by Farifteh et al. [11], who compared a linear least-squares model with a non-linear artificial neural network trained by backpropagation. In that study, the neural network performed worse than the linear model, especially when field measurements of EC were used. However, neural algorithms have since improved in both training time and model performance. The Multilayer Perceptron (MLP) is one alternative worth testing in salinization studies. Another model that has attracted attention is the Extreme Learning Machine (ELM) proposed by Huang et al. [12]. Compared with other methods, ELM offers low implementation complexity, fast computation of parameters, fast learning, and high generalization capability [13]. This model has shown potential in various applications using hyperspectral data [14,15].
Some factors cannot be ignored in hyperspectral studies, including the need for sufficient computer memory to process the data rapidly and the limited capacity of these algorithms to handle a large number of bands [16]. Consequently, to facilitate the analysis, the interpretation of results, and the application of the models discussed, one recommended approach has been to reduce the dimensionality of the dataset [2]. Besides reducing dimensionality, techniques such as principal component analysis (PCA) and derivative analysis can also provide input variables complementary to spectral reflectance, as they can highlight variations in the spectral curves associated with saline soils [17,18].
Within this context, the objectives of this study are to: (1) evaluate the performance of linear (OLS and PLSR) and non-linear computational models (MLP and ELM) in estimating EC in saline soils through laboratory reflectance spectroscopy; (2) test the potential of reflectance data, first-order derivative of the spectra, and PCA transformation as input variables for these models after feature selection; (3) verify the applicability of the laboratory-calibrated model to aircraft level, using a hyperspectral image obtained by the ProSpecTIR-VS airborne sensor for estimation of soil EC on a per-pixel basis; and (4) study the influence of bandwidth and band positioning on the soil EC estimates, using the laboratory data to simulate the spectral resolution of different hyperspectral and multispectral sensors.
2. Materials and Methods
2.1. The Study Area
The Morada Nova Irrigation District, which has several soil salinization problems associated with the cultivation of irrigated rice, was selected as the study area. The irrigated area is located in the towns of Morada Nova and Limoeiro do Norte, in the State of Ceará, in the Banabuiú sub-basin of the Lower Jaguaribe micro-region, 170 km from Fortaleza (Figure 1).
Two soil types predominate in the Morada Nova Irrigated Area: Fluvic Neosols (75% of the study area) and Litholic Neosols (25%) [19]. The former are alluvial soils, generally cultivated with rice for many years, which display many salinization problems, whereas the latter are unsuitable for agriculture. According to Ferreyra and Silva [19], the mineralogy of these soils is mixed, with 2:1 minerals, both expansive (vermiculite and montmorillonite) and non-expansive (mica and illite), predominating over kaolinite and quartz. Under the Köppen classification, the climate is BSW’h’ (very hot and semi-arid), with an average annual rainfall below 900 mm. Rainfall distribution is unimodal, with 80% of the total concentrated in the first four months of the year.
2.2. Field Data Collection
The study focused on the eastern part of the irrigated area (Figure 1). Soil samples were collected in agricultural fields previously cultivated with rice. Salinization problems affecting crop yield were reported for most of these fields.
Two field campaigns were carried out to collect soil samples. In the first campaign (12 May 2015), 46 soil samples were collected from the surface horizon (0–10 cm). In the second campaign (31 August 2015), 107 topsoil samples were collected. The samples were taken to the laboratory, and then ground and sieved (2 mm) to reduce the effect of surface roughness.
2.3. EC and Spectral Reflectance Measurements
EC was measured in the laboratory with a benchtop conductivity meter with a range of 0–20 dS·m⁻¹. Measurements were made on a 1:1 extract (soil sample:distilled water), 24 h after dilution.
Laboratory spectra were acquired with a FieldSpec Pro FR-3 spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) in a controlled environment (dark room). This instrument has a spectral resolution of 3 nm in the visible and near infrared (VNIR, 350–1300 nm) and of 10 nm in the shortwave infrared (SWIR, 1300–2500 nm). The sensor was positioned at nadir, 7 cm from the soil sample, measuring a circular area 3 cm in diameter. The light source was a 250 W tungsten-halogen lamp with a parabolic reflector, providing a beam collimated onto the target plane at a zenith angle of 45°. A white Spectralon panel of known reflectance was used as the reference standard to obtain sample reflectance. Three readings were taken per sample, rotating the sample by 120° between readings, and averaged to reduce the influence of soil roughness and bidirectional effects on the spectral measurements.
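The reflectance calculation and averaging described above can be sketched as follows. This is a minimal illustration, assuming the raw signal of each sample reading and of the Spectralon panel are available as arrays on the same wavelength grid and that the panel's calibrated reflectance is known per wavelength; the function name and the example signals are hypothetical.

```python
import numpy as np


def mean_sample_reflectance(readings, reference_signal, reference_reflectance):
    """Average reflectance of one soil sample from repeated readings.

    readings: (n_readings, n_wavelengths) raw signal of the sample, one row per rotation
    reference_signal: (n_wavelengths,) raw signal of the white reference panel
    reference_reflectance: (n_wavelengths,) calibrated reflectance of the panel
    """
    readings = np.asarray(readings, dtype=float)
    # Reflectance of each reading relative to the panel, scaled by the panel's own reflectance
    per_reading = readings / np.asarray(reference_signal, dtype=float) * np.asarray(
        reference_reflectance, dtype=float
    )
    # Mean over the readings taken at 0°, 120°, and 240° rotation
    return per_reading.mean(axis=0)


# Hypothetical raw signals for four wavelengths, three rotations
readings = [[300, 400, 500, 600], [310, 390, 510, 590], [290, 410, 490, 610]]
reflectance = mean_sample_reflectance(readings, [1000] * 4, [0.99] * 4)
```

Averaging over rotations before further processing means that a single anomalous orientation (for example, a shadowed aggregate) has only a one-third weight in the final spectrum.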
2.4. Generation of the OLS, PLSR, MLP, and ELM Models with Laboratory Data
2.4.1. Model Calibration
The methodology used to develop the models is presented in Figure 2. The first step was model calibration, or data processing. At this stage, the laboratory reflectance data obtained with the FieldSpec Pro FR-3 were resampled to the 357 bands of the ProSpecTIR-VS airborne hyperspectral sensor. Earlier studies have shown that PCA can be useful for studying soil salinity, since the first component is usually associated with sample brightness (average reflectance), which increases with salt concentration [20,21,22]. PCA was therefore applied to the laboratory data resampled to the ProSpecTIR-VS bands. Because the first derivative can enhance spectral features associated with salts, derivative analysis was also applied to the resampled reflectance data, after smoothing with a moving-average filter with a three-band window.
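The preprocessing chain above (resampling, three-band moving average, first derivative, PCA) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Gaussian band response functions defined by band centers and a full width at half maximum (FWHM), whereas in practice the sensor's published response functions would be used; all function names are hypothetical.

```python
import numpy as np


def gaussian_resample(wavelengths, spectra, centers, fwhm):
    """Resample fine-resolution spectra to sensor bands, assuming Gaussian band responses."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Convert FWHM to the Gaussian standard deviation (scalar or one value per band)
    sigma = np.atleast_1d(np.asarray(fwhm, dtype=float)) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Response matrix: one row per sensor band, one column per input wavelength
    srf = np.exp(-0.5 * ((wavelengths[None, :] - centers[:, None]) / sigma[:, None]) ** 2)
    srf /= srf.sum(axis=1, keepdims=True)  # each band's weights sum to 1
    return np.asarray(spectra, dtype=float) @ srf.T


def moving_average(spectra, window=3):
    """Centered moving average along the band axis, repeating edge bands as padding."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def first_derivative(spectra, centers, window=3):
    """Smooth, then take the first derivative of reflectance with respect to wavelength."""
    smoothed = moving_average(spectra, window)
    return np.gradient(smoothed, np.asarray(centers, dtype=float), axis=1)


def pca_scores(spectra, n_components):
    """PCA via SVD of mean-centered spectra; returns scores and loadings."""
    spectra = np.asarray(spectra, dtype=float)
    centered = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]
    return centered @ loadings.T, loadings
```

Smoothing before differentiating matters because the derivative amplifies band-to-band noise; the three-band window is the smallest centered window that suppresses some of it.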
Given the large number of attributes in both the original data (reflectance) and the transformed data (PCA and first derivative), the second step was feature selection within each dataset. Forward feature selection was used: the selection starts from a single attribute, and the remaining attributes are added one at a time according to the increase they provide in the adjusted coefficient of determination (adjusted R²). The algorithm stopped when the adjusted R² exceeded 95% or when the number of selected attributes reached 25% of the sample size. The second criterion was adopted to avoid overfitting, thereby preserving the generalization power of the model.
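The forward selection procedure can be sketched as below. The text does not state which regression is refitted at each step, so this sketch assumes an OLS fit with intercept for scoring candidate subsets; the function names are hypothetical.

```python
import numpy as np


def adjusted_r2(y, y_hat, n_features):
    """Adjusted R² for a model with n_features predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    n = y.size
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def ols_fitted(X, y):
    """Fitted values of an OLS model with intercept (assumed scoring model)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def forward_selection(X, y, target_adj_r2=0.95, max_fraction=0.25):
    """Greedy forward selection stopping at the adjusted-R² target or the attribute cap."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    max_features = min(n_features, max(1, int(max_fraction * n_samples)))
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [adjusted_r2(y, ols_fitted(X[:, selected + [j]], y), len(selected) + 1)
                  for j in candidates]
        j_best = int(np.argmax(scores))
        if scores[j_best] <= best_score:
            break  # no remaining attribute improves the adjusted R²
        selected.append(candidates[j_best])
        best_score = scores[j_best]
        if best_score > target_adj_r2:
            break  # first stopping criterion reached
    return selected, best_score
```

Using the adjusted rather than the plain R² penalizes each added attribute, so a candidate is only kept if it explains enough extra variance to offset the lost degree of freedom.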
2.4.2. Model Validation
Model validation was performed using leave-one-out cross-validation (LOOCV) (Figure 2). The parameters of the OLS, PLSR, MLP, and ELM models required for calibration, and the resulting transformation matrices, were obtained. The MLP neural network was configured with five neurons in the hidden layer, randomly initialized weights, a learning rate of 0.95, a logistic sigmoid activation function, and 5000 training epochs. The ELM model used five neurons in the hidden layer and a logistic sigmoid activation function.
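An ELM with the configuration above (five hidden neurons, logistic sigmoid) can be sketched as follows. In ELM, the input weights and biases are drawn at random and never trained; only the output weights are solved, in closed form, with the Moore–Penrose pseudoinverse of the hidden-layer output matrix. The uniform [-1, 1] initialization and the absence of an output bias are assumptions of this sketch, and the class name is hypothetical; the MLP is not shown.

```python
import numpy as np


class ELMRegressor:
    """Minimal single-hidden-layer ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=5, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def _hidden(self, X):
        # Hidden-layer output matrix H (n_samples x n_hidden)
        return self._sigmoid(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Random input weights and biases (assumed uniform in [-1, 1]); not updated afterwards
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: minimum-norm least-squares solution beta = pinv(H) @ y
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

Because no iterative training is involved, fitting reduces to one pseudoinverse, which is the source of the fast parameter computation attributed to ELM in the Introduction.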
2.5. Evaluation of the OLS, PLSR, MLP, and ELM Models with Laboratory Data
The four models were evaluated with the following statistical metrics: adjusted R², root mean square error (RMSE), Pearson correlation coefficient (r), and ratio of performance to deviation (RPD). The coefficient of determination was calculated during the calibration stage, and the other metrics during the validation stage.
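The four metrics can be computed as sketched below. The text does not give the formulas, so this sketch assumes the standard adjusted R² with p predictors and the common definition of RPD as the standard deviation of the observed values (sample standard deviation, ddof=1) divided by the RMSE; the function name is hypothetical.

```python
import numpy as np


def evaluation_metrics(y_obs, y_pred, n_features):
    """Adjusted R², RMSE, Pearson r, and RPD for observed vs. predicted EC."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_obs.size
    resid = y_obs - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    # Penalize R² for the number of predictors used by the model
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    # RPD: spread of the observations relative to the prediction error
    rpd = np.std(y_obs, ddof=1) / rmse
    return {"adj_r2": adj_r2, "rmse": rmse, "r": r, "rpd": rpd}


m = evaluation_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], n_features=1)
```

RPD complements RMSE because it expresses the error relative to the variability of the data, so models calibrated on datasets with different EC ranges can be compared.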
As shown by Huang et al. [12], a large number of training samples is needed to achieve the desired performance with the ELM model. To make the best use of the samples collected in the field, LOOCV was therefore applied: the data are split into calibration and validation sets as many times as there are samples (n = 153), leaving out one sample each time. In each repetition, the validation set consists of a single sample not used for validation in any other repetition, and the process is repeated until every sample has been used once to validate the model. The reported validation metrics are the average of the results from the repetitions.
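The LOOCV loop can be sketched as follows, using OLS as an example model (any of the four models could be passed in through the hypothetical `fit_predict` callback). Note that this sketch pools the held-out predictions so that metrics can be computed once on them, a common way of summarizing LOOCV; it is not guaranteed to reproduce the averaging over repetitions described above.

```python
import numpy as np


def loocv_predictions(X, y, fit_predict):
    """Out-of-sample prediction for each sample, using a model fitted without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # boolean mask leaving sample i out
        preds[i] = fit_predict(X[train], y[train], X[i:i + 1])[0]
    return preds


def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with intercept on the training set and predict the test rows."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```

With n = 153, the loop fits 153 models, each on 152 samples, so every sample is predicted by a model that never saw it.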
2.6. Model Application Using the ProSpecTIR-VS Hyperspectral Images
The best laboratory-calibrated model was applied to the reflectance data acquired by the ProSpecTIR-VS airborne sensor. The image was acquired by the Fototerra Company on 23 May 2015. The sensor operates in 357 spectral bands between 400 and 2500 nm, spaced at 5 nm intervals. The spatial resolution was 1 m, and the images were acquired at 1:00 p.m. (local time). During image acquisition the agricultural fields were not under irrigation, which could otherwise have affected the relationships between reflectance and soil salinity.
Radiance data measured by the ProSpecTIR-VS sensor were corrected for scattering and absorption by atmospheric gases and converted into surface reflectance using ATCOR4, with a rural tropical atmospheric model. Water vapor was estimated from the absorption bands at 940 nm and 1130 nm. The images were geometrically corrected using data from the aircraft's inertial navigation system.
To separate exposed-soil pixels from other scene components, the Normalized Difference Vegetation Index (NDVI) was calculated. Exposed soils generally have NDVI values below 0.30 [23,24], a threshold that was tested and adopted in the present study. To validate the laboratory-calibrated model applied to the image, the EC estimated for 32 pixels in areas of exposed soil (NDVI < 0.30) was plotted against the EC measured in the laboratory for the corresponding soil samples from the first campaign (12 May 2015). These pixels were selected independently of the training sample locations used for model building.
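The masking and per-pixel application can be sketched as follows, with NDVI = (NIR − red)/(NIR + red). The red and NIR band indices and the `predict` callable (standing in for the calibrated model) are hypothetical inputs; which ProSpecTIR-VS bands were used for NDVI is not stated above.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index, (NIR - red) / (NIR + red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (nir - red) / (nir + red)


def map_ec(cube, red_band, nir_band, predict, threshold=0.30):
    """Apply a calibrated EC model to bare-soil pixels of a (rows, cols, bands) cube."""
    cube = np.asarray(cube, dtype=float)
    # Bare-soil mask; NaN NDVI values (zero denominators) compare as False and are excluded
    soil = ndvi(cube[..., red_band], cube[..., nir_band]) < threshold
    ec = np.full(cube.shape[:2], np.nan)  # pixels that are not bare soil stay NaN
    if soil.any():
        ec[soil] = predict(cube[soil])  # cube[soil] has shape (n_soil_pixels, bands)
    return ec
```

Boolean-mask indexing flattens the selected pixels into a samples-by-bands matrix, which is the same layout the model was calibrated on in the laboratory.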
2.7. Influence of Bandwidth and Band Positioning on Soil EC Estimates
To evaluate the performance of the four computational models as a function of spectral resolution, we simulated two hyperspectral sensors and three multispectral instruments, with the objective of assessing the influence of bandwidth and band positioning on the results. The hyperspectral instruments simulated from the laboratory data were the airborne ProSpecTIR-VS and the planned orbital Hyperspectral Infrared Imager (HyspIRI), with bandwidths of 5 nm and 10 nm, respectively, between 400 and 2500 nm. The multispectral sensors, simulated using their filter functions, were RapidEye/REIS, High Resolution Geometric (HRG)/SPOT-5, and the Operational Land Imager (OLI)/Landsat-8.
RapidEye has five bands in the VNIR interval: 1 (blue; 440–510 nm), 2 (green; 520–590 nm), 3 (red; 630–685 nm), 4 (red edge; 690–730 nm), and 5 (NIR; 760–850 nm). HRG has four bands in the VNIR/SWIR-1 spectral range: 1 (green; 500–590 nm), 2 (red; 610–680 nm), 3 (NIR; 780–890 nm), and 4 (SWIR-1; 1580–1750 nm). Finally, we simulated seven of the nine reflective bands of OLI: 1 (blue; 435–451 nm), 2 (blue; 452–512 nm), 3 (green; 533–590 nm), 4 (red; 636–673 nm), 5 (NIR; 851–879 nm), 6 (SWIR-1; 1566–1651 nm), and 7 (SWIR-2; 2107–2294 nm).
Comparing the simulated ProSpecTIR-VS and HyspIRI data with those of the other sensors therefore represents the transition from narrowband to broadband sensors, whereas the progression from RapidEye to OLI represents the shift in band positioning from the VNIR toward the SWIR-1 and SWIR-2 spectral intervals.
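Band simulation from laboratory spectra amounts to a response-weighted average of reflectance over each band. The sketch below, using the RapidEye and OLI SWIR band edges listed above, approximates each filter function as rectangular (boxcar) between the band edges unless a measured response curve is supplied; this rectangular shape is an assumption for illustration, since real filter functions have sloped edges. The dictionary and function names are hypothetical.

```python
import numpy as np

# Band edges (nm) as listed above; treated as rectangular responses unless curves are given
RAPIDEYE_BANDS = {
    "blue": (440, 510),
    "green": (520, 590),
    "red": (630, 685),
    "red_edge": (690, 730),
    "nir": (760, 850),
}
OLI_SWIR_BANDS = {"swir1": (1566, 1651), "swir2": (2107, 2294)}


def simulate_bands(wavelengths, spectra, bands, response=None):
    """Response-weighted mean reflectance for each band.

    response: optional dict mapping band name to a measured filter function
    sampled on `wavelengths`; bands without one use a boxcar between their edges.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    out = {}
    for name, (lo, hi) in bands.items():
        if response is not None and name in response:
            weights = np.asarray(response[name], dtype=float)
        else:
            weights = ((wavelengths >= lo) & (wavelengths <= hi)).astype(float)
        out[name] = spectra @ weights / weights.sum()
    return out
```

Each simulated band collapses tens of narrow bands into one value, which is exactly the loss of spectral detail that the narrowband versus broadband comparison is designed to measure.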
The performance of the four models for the simulated sensors was evaluated following the protocol described in Section 2.5.
4. Discussion
We studied the performance of four computational models with different spectral attributes as a function of the spectral resolution of hyperspectral and multispectral sensors, and used the best laboratory-calibrated model to estimate soil EC on a per-pixel basis in a hyperspectral image acquired by the airborne ProSpecTIR-VS instrument. The OLS, PLSR, and ELM models performed better than the MLP model in estimating soil EC, and attributes related to soil brightness had greater predictive power than attributes related to spectral features. The laboratory models were transferable to the airborne level of data acquisition, and their soil EC estimates were better for simulated narrowband sensors (ProSpecTIR-VS and HyspIRI) than for simulated broadband instruments (RapidEye, HRG, and OLI).
In our study, soil brightness was mostly controlled by variations in soil salinization. In contrast to other salts (e.g., MgCl₂ and CaCl₂), NaCl increases soil brightness, as also observed in other studies [25]. NaCl also produces absorption bands at 1450 and 1950 nm related to fluid inclusions or absorbed water [26]. Unfortunately, in pixel spectra these features coincide with strong water vapor absorptions and cannot be used in the data analysis, even after atmospheric correction. Overall, our results agree with those of Moreira et al. [22], who observed a positive correlation between soil brightness and soil salinization, as expressed by increasing EC values. This correlation is not observed for all salts and is typical of soils in which NaCl predominates. For instance, when treating soils of the study area with increasing levels of salinization in the laboratory, Moreira et al. [27] observed that brightness increased with salt concentration for NaCl and decreased for MgCl₂ and CaCl₂.
Compared with the absorption bands, which were indirectly represented by the first-order derivative spectra, our feature selection confirmed that soil brightness, expressed by the reflectance of selected bands and by PCA scores, was more sensitive to changes in salt concentration. When studying different computational models to estimate soil EC, Mashimbye et al. [5] obtained a lower adjusted R² for the first-order derivative than for the original reflectance, in agreement with the current results. In the literature, soil brightness is generally related to the first principal component (PC1) [22]. Brightness corresponds approximately to the mean reflectance between 400 and 2500 nm. Our results confirmed PC1 as a proxy for soil brightness, because its eigenvector loadings were approximately uniform across the 357 simulated VNIR-SWIR bands of the ProSpecTIR-VS. As a result, PC1 was highly correlated with the average reflectance between 400 and 2500 nm (r = +0.87), as also observed by Moreira et al. [22]. This explains why PC1 and the reflectance of three bands in this interval were selected as the best attributes for the computational models.
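Why near-uniform PC1 loadings make PC1 a brightness proxy can be illustrated with synthetic spectra (illustrative data only, not the study's measurements): if samples differ mainly by a multiplicative brightness factor applied to a common spectral shape, PC1 loadings share one sign across all bands and PC1 scores track the mean reflectance. The spectral shape, brightness range, and noise level below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
wavelengths = np.linspace(400, 2500, 357)  # 357 bands, as for the ProSpecTIR-VS

# Synthetic spectra: common shape scaled by a per-sample brightness factor, plus small noise
base_shape = 0.2 + 0.15 * (wavelengths - 400) / 2100
brightness = rng.uniform(0.6, 1.6, size=50)
spectra = brightness[:, None] * base_shape[None, :] + rng.normal(0, 0.005, size=(50, 357))

# PCA via SVD of the mean-centered spectra
centered = spectra - spectra.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1_loadings = vt[0]
pc1_scores = centered @ pc1_loadings

# PC1 scores vs. mean reflectance over 400-2500 nm (the sign of a PC is arbitrary)
mean_reflectance = spectra.mean(axis=1)
r = np.corrcoef(pc1_scores, mean_reflectance)[0, 1]
```

The synthetic correlation is near perfect because brightness is the only source of variation here; with real soils, other factors (moisture, organic matter, mineralogy) also load on PC1, which is consistent with the lower r = +0.87 reported above.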
For estimating soil EC using remote sensing, most studies have reported better performance of linear models than of non-linear models [11]. The best performance of the PLSR model in our study area is also consistent with the work of Kumar et al. [4], who used the reflectance of Hyperion bands as input variables for the model. However, the modified approach proposed by Huang et al. [12] in the ELM model produced results in northeastern Brazil comparable to those of PLSR, as indicated by the different statistical metrics (r, RMSE, and RPD). Neural network models are known to benefit from a large number of samples [12]. The performance of the neural algorithms may therefore be underestimated here, because their accuracy generally depends on the number of training samples, which is relatively small in our work. Another factor that can affect the reliability of models for detecting soil salinization is spatial dependence among the soil samples collected in the field. We checked for spatial autocorrelation among the samples using Moran's Index, whose value of 0.261 indicated the absence of spatial autocorrelation.
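Global Moran's Index is I = (n / W) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ are deviations from the mean, wᵢⱼ are spatial weights, and W is their sum. The weighting scheme used in the study is not stated, so this sketch assumes inverse-distance weights with zero self-weight; it computes only the index, not its significance test. The function name is hypothetical.

```python
import numpy as np


def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights (assumed weighting scheme)."""
    x = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = x.size
    # Pairwise Euclidean distances between sample locations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)  # zero weight for self-pairs
    z = x - x.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)
```

Positive values indicate that nearby samples have similar EC, negative values that neighbors tend to differ; values near the null expectation of −1/(n − 1) indicate no spatial pattern.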
When evaluating the predictive ability of linear models to estimate soil EC using PCA, derivative analysis, and band ratioing, Mashimbye et al. [5] observed that the best results were associated with the PCA set of laboratory attributes. In agreement with our findings, Kobayashi et al. [8] noted that the reflectance of bands positioned between 1970 and 2130 nm had higher predictive power than the first-order derivative. Because soil brightness is probably the attribute most sensitive to soil salinization in these studies [5,8,11], techniques that quantify absorption band parameters (e.g., derivative analysis and continuum removal) will not be effective for estimating soil EC in areas where NaCl is the predominant salt.
Our sensor simulation results agree with a recent study by Castaldi et al. [28]. When evaluating the potential of current and forthcoming multispectral and hyperspectral sensors to estimate physico-chemical attributes of European soils using PLSR, Castaldi et al. [29] found larger RPD values for simulated hyperspectral instruments than for multispectral sensors. Although brightness was the best indicator of soil salinization in our study area, our simulations of RapidEye, HRG, and OLI showed the importance of SWIR-1 and SWIR-2 spectral information, whether for composing this attribute or for estimating soil EC from selected bands.
Some constraints should be considered carefully in the transition from laboratory to airborne data acquisition. First, the models were developed in a controlled environment with dried and sieved soils. As a result, soil structure, an important factor in salinization, was not preserved. Second, soil moisture in the field reduces reflectance from the VNIR to the SWIR, which introduces uncertainty into the EC estimates of the pixels, especially during the rainy season. Finally, because brightness and the reflectance of selected ProSpecTIR-VS bands were the attributes most sensitive to soil salinization in the study area, flightlines must be planned carefully. These attributes do not normalize bidirectional effects associated with variations in solar zenith angle (SZA) and in the relative azimuth angle (RAA) between the sensor and the sun. If several flightlines are planned for a given region, the best strategy is to fix the period of data acquisition (time and date) and the direction of the lines (e.g., NNE/SSW) to reduce bidirectional effects on reflectance and EC estimates across the images. For differently oriented flightlines, an alternative is to look for the band-ratio index that best detects soil salinization while normalizing bidirectional effects.
When the PLSR model calibrated with the reflectance of selected bands was applied to the ProSpecTIR-VS image, the RMSE for the validation samples (1.22 dS·m⁻¹) was higher than that obtained with soil samples under controlled laboratory conditions (RMSE = 0.77 dS·m⁻¹). This result was expected, because in the transition from laboratory to airborne data acquisition other factors, such as spectral mixture and the signal-to-noise ratio (SNR), have a greater influence on the EC estimates. Despite these factors, the RPD was 2.21, which still indicates adequate predictive ability of the PLSR model. A reduction in model performance when scaling observations up from the laboratory to the satellite level was also observed by Zhang et al. [27] after modeling Ca²⁺ with Hyperion/EO-1 data in a land degradation area in China. They highlighted the poor SNR of Hyperion as one of the factors that affected the quality of the estimates.
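Assuming the common definition of RPD as the standard deviation of the observed values divided by the RMSE, the two reported image-level figures can be related as below. This is a back-calculation under that assumption, not a value reported in the study: it implies a standard deviation of measured EC in the image validation set of roughly 2.7 dS·m⁻¹.

```latex
\mathrm{RPD} = \frac{\mathrm{SD}_{\mathrm{obs}}}{\mathrm{RMSE}}
\quad\Rightarrow\quad
\mathrm{SD}_{\mathrm{obs}} = \mathrm{RPD}\times\mathrm{RMSE}
\approx 2.21 \times 1.22\ \mathrm{dS\,m^{-1}}
\approx 2.70\ \mathrm{dS\,m^{-1}}
```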
Our soil salinization map (Figure 7) showed a greater extent of saline soils in the central portion of the study area, with EC ranging from 10 to 20 dS·m⁻¹. Such values generally result from inadequate irrigation and poor soil drainage, which reduce crop yield (e.g., rice) and lead to subsequent land abandonment [22]. Our results show the importance of testing different models to estimate soil EC using hyperspectral data, and they can support data analysis for upcoming orbital hyperspectral missions such as EnMAP, planned for 2019 [10], and HyspIRI. HyspIRI has been proposed to acquire high spectral resolution data (380–2500 nm) over large areas with 30 m spatial resolution and a 145 km swath width [29].