Research on Hyperspectral Modeling of Total Iron Content in Soil Applying LSSVR and CNN Based on Shannon Entropy Wavelet Packet Transform

Liu, Weichao; Huo, Hongyuan; Zhou, Ping; Li, Mingyue; Wang, Yuzhen

doi:10.3390/rs15194681

by Weichao Liu ¹,†, Hongyuan Huo ²,†, Ping Zhou ¹,*, Mingyue Li ¹ and Yuzhen Wang ¹

¹ School of Geosciences and Resources, China University of Geosciences (Beijing), Beijing 100083, China

² Faculty of Architecture, Transportation and Civil Engineering, Beijing University of Technology, Beijing 100124, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Remote Sens. 2023, 15(19), 4681; https://doi.org/10.3390/rs15194681

Submission received: 27 July 2023 / Revised: 17 September 2023 / Accepted: 21 September 2023 / Published: 24 September 2023

Abstract:

The influence of some seemingly anomalous samples on modeling is often ignored in the quantitative prediction of soil composition modeling with hyperspectral data. Soil spectral transformation based on wavelet packet technology only performs pruning and threshold filtering based on experience. The feature bands selected by the Pearson correlation coefficient method often have high redundancy. To solve these problems, this paper carried out a study of the prediction of soil total iron composition based on a new method. First, regarding the problem of abnormal samples, the Monte Carlo method based on particle swarm optimization (PSO) is used to screen abnormal samples. Second, feature representation based on Shannon entropy is adopted for wavelet packet processing. The amount of information held by the wavelet packet node is used to decide whether to cut the node. Third, the feature bands selected based on the correlation coefficient and the competitive adaptive reweighted sampling (CARS) algorithm using the least squares support vector regression (LSSVR) are applied to the soil spectra before and after wavelet packet processing. Finally, the Fe content was calculated based on a 1D convolutional neural network (1D-CNN). The results show that: (1) The Monte Carlo method based on particle swarm optimization and modeling multiple times was able to handle the abnormal samples. (2) Based on the Shannon entropy wavelet packet transformation, simple operations could simultaneously preserve the spectral information while removing high-frequency noise from the spectrum, effectively improving the correlation between soil spectra and content. (3) The 1D-CNN with added residual blocks could also achieve better results in soil hyperspectral modeling with few samples.

Keywords:

Shannon entropy wavelet packet transform; particle swarm; least squares support vector machine; ResNet; CARS

1. Introduction

Iron (Fe) is one of the main mineral elements in soil [1]. As a trace element, Fe plays an important role in the growth and development of crops. Fe oxide can form stable soil ion groups, improve the compactness of soil particles, and thereby maintain soil nutrients [2]. However, excessive Fe will promote the production of hydroxyl radicals in plant cells and cause vegetation iron poisoning. Therefore, the retrieval and quantitative analysis of soil iron elements have important scientific value for monitoring soil environments, precision agriculture, and cultivated land protection [3]. The traditional chemical method of detecting the content of heavy metal elements in soil needs to collect soil samples and conduct biochemical analysis in the laboratory, which is costly, and obtaining large-scale field soil samples will also destroy cultivated land and cause serious environmental pollution. Hyperspectral remote sensing technology has unique advantages in quickly obtaining various soil property information, overcomes the limitations of traditional laboratory measurements, and has been widely used in soil environmental quality monitoring [4,5,6,7].

Effective removal of background noise from soil spectra is necessary before modeling. Fourier transformation and wavelet packet transformation (WPT) are commonly used frequency domain filtering and denoising methods, which are widely used in many fields, such as structure detection and electronic signals [8,9,10]. In the decomposition and reconstruction of different layers of hyperspectral data, wavelet transformation can not only smooth the noise, but also maintain its original characteristics, which has a great advantage in hyperspectral processing [11,12]. The WPT not only has the advantages of time-domain localization and multi-scale analysis [13], but can also adaptively select the corresponding frequency band according to the signal characteristics [14]. The pruning or threshold filtering adopted by traditional WPT often depends on experience. To solve the problem of the number of wavelet packets and the number of wavelet envelopes, Shannon entropy is used as the standard entropy of each spectral curve to evaluate the amount of information contained in each node of the wavelet packet tree. The Shannon entropy of the parent node and the child node can be used to automatically judge whether the child node will stay or not [15], which reduces the influence of experience on pruning. Zhang [16] performed Shannon entropy wavelet packet transformation on the spectral data of rice soil. This transformation improves the hyperspectral retrieval accuracy of organic carbon content in rice planting soil on the premise of effectively maintaining the soil organic carbon spectral signal. It solves the contradiction between spectral noise removal processing and retaining useful information to the maximum extent.

Owing to their high spectral resolution and spectral continuity, hyperspectral data usually contain redundant information, and modeling with many characteristic bands often leads to over-fitting [17,18]. Commonly used spectral dimensionality reduction methods include the successive projection algorithm (SPA), competitive adaptive reweighted sampling (CARS), and the Pearson correlation coefficient method [19,20,21]. Li [22] first proposed the CARS algorithm. The core of CARS is to fit a partial least squares (PLS) regression in which each band is one variable and the absolute value of its regression coefficient measures the importance (weight) of that band. By building a large number of PLS models, the importance of each band can be measured. An exponentially decreasing function (EDF) is then used to eliminate the variables with smaller absolute regression coefficients, and adaptive reweighted sampling (ARS) is used to select among the remaining variables. After successive rounds of screening, the optimal band subset is obtained. Wei et al. [23] combined CARS with Spearman’s correlation coefficient, selected bands with a correlation coefficient higher than 0.6 based on CARS for normalized modeling, and achieved good results.

In addition to the selection of feature variables, the modeling method is also a dominant factor affecting the accuracy of the model. The support vector machine (SVM) method learns according to the principle of structural risk minimization and converts nonlinear problems into linear problems through kernel function mapping [24,25]. As a kind of SVM, the least squares support vector machine still depends on the selection of hyperparameters [26], and different penalty coefficients C and kernels have a great impact on the accuracy of the model. Most previous studies select the hyperparameters based on experience, whereas the particle swarm optimization (PSO) algorithm runs fast, has few parameters to adjust, and is easy to operate [27]. Compared with the global grid-search algorithm [28], the PSO algorithm does not need to search the hyperparameter space exhaustively, nor does it need to determine the effective interval of each hyperparameter and an appropriate sampling step size. In addition, the PSO algorithm can determine multiple hyperparameters simultaneously [29]. Tao et al. used PSO with variable weights to optimize the kernel function parameter σ of SVM and proposed a PSO-SVM method based on variable weights [30]. In addition to traditional random forests and support vector machines, convolutional neural networks (CNNs) have been widely applied to hyperspectral data, including soil content inversion [31,32] and image fusion denoising [33,34]. Besides the widely used two-dimensional image convolution, some scholars have also tried to use 1D-CNN to invert the content of soil components. Ng et al. carried out a 1D-CNN inversion of the content of sand, clay, total carbon (TC), organic carbon (OC), and other substances in a typical spectral library. The results showed that inversion from 1D spectral data was superior to inversion from 2D spectral image data [35].

Without spectral transformation, Xu et al. compared and analyzed the soil composition results inverted by the deep learning multi-layer perceptron, Le-Net5, and DenseNet10 neural networks. The results showed that deep learning-based algorithms can effectively use complex spectral data to retrieve the contents of soil components [36]. In addition, to solve the network degradation problem caused by the deepening of the network depth, the ResNet block is introduced; while retaining the original information, it also increases the convolutional information, effectively preventing the over-fitting problem of the model [37,38].

This paper focuses on several problems in the quantitative prediction of soil components from hyperspectral data, mainly the removal of abnormal samples, the removal of hyperspectral curve noise by wavelet transformation, the selection of characteristic bands by the CARS algorithm, and the retrieval of soil components from hyperspectral data by a 1D convolutional neural network. To address these issues, we carried out this study with the following aims: (1) to analyze the rationality of removing abnormal samples with the Monte Carlo method based on particle swarm optimization (PSO); (2) to measure the effectiveness of Shannon entropy wavelet packet pruning for removing noise from the original spectral curve and from the spectral curve after mathematical transformation; (3) to estimate the accuracy of the characteristic bands extracted by the CARS algorithm on the basis of PSO-optimized LSSVR modeling; and (4) to explore the feasibility of a 1D-CNN combined with a weight layer for the quantitative prediction of the soil iron content. We then extracted the feature maps of the convolutional neural network and analyzed its attention pattern.

2. Materials

2.1. Study Area

The study area is located in the Jiansanjiang area of Heilongjiang Province in Northeast China (132.51°–134.37°E, 46.82°–48.22°N). It is located in the hinterland of the alluvial plain formed by Wusuli River, Songhua River, and Heilongjiang River. This region belongs to one of the four major black soil belts in the world, with a total coverage area of about 800 km² (Figure 1), and is one of the important agricultural product production bases in China [39]. The study area has a mid-temperate continental monsoon climate, with an annual average temperature of 1–2 °C, a frost-free period of 110–135 days, and an average precipitation of 550–600 mm. The average elevation is about 50 m, and the soil in the area is dominated by black soil. In addition, white pulp soil, meadow soil, and swamp soil are also widely distributed. The terrain of the study area is flat, mostly concentrated and contiguous farmland. The main crops are rice, soybean, and corn.

2.2. Data Sources

2.2.1. Soil Sampling and Fe Content Measurement

Sampling for this study took place in April 2019. Bare cultivated land was selected as the sampling area, and soil samples were collected at a depth of 0–20 cm. To make each sample more representative, soil was collected at the center point and the four corners of a square whose diagonal through the center point was 20 cm long. After sampling, the soil was mixed and bagged; each bag contained about 500 g, and the bag number, location coordinates, surrounding environment, and soil feature description were recorded. A total of 182 groups of samples were collected. The soil samples were air-dried, mixed well, ground, and passed through a 200-mesh sieve. Chemical analysis of the iron content in the soil samples was performed by inductively coupled plasma-atomic emission spectrometry (ICP-AES) [40]. This work was completed by the Experimental Testing Center of Shenyang Geological Survey Center, China Geological Survey.

2.2.2. Soil Spectral Data Acquisition

This research used the FieldSpec 4 portable ground object spectrometer produced by ASD Company of the United States. The wavelength range of the spectrometer is 350–2500 nm, the sampling interval is 1.4 nm (wavelength 350–1000 nm) and 2 nm (wavelength 1000–2500 nm), and it covers a total of 2151 bands. For the detailed experimental steps, please refer to reference [41]. After sampling, the spectral curves were smoothed and denoised [42], and the bands affected by water vapor and the abnormal bands in the range of 350–399 nm and 2401–2500 nm caused by instrument noise were removed.

3. Methodology

3.1. Data Preprocessing

3.1.1. Particle Swarm Optimization Algorithm

PSO is one kind of evolutionary algorithm. Starting from a population of randomly initialized individuals, the optimal solution is obtained through iteration. Each individual particle has two attributes, namely position X and velocity V. Each individual updates its position and velocity through two values, g_best and p_best. p_best represents the optimal solution of individual particles and g_best represents the optimal solution of the entire particle population [27,43]. The speed of the next iteration of the particle swarm, v_{i + 1}, is:

v_{i + 1} = w \times v_{i} + c_{1} \times rand_{1} \times (p_{best} - x_{i}) + c_{2} \times rand_{2} \times (g_{best} - x_{i})

(1)

where w, c_{1}, and c_{2} are constants, and rand_{1} and rand_{2} are random numbers from 0 to 1. The position at the next moment, x_{i + 1}, is:

x_{i + 1} = x_{i} + v_{i + 1}

(2)

PSO finds the optimal value of the fitness function through multiple iterations. The main functions of the PSO algorithm in this paper are as follows: (1) using the PSO algorithm to find the optimal number of features of PLS in abnormal sample removal and (2) finding the optimal hyperparameters of the least squares support vector machine (LSSVR), and enhancing the ability of LSSVR to evaluate the spectral modeling accuracy after different mathematical transformations. Ten-fold cross-validation is used to ensure the rationality of the training set and validation set. The specific parameters of PSO are shown in Table 1.
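The update rules in Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `pso_minimize`, the default values of `w`, `c1`, `c2`, the swarm size, and the clipping of positions to the search bounds are all assumptions made here, and they are not the settings listed in Table 1.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box; all default values are placeholders."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    p_val = [fitness(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (1): inertia + cognitive term + social term.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                # Equation (2), with the position clipped to the bounds
                # (the clipping is an assumption of this sketch).
                lo, hi = bounds[d]
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            val = fitness(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g_best, g_val = x[i][:], val
    return g_best, g_val
```

In the setting of this paper, `fitness` would be a cross-validated error as a function of the quantity being tuned (for example, the number of PLS components or the LSSVR hyperparameters); here it is left as an arbitrary callable.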

3.1.2. Monte Carlo Outlier Removal

The Monte Carlo method screens abnormal samples according to their sensitivity to prediction errors and can detect outliers effectively [44]. Removing outliers based only on the magnitude of the sample values often leads to poor results and low efficiency. For example, some sample points appear anomalous in composition but contribute positively to modeling. Because the study area contains different types of land use, the Fe content of the soil samples differed significantly. If a box plot is simply used to remove abnormal Fe content values, the variance in the data is reduced, but so is the variation range of the target value available to machine learning. The retrieved soil Fe content is then concentrated in a narrow range, making it difficult to detect anomalies across the entire region. Before modeling, we therefore used a PLS model tuned by the particle swarm optimization algorithm to predict the Fe content, and replaced the variance criterion of the box plot with the residual mean and residual variance of each sample. Because the content and the spectral information of the samples are considered at the same time, misjudgment of sample points can be effectively avoided. The processing flow in this article is as follows: (1) Find the best number of principal components to retain using PSO. (2) Fit partial least squares models on the spectral data and measured data with a training-to-test ratio of 8:2, and calculate the mean value and mean square error of the residuals of each sample. (3) Plot the samples in a coordinate system of residual mean and mean square error (Figure 2), and use thresholds to retain the sample points whose residual variance is below 0.72 and whose residual mean is within 8.
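The screening procedure can be illustrated with a small sketch. To keep the example self-contained, a one-variable least squares line is swapped in for the PSO-tuned PLS model, so this is not the model used in the paper; the function names, the number of random splits, and the thresholds passed to `keep_mask` are hypothetical and would have to be chosen from the residual scatter plot of the actual data.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares line; a simple stand-in for the PLS model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def monte_carlo_residuals(xs, ys, n_runs=200, train_frac=0.8, seed=0):
    """Repeat random 8:2 splits and collect each sample's test residuals."""
    rng = random.Random(seed)
    n = len(xs)
    residuals = [[] for _ in range(n)]
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(train_frac * n)
        train, test = idx[:cut], idx[cut:]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            residuals[i].append(ys[i] - (slope * xs[i] + intercept))
    # (mean, variance) of the residuals per sample; None if never tested.
    return [(statistics.fmean(r), statistics.pvariance(r)) if r else None
            for r in residuals]


def keep_mask(stats, max_abs_mean, max_var):
    """True for samples whose residual mean and variance are within limits."""
    return [s is not None and abs(s[0]) <= max_abs_mean and s[1] <= max_var
            for s in stats]
```

A sample that is consistently mispredicted whenever it falls in the test set ends up with a large residual mean, while a sample whose prediction is unstable across splits ends up with a large residual variance; either can be used as a screening criterion.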

3.1.3. Wavelet Packet Transformation Based on Shannon Entropy

To enhance the correlation between the element content and spectrum in soil, and to reduce the influence of background noise, the raw spectra will be mathematically transformed using first differential (FD), second differential (SD), multiple scatter correction (MSC), and reciprocal logarithmic transformation (RLT). However, this processing may enhance the high-frequency noise in some bands [45,46,47]. The wavelet packet transformation can solve this problem. Wavelet and WPT are time-frequency transformation methods developed in recent years [48,49]. The difference between WPT and wavelet transformation is that wavelet transformation is a unilateral binary tree and only low frequencies are decomposed, while wavelet packet not only decomposes low-frequency signals, but also decomposes high-frequency area signals to form a complete binary tree. Compared with wavelet transformation, WPT has better flexibility [50] and overcomes the disadvantage of a low frequency resolution when the time resolution in wavelet transformation is high. It has a strong time-frequency localized analysis capability, and there is no redundancy in spectral decomposition, which can reduce detailed high-frequency noise caused by spectrometer precision, test conditions, and soil particle size. It can enhance the basic spectral shape characteristics determined by soil texture and moisture [51,52]. Wavelet packet decomposition is actually to convolve the wavelet packet function along the spectral direction, which is similar to Fourier transformation, except that the sine function of the Fourier transformation is replaced by a wavelet packet. The wavelet packet function can be expressed as:

\sum_{n \in Z} h_{n - 2 k} \cdot h_{n - 2 l} = \delta_{kl}, \quad \sum_{n \in Z} h_{n} = \sqrt{2}, \quad g_{k} = {(- 1)}^{k} h_{1 - k}

(3)

If {h_{k}}_{k \in Z} and {g_{k}}_{k \in Z} (Z is the set of integers) are a pair of conjugate mirror filters, a series of functions {W_{n} (t)}_{n \in N} (N is the set of non-negative integers) can be defined to satisfy the following equations:

\begin{cases} W_{2 n} (t) = \sqrt{2} \sum_{k \in Z} h_{k} W_{n} (2 t - k) \\ W_{2 n + 1} (t) = \sqrt{2} \sum_{k \in Z} g_{k} W_{n} (2 t - k) \end{cases}

(4)

The signal x(t) is expanded on the orthogonal wavelet packet basis functions as follows:

x (t) = \sum_{k \in Z} C_{n, j}^{k} \cdot 2^{k / 2} \cdot W_{n} (2^{k} t - j)

(5)

where C_{n, j}^{k} = 2^{k / 2} \int_{R} x (t) W_{n} (2^{k} t - j) dt, and the functions 2^{k / 2} W_{n} (2^{k} t - j) (j, k \in Z) are called wavelet packet functions; together they form the wavelet packet library.

Each node of the spectrum decomposed by the wavelet packet will be divided into low-frequency and high-frequency parts. Previous studies on soil spectroscopy used wavelet packet decomposition mostly based on extracted high-frequency and low-frequency signals for wavelet reconstruction or used methods such as setting hard/soft thresholds to process the wavelet coefficients passing through the valve, and then performed spectral reconstruction. Most of these processing methods rely on experience and correlation indicators to judge whether to perform pruning. Therefore, Shannon entropy is introduced for pruning judgment, a certain wavelet packet function is selected, and n-layer wavelet packet transformation is performed on the spectrum to obtain the wavelet packet coefficients of each layer [53,54]. The following are the calculation steps of Shannon entropy:

(1) Calculating the energy value E_{m, k} of each layer:

E_{m, k} = \int {| D_{m, k} (t) |}^{2} dt

where D_{m, k} (t) is the wavelet packet reconstruction coefficient and t is the soil spectral data;

(2) Calculating the Shannon entropy H_{m, k} of each node in each layer according to the definition of Shannon entropy:

E = \sum_{1}^{N} E_{m, k}, \quad p_{m, k} = \frac{E_{m, k}}{E}, \quad H_{m, k} = - \sum_{i = 1}^{N} p_{m, k} \log p_{m, k}

(6)

After calculating the Shannon entropy of each node, each node is pruned. The criterion for judging the pruning is: if the sum of the Shannon entropy of the two child nodes is less than the value of the parent node, it will be pruned, otherwise it will be kept. Finally, the pruned binary tree is reconstructed to obtain the spectrum after N-layer wavelet packet transformation. The wavelet packet transformation has the advantages of a greater focus on spectral details, retaining the sample spectral information to the greatest extent, and effectively removing spectral noise.
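The energy and entropy quantities of Equation (6) can be illustrated with a minimal sketch. Several simplifications are assumed here: the Haar filter pair is used as the conjugate mirror filters (the paper does not state its wavelet in this section), the energy is computed from the node coefficients rather than from reconstructed signals, and the entropy is evaluated over the energy fractions of all nodes in one layer. The sketch only computes the quantities that the pruning criterion compares; it does not implement the pruning rule itself.

```python
import math


def haar_split(signal):
    """Split an even-length signal into low- and high-frequency halves
    with the orthonormal Haar filters (an assumed choice of wavelet)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return low, high


def wavelet_packet_nodes(signal, depth):
    """Full binary tree: every node, low or high, is split again.
    The signal length is assumed to be divisible by 2 ** depth."""
    nodes = [list(signal)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


def energy(coeffs):
    """Discrete counterpart of E_{m,k}: sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def shannon_entropy(energies):
    """H = -sum p log p with p = E_{m,k} / E over the given nodes."""
    total = sum(energies)
    if total == 0.0:
        return 0.0
    h = 0.0
    for e in energies:
        p = e / total
        if p > 0.0:
            h -= p * math.log(p)
    return h
```

Because the Haar split is orthonormal, the node energies of any layer add up to the energy of the input spectrum, so the fractions p_{m,k} describe how the signal energy is distributed across the frequency bands of that layer.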

3.2. Extraction of Feature Bands Based on CARS Algorithm

Competitive adaptive reweighted sampling (CARS) is a conventional band extraction method. It is a variable selection algorithm based on PLS regression coefficients, an exponentially decreasing function, and adaptive reweighted sampling. The exponentially decreasing function efficiently removes bands with small absolute regression coefficients: in the early stage, the spectral dimension is reduced rapidly, and in the later stage, band selection proceeds more gently with adaptive reweighted sampling [55]. Adaptive reweighted sampling draws bands according to the weight of each coefficient, and each iteration samples with replacement. The weight is the coefficient of each band, so only bands with high weights have a high probability of being retained [56].
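One CARS iteration can be sketched as follows. The exponentially decreasing function uses the commonly published form r_i = a·exp(−k·i), with a and k chosen so that all P bands are kept in the first run and two bands in the last; the dictionary of coefficients, the function names, and the rounding of the retained count are assumptions of this sketch, and the PLS fit that would produce the coefficients is not shown.

```python
import math
import random


def edf_ratio(i, n_runs, n_vars):
    """Fraction of bands kept in run i (1-based) of n_runs.
    Gives 1 at i = 1 and 2 / n_vars at i = n_runs."""
    a = (n_vars / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_vars / 2.0) / (n_runs - 1)
    return a * math.exp(-k * i)


def cars_step(coefs, i, n_runs, rng):
    """One screening step on a {band: PLS coefficient} dictionary.

    1. EDF: keep the bands with the largest absolute coefficients.
    2. ARS: sample the survivors with replacement, with probability
       proportional to |coefficient|, and keep the distinct bands drawn.
    """
    n_vars = len(coefs)
    keep = max(1, round(edf_ratio(i, n_runs, n_vars) * n_vars))
    ranked = sorted(coefs, key=lambda band: abs(coefs[band]), reverse=True)[:keep]
    weights = [abs(coefs[band]) for band in ranked]
    drawn = rng.choices(ranked, weights=weights, k=keep)
    return sorted(set(drawn))
```

Sampling with replacement means that a band with a large weight tends to be drawn several times while a band with a small weight may not be drawn at all, which is how low-weight bands drop out even after surviving the EDF cut.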

Eighty percent of the total samples collected in this study were used for PLS modeling. The coefficient b_i of each variable was calculated, and the adaptive reweighted sample algorithm was used to sample the samples after the removal of abnormalities. Based on the new variable subset, 10-fold cross-validation modeling was carried out, and the model with the smallest RMSE of the validation set was selected. The detailed steps and features can be found in [18,22]. After CARS, multiple bands will be screened out, and the serial numbers of the bands are not the same for each run because CARS has a certain degree of randomness. In order to solve this problem, this study used multiple CARS runs, then counted the frequency of the selected bands, and then used the threshold method to filter the bands. The characteristic variables selected by the CARS algorithm can reduce the high collinearity problem between spectral bands and improve the accuracy and speed of the prediction model.
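The frequency-and-threshold step used to stabilize CARS across runs can be sketched as below. The function name `stable_bands` and the example threshold are hypothetical; each element of `runs` stands for the list of band wavelengths returned by one CARS run.

```python
from collections import Counter


def stable_bands(runs, min_count):
    """Count in how many runs each band was selected and keep the bands
    that reach the threshold. A band is counted once per run."""
    counts = Counter(band for run in runs for band in set(run))
    return sorted(band for band, c in counts.items() if c >= min_count)
```

For example, with ten runs, `stable_bands(runs, 10)` would return only the bands selected every time, while a lower threshold trades stability for a larger band set.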

3.3. Modeling Method

3.3.1. Least Squares Support Vector Machine Modeling

Vapnik first proposed the support vector machine (SVM) method [57]. SVM maps the low-dimensional sample space nonlinearly to a high-dimensional feature space and looks for a hyperplane that, under the condition of correct classification, maximizes the margin, i.e., the distance between the hyperplane and the samples closest to it. Support vector machine regression (SVR) is similar to SVM; the difference is that SVR looks for a hyperplane that keeps the samples as close to it as possible, ignoring deviations inside a tolerance band. Least squares support vector machine regression (LSSVR) is a kind of SVR that replaces the inequality constraints with equality constraints and a squared error loss, so that it finds the hyperplane with the smallest distance from all samples [58].
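Because LSSVR uses equality constraints and a squared loss, training reduces to solving one linear system in the bias b and the coefficients α. The sketch below shows this standard formulation with an RBF kernel; the names `gamma` (regularization) and `sigma` (kernel width) are this sketch's labels for the two hyperparameters that PSO would tune, and the small Gaussian elimination routine is included only to keep the example free of external libraries.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvr_predict(X_train, b, alpha, x, sigma):
    """f(x) = b + sum_i alpha_i * k(x_i, x)."""
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train))
```

Every training sample receives a nonzero α in general, which is the main practical difference from standard SVR, where only the support vectors contribute.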

3.3.2. CNN Modeling Based on ResNet

The residual neural network (ResNet) is a kind of convolutional neural network (CNN) [59]. CNN consists of the input layer, convolutional layer, pooling layer, fully connected layer, and output layer. The convolutional layer is used to extract features from the input information, which can effectively extract multi-layer special information. In CNN, the fully connected layer is located at the last layer of the CNN network. In the classification task, the fully connected layer scores the information extracted by the convolutional layer to judge the “importance” of the extracted features. The activation function can introduce nonlinear factors into the neural network so that it has the effect of fitting various curves and has the ability to improve network learning [60].

The ResNet algorithm was developed based on the CNN architecture and is widely used in target recognition and feature information extraction. The expressiveness and feature-extraction capabilities of deep neural networks increase as the depth of the network increases. However, when the number of network layers increases beyond a certain point, the model accuracy decreases rapidly [59,61]. To solve this problem of network degradation caused by increasing depth, He et al. proposed the ResNet deep learning framework [37]. This framework introduces identity mappings into the network. Each layer is reformulated to learn a residual function with reference to the layer input, instead of learning an unreferenced function, which simplifies the training of the network, increases the number of information transfer paths, and greatly improves the system’s accuracy.

The network structure used in this experiment is shown in Figure 3. The first layer is the weight layer, which randomly selects 80% of the sample data for modeling. The loop is used to find the principal component corresponding to the maximum mean squared error (MSE) of the subset. Modeling is performed using the selected principal components. We normalized the MSE and took the absolute value, setting the threshold to remove some small values, as the final weight. Using this method not only reduced band redundancy, but also avoided over-fitting under small sample conditions.

The second layer is the convolution layer, for which we used 30 one-dimensional convolution kernels with a length of 3 and a stride of 1. The third layer is the residual structure layer, which consists of three identical residual structures. In each residual structure, the input data pass through two processing flows. (1) The input is convolved with one-dimensional convolution kernels that map n input features to m output features, where n is the number of input features and m is the number of output features; the result is normalized and then passed through the ReLU activation function. (2) The input is convolved with unbiased convolution kernels of size 1 that also map n input features to m output features. Finally, the results of the two flows are added and passed through a ReLU activation. The first residual structure takes 30 input features and outputs 25; the second takes 25 and outputs 20; the third takes 20 and outputs 15. The fourth layer is the fully connected layer: before entering it, the 15 features output by the residual structure must be flattened along the feature dimension. The flattened data are sent to the fully connected layer, which has 26 neurons and a dropout rate of 0.4, so that 40% of the neurons are randomly excluded during training to reduce overfitting. Finally, the output is normalized, activated with ReLU, and sent to an output layer, which uses one neuron to fit the predicted value. The specific hyperparameter settings in ResNet training are shown in Table 2.
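The two processing flows of one residual structure can be sketched in plain Python to make the data flow concrete. This is a forward pass only, written under several assumptions that the text does not confirm: the main-path kernel has an odd length and uses zero padding so that both paths keep the same spectral length, the normalization step is omitted, and biases are left out of both paths. The function names and the nested-list layout are hypothetical.

```python
def conv1d(x, kernels):
    """1D convolution with stride 1 and zero padding that preserves length.
    x: [in_features][length]; kernels: [out_features][in_features][k], k odd."""
    length = len(x[0])
    k = len(kernels[0][0])
    pad = (k - 1) // 2
    out = []
    for w in kernels:
        row = []
        for t in range(length):
            acc = 0.0
            for c, xc in enumerate(x):
                for j in range(k):
                    idx = t + j - pad
                    if 0 <= idx < length:
                        acc += w[c][j] * xc[idx]
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, main_kernels, shortcut_kernels):
    """Flow (1): convolution n -> m, then ReLU (normalization omitted here).
    Flow (2): bias-free size-1 convolution n -> m on the same input.
    Output: ReLU of the element-wise sum of both flows."""
    main = relu(conv1d(x, main_kernels))
    shortcut = conv1d(x, shortcut_kernels)
    summed = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(main, shortcut)]
    return relu(summed)
```

The size-1 convolution on the shortcut is what allows the number of features to change between input and output (30 to 25, 25 to 20, 20 to 15) while still letting the input be added to the main path.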

4. Results and Discussion

4.1. Processing of Spectral Data

4.1.1. Mathematical Transformation of Spectrum

The original spectral curves of 140 samples were subjected to four mathematical transformations: MSC, FD, SD, and LR (the reciprocal logarithmic transformation). Pearson correlation coefficients between the measured soil Fe content and the original soil spectrum and its four transformations were then calculated (Figure 4a–e). In Figure 4a, the correlation coefficient peaks at 1800, 2100, and 2300 nm, but the highest value is only around 0.175; such a low correlation between the original spectrum and Fe content makes modeling difficult. In Figure 4b, the correlation between the spectrum and Fe content is significantly improved, with a strong correlation around 1800 nm and 2300 nm. After MSC transformation (Figure 4c), the correlation coefficient curve shows a strong correlation near 600 nm, 1000 nm, 1500–1800 nm, and 2100–2200 nm, and the correlation of many bands exceeds 0.6. In Figure 4d, the correlation coefficient curve of the SD spectrum and Fe content fluctuates markedly, and highly correlated bands appear only in isolation, e.g., near 1400 nm and 1800 nm, where the correlation reaches 0.7. The correlation after LR transformation is relatively low (Figure 4e); the highest value is only 0.4, indicating a weak correlation.
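The transformations and the band-wise correlation analysis can be sketched as follows. The definitions used here are common textbook forms and are assumptions of this sketch rather than the exact formulas of the paper: FD and SD are taken as simple finite differences between adjacent bands, LR as log10(1/R), and MSC as a per-spectrum linear regression against the mean spectrum. All function names are hypothetical.

```python
import math


def first_difference(spectrum):
    """FD: difference between adjacent bands (band spacing ignored)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def second_difference(spectrum):
    """SD: first difference applied twice."""
    return first_difference(first_difference(spectrum))


def log_reciprocal(spectrum):
    """LR: log10(1 / R); reflectance values must be positive."""
    return [math.log10(1.0 / r) for r in spectrum]


def msc(spectra):
    """MSC: regress each spectrum on the mean spectrum and remove the
    fitted offset and slope."""
    n_bands = len(spectra[0])
    mean = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    mm = sum(mean) / n_bands
    sxx = sum((m - mm) ** 2 for m in mean)
    corrected = []
    for s in spectra:
        sm = sum(s) / n_bands
        slope = sum((m - mm) * (v - sm) for m, v in zip(mean, s)) / sxx
        intercept = sm - slope * mm
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def pearson(x, y):
    """Pearson correlation coefficient; undefined for constant inputs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def band_correlations(spectra, contents):
    """Correlation between each band (column) and the measured content."""
    return [pearson([s[j] for s in spectra], contents)
            for j in range(len(spectra[0]))]
```

Applying `band_correlations` to the output of each transformation gives one correlation curve per transformation, which is the kind of curve compared in Figure 4.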

4.1.2. The Results of Wavelet Packet Enhancement Based on Shannon Entropy

The number of wavelet packet decomposition layers has a great influence on the correlation of the spectra [62]. In this paper, Shannon entropy was used as the feature representation of information entropy to obtain the wavelet-packet-transformed spectral curves. Correlation analysis between each wavelet packet spectral curve and the measured value was then carried out, and the number of decomposition layers giving the spectral curve with the largest correlation coefficient was selected. Figure 5 shows the correlation curves of the four mathematical transformations and their optimal WP transformations. WP (wavelet packet) denotes the correlation curve of the spectrum after WPT, and FD, MSC, SD, and LR denote the correlation curves of the original spectrum after the four mathematical transformations. In Figure 5a, the red dotted line is the correlation curve between the Fe content and the wavelet packet transformation of the FD spectrum. Within the spectral region of 600–1600 nm, the correlation of FD-WP (red dotted line) improved significantly, and the correlation coefficient exceeded 0.6, which was clearly higher than that of FD-OS (green dotted line). The correlation showed a downward trend within the spectral region of 1600–2500 nm; although the correlation at 2400 nm dropped significantly, it still reached a negative correlation of magnitude 0.6. In Figure 5b, the two transformed curves are very close over the entire spectral range; the correlation coefficients of both curves are above 0.6 near the 620 nm band, and the correlation coefficient after OS-MSC transformation is close to 0.7. In the range of 1500–1750 nm, the correlation coefficient after WP-MSC transformation (red dotted line) was close to 0.8, higher than the 0.7 observed for OS-MSC. Figure 5c shows the WP-SD spectrum after wavelet packet transformation; its correlation coefficient curve differs little from that of the original spectrum, but the peak values of the correlation coefficient improved significantly. Figure 5d shows that the correlation coefficient of the spectrum obtained by wavelet packet transformation also differs little from that of the original spectrum and the correlation is not high, but the peak value of the correlation coefficient after WP-LR transformation likewise improved significantly. Comparing the correlation curves of the four kinds of wavelet packet transformation, the highest correlation was observed for WP-MSC, followed by WP-FD and WP-SD. Therefore, this study uses the wavelet packet transformation curves after MSC, FD, and SD transformation for LSSVR modeling and compares them with the models established from the MSC, FD, and SD transformations of the original spectrum.

4.2. Feature Band Extraction Based on CARS Algorithm

The CARS algorithm was used to select the characteristic bands, which were compared with those selected by the commonly used correlation coefficient method. Because the CARS algorithm is random, it was run repeatedly, and the number of occurrences of each band was counted (taking SD as an example). Figure 6a shows the relationship between the RMSE of the optimal subset and the number of iterations for the repeated CARS runs. At the initial stage, the RMSE decreased slowly as the number of iterations increased. Each CARS run reached its smallest RMSE between 15 and 60 iterations, and beyond 60 iterations the RMSE began to increase significantly. Figure 6b plots the RMSE and the number of retained bands against the number of iterations. As the number of iterations increased, the number of bands kept decreasing, but at a slowing rate. When the number of iterations was 52 and the number of bands was 20, the RMSE of the optimal subset was the smallest.
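The slowing decrease in the number of retained bands follows from the exponentially decreasing function that CARS uses to set the retention ratio in each sampling run [22]. The sketch below shows only that schedule, under assumed settings: the band count (2151) and the number of sampling runs (100) are illustrative values, not taken from this study, and the adaptive reweighted sampling step that can remove further bands is omitted.

```python
import math

def cars_retention_ratio(i, n_bands, n_runs):
    """Fraction of bands kept in sampling run i (1-based) under the CARS
    exponentially decreasing function r_i = a * exp(-k * i)."""
    a = (n_bands / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_bands / 2.0) / (n_runs - 1)
    return a * math.exp(-k * i)

def retained_counts(n_bands, n_runs):
    """Number of bands kept in each run; run 1 keeps all, the last run keeps 2."""
    return [
        max(2, round(n_bands * cars_retention_ratio(i, n_bands, n_runs)))
        for i in range(1, n_runs + 1)
    ]

# Illustrative settings (assumed, not from the paper).
counts = retained_counts(n_bands=2151, n_runs=100)
print(counts[:5], counts[-5:])
```

The constants a and k are chosen so that all bands are kept in the first run and only two remain in the last, which is why the curve is steep early on and flattens later.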

Figure 7 is a statistical diagram of the bands selected over 10 runs of the CARS algorithm. The boxes form a histogram of how often each wavelength occurred, and the right side shows the number of selected characteristic bands. The five bands at 617, 2260, 2319, 2327, and 2351 nm appeared in all ten CARS runs, indicating that they were clear characteristic bands that contributed strongly to the inversion of Fe content.

We compared the bands with more than four occurrences with the characteristic bands selected by the correlation coefficient method (Figure 8). Figure 8a–c show the characteristic bands obtained by the CARS algorithm and those selected by the correlation coefficient method after MSC, FD, and SD transformations, respectively. The blue line is the correlation coefficient curve after the MSC, FD, and SD transformation. The red lines mark the bands selected at the extreme points of the correlation coefficient (CC) curve. The green lines mark the bands selected by the CARS algorithm. The bands selected by the correlation coefficient method lay only at the extreme points with high correlation. The CARS algorithm extracted not only bands with high correlation but also some bands with low correlation, so the total number of bands selected by CARS was much larger than that extracted by the correlation coefficient method. The bands extracted by the correlation coefficient method were therefore only a subset of those extracted by the CARS algorithm. This further suggests that, if only correlation analysis is used to select the characteristic bands, some characteristic values will be ignored.
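Counting how often each band survives across repeated CARS runs, and keeping those above an occurrence threshold, can be sketched as follows. This is an illustration only: the five core wavelengths are the ones reported above, while the extra bands (1400 and 900 nm), the run contents, and the function names are made up for the example.

```python
from collections import Counter

def band_frequencies(runs):
    """Count in how many runs each band was selected (one vote per run)."""
    counts = Counter()
    for selected in runs:
        counts.update(set(selected))
    return counts

def stable_bands(runs, more_than):
    """Bands selected in strictly more than `more_than` runs, sorted by wavelength."""
    counts = band_frequencies(runs)
    return sorted(band for band, c in counts.items() if c > more_than)

# Ten hypothetical CARS runs: the core bands appear every time,
# 1400 nm appears in five runs and 900 nm in three.
core = [617, 2260, 2319, 2327, 2351]
runs = []
for i in range(10):
    selected = list(core)
    if i < 5:
        selected.append(1400)
    if i < 3:
        selected.append(900)
    runs.append(selected)

counts = band_frequencies(runs)
stable = stable_bands(runs, more_than=4)
print(stable)
```

A set comparison such as `set(cc_bands) <= set(stable)` would then test whether the bands picked by the correlation coefficient method form a subset of the CARS selection.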

4.3. Analysis of Modeling

4.3.1. PSO-LSSVR Modeling

Under the same mathematical transformation conditions and using the same data set, the PSO algorithm was used to select the optimal LSSVR hyperparameters for the feature bands selected by the Pearson correlation coefficient method and by the CARS algorithm. R², RMSE, and RPD were used to evaluate the accuracy of the models, and 10-fold cross-validation was performed (see Table 3). Table 3 shows that, compared with band selection by the correlation coefficient, the models built on bands selected by the CARS algorithm had a higher R², which improved by 0.072 for the MSC transformation, 0.035 for the FD transformation, and 0.044 for the SD transformation. When the correlation of the bands of the samples was low, the accuracy of the model based on the CARS algorithm was clearly better than that based on the correlation coefficient method. This study also compared LSSVR models built on the FD, MSC, and SD spectra with and without WP transformation (Figure 9). Combining Table 3 and Figure 9a–f, the R² of the models built on FD, SD, and MSC after WP transformation increased by 0.069, 0.081, and 0.132, respectively. The RPD was above 2.1, indicating that the Fe content prediction model established from the soil spectrum after WP transformation achieved better performance.
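For reference, the three evaluation metrics can be computed as in the sketch below. The definitions are the commonly used ones and are an assumption here: in particular, RPD is taken as the sample standard deviation of the measured values (n − 1 denominator) divided by the RMSE, and other conventions exist. The function name and the example values are hypothetical.

```python
import math

def r2_rmse_rpd(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for measured and predicted values."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation of measurements
    return r2, rmse, sd / rmse

# Made-up values: every prediction is 0.5 too high.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
r2, rmse, rpd = r2_rmse_rpd(measured, predicted)
print(round(r2, 3), round(rmse, 3), round(rpd, 3))
```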

The wavelet packet, as an effective method for removing noise during signal processing, has been used by many other scholars for the preprocessing of soil spectra [63,64]. The Shannon entropy wavelet packet transformation avoids the need to select the wavelet coefficients manually, over several attempts, when deciding whether to prune the wavelet packet tree. Because the wavelet packet removed the high-frequency noise introduced by the FD and SD transformations of the soil spectrum well, the accuracy of the LSSVR model on the training set improved greatly (see Table 3). When the soil spectrum was denoised by MSC transformation, which uses spectral average fitting, high-frequency noise was less likely to arise, and the improvement in modeling accuracy from the wavelet packet transformation was not obvious. Therefore, whether the spectrum needs to be smoothed and denoised before the Shannon entropy wavelet packet transformation requires further discussion. Because there are many kinds of wavelet packet transformation, the number of decomposition layers is difficult to determine and requires repeated modeling verification. We therefore tried to set weights from the mean value, variance, and degree of fluctuation of the correlation coefficient curve after wavelet packet transformation. We then chose a quantitative index to evaluate the correlation coefficient curve, so that the number of decomposition layers of the wavelet packet transformation kernel could be selected by optimization, which could make the selection of the number of decomposition layers more efficient.
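An entropy-based pruning decision of this kind can be illustrated with the classic best-basis rule: a node is split only if its children together have a lower Shannon entropy cost than the node itself. The sketch below is an assumption-laden illustration rather than the authors' procedure. It uses a single orthonormal Haar step because that can be written without external libraries, and the function names are hypothetical.

```python
import math

def shannon_cost(coeffs):
    """Additive Shannon entropy cost -sum(c^2 * ln(c^2)); zero terms contribute 0."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def haar_split(coeffs):
    """One orthonormal Haar step: returns the (approximation, detail) children."""
    s = math.sqrt(2.0)
    pairs = range(0, len(coeffs) - 1, 2)
    approx = [(coeffs[i] + coeffs[i + 1]) / s for i in pairs]
    detail = [(coeffs[i] - coeffs[i + 1]) / s for i in pairs]
    return approx, detail

def keep_split(coeffs):
    """True if splitting the node lowers the total entropy cost."""
    approx, detail = haar_split(coeffs)
    return shannon_cost(approx) + shannon_cost(detail) < shannon_cost(coeffs)

smooth = [0.5, 0.5, 0.5, 0.5]   # energy spread evenly: splitting concentrates it
spike = [1.0, 0.0, 0.0, 0.0]    # energy already concentrated: splitting spreads it
print(keep_split(smooth), keep_split(spike))
```

Both example signals have unit energy, so the costs are directly comparable; the smooth signal is worth splitting, while the spike is not.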

4.3.2. CNN Modeling Based on ResNet

To address model overfitting, this paper adopted a CNN with added residual blocks for modeling and compared it with the traditional machine learning model represented by LSSVR. ResNet was built in a Python 3.8 environment with a GTX4050 graphics card, using CUDA in PyTorch. Figure 10a–f present schematic diagrams of the six mathematical transformations and weight layers of the original spectrum. It can be seen that the weight values of the pure FD, MSC, and SD spectral transformations were lower than those of the FD, MSC, and SD spectral transformations after WP transformation. As shown in Figure 10a,b, the WP-FD spectral transformation had weight characteristics similar to those of the single FD spectral transformation: the bands with high weights mainly appeared in the spectral range beyond 2000 nm. However, the weights of the WP-FD transformation values were higher than 10⁻³, and the bands with high weights were more concentrated in small intervals (Figure 10b), whereas only a few weights of the single FD transformation values were higher than 10⁻³ (Figure 10a). The weight value of the MSC spectral transformation changed significantly: the weight value of a single MSC was mostly around 7.5 × 10⁻⁴, while the WP-MSC transformation value was greater than 10⁻³. The band weight remained high especially beyond 1000 nm (Figure 10c,d). Figure 10e,f show that the SD weight value after WP transformation improved across the entire spectral range. Unlike the single FD, MSC, and SD transformations, the distribution of the correlation coefficient curves after the WP transformation was partly blocky rather than made up of jagged peaks. This shows that the WP transformation enhanced the ability to combine feature bands and, at the same time, reduced the adverse effects on modeling of abnormal weight values caused by high-frequency noise.

To explore how the feature bands changed at different stages of ResNet training, we took the FD transformation, which gave the best results, as an example. Figure 11 shows the training result of WP-FD; the orange line is the MSE of the training set and the blue line is the MSE of the validation set. In the early stage of training, the accuracy of the model on both the training set and the validation set was very low. After about 150 training iterations, the training set began to stabilize, and the Pearson correlation coefficient between the training set and the validation set settled at around 0.80. This shows that the CNN with the residual block largely avoided overfitting. The optimal model was saved, and hooks were used to extract the output and gradient of the last layer of the residual structure.
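Extracting the output and gradient of a residual layer with hooks can look roughly like the following in PyTorch. This is a minimal sketch under assumptions: the block layout, channel count, spectrum length, and the names `ResidualBlock1D`, `save_output`, and `save_grad` are all hypothetical and do not reproduce the network in Figure 3; random tensors stand in for spectra and Fe content.

```python
import torch
from torch import nn

class ResidualBlock1D(nn.Module):
    """Two 1D convolutions with an identity shortcut (illustrative layout)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.act(self.conv1(x)))
        return self.act(out + x)  # shortcut: add the block input back

captured = {}

def save_output(module, inputs, output):
    # Forward hook: keep a detached copy of the block's feature maps.
    captured["features"] = output.detach()

def save_grad(module, grad_input, grad_output):
    # Backward hook: keep the gradient flowing back into the block's output.
    captured["grad"] = grad_output[0].detach()

torch.manual_seed(0)
block = ResidualBlock1D(channels=4)
head = nn.Linear(4 * 32, 1)
block.register_forward_hook(save_output)
block.register_full_backward_hook(save_grad)

x = torch.randn(8, 4, 32, requires_grad=True)  # 8 samples, 4 channels, 32 bands
y = torch.randn(8, 1)                          # stand-in target values
loss = nn.functional.mse_loss(head(block(x).flatten(1)), y)
loss.backward()
print(captured["features"].shape, captured["grad"].shape)
```

Plotting `captured["features"]` and `captured["grad"]` per band at several checkpoints would give the kind of per-stage view discussed for Figure 12.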

Figure 12 visualizes the changes in the 10th feature of the residual network during training. Figure 12a,c,e show the difference between the spectrum after processing by the residual structure and the spectrum after processing by the weight layer, from which the learning rules of the residual structure for one-dimensional spectral data can be obtained. In Figure 12b,d,f, the blue area is the difference between the spectrum and the weight value after residual structure processing, which reveals the specific weight changes in different training periods. The yellow and red lines represent gradient values greater than 0 and less than 0 in the backward feedback, respectively. The smaller their fluctuation, the smaller the backward feedback of the corresponding band, which means that the contribution of that band to the residual layer during this period was smaller. Figure 12b shows that the backward feedback in the early stage of training was random, the blue area was large, and the feedback was distributed over the entire spectral range. In the middle stage of training (Figure 12d), the backward feedback was concentrated in the wavelength ranges of 500–1300 nm and 2300–2500 nm, the blue spectrum difference area was significantly reduced, and the features began to match the spectrum after weighting. In the late stage of training (Figure 12f), the backward feedback started to decrease, the feedback area was more concentrated, and the blue area also decreased rapidly.

By visualizing the features of the residual structure and the backward feedback gradient of the extracted 10th feature (Figure 12b,d,f), we can see that, without the residual layer, some bands may have been over-reinforced during learning because of vanishing gradients, while some weak features would have been ignored, leading to overfitting. We believe that it is necessary to visualize the training process of neural networks. By visualizing the specific changes in band weights during the training of the residual structure, the characteristic band that contributed most to the retrieval of soil iron content can be found. Visualization also gives a better understanding of the impact of each parameter in the model, helps in analyzing the learning behavior and parameter sensitivity of the model, and provides guidance for adjusting the model, optimization algorithm, and feature extraction [65]. Owing to the small amount of data, the design of the CNN structure was relatively simple. To remove redundant spectra and accelerate learning, a weight layer was used, and bands with small coefficients in the PLS model were assigned a value of 0. However, the 15 characteristic curves show that some bands assigned a value of 0 still took part in the training. We expect that performance would improve if these bands with smaller model coefficients were eliminated directly.
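The weight layer and the suggested alternative of dropping zero-weight bands can be sketched as below. This is a hypothetical illustration: the normalization by the largest absolute coefficient, the cut-off `min_weight`, the function names, and the coefficient values are all assumptions, since the exact normalization and threshold are not given here.

```python
def initial_band_weights(coefficients, min_weight):
    """Normalized absolute model coefficients; bands below min_weight are set to 0."""
    magnitudes = [abs(c) for c in coefficients]
    peak = max(magnitudes)
    if peak == 0.0:
        return [0.0] * len(magnitudes)
    weights = [m / peak for m in magnitudes]
    return [w if w >= min_weight else 0.0 for w in weights]

def drop_zero_bands(spectrum, weights):
    """Alternative to zero weighting: remove zero-weight bands from the input."""
    return [value for value, w in zip(spectrum, weights) if w > 0.0]

# Made-up regression coefficients for four bands.
coefficients = [0.2, -1.0, 0.05, 0.5]
weights = initial_band_weights(coefficients, min_weight=0.1)
reduced = drop_zero_bands([10.0, 20.0, 30.0, 40.0], weights)
print(weights, reduced)
```

With zero weighting the network still receives every band as input, whereas `drop_zero_bands` shortens the input itself, which is the change proposed in the paragraph above.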

Figure 13 compares the modeling results of CNN and WP-CNN. The one-dimensional CNN models of FD, MSC, and SD after wavelet packet transformation were more accurate than both the models without WP transformation and the LSSVR models, and WP-FD-CNN had the highest accuracy. Figure 13a,b show that, compared with the FD spectral data without wavelet packet transformation, the R² after WP-FD transformation increased by 0.023, RMSE decreased by 0.067, and RPD increased by 0.136. Compared with WP-FD-CARS-LSSVR, the R² of WP-FD-CNN was 0.027 higher, RMSE was reduced by 0.601, and RPD was reduced by 0.13.

The results show that the 1D-CNN with added residual blocks still achieved high prediction accuracy under small sample conditions. The main possible reasons are as follows. First, we established a predictive relationship with the soil Fe content based on spectral information. Unlike with two-dimensional images, we only used one-dimensional regression fitting. The phenomena of “same substance, different spectra” and “different substances, same spectrum” may have arisen because the spectral characteristic curves change little between different features. This made it difficult for the deep learning network to fit the relevant parameters, leading to spectral mismatch problems. Second, the residual block structure in the ResNet network increased the depth of the model network. This enabled the model to learn the characteristics of the spectral curve more deeply while avoiding the model instability and accuracy loss caused by increased network depth [66,67]. Third, in the establishment of the weighted layer, we used the MSE with the largest subset of 1600 times for modeling, removed frequency bands with smaller MSE coefficients, and used the normalized absolute value of the MSE coefficient as the initial weight. Fourth, L2 regularization was applied to the model. By imposing penalty terms on the parameters and favoring smaller parameter values to reduce the covariance between features, the complexity of the model was limited and the risk of overfitting was reduced. This allowed the model to adapt better to new data during training and improved generalization [68].
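The effect of the L2 penalty can be shown with a small stand-alone sketch. It assumes the usual form, loss = MSE + λ·Σw², and a plain gradient descent update; the penalty strength, learning rate, and function names are hypothetical and are not the settings used in this study.

```python
def l2_penalized_loss(y_true, y_pred, weights, lam):
    """Mean squared error plus an L2 penalty lam * sum(w^2) on the parameters."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mse + lam * sum(w * w for w in weights)

def sgd_step_with_l2(weights, data_grads, lr, lam):
    """One gradient step; the penalty adds 2 * lam * w to each data gradient."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, data_grads)]

weights = [1.0, -2.0]
loss = l2_penalized_loss([1.0, 2.0], [1.0, 2.0], weights, lam=0.5)
# With a zero data gradient, the penalty alone shrinks every weight toward 0.
shrunk = sgd_step_with_l2(weights, data_grads=[0.0, 0.0], lr=0.1, lam=0.5)
print(loss, shrunk)
```

Each step multiplies the weights by (1 − 2·lr·λ) before the data gradient is applied, which is how the penalty keeps parameter values small.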

5. Conclusions

In this paper, hyperspectral data were used for the quantitative prediction of soil total Fe. Based on the Fe content and spectral characteristics of the soil samples, abnormal samples were screened by the partial least squares method combined with the Monte Carlo plot of the particle swarm optimization algorithm. WPT was then applied to the MSC, FD, SD, and LR transformations of the soil spectrum, and the change in spectral correlation after wavelet packet transformation was analyzed. Finally, the characteristic bands of MSC, FD, and SD were extracted and a prediction model of the soil Fe content was established. This study reached the following conclusions:

(1): When machine learning is used for the quantitative inversion modeling of soil composition from hyperspectral data, the sample content and spectral characteristics should be combined in several rounds of pre-modeling before the box plot is used to remove outliers, until the optimal sample modeling method is selected.
(2): Compared with the Pearson correlation coefficient method, extracting feature bands with the CARS algorithm increased R² by 0.035–0.072. The feature bands selected by the correlation coefficient were essentially a subset of those selected by CARS. This shows that the bands selected by the Pearson correlation coefficient method carried a high degree of information redundancy and may have omitted many important characteristic bands, which lowered the accuracy of the resulting model.
(3): Among the optimal models trained by LSSVR, WP-FD had the highest accuracy. After the Shannon entropy wavelet packet transformation, the R² of WP-FD increased by 0.069, the R² of WP-MSC increased by 0.132, and the R² of WP-SD increased by 0.081. This shows that the Shannon entropy wavelet packet transformation had a significant positive impact on the modeling accuracy. In ResNet training, the R² of WP-FD-CNN was 0.023 higher than that of the FD-CNN model, and 0.073 and 0.027 higher than those of FD-CARS-LSSVR and WP-FD-CARS-LSSVR, respectively. This shows that the residual neural network still had high retrieval accuracy with a small sample size.

Author Contributions

Conceptualization, W.L. and H.H.; methodology, W.L. and H.H.; software, W.L.; validation, W.L. and P.Z.; formal analysis, W.L. and P.Z.; investigation, M.L.; resources, Y.W.; data curation, M.L.; writing—original draft preparation, W.L.; writing—review and editing, W.L., P.Z. and H.H.; visualization, W.L.; supervision, P.Z.; project administration, P.Z.; funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the social service project completed in collaboration between Beijing Dadi Kaiyuan Geology Engineering Co., Ltd. and China University of Geosciences (Beijing), grant number H02691, and the State Administration of Science, Technology and Industry for National Defence, PRC, Subproject of major projects: H02697.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Viscarra Rossel, R.A.; Bui, E.N.; De Caritat, P.; McKenzie, N.J. Mapping iron oxides and the color of Australian soil using visible–near-infrared reflectance spectra. J. Geophys. Res. Earth Surf. 2010, 115, F4.
2. Zhu, F.; Li, Y.; Xue, S.; Hartley, W.; Wu, H. Effects of iron-aluminium oxides and organic carbon on aggregate stability of bauxite residues. Environ. Sci. Pollut. Res. 2016, 23, 9073–9081.
3. Viscarra Rossel, R.A.; Walvoort, D.J.J.; McBratney, A.B.; Janik, L.J.; Skjemstad, J.O. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 2006, 131, 59–75.
4. Demattê, J.A.M.; Sousa, A.A.; Alves, M.C.; Nanni, M.R.; Fiorio, P.R.; Campos, R.C. Determining soil water status and other soil characteristics by spectral proximal sensing. Geoderma 2006, 135, 179–195.
5. Reeves, J.B. Near-versus mid-infrared diffuse reflectance spectroscopy for soil analysis emphasizing carbon and laboratory versus on-site analysis: Where are we and what needs to be done? Geoderma 2010, 158, 3–14.
6. Ben-Dor, E.; Banin, A. Near-infrared analysis as a rapid method to simultaneously evaluate several soil properties. Soil Sci. Soc. Am. J. 1995, 59, 364–372.
7. Shepherd, K.D.; Walsh, M.G. Development of reflectance spectral libraries for characterization of soil properties. Soil Sci. Soc. Am. J. 2002, 66, 988–998.
8. Nishad, A.; Upadhyay, A.; Ravi Shankar Reddy, G.; Bajaj, V. Classification of epileptic EEG signals using sparse spectrum based empirical wavelet transform. Electron. Lett. 2020, 56, 1370–1372.
9. Martini, A.; Pankin, I.A.; Marsicano, A.; Lomachenko, K.A.; Borfecchia, E. Wavelet analysis of a Cu-oxo zeolite EXAFS simulated spectrum. Radiat. Phys. Chem. 2020, 175, 108333.
10. Zheng, L.; Li, M.; An, X.; Pan, L.; Sun, H. Spectral feature extraction and modeling of soil total nitrogen content based on NIR technology and wavelet packet analysis. Multispectral Hyperspectral Ultraspectral Remote Sens. Technol. Tech. Appl. III 2010, 7857, 362–369.
11. Liu, W.; Chang, Q.R.; Guo, M.; Xing, D.X.; Yuan, Y.S. Extraction of first derivative spectrum features of soil organic matter via wavelet de-noising. Spectrosc. Spectr. Anal. 2011, 31, 100–104.
12. Meng, X.; Bao, Y.; Liu, J.; Liu, H.; Zhang, X.; Zhang, Y.; Wang, P.; Tang, H.; Kong, F. Regional soil organic carbon prediction model based on a discrete wavelet analysis of hyperspectral satellite data. Int. J. Appl. Earth Obs. Geoinf. 2020, 89, 102111.
13. Wickerhauser, M.V. INRIA Lectures on Wavelet Packet Algorithms; Yale University Department of Mathematics: New Haven, CT, USA, 1991.
14. Zheng, L.H.; Li, M.Z.; Pan, L.; Sun, J.Y.; Tang, N. Application of wavelet packet analysis in estimating soil parameters based on NIR spectra. Spectrosc. Spectr. Anal. 2009, 29, 1549–1552.
15. Narváez, P.; Gutierrez, S.; Percybrooks, W.S. Automatic segmentation and classification of heart sounds using modified empirical wavelet transform and power features. Appl. Sci. 2020, 10, 4791.
16. Zhang, R.; Li, Z.F.; Pan, J.J. Coupling discrete wavelet packet transformation and local correlation maximization improving prediction accuracy of soil organic carbon based on hyperspectral reflectance. Trans. Chin. Soc. Agric. Eng. 2017, 33, 175–181.
17. Shen, L.; Gao, M.; Yan, J.; Li, Z.-L.; Leng, P.; Yang, Q.; Duan, S.-B. Hyperspectral Estimation of Soil Organic Matter Content using Different Spectral Preprocessing Techniques and PLSR Method. Remote Sens. 2020, 12, 1206.
18. Bai, Z.; Xie, M.; Hu, B.; Luo, D.; Wan, C.; Peng, J.; Shi, Z. Estimation of Soil Organic Carbon Using Vis-NIR Spectral Data and Spectral Feature Bands Selection in Southern Xinjiang, China. Sensors 2022, 22, 6124.
19. Xie, S.; Ding, F.; Chen, S.; Wang, X.; Li, Y.; Ma, K. Prediction of soil organic matter content based on characteristic band selection method. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2022, 273, 120949.
20. Hong, Y.; Chen, Y.; Yu, L.; Liu, Y.; Liu, Y.; Zhang, Y.; Liu, Y.; Cheng, H. Combining fractional order derivative and spectral variable selection for organic matter estimation of homogeneous soil samples by VIS–NIR spectroscopy. Remote Sens. 2018, 10, 479.
21. Xie, S.; Li, Y.; Wang, X.; Liu, Z.; Ma, K.; Ding, L. Research on estimation models of the spectral characteristics of soil organic matter based on the soil particle size. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2021, 260, 119963.
22. Li, H.; Liang, Y.; Xu, Q.; Cao, D. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 2009, 648, 77–84.
23. Wei, L.; Yuan, Z.; Zhong, Y.; Yang, L.; Hu, X.; Zhang, Y. An improved gradient boosting regression tree estimation model for soil heavy metal (Arsenic) pollution monitoring using hyperspectral remote sensing. Appl. Sci. 2019, 9, 1943.
24. Cherkassky, V.; Ma, Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 2004, 17, 113–126.
25. Zhang, L.; Juyang, L.; Qilin, Z.; Yudong, W. Using genetic algorithm to optimize parameters of support vector machine and its application in material fatigue life prediction. Adv. Nat. Sci. 2015, 8, 21–26.
26. Awad, M.; Khanna, R. Support Vector Regression. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Apress: Berkeley, CA, USA, 2015; pp. 67–80.
27. Huang, C.L.; Dun, J.F. A distributed PSO–SVM hybrid system with feature selection and parameter optimization. Appl. Soft Comput. 2008, 8, 1381–1391.
28. Van Gestel, T.; Suykens, J.A.; Baesens, B.; Viaene, S.; Vanthienen, J.; Dedene, G.; Demoor, B.; Vandewalle, J. Benchmarking least squares support vector machine classifiers. Mach. Learn. 2004, 54, 5–32.
29. Guo, X.C.; Yang, J.H.; Wu, C.G.; Wang, C.Y.; Liang, Y.C. A novel LS-SVMs hyper-parameter selection based on particle swarm optimization. Neurocomputing 2008, 71, 3211–3215.
30. Lin, T.; Wu, P.; Gao, F.; Yu, Y.; Wang, L. Study on SVM temperature compensation of liquid ammonia volumetric flowmeter based on variable weight PSO. Int. J. Heat Technol. 2015, 33, 151–156.
31. Xu, X.; Du, C.; Ma, F.; Qiu, Z.; Zhou, J. A Framework for High-Resolution Mapping of Soil Organic Matter (SOM) by the Integration of Fourier Mid-Infrared Attenuation Total Reflectance Spectroscopy (FTIR-ATR), Sentinel-2 Images, and DEM Derivatives. Remote Sens. 2023, 15, 1072.
32. Yang, P.; Hu, J.; Hu, B.; Luo, D.; Peng, J. Estimating Soil Organic Matter Content in Desert Areas Using In Situ Hyperspectral Data and Feature Variable Selection Algorithms in Southern Xinjiang, China. Remote Sens. 2022, 14, 5221.
33. Dian, R.; Li, S.; Kang, X. Regularizing hyperspectral and multispectral image fusion by CNN denoiser. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1124–1135.
34. Dian, R.; Guo, A.; Li, S. Zero-Shot Hyperspectral Sharpening. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12650–12666.
35. Ng, W.; Minasny, B.; Montazerolghaem, M.; Padarian, J.; Ferguson, R.; Bailey, S.; McBratney, A.B. Convolutional neural network for simultaneous prediction of several soil properties using visible/near-infrared, mid-infrared, and their combined spectra. Geoderma 2019, 352, 251–267.
36. Xu, Z.; Zhao, X.; Guo, X.; Guo, J. Deep learning application for predicting soil organic matter content by VIS-NIR spectroscopy. Comput. Intell. Neurosci. 2019, 2019, 3563761.
37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
38. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image classification. J. Sens. 2015, 2015, 1–12.
39. Fang, N.; Yang, Z.; Liu, G.; Dai, M.; Liu, K. Spatial heterogeneity and influencing factors of the ecological stoichiometry of soil nitrogen and phosphorus in the Jiansanjiang area. Geophys. Geochem. Explor. 2022, 46, 1121–1131.
40. Li, X.; Coles, B.J.; Ramsey, M.H.; Thornton, I. Sequential extraction of soils for multielement analysis by ICP-AES. Chem. Geol. 1995, 124, 109–123.
41. Danner, M.; Locherer, M.; Hank, T.; Richter, K. Spectral Sampling with the ASD FieldSpec 4 – Theory, Measurement, Problems, Interpretation; EnMAP Field Guides Technical Report; GFZ Data Services: Potsdam, Germany, 2015.
42. Tsai, F.; Philpot, W. Derivative analysis of hyperspectral data. Remote Sens. Environ. 1998, 1, 41–51.
43. Marini, F.; Walczak, B. Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 2015, 149, 153–165.
44. Liu, C.; Sun, X.; Wu, J.; Wu, S.; Miao, Y. Study on Elimination of Abnormal Wheat Powder Samples Based on NIR. J. Agric. Mech. Res. 2014, 36, 46–48+56.
45. Zhou, W.; Yang, H.; Xie, L.; Li, H.; Huang, L.; Zhao, Y.; Yue, T. Hyperspectral inversion of soil heavy metals in Three-River Source Region based on random forest model. Catena 2021, 202, 105222.
46. Liu, J.; Zhang, Y.; Wang, H.; Du, Y. Study on the prediction of soil heavy metal elements content based on visible near-infrared spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2018, 199, 43–49.
47. Vašát, R.; Kodešová, R.; Klement, A.; Borůvka, L. Simple but efficient signal pre-processing in soil organic carbon spectroscopic estimation. Geoderma 2017, 298, 46–53.
48. Yen, G.G.; Lin, K.C. Wavelet packet feature extraction for vibration monitoring. IEEE Trans. Ind. Electron. 2000, 3, 650–667.
49. Coifman, R.R.; Meyer, Y.; Quake, S.; Wickerhauser, M.V. Signal Processing and Compression with Wavelet Packets. In Wavelets and Their Applications; Springer: Dordrecht, The Netherlands, 1994; pp. 363–379.
50. Jiang, C.; Qian, L.; Wu, Z.; Wen, Y.; Deng, N. Multi-scale correlation analysis of soil organic carbon with its influence factors using wavelet transform. Chin. J. Appl. Ecol. 2013, 24, 3415–3422.
51. Hoang, V.D. Wavelet-based spectral analysis. TrAC Trends Anal. Chem. 2014, 62, 144–153.
52. Zheng, L.H.; Li, M.Z.; Sun, H. Development of an analyzing system for soil parameters based on NIR spectroscopy. Spectrosc. Spectr. Anal. 2009, 29, 2633–2636.
53. Qu, D.; Li, W.; Zhang, Y.; Sun, B.; Zhong, Y.; Liu, J.; Yu, D.; Li, M. Support vector machines combined with wavelet-based feature extraction for identification of drugs hidden in anthropomorphic phantom. Measurement 2013, 46, 284–293.
54. Walczak, B.; Van Den Bogaert, B.; Massart, D.L. Application of wavelet packet transform in pattern recognition of near-IR data. Anal. Chem. 1996, 68, 1742–1747.
55. Ali, K.M.; Khan, M. Application based construction and optimization of substitution boxes over 2D mixed chaotic maps. Int. J. Theor. Phys. 2019, 58, 3091–3117.
56. Cai, L.; Ding, J. Wavelet transformation coupled with CARS algorithm improving prediction accuracy of soil moisture content based on hyperspectral reflectance. Trans. Chin. Soc. Agric. Eng. 2017, 33, 144–151.
57. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
58. Liu, S.; Xu, L.; Jiang, Y.; Li, D.; Chen, Y.; Li, Z. A hybrid WA–CPSO-LSSVR model for dissolved oxygen content prediction in crab culture. Eng. Appl. Artif. Intell. 2014, 29, 114–124.
59. Wu, Z.; Shen, C.; Van Den Hengel, A. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognit. 2019, 90, 119–133.
60. Qu, J.; Yu, L.; Yuan, T.; Tian, Y.; Gao, F. Adaptive fault diagnosis algorithm for rolling bearings based on one-dimensional convolutional neural network. Chin. J. Sci. Instrum. 2018, 39, 134–143.
61. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
62. Hanbay, D.; Turkoglu, I.; Demir, Y. Prediction of wastewater treatment plant performance based on wavelet packet decomposition and neural networks. Expert Syst. Appl. 2008, 2, 1038–1043.
63. Hinge, G.; Piplodiya, J.; Sharma, A.; Hamouda, M.A.; Mohamed, M.M. Evaluation of Hybrid Wavelet Models for Regional Drought Forecasting. Remote Sens. 2022, 14, 6381.
64. Shen, L.; Gao, M.; Yan, J.; Wang, Q.; Shen, H. Winter Wheat SPAD Value Inversion Based on Multiple Pretreatment Methods. Remote Sens. 2022, 14, 4660.
65. Baldi, P. Gradient descent learning algorithm overview: A general dynamical systems perspective. IEEE Trans. Neural Netw. 1995, 6, 182–195.
66. Zeng, P.; Song, X.; Yang, H.; Wei, N.; Du, L. Digital Soil Mapping of Soil Organic Matter with Deep Learning Algorithms. ISPRS Int. J. Geo-Inf. 2022, 11, 299.
67. Zhao, Y.; Zhang, X.; Feng, W.; Xu, J. Deep Learning Classification by ResNet-18 Based on the Real Spectral Dataset from Multispectral Remote Sensing Images. Remote Sens. 2022, 14, 4883.
68. Liu, H.; Cui, Y.; Wang, J.; Yu, H. Analysis and Research on Rice Disease Identification Method Based on Deep Learning. Sustainability 2023, 15, 9321.

Figure 1. (a) Location of the study area; (b) spatial distribution of sampling points.

Figure 2. Monte Carlo scatter plot. The red box shows the selected sample points.

Figure 3. Schematic diagram of neural network structure.

Figure 4. Correlation coefficients between soil Fe content and the spectral curve: (a) original spectrum; (b) after FD transformation; (c) after MSC transformation; (d) after SD transformation; (e) after LR transformation.

Figure 5. Correlation curves before and after wavelet packet (WP) processing: (a) FD and WP-FD; (b) MSC and WP-MSC; (c) SD and WP-SD; (d) LR and WP-LR.

Figure 6. (a) Relationship between the number of iterations and RMSE for the optimal subset across the 10 CARS runs; (b) relationship between the number of iterations, RMSE, and the number of selected bands in a single CARS run after removing abnormal samples. The red line is the RMSE curve and the green line is the number of selected bands.

Figure 7. Selection statistics of the bands chosen across the 10 CARS runs.

Figure 8. Feature bands selected by CARS algorithm and correlation coefficient method after MSC, FD, and SD transformation. (a) Bands selected by the FD-CARS method and the correlation coefficient method. (b) Bands selected by the MSC-CARS method and the correlation coefficient method. (c) Bands selected by the SD-CARS method and the correlation coefficient method.

Figure 9. LSSVR modeling diagram (the best result of 10 cross-validations): (a) FD; (b) WP-FD; (c) MSC; (d) WP-MSC; (e) SD; (f) WP-SD. The red line is a straight line with a slope of 1 and the yellow line is a regression line.

Figure 10. Schematic diagram of band weights and correlations under different mathematical transformations: (a) FD; (b) WP-FD; (c) MSC; (d) WP-MSC; (e) SD; and (f) WP-SD. The red curve is the weight curve after mathematical transformation, and the blue curve is the reflectance of each band after mathematical transformation.

Figure 11. R² of the training set and the validation set as the number of iterations increases in ResNet.

Figure 12. Schematic diagram and backward feedback diagram of the features of the tenth residual structure. The red line is the spectrum after the weight layer, and the blue line is the spectrum after the residual layer. The blue area is the spectral difference between the weight layer spectrum and the residual layer spectrum. In the graph on the right, the yellow line is the feedback gradient greater than 0 and the green line is the feedback gradient less than 0. Red and blue lines use the left scale, and yellow and green lines use the right scale.

Figure 13. CNN model and WP-CNN model R², RMSE, and RPD. The red line is a straight line with a slope of 1. The yellow line is the regression line.
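
The three statistics named in the Figure 13 caption can be computed as in the minimal sketch below. It assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPD as the sample standard deviation of the observed values divided by RMSE. The function name `regression_metrics` and the sample values are hypothetical, and the article may use a slightly different RPD convention.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed and predicted values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / len(observed))
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: RPD uses the sample standard deviation (n - 1 denominator).
    rpd = statistics.stdev(observed) / rmse
    return r2, rmse, rpd


# Hypothetical values for illustration only (not data from the study).
observed = [3.1, 4.0, 5.2, 6.1, 7.0]
predicted = [3.0, 4.2, 5.0, 6.3, 6.9]
r2, rmse, rpd = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.3f}")
```

A regression line close to the 1:1 line in such a scatter plot corresponds to a high R² and a low RMSE, and therefore to a high RPD.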

Table 1. Particle swarm optimization parameters.

Method	Number of Particles	Max Iterations	Inertia Weight	Acceleration Coefficients	Topology	Fitness Function	Binary Mask
Feature extraction	20	300	0.9	c₁ = 1.5, c₂ = 1.5	Global topology	PLS validation accuracy	0.5
Hyperparameter tuning	20	300	0.8	c₁ = 2.0, c₂ = 2.0	Global topology	PLSVR validation accuracy	0.5
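
To make the roles of these parameters concrete, the following minimal sketch runs a global-topology PSO with the "Feature extraction" settings from Table 1 (20 particles, 300 iterations, inertia weight 0.9, c₁ = c₂ = 1.5). Several parts are assumptions rather than the authors' implementation: the binary mask value 0.5 is read as a threshold on continuous positions in [0, 1], and `toy_fitness` is a hypothetical stand-in for the PLS validation accuracy used in the article. All function names are illustrative.

```python
import random


def binary_mask(position, threshold=0.5):
    """Assumed reading of 'Binary Mask 0.5': keep a band if position > 0.5."""
    return [1 if x > threshold else 0 for x in position]


def toy_fitness(mask, informative=(0, 3, 5)):
    """Hypothetical fitness: reward informative bands, penalise extra ones."""
    hits = sum(mask[i] for i in informative)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


def pso_select(n_bands, fitness, n_particles=20, max_iter=300,
               w=0.9, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over continuous positions, scored through the mask."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_bands)] for _ in range(n_particles)]
    vel = [[0.0] * n_bands for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(binary_mask(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(n_bands):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp to [0, 1] so the 0.5 threshold stays meaningful.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(binary_mask(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return binary_mask(gbest), gbest_fit


mask, score = pso_select(n_bands=8, fitness=toy_fitness)
print(mask, score)
```

Swapping in the "Hyperparameter tuning" row would only change the keyword arguments (`w=0.8, c1=2.0, c2=2.0`) and the fitness function.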

Table 2. Hyperparameter settings in the training of ResNet.

Hyperparameters	Value
Batch size	30
Loss	MSE
Weight decay	L2 regularization
Constraint rate	0.03
Optimizer	Adam
Learning rate	0.05
Learning rate decay	0.4
Max epochs	500

Table 3. Comparison of CARS and Correlation Coefficient Method in Selecting Feature Bands for Modeling.

Feature Extraction Method	Spectral Pretreatment	Wavelet Packet Transformation	Calibration R²	Validation R²
CARS	FD	No	0.690	0.690
CARS	FD	Yes	0.749	0.748
CARS	MSC	No	0.639	0.647
CARS	MSC	Yes	0.641	0.640
CARS	SD	No	0.645	0.643
CARS	SD	Yes	0.756	0.754
Correlation Coefficient	FD	No	0.655	0.654
Correlation Coefficient	FD	Yes	0.688	0.688
Correlation Coefficient	MSC	No	0.567	0.567
Correlation Coefficient	MSC	Yes	0.602	0.601
Correlation Coefficient	SD	No	0.601	0.600
Correlation Coefficient	SD	Yes	0.712	0.712

Note: The calibration R² and validation R² in the table are the mean values of the 10 cross-validations.
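
As a reading aid for Table 3, the short sketch below computes the change in mean validation R² when the wavelet packet transformation is applied, for each combination of feature extraction method and pretreatment. The numbers are copied from the table; the variable names are illustrative.

```python
# Mean validation R² from Table 3:
# (method, pretreatment) -> (without wavelet packet, with wavelet packet)
validation_r2 = {
    ("CARS", "FD"): (0.690, 0.748),
    ("CARS", "MSC"): (0.647, 0.640),
    ("CARS", "SD"): (0.643, 0.754),
    ("Correlation Coefficient", "FD"): (0.654, 0.688),
    ("Correlation Coefficient", "MSC"): (0.567, 0.601),
    ("Correlation Coefficient", "SD"): (0.600, 0.712),
}

# Gain in validation R² attributable to the wavelet packet step.
gains = {key: round(with_wp - without_wp, 3)
         for key, (without_wp, with_wp) in validation_r2.items()}

for (method, pretreatment), gain in gains.items():
    print(f"{method:<24}{pretreatment:<5}{gain:+.3f}")

improved = [key for key, gain in gains.items() if gain > 0]
best = max(gains, key=gains.get)
```

By this arithmetic, the transformation raises the mean validation R² in five of the six combinations, with the largest gains for SD (about 0.11 under both selection methods), while CARS with MSC decreases slightly (0.647 to 0.640).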

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

