Article

Application of Hyperspectral Technology Combined with Genetic Algorithm to Optimize Convolution Long- and Short-Memory Hybrid Neural Network Model in Soil Moisture and Organic Matter

School of Mechanical and Electrical Engineering, Shihezi University, Shihezi 832003, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(20), 10333; https://doi.org/10.3390/app122010333
Submission received: 11 September 2022 / Revised: 26 September 2022 / Accepted: 30 September 2022 / Published: 13 October 2022
(This article belongs to the Section Agricultural Science and Technology)

Abstract

A method of soil moisture and organic matter content detection based on hyperspectral technology is proposed. A total of 800 different soil samples and hyperspectral data were collected in the laboratory and from the field. A hyperspectral database was established. After wavelet denoising and principal component analysis (PCA) preprocessing, the convolutional neural network (CNN) module was first used to extract the wavelength features of the data. Then, the long- and short-memory neural network (LSTM) module was used to extract the feature bands and nearby hidden state vectors. At the same time, the genetic algorithm (GA) was used to optimize the hyperparametric weight and bias value of the LSTM training network. At the initial stage, the data were normalized, and all features were analyzed by grey correlation degree to extract important features and to reduce the computational complexity of the data. Then, the GA-optimized CNN-LSTM hybrid neural network (GA-CNN-LSTM) algorithm model proposed in this paper was used to predict soil moisture and organic matter. The prediction performance was compared with CNN, support vector regression (SVR), and CNN-LSTM hybrid neural network model without GA optimization. The GA-CNN-LSTM algorithm was superior to other models in all indicators. The highest accuracy rates of 94.5% and 92.9% were obtained for soil moisture and organic matter, respectively. This method can be applied to portable hyperspectrometers and unmanned aerial vehicles to realize large-scale monitoring of moisture and organic matter distribution and to provide a basis for rational irrigation and fertilization in the future.

1. Introduction

1.1. Background and Motivation

Agriculture is the foundation of human survival. The total global population has exceeded 7.5 billion, and it is expected to reach 11 billion by 2100 [1]. In 2021, an estimated 702 million to 828 million people worldwide faced hunger [2]. With the development of human society and the economy, limited resources cannot meet the needs of the ever-increasing population [3]. Under the influence of multiple natural and socioeconomic factors, problems such as declining productivity, land degradation, and soil erosion are emerging one after another, and mankind is facing severe challenges [4,5]. As a natural resource for agricultural production, soil is a natural complex containing a variety of components that provide the moisture and nutrients necessary for plant growth [6,7]. Soil organic matter (SN) comprises essential nutrients provided by the soil for plant growth. Mineral nutrients in the soil can be absorbed by plant roots directly or after transformation. These comprise 13 elements: nitrogen, phosphorus, potassium, calcium, magnesium, sulfur, iron, boron, molybdenum, zinc, manganese, copper, and chlorine [8,9]. Generally, the contents of nitrogen, phosphorus, and potassium are used as an index of soil fertility. Fast and accurate monitoring of SN content plays an important role in research fields such as precision fertilization of farmland [10,11]. Changes in soil moisture (SM) have an important impact on the physical and chemical properties of the soil and on the growth and development of vegetation [12,13]. Accurate knowledge of the temporal and spatial distribution of soil moisture and its changes is of great significance for monitoring crop growth, agricultural moisture, and soil drought.
SN mainly comes from plant, animal, and microbial residues, among which higher plants are the main source. The first organisms to appear in the parent material of the original soil were microorganisms. As biological evolution and the soil-forming process progressed, animal and plant residues and their secretions became the basic source of SN [14]. In natural soil, the main sources of SN are the residues and roots of ground vegetation, such as trees, shrubs, and grasses, which provide a large amount of organic residue to the soil every year [15]. In agricultural soil, the sources of SN are varied and mainly include crop stubble, straw returned to the field, and green manure turned under; the manure of humans and livestock; leftovers of industrial and agricultural by-products (such as distillers’ grains, ammonium sulfite papermaking waste liquid, etc.); urban domestic garbage and sewage; the remains and secretions of soil microorganisms and animals (such as earthworms, insects, etc.); and various artificial organic fertilizers (manure, humic acid fertilizer, sludge, soil miscellaneous fertilizer, etc.) [16]. In cultivated soil, where there is no natural vegetation, SN mainly comes from the exudates of crop roots, root stubble, dead leaves, and the organic fertilizers (green manure, compost, manure, etc.) that people apply every year [17].
Soil moisture is the main source of water for plants (except for hydroponic plants). In addition, plants can directly absorb a small amount of the water that falls on their leaves [18]. The main sources of soil moisture are precipitation and irrigation water, which participate in the large water cycle of the lithosphere-biosphere-atmosphere-hydrosphere. At present, mudslides, earthquakes, and droughts are common [19]. Among them, drought is the most serious because it leads to the loss of land and water resources. Compared with other natural disasters, drought caused by soil moisture loss affects a wider area and lasts longer [20]. It severely damages the regional ecological environment and affects normal economic production and human life. Surveys and studies in recent years have shown that about 30% of the world’s land area is arid, which largely restricts the development of agricultural production and the hydrological environment [21].

1.2. Problem Statement

SN and moisture detection can guide irrigation and fertilization, ensure a balanced supply of crop nutrients, meet crop growth needs, improve fertilizer utilization efficiency, and reduce fertilizer application [22], thereby protecting the environment, increasing yield, and saving manpower and material resources. However, the traditional method of measuring the content of soil components relies on field sampling and laboratory analysis [23,24]. This method requires many testing personnel and expensive testing equipment, and it suffers from high testing cost, low efficiency, and the inability to perform large-scale simultaneous testing [25]. Hyperspectral analysis technology has the advantages of low cost, fast speed, and environmental protection [26]. In recent years, it has been increasingly used in SN determination [27,28]. A hyperspectral instrument can be carried on the back of an inspector or mounted on an unmanned aerial vehicle.

1.3. Contribution

In this paper, a method of soil moisture and organic matter detection based on hyperspectral technology is proposed. This method can be applied to portable hyperspectral spectrometers and unmanned aerial vehicles to achieve rapid detection of soil moisture and organic matter. First, the hyperspectral data of soil moisture and organic matter in the laboratory and in the field were obtained. After wavelet denoising and principal component analysis (PCA) preprocessing, the GA-optimized CNN-LSTM hybrid neural network (GA-CNN-LSTM) algorithm model proposed in this paper was used to predict soil moisture and organic matter. The prediction performance of the two single models, namely CNN and SVR, and the CNN-LSTM hybrid neural network model without GA optimization were compared. The GA-CNN-LSTM algorithm has the highest accuracy of 94.5% and 92.9% for soil moisture and organic matter, respectively. The detection method of soil moisture and organic matter using hyperspectral technology can monitor the distribution of soil moisture and organic matter on a large scale and provide a basis for future rational irrigation and fertilization.
The rest of this paper is organized as follows. Section 2 discusses the related work. Section 3 describes the proposed method in detail. Section 4 presents the results and discussion. Finally, a conclusion is drawn in Section 5.

2. Related Work

Detection of soil moisture and organic matter through on-site sampling and laboratory analysis has proven to be the most reliable and accurate approach [23,24]. M. Chen and M. Zhang used gold nanoparticles (AuNPs) and electrochemically reduced graphene oxide (ERGO) nano-hybrid composite membranes to prepare an all-solid nitrate-doped polypyrrole (PPy(NO3−)) ion-selective electrode (ISE) [29,30]. The ISE-based in situ test of soil nitrate-nitrogen (NO3−-N) was monitored in a three-stage laboratory column [31]. Gou H and Yoo Y used genetic algorithms to classify the qualitative levels of SN or installed wired or wireless sensor devices in the field for real-time soil monitoring [32]. Sohan, Dudala, and others designed and produced a composite polydimethylsiloxane (PDMS) device to test soil EC and nitrite levels [33].
Many scholars have also used near-infrared (NIR) spectroscopy technology for SN and moisture determination. H. T. Cai et al. combined a transfer learning algorithm with near-infrared spectroscopy technology to establish an SN information extraction model. The results showed that the model improved the efficiency of information extraction and obtained high-confidence feature extraction results [34]. J. A. Prananto et al. used NIRS to accurately predict soil macronutrients (N, P, K, S, Ca, and Mg) and micronutrients (Zn, Mn, and Cu) [35]. P. Zhou et al. developed an online TN concentration detector based on near-infrared spectroscopy technology by using absorbance data at 1680, 1550, 1375, 1245, 1130, and 1070 nm to establish an ELM model to identify soil nitrogen content [36]. Predictive models based on machine learning algorithms have strong analytical capabilities and robustness to nonlinear problems, and they have been successfully applied to many fields [37,38]. In soil detection, ELM has also been widely used to estimate soil moisture, soil temperature, and SN and has achieved high prediction accuracy [39,40].
In recent years, hyperspectral technology has been developed and increasingly used to determine soil organic matter and moisture content. O’Rourke et al. used hyperspectral technology to detect the OM and OC contents of forest surface soil [41]. Jiang et al. used hyperspectral technology to quantitatively estimate and compare the cadmium (Cd) concentration in standard and natural cadmium-contaminated soil samples [42]. However, these researchers only used the spectral information in the hyperspectral data for modeling and prediction. Hyperspectral technology combined with image technology can reflect the distribution of sample reference values spatially; thus, the corresponding decisions are based on the distribution map. This method has been successfully applied to many fields [43,44], proving that it can feasibly be used to detect SN content and obtain content distribution maps with spectral and image information [45,46].
In this article, hyperspectral technology was used in combination with machine learning to identify SN and moisture, which can save manpower and avoid large-area sensor deployment and achieve non-contact detection. The method can be applied to a variety of platforms, such as portable hyperspectrometers or unmanned aerial vehicles.

3. Proposed Methods

3.1. Soil Sample Collection

Soil samples were collected in Xinjiang Autonomous Region of China, as shown in Figure 1. The collection sites are located in three experimental fields in Tumushuke, Shihezi, and Tacheng in Xinjiang, China, which are all operated and managed by Shihezi University.
The Tumushuke region lies between 39°36′ and 40°04′ N and between 78°38′ and 79°50′ E. The annual average temperature is 11.6 °C, the average temperature in the hottest month (July) is 25.0 °C to 26.7 °C, the average temperature in the coldest month (January) is −7.3 °C to −6.6 °C, the annual average frost-free period is about 225 days, the annual precipitation is 38.3 mm, and the altitude is 425 m to 437 m.
The Shihezi region lies between 44°25′ and 44°27′ N and between 85°40′ and 86°10′ E. The annual average temperature is 9.2 °C, the average temperature in the hottest month (July) is 25.1 °C to 26.1 °C, the average temperature in the coldest month (January) is −20.6 °C to −7.3 °C, the annual average frost-free period is about 170 days, the annual precipitation is 43.5 mm, and the altitude is 510 m to 520 m.
The Tacheng region lies between 46°31′ and 46°37′ N and between 83°37′ and 83°41′ E. The annual average temperature is 7.6 °C, the average temperature in the hottest month (July) is 24.5 °C to 25.8 °C, the average temperature in the coldest month (January) is −26.6 °C to −9.6 °C, the annual average frost-free period is about 155 days, the annual precipitation is 27.1 mm, and the altitude is 1100 m to 1150 m. The cash crops are cotton and wheat, and the soils are loam and silty clay. The Tumushuke, Shihezi, and Tacheng test fields were established with ten 5 km grids as sampling units.
To make each sampling point representative of the soil properties of the sampling unit, 80 soil samples were taken in each sampling unit, and a total of 800 soil samples were collected. At each sampling point, soil was collected within a 5 m × 5 m area according to the five-point sampling method, so that five soil subsamples were collected at each sampling point. The subsamples were fully mixed and loaded into sampling bags. To avoid varying levels of lead pollution caused by human activities, the sampling points were kept at least 150 m from the highway.
Considering that soil macro-nutrients are concentrated on the earth’s surface, soil samples were collected at a depth of about 15 cm according to traditional soil sampling technology.
Plant tissue, gravel, and other impurities were removed from each sample. The sample was air dried, ground, and sieved so that the particle size of the soil was less than 0.25 mm. The sample was then divided into two parts; one was used for determining the content of soil elements, and the other was used for the indoor hyperspectral measurement. The potassium dichromate volumetric method was used to determine the content of SN, and the Delta Professional 4050 portable X-ray fluorescence spectrometer (Olympus, Allentown, PA, USA) was used to measure the content of soil phosphorus and potassium. The instrument was equipped with an Au target micro X-ray excitation source and a Peltier-cooled semiconductor SDD detector. The detector area was 25 mm², and the energy resolution was 128 eV. The detector can simultaneously measure 38 elements, including soil phosphorus and potassium. Then, the measurement results of element content were statistically analyzed according to the “technical specifications for soil analysis”, and the results are shown in Table 1. The 800 soil samples were sorted by element content from low to high and divided into 200 groups. One sample was randomly selected from each group and put into the validation set. In total, 200 samples were used as the validation set, and the other 600 samples were used as the training set.
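The grouping scheme described above (sort by measured content, cut into 200 consecutive groups of four, draw one sample per group for validation) can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `stratified_split`, the fixed seed, and the synthetic content values are assumptions made for the example.

```python
import random

def stratified_split(contents, group_size=4, seed=0):
    """Sort samples by content, cut them into consecutive groups,
    and draw one sample per group for the validation set."""
    rng = random.Random(seed)
    # Indices of the samples ordered from low to high content.
    order = sorted(range(len(contents)), key=lambda i: contents[i])
    training, validation = [], []
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        pick = rng.choice(group)          # one random sample per group
        validation.append(pick)
        training.extend(i for i in group if i != pick)
    return training, validation

# Synthetic stand-in for 800 measured contents (not data from the paper).
value_rng = random.Random(1)
contents = [value_rng.uniform(0.0, 30.0) for _ in range(800)]
train_idx, val_idx = stratified_split(contents)
print(len(train_idx), len(val_idx))  # 600 200
```

Because every group contributes exactly one validation sample, the validation set covers the whole range of measured contents instead of clustering at one end.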

3.2. Hyperspectral Data Acquisition and Preprocessing

3.2.1. Spectral Data Acquisition

The FieldSpec 4 surface feature spectrometer was used to collect soil hyperspectral data. To eliminate the interference of scattered light, a light source and a black box were used, as shown in Figure 2. The light source was provided by two 50 W (650 lx) quartz tungsten halide lamps. The lamps were positioned 130 cm above the soil sample to be measured and irradiated the measurement target at a depression angle of about 50°. The measurement target was located at the center of the light spot to ensure that the illumination was uniform in the field of view, and the light source spectrum was as close to the solar spectrum as possible. The soil sample was placed in a culture dish (diameter: 10 cm, depth: 1.5 cm) whose inside was completely blackened, and its surface was scraped flat with a ruler. The ASD optical fiber probe was fixed vertically at a height of about 3 cm above the measured soil sample. The probe had a 25° field of view. The area from which the probe received the soil spectrum was therefore a circle with a diameter of about 1.33 cm (a radius of about 0.67 cm), much smaller than the area of the Petri dish; these specifications ensured that the probe received only the reflection spectrum of the soil sample. White reference correction was performed before the test. Ten spectral curves were collected from each soil sample, and their arithmetic mean was taken as the actual reflection spectrum of the soil sample.
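The footprint of the probe follows from simple cone geometry: for a probe at height h with a full field-of-view angle θ, the viewed circle has diameter 2h·tan(θ/2). The short check below reproduces the 1.33 cm figure from the stated 3 cm height and 25° field of view; the function name `fov_diameter` is only illustrative.

```python
import math

def fov_diameter(height_cm, fov_deg):
    """Diameter of the circle seen by a probe looking straight down."""
    return 2.0 * height_cm * math.tan(math.radians(fov_deg) / 2.0)

diameter = fov_diameter(3.0, 25.0)
print(round(diameter, 2))  # 1.33
```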

3.2.2. Spectral Data Preprocessing

Signal noise is unavoidable in the process of soil hyperspectral data collection, and it reduces the accuracy of soil moisture and organic matter identification. The wavelet transform, which has good time-frequency localization characteristics, was used to eliminate noise while retaining the characteristic peaks of the signal [47]. The signal S(n) containing noise can be expressed as follows:
$$S(n) = f(n) + \sigma e(n), \quad n = 0, 1, \ldots, k-1$$
where f(n) is the useful signal, S(n) is the signal containing noise, e(n) is the noise, and σ is the standard deviation of the noise.
Wavelet denoising includes signal decomposition, threshold setting, and wavelet reconstruction operations. These operations can suppress the noise part e(n) of the signal and restore the signal f(n) as much as possible. This experiment sets up a three-layer wavelet decomposition and reconstruction.
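The three operations (decomposition, thresholding, reconstruction) can be illustrated with a small three-level example. The paper does not state the wavelet family or threshold rule, so the Haar wavelet, the soft threshold, and the threshold value used here are assumptions chosen to keep the sketch dependency-free; a wavelet library would normally be used in practice.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One decomposition level: approximation and detail coefficients."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Exact inverse of haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (assumed threshold rule)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def wavelet_denoise(signal, levels=3, thr=0.1):
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):                 # decomposition
        approx, detail = haar_step(approx)
        details.append(soft_threshold(detail, thr))  # threshold the details
    for detail in reversed(details):        # reconstruction
        approx = haar_inverse_step(approx, detail)
    return approx

# Toy 16-point "spectrum" (illustrative values only).
signal = [math.sin(2.0 * math.pi * i / 16.0) for i in range(16)]
denoised = wavelet_denoise(signal, levels=3, thr=0.05)
```

With a threshold of zero the reconstruction returns the input unchanged, which is a quick sanity check that the decomposition and reconstruction steps are inverses of each other.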
The reconstructed hyperspectral data were processed by principal component analysis to reduce the dimensionality [48]. Principal component analysis can extract the main information from the original data with minimal loss. Take a set X of m input samples as an example:
$$X = \{x_1, x_2, \ldots, x_m\}$$
Each dimension was first centered as follows:
$$x_i \leftarrow x_i - \frac{1}{m}\sum_{j=1}^{m} x_j$$
Then, the covariance matrix $XX^{T}$ was calculated, and eigenvalue decomposition was performed. The eigenvalues were sorted as follows:
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$$
The eigenvectors corresponding to the d largest eigenvalues (d ≤ n) were used to form the matrix W:
$$W = (\omega_1, \omega_2, \ldots, \omega_d)$$
W is called the feature transformation matrix, and the principal component matrix can be expressed as follows:
$$A = W^{T} X$$
where T denotes the matrix transpose. By retaining the first d principal components as a substitute for the original data, the dimensionality can be reduced while the original information is retained to the greatest possible extent.
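The centering, covariance, eigen-decomposition, and projection steps can be traced on a tiny data set. The sketch below is an illustration under stated assumptions: samples are stored as rows (the text stores them as columns of X), the covariance is scaled by 1/(m − 1), which does not change the eigenvectors, and the leading eigenvectors are found by power iteration with deflation only to avoid external dependencies. A linear algebra library would be the usual choice in practice.

```python
import math

def center(rows):
    """Subtract the mean of every dimension."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    return [[r[j] - means[j] for j in range(n)] for r in rows]

def covariance(centered):
    m, n = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (m - 1) for b in range(n)]
            for a in range(n)]

def top_eigenpair(mat, iters=500):
    """Leading eigenvalue/eigenvector by power iteration (assumes the
    start vector is not orthogonal to the leading eigenvector)."""
    n = len(mat)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(mat[a][b] * v[b] for b in range(n)) for a in range(n))
    return lam, v

def pca(rows, d):
    centered = center(rows)
    cov = covariance(centered)
    n = len(cov)
    components = []                      # rows of W^T
    for _ in range(d):
        lam, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the component just found.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]
    # A = W^T X, computed sample by sample.
    scores = [[sum(r[j] * w[j] for j in range(n)) for w in components]
              for r in centered]
    return components, scores

# Four points lying on the line y = 2x (illustrative data).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
components, scores = pca(rows, d=1)
```

For these points all variance lies along the direction (1, 2), so the single retained component is that direction normalized to unit length, and one score per sample is enough to reconstruct the centered data.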

3.3. Model Overview

3.3.1. Convolutional Neural Network (CNN)

Support vector machine and convolutional network models for classification were constructed, and the classification results were compared. The support vector machine transforms the input vector into a high-dimensional feature space through the nonlinear mapping κ(xi, xj), so that the input vectors become linearly separable, and constructs the optimal classification hyperplane in the feature space [49]. SVR can be expressed as follows:
$$g(x) = \sum_{a=1}^{n} \omega_a \kappa(x_a, x) + b$$
where ωa represents the weight, κ(xi, xj) represents the kernel function, and x = (x1, x2, …, xd) represents the input vector. The Gaussian kernel function, also known as the radial basis function (RBF), is used as the kernel function of the support vector machine. The radial basis function has good anti-interference ability when the data contain noise, and it is the mainstream kernel function for nonlinear classification with support vector machines. The RBF kernel function is as follows:
$$\kappa(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right)$$
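As a concrete reading of the two formulas above, the following sketch evaluates the RBF kernel and the kernel expansion g(x) for given support vectors. It shows only the prediction step; the weights, bias, and σ in the example are made-up values, not fitted parameters, and the function names are illustrative.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """exp(-||xi - xj||^2 / sigma^2), following the form given in the text."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / sigma ** 2)

def svr_predict(x, support_vectors, weights, bias, sigma):
    """g(x) = sum_a w_a * k(x_a, x) + b."""
    return sum(w * rbf_kernel(xa, x, sigma)
               for w, xa in zip(weights, support_vectors)) + bias

# Hypothetical model with two support vectors.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
weights = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, weights, bias=0.1, sigma=1.0)
```

The kernel equals 1 when the two inputs coincide and decays toward 0 as they move apart, so each support vector mainly influences predictions in its own neighborhood.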
For the convolutional neural network, the one-dimensional convolution operation can be expressed as follows:
$$Z_i^{l+1} = \left[Z^{l} \otimes \omega^{l+1}\right]_i + b = \sum_{k=1}^{K_l} \sum_{x=1}^{f} \left[ Z_k^{l}(s_0 i + x)\, \omega_k^{l+1}(x) \right] + b, \quad i \in \{0, 1, \ldots, L_{l+1}\}$$
where b is the bias; $Z^{l}$ and $Z^{l+1}$ represent the convolution input and output of layer l+1, respectively; $L_{l+1}$ is the size of $Z^{l+1}$; the index i of $Z_i$ corresponds to the wavelength of the spectral data; and $K_l$ is the number of channels. The parameters f, s0, and p correspond to the size of the convolution kernel, the stride, and the amount of padding, respectively.
The convolutional layer contained an activation function to help express complex features, and its representation was as follows:
$$A_{i,k}^{l} = f\left(Z_{i,k}^{l}\right)$$
After feature extraction in the convolutional layer, the output feature map was passed to the pooling layer for feature selection and information filtering. This article selected the pooling model established by Lp pooling [50], and its general expression was as follows:
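$$A_k^{l}(i) = \left[ \sum_{x=1}^{f} A_k^{l}(s_0 i + x)^{p} \right]^{\frac{1}{p}}$$
where p is a pre-specified parameter, and f and s0 here denote the pooling window size and stride.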
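The convolution, activation, and Lp pooling steps can be followed on a toy one-channel input. This is a sketch under stated assumptions: indices are zero-based (the formulas count x from 1), no padding is applied, and ReLU is used as the activation because the text does not name one. All function names are illustrative.

```python
def conv1d(channels, kernels, bias, stride=1):
    """Sum over channels k and kernel positions x of Z_k(s0*i + x) * w_k(x), plus b."""
    f = len(kernels[0])
    out_len = (len(channels[0]) - f) // stride + 1
    out = []
    for i in range(out_len):
        total = bias
        for z, w in zip(channels, kernels):
            for x in range(f):
                total += z[stride * i + x] * w[x]
        out.append(total)
    return out

def relu(values):
    """Assumed activation function f(Z)."""
    return [max(0.0, v) for v in values]

def lp_pool(values, window, stride, p):
    """Lp pooling: p-th root of the sum of p-th powers within each window."""
    out_len = (len(values) - window) // stride + 1
    return [sum(abs(values[stride * i + x]) ** p for x in range(window)) ** (1.0 / p)
            for i in range(out_len)]

# One input channel with six "wavelength" positions and a difference kernel.
channels = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
kernels = [[-1.0, 0.0, 1.0]]
feature_map = relu(conv1d(channels, kernels, bias=0.0))
pooled_l1 = lp_pool(feature_map, window=2, stride=2, p=1)
pooled_l2 = lp_pool(feature_map, window=2, stride=2, p=2)
print(feature_map)  # [2.0, 2.0, 2.0, 2.0]
print(pooled_l1)    # [4.0, 4.0]
```

Setting p = 1 sums the window (proportional to average pooling), while larger p moves the result toward the window maximum, which is why Lp pooling is often described as interpolating between average and max pooling.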

3.3.2. Long- and Short-Memory Neural Network (LSTM)

The LSTM network is a recurrent neural network (RNN) [17]. Like all recurrent neural networks, an LSTM with sufficient network units can compute anything a conventional computer can, and it is particularly suited to time-series data. Its general structure is shown in Figure 3.
The three module frames in the figure can be seen as three cell structures. The front and rear cells represent the cell states at the previous and next times, and the middle cell is the state at the current time. The cell can be divided into three gating parts, namely the forgetting gate, the input gate, and the output gate [18]. The three gates receive the LSTM output value $h_{t-1}$ of the previous time step and the input data $x_t$ of the current time step as inputs [19]. The value $f_t$ of the forgetting gate is obtained from the inputs $x_t$ and $h_{t-1}$ and controls the degree to which the information in $c_{t-1}$ is forgotten. Each value in $f_t$ lies in the range [0,1]. The lower bound 0 represents complete forgetting, and the upper bound 1 represents that the information is retained unchanged. The forgetting gate thus determines how much of the state information from the previous time step is forgotten, whereas the function of the input gate is to add new content to the current state information. Similarly, the input gate obtains $i_t$ from the inputs $x_t$ and $h_{t-1}$ to control the degree to which the current state information is updated. Here, the candidate state information $g_t$ is also calculated from the inputs $x_t$ and $h_{t-1}$. Then, the new state information $c_t$ can be calculated by the formula below, which means forgetting some old information and adding some new information [22]. The last part is the output gate. Similarly, $o_t$ is calculated from $x_t$ and $h_{t-1}$ to control which information needs to be output. The specific calculation formulas are as follows:
$$\begin{aligned} f_t &= \sigma\left(w_f [h_{t-1}, x_t] + b_f\right) \\ i_t &= \sigma\left(w_i [h_{t-1}, x_t] + b_i\right) \\ g_t &= \tanh\left(w_g [h_{t-1}, x_t] + b_g\right) \\ c_t &= f_t \times c_{t-1} + i_t \times g_t \\ o_t &= \sigma\left(w_o [h_{t-1}, x_t] + b_o\right) \\ h_t &= o_t \times \tanh(c_t) \end{aligned}$$
where w and b represent the weight matrix and bias vector of the corresponding gate, $c_t$ represents the memory cell, and σ and tanh represent the sigmoid and hyperbolic tangent activation functions, respectively.
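A single LSTM time step following these gate equations can be written out directly. The sketch below is a plain-Python illustration, not the network used in the paper; the parameter dictionary layout, the helper names, and the toy sizes (one hidden unit, two inputs) are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(weights, bias, vec):
    """Compute w [h_{t-1}, x_t] + b for one gate."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x_t, h_prev, c_prev, params):
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["w_f"], params["b_f"], concat)]
    i_t = [sigmoid(z) for z in affine(params["w_i"], params["b_i"], concat)]
    g_t = [math.tanh(z) for z in affine(params["w_g"], params["b_g"], concat)]
    o_t = [sigmoid(z) for z in affine(params["w_o"], params["b_o"], concat)]
    # Forget part of the old state and add part of the candidate state.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy parameters: with all weights and biases zero, every gate equals 0.5
# and g_t equals 0, so c_t = 0.5 * c_prev.
zero_w, zero_b = [[0.0, 0.0, 0.0]], [0.0]
params = {"w_f": zero_w, "b_f": zero_b, "w_i": zero_w, "b_i": zero_b,
          "w_g": zero_w, "b_g": zero_b, "w_o": zero_w, "b_o": zero_b}
h_t, c_t = lstm_step(x_t=[0.3, -0.2], h_prev=[0.1], c_prev=[2.0], params=params)
print(c_t)  # [1.0]
```

The zero-parameter case is deliberately trivial so the result can be checked by hand; in a trained network the gates differ per unit and per time step.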

3.3.3. Genetic Algorithm (GA)

GA is an algorithm that simulates biological evolution for individual selection, crossover, and mutation. Its main core comprises parameter coding, setting of initial population, and determination of fitness function; then, the optimal solution is obtained through final search [23]. In this paper, GA is used to optimize the determination and calculation of weight and bias during training of LSTM neural network.
The basic operation process of genetic algorithm is as follows:
(1) Initialization: set the generation counter t = 0, set the maximum number of generations T, and randomly generate M individuals as the initial population P(0);
(2) Individual evaluation: calculate the fitness of each individual in the population P(t);
(3) Selection operation: apply the selection operator to the population. The purpose of selection is to pass the best individuals directly to the next generation or to generate new individuals through pairing and crossover and pass those to the next generation. The selection operation is based on the fitness evaluation of the individuals in the population;
(4) Crossover operation: apply the crossover operator to the population. The crossover operator plays a key role in the genetic algorithm;
(5) Mutation operation: apply the mutation operator to the population, that is, change the gene values at some loci of individual strings in the population. The next-generation population P(t+1) is obtained from the population P(t) after the selection, crossover, and mutation operations;
(6) Judgment of termination condition: if t = T, the individual with the maximum fitness obtained during the evolution process is output as the optimal solution, and the calculation is terminated; otherwise, set t = t + 1 and return to step (2).
Genetic operations include the following three basic genetic operators: selection, crossover, and mutation.
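The six steps can be sketched as a small real-coded genetic algorithm. This is an illustration under stated assumptions rather than the configuration used in the paper: tournament selection, one-point crossover, Gaussian mutation, elitism, the population size, and the rates are all choices made for the example, and the fitness function fits a weight and bias of a toy linear model instead of an LSTM.

```python
import random

def tournament(population, fitness, rng):
    """Selection: keep the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=50,
                      crossover_rate=0.8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # (1) Initialization: random initial population P(0).
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    # (2) Individual evaluation.
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):          # (6) stop after T generations
        children = [best[:]]              # elitism: carry the best over unchanged
        while len(children) < pop_size:
            # (3) Selection.
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            # (4) One-point crossover.
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # (5) Mutation: perturb some genes with Gaussian noise.
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy fitness: negative squared error of y = w*x + b on four points.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]

def fitness(genes):
    w, b = genes
    return -sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

best, history = genetic_algorithm(fitness, n_genes=2)
```

Because the best individual is copied unchanged into every new generation, the recorded best fitness can never decrease, which makes the search easy to monitor.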
The process of optimization and improvement model is shown in Figure 4.

3.3.4. SVR Model

The epsilon-SVR kernel formulation was used to construct the SVR model. When developing SVR models, different types of kernel functions were considered, such as linear, radial basis function (RBF), sigmoid, and polynomial. In Section 4, cross-validation of the different types of kernel functions was performed to establish the best prediction model.

3.4. Model Evaluation Criteria

The training and test sets were divided by K-fold cross-validation (KCV) [24]. KCV makes full use of the limited data set to make the evaluation results more convincing [26]. The most common 10-fold cross-validation was used in this experiment. Indicators such as confusion matrix, accuracy, recall, F1 score, and precision recall curve (PRC) were used to evaluate the performance of model. The confusion matrix was the basis of many classification metrics, which can be used to make a more comprehensive judgment on the performance of the classifier, as shown in Table 2.
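The 10-fold procedure can be sketched as an index generator: the samples are shuffled once, dealt into ten folds, and each fold serves as the test set exactly once. The function name `k_fold_indices`, the fixed seed, and the sample count of 800 are illustrative assumptions; the paper does not say which split routine was used.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # deal indices into k folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(800, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 720 80
```

A model would be trained on each `train` list and scored on the matching `test` list, and the ten scores would then be averaged.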
The average precision (AP) and average recall rate (AR) are defined as follows:
$$AP = \frac{TP}{TP + FP}$$
$$AR = \frac{TP}{TP + FN}$$
where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.
The precision recall curve can be obtained by treating precision as the vertical axis and recall as the horizontal axis. By using PRC, one can intuitively measure precision and recall according to the different requirements of soil testing. The above evaluation indicators were based on binary classification, which can be applied to multi-classification by converting to a binary classification problem [27]. The macro-average and micro-average formulas are shown in Table 3.
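The conversion from a multi-class confusion matrix to per-class binary counts, and from there to macro and micro averages, can be sketched as follows. The definitions used are the standard ones (macro: average of per-class scores; micro: scores from pooled counts) and are assumed, not confirmed, to match Table 3; the confusion matrix values are made up for the example.

```python
def per_class_counts(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    counts = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        counts.append((tp, fp, fn))
    return counts

def macro_micro(confusion):
    counts = per_class_counts(confusion)
    precisions = [tp / (tp + fp) if tp + fp else 0.0 for tp, fp, _ in counts]
    recalls = [tp / (tp + fn) if tp + fn else 0.0 for tp, _, fn in counts]
    tp_sum = sum(tp for tp, _, _ in counts)
    fp_sum = sum(fp for _, fp, _ in counts)
    fn_sum = sum(fn for _, _, fn in counts)
    return {
        "macro_precision": sum(precisions) / len(precisions),
        "macro_recall": sum(recalls) / len(recalls),
        "micro_precision": tp_sum / (tp_sum + fp_sum),
        "micro_recall": tp_sum / (tp_sum + fn_sum),
    }

# Illustrative two-class confusion matrix (not results from the paper).
confusion = [[8, 2],
             [1, 9]]
scores = macro_micro(confusion)
```

For this matrix the per-class recalls are 0.8 and 0.9, giving a macro recall of 0.85, and the pooled counts (17 true positives, 3 false positives, 3 false negatives) give micro precision and recall of 0.85.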

4. Results and Discussion

4.1. Optimization of Model Parameters

Python 3.6 was used to build the models. The two models were trained and tested under different parameter values. The main parameter of CNN was the number of convolutional layers, and CNNs with one to four convolutional layers were studied. The main parameter of SVR was the type of kernel function. The accuracy of 10-fold cross-validation was used to compare CNN and SVR, and the 95% confidence interval, average precision, average recall, and average F1 score were given. The results are shown in Table 4.
The kCV accuracy in Table 4 represents the highest 10-fold cross-validation accuracy of SVR and CNN. The minimum cross-validation accuracy of CNN (0.61) was higher than the minimum cross-validation accuracy of SVR (0.34). Therefore, CNN was more suitable than SVR for classification and recognition of this data set. In addition, by adjusting the structure of the neural network model, the cross-validation accuracy can reach 80%. The CNN model obtained the best 95% confidence interval, average precision, average recall, and average F1 score.
The convolution long- and short-memory neural network hybrid model optimized by genetic algorithm (GA-CNN-LSTM) is shown in Figure 5. The model was composed of a convolutional neural network and a long- and short-memory neural network. First, the convolutional neural network extracted the spatial features of the data [13]; then, the long- and short-memory neural network extracted the temporal features. The whole model combined the advantages of the two neural networks. During the training of the hyperparameters of the LSTM neural network, the weight coefficients and bias values of the network were updated and calculated by GA in place of the traditional gradient descent method, so that the whole training and learning process was optimized. The convolutional neural network (as shown in Figure 4) was a one-dimensional two-layer convolutional neural network model composed of convolution layers, pooling layers, and fully connected layers; the 1D-CNN included two convolution layers, two pooling layers, and three fully connected layers. The cross-entropy function was selected as the objective function. The convolution network included two convolution layers (C1 and C3) and two pooling layers (S2 and S4), and F5 was used to flatten the output of S4 into a one-dimensional sequence to facilitate the connection between the pooling layer (S4) and the fully connected layer (F5).
The models in this article were trained and tested on Windows 10 with an NVIDIA GTX 1060 (4G) GPU and CUDA v10.0, using the Intel Core i76700 2.6GHz six-core processor. The software environment consisted of Python 3.6.4, OpenCV 3.0, and TensorFlow 2.0.

4.2. Analysis of Model Prediction Results

The predicted and measured values of the model were used to draw the fitting diagram of the predicted result, and the fitting effects of CNN, SVR, CNN-LSTM, and GA-CNN-LSTM were further compared and analyzed.
The prediction of soil moisture by each model is shown in Figure 6. The prediction accuracy rates of CNN and SVR models in the training set were 82.1% and 85.6%, respectively. The prediction effect of SVR model was better than CNN model, and the accuracy of CNN model after LSTM processing was only improved to 92.1%. However, the accuracy of the CNN-LSTM model optimized by GA was greatly improved, and the accuracy of the prediction set reached 94.5%.
The prediction of SN by each model is shown in Figure 7. The figure shows that the prediction effects of CNN, SVR, CNN-LSTM, and GA-CNN-LSTM models on SN were weaker than those on soil moisture. Similarly, the CNN-LSTM model optimized by GA had the best accuracy, and the accuracy of the prediction set reached 92.9%.
Compared with the CNN-LSTM and CNN models, the predicted values of the optimized GA-CNN-LSTM model were distributed more closely around the 1:1 line, and its data-fitting ability and stability were better than those of the CNN-LSTM model. The fitting effect on the training set was better than that on the validation set, whose sample points were relatively scattered. Overall, the results indicate that the optimized GA-CNN-LSTM model can be used as an alternative for the prediction of soil OM and moisture.
PRC shows the relationship between precision and recall. Figure 8 shows the effect of the optimized GA-CNN-LSTM model on precision and recall. Precision and recall can be selected according to the requirements of different soil identification tasks. For some moisture or organic matter with high accuracy requirements, high precision should be selected as far as possible. If one wants to identify as many specific soil moisture and organic matter areas as possible, then one should choose high recall. For example, high precision can be selected in the discrimination of identifying drought or excessive fertilizer application, and high recall can be selected in the regional statistics of soil moisture and organic matter.
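The trade-off between precision and recall can be sketched by sweeping a decision threshold over model scores. The example below assumes a binary task (for instance, "drought" versus "not drought") with a score per sample; the function name `precision_recall`, the scores, and the labels are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when samples with score >= threshold are called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical scores and ground-truth labels (1 = positive class).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.8, 0.5):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

In this toy case the stricter threshold (0.8) gives precision 1.00 and recall 0.67, whereas the looser threshold (0.5) gives precision 0.75 and recall 1.00. A high threshold therefore suits tasks that favor precision, and a low threshold suits tasks that favor recall.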

5. Conclusions

In this paper, soil nutrient and hyperspectral data for samples with different SN and moisture contents were collected in the laboratory and in the field. Triple wavelet denoising and PCA dimensionality reduction were then carried out to preprocess the hyperspectral data and eliminate environmental interference. Tenfold cross-validation was used to compare the classification and recognition effects of different numbers of CNN convolution layers and different SVR kernel function types on soil moisture and nutrients. Evaluation indicators, including confidence interval and prediction accuracy, were given. A GA-optimized CNN-LSTM hybrid neural network (GA-CNN-LSTM) algorithm model was proposed to predict SN and moisture. The prediction performance was compared with those of CNN, SVR, and the CNN-LSTM hybrid neural network without GA optimization. The GA-CNN-LSTM algorithm was superior to the other models in all indicators; it had the highest accuracy rates of 94.5% and 92.9% for soil moisture and organic matter, respectively. The accuracy of the GA-CNN-LSTM model for predicting soil moisture was better than that for predicting SN. Finally, the precision-recall curves of the optimized model were obtained, and the applicable uses of the model were discussed. A comparison of the measured and predicted values of each model shows that the predicted values of the optimized GA-CNN-LSTM model are distributed more closely along the 1:1 line, and its data-fitting ability and stability are better than those of CNN-LSTM and the other models. However, it is still necessary to further study the relationship between soil moisture and organic matter, verify the robustness of the GA-CNN-LSTM model under different soil types, and verify its ability to explore the contents of macroelements and microelements in different soils.

Author Contributions

This study was conceptualized by H.W. and L.Z.; software was designed by H.W. and validated by X.H., J.Z., and X.M.; H.W. provided resources, and L.Z. curated the data; the original draft of the manuscript was prepared by H.W. and X.H.; X.M. reviewed and edited the manuscript; J.Z. and X.H. assisted with project administration; H.W. and X.M. managed funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 51365048).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All relevant data presented in the article are stored according to institutional requirements and, as such, are not available online. However, all data used in this manuscript can be made available upon request to the authors.

Acknowledgments

We are grateful to Huan Wang, Lixin Zhang, and Jiawei Zhao for their data recording and checking work during the process. We are also grateful to Chanchan Du, Yongchao Shan, and Haoran Bu for their help in project management. We thank Shihezi University for providing the experimental conditions for us to successfully complete this experiment. Finally, we thank the instructor for his constructive comments on the earlier version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. McBride, J. Climate change, global population growth, and humanoid robots. J. Future Robot. Life 2021, 2, 23–41.
  2. FAO; IFAD; UNICEF; WFP; WHO. Brief to the State of Food Security and Nutrition in the World 2021; FAO: Rome, Italy, 2022.
  3. Sanchez, P.A. Soil fertility and hunger in Africa. Science 2002, 295, 2019–2020.
  4. Kucha, C.T.; Liu, L.; Ngadi, M.; Gariépy, C. Improving Intramuscular Fat Assessment in Pork by Synergy Between Spectral and Spatial Features in Hyperspectral Image. Food Anal. Methods 2021, 15, 212–226.
  5. Ohana-Levi, N.; Ben-Gal, A.; Peeters, A.; Termin, D.; Linker, R.; Baram, S.; Raveh, E.; Paz-Kagan, T. A comparison between spatial clustering models for determining N-fertilization management zones in orchards. Precis. Agric. 2021, 22, 99–123.
  6. Hou, P.; Jiang, Y.; Yan, L.; Petropoulos, E.; Wang, J.; Xue, L.; Yang, L.; Chen, D. Effect of fertilization on nitrogen losses through surface runoffs in Chinese farmlands: A meta-analysis. Sci. Total Environ. 2021, 793, 148554.
  7. Ng, W.; Husnain; Anggria, L.; Siregar, A.F.; Hartatik, W.; Sulaeman, Y.; Jones, E.; Minasny, B. Developing a soil spectral library using a low-cost NIR spectrometer for precision fertilization in Indonesia. Geoderma Reg. 2020, 22, e00319.
  8. Rose, S.; Nickolas, S.; Sangeetha, S. A recursive ensemble-based feature selection for multi-output models to discover patterns among the soil organic matter. Chemom. Intell. Lab. Syst. 2020, 208, 104221.
  9. Kiboi, M.N.; Ngetich, F.K.; Mucheru-Muna, M.W.; Diels, J.; Mugendi, D.N. Soil organic matter and crop yield response to conservation-effective management practices in the sub-humid highlands agro-ecologies of Kenya. Heliyon 2021, 7, e07156.
  10. Zhou, H.K.; Wu, J.J.; Li, X.H.; Liu, L.Z.; Yang, J.H.; Han, X.Y. Suitability of assimilated data-based standardized soil moisture index for agricultural drought monitoring. Acta Ecol. Sin. 2019, 39, 2191–2202.
  11. Song, X.; Wei, H.; Rees, R.M.; Ju, X. Soil oxygen depletion and corresponding nitrous oxide production at hot moments in an agricultural soil. Environ. Pollut. 2021, 292, 118345.
  12. Lorenz, D.J.; Otkin, J.A.; Zaitchik, B.; Hain, C.; Anderson, M.C. Predicting Rapid Changes in Evaporative Stress Index (ESI) and Soil Moisture Anomalies over the Continental United States. J. Hydrometeorol. 2021, 22, 3017–3036.
  13. Kargaran, M.; Habibi, M.; Magierowski, S. Self-Powered Soil Moisture Monitoring Sensor Using a Picoampere Quiescent Current Wake-Up Circuit. IEEE Trans. Instrum. Meas. 2020, 69, 6613–6620.
  14. Archana, A.; Sankari, V.; Nair, S. An economically mobile device for the on-site testing of soil organic matter by studying the spectrum. Mater. Today Proc. 2021, in press.
  15. Wei, X.; Waterhouse, M.J.; Qi, G.; Wu, J. Long-term logging residue loadings affect tree growth but not soil organic matter in Pinus contorta Doug. ex Loud. forests. Ann. For. Sci. 2020, 77, 61.
  16. Maxwell, T.L.; Augusto, L.; Bon, L.; Courbineau, A.; Altinalmazis-Kondylis, A.; Milin, S.; Bakker, M.R.; Jactel, H.; Fanin, N. Effect of a tree mixture and water availability on soil organic matter and extracellular enzyme activities along the soil profile in an experimental forest. Soil Biol. Biochem. 2020, 148, 107864.
  17. Zhang, T.; Li, F.Y.; Shi, C.; Li, Y.; Tang, S.; Baoyin, T. Enhancement of nutrient resorption efficiency increases plant production and helps maintain soil organic matter under summer grazing in a semi-arid steppe. Agric. Ecosyst. Environ. 2020, 292, 106840.
  18. Eon, R.S.; Bachmann, C.M. Mapping barrier island soil moisture using a radiative transfer model of hyperspectral imagery from an unmanned aerial system. Sci. Rep. 2021, 11, 3270.
  19. Valverde, M.G.; Bueno, M.M.; Gómez-Ramos, M.; Aguilera, A.; Gil García, M.; Fernández-Alba, A. Determination study of contaminants of emerging concern at trace levels in agricultural soil. A pilot study. Sci. Total Environ. 2021, 782, 146759.
  20. Cui, Y.; Zeng, C.; Chen, X.; Fan, W.; Liu, H.; Liu, Y.; Xiong, W.; Sun, C.; Luo, Z. A New Fusion Algorithm for Simultaneously Improving Spatio-temporal Continuity and Quality of Remotely Sensed Soil Moisture over the Tibetan Plateau. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 83–91.
  21. Sekertekin, A.; Marangoz, A.M.; Abdikan, S. ALOS-2 and Sentinel-1 SAR data sensitivity analysis to surface soil moisture over bare and vegetated agricultural fields. Comput. Electron. Agric. 2020, 171, 105303.
  22. Pezol, N.S.; Adnan, R.; Tajjudin, M. Design of an Internet of Things (Iot) Based Smart Irrigation and Fertilization System Using Fuzzy Logic for Chili Plant. In Proceedings of the 2020 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), Shah Alam, Malaysia, 20 June 2020.
  23. Bitella, G.; Rossi, R.; Bochicchio, R.; Perniola, M.; Amato, M. A Novel Low-Cost Open-Hardware Platform for Monitoring soil moisture Content and Multiple Soil-Air-Vegetation Parameters. Sensors 2014, 14, 19639–19659.
  24. Ferrarezi, R.S.; Dove, S.K.; van Iersel, M.W. An Automated System for Monitoring Soil Moisture and Controlling Irrigation Using Low-cost Open-source Microcontrollers. HortTechnology 2015, 25, 110–118.
  25. Chen, G.; Zhao, S.; Caiying, N.I. Hyperspectral monitoring of soil contaminated by heavy metals. J. Univ. Chin. Acad. Sci. 2019, 36, 560–566.
  26. Guan, T. Practice of Hyperspectral Remote Sensing in Monitoring Soil Heavy Metal Pollution. Guangdong Chem. Ind. 2019.
  27. Zhou, W.; Yang, H.; Xie, L.; Li, H.; Huang, L.; Zhao, Y.; Yue, T. Hyperspectral inversion of soil heavy metals in Three-River Source Region based on random forest model. Catena 2021, 202, 105222.
  28. Jiang, C.; Fang, H. GSV: A general model for hyperspectral soil reflectance simulation. Int. J. Appl. Earth Obs. Geoinf. 2019, 83, 101932.
  29. Nerger, R.; Klüver, K.; Cordsen, E.; Fohrer, N. Intensive long-term monitoring of soil organic carbon and nutrients in Northern Germany. Nutr. Cycl. Agroecosyst. 2020, 116, 57–69.
  30. Kashyap, B.; Kumar, R. Sensing Methodologies in Agriculture for Soil Moisture and Nutrient Monitoring. IEEE Access 2021, 9, 14095–14121.
  31. Chen, M.; Zhang, M.; Wang, X.; Yang, Q.; Wang, M.; Liu, G.; Yao, L. An All-Solid-State Nitrate Ion-Selective Electrode with Nanohybrids Composite Films for In-Situ soil organic matter Monitoring. Sensors 2020, 20, 2270.
  32. Puno, J. Soil organic matter Detection using Genetic Algorithm. In Proceedings of the 2019 IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Laoag, Philippines, 29 November–1 December 2019.
  33. Dudala, S.; Dubey, S.K.; Goel, S. Microfluidic soil organic matter Detection System: Integrating Nitrite, pH, and Electrical Conductivity Detection. IEEE Sens. J. 2020, 20, 4504–4511.
  34. Cai, H.-T.; Liu, J.; Chen, J.-Y.; Zhou, K.-H.; Pi, J.; Xia, L.-R. Soil organic matter information extraction model based on transfer learning and near infrared spectroscopy. AEJ—Alex. Eng. J. 2021, 60, 2741–2746.
  35. Prananto, J.A.; Minasny, B.; Weaver, T. Near infrared (NIR) spectroscopy as a rapid and cost-effective method for nutrient analysis of plant leaf tissues. Adv. Agron. 2020, 164, 1–49.
  36. Zhou, P.; Li, M.; Yang, W.; Yao, X.; Liu, Z.; Ji, R. Development and performance tests of an on-the-go detector of soil total nitrogen concentration based on near-infrared spectroscopy. Precis. Agric. 2021, 22, 1479–1500.
  37. Alekseev, I.; Abakumov, E. Soil organic carbon stocks and stability of organic matter in permafrost-affected soils of Yamal region, Russian Arctic. Geoderma Reg. 2021, 28, e00454.
  38. Bae, J.-S.; Oh, S.-K.; Pedrycz, W.; Fu, Z. Design of fuzzy radial basis function neural network classifier based on information data preprocessing for recycling black plastic wastes: Comparative studies of ATR FT-IR and Raman spectroscopy. Appl. Intell. 2019, 49, 929–949.
  39. Taylor, N.; Wyres, M.; Bollard, M.; Kneafsey, R. Use of functional near-infrared spectroscopy to evaluate cognitive change when using healthcare simulation tools. BMJ Simul. Technol. Enhanc. Learn. 2020, 6, 360–364.
  40. O’Rourke, S.M.; Holden, N.M. Determination of Soil Organic Matter and Carbon Fractions in Forest Top Soils using Spectral Data Acquired from Visible—Near Infrared Hyperspectral Images. Soil Sci. Soc. Am. J. 2012, 76, 586.
  41. Jiang, X.-L.; Zou, B.; Tu, Y.-L.; Feng, H.-H.; Chen, X. Quantitative Estimation of Cd Concentrations of Type Standard Soil Samples Using Hyperspectral Data. Spectrosc. Spectr. Anal. 2018, 38, 3254–3260.
  42. Cheng, W.; Sun, D.-W.; Pu, H.; Liu, Y. Integration of spectral and textural data for enhancing hyperspectral prediction of K value in pork meat. LWT—Food Sci. Technol. 2016, 72, 322–329.
  43. Zhang, C.; Ye, H.; Liu, F.; He, Y.; Kong, W.; Sheng, K. Determination and Visualization of pH Values in Anaerobic Digestion of Water Hyacinth and Rice Straw Mixtures Using Hyperspectral Imaging with Wavelet Transform Denoising and Variable Selection. Sensors 2016, 16, 244.
  44. Henderson, C.; Potter, W.; McClendon, R.; Hoogenboom, G. Predicting Aflatoxin Contamination in Peanuts: A Genetic Algorithm/Neural Network Approach. Appl. Intell. 2000, 12, 183–192.
  45. Zhao, Y.-R.; Li, X.; Yu, K.-Q.; Cheng, F.; He, Y. Hyperspectral Imaging for Determining Pigment Contents in Cucumber Leaves in Response to Angular Leaf Spot Disease. Sci. Rep. 2016, 6, 27790.
  46. Chen, Z.; Wang, Y.; Wu, J.; Deng, C.; Hu, K. Sensor data-driven structural damage detection based on deep convolutional neural networks and continuous wavelet transform. Appl. Intell. 2021, 51, 5598–5609.
  47. Li, Y.; Chao, X. Semi-supervised few-shot learning approach for plant diseases recognition. Plant Methods 2021, 17, 68.
  48. Liu, J.-W.; Zuo, F.-L.; Guo, Y.-X.; Li, T.-Y.; Chen, J.-M. Research on improved wavelet convolutional wavelet neural networks. Appl. Intell. 2021, 51, 4106–4126.
  49. Wu, Y.; Zhao, R.; Jin, W.; He, T.; Ma, S.; Shi, M. Intelligent fault diagnosis of rolling bearings using a semi-supervised convolutional neural network. Appl. Intell. 2020, 51, 2144–2160.
  50. Wang, Y.; Li, X.; Wang, J. A neurodynamic optimization approach to supervised feature selection via fractional programming. Neural Netw. 2021, 136, 194–206.
Figure 1. Data collection location of soil organic matter hyperspectral database.
Figure 2. Laboratory data acquisition device.
Figure 3. LSTM neural network structure.
Figure 4. Optimization process of genetic algorithm.
Figure 5. One-dimensional convolutional neural network model structure.
Figure 6. Measured and predicted values of soil moisture content.
Figure 7. Measured and predicted values of SN content.
Figure 8. Precision recall curve: (A) is the PRC of SVR, and (B) is the PRC of CNN.
Table 1. Statistics of moisture and organic matter content of soil samples.
| Soil Samples | Number of Samples | Minimum (g kg−1) | Maximum (g kg−1) | Mean (g kg−1) | Standard Deviation (g kg−1) |
| --- | --- | --- | --- | --- | --- |
| Soil moisture | 800 | 124.831 | 181.295 | 142.804 | 24.081 |
| Soil organic matter | 800 | 2.016 | 5.107 | 3.209 | 0.487 |
Table 2. Fuzzy rule tables based on fuzzy control.
| Confusion Matrix | Predict: 1 | Predict: 0 |
| --- | --- | --- |
| Real: 1 | TP | FN |
| Real: 0 | FP | TN |
Table 3. Macro-average and micro-average formulas.
| Evaluation Criteria | Macro-Average | Micro-Average |
| --- | --- | --- |
| Precision | $Macro\_P = \frac{1}{n}\sum_{i=1}^{n} P_i$ | $Micro\_P = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FP_i}$ |
| Recall | $Macro\_R = \frac{1}{n}\sum_{i=1}^{n} R_i$ | $Micro\_R = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} TP_i + \sum_{i=1}^{n} FN_i}$ |
| F1 score | $Macro\_F1 = \frac{2 \times Macro\_P \times Macro\_R}{Macro\_P + Macro\_R}$ | $Micro\_F1 = \frac{2 \times Micro\_P \times Micro\_R}{Micro\_P + Micro\_R}$ |
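The macro- and micro-averaged metrics of Table 3 can be computed directly from per-class confusion-matrix counts. The sketch below takes micro-recall in its standard form, with true positives and false negatives (TP and FN, as in Table 2). The function name `macro_micro` and the example counts are hypothetical.

```python
def safe_div(a, b):
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return a / b if b else 0.0

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * p * r, p + r)

def macro_micro(counts):
    """counts: list of (TP_i, FP_i, FN_i) tuples, one per class i = 1..n."""
    n = len(counts)
    # Macro: compute the metric per class, then average over classes.
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, fn in counts) / n
    macro_r = sum(safe_div(tp, tp + fn) for tp, fp, fn in counts) / n
    # Micro: pool the counts over classes, then compute the metric once.
    tp_sum = sum(tp for tp, fp, fn in counts)
    fp_sum = sum(fp for tp, fp, fn in counts)
    fn_sum = sum(fn for tp, fp, fn in counts)
    micro_p = safe_div(tp_sum, tp_sum + fp_sum)
    micro_r = safe_div(tp_sum, tp_sum + fn_sum)
    return {
        "macro": (macro_p, macro_r, f1(macro_p, macro_r)),
        "micro": (micro_p, micro_r, f1(micro_p, micro_r)),
    }

# Hypothetical counts for two classes: (TP, FP, FN).
result = macro_micro([(8, 2, 2), (1, 1, 3)])
print(result)
```

Macro-averaging weights every class equally, whereas micro-averaging weights every sample equally, so the two differ most when the class sizes are unbalanced, as in this toy example.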
Table 4. Preliminary evaluation of the CNN and SVR using 10-fold cross-validation.
| Model | Parameters | KCV Accuracy | 95% Confidence Interval | AVG Precision | AVG Recall | AVG F1 Score |
| --- | --- | --- | --- | --- | --- | --- |
| CNN | One layer | 0.61 | (±0.25) | 0.65 | 0.66 | 0.67 |
| CNN | Two layer | 0.81 | (±0.11) | 0.85 | 0.73 | 0.74 |
| CNN | Three layer | 0.77 | (±0.18) | 0.79 | 0.78 | 0.78 |
| CNN | Four layer | 0.75 | (±0.11) | 0.69 | 0.68 | 0.55 |
| SVR | “linear” | 0.49 | (±0.08) | 0.55 | 0.56 | 0.56 |
| SVR | “poly” | 0.57 | (±0.22) | 0.60 | 0.50 | 0.56 |
| SVR | “rbf” | 0.61 | (±0.18) | 0.55 | 0.60 | 0.59 |
| SVR | “sigmoid” | 0.34 | (±0.21) | 0.32 | 0.35 | 0.30 |
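A 10-fold cross-validation of the kind summarized in Table 4 splits the samples into ten folds, uses each fold once for testing, and then reports the mean accuracy with an interval. The sketch below is an assumption-laden illustration. The helper names (`kfold_indices`, `mean_and_ci95`) are hypothetical, the folds are contiguous without shuffling, and the interval uses a normal approximation (1.96 standard errors), which may differ from how the ± values in Table 4 were obtained.

```python
import math
import statistics

def kfold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering all samples once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds

def mean_and_ci95(fold_scores):
    """Mean score and half-width of an approximate 95% confidence interval."""
    mean = statistics.mean(fold_scores)
    std_err = statistics.stdev(fold_scores) / math.sqrt(len(fold_scores))
    return mean, 1.96 * std_err  # normal approximation (assumption)

folds = kfold_indices(800, k=10)  # 800 samples, as in Table 1
print(len(folds), len(folds[0][0]), len(folds[0][1]))

# Hypothetical per-fold accuracies, for illustration only.
example_scores = [0.80, 0.82, 0.79, 0.83, 0.81, 0.78, 0.84, 0.80, 0.82, 0.81]
mean, half_width = mean_and_ci95(example_scores)
print(f"accuracy = {mean:.2f} (±{half_width:.2f})")
```

In practice the sample order would normally be shuffled before splitting, so that each fold contains a mixture of laboratory and field samples.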
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wang, H.; Zhang, L.; Zhao, J.; Hu, X.; Ma, X. Application of Hyperspectral Technology Combined with Genetic Algorithm to Optimize Convolution Long- and Short-Memory Hybrid Neural Network Model in Soil Moisture and Organic Matter. Appl. Sci. 2022, 12, 10333. https://doi.org/10.3390/app122010333