3.2. Locust Density Inversion Model
This study divided the dataset into training and test sets and constructed five models: BP neural network regression combined with principal component analysis (PCA), random forest regression, BP neural network regression alone, deep belief network regression, and support vector regression (SVR). The models were then trained and their parameters were optimized [
13].
BP Neural Network Regression Based on Principal Component Analysis: Principal component analysis (PCA) has become a common method for handling high-dimensional data and simplifying datasets. Its core purpose is to transform complex multi-dimensional data into a lower-dimensional subspace while minimizing the overall loss of information, so that the original dataset is represented more effectively. PCA is particularly important in meteorological data analysis, as factors such as rainfall, air humidity, and soil moisture often have close inter-relationships. Applying PCA to these interrelated data reduces their dimensionality, thereby improving the efficiency of model training [
14].
The BP (backpropagation) neural network, inspired by the human brain’s response mechanism, is a type of multi-layer, fully connected network primarily used for data fitting and classification [
15]. It consists of three key components: the input layer, hidden layers, and the output layer. Neurons, as the fundamental units of the network, facilitate signal transmission between these layers. With the help of internal activation functions in the neurons, the BP neural network can approximate a variety of complex non-linear functions. The workflow of the BP neural network is as follows: signals propagate forward from the input layer, passing through multiple hidden layers, where the signal undergoes complex processing before reaching the output layer [
16]. The data at the output layer are compared with the target data, generating an error value. If the current weights and thresholds do not produce the desired output, the error information will propagate back along the same path; that is, it backpropagates to each corresponding neuron, adjusting the weights and thresholds. This process repeats until the network output error falls within an acceptable range, completing the training process [
17]. A model of the BP neural network is illustrated in
Figure 4.
The calculation formula for the nodes in the hidden layer in the diagram is as follows:
$$H = f\left(\sum_{i=1}^{n} w_i x_i + b_i\right) \quad (2)$$
In formula (2), $x_i$ is the input to the node, $n$ represents the number of nodes, $f(\cdot)$ is the activation function, $w_i$ denotes the parameter weights for the $i$-th layer, and $b_i$ is the bias for the $i$-th layer. Combining PCA and the BP neural network for a regression analysis helps to reduce the risk of overfitting and enhances generalization to unseen data by eliminating noise and irrelevant variables from the data. However, it also comes with disadvantages [
18]. The dimensionality reduction process may discard some components that are crucial for prediction, leading to a deterioration in the interpretability of the model. Combining these two techniques also implies the need to adjust and optimize more parameters, potentially complicating the model training and optimization processes.
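As a minimal sketch of how these two steps can be chained in practice, the snippet below combines scikit-learn's PCA with MLPRegressor (a feed-forward network trained by backpropagation, standing in for the BP network). The synthetic data, component count, and layer sizes are illustrative assumptions rather than the settings used in this study.

```python
# Minimal sketch of a PCA + BP (backpropagation) neural network regression pipeline.
# MLPRegressor stands in for the BP network; n_components=3 and hidden_layer_sizes=(16, 8)
# are illustrative choices, not the study's settings.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # synthetic stand-ins for rainfall, humidity, soil moisture, etc.
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)  # synthetic density proxy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    StandardScaler(),                  # PCA is scale-sensitive, so standardize first
    PCA(n_components=3),               # project correlated meteorological factors onto fewer axes
    MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("R^2 on test set:", model.score(X_test, y_test))
```

Removing the PCA step from this pipeline yields the plain BP neural network regression variant described later in this section.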
Random Forest Regression: The random forest regression algorithm employs an ensemble method consisting of numerous independently constructed decision trees. The core process of this algorithm includes the following: firstly, the generation of multiple different training samples and attribute subsets by repeatedly sampling the original dataset with replacement; secondly, the construction of a decision tree for each sample and attribute subset; and, finally, the derivation of the final prediction value by voting or taking the weighted average of the predictions from these decision trees [
19]. Compared with other machine learning techniques, a significant advantage of a random forest is its ensemble learning characteristic. A random forest can usually avoid the overfitting problem that might occur in a single decision tree, thereby improving generalizability to new data, as well as possessing good noise resistance. Moreover, a random forest maintains an efficient training speed, even when handling large datasets; it can process high-dimensional data without the need for feature selection; and it can provide assessments of the impact of each feature on prediction results, offering some basis for model interpretation [
20]. In this study, the random forest model was evaluated using cross-validation. A schematic diagram of the random forest regression model is shown in
Figure 5.
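A minimal sketch of this model, assuming scikit-learn's RandomForestRegressor with illustrative settings (200 trees, 5-fold cross-validation) and synthetic data in place of the study's dataset:

```python
# Minimal sketch of random forest regression evaluated with k-fold cross-validation.
# n_estimators=200, max_features="sqrt", and cv=5 are illustrative choices.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=6, noise=0.1, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,      # number of decision trees, each grown on a bootstrap sample
    max_features="sqrt",   # random attribute subset considered at each split
    random_state=0,
)
scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
print("Cross-validated R^2:", scores.mean())

rf.fit(X, y)
print("Feature importances:", rf.feature_importances_)  # per-feature contribution estimate
```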
BP Neural Network Regression: In this model, BP neural network regression is used independently, without applying PCA. BP neural networks are capable of capturing and modeling complex non-linear relationships, which is extremely valuable for complex datasets that are difficult to handle with linear models. BP neural networks can effectively predict unseen data, demonstrating good generalization capabilities.
Deep Belief Network Regression: Deep belief networks (DBNs) are a type of deep learning model composed of multiple layers of generative models, typically stacked Restricted Boltzmann Machines (RBMs). Each RBM layer learns representations of data at a different level of abstraction. DBNs initially employ unsupervised learning for the layer-wise pre-training of the network, followed by fine-tuning through supervised learning. DBNs are capable of automatically learning complex and high-level feature representations of data, which is particularly important in fields such as image and speech recognition [
21]. DBNs generally demonstrate good generalization performance across a variety of tasks.
Figure 6 shows a structural diagram of a deep belief network (DBN) model. This model includes three stacked RBM layers and one BP layer. DBNs initially conduct preliminary pre-training of the network through multiple RBM layers and utilize the BP layer for fine-tuning with supervised learning, thereby achieving comprehensive training of the model [
22].
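A rough sketch of this architecture is shown below, assuming scikit-learn's BernoulliRBM as the stacked RBM layers and MLPRegressor as the supervised BP stage. Note that, unlike a full DBN, the supervised step here trains only the final regressor and does not back-propagate into the pre-trained RBM weights; all layer sizes and learning rates are illustrative assumptions.

```python
# Rough sketch of the DBN idea: greedy unsupervised pre-training with stacked RBMs,
# followed by a supervised BP stage. BernoulliRBM and MLPRegressor are stand-ins;
# layer sizes and hyperparameters are illustrative only.
from sklearn.datasets import make_regression
from sklearn.neural_network import BernoulliRBM, MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)

dbn_like = make_pipeline(
    MinMaxScaler(),  # RBMs expect inputs scaled to [0, 1]
    BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0),  # RBM layer 1
    BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0),  # RBM layer 2
    BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0),   # RBM layer 3
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),          # supervised BP stage
)
dbn_like.fit(X, y)
print("Training R^2:", dbn_like.score(X, y))
```

A faithful DBN with joint fine-tuning of the RBM weights would typically be implemented in a deep learning framework rather than with this pipeline approximation.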
In this study, the RBM receives the data vector transmitted from the bottommost layer through the visible layer. The input vector is transformed by an activation function and passed to the hidden layer, and the internal energy function is minimized through training [
23]. Given visible units $v$, hidden units $h$, and their connection weights $W$ (with a size of $n \times m$, where $n$ and $m$ denote the numbers of visible and hidden units, respectively), as well as the offset $a$ for $v$ and the bias weight $b$ for $h$, the energy function $E(v,h)$ is defined using formula (3):
$$E(v,h) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \quad (3)$$
By calculating the energy function $E(v,h)$, the probability distribution $P(v,h)$ for the visible and hidden layers can be expressed as Equations (4) and (5), where $Z$ denotes the normalization factor:
$$P(v,h) = \frac{e^{-E(v,h)}}{Z} \quad (4)$$
$$Z = \sum_{v,h} e^{-E(v,h)} \quad (5)$$
The probability distribution $P(v)$ for the observed data $v$, corresponding to the marginal distribution of $P(v,h)$, is referred to as the likelihood function, as shown in Equation (6):
$$P(v) = \frac{1}{Z} \sum_{h} e^{-E(v,h)} \quad (6)$$
Equation (7) defines the vector $h_{-k}$ obtained by removing component $h_k$ from $h$, which is substituted into Equations (8) and (9). The energy function then simplifies to Equation (10), and solving the likelihood function yields the conditional activation probabilities in Equations (11) and (12):
$$P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_{i=1}^{n} v_i w_{ij}\right) \quad (11)$$
$$P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_{j=1}^{m} w_{ij} h_j\right) \quad (12)$$
The activation probability formula for Restricted Boltzmann Machines (RBMs) is the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, which yields values between 0 and 1 over the entire range $(-\infty, +\infty)$, allowing the activation probabilities of the respective nodes to be computed. When the activation status of all neural units in the visible layer (or hidden layer) is known, the activation probabilities of the hidden-layer (or visible-layer) neurons can be inferred; this involves calculating $P(h \mid v)$ and $P(v \mid h)$. The unknown RBM parameters $W$, $a$, and $b$ can be determined through unsupervised learning [
24].
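As a small numerical illustration of the energy function and the conditional activation probabilities above, the NumPy sketch below evaluates $E(v,h)$, $P(h = 1 \mid v)$, and $P(v = 1 \mid h)$ for arbitrary example states and randomly initialized parameters; all values are illustrative.

```python
# Numerical illustration of the RBM energy function and sigmoid activation probabilities
# in the forms given above; all parameter values and unit states are arbitrary examples.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, a, b):
    # E(v, h) = -a.v - b.h - v.W.h
    return -(a @ v) - (b @ h) - (v @ W @ h)

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # connection weights
a = np.zeros(n_visible)                                # offsets for the visible units
b = np.zeros(n_hidden)                                 # bias weights for the hidden units

v = np.array([1.0, 0.0, 1.0, 1.0])                     # example visible state
h = np.array([0.0, 1.0, 0.0])                          # example hidden state

print("E(v, h) =", energy(v, h, W, a, b))
print("P(h = 1 | v) =", sigmoid(b + v @ W))            # hidden-unit activation probabilities
print("P(v = 1 | h) =", sigmoid(a + W @ h))            # visible-unit activation probabilities
```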
SVR Model: Support vector regression (SVR) is a regression method based on support vector machines (SVMs). In traditional SVMs, the goal is to find a decision boundary that maximizes the margin between different classes of data points. In SVR, this concept is applied to regression problems, i.e., predicting a continuous value, rather than classification [
25]. SVR allows for the setting of an “epsilon margin” within the model, which defines the acceptable error between predicted values and actual values. This approach helps to control the model’s generalization ability and the risk of overfitting. SVR is robust against outliers and noise [
26]. The model primarily relies on support vectors (i.e., data points near the boundary) rather than all data, making it less sensitive to outliers [
27]. SVR can effectively handle data in high-dimensional feature spaces, working well even when the number of features exceeds the number of samples [
28].
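A minimal SVR sketch, assuming scikit-learn's SVR with an RBF kernel and illustrative values for C and epsilon, applied to synthetic data:

```python
# Minimal sketch of support vector regression with an explicit epsilon margin.
# The kernel, C, and epsilon values are illustrative assumptions, not the study's settings.
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=6, noise=0.1, random_state=0)

svr = make_pipeline(
    StandardScaler(),                        # SVR is sensitive to feature scales
    SVR(kernel="rbf", C=10.0, epsilon=0.1),  # epsilon defines the tolerated error tube
)
svr.fit(X, y)
print("Training R^2:", svr.score(X, y))
print("Number of support vectors:", svr[-1].support_vectors_.shape[0])
```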