1. Introduction
As society develops, the drawbacks of traditional fossil fuels have become increasingly prominent. In contrast to conventional fossil fuels, which can severely harm the environment and ecology, new environmentally friendly energy sources such as wind power are more sustainable and worthy of promotion [1]. Wind power generation has the potential to alleviate the shortage of conventional energy sources and mitigate increasingly severe environmental pollution [2,3]. However, because natural wind is inherently intermittent, random, and volatile, wind power output is highly variable, which poses significant challenges for wind power grid integration and power system scheduling and consequently affects power quality and the safe operation of the system [4]. Therefore, accurate wind power forecasting is crucial for alleviating peak load pressure on the grid, reducing the required backup capacity of the power system, and enhancing wind power penetration and system reliability [5].
Wind power forecasting methods can be categorized as indirect or direct according to the acquisition approach. Indirect methods are physics-driven prediction techniques, while direct methods use statistical and machine learning models to learn patterns from historical data [5]. Physics-driven methods are often complex to model, costly, and computationally intensive [6]: they require models built on the complex physical relationships between quantities such as meteorological and topographical information, and their high computational cost makes them unsuitable for ultra-short-term forecasting [7]. Traditional statistical methods, such as the Autoregressive Integrated Moving Average (ARIMA) model [8] and Bayesian regression [9], use historical wind farm data to capture the linear characteristics of wind output. However, such linear models are not suitable for nonlinear and non-stationary prediction. With the rapid advancement of information technology and artificial intelligence, machine learning methods, including artificial neural networks (ANNs) [10] and Support Vector Machines (SVMs) [11], have been widely applied to wind power forecasting.
Early machine learning methods struggled to handle multivariate time-series data characterized by high dimensionality, temporal dynamics, and complexity. The expressive power and feature extraction capability of neural networks improve with network depth, allowing deep learning methods to better mine the high-dimensional, deep features contained in the data [12]. Owing to their outstanding performance in extracting and fitting data features, deep learning methods have been widely applied to wind power forecasting in recent years [13]. Among these methods, Long Short-Term Memory (LSTM) neural networks [14] stand out for their strong memory retention, which enables them to extract valuable information from long sequences; they have therefore found widespread application in forecasting. References [15,16,17,18,19] propose several LSTM-based wind power forecasting models that use LSTM networks to learn the temporal features in wind power data, achieving higher prediction accuracy than linear models, traditional machine learning models, and artificial neural networks. Reference [20] proposes an improved LSTM neural network for wind power forecasting with excellent predictive performance. However, standalone LSTM networks often suffer from a large number of gate units, slow training, and relatively low model stability. Reference [21] highlights the current research trend of improving the prediction accuracy of LSTM models by combining them with bio-inspired ensemble forecasting models. Among these, the Dung Beetle Optimizer (DBO) algorithm [22], inspired by the rolling, dancing, foraging, stealing, and reproductive behaviors of dung beetles, generates diverse regional search strategies and update rules, leading to a DBO-LSTM neural network for short-term power load forecasting. In this model, different dung beetle subpopulations perform the search, replacing the traditional approach of setting parameters manually from human experience. This method also enhances the generalization ability of the LSTM network on time-series problems.
However, the Dung Beetle Optimizer (DBO) may exhibit low convergence accuracy and is prone to local optima in certain situations. To further enhance the accuracy of wind power forecasting, an improved DBO algorithm, which we name MSADBO, is proposed to address the global optimization problem. Inspired by the Modified Sine Algorithm (MSA), we endow the dung beetle with the global exploration and local exploitation capabilities of MSA, expanding its search range, improving global exploration, and reducing the likelihood of falling into local optima. Additionally, chaotic mapping initialization and mutation operators are introduced as perturbations.
The improved DBO optimizes three hyperparameters of the LSTM, significantly enhancing the model's prediction accuracy. Compared with other models, MSADBO-LSTM demonstrates the best prediction accuracy, robustness, and the least lag, indicating that it can accurately capture the changing trends of wind power and respond promptly to future variations, showing high practicality and reliability.
The following sections will introduce the principles of the MSADBO-LSTM model and highlight the improvements made based on the original DBO-LSTM. A performance comparison between MSADBO and other optimization algorithms, as well as a comparison of wind power forecasting results, will also be presented. Finally, the superiority of this algorithm will be demonstrated.
2. Model Principles
2.1. Long Short-Term Memory Network
LSTM (Long Short-Term Memory network) is a specially designed recurrent neural network (RNN) aimed at addressing the gradient vanishing and exploding problems that standard RNNs face when processing long sequences. The basic structure of an LSTM unit is illustrated in Figure 1. What makes the LSTM unit unique is its three gates: the forget gate, input gate, and output gate. These gates precisely control the flow of information between units by weighting the input data and hidden states, effectively managing long-term dependencies.
The main function of the forget gate is to determine which information should be discarded from the cell state. It generates a value between 0 and 1 by weighting and activating the hidden state from the previous time step and the current input, where 0 indicates complete removal and 1 indicates complete retention. The input gate consists of two parts: a sigmoid layer and a tanh layer. The sigmoid layer determines which input values should be updated, outputting a value between 0 and 1 that controls the new information to be introduced. The tanh layer generates new candidate values that, after being controlled by the sigmoid layer, may be added to the cell state to update its value. The output gate is responsible for determining the output value based on the current cell state and the hidden state from the previous time step. Specifically, it first uses a sigmoid activation function to decide which information should be output, then applies a tanh activation function to transform the cell state, ultimately generating the new hidden state and output.
Through these gate mechanisms, LSTM can effectively retain and utilize long-term information when processing long sequence data, significantly improving the performance of traditional RNNs, especially in tasks such as language modeling and time-series prediction. As a result, LSTM networks are capable of capturing dependencies in sequence data over extended time horizons, thereby enhancing the model’s predictive ability and accuracy.
If the input sequence is x = (x_1, x_2, …, x_T) and the hidden layer state is h_t, then at time t we have:

f_t = σ(W_f·x_t + U_f·h_{t−1} + b_f)   (1)
i_t = σ(W_i·x_t + U_i·h_{t−1} + b_i)   (2)
c̃_t = tanh(W_c·x_t + U_c·h_{t−1} + b_c)   (3)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t   (4)
o_t = σ(W_o·x_t + U_o·h_{t−1} + b_o)   (5)
h_t = o_t ⊙ tanh(c_t)   (6)

In these equations, f_t, i_t, and o_t represent the forget gate, input gate, and output gate, respectively; c_t is the updated memory cell state; W_f, W_i, W_c, W_o and U_f, U_i, U_c, U_o are the weights of the corresponding network layers; b_f, b_i, b_c, and b_o are the corresponding biases; and σ(·) and tanh(·) are the activation functions.
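As an illustration, the gate computations above can be sketched as a single NumPy time step (a minimal sketch; the weight shapes, random initialization, and dictionary layout are illustrative choices, not the paper's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts keyed by gate name (f, i, c, o)."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])         # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])         # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate state
    c = f * c_prev + i * c_tilde                                 # new cell state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])         # output gate
    h = o * np.tanh(c)                                           # new hidden state
    return h, c

# toy dimensions: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
n_in, n_h = 3, 2
W = {k: rng.normal(size=(n_h, n_in)) for k in "fico"}
U = {k: rng.normal(size=(n_h, n_h)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, U, b)
```

Because the output gate lies in (0, 1) and tanh of the cell state lies in (−1, 1), every component of the hidden state is bounded in magnitude by 1.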
2.2. Dung Beetle Optimization Algorithm
The Dung Beetle Optimization (DBO) algorithm is a swarm intelligence optimization algorithm based on the behavioral characteristics of dung beetles. The algorithm simulates various behaviors of dung beetles, such as rolling, dancing, breeding, foraging, and stealing, and designs a series of update rules and strategies. Each dung beetle group consists of four different types of agents: rolling beetles, breeding beetles (breeding balls), small beetles, and stealing beetles.
- (1) Rolling beetles
Dung beetles roll balls of dung to suitable locations. While rolling, they use cues such as the sun or wind direction to maintain a straight path. To simulate this behavior, the dung beetles in the algorithm move in a given direction within the search space. During the rolling process, their positions are updated as shown in Equations (7) and (8):

x_i(t+1) = x_i(t) + α·k·x_i(t−1) + b·Δx   (7)
Δx = |x_i(t) − X^w|   (8)

where t represents the iteration count; x_i(t) indicates the position of the i-th dung beetle at the t-th iteration; α is the natural coefficient, assigned a value of 1 or −1, where α = 1 indicates no deviation from the direction and α = −1 indicates a deviation; k ∈ (0, 0.2] is the deflection coefficient; b is a constant belonging to (0, 1); X^w represents the global worst position; and Δx is used to simulate changes in light intensity.
When a dung beetle encounters an obstacle blocking its path, it uses dancing behavior to replan its route. In this case, the position update formula is given by Equation (9):

x_i(t+1) = x_i(t) + tan(θ)·|x_i(t) − x_i(t−1)|   (9)

where θ ∈ [0, π] represents the deflection angle. When θ = 0, π/2, or π, the position of the dung beetle is not updated.
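The rolling and dancing updates of Equations (7)–(9) can be sketched as follows (a minimal NumPy sketch; the values k = 0.1 and b = 0.3 are illustrative choices within the stated ranges):

```python
import numpy as np

rng = np.random.default_rng(1)

def roll(x_t, x_prev, x_worst, k=0.1, b=0.3):
    """Ball-rolling update, Eqs. (7)-(8); alpha = +/-1 models direction deviation."""
    alpha = rng.choice([1.0, -1.0])
    delta_x = np.abs(x_t - x_worst)          # simulates the change in light intensity
    return x_t + alpha * k * x_prev + b * delta_x

def dance(x_t, x_prev):
    """Dancing update, Eq. (9): tan(theta) re-orients a blocked beetle."""
    theta = rng.uniform(0.0, np.pi)
    if theta in (0.0, np.pi / 2, np.pi):     # these angles leave the position unchanged
        return x_t
    return x_t + np.tan(theta) * np.abs(x_t - x_prev)
```

Note that with x_i(t−1) = 0 the rolling step reduces to x_i(t) + b·Δx regardless of the sign of α.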
- (2) Breeding beetles (breeding balls)
To breed their offspring safely, dung beetles roll the dung balls to a secure location and hide them inside, where they lay their eggs. The boundary selection strategy for the egg-laying area is shown in Equations (10) and (11):

Lb* = max(X*·(1 − R), Lb)   (10)
Ub* = min(X*·(1 + R), Ub)   (11)

where Lb* and Ub* represent the lower and upper bounds of the egg-laying area, respectively; X* denotes the current local optimal position; R = 1 − t/T_max, where T_max indicates the maximum number of iterations; and Lb and Ub represent the lower and upper bounds of the optimization problem, respectively.
Once the egg-laying area is determined, female dung beetles choose a breeding ball in that area to lay their eggs, with each female laying one egg per iteration. As Equations (10) and (11) show, the boundaries of the egg-laying area are dynamic and depend on the value of R, so the position of the breeding ball also changes dynamically during the iterations:

B_i(t+1) = X* + b_1·(B_i(t) − Lb*) + b_2·(B_i(t) − Ub*)   (12)

where B_i(t) is the position of the i-th dung ball at the t-th iteration; b_1 is a D-dimensional random vector following a normal distribution; and b_2 is a D-dimensional random vector within the range [0, 1].
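The egg-laying bounds and breeding-ball update of Equations (10)–(12) can be sketched as follows (function and variable names are illustrative; R = 1 − t/T_max as stated in the text):

```python
import numpy as np

def spawning_bounds(x_star, lb, ub, t, t_max):
    """Egg-laying area bounds, Eqs. (10)-(11), shrinking around the local best X*."""
    R = 1.0 - t / t_max
    lb_star = np.maximum(x_star * (1.0 - R), lb)   # clamp to the problem bounds
    ub_star = np.minimum(x_star * (1.0 + R), ub)
    return lb_star, ub_star

def brood_ball_update(B_t, x_star, lb_star, ub_star, rng):
    """Breeding-ball position update, Eq. (12)."""
    b1 = rng.normal(size=B_t.shape)    # D-dimensional vector ~ N(0, 1)
    b2 = rng.uniform(size=B_t.shape)   # D-dimensional vector ~ U[0, 1]
    return x_star + b1 * (B_t - lb_star) + b2 * (B_t - ub_star)
```

As R shrinks toward 0 in late iterations, the bounds collapse onto X*, so breeding balls are forced progressively closer to the current local best.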
- (3) Small beetles
When larvae mature into adult dung beetles and emerge from the ground to forage, they are referred to as small beetles. The boundaries of their optimal foraging area are defined as follows:

Lb^b = max(X^b·(1 − R), Lb)   (13)
Ub^b = min(X^b·(1 + R), Ub)   (14)

where Lb^b and Ub^b represent the lower and upper bounds of the optimal foraging area for the small beetles, respectively, and X^b denotes the global best position. The position update for the small beetles is then:

x_i(t+1) = x_i(t) + C_1·(x_i(t) − Lb^b) + C_2·(x_i(t) − Ub^b)   (15)

where x_i(t) represents the position of the i-th small beetle at the t-th iteration; C_1 is a random number following a normal distribution; and C_2 is a random vector belonging to the interval (0, 1).
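The foraging update of Equations (13)–(15) can be sketched as follows (a minimal sketch; names are illustrative):

```python
import numpy as np

def forage(x_t, x_best, lb, ub, t, t_max, rng):
    """Small-beetle foraging update, Eqs. (13)-(15)."""
    R = 1.0 - t / t_max
    lb_b = np.maximum(x_best * (1.0 - R), lb)    # Eq. (13): lower foraging bound
    ub_b = np.minimum(x_best * (1.0 + R), ub)    # Eq. (14): upper foraging bound
    C1 = rng.normal(size=x_t.shape)              # random number ~ N(0, 1)
    C2 = rng.uniform(size=x_t.shape)             # random vector ~ U(0, 1)
    return x_t + C1 * (x_t - lb_b) + C2 * (x_t - ub_b)   # Eq. (15)
```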
- (4) Stealing beetles
Some dung beetles steal dung balls from other beetles; these are referred to as "thieving beetles". From Equations (13) and (14), it can be seen that X^b represents the best food source, so the area near X^b can be regarded as the optimal location for food competition. During the iteration process, the positions of the thieving beetles are continuously updated as follows:

x_i(t+1) = X^b + S·g·(|x_i(t) − X*| + |x_i(t) − X^b|)   (16)

where x_i(t) represents the position of the i-th thieving beetle at the t-th iteration; g is a random vector of size 1 × D following a normal distribution; and S denotes a constant.
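The thieving-beetle update of Equation (16) can be sketched as follows (the default S = 0.5 is an illustrative choice):

```python
import numpy as np

def steal(x_t, x_star, x_best, S=0.5, rng=None):
    """Thieving-beetle update, Eq. (16): move toward the best food source X^b."""
    g = rng.normal(size=x_t.shape)   # 1 x D random vector ~ N(0, 1)
    return x_best + S * g * (np.abs(x_t - x_star) + np.abs(x_t - x_best))
```

The new position is centered on the global best X^b, with a random spread proportional to the beetle's distance from both the local best X* and X^b.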
2.3. Improved Dung Beetle Optimization Algorithm
2.3.1. The Purpose of the Improvement
Although the Dung Beetle Optimization (DBO) algorithm outperforms many other optimization algorithms, exhibiting strong optimization capability and fast convergence, it still suffers from an imbalance between global exploration and local exploitation on complex problems, which risks trapping the search in local optima and reflects weaker global exploration ability. Therefore, to enhance the exploration capacity of DBO, the algorithm is improved by incorporating a Bernoulli mapping initialization strategy, embedding an improved Sine Algorithm strategy, and applying adaptive Gaussian–Cauchy mutation perturbations.
2.3.2. Initialize the Population Using Bernoulli Mapping
Before the improvement, the population of the Dung Beetle Optimization (DBO) algorithm was initialized by random generation. The shortcomings of this method include an uneven distribution of beetle positions, weak global exploration capability, low population diversity, and a tendency to become trapped in local optima. In contrast, chaotic mapping combines determinism with randomness and is characterized by randomness and non-periodicity. Chaotic initialization can broaden the search of optimization algorithms and can be used to address global optimization problems [23]. Among the many types of chaotic mappings, the Bernoulli mapping can replace random population initialization, improving the distribution quality of the dung beetle population and enhancing global search capability. Therefore, we use the Bernoulli mapping to initialize the positions of the dung beetles. First, the values obtained through the Bernoulli mapping relation are projected into the chaotic variable space; then, the resulting chaotic values are mapped into the initial space of the algorithm through a linear transformation. The specific expression of the Bernoulli mapping is as follows:

x_{k+1} = x_k / (1 − λ),          0 < x_k ≤ 1 − λ   (17)
x_{k+1} = (x_k − (1 − λ)) / λ,    1 − λ < x_k ≤ 1

where λ ∈ (0, 1) is the mapping parameter, whose value is set to achieve the best performance.
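A minimal sketch of Bernoulli-map initialization might look as follows (the parameter lam = 0.4 and the seed x0 are illustrative assumptions, since the mapping parameter is tuned empirically):

```python
import numpy as np

def bernoulli_sequence(n, lam=0.4, x0=0.3):
    """Bernoulli (shift) map chaotic sequence on (0, 1]; lam is the mapping
    parameter in (0, 1). lam = 0.4 and x0 = 0.3 are illustrative choices."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        if x <= 1.0 - lam:
            x = x / (1.0 - lam)
        else:
            x = (x - (1.0 - lam)) / lam
        xs[i] = x
    return xs

def init_population(pop_size, dim, lb, ub):
    """Map the chaotic values linearly into the search space [lb, ub]."""
    seq = bernoulli_sequence(pop_size * dim).reshape(pop_size, dim)
    return lb + seq * (ub - lb)
```

Usage: `pop = init_population(30, 5, -5.0, 5.0)` produces a 30 × 5 initial population spread across the search box.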
The distribution of the Bernoulli mapping chaotic sequence is shown in Figure 2. The scatter plot in Figure 2a helps to assess whether the initial points are uniformly distributed or clustered in certain regions. The histogram in Figure 2b shows the frequency distribution of the system states, revealing whether the chaotic system is uniformly distributed, its randomness, and the diversity of its distribution. This method distributes the initial points almost uniformly within the unit interval and ensures that the latter half of the generated sequence does not overlap. Statistical tests also show that the generated sequence exhibits good randomness. As a result, the population initialized by the Bernoulli mapping is more uniformly distributed, which improves the quality and diversity of the population and helps avoid becoming trapped in local optima.
2.3.3. Introduction of the Improved Sine Algorithm
The Improved Sine Algorithm (MSA) [24] draws on algorithms related to the Sine Cosine Algorithm (SCA) [25], the Sine Algorithm (SA) [26], the Exponential Sine Cosine Algorithm (ESCA) [27], and the Improved Sine Cosine Algorithm (ISCA) [28]. It uses the sine function for iterative optimization and demonstrates strong global exploration capability. To achieve a good balance between global exploration and local exploitation, an adaptive variable inertia weight coefficient ω is introduced into the position update process. The position update formula of the Improved Sine Algorithm is shown in Equation (18):

X_i^{t+1} = ω·X_i^t + r_1·sin(r_2)·|r_3·X_{b,i}^t − X_i^t|   (18)

where t is the current iteration count; ω is the inertia weight; X_i^t is the i-th position component of individual X at the t-th iteration; X_{b,i}^t is the i-th component of the best individual position at the t-th iteration; r_1 is a nonlinear decreasing function; r_2 is a random number in the interval [0, 2π]; and r_3 is a random number in the interval [−2, 2].
The parameter r_1 determines the search distance and direction of the dung beetle, improving the search behavior of the DBO algorithm. Its value is given by Equation (19):

where r_max and r_min represent the maximum and minimum values of r_1, t denotes the current iteration count, and T_max represents the maximum number of iterations.
By using the adaptive coefficient ω, the search space is gradually reduced: as the number of iterations increases, the inertia weight decreases. A relatively large inertia weight in the early stages of the algorithm gives strong global exploration capability, while a smaller inertia weight in the later stages improves local exploitation. The formula for the adaptive coefficient ω is as follows:
To further enhance the global exploration and local exploitation capabilities of the DBO algorithm, a sine guiding mechanism is introduced on top of the existing framework. By applying sine operations to the entire dung beetle population during the rolling phase, the positions of the beetles are guided during updates. The improved formula is as follows:

where δ is a random number in [0, 1] and ST ∈ (0.5, 1]. In the improved position update formula, when δ < ST, the dung beetle rolls with a specific target and remains in the normal global exploration phase. Conversely, when δ ≥ ST, the dung beetle has no clear rolling target and instead moves according to the sine function during the search. This improved sine guiding mechanism significantly mitigates the excessive randomness of the DBO position update strategy and addresses the original algorithm's tendency to become trapped in local optima. It allows the dung beetles to perform global exploration and local exploitation within the specified range of the algorithm, effectively expanding the search space and guiding the population to converge gradually toward the optimal solution of the objective function, thereby enhancing the algorithm's global optimization capability.
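The sine-guided rolling step might be sketched as follows (a sketch under stated assumptions: the threshold ST = 0.6, the linear inertia-weight schedule, and the targeted-rolling placeholder step are illustrative, not the paper's exact formulas):

```python
import numpy as np

def msa_guided_roll(x_t, x_best, t, t_max, rng,
                    w_max=0.9, w_min=0.4, ST=0.6):
    """Sine-guided rolling sketch: with probability delta < ST the beetle keeps
    a targeted (normal) rolling step; otherwise it moves by a sine-driven step."""
    w = w_max - (w_max - w_min) * t / t_max   # inertia weight decreases over time
    r1 = w                                    # decreasing step-length control
    r2 = rng.uniform(0.0, 2.0 * np.pi)        # random phase in [0, 2*pi]
    r3 = rng.uniform(-2.0, 2.0)               # random scaling in [-2, 2]
    delta = rng.uniform()
    if delta < ST:
        # targeted rolling: ordinary exploration step (illustrative placeholder)
        return x_t + 0.1 * (x_best - x_t)
    # no clear target: sine-guided move relative to the best individual
    return w * x_t + r1 * np.sin(r2) * np.abs(r3 * x_best - x_t)
```

Early iterations (large w) take long sine-driven steps for exploration; late iterations (small w) shrink the step for local exploitation.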
2.3.4. Adaptive Gaussian–Cauchy Mixture Mutation Perturbation
In the final stage of the algorithm iteration, which is the foraging phase, the dung beetles tend to gather near the optimal position. However, the current position may not be the global optimum, causing the beetles to continuously search for the optimal position around their current location. This leads to the inability to discover the true optimal solution, resulting in them being trapped in local optima. To address this issue, mutation perturbations are generally employed to interfere with individuals, increasing the diversity of the population. This allows the beetles to escape local optima and explore other regions of the solution space until they ultimately find the global optimum.
A mutation operator is therefore incorporated into the Dung Beetle Optimization algorithm; Gaussian mutation and Cauchy mutation are two commonly used mutation operators. Weighing the advantages and disadvantages of both, an adaptive Gaussian–Cauchy mixture perturbation strategy is proposed that combines the strengths of Cauchy mutation and Gaussian mutation. The specific formula is shown in Equation (22):

X_b'(t) = X_b(t)·(1 + λ_1·Gauss(0, 1) + λ_2·Cauchy(0, 1))   (22)

where X_b(t) represents the optimal position of individual X at the t-th iteration; X_b'(t) is the position of X_b(t) after the Gaussian–Cauchy mixture perturbation at the t-th iteration; Gauss(0, 1) is the Gaussian mutation operator; Cauchy(0, 1) is the Cauchy mutation operator; λ_1 = t/T_max; and λ_2 = 1 − t/T_max.
In the early iterations of the algorithm, mutation perturbations are performed mainly with the Cauchy distribution function, enabling wide global exploration and rapid convergence. As the algorithm continues to iterate, the positions of the dung beetles gradually stabilize, and the algorithm mainly employs the Gaussian distribution function to perturb the population, helping it escape local optima. By combining the characteristics of the Gaussian and Cauchy distribution functions, the diversity of the dung beetles is enhanced, further improving the algorithm's local exploitation and global exploration capabilities. However, the new position obtained after a mutation perturbation is not guaranteed to have better fitness than the original position. Therefore, after the mutation perturbation update, a greedy mechanism [29] is introduced to compare the fitness of the new and old positions and decide whether to update the position. Letting f(x) denote the fitness value of position x, the greedy mechanism is given by Equation (23):

x(t+1) = x'(t),  if f(x'(t)) < f(x(t));  otherwise  x(t+1) = x(t)   (23)
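The mixture perturbation and greedy selection can be sketched as follows (the linear weight schedule lam1 = t/T_max, lam2 = 1 − t/T_max is an assumed instance of the adaptive weights, chosen so the Cauchy term dominates early and the Gaussian term late):

```python
import numpy as np

def gauss_cauchy_mutate(x_best, t, t_max, rng):
    """Adaptive Gaussian-Cauchy mixture perturbation (sketch)."""
    lam1 = t / t_max           # Gaussian weight: grows with iterations
    lam2 = 1.0 - t / t_max     # Cauchy weight: shrinks with iterations
    gauss = rng.normal(size=x_best.shape)            # Gauss(0, 1)
    cauchy = rng.standard_cauchy(size=x_best.shape)  # Cauchy(0, 1), heavy-tailed
    return x_best * (1.0 + lam1 * gauss + lam2 * cauchy)

def greedy_select(x_old, x_new, f):
    """Greedy mechanism, Eq. (23): keep whichever position has better fitness."""
    return x_new if f(x_new) < f(x_old) else x_old
```

The heavy tails of the Cauchy distribution produce occasional large jumps (good for escaping local optima early), while the Gaussian term yields small, fine-grained perturbations later.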
2.4. Performance Testing of MSADBO
To demonstrate the strong optimization capability and convergence of MSADBO, this study compares it with the Whale Optimization Algorithm (WOA) [30], the Grey Wolf Optimizer (GWO) [31], and the Dung Beetle Optimizer (DBO). The algorithms are tested on three unimodal benchmark functions (F1, F3, F5) and three multimodal benchmark functions (F8, F11, F13). The unimodal benchmark functions assess the optimization capability and convergence speed of the algorithms, while the multimodal benchmark functions test whether an algorithm can avoid local minima and find the global optimum, thereby evaluating its global search and exploration abilities. The expressions of the test functions are shown in Table 1, the results on the benchmark functions are shown in Table 2, and the test results are illustrated in Figure 3.
From Figure 3, it is evident that the MSADBO algorithm exhibits excellent optimization performance, high precision, fast convergence, and good stability, along with strong global search capability and the ability to escape local optima.