BWO–ICEEMDAN–iTransformer: A Short-Term Load Forecasting Model for Power Systems with Parameter Optimization

Zheng, Danqi; Qin, Jiyun; Liu, Zhen; Zhang, Qinglei; Duan, Jianguo; Zhou, Ying

doi:10.3390/a18050243


by Danqi Zheng ¹, Jiyun Qin ², Zhen Liu ³, Qinglei Zhang ², Jianguo Duan ² and Ying Zhou ²,*

¹ Logistics Engineering College, Shanghai Maritime University, Shanghai 201306, China
² China Institute of FTZ Supply Chain, Shanghai Maritime University, Shanghai 201306, China
³ Shandong Future Network Research Institute, Jinan 250002, China
* Author to whom correspondence should be addressed.

Algorithms 2025, 18(5), 243; https://doi.org/10.3390/a18050243

Submission received: 25 March 2025 / Revised: 18 April 2025 / Accepted: 21 April 2025 / Published: 24 April 2025

(This article belongs to the Special Issue Advanced Artificial Intelligence/Machine Learning Techniques for Safe Operation and Control in Power and Sustainable Energy Systems)


Abstract

Maintaining the equilibrium between electricity supply and demand remains a central concern in power systems. A demand response program can adjust the power load from the demand side to promote this balance, and load forecasting facilitates the implementation of such a program. However, as electricity consumption patterns become more diverse, the resulting load data grow increasingly irregular, making precise forecasting more difficult. Therefore, this paper developed a specialized forecasting scheme. First, the parameters of improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) were optimized using beluga whale optimization (BWO). Then, the nonlinear power load data were decomposed into multiple subsequences using ICEEMDAN. Finally, each subsequence was independently predicted using the iTransformer model, and the overall forecast was derived by integrating these individual predictions. Data from Singapore were selected for validation. The results showed that the BWO–ICEEMDAN–iTransformer model outperformed the other comparison models, with an R² of 0.9873, MAE of 48.0014, and RMSE of 66.2221.

Keywords:

power system; load forecasting; beluga whale optimization; data decomposition; iTransformer

1. Introduction

The increasing complexity of modern power systems, driven by the integration of renewable energy and the growth of distributed microgrids, has introduced new challenges to grid operation and planning. As the energy supply becomes more dynamic and decentralized, ensuring the stability and efficiency of the power system requires accurate short-term load forecasting. Such forecasting not only supports real-time decision-making in power dispatch and grid management but also plays a crucial role in shaping electricity pricing strategies and achieving low-carbon objectives. To address these needs, a wide range of forecasting techniques have been developed, which generally fall into two main categories: traditional statistical models and data-driven approaches based on machine learning.

Earlier studies primarily relied on traditional statistical models due to their simplicity and interpretability. For example, Lee et al. used the autoregressive integrated moving average (ARIMA) model for one-day-ahead load forecasting [1], and Li et al. used principal component regression (PCR) for medium-term load forecasting [2]. However, traditional statistical methods often fall short when forecasting nonlinear data such as electricity load. Machine learning methods handle complex data well, which makes them increasingly preferred for load forecasting. For example, Al Amin et al. found that the support vector machine (SVM) performed better than statistical methods in predicting load data [3]. In recent years, deep learning has attracted considerable interest in power load forecasting, owing to its capability to automatically learn complex temporal patterns from time series data without manual feature engineering [4,5,6]. However, power load time series are affected by several factors, which limits the prediction accuracy of individual models. Researchers therefore increasingly combine multiple methods into hybrid models to improve prediction accuracy. Li et al. designed a composite forecasting framework that leverages the strengths of the convolutional neural network (CNN) and the gated recurrent unit (GRU), which enhanced the effectiveness of feature extraction from load data [7]. Kim et al. combined a CNN with a long short-term memory (LSTM) network to extract feature information from load data, demonstrating better performance than other methods [8]. To capture features more efficiently, some studies have introduced the attention mechanism into prediction models [9]. Lin et al. proposed an attention-based LSTM network, in which a feature-attention-based encoder adaptively selects the input features and the decoder mines the temporal dependencies, improving prediction accuracy and generalization ability [10]. Niu et al. proposed a CNN–BiGRU–Attention model for load forecasting, in which the attention mechanism reduced forecasting errors [11].

However, these models still have limitations for complex load data. Data decomposition techniques are therefore employed to handle such data. By decomposing the time series into intrinsic mode functions (IMFs), these techniques help reduce the instability of the data, thereby facilitating feature extraction and enhancing forecasting accuracy. Gao et al. combined empirical mode decomposition (EMD) and GRU for short-term load forecasting and demonstrated improved prediction accuracy [12]. However, intermittent fluctuations within the load data give rise to mode aliasing in the extracted IMFs, so more advanced techniques have been applied in load forecasting. Li et al. used complementary ensemble empirical mode decomposition (CEEMD) to decompose the load sequence and introduced an evaluation factor to reconstruct the modal components for better forecasting performance [13]. Li et al. utilized grey relation analysis (GRA) to reconstruct the components decomposed by improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), thereby enhancing the model’s anti-interference capability [14]. These decomposition techniques improve the models, but their parameters are mainly set empirically, which may limit the effectiveness of the decomposition and, consequently, affect forecasting accuracy.

The Transformer model, which is based on attention mechanisms, performs excellently in load forecasting [15,16,17,18]. iTransformer is a variant of the Transformer that improves accuracy in prediction tasks involving multi-dimensional features [19]. To enhance load prediction accuracy, this paper proposed an ICEEMDAN–iTransformer forecasting model combined with the beluga whale optimization (BWO) algorithm. BWO is used to optimize the ICEEMDAN parameters to maximize the decomposition effectiveness. ICEEMDAN is then applied to decompose the load data and reduce its volatility. The IMFs and other influencing factors are used as inputs to the iTransformer to produce the final forecast results. The main contributions of this study are as follows:

A novel ICEEMDAN–iTransformer forecasting model based on the BWO is proposed. By integrating data decomposition, optimization algorithms, and an improved Transformer model, the accuracy of short-term electricity load forecasting is enhanced.
The data decomposition process is optimized by utilizing BWO to adjust ICEEMDAN parameters, maximizing the decomposition effectiveness and reducing the non-stationarity of electricity load data.

The organization of this paper is presented as follows, where each section addresses a specific aspect of the research in detail. Section 2 provides a detailed introduction to the relevant algorithms employed in the forecasting model. Section 3 outlines the design of the proposed forecasting model in detail. Section 4 describes the experimental setup and dataset, discusses the experimental results, and compares the load forecasting performance of different methods. Finally, the conclusion is presented in Section 5.

2. Methodology

2.1. Correlation Analysis

The variation in electricity load during actual operation is influenced by various external factors. However, not all influencing factors are significantly correlated with the load. Including weakly correlated variables may reduce the accuracy of the model. The Spearman correlation coefficient (SCC) was used to select the most relevant features as inputs, and its formula is as follows:

$$\rho = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)} \tag{1}$$

where $d_{i}$ represents the rank difference between the variables $x_{i}$ and $y_{i}$, $n$ is the sample size, and $\rho$ ranges over [−1, 1], with its absolute value increasing with the strength of the correlation.
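The rank-difference form of Equation (1) can be sketched in a few lines of Python. This is a minimal illustration that assumes no tied values (the closed form only holds exactly in that case); the function names and the sample values are hypothetical and are not taken from the EMC dataset.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties, as Eq. 1 does)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman coefficient from squared rank differences, following Eq. 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Illustrative values only, not taken from the EMC dataset.
temperature = [26.1, 27.4, 29.0, 30.2, 28.3]
load = [5200.0, 5350.0, 5900.0, 6100.0, 5600.0]
rho = spearman(temperature, load)
print(f"SCC = {rho:.4f}")
```

Because the two illustrative series are ordered identically, every rank difference is zero and the coefficient reaches its upper bound of 1.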

This study carried out a correlation analysis based on publicly accessible data obtained from the Singapore Energy Market Corporation (EMC) platform, and the findings are presented in Table 1. Temperature had a high positive correlation with electricity load (SCC = 0.6830), while relative humidity had a high negative correlation (SCC = −0.6440). Electricity price showed a medium positive correlation (SCC = 0.5131). Wind speed had a low positive correlation (SCC = 0.2080), suggesting a limited impact. Dew point (SCC = 0.0694), pressure (SCC = 0.0615), and wind direction (SCC = 0.0639) had very low correlations with electricity load, indicating that these factors have minimal impact on load variations. Therefore, temperature, relative humidity, and electricity price were selected as the inputs. To further validate these three factors, holiday data were analyzed separately in Table 2. Holidays may influence electricity usage differently from regular days, for example through increased family activities and greater use of leisure and entertainment devices. The results showed that, even during holidays, the correlation between these three variables and electricity load remained strong, confirming their applicability and reliability during holidays.

Figure 1 shows that load data exhibits a high Spearman correlation between the current load and its lagged values. This reflects the temporal dependence of load data and supports the use of historical load data in short-term forecasting models.

2.2. Beluga Whale Optimization (BWO)

When dealing with complex time series data, a model may become trapped in a local optimum and fail to reach globally optimal performance. An optimization algorithm addresses this by searching the hyperparameter space for the best possible outcome. Therefore, BWO is applied to optimize the ICEEMDAN parameters. BWO has three stages, which simulate the swimming, hunting, and whale fall behaviors of beluga whales, respectively [20]. Its key feature is the ability to balance global exploration and local exploitation: it escapes local optima during the exploration phase and converges accurately toward the global optimum during the exploitation phase. With its strong optimization capability, it has been applied in various fields, including economic load dispatch [21] and risk assessment [22]. The balance factor $B_{f}$ determines the phase in which BWO operates:

$$B_{f} = B_{0} \cdot \left(1 - \frac{t}{2T}\right) \tag{2}$$

where $B_{0}$ varies in the range (0, 1), $t$ is the current iteration, and $T$ is the maximum number of iterations. When $B_{f} > 0.5$, BWO is in the exploration stage. As $t$ increases, $B_{f}$ gradually decreases, and when $B_{f} \leq 0.5$, BWO enters the exploitation stage.
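As a small illustration of Equation (2), the sketch below evaluates the balance factor and the resulting phase. The function names are hypothetical, and $B_{0}$ is passed in as a fixed argument purely for readability.

```python
def balance_factor(t, max_iter, b0):
    """Balance factor B_f = B_0 * (1 - t / (2T)) from Eq. 2."""
    return b0 * (1 - t / (2 * max_iter))


def bwo_phase(t, max_iter, b0):
    """Exploration while B_f > 0.5, exploitation otherwise."""
    return "exploration" if balance_factor(t, max_iter, b0) > 0.5 else "exploitation"


# With B_0 = 0.8 and T = 100, B_f shrinks from 0.8 toward 0.4.
for t in (0, 50, 100):
    print(t, round(balance_factor(t, 100, 0.8), 3), bwo_phase(t, 100, 0.8))
```

Since the factor only decays from $B_{0}$ to $B_{0}/2$ over a run, a draw of $B_{0} \leq 0.5$ selects exploitation at every iteration under this formula.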

2.2.1. Exploration Phase

The initial search process in BWO mimics the natural instincts of beluga whales as they navigate and explore their environment. Beluga whales display different swimming postures, often in sync or as mirror images. In the algorithm, the position is updated as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,p_{j}}^{t} + (X_{r,p_{1}}^{t} - X_{i,p_{j}}^{t}) (1 + r_{1}) \sin(2 \pi r_{2}), & j \text{ even} \\ X_{i,p_{j}}^{t} + (X_{r,p_{1}}^{t} - X_{i,p_{j}}^{t}) (1 + r_{1}) \cos(2 \pi r_{2}), & j \text{ odd} \end{cases} \tag{3}$$

where $X_{i,j}^{t+1}$ denotes the position of individual $i$ in dimension $j$ in the next iteration. Assuming the dimension of the problem is $d$, $p_{j}$ is a random integer in the range [1, d], and $X_{i,p_{j}}^{t}$ denotes the position in dimension $p_{j}$ in the current iteration. Assuming the population size is $N$, $r$ is a random integer in the range [1, N], and $X_{r,p_{1}}^{t}$ denotes the position of the random individual $r$ in dimension $p_{1}$ in the current iteration. $r_{1}$ and $r_{2}$ are random numbers in the range (0, 1).
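The exploration update of Equation (3) might be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function name is hypothetical, indices are 0-based in the code while parity follows the 1-based $j$ of the equation, and $r_{1}$, $r_{2}$ are drawn once per individual, which is an assumption.

```python
import math
import random


def explore_step(population, i, rng):
    """Return the new position of individual i under the Eq. 3 update."""
    n = len(population)
    d = len(population[i])
    r = rng.randrange(n)                      # random individual r
    p = [rng.randrange(d) for _ in range(d)]  # random dimensions p_j; p[0] acts as p_1
    r1, r2 = rng.random(), rng.random()       # assumed: one draw per individual
    new_position = []
    for j in range(d):
        base = population[i][p[j]]
        diff = population[r][p[0]] - base
        # (j + 1) is the 1-based dimension index used for the even/odd split
        trig = math.sin if (j + 1) % 2 == 0 else math.cos
        new_position.append(base + diff * (1 + r1) * trig(2 * math.pi * r2))
    return new_position


rng = random.Random(42)
population = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(6)]
print(explore_step(population, 0, rng))
```

When all individuals coincide, the difference term vanishes and the update leaves the position unchanged, which is a quick sanity check on the implementation.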

2.2.2. Exploitation Phase

The exploitation mechanism is designed based on beluga whale hunting strategies. When hunting, beluga whales send their current position information to nearby beluga whales, allowing for a more efficient completion of the hunting task. In the algorithm, convergence is enhanced by introducing the Levy flight strategy. This strategy, featuring random jumps drawn from heavy-tailed distributions, enhances the exploration of the solution space and reduces the likelihood of getting stuck in local optima. The mathematical model for the Levy flight is as follows:

$$X_{i}^{t+1} = r_{3} \cdot X_{best}^{t} - r_{4} \cdot X_{i}^{t} + C_{1} \cdot L_{F} \cdot (X_{r}^{t} - X_{i}^{t}) \tag{4}$$

where $X_{best}^{t}$ denotes the best position in the current iteration, $X_{r}^{t}$ denotes the position of a random individual, and $r_{3}$ and $r_{4}$ are random numbers in the range (0, 1). $C_{1} = 2 \cdot r_{4} \cdot (1 - t / T)$ is the random jump strength, which measures the intensity of the Levy flight. $L_{F}$ is a random number that follows the Levy distribution and is computed as follows:

$$L_{F} = 0.05 \times \frac{u \times \sigma}{|v|^{1/\beta}} \tag{5}$$

$$\sigma = \left( \frac{\Gamma(1 + \beta) \times \sin\left(\pi \cdot \frac{\beta}{2}\right)}{\Gamma\left(\frac{1 + \beta}{2}\right) \times \beta \times 2^{\frac{\beta - 1}{2}}} \right)^{1/\beta} \tag{6}$$

where $u$ and $v$ are drawn from a normal distribution, and $\beta = 1.5$ is a constant that influences the scale of the random jumps.
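Equations (4) to (6) can be sketched together as below. The helper names are hypothetical; $u$ and $v$ are taken as standard normal draws, and a fresh $L_{F}$ is drawn for each dimension, both of which are assumptions not stated in the text.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Scale factor sigma from Eq. 6."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """One Levy-distributed number L_F from Eq. 5 (u, v assumed standard normal)."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    return 0.05 * u * levy_sigma(beta) / abs(v) ** (1 / beta)


def exploit_step(x_i, x_best, x_r, t, max_iter, rng):
    """Exploitation update of Eq. 4 for one individual."""
    r3, r4 = rng.random(), rng.random()
    c1 = 2 * r4 * (1 - t / max_iter)  # random jump strength
    return [
        r3 * best - r4 * xi + c1 * levy_step(rng) * (xr - xi)  # assumed: L_F per dimension
        for xi, best, xr in zip(x_i, x_best, x_r)
    ]


rng = random.Random(7)
print(round(levy_sigma(), 4))
print(exploit_step([1.0, 2.0], [0.5, 0.5], [3.0, -1.0], t=10, max_iter=100, rng=rng))
```

The heavy tail comes from dividing by $|v|^{1/\beta}$: draws of $v$ close to zero produce occasional long jumps, while the factor 0.05 keeps typical steps small.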

2.2.3. Whale Fall Phase

The whale fall phase is designed based on the whale fall phenomenon. When a beluga whale fails to evade a threat, its carcass will sink into the deep sea, where it gives rise to a specialized ecosystem called a whale fall. In the algorithm, the position is updated as shown in the following formula:

$$X_{i}^{t+1} = r_{5} \cdot X_{i}^{t} - r_{6} \cdot X_{r}^{t} + r_{7} \cdot X_{step}^{t} \tag{7}$$

$$X_{step}^{t} = (u_{b} - l_{b}) \exp\left(-C_{2} \frac{t}{T}\right) \tag{8}$$

$$C_{2} = 2 W_{f} \times N \tag{9}$$

$$W_{f} = 0.1 - 0.05 \frac{t}{T} \tag{10}$$

where $r_{5}$, $r_{6}$, and $r_{7}$ are random numbers in the range (0, 1), and $X_{step}^{t}$ denotes the step size of the whale fall descent. $u_{b}$ is the upper bound of the variables and $l_{b}$ is their lower bound. $C_{2}$ is the step factor, and $W_{f}$ is the probability of a whale fall.
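A minimal sketch of Equations (7) to (10) follows. The function names and the scalar bounds are illustrative assumptions; in practice the bounds may differ per dimension.

```python
import math
import random


def whale_fall_probability(t, max_iter):
    """W_f from Eq. 10: decays linearly from 0.1 to 0.05."""
    return 0.1 - 0.05 * t / max_iter


def whale_fall_step(t, max_iter, pop_size, lower, upper):
    """X_step from Eqs. 8 and 9."""
    c2 = 2 * whale_fall_probability(t, max_iter) * pop_size  # step factor
    return (upper - lower) * math.exp(-c2 * t / max_iter)


def whale_fall_update(x_i, x_r, t, max_iter, pop_size, lower, upper, rng):
    """Position update of Eq. 7 for one individual."""
    r5, r6, r7 = rng.random(), rng.random(), rng.random()
    step = whale_fall_step(t, max_iter, pop_size, lower, upper)
    return [r5 * xi - r6 * xr + r7 * step for xi, xr in zip(x_i, x_r)]


rng = random.Random(1)
print(whale_fall_step(0, 100, 30, -1.0, 1.0))    # full range at the first iteration
print(whale_fall_step(100, 100, 30, -1.0, 1.0))  # much smaller at the last iteration
print(whale_fall_update([0.2, -0.4], [0.1, 0.3], 50, 100, 30, -1.0, 1.0, rng))
```

The exponential decay means that a whale fall relocates an individual across most of the search range early on and only nudges it late in the run.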

2.3. ICEEMDAN

The irregular patterns observed in historical power load data present considerable challenges for developing reliable forecasting models. Decomposition algorithms decompose the raw time series into multiple intrinsic mode functions (IMFs), with each IMF representing a particular frequency component or trend in the data. This approach captures the variations of the data at multiple levels, enabling the model to better learn its hierarchical features. The most commonly used decomposition algorithms include EEMD, CEEMD, and ICEEMDAN, among which ICEEMDAN exhibits the best denoising ability and decomposition accuracy. By regulating the degree of noise addition, ICEEMDAN enables a more accurate extraction of intrinsic oscillatory components from the signal. The specific steps are as follows.

(1): White noise is added to the original signal $x(t)$ to construct the signal:

$$x^{i}(t) = x(t) + \alpha_{0} E_{1}(\beta^{(i)}) \tag{11}$$

where $\alpha_{k}$ is the noise coefficient added at the k-th stage, $E_{j}(\cdot)$ is the j-th stage modal component generated by EMD, $\langle \cdot \rangle$ denotes the mean operator, and $\beta^{(i)}$ is white noise with a mean of zero.

(2): The residual of the first stage is obtained as follows:

$$r_{1} = \langle M(x^{i}(t)) \rangle \tag{12}$$

where $M(\cdot)$ denotes the local mean operator.

(3): The modal component of the first stage is calculated as follows:

$$I_{1} = x(t) - r_{1} \tag{13}$$

(4): The residual and modal component of the second stage are obtained as follows:

$$r_{2} = \langle M(r_{1} + \alpha_{1} E_{2}(\beta^{(i)})) \rangle \tag{14}$$

$$I_{2} = r_{1} - r_{2} \tag{15}$$

(5): The above steps are repeated to find the residual and modal component of the j-th stage:

$$r_{j} = \langle M(r_{j-1} + \alpha_{j-1} E_{j}(\beta^{(i)})) \rangle \tag{16}$$

$$I_{j} = r_{j-1} - r_{j} \tag{17}$$

(6): The iteration stops when the residual becomes a monotonic function, or when the standard deviation of adjacent IMFs becomes less than 0.2.
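The stage-by-stage recursion of steps (1) to (5) can be sketched structurally as below. This is not a working ICEEMDAN: a moving average stands in for the local mean operator $M(\cdot)$, repeated smoothing stands in for the EMD modes $E_{k}(\cdot)$ of the noise, the number of stages is fixed in advance instead of using the stopping rule of step (6), and all names are hypothetical. The averaging over noise realizations plays the role of the mean operator. The sketch only shows how residuals and modal components chain together.

```python
import math
import random


def moving_average(signal, width=5):
    """Stand-in for the local mean operator M(.); real EMD averages spline envelopes."""
    half = width // 2
    smoothed = []
    for k in range(len(signal)):
        window = signal[max(0, k - half):k + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def noise_mode(noise, stage):
    """Stand-in for E_k(.), the k-th EMD mode of the noise: smooth it (stage - 1) times."""
    mode = list(noise)
    for _ in range(stage - 1):
        mode = moving_average(mode)
    return mode


def decompose(x, alphas, realizations, rng):
    """Run one stage per coefficient in alphas; return (modes, final residual)."""
    noises = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(realizations)]
    previous = list(x)  # r_0 is the signal itself
    modes = []
    for stage, alpha in enumerate(alphas, start=1):
        residual = [0.0] * len(x)
        for noise in noises:
            perturbed = [r + alpha * e for r, e in zip(previous, noise_mode(noise, stage))]
            for k, value in enumerate(moving_average(perturbed)):
                residual[k] += value / realizations  # average over realizations
        modes.append([p - r for p, r in zip(previous, residual)])  # I_k = r_{k-1} - r_k
        previous = residual
    return modes, previous


signal = [math.sin(0.3 * k) + 0.1 * k for k in range(60)]
modes, residual = decompose(signal, alphas=[0.2, 0.2, 0.2], realizations=20, rng=random.Random(1))
rebuilt = [sum(mode[k] for mode in modes) + residual[k] for k in range(len(signal))]
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Because each modal component is the difference of consecutive residuals, the components and the final residual sum back to the original signal regardless of which operators are plugged in, so the printed reconstruction error is at the level of floating-point rounding.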

The load data was first divided into training, validation, and test sets, and decomposition was performed separately on each set to avoid data leakage, which could affect the authenticity of the prediction. The results are shown in Figure 2, Figure 3 and Figure 4. All three datasets were decomposed into thirteen intrinsic mode functions and one residual, demonstrating that the data decomposition technique exhibits consistency and stability across different data partitions.

2.4. iTransformer

The iTransformer model attends to each variable as a whole by inverting the roles of the original Transformer modules, which makes it suitable for multivariate time series forecasting. The structure of the iTransformer model is illustrated in Figure 5, which shows the key components and their interactions in the prediction process. Given a multivariate time series $X \in R^{T \times N}$ with time length $T$ and $N$ variables, where $X_{:, n}$ represents the entire time series of the variable indexed by $n$, the specific prediction process is as follows:

(1): A multilayer perceptron (MLP) is used to map $X \in R^{T \times N}$ to $H = \{h_{1}, \dots, h_{N}\} \in R^{N \times D}$, where $D$ is the token dimension. Here, $h_{n} = \mathrm{MLP}(X_{:, n}) \in R^{D}$ contains all the temporal changes of the corresponding variable over the past time and is called a Variate Token.

(2): A multivariate attention mechanism is used to analyze the correlation between the Variate Tokens:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{18}$$

where the query, key, and value $Q, K, V \in R^{N \times d_{k}}$ are obtained through linear projection, and $d_{k}$ denotes the projection dimension.

(3): Each Variate Token is standardized to zero mean and unit variance, so that the features of all variables follow a relatively uniform distribution, which reduces discrepancies caused by different measurement units:

$$\mathrm{LayerNorm}(H) = \left\{ \frac{h_{n} - \mathrm{Mean}(h_{n})}{\sqrt{\mathrm{Var}(h_{n})}} \,\middle|\, n = 1, \dots, N \right\} \tag{19}$$

$$H^{l-1} = \mathrm{LayerNorm}(H^{l-1} + \mathrm{Self\text{-}Attn}(H^{l-1})) \tag{20}$$

(4): The intrinsic properties of the sequences are extracted using feed-forward neural networks, followed by layer normalization:

$$H^{l} = \mathrm{LayerNorm}(H^{l-1} + \mathrm{Feed\text{-}Forward}(H^{l-1})) \tag{21}$$

(5): The predicted sequence is returned:

$$\hat{Y} = \mathrm{MLP}(H^{L}) \tag{22}$$
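The inverted tokenization and the attention across variables in steps (1) to (3) can be illustrated with a toy example in plain Python. Everything here is a simplification under stated assumptions: a single linear map stands in for the embedding MLP, the query, key, and value projections are taken as identity, a small `eps` is added inside the square root of Equation (19) for numerical safety, and the numbers are made up.

```python
import math


def linear_embed(series, weight):
    """Map one variable's length-T series to a D-dim token (linear stand-in for the MLP)."""
    return [sum(w * x for w, x in zip(row, series)) for row in weight]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Eq. 18 with Q = K = V = tokens (identity projections, an assumption)."""
    d_k = len(tokens[0])
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in tokens])
        for q in tokens
    ]
    mixed = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d_k)]
        for row in weights
    ]
    return mixed, weights


def layer_norm(token, eps=1e-5):
    """Eq. 19 applied to one Variate Token (eps added for numerical safety)."""
    mean = sum(token) / len(token)
    var = sum((v - mean) ** 2 for v in token) / len(token)
    return [(v - mean) / math.sqrt(var + eps) for v in token]


# T = 4 time steps, N = 3 variables, stored as one series per variable.
series_by_variable = [
    [1.0, 2.0, 3.0, 4.0],  # e.g. load (illustrative)
    [2.0, 2.5, 3.5, 4.5],  # e.g. temperature (illustrative)
    [9.0, 7.0, 6.0, 4.0],  # e.g. relative humidity (illustrative)
]
embed_weight = [  # D = 3 rows, T = 4 columns
    [0.5, -0.25, 0.25, 0.1],
    [0.1, 0.2, -0.3, 0.4],
    [-0.2, 0.3, 0.1, 0.05],
]
tokens = [linear_embed(series, embed_weight) for series in series_by_variable]
mixed, attn = self_attention(tokens)
hidden = [layer_norm([h + m for h, m in zip(tok, mix)]) for tok, mix in zip(tokens, mixed)]  # Eq. 20
print(attn)
```

The attention map is N × N rather than T × T: each row states how strongly one variable attends to every other variable, which is the point of inverting the token dimension.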

2.5. Criteria

In this paper, three statistical indicators were used to evaluate the relevant models comprehensively [23,24]. Mean absolute error (MAE) quantifies the average magnitude of the differences between predicted and actual values. A smaller MAE indicates better prediction accuracy. Root mean square error (RMSE) also quantifies the deviation between predicted and actual values but penalizes large errors more heavily. A smaller RMSE indicates better prediction accuracy. R-squared (R²) represents the proportion of the variance in the actual values that is explained by the model. A value near 1 indicates a better model. They are calculated as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{Y}_{i} - Y_{i}| \tag{23}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{Y}_{i} - Y_{i})^{2}} \tag{24}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{N} (Y_{i} - \hat{Y}_{i})^{2}}{\sum_{i=1}^{N} (Y_{i} - \bar{Y})^{2}} \tag{25}$$

where $N$ is the number of samples, $\hat{Y}_{i}$ is the predicted value, $Y_{i}$ is the actual value, and $\bar{Y}$ is the mean of the actual values $Y_{i}$.
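The three criteria of Equations (23) to (25) translate directly into code. The sketch below uses made-up values purely to exercise the formulas; the function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, Eq. 23."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, Eq. 24."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, Eq. 25."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

For any set of errors the RMSE is at least as large as the MAE, with equality only when all absolute errors are identical, as in this toy example.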

3. Proposed BWO–ICEEMDAN–iTransformer Model

The process of the proposed model is shown in Figure 6.

(1): The raw power load data contains some anomalies and missing values that need to be preprocessed. For missing values, fill them with the average value of the same time point over the previous and subsequent three days. For anomalies, apply the three-sigma rule for screening and treat them as missing values.
(2): Optimize the two parameters of ICEEMDAN, noise standard deviation (Nstd) and the number of realizations (NR), using BWO to maximize the decomposition effect.
(3): Use ICEEMDAN to decompose the preprocessed electricity load data, generating modal components and a residual.
(4): Use SCC to select factors with high correlation to electricity load, including electricity price, relative humidity (RH), and temperature (T).
(5): Reconstruct the decomposed subsequences with the selected relevant factors.
(6): Use the iTransformer model to obtain the predicted results for each subsequence and then combine them to obtain the final prediction.
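The preprocessing in step (1) could be sketched as follows. The sketch assumes 48 samples per day (30-min data), reads "the previous and subsequent three days" as the three days before and the three days after the gap, applies the three-sigma rule to the whole series at once, and marks missing values with `None`; these choices and the function names are assumptions for illustration.

```python
import math

SAMPLES_PER_DAY = 48  # 30-min resolution


def flag_outliers(load):
    """Replace values outside mean +/- 3 standard deviations with None."""
    valid = [v for v in load if v is not None]
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / len(valid))
    return [None if v is not None and abs(v - mean) > 3 * std else v for v in load]


def fill_missing(load, per_day=SAMPLES_PER_DAY, days=3):
    """Fill each gap with the mean of the same time slot on neighbouring days."""
    filled = list(load)
    for k, value in enumerate(load):
        if value is not None:
            continue
        neighbours = []
        for offset in range(1, days + 1):
            for index in (k - offset * per_day, k + offset * per_day):
                if 0 <= index < len(load) and load[index] is not None:
                    neighbours.append(load[index])
        if neighbours:  # leave the gap if no neighbouring day is available
            filled[k] = sum(neighbours) / len(neighbours)
    return filled


# Toy series with 2 samples per day so the example stays readable.
toy = [10.0, 20.0, 12.0, 22.0, None, 24.0, 14.0, 26.0, 16.0, 28.0]
print(fill_missing(toy, per_day=2, days=3)[4])
```

Chaining the two functions as `fill_missing(flag_outliers(load))` treats screened anomalies as missing values, which matches the order described in step (1).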

4. Results and Discussion

To validate the predictive ability of the proposed model, we compared the performance of 17 forecasting algorithms on real datasets. The parameters of the intelligent optimization algorithms and the deep learning models are shown in Table 3. The experimental data used in this paper were obtained from the EMC [25]. They contain 2019–2020 data on electricity load, price, and weather recorded at 30-min intervals. The data are divided into training, validation, and test datasets in a 3:1:1 ratio. Figure 7 shows the trends of four variables over 30 days, comprising a total of 1440 records in the test dataset. The variation in load follows a certain pattern. During a given week, daytime load levels on working days are generally higher than on weekends due to increased industrial and commercial electricity consumption. The trends of electricity prices, temperature, and relative humidity also show a certain degree of correlation with the load. The input sequence length set for the experiment was 96, and the output sequence length was 48. The experiments were conducted on a computer with a 2.5 GHz Intel CPU (Intel Corporation, Santa Clara, CA, USA) and 32 GB of RAM (Samsung Electronics, Seoul, Republic of Korea), using Python 3.8 as the programming language.
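The 3:1:1 partition and the 96-in, 48-out windowing described above can be sketched as below. The split is assumed to be chronological and the windows are assumed to slide with a stride of one step; neither detail is stated explicitly, and the function names are illustrative.

```python
def chronological_split(series, ratios=(3, 1, 1)):
    """Split a series into train/validation/test parts without shuffling."""
    total = sum(ratios)
    n = len(series)
    train_end = n * ratios[0] // total
    val_end = train_end + n * ratios[1] // total
    return series[:train_end], series[train_end:val_end], series[val_end:]


def sliding_windows(series, input_len=96, output_len=48):
    """Pair each 96-step input window with the 48 steps that follow it."""
    samples = []
    for start in range(len(series) - input_len - output_len + 1):
        split = start + input_len
        samples.append((series[start:split], series[split:split + output_len]))
    return samples


series = list(range(1000))  # stand-in for the half-hourly load series
train, val, test = chronological_split(series)
windows = sliding_windows(test)
print(len(train), len(val), len(test), len(windows))
```

With 30-min data, 96 input steps correspond to two days of history and 48 output steps to one day ahead.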

As shown in Table 4, iTransformer performed best among the seven individual models, with an R² of 0.9577, MAE of 83.2604, and RMSE of 120.8423. iTransformer encodes the different variables as independent tokens and swaps the roles of the attention mechanism and the feedforward network, which enables the model to better capture both the temporal relationships within the data and the relationships between variables.

To assess the effectiveness of the data decomposition technique, we compared three decomposition algorithms. It has been shown through past experiments that predictive ability is improved after the introduction of data decomposition algorithms. The three data decomposition algorithms, ICEEMDAN, CEEMD, and EEMD, improved the predictive ability of the model to different degrees. For the three performance metrics—R², MAE, and RMSE—ICEEMDAN improved by 1.68%, 14.49%, and 21.37%; CEEMD improved by 1.58%, 13.51%, and 19.76%; and EEMD improved by 1.14%, 3.42%, and 13.89%, respectively. The degree of improvement of ICEEMDAN for the model was higher than that of CEEMD and EEMD. By reconstructing and decomposing the original data several times, ICEEMDAN effectively removed the noise in the data, allowing the model to focus more on the core trends of the data.

To validate the impact of optimization algorithms, we compared four optimization algorithms: BWO, Gray Wolf Optimization (GWO) [33], Dragonfly Algorithm (DA) [34], and Chameleon Swarm Algorithm (CSA) [35]. After the introduction of the optimization algorithms, the prediction accuracy improved to different degrees. BWO–ICEEMDAN–iTransformer achieved an R² of 0.9873, with MAE reduced to 48.0014 and RMSE reduced to 66.2221, which significantly outperformed the other models. This indicates that BWO enhances the predictive ability of the model through global search and parameter optimization. Load data exhibit non-stationarity, leading to a complex solution space. By simulating the behaviors of beluga whales, BWO can effectively adjust the search direction and avoid getting trapped in local optima. Figure 8 illustrates the convergence analysis of the four optimization algorithms with minimum envelope entropy as the fitness function. A comparison of the convergence curves highlights the superior performance of BWO, in terms of both convergence speed and final fitness value.
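The text names minimum envelope entropy as the fitness function without giving its formula. The sketch below assumes the definition commonly used for tuning decomposition parameters: the envelope of each IMF is taken from its analytic signal, normalized into a probability distribution, and scored by Shannon entropy, and the fitness is the smallest entropy over all IMFs. The naive DFT and all function names are illustrative choices, not the authors' code.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(total / n if inverse else total)
    return out


def envelope(signal):
    """Magnitude of the analytic signal (Hilbert-transform envelope)."""
    n = len(signal)
    weights = [0.0] * n  # zero out negative frequencies
    weights[0] = 1.0     # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist term
        for k in range(1, n // 2):
            weights[k] = 2.0   # double positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    spectrum = dft(signal)
    analytic = dft([w * s for w, s in zip(weights, spectrum)], inverse=True)
    return [abs(z) for z in analytic]


def envelope_entropy(signal):
    """Shannon entropy of the normalized envelope (assumed definition)."""
    env = envelope(signal)
    total = sum(env)
    probabilities = [a / total for a in env]
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def fitness(imfs):
    """Minimum envelope entropy over the IMFs of one candidate decomposition."""
    return min(envelope_entropy(imf) for imf in imfs)


n = 32
tone = [math.cos(2 * math.pi * 3 * m / n) for m in range(n)]  # flat envelope
impulse = [1.0] + [0.0] * (n - 1)                             # concentrated envelope
print(envelope_entropy(tone), envelope_entropy(impulse), fitness([tone, impulse]))
```

A pure tone has a flat envelope and therefore the maximum entropy ln(n), while a sparse, impulsive component scores lower. Under this assumed definition, a candidate (Nstd, NR) pair whose decomposition yields a lower minimum entropy is considered fitter.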

It is worth noting that the effect of BWO is not limited to optimizing the parameters of ICEEMDAN; it also improves the performance of other data decomposition algorithms. The BWO–CEEMD–iTransformer and BWO–EEMD–iTransformer models also showed significant performance improvement. Although their metrics, such as R² and MAE, did not improve as much as those of the BWO–ICEEMDAN–iTransformer model, they still showed considerable improvement compared to the original CEEMD and EEMD models.

To highlight the superiority of the BWO–ICEEMDAN–iTransformer model, its prediction curves were compared with the rest of the models, as shown in Figure 9, Figure 10 and Figure 11. The selected prediction curves represent the prediction results for an entire day.

Figure 9 compares the proposed model with eight other individual models. The results show that the prediction curve of the proposed model was closest to the actual values with the best-fitting performance. Although the other individual models could predict the general trend of electricity load, they showed significant errors in specific values, especially during the day when the load is high.

Figure 10 compares the proposed model with models using different decomposition algorithms. It can be seen that the proposed model performed better than the others. Compared to the individual iTransformer model, the ICEEMDAN–iTransformer, CEEMD–iTransformer, and EEMD–iTransformer models show prediction curves that are much closer to the actual values during the day, though some fluctuations still exist during the night.

Figure 11 compares the proposed model with models using different optimization algorithms. Compared to the individual models, all four hybrid models showed significant improvements in prediction accuracy both during the day and night, with BWO outperforming the other optimization algorithms.

Figure 12 illustrates the errors of the ablation models. The error of the individual iTransformer is larger, staying at about 200 MW for a period of time. The error of ICEEMDAN–iTransformer is significantly lower than that of iTransformer, but there is still a short period when the error reaches about 200 MW. The proposed BWO–ICEEMDAN–iTransformer model has the smallest error, and its error curve remains relatively smooth. Combined with Table 4, the proposed model improves R² by 1.39% over ICEEMDAN–iTransformer and by 3.09% over iTransformer.
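As a consistency check, the reported percentages can be reproduced from the R² values quoted in this section, assuming they are relative changes in R². The intermediate R² of ICEEMDAN–iTransformer is not quoted in the text, so it is inferred here from the stated 1.68% gain over iTransformer.

```latex
\begin{align*}
\text{Proposed vs. iTransformer:}\quad
  & \frac{0.9873 - 0.9577}{0.9577} \times 100\% \approx 3.09\% \\
\text{Implied } R^{2} \text{ of ICEEMDAN--iTransformer:}\quad
  & 0.9577 \times (1 + 0.0168) \approx 0.9738 \\
\text{Proposed vs. ICEEMDAN--iTransformer:}\quad
  & \frac{0.9873 - 0.9738}{0.9738} \times 100\% \approx 1.39\%
\end{align*}
```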

Based on the above experiments, the BWO–ICEEMDAN–iTransformer exhibited the best performance in electricity load forecasting. By combining ICEEMDAN and BWO, the predictive model can handle complex electricity load data more effectively, providing more accurate and stable forecasting results in practical applications. Although model fusion introduces additional computational burden, the increase in training time remains within an acceptable range for practical applications. This trade-off between performance and efficiency enables the fused model to significantly improve prediction accuracy while maintaining relatively high computational efficiency, providing a valuable solution for power load forecasting.

5. Conclusions

This paper proposed a load forecasting method, BWO–ICEEMDAN–iTransformer, that integrates electricity price and meteorological factors. The iTransformer model encoded different variables as independent tokens and employed the attention mechanism to efficiently extract the correlations between electricity load and other variables, showcasing exceptional performance in load forecasting. The introduction of ICEEMDAN effectively improved the prediction accuracy of the model. Compared with CEEMD and EEMD, ICEEMDAN had stronger denoising ability. Especially after combining with BWO, the parameters of ICEEMDAN were optimized to decompose the power load data more adequately than other models. Experiments on Singapore electricity load data demonstrated the superiority of the proposed model in various aspects.

Author Contributions

Conceptualization, D.Z.; Data curation, Q.Z.; Formal analysis, J.Q., Z.L. and Y.Z.; Investigation, J.Q. and J.D.; Methodology, D.Z.; Resources, Q.Z.; Software, D.Z. and Z.L.; Supervision, Q.Z.; Validation, J.Q. and Y.Z.; Visualization, Z.L. and J.D.; Writing—original draft, D.Z. and J.Q.; Writing—review & editing, Q.Z., J.D. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

All data are presented in the main text.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lee, C.-M.; Ko, C.-N. Short-term load forecasting using lifting scheme and ARIMA models. Expert Syst. Appl. 2011, 38, 5902–5911.
2. Li, Y.; Niu, D. Application of Principal Component Regression Analysis in power load forecasting for medium and long term. In Proceedings of the 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), Chengdu, China, 20–22 August 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 3, pp. V3-201–V3-203.
3. Al Amin, M.A.; Hoque, M.A. Comparison of ARIMA and SVM for short-term load forecasting. In Proceedings of the 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON), Jaipur, India, 13–15 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
4. Imani, M. Electrical load-temperature CNN for residential load forecasting. Energy 2021, 227, 120480.
5. Das, A.; Annaqeeb, M.K.; Azar, E.; Novakovic, V.; Kjærgaard, M.B. Occupant-centric miscellaneous electric loads prediction in buildings using state-of-the-art deep learning methods. Appl. Energy 2020, 269, 115135.
6. So, D.; Oh, J.; Jeon, I.; Moon, J.; Lee, M.; Rho, S. BiGTA-Net: A hybrid deep learning-based electrical energy forecasting model for building energy management systems. Systems 2023, 11, 456.
7. Li, C.; Li, G.; Wang, K.; Han, B. A multi-energy load forecasting method based on parallel architecture CNN-GRU and transfer learning for data deficient integrated energy systems. Energy 2022, 259, 124967.
8. Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
9. Yadav, K.; Singh, M. A novel energy management of public charging stations using attention-based deep learning model. Electr. Power Syst. Res. 2025, 238, 111090.
10. Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818.
11. Niu, D.; Yu, M.; Sun, L.; Gao, T.; Wang, K. Short-term multi-energy load forecasting for integrated energy systems based on CNN-BiGRU optimized by attention mechanism. Appl. Energy 2022, 313, 118801.
12. Gao, X.; Li, X.; Zhao, B.; Ji, W.; Jing, X.; He, Y. Short-term electricity load forecasting model based on EMD-GRU with feature selection. Energies 2019, 12, 1140.
13. Li, K.; Duan, P.; Cao, X.; Cheng, Y.; Zhao, B.; Xue, Q.; Feng, M. A multi-energy load forecasting method based on complementary ensemble empirical model decomposition and composite evaluation factor reconstruction. Appl. Energy 2024, 365, 123283.
14. Li, L.; Jing, R.; Zhang, Y.; Wang, L.; Zhu, L. Short-term power load forecasting based on ICEEMDAN-GRA-SVDE-BiGRU and error correction model. IEEE Access 2023, 11, 110060–110074.
15. Vaswani, A. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
16. Tian, Z.; Liu, W.; Jiang, W.; Wu, C. Cnns-transformer based day-ahead probabilistic load forecasting for weekends with limited data availability. Energy 2024, 293, 130666.
17. Hu, J.; Hu, W.; Cao, D.; Sun, X.; Chen, J.; Huang, Y.; Chen, Z.; Blaabjerg, F. Probabilistic net load forecasting based on transformer network and Gaussian process-enabled residual modeling learning method. Renew. Energy 2024, 225, 120253.
18. Xu, C.; Chen, G. Interpretable transformer-based model for probabilistic short-term forecasting of residential net load. Int. J. Electr. Power Energy Syst. 2024, 155, 109515.
19. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625.
20. Zhong, C.; Li, G.; Meng, Z. Beluga whale optimization: A novel nature-inspired metaheuristic algorithm. Knowl. Based Syst. 2022, 251, 109215.
21. Hassan, M.H.; Kamel, S.; Jurado, F.; Ebeed, M.; Elnaggar, M.F. Economic load dispatch solution of large-scale power systems using an enhanced beluga whale optimizer. Alex. Eng. J. 2023, 72, 573–591.
22. Wang, C.; Wang, K.; Liu, D.; Zhang, L.; Li, M.; Khan, M.I.; Li, T.; Cui, S. Development and application of a comprehensive assessment method of regional flood disaster risk based on a refined random forest model using beluga whale optimization. J. Hydrol. 2024, 633, 130963.
23. Hou, H.; Liu, C.; Wang, Q.; Wu, X.; Tang, J.; Shi, Y.; Xie, C. Review of load forecasting based on artificial intelligence methodologies. Electr. Power Syst. Res. 2022, 210, 108067.
24. Shen, Y.; Li, D.; Wang, W. Multi-energy load prediction method for integrated energy system based on fennec fox optimization algorithm and hybrid kernel extreme learning machine. Entropy 2024, 26, 699.
25. NEMS Prices [EB/OL]. 2024. Available online: https://www.nems.emcsg.com/nems-prices (accessed on 30 September 2024).
26. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: Cambridge, MA, USA, 2022; pp. 27268–27286.
27. Liu, S.; Yu, H.; Liao, C.; Li, J.; Lin, W.; Liu, A.X.; Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022.
28. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
29. Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; Lian, D.; An, N.; Cao, L.; Niu, Z. Frequency-domain MLPs are more effective learners in time series forecasting. Adv. Neural Inf. Process. Syst. 2024, 36, 76656–76679.
30. Zhang, T.; Zhang, Y.; Cao, W.; Bian, J.; Yi, X.; Zheng, S.; Li, J. Less is more: Fast multivariate time series forecasting with light sampling-oriented mlp structures. arXiv 2022, arXiv:2207.01186.
31. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730.
32. Liu, Y.; Li, C.; Wang, J.; Long, M. Koopa: Learning non-stationary time series dynamics with koopman predictors. Adv. Neural Inf. Process. Syst. 2023, 36, 12271–12290.
33. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
34. Mirjalili, S. Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput. Appl. 2016, 27, 1053–1073.
35. Braik, M.S. Chameleon Swarm Algorithm: A bio-inspired optimizer for solving engineering design problems. Expert Syst. Appl. 2021, 174, 114685.

Figure 1. Load and lagged load correlation heatmap. t indicates the current time step.

Figure 2. Decomposed training data using ICEEMDAN.

Figure 3. Decomposed validation data using ICEEMDAN.

Figure 4. Decomposed test data using ICEEMDAN.

Figure 5. The structure of the iTransformer.

Figure 6. The process of the proposed BWO–ICEEMDAN–iTransformer model.

Figure 7. Visualization of a subset of the test dataset.

Figure 8. Convergence analysis of different optimization algorithms.

Figure 9. Prediction curves of the proposed model and other single models.

Figure 10. Prediction curves of the proposed model and other models using different decomposition algorithms.

Figure 11. Prediction curves of the proposed model and other models using different optimization algorithms.

Figure 12. Prediction errors of ablation models.

Table 1. Spearman correlation coefficient between load and factors.

Factors	SCC
Price	0.5131
Temperature	0.6830
Dew Point	0.0694
Relative Humidity	−0.6440
Pressure	0.0615
Wind Direction	0.0639
Wind Speed	0.2080

Table 2. Spearman correlation coefficient between load and factors during holidays.

Factors	SCC
Price	0.4521
Temperature	0.5699
Relative Humidity	−0.4473
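
The Spearman correlation coefficients (SCC) reported in Tables 1 and 2 are rank-based. The following is a minimal sketch in plain Python, assuming the standard definition (the Pearson correlation of average ranks). The function names and the toy values are hypothetical and are not taken from the Singapore dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy series, for illustration only.
load = [5200, 5400, 5900, 6100, 5600]
temperature = [26.1, 27.0, 29.5, 30.2, 28.0]
humidity = [88, 84, 70, 66, 79]

scc_temperature = spearman(load, temperature)  # perfectly monotone increasing
scc_humidity = spearman(load, humidity)        # perfectly monotone decreasing
```

Because the coefficient depends only on ranks, it captures monotone but nonlinear relationships, which is consistent with the positive sign for temperature and the negative sign for relative humidity in both tables.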

Table 3. Parameter settings.

Parameter	Value
Population size	30
Maximum number of iterations	30
Nstd	[0.15, 0.6]
NR	[10, 600]
Batch_size	72
Epochs	10
Patience	3
Learning rate	0.0001
Linear layer	512
FFN layer	2048

Table 4. Prediction results of different models.

Method	R²	MAE	RMSE
iTransformer	0.9577	83.2604	120.8423
FEDformer [26]	0.9316	114.4395	153.7248
Pyraformer [27]	0.8888	135.9388	195.9085
Autoformer [28]	0.8138	193.9856	253.5933
FreTS [29]	0.9029	126.8714	183.1028
LightTS [30]	0.8796	149.3750	203.8944
PatchTST [31]	0.9162	118.5022	170.0605
Koopa [32]	0.8654	158.0816	215.6215
ICEEMDAN–iTransformer	0.9738	71.1998	95.0231
CEEMD–iTransformer	0.9728	72.0094	96.9611
EEMD–iTransformer	0.9686	80.4151	104.0551
CSA–ICEEMDAN–iTransformer	0.9867	49.1543	67.7606
DA–ICEEMDAN–iTransformer	0.9858	50.0338	69.9737
GWO–ICEEMDAN–iTransformer	0.9869	48.8129	67.3607
BWO–CEEMD–iTransformer	0.9767	67.1662	89.6762
BWO–EEMD–iTransformer	0.9689	78.1639	103.5629
BWO–ICEEMDAN–iTransformer	0.9873	48.0014	66.2221
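
The three criteria in Table 4 can be computed as sketched below, assuming the standard definitions of R², MAE, and RMSE. The function names and the toy values are hypothetical and are not the paper's data.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy values, for illustration only.
actual = [100, 200, 300, 400]
forecast = [110, 190, 310, 370]

toy_mae = mae(actual, forecast)
toy_rmse = rmse(actual, forecast)
toy_r2 = r2(actual, forecast)
```

For any set of errors, RMSE is never smaller than MAE, a property that every row of Table 4 satisfies. A larger gap between the two indicates that a few large errors dominate.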

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

