1. Introduction
Over the past few decades, electrical load forecasting has gained increasing importance in the electricity sector. This topic has become particularly crucial due to the growing demand for electricity and the need to ensure efficient management of power grids. This ever-evolving technique makes it possible to anticipate variations in energy consumption, which is critical for infrastructure adaptation and for short-, medium-, and long-term planning.
The hierarchical analysis in [1] underscores the importance of proper preparation in electricity exchanges, ensuring efficient and sustainable management of energy resources in a global context. Non-linear and hybrid methods, such as those developed in [2,3], emphasize the significance of combining different approaches to improve forecasting accuracy. Furthermore, the integration of artificial intelligence technologies has significantly enhanced the ability of forecasting systems to handle uncertainty [4,5]. Additionally, specific techniques, such as residual modification in Autoregressive Integrated Moving Average (ARIMA) models for seasonal forecasts, have also shown promising results [6]. Electricity network planners rely on these forecasting methods to maintain grid stability and optimize operating costs. Anticipating demand more accurately not only helps prevent overloads and blackouts but also reduces costs associated with power generation and distribution. The hybrid approaches and adaptive models proposed in [7,8] demonstrate the effectiveness of mixed methods in multi-stage forecasting. Moreover, these strategic forecasts help mitigate the risk of outages and ensure a reliable electricity supply, as demonstrated by the studies in [9,10].
Studies have shown that a 1% prediction error can cost millions of US dollars for power utilities [1,11,12,13,14,15]. In the literature, machine learning methods such as artificial neural networks, support vector machines, and recurrent neural networks are commonly used to address this issue [16,17,18,19]. Unfortunately, artificial neural networks suffer from generalization problems [19]. Support vector machines are inefficient with small training datasets and present an algorithmic complexity that grows rapidly with the size of the data [16,17,18]. With long-sequence datasets, recurrent networks suffer from exploding and vanishing gradients [20,21]. Long Short-Term Memory (LSTM) networks offer an alternative to this problem [22] and perform well with large datasets. However, they face the issue of overfitting due to sensitivity to missing data and to training data size, which sometimes leads to costly computation times and significant resource consumption during hyperparameter optimization. Alternatively, wavelet transforms can make LSTMs more suitable for compression and feature extraction from nonstationary data. Recent contributions have been made in this context for forecasting purposes in the electricity sector [16,23,24].
In our case, the aim is to propose new approaches that combine Deep Learning and data preprocessing techniques to develop robust models. The objective is to integrate wavelet coefficients with sorting, combined with Stacked LSTM, Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM), and Convolutional Long Short-Term Memory (ConvLSTM), to predict electricity demand. The main reasons for selecting these models are (i) their ability to learn complex relationships, (ii) their flexibility in predicting spatiotemporal data, and (iii) their performance with large amounts of training data. The use of wavelet transforms with sorted coefficients offers further advantages: (iv) extraction of relevant information by denoising, (v) reduction in bias and variance errors, and (vi) improved processing of patterns within the same data class.
Although wavelet decomposition and LSTM networks have been explored in the literature [25], the specific approach described here, which integrates wavelet decomposition with coefficient sorting and the ConvLSTM model, provides novel elements. The promising results and improved performance compared to existing models indicate a significant contribution and open up prospects for future research and practical applications.
The rest of the paper is organized as follows: Section 2 presents the study's contributions; Section 3 discusses the methodology of the proposed approach in detail and outlines the mathematical formalisms. The results and discussion of the proposed model are presented in Section 4, followed by conclusions, recommendations, and policy perspectives in Section 5.
2. Contribution of the Study
Electrical load is influenced by a multitude of non-linear and interdependent factors, such as weather conditions, economic cycles, and special events. This complexity necessitates models that can capture these intricate and dynamic relationships, thereby justifying the use of sophisticated methods like recurrent neural networks (RNNs) and LSTMs. These methods are specifically designed to handle non-linear time series and to capture long-term dependencies between variables. Indeed, machine learning methods such as LSTMs have demonstrated superior accuracy compared to traditional time series methods (ARIMA, ARMAX, …) and artificial neural networks (ANNs), as evidenced by several comparative studies [26,27,28,29].
Furthermore, improved accuracy is crucial to avoid forecasting errors that could lead to critical imbalances in the power grid. Machine learning models, while demanding in terms of data and computation, offer robustness in the face of volatile and noisy data, which is essential for reliable forecasts in the dynamic environment of electrical load management. Additionally, the ability of these models to adapt to increasingly large datasets and incorporate new variables without requiring a complete redesign is a key selection criterion. However, methods such as Support Vector Machines (SVMs), although less complex than LSTMs, can quickly become impractical with large volumes of data because their training time grows rapidly (typically quadratically to cubically) with the number of samples.
Although sophisticated models like LSTMs offer high accuracy, they also introduce significant computational complexity. It is therefore essential to strike a balance between the need for accuracy and the complexity of implementation. Forecast optimality, however, is not solely about accuracy; it also includes the ability to generate forecasts within the reasonable timeframes required for real-time decision-making. As a result, it may be necessary to select methods that, while potentially less accurate, allow for faster and more economical implementation, ensuring an optimal compromise between accuracy and complexity. In this context, wavelets, with their efficiency in signal processing and mathematical analysis, can become powerful tools for enhancing LSTM networks. Consequently, the impact of this study on electrical load management decision-making is manifested in a significant improvement in network stability and efficiency.
This study aimed to improve the electric load prediction methods identified above. The use of sorted wavelet coefficients combined with LSTM variants was explored to forecast electric load. Four models, Stacked LSTM, BiLSTM, CNN-LSTM, and ConvLSTM, were employed with hyperparameter optimization. The selected optimal model was then combined with the wavelet transform using sorted coefficients to predict the electrical load of the city of Lomé. This approach offers significant advantages, such as extracting relevant features per data class, dimensionality reduction, robustness, and reduced algorithmic complexity, to better understand and model the complex variations in electrical loads, enabling more reliable and efficient predictions. This involves the introduction of a novel approach that has not yet been applied to electrical load forecasting, with the contributions outlined below.
Use of Multiple LSTM Architectures: The research explores four variants of LSTM neural networks, namely Stacked LSTM, BiLSTM, CNN-LSTM, and ConvLSTM.
Hyperparameter Optimization: An optimization process was implemented to identify the best-performing model among the variants tested.
Integration of Sorted Wavelet Coefficients: The approach combines sorted wavelet coefficients with the optimal LSTM model to enhance the accuracy of electrical load forecasts.
Application to the City of Lomé: The model was applied to predict the electrical load of the city of Lomé, demonstrating its effectiveness in a real-world context.
3. Materials and Methods
The diagram in Figure 1 illustrates the complete methodology of the proposed approach. The preprocessing consists of eliminating missing values, smoothing out spurious data, and handling outliers. In our approach, inconsistent values were replaced with the mean of the dataset. This is followed by the optimization step, where the defined hyperparameters are optimized using a grid search algorithm to identify the best configuration. The mean squared error (MSE), as defined in Equation (1), was chosen as the loss function during the training process. Next, the entire dataset is divided into training and test data. The training data are subject to wavelet decomposition, which extracts various sorted components, such as the approximate component (CA_Tri) and the detailed components (CD1_Tri, …, CDn_Tri). These components are then used to train the optimal model. For the prediction phase, the test data are also decomposed into wavelets to extract the same types of components with the same decomposition level and the same mother wavelet. The trained model is then used to make predictions on these decomposed components. The resulting forecasts are recombined using the inverse wavelet transform to reconstruct the predicted time series. Finally, the forecast results are compared with the actual data to make a final assessment of the model's performance.
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 \quad (1)$$

where $N$ is the number of training samples, and $\hat{y}_i$ and $y_i$ are the predicted and target values, respectively, for each sample. To optimize this loss function, we used the Adam optimizer [30] to train the various LSTM models. The Adam (Adaptive Moment Estimation) optimizer is widely used in deep learning for several reasons: it combines the advantages of different optimization methods, automatically adjusts learning rates, and remains stable when training complex models. An in-depth treatment and the details of the optimizer are presented in [30], where the authors describe the theory behind Adam, the details of the algorithm, and its effectiveness compared to other optimizers.
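As an illustration, the following is a minimal sketch of this training setup, assuming a Keras/TensorFlow implementation (the paper does not name its deep learning framework) and an input window of three past hourly values, consistent with the (n, 3, 1) shape described in Section 3.2:

```python
# Minimal sketch: compiling an LSTM model with the Adam optimizer
# and the MSE loss of Equation (1). Layer sizes are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(3, 1)),   # 3 past hourly loads, 1 feature
    layers.LSTM(32),
    layers.Dense(1),              # one-hour-ahead load forecast
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
```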
To avoid long discussions, only the wavelet transform is detailed. For the mathematical formalisms of the other four models, the reader can consult the documentation [31,32,33,34,35,36,37].
3.1. Wavelet Decomposition Analyses
This technique decomposes a signal into its various components using the mother wavelet and the decomposition level. Equation (2) is used to represent a function
in terms of basic functions called wavelets [
38]. In this relationship, the first term represents the sum of the low scales, and the second is the sum of the high scales. The coefficients of detail and approximation were obtained using Mallat’s algorithm. The original signal was obtained by applying inverse decomposition.
where
are the scaling coefficients for level
,
are the scaling functions, often called scale basis functions, which correspond to the low-frequency components of the function
; and
are the wavelet coefficients for level
,
are the wavelet functions, which represent detail at different scales or resolutions,
is the scale index or decomposition level, and
:
the translation index.
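As a concrete illustration, the multi-level decomposition and reconstruction described above can be reproduced with the PyWavelets library (an assumption; the paper does not name its wavelet implementation), here with the fourth-order Daubechies wavelet and the four decomposition levels used later in the study:

```python
# Sketch: Mallat multi-level decomposition/reconstruction with PyWavelets.
import numpy as np
import pywt

load = np.sin(np.linspace(0, 24 * np.pi, 2048)) + 10   # stand-in hourly load signal

# Decompose: returns [cA4, cD4, cD3, cD2, cD1]
coeffs = pywt.wavedec(load, wavelet="db4", level=4)

# Inverse decomposition recovers the original signal
reconstructed = pywt.waverec(coeffs, wavelet="db4")
assert np.allclose(load, reconstructed[: len(load)])
```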
3.2. Model Architectures
LSTM networks are recurrent neural networks [31] used on long data sequences [22]. Variants of LSTM networks exist in the literature, including those with and without a forgetting gate and LSTM with a peephole connection. This article uses LSTM networks with a peephole connection to develop four load prediction models: stacked LSTM, bidirectional LSTM, CNN-LSTM, and ConvLSTM. Indeed, LSTMs with forgetting gates offer the advantage of removing unnecessary information and retaining only essential data during training. Introduced by modifying the original LSTM cell [32], the forgetting gate determines whether information should be retained or eliminated from the cell state: a forgetting-gate value $f_t$ close to 1 indicates the retention of information, while a value close to 0 means discarding all information. For stacked LSTMs, LSTM cells are arranged in layers to form a deep network. In this configuration, the number of LSTM cells per layer remains constant; any deviation from this configuration results in a misconfigured network, which leads to information loss. If $N$ represents the number of LSTM cells in the first layer, this layer outputs $N$ values, and if the subsequent layer has fewer cells than the preceding layer, some values are inevitably lost. This contrasts with the role of the forgetting gate, which removes information no longer required by the cell: the decision to delete or retain information lies in the LSTM cell itself, not in the network configuration. The fully connected layer allows adjustment of the number of network outputs; with this final layer, it is possible to have 1, 2, 3, …, $n$ output(s). The Bidirectional LSTM network, introduced in [33], operates in two opposite directions, which enables it to learn data sequences in both directions during training. Consider a data sequence [$x_{t-1}, x_t, x_{t+1}, \ldots, x_T$]; to learn this sequence, the first layer of the BiLSTM network processes the data from $x_{t-1}$ to $x_T$, storing past information, while the second layer processes the data from $x_T$ to $x_{t-1}$, storing future information. The BiLSTM network can therefore store both past and future information, which is impossible for the stacked LSTM network.
Figure 2 presents the architectures of the stacked LSTM and BiLSTM and the input data structure of these models. The input data are initially prepared as a matrix of dimensions (n, 3) and then reshaped into 3D with a final dimension of (n, 3, 1).
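For concreteness, here is a minimal sketch of the two architectures and the (n, 3, 1) reshaping, again assuming a Keras implementation; the layer sizes shown are placeholders, not the optimized values of Section 4.2:

```python
# Sketch: stacked LSTM and BiLSTM with inputs reshaped to (n, 3, 1).
import numpy as np
from tensorflow.keras import layers, models

X = np.random.rand(1000, 3)        # hypothetical (n, 3) input matrix
X = X.reshape((-1, 3, 1))           # reshaped into 3D: (n, 3, 1)

stacked = models.Sequential([
    layers.Input(shape=(3, 1)),
    layers.LSTM(32, return_sequences=True),  # feeds full sequence to next layer
    layers.LSTM(32),                          # same cell count per layer
    layers.Dense(1),                          # fully connected output layer
])

bilstm = models.Sequential([
    layers.Input(shape=(3, 1)),
    layers.Bidirectional(layers.LSTM(64)),    # forward and backward passes
    layers.Dense(1),
])
```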
Convolutional neural networks (CNNs) are widely used in the field of computer vision [34] to solve problems such as object identification, image recognition, image classification, and image segmentation. A CNN consists mainly of three layers, namely the convolution layer, the pooling layer, and the fully connected layer; these three layers perform the mathematical operations necessary for the functioning of a CNN. The ConvLSTM network [35] was designed for precipitation nowcasting and is particularly well-suited to handling spatiotemporal data such as images and videos [36,37]. Each input is a three-dimensional spatiotemporal tensor, with the first two dimensions representing spatial dimensions. The matrix product between the input data and the weight matrices in a standard LSTM is replaced by a convolution product. The CNN-LSTM models use input data arranged as a matrix of dimensions (n, 4), which is then reshaped to a final dimension of (n, 2, 2, 1).
Figure 3 illustrates the data shapes for the CNN-LSTM and ConvLSTM. For the ConvLSTM model, the input data have a dimension of (n, 4), which is then reshaped to a final dimension of (n, 2, 2).
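A short sketch of the two reshaping schemes described above follows; note that in Keras, ConvLSTM2D additionally expects explicit time and channel axes, so the exact tensor layout below is an assumption about the paper's setup:

```python
# Sketch: reshaping the (n, 4) input windows for the convolutional models.
import numpy as np

X = np.random.rand(1000, 4)            # hypothetical (n, 4) input matrix

# CNN-LSTM: each 4-value window becomes a 2x2 single-channel "image"
X_cnn_lstm = X.reshape((-1, 2, 2, 1))   # (n, 2, 2, 1)

# ConvLSTM: each window becomes a 2x2 spatial grid
X_convlstm = X.reshape((-1, 2, 2))      # (n, 2, 2)
print(X_cnn_lstm.shape, X_convlstm.shape)
```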
3.3. Model Evaluation
Model performance was evaluated using indices such as the Root Mean Square Error (RMSE), the Mean Absolute Percentage Error (MAPE), and the correlation coefficient (R²). The RMSE criterion measures accuracy by comparing the deviation between the estimated values and the measured data. The RMSE is always positive and is calculated using Equation (3):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \quad (3)$$

where $\hat{y}_i$ represents the estimated values and $y_i$ represents the measured values.
The MAPE criterion shows the average percentage difference between the estimated values and the measured values. The MAPE is calculated using Equation (4):

$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (4)$$
The correlation coefficient (R²) indicates the strength of the linear relationship between the predicted values and the observed values. R² is given by Equation (5):

$$R^2 = \left(\frac{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}}\right)^2 \quad (5)$$

where $\bar{\hat{y}}$ is the mean predicted value and $\bar{y}$ is the mean observed value.
The signal-to-noise ratio (SNR) is calculated as the ratio between the variance of the signal values and the squared RMSE. The higher the ratio, the better the quality of the signal relative to the noise, as expressed by Equation (6):

$$\mathrm{SNR} = \frac{\sigma_{y}^{2}}{\mathrm{RMSE}^{2}} \quad (6)$$
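A minimal sketch of Equations (3)–(6) in NumPy (the function and variable names are illustrative):

```python
# Sketch: RMSE, MAPE, R^2, and SNR as defined in Equations (3)-(6).
import numpy as np

def evaluate(y_true, y_pred):
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))             # Equation (3)
    mape = 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))  # Equation (4)
    r = np.corrcoef(y_pred, y_true)[0, 1]                       # Pearson correlation
    r2 = r ** 2                                                 # Equation (5)
    snr = np.var(y_true) / rmse ** 2                            # Equation (6)
    return rmse, mape, r2, snr
```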
3.4. Modelling Data and Materials
The data were provided by the Compagnie Energie Electrique du Togo (CEET) and concern electricity consumption in Lomé. The database covers the period from 1 January 2018 to 31 December 2023, i.e., six (06) years of data. For the remainder of the work, 80% of the data are used for training the models and 20% for testing. The database is structured in Excel format, with workbooks comprising 12 sheets corresponding to the different months of the year; each workbook includes details of electricity consumption, and each data point is collected at hourly intervals. The entire project runs on a LENOVO machine with an Intel(R) Core(TM) i7-4720HQ @ 2.60 GHz processor (4 CPUs) and 16 GB of RAM. All models were developed using Python 3.12.
4. Results and Discussion
The key hyperparameters for LSTM networks include input data size, output data size, number of hidden layers, activation function, learning rate, optimizer, epoch, batch size, neuron deactivation, and the number of LSTM cells. Each of these hyperparameters significantly influences the performance of LSTM networks.
Figure 4, Figure 5, Figure 6 and Figure 7 depict the effects of hidden layers, activation function, epoch count, and the number of LSTM cells on each model.
Figure 4 reveals that the RMSE increases between the first and second hidden layers, stabilizes from the third to the sixth hidden layer, and increases slightly for the seventh and eighth hidden layers. The number of LSTM cells that gives the lowest RMSE value is 160. The results on the effect of the activation function show that only the “softmax” function gives a very high RMSE value compared with the other activation functions; the minimum RMSE values were obtained with the “relu”, “softplus”, “selu”, and “elu” functions. Finally, the RMSE of the Stacked LSTM decreases from 10 to 100 epochs, so it is recommended to choose a training epoch count greater than or equal to 100.
In Figure 5, it is evident that the RMSE value experiences an increase between the third and seventh hidden layers. Additionally, RMSE values exhibit slight variations with the number of BiLSTM layers, with the lowest RMSE observed at 64 BiLSTM cells. The impact of activation functions on RMSE reveals that only the “softmax” activation function results in a notably high RMSE compared to other activation functions. Conversely, minimum RMSE values were obtained with the “relu”, “sigmoid”, “softplus”, “softsign”, “tanh”, “selu”, and “elu” activation functions. For the BiLSTM model, the RMSE decreased from 10 to 30 epochs and remained virtually constant from 40 to 100 epochs.
Figure 6 indicates that the RMSE value remained practically constant when the number of CNN layers varied from one to eight. Slight variations in the RMSE value were observed owing to the changes in the number of CNN filters. The analysis of the activation function’s effect on RMSE indicates that the softmax function leads to a significantly higher RMSE compared to other functions. Conversely, the minimum RMSE values were achieved with the “relu, softsign, tanh, selu, and elu” activation functions. For the CNN-LSTM model, the RMSE sharply decreased from 10 to 20 epochs and stabilized from 30 to 100 epochs.
Figure 7 reveals that the minimum RMSE occurs with six ConvLSTM layers. The RMSE remained constant for 32, 64, 128, and 160 filters, whereas only 96 ConvLSTM filters resulted in a notably high value compared to other filter numbers. Additionally, the impact of activation functions on RMSE highlights that only the “softmax” function yields a considerably higher RMSE, while minimum RMSE values are associated with the “relu, softplus, selu, and elu” functions. Finally, the optimal minimum RMSE value was obtained using 60 epochs.
4.1. Hyperparameters Optimization
The algorithms commonly used to optimize the hyperparameters of LSTM networks include grid search and random search. Grid search involves testing all possible combinations of candidate hyperparameter values to determine the best combination. However, it faces challenges in high-dimensional search spaces, where the search space expands with the number of hyperparameters [39]. Nevertheless, parallelization can be applied to mitigate this issue. Random search involves randomly selecting hyperparameter values from a specified list and testing various combinations. According to [40], random search is often more efficient than grid search, particularly when only a few hyperparameters strongly influence performance; grid search, however, remains tractable for small, low-dimensional search spaces. In summary, the choice between grid search and random search depends on the complexity and dimensionality of the hyperparameter search space. In this study, grid search is employed for hyperparameter optimization. This algorithm, however, tends to be memory-intensive; to mitigate this, the study restricts the optimization to four fixed hyperparameters, and the process is parallelized to reduce search time. For each network type (stacked LSTM, BiLSTM, CNN-LSTM, and ConvLSTM), the “GridSearchCV” class from the “Scikit-learn” module was used. This class integrates grid search with cross-validation, which involves dividing the training dataset into k folds, training the network on k − 1 folds, and validating it on the remaining fold. The parameterization of the “GridSearchCV” class follows the specifications outlined in Equation (7).
GridSearchCV(estimator = model, param_grid = parameters, cv = 3, n_jobs = −1) (7)

where model defines the model to be optimized; parameters is the list of hyperparameters to be optimized; cv is the number of folds, fixed at three in this context; and n_jobs is the number of processors used for the search (n_jobs = −1 allows all available processors to be used to speed up the search). The optimized hyperparameter results and the codifications used to elaborate the configurations are given in Table 1 and Figure 8, respectively.
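A sketch of this optimization step follows, assuming the Keras model is wrapped for Scikit-learn with the scikeras package; the wrapper choice and the candidate grid below are assumptions, not the paper's exact configuration:

```python
# Sketch: grid search with 3-fold cross-validation over a small
# hyperparameter grid, parallelized across all available processors.
from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import GridSearchCV
from tensorflow.keras import layers, models

def build_model(n_layers=3, n_cells=32, activation="relu"):
    m = models.Sequential([layers.Input(shape=(3, 1))])
    for i in range(n_layers):
        m.add(layers.LSTM(n_cells, activation=activation,
                          return_sequences=(i < n_layers - 1)))
    m.add(layers.Dense(1))
    m.compile(optimizer="adam", loss="mse")
    return m

model = KerasRegressor(model=build_model, epochs=100, verbose=0)
parameters = {
    "model__n_layers": [1, 2, 3],
    "model__n_cells": [32, 64, 96, 128, 160],
    "model__activation": ["relu", "selu", "elu", "softmax"],
}
grid = GridSearchCV(estimator=model, param_grid=parameters, cv=3, n_jobs=-1)
# grid.fit(X_train, y_train); grid.best_params_ holds the optimal configuration
```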
The training curves of the different forecasting models are depicted in Figure 9. Analysis of these curves reveals a good fit, characterized by a training loss (loss) and a validation loss (val_loss) that decrease to a stable point with a minimal gap (almost equal to zero) between the two final loss values. These results indicate that training can be extended further, as there is no sign of overfitting yet. Therefore, the training epoch count is set to 500 (epochs = 500), and early stopping is implemented to prevent overtraining.
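A minimal sketch of this training configuration, assuming Keras's EarlyStopping callback (the patience value is illustrative):

```python
# Sketch: train for up to 500 epochs, stopping early once the
# validation loss stops improving.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=20,
                           restore_best_weights=True)
# history = model.fit(X_train, y_train, validation_split=0.2,
#                     epochs=500, callbacks=[early_stop])
```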
4.2. Optimal Models Training
For the Stacked LSTM, the optimal model was obtained with the configuration [C K I D], consisting of three (03) hidden layers, 100 training epochs, the activation function elu, and 32 LSTM cells. The optimal BiLSTM configuration was [C K H E], with three (03) hidden layers, 100 training epochs, the activation function selu, and 64 LSTM cells. The optimal CNN-LSTM configuration, [B K H O], was obtained with two (02) hidden layers, 100 training epochs, the activation function selu, 96 filters on the first CNN layer, and 64 filters on the second CNN layer. The optimal ConvLSTM configuration, [C K H Q], was obtained with three (03) hidden layers, 100 training epochs, the activation function relu, 96 filters on the first convolutional layer, 64 filters on the second, and 32 filters on the third. The various trained models were used to predict electricity consumption over a one-hour time horizon. The MSE values for these models are presented in Table 2.
The correlation between predicted and actual load is presented in Figure 10.
Figure 11 displays the forecast results for the first seven days of 2023, giving an overview of the forecasting results achieved by all the models, together with the correlation between actual and predicted load. The criteria used to evaluate the model prediction results are the MAPE, the RMSE, and the coefficient of determination (R²); based on these criteria, the most accurate model is the one with the lowest MAPE, the lowest RMSE, and the highest R². Model selection, however, also requires a compromise between accuracy and the computational cost of the algorithms used. The complexity of each prediction model is evaluated by measuring the average time taken to predict an output value. This average prediction time is inherently linked to the mathematical operations executed within each model and is influenced by the characteristics of the computer used, such as the number and frequency of the CPUs. The results presented in Figure 12 indicate that the fastest model is BiLSTM, with an average prediction time of 0.13 s for one output value, while the slowest model is ConvLSTM, taking an average of 0.23 s. Despite the speed differences, all models yielded satisfactory results.
It is noteworthy that the ConvLSTM model, while being the most accurate, is also the slowest, which highlights the trade-off between accuracy and prediction speed. In terms of model ranking, both Stacked LSTM and CNN-LSTM are notable for combining accuracy and speed. It is worth mentioning that electric load prediction models are at times deployed on the web for public access; under these conditions, the Stacked LSTM and CNN-LSTM networks are suitable, since the calculation time of a forecasting model weighs heavily when it operates on remote servers. For electricity market applications, the ConvLSTM model is suitable, as at this level a 1% forecast error can lead to increased operational costs.
4.3. Effect of Wavelet Transform
The best-performing model, ConvLSTM, was retained for the rest of the study. The various steps are as follows: the electric load is decomposed into approximation and detail coefficients using the wavelet transform; the coefficients are sorted in ascending or descending order; the sorted values are used to train the model; the trained model is used to predict the approximation and detail coefficients; the predicted coefficients are restored to their initial order; and the predicted load is finally reconstructed from these coefficients. This approach offers several advantages, such as reduced data dimensionality, highlighting of the most significant signal components, potentially improved prediction performance, and less costly computation time. In the literature, orthogonal wavelets are well adapted to time series analysis [24,41,42,43]. The fourth-order Daubechies mother wavelet was employed. The maximum decomposition level is given by Equation (8), where N represents the number of samples of the time series [44,45,46]:

$$L = \operatorname{int}\left[\log_{10}(N)\right] \quad (8)$$

In our context, the decomposition level is 4.
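The following is a minimal sketch of the sorting and un-sorting steps under these choices (db4, level 4); the use of numpy.argsort to record and invert the sorting permutation is an assumption about how the method can be implemented, as the paper does not give code:

```python
# Sketch: wavelet decomposition with coefficient sorting (db4, level 4),
# followed by restoration of the original order and reconstruction.
import numpy as np
import pywt

train_load = np.sin(np.linspace(0, 60, 1024)) + 10   # stand-in for the CEET load series

coeffs = pywt.wavedec(train_load, wavelet="db4", level=4)

sorted_coeffs, orders = [], []
for c in coeffs:
    order = np.argsort(c)               # remember the sorting permutation
    sorted_coeffs.append(c[order])      # CA_Tri, CD4_Tri, ..., CD1_Tri
    orders.append(order)

# ... the ConvLSTM would be trained on and predict the sorted components ...
predicted_sorted = sorted_coeffs        # stand-in for the model's predictions

restored = []
for pred, order in zip(predicted_sorted, orders):
    c = np.empty_like(pred)
    c[order] = pred                     # undo the sort (inverse permutation)
    restored.append(c)

predicted_load = pywt.waverec(restored, wavelet="db4")
```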
The results of the decomposition, as well as those with the unsorted coefficients, are depicted in Figure 13. Training with the sorted coefficients of the multi-level decomposition led to an enhancement in the performance of the ConvLSTM network. The coefficients predicted by the ConvLSTM model in sorted form and subsequently reconstructed are illustrated in Figure 14.
Figure 15 illustrates the evolution of the load profiles over time with unsorted coefficients. The analysis reveals a large discrepancy between the actual load and the forecast load, with absolute errors ranging between 10 and 50 MW.
With wavelet coefficient sorting, by comparison, the prediction results are shown in Figure 16. The analysis reveals that this technique improves the performance of the ConvLSTM model: the proposed model offers a better match with the CEET electrical load data, with absolute errors ranging between 0.5 and 3 MW.
4.4. Benchmark Models
To demonstrate the effectiveness of the proposed approach, a comparative study was carried out against reference models, namely ARIMA, a multilayer perceptron (MLP) ANN, and ConvLSTM, all of which are reliable forecasting methods. There are numerous reasons for these choices; some are mentioned in the introduction of the article, and others in [47,48]. The prediction results for these models are illustrated in Figure 17. All models underwent optimization using the same hyperparameter optimization algorithm used for the LSTMs (GridSearchCV). The optimized MLP model consists of three (03) hidden layers, with 96 neurons in the first, 64 in the second, and 32 in the third, values obtained through optimization; the chosen activation function is “relu”. For the ARIMA model, the best hyperparameters are p = 4, d = 1, and q = 0, corresponding to ARIMA(4, 1, 0). The performance of these four models is evaluated based on the MAPE, RMSE, and R² criteria, and the results are presented in Table 3.
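As an illustration, the ARIMA(4, 1, 0) benchmark can be fitted with statsmodels (an assumed implementation; the paper does not name its ARIMA library):

```python
# Sketch: ARIMA(4, 1, 0) benchmark for one-hour-ahead load forecasting.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

train_load = np.sin(np.linspace(0, 60, 500)) + 10   # stand-in for the load series
arima = ARIMA(train_load, order=(4, 1, 0)).fit()
forecast = arima.forecast(steps=1)                   # one-hour-ahead prediction
```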
The WT+ConvLSTM model has the highest SNR (1.6393), indicating the best noise robustness among the compared models. The other models, with SNRs of around 0.17 to 0.18, show lower performance, the ARIMA model having the lowest SNR. The results demonstrate the impact of sorting the wavelet transform coefficients on electric load prediction. The relevance of this approach, often overlooked in the literature, reveals better prospects in the electricity sector: the proposed method can be employed for short-term forecasts, assisting grid operators in integrating other energy sources more effectively and enhancing grid stability.
5. Conclusions
To address the challenges of electric load forecasting, an in-depth study was conducted, leading to the development of a robust model based on wavelet decomposition, both with and without coefficient sorting, combined with different types of LSTM networks. Four models were employed: Stacked LSTM, BiLSTM, CNN-LSTM, and ConvLSTM. The hyperparameters of each model were optimized using grid search cross-validation. Evaluation criteria, including MAPE, RMSE, and R², indicated that the ConvLSTM model outperformed the others.
Subsequently, a hybrid approach was proposed, combining ConvLSTM with wavelet decomposition using coefficient sorting, specifically employing a fourth-order Daubechies wavelet and a decomposition level of 4. This enhancement significantly improved accuracy, reducing the absolute error from 10–50 MW to 0.5–3 MW. Moreover, the proposed hybrid model demonstrated its effectiveness by surpassing traditional reference models such as ARIMA and MLP ANNs based on performance metrics.
This method can be utilized by power plant operators to address the high prediction errors associated with the nature and quality of the data, to optimize operational and maintenance activities through accurate load forecasts, and to facilitate overall planning.
The study will have a significant impact on the decision-making process regarding the regulation of electricity prices in line with market demand. Additionally, it could have medium- and long-term repercussions by contributing to the development of policies aimed at improving universal access to electricity and reducing energy dependency.
Future work should consider increasing model complexity by incorporating additional variables, including social factors, to further improve forecast accuracy. Exploring combinations of these models with each other or with other machine learning techniques is also recommended. Additionally, investigating other hyperparameter optimization algorithms would support more comprehensive theoretical research in electric load prediction. Extending model validation to other cities or countries is essential to ensure robustness and generalizability.