A New Hybrid Model Based on SCINet and LSTM for Short-Term Power Load Forecasting

Liu, Mingping; Li, Yangze; Hu, Jiangong; Wu, Xiaolong; Deng, Suhui; Li, Hongqiao

doi:10.3390/en17010095


by Mingping Liu ¹, Yangze Li ¹, Jiangong Hu ¹,*, Xiaolong Wu ¹,², Suhui Deng ¹ and Hongqiao Li ³

¹ School of Information Engineering, Nanchang University, Nanchang 330031, China
² Shenzhen Research Institute, Huazhong University of Science and Technology, Shenzhen 518000, China
³ EAST Group Co., Ltd., Dongguan 523808, China
* Author to whom correspondence should be addressed.

Energies 2024, 17(1), 95; https://doi.org/10.3390/en17010095

Submission received: 26 November 2023 / Revised: 15 December 2023 / Accepted: 20 December 2023 / Published: 23 December 2023

(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)


Abstract:

A stable and reliable power system is crucial for daily life and economic stability. Power load forecasting is the foundation of dynamically balancing power supply and demand. However, with the growing adoption of renewable energy sources and electric vehicles, accurate power load forecasting remains difficult because of the complex patterns and dynamics of load data. To mitigate these issues, this paper proposes a new hybrid model based on a sample convolution and interaction network (SCINet) and a long short-term memory network (LSTM) for short-term power load forecasting. Specifically, a feed-forward network (FFN) is first used to enhance the nonlinear representation of the load data to highlight the complex temporal dynamics. The SCINet is then employed to iteratively extract and exchange information about load data at multiple temporal resolutions, capturing the long-term dependencies hidden in the deeper layers. Finally, LSTM networks are applied to further strengthen the extraction of temporal dependencies. The principal contributions of the proposed model can be summarized as follows: (1) The SCINet with a binary tree structure effectively extracts both local and global features, which is advantageous for capturing complex temporal patterns and dynamics; (2) Integrating LSTM into the SCINet-based framework mitigates the information loss resulting from iterative downsampling, thereby enhancing the extraction of temporal dependencies; and (3) FFN layers are placed before the SCINet and LSTM modules to enhance the nonlinear representations of the load data fed into them. Three real-world datasets are used to validate the effectiveness and generalization of the proposed model. Experimental results show that the proposed model outperforms the baseline models on the evaluation metrics.

Keywords:

short-term load forecasting; sample convolution and interaction network; long short-term memory network; complex patterns; dynamics

1. Introduction

The dynamic balance between the power supply and demand sides is a prerequisite for power system operation because massive amounts of electric energy cannot be stored. Without this balance, power quality is severely disrupted, causing instability and unreliability in the power system. To overcome this problem, power load forecasting is employed to predict the trend of future load data, assisting utilities in planning power generation and consumption in advance. Consequently, power load forecasting has garnered increasing attention worldwide. According to the prediction horizon, power load forecasting is divided into short-term, medium-term, and long-term load forecasting [1]. Among them, short-term load forecasting (STLF) plays a critical role in the daily operation of power systems, enabling electricity producers to proactively adjust power production and significantly reduce operation costs [2]. As described in [3], even a 1% reduction in the prediction error has the potential to save millions of dollars for a typical utility. Therefore, accurate power load forecasting is very important for ensuring the stable and economical operation of the power system.

Over the past several decades, a wide range of models and techniques for STLF have been developed and put into real-world applications. Generally, these models can be classified into two major groups: statistical models and machine learning-based models [4,5,6]. The traditional statistical models include approaches such as exponential smoothing (ES) models [7,8], autoregressive integrated moving average (ARIMA) models [9,10], and gray models (GM) [11,12]. However, these classical methods struggle to predict complex time series accurately because of their linear assumptions and limited scalability. As a result, they cannot effectively handle the nonlinearities inherent in time series data or the characteristics of large-scale time series.

With the enrichment of load data and advancements in computational power, machine learning models have overcome the constraints of traditional statistical approaches and proved effective in load forecasting. These models utilize various techniques to extract nonlinear features and multivariate time features. Notable examples include support vector regression (SVR) [13,14,15], Kalman filter (KF) [16,17], random forest regression (RF) [18,19], fuzzy logic framework (FL) [20,21], artificial neural network (ANN) [22,23], etc. The performance of machine learning models depends heavily on manual feature engineering, and a significant portion of studies on machine learning algorithms focus on data preprocessing. As a result, machine learning models are unable to automatically extract key information from time series data at multiple levels and are prone to overfitting, so other advanced approaches must be developed to compensate for this limitation [24,25]. In recent years, deep learning approaches have successfully mitigated the dependence on intricate manual feature engineering by virtue of their capability to automatically learn and extract in-depth features from time series data. Therefore, deep learning-based models have been widely used in STLF. For instance, Cai et al. [26] designed a convolutional neural network (CNN) model with gating for day-ahead load forecasting in commercial buildings, and the prediction accuracy was improved by 22.6% compared to traditional statistical models. Although CNNs are capable of modeling complex patterns in time series, they are unable to capture the long-term dependencies of time series. As a result, recurrent neural networks (RNNs) are increasingly employed to capture long-term dependencies in time series data [27]. However, the traditional RNN has limited performance due to its inherent problem of vanishing or exploding gradients. To address these challenges, advanced RNN models like LSTM [28] and gated recurrent unit (GRU) [29] have been introduced, demonstrating enhanced performance in load forecasting. For instance, Islam et al. [30] applied the LSTM model to a smart grid system for urban planning in Bangladesh, resulting in a significant improvement in prediction accuracy compared with conventional load forecasting techniques. It should be noted that single models typically perform well in specific scenarios but may not excel in all areas of time-series forecasting.

Recently, researchers have developed adaptive hybrid models to improve prediction accuracy and generalization. By integrating various individual modules, hybrid models utilize the unique strengths of each, compensating for the limitations inherent in single models. This usually provides higher accuracy and better generalization for solving load forecasting problems. For instance, a hybrid model based on CNN and GRU was proposed in [31] for probabilistic residential load forecasting. In [32], a CNN-LSTM hybrid model was utilized for residential load forecasting, and its predicted results outperformed those of single models. In the work of Liu et al. [33], a hybrid model concatenating DenseNet with an improved temporal convolutional network (TCN) was proposed. The proposed model was capable of extracting in-depth features of time series and utilized an attention mechanism to reinforce feature information. Experimental results indicated good precision for various time scales of prediction. However, the increasing penetration of renewable energy sources, flexible loads, and time-varying load consumption in distributed grids further complicates the patterns and dynamics of the load data. For instance, renewable energy sources are influenced by unpredictable environmental factors such as weather conditions, resulting in irregular trend variations. Moreover, power load data display substantial fluctuations on intraday and seasonal scales due to human activities, industrial production, and meteorological conditions. Any alteration in one of these external factors may lead to distinct nonlinear patterns in load data. Consequently, extracting the complex patterns and dynamics of load data across different time scales remains a challenging task in the modern power system. Deng et al. [34] proposed a deep convolutional neural network model based on multi-scale convolution with the goal of extracting features at different layers. This approach effectively captures multi-scale local features of time series data by utilizing multi-size convolutional kernels. However, the model did not thoroughly account for the dependencies between different time scales, making it challenging to capture long-term trends and global features. Wang et al. [35] employed multi-scale downsampling convolution and isometric convolution to extract local features and global correlations, respectively. However, this approach primarily concentrates on capturing average features at different time scales, thereby neglecting in-depth features in complex temporal dynamics. It is noteworthy that trends and seasonal components of the load data can persist even after downsampling into two sub-sequences. Therefore, Liu et al. [36] introduced the sample convolution and interaction network (SCINet). This framework utilizes a perfect binary tree structure to capture time series dependencies at various time scales through recursive downsampling combined with a comprehensive set of convolutional filters and interactive learning mechanisms. This approach not only captures features within different time scales but also identifies interdependencies between time scales of different sizes, effectively revealing both local features and global trends in the entire time series. Parri et al. [37] integrated variational mode decomposition (VMD) with SCINet for wind speed forecasting. VMD was utilized for denoising, while SCINet captured global patterns and long-term dependencies. However, the implementation of VMD may introduce a risk of information leakage and significantly increase the computational complexity of SCINet. Silva et al. [38] proposed deep convolutional networks in a binary tree structure with skip connections for long-term series forecasting. The proposed model extracted and exchanged key information at different resolutions, thereby enhancing prediction accuracy. However, as the number of levels in the SCINet tree structure increases, the model faces challenges in effectively transferring information to deeper levels and incurs higher computational costs due to the large number of training parameters. Therefore, it is imperative to develop a new model for STLF that can comprehensively extract long-term dependencies at different time scales from load data with complex patterns and dynamics.

To address the aforementioned limitations, a new hybrid model based on feed-forward networks (FFN), SCINet, and LSTM is proposed and applied to load forecasting. The model first employs an FFN for nonlinear transformations of the raw data, enhancing its ability to capture complex features. Subsequently, it utilizes SCINet's multi-scale convolutional interactive operations to grasp the long-term dependencies in load data characterized by complex patterns and dynamics. The output of SCINet is further refined through an FFN and LSTM to extract temporal dependencies more effectively, thereby boosting predictive accuracy. The efficacy and generalizability of the model are validated using three real-world load datasets, showing that it handles the complexities of time series data without significantly increasing computational complexity. The primary contributions and innovations of this paper include:

(1): A new hybrid framework, constructed with SCINet and LSTM, has been proposed to forecast short-term load data with complicated temporal patterns and dynamics. This framework employs an encoder-decoder architecture, effectively capturing feature dependencies of load data across different time scales.
(2): The SCINet, utilizing its unique binary tree structure and a downsample-convolution-interaction architecture, extracts both local and global features and facilitates capturing complex temporal patterns and dynamics through interactive learning between the sub-sequences.
(3): Integrating LSTM into the SCINet-based framework mitigates information loss resulting from iterative downsampling, thereby further enhancing the extraction of long-term dependencies and then improving the prediction accuracy.
(4): In order to effectively capture the nonlinear features of the load data with complex temporal patterns and dynamics, the proposed model employs FFN layers with residual connections prior to SCINet and LSTM modules to improve the nonlinear representations. In this way, the SCINet can largely retain multi-scale temporal features, and the LSTM is strengthened to capture intricate long-term dependencies.
(5): The proposed model is tested on three real-world power load datasets. The experimental results indicate superior performance in STLF compared to other state-of-the-art models.

The remainder of this paper is structured as follows: Section 2 offers a comprehensive description of the components within the proposed model. Section 3 outlines the experimental results of the proposed model, evaluates its performance, and compares it with other contrast models. Finally, Section 4 concludes this paper and briefly discusses future work.

2. Methodology

The proposed model mainly consists of three core modules: FFN, SCINet, and LSTM. This section describes each module in detail, emphasizing its role and functionality, and then introduces the framework architecture of the proposed model and its operation.

2.1. Feed-Forward Network

The proposed model integrates two FFN layers with residual connections and employs the Swish activation function. This integration aims to enhance learning capabilities and prediction accuracy, particularly by addressing the gradient vanishing issue common in deep networks. The structure of the FFN is illustrated in Figure 1. The residual connection in the FFN allows for effective gradient propagation, ensuring robust learning even in complex data scenarios. Furthermore, the expressions for the Swish activation function and the FFN are written as follows:

$$\mathrm{Swish}(z) = z \cdot \sigma(\beta z) \tag{1}$$

$$y = \mathrm{FFN}(x) + x \tag{2}$$

where $z$ is the input to the Swish activation function, and $\beta$ is a trainable parameter that modulates the slope of the function. The term $\sigma(z)$ denotes the sigmoid function, $1 / (1 + e^{-z})$, providing the necessary non-linearity. In the context of our FFN, $x$ represents the series of input data, and $\mathrm{FFN}(\cdot)$ denotes the output of the FFN layer. The final output $y$ of the model is obtained by adding the input $x$ directly to the output of the FFN layer, thereby forming the residual connection, effectively expressed as Equation (2). This residual connection is pivotal in mitigating the vanishing gradient problem, especially in deeper network architectures.
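Equations (1) and (2) can be illustrated with a minimal pure-Python sketch. It assumes the FFN body is two linear layers with a Swish activation in between; the internal layout is given only in Figure 1, so this layout, the helper `linear`, and the parameter names `w1`, `b1`, `w2`, `b2` are hypothetical. Here `beta` is a fixed argument rather than a trained parameter.

```python
import math


def sigmoid(z):
    """Sigmoid function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def swish(z, beta=1.0):
    """Equation (1): Swish(z) = z * sigmoid(beta * z)."""
    return z * sigmoid(beta * z)


def linear(x, weights, bias):
    """Dense layer: one output per weight row (hypothetical helper)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def ffn_residual(x, w1, b1, w2, b2, beta=1.0):
    """Equation (2): y = FFN(x) + x.

    Assumed FFN body: linear -> Swish -> linear. The output width
    must equal the input width so the residual sum is defined.
    """
    hidden = [swish(h, beta) for h in linear(x, w1, b1)]
    out = linear(hidden, w2, b2)
    return [o + xi for o, xi in zip(out, x)]


# With all-zero weights the FFN branch contributes nothing, so the
# residual path returns the input unchanged.
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = ffn_residual([0.5, -1.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0])
```

The zero-weight case shows why the residual path helps gradient flow: even when the FFN branch contributes nothing, the input still reaches the output unchanged.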

2.2. Sample Convolution and Interaction Network

The SCINet is a novel time series forecasting framework, and its structure is depicted in Figure 2. It is evident that the SCINet is built by organizing multiple SCI-Blocks into a binary tree structure to downsample both the local and global features of the load data. This unique method leverages sample convolution to iteratively extract and exchange features at multiple time scales for time series modeling. Hence, it can substantially improve the extraction of in-depth features from complex time series data, leading to enhanced prediction accuracy.

The basic module of SCINet is the SCI-Block, which decomposes the input matrix into two sub-sequences by using splitting and interactive-learning operations. In particular, the original input matrix fed into SCINet is denoted as $X$. Within each SCI-Block at the l-th layer, the input $F_{l}$ is downsampled into two sub-sequences $F_{l,\mathrm{odd}}$ and $F_{l,\mathrm{even}}$ by separating odd and even elements. Then, various convolutional filters are utilized to process $F_{l,\mathrm{odd}}$ and $F_{l,\mathrm{even}}$. An interactive learning technique is employed between the two sub-sequences to mitigate information loss during the downsampling procedure. Ultimately, two refined sub-features $F_{l+1,\mathrm{even}}$ and $F_{l+1,\mathrm{odd}}$ are obtained. The operations of the SCI-Block are described by the following equations:

$$F_{l,\mathrm{even}}(:, x, :) = F_{l}(:, 2x, :) \tag{3}$$

$$F_{l,\mathrm{odd}}(:, x, :) = F_{l}(:, 2x + 1, :) \tag{4}$$

$$F_{l,\mathrm{odd}}^{m} = F_{l,\mathrm{odd}} \odot \exp\left(\eta\left(F_{l,\mathrm{even}}\right)\right) \tag{5}$$

$$F_{l,\mathrm{even}}^{m} = F_{l,\mathrm{even}} \odot \exp\left(\theta\left(F_{l,\mathrm{odd}}\right)\right) \tag{6}$$

$$F_{l+1,\mathrm{even}} = \gamma\left(F_{l,\mathrm{even}}^{m}\right) + F_{l,\mathrm{odd}}^{m} \tag{7}$$

$$F_{l+1,\mathrm{odd}} = F_{l,\mathrm{even}}^{m} - \delta\left(F_{l,\mathrm{odd}}^{m}\right) \tag{8}$$

where $\eta$, $\theta$, $\gamma$, and $\delta$ are four independent one-dimensional (1D) convolutional modules, as specifically illustrated in Figure 2, and $\odot$ is the Hadamard product.
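The data flow of Equations (3) to (8) can be traced with a short sketch on a single univariate sequence. The four convolutional modules are replaced by plain callables, which is an assumption made for illustration only: the actual 1D convolutional modules are those shown in Figure 2. The stand-in `zero` is hypothetical, and an even sequence length is assumed.

```python
import math


def split_even_odd(seq):
    """Equations (3) and (4): even-indexed and odd-indexed elements."""
    return seq[0::2], seq[1::2]


def sci_block(seq, eta, theta, gamma, delta):
    """One SCI-Block on a 1-D sequence of even length.

    eta, theta, gamma, delta are callables mapping a list to a list of
    the same length; they stand in for the 1D convolutional modules.
    """
    even, odd = split_even_odd(seq)
    # Equation (5): scale the odd part by exp(eta(even)), element-wise.
    odd_m = [o * math.exp(s) for o, s in zip(odd, eta(even))]
    # Equation (6): scale the even part by exp(theta(odd)), element-wise.
    even_m = [e * math.exp(s) for e, s in zip(even, theta(odd))]
    # Equation (7): next even sub-feature.
    next_even = [g + o for g, o in zip(gamma(even_m), odd_m)]
    # Equation (8): next odd sub-feature.
    next_odd = [e - d for e, d in zip(even_m, delta(odd_m))]
    return next_even, next_odd


def zero(values):
    """Hypothetical stand-in module that outputs zeros."""
    return [0.0 for _ in values]


next_even, next_odd = sci_block(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], zero, zero, zero, zero
)
```

With all four modules set to zero, Equations (5) and (6) leave both sub-sequences unchanged, which makes the additions and subtractions in Equations (7) and (8) easy to follow by hand.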

Based on the previous introduction of SCI-Block, the SCINet is constructed by hierarchically arranging multiple SCI-Blocks to form a complete binary tree structure. In this novel structure, there are 2^l SCI-Blocks at the l-th layer, where l = 1, 2, … L represents the depth of the layer, and L represents the depth of the lowest layer in the complete binary tree. In the SCINet network, the input time series undergoes layer-by-layer downsampling and is processed by the respective SCI-Blocks at each layer. This allows SCINet to efficiently learn comprehensive features across multiple time scales. Consequently, each SCI-Block captures both local and global perspectives of the entire time series. Following the execution of SCI-Block operations across L layers, elements from all sub-features are rearranged through the recursive implementation of the odd-even splitting operation in reverse. The reverse parity splitting operation is implemented using the following formula:

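$$F_{l}(:, x, :) = \begin{cases} F_{l+1,\mathrm{even}}\left(:, \lfloor x/2 \rfloor, :\right), & x = 0, 2, 4, \ldots, N_{l} - 2 \\ F_{l+1,\mathrm{odd}}\left(:, \lfloor x/2 \rfloor, :\right), & x = 1, 3, 5, \ldots, N_{l} - 1 \end{cases} \tag{9}$$

$$\hat{X}(:, x, :) = F_{1}(:, x, :) \tag{10}$$

where $\hat{X}$ is the final output of the SCINet and $N_{l}$ is the sum of the lengths of the two parity subsequences at the l-th layer, i.e., the length of $F_{l}$. Indices are zero-based, consistent with Equations (3) and (4).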
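The recursive odd-even splitting and its reversal can be checked with a small round-trip sketch. It ignores the convolutional interaction and only tracks element positions; the function names `tree_split`, `tree_merge`, and `merge_even_odd` are hypothetical, and a window of 24 points with three levels is assumed for the example.

```python
def merge_even_odd(even, odd):
    """Inverse of the odd-even split: interleave the two sub-sequences."""
    merged = [None] * (len(even) + len(odd))
    merged[0::2] = even  # even positions 0, 2, 4, ...
    merged[1::2] = odd   # odd positions 1, 3, 5, ...
    return merged


def tree_split(seq, depth):
    """Recursively split a sequence; returns 2**depth leaf sequences."""
    if depth == 0:
        return [seq]
    even, odd = seq[0::2], seq[1::2]
    return tree_split(even, depth - 1) + tree_split(odd, depth - 1)


def tree_merge(leaves):
    """Undo tree_split by merging leaves level by level."""
    if len(leaves) == 1:
        return leaves[0]
    half = len(leaves) // 2
    return merge_even_odd(tree_merge(leaves[:half]), tree_merge(leaves[half:]))


window = list(range(24))          # positions of a 24-step window
leaves = tree_split(window, 3)    # 8 leaves, each of length 24 / 2**3 = 3
restored = tree_merge(leaves)     # original temporal order is recovered
```

Each leaf holds every eighth sample of the window, so deeper levels see the series at a coarser temporal resolution, and the merge restores the original ordering.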

The rearranged sequence is added to the original time series through a residual connection to obtain a feature-enhanced sequence, and the feature matrix of this sequence is finally decoded into the output $\hat{X}$ by a fully connected layer. Within this framework, as the number of iterative layers increases, the network captures feature dependencies both within and between the different time scales of the time series data. However, as the number of iterative layers increases, local interactive learning can make it progressively more difficult for the model to learn global inter-dependencies, resulting in information blocking.

2.3. Long Short-Term Memory Network

The LSTM is an improved RNN that addresses the problem of vanishing or exploding gradients. It achieves this by replacing the neurons in the hidden layer of a conventional RNN with memory units. Figure 3 shows the structure of the LSTM cell. The memory unit includes an input gate, a forget gate, and an output gate, which allow the network to discard invalid information and retain important information at each time step. Specifically, at each discrete time step $t$, the forget gate $F_{t}$ appraises the cell state $C_{t-1}$ from the previous step to determine how much of it to discard, giving the cell its capacity for selective memory. Subsequently, the input gate $I_{t}$ assimilates new input while factoring in updated and historical data, thereby refining the cell’s memory. Finally, the output gate $O_{t}$ dictates the portion of the cell’s memory to be transmitted as output at its current state. The operation of the LSTM cell is described by the following equations, which preserve information continuity across time steps and allow the network to retain long-term temporal dependencies:

$$F_{t} = \sigma\left(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}\right) \tag{11}$$

$$I_{t} = \sigma\left(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}\right) \tag{12}$$

$$\tilde{C}_{t} = \tanh\left(W_{c} \cdot [h_{t-1}, x_{t}] + b_{c}\right) \tag{13}$$

$$C_{t} = F_{t} \cdot C_{t-1} + I_{t} \cdot \tilde{C}_{t} \tag{14}$$

$$O_{t} = \sigma\left(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}\right) \tag{15}$$

$$h_{t} = O_{t} \cdot \tanh\left(C_{t}\right) \tag{16}$$

where $W_{f}$, $W_{i}$, $W_{c}$, and $W_{o}$ denote the weight matrices that the model learns through training, and $[h_{t-1}, x_{t}]$ is the concatenation of the previous hidden state and the current input. The bias vectors $b_{f}$, $b_{i}$, $b_{c}$, and $b_{o}$ contribute to the output of their respective gates and cell updates. $\tilde{C}_{t}$ is the new candidate value for the cell state $C_{t}$, and $\tanh(\cdot)$ is the hyperbolic tangent activation function. The products between gates and states in Equations (14) and (16) are element-wise. The input $x_{t}$ and the output $h_{t}$ at each time step $t$ are processed through this structured flow of information, so that the LSTM can precisely regulate the forgetting, storage, and updating of information, which is pivotal for the network to capture and maintain long-term dependencies.
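A single time step of Equations (11) to (16) can be written out in plain Python as follows. This is only an illustrative sketch: the dictionary keys (`W_f`, `b_f`, and so on), the helper `affine`, and the tiny layer sizes are assumptions, and a practical implementation would use a deep learning library instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Compute W . v + b with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps hypothetical names to weights and biases."""
    hx = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], hx, p["b_f"])]        # Eq. (11)
    i_t = [sigmoid(a) for a in affine(p["W_i"], hx, p["b_i"])]        # Eq. (12)
    c_tilde = [math.tanh(a) for a in affine(p["W_c"], hx, p["b_c"])]  # Eq. (13)
    c_t = [f * c_old + i * g                                           # Eq. (14)
           for f, c_old, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], hx, p["b_o"])]        # Eq. (15)
    h_t = [o * math.tanh(c_new) for o, c_new in zip(o_t, c_t)]        # Eq. (16)
    return h_t, c_t


# Hidden size 1, input size 2, all parameters zero: every gate equals 0.5
# and the candidate state is 0, so the cell state is simply halved.
params = {name: [[0.0, 0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([0.3, -0.2], [0.0], [2.0], params)
```

The zero-parameter case makes the role of the forget gate visible: a gate value of 0.5 retains half of the previous cell state.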

2.4. Ensemble Framework

To attain superior prediction accuracy and enhanced generalizability, this study introduces an encoder-decoder ensemble framework for STLF, incorporating SCINet and LSTM. This ensemble framework capitalizes on the strengths of SCINet in extracting multi-scale features and LSTM in capturing long-term dependencies. The architecture of the proposed model, as depicted in Figure 4, encompasses four steps: data processing, encoder, decoder, and load forecasting module.

Step 1: Data processing. The initial load data, combined with external factors like temperature and day types, must undergo preprocessing to form a suitable input matrix for the framework, which includes normalizing load demand and temperature via max-min normalization and processing day types with a one-hot encoder. Subsequently, these preprocessed elements are combined to construct the feature matrix. Finally, the input data are organized as a tensor of 24 × 10.

Step 2: Encoder. The encoder comprises an FFN layer followed by a SCINet network, both integrated with residual connections. The FFN layer serves to augment non-linear representations of the input tensor, thereby enhancing the model’s capability to capture complex data patterns. The SCINet network, structured as a perfect binary tree, processes the input data at multiple levels. Within the same level of the tree-structured framework, different nodes use a set of convolution filters with distinct weights to capture local features at the same time scale. However, for multiple levels of the framework, the feature information from the shallower levels will be transmitted to deeper levels, which accumulate extra finer-scale features at multi-scale temporal resolutions. In the l-th level, the data within each node is transformed into a tensor of 24/2^l × 10. After processing through the SCINet, these subsets are recombined by reversing the odd-even splitting operation, restoring the temporal dimension to match the original input. In this way, this model can effectively capture both local and global features throughout the entire time series. The final load forecast can be obtained by processing these captured features through subsequent decoding stages.

Step 3: Decoder. The decoder is composed of an FFN layer and a series of LSTM networks. The FFN layer is specifically employed to further enhance the nonlinear representations of the time series data, thereby refining the data for more accurate forecasting. The LSTM networks are constructed to capture long-term dependencies of the load data that may compensate for the information loss during downsampling into multiple subsequences in the binary tree structure of the SCINet.

Step 4: Forecasting Evaluation. The output from Step 3 is fed into the fully connected layers to produce the load forecast. The performance of the proposed model is validated on the test set. Multiple evaluation metrics, namely the root mean square error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R²), are used to evaluate the performance of the proposed model.

The encoder-decoder architecture in the proposed model is key to preserving and effectively transmitting both local and global features, thereby reducing the risk of information loss. It also strengthens the ability of the model to capture complicated dynamic features such as long-term dependencies, seasonality, and randomness, significantly improving the accuracy and generalization of load forecasting.

3. Experimental Results and Discussions

3.1. Experimental Design

3.1.1. Data Preparation

Experiments were conducted on three real-world datasets. The first dataset was collected from the state grid of a region in southern China, covering the range from 1 January 2012 to 31 December 2014, with a sampling interval of 15 min. This dataset was used to verify the superior performance of the proposed model. The second dataset, collected from ISO-NE (New England), spanned from 1 March 2003 to 3 March 2008 and employed a one-hour resolution. It was utilized to validate the generalization performance of the proposed model. The third dataset was sourced from Victoria (VIC), Australia, encompassing the period from 1 October 2010 to 5 January 2015, with a sampling rate of 1 h. This dataset provides further insights into the model’s applicability in diverse geographical and temporal scenarios. All these datasets were divided into a training set, a validation set, and a test set at a ratio of 8:1:1, as presented in Table 1.
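The 8:1:1 partition can be sketched as below. The split is assumed to be chronological (training on the earliest samples, testing on the latest), which is common for time series but is not stated explicitly here; the function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, ratios=(8, 1, 1)):
    """Split an ordered sequence into train/validation/test parts.

    Assumes `samples` is sorted by time, so no future data leaks
    into the training part.
    """
    total = sum(ratios)
    n = len(samples)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


train, val, test = chronological_split(list(range(100)))
```

For 100 ordered samples this yields 80 training, 10 validation, and 10 test samples, with the test part covering the most recent period.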

3.1.2. Sliding Window Configuration

The configuration of the sliding window plays a crucial role in ensuring accurate data processing. Both the length and step size of the sliding window must be carefully chosen to achieve two objectives: capturing sufficient past data and maintaining reasonable computational demands. To determine the optimum sliding window size, comparative experiments have been performed by using previous 16, 24, and 48 h load data to predict one-hour ahead load data. From the results shown in Table 2, it is evident that the sliding window with a length of 24 h achieves the best performance.
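The sliding-window construction can be sketched as follows, assuming hourly samples, a 24-step input window, a one-step-ahead target, and a step size of 1. The step size and the function name `make_windows` are assumptions, since only the window lengths are reported.

```python
def make_windows(series, window=24, horizon=1, step=1):
    """Build (past_window, target) pairs from an ordered load series.

    window:  number of past points used as input
    horizon: how many steps ahead the target lies (1 = next point)
    step:    shift between consecutive windows (assumed to be 1)
    """
    samples = []
    last_start = len(series) - window - horizon
    for start in range(0, last_start + 1, step):
        past = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((past, target))
    return samples


pairs = make_windows(list(range(30)))  # toy series of 30 hourly points
```

A longer window gives the model more history per sample but produces fewer samples from the same series and raises the computational cost, which is the trade-off explored in Table 2.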

3.1.3. Comparative Models

To validate the performance of the proposed model, it is compared with the following contrast models: CNN, LSTM, TCN, SCINet, TCN-LSTM, and FFN-SCINet-TCN. It should be noted that the FFN-SCINet-TCN architecture is derived by replacing the LSTM block in the proposed model with a TCN block. For the Southern China and ISO-NE datasets, each single model was trained for 6000 epochs, while the hybrid models and the proposed model were trained for 8000 epochs. In contrast, for the VIC dataset, single models were trained for 5000 epochs and hybrid models for 6000 epochs. The batch size for all models was set to 512, and the initial learning rate was fixed at 0.006. The hyperparameters for each model are summarized in Table 3. All models were trained and tested more than five times, until the statistical metrics stabilized. The training was conducted in a Python 3.8 environment using PyTorch 1.12 on an NVIDIA A100 80GB PCIe GPU.

3.1.4. Performance Evaluation

To evaluate the performance of the proposed model, three key statistical indicators are selected: RMSE, MAPE, and R². The RMSE provides an indication of the robustness of the model against outliers, while the MAPE offers a measure of the model’s prediction accuracy, with lower values indicating higher accuracy in both cases. Additionally, R² is introduced as a measure of the proportion of variance in the dependent variable that is predictable from the independent variables. This metric serves as a gauge of the overall explanatory power of the model, with values closer to 1 indicating a better fit. The statistical indicators are defined as follows:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_{t} - \hat{y}_{t}}{y_{t}} \right| \times 100\% \tag{17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(y_{t} - \hat{y}_{t}\right)^{2}} \tag{18}$$

$$R^{2} = 1 - \frac{\sum_{t=1}^{n} \left(y_{t} - \hat{y}_{t}\right)^{2}}{\sum_{t=1}^{n} \left(y_{t} - \bar{y}\right)^{2}} \tag{19}$$

where $y_{t}$ is the true value of the load, $\hat{y}_{t}$ is the predicted value, $\bar{y}$ is the average true value of the load, and $n$ is the number of predicted time points.
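Equations (17) to (19) translate directly into code. The sketch below uses only the standard library; the sample values are made up for illustration and are not taken from any of the datasets.

```python
import math


def mape(y_true, y_pred):
    """Equation (17): mean absolute percentage error, in percent."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true, y_pred):
    """Equation (18): root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Equation (19): coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not from the paper's datasets).
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
scores = (mape(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Note that MAPE is undefined when a true load value is zero, so it is best suited to load series that stay strictly positive.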

3.2. Feature Selection and Processing

The evolution and patterns observed in power load data sequences are significantly impacted by a range of external factors. Due to the cyclical nature of societal activities and industrial production, the power load data exhibit distinct seasonal and intraday patterns. As illustrated in Figure 5, peak load values in summer occur between 13:00 and 17:00 on weekdays. In winter, the peak load values occur between 17:00 and 19:00 on weekdays. In spring and autumn, the load values exhibit more moderate fluctuations with two peak load points: the first around 7:00 and the second around 19:00. Furthermore, meteorological conditions, including season, humidity, and temperature, are also important factors in the load demand. Figure 6 shows the 10-year variation of load demand with temperature for the New England region of the United States. Clearly, temperature and load values exhibit regular seasonal trends and periodicity, with load demand peaking at extremely high or low temperatures. Therefore, temperature is an important factor in electric load forecasting. Furthermore, Figure 7 illustrates the correlation coefficients between load demand and external factors. Demand shows a weak to moderate correlation with the hour of the day and with temperature, with correlation coefficients of 0.21 and 0.5, respectively.

Based on the above analysis, this work selected basic features for the model, such as historical loads, temperatures, seasons, weekdays, and holidays, each of which required a different preprocessing method. Categorical features such as seasons, weekdays, and holidays are processed using a one-hot encoder, which converts each category into a binary vector. This is particularly beneficial for features such as seasons that lack a linear or hierarchical order. Historical loads and temperatures are numerical features, but they span a long period of time, are often subject to a variety of unforeseen interferences, and differ significantly in magnitude. Therefore, normalization of these data is needed to facilitate convergence of the network and thus improve the stability of model training. In this case, the maximum-minimum normalization method is used to normalize the load demand and temperature as defined below:

x' = \frac{x - x_{min}}{x_{max} - x_{min}}

(20)

where x' is the normalized value, x is the original value, and x_{min} and x_{max} are the minimum and maximum in the time series, respectively. For external factors with categorical or Boolean characteristics, such as holidays, seasons, weekdays, and weekends, it is convenient to employ a one-hot encoder for processing, as shown in Table 4.
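The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy input values, the temperature bounds, and the mapping of season indices to seasons are hypothetical. The load bounds are the ISO-NE minimum and maximum from Table 1, and the row layout follows Table 4 (1 + 1 + 4 + 2 + 2 = 10 feature rows per 24 h window).

```python
def min_max_normalize(series, x_min, x_max):
    """Scale each value to [0, 1] following Equation (20)."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    return [(x - x_min) / (x_max - x_min) for x in series]


def one_hot(index, num_classes):
    """Return a binary vector with a single 1 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(num_classes)]


def build_feature_matrix(load, temperature, season, holiday, weekend,
                         load_range, temp_range):
    """Stack the Table 4 features into a 10 x T matrix (rows = features)."""
    norm_load = min_max_normalize(load, *load_range)
    norm_temp = min_max_normalize(temperature, *temp_range)
    columns = []
    for t in range(len(load)):
        column = [norm_load[t], norm_temp[t]]
        column += one_hot(season[t], 4)               # 4 seasons
        column += one_hot(0 if holiday[t] else 1, 2)  # Yes -> [1, 0], No -> [0, 1]
        column += one_hot(0 if weekend[t] else 1, 2)  # Yes -> [1, 0], No -> [0, 1]
        columns.append(column)
    # Transpose from hours x features to features x hours.
    return [list(row) for row in zip(*columns)]


# Toy 24 h window; all input values are invented for illustration.
hours = 24
load = [12000.0 + 300.0 * t for t in range(hours)]
temperature = [10.0 + 0.5 * t for t in range(hours)]
features = build_feature_matrix(
    load, temperature,
    season=[2] * hours,            # hypothetical index of the current season
    holiday=[False] * hours,
    weekend=[True] * hours,
    load_range=(8820.0, 27622.0),  # ISO-NE minimum/maximum load from Table 1
    temp_range=(-10.0, 40.0),      # hypothetical temperature bounds
)
print(len(features), len(features[0]))
```

Passing the bounds explicitly, rather than recomputing them per window, keeps every window on the same scale, which matches the definition of x_{min} and x_{max} as extremes of the whole time series.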

3.3. Analysis and Comparison of Experimental Results

3.3.1. Southern China Dataset

Table 5 lists the MAPE, RMSE, and R² values of the proposed model and the contrast models on the Southern China dataset. The single models CNN, LSTM, and TCN achieve similar predictive results, which is attributed to their suitability for time series problems and their ability to capture the temporal variations in load data. Compared with the CNN, the TCN reduces the MAPE and RMSE by 11.9% and 10.4%, respectively; compared with the LSTM, the reductions are 8.6% and 7.8%. This is attributed to the larger receptive field of the TCN, which captures more feature information from the input matrix. The SCINet model further reduces the MAPE and RMSE by 18.7% and 12.8%, respectively, compared to the TCN. This demonstrates that the downsample-convolve-interact architecture of SCINet achieves a larger receptive field than the dilated causal convolution of TCN and thus extracts both local and global features of load data. Furthermore, the SCINet model even achieves higher prediction accuracy than the classical TCN-LSTM hybrid model; for instance, its MAPE is 8% lower. This indicates that SCINet facilitates the capture of complex temporal patterns and dynamics through its interactive learning capability, which motivates building a new hybrid load forecasting model on an SCINet-based network. The FFN-SCINet-TCN hybrid model achieves a reduction of 21.8% in MAPE and 28% in RMSE compared with the SCINet, which means that the TCN can compensate for the temporal information lost when SCINet downsamples the input features into multiple sub-sequences. The proposed model improves prediction accuracy further: it outperforms the FFN-SCINet-TCN model by 24.6% and 27.3% in terms of MAPE and RMSE, respectively. These results indicate that the LSTM network can compensate for the temporal information loss caused by the recursive downsampling in SCINet.

The prediction errors of the proposed model are clearly smaller than those of the other contrast models. In addition, the R² values of all models exceed 99%, which indicates that all models fit the load data well and supports the credibility of the experimental results; among them, the proposed model achieves the highest R². It should be noted that the running time of a model is positively related to its complexity. Because SCINet contains multiple SCIBlocks, each with eight convolutional filters, the proposed model takes more time to run than the models without SCINet. However, compared to SCINet, the training cost of the proposed model increases by only 12.9%, while the accuracy improves by 41.3%. Given the ongoing growth in computational capabilities, this cost is acceptable for practical applications.
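The percentage improvements quoted for the Southern China dataset are relative reductions with respect to a baseline model. The following short sketch recomputes several of them from the rounded values in Table 5; the dictionary layout and the helper name `relative_reduction` are illustrative choices, and small differences from the quoted percentages can arise because the table entries are rounded.

```python
# (MAPE %, RMSE MW, cost time s) copied from Table 5.
TABLE_5 = {
    "CNN": (1.09, 118.50, 442.28),
    "LSTM": (1.05, 115.12, 487.11),
    "TCN": (0.96, 106.18, 771.37),
    "SCINet": (0.78, 92.55, 1723.2),
    "TCN-LSTM": (0.85, 93.36, 812.46),
    "FFN-SCINet-TCN": (0.61, 66.47, 2121.9),
    "Proposed model": (0.46, 48.30, 1945.55),
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def compare(model, baseline):
    """Relative MAPE and RMSE reduction of `model` with respect to `baseline`."""
    mape_m, rmse_m, _ = TABLE_5[model]
    mape_b, rmse_b, _ = TABLE_5[baseline]
    return relative_reduction(mape_b, mape_m), relative_reduction(rmse_b, rmse_m)


for model, baseline in [("TCN", "CNN"), ("TCN", "LSTM"),
                        ("FFN-SCINet-TCN", "SCINet"),
                        ("Proposed model", "FFN-SCINet-TCN")]:
    d_mape, d_rmse = compare(model, baseline)
    print(f"{model} vs {baseline}: MAPE -{d_mape:.1f}%, RMSE -{d_rmse:.1f}%")

# Extra training cost of the proposed model relative to SCINet, in percent.
extra_cost = -relative_reduction(TABLE_5["SCINet"][2], TABLE_5["Proposed model"][2])
print(f"Extra training cost vs SCINet: {extra_cost:.1f}%")
```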

To validate the reliability of the experimental results, Figure 8 depicts the MAPE values of the above models as boxplots. The hybrid models exhibit smaller fluctuations than the single models. Moreover, the proposed model exhibits the smallest error range and the lowest prediction error among all contrast models. Furthermore, Figure 9 presents the predicted load of all models and the actual load from 9 September 2013 to 16 September 2013. All models effectively predict the rising and falling trends of load demand. Nevertheless, during fluctuating stages, or at the peak and valley points of the load data, the deviations between the prediction curves of the single models and the actual load curve are more pronounced than those of the hybrid models. The subplots in Figure 9 show that the predicted curve of the proposed model is closer to the actual load curve than those of the other contrast models. Overall, the proposed model can effectively capture the complicated temporal patterns and dynamics in power load data and thereby achieves superior performance in load forecasting compared with the baseline models.

3.3.2. ISO-NE Dataset

To verify the generalization of the proposed model, experiments were also conducted on a public and widely used dataset, the ISO-NE dataset. Table 6 lists the experimental results of all models. Among the single models, SCINet achieves lower prediction errors than CNN, LSTM, and TCN. However, the prediction error of SCINet is not significantly lower than that of TCN. This is attributed to the fact that the nonlinearity and non-stationarity of the ISO-NE dataset are weaker than those of the Southern China dataset, which can be clearly seen by comparing Figure 9 and Figure 10. For the same reason, the TCN-LSTM hybrid model achieves slightly lower MAPE and RMSE than the SCINet model. Nevertheless, the FFN-SCINet-TCN hybrid model reduces the prediction error relative to the single SCINet model: according to Table 6, its MAPE and RMSE are 16.4% and 3.6% lower, respectively. This means that the architecture of the FFN-SCINet-TCN model facilitates the extraction of important feature information from the load data. Furthermore, the prediction errors of the proposed model are much smaller than those of the other contrast models. For example, the proposed model outperforms the FFN-SCINet-TCN model by 20% and 20.5% in terms of MAPE and RMSE, respectively, and reduces the MAPE and RMSE of SCINet by 32.9% and 23.4%, respectively. This further demonstrates that the proposed model combines the multi-scale interactive learning capability of SCINet with the temporal dependency modeling of LSTM. This combined approach not only enhances its ability to capture complicated temporal patterns and dynamics but also improves its robustness against noise. Additionally, the R² of the proposed model is the highest among all models, reaching 99.441%. Thus, the novel architecture used in this paper ensures that the proposed model achieves high prediction accuracy and generalization for different datasets with various nonlinear patterns.

Figure 11 displays the MAPE values of all models as boxplots. The prediction errors of the single models fluctuate more than those of the hybrid models. Moreover, the proposed model exhibits a narrow and stable error range, which indicates that its prediction results are more stable and reliable than those of the other contrast models. Figure 10 shows the predicted load of all models and the actual load from 3 September 2007 to 9 September 2007. The fitting behavior is similar to that observed on the Southern China dataset. The subplots in Figure 10 show that the curve of the proposed model is closer to the actual load data than those of the other contrast models, which indicates that the proposed model generalizes well and is robust in load forecasting. Consequently, the hybrid model proposed in this paper achieves high prediction accuracy for STLF and has promising practical applications.

3.3.3. VIC Dataset

Compared with the Southern China and ISO-NE datasets, the VIC dataset presents markedly higher volatility and randomness, which can be seen by comparing Figure 12 with Figure 9 and Figure 10. It is therefore meaningful to further validate the performance of the proposed model on the VIC dataset. Table 7 shows the MAPE, RMSE, and R² values of all models. As before, the SCINet model achieves the best MAPE and RMSE among the single models (CNN, LSTM, TCN, and SCINet). At the same time, the FFN-SCINet-TCN model significantly improves prediction accuracy compared to the SCINet model: its MAPE and RMSE decrease by 21.5% and 21%, respectively. Furthermore, compared to the FFN-SCINet-TCN model, the proposed model reduces the MAPE and RMSE by 4.8% and 5.3%, respectively. The R² of the proposed model is again the highest among all models, reaching 98.493%. It should be noted that the prediction errors of all models are evidently higher than those on the Southern China and ISO-NE datasets, indicating that extracting in-depth features is more challenging for the VIC dataset. Nevertheless, the proposed model still achieves the best performance on such a complex and dynamic dataset.

The MAPE values of all models are depicted as boxplots in Figure 13. The SCINet-based hybrid models exhibit a stable error range and the lowest prediction errors compared to the other models, which further supports the reliability of the predictive results of the proposed model. Figure 12 shows the predicted load of all models and the actual load from 6 July 2014 to 12 July 2014. All models approximately follow the trend of the actual load data. However, the proposed model is closer to the actual load data than the other models, especially around turning points. Therefore, the proposed model not only achieves superior performance in STLF but also generalizes well across different load forecasting scenarios.

3.4. Discussion

A comparison across the three datasets above shows that the proposed model significantly improves accuracy and generalization ability. This is attributed to the integration of SCINet's multi-scale feature extraction with LSTM's long-term sequence dependency processing, which effectively captures the complex patterns and dynamics in time series data. Specifically, the SCINet, with its perfect binary tree structure, utilizes layered convolutional filters to deeply encode dependencies across various time scales. Subsequently, the LSTM decodes the long-term dependencies within these time sequences. This combined approach not only enhances the efficiency of processing complex temporal dynamics but also accurately captures long-term trends and patterns in load data.
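To make the binary tree structure concrete, the sketch below shows only the downsampling step: each node splits its input into even-indexed and odd-indexed sub-sequences, and the split is repeated for a fixed number of levels. It is a simplified illustration, not the SCINet implementation: the convolutional filters and the interaction between the two branches are deliberately omitted, and the function names are hypothetical. The window length of 24 and the three levels follow Tables 3 and 4.

```python
def split_even_odd(seq):
    """Downsample a sequence into its even-indexed and odd-indexed halves."""
    return seq[0::2], seq[1::2]


def tree_leaves(seq, num_levels):
    """Recursively split `seq` and return the leaf sub-sequences, left to right.

    A tree with `num_levels` levels has 2 ** num_levels leaves. In SCINet each
    split is followed by convolution and interaction between the two branches;
    those steps are left out here.
    """
    if num_levels == 0:
        return [list(seq)]
    even, odd = split_even_odd(seq)
    return tree_leaves(even, num_levels - 1) + tree_leaves(odd, num_levels - 1)


# Hour indices of a 24 h window stand in for the load values.
window = list(range(24))
leaves = tree_leaves(window, num_levels=3)
for leaf in leaves:
    print(leaf)
```

With three levels, the 24-point window is divided into eight leaves of three points each, and the points in a leaf are eight hours apart. Deeper levels therefore operate at coarser temporal resolutions, while adjacent-hour relations are no longer visible within a single leaf.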

The computational complexity of SCINet is higher than that of TCN because of its binary tree structure. Therefore, the SCINet-based hybrid models take longer to run than the TCN-LSTM model. Compared to the FFN-SCINet-TCN model, however, the proposed model requires less run time. Moreover, with the rapid advancement in computational capabilities, the practical impact of this additional computational complexity is gradually diminishing. Thus, despite the higher theoretical computational complexity of the SCINet-based hybrid model, its advantages in precision and generalization often outweigh the additional computational burden in practice.

3.5. Comparison of the Proposed Model with Other Advanced Models

To further demonstrate the performance of the proposed model, several state-of-the-art models are selected for comparison on the ISO-NE dataset, as listed in Table 8. The proposed model shows a notable reduction in MAPE compared with the other models. These recent studies have been devoted to improving forecasting models by integrating advanced architectural components. For instance, Chen et al. enhanced the ResNet structure by adding side residual blocks to better capture intricate temporal features [39]. Similarly, Wan et al. employed parallel asymmetrical residual blocks to address the long-term multivariate dependencies of time series data [40]. Liu et al. proposed a hybrid model that fused the advantages of an improved TCN and DenseNet to uncover implicit relationships among multiple features and construct long-term sequence dependencies [33]. Hua et al. proposed an ensemble framework that integrated parallel convolutional neural networks with a GRU enhanced by a modified ResNet, aiming to capture both the spatial and temporal features of load data [41]. However, these methods still struggle to capture the complex patterns and dynamics in load data. To address this challenge, the model proposed in this paper was designed to better capture these patterns and dynamics, thereby offering enhanced predictive capabilities and improved forecasting accuracy.

4. Conclusions

In this paper, a novel hybrid model integrating SCINet, LSTM, and FFN is proposed for STLF. The process begins with reconstructing the preprocessed load data into an input matrix. This matrix is then fed into an encoder comprising an FFN layer and a stacked SCINet network with residual connections. Subsequently, the output of the encoder is passed to a decoder equipped with another FFN layer and LSTM networks. The final forecasting results are produced by the fully connected layers. The FFN layers enhance the nonlinear representations of the input matrix, while the stacked SCINet captures complex temporal patterns and dynamics through interactive learning among sub-sequences. The LSTM networks strengthen the extraction of temporal dependencies in load data, thereby offsetting the information loss induced by iterative downsampling. The efficacy of the proposed model was validated using the Southern China, ISO-NE, and VIC datasets. The experimental outcomes reveal that the proposed model reduces MAPE by 25% to 58% and RMSE by 27% to 59% for the Southern China dataset, and reduces MAPE by 20% to 49% and RMSE by 21% to 36% for the ISO-NE dataset. Moreover, the R² of the proposed model is the highest among all contrast models on all datasets. Overall, this approach substantially enhances prediction accuracy and demonstrates strong generalization capabilities in STLF. Consequently, the model offers stable and precise power load forecasting, which can markedly reduce operational costs and generate substantial economic benefits for power systems.

Future work will focus on enhancing the SCINet-LSTM model to address the intricate challenges of power load forecasting, particularly in modern power systems with high renewable energy penetration. Moreover, its training efficiency needs to be optimized while integrating external factors, which are essential for accurate photovoltaic power generation prediction. In this way, future work will not only improve the prediction accuracy of the proposed model in various load forecasting scenarios but also ensure its applicability to the rapidly developing field of renewable energy forecasting.

Author Contributions

Conceptualization, M.L. and Y.L.; methodology, M.L. and Y.L.; software, Y.L.; validation, M.L. and J.H.; formal analysis, J.H. and S.D.; investigation, M.L. and X.W.; resources, J.H. and H.L.; data curation, M.L. and Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, M.L. and J.H.; visualization, S.D. and X.W.; supervision, H.L.; project administration, S.D. and H.L.; funding acquisition, J.H. and S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from multiple sources, including the National Natural Science Foundation of China (Grants 12364036 and 62065012), the Natural Science Foundation of Jiangxi Province of China (Grant 20232BAB201046), and the National College Students Innovation and Entrepreneurship Training Program (Grant 2023CX121).

Data Availability Statement

Publicly available datasets were analyzed in this study. The datasets can be found here: https://www.iso-ne.com/isoexpress/web/reports/load-and-demand (accessed on 1 March 2022), https://github.com/keatoncu/Southern-China-Dataset (accessed on 8 June 2022), and https://www.aemo.com.au/ (accessed on 15 February 2023).

Conflicts of Interest

Author Hongqiao Li was employed by the company EAST Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Brodowski, S.; Bielecki, A.; Filocha, M. A hybrid system for forecasting 24-h power load profile for Polish electric grid. Appl. Soft Comput. 2017, 58, 527–539.
2. Wang, H.; Ruan, J.; Wang, G.; Zhou, B.; Liu, Y.; Fu, X.; Peng, J. Deep learning-based interval state estimation of AC smart grids against sparse cyber attacks. IEEE Trans. Ind. Inform. 2018, 14, 4766–4778.
3. Alfares, H.K.; Nazeeruddin, M. Electric load forecasting: Literature survey and classification of methods. Int. J. Forecast. 2010, 33, 23–34.
4. Wang, Y.; Chen, Q.X.; Zhang, N.; Wang, Y.S. Conditional residual modeling for probabilistic load forecasting. IEEE Trans. Power Syst. 2018, 33, 7327–7330.
5. Saksornchai, T.; Lee, W.-J.; Methaprayoon, K.; Liao, J.R.; Ross, R.J. Improve the unit commitment scheduling by using the neural-network-based short-term load forecasting. IEEE Trans. Ind. Appl. 2005, 41, 169–179.
6. Lopez, M.; Valero, S.; Rodriguez, A.; Veiras, I.; Senabre, C. New online load forecasting system for the Spanish transport system operator. Electr. Power Syst. Res. 2018, 154, 401–412.
7. Christiaanse, W.R. Short-Term load forecasting using general exponential smoothing. IEEE Trans. Power App. 1971, 90, 900–911.
8. Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. ETSformer: Exponential smoothing transformers for time-series forecasting. arXiv 2022.
9. Taylor, J.W.; McSharry, P.E. Short-term load forecasting methods: An evaluation based on European data. IEEE Trans. Power Syst. 2007, 22, 2213–2219.
10. Noureen, S.; Atique, S.; Roy, V.; Bayne, S. Analysis and application of seasonal ARIMA model in energy demand forecasting: A case study of small scale agricultural load. In Proceedings of the 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS), Dallas, TX, USA, 4–7 August 2019; pp. 521–524.
11. Li, G.D.; Wang, C.H.; Masuda, S.; Nagai, M. A research on short term load forecasting problem applying improved grey dynamic model. Int. J. Electr. Power Energy Syst. 2011, 33, 809–816.
12. Li, W.; Han, Z.-h. Application of improved grey prediction model for power load forecasting. In Proceedings of the 2008 12th International Conference on Computer Supported Cooperative Work in Design, Xi’an, China, 16–18 April 2008; pp. 1116–1121.
13. Li, S.; Wang, P.; Goel, L. Short-term load forecasting by wavelet transform and evolutionary extreme learning machine. Electr. Power Syst. Res. 2015, 122, 96–103.
14. Jiang, H.; Zhang, Y.; Muljadi, E.; Zhang, J.J.; Gao, D.W. A short-term and high-resolution distribution system load forecasting approach using Support Vector Regression with hybrid parameters optimization. IEEE Trans. Smart Grid 2018, 9, 3341–3350.
15. Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670.
16. Guan, C.; Luh, P.B.; Michel, L.D.; Chi, Z. Hybrid Kalman filters for very short-term load forecasting and prediction interval estimation. IEEE Trans. Power Syst. 2013, 28, 3806–3817.
17. Sharma, S.; Majumdar, A.; Elvira, V.; Chouzenoux, E. Blind Kalman filtering for short-term load forecasting. IEEE Trans. Power Syst. 2020, 35, 4916–4919.
18. Aprillia, H.; Yang, H.-T.; Huang, C.-M. Statistical load forecasting using optimal quantile regression random forest and risk assessment index. IEEE Trans. Smart Grid 2021, 12, 1467–1480.
19. Huang, N.; Lu, G.; Xu, D. A Permutation importance-based feature selection method for short-term electricity load forecasting using random forest. Energies 2016, 9, 767.
20. Tamimi, M.; Egbert, R. Short term electric load forecasting via fuzzy neural collaboration. Electr. Power Syst. Res. 2000, 56, 243–248.
21. Ali, M.; Adnan, M.; Tariq, M.; Poor, H.V. Load forecasting through estimated parametrized based fuzzy inference system in smart grids. IEEE Trans. Fuzzy Syst. 2021, 29, 156–165.
22. Asar, A.U.; Hassnain, S.R.U.; Khan, A. Short term load forecasting using particle swarm optimization based ANN approach. In Proceedings of the 2007 International Joint Conference on Neural Networks, Orlando, FL, USA, 12–17 August 2007; pp. 1476–1481.
23. Drezga, I.; Rahman, S. Input variable selection for ANN-based short-term load forecasting. IEEE Trans. Power Syst. 1998, 13, 1238–1244.
24. MacInnes, J.; Santosa, S.; Wright, W. Visual classification: Expert knowledge guides machine Learning. IEEE Comput. Graph. Appl. 2010, 30, 8–14.
25. Steenwinckel, B.; De Paepe, D.; Vanden Hautte, S.; Heyvaert, P.; Bentefrit, M.; Moens, P.; Dimou, A.; Van Den Bossche, B.; De Turck, F.; Van Hoecke, S.; et al. A methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning. Future Gener. Comput. Syst. 2021, 116, 30–48.
26. Cai, M.; Pipattanasomporn, M.; Rahman, S. Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques. Appl. Energy 2019, 236, 1078–1088.
27. Shi, H.; Xu, M.; Li, R. Deep learning for household load forecasting: A novel pooling deep RNN. IEEE Trans. Smart Grid 2018, 9, 5271–5280.
28. Abbasimehr, H.; Paki, R. Improving time series forecasting using LSTM and attention models. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 673–691.
29. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014.
30. Islam, M.R.; Al Mamun, A.; Sohel, M.; Hossain, M.L.; Uddin, M.M. LSTM-Based electrical load forecasting for Chattogram city of Bangladesh. In Proceedings of the 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 12–14 March 2020; pp. 188–192.
31. Afrasiabi, M.; Mohammadi, M.; Rastegar, M.; Stankovic, L.; Afrasiabi, S.; Khazaei, M. Deep-based conditional probability density function forecasting of residential loads. IEEE Trans. Smart Grid 2020, 11, 3646–3657.
32. Kim, T.-Y.; Cho, S.-B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
33. Liu, M.; Qin, H.; Cao, R.; Deng, S. Short-Term load forecasting based on improved TCN and DenseNet. IEEE Access 2022, 10, 115945–115957.
34. Deng, Z.; Wang, B.; Xu, Y.; Xu, T.; Liu, C.; Zhu, Z. Multi-Scale convolutional neural network with time-cognition for multi-step short-term load forecasting. IEEE Access 2019, 7, 88058–88071.
35. Wang, H.; Peng, J.; Huang, F.; Chen, J.; Xiao, Y. MICN: Multi-Scale Local and Global Context Modeling for Long-Term Series Forecasting. In Proceedings of the Eleventh International Conference on Learning Representations; OpenReview.net. Available online: https://openreview.net/forum?id=zt53IDUR1U (accessed on 15 March 2023).
36. Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. SCINet: Time series modeling and forecasting with sample convolution and interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828.
37. Parri, S.; Teeparthi, K. VMD-SCINet: A hybrid model for improved wind speed forecasting. Earth Sci. Inform. 2023, 16, 1–22.
38. Silva, A.Q.B.; Gonçalves, W.N.; Matsubara, E.T. DESCINet: A hierarchical deep convolutional neural network with skip connection for long time series forecasting. Expert Syst. Appl. 2023, 228, 120246.
39. Chen, K.J.; Chen, K.L.; Wang, Q.; He, Z.; Hu, J.; He, J. Short-Term load forecasting with deep residual networks. IEEE Trans. Smart Grid 2018, 10, 3943–3952.
40. Wan, R.; Mei, S.; Wang, J.; Liu, M.; Yang, F. Multivariate temporal convolutional network: A deep neural networks approach for multivariate time series forecasting. Electronics 2019, 8, 876.
41. Hua, H.; Liu, M.; Li, Y.; Deng, S.; Wang, Q. An ensemble framework for short-term load forecasting based on parallel CNN and GRU with improved ResNet. Electr. Power Syst. Res. 2023, 216, 109057.

Figure 1. The structure of FFN.

Figure 2. The structure of the SCINet.

Figure 3. The structure of the LSTM cell.

Figure 4. The framework of the proposed model.

Figure 5. Comparative load curves across four seasons for a day in the ISO-NE dataset.

Figure 6. Variation of power load with temperature over 10 years of the ISO-NE dataset.

Figure 7. Heatmap of correlation coefficients between power load and external factors.

Figure 8. The box diagrams of MAPE on the Southern China dataset.

Figure 9. Load forecasting results of all models from 9 September 2013 to 16 September 2013 based on the Southern China dataset.

Figure 10. Load forecasting results of all models from 3 September 2007 to 9 September 2007 based on the ISO-NE dataset.

Figure 11. The box diagrams of MAPE on the ISO-NE dataset.

Figure 12. Load forecasting results of all models from 6 July 2014 to 12 July 2014 based on the VIC dataset.

Figure 13. The box diagrams of MAPE on the VIC dataset.

Table 1. Division of datasets.

Dataset	Total	Training Set	Validation Set	Testing Set	Maximum (MW)	Minimum (MW)
Southern China	65,880	52,704	6588	6588	11,430	1306
ISO-NE	43,920	35,136	4392	4392	27,622	8820
VIC	43,920	35,136	4392	4392	10,240	3273

Table 2. Experimental results for different sliding window sizes on the ISO-NE dataset.

	16 h			24 h			48 h
Model	MAPE (%)	RMSE (MW)	R² (%)	MAPE (%)	RMSE (MW)	R² (%)	MAPE (%)	RMSE (MW)	R² (%)
CNN	1.94	372.33	97.808	0.96	258.57	98.836	1.91	347.95	98.08
LSTM	1.01	284.54	98.72	0.87	242.92	98.87	1.12	276.01	98.792
Proposed model	0.53	192.43	99.415	0.49	166.18	99.441	0.57	175.80	99.51

Table 3. Hyperparameters for each model.

Model	Hyperparameters
CNN	Conv1: out_channels = 24, kernel_size = 3, Conv2: out_channels = 48, kernel_size = 3
LSTM	hidden_size = 10, num_layers = 3, dropout = 0.1
TCN	num_channels = [20, 20, 20], dilation_size = 2, kernel_size = 2, dropout = 0.2
SCINet	hidden_size = 1, num_stacks = 1, num_levels = 3, kernel_size = 5, dropout = 0.5
TCN-LSTM	TCN: num_channels = [20, 20, 20], dilation_size = 2, kernel_size = 2, dropout = 0.2, LSTM: hidden_size = 10, num_layers = 3, dropout = 0.1
FFN-SCINet-TCN	FFN: mult = 4, dropout = 0.2, TCN: num_channels = [20, 20, 20], dilation_size = 2, kernel_size = 2, dropout = 0.2, SCINet: hidden_size = 1, num_stacks = 1, num_levels = 3, kernel_size = 5, dropout = 0.5
Proposed model	FFN: mult = 4, dropout = 0.2, LSTM: hidden_size = 10, num_layers = 3, dropout = 0.1, SCINet: hidden_size = 1, num_stacks = 1, num_levels = 3, kernel_size = 5, dropout = 0.5

Table 4. Preprocessing methods for input features.

Feature	Size	Description
Demand	1 × 24	Min-max normalized power load data
Temperature	1 × 24	Min-max normalized temperature data
Season	4 × 24	One-hot encoder; 4 seasons; [1, 0, 0, 0] to [0, 0, 0, 1]
Holiday	2 × 24	One-hot encoder; Yes/No; [1, 0]/[0, 1]
Weekend	2 × 24	One-hot encoder; Yes/No; [1, 0]/[0, 1]

Table 5. Load forecasting evaluation on the Southern China dataset.

Model	MAPE (%)	RMSE (MW)	R² (%)	Cost Time (s)
CNN	1.09	118.50	99.541	442.28
LSTM	1.05	115.12	99.624	487.11
TCN	0.96	106.18	99.718	771.37
SCINet	0.78	92.55	99.834	1723.2
TCN-LSTM	0.85	93.36	99.818	812.46
FFN-SCINet-TCN	0.61	66.47	99.874	2121.9
Proposed model	0.46	48.30	99.922	1945.55

Table 6. Load forecasting evaluation on the ISO-NE dataset.

Model	MAPE (%)	RMSE (MW)	R² (%)	Cost Time (s)
CNN	0.96	258.57	98.836	316.44
LSTM	0.87	242.92	98.87	361.41
TCN	0.76	228.77	99.177	515.47
SCINet	0.73	216.86	99.35	1247.51
TCN-LSTM	0.71	213.90	99.333	587.51
FFN-SCINet-TCN	0.61	209.11	99.36	1439.85
Proposed model	0.49	166.18	99.441	1362.41

Table 7. Load forecasting evaluation on the VIC dataset.

Model	MAPE (%)	RMSE (MW)	R² (%)	Cost Time (s)
CNN	2.75	174.79	95.52	365.26
LSTM	2.63	167.78	95.874	419.96
TCN	2.23	144.51	96.92	621.20
SCINet	2.09	134.86	97.312	2437.90
TCN-LSTM	2.00	128.18	97.576	641.76
FFN-SCINet-TCN	1.64	106.57	98.334	2985.40
Proposed model	1.56	101.06	98.493	2935.90

Table 8. Comparison of performance on the ISO-NE dataset.

Research	Year	Method	MAPE (%)
Chen et al. [39]	2019	ResNetPlus	1.45
Wan et al. [40]	2019	M-TCN	0.97
Liu et al. [33]	2022	DenseNet-iTCN-A	0.87
Hua et al. [41]	2023	PCGA-iResNet	0.78
Proposed model	2023	Proposed Model	0.49

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

