Article

Forecasting of Short-Term Load Using the MFF-SAM-GCN Model

School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400030, China
* Author to whom correspondence should be addressed.
Energies 2022, 15(9), 3140; https://doi.org/10.3390/en15093140
Submission received: 17 March 2022 / Revised: 18 April 2022 / Accepted: 24 April 2022 / Published: 25 April 2022
(This article belongs to the Section A1: Smart Grids and Microgrids)

Abstract

Short-term load forecasting plays a significant role in the operation of power systems. Recently, deep learning has been widely employed in short-term load forecasting, primarily to extract the characteristics of numerical information in a single dimension, without taking into account the impact of external variables, particularly non-numerical elements, on load characteristics. In this paper, we propose a joint MFF-SAM-GCN model to realize short-term load forecasting. First, we utilize a Bi-directional Long Short-Term Memory (Bi-LSTM) network and a One-Dimensional Convolutional Neural Network (1D-CNN) in parallel connection to form a multi-feature fusion (MFF) framework, which can extract spatiotemporal correlation features of the load data. In addition, we introduce a Self-Attention Mechanism (SAM) to further enhance the feature extraction capability of the 1D-CNN. Then, with the deployment of a Graph Convolutional Network (GCN), external non-numerical features such as weather and the strength and direction of wind are extracted, and the generated weight matrices are incorporated into the load features to enhance feature recognition ability. Finally, we exploit Bayesian Optimization (BO) to find the optimal hyperparameters of the model to further improve the prediction accuracy. The simulation compares our proposed model with six benchmark schemes on the bus load dataset of the Shandong Open Data Network, China. The results show that the RMSE of our proposed MFF-SAM-GCN model is 0.0284, the SMAPE is 9.453%, the MBE is 0.025, and the R-squared is 0.989, which is better than the three selected traditional machine learning methods and the three deep learning models.

1. Introduction

In 2021, China proposed the construction of a new power system with new energy as the primary supply, which takes ensuring energy and power security as its fundamental premise and meeting the power needs of economic and social development as its primary goal. Supported by the interaction between energy storage and multi-energy complementarity, the new power system is clean, low-carbon, open, and interactive. Taking a strong smart grid as its hub platform, it is also controllable, efficient, intelligent, and flexible [1]. The new power system must carry out real-time monitoring and precise regulation of the entire power generation, transmission, distribution, and consumption chain. Power load prediction is not only the basis of the new power system's flexibility, efficiency, and open accessibility, but also the basis for formulating power grid construction plans and optimizing Demand Side Response (DSR). It is the foundation for users to improve energy efficiency, and it has an important impact on improving the quality and efficiency of power production.
Based on time scale, electricity load forecasting may be split into three categories [2,3,4]: short-term load forecasting (SLF), medium-term load forecasting (MLF), and long-term load forecasting (LLF). SLF refers to the forecast of power load over the next few days, hours, or even minutes; MLF covers the period from one month to one quarter; and LLF covers horizons of more than one year. SLF gives an accurate insight into the consumption behaviors of energy customers and enables better flexibility in demand-side management, which is essential for economical operation and short-term maintenance planning. Furthermore, aggregating short-term load projections provides the basis for medium- and long-term load forecasting.
The main features used for load forecasting include historical load data, economic indicators of various industries in the whole society, weather (temperature, humidity, wind, rainfall, etc.), and electricity price policies. Because people in different regions and industries have various power consumption modes, SLF is more challenging than MLF and LLF. SLF approaches fall into two categories: classical techniques and machine learning methods. The first category includes time series analysis [5], multiple linear regression [6], and Kalman filtering [7], etc. Although these models are easy to implement, it is difficult for them to extract complex features of energy consumption patterns, and their robustness is poor. The second category comprises machine learning based methods such as clustering [8], decision trees [9], random forests [10], support vector machines [11], fuzzy logic [12], and neural networks [13]. Owing to its superior feature extraction capabilities, deep learning has been one of the most appealing approaches for SLF in recent years. Recurrent Neural Networks (RNN) [14] are the most commonly used time series prediction models in deep learning, accomplishing series prediction by examining the temporal correlation between time series. Long Short Term Memory (LSTM) networks [15] are one of the most effective implementations of RNNs; they solve the problem of long-term dependence in time series and are the most commonly used model in SLF. Convolutional Neural Networks (CNN) [16] are another popular deep learning model, effective at extracting spatial features. One-Dimensional Convolutional Neural Networks (1D-CNN) [17] can identify basic patterns in time series data and collect local information, and can be combined with RNNs to boost prediction accuracy even further. One of the current hot issues in time series research is selecting and extracting distinct features and boosting the prediction potential of the model via feature fusion [18]. External characteristics such as geography, industry, economics, and weather may boost the feature richness and forecasting capability of an SLF model. External features, however, are usually irregular, and a CNN cannot handle unstructured input and has limited capacity to extract irregular features. Thus, the Graph Convolutional Network (GCN) [19] was proposed. Not only can a GCN extract irregular features, but it can also capture the associations between distinct characteristics.
In this paper, we offer a model for SLF that combines multi-feature fusion (MFF), the Self-Attention Mechanism (SAM), and a GCN. To extract the spatiotemporal correlation features of load data, our proposed model employs a 1D-CNN and a Bi-directional Long Short-Term Memory (Bi-LSTM) network in parallel to form an MFF framework. We then introduce a SAM to improve the feature extraction capability of the 1D-CNN. In addition, a GCN is used to extract external non-numerical features such as weather, wind force, and wind direction, and the generated weight matrices are incorporated into the load features to enhance feature recognition ability. Finally, we exploit Bayesian Optimization (BO) to find the optimal hyperparameters of the model to further improve the prediction accuracy. The following are the primary contributions of this study.
(1) An MFF and GCN based SLF model is proposed, in which the MFF is used to extract the spatiotemporal correlation characteristics of the load data and the GCN is used to elicit external non-numerical information.
(2) A novel MFF structure is developed, which employs the SAM-based 1D-CNN to extract spatially relevant load data features, while the Bi-LSTM network extracts temporally relevant load data features.
(3) A BO algorithm is used to find the optimal model hyperparameters. The proposed prediction model is experimentally compared with currently popular prediction models, and the role of each module and the effects of hyperparameter settings and buses on prediction performance are investigated via ablation experiments.
The rest of the paper is structured as follows: Section 2 reviews current SLF methodologies; Section 3 describes the MFF-SAM-GCN framework presented in this research; Section 4 is dedicated to experimentation and analysis, in which the suggested framework is contrasted with frequently used approaches and ablation experiments are performed; Section 5 concludes the paper and suggests directions for future research.

2. Related Work

SLF based on machine learning has been extensively studied. The European Network for Intelligent Technologies (EUNITE) held a worldwide competition on power demand forecasting in 2001, in which Chen et al. [20] used the Support Vector Regression (SVR) technique to win the event. Rafi et al. [21] proposed an integrated CNN and LSTM network method for SLF in Bangladesh's national electricity system. Goh et al. [22] proposed a hybrid method combining multiple independent 1D-CNN and LSTM networks for SLF in the Irish national grid. Farsi et al. [23] proposed a hybrid parallel CNN-LSTM network model consisting of two paths, CNN and LSTM, whose outputs are combined in a fully connected path to produce the final output, and evaluated the model on an hourly load dataset from Malaysia and a daily load dataset from Germany. Wang et al. [24] proposed an SLF model based on a time convolution network (TCN) and the LightGBM algorithm, and validated the model using datasets from different industries in China, Australia, and Ireland.
Because of its unique capacity to extract significant information, the Attention Mechanism (AM) has been integrated with deep learning and employed in a variety of fields. Fahim et al. [25] designed a hybrid predictive model using an LSTM network combined with a SAM for forecasting the reform of scientific research in Morocco. Miao et al. [26] suggested a hybrid CNN-Bi-LSTM model coupled with BO and an AM. Azam et al. [27] proposed a hybrid deep learning architecture based on Bi-LSTM and an interpretable multi-headed attention mechanism, combining deep learning with the Ensemble Empirical Mode Decomposition (EEMD) algorithm and evaluating it on the ISO-NE load and price dataset. Zang et al. [28] combined LSTM networks and a SAM in a two-channel hybrid model with feature engineering to pre-process the raw data, and tested it using residential customer electricity load datasets from different countries and regions. Shang et al. [29] suggested a Multivariate Multistep CNN-LSTM (MMCNN-LSTM) hybrid model to estimate 24-h power demand using Particle Swarm Optimization (PSO) and Kernel Fuzzy C-Means (KFCM), and the model's performance was validated by three prediction trials.
The GCN has unique advantages in extracting spatial features and has been applied to traffic flow prediction. Zhao et al. [30] proposed a traffic flow prediction method that combines a GCN and a Gated Recurrent Unit (GRU) to form a Temporal Graph Convolutional Network (T-GCN) model, validated on the SZ-taxi and Los-loop datasets. Wang et al. [31] created a global relation reasoning GCN model for human pose estimation and incorporated it into the early stages of HRNet and SimpleBaseline; the findings suggest that the model can successfully learn key point correlations.

3. Method

3.1. Overall Network

The framework of the MFF-SAM-GCN prediction model is shown in Figure 1. The model takes both numerical and textual features as input. The basic module is an MFF framework consisting of a Bi-LSTM and a 1D-CNN, with the Bi-LSTM extracting time-related load data features and the 1D-CNN extracting spatially related load data features; the two are fused to obtain multi-dimensional features. A SAM is applied to the 1D-CNN to improve the efficiency of feature extraction by increasing the significance of target features and reducing irrelevant detail. Furthermore, a GCN is employed as a feature enhancement module to extract external non-numerical characteristics such as weather, wind force, and wind direction, which are then recombined with the global information.

3.2. Multi-Feature Fusion Framework

The deep learning models commonly used for sequence feature extraction are LSTM and CNN. LSTM is a special type of RNN that takes a time series as input and, through its cyclic structure, can learn long-term dependency information between sequences. CNN can extract spatial features, recognizing simple patterns in sequences and using them to generate complex patterns at a higher level; a 1D-CNN can extract effective features directly. In order to obtain rich dimensional information, this paper uses one Bi-LSTM module and two 1D-CNN modules in parallel to form an MFF framework that extracts the spatiotemporally correlated features of load data, and adds a SAM between the two 1D-CNNs to enhance the 1D-CNN's feature extraction capability.
In the MFF framework, the load data matrix is represented by $X \in \mathbb{R}^{N \times C}$, where $N$ is the number of data collection points obtained within a period (e.g., hours, days) and $C$ is the number of periods. $x^t = [x_1^t, \ldots, x_C^t]$, $t = 1, \ldots, N$, is the load data vector for the same collection point across different periods, and $x_i = [x_i^1, \ldots, x_i^N]$, $i = 1, \ldots, C$, is the load data vector for the different collection points of the same period.

3.2.1. Bi-Directional Long Short-Term Memory Network

The LSTM network is a special kind of RNN that replaces the RNN's implicit function with a memory neuron, which can retain more information than a plain RNN and is appropriate for processing sequences with longer gaps or delays. Figure 2 [32] depicts the memory neuron structure in the LSTM network, which primarily consists of the forget gate, input gate, information update, and output gate. Here $x_t$ and $\hat{y}_t$ denote the input and output information at moment $t$, respectively; $s_t$ is the state vector, which the LSTM network uses to store history information; $\tilde{s}_t$ is the candidate vector, used to update the state vector; $h_t$ equals the output $\hat{y}_t$ and is delivered to the next memory neuron as the activation vector; and $o_{ft}$, $o_{ut}$, $o_{ot}$ denote the forget gate, input gate, and output gate, respectively. Both $\sigma$ and $\tanh$ are activation operations: $\sigma$ designates the sigmoid function, $\mathrm{sigmoid}(x) = 1/(1 + e^{-x})$, and $\tanh$ denotes the hyperbolic tangent function, $\tanh(x) = (e^x - e^{-x})/(e^x + e^{-x})$.
(1) Forget gate: the forget gate utilizes the sigmoid function to select some of the information from the output $h_{t-1}$ at moment $t-1$ and the input $x_t$ at moment $t$. The output of the forget gate is given by:
$$o_{ft} = \mathrm{sigmoid}\left(w_f\left[h_{t-1}, x_t\right] + b_f\right) \tag{1}$$
(2) Input gate: the sigmoid and $\tanh$ functions are used to select certain information from the output $h_{t-1}$ at moment $t-1$ and the input $x_t$ at moment $t$, respectively. The two outputs of the input gate are:
$$o_{ut} = \mathrm{sigmoid}\left(w_u\left[h_{t-1}, x_t\right] + b_u\right), \quad \tilde{s}_t = \tanh\left(w_s\left[h_{t-1}, x_t\right] + b_s\right) \tag{2}$$
(3) Information update: the output $o_{ft}$ of the forget gate at moment $t$ is multiplied by the state vector $s_{t-1}$ at moment $t-1$ and added to the product of the two outputs of the input gate. The updated information is represented as:
$$s_t = o_{ft} \odot s_{t-1} + o_{ut} \odot \tilde{s}_t \tag{3}$$
where $\odot$ denotes the Hadamard product, i.e., the element-wise product of two vectors with the same number of elements.
(4) Output gate: on the one hand, the sigmoid function picks some information from the output $h_{t-1}$ at moment $t-1$ and the input $x_t$ at moment $t$; on the other hand, the $\tanh$ function filters the updated state vector $s_t$. The two are then multiplied to obtain the output of the present instant, that is:
$$o_{ot} = \mathrm{sigmoid}\left(w_o\left[h_{t-1}, x_t\right] + b_o\right), \quad h_t = o_{ot} \odot \tanh\left(s_t\right) \tag{4}$$
In the above description, the weight matrices $w_f, w_u, w_s, w_o$ and bias vectors $b_f, b_u, b_s, b_o$ are the parameters that the LSTM network must learn during the training phase.
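For readers who prefer code to notation, the following is a minimal NumPy sketch of a single memory-neuron step implementing Equations (1)–(4); the dictionary-based parameter layout and weight shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, s_prev, w, b):
    """One memory-neuron step, Equations (1)-(4).

    w maps gate name -> matrix of shape (hidden, hidden + input);
    b maps gate name -> bias vector of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    o_f = sigmoid(w["f"] @ z + b["f"])                 # forget gate, Eq. (1)
    o_u = sigmoid(w["u"] @ z + b["u"])                 # input gate, Eq. (2)
    s_tilde = np.tanh(w["s"] @ z + b["s"])             # candidate vector, Eq. (2)
    s_t = o_f * s_prev + o_u * s_tilde                 # information update, Eq. (3)
    h_t = sigmoid(w["o"] @ z + b["o"]) * np.tanh(s_t)  # output gate, Eq. (4)
    return h_t, s_t
```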
Although LSTM networks can manage and anticipate events with large gaps and delays in a time series, they can only utilize past data. The Bi-LSTM network is made up of two LSTM networks with opposing time orientations, one running from the beginning to the end of the time series and the other from the end to the beginning. The Bi-LSTM network therefore generally achieves higher accuracy than the LSTM network.
Figure 3 [33] depicts the Bi-LSTM network model, where $x = [x_1, \ldots, x_N]$ and $\hat{y} = [\hat{y}_1, \ldots, \hat{y}_N]$ represent the input and output information, and $\overrightarrow{h}$ and $\overleftarrow{h}$ denote the output vectors of the forward and reverse LSTM networks, respectively. The Bi-LSTM network model may be expressed as follows:
$$\overrightarrow{h} = f\left(\overrightarrow{w}\left[\overrightarrow{h}_{pre}, x\right] + \overrightarrow{b}\right), \quad \overleftarrow{h} = f\left(\overleftarrow{w}\left[\overleftarrow{h}_{next}, x\right] + \overleftarrow{b}\right), \quad \hat{y} = F\left(w_y\left[\overrightarrow{h}, \overleftarrow{h}\right] + b_y\right) \tag{5}$$
where $f(\cdot)$ denotes the activation function that incorporates the operations of Equations (1)–(4), $F(\cdot)$ denotes the activation function that performs weighted splicing on the outputs $\overrightarrow{h}$ and $\overleftarrow{h}$ of the forward and reverse LSTM networks, $w_y, b_y$ are the weight matrix and bias vector for calculating the output $\hat{y}$, and $\overrightarrow{w}, \overrightarrow{b}$ and $\overleftarrow{w}, \overleftarrow{b}$ are the weight matrices and bias vectors for generating $\overrightarrow{h}$ and $\overleftarrow{h}$, respectively. $\overrightarrow{h}_{pre}$ is the output of the previous memory neuron in the forward LSTM network, which is set to 0 if the present memory neuron is the first, and $\overleftarrow{h}_{next}$ is the output of the following memory neuron in the reverse LSTM network, which is set to 0 if the present memory neuron is the last.

3.2.2. One-Dimensional Convolutional Neural Network

The 1D-CNN extracts local characteristics of data sequences through windows of a certain size, recovering spatially meaningful features from the load data. In this research, two 1D-CNNs are cascaded and the SAM is introduced between them to improve the 1D-CNN's extraction capacity. The result $\hat{x}_i$ of the first 1D-CNN on the load data $x_i$ is:
$$\hat{x}_i = \sigma\left(w_i \otimes x_i + b_i\right) \tag{6}$$
where $\otimes$ represents the convolution operation, $\sigma(\cdot)$ denotes the activation function, and $w_i, b_i$ are the weight matrix and bias vector of $\hat{x}_i$, respectively.

3.2.3. Self-Attention Mechanism

The AM enlarges the convolutional kernel's receptive field, while the SAM is a variant [34] of the AM that lowers reliance on external input and is better at capturing the internal relevance of data or features. The SAM is employed in this paper to improve the capture of target feature dependencies, and its implementation framework is presented in Figure 4.
The primary premise of the SAM is to identify the most significant features and enhance them by calculating scores at various positions in the data stream. Three one-dimensional convolutions are employed in this study to extract load data features, labeled $a(x)$, $b(x)$, and $c(x)$:
$$a(\hat{x}_i) = w_a \otimes \hat{x}_i, \quad b(\hat{x}_i) = w_b \otimes \hat{x}_i, \quad c(\hat{x}_i) = w_c \otimes \hat{x}_i \tag{7}$$
where $w_a, w_b, w_c$ are the weight vectors of the three one-dimensional convolutions.
Using $\hat{x}_i^m$ and $\hat{x}_i^n$ to denote two data points in the output sequence $\hat{x}_i$ of the first 1D-CNN, the softmax weights $\alpha_i^{mn}$, $m, n = 1, \ldots, N$, between $a(x)$ and $b(x)$ are calculated as:
$$\alpha_i^{mn} = \mathrm{softmax}\left(\left(a(\hat{x}_i^m)\right)^{T} \times b(\hat{x}_i^n)\right) \tag{8}$$
Then $c(x)$ and $\alpha_i^{mn}$, $m, n = 1, \ldots, N$, are used to calculate the self-attention score for data point $\hat{x}_i^m$:
$$\beta_i^m = \sum_{n=1}^{N} \alpha_i^{mn} \, c(\hat{x}_i^n) \tag{9}$$
The first 1D-CNN's output sequence $\hat{x}_i$ is combined with the weighted residual of the SAM scores, and the result is utilized as the input of the second 1D-CNN, i.e.:
$$\tilde{x}_i = \left[\hat{x}_i^{T} + \gamma \beta_i^{T}\right]^{T} \tag{10}$$
where $\gamma$ is a trainable weight coefficient and $\beta_i = [\beta_i^1, \ldots, \beta_i^N]$ is the vector of SAM scores over the data points. The output $\hat{y}_i$ of $\tilde{x}_i$ after the second 1D-CNN is:
$$\hat{y}_i = \sigma\left(w_i \otimes \tilde{x}_i + b_i\right) \tag{11}$$
The output information $\tilde{y} = [\tilde{y}_1, \ldots, \tilde{y}_N]$ is acquired by sequentially applying the same operations to the load data matrix $X = [x_1, \ldots, x_C]^{T}$ split by the specified period, and the feature $y$ is obtained by stitching $\hat{y}$ and $\tilde{y}$ together, realizing multi-feature fusion:
$$y = [\hat{y}, \tilde{y}] \tag{12}$$
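To illustrate how the MFF framework of Sections 3.2.1–3.2.3 can be assembled, the following is a minimal TensorFlow/Keras sketch of Equations (5)–(12), under assumed layer sizes taken from Section 4.1.3 (Bi-LSTM output 32; Conv1D filters 16 and 32 with kernel size 3). It is a hedged reading of the architecture, not the authors' released implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class SelfAttention1D(layers.Layer):
    """Self-attention inserted between the two 1D-CNNs, Equations (7)-(10)."""

    def __init__(self, channels):
        super().__init__()
        self.a = layers.Conv1D(channels, 1)  # a(x), Eq. (7)
        self.b = layers.Conv1D(channels, 1)  # b(x), Eq. (7)
        self.c = layers.Conv1D(channels, 1)  # c(x), Eq. (7)
        self.gamma = self.add_weight(name="gamma", shape=(), initializer="zeros")

    def call(self, x):
        # alpha = softmax(a(x)^T b(x)) over all position pairs, Eq. (8)
        attn = tf.nn.softmax(tf.matmul(self.a(x), self.b(x), transpose_b=True))
        beta = tf.matmul(attn, self.c(x))    # SAM scores, Eq. (9)
        return x + self.gamma * beta         # residual fusion with gamma, Eq. (10)

def build_mff(n_steps=24, n_periods=12):
    inp = layers.Input(shape=(n_steps, n_periods))
    # Temporal branch: Bi-LSTM over the load sequence, Eq. (5).
    t_feat = layers.Bidirectional(layers.LSTM(32))(inp)
    # Spatial branch: two cascaded 1D-CNNs with the SAM in between, Eqs. (6)-(11).
    s = layers.Conv1D(16, 3, padding="same", activation="relu")(inp)
    s = SelfAttention1D(16)(s)
    s = layers.Conv1D(32, 3, padding="same", activation="relu")(s)
    s_feat = layers.Flatten()(s)
    # Feature fusion y = [y_hat, y_tilde], Eq. (12); the GCN enhancement and
    # final fully connected layer (Section 3.3) would follow this output.
    return tf.keras.Model(inp, layers.Concatenate()([t_feat, s_feat]))
```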

3.3. Graph Convolutional Network

The GCN is a feature extractor that works similarly to a CNN, except that a CNN extracts features from tensor data while a GCN extracts features from graph-structured data. In this paper, we use a GCN to extract external non-numerical features such as weather, wind force, and wind direction, and reorganize the extracted features with global information. The graph is made up of $T$ vertices and the edges linking them, denoted $G = (V, A)$, where $V$ indicates the set of vertices and $A$ denotes the adjacency matrix of dimension $T \times T$; each vertex has a feature of dimension $D$, and all vertices together constitute a feature matrix $X$ of dimension $T \times D$.
The GCN is essentially concerned with learning kernel weights that apply to the graph. As a deep learning model, the GCN propagates across layers in the following manner:
$$H^{(l+1)} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{13}$$
where $\hat{A} = A + I$ and $I$ is the identity matrix, which is equivalent to adding a self-loop to each vertex; $\hat{D}$ is the diagonal degree matrix characterizing the connectivity of $G$; $H^{(l)}$ is the feature matrix of layer $l$; $W^{(l)}$ represents the trainable weights of layer $l$; and $\sigma(\cdot)$ represents the activation function.
External factors such as weather, wind force, and wind direction are non-numerical data that are correlated in load forecasting, and their correlation matrix may be created using conditional probabilities [35]. Assume that one external feature label appears $C_i$ times in the training dataset, another appears $C_j$ times, and the two appear together $C_{ij}$ times, with $i, j = 1, \ldots, T$, where $T$ indicates the number of external feature labels in the dataset. As a result, the count matrix $C \in \mathbb{R}^{T \times T}$ can be obtained, and from it the probability matrix of the external characteristics, i.e., the adjacency matrix $A \in \mathbb{R}^{T \times T}$, may be computed, with elements $a_{ij}$ given by:
$$a_{ij} = \frac{C_{ij}}{C_i}, \quad i, j = 1, \ldots, T \tag{14}$$
where $a_{ij}$ represents the likelihood that label $j$ occurs when label $i$ occurs. The correlation between external features is extracted using a multi-layer GCN: the external features are first quantized and encoded, then mapped into high-dimensional word embedding vectors by the word embedding method. Finally, the word embedding vectors and the adjacency matrix are fed into the GCN for feature extraction to obtain the output $G_o$. As seen below, $G_o$ is multiplied by $y$ to get the feature vector enhanced by the GCN:
$$S = G_o \times y \tag{15}$$
A fully connected layer then reduces $S$ to the predictive dimension.
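The NumPy sketch below illustrates Equations (13) and (14) under stated assumptions: the co-occurrence counts are synthetic, the diagonal of the count matrix is taken to hold $C_i$, and the layer widths (32 and 64) follow Section 4.1.3. Equation (15) is indicated only in a comment, since the fusion shape depends on the full model.

```python
import numpy as np

def conditional_adjacency(counts):
    """Adjacency from conditional probabilities, Equation (14): a_ij = C_ij / C_i.
    Assumes counts[i, j] holds the co-occurrence count C_ij and counts[i, i]
    holds the occurrence count C_i of label i."""
    C_i = np.diag(counts).reshape(-1, 1)
    return counts / np.maximum(C_i, 1.0)

def gcn_layer(A, H, W):
    """One propagation step, Equation (13): ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Illustrative dimensions from Section 4.1.2: T = 142 labels, D = 300 embedding size.
T, D = 142, 300
rng = np.random.default_rng(0)
counts = rng.integers(1, 50, size=(T, T)).astype(float)   # synthetic label counts
A = conditional_adjacency(counts)
H0 = rng.normal(size=(T, D))                               # word-embedding matrix X
W1, W2 = 0.1 * rng.normal(size=(D, 32)), 0.1 * rng.normal(size=(32, 64))
G_o = gcn_layer(A, gcn_layer(A, H0, W1), W2)               # two GCN layers, Eq. (13)
# Equation (15): S = G_o x y then fuses G_o with the MFF feature vector y.
```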

3.4. Bayesian Optimization

The deep learning hyperparameter tuning problem is a black-box optimization problem: each set of values in the hyperparameter set $Z = \{z_1, \ldots, z_n\}$ is assessed, and the evaluation result may be stated as $f(z)$. The best hyperparameter $z^*$ discovered throughout the optimization process is:
$$z^* = \arg\min_{z \in Z} f(z) \tag{16}$$
BO [36] discovers parameter values that minimize the objective function by estimating a probabilistic proxy model (surrogate function) based on previous assessments of the objective function. In contrast to random and grid searches, BO considers prior evaluations when deciding on the next hyperparameter value. BO is utilized in this research to tune the hyperparameters and obtain the values that are best under the probabilistic proxy model; compared with the grid search method, BO requires fewer iterations. The steps of the BO method are as follows.
Step 1: Determine the objective function $f(z)$ and the domain of the hyperparameter $z$. The independent variable $z$ in the optimization is the hyperparameter vector, and the objective function is often a loss function. In practice, the form of the loss function is known, but the law it follows over the hyperparameter space is not.
Step 2: Determine the observed values. $n$ parameter values $Z = \{z_1, \ldots, z_n\}$ are sampled from the hyperparameters' designated domain, and the objective function value $\{f(z_i) \mid i = 1, \ldots, n\}$ corresponding to each parameter value is determined.
Step 3: Develop a probabilistic proxy model. The objective function's distribution model is selected based on the data, often using a Gaussian process or a Gaussian mixture process.
Step 4: Choose an acquisition function. The acquisition function is used to choose the next hyperparameter value and to assess the influence of the chosen value on the fitted results, often using the probability of improvement, expected improvement, upper confidence bound, or entropy, etc.
Repeat Steps 2–4 until the proxy model's objective value meets a predefined condition or the computing resources are exhausted (e.g., the maximum number of observations or the maximum allowed running time).
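As an illustration of this loop, the sketch below uses scikit-optimize's gp_minimize (one common BO implementation; the paper does not name its library) over the search scopes of Table 3. The helpers build_and_train and evaluate_mae are hypothetical stand-ins for the training pipeline.

```python
from skopt import gp_minimize
from skopt.space import Categorical, Integer

# Search scopes taken from Table 3.
space = [
    Categorical([0.1, 0.2, 0.3], name="drop_out_rate"),
    Categorical([0.1, 0.2, 0.3], name="learning_rate"),
    Integer(32, 128, name="bilstm_hidden_size"),
    Categorical([16, 32, 64], name="cnn1_filters"),
    Categorical([16, 32, 64], name="cnn2_filters"),
    Categorical([2, 3, 4], name="cnn1_kernel_size"),
    Categorical([2, 3, 4], name="cnn2_kernel_size"),
    Categorical([50, 100, 200], name="epochs"),
    Categorical([15000, 20000, 25000, 30000], name="batch_size"),
]

def objective(params):
    """Steps 1-2: train with one hyperparameter vector and return the
    test-set MAE, i.e., the objective f(z) of Equation (16)."""
    model = build_and_train(params)   # hypothetical training helper
    return evaluate_mae(model)        # hypothetical evaluation helper

# Steps 3-4, repeated 20 times: gp_minimize fits a Gaussian-process proxy
# model and picks each next point with an acquisition function (EI here).
result = gp_minimize(objective, space, n_calls=20, acq_func="EI", random_state=0)
print(result.x, result.fun)           # best hyperparameters and smallest MAE
```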

4. Experiments and Evaluation

4.1. Training Planning

4.1.1. Information Sources

The data is derived from the grid bus load dataset of the Shandong Data Open Network, China, and includes monitoring data for various buses at 5-min sampling intervals from 1 January 2019 to 2 March 2020, with around 20,000 data points per bus. Furthermore, the weather data for distinct area IDs comprise a total of 142 categories of weather (e.g., sunny/cloudy), 78 categories of wind force and wind direction (e.g., north wind force 1–2/north wind force 3–4), and 83 records of substation and bus information. The number of buses used in these trials is 10, giving a total data volume of roughly 200,000 points. Figure 5 depicts the non-numerical feature data such as wind force, wind direction, and weather.

4.1.2. Pre-Processing of Data

The pre-processing of the load data and the weather, wind force, and wind direction parameters consists primarily of data cleaning and data transformation. Because the busbar load monitoring data are acquired every 5 min and do not fluctuate significantly, anomalous or missing load values are replaced by the average of the 10 min before and after (i.e., four monitoring points). The different categories of weather, wind force, and wind direction were first enumerated, and these attributes were then transformed into numerical characteristics based on the categories. The dimension with the most weather, wind force, and wind direction categories, 142, is selected for the adjacency and feature matrices, so the adjacency matrix dimension is $T \times T = 142 \times 142$. The label vector is mapped to a high-dimensional space using word embedding to create the feature matrix with dimension $T \times D = 142 \times 300$.
The bus monitoring data sampled every 5 min over 24 h (288 points in total) are used to estimate the load every 5 min for the following hour (12 points in total); therefore the input load data matrix has dimension $N \times C = 24 \times 12$, and the output dimension is 12.
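A sketch of this windowing follows, assuming a single continuous 5-min series per bus; the reshape into the $24 \times 12$ input is the only step the paper specifies, and the synthetic series exists purely for illustration.

```python
import numpy as np

def make_windows(load, n_hours=24, per_hour=12):
    """Slice a 5-min load series into (24 x 12) inputs and 12-step targets,
    matching Section 4.1.2 (288 points predict the following hour)."""
    in_len, out_len = n_hours * per_hour, per_hour
    X, y = [], []
    for start in range(len(load) - in_len - out_len + 1):
        window = load[start:start + in_len]
        X.append(window.reshape(n_hours, per_hour))   # N x C = 24 x 12
        y.append(load[start + in_len:start + in_len + out_len])
    return np.array(X), np.array(y)

# Example with a synthetic series standing in for one bus.
series = np.sin(np.linspace(0, 60, 3000)) + 0.1 * np.random.default_rng(0).normal(size=3000)
X, y = make_windows(series)
print(X.shape, y.shape)   # (2701, 24, 12) (2701, 12)
```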

4.1.3. Establishment of a Training Environment

The deep learning training environment, computational power, and software versions are linked, so all models in the following trials were trained in the same software environment to control for these factors. The operating system was Windows 10, all models were developed in TensorFlow 2.5.0, the CUDA version used for GPU training was 11.3 with cuDNN 8.2, and the GPU was an NVIDIA RTX 3090.
The hyperparameters were set up as follows during the model training phase: the total number of training epochs was set to 100 and the Adam optimizer was employed. The learning rate was initially set at 0.1 and multiplied by 0.8 every 10 epochs. The Bi-LSTM's output dimension was 32. The CNN had two layers with output dimensions of 16 and 32, the convolutional kernel size was 3, the padding was set to SAME, the GCN had two layers with output feature dimensions of 32 and 64, and the batch size was 28,000.
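This configuration can be expressed as the following Keras sketch, where `model`, `X_train`, `y_train`, `X_test`, and `y_test` are hypothetical stand-ins for the assembled network and the pre-processed data; the decay rule is our reading of the schedule stated above.

```python
import tensorflow as tf

# Learning-rate schedule from Section 4.1.3: start at 0.1 and multiply by 0.8
# every 10 epochs.
def schedule(epoch, lr):
    return lr * 0.8 if epoch > 0 and epoch % 10 == 0 else lr

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1), loss="mse")
model.fit(X_train, y_train,
          epochs=100, batch_size=28000,
          validation_data=(X_test, y_test),
          callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)])
```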
The first 70% of the monitoring data from each bus (a total of 140,000 data points) was utilized as the training data set, and the final output was the predicted outcome for the 60,000 monitoring data points reserved for testing. Five identical experiments were averaged to improve reliability. As indicated in Figure 6, the loss function was the Mean Square Error (MSE), defined as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \tag{17}$$
where y i is the actual load data, y ^ i is the anticipated load data, and N is the sample size.
As shown in the graph, as the number of iterations increases, the loss values of the training and test data sets continue to decrease until they are near zero, showing that the training converges correctly and excellent results may be obtained.

4.1.4. Indicators of Evaluation

The Symmetric Mean Absolute Percentage Error (SMAPE), Root Mean Square Error (RMSE), Mean Bias Error (MBE), and coefficient of determination (R-squared) were used as assessment indicators, with SMAPE defined as follows:
$$\mathrm{SMAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2} \tag{18}$$
where y i is the actual load data, y ^ i is the anticipated load data, and N is the sample size.
The RMSE is a measure of the degree of variance in data and it is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2} \tag{19}$$
The MBE does not use absolute values so that positive and negative errors can cancel each other out. It determines whether there is a positive or negative bias in the model. MBE is defined as follows:
$$\mathrm{MBE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right) \tag{20}$$
The R-squared is used to evaluate the goodness of fit: the numerator is the error between the predicted and true data and the denominator is the dispersion of the original data, so dividing the two eliminates the effect of the original data's dispersion. It is defined as:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2} \tag{21}$$
where $\bar{y}$ is the mean of the actual load data.
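For completeness, a compact NumPy sketch of Equations (18)–(21), useful for verifying reported values:

```python
import numpy as np

def smape(y, y_hat):
    """Equation (18), in percent."""
    return 100.0 * np.mean(np.abs(y - y_hat) / ((np.abs(y) + np.abs(y_hat)) / 2))

def rmse(y, y_hat):
    """Equation (19)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mbe(y, y_hat):
    """Equation (20): positive and negative errors may cancel each other out."""
    return np.mean(y - y_hat)

def r_squared(y, y_hat):
    """Equation (21)."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
```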

4.2. Test Results Comparison

Machine learning methods (Linear Regression (LR), K-Nearest Neighbor (KNN), and Decision Tree (DT)) and deep learning models (Bi-LSTM, CNN, and CNN-Bi-LSTM (S2S)) were used as comparison models to assess the prediction performance of the various prediction models. CNN-Bi-LSTM (S2S) is distinct from the MFF-SAM-GCN model suggested in this research in that it employs the CNN first for initial feature extraction and then the Bi-LSTM for subsequent feature extraction. To maintain consistency with the previous settings, the optimizer was Adam, the learning rate was 0.1, and the maximum number of iterations was 100. Table 1 displays the prediction results of each model. The RMSE, SMAPE, and MBE of the proposed model are lower and the R-squared is greater, showing that the model's prediction accuracy is better than that of the comparison models.
After 100 iterations, the final predicted load data for every 5 min over 1 h are shown in Figure 7.

4.3. Experiments with Ablation

This section performs independent ablation tests for the SAM and GCN in order to assess the influence of these two modules on the experiments.
SAM module effect: remove the SAM between the two 1D-CNNs to observe its effect; the rest of the settings remain the same.
GCN module effect: remove the GCN, disregarding the effect of weather and the external characteristics of wind force and wind direction on the electrical load.
Table 2 displays the outcomes of the ablation trials. The findings reveal that eliminating the SAM and GCN from the MFF-SAM-GCN prediction model increases the RMSE, SMAPE, and MBE and reduces the R-squared, making the model less effective and demonstrating that the SAM and GCN enhance the model's performance.

4.4. Extensive Research

4.4.1. The Effect of Hyperparameter Values

BO is used to discover the ideal hyperparameters in order to investigate the influence of hyperparameter settings on the model's prediction performance. The objective function is the model's loss on the test dataset under a given set of hyperparameters; in this research it is the Mean Absolute Error (MAE), defined as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|y_i - \hat{y}_i\right| \tag{22}$$
where y i is the actual load data, y ^ i is the anticipated load data, and N is the sample size.
The domain space is the range of values chosen for each hyperparameter; the candidate parameters in the domain space are picked based on empirical data, as shown in Table 3.
The domain space was searched 20 times, with the smallest MAE after each search shown in Figure 8. According to the graph, the smallest MAE, reached after the 18th search, is 0.01928 with parameters [0.2, 0.1, 32, 16, 32, 2, 3, 200, 25,000].

4.4.2. The Impact of the Bus

Ten buses were used in the experiment, and three sub-experiments were carried out. In the first experiment, an additional input was added: the buses were embedded as words, the bus features were extracted using LSTM networks and fused with the MFF-SAM-GCN features, and the fused features were then enhanced using the GCN. In the second experiment, the MFF-SAM-GCN model was trained on all of the experimental data without taking into account the influence of individual buses on the prediction results. In the third experiment, separate models were trained for the different buses, and the final predicted value is the average of all model predictions.
Table 4 displays the outcomes of the tests. As shown in the table, the RMSE, SMAPE, MBE, and R-squared obtained from the three experiments do not differ significantly. The highest accuracy is obtained by training ten separate models, which avoids the systematic errors generated during joint training, but the time consumed increases as well.

5. Conclusions and Future Work

We propose an SLF model combining MFF, SAM, and GCN, employing a Bi-LSTM network and a SAM-based 1D-CNN to extract the spatiotemporal correlation features of load data and conduct feature fusion. We use the correlations between weather, wind force, and wind direction to construct a GCN model, which not only captures textual information but also effectively enhances the concentration of load features and improves the accuracy of predictions. The proposed MFF-SAM-GCN forecasting model has been tested on the bus load dataset of the Shandong Open Data Network, China. The simulation compares our proposed model with six benchmark models, i.e., three machine learning models (LR, KNN, and DT) and three deep learning models (Bi-LSTM, CNN, and CNN-Bi-LSTM (S2S)). The simulation results indicate that the MFF-SAM-GCN prediction model achieves a SMAPE of 9.453%, an RMSE of 0.028, an MBE of 0.025, and an R-squared of 0.989, with overall better prediction performance than the benchmark models. In addition, ablation tests were carried out to investigate the function of the SAM and GCN. Finally, extensive tests have been carried out to investigate and assess the impacts of hyperparameter settings and buses on prediction performance.
Finally, the MFF-SAM-GCN model described in this article is shown to accept a variety of data inputs. MFF can efficiently extract time-related and spatially related features from load data, expanding the feature dimension of load prediction, while GCN can effectively extract textual features and capture external characteristics that influence load prediction. In the future, different datasets will be sought to verify the generality of the model. New optimization algorithms, or the integration of multiple optimization algorithms, will be investigated to optimize the forecasting model, while more external factors that may have an impact on load forecasting, such as day types and economic indicators [37], will be investigated to further improve forecasting accuracy and develop a practical product.

Author Contributions

Conceptualization, Y.Z.; Data curation, W.F.; Methodology, Y.Z.; Validation, J.Z.; Writing—original draft, Y.Z.; Funding acquisition, W.F.; Investigation, J.Z.; Project administration, J.L.; Resources, W.F.; Supervision, W.F.; Visualization, J.Z.; Writing—review & editing, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work of Weiheng Jiang and Wenjiang Feng was supported by the National Natural Science Foundation of China under Grant 62001067 and Chongqing Basic Science and Frontier Technology Research Project under Grant cstc2017jcyjBX0047.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AM: Attention Mechanism
Bi-LSTM: Bi-directional Long Short-Term Memory
BO: Bayesian Optimization
CNN: Convolutional Neural Network
1D-CNN: One-Dimensional Convolutional Neural Network
DT: Decision Tree
DSR: Demand Side Response
EEMD: Ensemble Empirical Mode Decomposition
GCN: Graph Convolutional Network
GRU: Gated Recurrent Unit
KFCM: Kernel Fuzzy C-Means
KNN: K-Nearest Neighbor
LLF: Long-term Load Forecasting
LR: Linear Regression
LSTM: Long Short Term Memory
MBE: Mean Bias Error
MFF: Multi-Feature Fusion
MLF: Medium-term Load Forecasting
MSE: Mean Square Error
PSO: Particle Swarm Optimization
RMSE: Root Mean Square Error
RNN: Recurrent Neural Network
SAM: Self-Attention Mechanism
SLF: Short-term Load Forecasting
SMAPE: Symmetric Mean Absolute Percentage Error
TCN: Time Convolution Network

References

  1. Fang, X.; Misra, S.; Xue, G.; Yang, D. Smart Grid—The New and Improved Power Grid: A Survey. IEEE Commun. Surv. Tutor. 2012, 14, 950–956. [Google Scholar] [CrossRef]
  2. Ekonomou, L.; Christodoulou, C.A.; Mladenov, V. A short-term load forecasting method using artificial neural networks and wavelet analysis. Int. J. Power Syst. 2016, 1, 64–68. [Google Scholar]
  3. Javed, F.; Arshad, N.; Wallin, F.; Vassileva, I.; Dahlquist, E. Forecasting for demand response in smart grids: An analysis on use of anthropologic and structural data and short term multiple loads forecasting. Appl. Energy 2012, 96, 150–160. [Google Scholar] [CrossRef]
  4. Xia, C.; Wang, J.; Mcmenemy, K. Short, medium and long term load forecasting model and virtual load forecaster based on radial basis function neural networks. Int. J. Electr. Power Energy Syst. 2010, 32, 743–750. [Google Scholar] [CrossRef] [Green Version]
  5. Wei, L.; Zhang, Z.-G. Based on Time Sequence of ARIMA Model in the Application of Short-Term Electricity Load Forecasting. In Proceedings of the International Conference on Research Challenges in Computer Science IEEE, Shanghai, China, 28–29 December 2009. [Google Scholar]
  6. Song, K.-B.; Baek, Y.-S.; Hong, D.H.; Jang, G. Short-term load forecasting for the holidays using fuzzy linear regression method. IEEE Trans. Power Syst. 2005, 20, 96–101. [Google Scholar] [CrossRef]
  7. Al-Hamadi, H.M.; Soliman, S.A. Short-term electric load forecasting based on Kalman filtering algorithm with moving window weather and load model. Electr. Power Syst. Res. 2004, 68, 47–59. [Google Scholar] [CrossRef]
  8. Jain, A.; Satish, B. Clustering based Short Term Load Forecasting using Support Vector Machines. In Proceedings of the 2009 IEEE Bucharest Powertech, Bucharest, Romania, 28 June–2 July 2009. [Google Scholar]
  9. Hambali, A.O.J.; Akinyemi, M.; Jyusuf, N. Electric power load forecast using decision tree algorithms. Comput. Inf. Syst. Dev. Inform. Allied Res. J. 2016, 7, 29–42. [Google Scholar]
  10. Dudek, G. Short-term load forecasting using random forests. In Intelligent Systems; Springer: Cham, Switzerland, 2014; pp. 821–828. [Google Scholar]
  11. Ceperic, E.; Ceperic, V.; Baric, A. A Strategy for Short-Term Load Forecasting by Support Vector Regres-sion Machines. IEEE Trans. Power Syst. 2013, 28, 4356–4364. [Google Scholar] [CrossRef]
  12. Çevik, H.H.; Çunkaş, M. Short-term load forecasting using fuzzy logic and ANFIS. Neural Comput. Appl. 2015, 26, 1355–1367. [Google Scholar] [CrossRef]
  13. Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372. [Google Scholar] [CrossRef]
  14. Shi, H.; Xu, M.; Li, R. Deep Learning for Household Load Forecasting—A Novel Pooling Deep RNN. IEEE Trans. Smart Grid 2017, 9, 5271–5280. [Google Scholar] [CrossRef]
  15. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 2017, 10, 841–851. [Google Scholar] [CrossRef]
  16. Dong, X.; Qian, L.; Huang, L. Short-term load forecasting in smart grid: A combined CNN and K-means clustering ap-proach. In Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Korea, 13–16 February 2017; pp. 119–125. [Google Scholar]
  17. Lang, C.; Steinborn, F.; Steffens, O.; Lang, E.W. Electricity Load Forecasting—An Evaluation of Simple 1D-CNN Network Structures. arXiv 2019, arXiv:1911.11536. [Google Scholar]
  18. Zhou, F.; Hu, P.; Yang, S.; Wen, C. A multimodal feature fusion-based deep learning method for online fault diagnosis of ro-tating machinery. Sensors 2018, 18, 3521. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. arXiv 2018, arXiv:1809.10185. [Google Scholar]
  20. Chen, B.J.; Chang, M.W.; Lin, C.J. Load Forecasting Using Support Vector Machines: A Study on EUNITE Compe-tition 2001. IEEE Trans. Power Syst. 2004, 19, 1821–1830. [Google Scholar] [CrossRef] [Green Version]
  21. Rafi, S.H.; Deeba, S.R.; Hossain, E. A Short-Term Load Forecasting Method Using Integrated CNN and LSTM Network. IEEE Access 2021, 99, 32436–32448. [Google Scholar] [CrossRef]
  22. Goh, H.H.; He, B.; Liu, H.; Zhang, D.; Dai, W.; Kurniawan, T.A.; Goh, K.C. Multi-Convolution Feature Extraction and Recurrent Neural Network Dependent Model for Short-Term Load Forecasting. IEEE Access 2021, 9, 118528–118540. [Google Scholar] [CrossRef]
  23. Farsi, B.; Amayri, M.; Bouguila, N.; Eicker, U. On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach. IEEE Access 2021, 9, 31191–31212. [Google Scholar] [CrossRef]
  24. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Liu, Y. Short-Term Load Forecasting for Industrial Customers Based on TCN-LightGBM. IEEE Trans. Power Syst. 2021, 36, 1984–1997. [Google Scholar] [CrossRef]
  25. Fahim, A.; Tan, Q.; Mazzi, M.; Sahabuddin, M.; Naz, B.; Ullah, B.S. Hybrid LSTM Self-Attention Mechanism Model for Forecasting the Reform of Scientific Research in Morocco. Comput. Intell. Neurosci. 2021, 2021, 6689204. [Google Scholar] [CrossRef] [PubMed]
  26. Miao, K.; Hua, Q.; Shi, H. Short-Term load forecasting based on CNN-BiLSTM with Bayesian optimization and attention mechanism. In Proceedings of the International Conference on Parallel and Distributed Computing: Applications and Technologies, Guangzhou, China, 17–19 December 2020; pp. 116–128. [Google Scholar]
  27. Azam, M.F.; Younis, S. Multi-Horizon Electricity Load and Price Forecasting using an Interpretable Multi-Head Self-Attention and EEMD-Based Framework. IEEE Access 2021, 9, 85918–85932. [Google Scholar] [CrossRef]
  28. Zang, H.; Xu, R.; Cheng, L.; Ding, T.; Liu, L.; Wei, Z.; Sun, G. Residential load forecasting based on LSTM fusing self-attention mechanism with pooling. Energy 2021, 229, 120682. [Google Scholar] [CrossRef]
  29. Shang, C.; Gao, J.; Liu, H.; Liu, F. Short-Term Load Forecasting Based on PSO-KFCM Daily Load Curve Clustering and CNN-LSTM Model. IEEE Access 2021, 9, 50344–50357. [Google Scholar] [CrossRef]
  30. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2019, 99, 1–11. [Google Scholar]
  31. Wang, R.; Huang, C.; Wang, X. Global relation reasoning graph convolutional networks for human pose estimation. IEEE Access 2020, 8, 38472–38480. [Google Scholar] [CrossRef]
  32. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef] [Green Version]
  33. Wang, S.; Wang, X.; Wang, S.; Wang, D. Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting. Int. J. Electr. Power Energy Syst. 2019, 109, 470–479. [Google Scholar] [CrossRef]
  34. Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155. [Google Scholar]
  35. Chen, Z.M.; Wei, X.S.; Wang, P.; Guo, Y. Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  36. Snoek, J.; Larochelle, H.; Adams, R.P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 2012, 25, 2960–2968. [Google Scholar]
  37. Liang, Y.; Niu, D.; Hong, W.C. Short term load forecasting based on feature extraction and improved general regression neural network model. Energy 2019, 166, 653–663. [Google Scholar]
Figure 1. Frame structure diagram.
Figure 2. Memory neuron structure of the LSTM network.
Figure 3. Bi-LSTM network model.
Figure 4. SAM framework.
Figure 5. Pie chart of weather and wind types. (a) Proportion of different weather. (b) Proportion of different wind types.
Figure 6. Loss for training and testing.
Figure 7. Prediction curve of 12 time periods per hour.
Figure 8. Min (MAE) after n calls.
Table 1. Prediction performance of different models.

Method                  RMSE    SMAPE      MBE     R²      Runtime (s)
LR                      0.037   12.561%    0.036   0.952   189.431
KNN                     0.042   15.243%    0.040   0.947   159.563
DT                      0.045   16.655%    0.042   0.939   209.767
Bi-LSTM                 0.034   11.046%    0.033   0.964   317.269
CNN                     0.035   11.784%    0.033   0.957   306.436
CNN-Bi-LSTM (S2S)       0.032   10.542%    0.031   0.974   324.584
MFF-SAM-GCN (proposed)  0.028   9.453%     0.025   0.989   337.063
Table 2. Outcomes of the ablation trials.

Basic            Attention  GCN  RMSE    SMAPE      MBE     R²      Runtime (s)
CNN-LSTM (MFF)   -          -    0.037   11.485%    0.035   0.965   316.453
CNN-LSTM (MFF)   Yes        -    0.035   11.236%    0.032   0.972   322.536
CNN-LSTM (MFF)   -          Yes  0.032   10.256%    0.031   0.979   326.461
CNN-LSTM (MFF)   Yes        Yes  0.028   9.453%     0.025   0.989   337.063
Table 3. Alternative parameters of the domain space.

Hyperparameter            Search Scope
drop_out_rate             (0.1, 0.2, 0.3)
learning_rate             (0.1, 0.2, 0.3)
Bi-LSTM_hidden size       (32–128)
CNN1_number of filters    (16, 32, 64)
CNN2_number of filters    (16, 32, 64)
CNN1_kernel size          (2, 3, 4)
CNN2_kernel size          (2, 3, 4)
epochs                    (50, 100, 200)
batch size                (15,000, 20,000, 25,000, 30,000)
Table 4. Model performance comparison under different strategies.

Method     RMSE    SMAPE     MBE     R²      Runtime (s)
Method 1   0.028   9.286%    0.025   0.990   353.985
Method 2   0.028   9.453%    0.025   0.989   337.063
Method 3   0.026   9.232%    0.023   0.993   2980.231
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Citation: Zou, Y.; Feng, W.; Zhang, J.; Li, J. Forecasting of Short-Term Load Using the MFF-SAM-GCN Model. Energies 2022, 15, 3140. https://doi.org/10.3390/en15093140
