1. Introduction
Power grid systems supply dynamic and complex power loads to millions of users. Efficient and reliable power grid systems are therefore essential for maintaining power stability, avoiding system outages and meeting user load demands without interruption [2]. An adequate power utilization scheme for power grid systems requires accurate short-term load forecasting (STLF) [5]. A forecasting error of one percent can cause operation losses of 10 million or more [6]. Since 40% of electrical power is supplied to buildings through the power grid system, accurate STLF benefits all stakeholders of the energy market and results in substantial savings for users [7]. Accurate STLF also yields significant economic savings and supports power grid reliability and security [8]. Accurate forecasting is essential for the system controller to maintain grid system stability.
To perform STLF, physics-based models consisting of system equations can be used. Such models explicitly describe the system dynamics. However, developing them requires extensive knowledge of the internal components of the systems or buildings that affect power consumption. Data-driven models, in contrast, are developed from measured data and do not require such knowledge. Thanks to advanced smart sensor technologies, smart meters can capture the loads consumed by users in real time. Smart sensors can also capture weather information such as temperature, wind speed/direction and sea level pressure, which are correlated with load consumption. These time-series data are captured in real time in order to perform accurate and reliable STLF [15]. Among data-driven models, deep neural networks (DNNs) have commonly been used, since DNNs consist of multiple layers of neurons which are effective for modelling nonlinear and chaotic load consumption data [16]. Recent DNN techniques for STLF can be classified into two categories: (i) single time-varying feature, where the DNN uses past load consumption to predict future load consumption, and (ii) multi-time-varying features, where the DNN uses past dynamic information, such as past weather conditions, past meteorological information, and past seasonal and calendar information, in addition to past load consumption to predict future load consumption.
For the single time-varying feature, the DNNs forecast future load consumption using past load consumption. A long short-term memory (LSTM)-based DNN was developed using past load consumption sequences of appliances in order to forecast future load consumption [17]. Peng et al. [18] applied linear regression and LSTM to forecast future load consumption using past load consumption. Hafeez et al. [19] proposed a Boltzmann machine-based DNN to predict future load consumption using past load consumption. Aly et al. [20] proposed a clustering technique which used past load consumption in order to classify the future load consumption demands of users. Based on past power demands, various models have been developed to predict load consumption for various users. Rafati et al. [21] proposed a dense neural network to model the nonlinear and dynamic characteristics of past electrical load in order to predict future load consumption. Sekhar et al. [22] proposed a hybrid DNN by combining LSTM and a convolution neural network (CNN) to perform load prediction using past load information. Hybrid DNNs based on CNN, LSTM and decision tree have been proposed by Wan et al. [23] and Massaoudi et al. [24] to improve prediction accuracy. Tavassoli-Hojati et al. [25] proposed a self-partitioning local neuro-fuzzy model, where the model is trained by analysing both the linear and nonlinear characteristics of past load time-series features. Wei et al. [26] proposed a decomposition algorithm based on detrend singular spectrum fluctuation analysis to extract the trend and periodic components in past load data. An LSTM was trained with the extracted components. Yang et al. [27] proposed a decomposition approach to extract the time-series components of past load consumption. The decomposition approach captures useful past load consumption components to train DNNs.
For the multi-time-varying features, DNN models are developed by correlating future load consumption with past load consumption and past dynamic information such as past seasonal time information, past weather or meteorological conditions. Liang et al. [28] developed a hybrid DNN based on empirical mode decomposition and a regression neural network; the features used in the DNN included past temperature, past meteorology conditions and past load consumption. Ahmad et al. [29] proposed a novel DNN which included the features of past load information and past meteorology conditions. Kwon et al. [30] proposed a DNN where both past weather information and past load consumption were used as the DNN inputs. An adaptive neuro fuzzy inference system was proposed to predict the future load consumption of the Rajasthan region of India using past load consumption and past acute climatic conditions [31]. Zor et al. [32] proposed a DNN where the DNN inputs were based on past load consumption and past meteorological variables at a large hospital in the eastern Mediterranean. Eseye et al. [33] developed a hybrid machine-learning technique where the features included past weather, past load consumption, past seasonality and calendar information. Eseye et al. [34] proposed a novel feature selection based on a genetic algorithm to select significant features to improve load consumption forecasting accuracy. Hu et al. [35] proposed a back propagation-based neural network to predict the load consumption of the process industry where past load consumption, past production planning information and past humidity were used as DNN inputs. Yaprakdal et al. [36] proposed a feedforward neural network to predict the future load consumption, where the time-varying features included past load consumption, past temperature, past direct horizontal radiation and past diffuse horizontal radiation. Tziolis et al. [37] proposed a Bayesian neural network model where time-varying features such as past load consumption, past humidity, past dew point temperature, past horizontal irradiance and past wind speed were used as the network inputs.
The aforementioned DNN models only use the dynamics of single time-varying features or multi-time-varying features to forecast future load consumption. They use those dynamic features as the DNN inputs. They do not use the static information of time-invariant features such as year built, building space, number of occupants or building purpose. In fact, these time-invariant features are related to load consumption. When both time-invariant and time-varying features are used as the DNN inputs, more information is available for the DNNs to perform STLF; therefore, more accurate predictions are likely to be achieved. For example, building age relates to load consumption [38]. Newer buildings consume less energy since they are constructed with effective insulation material. Older buildings generally have poorer insulation than new buildings, so more electricity is consumed by heaters or air-conditioners. As another example, more electricity is used in a larger building space, while less electricity is consumed in a smaller building. Hence, building space correlates with load consumption [39]. Buildings with more users consume more energy, while buildings with fewer users consume less [41]. Occupant characteristics such as age, education, income and residency length are also correlated with load consumption [43]. Building purpose also relates to power consumption. Commercial or industrial buildings use more energy; residential buildings use less energy [44]. When more correlated features are included, more accurate predictions are likely to be achieved. Therefore, we can use time-invariant features to improve STLF since time-invariant features are also correlated with load consumption.
In this paper, a fuzzy clustering-based DNN is proposed which uses both time-varying and time-invariant features to perform STLF. Clusters are generated to classify users with respect to time-invariant features, where the fuzzy c-means (FCM) algorithm [45] is used since this algorithm is commonly used to cluster samples with time-invariant features [46]. Each cluster groups existing users with similar time-invariant features, which capture the static information. A separate DNN is developed for each cluster from the time-varying features of the existing users in that cluster, who have similar time-invariant features. The time-varying features of users in the same cluster are shared and are used to develop a DNN model for this particular cluster. Since the time-invariant features are already used to cluster users, the DNN model does not need to include the time-invariant features and the model is simpler. In addition, the DNN model only needs to learn time-varying features and it predicts time-varying dynamics for users in the same cluster, who have similar time-invariant features. Therefore, more accurate predictions of time-varying dynamics are likely to be achieved by the proposed model, compared to the commonly used DNN models, which need to address time-varying dynamics for all users. The proposed fuzzy clustering-based DNN is integrated with an LSTM and a CNN, which are commonly used for STLF when time-varying features are used [36]. The performance of the proposed fuzzy clustering-based DNN was evaluated on Miller’s dataset [48], which includes both time-varying features, such as load consumption, air temperature and wind speed, and time-invariant features, such as building size and floor count. Experimental results show that the proposed fuzzy clustering-based DNN achieves more accurate forecasting of the load consumption of new users when the data of new users is not available to train the DNN.
The main contributions of this research article are listed below.
- (1) To perform STLF, the existing approaches only use time-varying dynamics such as past load consumption or past power-correlated features [54]. No existing approach uses time-invariant features such as building space or building age to perform STLF. A novel approach is proposed in this paper to incorporate both time-varying and time-invariant features in order to improve STLF accuracy.
- (2) A novel STLF approach, namely the fuzzy clustering-based DNN, is proposed by incorporating fuzzy clustering and deep learning. The fuzzy clustering addresses time-invariant features and the deep learning addresses time-varying features. This incorporation improves existing DNN models, which only address time-varying features.
- (3) The proposed fuzzy clustering-based DNN is evaluated on Miller’s dataset [48], which is used for evaluating load consumption predictors. The dataset contains both time-invariant and time-varying features. The results demonstrate that better STLF can be achieved by the proposed fuzzy clustering-based DNN.
- (4) The prediction performance of the proposed fuzzy clustering-based DNN is compared with that of some recently published STLF approaches.
The rest of the article is structured as follows:
Section 2 describes the purposes of STLF and describes how a DNN model can be developed for STLF.
Section 3 describes the mechanism of the proposed fuzzy clustering-based DNN. It also describes how the fuzzy clustering addresses the time-invariant features and the DNN model addresses the time-varying features.
Section 4 presents the load consumption data used for evaluating the proposed method, describes how the proposed method is implemented and compares the prediction results with those of other existing methods.
A conclusion is drawn in Section 5.
2. Load Consumption Forecasting
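The STLF performed by the DNN model is given as (1):
$$y(t+m) = f_W\big(X(t)\big) + \varepsilon(t+m) \tag{1}$$
In (1), the DNN model, $f_W$, forecasts future load consumption, $y(t+m)$, with $m$ time samples ahead. $W$ is the parameter set of $f_W$, which needs to be optimized with respect to the prediction accuracy. $\varepsilon(t+m)$ is the noise residual at time $t+m$. $X(t)$ in (2) is the past information set, which is windowed by a time series between the current time, $t$, to the past, $p$, samples of time.
$$X(t) = \{\mathbf{x}(t), \mathbf{x}(t-1), \ldots, \mathbf{x}(t-p),\; y(t), y(t-1), \ldots, y(t-p)\} \tag{2}$$
$\mathbf{x}(t) = [x_1(t), \ldots, x_N(t)]$ denotes the forecasting feature vector which contains the $i$-th forecasting feature, $x_i(t)$, with $i = 1, \ldots, N$, such as past weather information, past climate information, past seasonal information, user information and building information. $y(t)$ is the past load consumption. Both $\mathbf{x}(t)$ and $y(t)$ are correlated to the future load consumption. Therefore, $X(t)$, containing both $\mathbf{x}(t)$ and $y(t)$, is used to forecast $y(t+m)$.
To optimize $f_W$, $W$ is determined by the training dataset collected from the $M$ existing users, namely $D = \{D_1, \ldots, D_M\}$, where $D_j$ in (3) is the data collected for the $j$-th user with $j = 1, \ldots, M$, which contains $n$ samples of past load consumption and the past information set.
$$D_j = \{(X_j(t_l),\; y_j(t_l+m)) : l = 1, \ldots, n\} \tag{3}$$
$X_j(t_l)$ is the past information set windowed with time $t_l$ for the $j$-th user. $y_j(t_l+m)$ is the load consumption at time $t_l+m$ for the $j$-th user. $X_j(t_l)$ is further written as:
$$X_j(t_l) = \{\mathbf{x}_j(t_l), \ldots, \mathbf{x}_j(t_l-p),\; y_j(t_l), \ldots, y_j(t_l-p)\} \tag{4}$$
which contains the past load consumption and past forecasting features within the time window between $t_l-p$ and $t_l$ for the $j$-th user.
Based on the past information set and the load consumption in $D$, $W$ in $f_W$ can be determined by solving the optimization problem in (5):
$$W^{*} = \arg\min_{W} \sum_{j=1}^{M}\sum_{l=1}^{n} e\big(y_j(t_l+m),\; f_W(X_j(t_l))\big) \tag{5}$$
where $e(\cdot,\cdot)$ measures the forecasting error.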
The forecasting framework is shown in Figure 1. The DNN model, $f_W$, is developed by the training dataset, $D$, which contains the data from the $M$ users, $D_1, \ldots, D_M$. Some past information features are time-invariant, such as building space, year built, number of building floors and building purpose. Those time-invariant features are related to load consumption for new users. We can use those time-invariant features to improve the prediction accuracy for new users. For example, a larger building space uses more electricity, while less electricity is consumed in a smaller building. In addition, building age correlates with energy consumption, since older buildings are mostly constructed with older material which has less insulation capability. Hence, more energy is required to warm or cool such buildings during winters or summers. For modern buildings, better insulation material is used and less energy is consumed. Furthermore, building purposes are related to user behaviours regarding power consumption. Residential buildings use more energy at night time and less energy at day time. In contrast, commercial or industrial buildings use more energy at day time and less energy at night time. Therefore, building space, building age and building purpose are time-invariant features which correlate with load consumption.
Section 3 discusses how time-invariant features are used to improve STLF.
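The windowing of the past information set in (2) to (4) can be illustrated with a minimal sketch. It assumes one user's data is stored as plain Python lists; the function name `make_windows` and the toy values are hypothetical and are not taken from the paper.

```python
def make_windows(features, load, p, m):
    """Build (past information set, target) pairs for one user.

    features: list of feature vectors, one per time sample
    load:     list of load values, same length as features
    p:        number of lags, so each window covers samples t-p .. t
    m:        forecasting horizon, so the target is the load at t+m
    """
    if len(features) != len(load):
        raise ValueError("features and load must have the same length")
    samples = []
    # t must leave room for p past samples and for the target at t+m
    for t in range(p, len(load) - m):
        past_features = [features[t - k] for k in range(p + 1)]  # t, t-1, ..., t-p
        past_load = [load[t - k] for k in range(p + 1)]
        samples.append(((past_features, past_load), load[t + m]))
    return samples


# Toy series with a single feature (for example, air temperature)
load = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
features = [[20.0 + i] for i in range(6)]
samples = make_windows(features, load, p=2, m=1)
```

Each element of `samples` pairs one windowed past information set with the load observed `m` samples later, which is the form of training data that the optimization in (5) consumes.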
3. Fuzzy Clustering-Based Deep Learning Model
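All forecasting features, $x_i(t)$ in (2), with $i = 1, \ldots, N$ are divided into two sets of features in (6), namely time-invariant features, $x_1, \ldots, x_C$, and time-varying features, $x_{C+1}(t), \ldots, x_N(t)$, where $C$ is the number of time-invariant features and $N - C$ is the number of time-varying features. All elements in $x_1, \ldots, x_C$ are constants since they are time-invariant.
$$\mathbf{x}(t) = [\underbrace{x_1, \ldots, x_C}_{\text{time-invariant}},\; \underbrace{x_{C+1}(t), \ldots, x_N(t)}_{\text{time-varying}}] \tag{6}$$
Given that the first $C$ features are time-invariant constants, the forecasting feature vector of the $j$-th user in (4) can be rewritten as:
$$\mathbf{x}_j(t) = [x_{j,1}, \ldots, x_{j,C},\; x_{j,C+1}(t), \ldots, x_{j,N}(t)] \tag{7}$$
Substituting (7) into (4), $X_j(t_l)$ can be rewritten as:
$$X_j(t_l) = \{\mathbf{s}_j,\; \mathbf{v}_j(t_l), \ldots, \mathbf{v}_j(t_l-p),\; y_j(t_l), \ldots, y_j(t_l-p)\} \tag{8}$$
where the terms with the subscripts from 1 to $C$ in (7) are the time-invariant feature data for the $j$-th user. The terms are included in a vector, $\mathbf{s}_j = [x_{j,1}, \ldots, x_{j,C}]$; those from $C+1$ to $N$ are the time-varying feature data, which is written as a vector, $\mathbf{v}_j(t) = [x_{j,C+1}(t), \ldots, x_{j,N}(t)]$.
The time-varying set for the $j$-th user is grouped as:
$$T_j = \{(\mathbf{v}_j(t_l), \ldots, \mathbf{v}_j(t_l-p),\; y_j(t_l), \ldots, y_j(t_l-p),\; y_j(t_l+m)) : l = 1, \ldots, n\} \tag{9}$$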
In this section, a fuzzy clustering-based DNN model is proposed to forecast the load consumption of new users. Clusters are generated to classify users with respect to time-invariant features using the time-invariant vector, $\mathbf{s}_j$, of each user with $j = 1, \ldots, M$. Each cluster groups users with similar time-invariant features. Each DNN model is developed by the time-varying sets of the users in the same cluster, who have similar time-invariant features. Hence, all time-varying sets, $T_j$, in the same cluster are used to develop a DNN model. The time-varying features in the same cluster are shared and are used to develop the DNN model.
Since the time-invariant features are already used to cluster users, the DNN model does not need to include the time-invariant features and the model only uses the time-varying features to forecast future load consumption; therefore, a simpler model can be generated. In addition, the model only needs to learn the time-varying features and predict time-varying dynamics in the clusters which have similar time-invariant features. Therefore, the learning is simpler and more accurate predictions of time-varying dynamics are likely to be achieved by the proposed model, compared to the commonly used DNN models which address both the time-varying and time-invariant dynamics.
Section 3.1 discusses the clustering method for classifying users based on time-invariant features.
Section 3.2 discusses the deep-learning models based on time-varying features to forecast future load consumption.
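The split between time-invariant and time-varying features described above can be sketched as follows. The sketch assumes that each row of a user's record lists the time-invariant features first, as in the text; the function name `split_user_features` and the example values (floor area, floor count, temperature, wind speed) are hypothetical.

```python
def split_user_features(feature_rows, num_invariant):
    """Split one user's feature rows into a static vector and a varying series.

    feature_rows:  list of feature vectors, one per time sample; the first
                   num_invariant entries are assumed to be time-invariant
    num_invariant: number of time-invariant features
    """
    if not feature_rows:
        raise ValueError("feature_rows must not be empty")
    static = list(feature_rows[0][:num_invariant])
    for row in feature_rows:
        # Time-invariant entries must be identical at every time sample
        if list(row[:num_invariant]) != static:
            raise ValueError("time-invariant features changed over time")
    varying = [list(row[num_invariant:]) for row in feature_rows]
    return static, varying


# Hypothetical rows: [floor area, floor count, air temperature, wind speed]
rows = [
    [1500.0, 3.0, 21.5, 4.0],
    [1500.0, 3.0, 22.0, 3.5],
    [1500.0, 3.0, 20.8, 5.1],
]
static_vector, varying_series = split_user_features(rows, num_invariant=2)
```

The static vector is what the clustering step would use, while the varying series is what a per-cluster DNN would be trained on.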
3.1. Clustering of Time-Invariant Features
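When the time-invariant vectors of all users are given, clusters can be generated to classify users which have similar behaviours of using electrical power. Given that we have $K$ clusters with $k = 1, \ldots, K$, we determine which cluster the $j$-th user belongs to, where $j = 1, \ldots, M$. Here, $\mu_{jk}$ in (10) is defined as the membership of the $j$-th user to the $k$-th cluster, where $0 \le \mu_{jk} \le 1$. The membership indicates how much $\mathbf{s}_j$ belongs to the $k$-th cluster. If $\mu_{jk}$ is large, the $j$-th user has a similar behaviour to the users in the $k$-th cluster. Therefore, $\mathbf{s}_j$ is in the $k$-th cluster if $\mu_{jk} > \mu_{jh}$ for all $h \ne k$.
$$\mu_{jk} = \left[\sum_{g=1}^{K} \left(\frac{d_{jk}}{d_{jg}}\right)^{2/(m-1)}\right]^{-1} \tag{10}$$
$d_{jk}$ is the $A$-norm distance between $\mathbf{s}_j$ and the $k$-th cluster centre, and $m$ is the weighting exponent with $m > 1$ (not to be confused with the forecasting horizon, $m$, in Section 2). $d_{jk}$ is given as:
$$d_{jk}^{2} = \lVert \mathbf{s}_j - \mathbf{z}_k \rVert_A^{2} = (\mathbf{s}_j - \mathbf{z}_k)^{\mathsf{T}} A\, (\mathbf{s}_j - \mathbf{z}_k) \tag{11}$$
$A$ is a positive definite $C \times C$ weight matrix and $\mathbf{z}_k$ denotes the centre of the $k$-th cluster, which is given by:
$$\mathbf{z}_k = \frac{\sum_{j=1}^{M} \mu_{jk}^{m}\, \mathbf{s}_j}{\sum_{j=1}^{M} \mu_{jk}^{m}} \tag{12}$$
To determine the cluster centres, $\mathbf{z}_k$, the generalized least-squared error in (13) is minimized for all $k = 1, \ldots, K$.
$$J_m = \sum_{j=1}^{M}\sum_{k=1}^{K} \mu_{jk}^{m}\, d_{jk}^{2} \tag{13}$$
In (13), $\mu_{jk}$ is the membership function of $\mathbf{s}_j$ to the $k$-th cluster and $d_{jk}$ is the $A$-norm distance between the $j$-th user to the $k$-th cluster centre. The weight attached to each squared distance is $\mu_{jk}^{m}$, which is the $m$-th power of the $j$-th user's membership in cluster $k$. Therefore, minimizing (13) ensures that all users are close to their corresponding cluster centres. When $m$ approaches 1, $J_m$ weights all distances equally. When $m$ is larger, $J_m$ emphasizes the minimization of large distances, since the power of large distances dominates other small distances.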
To minimize $J_m$, the FCM algorithm [45] is used. The FCM algorithm is one of the most commonly used methods for identifying cluster centres and the memberships of each sample to each cluster. Recent research shows that the FCM algorithm is an effective approach for clustering data [49], particularly in solving recent engineering problems such as predicting power system risks [50], bearing fault diagnosis [55], power equipment image segmentation [51], PV array fault diagnosis [52], classifying load consumption for users [53] and classifying groundwater quality [54]. Therefore, we use the FCM algorithm illustrated in Algorithm 1 to minimize (13) in order to determine the optimal cluster centres, $\mathbf{z}_k$, in (12). The fuzzy partition coefficient, $F$, in (14) indicates the clustering performance.
$$F = \frac{1}{M}\sum_{j=1}^{M}\sum_{k=1}^{K} \mu_{jk}^{2} \tag{14}$$
In the FCM algorithm, the inputs are the time-invariant features of the $M$ users. The first two steps set the algorithmic parameters and randomly initialize a membership matrix which indicates how much each user belongs to each cluster. Step 3 computes the cluster centres using (12). Step 5 computes the membership of each user to each cluster using (10), which generates the updated membership matrix. Step 6 checks whether the change in the membership matrix is smaller than a threshold. If it is, the fuzzy partition coefficient is computed, and both the fuzzy partition coefficient and the cluster centres computed in Step 3 are returned as the output of the FCM algorithm. Otherwise, the algorithm returns to Step 3 to recompute the cluster centres and the procedure is repeated iteratively.
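Algorithm 1
Fuzzy C-Means (FCM) Algorithm
Input: All time-invariant vectors, $\mathbf{s}_j$, with $j = 1, \ldots, M$.
Output: The centres of the clusters, $\mathbf{z}_k$, with $k = 1, \ldots, K$; the fuzzy partition coefficient, $F$.
Step 1: Set the algorithmic parameters, $K$, $m$, $A$, and the threshold, $\epsilon$.
Step 2: Randomly initialize a membership matrix, $U^{(r)} = [\mu_{jk}]$, with the iteration $r = 0$.
Step 3: Compute the cluster centres, $\mathbf{z}_k$, using (12).
Step 4: Set $r = r + 1$.
Step 5: Compute an updated membership matrix, $U^{(r)}$, with $\mu_{jk}$ using (10).
Step 6: Compare whether $\lVert U^{(r)} - U^{(r-1)} \rVert$ is higher than $\epsilon$:
    If $\lVert U^{(r)} - U^{(r-1)} \rVert$ is smaller than $\epsilon$, then compute the fuzzy partition coefficient, $F$, and goto Step 7.
    Else set $U^{(r-1)}$ as $U^{(r)}$ and goto Step 3.
Step 7: Return the cluster centres, $\mathbf{z}_k$, and the fuzzy partition coefficient, $F$.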
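The iteration in Algorithm 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the weight matrix in the distance is assumed to be the identity (Euclidean distance), the stopping test uses the largest absolute change in the membership matrix, and the names `fcm`, `update_centres` and `memberships`, the defaults and the toy points are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (assumes an identity weight matrix)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_centres(S, U, K, m):
    """Cluster centres as membership-weighted means, following (12)."""
    centres = []
    for k in range(K):
        w = [U[j][k] ** m for j in range(len(S))]
        total = sum(w)
        centres.append([sum(w[j] * S[j][d] for j in range(len(S))) / total
                        for d in range(len(S[0]))])
    return centres


def memberships(s, centres, m):
    """Memberships of one vector to every cluster, following (10)."""
    d2 = [sq_dist(s, z) for z in centres]
    if 0.0 in d2:  # the vector coincides with a centre
        hit = d2.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(centres))]
    # (d_k / d_h)^(2/(m-1)) equals (d2_k / d2_h)^(1/(m-1)) for squared distances
    return [1.0 / sum((d2[k] / d2[h]) ** (1.0 / (m - 1.0))
                      for h in range(len(centres)))
            for k in range(len(centres))]


def fcm(S, K, m=2.0, eps=1e-6, max_iter=300, seed=0):
    rng = random.Random(seed)
    U = []
    for _ in S:  # random initial membership rows that sum to one
        row = [rng.random() + 1e-9 for _ in range(K)]
        total = sum(row)
        U.append([u / total for u in row])
    for _ in range(max_iter):
        centres = update_centres(S, U, K, m)
        U_new = [memberships(s, centres, m) for s in S]
        change = max(abs(U_new[j][k] - U[j][k])
                     for j in range(len(S)) for k in range(K))
        U = U_new
        if change < eps:
            break
    fpc = sum(u ** 2 for row in U for u in row) / len(S)  # fuzzy partition coefficient
    return centres, U, fpc


# Toy time-invariant vectors forming two well-separated groups
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centres, U, fpc = fcm(points, K=2)
```

In practice the time-invariant features would usually be rescaled before clustering, since features such as floor area and floor count have very different ranges; that preprocessing choice is not specified in the text above.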
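After the cluster centres are determined, they are used to determine the memberships to each cluster when the time-invariant vector $\mathbf{s}_j$ of the $j$-th user is given. The $j$-th user belongs to the $k$-th cluster if the membership belonging to the $k$-th cluster is larger than that belonging to the $h$-th cluster, where $h \ne k$, and the membership of $\mathbf{s}_j$ to the $k$-th cluster is
$$\mu_{jk} = \left[\sum_{g=1}^{K}\left(\frac{\lVert \mathbf{s}_j - \mathbf{z}_k\rVert_A}{\lVert \mathbf{s}_j - \mathbf{z}_g\rVert_A}\right)^{2/(m-1)}\right]^{-1} \tag{15}$$
$\mathbf{s}_j$ belongs to one of the $K$ clusters. The time-varying sets of all users in a single cluster are used to develop a model to predict the future load consumption. Suppose that $\mathbf{s}_{I_k(1)}, \ldots, \mathbf{s}_{I_k(n_k)}$ are in the $k$-th cluster, where
$$I_k = [I_k(1), I_k(2), \ldots, I_k(n_k)] \tag{16}$$
denotes the index vector which indicates the time-invariant vectors in the $k$-th cluster. $n_k$ is the number of elements in the $k$-th cluster and the $I_k(q)$-th time-invariant vector with $q = 1, \ldots, n_k$ is in the $k$-th cluster. All $I_k(q)$ are different, where $1 \le I_k(q) \le M$. Since there are $K$ clusters, $K$ models are developed using the time-varying sets.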
Fuzzy Deep Learning in Algorithm 2 illustrates how the $K$ models are developed, when the time-invariant vectors and time-varying sets are given. The first two steps generate the $K$ cluster centres using the FCM in Algorithm 1. Step 3 determines the time-invariant vectors belonging to each cluster, based on (15). Step 4 determines the index vector of the time-varying sets in each cluster using (16). Step 5 develops the model for each cluster using the time-varying sets in that cluster. In this paper, two commonly used deep-learning approaches, namely LSTM and CNN, described in Section 3.2.1 and Section 3.2.2, respectively, are used.
Algorithm 2
Fuzzy Deep Learning
Input: All time-invariant vectors, $\mathbf{s}_j$, and time-varying sets, $T_j$, with $j = 1, \ldots, M$.
Output: DNN models, $f_{W_k}$, with $k = 1, \ldots, K$, which forecast future load consumption.
Step 1: Initialize the algorithmic parameters, including $K$, $m$ and $A$, and the threshold, $\epsilon$.
Step 2: Generate the cluster centres, $\mathbf{z}_k$, using the FCM in Algorithm 1.
Step 3: Determine the membership of $\mathbf{s}_j$ to the $k$-th cluster, $\mu_{jk}$, using (15).
Step 4: Determine the index vector, $I_k$, using (16), with $k = 1, \ldots, K$, which indexes the time-varying sets in the $k$-th cluster.
Step 5: Use all time-varying sets, $T_{I_k(q)}$, with $q = 1, \ldots, n_k$ to develop the DNN model, $f_{W_k}$, using deep learning.
Step 6: Return the DNN models, $f_{W_k}$, with $k = 1, \ldots, K$.
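Steps 3 to 5 of Algorithm 2 can be sketched as below, assuming the memberships have already been computed. The trainer is deliberately a stand-in: `train_mean_model` only predicts the mean training target and is used here so that the sketch stays self-contained, whereas the paper trains an LSTM or a CNN per cluster. All function names and the toy data are hypothetical.

```python
def index_vectors(U):
    """Assign each user to the cluster with the largest membership."""
    K = len(U[0])
    index = {k: [] for k in range(K)}
    for j, row in enumerate(U):
        best = max(range(K), key=row.__getitem__)
        index[best].append(j)
    return index


def train_mean_model(samples):
    """Placeholder for DNN training: always predicts the mean target."""
    mean_target = sum(target for _, target in samples) / len(samples)
    return lambda window: mean_target


def fuzzy_deep_learning(U, time_varying_sets, train_fn=train_mean_model):
    """Develop one model per cluster from the pooled time-varying sets."""
    models = {}
    for k, users in index_vectors(U).items():
        # Users in the same cluster share their time-varying samples
        pooled = [sample for j in users for sample in time_varying_sets[j]]
        if pooled:  # skip clusters that received no users
            models[k] = train_fn(pooled)
    return models


# Toy memberships for three users and two clusters
U = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
# Each user's time-varying set is a list of (window, target) pairs
time_varying_sets = [
    [([1.0], 2.0)],
    [([2.0], 4.0)],
    [([5.0], 10.0)],
]
models = fuzzy_deep_learning(U, time_varying_sets)
```

Passing the trainer as `train_fn` keeps the clustering logic independent of the forecasting model, so the same grouping code could serve either deep-learning approach.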
The flow involving FCM in Algorithm 1 and Fuzzy Deep Learning in Algorithm 2 is summarized in Figure 2. FCM generates the centres of the $K$ clusters using the time-invariant vectors; Fuzzy Deep Learning generates the $K$ DNN models using the time-varying sets. As mentioned in Section 1, existing methods only use time-varying sets to develop DNN models for STLF. In principle, DNN models can be trained with both time-invariant vectors and time-varying sets when both are available. Such DNN models have more inputs than the proposed fuzzy clustering-based DNN, since the proposed fuzzy clustering-based DNN is only trained with the time-varying sets. Therefore, the proposed fuzzy clustering-based DNN is simpler than the existing DNN models.
After the $K$ cluster centres and the $K$ models are generated, the fuzzy clustering-based DNN in Figure 3 can be used to forecast future load consumption when the time-invariant vector and time-varying set of a new user, namely $\mathbf{s}_{\text{new}}$ and $T_{\text{new}}$, are given. We assume that the membership of $\mathbf{s}_{\text{new}}$ belonging to the $k$-th cluster is larger than that belonging to the other clusters. The new user belongs to the $k$-th cluster with the cluster centre $\mathbf{z}_k$. The corresponding DNN model, $f_{W_k}$, uses $T_{\text{new}}$ to predict the future load consumption, $y_{\text{new}}(t+m)$. If the membership is smaller than a threshold value, the DNN trained by both time-varying sets and time-invariant vectors is used.
Section 3.2 describes how those models are developed.
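The forecasting procedure for a new user, including the fallback when the largest membership is below a threshold, can be sketched as follows. The sketch assumes Euclidean distances and uses trivial callables in place of trained DNNs; `forecast_new_user`, the threshold values and the toy models are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (assumes an identity weight matrix)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def membership(s, centres, m=2.0):
    """Memberships of a time-invariant vector to every cluster centre."""
    d2 = [sq_dist(s, z) for z in centres]
    if 0.0 in d2:
        hit = d2.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(centres))]
    return [1.0 / sum((d2[k] / d2[h]) ** (1.0 / (m - 1.0))
                      for h in range(len(centres)))
            for k in range(len(centres))]


def forecast_new_user(s_new, window, centres, cluster_models,
                      fallback_model, threshold=0.5, m=2.0):
    """Pick the model of the best-matching cluster, or fall back."""
    u = membership(s_new, centres, m)
    k = max(range(len(u)), key=u.__getitem__)
    if u[k] >= threshold and k in cluster_models:
        return cluster_models[k](window), k
    # Membership too weak: use the model trained on all features
    return fallback_model(window), None


centres = [[0.0, 0.0], [10.0, 10.0]]
# Stand-ins for trained DNNs: mean of the window and persistence
cluster_models = {0: lambda w: sum(w) / len(w), 1: lambda w: w[0]}
fallback_model = lambda w: min(w)  # stand-in for the all-feature DNN
window = [3.0, 2.0, 1.0]

near = forecast_new_user([9.0, 9.0], window, centres,
                         cluster_models, fallback_model)
ambiguous = forecast_new_user([5.0, 5.0], window, centres,
                              cluster_models, fallback_model, threshold=0.6)
```

Returning the selected cluster index alongside the forecast makes it easy to see when the fallback model was used.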
3.2. DNNs for Predicting Time-Varying Features
Both LSTM and CNN are implemented in the proposed fuzzy clustering-based DNN since they have been developed for power forecasting when time-varying features such as past weather, load consumption, climate and meteorological variables are given.
3.2.1. Long Short-Term Memory Network
The LSTM network is suitable for time-series predictions since it benefits from long-term memory cells [56]. The LSTM network in Figure 4 is developed to forecast future load consumption, $y(t+m)$, with $m$ time units ahead, when the past time-varying features, $\mathbf{v}(t)$, $\mathbf{v}(t-1)$, … and $\mathbf{v}(t-p)$, are given. $p$ denotes the number of temporal lags. The LSTM network consists of an input layer which feeds in the past time-varying features in multi-dimensions, an LSTM layer of interconnected neurons and a dense net which determines $y(t+m)$ at the last layer. Each LSTM neuron is fed with the past time-varying features.
The LSTM nodes in Figure 4 are interconnected in order to update the neuron states with previous inputs. Each LSTM neuron has two inputs, namely the previous short-term state and the previous long-term state of its hidden layer. It also has two outputs, namely the future short-term state and the future long-term state. The LSTM neurons select some of the previous short-term state and long-term state and pass those to the later LSTM neurons. At the last layer, the dense net forecasts $y(t+m)$ by combining the outputs of the LSTM neurons.
Figure 5 illustrates the computations of how the LSTM neuron manipulates the previous and the future short- and long-term states. To simplify the state expression, the hidden layer index is omitted. The previous and future short-term states are denoted as $h(t-1)$ and $h(t)$, respectively; the previous and future long-term states are denoted as $c(t-1)$ and $c(t)$, respectively. The figure shows that the LSTM neuron consists of a main connected layer and three gate controller layers. The upper layer involves a control state which computes the future long-term state, $c(t)$, by analysing the current input, $\mathbf{v}(t)$, previous short-term state, $h(t-1)$, and previous long-term state, $c(t-1)$. The lower layer involves $f(t)$ with the forget gate, $i(t)$ with the input gate, $g(t)$ with the input node and $o(t)$ with the output gate. The LSTM states are changed by the three gate operations, such as by removing, writing or reading. The computations for $f(t)$, $i(t)$, $g(t)$, $o(t)$, $c(t)$ and $h(t)$ are performed by (17a) to (17f), respectively:
$$f(t) = \sigma\big(W_{vf}\,\mathbf{v}(t) + W_{hf}\,h(t-1) + b_f\big) \tag{17a}$$
$$i(t) = \sigma\big(W_{vi}\,\mathbf{v}(t) + W_{hi}\,h(t-1) + b_i\big) \tag{17b}$$
$$g(t) = \tanh\big(W_{vg}\,\mathbf{v}(t) + W_{hg}\,h(t-1) + b_g\big) \tag{17c}$$
$$o(t) = \sigma\big(W_{vo}\,\mathbf{v}(t) + W_{ho}\,h(t-1) + b_o\big) \tag{17d}$$
$$c(t) = f(t) \odot c(t-1) + i(t) \odot g(t) \tag{17e}$$
$$h(t) = o(t) \odot \tanh\big(c(t)\big) \tag{17f}$$
where $\sigma$ denotes the logistic activation function and $\odot$ denotes element-wise multiplication; $W_{vf}$, $W_{vi}$, $W_{vg}$ and $W_{vo}$ are the weight matrices of the four gates connecting to $\mathbf{v}(t)$; $W_{hf}$, $W_{hi}$, $W_{hg}$ and $W_{ho}$ are the weight matrices of the four gates connecting to the previous short-term state, $h(t-1)$, and $b_f$, $b_i$, $b_g$ and $b_o$ are the bias terms for the four gates.
The input gate and input node decide which parts of the input, $\mathbf{v}(t)$, are added to the long-term state, $c(t)$, after the forget gate, $f(t)$, stores the important part of $c(t-1)$ which needs to be kept. The output gate generates $o(t)$, which decides which parts of $c(t)$ need to be output for the current time. $f(t)$, $i(t)$ and $o(t)$ are the outputs of the logistic function, ranging from 0 to 1. $g(t)$ is the output of the tanh function, which is between −1 and 1. After the input sequence is processed by the gate operations, the long-term memory, $c(t)$, and short-term memory, $h(t)$, are passed to the next or upper LSTM neurons.
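The gate operations in (17a) to (17f) can be traced with a single LSTM step written in plain Python. This is a sketch of the standard LSTM cell under the assumption that each gate has one weight matrix for the current input, one for the previous short-term state and one bias vector; the helper names and the zero-valued toy parameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_in, W_h, b, v, h_prev):
    """Compute W_in v + W_h h_prev + b with list-based matrices."""
    return [
        sum(w * x for w, x in zip(W_in[r], v))
        + sum(w * x for w, x in zip(W_h[r], h_prev))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(v, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'g', 'o' to (W_in, W_h, b)."""
    f = [sigmoid(z) for z in affine(*params["f"], v, h_prev)]    # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], v, h_prev)]    # input gate
    g = [math.tanh(z) for z in affine(*params["g"], v, h_prev)]  # input node
    o = [sigmoid(z) for z in affine(*params["o"], v, h_prev)]    # output gate
    # New long-term state: keep part of the old state, add gated new content
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New short-term state: gated view of the squashed long-term state
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


def zero_gate(n_hidden, n_in):
    """All-zero parameters for one gate (toy values for illustration)."""
    return ([[0.0] * n_in for _ in range(n_hidden)],
            [[0.0] * n_hidden for _ in range(n_hidden)],
            [0.0] * n_hidden)


params = {name: zero_gate(1, 2) for name in "figo"}
h, c = lstm_step([1.0, 3.0], [0.0], [2.0], params)
```

With all-zero parameters every logistic gate outputs 0.5 and the input node outputs 0, so the step simply halves the previous long-term state; trained weights would make the gates depend on the input and the previous short-term state.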
3.2.2. Convolution Neural Network
Besides the LSTM, CNNs are suitable for predicting one-dimensional time-series data. Since sequential time-series data can be treated as a one-dimensional image, a window-based convolution operation can be used to extract useful information.
Figure 6 illustrates the proposed CNN framework, which is a multi-head convolution network [59]. The framework consists of many CNN heads, which are developed for time-series prediction. The time series of each time-varying feature is processed by a CNN head. Since the time-varying features are indexed from $C+1$ to $N$, the $i$-th time-varying feature within a window between $t$ and $t-p$, namely $\mathbf{x}_i^{(p)}(t)$ in (18), is processed by the $i$-Head, where $t$ is the current time, $t-p$ is the past time with $p$ sample lag and $i = C+1, \ldots, N$.
$$\mathbf{x}_i^{(p)}(t) = [x_i(t), x_i(t-1), \ldots, x_i(t-p)] \tag{18}$$
Each $i$-Head is responsible for capturing useful information from $\mathbf{x}_i^{(p)}(t)$, which is correlated with the future load consumption, $y(t+m)$. Since all $\mathbf{x}_i^{(p)}(t)$ have different natures and scales, each can be processed independently and useful information from each feature can be captured. The individual prediction of each $i$-Head is gathered by a dense network in order to predict the future load consumption, $y(t+m)$.
The CNN head in Figure 7 consists of an input layer, several convolution layers, several pooling layers, a concatenate layer and a dense layer. The input layer feeds in the time-varying feature, $\mathbf{x}_i^{(p)}(t)$. The convolution layer extracts important information from $\mathbf{x}_i^{(p)}(t)$. Each convolution layer consists of multiple sliding windows which scan the input time series. The sliding window extracts useful information from the time series by capturing repeated patterns at different regions of the time series. Since the sliding windows in the convolution layer focus on the corresponding features, useful information from each feature can be kept. An activation function is applied to the convolution output to learn the nonlinear patterns of each feature. The pooling layer is used after the convolution layer to reduce the time-series size. After several convolutions and pooling operations, the processed time series is concatenated and is passed to the dense layer. The extracted feature information is passed to the dense network of the CNN framework in Figure 6 in order to predict the future load consumption, $y(t+m)$.
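The sliding-window convolution, activation, pooling and dense combination described for the multi-head CNN can be sketched in plain Python. This is a simplified illustration and not the architecture of Figure 6 or Figure 7: each head here has a single convolution layer with hand-picked kernels, ReLU is assumed as the activation, and the final dense network is reduced to one weighted sum. All names and values are hypothetical.

```python
def conv1d(series, kernel, bias=0.0):
    """Slide the kernel over the series (no padding, stride 1).

    As in most deep-learning libraries, the kernel is not flipped.
    """
    width = len(kernel)
    return [sum(k * x for k, x in zip(kernel, series[i:i + width])) + bias
            for i in range(len(series) - width + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def cnn_head(series, kernels, pool_size=2):
    """One head: convolve, activate and pool with each kernel, then concatenate."""
    features = []
    for kernel in kernels:
        features.extend(max_pool(relu(conv1d(series, kernel)), pool_size))
    return features


def multi_head_forecast(windows, head_kernels, dense_weights, dense_bias=0.0):
    """Process each feature window with its own head and combine linearly."""
    concatenated = []
    for series, kernels in zip(windows, head_kernels):
        concatenated.extend(cnn_head(series, kernels))
    return sum(w * x for w, x in zip(dense_weights, concatenated)) + dense_bias


# One windowed series per time-varying feature
windows = [[1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0]]
head_kernels = [[[-1.0, 1.0]], [[1.0, -1.0]]]  # one kernel per head
forecast = multi_head_forecast(windows, head_kernels, [0.25] * 4)
single_head = cnn_head([1.0, 2.0, 3.0, 4.0, 5.0], [[-1.0, 1.0], [0.5, 0.5]])
```

Giving each feature its own kernels mirrors the point made in the text that features with different natures and scales are processed independently before their outputs are combined.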