1. Introduction
Energy is essential to social life and to scientific and technological development. As societies develop and technologies advance, the demand for energy keeps increasing, and commercial and residential buildings account for a significant proportion of that consumption. In Europe, buildings account for approximately 40% of energy consumption and 36% of energy-related greenhouse gas emissions, and 80% of the energy consumed by citizens is used for heating, cooling, and domestic hot water, making buildings the largest single consumer of energy in Europe [1]. In addition, commercial and residential buildings are major contributors to global carbon emissions. Energy for buildings is mainly derived from the burning of coal, oil, and natural gas. The use of fossil fuels increases carbon dioxide emissions, contributes to climate warming, and leads to accelerated environmental degradation. Given the current energy situation, reducing energy consumption has become a necessary goal.
For commercial and residential buildings, energy consumption is concentrated in air conditioning, lighting systems, and various modern appliances. As the quality of life has improved, people’s electricity needs have become more diverse. The growing variety of modern appliances and the consequent increase in the use of electrical equipment have raised the share of energy consumed in buildings year on year. Final energy use in buildings increased from 118 EJ in 2010 to almost 130 EJ in 2019, at an average annual rate of 1%. The fastest-increasing end uses of energy in buildings (space cooling, appliances, and electric plug-loads) drive electricity demand growth in the buildings sector. While electricity made up one-third of building energy use in 2020, fossil fuel use has also increased at a marginal annual average growth rate of 0.7% since 2010 [2]. It is therefore particularly important to understand the electricity consumption of buildings in order to reduce CO2 emissions and energy consumption. To analyze the electricity consumption of a building, detailed electricity and environmental data need to be obtained. Usually, data are obtained by installing appropriate sensors at various locations in the building. However, in practice, this traditional detection method is often rejected by users because it intrudes on their privacy. In addition, distributed installations are more expensive, make the data cumbersome to aggregate, and do not give building managers easy access to analytical data. The advent of smart meters has addressed these problems of the traditional approach. Smart meters use fewer sensors, can record electricity consumption at hourly or even shorter intervals, and offer more than just a billing function. Instead of collecting data on-site, managers can obtain the corresponding energy consumption data in real time through wireless signal transmission, which facilitates energy consumption statistics and analysis [3].
Since abnormal power consumption behavior results in higher electricity consumption and wasted energy, identifying such behavior from the obtained energy consumption data allows for more efficient use of electricity [4,5]. There are many reasons for abnormal electricity consumption by end-users, for example, damage to equipment, wasteful behavior (forgetting to switch off equipment after use or using incorrectly configured equipment), and electricity theft attacks [6,7]. Once abnormal behavior has been identified, end-users can check their electrical equipment and develop good electricity usage habits. The task of finding patterns in data that do not conform to expected or normal behavior is generally referred to as anomaly detection. Identifying abnormal increases in energy consumption by specific end-users should be seen as a form of early warning that reduces energy waste in buildings [8].
The methods for anomaly detection can be classified into distance-based methods, density-based methods, dimensionality reduction-based methods, and deep learning-based methods.
Among the distance-based methods, the K-Nearest Neighbor (KNN) algorithm is one of the more popular. It calculates the average distance between each sample point and its K nearest samples in turn and then uses the calculated distance for anomaly detection [9,10]. The distance-based approach, although effective in some cases, performs better when a priori knowledge of the anomaly duration and the number of anomalies is available.
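The distance-based scoring described above can be illustrated with a minimal sketch. It assumes one-dimensional readings and absolute difference as the distance; the function name `knn_scores` and the sample values are hypothetical and are not taken from the cited works.

```python
def knn_scores(points, k):
    """Average distance from each point to its k nearest neighbors (1-D data)."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        distances = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


readings = [1.0, 1.1, 0.9, 1.2, 5.0]   # hypothetical consumption values
scores = knn_scores(readings, k=2)
most_anomalous = scores.index(max(scores))  # index of the largest score
```

A point far from its neighbors receives a large score; deciding how large is "anomalous" still requires a threshold, which is where the prior knowledge mentioned above comes in.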
The density-based approach investigates the density of each power consumption pattern and of its neighbors. Among these methods, the local density cluster-based outlier factor (LDCOF) applies the concept of local density when assigning anomaly scores. Refs. [11,12] used the density-based spatial clustering of applications with noise (DBSCAN) method to detect anomalous power consumption in a wind farm environment. However, the density-based approach cannot take time correlation into account and is therefore not applicable to multivariate time series data.
Methods based on dimensionality reduction can be used as classification methods that remove irrelevant power patterns and redundancies at a low computational cost [13]. Principal component analysis (PCA) is a multivariate data analysis method that preserves as much as possible the relationships between data extracted from process measurements while reducing the dimensionality of a large amount of raw data [14]. However, methods based on dimensionality reduction are only valid for highly correlated data and require that the data follow a multivariate Gaussian distribution [15].
In recent years, deep learning-based methods have been widely used, and work on anomaly detection in time series data has increased significantly [16].
First, convolutional neural networks (CNNs) have proven effective in different research applications and outperform artificial neural network (ANN) algorithms in detecting anomalies in time series data. In [17], the authors propose a new anomaly detection technique, FuseAD, which fuses a statistical ARIMA (Autoregressive Integrated Moving Average) model and a CNN-based model in a residual manner. The results show that this fusion-based technique can achieve the best of both by combining their strengths and complementing their weaknesses. In addition, deep CNNs can accurately identify the non-periodicity of electricity theft and the periodicity of normal electricity consumption based on two-dimensional (2D) electricity consumption data, solving the problem of low accuracy when detecting electricity theft [18]. In [19], the authors use CNNs for feature extraction and then use a random forest algorithm to detect electricity theft, helping utilities address the problems of inefficient electricity detection and irregular energy consumption.
On the other hand, recurrent neural networks (RNNs), especially LSTM (Long Short-Term Memory) networks, also perform very well in time series prediction. In [20], the authors use deep learning algorithms to remove seasonality and trends from data for better anomaly detection, helping electric utilities to minimize the impact of uncaptured errors in their daily work. Meanwhile, in [21], the authors propose a power consumption prediction and anomaly detection algorithm based on an LSTM neural network, which focuses on seasonal and monthly trends, resulting in a significant improvement in power theft identification. Ref. [22] predicted system energy consumption using pattern decomposition based on the LSTM algorithm and detected abnormal system energy consumption by applying the Grubbs test to the difference between the predicted and actual values, which effectively reduced energy waste during system operation. In [23], the authors combined OC-SVM (one-class support vector machine) and SVDD (support vector data description) with the generic structure of LSTM, using modified formulations to achieve efficient anomaly detection, especially for time series data, that is capable of handling variable-length data sequences.
To address the problems of traditional machine learning methods, which do not apply to multiple variables, require prior knowledge, and require data to follow a multivariate Gaussian distribution, we build on the success of deep learning-based anomaly detection methods and combine a CNN, a Bi-LSTM (Bidirectional Long Short-Term Memory) network, and an attention mechanism with the 3σ criterion to propose a new energy consumption anomaly detection method. The CNN can extract higher-order features from the input data. Compared to the LSTM network, the Bi-LSTM network has the advantage of acquiring contextual information from time series data, because it combines information in both the forward and backward directions. In addition, attention mechanisms have been very successful in the fields of machine translation and image description generation. We use one to assign different weights to different hidden units of the neural network so that the hidden layer focuses more on the key information in the sequence data.
The method applies the CNN, Bi-LSTM, and attention mechanism in a prediction model that mines the contextual information in historical high-dimensional energy consumption data and the contribution of different feature dimensions to the prediction results, and then uses the 3σ criterion to judge energy consumption outliers.
This study therefore develops and experimentally evaluates a method that uses high-dimensional energy consumption data to identify abnormal power consumption behavior of customers in real time. The experimental data include electricity consumption as well as several environmental parameters that affect electricity consumption. The results of this research have potential application in an IoT-based energy management system. In addition, they are not only applicable to the detection of anomalies in electricity consumption but also have potential applications in the detection of anomalies based on other sensor data. The main contributions of this work can be summarized as follows:
A deep learning-based HDEC-AD (High-Dimensional Energy Consumption Anomaly Detection) method for identifying abnormal energy consumption behavior of users;
The method is divided into two stages. The first stage is the prediction stage, where the power at the next moment is predicted from the high-dimensional power-related data collected in real time. The second stage is the anomalous pattern detection stage, where the predicted values are compared with the actual values and the prediction error is calculated; a data point is defined as an anomalous activity when its error exceeds three standard deviations (3 SDs);
Anomaly detection helps building managers understand users’ daily electricity consumption patterns so that they can plan reasonable electricity demand, while users can analyze electricity costs from the anomaly results and thus reduce energy waste.
The rest of the paper is structured as follows: In Section 2, we describe the materials and methods. In Section 3, we perform relevant experimental tests and analyze the results. In Section 4, we provide a discussion. In Section 5, we summarize the full paper and provide an outlook for the future.
2. Materials and Methods
In this section, we explain the various components of the model. The main body of the prediction model consists of a CNN, a Bi-LSTM, and an attention mechanism, and is used to predict the value at the next time point of a given time series. The predicted values are then passed to the anomaly detection module, which determines whether the data point is anomalous.
The experiments are implemented using the Keras deep learning library running on the Google TensorFlow framework [24,25]. The hardware comprises an RTX 2060 SUPER GPU for acceleration and an AMD Ryzen 7 3700X CPU.
As shown in Figure 1, the proposed HDEC-AD includes a prediction model. The model uses 144 consecutive records of high-dimensional data related to users’ energy consumption, sampled at 10 min intervals (i.e., the previous 24 h), to predict the electricity consumption at the next moment. The accuracy of the predictions is evaluated using the Mean Absolute Percentage Error (MAPE) and the Mean Absolute Error (MAE).
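For reference, the two evaluation metrics can be sketched as follows. These are the standard textbook definitions of MAE and MAPE; the function names and sample values are illustrative assumptions, and MAPE as written assumes that no actual value is zero.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the actual value, in percent.

    Assumes no actual value is zero (the ratio is undefined there).
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


actual = [2.0, 4.0, 5.0]      # hypothetical consumption readings
predicted = [1.0, 4.0, 6.0]   # hypothetical model outputs
mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
```

MAE is expressed in the units of the consumption data, whereas MAPE is scale-free, which is why the two are usually reported together.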
As shown in Figure 2, abnormal user energy consumption behavior is then identified by monitoring the difference between the predicted and the actual consumption. If this difference is greater than three times the standard deviation of the actual consumption over the previous 144 moments, the electricity consumption is identified as abnormal, and the next 24 h containing that moment are considered an abnormal electricity consumption period.
2.1. Convolutional Layer
Convolutional neural networks have representational learning capabilities and are capable of extracting higher-order features from the input information. The convolutional layer consists of several feature filters, which are used to compute different feature maps. Each neuron in the convolutional layer is connected to a local region of the previous layer, and the convolution result is obtained by multiplying the input features element-wise with the filter weights, summing the products, and adding a bias. The ReLU activation function is then applied to the convolution result. As shown in Equation (1), the output of the convolutional layer can be expressed as:

$$y_f = \sigma\left(W_f \ast x_f + b_f\right) \tag{1}$$

where $\sigma$ is the activation function, ∗ is the convolution operation, $x_f$ is the input of the f-th feature filter, and both $W_f$ and $b_f$ are learnable parameters of the f-th feature filter.
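The operation of a single feature filter can be sketched in plain Python. This is a simplified illustration that assumes a single-channel 1-D input, stride 1, and no padding; the function names and numbers are hypothetical, and the actual model operates on multi-feature input through the Keras layers.

```python
def relu(x):
    """ReLU activation: pass positive values, clamp negatives to zero."""
    return x if x > 0.0 else 0.0


def conv1d_filter(inputs, kernel, bias):
    """Apply one feature filter to a 1-D sequence (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(inputs) - k + 1):
        window = inputs[start:start + k]
        weighted_sum = sum(w * x for w, x in zip(kernel, window))
        outputs.append(relu(weighted_sum + bias))
    return outputs


sequence = [1.0, 3.0, 2.0, 5.0]   # hypothetical input readings
feature_map = conv1d_filter(sequence, kernel=[-1.0, 1.0], bias=-0.5)
```

With this kernel the filter responds to increases between consecutive readings, and ReLU suppresses the window in which the reading falls.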
2.2. Dropout Layer
In deep learning, models are prone to overfitting when there are too many parameters. Overfitting is a common problem with many deep learning and, more generally, machine learning algorithms; it manifests as high prediction accuracy on the training set and a significant drop in accuracy on the test set. The basic idea of Dropout is shown in the figure. During training, each neuron is retained with probability p (and stops working with probability 1 − p), and each forward propagation retains a different set of neurons, which makes the model less dependent on particular local features and improves generalization. During testing, each parameter is multiplied by p to ensure the same expected output.
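The training and test behavior described above can be sketched as follows. The sketch scales the activations by p at test time, which has the same effect on the next layer as scaling its incoming weights; the function names and values are illustrative assumptions.

```python
import random


def dropout_train(activations, keep_prob, rng):
    """Training pass: keep each unit with probability keep_prob, zero it otherwise."""
    return [a if rng.random() < keep_prob else 0.0 for a in activations]


def dropout_test(activations, keep_prob):
    """Test pass: keep every unit but scale by keep_prob to match the training expectation."""
    return [a * keep_prob for a in activations]


rng = random.Random(0)              # seeded only to make the sketch reproducible
hidden = [1.0, 2.0, 3.0, 4.0]       # hypothetical hidden-layer activations
train_out = dropout_train(hidden, keep_prob=0.5, rng=rng)
test_out = dropout_test(hidden, keep_prob=0.5)
```

Note that the Keras `Dropout` layer is parameterized by the drop rate (1 − p here) and uses the equivalent "inverted" formulation, scaling the retained units during training instead of scaling at test time.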
2.3. Bi-LSTM
Traditional (feed-forward) neural networks assume that the data are independent in time. However, this assumption does not hold for continuous time-series data, for which recurrent neural networks (RNNs) are commonly used. An RNN is a class of recursive neural network that takes sequence data as input, recurses in the direction of sequence evolution, and connects all nodes (recurrent units) in a chain. However, when modeling sequences with long-term dependencies, RNNs suffer from the vanishing gradient problem. Long short-term memory (LSTM) networks are used to solve this problem. The LSTM model is composed of an input $x_t$ at moment $t$, a cell state $C_t$, a temporary cell state $\tilde{C}_t$, a hidden state $h_t$, a forget gate $f_t$, a memory (input) gate $i_t$, and an output gate $o_t$. The computation of the LSTM can be summarized as forgetting information in the cell state and remembering new information, so that information useful for subsequent moments is passed on while useless information is discarded, and the hidden state $h_t$ is output at each time step. Forgetting, memory, and output are controlled by the forget gate $f_t$, the memory gate $i_t$, and the output gate $o_t$, which are calculated from the hidden state $h_{t-1}$ at the previous moment and the current input $x_t$. The overall framework and the memory updates for each time step $t$ are calculated as follows [26]:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \ast \tanh(C_t)$$

In the previous equations, $i_t$, $f_t$, $o_t$, $C_t$, and $h_t$ denote the input gate, forget gate, output gate, memory cell, and hidden state, respectively, and ∗ denotes the element-wise product. The other parameters ($W$ and $b$) are the weights and biases to be learned, shared between all time steps.
LSTMs can better capture dependencies over longer distances but cannot integrate temporal information about the future, so bidirectional long short-term memory neural networks (Bi-LSTMs) were chosen to solve this problem. The Bi-LSTM consists of a forward LSTM and a backward LSTM. The input sequence is fed into the two LSTM networks in forward and reverse order, respectively, for feature extraction, and the two output vectors (i.e., the extracted feature vectors) are concatenated to form the final feature representation. The design concept of the Bi-LSTM is to make the feature data obtained at moment t contain information from both the past and the future.
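The forward and backward passes can be illustrated with a deliberately small sketch. It assumes scalar inputs and a single hidden unit per direction, with the standard LSTM gate equations; all names (`lstm_step`, `run_bilstm`, `make_params`) and the constant parameter values are hypothetical and stand in for learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_params(value):
    """Hypothetical helper: set every weight and bias to the same constant."""
    return {prefix + gate: value for prefix in "wub" for gate in "fico"}


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar input and a single hidden unit."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])          # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])          # input (memory) gate
    c_tilde = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])  # temporary cell state
    c = f * c_prev + i * c_tilde                                   # updated cell state
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])          # output gate
    h = o * math.tanh(c)                                           # hidden state
    return h, c


def run_lstm(sequence, p):
    """Run the cell over a sequence from zero initial states; return all hidden states."""
    h, c, states = 0.0, 0.0, []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states


def run_bilstm(sequence, p_fwd, p_bwd):
    """Pair the forward and backward hidden states for every time step."""
    forward = run_lstm(sequence, p_fwd)
    backward = run_lstm(sequence[::-1], p_bwd)[::-1]  # re-align to original time order
    return list(zip(forward, backward))


sequence = [0.5, -1.0, 2.0]   # hypothetical input values
features = run_bilstm(sequence, make_params(0.5), make_params(-0.3))
```

Each element of `features` pairs a state that has seen the sequence up to that moment with one that has seen it from that moment onward, which is the concatenation described above.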
2.4. Attention Mechanisms
The introduction of an attention mechanism allows the model to better capture information about the entire sequence by selectively focusing on the states of the relevant vectors. The attention model takes the output of the Bi-LSTM as input and places a weight $\alpha_t$ on each moment; $\alpha_t$ is determined by the similarity between the current vector state $h_t$ and all vector states $H = (h_1, h_2, \cdots, h_n)$. It outputs a series of contextual vectors $c_t$ with the same length [27]. The weights and context vectors are calculated as shown in Equations (9) and (10):

$$\alpha_t = \mathrm{softmax}\left(\mathrm{score}(h_t, H)\right) \tag{9}$$
$$c_t = \sum_{i=1}^{n} \alpha_{t,i}\, h_i \tag{10}$$

where $\alpha_t$ is the weight on each moment, $c_t$ is the series of contextual vectors, $h_t$ is the current vector state, $H$ are all vector states, and $\mathrm{score}(\cdot,\cdot)$ denotes the similarity between vector states.
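A minimal sketch of this weighting step is given below. It assumes dot-product similarity followed by a softmax, which is one common choice and is not necessarily the scoring function used in the model; the function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that are positive and sum to one."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention(states):
    """Return (weights, contexts) for a list of hidden-state vectors."""
    all_weights, contexts = [], []
    for h_t in states:
        # Similarity of the current state to every state (assumed: dot product).
        weights = softmax([dot(h_t, h_i) for h_i in states])
        # Context vector: weighted sum of all states.
        context = [sum(w * h_i[d] for w, h_i in zip(weights, states))
                   for d in range(len(h_t))]
        all_weights.append(weights)
        contexts.append(context)
    return all_weights, contexts


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical Bi-LSTM outputs
weights, contexts = attention(states)
```

The output contains one context vector per input moment, so the sequence length is preserved, as stated above.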
2.5. Flatten and Dense Layers
In the last part of the model, a Flatten layer is used to ‘flatten’ the input, i.e., to make the multidimensional input one-dimensional for the transition to the fully connected layer, without affecting the batch size. Finally, the fully connected layer applies an activation function and produces a one-dimensional output.
2.6. Anomaly Detection
We assess the trend in electricity consumption over time by using the standard deviation, setting a threshold value of $3\sigma$ above the predicted value, where $\sigma$ is the standard deviation of electricity consumption on the day before the actual moment [28]. An actual consumption value higher than the threshold derived from the predicted electricity consumption at that moment indicates an abnormal state. $\sigma$ and $T$ are calculated as shown in Equations (11) and (12):

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \tag{11}$$
$$T = \hat{y} + 3\sigma \tag{12}$$

where $\sigma$ is the standard deviation, $T$ is the threshold, $\hat{y}$ is the predicted value, $x_i$ is the electricity consumption, $\bar{x}$ is the average electricity consumption, and $n$ is the number of samples.
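The thresholding rule can be sketched as follows. The sketch assumes the population form of the standard deviation (division by n) and a one-sided test above the predicted value; the function names and the toy history are hypothetical.

```python
import math


def population_std(values):
    """Standard deviation with division by n (assumed form)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def is_anomalous(actual, predicted, history, k=3.0):
    """Flag a reading whose actual value exceeds predicted + k * std(history).

    history: actual consumption over the preceding window
    (144 readings at 10 min intervals in the setting described above).
    """
    threshold = predicted + k * population_std(history)
    return actual > threshold


history = [1.0, 3.0] * 72        # hypothetical previous day: mean 2.0, std 1.0
normal_flag = is_anomalous(actual=4.9, predicted=2.0, history=history)
abnormal_flag = is_anomalous(actual=5.1, predicted=2.0, history=history)
```

In this toy case the threshold is 2.0 + 3 × 1.0 = 5.0, so a reading of 4.9 is accepted and a reading of 5.1 is flagged.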
4. Discussion
Compared with traditional machine learning approaches to energy consumption prediction and anomaly detection, our method can predict customers’ electricity consumption from high-dimensional historical energy consumption data without a priori knowledge and can detect customers’ electricity consumption anomalies in real time. Compared with anomaly detection methods based on LSTM networks alone, our method combines a CNN, a Bi-LSTM, and an attention mechanism and therefore has the advantage of obtaining contextual information from time series data. It considers the impact of different moments in the input sequence on energy consumption and makes the hidden layer focus more on the key information in the sequence data.
The main objective is to use high-dimensional energy consumption data to identify abnormal electricity consumption behavior of users. However, the method has limitations. First, the user’s historical energy consumption information must be obtained in advance to train the model so that it captures the user’s electricity consumption habits, and a day’s worth of energy consumption information must be collected before the model can be used. Second, the system is not well placed to point out high energy consumption behavior if the user’s past electricity habits are poor and their electricity consumption has always been high. In addition, the practical implementation of the system depends on the availability of data streams for the different user profiles, on privacy constraints, and on the available computing power.
After our neural network model is developed and deployed, the data distribution will change for various reasons, and then the model needs to be updated. We use the old model as the base model, combine the old data with the new data, and re-train the model to update the model. The model update interval will be changed according to the actual situation, usually once every six months.
The COVID-19 pandemic and the energy crisis are anomalies that will lead to changes in customers’ electricity consumption habits. Since our approach uses historical energy consumption-related data to build the model, past models will become unreliable. To deal with this anomaly, the best solution is to update the model to reduce the error.
5. Conclusions
In this study, a high-dimensional energy consumption anomaly detection method based on CNN, Bi-LSTM, and attention mechanism is proposed.
The experimental results show that the model obtained by training with historical high-dimensional energy consumption information can effectively reflect the electricity consumption behavior of users. In addition, comparisons in the ablation experiments fully illustrate that the combination of CNN, Bi-LSTM, and attention mechanisms has a better performance compared to using isolated components. In anomaly detection, the resulting model was trained to identify abnormal electricity usage behavior of users in real time. Therefore, it confirms the suitability of the model for anomaly detection.
At the same time, the research helps to establish a real-time anomaly detection system in buildings, through which building managers can plan energy consumption rationally and identify abnormal electricity usage by users. In addition, users can use the system to understand their electricity consumption and reduce energy waste.
In practical applications, information on various parameters required for model training and prediction can be obtained by reading from various sensors installed in the building. These data are transmitted through IoT devices to the cloud for calculation and storage. In addition, due to the development of IoT and communication technologies, our approach is highly applicable by simply training and deploying models in the cloud and sending the energy consumption anomaly detection results to individual building managers and users through the network. In terms of cost, the main cost is the purchase and installation of IoT devices. Since this study uses an open source system, the software licensing cost is effectively reduced.
There are two directions for future research. The first is to continue to improve model accuracy and reduce prediction errors so that the model can better analyze users’ electricity consumption habits. The second concerns concept drift: customers’ electricity consumption habits may change over time, which reduces the applicability of the model, so addressing concept drift is a priority in future work.