Article

Non-Intrusive Load Monitoring Based on the Combination of Gate-Transformer and CNN

School of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325035, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(13), 2824; https://doi.org/10.3390/electronics12132824
Submission received: 28 May 2023 / Revised: 23 June 2023 / Accepted: 25 June 2023 / Published: 26 June 2023

Abstract

Non-intrusive load monitoring (NILM) is the practice of estimating the power consumption of individual household appliances from readings of the household's total power meter. The transformer model has emerged as a popular method for handling NILM problems. However, with the increase in data from electricity meters, there is a need for research focusing on the accuracy and computational complexity of the transformer model. To address this, this paper proposes a sequence-to-sequence load decomposition structure named GTCN, which combines the gate-transformer and convolutional neural networks (CNNs). GTCN introduces a gating mechanism to reduce the number of parameters for training the model while maintaining performance. The introduction of CNNs can effectively capture local features that the gate-transformer may not be able to capture, thereby improving the accuracy of power estimation of individual household appliances. The results of the experiments, based on the UK-DALE dataset, illustrate that GTCN not only demonstrates excellent decomposition performance but also reduces the model parameters compared to conventional transformers. Moreover, despite having the same number of model parameters as the traditional transformer architecture after incorporating CNNs, the proposed GTCN structure outperforms the conventional transformer model, as well as current seq2seq and R-LSTM technologies, and achieves enhanced prediction accuracy and improved generalization capability.

1. Introduction

Due to the deteriorating global environment and the depletion of non-renewable energy resources, energy conservation and emission reduction are crucial topics that cannot be neglected. Monitoring the power consumption of specific household appliances and providing users with accurate, fine-grained home energy consumption evaluation services can be challenging due to the widespread use and high operational flexibility of household appliances. Load monitoring has increasingly demonstrated enormous research potential in energy conservation and management, enabling household users to understand the energy consumption of household equipment more precisely and to optimize power management. In general, load monitoring falls into two categories: intrusive load monitoring (ILM) and non-intrusive load monitoring (NILM). ILM refers to technology that records electricity usage by installing sensors on each electrical appliance; it offers high accuracy but also high implementation costs and complexity. NILM was first proposed by Hart [1] in 1992 using the concept of non-intrusive load decomposition, which records total electricity consumption by installing sensors at the home electricity meter. A series of algorithms is then used to estimate information such as the operating status and energy consumption level of specific electrical appliances. Not only can NILM reduce household energy consumption by 12% [2], which benefits both consumers and power grid businesses economically, but it also plays a significant role in global energy conservation and emission reduction. In comparison to ILM, NILM possesses numerous advantages, including reduced intrusion into consumer residences, minimal financial investment, and excellent practicality [3].
Over the past few years, NILM has received widespread attention from researchers. Optimization and pattern recognition are the primary methods for studying load monitoring. The optimization approach aims to identify the optimal combination of electrical appliances in order to minimize the discrepancy between the aggregated power consumption and the sum of the power consumption of each individual electrical appliance. Hart [1] clustered comparable events of electrical characteristics and used combinatorial optimization (CO) to divide each device into several states, each of which has a corresponding power consumption. Dash [4] and Wittmann [5] address the issue of electrical state overlap by employing mixed integer programming techniques. Although the performance of load disaggregation has been significantly improved, it cannot be used in every instance. In [6], it was suggested to use a load identification method based on the dynamic time warping (DTW) algorithm to measure the similarity between the variable-length raw transient power waveform sample and the template time-series. With the rise of big data and machine learning, many pattern recognition techniques, including k-nearest neighbors (KNNs) [7], support vector machine (SVM) [8], hidden Markov model (HMM) and its variants [9], artificial neural network (ANN) [10], and decision tree (DT) [11], have been applied to NILM. Although these techniques have been proven to increase the accuracy of NILM, they are difficult to implement in practice since they require manual feature extraction from a large data set [12]. Furthermore, as the number of electrical appliances increases, the complexity of the models based on optimization or pattern recognition also increases exponentially.
In this context, the pursuit of alternative and more robust approaches to the issue of energy load forecasting has assumed the utmost significance. With the continuous advancement of scientific computing resources, the implementation of deep learning methods has become increasingly feasible. Researchers have switched their study focus to deep learning applications in order to replace manual labor and automatically extract characteristics from data. This will improve the accuracy and usability of NILM. Deep neural networks (DNNs), including convolutional neural networks (CNNs), recurrent neural networks (RNNs), denoising auto-encoders (DAEs) and transformers, are used. In 2015, Kelly [13] proposed combining CNN, RNN, and DAE with sequence-to-sequence (seq2seq) learning, and compared the resulting models with traditional ones, demonstrating that deep learning has a greater impact than the aforementioned conventional ones. In addition, motivated by self-attention, transformer-based models have also been developed to enhance NILM performance over long input sequences [14]. However, NILM remains a challenging task, and several obstacles must be overcome. Firstly, most advanced technologies have proven unsuccessful in adapting the same model to different users [15]. Thus, constructing a reliable algorithm model that possesses robust generalization capabilities is a formidable task. Secondly, as the amount of meter data increases, the accuracy and computational efficiency of the network model decrease gradually [16,17]. Finally, a single network model alone cannot learn and represent all the intrinsic characteristics of a target appliance from power data, and there is still a need to flexibly use multi-model collocation to complement each model’s deficiencies.
Based on the conventional transformer model, this paper proposes a network structure named GTCN, which combines the transformer with a gating mechanism (gate-transformer) and a CNN, aiming to improve the load decomposition accuracy without increasing the number of model parameters compared with BERT4NILM [14]. The transformer differs from the other network models mentioned above in that it learns the significance of each component in the input sequence and assigns importance weights accordingly, utilizing attention mechanisms to learn global dependencies in the sequence instead of processing the entire load power data sequence step by step. However, its applicability is constrained by its computational complexity. In order to fully utilize the capabilities of the transformer architecture, the proposed model improves the attention function of the multi-head self-attention mechanism and incorporates a gating mechanism to ensure that the accuracy of home appliance power estimation is not compromised despite the reduction in the number of model training parameters. Furthermore, while the transformer has shown excellent performance in sequence prediction, it has limitations in capturing local features [18], which may degrade the decomposition performance for multi-state electrical appliances during energy decomposition. To overcome this limitation, the gate-transformer is combined with a CNN, and experimental analyses are conducted on three different positions in the overall structure to explore the best placement of the CNN within it. The experimental results demonstrate that, compared to the traditional transformer, the gate-transformer has fewer model parameters and the same excellent decomposition performance on the UK-DALE dataset. After the CNN is included, the GTCN structure has the same number of training parameters as the conventional transformer, yet it exhibits superior prediction performance and better generalization ability. Moreover, the GTCN network is also compared with three state-of-the-art models based on different technologies, seq2seq CNN [19], R-LSTM [20], and SGN [21], which further verifies the superiority of the GTCN network.
The main contributions to this article are as follows:
  • An energy decomposition structure based on a gate-transformer is proposed. The gate-transformer is capable of drastically reducing the number of model training parameters without impacting model performance, and even outperforms classic transformer models in the majority of electrical devices.
  • To address the limitations of the transformer, the proposed structure incorporates a CNN to leverage its superior local feature capture ability. This approach effectively captures the global dependencies of the power sequence while considering the local protruding parts, resulting in a reduction in prediction errors in household appliance power estimation and a significant improvement in model performance.
  • The optimal combination of the CNN and gate-transformer is identified by analyzing the advantages and disadvantages of the performance provided by the CNN in various positions within the overall structural configuration.
  • The public UK-DALE dataset is used for model evaluation. A detailed description is given of how the raw data are preprocessed and used for model training; the load power data of one house are used for training, and the data of another house are used for testing to evaluate the generalization ability of the proposed method.

2. Related Work

In recent years, with the advancement of high-end hardware equipment, deep learning has achieved significant accomplishments in the domains of computer vision and natural language processing. Since Kelly [13] first applied deep learning to NILM problems in 2015, significant progress has been made in the application of various deep neural network models to NILM problems.
Although convolutional neural networks (CNNs) were originally designed for two-dimensional image data, they can also serve for sequence-to-sequence (seq2seq) or sequence-to-point (seq2point) learning, as is the case for NILM problems. In 2018, Chen [22] proposed a seq2seq NILM framework based on CNNs, which utilizes gated linear unit convolutional layers to facilitate the mapping of long aggregated power sequences to short power sequences of individual appliances. The output sequence is further refined by residual blocks of fully connected layers, and the partially overlapping output sequences are filtered to produce the final output. Zhang [19] also combined CNNs with seq2point learning that year, applying a sliding window to capture aggregated data within a period of time; the network's output was taken from the midpoint of the power consumption window of the home appliance. Experimental results showed that, on the UK-DALE [23] and REDD [24] datasets, commonly used metrics such as mean absolute error (MAE) and signal aggregate error (SAE) were significantly superior to Kelly's seq2seq. However, compared to seq2point, seq2seq is more suitable for real-time applications [25] because both the input and output vectors belong to the same time frame. In 2019, Shin [21] proposed a subtask gated network (SGN), which uses two deep neural networks for regression and classification, respectively, and forms the final estimate by multiplying the regression output with the classification probability. Furthermore, the utilization of CNNs has also ushered in a novel era of multi-scale information extraction. In order to make full use of multi-scale features, Chen [26] designed a scale- and context-aware network based on the SGN structure by using dilated convolution and self-attention modules, and incorporated an adversarial loss and on-state augmentation to further improve the model's performance. Similarly, in 2022, Grover [27] proposed a multi-head CNN (Mh-Net) under dynamic grid voltage based on SGN, introducing an attention layer and dilated convolution to increase the accuracy of device power consumption estimation. Although the approaches above extract similar kinds of features and most of these networks are dominated by CNNs, they face limitations in processing long-range information. It is therefore difficult to learn rich sequence information from power data using a single network model.
Recurrent neural networks (RNNs) are effective at learning dynamic mappings between input sequences and outputs, utilizing recurrent mechanisms to identify temporal characteristics and correlations in sequences, so they are well-suited to decomposing sequential power signal data. However, due to the vanishing gradient problem, RNNs lack the ability to learn long-term time dependencies, so Kelly [13] chose long short-term memory (LSTM) networks to replace traditional RNNs. On this basis, Krystalakos [28] proposed an online energy disaggregation method based on gated recurrent unit (GRU) neural networks in order to reduce the model complexity caused by the complex gating mechanism of LSTM, which inputs a window of past aggregated data to infer the last point of the window. With the same performance, the trainable parameters of the GRU neural network can be reduced by 60%, and online decomposition can be achieved by using previous sample points to predict the current sample points. Piccialli [29] combined LSTM with the CNN-based SGN architecture outlined above and incorporated attention mechanisms within the architecture to identify the positions of the input sequence where the pertinent information is present. The experimental results demonstrate that this method outperforms the SGN architectures mentioned above. However, using LSTM for modeling has a drawback in that it cannot encode information from back to front. Since bidirectional LSTM (BiLSTM) is a combination of a forward LSTM and a backward LSTM, it can better capture bidirectional semantic dependence. Kaselimi [30] employed the BiLSTM model to solve the multidimensional problem that arises when the number of devices increases and used Bayesian optimization to select the best hyperparameters to enhance the model's performance, creating a unique optimal model that can adapt to the individual settings and seasonal variations of each device. Peng [31] constructed a teacher model and a student model to implement NILM based on knowledge distillation, where the teacher model consists of bidirectional GRU (BiGRU) and the attention mechanism and achieves superior decomposition results. However, relying on a single BiLSTM or BiGRU makes it difficult to obtain good spatial information.
Over the past two years, the transformer model has been proposed as an alternative architecture for sequence modeling tasks [32] and has achieved state-of-the-art performance. This is because, in contrast to RNNs, the transformer can access information at arbitrary positions in the sequence instantaneously and overcomes the limitation that RNNs cannot be computed in parallel. Traditional transformers do not rely on past hidden states to capture dependencies; instead, they process sequences as a whole, reducing the risk of losing past information. In [33], the transformer was first applied to NILM, and two different network structures were proposed, one containing only multiple encoder blocks and the other retaining the original encoder–decoder structure. The experimental results showed that, compared with existing RNNs and CNNs, the transformer improved the accuracy and robustness of NILM and reduced the training cost. In addition, [14] adopted a transformer-based architecture (BERT4NILM) that utilizes self-attention for energy decomposition to process power signal sequence data, which is superior to state-of-the-art models in most scenarios. Despite the transformer architecture's apparent suitability for the NILM challenge, its applicability is constrained by issues of efficiency and computational complexity. Sykiotis [34] introduced an efficient model training routine for BERT4NILM, which mainly divides model training into pre-training and training routines, reducing training time by up to 50%. However, this method undoubtedly increases the memory burden in terms of the number of model parameters.

3. Proposed Method

The NILM problem is defined as predicting the energy consumption of each appliance based on the total power consumption signal of the entire house measured by the main meter. Let the total power consumption of the user over a period of time $N$ be $P(N) = (p_1, p_2, \ldots, p_N)$, let $M$ be the number of electrical appliances in the house, and let the power sequence of the $m$-th appliance be $x^m = (x_1^m, x_2^m, \ldots, x_N^m)$. Assuming that each device has only two states, ON and OFF, the total power at time $t$ can be expressed as:
$$p_t = \sum_{m=1}^{M} s_t^m x_t^m + e_t$$
where $s_t^m$ represents the status of electrical appliance $m$ at moment $t$: if appliance $m$ is in the ON state, then $s_t^m = 1$; if it is in the OFF state, then $s_t^m = 0$. $x_t^m$ denotes the power consumption of appliance $m$ when it is in use, and $e_t$ represents the noise, which includes measurement noise and all unknown loads. The total power over a period of time $N$ can be expressed as the sum of the power consumption at each moment:
$$p(N) = \sum_{t=1}^{N} p_t$$
Under the NILM framework, only the total power consumption $P(N)$ of the user is given, while the power consumption $x^m(N)$ of each individual appliance over the period $N$ is unknown, so the problem is to estimate $x^m(N)$ given $P(N)$. The estimate $\hat{x}^m(N)$ of $x^m(N)$ is related to $P(N)$ over a time window; thus, a decomposition mapping for electrical appliance $m$ can be obtained by training a network model:
$$f_m : P(N) \rightarrow x^m(N)$$
where $f_m$ represents the nonlinear function learned by the network for electrical appliance $m$. Using this function, the power consumption sequence of appliance $m$ can be mapped from the total power consumption sequence $P(N)$.
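For illustration, the additive model above can be simulated in a few lines of Python; the appliance names and power values below are arbitrary examples rather than data from UK-DALE, and serve only to make the roles of $p_t$, $s_t^m x_t^m$, and $e_t$ concrete.

```python
import numpy as np

# Minimal illustration of the additive NILM model p_t = sum_m s_t^m * x_t^m + e_t,
# using synthetic per-appliance power traces (values here are arbitrary examples).
N = 8                                    # number of time steps
x = {                                    # per-appliance power draw when ON (watts)
    "kettle": np.array([0, 0, 2000, 2000, 0, 0, 0, 0]),
    "fridge": np.array([90, 90, 90, 0, 0, 90, 90, 90]),
}
noise = np.random.normal(0, 5, size=N)   # e_t: measurement noise / unknown loads

# The ON/OFF state s_t^m is implicit in the traces (power > 0 means ON).
aggregate = sum(x.values()) + noise      # p_t observed at the main meter

# NILM learns an inverse mapping f_m: P(N) -> x^m(N) for each appliance m;
# a network would be trained to recover x["kettle"] etc. from `aggregate`.
print(aggregate.round(1))
```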

3.1. Gate-Transformer Model Construction

The architecture of the traditional transformer model [34] is shown in Figure 1a. This model is mainly composed of a multi-head attention (MHA) module and a position-wise feedforward network (PFFN) from bottom to top. The MHA module receives the normalized input sequence, which undergoes a series of linear transformations, and then uses the scaled dot-product attention mechanism to assign importance weights to each position in the sequence. The process of calculating the single-head scaled dot-product attention is shown in Figure 1b. The input sequence after linear transformations is divided into query (Q), keyword (K), and value (V) matrices. Q and K are first multiplied, followed by scaling and softmax operations, and then multiplied by the V matrix. The specific attention score results are as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $d_k$ is the dimension of the $K$ matrix, and the square root corresponds to the scaling operation mentioned above. Attention can essentially be understood as identifying which parts of a sequence are important for predicting the output while ignoring those that are not. Because traditional attention mechanisms have a high computational cost and slow training speed when training large-scale transformers, this article simplifies the softmax function to a squared ReLU function [35] to reduce the burden of self-attention and allow the model to converge in a shorter time. However, this simplification affects the learning ability of the model to some extent. Therefore, to reduce computational complexity while retaining efficient model performance, we propose a gate-transformer model. The idea of adding a gating mechanism is somewhat similar to the attention mechanism mentioned in [29]; unlike that work, however, the gating mechanism in this paper is applied to the entire sequence rather than a single sequence point, and the overall attention calculation process also differs significantly. As shown in Figure 2a, a set of gating (G) matrices is added during the linear transformation of the input sequence to introduce the gating mechanism into the attention calculation and improve the accuracy of model decomposition. The procedure for calculating the single-head gated scaled dot-product attention is shown in Figure 2b. The specific gated attention score is as follows:
$$\mathrm{GAttention}(Q, K, V, G) = \left(\mathrm{ReLU}^{2}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V\right)G$$
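As a concrete reading of the gated attention score above, the following is a minimal PyTorch sketch of single-head gated scaled dot-product attention, in which softmax is replaced by a squared ReLU and the result is modulated by a fourth, gating projection G. The tensor names and the element-wise combination with G are assumptions, since the paper defines the operation only at the level of Figure 2b.

```python
import torch
import torch.nn.functional as F

def gated_attention(q, k, v, g):
    """Single-head gated scaled dot-product attention (sketch).

    q, k, v, g: (batch, seq_len, d_k) tensors produced by four separate
    linear projections of the input sequence.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq, seq)
    weights = F.relu(scores) ** 2                   # squared ReLU replaces softmax
    context = weights @ v                           # (batch, seq, d_k)
    return context * g                              # element-wise gating (assumed)

# Example: batch of 2 windows, 6 time steps, head dimension 32
x = torch.randn(2, 6, 32)
q, k, v, g = (torch.nn.Linear(32, 32)(x) for _ in range(4))
out = gated_attention(q, k, v, g)                   # -> shape (2, 6, 32)
```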
Similarly, the MHA module divides the hidden space into multiple subspaces with parameter matrices, each with a separate attention mechanism that computes its own attention scores in parallel. Ultimately, the outcomes of the subspaces are merged:
$$\mathrm{MHA}(Q, K, V, G) = \mathrm{Concat}\left(\mathrm{GAttention}(Q_i, K_i, V_i, G_i)\right), \quad i \in \{1, \ldots, h\}$$
After the attention scores are obtained as described above, they are passed through the linear layer and normalization layer to the position-wise feedforward network. This layer uses two linear transformations and the GELU activation function to process the input elements [36]. Denoting the MHA output as $Y$ and the weight matrices and bias vectors of the two linear transformations as $W_1$, $b_1$ and $W_2$, $b_2$, respectively, the calculation of the position-wise feedforward network is as follows:
$$\mathrm{PFFN}(Y) = \mathrm{GELU}(YW_1 + b_1)W_2 + b_2$$
Note that after both the MHA module and the PFFN module, residual connections are introduced to retain the input sequence features, and dropout is applied at each normalization layer to increase the stability of the model. Compared with the traditional transformer, the gate-transformer has less dependence on the accuracy of attention, yet its performance is even better.
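Putting the pieces together, a hedged sketch of one gate-transformer block might look as follows. The model dimension of 256 and h = 2 heads follow the settings listed in Section 4.2; the feedforward width, the pre-norm placement of layer normalization, and the per-head element-wise gating are assumptions rather than details stated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GateTransformerLayer(nn.Module):
    """One gate-transformer block: gated MHA + position-wise FFN (sketch)."""
    def __init__(self, d_model=256, n_heads=2, d_ff=512, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        # Q, K, V and the additional gating projection G
        self.proj = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(4)])
        self.out = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def _split(self, t):  # (B, T, d_model) -> (B, h, T, d_k)
        B, T, _ = t.shape
        return t.view(B, T, self.h, self.d_k).transpose(1, 2)

    def forward(self, x):
        q, k, v, g = (self._split(p(self.norm1(x))) for p in self.proj)
        scores = F.relu(q @ k.transpose(-2, -1) / self.d_k ** 0.5) ** 2
        heads = (scores @ v) * g                    # gated attention per head
        B, _, T, _ = heads.shape
        merged = heads.transpose(1, 2).reshape(B, T, -1)
        x = x + self.drop(self.out(merged))         # residual connection 1
        x = x + self.drop(self.ffn(self.norm2(x)))  # residual connection 2
        return x

layer = GateTransformerLayer()
y = layer(torch.randn(2, 240, 256))                 # -> shape (2, 240, 256)
```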

3.2. Network Structure Based on the Combination of Gate-Transformer and CNN

As shown in Figure 3, the proposed architecture in this article is based on the BERT4NILM [14] framework, which mainly consists of an embedding module, a gate-transformer module, and an output module. Because the gate-transformer module cannot learn the order of the input sequence, it is necessary to use the embedding module to embed the feature values and encode the position information of the input sequence. Firstly, a sliding window with a given step size is used to create training windows for the input sequence. These windows are then fed to the embedding module in batches. The embedding module uses a 1D convolution with a filter width of 5 and a stride of 1 to transform the input sequence into a vector of 256 dimensions. Then, the convolutional output is subjected to an Lp pooling operation to preserve features and reduce the length of each one-dimensional vector by half. Meanwhile, position encoding generates position vectors with the same dimension as the feature value embedding for the input sequence, which can effectively represent the relative and absolute position information of the input sequence. Finally, the position vector is added to the feature vector to achieve the goal of having sequence position information in the input of the transformer layer.
$$E(X) = \mathrm{LpPooling}(\mathrm{Conv}(X)) + E_{POSE}$$
$$E_{POSE}(pos, 2i) = \sin\!\left(\frac{pos}{10000^{2i/d}}\right)$$
$$E_{POSE}(pos, 2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)$$
where $pos$ represents the sequence position and $i$ represents the dimension.
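For illustration, the embedding module could be sketched in PyTorch as below. The 1D convolution (filter width 5, stride 1, padding 2, 256 channels) and the Lp pooling (kernel 2, stride 2) follow the parameters given in Section 4.2, while the Lp norm order p = 2 and the maximum sequence length are assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingModule(nn.Module):
    """Sketch of the embedding module: 1D conv + Lp pooling + sinusoidal positions."""
    def __init__(self, d_model=256, max_len=512, p=2):
        super().__init__()
        # filter width 5, stride 1, padding 2 keep the sequence length unchanged
        self.conv = nn.Conv1d(1, d_model, kernel_size=5, stride=1, padding=2)
        # Lp pooling with kernel 2 and stride 2 halves the sequence length
        self.pool = nn.LPPool1d(norm_type=p, kernel_size=2, stride=2)
        # precomputed sinusoidal position encodings E_POSE(pos, i)
        pos = torch.arange(max_len).unsqueeze(1)
        i = torch.arange(0, d_model, 2)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos / 10000 ** (i / d_model))
        pe[:, 1::2] = torch.cos(pos / 10000 ** (i / d_model))
        self.register_buffer("pe", pe)

    def forward(self, x):                          # x: (batch, window_len) aggregate power
        h = self.pool(self.conv(x.unsqueeze(1)))   # (batch, d_model, window_len/2)
        h = h.transpose(1, 2)                      # (batch, window_len/2, d_model)
        return h + self.pe[: h.size(1)]            # add position information

emb = EmbeddingModule()
out = emb(torch.randn(4, 480))                     # window length 480 -> (4, 240, 256)
```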
The output matrix generated by the embedded module is passed to the transformer module which comprises multiple gate-transformers. The structure of each gate-transformer is described in Section 3.1. The transformer module is able to further extract features from the input sequence on the basis of the embedding module and learn the global dependencies in the sequence. Ultimately, the output module employs deconvolution operations to extend the output matrix of the transformer module to the window length of its initial sequence. Subsequently, a two-layer linear transformation with Tanh function activation is applied to restore the output matrix’s dimension to the desired output size.
$$\mathrm{Out}(X) = \mathrm{Tanh}(\mathrm{Deconv}(X)W_3 + b_3)W_4 + b_4$$
where $W_3$, $b_3$ and $W_4$, $b_4$ are the weight matrices and bias vectors of the two linear transformations, respectively.
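A corresponding sketch of the output module is given below. The deconvolution parameters (filter size 4, stride 2, padding 1, 256 filters) come from Section 4.2; the hidden width of the first linear layer and the single output channel per time step are assumptions.

```python
import torch
import torch.nn as nn

class OutputModule(nn.Module):
    """Sketch of the output module: deconvolution back to the window length,
    then two linear layers with a Tanh in between."""
    def __init__(self, d_model=256, d_hidden=128):
        super().__init__()
        # filter size 4, stride 2, padding 1 doubles the sequence length back to 480
        self.deconv = nn.ConvTranspose1d(d_model, d_model, kernel_size=4,
                                         stride=2, padding=1)
        self.fc1 = nn.Linear(d_model, d_hidden)    # W3, b3
        self.fc2 = nn.Linear(d_hidden, 1)          # W4, b4 (one power value per step)

    def forward(self, x):                          # x: (batch, T/2, d_model)
        h = self.deconv(x.transpose(1, 2))         # (batch, d_model, T)
        h = torch.tanh(self.fc1(h.transpose(1, 2)))
        return self.fc2(h).squeeze(-1)             # (batch, T) appliance power

out = OutputModule()(torch.randn(4, 240, 256))     # -> shape (4, 480)
```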
Due to the limitations of the transformer in capturing the local features of sequences, and given that CNNs offer greater advantages in local feature extraction, this article incorporates a CNN into the aforementioned network structure to enhance the precision of the load energy consumption decomposition. The circled numbers in Figure 3 mark the three positions where the CNN can be added, selected based on the proposed network structure's learning process. The first position extracts deeper features of the original sequence before the position encoding is added; the second position performs a series of feature extractions on the feature vector after the position encoding is added and before it is sent to the transformer module; the third position performs a series of deep feature extractions on the feature vectors after the global dependencies of the sequence have been learned by the transformer module. These three positions are chosen with the aim of learning more useful sequence features. However, to avoid the overfitting problem caused by a deeper network, this paper selects the best position for the CNN through experimental data analysis. The specific design of the CNN, shown in Figure 4, is composed of six convolution layers, in which the size, number, stride, padding, and activation function of the convolutional filters are marked.
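A placeholder sketch of such a six-convolution-layer block is shown below. The exact filter sizes, channel counts, strides, padding, and activations are specified only in Figure 4 and are not reproduced here, so the values below are illustrative assumptions that merely preserve the input/output dimensionality, which would allow the block to be dropped in at, for example, the third position after the gate-transformer module.

```python
import torch
import torch.nn as nn

def make_local_cnn(d_model=256, hidden=128, kernel=3):
    """Placeholder six-layer 1D CNN for local feature extraction (parameters are
    illustrative; the actual values are given in Figure 4 of the paper)."""
    layers, c_in = [], d_model
    for _ in range(5):
        layers += [nn.Conv1d(c_in, hidden, kernel, stride=1, padding=kernel // 2),
                   nn.ReLU()]
        c_in = hidden
    # last convolution maps back to d_model so the surrounding modules are unchanged
    layers += [nn.Conv1d(c_in, d_model, kernel, stride=1, padding=kernel // 2)]
    return nn.Sequential(*layers)

cnn = make_local_cnn()
y = cnn(torch.randn(4, 256, 240))      # length-preserving: -> (4, 256, 240)
```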

3.3. Loss Function

The loss function quantifies the discrepancy between the output of the network model and the designated target value, and training aims to minimize this loss. To allow a fair demonstration of the proposed network structure's advantages, the same loss function as in the comparative literature [14], which combines several loss terms for comprehensive evaluation, is adopted. The specific loss function is as follows:
$$L(x, s) = \frac{1}{T}\sum_{i=1}^{T}(\hat{x}_i - x_i)^2 + D_{KL}\!\left(\mathrm{softmax}(\hat{x}/\tau)\,\|\,\mathrm{softmax}(x/\tau)\right) + \frac{1}{T}\sum_{i=1}^{T}\log\!\left(1 + \exp(-\hat{s}_i s_i)\right) + \frac{\lambda}{T}\sum_{i \in o}\left|\hat{x}_i - x_i\right|$$
where $x$, $\hat{x}$ are the real and predicted values of the energy consumption series, and $s$, $\hat{s}$ are the status labels and predicted status of the electrical appliances. The sequence length is represented by $T$, and $o$ refers to the set of incorrectly predicted samples and of time points when the state of the appliance is ON. In addition, the hyperparameters $\tau$ and $\lambda$ are introduced to control the softmax temperature coefficient and to weight the absolute error term of the sequence. Since most electrical appliances are in the OFF state most of the time, the value of the hyperparameter $\tau$ is kept consistent with the literature [14]. Note that the electrical status is obtained by comparing the predicted power with the ON-threshold.
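The composite loss could be written in PyTorch roughly as below; this is a sketch, not the reference implementation. The soft-margin status term assumes labels in {-1, +1}, and the KL term is computed with log-softmax over the whole window, both common readings of the BERT4NILM-style loss that should be checked against the original code.

```python
import torch
import torch.nn.functional as F

def nilm_loss(x_hat, x, s_hat, s, on_idx, tau=1.0, lam=1.0):
    """Composite loss (sketch): MSE + KL divergence on softened distributions
    + soft-margin status term + L1 on the set o of ON / mis-predicted samples.

    x_hat, x: predicted and true power sequences, shape (T,)
    s_hat, s: predicted and true ON/OFF status in {-1, +1}, shape (T,)
    on_idx:   boolean mask selecting the set o
    """
    mse = F.mse_loss(x_hat, x)
    kl = F.kl_div(F.log_softmax(x_hat / tau, dim=0),
                  F.softmax(x / tau, dim=0), reduction="sum")
    margin = torch.log1p(torch.exp(-s_hat * s)).mean()        # soft-margin status loss
    l1_on = lam * (x_hat - x)[on_idx].abs().sum() / x.numel()  # lambda/T * sum over o
    return mse + kl + margin + l1_on
```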

4. Experiments

In this section, the proposed NILM network structure based on gate-transformer (GTNILM) is compared with the BERT4NILM structure from the literature [14]. Additionally, the aforementioned CNN model is incorporated at three positions of the GTNILM for experimental analysis in order to determine the optimal position of the CNN. Note that the network structure after adding the CNN will be named GTCN in the experiments, where the GTCN networks formed with the CNN at the three different positions are named GTCN-1, GTCN-2, and GTCN-3, respectively.
The experimental environment is a Windows 11 64-bit operating system with a 12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz processor and 16 GB of RAM. The whole model is implemented in the PyTorch deep learning framework and accelerated using a GPU.

4.1. Dataset

The effectiveness of the model is evaluated using the real-life measured UK-DALE dataset, which recorded the electricity consumption of five London households approximately every 6 s between November 2012 and January 2015. For each household, the dataset contains the total electricity consumption and the consumption of more than 10 individual appliances. Although the recording period varies by household, the sampling frequency of all power signals is 1/6 Hz.
The experiments select refrigerators (FRZ), washing machines (WM), dishwashers (DW), microwaves (MW), and kettles (KT) as the target household appliances for decomposition. The reasons are as follows: the operating characteristics of refrigerators are periodic; the working modes of washing machines and dishwashers are more diverse, and their operating characteristics are more complex; microwave ovens and kettles have high operating power and short operating time cycles. During the experiments, 80% of the data from house 1 is utilized as the training set to train the model, while the remaining 20% of the data from house 1 is utilized as the test set. Because only house 1 and house 2 contain all the target appliances, in order to verify the generalization ability of the proposed network structure, the experiments take house 2 as the migration object, keeping all parameters of the model unchanged during the migration process.

4.2. Data Preprocessing and Network Parameters

In the experiment, a sliding window of aggregated power is used as the input sequence, and the power window sequence corresponding to the target appliance is used as the output sequence. Due to the presence of noise signals and some inconsistencies in the aggregated power measurements of the UK-DALE dataset, the following data preprocessing was conducted: (1) aligning timestamps and resampling the aggregated power and individual appliance power data to 6 s; (2) eliminating missing values; (3) forward-filling gaps shorter than 3 min; (4) normalization. Table 1 lists the relevant parameters of each target appliance after preprocessing, such as the maximum power, ON-threshold, minimum ON duration, minimum OFF duration, and the hyperparameter λ required by the loss function of each appliance. Based on these parameters, the power value of each electrical appliance is compared with its ON-threshold; when the duration exceeds the minimum ON or OFF duration, the actual status label of the appliance is considered to have changed. Furthermore, during the experiments, the sliding window input length is initially set to 480 and the span to 240. It is noteworthy that the window length and span are not optimal for the proposed method.
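For illustration, the preprocessing steps might be sketched with pandas as below. The function assumes the mains and appliance signals are pandas Series with a DatetimeIndex, and the normalization scheme (clipping to the maximum power and scaling) is an assumption, since the paper does not state the exact normalization used.

```python
import pandas as pd

def preprocess(mains: pd.Series, appliance: pd.Series, max_power: float) -> pd.DataFrame:
    """Sketch of the preprocessing steps described above (names are illustrative)."""
    # (1) align timestamps and resample both signals to a 6 s grid
    mains = mains.resample("6s").mean()
    appliance = appliance.resample("6s").mean()
    df = pd.concat({"mains": mains, "appliance": appliance}, axis=1)
    # (3) forward-fill gaps shorter than 3 minutes (30 samples at 6 s)
    df = df.ffill(limit=30)
    # (2) drop any remaining missing values
    df = df.dropna()
    # (4) normalization: clip to the appliance maximum power, then scale (assumed)
    df["appliance"] = df["appliance"].clip(upper=max_power) / max_power
    df["mains"] = df["mains"] / df["mains"].max()
    return df
```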
For the GTCN network structure, there are two gate-transformer layers, and each layer has h = 2 attention heads. The GTNILM network structure requires parameter settings for some layers: (1) the 1D convolutional layer has a filter size of 5, a filter number of 256, a stride of 1, and a padding length of 2; (2) the deconvolution layer has a filter size of 4, a filter number of 256, a stride of 2, and a padding length of 1; (3) the Lp pooling layer has a filter size of 2 and a stride of 2. During the training process, the model is not subjected to masked training. The learning rate is set to 0.0001, the optimizer is Adam, and the dropout rate is 0.1. The number of training epochs is 10, and the batch size is 64.
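The training configuration could be exercised with a schematic loop such as the one below. Only the optimizer, learning rate, epoch count, and batch size come from the text above; the model here is a trivial stand-in nn.Module, the random tensors stand in for preprocessed windows, and the MSE criterion is a placeholder for the composite loss sketched earlier.

```python
import torch
import torch.nn as nn

# Schematic training setup matching the settings above; the real GTCN would be
# assembled from the modules sketched in Section 3.
model = nn.Sequential(nn.Linear(480, 256), nn.GELU(), nn.Linear(256, 480))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # learning rate 0.0001
criterion = nn.MSELoss()               # placeholder for the composite loss above

for epoch in range(10):                # 10 training epochs
    for _ in range(5):                 # a few dummy batches for illustration
        agg = torch.rand(64, 480)      # batch of 64 aggregate-power windows (length 480)
        target = torch.rand(64, 480)   # corresponding appliance-power windows
        optimizer.zero_grad()
        loss = criterion(model(agg), target)
        loss.backward()
        optimizer.step()
```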

4.3. Evaluation Metrics

The experiment adopted four metrics widely used in NILM research: accuracy, F1 score, signal aggregate error (SAE), and mean absolute error (MAE). The accuracy and F1 score are utilized to evaluate the performance of the electrical appliance state activation, whereas SAE and MAE are utilized to evaluate the precision of individual electrical appliance active power estimates. It should be noted that MAE measures the average deviation of estimated power from actual power at each moment, whereas SAE measures the relative error of power estimation throughout the evaluation period. The specific calculations are as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$F_1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{SAE} = \frac{\left|\sum_t \hat{y}_t - \sum_t y_t\right|}{\sum_t y_t}$$
where $TP$ is the number of moments when the load is correctly recognized as active; $TN$ is the number of moments when the load is correctly assessed to be OFF; $FP$ is the number of moments when the load is not working but is recognized as working; $FN$ is the number of moments when the load is actually working but is incorrectly evaluated as OFF; $N$ is the number of power sequence points; $y$ is the true value of the power and $\hat{y}$ is the estimated value of the power.
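The four metrics can be computed directly from the predicted and true sequences, as in the short numpy sketch below; the argument names are illustrative.

```python
import numpy as np

def nilm_metrics(y_hat, y, s_hat, s):
    """Accuracy, F1, MAE, and SAE as defined above (sketch).

    y_hat, y: estimated and true power sequences (numpy arrays)
    s_hat, s: boolean ON/OFF states derived from the power and the ON-threshold
    """
    tp = np.sum(s_hat & s)            # correctly recognized as active
    tn = np.sum(~s_hat & ~s)          # correctly recognized as OFF
    fp = np.sum(s_hat & ~s)           # predicted working while actually OFF
    fn = np.sum(~s_hat & s)           # predicted OFF while actually working
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = tp / (tp + 0.5 * (fp + fn))
    mae = np.mean(np.abs(y_hat - y))
    sae = np.abs(y_hat.sum() - y.sum()) / y.sum()
    return accuracy, f1, mae, sae
```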

4.4. Experimental Results

This article conducts experiments on two different NILM evaluation scenarios based on the UK-DALE dataset, namely, the visible scenario (the portion of house 1 data held out from training) and the invisible scenario (house 2). The experimental results of the two scenarios are shown in Table 2 and Table 3, respectively, where bold font marks the best-performing algorithm for each device.
Based on the results presented in Table 2, it is evident that the GTNILM network showed improved overall decomposition performance across the performance indicators compared to the BERT4NILM network in the visible scenario. The overall improvement in the F1 and accuracy indicators, which measure the status of electrical appliances, is relatively small because both network structures obtain the state of each appliance by comparing its predicted power value with the ON-threshold rather than through the direct output of the network. The overall error in the MAE indicator decreased by 11%, with reductions of 32%, 9%, 5%, and 19% for kettles, refrigerators, washing machines, and dishwashers, respectively. Furthermore, the overall error in the SAE indicator is reduced by 40%, with kettles, refrigerators, washing machines, and dishwashers reduced by 54%, 97%, 2%, and 45%, respectively. For microwaves, the power estimation indicators of the two network structures are basically the same. From the comparison of the overall visible-scenario results, it is evident that the GTCN-3 network structure achieved the best overall performance on all indicators. This indicates that, among the three positions where the CNN was introduced into the GTNILM network structure, the introduction at the third position was most effective. In particular, for the MAE and SAE power estimation indicators, compared with the GTNILM network without the CNN, the overall performance of the GTCN-3 network improved by 16% and 33%, respectively.
From Table 3, it is evident that in the invisible scenario, the GTNILM network has slightly lower overall decomposition performance across the performance indicators than the BERT4NILM network, mainly reflected in the performance for microwaves and dishwashers. Considering that the amount of data for these two types of appliances in the active state is limited and the number of model training iterations is small, adding more training data and increasing the number of training iterations may further enhance the decomposition performance for these appliances. Further comparing the overall invisible-scenario results, it can be observed that although the BERT4NILM network has a high F1 score, the GTCN-3 network structure still achieves the best overall performance in the other indicators. This indicates that the combination of the GTNILM network and the CNN effectively improves the generalization ability of the overall network structure in NILM. In particular, for the MAE and SAE power estimation indicators, compared with the GTNILM network without the CNN, the overall performance of the GTCN-3 network improved by 32.6% and 64.7%, respectively. Compared with the BERT4NILM network, the two indicators also improved by 13% and 61%, respectively.
The experiments also compared the above models in terms of model training parameter quantity, as shown in Figure 5. From the graph, it can be observed that compared with the BERT4NILM network, the GTNILM network significantly reduces both the training parameters and computational complexity, which reduces the calculation cost in the training process on the premise of ensuring the power decomposition performance. Due to the invariance of the CNN’s structural parameters, the three network structures of GTCN maintain consistency in terms of model parameter quantity, but in terms of model calculation quantity, the GTCN-1 network is much larger than the GTCN-2 and GTCN-3 networks. According to the above network decomposition performance, the model computation of the GTCN-3 network is more expensive than that of GTNILM without the CNN, but its decomposition performance is much better than that of GTNILM. Moreover, the model computation of the GTCN-3 network is basically the same as that of the GTCN-2 and BERT4NILM networks. Therefore, while deepening the network structure, the GTCN-3 network maintains the same computational cost as the BERT4NILM network but significantly improves the power decomposition performance.
Figure 6 shows examples of different models predicting various devices in the dataset, among which the three structures of the GTCN network outperform other networks for most electrical appliances, especially the GTCN-3 network. From the power decomposition curve of the kettles, it can be seen that the decomposition power of the GTCN-3 network is closer to the actual power, and the decomposition curve of the network is relatively smooth. The decomposition performance of the electrical appliance power peak is more prominent compared to other networks, which can be clearly observed in the decomposition results of the refrigerators. In addition, the misjudgment rate of the GTCN-3 network on various electrical appliances is also relatively low. From the decomposition effect of washing machines and microwaves, BERT4NILM, GTNILM, GTCN-1, and GTCN-2 all experienced situations where the equipment is not actually working but is predicted to be working. Finally, although the decomposition performance of the GTCN-3 network on dishwashers deviates from the actual power, its decomposition performance is significantly better than other networks from a visual perspective.
In order to further validate the effectiveness of the GTCN-3 network, three typical networks, seq2seq [19], R-LSTM [20], and SGN [21], are introduced in the experiment for evaluation. During the experiment, changes were made only to the network model, and the other experimental parameters remained the same as before. Because the R-LSTM network was unable to decompose the active power of the microwave oven under the parameters set in this experiment, only the other four devices were selected for comparison, as shown in Figure 7. From the overall results, it can be seen that the GTCN-3 network is more prominent in the various performance indicators, whether in the visible or the invisible scenario. This method effectively avoids the risk of the LSTM losing past information due to its reliance on past hidden states, while combining the characteristics of the gate-transformer with the advantages of the CNN to achieve better decomposition results.

5. Conclusions

In this article, an energy decomposition model based on the combination of gate-transformer and a CNN is proposed. It has been proved that the proposed gate-transformer can significantly reduce the training parameters of traditional transformer models without affecting the overall performance of the model. The introduction of the CNN compensates for the disadvantage of the transformer’s lack of local feature capture ability. The combination of both not only effectively grasps the global dependency and local prominence of the power sequence but also avoids the complex model training computational burden caused by the addition of a CNN.
Through experimental analysis on the UK-DALE dataset, the GTCN-3 network achieved optimal decomposition performance compared to other models in both visible and invisible scenarios, determining the optimal position of the CNN in the architecture. Our proposed model shows the best performance compared to other state-of-the-art deep learning models in terms of F1, accuracy, MAE, and SAE. At the same time, the power decomposition depictions of five typical electrical appliances—microwaves, refrigerators, kettles, dishwashers, and washing machines—are given, which intuitively show the effectiveness of the model proposed in this paper in real power tracking. However, there is still some distance between the GTCN-3 network itself and its practical application. For some electrical appliances, such as microwave ovens, the decomposition performance of the GTCN-3 network needs to be further improved.
In future work, this study will continue to optimize the proposed network structure to further reduce the number of model parameters and improve its own generalization ability on the premise of ensuring the accuracy of load decomposition. Since our goal is to apply the network structure to actual practice, more devices will be put into the network structure for load decomposition, and we will strive to bring higher decomposition efficiency at low storage costs.

Author Contributions

Conceptualization, Z.Z. (Zhoupeng Zai); methodology, Z.Z. (Zhoupeng Zai) and Z.Z. (Zhengjiang Zhang); software, Z.Z. (Zhoupeng Zai) and H.L.; validation, Z.Z. (Zhoupeng Zai) and Z.Z. (Zhengjiang Zhang); formal analysis, Z.Z. (Zhoupeng Zai) and N.S.; investigation, Z.Z. (Zhoupeng Zai) and N.S.; resources, S.Z.; data curation, Z.Z. (Zhoupeng Zai) and H.L.; writing—original draft preparation, Z.Z. (Zhoupeng Zai); writing—review and editing, S.Z. and Z.Z. (Zhengjiang Zhang); visualization, Z.Z. (Zhoupeng Zai); supervision, S.Z. and Z.Z. (Zhengjiang Zhang); project administration, S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52077158.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Hart, G.W. Nonintrusive appliance load monitoring. Proc. IEEE 1992, 80, 1870–1891.
2. Luan, W.P.; Zhang, R.Q.; Liu, B. Leveraging sequence-to-sequence learning for online non-intrusive load monitoring in edge device. Electr. Power Energy Syst. 2023, 148, 108910.
3. Breschi, V.; Piga, D.; Bemporad, A. Online end-use energy disaggregation via jump linear models. Control Eng. Pract. 2019, 89, 30–42.
4. Dash, S.; Sodhi, R.; Sodhi, B. An Appliance Load Disaggregation Scheme Using Automatic State Detection Enabled Enhanced Integer Programming. IEEE Trans. Ind. Inform. 2021, 17, 1176–1185.
5. Wittmann, F.M.; López, J.C.; Rider, M.J. Nonintrusive Load Monitoring Algorithm Using Mixed-Integer Linear Programming. IEEE Trans. Consum. Electron. 2018, 64, 180–187.
6. Liu, B.; Luan, W.P.; Yu, Y. Dynamic Time Warping based Non-intrusive Load Transient Identification. Appl. Energy 2017, 195, 634–645.
7. Tabatabaei, S.M.; Dick, S.; Xu, W. Toward non-intrusive load monitoring via multi-label classification. IEEE Trans. Smart Grid 2017, 8, 26–40.
8. Liu, Q.Z.; Shen, Y.B.; Wu, L.; Li, J.; Zhuang, L.; Wang, S. A hybrid FCW-EMD and KF-BA-SVM based model for short-term load forecasting. CSEE J. Power Energy Syst. 2018, 4, 226–237.
9. Wu, Z.; Wang, C.; Zhang, H.Q.; Peng, W.; Liu, W. A time-efficient factorial hidden Semi-Markov model for non-intrusive load monitoring. Electr. Power Syst. Res. 2021, 199, 107372.
10. Zheng, Z.; Chen, H.; Luo, X. A supervised event-based non-intrusive load monitoring for non-linear appliances. Sustainability 2018, 10, 1001.
11. Lin, J.; Ding, X.; Qu, D.; Li, H. Non-intrusive load monitoring and decomposition method based on decision tree. J. Math. Ind. 2020, 10, 1.
12. Wang, S.X.; Chen, H.W. A Novel Deep Learning Method for The Classification of Power Quality Disturbances Using Deep Convolutional Neural Network. Appl. Energy 2019, 235, 1126–1140.
13. Kelly, J.; Knottenbelt, W. Neural NILM: Deep Neural Networks Applied to Energy Disaggregation. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments, New York, NY, USA, 4–5 November 2015; pp. 55–64.
14. Yue, Z.; Witzig, C.R.; Jorde, D.; Jacobsen, H.A. BERT4NILM: A Bidirectional Transformer Model for Non-Intrusive Load Monitoring. In Proceedings of the 5th International Workshop on Non-Intrusive Load Monitoring, New York, NY, USA, 18 November 2020; pp. 89–93.
15. Murray, D.; Stankovic, L.; Stankovic, V.; Lulic, S.; Sladojevic, S. Transferability of Neural Network Approaches for Low-rate Energy Disaggregation. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 8330–8334.
16. Çavdar, İ.H.; Feryad, V. Efficient Design of Energy Disaggregation Model with BERT-NILM Trained by AdaX Optimization Method for Smart Grid. Energies 2021, 14, 4649.
17. Athanasiadis, C.L.; Doukas, D.I.; Papadopoulos, T.A.; Barzegkar-Ntovom, G.A. Real-Time Non-Intrusive Load Monitoring: A Machine-Learning Approach for Home Appliance Identification. In Proceedings of the 2021 IEEE Madrid PowerTech, Madrid, Spain, 28 June–2 July 2021; pp. 1–6.
18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
19. Zhang, C.Y.; Zhong, M.J.; Wang, Z.Z.; Goddard, N.; Sutton, C. Sequence-to-Point Learning With Neural Networks for Non-Intrusive Load Monitoring. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 2604–2611.
20. Rafiq, H.; Zhang, H.; Li, H.; Ochani, M.K. Regularized LSTM Based Deep Learning Model: First Step towards Real-Time Non-Intrusive Load Monitoring. In Proceedings of the 2018 IEEE International Conference on Smart Energy Grid Engineering (SEGE), Oshawa, ON, Canada, 12–15 August 2018; pp. 234–239.
21. Shin, C.; Joo, S.; Yim, J.; Lee, H.; Moon, T.; Rhee, W. Subtask Gated Networks for Non-Intrusive Load Monitoring. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January 2019; pp. 1150–1157.
22. Chen, K.; Wang, Q.; He, Z. Convolutional Sequence to Sequence Non-intrusive Load Monitoring. Engineering 2018, 17, 1860–1864.
23. Kelly, J.; Knottenbelt, W. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Sci. Data 2015, 2, 150007.
24. Kolter, J.Z.; Johnson, M.J. REDD: A public data set for energy disaggregation research. In Proceedings of the Workshop on Data Mining Applications in Sustainability (SIGKDD), San Diego, CA, USA, 21 August 2011; pp. 59–62.
25. Cimen, H.; Wu, Y.; Wu, Y.P.; Terriche, Y.; Vasquez, J.C.; Guerrero, J.M. Deep Learning-based Probabilistic Autoencoder for Residential Energy Disaggregation: An Adversarial Approach. IEEE Trans. Ind. Inform. 2022, 18, 8399–8408.
26. Chen, K.J.; Zhang, Y.; Wang, Q.; Hu, J.; Fan, H.; He, J. Scale- and Context-Aware Convolutional Non-Intrusive Load Monitoring. IEEE Trans. Power Syst. 2020, 35, 2362–2373.
27. Grover, H.; Panwar, L.; Verma, A.; Panigrahi, B.K.; Bhatti, T.S. A multi-head Convolutional Neural Network based non-intrusive load monitoring algorithm under dynamic grid voltage conditions. Sustain. Energy Grids Netw. 2022, 32, 100938.
28. Krystalakos, O.; Nalmpantis, C.; Vrakas, D. Sliding Window Approach for Online Energy Disaggregation Using Artificial Neural Networks. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence, New York, NY, USA, 9–12 July 2018; pp. 1–6.
29. Piccialli, V.; Sudoso, A.M. Improving Non-Intrusive Load Disaggregation through an Attention-Based Deep Neural Network. Energies 2021, 14, 847.
30. Kaselimi, M.; Doulamis, N.; Doulamis, A.; Voulodimos, A.; Protopapadakis, E. Bayesian-optimized Bidirectional LSTM Regression Model for Non-intrusive Load Monitoring. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 2747–2751.
31. Peng, B.G.; Qiu, L.X.; Yu, T.; Zhong, L.; Liu, Y. Incorporating Knowledge Distillation Into Non-intrusive Load Monitoring for Hardware Systems Deployment. In Proceedings of the 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021; pp. 3054–3058.
32. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
33. Lin, N.; Zhou, B.; Yang, G.; Ma, S. Multi-head attention networks for nonintrusive load monitoring. In Proceedings of the 2020 IEEE International Conference on Signal Processing, Communications and Computing, Macau, China, 21–24 August 2020; pp. 1–5.
34. Sykiotis, S.; Kaselimi, M.; Doulamis, A.; Doulamis, N. ELECTRIcity: An Efficient Transformer for Non-Intrusive Load Monitoring. Sensors 2022, 22, 2926.
35. So, D.R.; Mańke, W.; Liu, H.; Dai, Z.; Shazeer, N.; Le, Q.V. Primer: Searching for efficient transformers for language modeling. In Proceedings of the Advances in Neural Information Processing Systems 34, Virtual, 6–14 December 2021; pp. 6010–6022.
36. Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415.
Figure 1. Overview of the traditional transformer model. (a) Traditional transformer structure. (b) Traditional scaled dot-product attention.
Figure 2. Overview of the gate-transformer model. (a) Gate-transformer structure. (b) Gated scaled dot-product attention.
Figure 3. Network structure based on the gate-transformer.
Figure 4. Structure and parameters of the CNN model.
Figure 5. Training parameter quantity and calculation quantity of each model.
Figure 6. Decomposition examples of different models on the UK-DALE dataset.
Figure 7. Performance comparison of four models for each indicator.
Table 1. Relevant parameters of various electrical appliances after UK-DALE preprocessing.

Device | λ | Max. Power | ON-Threshold | Min. ON Duration | Min. OFF Duration
KT | 1 | 3100 W | 2000 W | 12 s | 0 s
FRZ | 10⁻⁶ | 300 W | 50 W | 60 s | 12 s
WM | 10⁻³ | 2500 W | 20 W | 1800 s | 160 s
MW | 1 | 3000 W | 200 W | 12 s | 30 s
DW | 1 | 2500 W | 10 W | 1800 s | 1800 s
Table 2. Performance comparison of various models in the visible scenario.

Device | Model | Acc. | F1 | MAE | SAE
KT | BERT4NILM | 0.996 | 0.737 | 9.24 | 0.33
KT | GTNILM | 0.997 | 0.841 | 6.28 | 0.15
KT | GTCN-1 | 0.997 | 0.803 | 8.05 | 0.06
KT | GTCN-2 | 0.997 | 0.804 | 7.46 | 0.198
KT | GTCN-3 | 0.998 | 0.862 | 5.63 | 0.158
FRZ | BERT4NILM | 0.797 | 0.77 | 23.19 | 0.19
FRZ | GTNILM | 0.82 | 0.817 | 21.09 | 0.005
FRZ | GTCN-1 | 0.828 | 0.82 | 20.97 | 0.003
FRZ | GTCN-2 | 0.804 | 0.805 | 18.7 | 0.12
FRZ | GTCN-3 | 0.83 | 0.83 | 20.97 | 0.003
WM | BERT4NILM | 0.982 | 0.879 | 14.15 | 0.1
WM | GTNILM | 0.982 | 0.874 | 13.41 | 0.098
WM | GTCN-1 | 0.97 | 0.804 | 14.75 | 0.116
WM | GTCN-2 | 0.98 | 0.85 | 14.28 | 0.17
WM | GTCN-3 | 0.984 | 0.87 | 13.45 | 0.1
MW | BERT4NILM | 0.988 | 0.556 | 15.65 | 0.347
MW | GTNILM | 0.987 | 0.567 | 15.83 | 0.364
MW | GTCN-1 | 0.988 | 0.467 | 17.01 | 0.778
MW | GTCN-2 | 0.985 | 0.59 | 14.1 | 0.1
MW | GTCN-3 | 0.991 | 0.684 | 11.99 | 0.137
DW | BERT4NILM | 0.946 | 0.4 | 18.99 | 0.759
DW | GTNILM | 0.96 | 0.27 | 15.34 | 0.42
DW | GTCN-1 | 0.91 | 0.32 | 25.95 | 0.24
DW | GTCN-2 | 0.94 | 0.4 | 23.74 | 1.23
DW | GTCN-3 | 0.972 | 0.32 | 10.85 | 0.18
Average | BERT4NILM | 0.9418 | 0.6684 | 16.244 | 0.3452
Average | GTNILM | 0.9492 | 0.6738 | 14.39 | 0.2074
Average | GTCN-1 | 0.9386 | 0.6428 | 16.848 | 0.2424
Average | GTCN-2 | 0.9412 | 0.6898 | 16.11 | 0.3402
Average | GTCN-3 | 0.955 | 0.7132 | 12.124 | 0.139
Table 3. Performance comparison of various models in the invisible scenario.

Device | Model | Acc. | F1 | MAE | SAE
KT | BERT4NILM | 0.997 | 0.83 | 13.27 | 0.39
KT | GTNILM | 0.997 | 0.86 | 12.4 | 0.35
KT | GTCN-1 | 0.997 | 0.84 | 0.33 | 0.004
KT | GTCN-2 | 0.999 | 0.94 | 0.24 | 0.003
KT | GTCN-3 | 0.998 | 0.89 | 0.33 | 0.004
FRZ | BERT4NILM | 0.827 | 0.806 | 22.73 | 0.26
FRZ | GTNILM | 0.86 | 0.85 | 20.49 | 0.176
FRZ | GTCN-1 | 0.8445 | 0.832 | 21.16 | 0.2
FRZ | GTCN-2 | 0.846 | 0.843 | 19.4 | 0.13
FRZ | GTCN-3 | 0.888 | 0.885 | 17.6 | 0.045
WM | BERT4NILM | 0.99 | 0.64 | 6.48 | 0.61
WM | GTNILM | 0.985 | 0.575 | 6.48 | 0.4
WM | GTCN-1 | 0.969 | 0.4 | 13.25 | 0.58
WM | GTCN-2 | 0.981 | 0.47 | 6.79 | 0.58
WM | GTCN-3 | 0.99 | 0.59 | 6.79 | 0.4
MW | BERT4NILM | 0.996 | 0.735 | 3.53 | 0.011
MW | GTNILM | 0.989 | 0.47 | 6.9 | 0.22
MW | GTCN-1 | 0.996 | 0.67 | 5.93 | 0.68
MW | GTCN-2 | 0.987 | 0.43 | 8.92 | 0.33
MW | GTCN-3 | 0.992 | 0.51 | 7.87 | 0.09
DW | BERT4NILM | 0.97 | 0.73 | 10.82 | 0.126
DW | GTNILM | 0.97 | 0.62 | 7.02 | 0.397
DW | GTCN-1 | 0.96 | 0.66 | 28.8 | 0.55
DW | GTCN-2 | 0.96 | 0.62 | 22.36 | 0.03
DW | GTCN-3 | 0.97 | 0.61 | 16.78 | 0.005
Average | BERT4NILM | 0.9564 | 0.7482 | 11.366 | 0.2794
Average | GTNILM | 0.9602 | 0.671 | 14.658 | 0.3086
Average | GTCN-1 | 0.9533 | 0.6804 | 13.894 | 0.4028
Average | GTCN-2 | 0.9546 | 0.6606 | 11.542 | 0.2146
Average | GTCN-3 | 0.9676 | 0.697 | 9.874 | 0.1088
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
