Article

A Distributed Non-Intrusive Load Monitoring Method Using Karhunen–Loeve Feature Extraction and an Improved Deep Dictionary

School of Electrical and Electronic Engineering, North China Electric Power University, Baoding 071003, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(19), 3970; https://doi.org/10.3390/electronics13193970
Submission received: 1 September 2024 / Revised: 29 September 2024 / Accepted: 30 September 2024 / Published: 9 October 2024
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)

Abstract

In recent years, non-intrusive load monitoring (NILM) methods based on sparse coding have shown promising research prospects. Methods of this type learn a sparse dictionary for each monitored target device and cast load disaggregation as a problem of signal reconstruction from dictionaries and sparse vectors. Existing sparse-coding NILM methods suffer from several problems: they cannot be applied to multi-state and time-varying devices, they rely on a single load feature, and they recognize devices with similar features poorly in distributed settings. Motivated by this analysis, this paper focuses on household devices with similar features and proposes a distributed non-intrusive load monitoring method using Karhunen–Loeve (KL) feature extraction and an improved deep dictionary. First, Karhunen–Loeve expansion (KLE) is used to perform a subspace expansion of the target device's power waveform, and a new load feature is extracted by combining it with singular value decomposition (SVD) for dimensionality reduction. The states of all target devices are then modeled as super states, and an improved deep dictionary based on a distance separability measure function (DSM-DDL) is learned for each super state. The state transition probability matrix and observation probability matrix of a hidden Markov model (HMM) are introduced as the basis for selecting candidate dictionaries during load disaggregation. The KL feature matrix of the power observations and the improved deep dictionary are used to identify the current super state under the minimum reconstruction error criterion. Test results on the UK-DALE dataset show that the KL feature matrix effectively reduces the load similarity between devices. Combined with DSM-DDL, it offers good information acquisition ability at acceptable computational complexity, effectively improves the disaggregation accuracy for similar devices, and quickly and accurately estimates the working status and power demand of household appliances.

1. Introduction

NILM technology is one of the key technologies for implementing load management; it can monitor the usage of electrical equipment in buildings online. Through NILM, users can understand their own electricity usage characteristics, and power companies can gain a detailed picture of users' consumption behavior. This improves the planning, operation, and management capabilities of the power grid and supports two-way interaction and intelligent electricity consumption between the two parties [1]. NILM mainly monitors household and commercial appliances. According to their working modes, these appliances can be divided into three categories [2]: on–off-state devices, multi-state devices, and time-varying-state devices.
The earliest NILM technique was proposed by Hart in 1992; it used load characteristics such as power, voltage, and reactive power measured by household electricity meters to model devices as finite state machines [3]. This method decomposes simple switch-state devices fairly successfully, but it performs poorly on multi-state and time-varying-state devices. At present, one development direction of NILM systems is to rely on the existing data collection equipment and communication network of the power information collection system, adopt advanced data communication technology to obtain fine-grained user power load data, and then use the powerful data processing capabilities of the power cloud platform to run more complex and accurate load disaggregation algorithms [4,5,6], such as k-NN clustering and Fisher discrimination [4], FCM fuzzy clustering [7], subtractive clustering [8], quadratic linear programming [10], and deep learning methods based on seq2seq and attention mechanisms [11]. Reference [4] designed an AdaBoost sample selection algorithm that simplifies the load feature library by analyzing the distribution of load samples in the feature space; it combines the k-nearest neighbors (k-NN) classifier and the kernel Fisher discriminant to identify loads while controlling the risk of misclassification. For low-power appliances, Reference [7] selects the most prominent harmonic amplitude in the frequency domain as a new feature and proposes a new NILM method: incremental feature extraction captures changes in load characteristics, information entropy determines the optimal number of clusters and load similarity via inter-cluster entropy, and fuzzy clustering then monitors the number and types of appliances. Reference [8] proposed a low-sampling-rate non-intrusive appliance load monitoring (NIALM) method based on subtractive clustering and maximum likelihood classifiers; compared with K-means-based NIALM, it is less sensitive to power grid noise.
Load characteristics are an important basis for load disaggregation, and the load characteristics of equipment can be roughly divided into steady-state and transient characteristics [12]. Steady-state characteristics mainly include parameters such as steady-state voltage, steady-state power, and reactive power, while transient characteristics mainly include transient voltage, transient power, and the power at the moment of switching. Because obtaining transient data places high demands on sampling accuracy (sampling frequencies above 10 kHz), transient features are difficult to use for large-scale residential or commercial electricity monitoring. Steady-state characteristics are therefore more widely used in practice.
NILM methods that use steady-state characteristics mostly rely on analyzing the historical operating data of individual devices and infer the current device state from observation data. Reference [9] proposed a new NILM method based on linear-chain conditional random fields (CRFs) that combines current-signal and real-power features; it relaxes the independence assumption, avoids the label bias problem, and can effectively identify and detect multi-state household appliances. Reference [10] first introduced sparse coding and dictionary learning into NILM, proposing a discriminative disaggregation sparse coding method suited to low-sampling-rate data. Although its decomposition results were not fully satisfactory, it brought new ideas to NILM research, and load disaggregation based on dictionary learning has since received increasing attention. Reference [11] combines dictionary learning with time-based prior probabilities and proposes "powerlet" technology to learn load features. Reference [13] combines sparse coding with deep learning concepts and uses deep sparse coding for load disaggregation: it learns a multi-layer dictionary (also called a deep dictionary) for each device in the scene and then obtains a corresponding sparse matrix from the deep dictionary and the measurement data as the basis for judging device status. Deep dictionary learning effectively improves disaggregation accuracy, but it also has shortcomings: devices must be assumed to be simple switch-state devices, and dictionaries are learned separately for each device, which breaks the interconnection between devices and accumulates the recognition errors of individual devices, degrading the judgment of the overall state. In addition, owing to limitations of the dictionary learning framework used, existing sparse-coding load disaggregation methods cannot recognize devices from multiple load features simultaneously. Using only a single load feature is highly susceptible to interference from feature aliasing between devices, yielding poor recognition of devices with similar features. As technology advances, the variety of household electrical equipment keeps growing and load characteristics gradually converge, so existing steady-state load features can no longer effectively distinguish and identify equipment; the load characteristics of equipment need to be explored further to meet the needs of NILM [12].
This paper proposes a non-intrusive load monitoring method that combines the KLE load feature matrix with deep dictionary learning improved by a distance separability measure, addressing the problems of existing sparse-coding NILM methods: inapplicability to multi-state and time-varying devices, reliance on a single load feature, and poor recognition of similar devices. The method first quantizes the states of the multi-state and time-varying-state devices in the training sample set and aggregates the target device states into super states. Using KL expansion combined with SVD-based dimensionality reduction, a new feature, the KLE feature matrix, is extracted from the total power of the target devices, and a super-state feature library is established from it. With the KL feature matrix of each super state as the learning sample, a distance-based class separability measure function is used to improve the objective function of deep dictionary learning for super states whose load features are similar and at high risk of misjudgment. The spatial sparsity of the super states is exploited to reduce the computational complexity of load disaggregation; the current super state is identified, and the state and power of each device are estimated under the minimum reconstruction error criterion. Tests on the UK-DALE [14] power dataset verify that the proposed load disaggregation method based on the KLE feature matrix and improved deep dictionary learning achieves low similarity between equipment load features, fast disaggregation, and improved recognition accuracy for appliances with similar features.

2. Establishment of a Super-State Feature Library

A super state is an aggregation of the states of all target devices; each super state corresponds uniquely to one combination of the internal states of the individual devices. The super state preserves the correlation between device operations, and the states of all devices can be read directly from the super state during load disaggregation, which reduces the impact of device feature aliasing. The process of establishing the super-state feature library is shown in Figure 1 and is introduced in detail in Section 2.1.

2.1. Equipment State Division and Super-State Modeling

Most existing NILM methods simply assume that devices are switch-state appliances; in practice, however, most devices have multiple states or time-varying states. The power difference between the different gears and functions of an appliance can be significant, and recording state changes while recording electrical data creates a heavy workload. Therefore, this paper first uses a peak-binning method to partition the states of the devices in the dataset and then models the devices. Suppose the distribution of the power values $P = \{P_1, P_2, \ldots, P_n\}$ of device $m$ is as shown in Table 1. We first count the occurrences of each power value $P_n$ and its proportion $PY_m(P_n)$. Then, we search for the peak powers $y_{peak}^{(m)}$ from low to high along $P$; a peak must satisfy the requirements of Equation (1).
$$PY_m(P_n) - PY_m(P_{n-1}) > 0, \quad PY_m(P_{n+1}) - PY_m(P_n) < 0, \quad PY_m(P) > 0.002 \tag{1}$$
The threshold of 0.002 on the peak value in the formula filters out background noise, that is, smaller peaks. After the search is completed, the peaks are numbered, with each peak representing one quantized state of the device. The power value at the peak is used as the quantized power of that state, and non-peak power values are assigned to the state of the nearest peak. As shown in Table 1, the equipment has six power values but only two peaks, $y_{peak}^{(m)}[0] = 0$ and $y_{peak}^{(m)}[1] = 300\ \mathrm{W}$. The power values are therefore divided into two quantized states, $Q^{(m)} \in \{0, 1\}$, with quantized powers $y[Q^{(m)} = 0] = 0$ and $y[Q^{(m)} = 1] = 300\ \mathrm{W}$, respectively. At the same time, the quantized state corresponding to each power value of device $m$ in the dataset is obtained, e.g., $Q[y^{(m)} = 200] = 1$ and $Q[y^{(m)} = 100] = 0$.
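To make the procedure concrete, the following Python sketch implements the peak-binning quantization described above. The function name `quantize_states` and its interface are ours for illustration, and boundary values are treated as peaks when the one-sided conditions of Equation (1) hold; treat it as a sketch rather than the authors' exact implementation.

```python
import numpy as np

def quantize_states(power, peak_threshold=0.002):
    # Sketch of the peak-binning quantization; names are illustrative.
    power = np.asarray(power, dtype=float)
    values, counts = np.unique(power, return_counts=True)
    ratios = counts / counts.sum()

    # Search for peaks from low to high power (Equation (1)): the
    # proportion rises into the peak, falls after it, and exceeds the
    # background-noise threshold of 0.002.
    peaks = []
    for n in range(len(values)):
        rises = n == 0 or ratios[n] - ratios[n - 1] > 0
        falls = n == len(values) - 1 or ratios[n + 1] - ratios[n] < 0
        if rises and falls and ratios[n] > peak_threshold:
            peaks.append(values[n])
    peaks = np.asarray(peaks)

    # Every reading is binned into the state of its nearest peak; the
    # peak power itself is the quantized power of that state.
    states = np.argmin(np.abs(power[:, None] - peaks[None, :]), axis=1)
    return states, peaks
```

For the example in Table 1, the two peaks at 0 W and 300 W yield two states, and a 200 W reading falls into state 1.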
Next, following Reference [15], the states of all devices in the house are aggregated into super states. Assuming there are $M$ target devices in the scene, the super state $S_t$ at time $t$ can be represented directly by the quantized states of the appliances as $S_t = (X^{(1)}, X^{(2)}, \ldots, X^{(M)})$, where $X^{(m)}$ is the quantized state of device $m$ in $S_t$.
After the super states of all target devices are established, the super state corresponding to the total power at each moment in the training set can be determined, and the state transition probability matrix and observation probability matrix can be built from the super-state sequence. If the number of total-power observation values in the scene is $N$ and the number of super states is $L$, then $A$ is a transition probability matrix of size $L \times L$ and $B$ is an observation probability matrix of size $L \times N$. The elements of matrices $A$ and $B$ are defined by Equations (2) and (3), respectively.
$$A[i,j] = P_a(i,j) = P[S_t = j \mid S_{t-1} = i] \tag{2}$$
$$B[j,n] = P_b(y_t, j) = P[y_t = n \mid S_t = j] \tag{3}$$
where $S_t$ is the super state at time $t$, $y_t$ is the observed value of $S_t$, and $S_{t-1}$ is the super state at time $t-1$. In this paper, the observed value of a super state refers to the total power data in the test dataset.
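As a minimal sketch, $A$ and $B$ can be estimated by counting, assuming the super-state sequence and the discretized total-power observations are aligned in time; names such as `build_hmm_matrices` are illustrative, not from the original paper.

```python
import numpy as np

def build_hmm_matrices(super_states, observations, L, n_obs):
    # Count-based estimates of Equations (2) and (3).
    A = np.zeros((L, L))       # super-state transition counts
    B = np.zeros((L, n_obs))   # observation counts per super state
    for t in range(1, len(super_states)):
        A[super_states[t - 1], super_states[t]] += 1
    for s, y in zip(super_states, observations):
        B[s, y] += 1
    # Normalize counts into probabilities, guarding against empty rows.
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1)
    B /= np.maximum(B.sum(axis=1, keepdims=True), 1)
    return A, B
```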

2.2. Spatial Distribution of Load Characteristics

Decomposing the load into super states requires a substantial amount of training and testing samples. A super-state load sample, represented by a high-dimensional feature vector, corresponds to a point in the load feature space. Consequently, all samples of a super state within a dataset form a cluster in the load feature space, referred to as the class domain of the super state. This class domain encompasses all possible load feature values for that super state. If the load features of two super states differ significantly, their respective class domains are well separated, with no overlap. However, when the features of two super states are highly similar, their class domains overlap, as illustrated in Figure 2. In such cases, it becomes challenging to accurately determine the category of a test sample located within the overlapping region.
The ability to recognize samples in the overlapping area directly affects the accuracy of the NILM system. To reduce the overlapping region between super-state class domains, load features with low similarity should be selected on the one hand, and the overlapping region should be shrunk within the load disaggregation algorithm on the other. Along these two directions, this paper proposes a load feature extraction method based on KLE and an improved deep dictionary based on a distance separability measure function; the two methods are described in detail in Section 2.3 and Section 3.2, respectively.

2.3. Load Feature Extraction Based on KLE

Let $X = [X(0), X(1), \ldots, X(N)]^T$ denote the power sequence of a particular appliance, with autocorrelation matrix $\Phi_{XX}$ of size $\tilde{N} \times \tilde{N}$. Performing an eigenvalue decomposition of $\Phi_{XX}$ yields $\tilde{N}$ orthogonal eigenvectors $q_0, q_1, \ldots, q_{\tilde{N}-1}$, and the Karhunen–Loeve (KL) transform of the sample $X$ is expressed as follows:
$$\tilde{x} = Q^T X \tag{4}$$
In Equation (4), $\tilde{x}$ is the KL-transformed signal, and $Q = [q_0, q_1, \ldots, q_{\tilde{N}-1}]$ is a unitary matrix, so $Q^T Q = Q Q^T = I$. Hence, the original signal $X$ can be perfectly reconstructed by the inverse KL transform:
$$X = Q\tilde{x} = \sum_{i=0}^{\tilde{N}-1} (q_i^T X)\, q_i \tag{5}$$
Equation (5) shows that the signal $X$ can be decomposed into $\tilde{N}$ uncorrelated spectral components $x_0, x_1, \ldots, x_{\tilde{N}-1}$, where $x_i = (q_i^T X)\, q_i$. These uncorrelated spectral components are called the subspace components (SCs) of the signal $X$. The SCs expand the $N \times 1$ sampled power vector into an $\tilde{N} \times \tilde{N}$ SC matrix $X_{SC} = [x_0, x_1, \ldots, x_{\tilde{N}-1}]$. The KL expansion decomposes $X$ into multiple subspaces with minimal truncation mean-square error. By reducing the dimensionality of the SC matrix and retaining only the elements within each SC that best represent the original signal characteristics, the similarity between signals can be reduced to some extent. Therefore, this paper uses a dimensionality reduction method based on singular value decomposition (SVD) to analyze the subspace matrix and extract deeper information hidden in the power data.
Singular value decomposition is a widely used technique in signal processing. Applying SVD-based reduction to the subspace matrix removes noise and correlated components from the signal, leaving only the primary features of the samples; it also provides some filtering of outliers caused by data acquisition equipment or equipment malfunctions. During dimensionality reduction, the covariance matrix $X_{SC} X_{SC}^T$ of $X_{SC}$ is decomposed to obtain its eigenvalue matrix and left singular matrix $U$.
The eigenvalue matrix is used to determine the number of rows $k$ of the reduced matrix, as expressed in Formula (6):
$$k = \min_k \left( 1 - \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{\tilde{N}} \lambda_i} \le 0.05 \right) \tag{6}$$
In the formula, $\lambda_i$ is the $i$-th eigenvalue of the covariance matrix $X_{SC} X_{SC}^T$. The purpose of Formula (6) is to select the smallest $k$ that retains more than 95% of the information in the original matrix. Once $k$ is determined, the first $k$ columns of $U$ form the reduced-dimension matrix $U_k$, and the $k \times \tilde{N}$ matrix $Z$ computed with Formula (7) is called the KLE load feature matrix of the equipment.
$$Z = U_k^T X_{SC} \tag{7}$$
To obtain a relatively stable subspace, the length of a sampling sequence should be much smaller than the total length of the device-state power signal and should be chosen according to the data sampling frequency. In the UK-DALE dataset, the total lengths of the device-state power signals range between 60 and 80 samples. This paper therefore selects five samples as the length of a sampling sequence, giving a $5 \times 5$ autocorrelation matrix, i.e., $N = \tilde{N} = 5$.
In actual measurement data, the power value of each super state is not unique; even when the internal states of the appliances do not change, small power fluctuations still occur within a super state. The approach used in this paper is to take five consecutive total-power samples from the current super state each time a super-state transition occurs in the training set. If super state $i$ appears $r$ times in the dataset, then $r$ power vectors of length 5 are obtained, and their KL feature matrices constitute the KLE feature class domain of super state $i$. The mean of the $r$ power vectors is computed, and the KLE feature matrix of this mean vector is taken as the KLE central feature matrix of super state $i$; this central feature matrix is then used as the training sample for dictionary learning.
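The following Python sketch traces Equations (4)-(7) for one power sample vector. The paper does not spell out its autocorrelation estimator, so a Toeplitz estimate from lagged products is assumed here; the function and variable names are ours for illustration.

```python
import numpy as np

def kle_feature_matrix(x, energy_loss=0.05):
    # Sketch of KLE feature extraction (Equations (4)-(7)) for one
    # length-N power sample vector x.
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Assumed Toeplitz autocorrelation estimate: Phi[i, j] = r(|i - j|).
    r = np.array([np.dot(x[:n - lag], x[lag:]) / n for lag in range(n)])
    phi = r[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]

    # KL transform: eigenvectors of the autocorrelation matrix (Eq. (4)).
    _, Q = np.linalg.eigh(phi)

    # Subspace components x_i = (q_i^T x) q_i form the SC matrix (Eq. (5)).
    X_sc = np.column_stack([(Q[:, i] @ x) * Q[:, i] for i in range(n)])

    # SVD-based reduction of the SC covariance (Eqs. (6) and (7)):
    # keep the smallest k retaining >= 95% of the eigenvalue energy.
    lam, U = np.linalg.eigh(X_sc @ X_sc.T)
    lam, U = lam[::-1], U[:, ::-1]            # sort descending
    ratio = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(ratio, 1.0 - energy_loss) + 1)
    Z = U[:, :k].T @ X_sc                      # k x N KLE feature matrix
    return Z
```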

3. Improved Deep Dictionary Learning

This paper proposes an improvement to the objective function of deep dictionary learning by incorporating a distance-based class separability measure function and applying it to load decomposition. Section 3 provides a detailed introduction to the optimization method of DSM-DDL and outlines the specific process for performing load decomposition using this approach.

3.1. Deep Dictionary Learning

Reference [10] was the first to introduce the method of sparse coding and dictionary learning into NILM. Sparse coding is essentially a modeling approach, where a signal matrix can be linearly represented by a subset of elements from a dictionary. One of the key aspects of classification methods based on sparse representation is dictionary learning. The learning process primarily involves optimizing the dictionary through training samples and minimizing an objective function.
Assume that the target device's power has been time-sampled by a smart meter and is represented as a column vector $X$. This vector can be represented by a dictionary $D$ and a sparse matrix $Z$ as follows:
$$X = DZ \tag{8}$$
This is a typical sparse dictionary learning problem, and the objective function is obtained by $L_1$ regularization of the sparse matrix $Z$:
$$\min_{D,Z} \|X - DZ\|_F^2 + \lambda \|Z\|_1 \tag{9}$$
When $X$ is known, the device dictionary $D$ can be obtained by optimizing Formula (9) with the K-SVD method [16]. If there are multiple devices in the scene and additive load characteristics such as power are used, the total smart-meter reading $X$ is assumed to be the sum of the readings $X_m$ of the $M$ devices. The total power is modeled as
$$X = \sum_{m=1}^{M} X_m = \sum_{m=1}^{M} D_m Z_m \tag{10}$$
The objective function is
$$\min_{Z_1, \ldots, Z_M} \left\| X - [D_1 \cdots D_M] \begin{bmatrix} Z_1 \\ \vdots \\ Z_M \end{bmatrix} \right\|_F^2 + \lambda \left\| \begin{bmatrix} Z_1 \\ \vdots \\ Z_M \end{bmatrix} \right\|_1 \tag{11}$$
Deep dictionary learning extends the single-layer dictionary of Formula (8) to multiple layers:
$$X = D_1 D_2 \cdots D_l Z, \quad l = 1, 2, 3, \ldots \tag{12}$$
The corresponding objective function is
$$\min_{D_1, \ldots, D_l, Z} \|X - D_1 D_2 \cdots D_l Z\|_F^2 + \lambda \|Z\|_1 \tag{13}$$
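For intuition, here is a toy alternating-minimization sketch of the single-layer objective in Equation (9). The paper's cited solver is K-SVD [16]; this sketch substitutes a plain proximal-gradient sparse coding step and a least-squares dictionary step, so it illustrates the objective rather than reproducing the authors' solver.

```python
import numpy as np

def learn_dictionary(X, n_atoms, lam=0.05, n_iter=50):
    # Toy alternating minimization of Equation (9); illustrative only.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[0], n_atoms))
    Z = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_iter):
        # Sparse coding step: one proximal-gradient (ISTA-style) update.
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
        Z = Z - step * D.T @ (D @ Z - X)
        Z = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
        # Dictionary step: least squares with column normalization.
        D = X @ np.linalg.pinv(Z)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, Z
```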

3.2. Improved Objective Function Based on Distance Separability Measure

As seen from Equation (10), methods based on deep dictionary learning can only utilize additive load features, modeling the total load as the sum of the power from individual devices. This approach assumes that the devices being monitored are simple on–off-state devices, which makes it challenging to accurately identify devices with overlapping features. Furthermore, modeling the total load as the sum of the power from all appliances in the scenario can lead to the accumulation of decomposition errors from individual devices into the total load. As the number of appliances increases, the total state error will rapidly escalate.
To address these issues, this paper proposes an improved deep dictionary learning method based on a distance separability measure function. The objective function of deep dictionary learning, as shown in Equation (13), is enhanced using a distance-based measure function to reduce the overlapping regions of load feature class domains for similar super states. The solution method for the improved objective function is also provided. The number of layers in the dictionary involves a trade-off between learning depth and overfitting, with most medium-sized problems using a three-layer architecture. The three-layer objective function for the improved deep dictionary learning proposed in this paper is formulated as follows:
$$\min_{D_1, D_2, D_3, Z} \|K_i - D_1 D_2 D_3 Z\|_F^2 + \frac{1}{n_i} \left( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 D_2 D_3 Z\|_F^2 - n_i \|D_1 D_2 D_3 Z - K_j\|_F^2 \right) + \lambda \|Z\|_1 \tag{14}$$
where $K_i$ is the KL central feature matrix of super state $i$, $D_1, D_2, D_3$ are the multi-layer dictionary matrices, $Z$ is the sparse representation matrix of $K_i$, and $\lambda$ is a regularization parameter greater than 0. $K_j$ is the KL feature matrix of another super state $j$ whose class domain overlaps with that of $K_i$, $C_{ij}$ is the set of KL matrices of the samples in the overlapping region of super states $i$ and $j$, and $n_i$ is the number of KL matrices $K_c^i$ in $C_{ij}$ that belong to super state $i$. This paper stipulates that the number of $K_c^i$ does not exceed 1/10 of the number of samples of $K_i$. The sets $C_{ij}$ and $K_c$ are determined as follows: after the KL feature class domain of each super state is determined as in Section 2.2, the AdaBoost-based method of Reference [4] is used to find the super state $j$ whose class domain is most similar to that of super state $i$, and the highly similar KL feature matrices $K_c^i$ and $K_c^j$ from the two class domains are placed into the set $C_{ij}$.
This method introduces the concepts of intra-class distance and inter-class distance to optimize the objective function. In Formula (14), $\sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 D_2 D_3 Z\|_F^2$ and $n_i \|D_1 D_2 D_3 Z - K_j\|_F^2$ are Frobenius-norm forms of the intra-class and inter-class dispersion, respectively; their difference, scaled by $\frac{1}{n_i}$, serves as the optimization term for discriminating similar super states. The Frobenius-norm form makes it convenient to measure distances between matrices and can be optimized synchronously with the original objective during the iterations. On the one hand, this term pulls the learned dictionary closer to same-class samples in the overlapping area; on the other hand, it pushes dictionaries of different classes further apart. Combined with the KLE load feature proposed earlier, it reduces the number of samples in the overlapping area and enhances the NILM system's ability to distinguish super states with similar features.
Equation (14) is not easy to optimize directly. In this paper, the split Bregman method [17] is used to decompose the objective function. Letting $Y_1 = D_2 D_3 Z$ and introducing the Bregman relaxation variable $B_1$, we obtain
$$\min_{D_1, D_2, D_3, Z} \|K_i - D_1 Y_1\|_F^2 + \frac{1}{n_i} \left( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 Y_1\|_F^2 - n_i \|D_1 Y_1 - K_j\|_F^2 \right) + \mu_1 \|Y_1 - D_2 D_3 Z - B_1\|_F^2 + \lambda \|Z\|_1 \tag{15}$$
Substituting $Y_2 = D_3 Z$ in the same way, the objective function of the improved deep dictionary learning becomes
$$\min_{D_1, D_2, D_3, Z, Y_1, Y_2} \|K_i - D_1 Y_1\|_F^2 + \frac{1}{n_i} \left( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 Y_1\|_F^2 - n_i \|D_1 Y_1 - K_j\|_F^2 \right) + \mu_1 \|Y_1 - D_2 Y_2 - B_1\|_F^2 + \mu_2 \|Y_2 - D_3 Z - B_2\|_F^2 + \lambda \|Z\|_1 \tag{16}$$
In Formula (16), $\mu_1$ and $\mu_2$ are weight coefficients that set the weight of each part of the objective during optimization; in this paper, $\mu_1 = \mu_2 = 1$.
Using the alternating direction method, Formula (16) is decomposed into six sub-problems:
$$\begin{aligned} 1.\ & \min_{D_1} \|K_i - D_1 Y_1\|_F^2 + \frac{1}{n_i} \Big( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 Y_1\|_F^2 - n_i \|D_1 Y_1 - K_j\|_F^2 \Big) \\ 2.\ & \min_{Y_1} \|K_i - D_1 Y_1\|_F^2 + \mu_1 \|Y_1 - D_2 Y_2 - B_1\|_F^2 + \frac{1}{n_i} \Big( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 Y_1\|_F^2 - n_i \|D_1 Y_1 - K_j\|_F^2 \Big) \\ 3.\ & \min_{D_2} \|Y_1 - B_1 - D_2 Y_2\|_F^2 \\ 4.\ & \min_{Y_2} \mu_1 \|Y_1 - B_1 - D_2 Y_2\|_F^2 + \mu_2 \|Y_2 - D_3 Z - B_2\|_F^2 \\ 5.\ & \min_{D_3} \|Y_2 - B_2 - D_3 Z\|_F^2 \\ 6.\ & \min_{Z} \mu_2 \|Y_2 - B_2 - D_3 Z\|_F^2 + \lambda \|Z\|_1 \end{aligned} \tag{17}$$
Sub-problems 1 to 5 are convex least-squares problems with closed-form solutions and can be optimized block by block using the least-squares method. Setting the partial derivative of sub-problem 1 with respect to $D_1$ to zero gives
$$D_1 = \left[ (K_i - n_i K_j) Y_1^T + \frac{1}{n_i} \sum_{K_c^i \in C_{ij}} K_c^i Y_1^T \right] (3 Y_1 Y_1^T)^{-1} \tag{18}$$
The optimization formulas for sub-problems 2 to 5 can be obtained in the same way and are not repeated here.
Sub-problem 6 is a single-objective $L_1$-regularized sparse problem, which can be optimized with the orthogonal matching pursuit algorithm [18]. In each iteration, the relaxation variables $B_1$ and $B_2$ are updated as follows:
$$B_1^{k+1} = B_1^k + D_2 Y_2 - Y_1 \tag{19}$$
$$B_2^{k+1} = B_2^k + D_3 Z - Y_2 \tag{20}$$
When learning the improved deep dictionary from the training samples, the three sub-dictionaries are first initialized; in this paper, each is initialized as an $n$-order identity matrix, where $n$ is the number of rows of the KL feature matrix. On this basis, $Z$ is initialized by solving
$$\min_{Z} \|K_i - D_1 D_2 D_3 Z\|_F^2 + \frac{1}{n_i} \left( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 D_2 D_3 Z\|_F^2 - n_i \|D_1 D_2 D_3 Z - K_j\|_F^2 \right)$$
after which $Y_1 = D_2 D_3 Z$ and $Y_2 = D_3 Z$ are initialized.
In dictionary learning, the reconstruction error $d$ is obtained from Formula (21) after each optimization iteration:
$$d = \|K_i - D_1 D_2 D_3 Z\|_F^2 + \frac{1}{n_i} \left( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1 D_2 D_3 Z\|_F^2 - n_i \|D_1 D_2 D_3 Z - K_j\|_F^2 \right) \tag{21}$$
Two termination conditions are set for the algorithm: the reconstruction error $d$ converges to a local minimum, or the maximum number of iterations is reached. A local minimum is declared after five consecutive iterations without a smaller $d$, and the maximum number of iterations is set to 50. When the algorithm terminates, the three-layer dictionary $D_1, D_2, D_3$ of super state $i$ is obtained.
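The training loop's control flow can be summarized in a short skeleton; `update_step` (one pass over the six sub-problems of Equation (17)) and `compute_error` (Equation (21)) are placeholders for the solver details above, and the names are ours.

```python
def train_dsm_ddl(update_step, compute_error, max_iter=50, patience=5):
    # Skeleton of the DSM-DDL training loop with the two stopping rules
    # described above; solver internals are abstracted away.
    best_err, stall = float("inf"), 0
    for _ in range(max_iter):
        update_step()            # one round of sub-problem updates (Eq. (17))
        err = compute_error()    # reconstruction error d (Eq. (21))
        if err < best_err:
            best_err, stall = err, 0
        else:
            stall += 1
        # Local minimum: five consecutive iterations with no smaller d.
        if stall >= patience:
            break
    return best_err
```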

3.3. Load Decomposition Based on Prior Probability Matrix and Minimum Reconstruction Error

During load disaggregation, super-state estimation and power prediction are carried out from the power samples in the test set and the previously learned dictionaries. Assuming the KL feature matrix of the test sample at time $t$ is $K_t$, the objective function of load disaggregation using the deep dictionary of super state $i$ is given by (22):
$$\min_{Z} \|K_t - D_1^{(i)} D_2^{(i)} D_3^{(i)} Z\|_F^2 + \frac{1}{n_i} \left( \sum_{K_c^i \in C_{ij}} \|K_c^i - D_1^{(i)} D_2^{(i)} D_3^{(i)} Z\|_F^2 - n_i \|D_1^{(i)} D_2^{(i)} D_3^{(i)} Z - K_j\|_F^2 \right) + \lambda \|Z\|_1 \tag{22}$$
Formula (22) is optimized in the same way, and with the same termination conditions, as Formula (16).
If the sparse matrices of all super states were solved at every disaggregation step, the computational load would be very large. To reduce the computational complexity, the state transition probability matrix and observation probability matrix of the hidden Markov model are introduced, and a dictionary screening method based on posterior probability is proposed: the transition probabilities between super states and the observation probabilities of the total power serve as the basis for dictionary selection during load disaggregation.
The load disaggregation process is shown in Figure 3. In the disaggregation stage, event detection is first performed on the power signal of the test set using the edge detection method of Reference [18]. Once a change in the super state is detected, five consecutive power samples are taken as the load disaggregation sample. Assuming the total power value just before the event is $y_{t-1}$ and the observed value at the event time is $y_t$, the posterior probability $P_{post}$ is obtained from the matrices $A$ and $B$ constructed in Section 2.1 using the following formula:
$$P_{post} = P_b(y_{t-1}, i) \times P_a(i, j) \times P_b(y_t, j) \tag{23}$$
The super-state dictionaries whose $P_{post}$ is nonzero are selected to form the candidate dictionary set $\Psi_D = [D_i, \ldots, D_k]$, where $D_i$ denotes the improved deep dictionary of super state $i$. For each $D_i, \ldots, D_k$ in $\Psi_D$, Formula (7) is used to extract KL features from the load disaggregation sample, yielding a KLE feature matrix $K_t$ of the same order as the deep dictionary; the sparse matrix $Z$ and the reconstruction error $d_i$ are then computed, and Formula (24) selects the super state whose deep dictionary gives the minimum reconstruction error:
$$j = \arg\min_{D_i \in \Psi_D} d_i \tag{24}$$
The quantized state of each device within the selected super state is taken as the state estimate of that device at the current time, and the quantized power of each device state is taken as its power estimate; these quantized states and powers were determined by the binning method in Section 2.1. At this point, the DSM-DDL algorithm has completed one round of device state and power estimation.
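Putting the screening and selection steps together, one disaggregation step might be sketched as follows; `dictionaries` and `sparse_code` are placeholders for the learned DSM-DDL dictionaries and a solver of Equation (22), and all names are ours for illustration.

```python
def disaggregate_step(y_prev, y_t, K_t, A, B, dictionaries, sparse_code):
    # Dictionary screening by posterior probability (Equation (23))
    # followed by minimum-reconstruction-error selection (Equation (24)).
    L = A.shape[0]
    candidates = set()
    for i in range(L):
        for j in range(L):
            if B[i, y_prev] * A[i, j] * B[j, y_t] > 0 and j in dictionaries:
                candidates.add(j)   # only nonzero-posterior super states

    best_j, best_err = None, float("inf")
    for j in candidates:
        _, err = sparse_code(K_t, dictionaries[j])  # solve Eq. (22)
        if err < best_err:
            best_j, best_err = j, err
    return best_j
```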

4. Experimental Results and Analysis

The proposed method was validated and analyzed on a real-world dataset. The software platform was Python 3.6 under the Anaconda3 distribution; the hardware platform was a desktop computer with an Intel i7-8700 CPU and 16 GB of RAM. The experimental data were taken from the UK-DALE dataset, specifically the House 1 data published by Imperial College London, which provide device-level and total power data for 50 appliances in a single UK household over 385 days, along with records of state-change events for each device. The sampling interval is 6 s.

4.1. Comparison of Similarity of Load Characteristics

The accuracy of NILM load identification largely depends on the selected load features. In this paper, the super states are formed by aggregating the internal states of all target devices. If the target devices include appliances or combinations of appliances with similar power values and change patterns, super states with similar observed values and patterns of change may be generated, which can adversely affect the accuracy of load decomposition.
To validate the ability of the KLE feature matrix to reduce the similarity of load features, we first identified appliances and appliance combinations with high power-feature similarity. By comparing the power waveforms of appliances in the dataset, three devices were selected: a television (TV), a refrigerator (fridge, FRE), and a desktop computer (office_PC, OP). Among these, the power curves of the television and refrigerator exhibit high similarity, and the power curve of the television–refrigerator combination is highly similar to that of the desktop computer. For each device and device combination, a representative portion of power samples was selected, with the sample waveforms shown in Figure 4.
It can be seen that the power values of the refrigerator and the TV are mainly distributed between 80 W and 120 W, with very similar waveforms near 90 W. Comparing the power waveform of the desktop computer with that of the television–refrigerator combination likewise shows that their power values and patterns of change are highly similar. For these devices and device combinations, traditional steady-state features such as power and current make effective discrimination difficult.
The binning method was used to quantize the internal states of the super states for the television, refrigerator, computer, and television–refrigerator combination. The power curves in Figure 4 were converted into the corresponding quantized-state sequence diagrams, with the results shown in Figure 5. For simplicity of labeling, the quantized states of the television, computer, refrigerator, and television–refrigerator combination are denoted TV, OP, FRE, and TV-FRE, respectively.
By examining Figure 4 and Figure 5, it can be observed that the states of the television and refrigerator have a high degree of similarity. This implies that if these devices or combinations are included in a super state, and other device states remain the same, the power magnitude and waveform of the super states where the television is on and the refrigerator is off, or the refrigerator is on and the television is off, will be very similar. Similarly, the computer’s state has a high similarity to the state of the television–refrigerator super state, which can also lead to misclassification during load decomposition. The experiment will use these states as examples to demonstrate the ability of the KL feature extraction method to reduce the similarity of load feature curves with close values and change patterns.
To quantitatively compare the impact of different load feature selections on the similarity of various device states or super states, the similarity index in Formula (25) is used to represent the degree of similarity between the load features of two states.
$$R(a,b)_f = \frac{\|y_{f,a}\|^2}{\|y_{f,a}\|^2 + \|y_{f,a} - y_{f,b}\|^2} \tag{25}$$
In Equation (25), $a$ and $b$ denote any two device states or super states, $f$ denotes the load feature currently being compared, and $y_{f,a}$ and $y_{f,b}$ are the sample vectors of $a$ and $b$ under feature $f$. By computing the similarity index for a given type of load feature, the similarity and distinguishability of appliances under that feature can be compared quantitatively; the higher the similarity index, the harder the states are to distinguish.
Based on the analysis of Figure 4 and Figure 5, the following state pairs were selected for comparison: TV(1) and FRE(1), OP(2) and TV-FRE(2), and OP(2) and TV-FRE(3). Equation (25) was used to calculate the similarity indices of these device states for both the power features and the KLE features.
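Under the reconstruction of Equation (25) given above, the similarity index takes only a few lines of Python; the function name is illustrative.

```python
import numpy as np

def similarity_index(y_a, y_b):
    # Similarity index of Equation (25): values near 1 mean the two
    # states' load features are hard to tell apart.
    y_a = np.asarray(y_a, dtype=float).ravel()
    y_b = np.asarray(y_b, dtype=float).ravel()
    num = np.dot(y_a, y_a)
    return num / (num + np.dot(y_a - y_b, y_a - y_b))
```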
The power and corresponding KLE feature matrices of TV(1), OP(2), and FRE(1) are shown in Figure 6. In the experiment, $N = \tilde{N} = 5$.
Figure 7 compares the similarity indices of the three state pairs under different load features. When power is used as the feature, the similarity indices of all three comparison pairs are above 0.8, indicating a high degree of similarity. In contrast, the similarity indices of the KLE feature matrices drop markedly, all falling between 0.3 and 0.4. This shows that KLE features can reduce the similarity between device states and super states with similar power curves, which benefits load disaggregation and state detection.
The similarity index experiment confirms that KLE features can increase the distinction between load features at the feature level. The DSM-DDL algorithm further enhances the ability to identify devices with similar features at the algorithmic level.

4.2. Definition and Handling of Noise in the Dataset

Before the load disaggregation tests, the noise in the experimental dataset must be defined. In actual household electricity monitoring, monitoring every device would make the model extremely cumbersome and hurt detection efficiency. Low-power appliances such as lamps and chargers consume very little electricity, and their state changes have minimal impact on the total load, so they have little monitoring value. On the other hand, changes in the total power caused by adding or removing appliances can also affect the performance of the load monitoring system. How non-target appliances should be handled is thus a practical issue in NILM system design. Most existing studies use only the sampling data of the appliances of interest as the training and test sets; although this yields good identification results, it reflects actual application scenarios poorly.
In this paper, the power consumed outside the target equipment is treated as noise, and the noise magnitude is defined as the total power minus the summed power of the target devices:
$$noise = y_t - \sum_{m=1}^{M} y_t^{(m)} \tag{26}$$
In Formula (26), $y_t$ is the total power at time $t$, i.e., the sum of the powers of the target and non-target devices in the scene, and $y_t^{(m)}$ is the measured value of target device $m$ at time $t$.
Define the percentage of noise in the experiment as follows:
$$NM = \frac{\sum_{t=1}^{T} \left| y_t - \sum_{m=1}^{M} y_t^{(m)} \right|}{\sum_{t=1}^{T} y_t} \times 100\% \tag{27}$$
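A direct transcription of Equation (27), assuming total power of shape (T,) and target-device powers of shape (M, T); the function name is illustrative.

```python
import numpy as np

def noise_ratio(total_power, device_powers):
    # Noise percentage NM of Equation (27).
    total_power = np.asarray(total_power, dtype=float)
    residual = total_power - np.asarray(device_powers, dtype=float).sum(axis=0)
    return np.abs(residual).sum() / total_power.sum() * 100.0
```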

4.3. Accuracy Testing Criteria

First, three quantities are defined for the disaggregation results of super state $j$: TP (true positives) counts samples of super state $j$ correctly identified as $j$, FP (false positives) counts samples of other super states incorrectly identified as $j$, and FN (false negatives) counts samples of super state $j$ identified as other super states.
Compared with the classical f-score criterion, the finite-state FS-f score method [19] adds a local penalty that turns the binary true-positive attribute into a measurement over discrete values, making it better suited to non-binary tasks such as state classification and power estimation.
The FS-f score is defined as follows. First, the inaccurate portion of the true positives, inacc, is defined as
$$inacc = \frac{\sum_{t=1}^{T} |y_t - \hat{y}_t|}{K} \tag{28}$$
where $T$ is the length of the time series, $\hat{y}_t$ is the estimated power value of the super state at time $t$, $y_t$ is the power observation at time $t$, and $K$ is the total number of super states. The method applies a penalty according to the error between the estimated state and the true state. The precision and recall of the FS-f score are defined as
$$precision = \frac{tp - inacc}{tp + fp}, \quad recall = \frac{tp - inacc}{tp + fn} \tag{29}$$
The FS-f score is the harmonic mean of precision and recall:
$$FS\text{-}fscore = \frac{2 \cdot precision \cdot recall}{precision + recall} \tag{30}$$
To account for the algorithm's power consumption estimation error, the estimation accuracy is defined as
$$EST.Acc = \left( 1 - \frac{\sum_{t=1}^{T} |y_t - \hat{y}_t|}{2 \sum_{t=1}^{T} y_t} \right) \times 100\% \tag{31}$$
The higher the $EST.Acc$, the more accurate the algorithm's energy consumption estimate.
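The accuracy criteria of Equations (28)-(31) can be computed together as below; `tp`, `fp`, and `fn` are assumed to be counts accumulated elsewhere, and the function name is illustrative.

```python
import numpy as np

def fs_metrics(tp, fp, fn, y_true, y_est, n_states):
    # FS-f score (Equations (28)-(30)) and EST.Acc (Equation (31));
    # n_states is the total number of super states K.
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    inacc = np.abs(y_true - y_est).sum() / n_states          # Eq. (28)
    precision = (tp - inacc) / (tp + fp)                     # Eq. (29)
    recall = (tp - inacc) / (tp + fn)
    fscore = 2 * precision * recall / (precision + recall)   # Eq. (30)
    est_acc = (1 - np.abs(y_true - y_est).sum()
               / (2 * y_true.sum())) * 100                   # Eq. (31)
    return fscore, est_acc
```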

4.4. Analysis of Algorithm Efficiency

A super state includes every possible combination of the target devices' internal states. As with the state space of multi-agent systems, the number of super states grows exponentially [20] as the number of devices increases. For example, if two devices with three internal states each form a super state, the number of possible super states is $3^2 = 9$, and for three devices it rises to $3^3 = 27$. However, this is only the theoretical count. Analysis of datasets such as AMPds [21] and UK-DALE reveals that the super states actually appearing in the data are highly sparse. This sparsity stems from household electricity usage habits and the operating principles of the appliances: a washing machine's state sequence might be wash–rinse–spin, but is unlikely to be spin–wash–rinse, and some appliances, such as lights and kitchen appliances, are only used at particular times of day. These factors keep the actual number of super states far below the theoretical number.
In the UK-DALE dataset, the relationship between the theoretical number of super states and the actual number of super states as the number of devices increases is illustrated in Figure 8. Even when simultaneously decomposing 15 devices, only about 6500 super states are observed. The sparsity of the super states not only prevents the number of improved deep dictionaries that need to be learned from growing exponentially but also results in the state transition probability matrix A and the observation probability matrix B having high sparsity. By only considering the super states that actually occur when learning dictionaries and constructing matrices, storage requirements and computational complexity can be significantly reduced.
Figure 9 shows the relationship between the time required for one complete super-state estimation using the KL feature extraction and DSM-DDL load disaggregation method and the actual number of super states [22]. The algorithm's execution time increases in a piecewise-linear fashion as the number of super states grows. The number of super states affects the disaggregation time mainly through the selection of the candidate super-state set, which requires traversing the state transition matrix $A$ and the observation probability matrix $B$ to compute $P_{post}$; the more super states there are, the more dictionaries the candidate set contains. Examining Figures 8 and 9 together shows that even with as many as $10^6$ super states, the disaggregation time is only around 7 s. For common load monitoring scales of 5 to 15 target appliances, the actual number of super states stays below 10,000 and the runtime is under 1.3 s, meeting practical requirements.

4.5. Load Disaggregation Accuracy Experiment

We selected 10 days of sampling data for nine commonly used, partly highly similar devices in the UK-DALE dataset as the experimental data: kitchen lights (KILs), refrigerators (FRs), televisions (TVs), computers (PCs), dishwashers (DWs), hoovers (HOs), washing machines (WMs), hair dryers (HDs), and toasters (TOs). The appliances were divided into two groups. The first group consists of devices with high similarity, including DWs, WMs, TOs, and HOs; among them, HOs and WMs, as well as TOs and HDs, are highly similar, as shown in the power curves of Figure 4. The second group consists of devices with lower similarity: KILs, TVs, PCs, DWs, and FRs.
In addition to the target devices involved in the super-state modeling, data from non-modeled devices were added to the total power data of each group as noise, resulting in an approximately 6% NM for both groups. Sampling data from non-operating periods (where the power remains at 0 for an extended duration) were partially removed to refine the dataset. For each group of devices, the load disaggregation experiments were designed for the following two scenarios [22]:
No Noise: Only the power data of the devices of interest are used in both the training and test sets, representing an ideal scenario.
With Noise: The power data of non-participating devices are retained in both the training and test sets, making this scenario more aligned with real-world conditions.
First, the deep dictionary objective defined in Equation (13) is used for dictionary learning and load disaggregation. Current, power, and the KLE feature matrix are used in turn as the load feature for aggregating the target devices into super states. The KL features of individual target devices are extracted from their power sampling data, while the KL feature matrices of the super states are extracted from the total-power sampling data. Experiments are run on the first device group under both noise scenarios, using the iteration stopping criteria established in Section 3.2. The results are as follows:
The average number of dictionaries within the load disaggregation candidate dictionary set is 4.7.
It is clear that all three features achieve their best results in the noise-free scenario. Among the three load features, power yields the lowest accuracy because the devices' power characteristics are highly similar, while current performs slightly better than power. In every scenario, the KLE features achieve the best results, demonstrating that this feature effectively reduces the similarity between similar devices' characteristics and, applied to load disaggregation, significantly improves the algorithm's accuracy.
Next, the proposed DSM-DDL algorithm is applied, with the optimization objective given by Equation (14). The threshold in Equation (6) is set to 0.1, and both the dimension of the power column vector and the order of the autocorrelation matrix are set to 5. The parameters in Equation (16) are $\mu_1 = \mu_2 = 1$ and $\lambda = 0.05$. The algorithm is then tested with current, the KLE feature matrix, and power as load features to verify whether the improved deep dictionary further enhances the NILM system's ability to disaggregate similar devices.
The experimental results of the improved deep dictionary in the three scenarios are as follows.
The average number of dictionaries within the load disaggregation candidate dictionary set $\Psi_D$ is 4.7.
Based on the results of both experiments, it can be observed that using state transition probabilities and observation probabilities as criteria for selecting the super-state dictionary resulted in an average of 4.7 algorithm runs per disaggregation, significantly less than the total 268 super states. This approach effectively reduces computation time and hardware overhead while maintaining accuracy, thereby ensuring efficient load disaggregation.
When comparing the results of the two experiments, it is evident that the proposed DSM-DDL algorithm outperforms deep dictionary learning in all aspects, especially when disaggregating highly similar devices. Interestingly, the accuracy improvement was more pronounced when using power features compared to KLE features. This is because KLE features already exhibit low similarity, resulting in minimal overlap between super-state domains. Consequently, the effect of reducing overlap using the improved deep dictionary was more significant for the power features, which originally had higher similarity.
Comparing Table 2 and Table 3 as well as Table 4 and Table 5, it is evident that noise has a more significant negative impact on the deep dictionary method than on the improved deep dictionary method. After incorporating the distance-based optimization into the dictionary learning objective function, the elements within the super-state domains are more closely aligned, and the position of the KL feature matrix within the domains is more optimal. Consequently, the overlap between similar super-state domains is reduced, making the disaggregation process less susceptible to noise-induced measurement deviations.
In both sets of experiments, the overall accuracy of noise modeling is lower than in the noise-free scenarios, which is expected given the complexity of real-world applications. However, when compared to scenarios where noise is retained but not addressed, noise modeling demonstrates a higher disaggregation accuracy, highlighting its effectiveness in practical applications.
In the noise modeling scenario, the accuracy metrics for the load disaggregation method using current load features and deep dictionary learning (DDL) are compared against the method using the KL feature matrix and the proposed DSM-DDL algorithm. As shown in Figure 10 and Table 6, the NILM method combining the KL feature matrix with the improved deep dictionary learning significantly outperforms the method using current features and DDL. Specifically, there is an approximately 8% improvement in both the FS-f score and EST.ACC accuracy metrics.
The above experiments tested the algorithm's accuracy on super states aggregated from devices with partially similar features. Even in the noise modeling scenario with KL features and the improved deep dictionary, the power estimation accuracy reaches only 73.4%. Based on the feature similarity experiments in Section 4.1 and the experimental data of References [10,14], this is likely due to the high similarity among the devices in the scenario. In fact, similar devices make up only part of the dataset, while most devices have clearly different load characteristics.
As a comparison, four devices with generally lower similarity—refrigerators, dishwashers, hair dryers, and vacuum cleaners—were selected for aggregation into super states for the load disaggregation experiment. In the noise modeling scenario, load disaggregation was performed using current features combined with DDL and KL feature matrices combined with DSM-DDL, respectively. The results are shown in Table 6. It can be observed that device similarity significantly impacts the accuracy of load disaggregation. The disaggregation accuracy for the highly similar dishwasher and vacuum cleaner is noticeably lower than that for the other two devices. However, the overall accuracy of the aggregated data improves substantially compared to previous experiments as the overall similarity among target devices decreases. Furthermore, combining the results from Figure 10 and Table 6, it can be seen that the refrigerator has low similarity in power consumption with other devices. However, its small power consumption means that its power variation has little impact on the overall power variation, leading to high similarity between super states where only the refrigerator’s state differs, thus affecting accuracy. Table 6 also shows that the accuracy parameters of the aggregated data are closer to those of dishwashers and vacuum cleaners, indicating that the larger the proportion of a device in the total power, the greater its influence on the final power estimation. Improving the estimation accuracy of similar high-power devices can effectively enhance the performance of the load disaggregation method.
The load disaggregation accuracy experiments demonstrate that the use of KLE features and super-state modeling with improved deep dictionary learning significantly enhances the ability to identify similar devices. When combined with noise modeling techniques, the accuracy and robustness of the NILM system are effectively improved.

5. Conclusions

This paper addresses the low-frequency sampling NILM system by employing KL expansion and singular value decomposition (SVD) to extract the KL load feature matrix from device power data. It aggregates the states of all target electrical appliances into super states and applies an improved deep dictionary learning method, combined with distance-based measures, to learn the KL feature matrix of the super states for household appliance load identification. The method demonstrates strong performance on low-sampling-rate datasets like UK-DALE. The empirical results show that the KLE feature matrix has the advantage of lower load feature similarity, and the improved deep dictionary learning algorithm, based on distance separability metrics, outperforms standard deep dictionary methods in recognizing similar devices. While maintaining the accuracy of the NILM system, the use of posterior probability matrices reduces the computational complexity and time consumption of the load disaggregation algorithm. Additionally, noise modeling enhances the system’s robustness against noise, ensuring a certain level of load identification accuracy and information retrieval capability. This provides a solid foundation for further analyzing household energy consumption behavior, formulating optimized energy strategies, and facilitating precise power scheduling and load peak shaving for power suppliers.

Author Contributions

Conceptualization, S.L. and Z.H.; methodology, S.L.; software, S.L.; validation, S.L.; formal analysis, S.L.; writing—original draft preparation, Z.X.; writing—review and editing, Z.X. and Z.H.; supervision, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the General Program of National Natural Science Foundation of China (Grant No. 52177083).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bonfigli, R.; Principi, E.; Squartini, S.; Fagiani, M.; Severini, M.; Piazza, F. User-aided footprint extraction for appliance modelling in Non-Intrusive Load Monitoring. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016; pp. 1–8. [Google Scholar]
  2. Dong, M.; Meira, P.C.M.; Xu, W.; Freitas, W. An event window based load monitoring technique for smart meters. IEEE Trans. Smart Grid 2012, 3, 782–796. [Google Scholar] [CrossRef]
  3. Hart, G.W. Nonintrusive appliance load monitoring. Proc. IEEE 1992, 80, 1870–1891. [Google Scholar] [CrossRef]
  4. Song, X.; Zhou, M.; Tu, J.; Li, G. Non-intrusive Load Monitoring Method based on k-NN and Kernel Fisher Discriminant. Autom. Electr. Power Syst. 2018, 42, 73–80. [Google Scholar] [CrossRef]
  5. Jia, D.; Cao, M.; Sun, J.; Wang, F.; Xu, W.; Wang, Y. Interval Constrained Multi-Objective Optimization Scheduling Method for Island-Integrated Energy Systems Based on Meta-Learning and Enhanced Proximal Policy Optimization. Electronics 2024, 13, 3579. [Google Scholar] [CrossRef]
  6. Meng, X.; Hu, G.; Liu, Z.; Wang, H.; Zhang, G.; Lin, H.; Sadabadi, M.S. Neural Network-Based Impedance Identification and Stability Analysis for Double-Sided Feeding Railway Systems. IEEE Trans. Transp. Electrif. 2024. [Google Scholar] [CrossRef]
  7. Sun, Y.; Cui, C.; Lu, J.; Hao, J.; Liu, X. Non-intrusive Load Monitoring Based on Delta Feature Extraction and Fuzzy Clustering. Autom. Electr. Power Syst. 2017, 41, 86–91. [Google Scholar] [CrossRef]
  8. Henao, N.; Agbossou, K.; Kelouwani, S.; Dubé, Y.; Fournier, M. Approach in Nonintrusive Type I Load Monitoring Using Subtractive Clustering. IEEE Trans. Smart Grid 2015, 8, 812–821. [Google Scholar] [CrossRef]
  9. Kong, W.; Dong, Z.Y.; Ma, J.; Hill, D.J.; Zhao, J.; Luo, F. An Extensible Approach for Non-Intrusive Load Disaggregation with Smart Meter Data. IEEE Trans. Smart Grid 2023, 7, 3362–3372. [Google Scholar] [CrossRef]
  10. Kolter, J.Z.; Batra, S.; Ng, A.Y. Energy Disaggregation via Discriminative Sparse Coding. In Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6–9 December 2010. [Google Scholar]
  11. Elhamifar, E.; Sastry, S. Energy Disaggregation via Learning ‘Powerlets’ and Sparse Coding. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  12. Cheng, X.; Li, L.; Wu, H.; Ding, Y.; Song, Y.; Sun, W. A Survey of the Research on Non-intrusive Load Monitoring and Disaggregation. Power Syst. Technol. 2016, 40, 3108–3117. [Google Scholar]
  13. Singh, S.; Majumdar, A. Deep Sparse Coding for Non-Intrusive Load Monitoring. IEEE Trans. Smart Grid 2017, 9, 4669–4678. [Google Scholar] [CrossRef]
  14. Kelly, J.; Knottenbelt, W. The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes. Sci. Data 2015, 2, 150007. [Google Scholar] [CrossRef] [PubMed]
  15. Makonin, S.; Popowich, F.; Bajić, I.V.; Gill, B.; Bartram, L. Exploiting HMM Sparsity to Perform Online Real-Time Nonintrusive Load Monitoring. IEEE Trans. Smart Grid 2016, 7, 2575–2585. [Google Scholar] [CrossRef]
  16. Aharon, M.; Elad, E.; Bruckstein, A. K-SVD:and algorithm for designing over-complete dictionaries for sparse representations. IEEE Trans. Signal Process. 2006, 54, 4311–4322. [Google Scholar] [CrossRef]
  17. Goldstein, T.; Osher, S. The Split Bregman Method for L1-Regularized Problems. SIAM J. Imaging Sci. 2009, 2, 323–343. [Google Scholar] [CrossRef]
  18. Li, C. Research on the Original Sinal Reconstruction Algorithm of Compressed Sensing. Master’s Thesis, Xidian University, Xi’an, China, 2015. [Google Scholar]
  19. Lu, T.; Xu, Z.; Huang, B. An Event-Based Nonintrusive Load Monitoring Approach: Using the Simplified Viterbi Algorithm. IEEE Pervasive Comput. 2017, 16, 54–61. [Google Scholar] [CrossRef]
  20. Liu, X.; Chuai, G.; Wang, X.; Xu, Z.; Gao, W.; Zhang, K.; Liu, Q.; Maimaiti, S.; Zuo, P. QoE-driven antenna tuning in cellular networks with cooperative multi-agent reinforcement learning. IEEE Trans. Mob. Comput. 2022, 23, 1186–1199. [Google Scholar] [CrossRef]
  21. Makonin, S.; Popowich, F.; Bartram, L.; Gill, B.; Bajić, I.V. AMPds: A public dataset for load disaggregation and eco-feed back research. In Proceedings of the 2013 IEEE Electrical Power & Energy Conference, Halifax, NS, Canada, 21–23 August 2013; IEEE: Piscataway, NJ, USA, 2014. [Google Scholar]
  22. Ming, L.I.; Peng, X.; Yan, W.A. Facial Expression Recognition Based on Improved Dictionary Learning and Sparse Representation. J. Syst. Simul. 2018, 30, 28–36. [Google Scholar]
Figure 1. Flowchart of KLE feature extraction.
Figure 2. Overlapping-region graph of super-state class clusters.
Figure 3. Flowchart of energy disaggregation.
Figure 4. Current diagram of three appliances. (a) Power curve of TV and refrigerator; (b) Power curve of refrigerator and computer.
Figure 5. Quantitative feature diagram of three kinds of equipment. (a) Quantitative characteristic curve of TV and refrigerator; (b) Quantitative characteristic curves of desktop computer and refrigerator–TV set.
Figure 6. KL characteristic matrix of three appliances. (a) KL characteristic matrix of TV(1); (b) KL characteristic matrix of PC(2); (c) KL characteristic matrix of REF(1).
Figure 7. Similarity index comparison.
Figure 8. The relationship between the number of super states and the number of appliances.
Figure 9. Effect of super-state number on load decomposition time.
Figure 10. Comparison of accuracy between DDL and DSM-DDL.
Table 1. Example of state quantization.

P (W)       0      100    200    300    400    500
Count (n)   900    80     100    620    200    100
P_Ym(I)     0.45   0.04   0.05   0.31   0.10   0.05
X^(m) = 0.0, y_peak^(m) = 1.3
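As a brief illustration of the quantization in Table 1, the sketch below normalizes the observation count of each power bin into the empirical state probability P_Ym(I). The bin values and counts are taken directly from the table; the variable names and script structure are assumptions, not the authors' code.

```python
import numpy as np

# Power bins (W) and observation counts from Table 1.
bins = np.array([0, 100, 200, 300, 400, 500])
counts = np.array([900, 80, 100, 620, 200, 100])

# Empirical probability of observing each quantized power state.
p_ym = counts / counts.sum()   # -> [0.45, 0.04, 0.05, 0.31, 0.10, 0.05]

for p, prob in zip(bins, p_ym):
    print(f"P_Ym({p:3d} W) = {prob:.2f}")
```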
Table 2. Comparison of load decomposition results of denoised data.

Without Noise   Precision   Recall   FS-score   EST.Acc
Current         87.3%       89.8%    87%        90.2%
Power           84.7%       84.3%    84.5%      88.1%
KLE features    91.2%       90.5%    90.8%      91.6%
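The four columns reported in Tables 2–5 can be reproduced from a comparison of ground-truth and estimated appliance states. The hedged sketch below assumes the standard event-level precision, recall, and F-score, together with the estimation accuracy popularized by Kolter and Johnson (one minus the normalized total absolute power error); the paper's exact definitions may differ, and disaggregation_metrics is a hypothetical helper, not the authors' evaluation code.

```python
import numpy as np

def disaggregation_metrics(true_on, pred_on, true_power, pred_power):
    """Sketch of the metrics reported in Tables 2-5 (assumed definitions).

    true_on, pred_on       : boolean arrays of actual/estimated ON states.
    true_power, pred_power : arrays of actual/estimated appliance power (W).
    """
    tp = np.sum(pred_on & true_on)      # correctly detected ON intervals
    fp = np.sum(pred_on & ~true_on)     # spurious detections
    fn = np.sum(~pred_on & true_on)     # missed detections

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)

    # Estimation accuracy: 1 - total |power error| / (2 * total true power).
    est_acc = 1 - np.abs(true_power - pred_power).sum() / (2 * true_power.sum())
    return precision, recall, f_score, est_acc
```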
Table 3. Comparison of load decomposition results of noisy data.

With Noise      Precision   Recall   FS-score   EST.Acc
Current         79.6%       76.4%    79.1%      78.8%
Power           77%         75.2%    75.2%      77.3%
KLE features    82.9%       81.7%    82.3%      82.4%
Table 4. Comparison of load decomposition results of denoised data.

Without Noise   Precision   Recall   FS-score   EST.Acc
Current         90.7%       91.1%    90.9%      90.2%
Power           87.6%       89.3%    86.4%      88.5%
KLE features    93.2%       93.5%    93.4%      96.1%
Table 5. Comparison of load decomposition results of noisy data.

With Noise      Precision   Recall   FS-score   EST.Acc
Current         83.1%       82.7%    82.9%      84.8%
Power           85.3%       81.5%    82.1%      83.7%
KLE features    88.4%       86.5%    86.4%      90.2%
Table 6. Comparison of load decomposition results of modeled noisy data.

                  Current + DDL            KL Matrix + DSM-DDL
Load              FS-score    EST.ACC      FS-score    EST.ACC
Aggregated data   83.24%      85.82%       86.4%       90.2%
FR                87.05%      90.77%       90.2%       93.04%
DW                80.63%      84.21%       86.71%      89.46%
HO                81.26%      83.95%       85.02%      87.25%
HD                88.74%      90.14%       90.83%      92.42%
