HYDROSAFE: A Hybrid Deterministic-Probabilistic Model for Synthetic Appliance Profiles Generation

Jaradat, Abdelkareem; Alarbi, Muhamed; Haque, Anwar; Lutfiyya, Hanan

doi:10.3390/s24175619


HYDROSAFE: A Hybrid Deterministic-Probabilistic Model for Synthetic Appliance Profiles Generation^†

by Abdelkareem Jaradat ^*, Muhamed Alarbi, Anwar Haque and Hanan Lutfiyya

The Department of Computer Science, Western University, London, ON N6A 3K7, Canada

^* Author to whom correspondence should be addressed.

^† This paper is an extended version of our paper published in Jaradat, A.; Lutfiyya, H.; Haque, A. Synthetic Power Consumption Data Generation For Appliance Operation Modes. In Proceedings of the 2024 International Conference on Computing, Networking and Communications (ICNC24), Big Island, HI, USA, 19–22 February 2024; pp. 689–694.

Sensors 2024, 24(17), 5619; https://doi.org/10.3390/s24175619

Submission received: 25 July 2024 / Revised: 26 August 2024 / Accepted: 28 August 2024 / Published: 29 August 2024

(This article belongs to the Section Intelligent Sensors)


Abstract

Realistic appliance power consumption data are essential for developing smart home energy management systems and the foundational algorithms that analyze such data. However, publicly available datasets are scarce and time-consuming to collect. To address this, we propose HYDROSAFE, a hybrid deterministic-probabilistic model designed to generate synthetic appliance power consumption profiles. HYDROSAFE employs the Median Difference Test (MDT) for profile characterization and the Density and Dynamic Time Warping based Spatial Clustering for appliance operation modes (DDTWSC) algorithm to cluster appliance usage according to the corresponding Appliance Operation Modes (AOMs). By integrating stochastic components, such as white noise, switch-on surge, ripples, and edge position, the model adds variability and realism to the generated profiles. Evaluation using a normalized DTW-distance matrix shows that HYDROSAFE achieves high fidelity, with an average DTW distance of ten samples at a 1 Hz sampling frequency, demonstrating its effectiveness in producing synthetic datasets that closely mimic real-world data.

Keywords:

appliance operation modes; demand response (DR); dynamic time warping (DTW); HEMSs; load profile simulation; SHEMSs

1. Introduction

Dependence on fossil fuels and concern about greenhouse gas emissions have recently increased interest in new Smart Grid (SG) solutions that decrease energy consumption [1]. The residential sector’s share of total energy consumption in the USA increased from 22% in 2009 [2] to approximately 27% in 2020 [3]. In Canada, the residential sector accounted for 28% of total energy usage in 2006 and 32% in 2020. Residential use is expected to continue rising with the same pattern until 2050 [3]. In particular, home appliances are responsible for a substantial portion of energy usage: in the USA, home appliances consume approximately 30% of total household consumption [4], while in Canada, they account for 14% of total household consumption [5].

One common strategy that offers many integrated solutions for reducing electricity usage in the residential sector is the use of Smart Home Energy Management Systems (SHEMSs) [6]. SHEMSs are multi-component systems that focus on energy monitoring, analysis, scheduling, storage [7,8], and feedback based on several inputs, such as electricity tariffs, appliance power data collected by sensors, power consumption limits, tenants’ preferences, occupancy rates, and environmental data [9]. SHEMSs analyze these inputs using approaches such as Machine Learning (ML) [10] and Digital Signal Processing (DSP) methods, which provide user feedback, appliance scheduling, and user information systems [11,12]. All these outcomes aim to help users understand household usage better and to promote energy sustainability [13] through approaches such as Demand Response (DR) [14].

Within SHEMSs, representative datasets are needed to develop and validate the aforementioned analytical algorithms and methods. Power Consumption Datasets (PCDs) [15] play a crucial role in this context. PCDs are datasets containing time-series data corresponding to samples of the instantaneous power consumption of electric loads. PCDs come in aggregated and disaggregated forms. In the aggregated form, more than one load, or all the loads within the same residence, are measured as a single time series. In the disaggregated form, each load in the house is individually measured as a separate time series [16]. Despite recent efforts in collecting residential PCDs [15], the availability of public PCDs is still limited [17,18,19,20] because measurement devices must be set up within households and data collection is lengthy, taking years in some cases [15]. In many cases, the publicly available PCDs do not hold the appliance-specific features needed to validate data analysis algorithms [14]. To overcome this issue, researchers use Synthetic PCDs (SyPCDs) [17,21], which have the potential to extend PCDs and save installation cost and measurement time [20]. SyPCDs are generated load profiles for household appliances based on either publicly available PCDs (deterministic) or mathematical (probabilistic) models [22]. SyPCDs should be realistic in representing the original dataset, tunable, expandable, and unbiased. In this work, a Hybrid Deterministic-Probabilistic Model for Synthetic Appliance Profiles Generation (HYDROSAFE) is proposed. HYDROSAFE is a hybrid deterministic-stochastic model that is built to extend existing PCDs and aims to simulate household appliances’ usage profiles when activated with different Appliance Operation Modes (AOMs), which represent specific settings applied to the appliance to meet the customer’s needs.
At its core, the main objective of HYDROSAFE is to generate realistic appliance power consumption time-series data based on a hybrid model. This model incorporates both deterministic methods, which are built on top of a data analysis of publicly available PCDs, and probabilistic methods, which add stochasticity to the model to make the generated data as realistic as possible. The ultimate goal of HYDROSAFE is to generate synthetic Single Use Profiles (SySUPs) in different AOMs for household appliances.

In domestic settings, home appliances often operate in distinct modes tailored to specific user needs and circumstances. An Appliance Operation Mode (AOM) denotes a predefined configuration established by the appliance manufacturer to accommodate user preferences across varying scenarios. Each AOM is characterized by its duration of operation and the unique cycles and states through which the appliance transitions. For instance, consider a dishwasher equipped with three operation modes: a light setting for lightly soiled dishes, a medium setting for moderately soiled dishes, and a heavy setting for heavily soiled dishes. The consumption of electricity varies depending on the specific AOM activated for a given appliance. Figure 1 illustrates two instances of Single Use Profiles (SUPs) for a clothes dryer, each corresponding to the appliance being activated with a different AOM. Figure 2 shows the average annual power consumption with the associated cost for three appliances within the same household. The figure shows the potential saving percentages achieved in load reduction by switching the use of appliances from heavy to medium, medium to light, and heavy to light AOMs. For example, if a household switches a dishwasher from the heavy to the medium mode, 45% of the cost is cut, while switching from the heavy to the light mode reduces annual consumption by 68% [23].

The rest of the paper is organized as follows: In Section 2, previous work in the literature is presented. In Section 3, the problem formulation is elaborated. Section 4 presents the architecture of HYDROSAFE. In Section 5, SUP extraction and smoothing are discussed. Section 6 presents the formal characterization of SUPs. Section 7 discusses the clustering of operation modes using the DTW algorithm. In Section 8, the process of generating synthetic SUPs is presented. Section 9 presents the evaluation of HYDROSAFE. Finally, Section 10 concludes the paper and suggests future work.

2. Related Work

Many of the available methods [17,24,25] for simulating appliance usage profiles focus on the consumer’s behavioral patterns [26] and determine the power consumption based on occupancy actions [27] or on a psychological model [20]. A popular simulator is CREST [28], which is based on a combination of active occupancy patterns and daily activity profiles that describe the patterns of occupants’ activities. An extension model [29] built on top of CREST integrates a new thermal-electrical model into the existing model.

A probabilistic-empirical residential electricity load model [19] is designed to generate appliance power use at 1 min intervals based on both measured and statistical data, as well as occupant activities such as cooking and watching TV. A stochastic approach [30] is used to generate high-resolution multi-energy load profiles for residential loads in remote areas. A mathematical framework [31] is developed for simulating household appliances by re-synthesizing the current waveforms, harmonic currents, and phase shifting of the appliances. Similar work [21] uses a GUI in MATLAB Simulink to simulate household loads.

Generative Adversarial Networks (GANs) [32] are rapidly evolving in many disciplines, including synthetic PCD generation. TraceGAN [33] and PowerGAN [34] by Harell et al., ProfileSR-GAN [35] by Song et al., mREAL-GAN [36] by Sanderson et al., RLPGen [37] by Liang et al., and SGAN [38] by Gkoutroumpi et al. are examples of recent works that explore generating realistic appliance data using GANs. GANs are known to be data-hungry [39] and require large datasets for training [40]. Since the number of available PCDs with labeled AOMs is still very small [14], HYDROSAFE does not focus on using GANs. Language models are also used to generate PCDs: an N-gram language model based approach is proposed in [41], which obtains a string representation of the electricity consumption time series of a household appliance and then creates a unigram and a bigram for each appliance category.

Several synthetic datasets for the residential sector are available. Table 1 lists recent publicly available residential and commercial SyPCDs and their basic characteristics. The Automated Model Builder for Appliance Loads (AMBAL) [42] is a load simulation tool designed to extract appliance models from real datasets. These models are composed of sequences of parameterized signatures and play a crucial role in simulating a realistic household environment through the use of a trace generator. This synthetic appliance trace generator enables the recombination of appliance models, thus facilitating the simulation of user activities in homes with customizable complexity. A similar approach is used in the SynD dataset [43], but for a larger number of appliances. SmartSim [44] is a device-accurate home energy load generator. It utilizes device energy and device usage models to simulate household loads using a sequence of Distribution Learning, Event Marking, and Trace Generation components. SmartSim builds its models on data from the Smart* energy dataset [45]. Other datasets, such as SHED [46,47], contain data for commercial buildings.

Research gap: The literature extensively examines methods for simulating residential load profiles [24,53]. However, to the best of the authors’ knowledge, no prior research has explicitly focused on simulating household appliances within the context of Appliance Operation Modes (AOMs). This observation underscores a notable limitation in the existing literature, revealing a significant gap in understanding and addressing the dynamics of appliance behavior within diverse operational modes.

Academic contribution: In this work, HYDROSAFE fills this gap by presenting a novel open-source [54] hybrid deterministic-probabilistic approach. HYDROSAFE leverages both empirical data and sophisticated statistical models to generate appliance usage profiles encompassing multiple operation modes. This development is crucial, as it provides a more accurate representation of real-world appliance usage, thereby enhancing the effectiveness of energy management systems and algorithms. Furthermore, simulating AOMs offers an opportunity to improve methods for analyzing power consumption data with a focus on AOMs, enabling households to achieve more significant energy savings.

3. Problem Formulation

The purpose of HYDROSAFE is to generate a synthetic dataset of household appliance usage profiles. This section describes the definitions and formulation for the generation process.

It is assumed that a household, $h$, belongs to the household set, $H$, such that:

$$h \in H = \{h_{1}, \dots, h_{i}, \dots, h_{|H|}\}, \quad 1 \leq i \leq |H| \tag{1}$$

where the household set, $H$, is of size $|H|$. A household, $h$, runs an appliance, $a$, that belongs to the set of appliances, $A^{h}$, such that:

$$a \in A^{h} = \{a_{1}, \dots, a_{i}, \dots, a_{|A^{h}|}\}, \quad 1 \leq i \leq |A^{h}| \tag{2}$$

where $|A^{h}|$ is the number of appliances in the household, $h$.

An operation mode, $p$, that belongs to the operation modes set, $P^{a}$, is defined as follows:

$$p \in P^{a} = \{p_{1}, \dots, p_{i}, \dots, p_{|P^{a}|}\}, \quad 1 \leq i \leq |P^{a}| \tag{3}$$

where $|P^{a}|$ is the number of operation modes available for appliance, $a$. It is assumed that a complete run for an appliance, $a$, starts and ends in the same day. A day, $d$, is defined as:

$$d \in D = \{d_{1}, \dots, d_{i}, \dots, d_{|D|}\}, \quad 1 \leq i \leq |D| \tag{4}$$

where the set of days, $D$, is of size $|D|$. A daily power consumption sequence, $\Omega_{a}^{d}$, represents the power consumption samples taken in a single day, $d$, for the appliance, $a$, such that:

$$\Omega_{a}^{d} = \{\omega_{n}\}_{n=1}^{n_{*}} = \{\omega_{1}, \omega_{2}, \dots, \omega_{n}, \dots, \omega_{n_{*}}\}, \quad 1 \leq n \leq n_{*} \tag{5}$$

where $\omega_{n}$ is the $n^{th}$ instantaneous power sample value, measured in watts (W), and $n_{*}$ represents the last sample index within $d$. When a sampling frequency, $f_{s}$, is used, the value of $n_{*}$ for a single day is defined as follows:

$$n_{*} = t \cdot f_{s} \tag{6}$$

where $t$ is the time measured in seconds. For a single day, $t = 86{,}400$ s.
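As a quick check of Equation (6), assuming a sampling frequency of $f_{s} = 1$ Hz (the rate quoted in the abstract for the evaluation; other rates are equally valid inputs), a full day yields:

$$n_{*} = 86{,}400\ \text{s} \times 1\ \text{Hz} = 86{,}400\ \text{samples}$$

Under the same formula, a hypothetical rate of 2 Hz would double the count to 172,800 samples per day.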

A Single Use Profile (SUP) formally models the power consumption of a preprogrammed appliance between the time it is turned on and the time it is switched off. A SUP is the sequence (or time series) of power values (measured in watts) consumed by an appliance from the moment it is turned on to the moment it is turned off. A SUP, $\psi^{p}$, with length $|\psi^{p}|$, is defined by the sequence of samples that form a subsequence of the daily consumption, $\Omega_{a}^{d}$, from the moment the appliance is turned on, $n^{s}$, to the moment it is turned off, $n^{e}$. This is defined as:

$$\begin{matrix} \psi^{p} = \{\omega_{n}\}_{n=n^{s}}^{n^{e}} = \{\omega_{n^{s}}, \omega_{n^{s}+1}, \dots, \omega_{n}, \dots, \omega_{n^{e}-1}, \omega_{n^{e}}\} \\ \psi^{p} \subseteq \Omega_{a}^{d}, \quad n^{s} \leq n \leq n^{e} \leq n_{*} \end{matrix} \tag{7}$$

where:

$$|\psi^{p}| = n^{e} - n^{s} + 1 \tag{8}$$

and for all SUPs, $\psi^{p} \subseteq \Omega_{a}^{d}$, no two SUPs overlap; that is, the intersection of any two SUPs is empty.

The set of SUPs, $\Psi_{a}^{d,p}$, of size $|\Psi_{a}^{d,p}|$, that corresponds to appliance $a$ and is labeled by the AOM $p$, is defined as:

$$\Psi_{a}^{d,p} = \{\psi_{1}^{p}, \dots, \psi_{j}^{p}, \dots, \psi_{|\Psi_{a}^{d,p}|}^{p}\} \tag{9}$$

where $\Psi_{a}^{d,p}$ contains all the SUPs that run using the same AOM, $p$. The set of all SUPs, $\Psi_{a}^{d}$, in $d$ is defined as the union of all disjoint subsets $\Psi_{a}^{d,p}$ corresponding to every AOM $p \in P^{a}$. This is defined as:

$$\begin{matrix} \Psi_{a}^{d} = \bigcup_{i=1}^{|P^{a}|} \Psi_{a}^{d,p_{i}} = \{\psi_{1}^{p_{j}}, \dots, \psi_{i}^{p_{k}}, \dots, \psi_{Z_{a}^{d}}^{p_{l}}\} \\ \{p_{j}, p_{k}, p_{l}, \dots\} \subseteq P^{a} \\ \text{s.t. } \bigcap_{i=1}^{|P^{a}|} \Psi_{a}^{d,p_{i}} = \emptyset \end{matrix} \tag{10}$$

where $\Psi_{a}^{d}$ is a set of size $|\Psi_{a}^{d}|$, which is the total size of all AOM subsets, such that:

$$|\Psi_{a}^{d}| = \sum_{p \in P^{a}} |\Psi_{a}^{d,p}| \tag{11}$$

The main objective of HYDROSAFE is to generate a set of Synthetic SUPs (SySUPs) that can be used to validate the analytical methods that support Demand Response (DR) [14]. The set of SySUPs, $\ddot{\Psi}_{a}$, is generated by the HYDROSAFE modules, such that:

(12)

where $H$ is the HYDROSAFE generator function, $p$ is the selected AOM to generate SUPs in, $\Psi_{a}^{p}$ is the set of extracted SUPs from the dataset, and the remaining argument is the set of tuning parameters used in the generation process.

4. HYDROSAFE Architecture

The architecture of HYDROSAFE, depicted in Figure 3, comprises five main components. First, the SUPs Extraction module processes a publicly available PCD [23] and extracts a set of Single Use Profiles (SUPs) for all appliances represented in the PCD. Second, the SUP Characteristics Extraction module applies a series of processes to the extracted SUPs to capture their characteristics in a formal model. Third, the Operation Modes Clustering component applies the Dynamic Time Warping (DTW) algorithm [55] across all SUPs to group them into clusters, each of which contains SUPs associated with the same AOM. Fourth, the SUP Generation component synthesizes the Synthetic SUPs (SySUPs) using the set of extracted SUPs, the SUP characteristics, and the AOMs. This module is composed of multiple submodules, each of which accounts for a particular part in the formation of SySUPs. Finally, the Validation module evaluates the similarity of the resulting SySUPs to the extracted SUPs.

5. SUPs Extraction and Smoothing

This section discusses the process of extracting SUPs from the time series data [23] to be used in the synthesis of SySUPs.

5.1. SUPs Extraction

All SUPs are extracted from the Rainforest Automation Energy Dataset (RAE) by Makonin et al. [23] on a daily basis per appliance. This dataset is chosen because it is extensively used in the literature, making it a well-established and validated source of data for research on appliance power consumption. The extraction process is based on XCorrelation [56] and builds on previous work [14]. For any appliance, the daily consumption sequence, $\Omega_{a}^{d}$, contains zero or more non-overlapping SUPs, such that:

$$\begin{matrix} \psi_{i}^{p_{\alpha}} = \{\omega_{n}\}_{n=n^{s_{i}}}^{n^{e_{i}}}, \quad \psi_{j}^{p_{\beta}} = \{\omega_{n}\}_{n=n^{s_{j}}}^{n^{e_{j}}} \\ i < j, \quad \forall i, j \in \{1, 2, \dots, |\Psi_{a}^{d}|\} \\ 1 \leq n^{s_{i}} < n^{e_{i}} < n^{s_{j}} < n^{e_{j}} \leq n_{*} \end{matrix} \tag{13}$$

where the first SUP, $\psi_{i}^{p_{\alpha}}$, runs with the operation mode, $p_{\alpha}$, starts at $n = n^{s_{i}}$, and ends at $n = n^{e_{i}}$. The other SUP, $\psi_{j}^{p_{\beta}}$, runs with the operation mode, $p_{\beta}$, starts at $n = n^{s_{j}}$, and ends at $n = n^{e_{j}}$. Each SUP is activated before it is switched off, since $n^{s_{i}} < n^{e_{i}}$ and $n^{s_{j}} < n^{e_{j}}$, and the two SUPs do not overlap within the day, since $n^{e_{i}} < n^{s_{j}}$. The power consumption samples that belong to the time intervals between SUPs satisfy:

$$\omega_{n} \leq \tau^{\varepsilon}, \quad \forall n \in \{n^{e_{i}} + 1, n^{e_{i}} + 2, \dots, n^{s_{j}} - 1\} \tag{14}$$

where $\tau^{\varepsilon}$ is a threshold that represents the stand-by power, which corresponds to the appliance being switched off or in a stand-by state. In this state, the appliance consumption is close to zero; the small residual draw is caused by low-rated appliance components such as Light Emitting Diodes (LEDs).
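The role of the stand-by threshold in Equation (14) can be illustrated with a minimal Python sketch. This is not the XCorrelation-based extraction used in the paper; it is a simplified threshold segmentation, and the function name `extract_sups`, the sample values, and the threshold of 5 W are assumptions made for illustration only.

```python
import numpy as np

def extract_sups(daily, tau_standby):
    """Split a daily sequence into runs of samples above the stand-by threshold.

    Simplified stand-in for SUP extraction: every maximal run of samples with
    power > tau_standby is returned as (start_index, end_index, samples).
    Indices are 0-based, unlike the 1-based indices in the text.
    """
    daily = np.asarray(daily, dtype=float)
    active = daily > tau_standby
    sups = []
    start = None
    for n, is_on in enumerate(active):
        if is_on and start is None:
            start = n                      # appliance switched on
        elif not is_on and start is not None:
            sups.append((start, n - 1, daily[start:n]))  # appliance switched off
            start = None
    if start is not None:                  # run still active at end of day
        sups.append((start, len(daily) - 1, daily[start:]))
    return sups

# Hypothetical daily sequence (W) with two activations separated by stand-by samples.
daily = [1, 1, 500, 520, 510, 1, 1, 300, 310, 1]
sups = extract_sups(daily, tau_standby=5.0)
bounds = [(s, e) for s, e, _ in sups]
```

A sketch like this would split one SUP into several if the appliance briefly drops below the threshold mid-cycle, which is one reason a more robust extraction method is preferable in practice.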

5.2. SUPs Smoothing

Typically, any signal can be decomposed into multiple components, which can be high or low in frequency. In this context, a high frequency component with relatively low amplitude is considered noise and needs to be reduced [57]. The moving median smoother, $M$, is used for this purpose [58]. A transformation function is applied to the SUP sequence, $\psi$, of length $|\psi|$, to generate the smoothed SUP, $\hat{\psi}$, of length $|\hat{\psi}|$, as follows:

$$\hat{\psi}(n) = M(\psi(k)) \tag{15}$$

where $M$ is evaluated over a sliding window of size $W$. A sliding window transformation shortens the output sequence by an amount that depends on the window size, $W$, so two sequences of length $\lfloor \frac{W}{2} \rfloor$ are padded to $\hat{\psi}$ to compensate for the decrease in length. A leading sequence is prepended to $\hat{\psi}$, where each prepended item is equal to the first sample of $\hat{\psi}$, i.e., $\hat{\psi}(1)$. A lagging sequence is appended to $\hat{\psi}$, where each appended item is equal to the last sample of $\hat{\psi}$, i.e., $\hat{\psi}(|\hat{\psi}|)$. This is shown as follows:

$$\hat{\psi}_{after} = \underbrace{\{\hat{\psi}(1), \dots, \hat{\psi}(1)\}}_{\lfloor \frac{W}{2} \rfloor} \cup \hat{\psi}_{before} \cup \underbrace{\{\hat{\psi}(|\hat{\psi}|), \dots, \hat{\psi}(|\hat{\psi}|)\}}_{\lfloor \frac{W}{2} \rfloor} \tag{16}$$

where $\hat{\psi}_{before}$ is the smoothed SUP before padding, and $\hat{\psi}_{after}$ is the smoothed SUP after padding. After this concatenation, the length of the transformed SUP, $|\hat{\psi}|$, equals the length of the SUP, $|\psi|$.
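The smoothing and padding steps of Equations (15) and (16) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `moving_median_smooth` is hypothetical, the window is assumed to be centered and odd-sized so that the padded output has exactly the input length, and the test signal is made up.

```python
import numpy as np

def moving_median_smooth(psi, window):
    """Moving median with edge-value padding (assumes an odd, centered window)."""
    psi = np.asarray(psi, dtype=float)
    if window < 1 or window % 2 == 0 or window > psi.size:
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    # Median of every full window: the shortened, unpadded sequence.
    before = np.array([np.median(psi[i:i + window])
                       for i in range(psi.size - window + 1)])
    # Pad floor(W/2) copies of the first and last smoothed samples.
    return np.concatenate([np.full(half, before[0]),
                           before,
                           np.full(half, before[-1])])

# Hypothetical square-like SUP with one noise spike at index 3.
noisy = np.array([0.0] * 10 + [100.0] * 10)
noisy[3] = 50.0
smoothed = moving_median_smooth(noisy, window=5)
```

In this example the spike is removed while the step between indices 9 and 10 stays vertical, which is the edge-preserving behavior the text attributes to the moving median.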

The choice of the moving median smoother in this context stems from its advantageous property of edge preservation. When processing SUPs characterized by square waveforms with minor fluctuations, the precise location of edges holds significance in delineating SUP characteristics. Unlike alternative smoothers like the moving mean, the moving median exhibits superior edge preservation capabilities. Specifically, the moving mean introduces distortions to edges, resulting in the transformation of vertical edges into skewed counterparts. This distortion introduces an inherent uncertainty, thereby compromising the precision of SUP characteristics delineation. As such, the utilization of the moving median smoother ensures enhanced accuracy in defining SUP characteristics by mitigating edge distortion effects. In Figure 4, a square wave with added noise is illustrated. The smoothed wave obtained using the moving average exhibits a skewness in the edges. Conversely, the smoothed wave obtained using the moving median effectively preserves the edges.

The smoothing window size, $W$, determines the degree of similarity between $\psi$ and $\hat{\psi}$. Higher values of $W$ produce wider windows and therefore significantly reduce the high frequency noise, but they may deform the major shape (the low frequency component) of $\hat{\psi}$, which results in a higher distance. On the other hand, lower values of $W$ produce narrower windows, so the high frequency noise may not be sufficiently eliminated, which may also result in a higher distance. To measure this impact, the $L_{2}$ norm of the Minkowski distance, namely the Normalized Euclidean Distance, is used, following Tan et al. [59]. This is defined as follows:

$$E(\psi, \hat{\psi}) = \frac{1}{|\psi|} \sqrt{\sum_{k=1}^{|\psi|} |\psi(k) - \hat{\psi}(k)|^{2}} \tag{17}$$

where the function, $E$, measures the normalized pairwise distance between each point, $\psi(k)$, and the corresponding point, $\hat{\psi}(k)$, assuming that $|\psi| = |\hat{\psi}|$. Figure 5 illustrates the effect of varying the window size, $W$, on the distance, $E(\psi, \hat{\psi})$. The window size $W$ spans from 5 to 250 samples, and the distance $E$ is depicted for 25 SUPs. A higher value on the plot indicates a greater distance, $E(\psi, \hat{\psi})$, suggesting that some of the significant features of $\psi$ have been smoothed out. Conversely, where the plot flattens, the smoothing function has retained all major features defining the shape of $\psi$ while eliminating the noise component. The patterns shown in the figure highlight how SUPs respond differently depending on the characteristics of the states being removed and the corresponding window size. The plot presents three distinct groups of SUPs, each exhibiting a staircase-like response in the distance, $E$. This staircase pattern emerges from the progressive removal of short states from the SUPs as the window size increases. Group 1 shows an early step up in the plot, which corresponds to the removal of the shortest states from the SUP: as the window size increases slightly, these short states are excluded, resulting in a noticeable increase in the plot. Group 2 and Group 3 show steps that occur at larger window sizes. In these cases, the states being removed have longer durations, so a larger window size is required to exclude them and the step appears later. The initial disparity in $E(\psi, \hat{\psi})$ stems from the elimination of the spike at the beginning of each state (referred to as the inrush current [60]), which typically contributes a brief period of high power consumption.
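Equation (17) translates directly into a few lines of Python. The sketch below is illustrative only; the function name `normalized_euclidean` and the sample values are assumptions, not taken from the paper.

```python
import numpy as np

def normalized_euclidean(psi, psi_hat):
    """Normalized Euclidean distance between two equal-length sequences."""
    psi = np.asarray(psi, dtype=float)
    psi_hat = np.asarray(psi_hat, dtype=float)
    if psi.shape != psi_hat.shape:
        raise ValueError("sequences must have the same length")
    # Square root of the summed squared differences, divided by the length.
    return float(np.sqrt(np.sum((psi - psi_hat) ** 2)) / psi.size)

# Hypothetical original and smoothed SUPs differing in two samples.
distance = normalized_euclidean([0, 0, 0, 0], [3, 4, 0, 0])
```

Evaluating this function for a range of window sizes, with the smoothed sequence recomputed each time, would reproduce the kind of sweep that Figure 5 reports.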

6. Extraction of SUPs Features

In major appliances, a SUP consists of a sequence of activations and deactivations of the internal components of the appliance. For example, in the clothes dryer, the heating element and the spinning motor switch on and off during the operation cycle. During the activation of such internal elements, the appliance is considered to be in a particular state.

A state corresponds to the power values recorded during the activation of these components over a specific time interval. Switching the appliance’s components on and off results in sudden changes in the power consumption, which appear in the corresponding SUP as a sequence of square-like waves with abrupt changes (edges) between the states. The exact sample at which the abrupt change occurs is defined as the Exact Edge. When detecting these exact edges, the detection mechanism first identifies a Thick Edge, which is an interval that surrounds the exact edge and indicates, with high likelihood, that an exact edge exists within it.

The number of states, their distributions, durations, and power levels are all characteristics that determine the features of SUPs of a specific AOM compared to other AOMs within the process of generating SySUPs. The following subsections describe the steps to determine the characteristics of SUPs in terms of the states that form these SUPs.

6.1. Estimation of State Edges

An indicator vector, $I$, is used to determine the bounds of each state in the SUP, $\hat{\psi}$. The Median Difference Test (MDT) [61] is used to calculate the values of the indicator vector, $I$. MDT utilizes a moving window of length $W^{I}$ that slides over the subsequences of $\hat{\psi}$. The MDT estimates the presence of an exact edge within $\hat{\psi}$ by dividing the moving window into two equal-length partitions. The median, $M$, is evaluated for each partition, along with the standard deviation, $\sigma$, of the entire window. The indicator vector is defined as follows:

$$\begin{matrix} I^{\hat{\psi}}(n) = \sqrt{\sigma(\hat{\psi}(n^{i})) \left| M(\hat{\psi}(n^{j}))^{2} - M(\hat{\psi}(n^{q}))^{2} \right|} \\ \forall n^{i} \in k, \quad \forall n^{j} \in k^{l}, \quad \forall n^{q} \in k^{r} \\ k^{l} = \{n - \frac{W^{I}}{2}, n - \frac{W^{I}}{2} + 1, \dots, n\}, \quad k^{r} = \{n + 1, \dots, n + \frac{W^{I}}{2} - 1, n + \frac{W^{I}}{2}\} \\ k = k^{l} \cup k^{r} \\ \forall n, \quad 1 \leq n \leq |\hat{\psi}| \end{matrix} \tag{18}$$

where $I^{\hat{\psi}}$ is the indicator vector corresponding to the SUP, $\hat{\psi}$. $M(\hat{\psi}(n^{j}))$ is the median of the SUP samples corresponding to the left partition of the window, with $n^{j} \in k^{l}$, where $k^{l}$ is the list of sample indexes of the left partition. $M(\hat{\psi}(n^{q}))$ is the median of the SUP samples corresponding to the right partition of the window, with $n^{q} \in k^{r}$, where $k^{r}$ is the list of sample indexes of the right partition. $\sigma(\hat{\psi}(n^{i}))$ is the standard deviation of the SUP samples of the entire window, with $n^{i} \in k$.

The evaluated value of $I^{\hat{\psi}}(n)$ is proportional to the likelihood of having an exact edge in $\hat{\psi}$ at sample index $n$. When the window falls on a rising edge, the left partition median, $M(\hat{\psi}(n^{j}))$, is lower than the right partition median, $M(\hat{\psi}(n^{q}))$, and the difference between the two medians is relatively high. Additionally, since there is a large difference between the values of the left and right partitions, the indicator value, $I^{\hat{\psi}}$, is further scaled up by the standard deviation of the entire window, $\sigma(\hat{\psi}(n^{i}))$. The leading and lagging gaps caused by the moving window are padded using the same technique used in Equation (16).
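A minimal Python sketch of the indicator in Equation (18) is given below. It is an illustration under stated assumptions rather than the reference implementation: the function name `mdt_indicator` is hypothetical, indices are 0-based, the window length is assumed even, and the population standard deviation (`numpy.std` with its default `ddof=0`) is assumed because the text does not specify which estimator is used.

```python
import numpy as np

def mdt_indicator(psi_hat, window):
    """Median Difference Test indicator for a smoothed SUP (0-based indices)."""
    x = np.asarray(psi_hat, dtype=float)
    half = window // 2
    if half < 1 or x.size <= 2 * half:
        raise ValueError("window must be >= 2 and shorter than the sequence")
    ind = np.zeros(x.size)
    for n in range(half, x.size - half):
        left = x[n - half:n + 1]           # k^l = {n - W/2, ..., n}
        right = x[n + 1:n + half + 1]      # k^r = {n + 1, ..., n + W/2}
        full = x[n - half:n + half + 1]    # k = k^l united with k^r
        diff = abs(np.median(left) ** 2 - np.median(right) ** 2)
        ind[n] = np.sqrt(np.std(full) * diff)
    # Pad the leading and lagging gaps with the nearest computed value.
    ind[:half] = ind[half]
    ind[x.size - half:] = ind[x.size - half - 1]
    return ind

# Hypothetical smoothed SUP with a single rising edge between indices 19 and 20.
step = np.array([0.0] * 20 + [100.0] * 20)
indicator = mdt_indicator(step, window=6)
peak = int(np.argmax(indicator))
```

On this step signal the indicator is zero in the flat regions and forms a short spike around the edge, which is the behavior the text describes for Figure 6.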

The indicator vector, $I^{\hat{\psi}}$, is depicted in Figure 6. The plot of $I^{\hat{\psi}}$ shows two kinds of behavior. The first is a run of steady, low-amplitude values, which indicates steady behavior in $\hat{\psi}$, as the change in the value of $\hat{\psi}$ is relatively small. The second is a series of spikes, which indicate a higher likelihood that an abrupt change in the value of $\hat{\psi}$ has occurred. The starting and ending sample indexes of each spike bound the exact edge in $\hat{\psi}$. These two bounds form a thick edge, which is a pair of sample indexes that indicates the presence of a rising or falling exact edge in $\hat{\psi}$. The sequence of thick edges, $\Pi^{\hat{\psi}}$, of size $|\Pi^{\hat{\psi}}|$, that is identified by $I^{\hat{\psi}}$ in $\hat{\psi}$, is defined as follows:

$$\Pi^{\hat{\psi}} = \{\pi_{i}\}_{i=1}^{|\Pi^{\hat{\psi}}|}, \quad |\Pi^{\hat{\psi}}| \leq \frac{|\hat{\psi}|}{2} \tag{19}$$

where $\pi_{i}$ is the $i^{th}$ thick edge, defined as the following pair of sample indexes:

$$\pi_{i} = (n_{i}^{o}, n_{i}^{\iota}), \quad 1 \leq n_{i}^{o} \leq n_{i}^{\iota} \leq |\hat{\psi}| \tag{20}$$

where $\pi_{i}$ defines two boundaries: $n_{i}^{o}$ is the lower bound and $n_{i}^{\iota}$ is the upper bound. The sequence of thick edges, $\Pi^{\hat{\psi}}$, is obtained by applying a threshold, $\tau^{I}$, so that a thick edge, $\pi$, is defined as follows:

$$\begin{matrix} I^{\hat{\psi}}(n) \leq \tau^{I}, \quad \forall n \in \{n_{j}^{\iota} + 1, n_{j}^{\iota} + 2, \dots, n_{k}^{o}\} \\ \pi_{j} = (n_{j}^{o}, n_{j}^{\iota}), \quad \pi_{k} = (n_{k}^{o}, n_{k}^{\iota}) \\ 1 \leq n_{j}^{o} \leq n_{j}^{\iota} < n_{k}^{o} \leq n_{k}^{\iota} \leq |\hat{\psi}| \end{matrix} \tag{21}$$

where $\pi_{j}$ and $\pi_{k}$ are two consecutive thick edges.
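The thresholding step can be sketched in Python as follows. This is one plausible reading of Equations (19) to (21), in which each maximal run of indicator values above the threshold becomes a thick edge; the function name `thick_edges`, the 0-based indexing, and the sample values are assumptions for illustration.

```python
import numpy as np

def thick_edges(indicator, tau):
    """Return (lower, upper) index pairs for runs where indicator > tau."""
    above = np.asarray(indicator, dtype=float) > tau
    # Pad with False on both sides so runs touching the ends are closed.
    padded = np.concatenate(([False], above, [False])).astype(int)
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)       # first index of each run
    ends = np.flatnonzero(change == -1) - 1    # last index of each run
    return list(zip(starts.tolist(), ends.tolist()))

# Hypothetical indicator vector with two spikes.
edges = thick_edges([0, 0, 5, 7, 0, 0, 9, 0], tau=1.0)
```

Between consecutive pairs returned by this sketch, every indicator value is at or below the threshold, matching the condition stated for consecutive thick edges.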

6.2. Determining SUP States

A SUP encompasses a succession of states representing the power consumption behavior of internal electrical components within an appliance. Each state within the sequence corresponds to the power values recorded during the activation of these components over specific time intervals. These sequences tend to exhibit a relatively stable pattern with slight variations, reflecting the consistent power consumption behavior of the internal components.

The sequence of states,

Λ

, that is associated with

\hat{ψ}

is defined as follows:

\begin{matrix} Λ^{\hat{ψ}} = {λ_{1}, \dots, λ_{i}, \dots, λ_{| Λ |}} \\ λ_{i} = (e^{o}, e^{ι}, ω^{λ}), e^{o} < e^{ι} \\ λ_{1} = (1, 1, ω_{1}^{λ}), λ_{| Λ |} = (| \hat{ψ} |, | \hat{ψ} |, ω_{| \hat{ψ} |}^{λ}) \end{matrix}

(22)

where the state,

λ_{i}

, is represented by a tuple with three elements: the left exact edge,

e^{o}

, the right exact edge,

e^{ι}

, and the power value,

ω^{λ}

. The values of the exact edges are determined through the process of Edge Thinning. In this process, the exact edge, e, is evaluated from the corresponding thick edge,

π

. One method for edge thinning is the argmax method, where the value of e equals the sample index that produces the maximum indicator vector value,

I^{\hat{ψ}}

. This is defined as:

\begin{matrix} e_{i}^{o} = argmax (I^{\hat{ψ}} (n)), \forall n \in π_{i} \\ e_{i}^{ι} = argmax (I^{\hat{ψ}} (n)), \forall n \in π_{i + 1} \\ i \in {1, 3, 5, \dots} \end{matrix}

(23)

where the left exact edge,

e_{i}^{o}

, is selected from a thick edge,

π_{i}

, i.e.,

e_{i}^{o} \in π_{i}

and the right exact edge,

e_{i}^{ι}

, is selected from a thick edge,

π_{i + 1}

, i.e.,

e_{i}^{ι} \in π_{i + 1}

.

The other method of edge thinning is the mid-point method. The center point of the thick edge,

π_{i}

, is selected as the exact edge. This is shown as follows:

\begin{matrix} e_{i}^{o} = \frac{n_{i}^{o} + n_{i}^{ι}}{2}, e_{i}^{ι} = \frac{n_{i + 1}^{o} + n_{i + 1}^{ι}}{2} \\ i \in {1, 3, 5, \dots} \end{matrix}

(24)

Once both the left and right exact edges of the state,

λ_{i}

are determined, the power value of the state,

ω_{i}^{λ}

, is evaluated as the median,

M

, of the power values in

\hat{ψ}

corresponding to each sample index in the state,

λ_{i}

. This is defined as the following:

\begin{matrix} ω_{i}^{λ} = M ({\{ω_{k}\}}_{k = e_{i}^{o}}^{e_{i}^{ι}}) \end{matrix}

(25)

where

ω_{k}

is the power value of

\hat{ψ}

at the sample index, k.

The sequence of states,

Λ

, that represents

\hat{ψ}

is defined as follows:

\begin{matrix} Λ^{\hat{ψ}} = {λ_{1}, \dots, λ_{i}, \dots, λ_{| Λ |}} \\ λ_{i} = (e_{i}^{o}, e_{i}^{ι}, ω^{λ}) \end{matrix}

(26)

such that

Λ

contains all the features that distinguish one SUP from another. These features are used for SUP classification and clustering [62].
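The edge-thinning and state-building steps of Equations (23) to (25) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names and the sample data are hypothetical, indexes are 1-based as in the text, the mid-point is floored to an integer index, and consecutive exact edges are paired (first with second, third with fourth) following the odd index set in Equation (23).

```python
from statistics import median


def thin_argmax(indicator, thick_edge):
    """Equation (23): index of the largest indicator value inside the thick edge."""
    n_o, n_i = thick_edge  # 1-based, inclusive bounds
    return max(range(n_o, n_i + 1), key=lambda n: indicator[n - 1])


def thin_midpoint(thick_edge):
    """Equation (24): center of the thick edge (floored to an integer, an assumption)."""
    n_o, n_i = thick_edge
    return (n_o + n_i) // 2


def build_states(profile, exact_edges):
    """Pair exact edges into states (left edge, right edge, median power), Equation (25)."""
    states = []
    for k in range(0, len(exact_edges) - 1, 2):
        e_o, e_i = exact_edges[k], exact_edges[k + 1]
        power = median(profile[e_o - 1:e_i])  # samples e_o..e_i, 1-based inclusive
        states.append((e_o, e_i, power))
    return states


# Hypothetical smoothed profile, indicator vector, and thick edges.
profile = [0, 0, 100, 100, 100, 100, 0, 0]
indicator = [0.0, 0.2, 9.0, 1.0, 0.0, 0.5, 8.0, 7.0]
thick = [(2, 4), (6, 8)]

exact = [thin_argmax(indicator, edge) for edge in thick]
states = build_states(profile, exact)
```

Using the median for the state power value keeps a single stray sample at the state boundary from shifting the level, which is why the last sample of the example state does not affect the result.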

7. SUP Clustering

Typically, disaggregated PCDs comprise a set of SUPs per appliance where each SUP represents an activation with a particular AOM. This section discusses how to use DTW to group SUPs that belong to the same AOM in clusters [63]. The DTW distance measure is used in HYDROSAFE as a measure of similarity among SUPs. This similarity is used to create clusters of appliance SUPs where the members of each cluster share the same operation mode. The DTW distance is calculated between any two temporal sequences, Q and P, such that

Q = {q_{i}}_{i = 1}^{n}

and

P = {p_{j}}_{j = 1}^{m}

. DTW maps the elements of Q and P to minimize the distance between Q and P based on a warping path [55],

W = {\{w_{k}\}}_{k = 1}^{K} \in \mathcal{W}

, over all possible paths,

\mathcal{W}

. This is defined as follows:

\begin{matrix} DTW (Q, P) = min_{W \in \mathcal{W}} \{\sum_{k = 1}^{K} ℵ (w_{k})\} \\ max (m, n) \leq K < m + n - 1 \end{matrix}

(27)

where

w_{k} = (i, j)

is the pair of indices, and

ℵ (w_{k})

is the distance element that belongs to the warping path W that aligns the elements of both Q and P such that the distance between them is minimized. The minimization is obtained through iteratively minimizing the distance of the current element and the adjacent elements within the distance matrix, ℵ, such that:

\begin{matrix} ℵ (i, j) = d_{e} (q_{i}, p_{j}) + min {ℵ (i - 1, j - 1), ℵ (i - 1, j), ℵ (i, j - 1)} \end{matrix}

(28)

where

d_{e} (q_{i}, p_{j})

is the Euclidean distance between

q_{i}

and

p_{j}

. The optimal warping path is found by tracing ℵ backward, choosing the adjacent points with the lowest distance [55].

Generally, SUPs that belong to the same operation mode have a smaller DTW distance to each other and, hence, higher similarity. In contrast, two SUPs that belong to different operation modes have a larger pairwise DTW distance and thus lower similarity. To ensure uniformity in the distances, the normalized pairwise DTW distance,

\bar{δ}

, between two SUPs,

ψ^{p_{i}}

and

ψ^{p_{j}}

, is defined as the following:

\begin{matrix} \bar{δ} (ψ^{p_{i}}, ψ^{p_{j}}) = \frac{DTW (ψ^{p_{i}}, ψ^{p_{j}})}{| ψ^{p_{i}} | + | ψ^{p_{j}} |} \\ p_{i}, p_{j} \in P^{a}, ψ^{p_{i}} \in Ψ_{a}^{d, p_{i}}, ψ^{p_{j}} \in Ψ_{a}^{d, p_{j}} \end{matrix}

(29)

The distance matrix,

Δ

, is a

(| Ψ_{a}^{d} | \times | Ψ_{a}^{d} |)

square matrix that contains the normalized pairwise distances,

\bar{δ}

, for all SUPs

ψ^{p} \in Ψ_{a}^{d}

. The distance matrix is defined as:

\begin{matrix} Δ = \\ [\begin{matrix} \begin{matrix} [\begin{matrix} Δ_{i, j} = \bar{δ} (ψ_{i}^{p_{0}}, ψ_{j}^{p_{0}}) \\ i, j \in {1, \dots, | Ψ_{a}^{d, p_{0}} |} \end{matrix}] \end{matrix} \\ [\begin{matrix} ⋱ \end{matrix}] \\ \begin{matrix} [\begin{matrix} Δ_{i, j} = \bar{δ} (ψ_{i}^{p_{k}}, ψ_{j}^{p_{k}}) \\ i, j \in {| Ψ_{a}^{d, p_{k - 1}} |, \dots, | Ψ_{a}^{d, p_{k}} | - | Ψ_{a}^{d, p_{k - 1}} |} \end{matrix}] \end{matrix} \end{matrix}] \end{matrix}

(30)

where the distance matrix,

Δ

, is a symmetric square matrix, i.e.,

Δ_{i, j} = Δ_{j, i}

since the DTW distance is symmetric and every SUP is compared with every other SUP in

Δ

. The element

Δ_{i, j}

represents the distance between

ψ_{i}^{p_{k}}

and

ψ_{j}^{p_{l}}

, such that:

\begin{matrix} Δ_{i, j} = \bar{δ} (ψ_{i}^{p_{k}}, ψ_{j}^{p_{l}}) \\ \forall {ψ_{i}^{p_{k}}, ψ_{j}^{p_{l}}} \in Ψ_{a}^{d} \end{matrix}

(31)

Since

Δ

is a symmetric square matrix, the upper triangle in

Δ

contains the same values as the lower triangle. The diagonal values of

Δ

are defined as:

\begin{matrix} Δ_{i, j} = 0, \forall i = j \end{matrix}

(32)

which refers to the zero distance between a SUP and itself.
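A sketch of how the normalized distance of Equation (29) and the distance matrix of Equations (30) to (32) could be assembled is given below. It is illustrative only: the function names and the three example profiles are hypothetical, the DTW helper uses an absolute difference as the element-wise distance, and only the upper triangle is computed because the matrix is symmetric with a zero diagonal.

```python
def dtw_distance(q, p):
    """Accumulated DTW cost (same recurrence as Equation (28))."""
    inf = float("inf")
    acc = [[inf] * (len(p) + 1) for _ in range(len(q) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(q) + 1):
        for j in range(1, len(p) + 1):
            cost = abs(q[i - 1] - p[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[len(q)][len(p)]


def normalized_dtw(a, b):
    """Equation (29): DTW distance divided by the sum of the two lengths."""
    return dtw_distance(a, b) / (len(a) + len(b))


def distance_matrix(sups):
    """Symmetric matrix of normalized pairwise distances with a zero diagonal."""
    size = len(sups)
    delta = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            delta[i][j] = delta[j][i] = normalized_dtw(sups[i], sups[j])
    return delta


# Hypothetical SUPs: the first two share a shape, the third does not.
sups = [[0, 100, 100, 0], [0, 100, 100, 100, 0], [0, 500, 0]]
delta = distance_matrix(sups)
```

In this toy example the first two profiles differ only in duration, so their distance is zero, while the third profile is clearly separated from both.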

SUP clustering is based on the Density and Dynamic Time Warping based Spatial Clustering for Appliance Operation Modes (DDTWSC) algorithm [63]. DDTWSC clusters SUPs based on the DTW distance and uses two hyperparameters:

ϵ, μ

[64] where

ϵ

is the Eps-neighborhood hyperparameter that sets the maximum radius within which SUPs are treated as neighbors in the subsequent steps.

μ

represents the minimum number of neighboring SUPs that must lie within the region of radius

ϵ

. The clustering of the SUPs is modeled as follows:

\begin{matrix} \{{\{Ψ_{a}^{p}\}}^{p \in P}, Ψ_{a}^{o}\} = DDTWSC (Ψ_{a}, μ, ϵ) \end{matrix}

(33)

such that, by the end of the algorithm, the set of all SUPs,

Ψ_{a}

is clustered into

| P_{a} |

clusters. Each cluster,

Ψ_{a}^{p}

, contains all SUPs that belong to the operation mode, p, while the outlier SUPs set is

Ψ_{a}^{o}

.

Based on the calculated matrix,

Δ_{i, j}

, a subset of directly density-reachable SUPs,

Ψ_{a}^{*}

, is selected from

Ψ_{a}

that contains all SUPs,

ψ_{j}

, with a distance,

\bar{δ} (ψ_{i}, ψ_{j})

, less than a predetermined threshold,

ϵ

, such that:

\begin{matrix} Ψ_{a}^{*} = {\{ψ_{j} \in Ψ_{a} ∣ \bar{δ} (ψ_{i}, ψ_{j}) < ϵ\}}_{i = j = 1}^{| Ψ_{a} |, j \neq i} \end{matrix}

(34)

The next step in the clustering process is when the next hyperparameter,

μ

, is used. Three different subsets are formed upon applying

μ

and

ϵ

on

Ψ_{a}^{*}

, namely the core SUPs set,

Ψ_{a}^{c}

, the border SUPs set,

Ψ_{a}^{b}

, and the outlier SUPs set,

Ψ_{a}^{o}

. These subsets are defined as follows corresponding to the value of

μ

:

\begin{matrix} \{\begin{matrix} Ψ_{a}^{o} \Leftarrow Ψ_{a}^{o} \cup Ψ_{a}^{*}, & | Ψ_{a}^{*} | = 0 \\ Ψ_{a}^{b} \Leftarrow Ψ_{a}^{b} \cup Ψ_{a}^{*}, & 0 < | Ψ_{a}^{*} | < μ \\ Ψ_{a}^{c} \Leftarrow Ψ_{a}^{c} \cup Ψ_{a}^{*}, & | Ψ_{a}^{*} | \geq μ \end{matrix} \end{matrix}

(35)

where

Ψ_{a}^{o}

contains the SUPs that are considered outliers to any cluster.

Ψ_{a}^{c}

contains the core SUPs, which incrementally join other core SUPs in further iterations until they form a cluster,

Ψ_{a}^{p}

.

Ψ_{a}^{b}

contains the border SUPs, which may join other core sets in further iterations or remain as-is until the end of the algorithm. By the end of the algorithm, the set of all SUPs,

Ψ_{a}

, is clustered into

| P_{a} |

clusters. Each cluster,

Ψ_{a}^{p}

, contains all SUPs that belong to the operation mode, p.
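The core, border, and outlier logic described above follows the general pattern of density-based clustering over a precomputed distance matrix. The sketch below is a generic DBSCAN-style procedure written for illustration, not the published DDTWSC algorithm: the function name `density_cluster`, the label convention (-1 for outliers), and the toy distance matrix are assumptions. Following Equation (34), a SUP is not counted as its own neighbor.

```python
def density_cluster(delta, eps, mu):
    """Label each SUP with a cluster id, or -1 for outliers, from a distance matrix."""
    size = len(delta)
    # Neighbors within eps, excluding the SUP itself (j != i as in Equation (34)).
    neighbors = [[j for j in range(size) if j != i and delta[i][j] < eps]
                 for i in range(size)]
    labels = [None] * size  # None = not visited yet
    cluster = -1
    for i in range(size):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < mu:
            labels[i] = -1  # provisional outlier; may be claimed as a border SUP later
            continue
        cluster += 1        # i is a core SUP, so it seeds a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border SUP: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= mu:  # another core SUP: keep growing the cluster
                queue.extend(neighbors[j])
    return labels


# Toy matrix: SUPs 0-2 and 3-4 form two modes, SUP 5 is far from everything.
groups = [0, 0, 0, 1, 1, 2]
delta = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.8)
          for j in range(6)] for i in range(6)]
labels = density_cluster(delta, eps=0.3, mu=1)
```

With these hypothetical values the two groups receive separate cluster ids and the isolated SUP is reported as an outlier. Suitable values of the two hyperparameters depend on the appliance and are not implied by this example.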

Assuming that SUPs are grouped in

Δ

by the operation mode, each row in

Δ

contains a set of values that are more similar to each other in the same operation mode, such that:

\begin{matrix} \bar{δ} (ψ^{p_{i}}, ψ^{p_{j}}) \approx \bar{δ} (ψ^{p_{i}}, ψ^{p_{k}}), p_{i} = p_{j} = p_{k} \end{matrix}

(36)

while it also contains a set of relatively larger values that correspond to distances to SUPs in different operation modes. This is defined as:

\begin{matrix} \bar{δ} (ψ^{p_{i}}, ψ^{p_{j}}) \ll \bar{δ} (ψ^{p_{i}}, ψ^{p_{k}}), p_{i} = p_{j} \neq p_{k} \end{matrix}

(37)

A heat map is used to visualize the operation modes based on the pairwise distance among them. Figure 7 shows the distance matrix,

Δ

, for three appliances. The heat map shows a gradient that reflects the degree of similarity between two corresponding SUPs. A darker color implies higher similarity, while a lighter color implies lower similarity. For example, in Figure 7A, the dryer shows three different operation modes, visible as three darker areas surrounded by squiggly lines. According to Equation (36), if we consider SUP-0, it is observed that the distance,

\bar{δ}

, between SUP-0 and SUPs 1 to 6 is very small. This means that high similarity exists among this cluster of SUPs, i.e., these SUPs share the same operation mode. On the other hand, according to Equation (37), the same SUP-0 has a higher distance to SUPs 7 to 8 and 9 to 25, since these SUPs belong to different operation modes. A summary of the statistics of the pairwise DTW distances for the RAE dataset [23] is listed in Table 2 and Table 3.

8. Generating Synthetic SUPs

A set of Synthetic SUPs (SySUPs),

{\ddot{Ψ}}_{a}^{p}

, is defined as:

\begin{matrix} {\ddot{Ψ}}_{a}^{p} = {\{{\ddot{ψ}}_{i}\}}_{i = 1}^{| {\ddot{Ψ}}_{a}^{p} |} \end{matrix}

(38)

where

{\ddot{ψ}}_{i}

is a SySUP for the appliance, a, with operation mode, p. The value of

| {\ddot{Ψ}}_{a}^{p} |

is selected to be a large number (typically

| {\ddot{Ψ}}_{a}^{p} | > 1000

) for evaluation purposes.

A base SySUP,

\ddot{ψ} \in {\ddot{Ψ}}_{a}^{p}

, represents a SySUP without adding the effect of any additional tuning parameters. The base SySUP,

\ddot{ψ}

, is defined as follows:

\begin{matrix} \ddot{ψ} = {\{{\{ω_{j}^{λ}\}}_{j = 1}^{e_{λ}^{ι} - e_{λ}^{o}}\}}^{λ \in Λ^{{\hat{ψ}}^{p}}} \end{matrix}

(39)

where

ω^{λ}

is the power value of the state,

λ \in Λ^{{\hat{ψ}}^{p}}

.

Λ^{{\hat{ψ}}^{p}}

, is the sequence of states of a randomly selected SUP with operation mode, p, and

e_{λ}^{ι} - e_{λ}^{o}

is the length of the state,

λ

. The following subsections discuss the probabilistic components that are added to the base SySUP.

The base SySUP,

\ddot{ψ}

, forms the foundation of our synthetic dataset. It consists of sequences of power values,

ω_{j}^{λ}

, which represent the power consumption in different states

λ

of an appliance a operating in mode p. The set

Λ^{{\hat{ψ}}^{p}}

includes all possible states

λ

that an appliance can be in during the operation mode p.
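Equation (39) amounts to repeating each state's power value for the length of that state. A minimal sketch follows, assuming that states are given as (left edge, right edge, power) tuples as in Equation (26); the function name `base_sysup` and the example states are hypothetical.

```python
def base_sysup(states):
    """Expand a sequence of states into a flat, noise-free power profile."""
    profile = []
    for e_o, e_i, power in states:
        # Each state contributes (e_i - e_o) samples at its median power value.
        profile.extend([power] * (e_i - e_o))
    return profile


# Hypothetical states: stand-by, a 1200 W heating phase, then stand-by again.
states = [(1, 4, 0.0), (4, 9, 1200.0), (9, 12, 0.0)]
profile = base_sysup(states)
```

The result is a piecewise-constant profile, which is the deterministic part that the probabilistic components are later added to.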

8.1. The White Noise Component

Real-world PCDSs often include noise due to factors such as sensor inaccuracies or environmental influences, so the states of a SUP are not completely flat. To simulate this, and to better capture the variability and stochastic nature of real-world appliance usage patterns, random noise is added to the power values

ω_{j}^{λ}

. This noise can be modeled as a small random perturbation, typically drawn from a normal distribution centered around zero. The noise coefficient,

ξ

, is defined as follows:

\begin{matrix} ξ_{j} = N (μ^{ξ}, σ^{ξ}) \end{matrix}

(40)

where

ξ_{j}

is the noise added to the

j^{th}

sample in the base SySUP,

\ddot{ψ}

. This noise is selected based on a normal distribution function,

N

, with a mean of

μ^{ξ}

and standard deviation

σ^{ξ}

. The set of SySUPs with the added noise,

{}^{ξ}{\ddot{Ψ}}_{a}^{p}

, is defined as:

\begin{matrix} {}^{ξ}{\ddot{Ψ}}_{a}^{p} = {\{{}^{ξ}{\ddot{ψ}}_{i}\}}_{i = 1}^{| {}^{ξ}{\ddot{Ψ}}_{a}^{p} |} \end{matrix}

(41)

where the resulting SySUP with the added noise,

{}^{ξ}{\ddot{ψ}}

, is defined as:

\begin{matrix} {}^{ξ}{\ddot{ψ}} = {\{{\{ω_{j}^{λ} + ξ_{j}\}}_{j = 1}^{e_{λ}^{ι} - e_{λ}^{o}}\}}^{λ \in Λ^{{\hat{ψ}}^{p}}}, {}^{ξ}{\ddot{ψ}} \in {}^{ξ}{\ddot{Ψ}}_{a}^{p} \end{matrix}

(42)
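The noise step of Equations (40) and (42) can be sketched with the standard library's Gaussian generator. This is an illustrative sketch: the function name `add_white_noise`, the seed, and the parameter values are assumptions, and one independent draw is made per sample.

```python
import random


def add_white_noise(profile, mu_xi, sigma_xi, rng):
    """Add an independent Gaussian perturbation to every sample of a base SySUP."""
    return [power + rng.gauss(mu_xi, sigma_xi) for power in profile]


rng = random.Random(42)               # seeded so the sketch is reproducible
base = [0.0] * 3 + [1200.0] * 5 + [0.0] * 3
noisy = add_white_noise(base, mu_xi=0.0, sigma_xi=5.0, rng=rng)
```

Passing the generator in explicitly makes it easy to produce many different SySUPs from the same base profile by reusing one `rng` across calls.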

8.2. The Switch-On Surge Component

The Switch-On Surge (SOS), or inrush current, is the maximum instantaneous input current consumed by electrical transformers within an electrical device when it is first switched on [60]. This phenomenon typically occurs within the first few samples of high-power states, when a major component within the appliance is triggered. SOS has two characteristics that matter for accurate modeling and analysis: a sharp initial peak at the beginning of a state, and a subsequent decay in amplitude over time. The high initial peak occurs because an electrical device, particularly one with inductive loads such as motors or transformers, draws a significantly higher current when first powered on than during its steady-state operation. This surge is due to the sudden demand for energy to establish magnetic fields in inductive components. The surge then decays rapidly within a short period, transitioning to the normal operating current. This decay is a critical aspect of inrush current behavior and can be modeled using various mathematical functions.

The SOS component can be modeled using different approaches. One common method used in approximated scenarios is a reciprocal function such as

\frac{1}{x}

, which can describe the initial rapid drop in current.

The set of SySUPs with the added SOS,

{}^{ϑ}{\ddot{Ψ}}_{a}^{p}

, is defined as:

\begin{matrix} {}^{ϑ}{\ddot{Ψ}}_{a}^{p} = {\{{}^{ϑ}{\ddot{ψ}}_{i}\}}_{i = 1}^{| {}^{ϑ}{\ddot{Ψ}}_{a}^{p} |} \end{matrix}

(43)

where the resulting SySUP with the added SOS,

{}^{ϑ}{\ddot{ψ}}

, is defined as:

\begin{matrix} {}^{ϑ}{\ddot{ψ}} = {\{{\{ω_{j}^{λ} + ξ_{j} + \frac{ϑ_{j}}{1 + j}\}}_{j = 1}^{e_{λ}^{ι} - e_{λ}^{o}}\}}^{λ \in Λ^{{\hat{ψ}}^{p}}}, {}^{ϑ}{\ddot{ψ}} \in {}^{ϑ}{\ddot{Ψ}}_{a}^{p} \end{matrix}

(44)

where

ϑ_{j}

is the SOS coefficient that follows a normal distribution function as:

\begin{matrix} ϑ_{j} = N (μ^{ϑ}, σ^{ϑ}) \end{matrix}

(45)

The term

\frac{ϑ_{j}}{1 + j}

is a reciprocal function that models the SOS component by simulating the decay of the surge current over the samples of each state.
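The per-sample expression in Equation (44) can be sketched as follows. This is an illustrative sketch only: the function name `sysup_with_sos` and all parameter values are hypothetical, the sample index j restarts at 1 in every state as in the equation, and the surge term is applied to every state because Equation (44) does not restrict it to high-power states.

```python
import random


def sysup_with_sos(states, mu_xi, sigma_xi, mu_sos, sigma_sos, rng):
    """Build a SySUP with white noise and a reciprocal switch-on surge per state."""
    profile = []
    for e_o, e_i, power in states:
        for j in range(1, e_i - e_o + 1):            # j restarts at 1 in each state
            noise = rng.gauss(mu_xi, sigma_xi)       # white noise term of Equation (40)
            surge = rng.gauss(mu_sos, sigma_sos) / (1 + j)  # decays as j grows
            profile.append(power + noise + surge)
    return profile


rng = random.Random(3)
states = [(1, 4, 0.0), (4, 9, 1200.0)]  # hypothetical (left edge, right edge, power)
profile = sysup_with_sos(states, 0.0, 5.0, 300.0, 100.0, rng)

# With both standard deviations set to zero the surge shape is fully deterministic.
clean = sysup_with_sos([(1, 4, 100.0)], 0.0, 0.0, 60.0, 0.0, random.Random(0))
print(clean)  # [130.0, 120.0, 115.0]
```

The deterministic case shows the decay directly: the surge contributes 60/2, 60/3, and 60/4 on the first three samples of the state.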

8.3. The Ripple Component

In addition to the SOS component, appliance profiles may also exhibit ripple components. Ripple refers to the small, periodic oscillations in the electrical current or voltage within the steady-state periods of an appliance’s operation. It appears as a sinusoidal temporal fluctuation in the state power value. These ripples can arise from various factors such as switching operations of internal components, power supply noise, or inherent characteristics of the device’s operation.

The set of SySUPs with the added ripple,

{}^{ϱ}{\ddot{Ψ}}_{a}^{p}

, is defined as:

\begin{matrix} {}^{ϱ}{\ddot{Ψ}}_{a}^{p} = {\{{}^{ϱ}{\ddot{ψ}}_{i}\}}_{i = 1}^{| {}^{ϱ}{\ddot{Ψ}}_{a}^{p} |} \end{matrix}

(46)

where the resulting SySUP with the added ripple,

{}^{ϱ}{\ddot{ψ}}

, is defined as:

\begin{matrix} {}^{ϱ}{\ddot{ψ}} = {\{{\{ω_{j}^{λ} + ξ_{j} + \frac{ϑ_{j}}{1 + j} + γ sin (\frac{j}{ρ})\}}_{j = 1}^{e_{λ}^{ι} - e_{λ}^{o}}\}}^{λ \in Λ^{{\hat{ψ}}^{p}}}, {}^{ϱ}{\ddot{ψ}} \in {}^{ϱ}{\ddot{Ψ}}_{a}^{p} \end{matrix}

(47)

where two parameters control the behavior of the ripple: the ripple amplitude,

γ

, and the ripple period length,

ρ

. The amplitude

γ

determines the magnitude of the oscillations in the ripple, while the period length

ρ

dictates the frequency of these oscillations, or how quickly they repeat over time.

The values of these parameters are selected based on a normal distribution, which allows for realistic variability in the synthetic appliance profiles. Specifically:

\begin{matrix} ρ = N (μ^{ρ}, σ^{ρ}), γ = N (μ^{γ}, σ^{γ}) \end{matrix}

(48)

here,

μ^{ρ}

and

σ^{ρ}

are the mean and standard deviation of the period length, respectively, while

μ^{γ}

and

σ^{γ}

are the mean and standard deviation of the ripple amplitude. By sampling from these normal distributions, each synthetic SUP can exhibit unique but realistically varying ripple characteristics. This stochastic approach ensures that the synthetic profiles capture the natural diversity observed in real appliance operation.
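The ripple term of Equations (47) and (48) can be sketched in isolation as follows. The sketch makes several assumptions that are not stated in the text: the amplitude and period are drawn once per SySUP, the period is clamped to a small positive value so that the division is always defined, and the noise and SOS terms are omitted to keep the example short. The function name `sysup_with_ripple` and the parameter values are hypothetical.

```python
import math
import random


def sysup_with_ripple(states, mu_gamma, sigma_gamma, mu_rho, sigma_rho, rng):
    """Add a sinusoidal ripple, gamma * sin(j / rho), to every sample of each state."""
    gamma = rng.gauss(mu_gamma, sigma_gamma)        # ripple amplitude
    rho = max(1e-6, rng.gauss(mu_rho, sigma_rho))   # ripple period; clamp is an assumption
    profile = []
    for e_o, e_i, power in states:
        for j in range(1, e_i - e_o + 1):
            profile.append(power + gamma * math.sin(j / rho))
    return profile


rng = random.Random(7)
states = [(1, 61, 1200.0)]  # one hypothetical 60-sample state at 1200 W
profile = sysup_with_ripple(states, mu_gamma=15.0, sigma_gamma=2.0,
                            mu_rho=5.0, sigma_rho=0.5, rng=rng)
```

Since the argument of the sine is j divided by the period parameter, one full oscillation spans roughly 2π times that parameter in samples, so larger values produce slower ripples.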

8.4. State Edge Position Variation

The last parameter that controls the shape of SySUPs is the Exact Edge Position (EEP), ℓ. This parameter introduces a variation factor to the position (sample index) of the two exact edges that define the boundaries of a state as defined in Equation (22). By varying the exact positions of these edges, the synthetic SUPs can better mimic the natural variability observed in real appliance profiles.

The set of SySUPs with the state edge position variation factor,

{}^{ℓ}{\ddot{Ψ}}_{a}^{p}

, is defined as:

\begin{matrix} {}^{ℓ}{\ddot{Ψ}}_{a}^{p} = {\{{}^{ℓ}{\ddot{ψ}}_{i}\}}_{i = 1}^{| {}^{ℓ}{\ddot{Ψ}}_{a}^{p} |} \end{matrix}

(49)

where the resulting SySUP with the added EEP,

{}^{ℓ}{\ddot{ψ}}

, is defined as:

\begin{matrix} {}^{ℓ}{\ddot{ψ}} = {\{{\{ω_{j}^{λ_{i}} + ξ_{j} + \frac{ϑ_{j}}{1 + j} + γ sin (\frac{j}{ρ})\}}_{j = 1}^{(e_{i}^{ι} + ℓ_{i}^{ι}) - (e_{i}^{o} + ℓ_{i}^{o})}\}}^{λ_{i} \in Λ^{{\hat{ψ}}^{p}}}, {}^{ℓ}{\ddot{ψ}} \in {}^{ℓ}{\ddot{Ψ}}_{a}^{p} \\ λ_{i} = (e_{i}^{o} + ℓ_{i}^{o}, e_{i}^{ι} + ℓ_{i}^{ι}, ω^{λ}) \end{matrix}

(50)

such that:

\begin{matrix} 0 \leq e_{i}^{o} + ℓ_{i}^{o} < e_{i}^{ι} + ℓ_{i}^{ι} \leq e_{i + 1}^{o} + ℓ_{i + 1}^{o} \end{matrix}

(51)

The variation in the exact edge positions,

ℓ_{i}^{o}

and

ℓ_{i}^{ι}

, is sampled from a normal distribution. This stochastic approach ensures that the edges of the states are not fixed but exhibit natural fluctuations, thereby adding realism to the synthetic profiles. Specifically:

\begin{matrix} {ℓ_{i}^{o}, ℓ_{i}^{ι}, ℓ_{i + 1}^{o}} \in N (μ_{i}^{ℓ}, σ_{i}^{ℓ}) \end{matrix}

(52)

here,

μ_{i}^{ℓ}

and

σ_{i}^{ℓ}

are the mean and standard deviation of the edge position variations, respectively. By sampling from these normal distributions, each synthetic SUP can exhibit unique, yet realistically varying state boundaries. This method ensures that the synthetic profiles can accurately reflect the dynamic nature of real appliance operation, where the exact start and end times of states can vary due to numerous factors such as load conditions, user interactions, and inherent appliance behavior.
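One way to apply the edge position variation while respecting the ordering constraint in Equation (51) is sketched below. The text does not say how the constraint is enforced, so the clamping used here is an assumption, as are the rounding of the sampled offsets to whole samples, the function name `jitter_states`, and the example values.

```python
import random


def jitter_states(states, mu_l, sigma_l, rng):
    """Shift both exact edges of every state by Gaussian offsets, keeping them ordered."""
    jittered = []
    previous_end = 0
    for e_o, e_i, power in states:
        # Left edge: shifted, but never before the previous state's right edge.
        new_o = max(previous_end, e_o + round(rng.gauss(mu_l, sigma_l)))
        # Right edge: shifted, but always at least one sample after the left edge.
        new_i = max(new_o + 1, e_i + round(rng.gauss(mu_l, sigma_l)))
        jittered.append((new_o, new_i, power))
        previous_end = new_i
    return jittered


rng = random.Random(11)
states = [(1, 4, 0.0), (4, 9, 1200.0), (9, 12, 0.0)]  # hypothetical states
shifted = jitter_states(states, mu_l=0.0, sigma_l=2.0, rng=rng)
```

The shifted states can then be expanded into a profile in the same way as the original states, so state durations vary from one SySUP to the next.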

9. Evaluation

To evaluate the impact of the tuning parameters explained in the previous section on the SySUP with respect to the SUP, two evaluation metrics are defined:

\bar{δ}

and

\bar{κ}

. The first evaluation metric,

\bar{δ}

, is designed to measure the average similarity between the SySUPs and the actual SUPs for a given appliance and operation mode. This metric uses the Dynamic Time Warping (DTW) distance, a well-known method for measuring similarity between time series data. The DTW distance is normalized by the sum of the lengths of the sequences being compared, so that distances remain comparable across sequences of different lengths. The formal definition of

\bar{δ}

is as follows:

\begin{matrix} \bar{δ} ({}^{ξ}{\ddot{Ψ}}_{a}^{p}, Ψ_{a}^{p}) = \frac{1}{| {}^{ξ}{\ddot{Ψ}}_{a}^{p} |} \sum_{j = 1}^{| {}^{ξ}{\ddot{Ψ}}_{a}^{p} |} [\frac{1}{| Ψ_{a}^{p} |} \sum_{i = 1}^{| Ψ_{a}^{p} |} \frac{DTW ({}^{ξ}{\ddot{ψ_{j}^{p}}}, ψ_{i}^{p})}{| ψ_{i}^{p} | + | \ddot{ψ_{j}^{p}} |}] \end{matrix}

(53)

here,

\bar{δ}

represents the average DTW distance between each synthetic SUP,

{}^{ξ}{\ddot{ψ}}_{a}^{p} \in {}^{ξ}{\ddot{Ψ}}_{a}^{p}

, and every actual SUP,

ψ_{a}^{p} \in Ψ_{a}^{p}

. The parameter a refers to the appliance, and p refers to the operation mode. The normalization factor,

| ψ_{i}^{p} | + | \ddot{ψ_{j}^{p}} |

, is the sum of the sequence lengths, ensuring that the comparison is fair across different sequence lengths.

The second evaluation metric,

\bar{κ}

, is designed to measure the consistency of the DTW distances between the synthetic and actual SUPs. This metric calculates the standard deviation of the DTW distances for each synthetic SUP and then averages these standard deviations. This provides insight into the variability of the synthetic SUPs relative to the actual SUPs. The formal definition of

\bar{κ}

is as follows:

\begin{matrix} \bar{κ} ({}^{ξ}{\ddot{Ψ}}_{a}^{p}, Ψ_{a}^{p}) = \frac{1}{| {}^{ξ}{\ddot{Ψ}}_{a}^{p} |} \sum_{j = 1}^{| {}^{ξ}{\ddot{Ψ}}_{a}^{p} |} σ ({\{\frac{1}{| Ψ_{a}^{p} |} \sum_{i = 1}^{| Ψ_{a}^{p} |} \frac{DTW ({}^{ξ}{\ddot{ψ_{j}^{p}}}, ψ_{i}^{p})}{| ψ_{i}^{p} | + | \ddot{ψ_{j}^{p}} |}\}}_{j = 1}^{| {}^{ξ}{\ddot{Ψ}}_{a}^{p} |}) \end{matrix}

(54)

where

\bar{κ}

represents the average of the standard deviations of the DTW distances between each synthetic SUP,

{}^{ξ}{\ddot{ψ}}_{a}^{p} \in {}^{ξ}{\ddot{Ψ}}_{a}^{p}

, and every actual SUP,

ψ_{a}^{p} \in Ψ_{a}^{p}

. The notation

σ

denotes the standard deviation. By averaging the standard deviations,

\bar{κ}

provides a measure of how consistently the synthetic SUPs match the actual SUPs in terms of their DTW distances.

Together, these metrics,

\bar{δ}

and

\bar{κ}

, provide a comprehensive evaluation of the synthetic SUPs.

\bar{δ}

assesses the overall similarity, while

\bar{κ}

evaluates the consistency of this similarity across different synthetic SUPs. This dual approach ensures a robust evaluation of the synthetic profiles against the real-world data, highlighting both the average performance and the variability of the synthetic SUPs.
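The two metrics can be sketched as follows. The code follows the prose description above: for each synthetic SUP, the normalized DTW distances to all real SUPs are computed, their mean feeds the first metric, and their standard deviation feeds the second. The use of the population standard deviation, the function names, and the example profiles are assumptions made for illustration.

```python
from statistics import mean, pstdev


def dtw_distance(q, p):
    """Accumulated DTW cost (same recurrence as Equation (28))."""
    inf = float("inf")
    acc = [[inf] * (len(p) + 1) for _ in range(len(q) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(q) + 1):
        for j in range(1, len(p) + 1):
            cost = abs(q[i - 1] - p[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[len(q)][len(p)]


def evaluate(synthetic, real):
    """Return (average normalized DTW distance, average per-SySUP standard deviation)."""
    means, deviations = [], []
    for s in synthetic:
        # Normalized distances from this synthetic SUP to every real SUP.
        distances = [dtw_distance(s, r) / (len(r) + len(s)) for r in real]
        means.append(mean(distances))
        deviations.append(pstdev(distances))  # population std; an assumption
    return mean(means), mean(deviations)


# Hypothetical real SUPs of one operation mode and one synthetic SUP.
real = [[0, 100, 100, 0], [0, 100, 100, 100, 0]]
synthetic = [[0, 110, 110, 0]]
delta_bar, kappa_bar = evaluate(synthetic, real)
```

In this toy case the synthetic profile is 10 W too high on its plateau, so the first metric is small but non-zero, and the second metric reflects that the two real profiles are matched almost equally well.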

9.1. Evaluating the Effect of the White Noise Component

The plot in Figure 8 illustrates the impact of the noise coefficient,

ξ

, on the Dynamic Time Warping (DTW) distance metrics for a dryer appliance across three different Appliance Operation Modes (AOMs). The metrics

\bar{δ}

and

\bar{κ}

, as defined in Equation (53) and Equation (54), respectively, are used to assess the performance and consistency of synthetic SUPs (

{}^{ξ}{\ddot{ψ}}_{a}^{p}

) relative to the real SUPs (

Ψ_{a}^{p}

).

The plot shows the effect of the noise coefficient standard deviation on the DTW distance metrics across three AOMs for a dryer. The range of

σ^{ξ}

values is from 1 to 300 samples, with a distribution mean

μ^{ξ} = 0

.

For AOM-1, the average DTW distance (

\bar{δ}

) shows a continuous increasing trend as

σ^{ξ}

increases. This indicates that increasing noise levels result in higher discrepancies between the synthetic and real SUPs. The consistency metric (

\bar{κ}

) for AOM-1 remains relatively constant, suggesting that the variability in DTW distances does not change significantly with increasing noise levels. For AOM-2, the average DTW distance (

\bar{δ}

) initially decreases when

1 \leq σ^{ξ} \leq 50

, indicating that the added noise contributes to increasing the similarity between synthetic and real SUPs. However, beyond this range,

\bar{δ}

starts increasing, which means higher noise values lead to a decrease in similarity. The consistency metric (

\bar{κ}

) for AOM-2 remains low and stable, indicating consistent performance in terms of DTW distances. For AOM-3, the average DTW distance (

\bar{δ}

) exhibits a slight initial decrease followed by a continuous increase as

σ^{ξ}

increases. This pattern suggests that initially, small noise levels may help in improving the similarity between synthetic and real SUPs, but as noise levels increase, the similarity decreases. The consistency metric (

\bar{κ}

) for AOM-3 shows less fluctuation, indicating high consistency in DTW distances.

9.2. Evaluating the Effect of the Switch-On Surge Component

The plot in Figure 9 illustrates the impact of the SOS coefficient,

ϑ

, on the DTW distance metrics for a dryer appliance across three different AOMs. The metrics

\bar{δ}

and

\bar{κ}

, as defined in Equations (53) and (54), are used to assess the performance and consistency of synthetic SUPs (

{}^{ϑ}{\ddot{ψ}}_{a}^{p}

) relative to the real SUPs (

Ψ_{a}^{p}

).

The plot shows the effect of the SOS coefficient mean on the DTW distance metrics across three AOMs for a dryer. The range of

μ^{ϑ}

values is from 1 to 5000 samples, with a distribution standard deviation

σ^{ϑ} = 100

. For AOM-1, the average DTW distance (

\bar{δ}

) starts at a relatively high value and then decreases slightly, which indicates that adding the SOS component reduces the distance between the SUPs and SySUPs. The curve then shows a slight increasing trend as

μ^{ϑ}

increases. This indicates that changes in the SOS coefficient mean lead to a gradual increase in the discrepancy between the synthetic and real SUPs. The consistency metric (

\bar{κ}

) for AOM-1 also follows an increasing trend, suggesting that the variability in DTW distances increases with higher values of

μ^{ϑ}

.

AOM-2 shows a similar behavior to AOM-1. The average DTW distance (

\bar{δ}

) shows a stable pattern, starting at a lower value compared to AOM-1 and remaining relatively constant with slight fluctuations. The consistency metric (

\bar{κ}

) for AOM-2 remains low and stable, indicating a consistent performance in terms of DTW distances.

For AOM-3, the average DTW distance (

\bar{δ}

) exhibits an initial increase followed by a decrease, and then rises again as

μ^{ϑ}

increases. This non-linear pattern suggests that the SOS coefficient impacts the SUPs differently at various levels. Low and high values of the SOS coefficient increase the distance between SUPs and SySUPs, while moderate values of

μ^{ϑ} \approx 2000

samples minimize it. The consistency metric (

\bar{κ}

) for AOM-3 shows some fluctuations, indicating varying levels of distance consistency across different

μ^{ϑ}

values.

9.3. Evaluating the Effect of the Ripple Component

To evaluate the impact of the ripple coefficient, the metrics in Equations (53) and (54) are plotted in Figure 10 for a single AOM for a dryer. The distance metric,

\bar{δ}

, evaluates the impact of the ripple parameters on the SySUP,

{}^{ϱ}{\ddot{ψ}}_{a}^{p}

, with respect to the SUPs,

Ψ_{a}^{p}

, while

\bar{κ}

reflects the consistency of distances,

\bar{δ}

.

The left plot in Figure 10 illustrates the effect of the ripple period mean

μ^{ρ}

on the DTW distance metrics. As

μ^{ρ}

increases, the average DTW distance (

\bar{δ}

) initially decreases slightly, indicating an initial improvement in similarity between the synthetic and real SUPs. However, after a certain point,

\bar{δ}

begins to increase sharply, suggesting that excessively long ripple periods introduce significant deviations from the real SUPs. This trend reflects the non-linear impact of the ripple period on the overall shape and timing of the SUPs.

The right plot in Figure 10 shows the effect of the ripple amplitude mean

μ^{γ}

on the DTW distance metrics. Similarly to the effect of

μ^{ρ}

, the plot initially decreases slightly, indicating an initial improvement in similarity between the synthetic and real SUPs. As

μ^{γ}

then increases, the average DTW distance (

\bar{δ}

) consistently rises, indicating that larger ripple amplitudes lead to greater discrepancies between the synthetic and real SUPs. This increase is more gradual than that caused by the ripple period, suggesting a more predictable but still significant effect of amplitude changes on SUP similarity.

For both ripple parameters, the consistency metric (

\bar{κ}

) shows an increasing trend, but with different patterns. In the case of

μ^{ρ}

,

\bar{κ}

remains relatively stable at lower values before increasing sharply, mirroring the trend observed in

\bar{δ}

. This indicates that the consistency of DTW distances remains stable until the ripple period becomes excessively long. For

μ^{γ}

,

\bar{κ}

increases more steadily, indicating that larger ripple amplitudes consistently introduce more variability in the DTW distances.

Both ripple parameters,

μ^{ρ}

and

μ^{γ}

, exhibit a significant impact on the DTW distance metrics, although in different ways. The ripple period mean affects the SUP similarity in a non-linear fashion, with an initial decrease followed by a sharp increase in

\bar{δ}

, while the ripple amplitude mean shows a more straightforward increasing trend. This indicates that while both parameters are crucial for generating realistic synthetic SUPs, their effects on the DTW distance metrics differ in nature.

9.4. Evaluating the Effect of the State Edge Position Variation

The plot in Figure 11 depicts the impact of the EEP factor, ℓ, on the DTW distance metrics for a dryer appliance across three different AOMs: AOM-1, AOM-2, and AOM-3. The metrics

\bar{δ}

and

\bar{κ}

, as defined in Equations (53) and (54), respectively, are used to assess the performance and consistency of synthetic SUPs (

{}^{ℓ}{\ddot{ψ}}_{a}^{p}

) relative to the real SUPs (

Ψ_{a}^{p}

).

As the standard deviation of the EEP (

σ^{ℓ}

) increases, the average DTW distance (

\bar{δ}

) shows a significant increasing trend, particularly for AOM-3. This indicates that larger variations in the edge positions lead to greater discrepancies between the synthetic and real SUPs. This trend is less pronounced in AOM-1 and AOM-2, suggesting that the impact of EEP variation may be more critical for certain operation modes. The different behaviors observed in AOM-1, AOM-2, and AOM-3 indicate that the effect of EEP variation is dependent on the specific operational characteristics of each mode. AOM-3 exhibits a more pronounced increase in

\bar{δ}

with increasing

σ^{ℓ}

, which could be due to more complex or sensitive operational patterns that are highly affected by edge position shifts.

The consistency of the DTW distances, represented by the metric

\bar{κ}

, remains relatively stable across different values of

σ^{ℓ}

. This implies that while the average distance (

\bar{δ}

) increases, the variability of these distances does not fluctuate significantly. This stability in

\bar{κ}

suggests a uniform impact of the EEP variations across different samples within each AOM.

10. Conclusions and Future Work

In this paper, we presented HYDROSAFE, a novel hybrid deterministic-probabilistic model for generating synthetic appliance power consumption profiles. Our approach combines data-driven analysis with stochastic elements to enhance realism and variability. The application of DTW and MDT algorithms ensures accurate clustering and profile characterization, while the probabilistic adjustments simulate realistic usage patterns. Our evaluation demonstrates that HYDROSAFE effectively replicates real-world data, offering a valuable tool for developing and testing energy management systems. The results show a high similarity between original and synthetic profiles, with an average distance of ten samples at a 1 Hz sampling rate.

Future work will explore extending HYDROSAFE to incorporate more complex appliance interactions, such as recommender systems, and expanding its application to various residential environments, thus providing a robust test bed for validating analytical algorithms and energy management solutions. Additionally, future studies will focus on validating the model by comparing its outputs with experimental data, particularly in terms of power curves and energy consumption, to assess the differences between the proposed model and the actual behavior of functioning systems.

Author Contributions

Conceptualization, validation, writing—original draft preparation, writing—review and editing, visualization, A.J.; methodology, formal analysis, A.J. and M.A.; supervision, funding acquisition, A.H. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and in part by the Western WIN 4.0 Research Grant.

Data Availability Statement

The data is generated using the proposed model.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

$H$	Set of households
h	Single household
$A^{h}$	Set of appliances of h
a	Single appliance
$P^{a}$	Set of operation modes for a
p	Operation mode (AOM)
$D$	Set of days
d	Single day
$Ω_{a}^{d}$	Daily power consumption sequence
$ω_{n}$	$n^{\text{th}}$ instantaneous power sample value
$n_{*}$	Last sample index in d
$f_{s}$	Sampling frequency
t	Time
$Ψ_{a}^{d, p}$	Set of SUPs in d
$ψ^{p}$	Single Use Profile (SUP) activated with p
$\hat{ψ}$	Smoothed SUP
${\ddot{Ψ}}_{a}$	Set of synthetic SUPs for a
$θ_{s}$	Length of sequence s
$n^{s}$	Sample index when appliance is turned on
$n^{e}$	Sample index when appliance is turned off
$ϕ$	Empty sequence
$H$	HYDROSAFE generator function
	Set of tuning parameters
$τ^{ε}$	Stand-by power threshold
$M$	Moving median smoother function
W	Sliding window size
$d (x, y)$	Minkowski distance between sequences $x, y$
r	Minkowski distance order
$E (ψ, \hat{ψ})$	$L_{2}$ norm of the Minkowski pairwise distance between sequences $ψ, \hat{ψ}$
$I^{\hat{ψ}}$	Indicator vector of $\hat{ψ}$
$σ (x)$	Standard deviation of sequence x
$W^{I}$	MDT window size
$k^{l}$	Left partition of the MDT window
$k^{r}$	Right partition of the MDT window
$Π^{\hat{ψ}}$	Sequence of thick edges for $\hat{ψ}$
$π$	Single thick edge
$n^{ι}$	Lower bound of $π$
$n^{o}$	Upper bound of $π$
$τ^{I}$	Thick edges threshold
$Λ^{\hat{ψ}}$	Sequence of states for $\hat{ψ}$
$λ$	Single state
R	Size of $Λ$
$e^{o}$	Left exact edge of $λ$
$e^{ι}$	Right exact edge of $λ$
${}^{⌟}e_{o}^{i}$	Rising exact edge
${}^{⌝}e_{o}^{i}$	Falling exact edge
$d_{e} (q_{i}, p_{j})$	Euclidean distance between $q_{i}, p_{j}$
DDTWSC	SUPs clustering algorithm
$ϵ$	Eps-neighborhood hyperparameter
$μ$	Minimum number of adjacent elements
$Ψ_{a}^{c}$	The core set of SUPs
$Ψ_{a}^{b}$	The border set of SUPs
$Ψ_{a}^{o}$	The outlier set of SUPs
$Ψ_{a}^{*}$	Directly density-reachable SUPs
ℵ	DTW distance matrix
$\mathrm{DTW} (ψ^{p_{i}}, ψ^{p_{j}})$	DTW distance between $ψ^{p_{i}}, ψ^{p_{j}}$
$\bar{δ} (ψ^{p_{i}}, ψ^{p_{j}})$	Normalized pairwise DTW distance between $ψ^{p_{i}}, ψ^{p_{j}}$
$Δ$	Normalized pairwise distance matrix
$μ$	Mean value
$N$	Normal distribution
$ξ$	Noise coefficient
${}^{ξ}{\ddot{ψ}}$	SySUP with added noise
$\bar{δ} (\ddot{ψ}, Ψ)$	Mean DTW distance between a SySUP and corresponding SUPs
$\bar{κ} (\ddot{ψ}, Ψ)$	Mean of standard deviations of DTW distance between a SySUP and corresponding SUPs
${}^{ϑ}{\ddot{ψ}}$	SySUP with added SOS
$ϑ$	SOS coefficient
${}^{ϱ}{\ddot{ψ}}$	SySUP with added ripple
$ϱ$	Ripple coefficient
$γ$	Ripple amplitude
$ρ$	Ripple period length
ℓ	Exact Edge Position (EEP)
${}^{ℓ}{\ddot{ψ}}$	SySUP with variant EEPs

References

1. Ahmad, T.; Chen, H. Potential of three variant machine-learning models for forecasting district level medium-term and long-term energy demand in smart grid environment. Energy 2018, 160, 1008–1020.
2. Energy, U. Annual Energy Review 2009; US Energy Information Administration: Washington, DC, USA, 2010; pp. 19–53.
3. Canada Energy Regulator, Canada’s Energy Future. 2021. Available online: https://bit.ly/3d8vIqH (accessed on 29 May 2023).
4. U.S. Energy Information Administration. Annual Energy Outlook 2021. Available online: https://www.eia.gov/outlooks/aeo/ (accessed on 29 May 2023).
5. Canada, N.R. Natural Resources Canada, Appliances for Residential Use. 2021. Available online: https://bit.ly/3omEb0d (accessed on 29 May 2023).
6. AlHammadi, A.; AlZaabi, A.; AlMarzooqi, B.; AlNeyadi, S.; AlHashmi, Z.; Shatnawi, M. Survey of IoT-Based Smart Home Approaches. In Proceedings of the 2019 Advances in Science and Engineering Technology International Conferences (ASET), Dubai, United Arab Emirates, 26 March–10 April 2019; IEEE: New York, NY, USA, 2019; pp. 1–6.
7. Chen, J.; Zhao, Y.; Wang, M.; Wang, K.; Huang, Y.; Xu, Z. Power Sharing and Storage-Based Regenerative Braking Energy Utilization for Sectioning Post in Electrified Railways. IEEE Trans. Transp. Electrif. 2024, 10, 2677–2688.
8. Chen, J.; Hu, H.; Wang, M.; Ge, Y.; Wang, K.; Huang, Y.; Yang, K.; He, Z.; Xu, Z.; Li, Y.R. Power flow control-based regenerative braking energy utilization in ac electrified railways: Review and future trends. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6345–6365.
9. Gutiérrez-Peña, J.A.; Flores-Arias, J.M.; Bellido-Outeiriño, F.; Lopez, M.O.; Latorre, F.Q. Smart Home Energy Management System and How to Make It Cost Affordable. In Proceedings of the 2020 IEEE 10th International Conference on Consumer Electronics (ICCE-Berlin), Berlin, Germany, 9–11 November 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
10. Chen, Y.Y.; Chen, M.H.; Chang, C.M.; Chang, F.S.; Lin, Y.H. A smart home energy management system using two-stage non-intrusive appliance load monitoring over fog-cloud analytics based on Tridium’s Niagara framework for residential demand-side management. Sensors 2021, 21, 2883.
11. Jaradat, A.; Lutfiyya, H.; Haque, A. Smart Home Energy Visualizer: A Fusion Of Data Analytics and Information Visualization. IEEE Can. J. Electr. Comput. Eng. 2022, 45, 77–87.
12. Shewale, A.; Mokhade, A.; Funde, N.; Bokde, N.D. An Overview of Demand Response in Smart Grid and Optimization Techniques for Efficient Residential Appliance Scheduling Problem. Energies 2020, 13, 4266.
13. Chouaib, B.; Lakhdar, D.; Lokmane, Z. Smart Home Energy Management System Architecture Using IoT; ICIST: New York, NY, USA, 2019.
14. Jaradat, A.; Lutfiyya, H.; Haque, A. Demand Response for Residential Uses: A Data Analytics Approach. In Proceedings of the 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA, 2–16 June 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
15. Himeur, Y.; Alsalemi, A.; Bensaali, F.; Amira, A. Building power consumption datasets: Survey, taxonomy and future directions. Energy Build. 2020, 227, 110404.
16. Gopinath, R.; Kumar, M.; Joshua, C.P.C.; Srinivas, K. Energy management using non-intrusive load monitoring techniques–State-of-the-art and future research directions. Sustain. Cities Soc. 2020, 62, 102411.
17. Thorve, S.; Baek, Y.Y.; Swarup, S.; Mortveit, H.; Marathe, A.; Vullikanti, A.; Marathe, M. High resolution synthetic residential energy use profiles for the United States. Sci. Data 2023, 10, 76.
18. Flett, G.; Kelly, N. Modelling of individual domestic occupancy and energy demand behaviours using existing datasets and probabilistic modelling methods. Energy Build. 2021, 252, 111373.
19. Marszal-Pomianowska, A.; Heiselberg, P.; Larsen, O.K. Household electricity demand profiles–A high-resolution load model to facilitate modelling of energy flexible buildings. Energy 2016, 103, 487–501.
20. Pflugradt, N.; Muntwyler, U. Synthesizing residential load profiles using behavior simulation. Energy Procedia 2017, 122, 655–660.
21. Lopez, J.M.G.; Pouresmaeil, E.; Canizares, C.A.; Bhattacharya, K.; Mosaddegh, A.; Solanki, B.V. Smart residential load simulator for energy management in smart grids. IEEE Trans. Ind. Electron. 2018, 66, 1443–1452.
22. Kabirifar, M.; Pourghaderi, N.; Rajaei, A.; Moeini-Aghtaie, M.; Safdarian, A. Deterministic and probabilistic models for energy management in distribution systems. In Handbook of Optimization in Electric Power Distribution Systems; Springer: Berlin/Heidelberg, Germany, 2020; pp. 343–382.
23. Makonin, S.; Wang, Z.J.; Tumpach, C. RAE: The Rainforest Automation Energy Dataset for Smart Grid Meter Data Analysis. arXiv 2017, arXiv:1705.05767.
24. Happle, G.; Fonseca, J.A.; Schlueter, A. A review on occupant behavior in urban building energy models. Energy Build. 2018, 174, 276–292.
25. Kaselimi, M.; Protopapadakis, E.; Voulodimos, A.; Doulamis, N.; Doulamis, A. Towards Trustworthy Energy Disaggregation: A Review of Challenges, Methods, and Perspectives for Non-Intrusive Load Monitoring. Sensors 2022, 22, 5872.
26. Sepehr, M.; Eghtedaei, R.; Toolabimoghadam, A.; Noorollahi, Y.; Mohammadi, M. Modeling the electrical energy consumption profile for residential buildings in Iran. Sustain. Cities Soc. 2018, 41, 481–489.
27. Widén, J.; Wäckelgård, E. A high-resolution stochastic model of domestic activity patterns and electricity demand. Appl. Energy 2010, 87, 1880–1892.
28. Richardson, I.; Thomson, M.; Infield, D.; Clifford, C. Domestic electricity use: A high-resolution energy demand model. Energy Build. 2010, 42, 1878–1887.
29. McKenna, E.; Thomson, M. High-resolution stochastic integrated thermal–electrical domestic demand model. Appl. Energy 2016, 165, 445–461.
30. Lombardi, F.; Balderrama, S.; Quoilin, S.; Colombo, E. Generating high-resolution multi-energy load profiles for remote areas with an open-source stochastic model. Energy 2019, 177, 433–444.
31. Nilsen, C.B.; Hoff, B.; Østrem, T. Framework for Modeling and Simulation of Household Appliances. In Proceedings of the IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; IEEE: New York, NY, USA, 2018; pp. 3472–3476.
32. Gui, J.; Sun, Z.; Wen, Y.; Tao, D.; Ye, J. A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Trans. Knowl. Data Eng. 2021, 35, 3313–3332.
33. Harell, A.; Jones, R.; Makonin, S.; Bajić, I.V. TraceGAN: Synthesizing appliance power signatures using generative adversarial networks. IEEE Trans. Smart Grid 2021, 12, 4553–4563.
34. Harell, A.; Jones, R.; Makonin, S.; Bajić, I.V. PowerGAN: Synthesizing Appliance Power Signatures Using Generative Adversarial Networks. arXiv 2020, arXiv:2007.13645.
35. Song, L.; Li, Y.; Lu, N. ProfileSR-GAN: A GAN based Super-Resolution Method for Generating High-Resolution Load Profiles. IEEE Trans. Smart Grid 2022, 13, 3278–3289.
36. Sanderson, E.; Fragaki, A.; Simo, J.; Matuszewski, B.J. mREAL-GAN: Generating Multiple Residential Electrical Appliance Load Profiles with Inter-Dependencies using a Generative Adversarial Network. arXiv 2021, arXiv:2112.06656.
37. Liang, X.; Wang, H. Synthesis of realistic load data: Adversarial networks for learning and generating residential load patterns. In Tackling Climate Change with Machine Learning 2022, Proceedings of the NeurIPS 2022 Workshop, Neural Information Processing Systems (NIPS), New Orleans, LA, USA, 28 November–9 December 2022; Mitra, P., João Sousa, M., Roth, M., Drgoňa, J., Strubell, E., Bengio, Y., Eds.; NIPS: San Diego, CA, USA; pp. 1–8.
38. Gkoutroumpi, C.; Gkalinikis, N.V.; Vrakas, D. SGAN: Appliance Signatures Data Generation for NILM Applications Using GANs. In Intelligent Computing; Series Title: Lecture Notes in Networks and Systems; Arai, K., Ed.; Springer Nature Switzerland: Cham, Switzerland, 2024; Volume 1018, pp. 325–339.
39. Wu, A.N.; Stouffs, R.; Biljecki, F. Generative Adversarial Networks in the built environment: A comprehensive review of the application of GANs across data types and scales. Build. Environ. 2022, 304, 109477.
40. Ezhilarasi, P.; Ramesh, L.; Liu, X.; Holm-Nielsen, J.B. Smart Meter Synthetic Data Generator development in python using FBProphet. Softw. Impacts 2023, 15, 100468.
41. Li, D.; Bissyandé, T.F.; Kubler, S.; Klein, J.; Le Traon, Y. Profiling household appliance electricity usage with N-gram language modeling. In Proceedings of the 2016 IEEE International Conference on Industrial Technology (ICIT), Taipei, Taiwan, 14–17 March 2016; pp. 604–609.
42. Buneeva, N.; Reinhardt, A. AMBAL: Realistic load signature generation for load disaggregation performance evaluation. In Proceedings of the 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm), Dresden, Germany, 23–27 October 2017; IEEE: New York, NY, USA, 2017; pp. 443–448.
43. Klemenjak, C.; Kovatsch, C.; Herold, M.; Elmenreich, W. A synthetic energy dataset for non-intrusive load monitoring in households. Sci. Data 2020, 7, 108.
44. Chen, D.; Irwin, D.; Shenoy, P. Smartsim: A device-accurate smart home simulator for energy analytics. In Proceedings of the 2016 IEEE International Conference on Smart Grid Communications (SmartGridComm), Sydney, NSW, Australia, 6–9 November 2016; IEEE: New York, NY, USA, 2016; pp. 686–692.
45. Barker, S.; Mishra, A.; Irwin, D.; Cecchet, E.; Shenoy, P.; Albrecht, J. Smart*: An open data set and tools for enabling research in sustainable homes. SustKDD August 2012, 111, 108.
46. Henriet, S.; Şimşekli, U.; Fuentes, B.; Richard, G. A generative model for non-intrusive load monitoring in commercial buildings. Energy Build. 2018, 177, 268–278.
47. A Simulated High-Frequency Energy Disaggregation Dataset for Commercial Buildings. Available online: https://nilm.telecom-paristech.fr/shed/ (accessed on 24 July 2024).
48. Ding, C.H.; Li, T.; Jordan, M.I. Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 32, 45–55.
49. Batra, N.; Kelly, J.; Parson, O.; Dutta, H.; Knottenbelt, W.; Rogers, A.; Singh, A.; Srivastava, M. NILMTK: An open source toolkit for non-intrusive load monitoring. In Proceedings of the 5th International Conference on Future Energy Systems, Cambridge, UK, 11–13 June 2014; pp. 265–276.
50. Zhao, A.; Chen, M.; Yu, J.; Cui, P. Simulating appliance-level household electricity data: Accounting for residential behavior and usage patterns in China. J. Build. Eng. 2024, 92, 109804.
51. Donnal, J. NILM-Synth: Synthetic Dataset Generation for Non-Intrusive Load Monitoring Algorithms. In Proceedings of the 2022 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), New Orleans, LA, USA, 24–28 April 2022; pp. 1–6.
52. Meiser, M.; Duppe, B.; Zinnikus, I. SynTiSeD – Synthetic Time Series Data Generator. In Proceedings of the 2023 11th Workshop on Modelling and Simulation of Cyber-Physical Energy Systems (MSCPES), San Antonio, TX, USA, 9 May 2023; pp. 1–6.
53. Proedrou, E. A comprehensive review of residential electricity load profile models. IEEE Access 2021, 9, 12114–12133.
54. Jaradat, A. Github Repo for HYDROSAFE, 2021. Available online: https://github.com/abedjar/HYDROSAFE (accessed on 29 May 2023).
55. Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the KDD Workshop, Seattle, WA, USA, 31 July–1 August 1994; Volume 10, pp. 359–370.
56. Menke, W.; Menke, J. Environmental Data Analysis with MatLab; Academic Press: Cambridge, MA, USA, 2016.
57. Nguyen, T.; Qin, X.; Dinh, A.; Bui, F. Low resource complexity R-peak detection based on triangle template matching and moving average filter. Sensors 2019, 19, 3997.
58. An, X.; Stylios, G.K. Comparison of motion artefact reduction methods and the implementation of adaptive motion artefact reduction in wearable electrocardiogram monitoring. Sensors 2020, 20, 1468.
59. Tan, P.N.; Steinbach, M.; Kumar, V. Introduction to Data Mining; Pearson Education India: Bangalore, India, 2016.
60. Patel, K.J. Effects of Transformer Inrush Current. Bachelor’s Thesis, University of Southern Queensland, Toowoomba, QLD, Australia, 2013.
61. Fried, R. On the robust detection of edges in time series filtering. Comput. Stat. Data Anal. 2007, 52, 1063–1074.
62. Jaradat, A.; Alarbi, M.; Lutfiyya, H.; Haque, A. Appliances Operation Modes Identification Using States Clustering. In Proceedings of the 2023 International Conference on Smart Applications, Communications and Networking (SmartNets), Istanbul, Turkiye, 25–27 July 2023; pp. 1–6.
63. Jaradat, A.; Lutfiyya, H.; Haque, A. Density And Dynamic Time Warping Based Spatial Clustering For Appliance Operation Modes. In Proceedings of the 2023 IEEE PES Conference on Innovative Smart Grid Technologies-Middle East (ISGT Middle East), Abu Dhabi, United Arab Emirates, 12–15 March 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
64. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the KDD, Portland, OR, USA, 2–5 August 1996; Volume 96, pp. 226–231.

Figure 1. Two SUPs for a clothes dryer. Each SUP is activated with a different AOM.

Figure 2. The annual power consumption [23] and corresponding costs for operating three appliances, and the potential savings from switching to lighter operation modes.

Figure 3. The architecture of HYDROSAFE.

Figure 4. A square wave with uniform noise, moving average, and moving median.

Figure 5. The Euclidean distance between the trimmed and smoothed versions of multiple SUPs for a dryer, using a moving median with varying window size.

Figure 6. (a) The smoothed SUP sequence $\hat{ψ} (n)$ with the indicator vector $I^{\hat{ψ}} (n)$. (b) A zoom-in on a state showing its upper and lower bounds, the upper and lower bounds of the thick edges, and the exact edges.

Figure 7. The distance matrix, $Δ$, for 3 appliances in 2 houses [23]. (A) A dryer in house-2 with 3 AOMs. (B) A dryer in house-1 with 3 AOMs. (C) A clothes washer in house-1 with 2 AOMs. (D) A dishwasher in house-1 with 2 AOMs.

Figure 8. The impact of changing the noise coefficient, $ξ$, on the values of the distance mean, $\bar{δ}$, for a dryer.

Figure 9. The impact of changing the SOS coefficient, $ϑ$, on the values of the distance mean, $\bar{δ}$, for a dryer.

Figure 10. The impact of changing the ripple parameters, $ρ$ and $γ$, on the values of the distance mean, $\bar{δ}$, for a dryer.

Figure 11. The impact of the EEP factor, $ℓ$, on the values of the distance mean, $\bar{δ}$, for a dryer.

Table 1. A comparison of HYDROSAFE and publicly available synthetic datasets and simulators.

Simulator	No. Appliances	Availability	Sampling Rate	Scope	Description
Henriet et al. [46]	66	Public	0.033 Hz	Commercial	SHED is a stochastic-based comprehensive framework for energy disaggregation in commercial buildings. It includes a statistical analysis of the differences between commercial and residential buildings and a generative model that simulates high-frequency current waveforms using the Semi Non-negative Matrix Factorization (SNMF) algorithm [48].
Chen et al. [44]	25	Public	1 Hz	Residential	SmartSim is a device-accurate, NILM-TK integrated [49] smart home energy trace generator. It produces complete home datasets with second-level energy data through a pipeline that applies distribution learning, event marking, and trace generation to historical data.
Buneeva et al. [42]	14	N/A	1 Hz	Residential	AMBAL is a NILM-TK integrated system for automatically generating realistic synthetic power consumption traces represented as sequences of parameterized signatures, minimizing complexity for desired accuracy.
Zhao et al. [50]	N/A	N/A	N/A	Residential	A data generation model based on Markov chains and Variational Autoencoders (VAE) to simulate diversified and random electricity consumption data for household appliances, accounting for the residential behavior and usage patterns in Chinese households.
Thorve et al. [17]	7	Public	Hourly	Residential	A large-scale digital-twin dataset of residential energy use for the contiguous United States, featuring synthetic hourly energy use profiles for the U.S. population using census data, statistical methods, activity-related attributes through regression models and survey data.
Donnal [51]	Variable	Public	Variable	Residential	NILM-Synth is a synthetic dataset generation tool that creates realistic power waveforms by superimposing extracted exemplars from live power data using existing NILM infrastructure.
Ezhilarasi et al. [40]	N/A	Public	30 min	N/A	Smart meter-SDG is a Smart Meter Synthetic Data Generator using the FBProphet library based on the UK Power Networks project.
Meiser et al. [52]	N/A	Public	N/A	Residential	SynTiSeD is a probabilistic multi-agent-based simulation tool that generates synthetic energy data based on real-world data. The model is interactive and takes behavior modeling, residents, and appliances into account.
Klemenjak et al. [43]	21	Public	5 Hz	Residential	SynD is a synthetic energy dataset generated by a custom simulation process based on power consumption patterns recorded from real household appliances in Austria, covering constantly-on, periodical, single-pattern, and multi-pattern appliances.

N/A: Information is not available.

Table 2. The average of the pairwise DTW distance per house per appliance.

House	Appliance	AOM-1	AOM-2	AOM-3
1	dryer	32.704	17.23	18.74
2	dryer	77.63	203.19	208.24
1	dishwasher	31.92	8.45	-
2	dishwasher	0.76	-	-
1	washer	10.04	9.064	-
2	washer	7.65	9.29	6.24

Table 3. The standard deviation of pairwise DTW distance per house per appliance.

House	Appliance	AOM-1	AOM-2	AOM-3
1	dryer	44.91	5.36	17.80
2	dryer	45.22	0.0	93.15
1	dishwasher	30.11	4.91	-
2	dishwasher	0.36	-	-
1	washer	3.69	3.74	-
2	washer	2.75	3.15	1.65

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

