Generating Synthetic Electricity Load Time Series at District Scale Using Probabilistic Forecasts

Richter, Lucas; Bender, Tom; Lenk, Steve; Bretschneider, Peter

doi:10.3390/en17071634


¹ Fraunhofer IOSB, Applied System Technology, Am Vogelherd 90, 98693 Ilmenau, Germany

² Department of Electrical Engineering and Information Technology, Ilmenau University of Technology, Ehrenbergstraße 29, 98693 Ilmenau, Germany

* Author to whom correspondence should be addressed.

Energies 2024, 17(7), 1634; https://doi.org/10.3390/en17071634

Submission received: 20 February 2024 / Revised: 20 March 2024 / Accepted: 21 March 2024 / Published: 28 March 2024

(This article belongs to the Special Issue Energy Communities for the Transition to a Sustainable and Decarbonized Society)


Abstract

Thanks to various European directives, individuals are empowered to share and trade electricity within Renewable Energy Communities, enhancing the operational efficiency of local energy systems. The digital transformation of the energy market enables the integration of decentralized energy resources using cloud computing, the Internet of Things, and artificial intelligence. In order to assess the feasibility of new business models based on data-driven solutions, various electricity consumption time series are necessary at this level of aggregation. Since these are not yet available in sufficient quality and quantity, and due to data privacy reasons, synthetic time series are essential in the strategic planning of smart grid energy systems. By enabling the simulation of diverse scenarios, they facilitate the integration of new technologies and the development of effective demand response strategies. Moreover, they provide valuable data for assessing novel load forecasting methodologies that are essential to manage energy efficiently and to ensure grid stability. Therefore, this research proposes a methodology to synthesize electricity consumption time series by applying the Box–Jenkins method, an intelligent sampling technique for data augmentation, and a probabilistic forecast model. This novel approach emulates the stochastic nature of electricity consumption time series and synthesizes realistic ones of Renewable Energy Communities concerning seasonal as well as short-term variations and stochasticity. Comparing autocorrelations, distributions of values, and principal components of daily sequences between real and synthetic time series, the results exhibit nearly identical characteristics to the original data and, thus, are usable in designing and studying efficient smart grid systems.

Keywords:

synthetic time series; probabilistic neural network; electricity consumption; district scale; household data

1. Introduction

1.1. The Energy Market

The European Renewable Energy Directive RED II [1] is the first to mention the concept of Renewable Energy Communities (REC, Definition 1) and enables local energy exchange, leading to reduced prices. This initiative not only promotes regional development but also unlocks the potential for local energy efficiency through sector coupling of electricity, heat, and transport. Moreover, it creates opportunities for innovative business models such as tenant electricity, energy sharing, demand side management, and localized energy pricing. As the liberalization of the energy market enables consumers to potentially switch their community on a daily basis [1], energy management systems are facing an increasingly dynamic portfolio, leading to non-stationary (Definition 2) and discontinuous (Definition 3) electricity consumption time series. The integration of new consumer types, such as heat pumps and electric vehicles, reinforces this effect. Since electricity consumption forecasts of REC are critical for the optimization of local power plants, there is a pressing need to develop appropriate prediction models using synthetic time series.

REC consist of multiple districts that are composed of several households within one low voltage network—separated from neighboring ones by local grid transformers or gas pressure regulators and often possessing a different spatial extent [2]. They have various characteristics concerning local infrastructures, building characteristics, compositions of residents with various socioeconomic backgrounds, and energy systems of different technical equipment. Additionally, REC are individually shaped, e.g., urban, rural, industrial, or mixed ones [3]. Consequently, their specific energy consumption depends on many factors. Overall, the building sector accounts for more than 35% of energy demand, and 90% of this is used to produce heat [4]. While the electrification of the heat and mobility sector entails immense potential to increase energy efficiency, electricity load increases, which can lead to high stress on the grid [5]. The load shift potential of flexible producers and consumers as well as their systematic interaction becomes more important in the context of energy transition [4]. Cross-sectoral energy management of local devices leverages synergy potentials between electricity, heat, and mobility; reduces grid losses; increases the security of the energy supply; decreases dependence on energy imports; and integrates new technologies like seasonal storage systems [2].

In our definition, an energy management system (EMS) plans the operational management of electricity generation, consumption, and storage by gathering their data, analyzing them, and optimizing their operation using data-based forecasts [6]. It is able to reduce energy costs and to enhance operational efficiency. Since the electricity consumption behavior of REC is non-stationary and discontinuous, innovative forecasting models must be developed to take this uncertainty into account. In particular, the influence of residents with different socioeconomic backgrounds on the total energy demand has to be considered [7,8,9,10]. Time series synthesis plays a vital role in numerous research areas (e.g., time series forecasting, energy system optimization, and anomaly detection) by addressing the constraints of limited datasets and enabling a substantial expansion of the available data: it (i) can serve as a benchmark to evaluate and validate the performance of forecasting models [11], (ii) can protect sensitive data [12], (iii) can augment existing datasets [13], (iv) is applicable in scenario analysis and planning [14], (v) can enhance risk management calculations [15], and (vi) is able to support the analysis of new business models.

Definition 1 ([16]).

A Renewable Energy Community

Consists of at least 50 natural persons;
Involves at least 75% of the shares being held by natural persons who are located within one postal area and a radius of 50 km;
Requires that no member possesses more than 10% of the shares.

Definition 2.

Non-Stationarity

For a non-stationary time series, the mean, variance, and autocorrelation are not constant over time, even after removing seasonality.

Definition 3.

Discontinuity

Discontinuous time series possess bounds in the sequence of observations.

1.2. Time Series Synthesis

Time series synthesis has been studied for a long time in various domains. Shamshad et al. used a Markov chain to calculate transition probabilities from one state to another and were able to obtain statistical characteristics in accordance with real time series data [17]. Talbot et al. applied a two-step procedure that detrends signals by a Fourier analysis and then characterizes the residual signal using an Auto-Regressive Moving-Average (ARMA) model [18]. Richardson et al. simulated the energy demand of households by analyzing occupants’ activities depending on different weekdays [19]. Both Talbot et al. and Richardson et al. evaluated the usability of their methods by visually comparing real and synthetic time series. A bottom-up approach was used to synthesize residential electricity consumption by modeling the usage frequency of different appliances in terms of turn-on time, operating time, and their potential correlation [20]. Here, the distribution of time series values exhibited strong consistency between real and synthetic ones. Naumann et al. combined deterministic, periodic, exogenous, and stochastic time series components additively in order to generate synthetic sequences [21,22]. However, this approach requires parameterization of the process stochasticity, which requires additional information that is unavailable for REC. Recently, diverse Generative Adversarial Networks (GANs) have been widely used to synthesize one- or multidimensional energy time series [15,23,24] by splitting and processing them in daily vectors. It was shown that the discriminator part is not able to distinguish between real and fake sequences. Real and synthetic sequences could also not be distinguished within a 2-dimensional principal component analysis [23]. While these papers focus intensely on single-day vectors, the entire time series process including auto-regression, trend, and periodicity is neglected. Additionally, GANs have some disadvantages like mode collapse, training instability, and the need for a large amount of training data [25]. The application of conditional Invertible Neural Networks (cINNs) shows benefits regarding realistic synthesis of non-stationary and periodic time series [26]. Here, a predictive score was used to measure the quality of generated sequences. This was achieved by training a forecast model using synthetic time series and evaluating its performance on real data. Some synthetic energy time series are already publicly available [27,28].

1.3. Contributions

A time series typically consists of four components (Equation (1)): (1) The trend component $T_{t}$ represents systematic linear changes over time. (2) The seasonal component $S_{t}$ captures periodic variations within a year. (3) The periodic component $C_{t}$ represents short and repetitive occurrences of typical patterns, such as the type of day. (4) The white noise $\epsilon_{t}$ represents residuals from unexplained influences.

$$Y_{t} = T_{t} + S_{t} + C_{t} + \epsilon_{t} \tag{1}$$

While districts represent subsets of REC with higher stochasticity and individual characteristics, we propose the synthesis of district electricity consumption time series (DECTS, Definition 4) as a first step. Subsequently, the electricity consumption time series of REC can be constructed by combining different DECTS, resulting in a diverse range of urban or rural profiles. Therefore, we develop a simplified approach to synthesize DECTS by using a probabilistic forecast model in conjunction with conditional inputs like weather and calendar data (as described in Section 3). Compared to the state-of-the-art, this novel methodology can emulate the stochastic nature of time series, is transferable to varying characteristics of DECTS, ensuring scalability, and is able to handle seasonality, periodicity, and white noise. Based on the Box–Jenkins method [29], we examine various seasonalities through data clustering, determine autoregressive terms using partial autocorrelations, account for day-of-week effects in differencing, and compare the statistical properties (such as autocorrelation and partial autocorrelation) of real and synthetic time series. This top-down approach deliberately omits the synthesis of time series for individual households because of issues related to data gaps, considerable stochasticity, and challenges in modeling. In our work, we use $N = 10$ (Definition 4) to obtain a characteristic DECTS of realistic size with high stochasticity. To address limited training data, we develop a sophisticated pre-processing pipeline including time series clustering and a multi-step probabilistic forecast model to generate daily mean as well as half-hourly values (Figure 1). The generated time series are to be used prospectively for research in short-term district energy management. For this purpose, the trend component $T_{t}$ is neglected. Since districts comprise residents with diverse characteristics, their DECTS will be non-stationary and discontinuous due to resident exchanges or their behavioral development (Section 1.1). To obtain such characteristics within time series, the composition of residents could be altered over time. The structure of this work is as follows: (1) analysis, transformation, and clustering of a publicly available dataset (Section 2); (2) methodology including problem description, concept of time series synthesis, time series process, data pre-processing, and augmentation (Section 3); (3) description of model architecture, training strategy, and procedure of time series synthesis (Section 4); (4) evaluation of synthetic time series (Section 5); and (5) discussion and conclusion (Section 6 and Section 7).

Definition 4.

DECTS

A DECTS is the aggregated electricity consumption time series of N distinct households.

2. Data

Model-based time series synthesis requires extensive and diverse data for investigation and validation. To satisfy this prerequisite, data are taken from the Low Carbon London project, which contains energy consumption time series of 5567 London households [30]. The records span from November 2011 to February 2014, with a temporal resolution of half an hour and additionally include weather data, ACORN household classifications (Appendix A, Table A1), and bank holidays [31]. Figure 2 shows the frequency of occurrence of different ACORN groups in the dataset.

2.1. Analysis and Data Transformation

The London dataset contains time series with different characteristics regarding seasonal as well as short-term variations and amplitudes. As electricity consumption of households mainly depends on activities of its residents, their time series possess a strong stochasticity. Overall, the majority contains small consumption values due to nighttime or work hours. Consequently, resident time series exhibit skewness in terms of their distribution (Figure 3 (left)). This can lead to biased models and low prediction performance. A Yeo–Johnson transformation (Equation (2)) with $\lambda = -2$ is applied to raw data to yield a more symmetric distribution, to smooth extreme values, and to stabilize variance (Figure 3 (right)). This leads to a range of values $x^{'} \in [0, 0.5)$. While transformed values are used within time series synthesis, their back-transformation is used for evaluation purposes (Equation (3)). To fit the model for synthesis purposes, Yeo–Johnson transformed training data are additionally scaled to $[-1, 1]$ (Equation (4)).

$$x^{'} = \frac{(x + 1)^{\lambda} - 1}{\lambda} \tag{2}$$

$$x = e^{\frac{\log(\lambda \times x^{'} + 1)}{\lambda}} - 1 \tag{3}$$

$$x^{*} = a + \frac{(x^{'} - x_{min}^{'}) \times (b - a)}{x_{max}^{'} - x_{min}^{'}} \tag{4}$$

where $a = -1$ and $b = 1$ to satisfy $x^{*} \in [-1, 1]$, using Yeo–Johnson transformed values $x^{'}$.
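Equations (2) to (4) can be illustrated with a minimal NumPy sketch. It assumes non-negative consumption values (for which Equation (2) is the applicable branch of the Yeo–Johnson transformation) and uses made-up sample values; the function names are illustrative and not taken from the original work.

```python
import numpy as np

LAMBDA = -2.0  # value of lambda stated in the text


def yeo_johnson(x, lam=LAMBDA):
    """Equation (2), valid for non-negative x and lam != 0."""
    x = np.asarray(x, dtype=float)
    return ((x + 1.0) ** lam - 1.0) / lam


def inverse_yeo_johnson(x_t, lam=LAMBDA):
    """Equation (3): back-transformation used for evaluation."""
    x_t = np.asarray(x_t, dtype=float)
    return np.exp(np.log(lam * x_t + 1.0) / lam) - 1.0


def min_max_scale(x_t, a=-1.0, b=1.0):
    """Equation (4): scale transformed values to [a, b]."""
    x_t = np.asarray(x_t, dtype=float)
    lo, hi = x_t.min(), x_t.max()
    return a + (x_t - lo) * (b - a) / (hi - lo)


# Hypothetical half-hourly consumption values (not from the dataset).
load = np.array([0.0, 0.05, 0.2, 0.8, 3.0])
transformed = yeo_johnson(load)          # lies in [0, 0.5)
scaled = min_max_scale(transformed)      # lies in [-1, 1]
restored = inverse_yeo_johnson(transformed)
```

In this sketch the scaling bounds are taken from the sample itself; in practice they would be derived from the training data and reused at synthesis time.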

The Kmeans clustering technique is utilized on half-hourly daily sequences to investigate their correlations with corresponding daily average values (Figure 4). Due to significant dependencies of daily profiles on these mean values, the latter serve as external input variables to synthesize half-hourly data points.

2.2. Clustering ACORN Groups

Assuming different characteristics between time series within one ACORN group concerning seasonal and short-term variations, we perform a technical clustering using Kmeans, which is commonly applied to various types of data and easy to interpret, to divide data into characteristic classes, receiving in total 55 subgroups (inspired by [32], see Figure 5). In addition to the ACORN subgroups mentioned in [31], this is mainly done to ensure similar characteristics within a dataset to stabilize model training and to ensure robustness. The entire clustering is summarized in the following:

Normalizing each participant’s time series by its median $\tilde{x}$ (Equation (5));
Calculating monthly mean values ${\bar{x}}_{m o n t h}$ of normalized time series (Figure 5 (left));
Calculating half-hourly mean values ${\bar{x}}_{h a l f - h o u r l y}$ of normalized time series (Figure 5 (right));
Applying Kmeans to ${\bar{x}}_{m o n t h}$ with 2 clusters to separate time series with regard to their seasonal variations (more seasonal vs. less seasonal characteristics, Figure 5 (left));
Applying Kmeans to ${\bar{x}}_{h a l f - h o u r l y}$ to further separate time series with respect to their daily sequence (Figure 5 (right)). The number of clusters is adjusted dynamically to ensure that each cluster has a minimum of 15 participants. This criterion is crucial for generating artificial DECTS with diverse characteristics, as one DECTS represents the collective consumption of 10 households.

$$x^{'} = \frac{x}{\tilde{x}} \tag{5}$$
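The first clustering stage (median normalization, monthly means, two clusters) can be sketched as follows. This is a simplified illustration under stated assumptions: the household data are synthetic, months are taken as 30 days, and a small Lloyd-type k-means with a deterministic farthest-point initialization stands in for whichever Kmeans implementation was actually used.

```python
import numpy as np


def median_normalize(series):
    """Equation (5): divide a household series by its median."""
    return series / np.median(series)


def monthly_means(series, days_per_month=30):
    """Mean per month for a series of 12 * days_per_month daily values."""
    return series.reshape(12, days_per_month).mean(axis=1)


def kmeans_labels(points, k, n_iter=20):
    """Plain Lloyd iterations with farthest-point initialization (assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


# Six hypothetical households: three with a winter peak, three without.
rng = np.random.default_rng(0)
day = np.arange(360)
amplitudes = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0]
scales = [1.0, 2.0, 4.0, 1.5, 3.0, 0.7]
profiles = []
for amp, scale in zip(amplitudes, scales):
    raw = scale * (1.0 + amp * np.cos(2 * np.pi * day / 360))
    raw = raw + rng.normal(0.0, 0.01 * scale, size=day.size)
    profiles.append(monthly_means(median_normalize(raw)))
profiles = np.array(profiles)

labels = kmeans_labels(profiles, k=2)  # more seasonal vs. less seasonal
```

Because each series is divided by its own median, households of very different size end up with comparable profiles, so the clusters reflect the seasonal shape rather than the consumption level.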

3. Methodology

3.1. Problem Description

Time series underlie a stochastic process and the Box–Jenkins method is commonly used for prediction purposes [33]. While past observations as well as random shocks directly affect future values, different time series models (ARX, ARIX, and ARIMA) are applied. As these are not probabilistic, their forecasts are not suitable for time series synthesis. By considering random shocks within each forecast step, the resulting time series sequences might be quite individual. Having a training dataset $D = \{(Y_{n}, X_{n})\}_{n = 1}^{N}$ with $Y_{t} \in Y_{n}$ depicting targets and $X_{t} \in X_{n}$ depicting temporal conditionals with respect to an ARIX process, synthetic time series might be generated by using uncertainties $\epsilon_{t}$ with conditional probability distribution $p(\epsilon_{t} \mid X_{t})$ to predict subsequent values. Here, $\epsilon_{t}$ follows a Gaussian distribution and can be seen as the deviation from mean forecast values $\hat{Y}_{t}$. As each $\hat{Y}_{t}$ is added to $X_{t + 1}$ to predict $\hat{Y}_{t + 1}$, $p(\epsilon_{t} \mid X_{t})$ directly depends on past random shocks generated by $\epsilon_{t - 1}$ and consequently emulates a stochastic process. The main objective for the neural network (Section 4) is then to learn $\hat{p}(\epsilon_{t} \mid X_{t})$ time-dependently with respect to a proper measure of distance (Equation (6)), using historical observations. Having $\hat{p}(\epsilon_{t} \mid X_{t})$, time series can be synthesized incrementally by forecasting stochastic values of $\hat{Y}_{t}$, using random shocks $\epsilon_{t}$ (Figure 6).

$$\min_{\hat{p}} D\left(p(\epsilon_{t} \mid X_{t}) \,\|\, \hat{p}(\epsilon_{t} \mid X_{t})\right) \tag{6}$$
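The incremental synthesis described above can be illustrated with a toy example. The function `toy_probabilistic_model` is a hypothetical stand-in for a trained network that returns a mean and a standard deviation; its coefficients are arbitrary and only serve to show how each sampled value is fed back as input for the next step.

```python
import numpy as np


def toy_probabilistic_model(previous_value, exogenous):
    """Hypothetical stand-in for the learned distribution: returns (mean, std)."""
    mean = 0.7 * previous_value + 0.3 * exogenous
    std = 0.05
    return mean, std


def synthesize(initial_value, exogenous_series, rng):
    """Generate a series step by step, feeding each sampled value back in."""
    values = [initial_value]
    for x_t in exogenous_series:
        mean, std = toy_probabilistic_model(values[-1], x_t)
        shock = rng.normal(0.0, std)   # random shock epsilon_t
        values.append(mean + shock)    # becomes part of the next input
    return np.array(values[1:])


exogenous = np.sin(np.linspace(0.0, 2.0 * np.pi, 48))
series_a = synthesize(0.0, exogenous, np.random.default_rng(1))
series_b = synthesize(0.0, exogenous, np.random.default_rng(2))
```

Both runs share the same conditional inputs but differ in their shocks, so they yield two distinct realizations of the same process, which is the property needed for synthesis.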

3.2. Concept

Based on the problem description and time series components neglecting the trend $T_{t}$ (Section 1), we first generate training data in compliance with an ARIX process and additionally augment it by applying intelligent sampling techniques. The training data are then used within a two-step model approach: First, a feedforward neural network FNN-S is used to model seasonal daily means ($S_{t}$) of DECTS, providing initial and supportive values for a probabilistic neural network PNN-S, which emulates a stochastic process by using random shocks. Compared to FNN-S, PNN-S additionally trains a probability density function to predict uncertainties. Then, synthesized stochastic daily means are used to model short-term variations in half-hourly values ($C_{t}$) by applying a probabilistic neural network PNN-C (Figure 7). Having pre-trained FNN-S, PNN-S, and PNN-C, DECTS could be synthesized iteratively by forecasting the next time step while taking random shocks into account (Section 3.1). To do this, we have to make some assumptions:

Assumption 1.

As daily means of electricity consumption are meaningful for trend and seasonality analysis with less noise representing the level, half-hourly values should be used to analyze short-term daily variations representing the pattern [24].

Assumption 2.

The higher temporal resolution depends on the lesser one, e.g., if daily means of electricity consumption would change, their half-hourly patterns will change as well. This fact can additionally be observed by clustering short-term patterns with a specific number and comparing their member distribution of daily mean values to a 2-dimensional scatter plot of its principal components (Figure 4).

Assumption 3.

There are huge differences concerning time series characteristics within one ACORN group due to the size of households and further variations. This can be justified by ACORN subgroups [31].

3.3. Time Series Forecast

As mentioned in Section 1.3, this work neglects the trend component but rather focuses on seasonal as well as short-term variations and considers long-term as well as short-term stochastic processes regarding Equation (1). While usual households possess various seasonalities and additionally depend on exogenous weather variables [34], ARIX forms the baseline stochastic process behavior of these time series. In the field of time-series analysis, there are periodic (such as solar radiation and electricity load) and non-periodic (such as wind velocity and stocks) components, each with distinct dependencies and characteristics. As this study focuses specifically on the synthesis of DECTS by using probabilistic forecasts, we give a brief overview of its process principles.

To effectively model a time-series process, a component model must be provided with a clear specification of the input data considering a stochastic process, commonly known as the Box–Jenkins method [29]. While Equation (7) represents the regression equation (RE) of the autoregressive (AR) part of the endogenous variable, i.e., the electricity consumption, Equation (8) represents the RE of the AR part of exogenous variables, e.g., calendar features, temperature, and relative humidity, and Equation (9) represents the RE of future exogenous variables and reference values. The AR part in ARIX models characterizes the relationship between actual values and past values, called lagged values; usually, only a certain number of lags p is considered. To handle non-stationarity by differencing, reference values shifted by $\tau$ according to the time-series process are used (Figure 8). These equations collectively form an ARIX time series process (Equation (10)). Considering calendar dependencies of electricity consumption time-series, calendar features such as time of day tod, day of week dow, day of year doy, and holidays hol should be included as exogenous variables in Equations (8) and (9). Temporal features with periodic patterns can be represented either as one-hot encodings (Definition 5), resulting in high-dimensional data input, or as sine and cosine transformed features (Definition 6) with lower dimensions [35,36]. In the context of time-series analysis, neural networks utilize more or less the same input data as described in Equation (10). However, the key difference is that neural networks do not require a specific regression equation and are capable of learning non-linear relationships and latent features.

Definition 5.

One-Hot Encoding

Within a one-hot encoding, each class is represented by a binary vector whose entry for the occurring class is 1 and whose other entries are 0.

Definition 6.

Periodic Encodings

Periodic encodings are transformations of one-hot encodings into more continuous variables by using sine and cosine functions. This can only be applied to periodic variables like daytime, day of the week, or day of the year.
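The difference between Definitions 5 and 6 can be shown with a brief sketch. The helper names are illustrative, and the example assumes 7 classes for the day of the week and 48 half-hour slots per day.

```python
import numpy as np


def one_hot(index, n_classes):
    """Definition 5: binary vector with a single 1 at the class index."""
    vec = np.zeros(n_classes)
    vec[index] = 1.0
    return vec


def periodic_encoding(value, period):
    """Definition 6: map a periodic variable to a (sine, cosine) pair."""
    angle = 2.0 * np.pi * value / period
    return np.array([np.sin(angle), np.cos(angle)])


dow_one_hot = one_hot(2, 7)                # third day of the week, 7 dimensions
tod_periodic = periodic_encoding(12, 48)   # half-hour slot 12 of 48, 2 dimensions
```

The periodic encoding needs only two dimensions regardless of the number of classes and places the last slot of a day next to the first slot of the following day, which a one-hot vector does not express.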

$$y_{1}(t) = \sum_{i = 1}^{p} \left(\alpha_{i} \times \nabla_{\tau}^{d} y(t - i)\right) \tag{7}$$

$$y_{2}(t) = \sum_{j = 1}^{n} \sum_{i = 1}^{p} \left(\beta_{j, i} \times \nabla_{\tau}^{d} x_{j}(t - i)\right) \tag{8}$$

$$y_{3}(t) = \sum_{j = 1}^{m} \left(\beta_{j} \times \nabla_{\tau}^{d} x_{j}(t)\right) + \gamma \times y_{t, ref} \tag{9}$$

$$y(t) = y_{1}(t) + y_{2}(t) + y_{3}(t) + \epsilon_{t} \tag{10}$$

where:

$\nabla_{τ}^{d}$	Difference filter with order d to eliminate non-stationarity and backshift operator $τ$ depicting the shift to reference values.
$\nabla_{τ}^{1} y (t)$	$y (t) - y (t - τ)$
$\nabla_{τ}^{2} y (t)$	$(y (t) - y (t - τ)) - (y (t - τ) - y (t - 2 \times τ))$
$y_{t, r e f}$	$y (t - τ)$
$α$ , $β$ , $γ$	Regression parameters within an ARIX model
n, m	Number of past and future exogenous variables that could differ
x, y	Exogenous, endogenous variables
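The difference filter can be sketched in a few lines of NumPy. For simplicity, this illustration assumes a fixed shift $\tau$ (here 7 days), whereas the text uses a shift that depends on the type of day; the sample series is made up.

```python
import numpy as np


def seasonal_difference(y, tau, d=1):
    """Apply y(t) - y(t - tau) d times; tau values are lost per pass."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = y[tau:] - y[:-tau]
    return y


# Hypothetical daily means: a weekly pattern plus a linear drift.
weekly_pattern = np.tile([1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 9.0], 4)
y = weekly_pattern + 0.1 * np.arange(28)

d1 = seasonal_difference(y, tau=7, d=1)  # weekly pattern removed, drift remains
d2 = seasonal_difference(y, tau=7, d=2)  # drift removed as well
```

The first-order difference leaves the constant weekly increase of 0.7, and the second-order difference removes it, matching the two expressions listed for $\nabla_{\tau}^{1}$ and $\nabla_{\tau}^{2}$.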

3.4. Pre-Processing Independent Variables for Seasonal and Short-Term Model Training

Based on time series separated into distinctive ACORN subgroups, training data are prepared (i) for the seasonal FNN-S/PNN-S and (ii) for the short-term PNN-C model considering the underlying ARIX process (Equation (10)):

Include calendar data and exogenous/endogenous variables (temperature T, relative humidity RH). Holidays are considered to be uniform due to similar characteristics and consequently possess the same one-hot encoding (Definition 5).
As partial autocorrelations of DECTS only show significant dependencies for the first two lags (Figure 9), p equals 2 within Equations (7) and (8) to train PNN-C. Longer but weaker temporal dependencies (see lags 5 to 8 in Figure 9) due to time-shifted activities like doing sport, cooking, or washing are neglected.
p equals 1 within Equations (7) and (8) to train PNN-S.
While periodic encodings (Definition 6) of tod are neglected for FNN-S/PNN-S, they are included in PNN-C to analyze half-hourly effects on the district electricity consumption. Moreover, FNN-S and PNN-S use doy as periodic encoding in the input space to simulate seasonal variations.
Additionally, seasonal and short-term models use different temperature observations in the input space, where the maximum daily value is taken for PNN-S and half-hourly values are the best choice for PNN-C.

A summary of dependencies between time series processes and model types FNN-S, PNN-S, and PNN-C referring to Equation (10), including the backshift $\tau$ to obtain reference values (workday refers to last workday, weekend day refers to last weekend day), can be found in Table 1. The resulting input arrays are as follows: (1) $\vec{X}_{S, Cal}$ and $\vec{X}_{S, X}$ are used to fit targets within FNN-S and PNN-S (Table 2). (2) $\vec{X}_{C, X}$ is used to fit the target within PNN-C (Table 2).

3.5. Pre-Processing Dependent Variables for Seasonal and Short-Term Model Training

The independent input arrays $\vec{X}_{S, Cal}$, $\vec{X}_{S, X}$, and $\vec{X}_{C, X}$ are fitted to dependent variables of seasonal as well as short-term variations (Table 3). As shown in Section 2.2, the ACORN subgroup time series possess partially strong seasonal characteristics. While training data are sampled per day including the aggregation of several household time series, daily means of scaled pre-transformed values $y_{\mu_{d}, t}^{*}$ are non-stationary. To stabilize model output, PNN-S is fitted to relative ratios between $y_{\mu_{d}, t}^{*}$ and seasonal daily mean values $\hat{y}_{\mu_{d}, t}^{*}$ generated by FNN-S of the entire ACORN subgroup resulting in the following:

$$y_{\mathrm{rel}, \mu_{d}, t}^{*} = \frac{y_{i, \mu_{d}, t}^{*}}{\hat{y}_{\mu_{d}, t}^{*}} \tag{11}$$

where i is an individual DECTS and t is the date.

To receive absolute values within the synthesis, probabilistic forecasts $\hat{y}_{\mathrm{rel}, \mu_{d}, t}^{*}$ are multiplied by $\hat{y}_{\mu_{d}, t}^{*}$. The target variables are assigned to their inputs and models as listed in Table 3.

3.6. Sampling Training Data to Generate Artificial DECTS

After determining how training data are constructed for both model types, resulting in exemplary training samples (Table 2), a sufficiently varied set of artificial DECTS has to be created for model training. Consumer time series can initially be aggregated to a certain number of artificial DECTS by repeatedly selecting a subset of consumers (Definition 4). However, this way of sampling has several disadvantages regarding data variability, its distribution, the lack of unseen situations, and even missing data of some households due to smart meter time-outs. To overcome this, data augmentation is applied by sampling per day to simulate a stochastic process and synthesize DECTS realistically.
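One possible reading of sampling per day is sketched below: for every day, $N = 10$ households with complete data are drawn at random from a subgroup and their half-hourly consumption is summed. This is an assumption made for illustration, since the text does not specify the exact sampling scheme; the data and the function name are hypothetical.

```python
import numpy as np


def sample_artificial_dects(household_days, n_households=10, rng=None):
    """Build one artificial DECTS by drawing households anew for every day.

    household_days has shape (households, days, half-hour slots); days with
    missing values (NaN) for a household are excluded from the draw.
    """
    if rng is None:
        rng = np.random.default_rng()
    _, n_days, n_slots = household_days.shape
    dects = np.empty((n_days, n_slots))
    for day in range(n_days):
        has_data = ~np.isnan(household_days[:, day, :]).any(axis=1)
        available = np.flatnonzero(has_data)
        chosen = rng.choice(available, size=n_households, replace=False)
        dects[day] = household_days[chosen, day, :].sum(axis=0)
    return dects


# Hypothetical subgroup: 15 households, 5 days, 48 half-hour values per day.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=0.1, size=(15, 5, 48))
data[0, 2, :] = np.nan  # simulated smart meter time-out

dects = sample_artificial_dects(data, n_households=10, rng=rng)
```

Redrawing the households each day yields many more distinct aggregates than fixing one subset for the whole period, and days with missing readings are skipped for the affected household only.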

4. Probabilistic Feedforward Neural Network

Since DECTS underlie a stochastic ARIX process (Figure 10), they can be simulated by using PNN, which is able to model uncertainties [38]. While usual forecast algorithms only predict mean values $\mu_{t}$, PNN is further able to predict the standard deviation $\sigma_{t}$ (Equation (12)). As this approach assumes a Gaussian distribution of errors $\epsilon_{t}$ (Equation (13)), it can only inadequately represent the confidence interval at lower and upper boundaries of the range of values. To overcome this circumstance, we make some adaptations to the model architecture (Section 4.1). As DECTS are composed of level (seasonal variations in daily means) and pattern (short-term variations in half-hourly sequences), this paper proposes to synthesize seasonal variations first and then, based on these values, to synthesize dependent short-term variations (Section 4.3).

$$y_{t} = \mu_{t} + \sigma_{t} \tag{12}$$

$$\epsilon_{t} = y_{t} - \hat{y}_{t} \tag{13}$$

where $\hat{y}_{t}$ is the forecast value.

4.1. Architecture

PNN-S and PNN-C have the same model architecture, whereas FNN-S neglects the σ-Layer and the Gaussian Layer (Figure 10). They are used in combination to predict seasonal and short-term variations in DECTS by using adequate training data (Section 3.4, Section 3.5 and Section 3.6). Input arrays $\vec{X}_{S, Cal}$, $\vec{X}_{S, X}$, and $\vec{X}_{C}$ are initially processed in a State Extraction Layer to encode key features and characteristics. This encoding is further processed in a μ-Layer to fit $y_{\mathrm{rel}, \mu_{d}, t}^{*}$ and $y_{hh, t}^{*}$ and in a σ-Layer to fit error distributions adequately. A Gaussian Layer is then applied on outputs of the μ-Layer and σ-Layer to combine them with respect to the probability density function (Equation (14)). However, this would result in out-of-sample forecast values at lower and upper boundaries. Moreover, distributions of $\epsilon_{t}$ close to the minimum/maximum rather correspond to a beta distribution. Therefore, the Gaussian Layer is additionally equipped with a special activation function $R(x)$ (Equation (15), Figure 11). For simplicity, any outlier is rebounded at upper or lower boundaries by the value of its exceedance:

$$\phi_{\mu, \sigma}(x) = \frac{1}{\sigma \times \sqrt{2 \pi}} \times e^{- \frac{(x - \mu)^{2}}{2 \sigma^{2}}} \tag{14}$$

$$R(x) = \begin{cases} x & X_{min} \leq x \leq X_{max} \\ 2 \times X_{min} - x & x < X_{min} \\ 2 \times X_{max} - x & x > X_{max} \end{cases} \tag{15}$$
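The rebound of Equation (15) applied to Gaussian samples can be illustrated as follows. The sketch uses NumPy rather than a TensorFlow layer, and the mean, standard deviation, and bounds are made-up values.

```python
import numpy as np


def reflect(x, x_min, x_max):
    """Equation (15): mirror values outside [x_min, x_max] back inside."""
    x = np.asarray(x, dtype=float)
    below = 2.0 * x_min - x
    above = 2.0 * x_max - x
    return np.where(x < x_min, below, np.where(x > x_max, above, x))


def sample_bounded(mu, sigma, x_min, x_max, rng, size=None):
    """Draw Gaussian samples around mu and rebound any exceedance."""
    return reflect(rng.normal(mu, sigma, size=size), x_min, x_max)


rebounded = reflect([-1.2, 0.3, 1.1], x_min=-1.0, x_max=1.0)

# Samples near the upper boundary of the scaled range [-1, 1].
rng = np.random.default_rng(0)
samples = sample_bounded(mu=0.95, sigma=0.05, x_min=-1.0, x_max=1.0, rng=rng, size=1000)
```

A single reflection keeps a value inside the range only while its exceedance is smaller than the width of the range, which holds here because the standard deviation is small compared to the interval.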

4.2. Training Strategy

FNN-S, PNN-S, and PNN-C are built with TensorFlow [39], an open-source machine learning framework, and equipped with 10 neurons for the State Extraction Layer and with 1 neuron for the μ-Layer and σ-Layer. They are trained with the Adam optimization algorithm and 200/50 epochs for the seasonal/short-term model. As described in Section 3.4, the input array $\vec{X}_{S, Cal}$ is first used to train FNN-S to fit $y_{j, \mu_{d}, t}^{*}$ regarding the supervised loss $L_{S} = MSE$ (Equation (16)). Predictions $\hat{y}_{\mu_{d}, t}^{*}$ are subsequently used in conjunction with $\vec{X}_{S, Cal}$ and $\vec{X}_{S, X}$ to fit $y_{\mathrm{rel}, \mu_{d}, t}^{*}$, where the reconstruction loss is $L_{R} = MSE + 50 \times VE$. Thereby, $VE$ (Equation (17)) is weighted 50 times more than $MSE$ to treat both losses equally concerning their magnitude. Since each half-hour of the day possesses a specific data distribution with individual minimum and maximum boundaries, PNN-C is trained half-hourly to fit $y_{hh, t}^{*}$ with regard to $L_{R}$, and 48 different models exist in total. The advantage of this approach lies in satisfying data distributions at any daytime by parameterizing the activation function $R(x)$ specifically.

$$MSE = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2} \tag{16}$$

$$VE = \frac{1}{n} \sum_{i = 1}^{n} (|y_{i} - \hat{y}_{i}| - \hat{y}_{\sigma, i})^{2} \tag{17}$$

where:

MSE: Mean-Squared-Error;
VE: Variance-Error;
$y_{i}$ : Measurements;
${\hat{y}}_{i}$ : Outputs of the μ-Layer;
${\hat{y}}_{σ, i}$ : Outputs of the σ-Layer.
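
To make Equations (16) and (17) concrete, the sketch below evaluates both terms and the reconstruction loss $L_{R} = MSE + 50 \times VE$ on three toy values. It is written in plain NumPy rather than as a TensorFlow loss; the function names and the example numbers are assumptions for illustration only.

```python
import numpy as np

def mse(y, y_hat):
    """Mean-Squared-Error (Equation (16))."""
    return np.mean((y - y_hat) ** 2)

def variance_error(y, y_hat, sigma_hat):
    """Variance-Error (Equation (17)): squared gap between |error| and predicted sigma."""
    return np.mean((np.abs(y - y_hat) - sigma_hat) ** 2)

def reconstruction_loss(y, y_hat, sigma_hat, ve_weight=50.0):
    """L_R = MSE + ve_weight * VE, with the weight of 50 stated in the text."""
    return mse(y, y_hat) + ve_weight * variance_error(y, y_hat, sigma_hat)

y = np.array([1.0, 2.0, 3.0])          # measurements
y_hat = np.array([1.5, 2.0, 2.0])      # mu-Layer outputs
sigma_hat = np.array([0.5, 0.1, 0.5])  # sigma-Layer outputs
loss = reconstruction_loss(y, y_hat, sigma_hat)
```

For these values, MSE is 1.25/3 and VE is 0.26/3, so the weighted sum is (1.25 + 13)/3 = 4.75. The VE term vanishes only when the predicted standard deviation matches the absolute error of the mean prediction at every sample.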

4.3. Generating Synthetic Time Series

While FNN-S, PNN-S, and PNN-C are pre-trained, they cannot be applied directly to synthesize DECTS, as autoregressive values and integrated differences are used within PNN-S and PNN-C. To circumvent this, calendar data ${\vec{X}}_{S, Cal}$ are first used within FNN-S to predict seasonal daily mean values. Since half-hourly values are still missing, we utilize average seasonal daily profiles based on the weekday. As these time series do not vary, they are only used within the first week to initialize synthetic time series:

Predict seasonal daily means ${\hat{y}}_{μ_{d}, t}^{*}$ with FNN-S and ${\vec{X}}_{S, Cal}$ and generate half-hourly values using average seasonal daily profiles.
Use ${\hat{y}}_{μ_{d}, t}^{*}$ to synthesize a set of time series of stochastic relative daily mean values ${\hat{y}}_{rel, μ_{d}, t}^{*}$ with PNN-S and ${\vec{X}}_{S, Cal}$, ${\vec{X}}_{S, X}$.
Multiply ${\hat{y}}_{rel, μ_{d}, t}^{*}$ by ${\hat{y}}_{μ_{d}, t}^{*}$ to obtain absolute stochastic daily means ${\hat{y}}_{abs, μ_{d}, t}^{*}$. As each ${\hat{y}}_{abs, μ_{d}, t}^{*}$ is used again in the input array to predict ${\hat{y}}_{abs, μ_{d}, t + 1}^{*}$, daily means are generated iteratively (Figure 12).
After appending ${\hat{y}}_{abs, μ_{d}, t}^{*}$ to $X_{C}$, half-hourly values ${\hat{y}}_{hh, t}^{*}$ are generated iteratively (Figure 12) with PNN-C to synthesize individual short-term variations.
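
The iterative character of the steps above can be sketched for the daily means as follows. This is a simplified illustration under explicit assumptions: `toy_predict_rel` is a hypothetical stand-in for the trained PNN-S, the seasonal curve is a stand-in for the FNN-S predictions, and calendar and weather inputs are omitted. Only the feedback structure (sample a relative value, multiply by the seasonal mean, feed the result into the next step) follows the description.

```python
import numpy as np

def synthesize_daily_means(seasonal_mean, predict_rel, rng, p=1):
    """Iteratively generate absolute stochastic daily means.

    seasonal_mean : 1-D array standing in for the FNN-S predictions.
    predict_rel   : hypothetical stand-in for PNN-S; maps the last p absolute
                    daily means and the current seasonal mean to (mu, sigma)
                    of the relative daily mean.
    """
    seasonal_mean = np.asarray(seasonal_mean, dtype=float)
    absolute = seasonal_mean.copy()  # the first p days act as initialization
    for t in range(p, len(seasonal_mean)):
        mu, sigma = predict_rel(absolute[t - p:t], seasonal_mean[t])
        relative = rng.normal(mu, sigma)            # sample the relative daily mean
        absolute[t] = relative * seasonal_mean[t]   # fed back at step t + 1
    return absolute

def toy_predict_rel(previous_abs, seasonal_today):
    """Toy predictor (assumption): yesterday's deviation persists with factor 0.5."""
    deviation = previous_abs[-1] / seasonal_today - 1.0
    return 1.0 + 0.5 * deviation, 0.05

days = np.arange(365)
seasonal = 0.45 + 0.10 * np.cos(2 * np.pi * days / 365)  # toy seasonal mean
series = synthesize_daily_means(seasonal, toy_predict_rel, np.random.default_rng(42))
```

Repeating the call with different random seeds yields a set of distinct series that share the same seasonal course, which mirrors how a set of stochastic daily means is obtained from one seasonal prediction.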

5. Results

The quality of synthetic time series has to be assessed by comparing these series to real ones concerning different evaluation metrics. Therefore, statistical properties like the distribution of half-hourly values over different months or hours (compare to [20], see Section 5.1), raw visualization of daily sequences (Section 5.2), auto-correlation measurements (compare to [17], see Section 5.3), principal components of daily sequences (compare to [23], see Section 5.4), or a discriminator [15] can be used to analyze and distinguish between real and synthetic sequences. Besides the analysis of half-hourly daily sequences, seasonal variations should additionally be realistic. The following analyses were conducted on a clustered subgroup of ACORN-A (Appendix A, Table A1).

5.1. Distribution of Half-Hourly Values

Initially, we investigate the similarity of daily as well as hourly energy consumption values between synthetic (blue) and real time series (orange) by using box-and-whisker plots, which visually summarize the distribution, analyze skewness (asymmetry), and identify outliers in a dataset [40]. Figure 13a shows a box-and-whisker plot depicting the distribution of daily energy consumption values across different months. For most months of the year, the distribution, as indicated by the interquartile range (width of the box), is nearly identical between the real and synthetic consumption data. Please note that the position of the quartiles is determined by actual values and is not solely dependent on a probability density function. Therefore, minor differences between boxes and whiskers of real and synthetic time series are acceptable. It is noteworthy that the synthetic data show slightly higher values for the box, except for January and March, while the interquartile range differs significantly in January compared to other months. Furthermore, the range between the lower and upper whisker is predominantly larger for the synthetic values throughout the year. Therefore, the model overestimates the spread of the data by assuming a Gaussian distribution of consumption uncertainty. In reality, it is more likely to have strict boundaries for minimum and maximum values. Another indicator of this behavior is the higher number of outliers observed beyond the upper whisker in the synthetic data, whereas no outliers are detected at the lower whisker. However, this high occurrence rate of outliers allows for the investigation of extreme situations related to energy consumption in energy management systems.

In contrast, Figure 13b illustrates hourly energy consumption values (0 h represents the average of half-hourly values at 23:30 and 00:00) across different time periods throughout the day. It can be seen that the distribution of synthetic data fits well. The largest deviations (but still acceptable) are observed during daytime periods when there are significant changes in energy consumption, specifically in mornings, late afternoons, and close to midnight. As previously mentioned, the number of outliers above the upper whisker is higher for the synthetic data, and outliers below the lower whisker are only observed at two specific times of the day (5 h, 7 h). Table 4 summarizes the generic statistics for the training and synthetic data, indicating good agreement between measures of mean values and stronger deviations regarding standard deviations at both 6 h and 12 h. Lower standard deviation values in the synthetic data may be attributable to the assumption of a Gaussian distribution for forecast errors, which are used to describe the stochastic nature of the time series process (Section 3.1). This assumption does not perfectly match the original characteristics, as forecast errors are more likely to be beta-distributed close to the minimum and maximum range of values (Section 4.1). As a result, synthetic time series underestimate stochasticity compared to real ones. However, since DECTS include an uncertain component and synthetic time series are generated probabilistically, this analysis demonstrates the similarity between the synthetic and real energy consumption data. Additionally, it provides an initial estimation of the capability of the proposed approach.
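
The quantities read from a box-and-whisker plot can also be computed directly, which makes the comparison of interquartile ranges, whiskers, and outlier counts reproducible. The sketch below assumes the common convention of whiskers at 1.5 times the interquartile range; the plots in this section may use a different setting, and the function name `box_stats` as well as the toy data are illustrative assumptions.

```python
import numpy as np

def box_stats(values, whisker=1.5):
    """Quartiles, whisker ends, and outlier counts as drawn in a box-and-whisker plot."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_whisker": inside.min(),   # smallest value inside the fences
        "upper_whisker": inside.max(),   # largest value inside the fences
        "n_low_outliers": int(np.sum(values < lower_fence)),
        "n_high_outliers": int(np.sum(values > upper_fence)),
    }

# Toy data: a bounded "real" sample and a Gaussian "synthetic" sample
# with the same mean and standard deviation.
rng = np.random.default_rng(7)
real = rng.beta(2.0, 5.0, size=2000)
synthetic = rng.normal(real.mean(), real.std(), size=2000)
real_stats, synthetic_stats = box_stats(real), box_stats(synthetic)
```

Comparing the two dictionaries per month or per hour gives the same information as the visual inspection, including the asymmetry between outliers above the upper whisker and below the lower whisker.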

5.2. Examples of Daily Sequences on Different Days

Synthetic and real daily sequences are shown in Figure 14 for different weekdays. While grey solid lines depict synthetic sequences, solid black lines represent the average and dashed black lines represent the upper (0.95 quantile) and lower (0.05 quantile) values of the real sequences. Most daily sequences for Monday, Saturday, and Sunday can be reproduced accurately within this range of values. However, there are sequences for which a relatively large number of outliers are present, as seen on Saturday between 6:00 and 9:00, Sunday around noon, and Monday between 18:00 and 21:00. In accordance with previous analyses (Figure 13b), the outliers mostly overestimate the consumption values during periods of increasing and decreasing electricity consumption, and only slight underestimations (Sunday and Monday between 6:00 and 9:00/12:00) are found. As discussed in Section 5.1, the discrepancy between real and synthetic sequences can be attributed to forecast errors being assumed to be Gaussian-distributed. Nevertheless, the synthetic time series exhibit daily characteristics similar to those of the real electricity consumption data.

5.3. Autocorrelation of Entire Time Series

The autocorrelation function [29] is a statistical measure that quantifies the degree of similarity between a time series and its past values. It measures temporal dependencies and linear relationships between observations at different time lags within the same time series to identify periodic patterns. In Figure 15a, the comparison of average autocorrelations between several synthetic time series and their corresponding real ones is shown for an entire week, i.e., 168 h or 336 half-hourly values. General time series characteristics, meaning self-correlation at lags 48 (1 day), 96 (2 days), 144 (3 days), 192 (4 days), 240 (5 days), 288 (6 days), and 336 (7 days), are clearly observable in both the real and synthetic data. However, for a temporal shift of 7 to 18 h (lags 14 to 36, Figure 15a), there are larger differences in autocorrelations. While real time series possess positive correlations close to zero, synthetic ones exhibit weak negative correlations and follow stronger general characteristics. As correlations are very low in this interval, the overall reliability of this approach is not substantially affected. The partial autocorrelation function [29] measures the strength and significance of the direct relationship between a time series and its lagged values while considering the influence of intermediate values. Figure 15b shows that the real and synthetic time series possess nearly identical dependencies on past observations concerning partial autocorrelations with lags up to an entire day. However, there are larger differences for lags 3 to 8, which may depend on the chosen AR value of $p = 2$ (Section 3.4).
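
For reference, a sample autocorrelation over one week of half-hourly lags can be computed as in the following sketch. The estimator (mean-removed, normalized by the lag-0 sum of squares) is a standard choice and an assumption here, since the exact estimator used for Figure 15 is not specified; the toy load signal is purely illustrative.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag, normalized by the lag-0 term."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom for k in range(max_lag + 1)])

# Toy half-hourly load with a daily cycle (48 steps per day) over four weeks.
rng = np.random.default_rng(1)
t = np.arange(48 * 28)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 48) + 0.1 * rng.normal(size=t.size)
acf = autocorrelation(load, max_lag=336)  # one week of half-hourly lags
```

Averaging such curves over several real and several synthetic series and plotting them against the lag gives the kind of comparison discussed above, with peaks expected at multiples of 48.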

5.4. Principal Components of Daily Sequences

Principal Component Analysis (PCA) [41] is a statistical technique used for dimensionality reduction and data visualization. It transforms a dataset by extracting a smaller number of new orthonormal variables (principal components) that are linear combinations of the input variables, explaining the majority of its variance and representing the most important patterns or features. Figure 16 shows the two-dimensional principal components of 500 examples of the real and synthetic daily sequences. It can be seen that both datasets are similarly distributed and possess similar characteristics concerning their variance, which indicates the capability of the proposed approach. Due to the more irregular behavior of the real sequences, they tend to have a higher number of outliers within the two-dimensional plane. Moreover, means and standard deviations of principal components (PC1 and PC2) for both the training and synthetic data are similar, except for PC2-$σ$ (Table 5). This observation can be explained by the more irregular behavior of real consumers, while the synthetic ones closely follow a learned average behavior.
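
A two-dimensional projection of daily sequences can be obtained with a singular value decomposition, as sketched below. The sketch assumes that the components are fitted on the real sequences and that the synthetic sequences are projected onto the same basis; whether Figure 16 was produced this way is not stated, and the toy data and the function name `pca_2d` are illustrative assumptions.

```python
import numpy as np

def pca_2d(daily_sequences):
    """Project daily sequences (n_days x 48) onto their first two principal components."""
    X = np.asarray(daily_sequences, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]                      # orthonormal directions PC1 and PC2
    scores = centered @ components.T         # coordinates in the PC1/PC2 plane
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return scores, components, mean, explained

# Toy data: 500 "real" and 500 "synthetic" daily sequences with 48 half-hourly values.
rng = np.random.default_rng(0)
t = np.arange(48)
profile = 0.4 + 0.2 * np.sin(2 * np.pi * (t - 16) / 48)
real = profile * rng.normal(1.0, 0.15, size=(500, 1)) + rng.normal(0.0, 0.03, size=(500, 48))
synthetic = profile * rng.normal(1.0, 0.12, size=(500, 1)) + rng.normal(0.0, 0.03, size=(500, 48))

real_scores, components, mean, explained = pca_2d(real)
synthetic_scores = (synthetic - mean) @ components.T  # same basis as the real data
```

The column-wise means and standard deviations of `real_scores` and `synthetic_scores` correspond to the kind of PC1/PC2 summary statistics compared in this section.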

5.5. Correlating Real vs. Synthetic Time Series

Since the preceding analyses only use various metrics to compare the real and synthetic time series of one specific ACORN subgroup, this section additionally evaluates correlations between various ACORN subgroups in order to demonstrate the approach’s applicability and transferability across various characteristics. Figure 17 shows the correlation matrix between the real and synthetic time series of various ACORN subgroups (Section 2.2) displayed as a heatmap. In this analysis, we utilized the average of multiple time series. The strongest correlations (indicated by yellow color) between the real and synthetic time series are observed within the same ACORN subgroups, as seen by the yellow components on the diagonal. Additionally, the matrix reveals nearly identical inter-correlations, i.e., the degree to which the time series of different subgroups are related to each other, as evidenced by the nearly perfect mirror symmetry along the diagonal line. This indicates similar patterns between synthetic and real time series across different ACORN subgroups. In detail, the ACORN subgroups 0–12 exhibit high inter-correlations among each other, except for ACORN subgroup 2, which shows a low inter-correlation with almost all other subgroups. Another point to note is that subgroups 45–54 show only small inter-correlation coefficients with most other subgroups in both the synthetic and real time series. The high similarity of inter-correlations between the real and synthetic time series further supports the capability of the proposed method.
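
A correlation matrix of this kind can be computed as in the sketch below, where entry (i, j) is the Pearson correlation between the real series of subgroup i and the synthetic series of subgroup j. The use of the Pearson coefficient, the function name, and the three toy subgroups are assumptions for illustration; the matrix in Figure 17 covers 55 subgroups.

```python
import numpy as np

def real_vs_synthetic_correlation(real, synthetic):
    """Pearson correlations between real (rows) and synthetic (columns) subgroup series.

    real, synthetic : arrays of shape (n_groups, n_steps), one averaged series per subgroup.
    """
    n_groups = real.shape[0]
    full = np.corrcoef(real, synthetic)   # (2 * n_groups) x (2 * n_groups) matrix
    return full[:n_groups, n_groups:]     # keep the real-vs-synthetic block

# Toy data: three subgroups with phase-shifted daily cycles over two weeks.
rng = np.random.default_rng(3)
t = np.arange(48 * 14)
real = np.stack([np.sin(2 * np.pi * t / 48 + phase) for phase in (0.0, 0.5, 2.0)])
synthetic = real + rng.normal(0.0, 0.1, size=real.shape)
corr = real_vs_synthetic_correlation(real, synthetic)
asymmetry = np.abs(corr - corr.T).max()   # small values indicate mirror symmetry
```

High diagonal entries indicate that each synthetic series matches its own subgroup, while a small `asymmetry` indicates that relations between different subgroups are preserved in the synthetic data.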

6. Discussion

The results (Section 5) show that DECTS can be realistically generated using a combination of FNN and PNN to satisfy seasonal as well as short-term variations and general characteristics concerning distribution and autoregression. The main advantage is the imitation of an ARIX process, including realistic stochasticity. This approach is adaptable with regard to data pre-processing and calendar and exogenous variables; in addition, the corresponding process parameters (Section 3.3) can be varied depending on individual time series characteristics. Compared to other papers [15,17,20,23], our study additionally uses various metrics for evaluation purposes to demonstrate the approach’s capability. Since the same model parameterization was applied to all 55 ACORN subgroups, the approach is able to automate time series synthesis and is simultaneously highly robust. Synthetic DECTS from various ACORN subgroups can be used to construct consumption time series for RECs with diverse socioeconomic compositions. Since consumers may switch their electricity provider daily or even evolve into another consumption group (ACORN subgroup), RECs may possess a dynamic portfolio, resulting in time series with trends and non-stationary characteristics. With a large number of synthetic DECTS from various ACORN subgroups available, this trend could be synthetically generated by altering the number of specific participants within an REC.

A considerable disadvantage is that time series have to be initialized within the first week to provide autoregressive values and integrated differences. However, this affects only the first synthesized values, and the effect fades after a few time steps. Additionally, this approach is currently applicable only to endogenous variables with strong periodicities like electricity consumption, temperature, or solar radiation. To generate highly stochastic time series like wind velocity or grid losses, which possess irregular behaviors, data pre-processing (higher p values regarding AR for the seasonal and periodic models) and model architecture (usage of recurrent layers like Long Short-Term Memory within PNN to handle long-term dependencies) have to be adapted. Since energy market data in countries like Germany usually have a temporal resolution of 15 min, the synthesis only needs to be trained with respective quarter-hourly time series, assuming that no fundamental differences in process characteristics exist. For time series with higher temporal resolutions (less than 15 min), the data possess stronger stochastic behavior. Consequently, synthetic time series on a quarter-hourly or half-hourly basis are not directly suitable for very short-term energy management research. Hence, it is essential to first interpolate DECTS appropriately by including specific process characteristics to achieve higher temporal resolutions. After this step, many additional use cases can be simulated, e.g., optimizing electricity storage in smart grids, utilizing flexibility potentials within RECs, forecasting district electricity consumption, or detecting anomalies to ensure energy efficiency.

For this research, our analysis was conducted on a Windows 10 Pro machine with an Intel(R) Core(TM) i7-8665U CPU running at a base frequency of 1.90 GHz and a maximum turbo frequency of 2.11 GHz. The machine was equipped with 16.0 GB of RAM. We used Python 3.11.6 and TensorFlow 2.14.0 [39] within a single process. The training time for 334,705 samples was approximately 3.2 ± 0.5 min. Generating 300 synthetic time series with a length of 2 years took approximately 4.0 ± 1.5 min. This can be further accelerated by leveraging parallel computing or utilizing distributed hardware resources to scale up the processing of a large number of DECTS with different ACORN subgroup characteristics at multiple locations, thus accounting for weather effects in a broader study area.

7. Conclusions and Outlook

This paper introduces a novel two-step approach, using the Box–Jenkins method and probabilistic forecast models, to emulate and to synthesize the stochastic nature of district electricity consumption time series realistically. Scalability, robustness, and transferability are demonstrated by the application to various time series characteristics while simultaneously measuring computational efficiency. Compared to state-of-the-art publications, our approach is successfully validated using a variety of evaluation metrics, e.g., seasonal as well as short-term variations and general characteristics concerning distribution, autoregression, autocorrelation, and stochasticity. Our approach for synthesizing consumption time series enables in-depth investigations of energy system modeling by exploring various scenarios, specifically focusing on preventing peak loads and considering various consumption characteristics depending on different socioeconomic factors and weather data. It has the potential to facilitate the integration of emerging technologies and to optimize the portfolio of Renewable Energy Communities. Future work will investigate the approach’s transferability to time series with different process characteristics, the applicability of recurrent neural networks in combination with larger autoregressive parts to overcome some disadvantages, and lastly the integration of a trend component to include systematic changes.

Author Contributions

Conceptualization, L.R.; Methodology, L.R.; Validation, L.R.; Formal analysis, T.B. and S.L.; Writing—original draft, L.R.; Writing—review & editing, L.R., T.B. and S.L.; Visualization, T.B.; Supervision, S.L. and P.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Federal Ministry for Economic Affairs and Climate Action in Germany, grant number 01MK20013A.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AR	Auto-Regressive
ARIMA	Auto-Regressive Integrated Moving-Average
ARIX	Auto-Regressive Integrated with eXogenous variables
ARMA	Auto-Regressive Moving-Average
ARX	Auto-Regressive with eXogenous variables
cINN	Conditional Invertible Neural Networks
DECTS	District Electricity Consumption Time Series
dow	Day of Week
doy	Day of Year
EMS	Energy Management System
FNN	Feedforward Neural Network
GAN	Generative Adversarial Network
hol	Holiday
PNN	Probabilistic Neural Network
RE	Regression Equation
REC	Renewable Energy Communities
RH	Relative Humidity
T	Temperature
tod	Time of Day

Appendix A. ACORN User Segmentation

Table A1. ACORN user segmentation taken from [31].

ACORN A	Lavish Lifestyles	ACORN J	Starting Out
ACORN B	Executive Wealth	ACORN K	Student Life
ACORN C	Mature Money	ACORN L	Modest Means
ACORN D	City Sophisticates	ACORN M	Striving Families
ACORN E	Career Climbers	ACORN N	Poorer Pensioners
ACORN F	Countryside Communities	ACORN O	Young Hardship
ACORN G	Successful Suburbs	ACORN P	Struggling Estates
ACORN H	Steady Neighborhoods	ACORN Q	Difficult Circumstance
ACORN I	Comfortable Seniors	ACORN U	Not Private Households

References

Available online: https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX:32019L0944 (accessed on 7 February 2024).
Flemming, S.; Bender, T.; Surmann, A.; Pelka, S.; Martin, A.; Kühnbach, M. Vor-Ort-Systeme als flexibler Baustein im Energiesystem? Eine Cross-Sektorale Potenzialanalyse; Fraunhofer-Publica: Berlin, Germany, 2023.
Koch, A.; Schmelcher, S.; Sternkopf, T.; Wrede, M. Modellierung Sektorintegrierter Energieversorgung im Quartier—Untersuchung der Vorteile der Optimierung von Energiesystemen auf Quartiersebene gegenüber der Optimierung auf Gebäudeebene. Available online: https://www.dena.de/fileadmin/dena/Publikationen/PDFs/2022/STUDIE_Modellierung_sektorintegrierter_Energieversorgung_im_Quartier.pdf (accessed on 5 January 2023).
Energieforschungsprogramm der Bundesregierung. Available online: https://www.bmwk.de/Redaktion/DE/Publikationen/Energie/7-energieforschungsprogramm-der-bundesregierung.pdf?__blob=publicationFile&v=4 (accessed on 5 January 2023).
Available online: https://www.bmwk.de/Redaktion/DE/Dossier/netze-und-netzausbau.html (accessed on 19 July 2023).
Available online: https://wirtschaftslexikon.gabler.de/definition/energiemanagementsystem-53996 (accessed on 19 July 2023).
Deutsch, M.; Timpe, P. The effect of age on residential energy demand. In 8. Dynamics of Consumption; European Council for an Energy Efficient Economy: Stockholm, Sweden, 2013.
Estiri, H.; Zagheni, E. Age matters: Ageing and household energy demand in the United States. Energy Res. Soc. Sci. 2019, 55, 62–70.
Abrahamse, W.; Steg, L. Factors Related to Household Energy Use and Intention to Reduce It: The Role of Psychological and Socio-Demographic Variables. Hum. Ecol. Rev. 2011, 18, 30–40.
Frederiks, E.; Stenner, K.; Hobman, E. The Socio-Demographic and Psychological Predictors of Residential Energy Consumption: A Comprehensive Review. Energies 2015, 8, 573–609.
Cerqueira, V.; Torgo, L.; Mozetič, I. Evaluating time series forecasting models: An empirical study on performance estimation methods. Mach. Learn. 2020, 109, 1997–2028.
Hittmeir, M.; Ekelhart, A.; Mayer, R. On the Utility of Synthetic Data: An Empirical Evaluation on Machine Learning Tasks. In Proceedings of the 14th International Conference on Availability, Reliability and Security. Association for Computing Machinery (ARES ’19), Canterbury, UK, 26–29 August 2019.
Iwana, B.K.; Uchida, S. An empirical survey of data augmentation for time series classification with neural networks. PLoS ONE 2021, 16, e0254841.
Thapa, R.; Shimada, M.; Watanabe, M.; Motohka, T.; Shiraishi, T. The tropical forest in South East Asia: Monitoring and scenario modeling using Synthetic Aperture Radar data. Appl. Geogr. 2013, 41, 168–178.
Yilmaz, B.; Korn, R. Synthetic demand data generation for individual electricity consumers: Generative Adversarial Networks (GANs). Energy AI 2022, 9, 100161.
Available online: https://eur-lex.europa.eu/legal-content/DE/TXT/PDF/?uri=CELEX:02018L2001-20181221&from=EN (accessed on 5 January 2023).
Shamshad, A.; Bawadi, M.; Wan Hussin, W.; Majid, T.; Sanusi, S. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 2005, 30, 693–708.
Talbot, P.W.; Rabiti, C.; Alfonsi, A.; Krome, C.; Kunz, M.R.; Epiney, A.; Wang, C.; Mandelli, D. Correlated synthetic time series generation for energy system simulations using Fourier and ARMA signal processing. Int. J. Energy Res. 2020, 44, 8144–8155.
Richardson, I.; Thomson, M.; Infield, D. A high-resolution domestic building occupancy model for energy demand simulations. Energy Build. 2008, 40, 1560–1566.
Fischer, D.; Härtl, A.; Wille-Haussmann, B. Model for Electric Load Profiles With High Time Resolution for German Households. Energy Build. 2015, 92.
Naumann, S.; Klaiber, S.; Kummerow, A.; Bretschneider, P. Simulation of Coordinated Market Grid Operations considering Uncertainties. In Proceedings of the 2018 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Sarajevo, Bosnia and Herzegovina, 21–25 October 2018; pp. 1–6.
Klaiber, S. Analyse, Identifikation und Prognose Preisbeeinflusster Elektrischer Lastzeitreihen. Ph.D. Thesis, Technische Universität Ilmenau, Ilmenau, Germany, 2020.
Asre, S.; Anwar, A. Synthetic Energy Data Generation Using Time Variant Generative Adversarial Network. Electronics 2022, 11, 355.
Zhang, C.; Kuppannagari, S.; Kannan, R.; Prasanna, V.K. Generative Adversarial Network for Synthetic Time Series Data Generation in Smart Grids. In Proceedings of the 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Aalborg, Denmark, 29–31 October 2018.
Wang, K.; Gou, C.; Duan, Y.; Lin, Y.; Zheng, X.; Wang, F.Y. Generative adversarial networks: Introduction and outlook. IEEE/CAA J. Autom. Sin. 2017, 4, 588–598.
Heidrich, B.; Turowski, M.; Phipps, K.; Schmieder, K.; Süß, W.; Mikut, R.; Hagenmeyer, V. Controlling Non-Stationarity and Periodicities in Time Series Generation Using Conditional Invertible Neural Networks. Appl. Intell. 2022, 53, 8826–8843.
Available online: https://www.loadprofilegenerator.de (accessed on 2 August 2023).
Available online: https://synpro-lastprofile.de/ (accessed on 2 August 2023).
Tunnicliffe Wilson, G. Time Series Analysis: Forecasting and Control, 5th ed.; Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M., Eds.; John Wiley and Sons Inc.: Hoboken, NJ, USA, 2015; p. 712, ISBN: 978-1-118-67502-1. J. Time Ser. Anal. 2016, 37, 709–711.
Available online: https://www.kaggle.com/datasets/jeanmidev/smart-meters-in-london (accessed on 19 July 2023).
Available online: https://www.caci.co.uk/wp-content/uploads/2021/06/Acorn-User-Guide-2020.pdf (accessed on 19 July 2023).
Savi, M.; Olivadese, F. Short-Term Energy Consumption Forecasting at the Edge: A Federated Learning Approach. IEEE Access 2021, 9, 95949–95969.
Makridakis, S.; Hibon, M. ARMA models and the Box-Jenkins methodology. J. Forecast. 1997, 16, 147–163.
Kang, J.; Reiner, D.M. What is the effect of weather on household electricity consumption? Empirical evidence from Ireland. Energy Econ. 2022, 111, 106023.
Pinheiro, M.; Madeira, S.; Francisco, A. Short-term electricity load forecasting? A systematic approach from system level to secondary substations. Appl. Energy 2023, 332, 120493.
Gasparin, A.; Lukovic, S.; Alippi, C. Deep Learning for Time Series Forecasting: The Electric Load Case. arXiv 2019, arXiv:1907.09207.
Available online: https://transparency.entsoe.eu (accessed on 28 November 2023).
Heidrich, B.; Phipps, K.; Neumann, O.; Turowski, M.; Mikut, R.; Hagenmeyer, V. ProbPNN: Enhancing Deep Probabilistic Forecasting with Statistical Information. arXiv 2023, arXiv:2302.02597.
Available online: https://www.tensorflow.org (accessed on 13 March 2024).
Tukey, J.W. Exploratory Data Analysis; Addison-Wesley: Boston, MA, USA, 1977.
Jolliffe, I. Principal Component Analysis; Springer: Berlin/Heidelberg, Germany, 1986.

Figure 1. Procedure of time series synthesis consisting of multiple steps of analysis, pre-processing, training, and synthesis: (i) Clustering ACORN (ACORN is a segmentation tool which categorizes UK’s population into demographic types) household electricity consumption time series and transforming and scaling of non-Gaussian distributed data. (ii) Aggregation of household data to the level of a Renewable Energy Community, extracting the time series process in order to sample training data adequately. (iii) Training a probabilistic seasonal $S_{t}$ and a probabilistic periodic model $C_{t}$ considering stochasticity to (iv) iteratively generate synthetic time series.

Figure 2. Distribution of the number of different ACORN groups within the dataset.

Figure 3. Distribution of raw consumption values (left) and ones which are obtained by Yeo–Johnson transformation (Equation (2)) (right). The transformed values are more Gaussian-like.

Figure 4. Distribution of daily means (left) and their 2-dimensional principal components of corresponding daily sequences (right).

Figure 5. Two-step clustering of household consumption time series of one ACORN group: Seasonal cycle (left) and short-term cycle (right).

Figure 6. Time series forecasting taking uncertainties $ϵ_{t}$ into account: Actual time series with observed uncertainties (blue) and conditional Gaussian distribution of uncertainties (grey).

Figure 7. Two-step model approach for time series synthesis: FNN-S uses a feedforward neural network FNN as well as calendar data $X_{s, cal}$ to simulate daily mean values ${\hat{y}}_{μ, t}^{'}$. PNN-S uses a probabilistic neural network PNN as well as process data $X_{s, x}$ to simulate variations in daily mean values ${\hat{y}}_{rel, t}^{'}$. ${\hat{y}}_{rel, t}^{'}$ as well as process data $X_{i}$ are then used within PNN-C to simulate short-term variations in half-hourly values ${\hat{y}}_{hh, t}^{'}$. $L_{S}$/$L_{R}$ are supervised/reconstruction loss to train generic/stochastic time series behavior.

Figure 8. Electricity load time series of 50 Hz [37] showing exemplary temporal shift ($τ_{sa}$: 7 days, $τ_{so}$: 7 days, and $τ_{mo}$: 3 days) for reference values used in $\nabla_{τ}^{d} x_{j} (t - i)$, $\nabla_{τ}^{d} y (t - i)$, $y_{t, ref}$ (Equations (7)–(9)).

Figure 9. Partial autocorrelation plot of one exemplary DECTS.

Figure 10. Architecture of probabilistic neural network.

Figure 11. Rebound activation, which is used within the Gaussian Layer (Figure 10). Lower $X_{min}$ and upper $X_{max}$ boundaries are exemplarily shown for $x \in [- 1, 1]$.

Figure 12. Workflow of model M to synthesize seasonal and short-term variations iteratively.

Figure 13. Distribution of (a) daily energy consumption for different months and (b) hourly values for different times of the day.

Figure 14. Distribution of daily sequences on different days with a temporal resolution of 30 min.

Figure 15. (a) Autocorrelation and (b) partial autocorrelation of real vs. synthetic DECTS.

Figure 16. 2-D principal components of daily sequences.

Figure 17. Correlation matrix between the real and synthetic time series with higher (yellow) and lower (dark blue) values.

Table 1. Process variables for FNN-S, PNN-S, and PNN-C concerning Equation (10).

Process Variable	FNN-S	PNN-S	PNN-C
p	X	1	2
$τ_{m o}$	X	3 days	3 days
$τ_{t u}$	X	1 day	1 day
$τ_{w e}$	X	1 day	1 day
$τ_{t h}$	X	1 day	1 day
$τ_{f r}$	X	1 day	1 day
$τ_{s a}$	X	6 days	6 days
$τ_{s u}$	X	1 day	1 day
Exogenous variables	X	$T_{m a x}$ , $R H_{m e a n}$	$T_{h h}$ , $R H_{h h}$
dow/H	one-hot-enc	one-hot-enc	one-hot-enc
tod	X	X	periodic-enc
doy	periodic-enc	periodic-enc	periodic-enc
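
The two calendar encodings named in Table 1 can be illustrated with a minimal sketch. It assumes that "periodic-enc" denotes the common sine/cosine mapping onto the unit circle, which is not confirmed here; the function names `periodic_encode` and `one_hot`, the Monday = 0 convention, and the example values are hypothetical and chosen only for illustration.

```python
import math


def periodic_encode(value, period):
    """Map a cyclic value onto the unit circle as (sin, cos).

    Assumption: this sine/cosine form is one common choice of
    periodic encoding, not necessarily the one used by the authors.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def one_hot(index, size):
    """Return a list of length `size` with a single 1 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(size)]


# Hypothetical example: a Wednesday (dow = 2 with Monday = 0),
# day 91 of a 365-day year, and half-hour slot 24 of 48 (12:00).
dow_vec = one_hot(2, 7)
doy_sin, doy_cos = periodic_encode(91, 365)
tod_sin, tod_cos = periodic_encode(24, 48)

print(dow_vec)
print(round(doy_sin, 3), round(doy_cos, 3))
print(round(tod_sin, 3), round(tod_cos, 3))
```

With such a mapping, the last and the first value of a cycle (for example, 31 December and 1 January) end up close together in feature space, whereas a one-hot vector treats every category as equally distant from all others.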

Table 2. Input arrays ${\vec{X}}_{S, Cal}$/${\vec{X}}_{S, X}$ used to fit targets within FNN-S/PNN-S and ${\vec{X}}_{C, X}$ used to fit the target within PNN-C.

${\vec{X}}_{S, Cal} (t)$	${\vec{X}}_{S, X} (t)$	${\vec{X}}_{C, X} (t)$
${(\begin{matrix} d o w (t) \\ H (t) \\ d o y (t) \end{matrix})}^{T}$	${(\begin{matrix} {\vec{X}}_{S, C a l} (t) \\ T (t - τ) \\ T (t - 1) \\ T (t) \\ R H (t - τ) \\ R H (t - 1) \\ R H (t) \end{matrix})}^{T}$	${(\begin{matrix} {\vec{X}}_{S, C a l} (t - 1) \\ {\vec{X}}_{S, C a l} (t) \\ t o d (t - 1) \\ t o d (t) \\ T (t - τ) \\ T (t - 1) \\ T (t) \\ R H (t - τ) \\ R H (t - 1) \\ R H (t) \end{matrix})}^{T}$

Table 3. Summary of assignment of inputs and targets to generate outputs with a specific model.

Model	Input	Target	Output
FNN-S	${\vec{X}}_{S, C a l}$	$y_{μ_{d}, t}^{*}$	${\hat{y}}_{μ_{d}, t}^{*}$
PNN-S	${\vec{X}}_{S, C a l}$ , ${\vec{X}}_{S, X}$	$y_{r e l, μ_{d}, t}^{*}$	${\hat{y}}_{r e l, μ_{d}, t}^{*}$
PNN-C	${\vec{X}}_{C}$	$y_{h h, t}^{*}$	${\hat{y}}_{h h, t}^{*}$

Table 4. Means $μ$ and standard deviations $σ$ of the training and synthetic data regarding hourly electricity consumption and specific times of the day, namely, 0 h, 6 h, 12 h, and 18 h.

	Measure	0 h	6 h	12 h	18 h
Training data	$μ$ [kW]	0.295231	0.318273	0.404327	0.693896
Training data	$σ$ [kW]	0.103369	0.101921	0.146100	0.251876
Synthetic data	$μ$ [kW]	0.315641	0.323386	0.409102	0.712146
Synthetic data	$σ$ [kW]	0.099668	0.081332	0.120735	0.244308
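
To make the comparison in Table 4 easier to read, the relative deviation of the synthetic from the training statistics can be computed directly from the tabulated values. The following is only an illustrative sketch: the helper `relative_deviation` and the dictionary layout are hypothetical, and the relative deviation is a measure chosen here for illustration rather than one taken from the evaluation itself.

```python
# Values copied from Table 4 in kW, stored as (mean, standard deviation).
training = {
    "0 h": (0.295231, 0.103369),
    "6 h": (0.318273, 0.101921),
    "12 h": (0.404327, 0.146100),
    "18 h": (0.693896, 0.251876),
}
synthetic = {
    "0 h": (0.315641, 0.099668),
    "6 h": (0.323386, 0.081332),
    "12 h": (0.409102, 0.120735),
    "18 h": (0.712146, 0.244308),
}


def relative_deviation(real, synth):
    """Signed deviation of the synthetic value relative to the real one."""
    return (synth - real) / real


# Relative deviations of the means and of the standard deviations.
rel_mu = {k: relative_deviation(training[k][0], synthetic[k][0]) for k in training}
rel_sigma = {k: relative_deviation(training[k][1], synthetic[k][1]) for k in training}

for k in training:
    print(f"{k}: mean {rel_mu[k]:+.1%}, std {rel_sigma[k]:+.1%}")
```

For these four times of day, the synthetic means lie above the training means by less than 7%, while the synthetic standard deviations are lower than the training values in every case, most noticeably at 6 h and 12 h.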

Table 5. Statistics of the principal components of the training and synthetic data regarding the mean $μ$ and standard deviation $σ$.

	PC1- $μ$	PC2- $μ$	PC1- $σ$	PC2- $σ$
Training data	0.009	−0.007	0.225	0.070
Synthetic data	−0.009	0.007	0.232	0.053

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Richter, L.; Bender, T.; Lenk, S.; Bretschneider, P. Generating Synthetic Electricity Load Time Series at District Scale Using Probabilistic Forecasts. Energies 2024, 17, 1634. https://doi.org/10.3390/en17071634

