### **1. Introduction**

A smart grid consists of multiple types of entities such as those involved in generation, distribution, and consumption (smart appliances and buildings). One of the aims of a smart grid is to manage electricity demand in an economical manner via integration and exchange of information about all entities involved. For the customers or the end-consumers as well as the electricity distributing agencies or the electricity brokers, it offers the flexibility to choose/allocate among dynamically changing tariffs to meet certain objectives, e.g., minimize electricity bill for customers, maximize profit for retailers, etc. However, meeting such objectives is challenging due to dynamics of the market, e.g., changing wholesale electricity prices, supply–demand fluctuations, etc.

**Citation:** Narwariya, J.; Verma, C.; Malhotra, P.; Vig, L.; Subramanian, E.; Bhat, S. Electricity Consumption Forecasting for Out-of-Distribution Time-of-Use Tariffs. *CSFM* **2022**, *3*, 1. https://doi.org/10.3390/cmsf2022003001

**Academic Editors:** Kuan-Chuan Peng and Ziyan Wu

**Published:** 8 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

As depicted in Figure 1, a broker typically performs three functions: (1) purchase or sell power to its subscribers or customers in the retail market, (2) purchase or sell power in the wholesale market, and (3) rectify any supply–demand imbalance within its portfolio through the balancing market. In this work, we consider a simplified setting where the broker performs the following two functions: (1) sell power to those customers in the retail market who are electricity consumers, and (2) purchase power in the wholesale market. Typical examples of consumers include offices, housing complexes, hospitals, and villages. Furthermore, we focus on only that subset of consumers who have a *shiftable* load component in their total or aggregate consumption in addition to the traditional fixed or non-shiftable load, i.e., the consumption (e.g., appliance usage) at an hour that cannot be moved to another hour. This shiftable load can be shifted from the originally preferred hour to another hour in the day if the tariff for the latter is lower. The broker may want to encourage such behavior, known as demand response management [1], to maximize profit or balance demand–supply.

**Figure 1.** Various aspects and objectives in an electricity market. In this work, we focus on a sub-problem related to allocation of the optimal time-of-use tariff (TOU Tariff) to each customer.

In this work, we consider the following out-of-distribution generalization problem: given the historical aggregated consumption of consumers in response to the tariff profiles allocated to them, forecast the aggregated consumption for new tariff profiles. These new tariff profiles are part of the electricity broker or retailer's plan to explore new profiles to further improve the profits. This is different from standard forecasting problems as the exogenous variables (tariff profiles) at test time are different from the exogenous variables at train time. Furthermore, the allocation of tariff profiles in the past is not random, so the data is biased in the sense that, for different consumer personas, not all historical tariff profiles would have been tried. We note that the logic based on which the consumers respond to tariff profiles is consistent irrespective of the tariff profile. We propose to capture that logic in the neural network by using permutation equivariant networks and attention mechanisms.

The key contributions of this work can be summarized as follows:


Through empirical evaluation, we show that the proposed approach is able to improve upon vanilla methods that do not take into account suitable inductive biases guided by the knowledge of how consumers respond to tariff profiles.

### **2. Problem Formulation**

The aggregated consumption $e_{c,t} \in \mathbb{R}^+$ of a consumer $c$ at time $t$ has two components: (1) *Type-I consumption*: this is the non-shiftable consumption corresponding to the appliances that have to be used at specific hours only and cannot be shifted to an alternative hour; (2) *Type-II consumption*: this is the shiftable component of the consumption corresponding to appliances whose usage can be planned. Refer to Figure 2a for more details.

**Figure 2.** (**a**) Logic for Consumption Data generation in Electricity Markets and (**b**,**c**) Hourly Tariff Rate Distributions depicting changing distribution across hours that poses a generalization challenge. (**a**) Causal Diagram. (**b**) Hourly Tariff Distributions in IID Profiles depicting temporal bias ($\mathcal{T}_{in}$). (**c**) Hourly Tariff Distributions in OOD Profiles ($\mathcal{T}_{out}$).

Let $e_{c,1:t}$ denote the time series of electricity consumption for consumer $c$ until time $t$. We consider a consumer $c \in \mathcal{C}$, where $\mathcal{C}$ is the set of consumers with non-zero Type-II consumption, i.e., part of their load can be shifted in response to variations in tariff across hours. Further, the $i$-th time-of-use (TOU) tariff profile is denoted as an ordered sequence or $H$-length time series of hourly tariffs $TOU^i = TOU^i_1 \ldots TOU^i_H$, where $TOU^i_h$ ($h = 1 \ldots H$) denotes the tariff at hour $h$. In this work, we consider tariff profiles with hourly rates over a day such that $H = 24$, without loss of generality.

Let $\mathbf{f}_{c,1:t}$ denote all features (static or time-varying) for consumer $c$ at time $t$, including, e.g., past consumption time series and type of consumer (household, office, etc.), and let $\mathbf{f}_t$ denote a vector of temporal features at timestamp $t$, e.g., hour of the day, day of the week, week of the month, month of the year, etc. Note that $\mathbf{f}_{c,1:t}$ refers to relevant features from the entire history, but in practice, we consider a window of length $w$ over $t - w + 1 : t$ for deriving features at time $t$.

Further consider a tariff allocation policy function *π* such that

$$TOU_{c,t+\tau} = \pi(\mathbf{f}_{c,t}, \mathbf{f}_{t+\tau}, \hat{p}_{t+1:t+H})_{\tau}$$

i.e., the tariff at a future time $t + \tau$ with $\tau = 1 \ldots H$ is decided based on the consumer features at time $t$, the temporal features for time $t + \tau$, and the estimated wholesale prices $\hat{p}_{t+1:t+H}$, where $\hat{p}_{t+\tau}$ denotes the estimate of the electricity price $p_{t+\tau}$ in the wholesale market at time $t + \tau$. Without loss of generality, we consider the scenario where $t + 1$ corresponds to the first hour of the day, i.e., the tariff profile for the next day is decided using data until the end of the current day.

Consider historical time series data $\mathcal{D} = \{e_{c,1:t}, TOU_{c,1:t}\}_{c \in \mathcal{C}}$, where the tariff time series are the result of a sequence of tariff profile allocations over days such that any profile $TOU^i \in \mathcal{T}_{in}$ is chosen from a fixed set of profiles $\mathcal{T}_{in}$.

The goal for the broker is to allocate to a consumer the tariff profile $TOU^i$ that maximizes the gain $G_c^i$ over the next $H$ hours:

$$G_c^i = \sum_{t'=1}^{H} \left( TOU_{c,t+t'}^i - p_{t+t'} \right) \times e_{c,t+t'}. \tag{1}$$

Importantly, the electricity consumption $e_{c,t+t'}$ at hour $t + t'$ is a function of the entire tariff profile on that day, as the consumer could choose to shift the shiftable part of the load from high tariff hours to low tariff hours by looking at the tariff profile allocated to the consumer at the beginning of the day.

We consider the following two scenarios depending on the tariff profiles being considered for future allocations:

**IID Scenario**: when the profiles to be allocated to the consumers in future are from the same set of profiles $\mathcal{T}_{in}$ used historically, i.e., $TOU^i \in \mathcal{T}_{in}$.

**OOD Scenario**: when the tariff profiles to be allocated to the consumers in future belong to $\mathcal{T}_{all} = \mathcal{T}_{in} \cup \mathcal{T}_{out}$, i.e., $TOU^i \in \mathcal{T}_{all}$, where $\mathcal{T}_{out}$ is a new set of profiles not previously seen in $\mathcal{D}$. These profiles are out-of-distribution with respect to the training data and have not previously been allocated to any consumer by the broker, who wants to consider them to improve future gains.

### **3. Related Work**

Our work relates to two bodies of literature: (1) demand response management in electricity markets and the related sub-problem of electricity consumption forecasting under exogenous variables, using reinforcement learning and deep learning methods [2–4], and (2) out-of-distribution (OOD) generalization [5–8].

There have been many studies for (1); however, to the best of our knowledge, the problem of bias in historical data in terms of the tariff profiles has been largely overlooked. We draw the attention of the community working on (1) to the potential of OOD generalization: forecasts for previously unallocated tariffs can be improved by using the underlying structure of the problem, namely the particular way in which consumers shift loads in response to changes in tariff. More specifically, we rely on the partial permutation equivariance property of the response to time series of tariffs.

OOD detection and generalization is an emerging area of research, and aims at improving the robustness of models to previously unseen scenarios. Many of the recent approaches for (2) rely on changes in the objective function or different training procedures. For example, the approaches based on meta-learning [9] are not applicable as there is no notion of multiple tasks. We can consider each tariff profile as a task, but then the forecasting can involve different profiles in input versus output. In this work, we focus on using inductive biases in the form of the neural network architecture to improve OOD generalization. There is enough evidence to support the improvement in generalization abilities of neural networks by using the structure of the problem to introduce suitable inductive biases in the learning process. The most commonly used inductive bias is in the design of the neural network architecture motivated by the structure of the problem. Recent examples of this include using graph neural networks [10,11] and modular networks [12]. Recently, the use of structural biases in deep neural networks, motivated by the nature of bias and the structure of the problem, has been successfully evaluated for time series forecasting [13]. Data-dependent priors have been recently proposed in [14]. However, to the best of our knowledge, using consumer behavior properties for electricity time series forecasting under out-of-distribution exogenous variables to guide the design of neural network architecture has not been considered so far in the literature.

### **4. The Learning Problem**

We consider a 2-step approximate solution to maximize the gain (Equation (1)):

**Step 1**: For each consumer, forecast/estimate the consumption under each potential tariff profile allocation. Given features $\mathbf{f}_{c,1:t}$ (including $e_{c,1:t}$), history of allocated tariffs $TOU_{c,1:t}$, and values of potential future tariff $TOU_{c,t+1:t+H}$, the goal is to estimate $e_{c,t+1:t+H}$. This can be seen as **a multi-step time series forecasting problem with exogenous variables**. We provide the details of our proposed approach for this in the next section.

**Step 2**: Compute the profit using

$$\hat{G}_c^i = \sum_{t'=1}^{H} \left( TOU_{c,t+t'}^i - p_{t+t'} \right) \times \hat{e}_{c,t+t'} \tag{2}$$

for each tariff profile $TOU^i \in \mathcal{T}_{all}$ for the OOD scenario ($\mathcal{T}_{in}$ for the IID scenario). Allocate to consumer $c$ the tariff profile which results in the maximum $\hat{G}_c^i$. Note that, in practice, the future wholesale rates $p_{t+t'}$ ($t' = 1 \ldots H$) are also not known and might need to be estimated. In this work, we assume that the $p_{t+t'}$ are known in advance or can be estimated accurately, and focus on estimating the $\hat{e}_{c,t+t'}$, which are the only terms controllable via the $TOU_{c,t+t'}$.

In summary, the tariff profile allocation policy corresponds to estimating the gain for each tariff profile for a consumer, and then allocating the profile with maximum estimated gain. We use a deep neural network based architecture as the function approximator that estimates $\mathbb{E}[e_{c,t+t'} \mid TOU_{c,t+1:t+H}]$ from the data.
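The two-step policy can be illustrated with a minimal sketch. The function names (`estimated_gain`, `allocate_profile`), the profile names, and the toy numbers with $H = 3$ are assumptions made for illustration only; in the actual approach the forecasts would come from the neural network rather than being hard-coded.

```python
from typing import Dict, Sequence, Tuple


def estimated_gain(tariff: Sequence[float],
                   price: Sequence[float],
                   forecast: Sequence[float]) -> float:
    """Equation (2): sum over hours of (tariff - wholesale price) * forecast consumption."""
    if not (len(tariff) == len(price) == len(forecast)):
        raise ValueError("tariff, price and forecast must cover the same hours")
    return sum((u - p) * e for u, p, e in zip(tariff, price, forecast))


def allocate_profile(profiles: Dict[str, Sequence[float]],
                     price: Sequence[float],
                     forecasts: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the candidate profile with the largest estimated gain."""
    gains = {name: estimated_gain(profiles[name], price, forecasts[name])
             for name in profiles}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Hypothetical toy data with H = 3 hours (not taken from the paper).
profiles = {"peak_first": [0.8, 0.5, 0.2], "flat": [0.5, 0.5, 0.5]}
price = [0.1, 0.1, 0.1]                       # assumed known wholesale prices
forecasts = {"peak_first": [1.0, 2.0, 3.0],   # load shifted towards the cheap hour
             "flat": [2.0, 2.0, 2.0]}         # no incentive to shift

best_name, best_gain = allocate_profile(profiles, price, forecasts)
print(best_name, round(best_gain, 3))
```

In this toy case the flat profile wins because the consumer's shift towards the cheap hour erodes the margin of the peak-first profile, which is why the forecast must be conditioned on the whole candidate profile.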

### *4.1. Biased and Scarce Data*

The OOD scenario is challenging as there is no historical data for the profiles in $\mathcal{T}_{out}$. More concretely, we consider three possible values of tariff at any time $t$: low (0.2), medium (0.5), and high (0.8). Therefore, there are $3^H$ unique profiles possible. For $H = 24$, there can be $\approx 3 \times 10^{11}$ profiles possible. However, in practice, the number of allocated profiles would be significantly smaller than this. In this work, we consider $|\mathcal{T}_{in}| \in \{2, 5, 8, 10, 12, 15, 20, 30, 35\}$, which is a range of values encountered for $|\mathcal{T}_{in}|$ in practice. This poses a serious OOD generalization challenge in estimating $e_{c,t+1:t+H}$ for previously unseen profiles $TOU^i_{t+1:t+H} \in \mathcal{T}_{out}$.
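To make the scarcity concrete, the following short calculation counts the possible profiles and the fraction covered by the largest historical set considered above. The variable names are illustrative.

```python
H = 24                    # hours in a tariff profile
RATES = (0.2, 0.5, 0.8)   # low, medium, high

total_profiles = len(RATES) ** H   # 3^24 distinct profiles
largest_t_in = 35                  # largest |T_in| considered in this work
coverage = largest_t_in / total_profiles

print(f"{total_profiles:,} possible profiles")
print(f"fraction covered by {largest_t_in} profiles: {coverage:.1e}")
```

Even the largest historical set therefore covers a vanishingly small part of the profile space.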

We note that one peculiar type of bias that manifests in practice is the **temporal bias**: at any hour $h$ of the day, certain values of tariff are more common than others. We explain this further using a practical scenario as depicted in Figure 2. In practice, it is common to use the following heuristic for tariff profile allocation: keep the most expensive tariff rates during peak demand periods, the least expensive tariff rates during non-peak hours, and slightly cheaper (medium) rates typically between peak and off-peak periods. Every tariff profile is curated on the basis of the average aggregated consumption of each customer. A high tariff is allocated when the aggregated consumption is high, and for the rest of the hours, low/mid tariffs are allocated. The distribution of tariff rates over hours would depend on the distribution of peak consumption across customers (refer to Figure 2c). Furthermore, there is confounding bias [15]: latent consumer attributes affect both (1) the past aggregated consumption, which in turn affects the treatment (tariff profile allocation), and (2) the outcome (electricity consumption), so that in $\mathcal{D}$ both can depend on the consumer features (refer to Figure 2a). We leave the handling of confounding bias for future work, and focus on handling temporal bias in this work.

We empirically show that temporal bias poses a generalization challenge for vanilla feed-forward neural networks, and propose an attention-based architecture to deal with the same, in the next section.

### *4.2. How Consumers Respond to Tariffs*

Consider the following toy example with $H = 6$ where there is only one tariff profile in $\mathcal{T}_{in}$ given by $\{HHMMLL\}$, i.e., the tariff rate is high (H) for the first two hours, medium (M) for the next two hours, and low (L) for the last two hours. Further assume that the consumer has a certain Type-II load during the 1st hour. After looking at this tariff profile, the consumer responds by shifting the load from the 1st (high tariff) hour to the 5th (low tariff) hour. Now, consider a tariff profile in $\mathcal{T}_{out}$ as $\{HHLLMM\}$. Clearly, this profile is different from the profile in $\mathcal{T}_{in}$ as the sequence of highs and lows over the hours is different. However, importantly, the underlying decision-making behavior of the consumer remains the same, i.e., shift the Type-II load from the high tariff hour (1st hour in this case) to a low tariff hour (3rd hour instead of 5th hour in this case). Therefore, it is still possible to forecast the behavior of the user for this OOD profile. In this work, we intend to leverage this aspect of the consumer's decision-making process that stays the same irrespective of the IID-vs-OOD profiles.
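The toy example can be written down as a small sketch. The rule used here (move each shiftable load to the first hour with the lowest tariff) is an assumed simplification that reproduces the example above; it is not the behavior model of the simulator. The name `shift_load` and the load values are hypothetical.

```python
RATE = {"L": 0.2, "M": 0.5, "H": 0.8}


def shift_load(profile: str, fixed: list, shiftable: list) -> list:
    """Total hourly load after moving shiftable load to the first cheapest hour.

    Assumed rule for illustration: any shiftable load sitting in an hour that
    is more expensive than the cheapest hour is moved to that cheapest hour.
    """
    rates = [RATE[symbol] for symbol in profile]
    target = rates.index(min(rates))      # first hour with the lowest tariff
    total = list(fixed)                   # Type-I load stays where it is
    for hour, load in enumerate(shiftable):
        if rates[hour] > rates[target]:
            total[target] += load         # Type-II load moves to the cheap hour
        else:
            total[hour] += load
    return total


fixed = [1.0] * 6                           # Type-I (non-shiftable) load
shiftable = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # Type-II load preferred at the 1st hour

in_dist = shift_load("HHMMLL", fixed, shiftable)    # profile seen in training
out_dist = shift_load("HHLLMM", fixed, shiftable)   # unseen profile
print(in_dist)
print(out_dist)
```

The same rule yields a load peak at the 5th hour for the first profile and at the 3rd hour for the second, which is the invariance the forecasting model is meant to capture.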

Further, consider five ways to process the sequence of tariff rates (Figure 3):

**Figure 3.** How different methods process the sequence of tariff rates.

- **Attention w/o Hour of Day (Att.-HOD)**: As explained above, the standard self-attention method can mimic the logic of how consumers respond to tariffs, but due to temporal bias in the data, the attention method does not generalize well to $\mathcal{T}_{out}$. We propose a simple variant that does not take HOD as input in the self-attention module to obtain the permutation equivariance property.
- **Attention with Permutation Equivariant Query Processing Module (Att.+PE)**: Here, the tariff rates in a day are considered as a set and processed in such a way that ordering of the tariff rates does not matter, i.e., the processing is permutation equivariant [18,19].

In the next section, we explain how we achieve permutation equivariance while forecasting the consumption given a consumer's consumption history, sequence of past tariff profiles, and a future tariff profile.

### **5. Forecasting Architecture**

Consider the consumption history of a consumer along with past allocated tariffs to be a time series of vectors $\mathbf{f}_{1:t}$ including dimensions for past aggregate consumption and past allocated tariff rates $\{e_{1:t}, TOU_{1:t}\}$, and the candidate tariff profile for the next $H$ hours to be $TOU_{t+1:t+H}$. The goal is to estimate $e_{t+1:t+H}$ while ensuring permutation equivariance in processing $TOU_{1:t+H}$ in the sense of [19], e.g., if the output of processing $\{TOU_1, TOU_2, TOU_3\}$ is $\{\mathbf{o}_1, \mathbf{o}_2, \mathbf{o}_3\}$, then the output of processing a permutation of the input, say $\{TOU_2, TOU_1, TOU_3\}$, is given by the permutation $\{\mathbf{o}_2, \mathbf{o}_1, \mathbf{o}_3\}$ of the original output.

To achieve the above-stated goal, we consider the following modularized neural network architecture as depicted in Figures 4 and 5:

**Figure 4.** Flow diagram of the "Attention w/o Hour of Day" approach. The left part of the figure indicates the variability in the tariff profiles and shows that some tariff rates are more frequent than others. The right part of the figure indicates the flow of the inputs through the network and how the tariff information is consumed by the proposed approach.

**Figure 5.** Architectures contrasting "Attention w/o Hour of Day" and "Attention with Permutation Equivariant Query Processing Module" approaches.


Next, we provide details of the exogenous branch which is the key novel component of the proposed approach and helps to mitigate temporal bias.

To achieve permutation equivariance and handle temporal bias, we consider processing the tariff rates $TOU_{t+1:t+H}$ (the same processing is done for past tariffs as well) via an attention mechanism where a part of the processing is done independently for the tariff at each time step $t + t'$ ($t' = 1 \ldots H$) while still taking into account the global information $TOU_{t+1:t+H}$ in order to mimic the behavior of the consumer as explained in the previous section.

More specifically, we consider key $K$ and value $V$ for the attention mechanism to be dependent on a single time step $t + t'$, while the query $Q$ depends on the entire tariff profile $TOU_{t+1:t+H}$ for the day. In other words, $K_{t+t'} = f_K(TOU_{t+t'}, t + t', \theta_K)$, $V_{t+t'} = f_V(TOU_{t+t'}, t + t', \theta_V)$, and $Q_{t+t'} = f_Q(TOU_{t+1:t+H}, \theta_Q)$. Subsequently, the output for the part of the exogenous branch processing the tariffs at time $t + t'$ is given by

$$\mathrm{Att}(Q_{t+t'}, K_{t+t'}, V_{t+t'}) = \mathrm{softmax}\left(\frac{Q_{t+t'} K_{t+t'}^{T}}{\sqrt{d}}\right) V_{t+t'} \tag{3}$$

where $d$ is the dimension of $Q$, $K$, and $V$. While $f_K$ and $f_V$ are implemented as simple linear layers, $f_Q$ is implemented as a permutation equivariant network as follows:

$$f(\mathbf{x}) = \sigma\left(\mathbf{x}\boldsymbol{\Lambda} - \mathbf{1}\,\mathrm{maxpool}(\mathbf{x})\,\boldsymbol{\Gamma}\right) \tag{4}$$

where $\mathbf{x} = \mathrm{ReLU}(TOU_{t+1:t+H}, \theta_{TOU}) \in \mathbb{R}^{H \times d}$ with $\theta_{TOU}$ shared across timesteps $t + 1 \ldots t + H$, $\boldsymbol{\Lambda}, \boldsymbol{\Gamma} \in \mathbb{R}^{d \times d'}$, and $\mathbf{1}$ is a matrix of ones of size $H \times H$. The maxpool is taken along columns, implying that the resulting value for any timestep contains information from all timesteps and is independent of a particular timestep. In this work, we use $d = 10$, $d' = 20$.
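The equivariance property of Equation (4) can be checked numerically with a small sketch. Several choices here are assumptions for illustration: $\sigma$ is taken to be ReLU, the weights are random rather than trained, the pooled row is broadcast to every timestep instead of being multiplied by an explicit matrix of ones, and the sizes are reduced to keep the example short. The helper names `matmul` and `pe_layer` are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pe_layer(x, lam, gam):
    """Sketch of Equation (4): sigma(x Lambda - 1 maxpool(x) Gamma).

    x is H x d; lam and gam are d x d_out. The max is taken over the H rows,
    so the pooled term is identical for every timestep.
    """
    pooled = [[max(col) for col in zip(*x)]]   # 1 x d, invariant to row order
    x_lam = matmul(x, lam)                      # H x d_out, row-wise transform
    pooled_gam = matmul(pooled, gam)[0]         # d_out values shared by all rows
    # sigma is assumed to be ReLU here.
    return [[max(0.0, v - g) for v, g in zip(row, pooled_gam)]
            for row in x_lam]


rng = random.Random(0)
H, d, d_out = 6, 3, 4                           # reduced sizes for the demo
x = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(H)]
lam = [[rng.uniform(-1, 1) for _ in range(d_out)] for _ in range(d)]
gam = [[rng.uniform(-1, 1) for _ in range(d_out)] for _ in range(d)]

perm = [2, 0, 5, 1, 4, 3]                       # a reordering of the hours
out = pe_layer(x, lam, gam)
out_perm = pe_layer([x[i] for i in perm], lam, gam)

# Permuting the input rows permutes the output rows in the same way.
is_equivariant = out_perm == [out[i] for i in perm]
print(is_equivariant)
```

Because the only interaction between timesteps goes through the max over rows, reordering the hours cannot change what any individual hour receives, which is the property exploited for unseen profiles.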

**Objective function**: We use quantile loss for training the DCNN model given by:

$$\mathcal{L}_{\text{quantile}} = \frac{1}{b \times n} \sum_{i=1}^{b} \sum_{q=q_1}^{q_n} \max\left(q \times e^i, (q-1) \times e^i\right), \tag{5}$$

where $e^i = y^i - \hat{y}^i$ indicates the error of the forecasted consumption $\hat{y}^i$ with respect to the ground-truth consumption $y^i$ of the $i$-th window instance, $b$ is the batch size, and $n$ is the number of quantiles used for training.
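A minimal sketch of Equation (5) follows. For brevity it treats each instance as a single scalar target with one forecast per quantile, whereas the actual windows contain 24 hourly values; the function name and the toy numbers are assumptions for illustration.

```python
def quantile_loss(y_true, y_pred, quantiles):
    """Average pinball loss over a batch and a set of quantiles.

    y_true: list of b ground-truth values.
    y_pred: list of b lists, one forecast per quantile (assumed layout).
    """
    b, n = len(y_true), len(quantiles)
    total = 0.0
    for y, preds in zip(y_true, y_pred):
        for q, y_hat in zip(quantiles, preds):
            e = y - y_hat                       # error of the forecast
            total += max(q * e, (q - 1) * e)    # pinball loss for quantile q
    return total / (b * n)


# Hypothetical example: one instance, forecasts for the 0.1, 0.5, 0.9 quantiles.
loss = quantile_loss([10.0], [[8.0, 10.0, 13.0]], [0.1, 0.5, 0.9])
print(round(loss, 4))
```

Under-forecasting is penalized by $q$ and over-forecasting by $1 - q$, so the high quantile forecast is pushed above the target and the low quantile forecast below it.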

### **6. Experimental Evaluation**

The goal is to evaluate the efficacy of the proposed approach to deal with OOD scenarios. For this, we compare the proposed approach with various baselines in the IID as well as OOD settings. We use the simulated data from a high-fidelity and popular PowerTAC (https://powertac.org/, accessed on 12 November 2021) [21] simulator that uses complex state-of-the-art user-behavior models and real world weather data to simulate the complex dynamics of a smart grid system.

We consider 'Office Complex Controllable type' consumers where consumers' daily behavior depends on factors such as number of sub-customers, number of appliances, weather information, hour of day, month, day of week, etc. The various values these factors can take across consumers are given in Table 1.



To obtain the train, validation, and test split, we divide the total data of 6 months into 4, 1, and 1 month, respectively. The time series of hourly data for each consumer is divided into windows of length $t = 168$ (corresponding to 7 days) with a window-shift of 24 to forecast one day-ahead consumption, i.e., the output window size is 24. We consider a varying number of tariff profiles in historical data, i.e., $|\mathcal{T}_{in}| \in \{2, 5, 8, 10, 12, 15, 20, 25, 30, 35\}$, and an additional set of $|\mathcal{T}_{out}| = 40$ profiles. As the number of profiles $|\mathcal{T}_{in}|$ in the training set increases, we expect the bias in the training data to reduce.
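The windowing scheme can be sketched as follows. The function name `make_windows` and the 30-day stand-in series are assumptions for illustration; the defaults mirror the input length of 168, the output length of 24, and the shift of 24 described above.

```python
def make_windows(series, input_len=168, output_len=24, shift=24):
    """Split an hourly series into (history, target) pairs.

    Each pair uses input_len hours as input and the following output_len
    hours as the day-ahead target; consecutive pairs start shift hours apart.
    """
    windows = []
    start = 0
    while start + input_len + output_len <= len(series):
        history = series[start:start + input_len]
        target = series[start + input_len:start + input_len + output_len]
        windows.append((history, target))
        start += shift
    return windows


# Hour indices of a hypothetical 30-day series stand in for consumption values.
hours = list(range(24 * 30))
windows = make_windows(hours)
print(len(windows), len(windows[0][0]), len(windows[0][1]))
```

With a shift equal to the output length, the targets of consecutive windows tile the series day by day without overlap.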

### *6.1. Baselines Considered*

For comparison, we consider the following approaches all using DCNN as the core time series processing module:


### *6.2. Hyperparameters Used*

We use z-normalized consumption time series. DCNN has three layers, each with 16 convolutional filters of length 2, and dilation rates of 1, 2, and 4, respectively. We use batch normalization and an L2 filter regularizer ($\lambda = 0.001$) for regularization purposes. ReLU layers are applied on each CNN layer. The output of the DCNN layer is processed by a channel-wise fully connected layer with 24 hidden units (equal to the output window size), followed by a locally connected layer with 10 filters which are applied at each time-step independently (filter size = 1).
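As a rough illustration of what the dilation rates imply, the sketch below computes the receptive field of the three stacked layers and confirms it with an impulse passed through single-filter convolutions. The causal form with zero left-padding, the single all-ones filter per layer, and the omission of batch normalization and ReLU are simplifying assumptions, not details stated for the actual model.

```python
def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """Single-filter 1-D convolution: y[t] = sum_k w[k] * x[t - k * dilation].

    Steps before the start of the series are treated as zeros (assumed padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


rf = receptive_field(kernel_size=2, dilations=[1, 2, 4])

# Push a unit impulse through three layers with all-ones filters of length 2.
signal = [1.0] + [0.0] * 15
for dilation in (1, 2, 4):
    signal = causal_dilated_conv(signal, [1.0, 1.0], dilation)
span = sum(1 for v in signal if v != 0.0)   # steps reached by the impulse
print(rf, span)
```

Under these assumptions each output step of the stack sees 8 consecutive hours, so longer-range context in the 168-hour window has to come from the subsequent fully connected layers.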

To obtain categorical feature (hour of day, day of week, month of year) embeddings and tariff rate embeddings, we use separate feed-forward networks, each with a ReLU layer followed by a linear layer, having 5 hidden units and 10 hidden units, respectively. Similarly, we use 10 hidden units for each feed-forward network $f_Q$, $f_K$, $f_V$. Finally, the output layer is a small feed-forward network that has 2 layers followed by a linear layer having 40, 10, and 1 hidden unit, respectively. We use a batch size of 16, 200 epochs, and the Adam optimizer with a fixed learning rate of 0.0001 for training the neural network. During training, quantiles are sampled from a uniform distribution, while during validation and testing, we use three quantiles: 0.1, 0.5, and 0.9. All hyperparameters were obtained via grid search based on validation quantile loss on the IID set.

### *6.3. Results and Observations*
