We make the following key observations from the results in Figures 6 and 7:

**Figure 6.** Forecasting performance comparison of different approaches (in terms of average quantile loss). (**a**) IID scenario. (**b**) OOD scenario.

**Figure 7.** Percentage gains of the proposed Att.+PE, Att.-HOD, and Att. approaches over the vanilla FC approach. (**a**) Option-1. (**b**) Option-2.

- In the IID scenario, the average quantile loss (AQL) of all approaches increases with the number of tariff profiles, as the dataset becomes more complex. The FC approach performs better than the other approaches for $|T_{in}| \le 15$, which reflects the higher expressivity of the FC approach in fitting a small number of IID profiles but also hints at potential overfitting.
- In contrast, in the OOD scenario, the performance of all approaches improves as the number of IID profiles increases. This is expected, since more IID profiles imply less bias and, consequently, better generalization to OOD profiles. Interestingly, the FC approach, which was the best approach on the IID profiles for $|T_{in}| \le 12$, is the worst approach in the OOD setting (apart from the lower bound NoX). Because FC processes the tariffs of the day through a fully connected layer, the temporal bias in the data causes its weights to overfit to the profiles in $T_{in}$, so it fails to generalize to the OOD profiles in $T_{out}$.

In contrast, our proposed approaches Att.+PE and Att.-HOD are consistently better than FC for all values of $|T_{in}|$, which shows that FC struggles with the temporal bias in the historical data. Att.-HOD and Att.+PE are also consistently better than Att. for all values of $|T_{in}|$, which shows that handling tariff profiles in a permutation-equivariant way provides better generalization to OOD profiles.
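The AQL reported in Figure 6 builds on the standard pinball (quantile) loss. This section does not state the exact quantile levels or any normalization used for the figures, so the Python sketch below assumes an unnormalized mean of the pinball loss over a hypothetical set of quantiles (0.1, 0.5, 0.9). The function names `quantile_loss` and `average_quantile_loss` are illustrative only.

```python
import numpy as np


def quantile_loss(y_true, y_pred, q):
    """Pinball loss for a single quantile level q, averaged over all time steps."""
    diff = y_true - y_pred
    # Under-forecasts (diff > 0) are weighted by q, over-forecasts by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


def average_quantile_loss(y_true, quantile_preds):
    """Mean pinball loss over all forecast quantiles (assumed definition).

    quantile_preds maps each quantile level q to an array of forecasts
    with the same shape as y_true.
    """
    y_true = np.asarray(y_true, dtype=float)
    losses = [
        quantile_loss(y_true, np.asarray(pred, dtype=float), q)
        for q, pred in quantile_preds.items()
    ]
    return float(np.mean(losses))


# Hypothetical example: consumption over four time steps, three forecast quantiles.
y = np.array([2.0, 3.0, 4.0, 3.0])
preds = {
    0.1: y - 1.0,  # lower quantile sits below the ground truth
    0.5: y,        # median forecast matches the ground truth exactly
    0.9: y + 1.0,  # upper quantile sits above the ground truth
}
print(average_quantile_loss(y, preds))
```

Under this assumed definition, a lower AQL means the forecast quantiles are both well placed and appropriately spread around the ground truth.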

- **Comparison with FC**: All proposed attention-based approaches (Att., Att.-HOD, and Att.+PE) show significant positive gains over FC. The gains are larger when there are fewer IID tariff profiles, i.e., $|T_{in}| \le 12$ (except $|T_{in}| = 2$, where there is too little data to claim any generalization), and they tend to diminish as $|T_{in}|$ increases.
- As expected, gains in forecasting do not necessarily translate directly into monetary profits, since the optimization objective involves other terms such as the wholesale costs *p*. Therefore, the best forecasting approach in the OOD scenario (Att.+PE) is not always the best approach in terms of profit.
- **Comparison with Att.**: For Option-1, Att.-HOD achieves significantly larger gains than Att. for all values of $|T_{in}|$ except $|T_{in}| = 2$, which shows that handling tariff profiles in a permutation-equivariant way is helpful. For Option-2, the gains of Att.-HOD are better than or close to those of Att. (except for $|T_{in}| = 2$).
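Figure 7 reports percentage gains over the FC baseline, but this section does not give the exact formula. The sketch below assumes the common definition of relative improvement, $100 \cdot (v - v_{FC}) / |v_{FC}|$, for a higher-is-better quantity such as profit, with the sign flipped for lower-is-better metrics such as a loss. The helper `percentage_gain` and all numbers are hypothetical and are not results from the paper.

```python
def percentage_gain(value, baseline, higher_is_better=True):
    """Relative improvement (in %) of `value` over `baseline` (assumed definition).

    For a higher-is-better metric (e.g., profit) the gain is positive when
    value > baseline; for a lower-is-better metric (e.g., a loss) the sign
    is flipped so that an improvement is still reported as a positive gain.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = value - baseline if higher_is_better else baseline - value
    # Dividing by |baseline| keeps the sign meaningful for negative baselines.
    return 100.0 * delta / abs(baseline)


# Hypothetical profits, NOT results from the paper.
fc_profit = 1000.0
method_profits = {"Att.": 1050.0, "Att.-HOD": 1100.0, "Att.+PE": 1080.0}
gains = {name: percentage_gain(p, fc_profit) for name, p in method_profits.items()}
print(gains)
```

Normalizing by the baseline makes gains comparable across values of $|T_{in}|$ even when the absolute profit level changes between settings.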

In Figure 8, we also provide sample forecasts of Att., Att.-HOD, Att.+PE, and FC against the ground truth (GT) on an OOD profile. These indicate the better generalization ability of Att.-HOD and Att.+PE, especially around the points where Type-II load is shifted. In contrast, all methods perform well in the IID setting, as shown in Figure 9.

**Figure 8.** Sample results comparing the proposed approaches Att.-HOD and Att.+PE with FC on an OOD tariff profile. Here, GT: Ground Truth time series. FC struggles to capture the subtle changes in consumption due to shifting of load, while both Att.-HOD and Att.+PE are able to forecast better.

**Figure 9.** Sample results comparing the proposed approaches Att.-HOD and Att.+PE with FC on an IID tariff profile. Here, GT: Ground Truth time series. In the IID scenario, all proposed attention-based approaches and baselines perform well.

### **7. Conclusions and Future Work**

In this work, we consider the problem of demand response management from an electricity broker's or retailer's perspective. We highlight temporal bias as an issue in optimizing profits via suitable tariff profile allocations. We motivate the need for better generalization to out-of-distribution profiles and note that this is possible by leveraging the fact that consumers respond with the same logic across profiles. We propose suitable inductive biases in a deep neural network-based approach for forecasting electricity consumption in response to new tariff profiles. These take the form of a permutation equivariance-enabled attention mechanism that can leverage this property of consumers responding in a consistent way across profiles. In the future, it will be interesting to study generalization from the perspective of handling confounding bias, since both the historical profile allocations and the outcomes are affected by the historical allocation policies, which in turn rely on latent consumer attributes acting as confounders. The current optimization objective takes into account the broker's profit but ignores the cost of electricity for the end consumer; bringing this cost into the optimization objective is a potential next step.
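The permutation equivariance property that the attention-based approaches rely on can be checked numerically. The sketch below is a generic illustration, not the authors' architecture: it assumes a single-head self-attention layer with no positional or hour-of-day features, applied to a set of hypothetical tariff tokens, and compares it with a fully connected layer applied to the flattened input. The shapes and weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with no positional encoding.

    x has shape (n_tokens, d); each row could represent one tariff slot.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v


def fully_connected(x, w):
    """FC layer on the flattened input: each weight is tied to a token position."""
    return (x.reshape(-1) @ w).reshape(x.shape[0], -1)


n_tokens, d = 6, 4  # hypothetical: 6 tariff slots with 4 features each
x = rng.normal(size=(n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_fc = rng.normal(size=(n_tokens * d, n_tokens * d))

# A fixed, non-identity reordering of the tokens.
perm = np.roll(np.arange(n_tokens), 1)

# Attention: permuting the input tokens permutes the outputs in the same way.
att_equivariant = np.allclose(
    self_attention(x[perm], w_q, w_k, w_v), self_attention(x, w_q, w_k, w_v)[perm]
)
# FC: the same check generally fails, because its weights depend on positions.
fc_equivariant = np.allclose(fully_connected(x[perm], w_fc), fully_connected(x, w_fc)[perm])
print(att_equivariant, fc_equivariant)
```

The attention check holds because a row permutation $P$ turns the attention weights $W$ into $P W P^\top$, so the output becomes $P W P^\top P V = P (W V)$. The FC layer has no such structure, which is one way to see why it can latch onto the specific profiles seen during training.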

**Author Contributions:** Conceptualization, methodology, resources, software, formal analysis and writing of the original draft, P.M., J.N. and C.V.; validation and data curation, J.N. and C.V.; writing (review and editing), L.V., E.S. and S.B.; supervision, P.M., L.V., E.S. and S.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** We use simulated data from the PowerTAC simulator. Further details about the data are provided in Section 6. The data are confidential, so we cannot provide the simulated data.

**Conflicts of Interest:** The authors declare no conflict of interest.
