#### *2.1. Data (Pre-)Processing*

Data collected by smart meters are not always directly usable for the provision of user-centric services. At least some preprocessing is generally needed to create a uniform and error-free foundation for data analytics. On the one hand, many services rely on processed input data, such as a building's energy consumption during a specific period, rather than on raw readings of voltage levels and current flows. On the other hand, the sampling process, the analog-to-digital conversion, and the transmission over communication channels can introduce errors and signal distortions that need to be eliminated. Proper preprocessing thus transforms the collected data into a unified and interpretable format, based on which user-centric services can be provided reliably. To lay the foundation for the use cases surveyed in Section 3, we list typical preprocessing steps that precede the actual data analysis as follows.

First, obviously erroneous values are generally eliminated. These primarily occur due to faulty storage devices, unreliable communication channels, or buffer overflows on the transmitting or receiving devices. Readings that are not valid numeric representations, as well as infeasible values (e.g., current flows exceeding the nominal circuit breaker limits by a large factor), are thus removed. Unless a long sequence of erroneous data has been reported, imputing values and interpolating the resulting gaps in the sampled data (e.g., by using the *impyute* library [15]) is an effective means to prepare the data for further processing.
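A minimal Python sketch of these two steps is shown below. It deliberately does not use *impyute*; instead, it applies plain linear interpolation, and the plausibility limit `max_abs` as well as the maximum fillable gap length `max_gap` are hypothetical parameters that would have to be chosen for the specific installation.

```python
import math
from typing import Optional, Sequence


def clean_readings(raw: Sequence[object], max_abs: float) -> list[Optional[float]]:
    """Replace non-numeric, non-finite, and infeasible readings with None."""
    cleaned: list[Optional[float]] = []
    for reading in raw:
        try:
            value = float(reading)
        except (TypeError, ValueError):
            cleaned.append(None)  # not a valid number representation
            continue
        if not math.isfinite(value) or abs(value) > max_abs:
            cleaned.append(None)  # NaN/inf or physically infeasible value
        else:
            cleaned.append(value)
    return cleaned


def fill_short_gaps(values: Sequence[Optional[float]], max_gap: int) -> list[Optional[float]]:
    """Linearly interpolate interior gaps of at most max_gap missing samples.

    Longer gaps and gaps at the start or end of the series are left as None,
    because bridging them would invent too much data.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap = i - start  # number of consecutive missing samples
        if start > 0 and i < n and gap <= max_gap:
            left, right = out[start - 1], out[i]
            for k in range(start, i):
                frac = (k - start + 1) / (gap + 1)
                out[k] = left + frac * (right - left)
    return out


raw = [1.0, "err", 3.0, float("nan"), 1e6, 9.0]  # hypothetical current readings in A
print(fill_short_gaps(clean_readings(raw, max_abs=100.0), max_gap=2))
```

In this example, the non-numeric entry, the NaN, and the infeasible reading are removed, and both resulting gaps are short enough to be bridged; with `max_gap=1`, the two-sample gap would remain unfilled.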

The fundamental mode of operation of smart meters is to measure raw voltage (*V*) and current (*I*) waveforms at sampling rates that allow for the computation of Root Mean Square (RMS) values,

$$V_{RMS} = \sqrt{\frac{1}{T} \int_{t_0}^{t_0 + T} V(t)^2 \, dt} \quad \text{and} \quad I_{RMS} = \sqrt{\frac{1}{T} \int_{t_0}^{t_0 + T} I(t)^2 \, dt},$$

with *T* denoting the duration of one or more mains periods and *V*(*t*) and *I*(*t*) being the voltage and current waveform signals, respectively. However, raw data are rarely communicated beyond the local system boundary due to their sheer size and highly redundant information content [16]. Instead, smart meters typically process the raw samples locally and report one or more of the following parameters: RMS voltage ($V_{RMS}$), RMS current ($I_{RMS}$), the power factor cos Φ (i.e., the cosine of the phase angle Φ between voltage and current), active power (*P*), reactive power (*Q*), apparent power (*S*), and/or the consumed electrical energy (*E*). In multi-phase electrical installations, parameters are either reported individually for each phase or only available in aggregated form. If a particular parameter is required but not directly provided by the smart meter, it may still be possible to calculate it from the provided parameters; this, again, is part of the preprocessing step.
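To illustrate how the reported parameters relate to the raw waveforms, the sketch below approximates the integrals above by discrete means over sampled voltage and current. It assumes simultaneously sampled signals spanning an integer number of mains periods. *Q* is computed here as $\sqrt{S^2 - P^2}$, which yields the unsigned non-active power and coincides with the reactive power only for purely sinusoidal signals; actual meters may use other (e.g., standardized) definitions.

```python
import math
from typing import Sequence


def electrical_parameters(v: Sequence[float], i: Sequence[float]) -> dict[str, float]:
    """Approximate RMS values and power quantities from sampled waveforms.

    Assumes v and i are sampled simultaneously and span an integer number of
    mains periods, so that discrete means approximate the integrals over T.
    """
    n = len(v)
    if n == 0 or n != len(i):
        raise ValueError("v and i must be non-empty and of equal length")
    v_rms = math.sqrt(sum(x * x for x in v) / n)
    i_rms = math.sqrt(sum(x * x for x in i) / n)
    p = sum(a * b for a, b in zip(v, i)) / n  # active power: mean instantaneous power
    s = v_rms * i_rms                         # apparent power
    q = math.sqrt(max(s * s - p * p, 0.0))    # unsigned non-active power
    pf = p / s if s > 0 else 0.0              # power factor (= cos(phi) for sinusoids)
    return {"V_RMS": v_rms, "I_RMS": i_rms, "P": p, "S": s, "Q": q, "PF": pf}


# Synthetic 50 Hz example: 200 samples over one period, current lagging by 60 degrees.
f_mains, samples, phi = 50.0, 200, math.pi / 3
t = [k / (f_mains * samples) for k in range(samples)]
v = [325.0 * math.sin(2 * math.pi * f_mains * tk) for tk in t]
i = [10.0 * math.sin(2 * math.pi * f_mains * tk - phi) for tk in t]
params = electrical_parameters(v, i)
```

For these hypothetical 325 V and 10 A peak values, the sketch yields P ≈ 812.5 W, S ≈ 1625 VA, and a power factor of approximately 0.5.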

To demonstrate the variability of data reported by practical smart meter deployments, Table 1 provides a brief overview of the attributes, sampling rates, and communication interfaces of the smart meters and custom-built meters that have been used to record publicly released electrical consumption datasets. The diversity of the provided data highlights why general data preprocessing is required to create a uniform data representation that enables consumer-centric use cases independently of the specific underlying smart meter hardware.


<sup>a</sup> eGauge. <sup>b</sup> Schneider Electric. <sup>c</sup> DENT Instruments. <sup>d</sup> Landis+Gyr.

Table 1 highlights one more aspect of heterogeneity in smart meter data, which is also confirmed in [25]: the temporal resolution at which the parameters are reported. Reducing this resolution, i.e., *downsampling* smart meter data, is usually trivial and computationally lightweight, as long as the original data have undergone low-pass filtering to avoid aliasing artifacts. Commonly used downsampling methods include subsampling, averaging, and interpolation [26,27]. Conversely, increasing the temporal resolution of data is far less trivial, but it may be required for smart meter data reported at very low sampling rates. Upsampling techniques such as *super-resolution* [28] have been shown to achieve good performance in preliminary tests on the Dataport [17] dataset. As the sampling rate is frequently limited by the smart meter's communication channel and processing power, numerous works have investigated the optimal sampling rates for various electricity load analysis algorithms (e.g., [29–32]). Similarly, lossy compression mechanisms [33,34] and pattern recognition methods [16] have been investigated as candidates to maintain high temporal resolutions while reducing the volume of exchanged data.
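The following sketch contrasts the two simplest downsampling methods mentioned above with naive linear upsampling. Block averaging acts as a crude low-pass filter before decimation, whereas plain subsampling performs no filtering at all and is therefore prone to aliasing. Linear interpolation serves only as a simple stand-in here; it is not the learned *super-resolution* approach of [28] and cannot recover information that was never sampled.

```python
from typing import Sequence


def downsample_mean(x: Sequence[float], factor: int) -> list[float]:
    """Average non-overlapping blocks of `factor` samples (incomplete tail dropped)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_blocks = len(x) // factor
    return [sum(x[b * factor:(b + 1) * factor]) / factor for b in range(n_blocks)]


def subsample(x: Sequence[float], factor: int) -> list[float]:
    """Keep every `factor`-th sample; no filtering, so aliasing is possible."""
    return list(x[::factor])


def upsample_linear(x: Sequence[float], factor: int) -> list[float]:
    """Insert factor - 1 linearly interpolated samples between neighbors."""
    if len(x) < 2:
        return list(x)
    out: list[float] = []
    for a, b in zip(x[:-1], x[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(x[-1])
    return out
```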

#### *2.2. Extracting Higher-Level Information*

While inspecting conditioned smart meter data may be of interest for tech-savvy users or grid operators, Serrenho et al. [35] found that it provides little benefit to the average consumer. Consumer-relevant information, such as that required by the use cases in Section 3, must first be inferred from the consumption data by extracting higher-level information. This includes signal features, transient events, or individual appliance consumption data. Calculating such features is a widely used preprocessing step that goes beyond the data cleansing and adaptation steps described in Section 2.1: its purpose is to eliminate redundant information and retain only the most informative features of the consumption data. It also generally leads to implicit data compression, e.g., to utilize the available communication channels optimally or to reduce the input size for machine learning algorithms. Domain experts have introduced and compared numerous features in related works [36–38]. For example, Kahl et al. [36] evaluated 36 features, such as the *voltage and current trajectory* or the *harmonic energy distribution*, for their suitability to serve as distinctive higher-level features for the enablement of user-centric services. Because of their virtually ubiquitous usage, we survey a selection of methods to extract higher-level features from smart meter data as follows.
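To give a concrete impression of such a feature, the sketch below computes one plausible variant of a harmonic energy distribution: the share of spectral energy that falls on each of the first few harmonics of a current window. The exact definition used by Kahl et al. [36] may differ; the function name, the normalization, and the requirement that the window cover an integer number of mains periods are assumptions of this example.

```python
import cmath
import math
from typing import Sequence


def harmonic_energy_distribution(i: Sequence[float], periods: int, n_harmonics: int) -> list[float]:
    """Share of spectral energy in harmonics 1..n_harmonics of a current window.

    Assumes the window spans exactly `periods` mains periods, so that harmonic h
    falls on DFT bin h * periods.
    """
    n = len(i)
    energies = []
    for h in range(1, n_harmonics + 1):
        k = h * periods  # DFT bin of the h-th harmonic
        if k >= n / 2:
            raise ValueError("sampling rate too low for the requested harmonic")
        x_k = sum(i[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        energies.append(abs(x_k) ** 2)
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies
```

For a current consisting of a fundamental plus a third harmonic with half its amplitude, the resulting shares are 0.8 and 0.2, since the energy scales with the squared amplitude (1 : 0.25).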

Many user-centric use cases for smart meter data rely on the analysis of user-induced events, e.g., when electrical appliances are switched on or off or their mode of operation changes. In Table 2, we summarize the number of such power events found in a selection of publicly available electricity datasets. The average of the tabulated values is approximately 275 events per day, i.e., roughly one event every 5 min (1440 min / 275 ≈ 5.2 min). This is consistent with the Switch Continuity Principle (SCP), first introduced by Hart [39] and confirmed to hold by Makonin [40], which states that the total number of events is small compared to the number of samples in the overall signal. In other words, events can be regarded as anomalies in the signal, which makes it possible to utilize a range of known anomaly detection methods for their detection [41].


**Table 2.** Summary of the number of events detected in publicly released electricity datasets.

In practice, event detection algorithms span the range from computationally lightweight solutions (e.g., using thresholds between successive power samples [39,50,51]) to probabilistic models and voting methods [52–54]. More recently, even more complex filters have been proposed to suppress minor fluctuations in electrical signals while emphasizing actual events. Trung et al. [55] used a CUmulative SUM (CUSUM) filter to clean the power signal, while Wild et al. [56] applied Kernel Fisher Discriminant Analysis (KFDA) to harmonics of the current signal. De Baets et al. [57] used spectral components of the current signal smoothed with an inverse *Hann* window in the *cepstral* domain, and the method of Cox et al. [58] uses solely the voltage signal and extracts the spectral envelope of the first and third harmonics.
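Both a lightweight detector and a filter-based one can be sketched in a few lines. The first function below flags an event whenever successive power samples differ by at least a fixed threshold, in the spirit of [39,50,51]; the second is a textbook two-sided CUSUM change detector (Page's test). The latter is a generic illustration only and not the specific filter used by Trung et al. [55]; the values of `threshold`, `drift`, and `h` and the example readings are hypothetical and would need tuning per dataset.

```python
from typing import Sequence


def threshold_events(p: Sequence[float], threshold: float) -> list[tuple[int, float]]:
    """Return (index, step) wherever successive power samples differ by >= threshold."""
    return [
        (k, p[k] - p[k - 1])
        for k in range(1, len(p))
        if abs(p[k] - p[k - 1]) >= threshold
    ]


def cusum_events(p: Sequence[float], drift: float, h: float) -> list[int]:
    """Two-sided CUSUM change detector.

    The reference level is the first sample, or the sample at the last alarm;
    `drift` absorbs small fluctuations and `h` is the alarm threshold.
    """
    events: list[int] = []
    if not p:
        return events
    ref = p[0]
    g_pos = g_neg = 0.0
    for k in range(1, len(p)):
        g_pos = max(0.0, g_pos + p[k] - ref - drift)  # accumulates upward deviations
        g_neg = max(0.0, g_neg + ref - p[k] - drift)  # accumulates downward deviations
        if g_pos > h or g_neg > h:
            events.append(k)
            ref = p[k]  # restart from the new steady-state level
            g_pos = g_neg = 0.0
    return events


power = [100.0, 104.0, 98.0, 101.0, 600.0, 603.0, 597.0, 95.0]  # hypothetical W readings
steps = threshold_events(power, threshold=50.0)
changes = cusum_events(power, drift=10.0, h=100.0)
```

The threshold detector reacts to single-step jumps only, whereas the CUSUM statistic also accumulates gradual deviations over several samples, at the cost of two tuning parameters.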

Data collection from smart meters implies that data are only available at the level of buildings or apartments (cf. Figure 1). Consequently, the energy consumption of individual electrical consumers is not directly identifiable within the reported (aggregate) data. Non-Intrusive Load Monitoring (NILM) refers to the process of disaggregating a composite electrical load into the contributions of the individual consumers. NILM methods frequently rely on machine learning techniques, including neural networks, to this end [59–69], and their computational requirements make their execution on current-generation smart meters largely infeasible. However, collected data can generally be sent to external entities that offer the required storage and processing capabilities to perform NILM and thus provide appliance-level consumption values. As will become apparent in Section 3, several use cases can benefit from the availability of appliance-level data. NILM, which has the advantage of requiring no additional metering devices, is thus a widely applicable data preparation method to enable additional user-centric use cases whenever smart meter data are available.
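To illustrate the disaggregation task itself, the sketch below implements a simple combinatorial-optimization baseline, of the kind commonly used as a reference point in NILM evaluations, rather than one of the learning-based methods cited above. It assumes that each appliance is either off or draws a fixed, known nominal power; the appliance names and power values are hypothetical. Because all $2^n$ on/off combinations are enumerated, the approach only scales to a handful of appliances.

```python
from itertools import product
from typing import Mapping, Sequence


def combinatorial_disaggregation(
    aggregate: Sequence[float], appliances: Mapping[str, float]
) -> dict[str, list[float]]:
    """Assign each aggregate sample the on/off combination whose summed
    nominal power is closest to the measured value."""
    names = list(appliances)
    combos = list(product((0, 1), repeat=len(names)))  # all 2**n on/off states
    combo_power = [sum(s * appliances[n] for s, n in zip(c, names)) for c in combos]
    result: dict[str, list[float]] = {name: [] for name in names}
    for total in aggregate:
        best = min(range(len(combos)), key=lambda j: abs(total - combo_power[j]))
        for state, name in zip(combos[best], names):
            result[name].append(state * appliances[name])
    return result


nominal = {"fridge": 150.0, "kettle": 2000.0, "tv": 100.0}  # hypothetical powers in W
estimate = combinatorial_disaggregation([0.0, 150.0, 2150.0, 250.0, 2250.0], nominal)
```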

#### **3. Consumer-Centric Use Cases of Smart Meter Data**

While it is crucial for the operators of electrical power grids to understand the load and generation characteristics [5] in order to ensure grid stability and avoid power outages, electrical parameters can also be used to provide services to the benefit of the customers. Figure 2 depicts the primary services that can be realized when smart meter data and the corresponding higher-level information are available. We provide more details about the enabled use cases as follows.

**Figure 2.** Overview of consumer-centric services enabled by smart meter data and their proper data (pre-)processing. Dashed lines indicate the possible existence of other potential (pre-)processing steps or use cases beyond those covered in this work.

#### *3.1. Providing User Feedback*

One of the vital value propositions of smart meter deployments is providing near real-time and historical information on electricity consumption to the customers. Having access to such information is expected to result in the adoption of more sustainable consumption behavior, and thus to ultimately lead to energy savings [70–72]. Feedback on electricity consumption has been provided in numerous ways, including In-Home Displays (IHDs) [73,74], ambient displays [75,76], web and mobile applications [77–79], and public displays [80,81]. While the majority of the works focused on providing information only to the home residents, other studies also looked at the potential of social pressure by enabling direct comparisons between individual consumers or consumer groups [82,83].

A meta-review of 118 studies that involved providing feedback on electricity consumption is presented in [35]. In general, the surveyed studies report that feedback can reduce a household's energy consumption by 5 % to 10 %, particularly when the deployed systems provide consumption information on individual appliances. The potential of feedback for energy savings was also confirmed in [84], where 12 studies on the efficacy of disaggregated feedback were examined and an average energy reduction of 4.5 % was reported. Even though there are no reports of long-term results on how to sustain the accomplished energy savings, many works have found that, without proper engagement strategies, the end-users' interest in the feedback devices declines considerably once habituation sets in, which can happen after as little as four weeks (e.g., [85–88]). However, the literature shows that, when smart meter data are visualized in a timely and intuitive way, consumers become increasingly literate in understanding their domestic energy consumption, in particular how unintentional behavior can lead to unnecessary consumption [89,90].

With the increasing deployment of distributed Renewable Energy Sources (RES), such as rooftop Photovoltaic (PV) installations, it also becomes increasingly important to help users align their consumption habits with their local generation [91,92]. As a result of this trend, energy feedback has received renewed interest as a means to enable prosumers, i.e., consumers with local production facilities, to interact optimally with the power grid. Even at larger scales (e.g., smart microgrids [93]), the emergence of Peer-to-Peer (P2P) energy markets requires prosumers to understand both the saving potentials and the consequences of their actions, each of which can be conveyed through feedback systems [94–96]. One such use case is studied in practice in [97], which confirmed that user feedback was consistently used throughout the entire duration of the study (4.5 months) to make or defer consumption decisions.

#### *3.2. Recognizing Patterns and Anomalies*

Finding patterns in electrical energy consumption that do not conform to expected behavior is another consumer-centric use case for smart meter data. Even though detecting anomalies in smart meter data is challenging, signal processing and machine learning techniques can be utilized efficiently for this purpose. For example, anomaly detection in smart meter data can enable Ambient Assisted Living (AAL), where consumption patterns are indicative of the Activities of Daily Living (ADLs) performed by the residents [98–101]. Unusually short or long ADLs, or unexpected ADL sequences, are often suitable indicators of unusual user behavior. Knowledge of such situations can help to alert relatives early and thus contribute to the residents' safety and well-being [102]. Several algorithmic approaches have been used to recognize such patterns and anomalies. Clement et al. [98] presented a semi-Markov model that describes the daily use of appliances to detect human activity and behavior from smart meter data. In [99], smart meter data are analyzed to identify the behavioral patterns of the occupants, and Bousbiat et al. [100] proposed a framework for detecting abnormal ADLs from smart meter data.
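A very simple way to flag unusually short or long activities is a z-score test on the durations of past occurrences of the same activity, as sketched below. This is a generic statistical baseline, not the semi-Markov model of [98] or the framework of [100]; the activity (e.g., daily cooking time derived from stove usage), the example durations, and the threshold `z_max = 3` are hypothetical.

```python
import statistics
from typing import Sequence


def is_abnormal_duration(history: Sequence[float], duration: float, z_max: float = 3.0) -> bool:
    """Flag a duration whose z-score against past durations exceeds z_max."""
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # sample standard deviation
    if std == 0:
        return duration != mean
    return abs(duration - mean) / std > z_max


cooking_minutes = [20, 22, 19, 21, 20, 23, 18]  # hypothetical past durations
```

In practice, such a test would be maintained per activity and possibly per weekday, since routines often differ between working days and weekends.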

Further use cases based on machine learning for anomaly detection in smart meter data have emerged in areas such as energy theft detection [103,104], the detection of inaccurate smart meters [105], and the detection of abnormal consumption behavior in general [106]. In [104], two anomaly detection schemes for detecting energy theft attacks and locating metering defects in smart meter data are presented. Sial et al. [106] investigated heuristic approaches for identifying abnormal energy consumption from smart meter data, based on a combination of four distinct power-, energy-, and time-related features. An even more sophisticated approach was presented by Liu et al. [105], who applied a deep neural network to detect inaccurate meters, which avoids the unnecessary replacement of smart meters and thus extends their service life. Lastly, the detection and quantification of anomalies in smart meter energy data play a crucial role in assessing energy quality, which is essential for detecting faulty or malfunctioning appliances and non-technical losses [107–110].

#### *3.3. Enabling Demand-Side Flexibility*

Demand-side flexibility (DSF) refers to the portion of electricity demand that can be reduced, increased, or shifted within a specific time window. DSF plays a crucial role in the smart grid by facilitating the integration of RES and reducing peak load demand [111]. Although it has traditionally been provided by industrial consumers (e.g., refrigerated warehouses and steel mills [112]), flexibility can also be provided to grid operators by domestic and commercial consumers through controllable appliances and Electric Vehicles (EVs), e.g., by triggering them to change their consumption profiles [111]. While each consumer can only supply a limited amount of flexibility, aggregating the controllable consumers (and RES) of multiple dwellings can add a significant volume of DSF to the grid. Ultimately, this leads to direct and indirect benefits for a larger group of consumers. On the one hand, offering controllable loads that help match demand and supply provides an additional source of revenue. On the other hand, balanced power grids have a more favorable ecological footprint and lower overall generation costs, resulting in cheaper energy tariffs. However, this flexibility is highly dependent on consumer behavior, which in turn affects consumers' willingness to provide flexible loads [113]. In this context, smart meter data are crucial to understand the potential of device-level flexibility on the consumer's premises [114–116].
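The notion of shifting a load within a time window can be made concrete with a small sketch: given a per-slot tariff and the energy profile of a hypothetical wet-appliance cycle, it evaluates every feasible start slot within the window the consumer allows and returns the cheapest one. The prices, the profile, and the window bounds are assumptions for illustration; a practical DSF scheme would likely also have to account for grid signals, comfort constraints, and aggregation across households.

```python
from typing import Optional, Sequence


def best_start_slot(
    prices: Sequence[float], profile: Sequence[float], earliest: int, latest_end: int
) -> tuple[int, float]:
    """Find the start slot minimizing the cost of running a fixed load profile.

    prices[t] is the tariff in slot t and profile[k] the energy drawn in the
    k-th slot of the cycle; the cycle must start at or after `earliest` and
    finish before `latest_end` (exclusive slot index).
    """
    duration = len(profile)
    best: Optional[tuple[int, float]] = None
    for start in range(earliest, latest_end - duration + 1):
        cost = sum(prices[start + k] * e for k, e in enumerate(profile))
        if best is None or cost < best[1]:
            best = (start, cost)
    if best is None:
        raise ValueError("cycle does not fit into the flexibility window")
    return best


tariff = [0.30, 0.30, 0.25, 0.10, 0.10, 0.12, 0.30, 0.30]  # hypothetical EUR/kWh per slot
washing_cycle = [1.0, 0.5]  # hypothetical kWh per slot
slot, cost = best_start_slot(tariff, washing_cycle, earliest=0, latest_end=8)
```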

In [114], the authors presented one of the first works that analyzed appliance-level consumption data in order to determine the devices' flexibility and its relation to device operations and usage patterns. The work shows that a significant share (50 % on average) of a house's total energy demand can be considered flexible. The results of a pilot study in Belgian households are reported in [115]. Five types of appliances available within residential premises (washing machines, tumble dryers, dishwashers, domestic hot water buffers, and EVs) were assessed with respect to their availability for DSF. The authors concluded that, except for EVs, the DSF potential is highly asymmetrical among appliances, possibly associated with user routines. The authors also estimated that EVs and water heaters have a much greater flexibility potential than wet appliances. In [116], the authors proposed and evaluated a data-driven approach to quantify the potential of flexible loads for participation in DSF programs. Their approach considered EVs, wet appliances (dryer, washing machine, and dishwasher), and Air Conditioning (AC) loads and was evaluated on data from over 300 households from the Pecan Street project [117]. Consistent with the results of previous works, the study confirms that the provided flexibility varies considerably among households. Besides this, the results show that EVs and ACs provide higher levels of flexibility than wet appliances. EVs are thus of particular interest in the context of DSF since, beyond sustainable transportation, they offer additional benefits such as charging flexibility and can act as a non-stationary energy storage solution [118,119].

While these and other works (e.g., [119–121]) assume that individual appliance consumption profiles are readily available, other researchers assessed the flexibility of domestic loads by relying on NILM (cf. Section 2.2) to extract their individual consumption [122–124]. The main motivations for this approach are twofold: (1) avoiding the cost of instrumenting individual appliances in the household with sensors; and (2) protecting consumer privacy by not directly revealing data about the consumption of individual appliances (see Section 4.3). Ultimately, the obtained results show that it is possible to estimate and predict device-level flexibility from NILM outputs, although a high disaggregation performance is necessary to reduce the uncertainty of the DSF estimation.

#### *3.4. Forecasting Power Demand and Generation*

The level of detail made available by smart meters opens several opportunities for load forecasting at the individual building level. Forecasting electricity consumption using smart meter data plays a significant role in energy management for end-customers by making it possible to link current usage behaviors to future energy costs [125]. Similarly, anomaly detection (as discussed in Section 3.2) is often closely related to the comparison of actual and predicted consumption (or generation) behavior; as such, efficient and accurate forecasting techniques are required. Forecasting individual household demand is particularly challenging, however, due to the many contributing factors. These include, but are not limited to, user behavior, appliance ownership, the considered time period(s), and external factors such as the prevailing weather conditions.

Against this background, researchers have proposed many forecasting approaches. For example, in [126], four of the most widely used machine learning methods, namely Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Classification and Regression Tree (CART), and Long Short-Term Memory (LSTM), are used to forecast both the daily consumption peak and the hourly energy consumption of domestic buildings from historical consumption data. It was found that MLPs and especially LSTM-based approaches can significantly improve short-term (24 h) demand forecasting, as these models best capture the underlying non-linear relationships. Several authors have tried to incorporate information on external factors into the forecasting algorithms. For instance, Amin et al. [127] proposed three different models, namely Piecewise Linear Regression (PLR), Auto-Regressive Integrated Moving Average (ARIMA), and LSTM, to forecast the electricity demand of a building from smart meter data and weather information. A similar approach was followed by Gajowniczek and Ząbkowski [125]; however, instead of considering weather information, the authors focused on enhancing the forecasting algorithms by considering the impact of the residents' behavior patterns. The general consensus is that combining historical usage data with external features such as weather and household behavior can significantly improve the forecasting results. Furthermore, these authors also confirm the suitability of LSTM models for short-term (24–48 h) forecasting. Dinesh et al. [128] demonstrated a novel method to forecast the power consumption of a single house based on NILM and affinity aggregation spectral clustering. Their method incorporates human behavior and environmental influences in terms of calendar and seasonal contexts to improve the forecasting performance for individual appliances. The house-level forecast is then obtained by aggregating the individual appliance-level forecasts.
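Sophisticated models such as LSTMs are commonly compared against simple baselines. The sketch below shows a seasonal-naive forecaster, which repeats the last observed season (e.g., the previous day's hourly profile with `season=24`), together with the mean absolute error as one possible evaluation metric. It is not one of the methods from [125–128]; the function names, season length, and horizon are assumptions of the example.

```python
from typing import Sequence


def seasonal_naive_forecast(history: Sequence[float], season: int, horizon: int) -> list[float]:
    """Repeat the last observed season, e.g., the same hour on the previous day."""
    if len(history) < season:
        raise ValueError("history shorter than one season")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average absolute deviation between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
```

The per-slot difference between actual and forecast values can also serve as a simple anomaly score in the sense of Section 3.2.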

Prosumers in general, and particularly those who own micro-production units (e.g., PV or wind generators) and Energy Storage Systems (ESS), can use forecasting to optimize and manage these resources. On the one hand, consumption forecasting techniques can help users anticipate their future energy needs, so they can plan their local generation and optimize the operation of their ESS accordingly. On the other hand, users can also support the operation of the electricity grid by taking control actions to balance electricity supply and demand while maximizing self-consumption and profiting from energy arbitrage (i.e., purchasing energy when the price is low and selling it when it is high) [129,130]. For example, Hashmi et al. [129] proposed an algorithm to control the ESS in the presence of dynamic pricing, whereas Hashmi et al. [130] optimized the ESS to maximize PV self-consumption in a scenario where there is no reward for feeding energy into the power grid. In either case, forecasting the future demand is necessary to decide when to charge or discharge the ESS. In particular, if feeding surplus power into the grid is not rewarded [130], an understanding of the residual load (i.e., the difference between consumption and production) is necessary, generally based on forecasts of local production and demand, in order to avoid unintended grid injection or PV curtailment. Intuitively, these optimizations are sensitive to forecasting errors. For example, Kiedanski et al. [131] showed that when the optimizations are performed at higher sampling rates (every 15 min in their work), the negative implications of forecasting errors are limited. In contrast, the authors stated that lower sampling rates (e.g., a 12 h forecasting horizon) require almost perfect forecasts to unleash their full potential to optimize ESS operations.
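The role of the residual load can be illustrated with a greedy, rule-based dispatch that charges the ESS from PV surplus and discharges it to cover deficits. This is not the optimization approach of Hashmi et al. [129,130]; it assumes a lossless battery without power limits and works on per-slot energies (e.g., kWh per 15 min slot), and all names and values are hypothetical. Because it only reacts to the current slot, it needs no forecast, but for the same reason it cannot, for instance, keep storage capacity free for an expected midday surplus, which is where forecast-based optimization comes in.

```python
from typing import Sequence


def greedy_self_consumption(
    load: Sequence[float], pv: Sequence[float], capacity: float, soc0: float = 0.0
) -> tuple[list[float], list[float]]:
    """Rule-based ESS dispatch on per-slot energies with a lossless battery.

    Returns the grid exchange per slot (positive = import, negative = export)
    and the state of charge after each slot.
    """
    soc = soc0
    grid: list[float] = []
    socs: list[float] = []
    for demand, gen in zip(load, pv):
        residual = demand - gen  # residual load = consumption - production
        if residual < 0:  # surplus: charge first, export the rest
            charge = min(-residual, capacity - soc)
            soc += charge
            grid.append(residual + charge)
        else:  # deficit: discharge first, import the rest
            discharge = min(residual, soc)
            soc -= discharge
            grid.append(residual - discharge)
        socs.append(soc)
    return grid, socs


grid, socs = greedy_self_consumption(load=[1, 1, 3, 2], pv=[3, 4, 0, 0], capacity=4.0)
```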

With increasing EV sales and the high power consumption of EVs during charging, it is also necessary to forecast their charging needs, as this allows for better scheduling and capacity planning [132,133]. Ai et al. [133] forecast household day-ahead charging needs using machine learning ensembles. Such forecasts gain particular importance if the EV owners are also prosumers, since their EVs then also function as an ESS. The ability to increase self-consumption and reduce peak demand using EVs was studied by Fachrizal and Munkhammar [134], who showed that, in a single (Swedish) household, self-consumption could be increased up to 8.7 %. However, this result was obtained under the assumption of perfect load demand and PV production forecasts, which again raises the question of sensitivity to forecasting errors. In sum, as a growing number of works indicate that EV owners generally favor domestic over public charging infrastructure (e.g., [135–137]), accurate load demand and production forecasts are likely to gain importance in the near future.
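The effect of shifting EV charging into hours with PV surplus can be quantified with the self-consumption ratio, defined here as the share of PV generation consumed on site. The definition, the function name, the slot granularity (which ignores mismatches within a slot), and the four-slot example with a hypothetical 2 kWh charging session are assumptions; [134] may define and compute self-consumption differently.

```python
from typing import Sequence


def self_consumption_ratio(load: Sequence[float], pv: Sequence[float]) -> float:
    """Share of total PV generation consumed on site: sum of per-slot min(load, pv) over sum of pv."""
    total_pv = sum(pv)
    if total_pv == 0:
        return 0.0
    consumed_on_site = sum(min(d, g) for d, g in zip(load, pv))
    return consumed_on_site / total_pv


# Hypothetical 4-slot day (kWh per slot): base load of 1 kWh plus a 2 kWh EV session.
pv = [0.0, 3.0, 3.0, 0.0]
evening_charging = [1.0, 1.0, 1.0, 3.0]
midday_charging = [1.0, 3.0, 1.0, 1.0]
ratio_evening = self_consumption_ratio(evening_charging, pv)
ratio_midday = self_consumption_ratio(midday_charging, pv)
```

In this example, moving the charging session from the evening slot to a midday slot raises the self-consumption ratio from 1/3 to 2/3.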
