*3.2. Data Manipulation*

Unlike the demand shaping approach, data manipulation modifies the smart meter data before they are sent to the utility. Data aggregation, data obfuscation, data down-sampling, and anonymization all belong to this category.

Data obfuscation, also called data distortion, adds noise to the original smart meter data to mask the real power consumption [56–60]. Like demand shaping, data obfuscation reduces privacy loss by distorting the smart meter data, but it does so at the network layer. Noise drawn from Gaussian [56,60], Laplace [56], or gamma [57] distributions is added to the original smart meter data to distort the load curve. These noise-adding mechanisms are designed to be zero-mean (μ = 0), so the noise tends to cancel out when enough readings are summed. P. Barbosa et al. [59] conclude that the choice of distribution does not affect the trade-off between utility and privacy, so all of these distributions achieve similar privacy protection. To guarantee billing correctness, several schemes have been proposed. Reference [56] proposes a method that reconstructs the power consumption distribution by adding another Gaussian distribution to the data, but it does not quantify how much noise should be added to recover the original curve. Reference [59] sends a filtered profile, rather than a masked profile, to the utility, and the results show that this reduces the error in the overall power consumption. However, the authors also find that the error differs significantly between periods (peak and off-peak), which poses a new challenge. In summary, although data distortion reduces privacy loss efficiently, several problems remain to be addressed in future studies: (1) TOU tariffs cannot be applied accurately. Although the noise is zero-mean, the TOU price applied to each interval is not constant, so the price-weighted noise terms do not cancel and the total TOU bill is affected. (2) Although, from the signal-processing and information-theoretic viewpoints, zero-mean noise does not affect aggregate results, the power system operates in real time.
The power system operator manages the grid using the real-time data sent from smart meters, so even a minor discrepancy between the ground truth and the distorted data could cause serious faults or even the collapse of the whole system.
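To make problem (1) concrete, the following Python sketch adds zero-mean Gaussian noise to a load profile and compares a flat-rate bill with a TOU bill. As an idealisation (an assumption, not a scheme from [56–60]), the noise is re-centred so that it sums exactly to zero over the billing period; the function names and toy prices are hypothetical.

```python
import random

def zero_sum_noise(length, sigma, rng):
    """Gaussian noise re-centred to sum to zero over the billing period (idealised assumption)."""
    noise = [rng.gauss(0.0, sigma) for _ in range(length)]
    mean = sum(noise) / length
    return [x - mean for x in noise]

def obfuscate(readings, noise):
    """Masked profile sent to the utility: reading plus noise, interval by interval."""
    return [r + n for r, n in zip(readings, noise)]

def flat_bill(readings, price):
    """Single-rate bill: depends only on total consumption."""
    return price * sum(readings)

def tou_bill(readings, prices):
    """TOU bill: each interval is charged at its own price."""
    return sum(r * p for r, p in zip(readings, prices))

# Hypothetical example: 4 intervals, peak price in the first two.
readings = [1.0, 1.0, 1.0, 1.0]
noise = [0.5, 0.5, -0.5, -0.5]          # sums to zero
prices = [0.3, 0.3, 0.1, 0.1]
masked = obfuscate(readings, noise)
```

Even though the total consumption (and hence the flat-rate bill) is preserved exactly, the TOU bill shifts by the sum of p_t · n_t, here 0.3·0.5·2 − 0.1·0.5·2 = 0.2, which vanishes only when the prices are constant.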

Data aggregation reduces privacy loss by deploying aggregators that collect the data from several smart meters, so that the utility is unable to detect the electricity events in a single house [8,22,60–62]. Data aggregation techniques are divided into aggregation with a trusted third party (TTP) [60] and aggregation without a TTP [8,22]. J.-M. Bohli et al. [60] propose data aggregation with a TTP: the data aggregator (DA) operated by the TTP gathers the data from neighbouring smart meters and then sends the aggregated data to the ES. At the end of every month, the DA also computes the energy consumption of each individual consumer for billing purposes. However, involving a TTP raises several concerns. First, the TTP could itself try to infer personal information, so it may bring extra privacy risks to the system. Second, as the number of installed smart meters grows, it is unrealistic for the TTP to build enough DAs to satisfy the demand, and the development and maintenance budget would be unaffordable to the EP and NO.
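The role of the TTP-operated DA can be sketched in a few lines of Python. This is a hypothetical illustration of the data flow described above (per-interval sums forwarded to the ES, per-consumer monthly totals kept for billing), not the protocol of [60]; the class and method names are invented for the example.

```python
from collections import defaultdict

class TrustedDataAggregator:
    """Hypothetical DA run by a TTP: sees individual readings, forwards only sums."""

    def __init__(self):
        # meter_id -> energy (kWh) accumulated in the current billing month
        self.monthly_totals = defaultdict(float)

    def collect_interval(self, readings):
        """readings: dict mapping meter_id -> kWh for one metering interval.
        Returns the neighbourhood aggregate that would be sent to the ES."""
        for meter_id, kwh in readings.items():
            self.monthly_totals[meter_id] += kwh
        return sum(readings.values())

    def close_month(self, price_per_kwh):
        """Return per-consumer bills for the month and reset the running totals."""
        bills = {m: kwh * price_per_kwh for m, kwh in self.monthly_totals.items()}
        self.monthly_totals.clear()
        return bills
```

Note that `collect_interval` necessarily receives every individual reading in plain form: the ES only sees the sum, but the TTP sees everything, which is exactly the privacy concern raised above.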

References [8,22,61–63] introduce data aggregation mechanisms without a TTP. Instead, encryption techniques such as HE and MPC [8,22,62,63] are employed. Both HE and MPC encrypt personal smart meter data before sending them to the utility/TP. However, unlike conventional encryption techniques, HE and MPC enable TPs to manipulate the data without knowing their details. F. Li et al. [8] and R. Lu et al. [22] independently proposed HE-based aggregation methods. Because the smart meter data are encrypted, the DA can perform aggregation without learning the individual readings, so there is no concern that a TTP may infer sensitive information without permission. However, data aggregation technology has two drawbacks. First, after aggregation, the utility cannot obtain the power usage of an individual consumer. Second, complex encryption incurs high computational overhead. MPC requires less computing power but involves several servers to process the data [63]. In MPC, each server holds only a part of the input data and cannot infer the complete information. MPC has been successfully adopted in smart metering services such as TOU billing. However, complex value-added services, such as load forecasting and online energy disaggregation, require an advanced cloud server to run these algorithms, so the applicability of MPC to such services remains to be examined. The privacy boundary of aggregation size is also investigated by T.N. Buescher et al. [61], who study the aggregation size using a privacy metric called the 'privacy game'. Based on a data-driven evaluation, they conclude that even a DA aggregating more than 100 houses can still reveal private information.
However, the privacy measure they adopt is abstract and simply measures the difference between the individual load curve and the aggregated curve; a more detailed privacy measure is needed to reflect whether advanced algorithms (such as NILM) can infer personal information from the aggregated data.
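The key property behind HE-based aggregation is that ciphertexts can be combined so that decryption yields only the sum of the plaintexts. The Python sketch below illustrates this with the Paillier cryptosystem, a widely used additively homomorphic scheme; we do not claim it reproduces the exact constructions of [8] or [22]. The primes are toy values chosen for readability (an assumption; real deployments use keys of 2048 bits or more), and readings are assumed to be non-negative integers (e.g., Wh) smaller than n.

```python
import math
import secrets

# Toy primes (1e9+7 and the Mersenne prime 2^31-1); far too small for real use.
P, Q = 1_000_000_007, 2_147_483_647

def keygen(p=P, q=Q):
    """Return the public modulus n and the private key (lambda, mu), using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Meter side: c = g^m * r^n mod n^2 with fresh random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n mod n^2.
    return ((1 + m * n) * pow(r, n, n2)) % n2

def aggregate(n, ciphertexts):
    """DA side: multiplying ciphertexts adds the hidden plaintexts; no key needed."""
    n2 = n * n
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % n2
    return acc

def decrypt(n, priv, c):
    """Key holder side: m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n
```

In this sketch each meter encrypts under the public modulus `n`, the DA only multiplies ciphertexts, and the key holder can decrypt only the aggregate it receives. The modular exponentiations needed for every single reading also hint at the computational overhead noted above.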

References [56–58] combine data aggregation with noise-adding techniques to provide differential privacy for the aggregated data. Differential privacy serves as the privacy guarantee: noise is added to results computed over a large-scale dataset so that the outputs for any two neighbouring datasets (datasets that differ in only one record) are nearly indistinguishable [64]. In these aggregation mechanisms, *N* smart meters are first aggregated, and a distributed Laplacian Perturbation Algorithm (DLPA) is then applied to the aggregated data. By tuning the parameters ε and δ, (ε, δ)-differential privacy is achieved, where ε indicates the strength of the privacy guarantee and δ is the failure probability; the closer ε and δ are to 0, the stronger the guarantee. The security and privacy performance is analysed in [56], where two denoising filter attacks, the linear mean (LM) filter and the non-local mean (NLM) filter, are used to try to recover the original load curve. The results indicate that attackers cannot infer the original load curve from the distorted one.
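The distributed idea behind DLPA-style schemes can be sketched in Python using a standard identity: if each of *N* meters adds the difference of two independent Gamma(1/N, b) draws, the *N* noise shares sum to Laplace(0, b) noise on the aggregate. Choosing b = Δ/ε, where Δ is the assumed maximum contribution of one household to the sum, gives ε-differential privacy (δ = 0) for that sum, provided every meter adds its share. The function names and parameter values are hypothetical, and this is an illustration rather than the exact algorithm of [56–58].

```python
import random

def gamma_noise_share(n_meters, scale, rng):
    """One meter's noise share: difference of two Gamma(shape=1/N, scale) draws."""
    shape = 1.0 / n_meters
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)

def noisy_sum(readings, epsilon, sensitivity, rng):
    """Each meter perturbs its own reading before aggregation.

    Summing N Gamma(1/N, b) variables gives Exp(b), and the difference of two
    independent Exp(b) variables is Laplace(0, b), so the aggregate carries
    Laplace(0, sensitivity / epsilon) noise."""
    n = len(readings)
    scale = sensitivity / epsilon
    return sum(r + gamma_noise_share(n, scale, rng) for r in readings)
```

Each individual share has variance 2b²/N, so on its own it gives little protection to a single reading when *N* is large; to our understanding, this is why such schemes typically combine the noise with encryption of the perturbed readings, a step omitted here.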

Data anonymization mechanisms [12,65,66] reduce privacy loss by replacing the real smart meter identity with pseudonyms. C. Efthymiou and G. Kalogridis proposed a data anonymization method with a TTP escrow in 2010 [12]. They suggested attaching two IDs to each smart meter: a low-frequency ID (LFID) for sending attributable low-frequency data and a high-frequency ID (HFID) for sending anonymous high-frequency data. The HFIDs are kept by a TTP, so the utility cannot link them to individual meters. The low-frequency data are used for billing, while the high-frequency data are used for network management. However, the workload of the TTP is high, and development costs increase, since all anonymous IDs are processed by the TTP. Moreover, introducing a TTP escrow does not eliminate the privacy risk; it merely shifts it from the utility to the TTP.
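The two-identifier idea can be sketched as follows. The escrow class, the pseudonym format, and the message fields are hypothetical; the sketch only shows which party is able to link which data to a household.

```python
import secrets

class PseudonymEscrow:
    """Hypothetical TTP escrow: the only party that can map an HFID back to a meter."""

    def __init__(self):
        self._hfid_to_meter = {}

    def issue_hfid(self, meter_id):
        hfid = secrets.token_hex(8)          # random pseudonym, unlinkable without the escrow
        self._hfid_to_meter[hfid] = meter_id
        return hfid

    def resolve(self, hfid):
        """Re-identification: available to the escrow, not to the utility."""
        return self._hfid_to_meter[hfid]

def billing_message(lfid, monthly_kwh):
    """Attributable low-frequency data, sent under the meter's LFID."""
    return {"id": lfid, "kwh": monthly_kwh}

def network_message(hfid, samples_kw):
    """Anonymous high-frequency data, sent under the HFID."""
    return {"id": hfid, "samples": list(samples_kw)}
```

The `resolve` method is precisely the capability that makes the escrow a new point of privacy risk: whoever holds the mapping can re-attach the high-frequency data to a household.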

The down-sampling method is a naive approach that aims to reduce sensitive information by lowering the temporal resolution of the metered data [13,33,66]. However, as with other methods, functions such as demand response and TOU billing would be sacrificed. Moreover, value-added services that require high-resolution data become unavailable as well. To quantify the privacy loss at different interval lengths, G. Eibl and D. Engel adopt NILM as an adversary that extracts personal information. They apply an edge-detection NILM algorithm to smart meter data and examine its performance on 15 appliances using F-score values and the proportion of appliances. They conclude that 15-min interval data already protect most appliances. We would like to extend this research by implementing more powerful NILM algorithms (such as deep-learning-based NILM), since deep learning has shown a stronger ability to extract features than conventional approaches.
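Why down-sampling hides short appliance events can be illustrated with a deliberately simple threshold-based edge detector in Python. This is not the NILM algorithm used by Eibl and Engel; the load profile, the 1000 W threshold, and the function names are hypothetical.

```python
def downsample(power_w, factor):
    """Average consecutive samples into windows of `factor`; a trailing partial window is dropped."""
    full = len(power_w) // factor * factor
    return [sum(power_w[i:i + factor]) / factor for i in range(0, full, factor)]

def detect_edges(power_w, threshold_w):
    """Return (index, step) for every step change whose magnitude reaches threshold_w."""
    return [(i, power_w[i] - power_w[i - 1])
            for i in range(1, len(power_w))
            if abs(power_w[i] - power_w[i - 1]) >= threshold_w]

# Hypothetical 30-min profile at 1-min resolution: 200 W base load plus a
# 2 kW kettle switched on during minutes 5 to 7.
one_min = [2000 if 5 <= t <= 7 else 200 for t in range(30)]
fifteen_min = downsample(one_min, 15)
```

At 1-min resolution the detector recovers both switching events (+1800 W at minute 5, −1800 W at minute 8). After averaging to 15 min, the kettle only lifts the first interval to 560 W, a 360 W step that falls below the threshold. A learned NILM model might still exploit such residual signatures, which motivates the deeper study proposed above.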

To sum up, existing solutions either require the installation of expensive devices (rechargeable batteries or RES) or employ complex, computationally intensive algorithms (data distortion and data aggregation). Moreover, some schemes introduce a TTP into the smart metering system, which merely moves the privacy risk from one party (the ES) to another (the TTP). Most importantly, unlike other communication networks, the physical connections of the electricity grid already aggregate load at feeder or substation level without privacy concerns, so constructing separate data aggregators is superfluous. Furthermore, no existing solution emphasizes the availability of value-added services, which are the vital functionality that smart meters bring to consumers. Compared with the two types of solutions reviewed above, the proposed scheme is simpler and more efficient: it is based on existing physical facilities (the smart meter, a private platform, and the distribution substation) and requires neither extra RES nor computationally intensive encryption. In the proposed scheme, the smart meter communicates only with the private platform (a PC or smartphone) inside the house via the Home Area Network (HAN). A multi-channel smart metering system enables the private platform to communicate with other stakeholders (e.g., the ES and TPs) at different data granularities, combining the advantages of data aggregation, which enables grid operation and management, and data down-sampling, which provides accurate TOU bills. Furthermore, a privacy-preserving NILM algorithm is designed to enable value-added services.
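One reason a coarse-grained channel can still support accurate TOU bills is that billing only needs the energy consumed in each tariff period. The Python sketch below, with a hypothetical 30-min ES channel and made-up prices, checks that summing 1-min energy into half-hourly totals on the private platform leaves the TOU bill unchanged, assuming tariff bands change only on half-hour boundaries. It illustrates the principle only and is not the authors' implementation.

```python
ES_INTERVAL_MIN = 30  # hypothetical reporting interval of the ES channel

def to_intervals(minute_wh, interval=ES_INTERVAL_MIN):
    """Sum 1-min energy (Wh) into fixed intervals; a trailing partial window is dropped."""
    full = len(minute_wh) // interval * interval
    return [sum(minute_wh[i:i + interval]) for i in range(0, full, interval)]

def tou_cost_pence(energy_wh, price_p_per_kwh):
    """TOU cost in pence, with one price (p/kWh) per energy value (Wh)."""
    return sum(e * p for e, p in zip(energy_wh, price_p_per_kwh)) / 1000

# Hypothetical 2-hour profile and half-hourly prices (off-peak, off-peak, peak, peak).
minute_wh = [3 + (t % 7) for t in range(120)]
half_hour_prices = [10, 10, 30, 30]
minute_prices = [p for p in half_hour_prices for _ in range(ES_INTERVAL_MIN)]
```

Because every minute inside a half-hour is charged at the same price, the bill computed from the four half-hourly totals equals the bill computed from all 120 one-minute values. If a tariff band started mid-interval, this equality would break, so the channel interval would need to match the tariff structure.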

#### **4. A Proposed Privacy-Functionality Trade-off Strategy and Model**

Given the scale of smart meter roll-out in individual countries and worldwide, the above risks and operational strategies could be dismissed or subordinated to utilitarian market logic, with the responsibility for their implementation and for the subsequent privacy protection of consumers (i.e., households) delegated to third parties, many of whom might not have privacy protection as a priority in their agendas. Moreover, as stated before, there is a lack of clarity about such responsibilities. Furthermore, whilst smart grids could be conceived as necessary technologies to regulate the conduct of individuals in our societies [67], what could be more concerning is that privacy intrusion could also generate negative social consequences [31]. Consumers can be left powerless or socially isolated, having to devise their own strategies to counteract intrusions into their privacy and becoming mere means rather than ends [5].

It might be possible, however, for stakeholders to exert their creativity even in the face of privacy intrusion and existing regulations (e.g., the GDPR) [5,10,68]. This would help households comply with the functionalities that digital technologies establish for them [68] whilst socially protecting or enhancing their sense of authentic household 'hood' [6].

To this end, we propose a trade-off strategy that addresses both the operational and ethical concerns about smart meters and smart grids raised in this paper. The strategy adopts these principles:


The operation of the strategy is illustrated by the proposed model shown in Figure 2. The model components are consumers, the DCC, the energy supplier (ES), a network operator (NO), third parties (TPs), and the distribution-level substations (Sub) that supply electricity to households.

**Figure 2.** Proposed smart metering system.
