### *4.1. Data Aggregation*

Data aggregation is the process of gathering data from several sources and combining it into a single variable or report. In smart homes, various appliances are connected to an SM and send their demand/consumption reports to it. This creates communication overhead and a privacy hole [64]. To avoid this issue, an aggregator collects the messages from the appliances and combines them into a single message. The following techniques perform data aggregation while preserving privacy.

In [22], a scheme based on an incremental hash operation is presented. This scheme reports the cost to the operation center instead of the energy consumption readings. After each time interval, the SM calculates the cost of the recorded reading using a hash function and sends it to an operation center. The operation center first receives the consumption costs from the different residential areas, then aggregates them and forwards the result to the utility providers for integrity verification. The utility providers sum all the received values and compare the total with the power distributed during that time interval. If the two values are not equal, the entire consumption reading is discarded automatically.

A framework based on Shamir's secret sharing is proposed in [65] to reduce computational overhead and the dependency on a single dedicated aggregator. The scheme also prevents the electrical utility from linking its data to a single SM. In this architecture, the area supplied by one service provider is divided into subregions. Each SM divides its reading into shares and sends them to several aggregators. Because the utility receives only the aggregated reading, the SM is masked from the utility, and no single aggregator is relied upon.
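
The share-and-aggregate idea can be illustrated with a minimal sketch of textbook Shamir secret sharing. This is not the protocol of [65]: the field modulus `PRIME`, the number of aggregators, the threshold, and all function names are assumptions chosen for illustration. Each meter splits its reading into shares, each aggregator adds the shares it receives, and the utility reconstructs only the regional total.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus (a Mersenne prime), not taken from [65]

def split_reading(reading, n_shares, threshold):
    """Split an integer reading into n_shares Shamir shares (x, y)."""
    # Random polynomial of degree threshold-1 whose constant term is the reading.
    coeffs = [reading] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

def aggregate_region(readings, n_aggregators=3, threshold=2):
    """Aggregator k adds the k-th share of every meter (hypothetical helper)."""
    per_meter = [split_reading(r, n_aggregators, threshold) for r in readings]
    aggregated = []
    for k in range(n_aggregators):
        x = per_meter[0][k][0]
        total = sum(shares[k][1] for shares in per_meter) % PRIME
        aggregated.append((x, total))
    return aggregated

readings = [120, 340, 95]            # example meter readings (made up)
agg_shares = aggregate_region(readings)
region_total = reconstruct(agg_shares[:2])  # any 2 of the 3 aggregated shares suffice
```

Because the shares are points on polynomials, adding shares pointwise yields shares of the summed reading, so no aggregator and no utility sees an individual meter's value in this sketch.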

An in-network data aggregation scheme is proposed in [30], which aggregates the data hop by hop. Each appliance has its own chip code, unique among the appliances, and uses it to spread its energy consumption value; the spread values are sent to the SM after every time interval. Knowing the chip codes, the SM can extract each appliance's consumption. Since each appliance has its own chip code, a malfunctioning appliance cannot alter the consumption reported by other appliances.
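
A small sketch can show how chip-code spreading allows values to be summed in the network and still separated at the SM. The source does not say which code family [30] uses; the orthogonal Walsh codes below, the appliance names, and the function names are assumptions made for illustration.

```python
def walsh_codes(order):
    """Build 2**order mutually orthogonal +/-1 codes (Sylvester construction)."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes] +
                 [row + [-v for v in row] for row in codes])
    return codes

def spread(value, code):
    """Multiply a consumption value by every chip of the appliance's code."""
    return [value * chip for chip in code]

def combine(signals):
    """Hop-by-hop aggregation modelled as chip-wise addition of the signals."""
    return [sum(chips) for chips in zip(*signals)]

def despread(combined, code):
    """Correlate with one code; orthogonality cancels the other appliances."""
    return sum(s * c for s, c in zip(combined, code)) // len(code)

codes = walsh_codes(2)                       # four codes of length 4
consumption = {"fridge": 150, "heater": 2000, "tv": 90}   # made-up values
assigned = dict(zip(consumption, codes[1:]))  # one distinct code per appliance
combined = combine([spread(v, assigned[name]) for name, v in consumption.items()])
recovered = {name: despread(combined, code) for name, code in assigned.items()}
```

In this sketch the SM recovers each appliance's value from the single combined signal, and a value spread with one code contributes nothing when the SM correlates with a different code.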

In [66], a multidimensional aggregation scheme is used to save the communication bandwidth and increase the computational speed of the SM. There is a gateway between the CC and HAN, which receives the encrypted data from a large number of SMs and then aggregates the data before sending it to CC. A TTP is used to mask the gateway from HANs to avoid any mishandling. Any failure or attack on the TTP end can lead to a serious disturbance in communications between the CC and HAN.

**Summary:** Table 1 provides a detailed summary of the aggregation techniques analyzed above. In the majority of the schemes, the aggregation steps are performed by a separate third-party device or the CC [22,65–67]. Similarly, in [65,66], the selection of devices for aggregation and the nomination of their group header also increase the computation overhead. The authors of [67] assume that all entities taking part in the communications are secure and resistant to tampering and modification attacks.



### *4.2. CIA Triad and Anonymity*

In this section, we present schemes that ensure the CIA triad and anonymity while preserving the privacy of the HAN. A scheme proposed in [68] divides the users of a residential area into subsets based on their energy consumption ranges over a period of time. The energy consumption is then summed for each subset. A TTP and the Paillier homomorphic scheme are used to ensure the privacy of the data. However, a damaged SM may not report the data correctly, and malfunctioning or misuse of the TTP can raise serious concerns about the authenticity of the aggregation reports.
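
The additive property that makes Paillier encryption suitable for per-subset summation can be sketched as follows. This is a toy version of the textbook cryptosystem, not the construction of [68]: the primes are far too small for real use, and the subset labels, the readings, and the function names are assumptions for illustration.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key pair from two primes (insecure sizes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt m < n as g^m * r^n mod n^2 with a fresh random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def aggregate_subsets(n, encrypted_by_subset):
    """Sum the encrypted readings of each subset without decrypting them."""
    totals = {}
    for label, ciphertexts in encrypted_by_subset.items():
        acc = encrypt(n, 0)
        for c in ciphertexts:
            acc = add_encrypted(n, acc, c)
        totals[label] = acc
    return totals

n, priv = keygen(1009, 1013)                            # assumed toy primes
readings = {"low": [12, 30, 25], "high": [410, 380]}    # made-up subsets
encrypted = {k: [encrypt(n, m) for m in v] for k, v in readings.items()}
enc_totals = aggregate_subsets(n, encrypted)
plain_totals = {k: decrypt(n, priv, c) for k, c in enc_totals.items()}
```

Only the holder of the private key learns the subset totals in this sketch, and even that party never sees an individual reading, since decryption is applied to the aggregated ciphertext alone.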

In [69], a Q-learning technique, which is based on artificial intelligence, is proposed. In this architecture, three kinds of information are exchanged between the HAN/BAN and the SCC: control flow, data flow, and power flow. Smart appliances and SMs constitute the HAN. Different HANs in the same building constitute a BAN. The regional power supplier that manages multiple BANs is called the NAN. The NAN sends information such as dispatch instructions, billing, and real-time reporting, and uploads the data to the SCC. Before the data are sent to the control center, they are split into uniformly random secret shares. The SCC outsources the information to professional cloud server operators, who train the Q-learning model using edge computing. The secret shares are randomly distributed so that the cloud servers cannot obtain the information. However, if the two servers collude, a very serious privacy breach can result. The scheme also defines its own protocols for selection, addition, and subtraction; nevertheless, honest-but-curious entities in the network may still be able to access information from the secret shares.
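
The role of the random shares, and the reason collusion is dangerous, can be seen in a minimal sketch of two-server additive secret sharing. The modulus `MOD`, the two-server setting as coded here, and the function names are assumptions; the actual selection, addition, and subtraction protocols of [69] are not reproduced.

```python
import secrets

MOD = 2**32  # assumed ring size for the shares

def share(value, n_servers=2):
    """Split value into uniformly random additive shares modulo MOD."""
    parts = [secrets.randbelow(MOD) for _ in range(n_servers - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def add_shares(a, b):
    """Each server adds its own two shares locally; no communication needed."""
    return [(x + y) % MOD for x, y in zip(a, b)]

def sub_shares(a, b):
    """Each server subtracts its own two shares locally."""
    return [(x - y) % MOD for x, y in zip(a, b)]

def reveal(shares):
    """Pooling all shares recovers the value (what colluding servers could do)."""
    return sum(shares) % MOD

a, b = share(70), share(30)          # made-up readings
sum_value = reveal(add_shares(a, b))
diff_value = reveal(sub_shares(a, b))
```

A single share is a uniformly random number and reveals nothing on its own, but `reveal` shows that two servers pooling their shares recover the value, which is the collusion risk noted above.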

Similarly, in [16], a homomorphic scheme is proposed for smart homes, which consist of home appliances, an SM, and a third-party aggregator. The third-party aggregator assigns an ID to every appliance at the time of installation. All appliances in a home are arranged in sequence according to their IDs. All appliances report their consumption to an SM. Before sending their consumption, they add homomorphic features and forward the data to the aggregator for the current round. The aggregator appliance sums all received readings, encrypts the result with the SM's public key, and sends it to the SM. The SM authenticates the aggregator appliance using a private key.

The SM encrypts the consumption and gives its identity to the SS. After verifying the identity, the SS generates the group blind signature and the tags for each data block, which the CC acquires and matches with the corresponding data block. In this way, the CC verifies the data integrity. The authors argue that even if an adversary or the SS somehow obtains the encrypted consumption, it cannot obtain the CC's private key. This is because guessing the private key requires the prime numbers that were used, and the exact prime numbers are difficult to match in a polynomial equation. Thus, the possibility of compromising the CC's private key is almost negligible. However, the CC is assumed to be honest in this scheme [21].

In many privacy-preserving schemes, a TTP acts as the certification authority that generates public and private keys. To avoid the TTP, Xiaoli et al. presented a secure privacy-preserving scheme. At the time of physical configuration, each SM is assigned an ID by the CC [70]. The same ID is also registered with the CC. Every time the CC sends a request message to the SM asking for the energy consumption pattern, the request message includes the SM and CC IDs and the key material. Using the ID and the key material, the SM first generates a random number and then a secret key. The SM encrypts the energy consumption report using its secret key and the current time stamp. The encrypted message is then forwarded to the CC for identity verification and decryption. The CC first verifies the SM's identity by its ID and then decrypts the message using the same secret key.

**Summary:** PPMA [68], LiPSG [69], and the lattice-based homomorphic scheme [16] provide confidentiality and integrity, but not anonymity and availability (see Table 2). PPMA and the lattice-based homomorphic scheme are resistant to passive and active attacks, but the blind signature scheme [21] is not. Similarly, the schemes in [16,21,68,70] do not update their encryption keys.
