PPDC: A Privacy-Preserving Distinct Counting Scheme for Mobile Sensing
Abstract
1. Introduction
2. Problem Statements and Preliminaries
2.1. System and Security Model
2.2. XOR Homomorphic Encryption
- a.
- Key generation:
- (1)
- The trusted authority uniformly and independently picks its secret values. It then computes two seeds for each user i and sends them to user i.
- (2)
- For each dataset, user i computes its secret key $k_i$ from its two seeds and a nonce t according to Equation (1). The nonce t is synchronized among all users and is different in each transmission.
- b.
- Encryption: Denote by $m_i$ a bit string. User i encrypts it by computing $c_i = m_i \oplus k_i$.
- c.
- Decryption: Denote by $c_i$ a ciphertext of user i. User i decrypts it by computing $m_i = c_i \oplus k_i$.
- d.
- Aggregation: Anyone can recover the bitwise XOR of all users’ plaintexts, without any user’s secret key, by computing $\bigoplus_{i=1}^{n} c_i = \bigoplus_{i=1}^{n} m_i$.
2.3. FM Sketch
3. Privacy-Preserving Distinct Counting Computation
3.1. Overview of PPDC
3.2. Main Idea
- Step 1
- This step is taken by the users. For a user i, every element in his original dataset is hashed by a hash function h that outputs a w-bit string, in which the jth bit is set with a prescribed probability. A bitwise OR is then taken over the hashed strings of all his elements, and the resulting string represents all elements in user i’s original dataset. However, the FM sketch S should not be computed from these strings directly: according to Equation (9), if the aggregator received user i’s string in the clear, it would reveal the original data of user i, especially when his dataset is small. Therefore, a series of operations must be carried out on this string.
- Step 2
- This step is also done on the user’s side. User i processes each bit of his string so that his privacy is not harmed. PPDC uses a dedicated coding scheme for these bits: each bit is mapped to a corresponding code, as defined by the coding scheme (11). Figure 2 shows an example of how user 1 processes his original dataset.
- Step 3
- The aggregator takes this step after aggregating all users’ coded data with a bitwise XOR operation. A judgement rule, corresponding to the coding scheme (11), then determines each bit of the FM sketch S: the jth bit of S is set to 0 if the aggregated string for that bit consists only of 0s, and to 1 otherwise.
- Step 4
- The calculation is done by the aggregator. From the FM sketch S, the aggregator obtains the key parameter, the position of the last bit in S that is 1, according to Equation (5). As mentioned in Section 2.3, approximating the distinct count requires several FM sketches built with different hash functions. After repeating Step 1 to Step 3 d times, the aggregator applies Equation (8) to get the final result, the number of distinct elements in the sensing dataset.
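Steps 1 to 3 can be sketched as follows, leaving out encryption and the final estimate of Step 4 (Equations (5) and (8) are not reproduced here). This is a hedged illustration with several assumptions: `fm_hash` is a hypothetical hash built from SHA-256 that sets a single bit at a geometrically distributed position, in the classic Flajolet-Martin style; the coding scheme is assumed to map a 0 bit to q zero bits and a 1 bit to a uniformly random q-bit string; and the values of `W` and `Q` are arbitrary.

```python
import hashlib
import secrets

W = 32  # sketch length w (assumed value)
Q = 64  # code length q (assumed value)


def fm_hash(element: str, salt: int, w: int = W) -> int:
    """Map an element to a w-bit string with exactly one bit set.
    Bit j is chosen with probability about 2^-(j+1) (assumed construction)."""
    digest = hashlib.sha256(f"{salt}:{element}".encode()).digest()
    value = int.from_bytes(digest, "big")
    # Index of the lowest set bit, i.e. the number of trailing zeros.
    j = (value & -value).bit_length() - 1 if value else w - 1
    return 1 << min(j, w - 1)


def user_sketch(dataset: list[str], salt: int) -> int:
    """Step 1: bitwise OR of the hashed elements of one user."""
    sketch = 0
    for element in dataset:
        sketch |= fm_hash(element, salt)
    return sketch


def encode_bit(bit: int, q: int = Q) -> int:
    """Step 2 (assumed coding): 0 -> q zero bits, 1 -> random q-bit string."""
    return secrets.randbits(q) if bit else 0


def user_report(sketch: int, w: int = W) -> list[int]:
    """Code every bit of the user's sketch separately."""
    return [encode_bit((sketch >> j) & 1) for j in range(w)]


def judge(reports: list[list[int]], w: int = W) -> int:
    """Step 3: XOR the codes per position; a non-zero result marks bit j as 1."""
    sketch = 0
    for j in range(w):
        acc = 0
        for report in reports:
            acc ^= report[j]
        if acc != 0:
            sketch |= 1 << j
    return sketch


if __name__ == "__main__":
    datasets = [["a", "b", "c"], ["b", "d"], ["e"]]
    reports = [user_report(user_sketch(d, salt=0)) for d in datasets]
    print(bin(judge(reports)))
```

With these assumptions the recovered sketch equals the bitwise OR of the users' sketches unless the random codes at some position happen to cancel, which becomes rarer as q grows.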
3.3. Privacy-Preserving Distinct Counting Scheme
- (1)
- Setup. The protection mechanism of PPDC is based on the bitwise XOR homomorphic encryption introduced in Section 2.2. The trusted authority privately holds its secret values and computes two seeds for each user i, which are then sent to that user. User i performs a bitwise XOR operation on the seeds together with a nonce t, according to Equation (1), to obtain his own key k. Notice that the nonce t used for calculating k is different in each transmission.
- (2)
- Encrypt. Encryption is performed on the user’s side. User i treats the code string for the jth bit of his data representation as the plaintext and encrypts it with the bitwise XOR homomorphic encryption algorithm to get the corresponding ciphertext.
- (3)
- Aggregate. The aggregator collects the ciphertexts of all n users for the jth-LSB and computes their bitwise XOR. According to Equation (4), the resulting bit string equals the bitwise XOR of the users’ code strings. It is easy to see that if the jth-LSBs of the n users are all 0, then the bitwise XOR of the corresponding strings is always a q-bit string of 0s. If any user’s jth bit is 1, the bitwise XOR of all reports’ corresponding strings is not a q-bit string of 0s with a probability of $1 - 2^{-q}$. The remaining failure case has little influence on the accuracy of PPDC, as proved in Section 3.4.
- (4)
- Judge. As with the rule in Section 3.2, the jth bit of the FM sketch S is set to 0 if the aggregated string is a q-bit string of 0s, and to 1 otherwise.
Algorithm 1. Privacy-preserving Distinct Counting Scheme
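The failure probability mentioned in the Aggregate step can be worked out directly. The derivation below is a sketch that assumes a 1 bit is coded as an independent, uniformly random q-bit string and a 0 bit as q zeros; the symbols $U_j$, $r_i$ and $R_j$ are introduced here only for illustration. Since the XOR of independent uniform strings is itself uniform, the aggregate is all zeros with probability $2^{-q}$.

```latex
% Assumption: U_j is the non-empty set of users whose j-th bit is 1,
% and each such user contributes an independent uniform r_i in {0,1}^q.
R_j = \bigoplus_{i \in U_j} r_i
\quad\Longrightarrow\quad
\Pr\left[ R_j = 0^q \right] = 2^{-q},
\qquad
\Pr\left[ R_j \neq 0^q \right] = 1 - 2^{-q}.
% Worked example: q = 10 gives 2^{-10} = 1/1024 \approx 0.00098 < 0.001.
```

Under this assumption, a code length of q = 10 already keeps the chance of misjudging a single bit below 0.001, and each extra bit of q halves it.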
3.4. Scheme Analysis
4. Performance Evaluation
4.1. Accuracy Evaluation of PPDC
4.2. Efficiency Evaluation of PPDC
- (1)
- Communication Overhead. Table 1 compares the communication cost of the baseline method without privacy protection with that of PPDC. The measured quantities are the total bits sent by a user (the user's communication cost), the total bits received by the aggregator (the aggregator's communication cost), and the computation complexity and round complexity of the two schemes. The parameters are: n, the total number of users; N, which defines the range of users’ data; the length of each user’s bit string; and d, the number of FM sketches applied. As the proof of Theorem 1 shows, the correctness of PPDC has an upper limit. When q is approximately equal to w and not too small, the error rate of PPDC decreases to an acceptable level (for example, less than 0.001), and the communication cost of PPDC, which is affected by q, is reduced as well. Moreover, PPDC sends and receives less data than the baseline method when n is not greater than N. However, the total communication cost is also influenced by the round complexity, which refers to the amount of time a user has to stay online and keep communicating. The baseline method needs only one round of communication, which is its most significant advantage. Therefore, PPDC performs better when the network connection is stable, whereas the baseline method is more suitable when the connection is unreliable.
- (2)
- Computation Overhead. We discuss the computing time spent during the whole process. The computing time of PPDC includes, for each user, the time for hashing, coding, and encryption, and, for the aggregator, the time for decryption to determine the final FM sketch S and for calculating the result. Note that ‘decryption’ here means decrypting the bitwise XOR of all users’ plaintexts, which guarantees the protection of each user’s privacy. For comparison, Figure 9 also shows the computing time of distinct counting without privacy protection. PPDC takes more time to calculate the results because it involves additional steps such as encoding, encrypting, decrypting, and formula calculation. However, this cost is within an acceptable range: as shown in Figure 9, the computing time grows more slowly than linearly as the dataset expands. Besides, because the distinct counting problem has been solved, subsequent operations on the aggregated dataset spend fewer resources on repeated processing. Furthermore, when the dataset is huge, the index is likely to approach the end of the FM sketch S, and with the trick described in Section 3.2 the computing time gradually decreases accordingly. On the whole, therefore, PPDC does not waste computing time, which indicates that its efficiency is appropriate for large-scale data aggregation processing.
5. Related Work
5.1. Privacy Preserving in Mobile Sensing Applications
5.2. Distinct Counting
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Table 1. Comparison between the baseline method and PPDC.
Scheme | User: Comm. Cost | User: Comp. Complexity | Aggregator: Comm. Cost | Aggregator: Comp. Complexity | Round Complexity
---|---|---|---|---|---
Baseline | | | | | 1
PPDC | | | | | d
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yang, X.; Xu, M.; Fu, S.; Luo, Y. PPDC: A Privacy-Preserving Distinct Counting Scheme for Mobile Sensing. Appl. Sci. 2019, 9, 3695. https://doi.org/10.3390/app9183695