Shuffle Model of Differential Privacy: Numerical Composition for Federated Learning
Abstract
1. Introduction
- We provide a tight privacy parameter characterization of vector randomized response in the shuffle model.
- We derive an efficient strategy for numerically approximating the privacy loss distribution of both single-message and multi-message shuffle protocols, which is key to obtaining tight sequential composition bounds for federated analytics and learning in the shuffle model.
- Through simulation experiments, we demonstrate that our privacy characterization for vector randomized response reduces the privacy budget consumed in the shuffle model. For the multi-message shuffle protocols, which enjoy higher utility, our numerical sequential composition likewise reduces the privacy budget consumed.
2. Related Work
2.1. Shuffle Model of Differential Privacy
2.2. Sequential Composition in the Shuffle Model
3. Preliminaries
3.1. Differential Privacy
3.2. The Shuffle Model of Differential Privacy
3.3. Privacy Loss Variables for Sequential Composition
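- $X, Y$ are supported on the extended reals $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$,
- $\delta(X \,\|\, Y)(\varepsilon) = \delta(\varepsilon)$ for all $\varepsilon \in \mathbb{R}$,
- $Y(t) = e^{t} X(t)$ for every $t \in \mathbb{R}$ and
- $Y(-\infty) = 0$ and $X(+\infty) = 0$,
- where $X(t), Y(t)$ are probability density functions of $X, Y$, respectively.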
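Privacy loss variables are useful for sequential composition because the privacy losses of composed mechanisms add up, so the composed privacy loss distribution is the convolution of the individual ones. The following is a minimal sketch of that idea, not the accountant used in this paper. It assumes a hypothetical representation in which a discretized PLD is an array of probability masses on a uniform grid (`pmf`, `low`, and `grid_step` are illustrative names), ignores any mass at infinity, and uses plain `numpy.convolve` rather than an FFT-based method.

```python
import numpy as np

def compose(pmf_a, low_a, pmf_b, low_b):
    """Compose two discretized PLDs that share the same grid step.

    pmf[i] is the probability that the privacy loss equals (low + i) * grid_step.
    Losses add under composition, so the masses convolve and the offsets add.
    """
    return np.convolve(pmf_a, pmf_b), low_a + low_b

def delta_of_epsilon(pmf, low, grid_step, epsilon):
    """delta(eps) = E[(1 - exp(eps - L))_+] for a loss L distributed as pmf."""
    losses = (low + np.arange(len(pmf))) * grid_step
    tail = np.clip(1.0 - np.exp(epsilon - losses), 0.0, None)
    return float(np.sum(pmf * tail))

# Toy example: binary randomized response with local budget eps0.
# The loss is +eps0 with probability p and -eps0 with probability 1 - p.
eps0 = 1.0
p = np.exp(eps0) / (1.0 + np.exp(eps0))
pmf = np.array([1.0 - p, 0.0, p])   # grid points -eps0, 0, +eps0
low = -1

pmf2, low2 = compose(pmf, low, pmf, low)          # two sequential runs
delta_single = delta_of_epsilon(pmf, low, eps0, 0.0)
delta_twice = delta_of_epsilon(pmf2, low2, eps0, 2 * eps0)
```

In this toy case the single-run value at epsilon = 0 equals the total variation distance 2p - 1, and two runs are exactly 2·eps0-DP with delta = 0, which matches basic composition for pure differential privacy.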
4. Problem Formulation
5. A Tight Characterization of Vector Randomized Response
- I. -variation property: we say that the -variation property holds if and for all possible .
- II. q-ratio property: we say that the q-ratio property holds if and hold for all possible and all .
6. Privacy Loss Distribution of Generalized Dominating Pairs
- ,
- ,
- ,
- .
- Here, C represents the number of other users whose outputs are indistinguishable “clones” of the two users, and A denotes the random split between these clones.
Algorithm 1 PLD Computation of
Input:
Output: The PLD of over -grid range with tolerance
▷ initializes the PLD with a list of zeros
for to do
    for to do
        if and then
        end if
    end for
end for
return
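The general shape of Algorithm 1 (a zero-initialized PLD, two nested loops, and a guarded update) can be illustrated with a small sketch. This is not the paper's algorithm, whose inputs, loop bounds, and update rule are not reproduced here. It assumes a hypothetical pair of outcome distributions indexed by two integers, a clone count `c` and a split `a`, and it rounds each privacy loss up to the next grid point so that the discretization stays pessimistic. The names `pld_on_grid`, `toy_mass`, `grid_step`, and `max_loss` are illustrative.

```python
import math

def pld_on_grid(p_mass, q_mass, n, grid_step, max_loss):
    """Discretize L = log(p/q), with outcomes drawn from p, onto a uniform grid.

    Outcomes are integer pairs (c, a) with 0 <= a <= c <= n (an assumption).
    Returns (pld, overflow): pld[i] is the mass on loss (i - half) * grid_step,
    and overflow is the mass whose loss is infinite or above the grid.
    """
    half = math.ceil(max_loss / grid_step)
    size = 2 * half + 1
    pld = [0.0] * size                  # initialize the PLD with a list of zeros
    overflow = 0.0
    for c in range(n + 1):
        for a in range(c + 1):
            p, q = p_mass(c, a), q_mass(c, a)
            if p > 0 and q > 0:
                # Round the loss up to the next grid point (pessimistic).
                idx = math.ceil(math.log(p / q) / grid_step) + half
                if idx >= size:
                    overflow += p
                else:
                    pld[max(idx, 0)] += p   # losses below the grid are raised
            elif p > 0:
                overflow += p           # q == 0 means an infinite privacy loss
    return pld, overflow

# Toy pair, purely illustrative (not the paper's dominating pair):
# c ~ Binomial(n, r) clones, then a ~ Binomial(c, bias) of them on one side.
def toy_mass(bias, n=4, r=0.5):
    def mass(c, a):
        clones = math.comb(n, c) * r**c * (1 - r)**(n - c)
        split = math.comb(c, a) * bias**a * (1 - bias)**(c - a)
        return clones * split
    return mass

pld, overflow = pld_on_grid(toy_mass(0.6), toy_mass(0.4), n=4,
                            grid_step=0.1, max_loss=2.0)
```

Rounding up and collecting out-of-range mass in `overflow` means the discretized distribution can only overstate the privacy loss, which is the safe direction for an accountant.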
7. Applications to Federated Learning
7.1. Building Decision Tree/Forest
7.2. Building Deep Learning Models
7.3. Machine Learning
8. Experiments
8.1. Federated Deep Learning Simulation
8.2. Federated Decision Tree/Forest Simulation
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Notation | Description |
---|---|
 | the shuffling procedure |
 | the randomization algorithm |
d | the dimensionality of user data |
 | the number of categories in the j-th dimension |
n | the number of users |
 | the domain of user data |
 | the domain of a sanitized message |
 | the local privacy budget (level) |
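To make the notation concrete, here is a minimal sketch of one single-message shuffle round with n users and d-dimensional categorical data. It rests on assumptions that do not come from the paper: the randomizer applies k-ary (generalized) randomized response independently to each coordinate and splits the local privacy budget evenly across the d coordinates. The paper's vector randomized response may be defined differently, and the function names `randomize` and `shuffle_round` are hypothetical.

```python
import math
import random

def randomize(x, categories, eps_local, rng):
    """Perturb one user's d-dimensional record (assumed per-coordinate k-ary RR).

    x[j] is an integer in range(categories[j]); each categories[j] must be >= 2.
    The local budget eps_local is split evenly over the d coordinates (assumption).
    """
    eps_j = eps_local / len(x)
    out = []
    for value, k in zip(x, categories):
        keep = math.exp(eps_j) / (math.exp(eps_j) + k - 1)
        if rng.random() < keep:
            out.append(value)                 # report the true category
        else:
            other = rng.randrange(k - 1)      # uniform over the k - 1 other categories
            out.append(other if other < value else other + 1)
    return tuple(out)

def shuffle_round(data, categories, eps_local, seed=0):
    """Randomize every user's record locally, then apply the shuffling procedure."""
    rng = random.Random(seed)
    messages = [randomize(x, categories, eps_local, rng) for x in data]
    rng.shuffle(messages)                     # uniformly random permutation
    return messages

# n = 5 users, d = 2 dimensions with 3 and 4 categories respectively.
data = [(0, 1), (2, 3), (1, 0), (2, 2), (0, 3)]
messages = shuffle_round(data, categories=(3, 4), eps_local=2.0)
```

The analyzer only sees `messages`, a permuted list of sanitized records, so it cannot link any message to the user who sent it.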
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, S.; Zeng, S.; Li, J.; Huang, S.; Chen, Y. Shuffle Model of Differential Privacy: Numerical Composition for Federated Learning. Appl. Sci. 2025, 15, 1595. https://doi.org/10.3390/app15031595