Bidirectional Decoupled Distillation for Heterogeneous Federated Learning
Abstract
1. Introduction
- We design a trustworthy federated learning framework with bidirectional distillation for heterogeneous scenarios, allowing the two local networks to learn from each other through a decoupled relative-entropy loss. This prevents the growing divergence in optimization direction between the local and private models, a common issue in one-way distillation methods.
- We introduce decoupled knowledge distillation to balance the effect of distillation across target and non-target classes. This strengthens the cross-model information exchange and improves the convergence speed of both the server and client models.
- We integrate the proposed framework into four classical federated learning methods and verify its superiority on the CIFAR-10, CIFAR-100, and MNIST datasets.
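The decoupled distillation named in the second contribution follows the split proposed by Zhao et al. (decoupled knowledge distillation, cited below): the KL term is separated into a target-class part (TCKD) and a non-target-class part (NCKD) that can be weighted independently. The following is a minimal NumPy sketch of that loss; the weights `alpha`, `beta`, and temperature `T` are illustrative defaults, not values taken from the paper:

```python
import numpy as np

def _softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def _kl(t, s, eps=1e-8):
    """Mean KL(t || s) over the batch."""
    return np.mean(np.sum(t * (np.log(t + eps) - np.log(s + eps)), axis=1))

def decoupled_kd_loss(student_logits, teacher_logits, targets,
                      alpha=1.0, beta=8.0, T=4.0):
    n, c = student_logits.shape
    mask = np.zeros((n, c), dtype=bool)
    mask[np.arange(n), targets] = True

    s = _softmax(student_logits / T)
    t = _softmax(teacher_logits / T)

    # TCKD: binary (target vs. all non-target) probabilities
    s_bin = np.stack([s[mask], 1.0 - s[mask]], axis=1)
    t_bin = np.stack([t[mask], 1.0 - t[mask]], axis=1)
    tckd = _kl(t_bin, s_bin)

    # NCKD: renormalised distribution over the non-target classes only
    s_nt = np.where(mask, 0.0, s); s_nt /= s_nt.sum(axis=1, keepdims=True)
    t_nt = np.where(mask, 0.0, t); t_nt /= t_nt.sum(axis=1, keepdims=True)
    nckd = _kl(t_nt, s_nt)

    return (alpha * tckd + beta * nckd) * T * T
```

Weighting `beta` separately from `alpha` is what allows the non-target ("dark knowledge") distribution to be balanced against the target class, rather than being suppressed whenever the teacher is confident.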
2. Related Work
2.1. Heterogeneous Federated Learning
2.2. Personalized Federated Learning
3. Methods
3.1. Problem Definition
3.2. Framework
3.3. Bidirectional Decoupled Knowledge Distillation
3.4. Training Pipeline
Algorithm 1: BDD-HFL
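As a rough illustration of the pipeline Algorithm 1 describes, the sketch below runs one communication round in which each client's shared local model and private model distill into each other (here with a plain KL pull between their softmax outputs on toy linear models, not the full decoupled loss), and the server averages the shared models FedAvg-style. All names and hyper-parameters here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class LinearModel:
    """Toy stand-in for a client network: a single linear layer."""
    def __init__(self, d, c):
        self.W = rng.normal(scale=0.01, size=(d, c))

    def probs(self, X):
        return softmax(X @ self.W)

    def step(self, X, Y, peer_probs=None, lr=0.1, mu=0.5):
        p = self.probs(X)
        grad = X.T @ (p - Y) / len(X)            # cross-entropy gradient
        if peer_probs is not None:               # gradient of KL(peer || self)
            grad += mu * X.T @ (p - peer_probs) / len(X)
        self.W -= lr * grad

def bdd_round(global_W, clients, local_steps=20):
    """One communication round: each client trains a shared local model and a
    private model that distill into each other; the server then averages the
    shared models (FedAvg-style aggregation)."""
    updated = []
    for X, Y, private in clients:
        local = LinearModel(*global_W.shape)
        local.W = global_W.copy()
        for _ in range(local_steps):
            p_loc, p_priv = local.probs(X), private.probs(X)
            local.step(X, Y, peer_probs=p_priv)    # private -> local
            private.step(X, Y, peer_probs=p_loc)   # local -> private
        updated.append(local.W)
    return np.mean(updated, axis=0)
```

The private model never leaves the client, which is what the bidirectional scheme exploits: only the shared local weights are communicated, while both models still benefit from the mutual distillation signal.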
4. Results
4.1. Dataset
4.2. Baseline and Hyper-Parameter Settings
4.3. Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, B.; Ding, M.; Shaham, S.; Rahayu, W.; Farokhi, F.; Lin, Z. When machine learning meets privacy: A survey and outlook. ACM Comput. Surv. 2021, 54, 1–36. [Google Scholar] [CrossRef]
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics; PMLR: Cambridge, MA, USA, 2017; pp. 1273–1282. [Google Scholar]
- Kaissis, G.A.; Makowski, M.R.; Rückert, D.; Braren, R.F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2020, 2, 305–311. [Google Scholar] [CrossRef]
- Husnoo, M.A.; Anwar, A.; Hosseinzadeh, N.; Islam, S.N.; Mahmood, A.N.; Doss, R. Fedrep: Towards horizontal federated load forecasting for retail energy providers. In Proceedings of the 2022 IEEE PES 14th Asia-Pacific Power and Energy Engineering Conference (APPEEC), Melbourne, Australia, 20–23 November 2022; pp. 1–6. [Google Scholar]
- Fallah, A.; Mokhtari, A.; Ozdaglar, A. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Adv. Neural Inf. Process. Syst. 2020, 33, 3557–3568. [Google Scholar]
- Sun, B.; Huo, H.; Yang, Y.; Bai, B. Partialfed: Cross-domain personalized federated learning via partial initialization. Adv. Neural Inf. Process. Syst. 2021, 34, 23309–23320. [Google Scholar]
- Luo, J.; Wu, S. Adapt to adaptation: Learning personalization for cross-silo federated learning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna, Austria, 23–29 July 2022; pp. 2166–2173. [Google Scholar]
- Zhang, J.; Hua, Y.; Wang, H.; Song, T.; Xue, Z.; Ma, R.; Guan, H. Fedala: Adaptive local aggregation for personalized federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11237–11244. [Google Scholar]
- Wang, Y.; Fu, H.; Kanagavelu, R.; Wei, Q.; Liu, Y.; Goh, R.S.M. An aggregation-free federated learning for tackling data heterogeneity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 26233–26242. [Google Scholar]
- Li, D.; Wang, J. Fedmd: Heterogenous federated learning via model distillation. arXiv 2019, arXiv:1910.03581. [Google Scholar]
- Shen, T.; Zhang, J.; Jia, X.; Zhang, F.; Lv, Z.; Kuang, K.; Wu, C.; Wu, F. Federated mutual learning: A collaborative machine learning method for heterogeneous data, models, and objectives. Front. Inf. Technol. Electron. Eng. 2023, 24, 1390–1402. [Google Scholar] [CrossRef]
- Dinh, C.T.; Tran, N.; Nguyen, J. Personalized federated learning with moreau envelopes. Adv. Neural Inf. Process. Syst. 2020, 33, 21394–21405. [Google Scholar]
- Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and robust federated learning through personalization. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 6357–6368. [Google Scholar]
- Diao, E.; Ding, J.; Tarokh, V. HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
- An, P.; Wang, Z.; Zhang, C. Ensemble unsupervised autoencoders and Gaussian mixture model for cyberattack detection. Inf. Process. Manag. 2022, 59, 102844. [Google Scholar] [CrossRef]
- Zhang, Y.; Xiang, T.; Hospedales, T.M.; Lu, H. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4320–4328. [Google Scholar]
- Zhao, B.; Cui, Q.; Song, R.; Qiu, Y.; Liang, J. Decoupled knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11953–11962. [Google Scholar]
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, Virtual, 12–18 July 2020; pp. 5132–5143. [Google Scholar]
- Durmus, A.E.; Yue, Z.; Ramon, M.; Matthew, M.; Paul, W.; Venkatesh, S. Federated learning based on dynamic regularization. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
- Gao, L.; Fu, H.; Li, L.; Chen, Y.; Xu, M.; Xu, C.Z. Feddc: Federated learning with non-iid data via local drift decoupling and correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10112–10121. [Google Scholar]
- Ye, R.; Xu, M.; Wang, J.; Xu, C.; Chen, S.; Wang, Y. Feddisco: Federated learning with discrepancy-aware collaboration. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 39879–39902. [Google Scholar]
- Tuor, T.; Wang, S.; Ko, B.J.; Liu, C.; Leung, K.K. Overcoming noisy and irrelevant data in federated learning. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Virtual, 10–15 January 2021; pp. 5020–5027. [Google Scholar]
- Yoshida, N.; Nishio, T.; Morikura, M.; Yamamoto, K.; Yonetani, R. Hybrid-FL for wireless networks: Cooperative learning mechanism using non-IID data. In Proceedings of the ICC 2020–2020 IEEE International Conference On Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–7. [Google Scholar]
- Mothukuri, V.; Parizi, R.M.; Pouriyeh, S.; Huang, Y.; Dehghantanha, A.; Srivastava, G. A survey on security and privacy of federated learning. Future Gener. Comput. Syst. 2021, 115, 619–640. [Google Scholar] [CrossRef]
- Wang, T.; Zhu, J.Y.; Torralba, A.; Efros, A.A. Dataset distillation. arXiv 2018, arXiv:1811.10959. [Google Scholar]
- Chen, H.; Wang, Y.; Xu, C.; Yang, Z.; Liu, C.; Shi, B.; Xu, C.; Xu, C.; Tian, Q. Data-free learning of student networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3514–3522. [Google Scholar]
- Chen, M.; Poor, H.V.; Saad, W.; Cui, S. Convergence time optimization for federated learning over wireless networks. IEEE Trans. Wirel. Commun. 2020, 20, 2457–2471. [Google Scholar] [CrossRef]
- Yang, Z.; Chen, M.; Saad, W.; Hong, C.S.; Shikh-Bahaei, M. Energy efficient federated learning over wireless communication networks. IEEE Trans. Wirel. Commun. 2020, 20, 1935–1949. [Google Scholar] [CrossRef]
- Shinde, S.S.; Tarchi, D. Joint air-ground distributed federated learning for intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 2023, 24, 9996–10011. [Google Scholar] [CrossRef]
- Shinde, S.S.; Bozorgchenani, A.; Tarchi, D.; Ni, Q. On the design of federated learning in latency and energy constrained computation offloading operations in vehicular edge computing systems. IEEE Trans. Veh. Technol. 2021, 71, 2041–2057. [Google Scholar] [CrossRef]
- Wang, K.; Mathews, R.; Kiddon, C.; Eichner, H.; Beaufays, F.; Ramage, D. Federated evaluation of on-device personalization. arXiv 2019, arXiv:1910.10252. [Google Scholar]
- Deng, Y.; Kamani, M.M.; Mahdavi, M. Adaptive Personalized Federated Learning. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–5 May 2021. [Google Scholar]
- Huang, Y.; Chu, L.; Zhou, Z.; Wang, L.; Liu, J.; Pei, J.; Zhang, Y. Personalized cross-silo federated learning on non-iid data. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 7865–7873. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble distillation for robust model fusion in federated learning. Proc. Adv. Neural Inf. Process. Syst. 2020, 33, 2351–2363. [Google Scholar]
- Huang, D.; Ye, X.; Sakurai, T. Knowledge distillation-based privacy-preserving data analysis. In Proceedings of the Conference on Research in Adaptive and Convergent Systems, Virtual, 3–6 October 2022; pp. 15–20. [Google Scholar]
- Faisal, F.; Leung, C.K.; Mohammed, N.; Wang, Y. Privacy-Preserving Learning via Data and Knowledge Distillation. In Proceedings of the 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), Thessaloniki, Greece, 9–12 October 2023; pp. 1–10. [Google Scholar]
- Sánchez, P.M.; Huertas, A.; Xie, N.; Bovet, G.; Martinez Perez, G.; Stiller, B. FederatedTrust: A solution for trustworthy federated learning. Future Gener. Comput. Syst. 2024, 152, 83–98. [Google Scholar] [CrossRef]
- Zhang, Y.; Zeng, D.; Luo, J.; Fu, X.; Chen, G.; Xu, Z.; King, I. A Survey of Trustworthy Federated Learning: Issues, Solutions, and Challenges. ACM Trans. Intell. Syst. Technol. 2024, 2157–6904. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report TR-2009; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Lin, J. On the Dirichlet Distribution. Master’s Thesis, Department of Mathematics and Statistics, Queen’s University, Kingston, ON, Canada, September 2016. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
Methods | FedAvg | +KD | +FML | +BDD-HFL | FedProx | +KD | +FML | +BDD-HFL | FedDC | +KD | +FML | +BDD-HFL | FedDyn | +KD | +FML | +BDD-HFL | FedDisco | +KD | +FML | +BDD-HFL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Setting 1 | 100 clients partial participation | |||||||||||||||||||
CIFAR10-IID | 81.67 | 80.79 | 81.26 | 82.41 | 82.16 | 81.78 | 81.57 | 82.02 | 85.71 | 84.80 | 84.94 | 86.83 | 84.50 | 83.61 | 83.96 | 85.20 | 84.81 | 85.06 | 85.28 | 86.35 |
CIFAR10-D1 | 81.05 | 80.25 | 80.77 | 82.34 | 81.32 | 80.65 | 80.81 | 82.31 | 84.77 | 84.26 | 84.33 | 85.90 | 84.10 | 82.55 | 83.17 | 84.60 | 84.40 | 84.36 | 84.38 | 85.82 |
CIFAR10-D2 | 79.77 | 79.03 | 80.19 | 81.67 | 79.84 | 79.57 | 79.62 | 82.23 | 84.58 | 82.97 | 83.56 | 85.03 | 82.30 | 81.33 | 82.37 | 83.33 | 83.49 | 82.97 | 83.17 | 84.88 |
CIFAR10-unbalance | 81.68 | 81.43 | 81.53 | 81.87 | 81.88 | 81.41 | 81.24 | 81.46 | 85.35 | 84.64 | 84.94 | 85.81 | 84.30 | 84.18 | 84.30 | 85.12 | 84.90 | 84.76 | 84.84 | 86.20 |
CIFAR100-IID | 40.80 | 42.09 | 42.11 | 44.47 | 40.67 | 40.51 | 41.91 | 42.92 | 55.40 | 54.53 | 54.56 | 56.94 | 51.20 | 50.02 | 50.74 | 52.13 | 54.04 | 54.04 | 54.51 | 57.37 |
CIFAR100-D1 | 41.76 | 41.78 | 42.48 | 45.78 | 41.83 | 40.68 | 42.23 | 47.50 | 54.65 | 54.01 | 54.46 | 57.00 | 51.75 | 48.83 | 50.19 | 51.50 | 53.84 | 53.62 | 54.25 | 57.04 |
CIFAR100-D2 | 41.81 | 42.13 | 42.98 | 47.24 | 41.84 | 41.12 | 42.57 | 49.98 | 53.91 | 52.89 | 53.11 | 56.39 | 51.13 | 47.71 | 49.44 | 50.64 | 53.93 | 53.44 | 53.67 | 56.39 |
CIFAR100-unbalance | 40.90 | 43.19 | 42.16 | 43.27 | 41.05 | 41.33 | 40.55 | 42.94 | 55.27 | 53.72 | 53.79 | 56.45 | 51.01 | 50.16 | 51.03 | 50.39 | 54.21 | 54.15 | 54.67 | 56.80 |
MNIST-IID | 98.15 | 98.12 | 97.99 | 98.20 | 98.11 | 98.19 | 98.19 | 98.08 | 98.47 | 98.42 | 98.45 | 98.49 | 98.38 | 98.35 | 98.30 | 98.39 | 98.39 | 98.53 | 98.42 | 98.48 |
MNIST-D1 | 98.13 | 98.05 | 97.99 | 98.20 | 98.12 | 98.19 | 98.14 | 98.22 | 98.49 | 98.45 | 98.42 | 98.43 | 98.30 | 98.26 | 98.21 | 98.35 | 98.36 | 98.49 | 98.40 | 98.46 |
MNIST-D2 | 98.00 | 97.91 | 98.03 | 98.21 | 98.04 | 98.02 | 98.01 | 98.05 | 98.40 | 98.40 | 98.35 | 98.46 | 98.30 | 98.25 | 98.12 | 98.44 | 98.28 | 98.41 | 98.46 | 98.47 |
MNIST-unbalance | 98.15 | 98.16 | 98.02 | 98.16 | 98.13 | 98.10 | 98.18 | 98.10 | 98.53 | 98.49 | 98.41 | 98.16 | 98.34 | 98.30 | 98.31 | 98.32 | 98.37 | 98.42 | 98.40 | 98.56 |
Setting 2 | 500 clients partial participation | |||||||||||||||||||
CIFAR10-IID | 73.26 | 73.09 | 73.26 | 75.74 | 72.58 | 72.66 | 72.90 | 75.03 | 84.19 | 82.26 | 81.70 | 85.17 | 82.49 | 81.12 | 81.16 | 82.77 | 74.34 | 79.84 | 79.36 | 81.78 |
CIFAR100-IID | 27.36 | 28.46 | 27.93 | 27.36 | 26.50 | 27.23 | 27.47 | 24.52 | 50.61 | 54.53 | 43.38 | 54.54 | 44.11 | 44.48 | 44.25 | 40.58 | 41.17 | 36.90 | 43.80 | 53.02 |
Methods | FedAvg | +KD | +FML | +BDD-HFL | FedProx | +KD | +FML | +BDD-HFL | FedDC | +KD | +FML | +BDD-HFL | FedDyn | +KD | +FML | +BDD-HFL | FedDisco | +KD | +FML | +BDD-HFL |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Setting 1 | 100 clients all participation | |||||||||||||||||||
CIFAR10-IID | 82.16 | 81.84 | 81.53 | 82.76 | 81.85 | 81.76 | 81.80 | 82.58 | 86.18 | 85.39 | 85.38 | 86.65 | 85.26 | 82.48 | 84.59 | 85.84 | 85.49 | 85.06 | 85.71 | 87.07 |
CIFAR10-D1 | 80.42 | 80.89 | 80.86 | 82.36 | 80.70 | 80.43 | 80.75 | 82.83 | 85.64 | 85.09 | 85.30 | 86.36 | 85.26 | 84.17 | 84.50 | 85.69 | 85.11 | 85.16 | 85.49 | 85.75 |
CIFAR10-D2 | 79.14 | 80.82 | 80.92 | 81.82 | 40.93 | 40.27 | 41.56 | 48.44 | 84.32 | 83.59 | 83.69 | 84.83 | 84.14 | 83.37 | 82.97 | 84.17 | 83.98 | 84.30 | 84.13 | 84.95 |
CIFAR10-unbalance | 86.31 | 81.80 | 81.67 | 82.49 | 81.90 | 81.58 | 81.71 | 82.56 | 86.31 | 85.41 | 85.74 | 86.85 | 85.68 | 84.50 | 84.93 | 86.15 | 85.55 | 85.84 | 85.81 | 87.02 |
CIFAR100-IID | 39.68 | 41.81 | 41.70 | 43.15 | 40.39 | 38.79 | 39.58 | 42.94 | 55.52 | 53.99 | 54.61 | 57.11 | 52.07 | 51.68 | 51.46 | 53.53 | 53.57 | 54.67 | 54.36 | 56.95 |
CIFAR100-D1 | 40.48 | 42.09 | 42.38 | 44.07 | 40.15 | 40.24 | 40.84 | 45.55 | 55.34 | 53.76 | 54.66 | 57.94 | 52.84 | 51.62 | 51.84 | 52.76 | 54.68 | 53.43 | 55.13 | 57.32 |
CIFAR100-D2 | 40.11 | 43.14 | 43.27 | 43.56 | 40.93 | 40.27 | 41.56 | 48.44 | 54.86 | 53.76 | 54.63 | 56.89 | 51.89 | 50.40 | 50.67 | 53.01 | 53.90 | 53.83 | 54.15 | 57.16 |
CIFAR100-unbalance | 40.03 | 43.31 | 42.25 | 44.12 | 39.93 | 40.08 | 40.76 | 41.82 | 55.69 | 53.87 | 54.55 | 57.06 | 52.81 | 51.70 | 52.17 | 52.55 | 54.18 | 54.77 | 54.24 | 57.15 |
MNIST-IID | 98.12 | 98.49 | 98.50 | 98.45 | 98.12 | 98.21 | 98.19 | 98.14 | 98.45 | 98.42 | 98.44 | 98.45 | 98.51 | 98.72 | 98.66 | 98.63 | 98.33 | 98.46 | 98.42 | 98.48 |
MNIST-D1 | 98.09 | 98.43 | 98.48 | 98.31 | 98.05 | 98.15 | 98.11 | 98.15 | 98.48 | 98.45 | 98.45 | 98.53 | 98.44 | 98.72 | 98.67 | 98.55 | 98.39 | 98.44 | 98.46 | 98.49 |
MNIST-D2 | 97.98 | 98.38 | 98.35 | 98.39 | 97.96 | 98.06 | 98.12 | 98.10 | 98.51 | 98.44 | 98.45 | 98.58 | 98.46 | 98.70 | 98.57 | 98.64 | 98.40 | 98.49 | 98.52 | 98.46 |
MNIST-unbalance | 98.12 | 98.49 | 98.37 | 98.35 | 98.10 | 98.26 | 98.31 | 98.18 | 98.46 | 98.48 | 98.43 | 98.48 | 98.60 | 98.62 | 98.73 | 98.69 | 98.43 | 98.46 | 98.46 | 98.48 |
Setting 2 | 500 clients all participation | |||||||||||||||||||
CIFAR10-IID | 73.43 | 72.86 | 73.01 | 75.63 | 72.77 | 71.57 | 72.53 | 75.17 | 84.93 | 85.39 | 83.93 | 86.36 | 84.07 | 83.38 | 83.43 | 84.73 | 82.20 | 82.06 | 84.25 | 85.40 |
CIFAR100-IID | 26.03 | 27.95 | 27.75 | 27.86 | 28.22 | 26.48 | 27.53 | 27.04 | 54.25 | 38.86 | 47.89 | 56.26 | 50.22 | 51.95 | 51.75 | 51.80 | 50.05 | 38.60 | 49.46 | 56.38 |
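The D1/D2 and unbalanced settings in the tables above are label-skewed non-IID partitions; a common way to produce such splits (consistent with the Dirichlet-distribution reference in the bibliography) is to draw each class's per-client proportions from a Dirichlet prior. A sketch, with the concentration parameter `beta` as an assumption rather than the paper's exact setting:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, beta=0.5, seed=0):
    """Split sample indices among clients with per-class proportions ~ Dir(beta).
    Smaller beta -> more skewed (more heterogeneous) client label distributions."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    client_idx = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # fraction of class c assigned to each client
        props = rng.dirichlet(np.full(n_clients, beta))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx
```

Under this recipe, a large `beta` approaches the IID rows of the tables, while a small `beta` concentrates each class on a few clients, which is the regime where the gap between the base methods and their +BDD-HFL variants is largest.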
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Song, W.; Yan, M.; Li, X.; Han, L. Bidirectional Decoupled Distillation for Heterogeneous Federated Learning. Entropy 2024, 26, 762. https://doi.org/10.3390/e26090762