FedDPA: Dynamic Prototypical Alignment for Federated Learning with Non-IID Data
Abstract
1. Introduction
2. Background
3. Related Work
4. The FedDPA Methodology
4.1. Local Prototype Computation
4.2. Adaptive Regularization
4.3. Hierarchical Aggregation
4.4. Contrastive Alignment
4.5. Theoretical Analysis for FedDPA
4.5.1. Problem Formulation
4.5.2. Convergence Analysis
We make the following standard assumptions:
- L-Smoothness: The global loss $F$ is $L$-smooth, i.e., $\|\nabla F(w) - \nabla F(w')\| \le L\,\|w - w'\|$ for all $w, w'$.
- Bounded Gradient Dissimilarity: For any client $k$, $\mathbb{E}\,\|\nabla F_k(w) - \nabla F(w)\|^2 \le \delta^2$, where $\delta$ quantifies data heterogeneity.
- Prototype Stability: The prototype alignment error decays as $O(1/t)$ with communication round $t$.

The proof proceeds in four steps, which are assembled into a single bound after this list:
- Per-Round Descent: Uses $L$-smoothness and the update rule $w^{t+1} = w^t - \eta\, g^t$: $F(w^{t+1}) \le F(w^t) - \eta\,\langle \nabla F(w^t), g^t \rangle + \frac{L\eta^2}{2}\,\|g^t\|^2$.
- Telescoping Sum: Summed over $t = 0$ to $T-1$ while taking expectations: $\eta \sum_{t=0}^{T-1} \mathbb{E}\,\|\nabla F(w^t)\|^2 \le F(w^0) - \mathbb{E}[F(w^T)] + \frac{L\eta^2}{2} \sum_{t=0}^{T-1} \mathbb{E}\,\|g^t\|^2$.
- Learning Rate Selection: Substitute $\eta = O(1/\sqrt{T})$ and bound the sums using the gradient-dissimilarity and prototype-stability assumptions.
- Final Bound: Divide by $\eta T$ and use convexity: $\mathbb{E}[F(\bar{w}^T)] - F(w^\star) \le O(1/\sqrt{T})$, where $\bar{w}^T$ denotes the average iterate.
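For concreteness, here is one standard way the four steps combine under the assumptions above (a sketch only: the constants are illustrative, and the symbol $G^2$, a bound on the second moment of the stochastic updates, is our own addition, not taken from the paper):

```latex
\mathbb{E}\big[F(\bar{w}^T)\big] - F(w^\star)
  \;\le\;
  \underbrace{\frac{\|w^0 - w^\star\|^2}{2\,\eta T}}_{\text{telescoped descent}}
  \;+\;
  \underbrace{\frac{\eta}{2}\,\big(G^2 + \delta^2\big)}_{\text{gradient noise and client drift}}
  \;+\;
  \underbrace{O\!\left(\frac{\log T}{T}\right)}_{\text{prototype alignment}}
```

Choosing $\eta = \Theta(1/\sqrt{T})$ balances the first two terms at $O(1/\sqrt{T})$, which dominates the prototype term; the logarithmic factor comes from averaging the $O(1/t)$ alignment errors over $T$ rounds, since $\sum_{t=1}^{T} 1/t = O(\log T)$.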
4.5.3. Computational Efficiency of Hierarchical Aggregation
We analyze one communication round in the following setting:
- $K$ total clients partitioned into $M$ groups ($K/M$ clients per group);
- $d$-dimensional prototypes for $C$ classes;
- $|w|$-dimensional model parameters;
- $m \le M$ groups active per round.

This yields the following headline costs:
- Communication Cost: $O(m(Cd + |w|))$ per round (vs. $O(K|w|)$ for FedAvg [1]);
- Server Computation: $O(m(Cd + |w|) + C^2 d)$ per round;
- Client Computation: $O(n_k d)$ additional per client (for $n_k$ local samples).

Communication breakdown:
- FedAvg: All $K$ clients transmit $|w|$-dimensional weights: $O(K|w|)$;
- FedDPA:
  - Clients send $Cd$-dimensional prototypes plus weights to group leaders;
  - Leaders aggregate and forward only $m$ updates to the server;
  - Total: $O(m(Cd + |w|))$.

Computation breakdown:
- Server: $O(m(Cd + |w|))$ for aggregation plus $O(C^2 d)$ for contrastive prototype alignment;
- Clients:
  - Prototype computation: $O(n_k d)$ (Equation (2));
  - Local training: same as FedAvg.

In summary:
- Communication reduces from $O(K|w|)$ to $O(m(Cd + |w|))$;
- Server computation becomes $O(m(Cd + |w|) + C^2 d)$;
- FedDPA’s overhead is dominated by the $O(C^2 d)$ alignment term;
- It still achieves a $K/m$-fold improvement over FedAvg, since $Cd \ll |w|$ in practice.

Concretely:
- With $K/m = 10$ (e.g., $K = 100$ clients and $m = 10$ active groups), FedDPA uses ∼10× fewer uplinks than FedAvg, as the sketch following this list illustrates;
- Prototype alignment adds $O(C^2 d)$ overhead but enables faster convergence (fewer rounds);
- The speedup matches the reduction from 100 to 17–21 rounds for CIFAR-10.
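As a sanity check on these counts, the following minimal Python sketch evaluates both cost expressions; the values of $K$, $m$, $C$, $d$, and $|w|$ are illustrative assumptions, not the paper’s exact configuration:

```python
# Uplink volume per round, counted in transmitted parameter values.
# Illustrative configuration (assumed, not taken from the paper):
K = 100          # total clients
m = 10           # groups active per round
C = 10           # classes (e.g., CIFAR-10)
d = 512          # prototype dimension
w = 11_000_000   # model size |w| (roughly ResNet-18 scale)

fedavg_uplink = K * w            # all K clients send full weight vectors
feddpa_uplink = m * (C * d + w)  # group leaders forward m aggregated updates

print(f"FedAvg: {fedavg_uplink:.2e} values/round")
print(f"FedDPA: {feddpa_uplink:.2e} values/round")
print(f"Ratio : {fedavg_uplink / feddpa_uplink:.1f}x")  # ~K/m since C*d << |w|
```

With these numbers the ratio evaluates to ≈10×, matching the $K/m$-fold reduction stated above.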
4.6. FedDPA Algorithm
Algorithm 1 FedDPA: client side.
Input: Dataset partitions $\{\mathcal{D}_k\}_{k=1}^{K}$, initial global model weights $w^0$.

Algorithm 2 FedDPA: global server side.
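Since only the captions of the two algorithm listings survive above, the following minimal Python sketch restates one FedDPA-style communication round in terms of the components of Section 4 (local prototype computation, adaptive regularization, hierarchical aggregation). All names and the toy data are our own illustrative assumptions, not the paper’s pseudocode; the local training loop and the server-side contrastive alignment are elided as comments.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d = 3, 8  # classes and prototype dimension (toy values)

def local_prototypes(feats, labels):
    """Per-class mean of local feature vectors (in the spirit of Equation (2))."""
    return {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}

def adaptive_mu(local_p, global_p, base_mu=0.1):
    """Scale the proximal weight with prototype drift: clients whose local
    prototypes deviate more from the global ones get a stronger penalty."""
    drift = np.mean([np.linalg.norm(local_p[c] - global_p[c]) for c in local_p])
    return base_mu * (1.0 + drift)

def average_prototypes(proto_dicts):
    """Average class prototypes across a list of clients or groups."""
    classes = set().union(*proto_dicts)
    return {c: np.mean([p[c] for p in proto_dicts if c in p], axis=0)
            for c in classes}

# One simulated round: K = 4 clients in M = 2 groups, all groups active.
global_protos = {c: rng.normal(size=d) for c in range(C)}
groups = [[0, 1], [2, 3]]

group_protos = []
for group in groups:
    members = []
    for k in group:
        feats = rng.normal(size=(20, d))           # stand-in for extracted features
        labels = rng.integers(0, C, size=20)
        protos = local_prototypes(feats, labels)
        mu_k = adaptive_mu(protos, global_protos)  # would weight the local proximal term
        # ... local training with the mu_k-weighted regularizer happens here ...
        members.append(protos)
    group_protos.append(average_prototypes(members))  # leader-level aggregation

new_global = average_prototypes(group_protos)         # server-level aggregation
# ... server-side contrastive alignment of new_global happens here ...
print(f"updated {len(new_global)} global prototypes of dimension {d}")
```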
5. Experiments and Results
5.1. Set-Up
5.2. Evaluation Metrics
5.3. Baselines
5.4. Results and Analysis
6. Ablation Study
6.1. Experimental Set-Up
- FedDPA without Adaptive Regularization: In this version, we replaced the dynamic regularization weight with a fixed, non-adaptive value. This set-up aimed to quantify the advantage of dynamically adjusting the regularization penalty according to each client’s level of data heterogeneity.
- FedDPA without Hierarchical Aggregation: We disabled the two-level aggregation structure for both prototypes and model weights, so all client updates were sent directly to the global server for single-step averaging. This variant measured the performance impact of hierarchical aggregation separately from its communication-efficiency benefits.
- FedDPA without Contrastive Alignment: In this final configuration, we removed the contrastive loss function from the server-side optimization; the global prototypes were aggregated via simple averaging, without the explicit goal of minimizing intra-class variance and maximizing inter-class separation. This test revealed the significance of actively structuring the global feature space (see the configuration sketch after this list).
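A minimal sketch of how these three switches could be wired into an experiment driver follows; the dataclass and flag names are hypothetical, chosen only to make the three variants explicit:

```python
from dataclasses import dataclass

@dataclass
class FedDPAConfig:
    adaptive_reg: bool = True       # False: fixed, non-adaptive regularization weight
    hierarchical_agg: bool = True   # False: clients send updates straight to the server
    contrastive_align: bool = True  # False: plain averaging of global prototypes
    fixed_mu: float = 0.1           # used only when adaptive_reg is False

ABLATIONS = {
    "full":                 FedDPAConfig(),
    "no_adaptive_reg":      FedDPAConfig(adaptive_reg=False),
    "no_hierarchical_agg":  FedDPAConfig(hierarchical_agg=False),
    "no_contrastive_align": FedDPAConfig(contrastive_align=False),
}

for name, cfg in ABLATIONS.items():
    print(f"{name}: {cfg}")
```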
6.2. Results and Analysis
- Adaptive Regularization: Removing this component caused a significant drop in accuracy across all datasets: a decrease of 2.81 percentage points for FEMNIST (from 84.91% to 82.10%), 3.15 points for CIFAR-10, and 2.85 points for CIFAR-100. This confirms that a one-size-fits-all regularization penalty is suboptimal; dynamically adapting to client-specific data distributions is crucial for mitigating client drift effectively.
- Hierarchical Aggregation: Deactivating this component led to a consistent, albeit smaller, decrease in accuracy (1.36, 1.80, and 1.10 percentage points on FEMNIST, CIFAR-10, and CIFAR-100, respectively). Grouping clients with similar data distributions for an intermediate aggregation step provides a more stable and refined update to the global server, improving final model performance beyond communication efficiency benefits.
- Contrastive Alignment: Disabling this component resulted in the most dramatic performance degradation across all datasets, with accuracy dropping by 4.13 percentage points for FEMNIST, 4.47 points for CIFAR-10, and 5.17 points for CIFAR-100. Actively structuring the feature space by enforcing class separation is the single most important contributor to FedDPA’s robustness; without it, the model cannot effectively learn discriminative features to overcome severe data heterogeneity.
7. Discussion
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics. PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
- Li, X.; Huang, K.; Yang, W.; Wang, S.; Zhang, Z. On the convergence of fedavg on non-iid data. arXiv 2019, arXiv:1907.02189. [Google Scholar]
- Dai, Y.; Chen, Z.; Li, J.; Heinecke, S.; Sun, L.; Xu, R. Tackling data heterogeneity in federated learning with class prototypes. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 7314–7322. [Google Scholar]
- Acar, D.A.E.; Zhao, Y.; Navarro, R.M.; Mattina, M.; Whatmough, P.N.; Saligrama, V. Federated learning based on dynamic regularization. arXiv 2021, arXiv:2111.04263. [Google Scholar] [CrossRef]
- Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated learning with matched averaging. arXiv 2020, arXiv:2002.06440. [Google Scholar] [CrossRef]
- Tan, Y.; Long, G.; Liu, L.; Zhou, T.; Lu, Q.; Jiang, J.; Zhang, C. Fedproto: Federated prototype learning across heterogeneous clients. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 8432–8440. [Google Scholar]
- Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Poor, H.V. Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 7611–7623. [Google Scholar]
- Shoham, N.; Avidor, T.; Keren, A.; Israel, N.; Benditkis, D.; Mor-Yosef, L.; Zeitak, I. Overcoming forgetting in federated learning on non-iid data. arXiv 2019, arXiv:1910.07796. [Google Scholar] [CrossRef]
- Chen, H.Y.; Chao, W.L. Fedbe: Making bayesian model ensemble applicable to federated learning. arXiv 2020, arXiv:2009.01974. [Google Scholar]
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble distillation for robust model fusion in federated learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2351–2363. [Google Scholar]
- Arivazhagan, M.G.; Aggarwal, V.; Singh, A.K.; Choudhary, S. Federated learning with personalization layers. arXiv 2019, arXiv:1912.00818. [Google Scholar] [CrossRef]
- Collins, L.; Hassani, H.; Mokhtari, A.; Shakkottai, S. Exploiting shared representations for personalized federated learning. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual, 18–24 July 2021; pp. 2089–2099. [Google Scholar]
- Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and robust federated learning through personalization. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual, 18–24 July 2021; pp. 6357–6368. [Google Scholar]
- Oh, J.; Kim, S.; Yun, S.Y. Fedbabu: Towards enhanced representation for federated image classification. arXiv 2021, arXiv:2106.06042. [Google Scholar]
- Kuang, L.; Guo, K.; Liang, J.; Zhang, J. An Enhanced Federated Prototype Learning Method under Domain Shift. arXiv 2024, arXiv:2409.18578. [Google Scholar] [CrossRef]
- Kim, H.; Kwak, Y.; Jung, M.; Shin, J.; Kim, Y.; Kim, C. Protofl: Unsupervised federated learning via prototypical distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6470–6479. [Google Scholar]
- Yang, T.; Xu, J.; Zhu, M.; An, S.; Gong, M.; Zhu, H. FedZaCt: Federated learning with Z average and cross-teaching on image segmentation. Electronics 2022, 11, 3262. [Google Scholar] [CrossRef]
- Zhang, J.; Shan, C.; Han, J. FedGMKD: An Efficient Prototype Federated Learning Framework through Knowledge Distillation and Discrepancy-Aware Aggregation. In Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Volume 37, pp. 118326–118356. [Google Scholar]
- Hu, M.; Zhou, P.; Yue, Z.; Ling, Z.; Huang, Y.; Li, A.; Liu, Y.; Lian, X.; Chen, M. FedCross: Towards accurate federated learning via multi-model cross-aggregation. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–16 May 2024; pp. 2137–2150. [Google Scholar]
- Zhang, C.; Sun, H.; Shen, Z.; Wang, D. CS-FL: Cross-Zone Secure Federated Learning with Blockchain and a Credibility Mechanism. Appl. Sci. 2024, 15, 26. [Google Scholar] [CrossRef]
- Bernardi, M.L.; Cimitile, M.; Usman, M. DQFed: A Federated Learning Strategy for Non-IID Data based on a Quality-Driven Perspective. In Proceedings of the 2024 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8. [Google Scholar] [CrossRef]
- Cohen, G.; Afshar, S.; Tapson, J.; Van Schaik, A. EMNIST: Extending MNIST to handwritten letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2921–2926. [Google Scholar]
- Caldas, S.; Duddu, S.M.K.; Wu, P.; Li, T.; Konečnỳ, J.; McMahan, H.B.; Smith, V.; Talwalkar, A. Leaf: A benchmark for federated settings. arXiv 2018, arXiv:1812.01097. [Google Scholar]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
- Chen, H.Y.; Chao, W.L. On bridging generic and personalized federated learning for image classification. arXiv 2021, arXiv:2107.00778. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
| Dataset | Experiment | GM Acc (%) | Std | GM Acc (%) | Std |
|---|---|---|---|---|---|
| EMNIST | Experiment 01 | 97.62 | | 98.13 | |
| EMNIST | Experiment 02 | 97.43 | 0.09 | 98.05 | 0.06 |
| EMNIST | Experiment 03 | 97.51 | | 97.98 | |
| FEMNIST | Experiment 01 | 84.72 | | 83.60 | |
| FEMNIST | Experiment 02 | 85.60 | 0.5 | 84.22 | 0.66 |
| FEMNIST | Experiment 03 | 84.42 | | 84.52 | |
| CIFAR-10 | Experiment 01 | 74.72 | | 82.13 | |
| CIFAR-10 | Experiment 02 | 74.25 | 0.33 | 81.95 | 0.07 |
| CIFAR-10 | Experiment 03 | 73.92 | | 82.06 | |
| CIFAR-100 | Experiment 01 | 47.13 | | 51.95 | |
| CIFAR-100 | Experiment 02 | 46.95 | 0.07 | 52.55 | 0.17 |
| CIFAR-100 | Experiment 03 | 47.06 | | 51.05 | |
| Tiny-ImageNet-200 | Experiment 01 | 52.10 | | 54.40 | |
| Tiny-ImageNet-200 | Experiment 02 | 51.82 | 0.58 | 54.82 | 0.26 |
| Tiny-ImageNet-200 | Experiment 03 | 50.75 | | 55.05 | |
| Dataset | Experiment | Rounds | Rounds |
|---|---|---|---|
| EMNIST | Experiment 01 | 16 | 15 |
| EMNIST | Experiment 02 | 14 | 15 |
| EMNIST | Experiment 03 | 17 | 14 |
| FEMNIST | Experiment 01 | 23 | 22 |
| FEMNIST | Experiment 02 | 22 | 24 |
| FEMNIST | Experiment 03 | 23 | 22 |
| CIFAR-10 | Experiment 01 | 17 | 12 |
| CIFAR-10 | Experiment 02 | 18 | 15 |
| CIFAR-10 | Experiment 03 | 21 | 13 |
| CIFAR-100 | Experiment 01 | 35 | 35 |
| CIFAR-100 | Experiment 02 | 36 | 36 |
| CIFAR-100 | Experiment 03 | 38 | 41 |
| Tiny-ImageNet-200 | Experiment 01 | 15 | 17 |
| Tiny-ImageNet-200 | Experiment 02 | 17 | 20 |
| Tiny-ImageNet-200 | Experiment 03 | 15 | 18 |
| Dataset | Method | Acc (%) | Acc (%) |
|---|---|---|---|
| EMNIST | FedAvg | 96.90 ± 0.26 | 97.00 ± 0.62 |
| EMNIST | FedPer | 93.30 ± 0.40 | 97.20 ± 0.55 |
| EMNIST | Ditto | 97.00 ± 0.27 | 97.20 ± 0.15 |
| EMNIST | FedRep | 95.00 ± 0.18 | 97.50 ± 0.49 |
| EMNIST | FedBABU | - | - |
| EMNIST | FedNH | - | - |
| EMNIST | FedROD | 97.30 ± 0.12 | 97.50 ± 0.30 |
| EMNIST | FedDPA | 97.52 ± 0.09 | 98.05 ± 0.06 |
| FEMNIST | FedAvg | 83.10 ± 0.13 | 83.40 ± 0.54 |
| FEMNIST | FedPer | 79.90 ± 0.67 | 74.50 ± 0.58 |
| FEMNIST | Ditto | 81.50 ± 0.36 | 83.30 ± 0.30 |
| FEMNIST | FedRep | 79.50 ± 0.70 | 80.60 ± 0.42 |
| FEMNIST | FedBABU | - | - |
| FEMNIST | FedNH | - | - |
| FEMNIST | FedROD | 86.30 ± 0.50 | 83.90 ± 0.66 |
| FEMNIST | FedDPA | 84.91 ± 0.50 | 84.10 ± 0.66 |
| CIFAR-10 | FedAvg | 66.40 ± 3.13 | 73.07 ± 1.60 |
| CIFAR-10 | FedPer | 61.58 ± 0.43 | 63.33 ± 0.53 |
| CIFAR-10 | Ditto | 66.40 ± 3.13 | 73.07 ± 1.60 |
| CIFAR-10 | FedRep | 40.13 ± 0.17 | 47.92 ± 0.38 |
| CIFAR-10 | FedBABU | 62.78 ± 3.09 | 70.34 ± 1.72 |
| CIFAR-10 | FedNH | 69.01 ± 2.51 | 75.34 ± 0.86 |
| CIFAR-10 | FedROD | 72.31 ± 0.16 | 75.50 ± 0.15 |
| CIFAR-10 | FedDPA | 74.30 ± 0.33 | 82.05 ± 0.07 |
| CIFAR-100 | FedAvg | 35.14 ± 0.48 | 36.07 ± 0.41 |
| CIFAR-100 | FedPer | 15.04 ± 0.06 | 14.69 ± 0.03 |
| CIFAR-100 | Ditto | 35.14 ± 0.48 | 36.07 ± 1.41 |
| CIFAR-100 | FedRep | 5.42 ± 0.03 | 6.37 ± 0.04 |
| CIFAR-100 | FedBABU | 32.41 ± 0.40 | 22.21 ± 0.15 |
| CIFAR-100 | FedNH | 41.34 ± 0.25 | 43.19 ± 0.24 |
| CIFAR-100 | FedROD | 33.83 ± 0.25 | 35.20 ± 0.19 |
| CIFAR-100 | FedDPA | 47.05 ± 0.07 | 51.72 ± 0.17 |
| Tiny-ImageNet-200 | FedAvg | 34.63 ± 0.26 | 37.65 ± 0.37 |
| Tiny-ImageNet-200 | FedPer | 15.28 ± 0.14 | 13.71 ± 0.07 |
| Tiny-ImageNet-200 | Ditto | 34.63 ± 0.26 | 37.65 ± 0.37 |
| Tiny-ImageNet-200 | FedRep | 3.27 ± 0.02 | 3.91 ± 0.03 |
| Tiny-ImageNet-200 | FedBABU | 26.36 ± 0.32 | 30.25 ± 0.32 |
| Tiny-ImageNet-200 | FedNH | 36.71 ± 0.36 | 38.68 ± 0.30 |
| Tiny-ImageNet-200 | FedROD | 36.46 ± 0.28 | 37.71 ± 0.31 |
| Tiny-ImageNet-200 | FedDPA | 51.55 ± 0.58 | 54.75 ± 0.26 |
| Configuration | FEMNIST Acc (%) | CIFAR-10 Acc (%) | CIFAR-100 Acc (%) |
|---|---|---|---|
| Full FedDPA Model | 84.91 | 74.30 | 47.05 |
| Without Adaptive Regularization | 82.10 | 71.15 | 44.20 |
| Without Hierarchical Aggregation | 83.55 | 72.50 | 45.95 |
| Without Contrastive Alignment | 80.78 | 69.83 | 41.88 |