Few-Shot Federated Learning: A Federated Learning Model for Small-Sample Scenarios
Abstract
1. Introduction
- Overcoming the Challenges of Training Models with Extremely Limited Data: In practical applications, acquiring a large volume of labeled data can be prohibitively expensive. A few-shot federated learning model is therefore designed that allows model training to proceed normally even in scenarios with very limited data availability.
- Enhancing Privacy Protection: In an era where privacy and security are of paramount concern, it is crucial that user privacy is not compromised during model training. Federated learning is therefore adopted as the foundational framework, avoiding the privacy leaks that could occur if clients shared their original data.
- Facilitating Personalized Learning on the Client Side: In real-world scenarios, especially those with extremely limited data, data distributions are often non-independent and identically distributed (non-IID), and participants’ classification tasks can vary greatly. Personalized knowledge distillation is therefore introduced, enabling each client to distil only the part of the server’s global model that is relevant to its local task, further improving the training efficiency of the model.
2. Related work
2.1. Few-Shot Learning
- Siamese Networks: Koch et al. [11] first applied Siamese networks to one-shot image recognition, employing two identical networks that share weights and architecture to learn the similarity between input pairs; a pair is judged to belong to the same category when the learned distance between their embeddings is small. Müller et al. [12] proposed a Siamese-network-based method for building text classifiers, embedding texts and labels into a common vector space and using a similarity function to score the match between them. Belton et al. [13] innovated on the Siamese architecture by introducing a Stop Loss to prevent representational collapse, simplifying training and enhancing model robustness.
- Matching Networks: Vinyals et al. [14] introduced matching networks, incorporating attention mechanisms and memory modules to enable the model to learn a matching function for a small-sample task directly from the support set. The design of matching networks allows the model to consider all samples in the support set at each step, learning how to make effective predictions from a few samples in an end-to-end manner. Cao et al. [15] proposed a Bi-directional Matching Network architecture, incorporating a semantic alignment model and combining appearance flow, relation flow, and mutual information flow for sample alignment and comparison. This method addresses the challenge of image classification in few-shot settings by leveraging the deep semantic relationships between images, significantly improving the classification performance. Zhang et al. [16] presented SGMNet, a meta-learning framework based on scene graph matching for few-shot remote sensing scene classification, introducing a Graph Construction Module (GCM) and a Graph Matching Module (GMM) to effectively utilize the co-occurrence and spatial correlation features of remote sensing images, enhancing the classification performance.
- Prototypical Networks: Snell et al. [5] introduced Prototypical Networks, which compute a prototype for each category (the mean of the feature vectors of all its samples) and classify query samples by their distance to these prototypes. The essence of the approach is to shape the feature space so that samples of the same category lie close together while those of different categories lie far apart. It performs strongly in few-shot tasks, especially one-shot scenarios, improving both accuracy and generalization; a minimal sketch of this classification rule follows this list. Zhou et al. [17] proposed LDP-Net, a Prototypical-Network-based architecture with a dual-branch structure: a global branch predicts the category of the whole input image, while a local branch predicts on a randomly cropped patch, and knowledge distillation then reinforces the consistency between the global and local predictions, enhancing generalization. Qin et al. [18] proposed a robust webly-supervised learning method based on Prototypical Networks to address noise and domain discrepancies in web data. This method introduces a small number of real-world samples as “truth” and uses contrastive learning to minimize the distance between web data and this “truth”.
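To make the metric-learning idea concrete, here is a minimal PyTorch sketch of the prototypical classification rule of Snell et al. [5], which also underlies the client-side training in Section 4. The `encoder` is a placeholder for any embedding network, and the function and variable names are illustrative, not taken from the paper.

```python
import torch

def prototypical_logits(encoder, support_x, support_y, query_x, n_classes):
    """Score query samples by their distance to class prototypes.

    support_x: support images, support_y: their integer labels in [0, n_classes),
    query_x: query images to classify.
    """
    support_emb = encoder(support_x)  # (n_support, d) feature vectors
    query_emb = encoder(query_x)      # (n_query, d)

    # Prototype of each class = mean of its support embeddings.
    prototypes = torch.stack([
        support_emb[support_y == c].mean(dim=0) for c in range(n_classes)
    ])                                # (n_classes, d)

    # Negative squared Euclidean distance as the logit: the nearest
    # prototype receives the highest score.
    return -torch.cdist(query_emb, prototypes) ** 2  # (n_query, n_classes)
```

Training then minimizes the cross-entropy of these logits against the query labels, episode by episode.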
2.2. Federated Learning
3. Preliminary
3.1. Knowledge Distillation
3.2. Few-Shot Learning
4. Model System
4.1. Model Learning Process
- Initialization. The global model is initialized on the server side and distributed to all clients participating in the training.
- Local Model Training by Clients. Each client trains the received global model on its local dataset. Specifically, each client computes a prototype for every class in its local dataset and, based on these prototypes, completes the round of local training.
- Client Parameters Upload. After the local model training is completed in the current round, each client uploads the model update parameters to the server.
- Server-Side Aggregation. Upon receiving the model update parameters from the clients, the server averages and aggregates these parameters to obtain the global model parameters. After aggregation, the global model is evaluated for performance, specifically testing its classification accuracy on a test set.
- Iterative Training. The server distributes the updated global model to all clients, who then commence the next round of model training.
- Personalized Knowledge Distillation. When clients receive a global model that is not in an initialized state, they use their local model as the student model and the received global model as the teacher model. Through a personalized knowledge distillation algorithm, the student model is guided to learn knowledge relevant to its local classification task from the teacher model.
Algorithm 1 Few-Shot Federated Learning
Input: client collection
Output: trained global model
1: The server sends the initial global model to all clients
2: for each client do
3:  Train the model on local data via few-shot learning
4:  Upload the trained model parameters to the server
5: end for
6: for each client’s uploaded parameters do
7:  Average and aggregate into the global model
8: end for
9: while the global model has not converged do
10:  for each client do
11:   local model ← PersonalizedKD(global model, local model)
12:   Train on local data via few-shot learning and upload the parameters for aggregation
13:  end for
14: end while
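Below is a minimal Python sketch of the loop formalized in Algorithm 1, assuming models are exchanged as PyTorch state dicts. The callables `personalized_kd` (standing in for Algorithm 2) and `local_few_shot_train` (the prototype-based local training described above), as well as the `client.model` and `client.support_set` attributes, are hypothetical names introduced for illustration.

```python
import copy
import torch

def average_parameters(client_states):
    """Server-side aggregation: element-wise mean of the uploaded parameters."""
    aggregated = copy.deepcopy(client_states[0])
    for key in aggregated:
        aggregated[key] = torch.stack(
            [state[key].float() for state in client_states]
        ).mean(dim=0)
    return aggregated

def federated_round(global_state, clients, first_round,
                    personalized_kd, local_few_shot_train):
    """One communication round: distribute, train locally, upload, aggregate."""
    uploaded = []
    for client in clients:
        if first_round:
            # Initialization: every client adopts the global model directly.
            client.model.load_state_dict(global_state)
        else:
            # Later rounds: the local model is the student and the received
            # global model is the teacher (personalized knowledge distillation).
            personalized_kd(student=client.model, teacher_state=global_state)
        local_few_shot_train(client.model, client.support_set)
        uploaded.append(copy.deepcopy(client.model.state_dict()))
    return average_parameters(uploaded)
```

Rounds repeat until the aggregated global model converges, matching the while loop of Algorithm 1.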
4.2. FsFL
4.3. Personalized Knowledge Distillation Based on Student Model Classification Tasks
Algorithm 2 PersonalizedKD
Input: server’s model, client’s model
Output: updated client’s model
1: for each client do
2:  Receive the global model from the server
3:  Determine the required distillation parameters according to Equation (17)
4:  For the server-side data relevant to the local classification task, calculate the confidence discrepancy using Equation (15)
5:  Obtain the corresponding temperature parameters according to Equation (18)
6:  Calculate the softened output probabilities of the client-side (student) model and the server-side (teacher) model separately
7:  Calculate the knowledge distillation loss using Equation (19)
8:  Update the client model parameters through backpropagation
9: end for
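Since Equations (15) and (17)–(19) are not reproduced in this outline, the sketch below shows only a generic temperature-scaled distillation step in the spirit of Hinton et al. [22]; the scalar `tau` stands in for the temperature obtained from Equation (18), and `batch` for the server-side data relevant to the client’s task.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau):
    """KL divergence between the softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    # The tau**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2

def personalized_kd_step(student, teacher, batch, optimizer, tau):
    """One distillation update of the client (student) model."""
    student_logits = student(batch)
    with torch.no_grad():                 # the teacher is not updated
        teacher_logits = teacher(batch)
    loss = kd_loss(student_logits, teacher_logits, tau)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```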
5. Experiment
5.1. Datasets
5.2. Performance Evaluation
5.3. Results & Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhang, J.; Shi, Y. A Personalized Federated Learning Method Based on Clustering and Knowledge Distillation. Electronics 2024, 13, 857.
2. Li, Y.; Wen, G. Research and Practice of Financial Credit Risk Management Based on Federated Learning. Eng. Lett. 2023, 31, 271.
3. Yang, D.; Xu, Z.; Li, W.; Myronenko, A.; Roth, H.R.; Harmon, S.; Xu, S.; Turkbey, B.; Turkbey, E.; Wang, X.; et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Med. Image Anal. 2021, 70, 101992.
4. Zhuang, W.; Gan, X.; Wen, Y.; Zhang, S. Optimizing performance of federated person re-identification: Benchmarking and analysis. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–18.
5. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
6. Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338.
7. Tian, Y.; Wang, Y.; Krishnan, D.; Tenenbaum, J.B.; Isola, P. Rethinking few-shot image classification: A good embedding is all you need? In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 266–282.
8. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135.
9. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1627–1645.
10. Fan, Q.; Zhuo, W.; Tang, C.-K.; Tai, Y.-W. Few-shot object detection with attention-RPN and multi-relation detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4013–4022.
11. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015.
12. Müller, T.; Pérez-Torró, G.; Franco-Salvador, M. Few-shot learning with siamese networks and label tuning. arXiv 2022, arXiv:2203.14655.
13. Belton, N.; Hagos, M.T.; Lawlor, A.; Curran, K.M. FewSOME: One-class few shot anomaly detection with siamese networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2977–2986.
14. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
15. Cao, C.; Zhang, Y. Learning to compare relation: Semantic alignment for few-shot learning. IEEE Trans. Image Process. 2022, 31, 1462–1474.
16. Zhang, B.; Feng, S.; Li, X.; Ye, Y.; Ye, R.; Luo, C.; Jiang, H. SGMNet: Scene graph matching network for few-shot remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5628915.
17. Zhou, F.; Wang, P.; Zhang, L.; Wei, W.; Zhang, Y. Revisiting prototypical network for cross domain few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20061–20070.
18. Qin, Y.; Chen, X.; Chen, C.; Shen, Y.; Ren, B.; Gu, Y.; Yang, J.; Shen, C. FoPro: Few-shot guided robust webly-supervised prototypical learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 2101–2109.
19. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
20. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
21. Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Poor, H.V. Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 7611–7623.
22. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
23. Ye, H.-J.; Ming, L.; Zhan, D.-C.; Chao, W.-L. Few-shot learning with a strong teacher. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 46, 1425–1440.
24. Parnami, A.; Lee, M. Learning from few examples: A summary of approaches to few-shot learning. arXiv 2022, arXiv:2203.04291.
25. Kang, D.; Cho, M. Integrative few-shot learning for classification and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9979–9990.
26. Yang, G.; Tae, H. Federated Distillation Methodology for Label-Based Group Structures. Appl. Sci. 2023, 14, 277.
27. Sun, Q.; Liu, Y.; Chua, T.-S.; Schiele, B. Meta-transfer learning for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 403–412.
28. Tan, Z.; Wang, S.; Ding, K.; Li, J.; Liu, H. Transductive linear probing: A novel framework for few-shot node classification. In Proceedings of the Learning on Graphs Conference, Virtual, 9–12 December 2022; pp. 4:1–4:21.
29. Oreshkin, B.; Rodríguez López, P.; Lacoste, A. TADAM: Task dependent adaptive metric for improved few-shot learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
30. Fan, C.; Huang, J. Federated few-shot learning with adversarial learning. In Proceedings of the 2021 19th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Philadelphia, PA, USA, 18–21 October 2021; pp. 1–8.
Classification accuracy (%) under one-shot and five-shot settings:

| Method | Omniglot One-Shot | Omniglot Five-Shot | FC100 One-Shot | FC100 Five-Shot | MiniImageNet One-Shot | MiniImageNet Five-Shot |
|---|---|---|---|---|---|---|
| FsFL | 95.8 ± 1.54 | 96.4 ± 2.03 | 41.00 ± 2.10 | 57.51 ± 1.48 | 49.4 ± 1.21 | 68.2 ± 1.57 |
| FedAvg | 2 | 71.2 ± 1.99 | 19.6 ± 1.00 | 40.9 ± 1.03 | 18.7 ± 0.86 | 44.1 ± 0.97 |
| FedFSL | 78.1 ± 1.45 | 89.7 ± 2.06 | 38.60 ± 2.00 | 50.90 ± 1.08 | 53.52 ± 1.1 | 61.56 ± 1.66 |
| Method | Omniglot | FC100 | MiniImageNet |
|---|---|---|---|
| FsFL | 0.907 | 0.603 | 0.712 |
| FedAvg | 0.685 | 0.434 | 0.394 |
| FedFSL | 0.875 | 0.593 | 0.632 |