A Communication-Efficient Federated Text Classification Method Based on Parameter Pruning
Abstract
1. Introduction
- We extend the Word2Vec (W2V) model to a federated version, called FedW2V, for the setting in which data are non-IID and the document corpora are not allowed to be shared among participants. In FedW2V, the parameter aggregation algorithm is refined.
- We then extend the deep learning model TextCNN to text classification under the federated architecture.
- To address the large number of model parameters and the high communication cost of federated TextCNN, we propose a parameter pruning algorithm named FedInitPrune (Federated Initialization Pruning). It reduces the uplink and downlink communication volume during training with only a small loss in model performance.
- We run a set of experiments on real-world datasets; the accuracy of our method reaches 91.71%, and the communication cost is reduced by 74.26% in the best case, which demonstrates the effectiveness of our method.
2. Related Works
3. System Architecture and Algorithms
3.1. System Architecture
3.2. Federated Word Embedding Model
3.2.1. FedW2V Aggregation Method
3.2.2. FedW2V Algorithm
Algorithm 1: FedW2V
Input: each participant P_i's local dataset D_i, the initial parameters of the federated Skip-gram model, and the number of global training rounds G;
Output: P_i's local vocabulary V_i, the global vocabulary V, and the global word embedding model;
Participant P_i's side:
1. Build a local vocabulary from the local dataset: V_i = BuildVocab(D_i);
2. Upload the local vocabulary V_i to the server;
Server's side:
3. Collect the local vocabularies from the participants;
4. Calculate the global vocabulary V;
5. Send V to the participants;
Participants' side:
6. Build the training set S_i based on V;
Server's side:
7. Calculate each participant's vocabulary update weight;
8. Initialize the federated Skip-gram model parameters;
9. for g = 0, 1, 2, …, G do
10.   for each participant P_i do
11.     run the local update of Algorithm 2 on S_i and collect the returned local parameters;
12.   end
13.   aggregate the local parameters into the new global parameters using the weights from step 7;
14. end
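The following Python sketch illustrates the overall FedW2V workflow of Algorithm 1. The helper names (`build_vocab`, `local_skipgram_update`) and the word-count-based aggregation weights are illustrative assumptions; the paper's actual weighting scheme is the one defined in Section 3.2.1.

```python
# Illustrative sketch of the FedW2V workflow in Algorithm 1.
# Helper names and the word-count-based weights are assumptions, not the paper's exact scheme.
from collections import Counter
import numpy as np

def build_vocab(corpus):
    """Step 1: build a local vocabulary (word -> count) from a list of tokenised documents."""
    return Counter(w for doc in corpus for w in doc)

def local_skipgram_update(W, corpus, word2idx):
    """Placeholder for the local Skip-gram training of Algorithm 2."""
    return W  # a real participant would run E local epochs here and return the new parameters

def fedw2v(participant_corpora, dim=100, global_rounds=10):
    # Steps 3-5: merge the local vocabularies into the global vocabulary.
    local_vocabs = [build_vocab(c) for c in participant_corpora]
    global_vocab = Counter()
    for v in local_vocabs:
        global_vocab.update(v)
    word2idx = {w: i for i, w in enumerate(global_vocab)}

    # Step 7: per-participant aggregation weights (here: each participant's share of words).
    totals = np.array([sum(v.values()) for v in local_vocabs], dtype=float)
    weights = totals / totals.sum()

    # Step 8: initialise the federated Skip-gram parameters (one vector per global word).
    W = np.random.randn(len(word2idx), dim) * 0.01

    # Steps 9-14: federated training rounds with weighted aggregation.
    for _ in range(global_rounds):
        local_params = [local_skipgram_update(W, c, word2idx) for c in participant_corpora]
        W = sum(w_k * W_k for w_k, W_k in zip(weights, local_params))
    return word2idx, W

vocab, embeddings = fedw2v([[["federated", "learning"]], [["text", "classification"]]],
                           dim=8, global_rounds=2)
```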
- Input layer: the input is the one-hot encoding of a word index in the vocabulary, corresponding to the headword, the background words, and the noise words.
- Hidden layer: the number of neurons in the hidden layer equals the dimension of the word vectors; that is, the hidden-layer parameter is a matrix of size |V| × d (vocabulary size × word-vector dimension), called the central word vector matrix. The reduced-dimension representation of the headword is obtained by multiplying its one-hot input vector by this matrix.
- Output layer: the aim is to predict the scores of the background words and noise words given the headword. The one-hot vectors of the background and noise words select rows of the background word vector matrix, also of size |V| × d, which serves as the parameter weight from the hidden layer to the output layer. Multiplying the hidden-layer output by these weights yields the predicted scores of the background and noise words; a sigmoid is applied to each score to produce the prediction probabilities. The predicted scores are compared with the ground truth to compute the model loss, and backpropagation updates both the central word vector matrix and the background word vector matrix (see the sketch after this list).
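As a concrete illustration of the three layers described above, the following minimal numpy sketch performs one Skip-gram negative-sampling update; the sizes, learning rate, and variable names are illustrative assumptions rather than the paper's exact settings.

```python
# Minimal numpy sketch of one Skip-gram negative-sampling update, matching the layer
# description above; dimensions, learning rate and names are illustrative.
import numpy as np

V, d = 10000, 100                      # vocabulary size |V| and word-vector dimension d
Wc = np.random.randn(V, d) * 0.01      # central word vector matrix (hidden-layer weights)
Wb = np.random.randn(V, d) * 0.01      # background word vector matrix (output-layer weights)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, noise, lr=0.025):
    """One update for a (headword, background word, noise words) training example."""
    h = Wc[center]                              # hidden layer: one-hot lookup of the headword
    ids = np.array([context] + list(noise))     # background word followed by noise words
    labels = np.zeros(len(ids)); labels[0] = 1.0
    scores = sigmoid(Wb[ids] @ h)               # output layer: predicted scores after sigmoid
    err = scores - labels                       # gradient of the logistic loss w.r.t. the scores
    Wc[center] -= lr * (err @ Wb[ids])          # backpropagate to the central word vector
    Wb[ids] -= lr * np.outer(err, h)            # update background and noise word vectors
    return -np.log(scores[0] + 1e-10) - np.sum(np.log(1.0 - scores[1:] + 1e-10))

loss = sgns_step(center=42, context=7, noise=[99, 123, 4567])
```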
Algorithm 2: Local model update
Input: the global model parameters of the g-th training round and the number of local training rounds E; Output: the local model parameters of the g-th round;
1. The participant overwrites its local model parameters with the received global parameters;
2. Partition the training set S_i into mini-batches;
3. for e = 0, 1, …, E do
4.   for each mini-batch B do
5.     update the local parameters by one gradient step on the Skip-gram loss over B;
6.   end
7. end
8. Upload the local model parameters to the server;
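A minimal sketch of the participant-side procedure in Algorithm 2, assuming mini-batch SGD on the Skip-gram loss; `make_batches` and `skipgram_grad` are illustrative placeholders rather than functions defined in the paper.

```python
# Sketch of Algorithm 2: load the global parameters, run E local epochs of mini-batch
# updates, and return the parameters to be uploaded to the server.
import numpy as np

def make_batches(data, batch_size):
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def skipgram_grad(W, batch):
    # Placeholder for the Skip-gram gradient over one mini-batch (see the SGNS sketch above).
    return np.zeros_like(W)

def local_update(global_W, training_set, epochs=5, batch_size=64, lr=0.025):
    W = global_W.copy()                               # step 1: start from the received global model
    batches = make_batches(training_set, batch_size)  # step 2: partition the training set
    for _ in range(epochs):                           # steps 3-7: local training rounds
        for batch in batches:
            W -= lr * skipgram_grad(W, batch)         # one SGD step per mini-batch
    return W                                          # step 8: parameters uploaded to the server

updated = local_update(np.zeros((100, 16)), training_set=list(range(512)), epochs=2)
```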
3.3. Communication-Efficient Federated Text Classification
3.3.1. Parameter Pruning
3.3.2. FedInitPrune Algorithm
Algorithm 3: FedInitPrune
Input: P_i's dataset D_i, the global model's initial parameters, the pruning rate p, the number of global training rounds G, the number of local training rounds E, and the local dataset partition (batch) size; Output: the pruned global model;
Server's side:
1. Initialize the TextCNN model parameters;
2. for t = 0, 1, …, G do
3.   if t = 0 then
4.     for each participant P_i do
5.       collect the parameter sensitivity statistics uploaded by P_i;
6.     aggregate the statistics of all participants;
7.     normalize the aggregated statistics;   // normalization
8.     rank the parameters by their normalized scores;
9.     select the fraction p of parameters with the smallest scores;
10.    prune the selected parameters from the global model;
11.    send the pruned global model to the participants;
12.   else
13.     for each participant P_i do
14.       collect P_i's local model parameters;
15.     aggregate the local parameters into the new global model and send it to the participants;
Participant P_i's side:
16. Partition the local dataset D_i into mini-batches;
17. if t = 0 then
18.   update the local model parameters with the received global parameters;
19.   for each mini-batch B do
20.     calculate the parameter sensitivity statistics based on Formula (6);
21.   upload the statistics to the server;
22. else
23.   update the local model parameters with the received pruned global parameters;
24.   for e = 0, 1, …, E do
25.     for each mini-batch B do
26.       update the local parameters by one gradient step on the pruned model;
27.   upload the local model parameters to the server;
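The core of FedInitPrune is the first-round pruning decision on the server. The sketch below shows one plausible realisation under stated assumptions: the per-parameter sensitivity scores stand in for Formula (6), which is not reproduced here, and aggregation by summation with min-max normalisation is an illustrative assumption.

```python
# Hedged sketch of the server-side, first-round (t = 0) pruning decision in FedInitPrune.
import numpy as np

def server_prune(sensitivities, prune_rate):
    """sensitivities: one array per participant, with one score per global model parameter."""
    agg = np.sum(sensitivities, axis=0)                          # aggregate participant statistics
    agg = (agg - agg.min()) / (agg.max() - agg.min() + 1e-12)    # normalisation (line 7)
    k = int(prune_rate * agg.size)                               # number of parameters to remove
    threshold = np.partition(agg, k)[k]                          # k-th smallest normalised score
    return agg >= threshold                                      # mask: True = keep the parameter

# Only the surviving (masked) parameters are trained and exchanged in later rounds,
# which is what reduces both uplink and downlink communication.
mask = server_prune([np.random.rand(1000) for _ in range(4)], prune_rate=0.25)
```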
4. Experiments and Results Analysis
4.1. Dataset
4.2. Evaluation Measures
4.3. Experimental Results and Analysis
4.3.1. Training Results of Word Embedding Model
4.3.2. Classification Model Communication Cost Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A.V. Machine learning on big data: Opportunities and challenges. Neurocomputing 2017, 237, 350–361.
- Yang, Q.; Liu, Y.; Chen, T.J.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19.
- Yin, X.; Zhu, Y.; Hu, J. A Comprehensive Survey of Privacy-preserving Federated Learning: A Taxonomy, Review, and Future Directions. ACM Comput. Surv. 2022, 54, 1–36.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations, ICLR (Workshop Poster), Scottsdale, AZ, USA, 2–4 May 2013.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Kim, S.; Park, H.; Lee, J. Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: A study on blockchain technology trend analysis. Expert Syst. Appl. 2020, 152, 113401.
- Pablos, A.G.; Cuadros, M.; Rigau, G. W2VLDA: Almost unsupervised system for Aspect Based Sentiment Analysis. Expert Syst. Appl. 2018, 91, 127–137.
- Sharma, A.; Kumar, S. Ontology-based semantic retrieval of documents using Word2vec model. Data Knowl. Eng. 2023, 144, 102110.
- Ma, J.; Wang, L.; Zhang, Y.-R.; Yuan, W.; Guo, W. An integrated latent Dirichlet allocation and Word2vec method for generating the topic evolution of mental models from global to local. Expert Syst. Appl. 2023, 212, 118695.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
- Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics; Proc. Mach. Learn. Res. 2017, 54, 1273–1282.
- Bathla, G.; Singh, P.; Singh, R.K.; Cambria, E.; Tiwari, R. Intelligent fake reviews detection based on aspect extraction and analysis using deep learning. Neural Comput. Appl. 2022, 34, 20213–20229.
- Yin, L.; Feng, J.; Xun, H.; Sun, Z.; Cheng, X. A privacy-preserving federated learning for multiparty data sharing in social IoTs. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2706–2718.
- Dong, Y.; Hou, W.; Chen, X.; Zeng, X. Efficient and Secure Federated Learning Based on Secret Sharing and Selection. J. Comput. Res. Dev. 2020, 57, 10.
- Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492.
- Reisizadeh, A.; Mokhtari, A.; Hassani, H.; Jadbabaie, A.; Pedarsani, R. FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization. Proc. Mach. Learn. Res. 2020, 108, 2021–2031.
- Wang, Y.; Zhang, X.; Xie, L.; Zhou, J.; Su, H.; Zhang, B.; Hu, X. Pruning from Scratch. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12273–12280.
- Chen, X.; Xu, L.; Liu, Z.; Sun, M.; Luan, H. Joint Learning of Character and Word Embeddings. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
- Li, K.; Wang, H.; Zhang, Q. FedTCR: Communication-efficient federated learning via taming computing resources. Complex Intell. Syst. 2023, 1–21.
| Aggregation Algorithm | External Task (Classification Accuracy) | Internal Task: Similarity (WordSim-240) | Internal Task: Similarity (WordSim-297) | Internal Task (Analogy) |
|---|---|---|---|---|
| FedAvg | 88.68% | 34.98 | 52.70 | 20.88% |
| FedW2V | 90.25% | 39.27 | 60.99 | 28.57% |
| Aggregation Algorithm | External Task (Classification Accuracy) | Internal Task: Similarity (WordSim-240) | Internal Task: Similarity (WordSim-297) | Internal Task (Analogy) |
|---|---|---|---|---|
| FedAvg | 91.08% | 32.77 | 51.54 | 34.77% |
| FedW2V | 91.29% | 56.48 | 54.29 | 56.48% |
| Algorithms | Communication Volume/Round (Fudan) | Communication Volume/Round (THUCNews) | Total Communication Volume (Fudan, 200 Rounds) | Total Communication Volume (THUCNews, 50 Rounds) |
|---|---|---|---|---|
| FedAvg | 4.7337 × 10^7 | 4.6156 × 10^7 | 9.4674 × 10^9 | 2.3078 × 10^9 |
| Top-k/10% | 2.8700 × 10^7 | 2.7991 × 10^7 | 5.7399 × 10^9 | 1.3995 × 10^9 |
| Top-k/25% | 3.5800 × 10^7 | 3.4914 × 10^7 | 7.1600 × 10^9 | 1.7457 × 10^9 |
| FedInitPrune/25% | 1.1948 × 10^7 | 1.1651 × 10^7 | 2.4365 × 10^9 | 6.2872 × 10^8 |
| FedInitPrune/50% | 2.3744 × 10^7 | 2.3153 × 10^7 | 4.7962 × 10^9 | 1.2038 × 10^9 |
| Algorithms | Accuracy (Fudan) | Accuracy (THUCNews) | Compression Ratio (Fudan) | Compression Ratio (THUCNews) |
|---|---|---|---|---|
| FedAvg | 88.28% | 91.09% | 0% | 0% |
| Top-k/10% | 87.20% | 90.37% | −39.37% | −39.36% |
| Top-k/25% | 88.25% | 90.88% | −24.37% | −24.36% |
| FedInitPrune/25% | 86.84% | 91.02% | −74.26% | −72.76% |
| FedInitPrune/50% | 88.15% | 91.71% | −49.34% | −47.84% |
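The compression ratios above follow from the total communication volumes in the previous table as 1 − (method total / FedAvg total); a quick check for the Fudan dataset (200 rounds):

```python
# Reproduce the Fudan compression ratios from the total communication volumes above.
fedavg_total = 9.4674e9
totals = {
    "Top-k/10%":        5.7399e9,   # -> ~39.37%
    "Top-k/25%":        7.1600e9,   # -> ~24.37%
    "FedInitPrune/25%": 2.4365e9,   # -> ~74.26%, the best case reported in the abstract
    "FedInitPrune/50%": 4.7962e9,   # -> ~49.34%
}
for name, total in totals.items():
    print(f"{name}: {100 * (1 - total / fedavg_total):.2f}% less communication than FedAvg")
```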