FedKG: A Knowledge Distillation-Based Federated Graph Method for Social Bot Detection
Abstract
1. Introduction
- We construct a naive Federated RGCN framework for social bot detection in a multi-party data fusion scenario;
- We define a class-level cross-entropy loss function and apply it to the training of local models, mitigating the impact of class imbalance during client model training;
- By applying knowledge distillation on both the server and client sides, we effectively alleviate the impact of data distribution heterogeneity among clients on the performance of the federated learning model.
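The two training-side ideas above can be illustrated with a minimal NumPy sketch. The contributions do not give formulas, so both functions are assumptions. `class_level_cross_entropy` reads "class-level" as a macro average (cross-entropy averaged within each class, then across classes), so that the smaller bot class is not swamped by the human class. `distillation_loss` is the standard temperature-softened KL divergence used in knowledge distillation, not necessarily the exact loss FedKG applies on the server or client. Function names and the default temperature are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax along the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def class_level_cross_entropy(logits, labels):
    """Average the cross-entropy within each class, then across classes.

    Assumption: "class-level" is interpreted as a macro average, so every
    class present in the batch contributes equally regardless of its size.
    """
    nll = -log_softmax(logits)[np.arange(len(labels)), labels]
    per_class = [nll[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(per_class))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-softened KL(teacher || student), scaled by T^2 (generic KD)."""
    t = temperature
    log_p_teacher = log_softmax(teacher_logits / t)
    log_p_student = log_softmax(student_logits / t)
    p_teacher = np.exp(log_p_teacher)
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(kl.mean() * t * t)

# Usage: three humans (label 0) and one bot (label 1).
logits = np.array([[2.0, 0.0], [1.5, 0.5], [0.2, 1.0], [3.0, -1.0]])
labels = np.array([0, 0, 1, 0])
print(class_level_cross_entropy(logits, labels))
print(distillation_loss(logits, logits))  # identical student and teacher give 0.0
```

Compared with a plain mean over samples, the macro average gives the single bot sample the same total weight as the three human samples combined.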
2. Related Work
2.1. Social Bot Detection
2.2. Federated Learning
2.3. Federated Social Bot Detection
3. Materials and Methods
3.1. Problem Definition
3.2. FedKG Social Bot Detection Framework
3.2.1. Social Bot Detection Model Based on RGCN Graph Representation Approach
- User feature coding
- Feature extractor for user feature representation
- Classifier for user classification
- Learning and optimization
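The four components listed above can be sketched end to end in NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation. Numerical features (e.g. `followers`, `statuses`) are z-score normalised and boolean features (e.g. `verified`) are kept as 0/1. A single RGCN layer follows the mean-normalised propagation rule of Schlichtkrull et al., h_i' = ReLU(W_0 h_i + Σ_r Σ_{j ∈ N_i^r} W_r h_j / |N_i^r|), and a linear softmax layer produces human/bot probabilities. The helper names, the choice of two relations (for example following and follower edges), and the layer sizes are hypothetical; the learning step would then minimise a cross-entropy loss over these probabilities.

```python
import numpy as np

def encode_users(numerical, boolean):
    """User feature coding (sketch): z-score count features, keep booleans as 0/1."""
    num = np.asarray(numerical, dtype=float)
    std = num.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    num = (num - num.mean(axis=0)) / std
    return np.concatenate([num, np.asarray(boolean, dtype=float)], axis=1)

def rgcn_layer(h, edges_by_relation, w_self, w_rel):
    """One RGCN layer with mean normalisation per relation.

    h: (n, d_in) node features
    edges_by_relation: list of (src, dst) index arrays, one pair per relation
    w_self: (d_in, d_out) self-loop weight W_0
    w_rel: list of (d_in, d_out) weights W_r, one per relation
    """
    out = h @ w_self
    for (src, dst), w in zip(edges_by_relation, w_rel):
        msg = np.zeros((h.shape[0], w.shape[1]))
        np.add.at(msg, dst, h[src] @ w)  # sum transformed messages at each target node
        deg = np.bincount(dst, minlength=h.shape[0]).astype(float)
        out += msg / np.maximum(deg, 1.0)[:, None]  # divide by |N_i^r|
    return np.maximum(out, 0.0)  # ReLU

def classify(h, w_out):
    """Linear classifier followed by a softmax over {human, bot}."""
    z = h @ w_out
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Usage with three toy users: two count features and two boolean features each.
rng = np.random.default_rng(0)
x = encode_users([[10, 200], [5000, 3], [7, 150]], [[0, 1], [1, 0], [0, 0]])
following = (np.array([0, 1]), np.array([1, 2]))  # edges 0->1, 1->2
follower = (np.array([1, 2]), np.array([0, 1]))   # reversed edges
w_self = rng.normal(size=(4, 8))
h = rgcn_layer(x, [following, follower], w_self, [rng.normal(size=(4, 8)) for _ in range(2)])
probs = classify(h, rng.normal(size=(8, 2)))
print(probs)
```

Keeping a separate weight per relation is what lets the model treat, say, "follows" and "is followed by" edges differently, which a plain GCN would not.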
3.2.2. Knowledge Distillation-Optimized Federated Learning Framework
4. Experiments and Analysis
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Baselines
- Local: A detection model trained only on local private data, without any information exchange between participants;
- FedAvg: A basic federated learning algorithm in which clients share model parameters with the server, and the server averages them weighted by each client's number of samples;
- FedProx: A federated learning method that adds a regularization term to the client loss function to reduce the distance between the local and global models;
- FedDistill: A data-free federated knowledge distillation method in which clients share the average logit vector for each label. Because the original method shares no model parameters, its performance drops significantly; for a fair comparison, we modify it to share per-label logit averages on top of shared model parameters;
- FedACK: A federated social bot detection method that uses a GAN-based federated adversarial contrastive knowledge distillation mechanism. Its local detection model does not consider relationships between users, so its detection performance lags well behind that of graph-based methods; for comparison, we replace its local social bot detection model with our local detection framework.
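The rules that distinguish the parameter-sharing baselines can be summarised in a short NumPy sketch that follows the descriptions above: FedAvg weights each client's parameters by its sample count, FedProx adds a proximal term (μ/2)‖w − w_global‖² to the local loss, and FedDistill clients report an average logit vector per label. Parameters are flattened vectors here for simplicity. The function names and the default `mu` are hypothetical, and the real methods operate on full model state inside training loops.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """FedAvg server step: average client parameter vectors weighted by sample count."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * np.asarray(p, dtype=float) for w, p in zip(weights, client_params))

def fedprox_objective(local_loss, local_params, global_params, mu=0.01):
    """FedProx client objective: local loss + (mu / 2) * ||w - w_global||^2."""
    diff = np.ravel(np.asarray(local_params, dtype=float) - np.asarray(global_params, dtype=float))
    return local_loss + 0.5 * mu * float(diff @ diff)

def label_logit_averages(logits, labels, num_classes):
    """FedDistill-style client summary: mean logit vector for each true label."""
    avg = np.zeros((num_classes, logits.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            avg[c] = logits[mask].mean(axis=0)
    return avg

# Usage: the client with 300 samples pulls the average three times as hard.
print(fedavg([np.array([1.0, 2.0]), np.array([3.0, 4.0])], [100, 300]))  # [2.5 3.5]
print(fedprox_objective(0.3, [1.0, 1.0], [0.0, 0.0], mu=0.1))
print(label_logit_averages(np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]), np.array([0, 0, 1]), 2))
```

The proximal term leaves FedAvg's aggregation unchanged and only penalises client drift during local training, which is why FedProx tends to help most when client data are strongly heterogeneous.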
4.1.3. Data Heterogeneity
4.1.4. Implementation Details
4.2. Performance Comparison
4.3. Communication Efficiency Comparison
4.4. Impact of Local Epochs
4.5. Impact of the Number of Clients
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Abokhodair, N.; Yoo, D.; McDonald, D.W. Dissecting a social botnet: Growth, content and influence in Twitter. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, Vancouver, BC, Canada, 14–18 March 2015; pp. 839–851.
- Ferrara, E.; Wang, W.Q.; Varol, O.; Flammini, A.; Galstyan, A. Predicting online extremism, content adopters, and interaction reciprocity. In Proceedings of the Social Informatics, 8th International Conference, SocInfo 2016, Bellevue, WA, USA, 11–14 November 2016; Part II. Springer: Berlin/Heidelberg, Germany, 2016; pp. 22–39.
- Berger, J.; Morgan, J. Defining and Describing the Population of ISIS Supporters on Twitter. 2015. Available online: http://www.Brookings.Edu/research/papers/2015/03/isis-Twitter (accessed on 1 December 2022).
- Cresci, S. A decade of social bot detection. Commun. ACM 2020, 63, 72–83.
- Yang, Y.; Yang, R.; Peng, H.; Li, Y.; Li, T.; Liao, Y.; Zhou, P. FedACK: Federated Adversarial Contrastive Knowledge Distillation for Cross-Lingual and Cross-Model Social Bot Detection. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1314–1323.
- Peng, H.; Zhang, Y.; Sun, H.; Bai, X.; Li, Y.; Wang, S. Domain-Aware Federated Social Bot Detection with Multi-Relational Graph Neural Networks. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: New York, NY, USA, 2022; pp. 1–8.
- McMahan, H.; Moore, E.; Ramage, D.; Arcas, B. Federated learning of deep networks using model averaging. arXiv 2016, arXiv:1602.05629.
- Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-IID data. arXiv 2018, arXiv:1806.00582.
- Jeong, E.; Oh, S.; Kim, H.; Park, J.; Bennis, M.; Kim, S.L. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data. arXiv 2018, arXiv:1811.11479.
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble distillation for robust model fusion in federated learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2351–2363.
- Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 593–607.
- Yardi, S.; Romero, D.; Schoenebeck, G. Detecting spam in a Twitter network. First Monday 2010, 15, 1–4.
- Varol, O.; Ferrara, E.; Davis, C.; Menczer, F.; Flammini, A. Online human-bot interactions: Detection, estimation, and characterization. In Proceedings of the International AAAI Conference on Web and Social Media, Montreal, QC, Canada, 15–18 May 2017; Volume 11, pp. 280–289.
- Yang, K.C.; Varol, O.; Hui, P.M.; Menczer, F. Scalable and generalizable social bot detection through data selection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 1096–1103.
- Kantepe, M.; Ganiz, M.C. Preprocessing framework for Twitter bot detection. In Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 5–8 October 2017; IEEE: New York, NY, USA, 2017; pp. 630–634.
- Kudugunta, S.; Ferrara, E. Deep neural networks for bot detection. Inf. Sci. 2018, 467, 312–322.
- Wei, F.; Nguyen, U.T. Twitter bot detection using bidirectional long short-term memory neural networks and word embeddings. In Proceedings of the 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Los Angeles, CA, USA, 12–14 December 2019; IEEE: New York, NY, USA, 2019; pp. 101–109.
- Stanton, G.; Irissappane, A.A. GANs for semi-supervised opinion spam detection. arXiv 2019, arXiv:1903.08289.
- Feng, S.; Wan, H.; Wang, N.; Li, J.; Luo, M. SATAR: A self-supervised approach to Twitter account representation learning and its application in bot detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Online, 1–5 November 2021; pp. 3808–3817.
- Hayawi, K.; Mathew, S.; Venugopal, N.; Masud, M.M.; Ho, P.H. DeeProBot: A hybrid deep neural network model for social bot detection based on user profile data. Soc. Netw. Anal. Min. 2022, 12, 43.
- Arin, E.; Kutlu, M. Deep learning based social bot detection on Twitter. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1763–1772.
- Ali Alhosseini, S.; Bin Tareaf, R.; Najafi, P.; Meinel, C. Detect me if you can: Spam bot detection using inductive representation learning. In Companion Proceedings of the 2019 World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 148–153.
- Feng, S.; Wan, H.; Wang, N.; Luo, M. BotRGCN: Twitter bot detection with relational graph convolutional networks. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, The Hague, The Netherlands, 7–10 December 2021; pp. 236–239.
- Feng, S.; Tan, Z.; Li, R.; Luo, M. Heterogeneity-aware Twitter bot detection with relational graph transformers. Proc. AAAI Conf. Artif. Intell. 2022, 36, 3977–3985.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
- Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 2020, 10, 12598.
- Aggarwal, D.; Zhou, J.; Jain, A.K. FedFace: Collaborative learning of face recognition model. In Proceedings of the 2021 IEEE International Joint Conference on Biometrics (IJCB), Shenzhen, China, 4–7 August 2021; IEEE: New York, NY, USA, 2021; pp. 1–8.
- Zhou, P.; Wang, K.; Guo, L.; Gong, S.; Zheng, B. A privacy-preserving distributed contextual federated online learning framework with big data support in social recommender systems. IEEE Trans. Knowl. Data Eng. 2019, 33, 824–838.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 5132–5143.
- Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Poor, H.V. Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 7611–7623.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
- Acar, D.A.E.; Zhao, Y.; Navarro, R.M.; Mattina, M.; Whatmough, P.N.; Saligrama, V. Federated learning based on dynamic regularization. arXiv 2021, arXiv:2111.04263.
- Li, Q.; He, B.; Song, D. Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 10713–10722.
- Seo, H.; Park, J.; Oh, S.; Bennis, M.; Kim, S.L. Federated knowledge distillation. In Machine Learning and Wireless Communications; Cambridge University Press: Cambridge, UK, 2022; p. 457.
- Rasouli, M.; Sun, T.; Rajagopal, R. FedGAN: Federated generative adversarial networks for distributed data. arXiv 2020, arXiv:2006.07228.
- Zhu, Z.; Hong, J.; Zhou, J. Data-free knowledge distillation for heterogeneous federated learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 12878–12889.
- Zhang, L.; Shen, L.; Ding, L.; Tao, D.; Duan, L.Y. Fine-tuning global model via data-free knowledge distillation for non-IID federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10174–10183.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
- Liu, F.; Ma, X.; Wu, J.; Yang, J.; Xue, S.; Beheshti, A.; Zhou, C.; Peng, H.; Sheng, Q.Z.; Aggarwal, C.C. DAGAD: Data augmentation for graph anomaly detection. In Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 28 November–1 December 2022; IEEE: New York, NY, USA, 2022; pp. 259–268.
- Feng, S.; Wan, H.; Wang, N.; Li, J.; Luo, M. TwiBot-20: A comprehensive Twitter bot detection benchmark. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Online, 1–5 November 2021; pp. 4485–4494.
- Li, Q.; Diao, Y.; Chen, Q.; He, B. Federated learning on non-IID data silos: An experimental study. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; IEEE: New York, NY, USA, 2022; pp. 965–978.
Feature Name | Description |
---|---|
followers | number of followers |
followings | number of followings |
favorites | number of likes |
statuses | number of statuses |
active_days | number of active days |
screen_name_length | screen name character count |

Feature Name | Description |
---|---|
protected | protected or not |
geo_enabled | geo-location enabled or not |
verified | verified or not |
contributors_enabled | contributors enabled or not |
is_translator | translator or not |
is_translation_enabled | translation enabled or not |
profile_background_tile | background image tiled or not |
profile_user_background_image | has background image or not |
has_extended_profile | has extended profile or not |
default_profile | uses the default profile or not |
default_profile_image | uses the default profile image or not |

Setting | = 1 | = 0.8 | = 0.5 | = 0.3 |
---|---|---|---|---|
local | 86.09 | 74.54 | 64.56 | 55.96 |
FedAvg | 94.72 | 75.95 | 64.75 | 56.13 |
FedProx | 94.97 | 87.11 | 79.37 | 72.65 |
FedDistill | 95.25 | 90.32 | 79.88 | 70.86 |
FedACK | 94.78 | 82.65 | 79.27 | 73.61 |
FedKG-C | 95.03 | 87.72 | 81.64 | 75.76 |
FedKG | 95.10 | 93.43 | 91.12 | 82.48 |

Setting | = 1 (acc = 90%) | = 0.8 (acc = 80%) | = 0.5 (acc = 70%) | = 0.3 (acc = 70%) |
---|---|---|---|---|
FedAvg | 4 | unreached | unreached | unreached |
FedProx | 13 | 13 | 14 | 34 |
FedDistill | 4 | 10 | 24 | 32 |
FedACK | 5 | 3 | 8 | 3 |
FedKG | 2 | 1 | 2 | 1 |

Setting | = 1 | = 0.8 | = 0.5 | = 0.3 |
---|---|---|---|---|
2 | 94.66 | 94.07 | 94.08 | 90.24 |
4 | 94.02 | 93.43 | 91.12 | 82.48 |
6 | 93.82 | 92.77 | 89.94 | 85.63 |
8 | 93.67 | 88.25 | 82.99 | 81.33 |
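The communication-efficiency table appears to count, for each method, the communication rounds needed before global accuracy first reaches the target given in each column header, with "unreached" when the target is never hit within the training budget. A minimal sketch of that bookkeeping follows; the 1-based round count, the `>=` comparison, and the helper name `rounds_to_target` are assumptions, and the accuracy curves in the usage lines are made up for illustration, not taken from the experiments.

```python
def rounds_to_target(accuracy_per_round, target):
    """Return the 1-based round at which accuracy first reaches target, or "unreached".

    Assumption: accuracy_per_round[k] is the global-model accuracy (in %) after
    communication round k + 1, and reaching the target means acc >= target.
    """
    for round_idx, acc in enumerate(accuracy_per_round, start=1):
        if acc >= target:
            return round_idx
    return "unreached"

# Usage with hypothetical accuracy curves:
print(rounds_to_target([62.0, 85.5, 90.3, 91.0], 90.0))  # 3
print(rounds_to_target([50.0, 60.0], 70.0))              # unreached
```

Because only the first crossing counts, a method that reaches the target early and then oscillates below it still scores well on this metric, so it complements rather than replaces the final-accuracy table.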
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, X.; Chen, K.; Wang, K.; Wang, Z.; Zheng, K.; Zhang, J. FedKG: A Knowledge Distillation-Based Federated Graph Method for Social Bot Detection. Sensors 2024, 24, 3481. https://doi.org/10.3390/s24113481