An Attention-Based Graph Neural Network for Spam Bot Detection in Social Networks
Abstract
1. Introduction
2. Related Works
2.1. Spam Bot Detection
2.2. Graph Convolutional Networks
2.3. Attention Mechanism
3. The Proposed Approach
3.1. Formulation of the Problem
3.2. Subgraph Construction
3.2.1. User–Follow Subgraph
3.2.2. User–Retweet Subgraph
3.3. GAT-Based Spam Bot Detection Model
Algorithm 1. The main process of our approach
Input: the user–follow and user–retweet subgraphs; the node feature vectors; the number of attention heads
Output: the prediction labels vector
1: for do
2: for do
3: for do
4: Compute the weight coefficient in the user–follow subgraph using Equation (5)
5: Compute the weight coefficient in the user–retweet subgraph using Equation (5)
6: end for
7: Compute the node embedding using Equation (7) with the weight coefficients
8: Average the learned embeddings from all attention heads using Equation (8)
9: end for
10: Compute the cross-entropy loss using Equation (9) and perform back-propagation
11: Update the model parameters and the prediction labels vector
12: return the prediction labels vector
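
Steps 4 to 8 of Algorithm 1 can be illustrated with a small sketch. Equations (5), (7), and (8) are not reproduced in this text, so the code below assumes the standard graph attention formulation of GAT [22]: a LeakyReLU score on concatenated projected features, a softmax over the neighbors, and an average over attention heads. It handles a single graph only; how the coefficients from the user–follow and user–retweet subgraphs are combined is not shown here. The function names, the sigmoid output nonlinearity, and the toy values are illustrative assumptions.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def attention_coefficients(i, neighbors, H, W, a):
    """Softmax-normalised attention of node i over its neighbors (standard GAT form, assumed)."""
    Wh_i = matvec(W, H[i])
    scores = []
    for j in neighbors:
        concat = Wh_i + matvec(W, H[j])  # list concatenation [W h_i || W h_j]
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_node_embedding(i, neighbors, H, heads):
    """Average the attention-weighted neighbor sums over all heads, then apply a sigmoid."""
    out_dim = len(heads[0][0])
    acc = [0.0] * out_dim
    for W, a in heads:
        alpha = attention_coefficients(i, neighbors, H, W, a)
        for coeff, j in zip(alpha, neighbors):
            Wh_j = matvec(W, H[j])
            for d in range(out_dim):
                acc[d] += coeff * Wh_j[d]
    return [1.0 / (1.0 + math.exp(-v / len(heads))) for v in acc]

# Toy data (made up): three nodes with 2-dimensional features; node 0 attends to itself, 1 and 2.
H = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = [(identity, [0.0, 0.0, 0.0, 0.0]), (identity, [0.5, -0.5, 1.0, 0.0])]

alpha_uniform = attention_coefficients(0, [0, 1, 2], H, identity, [0.0] * 4)
embedding = gat_node_embedding(0, [0, 1, 2], H, heads)
```

With an all-zero attention vector every neighbor receives the same score, so the coefficients are uniform; a non-zero vector lets the layer weight neighbors differently.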
4. Experiments
4.1. Dataset
4.2. Evaluation Metrics
- (1) Recall
- (2) Precision
- (3) F1-score
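
The three metrics above can be computed from true and predicted labels as in the following minimal sketch. It uses the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP), F1 = harmonic mean of the two) and assumes the spam bot class is encoded as the positive label 1; the function name and the toy labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (made up): 1 = spam bot, 0 = legitimate account.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this toy case there are three true positives, one false positive, and one false negative, so all three metrics equal 0.75.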
4.3. Compared Methods
- (1) MLP: A multilayer perceptron (MLP) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors; it is trained on the defined feature set.
- (2) BP [34]: Belief propagation (BP) is an approximate inference algorithm on a Markov random field (MRF), in which messages are passed iteratively between nodes in the graph and the label of each node is inferred from its own prior knowledge and that of its adjacent nodes.
- (3) RF [35]: A random forest classifier composed of multiple decision trees; the output class is the mode of the classes predicted by the individual trees.
- (4) GCN [20]: A semi-supervised graph convolutional network designed for graph-structured data. Edge information is used to aggregate neighboring nodes and generate a new representation for each node.
- (5) GraphSAGE [21]: GraphSAGE extends GCN to the inductive setting by learning a function (the convolution layer) that aggregates the neighbors of each node, which allows it to generalize to unseen nodes.
- (6) GAT [22]: A semi-supervised neural network with a graph attention mechanism. Attention is used when aggregating neighbor nodes, so that different neighbors are adaptively assigned different weights.
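
To make the contrast between GCN and GAT concrete, the sketch below performs one GCN-style aggregation step in which the neighbor weights are fixed by node degrees rather than learned by attention. It follows the symmetric normalization with self-loops of Kipf and Welling [20], omits the learned weight matrix and the nonlinearity, and assumes an undirected graph given as a symmetric adjacency dictionary; the function name and toy graph are illustrative.

```python
import math

def gcn_aggregate(adj, H):
    """One propagation step: h_i' = sum_j h_j / sqrt(d_i * d_j), with self-loops added."""
    nbrs = {i: set(adj[i]) | {i} for i in adj}   # add a self-loop to every node
    deg = {i: len(nbrs[i]) for i in adj}          # degree including the self-loop
    out = {}
    for i in adj:
        agg = [0.0] * len(H[i])
        for j in nbrs[i]:
            weight = 1.0 / math.sqrt(deg[i] * deg[j])  # fixed, not learned
            for d in range(len(agg)):
                agg[d] += weight * H[j][d]
        out[i] = agg
    return out

# Toy path graph 0 - 1 - 2 with one scalar feature per node (made-up values).
adj = {0: [1], 1: [0, 2], 2: [1]}
H = {0: [1.0], 1: [2.0], 2: [3.0]}
aggregated = gcn_aggregate(adj, H)
```

Every neighbor weight here depends only on the graph structure, whereas an attention layer computes the weights from the node features.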
4.4. Parameter Settings
4.5. Result Analysis
4.5.1. Comparison with Baselines
4.5.2. Comparison with State-of-the-Art Methods
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Adewole, K.S.; Anuar, N.B.; Kamsin, A.; Varathan, K.D.; Razak, S.A. Malicious accounts: Dark of the social networks. J. Netw. Comput. Appl. 2017, 79, 41–67.
- Wei, F.; Nguyen, U.T. Twitter Bot Detection Using Bidirectional Long Short-Term Memory Neural Networks and Word Embeddings. In Proceedings of the 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Los Angeles, CA, USA, 12–14 December 2019; pp. 101–109.
- Yang, C.; Harkreader, R.; Gu, G. Empirical Evaluation and New Design for Fighting Evolving Twitter Spammers. IEEE Trans. Inf. Forensics Secur. 2013, 8, 1280–1293.
- Singh, M.; Bansal, D.; Sofat, S. Detecting Malicious Users in Twitter using Classifiers. In Proceedings of the 7th International Conference on Security of Information and Networks-SIN ’14, Glasgow, UK, 9–11 September 2014; ACM Press: Glasgow, UK, 2014; pp. 247–253.
- VanDam, C.; Tan, P.-N. Detecting hashtag hijacking from Twitter. In Proceedings of the 8th ACM Conference on Web Science, Hannover, Germany, 22–25 May 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 370–371.
- Varol, O.; Ferrara, E.; Davis, C.A.; Menczer, F.; Flammini, A. Online Human-Bot Interactions: Detection, Estimation, and Characterization. arXiv 2017, arXiv:1703.03107.
- Chen, Z.; Subramanian, D. An Unsupervised Approach to Detect Spam Campaigns that Use Botnets on Twitter. arXiv 2018, arXiv:1804.05232.
- Cresci, S.; Di Pietro, R.; Petrocchi, M.; Spognardi, A.; Tesconi, M. The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2017; pp. 963–972.
- Kudugunta, S.; Ferrara, E. Deep neural networks for bot detection. Inf. Sci. 2018, 467, 312–322.
- Yang, C.; Harkreader, R.; Zhang, J.; Shin, S.; Gu, G. Analyzing spammers’ social networks for fun and profit: A case study of cyber criminal ecosystem on twitter. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; Association for Computing Machinery: New York, NY, USA, 2012; pp. 71–80.
- Cresci, S.; Di Pietro, R.; Petrocchi, M.; Spognardi, A.; Tesconi, M. DNA-Inspired Online Behavioral Modeling and Its Application to Spambot Detection. IEEE Intell. Syst. 2016, 31, 58–64.
- Loyola-González, O.; Monroy, R.; Rodríguez, J.; López-Cuevas, A.; Mata-Sánchez, J.I. Contrast Pattern-Based Classification for Bot Detection on Twitter. IEEE Access 2019, 7, 45800–45817.
- Davis, C.A.; Varol, O.; Ferrara, E.; Flammini, A.; Menczer, F. BotOrNot: A System to Evaluate Social Bots. In Proceedings of the 25th International Conference Companion on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2016; pp. 273–274.
- Li, C.; Wang, S.; He, L.; Yu, P.S.; Liang, Y.; Li, Z. SSDMV: Semi-Supervised Deep Social Spammer Detection by Multi-view Data Fusion. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 247–256.
- Jia, J.; Wang, B.; Gong, N.Z. Random Walk Based Fake Account Detection in Online Social Networks; IEEE: Denver, CO, USA, 2017; pp. 273–284.
- Wang, B.; Zhang, L.; Gong, N.Z. SybilSCAR: Sybil detection in online social networks via local rule based propagation. In Proceedings of the IEEE INFOCOM 2017-IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9.
- El-Mawass, N.; Honeine, P.; Vercouter, L. Supervised Classification of Social Spammers using a Similarity-based Markov Random Field Approach. In Proceedings of the 5th Multidisciplinary International Social Networks Conference, Saint-Etienne, France, 16–18 July 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–8.
- Mulamba, D.; Ray, I.; Ray, I. On Sybil Classification in Online Social Networks Using Only Structural Features. In Proceedings of the 2018 16th Annual Conference on Privacy, Security and Trust (PST), Belfast, UK, 28–30 August 2018; pp. 1–10.
- Wu, Y.; Lian, D.; Xu, Y.; Wu, L.; Chen, E. Graph Convolutional Networks with Markov Random Field Reasoning for Social Spammer Detection. Proc. AAAI Conf. Artif. Intell. 2020, 34, 1054–1061.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907.
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; pp. 1024–1034.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903.
- Zhao, H.; Yao, Q.; Li, J.; Song, Y.; Lee, D.L. Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 635–644.
- Wang, J.; Huang, P.; Zhao, H.; Zhang, Z.; Zhao, B.; Lee, D.L. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 839–848.
- Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 974–983.
- Grbovic, M.; Cheng, H. Real-time Personalization using Embeddings for Search Ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 311–320.
- Liu, Z.; Chen, C.; Yang, X.; Zhou, J.; Li, X.; Song, L. Heterogeneous Graph Neural Networks for Malicious Account Detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 2077–2085.
- Ali Alhosseini, S.; Bin Tareaf, R.; Najafi, P.; Meinel, C. Detect Me If You Can: Spam Bot Detection Using Inductive Representation Learning. In Proceedings of the 2019 World Wide Web Conference on-WWW ’19, San Francisco, CA, USA, 19–21 May 2019; ACM Press: San Francisco, CA, USA, 2019; pp. 148–153.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; pp. 5998–6008.
- Cha, M.; Haddadi, H.; Benevenuto, F.; Gummadi, K.P. Measuring User Influence in Twitter: The Million Follower Fallacy. ICWSM 2010, 10, 30.
- Ribeiro, M.H.; Calais, P.H.; Santos, Y.A.; Almeida, V.A.F.; Meira, W., Jr. Characterizing and Detecting Hateful Users on Twitter. arXiv 2018, arXiv:1803.08977.
- Ribeiro, M.H.; Calais, P.H.; Santos, Y.A.; Almeida, V.A.F.; Meira, W., Jr. “Like Sheep among Wolves”: Characterizing Hateful Users on Twitter. arXiv 2018, arXiv:1801.00317.
- Zhao, C.; Xin, Y.; Li, X.; Yang, Y.; Chen, Y. A Heterogeneous Ensemble Learning Framework for Spam Detection in Social Networks with Imbalanced Data. Appl. Sci. 2020, 10, 936.
- Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Elsevier: Amsterdam, The Netherlands, 2014; ISBN 978-0-08-051489-5.
- Fu, H.; Xie, X.; Rui, Y.; Gong, N.Z.; Sun, G.; Chen, E. Robust Spammer Detection in Microblogs: Leveraging User Carefulness. ACM Trans. Intell. Syst. Technol. 2017, 8, 83:1–83:31.
- Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; Association for Computing Machinery: New York, NY, USA, 2006; pp. 233–240.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Feature | Description |
---|---|
account_age | The number of days since the account was created on Twitter |
no_followers | The number of followers for an account on Twitter |
no_followings | The number of followings for an account on Twitter |
no_userfavourites | The number of favorites received by an account on Twitter |
no_statuses | The number of tweets posted by the account on Twitter, including retweets |
no_tweets | The number of tweets posted by the account on Twitter |
no_retweets | The number of tweets retweeted |
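
The seven features in the table above can be gathered into a per-account feature vector, as in the following minimal sketch. The field names come from the table; the class name, the `log1p` scaling of the raw counts, and the example values are assumptions made for illustration and are not taken from the source.

```python
import math
from dataclasses import dataclass, astuple

@dataclass
class AccountFeatures:
    """Raw account features, named as in the feature table."""
    account_age: int
    no_followers: int
    no_followings: int
    no_userfavourites: int
    no_statuses: int
    no_tweets: int
    no_retweets: int

    def to_vector(self):
        # log1p compresses heavy-tailed counts; this scaling is an assumption,
        # not a preprocessing step documented in the source.
        return [math.log1p(value) for value in astuple(self)]

# Made-up example account.
example = AccountFeatures(
    account_age=365,
    no_followers=120,
    no_followings=300,
    no_userfavourites=45,
    no_statuses=980,
    no_tweets=700,
    no_retweets=280,
)
vector = example.to_vector()
```

Each such vector would serve as the initial feature of the corresponding user node in the graph.
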
Method | Recall | Precision | F1-Score |
---|---|---|---|
RF | 0.66 | 0.88 | 0.76 |
MLP | 0.73 | 0.81 | 0.77 |
BP | 0.54 | 0.56 | 0.55 |
Our Approach | 0.88 | 0.93 | 0.91 |
Method | Recall | Precision | F1-Score |
---|---|---|---|
GCN | 0.76 | 0.87 | 0.81 |
GraphSAGE | 0.80 | 0.88 | 0.84 |
GAT | 0.83 | 0.87 | 0.85 |
Our Approach | 0.88 | 0.93 | 0.91 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhao, C.; Xin, Y.; Li, X.; Zhu, H.; Yang, Y.; Chen, Y. An Attention-Based Graph Neural Network for Spam Bot Detection in Social Networks. Appl. Sci. 2020, 10, 8160. https://doi.org/10.3390/app10228160