Link Prediction in Dynamic Social Networks Combining Entropy, Causality, and a Graph Convolutional Network Model
Abstract
1. Introduction
1. We design a novel fusion framework that combines entropy, causality, and a GCN for link prediction in dynamic social networks;
2. We propose the concept of Temporal Information Entropy (TIE), which is used as a weighting factor in the Node2Vec random walk. We then introduce an improved Node2Vec algorithm for feature generation, enabling analysis of dynamic social networks from both temporal and structural perspectives;
3. We construct a causality analysis model and use it to process the generated feature vectors, weighting the influence of the current node's features by their causal strength;
4. We train the GCN model with a specific optimizer and a dynamic learning rate, which captures network characteristics better and yields higher performance;
5. We conduct repeated experiments on different datasets and compare the performance of the proposed fusion framework with that of other models.
2. Methods
2.1. Feature Generation Based on Improved Node2Vec
2.1.1. Temporal Information Entropy (TIE)
1. For each node, sequentially record the timestamp difference between time $i$ and time $i+1$ and denote it as $\Delta t_i$;
2. For each node, sum the timestamp differences and denote the sum as $T$;
3. For each node, calculate the probability of each timestamp difference, as shown in Equation (1). Because interactions rarely occur at identical time intervals in the real world, each timestamp difference of a node is treated as a unique scenario, and its share of the sum of all timestamp differences is used as its probability in the subsequent steps. Here $p_i$ is the probability of the node's timestamp difference occurring, $\Delta t_i$ is the node's timestamp difference between time $i$ and time $i+1$, and $T$ is the sum of the node's timestamp differences:

   $$p_i = \frac{\Delta t_i}{T} \tag{1}$$
4. For each node, compute the Shannon entropy terms from the above probabilities and sum them. This gives the initial value of the node's temporal information entropy, as shown in Equation (2), where $H$ is the initial, non-normalized Temporal Information Entropy:

   $$H = -\sum_{i} p_i \log p_i \tag{2}$$
5. Standardize the initial entropy of each node to obtain the final TIE, as given by Equation (3), where $TIE$ is each node's TIE and $N$ is the total number of nodes in the current network.
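Steps 1 to 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `raw_temporal_entropy`, the `(u, v, timestamp)` event layout, and the use of the natural logarithm are assumptions, and the standardization of step 5 is left out because its exact form depends on Equation (3).

```python
import math
from collections import defaultdict

def raw_temporal_entropy(events):
    """Return the non-normalized temporal entropy of every node.

    `events` is an iterable of (u, v, timestamp) interactions; this input
    layout is an assumption made for the sketch.
    """
    timestamps = defaultdict(list)
    for u, v, t in events:
        timestamps[u].append(t)
        timestamps[v].append(t)

    entropy = {}
    for node, ts in timestamps.items():
        ts.sort()
        # Step 1: differences between consecutive timestamps. Zero gaps are
        # skipped because they would contribute p = 0 to the entropy.
        gaps = [b - a for a, b in zip(ts, ts[1:]) if b > a]
        # Step 2: sum of the differences.
        total = sum(gaps)
        if total == 0:
            entropy[node] = 0.0  # fewer than two distinct timestamps
            continue
        # Step 3: each difference's share of the total, used as a probability.
        probs = [g / total for g in gaps]
        # Step 4: Shannon entropy (natural logarithm assumed).
        entropy[node] = -sum(p * math.log(p) for p in probs)
    return entropy

# Hypothetical toy interactions, for illustration only.
events = [("a", "b", 0), ("a", "c", 2), ("a", "b", 6), ("b", "c", 7)]
print(raw_temporal_entropy(events))
```

A node whose interactions are spread over several different intervals gets a higher value than a node with a single interval, whose entropy is zero.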
2.1.2. Feature Generation of Combining Node2Vec and TIE
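One way to picture TIE acting as a weighting factor in the Node2Vec random walk is sketched below. The return parameter `p` and in-out parameter `q` follow the standard node2vec second-order walk. Multiplying the node2vec bias by the TIE of the candidate next node is an assumption of this sketch, as are the function name `tie_biased_walk` and the small `eps` that keeps zero-TIE nodes reachable. The paper's own weighting rule may differ.

```python
import random

def tie_biased_walk(adj, tie, start, length, p=1.0, q=1.0, rng=None, eps=1e-9):
    """Second-order random walk whose transition weights are scaled by TIE.

    adj: dict mapping node -> list of neighbors.
    tie: dict mapping node -> TIE value (assumed to be non-negative).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break  # dead end: stop the walk early
        prev = walk[-2] if len(walk) > 1 else None
        weights = []
        for nxt in nbrs:
            if prev is None:
                alpha = 1.0        # first step: no second-order bias yet
            elif nxt == prev:
                alpha = 1.0 / p    # return to the previous node
            elif nxt in adj[prev]:
                alpha = 1.0        # stay at distance 1 from the previous node
            else:
                alpha = 1.0 / q    # move outward, away from the previous node
            # Assumed weighting: node2vec bias times the TIE of the next node.
            weights.append(alpha * (tie.get(nxt, 0.0) + eps))
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical toy graph and TIE values, for illustration only.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
tie = {"a": 0.6, "b": 0.4, "c": 0.9, "d": 0.1}
print(tie_biased_walk(adj, tie, "a", 8, p=2.0, q=0.5, rng=random.Random(0)))
```

The walks produced this way would then be fed to a Skip-gram model, as in the original Node2Vec, to obtain the feature vectors.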
2.2. Feature Processing Based on a Causality Analysis
1. Draw a network graph based on the data;
2. Analyze node correlation. First, introduce the concept of mixed centrality, which combines degree centrality, closeness centrality, and eigenvector centrality. Second, set a threshold and select the nodes whose mixed centrality exceeds it as the current nodes for the correlation analysis;
3. Set up counterfactual experiments for the causality analysis. First, hypothesize that a node selected in step 2 is the “cause”. Second, carry out a path analysis between this current node and the other nodes in the network, referred to as target nodes, examining the paths sequentially. If more paths contain the target node, deleting the current node would change the target node, which confirms the current node as the “cause” of that target node. Otherwise, there is no causal relationship between the current node and the target node. Third, count the number of times the current node acts as a “cause”; this count is its causal strength;
4. Standardize the causal strength obtained from the causal analysis and incorporate it into the existing feature vector generation method, as given by Equations (8) and (9). These equations combine the standardized causal strength with the feature vectors generated by the improved Node2Vec, where $v$ denotes the feature vectors after processing based on the causal analysis and $c$ is a constant (default 0.1) that covers the case where the causal strength is 0.
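The four steps can be illustrated with a simplified sketch. Several parts are assumptions rather than the paper's definitions: degree centrality stands in for the mixed centrality of step 2, a target counts as affected when deleting the current node changes the set of nodes it can reach, the standardization is min-max scaling, and the final weighting takes the form `(strength + c) * feature`. All function names are hypothetical.

```python
from collections import deque

def reachable(adj, source, removed=None):
    """Nodes reachable from `source` by BFS, ignoring the `removed` node."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr != removed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

def causal_strength(adj, threshold):
    """Count, per candidate node, the targets affected by deleting it."""
    n = len(adj)
    strength = {node: 0 for node in adj}
    for cand in adj:
        # Assumption: degree centrality replaces the mixed centrality.
        if len(adj[cand]) / max(n - 1, 1) <= threshold:
            continue
        for target in adj:
            if target == cand:
                continue
            before = reachable(adj, target) - {cand}
            after = reachable(adj, target, removed=cand)
            if after != before:  # counterfactual deletion changed the target
                strength[cand] += 1
    return strength

def weight_features(features, strength, c=0.1):
    """Scale each feature vector by (min-max standardized strength + c)."""
    lo, hi = min(strength.values()), max(strength.values())
    span = (hi - lo) or 1  # avoid division by zero when all strengths match
    return {
        node: [((strength[node] - lo) / span + c) * x for x in vec]
        for node, vec in features.items()
    }

# Hypothetical toy graph: b and d are the only nodes whose removal splits it.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
strength = causal_strength(adj, threshold=0.4)
print(strength)
print(weight_features({"a": [1.0, 2.0], "b": [1.0, 2.0]}, strength))
```

The constant `c` keeps the features of nodes with zero causal strength from being zeroed out, which matches the stated purpose of the default value 0.1.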
2.3. Training Based on Specific GCN Models
3. Experiments
3.1. Dataset Introduction
3.2. Experimental Settings
3.3. Benchmark Models
1. Deep Autoencoder: A Deep Autoencoder is an artificial neural network model used for unsupervised learning and dimensionality reduction. It consists of an encoder network that maps the input data to a low-dimensional latent space and a decoder network that reconstructs the input data from the latent representation. Here $z$ denotes the output of the first part, the encoder, as given by Equation (10), and the output of the second part, the decoder, is given by Equation (11). The link prediction process of the trained Deep Autoencoder is given by Equation (12);
2. GraphSAGE (Graph Sample and Aggregate): GraphSAGE is a graph neural network architecture that captures graph structure information by sampling and aggregating information from each node's neighborhood. Equation (13) gives the output of node $v$ at a given layer in terms of the neighbors of $v$. The link prediction process of the trained GraphSAGE model is given by Equation (14);
3. Graph Convolutional Networks (GCNs): A GCN is a graph neural network designed for graph-structured data that learns by aggregating information from adjacent nodes. Equation (15) gives the output of node $v$ at a given layer. The link prediction process of the trained GCN is given by Equation (16).
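To make the neighbor aggregation of the GCN baseline concrete, the sketch below implements one layer of the widely used propagation rule, ReLU of the symmetrically normalized adjacency matrix (with self-loops) times the features times a weight matrix, in plain Python. The dot-product link score with a sigmoid is an assumed decoder chosen for illustration and is not necessarily the form of Equation (16). The weights are hypothetical fixed numbers, not trained values.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(a_norm, h, w):
    """One GCN layer: aggregate neighbor features, project, apply ReLU."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, value) for value in row] for row in z]

def link_score(h, u, v):
    """Assumed decoder: sigmoid of the dot product of two node embeddings."""
    dot = sum(x * y for x, y in zip(h[u], h[v]))
    return 1.0 / (1.0 + math.exp(-dot))

# Hypothetical 3-node path graph 0 - 1 - 2 with one-hot input features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # untrained, illustrative

hidden = gcn_layer(normalized_adjacency(adj), features, weights)
print(link_score(hidden, 0, 2))
```

In a real model the weights are learned by minimizing a loss over known positive and negative node pairs, and several such layers are stacked.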
3.4. Evaluation Metrics
4. Results
4.1. Sensitivity to the Positive and Negative Sample Ratio
4.2. Email Dataset
4.3. CollegeMsg Dataset
4.4. Hypertext Dataset
4.5. Confidence Interval Analysis
4.6. Complexity Analysis
- 1.
1. Feature generation based on the improved Node2Vec: Most of the time in this part is spent on the iterative training of the Skip-gram model that generates the features. The complexity depends on the number of edges $E$, the random walk length $L$, the feature vector dimension $D$, the number of random walks $W$, and the number of Skip-gram training iterations;
2. Feature processing based on a causality analysis: Most of the time in this part is spent on determining, from the paths, whether causality exists between nodes. The complexity depends on the number of nodes $N$, the number of paths, and the path length;
3. Training based on specific GCN models: Most of the time in this part is spent on the iterative training of the GCN model for link prediction. The complexity depends on the feature vector dimension $D$, the number of edges $E$, the number of convolution layers, the number of GCN training iterations, and the number of repeated experiments $K$.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
MDPI | Multidisciplinary Digital Publishing Institute |
GCN | Graph Convolutional Network |
TIE | Temporal Information Entropy |
DFS | Depth-First Search |
BFS | Breadth-First Search |
Name | Nodes | Edges | Description |
---|---|---|---|
Email | 986 | 332,334 | Research emails between institutional users. |
CollegeMsg | 1899 | 59,835 | Research messages on platforms like Facebook. |
Hypertext | 113 | 20,818 | Research face-to-face contact between participants. |
Model | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
① | 83.33 | 100.00 | 90.90 | 83.33 |
② | 83.30 | 100.00 | 90.89 | 83.30 |
③ | 83.39 | 100.00 | 90.94 | 83.39 |
④ | 83.36 | 100.00 | 90.93 | 83.36 |
⑤ | 86.42 | 99.25 | 92.33 | 86.16 |
⑥ | 92.37 | 98.05 | 95.12 | 91.61 |
Model | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
① | 83.35 | 100.00 | 90.92 | 83.35 |
② | 83.40 | 100.00 | 90.95 | 83.40 |
③ | 83.39 | 100.00 | 90.94 | 83.39 |
④ | 88.16 | 96.31 | 91.86 | 85.68 |
⑤ | 88.66 | 99.72 | 93.72 | 88.58 |
⑥ | 92.63 | 97.83 | 95.11 | 91.53 |
Model | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
① | 83.23 | 100.00 | 90.85 | 83.23 |
② | 83.26 | 100.00 | 90.88 | 83.26 |
③ | 83.43 | 100.00 | 90.97 | 83.43 |
④ | 83.39 | 100.00 | 90.94 | 83.39 |
⑤ | 86.50 | 96.58 | 90.76 | 83.66 |
⑥ | 96.59 | 86.40 | 91.13 | 86.00 |
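The four columns of the tables above are related through the confusion matrix, which the following sketch makes explicit. The counts are hypothetical: a test set with 500 positive and 100 negative pairs and a classifier that labels every pair as positive. Such a classifier has full recall, and its accuracy equals its precision, which is the pattern seen in the rows where recall is at its maximum.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and accuracy as percentages."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "f1": round(100 * f1, 2),
        "accuracy": round(100 * accuracy, 2),
    }

# Hypothetical all-positive classifier on 500 positive and 100 negative pairs.
print(classification_metrics(tp=500, fp=100, fn=0, tn=0))
```

Because such a degenerate classifier already scores above 83% on three of the four metrics, improvements in precision and accuracy are the more informative signals when comparing models on imbalanced samples.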
Confidence | Email | CollegeMsg | Hypertext |
---|---|---|---|
90% | [89.20, 90.99] | [90.58, 91.40] | [85.06, 85.50] |
95% | [89.02, 91.16] | [90.50, 91.48] | [85.02, 85.54] |
98% | [88.82, 91.36] | [90.41, 91.57] | [84.97, 85.59] |
99% | [88.68, 91.50] | [90.35, 91.64] | [84.93, 85.63] |
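Intervals of this kind can be computed from the scores of repeated runs. The sketch below assumes a normal approximation for the mean of the runs and uses made-up accuracy values. The procedure the authors used (for example a Student t interval) is not stated here, so this is an illustration of the idea only.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, level):
    """Two-sided interval for the mean under a normal approximation."""
    center = mean(samples)
    std_err = stdev(samples) / math.sqrt(len(samples))
    # Quantile of the standard normal for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * std_err, center + z * std_err

# Hypothetical accuracies (%) from five repeated experiments.
runs = [90.1, 89.5, 91.0, 90.4, 89.8]
for level in (0.90, 0.95, 0.98, 0.99):
    low, high = confidence_interval(runs, level)
    print(f"{level:.0%}: [{low:.2f}, {high:.2f}]")
```

As in the table, the interval widens as the confidence level rises, because a larger quantile multiplies the same standard error.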
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, X.; Li, J.; Yuan, Y. Link Prediction in Dynamic Social Networks Combining Entropy, Causality, and a Graph Convolutional Network Model. Entropy 2024, 26, 477. https://doi.org/10.3390/e26060477