A Multiple Salient Features-Based User Identification across Social Media
Abstract
1. Introduction
- We extract and fuse multiple salient features contained in user display name, network topology, and published content.
- We adopt multi-module calculation methods to obtain the similarity between various redundant features.
- We design a bidirectional stable marriage matching algorithm so that matched accounts achieve bidirectional optimality, which further improves user identification performance.
- We compare the proposed algorithm with existing algorithms, and the experimental results verify its effectiveness.
2. Related Work
2.1. User Attribute Information-Based User Identification
2.2. User Network Topology Information-Based User Identification
2.3. User Behavior Information-Based User Identification
3. Problem Definition
- (1) Given two accounts, can it be determined that they belong to the same person?
- (2) Given two accounts, can it be determined that they belong to two different persons?
4. User Identification Algorithm across Social Media
4.1. Overall Framework of the Algorithm
4.2. User Display Name Analysis
4.2.1. Length Feature
4.2.2. Character Feature
4.2.3. Letter Feature
4.3. User Network Topology Information Analysis
4.4. User Behavior Information Analysis
4.4.1. Text Information
4.4.2. Punctuation Mark
4.4.3. Status Timestamp
4.5. User Account Matching
Algorithm 1: Bidirectional Stable Marriage Matching Algorithm.
Input: and .
Output: Final matching pairs .
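The steps of Algorithm 1 are not reproduced here, so the following is only a rough sketch of one way to read "bidirectional" stable marriage matching, not the authors' procedure. It assumes each network scores the other's accounts (the hypothetical matrices `a_scores` and `b_scores`), runs Gale-Shapley once with each side proposing, and keeps only the pairs on which both runs agree. The function names and the intersection rule are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def gale_shapley(prop_scores: Matrix, recv_scores: Matrix) -> Dict[int, int]:
    """Proposer-optimal stable matching.

    prop_scores[p][r]: how strongly proposer p prefers receiver r.
    recv_scores[r][p]: how strongly receiver r prefers proposer p.
    Returns a dict mapping each matched proposer to its receiver.
    """
    n_p, n_r = len(prop_scores), len(recv_scores)
    # Each proposer ranks receivers from most to least preferred.
    prefs = [sorted(range(n_r), key=lambda r: -prop_scores[p][r])
             for p in range(n_p)]
    next_choice = [0] * n_p            # next rank each proposer will try
    engaged_to: Dict[int, int] = {}    # receiver -> proposer
    free = list(range(n_p))
    while free:
        p = free.pop()
        if next_choice[p] >= n_r:
            continue                   # p exhausted its list, stays unmatched
        r = prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p
        elif recv_scores[r][p] > recv_scores[r][current]:
            engaged_to[r] = p          # r trades up, old partner is free again
            free.append(current)
        else:
            free.append(p)             # r rejects p, p tries its next choice
    return {p: r for r, p in engaged_to.items()}


def bidirectional_match(a_scores: Matrix, b_scores: Matrix) -> List[Tuple[int, int]]:
    """Keep only pairs (a, b) matched both when A proposes and when B proposes."""
    forward = gale_shapley(a_scores, b_scores)    # a -> b
    backward = gale_shapley(b_scores, a_scores)   # b -> a
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores: accounts 0 and 1 disagree across directions, 2 agrees.
a_scores = [[0.9, 0.6, 0.1], [0.6, 0.9, 0.1], [0.1, 0.1, 0.95]]
b_scores = [[0.6, 0.9, 0.1], [0.9, 0.6, 0.1], [0.1, 0.1, 0.95]]
pairs = bidirectional_match(a_scores, b_scores)
```

Under these assumptions, a pair survives only if it is chosen regardless of which network proposes, which is a conservative way to trade recall for precision.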
5. Analysis of Experimental Results
5.1. Acquisition of Datasets
5.2. Evaluation Metrics
5.3. Experimental Results
5.3.1. Ablation Study
- A represents a performance curve based only on the user’s display name.
- B represents a performance curve based only on the user’s network topology.
- C represents a performance curve based only on user behavior information.
- D represents a performance curve based on multiple salient features.
5.3.2. Baseline Comparison
- RFCA-SMM is a random forest secondary confirmation algorithm based on stable marriage matching. The stable marriage matching algorithm first obtains candidate matching pairs of accounts; a random forest model is then trained to perform secondary confirmation on these candidate pairs, thereby further improving the precision of multi-user account matching.
- RCM computes account attribute similarity and a user surrounding score, and selects the user with the highest current score as the candidate matching user. It then introduces a user matching score (UMS), which combines attribute similarity with network structure, and uses this score to determine the user who matches the candidate user.
5.3.3. Complexity Analysis
6. Discussion and Future Directions
- Social media contain many types of user information, some of which have not been deeply mined and analyzed [41]. Some specialized institutions require highly accurate user identification, and integrating and analyzing only a few dimensions of user information limits the accuracy that can be achieved.
- When the model identifies large-scale user sets, the identification performance of existing algorithms gradually decreases, which is also reflected in the experimental part of this paper. Community discovery in complex networks can address this problem well [35]: dividing a large-scale dataset into several communities and performing identification between communities can alleviate the performance loss that existing identification algorithms suffer as the dataset grows.
- Existing identification algorithms consider only a user’s neighbor nodes, that is, single-hop nodes, and ignore the role of multi-hop nodes in the friend relationship when analyzing the user’s network topology [30]. Future research can therefore analyze the contribution of multi-hop nodes to multi-user identification in depth, so that the performance of identification algorithms based on network topology can be further improved.
- Although using a small amount of user information achieves better multi-user identification, it also gives malicious attackers in the network a shortcut to obtaining the information of normal users [26]. Future work should therefore take a game-theoretic perspective to balance the use of user information against privacy protection.
- Supervised learning methods rely on pre-matched user pairs, which are difficult to obtain, so the classification models adopted in existing work lack sufficient training data, which may reduce classifier accuracy. Obtaining more valid data and constructing more suitable classification models will further improve identification accuracy.
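To make the single-hop versus multi-hop distinction in the list above concrete, the following minimal sketch collects every node within a given number of hops of an account using breadth-first search. The adjacency-dictionary representation and the name `neighbors_within` are assumptions made for illustration; the paper does not prescribe a data structure or a multi-hop method.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def neighbors_within(graph: Dict[Hashable, Iterable[Hashable]],
                     source: Hashable,
                     max_hops: int) -> Dict[Hashable, int]:
    """Return {node: hop distance} for all nodes within max_hops of source.

    graph maps each node to an iterable of its friends (hypothetical format).
    The source itself is excluded from the result.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue                    # do not expand beyond the hop limit
        for friend in graph.get(node, ()):
            if friend not in distance:
                distance[friend] = distance[node] + 1
                queue.append(friend)
    del distance[source]
    return distance


# Hypothetical friend chain: a - b - c - d
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
one_hop = neighbors_within(friends, "a", 1)
two_hop = neighbors_within(friends, "a", 2)
```

With `max_hops=1` the function returns only direct friends, which corresponds to the single-hop view; larger values add the multi-hop nodes whose contribution is left for future work.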
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, Y.T.; Tang, J.; Yang, Z.L.; Pei, J.; Yu, P.S. Cosnet: Connecting heterogeneous social networks with local and global consistency. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1485–1494.
- Qu, Y.; Xing, L.; Ma, H.; Wu, H.; Zhang, K.; Deng, K. Exploiting User Friendship Networks for User Identification across Social Networks. Symmetry 2022, 14, 110.
- Most Popular Social Networks Worldwide as of October 2021, Ranked by Number of Active Users. Available online: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/ (accessed on 8 January 2022).
- Mostafa, M.M. More than words: Social networks’ text mining for consumer brand sentiments. Exp. Syst. Appl. 2013, 40, 4241–4251.
- Tuna, T.; Akbas, E.; Aksoy, A.; Canbaz, M.A.; Karabiyik, U.; Gonen, B.; Aygun, R. User characterization for online social networks. Soc. Netw. Anal. Min. 2016, 6, 104.
- Xing, L.; Deng, K.K.; Wu, H.H.; Xie, P.; Zhang, M.C.; Wu, Q.T. Exploiting Two-Level Information Entropy across Social Networks for User Identification. Wirel. Commun. Mob. Comput. 2021, 2021, 1082391.
- Xing, L.; Deng, K.K.; Wu, H.H.; Xie, P.; Gao, J.P. Behavioral habits-based user identification across social networks. Symmetry 2019, 11, 1134.
- Qu, Y.; Yu, S.; Zhou, W.; Niu, J. FBI: Friendship learning-based user identification in multiple social networks. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6.
- Shu, K.; Wang, S.H.; Tang, J.; Zafarani, R.; Liu, H. User identity linkage across online social networks: A review. ACM SIGKDD Explor. Newsl. 2017, 18, 5–17.
- Zafarani, R.; Liu, H. Connecting users across social media sites: A behavioral-modeling approach. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 41–49.
- Xing, L.; Deng, K.; Wu, H.; Xie, P. Review of User Identification across Social Networks: The Complex Network Approach. J. Univ. Electron. Sci. Technol. China 2020, 49, 905–917.
- Goga, O.; Lei, H.; Parthasarathi, S.H.K.; Friedland, G.; Sommer, R.; Teixeira, R. Exploiting innocuous activity for correlating users across sites. In Proceedings of the 22nd International World Wide Web Conference Committee (IW3C2), Rio de Janeiro, Brazil, 13–17 May 2013; pp. 447–458.
- Kong, X.; Zhang, J.; Yu, P.S. Inferring anchor links across multiple heterogeneous social networks. In Proceedings of the 22nd ACM international conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 179–188.
- Haupt, J.; Bender, B.; Fabian, B.; Lessmann, S. Robust identification of email tracking: A machine learning approach. Eur. J. Oper. Res. 2018, 271, 341–356.
- Li, Y.J.; Zhang, Z.; Peng, Y.; Yin, H.Z.; Xu, Q.Q. Matching user accounts based on user-generated content across social networks. Future Gener. Comput. Syst. 2018, 83, 104–115.
- Korula, N.; Lattanzi, S. An efficient reconciliation algorithm for social networks. Proc. VLDB Endow. 2014, 7, 377–388.
- Tan, S.L.; Guan, Z.Y.; Cai, D.; Qin, X.Z.; Bu, J.J.; Chen, C. Mapping users across networks by manifold alignment on hypergraph. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 14, pp. 159–165.
- Zhou, X.P.; Liang, X.; Zhang, H.Y.; Ma, Y.F. Cross-platform identification of anonymous identical users in multiple social media networks. IEEE Trans. Knowl. Data Eng. 2016, 28, 411–424.
- Zhou, X.P.; Liang, X.; Du, X.Y.; Zhao, J.C. Structure based user identification across social networks. IEEE Trans. Knowl. Data Eng. 2018, 30, 1178–1191.
- Deng, K.K.; Xing, L.; Zheng, L.; Wu, H.H.; Xie, P.; Gao, F.F. A user identification algorithm based on user behavior analysis in social networks. IEEE Access 2019, 7, 47114–47123.
- Raad, E.; Chbeir, R.; Dipanda, A. User profile matching in social networks. In Proceedings of the 13th International Conference on Network-Based Information Systems (NBiS’10), Takayama, Japan, 14–16 September 2010; pp. 297–304.
- Ye, N.; Zhao, L.; Dong, L.; Bian, G.; Liu, E.; Clapworthy, G.J. User identification based on multiple attribute decision making in social networks. China Commun. 2013, 10, 37–49.
- Cortis, K.; Scerri, S.; Rivera, I.; Handschuh, S. An ontology-based technique for online profile resolution. In Social Informatics; Springer: Berlin, Germany, 2013; pp. 284–298.
- Abel, F.; Herder, E.; Houben, G.J.; Henze, N.; Krause, D. Cross-system user modeling and personalization on the social web. User Modeling User-Adapt. Interact. 2013, 23, 169–209.
- Zamani, K.; Paliouras, G.; Vogiatzis, D. Similarity-based user identification across social networks. In International Workshop on Similarity-Based Pattern Recognition; Springer: Cham, Switzerland, 2015; pp. 171–185.
- Reza, K.J.; Islam, M.Z.; Estivill-Castro, V. Privacy protection of online social network users, against attribute inference attacks, through the use of a set of exhaustive rules. Neural Comput. Appl. 2021, 33, 12397–12427.
- Narayanan, A.; Shmatikov, V. De-anonymizing social networks. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy, Oakland, CA, USA, 17–20 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 173–187.
- Bartunov, S.; Korshunov, A.; Park, S.; Ryu, W.; Lee, H. Joint link-attribute user identity resolution in online social networks. In Proceedings of the 6th SNA-KDD Workshop’12, Beijing, China, 12 August 2012; pp. 1–9.
- Cui, Y.; Pei, J.; Tang, G.T.; Luk, W.S.; Jiang, D.X.; Hua, M. Finding email correspondents in online social networks. World Wide Web 2013, 16, 195–218.
- Li, Y.J.; Su, Z.T.; Yang, J.Q.; Gao, C.J. Exploiting similarities of user friendship networks across social networks for user identification. Inf. Sci. 2020, 506, 78–98.
- Alam, K.A.; Ahmad, R.; Ko, K. Enabling Far-Edge Analytics: Performance Profiling of Frequent Pattern Mining Algorithms. IEEE Access 2017, 5, 8236–8249.
- Cao, W.; Wu, Z.W.; Wang, D.; Li, J.; Hu, H.S. Automatic user identification method across heterogeneous mobility data sources. In Proceedings of the IEEE 32nd International Conference on Data Engineering, Helsinki, Finland, 16–20 May 2016; pp. 978–989.
- Hao, T.Y.; Zhou, J.B.; Cheng, Y.S.; Huang, L.B.; Wu, H.S. User identification in cyber-physical space: A case study on mobile query logs and trajectories. In Proceedings of the ACM SigSpatial (Short Paper), Burlingame, CA, USA, 31 October–3 November 2016. No. 71.
- Han, X.H.; Wang, L.H.; Xu, S.J.; Liu, G.Q.; Zhao, D.W. Linking social network accounts by modeling user spatiotemporal habits. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Beijing, China, 22–24 July 2017; pp. 19–24.
- Chen, H.X.; Yin, H.Z.; Sun, X.G.; Chen, T.; Gabrys, B.; Musial, K. Multi-level graph convolutional networks for cross-platform anchor link prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 6–10 July 2020; pp. 1503–1511.
- Li, Y.J.; Peng, Y.; Zhang, Z.; Wu, M.; Xu, Q.; Yin, H. A deep dive into user display names across social networks. Inf. Sci. 2018, 447, 186–204.
- Rexford, J.; Dovrolis, C. Future Internet architecture: Clean-slate versus evolutionary research. Commun. ACM 2010, 53, 36–40.
- Yan, M.; Sang, J.; Xu, C. Unified YouTube video recommendation via cross-network Collaboration. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, New York, NY, USA, 25 July 2015; pp. 19–26.
- Deng, K.K.; Xing, L.; Zhang, M.C.; Wu, H.H.; Xie, P. A multiuser identification algorithm based on internet of things. Wirel. Commun. Mob. Comput. 2019, 2019, 1–11.
- Rossetti, G.; Cazabet, R. Community discovery in dynamic networks: A survey. ACM Comput. Surv. 2018, 51, 1–37.
- Xing, L.; Deng, K.K.; Wu, H.H.; Xie, P.; Zhao, H.V.; Gao, F.F. A survey of across social networks user identification. IEEE Access 2019, 7, 137472–137488.
User Data | Data Type |
---|---|
Length feature | User display name information |
Character feature | User display name information |
Letter feature | User display name information |
Friend relationship | User network topology information |
Text information | User behavior information |
Punctuation mark | User behavior information |
Status timestamp | User behavior information |
User Information | Matching Metrics | Matching Thresholds |
---|---|---|
Length feature | 0.98 | 0.9 |
Character feature | 0.87 | 0.8 |
Letter feature | 0.98 | 0.92 |
Friend relationship | 23 | 20 |
Text information | 5500 | 5000 |
Punctuation mark | 0.98 | 0.9 |
Status timestamp | 0.68 | 0.6 |
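One possible reading of the table above, stated here as an assumption rather than the authors' rule, is that a feature supports a match when its matching metric meets or exceeds its matching threshold. The sketch below encodes the thresholds from the table and lists which features pass for a given pair of accounts. The dictionary keys and the function name `passing_features` are hypothetical, and the "higher is better" direction is assumed for every feature.

```python
from typing import Dict, List

# Thresholds copied from the table; the short keys are illustrative names.
THRESHOLDS: Dict[str, float] = {
    "length": 0.9,
    "character": 0.8,
    "letter": 0.92,
    "friend": 20,
    "text": 5000,
    "punctuation": 0.9,
    "timestamp": 0.6,
}


def passing_features(metrics: Dict[str, float],
                     thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the features whose metric reaches its threshold.

    Assumes a larger metric always means a closer match; a missing metric
    is treated as failing.
    """
    return [name for name, limit in thresholds.items()
            if metrics.get(name, float("-inf")) >= limit]


# Metric values taken from the "Matching Metrics" column of the table.
table_metrics = {
    "length": 0.98, "character": 0.87, "letter": 0.98,
    "friend": 23, "text": 5500, "punctuation": 0.98, "timestamp": 0.68,
}
passed = passing_features(table_metrics)
```

Under this reading, every metric in the table clears its threshold; how the per-feature outcomes are fused into a final decision is not specified by the table and is left out of the sketch.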
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qu, Y.; Ma, H.; Wu, H.; Zhang, K.; Deng, K. A Multiple Salient Features-Based User Identification across Social Media. Entropy 2022, 24, 495. https://doi.org/10.3390/e24040495