Integration of Semantic and Topological Structural Similarity Comparison for Entity Alignment without Pre-Training
Abstract
1. Introduction
- We propose a novel EA model that employs both semantic and structural similarity comparisons. Entity representations are enhanced by integrating semantic and structural information, thereby achieving highly accurate alignment. Furthermore, weighting factors are introduced to balance the contributions of the two modules effectively, ensuring strong alignment across datasets with different characteristics.
- We conduct EA task-based experiments on four datasets, and the results of these experiments demonstrate the effectiveness of our model.
2. Methods
2.1. Preliminaries
- Knowledge graph (KG). We define a KG as $\mathcal{G} = (\mathcal{E}, \mathcal{R})$, where $\mathcal{E}$ represents the set of entities and $\mathcal{R}$ represents the set of relations. In a KG, a fact or edge is represented as a triple $(h, r, t)$, where $h \in \mathcal{E}$ denotes the head entity, $t \in \mathcal{E}$ denotes the tail entity, and $r \in \mathcal{R}$ denotes the relation between them. The embedding vectors of $h$, $r$, and $t$ are denoted by the bold characters $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$, respectively.
- Entity alignment (EA). EA is a crucial task in KG research. Given two KGs, $\mathcal{G}_1$ and $\mathcal{G}_2$, the goal is to determine the identical entity set $S = \{(e_1, e_2) \mid e_1 \in \mathcal{E}_1, e_2 \in \mathcal{E}_2\}$. In this set, each pair $(e_1, e_2)$ represents the same real-world entity but exists in different KGs.
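As a concrete illustration of this notation, a KG can be held as a set of triples and the alignment target as a set of cross-KG entity pairs. The data below is invented for the example, not taken from the paper's datasets:

```python
# Toy KGs as sets of (head, relation, tail) triples; invented for illustration.
kg1 = {("Paris", "capital_of", "France"), ("France", "member_of", "EU")}
kg2 = {("Paris_(ville)", "capitale_de", "France_(pays)")}

# Entity and relation sets recovered from the triples.
entities1 = {h for h, _, _ in kg1} | {t for _, _, t in kg1}
relations1 = {r for _, r, _ in kg1}

# EA seeks the pairs that denote the same real-world entity in the two KGs.
aligned = {("Paris", "Paris_(ville)"), ("France", "France_(pays)")}
```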
2.2. Semantic-Based Similarity Comparison
2.3. Topology-Based Similarity Comparison
2.4. Fusion of Semantic and Structural Similarity Comparison
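Although the implementation details are omitted here, the fusion can be pictured as a convex combination of the two modules' similarity matrices, followed by nearest-neighbour matching. The sketch below is an illustration under that assumption; the names `fuse_similarities`, `sem_sim`, `top_sim`, and `alpha` are ours, not the paper's:

```python
# Illustrative sketch (not the paper's code): fuse a semantic and a topological
# similarity matrix with a single weighting factor, then align greedily.

def fuse_similarities(sem_sim, top_sim, alpha):
    """Convex combination of two |E1| x |E2| similarity matrices.

    alpha weights the semantic signal; (1 - alpha) weights the topological one.
    """
    return [
        [alpha * s + (1.0 - alpha) * t for s, t in zip(sem_row, top_row)]
        for sem_row, top_row in zip(sem_sim, top_sim)
    ]

def greedy_align(fused):
    """Match each source entity to its highest-scoring target entity."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]
```

For example, with `alpha = 0.6`, a source entity whose semantic row is `[1.0, 0.0]` and topological row is `[0.0, 1.0]` receives fused scores `[0.6, 0.4]` and is matched to target 0.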
3. Experiment Setting
3.1. Datasets
3.2. Baselines
3.3. Hyper-Parameters
3.4. Evaluation Metrics
- (1) Hits@k, measuring the percentage of correct alignments ranked within the top $k$ candidates. It is computed as $\text{Hits@}k = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{I}(\text{rank}_i \leq k)$, where $S$ is the set of ranking results, $\text{rank}_i$ is the rank of the $i$-th correct entity, and $\mathbb{I}(\cdot)$ is the indicator function.
- (2) Mean Reciprocal Rank (MRR), reflecting the average inverse rank of the correct results. It is computed as the average of the reciprocal ranks, $\text{MRR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{\text{rank}_i}$.
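Both metrics can be computed directly from the rank that each correct counterpart receives; a minimal sketch with toy ranks:

```python
def hits_at_k(ranks, k):
    """Fraction of test pairs whose correct counterpart ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr(ranks):
    """Mean of the reciprocal ranks of the correct counterparts."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 1, 2, 5]        # toy ranks of four correct matches
print(hits_at_k(ranks, 1))  # 0.5
print(mrr(ranks))           # (1 + 1 + 0.5 + 0.2) / 4 = 0.675
```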
4. Results and Discussion
4.1. Performance Comparison
- Table 2 shows the test results of our model on the two non-temporal datasets. The results show that our model performs consistently at a high level on most evaluation metrics, closely following the Dual-AMN model.
- Table 3 presents the evaluation results of our method on the temporal datasets. The results show that our method outperforms the other baseline models on most metrics. Compared to the Dual-AMN model, our method achieves an average improvement of roughly 0.003 in Hits@10 and MRR, which highlights the effectiveness of our model.
- The results show that our method has a clear advantage on the temporal datasets. This advantage is due to the use of temporal features, which act as identifiers during entity alignment, enabling us to accurately determine matching entities. Analyzing the weighting parameters, we observe that structure-based information is more useful in datasets with higher graph density (e.g., ICEWS-WIKI, ICEWS-YAGO). Conversely, where the graph structure is sparser (e.g., DBP15K(EN-FR), DBP-WIKI), descriptive information about entities tends to be more advantageous, and alignment using semantic information proves more effective.
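The density figures in Table 1 correspond to facts per entity, so the gap that drives this behaviour is easy to reproduce (a small arithmetic check, not the paper's code):

```python
def density(num_facts, num_entities):
    """Average number of facts per entity, rounded as in Table 1."""
    return round(num_facts / num_entities, 3)

# Numbers from Table 1: a dense temporal KG versus a sparse encyclopedic one.
print(density(3_527_881, 11_047))  # ICEWS side of ICEWS-WIKI -> 319.352
print(density(96_318, 15_000))     # EN side of DBP15K(EN-FR) -> 6.421
```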
4.2. Parameter Sensitivity
- As shown in the figure, the analysis reveals a clear trend: each dataset has an optimal value of the weighting parameter, beyond which the performance of the model either converges or even tends to decrease. This finding suggests that there is a critical threshold for the parameter, beyond which further tuning does not significantly improve model performance.
- When evaluating the DBP15K(EN-FR) and DBP-WIKI datasets, we observed that the model's MRR initially increases as the weight grows, but eventually starts to decrease. This parameter represents the weight of semantic information, indicating that increased emphasis on semantic information improves performance in the early stages. However, the pronounced decrease in the later stages suggests that overemphasizing semantic information leads to diminishing returns.
- When evaluating the ICEWS-WIKI and ICEWS-YAGO datasets, we observe a consistent trend in model performance. The MRR peaks at a weight of 0.4 for these datasets, whereas the best overall performance on DBP15K(EN-FR) and DBP-WIKI is achieved at 0.6. This suggests that ICEWS-WIKI and ICEWS-YAGO depend more heavily on structural information, which can be attributed to the higher density and larger number of triples in these datasets. Consequently, comparing structures in these datasets provides more valuable information.
- In summary, structural information is advantageous in densely structured datasets because it provides richer information about neighboring nodes and links. In sparsely structured datasets, by contrast, semantic information is particularly helpful, compensating for the lack of structure by characterizing entities through their descriptions.
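This kind of sensitivity analysis amounts to a one-dimensional sweep over the semantic weight. A minimal sketch, where `evaluate` is a placeholder standing in for re-running alignment at a given weight and returning a validation score such as MRR (the names are illustrative, not the paper's):

```python
def best_weight(evaluate, steps=10):
    """Sweep the semantic weight over [0, 1] and return (best_weight, best_score).

    `evaluate` maps a weight in [0, 1] to a validation score such as MRR.
    """
    candidates = [i / steps for i in range(steps + 1)]
    scores = {w: evaluate(w) for w in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy stand-in whose optimum sits at 0.6, mimicking the DBP-style curves.
w, score = best_weight(lambda w: 1.0 - (w - 0.6) ** 2)
print(w)  # 0.6
```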
4.3. Ablation Studies
- Table 4 presents the performance on the DBP15K(EN-FR) and DBP-WIKI datasets, demonstrating that the SSC module contributes significantly more than the TSC module on these data: on average, SSC exceeds TSC by 0.318 in Hits@1, 0.244 in Hits@10, and 0.258 in MRR. This suggests that semantic similarity comparison is the more effective approach for these two datasets.
- Table 5 presents the performance on the ICEWS-WIKI and ICEWS-YAGO datasets, indicating that the TSC module contributes significantly more than the SSC module on both: on average, Hits@1 improves by 0.0235, Hits@10 by 0.0865, and MRR by 0.0455. These results suggest that structural similarity comparison is more effective for these two datasets.
- The results presented above show that both modules contribute to the overall performance of our model. However, each module's contribution differs with the characteristics of the KG.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Yan, Z.; Peng, R.; Wu, H. Similarity propagation based semi-supervised entity alignment. Eng. Appl. Artif. Intell. 2024, 130, 107787.
- Yang, L.; Chen, J.; Wang, Z.; Shang, F. Relation mapping based on higher-order graph convolutional network for entity alignment. Eng. Appl. Artif. Intell. 2024, 133, 108009.
- Ranaldi, L.; Pucci, G. Knowing knowledge: Epistemological study of knowledge in transformers. Appl. Sci. 2023, 13, 677.
- Cao, J.; Fang, J.; Meng, Z.; Liang, S. Knowledge graph embedding: A survey from the perspective of representation spaces. ACM Comput. Surv. 2024, 56, 1–42.
- Ge, X.; Wang, Y.C.; Wang, B.; Kuo, C.C.J. Knowledge graph embedding: An overview. APSIPA Trans. Signal Inf. Process. 2024, 13, 1–51.
- Wang, Y.; Lipka, N.; Rossi, R.A.; Siu, A.; Zhang, R.; Derr, T. Knowledge graph prompting for multi-document question answering. AAAI Conf. Artif. Intell. 2024, 38, 19206–19214.
- Yang, L.; Chen, H.; Wang, X.; Yang, J.; Wang, F.Y.; Liu, H. Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment. arXiv 2024, arXiv:2401.16960.
- Chen, C.; Zheng, F.; Cui, J.; Cao, Y.; Liu, G.; Wu, J.; Zhou, J. Survey and open problems in privacy-preserving knowledge graph: Merging, query, representation, completion, and applications. Int. J. Mach. Learn. Cybern. 2024, 1–20.
- Chen, J.; Yang, L.; Wang, Z.; Gong, M. Higher-order GNN with Local Inflation for entity alignment. Knowl.-Based Syst. 2024, 293, 111634.
- Luo, S.; Yu, J. ESGNet: A multimodal network model incorporating entity semantic graphs for information extraction from Chinese resumes. Inf. Process. Manag. 2024, 61, 103524.
- Chen, Z.; Zhang, Y.; Fang, Y.; Geng, Y.; Guo, L.; Chen, X.; Li, Q.; Zhang, W.; Chen, J.; Zhu, Y.; et al. Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey. arXiv 2024, arXiv:2402.05391.
- Huber, M.N.; Angst, M.; Fischer, M. The Link Between Social-Ecological Network Fit and Outcomes: A Rare Empirical Assessment of a Prominent Hypothesis. Soc. Nat. Resour. 2024, 1–18.
- Chen, M.; Tian, Y.; Yang, M.; Zaniolo, C. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. arXiv 2016, arXiv:1611.03954.
- Sun, Z.; Hu, W.; Zhang, Q.; Qu, Y. Bootstrapping entity alignment with knowledge graph embedding. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; Volume 18.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26; Curran Associates, Inc.: Red Hook, NY, USA, 2013.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
- Wang, Z.; Lv, Q.; Lan, X.; Zhang, Y. Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 349–357.
- Chen, Z.; Wu, Y.; Feng, Y.; Zhao, D. Integrating manifold knowledge for global entity linking with heterogeneous graphs. Data Intell. 2022, 4, 20–40.
- Mao, X.; Wang, W.; Wu, Y.; Lan, M. Boosting the speed of entity alignment 10×: Dual attention matching network with normalized hard sample mining. In WWW '21: Proceedings of the Web Conference 2021; Association for Computing Machinery: New York, NY, USA, 2021.
- Xu, C.; Su, F.; Lehmann, J. Time-aware graph neural networks for entity alignment between temporal knowledge graphs. arXiv 2022, arXiv:2203.02150.
- Xu, C.; Su, F.; Xiong, V.; Lehmann, J. Time-aware Entity Alignment using Temporal Relational Attention. In Proceedings of the ACM Web Conference 2022 (WWW '22); Association for Computing Machinery: New York, NY, USA, 2022; pp. 788–797.
- Cai, L.; Mao, X.; Ma, M.; Yuan, H.; Zhu, J.; Lan, M. A simple temporal information matching mechanism for entity alignment between temporal knowledge graphs. arXiv 2022, arXiv:2209.09677.
- Wang, C.; Huang, Z.; Wan, Y.; Wei, J.; Zhao, J.; Wang, P. FuAlign: Cross-lingual entity alignment via multi-view representation learning of fused knowledge graphs. Inf. Fusion 2023, 89, 41–52.
- Jiang, X.; Xu, C.; Shen, Y.; Su, F.; Wang, Y.; Sun, F.; Li, Z.; Shen, H. Rethinking GNN-based entity alignment on heterogeneous knowledge graphs: New datasets and a new method. arXiv 2023, arXiv:2304.03468.
- Tang, X.; Zhang, J.; Chen, B.; Yang, Y.; Chen, H.; Li, C. BERT-INT: A BERT-based Interaction Model for Knowledge Graph Alignment. Interactions 2020, 100, e1.
- Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems 32; Curran Associates, Inc.: Red Hook, NY, USA, 2019.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. Preprint. 2018. Available online: https://paperswithcode.com/paper/improving-language-understanding-by (accessed on 20 May 2024).
- Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive language models beyond a fixed-length context. arXiv 2019, arXiv:1901.02860.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
- Jiang, X.; Shen, Y.; Shi, Z.; Xu, C.; Li, W.; Li, Z.; Guo, J.; Shen, H.; Wang, Y. Unlocking the Power of Large Language Models for Entity Alignment. arXiv 2024, arXiv:2402.15048.
- Lynch, C.J.; Jensen, E.; Munro, M.H.; Zamponi, V.; Martinez, J.; O'Brien, K.; Feldhaus, B.; Smith, K.; Reinhold, A.M.; Gore, R. GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study. arXiv 2024, arXiv:2402.05435.
- Sun, Z.; Zhang, Q.; Hu, W.; Wang, C.; Chen, M.; Akrami, F.; Li, C. A benchmarking study of embedding-based entity alignment for knowledge graphs. arXiv 2020, arXiv:2003.07743.
Table 1. Statistics of the datasets.

| Dataset | KG | #Entities | #Relations | #Facts | Density | #Anchors | Temporal |
|---|---|---|---|---|---|---|---|
| DBP15K(EN-FR) | EN | 15,000 | 193 | 96,318 | 6.421 | 15,000 | No |
| | FR | 15,000 | 166 | 80,112 | 5.341 | | No |
| DBP-WIKI | DBP | 100,000 | 413 | 293,990 | 2.940 | 10,000 | No |
| | WIKI | 100,000 | 261 | 251,708 | 2.517 | | No |
| ICEWS-WIKI | ICEWS | 11,047 | 272 | 3,527,881 | 319.352 | 5058 | Yes |
| | WIKI | 15,896 | 226 | 198,257 | 12.472 | | Yes |
| ICEWS-YAGO | ICEWS | 26,863 | 272 | 4,192,555 | 156.072 | 18,824 | Yes |
| | YAGO | 22,734 | 41 | 107,118 | 4.712 | | Yes |
Table 2. Performance comparison on the non-temporal datasets.

| Models | DBP15K(EN-FR) Hits@1 | Hits@10 | MRR | DBP-WIKI Hits@1 | Hits@10 | MRR |
|---|---|---|---|---|---|---|
| MTransE | 0.247 | 0.577 | 0.360 | 0.281 | 0.520 | 0.363 |
| BootEA | 0.653 | 0.874 | 0.731 | 0.748 | 0.898 | 0.801 |
| GCN-Align | 0.411 | 0.772 | 0.530 | 0.494 | 0.756 | 0.590 |
| RDGCN | 0.873 | 0.950 | 0.901 | 0.974 | 0.994 | 0.980 |
| Dual-AMN | 0.954 | 0.994 | 0.970 | 0.983 | 0.996 | 0.991 |
| Ours | 0.901 | 0.980 | 0.962 | 0.980 | 0.993 | 0.981 |
Table 3. Performance comparison on the temporal datasets.

| Models | ICEWS-WIKI Hits@1 | Hits@10 | MRR | ICEWS-YAGO Hits@1 | Hits@10 | MRR |
|---|---|---|---|---|---|---|
| MTransE | 0.021 | 0.158 | 0.068 | 0.012 | 0.084 | 0.040 |
| BootEA | 0.072 | 0.275 | 0.139 | 0.020 | 0.120 | 0.056 |
| GCN-Align | 0.046 | 0.184 | 0.093 | 0.017 | 0.085 | 0.038 |
| RDGCN | 0.064 | 0.202 | 0.096 | 0.029 | 0.097 | 0.042 |
| Dual-AMN | 0.083 | 0.281 | 0.145 | 0.031 | 0.144 | 0.068 |
| Ours | 0.081 | 0.285 | 0.148 | 0.033 | 0.146 | 0.071 |
Table 4. Ablation results on the non-temporal datasets.

| Models | DBP15K(EN-FR) Hits@1 | Hits@10 | MRR | DBP-WIKI Hits@1 | Hits@10 | MRR |
|---|---|---|---|---|---|---|
| SSC | 0.793 | 0.810 | 0.892 | 0.849 | 0.930 | 0.890 |
| TSC | 0.560 | 0.678 | 0.692 | 0.445 | 0.574 | 0.573 |
| SSC+TSC | 0.901 | 0.980 | 0.962 | 0.980 | 0.993 | 0.981 |
Table 5. Ablation results on the temporal datasets.

| Models | ICEWS-WIKI Hits@1 | Hits@10 | MRR | ICEWS-YAGO Hits@1 | Hits@10 | MRR |
|---|---|---|---|---|---|---|
| SSC | 0.034 | 0.121 | 0.086 | 0.022 | 0.091 | 0.032 |
| TSC | 0.073 | 0.245 | 0.141 | 0.030 | 0.140 | 0.068 |
| SSC+TSC | 0.081 | 0.285 | 0.148 | 0.033 | 0.146 | 0.071 |
Share and Cite
Liu, Y.; Liu, Y. Integration of Semantic and Topological Structural Similarity Comparison for Entity Alignment without Pre-Training. Electronics 2024, 13, 2036. https://doi.org/10.3390/electronics13112036