A Co-Embedding Model with Variational Auto-Encoder for Knowledge Graphs
Abstract
1. Introduction
- We propose a co-embedding model for knowledge graphs that learns low-dimensional representations of KG components, embedding entities and relations in the same semantic space so that their affinities can be measured effectively.
- To capture the uncertainty that most embedding methods neglect, we introduce a variational auto-encoder into our model and represent KG components as Gaussian distributions. The variational auto-encoder consists of two parts: (1) an inference model that encodes KG components into the latent space, and (2) a generative model that reconstructs the observed variables from the latent embeddings.
- We conduct experiments on real-world datasets to evaluate the performance of our model on link prediction. The experimental results demonstrate that our model outperforms state-of-the-art baselines.
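The two-part structure described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's actual architecture: the lookup-table encoder, sizes, and function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Inference model: map a one-hot KG component x to the parameters
    # (mean and log-variance) of a diagonal Gaussian in the latent space.
    mu = x @ W_mu
    logvar = x @ W_logvar
    return mu, logvar

def reparameterize(mu, logvar, rng):
    # Sample z = mu + sigma * eps, so gradients can flow through mu and logvar.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ): the regularizer in the ELBO.
    return 0.5 * float(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar))

# Toy sizes: 4 entities, latent dimension D = 3 (illustrative only).
n_entities, D = 4, 3
W_mu = rng.standard_normal((n_entities, D)) * 0.1
W_logvar = rng.standard_normal((n_entities, D)) * 0.01

x = np.eye(n_entities)[0]          # one-hot index of an entity
mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
```

The generative model (not shown) would score triples from sampled latents `z`; the KL term keeps each component's Gaussian close to the prior.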
2. Related Work
2.1. Knowledge Graph Representation
2.2. Gaussian Embedding
2.3. Variational Auto-Encoder
3. Notations and Problems
3.1. Notations
3.2. Problem Definition
4. Model
4.1. Variational Lower Bound
4.2. Learning
5. Experiment
5.1. Data Sets
5.2. Experimental Setup
- TransE [6]. TransE introduced translation-based embedding, interpreting relations as translations operating on entity embeddings.
- DistMult [10]. DistMult is a bilinear model in which each relation is represented by a diagonal rather than a full matrix. DistMult is as scalable as TransE while achieving superior performance.
- ComplEx [11]. ComplEx extends DistMult by introducing complex-valued embeddings so as to better model asymmetric relations. It has been proven that HolE is subsumed by ComplEx as a special case.
- ConvE [12]. ConvE is a multi-layer convolutional network model for link prediction [24] of KGs, and it reports state-of-the-art results for several established datasets. Unlike previous work which has focused on shallow, fast models that can scale to large knowledge graphs, ConvE uses 2D convolution over embeddings and multiple layers of nonlinear features to model KGs.
- ConvKB [13]. ConvKB captures the global relationships among same-dimensional entries of the entity and relation embeddings, thereby generalizing the transitional characteristics of translation-based embedding models.
- R-GCN [14]. R-GCN applies graph convolutional networks to relational knowledge bases, creating a new encoder for link prediction and entity classification tasks.
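For concreteness, the scoring functions of the first two baselines are simple enough to write out. This is a hedged sketch with toy vectors; the function names and values are illustrative, not any library's API.

```python
import numpy as np

def transe_score(h, r, t):
    # TransE: a triple (h, r, t) is plausible when h + r is close to t,
    # so the score is the negative distance || h + r - t ||.
    return -float(np.linalg.norm(h + r - t))

def distmult_score(h, r, t):
    # DistMult: bilinear score h^T diag(r) t, i.e. a three-way
    # elementwise product summed over dimensions.
    return float(np.sum(h * r * t))

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = h + r   # a triple that TransE considers perfectly plausible (score 0)
```

Note that DistMult's score is symmetric in `h` and `t`, which is exactly the limitation ComplEx addresses with complex-valued embeddings.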
5.3. Link Prediction
5.4. Results and Analysis
- Our method can measure the uncertainty in KG embedding: the determinant and trace of each Gaussian embedding's covariance effectively quantify its uncertainty.
- Relations carrying more semantic information (more associated heads and tails, more complex relation types) have larger uncertainty. For example, the 'major_field_of_study' relation has the largest uncertainty, and the 'educational_institution' relation has the smallest uncertainty among these relations.
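For a diagonal Gaussian parameterized by log-variances, the determinant and trace used above reduce to simple sums. A minimal sketch (the function name and toy log-variance values are illustrative):

```python
import math

def gaussian_uncertainty(logvar):
    # For a diagonal covariance Sigma = diag(exp(logvar_i)):
    #   log det(Sigma) = sum_i logvar_i
    #   trace(Sigma)   = sum_i exp(logvar_i)
    log_det = sum(logvar)
    trace = sum(math.exp(v) for v in logvar)
    return log_det, trace

# A relation with a broader Gaussian (larger variances) scores
# higher on both measures, i.e. it is "more uncertain".
broad = gaussian_uncertainty([-1.0, -1.0, -1.0])
narrow = gaussian_uncertainty([-2.0, -2.0, -2.0])
```

This mirrors how the log(det) and Trace columns of the relation table are ordered: both decrease together as the embedding concentrates.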
6. Results
- The experimental results on FB15k-237 and WN18RR indicate that our method can learn high-quality representations of KGs.
- Our method outperformed the baselines on the Hits@3 and Hits@10 metrics, but performed poorly on mean reciprocal rank and Hits@1 on WN18RR. This may be because WN18RR contains a large number of entities but only a few relations, so most methods can judge the correctness of a triple but cannot rank it in the top position.
- On FB15k-237, our method outperformed the baselines on the Hits@3, Hits@10, and mean reciprocal rank metrics, and came second on the Hits@1 and mean rank metrics. The improvements on FB15k-237 were greater than those on WN18RR; since FB15k-237 contains more relations, the uncertainties in its components are larger than in WN18RR, which indicates that our method can learn valid representations of the uncertainty in KGs.
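The metrics discussed above are all derived from the rank of the correct entity among all scored candidates. A minimal sketch of their computation (the function name and example ranks are illustrative, not the paper's evaluation code):

```python
def mrr_and_hits(ranks, n):
    # ranks: 1-based rank of the correct entity for each test triple,
    # obtained by scoring all candidate entities and sorting.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_n = sum(1 for r in ranks if r <= n) / len(ranks)
    return mrr, hits_at_n

ranks = [1, 3, 12]                  # illustrative ranks for three test triples
mrr, hits10 = mrr_and_hits(ranks, 10)
```

This also explains how a method can score well on Hits@10 yet poorly on MRR and Hits@1: a correct entity consistently ranked around position 3-10 contributes to Hits@10 but little to the reciprocal-rank average.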
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Berant, J.; Chou, A.; Frostig, R.; Liang, P. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1533–1544.
- Heck, L.; Hakkani-Tür, D.; Tur, G. Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing. In Proceedings of Interspeech, Lyon, France, 25–29 August 2013.
- Wang, W.Y.; Mazaitis, K.; Lao, N.; Mitchell, T.; Cohen, W.W. Efficient Inference and Learning in a Large Knowledge Base: Reasoning with Extracted Information Using a Locally Groundable First-Order Probabilistic Logic. arXiv 2014, arXiv:cs.AI/1404.3301.
- Bordes, A.; Weston, J.; Usunier, N. Open Question Answering with Weakly Supervised Embedding Models. arXiv 2014, arXiv:cs.CL/1404.4326.
- Bordes, A.; Chopra, S.; Weston, J. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 615–620.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2013; pp. 2787–2795.
- Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28.
- Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187.
- Nickel, M.; Tresp, V.; Kriegel, H.P. A Three-Way Model for Collective Learning on Multi-relational Data. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011.
- Yang, B.; Yih, W.t.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv 2014, arXiv:cs.CL/1412.6575.
- Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. arXiv 2016, arXiv:cs.AI/1606.06357.
- Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. arXiv 2017, arXiv:cs.LG/1707.01476.
- Nguyen, D.Q.; Nguyen, T.D.; Nguyen, D.Q.; Phung, D. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 2, pp. 327–333.
- Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. arXiv 2017, arXiv:stat.ML/1703.06103.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
- Paccanaro, A.; Hinton, G.E. Learning Distributed Representations of Concepts Using Linear Relational Embedding. IEEE Trans. Knowl. Data Eng. 2001, 13, 232–244.
- He, S.; Liu, K.; Ji, G.; Zhao, J. Learning to Represent Knowledge Graphs with Gaussian Embedding. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM'15, New York, NY, USA, 19–30 October 2015; pp. 623–632.
- Vilnis, L.; McCallum, A. Word Representations via Gaussian Embedding. arXiv 2014, arXiv:cs.CL/1412.6623.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:stat.ML/1312.6114.
- Kingma, D.P.; Rezende, D.J.; Mohamed, S.; Welling, M. Semi-Supervised Learning with Deep Generative Models. arXiv 2014, arXiv:cs.LG/1406.5298.
- Jiang, Z.; Zheng, Y.; Tan, H.; Tang, B.; Zhou, H. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. arXiv 2016, arXiv:cs.CV/1611.05148.
- Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.; Frey, B. Adversarial Autoencoders. arXiv 2015, arXiv:cs.LG/1511.05644.
- Dosovitskiy, A.; Brox, T. Generating Images with Perceptual Similarity Metrics Based on Deep Networks. arXiv 2016, arXiv:cs.LG/1602.02644.
- Miller, G.A. WordNet: A Lexical Database for English. Commun. ACM 1995, 38, 39–41.
- Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD'08, Vancouver, BC, Canada, 9–12 June 2008; pp. 1247–1250.
| Symbol | Description |
|---|---|
|  | a knowledge graph |
|  | set of entities |
|  | set of relations |
|  | set of triples |
|  | number of entities |
|  | number of relations |
|  | number of triples |
| D | dimension of latent variables |
|  | observed data for triples |
|  | latent representation matrix for entities |
|  | latent representation matrix for relations |
| Relation | #Head | #Tail | Type | log(det) | Trace |
|---|---|---|---|---|---|
| major_field_of_study | 225 | 77 | m-n | −338.8 | 38.1 |
| student | 183 | 292 | 1-n | −340.6 | 34.8 |
| institution | 22 | 222 | m-n | −376.2 | 32.8 |
| colors | 85 | 19 | m-n | −400.9 | 26.9 |
| fraternities_sororities | 20 | 3 | m-1 | −406.9 | 24.9 |
| campuses | 13 | 13 | 1-1 | −411.9 | 21.3 |
| currency | 5 | 3 | m-1 | −423.4 | 19.8 |
| educational_institution | 13 | 13 | 1-1 | −430.6 | 18.7 |
Link prediction results on WN18RR (left block of metric columns) and FB15k-237 (right block); Hits@N values are percentages.

| Model | MR | MRR | Hits@1 | Hits@3 | Hits@10 | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE (Bordes et al., 2013) [6] | 2300 | 0.243 | 4.27 | 44.1 | 53.2 | 323 | 0.279 | 19.8 | 37.6 | 44.1 |
| DistMult (Yang et al., 2015) [10] | 7000 | 0.444 | 41.2 | 47 | 50.4 | 512 | 0.281 | 19.9 | 30.1 | 44.6 |
| ComplEx (Trouillon et al., 2016) [11] | 7882 | 0.449 | 40.9 | 46.9 | 53 | 546 | 0.278 | 19.4 | 29.7 | 45 |
| ConvE (Dettmers et al., 2018) [12] | 4464 | 0.456 | 41.9 | 47 | 53.1 | 245 | 0.312 | 22.5 | 34.1 | 49.7 |
| ConvKB (Nguyen et al., 2018) [13] | 1295 | 0.265 | 5.82 | 44.5 | 55.8 | 216 | 0.289 | 19.8 | 32.4 | 47.1 |
| R-GCN (Schlichtkrull et al., 2018) [14] | 6700 | 0.123 | 8 | 13.7 | 20.7 | 600 | 0.164 | 10 | 18.1 | 30 |
| Our work | 1963 | 0.236 | 11.4 | 48.0 | 57.6 | 240 | 0.518 | 21.8 | 42.0 | 52.1 |
Xie, L.; Huang, H.; Du, Q. A Co-Embedding Model with Variational Auto-Encoder for Knowledge Graphs. Appl. Sci. 2022, 12, 715. https://doi.org/10.3390/app12020715