1. Introduction
Knowledge Graphs (KGs) are large-scale collections of triples, such as Freebase [1], YAGO [2] and DBpedia [3]. KGs play a crucial role in applications such as question answering, search engines, and smart medical care. Although KGs contain billions of triples, they are still incomplete, and this incompleteness limits practical applications [4]. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity, which significantly limits search and question answering [5]. Therefore, knowledge graph completion, also known as link prediction, which automatically predicts missing links between entities based on given links, has recently attracted growing attention.
Inspired by word embedding [6], researchers have recently tried to solve the link prediction task through knowledge graph embedding. Knowledge graph embedding models map entities and relations into low-dimensional vectors (or matrices, or tensors), measure the plausibility of triples through specific score functions over entities and relations, and rank triples by their scores. TransE [1] first proposed to model relations as geometric translations between entities. Many variants have since emerged.
The tensor decomposition models [7,8,9,10,11,12,13] are a family whose inference performance is relatively good among these variants. RESCAL [7] is the first and most basic tensor decomposition model. Since RESCAL [7] represents each relation as a full matrix, the large number of parameters makes the model difficult to learn effectively. DistMult [8] therefore diagonalizes the relation matrix, representing each relation as a vector, which significantly reduces the number of parameters. However, knowledge graphs contain many complex relation types, and DistMult is an over-simplified model that cannot describe such relations. Subsequent variants were invented to describe more types of relations, such as asymmetric and hierarchical relations, which amounts to designing unique structures for specific relation types. For example, ComplEx [9], similarly to DistMult [8], forces each relation embedding to be a diagonal matrix but extends this formulation to the complex space. Analogy [14] aims to model analogical reasoning, which is crucial for knowledge induction; it employs the general bilinear scoring function but adds two main constraints inspired by analogical structures. TuckER [10] relies on the Tucker decomposition [15], which factorizes a tensor into a set of vectors and a smaller shared core. SimplE [11] forces relation embeddings to be diagonal matrices, similarly to DistMult [8], but extends it by associating two separate embeddings with each entity and two separate diagonal matrices with each relation. These models mainly explore particular regularizations to improve performance, yet no matter how sophisticated their design, they cannot theoretically surpass the basic tensor decomposition model. In addition, previous tensor decomposition models do not consider the problem of attribute separation; this unnoticed task is simply handed over to training. However, the number of parameters required for this task is tremendous, and the model is prone to overfitting.
Considering that none of the variant models under the current research route can exceed the theoretical tensor decomposition model, in this paper we focus on making the tensor decomposition model approach its theoretical performance. Tensor decomposition models fail to achieve this theoretical performance because their large parameter counts limit dimensional expansion. Inspired by the attribute selection that occurs in practical comparisons of triples, we propose a tensor decomposition model based on attribute subspace segmentation.
In practice, entities are collections of attributes, and different entities can contain various semantic attributes. Comparing triples under different relations should therefore involve only the specific attributes selected by each relation.
Figure 1 shows a comparison of boxes with the same shape but different colors. When comparing attributes such as color or shape, we should first separate the colors or shapes of the entities to be compared and then compare the associations of the corresponding colors or shapes. Inspired by this fact, we should first separate the attributes that need to be compared. Measuring the plausibility of a given triple means comparing, between the entities, the matching degree of the attributes associated with the predicate. However, the traditional tensor decomposition model ignores the first operation (attribute separation). Therefore, in this paper we propose a novel model, a tensor decomposition model based on separating attribute space for knowledge graph completion (SeAttE). SeAttE transfers the large-parameter learning of the attribute space separation task in traditional tensor decomposition models into the design of the model structure. This effectively reduces the number of parameters, allowing the model to focus on learning the semantic equivalence between relations and to achieve better performance.
The actual size of an attribute subspace is related to the complexity of the relation, so predefined subspace sizes cannot model relations exactly. To facilitate the realization of the model, we propose a uniform attribute subspace initialization: SeAttE limits the size of each attribute subspace by setting a maximum attribute subspace dimension. In this way, the large number of parameters that would have to be learned for the attribute space separation task is transformed into the design of the model structure. This design dramatically reduces the parameters to be learned, so that the tensor decomposition model can be extended to higher dimensions, significantly improving performance.
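To make this design concrete, below is a minimal PyTorch-style sketch of a block-diagonal bilinear scorer with a uniform subspace size. The class name SeAttEScorer, the initialization scale and the method names are our own illustration under these assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class SeAttEScorer(nn.Module):
    """Illustrative block-diagonal bilinear scorer: the entity space is split
    into d/k attribute subspaces, and each relation acts on every subspace
    with its own k x k block instead of one full d x d matrix."""

    def __init__(self, n_entities: int, n_relations: int, d: int, k: int):
        super().__init__()
        assert d % k == 0, "entity dimension d must split into d/k subspaces"
        self.n_blocks, self.k = d // k, k
        self.entity = nn.Embedding(n_entities, d)
        # One k x k block per attribute subspace and relation (hypothetical init scale).
        self.relation = nn.Parameter(0.01 * torch.randn(n_relations, self.n_blocks, k, k))

    def score(self, h_idx, r_idx, t_idx):
        h = self.entity(h_idx).view(-1, self.n_blocks, 1, self.k)  # row vectors per subspace
        t = self.entity(t_idx).view(-1, self.n_blocks, self.k, 1)  # column vectors per subspace
        W = self.relation[r_idx]                                   # (batch, n_blocks, k, k)
        # h^T W_r t evaluated block by block, then summed over all subspaces.
        return (h @ W @ t).squeeze(-1).squeeze(-1).sum(dim=-1)
```

In this sketch, setting k = 1 collapses each block to a scalar and yields a DistMult-style scorer, while k = d recovers a full RESCAL matrix, consistent with the special-case relationships proved later in the paper.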
Overall, inspired by the fact that inference should first perform attribute space filtering, we propose SeAttE, a tensor factorization model based on separating attribute space for knowledge graph completion. Our main contributions are as follows.
SeAttE is the first model among the tensor decomposition family to consider the attribute space separation task. SeAttE transforms the learning of too many parameters for this task into the design of the model structure. This allows the model to focus on learning the semantic equivalence between relations, bringing the performance close to the theoretical limit. Experiments on the benchmark datasets show that SeAttE achieves state-of-the-art performance among the tensor factorization models.
We prove that RESCAL, DistMult, and ComplEx are all special cases of SeAttE.
We classify the tensor factorization models from a new perspective, which helps subsequent researchers better understand them.
The rest of this paper is organized as follows: Section 2 presents a brief overview of related work. We provide the problem formulation, including definitions, preliminaries and research questions, in Section 3. We analyze the design of SeAttE and prove its relation to previous tensor factorization models in Section 4. Experiments are conducted and discussed against existing KG embedding models in Section 5. Finally, we summarize our findings along with future directions in Section 6.
2. Related Work
In this section, we describe related work and the critical differences among these models. We divide knowledge graph embedding models into three leading families [16,17,18,19]: Tensor Decomposition Models, Geometric Models, and Deep Learning Models.
Tensor Decomposition Models. These models implicitly treat the set of triples as a tensor to be decomposed. DistMult [8] constrains all relation embeddings to be diagonal matrices, which reduces the parameter space and makes the model easier to train. RESCAL [7] represents each relation with a full-rank matrix. ComplEx [9] extends KG embeddings to the complex space to better model asymmetric and inverse relations. Analogy [14] employs the general bilinear scoring function but adds two main constraints inspired by analogical structures. Based on the Tucker decomposition, TuckER [10] factorizes a tensor into a set of vectors and a smaller shared core matrix. SimplE [11] is a simple enhancement of CP that allows the two embeddings of each entity to be learned dependently. HolE [13] is a multiplicative model that is isomorphic to ComplEx [9]. Inspired by the recent success of automated machine learning (AutoML), AutoSF [12] proposes to automatically design scoring functions for distinct KGs with AutoML techniques. QuatDE [20] captures the variety of relational patterns and separates different semantic information of entities, using transition vectors to adjust the positions of entity embeddings in the quaternion space via the Hamilton product, enhancing the feature interaction between the elements of a triple. DensE [21] develops a novel knowledge graph embedding method that provides an improved modeling scheme for complex composition patterns of relations.
Geometric Models. Geometric models interpret relations as geometric transformations in the latent space. TransE [1] is the first translation-based method; it treats relations as translations from head entities to tail entities. Following TransE [1], multiple variants, including TransH [22], TransR [23] and TransD [24], were proposed to improve the embedding performance of KGs. More recently, RotatE [25] defines each relation as a rotation from head entities to tail entities. Inspired by the fact that concentric circles in the polar coordinate system naturally reflect hierarchy, HAKE [26] maps entities into the polar coordinate system and can effectively model the semantic hierarchies in knowledge graphs. OTE [27] proposes a distance-based knowledge graph embedding: it first extends the modeling of RotatE from the 2D complex domain to a high-dimensional space with orthogonal relation transforms, and then integrates graph context into the distance scoring function to measure the plausibility of triples during training and inference.
Deep Learning Models. Deep learning models use deep neural networks to perform knowledge graph completion. ConvE [28] and ConvKB [29] employ convolutional neural networks to define score functions. CapsE [30] embeds entities and relations into one-dimensional vectors under the basic assumption that different embeddings encode homologous aspects in the same positions. CompGCN [31] utilizes graph convolutional networks to update knowledge graph embeddings. The Neural Tensor Network (NTN) combines an E-MLP with several bilinear parts. Nathani et al. [32] propose a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. RLH [33] is inspired by the hierarchical structure through which a human being handles cognitively ambiguous cases; the whole reasoning process is decomposed into a hierarchy of two-level reinforcement learning policies for encoding historical information and learning a structured action space. R2D2 [34] is a novel method for automatic reasoning on knowledge graphs based on debate dynamics; it frames the task of triple classification as a debate game between two reinforcement learning agents that extract arguments (paths in the knowledge graph) to promote the fact being true (thesis) or false (antithesis), respectively. RNNLogic [35] is a probabilistic model that treats logic rules as a latent variable and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. MADLINK [36] introduces an attentive encoder-decoder-based link prediction approach considering both the structural information of the KG and textual entity descriptions.
There are also other models, such as DURA [37], which is proposed to address overfitting. RuleGuider [38] leverages high-quality rules generated by symbolic methods to provide reward supervision for walk-based agents. SFBR [39] provides a relation-based semantic filter to extract the attributes that need to be compared and suppress the irrelevant attributes of entities. Most of the above studies aim to find a more robust representation approach. Measuring the validity of a triple amounts to comparing the matching degree of the specific attributes selected by the relation. Only a few models, such as TransH [22], TransR [23], and TransD [24], consider that entities in different triples should have different representations. However, these variants occupy considerable resources and are limited to particular models.
Although there is much research on this task, this paper mainly focuses on models based on tensor decomposition. Previous tensor decomposition models mainly achieved better performance through unique regularizations, but they still could not reach the theoretical upper limit of the basic tensor decomposition model; no matter how sophisticated their design, their performance is theoretically bounded by that of the basic model. Moreover, previous tensor decomposition models did not consider the problem of attribute separation; this unnoticed task was simply handed over to training. However, the number of parameters for this task is tremendous, and the model is prone to overfitting. Inspired by actual semantic comparison, this paper proposes an attribute subspace structure design, SeAttE, which approaches the theoretical upper limit of the tensor decomposition model. We describe the relationship between SeAttE and other tensor decomposition models in detail in Section 4.3.
3. Background
In this section, we introduce KG embedding, KG completion tasks and the notation used throughout this paper. Next, we briefly introduce several models involved in this paper.
3.1. KG Completion and Notations
KGs are collections of factual triples $(h, r, t)$, where $h, t \in \mathcal{E}$ are the head and tail entities and $r \in \mathcal{R}$ is the relation. In knowledge graph embedding, we associate the entities $h, t$ and the relation $r$ with vectors. We then design an appropriate scoring function $\phi: \mathcal{E} \times \mathcal{R} \times \mathcal{E} \rightarrow \mathbb{R}$ that maps the embedding of a triple to a score. For a particular query $(h, r, ?)$, the task of KG completion is to rank all possible answers and obtain a preference over predictions.
We use $M_r$ and $\mathbf{r}$ to distinguish the matrix and vector representations of a relation, respectively. $\cdot^{T}$, $\langle \cdot, \cdot, \cdot \rangle$ and $\circ$ denote transposition, the generalized dot product and the Hadamard product, respectively. In particular, we use $W_r$ to represent the relation matrix in SeAttE. Let $\| \cdot \|$, $\mathrm{diag}(\cdot)$ and $\mathrm{Re}(\cdot)$ denote the norm, matrix diagonalization and the real part of a complex vector, respectively.
3.2. Basic Models
Tensor Factorization Models. Models in this family interpret link prediction as a tensor decomposition task, where triples are decomposed into a combination (e.g., a multi-linear product) of low-dimensional vectors for entities and relations. CP [40] represents triples with the canonical decomposition; note that the same entity has different representations at the head and at the tail of a triple. The score function can be expressed as:
$$\phi(h, r, t) = \langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle = \sum_{i=1}^{d} h_i r_i t_i,$$
where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{d}$.
RESCAL [7] represents a relation as a matrix $M_r \in \mathbb{R}^{d \times d}$ that describes the interactions between the latent representations of entities. The score function is defined as:
$$\phi(h, r, t) = \mathbf{h}^{T} M_r \mathbf{t}.$$
DistMult [8] forces all relation matrices to be diagonal, which consistently reduces the space of parameters to be learned, resulting in a much easier model to train. On the other hand, this makes the scoring function commutative, which amounts to treating all relations as symmetric:
$$\phi(h, r, t) = \mathbf{h}^{T} \mathrm{diag}(\mathbf{r}) \mathbf{t} = \langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle,$$
where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{d}$.
ComplEx [9] extends the embeddings from the real space to the complex space and constrains each relation embedding to be a diagonal matrix; the bilinear product becomes a Hermitian product in the complex space. The score function can be expressed as:
$$\phi(h, r, t) = \mathrm{Re}\left( \langle \mathbf{h}, \mathbf{r}, \overline{\mathbf{t}} \rangle \right),$$
where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^{d}$ and $\overline{\mathbf{t}}$ denotes the complex conjugate of $\mathbf{t}$.
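For reference, the four score functions above fit in a few lines of code. This is a sketch assuming batched PyTorch tensors: h, r, t of shape (batch, d), complex-valued for ComplEx, and a full relation matrix M_r of shape (batch, d, d) for RESCAL.

```python
import torch

def score_cp(h, r, t):
    # CP: trilinear product sum_i h_i r_i t_i, with separate head/tail
    # embeddings per entity looked up before calling this function.
    return (h * r * t).sum(dim=-1)

def score_rescal(h, M_r, t):
    # RESCAL: h^T M_r t with a full d x d relation matrix.
    return torch.einsum("bi,bij,bj->b", h, M_r, t)

def score_distmult(h, r, t):
    # DistMult: RESCAL with diag(r); same form as CP, but each entity has a
    # single embedding, so the score is symmetric in h and t.
    return (h * r * t).sum(dim=-1)

def score_complex(h, r, t):
    # ComplEx: Re(<h, r, conj(t)>) with complex-valued embeddings.
    return torch.real(h * r * torch.conj(t)).sum(dim=-1)
```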
5. Experiments and Discussion
This section is organized as follows. First, we introduce the experimental settings in Section 5.1. Then, we show the effectiveness of SeAttE on three benchmark datasets in Section 5.2. Finally, we visualize and analyze the embeddings generated by SeAttE in Section 5.3.
5.1. Experimental Settings
Dataset. To evaluate the proposed model, we consider three common knowledge graph datasets: WN18RR [41], FB15k-237 [28] and YAGO3-10 [42]. Details of these datasets are listed in Table 1.
FB15k-237 is obtained by eliminating the inverse and equivalent relations in FB15k, making it more difficult for simple models to do well. WN18RR is obtained by excluding inverse and equivalent relations from WN18; its main relation patterns are symmetry/antisymmetry and composition. YAGO3-10 is a subset of YAGO3, produced to alleviate the test set leakage problem.
Evaluation Protocol and Settings. For evaluation, we use the same ranking procedure as in the literature [43]. For each test triple, the head is removed and replaced in turn by each entity in the dictionary. Dissimilarities (or energies) of the corrupted triples are first computed by the models and then sorted in ascending order, and the rank of the correct entity is stored. The whole procedure is repeated while removing the tail instead of the head. We use the evaluation metrics standard across the link prediction literature: mean reciprocal rank (MRR) and Hits@k, k = 1, 3, 10. MRR is the average of the inverse of the rank assigned to the true triple, taken over all test triples. Hits@k measures the percentage of times a true triple is ranked within the top k candidates. We evaluate link prediction in the filtered setting [1], i.e., all known true triples except the current test triple are removed from the candidate set. Higher MRR and higher Hits@1/3/10 indicate better performance.
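As an illustration of this protocol, the following sketch computes filtered MRR and Hits@k for tail prediction. The helpers score_fn (returning one score per candidate tail for a query) and all_true (a dictionary mapping each (h, r) query to its known true tails) are hypothetical names of ours, not part of any particular codebase.

```python
import numpy as np

def filtered_metrics(test_triples, all_true, score_fn, ks=(1, 3, 10)):
    """Filtered MRR / Hits@k for tail prediction; head prediction is evaluated
    the same way on triples with head and tail swapped."""
    ranks = []
    for h, r, t in test_triples:
        scores = np.array(score_fn(h, r), dtype=float)  # one score per candidate tail
        # Filtered setting: remove every known true tail except the test one.
        for other in all_true.get((h, r), ()):
            if other != t:
                scores[other] = -np.inf
        ranks.append(int((scores > scores[t]).sum()) + 1)  # rank 1 = best
    ranks = np.asarray(ranks, dtype=float)
    return {"MRR": float((1.0 / ranks).mean()),
            **{f"Hits@{k}": float((ranks <= k).mean()) for k in ks}}
```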
Baselines and Training Protocol. In this section, we compare the performance of SeAttE against three categories of KGC models: (1) geometric models, including TransE [1], TransH [22], TransR [23], RotatE [25], TuckER [10], AutoERTR [44] and HAKE [26]; (2) tensor decomposition models, including CP [40], SimplE [11], DistMult [8], RESCAL [7], ANALOGY [14], ComplEx [9], DURA [37], SFBR [39] and AutoSF [12]; and (3) deep learning models, including ConvE [28], RAN [45], ConvKB [29], CapsE [30] and the model of Nathani et al. [32].
Because ComplEx is a special case of SeAttE, the parameters of our experiments are consistent with those in DURA [37]. SeAttE only introduces the attribute subspace dimension parameter on top of DURA, which is marked in the specific experimental results.
5.2. Comparison with Existing Link Prediction Models
In this section, we compare the results of SeAttE and other state-of-the-art models on three benchmark datasets.
Table 2 shows the comparison between SeAttE and geometric models. SeAttE outperforms all the compared geometric models in MRR, Hits@1 and Hits@10. Compared with the best geometric model, HAKE, SeAttE still shows significant improvements: on YAGO3-10, MRR increases by 4%; on FB15k-237, MRR increases by 2.5%.
Table 3 shows the comparison between SeAttE and deep learning models. SeAttE achieves the best performance on WN18RR and YAGO3-10. Compared with the best deep learning model, SeAttE still shows significant improvements: on YAGO3-10, MRR increases by 5.8%; on WN18RR, MRR increases by 3.2%. Nathani's model retains the best performance on FB15k-237 because it applies a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood. Utilizing graph neural network techniques for link prediction is our ongoing research.
Table 4 shows the comparison between SeAttE and tensor decomposition models. SeAttE achieves the best performance on all datasets. On WN18RR, RESCAL-DURA previously achieved the best performance, and SeAttE matches its inference performance. On FB15k-237 and YAGO3-10, ComplEx-DURA previously performed best, and SeAttE matches its inference performance. This experiment also verifies the analysis of SeAttE and the proof in Section 4.2.
In summary, SeAttE belongs to the family of tensor decomposition models and, compared with other tensor models, reaches the upper performance limit of this family. Compared with geometric models, SeAttE achieves the best overall performance. Compared with deep learning models, SeAttE achieves the best performance on some datasets; since Nathani's model utilizes a novel attention-based feature embedding that captures neighborhood features, it remains best on FB15k-237. The comparative experiments show that separating the attribute space allows the model to focus on learning the semantic equivalence between relations, yielding performance that approaches the theoretical limit.
5.3. Visualization and Analysis
In this part, we analyze the performance of SeAttE from two aspects. First, we visualize the learned embeddings through t-SNE; then, we examine the additional resources occupied by SeAttE.
Visualization. We use t-SNE to visualize the embeddings of tail entities. Suppose the link prediction task is $(h, r, ?)$, where $h$ and $r$ are the head entity and the relation, respectively. We randomly select ten queries in FB15k-237, each of which has more than 50 answers, and use t-SNE to visualize the tail embeddings generated by RESCAL and SeAttE. For each query, we convert its answers into two-dimensional points with t-SNE and display them in the same color.
As shown in Figure 5 and Figure 6, which visualize the distribution of the answers to the 10 queries, SeAttE makes the answers to the same query more similar. This indicates that SeAttE effectively separates the needed semantics of each entity and suppresses the attributes of other dimensions, which verifies the claim in Section 4.1.
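A plot of this kind can be reproduced with scikit-learn's t-SNE. The following is a minimal sketch under the assumption that the answer embeddings and their query labels have already been collected into NumPy arrays; the function name and arguments are ours.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_answer_clusters(tail_embeddings: np.ndarray, query_ids: np.ndarray) -> None:
    """Project the answer (tail) embeddings to 2-D with t-SNE and colour the
    points by the (h, r, ?) query they answer."""
    points = TSNE(n_components=2, init="pca", random_state=0).fit_transform(tail_embeddings)
    for q in np.unique(query_ids):
        mask = query_ids == q
        plt.scatter(points[mask, 0], points[mask, 1], s=10, label=f"query {q}")
    plt.legend(fontsize=7)
    plt.show()
```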
Resource occupation. As shown in Table 5, Table 6 and Table 7, we compare the parameter sizes of different models under identical entity dimensions. When the entity dimension d is fixed, the number of parameters in SeAttE increases only slightly as the subspace dimension k increases. First, we compare the parameters of ComplEx and SeAttE: when the subspace dimension k is set to two, the parameters of SeAttE and ComplEx are the same, which is consistent with the proof in Section 4.2, and as k increases, the parameter count of SeAttE becomes slightly higher than that of ComplEx. We then compare the parameters of RESCAL and SeAttE in the three tables and find that SeAttE has far fewer parameters than RESCAL at the same entity dimension. In summary, the experiments show that the learning of too many parameters for the attribute space separation task in traditional tensor decomposition models is transformed into the structure design of SeAttE. SeAttE achieves good performance while significantly reducing the number of parameters, verifying the statement in Section 4.1.
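These counts follow directly from the embedding shapes. The sketch below reproduces the back-of-envelope arithmetic; note that it counts SeAttE with unconstrained k x k blocks, whereas the structured 2 x 2 blocks that recover ComplEx carry only two parameters each. The sizes in the usage line are the standard FB15k-237 statistics.

```python
def parameter_counts(n_ent: int, n_rel: int, d: int, k: int) -> dict:
    """Rough parameter counts at a fixed (real) entity dimension d.
    SeAttE stores d/k unconstrained blocks of size k x k per relation,
    i.e. (d/k) * k * k = d * k numbers per relation."""
    assert d % k == 0
    return {
        "RESCAL":  n_ent * d + n_rel * d * d,  # full d x d relation matrices
        "ComplEx": n_ent * d + n_rel * d,      # d real numbers per diagonal complex relation
        "SeAttE":  n_ent * d + n_rel * d * k,  # block-diagonal relation matrices
    }

# Example with FB15k-237-sized counts: 14,541 entities, 237 relations.
print(parameter_counts(n_ent=14541, n_rel=237, d=2000, k=4))
```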