5.2. Link Prediction
The main task of link prediction is to predict t given (h, r), or h given (r, t), for a relation triple (h, r, t) in which either the head entity or the tail entity is missing. It evaluates the quality of the learned embeddings by ranking the candidate entities in the knowledge graph rather than by directly producing a single best answer. In our experiments, this task was divided into two subtasks, that is, link prediction for instance relation triples and link prediction for concept relation triples. We performed both subtasks on the two datasets.
Experimental design. First, we divided the triple set into the training set, the validation set, and the testing set, with relative percentages of 85%, 5%, and 10%, in accordance with the setting of the widely used benchmark model TransE [2]. To avoid overfitting, we used the parameters selected on the validation set during the testing stage. For each testing triple (h, r, t), we replaced the tail entity t with every entity x in the knowledge graph, obtaining a set of corrupted triples (h, r, x). After calculating the objective-function score F(h, r, x) of each corrupted triple and ranking the scores in ascending order, we obtained the rank of the testing triple (h, r, t) among all candidate triples. Similarly, the ranking over the candidate triples (x, r, t) obtained by replacing the head entity h of the testing triple (h, r, t) could also be computed. Like TransE [
2], we used two evaluation metrics in this task: the mean reciprocal rank (MRR) over all testing triples, and the proportion of correct triples that rank no larger than N (Hits@N). The better the performance of the model, the larger the MRR and Hits@N values. In addition, we applied the “Filter” [4] setting from earlier studies: if a corrupted triple already exists in the knowledge graph (that is, the triple obtained by replacing the head or tail is itself correct), it may reasonably score lower than the original testing triple. To eliminate this interference, before computing the rank of each testing triple we removed every corrupted triple that appears in the training set, the validation set, or the testing set, so that no remaining candidate triple belongs to any of the three sets.
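The filtered ranking protocol described above can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the score function, entity indexing, and function names (filtered_ranks, mrr, hits_at) are assumptions, and only tail replacement is shown since head replacement is symmetric.

```python
import numpy as np

def filtered_ranks(test_triples, all_triples, num_entities, score_fn):
    """Filtered link-prediction evaluation: for each test triple (h, r, t),
    score every candidate tail x, mask corrupted triples (h, r, x) that are
    already known to be correct (the "Filter" setting), and record the
    1-based rank of the true tail under ascending score order."""
    known = set(all_triples)  # union of training, validation, and testing triples
    ranks = []
    for h, r, t in test_triples:
        scores = np.array([score_fn(h, r, x) for x in range(num_entities)], dtype=float)
        # Remove "interfering" candidates: other known-correct tails must not
        # count against the test triple's rank.
        for x in range(num_entities):
            if x != t and (h, r, x) in known:
                scores[x] = np.inf
        rank = 1 + int(np.sum(scores < scores[t]))  # lower score = better
        ranks.append(rank)
    return ranks

def mrr(ranks):
    """Mean reciprocal rank over all test triples."""
    return float(np.mean([1.0 / r for r in ranks]))

def hits_at(ranks, n):
    """Proportion of test triples ranked no larger than n."""
    return float(np.mean([r <= n for r in ranks]))
```

For example, if a known-correct corrupted triple scores better than the test triple, masking it restores the test triple's rank to 1, which is exactly the effect the “Filter” setting is meant to achieve.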
Experimental implementation. During training, we tuned the learning rate of the stochastic gradient descent algorithm, the margin hyperparameters, the embedding space dimensions, and the weights of the loss components over grids of candidate values. The best configuration was determined by the Hits@10 of the validation set. “Unif” denotes the traditional strategy of replacing the head or tail entity with equal probability, and “self-adv” denotes the self-adversarial negative sampling strategy with its sampling temperature parameter. The best parameters of the model take the same values on both datasets, with batch = 128 and the self-adversarial sampling parameter under the “self-adv” strategy. The model was trained for 5000 iterations on each dataset. For fairness, we selected TransE as the intra-view model in JOIE.
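The “self-adv” strategy weights negative samples by how plausible the current model considers them, following the self-adversarial scheme popularized by RotatE. A minimal sketch, assuming a distance-style score where lower means more plausible; the function names, the temperature argument alpha, and the margin gamma are illustrative, not the paper's exact formulation:

```python
import numpy as np

def self_adv_weights(neg_distances, alpha=1.0):
    """Self-adversarial weights: negatives the model currently scores as more
    plausible (smaller distance) receive larger softmax weight.  alpha is the
    sampling temperature; the weights are treated as constants (no gradient)
    in the original self-adversarial formulation."""
    logits = -alpha * np.asarray(neg_distances, dtype=float)
    logits -= logits.max()          # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def self_adv_loss(pos_distance, neg_distances, gamma, alpha=1.0):
    """Margin-based loss with self-adversarially weighted negatives,
    using sigmoid terms as in the RotatE-style objective."""
    w = self_adv_weights(neg_distances, alpha)
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos_term = -np.log(sig(gamma - pos_distance))
    neg_term = -np.sum(w * np.log(sig(np.asarray(neg_distances) - gamma)))
    return float(pos_term + neg_term)
```

Under “unif”, every negative would instead receive the uniform weight 1/k; the self-adversarial weights concentrate the loss on hard negatives, which is what makes the sampled negatives higher quality.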
Analysis of experiment results. Table 2 presents the experiment results. The table shows that CIST performs better than the other baseline models on almost every evaluation metric, especially the CIST (self-adv) variant, which uses the self-adversarial negative sampling method. On Hits@1 and Hits@10, CIST (self-adv) achieves scores almost twice as high as those of the other baselines, which further verifies CIST’s effectiveness.
Compared with the traditional models TransE, DistMult, and HolE, CIST outperforms them on average by 87.2% on MRR, 71.8% on Hits@1, and 144.2% on Hits@10. This shows that CIST can learn the specific features of instances and concepts during entity embedding learning by exploiting the latent semantic relations between instances and concepts. CIST adds these features to the embedding learning of the knowledge graph, which makes its performance much better than that of the other models. Compared with TransC, CIST outperforms it on average by 44.1% on MRR, 57.3% on Hits@1, and 38.3% on Hits@10. This is because, unlike TransC, CIST models instances and concepts in different embedding spaces, thereby avoiding the limited learning performance caused by the gathering of different instances that belong to the same concept. The experiment results demonstrate the effectiveness of modeling instances and concepts in different embedding spaces.
Compared to JOIE, CIST outperforms it on average by 21.4% on MRR, 51.1% on Hits@1, and 29.2% on Hits@10. This is because, unlike JOIE, CIST adds a neighboring-range parameter to the embedding of each concept, which models the transitivity of the isA relations and the situation in which the same instance belongs to different concepts. The experiment results show that CIST learns the latent semantic relations between instances and concepts better. With the “self-adv” setting, CIST generally outperforms its “unif” counterpart on MRR, Hits@1, and Hits@10, demonstrating the effectiveness of the “self-adv” negative sampling strategy.
Compared to the other baseline models, CIST with “unif” sampling performs slightly worse on instance relation triples on the YAGO26K-906 dataset. The reason may be that we selected the best configuration based on the Hits@10 of all triples on the validation set, which may not be optimal for instance relation triples alone and may lead to relatively lower evaluation values.
Overall, CIST is an effective knowledge graph embedding learning model that benefits from the mutual learning of instance embeddings and concept embeddings. As a result, CIST performs well on link prediction tasks.
5.3. Triple Classification
Determining whether a testing triple should be labeled “correct” or “incorrect” is the main task of triple classification. The triples can be instance relation triples, concept relation triples, instanceof triples, or subclassof triples. This is a binary classification task, and its evaluation metrics are accuracy, precision, recall, and F1 score, as commonly used in binary classification. We constructed the negative triples needed for triple classification testing according to the same settings as the neural tensor network model NTN [29]: we constructed one negative triple for each positive triple in the validation set and the testing set, so the numbers of positive and negative triples in the validation set and the testing set are the same.
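The NTN-style construction of one negative per positive can be sketched as follows. This is a hypothetical helper under assumed names (corrupt, known): a negative is built by replacing the head or tail with a random entity, resampled until it is not itself a known positive.

```python
import random

def corrupt(triple, entities, known, rng=random):
    """Build one negative triple for a positive one by replacing the head
    or the tail with a random entity, resampling until the corrupted
    triple is not a known positive.  Assumes at least one valid
    corruption exists, otherwise the loop would not terminate."""
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        neg = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if neg not in known and neg != triple:
            return neg
```

Because exactly one negative is drawn per positive, the validation and testing sets end up with equal numbers of positive and negative triples, as described above.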
Experimental design. We divided the triple set into the training set, the validation set, and the testing set, with percentages of 60%, 20%, and 20%, respectively. To avoid overfitting, we used the parameters selected on the validation set during the testing stage. We set a threshold for each relation r in the dataset. For a given testing relation triple (h, r, t), we calculated the score of its score function F(h, r, t). If the score was less than the threshold, the triple was predicted to be “correct”; otherwise, it was predicted to be “incorrect”. Similarly, an instanceof triple (x, r, c) was predicted to be “correct” if the score of Formula (4) was below the corresponding threshold, and a subclassof triple (x, r, c) was predicted to be “correct” if the score of Formula (5) was below its threshold. Each relation-specific threshold was determined by maximizing the classification accuracy on the validation set.
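Selecting a per-relation threshold by maximizing validation accuracy can be sketched as below. The function name best_threshold and the midpoint candidate set are illustrative assumptions, not the paper's procedure; any threshold between the same pair of sorted scores yields the same accuracy, so midpoints suffice.

```python
def best_threshold(scores, labels):
    """Pick the score threshold that maximizes classification accuracy on
    one relation's validation triples: triples scoring below the threshold
    are predicted "correct" (label 1).  Candidates are the midpoints
    between consecutive sorted scores, plus one value below the minimum
    and one above the maximum."""
    pairs = sorted(zip(scores, labels))
    candidates = [pairs[0][0] - 1.0]
    candidates += [(a[0] + b[0]) / 2 for a, b in zip(pairs, pairs[1:])]
    candidates.append(pairs[-1][0] + 1.0)

    def accuracy(th):
        return sum((s < th) == bool(y) for s, y in zip(scores, labels)) / len(scores)

    return max(candidates, key=accuracy)
```

At test time, the same rule is applied with the chosen threshold: a triple is labeled “correct” exactly when its score falls below the threshold learned for its relation.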
Experimental implementation. In this task, the model parameters were optimized in the same way as in the link prediction task. The optimal configuration was determined by the accuracy on the validation set. The optimal parameter configurations of the model are as follows: for the YAGO26K-906 dataset, batch = 128 and the self-adversarial sampling parameter under the “self-adv” strategy; for the DB111K-174 dataset, batch = 128 and the self-adversarial sampling parameter under the “self-adv” strategy. For each dataset, all training triples were trained for 5000 iterations.
Experiment results. In our datasets, triples can be categorized into four different types: instance relation triples, concept relation triples, instanceof triples, and subclassof triples. We conducted experiments on each of the four triple sets. The experiment results are shown in Table 3, Table 4, Table 5, and Table 6, respectively.
We have the following observations from the experiment results.
(1) CIST achieves the highest accuracy and F1 score across all experiments, showing that it outperforms the state-of-the-art models on triple classification tasks. Although some baselines outperform CIST on precision or recall, CIST still achieves the highest F1 score. Since the F1 score balances precision and recall, the advantage of CIST is still verified to a large extent.
(2) Compared with TransE, CIST outperforms it on average by 30.5% on accuracy, 29.0% on precision, 17.9% on recall, and 23.3% on F1 score. This further proves that CIST can model the specific features of instances and concepts when learning entity embedding by exploiting the latent semantic relations between instances and concepts.
(3) From Table 3, Table 5, and Table 6, compared to TransC, CIST outperforms it on average by 17.4% on accuracy, 16.1% on precision, 14.4% on recall, and 14.9% on F1 score. The experiment results further demonstrate the effectiveness of modeling instances and concepts in different embedding spaces. TransC is not able to model concept relation triples, and thus we did not include it in Table 4.
(4) Compared to JOIE, CIST outperforms it on average by 16.3% on accuracy, 13.9% on precision, 16.7% on recall, and 15.2% on F1 score, which further demonstrates that CIST can model the transitivity of isA relations and learn the latent semantic links between instances and concepts much better.
(5) For the CIST model, compared to the “unif” sampling strategy, CIST (self-adv) outperforms it on average by 4.5% on accuracy, 3.7% on precision, 5.6% on recall, and 3.9% on F1 score. This shows that the “self-adv” sampling strategy is more effective than “unif” because it extracts higher-quality negative samples.
In conclusion, compared to TransC and JOIE, CIST can alleviate the gathering issue of instance and concept embeddings and can model the transitivity of isA relations much better, enabling CIST to achieve better results in triple classification tasks.