Article

Op-Trans: An Optimization Framework for Negative Sampling and Triplet-Mapping Properties in Knowledge Graph Embedding

College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(5), 2817; https://doi.org/10.3390/app13052817
Submission received: 21 December 2022 / Revised: 7 February 2023 / Accepted: 16 February 2023 / Published: 22 February 2023

Abstract
Knowledge graphs are a popular research field in artificial intelligence and store large amounts of real-world data. Since data are enriched over time, knowledge graphs are often incomplete. Therefore, knowledge graph completion, which predicts missing links based on existing facts, is particularly important. Currently, the family of translation models delivers a better performance in knowledge graph completion. However, most of these models randomly generate negative triplets during training, resulting in low-quality negative triplets. In addition, such models ignore the important characteristics of triplet-mapping properties during model learning. Therefore, we propose an optimization framework based on the translation models (Op-Trans). It enhances the knowledge-graph completion effect from both negative sampling and triplet-mapping properties. First, we propose a clustering cache to generate negative triplets based on entity similarity. This sampling method can directly use the cache to track negative triplets with large scores. In addition, we focus on the different contributions of the triplets to the optimization goal. We calculate a distinct weight for each triplet according to its mapping properties. In this way, the scoring function deals with each triplet depending on its own weight. The experimental results show that Op-Trans can help the state-of-the-art baselines obtain a better performance in the link prediction task.

1. Introduction

A knowledge graph is a directed graph structure with entities as nodes and relations as directed edges. Knowledge in a knowledge graph is expressed in the form of triplets (head entity, relation, tail entity) [1], and each triplet represents an objective fact. A knowledge graph contains many triplets that constitute a large and complex knowledge network graph. It organizes and stores information in a simple and efficient way, which is very suitable for information query and reasoning.
A knowledge graph can efficiently manage large amounts of information and has excellent semantic processing abilities. Therefore, knowledge graphs have been widely studied by researchers since they were proposed. Knowledge graphs have developed rapidly in recent years and many classical knowledge graphs have been established, such as WordNet [2], Freebase [3], DBpedia [4], and NELL [5]. These knowledge graphs are widely used in the field of artificial intelligence [6,7], including intelligent search [8], automated question and answer [9], and recommendation systems [10].
Although knowledge graphs can provide high-quality structured data [11], many large-scale knowledge graphs, such as Freebase and DBpedia, are usually sparse, and the implicit relations between many entities are not fully explored. According to statistics, in Freebase 71% of people have no birth date information and 75% have no nationality information [12]. Current real-world knowledge graphs are usually incomplete, so they need an inference engine to predict the missing links. Therefore, knowledge graph completion, also known as link prediction, which automatically predicts missing links based on the given links, has recently become a significant topic of interest. However, triplets in knowledge graphs are hard to manipulate, and their symbols and logic are difficult for digital machines to learn. Thus, one fundamental problem is how to find a good representation for the entities and relations in the knowledge graph.
In recent years, knowledge graph embedding (KGE) [13] has been introduced into knowledge graphs and has shown a good performance. It maps entities and relations into a low-dimensional vector space while capturing the structural and semantic information of the knowledge graph [14]. KGE learns the representation vectors of entities and relations based on constructed knowledge graphs, so that these vectors can reflect information about the structure of nodes and edges. At present, scholars have proposed some classical KGE models in the field of knowledge graph completion, such as the structured embedding (SE) [15] model, semantic matching energy (SME) [16] model, neural tensor network (NTN) [17] model, and matrix factorization (RESCAL) [18] model. However, these KGE models have a high computational complexity and cannot learn the semantic information of entities and relations effectively. They are difficult to apply to some practical application scenarios. Therefore, researchers began to explore new KGE models. After practical tests, translation models have gradually become the mainstream in KGE.
The TransE [19] model first proposed to use relations vectors as the geometric distance between entities [20], which is a typical representative of translation models. To represent entities and relations more effectively, TransE has been continuously improved and has established models of the Trans (Translation Embedding Methods) family. These include TransH [21], TransR [22], TransD [23], TransM [24], TranSparse [25], STransE [26], TransA [27], TransAt [28], PTransE [29], etc. Recently, RotatE [30] has changed the translation process of the head and tail entity vectors in the plane into the rotation translation of the vectors in space. HAKE [31] maps the entities into spatial polar coordinates. OTE [32] exploits the orthogonality relation variation, extending the rotation model from a two-dimensional complex domain to a higher-dimensional space. Although all these works display a good performance in knowledge graph completion, there are still two major challenges when modeling entities and relations of complex knowledge graphs.
Challenge 1: Negative sampling, which is a very important aspect of KGE, is not sufficiently emphasized. The existing KGE models widely use random sampling, which generates negative samples by randomly replacing an entity in a positive sample with another entity in the knowledge graph. However, this method will most likely sample entities that do not belong to the same entity type or are semantically unrelated as replacements. For example, for the positive triplet (China, Capital, Beijing), random sampling is used to select a replacement tail entity to generate a negative triplet. Suppose an entity of the type "person", Yao Ming, is selected from the entity set, forming the negative triplet (China, Capital, Yao Ming). Because "Yao Ming" has no semantic relationship with "Beijing", the quality of this generated negative triplet is very low, which is not helpful for training the link prediction model. Low-quality negative triplets are not beneficial for learning entity and relation vectors, and prevent KGE from achieving the desired performance.
Challenge 2: Figure 1 shows the vector distribution of the triplets (China, Capital, Beijing) and (China, Located, Asia) in the embedding space. The embedding vectors of the entities and relation in (China, Capital, Beijing) represent the semantic relationship of this triplet well, while those in (China, Located, Asia) do not. Existing studies treat every embedding vector of entities and relations equally, ignoring the variability of their performance in different triplets. These studies are not flexible enough to accurately tackle the various mapping properties of triplets. The triplet-mapping properties are the vector representation status of a triplet in the embedding space. Our intuition is that the mapping properties of each triplet are decided by the embedding vectors of its entities and relation. When triplet-mapping properties are ignored, the projection distance of entities and relations in each triplet receives a fixed weight. This misses important feature information in each triplet, thus reducing the representation power of the model and the accuracy of predicting candidate entities.
In order to deal with the above problems, this paper proposes a learning optimization framework for translation models (Op-Trans), aiming to improve the model's ability to capture triplet features and thereby achieve accurate knowledge graph completion. The contributions of our work are summarized as follows:
(1) We propose an effective negative sampling method to generate high-quality negative triplets, based on cluster-cache sampling (CCS). It is a general negative sampling method that can be applied to numerous KGE models.
(2) We propose a weighting strategy based on triplet-mapping properties, associating each triplet with a weight that represents the degree of mapping. We will focus on the different contributions of each triplet to the optimization goal.
(3) We combine the proposed optimization framework Op-Trans with underlying translation models, and we conduct experiments on four popular data sets, i.e., WN18 [33], FB15k [33], WN18RR [34], and FB15K237 [35]. The experiments show that our method is effective and universal.

2. Related Works

In this section, we introduce the current mainstream translation models and the critical differences between them. We also describe related works on negative sampling methods in the field of knowledge graph completion.

2.1. The Translation Models

Inspired by the translation invariance of word-embedding vectors in the word2vec model, Bordes et al. proposed the first translation model, TransE [19]. The model treats relations as translation operations from the head entities to the tail entities. TransE expects a positive triplet to satisfy h + r − t = 0. However, for complex relations such as 1-to-N, N-to-1, and N-to-N, this assumption forces the embedding vectors of the N related entities to be approximately identical, as shown in Figure 2. In addition, if the relation is reflexive, such as the classmate relationship, h + r − t = 0 and t + r − h = 0 may hold at the same time, leading to the embedding vectors h = t or r = 0. Thus, TransE only offers good results when modeling simple 1-to-1 relationships, and finds it more difficult to handle complex relationships.
To solve the problem of complex relationships between entities, researchers have made a series of improvements to TransE and proposed several classical translation models. The TransH [21] model projects the head and tail entity vectors onto the hyperplane of the relation, using the hyperplane's unit normal vector, and completes the translation process on that hyperplane. This allows an entity to have different representations on different relational hyperplanes.
The TransH model alleviates the shortcomings of TransE in handling complex relationships. However, it still maps entities and relations to the same semantic space, which limits its ability to represent the entities and relations of a knowledge graph. The TransR [22] model establishes a separate entity space and relation space, and completes the translation in the relation's semantic space, into which the entity vectors are projected through a relation-specific mapping matrix. Compared with TransH, TransR generalizes better because it takes into account that entities may have multiple aspects and that different relations focus on different aspects of an entity. However, because TransR creates a mapping matrix for each relation, it introduces more parameters and increases the computational complexity of the model, making it difficult to apply to large-scale knowledge graphs. The TransD [23] model replaces the fixed relation-mapping matrices of TransR with matrices dynamically determined by entity-relation pairs. TransD uses two vectors to represent each entity and relation: the first represents the meaning of the entity or relation, and the second is a projection vector used to construct the mapping matrix that projects the entity into the vector space of the relation. It replaces the matrix multiplication in TransR with vector multiplication, which greatly reduces the computational effort and thus increases the speed of operations. The TransM [24] model differs from TransE in that it does not treat every triplet equally. It calculates the weight of each triplet based on the complexity of its relation: the more complex the relation, the lower the weight of the triplet. However, TransM focuses only on the type of relation, so triplets with the same relation receive the same weight; it does not take into account the mapping properties of each triplet. TranSparse [25] is designed to address the heterogeneity (different relations correspond to different numbers of entity nodes) and imbalance (the same relation corresponds to different numbers of head and tail entity nodes) of entities and relations. It adaptively constructs sparse projection matrices [36,37] for entities according to the complexity of the relations, which prevents over-fitting on simple relations and under-fitting on complex relations.

2.2. Negative Sampling Methods

During model training, a negative triplet needs to be generated for each positive triplet. KGE models establish the loss function by calculating the scoring functions of the two triplets, so high-quality negative triplets give the model a better generalization ability. Therefore, some works have explored different negative sampling methods to obtain high-quality negative triplets for the link prediction.
The existing negative sampling methods fall into two main categories: fixed-distribution sampling and dynamic-distribution sampling [38]. Earlier sampling works, such as the TransE model, use uniform random sampling to generate negative samples, randomly replacing the head or tail entity of a triplet with any entity in the entity set with equal probability. However, this method is overly simple, and the sampling space is fixed and large. The probability of obtaining high-quality negative samples is small, which prevents the entity and relation vectors from being updated effectively when training the KGE models. In addition, there is a greater chance of generating a false negative triplet, i.e., a generated negative triplet that happens to be a positive triplet present in the knowledge graph, which is not conducive to model training. The TransH model uses the Bernoulli sampling method, which aims to reduce false negative triplets by replacing the head or tail entity of a triplet with different probabilities. However, this sampling method still samples from a fixed space.
Inspired by the generative adversarial network (GAN) [39], some works have attempted to generate high-quality negative triplets using an adversarial training framework. IGAN [40] and Kbgan [41] both introduce GAN for negative sampling in KGE; when GAN is applied to negative sampling, it not only generates high-quality negative triplets but also dynamically adapts to new distributions through model training. However, GAN training is unstable and prone to large deviations in experiments. To mitigate this, IGAN and Kbgan are trained with many additional parameters, which raises the cost and difficulty of model training. NSCaching [38] has fewer parameters than both IGAN and Kbgan. It stores negative triplets directly in a cache and dynamically updates the cache during training so that it contains more high-quality negative triplets. Although the negative sampling methods mentioned above help to improve the performance of KGE, they do not take into account the different mapping properties between entities during negative sampling.

3. Proposed Method

Traditional translation models are mainly designed to address the complexity and diversity of entities and relations in triplets. They do not consider the mapping properties of triplets, resulting in learned embedding vectors of entities and relations that do not represent the structural information of the knowledge graph well. To improve the link prediction accuracy of translation models, we propose an optimization framework for translation models—Op-Trans.
In this section, we describe our proposed Op-Trans method in detail. First, we briefly review the general framework of the translation model, then introduce the framework of the Op-Trans method. We describe in detail two key components of the Op-Trans framework, i.e., negative sampling and the weighting strategy of triplet-mapping properties. We begin with a description of the general concepts and symbols, as shown in Table 1.

3.1. Review of Translation Model Framework

Translation models interpret relations as geometric transformations in the latent space. They consider the relation r in the triplet (h, r, t) as a translation from the head entity h to the tail entity t. In this way, when two facts (China, Capital, Beijing) and (UK, Capital, London) share the same semantic relation, their embedding vectors satisfy China − Beijing ≈ UK − London in the embedding space.
Translation models share the common principle that h + r ≈ t. We use the most typical TransE model as an analysis case. Figure 3 shows the triplet structure of TransE.
In translation models, the scoring function models the complex interactions between entities and relations. For a given positive triplet, the distance between h + r and t is expected to be small in the corresponding space, i.e., the sum of the head entity vector h and the relation vector r should be approximately the same as the tail entity vector t. Conversely, for a negative triplet, the distance between h + r and t is expected to be large. The translation model uses the distance between h + r and t in the embedding space to measure how likely the triplet is to be incorrect, and it defines the scoring function as shown in Equation (1).
$$ f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{L_1/L_2} \qquad (1) $$
f(h, r, t) can be considered as the degree of deviation of the vector h + r from t after the triplet is embedded. Therefore, the smaller the value of the scoring function, the more likely the triplet is to be correct. Training thus requires lower scores for positive triplets and higher scores for negative triplets.
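As a concrete illustration, the following is a minimal PyTorch sketch of this distance-based scoring function; the function name and tensor shapes are our own illustrative choices and not part of any released implementation.

```python
import torch

def translation_score(h, r, t, p=1):
    # Distance between (h + r) and t; lower values indicate more plausible triplets.
    # h, r, t: tensors of shape (batch_size, dim); p selects the L1 or L2 norm.
    return torch.norm(h + r - t, p=p, dim=-1)
```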
During the training process, the set of training triplets is randomly traversed multiple times, and a batch Δ_batch of size b is extracted from the training triplet set at each iteration. Whenever we access a positive triplet and need a negative triplet, the TransE model randomly selects an entity from the entity set to generate a negative triplet that is not contained in Δ. The set of negative triplets is defined in Equation (2).
$$ \Delta'_{(h, r, t)} = \{ (h', r, t) \notin \Delta \mid h' \in E \} \cup \{ (h, r, t') \notin \Delta \mid t' \in E \} \qquad (2) $$
With the factual information, the embeddings are learned by solving the optimization problem that minimizes the scoring function for positive triplets and maximizes it for negative triplets at the same time. Then, we define the following margin-based loss as the training goal:
$$ \ell = \sum_{(h, r, t) \in \Delta,\; (h', r, t') \in \Delta'} \left[ f(h, r, t) - f(h', r, t') + \gamma \right]_{+} $$
where γ is the margin hyperparameter separating the positive and negative triplets, and [x]_+ = max(0, x). The embedding parameters are updated after each iteration of the model.
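A hedged PyTorch sketch of this margin-based objective is shown below; the function name is our own, and batching details are simplified.

```python
import torch

def margin_loss(pos_scores, neg_scores, gamma=4.0):
    # [f(h, r, t) - f(h', r, t') + gamma]_+ summed over the mini-batch.
    return torch.clamp(pos_scores - neg_scores + gamma, min=0.0).sum()
```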
Translation models have made great progress in the link prediction task, and researchers have contributed many improvements to the vector representation of entities and relations. However, these translation models treat the embedding vector representation of each triplet equally during model training. They do not consider the variability of the triplet vector representation, i.e., some triplets' vector representations reflect the information of the triplet well, while others do not. They ignore the mapping properties of different triplets, which degrades the representation ability of the model.

3.2. Op-Trans Framework

In this paper, we propose the Op-Trans framework, aiming at improving the performance of the underlying KGE models. Our framework consists of two parts: negative sampling and a scoring function combined with triplet-mapping-property weights, as shown in Figure 4.
Firstly, we initialize the entities and relations in the knowledge graph, mapping them into a low-dimensional dense vector space, and obtain the feature vectors of each entity and relation. We propose a negative sampling optimization method that generates high-quality negative triplets based on entity similarity. In this sampling method, the entities are clustered based on their feature vectors. The entities in the cluster of the head entity are then used as candidates for replacing the head entity, and, similarly, the entities in the cluster of the tail entity are used as candidates for replacing the tail entity.
In the training process, we use a dynamic distribution method to sample the negative triplets. We set up a head-cache H and a tail-cache T to store candidate entities drawn, respectively, from the clusters of the head and tail entities in the triplet, and we later evaluate the probability of each candidate entity. Then, based on the complexity of the relation in a given positive triplet, the probability of replacing the head entity or the tail entity is calculated. For the entity clustering, we update only after completing s iterations, instead of immediately, which greatly reduces the computational cost.
Finally, we pre-calculate a different weight for each triplet according to its mapping properties, and the scoring function deals with each triplet depending on its weight. Different from previous models, we pay more attention to the different contributions of each triplet to the training goal. Our approach also makes progress in dealing with self-reflexive relational triplets, thus making the model more flexible in handling the heterogeneous features of the knowledge graph.
Here, we combine the Op-Trans framework with the TransE model to obtain Op-TransE and give its complete training procedure in Algorithm 1. In the subsequent sections, we introduce the implementation of each module in detail.
Algorithm 1: Learning Op-TransE
Input: Training set Δ = {(h, r, t)}, negative triplet set Δ′(h, r, t) = {(h′, r, t) ∉ Δ | h′ ∈ E} ∪ {(h, r, t′) ∉ Δ | t′ ∈ E}, entity set E, relation set R, embedding dimension d, margin γ, k value of K-Means.
1: initialize:
2:  r ← uniform(−6/√d, 6/√d) for each relation r ∈ R.
3:  r ← r / ‖r‖ for each r ∈ R.
4:  e ← uniform(−6/√d, 6/√d) for each entity e ∈ E.
5:  e ← e / ‖e‖ for each e ∈ E.
6: loop:
7:  Δ_batch ← sample(Δ, b) // sample a mini-batch of size b
8:  I_batch ← ∅ // initialize the set of positive and negative triplet pairs
9:  for (h, r, t) ∈ Δ_batch do
10:   (h′, r, t′) ← sample(Δ′(h, r, t)) // sample a negative triplet (h′, r, t) or (h, r, t′)
11:   I_batch ← I_batch ∪ {((h, r, t), (h′, r, t′))}
12: end for
13: update the model parameters w.r.t. Σ_{((h,r,t),(h′,r,t′)) ∈ I_batch} ∇[f(h, r, t) − f(h′, r, t′) + γ]_+
14: if epoch % s == 0 then
15:  update {E_i} // re-cluster the entities with K-Means
16: end if
17: end loop
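For readers who prefer code to pseudocode, the following is a rough, runnable PyTorch sketch of the training loop in Algorithm 1. It is not the authors' released implementation: the uniform corruption step below is only a placeholder where the CCS sampler of Section 3.3 would plug in, and all function and variable names are our own.

```python
import random
import torch
from sklearn.cluster import KMeans

def train_op_transe(triplets, num_ent, num_rel, dim=50, gamma=4.0, lr=1e-3,
                    batch_size=1000, epochs=2000, k=50, s=30):
    # Initialize embeddings uniformly in [-6/sqrt(d), 6/sqrt(d)] and L2-normalize them.
    bound = 6.0 / dim ** 0.5
    ent = torch.nn.Parameter(torch.empty(num_ent, dim).uniform_(-bound, bound))
    rel = torch.nn.Parameter(torch.empty(num_rel, dim).uniform_(-bound, bound))
    with torch.no_grad():
        ent /= ent.norm(dim=1, keepdim=True)
        rel /= rel.norm(dim=1, keepdim=True)
    opt = torch.optim.Adam([ent, rel], lr=lr)

    for epoch in range(epochs):
        batch = random.sample(triplets, min(batch_size, len(triplets)))
        # Placeholder corruption: replace the head or tail uniformly at random;
        # the full framework would draw these candidates from the CCS caches instead.
        neg = [(random.randrange(num_ent), r, t) if random.random() < 0.5
               else (h, r, random.randrange(num_ent)) for h, r, t in batch]
        h, r, t = (torch.tensor(col) for col in zip(*batch))
        hn, rn, tn = (torch.tensor(col) for col in zip(*neg))
        pos_score = torch.norm(ent[h] + rel[r] - ent[t], p=1, dim=-1)
        neg_score = torch.norm(ent[hn] + rel[rn] - ent[tn], p=1, dim=-1)
        loss = torch.clamp(pos_score - neg_score + gamma, min=0.0).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if epoch % s == 0:
            # Re-cluster the entities every s epochs; the clusters refresh the caches.
            clusters = KMeans(n_clusters=k, n_init=10).fit_predict(ent.detach().numpy())
    return ent, rel
```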

3.3. Negative Sampling

In the knowledge-graph link-prediction task, how to construct a dynamic distribution of negative triplets and how to sample negative triplets efficiently have always been the biggest challenges faced by KGE models. To address these two challenges, we propose a negative sampling method, cluster-cache sampling (CCS).
First, we perform cluster analysis on the entity set, paying attention to the complex relationships between head and tail entities. We calculate a probability score for each candidate negative triplet and use a small cache to store high-quality entities as candidates for negative triplets. The negative triplets are then drawn directly from the cache, which is continuously updated during training. Figure 5 shows the negative sampling process.
Since the model generates negative samples by replacing head or tail entities, we set a head-cache H, indexed by (r, t), to store candidate entities h′ ∈ E for replacing the head entity. Similarly, we set a tail-cache T, indexed by (h, r), to store candidate entities t′ ∈ E for replacing the tail entity. When a positive triplet is received, it corresponds to the caches that contain its negative-triplet candidates, i.e., H and T. When selecting a candidate entity from the cache H or T to generate a negative triplet, we adopt the rule in [21]. It sets different probabilities for replacing the head or the tail entity when corrupting the triplet, according to the relation-mapping properties. If the relation is 1-to-N, the head entity is preferentially replaced, and if the relation is N-to-1, the tail entity is preferentially replaced, which greatly reduces the generation of false negative triplets. Under each relation r, the average number of tail entities per head entity, denoted tph, and the average number of head entities per tail entity, denoted hpt, are calculated. Then, the head entity of the triplet is replaced with probability tph/(tph + hpt), and the tail entity is replaced with probability hpt/(tph + hpt), which generates the negative triplet. We sample from the cache based on the score of the triplet, where a larger score indicates a higher probability of being sampled.
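The per-relation replacement probability can be pre-computed from the training triplets. A small sketch follows, assuming `triplets` is a list of (head, relation, tail) identifiers; the function name is our own.

```python
from collections import defaultdict

def head_replacement_probs(triplets):
    # For each relation r: tph = average tails per head, hpt = average heads per tail.
    # The head entity is replaced with probability tph / (tph + hpt), the tail otherwise.
    tails_of = defaultdict(set)   # (r, h) -> set of tails
    heads_of = defaultdict(set)   # (r, t) -> set of heads
    for h, r, t in triplets:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    probs = {}
    for rel in {r for _, r, _ in triplets}:
        tph_counts = [len(ts) for (r, _), ts in tails_of.items() if r == rel]
        hpt_counts = [len(hs) for (r, _), hs in heads_of.items() if r == rel]
        tph = sum(tph_counts) / len(tph_counts)
        hpt = sum(hpt_counts) / len(hpt_counts)
        probs[rel] = tph / (tph + hpt)
    return probs
```

Here `probs[r]` would give the probability of corrupting the head entity for relation r.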
We are more concerned with negative triplets with high scores during training than in previous work. Our method not only has fewer parameters but is also easier to train than a randomly initialized model. In addition, our method has good scalability and can be flexibly combined with other KGE models.
The cache needs to be updated during the iterations; otherwise, the method would still amount to fixed-distribution sampling. Algorithm 2 shows how the cache is updated (updating the head-cache is the same as updating the tail-cache, so only the head-cache update is shown here). First, we cluster the entity set using the simple and flexible K-Means algorithm [42,43], which is suitable for large entity sets and avoids introducing other biases through many parameters. This method divides the set of entities into k clusters, i.e., {E_1, E_2, …, E_i, …, E_k}, obtains a subset E_N ⊆ E_i of size N from the cluster E_i of the replaced head entity, and stores this subset E_N in the cache H. Then, the scores of all the triplets in H are calculated. Since we want to sample the negative samples with larger scores in the cache, we use the ratio of each triplet's score to the sum of all scores in the cache as the probability of each candidate entity in H being sampled. The probability is calculated as shown in Equation (4).
$$ p\big(h' \mid (t, r)\big) = \frac{\exp\big(f(h', r, t)\big)}{\sum_{h_i \in H} \exp\big(f(h_i, r, t)\big)} \qquad (4) $$
Algorithm 2: Updating Cache
Input: head-cache H of size N.
1: initialize H ← ∅
2: K-Means: E → {E_1, E_2, …, E_i, …, E_k}
3: uniformly sample an entity subset E_N ⊆ E_i of size N from the cluster E_i in which the replaced head entity resides
4: H ← E_N
5: for i = 1, …, N do
6:   calculate the score f(h′, r, t) for all h′ ∈ H
7:   sample h′ ∈ H with the probability given in Equation (4)
8: end for
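A NumPy/scikit-learn sketch of one head-cache update and the probability-weighted draw of Equation (4) is given below; the variable names, clustering call, and L1 scoring are illustrative assumptions rather than the paper's released code.

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_negative_head(entity_emb, h, r_vec, t_vec, cache_size=30, k=50, rng=None):
    # Cluster all entities, keep a cache of candidates from the cluster of the
    # replaced head, score them with ||h' + r - t||_1, and draw one candidate
    # with probability proportional to exp(score), as in Equation (4).
    rng = rng or np.random.default_rng()
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(entity_emb)
    same_cluster = np.where(labels == labels[h])[0]
    cache = rng.choice(same_cluster, size=min(cache_size, len(same_cluster)),
                       replace=False)
    scores = np.linalg.norm(entity_emb[cache] + r_vec - t_vec, ord=1, axis=1)
    weights = np.exp(scores - scores.max())          # numerically stable softmax
    return rng.choice(cache, p=weights / weights.sum())
```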
Figure 6 shows a simple example of negative sampling. N1, N2, N3, and N4 are the candidate negative samples of the positive triplet P stored in the cache. Because the triplet-mapping properties differ, the negative triplets are mapped into the embedding space at different distances from the positive triplet, and a larger score of a negative sample means it lies farther from the positive triplet, since we expect a large margin between the positive and negative triplet scores. The probability of a negative sample is the ratio of its score to the sum of all scores in the cache. Thus, we sample from the cache according to the probability calculated for each triplet.

3.4. Triplets Mapping Properties Weighting

Unlike previous work, which did not take into account the different mapping properties of each triplet, we associate each training triplet with a weight that represents its degree of mapping. The key observation is that positive and negative triplets are both judged against the constraint h + r ≈ t in the embedding space. As shown in Figure 7, h denotes the head entity vector, r the relation vector, t the tail entity vector, and t̂ the actual representation of h + r in the vector space. As with the family of translation models, we expect a small distance between t̂ and t in each positive triplet. We consider that the model has a better learning ability when t̂ ≈ t, since the features of entities and relations are then well represented by the vectors. Conversely, we expect a large distance between t̂ and t in each negative triplet.
For each triplet, we take h + r as the input sample and t as the true label. We focus on triplets with a better vector representation, which contribute more to the training goal. We consider the model to perform better when the correlation between t̂ and t is strong for positive triplets and weak for negative triplets. We evaluate the correlation of the two vectors t̂ and t in the embedding space by their cosine similarity: the higher the cosine similarity, the stronger the correlation between the head and tail entities under the relation r. We use this cosine similarity, which reflects how well a triplet is represented by its vectors, as the triplet-mapping property. Thus, we define the triplet-mapping-property calculation as shown in Equation (5).
$$ tmp = \frac{\hat{\mathbf{t}} \cdot \mathbf{t}}{\lVert \hat{\mathbf{t}} \rVert \, \lVert \mathbf{t} \rVert} + \alpha \qquad (5) $$
Since cosine values lie in [−1, 1], we use the hyperparameter α as an equilibrium coefficient so that tmp > 0 and no additional bias is introduced. A larger tmp value means the input sample is closer to the true label, i.e., the triplet is more likely to be correct. We extend the method to the Trans family of models, as shown in Equation (6).
$$ tmp = \frac{(\mathbf{h} + \mathbf{r}) \cdot \mathbf{t}}{\lVert \mathbf{h} + \mathbf{r} \rVert \, \lVert \mathbf{t} \rVert} + \alpha \qquad (6) $$
where h, r, and t denote the head entity, relation, and tail entity vectors mapped into the particular space, respectively. A simple way to measure the degree of mapping of a triplet is to compute its tmp value. Ideally, a positive triplet should obtain a low score, so we take 1/tmp as the triplet-mapping weight. We pre-calculate this weight for each triplet according to its mapping properties and construct a new scoring function, as shown in Equation (7).
$$ f(h, r, t) = \frac{1}{tmp} \, \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{L_1/L_2} \qquad (7) $$
In the experiments, we enforce the constraints ‖h‖ ≤ 1, ‖r‖ ≤ 1, and ‖t‖ ≤ 1. For a positive triplet (h, r, t) in the training set Δ, we expect the score f(h, r, t) to be much lower than that of any negative triplet (h′, r, t′).
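A minimal PyTorch sketch of Equations (6) and (7) is given below, assuming h, r, and t are batches of embedding vectors; the function name is ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def weighted_score(h, r, t, alpha=2.0, p=1):
    # tmp: cosine similarity between (h + r) and t, shifted by alpha so tmp > 0 (Eq. 6).
    tmp = F.cosine_similarity(h + r, t, dim=-1) + alpha
    # Score weighted by 1/tmp: triplets with a better vector fit get lower scores (Eq. 7).
    return torch.norm(h + r - t, p=p, dim=-1) / tmp
```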

4. Experiments

In this section, we demonstrate the effectiveness of our proposed method through extensive experiments. This section contains four parts. The first part introduces the datasets used in the experiments. The second part elaborates on the details of the experimental setup. The third part describes the evaluation tasks and evaluation metrics of the model. The fourth part introduces the experimental design and the analysis of the experimental results in detail.

4.1. Data Sets

The experimental data in KGE models are generally derived from Freebase [3] and WordNet [2]. Freebase is a large and still growing knowledge base. It is currently one of the largest general knowledge repositories in the world, containing tens of millions of entities and nearly two billion triplets. Some of the data in Google’s knowledge graph comes from Freebase. WordNet is a massive English vocabulary knowledge network covering a very wide range. It provides semantic knowledge among words; the entities are synonyms expressing different semantic concepts, and the relations describe the semantic and lexical connections between these synonyms.
In this paper, we used four common datasets: FB15K, WN18, and their variants FB15K237 and WN18RR. FB15K is a subset of Freebase consisting of 1345 relations and 14,951 entities. FB15K237 is a subset of FB15K that removes near-duplicate and inverse relations; it consists of 237 relations and 14,541 entities. WN18 is drawn from WordNet and consists of 18 relations and 40,943 entities. Similarly, WN18RR is a subset of WN18 constructed by removing near-duplicate and inverse relations; it consists of 11 relations and 40,943 entities. The basic statistics of the datasets are shown in Table 2.
The positive triplets on each data set were divided into three parts: a training set (to train the model), a valid set (to tune the model parameters) and a test set (to evaluate the model performance). The WN18RR had a total of 93,003 positive triplets, of which the approximate ratio of the training set, valid set, and test set was 38:1:1. The FB15K237 had a total of 310,116 positive triplets, of which the approximate ratio was 31:2:3. The WN18 had a total of 151,442 positive triplets, of which the approximate ratio was 28:1:1. The FB15k had a total of 592,213 positive triplets, of which the approximate ratio was 48:5:1.

4.2. Experiment Setup

In the experiments, we used the PyTorch framework and the Adam [44] optimizer to implement the Op-Trans method. Our experiments ran on a single Tesla P100 GPU with 16 GB of memory. The ranges of the hyperparameters are as follows: the embedding dimension d = {50, 80, 150}, learning rate η = {0.00001, 0.0001, 0.001}, the margin value γ = {3, 4, 5, 6}, the mini-batch size b = {500, 1000, 2000}, the number of clustering centers k = {20, 50, 80}, the clustering update interval s = {10, 30, 50}, the equilibrium coefficient α = {1.5, 2, 3}, and the number of epochs = {2000, 3000, 5000}. The optimal hyperparameters on FB15K237, WN18RR, FB15K, and WN18 were obtained after several experiments and are shown in Table 3.

4.3. Evaluation Task and Metrics

4.3.1. Link Prediction

The main task of link prediction [45] was to deduce new factual triplets and predict the missing information. Here, for a positive triplet (h, r, t), the head entity was removed, i.e., (?, r, t). Then, the missing part of the triplet was replaced with each entity in the entity set to create a set of candidate triplets, one per entity [46]. The scores of this group of candidate triplets were calculated according to the scoring function and sorted from low to high. In order to better evaluate the performance of the model, we needed to consider the issue of a candidate triplet being a false negative triplet, that is, the candidate triplet formed by replacing the entity is itself a correct triplet present in the knowledge graph. We could not treat this type of triplet as a negative triplet, because if it ranked higher than the original triplet, it would disturb the experimental results. In view of this, we evaluated the performance of link prediction in the filtered setting [47], i.e., all false negative triplets were removed from the candidate set. All experimental results in this paper are based on the data after filtering out the false negative triplets.
Eventually, the ranking of the triplet whose head entity is h within this group of candidate triplets was recorded. In the experiment, emphasis was placed on the ranking of the correct entity rather than on finding the best result for the triplet. The performance of the model was evaluated by the ranking of the correct entity.
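A simplified sketch of computing one filtered head rank is shown below (pure Python with illustrative names; in practice the candidate scores are computed in a single vectorized pass).

```python
def filtered_head_rank(score_fn, h, r, t, entities, known_triplets):
    # Rank the true head h among all candidates (?, r, t); candidate triplets that
    # already exist in the knowledge graph (false negatives) are skipped.
    target = score_fn(h, r, t)
    rank = 1
    for e in entities:
        if e != h and (e, r, t) not in known_triplets and score_fn(e, r, t) < target:
            rank += 1
    return rank
```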

4.3.2. Performance Measurements

We adopted the evaluation metrics commonly used in the field of knowledge graph completion, i.e., MR, MRR, Hits@N.
Mean Rank (MR): It averages the rankings of all triplets in the test set. MR reflects the overall ranking of the correct triplets in the test set, and the smaller the value, the better the performance of the model. Here, S is the number of ranking results, i ∈ {1, 2, …, S} indexes them, and rank_i is the rank of the correct triplet in the i-th result. The MR is calculated using Equation (8).
$$ MR = \frac{1}{S} \sum_{i=1}^{S} rank_i \qquad (8) $$
Mean Reciprocal Rank (MRR): It is a standard metric for entity and relation completion. MRR averages the reciprocal of the rank at which the correct target is located. The larger the value, the better the prediction accuracy and the better the effect of the model. The MRR is shown in Equation (9).
$$ MRR = \frac{1}{S} \sum_{i=1}^{S} \frac{1}{rank_i} \qquad (9) $$
Hits@N: It evaluates the proportion of test triplets whose correct entity ranks in the top N, which reflects the accuracy of the model prediction. A larger value of Hits@N indicates a higher prediction accuracy. Here, we take N equal to 1, 3, and 10. The calculation of Hits@N is shown in Equation (10).
$$ Hits@N = \frac{1}{S} \sum_{i=1}^{S} \mathbb{I}(rank_i \le N) \qquad (10) $$
where $\mathbb{I}(\cdot)$ is the indicator function, defined in Equation (11):
$$ \mathbb{I}(rank_i \le N) = \begin{cases} 1, & rank_i \le N \\ 0, & rank_i > N \end{cases} \qquad (11) $$
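Given the list of filtered ranks, the three metrics reduce to a few lines; the sketch below uses our own function names.

```python
def mean_rank(ranks):
    # Eq. (8): average rank of the correct entities; lower is better.
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    # Eq. (9): average reciprocal rank; higher is better.
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n=10):
    # Eq. (10): fraction of test triplets whose correct entity ranks in the top n.
    return sum(1 for r in ranks if r <= n) / len(ranks)
```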

4.4. Experimental Design

In this section, we compare our method with translation models to verify the effectiveness of our proposed method for the link prediction task. The overall structure of the experiment is shown in Figure 8.
In order to evaluate the effectiveness of our proposed method, we designed two experiments to verify the performance of the negative sampling method and the triplet-mapping-property weighting method, respectively. We compared our proposed method with the state-of-the-art KGE modelling methods, including TransE, TransH, TransR, and TransD. Table 4 shows the scoring functions of these translation models.

4.4.1. Comparing Different Negative Sampling Methods

In this section, we compare different negative sampling methods, including Bernoulli sampling, Kbgan sampling, and NSCaching sampling as well as our proposed CCS sampling.
Bernoulli [21]: Bernoulli sampling is a randomized experiment conducted repeatedly and independently under the same conditions, with only two possible outcomes for each trial. Specifically, it samples (h′, r, t) or (h, r, t′) under a predefined Bernoulli distribution.
Kbgan [41]: Kbgan first samples a negative sample set Neg from the whole entity set, and then uses an entity in Neg to replace the head or tail entity to form a negative triplet (h′, r, t) or (h, r, t′).
NSCaching [38]: It maintains two candidate sets drawn from the entity set: a head-entity cache and a tail-entity cache. A negative triplet (h′, r, t) or (h, r, t′) is formed by replacing the head or tail entity with one uniformly sampled from the corresponding cache.
CCS (Ours): As in Section 3.3, we extract entities of the same type from the cluster of the replaced entity and cache them as candidates. High-quality head or tail entities are sampled from the corresponding cache to form a negative triplet (h′, r, t) or (h, r, t′). We use N = 30 and re-cluster the entities once every s iterations.
Table 5 compares the link prediction performance of the translation models. Kbgan, NSCaching and CCS all significantly outperform the baseline Bernoulli scheme. These methods effectively optimize the negative sampling in the training process, which verifies that the use of high-quality negative triplets can better improve the performance of the KGE models. We can also observe that the CCS negative sampling method outperforms the other advanced models.

4.4.2. Analysis of Experimental Results of Op-Trans

In this section, we explore the effect of the triplet-mapping-property weights on the underlying models mentioned above. We still adopted the comparison method introduced in Section 4.4.1, taking the CCS negative-sampling optimization method as a prerequisite for the KGE models, so that the analysis of the base models is more accurate and provides a basis for further enhancing them. We integrated the triplet-mapping-property weights with the CCS negative-sampling optimization method to form the knowledge-graph optimization framework Op-Trans. We tested the predictive power of Op-Trans on four data sets.
Table 6 shows the experimental results of these translation models after applying the Op-Trans learning framework. On WN18RR, Op-TransE improves Hits@10 by approximately 13% compared to TransE, and by 6% compared to CCS + TransE (which applies only our proposed negative sampling method to TransE). On FB15K237, Op-TransE improves Hits@10 by approximately 16% compared to TransE. Meanwhile, our Op-Trans learning framework also achieved good improvements on the other KGE baseline models.
The Op-Trans learning framework enables a better performance of the family of translation models. Figure 9 and Figure 10 compare the Hits@10 of the baseline models, the baseline models using only the CCS negative-sampling optimization method, and the baseline models using the full Op-Trans framework on the WN18RR and FB15K237 datasets, respectively.
Table 7 summarizes our results on WN18. CCS + TransE applies only the CCS negative-sampling optimization method to TransE, while Op-TransE applies the full Op-Trans framework to the TransE model. We can observe that the Hits@10 of CCS + TransE is approximately 3.7% better than that of the TransE model. Op-TransE builds on CCS + TransE by further considering the mapping properties of triplets in the embedding space, and its Hits@10 value increased by about 5.1% compared with the TransE model. Table 8 summarizes the results of our comparison experiment on FB15K.
On FB15k, there are 592,213 positive triplets with 14,951 entities, and each entity corresponds to approximately 39.6 positive triplets on average. On WN18, there are 151,442 positive triplets with 40,943 entities, and each entity corresponds to approximately 3.7 positive triplets on average. Since the average number of positive triplets per entity on WN18 is much smaller than that on FB15k, and the number of relation types on WN18 is also much lower than that on FB15k, we think that the graph structure of WN18 may be simpler than that of FB15k. However, our method showed a larger improvement on FB15k. This indicates that the Op-Trans framework performs better when embedding various complex entities and relations, and has an advantage on complex knowledge graphs.
To further evaluate the performance of Op-Trans and also verify that the model has better advantages for complex relationships, we continued to study the performance of Op-Trans in different relationship types on FB15K. Figure 11 shows the distribution of 1345 relations on FB15K, where the 1-to-1 type of simple relationship accounts for 24%, and the 1-to-N, N-to-1, and N-to-N types of complex relationship account for 23%, 29% and 24%, respectively.
Table 9 shows the comparison of Hits@10 for the four different types of relations on FB15K. From Table 9, we can observe that our proposed method substantially improves the handling of complex relationships compared with TransE. For the 1-to-1 relation type, head-entity prediction improved Hits@10 by 15.1% and tail-entity prediction by 17.6%. For the 1-to-N relation type, head-entity prediction improved by 18.1% and tail-entity prediction by 17.8%. For the N-to-1 relation type, head-entity prediction improved by 26.9% and tail-entity prediction by 12.0%. For the more complex N-to-N relation type, head-entity prediction improved by 10.8% and tail-entity prediction by 37%. It is worth noting that Op-Trans outperforms the other baseline models in most cases. In particular, Op-Trans can improve the performance of 1-to-N relation tail-entity prediction and N-to-1 relation head-entity prediction, which are difficult tasks for other models. Overall, our method offers significant advantages in dealing with complex relationships.
From the above experiments, it can be seen that the Op-Trans model achieves good results in all tasks. The increase in Hits@10 indicates that the Op-Trans has a higher quality for knowledge graph completion tasks. This proves that the Op-Trans framework can help the translation models learn entity embedding and relation embedding with more representational power. Thus, the task of link prediction can be completed more effectively.

5. Conclusions

This paper has proposed a simple and effective optimization framework for translation-based knowledge-graph embedding, named Op-Trans. Op-Trans consists of two components: negative sampling by cluster-cache sampling, and a weighting strategy based on triplet-mapping properties. We address the inability of knowledge-graph embedding models to obtain high-quality negative triplets and propose cluster-cache sampling as a new negative sampling method. It largely improves the similarity between the replacement entity and the replaced entity, thereby improving the quality of the negative triplets. Furthermore, we assign a different mapping weight to each triplet and provide positive feedback to the triplets that contribute more to the model training goal. The strategy not only represents hierarchical characteristics but is also flexible enough to adapt to the various mapping properties of the knowledge triplets. The experimental results also indicate that our method has a positive impact on completion ability compared with most baseline models. The biggest feature of our proposed Op-Trans method is its strong flexibility, which can effectively improve various basic models. However, our negative sampling approach incurs some maintenance cost due to the cache-update operation. In future work, we will discuss in detail the construction costs of our method in terms of time and storage space. We would also like to extend our model to address knowledge-graph embedding in real-world applications at a smaller cost.

Author Contributions

Conceptualization, H.H.; methodology, H.H.; software, H.H.; validation, K.W. and X.L.; formal analysis, H.H.; investigation, H.H. and X.L.; resources, K.W.; data curation, H.H.; writing—original draft preparation, H.H.; writing—review and editing, H.H., X.L. and K.W.; visualization, H.H.; supervision, K.W.; project administration, K.W.; funding acquisition, K.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Shanghai Science and Technology Innovation Action Planning (grant number: 20dz1203800).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This research was supported by Shanghai Ocean University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chami, I.; Wolf, A.; Juan, D.C.; Sala, F.; Ravi, S.; Ré, C. Low-Dimensional Hyperbolic Knowledge Graph Embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6901–6914. [Google Scholar]
  2. Miller, G.A. WordNet. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
  3. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data-SIG-MOD’08, Vancouver, BC, Canada, 9–12 June 2008; ACM Press: New York, NY, USA, 2008; pp. 1247–1250. [Google Scholar]
  4. Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the Semantic Web: 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007+ ASWC 2007, Busan, Korea, 11–15 November 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 722–735. [Google Scholar]
  5. Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka, E.; Mitchell, T. Toward an architecture for never-ending language learning. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, Georgia, 11–15 July 2010; pp. 1306–1313. [Google Scholar]
  6. Roopak, N.; Deepak, G. OntoKnowNHS: Ontology driven knowledge centric novel hybridised semantic scheme for image recommendation using knowledge graph. In Proceedings of the Iberoamerican Knowledge Graphs and Semantic Web Conference, Kingsville, TX, USA, 22–24 November 2021; Springer: Cham, Switzerland, 2021; pp. 138–152. [Google Scholar]
  7. Li, L.; Li, H.; Kou, G.; Yang, D.; Hu, W.; Peng, J.; Li, S. Dynamic Camouflage Characteristics of a Thermal Infrared Film Inspired by Honeycomb Structure. J. Bionic Eng. 2022, 19, 458–470. [Google Scholar] [CrossRef]
  8. Wu, X.; Tang, Y.; Zhou, C.; Zhu, G.; Song, J.; Liu, G. An Intelligent Search Engine Based on Knowledge Graph for Power Equipment Management. In Proceedings of the 2022 5th International Conference on Energy, Electrical and Power Engineering (CEEPE), Chongqing, China, 22–24 April 2022; pp. 370–374. [Google Scholar]
  9. Shi, M. Knowledge graph question and answer system for mechanical intelligent manufacturing based on deep learning. Math. Probl. Eng. 2021, 2021, 6627114. [Google Scholar] [CrossRef]
  10. Su, X.; He, J.; Ren, J.; Peng, J. Personalized Chinese Tourism Recommendation Algorithm Based on Knowledge Graph. Appl. Sci. 2022, 12, 10226. [Google Scholar] [CrossRef]
  11. Ding, J.H.; Jia, W.J. A Review of Knowledge Graph Completion Algorithms. Inf. Commun. Technol. 2018, 12, 56–62. [Google Scholar]
  12. Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; Zhang, W. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; ACM: Rochester, NY, USA, 2014; pp. 601–610. [Google Scholar]
  13. Wang, Q.; Mao, Z.; Wang, B.; Guo, L. Knowledge graph embedding: A survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 2017, 29, 2724–2743. [Google Scholar] [CrossRef]
  14. Chang, L.; Zhu, M.; Gu, T.; Bin, C.; Qian, J.; Zhang, J. Knowledge graph embedding by dynamic translation. IEEE Acess 2017, 23, 20898–20907. [Google Scholar] [CrossRef]
  15. Bordes, A.; Weston, J.; Collobert, R.; Bengio, Y. Learning structured embeddings of knowledge bases. In Proceedings of the AAAI 2011, San Francisco, CA, USA, 7–11 August 2011; AAAI: Menlo Park, CA, USA, 2011; pp. 301–306. [Google Scholar]
  16. Bordes, A.; Glorot, X.; Weston, J.; Bengio, Y. A semantic matching energy function for learning with multi-relational data. Mach. Learn. 2014, 94, 233–259. [Google Scholar] [CrossRef]
  17. Socher, R.; Chen, D.; Manning, C.D.; Ng, A. Reasoning with Neural Tensor Networks for Knowledge Base Completion. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 926–934. [Google Scholar]
  18. Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the ICML 2011, Washington, DC, USA, 28 June–2 July 2011; ACM: New York, NY, USA, 2011; pp. 809–816. [Google Scholar]
  19. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795. [Google Scholar]
  20. Liang, Z.; Yang, J.; Liu, H.; Huang, K.; Qu, L.; Cui, L.; Li, X. SeAttE: An Embedding Model Based on Separating Attribute Space for Knowledge Graph Completion. Electronics 2022, 11, 1058. [Google Scholar] [CrossRef]
  21. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
  22. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI, Austin, TX, USA, 25–30 January 2015; AAAI: Menlo Park, CA, USA, 2015; pp. 2181–2187. [Google Scholar]
  23. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Beijing, China, 27–31 July 2015; pp. 687–696. [Google Scholar]
  24. Fan, M.; Zhou, Q.; Chang, E.; Zheng, F. Transition-based knowledge graph embedding with relational mapping properties. In Proceedings of the Twenty-Eighth Pacific Asia Conference on Language, Information and Computation, Phuket, Thailand, 12–14 December 2014; pp. 328–337. [Google Scholar]
  25. Ji, G.; Liu, K.; He, S.; Zhao, J. Knowledge graph completion with adaptive sparse transfermatrix. In Proceedings of the National Conference on Artificial Intelligence, Amsterdam, The Netherlands, 10–11 November 2016; pp. 985–991. [Google Scholar]
  26. Nguyen, D.Q.; Sirts, K.; Qu, L.; Johnson, M. STransE: A novel embedding model of entities and relationships in knowledge bases. In Proceedings of the NAACL HLT, San Diego, CA, USA, 12–17 June 2016; ACL: Stroudsburg, PA, USA, 2016; pp. 460–466. [Google Scholar]
  27. Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. TransA: An adaptive approach for knowledge graph embedding. arXiv 2015, arXiv:1509.05490. [Google Scholar]
  28. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Philip, S.Y. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 494–514. [Google Scholar] [CrossRef] [PubMed]
  29. Lei, Z.; Sun, Y.; Nanehkaran, Y.A.; Yang, S.; Islam, M.S.; Lei, H.; Zhang, D. A novel data-driven robust framework based on machine learning and knowledge graph for disease classification. Future Gener. Comput. Syst. 2020, 102, 534–548. [Google Scholar] [CrossRef]
  30. Sun, Z.Q.; Deng, Z.H.; Nie, J.Y.; Tang, J. RotatE: Knowledge graph embedding by relational rotation in complex space. arXiv 2019, arXiv:1902.10197. [Google Scholar]
  31. Zhang, Z.; Cai, J.; Zhang, Y.; Wang, J. Learning hierarchy-aware knowledge graph embeddings for link prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 9–11 February 2020; Volume 34, pp. 3065–3072. [Google Scholar]
  32. Tang, Y.; Huang, J.; Wang, G.; He, X.; Zhou, B. Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020; pp. 2713–2722. [Google Scholar]
  33. Han, X.; Cao, S.; Lv, X.; Lin, Y.; Liu, Z.; Sun, M.; Li, J. Openke: An Open Toolkit for Knowledge Embedding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018; pp. 139–144. [Google Scholar]
  34. Wang, Y.; Ruffinelli, D.; Gemulla, R.; Broscheit, S.; Meilicke, C. On evaluating embedding models for knowledge base completion. arXiv 2018, arXiv:1810.07180. [Google Scholar]
  35. Toutanova, K.; Chen, D. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, Beijing, China, 31 July 2015; pp. 57–66. [Google Scholar]
  36. Han, Z.; Chen, Y.; Li, M.; Liu, W.; Yang, W. An efficient node influence metric based on triangle in complex networks. Acta Phys. Sin. 2016, 65, 168901. [Google Scholar]
  37. Hu, X.; Tao, Y.; Chung, C.W. I/O-efficient algorithms on triangle listing and counting. ACM Trans. Database Syst. 2014, 39, 1–30. [Google Scholar] [CrossRef]
  38. Zhang, Y.; Yao, Q.; Shao, Y.; Chen, L. NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding. In Proceedings of the IEEE International Conference on Data Engineering, Macao, China, 8–11 April 2019; pp. 614–625. [Google Scholar]
  39. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  40. Wang, P.; Li, S.; Pan, R. Incorporating GAN for negative sampling in knowledge representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  41. Cai, L.; Wang, W.Y. Kbgan: Adversarial learning for knowledge graph embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, New Orleans, LA, USA, 1–6 June 2018; Volume 1, pp. 1470–1480. [Google Scholar]
  42. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-Means clustering algorithm. J. R. Stat. Soc. 1979, 28, 100–108. [Google Scholar] [CrossRef]
  43. Hamerly, G.; Elkan, C. Alternatives to the K-Means algorithm that find better clusterings. In Proceedings of the 11th International Conference on Information and Knowledge Management, McLean, VA, USA, 4–9 November 2002; pp. 600–607. [Google Scholar]
  44. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  45. Dai, C.; Chen, L.; Li, B.; Li, Y. Link prediction in multi-relational networks based on relational similarity. Inf. Sci. 2017, 394–395, 198–216. [Google Scholar] [CrossRef]
  46. Wang, P.; Liu, J.; Hou, D.; Zhou, S. A Cybersecurity Knowledge Graph Completion Method Based on Ensemble Learning and Adversarial Training. Appl. Sci. 2022, 12, 12947. [Google Scholar] [CrossRef]
  47. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; pp. 1811–1818. [Google Scholar]
Figure 1. Embedding space.
Figure 2. Example of complex relations. The left side shows the 1-to-N relation type; the right side shows the N-to-1 relation type.
Figure 3. Illustration of the TransE model. h represents the head entity, r the relation, and t the tail entity.
Figure 4. The Op-Trans method framework. Part (I) of the figure is the initialization module. The backbone maps the entities and relations in the knowledge graph into a low-dimensional dense vector space, as shown in (a), where the circles represent the entities in the knowledge graph and the triangles represent the relations. We cluster the entities according to their mapping features, as shown in (b). Part (II) of the figure represents the training phase: (c) shows the triplet (h, r, t) sample batches extracted for training; the triplets in the sample batch are treated as positive triplets, as shown in (d); a corresponding negative triplet is generated for each positive triplet participating in the training, as shown in (e); and the scoring function is calculated for each triplet in (f), where t_mp denotes the mapping properties computed for each triplet, alongside the score computed for each triplet.
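To make the workflow in Figure 4 easier to follow, the sketch below walks through the same stages on toy data: embed, cluster, draw the positive triplets, pick a hard negative from the same cluster as the corrupted entity, and apply a per-triplet weight in the margin loss. This is a minimal sketch under assumptions: the naive k-means, the TransE-style distance, tail-only corruption, the unit weight, and the margin of 1.0 are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy knowledge graph: integer ids and positive triplets (h, r, t).
num_entities, num_relations, dim = 20, 3, 8
triplets = [(int(rng.integers(num_entities)), int(rng.integers(num_relations)),
             int(rng.integers(num_entities))) for _ in range(50)]

# (a) Initialization: map entities and relations to low-dimensional vectors.
E = rng.normal(size=(num_entities, dim))
R = rng.normal(size=(num_relations, dim))

# (b) Cluster the entities (here: a naive k-means on the embeddings).
def kmeans(X, k, iters=10):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

clusters = kmeans(E, k=4)

def dist(h, r, t):
    # TransE-style distance ||h + r - t||_1: a small distance means a plausible
    # triplet, so a small-distance corrupted triplet is a "hard" negative.
    return np.abs(E[h] + R[r] - E[t]).sum()

# (II) Training phase: (c)-(d) take the positive triplets, (e) corrupt the tail
# with an entity from the same cluster, (f) score each triplet with a weight.
for h, r, t in triplets:
    candidates = np.where(clusters == clusters[t])[0]
    candidates = candidates[candidates != t]
    if len(candidates) == 0:
        continue
    neg_t = min(candidates, key=lambda e: dist(h, r, e))   # hardest candidate
    weight = 1.0          # placeholder for the triplet-mapping-property weight
    loss = weight * max(0.0, 1.0 + dist(h, r, t) - dist(h, r, neg_t))
    # (parameter updates via SGD/Adam are omitted in this sketch)
```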
Figure 5. Negative triplet sampling process.
Figure 6. An example of negative sampling. (a) shows a positive triplet, and (b) shows the candidate negative samples in the cache, which are generated by negative sampling of the positive triplet. (c) shows the embedding space, in which the positive triplet and the negative triplets are marked with distinct symbols.
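The cache of candidate negatives in Figure 6 could be maintained in several ways; one simple policy, sketched below with toy embeddings, is to merge the current cache with a handful of freshly sampled corrupted tails, score everything, and keep only the most plausible-looking candidates. The candidate count 50 and cache size 30 merely echo the k and s values in Table 3; the exact policy, and whether these symbols play precisely these roles in the framework, is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(1)
num_entities, dim = 100, 16
E = rng.normal(size=(num_entities, dim))          # toy entity embeddings
r = rng.normal(size=dim)                          # one toy relation embedding

def plausibility(h_idx, t_idx):
    # Negative L1 distance: higher means the corrupted triplet looks more real.
    return -np.abs(E[h_idx] + r - E[t_idx]).sum()

def refresh_cache(cache, h_idx, t_idx, k=50, s=30):
    """Merge the old cache with k fresh corrupted tails and keep the s
    highest-scoring candidates (one possible cache policy, not the paper's)."""
    fresh = rng.integers(num_entities, size=k)
    pool = np.unique(np.concatenate([cache, fresh]))
    pool = pool[pool != t_idx]                    # never cache the true tail
    scores = np.array([plausibility(h_idx, c) for c in pool])
    return pool[np.argsort(-scores)[:s]]

cache = refresh_cache(np.array([], dtype=int), h_idx=3, t_idx=7)
negative_tail = int(rng.choice(cache))            # sample one hard negative
```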
Figure 7. h denotes the head entity, r denotes the relation, t denotes the tail entity, and t̂ denotes a real representation of h + r in the vector space.
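The relation shown in Figure 7 can be checked with a tiny numeric example: t̂ is just the head vector translated by the relation vector, and the gap between t̂ and the actual tail t is what a distance-based score measures. The vectors below are made-up values used only for illustration.

```python
import numpy as np

h = np.array([0.2, -0.1, 0.5])   # head entity embedding (toy values)
r = np.array([0.3,  0.4, -0.2])  # relation embedding
t = np.array([0.6,  0.2,  0.4])  # tail entity embedding

t_hat = h + r                    # where the tail "should" sit under relation r
gap = np.abs(t_hat - t).sum()    # L1 distance; 0 would be a perfect fit
print(t_hat, gap)                # [0.5 0.3 0.3] and a gap of about 0.3
```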
Figure 8. Experimental structure diagram.
Figure 9. Hits@10 values for the three methods on the WN18RR dataset. The three series correspond to the initial model, the CCS negative sampling method, and the Op-Trans framework; * stands for the various baseline models.
Figure 10. Hits@10 values for the three methods on the FB15K237 dataset. The three series correspond to the initial model, the CCS negative sampling method, and the Op-Trans framework; * stands for the various baseline models.
Figure 11. Type distribution of 1345 relations on FB15K.
Table 1. The general concepts and symbols in this paper.
Symbols                Concepts
E                      The entity set of the knowledge graph
R                      The relation set of the knowledge graph
h, r, t                Head entity, relation, and tail entity, respectively; h, t ∈ E, r ∈ R
h, r, t (boldface)     The embedding vectors of h, r, and t
Δ                      The triplet set of the knowledge graph
(h, r, t)              A triplet in the knowledge graph, (h, r, t) ∈ Δ
Δ′                     The set of generated negative triplets
(h′, r, t′)            A negative triplet that is not in the knowledge graph, (h′, r, t′) ∉ Δ
Table 2. Statistics of datasets. The symbols #entity and #relation denote the number of entities and relations, respectively. #train, #valid, and #test denote the size of the train set, validation set, and test set, respectively.
Dataset     #Entity   #Relation   #Train    #Valid    #Test
WN18RR      40,943    11          86,835    3034      3134
FB15K237    14,541    237         272,115   17,535    20,466
WN18        40,943    18          141,442   5000      5000
FB15K       14,951    1345        483,142   50,000    59,071
Table 3. Optimal configurations of each dataset.
Parameter   FB15K237   WN18RR   FB15K    WN18
d           80         80       150      80
η           0.0001     0.0001   0.0005   0.0005
γ           5          5        4        4
b           1000       1000     2000     1000
k           50         50       50       50
s           30         30       30       30
α           2          2        2        2
epoch       3000       3000     3000     3000
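For readers reproducing the setup, one column of Table 3 can be captured in a small configuration object, as in the sketch below for FB15K237. Interpreting d as the embedding dimension, η as the learning rate, γ as the margin, and b as the batch size follows the usual conventions of this literature but is assumed rather than stated in the table; k, s, and α are kept as opaque names.

```python
# Optimal configuration for FB15K237 as listed in Table 3. Symbol meanings
# (d = embedding dimension, eta = learning rate, gamma = margin, b = batch size)
# are assumed from convention; k, s, and alpha are kept as named in the table.
FB15K237_CONFIG = {
    "d": 80,
    "eta": 1e-4,
    "gamma": 5,
    "b": 1000,
    "k": 50,
    "s": 30,
    "alpha": 2,
    "epoch": 3000,
}
```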
Table 4. Definition of scoring functions for TransE, TransH, TransR, and TransD.
Model    Score Function
TransE   $\| \mathbf{h} + \mathbf{r} - \mathbf{t} \|_{1/2}$
TransH   $\| (\mathbf{h} - \mathbf{w}_r^{\top}\mathbf{h}\,\mathbf{w}_r) + \mathbf{r} - (\mathbf{t} - \mathbf{w}_r^{\top}\mathbf{t}\,\mathbf{w}_r) \|_{1/2}$
TransR   $\| \mathbf{h}\mathbf{M}_r + \mathbf{r} - \mathbf{t}\mathbf{M}_r \|_{1/2}$
TransD   $\| \mathbf{h}\mathbf{M}_{rh} + \mathbf{r} - \mathbf{t}\mathbf{M}_{rt} \|_{1/2}$
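To make the notation in Table 4 concrete, the snippet below evaluates the four scores on randomly initialized toy parameters; the "1/2" subscript means either the L1 or L2 norm may be used, and the L1 norm is chosen here. TransR and TransD project entities into a separate relation space, so a k-dimensional relation vector is used for those two rows. All parameters (w_r, M_r, M_rh, M_rt) are random placeholders, not trained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 3                        # entity dimension d, relation dimension k

h, t = rng.normal(size=d), rng.normal(size=d)           # entity embeddings
r_d = rng.normal(size=d)                                 # relation (TransE/TransH)
r_k = rng.normal(size=k)                                 # relation (TransR/TransD)
w_r = rng.normal(size=d); w_r /= np.linalg.norm(w_r)     # TransH hyperplane normal
M_r = rng.normal(size=(d, k))                            # TransR projection matrix
M_rh, M_rt = rng.normal(size=(d, k)), rng.normal(size=(d, k))  # TransD projections

norm = lambda x: np.abs(x).sum()                         # L1 norm

score_transE = norm(h + r_d - t)
score_transH = norm((h - (w_r @ h) * w_r) + r_d - (t - (w_r @ t) * w_r))
score_transR = norm(h @ M_r + r_k - t @ M_r)
score_transD = norm(h @ M_rh + r_k - t @ M_rt)
```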
Table 5. Comparison of various negative sampling methods on two datasets, WN18RR and FB15K237. Bold indicates the best performance.
Model    Sampling    WN18RR                         FB15K237
                     MR      MRR      Hits@10       MR     MRR      Hits@10
TransE   Bernoulli   3924    0.1784   0.4509        197    0.2556   0.4189
         Kbgan       5356    0.1808   0.4324        722    0.2926   0.4659
         NSCaching   4472    0.2002   0.4783        186    0.2993   0.4764
         CCS         4481    0.2031   0.4809        182    0.3101   0.4802
TransH   Bernoulli   4113    0.1862   0.4509        202    0.2329   0.4010
         Kbgan       4881    0.1869   0.4481        455    0.2779   0.4619
         NSCaching   4491    0.2041   0.4804        185    0.2832   0.4659
         CCS         4337    0.2273   0.4894        187    0.2897   0.4726
TransR   Bernoulli   3824    0.1884   0.4603        191    0.2397   0.4201
         Kbgan       4572    0.1871   0.4521        793    0.2424   0.4431
         NSCaching   3639    0.2010   0.4822        181    0.2751   0.4773
         CCS         3801    0.2300   0.4891        174    0.2822   0.4830
TransD   Bernoulli   3555    0.1901   0.4641        188    0.2451   0.4289
         Kbgan       4083    0.1875   0.4641        825    0.2465   0.4440
         NSCaching   3104    0.2013   0.4839        189    0.2863   0.4785
         CCS         3317    0.2415   0.5297        177    0.2890   0.4852
Table 6. Performance of the Op-Trans framework applied to the baseline models on two datasets, WN18RR and FB15K237.
Model        WN18RR                         FB15K237
             MR      MRR      Hits@10       MR     MRR      Hits@10
Op-TransE    4171    0.2122   0.5102        177    0.2415   0.4852
Op-TransH    4404    0.2307   0.5121        180    0.2893   0.4815
Op-TransR    3685    0.2415   0.5217        169    0.2993   0.4873
Op-TransD    3352    0.2900   0.5425        182    0.3010   0.4924
Table 7. Experimental results on WN18 compared with the TransE model.
Model          MR    MRR     Hits@1   Hits@3   Hits@10
TransE         251   0.779   0.709    0.820    0.892
CCS + TransE   237   0.792   0.728    0.822    0.925
Op-TransE      223   0.803   0.711    0.836    0.937
Table 8. Experimental results on FB15K compared with the TransE model.
Model          MR    MRR     Hits@1   Hits@3   Hits@10
TransE         125   0.451   0.297    0.418    0.471
CCS + TransE   84    0.524   0.302    0.603    0.682
Op-TransE      77    0.550   0.371    0.617    0.744
Table 9. Detailed results on FB15K in terms of different relation categories.
Model        Predicting Head (Hits@10)               Predicting Tail (Hits@10)
             1-to-1   1-to-N   N-to-1   N-to-N       1-to-1   1-to-N   N-to-1   N-to-N
TransE       0.437    0.657    0.182    0.472        0.437    0.197    0.667    0.500
Op-TransE    0.503    0.776    0.231    0.523        0.514    0.232    0.747    0.685
TransH       0.668    0.876    0.287    0.645        0.655    0.398    0.833    0.672
Op-TransH    0.759    0.893    0.315    0.741        0.759    0.400    0.841    0.717
TransR       0.788    0.892    0.341    0.692        0.792    0.374    0.904    0.721
Op-TransR    0.796    0.927    0.339    0.765        0.793    0.378    0.925    0.817
TransD       0.861    0.955    0.398    0.785        0.854    0.506    0.944    0.812
Op-TransD    0.874    0.962    0.441    0.822        0.881    0.566    0.958    0.854
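Table 9 groups relations by their mapping properties. A common way to obtain these categories (used, for instance, in the evaluation protocol popularized by translation-based models) is to compute, for each relation, the average number of tails per head (tph) and heads per tail (hpt) and threshold both at 1.5. The sketch below implements that heuristic; it is an assumption that the same procedure was used to produce Figure 11 and Table 9.

```python
from collections import defaultdict

def relation_categories(triplets, threshold=1.5):
    """Classify each relation as 1-to-1, 1-to-N, N-to-1, or N-to-N from the
    average number of tails per head (tph) and heads per tail (hpt)."""
    heads = defaultdict(set)   # (r, t) -> set of head entities
    tails = defaultdict(set)   # (r, h) -> set of tail entities
    for h, r, t in triplets:
        heads[(r, t)].add(h)
        tails[(r, h)].add(t)

    categories = {}
    for r in {rel for _, rel, _ in triplets}:
        tph_counts = [len(ts) for (rel, _), ts in tails.items() if rel == r]
        hpt_counts = [len(hs) for (rel, _), hs in heads.items() if rel == r]
        tph = sum(tph_counts) / len(tph_counts)
        hpt = sum(hpt_counts) / len(hpt_counts)
        many_tails, many_heads = tph >= threshold, hpt >= threshold
        categories[r] = {(False, False): "1-to-1", (True, False): "1-to-N",
                         (False, True): "N-to-1", (True, True): "N-to-N"}[
                             (many_tails, many_heads)]
    return categories

# Example: a relation whose single head maps to several tails comes out 1-to-N.
print(relation_categories([("a", "partOf", "x"), ("a", "partOf", "y"),
                           ("b", "capital", "z")]))
```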
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.