1. Introduction
A knowledge graph is composed of many fact triples (head entity, relation, tail entity); in the corresponding directed graph, the source and target nodes correspond to the head and tail entities, respectively, while the relations are depicted as edges [1,2]. In recent years, knowledge graphs (KGs) have found applications across a broad spectrum of real-world scenarios, including intelligent question answering [3], personalized recommendation [4,5], natural language processing [6], and object detection [7,8]. However, real-world knowledge graphs such as WordNet [9], Freebase [10], and Yago [11] are usually incomplete. Predicting missing links through knowledge graph embedding (KGE) has therefore gained substantial attention as a pivotal research direction for knowledge graph completion [2].
KGE transforms the entities and relations within the knowledge graph into low-dimensional continuous space representations. Each fact triple (head entity, relation, tail entity) is represented as $(h, r, t)$. If entities and relations are represented using $d$-dimensional real vectors, then $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{d}$. To evaluate the quality of the entity and relation representations, a KGE approach assesses the credibility of triples through a scoring function. The optimization objective of KGE is to assign high scores to positive triples and low scores to negative triples. Presently, prevailing KGE methods can be classified into dot product models and distance models according to the structure of the scoring function. The dot product model computes the dot product of the head entity embedding, the relation embedding, and the tail entity embedding as the triple score. Examples of such methods include DistMult [12], HolE [13], ComplEx [14], and QuatE [15], which capture semantic information through pairwise feature interactions between latent factors. The distance model uses the $L_1$ or $L_2$ distance as the scoring function; the translation model TransE [16] and the complex embedding model RotatE [17] fall into this category. In the distance model, the head entity embedding is either added to the relation embedding or combined with it via the Hadamard product to obtain a vector close to the tail entity embedding, and the distance between the two vectors is then computed. Methods such as TransH [18], TransR [19], TorusE [20], and GIE [21] utilize translation invariance to preserve the original semantic relationships.
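To make the two families concrete, here is a minimal sketch in Python/NumPy (illustrative only; the toy four-dimensional embeddings and function names are ours, not taken from any particular implementation) of a DistMult-style dot-product score and a TransE-style distance score; both take values on an unbounded range:

```python
import numpy as np

def distmult_score(h, r, t):
    """Dot-product family (e.g., DistMult): trilinear product of head,
    relation, and tail embeddings; the score is unbounded."""
    return np.sum(h * r * t)

def transe_score(h, r, t, p=2):
    """Distance family (e.g., TransE): negative L1/L2 distance between the
    translated head (h + r) and the tail; also unbounded."""
    return -np.linalg.norm(h + r - t, ord=p)

# Toy 4-dimensional embeddings, for illustration only.
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 4))
print(distmult_score(h, r, t), transe_score(h, r, t))
```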
We observe that the scoring function’s range remains unbounded, irrespective of whether it belongs to the dot product model or the distance model, and this unrestricted range increases the potential for large variance. The score range of the dot product model is from negative infinity to positive infinity, and the score range of the distance model is from 0 to positive infinity. Because the score range of triples is unbounded, triple scores are highly sensitive to variations in the entity and relation embeddings, which results in substantial model variance. A straightforward way to resolve this issue is to normalize [22] both the entity and relation embeddings, which bounds the range of triple scores by eliminating differences in magnitude between features. However, such an approach is affected by the dimensionality of the embeddings, and the score range varies for embeddings of different dimensions. We therefore adopt cosine similarity as the scoring function to obtain a fixed bounded range. Firstly, cosine similarity acts as a normalization mechanism independent of the embedding dimension, and its score is fixed in the range of −1 to 1. Secondly, cosine similarity is a widely employed semantic similarity measure, commonly used to assess the similarity between document vectors [23,24,25]; smaller angles between similar vectors help distinguish the encoded information of different types of entity embeddings.
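As a small illustration of this dimension-independence (a sketch with toy vectors, assuming nothing about our model), the cosine score stays within [−1, 1] whatever the embedding dimension, unlike a raw dot product or distance:

```python
import numpy as np

def cosine_score(u, v, eps=1e-8):
    """Cosine similarity: a normalized dot product, always in [-1, 1]
    regardless of the embedding dimension."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

rng = np.random.default_rng(0)
for d in (8, 64, 512):   # changing the dimension does not change the score range
    u, v = rng.normal(size=(2, d))
    assert -1.0 <= cosine_score(u, v) <= 1.0
```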
To achieve this goal, we propose RoCS, a KGE method based on joint cosine similarity. Cosine similarity is chosen for its bounded range, dimensionality independence, and effectiveness in capturing semantic relationships; it ensures numerical stability during training and adapts seamlessly to embeddings of varying dimensions. The rotation embedding model RotatE [17] is a strong baseline that can reason about three important relational patterns in knowledge graphs, i.e., symmetry/antisymmetry, inversion, and composition. RotatE uses the $L_1$ distance as its score function, whose range is unbounded, whereas we use cosine similarity as the score function to ensure that the score range is bounded. However, directly calculating the cosine similarity of two complex vectors is intricate, while we need a real-valued result to score the triple. To address this challenge, we present a joint cosine similarity computation as the cosine similarity for complex vectors, as shown in Figure 1.
Specifically, we merge the real and imaginary parts of a complex vector into a new joint vector and then compute the cosine similarity of the joint vectors. The joint cosine similarity keeps the range of the result unchanged while reflecting the overall similarity between the two complex vectors. We evaluate the performance of our method on the FB15K [16], FB15K-237 [26], WN18 [16], and WN18RR [27] datasets for the link prediction task. Experimental results show that our method outperforms the state-of-the-art complex vector embedding models ComplEx [14] and RotatE [17] on all evaluation metrics across all datasets. Furthermore, the proposed RoCS is highly competitive with the current state-of-the-art methods [15,28,29]. We also explore various techniques for computing the cosine similarity of complex vectors and experimentally validate the superiority of our proposed joint method over these alternatives.
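The following is a minimal sketch of the joint cosine similarity described above, combined with a RotatE-style rotation (toy complex embeddings; the variable names and the small epsilon are ours, and details may differ from the actual RoCS implementation):

```python
import numpy as np

def joint_cosine(u, v, eps=1e-8):
    """Joint cosine similarity of two complex vectors: concatenate the real
    and imaginary parts into one real 'joint' vector each, then take the
    ordinary cosine similarity, giving a real score in [-1, 1]."""
    ju = np.concatenate([u.real, u.imag])
    jv = np.concatenate([v.real, v.imag])
    return np.dot(ju, jv) / (np.linalg.norm(ju) * np.linalg.norm(jv) + eps)

def rocs_style_score(h, phase_r, t):
    """Rotate the head by the relation phases (RotatE-style, Euler's formula),
    then compare with the tail via the joint cosine similarity."""
    rotated_h = h * np.exp(1j * phase_r)
    return joint_cosine(rotated_h, t)

# Toy complex embeddings, for illustration only.
rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d) + 1j * rng.normal(size=d)
t = rng.normal(size=d) + 1j * rng.normal(size=d)
phase_r = rng.uniform(-np.pi, np.pi, size=d)
print(rocs_style_score(h, phase_r, t))   # a single real number in [-1, 1]
```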
In summary, the primary contributions of our work are as follows:
We propose a joint cosine similarity method that computes the similarity of complex vectors and serves as the scoring function.
Our approach combines the rotational properties of the complex vector model RotatE to reason about a variety of important relational patterns.
We have experimentally verified that the proposed RoCS provides a significant improvement over RotatE and achieves results close to or even higher than the current state-of-the-art.
2. Related Work
KGE predicts missing links by mapping symbolic representations of entities and relations into vector or matrix representations. Most KGE methods [30] utilize triples as learning resources, deriving the semantics of entities and relations from the graph structure. Designing scoring functions that preserve the original semantic relations has become a key research focus in recent years [1,2]. Based on the structure of the scoring function, the majority of this work can be categorized into dot product models and distance models.
The dot product model takes the form of dot product operations on the head entity, relation, and tail entity, capturing semantic information through pairwise feature interactions between latent factors. The earliest such work is RESCAL [31], which represents each relation as a matrix $M_r \in \mathbb{R}^{d \times d}$ and each entity as a vector in $\mathbb{R}^{d}$. To reduce the number of relation parameters, DistMult [12] constrains the relation matrix to be diagonal, representing each relation as a vector $\mathbf{r} \in \mathbb{R}^{d}$. Since DistMult is overly simple and can only infer symmetric relations, HolE [13] utilizes a circular correlation operation to infer anti-symmetric relations. ComplEx [14] encodes entities and relations in complex space, utilizing the complex conjugate to model anti-symmetric relations. To further facilitate feature interaction, QuatE [15] represents entities and relations in quaternion space. In addition, neural network models such as ConvE [27], InteractE [32], and graph neural networks [33,34], as well as the tensor decomposition models TuckER [28] and LowFER [29], can also be regarded as dot product models.
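As one concrete instance of this family (an illustrative sketch with toy embeddings, not the authors' code), the ComplEx score is the real part of a trilinear product in which the tail is conjugated, which is what lets it model anti-symmetric relations:

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx: real part of the trilinear product of head, relation, and the
    complex conjugate of the tail; conjugating the tail breaks the symmetry
    between head and tail, so anti-symmetric relations can be modelled."""
    return np.real(np.sum(h * r * np.conj(t)))

# Toy complex embeddings, for illustration only.
rng = np.random.default_rng(0)
h, r, t = [rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(3)]
print(complex_score(h, r, t), complex_score(t, r, h))  # the two scores generally differ
```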
Distance models utilize relations to translate or rotate the head entity and then compute the distance to the tail entity as the scoring function. In TransE [16], a relation is a translation from the head entity to the tail entity: guided by the principle of translation invariance, the sum of the head entity embedding and the relation embedding is expected to be close to the tail entity embedding. Consequently, TransE uses the $L_1$ or $L_2$ distance as its scoring function. Since TransE cannot handle N-to-N relationships, TransH [18] introduces relation-specific hyperplanes onto which entities are projected. TransR [19] goes further and models entities and relations in separate spaces, projecting entities into relation-specific spaces. The complex embedding model RotatE [17] was recently proposed; it represents entities and relations in complex space and, via Euler's formula, models each relation as a rotation from the head entity to the tail entity. By leveraging this rotation property, RotatE can infer various essential relation patterns [17].
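The following toy sketch (our own illustration, not RotatE's training code) shows why the rotation view supports such patterns: composing two relations corresponds to adding their phases, and the inverse relation corresponds to negating them:

```python
import numpy as np

def rotate(h, phase_r):
    """RotatE: a relation acts as an element-wise rotation of the head
    embedding in the complex plane (Euler's formula)."""
    return h * np.exp(1j * phase_r)

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d) + 1j * rng.normal(size=d)
p1, p2 = rng.uniform(-np.pi, np.pi, size=(2, d))

# Composition: rotating by p1 and then by p2 equals one rotation by p1 + p2.
assert np.allclose(rotate(rotate(h, p1), p2), rotate(h, p1 + p2))
# Inversion: the inverse relation is the rotation by the negated phases.
assert np.allclose(rotate(rotate(h, p1), -p1), h)
```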
Nevertheless, whether the dot product model or the distance model is used, the triple scores remain unbounded. Substantial score disparities between positive and negative samples amplify variance and diminish the model’s generalization ability. In contrast to prior approaches, we propose computing the joint cosine similarity of complex vectors as the scoring function, which constrains the triple scores to a bounded range. Building on this, our KGE method combines the rotation property of the complex vector embedding model RotatE to model a variety of different relational patterns.
Table 1 compares our approach with other related work. The normalization effect of cosine similarity reduces variance and prevents gradient vanishing [35]. Moreover, cosine similarity finds extensive application in natural language processing for assessing the similarity of word, sentence, and document vectors [23,24,25]. The angle between similar vectors should be smaller, which also helps distinguish different types of entities. In short, the main motivations behind our method are (1) using cosine similarity makes the triple scores bounded and reduces the variance, (2) distinguishing the embedding information of various entity types, and (3) reflecting the difference in direction between vectors.