3.1. Problem Definition
A knowledge graph is a type of graph structure in which entities are represented as nodes and the relations between entities are represented as edges. However, the symbolic nature of triples makes them difficult to process directly, so obtaining more effective entity embedding representations has become a major challenge in entity alignment tasks. For generality, this paper uses uppercase letters to represent sets and lowercase letters to represent vectors. Let $G = (E, R, A, V, T)$ represent a knowledge graph, where $E$ represents the set of all entities; $R$ represents the set of relationship predicates; $A$ represents the set of property predicates; $V$ represents the set of property values; and $T$ represents the set of triples that relate entities and their properties. $T = T_r \cup T_a$, where $T_r = \{(h, r, t)\}$ represents relationship triples in the knowledge graph, in which $h$ stands for the head entity, $t$ stands for the tail entity, and $r$ stands for the relationship (predicate) between them; $T_a = \{(h, a, v)\}$ represents property triples, in which $h$ stands for an entity, $a$ stands for a property name, and $v$ stands for a property value. Entity alignment aims to discover the corresponding entities across different knowledge graphs.
Given two knowledge graphs $G_1$ and $G_2$, we aim to discover the set $M = \{(e_1, e_2) \mid e_1 \equiv e_2\}$, where $e_1 \in E_1$ and $e_2 \in E_2$, indicating that $e_1$ and $e_2$ denote the identical real-world entity, with $\equiv$ indicating an equivalence relation. We employ an embedding-based model to assign a continuous representation to every element of the two types of triples $(h, r, t)$ and $(h, a, v)$, represented in bold font as $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ and $(\mathbf{h}, \mathbf{a}, \mathbf{v})$.
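To make the notation concrete, the following minimal Python sketch mirrors these definitions; all names (`KnowledgeGraph`, `rel_triples`, `prop_triples`) are illustrative rather than taken from the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    """G = (E, R, A, V, T): entities, relation predicates, property
    predicates, property values, and the triples that connect them."""
    entities: set = field(default_factory=set)      # E
    relations: set = field(default_factory=set)     # R
    properties: set = field(default_factory=set)    # A
    values: set = field(default_factory=set)        # V
    rel_triples: set = field(default_factory=set)   # T_r = {(h, r, t)}
    prop_triples: set = field(default_factory=set)  # T_a = {(h, a, v)}

g1 = KnowledgeGraph()
g1.rel_triples.add(("Li_Bai", "born_in", "Suiye"))           # relationship triple
g1.prop_triples.add(("Li_Bai", "alias", "Qinglian Jushi"))   # property triple
```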
3.3. Property Character Embedding
After applying the TransE algorithm, we proceed with the property character-level embedding. Here, the predicate $r$ is interpreted as the transformation from the head entity $h$ to the property value $v$. However, in the two knowledge graphs, the same property value may manifest in different forms, for example, as 20.445 and 20.445444 in financial data, or as “Li Bai” and “Qinglian Jushi” in personal names. Therefore, to encode the property values, a composite function is used in this paper, and the relationship for each element in $T_a$ is defined as $\mathbf{h} + \mathbf{r} \approx f_a(v)$. Here, $f_a$ is a composite function, and $v$ is the property value, $v = \{c_1, c_2, \ldots, c_t\}$. The composite function encodes a property value into a single vector while mapping similar property values to similar vector representations. Three composite functions are defined in this paper.
Summation composite function (SUM). The first composite function is the summation function (SUM), defined as the sum of all character embeddings of the property value:

$$f_a(v) = \mathbf{c}_1 + \mathbf{c}_2 + \cdots + \mathbf{c}_t$$
Here, $\mathbf{c}_1, \ldots, \mathbf{c}_t$ are the character embeddings of the property value. However, this composite function has a clear limitation: when two strings share the same multiset of characters in different orders, they receive identical vector representations. For instance, the values “20.18” and “18.02” result in the same vector, rendering the function less effective.
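A small NumPy sketch makes this limitation concrete; the character table below is randomly initialized purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 16-dimensional embedding for each character we need.
char_emb = {c: rng.normal(size=16) for c in "0123456789."}

def f_sum(value: str) -> np.ndarray:
    """SUM composite function: the sum of all character embeddings."""
    return np.sum([char_emb[c] for c in value], axis=0)

# Same multiset of characters, different order -> identical vectors.
assert np.allclose(f_sum("20.18"), f_sum("18.02"))
```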
Composite function based on LSTM (LSTM). To overcome the limitation of the SUM composite function, this paper proposes a composite function based on Long Short-Term Memory (LSTM). This function employs an LSTM network to encode the character sequence of the property value into a single vector, using the final hidden state of the LSTM network as the vector representation of the property value:

$$f_a(v) = f_{\mathrm{LSTM}}(\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_t)$$
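A minimal PyTorch sketch of this encoder follows; the vocabulary size and dimensions (128, 16, 32) are placeholder values, not the paper's settings:

```python
import torch
import torch.nn as nn

class LSTMCompose(nn.Module):
    """Encode a character sequence into one vector: the LSTM's final
    hidden state serves as the property-value representation."""
    def __init__(self, n_chars: int = 128, char_dim: int = 16, out_dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, out_dim, batch_first=True)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (batch, seq_len) integer character indices
        _, (h_n, _) = self.lstm(self.emb(char_ids))
        return h_n[-1]  # (batch, out_dim): final hidden state

f_lstm = LSTMCompose()
ids = torch.tensor([[ord(c) for c in "20.18"]])  # naive char-to-id mapping
print(f_lstm(ids).shape)  # torch.Size([1, 32])
```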
N-gram-based composite function (N-gram). This paper further proposes an N-gram-based composite function to mitigate the limitation of the SUM composite function. Specifically, this function uses the sum of all n-gram combinations in the property value as the vector representation:

$$f_a(v) = \sum_{n=1}^{N} \left( \frac{\sum_{i=1}^{t-n+1} \sum_{j=i}^{i+n-1} \mathbf{c}_j}{t-n+1} \right)$$

where $N$ represents the upper limit of the N-gram combinations utilized (in this study, $N = 15$), and $t$ signifies the length of the property value.
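The sketch below implements this function under the reconstruction above (summing the character embeddings inside each length-n window, averaging over the windows, then adding across n); the random character table is again for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)
char_emb = {c: rng.normal(size=16) for c in "0123456789."}

def f_ngram(value: str, N: int = 15) -> np.ndarray:
    """For each n = 1..N, sum the character embeddings inside every
    length-n window, average over the windows, then add across n."""
    vecs = np.stack([char_emb[c] for c in value])  # (t, dim)
    t = len(value)
    out = np.zeros(vecs.shape[1])
    for n in range(1, min(N, t) + 1):
        windows = [vecs[i:i + n].sum(axis=0) for i in range(t - n + 1)]
        out += np.mean(windows, axis=0)
    return out

# Unlike SUM, the window structure makes the encoding order-sensitive.
assert not np.allclose(f_ngram("20.18"), f_ngram("18.02"))
```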
To acquire the property character embedding, the following objective function $J_{CE}$ is minimized in this study:

$$J_{CE} = \sum_{t_a \in T_a} \sum_{t'_a \in T'_a} \max\left(0,\, \gamma + f(t_a) - f(t'_a)\right)$$

The detailed definitions of $T'_a$ and $f(t_a)$ are as follows:

$$T'_a = \{(h', a, v) \mid h' \in E\} \cup \{(h, a, v') \mid v' \in V\}$$

$$f(t_a) = \left\lVert \mathbf{h} + \mathbf{r} - f_a(v) \right\rVert$$

where $T_a$ denotes the collection of valid property triplets within the training dataset; $T'_a$ represents the collection of corrupted property triplets (where $A$ signifies the collection of properties in $G$); and $\gamma > 0$ is a margin hyperparameter. The corrupted triplets serve as negative samples, in which a random entity replaces the head entity or a random property value replaces the original value. $f(t_a)$ symbolizes the confidence score of the property triplet, based on the embedding of the head entity $\mathbf{h}$, the embedding of the relationship $\mathbf{r}$, and the vector representation of the property value derived via the composite function $f_a(v)$.
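A hedged PyTorch sketch of this objective, assuming mini-batches of pre-computed embeddings for valid and corrupted triples (the batching scheme and margin value are illustrative):

```python
import torch
import torch.nn.functional as F

def property_char_loss(h, r, fa_v, h_neg, r_neg, fa_v_neg, gamma: float = 1.0):
    """Margin-based objective for property character embedding:
    f(t_a) = ||h + r - f_a(v)|| should be smaller for valid triples
    than for corrupted ones by at least the margin gamma."""
    pos = torch.norm(h + r - fa_v, dim=-1)              # f(t_a)
    neg = torch.norm(h_neg + r_neg - fa_v_neg, dim=-1)  # f(t'_a)
    return F.relu(gamma + pos - neg).mean()

h, r, v = torch.randn(4, 32), torch.randn(4, 32), torch.randn(4, 32)
loss = property_char_loss(h, r, v, torch.randn(4, 32), r, torch.randn(4, 32))
```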
3.4. Heterogeneous Graph Transformer with Relation Awareness (HGTRA)
The process by which the graph transformer aggregates all the neighboring features of node $h$ can be formulated as follows:

$$\mathbf{h}^{(l)} = \underset{\forall t \in N_h}{\mathrm{Aggregate}}\left(\mathrm{Attention}(h, t) \cdot \mathrm{Message}(t)\right) \qquad (7)$$

In this equation, $\mathrm{Attention}(\cdot)$ calculates the importance of each neighboring node; $\mathrm{Message}(\cdot)$ extracts features from each neighboring node; and $\mathrm{Aggregate}(\cdot)$ combines the neighbor information using the attention weights. However, as illustrated in Equation (7), the graph transformer fails to consider edge features. To address this, we designed a novel heterogeneous graph transformer with relation awareness (HGTRA) in this paper, inspired by previous work [26]. The proposed HGTRA enables our model to differentiate between the heterogeneous features of relations and properties, thus facilitating a better capture of neighborhood similarities among aligned entities. Let $H^{(l)}$ represent the output of the $l$-th layer of HGTRA, which also serves as the input to the $(l+1)$-th layer; initially, $H^{(0)}$ is set to the input entity embeddings. When HGTRA takes a relation triplet as input, the output is a relation-based embedding; when it takes a property triplet as input, the output is a property-based embedding. HGTRA mainly consists of the following four layers, as shown in Figure 2:
(a) Relation Embedding. Considering the possible similarity between an aligned relation and its head and tail entities, this paper generates relation features by combining the relevant entity features. In particular, the relation embedding $\mathbf{r}$ is approximated by averaging the embeddings of its related head entities $H_r$ and tail entities $T_r$, as shown in the following formula:

$$\mathbf{r} = \mathrm{ReLU}\left(\left[\mathbf{w}_1 \odot \frac{\sum_{h \in H_r} \mathbf{h}}{|H_r|} \,\Big\Vert\, \mathbf{w}_2 \odot \frac{\sum_{t \in T_r} \mathbf{t}}{|T_r|}\right]\right) \qquad (8)$$

In this equation, $|\cdot|$ denotes the size of a set; $\mathbf{w}_1$ and $\mathbf{w}_2$ are attention vectors; $\Vert$ denotes the operation of concatenation; and $\mathrm{ReLU}$ denotes the activation function Rectified Linear Unit (ReLU);
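The following sketch follows the reconstruction of Equation (8) above; applying the attention vectors element-wise before concatenation is an assumption of this reconstruction:

```python
import torch
import torch.nn.functional as F

def relation_embedding(head_embs, tail_embs, w1, w2):
    """Approximate a relation vector from its head/tail entities
    (sketch of Equation (8) as reconstructed above).
    head_embs: (n_heads, d); tail_embs: (n_tails, d); w1, w2: (d,)."""
    h_mean = head_embs.mean(dim=0)   # average over H_r
    t_mean = tail_embs.mean(dim=0)   # average over T_r
    # Attention vectors applied element-wise, then concatenation and ReLU.
    return F.relu(torch.cat([w1 * h_mean, w2 * t_mean], dim=-1))
```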
(b) Heterogeneous Attention. In this work, entity $h$ is mapped to a key vector $K(h)$, and its neighboring entity $t$ is mapped to a query vector $Q(t)$. In contrast to other methods, this work uses the dot product of their concatenation and the relation embedding $\mathbf{r}$ as the attention value, rather than directly using the dot product of the key and query vectors. Since $\mathbf{r}$ is derived from the feature aggregation of the related head and tail entities (refer to Equation (8)), it does not stray too far from the embeddings of its linked entities. In addition, $\mathbf{r}$ signifies the heterogeneous feature of the edge, so that neighboring pairs linked by different edges contribute differently to the entity $h$. More specifically, this work calculates the multi-head attention of each neighbor relation as follows:

$$\mathrm{Attention}(h, r, t) = \underset{\forall t \in N_h}{\mathrm{Softmax}}\left(\Big\Vert_{i \in [1, m]} \mathrm{ATT\text{-}head}^i(h, r, t)\right)$$
Among them, the detailed expression of $\mathrm{ATT\text{-}head}^i(h, r, t)$ is as follows:

$$\mathrm{ATT\text{-}head}^i(h, r, t) = \left[\mathbf{K}^i(h) \,\Vert\, \mathbf{Q}^i(t)\right] \cdot \left(\mathbf{W}^i_{att}\, \mathbf{r}\right)$$

where $N_h$ denotes the set of entities neighboring $h$; the parameter $\mathbf{W}^i_{att}$ is the attention parameter of dimensionality $\frac{2d}{m} \times d$, where $m$ denotes the number of attention heads. It should be noted that the Softmax operation ensures that the sum of the attention weights assigned to all neighboring entities is equal to one;
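A sketch of one attention head under this reconstruction; the head dimension and the projection `w_att` are assumptions introduced for illustration:

```python
import torch

def att_head(key, query, rel, w_att):
    """One attention head (sketch of the reconstruction above).
    key, query: (n_neighbours, d_head); rel: (d,); w_att: (d, 2 * d_head).
    The score is the dot product of the concatenated key/query with a
    head-specific projection of the relation embedding."""
    kq = torch.cat([key, query], dim=-1)   # (n_neighbours, 2 * d_head)
    return kq @ (rel @ w_att)              # (n_neighbours,)

# Softmax over the neighbours of h makes the weights sum to one:
key, query = torch.randn(5, 8), torch.randn(5, 8)
rel, w_att = torch.randn(32), torch.randn(32, 16)
weights = torch.softmax(att_head(key, query, rel, w_att), dim=0)
print(weights.sum())  # tensor(1.) up to float error
```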
(c) Heterogeneous Message. Likewise, this paper integrates relations into the message-passing mechanism in order to differentiate between the various categories of edges. For any given $t \in N_h$, the calculation of its multi-head message is carried out as follows:

$$\mathrm{Message}(h, r, t) = \Big\Vert_{i \in [1, m]} \mathrm{MSG\text{-}head}^i(h, r, t)$$

The detailed expression of $\mathrm{MSG\text{-}head}^i(h, r, t)$ is shown below:

$$\mathrm{MSG\text{-}head}^i(h, r, t) = \left[\mathbf{V}^i(t) \,\Vert\, \mathbf{r}\right]$$
To obtain the $i$-th message head $\mathrm{MSG\text{-}head}^i$, this paper first applies the linear projection $\mathbf{W}^i_V$ to the features of the tail entity $t$, yielding $\mathbf{V}^i(t)$. It then concatenates the features of $\mathbf{V}^i(t)$ and the relation $\mathbf{r}$, and finally connects all message heads to obtain the heterogeneous message;
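A corresponding sketch of one message head; the shapes are illustrative:

```python
import torch

def msg_head(tail_embs, rel, w_v):
    """One message head (sketch): project the tail entities with W_V,
    then concatenate the projected features with the relation embedding.
    tail_embs: (n_neighbours, d); rel: (d,); w_v: (d, d_head)."""
    v = tail_embs @ w_v                  # V^i(t): (n_neighbours, d_head)
    r = rel.expand(v.shape[0], -1)       # broadcast r to every neighbour
    return torch.cat([v, r], dim=-1)     # (n_neighbours, d_head + d)
```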
(d) Heterogeneous Aggregation. The final step is heterogeneous aggregation, depicted in Figure 2d, where the heterogeneous multi-head attention and messages are merged into entities. By using the attention coefficients to weight the messages of neighboring entities, we can aggregate information from neighbors with different features and update the vector representation of entity $h$. The specific formula is shown below:

$$\tilde{\mathbf{h}}^{(l)} = \underset{\forall t \in N_h}{\oplus}\left(\mathrm{Attention}(h, r, t) \cdot \mathrm{Message}(h, r, t)\right)$$

In this context, the symbol $\oplus$ represents the operation of superimposition (element-wise addition). In order to combine the name features and the features derived from the multi-layer neural network, we employ residual connections [27] to create the final updated embedding, as demonstrated in the subsequent equation:

$$\mathbf{h}^{(l)} = \beta \cdot \mathbf{W}_a \tilde{\mathbf{h}}^{(l)} + (1 - \beta) \cdot \mathbf{W}_b \mathbf{h}^{(l-1)}$$
where $\beta$ is a trainable weight; and $\mathbf{W}_a$ and $\mathbf{W}_b$ are linear projections. Finally, based on the entire relation structure $T_r$ and the property structure $T_a$, this paper can generate the relation-based embedding $\mathbf{h}_r$ and the property-based embedding $\mathbf{h}_a$, respectively, and employ them for end-to-end entity alignment tasks.
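The two aggregation steps can be sketched together as follows; `w_a`, `w_b`, and `beta` play the roles of $\mathbf{W}_a$, $\mathbf{W}_b$, and $\beta$ above:

```python
import torch

def aggregate_update(att, msg, h_prev, w_a, w_b, beta):
    """Heterogeneous aggregation with a residual connection (sketch of the
    two equations above): superimpose attention-weighted neighbour messages,
    then mix the result with the previous-layer embedding.
    att: (n_neighbours,); msg: (n_neighbours, d_msg); h_prev: (d,);
    w_a: (d_msg, d); w_b: (d, d); beta: scalar in [0, 1]."""
    h_tilde = (att.unsqueeze(-1) * msg).sum(dim=0)  # ⊕ over neighbours
    return beta * (h_tilde @ w_a) + (1 - beta) * (h_prev @ w_b)
```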
3.5. Learning Alignment
Upon obtaining the final entity representations, this paper uses the Manhattan distance to measure the similarity between potential pairs of entities: the smaller the distance, the greater the likelihood of entity alignment. To calculate the similarity between candidate entity pairs, this paper uses $\mathbf{h}_r$ and $\mathbf{h}_a$, and the specific equation is stated as follows:

$$d_x(e_1, e_2) = \left\lVert \mathbf{h}_x(e_1) - \mathbf{h}_x(e_2) \right\rVert_1$$

where $x \in \{r, a\}$ indexes the relation-based and property-based embeddings; $\lVert \cdot \rVert_1$ denotes the Manhattan distance.
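In code, this distance is a minimal one-liner:

```python
import torch

def manhattan(e1: torch.Tensor, e2: torch.Tensor) -> torch.Tensor:
    """L1 (Manhattan) distance between candidate entity embeddings;
    a smaller value means a more likely alignment."""
    return torch.sum(torch.abs(e1 - e2), dim=-1)
```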
Previous methods generally concatenate the embeddings of entities from multiple sources and use them directly in the loss function to capture the entity features comprehensively. Nevertheless, we argue that relation-based embeddings and property-based embeddings may contribute differently to EA, since these two structures can differ markedly. Hence, rather than adopting the concatenation approach outright, we assign distinct weights to the two embeddings to differentiate their contributions during training. With this in mind, we adopt a margin-based ranking loss function for model training, intended to reduce the embedding distance of positive pairs and enlarge that of negative pairs. The specific equation is stated as follows:
$$L = \sum_{(e_1, e_2) \in P} \Big[ \beta_r \sum_{(e'_1, e'_2) \in P'_r} \max\big(0,\, d_r(e_1, e_2) - d_r(e'_1, e'_2) + \gamma_r\big) + \beta_a \sum_{(e'_1, e'_2) \in P'_a} \max\big(0,\, d_a(e_1, e_2) - d_a(e'_1, e'_2) + \gamma_a\big) \Big]$$

Here, $P$ denotes the set of positive (pre-aligned) entity pairs; $P'_r$ and $P'_a$ represent negative pairs based on the relation and property embeddings, correspondingly; $\beta_r$ and $\beta_a$ are the weights assigned to the two embeddings; and $\gamma_r$ and $\gamma_a$ (both > 0) are the margin hyperparameters that separate positive and negative pairs.
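A sketch of this weighted loss, assuming the four distance tensors have already been computed with the Manhattan distance above (the weights and margins are placeholders):

```python
import torch
import torch.nn.functional as F

def alignment_loss(dr_pos, dr_neg, da_pos, da_neg,
                   beta_r=0.5, beta_a=0.5, gamma_r=1.0, gamma_a=1.0):
    """Weighted margin-based ranking loss (sketch of the equation above):
    relation-based and property-based distances receive separate weights
    (beta_r, beta_a) and margins (gamma_r, gamma_a)."""
    loss_r = F.relu(dr_pos - dr_neg + gamma_r).mean()
    loss_a = F.relu(da_pos - da_neg + gamma_a).mean()
    return beta_r * loss_r + beta_a * loss_a
```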
3.6. Enriching Triplets with Transitivity Rules
Although the relational embeddings implicitly learn the information of relation transitivity, incorporating this information explicitly augments the number of properties and related entities for each entity, thereby facilitating the identification of similarities between entities. For instance, consider the triplets $\langle x, r_1, y \rangle$ and $\langle y, r_2, z \rangle$; from these, we can infer the existence of a relationship, namely $r_1 \cdot r_2$, between the entities $x$ and $z$. In actuality, this information can be leveraged to enrich the entities related to $x$. This paper addresses the handling of single-hop transitive relations as follows: given the relationship triplets $\langle e_1, r_1, e_2 \rangle$ and $\langle e_2, r_2, e_3 \rangle$, we interpret $r_1$ and $r_2$ as a single relationship from the head entity $e_1$ to the tail entity $e_3$. Therefore, the relationship of these transitive triplets is defined as $r_1 \cdot r_2$, and by replacing the relationship vector $\mathbf{r}$ with $\mathbf{r}_1 + \mathbf{r}_2$, we can obtain the relationship between $e_1$ and $e_3$.
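A simple sketch of this enrichment over the triple set; the helper name and the `(r1, r2)` tuple standing in for the composed relation are illustrative:

```python
from collections import defaultdict

def transitive_triples(rel_triples):
    """Infer single-hop transitive triples: from (e1, r1, e2) and
    (e2, r2, e3), add (e1, r1.r2, e3); downstream, the relation
    vector r1 + r2 stands in for the composed relation."""
    by_head = defaultdict(list)
    for h, r, t in rel_triples:
        by_head[h].append((r, t))
    inferred = set()
    for e1, r1, e2 in rel_triples:
        for r2, e3 in by_head[e2]:
            if e3 != e1:                      # skip trivial round-trips
                inferred.add((e1, (r1, r2), e3))
    return inferred

triples = {("Li_Bai", "born_in", "Suiye"), ("Suiye", "located_in", "Anxi")}
print(transitive_triples(triples))
# {('Li_Bai', ('born_in', 'located_in'), 'Anxi')}
```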