Multi-Source Information Graph Embedding with Ensemble Learning for Link Prediction

Chunning Hou, Xinzhi Wang, Xiangfeng Luo and Shaorong Xie
School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(14), 2762; https://doi.org/10.3390/electronics13142762
Submission received: 20 June 2024 / Revised: 7 July 2024 / Accepted: 9 July 2024 / Published: 13 July 2024

Abstract

Link prediction is a key technique for connecting entities and relationships in the graph reasoning field. It leverages known graph structure data to predict missing factual information. Previous studies have either focused on the semantic representation of a single triplet or on the graph structure built from triples: the former ignores the association between different triples, and the latter ignores the true meaning of the node itself. Furthermore, common graph-structured datasets inherently face challenges such as missing information and incompleteness. In light of this challenge, we present a novel model called Multi-source Information Graph Embedding with Ensemble Learning for Link Prediction (EMGE), which effectively improves reasoning for link prediction. Ensemble learning is systematically applied throughout the model training process. At the data level, this approach enhances entity embeddings by integrating structured graph information and unstructured textual data as multi-source information inputs, and an attention mechanism is introduced to fuse these inputs effectively. During the training phase, the principle of ensemble learning is employed to extract semantic features from multiple neural network models, facilitating the interaction of enriched information. To ensure effective model learning, a novel loss function based on contrastive learning is devised, effectively minimizing the discrepancy between the predicted values and the ground truth. Moreover, to enhance the semantic representation of graph nodes in link prediction, two rules inspired by the concept of spreading activation are introduced during the aggregation of graph structure information, enabling a more comprehensive understanding of the relationships between nodes and edges in the graph. During the testing phase, the EMGE model is validated on three datasets, including WN18RR, FB15k-237, and a private Chinese financial dataset. The experimental results demonstrate a reduction in the mean rank (MR) by 0.2 times, an improvement in the mean reciprocal rank (MRR) by 5.9%, and an increase in Hits@1 by 12.9% compared to the baseline model.

1. Introduction

Link prediction in graph reasoning is a key technique for connecting entities and relationships. It has been widely used in various fields, such as information retrieval [1], information recommendation [2], intelligent chat systems [3], and intelligent detection [4,5,6]. Many open-source knowledge graph datasets have been built, such as DBpedia [7], Freebase [8], and ConceptGraph [9]. These knowledge graph datasets store knowledge of the graph structure in the form of fact triples (head(h), relation(r), tail(t)), where nodes h and t represent entities and edges r represent relationships. The left part of Figure 1 illustrates the extraction of structured knowledge from unstructured text and the formation of a knowledge graph. For example, (Chinese Badminton Association, located_in, China) means "the Chinese Badminton Association is located in China". Link prediction predicts the missing entity (head or tail) so that each fact triple is complete. To represent these symbolic structural fact triples as computable digitized information, graph embedding encodes entities and relationships into embedding vectors in a unified space and designs a model to predict unseen entities. The right half of Figure 1 shows that in the link prediction task, graph representation learning primarily focuses on the structural information of the graph; however, it often overlooks the semantic information embedded within the graph, such as the textual node descriptions.
Numerous models have been proposed for link prediction and have achieved remarkable success in this field. These models transform graph data into a computable vector space to represent the semantic features of the entities and relations. We categorize them into three types: decomposition-based methods, including DistMult [10], SimplE [11], and ComplEx [12]; Translational Vector Methods, such as TransE [13], TransH [14], HAKE [15], and RoCS [16]; and Deep Learning Methods, which include ConvE [17], CompGCN [18], KG-BERT [19], SAttLE [20], RUGA [21], CKCB [22], and Caps-OWKG [23]. According to the related literature, the mainstream methods have the following shortcomings:
  • Common graph-structured datasets inherently face challenges such as missing information and incompleteness. Previous link prediction studies have tended to favor a single-triplet representation, ignoring the contextual information of entity-associated nodes in graph-structured data, as in models such as KG-BERT and SAttLE. Conversely, models like CompGCN and SE-GNN [24] have considered the graph structure information while overlooking the textual characteristics of nodes. Specific approaches, such as kNN-KGE [25], have attempted to leverage both text data and graph structure information, but their integration remains rudimentary, lacking a more sophisticated fusion of the two information sources. This restricted utilization of multi-source information diminishes the capacity of existing models to express semantic relationships effectively. The right half of Figure 1 highlights that, in the link prediction task, graph representation learning predominantly emphasizes the graph's structural information while neglecting the valuable semantic information present in the graph, such as the textual descriptions of the nodes. Furthermore, the process of mapping graph node information compresses the semantic features of the nodes, exacerbating the scarcity of semantic information for graph nodes.
  • Previous models have typically utilized a single network model for training and learning semantic features. However, due to the distinct characteristics of different neural networks, the learned features can vary significantly. Ensemble learning addresses this issue by integrating multiple models and leveraging the strengths of other models to improve the final results. However, current ensemble learning approaches often focus solely on information fusion at the decision level, neglecting the crucial information fusion at the data and feature layers. This limitation leads to inadequate semantic mining and exploration by the link prediction model.
Based on these considerations, we propose a novel framework called Multi-source Information Graph Embedding with Ensemble Learning for Link Prediction (EMGE) to tackle the challenges in link prediction. EMGE enhances entity embeddings at the data level by incorporating both structured graph information and unstructured textual data as inputs from multiple sources. An attention mechanism is introduced to effectively fuse these inputs. This integration enables a more comprehensive understanding of nodes and their relationships within the graph. During the training process, EMGE leverages the principles of ensemble learning to extract semantic features from various types of neural network models. This approach promotes the interaction and integration of information across different models, resulting in a more robust data representation. In addition, a novel loss function based on contrastive learning is designed to optimize the learning process. This innovative loss function aims to minimize the discrepancy between predicted and actual values and enhance the reliability of predicted values. EMGE also defines two node information fusion rules during the aggregation of graph structure information, inspired by the concept of spreading activation. These rules aggregate node information from different perspectives, further enhancing the semantic representation of nodes in the graph. By integrating the multi-source information of nodes, the EMGE model captures a more comprehensive understanding of the nodes and their relationships, leading to improved predictions. Additionally, the adoption of contrastive learning and ensemble learning enables EMGE to learn from both positive and negative examples, leveraging the strengths of multiple models to make more accurate predictions. Overall, the combination of these innovative features in EMGE can significantly enhance link prediction performance.
In summary, the main contributions of this paper are as follows:
  • Our method offers a new solution for the link prediction task. The EMGE model enhances the semantic expressiveness of graph representation learning and improves the prediction accuracy for unknown data by integrating diverse information sources and mapping them into a unified semantic vector space.
  • We introduce text description information to enrich the semantics of nodes in the graph structure and employ attention mechanisms to integrate the textual and graph structure information. Ensemble learning is then applied to facilitate information interaction among different networks and enhance the overall predictive capability of the model.
  • Experimental evaluations demonstrate that EMGE achieves competitive performance on two public datasets and a private dataset.

2. Related Work

2.1. Link Prediction in Graph Reasoning

Link prediction has been identified as a hot area of research in previous studies, where most work has focused on graph embeddings that represent the elements in triples. In this paper, we divide existing research into three types. The first type is decomposition-based methods, including DistMult, SimplE, and ComplEx. These methods map entities and relationships to a unified and continuous vector space, which can capture more semantic features, and then formulate a scoring function to verify the model's validity. However, these early methods only consider each element and ignore the connections between elements, which prevents them from capturing deep semantic features. The second type is Translational Vector Methods, such as TransE, TransH, and HAKE. These methods are inspired by the translation invariance of word2vec and define a scoring function to measure the plausibility of triples. Upon examining the results of these models, we observed that despite the introduction of numerous new methods, the overall experimental outcomes have not shown significant improvement. The third type is Deep Learning Methods, such as ConvE, R-GCN [26], cacGCN-W [27], and DTAE [28]. These models use the characteristics of different kinds of neural networks to extract semantic features from the spatiotemporal and structural information of the data.

2.2. Ensemble Learning in Link Prediction

Ensemble learning is essential in machine learning research because it combines different models to solve the problem of a single model’s poor prediction and improve the model’s accuracy. Several studies have utilized ensemble learning techniques to achieve improved results in link prediction. For instance, Ref. [29] employs multiple ensemble learning methods to tackle link prediction and uses a voting mechanism to aggregate the final predictive outcomes. Similarly, Ref. [30] utilizes random forest-based recursive feature elimination and constructs a two-level ensemble model designed explicitly for link prediction tasks. Furthermore, Ref. [31] applies ensemble learning techniques to perform link prediction on large-scale biomedical knowledge graphs. It is worth noting that existing ensemble learning methods, such as Refs. [21,22,23], primarily integrate various base models at the decision-making layer, overlooking the fusion of data and feature layers. In contrast, EMGE places a stronger emphasis on multi-level information fusion. EMGE enhances the model’s overall performance by considering information integration across multiple layers and improves link prediction accuracy.

2.3. Contrastive Learning in Graph Embeddings

By comparing data with both positive and negative samples, contrastive learning is a potent technique for extracting representations and learning meaningful feature representations. There are related applications in natural language processing as well. For instance, Ref. [32] introduces contrastive learning into sentence embeddings, achieving impressive results in unsupervised and supervised semantic similarity tasks. By leveraging contrastive learning, Ref. [33] develops a schema-augmented multi-level approach for knowledge graph link prediction. Ref. [34] applies contrastive learning to extract rich information from multiple perspectives and constructs a line graph for conducting link prediction. Similarly, Ref. [35] employs contrastive learning to enrich node information through a shared encoder to facilitate link prediction. The utilization of contrastive learning in these studies demonstrates its effectiveness in enhancing representation learning and improving link prediction tasks in various domains.
The EMGE model focuses more on integrating the multi-source information and improving the semantic expression ability of graph nodes. In contrast to previous work, we propose a novel framework that leverages the power of graph neural networks, ensemble learning, and contrastive learning to improve the semantic representation of graph representation learning.

3. Method

Firstly, this section provides a clear definition of the link prediction task to establish a common understanding of the problem. Secondly, we delve into the framework of the EMGE model, which encompasses the methods, techniques, and algorithms intended to address the link prediction challenge effectively. By introducing these ideas and modeling frameworks, this paper aims to present a comprehensive and robust strategy for efficiently and effectively solving the link prediction problem.

3.1. Preliminary

This paper adopts a widely accepted definition, where a graph is defined as $\mathrm{Graph} = \{V, E, T\}$. Here, V represents a set of entities (nodes), E represents a set of relations (edges), and T represents the set of triple facts. A triple fact can be defined as $s = (\mathrm{head}(h), \mathrm{relation}(r), \mathrm{tail}(t))$, where $\mathrm{head}(h)$ and $\mathrm{tail}(t)$ are entities belonging to V, and $\mathrm{relation}(r)$ is a relation from E.
The link prediction task focuses on predicting the tail entity t given $(h, r, ?)$ or predicting the head entity h given $(?, r, t)$, where "?" denotes the missing information. For example, given h = "USA" and r = "capital of", the link prediction task entails predicting the missing tail entity t. In this example, the possible candidates for t could be "Washington" or "New York". The underlying objective is to build models using existing fact triples to make predictions and inferences about missing information, thereby reducing the fragmentation and incompleteness within the graph.
The EMGE model adopts a question-and-answer-based approach for the link prediction task. It considers the head entity and the relation as questions and the tail entity as answers and, conversely, predicts the head entity given the relation and the tail entity. By treating the task as a question–answer matching problem, the EMGE model aims to improve the semantic representation of the node information in the graph network from multiple perspectives. Finally, we validate the effectiveness of the model through experiments.

3.2. Model Framework

This part provides a comprehensive overview of the EMGE model for link prediction, which encompasses three stages: the input stage, the encoding stage, and the decoding stage. Figure 2 illustrates the structural framework and core ideas of the EMGE model. It enhances graph representation learning by applying ensemble learning and the node text description information of graphs. The input stage outlines the composition of the data, identifying the entities and relationships that make up the knowledge graph. This stage ensures that the data are properly prepared for subsequent processing. The encoding stage focuses on embedding the representations of the entities and relations. Specific techniques are used to effectively capture the semantic features of the entities and relationships within the knowledge graph. This embedding process lays the foundation for the subsequent analysis and prediction. The decoding stage includes capturing relevant features and training the score function, which is essential for predicting actual entities in the link prediction task. The EMGE model uses concepts from contrastive learning and ensemble learning to improve its semantic representability. These methodologies enhance the holistic comprehension of the entity relationships, consequently augmenting the precision and efficacy of the link prediction. By combining these three stages, our model provides a robust framework for link prediction. It embeds entity and relationship representations effectively, captures relevant features, and uses advanced learning techniques to make accurate predictions. So, we can define the EMGE model as:
$EMGE = \{A, B, C\}$
where A represents the input stage, B represents the encoding stage, and C represents the decoding stage.

3.2.1. Input Stage

The initial phase of this framework is primarily concerned with the composition of the dataset. This stage requires two different types of data as multi-source information inputs: one is essential for constructing the graph, while the other provides textual descriptions for the nodes within the graph data. To create the graph structure, the model organizes the triples and uses coding techniques to transform them into a coherent and comprehensive graph. It is worth noting that previous research has predominantly focused on exploiting only the node information of the graph, lacking the semantic properties necessary to elucidate the intricate relationships between nodes effectively. Consequently, the representation only captures the topological aspects of the graph.
In the case of textual description data, we carefully filter the existing data and focus only on the relevant information that meets the objectives. Figure 3 highlights that triple information is a condensed form of unstructured information, resulting in the loss of certain semantic features, and the mapping process of graph-structured information further aggravates the scarcity of semantic information. Introducing textual description information enhances the semantic and contextual information of the nodes. Furthermore, we find that longer text descriptions are not necessarily advantageous: long descriptions not only consume significant computational resources but also undergo considerable compression during vector transformations, ultimately reducing the accurate semantic information encapsulated within the vector. Through careful observation, we find that the opening sentence of a textual description tends to be the most informative and revealing. Consequently, this paper implements a filtering mechanism to extract and retain only the most salient textual information. The input stage can be formulated as:
$A = f(T, G, I)$
where f represents the data preprocessing function, T represents the set of triples $(h, r, t)$, G represents the graph, and I represents the text description.
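As a concrete illustration of this input stage, the following is a minimal Python sketch that loads triples, builds adjacency lists, and keeps only the opening sentence of each entity description. The file format, function names, and dictionary layout are illustrative assumptions, not the authors' released preprocessing code.

```python
# A minimal sketch of the input stage f(T, G, I); file names and data
# layout are hypothetical placeholders.
from collections import defaultdict

def load_triples(path):
    """Read tab-separated (head, relation, tail) triples."""
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            h, r, t = line.rstrip("\n").split("\t")
            triples.append((h, r, t))
    return triples

def build_graph(triples):
    """Adjacency lists keyed by tail node: t -> list of (head, relation) neighbours."""
    graph = defaultdict(list)
    for h, r, t in triples:
        graph[t].append((h, r))
    return graph

def filter_descriptions(raw_descriptions):
    """Keep only the opening sentence of each entity description,
    following the paper's observation that it is the most informative."""
    return {e: text.split(".")[0].strip() for e, text in raw_descriptions.items()}

# Usage (inputs are illustrative):
# triples = load_triples("train.txt")
# graph = build_graph(triples)
# descs = filter_descriptions({"China": "China is a country in East Asia. It ..."})
```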

3.2.2. Encoding Stage

The encoding stage primarily clarifies the methodology used to embed the representation of entities and relationships. At this crucial stage, this paper adopts the concept of ensemble learning, where it constructs two different methods for representing entities and relationships. These methods provide a multifaceted and semantic representation of entities and relationships, capturing different perspectives and enhancing the overall understanding of the data. The encoding stage can be formulated as:
$B = g(A)$
where g represents the encoding part.
A. Graph Embedding
Graph embedding is a powerful technique for preserving network topology and node content information within a graph by representing vertices as compact, low-dimensional vectors. The EMGE model exploits the concept of spreading activation to enhance the semantic expression of node information. Rooted in cognitive psychology, spreading activation theory explains how the brain traverses associative thought networks to retrieve specific information. In this theory, a spreading activation network can be visualized as the shortest straight line connecting two nodes in a network graph, indicating that these two nodes are more closely related and tend to establish a connection with the initial node quickly.
Inspired by spreading activation, the EMGE model establishes two rules to effectively convey the semantic essence of the nodes within the graph. Figure 4 illustrates two methods of information aggregation in graph-structured data, based on the concept of activation diffusion. These methods represent the graph nodes by incorporating neighboring nodes and edge features according to distinct rules.
  • Rule 1: The information that defines the edges and nodes directly connected to node t should be integrated into node t and updated. The figure on the left in Figure 4 shows the information integration process for Rule 1. As is well known, nodes that have only a one-hop relationship with node t are the most closely related to node t, so the information of these nodes needs to be added to node t.
  • Rule 2: It is defined that, for triples sharing the same head node h and relationship r, the information of these two is added to each node $t_i$; the nodes $t_1, t_2, t_3, \ldots, t_i, \ldots, t_n$ are then reintegrated; and the integrated information is added back to the original node $t_i$. The figure on the right in Figure 4 shows the information integration process for Rule 2.
We can obtain the preliminary graph node embedding through the above two rules. Before that, we need to understand the information propagation mechanism of graph neural networks, which can be formulated as:
$m_{hr}^{l} = M^{l}(E_{h}^{l-1}, E_{r}^{l-1})$
$m_{t}^{l} = \sum_{(h,r) \in N} m_{hr}^{l}$
$E_{t}^{l} = U^{l}(E_{t}^{l-1}, m_{t}^{l})$
where $E_h$ and $E_r$ represent the node embedding and edge embedding, N represents the collection of neighboring nodes and edges, $\sum$ represents the information aggregation, and U represents the information update.
Equations (4)–(6) show the graph neural network messaging framework. Equation (4) represents the messaging function, which means passing the node information to an edge that has a relationship with it. Equation (5) represents the reduce function, which means updating the new self-node embedding based on messages from the neighbors. Equation (6) represents the update function, triggering the messaging and reducing functions.
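To make the messaging framework of Equations (4)–(6) concrete, below is a minimal PyTorch sketch of one message-passing layer. The element-wise product used to compose head and relation embeddings and the linear update are illustrative assumptions rather than the exact operators of the EMGE model.

```python
# A minimal message-passing layer sketch for Eqs. (4)-(6); composition
# and update functions are assumptions for illustration only.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)  # update function U

    def forward(self, node_emb, rel_emb, edges):
        # edges: LongTensor [num_edges, 3] with columns (head, relation, tail)
        h, r, t = edges[:, 0], edges[:, 1], edges[:, 2]
        # message function: combine head and relation embeddings on each edge
        msg = node_emb[h] * rel_emb[r]
        # reduce function: sum the messages arriving at each tail node
        agg = torch.zeros_like(node_emb)
        agg.index_add_(0, t, msg)
        # update function: new node state from old state and aggregated messages
        return self.update(torch.cat([node_emb, agg], dim=-1))

# layer = MessagePassingLayer(dim=200)
# new_node_emb = layer(node_emb, rel_emb, edges)
```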
Rule 1 describes the representation of node t as the joint result of node h, which has a one-hop relationship with t, and the relationship r between the two nodes. Following the information propagation mechanism of the graph neural network, the calculation for Rule 1 can be formulated as:
$\alpha_{hr}^{l} = \dfrac{\exp(E_{h}^{l-1} \cdot E_{r}^{l-1})}{\sum_{(h,r) \in N} \exp(E_{h} \cdot E_{r})}$
$m_{t}^{l} = \sum_{(h,r) \in N} \alpha_{hr} E_{r}^{l-1}$
$E_{t}^{l} = U^{l}(m_{t}^{l} W)$
$E_{rule1} = \sigma(E_{t}^{l})$
where $\sigma$ is the non-linear activation function and W is a matrix of weight parameters.
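The following is a minimal PyTorch sketch of this Rule 1 aggregation: a per-tail softmax over incoming (head, relation) pairs, a weighted sum of relation embeddings, and a linear update with a non-linear activation. The numerical stabilization, the choice of tanh, and the weight-matrix shape are illustrative assumptions.

```python
# A minimal sketch of Rule 1 aggregation; details beyond the formulas
# above (activation choice, stabilisation) are assumptions.
import torch

def rule1_aggregate(node_emb, rel_emb, edges, W, act=torch.tanh):
    # edges: LongTensor [num_edges, 3] with columns (head, relation, tail)
    h, r, t = edges[:, 0], edges[:, 1], edges[:, 2]
    # per-edge score E_h . E_r
    score = (node_emb[h] * rel_emb[r]).sum(dim=-1)
    score = score - score.max()                      # stabilise the exponentials
    num = score.exp()
    denom = torch.zeros(node_emb.size(0), device=num.device)
    denom.index_add_(0, t, num)                      # per-tail softmax denominator
    alpha = num / (denom[t] + 1e-12)
    # attention-weighted sum of relation embeddings for each tail node
    msg = torch.zeros_like(node_emb)
    msg.index_add_(0, t, alpha.unsqueeze(-1) * rel_emb[r])
    # linear update followed by the non-linear activation sigma
    return act(msg @ W)

# W = torch.nn.init.xavier_uniform_(torch.empty(dim, dim))
# e_rule1 = rule1_aggregate(node_emb, rel_emb, edges, W)
```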
Rule 2 describes how information transfer and aggregation are accomplished under certain conditions. According to the information propagation mechanism of the graph neural network, we define the calculation formulas based on Rule 2, which can be formulated as:
$\alpha_{hrt}^{l} = \dfrac{\exp(\delta(E_{h}^{l-1} \cdot E_{r}^{l-1}) \cdot E_{t}^{l-1})}{\sum_{(h,r,t) \in N} \exp(\delta(E_{h} \cdot E_{r}) \cdot E_{t})}$
$m_{t}^{l} = \sum_{(h,r,t) \in N} \alpha_{hrt}\, \delta(E_{r}^{l-1} \cdot E_{r}^{l-1})$
$E_{t}^{l} = U^{l}(m_{t}^{l} W)$
$E_{rule2} = \sigma(E_{t}^{l})$
We take Rule 1, Rule 2, and the aggregation of random initial nodes as the original embedding representation of the nodes, which can be formulated as:
$E = E + E_{rule1} + E_{rule2}$
where E is the initial node embedding.
B. Text Embedding
Text embedding enriches the information encapsulated within the nodes of the graph and elevates the semantic expression of these nodes, which makes it highly significant. Section 3.2.1 describes the requisite data format for our model. This study adopts the GloVe English word embeddings [36] to encode sentences. Textual information enables the model to concentrate on the information components that most strongly affect performance, thereby improving overall performance.
$E' = \mathrm{Embedding}(Text)$
where $\mathrm{Embedding}$ represents a vectorization function, $Text$ represents the textual information, and $E'$ denotes the resulting text embedding.
In this paper, we concatenate the vectorized graph structure nodes with the text vectors to form a new vector and employ an attention mechanism to extract features from it. The extracted features are added to the graph node vectors to create updated node vectors and, simultaneously, concatenated with the text vectors to generate new text vectors. By constructing a weight parameter feature matrix, the attention mechanism focuses on and balances the importance of the node data and the text information, achieving an efficient fusion of the two sources and allowing the model to attend to the aspects that most strongly affect performance. The process is formulated as:
$E_{Inter} = \mathrm{Atten}(E, E', W)$
$E' = E' \oplus E_{Inter}$
$E = E + E_{Inter}$
where W represents a hyperparameter matrix, $\mathrm{Atten}$ represents an attention function, $\oplus$ represents the concatenation of two vectors, and + represents the addition of two vectors.
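As one way to realize this fusion step, the sketch below gates a shared interaction feature from the concatenated node and text vectors, adds it to the node vectors, and concatenates it with the text vectors. The gating form and dimensions are illustrative assumptions, not the exact Atten(E, E', W) used by the authors.

```python
# A minimal fusion sketch under the assumptions stated above.
import torch
import torch.nn as nn

class NodeTextFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(2 * dim, dim, bias=False)  # weight parameter matrix W

    def forward(self, node_emb, text_emb):
        joint = torch.cat([node_emb, text_emb], dim=-1)    # concatenate node and text vectors
        gate = torch.softmax(self.W(joint), dim=-1)        # attention weights over feature dimensions
        e_inter = gate * (node_emb + text_emb)             # shared interaction features E_Inter
        node_out = node_emb + e_inter                      # updated node vectors (addition)
        text_out = torch.cat([text_emb, e_inter], dim=-1)  # new text vectors (concatenation)
        return node_out, text_out

# fusion = NodeTextFusion(dim=200)
# node_emb, text_vec = fusion(node_emb, text_emb)
```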
Recognizing the inherent variability in the length of individual entity descriptions, we ensure uniformity during training by using a carefully chosen batch size. The concatenated word vectors are then fed into a BiLSTM architecture to extract the pertinent features. Concisely, the process is defined as follows:
$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}(E, \mathrm{len}(E))$
$\overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}(E, \mathrm{len}(E))$
$h_t = \mathrm{Dropout}(\overrightarrow{h_t}, \overleftarrow{h_t})$
where $\mathrm{Dropout}$ is a function that prevents model overfitting, and the arrows denote the forward and backward passes of the BiLSTM.
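A minimal PyTorch sketch of this step is shown below; packed sequences are one common way to handle the variable description lengths, and the hidden size and dropout rate here are illustrative assumptions.

```python
# A minimal BiLSTM + Dropout sketch; hyperparameters are assumptions.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class BiLSTMEncoder(nn.Module):
    def __init__(self, dim, hidden=200, p_drop=0.3):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, num_layers=1,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(p_drop)

    def forward(self, seq_emb, lengths):
        # seq_emb: [batch, max_len, dim]; lengths: CPU int tensor of true lengths
        packed = pack_padded_sequence(seq_emb, lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        out, _ = self.lstm(packed)
        out, _ = pad_packed_sequence(out, batch_first=True)
        # out concatenates the forward and backward hidden states per step
        return self.dropout(out)
```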

3.2.3. Decoding Stage

The decoding stage elucidates the methodology for capturing essential features and training the score function, facilitating accurate query responses. This stage encompasses three integral components: Att-Maxpooling, ConvE, and the loss function. The decoding stage can be formulated as:
$C = p(B)$
where p represents the data decoding function.
A. Att-Maxpooling
Att-Maxpooling is an important part of the decoding stage, which can be defined as:
$att_{score} = \mathrm{softmax}(W_1 \tanh(h_t) W_2)\, h_t$
$F = \mathrm{maxpooling}(h_t)$
$F_{AttMaxpooling} = \mathrm{maxpooling}(F + att_{score})$
$logit_{Maxp} = \mathrm{sigmoid}(F_{AttMaxpooling} \cdot N_e)$
where $\mathrm{softmax}$ and $\mathrm{sigmoid}$ are non-linear functions, $\mathrm{maxpooling}$ is a feature extraction function, and $N_e$ is the vector representation of all the nodes.
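The sketch below illustrates one plausible reading of Att-Maxpooling in PyTorch: attention scores computed over the BiLSTM outputs, plain max-pooled features, and a pooled combination scored against all entity vectors. The shapes of W1 and W2 and the pooling axis are illustrative assumptions.

```python
# A minimal Att-Maxpooling sketch under the assumptions stated above.
import torch
import torch.nn as nn

class AttMaxpooling(nn.Module):
    def __init__(self, dim, att_dim=100):
        super().__init__()
        self.W1 = nn.Linear(dim, att_dim, bias=False)
        self.W2 = nn.Linear(att_dim, 1, bias=False)

    def forward(self, h, entity_emb):
        # h: [batch, seq_len, dim] BiLSTM outputs; entity_emb: [num_entities, dim]
        att = torch.softmax(self.W2(torch.tanh(self.W1(h))), dim=1)  # weights over the sequence
        att_score = att * h                                          # attention-weighted features
        f = h.max(dim=1).values                                      # maxpooling(h_t)
        f_att = (f.unsqueeze(1) + att_score).max(dim=1).values       # maxpooling(F + att_score)
        logits = torch.sigmoid(f_att @ entity_emb.t())               # scores against all nodes N_e
        return f_att, logits

# pool = AttMaxpooling(dim=400)
# f_att_maxpooling, logit_maxp = pool(h, entity_emb)
```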
B. ConvE
ConvE is a convolutional neural network that primarily maps the entity and relationship features in fact triples into a uniform vector space.
$F_{ConvE} = \mathrm{ConvE}(F_{E} \oplus F_{AttMaxpooling}, R)$
$logit_{ConvE} = \mathrm{sigmoid}(F_{ConvE} \cdot N_e)$
where R represents the relationship between two nodes and $F_{E}$ denotes the entity features.
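For reference, below is a minimal ConvE-style decoder sketch. The 2D reshape sizes, channel count, and kernel size follow the general design of the original ConvE paper and are illustrative assumptions, not the exact settings used in EMGE.

```python
# A minimal ConvE-style scorer sketch; architectural details are assumptions.
import torch
import torch.nn as nn

class ConvEScorer(nn.Module):
    def __init__(self, dim=200, h=10, w=20, channels=32):
        super().__init__()
        assert h * w == dim
        self.h, self.w = h, w
        self.conv = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.fc = nn.Linear(channels * 2 * h * w, dim)

    def forward(self, ent_feat, rel_emb, all_entities):
        # Stack the (fused) entity features and the relation embedding as a 2D "image"
        x = torch.cat([ent_feat.view(-1, 1, self.h, self.w),
                       rel_emb.view(-1, 1, self.h, self.w)], dim=2)   # [B, 1, 2h, w]
        x = torch.relu(self.conv(x)).flatten(1)                       # convolution + flatten
        f_conve = torch.relu(self.fc(x))                              # project back to entity space
        return torch.sigmoid(f_conve @ all_entities.t())              # logits over all entities N_e

# scorer = ConvEScorer(dim=200)
# logit_conve = scorer(ent_feat, rel_emb, entity_emb)
```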
C. Loss Function
The loss function plays a crucial role in model learning. To this end, we formulate a composite loss function, primarily comprising a contrastive loss and a binary cross-entropy loss, to evaluate the alignment between $(h, r)$ and the prospective answer t. This paper proposes a loss function based on contrastive learning, which compares an example with a semantically similar example (positive example) or with one that is semantically different (negative example). The aim is to devise the model architecture and contrastive loss so that the representations of semantically similar instances are drawn closer in the feature space while those of semantically dissimilar instances are pushed apart, thereby achieving a clustering effect. Contrastive learning is often used for unsupervised learning, which can be defined as:
$L_q = -\log \dfrac{\exp(q \cdot k_{+} / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_{i} / \tau)}$
where q is the feature vector of a sample, $k_{+}$ is a positive sample, $k_i$ ranges over the keys, K represents the negative samples, and $\tau$ is a temperature hyperparameter.
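The unsupervised contrastive loss above is the standard InfoNCE form; a minimal PyTorch implementation is sketched below, with the temperature value chosen only for illustration.

```python
# A minimal InfoNCE sketch: one query q, its positive key, and negative keys.
import torch

def info_nce(q, k_pos, k_negs, tau=0.1):
    # q: [dim], k_pos: [dim], k_negs: [num_neg, dim]
    pos = (q * k_pos).sum() / tau                    # q . k+ / tau
    negs = (k_negs @ q) / tau                        # q . k_i / tau for each negative
    logits = torch.cat([pos.unsqueeze(0), negs])     # positive first, then negatives
    # negative log of the softmax probability assigned to the positive key
    return -torch.log_softmax(logits, dim=0)[0]

# loss = info_nce(q, k_pos, k_negs)
```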
Inspired by unsupervised contrastive learning, this study presents a supervised contrastive learning loss function. Its formulation entails four key steps: feature normalization, predicted value generation, actual value generation, and loss function construction. We first introduce the normalization and the construction of the predicted values; note that the predicted values constructed here are computed within each batch, which can be described as:
$Z_{ConvE} = \mathrm{Norm}(F_{ConvE})$
$Z_{AttMaxpooling} = \mathrm{Norm}(F_{AttMaxpooling})$
$\theta_{ConvE} = (Z_{ConvE} \cdot Z_{ConvE}^{T}) / \tau$
$\theta_{Maxp} = (Z_{AttMaxpooling} \cdot Z_{AttMaxpooling}^{T}) / \tau$
where $\tau$ is a temperature hyperparameter and $\mathrm{Norm}$ is a normalization function.
Then, we introduce building actual values, which can be defined as:
$p_{ConvE}^{true} = \mathrm{Equ}(\theta_{ConvE}, \theta_{ConvE}^{T}) / \mathrm{sum}(\mathrm{Equ}(\theta_{ConvE}, \theta_{ConvE}^{T}))$
$p_{Maxp}^{true} = \mathrm{Equ}(\theta_{Maxp}, \theta_{Maxp}^{T}) / \mathrm{sum}(\mathrm{Equ}(\theta_{Maxp}, \theta_{Maxp}^{T}))$
where $\mathrm{Equ}$ determines whether two vectors are equal.
Finally, the contrastive learning loss function can be defined as:
$loss_{ConvE}^{CL} = -w \log \dfrac{\exp(\theta_{ConvE}, p_{ConvE}^{true})}{\sum_{c=1}^{C} \exp(\theta_{ConvE,c})}$
$loss_{Maxp}^{CL} = -w \log \dfrac{\exp(\theta_{Maxp}, p_{Maxp}^{true})}{\sum_{c=1}^{C} \exp(\theta_{Maxp,c})}$
where w is the weight parameter, C is the number of classes, and CL denotes the contrastive learning loss.
Next, the joint loss function combined with contrastive learning loss is introduced, which can be defined as:
$loss_{ConvE}^{BCE} = -w \left[ y_{label} \log(logit_{ConvE}) + (1 - y_{label}) \log(1 - logit_{ConvE}) \right]$
$loss_{Maxp}^{BCE} = -w \left[ y_{label} \log(logit_{Maxp}) + (1 - y_{label}) \log(1 - logit_{Maxp}) \right]$
$loss = \alpha \left( \dfrac{\beta\, loss_{Maxp}^{BCE}\, loss_{Maxp}^{CL}}{loss_{Maxp}^{BCE} + loss_{Maxp}^{CL}} \right) + (1 - \alpha) \left( \dfrac{\lambda\, loss_{ConvE}^{BCE}\, loss_{ConvE}^{CL}}{loss_{ConvE}^{BCE} + loss_{ConvE}^{CL}} \right)$
where BCE is the binary cross-entropy loss, $y_{label}$ is the label, and $\alpha$, $\beta$, and $\lambda$ are coefficient factors that can be adjusted based on the experimental results.
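A minimal sketch of how the two BCE terms and the two contrastive terms could be combined under the joint loss above is given below; the default values of alpha, beta, lambda, and w are placeholders, not the tuned coefficients of the paper.

```python
# A minimal joint-loss sketch; coefficient values are illustrative assumptions.
import torch.nn.functional as F

def joint_loss(logit_maxp, logit_conve, loss_maxp_cl, loss_conve_cl,
               y_label, alpha=0.5, beta=1.0, lam=1.0, w=1.0):
    # Weighted binary cross-entropy for each decoding branch
    bce_maxp = w * F.binary_cross_entropy(logit_maxp, y_label)
    bce_conve = w * F.binary_cross_entropy(logit_conve, y_label)
    # Harmonic-style combination of the BCE and contrastive terms per branch
    maxp_term = beta * bce_maxp * loss_maxp_cl / (bce_maxp + loss_maxp_cl)
    conve_term = lam * bce_conve * loss_conve_cl / (bce_conve + loss_conve_cl)
    return alpha * maxp_term + (1 - alpha) * conve_term
```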
To better understand the model framework, the training process is shown in pseudocode: Algorithm 1 presents the overall EMGE framework, and Algorithm 2 presents the contrastive learning loss.
Algorithm 1: EMGE Framework
Input: E, R, Text
Output: Score
01: for i in epoch:
02:        E_rule1 ← Rule1(E, R)
03:        E_rule2 ← Rule2(E, R)
04:        E ← E + E_rule1 + E_rule2
05:        E' ← Embedding(Text)
06:        E_Inter ← Atten(E, E', W)
07:        E' ← E' ⊕ E_Inter
08:        E ← E + E_Inter
09:        h ← BiLSTM(E, len(E))
10:        h ← Dropout(h)
11:        att_score ← Attention(h)
12:        F ← Maxpooling(h)
13:        F_AttMaxpooling ← Maxpooling(F + att_score)
14:        logit_Maxp ← sigmoid(F_AttMaxpooling · N_e)
15:        F_ConvE ← ConvE(F_E ⊕ F_AttMaxpooling, R)
16:        logit_ConvE ← sigmoid(F_ConvE · N_e)
17:        loss ← loss(F_AttMaxpooling, logit_Maxp, F_ConvE, logit_ConvE, y_label)
Algorithm 2: Contrastive Learning Loss
Input: (F_AttMaxpooling, y_label) or (F_ConvE, y_label)
Output: loss_ConvE_CL or loss_Maxp_CL
01: Z ← Norm(F)                       # normalizing
02: θ ← (Z · Z^T) / τ                 # build predicted values
03: # build actual values
04: temp ← y_label.type_as(θ)
05: x ← temp.unsqueeze(−1)
06: y ← temp^T.unsqueeze(0)
07: m ← initialize a matrix
08: for i = 0; i < len(temp); i++ do
09:       for j = 0; j < len(temp); j++ do
10:             if x == y:
11:                    m_ij ← True
12:             else:
13:                    m_ij ← False
14:             end if
15: m ← m.all(1)
16: m ← m.type_as(θ)
17: p_true ← m / sum(m)
18: # contrastive learning loss
19: loss_CL ← −w log( exp(θ, p_true) / Σ_{c=1}^{C} exp(θ_c) )
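For readers who prefer runnable code, the following PyTorch sketch implements one plausible reading of Algorithm 2: the label matrix is built by comparing label vectors within the batch, and a cross-entropy between the predicted similarity matrix and that label distribution plays the role of the contrastive loss. The treatment of the label shape and the final reduction are assumptions.

```python
# A runnable sketch of one reading of Algorithm 2; w and tau are the
# hyperparameters from the text, set here to illustrative values.
import torch

def contrastive_loss(features, y_label, tau=0.1, w=1.0):
    # features: [batch, dim]; y_label: [batch, num_entities] label vectors
    z = torch.nn.functional.normalize(features, dim=-1)        # Step 1: normalize features
    theta = (z @ z.t()) / tau                                   # Step 2: predicted similarity matrix
    # Step 3: actual values - two batch items are positives when their label vectors match
    same = (y_label.unsqueeze(1) == y_label.unsqueeze(0)).all(dim=-1).float()
    p_true = same / same.sum()                                  # p_true = m / sum(m)
    # Step 4: cross-entropy between predicted and actual distributions
    log_prob = torch.log_softmax(theta, dim=1)
    return -w * (p_true * log_prob).sum()

# loss_cl = contrastive_loss(f_att_maxpooling, y_label)
```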

4. Experiments

This section provides a detailed analysis of the experimental findings. It begins by outlining the benchmark dataset, evaluation protocol, and experimental configuration utilized. Following this, a comparative analysis is conducted between the performance of the EMGE model and the baseline model. An ablation study is subsequently performed to assess the influence of various components on the EMGE’s performance. Finally, the effectiveness of the EMGE model is reinforced through a case study analysis.

4.1. Datasets

Our experiment utilizes two publicly available benchmark datasets: FB15k-237 and WN18RR. FB15k-237 is a refined subset of FB15k that improves and simplifies the relationships between entities in the dataset. Similarly, WN18RR is a subset of WN18 that removes duplicate and inverse relations. Additionally, this paper introduces a proprietary Chinese financial-domain dataset (CFD), which focuses on the knowledge graph of relationships between enterprises. In the CFD dataset, the entities are the names of financial enterprises, and the relationships represent the interactions between these enterprises (such as ownership or investments), mirroring the structure of the two publicly available datasets. Notably, the FB15k-237, WN18RR, and CFD datasets are constructed so that each triple $(h, r, t)$ in the validation and test sets has no direct connection in the training set, making the task more challenging. A comprehensive overview of the statistics of these three datasets is presented in Table 1. The model learning process exclusively utilizes the training set, while the remaining two sets are employed for model verification and evaluation. Notably, a well-trained model should accurately predict the tail entity even for previously unseen data, and this setup significantly tests the model's predictive ability on novel data instances.

4.2. Evaluation Protocols

This paper uses evaluation metrics widely adopted in the link prediction task to demonstrate the effectiveness of the EMGE model: the mean reciprocal rank (MRR) of the correct entities, the mean rank (MR) of the correct entities, and the proportion of valid triples ranked within the top N (Hits@N, where N = 1, 3, and 10). Lower MR values and higher MRR and Hits@N values indicate better performance. These metrics collectively provide valuable insights into the model's ability to accurately predict links and rank correct triples. PROCRUSTES [37], SE-GNN, NePTuNe [38], SAttLE, HIE [39], SEA [40], SHGNet [41], CoDLR [42], and DTAE act as baselines for the public datasets. Additionally, we illustrate the differences between the baseline models and EMGE in Figure 5.
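These metrics follow directly from the rank of the correct entity for each test query; a minimal sketch of their computation is given below, with the example ranks chosen only for illustration.

```python
# A minimal sketch of MR, MRR, and Hits@N from per-query ranks of the
# correct entity (rank 1 = correct entity scored highest).
import numpy as np

def ranking_metrics(ranks, ns=(1, 3, 10)):
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MR": ranks.mean(), "MRR": (1.0 / ranks).mean()}
    for n in ns:
        metrics[f"Hits@{n}"] = (ranks <= n).mean()
    return metrics

# Example (illustrative ranks): ranking_metrics([1, 4, 2, 120]) ->
# {'MR': 31.75, 'MRR': ~0.44, 'Hits@1': 0.25, 'Hits@3': 0.5, 'Hits@10': 0.75}
```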

4.3. Settings

To better represent the entity’s text, this paper uses GloVe English word embedding. The number of the epoch is 200. The number of hidden units for the BiLSTM is 200. And the layer of the BiLSTM is 1. The temperature τ is 0.1. The Adam optimizer is adopted. This paper adopts the method of dynamically adjusting the learning rate to speed up the model training. To improve the computational efficiency, we transform the non-directionality of the graphical data into bi-directionality in data processing. It defines that ( h , r , ? ) t and ( ? , r , t ) h transform them as ( h , r , ? ) t and ( t , r , ? ) h . In addition, during the training process, this study randomly removes a certain proportion of edges in the graph to enhance the robustness of the model. It not only can prevent unknown information from being exposed in advance, but the absence of edges is closer to the real scene. This model is implemented using PyTorch (1.9) on an NVIDIA RTX 3090 Ti GPU.

4.4. Results

In this part, we present and analyze the experimental results on the CFD dataset as well as on the publicly available WN18RR and FB15k-237 datasets. Recent research models are selected as baselines to compare the performance differences between models. The specific experimental results are illustrated in Figure 6 and Figure 7.
By observing the results of the experiment, the following can be found:
  • The EMGE model outperforms the baseline models on seven out of ten evaluation metrics across the WN18RR and FB15k-237 datasets. Notably, the MR on WN18RR drops to approximately 0.27 times the value observed for the baseline model, and EMGE also achieves commendable results on the Hits@10 metric. These findings validate the model's ability to handle diverse data types. The challenge of the WN18RR dataset lies in its abundance of entities but limited relationship types, exacerbated by the scarcity of training data; the FB15k-237 dataset, in contrast, has relatively few entities but intricate relationships and imbalanced data, which intensifies the training complexity. In addition, as Table 1 shows, WN18RR has a smaller training volume and a larger number of entities than FB15k-237, which also affects the performance of the EMGE model; this situation, however, is common across the other models as well. The EMGE model tackles these dataset variations by leveraging multi-source information to enhance the semantic understanding of the nodes within the graph structure, thereby bolstering its predictive ability, and the experimental results further substantiate its efficacy. SE-GNN, which also aggregates node information, shows a clear performance gap relative to EMGE, suggesting that our information aggregation method is superior. Likewise, CoDLR, which also incorporates textual descriptions, lags behind EMGE, suggesting that its feature extraction is not as strong as our approach.
  • As the CFD dataset focuses on inter-firm linkages and is privately curated, it does not have a comparable baseline model like the publicly available datasets for comparison. Therefore, this paper selects ConvE, SE-GNN, and CompGCN as the benchmark models. The experimental results indicate that our model ranks second in terms of the MR metric, closely following the top-ranked model, and outperforms the other models in four additional metrics. Given the small scale of the CFD dataset, with relatively fewer entities than the other two datasets, effectively handling small-sample data becomes a test of the model’s generalizability. The experimental results demonstrate that our model excels in handling Chinese datasets, showcasing outstanding performance even in scenarios with limited samples and exhibiting stable experimental results. This further validates the effectiveness of leveraging multi-source information and ensemble learning in enhancing the predictive capabilities of the EMGE model.

4.5. Ablation Study

To assess the impact of each critical component, this paper performs ablation experiments that remove one or more components in sequence. We conducted seven sets of experiments, each removing different components to demonstrate their impact on the model: (1) remove the text description information; (2) remove the text description information and the contrastive loss function; (3) remove node aggregation Rule 1 and the text description information; (4) remove node aggregation Rule 1, the text description information, and the contrastive loss function; (5) remove node aggregation Rule 2 and the text description information; (6) remove node aggregation Rule 2, the text description information, and the contrastive loss function; (7) remove node aggregation Rules 1 and 2, the text description information, and the contrastive loss function.
Figure 7 and Figure 8 show the results for CFD, WN18RR, and FB15k-237. The analysis reveals a consistent decrease in the overall performance of the EMGE model after the removal of each key component, underscoring the crucial role that each element plays in the model's effectiveness: each part has a limited effect on its own, but the combined model performs better. Removing individual parts has a relatively stable effect on WN18RR and FB15k-237 but a more significant impact on CFD. Removing the text information results in a significant decrease in overall performance, highlighting the importance of the text description information; its absence also means the model cannot use ensemble learning for data, feature, and decision-level fusion, further reducing performance. Additionally, the ablation experiments demonstrate that the model's effectiveness does not rely on any single module but rather on the collective contribution of all modules working together.

4.6. Case Study

We employ a detailed case study to highlight the effectiveness of the EMGE model and its capability to predict unseen data. This paper selects specific cases from the FB15k-237 test set and presents the ranking results in Figure 9 for the EMGE, EMGE (w/o Rule 1 and text), SE-GNN, and ConvE models. Through these representative data, we observe how the different models perform in predicting unknown instances.
For example, consider the tail entity prediction (Silkwood, award, ?) → BAFTA Award for Best Actress in a Leading Role. This particular triplet only appears in the test set, further substantiating the predictive power of the EMGE model for previously unseen data. Additionally, for the head entity prediction (?, film performance, Screen Actors Guild Award for Outstanding), we find five candidate entities that can potentially serve as the head entity of this set of triples. To further establish the predictive ability of our model on unseen data, the average ranking method is used to compare the prediction performance of the different models. As shown in Figure 9, the EMGE model consistently exhibits a more prominent average ranking than the other models, corroborating its superior predictive capabilities.
Furthermore, to corroborate the experimental outcomes, 100 data points are randomly selected from the FB15k-237 dataset. The data analysis results are depicted in Figure 9, and the analysis of the experimental results follows.
  • In Figure 9, this paper compares the rankings produced by the four models and divides the ranking values into four intervals ($[1, 3)$, $[3, 5)$, $[5, 10]$, and $(10, 10+)$). The comparison shows that the EMGE algorithm performs better, particularly in the $[5, 10]$ interval, consistent with the experimental results in Figure 7. At the same time, in the $(10, 10+)$ interval, the EMGE algorithm has the fewest cases, indicating that the stability, robustness, and generalization ability of the algorithm are better.
  • In the second half of Figure 9, based on the results of the 100 random data experiments, four pie charts show the overall effect of the different models. The analysis shows that the EMGE model accounts for the smallest proportion in the interval $(10, 10+)$, further proving that the EMGE model can effectively improve the accuracy of link prediction and the overall data ranking.
By setting different experimental conditions, the experimental results demonstrate the application of the concept of ensemble learning to discover entities and relationships by combining textual and graph structural data. We employ various neural networks to capture diverse features and leverage a range of scoring functions to comprehensively evaluate prediction accuracy. Additionally, we introduce contrastive learning to construct loss functions, thus amplifying the differences between various samples.

5. Conclusions

In this paper, we introduce a novel link prediction model, EMGE, to address the challenges of knowledge fragmentation and hidden knowledge extraction in graph-structured data. It uses both structured graph information and unstructured textual data as multi-source information inputs, makes use of an attention mechanism to fuse the multi-source information, and adopts the idea of ensemble learning to extract semantic features from different types of neural network models and facilitate information interaction. In addition, we employ the concept of contrastive learning to create a new loss function that minimizes the gap between the predicted values and the ground truth. This method achieves competitive results on the link prediction task and demonstrates good adaptability and robustness across datasets from different domains. Moreover, the EMGE model is validated on three datasets, including WN18RR, FB15k-237, and a Chinese financial dataset. The experimental results demonstrate a reduction in the mean rank (MR) by 0.2 times, an improvement in the mean reciprocal rank (MRR) by 5.9%, and an increase in Hits@1 by 12.9% compared to the baseline model.
In the future, we plan to explore more information aggregation patterns and develop a novel framework based on a GNN. We believe that our work will contribute to the advancement of link prediction and facilitate the development of intelligent systems that can reason and learn from complex knowledge graphs.

Author Contributions

C.H. and X.L. primarily handled the conceptualization and methodology; C.H. carried out the software development and implementation; X.W. conducted the experimental validation; S.X. was responsible for data collection and management; C.H. and X.W. prepared the initial draft of the manuscript; X.L. and S.X. performed the review and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China 2021YFC3300602, the Outstanding Academic Leader Project of Shanghai under the grant No. 20XD1401700, the National Natural Science Foundation of China under the grant 91746203 and 72204155, and the Natural Science Foundation of Shanghai under the grant 23ZR1423100.

Data Availability Statement

The data are contained within this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hou, R.; Zhang, Y.; Ou, Q.; Li, S.; He, Y.; Wang, H.; Zhou, Z. Recommendation Method of Power Knowledge Retrieval Based on Graph Neural Network. Electronics 2023, 12, 3922. [Google Scholar] [CrossRef]
  2. Lee, S.; Ahn, J.; Kim, N. Embedding Enhancement Method for LightGCN in Recommendation Information Systems. Electronics 2024, 13, 2282. [Google Scholar] [CrossRef]
  3. Liu, Y.; Zhang, H.; Zong, T.; Wu, J.; Dai, W. Knowledge Base Question Answering via Semantic Analysis. Electronics 2023, 12, 4224. [Google Scholar] [CrossRef]
  4. Wang, P.; Liu, J.; Zhong, X.; Zhou, S. A Cybersecurity Knowledge Graph Completion Method for Penetration Testing. Electronics 2023, 12, 1837. [Google Scholar] [CrossRef]
  5. Zhang, L.; Wang, J.; Wang, W.; Jin, Z.; Zhao, C.; Cai, Z.; Chen, H. A novel smart contract vulnerability detection method based on information graph and ensemble learning. Sensors 2022, 22, 3581. [Google Scholar] [CrossRef]
  6. Jiang, J.; Liu, F.; Liu, Y.; Tang, Q.; Wang, B.; Zhong, G.; Wang, W. A dynamic ensemble algorithm for anomaly detection in IoT imbalanced data streams. Comput. Commun. 2022, 194, 250–257. [Google Scholar] [CrossRef]
  7. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S. Dbpedia—A Large-Scale, Multilingual Knowledge Base Extracted from Wikipedia. Semant. Web 2015, 6, 167–195. [Google Scholar] [CrossRef]
  8. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 1247–1250. [Google Scholar]
  9. Ji, L.; Wang, Y.; Shi, B.; Zhang, D.; Wang, Z.; Yan, J. Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding. Data Intell. 2019, 1, 238–270. [Google Scholar] [CrossRef]
  10. Yang, B.; Yih, W.t.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv 2014, arXiv:1412.6575. [Google Scholar]
  11. Kazemi, S.M.; Poole, D. Simple Embedding for Link Prediction in Knowledge Graphs. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Red Hook, NY, USA, 3–8 December 2018; Volume 31. [Google Scholar]
  12. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33rd International Conference on Machine Learning PMLR, New York, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
  13. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-Relational Data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems 26 (NIPS 2013), Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26. [Google Scholar]
  14. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
  15. Zhang, Z.; Cai, J.; Zhang, Y.; Wang, J. Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3065–3072. [Google Scholar]
  16. Wang, L.; Luo, J.; Deng, S.; Guo, X. RoCS: Knowledge Graph Embedding Based on Joint Cosine Similarity. Electronics 2023, 13, 147. [Google Scholar] [CrossRef]
  17. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2d Knowledge Graph Embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  18. Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-Based Multi-Relational Graph Convolutional Networks. arXiv 2019, arXiv:1911.03082. [Google Scholar]
  19. Yao, L.; Mao, C.; Luo, Y. KG-BERT: BERT for Knowledge Graph Completion. arXiv 2019, arXiv:1909.03193. [Google Scholar]
  20. Baghershahi, P.; Hosseini, R.; Moradi, H. Self-Attention Presents Low-Dimensional Knowledge Graph Embeddings for Link Prediction. Knowl.-Based Syst. 2023, 260, 110124. [Google Scholar] [CrossRef]
  21. Caifang, T.; Yuan, R.; Hualei, Y.; Ling, S.; Jiamin, C.; Yutian, W. Improving Knowledge Graph Completion Using Soft Rules and Adversarial Learning. Chin. J. Electron. 2021, 30, 623–633. [Google Scholar] [CrossRef]
  22. Ju, J.; Yang, D.; Liu, J. Commonsense knowledge base completion with relational graph attention network and pre-trained language model. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 4104–4108. [Google Scholar]
  23. Wang, Y.; Xiao, W.; Tan, Z.; Zhao, X. Caps-OWKG: A capsule network model for open-world knowledge graph. Int. J. Mach. Learn. Cybern. 2021, 12, 1627–1637. [Google Scholar] [CrossRef]
  24. Li, R.; Cao, Y.; Zhu, Q.; Bi, G.; Fang, F.; Liu, Y.; Li, Q. How Does Knowledge Graph Embedding Extrapolate to Unseen Data: A Semantic Evidence View. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 22 February–1 March 2022; Volume 36, pp. 5781–5791. [Google Scholar]
  25. Wang, P.; Xie, X.; Wang, X.; Zhang, N. Reasoning through memorization: Nearest neighbor knowledge graph embeddings. In Proceedings of the 12th National CCF Conference, NLPCC 2023, Foshan, China, 12–15 October 2023; Springer: Cham, Switzerland, 2023; pp. 111–122. [Google Scholar]
  26. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of the Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018; Springer: Cham, Switzerland, 2018; pp. 593–607. [Google Scholar]
  27. Niu, H.; He, H.; Feng, J.; Nie, J.; Zhang, Y.; Ren, J. Knowledge Graph Completion Based on GCN of Multi-Information Fusion and High-Dimensional Structure Analysis Weight. Chin. J. Electron. 2022, 31, 387–396. [Google Scholar] [CrossRef]
  28. Deng, W.; Zhang, Y.; Yu, H.; Li, H. Knowledge graph embedding based on dynamic adaptive atrous convolution and attention mechanism for link prediction. Inf. Process. Manag. 2024, 61, 103642. [Google Scholar] [CrossRef]
  29. Kim, D.P.T. Improve the Accuracy of Link Predictions on Sparse Networks Based on Similarity Measures and Multiple Ensemble Learning. J. Inf. Hiding Multim. Signal Process 2020, 11, 151–161. [Google Scholar]
  30. Wang, T.; Jiao, M.; Wang, X. Link Prediction in Complex Networks Using Recursive Feature Elimination and Stacking Ensemble Learning. Entropy 2022, 24, 1124. [Google Scholar] [CrossRef]
  31. Prabhakar, V.; Vu, C.; Crawford, J.; Waite, J.; Liu, K. An Ensemble Learning Approach to Perform Link Prediction on Large Scale Biomedical Knowledge Graphs for Drug Repurposing and Discovery. bioRxiv 2023, 19, 533306. [Google Scholar]
  32. Gao, T.; Yao, X.; Chen, D. Simcse: Simple Contrastive Learning of Sentence Embeddings. arXiv 2021, arXiv:2104.08821. [Google Scholar]
  33. Peng, M.; Liu, B.; Xie, Q.; Xu, W.; Wang, H.; Peng, M. SMiLE: Schema-Augmented Multi-Level Contrastive Learning for Knowledge Graph Link Prediction. arXiv 2022, arXiv:2210.04870. [Google Scholar]
  34. Zhang, Z.; Sun, S.; Ma, G.; Zhong, C. Line Graph Contrastive Learning for Link Prediction. Pattern Recognit. 2023, 140, 109537. [Google Scholar] [CrossRef]
  35. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Deep Graph Contrastive Representation Learning. arXiv 2020, arXiv:2006.04131. [Google Scholar]
  36. Shi, T.; Liu, Z. Linking GloVe with Word2vec. arXiv 2014, arXiv:1411.5595. [Google Scholar]
  37. Peng, X.; Chen, G.; Lin, C.; Stevenson, M. Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis. arXiv 2021, arXiv:2104.04676. [Google Scholar]
  38. Sonkar, S.; Katiyar, A.; Baraniuk, R. NePTuNe: Neural Powered Tucker Networkfor Knowledge Graph Completion. In Proceedings of the 10th International Joint Conference on Knowledge Graphs, Virtual Event, 6–8 December 2021; pp. 177–180. [Google Scholar]
  39. Liu, J.; Chen, J.; Fan, C.; Zhou, F. Joint Embedding in Hierarchical Distance and Semantic Representation Learning for Link Prediction. arXiv 2023, arXiv:2303.15655. [Google Scholar]
  40. Gregucci, C.; Nayyeri, M.; Hernández, D.; Staab, S. Link Prediction with Attention Applied on Multiple Knowledge Graph Embedding Models. arXiv 2023, arXiv:2302.06229. [Google Scholar]
  41. Li, Z.; Zhang, Q.; Zhu, F.; Li, D.; Zheng, C.; Zhang, Y. Knowledge graph representation learning with simplifying hierarchical feature propagation. Inf. Process. Manag. 2023, 60, 103348. [Google Scholar] [CrossRef]
  42. Wang, J.; Qiu, D.; Liu, Y.; Wang, Y.; Chen, C.; Zheng, Z.; Zhou, Y. Contextual Dictionary Lookup for Knowledge Graph Completion. arXiv 2023, arXiv:2306.07719. [Google Scholar]
Figure 1. Graph representation learning typically relies on the topological information of the graph to find missing information in triples. For instance, the triple (Chen Long, live in, ?) can be formed as (Chen Long, live in, China) using the topological information, but the accuracy of this new triple is uncertain. To enhance the authenticity of the new triple information, semantic information from the nodes themselves can be incorporated into the graph representation learning. For example, the textual descriptive information of the Chinese Badminton Association contains crucial information indicating its location in China.
Figure 2. The framework of Multi-Source Information Graph Embedding with Ensemble Learning for Link Prediction. The model leverages textual descriptions and node information from the graph as multi-source data, integrating them to enhance the graph representation. Additionally, it adopts an ensemble learning approach to enhance the accuracy of link prediction.
Figure 3. The graph node information mapping process compresses the node semantic features, which can lead to a scarcity of semantic information for the graph nodes. To address this issue, textual description information is added to enhance the semantic expression and contextual information of the nodes.
Figure 4. Two forms of node semantic representation, using two different information aggregation methods.
Figure 5. The key differences between EMGE and other state-of-the-art models [20,24,28,37,38,39,40,41,42].
Figure 6. Results for CFD dataset. Bolded fonts represent the best results.
Figure 7. Results for WN18RR dataset and FB15k-237 dataset. Bolded fonts represent the best results [20,24,28,37,38,39,40,41,42].
Figure 8. Ablation study for WN18RR and FB15k-237. Bolded fonts represent the best results.
Figure 9. Case study of FB15k-237 test set.
Table 1. Statistics of three datasets.
Dataset      Entity    Relation   Train      Valid     Test
CFD           5934         8       14,869     1316       929
FB15k-237    14,541       237     272,115    17,535    20,446
WN18RR       40,943        11      86,835     3034      3134