Article

Enhancing In-Context Learning of Large Language Models for Knowledge Graph Reasoning via Rule-and-Reinforce Selected Triples

College of Information Engineering, Capital Normal University, Beijing 100048, China
Appl. Sci. 2025, 15(3), 1088; https://doi.org/10.3390/app15031088
Submission received: 6 November 2024 / Revised: 22 December 2024 / Accepted: 16 January 2025 / Published: 22 January 2025

Abstract

Knowledge graph (KG) reasoning aims to obtain new knowledge based on existing data. Utilizing large language models (LLMs) through in-context learning for KG reasoning has become a significant research direction. However, existing methods mainly extract in-context triples according to manually defined standards (such as the neighbors directly linked to the query triple), without considering whether these triples are actually useful for LLM reasoning. Furthermore, triples beyond the immediate neighbors can also provide important clues for reasoning. It is therefore necessary to extract in-context triples that are more useful for LLM-based KG reasoning. This paper proposes a rule-and-reinforce in-context triple extraction method to enhance the in-context learning of LLMs for KG reasoning. First, the in-context triples specific to each query triple are collected under the guidance of logical rules, and a neural extractor is pre-trained on the collected triples. Subsequently, the feedback of LLMs is collected as rewards to further optimize the extractor, where the policy gradient is utilized to encourage the extractor to explore more useful triples that yield higher rewards. Experimental results on five different knowledge graphs demonstrate that the proposed method can effectively improve the reasoning performance of LLMs. Compared to the traditional reasoning method AnyBURL, the greatest improvement is 0.147 on Hits@10 on FB15k-237.

1. Introduction

A knowledge graph is a type of data that represents real-world knowledge in the form of triples (h, r, t), where h and t denote the head entity and tail entity, respectively, and r represents the relation between the two entities [1]. Knowledge graphs contain large-scale structured factual knowledge, which can effectively improve the performance of various tasks [2,3,4,5]. Knowledge graph reasoning is the process of analyzing the existing triples of a knowledge graph to infer new knowledge [6].
Knowledge graph reasoning has been extensively studied, and existing methods are mainly categorized into two groups: (1) distributed representation-based methods, which project entities and relations into a low-dimensional dense vector space and perform reasoning by measuring the distances between vectors; (2) logical rule-based methods, which mine logical rules from knowledge graphs and apply them to existing data to find new triples. However, in practical applications, knowledge graphs are generally constructed automatically by information extraction methods, leading to the incompleteness problem [7]. Since the aforementioned traditional methods are mainly data-driven, the incompleteness of knowledge graphs limits the knowledge that can be provided by existing data, thereby constraining the reasoning effectiveness of these methods [8,9]. As illustrated in the example shown in Figure 1, given the query triple “(Eric Allin Cornell, lives in country, ?)”, a logical rule can be mined from the knowledge graph through logical rule learning: “(x, lives in country, w)←(x, works at, y)∧(y, located in state, z)∧(z, state in country, w)”. This logical rule indicates that if there exists a path consisting of the relations “works at, located in state, state in country” between entities x and w, then we can infer that the relation “lives in country” holds between x and w. However, as shown in Figure 1c, due to the incompleteness problem, the relation “state in country” (depicted with the dashed line) between the entities “Colorado” and “USA” does not exist in the existing triples of the knowledge graph. Therefore, the path represented by the aforementioned logical rule cannot be instantiated in the existing data, ultimately leading to an incomplete reasoning process.
In recent years, the emergence of large language models (LLMs) has sparked a new wave in the field of artificial intelligence. LLMs, equipped with billions of parameters and trained on large-scale corpora, have acquired immense knowledge and demonstrated superior performance across various natural language tasks, such as machine translation [10], text analysis [11], and intelligent question answering [12]. Consequently, the knowledge embedded in LLMs may alleviate the limitations imposed by the incompleteness of knowledge graphs. As a result, leveraging LLMs for knowledge graph reasoning has emerged as a novel direction.
To better adapt LLMs to reasoning tasks, in-context learning is utilized in the reasoning process: the main idea is to retrieve relevant triples from the existing knowledge graph based on the given query triple and incorporate them into the prompt [13,14,15]. However, existing methods mainly extract in-context triples from KGs by manually defined standards (such as the neighbors that are directly linked with the given query triple), without considering whether these triples are truly useful for the reasoning of LLMs. In fact, triples directly linked to the query entity may not necessarily be effective for the reasoning of the given query, while triples not directly connected to the query entity may provide crucial reasoning clues. As depicted in Figure 1, for the given query “(Eric Allin Cornell, lives in country, ?)”, existing methods may retrieve triples directly connected to the query entity “Eric Allin Cornell” as in-context triples. Upon analysis, it becomes evident that most of these retrieved triples are irrelevant to the reasoning, and incorporating them into the context unselectively may instead introduce noise, thereby diminishing the reasoning effectiveness. In contrast, introducing triples such as “(Eric Allin Cornell, works at, University of Colorado Boulder), (University of Colorado Boulder, located in state, Colorado)” as in-context triples can provide valuable contextual knowledge, thereby enhancing the performance of LLMs on knowledge graph reasoning.
Therefore, in this paper, we propose a rule-and-reinforce triple extraction method that enhances the in-context learning of LLMs for knowledge graph reasoning. The proposed method contains two parts: (1) logical rules-guided in-context triple retrieval and extractor pre-training. We retrieve the in-context triples involved in the logical rules for each query triple as supervised training data, and train an extractor to generate in-context triples for specific query triples; (2) reinforcement learning with LLMs’ feedback as rewards. The feedback of LLMs is collected as rewards to further optimize the extractor through reinforcement learning, in which the policy gradient is utilized to encourage the extractor to explore more useful triples that yield higher rewards. Experiments on five different knowledge graphs demonstrate that the in-context triples extracted by the proposed method can effectively enhance the performance of large language models in knowledge graph reasoning.
The contributions of this paper are summarized as follows:
  • To alleviate the problem that traditional LLM-based reasoning methods fail to fully utilize the existing data of knowledge graphs, we propose a rule-and-reinforce triple extraction method that enhances the in-context learning of LLMs for knowledge graph reasoning;
  • In order to obtain more effective in-context triples, we construct an in-context triple extractor, which is designed based on the encoder-decoder architecture. The triples involved in the logical rules of the knowledge graph are utilized as supervised data for pre-training. Then, the extractor is further trained through reinforcement learning methods with the feedback from LLMs;
  • The experimental results on five different knowledge graphs indicate that the in-context triples extracted by the proposed method can effectively enhance the capabilities of LLMs in knowledge graph reasoning.

2. Related Work

Knowledge graph reasoning is one of the crucial research directions in the field of artificial intelligence. In this section, knowledge graph reasoning methods are introduced from two aspects: traditional knowledge graph reasoning methods, and pre-trained language model- and large language model-based reasoning methods.

2.1. Traditional Knowledge Graph Reasoning Methods

Traditional knowledge graph reasoning methods mainly fall into two streams: (1) distributed representation-based methods. These methods embed entities and relations into a low-dimensional continuous vector space, and the plausibility of triples can be formulated as vector computation. TransE [16] is a representative distributed representation learning model, in which the relation of a triple is modeled as a translation from the head entity to the tail entity. TransH [17], TransR [18], and TransD [19] further extend this approach to handle 1-N, N-1, and N-N relations. In order to model more complex relation patterns of knowledge graphs, some studies project the entities and relations into a complex vector space. ComplEx [20] introduces complex-valued vectors to model symmetric and antisymmetric relations. RotatE [21] models relations as rotations from head entities to tail entities in complex space.
(2) Logical rule-based methods. The main idea of logical rule-based methods is to learn the frequent patterns of knowledge graphs and represent these patterns as logical rules. The rules are then applied to the existing triples, and new triples can be obtained through rule-based reasoning. AMIE [22], AMIE+ [23], and AMIE 3 [24] mine logical rules for specific relations and prune rules with low confidence and coverage. RLvLR [25] defines new criteria for logical rules, enabling the method to scale to large knowledge graphs. NeuralLP [26] transforms logical rule learning into a differentiable process so that rule mining can be conducted by gradient-based optimization. AnyBURL [27] proposes a bottom-up rule mining method; it can learn fuzzy and uncertain rules and scales to large knowledge graphs with relatively low resource consumption.
However, these methods are mainly data-driven and rely heavily on the existing triples in the knowledge graph. The incompleteness problem of knowledge graphs may therefore impair the performance of reasoning [16,22,23].

2.2. Pre-Trained Language Model- and Large Language Model-Based Methods

Built on the self-attention mechanism, pre-trained language models are trained on large text corpora. These models have learnt vast amounts of knowledge and can be adapted to a variety of downstream tasks through simple fine-tuning, such as machine translation [10], text analysis [11], and intelligent question answering [12]. Some studies also utilize pre-trained language models to enhance knowledge graph reasoning. KG-BERT [28] represents the query triple as a textual sequence with the special token [CLS] at the front. The sequence is encoded by the pre-trained language model BERT [29], and the hidden state of [CLS] is fed into a classifier to predict the missing entity. KGT5 [30] further trains T5-small [31] on link prediction and question answering tasks in multiple formats, so that reasoning can be completed by combining the query triple with simple instructions. GenKGC [32], developed on top of BART [33], additionally incorporates demonstration triples that contain the same query relation to improve the reasoning results.
Subsequently, large language models were proposed and have brought new waves to multiple fields. LLMs are trained on very large corpora and contain hundreds of billions of parameters. Many experiments have shown that LLMs perform well on various natural language processing tasks without further task-specific training [10,11,12]. In order to make LLMs comprehend different downstream tasks, in-context learning is widely used [13]. It is a type of few-shot learning that incorporates a few examples into the prompt [15]. AutoKG [34] utilizes the LLMs GPT-4 and ChatGPT (https://openai.com/blog/chatgpt (accessed on 15 January 2025)) and adopts one-shot in-context learning for knowledge graph reasoning, extracting a single triple as the in-context example to enhance the reasoning performance. KAPING [35] first links the entities in the query to those in the knowledge graph, and the triples containing the linked entities are retrieved as in-context triples. KoPA [36] selects the triples that contain the query head entity as in-context triples to enhance the performance of triple classification. KICGPT [37] first conducts knowledge graph reasoning with the distributed representation learning method RotatE, and the results are further refined by the LLM through multiple interactions.
However, the in-context triples utilized in previous LLM-based reasoning methods are mainly retrieved around the query entities or query relations, which cannot make full use of the existing triples of knowledge graphs. Triples that are not directly connected to the query entities, as well as those that do not contain the query relations, may also provide important knowledge for reasoning. In contrast, our proposed method retrieves these useful in-context triples with the guidance of logical rules and the feedback of LLMs, and the experimental results demonstrate the effectiveness of the model.

3. Formalized Description of the Proposed Solution

Knowledge graph reasoning aims to obtain new knowledge based on the existing knowledge graph. In this paper, we formulate the knowledge graph reasoning task as link prediction. Formally, a knowledge graph is a collection of triples, i.e., G = {(h_i, r_i, t_i) | h_i, t_i ∈ E, r_i ∈ R, 1 ≤ i ≤ N}, where E and R represent the sets of entities and relations, respectively, and N is the number of triples in G. For the link prediction task, given a query triple q = (h, r, ?), where h is the query head entity (h ∈ E) and r is the query relation (r ∈ R), the ? denotes the tail entity to be predicted.
More specifically, in the proposed model, besides the query triple (h, r, ?), in-context triples are also retrieved from the knowledge graph G. We denote the set of retrieved in-context triples as I = {(h_j, r_j, t_j) | h_j, t_j ∈ E, r_j ∈ R, 1 ≤ j ≤ M}, where I ⊆ G and M ≤ N. To simplify the process of connecting the predicted results of LLMs with the entities in the knowledge graph, we also extract the potential candidate entities from the knowledge graph and denote the set of candidate entities as C.
Based on the given query triple (h, r, ?), the retrieved in-context triples I, and the candidate tail entities C, the prompt can be constructed and fed into the LLM; finally, the predicted entities are generated by the LLM.
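To make the formalization concrete, the following minimal Python sketch shows how a knowledge graph, a link-prediction query, and the retrieved in-context triples and candidate entities might be represented. The entity names and the placeholder retrieval rule are illustrative only and are not part of the proposed method.

```python
# Minimal sketch of the data structures used in the formal description.
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

# A knowledge graph G is a set of triples over entity set E and relation set R.
G = {
    Triple("Eric Allin Cornell", "works at", "University of Colorado Boulder"),
    Triple("University of Colorado Boulder", "located in state", "Colorado"),
}

# A link-prediction query (h, r, ?) with the tail entity to be predicted.
query = ("Eric Allin Cornell", "lives in country", None)

# Retrieved in-context triples I ⊆ G (here a naive neighbor lookup, as a placeholder)
# and candidate tail entities C drawn from the entities appearing in G.
I = [t for t in G if t.head == query[0]]
C = {t.tail for t in G} | {t.head for t in G}
```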

4. Methodology

In order to extract the in-context triples which provide useful clues for reasoning, an in-context triple extractor is proposed. The extractor is designed based on the encoder-decoder structure. Figure 2 depicts the overall framework of the proposed rule-and-reinforce triple extracting method. The training procedure of the extractor is composed of two main phases:
(1) Logical rules guide in-context triples retrieval and extractor pre-training (Figure 2a). In this phase, the in-context triples for each query triple are retrieved based on whether they can support the logical rules corresponding to the current reasoning. These in-context triples with corresponding query triples are collected as training data, and the extractor can be trained with the guidance of logical rules;
(2) Reinforcement learning with LLM’s feedback as rewards (Figure 2b). In this phase, the feedback of LLMs when incorporating the in-context triples generated by the pre-trained extractor is collected. The feedback is utilized as the rewards to further train the extractor through reinforcement learning.

4.1. Logical Rules-Guided In-Context Triples Retrieval and Extractor Pre-Training

The in-context triple extractor employs a Transformer-based encoder-decoder architecture. Initially, supervised data are constructed from the logical rules-guided triples to pre-train the extractor, enabling it to adapt well to the task of generating the in-context triples for a given query triple.

4.1.1. Logical Rules-Guided In-Context Triples Retrieval

Knowledge graphs contain a vast number of frequent patterns, which can be expressed in the form of logical rules. These logical rules can serve as explicit bases for reasoning. Based on this, for a given query triple, the in-context triples can be retrieved from the knowledge graph by searching for the triples that support the logical rules associated with the query triple.
Figure 2. Framework of the proposed rule-and-reinforce in-context triple extraction method.
For example, assume that the query triple is “(Zuckerberg, nationality, ?)”; then, two logical rules about “nationality” can be obtained by a logical rule learning model (logical rule learning models generally take the given knowledge graph as input and mine a series of logical rules by analyzing frequent patterns. Meanwhile, a confidence value can be measured for each logical rule to reflect the degree to which the existing knowledge graph data support it. There are various logical rule learning models, such as AMIE [22], AMIE 3 [24], and AnyBURL [27]; see Section 2.1 for more details): “(x, nationality, y)←(x, works in, z)∧(z, located in state, w)∧(w, state in country, y)” and “(x, nationality, y)←(x, born in, z)∧(z, located in state, w)∧(w, state in country, y)”. The triples that support each logical rule can then be extracted from the knowledge graph G (see more examples of this part in Appendix A). These triples can provide crucial in-context clues for LLMs to reason about the given query triple. Therefore, we can construct the training data by retrieving triples from the given knowledge graph that support the corresponding logical rules, and further pre-train the in-context triple extractor.
Formally, for a relation r_i, a logical rule learning model can be employed on the knowledge graph G to obtain the set of logical rules U_i = {u_j}, j = 1, …, |U_i|, specific to r_i, where u_j is a logical rule about the relation r_i. Based on a logical rule u_j, a set of triples that support the reasoning of (h_i, r_i, t_i) can be retrieved from the knowledge graph, denoted by I_j. Subsequently, we merge the sets of triples that support the rules in U_i, denoted as I_i = I_1 ∪ I_2 ∪ … ∪ I_|U_i|. If the number of in-context triples in I_i is greater than M, these triples are sorted in descending order according to the confidence and order of the logical rules they support, and the top M triples are selected. Finally, for each query triple (h_i, r_i, t_i), the logical rule set U_i is obtained, and the in-context triples I_i that support the reasoning of (h_i, r_i, t_i) are retrieved according to U_i. Together, they constitute the pre-training data for the in-context triple extractor:
$\mathcal{D}_{\mathrm{rule}} = \{\, ((h_i, r_i, ?),\; I_i) \,\}.$
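As an illustration of how this training data might be assembled, the sketch below retrieves the triples along a path that instantiates one rule body and merges them across rules by confidence. The rule format (a body given as a relation path with a confidence value) and the helper names are assumptions of this sketch; the actual output format of a rule miner such as AnyBURL differs in detail.

```python
# Hedged sketch of rule-guided in-context triple retrieval (Section 4.1.1).
from collections import defaultdict

def retrieve_supporting_triples(kg, head, rule_body):
    """Return the triples along one path starting at `head` whose relations
    match the relation path `rule_body`, or [] if no such path exists."""
    by_head = defaultdict(list)
    for h, r, t in kg:
        by_head[h].append((h, r, t))

    def dfs(entity, remaining, path):
        if not remaining:                      # all relations of the rule body matched
            return path
        for (h, r, t) in by_head[entity]:
            if r == remaining[0]:
                found = dfs(t, remaining[1:], path + [(h, r, t)])
                if found:
                    return found
        return []

    return dfs(head, rule_body, [])

def build_incontext_set(kg, head, rules, max_triples=5):
    """Merge supporting triples over all rules (processed in descending
    confidence order), then keep the top `max_triples`, mirroring I_i."""
    merged, seen = [], set()
    for body, confidence in sorted(rules, key=lambda x: -x[1]):
        for triple in retrieve_supporting_triples(kg, head, body):
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged[:max_triples]
```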

4.1.2. Supervised Extractor Pre-Training

The in-context triple extractor is pre-trained on the training data D_rule = {((h_i, r_i, ?), I_i)} constructed above. It is an encoder-decoder generation model θ that takes a query triple (h_i, r_i, ?) as input and outputs the set of in-context triples I_i guided by the logical rules. The encoder, the decoder, and the pre-training process are introduced as follows:
Encoder. Given a query triple (h_i, r_i, ?), it is first converted into a sequence and then encoded to obtain the corresponding hidden states:

$M_i = \mathrm{Encoder}(h_i \,[\mathrm{SEP}]\, r_i), \qquad M_i \in \mathbb{R}^{d \times |S_i|},$

where the input is the sequence S_i = {h_i [SEP] r_i} and [SEP] is a separation token between h_i and r_i. M_i denotes the hidden states of the sequence S_i, |S_i| is the length of S_i, and d is the dimension of the hidden states.
Decoder. The decoder decodes the obtained hidden states M_i and generates the set of in-context triples I_i for reasoning. The procedure of generating I_i at the k-th decoding step can be represented by:

$F_i = \mathrm{Decoder}(M_i), \qquad F_i \in \mathbb{R}^{d \times k},$
$P(y_k \mid \{h_i\,[\mathrm{SEP}]\,r_i\}, I_i^{<k}; \theta) = \mathrm{softmax}(W F_i + b),$
$P(I_i \mid \{h_i\,[\mathrm{SEP}]\,r_i\}; \theta) = \prod_{k=1}^{K} P(y_k \mid I_i^{<k}, \{h_i\,[\mathrm{SEP}]\,r_i\}; \theta),$

where F_i are the output hidden states of the extractor decoder, P(y_k | {h_i [SEP] r_i}, I_i^{<k}; θ) is the probability distribution at the k-th decoding step, I_i^{<k} denotes the sequence decoded in the previous k−1 steps, and W and b are trainable parameters.
Training. During the training phase, the parameters are optimized by minimizing the negative log-likelihood over the parallel training data D_rule = {((h_i, r_i, ?), I_i)}:

$\mathcal{L}(\theta) = -\frac{1}{|I_i|} \sum_{k=1}^{|I_i|} \log P(y_k \mid \{h_i\,[\mathrm{SEP}]\,r_i\}, I_i^{<k}; \theta),$

where I_i^{<k} is the sequence generated in the previous k−1 decoding steps, |I_i| is the length of the in-context triple sequence I_i, and i denotes the index of the training example.
After pre-training, given a query triple ( h , r , ? ) , the in-context triple extractor can output the corresponding in-context triples which can be used for LLM reasoning.
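A minimal sketch of one supervised pre-training step is given below. It assumes the query and the target in-context triple sequence have already been tokenized into id tensors and that `extractor` is a placeholder seq2seq model returning next-token logits; it is not the exact training code of the paper.

```python
# Sketch of the teacher-forcing step minimising the negative log-likelihood
# of the rule-guided in-context triple sequence I_i given the query (h_i, r_i, ?).
import torch.nn.functional as F

def pretrain_step(extractor, optimizer, query_ids, target_ids, pad_id=0):
    # `extractor` is assumed to return logits of shape (batch, target_len, vocab)
    # when given the source ids and the right-shifted target ids.
    logits = extractor(query_ids, target_ids[:, :-1])
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # (batch * len, vocab)
        target_ids[:, 1:].reshape(-1),         # gold next tokens
        ignore_index=pad_id,                   # do not penalise padding positions
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```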

4.2. Reinforcement Learning with LLM’s Feedback as Rewards

In the previous phase, the extractor is trained on the in-context triples retrieved according to the logical rules. However, triples generated by an extractor supervised only by logical rules may not necessarily strengthen the reasoning process of LLMs. To alleviate this, in addition to the aforementioned training, we collect feedback from LLMs and further optimize the extractor through reinforcement learning.

4.2.1. Feedback of LLM Collecting

In this subsection, we collect the feedback of the LLM on knowledge graph reasoning, and the feedback is further utilized to optimize the in-context triple extractor.
First, we sample the in-context learning data {((h_i, r_i, ?), I_i)} from the current extractor, i.e., I_i ∼ P(I_i | {h_i [SEP] r_i}; θ), where P(I_i | {h_i [SEP] r_i}; θ) is computed by Equation (3). Then, we concatenate the query triple (h_i, r_i, ?) and the candidate tail entities C_i into a prediction prompt (to simplify the linking between the entities predicted by the LLM and the entities in the knowledge graph, for each query triple and its in-context triples {((h_i, r_i, ?), I_i)}, the entities contained in the triples of I_i and the entities connected to h_i within m hops are collected as candidate tail entities; this set is denoted by C_i. The top-N prediction prompt asks the LLM to predict the N tail entities sorted by probability and can be formulated as: “Given the query triple (h, r, [mask]), please predict the top n tail entities [mask] by probabilities in descending order from the candidates {candidate tail entities}. Below are triples that might be helpful for answering this question {in-context triples}. Then, the [mask] is:”), and the predicted entities can be obtained from the LLM. We denote the probability of the ground-truth tail entity t_i under this prompt, without in-context triples, by p_0.
After that, we incorporate the in-context triples I_i into the prompt for LLM reasoning, and the probability of generating the gold-standard tail entity is denoted by p(I_i). If p(I_i) > p_0, the in-context triples I_i are deemed helpful for the reasoning of the LLM; thus, p(I_i) − p_0 is used as the reward:

$R(I_i) = p_i^{I} - p_i^{q}, \qquad p_i^{q} = P(t_i \mid (h_i, r_i, ?), C;\, \omega), \qquad p_i^{I} = P(t_i \mid (h_i, r_i, ?), I_i, C;\, \omega),$

where ω denotes the LLM, and p_i^q and p_i^I represent the probabilities of the LLM outputting the gold-standard answer when the prompt does not and does contain the in-context triples, respectively. The difference between these two probabilities, R(I_i), is used as the reward to further optimize the in-context triple extractor.
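The following sketch illustrates how the reward R(I_i) might be computed. The `gold_probability` callable is a hypothetical stand-in for however the probability of the gold tail entity is read off the LLM (e.g., from token log-probabilities); it is not an actual API of any LLM service, and the prompt template follows the top-N prediction prompt quoted above.

```python
# Hedged sketch of collecting LLM feedback as a reward (Section 4.2.1).
def build_prompt(query, candidates, incontext_triples=None, top_n=10):
    h, r, _ = query
    context = ""
    if incontext_triples:
        context = (" Below are triples that might be helpful for answering "
                   f"this question {incontext_triples}.")
    return (f"Given the query triple ({h}, {r}, [mask]), please predict the "
            f"top {top_n} tail entities [mask] by probabilities in descending "
            f"order from the candidates {sorted(candidates)}.{context} "
            "Then, the [mask] is:")

def reward(llm, query, gold_tail, candidates, incontext_triples, gold_probability):
    """R(I_i) = p_i^I - p_i^q: the gain in the gold answer's probability when
    the sampled in-context triples are added to the prompt."""
    p_q = gold_probability(llm, build_prompt(query, candidates), gold_tail)
    p_i = gold_probability(llm, build_prompt(query, candidates, incontext_triples), gold_tail)
    return p_i - p_q
```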

4.2.2. Reinforcement Learning with LLM’s Feedback

Based on the feedback of the LLM, the expected reward under the LLM ω can be formulated as:

$J(\theta) = \mathbb{E}_{I_i \sim P(I_i \mid \{h_i\,[\mathrm{SEP}]\,r_i\}; \theta)}\big[ R(I_i) \big],$

where R(I_i) is the reward for the sampled in-context triples I_i. A large reward indicates that the sampled in-context triples help the LLM generate the correct answer, so generating them should be encouraged.
By utilizing the policy gradient, the extractor can be optimized by:

$\nabla J(\theta) \approx R(I_i)\, \nabla \log P(I_i \mid (h_i, r_i, ?); \theta).$
It encourages the extractor to explore more in-context triples that yield higher rewards.
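A minimal REINFORCE-style sketch of this update is shown below; `sample_with_log_prob` is an assumed helper that samples an in-context triple sequence from the extractor and returns its log-probability as a differentiable tensor, and `reward_fn` wraps the LLM feedback described in Section 4.2.1.

```python
# Sketch of one policy-gradient step maximising R(I_i) * log P(I_i | query).
def reinforce_step(extractor, optimizer, query_ids, sample_with_log_prob, reward_fn):
    sampled_triples, log_prob = sample_with_log_prob(extractor, query_ids)
    r = reward_fn(sampled_triples)     # R(I_i) from the LLM's feedback
    loss = -r * log_prob               # gradient ascent on the expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return r
```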

4.3. In-Context Learning and Reasoning

After obtaining the in-context triple extractor, we leverage the extractor and the LLM to conduct knowledge graph reasoning. Given a query triple (h, r, ?), the necessary in-context triples I are obtained by the extractor, and the candidate tail entities C are retrieved. The contexts above are then used to construct the prompt. Finally, the prompt is fed into the LLM, and the predicted tail entities A are generated by the LLM. This procedure can be formally represented by:

$A = \mathrm{LLM}(\{(h, r, ?), I, C\}),$
where A is the set of predicted tail entities sorted by the probabilities.
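As a rough sketch of this reasoning step, the snippet below calls an arbitrary chat-completion client (`call_llm` is a placeholder, not a specific API), parses the comma-separated answer format shown in Appendix B, and links the predictions back to the candidate set C.

```python
# Hedged sketch of A = LLM({(h, r, ?), I, C}) followed by entity linking.
def reason(call_llm, prompt, candidates):
    raw = call_llm(prompt)                        # e.g. "USA, Colorado, California, ..."
    predicted = [e.strip() for e in raw.split(",")]
    # Keep only entities that can be linked back to the candidate set C,
    # preserving the LLM's descending-probability order.
    return [e for e in predicted if e in candidates]
```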

5. Experimental Setup

This section mainly introduces the datasets used in the experiments, the compared baseline methods, the evaluation metrics, and the experimental settings.

5.1. Datasets

In the experiments, we utilize five different knowledge graphs to evaluate the proposed reasoning method:
  • WN18RR [38], obtained from the knowledge graph WordNet [39], with the reverse relations removed;
  • FB15k [16], a subset of the universal knowledge graph Freebase [40];
  • FB15k-237 [41], derived from FB15k, with some reverse relations deleted;
  • YAGO3-10-dr [42], derived from YAGO3-10 [38], with the reciprocal relations removed;
  • Wikidata5m [43], derived from Wikidata [44]; its scale is much larger than that of the previous four datasets.
The statistics for the datasets are shown in Table 1, where “#entity” and “#relation” represent the numbers of entities and relations, respectively. “#train”, “#valid”, and “#test” denote the number of triples in the training set, validation set, and test set for each dataset, respectively.

5.2. Baseline Methods

In the experiments, we employ seven methods for comparison, and these methods can be categorized into the following three types:
  • Distributed representation-based methods. TransE [16] is a translation-based distributed representation learning method, and RotatE [21] is a complex vector space-based method which can better model the complex relations;
  • Logical rule-based methods. We use two representative models: AMIE [24] is one of the representative logical rule learning methods, and AnyBURL [27] improves the process of logical rule learning by extending it to large-scale datasets;
  • Pre-trained language model- and large language model-based methods. The models KG-BERT [28] and KGT5 [30] are pre-trained language model-based reasoning methods. GPT_one-shot is our implementation of the one-shot in-context learning of AutoKG [34] with the large language model GPT-3.5.

5.3. Evaluation Metrics

In the experiments, we utilize the mean reciprocal rank (MRR) and Hits@n [16] as metrics to evaluate the results. Specifically, the reasoning results of the proposed method are multiple entities sorted in descending order of probability, so Hits@n denotes the proportion of test queries whose gold-standard entity appears in the top n predicted results, and MRR is the average of the reciprocal ranks of the gold-standard entities over all testing data. These metrics are widely used in knowledge graph reasoning studies [42,45].
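For reference, a short sketch of how these two metrics can be computed from ranked prediction lists is given below; it assumes each prediction is a list of entities sorted in descending order of probability.

```python
# Sketch of MRR and Hits@n over a test set of ranked prediction lists.
def mrr(ranked_lists, golds):
    total = 0.0
    for ranked, gold in zip(ranked_lists, golds):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)   # reciprocal rank, 1-indexed
    return total / len(golds)

def hits_at_n(ranked_lists, golds, n):
    hits = sum(1 for ranked, gold in zip(ranked_lists, golds) if gold in ranked[:n])
    return hits / len(golds)
```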

5.4. Experimental Settings

In the procedure of logical rule mining, we utilize the model AnyBURL [27] to mine the logical rules on the given knowledge graph G. The maximum number of in-context triples M is set to 5. During the process of logical rule learning, the parameters of AnyBURL are configured to be identical to those in the original paper [27]. The in-context triple extractor θ adopts the generative Transformer model [46,47]; the dimension d is set to 512, and the batch size is 4096. The number of layers in both the encoder and the decoder is set to 6. The other parameters of θ are the same as in [48]. When collecting the candidate tail entities C_i, the number of hops m is set to 4. In the in-context learning and reasoning procedure, we utilize the large language model gpt-3.5-turbo. The temperature is set to 0.8, and the maximum number of tokens is 1024. The experiments are conducted on Nvidia P100 GPUs.
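For convenience, the settings above can be summarized as a configuration dictionary; the values are taken from this section, while the parameter names themselves are illustrative.

```python
# Hedged summary of the experimental configuration described above.
SETTINGS = {
    "rule_miner": "AnyBURL",                 # logical rule mining on G
    "max_incontext_triples": 5,              # M
    "extractor": {
        "architecture": "Transformer encoder-decoder",
        "hidden_dim": 512,                   # d
        "num_layers": 6,                     # encoder and decoder layers
        "batch_size": 4096,
    },
    "candidate_hops": 4,                     # m hops around the query head entity
    "llm": {
        "model": "gpt-3.5-turbo",
        "temperature": 0.8,
        "max_tokens": 1024,
    },
}
```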

6. Experimental Results and Discussion

In this section, the performance of the proposed method and the baselines are evaluated on five datasets. The results are compared and analyzed in detail.

6.1. Comparison with Baselines

In this section, the reasoning results of the baselines and the proposed method on five datasets are compared and analyzed. The main results are shown in Table 2 and Table 3 (in Table 2 and Table 3, the results of rows 1 to 3 are quoted from [42]. The results of rows 4 to 6 are obtained from the original papers [27,28,30]. The results of rows 7 and 8 are obtained by our implementation) and the best results are marked in bold.
In Table 2 and Table 3, the results in rows 1–2 are obtained by the distributed representation learning models, and rows 3–4 show the performance of the logical rule learning methods. Rows 5–7 display the performance of the pre-trained language model- and large language model-based methods. The results in row 8 are obtained by our proposed method.
  • Comparing the results in rows 1–4 (traditional methods) and rows 5–8 (pre-trained language model- and large language model-based methods) in the two tables, 10 of the 15 best results are obtained by the language model-based methods. These results demonstrate that employing a large language model can enhance the performance of knowledge graph reasoning. One reason could be that, compared to reasoning based on limited knowledge graph triples, pre-trained language models contain more extensive knowledge, which is more conducive to completing the reasoning process.
  • Considering the results in rows 5–6 and rows 7–8 of the two tables, most of the performances in rows 7–8 are better than those in rows 5–6. The reason may be that the methods GPT_one-shot and RuleLLM employ in-context learning in reasoning, which helps LLMs comprehend the given reasoning task and further improves the reasoning performance. These results validate the importance and effectiveness of in-context learning for LLMs in reasoning.
  • Our proposed method performs the best on three datasets (WN18RR, FB15k-237, and Wikidata5m). Compared with traditional reasoning methods, the greatest improvement is 0.147 on Hits@10 for FB15k-237 (compared with the model AnyBURL). These results demonstrate the effectiveness of our proposed method. One reason may be that the proposed rule-and-reinforce in-context triple selection method is able to extract better in-context examples with respect to the specific reasoning task, providing more helpful factual evidence for reasoning.
Although the experimental results show that the proposed method can achieve certain improvements, there are still several limitations: (1) First, the reasoning results entirely depend on the output of LLMs, which suffer from hallucination issues. Therefore, even if the in-context triples are correct, the reasoning results may still be wrong. This is one of the limitations of the proposed method. In the future, we should focus on retrieving more accurate in-context triples on the one hand, and post-processing the outputs of the LLMs on the other, to further improve the quality of the reasoning results. (2) In addition, one of the important bases for exploring the in-context triples in this method is the logical rules, which are mainly obtained through probabilistic analysis of frequent patterns. In practical applications, the logical rules used are not always appropriate for the given reasoning task. This is another limitation of the proposed method. Therefore, considering more reasoning factors to improve the accuracy of extracting in-context triples is also a direction worth further investigation.

6.2. Ablation Study

In the proposed model, we utilize in-context learning on LLMs for knowledge graph reasoning. The in-context triples are extracted by the extractor, which is trained by the logical rule-guided supervisory data as well as the reasoning feedback of LLMs. So, in this section, we study the performance when different in-context triples are incorporated on the validation sets of WN18RR and FB15k-237. The results are displayed in Table 4.
In Table 4, row 1 displays the results when only the query triple is fed into the LLMs for reasoning. The method in row 2 incorporates the triples that are directly connected to the query entity, although such triples may not necessarily be effective for the reasoning of the given query. In row 3, the in-context triples are obtained by the extractor trained only on the logical rule-guided supervision data. Row 4 shows the performance when both the logical rule-guided triples and the LLM’s feedback are used to train the extractor.
All the results in rows 3 and 4 are better than those in rows 1 and 2. These results confirm that incorporating refined in-context triples into the prompt can enhance the performance of knowledge graph reasoning. When the in-context triples are obtained by the extractor trained on both the logical rules-guided triples and the LLM’s feedback (row 4), we obtain the best performance. This indicates that in-context triples obtained based only on the logical rules may not always be useful for LLM reasoning; it is therefore necessary to further optimize the extractor using feedback from the LLMs on top of the logical rules, so that it can extract triples that are more suitable for LLM reasoning. Comparing the results in row 4 and row 1, the greatest improvement is 0.317 on Hits@1 for WN18RR. These results demonstrate the effectiveness of our proposed method.

6.3. Performance on Entities with Different Frequencies

Considering the long tail distribution of entities in the knowledge graphs [7], in this experiment, we divide the data based on the frequency of entities and observe the performance of the proposed method in reasoning. Specifically, the validation set of FB15k-237 is separated into four subsets according to the frequencies of the entities: 1–10, 10–20, 20–30, and >30. As the quantities of the four subsets are different, we randomly sample 500 query triples from each subset. The reasoning results are depicted in Figure 3.
Figure 3 depicts the reasoning results on query triples with different entity frequencies. The x-axis of each sub-figure represents the entity frequency, and the y-axes denote the MRR (Figure 3a), Hits@1 (Figure 3b), and Hits@10 (Figure 3c), respectively. We can conclude that the performance of the traditional reasoning methods (RotatE and AnyBURL) is greatly affected by the frequency of entities, and better results are achieved only when the frequency of entities is high (only when the entity frequency is larger than 20 are the results of RotatE slightly better than those of GPT_one-shot on MRR in Figure 3a). One reason could be that the traditional data-driven models perform better on entities with higher frequencies. In contrast, our model performs well across varying frequencies, with the best performance occurring at the lowest ones.

6.4. Performance of Methods with Different Numbers of In-Context Triples

Previous experiments have demonstrated that in-context triples retrieved by the proposed method can enhance the reasoning of LLMs. In this experiment, we discuss the performance when introducing different numbers of in-context triples. The experiments are conducted on the validation set of FB15k-237. The numbers of in-context triples are set to 1, 3, 5, 7, and 9, respectively. The results are depicted in Figure 4.
Figure 4 shows the MRR (Figure 4a), Hits@1 (Figure 4b), and Hits@10 (Figure 4c) of the reasoning results when varying the number of in-context triples. It can be observed that as the number of in-context triples increases from 1 to 5, the reasoning performance gradually improves. However, as the number continues to grow beyond 5, there is a slight decline in the reasoning results. This phenomenon may be attributed to the fact that when the number of in-context triples becomes larger, more noise may be introduced into the reasoning, which consequently deteriorates the results. Taking the query triple in Figure 1 as an example, when the number of in-context triples is set to 5, the extracted in-context triples are (Eric Allin Cornell, works at, University of Colorado Boulder), (University of Colorado Boulder, is located in, Colorado), (Eric Allin Cornell, was born in, Palo Alto), (Palo Alto, is located in, California), and (Eric Allin Cornell, graduated from, Massachusetts Institute of Technology); these triples are enough to provide the necessary clues for reasoning. If the number is set to 9, four more triples, (Eric Allin Cornell, has won prize, King Faisal International Prize), (Eric Allin Cornell, has friend, Wolfgang Ketterle), (Wolfgang Ketterle, nationality, Germany), and (Wolfgang Ketterle, graduated from, Heidelberg University), are extracted, and they are clearly not helpful for reasoning. In summary, the optimal performance is achieved when the number of in-context triples is set to 5.

6.5. Manual Evaluation of In-Context Triples by Different Methods

In order to further evaluate whether the in-context triples retrieved by the proposed method are more effective for reasoning, in this experiment, we randomly select 200 query triples from the test set of FB15k-237, and invite experts to evaluate the effectiveness of the extracted triples for reasoning. Specifically, five experts in the field of natural language processing are invited and are required to score the effectiveness of the extracted in-context triples for reasoning based on the corresponding query triple, with a scoring range of 0–10, where a higher score represents higher effectiveness. Furthermore, for the convenience of comparison, we also utilize the triples connected to the query entities and ask the experts to score them as well. Table 5 shows an example. It displays the scores given by five experts on the in-context triples extracted by three different methods.
In Table 5, the query triple is (Eric Allin Cornell, lives in country, ?), and the in-context triples extracted by the different methods are scored manually. Considering the in-context triples in row 1, none of the five triples helps with the reasoning of the given query. In contrast, among the in-context triples in row 3, the triples about the workplace provide useful clues for the reasoning. As a consequence, the scores in row 1 are quite low, and the scores in the last row are the highest.
The statistics for the overall scores are shown in Table 6.
Table 6 shows the average score of each expert for the effectiveness of the in-context triples extracted by different methods. “Ex. 1” is an abbreviation of “Expert 1”, and the “average” in the last column is the average score of the former five columns. In the first row, the triples which are directly connected by the query entities are retrieved as in-context triples. In the second row, the in-context triples are obtained by the extractor, which is trained only by logical rules-guided triples. In the last row, the in-context triples are retrieved by our proposed method. We can see that the average scores of rows 2 and 3 are higher than those of row 1; these results demonstrate the effectiveness of refining the in-context triples. Finally, the scores of row 3 are the best. The results demonstrate that our proposed extractor, which is trained by both logical rules and feedback of LLMs, can extract more useful in-context triples for each specific reasoning task.

7. Conclusions and Future Work

We consider the problem that existing LLM-based knowledge graph reasoning methods extract in-context triples only through fixed criteria, such as triples surrounding the query entities or triples containing the query relations. These methods cannot make full use of the triples of knowledge graphs, making it difficult to ensure the effectiveness of knowledge graph reasoning. To alleviate this problem, in this paper, a rule-and-reinforce in-context triple extraction method is proposed to enhance the in-context learning of LLMs for KG reasoning. Specifically, an encoder-decoder-based extractor is trained on the triples that support the reasoning of the logical rules. Furthermore, the LLM’s feedback is collected and formulated as rewards to optimize the in-context triple extractor. Finally, given a query triple, the extractor can generate the necessary in-context triples to enhance the reasoning of the LLMs. The experimental results demonstrate that both the logical rules and the feedback of LLMs are effective for in-context triple extraction, and that the in-context triples obtained by the proposed method can improve the effectiveness of knowledge graph reasoning.
In this study, we mainly consider the logical rules which can be represented by multiple hops on KGs. However, in practical applications, there exist many more complex types of KG reasoning. For example, the reasoning query “who is the person who has both starred in the movie ‘New Police Story’ and directed the movie ‘On the Run’?” introduces a logical conjunction while reasoning. So in the future, we will expand the proposed method to cover more types of complex reasoning.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in [16,38,41,42,43].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Examples of the Logical Rules-Guided In-Context Triples Retrieval

In this appendix, some examples are given to explain the details of the logical rules-guided in-context triple retrieval.
The aim of this part is to construct training data to pre-train the in-context triple extractor. Assume that the following triple is sampled from the knowledge graph G:
(Zuckerberg, nationality, USA)
The logical rules about the relation “nationality” can be learnt from the knowledge graph by existing logical rule learning models:
  • (x, nationality, y)←(x, works in, z)∧(z, located in state, w)∧(w, state in country, y)
  • (x, nationality, y)←(x, born in, z)∧(z, located in state, w)∧(w, state in country, y)
Then, we can extract triples which support each rule from the knowledge graph G :
  • {(Zuckerberg, works in, Facebook), (Facebook, located in state, California), (California, state in country, USA)}
  • {(Zuckerberg, born in, New York City), (New York City, located in state, New York State), (New York State, state in country, USA)}
Thus, we can obtain the set of in-context triples by combining the extracted triples above:
{(Zuckerberg, born in, New York City),
(New York City, located in state, New York State),
(New York State, state in country, USA),
(Zuckerberg, works in, Facebook),
(Facebook, located in state, California)}
Finally, training data can be constructed which mainly contain the in-context triples with the corresponding query triple.
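Assuming the retrieval sketch from Section 4.1.1, this Zuckerberg example could be reproduced roughly as follows; the knowledge graph fragment is taken from the triples above, while the rule confidences are illustrative assumptions.

```python
# Usage example of build_incontext_set from the Section 4.1.1 sketch.
kg = {
    ("Zuckerberg", "works in", "Facebook"),
    ("Facebook", "located in state", "California"),
    ("California", "state in country", "USA"),
    ("Zuckerberg", "born in", "New York City"),
    ("New York City", "located in state", "New York State"),
    ("New York State", "state in country", "USA"),
}
rules = [
    (["works in", "located in state", "state in country"], 0.8),  # assumed confidence
    (["born in", "located in state", "state in country"], 0.6),   # assumed confidence
]
incontext = build_incontext_set(kg, "Zuckerberg", rules, max_triples=5)
# -> five of the six supporting triples listed above; the ordering and the
#    truncated triple depend on the assumed confidence values.
```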

Appendix B. Top-N Prediction Prompt

As mentioned in Section 4.2, the top-N prediction prompt is constructed to make the LLM predict N tail entities sorted by the probabilities. The process is formally represented by:
$A = \mathrm{LLM}(\{(h, r, ?), I, C\}).$
Here, we use an example to explain this process. Assume the given query triple (h, r, ?) is:
(Eric Allin Cornell, lives in country, ?)
Then, the corresponding in-context triples I extracted by the proposed method are as follows:
(Eric Allin Cornell, works at, University of Colorado Boulder),
(University of Colorado Boulder, is located in, Colorado),
(Eric Allin Cornell, was born in, Palo Alto),
(Palo Alto, is located in, California),
(Eric Allin Cornell, graduated from, Massachusetts Institute of Technology)
Besides that, the retrieved set of candidate tail entities C is:
{Germany, Stanford University, Denton, Colorado,
Sweden, California, USA, University of California, physics,
scientist, New York City, Benjamin Franklin Medal}.
Based on the query triple, in-context triples, and the set of candidate tail entities, we can construct the prompt as:
“Given the triple (Eric Allin Cornell, lives in country, [mask]), please predict the top 10 tail entities [mask] by probabilities in descending order from the candidates {Germany, Stanford University, Denton, Colorado, Sweden, California, USA, University of California, physics, scientist, New York City, Benjamin Franklin Medal}. Below are triples that might be helpful for answering this question {(Eric Allin Cornell, works at, University of Colorado Boulder), (University of Colorado Boulder, is located in, Colorado), (Eric Allin Cornell, was born in, Palo Alto), (Palo Alto, is located in, California), (Eric Allin Cornell, graduated from, Massachusetts Institute of Technology)}. Then, the [mask] is:”
The LLM takes the prompt above as input and conducts the reasoning through in-context learning. Finally, the output of the LLM is the list of predicted tail entities from the candidate tail entity set, sorted by probability in descending order. For example:
“USA, Colorado, California, Germany, Denton, Sweden, New York City, Stanford University, University of California, physics, scientist, Benjamin Franklin Medal.”
Among the entities output by the LLM, USA ranks first, indicating that the LLM considers USA the most likely tail entity for this query.

References

  1. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Philip, S.Y. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 494–514. [Google Scholar] [CrossRef] [PubMed]
  2. Ren, Z.; Zhao, Y.; Zong, C. Towards Informative Open-ended Text Generation with Dynamic Knowledge Triples. In Proceedings of the Empirical Methods in Natural Language Processing 2023, Singapore, 6–10 December 2023; pp. 3189–3203. [Google Scholar]
  3. Wang, S.; Dang, D. Robust cross-lingual knowledge base question answering via knowledge distillation. Data Technol. Appl. 2021, 55, 661–681. [Google Scholar] [CrossRef]
  4. Zhao, Y.; Xiang, L.; Zhu, J.; Zhang, J.; Zhou, Y.; Zong, C. Knowledge graph enhanced neural machine translation via multi-task learning on sub-entity granularity. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 4495–4505. [Google Scholar]
  5. Zhao, Y.; Zhang, J.; Zhou, Y.; Zong, C. Knowledge graphs enhanced neural machine translation. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, Yokohama, Japan, 7–15 January 2021; pp. 4039–4045. [Google Scholar]
  6. Zhang, W.; Chen, J.; Li, J.; Xu, Z.; Pan, J.Z.; Chen, H. Knowledge graph reasoning with logics and embeddings: Survey and perspective. arXiv 2022, arXiv:2202.07412. [Google Scholar]
  7. Xue, B.; Zou, L. Knowledge graph quality management: A comprehensive survey. IEEE Trans. Knowl. Data Eng. 2022, 35, 4969–4988. [Google Scholar] [CrossRef]
  8. Shen, T.; Zhang, F.; Cheng, J. A comprehensive overview of knowledge graph completion. Knowl. Based Syst. 2022, 255, 109597–109661. [Google Scholar] [CrossRef]
  9. Jia, N.; Yao, C. A Brief Survey on Deep Learning-Based Temporal Knowledge Graph Completion. Appl. Sci. 2024, 14, 8871. [Google Scholar] [CrossRef]
  10. Liang, Y.; Zhang, Y.; Ma, C.; Zhang, Z.; Zhao, Y.; Xiang, L.; Zong, C.; Zhou, Y. Document Image Machine Translation with Dynamic Multi-pre-trained Models Assembling. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, 16–21 June 2024; pp. 7077–7088. [Google Scholar]
  11. Krawczyk, N.; Probierz, B.; Kozak, J. Towards AI-Generated Essay Classification Using Numerical Text Representation. Appl. Sci. 2024, 14, 9795. [Google Scholar] [CrossRef]
  12. Iaroshev, I.; Pillai, R.; Vaglietti, L.; Hanne, T. Evaluating Retrieval-Augmented Generation Models for Financial Report Question and Answering. Appl. Sci. 2024, 14, 9318. [Google Scholar] [CrossRef]
  13. Hu, L.; Liu, Z.; Zhao, Z.; Hou, L.; Nie, L.; Li, J. A survey of knowledge enhanced pre-trained language models. IEEE Trans. Knowl. Data Eng. 2023, 36, 1413–1430. [Google Scholar] [CrossRef]
  14. Wang, X.; Zhu, W.; Saxon, M.; Steyvers, M.; Wang, W.Y. Large language models are latent variable models: Explaining and finding good demonstrations for in-context learning. In Advances in Neural Information Processing Systems, Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates, Inc.: Red Hook, New York, USA, 2024; Volume 36. [Google Scholar]
  15. Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q.V.; et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  16. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, Nevada, 5–8 December 2013; Curran Associates, Inc.: Red Hook, New York, USA, 2013; Volume 26, p. 26. [Google Scholar]
  17. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
  18. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
  19. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; pp. 687–696. [Google Scholar]
  20. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning. PMLR, New York, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
  21. Sun, Z.; Deng, Z.H.; Nie, J.Y.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  22. Galárraga, L.A.; Teflioudi, C.; Hose, K.; Suchanek, F. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 413–422. [Google Scholar]
  23. Galárraga, L.; Teflioudi, C.; Hose, K.; Suchanek, F.M. Fast rule mining in ontological knowledge bases with AMIE+. VLDB J. 2015, 24, 707–730. [Google Scholar] [CrossRef]
  24. Lajus, J.; Galárraga, L.; Suchanek, F. Fast and exact rule mining with AMIE 3. In The Semantic Web: 17th International Conference, ESWC 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 36–52. [Google Scholar]
  25. Omran, P.G.; Wang, K.; Wang, Z. An embedding-based approach to rule learning in knowledge graphs. IEEE Trans. Knowl. Data Eng. 2019, 33, 1348–1359. [Google Scholar] [CrossRef]
  26. Yang, F.; Yang, Z.; Cohen, W.W. Differentiable learning of logical rules for knowledge base reasoning. In Advances in Neural Information Processing Systems, Proceedings of the Thirty-First Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, New York, USA, 2017; Volume 30. [Google Scholar]
  27. Meilicke, C.; Chekol, M.W.; Ruffinelli, D.; Stuckenschmidt, H. Anytime bottom-up rule learning for knowledge graph completion. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3137–3143. [Google Scholar]
  28. Yao, L.; Mao, C.; Luo, Y. KG-BERT: BERT for knowledge graph completion. arXiv 2019, arXiv:1909.03193. [Google Scholar]
  29. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, p. 2. [Google Scholar]
  30. Saxena, A.; Kochsiek, A.; Gemulla, R. Sequence-to-Sequence Knowledge Graph Completion and Question Answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 2814–2828. [Google Scholar]
  31. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  32. Xie, X.; Zhang, N.; Li, Z.; Deng, S.; Chen, H.; Xiong, F.; Chen, M.; Chen, H. From discrimination to generation: Knowledge graph completion with generative transformer. In Proceedings of the Companion Proceedings of the Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 162–165. [Google Scholar]
  33. Lewis, M. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
  34. Zhu, Y.; Wang, X.; Chen, J.; Qiao, S.; Ou, Y.; Yao, Y.; Deng, S.; Chen, H.; Zhang, N. Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities. World Wide Web 2024, 27, 58. [Google Scholar] [CrossRef]
  35. Baek, J.; Aji, A.F.; Saffari, A. Knowledge-augmented language model prompting for zero-shot knowledge graph question answering. arXiv 2023, arXiv:2306.04136. [Google Scholar]
  36. Zhang, Y.; Chen, Z.; Zhang, W.; Chen, H. Making Large Language Models Perform Better in Knowledge Graph Completion. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 28 October–1 November 2024. [Google Scholar]
  37. Wei, Y.; Huang, Q.; Zhang, Y.; Kwok, J. KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023. [Google Scholar]
  38. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  39. Miller, G.A. WordNet: A lexical database for English. Commun. ACM 1995, 38, 39–41. [Google Scholar] [CrossRef]
  40. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 10–12 June 2008; pp. 1247–1250. [Google Scholar]
  41. Toutanova, K.; Chen, D. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, Beijing, China, 26–31 July 2015; pp. 57–66. [Google Scholar]
  42. Akrami, F.; Saeef, M.S.; Zhang, Q.; Hu, W.; Li, C. Realistic re-evaluation of knowledge graph completion methods: An experimental study. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland, OR, USA, 14–19 June 2020; pp. 1995–2010. [Google Scholar]
  43. Wang, X.; Gao, T.; Zhu, Z.; Zhang, Z.; Liu, Z.; Li, J.; Tang, J. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. Trans. Assoc. Comput. Linguist. 2021, 9, 176–194. [Google Scholar] [CrossRef]
  44. Vrandečić, D.; Krötzsch, M. Wikidata: A free collaborative knowledgebase. Commun. ACM 2014, 57, 78–85. [Google Scholar] [CrossRef]
  45. Wang, S.; Li, S.; Zou, L. Analogy-Triple Enhanced Fine-Grained Transformer for Sparse Knowledge Graph Completion. In Proceedings of the International Conference on Database Systems for Advanced Applications, Tianjin, China, 17–20 April 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 742–757. [Google Scholar]
  46. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  47. Zhao, Y.; Zhang, J.; Zong, C. Transformer: A general framework from machine translation to others. Mach. Intell. Res. 2023, 20, 514–538. [Google Scholar] [CrossRef]
  48. Tan, Z.; Zhang, J.; Huang, X.; Chen, G.; Wang, S.; Sun, M.; Luan, H.; Liu, Y. THUMT: An open-source toolkit for neural machine translation. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), Online, 6–9 October 2020; pp. 116–122. [Google Scholar]
Figure 1. An example of reasoning on incomplete knowledge graphs.
Figure 3. Performance on entities with different frequencies, including (a) MRR, (b) Hits@1, and (c) Hits@10.
Figure 4. Performance of methods with different numbers of in-context triples, including (a) MRR, (b) Hits@1, and (c) Hits@10.
Table 1. The statistics for the five knowledge graphs.
Dataset | #Entity | #Relation | #Train | #Valid | #Test
WN18RR | 40,943 | 11 | 86,835 | 3034 | 3134
FB15k | 14,951 | 1345 | 483,142 | 50,000 | 59,071
FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,046
YAGO3-10-dr | 122,837 | 36 | 732,556 | 3390 | 3359
Wikidata5m | 4,818,503 | 828 | 21,343,515 | 5357 | 5321
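Statistics like those in Table 1 can be reproduced from the standard tab-separated split files distributed with these benchmarks. The following is a minimal sketch, assuming files named train.txt, valid.txt, and test.txt with one head–relation–tail triple per line; these file names and the helper functions are assumptions for illustration, not part of the paper's released code.

```python
# Minimal sketch: counting entities, relations, and triples for one dataset.
# Assumes tab-separated split files "train.txt", "valid.txt", "test.txt"
# (an assumption about the benchmark layout, not taken from the paper).
from pathlib import Path

def load_triples(path):
    """Read one split file and return a list of (head, relation, tail) tuples."""
    triples = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            head, relation, tail = line.split("\t")
            triples.append((head, relation, tail))
    return triples

def kg_statistics(data_dir):
    """Compute the per-dataset counts reported in Table 1."""
    splits = {name: load_triples(Path(data_dir) / f"{name}.txt")
              for name in ("train", "valid", "test")}
    entities, relations = set(), set()
    for triples in splits.values():
        for h, r, t in triples:
            entities.update((h, t))
            relations.add(r)
    return {"#Entity": len(entities), "#Relation": len(relations),
            "#Train": len(splits["train"]), "#Valid": len(splits["valid"]),
            "#Test": len(splits["test"])}

if __name__ == "__main__":
    # For WN18RR one would expect roughly 40,943 entities and 11 relations.
    print(kg_statistics("WN18RR"))
```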
Table 2. Performance of baselines and proposed model on WN18RR, FB15k, and FB15k-237.
# | Model | WN18RR (MRR / Hits@1 / Hits@10) | FB15k (MRR / Hits@1 / Hits@10) | FB15k-237 (MRR / Hits@1 / Hits@10)
1 | TransE | 0.243 / 0.043 / 0.532 | 0.391 / 0.031 / 0.624 | 0.279 / 0.198 / 0.441
2 | RotatE | 0.476 / 0.428 / 0.571 | 0.791 / 0.742 / 0.881 | 0.338 / 0.241 / 0.533
3 | AMIE | 0.357 / 0.287 / 0.356 | 0.797 / 0.617 / 0.881 | 0.308 / 0.174 / 0.477
4 | AnyBURL | 0.480 / 0.445 / 0.549 | 0.830 / 0.808 / 0.876 | 0.260 / 0.196 / 0.410
5 | KG-BERT | 0.219 / 0.095 / 0.497 | – / – / – | 0.237 / 0.144 / 0.427
6 | KGT5 | 0.508 / 0.487 / 0.544 | – / – / – | 0.276 / 0.210 / 0.414
7 | GPT (one-shot) | 0.511 / 0.482 / 0.556 | 0.815 / 0.731 / 0.867 | 0.323 / 0.255 / 0.551
8 | RuleLLM | 0.518 / 0.493 / 0.573 | 0.826 / 0.737 / 0.879 | 0.341 / 0.260 / 0.557
The bolded data indicates the best performance; “–” marks results that are not available.
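The MRR, Hits@1, and Hits@10 values in Tables 2 and 3 (and in the ablation in Table 4) follow the usual link-prediction protocol: each test query is scored against the candidate entities and the rank of the correct entity is recorded. The sketch below shows how a list of such ranks is turned into these metrics; the variable names and the example ranks are illustrative and not taken from the paper's code.

```python
# Minimal sketch: MRR and Hits@k computed from the 1-based ranks of the
# correct entities over all test queries (rank 1 = correct entity ranked first).
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2, 250]  # illustrative ranks, not real experimental results
print(f"MRR={mrr(ranks):.3f}, "
      f"Hits@1={hits_at_k(ranks, 1):.3f}, "
      f"Hits@10={hits_at_k(ranks, 10):.3f}")
```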
Table 3. Performance of baselines and proposed model on YAGO3-10-dr and Wikidata5m.
# | Model | YAGO3-10-dr (MRR / Hits@1 / Hits@10) | Wikidata5m (MRR / Hits@1 / Hits@10)
1 | TransE | 0.190 / 0.136 / 0.323 | 0.253 / 0.170 / 0.392
2 | RotatE | 0.214 / 0.153 / 0.332 | 0.290 / 0.234 / 0.390
3 | AMIE | – / – / – | – / – / –
4 | AnyBURL | 0.211 / 0.154 / 0.331 | – / – / –
5 | KG-BERT | – / – / – | – / – / –
6 | KGT5 | 0.211 / 0.151 / 0.327 | 0.300 / 0.267 / 0.365
7 | GPT (one-shot) | 0.213 / 0.153 / 0.333 | 0.326 / 0.298 / 0.397
8 | RuleLLM | 0.212 / 0.152 / 0.337 | 0.357 / 0.321 / 0.448
The bolded data indicates the best performance; “–” marks results that are not available.
Table 4. Ablation study of the proposed method.
# | Model | WN18RR (MRR / Hits@1 / Hits@10) | FB15k-237 (MRR / Hits@1 / Hits@10)
1 | w/o in-context triples | 0.326 / 0.177 / 0.528 | 0.233 / 0.151 / 0.431
2 | directly connected in-context triples | 0.315 / 0.171 / 0.503 | 0.226 / 0.151 / 0.422
3 | in-context triples guided by logical rules | 0.457 / 0.403 / 0.552 | 0.326 / 0.254 / 0.547
4 | in-context triples by logical rules and LLM feedback | 0.517 / 0.494 / 0.571 | 0.343 / 0.264 / 0.558
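The ablation settings in Table 4 differ only in which in-context triples, if any, are placed into the prompt before the query. The sketch below illustrates one plausible way to serialize extracted triples into an LLM prompt; the prompt wording and the function name build_prompt are assumptions for illustration, not the paper's actual prompt template.

```python
# Minimal sketch: serializing extracted in-context triples into an LLM prompt.
# The prompt phrasing and the query format are assumptions, not the paper's template.
def build_prompt(query, in_context_triples):
    lines = ["Known facts:"]
    lines += [f"({h}, {r}, {t})" for h, r, t in in_context_triples]
    lines.append(f"Question: complete the triple ({query[0]}, {query[1]}, ?).")
    lines.append("Answer with the missing tail entity only.")
    return "\n".join(lines)

triples = [
    ("Eric Allin Cornell", "works at", "University of Colorado Boulder"),
    ("University of Colorado Boulder", "is located in", "Colorado"),
]
print(build_prompt(("Eric Allin Cornell", "lives in country"), triples))
```

Under this sketch, passing an empty list of triples would correspond to the "w/o in-context triples" setting in row 1, while rows 2–4 differ only in how the triples are selected.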
Table 5. An example of manual scoring. The query triple is (Eric Allin Cornell, lives in country, ?); the in-context triples extracted by three different methods are shown and scored manually by five experts.
Method | Extracted in-context triples | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5 | Average
Directly connected in-context triples | 1. (Eric Allin Cornell, occupation, scientist); 2. (Eric Allin Cornell, has won prize, Nobel Prize); 3. (Eric Allin Cornell, has friend, Wolfgang Ketterle); 4. (Eric Allin Cornell, has gender, male); 5. (Eric Allin Cornell, year of birth, 1961) | 2 | 3 | 3 | 1 | 2 | 2.2
In-context triples by logical rules | 1. (Eric Allin Cornell, has friend, Wolfgang Ketterle); 2. (Wolfgang Ketterle, graduated from, Heidelberg University); 3. (Heidelberg University, is located in, Heidelberg); 4. (Heidelberg, is located in, Germany); 5. (Eric Allin Cornell, graduated from, Massachusetts Institute of Technology) | 4 | 5 | 6 | 5 | 6 | 5.2
In-context triples by logical rules and LLM feedback | 1. (Eric Allin Cornell, works at, University of Colorado Boulder); 2. (University of Colorado Boulder, is located in, Colorado); 3. (Eric Allin Cornell, was born in, Palo Alto); 4. (Palo Alto, is located in, California); 5. (Eric Allin Cornell, graduated from, Massachusetts Institute of Technology) | 8 | 8 | 7 | 8 | 9 | 8.0
Table 6. Scores of manual evaluations for in-context triples by different methods.
# | Method | Ex. 1 | Ex. 2 | Ex. 3 | Ex. 4 | Ex. 5 | Average
1 | Directly connected in-context triples | 2.50 | 3.44 | 3.86 | 2.46 | 1.84 | 2.82
2 | In-context triples guided by logical rules | 4.64 | 4.78 | 5.56 | 5.34 | 4.64 | 4.99
3 | In-context triples by logical rules and LLM feedback | 5.82 | 5.66 | 6.22 | 6.78 | 7.02 | 6.30