Article

IMF: Interpretable Multi-Hop Forecasting on Temporal Knowledge Graphs

College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2023, 25(4), 666; https://doi.org/10.3390/e25040666
Submission received: 20 February 2023 / Revised: 26 March 2023 / Accepted: 8 April 2023 / Published: 16 April 2023
(This article belongs to the Special Issue Methods in Artificial Intelligence and Information Processing II)

Abstract

Temporal knowledge graphs (KGs) have recently attracted increasing attention. The temporal KG forecasting task, which plays a crucial role in applications such as event prediction, predicts future links based on historical facts. However, current studies pay scant attention to the following two aspects. First, the interpretability of current models amounts to providing reasoning paths, which is an inherent property of path-based models; the comparison of reasoning paths in these models, however, is performed in a black-box fashion. Second, contemporary models use separate networks to evaluate paths at different hops. Although the networks for all hops share the same architecture, each network learns different parameters for better performance. Different parameters cause identical semantics to receive different scores, so these models cannot measure identical semantics at different hops equally. Inspired by the observation that reasoning based on multi-hop paths is akin to answering questions step by step, this paper designs an Interpretable Multi-Hop Reasoning (IMR) framework based on consistent basic models for temporal KG forecasting. IMR transforms reasoning based on path searching into stepwise question answering. In addition, IMR develops three indicators according to the characteristics of temporal KGs and reasoning paths: the question matching degree, answer completion level, and path confidence. IMR can uniformly integrate paths of different hops according to the same criteria; it can provide reasoning paths as other interpretable models do and can further explain the basis for path comparison. We instantiate the framework with common embedding models such as TransE, RotatE, and ComplEx. While being more explainable, these instantiated models achieve state-of-the-art performance against previous models on four benchmark datasets.

1. Introduction

Knowledge graphs (KGs) such as Freebase [1] and YAGO [2] are collections of triples. Temporal KGs add a new dimension to static knowledge graphs [3], namely a timestamp for each triple, forming a quadruple. Although temporal KGs contain billions of triples, they are still incomplete, and this incompleteness limits their practical applications. Because temporal KGs involve the time dimension, temporal KG completion can be divided into interpolation and forecasting. The former uses the facts of all timestamps to predict the triples at a particular moment; the latter uses only historical facts to predict future triples. Owing to its importance for event prediction, temporal KG forecasting has attracted growing attention recently, and it is the focus of this paper.
Most current research on temporal KG completion focuses on interpolation [4,5,6,7,8,9,10]. Recently, there have been attempts to investigate temporal KG forecasting [3,4,7,11,12,13]. In terms of interpretability, research on temporal KG forecasting falls into two categories. The first is black-box models, which design an unexplainable scoring function for the plausibility of quadruples. The second is interpretable approaches. CyGNet [11] uses one-hop repetitive facts for prediction, so its performance is limited when no directly repeated facts exist at historical moments. xERTR [7], CluSTeR [3], and TITer [14] are all path-based temporal KG forecasting models. xERTR [7] uses inference subgraphs to aggregate local information around the question. CluSTeR [3] and TITer [14] use reinforcement learning for path search and improve performance through temporal reasoning.
Thus far, however, there has been little discussion of the following two aspects. First, uniformly measuring paths of different hops requires treating the same semantics equivalently at different hops. Current models use separate networks to evaluate paths at different hops. Although each hop's network has the same architecture, each network learns different parameters for better performance. Different parameters cause identical semantics to receive different scores, so current models cannot truly compare multi-hop paths according to the same criteria. For example, xERTR [7] simply aggregates the scores of paths of different hops, and this comparison relies mainly on what is learned from the training data. Second, although current models can provide reasoning paths, they compare paths in a black-box fashion. The interpretability of current models amounts to providing reasoning paths, which is an inherent property of path-based models. These models do not explain why one path is preferred over another, i.e., they cannot provide the basis for path comparison.
In practice, forecasting based on path searching aims to find appropriate multi-hop paths whose combined relations are equivalent to the question's relation. We observe that reasoning based on multi-hop paths is akin to stepwise question answering. Inspired by this, this paper designs a new Interpretable Multi-Hop Reasoning (IMR) framework based on consistent basic models, which can uniformly integrate paths of different hops and perform more interpretable reasoning.
IMR works as follows. It first transforms reasoning based on path searching into stepwise question answering, building on basic KG embedding models [1,15,16,17,18] and IRN [19]. During stepwise question answering, the framework computes the unanswered part of the question after each hop and uses it as the new question for the next hop; in this paper, we call it the remainder of the question. Based on the unanswered parts of questions and the inferred tails, IMR then defines three indicators: the question matching degree, answer completion level, and path confidence. The question matching degree, i.e., the degree of matching between the inferred tails and the original question, measures the plausibility of the resulting quadruples. The answer completion level, i.e., the degree of matching between the relations of a path and the relation of the question, measures how completely the path answers the question. Path confidence, i.e., the difference between the same entity at different timestamps, measures the reliability of the reasoning path. By combining these indicators, IMR achieves both unified scoring of multi-hop paths and more explainable reasoning.
The major contributions of this work are as follows. (1) A new Interpretable Multi-Hop Reasoning framework (IMR) is proposed, which provides a new framework for the specific design of forecasting models. IMR defines three indicators: the question matching degree, answer completion level, and path confidence. (2) Unlike other models, which cannot measure paths of different hops uniformly, IMR measures paths of different hops according to the same criteria and uses multi-hop paths for inference. (3) IMR provides reasoning paths as other interpretable models do and further explains the basis for path comparison. (4) IMR is instantiated as specific models based on basic embedding models. Experiments on four benchmark datasets show that these instantiated models achieve state-of-the-art performance against previous models.

2. Related Work

Static KG reasoning. Knowledge graph reasoning based on representation learning has been widely investigated. These reasoning approaches can be categorized into geometric models [1,17,20,21,22], tensor decomposition models [15,16,18,23], and deep learning models [24,25,26]. In recent years, some studies have introduced graph convolutional networks (GCNs) into knowledge graph reasoning [27], which can improve the performance of basic models. Others focus on multi-hop reasoning with symbolic inference rules learned from relation paths [28,29]. All of the above methods are designed for static KGs, which makes it challenging for them to handle temporal KG reasoning.
Temporal KG reasoning. Temporal KGs add the time dimension to static KGs, which makes the facts at any specific timestamp extremely sparse. The temporal KG reasoning task can be divided into two categories: reasoning about historical facts [4,5,6,7,8,30], i.e., interpolation on temporal KGs, and reasoning about future facts [3,4,7,11], i.e., forecasting on temporal KGs. The former predicts the missing facts at a specific historical moment based on the facts of all moments, and the latter predicts future events based only on past facts. There are many studies on temporal KG interpolation; however, they are all black-box models that cannot explain their predictions. Most of the proposed models for temporal KG forecasting are also black-box models. BoxTE [31] uses box embeddings for temporal KG forecasting, which makes it expressive and gives it inductive capacity. Recently, xERTR [7], CluSTeR [3], and TITer [14] have been shown to explain predictions to some extent: they can provide the reasoning paths for their predictions. However, none of these models can truly measure multi-hop paths according to the same criteria; their treatment is closer to a weighted combination. xERTR and TITer combine the scores of paths with different hops through trained weights, and experiments show that CluSTeR performs worse on multi-hop paths than on one-hop paths.
Most current temporal KG forecasting models are black-box models. Only some models can provide reasoning paths for prediction. Moreover, none of them can explain how path comparisons work and none of them can integrate paths of different hops uniformly.

3. Preliminaries

The task of temporal KG forecasting. Suppose that $\mathcal{E}$, $\mathcal{R}$, and $\mathcal{T}$ represent the entity set, predicate set, and timestamp set, respectively. A temporal KG is a collection of quadruples, which can be expressed as

$$\mathcal{K} = \{(e_s, r, e_o, t) \mid e_s, e_o \in \mathcal{E},\ r \in \mathcal{R},\ t \in \mathcal{T}\}$$

where $(e_s, r, e_o, t)$ denotes a quadruple; $e_s$ and $e_o$ represent the subject and object, respectively, $r$ represents the relation, and $t$ represents the time at which the quadruple occurs. The facts that happen before a selected time $t_k$ can be expressed as

$$\mathcal{G}_{t_k} = \{(e_i, r, e_j, t_i) \in \mathcal{K} \mid t_i < t_k\}$$

Temporal KG forecasting predicts future links based on past facts. Concretely, it predicts $e_o$ given a question $(e_s, r_q, ?, t_q)$ and the previous facts $\mathcal{G}_{t_q}$, where $r_q$ and $t_q$ denote the relation and timestamp of the question. Forecasting involves ranking all entities at the specific moment to obtain the preferred prediction.
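To make the notation concrete, here is a minimal Python sketch of $\mathcal{K}$, $\mathcal{G}_{t_k}$ and a forecasting question. It assumes facts are stored as a plain list of quadruples with integer timestamps; the `Quad` type, the `history_before` helper and the toy entities are hypothetical illustrations, not part of the original work.

```python
from typing import List, NamedTuple


class Quad(NamedTuple):
    """A temporal KG fact (e_s, r, e_o, t)."""
    subj: str
    rel: str
    obj: str
    t: int


def history_before(kg: List[Quad], t_k: int) -> List[Quad]:
    """G_{t_k}: all facts whose timestamp is strictly earlier than t_k."""
    return [q for q in kg if q.t < t_k]


# Hypothetical toy temporal KG.
kg = [
    Quad("A", "visit", "B", 1),
    Quad("B", "sign_treaty", "C", 2),
    Quad("A", "visit", "C", 3),
]

# For the question (A, visit, ?, 3), a forecaster may only use G_3.
question = ("A", "visit", None, 3)
history = history_before(kg, question[3])
print(history)  # the two facts at t = 1 and t = 2
```

The strict inequality is what separates forecasting from interpolation: the fact at t = 3 itself is never visible when answering a question at t = 3.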
Temporal KG forecasting based on paths. Knowledge graph embedding associates entities $e \in \mathcal{E}$ and relations $r \in \mathcal{R}$ with vectors $\mathbf{e}$ and $\mathbf{r}$. Unlike in static KGs, entities in temporal KGs carry time information: an entity may have different attributes at different moments. To better characterize entities in temporal KGs, we associate each entity $e$ with a specific time label $t_i \in \mathcal{T}$, so that the entity can be written as $e^{t_i}$ and its embedding as $\mathbf{e}^{t_i}$. The set of quadruples directly associated with $e_s^{t_i}$, which we define as the 1-hop paths of $e_s^{t_i}$, is $\mathcal{P}(e_s, t_i) = \{(e_s, r, e_j, t_k) \mid (e_s, r, e_j, t_k) \in \mathcal{G}_{t_i}\}$, where $e_s, e_j \in \mathcal{E}$, $r \in \mathcal{R}$, and $t_k \in \mathcal{T}$ with $t_k < t_i$. Thus, $\mathcal{P}(e_s, t_i)$ contains all associated quadruples. The set of entities directly associated with $e_s^{t_q}$ through $\mathcal{P}(e_s, t_q)$, i.e., the 1-hop neighbors of $e_s^{t_q}$, is $\mathcal{N}(e_s, t_q) = \{e_i^{t_h} \mid (e_s, r, e_i, t_h) \in \mathcal{P}(e_s, t_q)\}$, where $e_s, e_i \in \mathcal{E}$, $r \in \mathcal{R}$, and $t_h \in \mathcal{T}$ with $t_h < t_q$. Given the question $(e_s, r_q, ?, t_q)$, the forecasting task can be framed as finding the entity $e_o$ through path searching. For example, we search for a path starting from $e_s$:

$$(e_s, r_{p_1}, e_1, t_1), (e_1, r_{p_2}, e_2, t_2), \ldots, (e_{i-1}, r_{p_i}, e_i, t_i)$$

where $r_{p_i}$ denotes the relation of the $i$-th hop. Thus, the answers to the question may be $e_1, e_2, e_3, \ldots, e_i$, with corresponding inference hops $1, 2, 3, \ldots, i$. Moreover, $(e_s^i, r_q^i)$ denotes the remaining (i.e., unanswered) subject and relation of the question after the $i$-th hop, as explained in Section 4.3.2.
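The sets $\mathcal{P}(e_s, t_i)$ and $\mathcal{N}(e_s, t_q)$ can be sketched in Python as simple filters over the fact list. This is a minimal sketch assuming the same list-of-quadruples storage; `one_hop_paths`, `one_hop_neighbors` and the toy data are hypothetical. Note how entity B appears twice in the neighbor set, once per time label, mirroring the time-labelled entities $e^{t_i}$.

```python
from typing import List, NamedTuple, Set, Tuple


class Quad(NamedTuple):
    subj: str
    rel: str
    obj: str
    t: int


def one_hop_paths(kg: List[Quad], e_s: str, t_i: int) -> List[Quad]:
    """P(e_s, t_i): facts with subject e_s that occurred before t_i."""
    return [q for q in kg if q.subj == e_s and q.t < t_i]


def one_hop_neighbors(kg: List[Quad], e_s: str, t_q: int) -> Set[Tuple[str, int]]:
    """N(e_s, t_q): time-labelled objects e_i^{t_h} of the paths in P(e_s, t_q)."""
    return {(q.obj, q.t) for q in one_hop_paths(kg, e_s, t_q)}


# Hypothetical toy data.
kg = [
    Quad("A", "visit", "B", 1),
    Quad("A", "visit", "B", 2),
    Quad("A", "criticize", "C", 2),
    Quad("B", "visit", "A", 1),
]
print(sorted(one_hop_neighbors(kg, "A", 3)))
# [('B', 1), ('B', 2), ('C', 2)]
```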
Uniformly measuring paths of different hops. Uniformly measuring paths of different hops requires a model to score paths of different hops according to the same criteria. For example, given the question $(e_s, r_q, ?, t_q)$ and a searched 1-hop path $(e_s, r_p, e_1, t_1)$, suppose the score obtained for this path is $f$. If no path is found during the first hop, the original question is left to the second hop, so the remaining (unanswered) question for the second hop is still $(e_s, r_q, ?, t_q)$. If the path searched at the second hop is also $(e_s, r_p, e_1, t_1)$, its score at the second hop should also be $f$. As this example shows, identical semantics should be scored equivalently even at different hops. Moreover, comparing paths on an equal footing provides the basis for interpretable path comparison. This property requires a model to use an identical scoring mechanism at each hop; for models based on neural networks, the separate networks of all hops must share the same parameters. However, only IMR satisfies this property.
Fact matching based on TransE. This paper is the first study to design interpretable evaluation indicators from the perspective of actual semantics. To better illustrate the design pathway, we instantiate IMR with the basic embedding model TransE. In TransE, relations are represented as translations in the embedding space. If the triple $(e_s, r, e_o)$ holds in a static KG, TransE [1] assumes the following relationship:

$$\mathbf{e}_s + \mathbf{r} - \mathbf{e}_o = \mathbf{0}$$

where $\mathbf{e}_s, \mathbf{r}, \mathbf{e}_o \in \mathbb{R}^k$, and $k$ denotes the dimension of each vector.

For each quadruple $(e_s, r_q, e_o, t_q)$ in a temporal KG, the relation $r_q$ can likewise be regarded as the translation from the subject $e_s$ to the object $e_o$, i.e., $\mathbf{e}_s^{t_q} + \mathbf{r}_q = \mathbf{e}_o^{t_q}$. We assume that the smaller the distance $d$ of a quadruple, the better the quadruple matches. The distance of the quadruple $(e_s, r_q, e_o, t_q)$ can be expressed as

$$d = \|\mathbf{e}_s^{t_q} + \mathbf{r}_q - \mathbf{e}_o^{t_q}\|$$
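A minimal Python sketch of the quadruple distance $d$, using plain lists as vectors. The choice of the L1 norm follows Section 4.4.1, where the paper states that it uses the L1 norm for all indicators; the toy two-dimensional embeddings are hypothetical.

```python
from typing import Sequence


def l1_norm(v: Sequence[float]) -> float:
    return sum(abs(x) for x in v)


def transe_distance(e_s: Sequence[float], r_q: Sequence[float],
                    e_o: Sequence[float]) -> float:
    """d = ||e_s^{t_q} + r_q - e_o^{t_q}||_1; smaller means a better match."""
    return l1_norm([s + r - o for s, r, o in zip(e_s, r_q, e_o)])


# Hypothetical 2-d embeddings at the question time t_q.
e_s = [0.1, 0.5]
r_q = [0.4, -0.2]
good_obj = [0.5, 0.3]   # lies exactly at e_s + r_q
bad_obj = [-0.6, 1.0]

print(transe_distance(e_s, r_q, good_obj))  # close to 0
print(transe_distance(e_s, r_q, bad_obj))   # about 1.8
```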
The relations in KG embedding models indicate translations between entities, and their specific design determines the complexity of the indicators designed by IMR. The design route of IMR starts from reasoning about actual semantics and is not limited to specific basic models. The consistent basic model of IMR-TransE is TransE, i.e., all of IMR-TransE's specific formulas are based on TransE, which is not restated below. To limit the length of the paper, we move the details of IMR-RotatE and IMR-ComplEx to Appendix A.2.

4. IMR: Interpretable Multi-Hop Reasoning

We introduce the Interpretable Multi-Hop Reasoning framework (IMR) in this section, starting with an overview in Section 4.1. IMR comprises three modules: the path searching module, the query updating module, and the path scoring module. The path searching module searches related paths hop by hop from the subject of the question, involving path sampling and entity pruning; its motivation and design are presented in Section 4.2. The query updating module calculates the remaining question hop by hop for each path by updating the subject and relation; its motivation and design are introduced in Section 4.3. The path scoring module designs three indicators, namely the question matching degree, answer completion level, and path confidence, and combines them to evaluate each path; its motivation and design are presented in Section 4.4. We introduce the training strategy and the regularization on state continuity in Section 4.5. IMR conducts uniform path comparisons based on consistent basic models. To better illustrate the framework, we also present the corresponding instantiated model (IMR-TransE) in Sections 4.3, 4.4, and 4.5. The detailed implementations of IMR-RotatE and IMR-ComplEx are included in Appendix A.2.

4.1. Framework Overview

We notice that predicting unknown facts based on paths is akin to answering questions: a question can be answered directly by finding triples with an equivalent relation, or gradually by using equivalent multi-hop paths. Inspired by this observation, we treat link prediction as stepwise question answering. IMR primarily consists of searching for paths hop by hop, updating the remaining question for each path, and selecting the best answers based on three indicators: the question matching degree, answer completion level, and path confidence.
We show a toy example in Figure 1. Given a question $(e_s, r_q, ?, t_q)$ and the previous facts $\mathcal{G}_{t_q}$, the forecasting task is to predict the missing object $e_o$. The steps of IMR are as follows.
Step 1: Starting from the subject $e_s$, we first acquire the associated quadruples $\mathcal{P}(e_s, t_q)$, namely the 1-hop paths. Following [7], we temporally bias the neighborhood sampling using an exponential distribution over the neighbors, so that the sampling probability decreases as the time difference between node $e_s$ and its neighbor in $\mathcal{N}(e_s, t_q)$ grows. Then, we calculate the remaining question (the remaining subject $e_s^1$ and the remaining relation $r_q^1$) for each sampled path. Finally, IMR scores the 1-hop paths based on the three indicators, as discussed in Section 4.4.
Step 2: To prevent the path search from exploding, the model prunes the tails of the 1-hop paths before searching for 2-hop paths. As shown by the pink arrow in Figure 1, the tails of the 1-hop paths are pruned according to the scores of the 1-hop paths. For the 2-hop paths searched from the retained tails, IMR again samples paths with probabilities negatively correlated with their time distances. Then, IMR calculates the remaining question for each 2-hop path (the remaining subject $e_s^2$ and the remaining relation $r_q^2$) and scores the 2-hop paths based on the three indicators.
Step 3: Rank the scores of the 1-hop and 2-hop paths to obtain the preferred answer.

4.2. Path Searching Module

Inspired by the observation that reasoning based on multi-hop paths is akin to stepwise question answering, this module searches related paths hop by hop from the subjects of questions.
Path sampling. When searching paths from the starting subject $e_s^{t_q}$, the number of quadruples in $\mathcal{P}(e_s, t_q)$ may be very large. To prevent the path search from exploding, we sample a subset of the paths. The attributes of entities in temporal KGs may change over time, and the closer $t_1$ is to $t_q$, the more similar the attributes of $e_s^{t_1}$ should be to those of $e_s^{t_q}$. We verify the correlation between attributes and time distance in Appendix A.6. Therefore, we prefer to sample nodes whose timestamps are closer to $t_q$. Specifically, we employ the time-aware exponentially weighted sampling of xERTR [7], which temporally biases the neighborhood sampling using an exponential distribution of temporal distance.
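The following Python sketch illustrates time-aware exponentially weighted sampling without replacement. It assumes weights of the form $\exp(-\lambda (t_q - t))$ for neighbors with $t < t_q$; the decay rate `lam`, the function name and the toy edges are hypothetical choices for illustration, and xERTR's actual implementation may differ in detail.

```python
import math
import random
from typing import List, Tuple

Edge = Tuple[str, str, int]  # (relation, object, timestamp) leaving e_s


def time_aware_sample(edges: List[Edge], t_q: int, n: int,
                      lam: float = 0.5, seed: int = 0) -> List[Edge]:
    """Draw up to n distinct edges with t < t_q; an edge's weight is
    exp(-lam * (t_q - t)), so temporally closer facts are preferred."""
    rng = random.Random(seed)
    pool = [e for e in edges if e[2] < t_q]  # only past facts are visible
    chosen: List[Edge] = []
    while pool and len(chosen) < n:
        weights = [math.exp(-lam * (t_q - t)) for _, _, t in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))  # sample without replacement
    return chosen


# Hypothetical edges of one subject entity.
edges = [("visit", "a", 1), ("visit", "b", 5), ("praise", "c", 9), ("visit", "d", 12)]
print(time_aware_sample(edges, t_q=10, n=2))
```

A larger `lam` concentrates the samples on the most recent facts; with `lam = 0` the sketch reduces to uniform sampling.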
Entity pruning. The search for next-hop paths starts from the tails of the previous-hop paths, so the number of paths grows exponentially with the number of hops. To avoid an explosion in next-hop path searching, we select the top-$K$ entities for the next-hop search based on the sorted scores of the previous hop.
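A small Python sketch of top-$K$ pruning with `heapq`. The text does not state whether a higher or lower path score is preferred at this stage (the TransE indicators are distances, while the loss in Section 4.5 treats scores as positive preferences), so the direction is exposed as a hypothetical `higher_is_better` flag; the function name and toy scores are likewise illustrative.

```python
import heapq
from typing import Dict, List, Tuple

Tail = Tuple[str, int]  # time-labelled entity e^{t}


def prune_top_k(tail_scores: Dict[Tail, float], k: int,
                higher_is_better: bool = True) -> List[Tail]:
    """Keep the k tails with the best previous-hop scores for the next hop."""
    select = heapq.nlargest if higher_is_better else heapq.nsmallest
    return select(k, tail_scores, key=tail_scores.get)


scores = {("B", 1): 0.9, ("C", 2): 0.2, ("D", 2): 0.5}
print(prune_top_k(scores, 2))  # [('B', 1), ('D', 2)]
```

Because only $K$ tails survive each hop, the number of paths expanded at the next hop is bounded by $K$ times the number of paths sampled per tail, instead of growing with every tail found so far.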

4.3. Query Updating Module

Given a question $(e_s, r_q, ?, t_q)$, there may be few relations in the temporal KG that are directly equivalent to $r_q$, so many questions require multi-hop paths to infer the answer. In question answering, a complex question can be decomposed into multiple sub-questions, with one sub-question answered at each step. Inference based on a multi-hop path is thus equivalent to answering a complex question step by step. After each step, we need to remove the resolved part and focus on the remaining question. IMR therefore updates the question according to the last hop and focuses on finding the unsolved part. The query updating module mainly calculates the remaining questions, i.e., the unanswered questions.
The embedding of entities is first introduced in this subsection, followed by the query updating module of IMR-TransE.

4.3.1. Entity Representation

The attributes of entities may change over time. This paper divides the entity embedding at each timestamp into a static representation and a dynamic representation:

$$\mathbf{e} = \mathrm{act}\left(\mathrm{MLP}\left([\mathbf{e}_{\mathrm{sta}} \,\|\, \mathbf{e}_{\mathrm{dy}}]\right)\right)$$

Here, the vector $\mathbf{e}_{\mathrm{sta}}$ denotes the static embedding, which captures time-invariant features and global dependencies over the temporal KG. The vector $\mathbf{e}_{\mathrm{dy}}$ represents the dynamic embedding of each entity, which changes over time. $\|$ denotes concatenation, $\mathrm{MLP}(\cdot)$ denotes a multilayer perceptron (MLP), and $\mathrm{act}(\cdot)$ denotes the activation function. We provide more details about $\mathbf{e}_{\mathrm{sta}}$ and $\mathbf{e}_{\mathrm{dy}}$ in Appendix A.3.
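A minimal Python sketch of the entity representation, assuming a one-layer MLP and $\tanh$ as $\mathrm{act}(\cdot)$; the paper does not fix these choices here, and the layer sizes, random initialization and helper names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Affine map W x + b with W stored row by row."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def entity_embedding(e_sta: Vector, e_dy: Vector, W: Matrix, b: Vector) -> Vector:
    """e = act(MLP([e_sta || e_dy])) with a single linear layer and tanh."""
    h = e_sta + e_dy  # list concatenation plays the role of ||
    return [math.tanh(v) for v in linear(h, W, b)]


rng = random.Random(0)
d_sta, d_dy, d_out = 3, 2, 4
W = [[rng.uniform(-0.5, 0.5) for _ in range(d_sta + d_dy)] for _ in range(d_out)]
b = [0.0] * d_out

# The entity's static part combined with its dynamic part at one timestamp.
e_t = entity_embedding([0.1, 0.2, 0.3], [0.0, -0.1], W, b)
print(e_t)
```

Sharing `e_sta` across timestamps while swapping only `e_dy` is what lets the same entity have different, yet related, embeddings at different times.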

4.3.2. Question Updating

Each path contains a different set of relations. After each hop, the question needs to discard the semantics already processed, i.e., we obtain the remaining subject and relation of the question.
Question updating for IMR-TransE. As shown in Figure 1, the subject and relation of the question after the $i$-th hop are updated based on Equation (5) as follows:

$$\mathbf{e}_s^i = \mathbf{e}_s^{i-1} + \mathbf{r}_{p_i}$$
$$\mathbf{r}_q^i = \mathbf{r}_q^{i-1} - \mathbf{r}_{p_i}$$

where the embeddings $\mathbf{e}_s^i$ and $\mathbf{r}_q^i$ represent the remaining subject and relation of the question after the $i$-hop path, respectively. Moreover, $\mathbf{e}_s^0 = \mathbf{e}_s$ and $\mathbf{r}_q^0 = \mathbf{r}_q$, $\mathbf{r}_{p_i}$ denotes the relation of the $i$-th hop of the path, and $i$ is the number of hops of the path.
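The IMR-TransE update rule can be sketched in a few lines of Python. The toy vectors are hypothetical; the example shows that each hop moves the subject forward by $\mathbf{r}_{p_i}$ and removes the same amount from the remaining relation, so $\mathbf{e}_s^i + \mathbf{r}_q^i$ stays equal to $\mathbf{e}_s + \mathbf{r}_q$, and a path whose relations sum to $\mathbf{r}_q$ leaves nothing unanswered.

```python
from typing import List, Tuple

Vector = List[float]


def update_question(e_s_prev: Vector, r_q_prev: Vector,
                    r_p: Vector) -> Tuple[Vector, Vector]:
    """One hop of IMR-TransE query updating:
    e_s^i = e_s^{i-1} + r_{p_i},  r_q^i = r_q^{i-1} - r_{p_i}."""
    e_s_new = [a + b for a, b in zip(e_s_prev, r_p)]
    r_q_new = [a - b for a, b in zip(r_q_prev, r_p)]
    return e_s_new, r_q_new


e_s, r_q = [0.0, 0.0], [1.0, 2.0]          # e_s^0 = e_s, r_q^0 = r_q
path_relations = [[0.5, 0.5], [0.5, 1.5]]  # r_{p_1}, r_{p_2}

for r_p in path_relations:
    e_s, r_q = update_question(e_s, r_q, r_p)
    # Invariant: the sum e_s^i + r_q^i is unchanged by every hop.
    assert [a + b for a, b in zip(e_s, r_q)] == [1.0, 2.0]

print(e_s, r_q)  # [1.0, 2.0] [0.0, 0.0]
```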

4.4. Path Scoring Module

For the question (Sub, Rel, ?, Tq), we search the 2-hop path (Sub, R1, Obj1, T1), (Obj1, R2, Obj2, T2). The pink box indicates that the original question and the tail of the path are combined into a quadruple to measure the plausibility of the searched tail, i.e., the question matching degree $f_{qmd}$. The purple box represents the comparison between the question's relation and the path's relations, which measures the semantic equivalence between the question and the path, i.e., the answer completion level $f_{ac}$. The green boxes compare the attributes of the same entity at different timestamps to measure the reliability of the searched path, i.e., the path confidence $f_{pc}$.
We evaluate path searching from three perspectives. First, the searched tails should match the original question, which means that a correct tail found by a path should form, together with the question, a quadruple that satisfies the consistent basic embedding model. Second, an ideal path should express semantics equivalent to the question's relation rather than merely reach the correct tail; that is, we must ensure that the path is semantically equivalent to the relation of the question. Finally, given the particularity of temporal KGs, the attributes of the same entity may change over time. The current sampling strategy for path searching samples triples with adjacent timestamps of the same entity. When an entity's attributes change significantly over time, it is inappropriate to apply this sampling strategy for the next hop, so we need to ensure that the same entity at different timestamps has similar properties within a path. Accordingly, IMR develops three indicators to measure the rationality of a reasoning path: the question matching degree, answer completion level, and path confidence. Although current methods, such as models based on reinforcement learning, may have complicated designs, their score functions are essentially a form of question matching degree. We provide a detailed analysis of the relationship between IMR and reinforcement-learning-based models in Appendix A.5.

4.4.1. Question Matching Degree

For the tails found by path searching, we need to measure how well they match the question, i.e., the question matching degree. In fact, the scoring functions applied by some traditional reinforcement learning methods are a type of question matching degree. As shown in the yellow box in Figure 2, for the entity $e_{p_i}^{t_i}$ found by a path with $i$ hops, we combine $e_{p_i}^{t_i}$ and the question $(e_s, r_q, ?, t_q)$ into a new quadruple $(e_s, r_q, e_{p_i}^{t_i}, t_q)$.
Question matching degree for IMR-TransE. The question matching degree $f_{qmd}$ in IMR-TransE calculates the distance of the constructed quadruple based on TransE [1]. The better the entity matches the question, the smaller the distance of the quadruple. $f_{qmd}$ for the $i$-th-hop path is calculated as follows:

$$f_{qmd}^i = \|\mathbf{e}_s^{t_q} + \mathbf{r}_q - \mathbf{e}_{p_i}^{t_i}\|_p$$

where the $p$-norm of a complex vector $\mathbf{V}$ is defined as $\|\mathbf{V}\|_p = \left(\sum_i |V_i|^p\right)^{1/p}$. We use the L1-norm for all indicators in the following.
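A minimal Python sketch of $f_{qmd}^i$ with the $p$-norm defined above. Python's `abs` returns the modulus of a complex number, so the same helper also covers complex-valued embeddings such as those of IMR-ComplEx; the toy vectors and function names are hypothetical.

```python
from typing import Sequence, Union

Num = Union[float, complex]


def lp_norm(v: Sequence[Num], p: float = 1.0) -> float:
    """||V||_p = (sum_i |V_i|^p)^(1/p); abs() gives the modulus for complex V_i."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def question_matching_degree(e_s_tq: Sequence[Num], r_q: Sequence[Num],
                             e_tail: Sequence[Num], p: float = 1.0) -> float:
    """f_qmd^i = ||e_s^{t_q} + r_q - e_{p_i}^{t_i}||_p (smaller = better match)."""
    return lp_norm([s + r - t for s, r, t in zip(e_s_tq, r_q, e_tail)], p)


print(question_matching_degree([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
print(lp_norm([3 + 4j]))  # 5.0
```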

4.4.2. Answer Completion Level

Among the paths that reach the correct tails, some are unrelated to the semantics of the question. Although these paths can infer the tail, they are invalid because they are semantically unrelated to the question. Therefore, IMR designs an index to measure the semantic relevance between a path and the question. The answer completion level $f_{ac}$ indicates whether the combination of path relations reflects the semantics of the question's relation. IMR takes the remaining relation of the question as the answer completion level, calculated from the distance between the path relations $r_{p_1}, r_{p_2}, \ldots$ and the question relation $r_q$. The less of the question's relation that remains, the more complete the answer given by the combination of path relations.
Answer completion level for IMR-TransE. $f_{ac}$ for the $i$-th-hop path in IMR-TransE is calculated as follows:

$$f_{ac}^i = \|\mathbf{r}_q - \mathbf{r}_{p_1} - \mathbf{r}_{p_2} - \mathbf{r}_{p_3} - \cdots - \mathbf{r}_{p_i}\|_p = \|\mathbf{r}_q^1 - \mathbf{r}_{p_2} - \mathbf{r}_{p_3} - \cdots - \mathbf{r}_{p_i}\|_p = \|\mathbf{r}_q^i\|_p$$
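The chain of equalities for $f_{ac}^i$ can be checked numerically: subtracting the path relations one hop at a time gives the same remainder as subtracting them all at once. A minimal Python sketch with hypothetical toy vectors; a 1-hop path covering only part of $\mathbf{r}_q$ leaves a nonzero remainder, while a 2-hop path whose relations sum to $\mathbf{r}_q$ completes the answer.

```python
from typing import List, Sequence

Vector = List[float]


def l1_norm(v: Sequence[float]) -> float:
    return sum(abs(x) for x in v)


def answer_completion_level(r_q: Vector, path_relations: List[Vector]) -> float:
    """f_ac^i = ||r_q - r_{p_1} - ... - r_{p_i}||_1 = ||r_q^i||_1."""
    remaining = list(r_q)
    for r_p in path_relations:  # peel off one hop at a time
        remaining = [a - b for a, b in zip(remaining, r_p)]
    return l1_norm(remaining)


r_q = [1.0, 2.0]
print(answer_completion_level(r_q, [[0.5, 0.5]]))              # 2.0
print(answer_completion_level(r_q, [[0.5, 0.5], [0.5, 1.5]]))  # 0.0
```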

4.4.3. Path Confidence

Path searching looks for the next-hop paths from the tail of the previous hop, and the current sampling strategy samples triples with adjacent timestamps of the same entity. However, the same entity deviates across timestamps in temporal KGs, and this sampling strategy is valid only when the entity has similar attributes at different timestamps. When an entity's attributes change significantly over time, it is inappropriate to continue the path search from it. A reasoning path is more reliable when these deviations are smaller. IMR therefore designs the path confidence $f_{pc}$, i.e., the error between the subject of the updated question $\mathbf{e}_s^i$ and the tail $\mathbf{e}_{p_i}^{t_i}$ of the path with $i$ hops.
Path confidence for IMR-TransE. $f_{pc}$ for the $i$-th-hop path in IMR-TransE is calculated as follows:

$$f_{pc}^i = \|\mathbf{e}_s^i - \mathbf{e}_{p_i}^{t_i}\|_p$$

where $\mathbf{e}_s^i$ represents the remaining subject of the question updated by a path of length $i$, and $\mathbf{e}_{p_i}^{t_i}$ represents the tail reached by the $i$-hop path.
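A minimal Python sketch of $f_{pc}^i$ with hypothetical toy vectors. With the updates of Section 4.3.2, and assuming $\mathbf{e}_s^0$ is the same vector as $\mathbf{e}_s^{t_q}$, the residual inside the question matching degree splits as $(\mathbf{e}_s^i - \mathbf{e}_{p_i}^{t_i}) + \mathbf{r}_q^i$, so by the triangle inequality $f_{qmd}^i \le f_{pc}^i + f_{ac}^i$; the example checks this numerically for a 1-hop path.

```python
from typing import List, Sequence

Vector = List[float]


def l1_norm(v: Sequence[float]) -> float:
    return sum(abs(x) for x in v)


def path_confidence(e_s_i: Vector, e_tail: Vector) -> float:
    """f_pc^i = ||e_s^i - e_{p_i}^{t_i}||_1: gap between where the translated
    subject lands and the embedding of the tail actually reached."""
    return l1_norm([a - b for a, b in zip(e_s_i, e_tail)])


e_s, r_q = [0.0, 0.0], [1.0, 2.0]
r_p1 = [0.5, 1.0]
e_s_1 = [a + b for a, b in zip(e_s, r_p1)]  # remaining subject after hop 1
r_q_1 = [a - b for a, b in zip(r_q, r_p1)]  # remaining relation after hop 1
tail = [0.5, 0.9]                           # hypothetical tail e_{p_1}^{t_1}

f_pc = path_confidence(e_s_1, tail)
f_ac = l1_norm(r_q_1)
f_qmd = l1_norm([s + r - t for s, r, t in zip(e_s, r_q, tail)])
print(f_pc, f_ac, f_qmd)
```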

4.4.4. Combination of Scores

IMR merges the indicators with positive weights to obtain the final score of each path, i.e., $f = w_{qmd} f_{qmd} + w_{ac} f_{ac} + w_{pc} f_{pc}$, where $w_{qmd}, w_{ac}, w_{pc} \in \mathbb{R}^+$.
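A minimal Python sketch of the weighted combination. The default weights of 1.0 are hypothetical placeholders (the text does not specify how the weights are chosen); the point illustrated is that one function with one set of weights scores paths of every length.

```python
def combined_score(f_qmd: float, f_ac: float, f_pc: float,
                   w_qmd: float = 1.0, w_ac: float = 1.0, w_pc: float = 1.0) -> float:
    """f = w_qmd * f_qmd + w_ac * f_ac + w_pc * f_pc with strictly positive weights.
    The same weights are used for every hop, so paths of any length share one criterion."""
    if min(w_qmd, w_ac, w_pc) <= 0:
        raise ValueError("IMR requires positive weights")
    return w_qmd * f_qmd + w_ac * f_ac + w_pc * f_pc


# A 1-hop and a 2-hop path scored with the same function and weights.
one_hop = combined_score(0.4, 1.2, 0.0)
two_hop = combined_score(0.3, 0.1, 0.2)
print(one_hop, two_hop)
```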
Entity aggregation for IMR. Because the searched paths may lead to entities with different timestamps, IMR aggregates the scores of the searched entities in two stages. First, the same entity with the same timestamp may be reached by different paths; since only one path matches the question best, IMR applies max aggregation over the paths reaching the same entity with the same timestamp. Second, different paths may reach the same entity with different timestamps, and IMR averages the scores of these time-labelled copies. Finally, IMR obtains the score of each entity at the question timestamp.
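A minimal Python sketch of the two-stage entity aggregation. It assumes path scores where higher is better, consistent with max aggregation selecting the best-matching path; the tuple layout `(entity, timestamp, score)` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_entity_scores(
        path_scores: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Stage 1: max over paths reaching the same entity with the same timestamp.
    Stage 2: mean over the different timestamps of the same entity."""
    best: Dict[Tuple[str, int], float] = {}
    for entity, t, score in path_scores:
        key = (entity, t)
        best[key] = max(score, best.get(key, float("-inf")))

    per_entity: Dict[str, List[float]] = defaultdict(list)
    for (entity, _t), score in best.items():
        per_entity[entity].append(score)
    return {entity: mean(vals) for entity, vals in per_entity.items()}


scores = aggregate_entity_scores([
    ("B", 1, 0.2), ("B", 1, 0.6),  # two paths reach B at t=1: keep 0.6
    ("B", 2, 0.4),                 # B at t=2
    ("C", 2, 0.3),
])
print(scores)  # B: mean(0.6, 0.4) = 0.5, C: 0.3
```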

4.5. Learning

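We use binary cross-entropy as the loss function, defined as

$$\mathcal{L} = -\frac{1}{|\mathcal{Q}|}\sum_{q \in \mathcal{Q}} \frac{1}{|\varepsilon_q^p|} \sum_{e_i \in \varepsilon_q^p} y_{e_i,q} \log\left(\frac{f_{e_i,q}}{\sum_{e_j \in \varepsilon_q^p} f_{e_j,q}}\right) - \frac{1}{|\mathcal{Q}|}\sum_{q \in \mathcal{Q}} \frac{1}{|\varepsilon_q^p|} \sum_{e_i \in \varepsilon_q^p} \left(1 - y_{e_i,q}\right) \log\left(1 - \frac{f_{e_i,q}}{\sum_{e_j \in \varepsilon_q^p} f_{e_j,q}}\right)$$

where $\varepsilon_q^p$ represents the set of entities reached by the selected paths, $y_{e_i,q}$ is a binary label indicating whether $e_i$ is the answer to $q$, and $\mathcal{Q}$ represents the training set. $f_{e_i,q}$ denotes the score of entity $e_i$ for question $q$ obtained as described in Section 4.4.4. We jointly learn the embeddings and other model parameters by end-to-end training.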
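A minimal Python sketch of this binary cross-entropy for a batch of questions, assuming each candidate entity in $\varepsilon_q^p$ comes with a positive score $f_{e_i,q}$ and a 0/1 label. Scores are normalized by their sum within each question, as in the loss formula of Section 4.5, and clipped by a small hypothetical `eps` to keep the logarithms finite; the function name and data layout are illustrative.

```python
import math
from typing import List, Tuple

Candidate = Tuple[float, int]  # (score f_{e_i,q} > 0, label y_{e_i,q})


def imr_bce_loss(batch: List[List[Candidate]], eps: float = 1e-12) -> float:
    """Binary cross-entropy over the entities reached by the selected paths,
    averaged within each question and then over the questions."""
    total = 0.0
    for candidates in batch:
        z = sum(f for f, _ in candidates)  # normaliser over eps_q^p
        q_loss = 0.0
        for f, y in candidates:
            prob = min(max(f / z, eps), 1.0 - eps)
            q_loss += y * math.log(prob) + (1 - y) * math.log(1.0 - prob)
        total += q_loss / len(candidates)
    return -total / len(batch)


# One question with two candidates; the first is the true answer.
print(imr_bce_loss([[(3.0, 1), (1.0, 0)]]))  # -log(0.75), about 0.288
```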
Regularization. For the same entity at different timestamps, the closer the timestamps are, the closer the dynamic embeddings should be [32]. IMR therefore applies a continuity regularization to the dynamic entity vectors, as follows:

$$\mathrm{reg} = \|\mathbf{e}_k^{t_j} - \mathbf{e}_k^{t_{j-1}}\|_p + \|\mathbf{e}_k^{t_j} - \mathbf{e}_k^{t_{j+1}}\|_p$$

where $\mathbf{e}_k^{t_j}$ denotes the dynamic embedding of the $k$-th entity at the $j$-th timestamp, and $\mathbf{e}_k^{t_{j-1}}$ and $\mathbf{e}_k^{t_{j+1}}$ denote its dynamic embeddings at the previous and next timestamps, respectively. $\|\cdot\|_p$ denotes the $p$-norm, and we set $p = 1$ in this paper.
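A minimal Python sketch of the continuity regularizer for one entity and timestamp. Boundary handling is an assumption: at the first or last timestamp, only the single existing neighbor is used. The storage layout `dyn[k][j]` and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def lp_norm(v: Sequence[float], p: float = 1.0) -> float:
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def continuity_reg(dyn: Dict[str, List[Vector]], k: str, j: int, p: float = 1.0) -> float:
    """reg = ||e_k^{t_j} - e_k^{t_{j-1}}||_p + ||e_k^{t_j} - e_k^{t_{j+1}}||_p,
    where dyn[k][j] is the dynamic embedding of entity k at timestamp index j."""
    current = dyn[k][j]
    reg = 0.0
    for nb in (j - 1, j + 1):
        if 0 <= nb < len(dyn[k]):  # skip neighbours outside the timeline
            reg += lp_norm([a - b for a, b in zip(current, dyn[k][nb])], p)
    return reg


dyn = {"A": [[0.0, 0.0], [0.1, 0.0], [0.5, 0.5]]}
print(continuity_reg(dyn, "A", 1))  # about 0.1 + 0.9 = 1.0
```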

5. Experiments

5.1. Datasets and Baselines

To evaluate the proposed framework, we use four standard temporal KG datasets drawn from the Integrated Crisis Early Warning System (ICEWS) [33], WIKI [34], and YAGO [35]. The ICEWS dataset contains information about political events with time annotations. We select two subsets of ICEWS, ICEWS14 and ICEWS18, which contain event facts from 2014 and 2018, respectively. WIKI and YAGO are temporal KGs that fuse information from Wikipedia with WordNet [36]. Following the experimental settings of HyTE [37], we use year-level granularity by dropping the month and date information. We compare IMR and the baseline methods on the temporal KG forecasting task on ICEWS14, ICEWS18, WIKI, and YAGO. Details of these datasets are listed in Table 1. We adopt the same dataset split strategy as in [38].
We compare the performance of IMR-TransE against the temporal KG reasoning models, including TTransE [34], TA-DistMult/TA-TransE [30], DE-SimplE [39], TNTComplEx [32], CyGNet [11], RE-Net [38], TANGO [40], TITer [14], and xERTR [7].
In the experiments, we use the widely adopted Mean Reciprocal Rank (MRR) and Hits@1/3/10 as metrics. As noted in xERTR [7], the filtered setting used for static KGs is not suitable for reasoning under the exploration setting. This paper therefore adopts the time-aware filtering scheme, which filters out only the genuine triples at the question time.
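A minimal Python sketch of time-aware filtered ranking and the MRR/Hits@k metrics. It assumes that higher entity scores are better and breaks ties optimistically (only strictly better entities count); the names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def time_aware_filtered_rank(scores: Dict[str, float], answer: str,
                             true_at_tq: Set[str]) -> int:
    """Rank of `answer` (1 = best). Other entities that are also true objects of
    (e_s, r_q, ?, t_q) at the question time t_q are filtered out; facts that were
    true only at other timestamps are NOT filtered."""
    s_ans = scores[answer]
    better = sum(1 for e, s in scores.items()
                 if e != answer and e not in true_at_tq and s > s_ans)
    return better + 1


def mrr_and_hits(ranks: List[int],
                 ks: Sequence[int] = (1, 3, 10)) -> Tuple[float, Dict[int, float]]:
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, hits


scores = {"A": 0.9, "B": 0.8, "C": 0.7}
# "A" is also a true object at t_q, so it is filtered; "B" still outranks "C".
print(time_aware_filtered_rank(scores, "C", {"A", "C"}))  # 2
print(mrr_and_hits([1, 2, 4]))
```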

5.2. Experimental Results

Main results. Table 2 and Table 3 compare IMR-TransE, IMR-RotatE, and IMR-ComplEx with the baseline models on ICEWS, WIKI, and YAGO. Overall, the instantiated IMR models outperform the baseline models on all metrics while being more interpretable, which verifies the effectiveness of IMR. Owing to space limitations, a detailed analysis of interpretability is provided in Appendix A.1. Compared with the best baseline (TITer), IMR-TransE achieves relative improvements of 3.3% in MRR and 2.5% in Hits@1, averaged over ICEWS, WIKI, and YAGO. Moreover, different IMR instantiations achieve the best performance on different datasets, owing to their different basic models.
Comparison of multi-hop paths. Figure 3 shows the performance of IMR-TransE on ICEWS, WIKI, and YAGO as the maximum path length increases. The performance generally keeps rising as the paths become longer. However, on ICEWS18 the performance hardly improves as the maximum path length increases. The analysis of ICEWS18 in [3] explains that there are no strong dependencies between the relations of the questions and the multi-hop paths, so longer paths provide little gain for inference. Moreover, as the maximum path length increases, the number of inference paths grows exponentially, and the many invalid paths suppress the performance of IMR-TransE. To prevent the performance from decreasing, we limit the number of sampled next-hop paths, which bounds the total number of multi-hop paths and suppresses the impact of noisy samples. We set the number of next-hop samples to 5. In summary, the experiments show that the unified indicators designed by IMR on consistent basic models can uniformly measure paths of different hops, enabling better reasoning over paths of different lengths, which verifies the claim in Section 4.4. An additional ablation study on the three indicators of IMR-TransE is presented in Appendix A.4.
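The capped next-hop sampling can be illustrated with the sketch below. It assumes a time-aware exponentially weighted sampler in which an edge observed at time $t$ is drawn with weight $\exp(-\lambda (t_q - t))$; the decay rate `decay` ($\lambda$), the edge format, and the restriction to edges strictly before $t_q$ are illustrative assumptions rather than details given in this section. With at most $k$ samples per hop, the number of paths of length up to $L$ is bounded by $k + k^2 + \cdots + k^L$.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Edge = Tuple[str, str, int]  # (relation, object, timestamp) leaving the current entity


def sample_next_hops(edges: Sequence[Edge], t_query: int, k: int = 5,
                     decay: float = 0.1,
                     rng: Optional[random.Random] = None) -> List[Edge]:
    """Draw at most k distinct outgoing edges without replacement.

    Each admissible edge is weighted by exp(-decay * (t_query - t)), so facts
    closer to the question time are more likely to be expanded.
    `decay` is a hypothetical hyperparameter used only for illustration.
    """
    rng = rng or random.Random(0)
    pool = [e for e in edges if e[2] < t_query]  # forecasting: use history only
    chosen: List[Edge] = []
    while pool and len(chosen) < k:
        weights = [math.exp(-decay * (t_query - e[2])) for e in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))  # pop -> sampling without replacement
    return chosen


def max_paths(k: int, max_len: int) -> int:
    """Upper bound on the number of paths with 1..max_len hops and k samples per hop."""
    return sum(k ** h for h in range(1, max_len + 1))
```

With the setting used in the paper ($k = 5$), `max_paths(5, 2)` is 30 and `max_paths(5, 3)` is 155, which shows how the cap keeps the path set manageable as the maximum length grows.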

6. Conclusions

We propose an Interpretable Multi-Hop Reasoning (IMR) framework for temporal KG forecasting. IMR transforms reasoning based on path searching into stepwise question answering based on consistent basic models. Moreover, IMR develops three indicators to measure the answers and reasoning paths; this is the first study to develop interpretable evaluation indicators from the perspective of actual semantics for the temporal KG forecasting task. IMR can measure paths of different hops according to the same criteria and is more explainable. Extensive experiments on four benchmark datasets demonstrate the effectiveness of our method. In the future, we plan to enhance prediction by integrating different paths that reach the same tail, which we expect to be more effective and interpretable. We will also continue to explore GAT-based models [3] for temporal KG forecasting.

Author Contributions

Supervision, Z.L. and K.H.; writing—review and editing, Z.D. and L.Q.; validation, L.C. and Z.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Case Studies and Interpretability

For the question (John Kerry, Make a visit, ?, 2014-11-11), we extract some of the searched paths for the case study in Table A1. In Table A1, lower scores and indicator values indicate better paths. We compare the paths based on the total score, analyze various aspects of the paths based on the detailed indicators, and verify the model's interpretation against the actual semantics.
The first block of Table A1 selects reasoning paths with the same objects to analyze the answer completion level. First, we compare paths 1-1 and 1-2. The score of path 1-1 is lower than that of path 1-2. Analyzing the three indicators further, we find that the answer completion level of path 1-1 is smaller than that of path 1-2, which indicates that the relation of path 1-1 should be closer to the relation of the question. In practice, path 1-1 has the same relation as the question, which is closer to the question's relation than that of path 1-2. Thus, the actual semantics verify the interpretation of the model. Comparing paths 1-4 and 1-5, we find that the total score of path 1-4 is lower than that of path 1-5, and the answer completion level of path 1-5 is higher than that of path 1-4. IMR thus indicates that the combination of reasoning relations in path 1-4 is better than that in path 1-5. In fact, neither path seems particularly appropriate for the question. Nevertheless, the relation combination [Meet at a 'third' location + Make a visit] is indeed closer to the question's relation [Make a visit] than the combination [Consult + Consult]. To summarize, the first set of experiments shows that the answer completion level can effectively indicate how well the combination of path relations matches the question's relation, verifying the statement in Section 4.4.
Table A1. Reasoning paths searched for the question (John Kerry, Make a visit, ?, 2014-11-11) and their scores, respectively.
Question: (John Kerry, Make a visit, Oman, 2014-11-11)
Path ID | Subject | Relation 1 | Object 1 | Time 1 | Subject 2 | Relation 2 | Object 2 | Time 2 | $f_{ac}$ | $f_{qmd}$ | $f_{pc}$ | Combined score
path 1-1 | John Kerry | Make a visit | Oman | 2014-11-09 | - | - | - | - | 0 | 74 | 74 | 137
path 1-2 | John Kerry | Express intent to meet or negotiate | Oman | 2014-11-09 | - | - | - | - | 26 | 74 | 69 | 169
path 1-3 | John Kerry | (Reversed) Host a visit | Oman | 2014-11-09 | - | - | - | - | 27 | 74 | 76 | 178
path 1-4 | John Kerry | Meet at a 'third' location | Catherine Ashton | 2014-11-10 | Catherine Ashton | Make a visit | Oman | 2014-11-09 | 38 | 74 | 90 | 206
path 1-5 | John Kerry | Consult | Mohammad Javad Zarif | 2014-11-10 | Mohammad Javad Zarif | Consult | Oman | 2014-11-09 | 73 | 74 | 107 | 254
path 2-1 | John Kerry | Express intent to meet or negotiate | Oman | 2014-11-10 | - | - | - | - | 26 | 47 | 41 | 119
path 2-2 | John Kerry | Express intent to meet or negotiate | Oman | 2014-11-09 | - | - | - | - | 26 | 74 | 69 | 170
path 2-3 | John Kerry | Express intent to meet or negotiate | Oman | 2014-11-05 | - | - | - | - | 26 | 89 | 83 | 196
path 2-4 | John Kerry | Express intent to meet or negotiate | Oman | 2014-11-02 | - | - | - | - | 26 | 90 | 82 | 197
path 2-5 | John Kerry | Reversed Meet at a 'third' location | Catherine Ashton | 2014-11-10 | Catherine Ashton | Express intent to meet or negotiate | Oman | 2014-11-03 | 49 | 91 | 101 | 246
path 2-6 | John Kerry | Reversed Meet at a 'third' location | Catherine Ashton | 2014-11-10 | Catherine Ashton | Express intent to meet or negotiate | Oman | 2014-11-05 | 49 | 89 | 100 | 242
path 2-7 | John Kerry | Make a visit | China | 2014-11-05 | - | - | - | - | 0 | 88 | 88 | 162
path 2-8 | John Kerry | Make a visit | North Atlantic Treaty Organization | 2014-06-25 | - | - | - | - | 0 | 87 | 87 | 160
path 2-9 | John Kerry | Make a visit | Canada | 2014-10-27 | - | - | - | - | 0 | 85 | 85 | 157
path 3-1 | John Kerry | Reversed Meet at a 'third' location | Catherine Ashton | 2014-11-10 | - | - | - | - | 53 | 46 | 40 | 155
path 3-2 | John Kerry | Express intent to meet or negotiate | Oman | 2014-11-09 | - | - | - | - | 26 | 74 | 69 | 169
path 3-3 | John Kerry | Make a visit | Afghanistan | 2014-07-21 | - | - | - | - | 0 | 88 | 88 | 162
path 3-4 | John Kerry | Make a visit | Afghanistan | 2014-07-21 | Afghanistan | Reversed Make statement | Barack Obama | 2014-07-18 | 34 | 94 | 104 | 241
path 3-5 | John Kerry | Make a visit | Angola | 2014-08-05 | Angola | (Reversed) Make statement | Anthony Foxx | 2014-08-04 | 35 | 93 | 105 | 241
path 3-6 | John Kerry | (Reversed) Make a visit | Catherine Ashton | 2014-11-10 | Catherine Ashton | Make a visit | Oman | 2014-11-09 | 33 | 74 | 85 | 197
Table A2. Reasoning paths searched for the query (Citizen (Nigeria), Use unconventional violence, ?, 8016) and their scores, respectively.
Query: (Citizen (Nigeria), Use unconventional violence, Secretariat (Nigeria), 8016)
Path ID | Subject | Relation | Object | Time | $f_{ac}$ | $f_{qmd}$ | $f_{pc}$ | Combined score
path 4-1 | Citizen (Nigeria) | Use unconventional violence | Militant (Nigeria) | 7968 | 0 | 162 | 162 | 215
path 4-1 | Citizen (Nigeria) | Use unconventional violence | Militant (Nigeria) | 7728 | 0 | 185 | 185 | 245
path 5-2 | Citizen (Nigeria) | Reversed Use unconventional violence | Terrorist (Boko Haram) | 7824 | 72 | 204 | 199 | 359
path 5-3 | Citizen (Nigeria) | Reversed Use unconventional violence | Terrorist (Boko Haram) | 7776 | 72 | 174 | 168 | 319
path 5-4 | Citizen (Nigeria) | Reversed Use unconventional violence | Militant (Boko Haram) | 7872 | 72 | 206 | 202 | 363
path 5-5 | Citizen (Nigeria) | Reversed Use unconventional violence | Militant (Boko Haram) | 7776 | 72 | 173 | 166 | 317
path 5-6 | Citizen (Nigeria) | Reversed Use unconventional violence | Militant (Boko Haram) | 7752 | 72 | 175 | 167 | 319
path 6-1 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7992 | 73 | 95 | 95 | 220
path 6-2 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7872 | 73 | 174 | 168 | 321
path 6-3 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7848 | 73 | 177 | 171 | 324
path 6-4 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7824 | 73 | 178 | 171 | 325
path 6-5 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7680 | 73 | 180 | 173 | 328
path 7-1 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7848 | 73 | 177 | 171 | 324
path 7-2 | Citizen (Nigeria) | Make an appeal or request | Government (Nigeria) | 7848 | 78 | 167 | 158 | 315
path 7-3 | Citizen (Nigeria) | Reversed fight with small arms and light weapons | Boko Haram | 7848 | 73 | 177 | 171 | 324
path 7-4 | Citizen (Nigeria) | Reversed Make an appeal or request | Tony Momoh | 7848 | 80 | 207 | 205 | 377
path 7-5 | Citizen (Nigeria) | Reversed Express intent to meet or negotiate | South Africa | 7848 | 85 | 169 | 165 | 330
path 7-6 | Citizen (Nigeria) | Reversed Bring lawsuit against | Fessehaye Yohannes | 7848 | 80 | 210 | 206 | 379
The second block of Table A1 selects paths with the same reasoning relations to verify the path confidence and the question matching degree. Comparing paths 2-1, 2-2, 2-3, and 2-4, we observe that the scores of the paths increase, and the path confidence of these four paths also increases overall. In fact, the time distance between these paths and the question gradually increases, which means that the reliability of the paths gradually decreases. The reliability indicated by the path confidence is thus consistent with the actual reliability. Similarly, the path confidence of path 2-5 is higher than that of path 2-6, indicating that path 2-5 is less reliable. In reality, the timestamp of path 2-5 is farther from that of the question (2014-11-03 < 2014-11-05), which is consistent with this explanation. Comparing path 2-9 with paths 2-7 and 2-8, the model further infers that both the path confidence and the question matching degree of path 2-9 are better than those of the other two paths. In reality, the time distance to the question satisfies path 2-8 > path 2-9 > path 2-7. Path 2-9 is nevertheless ranked above path 2-7 because the question matching degree covers the path confidence: since the path confidence contains the error of the triple in the training dataset, this triple error outweighs the error caused by different timestamps, which makes path 2-9 more reliable than path 2-7. In general, the second set of experiments illustrates that the path confidence can effectively indicate the validity of each path.
In the third block of Table A1, we randomly select paths, explain them based on the indicators, and verify the explanations against the actual situation. We first sort paths 3-1 to 3-3 by the answer completion level: path 3-3 < path 3-2 < path 3-1. Therefore, the semantic similarity between the relations of these paths and that of the question should satisfy path 3-3 > path 3-2 > path 3-1. The actual semantic similarity to the question's relation satisfies Make a visit > Express intent to meet or negotiate > Meet at a 'third' location, which is consistent with the interpretation of IMR. Sorting the same three paths by path confidence gives path 3-1 < path 3-2 < path 3-3, so their reliability should satisfy path 3-1 > path 3-2 > path 3-3. We observe that the time distance between the three paths and the question gradually increases, which verifies the explanation given by the path confidence. The analysis of paths 3-4 to 3-6 is similar. These case studies show that IMR can provide reasoning paths and offer a valid basis for path comparison.

Appendix A.2. Details on IMR-RotatE and IMR-ComplEx

Appendix A.2.1. IMR-RotatE

RotatE. RotatE [17] defines each relation as a rotation from head entities to tail entities in a complex vector space. Given a triplet $(h, r, t)$, we expect that $\mathbf{t} = \mathbf{h} \circ \mathbf{r}$, where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^k$ are the embeddings, the modulus of each dimension of the relation satisfies $|r_i| = 1$, and $\circ$ denotes the Hadamard product. The score function for $(e_s, r_q, e_o, t_q)$ is
$$f_{r_q}\big(\mathbf{e}_s^{t_q}, \mathbf{e}_o^{t_q}\big) = \big\| \mathbf{e}_s^{t_q} \circ \mathbf{r}_q - \mathbf{e}_o^{t_q} \big\|_2$$
where $\mathbf{e}_s^{t_q}, \mathbf{r}_q, \mathbf{e}_o^{t_q} \in \mathbb{C}^k$ and $|r_{q,i}| = 1$.
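The RotatE score can be checked numerically with the sketch below. It represents complex vectors as plain Python lists and parameterizes each relation dimension by a phase $\theta_i$, so that $r_i = e^{\mathrm{i}\theta_i}$ satisfies $|r_i| = 1$ by construction; the function names and this parameterization are illustrative choices.

```python
import cmath
import math
from typing import List, Sequence


def rotation(phases: Sequence[float]) -> List[complex]:
    """Unit-modulus relation vector r with r_i = exp(i * theta_i)."""
    return [cmath.exp(1j * th) for th in phases]


def rotate_score(h: Sequence[complex], r: Sequence[complex],
                 t: Sequence[complex]) -> float:
    """Distance ||h o r - t||_2, where o is the element-wise (Hadamard) product."""
    return math.sqrt(sum(abs(hi * ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
```

For instance, with $\mathbf{h} = (1, \mathrm{i})$ and phases $(\pi/2, \pi)$, the rotated head is $(\mathrm{i}, -\mathrm{i})$ up to floating-point error, so `rotate_score` returns a value close to 0 for $\mathbf{t} = (\mathrm{i}, -\mathrm{i})$.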
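Question updating for IMR-RotatE.
$$\mathbf{e}_s^{i} = \mathbf{e}_s^{i-1} \circ \mathbf{r}_{p_i}$$
$$\mathbf{r}_q^{i} = \mathbf{r}_q^{i-1} \circ \overline{\mathbf{r}}_{p_i}$$
where $\overline{\mathbf{r}}_{p_i}$ is the complex conjugate of $\mathbf{r}_{p_i}$, i.e., its inverse rotation, since every dimension of $\mathbf{r}_{p_i}$ has unit modulus.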
Question matching degree for IMR-RotatE. The question matching degree $f_{qmd}$ in IMR-RotatE calculates the distance of the constructed quadruple based on RotatE [17]. The better the entity matches the question, the smaller the distance of the quadruple. The calculation of $f_{qmd}$ for the $i$th-hop path is as follows:
$$f_{qmd}^{i} = \big\| \mathbf{e}_s^{t_q} \circ \mathbf{r}_q - \mathbf{e}_{p_i}^{t_i} \big\|_p$$
where the $p$-norm of a complex vector $\mathbf{V}$ is defined as $\|\mathbf{V}\|_p = \sqrt[p]{\sum_i |V_i|^p}$. We use the L1-norm for all indicators in the following.
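Answer completion level for IMR-RotatE. The calculation of $f_{ac}$ for the $i$th-hop path in IMR-RotatE is as follows:
$$f_{ac}^{i} = \big\| \mathbf{r}_q - \mathbf{r}_{p_1} \circ \mathbf{r}_{p_2} \circ \mathbf{r}_{p_3} \circ \cdots \circ \mathbf{r}_{p_i} \big\|_p = \big\| \mathbf{r}_q^{1} - \mathbf{r}_{p_2} \circ \mathbf{r}_{p_3} \circ \cdots \circ \mathbf{r}_{p_i} \big\|_p = \big\| \mathbf{r}_q^{2} - \mathbf{r}_{p_3} \circ \cdots \circ \mathbf{r}_{p_i} \big\|_p = \cdots = \big\| \mathbf{r}_q^{i} - \mathbf{1} \big\|_p$$
where $\mathbf{1}$ is the all-ones vector (the identity rotation). Each equality holds because multiplying by the unit-modulus conjugate $\overline{\mathbf{r}}_{p_j}$ does not change the modulus of any dimension.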
Path confidence for IMR-RotatE. The calculation of $f_{pc}$ for the $i$th-hop path in IMR-RotatE is as follows:
$$f_{pc}^{i} = \big\| \mathbf{e}_s^{i} - \mathbf{e}_{p_i}^{t_i} \big\|_p$$
where $\mathbf{e}_s^{i}$ represents the remaining subject of the question after updating with a path of length $i$, and $\mathbf{e}_{p_i}^{t_i}$ represents the tail reached by the $i$-hop path.
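The stepwise question updating and the three IMR-RotatE indicators can be sketched as below, using the L1-norm. This is an illustrative re-implementation under stated assumptions: the inverse rotation is taken as the complex conjugate, the answer completion level is evaluated as $\|\mathbf{r}_q^{i} - \mathbf{1}\|_1$ (which equals $\|\mathbf{r}_q - \mathbf{r}_{p_1} \circ \cdots \circ \mathbf{r}_{p_i}\|_1$ for unit-modulus relations), and the helper names `update_question` and `imr_rotate_indicators` are hypothetical.

```python
from typing import List, Sequence, Tuple

CVec = List[complex]


def l1(v: Sequence[complex]) -> float:
    """L1-norm of a complex vector: sum of element moduli."""
    return sum(abs(x) for x in v)


def hadamard(a: Sequence[complex], b: Sequence[complex]) -> CVec:
    """Element-wise product of two complex vectors."""
    return [x * y for x, y in zip(a, b)]


def update_question(e_s: Sequence[complex], r_q: Sequence[complex],
                    r_p: Sequence[complex]) -> Tuple[CVec, CVec]:
    """One hop: e_s^i = e_s^{i-1} o r_p and r_q^i = r_q^{i-1} o conj(r_p)."""
    return hadamard(e_s, r_p), hadamard(r_q, [x.conjugate() for x in r_p])


def imr_rotate_indicators(e_s: Sequence[complex], r_q: Sequence[complex],
                          path_rels: Sequence[Sequence[complex]],
                          tail: Sequence[complex]) -> Tuple[float, float, float]:
    """Return (f_qmd, f_ac, f_pc) for an i-hop path ending at embedding `tail`."""
    # question matching degree: ||e_s o r_q - e_{p_i}||_1
    f_qmd = l1([a - b for a, b in zip(hadamard(e_s, r_q), tail)])
    cur_e, cur_r = list(e_s), list(r_q)
    for r_p in path_rels:  # stepwise question updating along the path
        cur_e, cur_r = update_question(cur_e, cur_r, r_p)
    # answer completion level: remaining relation vs. the identity rotation (assumed form)
    f_ac = l1([x - 1 for x in cur_r])
    # path confidence: remaining subject vs. the tail actually reached
    f_pc = l1([a - b for a, b in zip(cur_e, tail)])
    return f_qmd, f_ac, f_pc
```

As a sanity check, a two-hop path whose composed rotation equals $\mathbf{r}_q$ and whose tail equals $\mathbf{e}_s \circ \mathbf{r}_q$ yields (numerically) zero for all three indicators, and one update step leaves $\mathbf{e}_s^{i} \circ \mathbf{r}_q^{i}$ unchanged.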

Appendix A.2.2. IMR-ComplEx

ComplEx. ComplEx [15] extends the real space to the complex space and constrains the embeddings for relations to be diagonal matrices. The bilinear product becomes a Hermitian product in the complex space. The score function for e s , r q , e o , t q can be expressed as
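$$f_{r_q}\big(\mathbf{e}_s^{t_q}, \mathbf{e}_o^{t_q}\big) = \mathrm{Re}\Big( {\mathbf{e}_s^{t_q}}^{\top} \operatorname{diag}(\mathbf{r}_q)\, \overline{\mathbf{e}_o^{t_q}} \Big)$$
where $\mathbf{e}_s^{t_q}, \mathbf{r}_q, \mathbf{e}_o^{t_q} \in \mathbb{C}^k$ and $\overline{\,\cdot\,}$ denotes complex conjugation, as required by the Hermitian product.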
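A minimal sketch of the ComplEx score, again with complex vectors as Python lists. Multiplying by $\operatorname{diag}(\mathbf{r}_q)$ reduces to an element-wise product, so the Hermitian product becomes a single sum; the function name is illustrative.

```python
from typing import Sequence


def complex_score(h: Sequence[complex], r: Sequence[complex],
                  t: Sequence[complex]) -> float:
    """Re(h^T diag(r) conj(t)) = Re(sum_i h_i * r_i * conj(t_i))."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real
```

Because of the conjugate, the score is not symmetric in head and tail: with $\mathbf{h} = (1)$, $\mathbf{r} = (\mathrm{i})$ and $\mathbf{t} = (\mathrm{i})$ the score is 1, whereas swapping head and tail gives $-1$. This is how ComplEx models antisymmetric relations.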
Question updating for IMR-ComplEx. Because such tensor decomposition models are difficult to interpret geometrically, the indicators of IMR-ComplEx are not computed stepwise. Instead, the indicators of each path are calculated independently, which increases the amount of computation to some extent.
Question matching degree for IMR-ComplEx. The question matching degree $f_{qmd}$ in IMR-ComplEx scores the constructed quadruple based on ComplEx [15]. The calculation of $f_{qmd}$ for the $i$th-hop path is as follows:
$$f_{qmd}^{i} = \mathrm{Re}\Big( {\mathbf{e}_s^{t_q}}^{\top} \operatorname{diag}(\mathbf{r}_q)\, \overline{\mathbf{e}_{p_i}^{t_i}} \Big)$$
As in IMR-RotatE, the $p$-norm of a complex vector $\mathbf{V}$ is defined as $\|\mathbf{V}\|_p = \sqrt[p]{\sum_i |V_i|^p}$, and we use the L1-norm for the remaining indicators.
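Answer completion level for IMR-ComplEx. The calculation of $f_{ac}$ for the $i$th-hop path in IMR-ComplEx is as follows:
$$f_{ac}^{i} = \big\| \mathbf{r}_q - \mathbf{r}_{p_1} \times \mathbf{r}_{p_2} \times \mathbf{r}_{p_3} \times \cdots \times \mathbf{r}_{p_i} \big\|_p$$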
Path confidence for IMR-ComplEx. The calculation of $f_{pc}$ for the $i$th-hop path in IMR-ComplEx is as follows:
$$f_{pc}^{i} = \big\| \mathbf{e}_s^{i} - \mathbf{e}_{p_i}^{t_i} \big\|_p$$
where $\mathbf{e}_s^{i}$ represents the remaining subject of the question after updating with a path of length $i$, and $\mathbf{e}_{p_i}^{t_i}$ represents the tail reached by the $i$-hop path.
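The non-stepwise IMR-ComplEx indicators can be sketched as follows. Assumptions, all illustrative: $\times$ is read as the element-wise product of the diagonal relation matrices, the L1-norm is used for $f_{ac}$ and $f_{pc}$, and the remaining subject $\mathbf{e}_s^{i}$ is passed in by the caller because its construction for ComplEx is not spelled out in this appendix. The function name is hypothetical.

```python
from functools import reduce
from typing import Sequence, Tuple


def l1(v: Sequence[complex]) -> float:
    """L1-norm of a complex vector."""
    return sum(abs(x) for x in v)


def imr_complex_indicators(e_s: Sequence[complex], r_q: Sequence[complex],
                           path_rels: Sequence[Sequence[complex]],
                           tail: Sequence[complex],
                           e_s_i: Sequence[complex]) -> Tuple[float, float, float]:
    """Compute (f_qmd, f_ac, f_pc) for one path, independently of other paths."""
    # f_qmd: ComplEx score Re(e_s^T diag(r_q) conj(tail))
    f_qmd = sum(a * b * c.conjugate() for a, b, c in zip(e_s, r_q, tail)).real
    # f_ac: compose the path's diagonal relation matrices (assumed element-wise), compare with r_q
    composed = reduce(lambda acc, r: [x * y for x, y in zip(acc, r)], path_rels)
    f_ac = l1([a - b for a, b in zip(r_q, composed)])
    # f_pc: remaining subject (supplied by the caller) vs. the reached tail
    f_pc = l1([a - b for a, b in zip(e_s_i, tail)])
    return f_qmd, f_ac, f_pc
```

Unlike the RotatE case, nothing here is carried from one hop to the next, which mirrors the per-path computation described above and explains the extra cost.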

Appendix A.3. Entity Representation

We denote the static embedding of entity $e_k$ by $\mathbf{e}^{\mathrm{sta}}_k \in \mathbb{R}^d$, a vector independent of time. IMR-TransE adopts the static embedding used in xERTR [7]. xERTR adopts the generic time encoding proposed in [41] to generate the time-variant part of entity representations, denoted as $\Phi(t)$:
$$\Phi(t) = \sqrt{\tfrac{1}{d}}\,\big[\cos(\omega_1 t + \phi_1), \ldots, \cos(\omega_d t + \phi_d)\big], \quad \Phi(t) \in \mathbb{R}^d$$
where $\omega_i$ and $\phi_i$ ($i = 1, 2, \ldots, d$) denote the frequencies and phase shifts of the time encoding, respectively. With this time encoding, quadruples with the same subject, predicate, and object can have different attention scores; specifically, quadruples that occurred recently tend to have higher attention scores. This makes the embedding more interpretable and effective.
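The time encoding $\Phi(t)$ can be written directly from its definition. In this sketch the frequencies and phase shifts are fixed inputs, whereas in a trained model they would be learnable parameters; the function name is illustrative.

```python
import math
from typing import List, Sequence


def time_encoding(t: float, omegas: Sequence[float],
                  phis: Sequence[float]) -> List[float]:
    """Phi(t) = sqrt(1/d) * [cos(w_1 t + phi_1), ..., cos(w_d t + phi_d)]."""
    d = len(omegas)
    scale = math.sqrt(1.0 / d)
    return [scale * math.cos(w * t + p) for w, p in zip(omegas, phis)]
```

Because every component is scaled by $\sqrt{1/d}$, the encoding satisfies $\|\Phi(t)\|_2 \le 1$ for any $t$, so the time-variant part stays on the same scale regardless of the embedding dimension.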
In fact, the assumption that attribute deviation grows with time deviation is only a statistical observation; it is the semantic attributes of entities that determine the reasoning. To avoid being affected only by time factors, we propose a new time-specific entity representation $\Psi_k(t) \in \mathbb{R}^d$, i.e., each entity has a different representation at different timestamps. Assigning every entity a separate representation at every timestamp would consume enormous resources. Since most entities are observed only at a limited number of timestamps, we learn time-specific representations only for the timestamps at which an entity appears in the training dataset. For timestamps missing from the training dataset, IMR uses the entity's embedding at its most recent occurrence in the training dataset. Moreover, we apply a time-continuity regularization to avoid the over-fitting caused by the large number of parameters. This regularization assumes that temporally continuous entities should have closer embeddings, as described in Section 4.5. Finally, we combine $\Phi(t)$ and $\Psi_k(t)$ to construct $\mathbf{e}^{\mathrm{dy}}_k(t) \in \mathbb{R}^{2d}$:
$$\mathbf{e}^{\mathrm{dy}}_k(t) = \Phi(t) \,\|\, \Psi_k(t)$$
where $\|$ denotes concatenation. In summary, the embedding of each entity $e_k$ at time $t$ can be represented as follows:
$$\mathbf{e}_k^{t} = \mathrm{act}\Big(\mathrm{MLP}\big([\mathbf{e}^{\mathrm{sta}}_k \,\|\, \mathbf{e}^{\mathrm{dy}}_k(t)]\big)\Big)$$
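The composition of an entity representation can be sketched as below. Assumptions, all illustrative and not taken from the source: $\Psi_k(t)$ is looked up in a per-entity table keyed by sorted training timestamps, a missing timestamp falls back to the most recent earlier training timestamp (or to the earliest one if $t$ precedes all of them), the MLP is a single linear layer, and `act` is tanh. The function names are hypothetical.

```python
import bisect
import math
from typing import Dict, List, Sequence


def psi_lookup(train_times: Sequence[int], table: Dict[int, List[float]],
               t: int) -> List[float]:
    """Time-specific embedding Psi_k(t); train_times must be sorted ascending."""
    i = bisect.bisect_right(train_times, t) - 1  # index of last training timestamp <= t
    return table[train_times[max(i, 0)]]          # assumed fallback: earliest timestamp


def entity_embedding(e_sta: Sequence[float], phi_t: Sequence[float],
                     psi_t: Sequence[float], weight: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> List[float]:
    """e_k^t = act(MLP([e_sta || Phi(t) || Psi_k(t)])) with a one-layer MLP and tanh."""
    x = list(e_sta) + list(phi_t) + list(psi_t)  # e_dy = Phi(t) || Psi_k(t) has size 2d
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + b)
            for row, b in zip(weight, bias)]
```

For an entity seen at training timestamps 3, 7, and 10, a query at $t = 8$ reuses the embedding from timestamp 7, and a query after the training period ($t = 12$) reuses the one from timestamp 10.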
Entity timestamps in actual datasets are sparse; e.g., in ICEWS14 and YAGO, each entity appears at only 11 and 21 timestamps on average, respectively. In view of the large memory usage, we reduce the number of parameters with basis vectors in our implementation: the entities' dynamic embeddings are linear combinations of 50 shared basis vectors. Table A3 shows the memory usage in the ablation experiments on entity-time-specific embeddings.
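The basis-vector reduction can be sketched as follows: each time-specific embedding is stored as 50 mixing coefficients over shared basis vectors instead of a full $d$-dimensional vector. The helper names and the example sizes below are hypothetical; the actual saving depends on the embedding dimension $d$, which is not restated in this appendix.

```python
from typing import List, Sequence, Tuple


def combine_basis(coeffs: Sequence[float],
                  basis: Sequence[Sequence[float]]) -> List[float]:
    """Psi = sum_b coeffs[b] * basis[b] (a linear combination of shared vectors)."""
    dim = len(basis[0])
    return [sum(c * v[j] for c, v in zip(coeffs, basis)) for j in range(dim)]


def parameter_counts(n_entity_times: int, dim: int,
                     n_basis: int = 50) -> Tuple[int, int]:
    """(full, basis) parameter counts for n_entity_times (entity, timestamp) pairs."""
    full = n_entity_times * dim                          # one d-vector per pair
    shared = n_entity_times * n_basis + n_basis * dim    # coefficients + shared basis
    return full, shared
```

For example, with a hypothetical 100,000 entity-timestamp pairs and $d = 200$, `parameter_counts(100_000, 200)` gives `(20000000, 5010000)`; the basis form only saves memory when $d$ exceeds the number of basis vectors.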
Table A3. The memory usage of the ablation experiments on entity-time-specific embeddings.
Dataset | Ent-Time-Specific | Non-Ent-Time-Specific | Memory Increment
ICEWS14 | 45.21 G | 39.84 G | 5.37 G
ICEWS18 | 61.45 G | 46.00 G | 15.37 G
WIKI | 54.39 G | 21.36 G | 33.03 G
YAGO | 38.40 G | 26.60 G | 11.80 G
Using the entities' dynamic embeddings thus adds between 5.37 G and 33.03 G of memory, which remains within an affordable range.

Appendix A.4. Combination of Indicators

The three indicators measure different aspects of a path: the matching degree between the answer and the question, the completeness of the relational equivalence, and the reliability of the reasoning path. We verify the contribution of each indicator through ablation experiments. In Table A4 and Table A5, the first block reports the performance with a single indicator, the second block reports combinations of two indicators, and the last block reports the combination of all three indicators. The bottom row shows the gap between the three-indicator combination and the best result in each column. Because the data distributions differ across datasets, the performance varies noticeably when a single indicator is used to rank paths. Combining the indicators in pairs improves performance significantly, although some differences remain. By combining all three indicators, IMR-TransE obtains the best inference performance on most datasets. In summary, the experiment illustrates that the combination of the three indicators designed in IMR-TransE can effectively measure the reasoning paths.
Table A4. The comparison of the three indicators in different combinations between YAGO and ICEWS14 datasets. We average the output of ten experiments with different random seeds and fixed hyperparameters. All metrics are multiplied by 100.
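Indicator | YAGO Hit@1 | YAGO Hit@3 | YAGO Hit@10 | YAGO MRR | ICEWS14 Hit@1 | ICEWS14 Hit@3 | ICEWS14 Hit@10 | ICEWS14 MRR
$f_{qmd}$ | 87.32 | 92.53 | 92.76 | 89.87 | 22.61 | 39.20 | 55.32 | 33.48
$f_{ac}$ | 87.79 | 92.67 | 92.78 | 90.18 | 31.67 | 46.02 | 59.21 | 41.05
$f_{pc}$ | 87.74 | 92.67 | 92.77 | 90.15 | 25.65 | 43.03 | 58.25 | 36.63
$f_{ac}$, $f_{qmd}$ | 87.95 | 92.67 | 92.77 | 90.26 | 34.91 | 49.26 | 61.12 | 43.82
$f_{pc}$, $f_{qmd}$ | 87.74 | 92.67 | 92.75 | 90.15 | 25.64 | 43.16 | 58.30 | 36.63
$f_{ac}$, $f_{pc}$ | 87.91 | 92.65 | 92.77 | 90.24 | 34.81 | 49.02 | 61.15 | 43.74
$f_{ac}$, $f_{pc}$, $f_{qmd}$ | 88.31 | 92.66 | 92.77 | 90.48 | 34.96 | 49.27 | 61.09 | 43.89
Distance to the best | 0 | 0.01 | 0 | 0 | 0 | 0 | 0.06 | 0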
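The "Distance to the best" row can be read as the gap between the best value in each column and the value of the three-indicator combination. The sketch below applies this reading to the ICEWS14 columns of Table A4; the dictionary copies those table values, and rounding to two decimals is our choice.

```python
from typing import Dict, List

# ICEWS14 columns of Table A4: Hit@1, Hit@3, Hit@10, MRR (all x100)
ICEWS14: Dict[str, List[float]] = {
    "f_qmd": [22.61, 39.20, 55.32, 33.48],
    "f_ac": [31.67, 46.02, 59.21, 41.05],
    "f_pc": [25.65, 43.03, 58.25, 36.63],
    "f_ac,f_qmd": [34.91, 49.26, 61.12, 43.82],
    "f_pc,f_qmd": [25.64, 43.16, 58.30, 36.63],
    "f_ac,f_pc": [34.81, 49.02, 61.15, 43.74],
    "f_ac,f_pc,f_qmd": [34.96, 49.27, 61.09, 43.89],
}


def distance_to_best(rows: Dict[str, List[float]],
                     full: str = "f_ac,f_pc,f_qmd") -> List[float]:
    """Per column: best value over all combinations minus the full combination's value."""
    n_cols = len(rows[full])
    return [round(max(r[c] for r in rows.values()) - rows[full][c], 2)
            for c in range(n_cols)]
```

`distance_to_best(ICEWS14)` returns `[0.0, 0.0, 0.06, 0.0]`, matching the ICEWS14 part of the last row: only Hits@10 is beaten, by the $f_{ac}$, $f_{pc}$ pair.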
Table A5. The comparison of the three indicators in different combinations between WIKI and ICEWS18 datasets. We average the output of ten experiments with different random seeds and fixed hyperparameters. All metrics are multiplied by 100.
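Indicator | WIKI Hit@1 | WIKI Hit@3 | WIKI Hit@10 | WIKI MRR | ICEWS18 Hit@1 | ICEWS18 Hit@3 | ICEWS18 Hit@10 | ICEWS18 MRR
$f_{qmd}$ | 70.75 | 83.39 | 85.87 | 77.12 | 12.76 | 26.47 | 43.75 | 22.66
$f_{ac}$ | - | - | - | - | 20.41 | 33.50 | 47.48 | 29.45
$f_{pc}$ | 70.72 | 83.35 | 85.31 | 77.00 | 14.92 | 29.00 | 45.58 | 24.82
$f_{ac}$, $f_{qmd}$ | 76.12 | 84.90 | 85.94 | 80.46 | 23.05 | 36.20 | 49.47 | 31.84
$f_{pc}$, $f_{qmd}$ | 73.85 | 84.12 | 85.65 | 78.99 | 13.10 | 26.38 | 43.27 | 22.75
$f_{ac}$, $f_{pc}$ | 76.04 | 84.91 | 85.95 | 80.41 | 23.04 | 36.10 | 49.46 | 31.83
$f_{ac}$, $f_{pc}$, $f_{qmd}$ | 76.09 | 84.92 | 85.96 | 80.44 | 23.15 | 36.12 | 49.52 | 31.89
Distance to the best | 0.03 | 0 | 0 | 0.02 | 0 | 0.08 | 0 | 0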
The above experiments suggest that two indicators would suffice for IMR-TransE. However, IMR can also be instantiated with other basic models; for example, the performance of IMR-RotatE varies considerably depending on which two indicators are used. We therefore retain all three indicators for the best performance.

Appendix A.5. Correlation between IMR and Other Models

Correlation between IMR and PTransE. Both IMR and PTransE measure the semantic equivalence between relations. PTransE resembles an ensemble that combines the scores of relations and triples from different models, whereas the IMR indicators are based on a unified basic model (such as TransE or RotatE), which allows different paths to be combined effectively. IMR can thus truly measure paths of different hops under the same criteria. Moreover, IMR further designs the path confidence to account for time attributes.
Correlation between IMR and reinforcement-learning-based models. First, reinforcement learning models are black-box models that cannot explain the basis of their judgments. Moreover, reinforcement learning relies on rewards, which essentially measure the matching degree between the tails and the question. This end-to-end design essentially plays the role of IMR's question matching degree, but in a form that is unexplainable and complicated.
Moreover, IMR is the first to design indicators from the perspective of actual semantics, so we select basic embedding models as the foundation of IMR to better illustrate the approach. Because TransE models triples in an elementary way, the formulas of its indicators are simple. Compared with complex greedy algorithms, the design of IMR may appear too simple. Nevertheless, although the design of IMR-TransE is simple, it outperforms reinforcement learning models such as TITer. Instantiated IMR models can adopt more complex indicators, which may further improve performance.
Finally, the other IMR indicators must be designed on top of consistent basic models (such as RotatE). Because current reinforcement learning models are commonly based on multi-layer networks, the other two indicators cannot be designed on top of them.

Appendix A.6. Correlation between Path Confidence and Time Distance in IMR-TransE

The current sampling strategy assumes that the greater the time distance between two occurrences of the same entity, the greater the deviation of its semantic properties. Therefore, IMR adopts a time-aware negative exponential sampling strategy to search for more effective paths. However, path reliability is affected by semantic similarity, and its negative correlation with time distance is only a general, statistical tendency. IMR therefore proposes the path confidence to better measure the reliability of the searched paths. Here, we use the path confidence of the same path at different timestamps to analyze how semantic similarity changes over time. We randomly select 20 questions for the path search, and for each question we select one path that occurs at ten different timestamps and calculate its path confidence at each timestamp. Figure A1 shows how the path confidence of each path changes with time distance.
Figure A1 shows that as the time distance between the paths and the questions increases, the path confidence score gradually increases, indicating that the confidence gradually decreases. The experiments show that the semantic deviation of the same entity increases with time distance, which supports the rationale for time-aware negative exponential sampling.
Figure A1. The relation between path confidence and time distance. The questions and paths corresponding to each polyline are shown in (a,b).

Appendix A.7. The Offsetting Property in Question Updating

To infer the correct tails, the question updating module must ensure that the question still matches the same tail entity after updating. As shown in Equation (A14), we use IMR-TransE to analyze this offsetting property.
$$\mathbf{e}_s^{i} + \mathbf{r}_q^{i} = \mathbf{e}_s^{i-1} + \mathbf{r}_{p_i} + \mathbf{r}_q^{i-1} - \mathbf{r}_{p_i} = \mathbf{e}_s^{i-1} + \mathbf{r}_q^{i-1} = \cdots = \mathbf{e}_s^{0} + \mathbf{r}_q^{0} = \mathbf{e}_s + \mathbf{r}_q = \mathbf{e}_o \tag{A14}$$
This cancellation of the relation guarantees that the answer to the question does not change along the path. In addition, the offset does not appear in the calculation of the indicators: only the subject of the question is used to calculate the path confidence, and only the relation of the question is used to calculate the answer completion level.
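The offsetting property can be checked numerically with the IMR-TransE updates $\mathbf{e}_s^{i} = \mathbf{e}_s^{i-1} + \mathbf{r}_{p_i}$ and $\mathbf{r}_q^{i} = \mathbf{r}_q^{i-1} - \mathbf{r}_{p_i}$ that appear in Equation (A14). The sketch uses small integer vectors so that the equality is exact; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Vec = List[int]


def transe_update(e_s: Sequence[int], r_q: Sequence[int],
                  r_p: Sequence[int]) -> Tuple[Vec, Vec]:
    """One hop of question updating: the subject moves along r_p, the relation loses r_p."""
    return [a + b for a, b in zip(e_s, r_p)], [a - b for a, b in zip(r_q, r_p)]


def remaining_question(e_s: Sequence[int], r_q: Sequence[int],
                       path_rels: Sequence[Sequence[int]]) -> Tuple[Vec, Vec]:
    """Apply the update for every relation on the path; return (e_s^i, r_q^i)."""
    e, r = list(e_s), list(r_q)
    for r_p in path_rels:
        e, r = transe_update(e, r, r_p)
    return e, r
```

For $\mathbf{e}_s = (1, 2)$, $\mathbf{r}_q = (3, -1)$, and a path with relations $(1, 1)$ and $(-2, 5)$, `remaining_question` returns $((0, 8), (4, -7))$, and $\mathbf{e}_s^{2} + \mathbf{r}_q^{2} = (4, 1) = \mathbf{e}_s + \mathbf{r}_q$, as the derivation requires.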

References

1. Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the NIPS 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013.
2. Suchanek, F.M.; Kasneci, G.; Weikum, G. YAGO: A Large Ontology from Wikipedia and WordNet. J. Web Semant. 2008, 6, 203–217.
3. Li, Z.; Jin, X.; Guan, S.; Li, W.; Guo, J.; Wang, Y.; Cheng, X. Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs. In Proceedings of the ACL/IJCNLP (1), Virtual Event, 1–6 August 2021; pp. 4732–4743.
4. Jin, W.; Zhang, C.; Szekely, P.A.; Ren, X. Recurrent Event Network for Reasoning over Temporal Knowledge Graphs. arXiv 2019, arXiv:1904.05530.
5. Xu, C.; Nayyeri, M.; Alkhoury, F.; Yazdi, H.S.; Lehmann, J. Temporal Knowledge Graph Completion Based on Time Series Gaussian Embedding. In Proceedings of the ISWC (1), Athens, Greece, 2–6 November 2020; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2020; Volume 12506, pp. 654–671.
6. Jung, J.; Jung, J.; Kang, U. Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion. In Proceedings of the KDD, Singapore, 14–18 August 2021; ACM: New York, NY, USA, 2021; pp. 786–795.
7. Han, Z.; Chen, P.; Ma, Y.; Tresp, V. Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs. In Proceedings of the ICLR, OpenReview.net, Vienna, Austria, 4 May 2021.
8. Wu, J.; Cao, M.; Cheung, J.C.K.; Hamilton, W.L. TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion. In Proceedings of the EMNLP (1), Online, 16–20 November 2020; pp. 5730–5746.
9. Pavlović, A.; Sallinger, E. ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion. arXiv 2022, arXiv:2206.04192.
10. Wang, X.; Chen, J.; Wu, F.; Wang, J. Exploiting Global Semantic Similarities in Knowledge Graphs by Relational Prototype Entities. arXiv 2022, arXiv:2206.08021.
11. Zhu, C.; Chen, M.; Fan, C.; Cheng, G.; Zhan, Y. Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks. arXiv 2020, arXiv:2012.08492.
12. Nayyeri, M.; Vahdati, S.; Khan, M.T.; Alam, M.M.; Wenige, L.; Behrend, A.; Lehmann, J. Dihedron Algebraic Embeddings for Spatio-Temporal Knowledge Graph Completion. In Proceedings of the Semantic Web—19th International Conference, ESWC 2022, Hersonissos, Crete, Greece, 29 May–2 June 2022; Lecture Notes in Computer Science. Groth, P., Vidal, M., Suchanek, F.M., Szekely, P.A., Kapanipathi, P., Pesquita, C., Skaf-Molli, H., Tamper, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2022; Volume 13261, pp. 253–269.
13. Chen, K.; Wang, Y.; Li, Y.; Li, A. RotateQVS: Representing Temporal Information as Rotations in Quaternion Vector Space for Temporal Knowledge Graph Completion. arXiv 2022, arXiv:2203.07993.
14. Sun, H.; Zhong, J.; Ma, Y.; Han, Z.; He, K. TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting. In Proceedings of the EMNLP, Virtual, 7–9 November 2021.
15. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the ICML, New York, NY, USA, 19–24 June 2016.
16. Yang, B.; Yih, W.T.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv 2015, arXiv:1412.6575.
17. Sun, Z.; Deng, Z.; Nie, J.Y.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. arXiv 2019, arXiv:1902.10197.
18. Nickel, M.; Tresp, V.; Kriegel, H. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011.
19. Zhou, M.; Huang, M.; Zhu, X. An Interpretable Reasoning Network for Multi-Relation Question Answering. In Proceedings of the COLING, Santa Fe, NM, USA, 20–26 August 2018; pp. 2010–2022.
20. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the AAAI, Québec City, QC, Canada, 27–31 July 2014.
21. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the AAAI, Austin, TX, USA, 14–18 November 2015.
22. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the ACL, Beijing, China, 26–31 July 2015.
23. Balazevic, I.; Allen, C.; Hospedales, T.M. TuckER: Tensor Factorization for Knowledge Graph Completion. arXiv 2019, arXiv:1901.09590.
24. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018.
25. Nguyen, D.Q.; Nguyen, T.; Nguyen, D.Q.; Phung, D.Q. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. arXiv 2018, arXiv:1712.02121.
26. Nguyen, D.Q.; Vu, T.; Nguyen, T.; Nguyen, D.Q.; Phung, D.Q. A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization. arXiv 2019, arXiv:1808.04122.
27. Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-based Multi-Relational Graph Convolutional Networks. arXiv 2020, arXiv:1911.03082.
28. Li, R.; Cheng, X. DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning. In Proceedings of the EMNLP/IJCNLP (1), Hong Kong, China, 3–7 November 2019; pp. 2642–2651.
29. Wang, H.; Li, S.; Pan, R.; Mao, M. Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning. In Proceedings of the EMNLP/IJCNLP (1), Hong Kong, China, 3–7 November 2019; pp. 2623–2631.
30. García-Durán, A.; Dumancic, S.; Niepert, M. Learning Sequence Encoders for Temporal Knowledge Graph Completion. In Proceedings of the EMNLP, Brussels, Belgium, 31 October–4 November 2018.
31. Messner, J.; Abboud, R.; Ceylan, İ.İ. Temporal Knowledge Graph Completion Using Box Embeddings. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, the Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022, Virtual Event, 22 February–1 March 2022; pp. 7779–7787.
32. Lacroix, T.; Obozinski, G.; Usunier, N. Tensor Decompositions for Temporal Knowledge Base Completion. arXiv 2020, arXiv:2004.04926.
33. Boschee, E.; Lautenschlager, J.; O'Brien, S.; Shellman, S.; Starz, J.; Ward, M. ICEWS Coded Event Data; Harvard Dataverse: Cambridge, MA, USA, 2015; Volume 12.
34. Leblay, J.; Chekol, M. Deriving Validity Time in Knowledge Graph. In Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018.
35. Mahdisoltani, F.; Biega, J.; Suchanek, F.M. YAGO3: A Knowledge Base from Multilingual Wikipedias. In Proceedings of the CIDR, Asilomar, CA, USA, 4–7 January 2015.
36. Miller, G.A. WordNet: A Lexical Database for English. Commun. ACM 1995, 38, 39–41.
37. Dasgupta, S.S.; Ray, S.N.; Talukdar, P.P. HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding. In Proceedings of the EMNLP, Brussels, Belgium, 31 October–4 November 2018; pp. 2001–2011.
38. Jin, W.; Qu, M.; Jin, X.; Ren, X. Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs. In Proceedings of the EMNLP, Online, 16–20 November 2020.
39. Goel, R.; Kazemi, S.M.; Brubaker, M.A.; Poupart, P. Diachronic Embedding for Temporal Knowledge Graph Completion. arXiv 2020, arXiv:1907.03143.
40. Ding, Z.; Han, Z.; Ma, Y.; Tresp, V. Temporal Knowledge Graph Forecasting with Neural ODE. arXiv 2021, arXiv:2101.05151.
41. Xu, D.; Ruan, C.; Körpeoglu, E.; Kumar, S.; Achan, K. Inductive Representation Learning on Temporal Graphs. arXiv 2020, arXiv:2002.07962.
Figure 1. The architecture of IMR, illustrated with a 2-hop path search. The black and red arrows denote time-aware exponentially weighted sampling and pruning based on path scores, respectively (Section 4.2). The blue arrows denote the computation of the remaining question for each path (Section 4.3). The original question (Sub, Rel, ?, Time) is denoted as $(e_s, r_q, ?, t_q)$. The two searched paths are [(Sub, R1, Obj1, Time1)] and [(Sub, R1, Obj1, Time1), (Obj1, R5, Obj5, Time5)], denoted as $[(e_s, r_p^1, e_1, t_1)]$ and $[(e_s, r_p^1, e_1, t_1), (e_1, r_p^2, e_2, t_2)]$, respectively. (Sub′, Rel′, ?, Time) and (Sub″, Rel″, ?, Time) denote the remaining questions after the 1-hop and 2-hop paths, denoted as $(e_s^1, r_q^1, ?, t_q)$ and $(e_s^2, r_q^2, ?, t_q)$, respectively.
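The two path-level operations named in the Figure 1 caption can be illustrated with a toy sketch. It rests on assumptions of ours, not on the paper's specification: (1) time-aware exponentially weighted sampling draws an outgoing edge at time $t < t_q$ with probability proportional to $\exp(-\lambda (t_q - t))$; (2) in a TransE-style space, $e_s + r_q \approx e_o$ and $e_s + r_p^1 \approx e_1$ imply $e_1 + (r_q - r_p^1) \approx e_o$, so the remaining relation is $r_q - r_p^1$ with $e_1$ as the new subject (the paper's $e_s^1$ may be defined differently). The decay rate `lam`, all function names, and the toy graph are hypothetical; pruning (red arrows) is omitted.

```python
import math

import numpy as np


def time_aware_weights(edge_times, t_q, lam=0.1):
    """Hypothetical time-aware exponential weights: an edge at time t < t_q
    gets weight exp(-lam * (t_q - t)), so recent edges are favoured; edges at
    or after the query time get weight 0. Returns normalised probabilities."""
    w = np.array([math.exp(-lam * (t_q - t)) if t < t_q else 0.0
                  for t in edge_times])
    total = w.sum()
    if total == 0.0:
        raise ValueError("no edge occurs before the query time")
    return w / total


def sample_edges(edges, t_q, k, lam=0.1, seed=0):
    """Sample up to k distinct (relation, object, time) edges with the weights
    above; zero-weight (future) edges are never drawn."""
    probs = time_aware_weights([t for _, _, t in edges], t_q, lam)
    n_valid = int(np.count_nonzero(probs))
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(edges), size=min(k, n_valid), replace=False, p=probs)
    return [edges[i] for i in idx]


def remaining_relation(r_q, r_hop):
    """Assumed TransE-style decomposition: if e_s + r_q ≈ e_o and
    e_s + r_hop ≈ e_1, then e_1 + (r_q - r_hop) ≈ e_o."""
    return r_q - r_hop


def transe_distance(subj, rel, obj):
    """TransE distance ||subj + rel - obj||_2; lower means more plausible."""
    return float(np.linalg.norm(subj + rel - obj))


if __name__ == "__main__":
    # Toy 2-D embeddings chosen so that the decomposition is exact.
    e_1, e_o = np.array([1.0, 0.0]), np.array([1.0, 1.0])
    r_q, r_1 = np.array([1.0, 1.0]), np.array([1.0, 0.0])

    # Hypothetical outgoing edges of the subject: (relation, object, timestamp).
    edges = [("R1", "Obj1", 8), ("R2", "Obj2", 3), ("R3", "Obj3", 12)]
    print(sample_edges(edges, t_q=10, k=2))  # only edges with t < 10 can appear

    r_rem = remaining_relation(r_q, r_1)
    print(r_rem, transe_distance(e_1, r_rem, e_o))  # [0. 1.] 0.0
```

Excluding edges at or after $t_q$ keeps the sketch consistent with the forecasting setting, where only historical facts are available.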
Figure 2. A brief illustration of the path scoring module.
Figure 3. Performance comparison of paths with different maximum numbers of hops on the four datasets. Results are averaged over four runs with different random seeds and fixed hyperparameters.
Table 1. Statistics of the four benchmark datasets.
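| Dataset | ICEWS14 | ICEWS18 | WIKI | YAGO |
|---|---|---|---|---|
| Entities | 7128 | 23,033 | 12,554 | 10,623 |
| Relations | 230 | 256 | 24 | 10 |
| Timestamps | 365 | 304 | 232 | 189 |
| Training | 63,685 | 373,018 | 539,286 | 161,540 |
| Validation | 13,823 | 45,995 | 67,538 | 19,523 |
| Test | 13,222 | 49,545 | 63,110 | 20,026 |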
Table 2. Results on ICEWS14 and ICEWS18. Metrics are time-aware filtered MRR and Hits@1/3/10, reported as percentages (multiplied by 100). The best result in each column is in bold.
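| Model | ICEWS14 MRR | ICEWS14 Hits@1 | ICEWS14 Hits@3 | ICEWS14 Hits@10 | ICEWS18 MRR | ICEWS18 Hits@1 | ICEWS18 Hits@3 | ICEWS18 Hits@10 |
|---|---|---|---|---|---|---|---|---|
| TTransE | 13.43 | 3.11 | 17.32 | 34.55 | 8.31 | 1.92 | 8.56 | 21.89 |
| TA-DistMult | 26.47 | 17.09 | 30.22 | 45.41 | 16.75 | 8.61 | 18.41 | 33.59 |
| DE-SimplE | 32.67 | 24.43 | 35.69 | 49.11 | 19.30 | 11.53 | 21.86 | 34.80 |
| TNTComplEx | 32.12 | 23.35 | 36.03 | 49.13 | 27.54 | 19.52 | 30.80 | 42.86 |
| CyGNet | 32.73 | 23.69 | 36.31 | 50.67 | 24.93 | 15.90 | 28.28 | 42.61 |
| RE-NET | 38.28 | 28.68 | 41.34 | 54.52 | 28.81 | 19.05 | 32.44 | 47.51 |
| xERTE | 40.79 | 32.70 | 45.67 | 57.30 | 29.31 | 21.03 | 33.51 | 46.48 |
| TANGO-Tucker | – | – | – | – | 28.68 | 19.35 | 32.17 | 47.04 |
| TANGO-DistMult | – | – | – | – | 26.75 | 17.92 | 30.08 | 44.09 |
| TITer | 41.73 | 32.74 | 46.46 | 58.44 | 29.98 | 22.05 | 33.46 | – |
| IMR-TransE | **44.76** | **35.64** | **49.49** | **62.30** | 32.45 | 22.97 | 36.05 | 49.36 |
| IMR-RotatE | 44.21 | 35.13 | 48.72 | 62.04 | 32.67 | 23.53 | 36.76 | 50.67 |
| IMR-ComplEx | 44.03 | 34.55 | 49.21 | 62.11 | **33.33** | **24.07** | **37.65** | **51.51** |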
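The time-aware filtered MRR and Hits@k reported in Tables 2 and 3 can be sketched as follows, assuming the usual definition in temporal KG forecasting (as used by xERTE and TITer): when ranking the gold object of a query $(s, r, ?, t)$, other objects $o'$ for which $(s, r, o', t)$ is true at the same timestamp $t$ are removed from the candidates, while facts true only at other timestamps remain negatives. The function names, the optimistic tie-breaking, and the toy data are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np


def time_aware_filtered_rank(scores, query, true_facts):
    """Rank of the gold object for query (s, r, o, t) under time-aware filtering.

    scores: 1-D array of model scores over all entity ids (higher is better).
    true_facts: dict mapping (s, r, t) -> set of objects that are true at t.
    Other correct objects for the same (s, r, t) are removed before ranking;
    objects that are correct only at other timestamps are kept as negatives.
    """
    s, r, o, t = query
    gold = scores[o]
    keep = np.ones(len(scores), dtype=bool)
    for other in true_facts.get((s, r, t), set()):
        if other != o:
            keep[other] = False
    # Optimistic tie-breaking (an assumption): only strictly higher scores count.
    return 1 + int(np.sum(scores[keep] > gold))


def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """MRR and Hits@k, multiplied by 100 as in Tables 2 and 3."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MRR": 100.0 * float(np.mean(1.0 / ranks))}
    for k in ks:
        metrics[f"Hits@{k}"] = 100.0 * float(np.mean(ranks <= k))
    return metrics


if __name__ == "__main__":
    # Toy scores over 5 entity ids and hypothetical ground-truth facts.
    scores = np.array([0.9, 0.8, 0.1, 0.7, 0.2])
    true_facts = {(0, 0, 5): {0, 1}, (0, 0, 4): {3}}
    # Entity 0 is also correct at t=5, so it is filtered out and entity 1 ranks first.
    rank_a = time_aware_filtered_rank(scores, (0, 0, 1, 5), true_facts)
    # At t=4 entities 0 and 1 are not correct answers, so they still outrank entity 3.
    rank_b = time_aware_filtered_rank(scores, (0, 0, 3, 4), true_facts)
    print(rank_a, rank_b)  # 1 3
    print(mrr_and_hits([rank_a, rank_b]))
```

The second query shows why the filter is "time-aware": an entity that is a correct answer at another timestamp is not removed and still counts against the gold object.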
Table 3. Results on WIKI and YAGO. Metrics are time-aware filtered MRR and Hits@1/3/10, reported as percentages (multiplied by 100). The best result in each column is in bold.
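| Model | WIKI MRR | WIKI Hits@1 | WIKI Hits@3 | WIKI Hits@10 | YAGO MRR | YAGO Hits@1 | YAGO Hits@3 | YAGO Hits@10 |
|---|---|---|---|---|---|---|---|---|
| TTransE | 29.27 | 21.67 | 34.43 | 42.39 | 31.19 | 18.12 | 40.91 | 51.21 |
| TA-DistMult | 44.53 | 39.92 | 48.73 | 51.71 | 54.92 | 48.15 | 59.61 | 66.71 |
| DE-SimplE | 45.43 | 42.60 | 47.71 | 49.55 | 54.91 | 51.64 | 57.30 | 60.17 |
| TNTComplEx | 45.03 | 40.04 | 49.31 | 52.03 | 57.98 | 52.92 | 61.33 | 66.69 |
| CyGNet | 33.89 | 29.06 | 36.10 | 41.86 | 52.07 | 45.36 | 56.12 | 63.77 |
| RE-NET | 49.66 | 46.88 | 51.19 | 53.48 | 58.02 | 53.06 | 61.08 | 66.29 |
| xERTE | 71.14 | 68.05 | 76.11 | 79.01 | 84.19 | 80.09 | 88.02 | 89.78 |
| TANGO-Tucker | 50.43 | 48.52 | 51.47 | 53.58 | 57.83 | 53.05 | 60.78 | 65.85 |
| TANGO-DistMult | 51.15 | 49.66 | 52.16 | 53.35 | 62.70 | 59.18 | 60.31 | 67.90 |
| TITer | 75.50 | 72.96 | 77.49 | 79.02 | 87.47 | 84.89 | 89.96 | 90.27 |
| IMR-TransE | 80.41 | 76.04 | 84.91 | 85.95 | 90.24 | 87.91 | 92.65 | 92.77 |
| IMR-RotatE | 79.43 | 74.36 | 84.59 | 85.79 | **90.34** | **88.10** | 92.69 | **92.78** |
| IMR-ComplEx | **80.54** | **76.12** | **84.98** | **85.97** | 90.19 | 87.80 | **92.71** | **92.78** |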


MDPI and ACS Style

Du, Z.; Qu, L.; Liang, Z.; Huang, K.; Cui, L.; Gao, Z. IMF: Interpretable Multi-Hop Forecasting on Temporal Knowledge Graphs. Entropy 2023, 25, 666. https://doi.org/10.3390/e25040666

AMA Style

Du Z, Qu L, Liang Z, Huang K, Cui L, Gao Z. IMF: Interpretable Multi-Hop Forecasting on Temporal Knowledge Graphs. Entropy. 2023; 25(4):666. https://doi.org/10.3390/e25040666

Chicago/Turabian Style

Du, Zhenyu, Lingzhi Qu, Zongwei Liang, Keju Huang, Lin Cui, and Zhiyang Gao. 2023. "IMF: Interpretable Multi-Hop Forecasting on Temporal Knowledge Graphs" Entropy 25, no. 4: 666. https://doi.org/10.3390/e25040666

APA Style

Du, Z., Qu, L., Liang, Z., Huang, K., Cui, L., & Gao, Z. (2023). IMF: Interpretable Multi-Hop Forecasting on Temporal Knowledge Graphs. Entropy, 25(4), 666. https://doi.org/10.3390/e25040666

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
