1. Introduction
Knowledge graphs (KGs) contain facts, or triples, that indicate relationships between entities, such as (South Korea, Accuse, North Korea). However, in the real world, many entities and relations are time-dependent. To capture the evolution of information over time, temporal knowledge graphs (TKGs) have been developed. In TKGs, triples (s, p, o) are expanded into quadruples (s, p, o, t), where the time t denotes the validity of the fact. For example, (South Korea, Accuse, North Korea, 17 October 2014) indicates that South Korea accused North Korea on 17 October 2014.
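For illustration only, a TKG fact can be stored as a simple quadruple record; the minimal Python sketch below uses our own field names and is not tied to any particular TKG library.

```python
from collections import namedtuple

# A TKG fact (s, p, o, t): subject entity, predicate, object entity, timestamp.
Quadruple = namedtuple("Quadruple", ["subject", "predicate", "object_", "timestamp"])

fact = Quadruple("South Korea", "Accuse", "North Korea", "2014-10-17")
print(fact.subject, fact.predicate, fact.object_, fact.timestamp)
```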
While TKGs provide a comprehensive description of the dynamics of real-world relationships, such structures introduce specific complexities. Temporal knowledge graph reasoning is realized through a link prediction task that focuses on identifying the candidates most likely to complete a timestamped query of the form (s, p, ?, t), where s is the head entity, p is the relation, and t is the timestamp. Traditional learning techniques for static KGs are unable to capture the dynamic interactions between entities. To address this issue, various temporal knowledge graph embedding (KGE) models have been proposed to encode entities and relations in a low-dimensional space using translation-based functions [1,2], deep neural network-based methods [3,4], and tensor decomposition methods [5,6]. As these methodologies have progressed, numerous strategies for TKG completion have been proposed, including geometric methods (ChronoR [7], BoxTE [8], TLT-KGE [9], RotateQVS [10], HTTR [11], PTKE [12]); tensor decomposition methods (QDN [13], TBDRI [14]); deep learning and embedding-based methods (TASTER [15], TeAST [16], RoAN [17], BiQCap [18]); and graph neural network-based reasoning methods (TARGCN [19], T-GAP [20], TAL-TKGC [21]). Interpolation is a statistical method that utilizes relevant known values to estimate an unknown value or set [19], whereas extrapolation infers "future" unknown values from current information [22]. These models employ time-dependent embeddings of entities and relations to enhance the KG reasoning process. While effective, these methods lack interpretability. As shown in Figure 1, the color shades in the heatmap indicate how often an entity appears at different timestamps. Temporal information can be regarded as a perturbation to the representations of entities or relations. As a result, it can be challenging for current models to accurately represent their temporal dynamics over time.
To overcome this issue, logic rule-based symbolic techniques have been proposed as a substitute for embedding-based models. One advantage of these methods is that they produce transparent and comprehensible inference results. However, the subtleties of event complexities can make manual rule formulation challenging, which can impede knowledge acquisition. Recent works, such as TLogic [23] and LCGE [24], propose methods that reason over temporal KGs and automatically extract knowledge rules. However, several challenges remain. First, the techniques used to estimate the confidence of individual rules often ignore the interactions between rules, so some rules are overlooked. Second, selecting temporal rules solely by high confidence may introduce inconsistent and irrelevant facts.
To overcome the limitations of both embedding-based and rule-based methods, we propose the MVFF framework, which integrates temporal logical rules, tensor decomposition, a time-encoding module, and a GNN-based method. MVFF employs closed-loop restrictions and the temporal random walk algorithm [25] to produce a set of candidate rules based on sampled entity sequences. The rules obtained after Gumbel sampling are combined with tensor decomposition embeddings, time encoding, and a relation-aware graph neural network [26] for both scalability and interpretability.
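As a rough illustration of the walk-based rule candidate generation described above, the sketch below samples a single backward-in-time walk over toy quadruples; the entity names, timestamps, and the strictly-decreasing-time constraint are illustrative assumptions rather than MVFF's exact sampler.

```python
import random
from collections import defaultdict

def temporal_random_walk(quads, start_entity, length, seed=None):
    """Sample one backward-in-time walk over (s, p, o, t) facts.

    At each step, only edges whose timestamp is strictly earlier than the
    previous one are allowed, mimicking the temporal constraint used by
    walk-based rule miners. This is a simplified sketch, not MVFF's sampler.
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)          # index facts by subject entity
    for s, p, o, t in quads:
        by_subject[s].append((p, o, t))

    walk, current, prev_time = [], start_entity, float("inf")
    for _ in range(length):
        candidates = [e for e in by_subject[current] if e[2] < prev_time]
        if not candidates:
            break
        p, o, t = rng.choice(candidates)
        walk.append((current, p, o, t))
        current, prev_time = o, t
    return walk

toy_quads = [("A", "Accuse", "B", 3), ("B", "Reject", "C", 2), ("C", "Consult", "A", 1)]
print(temporal_random_walk(toy_quads, "A", length=3, seed=0))
```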
The main contributions of our work include:
The proposed rule sampling method is differentiable, which aids in network backpropagation and weight updates. Additionally, it allows for the exploration of rule diversity or the utilization of rules with high confidence through temperature parameters.
To capture the semantic, structural, and temporal information of temporal knowledge graphs, MVFF combines time-encoding techniques, relation-aware graph neural networks, and tensor decomposition.
Experimental results on four benchmark temporal knowledge graph datasets show that our model significantly outperforms many state-of-the-art baseline methods. Our framework improves MRR by 2.98%/4.96%/23.97%/6.78% and Hits@1 by 5.03%/3.29%/26.0%/7.82% on ICEWS14/ICEWS05-15/Wikidata12k/GDELT, respectively.
The rest of this paper is organized as follows. Section 2 briefly reviews related work on temporal knowledge graph reasoning. In Section 3, we introduce the proposed method (MVFF), including the temporal logical rule learning module and the multi-view feature fusion module. Afterward, we discuss experiments on temporal knowledge graph reasoning in Section 4. Section 5 delineates the limitations of our method. Section 6 concludes this paper and discusses future research directions.
4. Experiments
4.1. Datasets
In this paper, we utilized four datasets to validate the effectiveness of the proposed model: ICEWS14 [10], ICEWS05-15 [10], Wikidata12k [2], and GDELT [10]. Dataset statistics are shown in Table 2. These datasets were selected for the following reasons: First, their variable time intervals can be handled by training distinct time sets. Second, the temporal representation of Wikidata12k is similar to that of ICEWS. Third, to allow a more precise comparison of experimental results, we chose the same datasets as the benchmark models.
4.2. Experimental Setup
RGNN was first trained on the static knowledge graph without temporal information. The embedding dimensions of entities and relations were selected from {100, 150, 200, 250}, and the learning rate was tuned within {0.01, 0.05, 0.1, 0.2}. The batch size of each training dataset was set to 30,000. The rule confidence threshold was chosen from {0.1, 0.2, 0.3, 0.4, 0.5}. The embedding dimensions of entities and relations in the tensor decomposition were selected from {500, 1000, 1500, 2000, 2500}. The learning rate was tuned within {0.01, 0.05, 0.1, 0.5}, and the weight of the N3 regularization was set to 0.01. The ratio of static knowledge features in the loss calculation was set to 2.5, and the batch size was set to 1000. The maximum number of iterations was set to 500. During model training, early stopping was used.
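For reference, the search space described above can be summarized as a simple grid; the key names below are our own shorthand, not the identifiers used in the actual training scripts.

```python
# Hyperparameter grid summarized from the experimental setup
# (key names are illustrative shorthand).
search_space = {
    "rgnn_embedding_dim":        [100, 150, 200, 250],
    "rgnn_learning_rate":        [0.01, 0.05, 0.1, 0.2],
    "rgnn_batch_size":           [30_000],
    "rule_confidence_threshold": [0.1, 0.2, 0.3, 0.4, 0.5],
    "tensor_embedding_dim":      [500, 1000, 1500, 2000, 2500],
    "tensor_learning_rate":      [0.01, 0.05, 0.1, 0.5],
    "n3_regularization_weight":  [0.01],
    "static_feature_loss_ratio": [2.5],
    "tensor_batch_size":         [1000],
    "max_iterations":            [500],
}
```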
4.3. Baseline Models
We compare MVFF with static knowledge graph methods and temporal knowledge graph methods. The static knowledge graph methods include TransE [27], DistMult [42], ComplEx [5], and SimplE [43]. The TKG methods are divided into geometric methods (TTransE [1], HyTE [2], TeRo [31], ChronoR [7], BoxTE [8], TLT-KGE [9], RotateQVS [10], HTTR [11], PTKE [12], TGeomE [44]); tensor decomposition methods (TNTComplEx [6], TuckERTNT [33], QDN [13], TBDRI [14]); deep learning and embedding-based methods (TeMP-SA [3], DE-SimplE [4], TeLM [45], TASTER [15], TeAST [16], RoAN [17], BiQCap [18]); and graph neural network-based reasoning methods (TARGCN [19], T-GAP [20], TAL-TKGC [21]).
4.4. Measures
We evaluate MVFF using the mean reciprocal rank (MRR) and Hits@{1, 3, 10} (H@1, 3, 10):

$$\mathrm{MRR} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mathrm{rank}_i}, \qquad \mathrm{Hits@}n = \frac{1}{N}\sum_{i=1}^{N} f\left(\mathrm{rank}_i \le n\right),$$

where N is the total number of test queries and $\mathrm{rank}_i$ is the rank of the correct entity for the i-th query. If the condition $\mathrm{rank}_i \le n$ is true, the value of the indicator function f is 1; otherwise, it is 0. Here, n is set to 1, 3, or 10. Note that higher MRR and Hits@n indicate better performance.
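A minimal reference implementation of the two metrics, assuming `ranks` holds the rank assigned to the ground-truth entity for each test query, could look as follows:

```python
import numpy as np

def mrr(ranks):
    """Mean reciprocal rank of the ground-truth entities."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

def hits_at_n(ranks, n):
    """Fraction of queries whose ground-truth entity is ranked in the top n."""
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= n))

ranks = [1, 3, 2, 15, 1]  # hypothetical ranks of the correct answers
print(mrr(ranks), hits_at_n(ranks, 1), hits_at_n(ranks, 10))
```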
4.5. Overall Results
In this section, we compare the MVFF model with other TKG reasoning models, including traditional static knowledge graph-based reasoning models, embedding-based TKG reasoning models, and GNN-based TKG reasoning models. The comparison results on the ICEWS14 and ICEWS05-15 datasets are shown in Table 3, and the comparison results on the Wikidata12k and GDELT datasets are shown in Table 4.
As shown in the tables, the MVFF model achieves the best results on all evaluation metrics. Across all datasets, MVFF consistently and significantly outperforms all baselines, with particularly large gains over the strongest baselines. Compared to the state-of-the-art approaches, our framework improves MRR by 2.98%/4.96%/23.97%/6.78% and Hits@1 by 5.03%/3.29%/26.0%/7.82% on ICEWS14/ICEWS05-15/Wikidata12k/GDELT, respectively.
4.6. The Impact of Different Hyperparameters
This section investigates the impact of different hyperparameters on model performance.
4.6.1. Comparison of Different Rule Confidence
For knowledge graph reasoning algorithms that use temporal logical rules, the quality of the rules plays a crucial role in performance. The rule confidence threshold specifies the minimum probability at which a rule is considered correct. In this study, we conducted experiments to determine the optimal rule confidence threshold from the range {0.1, 0.2, 0.3, 0.4, 0.5}. We specified a minimum body support of 20 to prevent small body support from leading to spuriously high rule confidence. Moreover, we held all other parameters constant to disentangle the influence of the rule confidence threshold from other factors. The results, displayed in Table 5, illustrate how the different rule confidence thresholds affect the algorithm's performance on the ICEWS14 and ICEWS05-15 datasets.
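To make the role of the confidence threshold and the minimum body support concrete, the sketch below applies the standard confidence definition support(rule)/support(body); the counts are hypothetical, and the exact scoring used by MVFF may differ.

```python
def rule_confidence(rule_support, body_support, min_body_support=20):
    """Confidence = rule support / body support. Rules with too little body
    support are discarded to avoid spuriously high confidence estimates."""
    if body_support < min_body_support:
        return 0.0
    return rule_support / body_support

def filter_rules(rules, threshold):
    """Keep only rules whose confidence reaches the chosen threshold."""
    return [r for r in rules
            if rule_confidence(r["rule_support"], r["body_support"]) >= threshold]

# Hypothetical counts for two candidate rules.
candidate_rules = [
    {"head": "Reject", "body": ["Accuse"], "rule_support": 93, "body_support": 500},
    {"head": "Consult", "body": ["Make statement"], "rule_support": 4, "body_support": 15},
]
print(filter_rules(candidate_rules, threshold=0.1))  # only the first rule survives
```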
4.6.2. Comparison of Embedding Dimensions
In this study, we investigated the impact of embedding dimensions on the embedding-based TKG reasoning component, where entities, relations, and timestamps are embedded into a low-dimensional vector space for link prediction. While higher embedding dimensions can improve model performance, excessively high dimensions also increase the computational cost. To examine this influence, we conducted comparison analyses on the ICEWS14 dataset with a batch size of 1000, an RGNN embedding dimension of 250, a learning rate of 0.1, and a rule confidence threshold of 0.2. Except for the embedding dimension, all other parameters remained unchanged. As illustrated in Figure 4, the model's performance increases with the embedding dimension up to a certain point, after which it begins to decrease. Based on Figure 4, the embedding dimension is therefore fixed at 2000.
4.6.3. The Impact of Rule Sampling Methods
In this experiment, a Gumbel-Softmax method was used to sample rules that satisfy the conditions for specific relations. The alternative strategies were to select the rule with the highest confidence score via Argmax and to use all qualifying rules without sampling. Comparative experiments were conducted on the ICEWS14 dataset while keeping the other model parameters constant. As shown in Figure 5, the Gumbel-Softmax method yielded the best performance.
In this comparison of temporal rule sampling methods, the Gumbel-Softmax method flexibly samples appropriate rules, the Argmax method selects the rules with maximum confidence, and the third method uses all rules. As can be seen in Figure 5, the training evaluation curve of the Gumbel-Softmax method remains the highest throughout, which confirms that differentiable sampling is used effectively in this study.
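The three strategies differ only in how a rule is chosen from each relation's candidate set. A minimal PyTorch sketch of the differentiable variant is given below; the shape of `rule_logits` and the use of log-confidences as logits are our assumptions, not MVFF's exact implementation. Replacing `F.gumbel_softmax` with a plain argmax recovers the second strategy, and skipping selection entirely corresponds to using all candidate rules.

```python
import torch
import torch.nn.functional as F

def sample_rules(rule_logits, tau=10.0):
    """Differentiably sample one candidate rule per relation.

    rule_logits: tensor of shape (num_relations, num_candidate_rules),
    e.g. log-confidences of the candidate rules. With hard=True the output
    is one-hot in the forward pass while gradients still flow through the
    soft sample, so rule selection can be trained end to end.
    """
    return F.gumbel_softmax(rule_logits, tau=tau, hard=True, dim=-1)

# Toy example: 2 relations, 4 candidate rules each.
confidences = torch.tensor([[0.50, 0.30, 0.15, 0.05],
                            [0.10, 0.60, 0.20, 0.10]])
selection = sample_rules(torch.log(confidences), tau=10.0)
print(selection)  # one-hot rows indicating the sampled rule per relation
```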
4.6.4. Comparison of Different Temperature Coefficients
To explore the impact of the Gumbel-Softmax method on rule sampling, we conducted experiments using different temperature coefficients τ. A higher τ value means that more new rules are explored during the sampling process, while a lower τ value means that rules with higher probabilities are exploited more. As can be seen from Figure 6, the best result is obtained when τ is set to 10. This implies that the model prefers to explore diverse temporal logic rules.
4.7. Statistics of Extracted Temporal Logic Rules
Table 6 displays the statistics of the rules extracted from the ICEWS14 dataset. The number of temporal random walks per relation was set to 200, and the walk lengths were 2 and 3, corresponding to rules of lengths 1 and 2.
Table 7 includes the number of rule instances and the number of rules for various rule confidence levels. Our study presents six distinct temporal reasoning patterns, each with its own confidence level, reflecting the likelihood of one event leading to another within temporal knowledge graphs. These patterns range from direct actions such as accusations leading to rejections (Pattern 1, confidence 0.186) to more complex interactions involving multiple parties and actions, like the influence of making statements and hosting visits on subsequent consultations (Pattern 3, confidence 0.143). Notably, Pattern 2 (confidence 0.506) underscores a significant likelihood of property confiscation following investigations, highlighting a strong predictive relationship in legal or regulatory contexts. By synthesizing these examples, we gain insights into the reasoning power and practical implications of these patterns. This not only validates the effectiveness of our method in capturing complex temporal relationships but also opens avenues for further exploration into the impact of various rule extraction methods on reasoning accuracy.
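To make the rule format concrete, the patterns discussed above can be written as simple (head, body, confidence) records; the relation names below are paraphrased from the prose and may not match the exact ICEWS labels.

```python
from dataclasses import dataclass

@dataclass
class TemporalRule:
    """A temporal logic rule: if the body relations occur in temporal order,
    the head relation is predicted to follow. Confidences are those quoted
    in the text for the corresponding patterns."""
    head: str
    body: list
    confidence: float

patterns = [
    TemporalRule("Reject", ["Accuse"], 0.186),                            # Pattern 1
    TemporalRule("Confiscate property", ["Investigate"], 0.506),          # Pattern 2
    TemporalRule("Consult", ["Make a statement", "Host a visit"], 0.143), # Pattern 3
]
for rule in patterns:
    print(f"{' & '.join(rule.body)} -> {rule.head} (confidence {rule.confidence})")
```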
4.8. Ablation Study
To further analyze the contribution of the model components (logic rules, RGNN, and TE), we perform a set of ablation studies. In the MVFF model, the temporal logic rules model causal relationships using extracted rules, the RGNN encodes structural information of the static knowledge graph, and TE learns the temporal information. To ensure the comparability of the results, the same parameters were used in all experiments: the embedding dimension of RGNN was set to 250, the embedding dimension of the tensor decomposition was set to 2000, and the rule confidence threshold was set to 0.2. The experimental results are shown in Table 8. As shown in the table, the performance of MVFF decreases no matter which module is removed. The largest drop occurs when the RGNN module is removed, which verifies the importance of the static knowledge graph in temporal knowledge graph reasoning.
4.9. Case Study
Our framework provides interpretable reasoning processes, as can be seen in the explainable reasoning module in the top right corner of Figure 2. For the query (South Korea, Reject, ?, 30 April 2014), based on the learned temporal rules and the existing quadruple (South Korea, Accuse, North Korea, 17 October 2014), we can interpretably answer the query with the entity North Korea.