1. Introduction
With the rapid development of the internet, people have witnessed the emergence of many online social media (OSM) tools in the past decade, such as Twitter, Facebook, and Instagram. These OSMs have gradually become the primary source of information in people’s daily lives and have fundamentally changed how people share information [1]. However, OSMs are a double-edged sword. On the one hand, they allow the creation of social connections during periods of social distancing and facilitate the dissemination of knowledge in various contexts. On the other hand, they may cause people to share quick and superficial ideas, such as rumors, and spread them rapidly [1,2,3,4,5]. In this paper, rumors are defined as information that is unconfirmed or has been proven to be false [3,6,7,8]. The explosive spread of rumors threatens the credibility of the internet and has serious adverse effects on individuals and society [9,10,11,12,13,14]. Therefore, effective identification of rumors is crucial to maintaining the security of cyberspace and preserving personal privacy [13,15,16,17,18,19]. However, the brief and fast-spreading nature of rumors makes them difficult to detect automatically. As a result, automatic rumor detection has attracted attention from more and more researchers.
Early automatic rumor detection mainly focused on the content features of posts [13,14]. However, content features alone do not achieve strong results. The propagation process from the source post to responsive posts is a natural tree structure, known as the spatial structure. Several researchers have used the spatial structure as a feature to identify rumors. Liu et al. [20] used graph convolutional networks (GCN) to dynamically combine influence and propagation structure relationships. Bian et al. [21] used bidirectional graph convolutional networks (Bi-GCN) to encode top-down propagation and bottom-up diffusion of rumor trees. Meanwhile, temporal information (called the temporal structure) is an important propagation feature. Li et al. [22] applied a time-step encoder and a temporal attention mechanism to learn the temporal structure of propagation. Huang et al. [4] designed a neural network to capture the spatial and temporal structure of propagation jointly. Dun et al. [23] were concerned that existing studies ignored external knowledge, and they extended and enriched the original representation using external knowledge related to the posts in a knowledge base. Sun et al. [8] also focused on external knowledge information. In order to incorporate knowledge information into the representation of propagation, they designed two dynamic graph structures: the dynamic propagation graph and the dynamic knowledge graph. Specifically, they constructed a temporal post-propagation graph according to the comment relationship between posts and built a temporal knowledge graph based on the posts and related knowledge, then used two sets of GCNs to encode the two graphs separately.
Although the existing methods have made significant progress, we argue that existing methods possess a number of limitations:
(L1) Lack of adaptive aggregation of posts and knowledge: existing methods usually calculate the edge weights of the dynamic knowledge graphs based on statistical approaches [8,24]. However, edge weights generated in this way cannot aggregate the knowledge information that is most beneficial for detecting rumors.
(L2) Lack of adaptive fusion of propagation graph information and knowledge graph information: existing methods tend to concatenate propagation structure information and knowledge structure information as feature representations [8,23,24]. However, this approach lacks effective fusion of propagation and knowledge information, as the roles of propagation and knowledge information in distinguishing rumors differ across temporal stages.
This paper proposes a new rumor detection model, ASTKN, which can effectively integrate propagation’s spatial, temporal, and external knowledge information. Specifically, we design two dynamic graph structures based on the post-propagation process and external knowledge information: the post-propagation graph and the post-entity-concept propagation graph. Among them, the post-propagation graph constructs the propagation process of posts based on temporal comment relationships. Similarly, the post-entity-concept propagation graph builds relationships between each post and its entities and concepts based on time series. In particular, in order to adaptively aggregate information, we apply dynamic graph attention networks to both graphs. Dynamic graph attention networks enable the model to assign different attention weights to different nodes when encoding propagation graphs. This allows the model to dynamically aggregate and weight features based on the relative importance between nodes. This adaptability enables better discrimination of the importance of external knowledge, thereby enhancing the model’s ability to judge the authenticity of events. Meanwhile, a new attention mechanism has been introduced to better integrate the information from the propagation graph and the knowledge graph. This attention mechanism can dynamically calculate the importance of post propagation and knowledge propagation information at different temporal stages and allocate these importance scores as weights to the post-propagation graph and the post-entity-concept propagation graph. Through the above process, the model can achieve information interaction between different graph structures. The post propagation and post-entity-concept propagation graphs can effectively complement and enhance each other based on the semantic correlation between nodes, thereby providing a richer and more accurate representation of features.
The main contributions of this paper are summarized as follows:
We apply dynamic graph attention networks to the graph structure at different temporal stages. This not only captures spatial–temporal information but also efficiently calculates the importance of different knowledge. This allows each post to aggregate the information that is more important for detecting rumors.
We introduce a new attention mechanism in the fusion process of post propagation and knowledge propagation structures, which can generate weights adaptively for each at different temporal stages. Consequently, dynamic propagation and dynamic knowledge information can be effectively fused.
We conducted extensive experiments on two public rumor detection datasets. Experimental results show that ASTKN outperforms strong baselines and exhibits excellent results in early rumor detection.
Section 2 below presents recent works on rumor detection and dynamic graph attention networks. Section 3 formalizes the problem of our study. Section 4 elaborates on the framework of ASTKN. Section 5 shows the experimental performance and analysis. Section 6 is the conclusion.
2. Related Work
2.1. Rumor Detection Based on Spatial Structure
Spatial structure-based rumor detection typically constructs a tree or graph structure of the diffusion between the source post and individual responsive posts in order to identify rumors [10,25]. Liu et al. [20] used GCN and an attention mechanism to combine the influence and propagation structure relations. Bian et al. [21] considered propagation trees in terms of top-down propagation and bottom-up diffusion, then used Bi-GCNs to encode the two structures. Bai et al. [26] were concerned that many existing methods focus on only one of the content features and structural features in rumor dialogues. They proposed a source-response conversation tree convolutional neural network to extract the content and structural features of rumor dialogues. Specifically, they constructed a source-response conversation tree and extracted the content and structural features using an autoencoder. Song et al. [27] focused on the importance of the position of responsive posts in the graph structure for correct prediction, then used the positional information of responsive posts to train a model using generative adversarial methods.
2.2. Rumor Detection Based on Temporal Structure
The temporal structure is helpful for detecting rumors as well [28]. Huang et al. [4] addressed the issue that existing methods focus only on the structure information of rumor diffusion and ignore the temporal information. To effectively capture the spatial–temporal structure of diffusion, they proposed a spatial–temporal structure neural network to learn the spatial–temporal structure as a whole. Li et al. [22] used a time step encoder and a temporal attention mechanism to learn the temporal correlations between responsive posts. Han et al. [29] used the discrete Fourier transform to obtain the temporal characteristics of rumor propagation and reduced the computational effort using the fast Fourier transform. Bonifazi et al. [30] proposed a combined temporal and spatial framework for determining the range of user sentiment on specific topics on social platforms. Sun et al. [14] used a hyperedge learning method to represent the temporal propagation structure and a fusion neural network to jointly learn the content, structural, and temporal features of rumor propagation. Song et al. [31] noted that the node and edge features during post propagation change over time. They proposed a new framework for temporal rumor detection that can effectively fuse content, structural, and temporal information. Unlike the above approaches, our model applies dynamic graph networks to jointly encode the spatial and temporal diffusion information. Specifically, we encode the graph structure of each temporal stage and use it as input for the next temporal stage. In this way, diffusion’s spatial and temporal information can be integrated more effectively.
2.3. Dynamic Graph Attention Networks
Recently, many researchers have tried to fuse graph structure information with temporal information to encode spatial and temporal features jointly. Meanwhile, the graph attention network (GAT) can assign weights to different neighboring nodes in a global context. Thus, dynamic graph attention networks have received increasing attention. Xu et al. [32] argued that node embedding should contain static and changing structural features. They encoded temporal features based on harmonic analysis principles and inferred node classes based on multiple GAT layers. Rossi et al. [33] incorporated each node’s temporal information into the node representation using concatenation and encoded node information using GAT layers. Wang et al. [34] proposed an attention-based spatial–temporal graph attention network (ASTGAT) to capture dynamic spatial–temporal data correlations. Each component of ASTGAT contains multiple spatial–temporal blocks constructed from gated convolution and graph attention layers to capture stage-specific temporal information. Carchiolo et al. [35] constructed dynamic graph networks, assigned timestamps to each event, and then employed GAT for information aggregation. Tang and Zeng [36] used a gated recurrent unit layer, a graph attention layer with edge features, a gated bidirectional long short-term memory network, and a residual structure to jointly extract the spatial–temporal features of the data.
2.4. Knowledge-Enhanced Rumor Detection
Most knowledge-based rumor detection methods use external knowledge to enrich the representation of posts. Cui et al. [37] combined medical knowledge graphs and article–entity bipartite graphs to generate health information representations and applied these representations to healthcare misinformation detection. Wang et al. [24] jointly modeled the semantic representation of text, external knowledge, and visual information and used it for misinformation detection. Dun et al. [23] combined attention mechanisms to incorporate knowledge into textual representation to identify fake news. Chen et al. [38] proposed a knowledge graph-based method for rumor data enhancement which introduces knowledge representation in the generation process of posts to cope with data deficiency. Sun et al. [8] combined spatial, temporal, and external knowledge. They applied GCN to encode both the propagation graph and the knowledge graph, then concatenated the information from both graphs and used it as the initial representation of the next temporal phase. However, these methods neglect the adaptive aggregation of knowledge and posts in the graph structure as well as the adaptive fusion of knowledge propagation and post propagation, which are fully considered in our model.
Table 1 contains summaries of recent related works. Compared to the existing works, our model focuses on the adaptive aggregation of knowledge and posts in the propagation graph and the adaptive fusion of the knowledge propagation structure and post propagation structure, which have been neglected in the existing works. As a result, our model not only extracts the content, spatial, and temporal information of propagation but also better aggregates external knowledge information and the dynamic evolution of knowledge information in the propagation structure.
3. Problem Definition
Given an event , the set of post texts it contains is represented as . Here, s represents the source post, represents the responsive post, and m represents the number of responsive posts. The source post s can be considered as . We can obtain the release time sequence associated with the event , where . Then, and are combined to obtain .
We divide into stages along the temporal order. Each temporal stage has an equal temporal interval . Thus, the r-th sub-event of is .
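The stage division above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `split_into_stages`, the clamping of the latest post into the final stage, and in particular the `cumulative` flag (whether the r-th sub-event keeps all posts released up to stage r, or only those inside the r-th interval) are assumptions, since the text only states that the stages have equal temporal intervals.

```python
from typing import List, Sequence


def split_into_stages(
    times: Sequence[float], num_stages: int, cumulative: bool = True
) -> List[List[int]]:
    """Return, for each temporal stage, the indices of the posts it contains.

    times[i] is the release time of post i (index 0 is assumed to be the source).
    """
    if num_stages < 1:
        raise ValueError("num_stages must be positive")
    start, end = min(times), max(times)
    # Equal-width interval; fall back to 1.0 if every post shares one timestamp.
    interval = (end - start) / num_stages or 1.0
    stages: List[List[int]] = [[] for _ in range(num_stages)]
    for idx, t in enumerate(times):
        # Clamp so that the latest post lands in the last stage.
        stage = min(int((t - start) / interval), num_stages - 1)
        stages[stage].append(idx)
    if not cumulative:
        return stages
    # Assumption: the r-th sub-event keeps every post released up to stage r.
    seen: List[int] = []
    result: List[List[int]] = []
    for members in stages:
        seen = seen + members
        result.append(sorted(seen))
    return result
```

For example, release times `[0, 1, 2, 5, 9]` with three stages give an interval of 3, so the non-cumulative stages are `[[0, 1, 2], [3], [4]]`.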
We need to learn a model to classify each event into predefined categories , which is the ground truth label of the event. Here, 0 denotes non-rumor and 1 denotes rumor.
To facilitate understanding, we have provided the important symbols and their descriptions in Table 2.
4. Method
Figure 1 shows the structure of ASTKN, which is mainly divided into three parts:
Dynamic Graph Construction Module: we construct the post-propagation graph and the post-entity-concept propagation graph, respectively, using the reply/comment relationship and related external knowledge.
Dynamic Graph Aggregation Module: we use the dynamic graph attention network to encode the graph structure and use a new attention mechanism to fuse the information of the two graph structures.
Classification Module: we use the graph information output by the dynamic graph aggregation module at the last temporal stage and the source post content to discriminate whether the event is a rumor or not.
We will describe them in detail in the following sections.
4.1. Dynamic Graph Construction Module
This module mainly constructs the post propagation graph based on the reply/comment relationship between posts and the post-entity-concept propagation graph based on the relationship between posts and external knowledge. The constructed graphs are then used as input for the next module.
4.1.1. Construction of the Post-Propagation Graph
For an event we construct a propagation graph set based on its source post and responsive posts; represents the post-propagation graph of the r-th stage, the node set represents the source post as well as the responsive posts, and the edge set represents the interaction between posts. For example, if is a comment of , then in there exists an edge to connect them. For simplicity, we do not consider the direction of edges, and denote as an undirected graph. For a node , we initialize its representation with .
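A minimal Python sketch of this construction for one temporal stage is given below. The names `build_propagation_graph`, `reply_pairs`, and `stage_posts` are hypothetical, and posts are identified by integer indices purely for illustration; the only behavior taken from the text is that a reply/comment relationship creates an edge and that edge direction is ignored.

```python
from typing import Dict, Iterable, Set, Tuple


def build_propagation_graph(
    reply_pairs: Iterable[Tuple[int, int]], stage_posts: Iterable[int]
) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map over the posts present in one stage.

    reply_pairs holds (child, parent) tuples: `child` is a comment on `parent`.
    """
    present = set(stage_posts)
    adjacency: Dict[int, Set[int]] = {post: set() for post in present}
    for child, parent in reply_pairs:
        # Only connect posts that have both appeared by this stage.
        if child in present and parent in present:
            adjacency[child].add(parent)
            adjacency[parent].add(child)  # undirected: store both directions
    return adjacency
```

Calling the function once per temporal stage, with the posts of that stage, yields the set of stage-wise graphs.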
4.1.2. Construction of the Post-Entity-Concept Propagation Graph
Most posts are short texts containing many entities, proper nouns, and abbreviations. Understanding their meanings requires knowing their corresponding concepts. For example, given a post “Slight glitch with @SpaceX Starlink. coming back online now” we need to let the machine know that “SpaceX” is a “space exploration technology company” and not a “spacecraft”, and that “Starlink” is a “high-speed internet access service” and nothing else. Therefore, we introduce external knowledge related to posts, allowing knowledge information to be involved in message propagation. Specifically, we construct a post-entity-concept propagation graph to model the dynamic relationship between posts, entities, and related concepts.
First, we use TagMe [39] for entity linking, which links entity mentions in posts to the related entities in the knowledge graph.
For each entity, we obtain its corresponding concepts in YAGO. We extract concepts based on the isA relationship between entities and concepts, for example, “China isA country” or “China isA Asian country”. For a given post, this allows relevant entities and concepts to be obtained. Therefore, we can find an entity set and a concept set for each temporal sub-event.
For , we construct a post-entity-concept propagation graph , where the set of nodes is the union of , , and . We construct the post-entity-concept propagation graph mainly to simulate the temporal propagation of knowledge information. Thus, unlike the post-propagation graph, we do not build edges between posts. We construct other edges according to the following rules.
Post-entity edges. If a post in contains a word that can be linked to an entity in , we add an edge between the post node and the entity node.
Entity-entity edges, entity-concept edges, and concept-concept edges. We use the Pointwise Mutual Information (PMI) to measure entity-entity, entity-concept, and concept-concept correlations. Specifically, we set a fixed-size sliding window to count the co-occurrence information of nodes from the global corpus and then calculate the PMI scores between node pairs. A negative PMI usually means that the correlation between terms is weak. We keep edges with positive PMI scores and remove edges with non-positive PMI scores. As in the post-propagation graph, we initialize the representation of node in the post-entity-concept propagation graph to . If appears in both and , then they have the same initial embedding.
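The PMI-based edge rule can be sketched as follows, using the standard definition PMI(a, b) = log(p(a, b) / (p(a) p(b))) with probabilities estimated as the fraction of sliding windows containing the node or node pair. The function name `positive_pmi_edges`, the representation of the corpus as sequences of entity/concept identifiers, and the handling of sequences shorter than the window are assumptions made for illustration; the window size used by the authors is not stated.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def positive_pmi_edges(
    sequences: Sequence[Sequence[Hashable]], window_size: int
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Return {(a, b): PMI} for node pairs whose PMI is strictly positive."""
    windows: List[set] = []
    for seq in sequences:
        if len(seq) <= window_size:
            windows.append(set(seq))  # short sequence: one window
        else:
            for i in range(len(seq) - window_size + 1):
                windows.append(set(seq[i:i + window_size]))
    n = len(windows)
    single: Counter = Counter()
    pair: Counter = Counter()
    for w in windows:
        single.update(w)
        pair.update(combinations(sorted(w), 2))
    edges = {}
    for (a, b), count in pair.items():
        pmi = math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        if pmi > 0:  # keep positive PMI, drop non-positive PMI
            edges[(a, b)] = pmi
    return edges
```

The returned scores are only used here to decide which edges exist; whether they are also kept as edge weights depends on the encoder that consumes the graph.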
4.2. Dual Dynamic GAT Module
This module aggregates the two types of graphs and generates post node representations that incorporate spatial, temporal, and knowledge information.
4.2.1. A Single Dual-Static GAT Unit
We first describe how to encode the post propagation graph. We define the initial set of feature vectors of the post propagation graph nodes as , where represents the initial feature vector of a node. These feature vectors can form a feature matrix .
A Single Dual-Static GAT Unit contains two layers of GAT.
The feature matrix first passes through one layer of GAT. For node i, the attention coefficients with its neighbors are computed using the softmax function. The learnable parameters are the weight matrix of the first GAT layer at the r-th temporal stage and a weight vector; ‖ represents the concatenation operation, and the normalization runs over the neighbors of node i in the graph. We use LeakyReLU [40] as the activation function, which provides better gradient flow for the model.
After aggregating the features with the first GAT layer, we obtain a new set of feature vectors, which form a new feature matrix. The source post of an event plays a crucial role in the whole event. We therefore concatenate the hidden feature vector of each node with the source post vector from the previous layer to obtain an enhanced feature matrix.
Similar to the first GAT layer, the second layer takes the enhanced feature matrix as input for information aggregation and generates a new feature matrix. We apply the source feature enhancement again to this matrix and pass the enhanced features through a linear transformation to obtain the final output at the current stage.
Unlike post-propagation graphs, post-entity-concept propagation graphs contain different types of nodes. Using source feature augmentation in post-entity-concept propagation graphs can weaken knowledge information while causing redundancy in the final classification stage. Therefore, we only utilize a two-layer GAT to encode the post-entity-concept graph without source feature augmentation.
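To make the encoding step concrete, the following pure-Python sketch implements one single-head graph attention layer in its standard form (softmax over LeakyReLU-activated scores of concatenated, linearly transformed node features), plus the source feature enhancement as a concatenation. It is an illustration under stated assumptions, not the authors' code: the names `gat_layer` and `enhance_with_source`, the inclusion of a self-loop in each neighborhood, the single attention head, the LeakyReLU slope of 0.2, and the absence of an output nonlinearity are all choices made here.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(
    H: Dict[int, Vector], adj: Dict[int, Set[int]], W: List[Vector], a: Vector
) -> Tuple[Dict[int, Dict[int, float]], Dict[int, Vector]]:
    """One single-head GAT layer; returns (attention weights, new features).

    W has shape (out_dim, in_dim); a has length 2 * out_dim.
    """
    # Linear transformation of every node feature vector.
    Z = {i: [sum(w * x for w, x in zip(row, h)) for row in W] for i, h in H.items()}
    alpha: Dict[int, Dict[int, float]] = {}
    out: Dict[int, Vector] = {}
    for i in H:
        nbrs = sorted(adj[i] | {i})  # assumption: each node also attends to itself
        # Score = LeakyReLU(a . [z_i || z_j]); list "+" performs the concatenation.
        logits = [leaky_relu(sum(p * q for p, q in zip(a, Z[i] + Z[j]))) for j in nbrs]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
        # Attention-weighted sum of the transformed neighbor features.
        out[i] = [sum(alpha[i][j] * Z[j][k] for j in nbrs) for k in range(len(Z[i]))]
    return alpha, out


def enhance_with_source(H: Dict[int, Vector], source_vec: Vector) -> Dict[int, Vector]:
    """Source feature enhancement: append the source post vector to every node."""
    return {i: h + source_vec for i, h in H.items()}
```

Under these assumptions, a post-propagation graph would be encoded by calling `gat_layer`, then `enhance_with_source`, then a second `gat_layer`, whereas the post-entity-concept graph would skip the enhancement step.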
4.2.2. Temporal Stage Fusion Unit
In one temporal stage, the post-propagation and post-entity-concept propagation graphs are each encoded by a single dual-static GAT unit, which generates one output feature matrix per graph. We fuse these two sources of structural information through a temporal stage fusion unit and use the result as the initial node embedding for the next temporal stage. To implement this idea, we first define a global feature matrix O which contains all post representations of the event. The post vectors are initialized in the same way as in the post-propagation graph. O is updated at each temporal stage and is used to retain information from the previous temporal stage. Then, we apply linear transformations, each with its own weight matrix, to the feature matrix O, the post-propagation graph feature matrix, and the post-entity-concept propagation graph feature matrix, respectively.
Next, we use two linear layers to convert the two graph feature matrices into one-dimensional scores and calculate the relative importance weights. Here, two weight matrices are used to reduce the feature dimension, tanh is the activation function, and the mean is used to obtain one weight score per graph; the resulting weights denote the relative importance of the two graphs.
These weights are then used for the dynamic fusion of the two graphs, which generates a fused feature matrix; note that we only fuse the source and responsive post nodes.
To effectively incorporate the propagation, knowledge, and previous information, we concatenate the fused feature matrix with the representation of O, as the fused feature matrix already contains the post-propagation graph and the post-entity-concept propagation graph information. The dimension of the concatenated vector is reduced by a linear layer and activated using tanh.
The resulting feature matrix is used as the initial embedding for the corresponding positions in the next temporal stage.
The three parts of Equation (13) show that this feature matrix updates the corresponding node representations of the post-propagation graph at the next temporal stage, the corresponding node representations of the post-entity-concept propagation graph at the next temporal stage, and the corresponding node representations of the feature matrix O, respectively.
Because the post-propagation and post-entity-concept propagation graphs are different at different temporal stages, the temporal stage fusion unit uses the output of the previous temporal stage as the initial representations of the corresponding nodes of the next temporal stage in order to fully capture this dynamic structural information. Then, the structural features of the next temporal stage are encoded with the dual static GAT unit.
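The fusion step can be illustrated with the short sketch below. It is a simplified reading of the description, not the authors' formulation: the names `graph_score`, `fuse_graphs`, and `update_memory` are hypothetical, each scoring layer is reduced to a single weight vector, the relative importance weights are assumed to come from a softmax over the two mean scores, and the initial linear transformations of the three matrices are omitted. Rows of `Xp` and `Xk` are the post-node features from the post-propagation graph and the post-entity-concept propagation graph, aligned by post.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def graph_score(X: Matrix, w: List[float]) -> float:
    """Mean over post nodes of tanh(x . w): a one-dimensional score per graph."""
    return sum(math.tanh(sum(a * b for a, b in zip(row, w))) for row in X) / len(X)


def fuse_graphs(
    Xp: Matrix, Xk: Matrix, wp: List[float], wk: List[float]
) -> Tuple[float, float, Matrix]:
    """Weight the two graph outputs by softmax-normalized importance scores."""
    sp, sk = graph_score(Xp, wp), graph_score(Xk, wk)
    peak = max(sp, sk)
    ep, ek = math.exp(sp - peak), math.exp(sk - peak)
    beta_p, beta_k = ep / (ep + ek), ek / (ep + ek)  # assumption: softmax over scores
    fused = [[beta_p * a + beta_k * b for a, b in zip(rp, rk)] for rp, rk in zip(Xp, Xk)]
    return beta_p, beta_k, fused


def update_memory(fused: Matrix, O: Matrix, W: Matrix) -> Matrix:
    """Concatenate fused features with O row by row, then apply linear + tanh."""
    return [
        [math.tanh(sum(w * v for w, v in zip(row, f + o))) for row in W]
        for f, o in zip(fused, O)
    ]
```

Because the weights are recomputed at every stage, the relative contribution of propagation and knowledge information can differ between early and late stages, which is the behavior the fusion unit is designed to provide.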
4.3. Rumor Classification Module
The output of the last temporal stage fusion unit contains information about the entire event. We use average pooling to aggregate this information into a representation H. We concatenate H with the BERT representation of the source post and feed the result into a linear layer for further feature extraction. Another linear layer is used to classify the event. The cross-entropy loss between the prediction and the ground truth label of the i-th event is used to train the model.
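The classification steps can be sketched in plain Python as shown below. This is an illustrative outline only: the function names, the softmax over the two class logits, and the absence of an activation between the two linear layers are assumptions, and in practice the source post vector would be a BERT embedding rather than the toy vector used in the example.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mean_pool(X: Matrix) -> Vector:
    """Average pooling over node rows."""
    return [sum(col) / len(X) for col in zip(*X)]


def linear(x: Vector, W: Matrix, b: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def softmax(z: Vector) -> Vector:
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(X_last: Matrix, source_vec: Vector,
             W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Pool the last-stage nodes, append the source vector, apply two linear layers."""
    h = mean_pool(X_last)
    features = linear(h + source_vec, W1, b1)  # "+" concatenates the two lists
    return softmax(linear(features, W2, b2))   # [P(non-rumor), P(rumor)]


def cross_entropy(probs: Vector, label: int) -> float:
    """Negative log-likelihood of the ground truth label (0 or 1)."""
    return -math.log(probs[label])
```

As a small worked case, pooling `[[1, 3], [3, 1]]` gives `[2, 2]`; with the weights used in the accompanying check, the class logits are `[2, 1]`, so the loss for label 0 is `log(1 + e^-1)`.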
Algorithm 1 shows the training process of ASTKN.
Algorithm 1 Training of ASTKN
Input: a set of events, the number of temporal stages, a concept knowledge graph
Output: a trained model
1: repeat
2:   for each event in a batch do
3:     Construct the post-propagation graph
4:     Construct the post-entity-concept propagation graph
5:     for each temporal stage r do
6:       Obtain a representation of the post-propagation graph using Equations (1)–(4)
7:       Obtain a representation of the post-entity-concept graph using Equations (1) and (2)
8:       Combine the above two types of information using Equations (5)–(11)
9:       Obtain the initial embedding of the relevant portion of the next temporal stage using Equations (12) and (13)
10:     end for
11:     Aggregate the node representations of the last temporal stage by average pooling using Equation (14)
12:     Fuse the node representation with the source post representation using Equation (15)
13:     Predict and calculate the loss using Equations (16) and (17)
14:     Update parameters using Adam
15:   end for
16: until convergence
5. Experiments
We tested the performance of ASTKN on two publicly available rumor detection datasets. Specifically, we focused on the following issues.
Q1: How does ASTKN perform compared to state-of-the-art baselines on rumor detection?
Q2: What are the impacts of our proposed innovations on model performance?
Q3: How do different hyperparameters affect model performance?
Q4: Is ASTKN able to detect rumors in the early propagation stage?
5.1. Datasets
In the experimental part, we used two rumor detection datasets, namely, PHEME5 and PHEME9.
PHEME5 contains rumor tweets related to five major events: Charliehebdo, Ferguson, Germanwings-crash, Ottawashooting, and Sydney-siege. Each major event includes a large number of sub-events (which we call events). Each event contains a source post, responsive posts, propagation structure information, and the time information of each posting. Each event has already been labeled as Rumor or Non-rumor.
PHEME9 extends PHEME5 with four main events: Ebola-Essien, Gurlitt, Prince-Toronto, and Putinmissing. The structure of PHEME9 is the same as PHEME5. Similarly, each event has been labeled as Rumor or Non-rumor.
We removed events that do not contain responsive posts and divided the two datasets into training, validation, and testing sets with a ratio of 7:1:2. The statistics after this division are shown in Table 3.
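The filtering and 7:1:2 division can be sketched as follows. The function name `split_events`, the dictionary input format, the fixed random seed, and the use of a plain random shuffle (rather than, for example, a stratified or per-topic split) are assumptions for illustration; the text does not specify how events were assigned to the three sets.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_events(
    events: Dict[str, Sequence[str]],
    ratios: Tuple[int, int, int] = (7, 1, 2),
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Drop events without responsive posts, then split into train/val/test.

    events maps an event id to the list of its responsive post ids.
    """
    # Keep only events that have at least one responsive post.
    ids = sorted(eid for eid, replies in events.items() if len(replies) > 0)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```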
5.2. Baseline Models
SVM-BOW [13]: SVM-BOW utilizes bag-of-words and N-grams as feature representation and applies a Support Vector Machine (SVM) as the classifier.
CNN [13]: CNN uses convolutional neural networks to extract post features and softmax as the classifier.
BiLSTM [13]: BiLSTM extracts contextual information of posts using a bidirectional long short-term memory network.
BERT [41]: BERT is a language model based on a deep bidirectional transformer encoder representation, which we use to encode source posts.
TD-RvNN [42]: A tree-structured Recursive Neural Network (RvNN) with GRU units, where the RvNN obtains its representation from a top-down (TD-RvNN) propagation structure.
BU-RvNN [42]: A tree-structured RvNN with GRU units, where the RvNN obtains its representation from a bottom-up (BU-RvNN) propagation structure.
Bi-GCN [21]: A GCN-based rumor detection method using bidirectional propagation structures (propagation and diffusion structures) and the text content of posts.
CALN [7]: CALN is a Contrastive Adversarial Learning Network. It captures topic-related features using unsupervised topic clustering methods and applies unsupervised adversarial learning methods to align the data distribution of unseen topics. We compare against the performance of CALN as reported by Ma et al. [7].
DDGCN [8]: DDGCN is a dual dynamic graph convolutional network. It can dynamically capture post propagation information as well as knowledge propagation information.
5.3. Experimental Settings and Evaluation Metrics
ASTKN was implemented in PyTorch 1.12.0 with CUDA 11.3. All experiments were performed on several identically configured Linux servers, each with an AMD EPYC 7601 CPU and an NVIDIA GeForce RTX 3090 GPU. The number of temporal stages was set to 3. The number of epochs was set to 5. The parameters were optimized using the Adam algorithm. BERT-base was used as the encoder for the source post and pretrained on the datasets. Due to the category imbalance in the PHEME5 and PHEME9 datasets, we used Accuracy (Acc), Recall (Rec), and F1 as evaluation metrics to assess model performance. We present the average results from five different random seeds.
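For reference, the three metrics can be computed as in the sketch below, followed by averaging over seeds. The helper names `binary_metrics` and `average_over_seeds` are hypothetical, and treating the rumor class (label 1) as the positive class for Recall and F1 is an assumption, since the averaging scheme (per-class or macro) is not stated.

```python
from typing import Dict, List, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, plus recall and F1 with respect to the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    acc = correct / len(pairs)
    rec = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"acc": acc, "rec": rec, "f1": f1}


def average_over_seeds(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over the runs obtained with different random seeds."""
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}
```

On an imbalanced dataset, accuracy alone can look high for a classifier that favors the majority class, which is why Recall and F1 are reported alongside it.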
5.4. Comparison Experiments (Q1)
The performance comparison between ASTKN and other baselines is presented in Table 4, yielding the following observations:
The feature-based model SVM-BOW performs poorly, as it uses hand-developed features based on the overall statistics of posts. However, these features are too coarse and have low generalizability.
Deep learning-based models automatically extract effective features due to using neural networks. Thus, their performance is significantly better than the feature-based approach. CNN, BiLSTM, and BERT all utilize content features only, with BERT achieving higher performance due to its more robust rumor feature capture capability. RvNN and Bi-GCN both use the spatial structure of propagation. RvNN models post propagation as a tree-like structure and designs two ways to extract spatial structure features, i.e., top-down (TD-RvNN) and bottom-up (BU-RvNN). However, RvNN has weaker feature extraction ability for text and spatial structure. Bi-GCN takes into account the fact that both propagation and diffusion are crucial features. Therefore, they use Bidirectional GCNs to encode both propagation and diffusion structures separately. Thus Bi-GCN is more effective than RvNN. However, using only the post propagation structure has disadvantages; as the number of nodes in the propagation tree decreases, the information that can be provided decreases, reducing the model’s performance. DDGCN addresses the concern that existing methods do not consider external knowledge related to the post and temporal information associated with the propagation process. Therefore, they model two dynamic graph structures, namely, the dynamic propagation graph and the dynamic knowledge graph, and encode the information of the two graph structures separately using GCNs. DDGCN can effectively capture the spatial structure, temporal structure, and relevant external knowledge information of rumors. In particular, it uses a statistical approach to assign edge weights to the knowledge graph. However, this approach is ineffective in aggregating knowledge information to relevant posts, as discussed in the following part. 
CALN achieves the second-best performance on PHEME5, demonstrating the effectiveness of topic clustering on seen topics and unsupervised adversarial learning for aligning the distribution of unseen topics.
Compared to the baseline models, ASTKN achieves the best performance. First, compared to CNN, BiLSTM, and BERT, which only utilize content features, ASTKN not only encodes the content features of source posts but also exploits the spatial, temporal, and external knowledge information of propagation. Second, compared to TD-RvNN, BU-RvNN, Bi-GCN, and CALN, ASTKN applies stronger encoders to extract rumor features and pays more attention to temporal and external knowledge information. Third, compared with DDGCN, which only uses a statistical approach to fix edge weights, we consider adaptive post-to-post and post-to-knowledge aggregation. This adaptive aggregation can better capture the relationship between nodes by learning the importance weights. Compared to the method using fixed edge weights, our model can adaptively adjust the weights of information transfer between different nodes according to the importance of the nodes in the propagation structure, ensuring that more important nodes gain more influence in the information transfer and that the key information in the propagation structure is captured better. Meanwhile, DDGCN only applies simple concatenation to fuse the propagation and knowledge information. In contrast, we introduce a new attention mechanism that can effectively integrate the propagation structure information of posts and the propagation structure information of knowledge through weighted fusion. This means that the model can pay more attention to the information relevant to the rumor detection task. At the same time, it enables the model to learn which parts are more important, thereby suppressing or ignoring noise or redundant information.
5.5. Ablation Experiments (Q2)
In this section, we describe our ablation experiments on the two datasets used to comprehensively analyze the key components of ASTKN. Specifically, we set up the following comparison models:
R1: Removing the post-entity-concept propagation graph and encoding the post-propagation graph using dynamic graph convolutional networks.
R2: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph convolutional networks while fusing post propagation and knowledge propagation information using concatenation.
R3: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph convolutional networks. Following Sun et al. [8], we assign edge weights to the post-entity-concept propagation graph and use concatenation to fuse post propagation and knowledge propagation information (post-entity edges use the term frequency–inverse document frequency (TF-IDF) as the edge weight, while entity-entity, entity-concept, and concept-concept edges use the pointwise mutual information (PMI) as the edge weight).
R4: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph attention networks and using concatenation to fuse post propagation and knowledge propagation information.
R5: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph attention networks and using our designed attention mechanism to fuse post propagation and knowledge propagation information.
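To make the statistically fixed edge weights used in R3 concrete, the following sketch computes PMI and TF-IDF from toy data. It follows the textbook definitions, not necessarily the exact variants used by Sun et al.; the window-based probability estimates, the function names, and the example tokens are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_weights(windows):
    """PMI(a, b) = log(p(a, b) / (p(a) * p(b))), with probabilities
    estimated as the fraction of windows containing the item(s)."""
    n = len(windows)
    single = Counter()
    pair = Counter()
    for w in windows:
        items = sorted(set(w))
        single.update(items)
        pair.update(combinations(items, 2))
    return {
        (a, b): math.log((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }

def tf_idf(term, doc, docs):
    """TF-IDF of `term` in `doc`: relative term frequency times log(N / df)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

# Toy co-occurrence windows (hypothetical entities and concepts).
windows = [
    ["police", "shooting"],
    ["police", "shooting"],
    ["police", "hostage"],
    ["weather", "storm"],
]
edge_weights = pmi_weights(windows)
```

Weights computed this way depend only on corpus statistics and stay fixed during training, which is the property that R4 and R5 replace with learned attention weights.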
The Acc and F1 obtained from the ablation experiments on PHEME5 and PHEME9 are shown in Figure 2. Several conclusions can be drawn from the figure. First, the results of R1 show that utilizing only post propagation and content information while ignoring other information yields limited performance.
Second, incorporating external knowledge information effectively improves performance (R2). This may be because posts are short and lack contextual information, so relevant external knowledge can supplement the background information. Meanwhile, the structure of knowledge propagation also helps to identify rumors.
Third, we find that adding statistically computed edge weights (R3) to the graph structure has a limited effect on improving performance (R3 improves average Acc by 0.002 and average F1 by 0.001 compared to R2). In contrast, using the adaptive method to generate edge weights results in a larger improvement (R4 improves average Acc by 0.009 and average F1 by 0.008 compared to R3, and R4 improves average Acc by 0.011 and average F1 by 0.009 compared to R2). This may be because treating each neighbor node as equally important or using statistically based fixed edge weights is not conducive to efficient feature aggregation. In contrast, the adaptive method allows the model to better aggregate the features that are important for identifying rumors.
Fourth, using our designed attention mechanism (R5) to dynamically fuse post propagation information with knowledge information achieves the highest performance. Although comprehensive information is essential, not all information is equally important for identifying rumors. This dynamic fusion process allows the model to learn which information is more important for identifying rumors and which is less important. Thus, the model using dynamic graph attention networks and the attention mechanism we designed achieves the best results.
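The contrast between treating every neighbor equally and learning neighbor weights can be sketched as follows. This is a simplified, single-head, graph-attention-style aggregation without a feature projection matrix; it is not the ASTKN encoder, and the attention vector `a` is a hypothetical stand-in for learned parameters.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with a small negative slope."""
    return x if x > 0 else slope * x

def attention_aggregate(h_i, neighbors, a):
    """Adaptive aggregation: e_ij = LeakyReLU(a . [h_i || h_j]),
    alpha_ij = softmax over j of e_ij, output = sum_j alpha_ij * h_j."""
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, h_i + h_j)))
        for h_j in neighbors
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    out = [
        sum(al * h_j[d] for al, h_j in zip(alphas, neighbors))
        for d in range(len(h_i))
    ]
    return out, alphas

def mean_aggregate(neighbors):
    """Uniform aggregation: every neighbor receives the same weight."""
    n = len(neighbors)
    return [sum(h[d] for h in neighbors) / n for d in range(len(neighbors[0]))]

# Toy node with three neighbors; `a` is an assumed attention vector.
h_i = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = [0.0, 0.0, 1.0, -1.0]
out, alphas = attention_aggregate(h_i, neighbors, a)
uniform = mean_aggregate(neighbors)
```

In this toy setting the first neighbor receives the largest coefficient, so the aggregated vector is pulled toward it, whereas the uniform mean weights all three neighbors identically.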
5.6. Hyperparameter Tuning Experiments (Q3)
In this experiment, we first tested whether applying more attention heads improves model performance. Then, we tested the effect of different dropout rates on model performance.
Figure 3 shows the effect on model performance of applying a different number of attention heads. Although the application of the multi-head attention mechanism can provide richer local information and more comprehensive global information, the experimental results show that applying the multi-head attention mechanism does not improve our model’s performance. This may be due to two reasons: first, increasing the number of attention heads increases the complexity of the model and the number of parameters, which may lead to increased training difficulty and decreased generalization performance; second, the graph structure consisting of replies and comments may be sparse, and increasing the number of attention heads may lead to excessive dispersion of attention among relatively few neighboring nodes, reducing the expressive power of the model.
As shown in Figure 4, we tested the effect of different dropout rates on the model’s performance. As the dropout rate increases, the performance of ASTKN first increases and then decreases. A dropout rate that is too small does not provide sufficient regularization, which may lead to overfitting and prevent the network from capturing the true distribution of the input data. A dropout rate that is too large leaves the network too simple to adequately learn the features of the input data. Therefore, the model performs best when the dropout rate is moderate.
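For readers unfamiliar with the mechanism being tuned, the following is a minimal sketch of standard inverted dropout, not code taken from ASTKN. The function name and the explicit `rng` argument are assumptions chosen to keep the example reproducible.

```python
import random

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` and rescale
    the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Toy activations; a seeded generator keeps the run reproducible.
rng = random.Random(0)
activations = [1.0] * 8
dropped = dropout(activations, 0.5, rng)
```

A rate close to 0 leaves almost every activation untouched (little regularization), while a rate close to 1 zeroes almost all of them, which matches the rise-then-fall trend described above.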
5.7. Early Detection (Q4)
Detecting rumors at the early propagation stage can prevent them from spreading widely. As shown in Figure 5, to evaluate the early detection performance of ASTKN, we kept different numbers of the earliest responsive posts, in chronological order. Both ASTKN and the baseline models then identified rumors using only the source post and the retained responsive posts.
From Figure 5, three observations can be made. First, the models perform poorly when there are few responsive posts, because the lack of responsive posts entails a corresponding lack of spatial and temporal structure. Second, the performance of all models increases as the number of responsive posts increases, because the models can acquire more information. Third, ASTKN performs strongly at all numbers of responsive posts, exceeding Bi-GCN and DDGCN. This demonstrates that adaptive aggregation of posts and knowledge along with adaptive aggregation of propagation structure and knowledge structure information can effectively improve the model’s early detection ability.
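The truncation used for early detection can be sketched as follows. The dictionary keys (`id`, `parent`, `time`), the function names, and the toy thread are assumptions for illustration and do not reflect the actual data format of PHEME or the ASTKN code.

```python
def earliest_replies(replies, k):
    """Return the k earliest responsive posts in chronological order."""
    return sorted(replies, key=lambda r: r["time"])[:k]

def build_edges(source_id, kept):
    """Rebuild parent -> child edges among the source post and kept replies.
    Because a reply is always posted after its parent, a chronological
    prefix of the thread never loses the parent of a retained reply."""
    kept_ids = {source_id} | {r["id"] for r in kept}
    return [(r["parent"], r["id"]) for r in kept if r["parent"] in kept_ids]

# Toy thread with source post "s" and three responsive posts.
replies = [
    {"id": "r2", "parent": "r1", "time": 30},
    {"id": "r1", "parent": "s", "time": 10},
    {"id": "r3", "parent": "s", "time": 20},
]
kept = earliest_replies(replies, 2)
edges = build_edges("s", kept)
```

Smaller values of `k` yield a shallower and sparser propagation tree, which is consistent with the weaker spatial and temporal signal observed when few responsive posts are available.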
6. Conclusions and Future Work
In this paper, we observe that existing rumor detection methods pay attention to the spatial–temporal structure of propagation and external knowledge information. However, two issues are overlooked: (L1) the lack of adaptive aggregation of posts and knowledge and (L2) the lack of adaptive fusion between propagation structure and knowledge structure information. Therefore, we propose a new rumor detection model, ASTKN. ASTKN applies a dynamic graph attention network to jointly encode the spatial–temporal structure of information propagation, enabling adaptive aggregation of post node and knowledge node information. To better fuse the propagation structure and knowledge structure information, we introduce a new attention mechanism that calculates the importance of propagation and knowledge information in each temporal stage and assigns the resulting importance scores as weights to the propagation structure and the knowledge structure. Through this process, our model generates a better representation for distinguishing rumors.
In future work, we aim to apply ranking algorithms to rumor detection. Ranking algorithms can score or classify rumors based on specific metrics and criteria, helping to prioritize the handling of information with a higher likelihood of being a rumor or having greater destructiveness. It is possible to quickly and automatically analyze vast amounts of information flow using ranking algorithms, thereby improving the efficiency and accuracy of rumor detection. The application of ranking algorithms can be based on multiple factors, including but not limited to the content features of rumors (such as topics, sentiment, and credibility), propagation features (such as retweets, comments, and user interactions), and external knowledge bases (such as factual databases and authoritative institution information). Combining these factors and utilizing appropriate ranking algorithms makes it possible to identify and exclude information that may be rumors.