Adaptive Spatial–Temporal and Knowledge Fusing for Social Media Rumor Detection

Li, Hui; Huang, Guimin; Li, Cheng; Li, Jun; Wang, Yabing

doi:10.3390/electronics12163457


by Hui Li ¹, Guimin Huang ¹,²,*, Cheng Li ¹, Jun Li ¹ and Yabing Wang ¹

¹ School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China

² Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Guilin University of Electronic Technology, Guilin 541004, China

* Author to whom correspondence should be addressed.

Electronics 2023, 12(16), 3457; https://doi.org/10.3390/electronics12163457

Submission received: 25 July 2023 / Revised: 7 August 2023 / Accepted: 11 August 2023 / Published: 15 August 2023

(This article belongs to the Section Artificial Intelligence)


Abstract:

With the growth of the internet and the popularity of mobile devices, propagating rumors on social media has become increasingly easy. Widespread rumors may cause public panic and have adverse effects on individuals. Recently, researchers have found that external knowledge is useful for detecting rumors. However, existing approaches have two limitations. First, they usually use statistical approaches to calculate the importance of different knowledge for a post, and these methods cannot aggregate the knowledge information most beneficial for detecting rumors. Second, the importance of propagation and knowledge information for discriminating rumors differs among temporal stages, yet existing methods usually use a simple concatenation of the two kinds of information as the feature representation, which lacks effective integration of propagation information and knowledge information. In this paper, we propose a rumor detection model, the Adaptive Spatial-Temporal and Knowledge fusing Network (ASTKN). In order to adaptively aggregate knowledge information, ASTKN employs dynamic graph attention networks to encode the temporal knowledge structure. To better fuse propagation structure information and knowledge structure information, we introduce a new attention mechanism that fuses the two types of information dynamically. Extensive experiments on two public real-world datasets show that our proposal yields significant improvements compared to strong baselines and that it can detect rumors at early stages.

Keywords:

social media; rumor detection; external knowledge; adaptive spatial-temporal integration

1. Introduction

With the rapid development of the internet, people have witnessed the emergence of many online social media (OSM) tools in the past decade, such as Twitter, Facebook, and Instagram. These OSMs have gradually become the primary source of information in people’s daily lives and have fundamentally changed how people share information [1]. However, OSMs are a double-edged sword. On the one hand, they allow the creation of social connections during periods of social distance and facilitate the dissemination of knowledge in various contexts. On the other hand, they may cause people to share quick and superficial ideas, such as rumors, and spread them rapidly [1,2,3,4,5]. In this paper, rumors are defined as information that is unconfirmed or has been proven to be false [3,6,7,8]. The explosive spread of rumors threatens the credibility of the internet and has serious adverse effects on individuals and society [9,10,11,12,13,14]. Therefore, effective identification of rumors is crucial to maintaining the security of cyberspace and preserving personal privacy [13,15,16,17,18,19]. However, the brief and fast-spreading nature of rumors makes them difficult to detect automatically. As a result, automatic rumor detection has attracted attention from more and more researchers.

Early automatic rumor detection mainly focused on the content features of posts [13,14]. However, these single features do not achieve excellent results. The propagation process from the source post to responsive posts is a natural tree structure, known as the spatial structure. Several researchers have used the spatial structure as a feature to identify rumors. Liu et al. [20] used graph convolutional networks (GCN) to dynamically combine influence and propagation structure relationships. Bian et al. [21] used bidirectional graph convolutional networks (Bi-GCN) to encode top-down propagation and bottom-up diffusion of rumor trees. Meanwhile, temporal information (called the temporal structure) is an important propagation feature. Li et al. [22] applied a time-step encoder and a temporal attention mechanism to learn the temporal structure of propagation. Huang et al. [4] designed a neural network to capture the spatial and temporal structure of propagation jointly. Dun et al. [23] were concerned that existing studies ignored external knowledge, and they extended and enriched the original representation using external knowledge related to the posts in a knowledge base. Sun et al. [8] focused on external knowledge information. In order to incorporate knowledge information into the representation of propagation, they designed two dynamic graph structures: the dynamic propagation graph and the dynamic knowledge graph. Specifically, they constructed a temporal post-propagation graph according to the comment relationship between posts and built a temporal knowledge graph based on the posts and related knowledge, then used two sets of GCNs to encode the two graphs separately.

Although the existing methods have made significant progress, we argue that existing methods possess a number of limitations:

(L1) Lack of adaptive aggregation of posts and knowledge: existing methods usually calculate the edge weights of the dynamic knowledge graphs based on statistical approaches [8,24]. However, edge weights generated in this way cannot aggregate the knowledge information that is most beneficial for detecting rumors.

(L2) Lack of adaptive fusion of propagation graph information and knowledge graph information: existing methods tend to concatenate propagation structure information and knowledge structure information as feature representations [8,23,24]. However, this approach lacks effective fusion of propagation and knowledge information, as the role of propagation and knowledge information in distinguishing rumors differs at different temporal stages.

This paper proposes a new rumor detection model, ASTKN, which can effectively integrate propagation’s spatial, temporal, and external knowledge information. Specifically, we design two dynamic graph structures based on the post-propagation process and external knowledge information: the post-propagation graph and the post-entity-concept propagation graph. Among them, the post-propagation graph constructs the propagation process of posts based on temporal comment relationships. Similarly, the post-entity-concept propagation graph builds relationships between each post and its entities and concepts based on time series. In particular, in order to adaptively aggregate information, we apply dynamic graph attention networks to both graphs. Dynamic graph attention networks enable the model to assign different attention weights to different nodes when encoding propagation graphs. This allows the model to dynamically aggregate and weight features based on the relative importance between nodes. This adaptability enables better discrimination of the importance of external knowledge, thereby enhancing the model’s ability to judge the authenticity of events. Meanwhile, a new attention mechanism has been introduced to better integrate the information from the propagation graph and the knowledge graph. This attention mechanism can dynamically calculate the importance of post propagation and knowledge propagation information at different temporal stages and allocate these importance scores as weights to the post-propagation graph and the post-entity-concept propagation graph. Through the above process, the model can achieve information interaction between different graph structures. The post propagation and post-entity-concept propagation graphs can effectively complement and enhance each other based on the semantic correlation between nodes, thereby providing a richer and more accurate representation of features.

The main contributions of this paper are summarized as follows:

We apply dynamic graph attention networks to the graph structure at different temporal stages. This not only captures spatial–temporal information but also efficiently calculates the importance of different knowledge, allowing each post to aggregate the information that is most important for detecting rumors.
We introduce a new attention mechanism in the fusion process of post propagation and knowledge propagation structures, which can generate weights adaptively for each at different temporal stages. Consequently, dynamic propagation and dynamic knowledge information can be effectively fused.
We conducted extensive experiments on two public rumor detection datasets. Experimental results show that ASTKN outperforms strong baselines and exhibits excellent results in early rumor detection.

Section 2 below presents recent works on rumor detection and dynamic graph attention networks. Section 3 formalizes the problem of our study. Section 4 elaborates on the framework of ASTKN. Section 5 shows the experimental performance and analysis. Section 6 is the conclusion.

2. Related Work

2.1. Rumor Detection Based on Spatial Structure

Spatial structure-based rumor detection typically constructs a tree or graph structure of the diffusion between the source post and individual responsive posts in order to identify rumors [10,25]. Liu et al. [20] used GCN and an attention mechanism to combine the influence and propagation structure relations. Bian et al. [21] considered propagation trees in top-down propagation and bottom-up diffusion, then used Bi-GCNs to encode the two structures. Bai et al. [26] were concerned that many existing methods focus on only one of the content features and structural features in rumor dialogues. They proposed a source-response conversation tree convolutional neural network to extract the content and structural features of rumor dialogues. Specifically, they constructed a source-response conversation tree and extracted the content and structural features using an autoencoder. Song et al. [27] focused on the importance of the position of responsive posts in the graph structure for correct prediction, then used the positional information of responsive posts to train a model using generative adversarial methods.

2.2. Rumor Detection Based on Temporal Structure

The temporal structure is helpful for detecting rumors as well [28]. Huang et al. [4] addressed the issue that existing methods focus only on the structure information of rumor diffusion and ignore the temporal information. To effectively capture the spatial–temporal structure of diffusion, they proposed a spatial–temporal structure neural network to learn the spatial–temporal structure as a whole. Li et al. [22] used a time step encoder and a temporal attention mechanism to learn the temporal correlations between responsive posts. Han et al. [29] used the discrete Fourier transform to obtain the temporal characteristics of rumor propagation and reduced the computational effort using the fast Fourier transform. Bonifazi et al. [30] proposed a combined temporal and spatial framework for determining the range of user sentiment on specific topics on social platforms. Sun et al. [14] used a hyperedge learning method to represent the temporal propagation structure and a fusion neural network to jointly learn the content, structural, and temporal features of rumor propagation. Song et al. [31] noted that the node and edge features during post propagation change over time. They proposed a new framework for temporal rumor detection that can effectively fuse content, structural, and temporal information. Unlike the above approaches, our model applies dynamic graph networks to jointly encode the spatial and temporal diffusion information. Specifically, we encode the graph structure of each temporal stage and use it as input for the next temporal stage. In this way, diffusion’s spatial and temporal information can be integrated more effectively.

2.3. Dynamic Graph Attention Networks

Recently, many researchers have tried to fuse graph structure information with temporal information to encode spatial and temporal features jointly. Meanwhile, the graph attention network (GAT) can assign weights to different neighboring nodes in a global context. Thus, dynamic graph attention networks have received increasing attention. Xu et al. [32] argued that node embedding should contain static and changing structural features. They encoded temporal features based on harmonic analysis principles and inferred node classes based on multiple GAT layers. Rossi et al. [33] incorporated each node’s temporal information into the node representation using concatenation and encoded node information using GAT layers. Wang et al. [34] proposed an attention-based spatial–temporal graph attention network (ASTGAT) to capture dynamic spatial–temporal data correlations. Each component of ASTGAT contains multiple spatial–temporal blocks constructed from gated convolution and graph attention layers to capture stage-specific temporal information. Carchiolo et al. [35] constructed dynamic graph networks, assigned timestamps to each event, and then employed GAT for information aggregation. Tang and Zeng [36] used a gated recurrent unit layer, a graph attention layer with edge features, a gated bidirectional long short-term memory network, and a residual structure to jointly extract the spatial–temporal features of the data.

2.4. Knowledge-Enhanced Rumor Detection

Most knowledge-based rumor detection methods use external knowledge to enrich the representation of posts. Cui et al. [37] combined medical knowledge graphs and article–entity dichotomous graphs to generate health information representations and applied this representation to healthcare misinformation detection. Wang et al. [24] jointly modeled the semantic representation of text, external knowledge, and visual information and used it for misinformation detection. Dun et al. [23] combined attention mechanisms to incorporate knowledge into textual representation to identify fake news. Chen et al. [38] proposed a knowledge graph-based method for rumor data enhancement which introduces knowledge representation in the generation process of posts to cope with data deficiency. Sun et al. [8] combined spatial, temporal, and external knowledge. They applied GCN to encode both the propagation graph and the knowledge graph, then concatenated the information from both graphs and used it as the initial representation of the next temporal phase. However, these methods neglect the adaptive aggregation of knowledge and post in the graph structure as well as the adaptive fusion of knowledge propagation and post propagation, which are fully considered in our model.

Table 1 contains summaries of recent related works. Compared to the existing works, our model focuses on the adaptive aggregation of knowledge and post in the propagation graph and the adaptive fusion of the knowledge propagation structure and post propagation structure, which have been neglected in the existing works. As a result, our model not only extracts the content, spatial, and temporal information of propagation, it better aggregates external knowledge information and the dynamic evolution of knowledge information in the propagation structure.

3. Problem Definition

Given an event $ε$, the set of post texts it contains is represented as $ε_{p} = \{s, p_{1}, p_{2}, \dots, p_{m}\}$. Here, $s$ represents the source post, $p_{i}$ represents a responsive post, and $m$ represents the number of responsive posts. The source post $s$ can be considered as $p_{0}$. We can obtain the release time sequence $ε_{t} = \{t_{0}, t_{1}, \dots, t_{m}\}$ associated with the event $ε$, where $t_{0} = 0$. Then, $ε_{p}$ and $ε_{t}$ are combined to obtain $ε = \{(p_{0}, t_{0}), (p_{1}, t_{1}), \dots, (p_{m}, t_{m})\}$.

We divide $ε$ into $τ$ stages along the temporal order. Each temporal stage $r \in \{1, 2, \dots, τ\}$ has an equal temporal interval $Δt = \frac{t_{m}}{τ}$. Thus, the $r$-th sub-event of $ε$ is $ε_{r} = \{(p_{k}, t_{k}) \mid t_{k} \leq r Δt\}$.
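The stage division above can be sketched in a few lines of Python. This is only an illustration of the definition $ε_{r} = \{(p_{k}, t_{k}) \mid t_{k} \leq r Δt\}$; the function name `split_into_stages`, the `(post id, time)` tuple layout, and the toy timestamps are assumptions made for the example, not part of the paper.

```python
from typing import List, Tuple

# Hypothetical layout: (post id, release time relative to the source post).
Post = Tuple[str, float]


def split_into_stages(event: List[Post], num_stages: int) -> List[List[Post]]:
    """Return cumulative sub-events: stage r holds every post with t_k <= r * delta_t."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    t_max = max(t for _, t in event)
    delta_t = t_max / num_stages
    stages = []
    for r in range(1, num_stages + 1):
        # The last stage uses t_max directly to avoid floating-point round-off.
        cutoff = t_max if r == num_stages else r * delta_t
        stages.append([(p, t) for p, t in event if t <= cutoff])
    return stages


# Toy event: a source post p0 at t = 0 and three responsive posts.
event = [("p0", 0.0), ("p1", 2.0), ("p2", 5.0), ("p3", 9.0)]
stages = split_into_stages(event, num_stages=3)
```

Note that the sub-events are cumulative: each stage contains all posts of the earlier stages, so the last stage is the whole event.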

We need to learn a model $f : ε \to y$ that classifies each event $ε$ into one of the predefined categories $y \in \{0, 1\}$, where $y$ is the ground truth label of the event. Here, 0 denotes non-rumor and 1 denotes rumor.

To facilitate understanding, we have provided the important symbols and their descriptions in Table 2.

4. Method

Figure 1 shows the structure of ASTKN, which is mainly divided into three parts:

Dynamic Graph Construction Module: we construct the post-propagation graph and the post-entity-concept propagation graph, respectively, using the reply/comment relationship and related external knowledge.
Dynamic Graph Aggregation Module: we use the dynamic graph attention network to encode the graph structure and use a new attention mechanism to fuse the information of the two graph structures.
Classification Module: we use the graph information output by the dynamic graph aggregation module at the last temporal stage and the source post content to discriminate whether the event is a rumor or not.

We will describe them in detail in the following sections.

4.1. Dynamic Graph Construction Module

This module mainly constructs the post propagation graph based on the reply/comment relationship between posts and the post-entity-concept propagation graph based on the relationship between posts and external knowledge. The constructed graphs are then used as input for the next module.

4.1.1. Construction of the Post-Propagation Graph

For an event $ε$, we construct a propagation graph set $\{G_{1}^{p}, \dots, G_{τ}^{p}\}$ based on its source post and responsive posts; $G_{r}^{p} = \langle V_{r}^{p}, E_{r}^{p} \rangle$ represents the post-propagation graph of the $r$-th stage, the node set $V_{r}^{p}$ represents the source post as well as the responsive posts, and the edge set $E_{r}^{p}$ represents the interaction between posts. For example, if $p_{2}$ is a comment on $p_{1}$, then there exists an edge connecting them in $G_{r}^{p}$. For simplicity, we do not consider the direction of edges, and treat $G_{r}^{p}$ as an undirected graph. For a node $v_{l}^{p}$, we initialize its representation with $\vec{p_{l}}$.
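As a rough illustration of this construction, the sketch below builds the undirected adjacency of one stage from reply relations. The `reply_to` mapping (child post to the post it comments on) and the function name are hypothetical; the paper does not prescribe a data layout.

```python
from typing import Dict, List, Set


def build_propagation_graph(
    stage_posts: List[str], reply_to: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Undirected adjacency for one stage; an edge links a post to the post it replies to."""
    nodes = set(stage_posts)
    adjacency: Dict[str, Set[str]] = {p: set() for p in stage_posts}
    for child, parent in reply_to.items():
        # Only keep edges whose two endpoints have both appeared by this stage.
        if child in nodes and parent in nodes:
            adjacency[child].add(parent)
            adjacency[parent].add(child)
    return adjacency


# Toy reply structure: p1 and p3 comment on the source p0, p2 comments on p1.
reply_to = {"p1": "p0", "p2": "p1", "p3": "p0"}
g1 = build_propagation_graph(["p0", "p1"], reply_to)
g2 = build_propagation_graph(["p0", "p1", "p2", "p3"], reply_to)
```

Calling the function once per stage yields the graph set for the event; later stages simply contain more nodes and edges.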

4.1.2. Construction of the Post-Entity-Concept Propagation Graph

Most posts are short texts containing many entities, proper nouns, and abbreviations. Understanding their meanings requires knowing their corresponding concepts. For example, given a post “Slight glitch with @SpaceX Starlink. coming back online now” we need to let the machine know that “SpaceX” is a “space exploration technology company” and not a “spacecraft”, and that “Starlink” is a “high-speed internet access service” and nothing else. Therefore, we introduce external knowledge related to posts, allowing knowledge information to be involved in message propagation. Specifically, we construct a post-entity-concept propagation graph to model the dynamic relationship between posts, entities, and related concepts.

First, we use TagMe [39] for entity linking to link entity mentions to related entities in the knowledge graph.

For each entity, we obtain its corresponding concepts in YAGO. We extract concepts based on the isA relationship, which refers to the relationship between entities and concepts, for example, “China isA country” or “China isA Asian country”. For a given post, this allows relevant entities and concepts to be obtained. Therefore, we can find an entity set $Et_{r}$ and a concept set $C_{r}$ for a temporal sub-event $ε_{r}$.

For $ε_{r}$, we construct a post-entity-concept propagation graph $G_{r}^{k} = \langle V_{r}^{k}, E_{r}^{k} \rangle$, where the set of nodes $V_{r}^{k}$ is the union of $V_{r}^{p}$, $Et_{r}$, and $C_{r}$. We construct the post-entity-concept propagation graph mainly to simulate the temporal propagation of knowledge information. Thus, unlike the post-propagation graph, we do not build edges between posts. We construct other edges according to the following rules.

Post-entity edges. If a post in $V_{r}^{p}$ contains a word that can be linked to an entity in $Et_{r}$, we add an edge between the post node and the entity node.

Entity-entity edges, entity-concept edges, and concept-concept edges. We use the Pointwise Mutual Information (PMI) to measure entity-entity, entity-concept, and concept-concept correlations. Specifically, we set a fixed-size sliding window to count the co-occurrence information of nodes from the global corpus and then calculate the PMI scores between node pairs. A negative PMI usually means that the correlation between terms is weak. We keep edges with positive PMI scores and remove edges with non-positive PMI scores. As in the post-propagation graph, we initialize the representation of node $v_{l}^{k}$ in the post-entity-concept propagation graph to $\vec{k_{l}}$. If $v_{l}^{k}$ appears in both $V_{r}^{p}$ and $V_{r}^{k}$, then they have the same initial embedding.
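To make the PMI rule concrete, here is a small sketch under stated assumptions. It uses the common window-based estimate $\mathrm{PMI}(a, b) = \log \frac{\#W(a, b) \cdot \#W}{\#W(a) \cdot \#W(b)}$, where $\#W$ is the number of sliding windows; the paper does not spell out its exact estimator, so this formula, the function name `pmi_edges`, and the toy corpus are all illustrative choices.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def pmi_edges(corpus: List[List[str]], window: int) -> Dict[Tuple[str, str], float]:
    """Score node pairs by PMI over sliding windows and keep only positive scores."""
    single: Counter = Counter()
    pair: Counter = Counter()
    n_windows = 0
    for doc in corpus:
        # A document shorter than the window counts as a single window.
        for start in range(max(1, len(doc) - window + 1)):
            items = sorted(set(doc[start:start + window]))
            n_windows += 1
            single.update(items)
            pair.update(combinations(items, 2))
    edges = {}
    for (a, b), n_ab in pair.items():
        score = math.log(n_ab * n_windows / (single[a] * single[b]))
        if score > 0:  # drop pairs with non-positive PMI
            edges[(a, b)] = score
    return edges


# Toy "corpus" of entity/concept mentions (hypothetical).
corpus = [
    ["SpaceX", "company", "Starlink"],
    ["SpaceX", "company"],
    ["Starlink", "service"],
    ["China", "country"],
]
edges = pmi_edges(corpus, window=2)
```

In this toy corpus the pair ("Starlink", "company") co-occurs less often than chance would suggest, so its PMI is negative and no edge is created for it.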

4.2. Dual Dynamic GAT Module

This module aggregates the two types of graphs and generates post node representations that incorporate spatial, temporal, and knowledge information.

4.2.1. A Single Dual-Static GAT Unit

We first describe how to encode the post propagation graph. We define the initial set of feature vectors of the post propagation graph nodes as $p_{r} = \{\vec{p_{1}}, \vec{p_{2}}, \dots, \vec{p_{N}}\}$, where $\vec{p_{i}}$ represents the initial feature vector of a node. These feature vectors can form a feature matrix $P_{r}$.

A Single Dual-Static GAT Unit contains two layers of GAT. $P_{r}$ first passes through one layer of GAT. For the node $\vec{p_{i}}$, its attention coefficients $α_{ij}^{r(1)}$ with its neighbor $\vec{p_{j}}$ are computed using the softmax function:

$$α_{ij}^{r(1)} = \frac{\exp\left(\mathrm{LeakyReLU}\left({\vec{a_{r}}}^{T} [W_{r}^{(1)} \vec{p_{i}} \parallel W_{r}^{(1)} \vec{p_{j}}]\right)\right)}{\sum_{k \in N(i)} \exp\left(\mathrm{LeakyReLU}\left({\vec{a_{r}}}^{T} [W_{r}^{(1)} \vec{p_{i}} \parallel W_{r}^{(1)} \vec{p_{k}}]\right)\right)}, \tag{1}$$

where $W_{r}^{(1)}$ denotes the weight matrix of the first GAT layer at the $r$-th temporal stage, $\vec{a_{r}}$ is a weight vector, $\parallel$ represents the concatenation operation, and $N(i)$ represents the neighbors of node $i$ in the graph. We use LeakyReLU [40] as the activation function, which provides better gradient flow for the model. The representation of node $i$ is then obtained by aggregating the transformed features of its neighbors, weighted by the attention coefficients:

$${\vec{p}}_{i}^{(1)} = \sum_{j \in N(i)} α_{ij}^{r(1)} W_{r}^{(1)} \vec{p_{j}}. \tag{2}$$
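Equations (1) and (2) can be traced step by step with a dependency-free sketch of one attention layer. It is an illustration rather than the authors' implementation: the helper names, the LeakyReLU slope of 0.2, the toy features, and the assumption that every node has at least one neighbor are ours. Whether $N(i)$ includes node $i$ itself is not stated in the text, so the sketch simply uses whatever neighbor lists it is given.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    # The negative slope 0.2 is an assumed value.
    return x if x > 0 else slope * x


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gat_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    W: Matrix,
    a: Vector,
) -> Dict[str, Vector]:
    """One attention layer: Eq. (1) gives the coefficients, Eq. (2) the aggregation."""
    projected = {node: matvec(W, h) for node, h in features.items()}
    output = {}
    for i, nbrs in neighbors.items():
        # Unnormalized scores LeakyReLU(a^T [W p_i || W p_j]) for each neighbor j.
        scores = [
            leaky_relu(sum(c * v for c, v in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        dim = len(projected[i])
        output[i] = [
            sum(alpha * projected[j][d] for alpha, j in zip(alphas, nbrs))
            for d in range(dim)
        ]
    return output


# Toy graph: source p0 with two replies.
features = {"p0": [1.0, 0.0], "p1": [0.0, 1.0], "p2": [1.0, 1.0]}
neighbors = {"p0": ["p1", "p2"], "p1": ["p0"], "p2": ["p0"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.0]
updated = gat_layer(features, neighbors, W, a)
```

With a zero attention vector all neighbors receive equal weight, so the layer reduces to a plain neighbor average; a learned $\vec{a_{r}}$ is what lets some neighbors count more than others.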

After aggregating the features with the first GAT layer, we can obtain a new set of feature vectors $p_{r}^{(1)} = \{{\vec{p_{1}}}^{(1)}, {\vec{p_{2}}}^{(1)}, \dots, {\vec{p_{N}}}^{(1)}\}$. These feature vectors form the feature matrix $P_{r}^{(1)}$. The source post of an event plays a crucial role in the whole event. We concatenate the hidden feature vector of each node with the source post vector ${(P_{r})}^{source}$ from the previous layer to obtain an enhanced feature matrix ${\tilde{P}}_{r}^{(1)}$:

$${\tilde{P}}_{r}^{(1)} = \mathrm{LeakyReLU}\left(P_{r}^{(1)} \parallel {(P_{r})}^{source}\right). \tag{3}$$

Similar to the first GAT layer, the second layer takes ${\tilde{P}}_{r}^{(1)}$ as input for information aggregation and generates a feature matrix $P_{r}^{(2)}$. We continue with the source feature enhancement on $P_{r}^{(2)}$ and put the enhanced features through a linear transformation to obtain the final output ${\tilde{P}}_{r}^{(2)}$ at the current stage:

$${\tilde{P}}_{r}^{(2)} = \mathrm{LeakyReLU}\left(\mathrm{linear}\left(P_{r}^{(2)} \parallel {(P_{r}^{(1)})}^{source}\right)\right). \tag{4}$$
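The source feature enhancement of Equation (3) amounts to appending the source post's previous-layer vector to every node vector before the activation. A minimal sketch follows, assuming the source post is stored under the key `"p0"` and a LeakyReLU slope of 0.2; both are illustrative choices, as is the function name.

```python
from typing import Dict, List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    # The negative slope 0.2 is an assumed value.
    return x if x > 0 else slope * x


def enhance_with_source(
    hidden: Dict[str, Vector],
    previous: Dict[str, Vector],
    source_id: str = "p0",
) -> Dict[str, Vector]:
    """Concatenate each node's hidden vector with the source vector of the previous layer."""
    source_vec = previous[source_id]
    return {
        node: [leaky_relu(x) for x in h + source_vec]
        for node, h in hidden.items()
    }


previous = {"p0": [-5.0, 3.0], "p1": [0.0, 1.0]}   # layer input (toy values)
hidden = {"p0": [1.0, -1.0], "p1": [2.0, 0.0]}     # layer output (toy values)
enhanced = enhance_with_source(hidden, previous)
```

Equation (4) follows the same pattern, with an additional linear map applied to the concatenated vector before the activation.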

Unlike post-propagation graphs, post-entity-concept propagation graphs contain different types of nodes. Using source feature augmentation in post-entity-concept propagation graphs can weaken knowledge information while causing redundancy in the final classification stage. Therefore, we only utilize a two-layer GAT to encode the post-entity-concept graph without source feature augmentation.

4.2.2. Temporal Stage Fusion Unit

In one temporal stage, the post-propagation and post-entity-concept propagation graphs are encoded by a single dual-static GAT unit and generate ${\tilde{P}}_{r}^{(2)}$ and ${\tilde{K}}_{r}^{(2)}$, respectively. We expect to fuse the two different sources of structural information through a temporal stage fusion unit and use it as the initial node embedding for the next temporal stage. To implement this idea, we first define a global feature matrix $O$ which contains all post representations of event $ε$. The post vector is initialized in the same way as in the post propagation graph. $O$ is updated at each temporal stage and is used to retain information from the previous temporal stage. Then, we apply linear transformations to the feature matrix $O[V_{r}^{p}]$, the post-propagation graph feature matrix ${\tilde{P}}_{r}^{(2)}$, and the post-entity-concept propagation graph feature matrix ${\tilde{K}}_{r}^{(2)}$, respectively:

$${}^{*}O[V_{r}^{p}] = W_{r}^{o} O[V_{r}^{p}], \tag{5}$$

$${}^{*}{\tilde{P}}_{r}^{(2)} = W_{r}^{p} {\tilde{P}}_{r}^{(2)}, \tag{6}$$

$${}^{*}{\tilde{K}}_{r}^{(2)} = W_{r}^{k} {\tilde{K}}_{r}^{(2)}, \tag{7}$$

where $W_{r}^{o}$, $W_{r}^{p}$, and $W_{r}^{k}$ are weight matrices.

Next, we use two linear layers to convert ${}^{*}{\tilde{P}}_{r}^{(2)}$ and ${}^{*}{\tilde{K}}_{r}^{(2)}$ into one-dimensional scores and calculate the relative importance weights:

$$w_{r}^{p} = \mathrm{mean}\left(W_{r}^{score} \tanh\left(W_{r}^{weight}\, {}^{*}{\tilde{P}}_{r}^{(2)}\right)\right), \tag{8}$$

$$w_{r}^{k} = \mathrm{mean}\left(W_{r}^{score} \tanh\left(W_{r}^{weight}\, {}^{*}{\tilde{K}}_{r}^{(2)}\right)\right), \tag{9}$$

$$α_{r}^{p(k)} = \frac{\exp\left(w_{r}^{p(k)}\right)}{\sum_{ϕ \in \{p, k\}} \exp\left(w_{r}^{ϕ}\right)}, \tag{10}$$

where $W_{r}^{score}$ and $W_{r}^{weight}$ are the weight matrices used to reduce the feature dimension, the mean is used to obtain the weight scores $w_{r}^{p}$ and $w_{r}^{k}$, tanh is the activation function, and $α_{r}^{p(k)}$ denotes the importance of the weights. Here, the superscript $p(k)$ stands for either $p$ or $k$.

Then, $α_{r}^{p}$ and $α_{r}^{k}$ are used as weights for the dynamic fusion of the two graphs:

$$H_{r}^{p, k} = α_{r}^{p} {\tilde{P}}_{r}^{(2)} + α_{r}^{k} {\tilde{K}}_{r}^{(2)}, \tag{11}$$

where $H_{r}^{p, k}$ represents the feature matrix generated by dynamic fusion; note that we only fuse the source and responsive post nodes.
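The scoring and fusion steps of Equations (8)–(11) can be sketched as follows. This is a simplified illustration: the per-graph linear maps of Equations (6) and (7) are omitted, the score projection is a single vector `w_score`, and all names and toy values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def graph_score(H: Matrix, W_weight: Matrix, w_score: Vector) -> float:
    """Mean over nodes of w_score . tanh(W_weight h), in the spirit of Eqs. (8)-(9)."""
    total = 0.0
    for h in H:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W_weight]
        total += sum(s * v for s, v in zip(w_score, hidden))
    return total / len(H)


def fuse(
    P: Matrix, K: Matrix, W_weight: Matrix, w_score: Vector
) -> Tuple[Matrix, float, float]:
    """Softmax over the two graph scores (Eq. 10), then a weighted sum (Eq. 11)."""
    w_p = graph_score(P, W_weight, w_score)
    w_k = graph_score(K, W_weight, w_score)
    top = max(w_p, w_k)  # stabilize the two-way softmax
    e_p, e_k = math.exp(w_p - top), math.exp(w_k - top)
    alpha_p, alpha_k = e_p / (e_p + e_k), e_k / (e_p + e_k)
    fused = [
        [alpha_p * x + alpha_k * y for x, y in zip(p_row, k_row)]
        for p_row, k_row in zip(P, K)
    ]
    return fused, alpha_p, alpha_k


# Rows are post nodes (source and responsive posts only), aligned across both graphs.
P = [[1.0, 2.0], [0.5, -1.0]]
K = [[0.0, 1.0], [2.0, 0.5]]
W_weight = [[0.3, -0.2], [0.1, 0.4]]
w_score = [1.0, -0.5]
H, alpha_p, alpha_k = fuse(P, K, W_weight, w_score)
```

Because the two weights come from a softmax they always sum to one, so the fused matrix is a convex combination of the two graph representations, recomputed at every temporal stage.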

To effectively incorporate the propagation, knowledge, and previous information, we concatenate $H_{r}^{p, k}$ with ${}^{*}O[V_{r}^{p}]$, as $H_{r}^{p, k}$ already contains the post-propagation graph and the post-entity-concept propagation graph information. The dimension of the concatenated vector is reduced by a linear layer and activated using tanh:

$${\overset{o}{H}}_{r} = \tanh\left(\mathrm{linear}\left(H_{r}^{p, k} \parallel {}^{*}O[V_{r}^{p}]\right)\right). \tag{12}$$

The fused feature matrix is used as the initial embedding for the corresponding position in the next temporal stage:

$$\begin{aligned} P_{r + 1}[V_{r}^{p}] &= {\overset{o}{H}}_{r}, \\ K_{r + 1}[V_{r}^{p}] &= {\overset{o}{H}}_{r}, \\ O[V_{r}^{p}] &= {\overset{o}{H}}_{r}. \end{aligned} \tag{13}$$

The three parts of Equation (13) demonstrate that ${\overset{o}{H}}_{r}$ updates the corresponding node representations of the post-propagation graph of the $(r + 1)$-th temporal stage, the corresponding node representations of the post-entity-concept propagation graph of the $(r + 1)$-th temporal stage, and the corresponding node representations of the feature matrix $O$, respectively.

Because the post-propagation and post-entity-concept propagation graphs are different at different temporal stages, the temporal stage fusion unit uses the output of the previous temporal stage as the initial representations of the corresponding nodes of the next temporal stage in order to fully capture this dynamic structural information. Then, the structural features of the next temporal stage are encoded with the dual static GAT unit.
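The hand-over between stages in Equations (12) and (13) can be pictured with a small sketch. It keeps the global matrix $O$ as a dictionary (here called `memory`), omits the linear map of Equation (5) and the bias of the linear layer, and uses hypothetical names and toy values throughout.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def stage_update(
    fused: Dict[str, Vector],
    memory: Dict[str, Vector],
    W: Matrix,
    stage_nodes: List[str],
) -> Dict[str, Vector]:
    """Eq. (12): tanh(linear([H || O])) per post; Eq. (13): write the result back."""
    updated = {}
    for node in stage_nodes:
        z = fused[node] + memory[node]  # list concatenation
        updated[node] = [math.tanh(sum(w * v for w, v in zip(row, z))) for row in W]
    # Posts that have not appeared yet keep their initial vectors in the memory.
    memory.update(updated)
    return updated


memory = {"p0": [0.1, 0.2], "p1": [0.3, 0.4], "p2": [0.5, 0.6]}  # global matrix O
fused = {"p0": [1.0, 0.0], "p1": [0.0, 1.0]}                      # fused features of stage r
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]                  # maps 4 dims back to 2
next_init = stage_update(fused, memory, W, ["p0", "p1"])
```

The returned vectors would then serve as the initial embeddings of the same posts in both graphs of the next stage, while posts that first appear in a later stage start from their original embeddings.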

4.3. Rumor Classification Module

The output of the last temporal stage fusion unit contains information about the entire event. We use average pooling to aggregate the information:

$$H = \mathrm{meanpooling}\left({\overset{o}{H}}_{τ}\right). \tag{14}$$

We concatenate $H$ with the BERT representation of the source post and feed the results into a linear layer for further feature extraction:

$$S = \mathrm{LeakyReLU}\left(\mathrm{linear}\left(H \parallel \mathrm{BERT}(p_{0})\right)\right). \tag{15}$$

Another linear layer is used to classify the event:

$$\hat{y} = \mathrm{sigmoid}\left(\mathrm{linear}(S)\right). \tag{16}$$

The cross-entropy loss is used to calculate the loss:

$$L = - \sum_{i} y_{i} \log {\hat{y}}_{i}, \tag{17}$$

where $y_{i}$ is the ground truth label of the $i$-th event.
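The classification head can be sketched without any deep learning library. The code below follows Equations (14)–(16) with bias terms omitted and a placeholder vector standing in for the BERT representation of the source post. For the loss it uses the standard binary cross-entropy, which adds a $(1 - y_{i}) \log (1 - {\hat{y}}_{i})$ term to the sum printed in Equation (17); that choice, like all names and values here, is an assumption of this sketch.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(H: Matrix) -> Vector:
    """Eq. (14): average the node vectors of the last stage."""
    return [sum(row[d] for row in H) / len(H) for d in range(len(H[0]))]


def predict(H: Matrix, source_vec: Vector, W_s: Matrix, w_out: Vector) -> float:
    """Eqs. (15)-(16): concatenate, linear + LeakyReLU, linear + sigmoid."""
    z = mean_pool(H) + source_vec  # list concatenation
    s = [leaky_relu(sum(w * v for w, v in zip(row, z))) for row in W_s]
    return sigmoid(sum(w * v for w, v in zip(w_out, s)))


def binary_cross_entropy(labels: List[int], probs: List[float]) -> float:
    eps = 1e-12  # guards against log(0)
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(labels, probs)
    )


H_last = [[0.2, 0.4], [0.6, 0.0]]   # fused node features of the last stage (toy)
source_vec = [1.0, -1.0]            # stand-in for BERT(p_0)
W_s = [[0.5, -0.5, 0.25, 0.0], [0.0, 1.0, 0.0, -1.0]]
w_out = [1.0, -1.0]
y_hat = predict(H_last, source_vec, W_s, w_out)
loss = binary_cross_entropy([1], [y_hat])
```

A prediction above 0.5 would be read as "rumor" under the label convention of Section 3.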

Algorithm 1 shows the training process of ASTKN.

Algorithm 1 Training of ASTKN

Input: A set of events $ε = \{ε_{i}\}_{i = 1}^{n}$, the number of temporal stages $τ$, a concept knowledge graph
Output: a trained model

1: repeat
2:   for $ε_{i}$ in a batch do
3:     Construct the post-propagation graphs $G_{ir}^{p}$
4:     Construct the post-entity-concept propagation graphs $G_{ir}^{k}$
5:     for $r = 1$ to $τ$ do
6:       Obtain a representation of the post-propagation graph using Equations (1)–(4)
7:       Obtain a representation of the post-entity-concept graph using Equations (1) and (2)
8:       Combine the above two types of information using Equations (5)–(11)
9:       Obtain the initial embedding of the relevant portion of the next temporal stage using Equations (12) and (13)
10:     end for
11:     Aggregate the node representations of the last temporal stage by average pooling using Equation (14)
12:     Fuse the node representation with the source post representation using Equation (15)
13:     Predict the label and calculate the loss using Equations (16) and (17)
14:     Update parameters using Adam
15:   end for
16: until convergence

5. Experiments

We tested the performance of ASTKN on two publicly available rumor detection datasets. Specifically, we focused on the following issues.

Q1: How does ASTKN perform compared to state-of-the-art baselines on rumor detection?

Q2: What are the impacts of our proposed innovations on model performance?

Q3: How do different hyperparameters affect model performance?

Q4: Is ASTKN able to detect rumors in the early propagation stage?

5.1. Datasets

In the experimental part, we used two rumor detection datasets, namely, PHEME5 and PHEME9.

PHEME5 contains rumor tweets related to five major events: Charliehebdo, Ferguson, Germanwings-crash, Ottawashooting, and Sydney-siege. Each major event includes a large number of sub-events (which we call events). Each event contains a source post, responsive posts, propagation structure information, and the time information of each posting. Each event has already been labeled as Rumor or Non-rumor.

PHEME9 extends PHEME5 with four main events: Ebola-Essien, Gurlitt, Prince-Toronto, and Putinmissing. The structure of PHEME9 is the same as PHEME5. Similarly, each event has been labeled as Rumor or Non-rumor.

We removed events that do not contain responsive posts and divided the two datasets into training, validation, and testing sets with a ratio of 7:1:2. The statistics after this division are shown in Table 3.
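The preprocessing described above (drop events without responsive posts, then split 7:1:2) could look like the sketch below. The random shuffle, the fixed seed, and the toy event table are assumptions; the paper does not say whether the split was random or stratified.

```python
import random
from typing import List, Sequence, Tuple


def split_events(
    event_ids: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle and split into training, validation, and test sets with a 7:1:2 ratio."""
    shuffled = list(event_ids)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10  # integer arithmetic avoids floating-point rounding
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy table: event id -> number of responsive posts (hypothetical values).
events = {f"event_{i}": i % 4 for i in range(120)}
kept = [eid for eid, n_responses in events.items() if n_responses > 0]
train, val, test = split_events(kept)
```

Fixing the seed makes the partition reproducible, which matters when results are averaged over several runs.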

5.2. Baseline Models

SVM-BOW [13]: SVM-BOW utilizes bag-of-words and N-grams as feature representation and applies Support Vector Machine (SVM) as the classifier.

CNN [13]: CNN uses convolutional neural networks to extract post features and softmax as the classifier.

BiLSTM [13]: BiLSTM extracts contextual information of posts using a bidirectional long short-term memory network.

BERT [41]: BERT is a language model based on a deep bidirectional transformer encoder representation, which we use to encode source posts.

TD-RvNN [42]: A tree-structured Recursive Neural Network (RvNN) with GRU units, where the RvNN obtains its representation from a top-down (TD-RvNN) propagation structure.

BU-RvNN [42]: A tree-structured RvNN with GRU units, where the RvNN obtains its representation from a bottom-up (BU-RvNN) propagation structure.

Bi-GCN [21]: A GCN-based rumor detection method using bidirectional propagation structures (propagation and diffusion structures) and the text content of posts.

CALN [7]: CALN is a new Contrastive Adversarial Learning Network. It captures topic-related features using unsupervised topic clustering methods. It applies unsupervised adversarial learning methods to align the data distribution of unseen topics. We compared the performance of CALN as reported by Ma et al. [7].

DDGCN [8]: DDGCN is a dual dynamic graph convolutional network. It captures dynamic post-propagation information as well as dynamic knowledge-propagation information.

5.3. Experimental Settings and Evaluation Metrics

ASTKN was implemented in PyTorch 1.12.0 with CUDA 11.3. All experiments were performed on several identically configured Linux servers with an AMD EPYC 7601 CPU and an NVIDIA GeForce RTX 3090 GPU. The number of temporal stages $τ$ was set to 3. The number of epochs was set to 5. The parameters were optimized using the Adam algorithm. BERT-base was used as the encoder for the source post and pretrained on the datasets. Because the classes in both the PHEME5 and PHEME9 datasets are imbalanced (see Table 3), we used Accuracy (Acc), Recall (Rec), and F1 as evaluation metrics to assess model performance. We present the average results from five different random seeds.
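The evaluation protocol can be sketched as follows. This is an illustrative implementation, not the evaluation script used in the experiments: the averaging scheme for Rec and F1 is not stated above, so the sketch assumes that Rumor is the positive class (macro averaging over both classes would be an alternative), and the function names and toy predictions are hypothetical.

```python
from statistics import mean

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    rec = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"acc": acc, "rec": rec, "f1": f1}

def average_over_seeds(runs):
    """runs: list of (y_true, y_pred) pairs, one per random seed."""
    per_seed = [binary_metrics(t, p) for t, p in runs]
    return {name: mean(m[name] for m in per_seed) for name in per_seed[0]}

# Toy predictions for two "seeds" (made-up values, not data from the paper).
runs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 1, 0]),
]
avg = average_over_seeds(runs)
print(avg)
```

Reporting Rec and F1 alongside Acc matters on imbalanced data because a classifier that always predicts the majority class can reach a high accuracy while its recall on the minority class is zero.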

5.4. Comparison Experiments (Q1)

The performance comparison between ASTKN and other baselines is presented in Table 4, yielding the following observations:

The feature-based model SVM-BOW performs poorly because it relies on hand-crafted features based on the overall statistics of posts; these features are too coarse and generalize poorly.
Deep learning-based models extract effective features automatically with neural networks, so their performance is significantly better than that of the feature-based approach. CNN, BiLSTM, and BERT all utilize content features only, with BERT achieving higher performance due to its stronger ability to capture rumor features. RvNN and Bi-GCN both use the spatial structure of propagation. RvNN models post propagation as a tree-like structure and extracts spatial structure features in two ways, i.e., top-down (TD-RvNN) and bottom-up (BU-RvNN). However, RvNN has weaker feature extraction ability for text and spatial structure. Bi-GCN treats both propagation and diffusion as crucial features and therefore uses bidirectional GCNs to encode the propagation and diffusion structures separately, which makes it more effective than RvNN. However, using only the post propagation structure has a disadvantage: as the number of nodes in the propagation tree decreases, the information it can provide decreases, reducing the model’s performance. DDGCN addresses the concern that existing methods do not consider external knowledge related to the post or temporal information associated with the propagation process. It models two dynamic graph structures, namely, the dynamic propagation graph and the dynamic knowledge graph, and encodes the information of the two graph structures separately using GCNs. DDGCN can thus capture the spatial structure, temporal structure, and relevant external knowledge information of rumors. In particular, it uses a statistical approach to assign edge weights to the knowledge graph. However, this approach is ineffective in aggregating knowledge information to relevant posts, as discussed in the following part.
CALN achieves the second-best accuracy on PHEME5, demonstrating the effectiveness of topic clustering on seen topics combined with unsupervised adversarial learning to align the data distribution of unseen topics.
Compared to the baseline models, ASTKN achieves the best performance. First, compared to CNN, BiLSTM, and BERT, which only utilize content features, ASTKN not only encodes the content features of source posts but also exploits the spatial and temporal structure of propagation and external knowledge. Second, compared to TD-RvNN, BU-RvNN, Bi-GCN, and CALN, ASTKN applies stronger encoders to extract rumor features and pays more attention to temporal and external knowledge information. Third, compared with DDGCN, which only uses a statistical approach to fix edge weights, we consider adaptive post-to-post and post-to-knowledge aggregation. This adaptive aggregation can better capture the relationship between nodes by learning the importance weights. Rather than relying on fixed edge weights, our model adaptively adjusts the weights of information transfer between nodes according to the importance of the nodes in the propagation structure, so that more important nodes gain more influence in the information transfer and the key information in the propagation structure is captured better. Meanwhile, DDGCN only applies simple concatenation to fuse the propagation and knowledge information. In contrast, we introduce a new attention mechanism that integrates the propagation structure information of posts and the propagation structure information of knowledge through weighted fusion. This allows the model to pay more attention to the information relevant to the rumor detection task and to learn which parts are more important, thereby suppressing or ignoring noisy or redundant information.
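To make the contrast between fixed and adaptive edge weights concrete, the following minimal sketch computes GAT-style attention weights over the neighbors of one node. It is an illustration under simplifying assumptions (a single attention head, no learned linear projection, hand-picked toy vectors); the names `attention_aggregate`, `a_self`, and `a_nbr` are hypothetical, and the sketch does not reproduce the exact formulation of ASTKN.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def attention_aggregate(h_i, neighbors, a_self, a_nbr):
    """Aggregate neighbor features with learned (here: given) attention.

    score_j = LeakyReLU(a_self . h_i + a_nbr . h_j)
    alpha   = softmax(score) over the neighbors of node i
    """
    scores = [leaky_relu(dot(a_self, h_i) + dot(a_nbr, h_j)) for h_j in neighbors]
    alphas = softmax(scores)
    out = [sum(a * h_j[d] for a, h_j in zip(alphas, neighbors))
           for d in range(len(h_i))]
    return alphas, out

# Toy features: one post node and three neighbors (posts or knowledge nodes).
h_i = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
alphas, out = attention_aggregate(h_i, neighbors, a_self=[0.5, 0.5], a_nbr=[2.0, -1.0])
print(alphas, out)
```

With fixed statistical weights, the coefficients would be computed once from corpus counts and stay constant during training; here they depend on the node features and on the attention parameters, which in a trained model are learned together with the classifier.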

5.5. Ablation Experiments (Q2)

In this section, we describe our ablation experiments on the two datasets used to comprehensively analyze the key components of ASTKN. Specifically, we set up the following comparison models:

R1: Removing the post-entity-concept propagation graph and encoding the post-propagation graph using dynamic graph convolutional networks.

R2: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph convolutional networks while fusing post propagation and knowledge propagation information using concatenation.

R3: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph convolutional networks. Following Sun et al. [8], we assign edge weights to the post-entity-concept propagation graph and use concatenation to fuse post propagation and knowledge propagation information (post-entity edges use the term frequency–inverse document frequency (TF-IDF) value as the edge weight, while entity-entity, entity-concept, and concept-concept edges use the pointwise mutual information (PMI) as the edge weight).

R4: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph attention networks and using concatenation to fuse post propagation and knowledge propagation information.

R5: Encoding the post-propagation graph and post-entity-concept propagation graph using dynamic graph attention networks and using our designed attention mechanism to fuse post propagation and knowledge propagation information.
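The statistical edge weights used in R3 can be illustrated with a short sketch. The code below uses the textbook definitions of PMI and TF-IDF; the co-occurrence unit (items appearing in the same post), the particular TF-IDF variant, and the helper names `pmi_weights` and `tfidf` are assumptions for illustration and may differ from the procedure of Sun et al. [8].

```python
import math
from collections import Counter
from itertools import combinations

def pmi_weights(documents):
    """PMI for every pair of items that co-occur in at least one document.

    documents: list of sets of entity/concept names.
    PMI(a, b) = log(p(a, b) / (p(a) * p(b))), with probabilities
    estimated as document frequencies.
    """
    n = len(documents)
    single, pair = Counter(), Counter()
    for doc in documents:
        single.update(doc)
        pair.update(combinations(sorted(doc), 2))
    return {
        (a, b): math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }

def tfidf(term, doc_tokens, corpus):
    """Plain TF-IDF: relative term frequency times log inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(term in tokens for tokens in corpus)
    return tf * math.log(len(corpus) / df) if df else 0.0

# Toy entity sets extracted from four posts (made-up data).
docs = [
    {"police", "shooting"},
    {"police", "shooting"},
    {"police", "airline"},
    {"airline", "crash"},
]
weights = pmi_weights(docs)

# Toy tokenized posts for the post-entity TF-IDF weight.
corpus = [["police", "shot", "police"], ["plane", "crash"]]
post_entity_weight = tfidf("police", corpus[0], corpus)
print(weights, post_entity_weight)
```

These weights depend only on corpus statistics, so they remain the same regardless of how useful a given neighbor turns out to be for classification, which is the limitation that R4 and R5 address with attention.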

The Acc and F1 obtained from the ablation experiments on PHEME5 and PHEME9 are shown in Figure 2, from which several conclusions can be drawn. First, the results of R1 show that utilizing only post propagation and content information while ignoring other information yields limited performance.

Second, incorporating external knowledge information can be effective in improving performance (R2). This may be due to the short length of posts and the lack of contextual information. Relevant external knowledge can supplement the background information. Meanwhile, the information on knowledge propagation structure helps to identify rumors.

Third, we find that adding statistically computed edge weights (R3) to the graph structure has a limited effect on improving performance (R3 improves average Acc by 0.002 and average F1 by 0.001 compared to R2). In contrast, using the adaptive method to generate edge weights results in a relatively significant improvement (R4 improves average Acc by 0.009 and average F1 by 0.008 compared to R3, and by 0.011 and 0.009, respectively, compared to R2). This may be because treating each neighbor node as equally important or using statistically based fixed edge weights is not conducive to efficient feature aggregation, whereas the adaptive method lets the model better aggregate the features that are important for identifying rumors.

Fourth, using our designed attention mechanism (R5) to dynamically fuse post propagation information with knowledge information achieves the highest performance. Although the comprehensiveness of the information is essential, the importance of information for identifying rumors is variable. This dynamic fusion process allows the model to learn which information is more important for identifying rumors and which is less important. Thus, the model using dynamic graph attention networks and the attention mechanism we designed achieves optimal results.
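The difference between concatenation (R4) and weighted fusion (R5) can be sketched as follows. This is a simplified illustration, not the attention mechanism defined in Section 4.2.2: the scoring function (a dot product between a query vector `q` and the tanh of each representation) and the names `fuse` and `concat` are assumptions.

```python
import math

def fuse(h_prop, h_know, q):
    """Weighted fusion of two equal-length vectors via softmax attention."""
    def score(h):
        return sum(qi * math.tanh(hi) for qi, hi in zip(q, h))

    s_p, s_k = score(h_prop), score(h_know)
    m = max(s_p, s_k)  # subtract the maximum for numerical stability
    e_p, e_k = math.exp(s_p - m), math.exp(s_k - m)
    beta_p = e_p / (e_p + e_k)
    beta_k = 1.0 - beta_p
    fused = [beta_p * p + beta_k * k for p, k in zip(h_prop, h_know)]
    return (beta_p, beta_k), fused

def concat(h_prop, h_know):
    """Baseline fusion: both inputs are kept unweighted."""
    return list(h_prop) + list(h_know)

# Toy representations of one temporal stage (made-up values).
h_prop, h_know, q = [2.0, 0.0], [0.0, 0.0], [1.0, 1.0]
(beta_p, beta_k), fused = fuse(h_prop, h_know, q)
joined = concat(h_prop, h_know)
print(beta_p, beta_k, fused, joined)
```

With concatenation, the classifier always receives both inputs at full strength; with weighted fusion, the coefficients can shift toward propagation or knowledge information depending on the temporal stage.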

5.6. Hyperparameter Tuning Experiments (Q3)

In this experiment, we first tested whether applying more attention heads improves model performance. Then, we tested the effect of different dropout rates on model performance.

Figure 3 shows the effect on model performance of applying a different number of attention heads. Although the application of the multi-head attention mechanism can provide richer local information and more comprehensive global information, the experimental results show that applying the multi-head attention mechanism does not improve our model’s performance. This may be due to two reasons: first, increasing the number of attention heads increases the complexity of the model and the number of parameters, which may lead to increased training difficulty and decreased generalization performance; second, the graph structure consisting of replies and comments may be sparse, and increasing the number of attention heads may lead to excessive dispersion of attention among relatively few neighboring nodes, reducing the expressive power of the model.

As shown in Figure 4, we tested the effect of different dropout rates on the model’s performance. As the dropout rate increases, the performance of ASTKN first increases and then decreases. A dropout rate that is too small does not provide sufficient regularization, which may lead to overfitting and prevent the network from correctly capturing the true distribution of the input data. A dropout rate that is too large results in a network that is too simple to adequately learn the features of the input data. Therefore, the model performs best when the dropout rate is moderate.
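For readers unfamiliar with the mechanism, a minimal sketch of (inverted) dropout is given below. It is a generic illustration of the technique rather than code from ASTKN, and the function name and toy inputs are hypothetical.

```python
import random

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` and
    rescale the survivors by 1 / (1 - rate) so the expected value is kept."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(x, 0.5, rng)
unchanged = dropout(x, 0.5, rng, training=False)
print(dropped, unchanged)
```

A rate close to 0 leaves almost all units active, so little regularization is applied; a rate close to 1 removes most units in each training step, which limits what the network can learn.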

5.7. Early Detection (Q4)

Detecting rumors at the early propagation stage can prevent rumors from spreading widely. As shown in Figure 5, to evaluate the early detection performance of ASTKN we truncated each event to different numbers of its earliest responsive posts in chronological order. Both ASTKN and the baseline models then identified rumors using only the source post and the retained responsive posts.
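The truncation step can be sketched as follows. The data layout (dictionaries with the keys `id`, `parent`, and `time`) and the helper name `truncate_event` are hypothetical; the sketch only illustrates keeping the earliest responsive posts of an event.

```python
def truncate_event(source_post, replies, k):
    """Keep the k earliest replies of an event in chronological order.

    A reply is kept only if its parent is also kept, so the remaining
    propagation tree stays connected to the source post.
    """
    kept_ids = {source_post["id"]}
    kept = []
    for reply in sorted(replies, key=lambda r: r["time"])[:k]:
        if reply["parent"] in kept_ids:
            kept_ids.add(reply["id"])
            kept.append(reply)
    return kept

# Toy event: a source post with three replies (made-up data).
source = {"id": 0, "time": 0}
replies = [
    {"id": 3, "parent": 0, "time": 30},
    {"id": 1, "parent": 0, "time": 10},
    {"id": 2, "parent": 1, "time": 20},
]
early = truncate_event(source, replies, k=2)
print([r["id"] for r in early])
```

The propagation graph and the knowledge graph would then be built from the source post and the retained replies only, so that the model sees the same information that would be available at that point of the propagation.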

Three observations can be drawn from Figure 5. First, all models perform poorly when there are few responsive posts, because the lack of responsive posts entails a corresponding lack of spatial and temporal structure. Second, the performance of all models increases as the number of responsive posts increases, because the models can acquire more information. Third, ASTKN performs strongly at every number of responsive posts, exceeding Bi-GCN and DDGCN. This demonstrates that adaptive aggregation of posts and knowledge along with adaptive fusion of propagation structure and knowledge structure information can effectively improve the model’s early detection ability.

6. Conclusions and the Future Work

In this paper, we observe that existing rumor detection methods pay attention to the spatial–temporal structure of propagation and external knowledge information. However, two issues are overlooked: (L1) the lack of adaptive aggregation of posts and knowledge and (L2) the lack of adaptive fusion between propagation structure and knowledge structure information. Therefore, we propose a new rumor detection model, ASTKN. ASTKN applies dynamic graph attention networks to jointly encode the spatial–temporal structure of information propagation, enabling adaptive aggregation of post node and knowledge node information. To better fuse the propagation structure and knowledge structure information, we introduce a new attention mechanism that calculates the importance of propagation and knowledge information in each temporal stage and assigns the resulting importance scores as weights to the propagation structure and the knowledge structure. Through this process, our model generates a better representation for distinguishing rumors.

In future work, we aim to apply ranking algorithms to rumor detection. Ranking algorithms can score or classify rumors based on specific metrics and criteria, helping to prioritize the handling of information with a higher likelihood of being a rumor or having greater destructiveness. It is possible to quickly and automatically analyze vast amounts of information flow using ranking algorithms, thereby improving the efficiency and accuracy of rumor detection. The application of ranking algorithms can be based on multiple factors, including but not limited to the content features of rumors (such as topics, sentiment, and credibility), propagation features (such as retweets, comments, and user interactions), and external knowledge bases (such as factual databases and authoritative institution information). Combining these factors and utilizing appropriate ranking algorithms makes it possible to identify and exclude information that may be rumors.

Author Contributions

Conceptualization, H.L. and G.H.; methodology, H.L. and C.L.; Writing—original draft, H.L.; supervision, J.L. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62066009), the Key Research and Development Project of Guilin (No. 2020010308), the Guangxi Natural Science Foundation (No. 2022GXNSFBA035510), and the Open Funds from Guilin University of Electronic Technology, Guangxi Key Laboratory of Image and Graphic Intelligent Processing (No. GIIP2207).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chen, X.; Zhou, F.; Trajcevski, G.; Bonsangue, M. Multi-view learning with distinguishable feature fusion for rumor detection. Knowl.-Based Syst. 2022, 240, 108085.
2. Zhong, N.; Zhou, G.; Ding, W.; Zhang, J. A Rumor Detection Method Based on Multimodal Feature Fusion by a Joining Aggregation Structure. Electronics 2022, 11, 3200.
3. Liu, B.; Sun, X.; Meng, Q.; Yang, X.; Lee, Y.; Cao, J.; Luo, J.; Lee, R.K.W. Nowhere to hide: Online rumor detection based on retweeting graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–12.
4. Huang, Q.; Zhou, C.; Wu, J.; Liu, L.; Wang, B. Deep spatial–temporal structure learning for rumor detection on Twitter. Neural Comput. Appl. 2020, 35, 12995–13005.
5. Fang, L.; Feng, K.; Zhao, K.; Hu, A.; Li, T. Unsupervised Rumor Detection Based on Propagation Tree VAE. IEEE Trans. Knowl. Data Eng. 2023, 1–16.
6. Gao, Y.u.; Liang, G.; Jiang, F.; Xu, C.; Yang, J.; Chen, J.; Wang, H. Social network rumor detection: A survey. Acta Electronica Sin. 2020, 48, 1421.
7. Ma, G.; Hu, C.; Ge, L.; Zhang, H. Open-Topic False Information Detection on Social Networks with Contrastive Adversarial Learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 2911–2923.
8. Sun, M.; Zhang, X.; Zheng, J.; Ma, G. DDGCN: Dual Dynamic Graph Convolutional Networks for Rumor Detection on Social Media. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 4611–4619.
9. Bai, L.; Han, X.; Jia, C. A Rumor Detection Model Incorporating Propagation Path Contextual Semantics and User Information. Neural Process. Lett. 2023, 1–20.
10. Chen, Z.; Wang, L.; Zhu, X.; Dietze, S. TSNN: A Topic and Structure Aware Neural Network for Rumor Detection. Neurocomputing 2023, 531, 114–124.
11. Ran, H.; Jia, C.; Yu, J. A metric-learning method for few-shot cross-event rumor detection. Neurocomputing 2023, 533, 72–85.
12. Tu, K.; Chen, C.; Hou, C.; Yuan, J.; Li, J.; Yuan, X. Rumor2vec: A rumor detection framework with joint text and propagation structure representation learning. Inf. Sci. 2021, 560, 137–151.
13. Sujana, Y.; Li, J.; Kao, H. Rumor Detection on Twitter Using Multiloss Hierarchical BiLSTM with an Attenuation Factor. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, 4–7 December 2020; pp. 18–26.
14. Sun, X.; Yin, H.; Liu, B.; Meng, Q.; Cao, J.; Zhou, A.; Chen, H. Structure learning via meta-hyperedge for dynamic rumor detection. IEEE Trans. Knowl. Data Eng. 2022, 35, 9128–9139.
15. Jiang, Y.; Wang, R.; Sun, J.; Wang, Y.; You, H.; Zhang, Y. Rumor Localization, Detection and Prediction in Social Network. IEEE Trans. Comput. Soc. Syst. 2022, 1–11.
16. Ma, J.; Li, J.; Gao, W.; Yang, Y.; Wong, K.F. Improving rumor detection by promoting information campaigns with transformer-based generative adversarial learning. IEEE Trans. Knowl. Data Eng. 2021, 35, 2657–2670.
17. Silva, A.; Luo, L.; Karunasekera, S.; Leckie, C. Embracing domain differences in fake news: Cross-domain fake news detection using multi-modal data. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2021; Volume 35, pp. 557–565.
18. Huang, Q.; Yu, J.; Wu, J.; Wang, B. Heterogeneous graph attention networks for early detection of rumors on twitter. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA; pp. 1–8.
19. Qu, S.; Xu, H.; Fu, L.; Long, H.; Wang, X.; Chen, G.; Zhou, C. Tracing truth and rumor diffusions over mobile social networks: Who are the initiators. IEEE Trans. Mob. Comput. 2021, 22, 2473–2490.
20. Liu, X.; Miao, C.; Fiumara, G.; De Meo, P. Information Propagation Prediction Based on Spatial–Temporal Attention and Heterogeneous Graph Convolutional Networks. IEEE Trans. Comput. Soc. Syst. 2023, 1–14.
21. Bian, T.; Xiao, X.; Xu, T.; Zhao, P.; Huang, W.; Rong, Y.; Huang, J. Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 549–556.
22. Li, J.; Bao, P.; Shen, H.; Li, X. Mistr: A multiview structural-temporal learning framework for rumor detection. IEEE Trans. Big Data 2021, 8, 1007–1019.
23. Dun, Y.; Tu, K.; Chen, C.; Hou, C.; Yuan, X. KAN: Knowledge-aware Attention Network for Fake News Detection. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; pp. 81–89.
24. Wang, Y.; Qian, S.; Hu, J.; Fang, Q.; Xu, C. Fake News Detection via Knowledge-driven Multimodal Graph Convolutional Networks. In Proceedings of the International Conference on Multimedia Retrieval, Dublin, Ireland, 8–11 June 2020; pp. 540–547.
25. Athira, A.; Kumar, S.M.; Chacko, A.M. A systematic survey on explainable AI applied to fake news detection. Eng. Appl. Artif. Intell. 2023, 122, 106087.
26. Bai, N.; Meng, F.; Rui, X.; Wang, Z. Rumor detection based on a source-replies conversation tree convolutional neural net. Computing 2022, 104, 1155–1171.
27. Song, Y.; Chen, Y.; Chang, Y.; Weng, S.; Shuai, H. Adversary-Aware Rumor Detection. In Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP, Online, 1–6 August 2021; pp. 1371–1382.
28. Sun, L.; Rao, Y.; Wu, L.; Zhang, X.; Lan, Y.; Nazir, A. Fighting False Information from Propagation Process: A Survey. ACM Comput. Surv. 2023, 55, 1–38.
29. Han, S.; Yu, K.; Su, X.; Wu, X. Combining Temporal and Interactive Features for Rumor Detection: A Graph Neural Network Based Model. Neural Process. Lett. 2022, 1–17.
30. Bonifazi, G.; Cauteruccio, F.; Corradini, E.; Marchetti, M.; Sciarretta, L.; Ursino, D.; Virgili, L. A Space-Time Framework for Sentiment Scope Analysis in Social Media. Big Data Cogn. Comput. 2022, 6, 130.
31. Song, C.; Shu, K.; Wu, B. Temporally evolving graph neural network for fake news detection. Inf. Process. Manag. 2021, 58, 102712.
32. Xu, D.; Ruan, C.; Korpeoglu, E.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. arXiv 2020, arXiv:2002.07962.
33. Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; Bronstein, M. Temporal graph networks for deep learning on dynamic graphs. arXiv 2020, arXiv:2006.10637.
34. Wang, Y.; Jing, C.; Xu, S.; Guo, T. Attention based spatiotemporal graph attention networks for traffic flow forecasting. Inf. Sci. 2022, 607, 869–883.
35. Carchiolo, V.; Cavallo, C.; Grassia, M.; Malgeri, M.; Mangioni, G. Link prediction in time varying social networks. Information 2022, 13, 123.
36. Tang, J.; Zeng, J. Spatiotemporal gated graph attention network for urban traffic flow prediction based on license plate recognition data. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 3–23.
37. Cui, L.; Seo, H.; Tabar, M.; Ma, F.; Wang, S.; Lee, D. DETERRENT: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Online, 6–10 July 2020; pp. 492–502.
38. Chen, X.; Zhu, D.; Lin, D.; Cao, D. Rumor knowledge embedding based data augmentation for imbalanced rumor detection. Inf. Sci. 2021, 580, 352–370.
39. Vitale, D.; Ferragina, P.; Scaiella, U. Classification of short texts by deploying topical annotations. In Proceedings of the European Conference on Information Retrieval, Barcelona, Spain, 1–5 April 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 376–387.
40. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853.
41. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
42. Ma, J.; Gao, W.; Wong, K. Rumor Detection on Twitter with Tree-structured Recursive Neural Networks. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 1980–1989.

Figure 1. The structure of ASTKN.

Figure 2. Ablation experiments on the PHEME5 and PHEME9 datasets.

Figure 3. Performance of ASTKN on the two datasets with different numbers of attention heads.

Figure 4. Performance of ASTKN on the PHEME5 and PHEME9 datasets with different dropout rates.

Figure 5. Early rumor detection performance on the PHEME5 and PHEME9 datasets.

Table 1. Summary of recent works.

	Text	Temporal	Spatial	Knowledge	Adaptive Aggregation of Knowledge and Posts	Adaptive Integration of Knowledge Propagation and Post Propagation
Bian et al. [21]	✓		✓
Bai et al. [26]	✓		✓
Song et al. [27]	✓		✓
Huang et al. [4]	✓	✓	✓
Li et al. [22]	✓	✓	✓
Han et al. [29]	✓	✓	✓
Sun et al. [14]	✓	✓	✓
Song et al. [31]	✓	✓	✓
Cui et al. [37]	✓		✓	✓
Wang et al. [24]	✓		✓	✓
Dun et al. [23]	✓			✓
Sun et al. [8]	✓	✓	✓	✓
ASTKN (our model)	✓	✓	✓	✓	✓	✓

Table 2. Important symbols and their descriptions.

Symbols	Descriptions
$ε_{p}$	the source post and responsive posts for the event $ε$
$ε_{t}$	release time sequence of the event $ε$
$τ$	number of temporal stages
$Δ t$	time interval of each temporal stage
$ε_{r}$	r-th sub-event of $ε$
$G_{r}^{p} = 〈V_{r}^{p}, E_{r}^{p}〉$	post-propagation graph of the r-th temporal stage
$G_{r}^{k} = 〈V_{r}^{k}, E_{r}^{k}〉$	post-entity-concept propagation graph of the r-th temporal stage

Table 3. Dataset statistics.

	PHEME5		PHEME9
	Rumor	Non-rumor	Rumor	Non-rumor
# of training set	1304	2508	1465	2557
# of validation set	186	359	210	366
# of testing set	373	717	419	731
Total #	1863	3584	2094	3654
Avg. # of words/post	14.3		14.3
Avg. # of posts/event	18.7		18

Table 4. Performance comparison on the PHEME5 and PHEME9 datasets. The best results are shown in bold.

	PHEME5			PHEME9
	Acc	Rec	F1	Acc	Rec	F1
SVM-BOW	0.669	0.524	0.529	0.688	0.512	0.515
CNN	0.787	0.702	0.719	0.795	0.673	0.701
BiLSTM	0.795	0.691	0.725	0.794	0.677	0.701
BERT	0.815	0.779	0.796	0.821	0.788	0.803
TD-RvNN	0.821	0.764	0.769	0.804	0.803	0.803
BU-RvNN	0.817	0.761	0.762	0.789	0.788	0.788
Bi-GCN	0.829	0.814	0.818	0.847	0.834	0.835
DDGCN	0.844	0.813	0.823	0.855	0.841	0.843
CALN	0.858	-	-	0.848	-	-
ASTKN	0.872	0.852	0.856	0.867	0.851	0.855

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

