4.2. Experiment Settings
To confirm the efficacy of the model posited in this paper for the task of disinformation detection, this study contrasts it with other foundational techniques within the experimental framework. The methodological specifics are elucidated as follows:
DTC [41]: A decision-tree-based classification model that relies on manually extracted features to estimate information credibility for false information detection.
SVM-RBF [6]: A support vector machine with an RBF kernel that detects disinformation using manually constructed features derived from the aggregate statistics of postings.
GRU [8]: An RNN-based deep learning model that detects false information by learning the propagation sequence of messages, i.e., the temporal structure characteristics of events.
RvNN [42]: A false information detection method based on a tree-structured recursive neural network with GRU units.
PPC_RNN + CNN [43]: A model that combines RNN and CNN to learn event representations from the user information along the message propagation path, and then identifies false information.
Bi-GCN [11]: A graph model using a bidirectional graph convolutional network; features are extracted from both the top-down and bottom-up propagation directions of rumors for detection.
GCN-Bert [44]: A rumor detection method that considers not only the features of the message itself but also the rumor features of all relevant texts and words.
HAGNN [45]: A graph neural network-based disinformation detection model that captures high-level representations of textual content at different granularities and fuses them with propagation structures for disinformation detection.
The experiment detailed in this paper is executed on the Ubuntu 22.04 platform, with the experimental environment consisting of Python 3.10 and PyTorch 2.1.0.
Table 2 presents the precise specifications of the experimental setup. To ensure a fair comparison, the dataset was randomly partitioned into five segments and five-fold cross-validation was conducted on them. During training, the hidden layer dimension was set to 64, the number of epochs to 200, the batch size to 128, the learning rate to 0.0005, and the dropout rate to 0.2. The Adam algorithm was used to update model parameters, and early stopping terminated training if the validation loss failed to improve for 10 consecutive epochs. Model performance was evaluated using Accuracy (Acc) and F1 score.
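The early-stopping rule described above can be sketched in a few lines. This is an illustrative pure-Python sketch, not the authors' code; the `EarlyStopping` class and the `CONFIG` grouping of the reported hyperparameters are assumptions made for readability.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs (the paper uses patience = 10)."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience

# Hyperparameters as reported in the text (illustrative grouping).
CONFIG = {"hidden_dim": 64, "epochs": 200, "batch_size": 128,
          "lr": 0.0005, "dropout": 0.2, "patience": 10}

stopper = EarlyStopping(patience=CONFIG["patience"])
losses = [0.9, 0.8, 0.7] + [0.7] * 12   # loss plateaus after epoch 3
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch   # training would end here
        break
```

With the toy loss curve above, training halts at epoch 13, i.e., 10 epochs after the last improvement.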
4.3. Results and Analysis
On the Twitter15 and Twitter16 datasets, the proposed ICP-BGCN method is compared with the eight baseline models described above, from the classical DTC onward; the experimental results are shown in Table 3 and Table 4.
As evidenced by the experimental results in Table 3 and Table 4, the ICP-BGCN model achieves superior classification performance, with accuracies of 89.7% and 91.7% on the Twitter15 and Twitter16 datasets respectively, markedly outperforming the baseline models. It also excels on precision and recall, reaching 90.9% precision and 89.3% recall on Twitter15, and 90.1% and 91.7% on Twitter16. This profile confirms its dual capability of maintaining classification accuracy while ensuring comprehensive sample coverage. Furthermore, ICP-BGCN performs well on the NR, FR, TR, and UR categories on both datasets, exceeding 85% on every per-class metric, so the model maintains a high level of performance across different categories of samples. These combined advantages across multidimensional metrics validate the robustness and stability of the model in handling different categories of rumor detection tasks.
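The per-class figures reported above follow the standard definitions of accuracy, precision, recall, and F1. As a minimal sketch (toy labels over the paper's four classes, not the actual predictions):

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, from per-class true/false positives and negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy predictions over the four classes used in the paper.
y_true = ["NR", "FR", "TR", "UR", "NR", "FR"]
y_pred = ["NR", "FR", "TR", "NR", "NR", "UR"]
acc = accuracy(y_true, y_pred)               # 4 of 6 correct
f1_nr = per_class_f1(y_true, y_pred, "NR")   # precision 2/3, recall 1.0
```

The per-class F1 values in Tables 3 and 4 are computed in exactly this one-vs-rest manner for NR, FR, TR, and UR.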
The experiments also show that deep learning-based detection methods outperform machine learning-based ones. On the Twitter15 dataset, the accuracy of ICP-BGCN is 44.3% higher than that of DTC and 57.9% higher than that of SVM-RBF. The main reason is that machine learning relies on manually extracted features, which depend on the experience and judgment of human experts, whereas deep learning-based models automatically capture deeper features and the correlations among them, and can thus better identify false information.
Among the seven deep learning-based misinformation detection models, ICP-BGCN, HAGNN, GCN-Bert, and Bi-GCN use graph neural networks to extract the propagation structure features of false information and outperform the other three models. This shows that GNNs are effective at modeling the propagation process of information via propagation graphs and at extracting propagation structure features. Our ICP-BGCN model fuses the propagation structure with the semantic features of the message text. Compared with Bi-GCN, which considers only the information dissemination structure, GCN-Bert, which exploits textual features at different granularities, and HAGNN, which captures multi-level semantic information of text content and combines it with the structural features of dissemination networks, the detection accuracy on the Twitter15 dataset is improved by 1.1%, 2.5%, and 3.2%, respectively. ICP-BGCN is also better than the other models on the remaining metrics, which shows that fully fusing the original text features, propagation text features, and propagation structure features is a reasonable and effective way to improve the accuracy of false information detection. Overall, the ICP-BGCN model outperforms the other eight models, spanning traditional machine learning and deep learning approaches, in detection accuracy and per-class F1 score to varying extents.
To comprehensively assess the cross-scenario generalization capability of our proposed ICP-BGCN model, we select the Pheme dataset, which differs substantially in scenario, as our validation benchmark. As a misinformation detection benchmark for breaking news, Pheme encompasses multiple crisis domains, including social, political, and health-related events [46], and its cross-domain character provides an ideal experimental environment for evaluating model adaptability across scenarios. Compared with conventional datasets such as Twitter15 and Twitter16, Pheme exhibits three distinctive characteristics. First, in terms of data composition, the dataset contains high-risk events with heightened emotional intensity (e.g., public health crises, political scandals), and user interactions exhibit more pronounced emotional signals [47], presenting multidimensional challenges for semantic modeling. Second, the Pheme dataset exhibits significant variation in event scale, and much of the information labeled as rumor actually originates from misclassifications of real events [46]; this label distribution constitutes a rigorous test of the model's discriminative power. Third, regarding data scale, the limited number of rumor samples (non-rumors constitute 63.5% of instances) intensifies the training challenge under class imbalance [46].
These characteristics closely align with the complex scenarios of misinformation propagation in real-world settings. To validate ICP-BGCN's performance on cross-scenario datasets, we maintain parameter configurations identical to those used in the Twitter15 and Twitter16 experiments and conduct comparative analyses with four baseline models. The experimental results for the Pheme dataset are shown in Table 5.
The four baseline models employ distinct technical approaches. GCAN [33] utilizes graph neural networks with dual co-attention mechanisms to achieve multimodal dynamic feature fusion across source text, user attributes, and propagation pathways. Bi-GCN [11] leverages bidirectional graph convolutional networks to concurrently model forward diffusion (source-to-retweeters) and reverse traceability (leaf-to-source) patterns in information dissemination. GACL-CADA [48] implements a class-aware adversarial domain adaptation framework to align cross-domain distributions between historical data and emerging events. GAN [49] enhances detection robustness for hybrid true/false content through adversarial sample generation (e.g., semantically ambiguous text variants) coupled with discriminator-based decision boundary optimization.
Experimental results demonstrate that ICP-BGCN achieves an accuracy of 84.4%, a 1.0-percentage-point improvement over the best baseline model (GCAN: 83.4%), while also delivering competitive performance in precision, recall, and F1-score. This finding illustrates that the model not only effectively identifies routine rumor patterns in the Twitter15 and Twitter16 datasets but also performs robustly on Pheme's high-emotion, semantically ambiguous crisis-related misinformation. This cross-domain adaptability highlights its robustness and potential for generalization across diverse propagation scenarios.
To further ensure the accuracy of our results and assess the generalization capability of our proposed ICP-BGCN model, we incorporate the SemEval-17-task 8 dataset as an additional benchmark in our experiments. The SemEval-17-task 8 dataset is widely adopted for rumor analysis and provides a rich set of Twitter conversational threads with fine-grained labels ("True rumor" (TR), "False rumor" (FR), and "Unverified rumor" (UR)) as well as stance classification tasks. The experimental results are shown in Table 6. The technical specifications of the four baseline models are as follows.
HiTPLAN [10] employs a multi-level Transformer architecture to capture nuanced contextual representations of social media posts for deceptive content detection. MTL2-Hierarchical Transformer [50] hierarchically segments conversational threads into sub-threads, encodes contextual features using BERT embeddings, and aggregates cross-sub-thread semantics via Transformer fusion to enable multi-granular representation learning. Coupled Hierarchical Transformer [50] extends MTL2 by integrating multi-task learning through a hybrid attention mechanism that aligns BERT-derived semantics with stance-aware propagation patterns, jointly optimizing rumor verification and stance detection. Hierarchical Contrastive Disentangled Multi-task Graph Network (HCD-MGN) [51] enhances multi-task performance through (1) a feature decoupling module (PFN) that separates shared and task-specific features, (2) dual graph encoders modeling propagation structures and semantic relationships, and (3) stance-aware contrastive learning for representation optimization.
On the SemEval-17 Task 8 dataset, the proposed ICP-BGCN model achieves state-of-the-art performance with 78.5% accuracy and 79.2% Macro-F1 score, demonstrating a 1.8% absolute performance improvement over the best baseline model HCD-MGN (76.7% accuracy). These results, together with those obtained from the Twitter15, Twitter16, and Pheme datasets, confirm that our model consistently generalizes across multiple social data sources, effectively capturing the unique propagation structures inherent in different social media scenarios.
4.5. Propagation Graph Analysis
In order to explore the impact of propagation paths on disinformation detection, we statistically analyze the structure of the propagation networks of disinformation and non-disinformation, aiming to reveal the differences in propagation patterns between the two. We merged the labels of the Twitter15 and Twitter16 datasets, grouping "verified true rumors (TR)", "verified false rumors (FR)", and "unverified rumors (UR)" into a single "rumor" category, and compared them with the "non-rumor (NR)" data. The visualization of information dissemination is shown in Figure 5. In an information dissemination relationship graph, each node represents a separate unit of information: the original tweet, a related comment, or a retweet. Nodes are connected by edges that represent interactive behaviors between them, such as retweets or comments. We define the original tweet as the root node of the relationship graph, and all posts that directly reply to that tweet become its children. Following this logic, if a post A receives a reply from another post B, then, according to the order of information dissemination, post B becomes a child node of post A, which is represented in the graph as node B being subordinate to node A.
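The parent-child construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: `build_propagation_tree`, the `depth` helper, and the toy post ids are hypothetical, not part of the datasets' actual format.

```python
def build_propagation_tree(root, reply_edges):
    """Build a parent -> children adjacency map from reply pairs.

    `root` is the source tweet id; each edge (u, v) means post v replies
    to (or retweets) post u, so v becomes a child node of u.
    """
    children = {root: []}
    for parent, child in reply_edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])   # leaves get empty child lists
    return children

def depth(children, node):
    """Depth of the subtree rooted at `node` (a lone root has depth 1)."""
    if not children[node]:
        return 1
    return 1 + max(depth(children, c) for c in children[node])

# Toy cascade: t1 and t2 reply to the source t0; t3 replies to t1.
edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3")]
tree = build_propagation_tree("t0", edges)
```

Here `tree["t0"]` lists the first-level responses, and the cascade depth is 3 because of the second-level reply t3.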
As shown in Figure 5, the information dissemination tree exhibits a broad structure in which most nodes belong to shallow first-level responses [52]. In the disinformation dissemination network there are obvious clusters of nodes: nodes within a cluster are densely connected, while connections to nodes outside the cluster are relatively sparse, reflecting the high degree of aggregation in the disinformation dissemination network. This discrepancy may be explained by the fact that well-crafted rumors usually carry content features designed to trigger replies and reposts from multiple highly influential users. In contrast, naturally occurring true events are not crafted to maximize social impact, making it non-trivial for them to trigger reposts or replies from multiple highly influential users simultaneously.
In addition, the structure of the disinformation dissemination network is more complex: dissemination paths are usually longer and connections between nodes are closer, which suggests that rumor information not only spreads quickly but is also reinforced and consolidated within specific groups. In contrast, the dissemination network of non-false information shows a lower aggregation tendency; its path distribution is more uniform and connections between nodes are sparser, indicating that its dissemination is limited in breadth and decentralized.
Further, we computed and compared the topological metrics of the information dissemination graphs, such as the number of nodes, number of edges, average path length, and degree distribution. For each of the two categories, disinformation and non-disinformation, we used the average metric values over all graphs as the final comparison data. The network structure metrics are detailed in Table 9; see Figure 6 for the visualization of the degree distribution.
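Metrics such as the average path length can be obtained by breadth-first search over each propagation graph. A minimal sketch, assuming undirected, unweighted edges (the toy three-node path graph is illustrative, not taken from the datasets):

```python
from collections import deque

def avg_shortest_path(adj):
    """Mean shortest-path length over all connected node pairs,
    computed by BFS from every node of an undirected graph."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for t, d in dist.items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy undirected propagation graph: a path t0 - t1 - t2.
adj = {"t0": ["t1"], "t1": ["t0", "t2"], "t2": ["t1"]}
n_nodes = len(adj)
n_edges = sum(len(v) for v in adj.values()) // 2   # each edge counted twice
apl = avg_shortest_path(adj)
```

For the three-node path, the average over the six ordered pairs is 8/6, or about 1.33; averaging such values over all graphs in a category yields the figures reported in Table 9.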
In the network metrics shown in Table 9, the two datasets exhibit opposite trends. Specifically, in the Twitter15 dataset, network metrics such as the number of nodes and the number of edges are higher for the non-rumor category, whereas in the Twitter16 dataset the values of these same metrics for the rumor category exceed those of the non-rumor category. According to the relevant literature, false information spreads faster, farther, deeper, and wider than non-false information [3,53,54]. In addition, the experimental results in Section 4.3 show that propagation-path-based disinformation detection performs significantly better on the Twitter16 dataset than on Twitter15. We therefore believe that the network characteristics revealed by the Twitter16 dataset better match those displayed by disinformation and non-disinformation during propagation: key indicators such as the number of nodes, the number of edges, the graph diameter, and the average path length are larger for disinformation than for non-disinformation. That is, the dissemination of disinformation involves more user participation, more frequent forwarding, a wider dissemination range, and more complex dissemination paths. The Twitter15 dataset may fail to exhibit theory-consistent network characteristics because its sample is unrepresentative, not covering a sufficiently broad or balanced range of user groups and message types. In summary, the Twitter16 dataset is more consistent with the characteristics of real rumor propagation, and our method is more effective on this type of data.
In addition, analysis of the assortativity coefficient and network density values reveals that the information dissemination network is a low-density, disassortative network. In such a structure, nodes are not tightly interconnected; instead, highly connected nodes tend to link to lowly connected ones. This suggests that a small number of highly connected nodes play a key role in the network and have a significant impact on its overall behavior. Further, based on the degree distribution in Figure 6, we can observe that the node degree distributions of both disinformation and non-disinformation exhibit pronounced long-tail characteristics. That is, the network contains a few highly connected nodes, usually called opinion leaders or key communicators, which play a crucial role in the information dissemination process. Accurately identifying these key nodes can therefore improve the accuracy and efficiency of disinformation detection.
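A degree histogram and a top-k degree ranking are enough to surface such hub nodes. As a minimal sketch (the star-shaped toy cascade and node names are illustrative only, not drawn from the datasets):

```python
from collections import Counter

def degree_distribution(adj):
    """Histogram of node degrees: {degree: number of nodes}."""
    return Counter(len(neigh) for neigh in adj.values())

def top_hubs(adj, k=1):
    """The k most connected nodes -- candidate opinion leaders."""
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]

# Toy star-like cascade: one hub replied to by four users, one of
# whom (u1) receives a further reply (undirected adjacency lists).
adj = {"hub": ["u1", "u2", "u3", "u4"],
       "u1": ["hub", "u5"], "u2": ["hub"], "u3": ["hub"],
       "u4": ["hub"], "u5": ["u1"]}
dist = degree_distribution(adj)
hubs = top_hubs(adj, k=1)
```

In this toy graph, four of the six nodes have degree 1 while a single node has degree 4, the long-tail shape that Figure 6 shows at scale; the degree ranking immediately recovers the hub.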
4.6. Discussion
In this study, we propose the ICP-BGCN model, an innovative approach to disinformation detection, which integrates original text content, propagation text, and the structural information of message dissemination. Our experiments on the Twitter15, Twitter16, and Pheme datasets demonstrate that the fusion of semantic features extracted via BERT and propagation features learned through a bidirectional graph convolutional network leads to superior detection accuracy. Notably, the model achieves accuracies of 89.7% on Twitter15 and 91.7% on Twitter16, outperforming eight mainstream baselines by 1.1% and 3.7%, respectively, and maintains robust generalization with an 84.4% accuracy on the Pheme dataset. These results strongly support our original hypothesis that leveraging both text semantics and propagation structure can enhance disinformation detection.
Our work makes several key contributions that push the field forward. First, by embedding interactive data, such as user comments and retweets, into a graph structure, ICP-BGCN captures global coupling features that traditional models often overlook. This methodological advancement addresses limitations identified in earlier studies [11,45] and opens new avenues for exploiting network topology in misinformation analysis. Second, the detailed analysis of propagation metrics, such as degree distribution, diameter, and average path length, provides fresh insights into the distinct dissemination patterns of disinformation versus non-disinformation. For example, our observation that disinformation tends to form low-density heterogeneous networks with several highly connected nodes not only explains the superior performance on the Twitter16 dataset but also suggests potential indicators for early detection in real-world applications.
Despite these advances, our study has limitations that must be acknowledged. One notable limitation is that the current model does not account for the temporal decay of post influence, an aspect that requires further investigation.