1. Introduction
With the Internet’s widespread adoption and rapid development, social media platforms have become convenient spaces for users to access information, share opinions, and communicate daily [1,2]. Users have transformed from passive information consumers into active content creators and distributors, accelerating the dissemination of fake information across social networks [3,4]. Fake information can be defined as intentionally or unintentionally disseminated content that violates authenticity, including false statements, manipulated facts, and misleading narratives, with the potential to cause societal harm [5,6]. In 2025, fake news continues to pollute the Internet at an alarming scale, with 62% of online content now deemed false. A staggering 86% of global citizens have been exposed to misinformation, while 40% of content shared on social media is fake [7]. The spread of false information through social media has amplified its disruptive effects, disturbing daily life and posing serious challenges to public safety and governance. Therefore, exploring rapid and efficient methods for fake information detection has become a persistent research focus in academia.
Traditional fake information detection methods rely on manually defined linguistic features, such as sentiment polarity, lexical complexity, and propagation rule models [8,9]. However, these approaches generalize poorly when confronted with adversarially generated fake texts. The advancement of pre-trained language models has introduced new characteristics in fake information, including logical closed loops and contextual adaptation [10]. These developments present dual challenges for detection technologies. Detection methods must effectively capture semantic contradictions within texts while modeling the evolving patterns during information dissemination [11,12]. Deep learning-based detection methods have emerged as a crucial technological approach to tackle these difficulties due to their powerful feature abstraction capabilities [13,14].
In the domain of fake information detection, the integration of Convolutional Neural Networks (CNN) and bidirectional long short-term memory networks (BiLSTM) has demonstrated unique value [15]. CNN captures local semantic patterns through hierarchical convolution operations [16]. These operations effectively identify unconventional modifier stacking and abnormal anaphoric relations, which are micro-linguistic features essential for precise detection. BiLSTM models bidirectional temporal dependencies and effectively tracks the logical coherence of textual context [17]. As a result, it is particularly well-suited for analyzing the progressive semantic distortion that occurs during the propagation of fake information. Compared to commonly used Transformer architectures, this combination offers distinct advantages in specific scenarios. In data-scarce contexts, such as detecting fake information during emerging events, lightweight recurrent neural networks such as BiLSTM can compress the number of parameters to less than 1/50 of that of a Transformer model while maintaining a performance level of over 90% in text classification tasks [18]. Additionally, the inference speed of CNNs in short text classification tasks is 3 to 5 times faster than that of Transformer-based models, and the accuracy is comparable in scenarios with high-frequency vocabulary [19]. Nevertheless, the performance of such models heavily depends on hyperparameter configuration [20]. Traditional grid search methods may require thousands of GPU hours when optimizing complex parameters, resulting in substantial human and computational costs. Gradient-based optimization strategies struggle with discrete parameter combinations, leading to high tuning costs and poor cross-domain robustness in practical deployments [21].
Researchers are exploring swarm intelligence algorithms to improve efficiency in optimizing deep learning model parameters. From classical particle swarm optimization (PSO) and genetic algorithms (GA) to the emerging Black Kite Algorithm (BKA), these methods simulate collective behaviors in biological populations and demonstrate unique advantages in tackling high-dimensional, non-convex optimization problems [22,23,24]. Taking BKA as an example, it simulates the circling foraging behavior of black kites by dividing population individuals into explorers and followers and dynamically adjusting their movement strategies during the search. Compared with traditional algorithms, BKA exhibits faster convergence in continuous parameter optimization tasks. However, finding optimal solutions can still be challenging. This difficulty largely stems from the imbalance between global exploration and local search, as well as BKA’s limited global search capability and vulnerability to local optima. Therefore, it is necessary to enhance BKA.
This study is motivated by the following two considerations:
(1) A dual-channel architecture integrates local and global semantic understanding, enabling comprehensive characterization of fake information’s diverse linguistic features.
(2) An improved BKA can optimize the hyperparameters of the hybrid deep learning model, enabling it to achieve higher accuracy.
Thus, this study proposes an improved multi-strategy Black Kite Algorithm (MIBKA) and constructs the MIBKA-CNN-BiLSTM hybrid model. The innovations are reflected in the following aspects.
(1) The paper proposes three strategies to enhance the BKA. The circle chaotic mapping is used for population initialization to make the initial population distribution more uniform. A differential elite mutation strategy is integrated during the kite’s attack phase for position updates to balance global exploration and local exploitation. An opposition-based learning mechanism is introduced into individual foraging to dynamically explore the opposition solution space.
(2) A dual-channel feature extraction network combining CNN and BiLSTM is proposed. The CNN branch employs multi-scale convolution kernels to capture local textual anomaly patterns, while the BiLSTM branch models contextual logical relations through hierarchical state propagation. Then, the improved MIBKA is used to optimize hyperparameters, including the number of 1D convolution filters, convolution kernel sizes, and the number of BiLSTM units.
(3) A fake information dataset is constructed and validated. Extensive comparative experiments with various single and hybrid deep learning models demonstrate the superiority of the MIBKA-CNN-BiLSTM model in fake information detection tasks.
3. Methodology
The paper proposes the MIBKA-CNN-BiLSTM hybrid model for fake information detection. First, the original BKA is enhanced using three strategies. Then, the MIBKA is employed to optimize the hyperparameters of the CNN-BiLSTM model. Finally, the trained MIBKA-CNN-BiLSTM model is used to classify the crawled information as real or fake. The overall architecture of the proposed model is illustrated in Figure 1.
3.1. Black Kite Optimization Algorithm (BKA)
The black kite is a medium-sized bird of prey known for its exceptional agility in hovering, strategic hunting maneuvers, and adaptive migratory patterns. These traits enable it to explore and exploit resources in diverse environments efficiently. The Black Kite Optimization Algorithm (BKA) is a nature-inspired optimization method derived from these distinctive behaviors. The BKA consists of three main stages: population initialization, attack behavior, and migratory behavior.
3.1.1. Population Initialization
In BKA, the first step is to create a set of random solutions to initialize the population. The positions of the black kites (BK) can be represented by the matrix $X = [x_{i,j}]_{N \times D}$, as given in Equation (1).
Here, $N$ denotes the population size, and $D$ refers to the dimensionality of the search space. $x_{i,j}$ represents the position of the $i$-th black kite in the $j$-th dimension. The positions are initialized according to Equation (2), $x_{i,j} = lb_j + rand \cdot (ub_j - lb_j)$. $i$ is an integer between 1 and $N$. $lb_j$ and $ub_j$ represent the lower and upper bounds of the $j$-th dimension, and $rand$ is a randomly generated number within the range [0, 1]. During the initialization phase, the individual with the best fitness is selected as the leader of the initial population, as shown in Equation (3).
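The initialization and leader selection described above can be sketched as follows. This is a minimal illustration, assuming a minimisation problem; the function names and the `sphere` objective are hypothetical and are not taken from the paper.

```python
import random


def init_population(n, lower, upper, rng):
    """Uniform random initialisation: x_ij = lb_j + rand * (ub_j - lb_j)."""
    return [
        [lb + rng.random() * (ub - lb) for lb, ub in zip(lower, upper)]
        for _ in range(n)
    ]


def select_leader(population, fitness):
    """Return the individual with the lowest fitness and that fitness value."""
    scores = [fitness(x) for x in population]
    best = min(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]


def sphere(x):
    # Hypothetical objective used only for illustration (minimisation).
    return sum(v * v for v in x)


rng = random.Random(0)
lower, upper = [-5.0] * 3, [5.0] * 3
population = init_population(6, lower, upper, rng)
leader, f_best = select_leader(population, sphere)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which is convenient when comparing initialization strategies.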
3.1.2. Attack Behavior
While flying, black kites adjust the angles of their wings and tails based on the wind speed. This flexibility allows them to hover silently and observe their prey before diving quickly to attack. The attack behavior is modeled using global exploration and local exploitation. The position update rule is formulated in Equation (4).
The first case simulates scenarios where the black kite hovers in the air and then dives rapidly toward its prey. The second case simulates hovering and gradual movement. $x_{i,j}^{t}$ and $x_{i,j}^{t+1}$ denote the $j$-th dimension position of the $i$-th black kite at iterations $t$ and $t+1$. $r \in [0, 1]$ is a random number, and $p$ is a constant set to 0.9. The parameter $n$ is a nonlinear control variable, defined in Equation (5), where $T_{\max}$ is the maximum number of iterations, and $t$ is the current iteration number. This parameter is designed to adjust the search scope dynamically.
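A rough sketch of the attack-phase update may help make the two cases concrete. Equations (4) and (5) are not reproduced in this text, so the forms below are assumptions based on the commonly published BKA formulation: a control variable $n = 0.05\,e^{-2(t/T_{\max})^2}$ and a two-case multiplicative step selected by comparing $p$ with $r$. The function names are hypothetical, and drawing one $r$ per individual is also an assumption.

```python
import math
import random


def control_n(t, t_max):
    """Assumed nonlinear control variable: 0.05 * exp(-2 * (t / t_max)^2)."""
    return 0.05 * math.exp(-2.0 * (t / t_max) ** 2)


def attack_update(x, t, t_max, rng, p=0.9):
    """One attack-phase move for a single individual x (list of floats)."""
    n = control_n(t, t_max)
    r = rng.random()                      # r in [0, 1)
    if p < r:
        step = n * (1.0 + math.sin(r))    # hover, then dive toward prey
    else:
        step = n * (2.0 * r - 1.0)        # hover and move gradually
    return [v + step * v for v in x]
```

Because $n$ decays with $t$, the relative step size shrinks over the run, which matches the stated purpose of adjusting the search scope dynamically.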
3.1.3. Migratory Behavior
In nature, black kites migrate to new habitats for better survival conditions and resources. When the current leader fails to find optimal environments, the group dynamically replaces it with more capable individuals, ensuring continuous movement toward better habitats. This process is similar to the dissemination process of false information, where key communicators guide the spread of the information. Therefore, the migration behavior of the black kites can inspire the identification of the adaptive transmission patterns of false information. Migration is typically led by a dominant bird whose navigation ability is critical for group success. BKA adopts a migratory strategy based on the assumption that if the fitness of the current individual is worse than that of a randomly selected individual, the leader should relinquish control and join the migratory group. Otherwise, the leader continues to guide the population toward the optimal solution. The migration-based position update rule is shown in Equation (6).
Here, $L_{j}^{t}$ denotes the $j$-th dimension of the leader’s position at iteration $t$ (i.e., the current global best solution). $F_i$ is the fitness value of the current individual, and $F_{ri}$ is the fitness value of a randomly selected individual. $C(0,1)$ refers to a value sampled from the Cauchy distribution. The probability density function of the Cauchy distribution is defined in Equation (8). When $\delta = 1$ and $\mu = 0$, the distribution simplifies to its standard form, as shown in Equation (9).
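The migration step and the Cauchy density can be sketched as below. The Cauchy density is the standard one, with scale $\delta$ and location $\mu$. The piecewise update itself is an assumption based on the commonly published BKA formulation, since Equation (6) is not reproduced in this text; the coefficient `m` and all function names are likewise assumed for illustration.

```python
import math
import random


def cauchy_pdf(x, delta=1.0, mu=0.0):
    """Cauchy density with scale delta and location mu."""
    return delta / (math.pi * (delta ** 2 + (x - mu) ** 2))


def sample_cauchy(rng):
    """Standard Cauchy sample via the inverse CDF, tan(pi * (u - 0.5))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def migration_update(x, leader, f_i, f_rand, rng):
    """Assumed piecewise migration rule for one individual."""
    c = sample_cauchy(rng)
    if f_i < f_rand:
        # Move relative to the leader along the current offset.
        return [v + c * (v - l) for v, l in zip(x, leader)]
    # Otherwise move toward the leader with an assumed coefficient m.
    m = 2.0 * math.sin(rng.random() + math.pi / 2.0)
    return [v + c * (l - m * v) for v, l in zip(x, leader)]
```

The heavy tails of the Cauchy distribution occasionally produce very large steps, which is why a boundary-handling rule is normally applied after this update.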
3.2. Multi-Strategy Improved Black Kite Optimization Algorithm (MIBKA)
Although BKA demonstrates fast convergence speed and high optimization precision, global exploration and local exploitation are imbalanced. Specifically, its global search capability is relatively weak, making it prone to falling into local optima. Three improvement strategies are adopted in this study to address this issue.
3.2.1. Introducing Circle Chaotic Mapping for Population Initialization
In swarm intelligence algorithms, the quality of the initial solution significantly impacts performance and convergence. The standard BKA utilizes a uniform random initialization strategy that may lead to uneven distribution and boundary aggregation in mixed-parameter spaces, thereby diminishing the effectiveness of global search. Alternative methods such as Latin hypercube sampling, low-discrepancy sequences like Halton, and chaotic mappings are commonly used to mitigate this issue. Chaotic mappings are preferred due to their ergodicity, randomness, and adaptability to complex parameter spaces.
This study evaluates 21 chaotic mappings, categorized into three types based on their dynamic characteristics, as summarized in Table 2.
To determine the most suitable chaotic map for BKA initialization, each of the 21 strategies is tested by integrating it into BKA and evaluating the performance on the CEC2017 benchmark suite. The experiments are conducted with a population size of 50, a dimensionality of 30, and a maximum of 300 iterations. Each algorithm variant is run five times, and the average results are used for ranking. Evaluation metrics include optimal fitness, convergence iterations, and global optimality hit rate. Selected results are shown in Figure 2.
The experimental results show that the circle chaotic map outperforms other mappings on most benchmark functions. For instance, in the F1 test function, BKA with circle mapping achieves an average fitness of $8.3388 \times 10^{2}$, reducing the original BKA result of $1.1078 \times 10^{3}$ by 24.77%. The number of convergence iterations decreases from 213 to 127, a 40.38% improvement. Moreover, circle mapping ranks first in 21 out of 30 runs and exhibits premature convergence in only two cases. Based on these results, circle chaotic mapping is selected to improve the initialization phase of BKA.
The circle map generates pseudo-random sequences using a phase-shifting mechanism. Its iterative equation is defined in Equation (10). Here, $a = 0.5$ is the rotation angle controlling the global traversal direction, and $b = 0.2$ is the perturbation strength regulating local randomness.
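As a rough illustration, the circle map can be iterated as below. Equation (10) is not reproduced in this text, so the sketch assumes the standard circle-map form $x_{k+1} = \left(x_k + a - \frac{b}{2\pi}\sin(2\pi x_k)\right) \bmod 1$ with the stated values $a = 0.5$ and $b = 0.2$. The function names and the `burn_in` argument are hypothetical.

```python
import math


def circle_map(x, a=0.5, b=0.2):
    """One iteration of the (assumed) standard circle map on [0, 1)."""
    return (x + a - (b / (2.0 * math.pi)) * math.sin(2.0 * math.pi * x)) % 1.0


def chaotic_sequence(seed, length, burn_in=100):
    """Discard burn_in iterations, then collect `length` values."""
    x = seed
    for _ in range(burn_in):
        x = circle_map(x)
    values = []
    for _ in range(length):
        x = circle_map(x)
        values.append(x)
    return values


sequence = chaotic_sequence(seed=0.37, length=5)
```

How evenly the generated values cover the unit interval depends on the parameter values, so it is worth inspecting a histogram of the output before relying on it for initialization.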
The improved initialization process is described as follows.
(1) Chaotic sequence generation. For each individual, a chaotic sequence of length $D$ is generated from a random seed in $[0, 1)$. The sequence is preheated for 100 iterations to eliminate transient effects. Here, $D$ represents the number of parameters to be optimized.
(2) Handling discrete parameters. For integer parameters such as the number of network layers or the number of convolution kernels $K$, floor mapping is applied to convert chaotic values. Specifically, given a normalized chaotic value $z$, the transformation is defined in Equation (11). $ub$ and $lb$ are the upper and lower bounds of the parameters. For example, with a lower bound of 16, a mapped value of $K = 51$ convolution kernels is then rounded and adjusted to 64.
(3) Preserving continuous parameters. For continuous variables such as the learning rate $\eta \in [0.0001, 0.1]$ and the dropout rate $p_d \in [0.2, 0.6]$, the original distribution of the chaotic sequence is retained through linear scaling, as shown in Equation (12). For instance, when $z = 0.15$, then $\eta = 0.0001 + 0.15 \times (0.1 - 0.0001) \approx 0.0151$.
(4) Dynamic boundary reflection. If $x_{i,j}$ exceeds the defined domain boundaries, a reflection mechanism is used to keep it within the feasible range, as defined in Equation (13). This strategy ensures valid solutions and maintains population diversity in the feasible search space.
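The parameter-mapping steps above can be sketched as follows. Equations (11) to (13) are not reproduced in this text, so the floor mapping `lb + floor(z * (ub - lb))` and the single mirror reflection are assumed forms, and the function names are hypothetical. The linear scaling reproduces the learning-rate example from the text.

```python
import math


def to_integer(z, lb, ub):
    """Assumed floor mapping of z in [0, 1) onto integers starting at lb."""
    return lb + math.floor(z * (ub - lb))


def to_continuous(z, lb, ub):
    """Linear scaling of z in [0, 1] onto [lb, ub]."""
    return lb + z * (ub - lb)


def reflect(x, lb, ub):
    """Mirror an out-of-range value back inside [lb, ub], then clamp."""
    if x < lb:
        x = 2.0 * lb - x
    elif x > ub:
        x = 2.0 * ub - x
    return min(max(x, lb), ub)


learning_rate = to_continuous(0.15, 0.0001, 0.1)   # about 0.0151
dropout = reflect(0.7, 0.2, 0.6)                   # mirrored to about 0.5
```

The final clamp in `reflect` is a design choice for the sketch: it guards against values that lie more than one range width outside the bounds, where a single mirror step would still leave them infeasible.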
The BKA achieves better coverage and faster convergence in mixed parameter spaces by reconstructing the initialization phase using the circle chaotic mapping.
3.2.2. Using Random-to-Elite Differential Mutation for Attack Phase
The core search mechanism of BKA differentiates roles between explorers and followers during the attack phase. Explorers expand the search space through stochastic movements, while followers focus on local exploitation around the current best solution. However, in complex high-dimensional mixed-parameter spaces, the original random perturbation strategy lacks guidance, making it difficult to balance global exploration and local exploitation. Some researchers have attempted to address this limitation by tuning BKA’s internal parameters or integrating it with other metaheuristic algorithms. Although such methods can improve performance, they often face difficulties in parameter coordination. Mutation strategies, in contrast, introduce random or probabilistic perturbations to break population homogeneity. These strategies can be designed according to the specific problem and the characteristics of the algorithm. By incorporating historical information or elite individuals, they can guide the search direction more effectively. This approach helps achieve a better balance between global search and local exploitation while requiring only linear time complexity.
This study investigates 13 mutation strategies, including Gaussian mutation (Gauss1), elite Gaussian mutation (Gauss2), Cauchy mutation (Cauchy1), inverse Cauchy mutation (Cauchy2), t-distribution mutation (t), adaptive t-distribution mutation (Self-t), normal cloud mutation (Cloud), periodic mutation (Periodic), elite differential mutation (DE/best/1), random-to-elite differential mutation (DE/rand-to-best/1), random differential mutation (DE/rand/2), best/2 mutation (DE/best/2), and non-uniform mutation (H). These strategies are categorized into four types based on their mathematical mechanisms, as shown in Table 3.
To identify the most effective strategy for improving BKA’s attack phase, each of the 13 mutation strategies is embedded into BKA and tested on the CEC2017 benchmark functions. Experiments are conducted with a population size of 50, a dimensionality of 30, and a maximum of 300 iterations. Every improved algorithm is executed five times, and the average performance is recorded. Evaluation metrics include best fitness and number of convergence iterations. Selected results are illustrated in Figure 3.
Experimental results show that the random-to-elite differential mutation (DE/rand-to-best/1) performs best on 19 test functions. For instance, in the F10 benchmark test, the average fitness of this strategy reaches $1.9581 \times 10^{9}$, which is 80.42% lower than the original BKA’s $1.0002 \times 10^{10}$. Differential evolution-based strategies balance exploration and exploitation by combining random exploration with elite guidance. Because it combines random perturbation with elite direction, DE/rand-to-best/1 is integrated into BKA’s attack phase to compensate for the lack of guided search in the original design.
The random-to-elite differential mutation originates from the Differential Evolution (DE) algorithm, a population-based global optimization method. DE generates mutation vectors by computing individuals’ differences and combining them with crossover and selection to search for the global optimum. The DE/rand-to-best/1 strategy builds on this by introducing randomness to enhance diversity while leveraging elite individuals to accelerate convergence toward the global optimum.
In the BKA framework, this mutation strategy is embedded into the attack phase to replace the original stochastic disturbance formula. The updated position equation is defined in Equation (14).
Here, $x_{r1}$ and $x_{r2}$ represent two randomly selected individuals from the population. $x_{best,j}^{t}$ refers to the elite individual (with the best fitness) in the $j$-th dimension at iteration $t$. $F$ is the differential scaling factor controlling the impact of the difference vector $(x_{r1} - x_{r2})$. $\lambda$ is the elite learning factor, which determines how much influence the elite solution exerts on the updated position.
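The mutation described here can be sketched as below. Equation (14) is not reproduced in this text, so the sketch assumes the usual DE/rand-to-best/1 shape: the current position plus an elite-directed term plus a scaled random difference. The default values of `f` and `lam`, and the function name, are hypothetical.

```python
import random


def rand_to_best_mutation(population, i, best, rng, f=0.5, lam=0.5):
    """Assumed form: x_i + lam * (x_best - x_i) + f * (x_r1 - x_r2)."""
    # Pick two distinct individuals other than i.
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2 = rng.sample(candidates, 2)
    return [
        xi + lam * (b - xi) + f * (x1 - x2)
        for xi, b, x1, x2 in zip(population[i], best, population[r1], population[r2])
    ]


rng = random.Random(0)
population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mutant = rand_to_best_mutation(population, 0, best=[3.0, 3.0], rng=rng)
```

Setting `lam` close to 1 pulls the mutant strongly toward the elite, while a larger `f` increases the random spread, so the two factors trade convergence speed against diversity.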
By incorporating the DE/rand-to-best/1 strategy into BKA’s attack phase, the algorithm significantly reduces its susceptibility to local convergence. The enhanced equation combines random perturbation with elite-driven guidance, improving search efficiency and robustness across complex optimization tasks.
3.2.3. Utilizing Logarithmic Spiral Opposition Learning for Position Update Phase
After completing the attack and migration phases, BKA only retains the superior individuals through fitness comparisons and lacks active exploration of the potential reverse solution space, resulting in a limited search range. In high-dimensional mixed parameter spaces, especially when discrete network layers coexist with continuous learning rates, forward search is prone to miss high-quality solutions in the reverse regions. Opposition-based learning (OBL) has been widely adopted among various improvement techniques due to its simplicity and effectiveness. The core idea of OBL is to generate opposite solutions relative to the current candidates and explore whether better solutions may lie in the opposite direction. This approach expands the search space, helps the algorithm avoid premature convergence, and enhances global optimization capability without significantly modifying the original algorithm structure. This study evaluates 12 opposition-based learning strategies and classifies them into five categories based on their core mechanisms, as shown in Table 4.
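The core OBL idea can be illustrated with the classic opposite point $\tilde{x}_j = lb_j + ub_j - x_j$ followed by a greedy comparison. Note that this is plain opposition-based learning, shown only to make the mechanism concrete; it is not the logarithmic spiral variant adopted later. The function names and the toy objective are hypothetical.

```python
def opposite(x, lower, upper):
    """Classic opposite point: reflect each coordinate within its bounds."""
    return [lb + ub - v for v, lb, ub in zip(x, lower, upper)]


def obl_step(x, lower, upper, fitness):
    """Keep whichever of x and its opposite has the lower fitness."""
    x_opp = opposite(x, lower, upper)
    return x_opp if fitness(x_opp) < fitness(x) else x


def shifted_sphere(x):
    # Hypothetical objective with its minimum at (3, 3).
    return sum((v - 3.0) ** 2 for v in x)


lower, upper = [-5.0, -5.0], [5.0, 5.0]
improved = obl_step([-3.0, -3.0], lower, upper, shifted_sphere)
```

In this toy case the starting point sits on the wrong side of the domain, so the opposite point lands exactly on the minimum and replaces it.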
To identify the most suitable opposition learning strategy for enhancing the BKA population update phase, 12 strategies are individually integrated into BKA and tested on the CEC2017 benchmark set. Experiments are configured with a population size of 50, a 30-dimensional search space, and 300 maximum iterations. Each modified algorithm is executed five times, and the average performance is recorded. Evaluation metrics include best fitness and number of convergence iterations. Partial results are illustrated in Figure 4.
Experimental results indicate that logarithmic spiral opposition-based learning (LSOBL) performs best on 24 benchmark functions. For the F23 function, the BKA-LSOBL algorithm converges in 102 iterations, while the original BKA takes 225 iterations, resulting in a 54.67% improvement. Its average fitness decreases from $1.204 \times 10^{4}$ to $5.824 \times 10^{3}$, representing a 51.63% reduction. Overall, LSOBL demonstrates the most significant enhancement among all tested strategies.
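The percentages quoted for F23 follow from the usual relative-reduction formula, $(\text{before} - \text{after}) / \text{before} \times 100$. A short check, with a hypothetical helper name, is sketched below.

```python
def relative_reduction(before, after):
    """Percentage reduction of `after` relative to `before`."""
    return (before - after) / before * 100.0


iterations_gain = relative_reduction(225, 102)          # convergence iterations
fitness_gain = relative_reduction(1.204e4, 5.824e3)     # average fitness

print(f"{iterations_gain:.2f}%")  # 54.67%
print(f"{fitness_gain:.2f}%")     # 51.63%
```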
LSOBL is inspired by observing logarithmic spiral patterns in nature, which exhibit self-similarity and infinite scalability. These properties allow the algorithm to search effectively at multiple scales. In the context of opposition learning, the LSOBL strategy introduces a mathematical model based on the logarithmic spiral to generate opposite solutions. Compared to conventional OBL strategies, LSOBL enables wider yet directed exploration, accelerating convergence to the global optimum.
In the original BKA, the update mechanism retains the fitter individuals without exploring the solution space in the opposite direction. LSOBL introduces a spiral-based and dynamically bounded opposition learning model, mathematically defined in Equation (15).
Here, $x_{best}$ denotes the best individual in the current population. $\theta$ is the spiral angle that linearly increases with iteration. $k = 0.2$ controls the spiral’s tightness. $lb^{dyn}$ and $ub^{dyn}$ define the dynamic opposition boundaries. $\gamma(t)$ is a decay function that gradually reduces the boundary range.
This strategy is embedded into the population update phase of BKA and consists of three steps: (1) generating opposite solutions based on a logarithmic spiral; (2) discretizing and applying boundary truncation for discrete parameters; and (3) performing a competitive update in which only the better individual, either the original or the spiral-opposite candidate, is retained. The spiral mechanism dynamically balances global and local search. In the early stages, $\theta$ is small, and the spiral radius is large, promoting global exploration. In the later stages, as $\theta$ approaches its upper limit, the spiral becomes tighter, focusing on local exploitation. The elite-guided term biases the search toward high-quality regions, avoiding ineffective exploration. For continuous parameters, a reflection boundary handling mechanism is applied when solutions exceed the defined domain.
Incorporating LSOBL into BKA’s update mechanism significantly improves the algorithm’s global search ability and robustness. This enhancement effectively mitigates the limitations of the original BKA in dealing with unexplored solution spaces and premature convergence.
3.2.4. Time Complexity Analysis of MIBKA
Time complexity is a critical metric for evaluating the efficiency of an algorithm, representing its performance in the worst-case scenario with respect to input size. The core procedure of the MIBKA includes four major phases: chaotic initialization, attack, migration, and population update. Let $N$ denote the population size, $D$ the dimensionality of the search space, and $T_{\max}$ the maximum number of iterations.
The time complexity of MIBKA is jointly determined by the baseline operations of the BKA and the additional overhead from the proposed improvement strategies. MIBKA employs the circle chaotic map to generate the initial population during the chaotic initialization phase. For each individual, 100 warm-up iterations are conducted to eliminate transient effects. Therefore, the initialization phase has a time complexity of $O(100 \cdot N \cdot D)$, simplifying to $O(N \cdot D)$ as the constant factor is negligible in asymptotic analysis. During the attack phase, mutation operations inspired by differential evolution, combined with boundary reflection handling and discrete parameter rounding, are applied to each individual across all dimensions, resulting in a time complexity of $O(N \cdot D)$. The position updates during migration require traversing all individuals and computing dimensional updates, resulting in another $O(N \cdot D)$ complexity. The population update phase involves generating logarithmic spiral-based opposition solutions, conducting validation filtering, and comparing elite steps, all of which contribute a complexity of $O(N \cdot D)$. Fitness comparison requires $O(N)$, which is asymptotically dominated by the higher-dimensional operations. Thus, the overall complexity remains $O(N \cdot D)$.
In summary, the total time complexity per iteration of the MIBKA is $O(N \cdot D)$. Considering the entire optimization process over $T_{\max}$ iterations, the overall time complexity becomes $O(T_{\max} \cdot N \cdot D)$. Although MIBKA introduces additional procedures such as chaotic mapping, differential mutation, and opposition learning, these improvements only marginally increase the per-iteration computation. The algorithm retains the same polynomial-level complexity as the original BKA. Thus, MIBKA maintains strong scalability and practical efficiency for large-scale optimization problems.
3.2.5. The Steps of MIBKA
MIBKA initializes the search space using the circle chaotic map, which ensures a uniformly distributed population. It employs an elite differential mutation strategy to balance global exploration and local exploitation during the optimization process. Furthermore, it integrates logarithmic spiral-based opposition learning to extend the search boundaries and avoid premature convergence. Throughout the optimization process, the model continuously evaluates the fitness of current candidate solutions, updates the direction of search based on elite individuals, and gradually converges towards the near-optimal solution. The complete pseudocode of MIBKA is presented in Algorithm 1.
Algorithm 1: MIBKA (Multi-Strategy Improved Black Kite Algorithm)
Input: Population size N, Dimensionality D, Max iterations T_max
Output: Best solution x_best and its fitness f_best
Initialize population X = {x1, x2, ..., x_N} using circle chaotic map
Evaluate fitness f(xi) for each xi ∈ X
Identify initial leader x_best with best fitness f_best
for t = 1 to T_max do
    for each individual xi ∈ X do
        Generate rand ∈ [0, 1]
        if rand < p_global then
            // Global exploration
            Update xi using leader-based exploration rule
        else
            // Local exploitation
            Update xi near x_best using local refinement rule
        end if
        Apply random-to-elite differential mutation to xi
        Evaluate new fitness f(xi)
        if f(xi) better than f_best then
            Update x_best ← xi, f_best ← f(xi)
        end if
    end for
    // Migration phase
    Generate new candidate x_new
    Evaluate f(x_new)
    if f(x_new) better than f_best then
        Update x_best ← x_new, f_best ← f(x_new)
    end if
    // Spiral opposition-based population update
    for each xi ∈ X do
        Generate spiral-opposition solution xi_opp
        Evaluate f(xi_opp)
        if f(xi_opp) better than f(xi) then
            Replace xi ← xi_opp
        end if
    end for
end for
return x_best, f_best
3.3. CNN-BiLSTM Model
To effectively capture local patterns and global semantic structures from textual data, we construct a dual-branch CNN-BiLSTM hybrid feature extraction model. This model extracts local linguistic features using a Convolutional Neural Network (CNN). At the same time, a bidirectional long short-term memory (BiLSTM) network is employed to model sequential semantic dependencies. The outputs of these two branches are concatenated and used for final classification.
The input text is first preprocessed and converted into a sequence of word embeddings with dimensionality $d$. Given an input text length of $n$, the embedded representation can be formulated as a matrix $X \in \mathbb{R}^{n \times d}$.
In the CNN branch, multiple one-dimensional convolutional kernels of varying sizes are applied to capture multi-scale n-gram features. The feature map generated by the $k$-th convolutional kernel is computed as $c_k = \mathrm{ReLU}(W_k * X + b_k)$. $W_k$ and $b_k$ denote the weight and bias of the convolutional filter, respectively, and $*$ represents the one-dimensional convolution operation. ReLU is the activation function applied elementwise. All convolutional outputs are subject to max pooling along the temporal dimension and concatenated to form a fixed-dimensional local representation vector.
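The CNN branch can be illustrated with a small dependency-free sketch. It applies one kernel per size over the time axis, followed by ReLU and max-over-time pooling, then concatenates the pooled values. The toy embedding matrix and kernel weights are made up for illustration; a real model would use a deep learning framework and many filters per size.

```python
def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time of x (n x d) with kernel w (k x d), then ReLU."""
    k = len(w)
    feature_map = []
    for start in range(len(x) - k + 1):
        s = b
        for dt in range(k):
            s += sum(xv * wv for xv, wv in zip(x[start + dt], w[dt]))
        feature_map.append(max(0.0, s))
    return feature_map


def cnn_branch(x, kernels):
    """Max-over-time pooling of each feature map, concatenated into one vector."""
    return [max(conv1d_relu(x, w, b)) for w, b in kernels]


# Toy input: n = 4 tokens, d = 2 embedding dimensions (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # kernel size 2
    ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -3.0),  # kernel size 3
]
local_vector = cnn_branch(x, kernels)  # [2.0, 1.0]
```

Because pooling reduces each feature map to a single value, the output length equals the number of kernels regardless of the text length $n$.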
In the BiLSTM branch, the same input matrix $X$ is fed into a bidirectional LSTM network to learn temporal dependencies. Let $H$ denote the number of hidden units in each direction. The hidden state at time step $t$ is computed as $h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \in \mathbb{R}^{2H}$, the concatenation of the forward and backward states.
The bidirectional hidden states are aggregated via average pooling or attention mechanisms to obtain a global semantic representation vector.
The outputs from the CNN and BiLSTM branches are concatenated to form the final text representation $z$.
Then, the representation vector is passed through a fully connected layer, followed by a softmax, to generate the predicted label.
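The fusion and classification step can be sketched as follows: the two branch vectors are concatenated, passed through one dense layer, and normalised with a softmax. The weights and input vectors below are made-up values, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(local_vec, global_vec, weights, bias):
    """Concatenate branch outputs, apply a dense layer, then softmax."""
    z = local_vec + global_vec  # list concatenation
    logits = [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


# Made-up example: 2 CNN features + 1 BiLSTM feature, 2 classes (real, fake).
weights = [[0.5, -0.2, 0.1], [-0.5, 0.2, -0.1]]
bias = [0.0, 0.0]
probabilities = classify([2.0, 1.0], [0.5], weights, bias)
predicted_label = probabilities.index(max(probabilities))
```

Subtracting the maximum logit before exponentiating leaves the result unchanged mathematically but avoids overflow for large logits.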
The CNN-BiLSTM model involves several structural hyperparameters that significantly affect its performance, including the kernel sizes and counts in CNN, the number of LSTM layers and hidden units, the dropout rate, and the learning rate. These hyperparameters form a high-dimensional mixed search space, and their optimal configuration is crucial for the model’s predictive capability.
3.4. MIBKA-CNN-BiLSTM Model
To enhance the adaptability and parameter tuning efficiency of the CNN-BiLSTM model, we introduce a multi-strategy improved Black Kite Algorithm (MIBKA) to optimize the model’s hyperparameters jointly.
In this framework, the optimized hyperparameters include the number of convolutional kernels, the number of hidden units in the BiLSTM layer, learning rate, batch size, dropout rate, and additional parameters. MIBKA uses cross-validation accuracy as the fitness function. During the initialization phase, the circle chaotic map is employed to enhance the diversity of initial solutions. A random-to-best differential mutation strategy is utilized during the optimization process to enhance local perturbation capabilities. An elite-driven updating mechanism is implemented to steer the population towards promising areas in the solution space. Moreover, logarithmic spiral-based opposition learning is embedded in the position update phase to further improve the quality of solutions. During each iteration, MIBKA uses the hyperparameter configuration represented by each individual to train a CNN-BiLSTM model and evaluates its performance on a validation set. Through continuous iteration, the algorithm converges toward a near-optimal set of hyperparameters. The final output of MIBKA is used to construct the optimized CNN-BiLSTM model, which is then evaluated on the test set.
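The link between an individual’s position and a trained model can be sketched as a decode-then-evaluate fitness function. This is a minimal illustration under stated assumptions: positions are normalised to $[0, 1]^D$, the learning-rate and dropout ranges follow the text, and the remaining ranges, the `train_and_validate` callback, and the `stub_trainer` stand-in are hypothetical placeholders rather than the authors’ setup.

```python
import math

# (lower, upper, kind) per hyperparameter. Only the learning-rate and dropout
# ranges come from the text; the other ranges are hypothetical.
SEARCH_SPACE = {
    "num_filters": (16, 128, "int"),
    "lstm_units": (32, 256, "int"),
    "learning_rate": (0.0001, 0.1, "float"),
    "dropout": (0.2, 0.6, "float"),
}


def decode(position):
    """Map a position in [0, 1]^D to a hyperparameter dictionary."""
    config = {}
    for z, (name, (lb, ub, kind)) in zip(position, SEARCH_SPACE.items()):
        value = lb + z * (ub - lb)
        config[name] = int(math.floor(value)) if kind == "int" else value
    return config


def fitness(position, train_and_validate):
    """Fitness to minimise: 1 - validation accuracy of the decoded model."""
    return 1.0 - train_and_validate(decode(position))


def stub_trainer(config):
    # Stand-in for training a CNN-BiLSTM; returns a made-up accuracy.
    return 0.8 if config["dropout"] < 0.5 else 0.7


score = fitness([0.0, 0.0, 0.0, 0.0], stub_trainer)
```

Expressing the objective as one minus accuracy lets a minimising optimizer be reused unchanged; each fitness call corresponds to one full training run, which is why the number of evaluations dominates the practical cost.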
This optimization framework enables the model to achieve improved training efficiency and predictive performance, highlighting the practical value of MIBKA in complex, high-dimensional optimization scenarios.
5. Conclusions and Future Work
5.1. Conclusions
This paper proposes the MIBKA-CNN-BiLSTM hybrid model for detecting fake information. The model enhances the Black Kite Optimization Algorithm with three strategies and uses it to optimize the dual-channel deep learning architecture. A high-quality dataset has also been constructed and validated, providing a reliable foundation for model performance evaluation.
Firstly, the MIBKA achieves substantial improvements through three key strategies. The first strategy reconstructs population initialization using circle chaotic mapping, effectively addressing the uneven distribution caused by traditional random initialization. This approach significantly accelerates convergence on the CEC2017 benchmark tests. The second strategy introduces a random-to-elite differential mutation (DE/rand-to-best/1), which replaces the random perturbation mechanism of the original BKA attack phase, establishing a dynamic balance between global exploration and local exploitation. Experiments demonstrate its effectiveness in avoiding local optima in complex optimization tasks. The third strategy designs an LSOBL mechanism that guides the active exploration of the reverse solution space through spiral phase angles and dynamic boundaries, substantially improving search efficiency and robustness. These improvements collectively endow MIBKA with superior parameter optimization capabilities.
Secondly, this study constructs an efficient CNN-BiLSTM dual-channel feature extraction network and employs MIBKA for intelligent hyperparameter optimization. The CNN branch uses multi-scale convolutional kernels to capture local anomalous patterns in text, such as modifier stacking and abnormal referencing. Meanwhile, the BiLSTM branch models long-range contextual logical dependencies, such as causal breaks and concept shifts, through bidirectional state propagation. MIBKA jointly optimizes key hyperparameters of the hybrid network, including the number of convolutional kernels and LSTM units, allowing the model structure to adapt to task requirements. Experiments on the self-built dataset show that the optimized model achieves an accuracy of 88.05%, improving by 6.22% over the unoptimized baseline CNN-BiLSTM and by 3.11% compared to the best-performing alternative, validating the effective synergy of architecture and optimization strategies.
Moreover, the proposed model is compared with baseline models on the publicly available Weibo21 dataset, where it achieves an accuracy of 86.72%, which is 0.41% higher than the best baseline model. This result indicates that the proposed model not only performs well on the self-built dataset but also generalizes to a cross-domain, multi-source heterogeneous public dataset. The benefit primarily arises from MIBKA’s integration of global search and local adaptive mechanisms within complex parameter spaces. This integration enables the CNN-BiLSTM framework to maintain strong expressive power and robust classification across various textual styles and semantic differences. t-SNE analyses further illustrate clear class separability on both datasets, providing empirical support for the experimental results.
In summary, this research thoroughly investigates the synergistic application value of swarm intelligence optimization algorithms and deep learning models in fake information detection. Experimental results demonstrate that MIBKA-CNN-BiLSTM significantly outperforms mainstream single models, hybrid models, and pre-trained models on both the self-built and Weibo21 datasets, highlighting its comprehensive advantages in feature extraction, parameter optimization, and cross-scenario generalization. The constructed dataset provides excellent resources for domain research, significantly enhancing the rigor and reliability of model validation.
5.2. Suggestions
The proposed MIBKA-CNN-BiLSTM model demonstrates significant practical potential for detecting and preventing false information. Its efficiency, accuracy, and adaptability make it suitable for real-world scenarios. One application is real-time content review. Owing to its lightweight architecture, the model can be integrated into a social media platform’s backend system to scan posts, comments, and shared content in real time. As a result, the system can quickly flag potential false information and shorten the window in which it spreads. Another application is dissemination trend prediction. Combined with social network analysis, the model can identify emerging clusters of false information. Because BiLSTM captures semantic evolution, the model can help predict how false information changes during dissemination. This capability enables the platform to proactively limit the spread of high-risk topics.
5.3. Limitations and Future Work
Although MIBKA-CNN-BiLSTM performs well at detecting fake information, it still has limitations and areas that require improvement. First, the model struggles with ambiguous features and complex texts because its architecture has limited capacity to capture deep semantic contradictions and covert narrative strategies. Future research could explore incorporating external knowledge graphs or fact verification modules to enhance the model’s recognition of implicit logical fallacies. Second, although the model exhibits better generalization than baseline models in cross-domain scenarios, there remains room for improvement. Future work should investigate domain adaptation techniques to mitigate performance degradation caused by data distribution shifts. Finally, this study focuses on the textual modality. Future work could extend to multimodal fake information detection, exploring cross-modal consistency modeling and joint optimization frameworks.