A Survey of Information Dissemination Model, Datasets, and Insight

Liu, Yanchao; Zhang, Pengzhou; Shi, Lei; Gong, Junpeng

doi:10.3390/math11173707

Open Access Review


by Yanchao Liu ¹, Pengzhou Zhang ¹, Lei Shi ¹,² and Junpeng Gong ¹,*

¹ State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China

² Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(17), 3707; https://doi.org/10.3390/math11173707

Submission received: 26 July 2023 / Revised: 18 August 2023 / Accepted: 21 August 2023 / Published: 28 August 2023

(This article belongs to the Topic Distributed Optimization for Control)


Abstract: Information dissemination refers to how information spreads among users on social networks. With the widespread application of mobile communication and internet technologies, people increasingly rely on information on the internet, and the mode of information dissemination is constantly changing. Researchers have studied this process from the perspectives of mathematical modeling and cascade prediction. However, the lack of a comprehensive review of the latest information dissemination models hinders scientific development, so a review of the latest models and methods is needed. In this paper, we review information dissemination models from the past three years and analyze them in detail, covering both explanatory and predictive models. Moreover, we summarize public datasets, evaluation metrics, and interface tools so that researchers can focus on algorithm design and modeling. Finally, we discuss model applications and future research directions. This paper aims to help beginners better understand the research progress and development trends and to guide future research endeavors. We believe this article will attract more researchers’ interest and attention to the field of information dissemination on social networks.

Keywords: information dissemination; explanatory model; predictive model; deep learning

MSC: 37-99

1. Introduction

With the widespread application of mobile communication and internet technology, many social platforms (e.g., Weibo, WeChat, and Twitter) have emerged and become major hubs for various types of information dissemination [1]. People keep in touch, share news, and engage in real-time interactions, which broadens the existing channels of information dissemination. The mobility, openness, and convenience of social networks benefit people’s lives, work, and learning [2]. However, they also provide a fertile environment for the dissemination of harmful information, such as malicious information and rumors [3], which significantly threatens societal order [4,5]. Therefore, analyzing the process of information dissemination is of great significance for understanding its laws; such analysis is widely applied in devising effective control strategies [6,7,8], predicting information dissemination trends [9], and identifying influential users [10].

Information dissemination (information propagation) refers to how information, new ideas, or user influence spreads among users through communication links on social networks [11]. Information dissemination models aim to capture the dynamic behavior of information dissemination [12]. This paper divides the analysis of the information dissemination process into macro-level and micro-level perspectives. At the macro level, research studies information dissemination by observing how the sizes of different groups change over time, which can reveal the laws of information dissemination and support optimized control strategies to curb the spread of harmful information; this category includes epidemic models [13,14], network structure-based models [15,16], and multi-information competition models [17,18]. At the micro level, research assesses the likelihood that potential users will retweet information [19]. Although researchers have studied the modeling of the information dissemination process extensively, it remains a hot topic because of the breadth and complexity of the research.

Figure 1 provides a summary of the papers included in this survey. The left panel shows that information dissemination has attracted attention recently and that the analytical methods used vary over time. Researchers are gradually starting to use deep learning to study information cascade and popularity prediction tasks from a new micro-level perspective. The right panel shows that the research works have been published in internationally renowned conferences and journals, such as Information Sciences, Expert Systems with Applications, Chaos, Solitons & Fractals, SIGKDD, IJIS, AAAI, AI, and WWW. We notice that over 48% of the papers come from other journals, which further indicates that this is an interdisciplinary topic covering artificial intelligence, physics, statistics, and more. In addition to traditional models, we supplement the information dissemination modeling framework by introducing data-driven methods at the micro level. Research on information dissemination is still ongoing, and researchers are approaching it with different technical tools, such as mathematical models, statistical models, and representation learning, as shown in Figure 2.

This paper aims to provide a comprehensive review of information dissemination models. The research literature in this field is abundant, but it lacks a unified categorization of existing work. Li et al. [12] and Zhou et al. [33] review information dissemination models and information cascade prediction methods, respectively, in 2021. Nevertheless, they do not include the most critical research work beyond 2021. Raponi et al. [34], who focus only on epidemic models, summarize models from the perspective of user states. Similarly, Sun et al. [35] mainly summarize three aspects (dissemination mechanism, propagation model, and optimized control strategy), but their review lacks network structure-based models and data-driven models. We find that although researchers have studied information dissemination for a long time, there are few reviews of the latest methods, and existing reviews focus only on a particular technical line, which hinders the in-depth exploration of information dissemination. In addition, some key concepts and methods are still ambiguous.

In view of the above problems, this paper summarizes and analyzes the latest information dissemination models and methods from the past three years from macro-level and micro-level perspectives. By collecting and organizing no fewer than 100 articles, it covers the major models and techniques, offers comprehensive insights from multiple perspectives, and facilitates the in-depth exploration of information dissemination. Furthermore, this paper introduces publicly available datasets, evaluation metrics, and interface tools. Additionally, we discuss model applications and future research directions for beginners.

The main contributions of this paper can be summarized as follows:

(1) A comprehensive review of the latest developments in information dissemination. This paper summarizes the state-of-the-art models or methods by extensively surveying the current state and analyzing over 100 papers. It is particularly valuable for beginners to grasp and familiarize themselves with the latest progress quickly.

(2) Classification of models from macro-level and micro-level perspectives. This paper proposes two main categories: explanatory and predictive models, which cover major information dissemination models and methods. We introduce and analyze each type of representative approach in detail for researchers to better understand the differences between various techniques.

(3) Summarization of publicly available datasets, evaluation metrics, and interface tools. This paper allows researchers to only focus on modeling and algorithm design. Furthermore, we discuss future research directions and model applications to help beginners understand the research directions and trends, inspiring them to produce more valuable research outcomes.

The rest of this paper is organized as follows. Section 2 introduces the information dissemination models, including a formal definition and the models themselves; Section 3 summarizes publicly available datasets, evaluation metrics, and interface tools; Section 4 provides future research directions; Section 5 discusses the application of information dissemination models. Finally, Section 6 presents the conclusion.

2. Information Dissemination Model

Information dissemination models aim to capture the dynamic behavior of information dissemination on social networks [12]; mathematical or computable models help us understand the laws of information dissemination and the interaction of the factors that influence it. In this section, we summarize and analyze the information dissemination models and methods from the past three years. Based on the macro-level and micro-level perspectives, Section 2 mainly covers the definition, explanatory models, and predictive models. Figure 3 shows the taxonomy of the information dissemination models.

We categorize methods along several aspects, as shown in Figure 3. First, we define the information dissemination process according to the information dissemination mode. Second, we classify the models or methods into explanatory and predictive models from the macro-level and micro-level perspectives (for example, by analyzing the influence of factors on information dissemination). Finally, methodologically, we classify the explanatory models into epidemic, improved, network structure, and competitive models and classify the predictive models into traditional and deep learning models.

2.1. Definition

Information dissemination refers to a dynamic process of transmission and reception; we treat it as a process of sharing resources between the spreader and the receiver [36]. In this paper, we define information dissemination using a six-tuple representation, as shown in Equation (1):

$$IDP = (C, M, U, S, T, A) \tag{1}$$

where $IDP$ represents the information dissemination process; $C$ represents the information content, such as advertising products, valuable information, and harmful information; $M$ represents the social platforms, such as Twitter, Weibo, and WeChat; $U$ represents users; $S$ represents the social networks, such as small-world networks, random networks, scale-free networks, and real social networks; $T$ represents the time of information dissemination with timestamps $\{t_{1}, t_{2}, \ldots, t_{n}\}$; and $A$ represents actions, such as retweeting, commenting, liking, and saving.
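As an illustration only, the six-tuple of Equation (1) can be held in a small record type. The sketch below is a hypothetical Python representation; the class name, the field names, and the choice to store actions as (user, action type, time) triples are assumptions made here for illustration and are not part of the original definition.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DisseminationProcess:
    """Hypothetical container for the six-tuple IDP = (C, M, U, S, T, A)."""
    content: str                            # C: the information content
    platform: str                           # M: the social platform
    users: List[str]                        # U: the users involved
    network: List[Tuple[str, str]]          # S: social network as (follower, followee) edges
    timestamps: List[float]                 # T: observation times t_1, ..., t_n
    actions: List[Tuple[str, str, float]]   # A: (user, action type, time) records

    def actions_before(self, t: float) -> List[Tuple[str, str, float]]:
        """Return the actions observed up to and including time t."""
        return [a for a in self.actions if a[2] <= t]


# Toy instance with made-up values, used only to show the structure.
example = DisseminationProcess(
    content="product announcement",
    platform="Weibo",
    users=["u1", "u2", "u3"],
    network=[("u2", "u1"), ("u3", "u2")],
    timestamps=[0.0, 1.0, 2.0],
    actions=[("u1", "post", 0.0), ("u2", "retweet", 1.0), ("u3", "like", 2.0)],
)
```

Keeping the actions as timestamped records is one possible design; it lets both macro-level counts and micro-level cascades be derived from the same structure.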

2.2. The Explanatory Model

The explanatory models (the macro level) analyze the information dissemination process and predict how population groups change over time using mathematical models [37]. Such dynamics models offer high applicability, fast analysis, high selectivity, and sensitivity, and they are widely applied to the spread of infectious diseases [38,39], computer viruses [40,41], malware [42,43], marketing advertising, and the allocation of medical resources [44]. In this section, we primarily focus on epidemic, network structure-based, and competitive models. Finally, we compare and analyze the explanatory models.

Figure 4 shows the process of explanatory information dissemination. Red, yellow, blue, and cyan users represent information disseminators, the information-exposed, the information-ignorant, and the information-immune, respectively; explanatory models describe and analyze how the sizes of these populations change over time.

2.2.1. Epidemic Model

Owing to the significant similarity in the internal mechanisms of epidemic propagation and information dissemination [45,46], epidemic models are widely applied in information dissemination research. The epidemic model, also called the “compartmental” model [47], assigns individuals to different subcategories (states) and moves them between states according to deterministic rules [34]. Epidemic models rely on the following assumptions: the conditions of nodes within each subcategory are mutually independent, and the population is well-mixed and undifferentiated. In this section, we primarily present classical epidemic models and improved epidemic models.

Classical Epidemic Model

In this section, we primarily introduce the classical epidemic models, including the Susceptible-Infected (SI) [48], Susceptible-Infected-Susceptible (SIS) [49], and Susceptible-Infected-Recovered (SIR) [13,14,20,50,51] models. The SI model is the most fundamental, and the subsequent models are refinements of it. The SI model divides the population into two compartments: S (susceptible) represents individuals unaware of the information, and I (infected) represents individuals spreading the information. In the SIS model, infected individuals may become susceptible again because of forgetting mechanisms or other factors. In the SIR model, R (recovered) represents individuals who become immune, for example through exposure to debunking information released by authorities or media. Figure 5 shows the schematic diagram of the classical dissemination models. Epidemic models were first used to study the spread of smallpox [52], pioneering the era of infectious disease models. Building on those pioneering works, the DK model [21] and MK model [22] were proposed and applied to information dissemination. Sudbury et al. [20] were the first to use the SIR model to investigate information dissemination.

In Figure 5, α represents the probability of a spreader becoming information-ignorant again; β represents the probability of an information-ignorant individual becoming a spreader; γ represents the probability of a spreader becoming immune.

This paper takes the SIR model as an example. To simplify the analysis, it divides the population into three sub-types, and the total population size remains constant over time. Information-ignorant individuals (susceptible) are unaware of the information and at risk of being infected by spreaders; information spreaders (infected) engage in information dissemination, such as liking, saving, and sharing information on social networks; information-immune individuals (recovered) lose interest in or no longer propagate the information. The assumptions are as follows: when an information-ignorant individual encounters a spreader, the ignorant individual becomes a spreader with probability α; when a spreader encounters accurate information or refutation information released by the government or authoritative media, the spreader becomes information-immune with probability β. Equation (2) shows the dynamic equations of the SIR model.

$$\begin{cases} \frac{dS(t)}{dt} = -\alpha S(t) I(t) \\ \frac{dI(t)}{dt} = \alpha S(t) I(t) - \beta I(t) \\ \frac{dR(t)}{dt} = \beta I(t) \\ S(t) + I(t) + R(t) = 1 \end{cases} \tag{2}$$

where $\alpha$ denotes the infection rate when information-ignorant individuals come into contact with the spreader, and $\beta$ represents the immunity rate. $S(t)$, $I(t)$, and $R(t)$ indicate the density or proportion of the population in the susceptible, infected, and immune states at time $t$, respectively. Algorithm 1 shows the basic algorithmic steps of the SIR model.

Algorithm 1: Basic Idea of the SIR Model

Input: $S(0)$: proportion of information-ignorant individuals; $I(0)$: proportion of spreaders; $R(0)$: proportion of information-immune individuals; $T$: the cutoff time of information dissemination; $\alpha$: the transmission rate of information; $\beta$: the immunity rate; $N$: the total population size (constant); $m$: the proportion of initial spreaders.

Output: $S(T)$, $I(T)$, $R(T)$: proportions of individuals in the different compartments at time $T$.

$S(0) = 1 - m$, $I(0) = m$, $R(0) = 0$, $t = 0$
While $t < T$ do
    $S(t+1) = S(t) - \alpha S(t) I(t)$;
    $I(t+1) = I(t) + \alpha S(t) I(t) - \beta I(t)$;
    $R(t+1) = R(t) + \beta I(t)$;
    $t = t + 1$;
End
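The iteration in Algorithm 1 amounts to a forward-Euler discretization of Equation (2). The following is a minimal Python sketch of that idea, not the authors’ implementation; the function name `simulate_sir`, the step size `dt`, and the parameter values in the usage example are assumptions chosen for illustration.

```python
def simulate_sir(alpha, beta, m, T, dt=1.0):
    """Forward-Euler integration of the well-mixed SIR model of Equation (2).

    alpha: infection (transmission) rate; beta: immunity rate;
    m: initial proportion of spreaders; T: cutoff time;
    dt: step size (an assumption; dt = 1 gives unit time steps).
    Returns a list of (t, S, I, R) tuples.
    """
    s, i, r = 1.0 - m, m, 0.0
    t = 0.0
    history = [(t, s, i, r)]
    for _ in range(int(round(T / dt))):
        new_spreaders = alpha * s * i * dt   # ignorant -> spreader
        new_immune = beta * i * dt           # spreader -> immune
        s -= new_spreaders
        i += new_spreaders - new_immune
        r += new_immune
        t += dt
        history.append((t, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the survey.
    trajectory = simulate_sir(alpha=0.5, beta=0.1, m=0.01, T=100, dt=0.1)
    t_end, s_end, i_end, r_end = trajectory[-1]
    print(f"t={t_end:.1f}  S={s_end:.3f}  I={i_end:.3f}  R={r_end:.3f}")
```

Because every quantity that leaves one compartment enters another, the sum S + I + R stays at 1 up to floating-point error, which mirrors the conservation constraint in Equation (2). A smaller `dt` trades run time for accuracy.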

The information infection rate varies nonlinearly over time. Ke et al. [13] discuss the influence of control factors from the government and media on information dissemination by using the SIR model with non-smooth control and nonlinear contact functions. Experiments show that enforcing silence and enhancing the ability of online information supervision can contain the spread of rumors. Because the openness of social platforms breaks the limitations of language and geography, Xia et al. [14] propose a reaction-diffusion model with nonlinear functions in a multilingual environment. Using optimization control theory and sensitivity analysis, they investigate the impact of introducing new spatial distances, education, and mandatory silence mechanisms on curbing rumor propagation, and the results support the theoretical analysis. Zhu et al. [48] develop an SI model with nonlinear dissemination rates and non-smooth optimization control functions. The research discusses the stability of equilibrium points and provides conditions for saddle-node, Turing, and Hopf bifurcations using Lyapunov functions. Zhu et al. [49] analyze the stability of SIS models with saturation functions and verify that the proposed immunity control strategy can effectively suppress information dissemination. Ma et al. [50] use a non-smooth two-stage SIR model to prove the stability of equilibrium points and give the existence conditions of Hopf bifurcation, investigating the incentive effect of secondary dissemination on information dissemination. Hu et al. [51] propose a cross-diffusion model with Allee effects and derive the system’s dynamic equations using a “multi-scale analysis” approach. The research also provides the conditions for the existence of Turing bifurcation. Considering time delays and two optimization control strategies, Li et al. [53] use the SIR model to study the impact of propaganda and post-deletion operations on information dissemination. The experiments demonstrate that the proposed models can reduce the number of rumor spreaders, control costs, and effectively suppress rumor propagation. Tu et al. [54] use ordinary differential equations to model information dissemination without an underlying topology; the model considers friendship, relationships, and the interactions among users and predicts the temporal and spatial patterns of information dissemination on social networks, and the results show no significant discrepancies compared with the actual data. Hu et al. [55] analyze the impact of time delays and Crowley’s incidence rate on information dissemination; they use the least squares method to fit the model parameters. The experiment shows high consistency with real examples.

Discussion: The SIR, SI, and SIS models are the earliest and most basic models for information dissemination. Inspired by physics, researchers treat the analysis of the information dissemination process as a system stability analysis, which can effectively and flexibly reproduce the information dissemination process. At the same time, the computational complexity of modeling the dynamic process is relatively low. As a starting point for information dissemination research, these models are easy to understand and implement, reducing the complexity of modeling information dissemination. In simulation experiments, the effects of different factors on the information dissemination process can be evaluated quickly and conveniently with sensitivity analysis, thereby capturing the essential characteristics and dynamic changes in information dissemination and supporting the formulation of reasonable optimization control strategies. However, these are well-mixed compartmental models, which homogenize user features and transformation rules and thus ignore the differences between individuals. Because the types of compartments are limited, only a few factors affecting information dissemination can be considered, which limits the models’ applicability. The model parameters are treated as independent of one another and are set subjectively, which may not be consistent with the actual situation. In addition, the complexity of the model largely depends on its assumptions.

Improved Model

Based on the classical models, researchers introduce various user roles and hierarchical patterns to analyze the influences of individual, social, and natural factors on information dissemination; they propose many improved models that better capture the dynamics of information dissemination on social networks and are more in line with reality.

In the Single-Layer Model. Considering wise men and negative social reinforcement, Huo et al. [6] propose the ISWR model and find that positive social reinforcement facilitates information dissemination. By introducing a group that knows the information but does not spread it, Zhu et al. [56] develop a rumor propagation model with enforced silence and analyze the stability of its equilibrium points with backward bifurcation theory, linearization theory, and Hurwitz theory. The experiment demonstrates that enforced silence effectively inhibits information dissemination. Similarly, Pan et al. [57] use the SIDRW model, which considers media and refuters, to analyze the influence of media reporting and debunking on rumor propagation, showing that positive media publicity can mitigate the harm caused by rumors but cannot eliminate it. Additionally, as the initial number of rumor spreaders increases, the duration of rumor propagation decreases. Chen et al. [58] find that the threshold increases with the improvement of scientific knowledge, while positive social reinforcement lowers the threshold and amplifies the impact. Mutlu et al. [59] propose a novel cognitive-driven model incorporating users’ cognitive depth; they are the first to use a compartmental model to predict the size of information cascades, achieving a lower fitting error rate. Yu et al. [60] obtain the parameters of the IDSRI model by fitting real datasets with the least squares method and use the model to analyze the impact of discussants on information dissemination.

Yu et al. [61] employ the 2I2SR model to model information dissemination in a multilingual environment. Wang et al. [62] analyze the stability and the existence of Hopf bifurcations of the IS2R2 model with nonlinear inhibitory mechanism functions and forward-backward scanning algorithms. Meanwhile, they discuss the impact of a multilingual environment and time-delay factors on cross-information dissemination. Yu et al. [63] use the 2I2SR model to compare rumor propagation with and without time delay in a multilingual environment and provide critical conditions for the existence of Hopf bifurcations as well as optimization control strategies. Given the openness of social platforms, Wang et al. [64] use the IO-UAR model, which considers the time delay of official information and the phenomenon of “following the crowd”, to explain the impact of user mobility on information dissemination, demonstrating the strong applicability of the model. Jiang et al. [65] propose a two-stage SPNR model to analyze the dynamic mechanisms on Weibo regarding incidental events. Considering dual refutation mechanisms, Guo et al. [66] explore the SICMR model to study the influence of media reporting and the initial values of counteracting individuals at the peak. Wang et al. [67] investigate the impact of scientific knowledge level theories and control strategies on rumor propagation using the G-SCNDR model.

In the Multi-Layer Model. In recent years, multi-layer models have attracted researchers’ attention. Their layers, such as the media website layer [68], the friendship layer, the information layer, the epidemic layer, and the resource layer, interact with and constrain one another on multi-layer networks. For instance, when individuals become aware of epidemic information, they may take preventive measures to avoid infection, such as isolation, which influences the disease spread in the epidemic layer. Conversely, the spread of the disease can also affect information dissemination. Figure 6 shows a schematic diagram of the multi-layer model.

Cheng et al. [68] use the XY-ISR model to investigate the impact of time delay, optimal control strategies, and infected media on information dissemination. Meanwhile, Dong et al. [69] develop the XYZ-ISR model, which considers event pulses, to explore the effects of multiple channels and time delay on rumor propagation, validating the applicability and effectiveness of the model on a real dataset from Weibo. Xu et al. [70] propose a double-layer coupled SIS model that considers differences in individual influence and uses Markov and mean-field theory to analyze the interaction between awareness and epidemics. Experiments show that higher acceptance of information sharing among individuals facilitates epidemic propagation. Huang et al. [71] use a UA1A2-SEIS model to simulate the mutual competition between epidemics and information, finding that knowledge can eliminate rumor propagation and control epidemic spreading and that taking self-protective measures can reduce the risk of infection. Huo et al. [72] investigate the influence of individual emotional factors on information and epidemic propagation using a UAU-SIS model. Guo et al. [73] establish dynamic equations to study the interplay between information diffusion and epidemic spreading on a two-layer time-varying network with a partial mapping relationship; the thresholds of the proposed model are found to be closely related to the correspondence ratio between the layers. Wang et al. [74] construct a double-layer coupled model to analyze the impact of three social behavior strategies on a hyper-graph network.

Based on the two-layer model, Huo et al. [75] investigate the interactions among information, resources, and epidemics; the experiments reveal that authoritative information inhibits epidemic dissemination. In contrast, information dissemination promotes the spread of epidemics, and resource utilization can inhibit epidemic propagation. Huo et al. [76] propose a UAU-DKD-SIS model, which uses the Heaviside step function and the microscopic Markov chain approach to investigate the evolutionary relationship between negative information, immune behavior, and epidemic spreading, indicating that strengthening clarification by the mass media and improving self-recognition ability help control epidemic spreading.

Discussion: Building on classical epidemic models, researchers introduce more compartments and factors for complex scenarios, such as lurkers, doubters, and forgetting mechanisms, which allows the information transmission process to be analyzed more accurately. In the two-layer model in particular, researchers can analyze in depth the dynamic changes within each layer and the interactions between layers, which brings the model closer to the actual situation. However, introducing additional compartments and realistic factors increases the number of model parameters, which increases the complexity of the model, places higher demands on model analysis, and requires a solid foundation in mathematical theory.

2.2.2. Network Structure Model

Complex network dynamics are based on network theory, integrating physics, sociology, computer science, and other disciplines. The information dissemination process relies on the underlying structure of social networks [77] in reality. Researchers combine the dynamics of information dissemination with the topology of social networks and propose various models, including random networks [23], small-world networks [24], as well as real social platforms (e.g., Twitter, Weibo, and Digg). In this section, this paper introduces homogeneous networks and heterogeneous networks, where homogeneous networks refer to the networks where each node (individual) has the same degree, corresponding to the average degree of the social network. Heterogeneous networks take the heterogeneity of nodes into account.

In Homogeneous Networks

Cheng et al. [1] present an improved ISRM model that introduces media roles and analyze the effects of time delay, media coverage, science education, and impulsive control strategies. The experiment verifies that impulsive immunization is more effective than continuous immunization on homogeneous networks.

Taking the SIR model as an example, this paper provides the dynamical equations on homogeneous networks. The model categorizes individuals into three states: $S$ (susceptible individuals) represents the individuals who are unaware of the propagated information, $I$ (infected individuals) represents the individuals actively spreading information, and $R$ (immune individuals) represents the individuals who no longer spread information due to certain factors, such as debunking information. Equation (3) shows the dynamical equations of the SIR model.

$$\begin{cases} \frac{dS(t)}{dt} = -\alpha \langle k \rangle S(t) I(t) \\ \frac{dI(t)}{dt} = \alpha \langle k \rangle S(t) I(t) - \beta \langle k \rangle I(t) [I(t) + R(t)] \\ \frac{dR(t)}{dt} = \beta \langle k \rangle I(t) [I(t) + R(t)] \\ S(t) + I(t) + R(t) = 1 \end{cases} \tag{3}$$

where $S(t)$, $I(t)$, and $R(t)$ represent the density or proportion of susceptible, infectious, and immune individuals, respectively; $\langle k \rangle$ represents the average degree of nodes on the social network; $\alpha$ represents the infection rate; and $\beta$ represents the immunization rate.
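To make Equation (3) concrete, the sketch below integrates it with a forward-Euler step. It is an illustrative sketch under stated assumptions rather than a reference implementation: the function name `simulate_homogeneous_sir`, the step size `dt`, and the parameter values in the usage lines are hypothetical.

```python
def simulate_homogeneous_sir(alpha, beta, avg_degree, i0, T, dt=0.01):
    """Forward-Euler integration of Equation (3) on a homogeneous network.

    alpha: infection rate; beta: immunization rate;
    avg_degree: the average degree <k>; i0: initial density of spreaders;
    T: end time; dt: step size (an assumption).
    Returns the final densities (S, I, R).
    """
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(round(T / dt))):
        # Ignorant individuals become spreaders on contact with spreaders.
        spread = alpha * avg_degree * s * i
        # Spreaders become immune on contact with spreaders or immune individuals.
        stifle = beta * avg_degree * i * (i + r)
        s -= spread * dt
        i += (spread - stifle) * dt
        r += stifle * dt
    return s, i, r


if __name__ == "__main__":
    # Illustrative values only.
    s_end, i_end, r_end = simulate_homogeneous_sir(
        alpha=0.3, beta=0.2, avg_degree=6, i0=0.01, T=50
    )
    print(f"S={s_end:.3f}  I={i_end:.3f}  R={r_end:.3f}")
```

Unlike Equation (2), the immunization term here depends on contacts with both spreaders and immune individuals, so a fraction of the population can remain ignorant once the spreaders die out.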

In Heterogeneous Networks

Due to the complexity of social relationships, homogeneous networks cannot accurately reflect the reality of network topology. Therefore, researchers further delve into the dynamic behavior of information dissemination on heterogeneous networks.

Lattice networks (LA): A two-dimensional space where nodes are arranged in a square grid of side length $\sqrt{N}$, where $N$ denotes the total number of nodes.

Small-world networks (WS): Initially, a circular grid is formed with $N$ nodes, where each node connects to its neighboring nodes on each side [25]. Then, for each node $v$ ($v = 0, 1, 2, \ldots, N - 1$), the edge connecting it to its rightmost adjacent node is rewired with probability $w$, which generates a new edge.

Scale-free networks (BA): The BA network is obtained by evolving a network consisting of $N$ nodes that grows continuously as new nodes are added. Each new node establishes links by preferentially attaching to existing nodes based on their degree $k_{i}$. Specifically, a new node connects to an existing node $i$ with a probability $q_{i} = \frac{k_{i}}{\sum_{j} k_{j}}$ proportional to its degree. Furthermore, the node degrees follow a power-law distribution, where $P(n)$ is the density of nodes with degree $n$, as shown in Equation (4).

$$P(n) \propto n^{-\lambda} \tag{4}$$

where $n$ denotes the degree of nodes, and $\lambda$ denotes the power-law exponent (a hyper-parameter), which lies in the range (2, 3).
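The preferential-attachment rule $q_{i} = k_{i} / \sum_{j} k_{j}$ can be illustrated with a short pure-Python generator. This is a sketch under assumptions: the fully connected seed of $m + 1$ nodes, the parameter `m` (edges added per new node), and the rejection of duplicate targets are choices made here and are not specified in the text; the function names are hypothetical.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow a graph with n nodes in which each new node attaches to m existing nodes."""
    rng = random.Random(seed)
    # Assumed initial condition: a fully connected core of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in 'stubs' once per incident edge, so a uniform draw
    # selects node i with probability k_i / sum_j k_j.
    stubs = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges


def degree_distribution(edges):
    """Return {degree: fraction of nodes with that degree}, an empirical P(n)."""
    degree = Counter(node for edge in edges for node in edge)
    counts = Counter(degree.values())
    total = len(degree)
    return {k: c / total for k, c in sorted(counts.items())}


if __name__ == "__main__":
    dist = degree_distribution(barabasi_albert(n=2000, m=3, seed=42))
    for k, p in list(dist.items())[:5]:
        print(k, round(p, 4))
```

Plotting the resulting distribution on log-log axes is a common way to inspect the heavy tail described by Equation (4); rejecting duplicate targets slightly perturbs the exact attachment probabilities, which is usually acceptable for illustration.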

Taking the SIR model as an example, Equation (5) gives the dynamical equations on heterogeneous networks. The model classifies the population into three states: $S$ (information-ignorant) represents individuals who are unaware of the information; $I$ (information-spreader) represents individuals who actively spread the information; and $R$ (information-immune) represents individuals who lose interest in propagating the information because of factors such as debunking news or forgetting. $S_{k}(t)$, $I_{k}(t)$, and $R_{k}(t)$ represent the densities of information-ignorant individuals, spreaders, and immune individuals with node degree $k$ at time $t$, respectively. The underlying assumption is that the population size remains constant over time.

\begin{cases} \frac{d S_{k} (t)}{d t} = - α k θ (t) S_{k} (t) \\ \frac{d I_{k} (t)}{d t} = α k θ (t) S_{k} (t) - β I_{k} (t) \\ \frac{d R_{k} (t)}{d t} = β I_{k} (t) \\ S_{k} (t) + I_{k} (t) + R_{k} (t) = 1 \end{cases}

(5)

where θ(t) denotes the probability that the information-ignorant is connected to the spreader at time t, as shown in Equation (6).

θ (t) = \sum_{k = 1}^{N} p (k | k^{'}) I_{k} (t) = \frac{1}{< k >} \sum_{k = 1}^{N} k p (k) I_{k} (t)

(6)

where

k

and

k^{'}

denote the degree of nodes,

α

represents the information forwarding rate,

p (k | k^{'}) = \frac{k p (k)}{< k >}

indicates the probability that a node with node degree

k^{'}

will randomly contact the spreader with the degree

k

, and

< k > = \sum_{k = 1}^{N} k p (k)

denotes the average degree of the network,

p (k)

denotes the probability that a node has degree k, and

β

represents the immunization rate.
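The degree-based dynamics of Equations (5) and (6) can be explored numerically. The following is a minimal sketch that integrates the equations with a forward Euler scheme; the function names, the truncated power-law degree distribution, the initial spreader density `i0`, and the step size `dt` are assumptions introduced for illustration only.

```python
def power_law_pk(k_min, k_max, lam):
    """Normalized degree distribution p(k) proportional to k^(-lam)."""
    weights = {k: k ** (-lam) for k in range(k_min, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def simulate_sir(pk, alpha, beta, i0=0.01, dt=0.01, steps=5000):
    """Forward Euler integration of the degree-based SIR equations."""
    mean_k = sum(k * p for k, p in pk.items())  # <k>
    S = {k: 1.0 - i0 for k in pk}
    I = {k: i0 for k in pk}
    R = {k: 0.0 for k in pk}
    for _ in range(steps):
        # Equation (6): probability that a link points to a spreader.
        theta = sum(k * pk[k] * I[k] for k in pk) / mean_k
        for k in pk:
            infected = alpha * k * theta * S[k] * dt
            recovered = beta * I[k] * dt
            S[k] -= infected
            I[k] += infected - recovered
            R[k] += recovered
    return S, I, R


pk = power_law_pk(k_min=1, k_max=50, lam=2.5)
S, I, R = simulate_sir(pk, alpha=0.3, beta=0.1)
```

Because the infection term scales with the degree k, the ignorant density of high-degree nodes decays faster than that of low-degree nodes in this sketch, which reflects the role of hubs in heterogeneous networks.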

Considering the psychology of “uncertainty,” Yi et al. [15] use the

S U I R S

model to investigate the influence of individual repeated participation and subjective judgment behavior. The experiment shows that the occurrence of information dissemination depends on the threshold. Kumar et al. [16] analyze information dissemination in three types of groups with the

S E I

model, and simulation on real datasets reveals significant differences between the exposed and infected individuals. Ai et al. [78] explain the impact of anxiety on improved networks. Yin et al. [79] use an

S F I

model to analyze the influence of cross-platform environments and network topology, demonstrating that increasing the node degree facilitates rapid information outbreaks. In the multilingual and heterogeneous network environment, Li et al. [80] utilize the

I S_{1} S_{2} R_{1} R_{2}

model considering the educational mechanism to analyze the impact of multilingual environments and heterogeneous networks on rumor propagation; the research finds that the proposed model and control strategies effectively suppress the spread of harmful information.

There is a time delay phenomenon in information dissemination. Zhu et al. [81] analyze the behavior of susceptible and infected individuals by the optimized control dynamic model. Mei et al. [82] propose a Hyper-ILSR model with a saturation incidence rate, which uses hypergraph theory to study the higher-order effects. Chen et al. [83] propose the SEIR model to investigate the impact of saturation incidence rate and time delay. Cui et al. [84] capture the heterogeneity of individual intimacy on ER and SF networks. They use an adoption threshold model with a tent-like probability function to analyze the influence of individual fashion-passion trend (

I F P T

) characteristics. The simulation results are consistent with the theoretical analysis. Tong et al. [85] analyze the random perturbations in the propagation process, which consider user heterogeneity and dynamic network environments, demonstrating the feasibility of the model. Gong et al. [86] propose a

U H I R

model which combines the super-network theory and

S E I R

model to study user and information attributes’ influence; the proposed model establishes new research directions for hyper-networks.

In addition, researchers have compared and analyzed the dynamics of information dissemination in both heterogeneous and homogeneous networks. Yuan et al. [87] propose a

2 S I R

model with a nonlinear inhibition mechanism and time delay considering bilingual environments and multiple optimization control strategies. They analyze the stability of equilibrium points with Lyapunov functions and linearized equations and verify the proposed model’s correctness.

3.: Influence Model

The influence model describes how an individual’s activation behavior is affected by the states or behaviors of its neighbors [54]. It includes the linear threshold model [88] and the independent cascade model, which explain dissemination from the perspectives of threshold and probability, respectively, and are widely applied in various fields, such as influential node detection [89,90], link prediction [11], and behavioral propagation [91].

The Linear Threshold Model: First proposed by Granovetter [26], it is a mathematical, receiver-centered model that describes binary decision-making events [54,88]. Each agent has two states: active and inactive. Each individual has a threshold

w \in [0, 1]

, which depends on age, education, background, etc. The basic idea of the linear threshold model is as follows: a subset of nodes is randomly selected as the seed set

S \subseteq V

, which are in the active state on the social network

G (V, E, W)

, where

V

represents the nodes set,

E

represents the edges set, and

W

represents the edge weights. Each node

i \in V

has a threshold value

T_{i}

, and for each edge

e (i, j) \in E

,

j \in N (i)

denotes a neighbor of node

i

. The sum of the weights of a node’s neighbors must be less than or equal to 1, as shown in Equation (7).

\sum_{j \in N (i)} W_{i j} \leq 1

(7)

where

W_{i j} \in W

. Initially, there is a set of activated nodes, and each node has a threshold value. At time

t

, an inactive node

i

becomes activated if the total weight of its active neighbors in

N (i)

satisfies the following condition, as shown in Equation (8), where the sum is taken over the active neighbors only.

\sum_{j \in N (i)} W_{i j} \geq T_{i}

(8)

Finally, the information dissemination process ends when the number of activated individuals reaches a stable state. Table 1 shows the meaning of the symbols.
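The procedure above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function name `linear_threshold`, the nested-dictionary layout `weights[i][j]` for the weight of neighbor j on node i, and the toy network are assumptions. The sum in the activation test runs over active neighbors only.

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the linear threshold model until no new node is activated.

    weights[i][j] is the weight W_ij of neighbor j on node i, and
    thresholds[i] is the threshold T_i of node i.
    """
    active = set(seeds)
    while True:
        newly_active = {
            i
            for i, neighbors in weights.items()
            if i not in active
            and sum(w for j, w in neighbors.items() if j in active)
            >= thresholds[i]
        }
        if not newly_active:  # stable state reached
            return active
        active |= newly_active


# Hypothetical toy network; each node's weights sum to at most 1 (Equation (7)).
weights = {
    "a": {},
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.4},
}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
final_active = linear_threshold(weights, thresholds, seeds={"a"})
```

In this toy example, node b is activated by the seed a, node c is then activated by a and b jointly, and node d stays inactive because the weight of its only neighbor is below its threshold.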

Tian et al. [91] introduce a multi-layer network model with edge weight; the research utilizes two threshold models with trapezoidal and triangular probability functions to study the effect of behavioral preferences on behavioral propagation. The correctness of the theoretical analysis is verified.

The Independent Cascade Model: Inspired by the theory of interacting particle systems [54], the independent cascade model was first proposed in [92] to study the marketing problem. It is a probabilistic model [10] focusing on the sender and is widely applied in information dissemination [93] and identification of rumor sources [94]. Each node has two states: activated and inactivated.

The algorithm is described as follows. First, initialize the active node set

V

and the inactive node set

U

. Second, at time

t

, each newly activated node v tries to activate each of its inactive neighboring nodes u with probability

p (u, v)

; if several activated nodes share the same inactive neighbor, they attempt to activate it independently, in an arbitrary order. Each activated node has only one chance to activate a given neighbor, regardless of whether it succeeds. If node

u

is activated successfully, it turns to the active state at time

t + 1

and continues to try to activate its own inactive neighboring nodes; otherwise, the state of node u does not change at time

t + 1

. Finally, the above process is repeated iteratively. As with the threshold model, the dissemination process ends when no active node can activate any further neighbor.

p_{v} (t) = 1 - \prod_{u \in N (v)} (1 - p (u, v))

(9)

where

v \in V

and

u \in U

. Table 2 shows the meaning of the symbols.
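The independent cascade procedure and Equation (9) can be sketched as follows. The function names, the adjacency-list layout `graph[v]`, and the toy graph are assumptions made for illustration; `p(u, v)` follows the convention above, i.e., the probability that the newly activated node v activates its neighbor u.

```python
import random


def independent_cascade(graph, p, seeds, seed=0):
    """Simulate one run of the independent cascade model.

    graph[v] lists the neighbors that v may activate, and p(u, v) is the
    probability that a newly activated v activates its neighbor u.
    """
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in graph.get(v, []):
                # Each (v, u) pair is tried exactly once.
                if u not in active and rng.random() < p(u, v):
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active


def joint_activation_probability(probabilities):
    """Equation (9): chance that at least one independent attempt succeeds."""
    failure = 1.0
    for prob in probabilities:
        failure *= 1.0 - prob
    return 1.0 - failure


# Hypothetical toy graph.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
always = independent_cascade(graph, lambda u, v: 1.0, seeds=["a"])
never = independent_cascade(graph, lambda u, v: 0.0, seeds=["a"])
```

With probability 1 the cascade reaches every node reachable from the seed, and with probability 0 it never leaves the seed set; intermediate probabilities give random outcomes that are usually averaged over many runs.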

Qiu et al. [95] propose the

B H I C M

model considering a dynamic relationship strategy between the propagation probability and the number of hops, which effectively avoids dealing with the neighbors of the seed nodes; the proposed method achieves promising results. Considering user and topic attributes, Chen et al. [96] introduce a hot topic diffusion approach based on the independent cascade (

I C

) model and trending search lists to predict the diffusion trend of a hot topic; the proposed model shows a significant reduction in the error rate compared to the other four models. Sharma et al. [97] find differences between fake and real content regarding dissemination dynamics and user behavior and propose a mixture of independent cascade models to facilitate network interventions for fake news mitigation.

Discussion: Network-based dynamics models consider the heterogeneity of and interactions between users and simulate the dynamic changes in information dissemination more accurately, effectively predicting the path and speed of information diffusion. Currently, researchers mainly analyze the dynamic process on scale-free networks, small-world networks, and real social networks. The independent cascade and threshold models can effectively account for temporal and spatial factors and explain, at the micro level, how individuals make decisions and the related mechanisms. However, scale-free and small-world networks cannot fully reflect real social networks. Meanwhile, collecting real social network datasets is relatively difficult due to security and privacy concerns. In addition, compared to the epidemic models, these models are more complex, and solving them requires more computation. The independent cascade and threshold models only consider the interaction among users and ignore the impact of information content attributes on the information dissemination process. Furthermore, in the independent cascade model, each active node has only one chance to affect a neighboring node, which is not in line with the actual situation.

2.2.3. Competitive Model

Competitive information dissemination refers to the dissemination of multiple types of information on social networks. It investigates the effects of interactions among different information (e.g., rumors and anti-rumors) on information dissemination, thus revealing the laws and nature of disseminating pluralistic information. Researchers have paid increasing attention to pluralistic competitive information dissemination in recent years.

Ding et al. [17] develop an individual-level mathematical model with a forgetting mechanism from the competition perspective, which analyzes the impact of the forgetting and refutation mechanisms on rumor propagation. Zhang et al. [18] study the effects of individual decision and reputation factors on information dissemination with and without the principle of indirect reciprocity; the research finds that user interaction rules can effectively reduce the influence of malicious users. In addition, Wang et al. [98] use the

S I C

model to capture the short-term competition mechanism between two messages and analyze the effects of network topology and initial conditions on the survival period of the messages. Li et al. [99] explore the game relationship between rumor and anti-rumor with a model based on sparse representation and tensor completion. The research improves user behavior prediction accuracy and reflects the game relationship between rumor and anti-rumor. In addition, Liu et al. [100] combine the classical

S I S

model and the Markov method to analyze the effect of the homogenization trend. The experiments show that a greater homogenization tendency is conducive to forming “echo chambers.” Jiang et al. [101] introduce a

R S D

model with an optimal control strategy to divide the rumor-spreading process into two phases, focusing on the interactions between rumor and anti-rumor.

Subsequently, Yilmaz et al. [102] propose a reinforcement learning model based on the multi-agent deep deterministic policy gradient, which combines game theory and agent-based methods to investigate the dynamic process of multi-dimensional information dissemination, achieving promising results. Considering the abilities of self-identification and self-influence and the influence of event heat, Chen et al. [103] introduce a heat-influenced evolutionary game-theoretic model to effectively study the impact of the competitive relationship between rumor and anti-rumor. Moreover, Yin et al. [104] investigate the effect of factors on rumor spreading with the

S O - S / E I R

and

C - S / E I D R

models, considering differences in communicators’ confidence, inter-user interactions, cognition, and knowledge level. Mou et al. [105] use evolutionary game theory to study the coexistence and antagonistic relationship between rumors and anti-rumors, reflecting the cooperation and competition relationships among multiple pieces of information.

Discussion: Competitive models can effectively reflect the interactions among multiple pieces of information, such as rumor and anti-rumor, which helps to reveal the complex interactions in information dissemination and provides an in-depth understanding. However, competitive models increase the complexity of analysis and require greater computational resources. They are also based on assumptions of rational behavior, whereas people’s behaviors are more complex and diverse in real situations. At the same time, solving the models requires more complex computational methods, which are not applicable to large-scale networks. Table 3 shows the categorization of explanatory models.

2.3. The Predictive Model

Predictive propagation models (micro-level) refer to methods that predict the probability of the next information disseminator and the popularity of information in the dissemination process on social networks. This section primarily describes these models from the perspectives of traditional methods and deep learning. Figure 7 shows a schematic process of information dissemination. For example, according to the observed historical information dissemination graph from time t1 to t7, we predict the next user who will forward information at time t8. Finally, we compare and analyze the predictive models.

2.3.1. Traditional Method

Machine learning is a significant branch of artificial intelligence in handling large-scale data. Its essence is feature-based learning, enabling automation and intelligent decision-making, predicting future trends, and identifying patterns. It involves designing appropriate algorithmic models based on the specific task, utilizing datasets as inputs to these models or algorithms, and iteratively refining and optimizing the algorithms to enhance model performance. Figure 8 shows that the traditional method consists of data collection and pre-processing, feature engineering, model selection and training, model evaluation and optimization, and information dissemination analysis.

Osho et al. [7] employ a random forest to select relevant features and then utilize Bayesian logistic regression to learn and predict user interactions on social networks. Foroozani et al. [9] propose a mathematical model with the nonlinear parabolic Fisher equation with Neumann boundary conditions based on anomalous diffusion characteristics, which aligns more with the actual situation. Zhu et al. [106] analyze user interactions on the Weibo platform by extracting user, topic, and social features as the primary features for predicting the path. The experiments show that decision tree methods perform best on the Weibo dataset; however, the back-propagation neural network performs best on the Twitter dataset. Singh et al. [107] employ a random walk to study the similarity measurement between users and content for predicting information dissemination. The proposed method outperforms the existing methods based on structure and content. Ruchjana et al. [108] utilize a time-series Markov chain to predict the tweet-retweet behavior of social media users, which mainly relies on the user’s previous retweet behavior. The feasibility of the proposed method is validated. Equation (10) shows the discrete-state Markov process.

\begin{array}{l} P \{X (t + s) = j \mid X (s) = i, X (α) = x (α), 0 \leq α \leq s\} \\ = P \{X (t + s) = j \mid X (s) = i\}, \quad \forall s, t \geq 0, \; i, j \in S \end{array}

(10)

where

X (t)

represents the probability of the social media user disseminating information at the time

t

,

\{X (t), t \geq 0\}

represents the form of the temporal Markov chain,

S

represents the discrete state space, s represents time, and

x (α)

represents the state at the time

α

.
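The Markov property in Equation (10) means that the next state depends only on the current state. The following minimal sketch illustrates this with a discrete-time simplification: it estimates transition probabilities from an observed state sequence and predicts the most likely next state. The state labels ("tweet", "retweet", "idle"), the function names, and the toy history are hypothetical and are not taken from [108].

```python
from collections import Counter, defaultdict


def estimate_transitions(sequence):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def predict_next(transitions, state):
    """Return the most probable next state given only the current state."""
    row = transitions[state]
    return max(row, key=row.get)


# Hypothetical activity history of one user.
history = ["tweet", "retweet", "retweet", "idle",
           "tweet", "retweet", "idle", "tweet"]
P = estimate_transitions(history)
next_state = predict_next(P, "retweet")
```

Each row of `P` sums to one, so it can be read as the conditional distribution on the right-hand side of Equation (10) for a fixed time step.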

Ramezani et al. [109] propose a probabilistic generative model that maps the interrelationships between chain edges and cascade processes with a coupling matrix decomposition to infer the underlying social network structure and information dissemination. Considering temporal information, Xu et al. [110] incorporate the time feature into the proximity matrix and utilize hierarchical clustering methods to classify multi-scale information, which improves the model’s performance. Liu et al. [111] use a Markov model to analyze the information dissemination process, which considers an adaptive network’s information state and network topology.

In addition to the above mainstream methods, researchers try to study information dissemination from other technical routes, including Poisson and Hawkes processes, Forest Fire, self-exciting processes, and Energy models.

Wang et al. [112] utilize an improved energy model to quantitatively analyze the impact of linkage rates in cross-platform settings. Experiments validate the effectiveness of the proposed approach. Han et al. [27] propose a physical model that uses thermal energy to measure the impact of rumors on the network and report several important findings.

Yu et al. [113] present a transformer-enhanced Hawkes process, which incorporates a hierarchical attention mechanism into the Hawkes self-exciting point process. Results demonstrate significant improvements over the state-of-the-art methods.

Based on epidemic models, Kong et al. [114] first combine epidemic models and self-exciting processes with mathematical components to further the popularity prediction, and the performance of the proposed model is further improved.

Drawing an analogy between information dissemination and wildfires in forests, Indu et al. [28] analyze the key factors contributing to the rapid spread of forest fires and use them as the main features. They employ a nature-inspired algorithm to examine the impact of rumor-affected nodes, and the experimental results show the validity of the proposed method and the selected features. Kumar et al. [115] study information propagation with a modified forest-fire model, which introduces a novel Burnt state for non-spreaders, achieving a desirable result.

Discussion: Traditional methods are probabilistic statistical models, which are constructed with prior knowledge and used to assess the probability of potential users disseminating information. Compared with deep learning, the modeling process of traditional methods, such as the Hawkes process and Markov models, is more interpretable, which helps to understand the inherent laws of information dissemination. Moreover, traditional approaches require fewer computational resources, which are easy to provide. Selecting and designing effective features for each field can improve the model’s generalization. Traditional methods usually achieve better performance on small datasets, effectively compensating for the problems caused by the lack of large-scale datasets. However, traditional methods cannot effectively capture deeper interrelationships within or between information cascades. Feature selection relies on domain knowledge, which is time-consuming, and the selected features do not necessarily represent the data completely. The performance is limited when dealing with complex patterns, large-scale data, and high-dimensional features. In a complex feature space, traditional methods risk over-fitting or under-fitting. The prediction accuracy of the Hawkes process is low because it relies on strong assumptions.

2.3.2. Deep Learning

In recent years, data and computing resources have further improved, leading to the thriving growth of data-driven methods, which have gained considerable attention from researchers. They have been applied in many fields, such as information cascade prediction [3], natural language processing, image processing [116], and drug discovery [117]. This section primarily focuses on applying deep learning to information dissemination. DeepCas [29] is the first graph representation learning-based method for modeling the information dissemination process. Figure 9 shows a general framework of information dissemination based on deep learning. We can observe that deep learning methods (LSTM, GRU, RNN, attention, GAT, etc.) are used to effectively learn abstract features from social graphs, communication graphs, and information for information cascade prediction. Table 4 shows the meaning of the symbols used in this section.

(1) The information dissemination graph is a graph structure

G (V, E)

, where

e_{uv} = 1

indicates that user

u

follows user

v

, and

u, v \in V

.

(2) The information cascade is denoted as

c^{i} = \{(u_{i, 1}, t_{i, 1}), (u_{i, 2}, t_{i, 2}), \ldots, (u_{i, N}, t_{i, N})\}

; all the information dissemination sequences constitute the information cascade set

C^{i} = \{c_{1}^{i}, c_{2}^{i}, \ldots, c_{M - 1}^{i}, c_{M}^{i}\}

.

(3) The model

Ω

is trained by the observed historical cascades set

C^{i} = \{c_{1}^{i}, c_{2}^{i}, \ldots, c_{M}^{i}\}

to predict the probability

p (u_{i, N + 1})

of the next forwarding user, as shown in Equation (11).

p (u_{i, N + 1}) = Ω : \{C^{i} \Rightarrow u_{i, N + 1}\}

(11)
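To make the notation concrete, the following sketch represents cascades as lists of (user, time) pairs and uses a simple frequency-based stand-in for the model Ω. It is only an illustrative baseline under stated assumptions, not a deep learning model and not a method from the surveyed literature; the user IDs, timestamps, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical observed cascades: each is a list of (user, timestamp) pairs.
cascades = [
    [("u1", 0.0), ("u2", 1.5), ("u3", 2.0)],
    [("u1", 0.0), ("u2", 0.7), ("u4", 3.1)],
    [("u5", 0.0), ("u2", 2.2), ("u3", 2.9)],
]


def fit_next_user_model(cascades):
    """Count how often each user forwards right after another user."""
    counts = defaultdict(Counter)
    for cascade in cascades:
        users = [u for u, _ in sorted(cascade, key=lambda pair: pair[1])]
        for prev, nxt in zip(users, users[1:]):
            counts[prev][nxt] += 1
    return counts


def next_user_probabilities(counts, observed):
    """Estimate p(u_{i,N+1}) from the last user of a partially observed cascade."""
    last_user = max(observed, key=lambda pair: pair[1])[0]
    seen = {u for u, _ in observed}
    candidates = {u: c for u, c in counts[last_user].items() if u not in seen}
    total = sum(candidates.values())
    return {u: c / total for u, c in candidates.items()} if total else {}


model = fit_next_user_model(cascades)
probs = next_user_probabilities(model, [("u6", 0.0), ("u2", 1.0)])
```

The deep learning methods reviewed in this section replace the counting step with learned representations of users, cascades, and graph structure, but the input and output follow the same pattern.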

Zhou et al. [2] apply user representation knowledge to all layers of the hierarchical framework and investigate the impact of multi-scale modeling with the user representation learning framework. Zhao et al. [10] employ a multi-layer nonlinear auto-encoder collaborative embedding framework to learn node and cascade collaborative features without the underlying diffusion mechanism and network topology to predict nodes and order. Wang et al. [19] utilize dynamic encoders to infer user interests from historical data and employ a dual attention mechanism to capture the likelihood from both the information cascade and the original user, ranking the probabilities of the potential users. Wang et al. [118] use representation learning and attention modules to aggregate rich historical information and diverse latent factors, which effectively avoids mapping all users into a single vector and neglecting important user attributes, and achieve promising results. Molaei et al. [119] combine a meta-path and representation learning approach to predict information dissemination.

Similarly, Ducci et al. [120] present a tree-structured long short-term memory (

L S T M

) network to learn rich features from information cascades. Considering social factors and personality traits, Yan et al. [121] propose a new multi-task framework, which designs a universal

G N N

gating component to simulate the impact of personality traits, achieving improved predictive performance. Wang et al. [122] introduce an end-to-end framework that utilizes graph convolution networks, dynamic routing methods, and

L S T M

to aggregate node feature representations and network structure for information cascade prediction, achieving promising results. Wang et al. [123] propose a topic-aware attention network model to study specific topics’ diffusion patterns and dependencies using deep learning and attention mechanisms. Jin et al. [124] learn dynamic user representations and use a dual-channel hypergraph to capture the relationships of information cross-diffusion, investigating the influence of external factors and users’ dynamic interests and validating the effectiveness and practicality of the proposed framework. Taking the user influence and community redundancy features into account, Zhong et al. [125] utilize a neural network framework with two-layer attention to predict the incremental size of the information cascade, and the model demonstrates superiority and effectiveness. Equation (12) shows the loss function.

L = \frac{1}{M} \sum_{i = 1}^{M} {({\hat{y}}_{i} - y_{i})}^{2} + λ_{2} \sum_{θ \in P} \| θ \|_{2}

(12)
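Equation (12) combines a mean squared error term with an L2 penalty on the parameter groups in P. The sketch below evaluates it in plain Python as a worked example; the function name `regularized_mse`, the representation of each θ as a list of floats, and the toy numbers are assumptions made for illustration.

```python
import math


def regularized_mse(y_pred, y_true, params, lam):
    """Mean squared error plus lam times the sum of L2 norms of the parameters."""
    m = len(y_true)
    mse = sum((p - y) ** 2 for p, y in zip(y_pred, y_true)) / m
    l2 = sum(math.sqrt(sum(w * w for w in theta)) for theta in params)
    return mse + lam * l2


# Toy numbers: two predictions and one parameter vector with norm 5.
loss = regularized_mse([2.0, 4.0], [1.0, 4.0], params=[[3.0, 4.0]], lam=0.1)
```

Here the squared errors are 1 and 0, so the first term is 0.5, and the penalty is 0.1 times the norm of (3, 4).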

Liu et al. [126] propose a prediction model based on temporal-spatial attention and graph convolution networks, which effectively integrates time, user influence, and behavior. Considering network structure, content semantics, and temporal information, Jin et al. [127] develop community influence graphs and use multi-modal information from videos to predict the trajectories of community-level information dissemination. Equations (13) and (14) show the loss functions.

L_{B P R} = \sum_{(i, j^{+}, j^{-}, t)} - \ln (s i g m o i d ({\hat{y}}_{i j^{+}} - {\hat{y}}_{i j^{-}}))

(13)

L_{C E}^{i} = C r o s s E n t r o p y (P (S_{N + 1} | v_{i}, P_{i}), y_{N + 1})

(14)

Equation (15) shows the overall optimization objective.

L = L_{B P R} + λ_{1} \sum_{i \in V} L_{C E}^{i} + λ_{2} \| Θ \|_{2}

(15)

{\hat{y}}_{i j}^{t} = M L P ({\tilde{v}}_{i}^{t} ⊙ M L P ({\tilde{s}}_{j}^{t}))

(16)
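The objective in Equations (13)–(15) can be evaluated term by term. The sketch below is a plain-Python illustration under stated assumptions: score pairs stand for the predicted scores of a positive and a negative sample, each cross-entropy term is computed from a probability vector and a target index, and Θ is flattened into a single list of floats. All names and numbers are hypothetical.

```python
import math


def bpr_loss(score_pairs):
    """Equation (13): sum of -ln(sigmoid(y_pos - y_neg)) over score pairs."""
    return sum(
        -math.log(1.0 / (1.0 + math.exp(-(pos - neg))))
        for pos, neg in score_pairs
    )


def cross_entropy(probabilities, target_index):
    """Equation (14) for one sample: negative log-probability of the target."""
    return -math.log(probabilities[target_index])


def total_loss(score_pairs, ce_terms, params, lam1, lam2):
    """Equation (15): BPR loss + lam1 * sum of CE terms + lam2 * ||params||_2."""
    reg = math.sqrt(sum(w * w for w in params))
    return bpr_loss(score_pairs) + lam1 * sum(ce_terms) + lam2 * reg


ce = cross_entropy([0.25, 0.75], target_index=1)
objective = total_loss(
    score_pairs=[(0.0, 0.0)], ce_terms=[ce], params=[3.0, 4.0],
    lam1=1.0, lam2=0.1,
)
```

A positive sample that is scored higher than the negative one drives the BPR term toward zero, whereas equal scores contribute ln 2 per pair.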

Since information dissemination is similar to the process of image restoration, Xiao et al. [128] use a diffusion network pixelation algorithm to transform the user relationship network of topic dissemination into an image pixel matrix, which embeds the user relationship network into a low-dimensional space and builds a mutual influence transfer matrix for information dissemination. The study uses representation learning and game theory to predict users’ behaviors, efficiently reflecting the competitive relationship between rumor and anti-rumor. Furthermore, researchers combine deep learning and dynamics to model information dissemination. Wang et al. [129] introduce a CCasGNN model integrating the individual profile, structural features, and sequence features, effectively addressing the limitation of spectral or spatial methods. Fatemi et al. [130] utilize

G N N

to identify influential nodes as initial nodes in the Susceptible-Infected-Recovered model to study information dissemination processes, validating the method’s effectiveness. Zang et al. [131] capture the nonlinear dynamics and dependencies between systems in a data-driven manner, transforming the discrete-time into continuous-time prediction with

G N N

and differential equation-based temporal dynamics. Experiments show the excellent performance and parameter efficiency of the approach. Murphy et al. [132] and La et al. [133] learn effective local mechanisms and comprehensive dynamics from sequential data with deep learning and simulate dynamics on complex networks. Kushwaha et al. [134] propose a customized weighted word embedding method combined with a long short-term memory (

L S T M

) to predict the likelihood of information diffusion, significantly improving overall accuracy. Yang et al. [30] propose a novel full-scale diffusion prediction model based on reinforcement learning (

R L

), which combines reinforcement learning and deep learning to predict information diffusion from both microscopic and macroscopic perspectives.

To address the lack of labeled data, Xu et al. [31] propose a contrastive self-supervised approach to learn the knowledge of graphs for downstream prediction tasks, which can effectively capture data changes and the dynamic characteristics of the cascades; the model outperforms supervised and semi-supervised methods. Considering global dependencies between users and cascades, Sun et al. [135] develop Memory-enhanced Sequential Hypergraph Attention Networks for information diffusion, improving the model’s performance. Wang et al. [136] propose Cascade-Enhanced Graph Convolution Networks and design a cascade-specific aggregator that merges user, time, and cascade context, effectively exploiting collaborative patterns from other cascades to enhance the prediction of future infections.

Furthermore, the prediction of information cascade size is also an important research task. By obtaining rich features, Chen et al. [137] employ graph neural networks to learn latent representations from a multi-scale perspective to predict the information cascade size. Zhou et al. [138] employ hierarchical variational models to explore uncertainty at the sub-graph and cascade levels, using variational inference to learn the posterior probability of the cascade distribution, thereby improving the accuracy. Wu et al. [139] propose a novel framework based on user preference for popularity prediction, which considers preference topic generation, preference shift modeling, and social influence activation. Zhou et al. [140] present a general decoupling prediction solution for long-tailed distributions to predict the size of information cascades, which mitigates the long-tailed cascade prediction problem.

Discussion: Deep learning is a representation learning method that obtains highly abstract features (such as content features, time features, structural features, and user behavior features) by learning from raw information cascade datasets without human intervention. These features can be used to predict user behavior and popularity in information dissemination. Deep learning frameworks can handle multi-modal data, such as text, images, and audio, as well as complex network structures, thus capturing the intrinsic interactions between multiple modes of information dissemination. In addition, researchers use the network structure of deep learning to simulate the information dissemination process. Compared with explanatory models, the modeling process of deep learning does not require prerequisite assumptions. However, the performance of predictive models depends on the size and quality of the data. Deep learning is an end-to-end model, often called a “black box,” whose training and learning process is difficult to interpret. Insufficient data volume or excessive model complexity can easily lead to over-fitting, which reduces the model’s generalization. The model training process requires a lot of computational resources and time, which is an unavoidable problem. Additionally, the performance of the model is also affected by the hyper-parameter settings, and tuning them is a challenging task, as shown in Table 5.

3. Datasets, Evaluation Metrics, and Tools

After introducing models or methods, in this section, we present publicly available datasets, evaluation metrics, and interface tools for researchers, which are related to the research for information cascade prediction.

3.1. Datasets

In this section, this paper introduces publicly available datasets for researchers focusing more on algorithm design and model construction. Compared with Zhou [33], we summarize a much broader range of datasets involving academic citations and social networks, such as the Stanford Network Analysis Platform, Aminer, and Douban. Table 6 shows a review of frequently used datasets. These datasets, which come from mainstream platforms, have been widely used by researchers to validate model performance, so their validity, reliability, and quality are recognized.

Stanford Network Analysis Platform (SNAP) [141]: It is a general-purpose network analysis and graph mining library. Its collection of more than 50 large network datasets, ranging from tens of thousands to tens of millions of nodes and edges, includes social networks, web graphs, road networks, internet networks, citation networks, collaboration networks, and communication networks.

Aminer Dataset [142]: The datasets released publicly by ArnetMiner mainly focus on scientific paper citation networks, which can be widely applied in cascade prediction research [35]. The citation data come from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources; each paper contains the abstract, authors, year, title, etc.

Twitter Dataset [143]: The datasets contain the links, the participating users, and the time from Twitter in October 2010, and are publicly available on the website (https://www.isi.edu/~lerman/downloads/twitter/twitter2010.html, (accessed on 26 July 2023)). Each URL shared among users is viewed as an information cascade.

Digg Dataset [144]: The dataset comes from Digg.com and covers news stories that appeared on the Digg homepage during one month in 2009. Digg users rate these news stories through votes, retweets, or comments.

Douban Dataset [145]: The dataset comes from Douban, a web service where users can share content about books. In this dataset, books are treated as information items, and a user is considered activated upon reading a book.

Memetracker Dataset [146]: The dataset contains blogs and online news articles collected from 1 August 2008 to 30 April 2009, and each website or blog is treated as a user. The dataset has been applied in social network analysis, information dissemination, and recommender systems.

Weibo Dataset [32]: The publicly available dataset (https://github.com/CaoQi92/DeepHawkes/tree/master, (accessed on 26 July 2023)) contains micro-blogs posted on 1 June 2016. It includes only messages with more than ten retweets, together with the retweet records of these messages within 24 h. In addition to user IDs, the dataset provides the retweeting path, which traces each message from the original publisher to the users who retweeted it.
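The selection rule described for the Weibo dataset can be sketched as follows. This is a minimal illustration, not the preprocessing code of the dataset’s authors: the record layout (a mapping from message ID to `(user_id, seconds_since_post)` pairs), the function name `filter_cascades`, and the choice to apply the ten-retweet threshold to retweets inside the 24 h window are all assumptions made for the example.

```python
WINDOW_SECONDS = 24 * 60 * 60  # 24 h observation window
MIN_RETWEETS = 10              # keep cascades with strictly more retweets


def filter_cascades(cascades, window=WINDOW_SECONDS, min_retweets=MIN_RETWEETS):
    """Keep cascades with more than `min_retweets` retweets inside the window."""
    kept = {}
    for message_id, retweets in cascades.items():
        # Retain only retweets observed within the window, ordered by time.
        in_window = sorted(
            ((user, t) for user, t in retweets if 0 <= t < window),
            key=lambda record: record[1],
        )
        if len(in_window) > min_retweets:
            kept[message_id] = in_window
    return kept


# Toy data (hypothetical): user IDs and offsets in seconds since the original post.
toy = {
    "m1": [(f"u{i}", i * 600) for i in range(1, 13)],     # 12 retweets within 2 h
    "m2": [(f"u{i}", i * 600) for i in range(1, 6)],      # only 5 retweets
    "m3": [(f"u{i}", i * 10_000) for i in range(1, 13)],  # 12 retweets, 8 within 24 h
}
kept = filter_cascades(toy)
print(sorted(kept))  # ['m1']
```

Only `m1` survives: `m2` has too few retweets, and `m3` has only eight retweets inside the window.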

3.2. Evaluation Metrics

This section presents common evaluation metrics for information cascade prediction, which are used to measure model performance. The following is an overview of the most commonly used metrics.

(1) R-Squared [3]: It is a commonly used statistical metric for assessing how well a regression model fits the observed data. It ranges from 0 to 1, where a value closer to 1 indicates a better fit [15]. However, since it is derived from the goodness of fit of a linear model, it may not accurately assess the explanatory power of nonlinear models, as shown in Equation (17).

$$R\text{-}squared = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{17}$$

where $SST$ is the sum of the squares of the differences between the original data and the mean, $SSR$ is the sum of the squares of the differences between the predicted data and the mean, and $SSE$ is the sum of the squares of the errors between the fitted data and the corresponding original data points. Additionally, $SST = SSE + SSR$.
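As a minimal sketch of Equation (17), the function below computes the $1 - SSE/SST$ form in plain Python. The function name `r_squared` and the sample values are illustrative assumptions. The two forms in Equation (17) coincide only when $SST = SSE + SSR$ holds, as it does for an ordinary least-squares fit with an intercept; for other models the $1 - SSE/SST$ form can fall below 0 when the predictions fit worse than the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination in the form 1 - SSE/SST."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / n
    # SST: squared deviations of the observations from their mean.
    sst = sum((y - mean) ** 2 for y in y_true)
    # SSE: squared errors between predictions and observations.
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if sst == 0:
        raise ValueError("SST is zero: all observed values are identical")
    return 1.0 - sse / sst


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
score = r_squared(y_true, y_pred)
print(round(score, 4))  # 0.98
```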

(2) MSLE (Mean Square Logarithmic Error) [3]: It is a widely used evaluation metric for regression models. Because the errors are computed on a logarithmic scale, it measures relative rather than absolute differences, which amplifies the errors on smaller values and directs the model’s focus toward them. It is suitable for data with an exponential growth trend. However, it is not applicable to negative or zero values, and it may focus too much on minor errors, as shown in Equation (18).

$$MSLE = \frac{1}{N} \sum_{i=1}^{N} \left( \log_{2} \hat{y}_{i} - \log_{2} y_{i} \right)^{2} \tag{18}$$

where $y_{i}$ represents the actual value, $\hat{y}_{i}$ represents the predicted value, and $N$ represents the total number of samples.
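Equation (18) can be sketched in a few lines of Python. The function name `msle_log2` and the sample cascade sizes are assumptions made for illustration. The sketch follows the equation literally and therefore rejects non-positive values; implementations that add 1 before taking the logarithm to admit zero counts are a different variant from the one stated here.

```python
import math


def msle_log2(y_true, y_pred):
    """Mean squared error between base-2 logarithms of predictions and targets."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("log2 requires strictly positive values")
    squared = [(math.log2(p) - math.log2(y)) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# Hypothetical cascade sizes: actual versus predicted.
y_true = [2, 8, 32]
y_pred = [4, 8, 8]
score = msle_log2(y_true, y_pred)
print(round(score, 4))  # 1.6667
```

The log differences are 1, 0, and -2, so the metric equals (1 + 0 + 4) / 3.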

(3) MAE (Mean Absolute Error) [3]: It is a commonly used evaluation metric for regression models that measures the average absolute difference between the predicted values and the actual values. It is easy to interpret and less sensitive to outliers than squared-error metrics. However, because it does not square the errors, it does not penalize large errors more heavily and may not capture the distribution of the errors, as shown in Equation (19).

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| \log_{2} \hat{y}_{i} - \log_{2} y_{i} \right| \tag{19}$$

where $y_{i}$ represents the actual value, $\hat{y}_{i}$ represents the predicted value, and $N$ represents the total number of samples.
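Equation (19) applies the absolute difference to log2-transformed values, whereas the accompanying description speaks of the plain absolute difference. The sketch below supports both readings through a hypothetical `log_scale` flag; the function name and the sample values are assumptions made for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred, log_scale=False):
    """MAE; with log_scale=True the errors are taken between log2 values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if log_scale:
        if any(v <= 0 for v in list(y_true) + list(y_pred)):
            raise ValueError("log2 requires strictly positive values")
        errors = [abs(math.log2(p) - math.log2(y)) for y, p in zip(y_true, y_pred)]
    else:
        errors = [abs(p - y) for y, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


y_true = [8, 16, 64]
y_pred = [4, 16, 128]
plain = mean_absolute_error(y_true, y_pred)
logged = mean_absolute_error(y_true, y_pred, log_scale=True)
print(round(plain, 4), round(logged, 4))  # 22.6667 0.6667
```

On the raw scale the largest cascade dominates the score, while on the log scale halving and doubling count equally.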

(4) MSE (Mean Square Error) [125]: It is the average of the squared differences between the predicted values and the actual values in regression analysis and is used to measure model performance. It penalizes large errors more severely. However, squaring the errors may amplify the effect of outliers, as shown in Equation (20).

$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_{i} - y_{i} \right)^{2} \tag{20}$$

where $y_{i}$ indicates the actual value, $\hat{y}_{i}$ represents the predicted value, and $N$ represents the total number of samples.
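A short sketch of Equation (20) makes the outlier effect concrete. The function name `mean_squared_error` and the two hypothetical prediction vectors are assumptions for illustration; they differ in a single value.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between predictions and targets."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 20.0, 30.0]
close = [11.0, 19.0, 31.0]    # errors: 1, -1, 1
outlier = [11.0, 19.0, 40.0]  # errors: 1, -1, 10

mse_close = mean_squared_error(y_true, close)
mse_outlier = mean_squared_error(y_true, outlier)
print(mse_close, mse_outlier)  # 1.0 34.0
```

Changing one error from 1 to 10 raises the MSE from 1 to 34, because that single term contributes 100 to the sum.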

(5) RMSPE (Root Mean Square Percentage Error) [125]: It evaluates model performance on regression problems and is particularly useful for data with different scales or units, because each error is expressed relative to the actual value. However, it is not well suited to actual values that are small in absolute terms, and it is undefined when an actual value is zero, as shown in Equation (21).

$$RMSPE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right)^{2}} \tag{21}$$

where $y_{i}$ indicates the actual value, $\hat{y}_{i}$ represents the predicted value, and $N$ denotes the total number of samples.
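The following sketch implements Equation (21) and shows why the metric is independent of the measurement scale. The function name `rmspe` and the sample values are illustrative assumptions; a zero actual value is rejected explicitly because the relative error is undefined in that case.

```python
import math


def rmspe(y_true, y_pred):
    """Root mean square of the errors relative to the actual values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        raise ValueError("RMSPE is undefined when an actual value is zero")
    relative = [((p - y) / y) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(relative) / len(relative))


y_true = [100.0, 200.0, 50.0]
y_pred = [110.0, 180.0, 50.0]
score = rmspe(y_true, y_pred)
print(round(score, 4))  # 0.0816

# Dividing every value by 100 leaves the score unchanged (up to rounding).
rescaled = rmspe([1.0, 2.0, 0.5], [1.1, 1.8, 0.5])
```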

(6) Hits@k [126]: It is an indicator used to assess the performance of search or recommendation systems, which measures the probability of successfully identifying relevant results within the top $k$ results. A prediction counts as a hit if the top $k$ results contain the actual label, and a higher value indicates better performance. It is simple and intuitive for classification problems, as it measures the percentage of correct predictions made by the model, and it is suitable for ranking potential users. However, it does not consider the magnitude of the error and may not reflect the detailed performance of the model, as shown in Equation (22).

$$\mathrm{Hits}@k = \frac{1}{|S|} \sum_{i=1}^{|S|} f\left( rank_{i} \leq k \right) \tag{22}$$

where $S$ represents a set of triplets, $|S|$ represents the number of triplets, $rank_{i}$ represents the predicted ranking of the $i$-th triplet, $k$ is the cutoff, and $f(\cdot)$ is the indicator function.
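Equation (22) reduces to counting how many true items are ranked at or above the cutoff. A minimal sketch, assuming 1-based ranks and a hypothetical function name `hits_at_k`:

```python
def hits_at_k(ranks, k):
    """Fraction of samples whose true item is ranked within the top k (1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the true next user in five test cascades.
ranks = [1, 3, 12, 5, 2]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 3), hits_at_k(ranks, 10))  # 0.2 0.6 0.8
```

The score can only grow as the cutoff increases, which is why results are usually reported for several values of the cutoff.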

(7) MAP@k [126]: It measures the mean average precision within the top $k$ retrieval results, where $k$ is the ranking cutoff, i.e., the number of retrieval results considered. For each query, the precision at the position of each relevant document is accumulated and averaged, and the result is then averaged over all queries. The value ranges from 0 to 1; higher values indicate higher average precision within the top $k$ results. It is suitable for classification tasks and integrates both the ranking and the relevance of retrieval results. However, its calculation is relatively complex.
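No formula is given for this metric, so the sketch below uses one common definition as an assumption: the average precision of a query is the sum of the precision values at the positions of relevant items within the cutoff, divided by the smaller of the cutoff and the number of relevant items. Other normalizations exist. The function names and toy data are hypothetical, and each ranked list is assumed to contain no duplicates.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k for one query, normalized by min(k, number of relevant items)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / position  # precision at this position
    return score / min(k, len(relevant))


def map_at_k(ranked_lists, relevant_sets, k):
    """Mean of AP@k over all queries."""
    if not ranked_lists or len(ranked_lists) != len(relevant_sets):
        raise ValueError("inputs must be non-empty and of equal length")
    scores = [
        average_precision_at_k(ranked, relevant, k)
        for ranked, relevant in zip(ranked_lists, relevant_sets)
    ]
    return sum(scores) / len(scores)


# Two hypothetical queries: predicted user rankings and the users actually activated.
ranked_lists = [["u1", "u2", "u3", "u4"], ["u5", "u6", "u7"]]
relevant_sets = [{"u1", "u3"}, {"u7"}]
result = map_at_k(ranked_lists, relevant_sets, k=3)
print(round(result, 4))  # 0.5833
```

The first query scores (1/1 + 2/3) / 2 = 5/6 and the second scores (1/3) / 1 = 1/3, giving a mean of 7/12.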

3.3. Interface Tools

AnyLogic [65] is a visual modeling tool for building dynamic models; it supports the modeling and simulation of discrete-event, system dynamics, multi-agent, and hybrid systems. It is simulation software based on recent methodologies for complex system design and introduces the UML language into model simulation. It is the only software supporting hybrid state machines, which enables effective descriptions of discrete and continuous behaviors.

EoN [147] is a Python toolkit that uses the SciPy library to implement approximately 20 ordinary differential equation models. It can simulate SIS and SIR models of information dissemination processes on social networks.
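To make the kind of simulation mentioned above concrete, the following is a minimal discrete-time SIR sketch on a small graph, written with the Python standard library only. It does not use EoN and does not reproduce its API or its algorithms; the function name `simulate_sir`, the parameters `beta` (per-contact transmission probability) and `gamma` (per-step recovery probability), and the ring network are all assumptions made for illustration.

```python
import random


def simulate_sir(adjacency, seeds, beta, gamma, rng, max_steps=1000):
    """Discrete-time SIR on a graph given as {node: [neighbours]}.

    Returns a list of (S, I, R) counts, one tuple per time step.
    """
    state = {node: "S" for node in adjacency}
    for node in seeds:
        state[node] = "I"

    def counts():
        values = list(state.values())
        return values.count("S"), values.count("I"), values.count("R")

    history = [counts()]
    for _ in range(max_steps):
        infected = [node for node, s in state.items() if s == "I"]
        if not infected:
            break
        # Each spreader tries to pass the message to each susceptible neighbour.
        newly_infected = {
            neighbour
            for node in infected
            for neighbour in adjacency[node]
            if state[neighbour] == "S" and rng.random() < beta
        }
        # Each current spreader stops spreading with probability gamma.
        newly_recovered = [node for node in infected if rng.random() < gamma]
        for node in newly_infected:
            state[node] = "I"
        for node in newly_recovered:
            state[node] = "R"
        history.append(counts())
    return history


# Toy network (hypothetical): a ring of 30 users, each linked to two neighbours.
n_nodes = 30
ring = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
history = simulate_sir(ring, seeds=[0], beta=0.4, gamma=0.2, rng=random.Random(42))
print(history[-1])  # final (S, I, R) counts
```

Passing a seeded `random.Random` instance keeps a run reproducible; averaging over many seeds would be needed before drawing conclusions from such a stochastic model.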

Data collection tools [35]: Interface tools for five mainstream domestic and international social platforms are provided for obtaining datasets, which is convenient for researchers.

4. Future Research Directions

Although modeling of the information dissemination process has been studied from multiple perspectives and levels in recent years and many achievements have been made, limitations remain. Mathematical models are limited by many assumptions, and the performance of data-driven models still falls short of users’ expectations. Therefore, the analysis of the information dissemination process remains a direction that deserves in-depth study and the attention of more researchers. In this section, we briefly introduce future research directions.

(1) Cross-platform and dynamic information dissemination. Current research focuses on information dissemination on a single social platform. However, people can access information through various media platforms, including social media, news websites, blogs, television, and radio, which widens the scope of information diffusion and accelerates the flow of information. At the same time, because media platforms differ in their operation and management mechanisms, audience behavior and interactive behavior vary greatly. Understanding how information spreads on different platforms can help us develop strategies to combat harmful information. Moreover, previous researchers have focused more on analyzing the information propagation process on a given static network. Because users register and log out on social platforms, the information dissemination graph changes dynamically over time; information can then spread quickly and cause severe consequences. Therefore, it is crucial to detect the first signs of an information outbreak promptly, which is more relevant to practical needs.

(2) Collecting large-scale multi-modal datasets. To improve the accuracy of information cascade prediction, researchers are constantly striving to create more sophisticated algorithms. However, one problem remains: the lack of effective benchmark datasets that cover multiple data modalities and are large enough for verifying information dissemination models. Existing information dissemination datasets simplify data attributes by preserving only nodes, time, and relationships, and therefore overlook the impact of data modality and attributes. There is thus an urgent need to construct large-scale, general-purpose datasets containing multiple modalities (e.g., audio, video, image, and text). Such datasets will provide researchers with a more comprehensive and realistic information dissemination environment, which will help to reveal the interactions between different modalities and how they collectively affect the patterns and effects of information dissemination. Through this effort, researchers will be able to analyze, predict, and intervene in the dynamics of information dissemination in multi-modal environments more accurately, promoting the in-depth development of the field.

(3) Combining Dynamical Theory and Deep Learning. In recent years, dynamical theory has been widely applied in fields such as physics, disease propagation, and information dissemination, enabling reliable and accurate prediction of system behavior and evolutionary trends. At the same time, deep learning leverages its powerful representation-learning capabilities to automatically learn and extract meaningful features from raw data. Since information dissemination is a dynamic process involving temporal variations and interactions among users on social networks, dynamical models can capture information dissemination patterns over time, while deep learning can extract valuable information from these variations. Dynamical models may fail to capture the complex patterns in information dissemination, whereas deep learning models are better able to deal with nonlinear relationships and high-dimensional data. By combining the strengths of the two techniques, we can increase the capacity and adaptability of the model so that it better reflects the information dissemination process. In addition, deeper network structures can be used to simulate complex information dissemination processes. Consequently, combining deep learning with dynamical theory to analyze information dissemination processes is a promising direction.

5. Model Application

Information dissemination is a common phenomenon with theoretical and practical significance. Studying it helps us understand the laws of information dissemination and provides a basis for formulating information governance strategies.

(1) Information Recommendation. Information dissemination is often observed in the recommendation field, for example in advertising and marketing strategy. By promoting product advertisements effectively, marketers can build a stronger brand image and deliver the core message of a product or service more powerfully. The audience can gain an in-depth understanding of the advantages of the product or service and then make more targeted purchasing decisions. At the same time, effective marketing strategies can be formulated. Therefore, studying information diffusion models is crucial for making informed decisions before advertising. In addition, information diffusion models have a wide range of applications, such as computer virus propagation, opinion propagation, disease propagation, and malware propagation [42,43,45]. They thus provide insights for a variety of fields and guide further research and practice.

(2) Network Reconstruction. In network science research, reconstructing network structures from historical data has become an important topic [148]. Information dissemination relies on complex underlying social networks. Owing to the security and privacy restrictions of platforms, it is generally difficult to obtain a complete social network structure. However, known partial information, models, and parameters can be used to reconstruct the network’s connectivity structure. In addition, we can design more robust network structures to resist malicious attacks; for example, by analyzing the network structure, hackers may develop a malicious virus that hinders the transmission of network flow, leading to communication line failures and congestion.

(3) Popularity Prediction. Popularity prediction has important practical significance in various fields. It can determine whether a piece of information will experience an “avalanche” effect and when it will gradually disappear, which helps us take measures in advance and better understand the trend of information dissemination. Furthermore, it allows the information dissemination process to be observed and analyzed from a macro perspective. For example, before malicious information reaches its peak, popularity prediction can support the necessary intervention and governance measures to mitigate its effects; it can also guide the development of more accurate communication strategies, thereby transmitting information to a broader target audience.

6. Conclusions

In this paper, we provide a comprehensive summary of the research hot spots in information dissemination models. First, we present the definition of the information dissemination process. Then, we introduce the methods used in information dissemination and analyze their advantages and disadvantages in detail. Moreover, we present publicly available datasets, evaluation metrics, and interface tools. Subsequently, we outline future research directions and discuss model applications. From the above research, we find significant differences in information dissemination mechanisms among social platforms, whereas current researchers mainly focus on single social platforms, such as Weibo and Twitter. Cross-platform studies help identify the patterns and evolutionary trends of information dissemination on different platforms and are a promising research direction. Meanwhile, it is difficult for mathematical models and traditional methods to form a universal analysis framework that can be applied in practical engineering. Owing to the end-to-end nature of deep learning and its ability to learn highly abstract features, current research methods are gradually shifting to deep learning. This review aims to help readers quickly sort out the technical paths and development trends of information dissemination and to inspire subsequent research.

Author Contributions

Conceptualization, Y.L. and L.S.; methodology, L.S.; formal analysis, P.Z.; investigation, Y.L.; resources, J.G.; writing—original draft preparation, Y.L. and L.S.; writing—review and editing, Y.L. and L.S.; supervision, J.G.; funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (No. 2022YFC3302100).

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Cheng, Y.; Huo, L.; Zhao, L. Stability analysis and optimal control of rumor spreading model under media coverage considering time delay and pulse vaccination. Chaos Solitons Fractals 2022, 157, 111931.
Zhou, H.; Xu, S.; Fu, Z.; De Melo, G.; Zhang, Y.; Kapadia, M. HID: Hierarchical multiscale representation learning for information diffusion. Int. Jt. Conf. Artif. Intell. 2020, 3357–3363.
Sun, X.; Zhou, J.; Liu, L.; Wei, W. Explicit time embedding based cascade attention network for information popularity prediction. Inf. Process. Manag. 2023, 60, 103278.
Van Der Linden, S. Misinformation: Susceptibility, spread, and interventions to immunize the public. Nat. Med. 2022, 28, 460–467.
Mian, A.; Khan, S. Coronavirus: The spread of misinformation. BMC Med. 2020, 18, 89.
Huo, L.; Chen, S.; Zhao, L. Dynamic analysis of the rumor propagation model with consideration of the wise man and social reinforcement. Phys. A Stat. Mech. Appl. 2021, 571, 125828.
Osho, A.; Goodman, C.; Amariucai, G. MIDMod-OSN: A microscopic-level information diffusion model for online social networks. In Computational Data and Social Networks, Proceedings of the 9th International Conference, CSoNet 2020, Dallas, TX, USA, 11–13 December 2020; Springer International Publishing: Berlin/Heidelberg, Germany, 2020.
Li, Z.; Du, X.; Zhao, Y.; Tu, Y.; Lev, B.; Gan, L. Lifecycle research of social media rumor refutation effectiveness based on machine learning and visualization technology. Inf. Process. Manag. 2022, 59, 103077.
Foroozani, A.; Ebrahimi, M. Nonlinear anomalous information diffusion model in social networks. Commun. Nonlinear Sci. Numer. Simul. 2021, 103, 106019.
Zhao, Y.; Yang, N.; Lin, T.; Yu, P.S. Deep Collaborative Embedding for information cascade prediction. Knowl. Based Syst. 2020, 193, 105502.
Singh, S.S.; Mishra, S.; Kumar, A.; Biswas, B. CLP-ID: Community-based link prediction using information diffusion. Inf. Sci. 2020, 514, 402–433.
Li, H.; Xia, C.; Wang, T.; Wen, S.; Chen, C.; Xiang, Y. Capturing Dynamics of Information Diffusion in SNS: A Survey of Methodology and Techniques. ACM Comput. Surv. 2021, 55, 1–51.
Ke, Y.; Zhu, L.; Wu, P.; Shi, L. Dynamics of a reaction-diffusion rumor propagation model with non-smooth control. Appl. Math. Comput. 2022, 435, 127478.
Xia, Y.; Jiang, H.; Yu, Z.; Yu, S.; Luo, X. Dynamic analysis and optimal control of a reaction-diffusion rumor propagation model in multi-lingual environments. J. Math. Anal. Appl. 2023, 521, 126967.
Yi, Y.; Zhang, Z.; Yang, L.T.; Gan, C.; Deng, X.; Yi, L. Reemergence Modeling of Intelligent Information Diffusion in Heterogeneous Social Networks: The Dynamics Perspective. IEEE Trans. Netw. Sci. Eng. 2020, 8, 828–840.
Kumar, S.; Saini, M.; Goel, M.; Aggarwal, N. Modeling Information Diffusion In Online Social Networks Using SEI Epidemic Model. Procedia Comput. Sci. 2020, 171, 672–678.
Ding, H.; Xie, L. Simulating rumor spreading and rebuttal strategy with rebuttal forgetting: An agent-based modeling approach. Phys. A Stat. Mech. Appl. 2023, 612, 128488.
Zhang, H.; Li, Y.; Chen, Y.; Zhao, H.V. Smart evolution for information diffusion over social networks. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1203–1217.
Wang, R.; Huang, Z.; Liu, S.; Shao, H.; Liu, D.; Li, J.; Wang, T.; Sun, D.; Yao, S.; Abdelzaher, T. DyDiff-VAE: A dynamic variational framework for information diffusion prediction. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 11–15 July 2021.
Sudbury, A. The proportion of the population never hearing a rumour. J. Appl. Probab. 1985, 22, 443–446.
Daley, D.J.; Kendall, D.G. Epidemics and Rumours. Nature 1964, 204, 1118.
Maki, D.P.; Thompson, M. Mathematical Models and Applications: With Emphasis on the Social, Life, and Management Sciences; Prentice-Hall: Upper Saddle River, NJ, USA, 1973.
Barabási, A.-L.; Albert, R. Emergence of Scaling in Random Networks. Science 1999, 286, 509–512.
Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442.
Zanette, D.H. Critical behavior of propagation on small-world networks. Phys. Rev. E 2001, 64, 050901.
Granovetter, M. Threshold Models of Collective Behavior. Am. J. Sociol. 1978, 83, 1420–1443.
Han, S.; Zhuang, F.; He, Q.; Shi, Z.; Ao, X. Energy model for rumor propagation on social networks. Phys. A Stat. Mech. Appl. 2014, 394, 99–109.
Indu, V.; Thampi, S.M. A nature-inspired approach based on Forest Fire model for modeling rumor propagation in social networks. J. Netw. Comput. Appl. 2018, 125, 28–41.
Li, C.; Ma, J.; Guo, X.; Mei, Q. DeepCas: An end-to-end predictor of information cascades. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 577–586.
Yang, C.; Wang, H.; Tang, J.; Shi, C.; Sun, M.; Cui, G.; Liu, Z. Full-Scale Information Diffusion Prediction with Reinforced Recurrent Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2271–2283.
Xu, X.; Zhou, F.; Zhang, K.; Liu, S. CCGL: Contrastive cascade graph learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 4539–4554.
Cao, Q.; Shen, H.; Cen, K.; Ouyang, W.; Cheng, X. DeepHawkes: Bridging the gap between prediction and understanding of information cascades. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017.
Zhou, F.; Xu, X.; Trajcevski, G.; Zhang, K. A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances. ACM Comput. Surv. CSUR 2021, 54, 1–36.
Raponi, S.; Khalifa, Z.; Oligeri, G.; Di Pietro, R. Fake news propagation: A review of epidemic models, datasets, and insights. ACM Trans. Web TWEB 2022, 16, 1–34.
Sun, L.; Rao, Y.; Wu, L.; Zhang, X.; Lan, Y.; Nazir, A. Fighting False Information from Propagation Process: A Survey. ACM Comput. Surv. 2023, 55, 1–38.
Bakshy, E.; Rosenn, I.; Marlow, C.; Adamic, L. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012.
Yang, B.; Yu, Z.; Cai, Y. The impact of vaccination on the spread of COVID-19: Studying by a mathematical model. Phys. A Stat. Mech. Appl. 2022, 590, 126717.
Wang, F.; Cao, L.; Song, X. Mathematical modeling of mutated COVID-19 transmission with quarantine, isolation and vaccination. Math. Biosci. Eng. 2022, 19, 8035–8056.
Gómez, A.; Oliveira, G. New approaches to epidemic modeling on networks. Sci. Rep. 2023, 13, 468.
Yu, Z.; Gao, H.; Wang, D.; Alnuaim, A.A.; Firdausi, M.; Mostafa, A.M. SEI2RS malware propagation model considering two infection rates in cyber–physical systems. Phys. A Stat. Mech. Appl. 2022, 597, 127207.
Shao, S.; Li, Z. Distributed immune time-delay SEIR-S model for new power system information network virus propagation. J. Intell. Fuzzy Syst. 2023, 44, 6865–6876.
Yu, X.; Wan, A. Dynamical aspects of a delayed SEI2RS malware dissemination model in cyber–physical systems. Results Phys. 2022, 40, 105851.
Xiao, M.; Chen, S.; Zheng, W.X.; Wang, Z.; Lu, Y. Tipping point prediction and mechanism analysis of malware spreading in cyber–physical systems. Commun. Nonlinear Sci. Numer. Simul. 2023, 122, 107247.
Guan, G.; Guo, Z. Bifurcation and stability of a delayed SIS epidemic model with saturated incidence and treatment rates in heterogeneous networks. Appl. Math. Model. 2021, 101, 55–75.
Lerman, K.; Ghosh, R. Information Contagion: An Empirical Study of the Spread of News on Digg and Twitter Social Networks. Proc. Int. AAAI Conf. Web Soc. Media 2010, 4, 90–97.
Abdullah, S.; Wu, X. An epidemic model for news spreading on Twitter. In Proceedings of the 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence, Boca Raton, FL, USA, 7–9 November 2011.
Choi, D.; Chun, S.; Oh, H.; Han, J.; Kwon, T. Rumor Propagation is Amplified by Echo Chambers in Social Media. Sci. Rep. 2020, 10, 310.
Zhu, L.; Zheng, W.; Shen, S. Dynamical analysis of a SI epidemic-like propagation model with non-smooth control. Chaos Solitons Fractals 2023, 169, 113273.
Zhu, L.; Yang, F.; Guan, G.; Zhang, Z. Modeling the dynamics of rumor diffusion over complex networks. Inf. Sci. 2021, 562, 240–258.
Ma, X.; Shen, S.; Zhu, L. Complex dynamic analysis of a reaction-diffusion network information propagation model with non-smooth control. Inf. Sci. 2023, 622, 1141–1161.
Hu, J.; Zhu, L.; Peng, M. Analysis of Turing patterns and amplitude equations in general forms under a reaction–diffusion rumor propagation system with Allee effect and time delay. Inf. Sci. 2022, 596, 501–519.
Bernoulli, D. Essai d’une Nouvelle Analyse de la Petite Vérole, & des Avantages de l’Inoculation pour la Prévenir. Histoire de l’Académie Royale des Sciences Avec les Mémoires de Mathématique et de Physique Tirés de Cette Académie. 1766, pp. 1–45. Available online: https://gallica.bnf.fr/ark:/12148/bpt6k35800 (accessed on 26 July 2023).
Li, C.; Ma, Z. Dynamics Analysis and Optimal Control for a Delayed Rumor-Spreading Model. Mathematics 2022, 10, 3455.
Tu, H.T.; Phan, T.T.; Nguyen, K.P. Modeling information diffusion in social networks with ordinary linear differential equations. Inf. Sci. 2022, 593, 614–636.
Hu, J.; Zhu, L. Turing pattern analysis of a reaction-diffusion rumor propagation system with time delay in both network and non-network environments. Chaos Solitons Fractals 2021, 153, 111542.
Zhu, L.; Wang, B. Stability analysis of a SAIR rumor spreading model with control strategies in online social networks. Inf. Sci. 2020, 526, 1–19.
Pan, W.; Yan, W.; Hu, Y.; He, R.; Wu, L. Dynamic analysis of a SIDRW rumor propagation model considering the effect of media reports and rumor refuters. Nonlinear Dyn. 2022, 111, 3925–3936.
Huo, L.; Chen, S. Rumor propagation model with consideration of scientific knowledge level and social reinforcement in heterogeneous network. Phys. A Stat. Mech. Appl. 2020, 559, 125063.
Mutlu, E.; Rajabi, A.; Garibay, I. CD-SEIZ: Cognition-Driven SEIZ Compartmental Model for the Prediction of Information Cascades on Twitter. In Proceedings of the 2020 Conference of The Computational Social Science Society of the Americas; Springer International Publishing: Berlin/Heidelberg, Germany, 2021.
Yu, Z.; Lu, S.; Wang, D.; Li, Z. Modeling and analysis of rumor propagation in social networks. Inf. Sci. 2021, 580, 857–873.
Li, L.; Li, Y.; Zhang, J. A Fractional-Order SIR-C Cyber Rumor Propagation Prediction Model with a Clarification Mechanism. Axioms 2022, 11, 603.
Wang, J.; Jiang, H.; Hu, C.; Yu, Z.; Li, J. Stability and Hopf bifurcation analysis of multi-lingual rumor spreading model with nonlinear inhibition mechanism. Chaos Solitons Fractals 2021, 153, 111464.
Yu, S.; Yu, Z.; Jiang, H.; Yang, S. The dynamics and control of 2I2SR rumor spreading models in multilingual online social networks. Inf. Sci. 2021, 581, 18–41.
Wang, Y.; Wang, J.; Wang, H.; Zhang, R.; Li, M. Users’ mobility enhances information diffusion in online social networks. Inf. Sci. 2021, 546, 329–348.
Jiang, G.; Li, S.; Li, M. Dynamic rumor spreading of public opinion reversal on Weibo based on a two-stage SPNR model. Phys. A Stat. Mech. Appl. 2020, 558, 125005.
Guo, H.; Yan, X. Dynamic modeling and simulation of rumor propagation based on the double refutation mechanism. Inf. Sci. 2023, 630, 385–402.
Wang, X.; Li, Y.; Li, J.; Liu, Y.; Qiu, C. A rumor reversal model of online health information during the COVID-19 epidemic. Inf. Process. Manag. 2021, 58, 102731.
Cheng, Y.; Huo, L.; Zhao, L. Dynamical behaviors and control measures of rumor-spreading model in consideration of the infected media and time delay. Inf. Sci. 2021, 564, 237–253.
Dong, Y.; Huo, L.; Zhao, L. An improved two-layer model for rumor propagation considering time delay and event-triggered impulsive control strategy. Chaos Solitons Fractals 2022, 164, 112711.
Xu, H.; Zhao, Y.; Han, D. The impact of the global and local awareness diffusion on epidemic transmission considering the heterogeneity of individual influences. Nonlinear Dyn. 2022, 110, 901–914.
Huang, H.; Chen, Y.; Ma, Y. Modeling the competitive diffusions of rumor and knowledge and the impacts on epidemic spreading. Appl. Math. Comput. 2020, 388, 125536.
Huo, L.; Gu, J. The influence of individual emotions on the coupled model of unconfirmed information propagation and epidemic spreading in multilayer networks. Phys. A Stat. Mech. Appl. 2023, 609, 128323.
Guo, H.; Yin, Q.; Xia, C.; Dehmer, M. Impact of information diffusion on epidemic spreading in partially mapping two-layered time-varying networks. Nonlinear Dyn. 2021, 105, 3819–3833.
Wang, J.; Wang, Z.; Yu, P.; Xu, Z. The impact of different strategy update mechanisms on information dissemination under hyper network vision. Commun. Nonlinear Sci. Numer. Simul. 2022, 113, 106585.
Huo, L.; Zhao, R.; Zhao, L. Effects of official information and rumor on resource-epidemic coevolution dynamics. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9207–9215.
Huo, L.; Yu, Y. The impact of the self-recognition ability and physical quality on coupled negative information-behavior-epidemic dynamics in multiplex networks. Chaos Solitons Fractals 2023, 169, 113229.
Kuznetsov, O.P. Complex networks and activity spreading. Autom. Remote Control 2015, 76, 2091–2109.
Ai, S.; Hong, S.; Zheng, X.; Wang, Y.; Liu, X. CSRT rumor spreading model based on complex network. Int. J. Intell. Syst. 2021, 36, 1903–1913.
Yin, F.; Pan, Y.; Tang, X.; Wu, C.; Jin, Z.; Wu, J. An information propagation network dynamic considering multi-platform influences. Appl. Math. Lett. 2022, 133, 108231.
Li, J.; Jiang, H.; Mei, X.; Hu, C.; Zhang, G. Dynamical analysis of rumor spreading model in multi-lingual environment and heterogeneous complex networks. Inf. Sci. 2020, 536, 391–408.
Zhu, L.; Tang, Y.; Shen, S. Pattern study and parameter identification of a reaction-diffusion rumor propagation system with time delay. Chaos Solitons Fractals 2023, 166, 112970.
Mei, X.; Zhang, Z.; Jiang, H. Dynamical Analysis of Hyper-ILSR Rumor Propagation Model with Saturation Incidence Rate. Entropy 2023, 25, 805.
Chen, S.; Jiang, H.; Li, L.; Li, J. Dynamical behaviors and optimal control of rumor propagation model with saturation incidence on heterogeneous networks. Chaos Solitons Fractals 2020, 140, 110206.
Cui, Y.; Wei, R.; Tian, Y.; Tian, H.; Zhu, X. Information propagation influenced by individual fashion-passion trend on multi-layer weighted network. Chaos Solitons Fractals 2022, 160, 112200.
Tong, X.; Jiang, H.; Qiu, J.; Luo, X.; Chen, S. Dynamic analysis of the IFCD rumor propagation model under stochastic disturbance on heterogeneous networks. Chaos Solitons Fractals 2023, 173, 113637.
Gong, Y.-C.; Wang, M.; Liang, W.; Hu, F.; Zhang, Z.-K. UHIR: An effective information dissemination model of online social hypernetworks based on user and information attributes. Inf. Sci. 2023, 644, 119284.
Yuan, T.; Guan, G.; Shen, S.; Zhu, L. Stability analysis and optimal control of epidemic-like transmission model with nonlinear inhibition mechanism and time delay in both homogeneous and heterogeneous networks. J. Math. Anal. Appl. 2023, 526, 127273.
Kempe, D.; Kleinberg, J.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003.
Rajeh, S.; Yassin, A.; Jaber, A.; Cherifi, H. Analyzing community-aware centrality measures using the linear threshold model. In Complex Networks & Their Applications X: Volume 1, Proceedings of the Tenth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2021 10; Springer International Publishing: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
Tran, C.; Zheleva, E. Heterogeneous Peer Effects in the Linear Threshold Model. Proc. AAAI Conf. Artif. Intell. 2022, 36, 4175–4183. [Google Scholar] [CrossRef]
Tian, Y.; Tian, H.; Cui, Y.; Zhu, X.; Cui, Q. Influence of behavioral adoption preference based on heterogeneous population on multiple weighted networks. Appl. Math. Comput. 2023, 446, 127880. [Google Scholar] [CrossRef]
Goldenberg, J.; Libai, B.; Muller, E. Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth. Mark. Lett. 2001, 12, 211–223. [Google Scholar] [CrossRef]
Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; Glance, N. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007. [Google Scholar] [CrossRef]
Berenbrink, P.; Hahn-Klimroth, M.; Kaaser, D.; Krieg, L.; Rau, M. Inference of a Rumor’s Source in the Independent Cascade Model. arXiv 2022, arXiv:2205.12125. [Google Scholar]
Qiu, L.; Liu, Y.; Duan, X. The best hop diffusion method for dynamic relationships under the independent cascade model. Appl. Intell. 2022, 52, 17315–17325. [Google Scholar] [CrossRef]
Chen, Y.; Li, X.; Zhou, W.; Du, Y.; Fan, Y.; Huang, D.; Chen, X. A hot topic diffusion approach based on the independent cascade model and trending search lists in online social networks. Math. Biosci. Eng. 2023, 20, 11260–11280. [Google Scholar] [CrossRef] [PubMed]
Sharma, K.; He, X.; Seo, S.; Liu, Y. Network Inference from a Mixture of Diffusion Models for Fake News Mitigation. Proc. Int. AAAI Conf. Web Soc. Media 2021, 15, 668–679. [Google Scholar] [CrossRef]
Wang, J.; Wang, W.; Wang, C. Modeling and Analysis of Conflicting Information Propagation in a Finite Time Horizon. IEEE/ACM Trans. Netw. 2020, 28, 972–985. [Google Scholar] [CrossRef]
Li, Q.; Zeng, C.; Xu, W.; Xiao, Y. A social rumor and anti-rumor game diffusion model based on sparse representation and tensor completion. J. Netw. Comput. Appl. 2022, 201, 103343. [Google Scholar] [CrossRef]
Liu, L.; Wang, X.; Zheng, Y.; Fang, W.; Tang, S.; Zheng, Z. Homogeneity trend on social networks changes evolutionary advantage in competitive information diffusion. New J. Phys. 2020, 22, 013019. [Google Scholar] [CrossRef]
Jiang, M.; Gao, Q.; Zhuang, J. Reciprocal spreading and debunking processes of online misinformation: A new rumor spreading–debunking model with a case study. Phys. A Stat. Mech. Appl. 2020, 565, 125572. [Google Scholar] [CrossRef]
Yilmaz, T.; Ulusoy, O. Misinformation Propagation in Online Social Networks: Game Theoretic and Reinforcement Learning Approaches. IEEE Trans. Comput. Soc. Syst. 2022; in press. [Google Scholar] [CrossRef]
Chen, J.; Wei, N.; Xin, C.; Liu, M.; Yu, Z.; Liu, M. Anti-Rumor Dissemination Model Based on Heat Influence and Evolution Game. Mathematics 2022, 10, 4064. [Google Scholar] [CrossRef]
Yin, F.; Jiang, X.; Qian, X.; Xia, X.; Pan, Y.; Wu, J. Modeling and quantifying the influence of rumor and counter-rumor on information propagation dynamics. Chaos Solitons Fractals 2022, 162, 112392. [Google Scholar] [CrossRef]
Mou, X.; Xu, W.; Zhu, Y.; Li, Q.; Xiao, Y. A Social Topic Diffusion Model Based on Rumor and Anti-Rumor and Motivation-Rumor. IEEE Trans. Comput. Soc. Syst. 2022; in press. [Google Scholar] [CrossRef]
Zhu, H.; Yang, X.; Wei, J. Path prediction of information diffusion based on a topic-oriented relationship strength network. Inf. Sci. 2023, 631, 108–119. [Google Scholar] [CrossRef]
Singh, N.; Singh, A.; Sharma, R. Predicting Information Cascade on Twitter Using Random Walk. Procedia Comput. Sci. 2020, 173, 201–209. [Google Scholar] [CrossRef]
Firdaniza; Ruchjana, B.N.; Chaerani, D. Information diffusion model using continuous time Markov chain on social media. J. Phys. Conf. Ser. 2021, 1722, 012091. [Google Scholar] [CrossRef]
Ramezani, M.; Ahadinia, A.; Bideh, A.Z.; Rabiee, H.R. Joint Inference of Diffusion and Structure in Partially Observed Social Networks Using Coupled Matrix Factorization. ACM Trans. Knowl. Discov. Data 2023, 17, 1–28. [Google Scholar] [CrossRef]
Xu, Y.; Wu, P. Multiscale clustering based diffusion representation learning method. In Proceedings of the 2021 IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies (BDCAT’21), Leicester, UK, 6–9 December 2021. [Google Scholar] [CrossRef]
Liu, C.; Zhou, N.; Zhan, X.-X.; Sun, G.-Q.; Zhang, Z.-K. Markov-based solution for information diffusion on adaptive social networks. Appl. Math. Comput. 2020, 380, 125286. [Google Scholar] [CrossRef]
Wang, C.; Wang, G.; Luo, X.; Li, H. Modeling rumor propagation and mitigation across multiple social networks. Phys. A Stat. Mech. Appl. 2019, 535, 122240. [Google Scholar] [CrossRef]
Yu, L.; Xu, X.; Trajcevski, G.; Zhou, F. Transformer-enhanced Hawkes process with decoupling training for information cascade prediction. Knowl. Based Syst. 2022, 255, 109740. [Google Scholar] [CrossRef]
Kong, Q.; Rizoiu, M.-A.; Xie, L. Modeling information cascades with self-exciting processes via generalized epidemic models. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020. [Google Scholar] [CrossRef]
Kumar, S.; Saini, M.; Goel, M.; Panda, B.S. Modeling information diffusion in online social networks using a modified forest-fire model. J. Intell. Inf. Syst. 2020, 56, 355–377. [Google Scholar] [CrossRef] [PubMed]
Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit. 2023, 137, 109347. [Google Scholar] [CrossRef]
Xu, H.; Sang, S.; Bai, P.; Li, R.; Yang, L.; Lu, H. GripNet: Graph information propagation on supergraph for heterogeneous graphs. Pattern Recognit. 2023, 133, 108973. [Google Scholar] [CrossRef]
Wang, H.; Yang, C. Information diffusion prediction with latent factor disentanglement. arXiv 2020, arXiv:2012.08828. [Google Scholar] [CrossRef]
Molaei, S.; Zare, H.; Veisi, H. Deep learning approach on information diffusion in heterogeneous networks. Knowl. Based Syst. 2019, 189, 105153. [Google Scholar] [CrossRef]
Ducci, F.; Kraus, M.; Feuerriegel, S. Cascade-LSTM: A tree-structured neural classifier for detecting misinformation cascades. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 6–10 July 2020. [Google Scholar]
Yan, D.; Cao, J.; Xie, W.; Zhang, Y.; Zhong, H. PersonalityGate: A general plug-and-play GNN gate to enhance cascade prediction with personality recognition task. Expert Syst. Appl. 2022, 203, 117381. [Google Scholar] [CrossRef]
Wang, Y.; Wang, X.; Ran, Y.; Michalski, R.; Jia, T. CasSeqGCN: Combining network structure and temporal sequence to predict information cascades. Expert Syst. Appl. 2022, 206, 117693. [Google Scholar] [CrossRef]
Wang, H.; Yang, C.; Shi, C. Neural Information Diffusion Prediction with Topic-Aware Attention Network. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, Queensland, Australia, 1–5 November 2021. [Google Scholar]
Jin, H.; Wu, Y.; Huang, H.; Song, Y.; Wei, H.; Shi, X. Modeling Information Diffusion with Sequential Interactive Hypergraphs. IEEE Trans. Sustain. Comput. 2022, 7, 644–655. [Google Scholar] [CrossRef]
Zhong, C.; Xiong, F.; Pan, S.; Wang, L.; Xiong, X. Hierarchical attention neural network for information cascade prediction. Inf. Sci. 2023, 622, 1109–1127. [Google Scholar] [CrossRef]
Liu, X.; Miao, C.; Fiumara, G.; De Meo, P. Information Propagation Prediction Based on Spatial–Temporal Attention and Heterogeneous Graph Convolutional Networks. IEEE Trans. Comput. Soc. Syst. 2023; in press. [Google Scholar] [CrossRef]
Jin, Y.; Lee, Y.-C.; Sharma, K.; Ye, M.; Sikka, K.; Divakaran, A.; Kumar, S. Predicting Information pathways across online communities. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), Long Beach, CA, USA, 6–10 August 2023; ACM: New York, NY, USA, 2023. [Google Scholar]
Xiao, Y.; Huang, Z.; Li, Q.; Lu, X.; Li, T. Diffusion Pixelation: A Game Diffusion Model of Rumor & Anti-Rumor Inspired by Image Restoration. IEEE Trans. Knowl. Data Eng. 2022, 35, 4682–4694. [Google Scholar] [CrossRef]
Wang, Y.; Wang, X.; Jia, T. Ccasgnn: Collaborative cascade prediction based on graph neural networks. In Proceedings of the 2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Hangzhou, China, 4–6 May 2022. [Google Scholar]
Fatemi, B.; Molaei, S.; Pan, S.; Rahimi, S.A. GCNFusion: An efficient graph convolutional network based model for information diffusion. Expert Syst. Appl. 2022, 202, 117053. [Google Scholar] [CrossRef]
Zang, C.; Wang, F. Neural dynamics on complex networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 6–10 July 2020. [Google Scholar]
Murphy, C.; Laurence, E.; Allard, A. Deep learning of contagion dynamics on complex networks. Nat. Commun. 2021, 12, 4720. [Google Scholar] [CrossRef] [PubMed]
La Malfa, E.; La Malfa, G.; Nicosia, G.; Latora, V. Characterizing learning dynamics of deep neural networks via complex networks. In Proceedings of the 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), Washington, DC, USA, 1–3 November 2021. [Google Scholar]
Kushwaha, A.K.; Kar, A.K.; Ilavarasan, P.V. Predicting information diffusion on Twitter a deep learning neural network model using custom weighted word features. In Responsible Design, Implementation and Use of Information and Communication Technology: 19th IFIP WG 6.11 Conference on e-Business, e-Services, and e-Society, I3E 2020, Skukuza, South Africa, 6–8 April 2020; Proceedings, Part I 19; Springer International Publishing: Berlin/Heidelberg, Germany, 2020. [Google Scholar] [CrossRef]
Sun, L.; Rao, Y.; Zhang, X.; Lan, Y.; Yu, S. MS-HGAT: Memory-Enhanced Sequential Hypergraph Attention Network for Information Diffusion Prediction. Proc. Conf. AAAI Artif. Intell. 2022, 36, 4156–4164. [Google Scholar] [CrossRef]
Wang, D.; Wei, L.; Yuan, C.; Bao, Y.; Zhou, W.; Zhu, X.; Hu, S. Cascade-enhanced graph convolutional network for information diffusion prediction. In International Conference on Database Systems for Advanced Applications; Springer International Publishing: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
Chen, X.; Zhang, F.; Zhou, F.; Bonsangue, M. Multi-scale graph capsule with influence attention for information cascades prediction. Int. J. Intell. Syst. 2021, 37, 2584–2611. [Google Scholar] [CrossRef]
Zhou, F.; Xu, X.; Zhang, K.; Trajcevski, G.; Zhong, T. Variational information diffusion for probabilistic cascades prediction. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar] [CrossRef]
Wu, L.; Wang, H.; Chen, E.; Li, Z.; Zhao, H.; Ma, J. Preference enhanced social influence modeling for network-aware cascade prediction. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022. [Google Scholar]
Zhou, F.; Yu, L.; Xu, X.; Trajcevski, G. Decoupling representation and regressor for long-tailed information cascade prediction. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 11–15 July 2021. [Google Scholar]
Leskovec, J.; Sosič, R. Snap: A general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol. TIST 2016, 8, 1–20. [Google Scholar] [CrossRef] [PubMed]
Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; Su, Z. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008. [Google Scholar]
Hodas, N.O.; Lerman, K. The Simple Rules of Social Contagion. Sci. Rep. 2014, 4, 4343. [Google Scholar] [CrossRef] [PubMed]
Hogg, T.; Lerman, K. Social dynamics of Digg. EPJ Data Sci. 2012, 1, 5. [Google Scholar] [CrossRef]
Zhong, E.; Fan, W.; Wang, J.; Xiao, L.; Li, Y. Comsoc: Adaptive transfer of user behaviors over composite social network. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 696–704. [Google Scholar]
Leskovec, J.; Backstrom, L.; Kleinberg, J. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 497–506. [Google Scholar]
Miller, J.; Ting, T. EoN (Epidemics on Networks): A fast, flexible Python package for simulation, analytic approximation, and analysis of epidemics on networks. J. Open Source Softw. 2019, 4, 1731. [Google Scholar] [CrossRef]
Gray, C.; Mitchell, L.; Roughan, M. Bayesian inference of network structure from information cascades. IEEE Trans. Signal Inf. Process. Over Netw. 2020, 6, 371–381. [Google Scholar] [CrossRef]

Figure 1. (Left) Number of publications in recent years (orange color represents the number of explanatory types, and yellow represents the number of prediction types). (Right) Distribution of venues for publications.

Figure 2. Development of Information Dissemination Models [20,21,22,23,24,25,26,27,28,29,30,31,32].

Figure 3. Taxonomy of Information Dissemination Models.

Figure 4. The illustration of explanatory information dissemination.

Figure 5. Illustration of the Classic SI, SIS, and SIR Models.

Figure 6. Schematic Diagram of the Multi-Layer Model.

Figure 7. Schematic Process of Information Dissemination Prediction.

Figure 8. Schematic Process of Traditional Method.

Figure 9. General Framework of Information Dissemination based on Deep Learning.

Table 1. Meaning of the Symbol.

Symbol	Meaning	Symbol	Meaning
$G(V, E, W)$	the social network	$V$	the node set
$E$	the edge set	$W$	the edge weights
$j \in N(i)$	the neighbors of node $i$	$e(i, j) \in E$	the edge between node $i$ and node $j$
$S$	the seed set	$T_{i}$	the threshold value
$i$	a node	$W_{ij}$	the edge weight between node $i$ and node $j$
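
The symbols in Table 1 are those of the linear threshold model. The following is a minimal sketch, assuming the standard activation rule in which an inactive node $i$ becomes active once the summed weights $W_{ji}$ of its active neighbors reach its threshold $T_{i}$. The function name `linear_threshold`, the dictionary layout, and the toy network are illustrative assumptions, not taken from the surveyed papers.

```python
def linear_threshold(weights, thresholds, seeds):
    """Run a linear threshold process until no further node activates.

    weights[j][i] holds W_ji (influence of j on i), thresholds[i] holds T_i,
    and seeds is the seed set S. All names here are illustrative.
    """
    # Index incoming weights so each node can sum the influence of its active neighbors.
    incoming = {}
    for j, out_edges in weights.items():
        for i, w_ji in out_edges.items():
            incoming.setdefault(i, {})[j] = w_ji

    active = set(seeds)
    while True:
        # A node activates when the total weight from active neighbors reaches T_i.
        newly_active = {
            i
            for i, t_i in thresholds.items()
            if i not in active
            and sum(w for j, w in incoming.get(i, {}).items() if j in active) >= t_i
        }
        if not newly_active:
            return active
        active |= newly_active


# Hypothetical toy network with four nodes.
weights = {"a": {"b": 0.6, "c": 0.3}, "b": {"c": 0.4}, "c": {"d": 0.2}}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.6, "d": 0.5}
final_active = linear_threshold(weights, thresholds, seeds={"a"})
print(sorted(final_active))  # ['a', 'b', 'c']
```

In this toy run, node b is activated by a alone, node c needs the combined weight of a and b, and node d never reaches its threshold.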

Table 2. Meaning of the Symbol.

Symbol	Meaning	Symbol	Meaning
$V$	the active node set	$U$	the inactive node set
$p(u, v)$	the probability of	$t$	the time
$u$, $v$	users	$N(v)$	the neighbor set of user $v$
$p_{v}(t)$	the probability that user $v$ is activated at time $t$
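
The symbols in Table 2 belong to the independent cascade model. The sketch below assumes the usual formulation: each newly activated user $u$ gets a single chance to activate each inactive neighbor $v$, succeeding with probability $p(u, v)$. The helper `activation_probability` uses the commonly stated form $p_{v}(t) = 1 - \prod_{u} (1 - p(u, v))$ over the neighbors activated in the previous step, which is an assumption here rather than a formula quoted from the survey. All function names and the toy network are hypothetical.

```python
import random


def activation_probability(edge_probs):
    """Probability that v is activated, given p(u, v) for each newly active neighbor u."""
    stay_inactive = 1.0
    for p_uv in edge_probs:
        stay_inactive *= 1.0 - p_uv
    return 1.0 - stay_inactive


def independent_cascade(prob, seeds, rng=None):
    """Simulate one cascade. prob[u][v] holds p(u, v); returns (active set, per-step activations)."""
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    history = [set(seeds)]
    while frontier:
        newly_active = []
        for u in frontier:
            # Each newly active user gets exactly one attempt per inactive neighbor.
            for v, p_uv in prob.get(u, {}).items():
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    newly_active.append(v)
        if newly_active:
            history.append(set(newly_active))
        frontier = newly_active
    return active, history


# Hypothetical chain a -> b -> c -> d with deterministic edges for illustration.
prob = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"d": 0.0}}
active, history = independent_cascade(prob, ["a"], random.Random(0))
print(sorted(active), len(history))
```

Because the process is stochastic for probabilities strictly between 0 and 1, expected spread is normally estimated by averaging over many simulated cascades.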

Table 3. Categorization of explanatory models.

Class	Subclass	Models	Advantages	Disadvantages
Epidemic Model	Classic Epidemic Model	SI [48], SIS [49], SIR [13,14,20,50,51]	Basic models that simplify the information dissemination process. Minimal computational cost. Allow quick verification of the impact of different parameters on information dissemination. Strong generalization.	Well-mixed models that ignore individual differences and randomness. Model parameters are subject to human subjectivity. Simplify reality to a degree, ignoring complex factors and interactions. Require more parameters and assumptions. In multi-layer models, theoretical analysis is complicated.
Epidemic Model	Improved Epidemic Model	SIDRW [57], CD-SEIZ [59], ISWR [6], IDSRI [60], 2I2SR [61], IS2R2 [62], IO-UAR [64], SPNR [65], SICMR [66], G-SCNDR [67], XY-ISR [68], XYZ-ISR [69], SIS-SIS [70], UA1A2-SEIS [71], UAU-SIS [72], UAU-SIS [73]	Introduce various roles. Multi-layer models effectively describe the dynamic process within each layer and reflect the interaction of information dissemination between layers, which is closer to the actual situation.
Network Structure Model	Homogeneous Network	ISRM [1]	Nodes share the same connection patterns and laws, which simplifies the analysis and effectively reflects the impact of network structure on information dissemination.	Ignores the heterogeneity and complexity among nodes; consequently, there are fewer studies on homogeneous networks.
Network Structure Model	Heterogeneous Network	SUIRS [15], SEI [16], Net-E-SFI [79], S1S2R1R2 [80], Hyper-ILSR [82], SEIR [83], IFCD [85], UHIR [86]	In line with reality; highlights the influence of network topology on information dissemination. Effectively reflects differences among nodes.	Real datasets are usually difficult to obtain, which increases the complexity of analysis. Computationally intensive.
Network Structure Model	Influence Model	Linear Threshold Model [91], Independent Cascade Model [95]	The two models differ in the activation function. Capture the dynamics and complexity of information dissemination and effectively consider temporal and spatial information. Easy to explain microscopically how individuals make decisions.	Ignore mutual influence and interaction. Difficult to handle time delay and information attributes. Nodes have only a chance to affect neighboring nodes. Introducing propagation thresholds or node weights increases the model’s complexity and makes analysis and calculation more difficult.
Competitive Model		EGT [18], SIC [98], RSD [101], SDIR [103], SO-S/EIR and C-S/EIDR [104]	Models the interactions between multiple types of information and the effects of these interactions on the dissemination process, revealing the trend of pluralism.	Model construction and theoretical analysis are more complex and computationally costly.

Table 4. Meaning of the Symbol.

Symbol	Meaning	Symbol	Meaning
$u$, $v$	users	$V$, $E$	node set and edge set
$G(V, E)$	information dissemination graph	$p(u_{i, N+1})$	the probability of the next forwarding user
$c^{i}$	the $i$-th information cascade	$C^{i}$	the information cascades set
$M$	the total number of information cascades	$(u_{i, j}, t_{i, j})$	user $u_{i, j}$ forwards a message at time $t_{i, j}$ in the $i$-th information cascade
$N$	the maximum number of forwards	$u_{i, N+1}$	the next forwarding user
$Θ$	the parameter set	$λ_{2}$	the $L_2$ regularization coefficient
$y_{i}$	the predicted cascade increment	$\hat{y}$	the actual value
${\hat{y}}_{ij}^{t}$	the predicted score of the video $v_{i}$ with the community $s_{j}$ at time $t$	$L_{BPR}$	the ranking loss that encourages the prediction for an observed interaction to be greater than that for an unobserved one
$(i, j^{+}, j^{-}, t)$	an example in the pairwise training data	$j^{+}$	an observed sharing of video $v_{i}$ in the community $s_{j}^{+}$
$j^{-}$	an unobserved one	$L_{CE}^{i}$	the cross-entropy between the predicted and actual communities
$y_{N+1} \in \mathbb{R}^{|S|}$	one-hot encoding	$λ_{1}$	the hyper-parameters
$L$	loss function	$Ω$	the models
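
To make the loss-related symbols in Table 4 concrete, here is a small sketch of the three ingredients they name: a pairwise ranking term for $L_{BPR}$ (assumed here to take the standard form $-\ln σ(\hat{y}_{ij^{+}}^{t} - \hat{y}_{ij^{-}}^{t})$), a cross-entropy term against a one-hot target, and an $L_2$ penalty on the parameters $Θ$. The way the terms are combined in `total_loss`, and all function names, are assumptions for illustration; individual papers weight and combine these terms differently.

```python
import math


def bpr_loss(score_pos, score_neg):
    """Pairwise ranking loss: -log(sigmoid(score_pos - score_neg)), written as softplus."""
    return math.log1p(math.exp(-(score_pos - score_neg)))


def cross_entropy(predicted_probs, true_index):
    """Cross-entropy between a predicted distribution and a one-hot target."""
    return -math.log(predicted_probs[true_index])


def l2_penalty(params):
    """Squared L2 norm of the parameter set."""
    return sum(p * p for p in params)


def total_loss(predicted_probs, true_index, score_pos, score_neg, params, lam1, lam2):
    """Hypothetical combination: cross-entropy + lam1 * ranking loss + lam2 * L2 penalty."""
    return (
        cross_entropy(predicted_probs, true_index)
        + lam1 * bpr_loss(score_pos, score_neg)
        + lam2 * l2_penalty(params)
    )


# Toy values: the observed community scores higher than the unobserved one.
loss = total_loss([0.25, 0.75], 1, score_pos=2.0, score_neg=0.5,
                  params=[0.1, -0.2], lam1=0.5, lam2=0.01)
print(round(loss, 4))
```

The ranking term shrinks toward zero as the observed interaction is scored further above the unobserved one, which is the behavior the table describes for $L_{BPR}$.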

Table 5. Classification of the predictive models.

Class	Methods	Advantages	Disadvantages
Traditional Method	Fisher Equation [9], Hawkes process [113], Decision Trees [106], BP neural network [106], Random walk [107], Markov [108], Bayesian [7,106], Matrix Factorization [109], Clustering [110], Energy Model [27,112,116], Forest-Fire Model [28,115]	Probabilistic statistical models. Good interpretability of the patterns and causal relationships behind information dissemination data. Require fewer computational resources, which are easy to meet.	Performance depends on the quality and quantity of training data. Lack generalization. Risk of over-fitting. Require manual feature selection.
Deep Learning	DCE [10], Attention Mechanism [118,125], HDD [28], GNN [121,129,131,132], GCN [19], MUCas [137], HyperINF [124], STAHGCNs [126], INPAC [127], LSTM [120,134], RNN [118], Diffusion2pixel [128]	Automatically learn highly abstract feature representations from raw datasets. End-to-end models. Effectively avoid complex feature engineering, capture sequence dependencies in cascades, and require fewer assumptions.	Require substantial computation and storage, are time-consuming, and need many high-quality datasets. “Black box” models that cannot be explained. Performance relies on datasets and hyper-parameters, which are difficult to choose. Social networks are generally implicit and cannot be obtained accurately.

Table 6. A Review of Frequently Used Datasets.

Datasets	References	Description
Stanford Network Analysis Platform	Leskovec [141], Zhou [2]	A collection of more than 50 large network datasets, ranging from tens of thousands of nodes and edges to tens of millions of nodes and edges
Aminer	Tang [142], Xu [117], Cheng [29]	14,134 persons, 10,716 papers, and 1434 conferences
Twitter	Hodas [143], Zhao [10], Cheng [29], Wang [118]	66,059 URLs, 2,859,764 tweets, 736,930 users, and 36,743,448 links; the average length is 32.6
Digg	Hogg [144], Zhou [2], Zhao [10]	3553 news stories, 139,409 users, totaling 3,018,197 votes
Douban	Zhong [145], Wang [118]	10,602 information cascades, 23,123 nodes, and 348,280 links; the average length is 27.14
Memetracker	Leskovec [146], Zhou [2], Wang [118]	12,661 information cascades and 4709 nodes; the average length is 16.24
Weibo	Cao [32], Sun [3], Zhao [10]	119,313 messages and 6,738,040 users on 1 June 2016


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

