3.1. Extraction and Analysis of Information Diffusion Patterns in Social Networks
In the realm of social networks, the transmission of information is subject to the influence of various network characteristics. These characteristics encompass the size and structure of the social network, the attributes associated with network nodes, and the dynamic nature of information propagation. It is imperative to consider these diverse characteristics when constructing a model for information dissemination and to ground this analysis in empirical data pertaining to information transmission.
To gain a deeper understanding of the impact of these characteristics on information diffusion, it is essential to leverage real-world data. By scrutinizing the cumulative distribution curve of actual information dissemination under different circumstances, we discern three distinct types of information propagation processes.
Figure 2 illustrates the cumulative distribution curves that correspond to these three distinct propagation phenomena.
The three panels of Figure 2 show the cumulative distribution curves of propagating nodes for the three kinds of information diffusion. The horizontal axis represents the time elapsed during information propagation, with each unit corresponding to a six-minute interval, i.e., one propagation round. The vertical axis shows the cumulative proportion of nodes, within the largest connected component of the social network, that have propagated the information.
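A curve of this kind can be computed from raw repost timestamps. The following is a minimal sketch, assuming that the timestamps are given in seconds and that the size of the largest connected component is already known; the function name `cumulative_curve` and the sample timestamps are hypothetical and are not taken from the paper's data.

```python
from collections import Counter

ROUND_SECONDS = 6 * 60  # one propagation round = six minutes, as in Figure 2


def cumulative_curve(repost_times, start_time, component_size):
    """Return the cumulative proportion of nodes that have propagated
    the information by the end of each six-minute round.

    repost_times   -- timestamps in seconds, one per propagating node
    start_time     -- timestamp of the original post (start of round 0)
    component_size -- number of nodes in the largest connected component
    """
    if component_size <= 0:
        raise ValueError("component_size must be positive")
    # Count how many nodes propagated the information in each round.
    per_round = Counter(int((t - start_time) // ROUND_SECONDS) for t in repost_times)
    if not per_round:
        return []
    curve, total = [], 0
    for r in range(max(per_round) + 1):
        total += per_round.get(r, 0)
        curve.append(total / component_size)
    return curve


# Hypothetical timestamps (seconds after the original post).
times = [30, 200, 400, 700, 710, 1500]
print(cumulative_curve(times, start_time=0, component_size=10))
```

Plotting the returned list against the round index gives one curve of the kind shown in Figure 2; a flat stretch followed by a sharp rise would correspond to the accelerated or hindered patterns.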
Our study investigates three unique information diffusion scenarios, each reflective of varying dynamics in the communication process:
Uniform Diffusion: In this scenario, we observe a consistent and uninterrupted flow of information without distinct stages of promotion or hindrance. The propagation maintains a steady pace throughout.
Accelerated Diffusion: Here, the information propagation experiences sudden acceleration at a specific point in time. Certain nodes play a prominent role in promoting the information, resulting in an abrupt increase in the propagation rate.
Hindered Diffusion: In this case, the communication process encounters an initial stage of stagnation with minimal growth. Unlike the first scenario, there is an evident obstruction that impedes the early phases of communication.
The representation of these three communication processes provides valuable insights into the diverse dynamics at play during information diffusion within social networks.
The disparities observed in the cumulative distribution curves of propagation in Figure 2 can be attributed to distinct underlying factors.
In the initial phase of the first propagation process, information originates from user nodes with a substantial following within the actual social network. Consequently, it experiences rapid dissemination through its initial followers. However, as time elapses, the temporal relevance of the information diminishes, user interest wanes, and the pace of information transmission decelerates.
In the second propagation process, a similar pattern emerges, with information initially emanating from user nodes with a substantial following. However, during the early stages of dissemination, user engagement with the information is relatively low, leading to a slower rate of propagation. At a critical inflection point, nodes with a significant number of subscribers begin disseminating the information. This action effectively revitalizes the information’s temporal relevance, piquing the interest of other users and accelerating the dissemination process.
Conversely, in the third propagation process, information initially originates from a node with only a few followers. Consequently, the dissemination rate remains sluggish during the initial phase. It is only at the first inflection point that a node with a substantial following takes charge of propagating the information, resulting in an acceleration of the dissemination process.
According to the analysis conducted, several key conclusions can be drawn:
Effect of Information Transmission Speed: It has been observed that the rate of information transmission within a network has a significant impact on the probability of nodes successfully transmitting information over time. A slower information transmission speed is associated with a decrease in the likelihood of successful information propagation.
Influence of Node Subscribers: Nodes with a substantial number of subscribers play a crucial role in the information dissemination process. When such nodes initiate the propagation of information, it significantly increases the probability of subsequent nodes successfully transmitting the same information.
Building upon the above findings, we can further delve into the impact of various factors on the probability of nodes disseminating information within social networks. These factors can be categorized into three main groups:
Network Characteristics: This category encompasses factors related to the size and structure of the social network. These include the topology of the network, its connectivity, and the density of connections between nodes.
Node Attributes: Factors in this category pertain to the characteristics of individual nodes within the network. These may include the duration of exposure to information, the current volume of information being spread, the concentration of information in the proximity of nodes, the spatial positioning of nodes within the network, and the level of activity exhibited by nodes.
Information Transmission Status: This category addresses the overall status of information transmission within the network. It involves factors such as the current state of information dissemination, the pace of information flow, and the patterns of interaction among nodes.
By considering these three broad categories, we can comprehensively assess the multitude of factors influencing the likelihood of nodes transmitting information in social networks. This structured approach allows for a more systematic analysis of the intricate dynamics governing information dissemination.
The following characteristic parameters are used to quantify these factors:
indegree: This parameter quantifies the node's centrality within the current network of interest. It is the number of incoming connections, i.e., followers, that the node has when the scope of the social network is restricted to that network. In-degree reflects the significance of the node within the designated network and directly influences the likelihood that the node disseminates information.
follower_count: This parameter corresponds to the user attribute of the node and denotes the total number of followers that the user has in the real-world social network. In practical social networks, the follower count plays a pivotal role in determining the reach of information dissemination initiated by the user node. When x or more nodes with a large follower_count (where x is a constant) disseminate information at any given time, the probability that other nodes participate in the information diffusion process increases.
activitiness: The activity level of the user associated with a node, derived from a statistical analysis of user behavior within a specified time frame in the actual social network. Higher user activity levels correlate with a higher probability of information propagation.
t_expose: This parameter quantifies the length of time a node has been exposed to a specific piece of information. The probability that the node disseminates the information is inversely proportional to this exposure time.
active_node_count: The current volume of ongoing information dissemination, which together with the duration of the dissemination influences the probability of node participation in information diffusion. The ratio active_node_count/t can be regarded as the rate of information propagation per unit of time. The probability that a node engages in dissemination is directly proportional to this rate.
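The parameters above can be gathered into a small record type. This is a minimal sketch, assuming integer round counts and a numeric activity score; the class name `NodeFeatures` and the helper `propagation_rate` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class NodeFeatures:
    indegree: int          # followers inside the network of interest
    follower_count: int    # followers in the real-world social network
    activitiness: float    # user activity level over a chosen time window
    t_expose: int          # rounds the node has been exposed to the information


def propagation_rate(active_node_count: int, t: int) -> float:
    """Rate of propagation per unit of time: active_node_count / t."""
    if t <= 0:
        raise ValueError("t must be a positive number of rounds")
    return active_node_count / t


# Hypothetical values for one node and one point in a cascade.
node = NodeFeatures(indegree=12, follower_count=3400, activitiness=2.5, t_expose=3)
print(node)
print(propagation_rate(active_node_count=120, t=8))
```

Keeping the per-node attributes separate from the cascade-level rate mirrors the split between node attributes and information transmission status described earlier in this section.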
3.2. Calculation of Propagation Probability in Social Networks
This section presents enhancements to the Independent Cascade (IC) model. The improved information propagation model is designed to closely mimic real-world social networks so that it can reproduce the cumulative distribution curves observed in actual network data. To make the simulated curves comparable with the real data, we select the same initial node for the propagation simulation as the one observed in the real-world dissemination scenario.
Building upon the earlier discussion of the factors that influence information transmission probabilities, this section combines those factors to compute the probability of information transmission. The computation rests on the assumption that information can propagate from node u to node v only when node v follows, i.e., pays attention to, node u. The probability that node v transmits information originating from node u is then determined from the relevant characteristic parameters.
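The assumption that information flows from u to v only when v follows u can be made concrete with a small cascade simulation. The sketch below uses the classic single-attempt independent cascade rule, which is an assumption made here and not necessarily the update rule of the improved model; the function `simulate_cascade`, its parameters, and the toy graph are hypothetical.

```python
import random


def simulate_cascade(followers, seed, prob, max_rounds=50, rng=None):
    """Independent-cascade-style simulation on a follower graph.

    followers -- dict mapping u to the list of nodes v that follow u
    seed      -- node that posts the information in round 0
    prob      -- callable prob(u, v, t) returning a probability in [0, 1]
    Returns a dict mapping each activated node to (parent, round).
    """
    rng = rng or random.Random()
    activated = {seed: (None, 0)}
    frontier = [seed]
    for t in range(1, max_rounds + 1):
        next_frontier = []
        for u in frontier:
            # Information can only flow from u to nodes v that follow u.
            for v in followers.get(u, []):
                if v not in activated and rng.random() < prob(u, v, t):
                    activated[v] = (u, t)
                    next_frontier.append(v)
        if not next_frontier:
            break
        frontier = next_frontier
    return activated


# Toy follower graph: b and c follow a, d follows b.
graph = {"a": ["b", "c"], "b": ["d"]}
result = simulate_cascade(graph, "a", prob=lambda u, v, t: 0.5, rng=random.Random(1))
print(result)
```

The `prob` callback is the place where a probability built from the characteristic parameters would be plugged in; the constant 0.5 is only a placeholder.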
In Equation (4), C represents a constant, and t_expose denotes the length of time for which the node has been exposed to the information, measured from the moment the followed node first disseminates the message to the node's current round. user_status signifies the impact of the node's intrinsic attributes and its position within the network of interest (e.g., activity level and degree of centrality within the network of interest) on the propagation probability. info_status represents the influence of the information's status (comprising the total propagation count and the current propagation round) on propagation by the node. factor_t encapsulates the effect that nodes which disseminated the information before the current time have on other nodes.
In Equation (5), indegree is the number of users who follow the focal user within the current network of interest. Constant terms are introduced to ensure that the logarithmic function yields a positive value. activitiness indicates the node's level of activity within the real-world social network, specifically the number of user updates per unit of time. The formulation combines the preceding analyses of node attributes in the context of information dissemination.
In Equation (6), active_node_count denotes the cumulative propagation quantity of the current information, while t signifies the current propagation iteration. info_density represents the ratio of nodes that have disseminated information among the nodes pertinent to the current node, signifying the concentration of information.
In real-world social networks, when x users with a significant follower_count propagate the same message before time t, these nodes collectively influence the information dissemination process beyond time t. To model the impact that nodes which have already propagated the information by time t exert on subsequent nodes, we introduce the parameter factor_t, which depends on a growth coefficient of influence.
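To illustrate how the quantities user_status, info_status, factor_t, and t_expose might be combined into a single probability, the following sketch uses purely assumed functional forms. The logarithmic form of `user_status`, the product form of `info_status`, the linear form of `factor_t`, the coefficient `alpha`, and all default constants are hypothetical choices made for illustration and should not be read as the paper's Equations (4) to (6).

```python
import math


def user_status(indegree, activitiness, c=2.0):
    # Assumed form: the constant c keeps both logarithms positive.
    return math.log(indegree + c) * math.log(activitiness + c)


def info_status(active_node_count, t, info_density):
    # Assumed form: propagation rate per round, boosted by local information density.
    return (active_node_count / t) * (1.0 + info_density)


def factor_t(influencer_count, alpha=0.1):
    # Assumed form: influence grows linearly with the number of high-follower
    # nodes that have already propagated; alpha is a hypothetical growth coefficient.
    return 1.0 + alpha * influencer_count


def propagation_probability(t_expose, u_status, i_status, f_t, C=0.01):
    # Inversely proportional to exposure time, clipped to a valid probability.
    raw = C * u_status * i_status * f_t / max(t_expose, 1)
    return min(1.0, max(0.0, raw))


# Hypothetical inputs for a single node at round t = 5.
p = propagation_probability(
    t_expose=2,
    u_status=user_status(indegree=10, activitiness=3.0),
    i_status=info_status(active_node_count=40, t=5, info_density=0.25),
    f_t=factor_t(influencer_count=2),
)
print(p)
```

The sketch preserves only the qualitative relationships stated in the text: the probability falls as exposure time grows and rises with the propagation rate and with the number of influential nodes that have already propagated the message.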
Given that the experimental network is generated to emulate the actual information propagation process, it can be assumed that all nodes within the network are interested in the information being disseminated. For the purpose of this discussion, we temporarily omit consideration of the specific level of interest that nodes have in the information.
This section has provided a comprehensive explanation of the information dissemination model and has outlined the probability calculation associated with information propagation. Building upon the foundation of the information dissemination model, we can proceed to simulate information propagation using real-world social network data.
3.3. Analysis of Transmissibility in Social Networks
3.3.1. Propagation Modes Generated by Information Diffusion Models
In accordance with the information propagation model as outlined in this research paper, when conducting propagation simulation experiments with various initial nodes, numerous propagation cascades are concurrently generated. These cascades exhibit substantial disparities in their structural characteristics and statistical parameters. These distinctions serve as indicators of diverse propagation modes and the varying propagation capabilities associated with different nodes selected as starting points. This section will exemplify these principles using empirical data.
When the information propagation model is used for propagation experiments, nodes with distinct in-degree centrality are chosen as starting nodes, generating multiple propagation cascades. Although these cascades all have a tree-like structure, they differ discernibly. Figure 3 shows propagation cascades that originate from two different initial nodes within the same social network. In the left panel, the in-degree centrality of the starting node is 0.591, while in the right panel it is 0.033.
In Figure 3, the green nodes are the starting nodes of information propagation, while the black nodes are those reached in the subsequent simulation. The connecting edges denote the propagation paths generated during the simulated information propagation. The in-degree centrality of the initial node in the left panel is much higher than that in the right panel, and the structural characteristics of the two cascades differ markedly. In the left panel, information propagates predominantly from the initial node, with most nodes in the cascade concentrated around it, resulting in a shallow cascade depth. In the right panel, information disseminates from multiple centers, with the starting node being just one among several central nodes, leading to a greater cascade depth.
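The structural difference between the two cascades can be summarized with two simple statistics: the cascade depth and the share of nodes attached directly to the starting node. The sketch below assumes the cascade is stored as a child-to-parent mapping; the function name `cascade_shape` and the two toy cascades are hypothetical.

```python
from collections import deque


def cascade_shape(parent):
    """Summarize a propagation cascade given as a child -> parent mapping.

    The root (starting node) is the single node whose parent is None.
    Returns (depth, root_share), where root_share is the fraction of
    non-root nodes that received the information directly from the root.
    """
    children = {}
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children.setdefault(par, []).append(node)
    if root is None:
        raise ValueError("cascade has no root")
    # Breadth-first traversal to find the deepest level of the tree.
    depth, queue = 0, deque([(root, 0)])
    while queue:
        node, d = queue.popleft()
        depth = max(depth, d)
        for child in children.get(node, []):
            queue.append((child, d + 1))
    non_root = len(parent) - 1
    root_share = len(children.get(root, [])) / non_root if non_root else 0.0
    return depth, root_share


# Toy cascades: one concentrated around the root, one spreading through relays.
star = {"s": None, "a": "s", "b": "s", "c": "s"}
relay = {"s": None, "a": "s", "b": "a", "c": "b", "d": "b"}
print(cascade_shape(star))
print(cascade_shape(relay))
```

A shallow depth with a high root share matches the left panel's pattern, while a larger depth with a low root share matches the right panel's pattern.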
Figure 4 and Figure 5 depict the temporal evolution of two distinct communication modes, which we refer to as ‘single center divergence’ and ‘multi-center divergence’ throughout this paper. These figures show more clearly how the two modes progress over time.
In the context of the first communication mode, ‘single center divergence’, the majority of nodes that can be reached by the initial node are concentrated in close proximity to the source node. Conversely, the second communication mode, ‘multi-center divergence’, exhibits a more dispersed distribution of nodes.
The observed patterns of information communication cascades align closely with these two communication modes. Our analysis leads to a significant conclusion: When an influential node at the core of the social network serves as the point of origin for information dissemination, it exerts a direct influence on its immediate neighbors. Even in the absence of other highly communicative nodes in its vicinity, this central node can independently propagate information. In contrast, nodes with limited communication capabilities and situated away from the social network’s core lack the capacity to significantly impact their surroundings. They must rely on other more communicative nodes to facilitate the dissemination process. This fundamental disparity forms the basis for the divergence between the two communication modes observed within cascading networks.
When examining the communicative impact of nodes within social networks, it becomes imperative to consider the influence of two distinct modes of communication. Specifically, the positioning of a node within the social network exerts a significant influence on the manner in which it serves as the originating point for the dissemination of information to other nodes. In cases where certain nodes are intentionally designated to direct the flow of information within the social network, it becomes necessary to devise strategic approaches aimed at exerting influence over these pivotal nodes. By doing so, the objective is to expedite the dissemination of information throughout the social network, thereby achieving a more expeditious and cost-effective information propagation process.
3.3.2. Definition of Node Transmissibility in Social Networks
Based on the aforementioned propagation model, this section outlines the methodology employed in utilizing the model to simulate propagation and generate multiple propagation processes for the purpose of elucidating the transmissibility of nodes within a social network. In this paper, the concept of node transmissibility is delineated from two distinct perspectives: (1) the transmissibility of a node when serving as the initial information source and (2) the transmissibility of a node during the process of information dissemination.
The assessment of node transmissibility as initial information sources involves the selection of nodes situated at various locations within the network to act as the starting points for information dissemination. Propagation simulation experiments are then conducted using the chosen models. These experiments serve to contrast the differing propagation extents of information originating from distinct starting nodes within a predefined time frame. Additionally, they facilitate an examination of the varying time intervals required for information to propagate to specified ranges from diverse starting nodes, each associated with a predetermined propagation range size.
In this experimental setup, multiple nodes with distinct in-degree centrality values within the social network are designated as the initial information sources. The chosen information propagation model is used to conduct the propagation experiments. The experiments are repeated, and the resulting average propagation time, as a function of the attributes of the starting node, serves as the evaluation criterion.
This approach enables an analysis of how different nodes, depending on their in-degree centrality, affect the dissemination of information throughout the entire network. It offers insight into the dynamics of information propagation and the role that individual nodes play in facilitating or hindering the spread of information within the social network.
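The repeated-experiment procedure can be sketched as follows. The code assumes that centrality is measured as in-degree divided by n − 1, uses a constant propagation probability `p` in place of the model's computed probability, and measures time in rounds; the function names and the toy follower graph are hypothetical.

```python
import random
from statistics import mean


def in_degree_centrality(followers):
    """In-degree of each node divided by (n - 1), from a u -> followers-of-u map."""
    nodes = set(followers) | {v for vs in followers.values() for v in vs}
    n = len(nodes)
    return {u: len(followers.get(u, [])) / (n - 1) if n > 1 else 0.0 for u in nodes}


def rounds_to_reach(followers, seed, target, p, rng, max_rounds=100):
    """Rounds needed until `target` nodes are activated, or None if never reached."""
    activated, frontier = {seed}, [seed]
    if len(activated) >= target:
        return 0
    for t in range(1, max_rounds + 1):
        nxt = []
        for u in frontier:
            for v in followers.get(u, []):
                if v not in activated and rng.random() < p:
                    activated.add(v)
                    nxt.append(v)
        if len(activated) >= target:
            return t
        if not nxt:
            return None
        frontier = nxt
    return None


def average_rounds(followers, seed, target, p=0.5, trials=200, rng_seed=0):
    """Average rounds over successful trials, plus the fraction of trials that succeeded."""
    rng = random.Random(rng_seed)
    results = [rounds_to_reach(followers, seed, target, p, rng) for _ in range(trials)]
    reached = [r for r in results if r is not None]
    return (mean(reached) if reached else None), len(reached) / trials


# Toy graph: a, b, c follow hub; d follows a; hub follows leaf.
graph = {"hub": ["a", "b", "c"], "a": ["d"], "leaf": ["hub"]}
centrality = in_degree_centrality(graph)
for start in ("hub", "leaf"):
    print(start, centrality[start], average_rounds(graph, start, target=4))
```

Reporting the success fraction alongside the average avoids biasing the comparison toward starting nodes whose cascades frequently die out before reaching the target range.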
Transmissibility of Nodes in the Information Dissemination Process: In the social network information dissemination model, this parameter characterizes the transmissibility of individual nodes. This research quantifies a node's ability to propagate information to other nodes during the information dissemination process and uses that quantity as the node's transmissibility. It serves as a fundamental metric for evaluating the node's influence and effectiveness in information diffusion within the social network. The calculation method is shown in Equation (9).
Equation (9) relies on two counts: the number of times a node propagates information during the dissemination process, and count_out, the number of times the information is spread. To measure the transmissibility of nodes in the information dissemination process, a small number of nodes with different degrees of centrality in the social network are used as the starting information nodes, and the information dissemination model is used for the dissemination experiment. The dissemination cascades obtained from the experiment serve as the data. The statistics in Equation (9) are computed for all nodes in the dissemination cascades to obtain the transmissibility of each node in the social network during information dissemination, that is, its dissemination influence. Following Equation (9), a value greater than or equal to 0 can be calculated for each node in the social network, indicating the number of times the node causes other nodes to spread information by spreading a piece of information itself.
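One possible reading of this statistic is a ratio: the number of times other nodes spread the information directly from a node, divided by the number of times the node itself spread it. The sketch below implements that assumed ratio over a set of simulated cascades; the ratio form, the function name `transmissibility`, the counter `count_propagated`, and the toy cascades are assumptions made for illustration, not a reproduction of Equation (9).

```python
from collections import Counter


def transmissibility(cascades):
    """Compute an assumed per-node transmissibility from simulated cascades.

    cascades -- list of cascades, each a dict mapping child -> parent
                (the starting node of a cascade maps to None)
    Returns a dict node -> count_out / count_propagated, always >= 0.
    """
    count_propagated = Counter()  # times a node propagated the information
    count_out = Counter()         # times others propagated it directly from the node
    for parent in cascades:
        for node, par in parent.items():
            count_propagated[node] += 1
            if par is not None:
                count_out[par] += 1
    return {n: count_out[n] / count_propagated[n] for n in count_propagated}


# Two toy cascades started from different nodes.
cascades = [
    {"s": None, "a": "s", "b": "s", "c": "a"},
    {"a": None, "c": "a"},
]
print(transmissibility(cascades))
```

Under this reading, a node that never passes the information on scores 0, and a node whose every propagation triggers several further propagations scores well above 1.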