1. Introduction
In the dynamically evolving landscape of digital interconnectedness, social network analysis (SNA) emerges as a crucial disciplinary field, positioned at the intersection of graph theory and sociology. This unique discipline offers a deep understanding of complex interactions among diverse entities such as individuals, organizations, or even URLs [1]. Utilizing nodes to represent entities and links to symbolize their interactions, SNA provides both a visual and mathematical perspective on human interconnections. This analytical approach reveals insights into community dynamics, market trends, and political movements, and finds practical applications in varied areas, ranging from influence marketing to crisis management and public health surveillance. The primary objectives of this analysis include the detection of communities [2,3], the prediction of potential links [4,5,6], and crucially the identification of influential actors [7], thus demonstrating its versatility and relevance in contemporary society.
The significance of identifying these influencers or opinion leaders cannot be overstated. Although they form a minority, these individuals can exert vast influence over the majority [8]. Their role becomes even more critical when considering the practical implications of their influence, such as mitigating rumor spread, controlling viruses [9,10], optimizing energy dissemination [10], and fortifying crucial zones against deliberate threats [7,11,12]. Over time, a rich set of methodologies has evolved to identify influential figures within social networks, each leveraging distinct features such as content and network structure [13].
Content-based detection methods focus on the impact of textual content, considering both linguistic criteria, like the nature of arguments or agreement/disagreement between users, and numerical criteria, such as response frequency, message size, or the extent of relationships. For example, empirical studies have compared messages from influencers with those from non-influencers to discern patterns [14]. Another content-based approach involves analyzing how influencers affect the themes and directions of conversations [15].
Approaches focusing on network structure, in turn, harness various structural components, with centrality-based methods being particularly prominent [16,17,18]. In these approaches, social networks are typically represented as simple undirected graphs, G = (V, E), where V symbolizes the set of vertices (network users), and E denotes the interconnections between users. Centrality measures are employed to capture the structural properties of these nodes. These measures assign real values to nodes, ranking them based on their significance within the network. Structural centrality encompasses two core types: local centrality measures, which are based on the immediate neighborhood of a node [19], and global centrality measures, which take into account the paths a node participates in across the wider network. Local centrality measures include degree centrality [19], local rank [6], and K-shell [20], while global measures feature metrics like eccentricity [21], closeness centrality [19], betweenness centrality [19], and Katz centrality [22]. However, it is crucial to note that despite their utility in understanding network structure, centrality-based methods may exhibit instability in large-scale networks. This is particularly evident in methods like degree centrality, which, while valuable, may overlook nodes with fewer direct connections yet substantial influence within the network’s broader context. This highlights the need for a more nuanced approach that balances content-based and structural insights to accurately map out the influence landscape within social networks.
While diffusion centrality has garnered attention for its efficacy in modeling the spread of influence, especially in epidemic scenarios, by considering both network topology and information dynamics, this paper introduces a novel methodological approach, centrality degree paths (CDPs), which extends beyond the scope of diffusion centrality. CDPs offer a unique perspective by intricately evaluating the influence exerted by nodes, not only based on their immediate reachability but also by considering the structural paths of influence, encompassing both direct and indirect connections. This nuanced focus allows CDPs to provide a more comprehensive understanding of a node’s influence within the network. Unlike diffusion centrality, which primarily focuses on the potential spread of influence, CDPs delve deeper into the underlying mechanisms of influence propagation, capturing the variability in the lengths of indirect paths and highlighting their significance in shaping network dynamics. By incorporating these structural insights, CDPs offer a more nuanced and detailed perspective on how influence percolates through the network, thereby enhancing our understanding of the dynamics of influence propagation within social networks.
As the field progresses, deep reinforcement learning (DRL) [23] has introduced a novel paradigm in influence maximization, employing iterative learning strategies to optimize the identification of key influencers. Despite the promise of DRL in revolutionizing influence strategies, it encounters notable hurdles, including computational complexity and the extensive data requirements for model training. These challenges, along with additional drawbacks such as training instability, highlight the need for methodologies that are not just efficient but also adaptable across diverse network environments. In this context, the proposed measure aims to address these specific issues by offering a less computationally intensive and more stable alternative for identifying key influencers, thereby ensuring its applicability and effectiveness in various network scenarios.
Addressing these challenges, this paper introduces a novel methodological approach in the field of social network analysis (SNA), standing out for its integration of innovative and unconventional elements. This approach emphasizes the importance of node degree while also considering the broader network context, and is grounded in a solid theoretical foundation. Based on the premise that direct connections (node degree) are significant, our study acknowledges that the true essence of a node’s influence often lies in its broader relationships within the network, including indirect paths of varying lengths. Therefore, rather than solely focusing on the degree of connectivity, we have incorporated the concept of paths of different lengths, aiming for a more nuanced understanding of influence within the network. This innovative approach highlights the importance of indirect paths and their variability in length for a more comprehensive analysis of influence in social networks.
The experimental framework of our study is carefully structured to test the effectiveness of our approach. We have selected a variety of real-world social networks for our analysis, each chosen for its unique characteristics and relevance to our research objectives. Our methodology integrates specific tools and techniques, including advanced statistical methods and computational algorithms, to evaluate the proposed centrality measure. To attest to the viability of our proposed methodology, we subjected it to rigorous testing using Spearman and Pearson correlations across selected real-world social networks. The initial results are promising, with our metric effectively highlighting influential nodes. Continuing our exploration, a cornerstone of our experimental analysis is the application of the susceptible–infected–recovered (SIR) model traditionally used in epidemiology and adapted in our study to simulate the spread of information and influence within social networks. The adaptability of the SIR model to our context demonstrates its utility beyond traditional public health applications, providing valuable insights into the dynamics of information flow and influence propagation in social networks.
The structure of this paper is as follows. Section 2 touches upon prior research, Section 3 details the proposed method, Section 4 offers a deep dive into our findings, and Section 5 wraps up with concluding remarks.
2. Related Work
Over the years, social network analysis has continuously evolved, consistently emphasizing the understanding of roles and significance of entities within a network. Among the various aspects studied, centrality computation has stood out as an essential component, having been at the forefront for several decades [24]. This importance stems from the underlying quest to discern and quantify the prominence or influence of individual actors within a broader collective or network. The centrality concept not only gauges the immediate impact of an actor but also reflects its broader implications on the network’s dynamics and flow. As we navigate this section, we will elucidate five fundamental definitions that have been instrumental in shaping the understanding of centrality: degree centrality, closeness centrality, betweenness centrality, PageRank [25] and CNA-GE centrality [26]. Each of these metrics offers a unique perspective on the role and influence of nodes within a network, contributing to the comprehensive landscape of social network analysis.
2.1. Degree Centrality
Degree centrality (DC) is foundational in centrality metrics. It quantitatively assesses a node’s importance based on its direct connections within the network. This metric’s intuitive nature correlates increased connections with enhanced influence. Mathematically, using the adjacency matrix $A = (a_{ij})$ of an undirected graph G with N representing the total number of nodes, the degree centrality for a node $v_i$ is defined as:
$$C_{deg}(v_i) = \sum_{j=1}^{N} a_{ij}$$
However, while DC offers valuable insights [27], it may have limitations, mainly when employed in complex scenarios such as web page graph analyses. Building on this, a study by Kitsak et al. [28] highlighted an intriguing observation: the most influential nodes are not necessarily the ones with the most connections. Their research led them to explore k-core decomposition, an iterative process that segregates nodes based on their minimum degrees [29,30]. A node’s coreness, indicating its rank in the decomposition hierarchy, is directly tied to its capacity to influence network dynamics [31]. In contrast to coreness, the H-index is a local centrality measure that utilizes only partial information, specifically the degrees of the neighbors of the nodes [32]. This emphasizes that while degree centrality offers a foundational understanding, capturing the nuances of influence in complex networks often necessitates a more multifaceted approach.
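As a minimal sketch of the adjacency-matrix view of degree centrality described above, the snippet below sums each row of a symmetric 0/1 matrix. The function name `degree_centrality` and the four-node graph are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def degree_centrality(adjacency):
    """Return the degree of every node from a symmetric 0/1 adjacency matrix."""
    # Row i of the matrix holds a 1 for every neighbor of node i,
    # so the row sum is the number of direct connections.
    return [sum(row) for row in adjacency]


# Hypothetical toy graph with edges 0-1, 1-2, 1-3 (node 1 is a small hub).
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

degrees = degree_centrality(A)
print(degrees)
```

Dividing each value by N - 1 would give the normalized variant that is sometimes used to compare networks of different sizes.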
2.2. Closeness Centrality
Closeness centrality is a fundamental metric within the suite of global centrality measures. It is predicated on the assumption that a node’s strategic position within a network is gauged by its global accessibility to all other nodes. For an actor within a social network, high closeness centrality suggests the ability to efficiently interact with others with minimal intermediation, thereby decreasing potential control by others. Mathematically, for an undirected graph G composed of N vertices, represented by its adjacency matrix A, the closeness centrality $C_{clos}$ of a node $v_i \in V$ is given by:
$$C_{clos}(v_i) = \frac{1}{\sum_{j=1}^{N} dist(v_i, v_j)}$$
where $dist(v_i, v_j)$ denotes the shortest distance between the vertices $v_i$ and $v_j$. The geodesic distance, which is the length of the shortest path connecting the nodes, is commonly employed as the measure of distance in this context, as suggested by Freeman [19]. It is essential to recognize that closeness centrality is only well defined for connected graphs, where a path exists between every pair of nodes, ensuring that the network’s global structure contributes to the centrality assessment.
This metric, while informative, assumes a network’s even and uninterrupted connectivity, which may not always be representative of real-world scenarios, where networks can be large, sparse, or fragmented. For instance, in a web graph analysis, where not all pages are equidistant, the utility of closeness centrality could be constrained. Similarly to degree centrality, while providing foundational insights, closeness centrality may not always capture the subtle intricacies of node influence in complex network structures.
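The following sketch illustrates closeness centrality on an unweighted, undirected graph, using breadth-first search to obtain geodesic distances. It assumes the unnormalized reciprocal-of-total-distance form; the helper names and the toy adjacency list are hypothetical. It raises an error on a disconnected graph, which reflects the connectivity requirement noted above.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic (hop-count) distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj, node):
    """Reciprocal of the summed shortest-path distances from `node`."""
    dist = bfs_distances(adj, node)
    if len(dist) < len(adj):
        # Some node is unreachable, so the distance sum is undefined.
        raise ValueError("closeness is undefined on a disconnected graph")
    total = sum(dist.values())
    return 1.0 / total if total > 0 else 0.0


# Hypothetical toy graph: node 1 is adjacent to nodes 0, 2, and 3.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
hub_score = closeness_centrality(adj, 1)
leaf_score = closeness_centrality(adj, 0)
print(hub_score, leaf_score)
```

The hub is one hop from every other node, so its distance sum is smaller and its closeness is higher than that of any leaf.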
2.3. Betweenness Centrality
Betweenness centrality (BC) quantifies how often an agent acts as a conduit on the most direct path between two other nodes. Crucial for discerning power dynamics in communication networks, BC measures an entity’s control over information flow. Conceptually, BC offers a probabilistic perspective: it quantifies the likelihood that information traveling between two distinct nodes will traverse through a given node [19]. Formally, the betweenness centrality for any node $v_k$ can be articulated as:
$$C_{bet}(v_k) = \sum_{i < j;\; i, j \neq k} \frac{\sigma_{ij}(v_k)}{\sigma_{ij}}$$
where $\sigma_{ij}(v_k)$ denotes the number of shortest paths linking nodes $v_i$ and $v_j$ that incorporate node $v_k$, while $\sigma_{ij}$ is the total number of geodesic paths between nodes $v_i$ and $v_j$. While traditionally attributed to Freeman’s seminal work, the concept of node betweenness centrality also draws on earlier ideas, such as those presented by Anthonisse in 1971 [33], who introduced the notion of “rush in a graph”, which conceptually aligns with what would later be detailed as betweenness centrality. Over time, the concept of betweenness centrality has evolved, incorporating nuances like link betweenness centrality or edge betweenness [34,35]. Before the conceptualization of betweenness centrality, Katz [22] laid the groundwork for understanding network dynamics with an innovative methodology. This approach, distinct from betweenness centrality, prioritizes all potential paths within a network, assigning diminishing significance to increasingly longer paths. By doing so, it effectively captures both the direct and indirect influences within the system, offering a foundational perspective on the importance of network paths that would later complement the development of betweenness and other centrality measures.
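To make the shortest-path counting behind betweenness centrality concrete, the sketch below uses Brandes' accumulation scheme on an unweighted, undirected graph. This is one well-known way to compute the measure, not a method taken from the paper; the function name and the toy graph are hypothetical, and scores are unnormalized with each unordered node pair counted once.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember each node's predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve.
    return {v: score / 2.0 for v, score in bc.items()}


# Hypothetical toy graphs: a star centered on node 1, and a 4-cycle.
star = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
star_scores = betweenness_centrality(star)
cycle_scores = betweenness_centrality(cycle)
print(star_scores, cycle_scores)
```

In the star, the center lies on the only shortest path between every pair of leaves. In the 4-cycle, opposite nodes are joined by two geodesics, so each intermediate node receives half a unit of credit.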
2.4. PageRank
PageRank, pioneered by Larry Page and Sergey Brin [25], revolutionized web search optimization. It ranks each node based on its connections and affiliations with significant nodes. The foundation of PageRank lies in the “Random Surfer” model, reflecting typical internet navigation patterns. The PageRank formula is given by:
$$PR(v_i) = (1 - d) + d \sum_{v_j \in M(v_i)} \frac{PR(v_j)}{deg_{out}(v_j)}$$
where $M(v_i)$ is the set of nodes for which there is a link to $v_i$ (i.e., $v_j \in V$, $(v_j, v_i) \in E$), $deg_{out}(v_j)$ is the outgoing degree of $v_j$, and $d$ is a damping factor. The computational complexity of the PageRank algorithm mainly arises from the matrix multiplication step, which has a time complexity of $O(nm)$, where m represents the number of iterations and n represents the number of nodes in the network.
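The iterative computation behind PageRank can be sketched with a simple power iteration. The version below is an illustrative assumption rather than the paper's implementation: it uses the normalized variant in which the teleport term is divided by the number of nodes so that scores sum to one, it spreads the rank of nodes without out-links uniformly, and the default damping value of 0.85 is a common convention, not a value stated in the source.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping node -> list of targets."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        # Each node passes an equal share of its rank along every out-link.
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical toy web graph: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(ranks)
```

Node c collects rank from two in-links and therefore ends up highest, while b, which nothing links to, keeps only the teleport share.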
2.5. CNA-GE Centrality
CNA-GE centrality, blending traditional network topology with gene expression data, advances node influence analysis in bioinformatics and systems biology. It evaluates node significance not only through structural connections but also by considering the biological data of gene expression levels. The centrality measure [26] for a node $v_i$ combines three quantities: its degree centrality $deg(v_i)$, its own gene expression level $expr(v_i)$, and the gene expression levels of its neighbors, where $\alpha$ and $\beta$ are weights modifying the impact of the node’s own and its neighbors’ gene expressions, respectively.
While CNA-GE introduces promising advancements, it also presents unique challenges, particularly in the accurate modeling and interpretation of the biological relevance of centrality scores. The accuracy of CNA-GE centrality heavily relies on the quality and granularity of the gene expression data. Furthermore, applying CNA-GE in non-biological contexts necessitates thoughtful adaptation to ensure that the gene expression component is suitably redefined to align with other types of nodal attributes.
Another line of work has led to the development of the “bridging centrality” metric [27], which is gaining increasing recognition for its proficiency in the analysis of complex networks. It is crucial to emphasize that the effort to combine basic degree metrics with advanced structural determinants introduces its own complexities, making the quest for a universally effective method a formidable challenge. Utilizing various structural properties to identify the most influential nodes in a network indeed proves to be an effective approach. Nevertheless, the selection of these properties for combination remains a challenging task.
Table 1 delineates a comprehensive comparison of CDPs with traditional centrality measures, illustrating how CDPs effectively address the limitations of conventional approaches by integrating advanced structural properties. Each row in the table breaks down a different centrality metric, providing insights into their inherent limitations and the unique benefits introduced by CDPs in assessing both overt and subtle network influences. This holistic view underscores the complexity and effectiveness of combining various network properties to enhance the identification and analysis of influential nodes across diverse network types.
3. The Proposed CDP Measure
In the vast field of network analysis, a variety of metrics have been introduced over the years. These measures, while insightful, often operate in isolation, focusing mainly on specific facets of node importance. In recognition of this limitation, this section introduces the centrality degree paths (CDPs) measure, a holistic approach that integrates multiple aspects of node influence. The following subsections present this approach in detail.
3.1. Problem Context
In this research, we delve into the structural analysis of an undirected and unweighted graph, representing a complex social network. Our primary objective is to pinpoint nodes that wield significant influence within this network. This task transcends the simple evaluation of nodes’ degrees, extending to a meticulous examination of paths of varied lengths. Such an approach is critical for encapsulating both direct and indirect interactions between nodes.
The impetus for identifying influential nodes is rooted in the need to comprehend the dynamics of information dissemination, the spread of trends, and the propagation of behaviors across the network. This understanding is pivotal for various applications, ranging from marketing strategies and public health campaigns to the analysis of social movements and the spread of misinformation. This necessitates a nuanced approach that goes beyond traditional analyses, addressing both the observable and the subtle, often overlooked, pathways of influence transmission.
To tackle this challenge effectively, we propose a multifaceted approach. This involves assessing not only the immediate reach of each node, represented by its degree, but also its extended influence, reflected in the network pathways it influences or controls.
The complexity of social networks, with their inherent unpredictability and nonlinear interactions, necessitates a methodology, both simple and robust, capable of capturing the nuances of these networks. Our approach, therefore, focuses on developing a more sophisticated model that considers various factors contributing to a node’s influence. This includes examining network topology, node centrality, and the role of clusters within the network.
While existing measures like eigenvector centrality and length-scaled betweenness also consider the impact of path lengths, CDPs take into consideration both direct connections (node degrees) and measure the impact of indirect paths through specific calculations that account for the variability in path length and their influence on determining a node’s centrality. This nuanced approach emphasizes the impact of both direct and extended network connections, thereby allowing CDPs to capture influence dynamics more comprehensively and address both overt and subtle influence pathways within the network.
Furthermore, recent works have explored the application of machine learning techniques to enhance network centrality measures [26,36], indicating a growing interest in advancing the field through innovative methodologies. Li et al. [37] have also delved into centrality learning, introducing novel approaches to capture the nuances of influence within networks. Each one of the previously proposed measures can be said to leverage network structures and paths to assess influence uniquely.
3.2. CDP Measure
Traditional local and global centrality metrics, while insightful, have inherent limitations, particularly when applied to specific network typologies. The amalgamation of multiple measures can provide enriched insights. In this research, we introduce the CDP score, a composite measure that accentuates degree centrality while incorporating the number of paths. Solely relying on degrees may result in potentially overlooking crucial network dynamics. The proposed approach enhances node importance using squared degrees and concurrently considers the number of simple paths. Let $G = (V, E)$ represent an undirected simple graph.
The CDP centrality score of a node $x \in V$ is defined as follows:
$$CDP(x) = \frac{(\deg(x))^2}{\sum_{y \neq x} |paths_{l \le d}(x, y)|}$$
where $\sum_{y \neq x} |paths_{l \le d}(x, y)|$ represents the aggregate count of all paths from node $x$ to any other node $y$ within the network, where the path length $l$ does not exceed the predefined maximum $d$. This variable $d$ acts as a control to limit the path length, ensuring that both direct and extended influences are captured. The variable $y$ iterates over all nodes reachable from $x$ under these conditions, allowing a comprehensive evaluation of $x$’s influence.
To derive the CDP centrality score for nodes in a graph, the methodology below is proposed.
Squared Degree of a Node, $(\deg(x))^2$: The degree of a node in a network graph denotes the count of edges connected to that node. Squaring this value accentuates nodes with high degrees relative to those with low degrees: it magnifies the significance of highly interconnected nodes more than a linear scaling would, yet avoids the extreme escalation that higher powers would produce, so that other structural features of the network are not overshadowed.
Path Count, $\sum_{y \neq x} |paths_{l \le d}(x, y)|$: This element of the formula counts the paths between node x and every other node y, including only those paths whose length l is less than or equal to a predefined distance d. This distinction is crucial, as it considers both the direct influence of a node (reflected by its degree) and its indirect influence via network connections within a determined range. This approach provides a more comprehensive view of a node’s capacity to disseminate influence across the network.
Division of the Two Components: Dividing the squared degree of a node by the count of paths within the stipulated distance normalizes the influence exerted by the node. This serves to mitigate the potential distortion caused by nodes with exceptionally high degrees, particularly in extensive or dense networks.
Collectively, the choice to square the degree underscores nodes that are not merely connected but highly interconnected, conceivably serving as hubs or pivotal influencers in the network. The incorporation of the path count acknowledges that influence within a network extends beyond direct connections to the way those connections spread through the network. Together, these choices furnish a nuanced and comprehensive measure of node influence.
This procedural approach facilitates the identification of pivotal nodes, which can be visualized and verified as illustrated in Figure 1.
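The components described above can be sketched in a few lines. The code assumes, following the component description, that the score is the squared degree divided by the number of simple paths of length at most d that start at the node; the function names, the recursive path enumeration, and the five-node graph are hypothetical illustrations and are not the network of Figure 1.

```python
def count_paths_up_to(adj, source, max_len):
    """Count simple paths of length 1..max_len that start at `source`."""

    def extend(node, visited, length):
        if length == max_len:
            return 0
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                # The path ending at nxt counts once, plus all of its
                # extensions that stay simple and within max_len.
                total += 1 + extend(nxt, visited | {nxt}, length + 1)
        return total

    return extend(source, {source}, 0)


def cdp_score(adj, node, max_len=2):
    """Squared degree divided by the bounded simple-path count (assumed form)."""
    paths = count_paths_up_to(adj, node, max_len)
    return len(adj[node]) ** 2 / paths if paths else 0.0


# Hypothetical toy graph: hub 0 with leaves 1 and 2, plus a tail 0-3-4.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
scores = {v: cdp_score(adj, v, max_len=2) for v in adj}
print(scores)
```

Exhaustive enumeration of simple paths grows quickly with d, which is one reason a small bound such as d = 2 keeps the computation tractable.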
3.3. Example
To illustrate the application of the CDP score, we consider Figure 1. For this analysis, the maximum path length is set to d = 2, focusing on direct neighbors and those of second-degree proximity. This parameter choice provides a balance between depth and computational feasibility.
The CDP scores in Table 2 display the influence of each node. As depicted, node 3 stands out as the most influential even though it does not have the highest degree, underscoring the advantage of the CDP measure.
To underscore the added value of the proposed metric, we turn our attention to the network depicted in Figure 1. When relying solely on degree centrality, node 5 stands out as the most influential with a degree of 4, followed by nodes 1, 3, 8, and 11, each having a degree of 3. However, when applying the CDP metric, node 3 emerges as the predominant node with a score close to 1.8. It is followed by node 5 with a score of 1.77 and node 8 with 1.5. Even though node 3 has a lower degree than node 5, its position within the network makes it more influential. In summary, this metric refines the ranking by accounting not only for degree centrality but also for the number of simple paths in the network.
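As a consistency check on these values, the arithmetic can be worked backwards. Assuming the score is the squared degree divided by the number of paths of length at most 2, the reported degrees and scores imply path counts of 9, 5, and 6 for nodes 5, 3, and 8. These counts are inferred here from the reported scores and are not read from Figure 1 or Table 2.

```latex
\begin{align*}
\text{node 5: } & \frac{4^2}{9} = \frac{16}{9} \approx 1.78 \quad (\text{reported as } 1.77), \\
\text{node 3: } & \frac{3^2}{5} = \frac{9}{5} = 1.8, \\
\text{node 8: } & \frac{3^2}{6} = \frac{9}{6} = 1.5.
\end{align*}
```

Under this reading, node 3 overtakes node 5 because its smaller squared degree is divided by a proportionally even smaller number of short paths.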