1. Introduction
From transportation to information diffusion, networked systems provide an effective way to describe real-world behaviors [1, 2, 3, 4, 5]. A significant part of network analysis, especially social network analysis, is based on network structure [6]. Community structures, which can be regarded as clustering with linkage, are common in networks [7, 8], and the structural hole spanner (SHS), a node bridging different communities, often accompanies community structures. Community detection and structural hole spanner identification, two flourishing topics in network science, help us to understand the structural mechanisms of networked systems such as social networks and information diffusion networks. In the contact tracing and control of epidemics such as COVID-19, community detection and structural hole spanner identification jointly play a very important role. The restraint of contagions generally focuses on individual behaviors, but collective guidance is also very important. Guiding collective behaviors is less often implemented because most studies neglect the mesoscopic structure of transmission networks [9]. Finding the community structure and SHSs contributes to the understanding of epidemic spreading within and between communities, which helps to restrain further spread. During a pandemic, a disease spreads first within a community and then across communities, which is determined by human behavior patterns. Based on human interactions in physical space and cyberspace, the analysis of social networks is conducive to contact tracing. As shown in Figure 1a, with a network structure such as hierarchy and community, we can first detect the susceptible people with direct contact (“level 1” in the epidemic network) and then their direct contacts (“level 2” in the epidemic network), which may help us to contain the spread of infection. In an information diffusion network, information such as rumors usually forms in a local subnetwork and then spreads among different communities. As shown in Figure 1b, information transmission consists of inner-community spread and cross-community spread. If the path of cross-community transmission can be found and controlled as soon as possible, further spreading will be effectively contained. In other words, if we can quickly detect the community structure and identify the intermediary nodes between different communities of the diffusion network, we may effectively prevent the further spread of rumors [10].
The above applications show the importance of community detection and structural hole spanner identification. From the perspective of network structure, the community is defined at the mesoscopic level, and the SHS is an important node described at the microscopic level. Microscopic structural features coexist with mesoscopic structural features in real-world networked systems [11]. To some extent, the mesoscopic and microscopic properties of networks jointly determine the dynamics of complex networks [12].
However, most existing research has treated these two issues separately, ignoring the structural correlation between the mesoscale and the microscale. In topological space, the network modeling approach is intrinsically constrained by the fact that it can only account for pairwise interactions, which makes the structural relation between the mesoscale and the microscale elusive. As a result, some existing methods suffer from low accuracy and high computational costs. To solve this problem, we use network representation learning (NRL) to embed the network into a geometric space for analysis. NRL represents the nodes or links of a network in a low-dimensional space [13]. In geometric space, the relationship between nodes or edges of a network can be measured by a distance, which may provide metrics for detecting communities and structural hole spanners simultaneously.
In recent years, research on hyperbolic spaces has gained much attention in network science [14, 15, 16, 17]. Hyperbolic space is a geometric space of constant negative curvature that can be used to represent the generation of scale-free networks [18]. A common characteristic of many real-world networks is that their degree distributions follow a power law [19, 20, 21], which is the premise of embedding networks into hyperbolic space. Hyperbolic embeddings are able to preserve the linkage structure of a scale-free network in a low-dimensional space, especially for hierarchical networks with community structures. On the one hand, in geometric networks, the similarity or distance between nodes can be used to measure community structures. Communities and structural hole spanners are both related to the representation of similarity, and hyperbolic embedding can represent this similarity in a very low dimension. Hence, we may address these two tasks on the Poincaré disk simultaneously after hyperbolic embedding. On the other hand, hyperbolic embedding makes it possible to represent a complex network through an efficient and simple visualization.
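As a minimal sketch of how distance and similarity can be read from a hyperbolic embedding, the following example computes the angular difference and the hyperbolic distance between two embedded nodes. It assumes that each node carries native polar coordinates (r, theta) in a hyperbolic plane of curvature -1 and applies the hyperbolic law of cosines; the function names are illustrative and are not taken from the paper.

```python
import math


def angular_difference(theta_a, theta_b):
    """Smallest angle between two angular coordinates, in [0, pi]."""
    diff = abs(theta_a - theta_b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def hyperbolic_distance(r_a, theta_a, r_b, theta_b):
    """Distance between two points given in native polar coordinates.

    Assumes curvature -1 and uses the hyperbolic law of cosines:
    cosh(d) = cosh(r_a) cosh(r_b) - sinh(r_a) sinh(r_b) cos(delta_theta).
    """
    delta = angular_difference(theta_a, theta_b)
    arg = (math.cosh(r_a) * math.cosh(r_b)
           - math.sinh(r_a) * math.sinh(r_b) * math.cos(delta))
    # Guard against rounding slightly below 1, where acosh is undefined.
    return math.acosh(max(arg, 1.0))
```

Two nodes with a small angular difference are similar in this picture, while the hyperbolic distance also accounts for their radial coordinates.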
In this paper, we propose an efficient algorithm, SDHE, for simultaneously detecting community structures and structural hole spanners of scale-free networks. Specifically, we use the Poincaré disk model, a model of the two-dimensional hyperbolic plane, to embed high-dimensional networks into low-dimensional hyperbolic space, in which the angular distribution of nodes reveals their communities. Then, the critical gap, which is conducive to obtaining the angular region of structural hole spanners, is analyzed to detect communities of the network in the hyperbolic plane. Moreover, we study the link relationship between communities and structural hole spanners in hyperbolic space. Finally, we identify structural hole spanners via two-step connectivity. The main contributions of this article are highlighted as follows:
We analyze mesoscopic and microscopic structural features of scale-free networks and study the inter-community connection probability, which is described as the distance between the mesoscopic communities and the microscopic SHS in hyperbolic space.
By analyzing community structure and structural hole spanners bridging different communities in hyperbolic space, we find that low-dimensional similarity can be used to measure the community and SHS of networks. We obtain the critical gap for detecting communities and the angular region where structural hole spanners may exist.
Based on the analysis of the critical gap and angular region, we propose an algorithm, SDHE, for detecting communities and structural hole spanners simultaneously. Experimental results on synthetic networks and real networks demonstrate the effectiveness and efficiency of the proposed algorithm.
The rest of the article is organized as follows. Section 2 briefly reviews related work on community detection, SHS identification, and hyperbolic embedding. Section 3 introduces the essential notations and definitions used in this paper. Section 4 presents the theoretical analyses and the formulation of the algorithm. Section 5 discusses the performance of our proposed algorithm. In Section 6, we analyze the rationale for our algorithm and conclude the paper.
5. Results
In this section, we evaluate the effectiveness of the proposed algorithm. First, we briefly describe the network datasets and the compared methods. Then, we present the quality measures and evaluate the performance of the proposed algorithm on community detection and SHS identification. Finally, we discuss the results.
5.1. Datasets
We used three synthetic networks and nine real-world networks in our experiments. We utilized the nPSO model [65] to generate the synthetic networks. The nPSO model generates networks in hyperbolic space, where the heterogeneous angular attractiveness of nodes is preset by sampling the angular coordinates from a mixture of Gaussian distributions. We used the nPSO model to generate three networks with different parameters. In addition, nine real-world networks [66] were used in the experiments. Some indicators of these experimental networks are shown in Table 3.
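The angular sampling step mentioned above can be illustrated with a short sketch. It covers only the sampling of angular coordinates from a Gaussian mixture, not the full nPSO generator; equal mixture weights, equidistant component means, a shared standard deviation `sigma`, and the function name are all assumptions made for illustration.

```python
import math
import random


def sample_angular_coordinates(n_nodes, n_communities, sigma, seed=0):
    """Sample angles from a mixture of Gaussians wrapped onto the circle.

    Assumptions (not from the paper): equal mixture weights, component
    means spread evenly over [0, 2*pi), one shared standard deviation.
    Returns the sampled angles and the index of the generating component.
    """
    rng = random.Random(seed)
    centers = [2 * math.pi * c / n_communities for c in range(n_communities)]
    angles, labels = [], []
    for _ in range(n_nodes):
        component = rng.randrange(n_communities)
        # Wrap the Gaussian draw back onto the circle.
        theta = rng.gauss(centers[component], sigma) % (2 * math.pi)
        angles.append(theta)
        labels.append(component)
    return angles, labels


angles, labels = sample_angular_coordinates(n_nodes=100, n_communities=4, sigma=0.2)
```

A smaller `sigma` concentrates nodes around the component means and gives sharper angular communities, whereas a larger value makes neighboring communities overlap.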
5.2. Compared Methods
Based on the CGM, we compared SDHE with the following methods for detecting top-k structural hole spanners.
PageRank-based (PR) [67] is a classic ranking algorithm that assigns each node a PageRank score for ranking potential SHSs. This algorithm is widely used in industry, for example in Google Search.
Betweenness-centrality-based (BC) [68] scores each node according to the shortest paths that pass through it. Then, the algorithm selects the nodes with the top-k scores as the possible SHSs.
Two-step connectivity (2-Step) [69] counts the pairs of a node’s neighbors that are not directly linked. The larger this number, the more likely the node is an SHS.
HAM [43] formulates a harmonic modularity function for discovering the possible SHSs. The rationale is that nodes whose neighbors belong to more different subnetworks can be regarded as SHSs.
ESH [70] is an algorithm that simulates a factor diffusion process in SHS identification. To some extent, the motivation of ESH is similar to that of the label propagation algorithm (LPA) [71].
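Among the baselines listed above, two-step connectivity is simple enough to sketch directly. The example below counts, for each node, the pairs of neighbors that are not directly linked and ranks nodes by that count. The adjacency-dictionary representation, the function names, and the tie-breaking rule are assumptions made for illustration, and the code is not the implementation of [69].

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def two_step_score(adj, node):
    """Number of neighbor pairs of `node` that are not directly linked."""
    return sum(1 for u, v in combinations(sorted(adj[node]), 2)
               if v not in adj[u])


def top_k_candidates(adj, k):
    """Nodes with the k largest scores; ties are broken by node label."""
    scores = {node: two_step_score(adj, node) for node in adj}
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Two triangles (0, 1, 2) and (3, 4, 5) joined through node 6.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 6), (6, 3)]
adj = build_adjacency(edges)
candidates = top_k_candidates(adj, k=2)
```

In this toy graph, the nodes on the path between the two triangles obtain nonzero scores, while nodes that lie entirely inside a triangle score zero.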
The complexity of the aforementioned methods is briefly discussed as follows. The complexity of efficient embedding (EE) used for achieving the hyperbolic embedding is . The computational complexity of modularity Q is , so the CGM runs in , where represents the iteration times. For classic community detection algorithms, they usually have a high computational complexity. The Louvain algorithm runs in . The complexity of KL is . The CNM algorithm runs in . After obtaining the hyperbolic coordinates of the network nodes, we can use their geometric relationship to limit the angular area of structural hole spanners to a small angular region, which can greatly improve the efficiency of SHS detection. The computational complexity of the top-k SHS detection algorithm in this paper is . For some existing SHS detection algorithms, the complexity is often high because of the inevitable global search. For example, the computational complexity of betweenness is . That of two-step connectivity is . The HAM algorithm runs in . ESH and PageRank, although they both have a low complexity of , are unstable.
5.3. Evaluation Criteria
To evaluate the effectiveness of the algorithms, two elements should be taken into consideration: the quality of community detection and the accuracy of SHS identification. Commonly, modularity, which is described in Equation (11), is applied to estimate the quality of the detected community structures. In addition, the strength of ties, which is a quantitative measurement of edges, can be used for measuring the accuracy of SHS identification. Strong ties have greater values of strength, whereas weak ties have smaller values. From the point of view of topology, nodes and edges of a graph are equivalent. When a vertex has abundant edges connecting with vertices in one community, it can be considered that the vertex is strongly tied with the community. Specifically, we used the average connection strength of node i to quantify the degree of weak ties as follows
where
represents the connection strength of
and
;
represents the number of varied communities that
’s neighbors belong to. The average connection strength reflects connection properties for a specific node. When
is very small, node
is more likely to be an SHS.
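A rough sketch of how such an average connection strength could be computed is given below. It rests on an explicit assumption that is not confirmed by the text: the connection strength between a node and a community is taken to be the number of edges from the node to members of that community, and the average is taken over the distinct communities among the node's neighbors. The function and variable names are hypothetical.

```python
from collections import Counter


def average_connection_strength(adj, community, node):
    """Average number of edges from `node` to each neighboring community.

    Assumption: strength(node, c) is the count of neighbors of `node`
    that belong to community c; the mean is taken over the communities
    that appear among the neighbors.
    """
    counts = Counter(community[nbr] for nbr in adj[node])
    if not counts:
        return 0.0  # isolated node: no ties to any community
    return sum(counts.values()) / len(counts)
```

Under this reading, a node whose few edges are spread over several communities receives a small value, which matches the interpretation that small values indicate likely SHSs.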
In this paper, we combine these two evaluation indicators to measure the effectiveness of joint detection. Similar to the GR-score [16], a CS-score that covers the two tasks is introduced for comparing the performance of the methods. It is defined as follows.
where
k represents the number of selected top-k SHSs,
represents the average connection strength of node
i, and
Q represents the modularity. When the CS-score is higher, the performance of community and SHS detection is more effective.
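The modularity term Q can be illustrated with a short sketch. Equation (11) is not reproduced in this section, so the example assumes the standard Newman-Girvan modularity for an undirected, unweighted graph, in which each community contributes its fraction of internal edges minus the squared fraction of its total degree. The adjacency-dictionary input and the function name are assumptions.

```python
def modularity(adj, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    internal_twice = {}  # each internal edge is seen from both endpoints
    degree = {}
    for u, nbrs in adj.items():
        cu = community[u]
        degree[cu] = degree.get(cu, 0) + len(nbrs)
        for v in nbrs:
            if community[v] == cu:
                internal_twice[cu] = internal_twice.get(cu, 0) + 1
    return sum(internal_twice.get(c, 0) / (2 * m) - (degree[c] / (2 * m)) ** 2
               for c in degree)
```

For two triangles joined by a single edge and split into the two triangles, this gives Q = 5/14, while placing all nodes in one community gives Q = 0.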
5.4. Experimental Result
Some experimental results of our proposed algorithm are discussed in detail in this section.
We used the EE method to embed networks into hyperbolic space. Figure 5 shows the visualization result of hyperbolic embedding. In hyperbolic space, every point represents a node. The hyperbolic distance between two nodes represents the likelihood that they are linked, and the angular difference between two nodes indicates their similarity. We used the CGM to detect communities. Nodes in different communities are distinguished by different colors, which represent angular areas. Moreover, we obtained the angular region according to Equation (18) and located it at the community interval divided by the CGM. We then used two-step connectivity to select structural hole spanners. Three synthetic networks and nine real network datasets were used to conduct experiments, and the results are listed in Table 4. We compared our method with PageRank [67], betweenness [68], two-step connectivity [69], HAM [43], and ESH [70] for detecting top-k SHSs. Theoretically, an SHS is more likely to have a low strength of ties.
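The idea of separating communities by angular gaps can be sketched as follows. This is a simplified illustration and not the CGM itself: it takes a fixed, user-supplied `gap_threshold` in place of the critical gap derived in Section 4, and it cuts the circle of sorted angular coordinates wherever two consecutive nodes are farther apart than that threshold. All names are hypothetical.

```python
import math


def split_by_angular_gap(angles, gap_threshold):
    """Group nodes into angular sectors separated by large gaps.

    `angles` maps each node to its angular coordinate in [0, 2*pi).
    A cut is placed after a node when the gap to the next node in
    circular order exceeds `gap_threshold` (an assumed parameter).
    """
    order = sorted(angles, key=angles.get)
    n = len(order)
    if n == 0:
        return []
    cuts = []
    for i in range(n):
        nxt = order[(i + 1) % n]
        gap = (angles[nxt] - angles[order[i]]) % (2 * math.pi)
        if gap > gap_threshold:
            cuts.append(i)
    if not cuts:
        return [order]  # no gap is large enough: a single sector
    communities = []
    for idx, cut in enumerate(cuts):
        # Each sector starts right after the previous cut (circularly).
        j = (cuts[idx - 1] + 1) % n
        block = [order[j]]
        while j != cut:
            j = (j + 1) % n
            block.append(order[j])
        communities.append(block)
    return communities


example = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 3.0, "e": 3.1, "f": 6.2}
sectors = split_by_angular_gap(example, gap_threshold=1.0)
```

In the example, node "f" joins the sector of "a", "b", and "c" because the circle wraps around at 2π. Nodes next to the cuts lie in the intervals between sectors, which is where candidate structural hole spanners would be searched for.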
It can be seen from Table 4 that our algorithm outperforms the other algorithms on all datasets, especially on the synthetic networks. Because the community structures of the synthetic networks are ambiguous and overlapping in the hyperbolic plane, SHS identification can be efficiently achieved by our algorithm. Although the structure of the real networks is not as ambiguous as that of the synthetic networks, our algorithm still performs well on them.
To jointly compare the overall performance of community detection and SHS identification, the CS-score is used to evaluate the different algorithms. The CS-score consists of two parts: modularity and the strength of ties. The results are shown in Figure 6. Based on the CGM, the CS-score of our algorithm is higher than that of the other algorithms. This indicates that the proposed algorithm is effective in the joint detection of communities and SHSs.
6. Conclusions and Discussion
In this article, we propose a novel algorithm, SDHE, for simultaneously detecting communities and structural hole spanners of networks in hyperbolic space. Unlike common algorithms, the proposed algorithm avoids large-scale global searches. Specifically, by using the extended Poincaré disk model to embed the nodes of scale-free networks into hyperbolic space, we are able to utilize the critical gap and the angular region to detect communities and SHSs. The CS-score, which combines modularity and the strength of ties, is introduced to evaluate the performance of the proposed algorithm, and the computational results indicate that our algorithm outperforms the other detection algorithms. The main reason is that the network structure is well represented in hyperbolic space. Communities of the network are placed in sectors with different angular ranges in the hyperbolic plane, so detecting communities and structural hole spanners by means of hyperbolic geometry is efficient. In other words, the proposed algorithm avoids analyzing nodes near the center of each community, which reduces the computational complexity.
We used the proposed algorithm to analyze both synthetic networks and real networks. The experimental results show strong performance on the synthetic networks because the generative mechanism of these networks is consistent with our method. Our method also performs well on some real networks, provided that these networks have a good community structure in hyperbolic space and are not too sparse. The results indicate that the joint detection of mesoscopic and microscopic structures is effective and efficient in hyperbolic space because hyperbolic embeddings shed light on the hierarchy, community, and linkage of complex networks in a simple low-dimensional plane. In hyperbolic space, the similarity of nodes can be represented by the hyperbolic distance, which provides a metric for analyzing the network structure efficiently. Although embedding the network into the hyperbolic representation space takes some time, this step is necessary and instructive for network analysis.
Overall, the proposed algorithm for detecting communities and structural hole spanners simultaneously in hyperbolic space is effective and efficient, and it has good application prospects in fields such as contact tracing and rumor control. The main drawback of the proposed algorithm is that its total computational complexity is greatly affected by the hyperbolic embedding algorithm. Hence, future research could be directed towards developing highly efficient embedding algorithms that can represent real networks quickly and accurately in the representation space.