1. Introduction
Most real-world networks are complex networks with small-world, scale-free and strong-clustering properties. Complex network embedding is a valid tool for downstream network analysis tasks [1,2,3]. Many network embedding approaches based on Euclidean space have been well studied [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. Complex networks usually have latent tree-like and scale-free properties [20]; Euclidean space mappings cannot capture these features well. Accordingly, some researchers have proposed non-Euclidean network embedding [21,22]. They have shown that hyperbolic space is more suitable for representing complex networks with a tree-like hierarchical organization [20]. Hyperbolic space extracts the hierarchical topology by approximating tree-like structures smoothly with constant negative curvature, in contrast to flat Euclidean space [23]. Network hyperbolic embedding theory produces a geometrical representation of a complex network while preserving its small-world and scale-free properties well. It can effectively interpret the hierarchical topology characteristics and generation mechanisms of complex networks. Compared with network embedding in Euclidean space, dynamic hyperbolic space embedding is a research area not yet fully studied, and some existing hyperbolic embedding approaches are highly complex.
Current hyperbolic embedding approaches are mainly divided into three categories. The first category is based on manifold learning. LaBNE [24] is a data-driven manifold learning approach based on Laplacian network embedding; it works analogously to Laplacian matrix decomposition in Euclidean space by performing an eigenvalue decomposition of the Laplacian matrix. The second category is based on maximum likelihood estimation. HyperMap [25] infers angular coordinates by replaying the growth process of a network generation model: all nodes are sorted in descending order, and the possible angle values are traversed to maximize the likelihood function and find the most suitable angular coordinates. HyperMap-CN [26] derives the hidden geometric coordinates of nodes in a complex network from the number of common neighbors; it incorporates common-neighbor information into the HyperMap likelihood function to improve the accuracy of the embedded coordinates. The third category comprises hybrid approaches that combine manifold learning and maximum likelihood estimation. Although LaBNE has high embedding efficiency, it sacrifices embedding performance; conversely, HyperMap embeds accurately but with high complexity. Accordingly, LaBNE+HM [27] first uses LaBNE to obtain initial embedding coordinates and then applies HyperMap, sampling angles near the initial values, to obtain the final coordinates.
Nevertheless, most of the aforementioned methods are designed for static networks. In the real world, networks are inherently dynamic, with evolving characteristics. For example, nodes in social networks add and delete neighbors as social relations vary, and nodes in brain networks change their neighboring relations as new neuronal connections form. Efficiently representing networks whose nodes and edges vary is therefore crucial, especially for application scenarios that must remain stable as the network evolves [28]; this presents challenges for the embedding of dynamic networks.
Inspired by Node2vec [10], which extended DeepWalk by changing the random-walk method, researchers introduced temporal meta-paths [29,30] to modify the sampling method. Both approaches are derivatives of static approaches and do not capture well the dynamics and high-order proximity of nodes and edges in a local structure. High-order proximity has proven valuable for capturing network structure [31]. The work in [32] proposes separating the dynamic network into several snapshots and then performing static network embedding according to the variation between them. Inevitably, complex terms involving global structural information arise when preserving global higher-order proximities, which results in high complexity. Facing this dilemma, some researchers propose conducting dynamic embedding with network evolution taken into account [32,33,34,35]. They capture the characteristics of variations to reflect network dynamics and then use these features to improve the efficiency of application tasks. Cao et al. [33] reviewed current dynamic network embedding approaches. They point out that current embedding approaches have made breakthroughs in many respects, but problems remain. For example, how to effectively capture the influence of node variations on neighboring nodes and the local network structure is still a key problem. How to overcome issues such as data storage, training efficiency and heterogeneous information fusion [36] for large-scale network embedding is also not yet well addressed.
According to the above, the main challenges for hyperbolic space embedding of temporal complex networks include: (1) The embedding complexity of hyperbolic space is a key factor for complex network analysis efficiency. (2) Dynamic network embedding needs to be adaptive toward variations within network evolution.
In this paper, we propose low-complexity hyperbolic embedding schemes for temporal complex networks. First, we propose a low-complexity hyperbolic embedding approach using matrix perturbation with time-evolving features for medium-scale complex networks. Next, we propose a fast-update hyperbolic embedding approach with a local maximum likelihood estimation-based geometric initialization and R-tree-based local search for large-scale complex networks.
The main contributions of this paper are summarized as follows:
- (1)
We propose MpDHE to implement dynamic network hyperbolic embedding with low complexity. To the best of our knowledge, we are the first to model the increment of the network utilizing matrix perturbation when inferring hyperbolic coordinates.
- (2)
We compute a geometric initialization via hyperbolic circular-domain construction to extend the dynamic embedding to large-scale networks.
- (3)
We implement the proposed schemes in real-world network scenarios with several kinds of downstream application tasks, including community discovery, visualization and routing, which demonstrates their efficiency and effectiveness.
The remainder of the paper is organized as follows.
Section 2 gives some preliminaries for hyperbolic embedding.
Section 3 proposes a novel low-complexity embedding scheme for dynamic temporal complex networks.
Section 4 gives the performance evaluations.
Section 5 concludes the paper.
2. Some Preliminaries
2.1. Complex Network and Generation Model
In the real world, many complex systems can be represented by networks with a collection of nodes and edges, i.e., G = (V, E). Different from small-scale networks, most complex systems are large-scale networks following a power-law degree distribution. They are modeled as temporal complex networks over a time evolution process. Time is divided into consecutive time steps, which form a sequence of network snapshots, one per time step. The temporal complex network can be represented by G = {G^1, G^2, ..., G^T}. For each time step t, the adjacency matrix is denoted as A^t, whose element is 1 if there is an edge between nodes i and j and 0 otherwise. The Laplacian matrix of the graph is L = D − A, where D is a diagonal matrix with the node degrees on its diagonal (and 0 elsewhere). We assume all networks considered in this paper are connected; for unconnected networks, each connected subnetwork is considered separately. In this case, A, D and L are symmetric matrices, and the degree of each node is positive.
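As a concrete illustration of these definitions, the following minimal sketch builds the Laplacian L = D − A for a hypothetical pair of 4-node snapshots (the toy graph is ours, not one of the paper's datasets):

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    deg = np.diag(adj.sum(axis=1))  # D: node degrees on the diagonal
    return deg - adj

# Two snapshots of a toy temporal network: the edge (0, 3) appears
# between time step t and time step t + 1.
A_t = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
A_t1 = A_t.copy()
A_t1[0, 3] = A_t1[3, 0] = 1

L_t = laplacian(A_t)
# L_t is symmetric and each of its rows sums to zero, as L = D - A requires.
```

The perturbation matrices used later (Section 3.1) are simply the differences of such snapshot matrices, e.g., ΔA = A_t1 − A_t.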
Two commonly used complex network generation models are the popularity-similarity optimization (PSO) model [37] and the nonuniform PSO (nPSO) model [38]. The PSO model keeps a trade-off between node generation time and node similarity, where the node generation time is positively related to node popularity. The PSO model can generate a complex network of N nodes with real, known hyperbolic coordinates. The model parameters include the average node degree, the scaling exponent and the network temperature T. The PSO model simulates how random geometric graphs grow in hyperbolic space, generating realistic networks with small-world, scale-free and strong-clustering features. However, PSO cannot reproduce the community structure of a network, so the nPSO model was proposed to address this. It enables heterogeneous angular node attractiveness by sampling the angular coordinates from a tailored nonuniform probability distribution, e.g., a mixture of Gaussians. The nPSO model can explicitly determine the number and size of community structures while adjusting the network temperature, which controls network clustering, and it generates highly clustered networks efficiently.
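A minimal sketch of the PSO growth rule at temperature T = 0, as we read [37] (the radial drift uses beta = 1/(gamma − 1); function and parameter names are illustrative, and this is a simplification rather than a full implementation):

```python
import math, random

def pso_network(N, m, gamma, seed=0):
    """Sketch of the T = 0 PSO growth model: node t arrives at radius
    2*ln(t), older nodes drift inward (popularity fading), and the
    newcomer links to its m hyperbolically closest predecessors."""
    rng = random.Random(seed)
    beta = 1.0 / (gamma - 1.0)
    thetas, edges = [], set()
    for t in range(1, N + 1):
        thetas.append(rng.uniform(0, 2 * math.pi))
        r_new = 2 * math.log(t)
        # current radial coordinates of the t-1 existing nodes
        radii = [beta * 2 * math.log(s) + (1 - beta) * r_new
                 for s in range(1, t)]
        def d(s):  # hyperbolic distance from the newcomer to node s
            dth = math.pi - abs(math.pi - abs(thetas[t - 1] - thetas[s]))
            x = (math.cosh(radii[s]) * math.cosh(r_new)
                 - math.sinh(radii[s]) * math.sinh(r_new) * math.cos(dth))
            return math.acosh(max(x, 1.0))
        for s in sorted(range(t - 1), key=d)[:m]:
            edges.add((s, t - 1))
    return thetas, edges
```

Each new node thus balances popularity (small radius of old nodes) against similarity (small angular distance), which is exactly the trade-off the PSO model encodes.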
2.2. Hyperbolic Space and Poincare Disk Model
Hyperbolic space is hard to visualize and cannot be embedded isometrically into Euclidean space; in a sense, hyperbolic space is even "larger" than Euclidean space. In this paper, we use the Poincare disk model as the embedding target. The circumference and area of a hyperbolic disk of hyperbolic radius R centered at the origin are given by (1) and (2), where both the circumference and the area grow exponentially with the radius R. The hyperbolic space thus expands rapidly along the radius, and the region near the edge of the disk becomes extremely large, which is the most prominent feature of hyperbolic space.
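A quick numeric check of this exponential growth, assuming the standard closed forms for curvature −ζ², namely L(R) = (2π/ζ) sinh(ζR) and A(R) = (2π/ζ²)(cosh(ζR) − 1) (our assumption for what (1) and (2) state):

```python
import math

def circumference(R, zeta=1.0):
    # Circumference of a hyperbolic circle of radius R, curvature -zeta^2
    return 2 * math.pi / zeta * math.sinh(zeta * R)

def area(R, zeta=1.0):
    # Area of the hyperbolic disk of radius R
    return 2 * math.pi / zeta ** 2 * (math.cosh(zeta * R) - 1)

# Both grow like e^{zeta R}: the ratio between consecutive radii
# approaches e^zeta, unlike the polynomial growth of Euclidean disks.
ratios = [circumference(R + 1) / circumference(R) for R in (5, 10, 15)]
```

For ζ = 1 the step ratio converges to e ≈ 2.718 already at moderate R, illustrating the exponential expansion described above.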
Hyperbolic space is suitable for representing complex networks, especially structures with tree-like properties. In fact, hyperbolic space can be viewed as the continuous version of a tree-based network. For an n-ary tree in the network system, the circumference of the Poincare disk corresponds to the number of nodes exactly s hops from the root, approximately n^s, and the area corresponds to the total number of nodes within s hops of the root, approximately (n^(s+1) − 1)/(n − 1). If the curvature −ζ² of the hyperbolic space satisfies ζ = ln n, then the circumference and the area of the hyperbolic space increase at the rate e^(ζs), consistent with the growth rate n^s of the n-ary tree. In this case, the tree structure can be regarded as a discrete hyperbolic space, as shown in Figure 1.
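The correspondence can be checked numerically (a sketch assuming an n-ary tree in which every node has n children):

```python
import math

def nodes_at_hop(n, s):
    # Nodes exactly s hops from the root of an n-ary tree
    return n ** s

def nodes_within_hop(n, s):
    # All nodes within s hops of the root: 1 + n + ... + n^s
    return (n ** (s + 1) - 1) // (n - 1)

# With curvature -zeta^2 and zeta = ln(n), a hyperbolic circle of radius s
# grows like e^{zeta * s} = n^s, matching the tree's branching exactly.
n, s = 3, 6
zeta = math.log(n)
hyperbolic_growth = math.exp(zeta * s)  # equals n ** s
```

Matching ζ to ln n is what makes the tree "fit" into hyperbolic space without distortion; with any smaller curvature the space could not hold the exponentially branching structure.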
Branches of a tree structure need storage space of exponential magnitude, which hyperbolic geometry supports well. The scale-free, tree-like structure of complex networks fits the negative curvature and exponential expansion of hyperbolic space, so hyperbolic embedding approaches are well suited for geometry-based representation learning of complex networks. In hyperbolic embedding, the radial coordinate in the Poincare disk represents the popularity of a node, and the angular coordinates represent node similarities. Moreover, with the Poincare disk model we can effectively illustrate the evolution of complex networks as the competition between popularity and similarity. Further, the Poincare disk model is also effective for explaining the topology features of complex networks.
2.3. Initial Static Embedding
To mine the evolution characteristics of the network and to reduce complexity, we utilize the Laplacian matrix decomposition-based hyperbolic embedding approach LaBNE [24] to make an initial static embedding for the network snapshot at the initial time step. We then update the network embedding results for subsequent time steps by capturing the main variations in the topology structure. The complexity of hyperbolic embedding mainly comes from angular coordinate embedding, so the embedding for temporal complex networks focuses on the update of angular coordinates.
LaBNE for initial static embedding: A common assumption of hyperbolic network models is that the connection probability between nodes is negatively correlated with their angular difference, i.e., connected nodes have similar angles. In LaBNE, the network is embedded into the two-dimensional hyperbolic plane, represented by a Euclidean circle, which gives a matrix Y of shape n × 2 in which each row is the embedding coordinate of a node. Using the Laplace operator, the objective is to minimize the trace tr(Y^T L Y), which is the weighted sum of distances between adjacent nodes; minimizing it reduces the Euclidean distance between connected nodes. If nodes are distributed around a circle centered at the origin, then the Euclidean distance also reflects the angular difference. To avoid an arbitrarily scaled solution, the problem includes the additional constraint Y^T D Y = I. The optimization problem can be described as:
Using the Rayleigh–Ritz theorem, the solution consists of the eigenvectors corresponding to the two smallest non-zero eigenvalues of the generalized eigenvalue problem L y = λ D y. Since the smallest eigenvalue is zero, we take the eigenvectors corresponding to the second- and third-smallest eigenvalues.
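On a small example the generalized problem L y = λ D y can be solved via the symmetric reduction D^(−1/2) L D^(−1/2) z = λ z with y = D^(−1/2) z (a sketch of the math, not the paper's implementation):

```python
import numpy as np

def labne_vectors(A):
    """Return the eigenvalues of L y = lambda D y (ascending) and the
    eigenvectors for the 2nd- and 3rd-smallest eigenvalues."""
    deg = A.sum(axis=1).astype(float)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.diag(deg) - A
    # Symmetric reduction: same eigenvalues, transformed eigenvectors.
    M = d_inv_sqrt @ L @ d_inv_sqrt
    w, Z = np.linalg.eigh(M)            # eigenvalues in ascending order
    Y = d_inv_sqrt @ Z[:, 1:3]          # skip the zero eigenvalue
    return w, Y

# 5-node cycle: connected, so exactly one zero eigenvalue.
A = np.zeros((5, 5), dtype=int)
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
w, Y = labne_vectors(A)
```

Each row of Y then yields a node's angular coordinate via its polar angle, as described next.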
Embedding the network in a two-dimensional hyperbolic disk needs the radial and angular coordinates of the nodes. According to the conformal property, the angular coordinate of node i can be approximated as the polar angle of (y_i1, y_i2), where y_i1 and y_i2 are the first and second entries of the row vector corresponding to node i in Y. In addition, an extra equidistant adjustment step distributes the nodes evenly on the disk. There are two ways to calculate the radial coordinates: the PSO model and static estimation [39]. We choose the latter to obtain the radius R of the hyperbolic disk and the radial coordinates, which are calculated by (4) and (5),
where n is the total number of nodes in the network (the maximum connected subgraph) and γ is the power-law distribution exponent; T is the clustering coefficient of the network, and m is the number of edges.
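The conformal step and the equidistant adjustment can be sketched as follows (the even redistribution over [0, 2π) is one plausible reading of the adjustment step, not necessarily LaBNE's exact procedure):

```python
import math

def angular_coords(Y):
    # theta_i from the conformal map: the polar angle of row i of Y
    return [math.atan2(y2, y1) % (2 * math.pi) for y1, y2 in Y]

def equidistant_adjust(thetas):
    # Evenly redistribute angles over [0, 2*pi) while preserving their
    # circular order (assumed interpretation of the adjustment step).
    n = len(thetas)
    order = sorted(range(n), key=lambda i: thetas[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = 2 * math.pi * rank / n
    return out
```

The radial coordinates from (4) and (5) are then attached independently, since radius and angle encode popularity and similarity separately.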
The calculation of the radial coordinate embedding has low complexity. The calculation of the angular coordinate embedding needs the first two non-trivial eigenvectors of the generalized eigen decomposition, which is the dominant cost.
3. Hyperbolic Embedding Schemes for Temporal Complex Networks
We propose dynamic hyperbolic embedding schemes to tackle the challenges of complexity and dynamics in temporal complex networks. First, we propose the matrix perturbation-based dynamic network hyperbolic embedding scheme (MpDHE) to achieve low complexity, which is adaptive for networks of fixed scale. We then generalize MpDHE to large-scale networks by utilizing a geometric initialization in a hyperbolic circular domain.
Figure 2 shows the overview of the proposed scheme for dynamic hyperbolic embedding.
3.1. MpDHE Scheme
To reduce the time complexity of dynamic hyperbolic embedding, we propose using matrix perturbation [32,40] to update the embedding coordinates in the MpDHE scheme. Matrix perturbation is also commonly used in dynamic network embedding in Euclidean space. Compared with the matrices at time step t, a perturbation is involved in the matrices at time step t + 1. For a specific feature dimension i with its eigen pair, the generalized eigenvalue problem after the perturbation is shown in (6).
According to the matrix perturbation, the approximate solutions of the increments of eigenvalues and eigenvectors are shown in (7) and (8).
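The standard first-order increments can be sketched and verified against an exact re-decomposition on a toy matrix (our hedged reconstruction of what (7) and (8) express, assuming D-normalized eigenvectors with y_i^T D y_i = 1):

```python
import numpy as np

def eig_increment(lam, Y, dL, dD, i):
    """First-order increments of the i-th eigenpair of L y = lam D y
    under perturbations dL, dD.  lam: eigenvalues; Y: columns of
    D-normalized eigenvectors."""
    yi = Y[:, i]
    M = dL - lam[i] * dD
    d_lam = yi @ M @ yi                      # eigenvalue increment
    d_y = np.zeros_like(yi)
    for j in range(len(lam)):                # eigenvector increment as a
        if j != i:                           # combination of other modes
            d_y += (Y[:, j] @ M @ yi) / (lam[i] - lam[j]) * Y[:, j]
    return d_lam, d_y

# Toy check with D = I (ordinary symmetric eigenproblem).
L = np.array([[2.0, -1.0], [-1.0, 2.0]])
lam, Y = np.linalg.eigh(L)                   # lam = [1, 3]
dL = np.array([[0.01, 0.0], [0.0, 0.0]])
d_lam, d_y = eig_increment(lam, Y, dL, np.zeros((2, 2)), 0)
exact = np.linalg.eigh(L + dL)[0][0]         # exact smallest eigenvalue
# lam[0] + d_lam agrees with `exact` up to second-order error.
```

Because ΔL and ΔD are sparse when the network changes gradually, these updates cost far less than re-solving the full eigenproblem at every time step.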
Since the objective function of network embedding in Euclidean space coincides with the objective for solving angular coordinates in hyperbolic embedding, the above scheme can be used directly to incrementally update the angular coordinates in dynamic hyperbolic embedding. Specifically, the embedding vectors in Euclidean space are deduced as follows:
Then, according to the conformal property, the angular coordinates at time t can be calculated from the corresponding embedding vectors as follows:
Obviously, the differences in the network between consecutive time steps induce changes in the embedding vectors, which are analytically formulated with the incremental eigen decomposition. Therefore, the dynamic hyperbolic embedding at a later time step can be implemented with low complexity. Specifically, the time complexity of MpDHE is analyzed as follows, and the framework is summarized as Algorithm 1.
Time complexity analysis: Suppose T is the total number of time steps to be predicted. The radial coordinate embedding at each step has the same complexity as in LaBNE, so the total cost of the radial coordinate calculation grows linearly in T. For the angular coordinate embedding, the cost consists of the initial value setting, the update of the eigenvalues and the update of the k eigenvectors, where the latter two depend on the numbers of non-zero entries in the sparse perturbation matrices ΔL and ΔD, respectively. Since these perturbations are sparse when the network changes gradually, the MpDHE scheme effectively reduces the embedding complexity.
Algorithm 1: MpDHE algorithm
3.2. Geometric Initialization
However, the matrix perturbation in MpDHE cannot embed new nodes. Matrix perturbation updates are based on the eigenvectors from the previous time step; for a node that does not exist at time step t but appears at time step t + 1, there are no previous eigenvectors to update, so MpDHE is not applicable. For this reason, we use the eigenvectors at time step t to calculate initial values for the new nodes appearing at t + 1 and construct a geometric initialization via a hyperbolic circular domain.
Obviously, the hyperbolic distance between two nodes determines their connection probability in the hyperbolic disk: the shorter the hyperbolic distance, the larger the connection probability and the similarity. When a new node appears, original nodes far from it have little effect on its angular coordinate. The initial angular coordinate can, therefore, be calculated from the nodes with a small hyperbolic distance to the new node.
Based on the above, we first select the neighbors of the new node whose degrees are small and similar to its own, and set the mean of their angular coordinates as a basic approximation for the new node. The corresponding computation is shown in (12), where the set contains the selected neighboring nodes of the new node and m is the number of nodes in the set.
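Equation (12) averages neighbor angles; the sketch below uses the circular mean, which avoids the wrap-around problem of the plain arithmetic mean near θ = 0 (a variant we assume for robustness, not necessarily the paper's exact formula):

```python
import math

def initial_angle(neighbor_thetas):
    # Circular mean: average the unit vectors, then take the angle,
    # so neighbors at 0.1 and 2*pi - 0.1 average to ~0, not ~pi.
    x = sum(math.cos(t) for t in neighbor_thetas)
    y = sum(math.sin(t) for t in neighbor_thetas)
    return math.atan2(y, x) % (2 * math.pi)
```

This basic approximation then serves as the center of the hyperbolic circular domain constructed next.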
Then we construct the hyperbolic circular domain centered at the basic approximation and select the nodes inside the circle. The hyperbolic distance between two nodes is given by the hyperbolic cosine theorem, as shown in (13).
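Equation (13) is the hyperbolic law of cosines; a direct sketch in native polar coordinates:

```python
import math

def hyp_dist(r1, t1, r2, t2):
    """Hyperbolic distance between (r1, t1) and (r2, t2), via
    cosh d = cosh r1 cosh r2 - sinh r1 sinh r2 cos(dtheta)."""
    dtheta = math.pi - abs(math.pi - abs(t1 - t2) % (2 * math.pi))
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1
```

For nodes on the same ray (dtheta = 0) the distance reduces to |r1 − r2|, which is a convenient sanity check.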
Given the center of the hyperbolic circle of radius R, the radial coordinate of the boundary point corresponding to each angle on the disk is given by (14).
The hyperbolic circle is not easy to represent directly by an equation in the Euclidean chart. To quickly find the nodes inside the hyperbolic circle using an R-tree (rectangle tree) [41], we sample points on the circle boundary and outline a polygon contour that approximates it. An R-tree is a tree-based data structure for storing and quickly querying high-dimensional spatial data: its core strategy is to aggregate adjacent objects and use their minimum bounding rectangles as the nodes of each layer of the tree, which allows fast queries for the node set inside a polygon. Here we take the polygon contour outlining the hyperbolic circle as the input of the R-tree and approximately query the node set inside the circle. Considering the error introduced by the polygon approximation and by the R-tree search based on bounding rectangles, somewhat more nodes will be returned than the exact result. However, the nodes outside the search results are guaranteed to be more than distance R from the center of the circle, so they can be safely ignored, i.e., the R-tree search results cover the initialization range. The geometric initialization is then calculated as follows:
where the average is taken over the node set contained within the hyperbolic circular domain, normalized by the number of nodes in that set.
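A useful sketch of the bounding geometry: in the Poincare disk a hyperbolic circle is also a Euclidean circle, so the axis-aligned rectangle an R-tree indexes can be computed from the two diameter endpoints on the ray through the origin (an illustrative construction under that assumption, not the paper's sampled-polygon pipeline):

```python
import math

def poincare_circle(r0, theta0, R):
    """Euclidean center and radius, inside the Poincare disk, of the
    hyperbolic circle with native center (r0, theta0) and hyperbolic
    radius R.  Native radius r maps to Euclidean radius tanh(r / 2)."""
    a = math.tanh((r0 - R) / 2)   # signed near endpoint of the diameter
    b = math.tanh((r0 + R) / 2)   # far endpoint, same ray through origin
    rho_c = (a + b) / 2           # Euclidean center lies on that ray
    center = (rho_c * math.cos(theta0), rho_c * math.sin(theta0))
    return center, (b - a) / 2

def bounding_box(center, radius):
    # Axis-aligned rectangle an R-tree would store for this circle.
    cx, cy = center
    return (cx - radius, cy - radius, cx + radius, cy + radius)
```

Candidate nodes returned by the rectangle query can then be confirmed with the exact Euclidean (equivalently, hyperbolic) distance check, matching the over-then-filter behavior described above.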
4. Performance Evaluations
In this section, we verify the performance of our schemes with extensive evaluations. First, we evaluate the reliability of our schemes against the eigen decomposition-based scheme in terms of MSE. Then, we evaluate embedding precision on synthetic networks in comparison with other static hyperbolic embedding schemes. Afterward, we apply our schemes to different downstream tasks and compare them with the other hyperbolic embedding schemes.
4.1. Scheme Analysis
The proposed MpDHE combines matrix perturbation and conformal mapping to reduce the dynamic embedding complexity while preserving the embedding precision. The conformal mapping transforms Euclidean coordinates into hyperbolic coordinates and is lossless. However, using matrix perturbation for fast re-embedding inevitably incurs errors.
To analyze the reliability of the Euclidean coordinates obtained by matrix perturbation, we implement ablation experiments on 10 groups of synthetic networks. These groups of networks are constructed by nPSO with network scales ranging from 100 to 1000. The proportion of changed nodes between net0 and net1 is 5%. We calculate the mean square error (MSE) between eigenvectors obtained from eigen decomposition and matrix perturbation. The corresponding results are shown in
Figure 3.
All the MSEs are at a low level (under 0.05). Moreover, the MSE decreases as the number of nodes increases, indicating that the proposed method can embed large-scale networks with a convergent error.
4.2. Embedding Performance Evaluations
4.2.1. Settings
To verify the efficiency and effectiveness of our embedding schemes for complex networks with different parameters, we generate 10 PSO synthetic networks with combinations of the model parameters. The network scale is set to 10,000. We then implement LaBNE and the two proposed embedding schemes to embed the networks into the Poincare disk. The results are averaged over the ten networks with the above parameter configurations.
To perform evaluations in dynamic network scenarios, we generate two snapshots of networks with the above generation configurations; this can easily be extended to more snapshots. The first snapshot is the initial "net0", and the second snapshot is "net1". The second snapshot has a 1% change in nodes compared to the first, split between newly added nodes and varied old nodes according to the PSO model.
We perform evaluations on our MpDHE scheme with the performance bounds provided by the other static embedding schemes for dynamic network situations. Those static embedding schemes can still be used for dynamic scenarios if we treat each snapshot of the dynamic network as the static embedding scenario. Obviously, this procedure would incur high complexity, and it is not realistic for large-scale network application scenarios. In our evaluations, improved static embedding schemes are used as a precision bound for comparisons.
4.2.2. Baselines
In performance evaluation experiments, we compare the performance of the following static embedding methods to evaluate their effectiveness.
EE [42]: EE is an efficient hyperbolic embedding method with a greedy strategy, which combines common-neighbor information with maximum likelihood to optimize the embedding.
Coalescent [43]: Coalescent approximates the hyperbolic distance between connected nodes with two manifold-learning-based pre-weighting strategies. The final embedding vectors are adjusted via maximum likelihood.
LPCS [44]: LPCS is a novel hyperbolic embedding method utilizing the community information of the network; it embeds nodes from a common community to preserve the mesoscale structure of the networks.
CHM [45]: CHM detects communities of the network and then constructs a fast index to solve the maximum likelihood with the guide of the obtained communities.
Mercator [46]: Mercator embeds networks into its underlying geometric model by combining machine learning and maximum likelihood in a fast and precise mode.
LaBNE [24]: LaBNE is a manifold learning method based on the Laplace eigen decomposition. Embedding vectors are transformed into a two-dimensional hyperbolic plane according to conformal mapping.
4.2.3. Metrics
We use two metrics to evaluate the network embedding: the hyperbolic distance correlation and the concordance score.
Hyperbolic distance correlation (HD-corr): HD-corr is the Pearson correlation of the pairwise hyperbolic distance between initial coordinates and embedding coordinates. The Pearson correlation coefficient can measure the linear relation between two objects and estimate whether the linear relation can be fitted to a straight line. The closer the absolute value approaches 1, the stronger the correlation is. Otherwise, the closer the absolute value approaches 0, the weaker the correlation is.
Concordance score (C-score): C-score is the proportion of node pairs arranged in the same rotation direction in the initial network versus the embedded network, as shown in (16), where n is the number of nodes and i and j represent two nodes. If the direction (clockwise or anti-clockwise) of the shortest angular distance between i and j in the initial coordinates is the same as that in the embedding coordinates, the indicator is 1; otherwise it is 0. Similar to HD-corr, the C-score increases from 0 to 1 as the embedding performance improves. Therefore, these two metrics can guide the parameter selection of MpDHE; specifically, we choose parameters that yield high HD-corr and C-score.
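The C-score of (16) can be sketched directly from its definition (function names are illustrative):

```python
import math

def c_score(true_thetas, emb_thetas):
    """Fraction of node pairs whose shortest angular direction
    (clockwise vs anti-clockwise) agrees between the two coordinate sets."""
    def direction(a, b):
        # +1 if the shortest rotation from a to b is anti-clockwise
        return 1 if (b - a) % (2 * math.pi) < math.pi else -1
    n = len(true_thetas)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    same = sum(direction(true_thetas[i], true_thetas[j])
               == direction(emb_thetas[i], emb_thetas[j])
               for i, j in pairs)
    return same / len(pairs)
```

HD-corr is then simply the Pearson correlation between the lists of pairwise hyperbolic distances computed from the initial and the embedded coordinates.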
4.2.4. Results
First, we evaluate the embedding performance within different complex network configuration parameters.
Figure 4 shows the mean HD-corr under different complex network parameters, where each row represents one embedding scheme and each column corresponds to a specific power-law index. For each subgraph, the horizontal axis represents the temperature coefficient T and the vertical axis the average degree m. The color of the heat map reflects the HD-corr value; the embedding performance increases as the color deepens. The results show that the embedding performance of our scheme is close to the bound given by LaBNE across different network configurations. In addition, the optimal parameter combination appears at the lower-left corner of the heat maps for all schemes. This indicates that the hyperbolic embedding scheme is well suited for low-temperature, dense complex networks.
To further verify the adaptability of our embedding scheme within different networks, we perform evaluations on three groups of experiments in different scenarios.
The first group: We set the complex network configuration to a fixed parameter combination, with the initial network size set to 10,000 nodes. We then choose different proportions of changed nodes and generate 10 groups of networks. The embedding results in terms of HD-corr and C-score are shown in Table 1. Compared with the other static embedding methods, our scheme achieves very good embedding performance. As the proportion of changed nodes increases, the HD-corr and C-score of the embedding schemes slightly decrease. Meanwhile, the C-score is higher than HD-corr in the same situation, since HD-corr measures hyperbolic distances while C-score only measures relative angles.
The second group: We set the complex network configuration parameters the same as in the first group, fixing the node variation ratio at 1%. Table 2 shows the HD-corr and C-score with network scales from 1000 to 20,000. The results also show that our scheme achieves good embedding performance compared with the other methods.
The third group: We set the network scale to 10,000 nodes and the node variation ratio to 1%. We then extended the network from two time steps to six, i.e., net0, net1, ..., net5. The other network configuration parameters are the same as in the previous two groups. The results are shown in Table 3. We find that the HD-corr and C-score of our method slightly decrease with more updates but still stay above 0.96 and 0.99, respectively.
From the three groups of evaluations, we find that our dynamic embedding scheme is rather competitive, and sometimes even superior, compared with many static embedding schemes in terms of embedding accuracy. Coalescent is the only scheme that clearly exceeds ours. This shows that our dynamic updating process does not incur an obvious loss of embedding accuracy.
4.3. Embedding Efficiency Evaluations
4.3.1. Settings
We fix the proportion of changed nodes at 1%, vary the network scale of the initial network net0 from 2500 to 17,500 and use LaBNE and MpDHE to embed net1. The other parameters are set as in the previous experiments.
We make embedding efficiency evaluations on our proposed hyperbolic embedding scheme with both a synthetic network generated by the nPSO model and realistic, complex network datasets. To embody the dynamic network scenario, we involve two continuous time steps of network scenarios in the evaluation, which can also be easily extended to more continuous time steps. The dynamic network situation includes two continuous time steps: the initial “net0” and the second time step of “net1”. We compare the embedding time of different hyperbolic embedding schemes to “net1”. The details of the complex networks used for the evaluation are as follows.
nPSO-1: a synthetic network dataset generated by the nPSO model with 15 communities. The proportion of varied nodes between net0 and net1 is 20%.
BS: The network [48] records users' behavior on the Internet in a Chinese city over two weeks, using base stations (BS) as nodes. If people move between two base stations over a period of time, the two base stations are connected by an edge.
DBLP: The citation network of DBLP [49], a database of scientific publications. The dataset can be downloaded from http://konect.cc/networks/dblp-cite/, accessed on 7 September 2022.
arXiv-HepPh: A co-citation network of scientific papers from the high-energy physics phenomenology (Hep-Ph) part of arXiv [50]. The dataset can be downloaded from http://konect.cc/networks/ca-cit-HepPh/, accessed on 7 September 2022.
The statistical metrics of the complex networks generated by the nPSO model and taken from the realistic datasets are shown in Table 4, which lists, for each network, the numbers of nodes and edges in the initial network net0, the power-law distribution index of net0, and the numbers of newly added and deleted edges in net1 compared with net0.
4.3.2. Results
The embedding time results are shown in Table 5. Our scheme achieves the best time efficiency on all datasets: the static hyperbolic embedding schemes must recompute all coordinates each time the network changes, whereas our scheme only updates coordinates with low time complexity. In the previous section, Coalescent achieved the best precision, slightly exceeding our scheme; however, our scheme takes much less time while achieving very close precision, so it offers better embedding time efficiency with valid precision.
4.4. Visualization Effect for Downstream Community Discovery
4.4.1. Settings
We evaluate our proposed hyperbolic embedding schemes on the downstream community discovery task. The evaluations are implemented on both complex networks generated by the nPSO model and realistic complex network datasets.
4.4.2. Results
We implement our proposed scheme and then utilize the Critical Gap Method (CGM) [
51] to perform the downstream community discovery task on the resulting embedding. For comparison, we show the visualization produced by the classic Louvain community discovery algorithm [
52], which forms communities directly from the topology structure without any representation learning. The performance of community discovery is quantified by the modularity reported in
Table 6. Evidently, the modularity obtained by MpDHE remains high on most networks.
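As a rough illustration, the angular-gap idea behind the Critical Gap Method can be sketched as follows: nodes are sorted by angular coordinate and split into communities wherever the gap between consecutive angles exceeds a critical gap. In the published CGM the critical gap is derived from the expected gap distribution; in this minimal sketch it is a plain parameter g_c, and the function name is illustrative, not the authors’ code.

```python
import math

def cgm_communities(theta, g_c):
    """Sketch of angular-gap community splitting: sort nodes by
    angular coordinate and cut wherever the gap between consecutive
    angles exceeds the critical gap g_c.  Assumes theta values lie
    in [0, 2*pi) and the list is non-empty."""
    order = sorted(range(len(theta)), key=lambda i: theta[i])
    communities, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if theta[nxt] - theta[prev] > g_c:
            communities.append(current)
            current = []
        current.append(nxt)
    communities.append(current)
    # The circle wraps around: if the gap between the last and first
    # node (through 2*pi) is small, the two end groups are one community.
    wrap_gap = 2 * math.pi - theta[order[-1]] + theta[order[0]]
    if wrap_gap <= g_c and len(communities) > 1:
        communities[0] = communities.pop() + communities[0]
    return communities
```

For example, four nodes at angles 0.1, 0.2, 3.0, and 3.1 with g_c = 1.0 split into two communities across the large gap between 0.2 and 3.0.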
To conveniently show the visual effect of networks based on our scheme, we choose two medium-scale networks: one synthetic and one real.
Visualization of the synthetic network: Figure 5 and
Figure 6 show the visualization effect of the community discovery task for net0 on the nPSO-1 dataset.
Figure 5 shows the community discovery result obtained by the Louvain algorithm. In the figure, color represents community division: nodes of the same color belong to the same community.
Figure 6a shows the hyperbolic embedding coordinates of the network, and
Figure 6b is the result of community discovery using the CGM algorithm based on the embedding coordinates in
Figure 6a.
Comparing the above figures, we can see that the Louvain algorithm forms communities based only on the network topology, so the locations of nodes and the distances between them carry no obvious physical meaning. In contrast, community discovery based on the hyperbolic embedding yields a good visualization in which each node’s location reflects the balance between popularity and similarity. Two communities drawn with similar angular coordinates are similar to each other, and the radial coordinates (representing popularity) reveal the key nodes (circled in
Figure 6b) in the community.
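A minimal sketch of how such a plot can be derived from the embedding, assuming native polar coordinates (r, θ) and, as in popularity-similarity models, that a smaller radial coordinate indicates a more popular (higher-degree) node; both helper names are illustrative:

```python
import math

def polar_to_cartesian(coords):
    """Map native hyperbolic polar coordinates (r, theta) to Cartesian
    points for a 2D scatter plot of the embedding."""
    return [(r * math.cos(t), r * math.sin(t)) for r, t in coords]

def key_nodes(coords, k=3):
    """Pick the k nodes with the smallest radial coordinates; under the
    popularity-similarity assumption these are the most popular
    (highest-degree) nodes, i.e., the candidates to circle in a plot."""
    return sorted(range(len(coords)), key=lambda i: coords[i][0])[:k]
```

The Cartesian points can then be passed to any standard plotting library, with colors taken from the discovered communities.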
Visualization of the realistic complex network: For the realistic complex networks, we choose the “Students” dataset to show the visualization effect.
Figure 7 and
Figure 8 show the visualization results of the community discovery of net0 in the “Students” dataset.
Figure 7 shows the visualization of community discovery using the Louvain algorithm. The same color in the figure represents the same community.
Figure 8c shows the hyperbolic embedding coordinates of the network. As with the nPSO network, the visualization based on our embedding scheme is clearly better than that of the Louvain algorithm: interactions within and between communities are obvious and clear. If the labels and attributes of the nodes were known, such visualization effects could be improved further.
We then exhibit the visualization effect on the “Students” dataset over two consecutive time steps.
Figure 8a,b shows the evolution from net0 to net1 in the “Students” dataset. The results show that community interactions vary across the two time steps, but the overall community structure remains stable. Nodes with larger radial coordinates change more noticeably than nodes with smaller radial coordinates; that is, a node with a large degree does not incur a big change during the evolution, which is consistent with our basic assumption.
4.5. Important Nodes Searching for Downstream Routing
4.5.1. Settings
We evaluate performance on the downstream routing task using two datasets: the synthetic dataset nPSO-1 and the realistic dataset DBLP. The other settings are the same as in the previous evaluations.
4.5.2. Metrics
Traffic Load Centrality (TLC): First, we use TLC to measure the importance of nodes for routing. It assumes that each node sends a unit of some commodity (e.g., traffic) to every other node; the commodity is transferred from a node to the neighbor closest to the destination, and if more than one such neighbor exists, the commodity is divided equally among them. TLC is defined as the total amount of commodity passing through a node via these exchanges [
53]. The more commodity passes through a node, the more important the node is.
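The commodity-splitting process described above can be sketched in a few lines of Python; this is an illustrative implementation on an adjacency-list graph (assumed connected), not the code used in our evaluation:

```python
from collections import defaultdict, deque

def bfs_distances(adj, target):
    """Hop distance from every node to the target via BFS."""
    dist = {target: 0}
    q = deque([target])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

def traffic_load_centrality(adj):
    """TLC sketch: every node sends one unit of commodity to every
    other node; at each step the commodity moves to the neighbor(s)
    closest to the destination, split equally on ties.  A node's TLC
    is the total commodity passing through it as an intermediate."""
    tlc = defaultdict(float)
    for t in adj:
        dist = bfs_distances(adj, t)
        # Push commodity towards t, processing the farthest nodes first
        # so that all pass-through commodity has accumulated.
        amount = {v: 1.0 for v in adj if v != t}
        for v in sorted(amount, key=dist.get, reverse=True):
            nxt = [u for u in adj[v] if dist[u] == dist[v] - 1]
            share = amount[v] / len(nxt)
            for u in nxt:
                if u != t:          # the destination absorbs, not relays
                    tlc[u] += share
                    amount[u] += share
    return dict(tlc)
```

On a path 0–1–2, only the middle node relays commodity (one unit in each direction), so its TLC is 2 while the endpoints score 0.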
Hyperbolic Traffic Load Centrality (HTLC): HTLC is an approximation of TLC. It assumes each node sends a unit of some commodity to each other node, and from each node (except the destination), the commodity is equally divided to its greedy neighbors. HTLC is defined as the total amount of commodity passing through a node via these exchanges. Thus, HTLC considers greedy paths over hyperbolic coordinates instead of hop-measured shortest paths [
54].
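Under the same commodity-splitting scheme, HTLC can be sketched by replacing hop distances with hyperbolic distances between the embedded coordinates. The sketch below assumes the native polar representation and treats any neighbor hyperbolically closer to the destination as a greedy neighbor; commodity at a node with no greedy neighbor is dropped, modeling a greedy routing failure. This is an assumption-laden illustration, not the reference implementation of [54]:

```python
import math

def hyp_dist(a, b):
    """Hyperbolic distance between two (r, theta) points in the
    native polar representation of the hyperbolic plane."""
    r1, t1 = a
    r2, t2 = b
    dtheta = math.pi - abs(math.pi - abs(t1 - t2))
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(x, 1.0))  # guard against float round-off

def htlc(adj, coords):
    """HTLC sketch: as in TLC, each node sends a unit of commodity to
    every other node, but it is split equally among greedy neighbors,
    i.e., neighbors hyperbolically closer to the destination."""
    score = {v: 0.0 for v in adj}
    for t in adj:
        d = {v: hyp_dist(coords[v], coords[t]) for v in adj}
        amount = {v: 1.0 for v in adj if v != t}
        for v in sorted(amount, key=d.get, reverse=True):
            greedy = [u for u in adj[v] if d[u] < d[v]]
            if not greedy:
                continue            # greedy routing failure: drop
            share = amount[v] / len(greedy)
            for u in greedy:
                if u != t:
                    score[u] += share
                    amount[u] += share
    return score
```

On a path 0–1–2 whose middle node has a small radial coordinate (a hub in popularity-similarity terms), all transit commodity again flows through node 1.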
We calculate HTLC using the hyperbolic coordinates obtained from each scheme, respectively. We then compare the top-k nodes ranked by HTLC with those ranked by TLC, which does not need hyperbolic coordinates. The larger the intersection of the two top-k sets, the better the scheme.
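The comparison itself reduces to the size of the intersection of the two top-k sets; a hypothetical helper makes this explicit:

```python
def topk_overlap(tlc_scores, htlc_scores, k):
    """Fraction of the TLC top-k nodes also found in the HTLC top-k:
    the overlap used to judge how well an embedding supports routing."""
    top_tlc = sorted(tlc_scores, key=tlc_scores.get, reverse=True)[:k]
    top_htlc = sorted(htlc_scores, key=htlc_scores.get, reverse=True)[:k]
    return len(set(top_tlc) & set(top_htlc)) / k
```

An overlap of 1.0 means HTLC over the embedded coordinates recovers exactly the nodes that TLC deems most important.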
4.5.3. Results
The results of important node searching are shown in
Figure 9 and
Figure 10. Here, “prediction” means that we use HTLC to predict the top-k nodes as calculated by TLC. The results show that our scheme does not achieve the best result, but the difference between it and the other schemes is minor. This is consistent with our target, i.e., reducing the complexity of dynamic hyperbolic embedding while maintaining good precision for downstream tasks.