Article

Dynamic Community Detection Based on Evolutionary DeepWalk

Song Qu, Yuqing Du, Mu Zhu, Guan Yuan, Jining Wang, Yanmei Zhang and Xiangyu Duan
1 School of Computer Science and Technology, China University of Mining and Technology, No. 1 Daxue Road, Xuzhou 221116, China
2 State Key Laboratory of NBC Protection for Civilian, Beijing 100038, China
3 Department of Transportation and Marketing, Huaibei Mining Co., Ltd., Huaibei 235006, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11464; https://doi.org/10.3390/app122211464
Submission received: 15 September 2022 / Revised: 3 November 2022 / Accepted: 7 November 2022 / Published: 11 November 2022

Abstract

To fully characterize the evolution of the topological structure of dynamic communities, we propose a dynamic community detection method based on Evolutionary DeepWalk (DEDW) to address the high-dimensional and dynamic characteristics of such networks. First, DEDW solves the problem of data sparseness in dynamic network representation through graph embedding. Then, DEDW uses the DeepWalk algorithm to generate node embedding feature vectors, exploiting the stable, gradual change of the community structure. Finally, DEDW integrates historical network structure information to generate evolutionary graph features and performs dynamic community detection with the K-means algorithm. Experiments show that DEDW can mine the temporally smooth change characteristics of dynamic communities, solve the problem of data sparseness in node embedding, fully consider historical structure information, and improve the accuracy and stability of dynamic community detection.

1. Introduction

Complex networks are ubiquitous in real life, and many real systems are represented by complex networks, such as social networks [1,2], biological information networks [3,4], co-authorship networks [5,6], and so on. Analyzing the structure of a complex network is an important means of understanding its characteristics, and the community is one of the most basic structural features of a network. Communities are collections of nodes in a network such that connections within a collection are dense while connections between collections are sparse [7]. Community detection has found successful applications in public security [8], medical health [9], recommendation systems [10], link prediction [11], and many other fields.
Complex networks usually evolve dynamically. Analyzing dynamic network structures in real time and identifying the communities to which nodes belong has gradually become one of the main research directions in network analysis. Handling dynamic networks mainly involves the following two problems:
  • High-dimensional characteristics of the network. In real-world complex networks, the number of nodes often reaches tens of thousands. Networks are mainly represented in computer applications by two basic structures: the adjacency list and the adjacency matrix. Both storage modes are not conducive to computation and require large amounts of storage [12]. Moreover, most networks are sparse, which wastes storage and computing resources.
  • Dynamic characteristics of the network. The nodes and edges in a network change over time. These changes lead to corresponding changes in the network structure, even though most nodes and edges remain unchanged. Balancing the historical structure information against the current structure information has therefore become a major challenge.
For problem 1, graph embedding technology provides a good solution. Graph embedding offers an efficient way to model and represent dynamic networks: it maps a large-scale, high-dimensional network into a low-dimensional space and stores the structural and attribute characteristics of nodes and networks in a low-dimensional vector matrix whose storage requirement is far lower than that of the adjacency matrix representation. For problem 2, evolutionary clustering provides a solution. It assumes that, as networks develop, dynamic networks exhibit temporal smoothness and only local network elements change, so the historical structure has a strong influence on the current network. Using historical network structure information to regularize the current network allows the network structure to evolve smoothly.
Inspired by evolutionary clustering, we extend the idea of evolutionary clustering to graph embedding and propose the concept of evolutionary DeepWalk. DeepWalk [13] is one of the representative network structure analysis models; it learns hidden information from networks by combining random walks with the word2vec model in an unsupervised manner. Evolutionary DeepWalk fuses the historical embedding features of the previous snapshot network with the network embedding features of the current snapshot network to construct a new comprehensive embedding feature, so that the static DeepWalk algorithm can be applied to dynamic graph analysis. We then combine evolutionary DeepWalk with a clustering algorithm to build a dynamic community detection framework based on evolutionary DeepWalk (DEDW). Finally, we use artificial and real-world datasets to verify the performance of the algorithm.
The main contributions of our study can be summarized as follows:
  • According to the idea of evolutionary clustering, we propose the concept of evolutionary DeepWalk and extend the static graph embedding algorithm DeepWalk to the field of dynamic graph analysis;
  • We propose a dynamic community detection method based on evolutionary DeepWalk, which combines evolutionary DeepWalk with a clustering algorithm;
  • We compare our algorithm with five algorithms on ten datasets, including six artificial dynamic networks and four real-world dynamic networks. The experimental results demonstrate that DEDW improves accuracy, showing that network embedding is promising for dynamic community detection.
The remainder of this paper is organized as follows. Related work is surveyed in Section 2. Next, we describe the problem formulation and our solution in Section 3. We empirically validate our new approach in Section 4 and conclude in Section 5.

2. Related Work

In this section, we first introduce some works on graph embedding, which include static and dynamic methods. Then, we introduce some works on community detection.

2.1. Graph Embedding

Research on graph embedding algorithms and models mainly covers static and dynamic graph embedding. Static graph embedding algorithms mainly include matrix decomposition, random walk, deep learning, and other methods [14]. Dynamic graph embedding algorithms include matrix factorization, random walk, deep learning, and edge reconstruction methods [15].
Among static graph embedding algorithms, random-walk-based methods extend the Word2vec model from natural language processing and treat node information in the network as word information; DeepWalk [13] and Node2vec [16] are the main representative algorithms. Deep-learning-based methods mainly adopt deep neural networks with an encoder-decoder framework. SDNE [17] is the main representative: it considers the first-order and second-order similarity of the graph, designs highly nonlinear functions, and optimizes the objective function to obtain the embeddings. However, for a network with millions of nodes, the efficiency of this method is relatively low.
Among dynamic graph embedding algorithms, DynGEM [18] extends SDNE to the field of dynamic graph embedding. It uses a heuristic method to construct the first-order and second-order similarity of the networks and introduces regularization terms to prevent overfitting. DynGEM retains the embedding information from the previous moment so that the embedding model at the next moment can directly inherit the parameters of the model trained at the previous moment. However, since only the embedding information from the previous snapshot network is considered, other historical information may be ignored. Thus, Goyal et al. [19] put forward the Dyngraph2vec model, which takes the network structures at time steps t − l, t − l + 1, …, t as historical information when computing the embedding matrix at time t + 1, to reflect the cross interaction of nodes over the whole time series and the nonlinear interaction within each temporal snapshot network. The model offers three variants: Dyngraph2vecAE, Dyngraph2vecRNN, and Dyngraph2vecAERNN.

2.2. Community Detection

Community detection methods based on graph embedding usually embed the network into a low-dimensional space first and then apply a clustering algorithm, such as K-means, to the embedding results to assign nodes to communities.
In static community detection, some studies have tried graph embedding methods. Cavallari et al. [20] proposed the concept of community embedding and studied the distribution of community structure in low-dimensional space. They argue that community detection, node embedding, and community embedding form a closed loop; based on this closed-loop theory, they combined the three tasks to establish a community embedding framework. The CommunityGAN [21] framework focuses on the clique structure of the network and applies generative adversarial network theory to community detection for the first time. In the generator, CommunityGAN uses positive sampling for the clique distribution and negative sampling for the whole graph structure. However, both methods need to know the number of communities in advance, which is precisely the information that most real-world networks lack.
At present, most graph embedding methods are only applied to static community detection, while research on dynamic community detection is relatively scarce. According to the research framework, dynamic community detection methods can be divided into methods based on evolutionary graph clustering and methods based on incremental clustering.
The evolutionary graph clustering methods mainly use the idea of evolutionary clustering. They assume that network evolution has temporal smoothness: as the network develops over time, only a few nodes and edges change, and the overall structure of the network does not change drastically. Real network evolution basically conforms to this situation, and our method is also based on this premise. Evolutionary clustering was first proposed by Chakrabarti et al. [22] in 2006. It uses temporal smoothness to balance the communities obtained in two consecutive temporal snapshot networks. Its cost consists of a snapshot cost (CS) and a temporal cost (CT), where CS measures the quality of the community structure in the current temporal snapshot and CT measures how similar the community structure of the current temporal snapshot network is to that of the previous temporal snapshot. The cost function is given by Equation (1):
Cost = αCS + (1 − α)CT  (1)
FacetNet [23] is inspired by the idea of evolutionary clustering. It combines the historical community structure with the current community structure to detect communities and uses non-negative matrix factorization as a unified formulation. Ma et al. [24] proposed a graph-regularized evolutionary nonnegative matrix factorization algorithm for dynamic community detection based on the idea of evolutionary graph clustering; to balance the temporal cost and the snapshot cost, a regularization term is used to optimize the temporal cost within the overall objective function.
The incremental clustering methods mainly handle changing elements based on the behavior patterns of nodes and edges. When nodes and edges are added to or deleted from the network, the dynamic community detection algorithm judges and classifies the changed elements. Dyperm [25] is an incremental community detection method that takes permanence as the attribute of incremental elements. It identifies the changing elements and their associated elements in each scenario to discover changes in the temporal snapshot network.

3. The DEDW Algorithm

In this section, we first describe several concepts in dynamic community detection and define the symbols used in this article. Then we apply the static graph embedding algorithm DeepWalk to dynamic network computing by proposing the evolutionary DeepWalk method. Finally, we propose a dynamic community detection method based on evolutionary DeepWalk combined with the K-means algorithm. Figure 1 shows the research framework of this article. First, the idea of evolutionary clustering is introduced and extended to graph embedding. Second, the concept of evolutionary DeepWalk is proposed to fuse the historical embedding features of the previous snapshot network with the network embedding features of the current snapshot network; in this way, a new comprehensive embedding feature is constructed, so that the static DeepWalk algorithm can be applied to dynamic graph analysis. Third, we build a dynamic community detection framework based on evolutionary DeepWalk (DEDW) by combining evolutionary DeepWalk with a clustering algorithm. Finally, both artificial and real-world datasets are used to verify the performance of our work.

3.1. Problem Description and Symbol Definition

Definition 1. 
Dynamic network. A dynamic network G = (V, E) consists of a node set V = {v1, ⋯, vn} and an edge set E = {eij | ∀i, j ∈ {1, ⋯, n}, i ≠ j}. A dynamic network can be divided, from time step 1 to time step T, into a sequence of fixed-length temporal snapshot networks G = {g1, …, gt, …, gT}, 1 ≤ t ≤ T. In the temporal snapshot network gt = (Vt, Et), vit ∈ Vt denotes node i in gt, and eijt ∈ Et denotes the edge between node i and node j in gt.
In a dynamic network, the attributes of each node are represented as a feature vector. In this paper, we assume that the attributes of each node are stable and invariant across time steps. The original community structure is represented as an adjacency matrix; because nodes and their relationships are sparse, graph embedding is used to map the high-dimensional sparse adjacency matrix into a low-dimensional dense vector space.
Definition 2. 
Dynamic communities. The temporal snapshot network gt = (Vt, Et) can be divided into k communities. The community structure Ct can be expressed as Ct = {ct1, ct2, …, ctk}, 1 ≤ t ≤ T, with cti ∩ ctj = Ø, ∀i, j ∈ {1, ⋯, k}, i ≠ j. The dynamic communities of the dynamic network G = (V, E) can be represented by C = {C1, C2, ⋯, CT}.
In order to capture changes in dynamic networks, the dynamic graph G is partitioned into different time steps. In each time step, nodes may be added to or removed from G, and the edges associated with those nodes are added or removed accordingly. For simplicity, we generate an embedding matrix of G at each time step. At each time step t, the comprehensive features of the nodes are built from their current state (current features) and their previous state (historical features). Table 1 shows the symbols commonly used in this article and their descriptions.
In this paper, graph embedding is used to map the adjacency matrix to feature vectors. As one of the main techniques of graph feature learning, graph embedding has attracted great attention. Given a graph G, the aim of graph embedding is to learn a mapping function f : V → ℝ^d that maps the high-dimensional sparse adjacency matrix into a low-dimensional dense vector space, where d is the embedding dimension and d ≪ |V|. In this way, a dense, low-dimensional matrix is used for graph analysis instead of a sparse, high-dimensional adjacency matrix. Moreover, the embedding can represent relations between nodes that have no direct edge, which mitigates the sparse node interaction problem.
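To make the storage argument concrete, the following minimal Python sketch (sizes are illustrative and not taken from the paper) contrasts an n × n adjacency matrix with an n × d embedding matrix:

import numpy as np
import scipy.sparse as sp

n, d = 10_000, 128                              # illustrative sizes with d << |V|
adjacency = sp.random(n, n, density=0.001, format="csr", random_state=0)  # sparse n x n structure
embedding = np.zeros((n, d), dtype=np.float32)  # dense n x d node feature matrix, i.e., f : V -> R^d

# The embedding needs n * d floats regardless of how many node pairs interact,
# whereas a dense float32 adjacency matrix would need n * n entries.
print(embedding.nbytes)  # 10000 * 128 * 4 bytes, about 5 MB
print(n * n * 4)         # about 400 MB for a dense adjacency matrix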

3.2. Evolutionary DeepWalk

DeepWalk was the first algorithm to combine natural language processing with unsupervised feature learning on complex networks. First, a random walk sequence is generated for each node. Then, the sequences are fed into the skip-gram model [26] from natural language processing as node contexts to obtain the embedded feature vector of each node. The time complexity of DeepWalk has two main components: random walk sampling takes O(n + m) and the skip-gram model takes O(n log n), so the overall time complexity is O(n + m + n log n).
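As a reference point for the two-stage pipeline described above, the following is a minimal DeepWalk sketch (not the authors' implementation) that generates truncated random walks with networkx and trains a skip-gram model with gensim's Word2Vec, assuming gensim ≥ 4; the walk length, number of walks, and embedding dimension are illustrative choices.

import random
import networkx as nx
from gensim.models import Word2Vec  # skip-gram implementation

def random_walks(G, num_walks=10, walk_length=40):
    # Generate truncated random walks starting from every node.
    walks, nodes = [], list(G.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(v) for v in walk])  # Word2Vec expects sequences of tokens
    return walks

def deepwalk_embeddings(G, dim=64, window=5):
    # Train skip-gram (sg=1) on the walk corpus and return one vector per node.
    model = Word2Vec(random_walks(G), vector_size=dim, window=window,
                     min_count=0, sg=1, workers=4)
    return {v: model.wv[str(v)] for v in G.nodes()}

# Toy usage on a small static graph.
CF = deepwalk_embeddings(nx.karate_club_graph())  # node -> 64-dimensional vector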
Our method aims to analyze the community and its evolution as a unified, smooth process over time. That is, the evolution of the network community structure from time t − 1 to time t is relatively stable, and there is no abrupt mutation of the network structure. To achieve this goal, inspired by the cost function of evolutionary clustering (Equation (1)), we use the community structure at time t − 1 to regularize the community structure at time t. Therefore, we propose the concept of evolutionary DeepWalk. The calculation formula is given in Equation (2):
ComF = αHF + (1 − α)CF  (2)
where ComF (Comprehensive Features) represents the overall features of the community structure at time t, CF (Current Features) is the node embedding feature matrix obtained by applying DeepWalk to the network at time t, and HF (Historical Features) represents the comprehensive features of the network at time t − 1. The parameter α is a hyperparameter that controls the weight of the historical features relative to the current features.
In real-world networks, nodes change over time. New nodes join the current network and establish connections with other nodes, while some existing nodes leave the network and their connections are removed accordingly. From the above analysis, edge changes fall into two cases, edge addition and edge deletion, and node changes likewise fall into two cases, node addition and node deletion.
When the edges in the network change, the node binary tree constructed in the DeepWalk algorithm differs, so the node sequences generated by the random walks change, and hence the current embedding vector of each node changes. Therefore, edge changes require no additional handling in the algorithm.
Node changes alter the size of the overall embedding matrix. When dealing with node changes, our method operates on the embedded feature matrices obtained by DeepWalk: both the historical embedding matrix and the current embedding matrix must be processed. We define two types of node change, node addition and node deletion.
The addition of node i at time t is shown in Equation (3):
HFit = Ø and CFit ≠ Ø  (3)
The deletion of node i at time t is shown in Equation (4):
HFit ≠ Ø and CFit = Ø  (4)
The detailed handling of node changes is illustrated in Figure 2.
  • Node addition. At time t, if a new node i is added, a new feature vector CFit appears in the feature matrix CFt. To use Equation (2), the dimensions of HFt and CFt must both be n × d. Since HFt contains no entry for node i, we add a zero vector at the position corresponding to the new node i in HFt as a placeholder, so that a corresponding position exists when computing ComFt.
  • Node deletion. At time t, if a node i leaves the network, its feature vector no longer exists in CFt. It is only necessary to delete the feature vector HFit from HFt so that HFt and CFt keep the same number of nodes and dimensions, which makes computing ComFt straightforward.
AddNodes and DelNodes are two sets that store the added and deleted nodes computed according to Equations (3) and (4), respectively. If a node is to be added, a zero vector is inserted at the corresponding position to complete HFt; if a node is to be deleted, its vector is removed from HFt directly. EvoGE only processes the embedding feature matrices without other complicated steps, so its time complexity is O(1).
During the execution of Algorithm 1, the historical embedding matrix HFt and the current embedding matrix CFt of G at time step t are given. Taking Figure 2 as an example, Line 1 finds the nodes added to the graph at the current time step. In the top part of Figure 2, we can see that node i is added to the graph between the first two time steps (from t = 1 to t = 2), while from time step t = 2 to t = 3 node j leaves the graph; the deleted nodes are computed by Line 2. If Lines 3–5 detect that nodes have been added, corresponding zero vectors are added to HFt to keep the embedding matrices the same size. If the algorithm detects that nodes have been removed from the graph, the corresponding node vectors are removed from HFt (Lines 6–8). Finally, the comprehensive feature matrix is calculated by Line 9, yielding the final embedding matrix of the graph at time step t.
Algorithm 1: Evolutionary DeepWalk (EDW)
Input: Historical Features matrix HFt; Current Features matrix CFt
Output: Comprehensive Features matrix ComFt
1  compute AddNodes from Equation (3)    // get the nodes that are added
2  compute DelNodes from Equation (4)    // get the nodes that are deleted
3  if AddNodes ≠ Ø do                    // node addition
4      HFt.Add(0 ∗ AddNodes)
5  end if
6  if DelNodes ≠ Ø do                    // node deletion
7      HFt.Del(DelNodes)
8  end if
9  ComFt = αHFt + (1 − α)CFt
10 return ComFt
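The following is a minimal Python sketch of Algorithm 1 under the assumption that HFt and CFt are stored as dictionaries keyed by node id (a convenient stand-in for the row-indexed matrices); the zero-padding and deletion steps mirror Equations (3) and (4), and α defaults to 0.6, the value selected in Section 4.2.1.

import numpy as np

def evolutionary_deepwalk(HF, CF, alpha=0.6, dim=64):
    # Fuse historical features HF and current features CF into ComF (Equation (2)).
    # HF, CF: dict mapping node id -> np.ndarray of length dim.
    add_nodes = set(CF) - set(HF)   # Equation (3): present in CF but absent from HF
    del_nodes = set(HF) - set(CF)   # Equation (4): present in HF but absent from CF

    HF = dict(HF)                   # work on a copy of the historical features
    for v in add_nodes:             # node addition: reserve a zero vector in HF
        HF[v] = np.zeros(dim)
    for v in del_nodes:             # node deletion: drop the stale historical vector
        del HF[v]

    # ComFt = alpha * HFt + (1 - alpha) * CFt
    return {v: alpha * HF[v] + (1 - alpha) * CF[v] for v in CF}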

3.3. Dynamic Community Detection

According to the proposed EDW, the comprehensive embedding feature vector of each node in each temporal snapshot network can be obtained, and the community structure of the current temporal snapshot network can then be derived by applying a clustering algorithm. On this basis, we propose DEDW. Figure 2 shows an illustrative example of detecting communities with DEDW, and the procedure is specified in Algorithm 2.
Algorithm 2: Dynamic community detection framework based on EDW (DEDW)
Input: Dynamic network G = {g1, ⋯, gt, ⋯, gT}; weight parameter α
Output: Whole community structure C = {C1, C2, ⋯, CT}
1  C = Ø
2  if t = 1 do                    // deal with the initial network
3      HF2 = ComF1 = HF1 = CF1 = DeepWalk(g1)
4      C1 = K-Means(ComF1)
5      C.append(C1)
6  end if
7  for t = 2 to T do
8      CFt = DeepWalk(gt)
9      HFt = ComFt−1
10     ComFt = EvoGE(CFt, HFt)
11     Ct = K-Means(ComFt)
12     C.append(Ct)
13 end for
14 return C
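A compact sketch of the DEDW loop, reusing the deepwalk_embeddings and evolutionary_deepwalk helpers sketched above together with scikit-learn's KMeans; the number of communities k per snapshot is assumed to be given, and the names and defaults are illustrative rather than the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans

def dedw(snapshots, k, alpha=0.6, dim=64):
    # Dynamic community detection over a list of temporal snapshot graphs.
    communities, ComF = [], None
    for t, g in enumerate(snapshots):
        CF = deepwalk_embeddings(g, dim=dim)      # current features from DeepWalk
        if t == 0:
            ComF = CF                             # initial snapshot: ComF1 = CF1
        else:
            ComF = evolutionary_deepwalk(ComF, CF, alpha=alpha, dim=dim)  # HFt = ComFt-1
        nodes = list(ComF)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(
            np.vstack([ComF[v] for v in nodes]))
        communities.append(dict(zip(nodes, labels)))  # node -> community label at time t
    return communities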
As shown in Figure 2, in DEDW, the dynamic network G can be partitioned into three temporal snapshot networks: g1, g2, and g3.
  • Firstly, we perform community detection on g1: we use graph embedding to obtain the comprehensive feature matrix ComF1 and then apply the clustering algorithm to the feature matrix to obtain the community result at t = 1.
  • Then, observing g2, we can see that at t = 2 the network only adds a new node i. First, we use DeepWalk to embed g2 and obtain the current embedding feature matrix CF2, and we use ComF1 as the historical embedding feature matrix HF2 in this step. Since HF2 contains no feature vector for node i, the node changes in g2 must be handled. According to the row indices of the feature vectors, we identify the embedding feature vector of the newly added node i in CF2 and place a zero vector at the corresponding position in HF2. Then, we calculate ComF2 according to Equation (2) and apply the clustering algorithm to ComF2 to obtain the community structure of g2 at t = 2.
  • Finally, we deal with the changes of nodes and their relationships in g3. As shown in Figure 2, the main change in g3 is that node j leaves the current network, and the connections between this node and the other nodes are eliminated at the same time. First, we perform graph embedding on g3 to obtain the current embedding feature matrix CF3. Since CF3 contains no feature vector for node j, the relevant information of node j needs to be deleted from HF3. Then, we use Equation (2) to obtain ComF3 and apply the clustering algorithm to obtain the community structure of g3.
In the above method, DeepWalk is the static graph embedding algorithm and K-means is the clustering algorithm. When dealing with the initial network, the static graph embedding vectors are used directly as the comprehensive features of the network.
The time complexity of the framework consists mainly of four aspects:
  • The time complexity of DeepWalk;
  • The time complexity of evolutionary graph embedding;
  • The time complexity of K-means;
  • The number of temporal snapshot networks.
The main time consumption of DeepWalk lies in the random walk sampling and the skip-gram training. For a given graph with n nodes and m edges, the sampling strategy used for the random walks has O(1) time complexity per step [12]. Suppose the walk length is w and the number of walk sequences per node is L; the time complexity of sampling all sequences is then O(m) + O(n × w × L). DeepWalk is subsequently optimized by skip-gram with a total time complexity of O(E × n × w × L × K × log n), where E is the number of epochs and K is the window size. The total time complexity is therefore O(m) + O(nwL) + O(E × nLwK × log n). Because L, w, K, and E are small constants [27,28], the time complexity of DeepWalk is O(n + m + n log n). The time complexity of EvoGE is O(1). The time complexity of K-means is O(k × d × n); since k ≪ n and d ≪ n, with k and d small constants, it can be approximated as O(n). The number of temporal snapshot networks is T. Therefore, the time complexity of DEDW with DeepWalk and K-means as the graph embedding and clustering algorithms is O((n + m + n log n) × T).

4. Experimental Analysis

In this section, we start by briefly describing the datasets, evaluation criteria, and baseline methods, followed by detailed experimental results. The experimental environment is as follows: a laptop with 12 GB of memory and a 2.3 GHz Intel Core i5-6300HQ CPU running Windows 10 (64-bit); all algorithms are implemented in Python.

4.1. Datasets and Baseline Methods

4.1.1. Datasets

The datasets used in this experiment are continuous temporal snapshot networks, including artificial networks and real-world networks.
The artificial network datasets are generated by the dynamic LFR benchmark model [29]. The definition of the dynamic LFR benchmark model is as follows:
LFR = (N, s, k, maxk, μ, p)
where N represents the number of nodes, s denotes the number of time snapshots, k represents the average degree of the artificial networks, and maxk represents the maximum degree in the networks. μ represents the mixing coefficient, which controls the complexity of the network, and p is the probability that a node switches community membership between temporal snapshot networks. In the experiment, we set N = 1000, s = 9, and μ ∈ [0.3, 0.8]; only μ was varied to obtain six kinds of dynamic artificial datasets.
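The dynamic LFR benchmark of [29] is a standalone generator; as a rough stand-in, networkx's static LFR_benchmark_graph can produce one snapshot at a time with comparable parameters. The degree parameters and exponents below are illustrative (the paper reports only N, s, and μ), and the membership-switching probability p would still have to be applied between snapshots separately.

import networkx as nx

# One static LFR snapshot: N = 1000 nodes, mixing coefficient mu = 0.3,
# with assumed average degree 20 and maximum degree 50 (not reported in the paper).
# tau1 and tau2 are the degree and community-size exponents required by the generator.
G = nx.LFR_benchmark_graph(n=1000, tau1=2.5, tau2=1.5, mu=0.3,
                           average_degree=20, max_degree=50, seed=42)

# The generator stores each node's ground-truth community as a node attribute.
ground_truth_communities = {frozenset(G.nodes[v]["community"]) for v in G}
print(len(ground_truth_communities), "planted communities")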
The real-world network datasets are continuous-temporal snapshot networks with known community structures, including the following: (1) the face-to-face communication network PS, (2) the contact network CW of employees in an office building, (3) the teacher and student contact network HS2011 collected in December 2011, and (4) the teacher and student contact network HS2012 collected in November 2012. The detailed descriptions of the datasets are given in Table 2.

4.1.2. Baseline Methods

In this experiment, we chose five algorithms as the baseline methods: DeepWalk, Dyngraph2vecAE, DynGEM, Dyperm and Louvain [30].
Dynamic graph embedding algorithms such as Dyngraph2vecAE (DynAE) and DynGEM have been integrated into the DynamicGEM [31] library. We use K-means as the clustering algorithm to assign nodes to communities. Since Dyperm needs the community results of the initial network, to keep the comparison fair we select DeepWalk and K-means as the basic algorithms for the initial community detection.

4.2. Network Evaluation

In this section, we mainly analyze the performance from two aspects:
  • Analyzing the influence of the parameter α in EvoGE on the community detection results.
  • Analyzing the average performance of each algorithm on artificial and real-world datasets.

4.2.1. Parameter Analysis

The parameter α balances the ratio of historical node characteristics to current node characteristics: the larger the value of α, the greater the influence of the historical network structure on the current network. To analyze the influence of α in evolutionary graph embedding on dynamic community detection, this section studies how α affects the results on real-world networks. The parameter α takes values in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], and grid search is used to obtain the best setting according to the evaluation results of dynamic community detection. The specific evaluation results are shown in Figure 3: Figure 3a shows the average evaluation results of different α values on the real-world dataset CW, and Figure 3b shows the corresponding results on the real-world dataset PS.
As can be seen from Figure 3, when α = 0.6 the evaluation criteria on CW and PS are the best. This indicates that the historical network structure has a large impact on the current network structure during evolution. In the following experiments, we use α = 0.6 as the fixed coefficient for our method.
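A hedged sketch of the grid search described above, reusing the dedw sketch from Section 3.3 and scoring each candidate α with NMI (one of the evaluation criteria used in this section) via scikit-learn; ground_truth is assumed to be a list of per-snapshot dictionaries mapping each node to its true community.

import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def grid_search_alpha(snapshots, ground_truth, k, alphas=np.arange(0.1, 1.0, 0.1)):
    # Pick the alpha that maximizes the average NMI over all temporal snapshots.
    scores = {}
    for alpha in alphas:
        communities = dedw(snapshots, k, alpha=alpha)
        nmi = [normalized_mutual_info_score([truth[v] for v in comm],
                                            [comm[v] for v in comm])
               for comm, truth in zip(communities, ground_truth)]
        scores[round(float(alpha), 1)] = float(np.mean(nmi))
    best_alpha = max(scores, key=scores.get)
    return best_alpha, scores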

4.2.2. Overall Network Analysis

This section mainly discusses the community detection performance of each algorithm on dynamic networks. Figure 4 shows the effect of varying μ in the dynamic artificial LFR networks on the average evaluation results of each algorithm.
It can be seen from Figure 4 that, as the complexity coefficient μ increases, the average evaluation results of all algorithms on the artificial networks show a decreasing trend. This is because, when μ increases, the number of connections between communities in the LFR networks also increases, so the edge density within communities and between communities is not very different, making it more difficult to divide the community structure.
Figure 4 can be divided into two areas:
  • When μ ≤ 0.5, DeepWalk achieves the best results. It handles relatively simple networks better than the other dynamic graph embedding algorithms and community detection algorithms.
  • When μ ≥ 0.6, DEDW has the best performance. It can be inferred that our method partitions dynamic communities well in complex networks.
In addition, the average evaluation results of Louvain on the LFR networks are poor. The reason may be that the community structure of the generated artificial networks is not obvious, so modularity is hard to measure. When μ is between [0.3, 0.6], the evaluation results of DynGEM and DynAE decline sharply. This may be because these algorithms embed simpler networks well, but for complex network structures their ability to extract node features needs to be improved.
From the above results, our method has good detection performance for more complex social networks, whereas for networks with simple social relationships the detection performance of DEDW still needs to be improved. Figure 5 shows the average evaluation results of each algorithm on the real-world dynamic networks.
As can be seen from Figure 5, DEDW has the best evaluation results for community detection on the four real-world network datasets. The performance of DeepWalk is lower than that of DEDW because DEDW considers the entire network evolution process, uses the influence of the historical network structure on the current network, and uses graph embedding to obtain node features, thereby producing more accurate community results. The purity and homogeneity (homo) results of Dyperm on HS2011, HS2012, and PS are higher than those of the other algorithms. Purity and homogeneity measure the same aspect: both focus on whether the nodes of a detected community belong to the same real community, which indicates that the communities found by Dyperm emphasize homogeneity. The Louvain algorithm obtains relatively good results on the real-world networks, indicating that real-world networks conform more closely to the community characteristics of high cohesion and low coupling.

5. Conclusions

Dynamic network community detection is a topic with broad application prospects. This paper studies an important but largely understudied problem: the application of graph embedding technology to dynamic community detection. Inspired by evolutionary clustering, we propose the concept of evolutionary graph embedding, which uses historical embedding information to regularize the current embedding information and expands the application scope of static graph embedding to continuous temporal snapshot networks. Meanwhile, we build a framework for dynamic community detection based on evolutionary graph embedding, in which graph embedding and clustering algorithms can be combined for community detection. In addition, we applied this method to many artificial and real-world networks and achieved good results.
Since the experiments show that our method needs improvement for community detection on dynamic networks of relatively low complexity, in future work we will study how to enhance the graph embedding effect of this framework for relatively simple networks. At the same time, we will design graph embedding algorithms that take community structure into account and try to apply this method to heterogeneous networks.

Author Contributions

Funding acquisition, G.Y.; Methodology, S.Q. and Y.Z.; Validation, Y.D. and X.D.; Writing—review & editing, M.Z. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program under Grant 2020YFB1314100, the National Natural Science Foundation of China under Grants 71774159 and the funding of State Key Laboratory of NBC Protection for Civilian under Grants SKLNBC2020-23.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in the paper include artificial networks and real-world networks. The artificial networks are from reference [29] and now available at http://mlg.ucd.ie/dynamic/. The real-world networks are from reference [25] and now available at http://www.sociopatterns.org/datasets/.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Duan, X.Y.; Yuan, G.; Meng, F.R. Dynamic Community Detection: A Survey. J. Front. Comput. Sci. 2021, 15, 612–630.
  2. Wang, Y.; Piao, C.; Liu, C.H.; Zhou, C.; Tang, J. Modeling User Interests with Online Social Network Influence by Memory Augmented Sequence Learning. IEEE Trans. Netw. Sci. Eng. 2021, 8, 541–554.
  3. Bugnon, L.A.; Yones, C.; Milone, D.H.; Stegmayer, G. Deep Neural Architectures for Highly Imbalanced Data in Bioinformatics. IEEE Trans. Neural Netw. Learning Syst. 2020, 31, 2857–2867.
  4. Ma, X.; Sun, P.; Gong, M. An Integrative Framework of Heterogeneous Genomic Data for Cancer Dynamic Modules Based on Matrix Decomposition. IEEE/ACM Trans. Comput. Biol. Bioinf. 2022, 19, 305–316.
  5. Zhuang, H.; Sun, Y.; Tang, J.; Zhang, J.; Sun, X. Influence Maximization in Dynamic Social Networks. In Proceedings of the 13th IEEE International Conference on Data Mining, Dallas, TX, USA, 8–11 December 2013; pp. 1313–1318.
  6. Tang, J.; Fong, A.C.M.; Wang, B.; Zhang, J. A Unified Probabilistic Framework for Name Disambiguation in Digital Library. IEEE Trans. Knowl. Data Eng. 2012, 24, 975–987.
  7. Newman, M.E.J. Detecting Community Structure in Networks. Eur. Phys. J. B 2004, 38, 321–330.
  8. Calderoni, F.; Brunetto, D.; Piccardi, C. Communities in Criminal Networks: A Case Study. Soc. Netw. 2017, 48, 116–125.
  9. Taya, F.; de Souza, J.; Thakor, N.V.; Bezerianos, A. Comparison Method for Community Detection on Brain Networks from Neuroimaging Data. Appl. Netw. Sci. 2016, 1, 8.
  10. Rezaeimehr, F.; Moradi, P.; Ahmadian, S.; Qader, N.N.; Jalili, M. TCARS: Time- and Community-Aware Recommendation System. Future Gener. Comput. Syst. 2018, 78, 419–429.
  11. Soundarajan, S.; Hopcroft, J. Using Community Information to Improve the Precision of Link Prediction Methods. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 December 2012; pp. 607–608.
  12. Liu, Q.D. Research on Community Detection Based on Network Embedding. Ph.D. Dissertation, Lanzhou University, Lanzhou, China, 2018.
  13. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
  14. Qi, Z.W.; Wang, J.H.; Yue, K.; Qiao, S.J.; Li, J. Methods and Applications of Graph Embedding: A Survey. Act. Electron. Sinica 2020, 48, 808–818.
  15. Cao, Y.; Dong, Y.M.; Wu, S.Q.; Chen, H.H.; Qian, J.B.; Pan, S.L. Dynamic Network Representation Learning: A Review. Act. Electron. Sinica 2020, 48, 2047–2059.
  16. Grover, A.; Leskovec, J. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
  17. Wang, D.; Cui, P.; Zhu, W. Structural Deep Network Embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234.
  18. Goyal, P.; Kamra, N.; He, X.; Liu, Y. DynGEM: Deep Embedding Method for Dynamic Graphs. In Proceedings of the 3rd International Workshop on Representation Learning for Graphs, Melbourne, Australia, 19–25 August 2017; pp. 1–8.
  19. Goyal, P.; Chhetri, S.R.; Canedo, A. Dyngraph2vec: Capturing Network Dynamics Using Dynamic Graph Representation Learning. Knowl.-Based Syst. 2020, 187, 104816.
  20. Cavallari, S.; Zheng, V.W.; Cai, H.; Chang, K.C.C.; Cambria, E. Learning Community Embedding with Community Detection and Node Embedding on Graphs. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 377–386.
  21. Jia, Y.; Zhang, Q.; Zhang, W.; Wang, X. CommunityGAN: Community Detection with Generative Adversarial Nets. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 784–794.
  22. Chakrabarti, D.; Kumar, R.; Tomkins, A. Evolutionary Clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 554–560.
  23. Lin, Y.R.; Chi, Y.; Zhu, S.; Sundaram, H.; Tseng, B.L. Facetnet: A Framework for Analyzing Communities and Their Evolutions in Dynamic Networks. In Proceedings of the 17th International Conference on World Wide Web, Beijing, China, 23–27 April 2008; pp. 685–694.
  24. Ma, X.; Li, D.; Tan, S.; Huang, Z. Detecting Evolving Communities in Dynamic Networks Using Graph Regularized Evolutionary Nonnegative Matrix Factorization. Phys. A 2019, 530, 121279.
  25. Agarwal, P.; Verma, R.; Agarwal, A.; Chakraborty, T. DyPerm: Maximizing Permanence for Dynamic Community Detection. In Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, Australia, 15–18 May 2018; pp. 437–449.
  26. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the 16th International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013; pp. 1–12.
  27. Cai, H.; Zheng, V.W.; Chang, K.C.C. A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications. IEEE Trans. Knowl. Data Eng. 2018, 30, 1616–1637.
  28. Cui, P.; Wang, X.; Pei, J.; Zhu, W. A Survey on Network Embedding. IEEE Trans. Knowl. Data Eng. 2019, 31, 833–852.
  29. Greene, D.; Doyle, D.; Cunningham, P. Tracking the Evolution of Communities in Dynamic Social Networks. In Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, Odense, Denmark, 9–11 August 2010; pp. 176–183.
  30. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast Unfolding of Communities in Large Networks. J. Stat. Mech. 2008, 2008, P10008.
  31. Goyal, P.; Chhetri, S.R.; Mehrabi, N.; Ferrara, E.; Canedo, A. DynamicGEM: A Library for Dynamic Graph Embedding Methods. arXiv 2018.
Figure 1. The research framework of this article.
Figure 2. Main framework of DEDW.
Figure 3. Avg evaluation results of different α values in the real-world datasets. (a) Avg evaluation of different α with CW. (b) Avg evaluation of different α with PS.
Figure 4. Avg evaluation results of each algorithm in LFR datasets. (a) Avg FMI of algorithms with LFR. (b) Avg purity of algorithms with LFR. (c) Avg NMI of algorithms with LFR. (d) Avg homo of algorithms with LFR. (e) Avg comp of algorithms with LFR. (f) Avg V-measure of algorithms with LFR.
Figure 5. Avg evaluation results of each algorithm in the real-world dataset. (a) Avg results of algorithms with HS2011. (b) Avg results of algorithms with HS2012. (c) Avg results of algorithms with PS. (d) Avg results of algorithms with CW.
Table 1. Symbols and descriptions.
Symbol: Description
G: dynamic networks
g: temporal snapshot network
t: time snapshot
T: the number of temporal snapshot networks
m: the number of edges
n: the number of nodes
d: the dimension of embedding vector
C: community aggregation of dynamic network
HFit: the embedding vector of node i in historical embedding matrix at time t
CFit: the embedding vector of node i in current embedding matrix at time t
ComFt: the comprehensive features matrix of network embedding at time t
Table 2. Descriptions of datasets.
Dataset Name: Description
PS: The face-to-face communication network of 242 teachers and students in a primary school.
CW: A contact network of 145 employees in an office building in France.
HS2011: The network of 126 teachers and students of a Marseille high school in France within a 4-day period in December 2011.
HS2012: The contact network of 180 teachers and students of a Marseille high school in France within a 7-day period in November 2012.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
