A Two-Stage Multi-Objective Evolutionary Algorithm for Community Detection in Complex Networks

Zhu, Wenxin; Li, Huan; Wei, Wenhong

doi:10.3390/math11122702

by Wenxin Zhu, Huan Li and Wenhong Wei ^*

School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China

^* Author to whom correspondence should be addressed.

Mathematics 2023, 11(12), 2702; https://doi.org/10.3390/math11122702

Submission received: 5 May 2023 / Revised: 31 May 2023 / Accepted: 8 June 2023 / Published: 14 June 2023

Abstract:

Community detection is a crucial research direction in the analysis of complex networks and has been shown to be an NP-hard problem (a problem that is at least as hard as the hardest problems in nondeterministic polynomial time). Multi-objective evolutionary algorithms (MOEAs) have demonstrated promising performance in community detection. Given that distinct crossover operators are suitable for various stages of algorithm evolution, we propose a two-stage algorithm that uses an individual similarity parameter to divide the algorithm into two stages. We employ appropriate crossover operators for each stage to achieve optimal performance. Additionally, a repair operation is applied to boundary-independent nodes during the second phase of the algorithm, resulting in improved community partitioning results. We assessed the effectiveness of the algorithm by measuring its performance on a synthetic network and four real-world network datasets. Compared to four existing competing methods, our algorithm achieves better accuracy and stability.

Keywords:

multi-objective optimization; community detection; complex network; evolutionary algorithms

MSC:

68T20; 90C27

1. Introduction

The relationships between individuals in the real world can be modeled as complex networks, such as social system networks, power system networks, transportation system networks, neural system networks, and the World Wide Web. A widely studied property of complex networks is their community structure, which is characterized by tight connections between nodes within the same community and sparse connections between nodes in different communities [1]. Studying the topological structure of communities in complex networks has notable theoretical and practical significance for discovering the interrelationships between communities and understanding network behavior. In recent years, community detection has made substantial progress and has garnered growing interest across various interdisciplinary fields [2,3,4,5].

Community detection is essentially the process of partitioning a network into vertex subsets based on the characteristics of network vertices. This process can be formulated as an optimization problem [6]. Over the past decade, many community discovery methods have been developed, and evolutionary algorithms (EAs) have proven to be effective in solving such problems [7]. EA-based community detection methods can be classified into two primary categories based on the number of optimized objectives [8]. The first type of algorithm aims to uncover community organization by optimizing a single objective. For example, Meme-Net [9] achieved favorable community divisions by maximizing modularity density, while CCDECD [10] and MA-Net [11] revealed the community structure by optimizing modularity Q. Although single-objective EA-based community detection algorithms have demonstrated favorable performance, it has been confirmed that the single-objective function of modularity Q suffers from the resolution limit problem [12]. To overcome this limitation, multi-objective evolutionary algorithms (MOEAs) have been utilized to optimize multiple objectives simultaneously. MOEA-based community discovery methods have shown superior performance in identifying communities in complex networks, leading to the development of many competitive algorithms. MOGA-Net, proposed by Pizzuti [13] for community detection in complex networks, is the first MOEA that simultaneously optimizes two objectives: community score and community fitness. Its superiority over single-objective EA community detection algorithms was demonstrated experimentally. Shi [14] developed a multi-objective community detection approach called MOCD, which optimizes two objectives that are both in conflict and complementary to each other.
Gong [15] proposed MOEA/D-Net, which applies a decomposition approach to community detection and decomposes the multi-objective problem into a series of weighted subproblems. They continued this research direction and proposed MODPSO [16], which incorporates the particle swarm optimization algorithm. Zou [17] designed a new algorithm for community detection using a discrete variant of the backtracking search optimization algorithm (DBSA) and a decomposition method. Ji [18] combined ant colony optimization and decomposition ideas to propose MOCD-ACO. Shang [19] proposed a new algorithm for community discovery called MOPIO, which improves upon existing methods by utilizing a multi-individual crossover update strategy and an enhanced PIO method. Su [20] proposed PMOEA, an algorithm designed to detect communities in large-scale networks. This algorithm employs multiple copies of an MOEA to detect communities related to key nodes in parallel, which has been shown to have good performance. Qi [21] developed the MOEA-MR algorithm for community partitioning by utilizing multi-layer network reduction techniques. The algorithm leverages non-negative matrix factorization for generating reliable prior information while utilizing a node-degree-based strategy for iterative network reduction to reduce the size of the prior information network. Guo [22] introduced a new multi-objective community detection method that uses a multi-objective PSO algorithm based on label propagation and a VGAE with GAT. This approach is implemented within a multi-objective PSO framework, resulting in an effective community detection algorithm.

The multi-objective evolutionary algorithm based on decomposition (MOEA/D) [23] is a concise and efficient algorithm that decomposes multi-objective optimization problems into weighted sub-problems and optimizes them simultaneously. MOEA/D has shown strong competitiveness in solving community detection problems [24,25]. In this paper, we propose a new algorithm called TSMOCD that builds upon this research direction. TSMOCD uses similarity coefficients of individuals and is divided into two stages that use different crossover operators. An external population is used to store elite individuals and perform boundary-independent node repair operations, resulting in improved community detection outcomes. The main contributions of this paper are as follows:

We propose a two-stage community detection algorithm that takes into account the characteristics of different crossover operators. The first stage uses the uniform crossover operator to exchange information between individuals and improve overall fitness. When the individual similarity parameter is high, the algorithm enters the second stage where a graph-connected component-based crossover operator is applied to partially retain good partition results of parent individuals.
TSMOCD incorporates an external population (EP) to store solutions with high fitness values, and a repair operation is applied to the boundary-independent nodes of individuals in the EP in order to reassign their community labels, further improving community partitioning outcomes.

2. Related Background

This section introduces some fundamental concepts related to community detection problems, their objective functions, and MOEA/D.

2.1. Community Detection Problems

Community detection is extensively studied as a valuable tool for understanding the structure and function of complex networks. Complex networks are typically modeled as an undirected graph G as follows:

G = (V, E)

(1)

where V is the set of vertices and E is the set of edges connecting the elements of V. Community discovery is the process of partitioning network nodes based on their connection density. A well-detected community in a graph is densely connected among its own nodes and has few connections to nodes outside the community [26]. Specifically, the purpose of community detection is to obtain a high-quality partition of the graph into subgraphs $G_{i}$ $(i = 1, 2, \dots, n)$ that satisfies the following:

G_{1} \cup G_{2} \cup \dots \cup G_{n} = G \quad \text{and} \quad G_{i} \cap G_{j} = \emptyset \ \text{for all} \ i \neq j

(2)

For example, Figure 1 shows a complex network with nine nodes, which is partitioned into two communities: one including nodes $\{1, 4, 5, 6\}$ and the other including nodes $\{2, 3, 7, 8, 9\}$.

Community detection can be formulated as a bi-objective optimization problem by minimizing the kernel k-means (KKM) and the ratio cut (RC), which have been widely used in recent studies [21,22]. The specific definitions are as follows:

KKM = 2(n - s) - \sum_{i = 1}^{s} \frac{L(V_{i}, V_{i})}{|V_{i}|}

(3)

RC = \sum_{i = 1}^{s} \frac{L(V_{i}, \bar{V}_{i})}{|V_{i}|}

(4)

where n is the number of nodes in the network, s represents the total number of communities in an individual, $V_{i}$ is the node set of the i-th community and $\bar{V}_{i}$ is its complement, $L(V_{i}, V_{i})$ represents the sum of node degrees within the community, and $L(V_{i}, \bar{V}_{i})$ is the sum of node degrees between nodes in the community and nodes outside the community. KKM measures the overall density of links within communities, and it tends to decrease as the number of communities grows. In contrast, RC exhibits the opposite trend, so KKM and RC are conflicting objectives. By simultaneously minimizing these two objectives, the links within communities can be dense, while those between communities can be sparse, meeting the requirements of both community detection and multi-objective optimization problems.
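As a minimal sketch of Equations (3) and (4), the following Python snippet evaluates both objectives on a toy graph. It assumes the common reading of L(A, B) as the number of ordered pairs (u, v) with u in A, v in B, and an edge between them, so each internal edge is counted twice in L(V_i, V_i). The helper names `link_sum` and `kkm_rc`, the 0-based node labels, and the toy graph are illustrative choices, not part of the original paper.

```python
def link_sum(adj, a, b):
    """L(a, b): number of ordered pairs (u, v), u in a, v in b, joined by an edge."""
    return sum(1 for u in a for v in adj[u] if v in b)


def kkm_rc(adj, communities):
    """Return (KKM, RC) for a partition given as a list of node sets."""
    n = len(adj)          # number of nodes
    s = len(communities)  # number of communities
    nodes = set(adj)
    kkm = 2 * (n - s) - sum(link_sum(adj, c, c) / len(c) for c in communities)
    rc = sum(link_sum(adj, c, nodes - c) / len(c) for c in communities)
    return kkm, rc


# Toy graph (assumed for illustration): two triangles joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

two = kkm_rc(adj, [{0, 1, 2}, {3, 4, 5}])   # split into the two triangles
one = kkm_rc(adj, [{0, 1, 2, 3, 4, 5}])     # everything in one community
```

On this toy graph the two-community split gives KKM = 4 and RC ≈ 0.667, while the single-community split gives KKM ≈ 7.667 and RC = 0, so neither solution is better in both objectives.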

2.2. Multi-Objective Optimization

The multi-objective optimization problem is defined as follows:

\text{minimize} \ F(x) = (f_{1}(x), f_{2}(x), \dots, f_{m}(x))^{T} \quad \text{subject to} \ x \in Ω

(5)

where $Ω \subseteq R^{n}$ is the variable space, $x = (x_{1}, x_{2}, \dots, x_{n})^{T} \in Ω$ is the n-dimensional vector of decision variables, and $F : Ω \to R^{m}$ consists of m real-valued objective functions. The objective functions usually conflict with each other, and we need to find a set of optimal solutions that can strike a balance between them. The optimal trade-offs among the objectives can be described by Pareto optimality. Let $u, v \in R^{m}$; u is said to dominate v if and only if $u_{i} \leq v_{i}$ for every $i \in \{1, 2, \dots, m\}$ and $u_{j} < v_{j}$ for at least one index $j \in \{1, 2, \dots, m\}$. A point $x^{*} \in Ω$ is Pareto optimal if there is no point $x \in Ω$ such that $F(x)$ dominates $F(x^{*})$. $F(x^{*})$ is then called a Pareto optimal objective vector, and the set containing all Pareto optimal objective vectors is referred to as the Pareto front.
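The dominance relation for minimization can be sketched in a few lines of Python. The function names `dominates` and `non_dominated` and the sample objective vectors are hypothetical and serve only to illustrate the definition.

```python
def dominates(u, v):
    """True if u dominates v: no worse in every objective, strictly better in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def non_dominated(points):
    """Keep the objective vectors that no other vector in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (KKM, RC)-style pairs, both to be minimized.
points = [(4.0, 0.67), (7.67, 0.0), (8.0, 0.5)]
front = non_dominated(points)
```

Here (8.0, 0.5) is dominated by (7.67, 0.0), so only the first two vectors remain in `front`.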

2.3. MOEA/D

MOEA/D is a multi-objective evolutionary algorithm that utilizes decomposition techniques to solve multi-objective optimization problems. It decomposes the problem into a set of single-objective subproblems and optimizes them concurrently. There are three commonly employed decomposition methods: weighted sum, Tchebycheff, and penalty-based boundary intersection [27]. The Tchebycheff decomposition method is used in this paper, and its formula is as follows:

g^{te}(x | λ^{i}, z^{*}) = \max_{1 \leq j \leq m} \{λ_{j}^{i} |f_{j}(x) - z_{j}^{*}|\}, \quad \text{subject to} \ x \in Ω

(6)

where $λ^{i} = (λ_{1}^{i}, λ_{2}^{i}, \dots, λ_{m}^{i})$ is the weight vector of the i-th subproblem (the weight vectors are uniformly distributed in the objective space), $z^{*} = (z_{1}^{*}, z_{2}^{*}, \dots, z_{m}^{*})$ is the reference point, and $z_{j}^{*} = \min \{f_{j}(x) | x \in Ω\}$ for each $j \in \{1, 2, \dots, m\}$.
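A minimal sketch of the Tchebycheff scalarization in Equation (6) follows. The function names and the numeric values are illustrative assumptions. The reference-point update is the usual componentwise minimum used when the true minima are unknown, which is an assumption about the implementation rather than a detail given in the paper.

```python
def tchebycheff(f, weights, z_star):
    """g^te(x | lambda, z*) = max_j lambda_j * |f_j(x) - z*_j|."""
    return max(w * abs(fj - zj) for w, fj, zj in zip(weights, f, z_star))


def update_reference_point(z_star, f):
    """Keep the best (smallest) value seen so far for every objective."""
    return [min(zj, fj) for zj, fj in zip(z_star, f)]


z_star = [3.0, 0.0]
f_x = [4.0, 1.0]                                # objective vector of one solution
g_balanced = tchebycheff(f_x, [0.5, 0.5], z_star)
g_skewed = tchebycheff(f_x, [0.9, 0.1], z_star)
z_new = update_reference_point(z_star, [2.5, 0.4])
```

With equal weights both objectives contribute 0.5, whereas the weight vector (0.9, 0.1) makes the first objective decisive and the scalarized value becomes 0.9.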

3. The Proposed Algorithm TSMOCD

In this section, we provide a detailed description of the TSMOCD community detection algorithm, which consists of two stages based on individual similarity parameters. In the first stage, the algorithm utilizes a uniform crossover operator to facilitate a comprehensive exchange of information between two parent individuals. This exchange mechanism enables a rapid enhancement of individual fitness. The value of the individual similarity parameter increases as the average individual fitness improves. Once the parameter exceeds a predefined threshold, the algorithm proceeds to the second stage. During the second stage, a connected component crossover operator is utilized to preserve excellent partition results obtained from the parent individuals. Additionally, we have introduced a boundary node repair strategy, which redistributes some nodes located on the community boundary to new communities to further improve the quality of the community.

3.1. Representation

TSMOCD adopts a locus-based adjacency representation [28]. In this representation, a partitioning solution for a network with N nodes is represented by a chromosome with N genes: that is, a vector with a length of N. The range of values for each allele is $\{1, 2, \dots, N\}$. Both gene positions and allele values are nodes in graph G: the allele of the i-th gene is a node adjacent to node i. Decoding first identifies each connected subnetwork formed by these gene-allele links and then assigns the vertices in the same subnetwork to the same community. For example, Figure 2a is a graph containing 7 nodes, Figure 2b shows an individual represented by locus-based adjacency representation, and Figure 2c displays the community division after decoding. Each connected subgraph represents a community.
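Decoding a locus-based chromosome amounts to finding connected components of the gene-allele links. The sketch below uses a small union-find structure; it is one possible implementation, not necessarily the one used by the authors. Nodes are 0-based here (the text uses 1 to N), and the example genome is invented for illustration and is not the one in Figure 2.

```python
def decode(genome):
    """Map a locus-based genome (genome[i] = j links node i to node j) to labels."""
    parent = list(range(len(genome)))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genome):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge the two subnetworks

    roots = {}
    # Number the communities in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(len(genome))]


genome = [1, 2, 0, 4, 5, 6, 5]       # 7 nodes, hypothetical individual
labels = decode(genome)
```

For this genome, nodes 0 to 2 form one community and nodes 3 to 6 form another, so `labels` is `[0, 0, 0, 1, 1, 1, 1]`.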

3.2. Initialization

According to the previous description of the encoding method, if the allele of gene a is b, it indicates that node b is a neighboring node of node a; after decoding, they belong to the same community. During the population initialization process, we take reasonable values for each allele based on the connection relationship between vertices. In other words, the allele value of each gene in an individual can only be its adjacent vertex. This initialization method can significantly reduce invalid searches during algorithm evolution and accelerate the algorithm’s convergence.
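The neighbor-constrained initialization described above can be sketched as follows. The adjacency dictionary, the function names, and the fallback for isolated nodes (a node with no neighbors points to itself) are assumptions made for the example.

```python
import random


def init_individual(adj, rng):
    """Give every gene a randomly chosen neighbor of its node as allele."""
    genome = []
    for v in range(len(adj)):
        neighbors = sorted(adj[v])
        # Assumed fallback: an isolated node links to itself.
        genome.append(rng.choice(neighbors) if neighbors else v)
    return genome


def init_population(adj, pop, seed=0):
    rng = random.Random(seed)
    return [init_individual(adj, rng) for _ in range(pop)]


# Toy graph: two triangles joined by the edge 2-3 (0-based node labels).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

population = init_population(adj, pop=5)
```

Every allele produced this way corresponds to an existing edge, so no individual encodes a link that is absent from the network.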

3.3. Similarity Parameter

We incorporate two notions into the algorithm: the standard deviation of individual fitness and the similarity parameter of individuals. The standard deviation measures the dispersion of fitness values across the population, and the similarity parameter indicates how similar individuals are within the current population. A higher similarity parameter indicates a greater similarity among individuals, which suggests a tendency for the algorithm to converge and for individuals to perform well overall. In contrast, a lower similarity parameter indicates less similarity among individuals, which can result in poorer overall population performance. Specifically, the similarity parameter is defined as follows:

f_{avg} = \frac{f_{1} + f_{2} + \dots + f_{N}}{N}

(7)

σ = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (f_{i} - f_{avg})^{2}}

(8)

δ = 0.5 + \frac{1}{1 + e^{σ / f_{avg}}}

(9)

where $f_{1}, \dots, f_{N}$ are the fitness values of the individuals, N is the number of individuals in the population, $f_{avg}$ is the average fitness of the individuals in the population, σ is the standard deviation of the fitness values, and δ denotes the similarity parameter. For a positive average fitness, $δ \in (0.5, 1]$, and δ = 1 is reached only when all individuals have the same fitness (σ = 0). As the algorithm runs, the average fitness of the population gradually increases, while the individuals' fitness standard deviation decreases, which leads to an increase in the value of the similarity parameter.
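Equations (7) to (9) translate directly into code. The sketch below assumes scalar, positive fitness values; the function name and the sample populations are illustrative.

```python
import math


def similarity_parameter(fitness):
    """delta = 0.5 + 1 / (1 + exp(sigma / f_avg)), following Equations (7)-(9)."""
    n = len(fitness)
    f_avg = sum(fitness) / n                                        # Equation (7)
    sigma = math.sqrt(sum((f - f_avg) ** 2 for f in fitness) / n)   # Equation (8)
    return 0.5 + 1.0 / (1.0 + math.exp(sigma / f_avg))              # Equation (9)


delta_identical = similarity_parameter([2.0, 2.0, 2.0])   # sigma = 0
delta_diverse = similarity_parameter([1.0, 3.0])          # sigma / f_avg = 0.5
```

A population with identical fitness values yields δ = 1, and the more the fitness values spread relative to their mean, the closer δ moves toward 0.5.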

3.4. Crossover and Mutation Operator

In the first stage of TSMOCD, a uniform crossover is applied to each individual in the population with a crossover probability of $p_{c}$. This operation is performed on each position of the parent chromosome with equal probability. A binary crossover mask with a length equal to the number of nodes is randomly generated, where each value in the mask is either 0 or 1. This process is applied to each gene of the child, where a gene is inherited from parent 1 if the corresponding position in the mask is 1, and a gene is inherited from parent 2 if the corresponding position is 0. Uniform crossover is illustrated in Figure 3 as an example.
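The mask-based uniform crossover can be sketched as below. The parent chromosomes are invented for the example, and returning the mask alongside the child is only a convenience for inspection.

```python
import random


def uniform_crossover(parent1, parent2, rng):
    """Build a child gene by gene from a random binary mask (1: parent 1, 0: parent 2)."""
    mask = [rng.randint(0, 1) for _ in parent1]
    child = [a if bit == 1 else b for a, b, bit in zip(parent1, parent2, mask)]
    return child, mask


rng = random.Random(1)
parent1 = [1, 2, 0, 4, 5, 4]
parent2 = [2, 0, 3, 2, 3, 3]
child, mask = uniform_crossover(parent1, parent2, rng)
```

Because both parents only hold neighbor alleles, the child also holds a valid neighbor at every position.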

When the value of the similarity parameter exceeds the set value of $δ_{k}$, this suggests that the individuals have already achieved relatively high fitness values, indicating a good overall quality of the community partition. At this point, a graph-connected component-based crossover operator is used to preserve the partially good partition results of parent individuals. Specifically, a random node is selected, and the connected component to which it belongs is identified. Then, all alleles of the nodes in the connected component are inherited by the child. For example, as shown in Figure 4, when the 3rd gene position is selected, the connected component to which it belongs includes nodes $\{1, 3, 6, 7\}$, and all alleles of these nodes are inherited by the child.
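One way to realize the connected-component crossover is sketched below. The text does not state where the child's remaining genes come from, so this sketch assumes they are copied from the second parent; the roles `donor` and `receiver`, the function names, and the example chromosomes are hypothetical.

```python
import random


def component_of(genome, start):
    """Nodes in the same connected component as `start` under the genome's links."""
    links = {v: set() for v in range(len(genome))}
    for i, j in enumerate(genome):
        links[i].add(j)
        links[j].add(i)
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in links[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def component_crossover(donor, receiver, rng):
    """Copy one whole community of `donor` into a copy of `receiver` (assumed roles)."""
    node = rng.randrange(len(donor))
    comp = component_of(donor, node)
    child = list(receiver)
    for v in comp:
        child[v] = donor[v]
    return child, comp


rng = random.Random(0)
donor = [1, 2, 0, 4, 5, 6, 5]
receiver = [1, 0, 3, 2, 5, 4, 5]
child, comp = component_crossover(donor, receiver, rng)
```

Since every link inside the selected component is copied unchanged, the nodes of that community remain connected to each other in the child.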

Mutation operation: For the individual to be mutated, each gene is mutated with probability $p_{m}$. If the mutation condition is met, the allele of gene i is replaced by another node adjacent to node i. Therefore, the possible allele values of gene i are limited to the neighbors of node i, which avoids searching invalid regions of the solution space.
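A minimal sketch of the neighbor-based mutation is given below. It draws the new allele uniformly from the neighbors of the node, which may reselect the current allele; whether the original implementation excludes the current value is not stated, so this is an assumption. The function name and toy graph are illustrative.

```python
import random


def mutate(genome, adj, p_m, rng):
    """With probability p_m per gene, replace the allele by a random neighbor."""
    child = list(genome)
    for v in range(len(child)):
        if adj[v] and rng.random() < p_m:
            child[v] = rng.choice(sorted(adj[v]))
    return child


# Toy graph: two triangles joined by the edge 2-3 (0-based node labels).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

genome = [1, 2, 0, 4, 5, 4]
unchanged = mutate(genome, adj, p_m=0.0, rng=random.Random(0))
mutated = mutate(genome, adj, p_m=1.0, rng=random.Random(0))
```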

3.5. Boundary Node Repair Strategy

In the TSMOCD algorithm, we adopt a repair operation for boundary-independent nodes. A boundary-independent node is one that is connected to nodes in different communities and that, apart from its own gene, is not referenced by any other gene in the locus-based adjacency representation, so changing its allele moves only that node. For example, Figure 5a shows an individual represented by locus-based adjacency representation, and Figure 5b shows the community division after decoding. Nodes 4 and 7 are boundary-independent nodes, as shown in Figure 5c. When boundary-independent node 7 mutates, it will only cause the movement of one node. In contrast, as shown in Figure 5d, node 2 is not a boundary-independent node, and its mutation would cause the movement of more than one node. All boundary-independent nodes undergo repair operations. Each of them is reassigned to the community of the neighboring node that yields the greatest modularity score. This operator improves the solutions stored in the external population.
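The following sketch shows one reading of the boundary-independent test: a node qualifies if it has a neighbor in another community and no other gene holds it as an allele. This interpretation, the function name, and the example genomes are assumptions; the repair step itself (choosing the neighbor community with the best modularity) is not shown.

```python
def boundary_independent_nodes(genome, adj, labels):
    """Nodes on a community boundary that no other gene points to (assumed reading)."""
    referenced = {j for i, j in enumerate(genome) if i != j}
    result = []
    for v in range(len(genome)):
        on_boundary = any(labels[w] != labels[v] for w in adj[v])
        if on_boundary and v not in referenced:
            result.append(v)
    return result


# Toy graph: two triangles joined by the edge 2-3 (0-based node labels).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

labels = [0, 0, 0, 1, 1, 1]                 # decoded communities of both genomes
free_both = boundary_independent_nodes([1, 0, 0, 4, 5, 4], adj, labels)
free_one = boundary_independent_nodes([1, 2, 0, 4, 5, 4], adj, labels)
```

In the first genome neither node 2 nor node 3 is referenced by another gene, so both can be moved on their own. In the second genome node 1 points to node 2, so only node 3 qualifies.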

3.6. General Framework of TSMOCD

After describing the specific details of the algorithm, we present the framework of TSMOCD in Algorithm 1.

Algorithm 1 General Framework of TSMOCD

Input: G: network dataset; maxgen: maximum number of generations; pop: population size; $p_{c}$: crossover probability; $p_{m}$: mutation probability; λ: weight vectors $\{λ_{1}, λ_{2}, \dots, λ_{pop}\}$; T: the size of the neighborhood; $δ_{k}$: similarity parameter threshold.

Output: The results of community division.

1: Initialize the population $P = \{p_{1}, p_{2}, \dots, p_{pop}\}$;

2: Initialize reference point $z^{*}$;

3: For each individual $x_{j}$, set B(j) ← the T weight vectors that are closest to its weight vector $λ_{j}$ in terms of Euclidean distance, as its neighborhood;

4: for i = 1 to maxgen do

5: for j = 1 to pop do

6: if stage == 1

7: Randomly select individuals from B(j) and generate children by uniform crossover and mutation operators;

8: else if stage == 2

9: Randomly select individuals from B(j) and generate children by connected component crossover and mutation operators;

10: end if

11: Update EP by comparing the fitness values of individuals;

12: Update reference point $z^{*}$;

13: Update neighbors based on B(j);

14: end for

15: Perform boundary-independent node repair operations on individuals in EP and update EP;

16: end for

17: Decode

The Euclidean distance [29] is the straight-line distance between two points in Euclidean space. The most time-consuming part of the algorithm is the update EP operation in the inner loop, which includes decoding and calculating the fitness of individuals. Its time complexity is O(m + n) (m and n are the number of edges and nodes, respectively). The number of individuals in EP is related to pop. Therefore, the overall time complexity of the algorithm is O(gp²(m + n)) (g = maxgen, p = pop).

4. Experiments

In order to assess the performance of TSMOCD, we compared it with four existing MOEA-based community discovery methods: MOGA-Net [13], MOCD [14], MOEA/D-Net [15], and MOCD-ACO [18]. Except for MOCD-ACO, which has a time complexity of O(gp³(m + n)), all other algorithms have a time complexity of O(gp²(m + n)).

4.1. Evaluation Metrics

The modularity (Q) function is widely used to assess the community structure of networks. It measures the deviation between the observed fraction of edges within communities and the expected fraction of edges within communities in a random network [30]. Generally, a higher value of modularity indicates stronger internal connectivity of the community structure. Put simply, as modularity increases, the result of network partitioning improves. The modularity function for an unweighted and undirected network can be expressed as follows:

Q = \frac{1}{2m} \sum_{i, j} \left(A_{ij} - \frac{k_{i} k_{j}}{2m}\right) δ(i, j)

(10)

where A is the adjacency matrix of the graph, m is the number of edges, $k_{i}$ and $k_{j}$ are the degrees of nodes i and j, respectively, and δ is the Kronecker function, which yields one if i and j are in the same community and zero otherwise.
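Equation (10) can be evaluated directly with a double loop over node pairs. The sketch below favors fidelity to the formula over speed; the function name and toy graph are illustrative.

```python
def modularity(adj, labels):
    """Q = (1 / 2m) * sum over same-community pairs of (A_ij - k_i * k_j / 2m)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())   # 2m = sum of all degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] == labels[j]:                # Kronecker delta
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


# Toy graph: two triangles joined by the edge 2-3 (0-based node labels).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])
```

Splitting the toy graph into its two triangles gives Q = 5/14 ≈ 0.357, while placing all nodes in one community gives Q = 0.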

Normalized mutual information (NMI) [31] evaluates the validity of network partitioning results based on real network partitioning. NMI is defined as follows:

NMI(A, B) = \frac{-2 \sum_{i = 1}^{C_{A}} \sum_{j = 1}^{C_{B}} C_{ij} \log \left(\frac{C_{ij} N}{C_{i} C_{j}}\right)}{\sum_{i = 1}^{C_{A}} C_{i} \log \left(\frac{C_{i}}{N}\right) + \sum_{j = 1}^{C_{B}} C_{j} \log \left(\frac{C_{j}}{N}\right)}

(11)

where $C_{A}$ and $C_{B}$ are the number of clusters in community partitions A and B. $C_{ij}$ represents the number of vertices that belong to both community i of partition A and community j of partition B. $C_{i}$ is the sum of row i, $C_{j}$ is the sum of column j, and N is the number of vertices of the network. When the obtained community partition matches real communities, NMI is 1. When they are completely different, NMI is 0.
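A compact sketch of Equation (11) using label lists follows. Terms with $C_{ij} = 0$ are skipped (the usual convention that 0 log 0 = 0), and the sketch returns 1 when both partitions consist of a single cluster, where the formula is undefined; both conventions, like the function name and the sample labels, are assumptions of this example.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions given as label lists."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))   # C_ij (only non-zero entries)
    rows = Counter(labels_a)                   # C_i, row sums
    cols = Counter(labels_b)                   # C_j, column sums
    numerator = -2.0 * sum(
        c * math.log(c * n / (rows[a] * cols[b])) for (a, b), c in joint.items()
    )
    denominator = sum(c * math.log(c / n) for c in rows.values()) + sum(
        c * math.log(c / n) for c in cols.values()
    )
    if denominator == 0.0:
        return 1.0   # assumed convention: both partitions are a single cluster
    return numerator / denominator


same = nmi([0, 0, 1, 1], [5, 5, 3, 3])         # same partition, renamed labels
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # partitions share no information
```

The first call returns 1 because NMI ignores label names, and the second returns 0 because knowing one partition says nothing about the other.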

4.2. Parameter Settings

To evaluate the performance of the algorithms and reduce statistical errors, each algorithm is run independently 20 times on every network. In each run, we select the solutions that have the maximum values of NMI and modularity (Q). For all algorithms, the population size is N = 100, the crossover probability is $p_{c}$ = 0.9, the mutation probability is $p_{m}$ = 0.1, and the maximum number of iterations is 100.

4.3. Experiments on Real Networks

We consider four real-world networks with diverse characteristics, namely Zachary’s Karate Club, dolphin social network, American college football, and Polbooks network, all of which have been extensively studied in recent works [32,33,34]. Table 1 provides details about the four real-world networks.

Zachary’s Karate Club network, compiled by Zachary, consists of 34 members divided into two groups of almost equal size due to a conflict between the administrator and instructor of the club. As observed in Table 2, all algorithms achieve good results on this network. MOEA/D-Net and TSMOCD have average NMI values of 1, indicating that they detect the true community structure in every run. MOEA/D-Net, MOCD-ACO, and TSMOCD achieve the same maximum value of Q, which is the highest among all compared methods. Figure 6 depicts the community structures obtained by TSMOCD with Q = 0.4198 and with NMI = 1.

Lusseau et al. compiled the dolphin network based on their seven-year observation of the behavior of dolphins in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges. From Table 3, we can see that all algorithms can obtain a maximum NMI value of 1 on this network, but MOEA/D-Net, MOCD-ACO, and TSMOCD reach this maximum in almost every run. Our method achieves the highest maximum modularity (Q) value and the highest average value. Figure 7a shows the partition obtained by our method at the best value Q = 0.5238, which divides the network into four clusters. As depicted in Figure 7b, at NMI = 1 the dolphin network is partitioned into two parts, as in the actual partition.

Girvan and Newman compiled the American college football network from the games played between college teams. Based on Table 4, it is evident that all algorithms have difficulty recovering the real communities exactly. Our TSMOCD outperforms the other approaches in terms of the maximum and average values of Q, with 0.6043 and 0.6027, respectively. Its maximum NMI is smaller than that of MOCD-ACO, but it ranks first with an average NMI of 0.9245. Overall, TSMOCD achieves the best Q values and the best average NMI on this network. As shown in Figure 8a, in the case of Q = 0.6043, our algorithm obtains only 10 clusters. As shown in Figure 8b, at its best NMI = 0.9344, the network is partitioned into 12 clusters, which matches the number of communities in the actual football network.

Krebs compiled the Polbooks network based on American political books purchased from Amazon.com. All algorithms used in the experiment failed to achieve a true partitioning of the complex and difficult network. From the results in Table 5, with respect to modularity (Q), TSMOCD performs best with a top value of 0.5236 and an average value of 0.5223. For NMI, TSMOCD also obtained the maximum NMI value and a better average value. Figure 9a,b show TSMOCD’s solutions with the highest modularity (Q) and NMI on the Polbooks network. From the comparison results, TSMOCD still has better community partitioning performance.

4.4. Experiments on Synthetic Network

In this paper, we utilize a variant of the classical benchmark proposed by Girvan and Newman [35]. The synthetic network consists of four communities, with each community comprising 32 nodes, resulting in a total of 128 nodes. Furthermore, each node in the network has an average degree of 16. The fuzziness of the synthetic network is controlled by a mixing parameter, μ. As the value of μ increases, the community structure becomes fuzzier and harder to detect. In the experiments, the mixing parameter varies between 0 and 0.5, and each algorithm is run 20 times on the network generated for each value of μ to obtain an average NMI value.

As depicted in Figure 10, the experimental results demonstrate that TSMOCD outperforms the other algorithms. All algorithms successfully detect the correct network structure when μ < 0.1. However, when μ > 0.1, MOGA-Net fails to accurately identify the actual network structure. Similarly, MOCD also struggles when μ exceeds 0.15. Once the value of μ surpasses 0.35, all algorithms fail to identify the correct community structure. Notably, our proposed algorithm achieves the highest NMI value under these circumstances. In summary, on both real-world and synthetic networks, TSMOCD is more robust and more accurate than the other algorithms.

5. Conclusions

In this study, we proposed a new two-stage multi-objective community detection algorithm (TSMOCD) to address complex network community detection problems. We formulated community discovery as a bi-objective optimization problem and divided it into multiple sub-optimization problems using the MOEA/D framework. The algorithm was divided into two stages based on the individual similarity parameter, and appropriate crossover operators were used for each stage. We also introduced an external population to store individuals exhibiting superior fitness levels and improved the community partitioning results by repairing boundary-independent nodes. The effectiveness of the TSMOCD method was demonstrated by experiments on synthetic as well as real-world networks.

Despite the efficacy of our TSMOCD method in detecting community structures, it still has significant limitations. For instance, it is unable to detect and evaluate overlapping communities, which are prevalent in real-world networks. The next step is to enhance TSMOCD so that it can be extended and applied to overlapping and large-scale networks.

Author Contributions

Original draft preparation and experiment, W.Z.; review and editing, H.L.; revision and project administration, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Project of Science and Technology Innovation 2030 supported by the Ministry of Science and Technology of China (No. 2018AAA0101301), the Key Projects of Artificial Intelligence of High School in Guangdong Province (No. 2019KZDZX1011), Dongguan Social Development Science and Technology Project (No. 20211800904722), and Dongguan Science and Technology Special Commissioner Project (No. 20221800500052).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Radicchi, F.; Castellano, C.; Cecconi, F.; Loreto, V.; Parisi, D. Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 2004, 101, 2658–2663.
2. Shirazi, S.; Albadvi, A.; Akhondzadeh, E.; Farzadfar, F.; Teimourpour, B. A new application of community detection for identifying the real specialty of physicians. Int. J. Med. Inform. 2020, 140, 104161.
3. Zheng, M.; Domanskyi, S.; Piermarocchi, C.; Mias, G.I. Visibility graph based temporal community detection with applications in biological time series. Sci. Rep. 2021, 11, 1–12.
4. Zhang, Z.; Jiao, Q.; Zhang, Y.; Liu, B.; Wang, Y.; Li, J. OTUCD: Unsupervised GCN based metagenomics non-overlapping community detection. Comput. Biol. Chem. 2022, 98, 107670.
5. Zhou, X.; Su, L.; Li, X.; Zhao, Z.; Li, C. Community detection based on unsupervised attributed network embedding. Expert Syst. Appl. 2023, 213, 118937.
6. Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
7. Zhang, L.; Pan, H.; Su, Y.; Zhang, X.; Niu, Y. A mixed representation-based multiobjective evolutionary algorithm for overlapping community detection. IEEE Trans. Cybern. 2017, 47, 2703–2716.
8. Cheng, F.; Cui, T.; Su, Y.; Niu, Y.; Zhang, X. A local information based multi-objective evolutionary algorithm for community detection in complex networks. Appl. Soft Comput. 2018, 69, 357–367.
9. Gong, M.; Fu, B.; Jiao, L.; Du, H. Memetic algorithm for community detection in networks. Phys. Rev. E 2011, 84, 056101.
10. Huang, Q.; White, T.; Jia, G.; Musolesi, M.; Turan, N.; Tang, K.; Yao, X. Community detection using cooperative co-evolutionary differential evolution. Proceedings 2012, 7492, 235–244.
11. Naeni, L.M.; Berretta, R.; Moscato, P. MA-Net: A reliable memetic algorithm for community detection by modularity optimization. In Proceedings of the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems, Singapore, 10–12 November 2015; Volume 1, pp. 311–323.
12. Lancichinetti, A.; Fortunato, S. Community detection algorithms: A comparative analysis. Phys. Rev. E 2009, 80, 056117.
13. Pizzuti, C. A multi-objective genetic algorithm for community detection in networks. IEEE Int. Conf. Tools Artif. Intell. 2009, 21, 379–386.
14. Shi, C.; Yan, Z.; Cai, Y.; Wu, B. Multi-objective community detection in complex networks. Appl. Soft Comput. 2012, 12, 850–859.
15. Gong, M.; Ma, L.; Zhang, Q.; Jiao, L. Community detection in networks by using multiobjective evolutionary algorithm with decomposition. Phys. A Stat. Mech. Appl. 2012, 391, 4050–4060.
16. Gong, M.; Cai, Q.; Chen, X.; Ma, L. Complex network clustering by multiobjective discrete particle swarm optimization based on decomposition. IEEE Trans. Evol. Comput. 2013, 18, 82–97.
17. Zou, F.; Chen, D.; Li, S.; Lu, R.; Lin, M. Community detection in complex networks: Multi-objective discrete backtracking search optimization algorithm with decomposition. Appl. Soft Comput. 2017, 53, 285–295.
18. Ji, P.; Zhang, S.; Zhou, Z. A decomposition-based ant colony optimization algorithm for the multi-objective community detection. J. Ambient Intell. Humaniz. Comput. 2020, 11, 173–188.
19. Shang, J.; Li, Y.; Sun, Y.; Li, F.; Zhang, Y.; Liu, J.X. MOPIO: A multi-objective pigeon-inspired optimization algorithm for community detection. Symmetry 2020, 13, 49.
20. Su, Y.; Zhou, K.; Zhang, X.; Cheng, R.; Zheng, C. A parallel multi-objective evolutionary algorithm for community detection in large-scale complex networks. Inf. Sci. 2021, 576, 374–392.
Qi, X.; He, L.; Wang, J.; Du, Z.; Luo, Z.; Li, X. A Multi-objective Evolutionary Algorithm Based on Multi-layer Network Reduction for Community Detection. In Knowledge Science, Engineering and Management; Springer: Berlin/Heidelberg, Germany, 2022; Volume 13370, pp. 141–152. [Google Scholar]
Guo, K.; Chen, Z.; Lin, X.; Wu, L.; Zhan, Z.H.; Chen, Y.; Guo, W. Community Detection Based on Multiobjective Particle Swarm Optimization and Graph Attention Variational Autoencoder. IEEE Trans. Big Data 2022, 9, 569–583. [Google Scholar] [CrossRef]
Zhang, Q.; Li, H. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 2007, 11, 712–731. [Google Scholar] [CrossRef]
Zhang, X.; Zhou, K.; Pan, H.; Zhang, L.; Zeng, X.; Jin, Y. A network reduction-based multiobjective evolutionary algorithm for community detection in large-scale complex networks. IEEE Trans. Cybern. 2018, 50, 703–716. [Google Scholar] [CrossRef] [PubMed]
Wan, X.; Zuo, X.; Song, F. Solving dynamic overlapping community detection problem by a multiobjective evolutionary algorithm based on decomposition. Swarm Evol. Comput. 2020, 54, 100668. [Google Scholar] [CrossRef]
Pérez-Peló, S.; Sanchez-Oro, J.; Gonzalez-Pardo, A.; Duarte, A. A fast variable neighborhood search approach for multi-objective community detection. Appl. Soft Comput. 2021, 112, 107838. [Google Scholar] [CrossRef]
Zhao, Q.; Guo, Y.; Yao, X.; Gong, D. Decomposition-based Multi-objective Optimization Algorithms with Adaptively Adjusting Weight Vectors and Neighborhoods. In IEEE Transactions on Evolutionary Computation; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
Handl, J.; Knowles, J. An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 2007, 11, 56–76. [Google Scholar] [CrossRef]
Danielsson, P.E. Euclidean distance mapping. Comput. Graph. Image Process. 1980, 14, 227–248. [Google Scholar] [CrossRef] [Green Version]
Kumar, A.; Barman, D.; Sarkar, R.; Chowdhury, N. Overlapping community detection using multiobjective genetic algorithm. IEEE Trans. Comput. Soc. Syst. 2020, 7, 802–817. [Google Scholar] [CrossRef]
Qin, M.; Lei, K. Dual-channel hybrid community detection in attributed networks. Inf. Sci. 2021, 551, 146–167. [Google Scholar] [CrossRef]
Xu, G.; Guo, J.; Yang, P. TNS-LPA: An improved label propagation algorithm for community detection based on two-level neighbourhood similarity. IEEE Access 2020, 9, 23526–23536. [Google Scholar] [CrossRef]
Sathyakala, M.; Sangeetha, M. A weak clique based multi objective genetic algorithm for overlapping community detection in complex networks. J. Ambient Intell. Humaniz. Comput. 2021, 12, 6761–6771. [Google Scholar] [CrossRef]
Dabaghi-Zarandi, F.; KamaliPour, P. Community detection in complex network based on an improved random algorithm using local and global network information. J. Netw. Comput. Appl. 2022, 206, 103492. [Google Scholar] [CrossRef]
Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826. [Google Scholar] [CrossRef] [PubMed] [Green Version]

Figure 1. A complex network with nine nodes.

Figure 2. (a) A graph with 7 nodes. (b) Individual genotype. (c) Decoded community division structure.

Figure 3. Example of uniform crossover operation.

Figure 4. Example of connected component crossover operation.

Figure 5. Example of the repair strategy. (a) Individual genotype. (b) Decoded community division structure. (c) Result of mutating the boundary-independent node 7. (d) Mutating the non-boundary-independent node 2 changes the community count.

Figure 6. (a) Q = 0.4198; (b) NMI = 1.

Figure 7. (a) Q = 0.5238; (b) NMI = 1.

Figure 8. (a) Q = 0.6043; (b) NMI = 0.9344.

Figure 9. (a) Q = 0.5236; (b) NMI = 0.6214.

Figure 10. GN benchmark network test results.
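
The genotype-to-community decoding depicted in Figure 2 can be sketched as follows. This is a minimal illustration that assumes the locus-based adjacency encoding common in this line of work (gene i stores one neighbour of node i, and the connected components of the resulting links form the communities); the caption alone does not confirm that TSMOCD uses exactly this scheme. The function name `decode_genotype` and the 0-indexed example genotype are hypothetical and are not taken from the figure.

```python
def decode_genotype(genotype):
    """Decode a locus-based genotype into communities.

    genotype[i] = j is read as a link between node i and node j
    (assumed encoding). Communities are the connected components
    of these links, found here with a small union-find structure.
    """
    n = len(genotype)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genotype):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical 7-node genotype (0-indexed), not the one drawn in Figure 2.
communities = decode_genotype([1, 2, 0, 4, 3, 6, 5])
print(communities)  # [[0, 1, 2], [3, 4], [5, 6]]
```

Because the number of communities falls out of the decoding, this kind of encoding does not require the community count to be fixed in advance.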

Table 1. Details of the four real-world networks.

Network	Number of Nodes	Number of Links	Number of Clusters
Karate	34	78	2
Dolphins	62	159	2
Football	115	613	12
Polbooks	105	441	3

Table 2. Results obtained on the Karate Club network.

Algorithm	$Q_{max}$	$Q_{avg}$	$NMI_{max}$	$NMI_{avg}$
MOGA-net	0.4162	0.4151	0.8432	0.8412
MOCD	0.4188	0.4188	0.8372	0.8372
MOEA/D-net	0.4198	0.4198	1.0000	1.0000
MOCD-ACO	0.4198	0.4198	1.0000	1.0000
TSMOCD	0.4198	0.4198	1.0000	1.0000

Table 3. Results obtained on the Dolphins network.

Algorithm	$Q_{max}$	$Q_{avg}$	$NMI_{max}$	$NMI_{avg}$
MOGA-net	0.5198	0.5187	1.0000	0.9789
MOCD	0.5209	0.5190	1.0000	0.9821
MOEA/D-net	0.5213	0.5186	1.0000	1.0000
MOCD-ACO	0.5224	0.5204	1.0000	1.0000
TSMOCD	0.5238	0.5222	1.0000	1.0000

Table 4. Results obtained on the Football network.

Algorithm	$Q_{max}$	$Q_{avg}$	$NMI_{max}$	$NMI_{avg}$
MOGA-net	0.5225	0.5065	0.8126	0.8002
MOCD	0.5832	0.5821	0.8450	0.8423
MOEA/D-net	0.6023	0.6019	0.9187	0.9138
MOCD-ACO	0.5906	0.5863	0.9374	0.9214
TSMOCD	0.6043	0.6027	0.9344	0.9245

Table 5. Results obtained on the Polbooks network.

Algorithm	$Q_{max}$	$Q_{avg}$	$NMI_{max}$	$NMI_{avg}$
MOGA-net	0.5158	0.5121	0.5874	0.5845
MOCD	0.5169	0.5145	0.5886	0.5852
MOEA/D-net	0.5230	0.5187	0.5895	0.5871
MOCD-ACO	0.5188	0.5168	0.6159	0.5953
TSMOCD	0.5236	0.5223	0.6214	0.5979
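
For readers who wish to reproduce the Q and NMI columns of Tables 2–5, the following sketch shows one way the two metrics could be computed. It assumes Newman's modularity for an unweighted, undirected graph and the entropy-normalised form NMI = 2I(A;B)/(H(A)+H(B)); the exact formulas used in the experiments are those given in the Evaluate Metrics section and may differ in detail. The function names and the toy graph (two triangles joined by a single link) are illustrative only and are not data from the paper.

```python
import math
from collections import Counter


def modularity(edges, labels):
    """Newman modularity Q = sum_c [ l_c / m - (d_c / (2m))^2 ].

    edges  : list of undirected (u, v) pairs, each link listed once
    labels : labels[u] is the community of node u
    """
    m = len(edges)
    intra = Counter()   # l_c: links with both endpoints in community c
    degree = Counter()  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def nmi(labels_a, labels_b):
    """Normalised mutual information 2 * I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions place every node in one community
    return 2 * mutual / (h_a + h_b)


# Toy graph: two triangles joined by the link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
detected = [0, 0, 0, 1, 1, 1]
truth = [1, 1, 1, 0, 0, 0]  # same partition with different label names

print(round(modularity(edges, detected), 4))  # 0.3571
print(round(nmi(detected, truth), 4))         # 1.0
```

NMI depends only on how nodes are grouped, not on the label names, which is why the relabelled partition above still scores 1.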

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).