Article

A Further Study on the Degree-Corrected Spectral Clustering under Spectral Graph Theory

College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Symmetry 2022, 14(11), 2428; https://doi.org/10.3390/sym14112428
Submission received: 31 October 2022 / Revised: 9 November 2022 / Accepted: 12 November 2022 / Published: 16 November 2022
(This article belongs to the Special Issue Symmetry in Graph and Hypergraph Theory)

Abstract

Spectral clustering algorithms are often used to find clusters in the community detection problem. Recently, a degree-corrected spectral clustering algorithm was proposed. However, it has only been used for partitioning graphs generated from stochastic blockmodels. This paper studies the degree-corrected spectral clustering algorithm based on spectral graph theory and shows that it gives a good approximation of the optimal clustering for a wide class of graphs. Moreover, we also give theoretical support for finding an appropriate degree correction. Several numerical experiments for community detection are conducted in this paper to evaluate our method.

1. Introduction

Due to the growing availability of large-scale network datasets, community detection has attracted significant attention. The community detection problem is to discover a community structure by dividing the network into multiple clusters according to the affinity between nodes. Because the spectral clustering method is easy to implement and can detect non-convex clusters, it is widely used for detecting clusters in networks. Compared to traditional algorithms, spectral clustering performs well and has many fundamental advantages [1,2,3,4].
In the spectral clustering algorithm, the similarity between the data points is reflected by the weights on the edges in the graph. The data points are mapped to a lower-dimensional space through the Laplacian matrix of the graph, and finally, the non-convex datasets in the obtained low-dimensional space are clustered by traditional clustering algorithms.
Let $G=(V,E)$ be an undirected and unweighted simple graph with $n$ nodes, where $V$ and $E$ are the set of nodes and edges, respectively. The adjacency matrix of graph $G$, denoted by $W=(w_{ij})$, is a 0–1 symmetric matrix of order $n$, whose $(i,j)$-th and $(j,i)$-th elements are 1 if there is an edge between nodes $i$ and $j$, and 0 otherwise. Let $d_i=\sum_{j=1}^{n}w_{ij}$, which is defined as the degree of node $i$. Moreover, $d_{\max}=\max_{i\in V}d_i$ and $d_{\min}=\min_{i\in V}d_i$ are called the maximal degree and minimal degree of $G$, respectively. Denote by $\bar d$ the average degree of graph $G$, which equals $\frac{1}{n}\sum_{i=1}^{n}d_i$. The degree matrix is defined by $D=\mathrm{diag}(d_1,\dots,d_n)$. The symmetric matrix $D-W$ is called the unnormalized Laplacian of $G$, each of whose row sums is zero. The normalized Laplacian $L=I-D^{-1/2}WD^{-1/2}$ has zero as its smallest eigenvalue and plays a very important role in the spectral clustering algorithm. It is well defined only when $D^{-1}$ exists, i.e., when there are no isolated nodes.
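As a concrete reference for these definitions, the following minimal NumPy sketch builds the degree vector, the unnormalized Laplacian, and the (optionally degree-corrected) normalized Laplacian from a 0–1 symmetric adjacency matrix; the function name and the $\tau$ argument are ours, added for illustration.

```python
import numpy as np

def graph_matrices(W, tau=0.0):
    """Degree vector, unnormalized Laplacian D - W, and the normalized
    Laplacian I - (D + tau*I)^{-1/2} W (D + tau*I)^{-1/2} (tau = 0 gives
    the ordinary normalized Laplacian and requires every degree > 0)."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                        # d_i = sum_j w_ij
    L_unnorm = np.diag(d) - W                # each row sums to zero
    inv_sqrt = np.diag(1.0 / np.sqrt(d + tau))
    L_norm = np.eye(len(d)) - inv_sqrt @ W @ inv_sqrt
    return d, L_unnorm, L_norm
```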
In 2002, Ng et al. [5] proposed a version of spectral clustering (NJW) based on the normalized Laplacian matrix. Moreover, the authors in [5] analyzed their algorithm using matrix perturbation theory and gave conditions under which the algorithm performs well when nodes from different clusters are well separated. However, when dealing with a sparse network with a strong degree of heterogeneity, i.e., when the minimum degree of the graph is low, NJW does not concentrate well. To resolve this issue, Chaudhuri et al. [6] introduced the notion of a degree-corrected random-walk Laplacian $I-(D+\tau I)^{-1}W$ and demonstrated that it outputs the correct partition for a wide range of graphs generated from the extended planted partition (EPP) model. Instead of performing the spectral decomposition on the entire matrix, Chaudhuri et al. [6] divided the nodes into two random subsets and only used the induced subgraph on one of those subsets to compute the spectral decomposition. Qin and Rohe [7] investigated the spectral clustering algorithm using the degree-corrected normalized Laplacian $L_\tau=I-(D+\tau I)^{-1/2}W(D+\tau I)^{-1/2}$ under the degree-corrected stochastic blockmodel, where $\tau=\bar d$. This method extended the previous statistical estimation results to the more canonical spectral clustering algorithm, which is called regularized spectral clustering (RSC). Recently, Qing and Wang [8] also proposed an improved spectral clustering (ISC) under the degree-corrected stochastic blockmodel, where $\tau=0.1\cdot\frac{d_{\min}+d_{\max}}{2}$. Unlike NJW and RSC, which use the top $k$ eigenvectors to construct the mapping matrix, ISC uses the top $k+1$ eigenvectors and the corresponding eigenvalues instead and performs especially well on weak-signal networks, where $k$ is the number of clusters.
Actually, previous works on spectral clustering with the degree-corrected Laplacian were mostly applied to graphs generated from stochastic blockmodels. Moreover, the optimal $\tau$ has a complex dependence on the degree distribution of the graph, and $\tau=\bar d$ provides good results [6,7]. In [7], the authors claimed that $\tau=\bar d$ could be adjusted by a multiplicative constant and that the results are not sensitive to such adjustments. However, some numerical experiments show that an appropriate $\tau$ can be found for a better performance.
This paper investigates the spectral clustering algorithm using the degree-corrected Laplacian in view of spectral graph theory [9] and shows that it also works for a wide class of graphs. Moreover, we provide theoretical guidance on the choice of the parameter $\tau$. Finally, six real-world datasets are used to test the performance of our method for an appropriate $\tau$. The results are roughly equivalent to those of RSC, or even better.
The rest of this paper is organized as follows. In Section 2, we list some relevant definitions and useful lemmas for the analysis of our main results in Section 3. In Section 4, some numerical experiments are conducted on real-world datasets. Moreover, some artificial networks are generated to analyze the effect of our method in terms of some related parameters. The conclusion and future work are provided in Section 5.

2. Preliminaries

Let $G=(V,E)$ be a graph. The symmetric difference of two subsets $S$ and $T$ of $V$ is defined as $S\,\Delta\,T=(S\setminus T)\cup(T\setminus S)$. For a subset $S$ of $V$, $E(S,V\setminus S)=\{(u,v)\in E: u\in S,\ v\in V\setminus S\}$. The symbol $\mu(S)$ denotes the volume of $S$, given by the sum of the degrees of all nodes in $S$, i.e., $\mu(S)=\sum_{v\in S}d_v$. If $k$ disjoint subsets $S_1,\dots,S_k$ of $V$ satisfy $\bigcup_{i=1}^{k}S_i=V$, we call $\{S_1,\dots,S_k\}$ a $k$-way partition of $V$. Kolev and Mehlhorn [10] introduced the minimal average conductance, denoted by
$\bar\phi_k(G)=\min_{\{S_1,\dots,S_k\}\in U}\ \frac{1}{k}\big(\phi(S_1)+\cdots+\phi(S_k)\big),$
where $U$ is the set containing every $k$-way partition of the node set of $G$, and $\phi(S)=\frac{|E(S,V\setminus S)|}{\mu(S)}$. A partition $\{S_1,\dots,S_k\}$ is called optimal if it satisfies $\frac{1}{k}(\phi(S_1)+\cdots+\phi(S_k))=\bar\phi_k(G)$. In this paper, we denote by $A_1,\dots,A_k$ the actual partition returned by the RSC algorithm, where $k$ is the number of classes of the graph.
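For illustration, the sketch below evaluates $\phi(S)$ and the average conductance of a given $k$-way partition; note that $\bar\phi_k(G)$ itself minimizes over all $k$-way partitions, which this helper does not attempt. The function names are ours.

```python
import numpy as np

def conductance(W, S):
    """phi(S) = |E(S, V \\ S)| / mu(S) for a node subset S (iterable of indices)."""
    W = np.asarray(W, dtype=float)
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = W[np.ix_(mask, ~mask)].sum()   # edges leaving S (0-1 adjacency)
    volume = W[mask].sum()               # mu(S): sum of degrees inside S
    return cut / volume

def average_conductance(W, partition):
    """(1/k) * (phi(S_1) + ... + phi(S_k)) for a list of node-index lists."""
    return np.mean([conductance(W, S) for S in partition])
```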
Let $\|\cdot\|_2$ denote the 2-norm of a vector and $\|\cdot\|_F$ denote the Frobenius norm of a matrix.
The $k$-means algorithm aims to find a set of $k$ centers $c_1,\dots,c_k$ that minimizes the sum of the squared distances between each point and the center to which it is assigned.
Let $F$ be a spectral embedding map from $V$ to a vector space. Given any $k$-way partition $\{S_1,\dots,S_k\}$ of $G$ and a set of vectors $w_1,\dots,w_k$, the cost function of the partition $\{S_1,\dots,S_k\}$ of $V$, mentioned in [11], is defined as
$g(S_1,\dots,S_k,w_1,\dots,w_k)=\sum_{i=1}^{k}\sum_{v\in S_i}d_v\,\|F(v)-w_i\|_2^2.$
The main idea of this function is to expand each element $F(v)$ by making $d_v$ copies of $F(v)$, forming a set with $2|E(G)|$ points, and then to obtain a partition by running the $k$-means algorithm on this expanded set. The "trick" is to copy every node $u$ into $d_u$ identical nodes. This method can efficiently deal with networks whose clusters overlap. For convenience, we assume that the output of the $k$-means clustering algorithm on the expanded vertex set satisfies the following condition.
(A)
For every $v\in V$, all $d_v$ copies of $F(v)$ are contained in one part.
Suppose that $\{Y_1,\dots,Y_k\}$ is the partition of $V$ with centers $z_1,\dots,z_k$ output by the $k$-means clustering algorithm. The value of the clustering cost function is denoted by "COST", i.e.,
$\mathrm{COST}=g(Y_1,\dots,Y_k,z_1,\dots,z_k).$
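A small sketch of this cost function is given below. Instead of physically copying each node $d_v$ times, it weights each squared distance by $d_v$, which gives the same value of $g$; the function name and argument layout are ours.

```python
import numpy as np

def clustering_cost(F, degrees, partition, centers):
    """g(S_1,...,S_k, w_1,...,w_k) = sum_i sum_{v in S_i} d_v * ||F(v) - w_i||_2^2.

    F: (n x k) array whose v-th row is the embedding F(v).
    degrees: length-n array of node degrees d_v.
    partition: list of k index arrays; centers: list of k vectors.
    Weighting by d_v is equivalent to making d_v copies of F(v)."""
    cost = 0.0
    for S_i, w_i in zip(partition, centers):
        S_i = np.asarray(S_i)
        diff = F[S_i] - np.asarray(w_i)
        cost += np.sum(degrees[S_i] * np.sum(diff ** 2, axis=1))
    return cost
```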
Next, we introduce the traditional NJW algorithm and the RSC algorithm (Algorithm 1).
Algorithm 1. The traditional NJW and RSC algorithm
Input:
$W$, $k$ ($\tau$ for RSC)
1:
Calculate the normalized Laplacian matrix $L=D^{-1/2}WD^{-1/2}$
($L_\tau=(D+\tau I)^{-1/2}W(D+\tau I)^{-1/2}$ for RSC).
2:
Find the eigenvectors $f_1,\dots,f_k$ corresponding to the $k$ largest eigenvalues of $L$. Form $X=[f_1,\dots,f_k]$ by putting the eigenvectors into the columns.
3:
Normalize each row of $X$ to get the matrix $Y$, i.e., $Y_{ij}=X_{ij}/\big(\sum_{j=1}^{k}X_{ij}^2\big)^{1/2}$, where $i=1,\dots,n$ and $j=1,\dots,k$.
4:
Apply the $k$-means method to the rows of $Y$ to get the label of each node.
Output:
labels for all nodes
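A compact Python sketch of Algorithm 1 follows, assuming NumPy, SciPy, and scikit-learn are available; setting $\tau=0$ recovers NJW and $\tau>0$ (e.g., $\tau=\delta\bar d$) gives RSC. The function name and defaults are ours.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k, tau=0.0, seed=0):
    """Algorithm 1: NJW when tau = 0, RSC when tau > 0."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    inv_sqrt = np.diag(1.0 / np.sqrt(d + tau))            # (D + tau*I)^{-1/2}
    L = inv_sqrt @ W @ inv_sqrt                            # step 1
    _, vecs = eigh(L)                                      # eigenvalues in ascending order
    X = vecs[:, -k:]                                       # step 2: top-k eigenvectors
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)       # step 3: row-normalize
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Y)  # step 4
```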

3. Analysis of RSC Algorithm

Our method for analyzing the RSC algorithm follows the strategy developed by Peng et al. [11], Kolev and Mehlhorn [10], and Mizutani [12]. Let $\{S_1,\dots,S_k\}$ be a partition of the node set $V$. Define $g_i\in\mathbb{R}^n$ as the indicator vector of $S_i$; that is, the $v$-th element of $g_i$ is one if $v\in S_i$ and zero otherwise. The normalized indicator $\bar g_i$ of $S_i$ is given as
$\bar g_i=\frac{D^{1/2}g_i}{\|D^{1/2}g_i\|_2},\qquad (\bar g_i)_v=\begin{cases}\sqrt{\dfrac{d_v}{\mu(S_i)}} & v\in S_i,\\[4pt] 0 & v\notin S_i.\end{cases}$
It is obvious that $\|\bar g_i\|_2=1$.
The following result is called the structure theorem, which plays a very important role in examining the performance of spectral clustering. It shows that there is a linear combination $\hat f_i$ of $f_1,\dots,f_k$ such that $\hat f_i$ and $\bar g_i$ are close.
Theorem 1
(Structure Theorem). Let
$\Psi=\frac{1}{1-\lambda_{k+1}(\tau)}\left(1-\frac{d_{\min}}{d_{\max}+\tau}+\bar\phi_k(G)\,\frac{d_{\min}}{d_{\max}+\tau}\right),$    (2)
where $\lambda_{k+1}(\tau)$ ($\lambda_{k+1}$ for short) is the $(k+1)$-th largest eigenvalue of $L_\tau$, let $\{S_1,\dots,S_k\}$ be the $\bar\phi_k(G)$-optimal partition of $G$, and let $\bar G=[\bar g_1,\dots,\bar g_k]\in\mathbb{R}^{n\times k}$, $\bar F=[f_1,\dots,f_k]\in\mathbb{R}^{n\times k}$. If $k\Psi<1$, then there exists a $k\times k$ orthogonal matrix $U=[u_1,\dots,u_k]$ such that
$\|\bar F U-\bar G\|_F\le 2\sqrt{k\Psi}.$    (3)
Proof. 
Denote by $\bar g_{iu}$ the element of $\bar g_i$ corresponding to the vertex $u$. Moreover, let
$\bar g_i=\sum_{j=1}^{n}h_{i,j}f_j,\qquad \hat f_i=\sum_{j=1}^{k}h_{i,j}f_j.$
First,
$\bar g_i^{\top}L_\tau\bar g_i=\sum_{\{u,v\}\in E(G)}\left(\frac{1}{d_u}\bar g_{iu}^2-\frac{2}{\sqrt{(d_u+\tau)(d_v+\tau)}}\,\bar g_{iu}\bar g_{iv}+\frac{1}{d_v}\bar g_{iv}^2\right)$
$=\sum_{\{u,v\}\in E,\,u\in S_i,\,v\notin S_i}\frac{1}{\mu(S_i)}+\sum_{\{u,v\}\in E,\,u,v\in S_i}\frac{2}{\mu(S_i)}\left(1-\frac{\sqrt{d_ud_v}}{\sqrt{(d_u+\tau)(d_v+\tau)}}\right)$
$\le\phi(S_i)+\frac{2|E(S_i)|}{\mu(S_i)}\left(1-\frac{d_{\min}}{d_{\max}+\tau}\right)=1-\frac{2|E(S_i)|}{\mu(S_i)}\cdot\frac{d_{\min}}{d_{\max}+\tau}=1-\frac{d_{\min}}{d_{\max}+\tau}+\phi(S_i)\,\frac{d_{\min}}{d_{\max}+\tau}<1.$
On the other hand,
$\bar g_i^{\top}L_\tau\bar g_i=\Big(\sum_{j=1}^{n}h_{i,j}f_j\Big)^{\top}L_\tau\Big(\sum_{j=1}^{n}h_{i,j}f_j\Big)=\Big(\sum_{j=1}^{n}h_{i,j}f_j\Big)^{\top}\sum_{j=1}^{n}h_{i,j}(1-\lambda_j)f_j=\sum_{j=1}^{n}h_{i,j}^2(1-\lambda_j)\ge\sum_{j=k+1}^{n}h_{i,j}^2(1-\lambda_j)\ge(1-\lambda_{k+1})\sum_{j=k+1}^{n}h_{i,j}^2.$
Then,
$\|\hat f_i-\bar g_i\|_2^2=\sum_{j=k+1}^{n}h_{i,j}^2\le\frac{1}{1-\lambda_{k+1}}\left(1-\frac{d_{\min}}{d_{\max}+\tau}+\phi(S_i)\,\frac{d_{\min}}{d_{\max}+\tau}\right),$
and
$\|\hat F-\bar G\|_F^2=\sum_{i=1}^{k}\|\hat f_i-\bar g_i\|_2^2\le\frac{k}{1-\lambda_{k+1}}\left(1-\frac{d_{\min}}{d_{\max}+\tau}+\bar\phi_k(G)\,\frac{d_{\min}}{d_{\max}+\tau}\right)=k\Psi.$
Let $h_i=[h_{i,1},\dots,h_{i,k}]^{\top}$, $i=1,\dots,k$, and $H=[h_1,\dots,h_k]\in\mathbb{R}^{k\times k}$. Consider the singular value decomposition of $H$, given as $H=A\Sigma B^{\top}$, where $A\in\mathbb{R}^{k\times k}$ and $B\in\mathbb{R}^{k\times k}$ are orthogonal matrices and $\Sigma$ is a $k\times k$ diagonal matrix.
Let $U=AB^{\top}$ and $R=U-H\in\mathbb{R}^{k\times k}$. Then, $U$ is an orthogonal matrix. According to the proof of Theorem 4 in [12], we obtain
$\|R\|_F\le\sqrt{k\Psi}\quad\text{and}\quad\|\bar FU-\bar G\|_F\le k\Psi+\sqrt{k\Psi}.$
When $k\Psi<1$, we have
$\|\bar FU-\bar G\|_F\le 2\sqrt{k\Psi}.$
This completes the proof. □
Given $k$ vectors $c_1,\dots,c_k\in\mathbb{R}^k$, suppose that $\|c_i-c_j\|_2^2$ is lower bounded by some real numbers $\zeta_{i,j}\ge 0$ and that $g(S_1,\dots,S_k,c_1,\dots,c_k)$ is upper bounded by a real number $\omega\ge 0$, i.e.,
$\|c_i-c_j\|_2^2\ge\zeta_{i,j}\ (i\ne j)\qquad\text{and}\qquad g(S_1,\dots,S_k,c_1,\dots,c_k)\le\omega.$    (4)
We are now ready to derive the bounds $\zeta_{i,j}$ and $\omega$ shown in (4) for the RSC algorithm. Let $\bar F=[f_1,\dots,f_k]$ and let $p_v$ be the $v$-th row of $\bar F$, corresponding to the node $v$. Since $U$ is an orthogonal matrix, the left-hand side of inequality (3) can be rewritten as
$\|\bar FU-\bar G\|_F^2=\|\bar F-\bar GU^{\top}\|_F^2=\sum_{i=1}^{k}\sum_{v\in S_i}\left\|p_v-\sqrt{\frac{d_v}{\mu(S_i)}}\,u_i\right\|_2^2.$    (5)
The spectral embedding map in the RSC algorithm, denoted by $F_{RSC}(v)$, is given as
$F_{RSC}(v)=\frac{1}{\|p_v\|_2}\,p_v.$
Hence, according to the discussion in [12], it is easy to obtain the upper bound of "COST". The discussion relies on the following inequality.
Lemma 1
([12]). The following inequality holds for a vector $a\in\mathbb{R}^k$ and a vector $u\in\mathbb{R}^k$ with $\|u\|_2=1$:
$\left\|\frac{a}{\|a\|_2}-u\right\|_2\le 2\,\|a-u\|_2.$
Theorem 2.
Let a partition $\{S_1,\dots,S_k\}$ of $G$ be an optimal partition achieving $\bar\phi_k(G)$ and let $F_{RSC}$ be the spectral embedding map in the RSC algorithm. Define the center of $S_i$ as $c_i=u_i$ for $i=1,\dots,k$. Then
  • $\|c_i-c_j\|_2^2=2$;
  • $g(S_1,\dots,S_k,c_1,\dots,c_k)\le 16\,k\,\mu_{\max}\,\Psi$,
where $\mu_{\max}=\max\{\mu(S_i)\mid i=1,2,\dots,k\}$.
Proof. 
First, since $c_i=u_i$, we have
$\|c_i-c_j\|_2^2=(u_i-u_j)^{\top}(u_i-u_j)=2.$
On the other hand, let $F(v)=\sqrt{\mu(S_i)/d_v}\,p_v$. Then,
$g(S_1,\dots,S_k,c_1,\dots,c_k)=\sum_{i=1}^{k}\sum_{v\in S_i}d_v\,\|F_{RSC}(v)-u_i\|_2^2=\sum_{i=1}^{k}\sum_{v\in S_i}d_v\left\|\frac{p_v}{\|p_v\|_2}-u_i\right\|_2^2=\sum_{i=1}^{k}\sum_{v\in S_i}d_v\left\|\frac{F(v)}{\|F(v)\|_2}-u_i\right\|_2^2$
$\le 4\sum_{i=1}^{k}\sum_{v\in S_i}d_v\left\|\sqrt{\frac{\mu(S_i)}{d_v}}\,p_v-u_i\right\|_2^2\quad(\text{by Lemma 1})$
$=4\sum_{i=1}^{k}\sum_{v\in S_i}\mu(S_i)\left\|p_v-\sqrt{\frac{d_v}{\mu(S_i)}}\,u_i\right\|_2^2\le 16\,k\,\mu_{\max}\,\Psi\quad(\text{by Equation (5) and Theorem 1}).$
The result holds. □
Assume that OPT stands for the optimal clustering cost of graph $G$. Then it is obvious that $\mathrm{COST}\le\alpha\cdot\mathrm{OPT}$, where $\alpha$ is the approximation ratio of the $k$-means algorithm. Moreover, $\mathrm{OPT}\le g(S_1,\dots,S_k,c_1,\dots,c_k)$. Therefore, we can obtain the upper bound of COST.
Theorem 3.
Let $\{S_1,\dots,S_k\}$ be a $\bar\phi_k(G)$-optimal partition of $G$. Then
$\mathrm{COST}\le 16\,k\,\alpha\,\mu_{\max}\,\Psi.$
Lemma 2
([12]). Assume that, for every permutation $\pi:\{1,\dots,k\}\to\{1,\dots,k\}$, there is an index $l$ such that $\mu(A_l\,\Delta\,S_{\pi(l)})\ge 2\epsilon\,\mu(S_{\pi(l)})$ for a real number $0\le\epsilon\le 1/2$. Then, the following inequality holds:
$\mathrm{COST}\ge\frac{1}{8}\sum_{i\in H}\xi_i\,\zeta_{i,p}\,\min\{\mu(S_i),\mu(S_l)\}-\omega,$
where $H$ is a subset of $\{1,\dots,k\}$, $p$ is an element of $\{1,\dots,k\}$, $\xi_i\ge 0$ is a non-negative real number satisfying $\sum_{i\in H}\xi_i\ge\epsilon$, and $\omega$ is the upper bound of $g(S_1,\dots,S_k,c_1,\dots,c_k)$ in (4).
By setting $\zeta_{i,j}=2$ and $\omega=16\,k\,\alpha\,\mu_{\max}\,\Psi$, we obtain the following result.
Theorem 4.
Suppose that the assumption of Lemma 2 holds. Then,
$\mathrm{COST}\ge\frac{1}{4}\,\epsilon\,\mu_{\min}-16\,k\,\alpha\,\mu_{\max}\,\Psi,$
where $\mu_{\min}=\min\{\mu(S_i)\mid i=1,2,\dots,k\}$.
Theorem 5
(Main result). Given a graph $G=(V,E)$ and a positive integer $k$, let a partition $\{S_1,\dots,S_k\}$ of $G$ be $\bar\phi_k(G)$-optimal and let $\{A_1,\dots,A_k\}$ be the partition of $G$ returned by the RSC clustering algorithm. Assume that the $k$-means clustering algorithm has an approximation ratio $\alpha$ and satisfies assumption (A). If $\Psi\le\frac{\mu_{\min}}{264\cdot 2\,k\,\alpha\,\mu_{\max}}$, then, after a suitable renumbering of $A_1,\dots,A_k$, the following holds for $i=1,\dots,k$:
$\mu(A_i\,\Delta\,S_i)\le 264\,k\,\alpha\,\frac{\mu_{\max}}{\mu_{\min}}\,\Psi\,\mu(S_i).$
Proof. 
Choose the real number
$\epsilon=132\,k\,\alpha\,\frac{\mu_{\max}}{\mu_{\min}}\,\Psi\le\frac{1}{4}.$
Suppose, for contradiction, that for every permutation $\pi:\{1,\dots,k\}\to\{1,\dots,k\}$ there is an index $l$ such that $\mu(A_l\,\Delta\,S_{\pi(l)})\ge 2\epsilon\,\mu(S_{\pi(l)})$. Hence, applying Theorems 3 and 4, we obtain
$\mathrm{COST}\ge\frac{1}{4}\,\epsilon\,\mu_{\min}-16\,k\,\alpha\,\mu_{\max}\,\Psi=33\,k\,\alpha\,\mu_{\max}\,\Psi-16\,k\,\alpha\,\mu_{\max}\,\Psi=17\,k\,\alpha\,\mu_{\max}\,\Psi>16\,k\,\alpha\,\mu_{\max}\,\Psi,$
which contradicts Theorem 3. That means that, after a suitable renumbering of $A_1,\dots,A_k$, we have
$\mu(A_i\,\Delta\,S_i)\le 2\epsilon\,\mu(S_i)=264\,k\,\alpha\,\frac{\mu_{\max}}{\mu_{\min}}\,\Psi\,\mu(S_i)$
for every $i=1,2,\dots,k$. □

4. Finding an Appropriate τ and Numerical Experiment

The main theorem gives an upper bound on $\mu(A_i\,\Delta\,S_i)$ for the RSC algorithm. It tells us that the performance varies as the term $\Psi$ decreases with increasing $\tau$. In this section, we try to find an appropriate $\tau$ for a good partitioning, guided by this main theorem.
Before our analysis, we make some reasonable assumptions (B) to (D).
(B)
$2|E(S_i)|/\mu(S_i)>1/\bar d$;
(C)
$\tau\le k\,\bar d$;
(D)
$\mu_{\min}/\mu_{\max}\ge\frac{2\bar d}{n}$.
Firstly, $\frac{2|E(S_i)|}{\mu(S_i)}$ stands for the ratio of (twice) the number of edges inside $S_i$ to the degree sum of all nodes in $S_i$. We may assume that $\frac{2|E(S_i)|}{\mu(S_i)}>1/\bar d$, since $S_i$ is one of the clusters in the optimal partitioning. Second, as mentioned in [6,7], the choice of $\tau$ is very important: if $\tau$ is too small, there is insufficient regularization; if $\tau$ is too large, it washes out significant eigenvalues. It is therefore reasonable to assume that $\tau\le k\bar d$. Moreover, $\mu_{\min}$ and $\mu_{\max}$ stand for the volumes of the corresponding clusters, and their ratio stands for the relative density. Hence, we may assume that $\frac{\mu_{\min}}{\mu_{\max}}\ge\frac{2\bar d}{n}$.
Then,
$\Psi\le\frac{1}{1-\lambda_{k+1}(\tau)}\left(1-\frac{d_{\min}}{\bar d\,(d_{\max}+\tau)}\right)\le\frac{\bar d\,(d_{\max}+k\bar d)-d_{\min}}{(1-\lambda_{k+1}(\tau))\,\bar d\,(d_{\max}+\tau)},$
where $\lambda_{k+1}(\tau)$ is the $(k+1)$-th largest eigenvalue of $L_\tau$. Furthermore, the theoretical analysis in [7] shows that $\tau=\bar d$ provides good results and that one could adjust this by a multiplicative constant. For these reasons, we set $\tau=\delta\bar d$ and attempt to find an appropriate $\delta$ to refine the algorithm.
Six real datasets are used to test our method. These datasets can be downloaded directly from http://zke.fas.harvard.edu/software.html, accessed on 10 September 2022. Table 1 shows the detailed information of the six real datasets, including the source of the dataset, the number of data points (n), the number of communities (k), the minimum degree ($d_{\min}$), the maximum degree ($d_{\max}$), and the average degree ($\bar d$).

4.1. Find an Appropriate δ

Let
$UB(\delta)=\frac{1}{\big(1-\lambda_{k+1}(\delta\bar d)\big)\,(d_{\max}+\delta\bar d)}.$
Figure 1 plots the variation of $UB(\delta)$ as $\delta$ varies between 0 and 1 for the six real datasets. It is obvious that $UB(\delta)$ decreases with increasing $\delta$. The following theorem (often called the Geršgorin disc theorem) supports this observation.
Theorem 6
(Geršgorin Disk Theorem). Let $A=(a_{ij})\in\mathbb{R}^{n\times n}$ and let
$R_i(A)=\sum_{\substack{j=1,\ j\ne i}}^{n}|a_{ij}|,\qquad 1\le i\le n,$
denote the deleted absolute row sums of $A$. Then, all the eigenvalues of $A$ are located in the union of the $n$ discs
$\bigcup_{i=1}^{n}\{z\in\mathbb{C}:|z-a_{ii}|\le R_i(A)\}\equiv G(A).$
It tells us that, for all $i=1,2,\dots,n$, $R_i(L_{\delta\bar d})$ decreases with increasing $\delta$. Then, $\lambda_{k+1}(\delta\bar d)$ and the term $UB(\delta)$ decrease as well. It is easy to see that $\lim_{\delta\to\infty}UB(\delta)=0$. Therefore, we would like to find an appropriate $\delta$ such that the upper bound $UB(\delta)$ does not vary too much when $\delta$ varies slightly.
According to Theorem 5, we may assume that
$\Psi\le\frac{\bar d\,(d_{\max}+k\bar d)-d_{\min}}{(1-\lambda_{k+1}(\tau))\,\bar d\,(d_{\max}+\tau)}\le\frac{\mu_{\min}}{264\cdot 2\,k\,\alpha\,\mu_{\max}}.$
Then,
$\frac{1}{(1-\lambda_{k+1}(\tau))\,\bar d\,(d_{\max}+\tau)}\le\frac{\bar d}{264\,k^2\,n},$
which follows from the assumptions $\frac{\mu_{\min}}{\mu_{\max}}\ge\frac{2\bar d}{n}$ and $\bar d\,(d_{\max}+k\bar d)-d_{\min}\ge k$ (treating the approximation ratio $\alpha$ as a constant).
Define $UBD(\delta)$ as the absolute difference of $UB$ when $\delta$ increases by 0.005, i.e., $UBD(\delta)=|UB(\delta+0.005)-UB(\delta)|$. We would like to find the $\delta_0$ that satisfies the following conditions:
$UBD(\delta)\le\frac{\bar d}{264\,k^2\,n}\ \text{ for all }\delta\ge\delta_0,\qquad UBD(\delta)>\frac{\bar d}{264\,k^2\,n}\ \text{ for all }\delta<\delta_0.$    (6)
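The sketch below illustrates this selection rule: it evaluates $UB(\delta)$ on a grid with step 0.005 and returns the first $\delta$ after which the successive differences stay below the threshold $\bar d/(264k^2n)$, which is our reading of condition (6). The function names, the grid range, and the use of a dense eigendecomposition are our own choices for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def ub(W, k, delta):
    """UB(delta) = 1 / ((1 - lambda_{k+1}(delta*dbar)) * (d_max + delta*dbar))."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    tau = delta * d.mean()
    inv_sqrt = np.diag(1.0 / np.sqrt(d + tau))
    lam = np.sort(eigh(inv_sqrt @ W @ inv_sqrt, eigvals_only=True))[::-1]
    return 1.0 / ((1.0 - lam[k]) * (d.max() + tau))   # lam[k] = (k+1)-th largest

def select_delta0(W, k, delta_max=3.0, step=0.005):
    """Pick delta_0 per condition (6): UBD stays below dbar/(264*k^2*n) from delta_0 on."""
    W = np.asarray(W, dtype=float)
    n, dbar = W.shape[0], W.sum(axis=1).mean()
    threshold = dbar / (264 * k**2 * n)
    deltas = np.arange(0.0, delta_max + step, step)
    ubd = np.abs(np.diff([ub(W, k, dlt) for dlt in deltas]))
    below = ubd <= threshold
    for i in range(len(below)):
        if below[i:].all():      # differences stay below the threshold from here on
            return deltas[i]
    return deltas[-1]
```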
In the rest of this paper, three indices, namely RI, NMI, and error rate, are used to evaluate the effectiveness.

Evaluation Indices

Rand Index. For a dataset with $n$ data points, the total number of sample pairs is $\frac{n(n-1)}{2}$. If two sample points that belong to the same class are assigned to the same cluster, we count such a pair in $a$; if two sample points that belong to different classes are assigned to different clusters, we count such a pair in $b$. The Rand index (RI) is calculated as
$RI=\frac{a+b}{n(n-1)/2}.$
The $RI$ value represents the proportion of correctly clustered sample pairs among all sample pairs and is often used to measure the similarity between two partitions. Obviously, $RI$ lies between 0 and 1. If $RI=1$, the clustering is completely correct, and if $RI=0$, it is completely wrong.
Normalized Mutual Information. We use $U$ and $V$ to denote the true label vector and the predicted label vector, respectively. Let $U_i$ represent the elements belonging to class $i$ in $U$ and $V_j$ represent the elements belonging to class $j$ in $V$. $H(U)$ represents the information entropy of $U$, which can be calculated by
$H(U)=-\sum_{i}p_i\log p_i,$
where the base of the logarithm is usually 2 and $p_i$ represents the ratio of the number of nodes belonging to class $i$ to the total number of nodes, i.e., $p_i=\frac{|U_i|}{n}$. Now, we can obtain the formula for the mutual information (MI):
$MI(U,V)=\sum_{i}\sum_{j}p_{ij}\log\frac{p_{ij}}{p_i\times p_j},$
where $p_{ij}=\frac{|U_i\cap V_j|}{n}$. Based on the information entropy and the mutual information, we obtain the normalized mutual information as
$NMI(U,V)=\frac{2\,MI(U,V)}{H(U)+H(V)}.$
Error Rate. The error rate is defined by
$\min_{\pi:\ \text{permutation over}\ \{1,2,\dots,k\}}\ \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{\pi(\hat l_i)\ne l_i\},$
where $\hat l_i$ and $l_i$ are the true and predicted labels of node $i$, respectively.
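For completeness, a small sketch of these three indices in Python follows, assuming class labels are encoded as integers 0,...,k−1. RI and NMI are taken from scikit-learn (rand_score and normalized_mutual_info_score, the latter with its default arithmetic normalization, matching the formula above), and the permutation in the error rate is found with the Hungarian algorithm; the helper name is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import rand_score, normalized_mutual_info_score

def error_rate(true_labels, pred_labels, k):
    """Minimum fraction of misclassified nodes over all label permutations."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    n = len(true_labels)
    confusion = np.zeros((k, k), dtype=int)     # confusion[i, j]: true i, predicted j
    for t, p in zip(true_labels, pred_labels):
        confusion[t, p] += 1
    # Maximizing the matched counts with the Hungarian algorithm is equivalent
    # to minimizing the error over all permutations of the predicted labels.
    rows, cols = linear_sum_assignment(-confusion)
    return 1.0 - confusion[rows, cols].sum() / n

# ri  = rand_score(true_labels, pred_labels)
# nmi = normalized_mutual_info_score(true_labels, pred_labels)
```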

4.2. Real Networks Experiments

After some pre-processing, these six real datasets are all labeled networks containing $k$ non-overlapping communities. We use RSC-$\delta$ to denote the RSC algorithm with $\delta=\delta_0$ satisfying the condition in (6). Actually, NJW, RSC, and RSC-$\delta$ are three different cases of the RSC algorithm for different values of $\delta$: when $\delta=0$, it is the NJW algorithm; when $\delta=1$, it is the RSC algorithm; and when $\delta=\delta_0$ in (6), it is RSC-$\delta$. Table 2 shows the experimental results of these three cases. Furthermore, the best performance on each dataset is indicated in bold. The last row in Table 2 shows the corresponding $\delta_0$ for RSC-$\delta$.
As can be seen from the table, RSC-$\delta$ clusters the UKfaculty and karate datasets completely correctly. Moreover, RSC-$\delta$ achieves the best clustering result on the politicalblog dataset, with only 58 clustering errors.
Table 3 shows the quantities appearing in the upper bound on $\mu(S_i\,\Delta\,A_i)$ proposed in Theorem 5. From this observation, the performance of the RSC-$\delta$ algorithm is affected by the two quantities $\mu_{\max}/\mu_{\min}$ and $\bar\phi_k(G)$. For example, RSC-$\delta$ does not perform well on caltech and dolphins. All networks except caltech have a minimal average conductance smaller than 0.4, while that of caltech is larger than 0.5. Although dolphins has a small $\bar\phi_k(G)$, its $\mu_{\max}/\mu_{\min}$ is larger than 2.

4.3. Synthetic Data Experiments

In this section, we use artificial networks to evaluate the performance of the RSC-$\delta$ algorithm in terms of the average degree, the mixing parameter, and the number of nodes in the largest community. We generate artificial networks using the LFR benchmark, which is considered a standard test bed for community detection and is characterized by non-uniform distributions of node degrees and community sizes.
The test artificial networks are generated with the following parameters: the number of nodes (n), the average degree ($\bar d$), the maximum degree (maxd), the mixing parameter ($\mu$), the number of nodes in the smallest community (minc), and the number of nodes in the largest community (maxc). The value of the mixing parameter $\mu$ is between 0.1 and 0.9. Low values of $\mu$ give a clear community structure in which intra-cluster links greatly outnumber inter-cluster links [17].
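As an illustration of this setup, the snippet below generates one such network with networkx's LFR_benchmark_graph generator. The power-law exponents tau1 and tau2 are illustrative choices that the paper does not specify, and the generator may need parameter tuning or a different seed to converge.

```python
import networkx as nx

# Hedged sketch: one LFR network with n = 500, average degree 15, mu = 0.5,
# community sizes between 100 and 300, and maximum degree 220 (the Section 4.3.2
# setting). tau1/tau2 (degree and community-size exponents) are our own choices.
G = nx.LFR_benchmark_graph(
    n=500, tau1=2.5, tau2=1.5, mu=0.5,
    average_degree=15, max_degree=220,
    min_community=100, max_community=300,
    seed=42,
)
# Ground-truth communities are stored as a node attribute.
communities = {frozenset(G.nodes[v]["community"]) for v in G}
W = nx.to_numpy_array(G)   # adjacency matrix usable by the algorithms above
```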

4.3.1. The Ratio of the Average Degree to the Maximum Degree

In this experiment, we generate nine artificial networks consisting of 500 nodes each. To evaluate the performance of RSC-$\delta$ in terms of the average degree, we fix the parameters $\mu=0.5$, minc = 100, maxc = 300, and maxd = 220, and let the average degree vary from 10 to 170, i.e., 10, 30, 50, 70, 90, 110, 130, 150, and 170. The ratio of the average degree to the maximum degree thus varies from 0.0455 to 0.7727. The performance comparison is summarized in Figure 2.
From these results, we observe that the performance of RSC-$\delta$ is highly dependent on the average degree of the network. As the average degree increases, RI and NMI increase and the error rate decreases significantly. Actually, this phenomenon is verified by inequality (2), since equality holds when the graph is regular.

4.3.2. Mixing Parameter

In this experiment, we again generate nine artificial networks with 500 nodes and fix the parameters $\bar d=15$, minc = 100, maxc = 300, and maxd = 220. In order to study the effect of the mixing parameter on RSC-$\delta$, $\mu$ varies from 0.1 to 0.9. The experimental results are shown in Figure 3.
From these results, we see that RSC-$\delta$ performs excellently when $\mu$ is between 0 and 0.3. However, its performance drops sharply as $\mu$ varies from 0.3 to 0.5. This phenomenon coincides with the result on the real datasets that RSC-$\delta$ does not perform well when $\bar\phi_k(G)$ is larger than 0.5. However, the performance of RSC-$\delta$ remains stable when $\mu\ge 0.5$, which shows that RSC-$\delta$ is less affected by $\mu$ in this range.

4.3.3. The Number of Nodes in the Largest Community

In this experiment, we generate 13 artificial networks consisting of 1700 nodes each. To evaluate the performance of RSC-$\delta$ in terms of the number of nodes in the largest community, we fix the parameters $\mu=0.5$, $\bar d=30$, minc = 300, and maxd = 500, and let the number of nodes in the largest community vary from 300 to 900 with a step size of 50. The experimental results are shown in Figure 4.
Since both the degree distribution and the community-size distribution of graphs generated by the LFR benchmark are power laws, this experiment uses the ratio maxc/minc to simulate $\mu_{\max}/\mu_{\min}$. The results show that the RSC-$\delta$ algorithm performs well when the network is "balanced", which also verifies the results on the real datasets.

5. Conclusions

Traditional spectral clustering algorithms such as NJW perform poorly on sparse networks with a strong degree of heterogeneity. The RSC algorithm improves the performance of spectral clustering on sparse networks through degree correction. Based on spectral graph theory, this paper investigates the degree-correction method of RSC and shows that the RSC algorithm works for a wide class of networks. Moreover, we provide a method to find an appropriate degree correction $\tau$ to refine the RSC algorithm. Some numerical experiments are conducted to evaluate the performance of our method. Comparing the experimental results on the six real datasets, RSC-$\delta$ performs well on the karate, politicalblog, and simmons datasets. The experimental results on the artificial networks show that RSC-$\delta$ performs well when the average degree is much smaller than the maximum degree. Furthermore, the performance of the RSC-$\delta$ algorithm is less affected by the mixing parameter when $\mu\ge 0.5$. Finally, the numerical experiments also show that the algorithm is affected by the two quantities $\bar\phi_k(G)$ and $\mu_{\max}/\mu_{\min}$.

6. Discussion

The RSC algorithm uses a constant τ for the degree-correction. Can we use different degree-corrections for different nodes? We try to use the information of the neighbor nodes of each node as follows.
Let $N(i)$ be the set of nodes adjacent to node $i$. Denote $d_{\max}^{\,i}=\max\{d_j: j\in N(i)\}$, $d_{\min}^{\,i}=\min\{d_j: j\in N(i)\}$, $d_{\mathrm{mid}}^{\,i}=\frac{1}{2}\big(d_{\max}^{\,i}+d_{\min}^{\,i}\big)$, and $d_{\mathrm{mean}}^{\,i}=\sum_{j\in N(i)}d_j/d_i$.
Let $\Pi=\mathrm{diag}(\pi_1,\dots,\pi_n)$ be a diagonal matrix of order $n$. The modified normalized Laplacian matrix is
$L_\Pi=(D+\Pi)^{-1/2}\,W\,(D+\Pi)^{-1/2}.$
We use RSC-max, RSC-min, RSC-mean, and RSC-mid to denote the methods in which $\pi_i$ equals $d_{\max}^{\,i}$, $d_{\min}^{\,i}$, $d_{\mathrm{mean}}^{\,i}$, and $d_{\mathrm{mid}}^{\,i}$, respectively, for $i=1,2,\dots,n$. Table 4 shows the experimental results of these methods. We can see that the RSC-min algorithm is slightly better than RSC: it performs better than RSC on five datasets and only misclassifies two nodes on UKfaculty. Therefore, using a different degree correction for each node might improve the performance of the RSC algorithm. We leave this to future work.
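A brief sketch of this node-wise correction is given below; the helper names are ours, and the graph is assumed to have no isolated nodes so that every neighbor set is non-empty.

```python
import numpy as np

def node_wise_correction(W, mode="min"):
    """pi_i = min / max / mean / mid of the neighbor degrees {d_j : j in N(i)}."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    pi = np.zeros_like(d)
    for i in range(len(d)):
        dn = d[np.flatnonzero(W[i] > 0)]          # degrees of the neighbors of i
        if mode == "min":
            pi[i] = dn.min()
        elif mode == "max":
            pi[i] = dn.max()
        elif mode == "mean":
            pi[i] = dn.sum() / d[i]               # d_mean^i = sum_{j in N(i)} d_j / d_i
        else:                                      # "mid"
            pi[i] = 0.5 * (dn.min() + dn.max())
    return pi

def modified_laplacian(W, pi):
    """L_Pi = (D + Pi)^{-1/2} W (D + Pi)^{-1/2}."""
    W = np.asarray(W, dtype=float)
    inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1) + pi))
    return inv_sqrt @ W @ inv_sqrt
```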

Author Contributions

Conceptualization, W.L.; data curation, W.L. and F.L.; formal analysis, W.L.; funding acquisition, W.L.; investigation, F.L.; methodology, W.L.; project administration, W.L.; resources, W.L. and F.L.; software, F.L.; supervision, W.L.; validation, W.L., F.L. and Y.Z.; visualization, W.L. and F.L.; writing—original draft preparation, W.L.; writing—review and editing, W.L., F.L. and Y.Z.; All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (No. 11901094).

Data Availability Statement

The data presented in this study are openly available at http://zke.fas.harvard.edu/software.html, accessed on 10 September 2022.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Adamic, L.A.; Glance, N. The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43.
  2. Hamad, D.; Biela, P. Introduction to spectral clustering. In Proceedings of the 2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications, Damascus, Syria, 7–11 April 2008; pp. 1–6.
  3. Khan, B.S.; Niazi, M.A. Network community detection: A review and visual survey. arXiv 2017, arXiv:1708.00977.
  4. Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
  5. Ng, A.; Jordan, M.; Weiss, Y. On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2001, 14, 849–856.
  6. Chaudhuri, K.; Chung, F.; Tsiatas, A. Spectral clustering of graphs with general degrees in the extended planted partition model. In Proceedings of the Conference on Learning Theory, JMLR Workshop and Conference Proceedings, Edinburgh, UK, 25–27 June 2012; pp. 1–35.
  7. Qin, T.; Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 2013, 26, 3120–3128.
  8. Qing, H.; Wang, J. An improved spectral clustering method for community detection under the degree-corrected stochastic blockmodel. arXiv 2020, arXiv:2011.06374.
  9. Chung, F.R.K. Spectral Graph Theory; CBMS Reg. Conf. Ser. Math. 92; AMS: Providence, RI, USA, 1997.
  10. Kolev, P.; Mehlhorn, K. A Note on Spectral Clustering. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA 2016), Aarhus, Denmark, 22–26 August 2016; Volume 57, pp. 57:1–57:14.
  11. Peng, R.; Sun, H.; Zanetti, L. Partitioning well-clustered graphs: Spectral clustering works! In Proceedings of the Conference on Learning Theory, Paris, France, 3–6 July 2015; pp. 1423–1455.
  12. Mizutani, T. Improved analysis of spectral algorithm for clustering. Optim. Lett. 2021, 15, 1303–1325.
  13. Nepusz, T.; Petróczi, A.; Négyessy, L.; Bazsó, F. Fuzzy communities and the concept of bridgeness in complex networks. Phys. Rev. E 2008, 77, 016107.
  14. Red, V.; Kelsic, E.D.; Mucha, P.J.; Porter, M.A. Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 2011, 53, 526–543.
  15. Lusseau, D. The emergent properties of a dolphin social network. Proc. R. Soc. Lond. Ser. B Biol. Sci. 2003, 270, S186–S188.
  16. Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473.
  17. Yang, C.; Liu, Z.; Zhao, D.; Sun, M.; Chang, E. Network representation learning with rich text information. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
Figure 1. Plots of the UB( δ ) in six real datasets: x axis: δ and y axis: UB( δ ). (a) UB for different values of δ on UKfaculty; (b) UB for different values of δ on caltech; (c) UB for different values of δ on dolphins; (d) UB for different values of δ on karate; (e) UB for different values of δ on politicalblog; (f) UB for different values of δ on simmons.
Figure 2. Ri, Nmi, and Error rate for different average degrees: x axis: the ratio of the average degree to the maximum degree; and the y axis: Ri, Nmi, Error rate. (a) Ri for different values of d ¯ ; (b) Nmi for different values of d ¯ ; (c) Error rate for different values of d ¯ .
Figure 3. Ri, Nmi, and Error rate for different mixing parameters: x axis: μ and y axis: Ri, Nmi, Error rate. (a) Ri for different values of μ ; (b) Nmi for different values of μ ; (c) Error rate for different values of μ .
Figure 4. Ri, Nmi, and Error rate for different numbers of nodes in the largest community: x axis: the number of nodes; and y axis: Ri, Nmi, Error rate. (a) Ri for different values of maxc; (b) Nmi for different values of maxc; (c) Error rate for different values of maxc.
Table 1. The information of six real datasets.
DataSet        Source                          n      k    d_min   d_max   d̄
UKfaculty      Nepusz et al. (2008) [13]       79     3    2       39      13.97
caltech        Traud et al. (2011) [14]        590    8    1       179     43.46
dolphins       Lusseau (2003) [15]             62     2    1       12      5.12
karate         Zachary (1977) [16]             34     2    1       17      4.6
politicalblog  Adamic and Glance (2005) [1]    1222   2    1       351     27.35
simmons        Traud et al. (2011) [14]        1137   4    1       293     42.66
Table 2. Results on six real datasets.
Metric       Method   UKfaculty   Caltech    Dolphins   Karate   Politicalblog   Simmons
RI           NJW      0.9834      0.9091     1          0.9412   0.5003          0.8596
             RSC      0.9834      0.9091     1          0.9412   0.5003          0.8596
             RSC-δ    1           0.8936     0.9677     1        0.9095          0.8550
NMI          NJW      0.9502      0.6138     1          0.8365   0.0006          0.6796
             RSC      1           0.5881     0.8904     1        0.7133          0.6143
             RSC-δ    1           0.5867     0.8904     1        0.7317          0.6187
Error rate   NJW      1/79        149/590    0/62       1/34     586/1222        284/1137
             RSC      0/79        170/590    1/62       0/34     64/1222         244/1137
             RSC-δ    0/79        174/590    1/62       0/34     58/1222         238/1137
δ_0                   0.71        2.155      2.435      1.205    0.15            0.625
Table 3. The $\bar\phi_k(G)\,k\,\mu_{\max}/\mu_{\min}$ of the six real datasets.
DataSet        μ_min    μ_max    φ̄_k(G)   μ_max/μ_min   φ̄_k(G)·k·μ_max/μ_min
UKfaculty      189      519      0.1909    2.7460        1.5724
caltech        1443     4821     0.5062    3.3410        13.5302
dolphins       94       224      0.0453    2.3830        0.2159
karate         76       80       0.1283    1.0526        0.2701
politicalblog  16,175   17,253   0.0943    1.0666        0.2012
simmons        8796     15,592   0.2946    1.7726        2.0890
Table 4. Different methods of degree correction.
Metric       Method     UKfaculty   Caltech    Dolphins   Karate   Politicalblog   Simmons
RI           RSC        1           0.8967     0.9677     1        0.9007          0.8521
             RSC-min    0.9646      0.9008     1          1        0.9065          0.8590
             RSC-max    0.9834      0.9018     1          1        0.5104          0.8525
             RSC-mean   0.9646      0.8976     1          1        0.5002          0.8504
             RSC-mid    0.9834      0.9005     1          1        0.5002          0.8539
NMI          RSC        1           0.5881     0.8904     1        0.7133          0.6143
             RSC-min    0.8985      0.5953     1          1        0.7243          0.6228
             RSC-max    0.9502      0.6016     1          1        0.0227          0.6172
             RSC-mean   0.8985      0.5933     1          1        0.0019          0.6073
             RSC-mid    0.9502      0.6006     1          1        0.0019          0.6189
Error rate   RSC        0/79        170/590    1/62       0/34     64/1222         244/1137
             RSC-min    2/79        162/590    0/62       0/34     60/1222         222/1137
             RSC-max    1/79        163/590    0/62       0/34     521/1222        242/1137
             RSC-mean   2/79        170/590    0/62       0/34     586/1222        240/1137
             RSC-mid    1/79        164/590    0/62       0/34     586/1222        237/1137