Article

Game Theoretic Clustering for Finding Strong Communities

1
Department of Computer Science, City University of Hong Kong, Hong Kong, China
2
School of General Engineering, Beihang University, Beijing 100191, China
*
Author to whom correspondence should be addressed.
Entropy 2024, 26(3), 268; https://doi.org/10.3390/e26030268
Submission received: 17 January 2024 / Revised: 12 March 2024 / Accepted: 15 March 2024 / Published: 18 March 2024

Abstract

We address the challenge of identifying meaningful communities by proposing a model based on convex game theory and a measure of community strength. Many existing community detection methods fail to provide unique solutions, and it remains unclear how the solutions depend on initial conditions. Our approach identifies strong communities with a hierarchical structure, visualizable as a dendrogram, and computable in polynomial time using submodular function minimization. This framework extends beyond graphs to hypergraphs or even polymatroids. In the case when the model is graphical, a more efficient algorithm based on the max-flow min-cut algorithm can be devised. Though not achieving near-linear time complexity, the pursuit of practical algorithms is an intriguing avenue for future research. Our work serves as the foundation, offering an analytical framework that yields unique solutions with clear operational meaning for the communities identified.

1. Introduction

Community detection is a fundamental problem in various fields, such as biology and social network analysis. The definition of a community can vary with the specific problem and objective at hand, but the definitions provided in [1,2,3] are widely accepted. In broad terms, a community is commonly understood as a group of individuals with stronger connections among its members than with individuals outside the group.
In the process of conducting community detection, real-world problems are typically translated into graphs where nodes represent individuals and edges represent relations. Numerous community detection methods have been developed based on diverse principles and objective functions. Surveys of community detection methods can be found in [4,5,6,7,8,9,10].
Game theory has emerged as a technique applied in community detection [8,11,12,13,14,15]. Its applications extend to identifying disjoint, overlapping, and hierarchical communities. As a systematic framework, game theory models and studies the decisions and outcomes of players in a game [16,17]. Broadly, game theory can be categorized into two main types: non-cooperative game theory and cooperative game theory. Non-cooperative game theory focuses on the competition between individual players, emphasizing their strategies and payoffs. Cooperative game theory, on the other hand, focuses on the cooperation between players and addresses the allocation of payoffs to players based on the worth of the coalitions formed. Within cooperative game theory, there are two main types: non-transferable utility cooperative games, where the payoff for a player within a coalition cannot be transferred to another player in the same coalition, and transferable utility cooperative games, where payoffs are considered transferable among players in the same coalition. Solution concepts such as the core, kernel, nucleolus, Shapley value, egalitarian, etc., play crucial roles in cooperative game theory [18,19].
The community detection method based on cooperative game theory typically identifies the coalition with the highest score determined by a measure evaluated on the coalitions. However, due to the use of approximations, non-unique results are common. Zhou et al. [20] presented a community detection method using cooperative game theory and the Shapley value. The study focused on a social network where nodes are linked to relationships in various finite topics. The Shapley value represents a node’s contribution to the connection closeness of a coalition. The algorithm forms hierarchical and overlapping coalitions by iteratively adding each node to one of the coalitions formed in the previous iteration, where the newly added node obtains the largest Shapley value. Despite running in polynomial time, the algorithm relies on approximation. Another related approach for overlapping and hierarchical community detection [21] also employs cooperative game theory, and the hierarchical structure of coalitions is obtained through a greedy agglomerative method, potentially yielding non-unique results.
The Naming Game [22,23,24] presents another game theoretic approach applicable to community detection, where the community structure emerges from the dynamic interactions between pairs of nodes within the game. However, empirical evidence suggests that the solution is generally not unique [24]. The convergence and computational costs of the method have been analyzed through extensive empirical experiments [22,24], but theoretical bounds remain unclear. Furthermore, the Naming Game relies on pairwise connections and does not capture higher-order statistics among nodes, which limits its scope of application in community detection.
We introduce a notion of strength derived from cooperative game theory to identify strong communities that are interpretable. Moreover, the strong communities are unique, computable in polynomial time with recursive procedures, and can be represented by a dendrogram. The scope of consideration encompasses a set of individuals with a supermodular function for evaluating the communities, which means our approach is applicable to community detection tasks beyond graphical models. Our framework focuses on elucidating the theoretical properties of the strong communities and can provide the foundation for future research on empirical algorithms for large-scale datasets.
This paper is structured as follows: In Section 2, we present the relevant concepts in cooperative games. Section 3 outlines the derivation of the objective function based on convex games. We also formulate the definitions for community strength and strong communities in this section. Moving on to Section 4, we delve into the discussion of the properties of strong communities, laying the foundation for their computation. Section 5 details the solution to the problem through submodular function minimization and, in certain cases, introduces the use of the max-flow min-cut algorithm as a more efficient method in practice. In Section 6, concrete examples are provided to demonstrate the computation of strong communities and the representation of the dendrogram of these communities. Finally, in Section 7, we conclude our work.

2. Cooperative Game

A cooperative game [17] is characterized by $(V, g)$, where
  • $V$ is a finite set of players with $|V| \geq 2$, and
  • $g: 2^V \to \mathbb{R}$ is a set function called the characteristic function, where $g(C)$ is the worth of the coalition $C \subseteq V$, assuming the players in $C$ cooperate to form such a coalition.
Denote the payoff allocation for the players as a vector
$$r_V = (r_1, r_2, \dots, r_{|V|}) \in \mathbb{R}^{|V|},$$
with $r_i$, the $i$-th element of $r_V$, being the payoff allocated to the $i$-th player.
The total payoff in the coalition $C \subseteq V$ is denoted as
$$r(C) := \sum_{i \in C} r_i.$$
Furthermore, when $g$ is a supermodular function, the game is called convex [17]. In this case, for $B, C \subseteq V$,
$$g(B) + g(C) \leq g(B \cup C) + g(B \cap C).$$
Or equivalently, for $B \subseteq C \subseteq V$ and $i \in V \setminus C$,
$$g(B \cup \{i\}) - g(B) \leq g(C \cup \{i\}) - g(C), \tag{3}$$
where both sides are the increases in worth when a player $i$ is added to a coalition. (3) means that the increase in worth when a player joins a coalition is no larger than the increase when the same player joins any superset of that coalition, i.e., the marginal worth is non-diminishing for convex games. For simplicity, $g$ is assumed to be normalized, i.e., $g(\emptyset) = 0$.
As for the payoff allocation, the transferable utility is considered here, i.e., the payoffs can be transferred between players in the same coalition. The core [18] is one of the relevant solution concepts in cooperative games, which is about the feasible allocation of payoffs to players.
The core of a game $(V, g)$ is defined as [17]:
$$\mathrm{Core}(V, g) := \{ r_V \in \mathbb{R}^{|V|} \mid r(V) = g(V),\ r(C) \geq g(C),\ \forall C \subseteq V \}.$$
In the definition of the core, $r(V) = g(V)$ means the payoff allocation exactly splits the total worth of the grand coalition $V$. The inequality $r(C) \geq g(C)$ says that no other coalition $C \subsetneq V$ can have a worth larger than the payoff $C$ can receive by cooperating in $V$, and hence no coalition will deviate from the grand coalition $V$. The core can be viewed as the set of stable payoff allocations. For a convex game, the core is always nonempty [17,25].
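The two core conditions can be checked mechanically for a small game. The following Python sketch is illustrative only: the helper names `nonempty_subsets` and `in_core`, and the toy characteristic function $g(C) = |C|^2$ (supermodular, hence a convex game), are assumptions made for this example and do not come from the paper.

```python
from itertools import combinations

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a frozenset."""
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def in_core(r, V, g, tol=1e-9):
    """Check r(V) = g(V) and r(C) >= g(C) for every non-empty C within V.

    r is a dict mapping each player to its payoff.
    """
    V = frozenset(V)
    if abs(sum(r[i] for i in V) - g(V)) > tol:
        return False
    return all(sum(r[i] for i in C) >= g(C) - tol for C in nonempty_subsets(V))

# Hypothetical convex game: the worth grows quadratically with coalition size.
V = {1, 2, 3}
g = lambda C: len(C) ** 2

print(in_core({1: 3, 2: 3, 3: 3}, V, g))  # True: equal split of g(V) = 9
print(in_core({1: 9, 2: 0, 3: 0}, V, g))  # False: player 2 alone is worth 1 > 0
```

The exhaustive check visits all $2^{|V|} - 1$ coalitions, so it is only meant for toy instances.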

3. Problem Formulation

By regarding the set $V$ of nodes as players, we consider a convex game with the characteristic function $g$ being a supermodular function on $2^V$.
In particular, consider a weighted digraph on the set $V$ of nodes. Such a graph can be characterized by the weights of its directed edges through the weight function $w$:
$$w(B, C) = \sum_{i \in B,\, j \in C} a_{ij},$$
where $a_{ij}$ is the weight of the edge from node $i$ to node $j$. This covers undirected graphs as the special case where $a_{ij} = a_{ji}$ for all $i, j \in V$.
Consider the function $g$ defined in the form of
$$g(B) = \gamma \cdot \beta \cdot w(B, B) - \gamma \cdot (1 - \beta) \cdot w(V \setminus B, B), \tag{5}$$
where $\beta \in [0, 1]$ and $\gamma > 0$. The function $g$ in (5) is supermodular [26]. When $\beta = 1$, (5) reduces to the total weight of edges in $B$ scaled by $\gamma$; when $\beta = 0$, (5) reduces to the negative of the total weight of incoming edges from outside to $B$ scaled by $\gamma$.
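As a rough illustration of (5), the sketch below builds $g$ from a dictionary of edge weights and verifies the supermodular inequality by brute force. The digraph `a`, the function names, and the chosen values of $\beta$ are hypothetical and serve only as an example; the weights are assumed to be non-negative.

```python
from itertools import combinations

def w(a, B, C):
    """Total weight of directed edges from B to C; a maps (i, j) to a weight."""
    return sum(a.get((i, j), 0.0) for i in B for j in C)

def make_g(a, V, beta, gamma):
    """Build g(B) = gamma*beta*w(B, B) - gamma*(1 - beta)*w(V minus B, B)."""
    V = frozenset(V)
    def g(B):
        B = frozenset(B)
        return gamma * beta * w(a, B, B) - gamma * (1 - beta) * w(a, V - B, B)
    return g

def all_subsets(V):
    """Yield every subset of V, including the empty set."""
    items = sorted(V)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def is_supermodular(g, V, tol=1e-9):
    """Check g(B) + g(C) <= g(B | C) + g(B & C) for all pairs of subsets."""
    sets = list(all_subsets(V))
    return all(g(B) + g(C) <= g(B | C) + g(B & C) + tol
               for B in sets for C in sets)

# Hypothetical weighted digraph on four nodes (not a figure from the paper).
V = {1, 2, 3, 4}
a = {(1, 2): 2.0, (2, 1): 1.0, (2, 3): 1.0, (3, 4): 3.0, (4, 3): 3.0}

print(make_g(a, V, 1.0, 1.0)({1, 2}))  # 3.0: internal weight of {1, 2}
print(make_g(a, V, 0.0, 1.0)({3}))     # -4.0: minus the weight entering {3}
print(all(is_supermodular(make_g(a, V, b, 1.0), V) for b in (0.0, 0.5, 1.0)))  # True
```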
We want to identify strong communities based on the convex game using the following measure of community strength.
Definition 1.
For $C \subseteq V: |C| > 1$, define
$$\sigma(C) := \min_{B \subsetneq C:\, B \neq \emptyset}\ \max_{r_C \in \mathrm{Core}(C, g)}\ \min_{i \in B} r_i, \tag{6}$$
which is referred to as the strength of community $C$.
The inner maximization in (6) is the stable payoff guaranteed to any player in $B$, which we term the community support to $B$. The outer minimization in (6) gives the strength of $C$, which is the minimum community support over $B \subsetneq C: B \neq \emptyset$.
The following example illustrates the interpretation of the strength in (6) more concretely.
Example 1.
Consider the unweighted graph in Figure 1a with $V = \{1, 2, 3\}$ and characteristic function $g$ in (5) with $\beta = 1$ and $\gamma = \frac{1}{2}$, i.e., for $B \subseteq V: B \neq \emptyset$, $g(B) = \frac{1}{2} w(B, B)$, which calculates the total number of internal edges inside $B$.
We are going to show how to obtain the strength of $\{1, 2\}$, which requires us to calculate the minimum community support over $B \subsetneq \{1, 2\}: B \neq \emptyset$ according to (6). By definition, the community support to $B$ from $\{1, 2\}$ is the stable payoff that is guaranteed to each player in $B$. For a payoff allocation $r_{\{1,2\}}$ to be stable, it should be in $\mathrm{Core}(\{1, 2\}, g)$, which is calculated to be
$$\mathrm{Core}(\{1, 2\}, g) = \{ (r_1, r_2) \mid r_1 + r_2 = 1,\ r_1 \geq 0,\ r_2 \geq 0 \}.$$
Then, we consider the guaranteed stable payoff to each player in $B$. For instance, when $B = \{1\}$, the guaranteed stable payoff to players in $B$ is 1, which is achieved with the payoff allocation $r_{\{1,2\}} = (1, 0)$; when $B = \{2\}$, the guaranteed stable payoff to players in $B$ is 1, which is achieved with the payoff allocation $r_{\{1,2\}} = (0, 1)$. Therefore, the minimum community support to any non-empty proper subset of $\{1, 2\}$ is 1, i.e., the strength of $\{1, 2\}$ is 1.
Additionally, there is 1 unit of payoff that is transferable between players 1 and 2 based on the constraint for the core. Such a transferable payoff tends to improve the guaranteed payoff for players in non-empty subsets of $C$. As a result, $\{1, 2\}$ intuitively forms a meaningful community.
Similarly, we can show that the strength of $V$ is 0. For a payoff allocation $r_V$ to be stable, it has to be in $\mathrm{Core}(V, g)$, which is calculated to be (see Appendix A.1)
$$\mathrm{Core}(V, g) = \{ r_V := (r_1, r_2, r_3) \mid r_1 + r_2 = 1,\ r_1 \geq 0,\ r_2 \geq 0,\ r_3 = 0 \}.$$
Then, we consider the community support to $B \subsetneq V: B \neq \emptyset$. For instance, when $B = \{1\}$, the community support to $B$ is 1, which is achieved with the payoff allocation $r_V = (1, 0, 0)$. By enumeration over $B \subsetneq V: B \neq \emptyset$, we find that when $B = \{3\}$, the community support to $B$ is 0, which is the minimum value of such community support. Hence, the strength of $V$ is 0.
There is another equivalent definition of $\sigma(C)$, in which the inner minimization term $\min_{i \in B} r_i$ in (6) is replaced by the average payoff allocated to a set $B \subsetneq C: B \neq \emptyset$, as stated in the following result.
Proposition 1.
For $C \subseteq V: |C| > 1$,
$$\sigma(C) = \min_{B \subsetneq C:\, B \neq \emptyset}\ \max_{r_C \in \mathrm{Core}(C, g)} \frac{r(B)}{|B|}. \tag{9}$$
Proof. 
See Appendix A.2. □
Our goal is to identify strong communities defined using σ as follows.
Definition 2.
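For any threshold $\alpha$, define the collection of strong communities in $V$ as
$$\mathcal{C}_\alpha(V) := \operatorname{maximal} \{ C \subseteq V \mid |C| > 1,\ \sigma(C) > \alpha \}. \tag{10}$$
Here $\operatorname{maximal} \mathcal{F}$ means the inclusion-wise maximal subsets in $\mathcal{F}$, i.e.,
$$\operatorname{maximal} \mathcal{F} := \{ B \in \mathcal{F} \mid \nexists\, C \in \mathcal{F},\ B \subsetneq C \}.$$
Similarly, $\operatorname{minimal} \mathcal{F}$ means the inclusion-wise minimal subsets in $\mathcal{F}$, i.e.,
$$\operatorname{minimal} \mathcal{F} := \{ B \in \mathcal{F} \mid \nexists\, C \in \mathcal{F},\ B \supsetneq C \}.$$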
Figure 1. An illustrative example of an unweighted graph with $g(B) := \frac{1}{2} w(B, B)$ for $B \subseteq V: B \neq \emptyset$. (a) The unweighted graph; (b) Visualization of $\mathrm{Core}(V, g)$; (c) The curve $\hat{f}_\alpha(V)$; (d) The dendrogram.
Example 2.
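In Example 1, we already obtained $\sigma(V) = 0$. Similarly, we can obtain $\sigma(\{1, 2\}) = 1$, $\sigma(\{1, 3\}) = 0$ and $\sigma(\{2, 3\}) = 0$.
According to (10), the strong communities in $V$ given by our approach are
$$\mathcal{C}_\alpha(V) = \begin{cases} \{V\}, & \alpha < 0, \\ \{\{1, 2\}\}, & 0 \leq \alpha < 1, \\ \emptyset, & \alpha \geq 1, \end{cases}$$
where the last case holds because no subset of $V$ has a strength larger than 1.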

4. Main Results

4.1. Characterization of Community Strength

The community strength defined in (9) takes a simpler form for the convex game, as shown in Theorem 1.
Theorem 1.
For any $C \subseteq V: |C| > 1$,
$$\sigma(C) = \min_{B \subsetneq C:\, B \neq \emptyset} \frac{g(C) - g(B)}{|C \setminus B|}. \tag{14}$$
Furthermore, the set of optimal solutions to (9) is given by
$$\{ C \setminus B \mid B \in \mathcal{S}(C) \},$$
where $\mathcal{S}(C)$ is the set of optimal solutions to the minimization in (14).
Proof. 
See Appendix A.3. □
Equation (14) is the basic formula of community strength that we will utilize to derive the properties of the strong communities and investigate how to calculate the strong subsets.
The following example shows the equivalent value of the strength of V calculated by (9) and (14).
Example 3.
Consider $V$ as in Example 1. Following (14), the strength of $V$ is
$$\sigma(V) = \min_{B \subsetneq V:\, B \neq \emptyset} \frac{g(V) - g(B)}{|V \setminus B|} = 0$$
with $\mathcal{S}(V) = \{\{1, 2\}\}$. The value of $\sigma(V)$ calculated here according to (14) is consistent with that calculated in Example 1.
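The values in Examples 2 and 3 can be reproduced by enumerating (14) and Definition 2 directly. The Python sketch below is a brute-force illustration under the assumption that the graph of Example 1 consists of the single edge between nodes 1 and 2 (as implied by the strengths reported in Example 2); the function names are made up for this example, and the enumeration is exponential in $|V|$.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [(1, 2)]  # assumed edge set of the graph in Example 1

def g(B):
    """Number of edges with both endpoints in B."""
    return sum(1 for i, j in EDGES if i in B and j in B)

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a frozenset."""
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def strength(C, g):
    """sigma(C) via (14): minimum over non-empty proper subsets B of C."""
    C = frozenset(C)
    return min(Fraction(g(C) - g(B), len(C - B))
               for B in nonempty_subsets(C) if B != C)

def strong_communities(V, g, alpha):
    """C_alpha(V) from Definition 2: maximal sets with strength above alpha."""
    cands = [C for C in nonempty_subsets(V)
             if len(C) > 1 and strength(C, g) > alpha]
    return {C for C in cands if not any(C < D for D in cands)}

V = {1, 2, 3}
print(strength(V, g), strength({1, 2}, g))       # 0 1
print(strong_communities(V, g, Fraction(1, 2)))  # {frozenset({1, 2})}
```

Exact rational arithmetic (`Fraction`) is used so that the strict comparison $\sigma(C) > \alpha$ is not affected by rounding.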
Define for $\alpha \in \mathbb{R}$ and $C \subseteq V: |C| > 1$ that
$$\hat{f}_\alpha(C) := \min_{B \subseteq C:\, B \neq \emptyset} f_\alpha(B), \quad \text{where} \tag{18}$$
$$f_\alpha(B) := \alpha |B| - g(B), \quad \text{for } B \subseteq V. \tag{19}$$
Denote the optimal solution set to (18) as $\mathcal{B}_\alpha(C)$, and the collection of inclusion-wise minimal sets among $\mathcal{B}_\alpha(C)$ as $\mathcal{B}^*_\alpha(C)$, i.e.,
$$\mathcal{B}^*_\alpha(C) := \operatorname{minimal} \mathcal{B}_\alpha(C). \tag{20}$$
$\mathcal{B}_\alpha(C)$ is the set that we use to analyze the relation between $\sigma(C)$ and the curve $\hat{f}_\alpha(C)$, and $\mathcal{B}^*_\alpha(C)$ is the set we use for the computation of $\mathcal{C}_\alpha(V)$ in the latter part.
The following example shows the curve of $\hat{f}_\alpha(V)$ for the set $V$ in Example 1.
Example 4.
Consider $V$ as in Example 1. According to (18),
$$\hat{f}_\alpha(V) = \begin{cases} 3\alpha - 1, & \alpha < 0, \\ 2\alpha - 1, & 0 \leq \alpha < 1, \\ \alpha, & \alpha \geq 1, \end{cases}$$
and the inclusion-wise minimal solution set to (18) is given by
$$\mathcal{B}^*_\alpha(V) = \begin{cases} \{V\}, & \alpha < 0, \\ \{\{1, 2\}\}, & 0 \leq \alpha < 1, \\ \{\{i\} \mid i \in V\}, & \alpha \geq 1, \end{cases}$$
as illustrated in Figure 1c, where the results for $\hat{f}_\alpha(V)$ and $\mathcal{B}^*_\alpha(V)$ can be read off directly after drawing the line $f_\alpha(B)$ for every $B \subseteq V: B \neq \emptyset$.
For instance,
  • when $\alpha \in (0, 1)$, $\hat{f}_\alpha(V)$ is given by $f_\alpha(\{1, 2\})$, hence $\mathcal{B}^*_\alpha(V) = \{\{1, 2\}\}$.
  • when $\alpha = 0$, both $\{1, 2\}$ and $V$ are solutions to (18) with respect to $\hat{f}_\alpha(V)$, while $\{1, 2\}$ is the inclusion-wise minimal solution, hence $\mathcal{B}^*_\alpha(V) = \{\{1, 2\}\}$.
From (19), it can be seen that $f_\alpha(B)$ is a linear function of $\alpha$ with slope $|B|$. Therefore, $\hat{f}_\alpha(C)$ in (18) is a piecewise linear function since it is a minimum of linear functions. With $C \subseteq V: |C| > 1$, the curve must have at least one turning point since the slope of $f_\alpha(\{i\})$, $i \in C$, is different from that of $f_\alpha(C)$. Figure 1c is the curve of $\hat{f}_\alpha(V)$ for $V$ in Example 1.
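The piecewise-linear curve and its minimal minimizers can be tabulated by evaluating (18) and (20) exhaustively. The following Python sketch does this for the graph of Example 1, assumed here to contain only the edge between nodes 1 and 2; `f_hat` and the other names are illustrative choices, not notation from the paper.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [(1, 2)]  # assumed edge set of the graph in Example 1

def g(B):
    """Number of edges with both endpoints in B."""
    return sum(1 for i, j in EDGES if i in B and j in B)

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a frozenset."""
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def f(alpha, B):
    """f_alpha(B) = alpha*|B| - g(B), as in (19)."""
    return alpha * len(B) - g(B)

def f_hat(alpha, C):
    """Return the minimum in (18) and the inclusion-wise minimal minimizers (20)."""
    values = {B: f(alpha, B) for B in nonempty_subsets(C)}
    best = min(values.values())
    optimal = [B for B, v in values.items() if v == best]
    minimal = {B for B in optimal if not any(D < B for D in optimal)}
    return best, minimal

V = {1, 2, 3}
for alpha in (Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)):
    value, minimal = f_hat(alpha, V)
    print(alpha, value, sorted(sorted(B) for B in minimal))
```

For $\alpha = 1/2$ this prints the value 0 with the single minimal minimizer $\{1, 2\}$, in agreement with the middle branch $2\alpha - 1$.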
The following result shows that $\sigma(C)$ can be obtained from the curve. It will be used for deriving the representation and computation of the strong communities defined in Definition 2.
Proposition 2.
For the curve $\hat{f}_\alpha(C)$ against $\alpha \in \mathbb{R}$:
(1)
$\sigma(C)$ is the $\alpha$-coordinate of the first turning point. More precisely,
$$\min_{B \subsetneq C:\, B \neq \emptyset} f_\alpha(B) = f_\alpha(C) \iff \sigma(C) = \alpha,$$
$$\min_{B \subsetneq C:\, B \neq \emptyset} f_\alpha(B) > f_\alpha(C) \iff \sigma(C) > \alpha.$$
(2)
The collection $\mathcal{B}_\alpha(C)$ of optimal solutions to (18) satisfies
$$\mathcal{B}_\alpha(C) \not\ni C, \quad \text{for } \alpha > \sigma(C),$$
$$\mathcal{B}_\alpha(C) = \mathcal{S}(C) \cup \{C\}, \quad \text{for } \alpha = \sigma(C),$$
$$\mathcal{B}_\alpha(C) = \{C\}, \quad \text{for } \alpha < \sigma(C).$$
Proof. 
See Appendix A.4. □
The following example further illustrates the property of $\hat{f}_\alpha(V)$.
Example 5.
In Example 4, the first turning point of the curve $\hat{f}_\alpha(V)$ is $(0, -1)$, whose $\alpha$-coordinate is exactly $\sigma(V)$.

4.2. Representation of Strong Communities

The strong communities defined in Definition 2 form a hierarchy and can be represented by a dendrogram.
Theorem 2.
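For any $C_1 \in \mathcal{C}_{\alpha_1}(V)$, $C_2 \in \mathcal{C}_{\alpha_2}(V)$ where $\alpha_1 \leq \alpha_2$, we have
$$C_1 \supseteq C_2, \quad \text{or} \quad C_1 \cap C_2 = \emptyset.$$
Furthermore,
$$\text{if } C_1 \supsetneq C_2, \text{ then } \alpha_1 < \alpha_2.$$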
Proof. 
See Appendix A.6. □
The following example shows that the strong communities in Figure 1a as in Example 1 with respect to two different values of $\alpha$ have a containment relationship.
Example 6.
Let $\alpha_1 := -1$ and $\alpha_2 := \frac{1}{2}$. By the calculation results in Example 2, $C_1 := V \in \mathcal{C}_{\alpha_1}(V)$ and $C_2 := \{1, 2\} \in \mathcal{C}_{\alpha_2}(V)$. Then $C_1 \supseteq C_2$, which means the community in $\mathcal{C}_{\alpha_2}(V)$ is contained in the community in $\mathcal{C}_{\alpha_1}(V)$. This shows the hierarchical structure of the strong communities with respect to the specific $\alpha_1$ and $\alpha_2$.
Theorem 2 follows from the following lemma.
Lemma 1.
For all $C_1, C_2 \subseteq V: C_1 \cap C_2 \neq \emptyset$,
$$\sigma(C_1 \cup C_2) \geq \min \{ \sigma(C_1), \sigma(C_2) \}. \tag{28}$$
Proof. 
See Appendix A.5. □
Example 7.
As an example illustrating Lemma 1, consider Figure 1a as in Example 1, and let $C_1 := \{1, 2\}$ and $C_2 := \{2, 3\}$, so that $C_1 \cap C_2 \neq \emptyset$. By the calculation results in Example 2,
$$\sigma(C_1 \cup C_2) = 0 \geq \min \{ \sigma(C_1), \sigma(C_2) \},$$
i.e., (28) holds for $C_1$ and $C_2$.
Lemma 1 establishes that the strength of the union of any two overlapping non-empty sets is lower bounded by the smaller strength among the two sets, and this is the basis for Theorem 2.
The family $\bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$ is said to be laminar and can be shown to contain at most $|V| - 1$ elements. More precisely, we will show that the family of communities, together with their levels of strength, can be represented by the following dendrogram, with $\sigma$ playing the role of the cophenetic similarity.
Definition 3.
The dendrogram for the set of communities is defined as follows:
(1) 
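Every $C \in \bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$ is an internal node annotated with the value $\sigma(C)$;
(2)
Every singleton $\{i\}$ for $i \in V$ is a leaf node (annotated with the value $+\infty$);
(3)
The parent of each node $B \subsetneq V: B \neq \emptyset$ is defined as the minimum
$$\operatorname{parent}(B) := \min \{ C \in \mathcal{C}_\alpha(V) \mid B \subsetneq C,\ \alpha \in \mathbb{R} \}.$$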
As illustrated in Figure 2, the dendrogram forms a tree because each node (except the root node V) has a unique parent node.
As a result of Theorem 2, the following corollary states that the parent of each strong community except V exists and is unique.
Corollary 1.
For every $B \subsetneq V: B \neq \emptyset$, the minimum element $\operatorname{parent}(B)$ exists.
Proof. 
See Appendix A.7. □
Using the following result, we can show that the set of children for each node $C \in \bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$ is
$$\mathcal{C}_{\sigma(C)}(C) \cup \Big\{ \{i\} \,\Big|\, i \in C \setminus \bigcup \mathcal{C}_{\sigma(C)}(C) \Big\},$$
which is also illustrated in Figure 2.
Analogous to Corollary 1, a community $B$ has a parent $C$ in the dendrogram if and only if $B$ is in $\mathcal{C}_{\sigma(C)}(C)$, and the strength of $B$ is larger than that of $C$, as stated in the following corollary.
Corollary 2.
For any node $C \in \bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$ of the dendrogram,
$$\operatorname{parent}(B) = C \iff B \in \mathcal{C}_{\sigma(C)}(C),$$
which implies $\sigma(B) > \sigma(C)$.
Proof. 
See Appendix A.8. □
Example 8.
For Figure 1a as in Example 1, by the calculation results in Example 2, the dendrogram that corresponds to $\bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$ is shown in Figure 1d.
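For a small instance, the dendrogram of Definition 3 can be assembled by brute force from (14) and (10). The Python sketch below is a hedged illustration: it assumes the graph of Example 1 has the single edge between nodes 1 and 2, it enumerates all subsets (exponential cost), and the function names are hypothetical. It relies on the fact that the collection of strong communities can only change when $\alpha$ crosses a strength value, so it suffices to evaluate (10) at each distinct strength and at one value below the smallest.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [(1, 2)]  # assumed edge set of the graph in Example 1

def g(B):
    """Number of edges with both endpoints in B."""
    return sum(1 for i, j in EDGES if i in B and j in B)

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a frozenset."""
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def strength(C):
    """sigma(C) via (14)."""
    return min(Fraction(g(C) - g(B), len(C - B))
               for B in nonempty_subsets(C) if B != C)

def dendrogram(V):
    """Return (strengths, internal nodes, parent map) for ground set V, |V| >= 2."""
    V = frozenset(V)
    sigma = {C: strength(C) for C in nonempty_subsets(V) if len(C) > 1}
    levels = sorted(set(sigma.values()))
    family = set()
    # One threshold below the smallest strength, then every distinct strength.
    for alpha in [levels[0] - 1] + levels:
        cands = [C for C, s in sigma.items() if s > alpha]
        family |= {C for C in cands if not any(C < D for D in cands)}
    nodes = family | {frozenset({i}) for i in V}
    parent = {}
    for B in nodes:
        supersets = [C for C in family if B < C]
        if supersets:  # the root V has no parent
            # In a laminar family the strict supersets of B form a chain.
            parent[B] = min(supersets, key=len)
    return sigma, family, parent

sigma, family, parent = dendrogram({1, 2, 3})
for B, C in sorted(parent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(B), "->", sorted(C))
```

On this instance the internal nodes are $V$ and $\{1, 2\}$, with leaves 1 and 2 attached to $\{1, 2\}$ and leaf 3 attached to $V$.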
We defined the community strength in (6) by modeling the problem based on the convex game in game theory, gave its alternative forms in (9) and (14), and showed that the community strength and the solutions to the minimization of (14) are related to the first turning point of the curve defined by (18) against the parameter α . We also showed that the collection of strong communities defined in (10) form a hierarchy and can be represented by a dendrogram. These motivate the methods for computing strong communities, as described in the following section.

5. Computation of Strong Communities

In this section, we will show how to calculate the strong communities in $\mathcal{C}_\alpha(V)$ at a threshold $\alpha$, and how to calculate all the strong communities.
The following result shows that $\mathcal{C}_\alpha(V)$ can be calculated with a recursive procedure.
Theorem 3.
For $\alpha \in \mathbb{R}$, $\mathcal{C}_\alpha(V)$ can be calculated with the following recurrence relation
$$\mathcal{C}_\alpha(V) = \begin{cases} \emptyset, & \text{if } |V| \leq 1, \quad (31) \\ \big( \mathcal{B}^*_\alpha(V) \setminus \{ \{i\} \mid i \in V \} \big) \cup \mathcal{C}_\alpha(U), & \text{otherwise}, \quad (32) \end{cases}$$
where
$$U := V \setminus \bigcup \mathcal{B}^*_\alpha(V), \tag{33}$$
$\mathcal{B}^*_\alpha(V)$ is defined in (20), and (31) is the base case.
Proof. 
See Appendix A.9. □
Theorem 3 shows that $\mathcal{C}_\alpha(V)$ can be computed in a divisive way. In the first recursive step, $V$ is the ground set. If $|V| \leq 1$, we directly obtain $\mathcal{C}_\alpha(V) = \emptyset$ by the base case (31) and stop the recursion; otherwise we calculate $\mathcal{B}^*_\alpha(V)$, then $\mathcal{B}^*_\alpha(V) \setminus \{ \{i\} \mid i \in V \}$ is the set of newly found strong subsets, and we enter the next recursive step. The new recursive step is similar to the first recursive step, but we use $U$ given in (33) as the ground set.
The following example shows how to run the recursive procedure in Theorem 3 for computing $\mathcal{C}_\alpha(V)$.
Example 9.
Consider Figure 1a as in Example 1. We calculate $\mathcal{C}_\alpha(V)$ at $\alpha = \frac{1}{2}$ by following Theorem 3.
(1)
The first recursive step:
  • $|V| > 1$, which corresponds to the case in (32).
  • Then we need to compute $\mathcal{B}^*_\alpha(V)$. By the calculation in Example 4, we know $\mathcal{B}^*_\alpha(V) = \{\{1, 2\}\}$.
  • By (32), the elements in $\mathcal{B}^*_\alpha(V) \setminus \{ \{i\} \mid i \in V \} = \{\{1, 2\}\}$ are in $\mathcal{C}_\alpha(V)$, and computing $\mathcal{C}_\alpha(U)$ will provide us the remaining strong subsets in $\mathcal{C}_\alpha(V)$, where $U$ is given by (33). Here, $U = \{3\}$.
(2)
The second recursive step:
  • Regard $U$ as the ground set and compute $\mathcal{C}_\alpha(U)$ according to (31) and (32).
  • Since $|U| = 1$, the base case (31) applies, which means $\mathcal{C}_\alpha(U) = \emptyset$, and the recursive procedure ends.
According to the recursive steps, $\mathcal{C}_\alpha(V) = \{\{1, 2\}\}$.
Notice that in this example, there are two recursive steps in total. For other examples where the $U$ obtained in the first recursive step has a cardinality larger than 1, i.e., the case (32) applies, the following recursive step will be similar to the first recursive step except that $U$ instead of $V$ is regarded as the ground set.
Additionally, we use the $\mathcal{B}^*_\alpha(V)$ from the calculation in Example 4, which employs a brute force method enumerating all $B \subseteq V: B \neq \emptyset$. We will discuss how to compute $\mathcal{B}^*_\alpha(V)$ in polynomial time later.
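The recurrence (31)-(33) translates almost line by line into a recursive function. The Python sketch below is only an illustration: it computes the minimal minimizers by brute-force enumeration rather than by submodular function minimization, it takes $g$ to be the total internal edge weight, and the second test graph (two disjoint edges of weights 5 and 3) is a hypothetical instance, not the digraph of Figure 3.

```python
from itertools import combinations
from fractions import Fraction

def make_g(weights):
    """g(B) = total weight of edges inside B; weights maps (i, j) to a weight."""
    def g(B):
        return sum(wt for (i, j), wt in weights.items() if i in B and j in B)
    return g

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a frozenset."""
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def b_star(alpha, V, g):
    """Inclusion-wise minimal minimizers of alpha*|B| - g(B) over non-empty B in V."""
    values = {B: alpha * len(B) - g(B) for B in nonempty_subsets(V)}
    best = min(values.values())
    optimal = [B for B, v in values.items() if v == best]
    return {B for B in optimal if not any(D < B for D in optimal)}

def strong_communities_rec(alpha, V, g):
    """C_alpha(V) computed with the recurrence of Theorem 3."""
    V = frozenset(V)
    if len(V) <= 1:                          # base case (31)
        return set()
    minimal = b_star(alpha, V, g)
    found = {B for B in minimal if len(B) > 1}  # drop singletons, as in (32)
    U = V - frozenset().union(*minimal)      # remaining nodes, as in (33)
    return found | strong_communities_rec(alpha, U, g)

# Example 9: single edge {1, 2} on V = {1, 2, 3}, threshold 1/2.
g1 = make_g({(1, 2): 1})
print(strong_communities_rec(Fraction(1, 2), {1, 2, 3}, g1))

# Hypothetical instance where {1, 2} dominates {3, 4} in the first step.
g2 = make_g({(1, 2): 5, (3, 4): 3})
print(strong_communities_rec(2, {1, 2, 3, 4}, g2))
```

In the second instance the first step returns only $\{1, 2\}$; the community $\{3, 4\}$ is found in the second step, once the nodes of $\{1, 2\}$ have been removed from the ground set.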
The following example illustrates why there are strong communities not in $\mathcal{B}^*_\alpha(V)$ and why the recursive procedure in Theorem 3 can identify those strong communities.
Example 10.
Consider Figure 3a on $V = \{1, 2, 3, 4\}$ with function $g$ defined by (5) with $\gamma = 1$ and $\beta = \frac{1}{2}$. Then $g(C)$ is the total weight of internal edges in $C$.
In the graph, $C_1 := \{1, 2\}$ and $C_2 := \{3, 4\}$ have relatively large total weights of internal edges compared with other subsets of $V$; hence they are meaningful communities that are expected to be identified.
Let $\alpha = 2$. $\mathcal{B}^*_\alpha(V)$ in (20) contains the minimal non-empty subsets of $V$ that lead to the minimum value of $f_\alpha(\cdot)$ in (19).
$C_1$ is identified by $\mathcal{B}^*_\alpha(V)$ because $f_\alpha(C_1)$ achieves the minimum value among all the non-empty subsets of $V$. However, the other meaningful community $C_2$ is not in $\mathcal{B}^*_\alpha(V)$ because $C_2$ will never be a subset of $V$ that leads to the minimum value of $f_\alpha(\cdot)$, as $f_\alpha(C_1) < f_\alpha(C_2)$ always holds. In other words, $C_1$ dominates $C_2$.
To identify $C_2$, we remove the nodes that appeared in the communities in (32) from $V$, as described in (33), and then start a new recursive step to identify strong communities within the remaining nodes. Since the community $C_1$ that dominated $C_2$ in $V$ was removed, $C_2$ can now be identified with $\mathcal{B}^*_\alpha(\cdot)$. In this way, the recursive procedure in Theorem 3 identifies all the strong communities in $\mathcal{C}_\alpha(V)$.
Figure 3. A simple digraph and the dendrogram when $g$ is defined by (46) with different $\beta$. (a) The digraph; (b) The dendrogram when $\beta = 1$; (c) The dendrogram when $\beta = \frac{1}{2}$.
For the recurrence relation in Theorem 3 to be applicable, the recursive procedure must finish in a finite number of recursive steps. The following results imply that $U$ in (33) always has a smaller size than $V$.
Proposition 3.
For $\alpha \in \mathbb{R}$ and the set $V$ with $|V| \geq 2$,
$$|\mathcal{B}^*_\alpha(V)| \geq 1.$$
Proof. 
See Appendix A.10. □
As a result of Proposition 3, the number of recursive steps needed in Theorem 3 is bounded by $|V|$.
Proposition 4.
For a non-empty set $V$, it takes at most $|V|$ recursive steps to calculate $\mathcal{C}_\alpha(V)$ by Theorem 3.
Proof. 
See Appendix A.11. □
The following property is the basis of Theorem 3, which ensures that the recursive procedure in Theorem 3 does not leave out any strong communities in $\mathcal{C}_\alpha(V)$ for a chosen $\alpha$.
Proposition 5.
For any $B \in \mathcal{B}^*_\alpha(V)$ and $C \subseteq V: |C| > 1$,
$$B \cap C \neq \emptyset \ \text{ and } \ C \not\subseteq B \implies \sigma(C) \leq \alpha,$$
or the contrapositive
$$\sigma(C) > \alpha \implies B \cap C = \emptyset \ \text{ or } \ C \subseteq B. \tag{36}$$
Proof. 
See Appendix A.12. □
Equation (36) implies that any other strong subset with strength larger than $\alpha$ is either disjoint from the elements in $\mathcal{B}^*_\alpha(V)$, or is a subset of an element in $\mathcal{B}^*_\alpha(V)$. This ensures that when we continue the computation with $U$ in (33) as the ground set, after obtaining the strong subsets with strength larger than $\alpha$ in $\mathcal{B}^*_\alpha(V) \setminus \{ \{i\} \mid i \in V \}$, the remaining strong subsets will be captured by $\mathcal{C}_\alpha(U)$.
To obtain $\mathcal{C}_\alpha(V)$, calculating $\mathcal{B}^*_\alpha(V)$ is a basic step that requires optimization of (18), which can be done based on the method in [26] as described in the following.

5.1. Divide-and-Conquer

We rewrite the minimization defining $\hat{f}_\alpha(V)$ in (18) in a similar way as that in [26]:
$$\hat{f}_\alpha(V) = \min_{t \in V} \hat{f}^{(t)}_\alpha(V), \quad \text{where} \tag{37}$$
$$\hat{f}^{(t)}_\alpha(V) := \min_{B \subseteq V:\, t \in B} f_\alpha(B), \tag{38}$$
which is a two-step optimization problem, and denote by $\mathcal{B}^{(t)}_\alpha(V)$ the minimal optimal solution set to (38).
Since $f_\alpha$ defined in (19) is a submodular function, $\mathcal{B}^{(t)}_\alpha(V)$ can be computed with submodular function minimization (SFM) algorithms, and it has a unique element since the feasible domain $\{ B \subseteq V \mid t \in B \}$ is a lattice ([27] Proposition 10.1).
Let $\mathcal{T}^*_\alpha(V)$ be the set of optimal solutions $t$ to (37). We have the following result, which indicates how $\mathcal{B}^*_\alpha(V)$ can be calculated.
Proposition 6 
([26] Proposition 2). For $\alpha \in \mathbb{R}$, $\mathcal{B}^*_\alpha(V)$ in (20) can be obtained from $\mathcal{B}^{(t)}_\alpha(V)$, $t \in V$ and $\mathcal{T}^*_\alpha(V)$ by
$$\mathcal{B}^*_\alpha(V) = \operatorname{minimal} \bigcup_{t \in \mathcal{T}^*_\alpha(V)} \mathcal{B}^{(t)}_\alpha(V). \tag{39}$$
Proof. 
See Appendix A.13. □
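The two-step structure of (37)-(39) can be mimicked on a toy instance as follows. This Python sketch replaces the SFM solver by exhaustive enumeration, so it only illustrates the bookkeeping of Proposition 6; the graph (the single edge between nodes 1 and 2, as assumed for Example 1) and all function names are choices made for this example.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [(1, 2)]  # assumed edge set of the graph in Example 1

def g(B):
    """Number of edges with both endpoints in B."""
    return sum(1 for i, j in EDGES if i in B and j in B)

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a frozenset."""
    items = sorted(S)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def f(alpha, B):
    """f_alpha(B) = alpha*|B| - g(B)."""
    return alpha * len(B) - g(B)

def solve_for_t(alpha, V, t):
    """Value of (38) and its minimal minimizer among subsets containing t.

    Brute force stands in for an SFM solver; the smallest minimizer is unique
    because minimizers of a submodular function are closed under intersection.
    """
    cands = [B for B in nonempty_subsets(V) if t in B]
    best = min(f(alpha, B) for B in cands)
    smallest = min((B for B in cands if f(alpha, B) == best), key=len)
    return best, smallest

def b_star_via_t(alpha, V):
    """B*_alpha(V) assembled from the per-t solutions, following (39)."""
    per_t = {t: solve_for_t(alpha, V, t) for t in V}
    best = min(value for value, _ in per_t.values())          # value of (37)
    sols = {B for value, B in per_t.values() if value == best}  # t in T*
    return {B for B in sols if not any(D < B for D in sols)}

V = {1, 2, 3}
for alpha in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(alpha, sorted(sorted(B) for B in b_star_via_t(alpha, V)))
```

The results agree with the minimal solution sets listed in Example 4 for the same values of $\alpha$.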
According to Theorem 3, computing $\mathcal{C}_\alpha(V)$, i.e., all the strong subsets with a strength larger than $\alpha$, can be done with the following steps:
(1)
Calculate $\mathcal{B}^{(t)}_\alpha(V)$ for $t \in V$ by optimizing (38) with SFM algorithms;
(2)
Calculate $\mathcal{T}^*_\alpha(V)$ by optimizing (37);
(3)
Calculate $\mathcal{B}^*_\alpha(V)$ according to (39);
(4)
Calculate $U$ by (33);
(5)
Let $M := \mathcal{B}^*_\alpha(V) \setminus \{ \{i\} \mid i \in V \}$, as in (32), be the collection of newly found strong subsets that have a strength larger than $\alpha$ in this recursive step. If $|U| \leq 1$, then stop; otherwise, regard $U$ as $V$ and go to (1) to start a new recursive step.
The union of the sets of strong subsets calculated in all the recursive steps is $\mathcal{C}_\alpha(V)$.
To calculate all the strong subsets, i.e., $\bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$, for each $t \in V$, define
$$g^{(t)}(A) := g(A \cup \{t\}) - g(\{t\}) \quad \text{for } A \subseteq (V \setminus \{t\}),$$
then
$$B \subseteq (V \setminus \{t\}) \mapsto \alpha |B| - g^{(t)}(B) \tag{41}$$
is a normalized submodular function.
We need to obtain the minimal optimal solution to (41) for all $\alpha \in \mathbb{R}$. Luckily, with SFM algorithms such as Wolfe's minimum norm point algorithm [28], for (41), we can obtain for some $N_t \in \mathbb{N}$ the sequence $\alpha^{(t)}_1, \alpha^{(t)}_2, \dots, \alpha^{(t)}_{N_t}$ and the corresponding sequence of sets $A^{(t)}_1, A^{(t)}_2, \dots, A^{(t)}_{N_t}$ that satisfy ([27] Proposition 8.6)
$$-\infty = \alpha^{(t)}_0 < \alpha^{(t)}_1 < \alpha^{(t)}_2 < \dots < \alpha^{(t)}_{N_t} < \alpha^{(t)}_{N_t + 1} = +\infty, \tag{42}$$
$$A^{(t)}_0 := V \setminus \{t\} \supsetneq A^{(t)}_1 \supsetneq A^{(t)}_2 \supsetneq \dots \supsetneq A^{(t)}_{N_t} := \emptyset, \tag{43}$$
and for any $\alpha \in [\alpha^{(t)}_i, \alpha^{(t)}_{i+1})$, $i \in \{0, 1, \dots, N_t\}$,
$$A^{(t)}_i \text{ is the minimal minimizer to (41).} \tag{44}$$
Equation (44) means that with the sequences (42) and (43), we can obtain the minimal solution to (41) for all $\alpha \in \mathbb{R}$.
For any $\alpha \in \mathbb{R}$, if $A^{(t)}$ is the unique minimal solution to (41), then $B^{(t)} = A^{(t)} \cup \{t\}$ will be the unique minimal solution to (38), or in other words,
$$\mathcal{B}^{(t)}_\alpha(V) = \{ B^{(t)} \},$$
since $g(B \cup \{t\}) - g^{(t)}(B) = g(\{t\})$ is a constant for all $B \subseteq V \setminus \{t\}$. This means the minimal solution set $\mathcal{B}^{(t)}_\alpha(V)$ to (38) for all $\alpha \in \mathbb{R}$ can be obtained from the solutions to (41) with sequences (42) and (43).
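The relation between the contracted problem (41) and the constrained problem (38) can be sanity-checked numerically on a toy instance. The Python sketch below uses exhaustive search in place of a minimum norm point solver and assumes the single-edge graph of Example 1; the helper names (`contracted`, `minimal_minimizer`, `constrained_minimizer`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [(1, 2)]  # assumed edge set of the graph in Example 1

def g(B):
    """Number of edges with both endpoints in B."""
    return sum(1 for i, j in EDGES if i in B and j in B)

def subsets(S):
    """Yield every subset of S, including the empty set, as a frozenset."""
    items = sorted(S)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def contracted(t):
    """Return g^(t), with g^(t)(A) = g(A | {t}) - g({t})."""
    return lambda A: g(frozenset(A) | {t}) - g(frozenset({t}))

def minimal_minimizer(h, alpha, ground):
    """Smallest minimizer of alpha*|A| - h(A) over all subsets A of ground."""
    values = {A: alpha * len(A) - h(A) for A in subsets(ground)}
    best = min(values.values())
    return min((A for A, v in values.items() if v == best), key=len)

def constrained_minimizer(alpha, V, t):
    """Smallest minimizer of alpha*|B| - g(B) over subsets B of V containing t."""
    values = {B: alpha * len(B) - g(B) for B in subsets(V) if t in B}
    best = min(values.values())
    return min((B for B, v in values.items() if v == best), key=len)

V = frozenset({1, 2, 3})
for alpha in (Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1)):
    for t in sorted(V):
        A = minimal_minimizer(contracted(t), alpha, V - {t})
        B = constrained_minimizer(alpha, V, t)
        print(alpha, t, sorted(A | {t}), sorted(B))
```

For each pair $(\alpha, t)$ the two printed sets coincide, which is the statement $B^{(t)} = A^{(t)} \cup \{t\}$ on this instance.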
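Therefore, with (42)–(44) for all $t \in V$, $\mathcal{T}^*_\alpha(V)$ can be obtained for all $\alpha \in \mathbb{R}$. Then calculating $\mathcal{C}_\alpha(V)$ for all $\alpha \in \bigcup_{t \in V} \{ \alpha^{(t)}_0, \alpha^{(t)}_1, \dots, \alpha^{(t)}_{N_t + 1} \}$ based on Theorem 3, Proposition 6 and $\mathcal{T}^*_\alpha(V)$ is sufficient for us to obtain $\bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$.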
With $\mathrm{MNB}(n)$ denoting the complexity of the minimum norm base algorithm for SFM on a ground set of size $n$, we have the following result.
Proposition 7.
$\bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$ can be computed in $O(|V|^3\, \mathrm{MNB}(|V|))$ time.
Proof. 
See Appendix A.14. □

5.2. Using Max-Flow Min-Cut Algorithm

For the step of optimizing (38) in computing $\mathcal{C}_\alpha(V)$, SFM algorithms are used. However, SFM algorithms are generally computationally expensive. There are works on improving the efficiency of solving SFM problems with max-flow min-cut algorithms [29,30]. We discuss a category of choices for $g$ for which the max-flow min-cut algorithm can be utilized for computing $\mathcal{C}_\alpha(V)$.
Consider the function $g$ defined in the form of ([26] Definition 2)
$$g(B) = \beta \cdot w(B, B) - (1 - \beta) \cdot w(V \setminus B, B), \tag{46}$$
which is a special case of (5) with $\gamma = 1$.
Following the method in [26], we can construct an augmented digraph and run a max-flow min-cut algorithm [31,32,33,34] to obtain the solution to (38). With $\alpha$ in (19) and $\beta$ in (46), the $(\alpha, \beta, t)$-augmented digraph [26] is a digraph on $V \cup \{s\}$, where $s \notin V$ is an additional node, with the edge weight $w_{\alpha,\beta,t}: (V \cup \{s\})^2 \to \mathbb{R}_{\geq 0}$ defined as
$$w_{\alpha,\beta,t}(i, j) := \begin{cases} w(i, j), & i \in V,\ j \in V \setminus \{t\}, \\ w(i, j) + \beta d_i, & i \in V,\ j = t, \\ \alpha, & i = s,\ j \in V, \\ 0, & \text{otherwise}. \end{cases}$$
Proposition 8 
([26] Theorem 3). $\mathcal{B}^{(t)}_\alpha(V)$ contains a unique minimum set $C$ such that $(\{s\} \cup V \setminus C,\ C)$ is a minimum $s$-$t$ cut of the $(\alpha, \beta, t)$-augmented digraph.
Proposition 8 implies that $\mathcal{B}^{(t)}_\alpha(V)$ can be computed by a max-flow min-cut algorithm. Moreover, with the parametric max-flow min-cut algorithms [34], we can obtain $\mathcal{B}^{(t)}_\alpha(V)$ for all $\alpha \geq 0$. Hence, when $g$ has the form of (46), computing $\mathcal{C}_\alpha(V)$ for a certain $\alpha$ or for all $\alpha$ follows the same procedure as in Section 5.1, except that we can use max-flow min-cut algorithms instead of SFM algorithms to calculate $\mathcal{B}^{(t)}_\alpha(V)$.

6. Discussions

To illustrate the dendrogram of strong communities found by our approach, the digraph in Figure 3a is used as an example, with the function $g$ given in (46) and different choices of $\beta$.
The results for the cases $\beta = 1$ and $\beta = 0.5$ are shown in Figure 3b and Figure 3c, respectively. An example of the calculation procedure based on Theorem 3 for the case $\beta = 1$ is given in Appendix A.15.
We can obtain the collection of strong communities $\mathcal{C}_\alpha(V)$ in (10) for any $\alpha$ from the dendrogram. For instance, the strong communities for $\alpha = \frac{5}{2}$ are
$$\mathcal{C}_{\frac{5}{2}}(V) = \{\{1, 2\}, \{3, 4\}\}.$$
The parameter $\beta$ in (46) is a balancing factor between the total weight of internal edges and the negative total weight of incoming edges. When $\beta = 1$, it can be used for the problem of finding the minimal densest subgraphs.
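To make the strength measure concrete, the following brute-force sketch evaluates it on a small graph. It assumes our reading of (14), namely $\sigma(C) = \min_{B \subsetneq C : B \neq \emptyset} \frac{g(C) - g(B)}{|C \setminus B|}$, and of (10), namely that $\mathcal{C}_\alpha(V)$ collects the inclusion-wise maximal sets $C$ with $|C| > 1$ and $\sigma(C) > \alpha$. The digraph `W` is hypothetical, not the digraph of Figure 3a; it was chosen only so that $g(\{1,2\}) = 4$, $g(\{1,2,3\}) = 5$ and $g(V) = 8$ at $\beta = 1$, the values that appear in Appendix A.15.

```python
from itertools import combinations

# Hypothetical weighted digraph (an assumption, not the paper's Figure 3a).
W = {(1, 2): 2.0, (2, 1): 2.0, (2, 3): 1.0, (3, 4): 3.0}
V = frozenset({1, 2, 3, 4})


def w(A, B):
    """Total weight of the edges going from node set A to node set B."""
    return sum(x for (i, j), x in W.items() if i in A and j in B)


def g(B, beta):
    """Characteristic function of the form (46)."""
    return beta * w(B, B) - (1 - beta) * w(V - B, B)


def proper_nonempty_subsets(C):
    items = sorted(C)
    for k in range(1, len(items)):
        for T in combinations(items, k):
            yield frozenset(T)


def strength(C, beta=1.0):
    """sigma(C) as read from (14): minimum over non-empty proper subsets of C."""
    return min((g(C, beta) - g(T, beta)) / len(C - T)
               for T in proper_nonempty_subsets(C))


def strong_communities(alpha, beta=1.0):
    """Maximal sets with at least two nodes and strength above alpha (exponential time)."""
    candidates = [frozenset(c)
                  for k in range(2, len(V) + 1)
                  for c in combinations(sorted(V), k)
                  if strength(frozenset(c), beta) > alpha]
    return {C for C in candidates if not any(C < D for D in candidates)}


print(strength(V), sorted(map(sorted, strong_communities(2.5))))
```

On this toy graph with $\beta = 1$ the strengths are $\sigma(V) = 2$, $\sigma(\{1,2\}) = 4$ and $\sigma(\{3,4\}) = 3$, so the call with $\alpha = \frac{5}{2}$ returns $\{1,2\}$ and $\{3,4\}$. The enumeration is exponential in $|V|$ and is meant only as a reference against which a polynomial-time implementation can be tested.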
In [35], another kind of augmented graph is constructed, and an algorithm is given that quickly increases the value of $\alpha$ based on the community already found and then runs a max-flow min-cut algorithm. Although the method there is similar to solving $\mathcal{B}_\alpha(V)$ in our work, the algorithm for calculating the next critical $\alpha$, as its authors also note, needs more calculation steps to obtain the solutions for intermediate values of $\alpha$. In other words, not all critical values of $\alpha$ are found, while our approach calculates all the critical values of $\alpha$ and the corresponding solutions directly. Additionally, our approach goes beyond just finding $\mathcal{B}_\alpha(V)$, and we consider digraphs, so the results apply to undirected graphs directly.
The work in [26] extends the notion of web communities [36] to digraphs and calculates the communities in polynomial time, which is closely related to our approach. In fact, $\mathcal{B}^*_\alpha(V) \setminus \{\{i\} \mid i \in V\}$ is the set of web communities, which is included in the strong communities defined in (10) in this work. For a set $C \subseteq V : C \neq \emptyset$, subsets of $V \setminus C$ can prevent $C$ from being a web community in [26], even if $C$ has a strength larger than $\alpha$. Such a phenomenon does not exist for strong communities detected by our approach: whether $C$ is a strong community in (10) or not is independent of subsets of $V \setminus C$. In Example 10, $C_1$ is a web community, and $C_2$ is not, since $C_2$ is dominated by $C_1$. However, $C_2$ is a meaningful community that is expected to be identified. The web community method fails to identify $C_2$, while our approach identifies it as a strong community, as we have calculated in the example.
Our approach also addresses some known issues associated with existing community detection methods. For instance, maximizing Modularity [37], a common community detection criterion, is NP-hard, and Modularity suffers from the resolution limit [38]. There are works such as [39,40] aiming at resolving the resolution limit issue, yet both rely on heuristics to obtain solutions. It is worth mentioning that Modularity is a measure applied over partitions, while our strength measure is applied to individual communities. In Figure 4, there are four complete graphs of two sizes, $m_1 = 20$ and $m_2 = 5$. Although the two complete graphs of size $m_2$ are smaller than the other two, they should be identified as communities since they are maximal complete graphs. However, Modularity fails to recognize the two smaller complete graphs and instead merges them into a single community [38]. In contrast, our approach successfully identifies the two smaller complete graphs of size $m_2$ as strong communities.
The strong communities are derived from game theory, where the strength can be interpreted as the support inside the community that can be shared with the individuals in need in the same community. Applications to real-world problems are promising, such as finding small groups of advertisers and keywords in sponsored auctions, where the community strength means the average money inside the groups [35,41].

7. Conclusions

We introduced a novel concept of strength for community detection using a convex game model in cooperative game theory. It can be applied to networks with a supermodular characteristic function. Theorem 1 establishes the dual objective function, based on which we conducted a comprehensive analysis of strong community properties. The laminar structure demonstrated in Theorem 2 reveals that strong communities form a hierarchy and can be represented by a dendrogram.
To compute strong subsets, Theorem 3 introduces a recurrence relation, enabling polynomial time calculations through submodular function minimization. For specific characteristic function choices in the convex game on the graphical model, an augmented digraph can be constructed to apply the max-flow min-cut algorithm to improve computation efficiency.
Unlike many existing community detection methods, which often rely on approximation, are non-deterministic, and lack guarantees on complexity, our approach for community detection is deterministic, computable in polynomial time, and supported by a rigorous theoretical analysis of its properties. Since our approach captures high-order statistics through the supermodular characteristic function, the primary limitation of our approach lies in its computational complexity. This complexity presents a challenge when applying the method to large-scale real-world datasets. Nevertheless, our work proposes an analytical framework for community detection that yields unique solutions and provides theoretical foundations for future research aimed at improving the complexity and empirical applications.

Author Contributions

Conceptualization, C.C.; Formal analysis, C.C.; Investigation, A.A.-B.; Project administration, C.C.; Software, C.Z.; Supervision, C.C.; Visualization, C.Z.; Writing—original draft, C.Z. and C.C.; Writing—review and editing, A.A.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1

We show the computation and visualization of $\operatorname{Core}(V, g)$ in Example 1.
According to (4),
$$\begin{aligned} \operatorname{Core}(V, g) = \{ r_V := (r_1, r_2, r_3) \mid\ & r_1 + r_2 + r_3 = g(V) = 1, && \text{(A1)} \\ & r_1 + r_2 \geq g(\{1, 2\}) = 1, && \text{(A2)} \\ & r_1 + r_3 \geq g(\{1, 3\}) = 0, && \text{(A3)} \\ & r_2 + r_3 \geq g(\{2, 3\}) = 0, && \text{(A4)} \\ & r_1, r_2, r_3 \geq 0 \} && \text{(A5)} \\ = \{ (r_1, r_2, r_3) \mid\ & r_1 + r_2 = 1,\ r_1 \geq 0,\ r_2 \geq 0,\ r_3 = 0 \}, && \text{(A6)} \end{aligned}$$
where $r_1$, $r_2$, $r_3$ are the payoffs allocated to players 1, 2 and 3, respectively.
Equation (A1) ensures that the payoff vector $r_V$ exactly splits the total worth of $V$, and (A2) ensures that the total payoff for $\{1, 2\}$ is no less than the worth of $\{1, 2\}$, so that the players in $\{1, 2\}$ will not deviate from $V$. The interpretation of (A3)–(A5) is similar to that of (A2). The intersection of the regions defined by (A1) through (A5) is $\operatorname{Core}(V, g)$, which is visualized in Figure 1b, where
  • (A1) is the region given by the plane containing $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, as indicated in green;
  • (A2) is the half-space bounded by the plane that contains $(1, 0, 0)$ and $(0, 1, 0)$ and is parallel to the $r_3$ axis; it includes that plane and lies on the opposite side of the origin, as indicated in red;
  • (A5) restricts each element of $r_V$ to be non-negative, so the region it defines is contained in those represented by (A3) and (A4); and
  • the line segment in blue between $(1, 0, 0)$ and $(0, 1, 0)$ is $\operatorname{Core}(V, g)$.
Equation (A6) is the simplified algebraic form of $\operatorname{Core}(V, g)$.
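The constraints (A1)–(A5) translate directly into a membership test. The following minimal sketch encodes the worths that can be read off (A1)–(A5), with every coalition not listed worth 0; the function name `in_core` and the tolerance parameter are illustrative choices rather than anything defined in the paper.

```python
from itertools import combinations

PLAYERS = (1, 2, 3)
# Worths read off (A1)-(A5); any coalition not listed here is worth 0.
G = {frozenset({1, 2, 3}): 1.0, frozenset({1, 2}): 1.0}


def worth(S):
    return G.get(frozenset(S), 0.0)


def in_core(r, tol=1e-9):
    """r maps each player to a payoff.

    Checks efficiency (A1) and that no proper coalition receives less
    than its worth (A2)-(A5).
    """
    if abs(sum(r[i] for i in PLAYERS) - worth(PLAYERS)) > tol:
        return False
    for k in range(1, len(PLAYERS)):
        for S in combinations(PLAYERS, k):
            if sum(r[i] for i in S) < worth(S) - tol:
                return False
    return True


print(in_core({1: 0.5, 2: 0.5, 3: 0.0}))  # a point on the blue segment
print(in_core({1: 0.4, 2: 0.4, 3: 0.2}))  # violates (A2)
```

Points on the segment between $(1, 0, 0)$ and $(0, 1, 0)$ pass the test, while any allocation that gives player 3 a positive payoff fails (A2).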

Appendix A.2

Proof of Proposition 1. 
Given $C \subseteq V : |C| > 1$, for any $B \subsetneq C : B \neq \emptyset$ and payoff vector $r_C$,
$$\min_{i \in B} r_i \leq \frac{r(B)}{|B|}. \tag{A7}$$
Then, we show that the equality in (A7) holds. Suppose node $j \in B$ is a solution to the left-hand side of (A7), i.e.,
$$\min_{i \in B} r_i = r_j. \tag{A8}$$
Let $B = \{j\}$; then the right-hand side of (A7) is
$$\frac{r(B)}{|B|} = r_j. \tag{A9}$$
Equations (A8) and (A9) imply that the equality in (A7) holds.
Since the equality in (A7) holds for $B \subsetneq C : B \neq \emptyset$ and payoff vector $r_C$, then
$$\min_{B \subsetneq C : B \neq \emptyset}\ \max_{r_C \in \operatorname{Core}(C, g)}\ \min_{i \in B} r_i = \min_{B \subsetneq C : B \neq \emptyset}\ \max_{r_C \in \operatorname{Core}(C, g)} \frac{r(B)}{|B|},$$
which establishes (9) according to (6). □

Appendix A.3

Proof of Theorem 1. 
For $B \subsetneq C$, $B \neq \emptyset$, by the relationship between $r(\cdot)$ and $g(\cdot)$ when $r_C \in \operatorname{Core}(C, g)$, we have
$$r(B) = r(C) - r(C \setminus B) \leq g(C) - g(C \setminus B), \tag{A10}$$
where the equality is achieved when $r(C \setminus B) = g(C \setminus B)$. Hence,
$$\max_{r_C \in \operatorname{Core}(C, g)} \frac{r(B)}{|B|} \leq \frac{g(C) - g(C \setminus B)}{|B|}. \tag{A11}$$
Next, we will show that $\exists\, r_C \in \operatorname{Core}(C, g)$ s.t. $r(C \setminus B) = g(C \setminus B)$, so that the equality is achieved in (A11).
To ease the notation, denote
$$T := C \setminus B. \tag{A12}$$
With $T$, (A11) becomes
$$\max_{r_C \in \operatorname{Core}(C, g)} \frac{r(B)}{|B|} \leq \frac{g(C) - g(T)}{|C \setminus T|}. \tag{A13}$$
For the convex subgame $(T, g)$, the core of $(T, g)$ is non-empty since the game is convex [17,25]. Suppose $r_T \in \operatorname{Core}(T, g)$, which means
$$r(T) = g(T), \text{ and} \tag{A14}$$
$$r(T') \geq g(T'), \quad \forall\, T' \subseteq T. \tag{A15}$$
Then we show the steps to construct a vector $r_C$ from $r_T$ with $r_C \in \operatorname{Core}(C, g)$.
Step 1: Select an $i$, with
$$i \in \arg\max_{j : j \in C \setminus T} g(T \cup \{j\}) - g(T). \tag{A16}$$
Step 2: Assign
$$r_i = g(T \cup \{i\}) - g(T), \tag{A17}$$
and construct $r_{T \cup \{i\}}$ from the values of $r_T$ and $r_i$.
For $T' \subseteq T$, by supermodularity of $g$,
$$g(T \cup \{i\}) - g(T) \geq g(T' \cup \{i\}) - g(T'). \tag{A18}$$
Then,
$$\begin{aligned} r(T' \cup \{i\}) &= r(\{i\}) + r(T') && \text{(A19)} \\ &= g(T \cup \{i\}) - g(T) + r(T') && \text{(A20)} \\ &\geq g(T \cup \{i\}) - g(T) + g(T') && \text{(A21)} \\ &\geq g(T' \cup \{i\}), && \text{(A22)} \end{aligned}$$
where (A19) is by (1), (A20) is by (A17), (A21) is by (A15), and (A22) is by (A18).
Equations (A22), (A14) and (A15) indicate that
$$r_{T \cup \{i\}} \in \operatorname{Core}(T \cup \{i\}, g). \tag{A23}$$
Update $T$ to $T \cup \{i\}$ and repeat Step 1 and Step 2 above; finally, we will have a constructed
$$r_C \in \operatorname{Core}(C, g), \tag{A24}$$
with $r(T) = g(T)$ preserved.
This means, for $B \subsetneq C$, $B \neq \emptyset$, the formula (A11), or its equivalent formula (A13), can achieve equality with the constructed $r_C$ obtained in (A24). Then (14) is implied considering (9).
Therefore, the minimization over the non-empty proper subsets of $C$ for the right-hand side of (A13) leads to a value equal to the minimization over the non-empty proper subsets for the left-hand side of (A13), which is the community strength of $C$ defined in (9). Based on the above proof, we also know that if a non-empty set $B^*$ is a solution to the minimization in (9), the set $T^* := C \setminus B^*$ is a solution to the minimization in (14).
Hence, Theorem 1 is established. □
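The construction in this proof can be sanity-checked numerically. The sketch below is a minimal illustration under stated assumptions: it uses the marginal contribution $g(S \cup \{i\}) - g(S)$ of each newly added player $i$ to the set $S$ built so far (the standard construction for convex games [25]), it also builds the starting vector on $T$ in the same way, and the supermodular function `g` (total weight of internal edges of a hypothetical digraph `W`) is chosen purely for the demonstration.

```python
from itertools import combinations

# Hypothetical weighted digraph; g(B) = w(B, B) is supermodular for non-negative weights.
W = {(1, 2): 2.0, (2, 1): 2.0, (2, 3): 1.0, (3, 4): 3.0}


def g(B):
    """Total weight of the edges with both endpoints in B."""
    return sum(x for (i, j), x in W.items() if i in B and j in B)


def extend_core_vector(T_order, B_order):
    """Add the players of T first, then those of B, paying marginal contributions."""
    r, S = {}, frozenset()
    for i in list(T_order) + list(B_order):
        r[i] = g(S | {i}) - g(S)
        S = S | {i}
    return r


def in_core(r, C, tol=1e-9):
    """Efficiency on C plus r(S) >= g(S) for every non-empty proper subset S of C."""
    C = sorted(C)
    if abs(sum(r[i] for i in C) - g(set(C))) > tol:
        return False
    return all(sum(r[i] for i in S) >= g(set(S)) - tol
               for k in range(1, len(C))
               for S in combinations(C, k))


C, T, B = {1, 2, 3, 4}, [1, 2], [3, 4]
r = extend_core_vector(T, B)
print(r, in_core(r, C), sum(r[i] for i in B) == g(C) - g(set(T)))
```

For this instance the resulting vector lies in $\operatorname{Core}(C, g)$ and satisfies $r(B) = g(C) - g(T)$, which is the equality case of (A13).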

Appendix A.4

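Proof of Proposition 2.
We first show the second property in Proposition 2, which will imply the first property.
Denote the sign of a number $x$ by
$$\operatorname{sgn}(x) := \begin{cases} 1, & x > 0 \\ -1, & x < 0 \\ 0, & x = 0. \end{cases} \tag{A25}$$
We have
$$\begin{aligned} \operatorname{sgn}(\sigma(C) - \alpha) &= \operatorname{sgn}\Big( \min_{B \subsetneq C : B \neq \emptyset} \frac{g(C) - g(B)}{|C \setminus B|} - \alpha \Big) && \text{(A26a)} \\ &= \operatorname{sgn}\Big( \min_{B \subsetneq C : B \neq \emptyset} g(C) - g(B) - \alpha |C \setminus B| \Big) && \text{(A26b)} \\ &= \operatorname{sgn}\Big( \min_{B \subsetneq C : B \neq \emptyset} f_\alpha(B) - f_\alpha(C) \Big), && \text{(A26c)} \end{aligned}$$
where
  • (A26a) is by (14);
  • (A26b) is because $|C \setminus B| > 0$ for $B \subsetneq C$ and $\operatorname{sgn}(x) = \operatorname{sgn}(ax)$ if $a > 0$;
  • (A26c) is by the definition of $f_\alpha$ in (19).
Note also that the sets of optimal solutions to the minimization/maximization in each step are the same. Hence,
$$\begin{aligned} \sigma(C) < \alpha &\iff \operatorname{sgn}(\sigma(C) - \alpha) < 0 && \text{(A27)} \\ &\iff \operatorname{sgn}\Big( \min_{B \subsetneq C : B \neq \emptyset} f_\alpha(B) - f_\alpha(C) \Big) < 0 && \text{(A28)} \\ &\iff \min_{B \subsetneq C : B \neq \emptyset} f_\alpha(B) < f_\alpha(C) && \text{(A29)} \\ &\iff C \notin \mathcal{B}_\alpha(C), && \text{(A30)} \end{aligned}$$
where
  • (A27) is by (A25);
  • (A28) is by (A26c);
  • (A29) is by (A25);
  • (A30) is by definition of $\mathcal{B}_\alpha(C)$.
Equation (A30) implies (25).
Similarly, with the inequalities “<” replaced by “>”, and simply by definition of $\mathcal{B}_\alpha(C)$, we have
$$\sigma(C) > \alpha \iff \min_{B \subsetneq C : B \neq \emptyset} f_\alpha(B) > f_\alpha(C) \iff \{C\} = \mathcal{B}_\alpha(C),$$
which implies (24) and (27).
With the inequalities replaced by equalities, we have
$$\sigma(C) = \alpha \iff \min_{B \subsetneq C : B \neq \emptyset} f_\alpha(B) = f_\alpha(C),$$
which implies (23) and (26). This completes the proof. □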

Appendix A.5

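Proof of Lemma 1. 
Assume
$$\sigma(C_1) \geq \sigma(C_2) = \alpha. \tag{A31}$$
It suffices to show that $\sigma(C_1 \cup C_2) \geq \alpha$, or equivalently, by (23) and (24),
$$f_\alpha(B) \geq f_\alpha(C_1 \cup C_2), \quad \forall\, B \subsetneq C_1 \cup C_2 : B \neq \emptyset. \tag{A32}$$
Consider the case $B \subseteq C_1$ first,
$$f_\alpha(B) \overset{(a)}{\geq} f_\alpha(C_1) \overset{(b)}{\geq} f_\alpha(C_1 \cup C_2) \underbrace{- f_\alpha(C_2) + f_\alpha(C_1 \cap C_2)}_{\geq 0\ (c)},$$
which implies (A32) as desired, where
  • $(a)$ and $(c)$ are by (23) and (24), since both $\sigma(C_1)$ and $\sigma(C_2)$ are at least $\alpha$ by (A31);
  • $(b)$ is because $f_\alpha$ is submodular due to the supermodularity of $g$ and modularity of the cardinality function.
Then consider the remaining case $B \not\subseteq C_1$, i.e., $B \subseteq C_1 \cup C_2$ and $B \cap C_2 \neq \emptyset$,
$$f_\alpha(B) \overset{(d)}{\geq} f_\alpha(B \cup C_2) \underbrace{- f_\alpha(C_2) + f_\alpha(B \cap C_2)}_{\geq 0\ (e)} \overset{(f)}{\geq} f_\alpha(\underbrace{(B \cup C_2) \cup C_1}_{\overset{(g)}{=}\, C_1 \cup C_2}) \underbrace{- f_\alpha(C_1) + f_\alpha((B \cup C_2) \cap C_1)}_{\geq 0\ (h)},$$
which implies (A32) as desired, where
  • $(d)$ and $(f)$ are by the submodularity of $f_\alpha$ explained above;
  • $(e)$ is by (23) and (24), since $B \cap C_2$ is a non-empty subset of $C_2$;
  • $(h)$ is by (23) and (24), since $(B \cup C_2) \cap C_1 \subseteq C_1$, $(B \cup C_2) \cap C_1 \supseteq C_1 \cap C_2$ and $C_1 \cap C_2 \neq \emptyset$ by the assumption stated in the lemma;
  • $(g)$ is because $B \subseteq C_1 \cup C_2$.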

Appendix A.6

Proof of Theorem 2. 
Suppose on the contrary that $C_1 \cap C_2 \neq \emptyset$ and $C_1 \not\supseteq C_2$. By Lemma 1,
$$\sigma(C_1 \cup C_2) \geq \sigma(C_1) > \alpha_1.$$
This contradicts $C_1 \in \mathcal{C}_{\alpha_1}(V)$, in particular, the inclusion-wise maximality of $C_1$, since $C_1 \cup C_2 \supsetneq C_1$, where the strict inclusion is because $C_1 \not\supseteq C_2$. □

Appendix A.7

Proof of Corollary 1. 
Consider $B \subseteq V : B \neq \emptyset$. An inclusion-wise minimal element of the set in the definition of its parent $p(B)$ exists since $V$ is in the set. Suppose on the contrary that there can be multiple minimal elements, say $C_1 \in \mathcal{C}_{\alpha_1}(V)$ and $C_2 \in \mathcal{C}_{\alpha_2}(V)$ with $\alpha_1 \leq \alpha_2$ w.l.o.g.
By Theorem 2, since $C_1 \cap C_2 \supseteq B$, we have $C_2 \subseteq C_1$, contradicting the minimality of $C_1$ as desired. Hence, $p(B)$ exists and is unique. □

Appendix A.8

Proof of Corollary 2. 
To prove the “if” case, consider any $B \in \mathcal{C}_{\sigma(C)}(C)$. On one hand, $B \subsetneq C$; on the other hand, by definition of $\mathcal{C}_\alpha(\cdot)$ in (10), we have $\sigma(B) > \sigma(C)$. Hence, $C$ satisfies the condition in (29). To show minimality, note that the maximality of $B \in \mathcal{C}_{\sigma(C)}(C)$ according to the definition of $\mathcal{C}_\alpha(\cdot)$ in (10) indicates that
$$\forall\, C' \subseteq C : B \subsetneq C', \quad \sigma(C') \leq \sigma(C).$$
Hence, $p(B) = C$.
Consider the reverse case, i.e., $p(B) = C$ and any $\alpha \in \mathbb{R}$ s.t. $B \in \mathcal{C}_\alpha(V)$. Then
$$\sigma(B) > \alpha \geq \sigma(C),$$
where the last inequality is by Theorem 2 because $C \notin \mathcal{C}_\alpha(V)$ as $C \supsetneq B$. To show $B \in \mathcal{C}_{\sigma(C)}(C)$, it suffices to show maximality of $B$ in $\mathcal{C}_{\sigma(C)}(C)$, i.e.,
$$\forall\, C' \subseteq C : C' \supsetneq B, \quad \sigma(C') \leq \sigma(C).$$
Suppose, on the contrary, that there exists
$$C' \subseteq C : C' \supsetneq B, \quad \sigma(C') > \sigma(C).$$
Then, there must also exist
$$C'' \in \mathcal{C}_{\sigma(C)}(C) : C'' \supseteq C'.$$
By Theorem 2, since $C'' \cap C \supseteq B$, we must have $C'' \subsetneq C$, contradicting the minimality of $C$ by definition of parent in (29) as desired.
Hence, Corollary 2 is established. □

Appendix A.9

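Proof of Theorem 3. 
When $|U| \leq 1$, we have $\mathcal{C}_\alpha(U) = \emptyset$ by the definition of $\mathcal{C}_\alpha(\cdot)$ in (10), then (31) is implied by (32).
Then we prove the equality in (32) by showing that for $|V| \geq 2$,
$$\mathcal{B}^*_\alpha(V) \setminus \{\{i\} \mid i \in V\} \subseteq \mathcal{C}_\alpha(V) \tag{A33}$$
and
$$\mathcal{C}_\alpha(V) \setminus \mathcal{B}^*_\alpha(V) = \mathcal{C}_\alpha(U). \tag{A34}$$
To show (A33), consider any $B \in \mathcal{B}^*_\alpha(V) : |B| \geq 2$. By minimality of $B$ in $\mathcal{B}_\alpha(V)$,
$$f_\alpha(B) < \min_{T \subsetneq B : T \neq \emptyset} f_\alpha(T),$$
which implies
$$\sigma(B) > \alpha$$
by (23) and (24) since $|B| > 1$. Then by Proposition 5, $B$ is maximal among
$$\{B' \mid B' \subseteq V,\ \sigma(B') > \alpha\},$$
which means $B \in \mathcal{C}_\alpha(V)$.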
It remains to show (A34).
First, we show that any element of the left-hand side of (A34) is in the right-hand side of (A34).
Consider any element $C$ of the left-hand side of (A34), i.e.,
$$C \in \mathcal{C}_\alpha(V), \text{ and} \tag{A36}$$
$$C \notin \mathcal{B}^*_\alpha(V). \tag{A37}$$
Since $C \in \mathcal{C}_\alpha(V)$, we have
$$\sigma(C) > \alpha. \tag{A38}$$
For any $B \in \mathcal{B}^*_\alpha(V)$, we deduce that
$$C \not\subseteq B. \tag{A39}$$
This is because,
  • when $|B| = 1$, (A39) holds trivially, otherwise (A36) does not hold by (10).
  • when $|B| > 1$, by (A33),
    $$B \in \mathcal{C}_\alpha(V).$$
    Then by (A37),
    $$C \neq B,$$
    which further implies (A39) by the maximality of $C \in \mathcal{C}_\alpha(V)$ according to (10).
By (A36), (A37), (A39) and Proposition 5,
$$B \cap C = \emptyset.$$
Then
$$C \subseteq V \setminus \bigcup_{B \in \mathcal{B}^*_\alpha(V)} B,$$
which is equivalent to $C \subseteq U$ by definition of $U$, and hence by (A36),
$$C \in \mathcal{C}_\alpha(U).$$
Next, we show that any element of the right-hand side of (A34) is in the left-hand side of (A34).
Consider any element $C$ of the right-hand side of (A34), i.e.,
$$C \in \mathcal{C}_\alpha(U), \tag{A42}$$
then by (10), we have
$$\sigma(C) > \alpha. \tag{A43}$$
For any
$$C' \subseteq V : C' \supsetneq C, \tag{A44}$$
we deduce that
$$\sigma(C') \leq \alpha. \tag{A45}$$
This is because
  • when $C' \subseteq U$, (A45) holds by the maximality of $C$ in $\mathcal{C}_\alpha(U)$ indicated by (10).
  • when $C' \not\subseteq U$,
    $$\exists\, B \in \mathcal{B}^*_\alpha(V) \ \text{s.t.} \tag{A46a}$$
    $$B \cap C' \neq \emptyset \tag{A46b}$$
    since the way $U$ is defined in (33) means any element not in $U$ must appear in some subset in $\mathcal{B}^*_\alpha(V)$. Additionally,
    $$C' \not\subseteq B, \tag{A47}$$
    otherwise (A44) implies $C \subsetneq B$, which further implies that (A42) must not hold. According to Proposition 5, (A46a), (A46b) and (A47) imply that (A45) holds.
(A45) implies that $C$ is maximal among the subsets in
$$\{C' \mid C' \subseteq V,\ \sigma(C') > \alpha\},$$
which further implies
$$C \in \mathcal{C}_\alpha(V). \tag{A48}$$
According to (10), (A42) implies
$$C \subseteq U. \tag{A49}$$
By (33), any element appearing in a subset in $\mathcal{B}^*_\alpha(V)$ is not in $U$, so (A49) implies that no element of $C$ is in any subset in $\mathcal{B}^*_\alpha(V)$. Hence
$$C \notin \mathcal{B}^*_\alpha(V). \tag{A50}$$
As a result, (A48) and (A50) imply
$$C \in \mathcal{C}_\alpha(V) \setminus \mathcal{B}^*_\alpha(V).$$
Hence, (A34) is established.
The combination of (A33) and (A34) establishes (32) in Theorem 3.
Hence, the equality in the recurrence relation holds.
By Proposition 4, the recursive procedure ends in finite steps. Hence, Theorem 3 is valid for calculating $\mathcal{C}_\alpha(V)$ recursively. □

Appendix A.10

Proof of Proposition 3. 
When $|V| \geq 2$, the feasible domain of (18) is non-empty, then its solution set $\mathcal{B}_\alpha(V)$ is non-empty, and hence $\mathcal{B}^*_\alpha(V)$ is non-empty, i.e., (34) holds. □

Appendix A.11

Proof of Proposition 4. 
When $|V| = 1$, it corresponds to the base case in (31), hence only 1 recursive step is needed to calculate $\mathcal{C}_\alpha(V)$.
When $|V| \geq 2$, for $N \in \mathbb{N}$, suppose $U_1, U_2, \dots, U_{N-1}$ is the sequence of ground sets for computing $\mathcal{C}_\alpha(\cdot)$ used in the recursive steps that correspond to (32), and $U_N$ is the ground set for computing $\mathcal{C}_\alpha(\cdot)$ by the base case (31). Then we know $U_1 = V$, $|U_N| \in \{0, 1\}$ and $N$ is the number of recursive steps. Proposition 3 implies that $|U_{i+1}| \leq |U_i| - 1$ for $i = 1, 2, \dots, N - 1$. Hence $N \leq |V|$.
Hence, Proposition 4 holds. □

Appendix A.12

Proof of Proposition 5.
Suppose on the contrary that there exists $C \subseteq V$ s.t. $\sigma(C) > \alpha$, $B \cap C \neq \emptyset$ and $C \not\subseteq B$. Consider an inclusion-wise maximal $C$. By Lemma 1 we have
$$\sigma(C \cup B) \geq \min\{\sigma(C), \sigma(B)\} > \alpha,$$
and so, by maximality of $C$ we have $C \supsetneq B$. Then we have
$$f_\alpha(C) \overset{(a)}{<} \min_{T \subsetneq C : T \neq \emptyset} f_\alpha(T) \overset{(b)}{\leq} f_\alpha(B),$$
where $(a)$ is by (23) and (24) in Proposition 2, and $(b)$ is by $C \supsetneq B$ as obtained above with the assumption. However, this contradicts the optimality of $B \in \mathcal{B}^*_\alpha(V)$. □

Appendix A.13

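Proof of Proposition 6. 
$$\begin{aligned} \mathcal{B}^*_\alpha(V) &\overset{(a)}{=} \operatorname{minimal} \bigcup_{t \in T^*_\alpha(V)} \arg\min_{B \subseteq V : t \in B} f_\alpha(B) \\ &\overset{(b)}{=} \operatorname{minimal} \bigcup_{t \in T^*_\alpha(V)} \operatorname{minimal} \arg\min_{B \subseteq V : t \in B} f_\alpha(B) \\ &\overset{(c)}{=} \operatorname{minimal} \bigcup_{t \in T^*_\alpha(V)} \mathcal{B}^{(t)}_\alpha(V), \end{aligned}$$
where $(a)$ is by definition of $\mathcal{B}^*_\alpha(V)$, (20) and (37); $(b)$ is by the fact that for a $t \in T^*_\alpha(V)$ and any $A \in \arg\min_{B \subseteq V : t \in B} f_\alpha(B)$, if $A$ is not minimal, then $A$ is not in the right-hand side due to the outer minimal applied to the union; and $(c)$ is by definition of $\mathcal{B}^{(t)}_\alpha(V)$. □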

Appendix A.14

Proof of Proposition 7. 
When $|V| \leq 1$, the base case (31) applies, and it takes $O(1)$ time.
When $|V| \geq 2$, the recursive case (32) applies. Consider the recursive step with ground set $V$ in Theorem 3:
  • For a $t \in V$, calculating $\mathcal{B}^{(t)}_\alpha(V)$ for all $\alpha \in \mathbb{R}$ takes $\mathrm{MNB}(|V|)$ time by the minimum norm base algorithm, hence calculating $\mathcal{B}^{(t)}_\alpha(V)$ for all $t \in V$, $\alpha \in \mathbb{R}$ takes $O(|V|^2\,\mathrm{MNB}(|V|))$ time;
  • Calculating $T^*_\alpha(V)$ for an $\alpha \in \mathbb{R}$ takes $O(|V|)$ time, hence calculating $T^*_\alpha(V)$ for all $\alpha \in \mathbb{R}$ takes $O(|V|^3)$ time, since there can be a number $O(|V|)$ of $\alpha$ values for which the calculation is needed.
Hence it takes $O(|V|^2\,\mathrm{MNB}(|V|))$ time for this recursive step.
According to Proposition 4, the recursive procedure ends with at most $|V|$ recursive steps, hence it takes $O(|V|^3\,\mathrm{MNB}(|V|))$ time to calculate $\bigcup_{\alpha \in \mathbb{R}} \mathcal{C}_\alpha(V)$. □

Appendix A.15. Calculation of Strong Communities in Figure 3b

As an example, we show how to obtain the dendrogram in Figure 3b and then obtain the strong communities below.
Following the recursive procedure in Theorem 3, to solve $\mathcal{C}_\alpha(V)$, first solve $\mathcal{B}^*_\alpha(V)$; then, for each critical value of $\alpha$ at a turning point, solve for $\mathcal{C}_\alpha(U)$ with a similar procedure, where $U$ is the complement in $V$ of the union of the sets in $\mathcal{B}^*_\alpha(V)$, as given by (33). Hence, the procedure to calculate $\mathcal{B}^*_\alpha(V)$ is representative, and it can be carried out according to Proposition 6.
When $j = 1$ is the sink node,
$$\begin{aligned} f_\alpha(\{1\}) &= \alpha \cdot |\{1\}| - g(\{1\}) = \alpha, \\ f_\alpha(\{1, 2\}) &= \alpha \cdot |\{1, 2\}| - g(\{1, 2\}) = 2\alpha - 4, \\ f_\alpha(\{1, 2, 3\}) &= 3\alpha - 5, \\ f_\alpha(\{1, 2, 3, 4\}) &= 4\alpha - 8, \end{aligned}$$
and $f_\alpha(\{1, 2, 4\})$, $f_\alpha(\{1, 3\})$, $f_\alpha(\{1, 3, 4\})$, $f_\alpha(\{1, 4\})$ can be written out similarly, so we omit them here. Then, plotting $f_\alpha$ against $\alpha$, the lowest curve formed is $\hat{f}^{(1)}_\alpha(V)$, and the corresponding solution $\mathcal{B}^{(1)}_\alpha(V)$ can be obtained:
$$\hat{f}^{(1)}_\alpha(V) = \begin{cases} 4\alpha - 8, & \alpha < 2, \\ 2\alpha - 4, & 2 \leq \alpha < 4, \\ \alpha, & \alpha \geq 4, \end{cases}$$
and correspondingly,
$$\mathcal{B}^{(1)}_\alpha(V) = \begin{cases} \{\{1, 2, 3, 4\}\}, & \alpha < 2, \\ \{\{1, 2\}\}, & 2 \leq \alpha < 4, \\ \{\{1\}\}, & \alpha \geq 4. \end{cases}$$
Then, for all $j \in V$, calculate $\hat{f}^{(j)}_\alpha(V)$ and $\mathcal{B}^{(j)}_\alpha(V)$, and finally obtain $T^*_\alpha(V)$ and $\mathcal{B}^*_\alpha(V)$.
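The piecewise solution for sink node $j = 1$ can be reproduced by direct enumeration. The sketch below is illustrative: the digraph `W` is hypothetical (not the digraph of Figure 3a) and was chosen so that $g(\{1,2\}) = 4$, $g(\{1,2,3\}) = 5$ and $g(V) = 8$ at $\beta = 1$, matching the values of $f_\alpha$ listed above; the helper name `minimal_minimizer` is ours. Ties are broken by cardinality, which selects the unique minimal minimizer because the minimizers of a submodular function are closed under intersection.

```python
from itertools import combinations

# Hypothetical weighted digraph (an assumption, not the paper's Figure 3a).
W = {(1, 2): 2.0, (2, 1): 2.0, (2, 3): 1.0, (3, 4): 3.0}
V = (1, 2, 3, 4)


def g(B):
    """g of the form (46) with beta = 1: total weight of internal edges."""
    return sum(x for (i, j), x in W.items() if i in B and j in B)


def f(B, alpha):
    """f_alpha(B) = alpha * |B| - g(B)."""
    return alpha * len(B) - g(B)


def minimal_minimizer(j, alpha):
    """Smallest minimizer of f_alpha over the subsets of V that contain j."""
    others = [i for i in V if i != j]
    sets = [frozenset(c) | {j}
            for k in range(len(others) + 1)
            for c in combinations(others, k)]
    return min(sets, key=lambda B: (f(B, alpha), len(B)))


for alpha in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(alpha, sorted(minimal_minimizer(1, alpha)))
```

At the sampled values the minimizer is $V$ for $\alpha < 2$, $\{1, 2\}$ for $2 \leq \alpha < 4$ and $\{1\}$ for $\alpha \geq 4$, in agreement with the breakpoints of the lower envelope.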
The result is
$$T^*_\alpha(V) = \begin{cases} \{1, 2, 3, 4\}, & \alpha < 2, \\ \{1, 2\}, & 2 \leq \alpha < 4, \\ \{1, 2, 3, 4\}, & \alpha \geq 4, \end{cases}$$
$$\mathcal{B}^*_\alpha(V) = \begin{cases} \{\{1, 2, 3, 4\}\}, & \alpha < 2, \\ \{\{1, 2\}\}, & 2 \leq \alpha < 4, \\ \{\{1\}, \{2\}, \{3\}, \{4\}\}, & \alpha \geq 4, \end{cases}$$
and
$$\mathcal{B}^*_\alpha(V) \setminus \{\{i\} \mid i \in V\} = \begin{cases} \{\{1, 2, 3, 4\}\}, & \alpha < 2, \\ \{\{1, 2\}\}, & 2 \leq \alpha < 4, \\ \emptyset, & \alpha \geq 4. \end{cases}$$
When $\alpha < 2$ or $\alpha \geq 4$, the $U$ in (33) satisfies (31), hence the base case applies, no new strong subset is found and the recursion finishes.
For $2 \leq \alpha < 4$, we need to continue to calculate $\mathcal{C}_\alpha(U)$ with $U = \{3, 4\}$. With a similar calculation procedure, we can obtain for $2 \leq \alpha < 4$,
$$\mathcal{B}^*_\alpha(U) = \begin{cases} \{\{3, 4\}\}, & 2 \leq \alpha < 3, \\ \{\{3\}, \{4\}\}, & 3 \leq \alpha < 4, \end{cases}$$
and
$$\mathcal{B}^*_\alpha(U) \setminus \{\{i\} \mid i \in U\} = \begin{cases} \{\{3, 4\}\}, & 2 \leq \alpha < 3, \\ \emptyset, & 3 \leq \alpha < 4. \end{cases}$$
Then the remaining recursions to be done satisfy the base case in (31). Hence all the strong communities are obtained. The dendrogram of the strong communities is shown in Figure 3b.
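The whole recursion can be sketched in a few lines. The code below follows our reading of Theorem 3, inferred from (A33), (A34) and the description of $U$: $\mathcal{C}_\alpha(V) = (\mathcal{B}^*_\alpha(V) \setminus \{\{i\} \mid i \in V\}) \cup \mathcal{C}_\alpha(U)$, where $U$ is $V$ minus the union of the sets in $\mathcal{B}^*_\alpha(V)$ and $\mathcal{C}_\alpha(U) = \emptyset$ when $|U| \leq 1$. The digraph `W` is hypothetical (not the digraph of Figure 3a), $\beta = 1$ is assumed, and brute-force enumeration stands in for SFM or max-flow.

```python
from itertools import combinations

# Hypothetical weighted digraph (an assumption, not the paper's Figure 3a).
W = {(1, 2): 2.0, (2, 1): 2.0, (2, 3): 1.0, (3, 4): 3.0}


def g(B):
    """g of the form (46) with beta = 1: total weight of internal edges."""
    return sum(x for (i, j), x in W.items() if i in B and j in B)


def f(B, alpha):
    return alpha * len(B) - g(B)


def nonempty_subsets(U):
    items = sorted(U)
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)


def minimal_minimizers(U, alpha, tol=1e-9):
    """Inclusion-wise minimal minimizers of f_alpha over non-empty subsets of U."""
    subsets = list(nonempty_subsets(U))
    best = min(f(B, alpha) for B in subsets)
    argmin = [B for B in subsets if f(B, alpha) <= best + tol]
    return [B for B in argmin if not any(A < B for A in argmin)]


def strong_communities(U, alpha):
    """Recurrence as read from Theorem 3 (brute-force minimization)."""
    U = frozenset(U)
    if len(U) <= 1:  # base case
        return set()
    b_star = minimal_minimizers(U, alpha)
    covered = frozenset().union(*b_star)
    found = {B for B in b_star if len(B) > 1}  # drop singletons
    return found | strong_communities(U - covered, alpha)


for alpha in (1.0, 2.5, 3.5, 4.5):
    print(alpha, sorted(map(sorted, strong_communities({1, 2, 3, 4}, alpha))))
```

Each call removes at least one node from the ground set, so the recursion depth is at most $|V|$, in line with Proposition 4. For this toy graph the output follows the same pattern as the dendrogram levels described above: $V$ for $\alpha < 2$, then $\{1, 2\}$ and $\{3, 4\}$, then $\{1, 2\}$ alone, and nothing for $\alpha \geq 4$.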

References

  1. Flake, G.W.; Lawrence, S.; Giles, C.L. Efficient identification of web communities. In Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD 2000), Boston, MA, USA, 20–23 August 2000; Association for Computing Machinery: New York, NY, USA, 2000; Volume 2000, pp. 150–160.
  2. Radicchi, F.; Castellano, C.; Cecconi, F.; Loreto, V.; Parisi, D. Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 2004, 101, 2658–2663.
  3. Newman, M.E. Fast algorithm for detecting community structure in networks. Phys. Rev. E 2004, 69, 066133.
  4. Javed, M.A.; Younis, M.S.; Latif, S.; Qadir, J.; Baig, A. Community detection in networks: A multidisciplinary review. J. Netw. Comput. Appl. 2018, 108, 87–111.
  5. Chintalapudi, S.R.; Prasad, M.K. A survey on community detection algorithms in large scale real world networks. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1323–1327.
  6. Cai, Q.; Ma, L.; Gong, M.; Tian, D. A survey on network community detection based on evolutionary computation. Int. J. Bio-Inspired Comput. 2016, 8, 84–98.
  7. Pizzuti, C. Evolutionary computation for community detection in networks: A review. IEEE Trans. Evol. Comput. 2017, 22, 464–483.
  8. Jonnalagadda, A.; Kuppusamy, L. A cooperative game framework for detecting overlapping communities in social networks. Phys. A Stat. Mech. Appl. 2018, 491, 498–515.
  9. Su, X.; Xue, S.; Liu, F.; Wu, J.; Yang, J.; Zhou, C.; Hu, W.; Paris, C.; Nepal, S.; Jin, D.; et al. A Comprehensive Survey on Community Detection with Deep Learning. arXiv 2021, arXiv:2105.12584.
  10. Liu, F.; Xue, S.; Wu, J.; Zhou, C.; Hu, W.; Paris, C.; Nepal, S.; Yang, J.; Yu, P.S. Deep Learning for Community Detection: Progress, Challenges and Opportunities. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 11–17 July 2020; pp. 4981–4987.
  11. Athey, S.; Calvano, E.; Jha, S. A Theory of Community Formation and Social Hierarchy. SSRN Electron. J. 2016, 1–53.
  12. Gilles, R.P. The Cooperative Game Theory of Networks and Hierarchies; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2010; Volume 44.
  13. Zhou, X.; Zhao, X.; Liu, Y.; Sun, G. A game theoretic algorithm to detect overlapping community structure in networks. Phys. Lett. A 2018, 382, 872–879.
  14. Torkaman, A.; Badie, K.; Salajegheh, A.; Bokaei, M.H.; Ardestani, S.F.F. A Four-Stage Algorithm for Community Detection Based on Label Propagation and Game Theory in Social Networks. AI 2023, 4, 255–269.
  15. Ferdowsi, F.; Aghababaei Samani, K. Detecting overlapping communities in complex networks using non-cooperative games. Sci. Rep. 2022, 12, 11054.
  16. Morgenstern, O.; Von Neumann, J. Theory of Games and Economic Behavior; Princeton University Press: Princeton, NJ, USA, 1953.
  17. Chalkiadakis, G.; Elkind, E.; Wooldridge, M. Computational Aspects of Cooperative Game Theory; Synthesis Lectures on Artificial Intelligence and Machine Learning Series; Springer: Cham, Switzerland, 2011; 168p.
  18. Myerson, R.B. Game Theory: Analysis of Conflict; Harvard University Press: Cambridge, MA, USA, 1997.
  19. Jonnalagadda, A.; Kuppusamy, L. A survey on game theoretic models for community detection in social networks. Soc. Netw. Anal. Min. 2016, 6, 83.
  20. Zhou, L.; Lü, K.; Cheng, C.; Chen, H. A game theory based approach for community detection in social networks. In Big Data, Proceedings of the 29th British National Conference on Databases, Oxford, UK, 8–10 July 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 268–281.
  21. Zhou, L.; Lü, K.; Yang, P.; Wang, L.; Kong, B. An approach for overlapping and hierarchical community detection in social networks based on coalition formation game theory. Expert Syst. Appl. 2015, 42, 9634–9646.
  22. Lu, Q.; Korniss, G.; Szymanski, B.K. The naming game in social networks: Community formation and consensus engineering. J. Econ. Interact. Coord. 2009, 4, 221–235.
  23. Baronchelli, A. A gentle introduction to the minimal naming game. Belg. J. Linguist. 2016, 30, 171–192.
  24. Uzun, T.G.; Ribeiro, C.H.C. Detection of communities with Naming Game-based methods. PLoS ONE 2017, 12, e0182737.
  25. Shapley, L.S. Cores of convex games. Int. J. Game Theory 1971, 1, 11–26.
  26. Chan, C.; Al-Bashabsheh, A.; Zhao, C. Finding Better Web Communities in Digraphs via Max-Flow Min-Cut. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 410–414.
  27. Bach, F. Learning with submodular functions: A convex optimization perspective. Found. Trends® Mach. Learn. 2013, 6, 145–373.
  28. Fujishige, S.; Isotani, S. A submodular function minimization algorithm based on the minimum-norm base. Pac. J. Optim. 2011, 7, 3–17.
  29. Granot, F.; McCormick, S.T.; Queyranne, M.; Tardella, F. Structural and algorithmic properties for parametric minimum cuts. Math. Program. 2012, 135, 337–367.
  30. Arora, C.; Banerjee, S.; Kalra, P.; Maheshwari, S. Generic cuts: An efficient algorithm for optimal inference in higher order MRF-MAP. In Computer Vision—ECCV 2012, Proceedings of the 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Part V 12; Springer: Berlin/Heidelberg, Germany, 2012; pp. 17–30.
  31. Goldberg, A.V.; Hed, S.; Kaplan, H.; Tarjan, R.E.; Werneck, R.F. Maximum flows by incremental breadth-first search. In Algorithms—ESA 2011, Proceedings of the 19th Annual European Symposium on Algorithms, Saarbrücken, Germany, 5–9 September 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 457–468.
  32. Goldberg, A.V.; Hed, S.; Kaplan, H.; Kohli, P.; Tarjan, R.E.; Werneck, R.F. Faster and more dynamic maximum flow by incremental breadth-first search. In Algorithms—ESA 2015, Proceedings of the 23rd Annual European Symposium on Algorithms, Patras, Greece, 14–16 September 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 619–630.
  33. Kolmogorov, V. A faster algorithm for computing the principal sequence of partitions of a graph. Algorithmica 2010, 56, 394–412.
  34. Gallo, G.; Grigoriadis, M.D.; Tarjan, R.E. A fast parametric maximum flow algorithm and applications. SIAM J. Comput. 1989, 18, 30–55.
  35. Lang, K.J.; Andersen, R. Finding dense and isolated submarkets in a sponsored search spending graph. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, Lisbon, Portugal, 6–10 November 2007; pp. 613–622.
  36. Flake, G.W.; Tarjan, R.E.; Tsioutsiouliklis, K. Graph clustering and minimum cut trees. Internet Math. 2004, 1, 385–408.
  37. Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582.
  38. Fortunato, S.; Barthelemy, M. Resolution limit in community detection. Proc. Natl. Acad. Sci. USA 2007, 104, 36–41.
  39. Chen, M.; Nguyen, T.; Szymanski, B.K. A new metric for quality of network community structure. arXiv 2015, arXiv:1507.04308.
  40. Lu, X.; Kuzmin, K.; Chen, M.; Szymanski, B.K. Adaptive modularity maximization via edge weighting scheme. Inf. Sci. 2018, 424, 55–68.
  41. Auerbach, J.; Galenson, J.; Sundararajan, M. An empirical analysis of return on investment maximization in sponsored search auctions. In Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, Las Vegas, NV, USA, 24–27 August 2008; pp. 1–9.
Figure 2. Dendrogram of the communities.
Figure 4. An unweighted graph with $m_1 = 20$, $m_2 = 5$, where $K_m$ denotes the complete graph with $m$ nodes [38]. As for the two smaller complete graphs denoted by $K_{m_2}$, Modularity [37] will merge the two into a single community, as indicated by the blue dashed ellipse, due to the resolution limit [38], while our approach can identify each of them as a strong community.
Zhao, C.; Al-Bashabsheh, A.; Chan, C. Game Theoretic Clustering for Finding Strong Communities. Entropy 2024, 26, 268. https://doi.org/10.3390/e26030268
