Article

Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models

School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China
Entropy 2022, 24(9), 1216; https://doi.org/10.3390/e24091216
Submission received: 4 July 2022 / Revised: 25 August 2022 / Accepted: 27 August 2022 / Published: 30 August 2022
(This article belongs to the Topic Complex Systems and Network Science)

Abstract

We consider the problem of modeling and estimating communities in directed networks. Previous models for this problem assume that the sending clusters and the receiving clusters are either both non-overlapping or both overlapping. Such models cannot describe a directed network in which nodes in the sending clusters have the overlapping property while nodes in the receiving clusters do not, especially when the number of sending clusters is no larger than the number of receiving clusters. Directed networks of this kind arise in practice, both because real networks are random and because we often have little prior knowledge of their community structure. To study the asymmetric structure of such directed networks, we propose a flexible and identifiable Overlapping and Non-overlapping model (ONM). We also provide an extension of ONM that allows node degrees to vary. Two spectral clustering algorithms are designed to fit the models. We establish theoretical guarantees on the estimation consistency of the algorithms under the proposed models. Experiments on small-scale computer-generated directed networks support our theoretical results. Four real-world directed networks are used to illustrate the algorithms, and the results reveal the existence of highly mixed nodes and asymmetric structure in these networks.

1. Introduction

Community detection is a powerful tool in studying social networks with a latent structure of community [1,2,3,4]. The goal of community detection is to estimate a node’s community information from the network. In the study of social networks, various models have been proposed for community detection to model different networks with different community structures [5]. Due to the extremely intensive studies on community detection, we only focus on identifiable models that are closely relevant to our study in this paper.
The Stochastic Blockmodel (SBM) [6] is a classical and widely used model for undirected networks. SBM assumes that the probability of an edge between two nodes depends only on the clusters they belong to; this assumption is unrealistic because nodes in real-world networks have varying degrees. To model real-world undirected networks in which node degrees vary, the Degree-Corrected Stochastic Blockmodel (DCSBM) [7] extends SBM by introducing degree heterogeneities. Under SBM and DCSBM, all nodes are pure, in the sense that each node belongs to exactly one community. In practice, however, some nodes may belong to multiple communities, and such nodes have the overlapping (also known as mixed membership) property. To model undirected networks in which nodes have the overlapping property, Ref. [8] designs the Mixed Membership Stochastic Blockmodel (MMSB). Ref. [9] introduces the Degree-Corrected Mixed Membership model (DCMM), which extends MMSB by considering degree heterogeneities. Ref. [10] designs the Overlapping Continuous Community Assignment model (OCCAM), which is in fact equivalent to DCMM. Spectral methods with consistent estimation under the above models are provided in [9,11,12,13,14,15,16,17].
For directed networks in which all nodes have the non-overlapping property, Ref. [18] proposes the Stochastic co-Blockmodel (ScBM) and its extension, the Degree-Corrected Stochastic co-Blockmodel (DCScBM), which accounts for degree heterogeneity; ScBM (DCScBM) extends SBM (DCSBM) from undirected networks to directed networks. ScBM and DCScBM can model non-overlapping directed networks in which row nodes belong to $K_r$ sending clusters (we occasionally use community to denote cluster) and column nodes belong to $K_c$ receiving clusters, where row nodes can differ from column nodes, and $K_r$ can differ from $K_c$. Ref. [19] studies the consistency of some adjacency-based spectral algorithms under ScBM. Ref. [20] studies the consistency of the spectral method D-SCORE under DCScBM when $K_r = K_c$. Ref. [21] designs the Directed Mixed Membership Stochastic Blockmodel (DiMMSB) as an extension of ScBM and MMSB to model directed networks in which all nodes have the overlapping property. DiMMSB can also be seen as an extension of the two-way blockmodels with a Bernoulli distribution of [22]. All of the above models are identifiable under certain conditions. The identifiability of ScBM and DCScBM holds even when $K_r \neq K_c$. DiMMSB is identifiable only when $K_r = K_c$. SBM, DCSBM, MMSB, DCMM, and OCCAM trivially have $K_r = K_c$, since they model undirected networks. For all the above models, row nodes and column nodes have symmetric structural information: they are either all non-overlapping or all overlapping. As the identifiability of DiMMSB shows, modeling a directed network in which all nodes have the overlapping property requires $K_r = K_c$.
Naturally, there should be a bridge model between ScBM and DiMMSB, one that can model a directed network in which the row nodes and column nodes have asymmetric structural information, i.e., they differ in their overlapping property. In this paper, we introduce this model and name it the Overlapping and Non-overlapping model.
Our contributions in this paper are as follows. We propose an identifiable model for directed networks, the Overlapping and Non-overlapping model (ONM for short). ONM allows row nodes and column nodes of a directed network to differ in their overlapping property. Without loss of generality, we let the row nodes have the overlapping property while the column nodes do not. The proposed model is identifiable when $K_r \le K_c$. Recall that ScBM, which models non-overlapping directed networks, is identifiable even when $K_r \neq K_c$, and that DiMMSB, which models overlapping directed networks, is identifiable only when $K_r = K_c$. Since ONM models directed networks in which row nodes and column nodes differ in their overlapping property, we regard it as a bridge model from ScBM to DiMMSB. We also propose an identifiable model, the Overlapping and Degree-Corrected Non-overlapping model (ODCNM), as an extension of ONM that considers degree heterogeneity. We construct two spectral algorithms to fit ONM and ODCNM, and we show that both methods are consistent under mild conditions. In particular, our theoretical results under ODCNM match those under ONM when ODCNM reduces to ONM. Numerical results on simulated directed networks generated under ONM and ODCNM support our theoretical findings, and results on four real-world directed networks demonstrate the advantages of our algorithms in studying the asymmetric structure between the sending and receiving clusters.
Notations. We use the following notation throughout this paper. For any positive integer $m$, let $[m] := \{1, 2, \ldots, m\}$, and let $I_m$ denote the $m \times m$ identity matrix. For a vector $x$ and fixed $q > 0$, $\|x\|_q$ denotes its $l_q$-norm. For a matrix $M$, $M'$ denotes the transpose of $M$, $\|M\|$ denotes the spectral norm, $\|M\|_F$ denotes the Frobenius norm, and $\|M\|_{2\to\infty}$ denotes the maximum $l_2$-norm over all rows of $M$. Let $\sigma_i(M)$ be the $i$-th largest singular value of $M$, and let $\lambda_i(M)$ denote the $i$-th largest eigenvalue of $M$ ordered by magnitude. $M(i,:)$ and $M(:,j)$ denote the $i$-th row and the $j$-th column of $M$, respectively. $M(S_r,:)$ and $M(:,S_c)$ denote the rows and columns of $M$ indexed by the sets $S_r$ and $S_c$, respectively. For any matrix $M$, we write $Y = \max(0, M)$ to mean $Y_{ij} = \max(0, M_{ij})$ for all $i, j$. For any matrix $M \in \mathbb{R}^{m\times m}$, let $\mathrm{diag}(M)$ be the $m \times m$ diagonal matrix whose $i$-th diagonal entry is $M(i,i)$, and let $\mathrm{rank}(M)$ be the rank of $M$. $\mathbf{1}$ is a column vector with all entries equal to 1. $e_i$ is a column vector whose $i$-th entry is 1, while all other entries are zero. In this paper, $C$ is a positive constant whose value may differ from one occurrence to the next.

2. The Overlapping and Non-Overlapping Model

Consider a directed network $\mathcal{N} = (V_r, V_c, E)$, where $V_r = \{1, 2, \ldots, n_r\}$ is the set of row nodes, $V_c = \{1, 2, \ldots, n_c\}$ is the set of column nodes, and $E$ is the set of edges from the row nodes to the column nodes. Since the row nodes can differ from the column nodes, we may have $V_r \cap V_c = \emptyset$ (i.e., $V_r$ and $V_c$ share no nodes), or more generally $V_r \neq V_c$ (i.e., the row nodes differ from the column nodes); this is more general than the case $V_r = V_c$ (i.e., the row nodes are the same as the column nodes). Here $\emptyset$ denotes the empty set, and such a directed network $\mathcal{N}$ is also known as a bipartite graph (or bipartite network) in [18,19]. In this paper, we use the subscripts $r$ and $c$ to distinguish quantities for row nodes and column nodes; the works [18,19,23,24,25,26] also consider this general bipartite setting, in which the row nodes may differ from the column nodes. Let $A \in \{0,1\}^{n_r \times n_c}$ be the bi-adjacency matrix of the directed network $\mathcal{N}$, such that $A(i_r, i_c) = 1$ if there is a directed edge from row node $i_r$ to column node $i_c$, and $A(i_r, i_c) = 0$ otherwise. For convenience, we call a community that row nodes belong to a row community (or occasionally a sending cluster), and a community that column nodes belong to a column community (or occasionally a receiving cluster).
We propose a new blockmodel which we call the Overlapping and Non-overlapping model (ONM for short). ONM can model directed networks whose row nodes belong to $K_r$ overlapping row communities, while the column nodes belong to $K_c$ non-overlapping column communities. For row nodes, let $\Pi_r \in \mathbb{R}^{n_r \times K_r}$ be the membership matrix, such that
$$\Pi_r(i_r,:) \ge 0, \quad \|\Pi_r(i_r,:)\|_1 = 1 \quad \text{for } i_r \in [n_r]. \tag{1}$$
Call row node $i_r$ pure if $\Pi_r(i_r,:)$ degenerates (i.e., one entry is 1 and the other $K_r - 1$ entries are 0), and mixed otherwise. Under this definition, row node $i_r$ has mixed membership and may belong to more than one row community, for $i_r \in [n_r]$.
For column nodes, let $\ell$ be the $n_c \times 1$ vector whose $i_c$-th entry is $\ell(i_c) = k$ if column node $i_c$ belongs to the $k$-th column community, where $\ell(i_c)$ takes values in $\{1, 2, \ldots, K_c\}$ for $i_c \in [n_c]$. Let $\Pi_c \in \mathbb{R}^{n_c \times K_c}$ be the membership matrix of column nodes, such that for $i_c \in [n_c]$, $k \in [K_c]$,
$$\Pi_c(i_c,k) = 1 \text{ when } \ell(i_c) = k, \text{ and } 0 \text{ otherwise}, \quad \text{and } \|\Pi_c(i_c,:)\|_1 = 1. \tag{2}$$
Under this definition, column node $i_c$ belongs to exactly one of the $K_c$ column communities for $i_c \in [n_c]$. Hence, all of the column nodes are pure nodes.
In this paper, we assume that
$$K_r \le K_c. \tag{3}$$
Equation (3) is required for the identifiability of ONM. Let $P \in \mathbb{R}^{K_r \times K_c}$ be the probability matrix, such that
$$0 \le P(k,l) \le \rho \le 1 \quad \text{for } k \in [K_r],\ l \in [K_c], \tag{4}$$
where $\rho$ controls the network sparsity and is called the sparsity parameter in this paper. For convenience, set $P = \rho \tilde{P}$, where $\tilde{P}(k,l) \in [0,1]$ for $k \in [K_r]$, $l \in [K_c]$, and $\max_{k \in [K_r], l \in [K_c]} \tilde{P}(k,l) = 1$ for model identifiability. For all pairs $(i_r, i_c)$ with $i_r \in [n_r]$, $i_c \in [n_c]$, our model assumes that the $A(i_r, i_c)$ are independent Bernoulli random variables satisfying
$$\Omega := \Pi_r P \Pi_c', \quad A(i_r, i_c) \sim \mathrm{Bernoulli}(\Omega(i_r, i_c)), \tag{5}$$
where $\Omega = \mathbb{E}[A]$, and we call it the population adjacency matrix in this paper.
Definition 1.
Call model (1)–(5) the Overlapping and Non-overlapping model (ONM), and denote it by $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$.
Remark 1.
Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, for $i_r \in [n_r]$, $j_c \in [n_c]$, since $\mathbb{P}(A(i_r, j_c) = 1) = \Omega(i_r, j_c) = \rho\, \Pi_r(i_r,:)\, \tilde{P}\, \Pi_c'(j_c,:)$, we see that increasing $\rho$ increases the probability of generating an edge from row node $i_r$ to column node $j_c$, i.e., the sparsity of the network is governed by $\rho$.
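To make the generative mechanism in (1)–(5) concrete, the following is a minimal sketch of drawing one bi-adjacency matrix from ONM. The function name `sample_onm`, the Dirichlet draw for mixed memberships, the 0-based column labels, and all numerical values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sample_onm(Pi_r, P, labels_c, rng):
    """Draw A with independent entries A[i, j] ~ Bernoulli(Omega[i, j])."""
    K_c = P.shape[1]
    Pi_c = np.eye(K_c)[labels_c]      # n_c x K_c one-hot column memberships
    Omega = Pi_r @ P @ Pi_c.T         # population adjacency matrix E[A]
    A = (rng.random(Omega.shape) < Omega).astype(int)
    return A, Omega

rng = np.random.default_rng(0)
n_r, n_c, K_r, K_c = 60, 90, 2, 3     # K_r <= K_c, as Equation (3) requires

# Mixed row memberships (rows sum to 1), with one pure row node per community.
Pi_r = rng.dirichlet(np.ones(K_r), size=n_r)
Pi_r[:K_r] = np.eye(K_r)

labels_c = rng.integers(0, K_c, size=n_c)   # 0-based labels (the paper uses 1..K_c)
rho = 0.5                                   # sparsity parameter
P_tilde = np.array([[1.0, 0.2, 0.4],
                    [0.3, 0.9, 0.1]])       # largest entry is 1, rank K_r
A, Omega = sample_onm(Pi_r, rho * P_tilde, labels_c, rng)
```

Because each row of `Pi_r` is a set of convex weights and every entry of `P` is at most `rho`, every entry of `Omega` is at most `rho`, which is how the sparsity parameter caps the edge probabilities.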
The following conditions are sufficient for the identifiability of ONM:
  • (I1) $\mathrm{rank}(P) = K_r$, $\mathrm{rank}(\Pi_r) = K_r$, and $\mathrm{rank}(\Pi_c) = K_c$.
  • (I2) There is at least one pure row node for each of the $K_r$ row communities.
For $k \in [K_r]$, let $\mathcal{I}_r^{(k)} = \{i \in [n_r] : \Pi_r(i,k) = 1\}$. By condition (I2), $\mathcal{I}_r^{(k)}$ is non-empty for all $k \in [K_r]$. For $k \in [K_r]$, select one row node from $\mathcal{I}_r^{(k)}$ to construct the index set $\mathcal{I}_r$; i.e., $\mathcal{I}_r$ contains the indices of $K_r$ pure row nodes, one from each row community. Without loss of generality, let $\Pi_r(\mathcal{I}_r,:) = I_{K_r}$ (Lemma 2.1 of [17] uses a similar setting to design a spectral algorithm under MMSB). $\mathcal{I}_c$ is defined similarly for the column nodes, such that $\Pi_c(\mathcal{I}_c,:) = I_{K_c}$. The next proposition guarantees the identifiability of ONM.
Proposition 1.
If conditions (I1) and (I2) hold, ONM is identifiable: for eligible $(P, \Pi_r, \Pi_c)$ and $(\check{P}, \check{\Pi}_r, \check{\Pi}_c)$, if $\Pi_r P \Pi_c' = \check{\Pi}_r \check{P} \check{\Pi}_c'$, then $P = \check{P}$, $\Pi_r = \check{\Pi}_r$, and $\Pi_c = \check{\Pi}_c$.
All proofs of propositions, lemmas, and theorems are provided in Appendix B and Appendix C of this paper. Compared to some previous models, ONM models different directed networks.
  • When the row nodes are the same as the column nodes, $K_r = K_c$, and all nodes are pure, ONM degenerates to SBM. However, ONM can model directed networks where row nodes have mixed memberships, while SBM only models undirected networks.
  • When all row nodes are pure, ONM reduces to ScBM with $K_r$ row clusters and $K_c$ column clusters [18]. However, ONM allows row nodes to have overlapping memberships, while ScBM does not. Meanwhile, for model identifiability, ScBM does not require $\mathrm{rank}(P) = K_r$ as ONM does; this can be seen as the cost ONM pays for modeling overlapping row nodes.
  • Though DiMMSB [21] can model directed networks whose row and column nodes have overlapping memberships, DiMMSB requires $K_r = K_c$ for model identifiability. By comparison, ONM allows $K_r \le K_c$ at the cost of losing the overlapping property of the column nodes.

2.1. A Spectral Algorithm for Fitting ONM

The primary goal of the proposed algorithm is to estimate the row membership matrix $\Pi_r$ and the column membership matrix $\Pi_c$ from the observed adjacency matrix $A$ with given $K_r$ and $K_c$. We now discuss the intuition behind the design of our algorithm to fit ONM.
Under conditions (I1) and (I2), by basic algebra, we have $\mathrm{rank}(\Omega) = K_r$. Let $\Omega = U_r \Lambda U_c'$ be the compact singular value decomposition of $\Omega$, where $U_r \in \mathbb{R}^{n_r \times K_r}$, $\Lambda \in \mathbb{R}^{K_r \times K_r}$, $U_c \in \mathbb{R}^{n_c \times K_r}$, $U_r' U_r = I_{K_r}$, $U_c' U_c = I_{K_r}$, and $I_{K_r}$ is the $K_r \times K_r$ identity matrix. Let $n_{c,k} = |\{i_c : \ell(i_c) = k\}|$ be the size of the $k$-th column community for $k \in [K_c]$. Let $n_{c,\max} = \max_{k \in [K_c]} n_{c,k}$ and $n_{c,\min} = \min_{k \in [K_c]} n_{c,k}$. Meanwhile, without causing confusion, let $n_{c,K_r}$ be the $K_r$-th largest size among all column communities. The following lemma guarantees that $U_r$ has an ideal simplex structure and that $U_c$ has $K_c$ distinct rows.
Lemma 1.
Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, there exist a unique $K_r \times K_r$ matrix $B_r$ and a unique $K_c \times K_r$ matrix $B_c$, such that
  • $U_r = \Pi_r B_r$, where $B_r = U_r(\mathcal{I}_r,:)$. Meanwhile, $U_r(i_r,:) = U_r(\bar{i}_r,:)$ when $\Pi_r(i_r,:) = \Pi_r(\bar{i}_r,:)$ for $i_r, \bar{i}_r \in [n_r]$.
  • $U_c = \Pi_c B_c$. Meanwhile, $U_c(i_c,:) = U_c(\bar{i}_c,:)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$, i.e., $U_c$ has $K_c$ distinct rows. Furthermore, when $K_r = K_c = K$, we have $\|B_c(k,:) - B_c(l,:)\|_F = \sqrt{\frac{1}{n_{c,k}} + \frac{1}{n_{c,l}}}$ for all $1 \le k < l \le K$.
Lemma 1 says that the rows of $U_r$ form a $K_r$-simplex in $\mathbb{R}^{K_r}$, which we call the Ideal Simplex (IS), with the $K_r$ rows of $B_r$ being the vertices. This IS is also found in [9,17,21]. Meanwhile, Lemma 1 says that $U_c$ has $K_c$ distinct rows, and if two column nodes $i_c$ and $\bar{i}_c$ are from the same column community, then $U_c(i_c,:) = U_c(\bar{i}_c,:)$.
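As an informal numerical illustration of Lemma 1 (not a substitute for its proof), the sketch below builds a small population matrix with $K_r = K_c = K$, computes its compact SVD with NumPy, and checks the simplex structure of $U_r$, the repeated rows of $U_c$, and the distance formula for the rows of $B_c$. The community sizes, the matrix `P`, and the Dirichlet memberships are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, K = 40, 3
sizes = np.array([5, 8, 12])                  # column community sizes n_{c,k}
labels_c = np.repeat(np.arange(K), sizes)     # 0-based column labels
Pi_c = np.eye(K)[labels_c]

Pi_r = rng.dirichlet(np.ones(K), size=n_r)
Pi_r[:K] = np.eye(K)                          # rows 0..K-1 are the pure row nodes

P = 0.6 * np.array([[1.0, 0.2, 0.1],
                    [0.2, 0.8, 0.3],
                    [0.1, 0.3, 0.9]])         # full rank, entries in [0, 1]
Omega = Pi_r @ P @ Pi_c.T

# Compact SVD: keep the K leading singular vectors.
U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_r, U_c = U[:, :K], Vt[:K].T

# U_r = Pi_r B_r with B_r the rows of U_r at the pure nodes.
B_r = U_r[:K]
simplex_ok = np.allclose(U_r, Pi_r @ B_r)

# U_c = Pi_c B_c: one distinct row per column community.
B_c = np.array([U_c[labels_c == k][0] for k in range(K)])
rows_ok = np.allclose(U_c, Pi_c @ B_c)

# Distance between two rows of B_c versus the closed form in Lemma 1.
dist = np.linalg.norm(B_c[0] - B_c[1])
expected = np.sqrt(1 / sizes[0] + 1 / sizes[1])
```

The checks do not depend on the sign or rotation ambiguity of the SVD, since both identities hold for any orthonormal basis of the relevant singular subspaces.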
Under ONM, to recover $\Pi_c$ from $U_c$, since $U_c$ has $K_c$ distinct rows, applying the k-means algorithm to all rows of $U_c$ returns the true column communities by Lemma 1. Since $U_c$ has $K_c$ distinct rows, we can set $\delta_c = \min_{k \neq l} \|B_c(k,:) - B_c(l,:)\|_F$ to measure the minimum center separation of $B_c$. By Lemma 1, $\delta_c \ge \sqrt{\frac{2}{n_{c,\max}}}$ when $K_r = K_c = K$ under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$. However, when $K_r < K_c$, it is a challenge to obtain a positive lower bound on $\delta_c$; see the proof of Lemma 1 for details.
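For the case $K_r = K_c = K$, the lower bound on the center separation follows directly from the distance formula in Lemma 1; the short derivation below spells out the step, using only the quantities defined above.

```latex
\delta_c^2
  = \min_{k \neq l} \|B_c(k,:) - B_c(l,:)\|_F^2
  = \min_{k \neq l} \left( \frac{1}{n_{c,k}} + \frac{1}{n_{c,l}} \right)
  \ge \frac{1}{n_{c,\max}} + \frac{1}{n_{c,\max}}
  = \frac{2}{n_{c,\max}},
\qquad\text{hence}\qquad
\delta_c \ge \sqrt{\frac{2}{n_{c,\max}}}.
```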
Under ONM, to recover $\Pi_r$ from $U_r$, since $B_r$ is full rank, if $U_r$ and $B_r$ are known in advance, we can exactly recover $\Pi_r$ by setting $\Pi_r = U_r B_r' (B_r B_r')^{-1}$ via Lemma 1. Set $Y_r = U_r B_r' (B_r B_r')^{-1}$. Since $Y_r \equiv \Pi_r$ and $\|\Pi_r(i_r,:)\|_1 = 1$ for $i_r \in [n_r]$, we have
$$\Pi_r(i_r,:) = \frac{Y_r(i_r,:)}{\|Y_r(i_r,:)\|_1}, \quad i_r \in [n_r].$$
Given $U_r$, since it has the IS structure $U_r = \Pi_r B_r \equiv \Pi_r U_r(\mathcal{I}_r,:)$, as long as we can obtain the row corner matrix $U_r(\mathcal{I}_r,:)$ (i.e., $B_r$), we can recover $\Pi_r$ exactly. As mentioned in [9,17,21], for such an ideal simplex, the successive projection (SP) algorithm [27] (for details of SP, see Algorithm A1) can be applied to $U_r$ with $K_r$ row communities to find $U_r(\mathcal{I}_r,:)$.
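The corner-finding step can be sketched as follows. This is a generic textbook form of successive projection (pick the row of largest norm, project it out, repeat), written under the assumption that it matches the spirit of the SP algorithm referenced above; the paper's Algorithm A1 is not reproduced here, and the function name and test data are hypothetical.

```python
import numpy as np

def successive_projection(X, K):
    """Return indices of K rows of X that approximate the simplex corners."""
    R = np.array(X, dtype=float)      # residual matrix (a copy of X)
    idx = []
    for _ in range(K):
        # The row with the largest Euclidean norm is a vertex of the simplex.
        j = int(np.argmax(np.sum(R ** 2, axis=1)))
        idx.append(j)
        # Remove the component of every row along the selected direction.
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)
    return idx

# Toy ideal simplex: rows are convex combinations of the rows of B.
rng = np.random.default_rng(2)
K = 3
B = rng.normal(size=(K, K))           # vertices (full rank with probability 1)
W = rng.dirichlet(np.ones(K), size=50)
W[:K] = np.eye(K)                     # rows 0..K-1 are the pure nodes
corners = successive_projection(W @ B, K)
```

In this noiseless toy case the returned indices are the pure rows, possibly in a different order, which is why the recovered memberships in the algorithm are only determined up to a permutation of the communities.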
Based on the above analysis, we are now ready to give the following algorithm, which we call Ideal ONA. Input: $\Omega$, $K_r$, and $K_c$ with $K_r \le K_c$. Output: $\Pi_r$ and $\ell$.
  • Let $\Omega = U_r \Lambda U_c'$ be the compact SVD of $\Omega$, such that $U_r \in \mathbb{R}^{n_r \times K_r}$, $U_c \in \mathbb{R}^{n_c \times K_r}$, $\Lambda \in \mathbb{R}^{K_r \times K_r}$, $U_r' U_r = I_{K_r}$, and $U_c' U_c = I_{K_r}$.
  • For the row nodes,
    - Run the SP algorithm on all rows of $U_r$, assuming there are $K_r$ row communities, to obtain $U_r(\mathcal{I}_r,:)$. Set $B_r = U_r(\mathcal{I}_r,:)$.
    - Set $Y_r = U_r B_r' (B_r B_r')^{-1}$. Recover $\Pi_r$ by setting $\Pi_r(i_r,:) = \frac{Y_r(i_r,:)}{\|Y_r(i_r,:)\|_1}$ for $i_r \in [n_r]$.
  • For the column nodes,
    - Run k-means on $U_c$ assuming that there are $K_c$ column communities, i.e., find the solution to the following optimization problem
    $$M^* = \arg\min_{M \in \mathcal{M}_{n_c,K_r,K_c}} \|M - U_c\|_F^2,$$
    where $\mathcal{M}_{n_c,K_r,K_c}$ denotes the set of $n_c \times K_r$ matrices with only $K_c$ different rows.
    - Use $M^*$ to obtain the labels vector $\ell$ of the column nodes. Note that since $M^*$ has $K_c$ distinct rows, two different column nodes $i_c, \bar{i}_c \in [n_c]$ are in the same column community if $M^*(i_c,:) = M^*(\bar{i}_c,:)$.
Following a proof similar to that of Theorem 1 of [21], the Ideal ONA exactly recovers the row node memberships and the column node labels, which in turn also verifies the identifiability of ONM. For convenience, we refer to the two steps for column nodes as "run k-means on $U_c$ assuming there are $K_c$ column communities to obtain $\ell$".
We now extend the ideal case to the real case. Let $\tilde{A} = \hat{U}_r \hat{\Lambda} \hat{U}_c'$ be the top-$K_r$-dimensional SVD of $A$, such that $\hat{U}_r \in \mathbb{R}^{n_r \times K_r}$, $\hat{U}_c \in \mathbb{R}^{n_c \times K_r}$, $\hat{\Lambda} \in \mathbb{R}^{K_r \times K_r}$, $\hat{U}_r' \hat{U}_r = I_{K_r}$, $\hat{U}_c' \hat{U}_c = I_{K_r}$, and $\hat{\Lambda}$ contains the top $K_r$ singular values of $A$. For the real case, we use $\hat{B}_r$, $\hat{B}_c$, $\hat{Y}_r$, $\hat{\Pi}_r$, $\hat{\Pi}_c$ given in Algorithm 1 to estimate $B_r$, $B_c$, $Y_r$, $\Pi_r$, $\Pi_c$, respectively. Algorithm 1, called the Overlapping and Non-overlapping algorithm (ONA for short), is a natural extension of the Ideal ONA to the real case. In ONA, we set the negative entries of $\hat{Y}_r$ to 0 by setting $\hat{Y}_r = \max(0, \hat{Y}_r)$, because the weights of any row node should be non-negative while $\hat{U}_r \hat{B}_r' (\hat{B}_r \hat{B}_r')^{-1}$ may have some negative entries. Note that if the column nodes of a directed network have the overlapping property while the row nodes do not, the transpose of the adjacency matrix should be used as input when applying our algorithm.
Algorithm 1 Overlapping and Non-overlapping Algorithm (ONA)
  • Require: The adjacency matrix $A \in \mathbb{R}^{n_r \times n_c}$ of a directed network, the number of row communities $K_r$, and the number of column communities $K_c$ with $K_r \le K_c$.
  • Ensure: The estimated $n_r \times K_r$ membership matrix $\hat{\Pi}_r$ for row nodes, and the estimated $n_c \times 1$ labels vector $\hat{\ell}$ for column nodes.
    1: Compute $\hat{U}_r \in \mathbb{R}^{n_r \times K_r}$ and $\hat{U}_c \in \mathbb{R}^{n_c \times K_r}$ from the top-$K_r$-dimensional SVD of $A$.
    2: For row nodes:
    • Apply the SP algorithm (i.e., Algorithm A1) to the rows of $\hat{U}_r$, assuming there are $K_r$ row clusters, to obtain the near-corners matrix $\hat{U}_r(\hat{\mathcal{I}}_r,:) \in \mathbb{R}^{K_r \times K_r}$, where $\hat{\mathcal{I}}_r$ is the index set returned by the SP algorithm. Set $\hat{B}_r = \hat{U}_r(\hat{\mathcal{I}}_r,:)$.
    • Compute the $n_r \times K_r$ matrix $\hat{Y}_r = \hat{U}_r \hat{B}_r' (\hat{B}_r \hat{B}_r')^{-1}$. Set $\hat{Y}_r = \max(0, \hat{Y}_r)$ and estimate $\Pi_r(i_r,:)$ by $\hat{\Pi}_r(i_r,:) = \hat{Y}_r(i_r,:) / \|\hat{Y}_r(i_r,:)\|_1$, $i_r \in [n_r]$.
    3: For column nodes: run k-means on $\hat{U}_c$ assuming there are $K_c$ column communities to obtain $\hat{\ell}$.
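The steps of Algorithm 1 can be sketched in NumPy as below. This is a minimal illustration under several assumptions: the SP routine is a generic successive projection rather than the paper's Algorithm A1, k-means is a plain Lloyd iteration with deterministic farthest-point seeding (any k-means implementation could be substituted), and the function names, the small floor on row sums, and the toy parameters are all hypothetical.

```python
import numpy as np

def successive_projection(X, K):
    """Generic SP: indices of K rows of X that approximate the corners."""
    R = np.array(X, dtype=float)
    idx = []
    for _ in range(K):
        j = int(np.argmax(np.sum(R ** 2, axis=1)))
        idx.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)
    return idx

def kmeans(X, K, n_iter=100):
    """Lloyd iterations with farthest-point seeding (deterministic)."""
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def ona(A, K_r, K_c):
    """Sketch of ONA: returns (estimated Pi_r, estimated column labels)."""
    # Step 1: top-K_r singular vectors of A.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    U_r, U_c = U[:, :K_r], Vt[:K_r].T
    # Step 2: corners via SP, then Y = U_r B_r' (B_r B_r')^{-1}.
    B_r = U_r[successive_projection(U_r, K_r)]
    Y = U_r @ B_r.T @ np.linalg.inv(B_r @ B_r.T)
    Y = np.maximum(Y, 0)                       # clip negative weights to 0
    # Assumption: floor the row sums to avoid division by zero on noisy input.
    row_sums = np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)
    # Step 3: k-means on the rows of U_c.
    return Y / row_sums, kmeans(U_c, K_c)

# Ideal-case check: feed the population matrix Omega instead of a sampled A.
rng = np.random.default_rng(4)
K_r, K_c = 2, 3
Pi_r = rng.dirichlet(np.ones(K_r), size=50)
Pi_r[:K_r] = np.eye(K_r)                       # one pure row node per community
labels_c = np.repeat(np.arange(K_c), [10, 15, 20])
P = 0.5 * np.array([[1.0, 0.2, 0.4],
                    [0.3, 0.9, 0.1]])
Omega = Pi_r @ P @ np.eye(K_c)[labels_c].T
Pi_r_hat, labels_hat = ona(Omega, K_r, K_c)
```

On the population matrix, the sketch behaves like the Ideal ONA: memberships and labels are recovered up to a relabeling of the communities. On a sampled $A$, the output is only an estimate.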

2.2. Main Results for ONA

In this section, we show the consistency of our algorithm for fitting ONM as the number of row nodes $n_r$ and the number of column nodes $n_c$ increase. Throughout this paper, $K_r$ and $K_c$ are two known integers. First, we assume that:
Assumption 1.
$\rho \max(n_r, n_c) \ge \log(n_r + n_c)$.
Assumption 1 controls the sparsity of the directed network considered in the theoretical study. Sparsity assumptions of this kind are common when establishing estimation consistency of spectral clustering methods in community detection; see [13,14,17,18,20,21]. In particular, when ONM reduces to SBM, the sparsity requirement in Assumption 1 is consistent with that of Theorem 3.1 in [13], which guarantees the theoretical optimality of the sparsity condition of this paper. To measure the performance of ONA on row node memberships, since row nodes have mixed memberships, we naturally use the $l_1$-norm difference between $\Pi_r$ and $\hat{\Pi}_r$. Since the column nodes are all pure nodes, we use the performance criterion defined in [15] to measure the estimation error of ONA on the column nodes. We introduce this measure of estimation error below.
Let $\mathcal{T}_c = \{\mathcal{T}_{c,1}, \mathcal{T}_{c,2}, \ldots, \mathcal{T}_{c,K_c}\}$ be the true partition of column nodes $\{1, 2, \ldots, n_c\}$ obtained from $\ell$, such that $\mathcal{T}_{c,k} = \{i_c \in [n_c] : \ell(i_c) = k\}$ for $k \in [K_c]$. Let $\hat{\mathcal{T}}_c = \{\hat{\mathcal{T}}_{c,1}, \hat{\mathcal{T}}_{c,2}, \ldots, \hat{\mathcal{T}}_{c,K_c}\}$ be the estimated partition of column nodes $\{1, 2, \ldots, n_c\}$ obtained from $\hat{\ell}$ of ONA, such that $\hat{\mathcal{T}}_{c,k} = \{i_c \in [n_c] : \hat{\ell}(i_c) = k\}$ for $k \in [K_c]$. The criterion is defined as
$$\hat{f}_c = \min_{\pi \in S_{K_c}} \max_{k \in [K_c]} \frac{|\mathcal{T}_{c,k} \cap \hat{\mathcal{T}}^c_{c,\pi(k)}| + |\mathcal{T}^c_{c,k} \cap \hat{\mathcal{T}}_{c,\pi(k)}|}{n_{c,k}},$$
where $S_{K_c}$ is the set of all permutations of $\{1, 2, \ldots, K_c\}$, and the superscript $c$ denotes the complementary set. As mentioned in [15], $\hat{f}_c$ measures the maximum proportion of column nodes in the symmetric difference of $\mathcal{T}_{c,k}$ and $\hat{\mathcal{T}}_{c,\pi(k)}$.
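The criterion $\hat{f}_c$ can be computed directly from its definition by enumerating permutations, as in the sketch below. The function name `column_error` and the 0-based labels are illustrative assumptions; the brute-force enumeration costs $K_c!$ evaluations and is only meant for small $K_c$.

```python
import numpy as np
from itertools import permutations

def column_error(true_labels, est_labels, K_c):
    """Brute-force evaluation of the criterion f_c (labels are 0-based)."""
    true_labels = np.asarray(true_labels)
    est_labels = np.asarray(est_labels)
    best = np.inf
    for perm in permutations(range(K_c)):
        worst = 0.0
        for k in range(K_c):
            in_true = true_labels == k            # indicator of T_{c,k}
            in_est = est_labels == perm[k]        # indicator of estimated T_{c,pi(k)}
            # Size of the symmetric difference of the two sets.
            sym_diff = np.sum(in_true != in_est)
            worst = max(worst, sym_diff / in_true.sum())
        best = min(best, worst)
    return best

truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]    # same partition, labels swapped
one_mistake = [0, 0, 1, 1, 1, 1]   # node 2 is placed in the wrong community
err_relabelled = column_error(truth, relabelled, 2)
err_one_mistake = column_error(truth, one_mistake, 2)
```

In this toy example a pure relabeling gives error 0, while a single misplaced node among two communities of size 3 gives error 1/3, since the misplaced node enters the symmetric difference of both communities.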
The next theorem gives the theoretical bounds on the estimations of memberships for both the row and column nodes, which is the main theoretical result for ONA.
Theorem 1.
Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, when Assumption 1 holds, suppose that $\sigma_{K_r}(\Omega) \ge C\sqrt{\rho (n_r + n_c) \log(n_r + n_c)}$. Then, with a probability of at least $1 - o((n_r + n_c)^{-\alpha})$ for any $\alpha > 0$,
  • For row nodes, there exists a permutation matrix $\mathcal{P}_r$ such that
    $$\max_{i_r \in [n_r]} \|e_{i_r}' (\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\varpi\, \kappa(\Pi_r' \Pi_r)\, K_r \sqrt{\lambda_1(\Pi_r' \Pi_r)}\right),$$
    where $\varpi = \|\hat{U}_r \hat{U}_r' - U_r U_r'\|_{2\to\infty}$ is the row-wise singular eigenvector error.
  • For column nodes, $\hat{f}_c = O\left(\frac{K_r K_c \max(n_r, n_c) \log(n_r + n_c)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, \sigma_{K_r}^2(\Pi_r)\, n_{c,K_r}\, n_{c,\min}}\right)$. In particular, when $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{K^2 \max(n_r, n_c)\, n_{c,\max} \log(n_r + n_c)}{\sigma_K^2(\tilde{P})\, \rho\, \sigma_K^2(\Pi_r)\, n_{c,\min}^2}\right).$$
Adding conditions similar to Corollary 3.1 in [17], we have the following corollary.
Corollary 1.
Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, suppose the conditions in Theorem 1 hold, and further suppose that $\lambda_{K_r}(\Pi_r' \Pi_r) = O(\frac{n_r}{K_r})$ and $n_{c,\min} = O(\frac{n_c}{K_c})$. Then, with a probability of at least $1 - o((n_r + n_c)^{-\alpha})$,
  • For row nodes, when $K_r = K_c = K$,
    $$\max_{i_r \in [n_r]} \|e_{i_r}' (\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{K^2 \left(C\sqrt{\frac{\max(n_r, n_c)}{\min(n_r, n_c)}} + \sqrt{\log(n_r + n_c)}\right)}{\sigma_K(\tilde{P}) \sqrt{\rho n_c}}\right).$$
  • For column nodes, $\hat{f}_c = O\left(\frac{K_r^2 K_c^3 \max(n_r, n_c) \log(n_r + n_c)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, n_r n_c^2}\right)$. When $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{K^4 \max(n_r, n_c) \log(n_r + n_c)}{\sigma_K^2(\tilde{P})\, \rho\, n_r n_c}\right).$$
In particular, when $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$, and $K_c = O(1)$,
  • For row nodes, when $K_r = K_c = K$,
    $$\max_{i_r \in [n_r]} \|e_{i_r}' (\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{\sqrt{\log(n)}}{\sigma_K(\tilde{P}) \sqrt{\rho n}}\right).$$
  • For column nodes, $\hat{f}_c = O\left(\frac{\log(n)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, n^2}\right)$. When $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{\log(n)}{\sigma_K^2(\tilde{P})\, \rho\, n}\right).$$
When $n_r = O(n)$, $n_c = O(n)$, and $K_r = K_c = K = O(1)$ in Corollary 1, the bounds for the row and column nodes are $O\left(\frac{1}{\sigma_K(\tilde{P})}\sqrt{\frac{\log(n)}{\rho n}}\right)$ and $O\left(\frac{1}{\sigma_K^2(\tilde{P})}\frac{\log(n)}{\rho n}\right)$, respectively, and we see that ONA yields stable and consistent community detection for both the row and column nodes, since the error rates go to zero as $n \to \infty$ when $\tilde{P}$ is fixed. In particular, for the row nodes with mixed memberships, when the DCMM proposed in [9] reduces to MMSB and $K = O(1)$, the error bound of Mixed-SCORE in Theorem 2.2 of [9] is also $O\left(\frac{1}{\sigma_K(\tilde{P})}\sqrt{\frac{\log(n)}{\rho n}}\right)$, which guarantees the theoretical optimality of our analysis for the row nodes. For the column nodes, when all column communities have similar sizes and $K = O(1)$, our bound $O\left(\frac{1}{\sigma_K^2(\tilde{P})}\frac{\log(n)}{\rho n}\right)$ matches Corollary 3.2 in [13] up to a logarithmic factor, which guarantees the theoretical optimality of our analysis for the column nodes. Furthermore, the optimality of our requirement on network sparsity and of the theoretical upper bounds on ONA's error rates is also supported by the separation condition and sharp threshold criterion developed in [28].

3. The Overlapping and Degree-Corrected Non-Overlapping Model

In this section, we propose an extension of ONM that considers degree heterogeneity, and we build theoretical guarantees for an algorithm fitting this model.
Let $\theta_c$ be an $n_c \times 1$ vector whose $i_c$-th entry is the degree heterogeneity of column node $i_c$, for $i_c \in [n_c]$. Let $\Theta_c$ be the $n_c \times n_c$ diagonal matrix whose $i_c$-th diagonal element is $\theta_c(i_c)$. For $i_r \in [n_r]$, $i_c \in [n_c]$, the extended model for generating $A$ is:
$$\Omega := \Pi_r P \Pi_c' \Theta_c, \quad A(i_r, i_c) \sim \mathrm{Bernoulli}(\Omega(i_r, i_c)). \tag{6}$$
Definition 2.
Call model (1)–(4), (6) the Overlapping and Degree-Corrected Non-overlapping model (ODCNM), and denote it by $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$.
Note that, under ODCNM, the maximum element of $P$ can be larger than 1, since $\max_{i_c \in [n_c]} \theta_c(i_c)$ also controls the sparsity of the directed network $\mathcal{N}$. The following proposition guarantees that ODCNM is identifiable in terms of $P$, $\Pi_r$, and $\Pi_c$; this identifiability is similar to that of DCSBM.
Proposition 2.
If conditions (I1) and (I2) hold, ODCNM is identifiable for the membership matrices: for eligible $(P, \Pi_r, \Pi_c, \Theta_c)$ and $(\check{P}, \check{\Pi}_r, \check{\Pi}_c, \check{\Theta}_c)$, if $\Pi_r P \Pi_c' \Theta_c = \check{\Pi}_r \check{P} \check{\Pi}_c' \check{\Theta}_c$, then $\Pi_r = \check{\Pi}_r$ and $\Pi_c = \check{\Pi}_c$.
Remark 2.
By setting $\theta_c(i_c) = \rho$ for $i_c \in [n_c]$, ODCNM reduces to ONM, which is why ODCNM can be seen as an extension of ONM. Meanwhile, though DCScBM [18] can model directed networks with degree heterogeneities for both row and column nodes, DCScBM does not allow the overlapping property for row nodes. By comparison, ODCNM allows row nodes to have the overlapping property at the cost of losing the degree heterogeneities of the row nodes and requiring $K_r \le K_c$ for model identifiability.

3.1. A Spectral Algorithm for Fitting ODCNM

We now discuss the intuition behind the design of our algorithm to fit ODCNM. Without causing confusion, we also use $U_r$, $U_c$, $B_r$, $B_c$, $\delta_c$, $Y_r$ under ODCNM. Let $U_{c,*} \in \mathbb{R}^{n_c \times K_r}$ be the row-normalized version of $U_c$, such that $U_{c,*}(i_c,:) = \frac{U_c(i_c,:)}{\|U_c(i_c,:)\|_F}$ for $i_c \in [n_c]$. Then, clustering the rows of $U_{c,*}$ using the k-means algorithm returns a perfect clustering of the column nodes, as guaranteed by the following lemma.
Lemma 2.
Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, there exist a unique $K_r \times K_r$ matrix $B_r$ and a unique $K_c \times K_r$ matrix $B_c$, such that
  • $U_r = \Pi_r B_r$, where $B_r = U_r(\mathcal{I}_r,:)$. Meanwhile, $U_r(i_r,:) = U_r(\bar{i}_r,:)$ when $\Pi_r(i_r,:) = \Pi_r(\bar{i}_r,:)$ for $i_r, \bar{i}_r \in [n_r]$.
  • $U_{c,*} = \Pi_c B_c$. Meanwhile, $U_{c,*}(i_c,:) = U_{c,*}(\bar{i}_c,:)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$. Furthermore, when $K_r = K_c = K$, we have $\|B_c(k,:) - B_c(l,:)\|_F = \sqrt{2}$ for all $1 \le k < l \le K$.
Recall that we set $\delta_c = \min_{k \neq l} \|B_c(k,:) - B_c(l,:)\|_F$. By Lemma 2, $\delta_c = \sqrt{2}$ when $K_r = K_c = K$ under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$. However, when $K_r < K_c$, it is a challenge to obtain a positive lower bound on $\delta_c$; see the proof of Lemma 2 for details.
Under ODCNM, to recover $\Pi_c$ from $U_c$, since $U_{c,*}$ has $K_c$ distinct rows, applying the k-means algorithm to all rows of $U_{c,*}$ returns the true column communities by Lemma 2. To recover $\Pi_r$ from $U_r$, the same idea as under ONM can be followed.
Based on the above analysis, we are now ready to present the following algorithm, which we call Ideal ODCNA. Input: $\Omega$, $K_r$, $K_c$ with $K_r \le K_c$. Output: $\Pi_r$ and $\ell$.
  • Let $\Omega = U_r \Lambda U_c'$ be the compact SVD of $\Omega$, such that $U_r \in \mathbb{R}^{n_r \times K_r}$, $U_c \in \mathbb{R}^{n_c \times K_r}$, $\Lambda \in \mathbb{R}^{K_r \times K_r}$, $U_r' U_r = I_{K_r}$, $U_c' U_c = I_{K_r}$. Let $U_{c,*}$ be the row-normalization of $U_c$.
  • For row nodes: the steps are the same as those of Ideal ONA.
  • For column nodes: run k-means on $U_{c,*}$ assuming there are $K_c$ column communities to obtain $\ell$.
We now extend the ideal case to the real case. Let $\hat{U}_{c,*} \in \mathbb{R}^{n_c \times K_r}$ be the row-normalized version of $\hat{U}_c$, such that $\hat{U}_{c,*}(i_c,:) = \frac{\hat{U}_c(i_c,:)}{\|\hat{U}_c(i_c,:)\|_F}$ for $i_c \in [n_c]$. The Overlapping and Degree-Corrected Non-overlapping Algorithm (ODCNA for short) is a natural extension of the Ideal ODCNA to the real case, where all steps of ODCNA are the same as those of ONA except for the steps for column nodes: ODCNA applies k-means to $\hat{U}_{c,*}$ to obtain $\hat{\ell}$.
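The effect of row normalization under ODCNM can be illustrated numerically. The sketch below builds a population matrix $\Omega = \Pi_r P \Pi_c' \Theta_c$ with $K_r = K_c = K$, row-normalizes $U_c$, and checks the two claims of Lemma 2 for column nodes: identical normalized rows within a community, and pairwise distance $\sqrt{2}$ between the rows of $B_c$. The community sizes, `P`, and the uniform draw for `theta_c` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_r, K = 30, 3
labels_c = np.repeat(np.arange(K), [6, 9, 15])   # 0-based column labels
n_c = labels_c.size
Pi_c = np.eye(K)[labels_c]

Pi_r = rng.dirichlet(np.ones(K), size=n_r)
Pi_r[:K] = np.eye(K)                             # one pure row node per community

P = np.array([[1.0, 0.2, 0.1],
              [0.2, 0.8, 0.3],
              [0.1, 0.3, 0.9]])
theta_c = rng.uniform(0.2, 0.9, size=n_c)        # column degree heterogeneities
Omega = Pi_r @ P @ Pi_c.T @ np.diag(theta_c)

U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_c = Vt[:K].T

# Row-normalize U_c: this removes the node-specific factor theta_c(i_c).
U_c_star = U_c / np.linalg.norm(U_c, axis=1, keepdims=True)

# One representative normalized row per column community.
B_c = np.array([U_c_star[labels_c == k][0] for k in range(K)])
same_rows = np.allclose(U_c_star, Pi_c @ B_c)

# Pairwise distances between the rows of B_c.
dists = [np.linalg.norm(B_c[k] - B_c[l])
         for k in range(K) for l in range(k + 1, K)]
```

Before normalization, rows of `U_c` within a community differ by the scalar factor `theta_c`, which is why ODCNA clusters the normalized rows instead of the raw ones.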

3.2. Main Results for ODCNA

Set $\theta_{c,\max} = \max_{i_c \in [n_c]} \theta_c(i_c)$, $\theta_{c,\min} = \min_{i_c \in [n_c]} \theta_c(i_c)$, and $P_{\max} = \max_{k \in [K_r], l \in [K_c]} P(k,l)$. Assume that
Assumption 2.
$P_{\max} \max(\theta_{c,\max} n_r, \|\theta_c\|_1) \ge \log(n_r + n_c)$.
The next theorem is the main theoretical result for ODCNA, where we also use the same measurements as ONA to measure the performances of ODCNA.
Theorem 2.
Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, when Assumption 2 holds, suppose $\sigma_{K_r}(\Omega) \ge C\sqrt{\theta_{c,\max} (n_r + n_c) \log(n_r + n_c)}$. Then, with a probability of at least $1 - o((n_r + n_c)^{-\alpha})$,
  • For the row nodes,
    $$\max_{i_r \in [n_r]} \|e_{i_r}' (\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\varpi\, \kappa(\Pi_r' \Pi_r)\, K_r \sqrt{\lambda_1(\Pi_r' \Pi_r)}\right).$$
  • For the column nodes,
    $$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2 K_r K_c \max(\theta_{c,\max} n_r, \|\theta_c\|_1)\, n_{c,\max} \log(n_r + n_c)}{\sigma_{K_r}^2(P)\, \theta_{c,\min}^4\, \delta_c^2\, m_{V_c}^2\, \sigma_{K_r}^2(\Pi_r)\, n_{c,K_r}\, n_{c,\min}}\right),$$
    where $m_{V_c}$ is a parameter defined in the proof of this theorem, and it is 1 when $K_r = K_c$. In particular, when $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2 K^2 \max(\theta_{c,\max} n_r, \|\theta_c\|_1)\, n_{c,\max} \log(n_r + n_c)}{\sigma_K^2(P)\, \theta_{c,\min}^4\, \sigma_K^2(\Pi_r)\, n_{c,\min}^2}\right).$$
Adding some conditions on model parameters, we have the following corollary.
Corollary 2.
Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, suppose that the conditions in Theorem 2 hold, and further suppose that $\lambda_{K_r}(\Pi_r'\Pi_r) = O(\frac{n_r}{K_r})$ and $n_{c,\min} = O(\frac{n_c}{K_c})$; then, with probability at least $1 - o((n_r+n_c)^{-\alpha})$,
  • For row nodes, when $K_r = K_c = K$,
    $$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{K^2 \sqrt{\theta_{c,\max}} \left(\sqrt{\frac{C \max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\theta_{c,\min}\, \sigma_K(P) \sqrt{n_c}}\right).$$
  • For column nodes, $\hat{f}_c = O\left(\frac{\theta_{c,\max}^2 K_r^2 K_c^2 \max(\theta_{c,\max} n_r, \|\theta_c\|_1) \log(n_r+n_c)}{\sigma_{K_r}^2(P)\, \theta_{c,\min}^4\, \delta_c^2\, m_{V_c}^2\, n_r n_c}\right)$. When $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2 K^4 \max(\theta_{c,\max} n_r, \|\theta_c\|_1) \log(n_r+n_c)}{\sigma_K^2(P)\, \theta_{c,\min}^4\, n_r n_c}\right).$$
In particular, when $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$,
  • For row nodes, when $K_r = K_c$,
    $$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{\sqrt{\theta_{c,\max} \log(n)}}{\theta_{c,\min}\, \sigma_K(P) \sqrt{n}}\right).$$
  • For column nodes, $\hat{f}_c = O\left(\frac{\theta_{c,\max}^2 \max(\theta_{c,\max} n_r, \|\theta_c\|_1) \log(n)}{\sigma_{K_r}^2(P)\, \theta_{c,\min}^4\, \delta_c^2\, m_{V_c}^2\, n^2}\right)$. When $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2 \max(\theta_{c,\max} n_r, \|\theta_c\|_1) \log(n)}{\sigma_K^2(P)\, \theta_{c,\min}^4\, n^2}\right).$$
If we further set $\theta_{c,\max} = O(\rho)$ and $\theta_{c,\min} = O(\rho)$, we have the corollary below.
Corollary 3.
Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, suppose that the conditions in Theorem 2 hold, and further suppose that $\lambda_{K_r}(\Pi_r'\Pi_r) = O(\frac{n_r}{K_r})$, $n_{c,\min} = O(\frac{n_c}{K_c})$, $\theta_{c,\max} = O(\rho)$ and $\theta_{c,\min} = O(\rho)$; then, with probability at least $1 - o((n_r+n_c)^{-\alpha})$,
  • For row nodes, when $K_r = K_c = K$,
    $$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{K^2 \left(\sqrt{\frac{C \max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\sigma_K(P) \sqrt{\rho n_c}}\right).$$
  • For column nodes, $\hat{f}_c = O\left(\frac{K_r^2 K_c^2 \max(n_r,n_c) \log(n_r+n_c)}{\sigma_{K_r}^2(P)\, \rho\, \delta_c^2\, m_{V_c}^2\, n_r n_c}\right)$. When $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{K^4 \max(n_r,n_c) \log(n_r+n_c)}{\sigma_K^2(P)\, \rho\, n_r n_c}\right).$$
In particular, when $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$,
  • For row nodes, when $K_r = K_c$,
    $$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{\sqrt{\log(n)}}{\sigma_K(P) \sqrt{\rho n}}\right).$$
  • For column nodes, $\hat{f}_c = O\left(\frac{\log(n)}{\sigma_{K_r}^2(P)\, \rho\, \delta_c^2\, m_{V_c}^2\, n}\right)$. When $K_r = K_c = K$,
    $$\hat{f}_c = O\left(\frac{\log(n)}{\sigma_K^2(P)\, \rho\, n}\right).$$
By setting $\Theta_c = \rho I$, ODCNM degenerates to ONM. By comparing Corollaries 1 and 3, we see that the theoretical results under ODCNM are consistent with those under ONM when ODCNM degenerates to ONM for the case where $K_r = K_c = K$.
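The reduction can be traced through the quantities that enter the bounds. The lines below are only a sketch: they assume the ODCNM expectation has the form $\Omega = \Pi_r P \Pi_c' \Theta_c$ used in Appendix C, and that the degree-dependent part of the Corollary 2 column bound is the ratio written on the left of the second line.

```latex
\Omega = \Pi_r P \Pi_c' \Theta_c = \rho\, \Pi_r P \Pi_c', \qquad
\theta_{c,\max} = \theta_{c,\min} = \rho, \qquad
\|\theta_c\|_1 = \rho n_c,
\\[4pt]
\frac{\theta_{c,\max}^2 \max(\theta_{c,\max} n_r, \|\theta_c\|_1)}{\theta_{c,\min}^4}
= \frac{\rho^2 \cdot \rho \max(n_r, n_c)}{\rho^4}
= \frac{\max(n_r, n_c)}{\rho}.
```

Under these assumptions, the degree-dependent factor collapses to $\max(n_r,n_c)/\rho$, which is the dependence on $\rho$ that appears in the column-node bounds of Corollary 3.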

4. Simulations

In this section, we present some simulations to investigate the performance of the two proposed algorithms. We measure their performance using the Mixed-Hamming error rate (MHamm for short) for the row nodes, and the Hamming error rate (Hamm for short) for the column nodes, defined below:
$$\mathrm{MHamm} = \frac{\min_{\pi \in S_{K_r}} \|\hat{\Pi}_r \pi - \Pi_r\|_1}{n_r}, \qquad \mathrm{Hamm} = \frac{\min_{\pi \in S_{K_c}} \|\hat{\Pi}_c \pi - \Pi_c\|_0}{n_c},$$
where $S_{K_r}$ is the set of all permutations of $\{1, 2, \ldots, K_r\}$, $S_{K_c}$ is the set of all permutations of $\{1, 2, \ldots, K_c\}$; $\hat{\Pi}_c \in \mathbb{R}^{n_c \times K_c}$ is defined as $\hat{\Pi}_c(i_c, k) = 1$ if $\hat{\ell}(i_c) = k$, and 0 otherwise, for $i_c \in [n_c]$, $k \in [K_c]$.
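The two error rates can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's evaluation code: $\|\cdot\|_1$ is read as the entrywise sum of absolute values, $\|\cdot\|_0$ as the number of nonzero entries, and $\pi$ acts by permuting columns; the function names are hypothetical.

```python
from itertools import permutations

def _best_over_column_perms(Pi_hat, Pi, entry_cost):
    """Minimum total entrywise cost over all column permutations of Pi_hat."""
    K = len(Pi[0])
    return min(
        sum(entry_cost(r_hat[p[k]], r[k])
            for r_hat, r in zip(Pi_hat, Pi) for k in range(K))
        for p in permutations(range(K))
    )

def mhamm(Pi_hat, Pi):
    # Assumption: ||.||_1 is the entrywise l1 norm of the difference.
    return _best_over_column_perms(Pi_hat, Pi, lambda a, b: abs(a - b)) / len(Pi)

def hamm(Pi_hat, Pi):
    # Assumption: ||.||_0 counts the nonzero entries of the difference,
    # so one mislabeled node contributes two differing entries.
    return _best_over_column_perms(Pi_hat, Pi, lambda a, b: int(a != b)) / len(Pi)
```

Enumerating all permutations costs $K!$ evaluations, which is harmless for the small $K_r$ and $K_c$ used in the simulations but would need a matching-based search for many communities.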
For all simulations in this section, the parameters $(n_r, n_c, K_r, K_c, P, \rho, \Pi_r, \Pi_c, \Theta_c)$ are set as below. Unless specified, set $n_r = 400$, $n_c = 300$, $K_r = 3$, $K_c = 4$. For the column nodes, generate $\Pi_c$ by assigning each column node to one of the column communities with equal probability. Let each row community have 100 pure nodes, and let all the mixed row nodes have memberships $(0.6, 0.3, 0.1)$. $P = \rho \tilde{P}$ is set independently under ONM and ODCNM. Under ONM, $\rho$ is 0.5 in Experiment 1, and we study the influence of $\rho$ in Experiment 2. Under ODCNM, for $z_c \ge 1$, we generate the degree parameters for the column nodes as follows: let $\theta_c \in \mathbb{R}^{n_c \times 1}$, such that $1/\theta_c(i_c) \overset{iid}{\sim} U(1, z_c)$ for $i_c \in [n_c]$, where $U(1, z_c)$ denotes the uniform distribution on $[1, z_c]$. We study the influences of $z_c$ and $\rho$ under ODCNM in Experiments 3 and 4, respectively. For all settings, we report the averaged MHamm and the averaged Hamm over 50 repetitions.
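The column-side parameter generation described above can be sketched as follows. The helper names and the random seed are hypothetical and chosen for illustration; only the sampling rules (uniform community assignment and $1/\theta_c(i_c) \sim U(1, z_c)$) come from the text.

```python
import random

def sample_theta(n_c, z_c, rng):
    """Draw theta_c(i) such that 1 / theta_c(i) ~ Uniform(1, z_c)."""
    return [1.0 / rng.uniform(1.0, z_c) for _ in range(n_c)]

def sample_labels(n_c, K_c, rng):
    """Assign each column node to one of K_c communities with equal probability."""
    return [rng.randrange(1, K_c + 1) for _ in range(n_c)]

rng = random.Random(0)           # hypothetical seed, for reproducibility only
theta = sample_theta(300, 3.0, rng)
labels = sample_labels(300, 4, rng)
```

Every sampled $\theta_c(i_c)$ lies in $[1/z_c, 1]$, so a larger $z_c$ allows smaller degree parameters and therefore sparser columns; $z_c = 1$ gives $\theta_c(i_c) = 1$ for all nodes.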
Experiment 1: Changing $n_c$ under ONM. Let $n_c$ range over $\{50, 100, 150, \ldots, 300\}$. For this experiment, $P$ is set as
$$P = \rho \begin{bmatrix} 1 & 0.3 & 0.2 & 0.3 \\ 0.2 & 0.9 & 0.1 & 0.2 \\ 0.3 & 0.2 & 0.8 & 0.3 \end{bmatrix}.$$
Let $\rho = 0.5$ for this experiment designed under ONM. The numerical results are shown in panels (a) and (b) of Figure 1. The results show that as $n_c$ increases, ONA and ODCNA perform better. For the row nodes, since both ONA and ODCNA apply the SP algorithm on $\hat{U}_r$ to estimate $\Pi_r$, the estimated row membership matrices of ONA and ODCNA are the same, and hence MHamm for ONA is always equal to that of ODCNA.
Experiment 2: Changing $\rho$ under ONM. $P$ is set the same as in Experiment 1, and we let $\rho$ range over $\{0.1, 0.2, \ldots, 1\}$ to study the influence of $\rho$ on the performance of ONA and ODCNA under ONM. The results are displayed in panels (c) and (d) of Figure 1. From the results, we can see that both methods perform better as $\rho$ increases, since a larger $\rho$ generates more edges in a directed network.
Experiment 3: Changing $z_c$ under ODCNM. $P$ is set to be the same as in Experiment 1, and $\rho = 0.5$. Let $z_c$ range over $\{1, 2, \ldots, 8\}$. Increasing $z_c$ decreases the number of edges generated under ODCNM. Panels (e) and (f) in Figure 1 display the simulation results of this experiment. The results show that, generally, increasing the variability of the node degrees makes it harder to detect the node memberships for both ONA and ODCNA. Although ODCNA is designed under ODCNM, it performs similarly to ONA in this experiment for directed networks in which the column nodes have various degrees, and this is consistent with our theoretical findings in Corollaries 1 and 2.
Experiment 4: Changing $\rho$ under ODCNM. Set $z_c = 3$, let $P$ be the same as in Experiment 1, and let $\rho$ range over $\{0.1, 0.2, \ldots, 1\}$ under ODCNM. Panels (g) and (h) in Figure 1 display the simulation results of this experiment. The performance of the two proposed methods is similar to that in Experiment 2.
Remark 3.
For visualization, we plot $A$ generated under ONM. Let $n_r = 24$, $n_c = 20$, $K_r = 2$, $K_c = 2$, and
$$P = \begin{bmatrix} 1 & 0.2 \\ 0.1 & 0.9 \end{bmatrix}.$$
For the row nodes, let $\Pi_r(i_r, 1) = 1$ for $1 \le i_r \le 8$, $\Pi_r(i_r, 2) = 1$ for $9 \le i_r \le 16$, and $\Pi_r(i_r, :) = [0.7\ \ 0.3]$ for $17 \le i_r \le 24$. For the column nodes, let $\ell(i_c) = 1$ for $1 \le i_c \le 10$, and $\ell(i_c) = 2$ for $11 \le i_c \le 20$. For the above setting, we generate two random adjacency matrices in Figure 2, where we also report the error rates of ONA and ODCNA. Note that, since the adjacency matrices are shown in Figure 2, and $\Pi_r$, $\ell$, $K_r$, and $K_c$ are known here, readers can apply ONA and ODCNA to $A$ in Figure 2 to check the effectiveness of ONA and ODCNA.
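The setting of Remark 3 can be reproduced as a short sketch. Two points are assumptions rather than statements from this section: the four entries of $P$ are read in row-major order, and each $A(i_r, i_c)$ is drawn as an independent Bernoulli variable with mean $\Omega(i_r, i_c) = \Pi_r(i_r,:) P(:, \ell(i_c))$. Labels are 0-based in the code, and the seed is arbitrary.

```python
import random

# Connectivity matrix of Remark 3 (assumed row-major reading).
P = [[1.0, 0.2],
     [0.1, 0.9]]
n_r, n_c = 24, 20

# Rows 1-8 pure in community 1, rows 9-16 pure in community 2,
# rows 17-24 mixed with memberships (0.7, 0.3).
Pi_r = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 8 + [[0.7, 0.3]] * 8
# Column labels, 0-based: columns 1-10 in community 1, 11-20 in community 2.
ell = [0] * 10 + [1] * 10

# Expected adjacency matrix: Omega(i, j) = Pi_r(i, :) P(:, ell(j)).
Omega = [[sum(Pi_r[i][k] * P[k][ell[j]] for k in range(2))
          for j in range(n_c)] for i in range(n_r)]

rng = random.Random(1)  # hypothetical seed
# Assumption: independent Bernoulli edges with mean Omega(i, j).
A = [[1 if rng.random() < Omega[i][j] else 0 for j in range(n_c)]
     for i in range(n_r)]
```

A different seed gives a different realization of $A$, which is why two adjacency matrices drawn from the same setting can yield different error rates.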
Remark 4.
For visualization, we also plot a directed network as well as its adjacency matrix generated under ONM. Let $n_r = 30$, $n_c = 30$, $K_r = 2$, $K_c = 3$, and
$$P = 0.9 \begin{bmatrix} 1 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0.1 \end{bmatrix}.$$
For row nodes, let $\Pi_r(i_r, 1) = 1$ for $1 \le i_r \le 10$, $\Pi_r(i_r, 2) = 1$ for $11 \le i_r \le 20$, and $\Pi_r(i_r, :) = [0.7\ \ 0.3]$ for $21 \le i_r \le 30$. For column nodes, let $\ell(i_c) = 1$ for $1 \le i_c \le 10$, $\ell(i_c) = 2$ for $11 \le i_c \le 20$, and $\ell(i_c) = 3$ for $21 \le i_c \le 30$. For the above setting, we generate one adjacency matrix in panel (a) of Figure 3, where we also report the error rates of ONA and ODCNA. Furthermore, panels (b) and (c) of Figure 3 show the sending pattern and receiving pattern sides of this simulated directed network, respectively.

5. Real Data Analysis

For real-world directed networks, since nodes always have various degrees, we only apply ODCNA to the real-world datasets in this section. For the real-world directed networks analyzed in this section, the row nodes are always the same as the column nodes, so we do not use the subscripts $r$ and $c$ to distinguish the row and column nodes here, and we let $n_r = n_c = n$. Meanwhile, we always set $K_r = K_c = K$ for real data, as in [18], since it is challenging to determine the number of row (column) communities for real-world directed networks without prior knowledge. When the row nodes are the same as the column nodes, $A(i,j) = 1$ means that a directed edge is sent from node $i$ to node $j$. Thus, every node has two patterns, the sending pattern and the receiving pattern. For the sending (receiving) pattern, we use the sending (receiving) cluster to denote the prior row (column) community, where we use the sending and receiving patterns to distinguish the two behaviors of each node, as was done in [18].
For $\hat{\Pi}_r$ obtained from ODCNA, we call node $i$ a highly mixed node if $0.8 \ge \max_{1 \le k \le K} \hat{\Pi}_r(i,k)$, where 0.8 is a threshold. Here, 0.8 is a moderate value to define highly mixed nodes, and we can also choose 0.9, 0.95, or some other value in $(0,1)$. However, we choose 0.8 as the threshold, because a larger threshold may be too loose, and a smaller threshold too restrictive, for defining highly mixed nodes. The definition of a highly mixed node is important, since it tells us whether a node belongs to only one community or to multiple communities. Let $\tau$ be the proportion of highly mixed row nodes among all nodes, to measure the mixability of a directed network, i.e., $\tau = \frac{|\{i \in [n] : i \text{ is a highly mixed node}\}|}{n}$. Meanwhile, we let $\hat{\ell}_r$ be an $n \times 1$ vector, such that $\hat{\ell}_r(i) = \mathrm{argmax}_{1 \le k \le K} \hat{\Pi}_r(i,k)$, where we use $\hat{\ell}_r(i)$ to denote the home base sending pattern cluster of node $i$. Set
$$\mathrm{Hamm}_{rc} = \frac{\min_{\pi \in S_K} \|\hat{\Pi}_c \pi - \hat{\tilde{\Pi}}_r\|_0}{n},$$
where $S_K$ is the set of all permutations of $\{1, 2, \ldots, K\}$; $\hat{\tilde{\Pi}}_r \in \mathbb{R}^{n \times K}$ is defined as $\hat{\tilde{\Pi}}_r(i,k) = 1$ if $\hat{\ell}_r(i) = k$, and 0 otherwise, for $i \in [n]$, $k \in [K]$. $\mathrm{Hamm}_{rc}$ is defined to measure the difference between the sending and receiving pattern clusters. After defining $\tau$ and $\mathrm{Hamm}_{rc}$, we see that a larger $\tau$ indicates a directed network in which a large proportion of nodes are highly mixed nodes in the sending pattern, and a larger $\mathrm{Hamm}_{rc}$ indicates that the sending pattern differs a lot from the receiving pattern. For $i \in [n]$, let $d_{sending}(i) = \sum_{j=1}^{n} A(i,j)$ denote the total number of edges sent by node $i$, and let $d_{receiving}(i) = \sum_{j=1}^{n} A(j,i)$ denote the total number of edges received by node $i$. Call $d_{sending}(i)$ and $d_{receiving}(i)$ the sending degree and receiving degree of node $i$, respectively. For real-world directed networks, we find that there are many nodes whose sending degree or receiving degree is zero, and so we need the following pre-processing steps before analyzing the real data:
(a) Set $S_{sending,0} = \{i \in [n] : d_{sending}(i) = 0\}$ and $S_{receiving,0} = \{i \in [n] : d_{receiving}(i) = 0\}$.
(b) Set $S_{degree,0} = S_{sending,0} \cup S_{receiving,0}$.
(c) Update $A$ by removing the nodes in $S_{degree,0}$.
(d) Repeat (a)–(c) until $S_{degree,0}$ is a null set.
(e) After obtaining $A$ through the above four steps, obtain the largest connected component of $A$.
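Steps (a) to (e) can be sketched on an edge-list representation. The function names are hypothetical, and one choice below is an assumption: the text does not say whether "connected" means weakly or strongly connected, so the sketch takes the largest weakly connected component (edge directions ignored).

```python
def prune_zero_degree(edges):
    """Steps (a)-(d): repeatedly drop nodes with zero sending or receiving degree."""
    edges = set(edges)
    while True:
        senders = {i for i, _ in edges}      # nodes with sending degree > 0
        receivers = {j for _, j in edges}    # nodes with receiving degree > 0
        keep = senders & receivers           # nodes outside S_degree,0
        if keep == senders | receivers:      # S_degree,0 is empty: stop
            return edges
        edges = {(i, j) for i, j in edges if i in keep and j in keep}

def largest_weak_component(edges):
    """Step (e), assuming weak connectivity (edge directions ignored)."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    best, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:                         # depth-first traversal
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {(i, j) for i, j in edges if i in best and j in best}

def preprocess(edges):
    return largest_weak_component(prune_zero_degree(edges))
```

The loop in `prune_zero_degree` is needed because removing one node can leave a neighbor with zero sending or receiving degree, which is why step (d) repeats steps (a) to (c).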
We now describe the real-world directed networks analyzed in this paper:
Metabolic: This is a directed network representing the metabolic reactions of E. coli bacteria. In these data, a node is a metabolite, and a directed edge from node $i$ to node $j$ means that there is a reaction where node $i$ is an input and node $j$ is a product [29]. These data can be downloaded from http://networksciencebook.com/translations/en/resources/data.html. The original data have 1039 nodes; after pre-processing, $A \in \{0,1\}^{893 \times 893}$. To estimate $K$, we plot the leading 20 singular values of $A$, and panel (a) of Figure 4 suggests that $K = 2$ for these data; [18] also applies the idea of an eigengap to estimate $K$ for real-world directed networks with an unknown number of communities.
Political blogs: This is a directed network of hyperlinks between weblogs on US politics [30], and it can be downloaded from http://www-personal.umich.edu/~mejn/netdata/. Political blogs send and receive hyperlinks to and from blogs of the same political persuasion [18], so a node is a blog and an edge is a hyperlink in these data. The original network has 1490 nodes. After removing nodes with zero degrees via the pre-processing steps, there are 814 nodes left; i.e., $A \in \{0,1\}^{813 \times 813}$ for these data. Since there are two parties, “liberal” and “conservative”, $K$ is 2 for both the sending and receiving pattern communities for these data. [18] applies their DI-SIM algorithm to the Political blogs network, assuming that all nodes have a non-overlapping property. In this paper, we apply our ODCNA algorithm to these data to study their asymmetric structure with respect to the overlapping property.
Wikipedia links (crh): This directed network represents the wikilinks of the Wikipedia website in the Crimean Turkish language (crh). A node is an article, and a directed edge between two articles is a wikilink [31]. These data can be downloaded from http://konect.cc/networks/wikipedia_link_crh/. The original data have 8286 nodes. After pre-processing, $A \in \{0,1\}^{3555 \times 3555}$. Panel (c) of Figure 4 suggests $K = 2$ for these data.
Wikipedia links (dv): These data represent the wikilinks of the Wikipedia website in the Divehi language (dv), where a node is an article and a directed edge is a wikilink [31]. These data can be downloaded from http://konect.cc/networks/wikipedia_link_dv/. The original data have 4266 nodes. After removing nodes with zero degrees, $A \in \{0,1\}^{2394 \times 2394}$. Panel (d) of Figure 4 suggests $K = 2$ for these data.
The proportion of highly mixed nodes and $\mathrm{Hamm}_{rc}$ for these directed networks are reported in Table 1, assuming that nodes in the sending (receiving) clusters have an overlapping (non-overlapping) property. For the Metabolic network, the results show that the sending pattern differs a lot from the receiving pattern, since $\mathrm{Hamm}_{rc} = 0.2497$ is quite large, and there are $893 \times 0.1209 \approx 108$ highly mixed nodes in the sending pattern. For the Political blogs network, there is a slight asymmetric structure between the sending pattern and the receiving pattern, since $\mathrm{Hamm}_{rc} = 0.0443$ is small. Meanwhile, for the sending pattern of Political blogs, there are $813 \times 0.0246 \approx 20$ highly mixed nodes. Thus, we may conclude that although the asymmetry between the sending and receiving patterns is slight for Political blogs, there are still 20 highly mixed nodes in the sending pattern. The Wikipedia links (crh) network has a slight asymmetric structure between the sending and receiving patterns, and there are $3555 \times 0.0444 \approx 158$ highly mixed nodes in the sending pattern. The Wikipedia links (dv) network has a large number of highly mixed nodes, given its large $\tau$, and a heavy asymmetric structure between the sending and receiving patterns, given its large $\mathrm{Hamm}_{rc}$. Generally, Table 1 suggests that if there are a large number of highly mixed nodes in the sending pattern, there is a heavy asymmetric structure between the sending and receiving patterns, and vice versa.
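How the counts above follow from an estimated membership matrix can be sketched as below. The function names and the small example matrix are hypothetical; the rule itself (largest estimated membership at most 0.8) is the one used in this section.

```python
def highly_mixed(Pi_hat, threshold=0.8):
    """Indices of nodes whose largest estimated membership is <= threshold."""
    return [i for i, row in enumerate(Pi_hat) if max(row) <= threshold]

def tau(Pi_hat, threshold=0.8):
    """Proportion of highly mixed nodes among all nodes."""
    return len(highly_mixed(Pi_hat, threshold)) / len(Pi_hat)

def home_base(Pi_hat):
    """0-based home base cluster: position of the largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in Pi_hat]

# Hypothetical estimated memberships for four nodes and K = 2.
Pi_hat = [[1.0, 0.0], [0.7, 0.3], [0.45, 0.55], [0.1, 0.9]]
mixed_nodes = highly_mixed(Pi_hat)   # nodes 1 and 2
proportion = tau(Pi_hat)             # 2 of 4 nodes
clusters = home_base(Pi_hat)         # [0, 0, 1, 1]
```

Multiplying $\tau$ by $n$ recovers the node counts quoted in the text, for example `round(893 * 0.1209)` for the Metabolic network.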
For visualization, we plot the sending clusters and receiving clusters detected by ODCNA for these directed networks when assuming that nodes in the sending (receiving) clusters have an overlapping (non-overlapping) property, i.e., when the input adjacency matrix of the ODCNA approach is A. The results are shown in Figure 5, Figure 6, Figure 7 and Figure 8, where we also show the highly mixed nodes in sending clusters detected by ODCNA. We see that there exists a clear asymmetric structure between the sending and receiving patterns for Metabolic and Wikipedia links (dv), as shown in Figure 5 and Figure 8, while there is a slight asymmetric structure between the sending and receiving patterns for Political blogs and Wikipedia links (crh), as shown in Figure 6 and Figure 7. Furthermore, most nodes are in the same sending (receiving) cluster for Metabolic and Wikipedia links (crh), while the two sending (receiving) clusters for Political blogs and Wikipedia links (crh) have close sizes. The results also show that most highly mixed nodes have many edges, while some highly mixed nodes have only a few edges, where such a phenomenon can be explained easily, since nodes with many edges tend to have an overlapping property, while it is difficult to detect a community for nodes with only a few edges, and ODCNA tends to treat such nodes as highly mixed nodes.
Furthermore, for real-world directed networks, since we have no prior knowledge of whether nodes on the sending pattern side, the receiving pattern side, or both sides have an overlapping property, simply inputting $A$ with $K$ sending (receiving) pattern communities into our ODCNA algorithm is not enough. To address this, we also apply ODCNA to $A'$, and the numerical results are provided in Table 2. The results show that there also exist highly mixed nodes in the receiving pattern for these directed networks, that there is again a heavy asymmetric structure between the sending and receiving clusters for Metabolic and Wikipedia links (dv), and that there is again a slight asymmetric structure between the sending and receiving clusters for Political blogs and Wikipedia links (crh).

6. Discussion

In this paper, we introduced the Overlapping and Non-overlapping model and its extension that accounts for degree heterogeneity. The models can describe a directed network with $K_r$ row communities and $K_c$ column communities, in which a row node can belong to multiple sending clusters, while a column node belongs to only one of the receiving clusters. The proposed models are identifiable when $K_r \le K_c$, under some other popular constraints on the connectivity matrix and membership matrices. For comparison, modeling a directed network in which the row nodes have an overlapping property while the column nodes do not, with $K_r > K_c$, is unidentifiable. Meanwhile, since previous works have found that modeling directed networks in which both row and column nodes have an overlapping property with $K_r \ne K_c$ is unidentifiable, our identifiable ONM and ODCNM fill a gap in modeling overlapping directed networks when $K_r \ne K_c$. These models provide exploratory tools for studying community structure in directed networks in which one side is overlapping while the other side is non-overlapping. Two spectral algorithms are designed to fit ONM and ODCNM. We also showed estimation consistency under mild conditions for our methods. In particular, when ODCNM reduces to ONM, our theoretical results under ODCNM are consistent with those under ONM. The numerical results for the simulated directed networks generated under ONM and ODCNM support our theoretical results, and the results for real-world directed networks reveal the existence of highly mixed nodes and an asymmetric structure between the sending and receiving clusters.
The models and algorithms introduced in this paper are useful tools for studying the asymmetric structure of directed networks, and we hope that they can be widely applied in network science. However, perhaps the main limitation of the models is that $K_r$ and $K_c$ are assumed to be known, and this limitation also holds for the spectral clustering algorithms developed under the ScBM and DCScBM studied in [18,19,20]. In most community detection problems, the number of row communities and the number of column communities are unknown; therefore, a complete computational and theoretical study requires not only the algorithms and their consistency guarantees described in this paper, but also a method for estimating $K_r$ and $K_c$. A possible solution to this problem may be a combination of the algorithms developed in this paper and the modularity for directed networks developed in [32]. Meanwhile, our idea can be extended in many ways. In this paper, we only consider modeling an unweighted directed network, and it is possible to extend our work to weighted directed networks. Our algorithms are designed based on the adjacency matrix, and it is possible to design spectral algorithms to fit ONM and ODCNM by applying the regularized Laplacian matrix used in [11,12]. For large-scale directed networks, the random projection-based and random sampling-based spectral clustering ideas in [33] may be applied to accelerate our algorithms. We leave the study of these problems to future work.

Funding

This research was funded by the scientific research start-up fund of China University of Mining and Technology, NO. 102520253, and the high-level personal project of Jiangsu Province, NO. JSSCBS20211218.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available within the article.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SBM: Stochastic Blockmodel
DCSBM: Degree-Corrected Stochastic Blockmodel
MMSB: Mixed Membership Stochastic Blockmodel
DCMM: Degree-Corrected Mixed Membership model
OCCAM: Overlapping Continuous Community Assignment model
ScBM: Stochastic co-Blockmodel
DCScBM: Degree-Corrected Stochastic co-Blockmodel
DiMMSB: Directed Mixed Membership Stochastic Blockmodel
ONM: Overlapping and Non-overlapping model
ODCNM: Overlapping and Degree-Corrected Non-overlapping model
SP: Successive Projection algorithm
ONA: Overlapping and Non-overlapping Algorithm
ODCNA: Overlapping and Degree-Corrected Non-overlapping Algorithm

Appendix A. Successive Projection Algorithm

Algorithm A1 is the Successive Projection algorithm.
Algorithm A1 Successive Projection (SP) [27]
Require: Near-separable matrix $Y_{sp} = S_{sp} M_{sp} + Z_{sp} \in \mathbb{R}_+^{m \times n}$, where $S_{sp}$, $M_{sp}$ should satisfy Assumption 1 [27], and the number $r$ of columns to be extracted.
Ensure: Set of indices $\mathcal{K}$ such that $Y(\mathcal{K}, :) \approx S$ (up to permutation)
1: Compute $\hat{U}_r \in \mathbb{R}^{n_r \times K_r}$ and $\hat{U}_c \in \mathbb{R}^{n_c \times K_r}$ from the top-$K_r$-dimensional SVD of $A$.
2: Let $R = Y_{sp}$, $\mathcal{K} = \{\}$, $k = 1$.
3: While $R \ne 0$ and $k \le r$ do
4:     $k_* = \mathrm{argmax}_k \|R(k, :)\|_F$.
5:     $u_k = R(k_*, :)$.
6:     $R \leftarrow \left(I - \frac{u_k u_k'}{\|u_k\|_F^2}\right) R$.
7:     $\mathcal{K} = \mathcal{K} \cup \{k_*\}$.
8:     $k = k + 1$.
9: end while
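The loop in steps 2 to 9 can be sketched in plain Python. This is a row-oriented reading and an assumption about orientation: the selected row is treated as a direction, and every row of the residual is projected onto its orthogonal complement. Step 1 only prepares the input and is not part of the sketch; the function name and the toy matrix are hypothetical.

```python
import math

def successive_projection(Y, r, tol=1e-12):
    """Return indices of r rows of Y selected by successive projection."""
    R = [list(row) for row in Y]          # residual matrix, copied from Y
    selected = []
    for _ in range(r):
        norms = [math.sqrt(sum(x * x for x in row)) for row in R]
        k_star = max(range(len(R)), key=norms.__getitem__)
        if norms[k_star] <= tol:          # residual is (numerically) zero
            break
        u = R[k_star][:]                  # copy: R is modified in place below
        uu = sum(x * x for x in u)
        for row in R:
            # Remove from each row its component along u.
            coef = sum(a * b for a, b in zip(row, u)) / uu
            for j in range(len(row)):
                row[j] -= coef * u[j]
        selected.append(k_star)
    return selected

# Hypothetical toy input: rows 0 and 1 are the "pure" rows (vertices);
# rows 2 and 3 are convex combinations of them.
Y = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.5], [1.4, 0.3]]
corners = successive_projection(Y, 2)
```

On this noiseless input the rows with the largest norms are the vertices, so the procedure returns the indices of the two pure rows; with noise, the guarantee depends on the separability conditions cited in the algorithm statement.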

Appendix B. Proofs under ONM

Appendix B.1. Proof of Proposition 1

Proof. 
By Lemma 1, let $U_r \Lambda U_c'$ be the compact SVD of $\Omega$, such that $\Omega = U_r \Lambda U_c'$; since $\Omega = \Pi_r P \Pi_c' = \check{\Pi}_r \check{P} \check{\Pi}_c'$, we have $\Omega(\mathcal{I}_r, \mathcal{I}_c) = P = \check{P}$, which gives $P = \check{P}$. By Lemma 1, since $U_r = \Pi_r U_r(\mathcal{I}_r, :) = \check{\Pi}_r U_r(\mathcal{I}_r, :)$, we have $\Pi_r = \check{\Pi}_r$, where we have used the fact that the inverse of $U_r(\mathcal{I}_r, :)$ exists. Since $\Omega = \Pi_r P \Pi_c' = \check{\Pi}_r \check{P} \check{\Pi}_c' = \Pi_r P \check{\Pi}_c'$, we have $\Pi_r P \Pi_c' = \Pi_r P \check{\Pi}_c'$. By Lemma 7 of [21], we have $P \Pi_c' = P \check{\Pi}_c'$, i.e., $\Pi_c X = \check{\Pi}_c X$, where we set $X = P' \in \mathbb{R}^{K_c \times K_r}$. Let $\check{\ell}$ be the $n_c \times 1$ vector of column node labels obtained from $\check{\Pi}_c$. For $i_c \in [n_c]$, $k \in [K_r]$, from $\Pi_c X = \check{\Pi}_c X$, we have $(\Pi_c X)(i_c, k) = \Pi_c(i_c, :) X(:, k) = X(\ell(i_c), k) = X(\check{\ell}(i_c), k)$, which means that we must have $\ell(i_c) = \check{\ell}(i_c)$ for all $i_c \in [n_c]$, i.e., $\ell = \check{\ell}$ and $\Pi_c = \check{\Pi}_c$. Note that for the special case $K_r = K_c = K$, $\Pi_c = \check{\Pi}_c$ can be obtained easily: since $P \Pi_c' = P \check{\Pi}_c'$ and $P \in \mathbb{R}^{K \times K}$ is assumed to be full rank, we have $\Pi_c = \check{\Pi}_c$. Thus, the proposition holds. □

Appendix B.2. Proof of Lemma 1

Proof. 
For $U_r$, since $\Omega = U_r \Lambda U_c'$ and $U_c' U_c = I_{K_r}$, we have $U_r = \Omega U_c \Lambda^{-1}$. Recall that $\Omega = \Pi_r P \Pi_c'$; we have $U_r = \Pi_r P \Pi_c' U_c \Lambda^{-1} = \Pi_r B_r$, where we set $B_r = P \Pi_c' U_c \Lambda^{-1}$. Since $U_r(\mathcal{I}_r, :) = \Pi_r(\mathcal{I}_r, :) B_r = B_r$, we have $B_r = U_r(\mathcal{I}_r, :)$. For $i_r \in [n_r]$, $U_r(i_r, :) = e_{i_r}' \Pi_r B_r = \Pi_r(i_r, :) B_r$, so we have $U_r(i_r, :) = U_r(\bar{i}_r, :)$ when $\Pi_r(i_r, :) = \Pi_r(\bar{i}_r, :)$.
For $U_c$, following a similar analysis as for $U_r$, we have $U_c = \Pi_c B_c$, where $B_c = P' \Pi_r' U_r \Lambda^{-1}$. Note that $B_c \in \mathbb{R}^{K_c \times K_r}$. Clearly, $U_c(i_c, :) = U_c(\bar{i}_c, :)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$.
Now, we focus on the case where $K_r = K_c = K$. For this case, since $B_c \in \mathbb{R}^{K_c \times K_r}$, $B_c$ is full rank when $K_r = K_c$. Since $I_{K_r} = I_K = U_c' U_c = B_c' \Pi_c' \Pi_c B_c$, we have $\Pi_c' \Pi_c = (B_c B_c')^{-1}$. Since $\Pi_c' \Pi_c = \mathrm{diag}(n_{c,1}, n_{c,2}, \ldots, n_{c,K})$, we have $B_c B_c' = \mathrm{diag}(\frac{1}{n_{c,1}}, \frac{1}{n_{c,2}}, \ldots, \frac{1}{n_{c,K}})$. When $K_r = K_c = K$, we have $B_c(k, :) B_c(l, :)' = 0$ for any $k \ne l$ and $k, l \in [K]$. Then, we have $B_c B_c' = \mathrm{diag}(\|B_c(1, :)\|_F^2, \|B_c(2, :)\|_F^2, \ldots, \|B_c(K, :)\|_F^2) = \mathrm{diag}(\frac{1}{n_{c,1}}, \frac{1}{n_{c,2}}, \ldots, \frac{1}{n_{c,K}})$, and the lemma follows.
Note that when $K_r < K_c$, since $B_c$ is not full rank, we cannot obtain $\Pi_c' \Pi_c = (B_c B_c')^{-1}$ from $I_{K_r} = B_c' \Pi_c' \Pi_c B_c$. Therefore, when $K_r < K_c$, the equality $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{\frac{1}{n_{c,k}} + \frac{1}{n_{c,l}}}$ does not hold for any $k \ne l$. Additionally, we only know that $U_c$ has $K_c$ distinct rows when $K_r < K_c$, but have no knowledge about the minimum distance between any two distinct rows of $U_c$. □
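For the case $K_r = K_c = K$, the separation between two distinct rows of $B_c$ follows from the diagonal form of $B_c B_c'$ in one line. The expansion below is a worked step using only the two facts established in the proof: $\|B_c(k,:)\|_F^2 = 1/n_{c,k}$ and $B_c(k,:) B_c(l,:)' = 0$ for $k \ne l$.

```latex
\|B_c(k,:) - B_c(l,:)\|_F^2
= \|B_c(k,:)\|_F^2 + \|B_c(l,:)\|_F^2 - 2\, B_c(k,:) B_c(l,:)'
= \frac{1}{n_{c,k}} + \frac{1}{n_{c,l}} .
```

Taking square roots gives $\|B_c(k,:) - B_c(l,:)\|_F = \sqrt{1/n_{c,k} + 1/n_{c,l}}$, which is the equality that fails when $K_r < K_c$ because the cross term is no longer known to vanish.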

Appendix B.3. Proof of Theorem 1

Proof. 
First, by Lemma 4 of [21], we have the below lemma.
Lemma A1.
(Row-wise singular eigenvector error) Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, when Assumption 1 holds, suppose $\sigma_{K_r}(\Omega) \ge C\sqrt{\rho (n_r+n_c) \log(n_r+n_c)}$; then, with probability at least $1 - o((n_r+n_c)^{-\alpha})$,
$$\|\hat{U}_r \hat{U}_r' - U_r U_r'\|_{2 \to \infty} = O\left(\frac{\sqrt{K_r}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\sqrt{\rho}\, \sigma_{K_r}(\tilde{P})\, \sigma_{K_r}(\Pi_r) \sqrt{n_{c,K_r}}}\right),$$
where $\mu$ is the incoherence parameter defined as $\mu = \max\left(\frac{n_r \|U_r\|_{2 \to \infty}^2}{K_r}, \frac{n_c \|U_c\|_{2 \to \infty}^2}{K_r}\right)$.
For the row nodes, when the conditions in Lemma A1 hold, by Theorem 2 of [21], with probability at least $1 - o((n_r+n_c)^{-\alpha})$ for any $\alpha > 0$, there exists a permutation matrix $\mathcal{P}_r$ such that, for $i_r \in [n_r]$, we have
$$\|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\varpi\, \kappa(\Pi_r'\Pi_r) K_r \sqrt{\lambda_1(\Pi_r'\Pi_r)}\right).$$
Next, we focus on the column nodes. By the proof of Lemma 3 in [19], there exists an orthogonal matrix $\hat{O}$ such that
$$\|\hat{U}_c \hat{O} - U_c\|_F \le \frac{2\sqrt{2K_r}\, \|A - \Omega\|}{\sqrt{\lambda_{K_r}(\Omega'\Omega)}}. \tag{A1}$$
Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, by Lemma 10 of [21], we have
$$\sqrt{\lambda_{K_r}(\Omega'\Omega)} \ge \rho\, \sigma_{K_r}(\tilde{P})\, \sigma_{K_r}(\Pi_r)\, \sigma_{K_r}(\Pi_c). \tag{A2}$$
Since all column nodes are pure, $\sigma_{K_r}(\Pi_c) = \sqrt{n_{c,K_r}}$. By Lemma 3 of [21], when Assumption 1 holds, with probability at least $1 - o((n_r+n_c)^{-\alpha})$, we have
$$\|A - \Omega\| = O\left(\sqrt{\rho \max(n_r,n_c) \log(n_r+n_c)}\right). \tag{A3}$$
Substituting the two bounds in Equations (A2) and (A3) into Equation (A1), we have
$$\|\hat{U}_c \hat{O} - U_c\|_F \le C \frac{\sqrt{K_r \max(n_r,n_c) \log(n_r+n_c)}}{\sigma_{K_r}(\tilde{P}) \sqrt{\rho}\, \sigma_{K_r}(\Pi_r) \sqrt{n_{c,K_r}}}. \tag{A4}$$
Let $\varsigma > 0$ be a small quantity; by Lemma 2 in [15], if
$$\frac{\sqrt{K_c}}{\varsigma} \|U_c - \hat{U}_c \hat{O}\|_F \left(\frac{1}{\sqrt{n_{c,k}}} + \frac{1}{\sqrt{n_{c,l}}}\right) \le \|B_c(k,:) - B_c(l,:)\|_F, \quad \text{for each } 1 \le k \ne l \le K_c, \tag{A5}$$
then the clustering error $\hat{f}_c = O(\varsigma^2)$. Recall that we set $\delta_c = \min_{k \ne l} \|B_c(k,:) - B_c(l,:)\|_F$ to measure the minimum center separation of $B_c$. Setting $\varsigma = \frac{2}{\delta_c} \sqrt{\frac{K_c}{n_{c,\min}}} \|U_c - \hat{U}_c \hat{O}\|_F$ makes Equation (A5) hold for all $1 \le k \ne l \le K_c$. Then, we have $\hat{f}_c = O(\varsigma^2) = O\left(\frac{K_c \|U_c - \hat{U}_c \hat{O}\|_F^2}{\delta_c^2 n_{c,\min}}\right)$. By Equation (A4), we have
$$\hat{f}_c = O\left(\frac{K_r K_c \max(n_r,n_c) \log(n_r+n_c)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, \sigma_{K_r}^2(\Pi_r)\, n_{c,K_r}\, n_{c,\min}}\right).$$
In particular, when $K_r = K_c = K$, $\delta_c \ge \sqrt{\frac{2}{n_{c,\max}}}$ under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$ by Lemma 1. When $K_r = K_c = K$, we have
$$\hat{f}_c = O\left(\frac{K^2 \max(n_r,n_c)\, n_{c,\max} \log(n_r+n_c)}{\sigma_K^2(\tilde{P})\, \rho\, \sigma_K^2(\Pi_r)\, n_{c,\min}^2}\right).$$

Appendix B.4. Proof of Corollary 1

Proof. 
For the row nodes, under the conditions of Corollary 1, we have
$$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\varpi K_r \sqrt{\frac{n_r}{K_r}}\right) = O\left(\varpi \sqrt{K n_r}\right).$$
Under the conditions of Corollary 1, $\kappa(\Omega) = O(1)$ and $\mu \le C$ for some $C > 0$ by the proof of Corollary 1 [21]. Then, by Lemma A1, we have
$$\varpi = O\left(\frac{\sqrt{K}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\sqrt{\rho}\, \sigma_K(\tilde{P})\, \sigma_K(\Pi_r) \sqrt{n_{c,K_r}}}\right) = O\left(\frac{\sqrt{K}\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\sqrt{\rho}\, \sigma_K(\tilde{P})\, \sigma_K(\Pi_r) \sqrt{n_{c,\min}}}\right) = O\left(\frac{K^{1.5}\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\sigma_K(\tilde{P}) \sqrt{\rho\, n_r n_c}}\right),$$
which gives that
$$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r \mathcal{P}_r)\|_1 = O\left(\frac{K^2\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\sigma_K(\tilde{P}) \sqrt{\rho\, n_c}}\right).$$
Note that, when $K_r < K_c$, we cannot conclude that $\mu \le C$. This is because, when $K_r < K_c$, the inverse of $B_c B_c'$ does not exist, since $B_c \in \mathbb{R}^{K_c \times K_r}$. Therefore, Lemma 8 of [21] does not hold, and we cannot obtain an upper bound on $\|U_c\|_{2 \to \infty}$, which makes it impossible to bound $\mu$; this is why we only consider the case $K_r = K_c$ for the row nodes here.
For the column nodes, under the conditions of Corollary 1, we have
$$\hat{f}_c = O\left(\frac{K_r K_c \max(n_r,n_c) \log(n_r+n_c)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, \sigma_{K_r}^2(\Pi_r)\, n_{c,K_r}\, n_{c,\min}}\right) = O\left(\frac{K_r K_c \max(n_r,n_c) \log(n_r+n_c)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, (n_r/K_r)(n_c/K_c)(n_c/K_c)}\right) = O\left(\frac{K_r^2 K_c^3 \max(n_r,n_c) \log(n_r+n_c)}{\sigma_{K_r}^2(\tilde{P})\, \rho\, \delta_c^2\, n_r n_c^2}\right).$$
For the special case $K_r = K_c = K$, since $\frac{n_{c,\max}}{n_{c,\min}} = O(1)$ when $n_{c,\min} = O(\frac{n_c}{K})$, we have
$$\hat{f}_c = O\left(\frac{K^4 \max(n_r,n_c) \log(n_r+n_c)}{\sigma_K^2(\tilde{P})\, \rho\, n_r n_c}\right).$$
When $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$, and $K_c = O(1)$, the corollary follows immediately by basic algebra. □

Appendix C. Proofs under ODCNM

Appendix C.1. Proof of Proposition 2

Proof. 
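Since $\Omega = \Pi_r P \Pi_c' \Theta_c = \check{\Pi}_r \check{P} \check{\Pi}_c' \check{\Theta}_c = U_r \Lambda U_c'$, we have $U_r = \Pi_r U_r(\mathcal{I}_r, :) = \check{\Pi}_r U_r(\mathcal{I}_r, :)$ by Lemma 2, which gives that $\Pi_r = \check{\Pi}_r$. Since $U_{c,*} = \Pi_c B_c = \Pi_c U_{c,*}(\mathcal{I}_c, :) = \check{\Pi}_c U_{c,*}(\mathcal{I}_c, :)$ by Lemma 2, we have $\Pi_c = \check{\Pi}_c$. □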

Appendix C.2. Proof of Lemma 2

Proof. 
 
  • For U r : since Ω = U r Λ U c and U c U c = I K r , we have U r = Ω U c Λ 1 . Recall that Ω = Π r P Π c Θ c under ODCNM; we have U r = Π r P Π c Θ c U c Λ 1 = Π r B r , where B r = P Π c Θ c U c Λ 1 . Sure, U r ( i r , : ) = U r ( i ¯ r , : ) holds when Π r ( i r , : ) = Π r ( i ¯ r , : ) for i r , i ¯ r [ n r ] .
  • For $U_c$: let $D_c$ be a $K_c \times K_c$ diagonal matrix such that $D_c(k,k) = \frac{\|\Theta_c \Pi_c(:,k)\|_F}{\|\theta_c\|_F}$ for $k \in [K_c]$. Let $\Gamma_c$ be an $n_c \times K_c$ matrix such that $\Gamma_c(:,k) = \frac{\Theta_c \Pi_c(:,k)}{\|\Theta_c \Pi_c(:,k)\|_F}$ for $k \in [K_c]$. For such $D_c$ and $\Gamma_c$, we have $\Gamma_c' \Gamma_c = I_{K_c}$ and $\Omega = \Pi_r P \|\theta_c\|_F D_c \Gamma_c'$, i.e., $\Theta_c \Pi_c = \|\theta_c\|_F \Gamma_c D_c$.
    Since $\Omega = U_r \Lambda U_c'$ and $U_r' U_r = I_{K_r}$, we have $U_c = \Theta_c \Pi_c P' \Pi_r' U_r \Lambda^{-1}$. Since $\Theta_c \Pi_c = \|\theta_c\|_F \Gamma_c D_c$, we have $U_c = \Gamma_c \|\theta_c\|_F D_c P' \Pi_r' U_r \Lambda^{-1} = \Gamma_c V_c$, where we set
    $V_c = \|\theta_c\|_F D_c P' \Pi_r' U_r \Lambda^{-1} \in \mathbb{R}^{K_c \times K_r}$. Note that since $U_c' U_c = I_{K_r} = V_c' \Gamma_c' \Gamma_c V_c = V_c' V_c$, we have $V_c' V_c = I_{K_r}$. Now, for $i_c \in [n_c]$, $k \in [K_r]$, we have
    $$U_c(i_c,k) = e_{i_c}' U_c e_k = e_{i_c}' \Gamma_c V_c e_k = \Gamma_c(i_c,:) V_c e_k = \theta_c(i_c) \left[ \frac{\Pi_c(i_c,1)}{\|\Theta_c \Pi_c(:,1)\|_F} \;\; \frac{\Pi_c(i_c,2)}{\|\Theta_c \Pi_c(:,2)\|_F} \;\; \cdots \;\; \frac{\Pi_c(i_c,K_c)}{\|\Theta_c \Pi_c(:,K_c)\|_F} \right] V_c e_k = \frac{\theta_c(i_c)}{\|\Theta_c \Pi_c(:,\ell(i_c))\|_F} V_c(\ell(i_c),k),$$
    which gives that
    $$U_c(i_c,:) = \frac{\theta_c(i_c)}{\|\Theta_c \Pi_c(:,\ell(i_c))\|_F} \left[ V_c(\ell(i_c),1) \;\; V_c(\ell(i_c),2) \;\; \cdots \;\; V_c(\ell(i_c),K_r) \right] = \frac{\theta_c(i_c)}{\|\Theta_c \Pi_c(:,\ell(i_c))\|_F} V_c(\ell(i_c),:).$$
    Then, we have
    $$U_{c,*}(i_c,:) = \frac{V_c(\ell(i_c),:)}{\|V_c(\ell(i_c),:)\|_F}. \tag{A6}$$
    Clearly, we have $U_{c,*}(i_c,:) = U_{c,*}(\bar{i}_c,:)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$. Let $B_c \in \mathbb{R}^{K_c \times K_r}$ be such that $B_c(l,:) = \frac{V_c(l,:)}{\|V_c(l,:)\|_F}$ for $l \in [K_c]$. Equation (A6) gives $U_{c,*} = \Pi_c B_c$, which guarantees the existence of $B_c$.
    Now, we consider the case when $K_r = K_c = K$. Since $V_c \in \mathbb{R}^{K_c \times K_r}$ and $U_c = \Gamma_c V_c \in \mathbb{R}^{n_c \times K_r}$, we have $V_c \in \mathbb{R}^{K \times K}$ and $\mathrm{rank}(V_c) = K$. Since $V_c' V_c = I_{K_r}$, we have $V_c' V_c = I_K$ when $K_r = K_c = K$. Then, we have
    $$V_c' V_c = I_K \;\Rightarrow\; V_c' V_c V_c' = V_c' \;\Rightarrow\; V_c' (V_c V_c' - I_K) = 0 \;\overset{\mathrm{rank}(V_c) = K}{\Longrightarrow}\; V_c V_c' = I_K. \tag{A7}$$
    Since $V_c' V_c = V_c V_c' = I_K$, we have $U_{c,*}(i_c,:) = V_c(\ell(i_c),:)$ by Equation (A6), and
    $\|U_{c,*}(i_c,:) - U_{c,*}(\bar{i}_c,:)\|_F = \|V_c(\ell(i_c),:) - V_c(\ell(\bar{i}_c),:)\|_F = \sqrt{2}$ when $\ell(i_c) \neq \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$, i.e., $\|B_c(k,:) - B_c(l,:)\|_F = \sqrt{2}$ for $k \neq l \in [K]$.
    Note that, when $K_r < K_c$, since $\mathrm{rank}(V_c) = K_r$ and $V_c \in \mathbb{R}^{K_c \times K_r}$, the inverse of $V_c$ does not exist, so the last implication in Equation (A7) does not hold and $\|B_c(k,:) - B_c(l,:)\|_F \neq \sqrt{2}$ for all $k \neq l \in [K_c]$.
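The two algebraic facts this proof relies on can be checked numerically on a toy example. The sketch below is illustrative only: the values of `theta_c`, the cluster labels, and the rotation angle `phi` are hypothetical and not taken from the paper. It builds $\Theta_c\Pi_c$ for a non-overlapping $\Pi_c$, forms $D_c$ and $\Gamma_c$ as defined above, and confirms that $\Gamma_c'\Gamma_c = I_{K_c}$ and $\Theta_c\Pi_c = \|\theta_c\|_F\Gamma_c D_c$. It then confirms that two distinct rows of a $K \times K$ orthogonal matrix (standing in for $V_c$ when $K_r = K_c = K$) are at distance $\sqrt{2}$.

```python
import math

# Hypothetical toy inputs (illustrative only, not taken from the paper).
theta_c = [0.9, 0.5, 0.7, 0.4, 0.8]   # theta_c(i_c) for 5 column nodes
labels = [0, 1, 0, 1, 1]              # l(i_c): the single cluster of each column node
n_c, K_c = len(theta_c), 2

# M = Theta_c Pi_c: row i_c holds theta_c(i_c) in column l(i_c), zeros elsewhere.
M = [[theta_c[i] if labels[i] == k else 0.0 for k in range(K_c)]
     for i in range(n_c)]

# Column norms ||Theta_c Pi_c(:, k)||_F and the norm ||theta_c||_F.
col_norm = [math.sqrt(sum(M[i][k] ** 2 for i in range(n_c))) for k in range(K_c)]
theta_norm = math.sqrt(sum(t * t for t in theta_c))

# Diagonal entries of D_c and the column-normalized matrix Gamma_c.
D_c = [col_norm[k] / theta_norm for k in range(K_c)]
Gamma_c = [[M[i][k] / col_norm[k] for k in range(K_c)] for i in range(n_c)]

# Gamma_c' Gamma_c, which should be the K_c x K_c identity.
gram = [[sum(Gamma_c[i][a] * Gamma_c[i][b] for i in range(n_c))
         for b in range(K_c)] for a in range(K_c)]

# ||theta_c||_F * Gamma_c * D_c, which should reproduce M.
recon = [[theta_norm * Gamma_c[i][k] * D_c[k] for k in range(K_c)]
         for i in range(n_c)]

# A 2 x 2 rotation as a stand-in for an orthogonal V_c when K_r = K_c = 2.
phi = 0.3
V = [[math.cos(phi), -math.sin(phi)],
     [math.sin(phi), math.cos(phi)]]
row_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(V[0], V[1])))

print(gram, row_dist)
```

Because each column node belongs to exactly one cluster, the columns of $\Theta_c\Pi_c$ have disjoint supports, which is what makes $\Gamma_c$ have orthonormal columns.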

Appendix C.3. Proof of Theorem 2

Proof. 
First, by the proof of Lemma 4.3 of [25], we have the following lemma.
Lemma A2.
(Row-wise singular vector error) Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, when Assumption 2 holds, suppose that $\sigma_{K_r}(\Omega) \geq C\sqrt{\theta_{c,\max}(n_r+n_c)\log(n_r+n_c)}$. Then, with probability at least $1 - o((n_r+n_c)^{-\alpha})$,
$$\|\hat{U}_r\hat{U}_r' - U_rU_r'\|_{2\to\infty} = O\left(\frac{\theta_{c,\max}\sqrt{K_r}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\,\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\theta_{c,\min}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\right).$$
For the row nodes, when the conditions in Lemma A2 hold, by Theorem 2 of [21], we have
$$\max_{i_r \in [n_r]} \|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\varpi\,\kappa(\Pi_r'\Pi_r)\,K_r\sqrt{\lambda_1(\Pi_r'\Pi_r)}\right).$$
Next, we focus on the column nodes. By the proof of Lemma 3 in [19], there is an orthogonal matrix $\hat{O}$ such that
$$\|\hat{U}_c\hat{O} - U_c\|_F \leq \frac{2\sqrt{2K_r}\,\|A - \Omega\|}{\sqrt{\lambda_{K_r}(\Omega\Omega')}}. \tag{A8}$$
Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, by Lemma 4 of [25], we have
$$\sqrt{\lambda_{K_r}(\Omega\Omega')} \geq \theta_{c,\min}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}. \tag{A9}$$
By Lemma 4.2 of [25], when Assumption 2 holds, with probability at least $1 - o((n_r+n_c)^{-\alpha})$, we have
$$\|A - \Omega\| = O\left(\sqrt{\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r+n_c)}\right). \tag{A10}$$
Substituting the two bounds in Equations (A9) and (A10) into Equation (A8), we have
$$\|\hat{U}_c\hat{O} - U_c\|_F \leq C\,\frac{\sqrt{K_r\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r+n_c)}}{\sigma_{K_r}(P)\,\theta_{c,\min}\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}.$$
For $i_c \in [n_c]$, by basic algebra, we have
$$\|\hat{U}_{c,*}(i_c,:)\hat{O} - U_{c,*}(i_c,:)\|_F \leq \frac{2\,\|\hat{U}_c(i_c,:)\hat{O} - U_c(i_c,:)\|_F}{\|U_c(i_c,:)\|_F}.$$
Setting $m_c = \min_{1 \leq i_c \leq n_c}\|U_c(i_c,:)\|_F$, we have
$$\|\hat{U}_{c,*}\hat{O} - U_{c,*}\|_F = \sqrt{\sum_{i_c=1}^{n_c}\|\hat{U}_{c,*}(i_c,:)\hat{O} - U_{c,*}(i_c,:)\|_F^2} \leq \frac{2\,\|\hat{U}_c\hat{O} - U_c\|_F}{m_c}.$$
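The "basic algebra" step above is the standard fact that, for nonzero vectors $x$ and $y$, $\|x/\|x\| - y/\|y\|\| \leq 2\|x - y\|/\|y\|$. A minimal sketch that checks this numerically on random perturbations follows; the helper names (`norm`, `normalize`, `row_normalization_gap`) and the sampled vectors are hypothetical and only serve the illustration.

```python
import math
import random

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def normalize(v):
    """Rescale a nonzero vector to unit Euclidean norm."""
    n = norm(v)
    return [a / n for a in v]

def row_normalization_gap(x, y):
    """Return (lhs, rhs) with lhs = ||x/||x|| - y/||y||| and rhs = 2||x - y|| / ||y||."""
    lhs = norm([a - b for a, b in zip(normalize(x), normalize(y))])
    rhs = 2 * norm([a - b for a, b in zip(x, y)]) / norm(y)
    return lhs, rhs

random.seed(0)
all_hold = True
for _ in range(1000):
    # y plays the role of a population row U_c(i_c, :); x is a noisy estimate of it.
    y = [random.gauss(0, 1) for _ in range(3)]
    x = [b + 0.1 * random.gauss(0, 1) for b in y]
    lhs, rhs = row_normalization_gap(x, y)
    all_hold = all_hold and (lhs <= rhs + 1e-12)

print(all_hold)
```

The bound follows from writing $x/\|x\| - y/\|y\| = (x - y)/\|y\| + x\,(1/\|x\| - 1/\|y\|)$ and applying the triangle inequality to each term.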
Next, we provide a lower bound on $m_c$. By the proof of Lemma 2, we have
$$\|U_c(i_c,:)\|_F = \left\|\frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:,\ell(i_c))\|_F}V_c(\ell(i_c),:)\right\|_F = \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:,\ell(i_c))\|_F}\|V_c(\ell(i_c),:)\|_F \geq \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:,\ell(i_c))\|_F}m_{V_c} \geq \frac{\theta_{c,\min}}{\theta_{c,\max}\sqrt{n_{c,\max}}}m_{V_c},$$
where we set $m_{V_c} = \min_{k \in [K_c]}\|V_c(k,:)\|_F$. Note that when $K_r = K_c = K$, by the proof of Lemma 2, we know that $V_cV_c' = I_K$, which gives $\|V_c(k,:)\|_F = 1$ for $k \in [K]$; i.e., $m_{V_c} = 1$ when $K_r = K_c = K$. However, when $K_r < K_c$, it is challenging to obtain a positive lower bound on $m_{V_c}$. Hence, we have $\frac{1}{m_c} \leq \frac{\theta_{c,\max}\sqrt{n_{c,\max}}}{\theta_{c,\min}m_{V_c}}$. Then, by Equation (A11), we have
$$\|\hat{U}_{c,*}\hat{O} - U_{c,*}\|_F = O\left(\frac{\theta_{c,\max}\sqrt{K_r\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r+n_c)}}{\sigma_{K_r}(P)\,\theta_{c,\min}^2\,m_{V_c}\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\right).$$
Let $\varsigma > 0$ be a small quantity; by Lemma 2 in [15], if
$$\frac{\sqrt{K_c}}{\varsigma}\|U_{c,*} - \hat{U}_{c,*}\hat{O}\|_F\left(\frac{1}{\sqrt{n_{c,k}}} + \frac{1}{\sqrt{n_{c,l}}}\right) \leq \|B_c(k,:) - B_c(l,:)\|_F, \quad \text{for each } 1 \leq k \neq l \leq K_c, \tag{A12}$$
then the clustering error $\hat{f}_c = O(\varsigma^2)$. Setting $\varsigma = \frac{2}{\delta_c}\sqrt{\frac{K_c}{n_{c,\min}}}\|U_{c,*} - \hat{U}_{c,*}\hat{O}\|_F$ makes Equation (A12) hold for all $1 \leq k \neq l \leq K_c$. Then, we have $\hat{f}_c = O(\varsigma^2) = O\left(\frac{K_c\|U_{c,*} - \hat{U}_{c,*}\hat{O}\|_F^2}{\delta_c^2\,n_{c,\min}}\right)$. By Equation (A11), we have
$$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2K_rK_c\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r+n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}\,n_{c,\min}}\right).$$
In particular, when $K_r = K_c = K$, $\delta_c = \sqrt{2}$ under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$ by Lemma 2, and $m_{V_c} = 1$. Hence, when $K_r = K_c = K$, we have
$$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2K^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r+n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\right).$$
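The choice of $\varsigma$ in this proof can be sanity-checked with a few lines of arithmetic. The sketch below uses hypothetical cluster sizes, a hypothetical separation `delta_c`, and a hypothetical value `err` for $\|U_{c,*} - \hat{U}_{c,*}\hat{O}\|_F$; none of these numbers come from the paper. It verifies that the left-hand side of the separation condition never exceeds `delta_c`, which is itself a lower bound on $\|B_c(k,:) - B_c(l,:)\|_F$ when $\delta_c$ is the minimum row separation of $B_c$.

```python
import math

def separation_lhs(varsigma, err, K_c, n_k, n_l):
    """Left-hand side of the separation condition for one pair of clusters (k, l)."""
    return math.sqrt(K_c) / varsigma * err * (1 / math.sqrt(n_k) + 1 / math.sqrt(n_l))

# Hypothetical values for illustration only.
sizes = [40, 25, 60]   # n_{c,k}: sizes of the column clusters
K_c = len(sizes)
delta_c = 0.8          # assumed minimum separation between rows of B_c
err = 0.05             # assumed value of ||U_{c,*} - hat U_{c,*} hat O||_F

# The choice of varsigma made in the proof.
varsigma = (2 / delta_c) * math.sqrt(K_c / min(sizes)) * err

# Largest left-hand side over all ordered pairs k != l.
worst = max(separation_lhs(varsigma, err, K_c, sizes[k], sizes[l])
            for k in range(K_c) for l in range(K_c) if k != l)

print(worst <= delta_c, varsigma ** 2)
```

The check works because $1/\sqrt{n_{c,k}} + 1/\sqrt{n_{c,l}} \leq 2/\sqrt{n_{c,\min}}$, so substituting $\varsigma$ leaves at most $\delta_c$ on the left-hand side.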

Appendix C.4. Proof of Corollary 2

Proof. 
For the row nodes, under the conditions of Corollary 2, we have
$$\max_{i_r \in [n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\varpi K_r\sqrt{\frac{n_r}{K_r}}\right) = O\left(\varpi\sqrt{Kn_r}\right).$$
Under the conditions of Corollary 2, $\kappa(\Omega) = O(1)$ and $\mu \leq C\frac{\theta_{c,\max}^2}{\theta_{c,\min}^2} \leq C$ for some $C > 0$ by Lemma 2 of [25]. Then, by Lemma A2, we have
$$\varpi = O\left(\frac{\theta_{c,\max}\sqrt{K_r}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\,\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\theta_{c,\min}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\right) = O\left(\frac{\theta_{c,\max}\sqrt{K}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\,\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\theta_{c,\min}\,\sigma_K(P)\,\sigma_K(\Pi_r)\sqrt{n_{c,\min}}}\right) = O\left(\frac{K^{1.5}\theta_{c,\max}\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\theta_{c,\min}\,\sigma_K(P)\sqrt{n_rn_c}}\right),$$
which gives that
$$\max_{i_r \in [n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{K^2\theta_{c,\max}\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r+n_c)}\right)}{\theta_{c,\min}\,\sigma_K(P)\sqrt{n_c}}\right).$$
The reason we do not consider the case $K_r < K_c$ for row nodes is similar to that given for Corollary 1, so we omit it here.
For the column nodes, under the conditions of Corollary 2, we have
$$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2K_rK_c\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r+n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}\,n_{c,\min}}\right) = O\left(\frac{\theta_{c,\max}^2K_r^2K_c^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r+n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,n_rn_c}\right).$$
For the case $K_r = K_c = K$, we have
$$\hat{f}_c = O\left(\frac{\theta_{c,\max}^2K^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r+n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\right) = O\left(\frac{\theta_{c,\max}^2K^4\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r+n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,n_rn_c}\right).$$
When $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$, and $K_c = O(1)$, the corollary follows immediately by basic algebra. □
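The size-dependent factor in the simplification for the column nodes can be reproduced with exact arithmetic. The sketch below assumes a perfectly balanced setting, namely $\sigma_{K_r}^2(\Pi_r) = n_r/K_r$ and $n_{c,\max} = n_{c,K_r} = n_{c,\min} = n_c/K_c$; this is an illustrative reading of the balanced-size conditions, with constants set to one, and the function names are hypothetical. It checks that $K_rK_c\,n_{c,\max}/(\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}\,n_{c,\min})$ reduces to $K_r^2K_c^2/(n_rn_c)$.

```python
from fractions import Fraction

def size_factor_general(n_r, n_c, K_r, K_c):
    """Size-dependent factor of the general bound under the balanced assumption."""
    sigma2_Pi_r = Fraction(n_r, K_r)   # assumed sigma_{K_r}^2(Pi_r) = n_r / K_r
    n_c_block = Fraction(n_c, K_c)     # assumed common size of every column cluster
    return Fraction(K_r * K_c) * n_c_block / (sigma2_Pi_r * n_c_block * n_c_block)

def size_factor_simplified(n_r, n_c, K_r, K_c):
    """Size-dependent factor appearing after the simplification."""
    return Fraction(K_r ** 2 * K_c ** 2, n_r * n_c)

# Hypothetical (n_r, n_c, K_r, K_c) settings with K_r <= K_c.
settings = [(100, 200, 2, 4), (60, 90, 3, 3), (500, 400, 2, 5)]
all_equal = all(size_factor_general(*s) == size_factor_simplified(*s) for s in settings)

print(all_equal)
```

With $K_r = K_c = K$ the same factor becomes $K^4/(n_rn_c)$, matching the second display above.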

References

  1. Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
  2. Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
  3. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
  4. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44.
  5. Goldenberg, A.; Zheng, A.X.; Fienberg, S.E.; Airoldi, E.M. A survey of statistical network models. Found. Trends Mach. Learn. 2010, 2, 129–233.
  6. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
  7. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. 2011, 83, 16107.
  8. Airoldi, E.M.; Blei, D.M.; Fienberg, S.E.; Xing, E.P. Mixed Membership Stochastic Blockmodels. J. Mach. Learn. Res. 2008, 9, 1981–2014.
  9. Jin, J.; Ke, Z.T.; Luo, S. Estimating network memberships by simplex vertex hunting. arXiv 2017, arXiv:1708.07852.
  10. Zhang, Y.; Levina, E.; Zhu, J. Detecting Overlapping Communities in Networks Using Spectral Methods. SIAM J. Math. Data Sci. 2020, 2, 265–283.
  11. Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 2011, 39, 1878–1915.
  12. Qin, T.; Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 2013, 26, 3120–3128.
  13. Lei, J.; Rinaldo, A. Consistency of spectral clustering in stochastic block models. Ann. Stat. 2015, 43, 215–237.
  14. Jin, J. Fast community detection by SCORE. Ann. Stat. 2015, 43, 57–89.
  15. Joseph, A.; Yu, B. Impact of regularization on spectral clustering. Ann. Stat. 2016, 44, 1765–1791.
  16. Mao, X.; Sarkar, P.; Chakrabarti, D. Overlapping Clustering Models, and One (class) SVM to Bind Them All. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31, pp. 2126–2136.
  17. Mao, X.; Sarkar, P.; Chakrabarti, D. Estimating mixed memberships with sharp eigenvector deviations. J. Am. Stat. Assoc. 2020, 116, 1–13.
  18. Rohe, K.; Qin, T.; Yu, B. Co-clustering directed graphs to discover asymmetries and directional communities. Proc. Natl. Acad. Sci. USA 2016, 113, 12679–12684.
  19. Zhou, Z.; Amini, A.A. Analysis of spectral clustering algorithms for community detection: The general bipartite setting. J. Mach. Learn. Res. 2019, 20, 1–47.
  20. Wang, Z.; Liang, Y.; Ji, P. Spectral algorithms for community detection in directed networks. J. Mach. Learn. Res. 2020, 21, 1–45.
  21. Qing, H.; Wang, J. Directed mixed membership stochastic blockmodel. arXiv 2021, arXiv:2101.02307v2.
  22. Airoldi, E.M.; Wang, X.; Lin, X. Multi-way blockmodels for analyzing coordinated high-dimensional responses. Ann. Appl. Stat. 2013, 7, 2431–2457.
  23. Razaee, Z.S.; Amini, A.A.; Li, J.J. Matched bipartite block model with covariates. J. Mach. Learn. Res. 2019, 20, 1174–1217.
  24. Zhou, Z.; Amini, A.A. Optimal bipartite network clustering. J. Mach. Learn. Res. 2020, 21, 1–68.
  25. Qing, H. Directed degree corrected mixed membership model and estimating community memberships in directed networks. arXiv 2021, arXiv:2109.07826.
  26. Ndaoud, M.; Sigalla, S.; Tsybakov, A.B. Improved clustering algorithms for the bipartite stochastic block model. IEEE Trans. Inf. Theory 2021, 68, 1960–1975.
  27. Gillis, N.; Vavasis, S.A. Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization. SIAM J. Optim. 2015, 25, 677–698.
  28. Qing, H. A useful criterion on studying consistent estimation in community detection. Entropy 2022, 24, 1098.
  29. Schellenberger, J.; Park, J.O.; Conrad, T.M.; Palsson, B.Ø. BiGG: A Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinform. 2010, 11, 1–10.
  30. Adamic, L.A.; Glance, N. The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43.
  31. Kunegis, J. Konect: The koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350.
  32. Leicht, E.A.; Newman, M.E. Community structure in directed networks. Phys. Rev. Lett. 2008, 100, 118703.
  33. Zhang, H.; Guo, X.; Chang, X. Randomized spectral clustering in large-scale stochastic block models. J. Comput. Graph. Stat. 2022, 1–20.
Figure 1. Estimation errors of ONA and ODCNA.
Figure 2. For the adjacency matrix in panel (a), MHamm and Hamm for ONA are 0.0544 and 0, respectively. For the adjacency matrix in panel (b), MHamm and Hamm for ONA are 0.1004 and 0, respectively. ODCNA has the same error rates as ONA. x-axis: row nodes; y-axis: column nodes.
Figure 3. Illustration of a simulated directed network generated under ONM. Panels (a–c) show the adjacency matrix, the sending clusters, and the receiving clusters of this simulated directed network, respectively. For this directed network, MHamm and Hamm for ONA (and ODCNA) are 0.0615 (0.0615) and 0 (0.2333), respectively. In panels (b,c), dots of the same color are pure nodes in the same sending (receiving) cluster, and the square indicates the mixed node with weight 0.7 in the red sending cluster and weight 0.3 in the blue sending cluster, where the sending and receiving clusters are obtained from $\Pi_r$ and $\ell$ provided in Remark 4.
Figure 4. Leading 20 singular values of adjacency matrices for real-world directed networks used in this paper.
Figure 5. Sending and receiving clusters detected by ODCNA for the Metabolic network when assuming that nodes in a sending (receiving) pattern have an overlapping (non-overlapping) property. Colors indicate clusters detected using ODCNA, and squares indicate highly mixed nodes, where sending clusters are obtained using $\hat{\ell}_r$, the home base sending pattern community, and receiving clusters are obtained by $\hat{\ell}$ from ODCNA.
Figure 6. Sending and receiving clusters detected by ODCNA for the Political blogs network. Colors indicate clusters and squares indicate highly mixed nodes.
Figure 7. Sending and receiving clusters detected by ODCNA for the Wikipedia links (crh) network. Colors indicate clusters and squares indicate highly mixed nodes.
Figure 8. Sending and receiving clusters detected by ODCNA for the Wikipedia links (dv) network. Colors indicate clusters and squares indicate highly mixed nodes.
Table 1. The proportion of highly mixed nodes ($\tau$) and the asymmetric structure measured by $\mathrm{Hamm}_{rc}$ for the real-world directed networks considered in this paper when ODCNA's input adjacency matrix is $A$; i.e., the case when assuming that nodes in a sending (receiving) pattern have an overlapping (non-overlapping) property.

| Data | $\tau$ | $\mathrm{Hamm}_{rc}$ |
|---|---|---|
| Metabolic | 0.1209 | 0.2497 |
| Political blogs | 0.0246 | 0.0443 |
| Wikipedia links (crh) | 0.0444 | 0.0307 |
| Wikipedia links (dv) | 0.4089 | 0.1466 |
Table 2. The proportion of highly mixed nodes ($\tau$) and the asymmetric structure measured by $\mathrm{Hamm}_{rc}$ for the real-world directed networks considered in this paper when ODCNA's input adjacency matrix is $A'$; i.e., the case when assuming that nodes in a sending (receiving) pattern have a non-overlapping (overlapping) property.

| Data | $\tau$ | $\mathrm{Hamm}_{rc}$ |
|---|---|---|
| Metabolic | 0.0594 | 0.2945 |
| Political blogs | 0.1365 | 0.0443 |
| Wikipedia links (crh) | 0.1308 | 0.0543 |
| Wikipedia links (dv) | 0.3492 | 0.2059 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Qing, H. Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models. Entropy 2022, 24, 1216. https://doi.org/10.3390/e24091216