Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models

Qing, Huan

doi:10.3390/e24091216


School of Mathematics, China University of Mining and Technology, Xuzhou 221116, China

Entropy 2022, 24(9), 1216; https://doi.org/10.3390/e24091216

Submission received: 4 July 2022 / Revised: 25 August 2022 / Accepted: 27 August 2022 / Published: 30 August 2022

(This article belongs to the Topic Complex Systems and Network Science)


Abstract


We consider the problem of modeling and estimating communities in directed networks. Models for this problem in the previous literature always assume that the sending clusters and the receiving clusters are either both non-overlapping or both overlapping. However, previous models cannot model a directed network in which nodes in sending clusters have the overlapping property while nodes in receiving clusters have the non-overlapping property, especially when the number of sending clusters is no larger than that of the receiving clusters. This kind of directed network exists in the real world because of its randomness, and because we have little prior knowledge of the community structure of some real-world directed networks. To study the asymmetric structure of such directed networks, we propose a flexible and identifiable Overlapping and Non-overlapping model (ONM). We also provide an extension of ONM that models directed networks with variation in node degree. Two spectral clustering algorithms are designed to fit the models. We establish a theoretical guarantee on the estimation consistency of the algorithms under the proposed models. Small-scale computer-generated directed networks are designed and analyzed to support our theoretical results. Four real-world directed networks are used to illustrate the algorithms, and the results reveal the existence of highly mixed nodes and the asymmetric structure of these networks.

Keywords:

community detection; directed network; network analysis; spectral clustering

1. Introduction

Community detection is a powerful tool in studying social networks with a latent structure of community [1,2,3,4]. The goal of community detection is to estimate a node’s community information from the network. In the study of social networks, various models have been proposed for community detection to model different networks with different community structures [5]. Due to the extremely intensive studies on community detection, we only focus on identifiable models that are closely relevant to our study in this paper.

The Stochastic Blockmodel (SBM) [6] is a classical and widely used model for undirected networks. SBM assumes that the probability of an edge between two nodes depends only on the clusters they belong to. This assumption is not realistic, because nodes have various degrees in real-world networks. To model real-world undirected networks in which node degrees vary, the Degree-Corrected Stochastic Blockmodel (DCSBM) [7] extends SBM by introducing degree heterogeneities. Under SBM and DCSBM, all nodes are pure, such that each node belongs to only one community. However, in real cases, some nodes may belong to multiple communities, and such nodes have the overlapping (also known as mixed membership) property. To model undirected networks in which nodes have the overlapping property, Ref. [8] designs the Mixed Membership Stochastic Blockmodel (MMSB). Ref. [9] introduces the Degree-Corrected Mixed Membership model (DCMM), which extends MMSB by considering degree heterogeneities. Ref. [10] designs the Overlapping Continuous Community Assignment model (OCCAM), which is in fact equivalent to DCMM. Spectral methods with consistent estimations under the above models are provided in [9,11,12,13,14,15,16,17].

For directed networks in which all nodes have the non-overlapping property, Ref. [18] proposes the Stochastic co-Blockmodel (ScBM) and its extension, the Degree-Corrected Stochastic co-Blockmodel (DCScBM), which accounts for degree heterogeneity; ScBM (DCScBM) extends SBM (DCSBM) from undirected networks to directed networks. ScBM and DCScBM can model non-overlapping directed networks in which row nodes belong to $K_{r}$ sending clusters (we occasionally use community to denote cluster) and column nodes belong to $K_{c}$ receiving clusters, where row nodes can differ from column nodes, and $K_{r}$ can differ from $K_{c}$. Ref. [19] studies the consistency of some adjacency-based spectral algorithms under ScBM. Ref. [20] studies the consistency of the spectral method D-SCORE under DCScBM when $K_{r} = K_{c}$. Ref. [21] designs the Directed Mixed Membership Stochastic Blockmodel (DiMMSB) as an extension of ScBM and MMSB to model directed networks in which all nodes have the overlapping property. DiMMSB can also be seen as an extension of the two-way blockmodels with a Bernoulli distribution of [22].

All of the above models are identifiable under certain conditions. The identifiability of ScBM and DCScBM holds even when $K_{r} \neq K_{c}$. DiMMSB is identifiable only when $K_{r} = K_{c}$. SBM, DCSBM, MMSB, DCMM, and OCCAM trivially satisfy $K_{r} = K_{c}$, since they model undirected networks. For all the above models, row nodes and column nodes have symmetric structural information, such that they are always either both non-overlapping or both overlapping. As shown by the identifiability of DiMMSB, modeling a directed network in which all nodes have the overlapping property requires $K_{r} = K_{c}$ for identifiability. Naturally, there is a bridge model from ScBM to DiMMSB, such that the bridge model can model a directed network in which the row nodes and column nodes have asymmetric structural information, i.e., they have different overlapping properties. In this paper, we introduce this model and name it the Overlapping and Non-overlapping model.

Our contributions in this paper are as follows. We propose an identifiable model for directed networks, the Overlapping and Non-overlapping model (ONM for short). ONM allows the row nodes and the column nodes of a directed network to have different overlapping properties. Without loss of generality, we let the row nodes have the overlapping property while the column nodes do not. The proposed model is identifiable when $K_{r} \leq K_{c}$. Recall that ScBM, which models non-overlapping directed networks, is identifiable even when $K_{r} \neq K_{c}$, whereas DiMMSB, which models overlapping directed networks, is identifiable only when $K_{r} = K_{c}$. This is why we call ONM, which models directed networks in which row nodes and column nodes have different overlapping properties, a bridge model from ScBM to DiMMSB. We also propose an identifiable model, the Overlapping and Degree-Corrected Non-overlapping model (ODCNM), as an extension of ONM that accounts for degree heterogeneity. We construct two spectral algorithms to fit ONM and ODCNM. We show that our methods enjoy consistent estimations under mild conditions. In particular, our theoretical results under ODCNM match those under ONM when ODCNM reduces to ONM. The numerical results on simulated directed networks generated under ONM and ODCNM support our theoretical findings, and the results on four real-world directed networks demonstrate the advantages of our algorithms in studying the asymmetric structure between the sending and receiving clusters.

Notations. We use the following general notations in this paper. For any positive integer $m$, let $[m] := \{1, 2, \dots, m\}$, and let $I_{m}$ denote the $m \times m$ identity matrix. For a vector $x$ and fixed $q > 0$, ${∥ x ∥}_{q}$ denotes its $l_{q}$-norm. For a matrix $M$, $M^{'}$ denotes the transpose of $M$, $∥ M ∥$ denotes the spectral norm, ${∥ M ∥}_{F}$ denotes the Frobenius norm, and ${∥ M ∥}_{2 \to \infty}$ denotes the maximum $l_{2}$-norm of all the rows of $M$. Let $σ_{i}(M)$ be the $i$-th largest singular value of $M$, and let $λ_{i}(M)$ denote the $i$-th largest eigenvalue of $M$ ordered by magnitude. $M(i, :)$ and $M(:, j)$ denote the $i$-th row and the $j$-th column of $M$, respectively. $M(S_{r}, :)$ and $M(:, S_{c})$ denote the rows and columns of $M$ in the index sets $S_{r}$ and $S_{c}$, respectively. For any matrix $M$, we simply write $Y = \max(0, M)$ to represent $Y_{ij} = \max(0, M_{ij})$ for any $i, j$. For any matrix $M \in \mathbb{R}^{m \times m}$, let $\mathrm{diag}(M)$ be the $m \times m$ diagonal matrix whose $i$-th diagonal entry is $M(i, i)$, and let $\mathrm{rank}(M)$ be the rank of $M$. $\mathbf{1}$ is a column vector with all entries equal to 1. $e_{i}$ is a column vector whose $i$-th entry is 1, while all other entries are zero. In this paper, $C$ is a positive constant whose value may differ from one occurrence to another.

2. The Overlapping and Non-Overlapping Model

Consider a directed network $N = (V_{r}, V_{c}, E)$, where $V_{r} = \{1, 2, \dots, n_{r}\}$ is the set of row nodes, $V_{c} = \{1, 2, \dots, n_{c}\}$ is the set of column nodes, and $E$ is the set of edges from the row nodes to the column nodes. Since the row nodes can be different from the column nodes, we may have $V_{r} \cap V_{c} = ⌀$ (i.e., there are no common nodes between $V_{r}$ and $V_{c}$), where ⌀ denotes the empty set, and $V_{r}$ may not be equal to $V_{c}$. This is a more general case than $V_{r} = V_{c}$ (i.e., the row nodes are the same as the column nodes). Such a directed network $N$ is also known as a bipartite graph (or bipartite network) in [18,19]. In this paper, we use the subscripts $r$ and $c$ to distinguish the terms for the row nodes and the column nodes. The works in [18,19,23,24,25,26] also consider the general bipartite setting, in which the row nodes may differ from the column nodes. Let $A \in \{0, 1\}^{n_{r} \times n_{c}}$ be the bi-adjacency matrix of the directed network $N$, such that $A(i_{r}, i_{c}) = 1$ if there is a directed edge from row node $i_{r}$ to column node $i_{c}$, and $A(i_{r}, i_{c}) = 0$ otherwise. For convenience, we call a community that row nodes belong to a row community (or occasionally a sending cluster), and a community that column nodes belong to a column community (or occasionally a receiving cluster).

We propose a new blockmodel which we call the Overlapping and Non-overlapping model (ONM for short). ONM can model directed networks whose row nodes belong to $K_{r}$ overlapping row communities, while the column nodes belong to $K_{c}$ non-overlapping column communities. For row nodes, let $Π_{r} \in \mathbb{R}^{n_{r} \times K_{r}}$ be the membership matrix, such that

$$Π_{r}(i_{r}, :) \geq 0, \quad {∥ Π_{r}(i_{r}, :) ∥}_{1} = 1 \quad \text{for } i_{r} \in [n_{r}]. \tag{1}$$

Call row node $i_{r}$ pure if $Π_{r}(i_{r}, :)$ degenerates (i.e., one entry is 1 and the other $K_{r} - 1$ entries are 0), and mixed otherwise. By this definition, each row node $i_{r} \in [n_{r}]$ may have mixed membership and belong to more than one row community.

For column nodes, let $ℓ$ be the $n_{c} \times 1$ vector whose $i_{c}$-th entry is $ℓ(i_{c}) = k$ if column node $i_{c}$ belongs to the $k$-th column community, where $ℓ(i_{c})$ takes values in $\{1, 2, \dots, K_{c}\}$ for $i_{c} \in [n_{c}]$. Let $Π_{c} \in \mathbb{R}^{n_{c} \times K_{c}}$ be the membership matrix of column nodes, such that for $i_{c} \in [n_{c}], k \in [K_{c}]$,

$$Π_{c}(i_{c}, k) = 1 \text{ when } ℓ(i_{c}) = k, \text{ and } 0 \text{ otherwise, and } {∥ Π_{c}(i_{c}, :) ∥}_{1} = 1. \tag{2}$$

By this definition, each column node $i_{c} \in [n_{c}]$ belongs to exactly one of the $K_{c}$ column communities. Hence, all of the column nodes are pure nodes.

In this paper, we assume that

$$K_{r} \leq K_{c}. \tag{3}$$

Equation (3) is required for the identifiability of ONM. Let $P \in \mathbb{R}^{K_{r} \times K_{c}}$ be the probability matrix, such that

$$0 \leq P(k, l) \leq ρ \leq 1 \quad \text{for } k \in [K_{r}], l \in [K_{c}], \tag{4}$$

where $ρ$ controls the network sparsity and is called the sparsity parameter in this paper. For convenience, set $P = ρ \tilde{P}$, where $\tilde{P}(k, l) \in [0, 1]$ for $k \in [K_{r}], l \in [K_{c}]$, and $\max_{k \in [K_{r}], l \in [K_{c}]} \tilde{P}(k, l) = 1$ for model identifiability. For all pairs $(i_{r}, i_{c})$ with $i_{r} \in [n_{r}], i_{c} \in [n_{c}]$, our model assumes that the $A(i_{r}, i_{c})$ are independent Bernoulli random variables satisfying

$$Ω := Π_{r} P Π_{c}^{'}, \quad A(i_{r}, i_{c}) \sim \mathrm{Bernoulli}(Ω(i_{r}, i_{c})), \tag{5}$$

where $Ω = \mathbb{E}[A]$ is called the population adjacency matrix in this paper.

Definition 1.

Call model (1)–(5) the Overlapping and Non-overlapping model (ONM), and denote it by $ONM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c})$.

Remark 1.

Under $ONM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c})$, for $i_{r} \in [n_{r}], j_{c} \in [n_{c}]$, since $\mathbb{P}(A(i_{r}, j_{c}) = 1) = Ω(i_{r}, j_{c}) = ρ Π_{r}(i_{r}, :) \tilde{P} Π_{c}(j_{c}, :)^{'}$, we see that increasing $ρ$ increases the probability of generating an edge from row node $i_{r}$ to column node $j_{c}$, i.e., the sparsity of the network is governed by $ρ$.
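To make Equations (1)–(5) concrete, the following is a minimal sketch of how a bi-adjacency matrix could be sampled from ONM. It is an illustration under stated assumptions, not code from the paper: the function name `sample_onm`, the Dirichlet draw for mixed row memberships, the particular $\tilde{P}$, and the 0-based labels (the text uses $1, \dots, K_{c}$) are all choices made here.

```python
import numpy as np

def sample_onm(Pi_r, labels_c, P_tilde, rho, rng):
    """Draw a bi-adjacency matrix A from ONM (hypothetical helper)."""
    n_c, K_c = labels_c.shape[0], P_tilde.shape[1]
    # Column membership matrix Pi_c: one-hot encoding of the labels, Equation (2).
    Pi_c = np.zeros((n_c, K_c))
    Pi_c[np.arange(n_c), labels_c] = 1.0
    # Population adjacency matrix Omega = rho * Pi_r * P_tilde * Pi_c', Equation (5).
    Omega = rho * Pi_r @ P_tilde @ Pi_c.T
    # Independent Bernoulli edges with success probabilities Omega.
    A = (rng.random(Omega.shape) < Omega).astype(int)
    return A, Omega, Pi_c

rng = np.random.default_rng(0)
n_r, n_c, K_r, K_c = 60, 90, 2, 3          # K_r <= K_c, Equation (3)
# Row memberships: rows are non-negative and sum to 1, Equation (1).
Pi_r = rng.dirichlet(np.ones(K_r), size=n_r)
Pi_r[:K_r] = np.eye(K_r)                   # one pure row node per row community
labels_c = np.arange(n_c) % K_c            # 0-based labels, every community non-empty
P_tilde = np.array([[1.0, 0.2, 0.5],
                    [0.3, 0.9, 0.1]])      # rank K_r, maximum entry equal to 1
A, Omega, Pi_c = sample_onm(Pi_r, labels_c, P_tilde, rho=0.5, rng=rng)
```

Lowering `rho` in this sketch scales every entry of `Omega` down proportionally, which is the sense in which $ρ$ governs sparsity.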

The following conditions are sufficient for the identifiability of ONM:

(I1) $rank (P) = K_{r}, rank (Π_{r}) = K_{r}$ , and $rank (Π_{c}) = K_{c}$ .
(I2) There is at least one pure row node for each of the $K_{r}$ row communities.

For $k \in [K_{r}]$, let $I_{r}^{(k)} = \{i \in [n_{r}] : Π_{r}(i, k) = 1\}$. By condition (I2), $I_{r}^{(k)}$ is non-empty for all $k \in [K_{r}]$. For each $k \in [K_{r}]$, select one row node from $I_{r}^{(k)}$ to construct the index set $I_{r}$; i.e., $I_{r}$ contains the indices of $K_{r}$ pure row nodes, one from each row community. Without loss of generality, let $Π_{r}(I_{r}, :) = I_{K_{r}}$ (Lemma 2.1 of [17] uses a similar setting to design a spectral algorithm under MMSB). $I_{c}$ is defined similarly for the column nodes, such that $Π_{c}(I_{c}, :) = I_{K_{c}}$. The next proposition guarantees the identifiability of ONM.

Proposition 1.

If conditions (I1) and (I2) hold, ONM is identifiable: for eligible $(P, Π_{r}, Π_{c})$ and $(\check{P}, \check{Π}_{r}, \check{Π}_{c})$, if $Π_{r} P Π_{c}^{'} = \check{Π}_{r} \check{P} \check{Π}_{c}^{'}$, then $P = \check{P}$, $Π_{r} = \check{Π}_{r}$, and $Π_{c} = \check{Π}_{c}$.

All proofs of propositions, lemmas, and theorems are provided in Appendix B and Appendix C of this paper. ONM differs from previous models in the kinds of directed networks it can model:

When the row nodes are the same as the column nodes, $K_{r} = K_{c}$ , and all nodes are pure, ONM degenerates to SBM. However, ONM can model directed networks where row nodes enjoy mixed memberships, while SBM only models un-directed networks.
When all row nodes are pure, our ONM reduces to ScBM with $K_{r}$ row clusters and $K_{c}$ column clusters [18]. However, ONM allows for row nodes to have overlapping memberships, while ScBM does not. Meanwhile, for model identifiability, ScBM does not require $rank (P) = K_{r}$ that ONM requires, and this can be seen as the cost of ONM when modeling the overlapping row nodes.
Though DiMMSB [21] can model directed networks whose row and column nodes have overlapping memberships, DiMMSB requires $K_{r} = K_{c}$ for model identifiability. For comparison, our ONM allows $K_{r} \leq K_{c}$ at the cost of losing the overlapping property of the column nodes.

2.1. A Spectral Algorithm for Fitting ONM

The primary goal of the proposed algorithm is to estimate the row membership matrix $Π_{r}$ and the column membership matrix $Π_{c}$ from the observed adjacency matrix $A$ with given $K_{r}$ and $K_{c}$. We now discuss the intuition behind the design of our algorithm to fit ONM.

Under conditions (I1) and (I2), by basic algebra, we have $\mathrm{rank}(Ω) = K_{r}$. Let $Ω = U_{r} Λ U_{c}^{'}$ be the compact singular value decomposition of $Ω$, where $U_{r} \in \mathbb{R}^{n_{r} \times K_{r}}, Λ \in \mathbb{R}^{K_{r} \times K_{r}}, U_{c} \in \mathbb{R}^{n_{c} \times K_{r}}$, $U_{r}^{'} U_{r} = I_{K_{r}}, U_{c}^{'} U_{c} = I_{K_{r}}$, and $I_{K_{r}}$ is the $K_{r} \times K_{r}$ identity matrix. Let $n_{c, k} = |\{i_{c} : ℓ(i_{c}) = k\}|$ be the size of the $k$-th column community for $k \in [K_{c}]$. Let $n_{c, \max} = \max_{k \in [K_{c}]} n_{c, k}$ and $n_{c, \min} = \min_{k \in [K_{c}]} n_{c, k}$. Meanwhile, with a slight abuse of notation, let $n_{c, K_{r}}$ be the $K_{r}$-th largest size among all column communities. The following lemma guarantees that $U_{r}$ enjoys an ideal simplex structure and $U_{c}$ has $K_{c}$ distinct rows.

Lemma 1.

Under $ONM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c})$, there exist a unique $K_{r} \times K_{r}$ matrix $B_{r}$ and a unique $K_{c} \times K_{r}$ matrix $B_{c}$, such that

- $U_{r} = Π_{r} B_{r}$, where $B_{r} = U_{r}(I_{r}, :)$. Meanwhile, $U_{r}(i_{r}, :) = U_{r}({\bar{i}}_{r}, :)$ when $Π_{r}(i_{r}, :) = Π_{r}({\bar{i}}_{r}, :)$ for $i_{r}, {\bar{i}}_{r} \in [n_{r}]$.
- $U_{c} = Π_{c} B_{c}$. Meanwhile, $U_{c}(i_{c}, :) = U_{c}({\bar{i}}_{c}, :)$ when $ℓ(i_{c}) = ℓ({\bar{i}}_{c})$ for $i_{c}, {\bar{i}}_{c} \in [n_{c}]$, i.e., $U_{c}$ has $K_{c}$ distinct rows. Furthermore, when $K_{r} = K_{c} = K$, we have ${∥ B_{c}(k, :) - B_{c}(l, :) ∥}_{F} = \sqrt{\frac{1}{n_{c, k}} + \frac{1}{n_{c, l}}}$ for all $1 \leq k < l \leq K$.

Lemma 1 says that the rows of $U_{r}$ form a $K_{r}$-simplex in $\mathbb{R}^{K_{r}}$, which we call the Ideal Simplex (IS), with the $K_{r}$ rows of $B_{r}$ being the vertices. This IS is also found in [9,17,21]. Meanwhile, Lemma 1 says that $U_{c}$ has $K_{c}$ distinct rows, and if two column nodes $i_{c}$ and ${\bar{i}}_{c}$ are from the same column community, then $U_{c}(i_{c}, :) = U_{c}({\bar{i}}_{c}, :)$.
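The two statements of Lemma 1 can be checked numerically on a small noiseless example. The sketch below is only an illustration: the sizes, the matrix `P`, the Dirichlet row memberships, and the 0-based labels are assumptions made here, and the pure row nodes are placed in the first $K$ rows so that $I_{r} = \{0, \dots, K-1\}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_c, K = 50, 70, 3                    # case K_r = K_c = K
Pi_r = rng.dirichlet(np.ones(K), size=n_r)
Pi_r[:K] = np.eye(K)                       # pure row nodes occupy rows 0..K-1
labels_c = np.arange(n_c) % K              # 0-based column labels
Pi_c = np.eye(K)[labels_c]
P = np.array([[0.9, 0.2, 0.1],
              [0.1, 0.8, 0.3],
              [0.2, 0.1, 0.7]])            # full rank, entries in [0, 1]
Omega = Pi_r @ P @ Pi_c.T

# Compact SVD of Omega: keep the K leading singular triplets.
U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_r, U_c = U[:, :K], Vt[:K].T

# First statement: U_r = Pi_r B_r with B_r = U_r(I_r, :).
B_r = U_r[:K]
simplex_gap = np.abs(U_r - Pi_r @ B_r).max()

# Second statement: U_c = Pi_c B_c, one distinct row per column community.
B_c = np.vstack([U_c[labels_c == k][0] for k in range(K)])
rows_gap = np.abs(U_c - Pi_c @ B_c).max()

# Pairwise distances between rows of B_c versus sqrt(1/n_{c,k} + 1/n_{c,l}).
sizes = np.bincount(labels_c, minlength=K)
dist_gap = max(
    abs(np.linalg.norm(B_c[k] - B_c[l]) - np.sqrt(1 / sizes[k] + 1 / sizes[l]))
    for k in range(K) for l in range(k + 1, K)
)
```

All three gaps should vanish up to floating-point error, regardless of the sign and rotation ambiguity of the SVD.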

Under ONM, to recover $Π_{c}$ from $U_{c}$, since $U_{c}$ has $K_{c}$ distinct rows, applying the k-means algorithm to all rows of $U_{c}$ returns the true column communities by Lemma 1. Since $U_{c}$ has $K_{c}$ distinct rows, we can set $δ_{c} = \min_{k \neq l} {∥ B_{c}(k, :) - B_{c}(l, :) ∥}_{F}$ to measure the minimum center separation of $B_{c}$. By Lemma 1, $δ_{c} \geq \sqrt{\frac{2}{n_{c, \max}}}$ when $K_{r} = K_{c} = K$ under $ONM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c})$. However, when $K_{r} < K_{c}$, it is a challenge to obtain a positive lower bound of $δ_{c}$; see the proof of Lemma 1 for details.

Under ONM, to recover $Π_{r}$ from $U_{r}$, since $B_{r}$ is full rank, if $U_{r}$ and $B_{r}$ are known in advance, we can exactly recover $Π_{r}$ by setting $Π_{r} = U_{r} B_{r}^{'} {(B_{r} B_{r}^{'})}^{-1}$ via Lemma 1. Set $Y_{r} = U_{r} B_{r}^{'} {(B_{r} B_{r}^{'})}^{-1}$. Since $Y_{r} \equiv Π_{r}$ and ${∥ Π_{r}(i_{r}, :) ∥}_{1} = 1$ for $i_{r} \in [n_{r}]$, we have

$$Π_{r}(i_{r}, :) = \frac{Y_{r}(i_{r}, :)}{{∥ Y_{r}(i_{r}, :) ∥}_{1}}, \quad i_{r} \in [n_{r}].$$

Given $U_{r}$, since it enjoys the IS structure $U_{r} = Π_{r} B_{r} \equiv Π_{r} U_{r}(I_{r}, :)$, as long as we can obtain the row corner matrix $U_{r}(I_{r}, :)$ (i.e., $B_{r}$), we can recover $Π_{r}$ exactly. As mentioned in [9,17,21], for such an ideal simplex, the successive projection (SP) algorithm [27] (for details of SP, see Algorithm A1) can be applied to $U_{r}$ with $K_{r}$ row communities to find $U_{r}(I_{r}, :)$.
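The row-side recovery described above can be sketched as follows. This is an illustration under assumptions, not the paper's code: `successive_projection` is the textbook form of SP (repeatedly pick the row of largest norm, then project that direction out), which may differ in detail from Algorithm A1 in the appendix, and the example data (sizes, `P`, positions of the pure nodes) are invented here.

```python
import numpy as np

def successive_projection(X, K):
    """Return indices of K rows of X that are vertices of the simplex
    spanned by the rows (standard SP; hypothetical helper name)."""
    R, idx = X.astype(float).copy(), []
    for _ in range(K):
        j = int(np.argmax((R ** 2).sum(axis=1)))   # row with the largest l2 norm
        idx.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)                  # remove the chosen direction
    return idx

rng = np.random.default_rng(2)
n_r, n_c, K_r, K_c = 40, 60, 2, 3
Pi_r = rng.dirichlet(np.ones(K_r), size=n_r)
pure = [7, 21]                                      # assumed positions of pure row nodes
Pi_r[pure] = np.eye(K_r)
Pi_c = np.eye(K_c)[np.arange(n_c) % K_c]
P = np.array([[0.8, 0.1, 0.4],
              [0.2, 0.7, 0.1]])
Omega = Pi_r @ P @ Pi_c.T

U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_r = U[:, :K_r]

I_r = successive_projection(U_r, K_r)               # indices of the simplex corners
B_r = U_r[I_r]                                      # corner matrix U_r(I_r, :)
Y_r = U_r @ B_r.T @ np.linalg.inv(B_r @ B_r.T)      # equals Pi_r in the ideal case
Pi_r_hat = Y_r / np.abs(Y_r).sum(axis=1, keepdims=True)
```

In this noiseless setting SP returns the pure nodes in some order, so `Pi_r_hat` matches `Pi_r` up to a permutation of its columns.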

Based on the above analysis, we are now ready to give the following algorithm, which we call Ideal ONA. Input: $Ω$, $K_{r}$, and $K_{c}$ with $K_{r} \leq K_{c}$. Output: $Π_{r}$ and $ℓ$.

- Let $Ω = U_{r} Λ U_{c}^{'}$ be the compact SVD of $Ω$, such that $U_{r} \in \mathbb{R}^{n_{r} \times K_{r}}, U_{c} \in \mathbb{R}^{n_{c} \times K_{r}}, Λ \in \mathbb{R}^{K_{r} \times K_{r}}, U_{r}^{'} U_{r} = I_{K_{r}}$, and $U_{c}^{'} U_{c} = I_{K_{r}}$.
- For the row nodes:
  - Run the SP algorithm on all rows of $U_{r}$, assuming there are $K_{r}$ row communities, to obtain $U_{r}(I_{r}, :)$. Set $B_{r} = U_{r}(I_{r}, :)$.
  - Set $Y_{r} = U_{r} B_{r}^{'} {(B_{r} B_{r}^{'})}^{-1}$. Recover $Π_{r}$ by setting $Π_{r}(i_{r}, :) = \frac{Y_{r}(i_{r}, :)}{{∥ Y_{r}(i_{r}, :) ∥}_{1}}$ for $i_{r} \in [n_{r}]$.
- For the column nodes:
  - Run k-means on $U_{c}$ assuming that there are $K_{c}$ column communities, i.e., find the solution to the following optimization problem

    $$M^{*} = \mathrm{argmin}_{M \in M_{n_{c}, K_{r}, K_{c}}} {∥ M - U_{c} ∥}_{F}^{2},$$

    where $M_{n_{c}, K_{r}, K_{c}}$ denotes the set of $n_{c} \times K_{r}$ matrices with only $K_{c}$ different rows.
  - Use $M^{*}$ to obtain the label vector $ℓ$ of the column nodes. Note that since $M^{*}$ has $K_{c}$ distinct rows, two different column nodes $i_{c}, {\bar{i}}_{c} \in [n_{c}]$ are in the same column community if $M^{*}(i_{c}, :) = M^{*}({\bar{i}}_{c}, :)$.

Following a proof similar to that of Theorem 1 of [21], the Ideal ONA exactly recovers the row-node memberships and the column-node labels, which in turn also verifies the identifiability of ONM. For convenience, we refer to the two steps for column nodes as “run k-means on $U_{c}$ assuming there are $K_{c}$ column communities to obtain $ℓ$”.

We now extend the ideal case to the real case. Let $\tilde{A} = {\hat{U}}_{r} \hat{Λ} {\hat{U}}_{c}^{'}$ be the top-$K_{r}$-dimensional SVD of $A$, such that ${\hat{U}}_{r} \in \mathbb{R}^{n_{r} \times K_{r}}, {\hat{U}}_{c} \in \mathbb{R}^{n_{c} \times K_{r}}, \hat{Λ} \in \mathbb{R}^{K_{r} \times K_{r}}, {\hat{U}}_{r}^{'} {\hat{U}}_{r} = I_{K_{r}}, {\hat{U}}_{c}^{'} {\hat{U}}_{c} = I_{K_{r}}$, and $\hat{Λ}$ contains the top $K_{r}$ singular values of $A$. For the real case, we use ${\hat{B}}_{r}, {\hat{B}}_{c}, {\hat{Y}}_{r}, {\hat{Π}}_{r}, {\hat{Π}}_{c}$ given in Algorithm 1 to estimate $B_{r}, B_{c}, Y_{r}, Π_{r}, Π_{c}$, respectively. Algorithm 1, called the Overlapping and Non-overlapping algorithm (ONA for short), is a natural extension of the Ideal ONA to the real case. In ONA, we set the negative entries of ${\hat{Y}}_{r}$ to 0 by setting ${\hat{Y}}_{r} = \max(0, {\hat{Y}}_{r})$, because the weights of any row node should be non-negative, while ${\hat{U}}_{r} {\hat{B}}_{r}^{'} {({\hat{B}}_{r} {\hat{B}}_{r}^{'})}^{-1}$ may contain some negative entries. Note that if, in a directed network, the column nodes have the overlapping property while the row nodes do not, the transpose of the adjacency matrix should be used as input when applying our algorithm.

Algorithm 1. Overlapping and Non-overlapping Algorithm (ONA)

Require: The adjacency matrix $A \in \mathbb{R}^{n_{r} \times n_{c}}$ of a directed network, the number of row communities $K_{r}$, and the number of column communities $K_{c}$ with $K_{r} \leq K_{c}$.
Ensure: The estimated $n_{r} \times K_{r}$ membership matrix ${\hat{Π}}_{r}$ for row nodes, and the estimated $n_{c} \times 1$ label vector $\hat{ℓ}$ for column nodes.
1: Compute ${\hat{U}}_{r} \in \mathbb{R}^{n_{r} \times K_{r}}$ and ${\hat{U}}_{c} \in \mathbb{R}^{n_{c} \times K_{r}}$ from the top-$K_{r}$-dimensional SVD of $A$.
2: For row nodes:
   - Apply the SP algorithm (i.e., Algorithm 2) to the rows of ${\hat{U}}_{r}$, assuming there are $K_{r}$ row clusters, to obtain the near-corners matrix ${\hat{U}}_{r}({\hat{I}}_{r}, :) \in \mathbb{R}^{K_{r} \times K_{r}}$, where ${\hat{I}}_{r}$ is the index set returned by the SP algorithm. Set ${\hat{B}}_{r} = {\hat{U}}_{r}({\hat{I}}_{r}, :)$.
   - Compute the $n_{r} \times K_{r}$ matrix ${\hat{Y}}_{r} = {\hat{U}}_{r} {\hat{B}}_{r}^{'} {({\hat{B}}_{r} {\hat{B}}_{r}^{'})}^{-1}$. Set ${\hat{Y}}_{r} = \max(0, {\hat{Y}}_{r})$ and estimate $Π_{r}(i_{r}, :)$ by ${\hat{Π}}_{r}(i_{r}, :) = \frac{{\hat{Y}}_{r}(i_{r}, :)}{{∥ {\hat{Y}}_{r}(i_{r}, :) ∥}_{1}}, i_{r} \in [n_{r}]$.
3: For column nodes: run k-means on ${\hat{U}}_{c}$ assuming there are $K_{c}$ column communities to obtain $\hat{ℓ}$.
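The steps of Algorithm 1 can be put together in a short NumPy sketch. Several pieces are assumptions made for this illustration rather than details given in the text: the SP routine is the textbook version, `kmeans` is a plain Lloyd iteration with farthest-point initialisation standing in for any k-means solver, the guard against all-zero rows of ${\hat{Y}}_{r}$ is an added safety measure, and column labels are 0-based.

```python
import numpy as np

def successive_projection(X, K):
    """Standard SP: pick the largest-norm row, project it out, repeat."""
    R, idx = X.astype(float).copy(), []
    for _ in range(K):
        j = int(np.argmax((R ** 2).sum(axis=1)))
        idx.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)
    return idx

def kmeans(X, K, n_iter=100, seed=0):
    """Lloyd iterations with farthest-point initialisation (simple stand-in)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(X.shape[0])]]
    for _ in range(K - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def ona(A, K_r, K_c, seed=0):
    """Sketch of ONA: returns (Pi_r_hat, labels_c_hat)."""
    # Step 1: top-K_r singular vectors of A.
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    U_r, U_c = U[:, :K_r], Vt[:K_r].T
    # Step 2: row nodes. SP corners, then Y_r, thresholding, l1 normalisation.
    B_r = U_r[successive_projection(U_r, K_r)]
    Y_r = np.maximum(0, U_r @ B_r.T @ np.linalg.inv(B_r @ B_r.T))
    row_sums = Y_r.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # added guard: all-zero rows stay zero
    Pi_r_hat = Y_r / row_sums
    # Step 3: column nodes. k-means on the rows of U_c with K_c clusters.
    return Pi_r_hat, kmeans(U_c, K_c, seed=seed)

# Usage on a simulated network (sizes and P are illustrative choices).
rng = np.random.default_rng(3)
n_r, n_c, K_r, K_c = 200, 300, 2, 3
Pi_r = rng.dirichlet(np.ones(K_r), size=n_r)
Pi_r[:K_r] = np.eye(K_r)
labels_c = np.arange(n_c) % K_c
P = np.array([[0.8, 0.1, 0.4], [0.2, 0.7, 0.1]])
Omega = Pi_r @ P @ np.eye(K_c)[labels_c].T
A = (rng.random(Omega.shape) < Omega).astype(int)
Pi_r_hat, labels_c_hat = ona(A, K_r, K_c)
```

Calling `ona(Omega, K_r, K_c)` on the noiseless matrix corresponds to the Ideal ONA and recovers the memberships and labels exactly, up to a permutation of community indices. For larger networks a truncated SVD routine and a library k-means implementation would be the more practical choice.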

2.2. Main Results for ONA

In this section, we show the consistency of our algorithm for fitting ONM as the number of row nodes $n_{r}$ and the number of column nodes $n_{c}$ increase. Throughout this paper, $K_{r}$ and $K_{c}$ are two known integers. First, we assume that:

Assumption 1.

$ρ \max(n_{r}, n_{c}) \geq \log(n_{r} + n_{c})$.

Assumption 1 controls the sparsity of the directed network considered in the theoretical study. Such a sparsity assumption is common when establishing the estimation consistency of spectral clustering methods in community detection; see [13,14,17,18,20,21]. In particular, when ONM reduces to SBM, the sparsity requirement in Assumption 1 is consistent with that of Theorem 3.1 in [13], which guarantees the theoretical optimality of the sparsity condition in this paper. To measure the performance of ONA on the row-node memberships, since row nodes have mixed memberships, we naturally use the $l_{1}$-norm difference between $Π_{r}$ and ${\hat{Π}}_{r}$. Since the column nodes are all pure nodes, we use the performance criterion defined in [15] to measure the estimation error of ONA on the column nodes. We introduce this measurement of estimation error below.

Let $T_{c} = \{T_{c, 1}, T_{c, 2}, \dots, T_{c, K_{c}}\}$ be the true partition of column nodes $\{1, 2, \dots, n_{c}\}$ obtained from $ℓ$, such that $T_{c, k} = \{i_{c} \in [n_{c}] : ℓ(i_{c}) = k\}$ for $k \in [K_{c}]$. Let ${\hat{T}}_{c} = \{{\hat{T}}_{c, 1}, {\hat{T}}_{c, 2}, \dots, {\hat{T}}_{c, K_{c}}\}$ be the estimated partition of column nodes $\{1, 2, \dots, n_{c}\}$ obtained from $\hat{ℓ}$ of ONA, such that ${\hat{T}}_{c, k} = \{i_{c} \in [n_{c}] : \hat{ℓ}(i_{c}) = k\}$ for $k \in [K_{c}]$. The criterion is defined as

$${\hat{f}}_{c} = \min_{π \in S_{K_{c}}} \max_{k \in [K_{c}]} \frac{| T_{c, k} \cap {\hat{T}}_{c, π(k)}^{c} | + | T_{c, k}^{c} \cap {\hat{T}}_{c, π(k)} |}{n_{c, k}},$$

where $S_{K_{c}}$ is the set of all permutations of $\{1, 2, \dots, K_{c}\}$, and the superscript $c$ denotes the complementary set. As mentioned in [15], ${\hat{f}}_{c}$ measures the maximum proportion of column nodes in the symmetric difference of $T_{c, k}$ and ${\hat{T}}_{c, π(k)}$.
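For small $K_{c}$, the criterion ${\hat{f}}_{c}$ can be evaluated by brute force over all permutations. The sketch below is one way to do so; the function name `column_error` and the 0-based labels are choices made here, and it assumes every true column community is non-empty so that $n_{c, k} > 0$.

```python
from itertools import permutations
import numpy as np

def column_error(labels, labels_hat, K_c):
    """Evaluate f_c-hat by enumerating all K_c! permutations (small K_c only)."""
    labels, labels_hat = np.asarray(labels), np.asarray(labels_hat)
    best = np.inf
    for pi in permutations(range(K_c)):
        worst = 0.0
        for k in range(K_c):
            T = labels == k                    # indicator of T_{c,k}
            T_hat = labels_hat == pi[k]        # indicator of T-hat_{c,pi(k)}
            # Size of the symmetric difference, divided by n_{c,k}.
            sym_diff = np.sum(T & ~T_hat) + np.sum(~T & T_hat)
            worst = max(worst, sym_diff / T.sum())
        best = min(best, worst)
    return best

# Relabelled but otherwise perfect clustering.
f_perfect = column_error([0, 0, 1, 1], [1, 1, 0, 0], K_c=2)
# One node of community 0 assigned to community 1.
f_one_error = column_error([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], K_c=2)
```

In the second call the misplaced node contributes one element to the symmetric difference of each community of size 3, so both ratios under the best permutation equal $1/3$.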

The next theorem gives the theoretical bounds on the estimations of memberships for both the row and column nodes, which is the main theoretical result for ONA.

Theorem 1.

Under $ONM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c})$, suppose that Assumption 1 holds and that $σ_{K_{r}}(Ω) \geq C \sqrt{ρ (n_{r} + n_{c}) \log(n_{r} + n_{c})}$. Then, with probability at least $1 - o({(n_{r} + n_{c})}^{-α})$ for any $α > 0$:

- For row nodes, there exists a permutation matrix $P_{r}$ such that

  $$\max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O\left(ϖ κ(Π_{r}^{'} Π_{r}) K_{r} \sqrt{λ_{1}(Π_{r}^{'} Π_{r})}\right),$$

  where $ϖ = {∥ {\hat{U}}_{r} {\hat{U}}_{r}^{'} - U_{r} U_{r}^{'} ∥}_{2 \to \infty}$ is the row-wise singular vector error.
- For column nodes, ${\hat{f}}_{c} = O\left(\frac{K_{r} K_{c} \max(n_{r}, n_{c}) \log(n_{r} + n_{c})}{σ_{K_{r}}^{2}(\tilde{P}) ρ δ_{c}^{2} σ_{K_{r}}^{2}(Π_{r}) n_{c, K_{r}} n_{c, \min}}\right)$. In particular, when $K_{r} = K_{c} = K$,

  $${\hat{f}}_{c} = O\left(\frac{K^{2} \max(n_{r}, n_{c}) n_{c, \max} \log(n_{r} + n_{c})}{σ_{K}^{2}(\tilde{P}) ρ σ_{K}^{2}(Π_{r}) n_{c, \min}^{2}}\right).$$

Adding conditions similar to Corollary 3.1 in [17], we have the following corollary.

Corollary 1.

Under $ONM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c})$, suppose that the conditions in Theorem 1 hold, and further suppose that $λ_{K_{r}}(Π_{r}^{'} Π_{r}) = O(\frac{n_{r}}{K_{r}}), n_{c, \min} = O(\frac{n_{c}}{K_{c}})$. Then, with probability at least $1 - o({(n_{r} + n_{c})}^{-α})$:

- For row nodes, when $K_{r} = K_{c} = K$,

  $$\max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O\left(\frac{K^{2} \left(\sqrt{\frac{C \max(n_{r}, n_{c})}{\min(n_{r}, n_{c})}} + \sqrt{\log(n_{r} + n_{c})}\right)}{σ_{K}(\tilde{P}) \sqrt{ρ n_{c}}}\right).$$
- For column nodes, ${\hat{f}}_{c} = O\left(\frac{K_{r}^{2} K_{c}^{3} \max(n_{r}, n_{c}) \log(n_{r} + n_{c})}{σ_{K_{r}}^{2}(\tilde{P}) ρ δ_{c}^{2} n_{r} n_{c}^{2}}\right)$. When $K_{r} = K_{c} = K$,

  $${\hat{f}}_{c} = O\left(\frac{K^{4} \max(n_{r}, n_{c}) \log(n_{r} + n_{c})}{σ_{K}^{2}(\tilde{P}) ρ n_{r} n_{c}}\right).$$

In particular, when $n_{r} = O(n), n_{c} = O(n), K_{r} = O(1)$, and $K_{c} = O(1)$:

- For row nodes, when $K_{r} = K_{c} = K$,

  $$\max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O\left(\frac{\sqrt{\log(n)}}{σ_{K}(\tilde{P}) \sqrt{ρ n}}\right).$$
- For column nodes, ${\hat{f}}_{c} = O\left(\frac{\log(n)}{σ_{K_{r}}^{2}(\tilde{P}) ρ δ_{c}^{2} n^{2}}\right)$. When $K_{r} = K_{c} = K$,

  $${\hat{f}}_{c} = O\left(\frac{\log(n)}{σ_{K}^{2}(\tilde{P}) ρ n}\right).$$

When $n_{r} = O(n), n_{c} = O(n), K_{r} = K_{c} = K = O(1)$ in Corollary 1, the bounds for the row and column nodes are $O(\frac{1}{σ_{K}(\tilde{P})} \sqrt{\frac{\log(n)}{n}})$ and $O(\frac{1}{σ_{K}^{2}(\tilde{P})} \frac{\log(n)}{ρ n})$, respectively, and we see that ONA yields stable and consistent community detection for both the row and column nodes, since the error rates go to zero as $n \to \infty$ when $\tilde{P}$ is fixed. In particular, for the row nodes with mixed memberships, when the DCMM proposed in [9] reduces to MMSB and $K = O(1)$, the error bound of Mixed-SCORE in Theorem 2.2 of [9] is also $O(\frac{1}{σ_{K}(\tilde{P})} \sqrt{\frac{\log(n)}{n}})$, which guarantees the theoretical optimality of our analysis for the row nodes. For the column nodes, when all column communities have similar sizes and $K = O(1)$, our bound $O(\frac{1}{σ_{K}^{2}(\tilde{P})} \frac{\log(n)}{ρ n})$ matches Corollary 3.2 in [13] up to a logarithmic factor, which guarantees the theoretical optimality of our analysis for the column nodes. Furthermore, the optimality of our requirement on network sparsity and of the theoretical upper bounds on ONA’s error rates is also supported by the separation condition and sharp threshold criterion developed in [28].

3. The Overlapping and Degree-Corrected Non-Overlapping Model

In this section, we propose an extension of ONM that accounts for degree heterogeneity, and we build theoretical guarantees for the algorithm fitting this model.

Let $θ_{c}$ be an $n_{c} \times 1$ vector whose $i_{c}$-th entry is the degree heterogeneity of column node $i_{c}$, for $i_{c} \in [n_{c}]$. Let $Θ_{c}$ be an $n_{c} \times n_{c}$ diagonal matrix whose $i_{c}$-th diagonal element is $θ_{c}(i_{c})$. For $i_{r} \in [n_{r}], i_{c} \in [n_{c}]$, the extended model for generating $A$ is:

$$Ω := Π_{r} P Π_{c}^{'} Θ_{c}, \quad A(i_{r}, i_{c}) \sim \mathrm{Bernoulli}(Ω(i_{r}, i_{c})). \tag{6}$$

Definition 2.

Call model (1)–(4), (6) the Overlapping and Degree-Corrected Non-overlapping model (ODCNM), and denote it by $ODCNM_{n_{r}, n_{c}}(K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})$.

Note that, under ODCNM, the maximum element of $P$ can be larger than 1, since $\max_{i_{c} \in [n_{c}]} θ_{c}(i_{c})$ also controls the sparsity of the directed network $N$. The following proposition guarantees that ODCNM is identifiable in terms of $P, Π_{r}$, and $Π_{c}$, and such identifiability is similar to that of DCSBM.

Proposition 2.

If conditions (I1) and (I2) hold, ODCNM is identifiable for the membership matrices: for eligible $(P, Π_{r}, Π_{c}, Θ_{c})$ and $(\check{P}, \check{Π}_{r}, \check{Π}_{c}, \check{Θ}_{c})$, if $Π_{r} P Π_{c}^{'} Θ_{c} = \check{Π}_{r} \check{P} \check{Π}_{c}^{'} \check{Θ}_{c}$, then $Π_{r} = \check{Π}_{r}$ and $Π_{c} = \check{Π}_{c}$.

Remark 2.

By setting

θ_{c} (i_{c}) = ρ

for

i_{c} \in [n_{c}]

, ODCNM reduces to ONM, which is why ODCNM can be seen as an extension of ONM. Meanwhile, although DCScBM [18] can model directed networks with degree heterogeneity for both row and column nodes, it does not allow row nodes to overlap. By comparison, our ODCNM allows row nodes to have an overlapping property, at the cost of losing degree heterogeneity for the row nodes and requiring

K_{r} \leq K_{c}

for model identifiability.

3.1. A Spectral Algorithm for Fitting ODCNM

We now discuss our intuition for the design of our algorithm to fit ODCNM. Without causing confusion, we also use

U_{r}, U_{c}, B_{r}, B_{c}, δ_{c}, Y_{r}

under ODCNM. Let

U_{c, *} \in R^{n_{c} \times K_{r}}

be the row-normalized version of

U_{c}

, such that

U_{c, *} (i_{c}, :) = \frac{U_{c} (i_{c}, :)}{∥ U_{c} (i_{c}, :) ∥_{F}}

for

i_{c} \in [n_{c}]

. Then, clustering the rows of

U_{c, *}

using the k-means algorithm can return perfect clustering for column nodes, and this is guaranteed by the following lemma.

Lemma 2.

Under

O D C N M_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})

, there exists a unique

K_{r} \times K_{r}

matrix

B_{r}

and a unique

K_{c} \times K_{r}

matrix

B_{c}

, such that

$U_{r} = Π_{r} B_{r}$ , where $B_{r} = U_{r} (I_{r}, :)$ . Meanwhile, $U_{r} (i_{r}, :) = U_{r} ({\bar{i}}_{r}, :)$ when $Π_{r} (i_{r}, :) = Π_{r} ({\bar{i}}_{r}, :)$ for $i_{r}, {\bar{i}}_{r} \in [n_{r}]$ .
$U_{c, *} = Π_{c} B_{c}$ . Meanwhile, $U_{c, *} (i_{c}, :) = U_{c, *} ({\bar{i}}_{c}, :)$ when $ℓ (i_{c}) = ℓ ({\bar{i}}_{c})$ for $i_{c}, {\bar{i}}_{c} \in [n_{c}]$ . Furthermore, when $K_{r} = K_{c} = K$ , we have $∥ B_{c} (k, :) - B_{c} {(l, :) ∥}_{F} = \sqrt{2}$ for all $1 \leq k < l \leq K$ .

Recall that we set

δ_{c} = \min_{k \neq l} {∥ B_{c} (k, :) - B_{c} (l, :) ∥}_{F}

. By Lemma 2,

δ_{c} = \sqrt{2}

when

K_{r} = K_{c} = K

under

O D C N M_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})

. However, when

K_{r} < K_{c}

, it is a challenge to obtain a positive lower bound of

δ_{c}

; see the proof of Lemma 2 for details.

Under ODCNM, to recover

Π_{c}

from

U_{c}

, since

U_{c, *}

has

K_{c}

distinct rows, applying the k-means algorithm on all rows of

U_{c, *}

returns true column communities by Lemma 2. To recover

Π_{r}

from

U_{r}

, we follow the same approach as under ONM.

Based on the above analysis, we now present the following algorithm, which we call Ideal ODCNA. Input:

Ω, K_{r}, K_{c}

with

K_{r} \leq K_{c}

. Output:

Π_{r}

and ℓ.

Let $Ω = U_{r} Λ U_{c}^{'}$ be the compact SVD of $Ω$ , such that $U_{r} \in R^{n_{r} \times K_{r}}, U_{c} \in R^{n_{c} \times K_{r}}, Λ \in R^{K_{r} \times K_{r}}, U_{r}^{'} U_{r} = I_{K_{r}}, U_{c}^{'} U_{c} = I_{K_{r}}$ . Let $U_{c, *}$ be the row-normalization of $U_{c}$ .
For row nodes: proceed exactly as in Ideal ONA.
For column nodes: run k-means on $U_{c, *}$ assuming there are $K_{c}$ column communities to obtain ℓ.
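The column step of Ideal ODCNA can be sketched as below. This is only an illustration under stated assumptions: the function name `ideal_odcna_columns`, the toy parameters, and the plain Lloyd-type k-means with farthest-first seeding are choices made here, and any k-means implementation could be used instead. The example uses K_r = K_c = 2 so that the √2 separation stated in Lemma 2 can be checked numerically.

```python
import numpy as np

def ideal_odcna_columns(Omega, K_r, K_c):
    """Column step of Ideal ODCNA applied to the population matrix Omega."""
    # Compact SVD: keep the K_r leading right singular vectors.
    _, _, Vt = np.linalg.svd(Omega, full_matrices=False)
    U_c = Vt[:K_r].T                                        # n_c x K_r
    U_c_star = U_c / np.linalg.norm(U_c, axis=1, keepdims=True)
    # Farthest-first seeding, then Lloyd iterations (a plain k-means).
    centers = [U_c_star[0]]
    for _ in range(K_c - 1):
        d = np.min([np.linalg.norm(U_c_star - c, axis=1) for c in centers], axis=0)
        centers.append(U_c_star[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(20):
        dist = np.linalg.norm(U_c_star[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        centers = np.array([U_c_star[labels == k].mean(axis=0) for k in range(K_c)])
    return U_c_star, labels

# Toy population matrix with K_r = K_c = 2 (illustrative values only).
Pi_r = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.3], [0.4, 0.6]])
P = np.array([[1.0, 0.2], [0.1, 0.9]])
true_labels = np.array([0, 0, 1, 1, 0])
theta_c = np.array([0.9, 0.5, 0.8, 0.4, 0.7])
Omega = Pi_r @ P @ np.eye(2)[true_labels].T @ np.diag(theta_c)
U_c_star, labels = ideal_odcna_columns(Omega, 2, 2)
gap = np.linalg.norm(U_c_star[0] - U_c_star[2])            # rows from different communities
```

Row normalization removes the factor θ_c(i_c) from each row of U_c, which is why nodes of the same column community collapse to a single point before clustering.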

We now extend the ideal case to the real case. Let

{\hat{U}}_{c, *} \in R^{n_{c} \times K_{r}}

be the row-normalized version of

{\hat{U}}_{c}

, such that

{\hat{U}}_{c, *} (i_{c}, :) = \frac{{\hat{U}}_{c} (i_{c}, :)}{∥ {\hat{U}}_{c} (i_{c}, :) ∥_{F}}

for

i_{c} \in [n_{c}]

. The Overlapping and Degree-Corrected Non-overlapping Algorithm (ODCNA for short) is a natural extension of the Ideal ODCNA to the real case, where all steps of ODCNA are the same as ONA except for those for column nodes. ODCNA applies k-means on

{\hat{U}}_{c, *}

to obtain

\hat{ℓ}

.

3.2. Main Results for ODCNA

Set

θ_{c, \max} = \max_{i_{c} \in [n_{c}]} θ_{c} (i_{c}), θ_{c, \min} = \min_{i_{c} \in [n_{c}]} θ_{c} (i_{c})

, and

P_{\max} = \max_{k \in [K_{r}], l \in [K_{c}]} P (k, l)

. Assume that

Assumption 2.

P_{\max} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \geq \log (n_{r} + n_{c})

.

The next theorem is the main theoretical result for ODCNA, where we measure the performance of ODCNA with the same criteria used for ONA.

Theorem 2.

Under

O D C N M_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})

, when Assumption 2 holds, suppose

σ_{K_{r}} (Ω) \geq C \sqrt{θ_{c, \max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}

, with a probability at least

1 - o ({(n_{r} + n_{c})}^{- α})

,

For the row nodes,

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (ϖ κ (Π_{r}^{'} Π_{r}) K_{r} \sqrt{λ_{1} (Π_{r}^{'} Π_{r})}) . \end{matrix}$
For the column nodes,

$\begin{matrix} {\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} K_{r} K_{c} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (P) θ_{c, \min}^{4} δ_{c}^{2} m_{V_{c}}^{2} σ_{K_{r}}^{2} (Π_{r}) n_{c, K_{r}} n_{c, \min}}), \end{matrix}$

where $m_{V_{c}}$ is a parameter defined in the proof of this theorem, and it is 1 when $K_{r} = K_{c}$ . Especially, when $K_{r} = K_{c} = K$ ,

$\begin{matrix} {\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} K^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K}^{2} (P) θ_{c, \min}^{4} σ_{K}^{2} (Π_{r}) n_{c, \min}^{2}}) . \end{matrix}$

Adding some conditions on model parameters, we have the following corollary.

Corollary 2.

Under

O D C N M_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})

, suppose that conditions in Theorem 2 hold, and further, suppose that

λ_{K_{r}} (Π_{r}^{'} Π_{r}) = O (\frac{n_{r}}{K_{r}}), n_{c, \min} = O (\frac{n_{c}}{K_{c}})

, with a probability of at least

1 - o ({(n_{r} + n_{c})}^{- α})

,

For row nodes, when $K_{r} = K_{c} = K$ ,

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (\frac{K^{2} \sqrt{θ_{c, \max}} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{θ_{c, \min} σ_{K} (P) \sqrt{n_{c}}}) . \end{matrix}$
For column nodes, ${\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} K_{r}^{2} K_{c}^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (P) θ_{c, \min}^{4} δ_{c}^{2} m_{V_{c}}^{2} n_{r} n_{c}})$ . When $K_{r} = K_{c} = K$ ,

$\begin{matrix} {\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} K^{4} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n_{r} + n_{c})}{σ_{K}^{2} (P) θ_{c, \min}^{4} n_{r} n_{c}}) . \end{matrix}$

Especially, when

n_{r} = O (n), n_{c} = O (n), K_{r} = O (1)

and

K_{c} = O (1)

,

For row nodes, when $K_{r} = K_{c}$ ,

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (\frac{\sqrt{θ_{c, \max} \log (n)}}{θ_{c, \min} σ_{K} (P) \sqrt{n}}) . \end{matrix}$
For column nodes, ${\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n)}{σ_{K_{r}}^{2} (P) θ_{c, \min}^{4} δ_{c}^{2} m_{V_{c}}^{2} n^{2}})$ . When $K_{r} = K_{c} = K$ ,

$\begin{matrix} {\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n)}{σ_{K}^{2} (P) θ_{c, \min}^{4} n^{2}}) . \end{matrix}$

If we further set

θ_{c, \max} = O (ρ)

and

θ_{c, \min} = O (ρ)

, we have the below corollary.

Corollary 3.

Under

O D C N M_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})

, suppose that the conditions in Theorem 2 hold, and further, suppose that

λ_{K_{r}} (Π_{r}^{'} Π_{r}) = O (\frac{n_{r}}{K_{r}}), n_{c, \min} = O (\frac{n_{c}}{K_{c}})

and

θ_{c, \max} = O (ρ), θ_{c, \min} = O (ρ)

, with a probability of at least

1 - o ({(n_{r} + n_{c})}^{- α})

,

For row nodes, when $K_{r} = K_{c} = K$ ,

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (\frac{K^{2} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{σ_{K} (P) \sqrt{ρ n_{c}}}) . \end{matrix}$
For column nodes, ${\hat{f}}_{c} = O (\frac{K_{r}^{2} K_{c}^{2} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (P) ρ δ_{c}^{2} m_{V_{c}}^{2} n_{r} n_{c}})$ . When $K_{r} = K_{c} = K$ ,

$\begin{matrix} {\hat{f}}_{c} = O (\frac{K^{4} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K}^{2} (P) ρ n_{r} n_{c}}) . \end{matrix}$

Especially, when

n_{r} = O (n), n_{c} = O (n), K_{r} = O (1)

and

K_{c} = O (1)

,

For row nodes, when $K_{r} = K_{c}$ ,

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (\frac{\sqrt{\log (n)}}{σ_{K} (P) \sqrt{ρ n}}) . \end{matrix}$
For column nodes, ${\hat{f}}_{c} = O (\frac{\log (n)}{σ_{K_{r}}^{2} (P) ρ δ_{c}^{2} m_{V_{c}}^{2} n})$ . When $K_{r} = K_{c} = K$ ,

$\begin{matrix} {\hat{f}}_{c} = O (\frac{\log (n)}{σ_{K}^{2} (P) ρ n}) . \end{matrix}$

By setting

Θ_{c} = ρ I

, ODCNM degenerates to ONM. By comparing Corollaries 1 and 3, we see that theoretical results under ODCNM are consistent with those under ONM when ODCNM degenerates to ONM for the case where

K_{r} = K_{c} = K

.

4. Simulations

In this section, we present some simulations to investigate the performances of the two proposed algorithms. We measure their performances using the Mixed-Hamming error rate (MHamm for short) for row nodes, and the Hamming error rate (Hamm for short) for the column nodes defined below

\begin{matrix} MHamm = \frac{\min_{π \in S_{K_{r}}} {∥ {\hat{Π}}_{r} π - Π_{r} ∥}_{1}}{n_{r}}, Hamm = \frac{\min_{π \in S_{K_{c}}} {∥ {\hat{Π}}_{c} π - Π_{c} ∥}_{0}}{n_{c}}, \end{matrix}

where

S_{K_{r}}

is the set of all permutations of

{1, 2, \dots, K_{r}}

,

S_{K_{c}}

is the set of all permutations of

{1, 2, \dots, K_{c}}

;

{\hat{Π}}_{c} \in R^{n_{c} \times K_{c}}

is defined as

{\hat{Π}}_{c} (i_{c}, k) = 1

if

\hat{ℓ} (i_{c}) = k

, and 0 otherwise for

i_{c} \in [n_{c}], k \in [K_{c}]

.
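The two error rates can be computed by brute force over permutations when K_r and K_c are small. The sketch below makes two reading assumptions that should be checked against the notation section of the paper: the matrix norm in MHamm is taken as the entrywise sum of absolute values, and the ℓ0 "norm" in Hamm is taken as the number of nonzero entries, so that each mislabeled column node contributes two differing entries of the one-hot matrices. Labels are 0-based, and the function names are hypothetical.

```python
from itertools import permutations

def mhamm(Pi_hat, Pi):
    """Min over column permutations of the entrywise l1 distance, divided by n_r."""
    n, K = len(Pi), len(Pi[0])
    best = min(
        sum(abs(Pi_hat[i][perm[k]] - Pi[i][k]) for i in range(n) for k in range(K))
        for perm in permutations(range(K))
    )
    return best / n

def hamm(labels_hat, labels, K):
    """Min over label permutations of the number of differing one-hot entries, divided by n_c."""
    n = len(labels)
    best = min(
        2 * sum(perm[labels_hat[i]] != labels[i] for i in range(n))
        for perm in permutations(range(K))
    )
    return best / n

row_error = mhamm([[0.0, 1.0], [0.3, 0.7]], [[1.0, 0.0], [0.6, 0.4]])
col_error = hamm([0, 0, 0, 1], [0, 0, 1, 1], 2)
```

Enumerating all permutations costs K! evaluations, which is acceptable for the small K used in the simulations; for larger K an assignment solver would be the usual substitute.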

For all simulations in this section, the parameters

(n_{r}, n_{c}, K_{r}, K_{c}, P, ρ, Π_{r}, Π_{c}, Θ_{c})

are set as below. Unless specified, set

n_{r} = 400, n_{c} = 300, K_{r} = 3, K_{c} = 4

. For the column nodes, generate

Π_{c}

by assigning each column node to one of the column communities with equal probability. Let each row community have 100 pure nodes, and let all the mixed row nodes have memberships

(0.6, 0.3, 0.1)

.

P = ρ \tilde{P}

is set independently under ONM and ODCNM. Under ONM,

ρ

is 0.5 in Experiment 1, and we study the influence of

ρ

in Experiment 2. Under ODCNM, for

z_{c} \geq 1

, we generate the degree parameters for the column nodes as below: let

θ_{c} \in R^{n_{c} \times 1}

, such that

1 / θ_{c} (i_{c}) \overset{i i d}{\sim} U (1, z_{c})

for

i_{c} \in [n_{c}]

, where

U (1, z_{c})

denotes the uniform distribution on

[1, z_{c}]

. We study the influences of

z_{c}

and

ρ

under ODCNM in Experiments 3 and 4, respectively. For all settings, we report the averaged MHamm and the averaged Hamm over 50 repetitions.
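The parameter generation described above can be sketched as follows. The function name `simulate_parameters`, the seed, and the 0-based labels are assumptions of this sketch; the mixed membership vector (0.6, 0.3, 0.1) presupposes K_r = 3, as in the default setting.

```python
import random

def simulate_parameters(n_r=400, n_c=300, K_r=3, K_c=4, z_c=3.0, n_pure=100, seed=0):
    rng = random.Random(seed)
    # Column labels: each column node joins one of the K_c communities uniformly at random.
    labels_c = [rng.randrange(K_c) for _ in range(n_c)]
    # Row memberships: n_pure pure nodes per row community, remaining nodes mixed.
    Pi_r = []
    for k in range(K_r):
        pure = [0.0] * K_r
        pure[k] = 1.0
        Pi_r.extend([list(pure) for _ in range(n_pure)])
    Pi_r.extend([[0.6, 0.3, 0.1] for _ in range(n_r - K_r * n_pure)])
    # Degree parameters: 1 / theta_c(i) ~ Uniform[1, z_c], so theta_c(i) lies in [1 / z_c, 1].
    theta_c = [1.0 / rng.uniform(1.0, z_c) for _ in range(n_c)]
    return labels_c, Pi_r, theta_c

labels_c, Pi_r, theta_c = simulate_parameters()
```

With z_c = 1 every θ_c(i_c) equals 1, so larger z_c both spreads the column degrees and lowers the expected number of edges.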

Experiment 1: Changing $n_{c}$ under ONM. Let

n_{c}

range over

{50, 100, 150, \dots, 300}

. For this experiment, P is set as

P = ρ [\begin{matrix} 1 & 0.3 & 0.2 & 0.3 \\ 0.2 & 0.9 & 0.1 & 0.2 \\ 0.3 & 0.2 & 0.8 & 0.3 \end{matrix}] .

Let

ρ = 0.5

for this experiment designed under ONM. The numerical results are shown in panels (a) and (b) of Figure 1. The results show that as

n_{c}

increases, ONA and ODCNA perform better. For the row nodes, since both ONA and ODCNA apply the SP algorithm on

\hat{U}

to estimate

Π_{r}

, the estimated row membership matrices of ONA and ODCNA are the same, and hence, the MHamm of ONA always equals that of ODCNA.

Experiment 2: Changing $ρ$ under ONM. P is set the same as in Experiment 1, and we let the range of

ρ

be

{0.1, 0.2, \dots, 1}

to study the influence of

ρ

on the performances of ONA and ODCNA under ONM. The results are displayed in panels (c) and (d) of Figure 1. From the results, we can see that both methods perform better as

ρ

increases, since a larger

ρ

gives more edges generated in a directed network.

Experiment 3: Changing $z_{c}$ under ODCNM. P is set to be the same as in Experiment 1, and

ρ = 0.5

. Let

z_{c}

range in

{1, 2, \dots, 8}

. Increasing

z_{c}

decreases the number of edges generated under ODCNM. Panels (e) and (f) in Figure 1 display the simulation results of this experiment. The results show that, in general, increasing the variability of the node degrees makes it harder for both ONA and ODCNA to detect the node memberships. Although ODCNA is designed under ODCNM, it performs similarly to ONA in this experiment for directed networks in which column nodes have varying degrees, which is consistent with our theoretical findings in Corollaries 1 and 2.

Experiment 4: Changing $ρ$ under ODCNM. We set

z_{c} = 3

, take P to be the same as in Experiment 1, and let

ρ

range in

{0.1, 0.2, \dots, 1}

under ODCNM. Panels (g) and (h) in Figure 1 display the simulation results of this experiment. The performances of the two proposed methods are similar to those in Experiment 2.

Remark 3.

For visualization, we plot A generated under ONM. Let

n_{r} = 24, n_{c} = 20, K_{r} = 2, K_{c} = 2

, and

P = [\begin{matrix} 1 & 0.2 \\ 0.1 & 0.9 \end{matrix}] .

For the row nodes, let

Π_{r} (i_{r}, 1) = 1

for

1 \leq i_{r} \leq 8

,

Π_{r} (i_{r}, 2) = 1

for

9 \leq i_{r} \leq 16

, and

Π_{r} (i_{r}, :) = [0.7 0.3]

for

17 \leq i_{r} \leq 24

. For the column nodes, let

ℓ (i_{c}) = 1

for

1 \leq i_{c} \leq 10

, and

ℓ (i_{c}) = 2

for

11 \leq i_{c} \leq 20

. For the above setting, we generate two random adjacency matrices in Figure 2, where we also report the error rates of ONA and ODCNA. Note that, since the adjacency matrices are shown in Figure 2, and as

Π_{r}, ℓ, K_{r}

, and

K_{c}

are known here, readers can apply ONA and ODCNA to A in Figure 2 to check the effectiveness of ONA and ODCNA.

Remark 4.

For visualization, we also plot a directed network as well as its adjacency matrix generated under ONM. Let

n_{r} = 30, n_{c} = 30, K_{r} = 2, K_{c} = 3

, and

P = 0.9 [\begin{matrix} 1 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0.1 \end{matrix}] .

For row nodes, let

Π_{r} (i_{r}, 1) = 1

for

1 \leq i_{r} \leq 10

,

Π_{r} (i_{r}, 2) = 1

for

11 \leq i_{r} \leq 20

, and

Π_{r} (i_{r}, :) = [0.7 0.3]

for

21 \leq i_{r} \leq 30

. For column nodes, let

ℓ (i_{c}) = 1

for

1 \leq i_{c} \leq 10

,

ℓ (i_{c}) = 2

for

11 \leq i_{c} \leq 20

, and

ℓ (i_{c}) = 3

for

21 \leq i_{c} \leq 30

. For the above setting, we generate one adjacency matrix in panel (a) of Figure 3, where we also report the error rates of ONA and ODCNA. Furthermore, panels (b) and (c) of Figure 3 show the sending pattern and receiving pattern sides of this simulated directed network, respectively.

5. Real Data Analysis

For real-world directed networks, since nodes always have various degrees, we only apply ODCNA to the real-world datasets in this section. For the real-world directed networks analyzed in this section, the row nodes are always the same as the column nodes, so we do not use the subscripts r and c to distinguish the row and column nodes here, and we let

n_{r} = n_{c} = n

. Meanwhile, the number of row communities is equal to that of the column communities; i.e.,

K_{r} = K_{c} = K

for real data, where we always set

K_{r} = K_{c} = K

, as analyzed in [18], since it is a challenge to determine the number of row (column) communities for real-world directed networks without prior knowledge. When the row nodes are the same as the column nodes,

A (i, j) = 1

means that a directed edge is sent from node i to node j. Thus, every node has two patterns: the sending pattern and the receiving pattern. For the sending (receiving) pattern, we use the sending (receiving) cluster to denote the prior row (column) community, and we use the sending and receiving patterns to distinguish the two behaviors of each node, as was done in [18].

For

{\hat{Π}}_{r}

obtained from ODCNA, we call node i a highly mixed node if

0.8 \geq \max_{1 \leq k \leq K} {\hat{Π}}_{r} (i, k)

, where 0.8 is a threshold. Here, 0.8 is a moderate value to define highly mixed nodes, and we can also choose 0.9, 0.95, or some other values in

(0, 1)

. However, we choose 0.8 as the threshold, because setting the threshold to be larger (or smaller) than 0.8 may be too restrictive (loose) to define highly mixed nodes. The definition of a highly mixed node is important, since it tells us whether a node belongs to only one community or to multiple communities. Let

τ

be the proportion of highly mixed row nodes among all nodes, to measure the mixability of a directed network, i.e.,

τ = \frac{| \{ i \in [n] : i \text{ is a highly mixed node} \} |}{n}

. Meanwhile, we let

{\hat{ℓ}}_{r}

be an

n \times 1

vector, such that

{\hat{ℓ}}_{r} (i) = {argmax}_{1 \leq k \leq K} {\hat{Π}}_{r} (i, k)

, where we use

{\hat{ℓ}}_{r} (i)

to denote the home base sending pattern cluster of node i. Set

\begin{matrix} {Hamm}_{r c} = \frac{\min_{π \in S_{K}} {∥ {\hat{Π}}_{c} π - {\tilde{\hat{Π}}}_{r} ∥}_{0}}{n}, \end{matrix}

where

S_{K}

is the set of all permutations of

{1, 2, \dots, K}

;

{\tilde{\hat{Π}}}_{r} \in R^{n \times K}

is defined as

{\tilde{\hat{Π}}}_{r} (i, k) = 1

if

{\hat{ℓ}}_{r} (i) = k

, and 0 otherwise for

i \in [n], k \in [K]

.
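The quantities τ, the home base sending clusters, and Hamm_rc can be computed as in the sketch below. The function name `mixing_summary`, the 0-based labels, and the toy inputs are assumptions for illustration; as in the definition above, the ℓ0 "norm" is read here as the number of nonzero entries, so each node whose two patterns disagree contributes two differing entries.

```python
from itertools import permutations

def mixing_summary(Pi_r_hat, labels_c_hat, threshold=0.8):
    n, K = len(Pi_r_hat), len(Pi_r_hat[0])
    # A node is highly mixed when its largest estimated membership is at most the threshold.
    tau = sum(max(row) <= threshold for row in Pi_r_hat) / n
    # Home base sending cluster: index of the largest estimated membership.
    home = [max(range(K), key=lambda k: row[k]) for row in Pi_r_hat]
    # Compare receiving labels with home base sending labels up to a relabeling.
    hamm_rc = min(
        2 * sum(perm[labels_c_hat[i]] != home[i] for i in range(n))
        for perm in permutations(range(K))
    ) / n
    return tau, home, hamm_rc

Pi_r_hat = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.05, 0.95]]   # toy estimates
labels_c_hat = [1, 1, 0, 1]                                        # toy receiving labels
tau, home, hamm_rc = mixing_summary(Pi_r_hat, labels_c_hat)
```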

{Hamm}_{r c}

is defined to measure the difference between the sending and receiving pattern clusters. After defining

τ

and

{Hamm}_{r c}

, we see that a larger

τ

indicates a directed network in which a large proportion of nodes are highly mixed in the sending pattern, and a larger

{Hamm}_{r c}

indicates that the sending pattern differs substantially from the receiving pattern. For

i \in [n]

, let

d_{s e n d i n g} (i) = \sum_{j = 1}^{n} A (i, j)

denote the total number of edges sent by node i, and let

d_{r e c e i v i n g} (i) = \sum_{j = 1}^{n} A (j, i)

denote the total number of edges that are received by node i. Call

d_{s e n d i n g} (i)

and

d_{r e c e i v i n g} (i)

the sending degree and receiving degree of node i, respectively. For real-world directed networks, we find that there are many nodes whose sending degree or receiving degree is zero, so we apply the following pre-processing steps before analyzing the real data:

(a): Set $S_{s e n d i n g, 0} = {i \in [n] : d_{s e n d i n g} (i) = 0}$ and $S_{r e c e i v i n g, 0} = {i \in [n] : d_{r e c e i v i n g} (i) = 0}$ .
(b): Set $S_{d e g r e e, 0} = S_{s e n d i n g, 0} ⋃ S_{r e c e i v i n g, 0}$ .
(c): Update A by removing the nodes in $S_{d e g r e e, 0}$ .
(d): Repeat (a)–(c) until $S_{d e g r e e, 0}$ is a null set.
(e): After obtaining A through the above four steps, obtain the largest connected component of A.
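Steps (a)–(e) can be sketched in plain Python as follows. The function name `preprocess` and the toy edge list are hypothetical, and "connected component" is interpreted here as weakly connected (edge directions ignored), which is an assumption since the text does not specify it.

```python
def preprocess(A):
    """Steps (a)-(e): repeatedly drop zero-degree nodes, then keep the largest component.

    A is a square 0/1 list of lists; returns the kept indices of the original A.
    """
    keep = list(range(len(A)))
    while True:
        sending = {i: sum(A[i][j] for j in keep) for i in keep}
        receiving = {i: sum(A[j][i] for j in keep) for i in keep}
        zero = {i for i in keep if sending[i] == 0 or receiving[i] == 0}
        if not zero:
            break
        keep = [i for i in keep if i not in zero]
    # Largest weakly connected component (edge direction ignored; an assumption).
    unvisited, best = set(keep), []
    while unvisited:
        start = unvisited.pop()
        component, stack = [start], [start]
        while stack:
            u = stack.pop()
            for v in list(unvisited):
                if A[u][v] or A[v][u]:
                    unvisited.remove(v)
                    component.append(v)
                    stack.append(v)
        if len(component) > len(best):
            best = component
    return sorted(best)

edges = [(0, 1), (1, 2), (2, 0), (0, 3), (4, 5), (5, 4), (6, 7), (7, 0)]
A = [[0] * 8 for _ in range(8)]
for i, j in edges:
    A[i][j] = 1
kept = preprocess(A)
```

In this toy graph, node 7 only loses its single incoming edge after node 6 is removed, which is why step (d) repeats the removal until no zero-degree node remains.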

We now describe the real-world directed networks analyzed in this paper:

Metabolic: This is a directed network representing the metabolic reactions of E. coli bacteria. In these data, each node is a metabolite, and a directed edge from node i to node j means that there is a reaction in which node i is an input and node j is a product [29]. These data can be downloaded from http://networksciencebook.com/translations/en/resources/data.html. The original data have 1039 nodes; after pre-processing,

A \in {0, 1}^{893 \times 893}

. To estimate K, we plot the leading 20 singular values of A, and panel (a) of Figure 4 shows the result that suggests that

K = 2

for these data, where [18] also applies the idea of an eigengap to estimate K for real-world directed networks with an unknown number of communities.

Political blogs: This is a directed network of hyperlinks between weblogs on US politics [30], and it can be downloaded from http://www-personal.umich.edu/~mejn/netdata/. Political blogs send and receive hyperlinks to and from blogs of the same political persuasion [18]; in these data, each node is a blog and each edge is a hyperlink. The original network has 1490 nodes. After removing nodes with zero degrees via the pre-processing steps, there are 813 nodes left; i.e.,

A \in {0, 1}^{813 \times 813}

for these data. Since there are two parties, “liberal” and “conservative”, K is 2 for both the sending and receiving pattern communities for these data. [18] applies their DI-SIM algorithm to the Political blogs network, assuming that all nodes have non-overlapping property. In this paper, we apply our ODCNA algorithm on these data to study its asymmetric structure on the overlapping property.

Wikipedia links (crh): This directed network represents the wikilinks of the Wikipedia website in the Crimean Turkish language (crh). Each node is an article, and a directed edge between two articles is a wikilink [31]. These data can be downloaded from http://konect.cc/networks/wikipedia_link_crh/. The original data have 8286 nodes. After pre-processing,

A \in {0, 1}^{3555 \times 3555}

. Panel (c) of Figure 4 suggests

K = 2

for these data.

Wikipedia links (dv): These data represent the wikilinks of the Wikipedia website in the Divehi language (dv), where each node is an article and each directed edge is a wikilink [31]. These data can be downloaded from http://konect.cc/networks/wikipedia_link_dv/. The original data have 4266 nodes. After removing nodes with zero degrees,

A \in {0, 1}^{2394 \times 2394}

. Panel (d) of Figure 4 suggests

K = 2

for these data.
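The eigengap idea mentioned for the Metabolic network can be sketched as a simple rule: choose K at the largest drop between consecutive leading singular values. This is only one possible formalization of reading K off a plot, the function name `eigengap_K` is hypothetical, and the singular values below are made-up numbers for illustration, not values from any of the four datasets.

```python
def eigengap_K(singular_values, K_max=None):
    """Return the K that maximizes the gap s_K - s_{K+1} among the leading values."""
    s = sorted(singular_values, reverse=True)
    upper = len(s) - 1 if K_max is None else min(K_max, len(s) - 1)
    gaps = [s[k - 1] - s[k] for k in range(1, upper + 1)]   # gaps[k-1] = s_k - s_{k+1}
    return 1 + max(range(len(gaps)), key=gaps.__getitem__)

illustrative_values = [50.0, 41.0, 12.0, 11.0, 10.5]        # not real data
K_est = eigengap_K(illustrative_values)
```

In practice the leading singular value of an adjacency matrix is often far larger than the rest, so a rule of this kind should be checked against the plot rather than applied blindly.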

The proportion of highly mixed nodes and

{Hamm}_{r c}

for these directed networks are reported in Table 1 when assuming that nodes in sending (receiving) clusters have an overlapping (non-overlapping) property. For the Metabolic network, the results show that the sending pattern differs substantially from the receiving pattern, since

{Hamm}_{r c} = 0.2497

is quite large, and there are

893 \times 0.1209 \approx 108

highly mixed nodes in the sending pattern. For the Political blogs network, there is a slight asymmetric structure between the sending pattern and the receiving pattern, since

{Hamm}_{r c} = 0.0443

is small. Meanwhile, for the sending pattern of Political blogs, there are

813 \times 0.0246 \approx 20

highly mixed nodes. Thus, we may conclude that though there is a slight asymmetric structure in sending and receiving patterns for Political blogs, there are 20 highly mixed nodes in the sending pattern. For the Wikipedia links (crh), they have a slight asymmetric structure between sending and receiving patterns, and there are

3555 \times 0.0444 \approx 158

highly mixed nodes in the sending pattern. For the Wikipedia links (dv) network, it has a large number of highly mixed nodes for its large

τ

, and a heavy asymmetric structure in sending and receiving patterns for its large

{Hamm}_{r c}

. Generally, Table 1 suggests that if there are a large number of highly mixed nodes in the sending pattern, there is a heavy asymmetric structure between the sending and receiving patterns, and vice versa.

For visualization, we plot the sending clusters and receiving clusters detected by ODCNA for these directed networks when assuming that nodes in the sending (receiving) clusters have an overlapping (non-overlapping) property, i.e., when the input adjacency matrix of the ODCNA approach is A. The results are shown in Figure 5, Figure 6, Figure 7 and Figure 8, where we also show the highly mixed nodes in sending clusters detected by ODCNA. We see that there exists a clear asymmetric structure between the sending and receiving patterns for Metabolic and Wikipedia links (dv), as shown in Figure 5 and Figure 8, while there is a slight asymmetric structure between the sending and receiving patterns for Political blogs and Wikipedia links (crh), as shown in Figure 6 and Figure 7. Furthermore, most nodes are in the same sending (receiving) cluster for Metabolic and Wikipedia links (crh), while the two sending (receiving) clusters for Political blogs and Wikipedia links (crh) have close sizes. The results also show that most highly mixed nodes have many edges, while some have only a few. This can be explained as follows: nodes with many edges tend to have an overlapping property, whereas it is difficult to detect a community for nodes with only a few edges, so ODCNA tends to treat such nodes as highly mixed nodes.

Furthermore, for real-world directed networks, since we have no prior knowledge on whether nodes in the sending pattern side or the receiving pattern side or both sides have overlapping property, simply inputting A with K sending (receiving) pattern communities in our ODCNA algorithm is not enough. To solve this problem, we also apply ODCNA on

A^{'}

, and the numerical results are provided in Table 2, where the results show that there also exist highly mixed nodes in the receiving pattern for these directed networks, and there also exists a heavy asymmetric structure between the sending and receiving clusters for the Metabolic and Wikipedia links (dv), while there also exists a slight asymmetric structure between the sending and receiving clusters for the Political blogs and Wikipedia links (crh).

6. Discussion

In this paper, we introduced Overlapping and Non-overlapping models and their extension, by considering the degree heterogeneity. The models can model a directed network with

K_{r}

row communities and

K_{c}

column communities, in which the row node can belong to multiple sending clusters, while the column node only belongs to one of the receiving clusters. The proposed models are identifiable under the case when

K_{r} \leq K_{c}

, and some other popular constraints on the connectivity matrix and membership matrices. For comparison, modeling a directed network in which the row nodes have overlapping property while column nodes do not, with

K_{r} > K_{c}

, is unidentifiable. Meanwhile, since previous works have found that modeling directed networks in which both row and column nodes have an overlapping property with

K_{r} \neq K_{c}

is unidentifiable, our identifiable ONM and ODCNM supply a gap in modeling overlapping directed networks when

K_{r} \neq K_{c}

. These models provide exploratory tools for studying community structure in directed networks in which one side is overlapping while the other side is non-overlapping. Two spectral algorithms are designed to fit ONM and ODCNM. We also established estimation consistency for our methods under mild conditions. In particular, when ODCNM reduces to ONM, our theoretical results under ODCNM are consistent with those under ONM. The numerical results for the simulated directed networks generated under ONM and ODCNM support our theoretical results, and the results for real-world directed networks reveal the existence of highly mixed nodes and an asymmetric structure between the sending and receiving clusters.

The models and algorithms introduced in this paper are useful tools for studying the asymmetric structure of directed networks, and we hope that they will be widely applied in network science. However, perhaps the main limitation of the models is that

K_{r}

and

K_{c}

in the directed network are assumed to be given; this limitation also applies to the spectral clustering algorithms developed under the ScBM and DCScBM studied in [18,19,20]. In most community detection problems, the number of row communities and the number of column communities are unknown; therefore, a complete calculation and theoretical study requires not only the algorithms and their theoretically consistent estimations described in this paper, but also a method for estimating

K_{r}

and

K_{c}

. A possible solution to this problem may be a combination of algorithms developed in this paper and the modularity for the directed networks developed in [32]. Meanwhile, our idea can be extended in many ways. In this paper, we only consider modeling an un-weighted directed network, and it is possible to extend our work to a weighted directed network. Our algorithms are designed based on the adjacency matrix, and it is possible to design spectral algorithms to fit ONM and ODCNM by applying the regularized Laplace matrix used in [11,12]. When detecting large-scale directed networks, the random projection-based and the random sampling-based spectral clustering ideas in [33] may be applied to accelerate our algorithms. We leave the studies of these problems to our future work.

Funding

This research was funded by the scientific research start-up fund of China University of Mining and Technology, NO. 102520253, and the high-level personal project of Jiangsu Province, NO. JSSCBS20211218.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available within the article.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SBM	Stochastic Blockmodel
DCSBM	Degree-Corrected Stochastic Blockmodel
MMSB	Mixed Membership Stochastic Blockmodel
DCMM	Degree-Corrected Mixed Membership model
OCCAM	Overlapping Continuous Community Assignment model
ScBM	Stochastic co-Blockmodel
DCScBM	Degree-Corrected Stochastic co-Blockmodel
DiMMSB	Directed Mixed Membership Stochastic Blockmodel
ONM	Overlapping and Non-overlapping model
ODCNM	Overlapping and Degree-Corrected Non-overlapping model
SP	successive projection algorithm
ONA	Overlapping and Non-overlapping algorithm
ODCNA	Overlapping and Degree-Corrected Non-overlapping Algorithm

Appendix A. Successive Projection Algorithm

Algorithm A1 is the Successive Projection algorithm.

Algorithm A1 Successive Projection (SP) [27]

Require: Near-separable matrix

Y_{s p} = S_{s p} M_{s p} + Z_{s p} \in R_{+}^{m \times n}

, where

S_{s p}, M_{s p}

should satisfy Assumption 1 [27], the number r of columns to be extracted.

Ensure: Set of indices

K

such that

Y_{s p} (K, :) \approx S_{s p}

(up to permutation)

1: Compute ${\hat{U}}_{r} \in R^{n_{r} \times K_{r}}$ and ${\hat{U}}_{c} \in R^{n_{c} \times K_{r}}$ from the top- $K_{r}$ -dimensional SVD of A.
2: Let $R = Y_{s p}, K = {}, k = 1$ .
3: While $R \neq 0$ and $k \leq r$ do
4: $k_{*} = {argmax}_{j} {∥ R (j, :) ∥}_{F}$ .
5: $u_{k} = R (k_{*}, :)$ .
6: $R \leftarrow R (I - \frac{u_{k}^{'} u_{k}}{∥ u_{k} ∥_{F}^{2}})$ .
7: $K = K \cup {k_{*}}$ .
8: k = k + 1.
9: end while
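The loop of Algorithm A1 can be sketched in plain Python as below, in the row-selection form used here (pick the row of largest norm, then remove its direction from every row). The function name `successive_projection` and the toy matrix are assumptions for illustration; the matrix is built as Π B with B = [[2, 0], [0, 1]], so rows 1 and 3 are the pure rows.

```python
def successive_projection(Y, r):
    """Return up to r row indices of Y chosen by successive projection (row version)."""
    R = [list(map(float, row)) for row in Y]
    chosen = []
    for _ in range(r):
        norms = [sum(x * x for x in row) for row in R]       # squared row norms
        k_star = max(range(len(R)), key=norms.__getitem__)
        if norms[k_star] == 0.0:                              # residual is exhausted
            break
        u = R[k_star][:]                                      # copy before updating R
        # Remove from every row its component along u.
        for row in R:
            coef = sum(a * b for a, b in zip(row, u)) / norms[k_star]
            for j in range(len(row)):
                row[j] -= coef * u[j]
        chosen.append(k_star)
    return chosen

Y = [[1.0, 0.5], [2.0, 0.0], [1.4, 0.3], [0.0, 1.0]]          # rows 1 and 3 are pure
index_set = successive_projection(Y, 2)
```

Because every mixed row is a convex combination of the pure rows, its norm cannot exceed the largest pure-row norm, which is what makes the greedy choice land on a vertex at each step.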

Appendix B. Proofs under ONM

Appendix B.1. Proof of Proposition 1

Proof.

By Lemma 1, let

U_{r} Λ U_{c}^{'}

be the compact SVD of

Ω

, such that

Ω = U_{r} Λ U_{c}^{'}

; since

Ω = Π_{r} P Π_{c}^{'} = {\overset{ˇ}{Π}}_{r} \overset{ˇ}{P} {\overset{ˇ}{Π}}_{c}^{'}

, we have

Ω (I_{r}, I_{c}) = P = \overset{ˇ}{P}

, which gives

P = \overset{ˇ}{P}

. By Lemma 1, since

U_{r} = Π_{r} U_{r} (I_{r}, :) = {\overset{ˇ}{Π}}_{r} U_{r} (I_{r}, :)

, we have

Π_{r} = {\overset{ˇ}{Π}}_{r}

where we have used the fact that the inverse of

U_{r} (I_{r}, :)

exists. Since

Ω = Π_{r} P Π_{c}^{'} = {\overset{ˇ}{Π}}_{r} \overset{ˇ}{P} {\overset{ˇ}{Π}}_{c}^{'} = Π_{r} P {\overset{ˇ}{Π}}_{c}^{'}

, we have

Π_{r} P Π_{c}^{'} = Π_{r} P {\overset{ˇ}{Π}}_{c}^{'}

. By Lemma 7 of [21], we have

P Π_{c}^{'} = P {\overset{ˇ}{Π}}_{c}^{'}

, i.e.,

Π_{c} X = {\overset{ˇ}{Π}}_{c} X

, where we set

X = P^{'} \in R^{K_{c} \times K_{r}}

. Let

\overset{ˇ}{ℓ}

be the

n_{c} \times 1

vector of column nodes labels obtained from

{\overset{ˇ}{Π}}_{c}

. For

i_{c} \in [n_{c}], k \in [K_{r}]

, from

Π_{c} X = {\overset{ˇ}{Π}}_{c} X

, we have

(Π_{c} X) (i_{c}, k) = Π_{c} (i_{c}, :) X (:, k) = X (ℓ (i_{c}), k) = X (\overset{ˇ}{ℓ} (i_{c}), k)

, which means that we must have

ℓ (i_{c}) = \overset{ˇ}{ℓ} (i_{c})

for all

i_{c} \in [n_{c}]

, i.e.,

ℓ = \overset{ˇ}{ℓ}

and

Π_{c} = {\overset{ˇ}{Π}}_{c}

. Note that for the special case

K_{r} = K_{c} = K

,

Π_{c} = {\overset{ˇ}{Π}}_{c}

can be obtained easily: since

P Π_{c}^{'} = P {\overset{ˇ}{Π}}_{c}^{'}

and

P \in R^{K \times K}

is assumed to be full rank, we have

Π_{c} = {\overset{ˇ}{Π}}_{c}

. Thus, the proposition holds. □

Appendix B.2. Proof of Lemma 1

Proof.

For $U_{r}$: since $Ω = U_{r} Λ U_{c}^{'}$ and $U_{c}^{'} U_{c} = I_{K_{r}}$, we have $U_{r} = Ω U_{c} Λ^{- 1}$. Recall that $Ω = Π_{r} P Π_{c}^{'}$; we have $U_{r} = Π_{r} P Π_{c}^{'} U_{c} Λ^{- 1} = Π_{r} B_{r}$, where we set $B_{r} = P Π_{c}^{'} U_{c} Λ^{- 1}$. Since $U_{r} (I_{r}, :) = Π_{r} (I_{r}, :) B_{r} = B_{r}$, we have $B_{r} = U_{r} (I_{r}, :)$. For $i_{r} \in [n_{r}]$, $U_{r} (i_{r}, :) = e_{i_{r}}^{'} Π_{r} B_{r} = Π_{r} (i_{r}, :) B_{r}$, so we have $U_{r} (i_{r}, :) = U_{r} ({\bar{i}}_{r}, :)$ when $Π_{r} (i_{r}, :) = Π_{r} ({\bar{i}}_{r}, :)$.

For $U_{c}$: following a similar analysis as for $U_{r}$, we have $U_{c} = Π_{c} B_{c}$, where $B_{c} = P^{'} Π_{r}^{'} U_{r} Λ^{- 1}$. Note that $B_{c} \in R^{K_{c} \times K_{r}}$. Clearly, $U_{c} (i_{c}, :) = U_{c} ({\bar{i}}_{c}, :)$ when $ℓ (i_{c}) = ℓ ({\bar{i}}_{c})$ for $i_{c}, {\bar{i}}_{c} \in [n_{c}]$.

Now, we focus on the case where $K_{r} = K_{c} = K$. In this case, $B_{c} \in R^{K \times K}$ is square, and it is full rank because $U_{c} = Π_{c} B_{c}$ has rank $K$. Since $I_{K_{r}} = I_{K} = U_{c}^{'} U_{c} = B_{c}^{'} Π_{c}^{'} Π_{c} B_{c}$, we have $Π_{c}^{'} Π_{c} = {(B_{c} B_{c}^{'})}^{- 1}$. Since $Π_{c}^{'} Π_{c} = diag (n_{c, 1}, n_{c, 2}, \dots, n_{c, K})$, we have $B_{c} B_{c}^{'} = diag (\frac{1}{n_{c, 1}}, \frac{1}{n_{c, 2}}, \dots, \frac{1}{n_{c, K}})$. Hence, when $K_{r} = K_{c} = K$, we have $B_{c} (k, :) {B_{c} (l, :)}^{'} = 0$ for any $k \neq l$ and $k, l \in [K]$. Then, we have $B_{c} B_{c}^{'} = diag (∥ B_{c} (1, :) ∥_{F}^{2}, ∥ B_{c} (2, :) ∥_{F}^{2}, \dots, ∥ B_{c} (K, :) ∥_{F}^{2}) = diag (\frac{1}{n_{c, 1}}, \frac{1}{n_{c, 2}}, \dots, \frac{1}{n_{c, K}})$, and the lemma follows.

Note that when $K_{r} < K_{c}$, since $B_{c} \in R^{K_{c} \times K_{r}}$ is no longer a square invertible matrix, we cannot obtain $Π_{c}^{'} Π_{c} = {(B_{c} B_{c}^{'})}^{- 1}$ from $I_{K_{r}} = B_{c}^{'} Π_{c}^{'} Π_{c} B_{c}$. Therefore, when $K_{r} < K_{c}$, the equality $∥ B_{c} (k, :) - B_{c} (l, :) ∥_{F} = \sqrt{\frac{1}{n_{c, k}} + \frac{1}{n_{c, l}}}$ is not guaranteed to hold for $k \neq l$. Additionally, we only know that $U_{c}$ has $K_{c}$ distinct rows when $K_{r} < K_{c}$, but we have no knowledge of the minimum distance between any two distinct rows of $U_{c}$. □
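
For the square case $K_{r} = K_{c} = K$, the structure of $B_{c}$ derived above can be verified numerically. The following is a minimal sketch under assumed toy inputs (the matrices `Pi_r`, `P` and the label vector `labels_c` are hypothetical); it checks that $U_{c} = Π_{c} B_{c}$, that $B_{c} B_{c}^{'}$ is diagonal with entries $1 / n_{c, k}$, and that two rows of $B_{c}$ are at distance $\sqrt{1 / n_{c, k} + 1 / n_{c, l}}$.

```python
import numpy as np

K = 3
labels_c = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])   # cluster sizes 3, 2, 4
Pi_c = np.eye(K)[labels_c]
Pi_r = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.5, 0.3, 0.2],
                 [0.1, 0.6, 0.3]])
P = np.array([[0.9, 0.2, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.3, 0.7]])                     # full rank
Omega = Pi_r @ P @ Pi_c.T

# Right singular vectors of the rank-K matrix Omega.
U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_c = Vt[:K].T                                      # n_c x K

I_c = [0, 3, 5]                # one column node per receiving cluster
B_c = U_c[I_c]                 # K x K matrix of distinct rows of U_c
sizes = np.bincount(labels_c)  # n_{c,1}, ..., n_{c,K}

gram = B_c @ B_c.T
dist01 = np.linalg.norm(B_c[0] - B_c[1])

print(np.allclose(U_c, Pi_c @ B_c))               # U_c = Pi_c B_c
print(np.allclose(gram, np.diag(1.0 / sizes)))    # B_c B_c' = diag(1 / n_{c,k})
print(np.isclose(dist01, np.sqrt(1 / 3 + 1 / 2)))
```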

Appendix B.3. Proof of Theorem 1

Proof.

First, by Lemma 4 of [21], we have the lemma below.

Lemma A1.

(Row-wise singular vector error) Under $ONM_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c})$, when Assumption 1 holds, suppose that $σ_{K_{r}} (Ω) \geq C \sqrt{ρ (n_{r} + n_{c}) \log (n_{r} + n_{c})}$. Then, with probability at least $1 - o ({(n_{r} + n_{c})}^{- α})$,

$\begin{matrix} ∥ {\hat{U}}_{r} {\hat{U}}_{r}^{'} - U_{r} U_{r}^{'} ∥_{2 \to \infty} = O (\frac{\sqrt{K_{r}} (κ (Ω) \sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{\sqrt{ρ} σ_{K_{r}} (\tilde{P}) σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}}}), \end{matrix}$

where $μ$ is the incoherence parameter defined as $μ = \max (\frac{n_{r} {∥ U_{r} ∥}_{2 \to \infty}^{2}}{K_{r}}, \frac{n_{c} {∥ U_{c} ∥}_{2 \to \infty}^{2}}{K_{r}})$.

For the row nodes, when the conditions in Lemma A1 hold, by Theorem 2 of [21], with probability at least $1 - o ({(n_{r} + n_{c})}^{- α})$ for any $α > 0$, there exists a permutation matrix $P_{r}$ such that, for $i_{r} \in [n_{r}]$, we have

$\begin{matrix} ∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥_{1} = O (ϖ κ (Π_{r}^{'} Π_{r}) K_{r} \sqrt{λ_{1} (Π_{r}^{'} Π_{r})}) . \end{matrix}$

Next, we focus on the column nodes. By the proof of Lemma 3 in [19], there exists an orthogonal matrix $\hat{O}$ such that

$\begin{matrix} ∥ {\hat{U}}_{c} \hat{O} - U_{c} ∥_{F} \leq \frac{2 \sqrt{2 K_{r}} ∥ A - Ω ∥}{\sqrt{λ_{K_{r}} (Ω^{'} Ω)}} . \end{matrix}$

(A1)

Under $ONM_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c})$, by Lemma 10 of [21], we have

$\begin{matrix} \sqrt{λ_{K_{r}} (Ω^{'} Ω)} \geq ρ σ_{K_{r}} (\tilde{P}) σ_{K_{r}} (Π_{r}) σ_{K_{r}} (Π_{c}) . \end{matrix}$

(A2)

Since all column nodes are pure, $σ_{K_{r}} (Π_{c}) = \sqrt{n_{c, K_{r}}}$. By Lemma 3 of [21], when Assumption 1 holds, with probability at least $1 - o ({(n_{r} + n_{c})}^{- α})$, we have

$\begin{matrix} ∥ A - Ω ∥ = O (\sqrt{ρ \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}) . \end{matrix}$

(A3)

Substituting the two bounds in Equations (A2) and (A3) into Equation (A1), we have

$\begin{matrix} ∥ {\hat{U}}_{c} \hat{O} - U_{c} ∥_{F} \leq C \frac{\sqrt{K_{r} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}}{σ_{K_{r}} (\tilde{P}) \sqrt{ρ} σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}}} . \end{matrix}$

(A4)

Let $ς > 0$ be a small quantity; by Lemma 2 in [15], if

$\begin{matrix} \frac{\sqrt{K_{c}}}{ς} ∥ U_{c} - {\hat{U}}_{c} \hat{O} ∥_{F} (\frac{1}{\sqrt{n_{c, k}}} + \frac{1}{\sqrt{n_{c, l}}}) \leq {∥ B_{c} (k, :) - B_{c} (l, :) ∥}_{F}, \text{ for each } 1 \leq k \neq l \leq K_{c}, \end{matrix}$

(A5)

then the clustering error satisfies ${\hat{f}}_{c} = O (ς^{2})$. Recall that we set $δ_{c} = \min_{k \neq l} {∥ B_{c} (k, :) - B_{c} (l, :) ∥}_{F}$ to measure the minimum center separation of $B_{c}$. Setting $ς = \frac{2}{δ_{c}} \sqrt{\frac{K_{c}}{n_{c, \min}}} {∥ U_{c} - {\hat{U}}_{c} \hat{O} ∥}_{F}$ makes Equation (A5) hold for all $1 \leq k \neq l \leq K_{c}$. Then, we have ${\hat{f}}_{c} = O (ς^{2}) = O (\frac{K_{c} {∥ U_{c} - {\hat{U}}_{c} \hat{O} ∥}_{F}^{2}}{δ_{c}^{2} n_{c, \min}})$. By Equation (A4), we have

$\begin{matrix} {\hat{f}}_{c} = O (\frac{K_{r} K_{c} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (\tilde{P}) ρ δ_{c}^{2} σ_{K_{r}}^{2} (Π_{r}) n_{c, K_{r}} n_{c, \min}}) . \end{matrix}$

In particular, when $K_{r} = K_{c} = K$, Lemma 1 gives $δ_{c} \geq \sqrt{\frac{2}{n_{c, \max}}}$ under $ONM_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c})$, and $n_{c, K_{r}} = n_{c, \min}$ in this case, so we have

$\begin{matrix} {\hat{f}}_{c} = O (\frac{K^{2} \max (n_{r}, n_{c}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K}^{2} (\tilde{P}) ρ σ_{K}^{2} (Π_{r}) n_{c, \min}^{2}}) . \end{matrix}$

□
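
As a side remark, the claim that the chosen $ς$ makes Equation (A5) hold can be verified in one chain of inequalities. The derivation below is a sketch written in standard LaTeX commands ($ς$ appears as `\varsigma`); it uses only $n_{c, k} \geq n_{c, \min}$ and the definition of $δ_{c}$, and it assumes $∥ U_{c} - {\hat{U}}_{c} \hat{O} ∥_{F} > 0$ so that $ς > 0$.

```latex
% With \varsigma = (2/\delta_c) \sqrt{K_c / n_{c,\min}} \, \|U_c - \hat{U}_c \hat{O}\|_F :
\begin{aligned}
\frac{\sqrt{K_c}}{\varsigma}\,\|U_c - \hat{U}_c \hat{O}\|_F
  \left( \frac{1}{\sqrt{n_{c,k}}} + \frac{1}{\sqrt{n_{c,l}}} \right)
&\le \frac{\sqrt{K_c}}{\varsigma}\,\|U_c - \hat{U}_c \hat{O}\|_F \cdot \frac{2}{\sqrt{n_{c,\min}}} \\
&= \sqrt{K_c} \cdot \frac{\delta_c}{2} \sqrt{\frac{n_{c,\min}}{K_c}} \cdot \frac{2}{\sqrt{n_{c,\min}}}
 = \delta_c \\
&\le \|B_c(k,:) - B_c(l,:)\|_F , \qquad 1 \le k \ne l \le K_c .
\end{aligned}
```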

Appendix B.4. Proof of Corollary 1

Proof.

For the row nodes, under the conditions of Corollary 1, we have

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (ϖ K_{r} \sqrt{\frac{n_{r}}{K_{r}}}) = O (ϖ \sqrt{K n_{r}}) . \end{matrix}$

Under the conditions of Corollary 1, $κ (Ω) = O (1)$ and $μ \leq C$ for some $C > 0$ by the proof of Corollary 1 of [21]. Then, by Lemma A1, we have

$\begin{matrix} ϖ & = O (\frac{\sqrt{K} (κ (Ω) \sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{\sqrt{ρ} σ_{K} (\tilde{P}) σ_{K} (Π_{r}) \sqrt{n_{c, K_{r}}}}) = O (\frac{\sqrt{K} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{\sqrt{ρ} σ_{K} (\tilde{P}) σ_{K} (Π_{r}) \sqrt{n_{c, \min}}}) \\ & = O (\frac{K^{1.5} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{σ_{K} (\tilde{P}) \sqrt{ρ n_{r} n_{c}}}), \end{matrix}$

which gives that

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (\frac{K^{2} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{σ_{K} (\tilde{P}) \sqrt{ρ n_{c}}}) . \end{matrix}$

Note that, when $K_{r} < K_{c}$, we cannot conclude that $μ \leq C$. This is because, when $K_{r} < K_{c}$, the inverse of $B_{c} B_{c}^{'}$ does not exist, since $B_{c} \in R^{K_{c} \times K_{r}}$. Therefore, Lemma 8 of [21] does not apply, and we cannot obtain an upper bound on $∥ U_{c} ∥_{2 \to \infty}$ and hence on $μ$. This is why we only consider the case $K_{r} = K_{c}$ for the row nodes here.

For the column nodes, under the conditions of Corollary 1, we have

$\begin{matrix} {\hat{f}}_{c} & = O (\frac{K_{r} K_{c} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (\tilde{P}) ρ δ_{c}^{2} σ_{K_{r}}^{2} (Π_{r}) n_{c, K_{r}} n_{c, \min}}) = O (\frac{K_{r} K_{c} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (\tilde{P}) ρ δ_{c}^{2} (n_{r} / K_{r}) (n_{c} / K_{c}) (n_{c} / K_{c})}) \\ & = O (\frac{K_{r}^{2} K_{c}^{3} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (\tilde{P}) ρ δ_{c}^{2} n_{r} n_{c}^{2}}) . \end{matrix}$

For the special case $K_{r} = K_{c} = K$, since $\frac{n_{c, \max}}{n_{c, \min}} = O (1)$ when $n_{c, \min} = O (\frac{n_{c}}{K})$, we have

$\begin{matrix} {\hat{f}}_{c} = O (\frac{K^{4} \max (n_{r}, n_{c}) \log (n_{r} + n_{c})}{σ_{K}^{2} (\tilde{P}) ρ n_{r} n_{c}}) . \end{matrix}$

When $n_{r} = O (n), n_{c} = O (n), K_{r} = O (1)$, and $K_{c} = O (1)$, the corollary follows immediately by basic algebra. □
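
To make the final "basic algebra" step concrete, the block below specializes the two bounds of this proof to the balanced case. It is a sketch under the simplifying assumption that $n_{r} = n_{c} = n$ exactly and $K_{r} = K_{c} = K = O (1)$; the symbols are those of the proof, written with standard LaTeX commands.

```latex
% Assumption for this sketch: n_r = n_c = n and K_r = K_c = K = O(1).
\begin{aligned}
\max_{i_r \in [n_r]} \| e_{i_r}' (\hat{\Pi}_r - \Pi_r P_r) \|_1
  &= O\!\left( \frac{K^{2} \left( \sqrt{C} + \sqrt{\log (2n)} \right)}
                    {\sigma_K(\tilde{P}) \sqrt{\rho n}} \right)
   = O\!\left( \frac{1}{\sigma_K(\tilde{P})} \sqrt{\frac{\log n}{\rho n}} \right), \\
\hat{f}_c
  &= O\!\left( \frac{K^{4}\, n \log (2n)}{\sigma_K^{2}(\tilde{P})\, \rho\, n^{2}} \right)
   = O\!\left( \frac{\log n}{\sigma_K^{2}(\tilde{P})\, \rho\, n} \right).
\end{aligned}
```

Both quantities vanish as $n$ grows whenever $σ_{K}^{2} (\tilde{P}) ρ n / \log n \to \infty$.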

Appendix C. Proofs under ODCNM

Appendix C.1. Proof of Proposition 2

Proof.

Since $Ω = Π_{r} P Π_{c}^{'} Θ_{c} = {\overset{ˇ}{Π}}_{r} \overset{ˇ}{P} {\overset{ˇ}{Π}}_{c}^{'} {\overset{ˇ}{Θ}}_{c} = U_{r} Λ U_{c}^{'}$, we have $U_{r} = Π_{r} U_{r} (I_{r}, :) = {\overset{ˇ}{Π}}_{r} U_{r} (I_{r}, :)$ by Lemma 2, which gives that $Π_{r} = {\overset{ˇ}{Π}}_{r}$. Since $U_{c, *} = Π_{c} B_{c} = Π_{c} U_{c, *} (I_{c}, :) = {\overset{ˇ}{Π}}_{c} U_{c, *} (I_{c}, :)$ by Lemma 2, we have $Π_{c} = {\overset{ˇ}{Π}}_{c}$. □

Appendix C.2. Proof of Lemma 2

Proof.

For $U_{r}$: since $Ω = U_{r} Λ U_{c}^{'}$ and $U_{c}^{'} U_{c} = I_{K_{r}}$, we have $U_{r} = Ω U_{c} Λ^{- 1}$. Recall that $Ω = Π_{r} P Π_{c}^{'} Θ_{c}$ under ODCNM; we have $U_{r} = Π_{r} P Π_{c}^{'} Θ_{c} U_{c} Λ^{- 1} = Π_{r} B_{r}$, where $B_{r} = P Π_{c}^{'} Θ_{c} U_{c} Λ^{- 1}$. Clearly, $U_{r} (i_{r}, :) = U_{r} ({\bar{i}}_{r}, :)$ holds when $Π_{r} (i_{r}, :) = Π_{r} ({\bar{i}}_{r}, :)$ for $i_{r}, {\bar{i}}_{r} \in [n_{r}]$.
For $U_{c}$ : let $D_{c}$ be a $K_{c} \times K_{c}$ diagonal matrix, such that $D_{c} (k, k) = \frac{∥ Θ_{c} Π_{c} {(:, k) ∥}_{F}}{∥ θ_{c} ∥_{F}}$ for $k \in [K_{c}]$ . Let $Γ_{c}$ be an $n_{c} \times K_{c}$ matrix, such that $Γ_{c} (:, k) = \frac{Θ_{c} Π_{c} (:, k)}{∥ Θ_{c} Π_{c} {(:, k) ∥}_{F}}$ for $k \in [K_{c}]$ . For such $D_{c}$ and $Γ_{c}$ , we have $Γ_{c}^{'} Γ_{c} = I_{K_{c}}$ and $Ω = Π_{r} P {∥ θ_{c} ∥}_{F} D_{c} Γ_{c}^{'}$ , i.e., $Θ_{c} Π_{c} = {∥ θ_{c} ∥}_{F} Γ_{c} D_{c}$ .
Since $Ω = U_{r} Λ U_{c}^{'}$ and $U_{r}^{'} U_{r} = I_{K_{r}}$, we have $U_{c} = Θ_{c} Π_{c} P^{'} Π_{r}^{'} U_{r} Λ^{- 1}$. Since $Θ_{c} Π_{c} = {∥ θ_{c} ∥}_{F} Γ_{c} D_{c}$, we have $U_{c} = Γ_{c} {∥ θ_{c} ∥}_{F} D_{c} P^{'} Π_{r}^{'} U_{r} Λ^{- 1} = Γ_{c} V_{c}$, where we set $V_{c} = {∥ θ_{c} ∥}_{F} D_{c} P^{'} Π_{r}^{'} U_{r} Λ^{- 1} \in R^{K_{c} \times K_{r}}$. Note that since $U_{c}^{'} U_{c} = I_{K_{r}} = V_{c}^{'} Γ_{c}^{'} Γ_{c} V_{c} = V_{c}^{'} V_{c}$, we have $V_{c}^{'} V_{c} = I_{K_{r}}$. Now, for $i_{c} \in [n_{c}], k \in [K_{r}]$, we have

$\begin{matrix} U_{c} (i_{c}, k) & = e_{i_{c}}^{'} U_{c} e_{k} = e_{i_{c}}^{'} Γ_{c} V_{c} e_{k} = Γ_{c} (i_{c}, :) V_{c} e_{k} \\ = θ_{c} (i_{c}) [\frac{Π_{c} (i_{c}, 1)}{∥ Θ_{c} Π_{c} {(:, 1) ∥}_{F}} \frac{Π_{c} (i_{c}, 2)}{∥ Θ_{c} Π_{c} {(:, 2) ∥}_{F}} \dots \frac{Π_{c} (i_{c}, K_{c})}{∥ Θ_{c} Π_{c} (:, K_{c}) ∥_{F}}] V_{c} e_{k} \\ = \frac{θ_{c} (i_{c})}{∥ Θ_{c} Π_{c} (:, ℓ (i_{c})) ∥_{F}} V_{c} (ℓ (i_{c}), k), \end{matrix}$

which gives that

$\begin{matrix} U_{c} (i_{c}, :) & = \frac{θ_{c} (i_{c})}{∥ Θ_{c} Π_{c} (:, ℓ (i_{c})) ∥_{F}} [V_{c} (ℓ (i_{c}), 1) \; V_{c} (ℓ (i_{c}), 2) \; \dots \; V_{c} (ℓ (i_{c}), K_{r})] \\ & = \frac{θ_{c} (i_{c})}{∥ Θ_{c} Π_{c} (:, ℓ (i_{c})) ∥_{F}} V_{c} (ℓ (i_{c}), :) . \end{matrix}$

Then, we have

$\begin{matrix} U_{c, *} (i_{c}, :) = \frac{V_{c} (ℓ (i_{c}), :)}{∥ V_{c} (ℓ (i_{c}), :) ∥_{F}} . \end{matrix}$

(A6)

Clearly, we have $U_{c, *} (i_{c}, :) = U_{c, *} ({\bar{i}}_{c}, :)$ when $ℓ (i_{c}) = ℓ ({\bar{i}}_{c})$ for $i_{c}, {\bar{i}}_{c} \in [n_{c}]$. Let $B_{c} \in R^{K_{c} \times K_{r}}$ be such that $B_{c} (l, :) = \frac{V_{c} (l, :)}{∥ V_{c} (l, :) ∥_{F}}$ for $l \in [K_{c}]$. Equation (A6) gives $U_{c, *} = Π_{c} B_{c}$, which guarantees the existence of $B_{c}$.
Now, we consider the case $K_{r} = K_{c} = K$. Since $V_{c} \in R^{K_{c} \times K_{r}}$ and $U_{c} = Γ_{c} V_{c} \in R^{n_{c} \times K_{r}}$, we have $V_{c} \in R^{K \times K}$ and $rank (V_{c}) = K$. Since $V_{c}^{'} V_{c} = I_{K_{r}}$, we have $V_{c}^{'} V_{c} = I_{K}$ when $K_{r} = K_{c} = K$. Then, we have

$\begin{matrix} V_{c}^{'} V_{c} = I_{K} \Rightarrow V_{c}^{'} V_{c} V_{c}^{'} = V_{c}^{'} \Rightarrow V_{c}^{'} (V_{c} V_{c}^{'} - I_{K}) = 0 \overset{rank (V_{c}) = K}{\Rightarrow} V_{c} V_{c}^{'} = I_{K} . \end{matrix}$

(A7)

Since $V_{c} V_{c}^{'} = V_{c}^{'} V_{c} = I_{K}$, the rows of $V_{c}$ are orthonormal, so we have $U_{c, *} (i_{c}, :) = V_{c} (ℓ (i_{c}), :)$ by Equation (A6), and $∥ U_{c, *} (i_{c}, :) - U_{c, *} ({\bar{i}}_{c}, :) ∥_{F} = {∥ V_{c} (ℓ (i_{c}), :) - V_{c} (ℓ ({\bar{i}}_{c}), :) ∥}_{F} = \sqrt{2}$ when $ℓ (i_{c}) \neq ℓ ({\bar{i}}_{c})$ for $i_{c}, {\bar{i}}_{c} \in [n_{c}]$, i.e., $∥ B_{c} (k, :) - B_{c} (l, :) ∥_{F} = \sqrt{2}$ for $k \neq l \in [K]$.
Note that, when $K_{r} < K_{c}$, since $rank (V_{c}) = K_{r}$ and $V_{c} \in R^{K_{c} \times K_{r}}$, the inverse of $V_{c}$ does not exist. Consequently, the last implication in Equation (A7) fails, and $∥ B_{c} (k, :) - B_{c} (l, :) ∥_{F} = \sqrt{2}$ is no longer guaranteed for $k \neq l \in [K_{c}]$.

□
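
The $\sqrt{2}$ separation of the row-normalized singular vectors in the case $K_{r} = K_{c} = K$ can be checked numerically. The sketch below is illustrative only; the toy inputs `Pi_r`, `P`, `labels_c`, and `theta_c` are hypothetical, and the population matrix is built as $Ω = Π_{r} P Π_{c}^{'} Θ_{c}$ as in the proof.

```python
import numpy as np

K = 3
labels_c = np.array([0, 0, 1, 1, 1, 2, 2])
Pi_c = np.eye(K)[labels_c]
theta_c = np.array([0.9, 0.4, 0.7, 0.3, 0.5, 0.8, 0.6])  # column degree parameters
Pi_r = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.5, 0.3, 0.2],
                 [0.1, 0.6, 0.3]])
P = np.array([[0.9, 0.2, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.3, 0.7]])
Omega = Pi_r @ P @ Pi_c.T @ np.diag(theta_c)

# Right singular vectors and their row-normalized version U_{c,*}.
U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_c = Vt[:K].T
U_c_star = U_c / np.linalg.norm(U_c, axis=1, keepdims=True)

# Pairwise distances between rows of U_{c,*}.
D = np.linalg.norm(U_c_star[:, None, :] - U_c_star[None, :, :], axis=2)
same = labels_c[:, None] == labels_c[None, :]

print(np.allclose(D[same], 0.0))             # identical rows within a cluster
print(np.allclose(D[~same], np.sqrt(2.0)))   # distance sqrt(2) across clusters
```

The degree parameters only rescale the rows of $U_{c}$ by positive factors, which is why the row normalization removes them.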

Appendix C.3. Proof of Theorem 2

Proof.

First, by the proof of Lemma 4.3 of [25], we have the lemma below.

Lemma A2.

(Row-wise singular vector error) Under $ODCNM_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})$, when Assumption 2 holds, suppose that $σ_{K_{r}} (Ω) \geq C \sqrt{θ_{c, \max} (n_{r} + n_{c}) \log (n_{r} + n_{c})}$. Then, with probability at least $1 - o ({(n_{r} + n_{c})}^{- α})$,

$\begin{matrix} ∥ {\hat{U}}_{r} {\hat{U}}_{r}^{'} - U_{r} U_{r}^{'} ∥_{2 \to \infty} = O (\frac{\sqrt{θ_{c, \max} K_{r}} (κ (Ω) \sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{θ_{c, \min} σ_{K_{r}} (P) σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}}}) . \end{matrix}$

For the row nodes, when the conditions in Lemma A2 hold, by Theorem 2 of [21], we have

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (ϖ κ (Π_{r}^{'} Π_{r}) K_{r} \sqrt{λ_{1} (Π_{r}^{'} Π_{r})}) . \end{matrix}$

Next, we focus on the column nodes. By the proof of Lemma 3 in [19], there is an orthogonal matrix $\hat{O}$ such that

$\begin{matrix} ∥ {\hat{U}}_{c} \hat{O} - U_{c} ∥_{F} \leq \frac{2 \sqrt{2 K_{r}} ∥ A - Ω ∥}{\sqrt{λ_{K_{r}} (Ω^{'} Ω)}} . \end{matrix}$

(A8)

Under $ODCNM_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})$, by Lemma 4 of [25], we have

$\begin{matrix} \sqrt{λ_{K_{r}} (Ω^{'} Ω)} \geq θ_{c, \min} σ_{K_{r}} (P) σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}} . \end{matrix}$

(A9)

By Lemma 4.2 of [25], when Assumption 2 holds, with probability at least $1 - o ({(n_{r} + n_{c})}^{- α})$, we have

$\begin{matrix} ∥ A - Ω ∥ = O (\sqrt{\max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n_{r} + n_{c})}) . \end{matrix}$

(A10)

Substituting the two bounds in Equations (A9) and (A10) into Equation (A8), we have

$\begin{matrix} ∥ {\hat{U}}_{c} \hat{O} - U_{c} ∥_{F} \leq C \frac{\sqrt{K_{r} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n_{r} + n_{c})}}{σ_{K_{r}} (P) θ_{c, \min} σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}}} . \end{matrix}$

(A11)

For $i_{c} \in [n_{c}]$, by basic algebra, we have

$\begin{matrix} ∥ {\hat{U}}_{c, *} (i_{c}, :) \hat{O} - U_{c, *} (i_{c}, :) ∥_{F} \leq \frac{2 ∥ {\hat{U}}_{c} (i_{c}, :) \hat{O} - U_{c} (i_{c}, :) ∥_{F}}{∥ U_{c} (i_{c}, :) ∥_{F}} . \end{matrix}$
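
The "basic algebra" behind this row-normalization bound can be spelled out as follows. This is a sketch, with the shorthand $x$ and $y$ introduced here only for the derivation; it assumes both vectors are nonzero and uses the fact that the orthogonal matrix $\hat{O}$ preserves norms, together with the reverse triangle inequality.

```latex
% Shorthand introduced for this sketch:
%   x = \hat{U}_c(i_c,:) \hat{O},  y = U_c(i_c,:),  so that
%   \hat{U}_{c,*}(i_c,:) \hat{O} = x / \|x\|_F  and  U_{c,*}(i_c,:) = y / \|y\|_F .
\begin{aligned}
\left\| \frac{x}{\|x\|_F} - \frac{y}{\|y\|_F} \right\|_F
&= \left\| \frac{x \left( \|y\|_F - \|x\|_F \right)}{\|x\|_F \, \|y\|_F}
          + \frac{x - y}{\|y\|_F} \right\|_F \\
&\le \frac{\bigl| \|y\|_F - \|x\|_F \bigr|}{\|y\|_F} + \frac{\|x - y\|_F}{\|y\|_F} \\
&\le \frac{2 \, \|x - y\|_F}{\|y\|_F} .
\end{aligned}
```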

Setting $m_{c} = \min_{1 \leq i_{c} \leq n_{c}} {∥ U_{c} (i_{c}, :) ∥}_{F}$, we have

$\begin{matrix} ∥ {\hat{U}}_{c, *} \hat{O} - U_{c, *} ∥_{F} = \sqrt{\sum_{i_{c} = 1}^{n_{c}} {∥ {\hat{U}}_{c, *} (i_{c}, :) \hat{O} - U_{c, *} (i_{c}, :) ∥}_{F}^{2}} \leq \frac{2 ∥ {\hat{U}}_{c} \hat{O} - U_{c} ∥_{F}}{m_{c}} . \end{matrix}$

Next, we provide a lower bound on $m_{c}$. By the proof of Lemma 2, we have

$\begin{matrix} ∥ U_{c} (i_{c}, :) ∥_{F} & = ∥ \frac{θ_{c} (i_{c})}{∥ Θ_{c} Π_{c} (:, ℓ (i_{c})) ∥_{F}} V_{c} (ℓ (i_{c}), :) ∥_{F} = \frac{θ_{c} (i_{c})}{∥ Θ_{c} Π_{c} (:, ℓ (i_{c})) ∥_{F}} {∥ V_{c} (ℓ (i_{c}), :) ∥}_{F} \\ & \geq \frac{θ_{c} (i_{c})}{∥ Θ_{c} Π_{c} (:, ℓ (i_{c})) ∥_{F}} m_{V_{c}} \geq \frac{θ_{c, \min}}{θ_{c, \max} \sqrt{n_{c, \max}}} m_{V_{c}}, \end{matrix}$

where we set $m_{V_{c}} = \min_{k \in [K_{c}]} {∥ V_{c} (k, :) ∥}_{F}$. Note that when $K_{r} = K_{c} = K$, by the proof of Lemma 2, we know that $V_{c} V_{c}^{'} = I_{K}$, which gives that $∥ V_{c} (k, :) ∥_{F} = 1$ for $k \in [K]$; i.e., $m_{V_{c}} = 1$ when $K_{r} = K_{c} = K$. However, when $K_{r} < K_{c}$, it is challenging to obtain a positive lower bound on $m_{V_{c}}$. From the lower bound on $∥ U_{c} (i_{c}, :) ∥_{F}$ above, we have $\frac{1}{m_{c}} \leq \frac{θ_{c, \max} \sqrt{n_{c, \max}}}{θ_{c, \min} m_{V_{c}}}$. Then, by Equation (A11), we have

$\begin{matrix} ∥ {\hat{U}}_{c, *} \hat{O} - U_{c, *} ∥_{F} = O (\frac{θ_{c, \max} \sqrt{K_{r} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}}{σ_{K_{r}} (P) θ_{c, \min}^{2} m_{V_{c}} σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}}}) . \end{matrix}$

Let $ς > 0$ be a small quantity; by Lemma 2 in [15], if

$\begin{matrix} \frac{\sqrt{K_{c}}}{ς} ∥ U_{c, *} - {\hat{U}}_{c, *} \hat{O} ∥_{F} (\frac{1}{\sqrt{n_{c, k}}} + \frac{1}{\sqrt{n_{c, l}}}) \leq {∥ B_{c} (k, :) - B_{c} (l, :) ∥}_{F}, \text{ for each } 1 \leq k \neq l \leq K_{c}, \end{matrix}$

(A12)

then the clustering error satisfies ${\hat{f}}_{c} = O (ς^{2})$. Setting $ς = \frac{2}{δ_{c}} \sqrt{\frac{K_{c}}{n_{c, \min}}} {∥ U_{c, *} - {\hat{U}}_{c, *} \hat{O} ∥}_{F}$ makes Equation (A12) hold for all $1 \leq k \neq l \leq K_{c}$. Then, we have ${\hat{f}}_{c} = O (ς^{2}) = O (\frac{K_{c} {∥ U_{c, *} - {\hat{U}}_{c, *} \hat{O} ∥}_{F}^{2}}{δ_{c}^{2} n_{c, \min}})$. By the bound on $∥ {\hat{U}}_{c, *} \hat{O} - U_{c, *} ∥_{F}$ obtained from Equation (A11) above, we have

$\begin{matrix} {\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} K_{r} K_{c} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (P) θ_{c, \min}^{4} δ_{c}^{2} m_{V_{c}}^{2} σ_{K_{r}}^{2} (Π_{r}) n_{c, K_{r}} n_{c, \min}}) . \end{matrix}$

In particular, when $K_{r} = K_{c} = K$, Lemma 2 gives $δ_{c} = \sqrt{2}$ under $ODCNM_{n_{r}, n_{c}} (K_{r}, K_{c}, P, Π_{r}, Π_{c}, Θ_{c})$, and we also have $m_{V_{c}} = 1$ and $n_{c, K_{r}} = n_{c, \min}$. Hence, when $K_{r} = K_{c} = K$, we have

$\begin{matrix} {\hat{f}}_{c} = O (\frac{θ_{c, \max}^{2} K^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K}^{2} (P) θ_{c, \min}^{4} σ_{K}^{2} (Π_{r}) n_{c, \min}^{2}}) . \end{matrix}$

□

Appendix C.4. Proof of Corollary 2

Proof.

For the row nodes, under the conditions of Corollary 2, we have

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (ϖ K_{r} \sqrt{\frac{n_{r}}{K_{r}}}) = O (ϖ \sqrt{K n_{r}}) . \end{matrix}$

Under the conditions of Corollary 2, $κ (Ω) = O (1)$ and $μ \leq C \frac{θ_{c, \max}^{2}}{θ_{c, \min}^{2}} \leq C$ for some $C > 0$ by Lemma 2 of [25]. Then, by Lemma A2, we have

$\begin{matrix} ϖ & = O (\frac{\sqrt{θ_{c, \max} K_{r}} (κ (Ω) \sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{θ_{c, \min} σ_{K_{r}} (P) σ_{K_{r}} (Π_{r}) \sqrt{n_{c, K_{r}}}}) \\ & = O (\frac{\sqrt{θ_{c, \max} K} (κ (Ω) \sqrt{\frac{\max (n_{r}, n_{c}) μ}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{θ_{c, \min} σ_{K} (P) σ_{K} (Π_{r}) \sqrt{n_{c, \min}}}) \\ & = O (\frac{K^{1.5} \sqrt{θ_{c, \max}} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{θ_{c, \min} σ_{K} (P) \sqrt{n_{r} n_{c}}}), \end{matrix}$

which gives that

$\begin{matrix} \max_{i_{r} \in [n_{r}]} {∥ e_{i_{r}}^{'} ({\hat{Π}}_{r} - Π_{r} P_{r}) ∥}_{1} = O (\frac{K^{2} \sqrt{θ_{c, \max}} (\sqrt{\frac{C \max (n_{r}, n_{c})}{\min (n_{r}, n_{c})}} + \sqrt{\log (n_{r} + n_{c})})}{θ_{c, \min} σ_{K} (P) \sqrt{n_{c}}}) . \end{matrix}$

The reason why we do not consider the case $K_{r} < K_{c}$ for the row nodes is similar to that given in the proof of Corollary 1, and we omit it here.

For the column nodes, under the conditions of Corollary 2, we have

$\begin{matrix} {\hat{f}}_{c} & = O (\frac{θ_{c, \max}^{2} K_{r} K_{c} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (P) θ_{c, \min}^{4} δ_{c}^{2} m_{V_{c}}^{2} σ_{K_{r}}^{2} (Π_{r}) n_{c, K_{r}} n_{c, \min}}) \\ & = O (\frac{θ_{c, \max}^{2} K_{r}^{2} K_{c}^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n_{r} + n_{c})}{σ_{K_{r}}^{2} (P) θ_{c, \min}^{4} δ_{c}^{2} m_{V_{c}}^{2} n_{r} n_{c}}) . \end{matrix}$

For the case $K_{r} = K_{c} = K$, we have

$\begin{matrix} {\hat{f}}_{c} & = O (\frac{θ_{c, \max}^{2} K^{2} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) n_{c, \max} \log (n_{r} + n_{c})}{σ_{K}^{2} (P) θ_{c, \min}^{4} σ_{K}^{2} (Π_{r}) n_{c, \min}^{2}}) \\ & = O (\frac{θ_{c, \max}^{2} K^{4} \max (θ_{c, \max} n_{r}, ∥ θ_{c} ∥_{1}) \log (n_{r} + n_{c})}{σ_{K}^{2} (P) θ_{c, \min}^{4} n_{r} n_{c}}) . \end{matrix}$

When $n_{r} = O (n), n_{c} = O (n), K_{r} = O (1)$, and $K_{c} = O (1)$, the corollary follows immediately by basic algebra. □
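
As with Corollary 1, the final simplification can be written out explicitly. The block below is a sketch under the simplifying assumption $n_{r} = n_{c} = n$ and $K_{r} = K_{c} = K = O (1)$; the only extra step is the elementary bound $∥ θ_{c} ∥_{1} \leq θ_{c, \max} n_{c}$, which collapses the maximum in the numerator.

```latex
% Assumption for this sketch: n_r = n_c = n and K_r = K_c = K = O(1).
% Since \|\theta_c\|_1 \le \theta_{c,\max} n_c = \theta_{c,\max} n, we have
%   \max(\theta_{c,\max} n_r, \|\theta_c\|_1) = \theta_{c,\max} n .
\begin{aligned}
\max_{i_r \in [n_r]} \| e_{i_r}' (\hat{\Pi}_r - \Pi_r P_r) \|_1
  &= O\!\left( \frac{\sqrt{\theta_{c,\max} \log n}}
                    {\theta_{c,\min}\, \sigma_K(P) \sqrt{n}} \right), \\
\hat{f}_c
  &= O\!\left( \frac{\theta_{c,\max}^{2} \cdot \theta_{c,\max} n \cdot \log (2n)}
                    {\sigma_K^{2}(P)\, \theta_{c,\min}^{4}\, n^{2}} \right)
   = O\!\left( \frac{\theta_{c,\max}^{3} \log n}
                    {\sigma_K^{2}(P)\, \theta_{c,\min}^{4}\, n} \right).
\end{aligned}
```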

References

1. Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
2. Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
3. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
4. Fortunato, S.; Hric, D. Community detection in networks: A user guide. Phys. Rep. 2016, 659, 1–44.
5. Goldenberg, A.; Zheng, A.X.; Fienberg, S.E.; Airoldi, E.M. A survey of statistical network models. Found. Trends Mach. Learn. 2010, 2, 129–233.
6. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
7. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
8. Airoldi, E.M.; Blei, D.M.; Fienberg, S.E.; Xing, E.P. Mixed Membership Stochastic Blockmodels. J. Mach. Learn. Res. 2008, 9, 1981–2014.
9. Jin, J.; Ke, Z.T.; Luo, S. Estimating network memberships by simplex vertex hunting. arXiv 2017, arXiv:1708.07852.
10. Zhang, Y.; Levina, E.; Zhu, J. Detecting Overlapping Communities in Networks Using Spectral Methods. SIAM J. Math. Data Sci. 2020, 2, 265–283.
11. Rohe, K.; Chatterjee, S.; Yu, B. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Stat. 2011, 39, 1878–1915.
12. Qin, T.; Rohe, K. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 2013, 26, 3120–3128.
13. Lei, J.; Rinaldo, A. Consistency of spectral clustering in stochastic block models. Ann. Stat. 2015, 43, 215–237.
14. Jin, J. Fast community detection by SCORE. Ann. Stat. 2015, 43, 57–89.
15. Joseph, A.; Yu, B. Impact of regularization on spectral clustering. Ann. Stat. 2016, 44, 1765–1791.
16. Mao, X.; Sarkar, P.; Chakrabarti, D. Overlapping Clustering Models, and One (class) SVM to Bind Them All. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31, pp. 2126–2136.
17. Mao, X.; Sarkar, P.; Chakrabarti, D. Estimating mixed memberships with sharp eigenvector deviations. J. Am. Stat. Assoc. 2020, 116, 1–13.
18. Rohe, K.; Qin, T.; Yu, B. Co-clustering directed graphs to discover asymmetries and directional communities. Proc. Natl. Acad. Sci. USA 2016, 113, 12679–12684.
19. Zhou, Z.; Amini, A.A. Analysis of spectral clustering algorithms for community detection: The general bipartite setting. J. Mach. Learn. Res. 2019, 20, 1–47.
20. Wang, Z.; Liang, Y.; Ji, P. Spectral algorithms for community detection in directed networks. J. Mach. Learn. Res. 2020, 21, 1–45.
21. Qing, H.; Wang, J. Directed mixed membership stochastic blockmodel. arXiv 2021, arXiv:2101.02307v2.
22. Airoldi, E.M.; Wang, X.; Lin, X. Multi-way blockmodels for analyzing coordinated high-dimensional responses. Ann. Appl. Stat. 2013, 7, 2431–2457.
23. Razaee, Z.S.; Amini, A.A.; Li, J.J. Matched bipartite block model with covariates. J. Mach. Learn. Res. 2019, 20, 1174–1217.
24. Zhou, Z.; Amini, A.A. Optimal bipartite network clustering. J. Mach. Learn. Res. 2020, 21, 1–68.
25. Qing, H. Directed degree corrected mixed membership model and estimating community memberships in directed networks. arXiv 2021, arXiv:2109.07826.
26. Ndaoud, M.; Sigalla, S.; Tsybakov, A.B. Improved clustering algorithms for the bipartite stochastic block model. IEEE Trans. Inf. Theory 2021, 68, 1960–1975.
27. Gillis, N.; Vavasis, S.A. Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization. SIAM J. Optim. 2015, 25, 677–698.
28. Qing, H. A useful criterion on studying consistent estimation in community detection. Entropy 2022, 24, 1098.
29. Schellenberger, J.; Park, J.O.; Conrad, T.M.; Palsson, B.Ø. BiGG: A Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinform. 2010, 11, 1–10.
30. Adamic, L.A.; Glance, N. The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 36–43.
31. Kunegis, J. KONECT: The Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1343–1350.
32. Leicht, E.A.; Newman, M.E. Community structure in directed networks. Phys. Rev. Lett. 2008, 100, 118703.
33. Zhang, H.; Guo, X.; Chang, X. Randomized spectral clustering in large-scale stochastic block models. J. Comput. Graph. Stat. 2022, 1–20.

Figure 1. Estimation errors of ONA and ODCNA.

Figure 2. For adjacency matrix in panel (a), MHamm and Hamm for ONA are 0.0544 and 0, respectively. For adjacency matrix in panel (b), MHamm and Hamm for ONA are 0.1004 and 0, respectively. ODCNA enjoys same error rates as ONA. x-axis: row nodes; y-axis: column nodes.

Figure 3. Illustration of a simulated directed network generated under ONM. Panels (a–c) show the adjacency matrix, the sending clusters, and the receiving clusters of this simulated directed network, respectively. For this directed network, MHamm and Hamm for ONA (and ODCNA) are 0.0615 (0.0615) and 0 (0.2333), respectively. In panels (b,c), the dots in the same color are pure nodes in the same sending (receiving) clusters, and the square indicates the mixed nodes with weight 0.7 belonging to red sending clusters, and weight 0.3 belonging to blue sending clusters, where the sending and receiving clusters are obtained by $Π_{r}$ and $ℓ$ provided in Remark 4.

Figure 4. Leading 20 singular values of adjacency matrices for real-world directed networks used in this paper.

Figure 5. Sending and receiving clusters detected by ODCNA for Metabolic network when assuming that nodes in a sending (receiving) pattern have an overlapping (non-overlapping) property. Colors indicate clusters detected using ODCNA, and squares indicate highly mixed nodes, where sending clusters are obtained using ${\hat{ℓ}}_{r}$, the home base sending pattern community, and receiving clusters are obtained by $\hat{ℓ}$ from ODCNA.

Figure 6. Sending and receiving clusters detected by ODCNA for Political blogs network. Colors indicate clusters and squares indicate highly mixed nodes.

Figure 7. Sending and receiving clusters detected by ODCNA for Wikipedia links (crh) network. Colors indicate clusters and squares indicate highly mixed nodes.

Figure 8. Sending and receiving clusters detected by ODCNA for Wikipedia links (dv) network. Colors indicate clusters and squares indicate highly mixed nodes.

Table 1. The proportion of highly mixed nodes and the asymmetric structure measured by ${Hamm}_{r c}$ for real-world directed networks considered in this paper when ODCNA’s input adjacency matrix is $A$; i.e., the case when assuming that nodes in a sending (receiving) pattern have overlapping (non-overlapping) property.

Data	$τ$	${Hamm}_{rc}$
Metabolic	0.1209	0.2497
Political blogs	0.0246	0.0443
Wikipedia links (crh)	0.0444	0.0307
Wikipedia links (dv)	0.4089	0.1466

Table 2. The proportion of highly mixed nodes and the asymmetric structure measured by ${Hamm}_{r c}$ for real-world directed networks considered in this paper when ODCNA’s input adjacency matrix is $A^{'}$, i.e., the case when assuming that nodes in sending (receiving) pattern have non-overlapping (overlapping) property.

Data	$τ$	${Hamm}_{rc}$
Metabolic	0.0594	0.2945
Political blogs	0.1365	0.0443
Wikipedia links (crh)	0.1308	0.0543
Wikipedia links (dv)	0.3492	0.2059

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Qing, H. Studying Asymmetric Structure in Directed Networks by Overlapping and Non-Overlapping Models. Entropy 2022, 24, 1216. https://doi.org/10.3390/e24091216

