**1. Background**

Inferring the structural change of a network under different conditions is essential in many problems arising in biology, medicine, and other scientific fields. For instance, in genomics, it is often of interest to study the structural change of a genetic pathway between diseased and normal groups. In the field of brain mapping, it is critical to identify differences in brain connectivity between groups (for example, the brain connectivity networks of normal subjects and patients often possess different structures). Most of these applications have relied on the prevailing Gaussian graphical models (GGMs) because of their good interpretability and computational convenience, and there is a rich and growing literature on learning differential networks under GGMs. To name a few, Guo et al. (2015) [1] introduced a joint estimation method for multiple GGMs via a group lasso approach, under the assumption that the GGMs being studied are sparse and differ only in a small portion of edges. Danaher et al. (2014) [2] proposed a fused graphical lasso method that is free from the sparsity assumption on the condition-specific networks and requires only the sparsity of the differential network. Zhao et al. (2014) [3] constructed a new estimator that directly estimates the differential network defined as $\Delta = \Sigma_X^{-1} - \Sigma_Y^{-1}$, where $\Sigma_X^{-1}$ and $\Sigma_Y^{-1}$ denote the two condition-specific precision matrices and $\Delta$, $\Sigma_X^{-1}$, $\Sigma_Y^{-1}$ have the same dimension. Liu (2017) [4] presented a new test to simultaneously study structural similarities and differences between multiple high-dimensional GGMs, which adopts partial correlation coefficients to characterize potential changes in the dependency strength between two variables.

Most of the aforementioned algorithms were based upon penalized likelihood maximization. Although some of them were consistent under certain regularity conditions, they failed to control the false discovery rate (FDR) of the substructure detection, as it is difficult to choose a tuning parameter that controls the FDR at the desired level [1–3]. One exception is Liu (2017), who introduced a hierarchical testing framework to adjust for multiplicity. Liu's test was constructed to asymptotically control the FDR while keeping satisfactory statistical power. Simulation studies in [4] have shown that this new test exhibits substantial power gains over existing methods such as the graphical lasso. One major drawback that limits the application of Liu's test is the Gaussian assumption, which is often violated in practice, especially in genomics. For instance, digital measurements of gene expression level such as RNA-Seq data often greatly deviate from normality even after log-transformation or other variance-stabilizing transformations. In this paper, we aim to extend Liu's work to a more flexible semiparametric framework, namely the nonparanormal graphical models (NPNGMs), where the random variables are assumed to follow a multivariate normal distribution after a set of monotonically increasing transformations. We use a novel rank-based multiple testing method to detect the structural difference between multiple networks from non-Gaussian data. The method is computationally efficient and asymptotically controls the FDR at a desired level. To begin with, we give the formal definition of the nonparanormal distribution:

**Definition 1.** *A random vector* $Y = (Y_1, Y_2, \dots, Y_p)$ *follows a nonparanormal distribution if there exists a set of univariate and monotonically increasing transformations,* $f = (f_1, \dots, f_p)$*, such that:*

$$(X_1, \dots, X_p) \equiv (f_1(Y_1), \dots, f_p(Y_p)) \sim N(\mu, \Sigma),$$

*where* $\mu$ *and* $\Sigma$ *denote the mean vector and covariance matrix of the multivariate normal distribution, respectively. The distribution of* $Y$ *depends on these three parameters and can generally be written as* $Y \sim NPN(\mu, \Sigma, f)$.
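For intuition, Definition 1 can be illustrated by simulation: drawing a latent Gaussian vector $X$ and applying the inverse transformations $f_j^{-1}$ yields a nonparanormal sample. Below is a minimal sketch of the bivariate case under the hypothetical choice $f_1 = f_2 = \log$ (so $Y_j = \exp(X_j)$ is lognormal); the function name and setup are ours, not from the literature:

```python
import math
import random

def sample_npn(n, rho, seed=0):
    """Draw n bivariate nonparanormal observations whose latent
    Gaussian coordinates have correlation rho, with f_j = log
    (equivalently Y_j = exp(X_j))."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = z1                                      # X ~ N(0, Sigma)
        x2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # corr(X1, X2) = rho
        xs.append((x1, x2))
        ys.append((math.exp(x1), math.exp(x2)))       # Y_j = f_j^{-1}(X_j)
    return xs, ys

xs, ys = sample_npn(500, rho=0.6)
# Monotone transformations preserve ranks: the ordering of the observed
# Y_1 coordinate matches the ordering of the latent X_1 exactly.
order_x = sorted(range(500), key=lambda m: xs[m][0])
order_y = sorted(range(500), key=lambda m: ys[m][0])
print(order_x == order_y)  # True
```

This rank-preservation property is exactly what makes rank-based estimators attractive for nonparanormal data: ranks of $Y$ carry the same information as ranks of the unobserved $X$.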

By Definition 1 and Sklar's theorem, it is easy to verify that when the transformation functions $f_j$'s are all differentiable, the nonparanormal distribution $NPN(\mu, \Sigma, f)$ is equivalent to a Gaussian copula [5]. As graphical models, NPNGMs are much more flexible than GGMs in modeling non-Gaussian data while retaining the interpretability of the latter. Some recent studies have established the estimation and properties of high-dimensional nonparanormal graphical models. For example, Liu et al. (2009) [5], who first studied high-dimensional NPNGMs, bridged the estimation of GGMs and NPNGMs by a nonparametric and truncated (Winsorized) estimator of the unknown transformation functions. Xue and Zou (2012) [6] proposed an adjusted Spearman's correlation to estimate the structure of high-dimensional NPNGMs, and they showed that the rank-based estimator achieves the same rate of convergence as its oracle counterpart (i.e., the estimator assuming known transformation functions). Despite these advances in single-NPNGM estimation, to the best of our knowledge, the inference of the differential substructure between multiple NPNGMs has not been studied. In this paper, we tackle this problem by embedding the Winsorized estimator into the testing framework of Liu (2017). Under some regularity conditions, we show that the new test statistic converges to the same distribution as its oracle counterpart [4].
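The rank-based idea underlying the adjusted Spearman's correlation can be sketched in a few lines: compute Spearman's rank correlation $r_s$ and apply the adjustment $2\sin(\pi r_s/6)$, which targets the latent Gaussian correlation and is invariant to monotone transformations of the margins. A minimal illustration (no ties assumed; the helper names are ours):

```python
import math

def spearman(x, y):
    """Spearman's rank correlation via the classical d^2 formula
    (assumes no tied values)."""
    n = len(x)
    rx = [sorted(x).index(v) + 1 for v in x]
    ry = [sorted(y).index(v) + 1 for v in y]
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def adjusted_spearman(x, y):
    """Rank-based estimate of the latent Gaussian correlation:
    unchanged under any monotone transformation of either margin."""
    return 2 * math.sin(math.pi * spearman(x, y) / 6)

x = [0.2, 1.5, -0.7, 2.1, 0.9]
y = [math.exp(v) for v in x]     # a monotone transformation of x
print(adjusted_spearman(x, y))   # ~1.0: the ranks are identical
```

Because only ranks enter the computation, the unknown transformations $f_j$ drop out entirely, which is the key to handling nonparanormal data without estimating $f$.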

We begin with the notation and problem formulation. For a vector $a = (a_1, \dots, a_p)$, we define its $\ell_0$ norm as $\|a\|_0 = \sum_{i=1}^p I\{a_i \neq 0\}$, its $\ell_1$ norm as $\|a\|_1 = \sum_{i=1}^p |a_i|$, its $\ell_2$ norm as $\|a\|_2 = \sqrt{\sum_{i=1}^p a_i^2}$, and its $\ell_\infty$ norm as $\|a\|_\infty = \max_i |a_i|$. For a matrix $A = (a_{ij}) \in \mathbb{R}^{p \times q}$, we define its $\ell_0$ norm as $\|A\|_0 = \sum_{i,j} I\{a_{ij} \neq 0\}$, its $\ell_1$ norm as $\|A\|_1 = \sum_{i,j} |a_{ij}|$, its Frobenius norm as $\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$, and its $\ell_\infty$ norm as $\|A\|_\infty = \max_{i,j} |a_{ij}|$. Let $A_{i,-j}$ denote the $i$th row of $A$ with its $j$th entry removed, and $A_{-i,j}$ denote the $j$th column with its $i$th entry removed. We use $A_{-i,-j}$ to denote the $(p-1) \times (q-1)$ matrix obtained by removing the $i$th row and the $j$th column. For a square matrix $B$, we let $\lambda_{\max}(B)$ and $\lambda_{\min}(B)$ denote the largest and smallest eigenvalues of $B$, respectively. In addition, for a sequence of random variables $\{X_n, n = 1, 2, \dots\}$ and a sequence of constants $\{a_n, n = 1, 2, \dots\}$, $X_n = o_p(a_n)$ denotes that $X_n/a_n$ converges to zero in probability as $n$ approaches infinity, and $X_n = O_p(a_n)$ denotes that $X_n/a_n$ is stochastically bounded. If there exist positive constants $c$ and $C$ such that $c \le X_n/a_n \le C$ for all $n \ge 1$, we write $X_n \sim a_n$.
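The vector norms above translate directly into code; a quick sketch in plain Python (a numerical library would compute the same quantities):

```python
import math

def l0(a):   return sum(1 for v in a if v != 0)    # count of nonzero entries
def l1(a):   return sum(abs(v) for v in a)
def l2(a):   return math.sqrt(sum(v * v for v in a))
def linf(a): return max(abs(v) for v in a)

a = [3, 0, -4, 0]
print(l0(a), l1(a), l2(a), linf(a))  # 2 7 5.0 4

# The entrywise matrix norms treat A as one long vector; in particular
# the Frobenius norm is the l2 norm of the flattened entries.
A = [[1, -2], [2, 0]]
flat = [v for row in A for v in row]
print(l2(flat))  # Frobenius norm of A: 3.0
```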

To formulate the problem, we let $k \in \{1, 2, \dots, K\}$ be the class index, $p$ be the dimension, and $(Y_1^{(k)}, \dots, Y_{n_k}^{(k)})$ be a sample of size $n_k$ for class $k$, where $Y_m^{(k)} = (Y_{m1}^{(k)}, \dots, Y_{mp}^{(k)})^T \in \mathbb{R}^p$, $m \in \{1, \dots, n_k\}$. Under $Y_m^{(k)} \sim NPN(\mu^{(k)}, \Sigma^{(k)}, f^{(k)})$, we test the following hypothesis:

$$\begin{aligned} H_{0ij} &: \rho_{ij\cdot}^{(1)} = \rho_{ij\cdot}^{(2)} = \dots = \rho_{ij\cdot}^{(K)}, \\ H_{aij} &: \rho_{ij\cdot}^{(k)} \neq \rho_{ij\cdot}^{(k')} \text{ for some } k, k' \in \{1, \dots, K\}, \end{aligned}$$

where $1 \le i, j \le p$, $\{\Sigma^{(k)}\}^{-1} = \Omega^{(k)} = (\omega_{ij}^{(k)})$, and $\rho_{ij\cdot}^{(k)}$ represents the partial correlation coefficient between $X_i^{(k)}$ and $X_j^{(k)}$ given $X^{(k)} \setminus (X_i^{(k)}, X_j^{(k)})$, with $(X_{m1}^{(k)}, \dots, X_{mp}^{(k)}) = (f_1^{(k)}(Y_{m1}^{(k)}), \dots, f_p^{(k)}(Y_{mp}^{(k)}))$. The edge $(i, j)$ is a differential edge if $\rho_{ij\cdot}^{(k)} \neq \rho_{ij\cdot}^{(k')}$ for some $k, k' \in \{1, \dots, K\}$, and the differential network is defined as the set of all differential edges. By a well-known result in statistics, $\rho_{ij\cdot}^{(k)} = -\omega_{ij}^{(k)} / \sqrt{\omega_{ii}^{(k)} \omega_{jj}^{(k)}}$. Here, we consider an equivalent alternative of the hypothesis testing above. Similarly to [4], let

$$S_{ij}(\mathbf{\Omega}) = \sqrt{\sum_{1 \le k < k' \le K} \left(\rho_{ij\cdot}^{(k)} - \rho_{ij\cdot}^{(k')}\right)^2},\tag{1}$$

then the hypothesis testing can be simplified as

$$\begin{aligned} H_{0ij} &: S_{ij}(\mathbf{\Omega}) = 0, \\ H_{aij} &: S_{ij}(\mathbf{\Omega}) > 0. \end{aligned}$$

As $S_{ij}(\mathbf{\Omega}) = S_{ji}(\mathbf{\Omega})$, we define $\mathcal{H}_0 = \{H_{0ij}, 1 \le i < j \le p\}$ and $\mathcal{H}_a = \{H_{aij}, 1 \le i < j \le p\}$; the total number of tests is $p(p-1)/2$, i.e., $card(\mathcal{H}_0) = card(\mathcal{H}_a) = p(p-1)/2$.
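To make the quantities concrete: given the precision matrices, the partial correlations and the statistic $S_{ij}(\mathbf{\Omega})$ of Equation (1) are a few lines of arithmetic. A minimal sketch with two hypothetical $3 \times 3$ precision matrices ($K = 2$; the matrices are illustrative, not from any dataset):

```python
import math

def partial_corr(omega, i, j):
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    return -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])

def S_ij(omegas, i, j):
    """S_ij(Omega): root-sum-of-squares of the pairwise differences of
    the partial correlations across the K classes (Equation (1))."""
    rhos = [partial_corr(om, i, j) for om in omegas]
    K = len(rhos)
    return math.sqrt(sum((rhos[k] - rhos[kp]) ** 2
                         for k in range(K) for kp in range(k + 1, K)))

# Two classes whose (0, 1) partial correlations differ:
omega1 = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
omega2 = [[2.0,  0.0, 0.0], [ 0.0, 2.0, -1.0], [0.0, -1.0, 2.0]]

print(partial_corr(omega1, 0, 1))    # 0.5
print(S_ij([omega1, omega2], 0, 1))  # 0.5 -> edge (0, 1) is differential
print(S_ij([omega1, omega2], 1, 2))  # 0.0 -> edge (1, 2) is unchanged
```

In practice the precision matrices are unknown and $S_{ij}$ must be estimated from data; the next section builds the rank-based test statistic for exactly this purpose.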

The rest of this paper is structured as follows: In Section 2, we introduce the new test statistic and the multiple testing procedure. In Section 3, we perform a simulation study to evaluate the finite-sample performance of the proposed test in terms of FDR control and statistical power. In Section 4, we apply the new method to a rich genomic dataset to study the genetic differences between four breast cancer subtypes. We discuss the strengths and shortcomings of the test in Section 5. Technical proofs of the asymptotic results are provided in Appendix A.
