*2.3. Multiple Testing Procedure for FDR Control*

Now we introduce the multiple testing procedure for FDR control based on the single-class result from Equation (6). As suggested in [4], the partial correlation coefficient can be well estimated by a thresholding estimator

$$
\hat{\rho}\_{ij.}^{(k)} = \mathcal{S}\_{ij}^{(k)} I\{ |\mathcal{S}\_{ij}^{(k)}| \ge 2\sqrt{\frac{\log p}{n\_k}} \}.
$$

and we define the following two-sample test statistics

$$\mathcal{S}\_{ij}^{(k,k')} = \frac{\mathcal{S}\_{ij}^{(k)} - \mathcal{S}\_{ij}^{(k')}}{\sqrt{\frac{1}{n\_k}(1 - \{\boldsymbol{\rho}\_{ij.}^{(k)}\}^2)^2 + \frac{1}{n\_{k'}}(1 - \{\boldsymbol{\rho}\_{ij.}^{(k')}\}^2)^2}}.$$

In the multi-sample case *Sij* = (*S*(*k*,*<sup>k</sup>* ) *ij* )1≤*k*<*k*≤*K*, we consider a sum squared test statistics

$$\mathbb{S}\_{ij} = \sqrt{\sum\_{k$$

Motivated by [4] (Equations 2.6 and 2.7) and [7], we define the following statistic

$$T\_{ij} = \Phi^{-1} \mathbf{P} \left( \sqrt{\sum\_{i=1}^{M} \lambda\_i Z\_i^2} \le \mathbb{S}\_{ij} \right),$$

and constant *<sup>A</sup>* = (*P*<sup>0</sup> <sup>−</sup> *<sup>P</sup>*<sup>ˆ</sup> <sup>0</sup>)/*Q*0, where *Zi*, *i* = 1, ..., *M* represent a sequence of *M* i.i.d. standard normal random variables, *<sup>P</sup>*<sup>0</sup> <sup>=</sup> <sup>2</sup>Φ(1) <sup>−</sup> 1, *<sup>P</sup>*<sup>ˆ</sup> <sup>0</sup> <sup>=</sup> <sup>2</sup> <sup>∑</sup>1≤*i*<*j*≤*<sup>p</sup> <sup>I</sup>*{|*Tij*| ≤ <sup>1</sup>}/(*p*<sup>2</sup> <sup>−</sup> *<sup>p</sup>*), *<sup>Q</sup>*<sup>0</sup> <sup>=</sup> <sup>√</sup>2*φ*(1) and *<sup>A</sup>*(*t*)=(<sup>1</sup> <sup>+</sup> <sup>|</sup>*A*<sup>|</sup> <sup>|</sup>*t*|*φ*(*t*) <sup>√</sup>2(1−Φ(*t*)))−1. For a given 0 <sup>&</sup>lt; *<sup>α</sup>*<sup>0</sup> <sup>&</sup>lt; 1, let

$$t(a\_0) = \inf \left\{ t \in \mathbb{R}, 1 - \phi(t) \le \frac{a\_0 A(t) \max\{1, \sum\_{1 \le i < j \le p} I\{T\_{ij} \ge t\}\}}{(p^2 - p)/2} \right\}.$$

the FDR can be controlled at level *α*, if we reject *H*0*ij* : *Sij*(Ω) = 0 when *Tij* ≥ *t*(*α*0). One may refer to [7] for the detailed proof about this testing procedure.

Our proposed computational pipeline consisted of three steps: (1) Winsorized imputation for the latent Gaussian variables; (2) rank-based estimation of regression coefficients, and (3) multiple testing with FDR control. On the whole, we put forward a simple procedure to estimate the structural difference between multiple nonparanormal graphical models. The computational pipeline for a two-sample comparison has been implemented in the R package *DNetFinder*, which can be downloaded from the Comprehensive R Archive Network (CRAN).

#### **3. Numerical Study**

We performed a simulation study to evaluate the finite sample performance of the proposed procedure. In particular, we evaluated the empirical false discovery rate (eFDR) as well as the statistical power under two classes, i.e., *K* = 2. The dimension and sample size were set to be *p* = 200 and *n*<sup>1</sup> = *n*<sup>2</sup> = 100. We consider two commonly used graph-generating models including the band graph and Erd˝os–Rényi (ER) graph, and two estimators for regression coefficients including lasso estimator and Dantzig selector. Detailed set-up for precision matrices Ω<sup>1</sup> and Ω<sup>2</sup> are given below:

• **Band graph:** Ω<sup>1</sup> = (*ωij*)1≤*i*,*j*≤*<sup>p</sup>* was obtained by the following assignments

$$
\omega\_{ij} = \begin{cases} 1 & |i - j| = 0 \\ 0.6 & |i - j| = 1 \\ 0 & |i - j| \ge 2 \end{cases}
$$

We then randomly picked 50 edges in Ω<sup>1</sup> as the differential edges and changed their signs in Ω2. To ensure positive definiteness, we added max(|*λ*min(Ω1)|, |*λ*min(Ω2)|) + 0.05, to the diagonal of Ω<sup>1</sup> and Ω2.

• **Erd ˝os–Rényi (ER) graph:** Each node pair (*i*, *j*) were randomly connected with probability 5%. A correlation coefficient is generated for each edge in the network from a two-part uniform distribution [−1/2, −1/4] ∪ [1/4, 1/2]. To ensure positive-definiteness, we shrunk the correlations by a factor of 5 and the diagonals were set to be one for Ω1. We then randomly selected 5% of the edges as the differential edges, and changed their signs in Ω2.

For each graph, we generated the latent Gaussian data (oracle data) from *<sup>N</sup>*(0,Ω−1), <sup>Ω</sup> ∈ {Ω1,Ω2}, and a Winsorized estimator with truncation parameter *<sup>δ</sup><sup>n</sup>* <sup>=</sup> 1/(4*n*1/4*<sup>π</sup>* log *<sup>n</sup>*) was used to implement our test. The performance of the proposed method was then evaluated in two aspects: false discovery rate control and statistical power. In particular, we compared the results based on oracle data and imputed data by the Winsorized estimator. Two estimators including the lasso estimator and Dantzig selector were used to estimate coefficients *β*ˆ. For oracle data, we applied the R package *flare* to calculate the solution path over a sequence of 20 candidate *λ*'s and tune by Akaike information criterion (AIC). For imputed data, we adopted the rank-based methods introduced by [6], i.e., the rank-based lasso and rank-based Dantzig selector. The simulation was repeated for 100 times for each FDR level (*α* ∈ {0.05, 0.10, 0.15, ..., 0.50}) and the average empirical FDR and statistical power were summarized.

Figures 1 and 2 compared the empirical false discovery rate (eFDR) with the desired level *α* under the band graph and ER graph. It can be seen that the empirical FDR based on imputed data is close to the one by oracle data, both close to the desired level of *α*, suggesting that the FDRs were controlled quite well for both cases. The lasso estimator works almost equally well as Dantzig selector in both settings.

**Figure 1.** Empirical false discovery rates (eFDRs) by oracle data and Winsorized imputations under the band graph setting. The *x*-axis represents the desired FDR levels from 0.05 to 0.5, and the solid line is *y* = *x*.

**Figure 2.** Empirical FDRs (eFDRs) by oracle data and Winsorized imputations under the Erd˝os–Rényi (ER) graph setting. The *x*-axis represents the desired FDR levels from 0.05 to 0.5, and the solid line is *y* = *x*.

Figures 3 and 4 summarized the statistical power of the test for the band graph and ER graph. As can be seen, the power for ER graph is substantially lower than the band graph, indicating that the complexity and denseness of the underlying differential network may significantly decrease the power of our test. The test based on oracle data performs slightly better than the imputed data, which is due to the loss of information during Winsorized imputation. Similar as we observed from Figures 1 and 2, the lasso estimator works almost equally well as Dantzig selector.

**Figure 3.** Statistical powers by oracle data and Winsorized imputations under the band graph setting. The *x*-axis represents the desired FDR levels from 0.05 to 0.5.

**Figure 4.** Statistical powers by oracle data and Winsorized estimator under the ER graph setting. The *x*-axis represents the desired FDR levels from 0.05 to 0.5.

In addition, we compared the proposed test with a direct estimator, recently developed by Zhang (2019) [8]. The direct estimator is a rank-based estimator and can be solved by a parametric simplex algorithm. We simulated the data from the Erd˝os–Rényi (ER) graph with different sample sizes (*n* = 25, 50, 100, 150) and numbers of dimensions (*p* = 40, 60, 90, 120). As the direct estimator does not control the false discovery rate, we set the FDR level at 0.05 for our proposed test. Figure 5 summarized the empirical FDR and statistical power under different sample sizes (with dimension fixed at 100) and different dimensions (with sample size fixed at 100). It can be seen that the two methods have comparable performance and our proposed test achieves lower FDR but slightly lower statistical power. However, it is noteworthy that the direct estimator is computationally expensive and becomes impractical when the dimensions exceed 150. Table 1 summarized the running time of the two methods, where it can be seen that our test is much faster than the direct estimator, especially for relatively high dimensions. For instance, when *p* = 120, the direct estimator takes hours while our test takes less than 10 seconds. As the core part of the proposed algorithm is the estimation of regression coefficients, the time complexity is the same as the linear regression. For instance, with LASSO and *p* > *n*, the time complexity is *O*(*np*2), while the direct estimator by Zhang (2019) has a time complexity *O*(*np*4).


**Table 1.** Running time of the proposed test and direct estimator (in seconds).

**Figure 5.** Comparison of the proposed test and direct estimator by Zhang (2019), in terms of empirical FDR and statistical power under different sample sizes and dimensions.

## **4. A Genomic Application**

In this part, we applied the proposed test to the Cancer Genome Atlas data (TCGA, [9]) to study the different roles of the cell cycle pathway in the two subtypes of breast cancer including luminal A subtype and basal-like subtype. The cell cycle pathway is known to play a critical role in the initiation and progression of many human cancers including breast cancer and ovarian cancer [10,11]. For instance, the cell cycle pathway provided by KEGG (Kyoto Encyclopedia of Genes and Genomes, [12]) contains 128 important genes that co-regulate cell proliferation, including *ATM*, *RB1*, *CCNE1*, and *MYC*. Abnormal regulation among these genes may cause the over-proliferation of cells and an accumulation of tumor cell numbers [11].

The transcriptome profiling data for breast cancer were downloaded through the Genomic Data Commons portal [13] in January 2017. The expression level of each gene was quantified by the count of reads mapped to the gene. The quantifications were done by software *HTSeq* of version 0.9.1 [14]. In our analysis, we excluded 43 subjects including 12 male subjects and 31 subjects with >1% missing values. In addition, we removed the effects due to different age groups and batches using a medianmatching and variance-matching strategy [10,15,16]. For example, the batch effect can be removed in the following way:

$$\mathcal{g}\_{ijk}^{\*} = \mathcal{M}\_{\bar{i}} + (\mathcal{g}\_{\bar{i}\bar{j}k} - \mathcal{M}\_{\bar{i}\bar{j}}) \frac{\mathcal{O}\_{\mathcal{S}\bar{i}}}{\mathcal{O}\_{\mathcal{S}\bar{i}\bar{j}}} \mathcal{J}$$

where *gijk* refers to the expression value for gene *i* from sample *k* in batch *j* (*j* = 1, 2, ..., *J*; *k* = 1, 2, ..., *nj*), *Mij* represents the median of *gij* = (*gij*1, ..., *gijnj* ), *Mi* refers to the median of *gi* = (*gi*1, ..., *gi J*), *σ*ˆ *gi* and *σ*ˆ *gij* stand for the standard deviations of *gi* and *gij*, respectively.

The remaining 959 breast cancer samples were further classified into five subtypes according to two molecular signatures, namely *PAM50* [17] and *SCMOD2* [18]. The two classifications were implemented separately using R package *genefu* [19] and we obtained 530 subjects with concordant classification by two classifiers. The resulting set contains 221 subjects in the luminal A group, 119 in the luminal B group, 74 in the her2-enriched group, 105 in the basal-like group, and 11 in the normal-like group. For illustration purposes, we conducted two pairwise comparisons (1) Luminal A vs basal-like and (2) Luminal B vs basal-like.

To balance the bias and variance, we choose the same truncation parameter in Winsorized imputation as in our simulation study

$$\delta\_n^{(k)} = \frac{1}{4n\_k^{1/4}\sqrt{\pi\log n\_k}}\gamma$$

where *k* ∈ {1, 2}, *n*<sup>1</sup> = 221, *n*<sup>2</sup> = 105. The proposed test based on the Winsorized estimator was then conducted for each gene pair with different FDR cutoffs. Figures 6 and 7 summarized all the identified differential edges under FDR levels *α* = 0.05, 0.10, 0.15, 0.20, with all isolated genes being removed. Our results suggested a list of important genes that play different roles in different breast cancer subtypes. For instance, in Figure 6, genes *CCNB1* and *PRKDC* contribute to several differential edges. According to recent studies, gene *CCNB1* is a prognostic biomarker for certain subtypes of breast cancer and it is closely associated with hormone therapy resistance [20]. It has also been reported in the literature that the *PRKDC* regulates chemosensitivity and is a potential prognostic and predictive marker of response to adjuvant chemotherapy in breast cancer patients [21]. Our findings about several other genes including *CHEK2* and *CDC7* also confirmed some existing reports [22,23]. As we observed from the two examples, as the desired FDR level increases, the resulting differential network tends to be denser and denser (Figure 8 showed the correlation between FDR and the number of differential edges). In practice, users should consider the trade-off between the accuracy (FDR) and number of new hypotheses (number of differential edges) and choose an appropriate FDR [24].

**Figure 6.** *Cont.*

**Figure 6.** The inferred differential networks between the LumA and Basal-like subtypes under different desired FDR levels: (**a**) 0.05; (**b**) 0.10; (**c**) 0.15; (**d**) 0.20, with all isolated genes being removed. Each connection in the network represents an identified differential edge.

**Figure 7.** The inferred differential networks between the LumB and Basal-like subtypes under different desired FDR levels: (**a**) 0.05; (**b**) 0.10; (**c**) 0.15; (**d**) 0.20, with all isolated genes being removed. Each connection in the network represents an identified differential edge.

**FDR vs No. of differential edges**

**Figure 8.** Desired FDR level against the number of differential edges.

#### **5. Discussion**

Detecting the differential substructure on multiple graphical models is a fundamental and challenging problem in statistics. Liu (2017) studied the problem under the Gaussian framework and introduced an elegant hierarchical test based on the estimation of single GGM. Unlike most existing methods, Liu's approach asymptotically controlled the false discovery rate at a nominal level, which guarantees the quality of the estimated differential network. In this work, we further extended Liu's test to a more flexible semiparametric framework, namely the nonparanormal graphical models. Our test is built upon a Winsorized estimator of the unknown transformation functions and it enjoys similar asymptotic properties as its oracle counterpart does.

Although the new test holds great promise in many applications such as genetic network modeling, it has some practical limitations. First, as we see from the theoretical derivation, the good performance of the test relied on the sparsity assumption on the differential network. Although the sparsity assumption is reasonable in many cases, it still could be violated in some applications. For instance, some genetic pathways may exhibit a global change of gene–gene regulations between different phenotypes. When the differential network is dense or locally dense, the method may fail to control the FDR. To solve the problem, a new test needs to be defined to evaluate the level of the sparseness of the change between two conditions. However, there is still a gap on the literature of this topic.

Second, one key assumption in NPNGMs is that the transformed variables follow a joint Gaussian distribution. This assumption also needs to be checked in real-world applications. Under low dimensions, one can employ some popular normality tests, including the Anderson–Darling test and Shapiro–Wilk test, on the imputed data or other normal scores. However, most of these tests fail to detect non-normality for high-dimension data. The normality test under high dimension is still an open and challenging problem and we left it for future research.

It is also noteworthy to mention that the new test relied on an accurate estimator for the coefficients *β*. Motivated by [6], we chose two popular estimators including lasso estimator and Dantzig selector based on the adjusted Spearman's rank, which satisfies Condition (7). In fact, some other estimators also satisfy the conditions, for instance, the rank-based adaptive lasso [6,25] and square-root lasso estimator [6,26]. These estimators can also be incorporated into our testing framework.

#### **6. Conclusions**

We have introduced a novel statistical test to detect the structural difference between the two nonparanormal graphical models. The proposed test dropped the Gaussian assumption and can be potentially applied to many non-Gaussian data for differential network analysis. For instance, some digital gene expression data (e.g., RNA-seq data) do not follow Gaussian distribution even after log transformation or other variance-stabilizing transformations. In such cases, one can model the data with a nonparanormal graphical model and apply our test to find differential edges between two or multiple phenotypic conditions. The proposed test may also be used to detect the difference between normal and disease populations in the brain connectivity network.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.
