
netQDA: Local Network-Guided High-Dimensional Quadratic Discriminant Analysis

Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15216, USA
Department of Pediatrics, University of Pittsburgh Medical Center Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
Department of Biostatistics & Data Science, University of Kansas Medical Center, 3901 Rainbow Boulevard, Kansas City, KS 66160, USA
Author to whom correspondence should be addressed.
Mathematics 2024, 12(23), 3823
Submission received: 31 October 2024 / Revised: 25 November 2024 / Accepted: 29 November 2024 / Published: 3 December 2024
(This article belongs to the Special Issue Statistical Forecasting: Theories, Methods and Applications)


Quadratic Discriminant Analysis (QDA) is a well-known and flexible classification method that considers differences between groups based on both mean and covariance structures. However, the connection structures of high-dimensional predictors are usually not explicitly incorporated into modeling. In this work, we propose a local network-guided QDA method that integrates the local connection structures of high-dimensional predictors. In the context of gene expression research, our method can identify genes that show differential expression levels as well as gene networks that exhibit different connection patterns between various biological state groups, thereby enhancing our understanding of underlying biological mechanisms. Extensive simulations and real data applications demonstrate its superior performance in both feature selection and outcome classification compared to commonly used discriminant analysis methods.

1. Introduction

Genes are the fundamental building blocks of the human genome. Comprising segments of DNA, genes encode vital information responsible for the production of proteins or RNA molecules and play indispensable roles in shaping and governing the structure and functionality of the human body. Advancements in biotechnologies have enabled rapid and cost-effective next-generation sequencing of large amounts of genes or RNAs. This has provided us with opportunities to conduct in-depth scans of the human genome and deepen our understanding of the roles of disease-causing genes. Such studies have provided substantial evidence that genes can exhibit significant differences in their expression levels and co-regulation patterns between different groups of disease outcomes.
Genes may express themselves at significantly different levels between different disease status groups. By comparing the expression levels of genes among these groups, researchers can identify strongly differentially expressed (DE) genes. Such genes are usually informative in predicting disease outcomes and shedding light on their roles in disease development mechanisms [1]. For instance, in a study of atopic asthma, the gene CST1 exhibits significant DE levels between patients with atopic asthma and non-atopic controls [2] (Figure 1). We refer to such highly differentially expressed genes as strong DE genes in our study.
In addition to strong DE genes, there are also genes with marginally weak and undetectable DE levels, such as CLCA1, which demonstrate limited differentiation effects marginally (Figure 1). However, when these weak DE genes are considered alongside their co-regulating DE genes, they can exhibit a predictive effect on outcome classes [3,4]. Even though their biological implications can be significant, such weak DE genes are typically difficult to detect in studies with small to moderate sample sizes.
Additionally, it is crucial to recognize that genes do not function in isolation; rather, they co-regulate with each other through various mechanisms. Gene co-regulations change under different biological conditions, leading to alterations in gene connection or co-regulation patterns. Examining the differences in these patterns between different disease outcome groups can further help uncover disease-related gene networks or pathways [1]. For example, a previous study revealed a higher number of gene connections in atopic asthma patients compared to control subjects (Figure 2), suggesting distinct patterns of gene connectivity in the two disease groups [2]. In our study, we refer to genes exhibiting distinct connection (or co-regulation) patterns in different disease groups as differentially connected (DC) genes.
Identifying informative DE (both strong and weak) and DC genes is crucial for a better understanding of underlying molecular mechanisms, identifying novel disease-related biomarkers, and improving disease prediction. However, challenges arise when dealing with high-dimensional gene expression data. Various methods have been developed to tackle feature selection and outcome classification in high-dimensional classification settings.
Among the top performers is Sure Independence Screening (SIS), a marginal screening method that ranks features by their marginal correlation with the outcome [6,7,8]. Because it is a marginal approach, it ignores inter-feature connections when selecting informative features [3]. In the context of selecting disease-related genes, SIS selects only the top genes with strong marginal correlations with the disease outcome, so it can identify strong DE genes but not weak DE or DC co-regulating genes. Weak DE and DC genes, however, can play important roles in disease development and progression by regulating other strong genes. In terms of prediction, SIS may therefore perform worse, as it tends to miss weak DE and DC genes that are predictive of the disease outcome.
Apart from SIS, penalized regression approaches such as Lasso [9] and Elastic Net [10] are also commonly adopted methods. Such methods jointly model all features together and use a penalization term to select informative features, resulting in improved feature selection and outcome prediction. However, those methods face challenges in ultra-high dimensional settings, including computational expediency, estimation accuracy, and algorithmic stability [7]. Even though such joint approaches may help with identifying weak DE genes, they are still incapable of detecting DC genes.
In 2019, Li et al. introduced mLDA, a feature selection and outcome classification method [3]. mLDA is designed to identify both strong and weak DE features for more accurate outcome classification. However, mLDA is developed under the linear discriminant analysis (LDA) framework. Since LDA assumes a common connection structure across groups, mLDA cannot be used to identify DC features either, even though it showed promising improvement in classification by incorporating both strong and weak DE genes.
To address these challenges, we propose a network-based quadratic discriminant analysis (netQDA) method that can effectively identify both strong and weak DE genes, as well as DC genes that are informative for outcome classification.
Quadratic Discriminant Analysis (QDA) is a well-known classical classification method that allows for different inter-feature connection structures between groups [11]. This characteristic of QDA makes it suitable for detecting DC features. netQDA is computationally efficient and can be used to detect strong and weak DE genes, as well as DC genes. It can handle not only high-dimensional features but also ultrahigh-dimensional features, where the number of features is of exponential order of the sample size (e.g., a methylation profile with roughly 450,000 CpG sites). Compared to other competing methods, netQDA can provide better insight into disease-developing mechanisms by identifying informative weak DE genes, co-regulations, and gene networks.
The rest of the paper is organized as follows. Section 2 introduces the netQDA algorithm. Section 3 evaluates the performance of the proposed method using simulation studies. Section 4 applies the proposed algorithm to real data. Section 5 concludes with a discussion.

2. Method

2.1. QDA Classification Rule

Consider a binary classification problem, where $K = 2$. Denote by $Y$ the group membership. Let $X = (X_1, \ldots, X_p)^\top$ be a $p$-dimensional vector of features. Suppose that the independent observations of the $p$-feature vector in group 1 are generated from the conditional distribution $X \mid \{Y = 1\} \sim N(\mu_1, \Sigma_1)$ and those in group 2 are generated from $X \mid \{Y = 2\} \sim N(\mu_2, \Sigma_2)$, where $\mu_k \in \mathbb{R}^p$, $k = 1, 2$, are the mean vectors and $\Sigma_k \in \mathbb{R}^{p \times p}$, $k = 1, 2$, are the covariance matrices. Denote by $\Omega_k = \Sigma_k^{-1}$ the group-specific precision matrix that characterizes the inter-feature connections. When $\Sigma_1 = \Sigma_2$, the classification boundary becomes linear, and QDA reduces to LDA.
The goal of discriminant analysis is to classify a new observation $x$, which is drawn from either group with prior probabilities $\pi_1$ and $\pi_2 = 1 - \pi_1$ but without a known group label. In the ideal case in which all parameters $\theta = (\pi_1, \pi_2, \mu_1, \mu_2, \Sigma_1, \Sigma_2)$ are known, the classification rule is to find the class membership that has the greater log-likelihood value $\delta_k(x)$, with
$$\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) + \log \pi_k.$$
By setting the second group as the reference group and taking the difference between $\delta_1(x)$ and $\delta_2(x)$, the above classification rule can be written as
$$\hat{G}_\theta(x) = \operatorname*{argmax}_k \delta_k(x) = \begin{cases} 1, & -2 \delta^\top \Omega_2 (x - \bar{\mu}) + (x - \mu_1)^\top D (x - \mu_1) - \log \dfrac{|\Sigma_1|}{|\Sigma_2|} + 2 \log \dfrac{\pi_1}{\pi_2} > 0, \\ 2, & -2 \delta^\top \Omega_2 (x - \bar{\mu}) + (x - \mu_1)^\top D (x - \mu_1) - \log \dfrac{|\Sigma_1|}{|\Sigma_2|} + 2 \log \dfrac{\pi_1}{\pi_2} \le 0, \end{cases} \tag{1}$$
where $\delta = \mu_2 - \mu_1$, $\bar{\mu} = (\mu_1 + \mu_2)/2$, and $D = \Omega_2 - \Omega_1$ with $\Omega_k = \Sigma_k^{-1}$ for $k = 1, 2$. A detailed derivation of (1) can be found in [12].
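For readers who want the intermediate step, rule (1) can be recovered by expanding $2\{\delta_1(x) - \delta_2(x)\}$ and substituting $\mu_2 = \mu_1 + \delta$. The lines below are a sketch of that algebra using only the definitions of $\delta$, $\bar{\mu}$, and $D$ given here; they are not a substitute for the derivation in [12].

```latex
\begin{aligned}
2\{\delta_1(x) - \delta_2(x)\}
  &= (x - \mu_2)^\top \Omega_2 (x - \mu_2) - (x - \mu_1)^\top \Omega_1 (x - \mu_1)
     - \log\frac{|\Sigma_1|}{|\Sigma_2|} + 2\log\frac{\pi_1}{\pi_2}, \\
(x - \mu_2)^\top \Omega_2 (x - \mu_2)
  &= (x - \mu_1)^\top \Omega_2 (x - \mu_1) - 2\delta^\top \Omega_2 (x - \mu_1)
     + \delta^\top \Omega_2 \delta, \\
-2\delta^\top \Omega_2 (x - \mu_1) + \delta^\top \Omega_2 \delta
  &= -2\delta^\top \Omega_2 \bigl(x - \mu_1 - \tfrac{1}{2}\delta\bigr)
   = -2\delta^\top \Omega_2 (x - \bar{\mu}), \\
2\{\delta_1(x) - \delta_2(x)\}
  &= -2\delta^\top \Omega_2 (x - \bar{\mu}) + (x - \mu_1)^\top D (x - \mu_1)
     - \log\frac{|\Sigma_1|}{|\Sigma_2|} + 2\log\frac{\pi_1}{\pi_2}.
\end{aligned}
```

An observation is assigned to group 1 exactly when the last expression is positive, which is the condition stated in (1).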

2.2. Informative Feature Selection

Evaluating the classification rule (1) involves calculating the group-specific covariance matrices $\Sigma_k$, $k = 1, 2$, and their inverses $\Omega_k$. While $\Sigma_k$ can be estimated by $\hat{\Sigma}_k = \sum_{i: Y_i = k} (X_i - \hat{\mu}_k)(X_i - \hat{\mu}_k)^\top / (n - 1)$, this estimate is singular when $p > n$, and computing $\Omega_k$ can be computationally prohibitive, especially when $p \gg n$.
In fact, it is unnecessary to estimate the entire matrices of Ω k . Typically, a dimension reduction is first carried out, and the classification rule is then applied to the reduced feature space. This way, Σ k and Ω k only need to be estimated based on the features within the reduced feature space, which significantly alleviates the computational burden.
It is crucial to retain both the informative differential expression and the connectivity information in the reduced feature space. In Equation (1), the strong DE information is represented by $\delta$, the weak DE information is represented by the mean differences adjusted for the reference-class connections, $\Omega_2 \delta$ [3], and the differential connectivity (DC) information is represented by $D$. Therefore, we select strong DE, weak DE, and DC genes based on $\delta$, $\Omega_2 \delta$, and $D$, respectively.
To select strong differentially expressed (DE) genes, we can rank them based on the magnitude of marginal mean differences or conduct marginal t-tests. However, to identify co-regulating weak DE genes, we must assess the predictive score $[\Omega_2]_{j \cdot}\, \delta$ for gene $j$, where $[\Omega_2]_{j \cdot}$ represents the $j$th row vector of $\Omega_2$. To expedite this process and avoid evaluating the entire matrix $\Omega_2$, we adopt an approach similar to that outlined in [3].
For each strong DE gene selected initially, we identify its connected component in the thresholded sample covariance matrix $\tilde{\Sigma}_2$. Here, $[\tilde{\Sigma}_2]_{jj'} = [\hat{\Sigma}_2]_{jj'}\, I(|\hat{R}_{jj'}| > \alpha)$, where $\hat{\Sigma}_2$ denotes the sample covariance matrix, $\hat{R}_{jj'}$ denotes the $(j, j')$ entry of the sample correlation matrix in class 2, $I$ denotes the indicator function, and $0 < \alpha < 1$ serves as a thresholding tuning parameter. The rationale behind this is that the connected component structure of $\Omega_2$ can be consistently estimated by the connected component structure of $\tilde{\Sigma}_2$ [13]. Following this approach, one can concentrate on the connected component containing gene $j$ when evaluating its predictive score $[\Omega_2]_{j \cdot}\, \delta$. Since the submatrix of $\Omega_2$ confined to this connected component is of much smaller dimension than the entire $\Omega_2$, the computational burden of estimating these submatrices is substantially reduced, while the essential connection information within them is retained.
In Equation (1), it is evident that on the training data, the average discriminant effect involving $D$ is $(\mu_2 - \mu_1)^\top D (\mu_2 - \mu_1)$. By the same rationale applied in selecting weak DE genes, when selecting discriminative DC genes, one only needs to focus on the connected components of strong DE genes in both $\Omega_1$ and $\Omega_2$.
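The thresholding and connected-component steps described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, a plain breadth-first search stands in for the labeling algorithm used in the paper, and the toy data (a three-feature chain plus two noise features) are invented for the example.

```python
from collections import deque

import numpy as np


def thresholded_covariance(X, alpha):
    """Keep covariance entries whose absolute sample correlation exceeds alpha."""
    S = np.cov(X, rowvar=False)
    R = np.corrcoef(X, rowvar=False)
    return S * (np.abs(R) > alpha)


def connected_component(S_tilde, seed):
    """Breadth-first search; nonzero entries of S_tilde are treated as edges."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        j = queue.popleft()
        for k in np.flatnonzero(S_tilde[j]):
            k = int(k)
            if k not in seen:
                seen.add(k)
                queue.append(k)
    return sorted(seen)


# Toy data (assumed): features 0-1-2 form a chain, features 3 and 4 are noise.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.7 * x0 + np.sqrt(1 - 0.49) * rng.normal(size=n)
x2 = 0.7 * x1 + np.sqrt(1 - 0.49) * rng.normal(size=n)
X = np.column_stack([x0, x1, x2, rng.normal(size=n), rng.normal(size=n)])

S_tilde = thresholded_covariance(X, alpha=0.6)
component = connected_component(S_tilde, seed=0)
print(component)
```

In this toy setting the population correlation between features 0 and 2 is about 0.49, below the threshold, so feature 2 is expected to join the component of feature 0 only through feature 1. This is the kind of indirect connection that the component search is meant to capture.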

2.3. Algorithm of Local Network-Guided Quadratic Discriminant Analysis (netQDA)

The netQDA algorithm is detailed in the following.

2.3.1. Strong DE Gene Selection

Given the training dataset $\{(Y_i, X_i)\}_{i=1}^{n}$ with $X_i = (X_{i1}, \ldots, X_{ip})^\top$, denote by $\bar{X}_{\cdot j}^{(k)} = \sum_{i: Y_i = k} X_{ij} / n_k$ the sample mean of feature $j$ within class $k$, $k = 1, 2$. Select the strong DE gene set $\hat{S}_{sDE}$ by
$$\hat{S}_{sDE} = \left\{ j : \left| \bar{X}_{\cdot j}^{(1)} - \bar{X}_{\cdot j}^{(2)} \right| > \tau \right\},$$
where $\tau > 0$ is a thresholding parameter controlling the size of $\hat{S}_{sDE}$, that is, $|\hat{S}_{sDE}|$.
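As a small illustration of this screening step, the sketch below thresholds the absolute difference of class-wise sample means. The function name, the simulated data, and the value of the threshold are assumptions made for the example only.

```python
import numpy as np


def select_strong_de(X, y, tau):
    """Indices j whose class-mean difference exceeds tau in absolute value.

    X is an (n, p) array and y holds the class labels 1 and 2.
    """
    mean1 = X[y == 1].mean(axis=0)
    mean2 = X[y == 2].mean(axis=0)
    return np.flatnonzero(np.abs(mean1 - mean2) > tau)


# Toy data (assumed): 10 features, only feature 3 has a mean shift of 1.5.
rng = np.random.default_rng(1)
n_per_group, p = 200, 10
X = rng.normal(size=(2 * n_per_group, p))
y = np.repeat([1, 2], n_per_group)
X[y == 2, 3] += 1.5

strong = select_strong_de(X, y, tau=0.75)
print(strong)
```

In practice the threshold would be tuned (for example on a validation set) rather than fixed in advance as it is here.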

2.3.2. Weak DE Gene Selection

For each selected strong DE gene, we then identify its connected components in the thresholded class-specific sample covariance matrices $\tilde{\Sigma}_1$ and $\tilde{\Sigma}_2$ for each of the two classes. Notice that the thresholding parameters $\alpha_1$ and $\alpha_2$ can differ.
To detect weak DE features, for each strong DE gene detected, we focus on its connected components in $\Omega_2$, which can be consistently approximated by its connected components in $\tilde{\Sigma}_2$. Specifically, for each strong DE gene $l$, we identify its connected component in $\tilde{\Sigma}_2$ using the recursive labeling algorithm [14]. Denote the connected component of gene $l$ by $C_l$. Suppose there are $B_2$ distinct connected components in total. Let $U_2 = \bigcup_{l=1}^{B_2} C_l$. It is clear that $\hat{S}_{sDE} \subseteq U_2$. Define $\hat{\Sigma}_l^{(2)}$ as the principal sub-matrix of $\hat{\Sigma}_2$ with the row and column indices restricted to $C_l$, and calculate $\hat{\Omega}_l^{(2)} = (\hat{\Sigma}_l^{(2)})^{-1}$. Let $\hat{\Omega}_{U_2}^{(2)} = \operatorname{diag}(\hat{\Omega}_1^{(2)}, \ldots, \hat{\Omega}_{B_2}^{(2)})$ be a block diagonal matrix of dimension $u_2 \times u_2$, where $u_2 = |U_2|$, and let $\hat{\Omega}_2 = \operatorname{diag}(\hat{\Omega}_{U_2}^{(2)}, 0)$ be the estimate of the whole $\Omega_2$.
To identify weak DE features, we consider $U_2$ as the candidate set for them. Thus, the estimated weak DE feature set can be obtained as follows:
$$\hat{S}_{wDE} = \Bigl\{ j \in U_2 \cap \hat{S}_{sDE}^{c} : \Bigl| \sum_{j' \in U_2} \hat{\delta}_{j'}\, \hat{\Omega}_{2, jj'} \Bigr| \ge \nu \Bigr\},$$
where $\hat{\delta}_j = \bar{X}_{\cdot j}^{(2)} - \bar{X}_{\cdot j}^{(1)}$ represents the difference in sample means between the two classes, $\nu > 0$ is a thresholding parameter controlling the size of the selected weak DE features, and $\hat{S}_{sDE}^{c}$ denotes the complement of $\hat{S}_{sDE}$. Finally, we define $\hat{S}_{DE} = \hat{S}_{sDE} \cup \hat{S}_{wDE}$ as the set of all DE genes containing both strong and weak DE genes.
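The weak DE step can be sketched as follows: invert the class-2 covariance block by block, multiply by the mean-difference vector, and threshold the resulting scores on the candidates that are not already strong DE features. The helper names are hypothetical, and the example uses a known population covariance (a three-feature band with partial correlation 0.5) instead of a sample estimate so that the numbers are easy to follow. Blocks are written back at their original indices rather than reordered.

```python
import numpy as np


def block_precision(sigma_hat, components):
    """Invert each principal sub-matrix; entries outside the blocks stay zero."""
    p = sigma_hat.shape[0]
    omega_hat = np.zeros((p, p))
    for comp in components:
        idx = np.ix_(comp, comp)
        omega_hat[idx] = np.linalg.inv(sigma_hat[idx])
    return omega_hat


def select_weak_de(sigma_hat_2, components, delta_hat, strong, nu):
    """Candidates in the components (minus strong DE) with |score| >= nu."""
    omega_hat_2 = block_precision(sigma_hat_2, components)
    scores = omega_hat_2 @ delta_hat
    candidates = sorted(set().union(*components) - set(strong))
    return [j for j in candidates if abs(scores[j]) >= nu]


# Toy setting (assumed): band 0-1-2 in class 2, feature 3 isolated.
omega_block = np.array([[1.0, -0.5, 0.0],
                        [-0.5, 1.0, -0.5],
                        [0.0, -0.5, 1.0]])
sigma_2 = np.eye(4)
sigma_2[np.ix_([0, 1, 2], [0, 1, 2])] = np.linalg.inv(omega_block)

delta_hat = np.array([1.5, 0.0, 0.0, 0.0])  # only feature 0 is strong DE
weak = select_weak_de(sigma_2, [[0, 1, 2]], delta_hat, strong=[0], nu=0.3)
print(weak)
```

Feature 1 has no marginal mean difference but is directly connected to the strong DE feature, so its score is nonzero. Feature 2 is in the same component but two steps away in the precision matrix, and its score is essentially zero.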

2.3.3. Select DC Feature

Similarly to detecting weak DE genes, for each strong DE gene detected, we identify its connected components in $\Omega_1$, which can be consistently approximated by its connected components in $\tilde{\Sigma}_1$. For each strong DE gene $l$, we identify its connected component $\tilde{C}_l$ in $\tilde{\Sigma}_1$. For the $B_1$ distinct connected components in $\tilde{\Sigma}_1$, let $U_1 = \bigcup_{l=1}^{B_1} \tilde{C}_l$. Notice that $B_1$ and $B_2$ could be different. Let $\hat{\Sigma}_l^{(1)}$ be the principal sub-matrix of $\hat{\Sigma}_1$ with the row and column indices restricted to $\tilde{C}_l$. Let $\hat{\Omega}_{U_1}^{(1)} = \operatorname{diag}\bigl((\hat{\Sigma}_1^{(1)})^{-1}, \ldots, (\hat{\Sigma}_{B_1}^{(1)})^{-1}\bigr)$ be a block diagonal matrix of dimension $u_1 \times u_1$, where $u_1 = |U_1|$, and let $\hat{\Omega}_1 = \operatorname{diag}(\hat{\Omega}_{U_1}^{(1)}, 0)$ be the estimate of the whole $\Omega_1$.
Then the DC gene set can be estimated by $\hat{S}_{DC} = \{ j \neq j' \in U : |\hat{D}_{jj'}| > \eta \}$, where $\hat{D} = \hat{\Omega}_2 - \hat{\Omega}_1$ represents the difference between the block diagonal estimates of the precision matrices, and $\eta$ is the thresholding parameter controlling the number of selected DC genes.
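One way to read the DC selection step is sketched below: compare the two estimated precision matrices entry by entry over a candidate index set and flag both features of any off-diagonal pair whose difference exceeds the threshold. Several points here are assumptions rather than statements from the text: the candidate set is taken to be the union of $U_1$ and $U_2$, only off-diagonal entries are compared, and the function name is hypothetical. Because the comparison uses absolute values, the order of subtraction does not affect which features are selected.

```python
import numpy as np


def select_dc(omega_hat_1, omega_hat_2, candidates, eta):
    """Features involved in an off-diagonal precision difference above eta."""
    d_hat = omega_hat_2 - omega_hat_1
    selected = set()
    for a, j in enumerate(candidates):
        for k in candidates[a + 1:]:
            if abs(d_hat[j, k]) > eta:
                selected.update((j, k))
    return sorted(selected)


# Toy setting (assumed): class 1 has a star centred on feature 0,
# class 2 has no connections at all; feature 3 is unconnected in both.
omega_1 = np.eye(4)
omega_1[0, 1] = omega_1[1, 0] = -0.5
omega_1[0, 2] = omega_1[2, 0] = -0.5
omega_2 = np.eye(4)

dc = select_dc(omega_1, omega_2, candidates=[0, 1, 2, 3], eta=0.25)
print(dc)
```

Under this reading the hub feature 0 is flagged together with its neighbours. Whether strong DE features should be reported separately from the DC set is a bookkeeping choice that the sketch does not try to settle.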

2.3.4. Final Informative Feature Set and Test Data Classification

The final selected feature set is denoted by $\hat{S}_0 = \hat{S}_{DE} \cup \hat{S}_{DC}$. For a new observation with covariate vector $x^{\text{new}}$, we only use the selected features in $\hat{S}_0$ to determine its class membership. Suppose $\hat{\mu}^{(k)}, \hat{\Omega}^{(k)}, \hat{\delta}, \hat{\bar{\mu}}, \hat{D}, \hat{\Sigma}^{(k)}, \hat{\pi}_k$ are the estimates of $\mu^{(k)}, \Omega^{(k)}, \delta, \bar{\mu}, D, \Sigma^{(k)}, \pi_k$, respectively, from the training data. We use the subscript $s$ to denote the subvector or submatrix of a vector or matrix with its element indices confined to $\hat{S}_0$. Then, the group membership of $x^{\text{new}}$ is determined by plugging these subvectors and submatrices into classification rule (1). Specifically, we classify $x^{\text{new}}$ by
$$\hat{G}_\theta(x^{\text{new}}) = \begin{cases} 1, & -2 \hat{\delta}_s^\top \hat{\Omega}_s^{(2)} (x_s^{\text{new}} - \hat{\bar{\mu}}_s) + (x_s^{\text{new}} - \hat{\mu}_s^{(1)})^\top \hat{D}_s (x_s^{\text{new}} - \hat{\mu}_s^{(1)}) - \log \dfrac{|\hat{\Sigma}_s^{(1)}|}{|\hat{\Sigma}_s^{(2)}|} + 2 \log \dfrac{\hat{\pi}_1}{\hat{\pi}_2} > 0, \\ 2, & -2 \hat{\delta}_s^\top \hat{\Omega}_s^{(2)} (x_s^{\text{new}} - \hat{\bar{\mu}}_s) + (x_s^{\text{new}} - \hat{\mu}_s^{(1)})^\top \hat{D}_s (x_s^{\text{new}} - \hat{\mu}_s^{(1)}) - \log \dfrac{|\hat{\Sigma}_s^{(1)}|}{|\hat{\Sigma}_s^{(2)}|} + 2 \log \dfrac{\hat{\pi}_1}{\hat{\pi}_2} \le 0. \end{cases}$$
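The plug-in rule can be checked numerically against the direct comparison of the two discriminant scores. The sketch below assumes the vectors and matrices have already been restricted to the selected features and that the mean difference and precision difference follow the sign convention of Section 2.1 (group 2 minus group 1). Function names and the two-feature toy parameters are hypothetical.

```python
import numpy as np


def rule_one(x, mu1, mu2, omega1, omega2, sigma1, sigma2, pi1):
    """Return 1 or 2 using the plug-in form of rule (1) on reduced features."""
    delta = mu2 - mu1
    mu_bar = (mu1 + mu2) / 2
    d = omega2 - omega1
    logdet1 = np.linalg.slogdet(sigma1)[1]
    logdet2 = np.linalg.slogdet(sigma2)[1]
    score = (
        -2 * delta @ omega2 @ (x - mu_bar)
        + (x - mu1) @ d @ (x - mu1)
        - (logdet1 - logdet2)
        + 2 * np.log(pi1 / (1 - pi1))
    )
    return 1 if score > 0 else 2


def direct_label(x, mus, sigmas, pis):
    """Reference check: pick the class with the larger discriminant score."""
    scores = []
    for mu, sigma, pi in zip(mus, sigmas, pis):
        quad = (x - mu) @ np.linalg.inv(sigma) @ (x - mu)
        scores.append(-0.5 * np.linalg.slogdet(sigma)[1] - 0.5 * quad + np.log(pi))
    return 1 if scores[0] > scores[1] else 2


# Toy parameters (assumed): two selected features, equal priors.
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.5, 0.0])
sigma1 = np.array([[1.0, 0.5], [0.5, 1.0]])
sigma2 = np.eye(2)
omega1, omega2 = np.linalg.inv(sigma1), np.linalg.inv(sigma2)

points = [mu1, mu2, np.array([0.5, -1.0]), np.array([2.0, 2.0])]
labels = [rule_one(x, mu1, mu2, omega1, omega2, sigma1, sigma2, 0.5) for x in points]
checks = [direct_label(x, (mu1, mu2), (sigma1, sigma2), (0.5, 0.5)) for x in points]
print(labels, checks)
```

Here the precision matrices are exact inverses of the covariance matrices. In netQDA they would instead be the sub-matrices of the block diagonal estimates, so the two functions need not agree exactly once those estimates are plugged in.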

3. Simulation Studies

In our simulation studies, we focused on binary classification with $K = 2$. We compared the performance of our method with several other feature selection and outcome classification techniques commonly used in high-dimensional settings: mLDA [3], SIS [8,15], lasso [9], and elastic net [10,16].
In transcriptomic studies, the number of targeted genes in the pathways of interest usually ranges from dozens to a few thousand. To mimic this, we considered a total of $p = 1000$ predictors in the simulation studies, which included two connected components, each with one strong DE feature. To create strong DE features, we set the means of all features, except for the strong DE features, to 0. For the strong DE features, we ensured that the means in the first group were 0, while in the second group, the means were randomly selected from $\pm 1.5$. The features within each connected component were sampled from their group-specific multivariate normal distribution with a group-specific precision matrix or group-specific partial correlation structure, which characterized the dependencies among the features within the component in the same group. We explored two structures for the precision matrices: star and band. In the star structure, one feature (the center of the star) was connected to all other informative features within the same connected component, while the remaining features within the component were not directly connected to each other. In the band structure, the features were arranged in a band-like pattern. Star and band connection structures are among the most commonly seen in gene coregulation pathways, such as Kyoto Encyclopedia of Genes and Genomes pathways. Star structures characterize the connections between a hub gene and its neighboring coregulated genes, while band structures characterize a chain of coregulations between multiple genes.
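To make the two connection structures concrete, the sketch below builds star and band precision matrices for a single four-feature component and draws samples from the implied normal distribution. The unit-diagonal parameterization (off-diagonal entry equal to minus the partial correlation), the component size, and all names are assumptions for illustration. The paper does not state how its precision matrices were scaled.

```python
import numpy as np


def precision_from_edges(size, edges, rho):
    """Unit-diagonal precision matrix with partial correlation rho on each edge."""
    omega = np.eye(size)
    for j, k in edges:
        omega[j, k] = omega[k, j] = -rho
    return omega


def star_edges(size):
    """Feature 0 is the hub and is connected to every other feature."""
    return [(0, k) for k in range(1, size)]


def band_edges(size):
    """Each feature is connected to its immediate neighbour only."""
    return [(k, k + 1) for k in range(size - 1)]


size, rho = 4, 0.5
omega_star = precision_from_edges(size, star_edges(size), rho)
omega_band = precision_from_edges(size, band_edges(size), rho)

# Both matrices must be positive definite to define a valid distribution.
min_eig_star = np.linalg.eigvalsh(omega_star).min()
min_eig_band = np.linalg.eigvalsh(omega_band).min()

# Draw 200 observations for one group; the strong DE feature has mean 1.5.
rng = np.random.default_rng(2)
mean = np.array([1.5, 0.0, 0.0, 0.0])
sample = rng.multivariate_normal(mean, np.linalg.inv(omega_star), size=200)
print(min_eig_star > 0, min_eig_band > 0, sample.shape)
```

With this particular parameterization a star with four leaves and a partial correlation of 0.5 is no longer positive definite, so larger components would need a different diagonal scaling than the one assumed here.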
In our simulation studies, we defined weak DE features as those that were connected to a strong DE feature directly or indirectly (meaning through other features) in both of the two groups. In other words, weak DE features exhibited associations with the strong DE features in both classes. On the other hand, DC features were defined as pairs of features that have different connections or strengths in the two groups. These features demonstrate variations in their relationships between the groups. It is important to note that if a feature is identified as a strong DE feature, it is not considered a weak DE or DC feature. By manipulating the structures and partial correlation coefficients of the group-specific precision matrices (see Figure 3, Figure 4 and Figure 5), we varied the number of weak DE features and DC features. This allowed us to investigate the impact of different combinations of weak DE and DC features on the performance of feature selection and outcome classification. It is important to note that any features that were not considered strong DE, weak DE, or DC features were random noise and did not contribute to distinguishing the two groups.
In scenario $S_1$ (Figure 3), we focused on a specific setting where the informative features were limited to strong DE features and DC features. The precision matrix structure for each connected component in group two was fixed to include only one strong DE feature. However, we varied the structure of the precision matrices for each connected component in group one, examining both the star structure (Figure 3a) and the band structure (Figure 3b). Additionally, we manipulated the number of DC features, ranging from two to four in each connected component, to assess their impact on the performance of the methods. Whenever there was a connection between features in either of the two groups, we fixed the partial correlation coefficient ($\rho$) at 0.5.
In scenario $S_2$ (Figure 4), we designed the simulation to focus on a specific setting where the informative features could be classified into three distinct categories: strong DE, weak DE, and DC features. These categories were mutually exclusive, meaning that a feature could only belong to one of these categories. To ensure non-overlapping categories of DE and DC features, we fixed the partial correlation coefficient ($\rho = 0.5$) whenever there was a connection between features in either of the two groups. In all the sub-scenarios within $S_2$, we maintained the presence of one strong DE feature in each connected component. The number of DC features and weak DE features combined was three in each component. We systematically varied the number of DC features from three to zero, while simultaneously increasing the number of weak DE features from zero to three. For example, in sub-scenario $S_2$-1, we specifically examined a setting with one strong DE feature and three DC features. In sub-scenario $S_2$-2, we introduced an additional category of weak DE features. This scenario included one strong DE feature, one weak DE feature, and two DC features, providing a more diverse setting to evaluate the methods’ performance.
Unlike simulation scenario $S_2$, in scenario $S_3$ (Figure 5), the weak DE and DC features were allowed to overlap, meaning that a feature could belong to both categories simultaneously. To achieve this overlapping behavior, we introduced different correlation coefficients for each precision matrix in the connected components. This variation in $\rho$ allowed us to control the strength of the connection between features and create overlapping weak DE and DC features. Within each connected component, we ensured the presence of one strong DE feature and three DC features. However, the number of overlapping weak DE and DC features varied across different configurations. We explored scenarios with one, two, and three overlapping weak DE and DC features, respectively.
In each scenario, we conducted 100 independent realizations to evaluate the performance of the methods. Each realization consisted of a training set, a validation set, and a test set. The training set contained 200 samples in each group, while both the validation and test sets contained 100 samples in each group. These sample sizes were chosen to be similar to the sample sizes in our real data analyses. The training set was used to train the model, where the optimal tuning parameter and the associated final models were selected based on the classification performance on the validation set. Finally, the test sets were used to evaluate the final model’s classification performance. To assess the outcome classification performance of the methods, we utilized the prediction accuracy metric. Prediction accuracy measures the proportion of correctly classified instances. Additionally, for evaluating the feature selection aspect of the final model, we employed metrics such as precision, recall, and the F1 score. Precision quantifies the proportion of correctly selected informative features out of all the features selected. Recall measures the percentage of correctly selected informative features among all the informative features present. The F1 score combines precision and recall, providing a balanced measure that considers both aspects of feature selection accuracy.
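The feature selection metrics defined above reduce to simple set operations. The sketch below is a minimal illustration; the function name and the example index sets are hypothetical.

```python
def selection_metrics(selected, informative):
    """Precision, recall, and F1 of a selected feature set against the truth."""
    selected, informative = set(selected), set(informative)
    true_positives = len(selected & informative)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(informative) if informative else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Example (assumed): 4 truly informative features, 3 selected, 2 of them correct.
precision, recall, f1 = selection_metrics(selected=[0, 1, 7], informative=[0, 1, 2, 3])
print(precision, recall, f1)
```

In this example precision is 2/3, recall is 1/2, and the F1 score, their harmonic mean, is 4/7.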
Across all sub-scenarios in Scenarios $S_1$, $S_2$, and $S_3$, the netQDA method consistently outperformed all other competing methods in terms of outcome classification, regardless of whether the structures were band or star. Notably, netQDA demonstrated increasing accuracy as the number of DC features increased. Moreover, netQDA consistently achieved the highest performance in feature selection, surpassing other methods in terms of F1 score, precision, and recall metrics.
In all sub-scenarios of Scenarios $S_1$, $S_2$, and $S_3$, all methods successfully identified all strong DE features. Notably, in Scenario $S_1$, netQDA accurately selected nearly all of the DC features, surpassing the other methods, which encountered difficulties in this task. In Scenario $S_2$, which involved the presence of both DC-only and weak DE-only features, netQDA exhibited superior performance in accurately detecting the DC-only features compared to other methods. Additionally, netQDA achieved perfect detection of weak DE-only features in the star structure, and it significantly outperformed other methods in detecting weak DE-only features in the band structure. In Scenario $S_3$, where the presence of DC-only features and overlapping DC and weak DE features was possible, netQDA consistently demonstrated the ability to detect the DC features. Additionally, netQDA successfully detected a significant proportion of the overlapping DC and weak DE features, surpassing the performance of other methods in this challenging scenario. These results hold true regardless of whether the precision matrices followed a star or band structure (see Figure 6, Figure 7 and Figure 8 and Table 1, Table 2 and Table 3).

4. Real Data Application

In order to evaluate the performance of our method in gene expression analysis, we conducted experiments using three real gene expression datasets. To ensure the reliability and reduce bias in our results, we employed a three-fold nested cross-validation. This approach allowed us to obtain more robust and unbiased estimates of our model’s performance. In addition, we compared the performance of our method with several well-known techniques, including mLDA [3], lasso [9], elastic net [10], and SIS [8]. All the data were log-transformed and standardized before fitting the models.

4.1. Chronic Obstructive Pulmonary Disease Data GSE76705

The Chronic Obstructive Pulmonary Disease (COPD) dataset utilized in our study was acquired from the Gene Expression Omnibus (GEO) repository, a publicly accessible resource identified by accession number GSE76705 [17,18]. This dataset comprises gene expression data obtained from a whole blood microarray experiment, gathered from 229 individuals diagnosed with COPD. The severity of COPD in these patients was determined by assessing the FEV1/FVC ratio, a measure of lung function that varies among the patient population. To adhere to the guidelines provided by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), patients with an FEV1/FVC ratio below 50% were classified as having severe or very severe COPD, while those with a ratio equal to or greater than 50% were categorized as having mild or moderate COPD. Within the dataset, there were 130 patients with mild or moderate COPD and 99 patients with severe or very severe COPD. To proceed with our analysis, we selected the top 1000 genes exhibiting the highest variability from the dataset to predict the severity of the disease, specifically to classify patients as either having mild/moderate COPD or severe/very severe COPD.
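The variance-based pre-filter mentioned above can be sketched in a few lines. The function name and the tiny expression matrix are hypothetical, and the sketch assumes samples in rows and genes in columns.

```python
import numpy as np


def top_variance_features(X, k):
    """Column indices of the k features with the largest sample variance."""
    variances = X.var(axis=0, ddof=1)
    top = np.argsort(variances)[::-1][:k]
    return np.sort(top)


# Toy matrix (assumed): 3 samples by 3 genes with variances 1, 100, and 4.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 10.0, 2.0],
              [2.0, 20.0, 4.0]])
kept = top_variance_features(X, k=2)
print(kept)
```

For the analysis described here, `k` would be 1000 and the filter would be applied before model fitting.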
The left-most part of Table 4 summarizes the prediction performance, including the area under the receiver operating characteristic curve (ROC-AUC) and prediction accuracy. All of the methods produced a numeric output, which was the predicted probability of severe/very severe COPD. The final predicted outcome label was assigned by applying a risk threshold to the predicted probability. The method-specific risk threshold was the optimal threshold selected for each method based on Youden’s Index using the R package InformationValue [19]. Among the methods evaluated, netQDA exhibited superior performance in predicting COPD severity. netQDA also identified genes that are associated with lung function and the risk of COPD, which were not identified by other competing methods. Notably, netQDA selected the gene IFIT3, which has previously been linked to the severity of airflow limitation in COPD [20] as well as the risk of COPD [21]. Similarly, netQDA identified MX1 [21,22] and OASL [22,23], which have been shown to be decreased in subjects with low FEV1/FVC ratios compared to those with normal FEV1/FVC ratios. Furthermore, among the genes exclusively discovered by netQDA, IFI44 [21], IFI44L [21], IFIT1 [21,23], and RSAD2 [21] have been shown to be associated with COPD. These findings underscore the potential of netQDA in identifying relevant genes associated with lung function and the risk of COPD that may have been overlooked by other methods.
To assess the association of the selected genes with the relevant disease, functional enrichment analysis of canonical pathways was performed using the Ingenuity Pathway Analysis (IPA) software (QIAGEN, Inc., Redwood City, CA, USA; accessed on 23 July 2023). The top five enriched ingenuity canonical pathways by each method are listed in Table 5. By leveraging the potential informative genes selected by netQDA, we were able to identify the following COPD-related pathways: ‘Interferon Signaling’ [24,25], ‘Role of Hypercytokinemia/hyperchemokinemia in the Pathogenesis of Influenza’ [25], and ‘IL-12 Signaling and Production in Macrophages’ [26] (refer to Table 5).
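Threshold selection by Youden's index amounts to scanning candidate cut-offs and keeping the one that maximizes sensitivity plus specificity minus one. The sketch below is a plain Python illustration of that idea, not the R package routine used in the analysis; the function name, the tie-breaking rule (first maximizer wins), and the toy scores are assumptions.

```python
def youden_threshold(scores, labels):
    """Cut-off maximizing sensitivity + specificity - 1.

    labels use 1 for the positive class and 0 for the negative class;
    an observation is predicted positive when its score is >= the cut-off.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_index = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
        index = tp / positives + tn / negatives - 1
        if index > best_index:
            best_cutoff, best_index = cutoff, index
    return best_cutoff, best_index


# Toy predicted probabilities (assumed) with one misordered pair.
scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 1]
cutoff, index = youden_threshold(scores, labels)
print(cutoff, index)
```

In this toy example the best cut-off is 0.6, where sensitivity is 3/4 and specificity is 1, giving an index of 0.75.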

4.2. Allergy Data GSE141661

In addition to our previous evaluations, we assessed the performance of our method using another publicly available dataset obtained from the GEO with accession number GSE141661 [27]. This dataset comprises whole blood gene expression data obtained via array technology from a total of 256 participants enrolled in the BAMSE study. Out of these participants, 100 individuals were diagnosed with an allergy, which encompasses conditions like asthma, dermatitis, and rhinitis. The remaining 156 participants were diagnosed as non-allergic individuals. By analyzing this dataset, we aimed to further validate the performance and applicability of our method in the context of allergic diseases. From the dataset, we selected the top 2000 genes with the largest variance to predict the disease status (no allergy vs. allergy).
The middle part of Table 4 provides a summary of the prediction performance for the dataset. When considering ROC-AUC as the evaluation metric, both our netQDA method and the mLDA method achieved the highest prediction performance. Regarding potential allergy-related informative genes, in addition to those selected by one or more competing methods, which include BIRC6-AS1, CLC, CYSLTR2, GSTA7P, HRH4, MIR362, MIR511, and RNU6-69P, netQDA identified an additional 11 genes. Among these genes, IL5RA has been shown to be a biologically relevant allergy-related gene [28,29,30,31,32]. The gene P2RY14 is associated with allergic asthma [33,34,35]. Additionally, two microRNAs, MIR516B2 and MIR888, were detected. MicroRNAs have gained increasing recognition as crucial regulators of allergic inflammation, with a special focus on asthma, atopic dermatitis, and allergic rhinitis [36,37,38]. Lastly, the long non-coding RNA gene VCAN-AS1, which has been reported as upregulated in children with asthma [39], was also detected. Please refer to the NCBI Gene Expression Omnibus (GEO) for a comprehensive list of all selected potential informative genes related to allergy by each method.
The top enriched canonical pathways, determined by analyzing the selected features from each method, are summarized in Table 6. Due to netQDA’s selection of additional informative genes, namely HRH4 and P2RY14, three distinct canonical pathways were enriched solely by netQDA using IPA. Two pathways, namely the ‘STAT3 Pathway’ [40,41,42] and ‘PI3K/AKT Signaling’ [43], have been shown to be directly associated with allergy-related mechanisms. The ‘S100 Family Signaling Pathway’ and the ‘dysregulation of S100 proteins’ have been implicated in inflammatory responses, immune system activation, and tissue remodeling, all of which play significant roles in the pathogenesis of atopic diseases and allergies [44,45,46]. The potential informative genes identified by netQDA and their corresponding pathways provide valuable insights into the underlying molecular mechanisms involved and offer potential targets for therapeutic interventions for allergies.

5. Discussion

In summary, netQDA is an advanced extension of traditional QDA that incorporates a local network-guided approach for feature selection and outcome classification. By considering inter-feature connections, netQDA effectively tackles the challenges posed by high-dimensional data and allows for the identification of both strong and weak DE genes, as well as DC features. Through the detection of informative features, netQDA demonstrates superior or comparable performance in outcome classification compared to conventional methods in the field.
Our method does have certain limitations. While it can effectively detect weak DE or DC features that are highly connected to strong DE features, it may struggle to identify clusters of weak DE or DC features that operate independently but work together synergistically. This limitation means that the method may not fully capture the collective behavior of such features. Moreover, the performance of our method may not reach its optimum potential when confronted with a considerable number of strong DE features and/or an extensive set of informative features within a single local network. In these scenarios, the identification and classification of relevant features could become more challenging, potentially affecting the performance of the method.
Thus far, the netQDA method has been tested on a limited range of datasets; its performance remains to be assessed on a wider range of data for different diseases.
While our project has mainly concentrated on gene expression data and gene coregulation networks, the methodology we have developed can be readily extended to analyze different types of molecular data and their corresponding network structures, such as proteomics and protein–protein networks. Furthermore, although we have presented the method in the context of binary classification, the netQDA approach can be expanded to handle multi-class classification problems as well. These extensions demonstrate the versatility and potential applicability of the netQDA method across various domains and classification scenarios, enhancing its utility in a broader range of research and practical applications.
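To illustrate why the multi-class extension is natural, the sketch below implements the classical QDA decision rule for an arbitrary number of classes: each class k receives the score -0.5 log|Σ_k| - 0.5 (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) + log π_k, and the class with the largest score is predicted. This is textbook QDA with full sample covariances, not the network-guided estimator of netQDA; the function names and the simulated three-class data are assumptions for illustration, and the sample covariance used here is only invertible when each class has more samples than features.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate prior, mean, and covariance per class (classical QDA)."""
    params = {}
    for label in np.unique(y):
        Xk = X[y == label]
        params[label] = (
            len(Xk) / len(X),          # class prior
            Xk.mean(axis=0),           # class mean
            np.cov(Xk, rowvar=False),  # class covariance
        )
    return params

def qda_scores(x, params):
    """Quadratic discriminant score of x for every class."""
    scores = {}
    for label, (prior, mean, cov) in params.items():
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        scores[label] = -0.5 * logdet - 0.5 * maha + np.log(prior)
    return scores

def predict(x, params):
    """Assign x to the class with the largest discriminant score."""
    scores = qda_scores(x, params)
    return max(scores, key=scores.get)

# Simulated three-class data in two dimensions (hypothetical means and scales).
rng = np.random.default_rng(1)
means = {0: [0.0, 0.0], 1: [4.0, 0.0], 2: [0.0, 4.0]}
scales = {0: 1.0, 1: 0.5, 2: 1.5}
X = np.vstack([rng.normal(means[k], scales[k], size=(100, 2)) for k in means])
y = np.repeat([0, 1, 2], 100)

params = fit_qda(X, y)
label = predict(np.array([4.0, 0.0]), params)
```

Nothing in the rule is specific to two groups: moving from binary to multi-class classification only changes how many class-specific means and covariances are estimated and compared.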

6. Supplementary

6.1. Potential Informative Genes of COPD

The potential genes selected by netQDA, which were also chosen by one or more of the competing methods, include FCGR1A, ISG15, SERPING1. Furthermore, netQDA also selected genes not selected by the other methods, including ARHGAP12, BATF2, BTAF1, CLK1, DDX60, DEK, FCGR1B, FMR1, HERC5, IFI44, IFI44L, IFIT1, IFIT3, LINC01215, LY6E, MX1, OASL, OXCT1, PRIMPOL, RSAD2, SMC6, SUCO, THAP9-AS1, TMEM263, ZNF439, ZNF700. Among all the selected features, the DC-only feature set comprised BATF2, CLK1, DDX60, DEK, FCGR1A, FCGR1B, FMR1, HERC5, IFIT1, ISG15, LINC01215, MX1, OASL, RSAD2, SERPING1, SMC6, SUCO, ZNF439, ZNF700. The DE-only feature set consisted of ARHGAP12, BTAF1, LY6E, OXCT1, PRIMPOL, THAP9-AS1. The rest of the genes were selected as both DC and DE genes. The genes formed four connected networks: (1) FCGR1A, FCGR1B; (2) BATF2, DDX60, HERC5, IFI44, IFI44L, IFIT1, IFIT3, ISG15, LY6E, MX1; (3) OASL, RSAD2, SERPING1; and (4) ARHGAP12, BTAF1, CLK1, DEK, FMR1, LINC01215, OXCT1, PRIMPOL, SMC6, SUCO, THAP9-AS1, TMEM263, ZNF439, ZNF700.
The mLDA method selected EBF1, ISG15, LOC388210, MS4A1, NR3C2, PLEKHG1, SERPING1, TPST1. Both lasso and elastic net methods selected A2M-AS1, ABCG2, AC079767.4, ADTRP, BTNL8, C8orf31, CCDC122, CEP55, CLEC12A, COL5A3, CPA3, DCAF4L1, DFNA5, DMRTC2, DQ592442, DSP, EBF1, FAM19A1, FGFBP2, FNBP1L, FOLR3, GALNT14, GJB6, GPRASP1, HBG1, HEY1, HOTS, IGDCC4, IGJ, KANSL1-AS1, KIAA1024, LGALS2, LINC00189, LINC01146, LINC01270, LINC01293, LINC01410, LIPC, LOC100130357, LOC100506922, LOC100653086, LOC100996756, LOC101928102, LOC101929842, LOC101929855, LOC102724517, LOC286087, LOC388210, LRRC6, MMP1, MYZAP, NCAPG2, NEFL, NOG, NR3C2, NRN1, OR52K3P, PNMA6A, PPP1R17, RP11-112J3.16, RP11-44F21.5, RP11-981G7.6, SCN3A, SEMG1, SLC19A2, SLC38A11, SLC8A1-AS1, TNFAIP6, TP53TG3, TPST1, TRIM6, TTTY15, XK, ZCCHC18, ZNF204P, ZNF285, ZNF354A, ZNF630. In addition, Lasso also selected ARL5A, ARMCX1, AV2S1A1, BC017398, BMX, C10orf32, C17orf97, C20orf197, C21orf15, CEACAM8, CLHC1, CNTNAP2, CYB5R2, DSC1, FAM118A, FAM124B, GLB1L, GRAMD1C, HLA-DRB4, HTATSF1P2, IDO1, KBTBD6, KBTBD8, KIR2DL2, KIR2DS3, KLRAP1, KLRG1, LINC00115, LOC101927391, LOC101928386, LOC389831, MIR17, NAPSB, PLEK2, RP11-121C2.2, RP11-171I2.4, RP11-382F24.1, RP11-44F14.8, RP11-664D1.1, RP11-722E23.2, SLC24A3, SNORA68, SPARC, TMTC1, TPRG1, TRBV27, TUBBP5. Elastic net also selected A2MP1, AV4S1, CEACAM6, CXCL10, EPSTI1, ERAP2, FCGR1A, HDC, ISG15, JADE3, LOC101928560, LOC102724162, NXN, PLGLB2, RBM38, RP11-127B20.2, RSPH9, SERPING1, SLC6A10P, SPP1, ZNF239. SIS selected BATF2, EBF1, LOC388210, NR3C2, PLEKHG1, SERPING1, TPST1.
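Grouping selected genes into connected networks, as done for the netQDA selections in this subsection, amounts to finding the connected components of an undirected graph whose edges are the estimated gene-gene connections. The sketch below shows one generic way to do this with a depth-first search. The node names `g1` to `g6` and the edge list are hypothetical placeholders, since the estimated edges themselves are not listed in the text.

```python
from collections import defaultdict

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sorted lists."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components

# Hypothetical selected features and connections (not the estimated network).
nodes = ["g1", "g2", "g3", "g4", "g5", "g6"]
edges = [("g1", "g2"), ("g3", "g4"), ("g4", "g5")]

components = connected_components(nodes, edges)
# components == [["g1", "g2"], ["g3", "g4", "g5"], ["g6"]]
```

A feature with no edges, such as `g6` here, forms a component of size one.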

6.2. Potential Informative Genes of Allergy

The final model of netQDA identified the following genes as potential allergy-related genes: BIRC6-AS1, CLC, CYSLTR2, GK-AS1, GSTA7P, HRH4, IL1RL1, IL5RA, MIR362, MIR511, MIR516B2, MIR888, P2RY14, PP12708, RNA5SP311, RNU5A-1, RNU5B-1, RNU5E-1, RNU6-69P, RNY3P9, SNORD42A, VCAN-AS1. Among the identified genes, CLC, CYSLTR2, HRH4, IL1RL1, IL5RA, RNU5A-1, and RNU5E-1 were selected as both DC and DE genes. These genes exhibit both changes in their expression levels and alterations in their connectivity patterns, indicating their potential significance in the context of allergy-related mechanisms. The remaining genes were categorized as DE-only genes, meaning they show differential expression but do not exhibit significant changes in their network connectivity. Furthermore, the genes CLC, CYSLTR2, HRH4, IL1RL1, and IL5RA form a connected component within the network, indicating their close interplay and potential functional relationships in the context of allergies. Additionally, the genes RNU5A-1, RNU5B-1, and RNU5E-1 form another connected component within the network, suggesting their potential collaborative roles in allergy-related processes.
On the other hand, mLDA selected BIRC6-AS1, CLC, CYSLTR2, GSTA7P, HRH4, IL1RL1, MIR362, RNU5A-1, RNU5B-1, and RNU6-69P as potential informative genes. Elastic net selected CLC, HRH4, MIR362, and MIR511 as informative genes. Lastly, the SIS method identified BIRC6-AS1, CLC, GSTA7P, HRH4, MIR362, MIR511, and RNU6-69P as informative genes related to the outcome of interest. Lasso did not identify any of the genes as informative genes.
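The count of 11 genes selected only by netQDA, reported in the main text, can be checked directly from the lists in this subsection with plain set operations. The gene names below are copied from the text; the variable names are illustrative.

```python
# Gene lists copied from the text of this subsection.
netqda = {
    "BIRC6-AS1", "CLC", "CYSLTR2", "GK-AS1", "GSTA7P", "HRH4", "IL1RL1",
    "IL5RA", "MIR362", "MIR511", "MIR516B2", "MIR888", "P2RY14", "PP12708",
    "RNA5SP311", "RNU5A-1", "RNU5B-1", "RNU5E-1", "RNU6-69P", "RNY3P9",
    "SNORD42A", "VCAN-AS1",
}
mlda = {
    "BIRC6-AS1", "CLC", "CYSLTR2", "GSTA7P", "HRH4", "IL1RL1", "MIR362",
    "RNU5A-1", "RNU5B-1", "RNU6-69P",
}
elastic_net = {"CLC", "HRH4", "MIR362", "MIR511"}
sis = {"BIRC6-AS1", "CLC", "GSTA7P", "HRH4", "MIR362", "MIR511", "RNU6-69P"}
lasso = set()  # Lasso selected none of the genes

competitors = mlda | elastic_net | sis | lasso
shared = netqda & competitors       # also chosen by at least one competing method
netqda_only = netqda - competitors  # chosen by netQDA alone

print(len(netqda), len(shared), len(netqda_only))  # 22 11 11
print(sorted(netqda_only))
```

Of the 22 genes selected by netQDA, 11 are shared with at least one competing method and 11 are unique to netQDA, including IL5RA, P2RY14, MIR516B2, MIR888, and VCAN-AS1.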

Author Contributions

Conceptualization, W.C. and Y.L.; Methodology, X.Z. and Y.L.; Software, X.Z.; Formal analysis, X.Z.; Investigation, W.C. and Y.L.; Resources, Y.L.; Writing—original draft, X.Z.; Writing—review & editing, W.C. and Y.L.; Supervision, W.C. and Y.L. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available at the NCBI Gene Expression Omnibus (GEO) (accessed on 3 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.


References

  1. Grimes, T.; Potter, S.S.; Datta, S. Integrating gene regulatory pathways into differential network analysis of gene expression data. Sci. Rep. 2019, 9, 5479.
  2. Forno, E.; Zhang, R.; Jiang, Y.; Yan, Q.; Han, Y.Y.; Acosta-Perez, E.; Boutaoui, N.; Canino, G.; Chen, W.; Celedon, J.C. Transcriptome-wide association study (TWAS) of nasal respiratory epithelium and childhood asthma. ERS 2019, 54, OA4943.
  3. Li, Y.; Hong, H.G.; Li, Y. Multiclass linear discriminant analysis with ultrahigh-dimensional features. Biometrics 2019, 75, 1086–1097.
  4. Yan, Z.; Liu, L.; Jiao, L.; Wen, X.; Liu, J.; Wang, N. Bioinformatics analysis and identification of underlying biomarkers potentially linking allergic rhinitis and asthma. Med. Sci. Monit. Int. Med. J. Exp. Clin. Res. 2020, 26, e924934-1.
  5. Forno, E.; Zhang, R.; Jiang, Y.; Kim, S.; Yan, Q.; Ren, Z.; Han, Y.Y.; Boutaoui, N.; Rosser, F.; Weeks, D.E.; et al. Transcriptome-wide and differential expression network analyses of childhood asthma in nasal epithelium. J. Allergy Clin. Immunol. 2020, 146, 671–675.
  6. Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B 2008, 70, 849–911.
  7. Fan, J.; Samworth, R.; Wu, Y. Ultrahigh dimensional feature selection: Beyond the linear model. J. Mach. Learn. Res. 2009, 10, 2013–2038.
  8. Fan, J.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604.
  9. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
  10. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
  11. Qin, Y. A review of quadratic discriminant analysis for high-dimensional data. Wiley Interdiscip. Rev. Comput. Stat. 2018, 10, e1434.
  12. Cai, T.T.; Zhang, L. A convex optimization approach to high-dimensional sparse quadratic discriminant analysis. Ann. Stat. 2021, 49, 1537–1568.
  13. Bickel, P.; Levina, E. Covariance regularization by thresholding. Ann. Stat. 2008, 36, 2577–2604.
  14. Shapiro, L.; Stockman, G. Computer Vision; Prentice: Hoboken, NJ, USA, 2002.
  15. Saldana, D.F.; Feng, Y. SIS: An R package for sure independence screening in ultrahigh-dimensional statistical models. J. Stat. Softw. 2018, 83, 1–25.
  16. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 2010, 33, 1.
  17. Chang, Y.; Glass, K.; Liu, Y.Y.; Silverman, E.K.; Crapo, J.D.; Tal-Singer, R.; Bowler, R.; Dy, J.; Cho, M.; Castaldi, P. COPD subtypes identified by network-based clustering of blood gene expression. Genomics 2016, 107, 51–58.
  18. Moll, M.; Hobbs, B.D.; Menon, A.; Ghosh, A.J.; Putman, R.K.; Hino, T.; Hata, A.; Silverman, E.K.; Quackenbush, J.; Castaldi, P.J.; et al. Blood gene expression risk profiles and interstitial lung abnormalities: COPDGene and ECLIPSE cohort studies. Respir. Res. 2022, 23, 157.
  19. Prabhakaran, S. InformationValue: Performance Analysis and Companion Functions for Binary Classification Models, Version 1.2.3. 2016. Available online: (accessed on 29 November 2024).
  20. Jackson, V.E.; Ntalla, I.; Sayers, I.; Morris, R.; Whincup, P.; Casas, J.P.; Amuzu, A.; Choi, M.; Dale, C.; Kumari, M.; et al. Exome-wide analysis of rare coding variation identifies novel associations with COPD and airflow limitation in MOCS3, IFIT3 and SERPINA12. Thorax 2016, 71, 501–509.
  21. Malhotra, R.; Kurian, N.; Zhou, X.H.; Jiang, F.; Monkley, S.; DeMicco, A.; Clausen, I.G.; Delgren, G.; Edenro, G.; Ahdesmäki, M.J.; et al. Altered regulation and expression of genes by BET family of proteins in COPD patients. PLoS ONE 2017, 12, e0173115.
  22. Bosco, A.; Ehteshami, S.; Stern, D.A.; Martinez, F.D. Decreased activation of inflammatory networks during acute asthma exacerbations is associated with chronic airflow obstruction. Mucosal Immunol. 2010, 3, 399–409.
  23. Shan, M.; Yuan, X.; Song, L.Z.; Roberts, L.; Zarinkamar, N.; Seryshev, A.; Zhang, Y.; Hilsenbeck, S.; Chang, S.H.; Dong, C.; et al. Cigarette smoke induction of osteopontin (SPP1) mediates TH17 inflammation in human and experimental emphysema. Sci. Transl. Med. 2012, 4, 117ra9.
  24. Bauer, C.M.; Morissette, M.C.; Stämpfli, M.R. The influence of cigarette smoking on viral infections: Translating bench science to impact COPD pathogenesis and acute exacerbations of COPD clinically. Chest 2013, 143, 196–206.
  25. Dupin, I.; Henrot, P.; Maurat, E.; Abohalaka, R.; Chaigne, S.; El Hamrani, D.; Eyraud, E.; Prevel, R.; Esteves, P.; Campagnac, M.; et al. CXCR4 blockade alleviates pulmonary and cardiac outcomes in early COPD. bioRxiv 2023.
  26. Kaneko, Y.; Yatagai, Y.; Yamada, H.; Iijima, H.; Masuko, H.; Sakamoto, T.; Hizawa, N. The search for common pathways underlying asthma and COPD. Int. J. Chronic Obstr. Pulm. Dis. 2013, 8, 65–78.
  27. Lemonnier, N.; Melén, E.; Jiang, Y.; Joly, S.; Ménard, C.; Aguilar, D.; Acosta-Perez, E.; Bergström, A.; Boutaoui, N.; Bustamante, M.; et al. A novel whole blood gene expression signature for asthma, dermatitis, and rhinitis multimorbidity in children and adolescents. Allergy 2020, 75, 3248–3260.
  28. Park, B.L.; Kim, L.H.; Choi, Y.H.; Lee, J.H.; Rhim, T.; Lee, Y.M.; Uh, S.T.; Park, H.S.; Choi, B.W.; Hong, S.J.; et al. Interleukin 3 (IL3) polymorphisms associated with decreased risk of asthma and atopy. J. Hum. Genet. 2004, 49, 517–527.
  29. Forno, E.; Wang, T.; Yan, Q.; Brehm, J.; Acosta-Perez, E.; Colon-Semidey, A.; Alvarez, M.; Boutaoui, N.; Cloutier, M.M.; Alcorn, J.F.; et al. A multiomics approach to identify genes associated with childhood asthma risk and morbidity. Am. J. Respir. Cell Mol. Biol. 2017, 57, 439–447.
  30. Cheong, H.S.; Kim, L.H.; Park, B.L.; Choi, Y.H.; Park, H.S.; Hong, S.J.; Choi, B.W.; Park, C.S.; Shin, H.D. Association analysis of interleukin 5 receptor alpha subunit (IL5RA) polymorphisms and asthma. J. Hum. Genet. 2005, 50, 628–634.
  31. Smieszek, S.P.; Przychodzen, B.; Welsh, S.E.; Brzezynski, J.L.; Kaden, A.R.; Mohrman, M.; Wang, J.; Xiao, C.; Ständer, S.; Birznieks, G.; et al. Genomic and phenotypic characterization of Investigator Global Assessment (IGA) scale-based endotypes in atopic dermatitis. J. Am. Acad. Dermatol. 2021, 85, 1638–1640.
  32. Namkung, J.H.; Lee, J.E.; Kim, E.; Cho, H.J.; Kim, S.; Shin, E.S.; Cho, E.Y.; Yang, J.M. IL-5 and IL-5 receptor alpha polymorphisms are associated with atopic dermatitis in Koreans. Allergy 2007, 62, 934–942.
  33. Ferreira, M.A.; Jansen, R.; Willemsen, G.; Penninx, B.; Bain, L.M.; Vicente, C.T.; Revez, J.A.; Matheson, M.C.; Hui, J.; Tung, J.Y.; et al. Gene-based analysis of regulatory variants identifies 4 putative novel asthma risk genes related to nucleotide synthesis and signaling. J. Allergy Clin. Immunol. 2017, 139, 1148–1157.
  34. Karcz, T.; Whitehead, G.; Nakano, H.; Jacobson, K.A.; Cook, D.N. Endogenous UDP-Glc acts through the purinergic receptor P2RY14 to exacerbate eosinophilia and airway hyperresponsiveness in a protease model of allergic asthma. J. Immunol. 2019, 202, 119-18.
  35. Thompson, R.J.; Sayers, I.; Kuokkanen, K.; Hall, I.P. Purinergic receptors in the airways: Potential therapeutic targets for asthma? Front. Allergy 2021, 2, 677677.
  36. Dissanayake, E.; Inoue, Y. MicroRNAs in allergic disease. Curr. Allergy Asthma Rep. 2016, 16, 67.
  37. Specjalski, K.; Jassem, E. MicroRNAs: Potential biomarkers and targets of therapy in allergic diseases? Arch. Immunol. Et Ther. Exp. 2019, 67, 213–223.
  38. Weidner, J.; Bartel, S.; Kılıç, A.; Zissler, U.M.; Renz, H.; Schwarze, J.; Schmidt-Weber, C.B.; Maes, T.; Rebane, A.; Krauss-Etschmann, S.; et al. Spotlight on microRNAs in allergy and asthma. Allergy 2021, 76, 1661–1678.
  39. Xia, L.; Wang, X.; Liu, L.; Fu, J.; Xiao, W.; Liang, Q.; Han, X.; Huang, S.; Sun, L.; Gao, Y.; et al. lnc-BAZ2B promotes M2 macrophage activation and inflammation in children with asthma through stabilizing BAZ2B pre-mRNA. J. Allergy Clin. Immunol. 2021, 147, 921–932.
  40. Siegel, A.M.; Stone, K.D.; Cruse, G.; Lawrence, M.G.; Olivera, A.; Jung, M.y.; Barber, J.S.; Freeman, A.F.; Holland, S.M.; O’Brien, M.; et al. Diminished allergic disease in patients with STAT3 mutations reveals a role for STAT3 signaling in mast cell degranulation. J. Allergy Clin. Immunol. 2013, 132, 1388–1396.
  41. Simeone-Penney, M.C.; Severgnini, M.; Tu, P.; Homer, R.J.; Mariani, T.J.; Cohn, L.; Simon, A.R. Airway epithelial STAT3 is required for allergic inflammation in a murine model of asthma. J. Immunol. 2007, 178, 6191–6199.
  42. Carter, C.A.; Frischmeyer-Guerrerio, P.A. The genetics of food allergy. Curr. Allergy Asthma Rep. 2018, 18, 2.
  43. Jiang, X.; Fang, L.; Wu, H.; Mei, X.; He, F.; Ding, P.; Liu, R. TLR2 regulates allergic airway inflammation and autophagy through PI3K/Akt signaling pathway. Inflammation 2017, 40, 1382–1392.
  44. Heizmann, C.W. S100 Proteins. In Encyclopedia of Molecular Pharmacology; Springer International Publishing: Cham, Switzerland, 2022; pp. 1381–1386.
  45. Boguniewicz, M.; Leung, D.Y. Atopic dermatitis: A disease of altered skin barrier and immune dysregulation. Immunol. Rev. 2011, 242, 233–246.
  46. Shishibori, T.; Oyama, Y.; Matsushita, O.; Yamashita, K.; Furuichi, H.; Okabe, A.; Maeta, H.; Hata, Y.; Kobayashi, R. Three distinct anti-allergic drugs, amlexanox, cromolyn and tranilast, bind to S100A12 and S100A13 of the S100 protein family. Biochem. J. 1999, 338, 583–589.
Figure 1. Strong (CST1) and weak (CLCA1) DE genes identified in the nasal epithelium between children with atopic asthma and non-atopic controls in a childhood asthma study.
Figure 2. Differentially connected gene networks in the nasal epithelium between children with atopic asthma and non-atopic controls in a childhood asthma study. The genes, adapted from [5], exhibit different hub-based networks in both the atopic asthma group (top) and non-atopic control group (bottom). In this representation, hub genes are defined as genes having connections to at least four other genes in either group.
Figure 3. Precision matrix of a connected component and its interpretation as a graph for simulation scenario 1. In scenario S1, we focused on a specific setting where the informative features were limited to strong DE features and DC features. We examined two different structures for the precision matrices of the connected components. In (a), the precision matrix in group one followed a star structure, while in (b), the precision matrix followed a band structure. The left side of both (a,b) display the precision matrices specific to each connected component. On the right side, there are graphical interpretations of the precision matrices, where features are represented by nodes, and connections are represented by edges. The strong DE feature, denoted as X1, is highlighted in different colors in two groups. In the sub-scenarios of S1, specific combinations of features were considered. In sub-scenario S1-1, there was one DC feature, X2. In sub-scenario S1-2, there were two DC features, X2 and X3. In sub-scenario S1-3, there were three DC features, X2, X3, and X4. All the partial correlation coefficients were set to ρ = 0.5.
Figure 4. Precision matrix of a connected component and its interpretation as a graph for simulation scenario 2. In scenario S2, the informative features could be classified into three distinct categories: strong DE, weak DE, and DC features. These categories were mutually exclusive. We examined two different structures for the precision matrices of the connected components: a star structure in (a) and a band structure in (b). The left side of both (a,b) displays the precision matrices specific to each connected component. On the right side, there are graphical interpretations of the precision matrices, where features are represented by nodes, and connections are represented by edges. The strong DE feature is highlighted in different colors in the two groups. In the sub-scenarios of S2, specific combinations of features were considered. The total number of DC features and weak DE features summed to three in each component. We systematically varied the number of DC features from three to zero, while simultaneously increasing the number of weak DE features from zero to three. For example, sub-scenario one in (a) has one strong DE feature X1 and three DC features X2, X3, and X4. Sub-scenario two in (a) has one strong DE feature X1, one weak DE feature X2, and two DC features X3 and X4. All the partial correlation coefficients were set to ρ = 0.5.
Figure 5. Precision matrix of a connected component and its interpretation as a graph for simulation scenario 3. In scenario S3, the informative features could be classified into three categories: strong DE, weak DE, and DC features. The weak DE and DC features were allowed to overlap. We examined two different structures for the precision matrices of the connected components: a star structure in (a) and a band structure in (b). The left side of both (a,b) displays the precision matrices specific to each connected component. On the right side, there are graphical interpretations of the precision matrices, where features are represented by nodes, and connections are represented by edges. The width of the edges corresponds to the strength of the connection or partial correlation coefficients between the features. Thicker edges indicate stronger connections (ρ = 0.5), while thinner edges represent weaker connections (ρ = 0.4 in (a) and ρ = 0.30 in (b)). The strong DE feature is highlighted in different colors in the two groups. In the sub-scenarios of S3, specific combinations of features were considered. Within each connected component, we ensured the presence of one strong DE feature X1 and three DC features X1, X2, and X3. However, the number of overlapping weak DE and DC features varied across different configurations. We explored scenarios with one, two, and three overlapping weak DE and DC features, respectively.
Figure 6. Boxplots for outcome classification accuracy for scenario 1. In Scenario 1, we focused on a specific setting where the informative features were limited to strong DE features and DC features. In each connected component, there was one strong DE feature with one, two, or three DC features. The corresponding outcomes were graphically represented in each column of the plot. The subplots in the lower row demonstrated the findings obtained from the star structure, while the upper row depicted the results associated with the band structure.
Figure 7. Boxplots for outcome classification accuracy for scenario 2. In Scenario 2, the informative features could be classified into three distinct categories: strong DE, weak DE, and DC features. These categories were mutually exclusive. The total number of DC features and weak DE features combined summed up to three in each component. We systematically varied the number of DC features from three to zero, while simultaneously increasing the number of weak DE features from zero to three. The corresponding outcomes were graphically represented in each column of the plot. The lower row of subplots demonstrated the findings obtained from the star structure, while the upper row depicted the results associated with the band structure.
Figure 8. Boxplots for outcome classification accuracy for scenario 3. In Scenario 3, the informative features could be classified into three distinct categories: strong DE, weak DE, and DC features. The weak DE and DC features were allowed to overlap. Within each connected component, we ensured the presence of one strong DE feature and three DC features. The number of overlapping weak DE and DC features is one, two or three. The corresponding outcomes were graphically represented in each column of the plot. The lower row of subplots demonstrated the findings obtained from the star structure, while the upper row depicted the results associated with the band structure.
Table 1. Feature selection performance of scenario 1. This table presents the performance of each method in our simulation studies regarding feature selection in Scenario 1. The evaluation metrics include F1 score, precision, and recall, which were computed based on the overall feature selection performance. The ‘TPR of DC’ column represents the true positive rate for DC-only features, indicating the proportion of actual DC features that are correctly selected by each method. All methods successfully selected all the strong DE features in all realizations within each sub-scenario of S1. There was no weak DE feature and no overlapping DC and weak DE feature. The values reported in the table are the mean and standard deviation, with the standard deviation in parentheses.
Setting | Group | F1 Score | Precision | Recall | TPR of DC
1 DC-Star | netQDA | 0.97 (0.08) | 0.98 (0.07) | 0.97 (0.10) | 0.96 (0.15)
1 DC-Star | mLDA | 0.65 (0.04) | 0.94 (0.13) | 0.50 (0.03) | 0.00 (0.00)
1 DC-Star | Lasso | 0.49 (0.22) | 0.66 (0.40) | 0.57 (0.15) | 0.14 (0.30)
1 DC-Star | Elastic Net | 0.52 (0.22) | 0.69 (0.40) | 0.60 (0.18) | 0.20 (0.36)
1 DC-Star | SIS | 0.26 (0.04) | 0.15 (0.02) | 0.86 (0.14) | 0.73 (0.28)
2 DC-Star | netQDA | 0.99 (0.03) | 0.98 (0.05) | 1.00 (0.00) | 1.00 (0.00)
2 DC-Star | mLDA | 0.49 (0.03) | 0.95 (0.12) | 0.33 (0.02) | 0.00 (0.00)
2 DC-Star | Lasso | 0.43 (0.12) | 0.68 (0.37) | 0.45 (0.19) | 0.17 (0.28)
2 DC-Star | Elastic Net | 0.42 (0.12) | 0.70 (0.38) | 0.45 (0.20) | 0.18 (0.30)
2 DC-Star | SIS | 0.38 (0.05) | 0.24 (0.03) | 0.93 (0.11) | 0.89 (0.16)
3 DC-Star | netQDA | 0.99 (0.02) | 0.99 (0.04) | 1.00 (0.02) | 0.99 (0.03)
3 DC-Star | mLDA | 0.60 (0.13) | 0.94 (0.10) | 0.46 (0.13) | 0.28 (0.18)
3 DC-Star | Lasso | 0.37 (0.13) | 0.51 (0.37) | 0.56 (0.27) | 0.42 (0.36)
3 DC-Star | Elastic Net | 0.39 (0.12) | 0.55 (0.36) | 0.57 (0.28) | 0.42 (0.37)
3 DC-Star | SIS | 0.51 (0.02) | 0.34 (0.01) | 0.99 (0.03) | 0.99 (0.05)
1 DC-Band | netQDA | 0.98 (0.05) | 0.97 (0.07) | 0.98 (0.06) | 0.97 (0.12)
1 DC-Band | mLDA | 0.69 (0.10) | 0.96 (0.11) | 0.56 (0.13) | 0.12 (0.25)
1 DC-Band | Lasso | 0.46 (0.23) | 0.56 (0.41) | 0.70 (0.22) | 0.41 (0.44)
1 DC-Band | Elastic Net | 0.49 (0.22) | 0.59 (0.40) | 0.70 (0.21) | 0.40 (0.42)
1 DC-Band | SIS | 0.30 (0.01) | 0.17 (0.00) | 1.00 (0.03) | 0.99 (0.05)
2 DC-Band | netQDA | 0.99 (0.02) | 0.99 (0.04) | 1.00 (0.00) | 1.00 (0.00)
2 DC-Band | mLDA | 0.67 (0.11) | 0.95 (0.11) | 0.53 (0.12) | 0.29 (0.18)
2 DC-Band | Lasso | 0.37 (0.15) | 0.47 (0.36) | 0.52 (0.16) | 0.28 (0.25)
2 DC-Band | Elastic Net | 0.37 (0.16) | 0.47 (0.37) | 0.53 (0.16) | 0.30 (0.25)
2 DC-Band | SIS | 0.30 (0.04) | 0.19 (0.02) | 0.73 (0.09) | 0.59 (0.13)
3 DC-Band | netQDA | 0.99 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00)
3 DC-Band | mLDA | 0.62 (0.06) | 0.94 (0.10) | 0.47 (0.06) | 0.29 (0.08)
3 DC-Band | Lasso | 0.32 (0.13) | 0.52 (0.40) | 0.41 (0.15) | 0.21 (0.19)
3 DC-Band | Elastic Net | 0.34 (0.11) | 0.53 (0.38) | 0.41 (0.14) | 0.21 (0.19)
3 DC-Band | SIS | 0.32 (0.05) | 0.21 (0.04) | 0.62 (0.11) | 0.49 (0.14)
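The feature selection metrics named in the Table 1 caption can be illustrated with a short sketch. This is not the authors' code: the function name `selection_metrics`, the feature indices, and the toy selection are all hypothetical, and the sketch assumes the metrics are computed per realization from the set of selected features and the set of truly informative features.

```python
def selection_metrics(selected, informative, dc_only):
    """Precision, recall and F1 over all informative features,
    plus the true positive rate restricted to DC-only features."""
    selected = set(selected)
    informative = set(informative)
    dc_only = set(dc_only)

    true_positives = len(selected & informative)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(informative) if informative else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0

    # TPR of DC: share of the DC-only features that were selected.
    # Undefined (NaN) when a setting contains no DC-only feature.
    tpr_dc = len(selected & dc_only) / len(dc_only) if dc_only else float("nan")

    return {"precision": precision, "recall": recall, "f1": f1, "tpr_dc": tpr_dc}


# Hypothetical toy realization: features 0-1 are strong DE, 2-3 are DC-only.
informative = {0, 1, 2, 3}
dc_only = {2, 3}
selected = {0, 1, 2, 7}  # one DC feature missed, one false positive

result = selection_metrics(selected, informative, dc_only)
print(result)
```

If the reported means are averages of per-realization values, as this sketch assumes, the mean F1 score in a row need not equal the harmonic mean of that row's mean precision and mean recall.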
Table 2. Feature selection performance of scenario 2. This table presents the performance of each method in our simulation studies regarding feature selection in Scenario 2. The evaluation metrics include F1 score, precision, and recall, which were computed based on the overall feature selection performance. The ‘TPR of DC’ represents the true positive rate for DC-only features, indicating the proportion of actual DC features that are correctly selected by each method. Similarly, the ‘TPR of weak DE’ represents the true positive rate for weak DE-only features, indicating the proportion of actual weak DE features that are correctly selected by each method. There were no overlapping DC and weak DE features in S2. All methods successfully selected all the strong DE features in all realizations within each sub-scenario. The values reported in the table are the mean and standard deviation, with the standard deviation in parentheses.
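Setting | Group | F1 Score | Precision | Recall | TPR of DC | TPR of Weak DE
3DC 0weakDE-Star | netQDA | 0.99 (0.02) | 0.99 (0.04) | 1.00 (0.02) | 0.99 (0.03) | -
3DC 0weakDE-Star | mLDA | 0.60 (0.13) | 0.94 (0.10) | 0.46 (0.13) | 0.28 (0.18) | -
3DC 0weakDE-Star | Lasso | 0.37 (0.13) | 0.51 (0.37) | 0.56 (0.27) | 0.42 (0.36) | -
3DC 0weakDE-Star | Elastic Net | 0.39 (0.12) | 0.55 (0.36) | 0.57 (0.28) | 0.42 (0.37) | -
3DC 0weakDE-Star | SIS | 0.51 (0.02) | 0.34 (0.01) | 0.99 (0.03) | 0.99 (0.05) | -
2DC 1weakDE-Star | netQDA | 0.99 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
2DC 1weakDE-Star | mLDA | 0.73 (0.08) | 0.95 (0.08) | 0.60 (0.09) | 0.21 (0.17) | 0.99 (0.07)
2DC 1weakDE-Star | Lasso | 0.35 (0.15) | 0.30 (0.25) | 0.77 (0.21) | 0.58 (0.32) | 0.90 (0.27)
2DC 1weakDE-Star | Elastic Net | 0.35 (0.15) | 0.30 (0.23) | 0.76 (0.21) | 0.57 (0.33) | 0.90 (0.26)
2DC 1weakDE-Star | SIS | 0.51 (0.02) | 0.34 (0.02) | 0.98 (0.05) | 0.96 (0.09) | 1.00 (0.00)
1DC 2weakDE-Star | netQDA | 0.99 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
1DC 2weakDE-Star | mLDA | 0.85 (0.06) | 0.96 (0.06) | 0.77 (0.09) | 0.16 (0.26) | 0.96 (0.09)
1DC 2weakDE-Star | Lasso | 0.30 (0.15) | 0.21 (0.16) | 0.85 (0.13) | 0.54 (0.39) | 0.94 (0.14)
1DC 2weakDE-Star | Elastic Net | 0.31 (0.15) | 0.21 (0.13) | 0.84 (0.13) | 0.50 (0.38) | 0.94 (0.14)
1DC 2weakDE-Star | SIS | 0.50 (0.03) | 0.34 (0.02) | 0.98 (0.05) | 0.91 (0.21) | 1.00 (0.00)
0DC 3weakDE-Star | netQDA | 0.99 (0.03) | 0.99 (0.03) | 0.99 (0.04) | - | 0.98 (0.05)
0DC 3weakDE-Star | mLDA | 0.99 (0.03) | 0.97 (0.05) | 1.00 (0.00) | - | 1.00 (0.00)
0DC 3weakDE-Star | Lasso | 0.22 (0.10) | 0.14 (0.12) | 0.93 (0.12) | - | 0.90 (0.17)
0DC 3weakDE-Star | Elastic Net | 0.22 (0.09) | 0.14 (0.12) | 0.92 (0.14) | - | 0.90 (0.18)
0DC 3weakDE-Star | SIS | 0.52 (0.00) | 0.35 (0.00) | 1.00 (0.00) | - | 1.00 (0.00)
3DC 0weakDE-Band | netQDA | 0.99 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | -
3DC 0weakDE-Band | mLDA | 0.65 (0.02) | 0.94 (0.09) | 0.50 (0.00) | 0.33 (0.00) | -
3DC 0weakDE-Band | Lasso | 0.30 (0.14) | 0.39 (0.34) | 0.46 (0.14) | 0.28 (0.18) | -
3DC 0weakDE-Band | Elastic Net | 0.30 (0.12) | 0.37 (0.34) | 0.45 (0.12) | 0.27 (0.17) | -
3DC 0weakDE-Band | SIS | 0.30 (0.05) | 0.20 (0.03) | 0.59 (0.09) | 0.45 (0.12) | -
2DC 1weakDE-Band | netQDA | 0.99 (0.02) | 0.98 (0.04) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
2DC 1weakDE-Band | mLDA | 0.66 (0.05) | 0.95 (0.09) | 0.52 (0.05) | 0.04 (0.09) | 0.99 (0.05)
2DC 1weakDE-Band | Lasso | 0.33 (0.14) | 0.38 (0.34) | 0.58 (0.19) | 0.27 (0.24) | 0.79 (0.38)
2DC 1weakDE-Band | Elastic Net | 0.33 (0.14) | 0.37 (0.32) | 0.59 (0.18) | 0.29 (0.23) | 0.79 (0.36)
2DC 1weakDE-Band | SIS | 0.42 (0.05) | 0.28 (0.03) | 0.81 (0.10) | 0.61 (0.20) | 1.00 (0.00)
1DC 2weakDE-Band | netQDA | 0.99 (0.03) | 0.99 (0.03) | 0.98 (0.05) | 1.00 (0.00) | 0.96 (0.09)
1DC 2weakDE-Band | mLDA | 0.77 (0.08) | 0.96 (0.08) | 0.64 (0.09) | 0.00 (0.00) | 0.79 (0.18)
1DC 2weakDE-Band | Lasso | 0.32 (0.14) | 0.31 (0.27) | 0.59 (0.15) | 0.40 (0.41) | 0.47 (0.18)
1DC 2weakDE-Band | Elastic Net | 0.33 (0.14) | 0.30 (0.25) | 0.59 (0.14) | 0.39 (0.38) | 0.48 (0.17)
1DC 2weakDE-Band | SIS | 0.37 (0.04) | 0.25 (0.03) | 0.73 (0.08) | 0.84 (0.25) | 0.53 (0.09)
0DC 3weakDE-Band | netQDA | 0.98 (0.05) | 0.99 (0.04) | 0.97 (0.07) | - | 0.96 (0.09)
0DC 3weakDE-Band | mLDA | 0.85 (0.14) | 0.95 (0.08) | 0.78 (0.21) | - | 0.71 (0.28)
0DC 3weakDE-Band | Lasso | 0.31 (0.14) | 0.33 (0.30) | 0.50 (0.11) | - | 0.33 (0.15)
0DC 3weakDE-Band | Elastic Net | 0.29 (0.13) | 0.29 (0.27) | 0.50 (0.11) | - | 0.33 (0.15)
0DC 3weakDE-Band | SIS | 0.27 (0.02) | 0.18 (0.02) | 0.52 (0.05) | - | 0.36 (0.06)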
Table 3. Feature selection performance of scenario 3. This table presents the performance of each method in our simulation studies regarding feature selection in Scenario 3. The evaluation metrics include F1 score, precision, and recall, which were computed based on the overall feature selection performance. The ‘TPR of DC’ represents the true positive rate for DC (only) features, indicating the proportion of actual DC features that are correctly selected by each method. The ‘TPR of DC and weak DE’ represents the true positive rate for the overlapping category of DC and weak DE features, indicating the proportion of actual overlapping DC and weak DE features that are correctly selected by each method. All methods successfully selected all the strong DE features in all realizations within each sub-scenario. There was no weak DE-only feature. The values reported in the table are the mean and standard deviation, provided in parentheses.
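Setting | Group | F1 Score | Precision | Recall | TPR of DC-Only | TPR of DC and Weak DE
2DConly 1DCweakDE-Star | netQDA | 1.00 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
2DConly 1DCweakDE-Star | mLDA | 0.75 (0.07) | 0.96 (0.08) | 0.62 (0.08) | 0.23 (0.17) | 1.00 (0.00)
2DConly 1DCweakDE-Star | Lasso | 0.35 (0.15) | 0.32 (0.27) | 0.77 (0.23) | 0.61 (0.35) | 0.85 (0.32)
2DConly 1DCweakDE-Star | Elastic Net | 0.36 (0.16) | 0.33 (0.27) | 0.74 (0.22) | 0.56 (0.34) | 0.85 (0.30)
2DConly 1DCweakDE-Star | SIS | 0.51 (0.02) | 0.34 (0.01) | 0.98 (0.04) | 0.97 (0.08) | 1.00 (0.00)
1DConly 2DCweakDE-Star | netQDA | 1.00 (0.01) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
1DConly 2DCweakDE-Star | mLDA | 0.83 (0.04) | 0.96 (0.06) | 0.73 (0.06) | 0.01 (0.09) | 0.95 (0.10)
1DConly 2DCweakDE-Star | Lasso | 0.33 (0.16) | 0.26 (0.23) | 0.84 (0.17) | 0.56 (0.40) | 0.90 (0.22)
1DConly 2DCweakDE-Star | Elastic Net | 0.34 (0.15) | 0.26 (0.21) | 0.83 (0.17) | 0.55 (0.39) | 0.90 (0.22)
1DConly 2DCweakDE-Star | SIS | 0.51 (0.02) | 0.34 (0.01) | 0.99 (0.04) | 0.94 (0.16) | 1.00 (0.00)
0DConly 3DCweakDE-Star | netQDA | 0.99 (0.02) | 0.99 (0.04) | 1.00 (0.00) | - | 1.00 (0.00)
0DConly 3DCweakDE-Star | mLDA | 0.88 (0.06) | 0.97 (0.06) | 0.81 (0.10) | - | 0.75 (0.13)
0DConly 3DCweakDE-Star | Lasso | 0.29 (0.13) | 0.19 (0.15) | 0.93 (0.13) | - | 0.90 (0.17)
0DConly 3DCweakDE-Star | Elastic Net | 0.30 (0.14) | 0.22 (0.18) | 0.91 (0.15) | - | 0.89 (0.20)
0DConly 3DCweakDE-Star | SIS | 0.52 (0.00) | 0.35 (0.00) | 1.00 (0.00) | - | 1.00 (0.00)
2DConly 1DCweakDE-Band | netQDA | 0.99 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
2DConly 1DCweakDE-Band | mLDA | 0.62 (0.06) | 0.94 (0.10) | 0.47 (0.06) | 0.00 (0.00) | 0.88 (0.23)
2DConly 1DCweakDE-Band | Lasso | 0.32 (0.13) | 0.52 (0.40) | 0.41 (0.15) | 0.05 (0.12) | 0.54 (0.45)
2DConly 1DCweakDE-Band | Elastic Net | 0.34 (0.11) | 0.53 (0.38) | 0.41 (0.14) | 0.05 (0.12) | 0.53 (0.45)
2DConly 1DCweakDE-Band | SIS | 0.32 (0.05) | 0.21 (0.04) | 0.62 (0.11) | 0.23 (0.21) | 1.00 (0.00)
1DConly 2DCweakDE-Band | netQDA | 0.99 (0.02) | 0.99 (0.03) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00)
1DConly 2DCweakDE-Band | mLDA | 0.74 (0.09) | 0.94 (0.08) | 0.62 (0.10) | 0.00 (0.00) | 0.74 (0.20)
1DConly 2DCweakDE-Band | Lasso | 0.32 (0.13) | 0.47 (0.38) | 0.42 (0.14) | 0.04 (0.13) | 0.32 (0.26)
1DConly 2DCweakDE-Band | Elastic Net | 0.32 (0.13) | 0.51 (0.39) | 0.41 (0.14) | 0.04 (0.14) | 0.30 (0.26)
1DConly 2DCweakDE-Band | SIS | 0.29 (0.05) | 0.20 (0.03) | 0.57 (0.09) | 0.10 (0.22) | 0.59 (0.14)
0DConly 3DCweakDE-Band | netQDA | 1.00 (0.02) | 0.99 (0.03) | 1.00 (0.01) | - | 1.00 (0.02)
0DConly 3DCweakDE-Band | mLDA | 0.80 (0.13) | 0.95 (0.08) | 0.71 (0.19) | - | 0.62 (0.26)
0DConly 3DCweakDE-Band | Lasso | 0.32 (0.12) | 0.44 (0.37) | 0.42 (0.13) | - | 0.23 (0.18)
0DConly 3DCweakDE-Band | Elastic Net | 0.32 (0.12) | 0.46 (0.38) | 0.41 (0.13) | - | 0.22 (0.18)
0DConly 3DCweakDE-Band | SIS | 0.29 (0.04) | 0.19 (0.03) | 0.55 (0.08) | - | 0.40 (0.10)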
Table 4. Prediction performance of netQDA compared to competing methods in real data applications. This table presents the outcome classification performance of netQDA and competing methods in real data applications. The evaluation metrics include ROC-AUC, or area under the receiver operating characteristic curve, and accuracy.
Elastic Net | 0.61 | 0.62 | 0.51 | 0.63
Table 5. Top enriched canonical pathways of potentially informative genes by netQDA and competing methods using Ingenuity Pathway Analysis in the GSE76705 COPD dataset. Ratio is the number of candidate genes that map to the pathway divided by the total number of genes in the same pathway. Genes in bold denote potentially informative genes related to allergy uniquely identified by netQDA.
Method | Ingenuity Canonical Pathway | p Value | Ratio | Molecules
netQDA | Interferon Signaling | <0.001 | 0.111 | IFIT1, IFIT3, ISG15, MX1
netQDA | Role of Hypercytokinemia/hyperchemokinemia in the Pathogenesis of Influenza | <0.001 | 0.047 | IFIT3, ISG15, MX1, RSAD2
netQDA | Role of Lipids/Lipid Rafts in the Pathogenesis of Influenza | 0.026 | 0.044 | RSAD2
netQDA | IL-12 Signaling and Production in Macrophages | 0.030 | 0.008 | FCGR1A, FCGR1B
mLDA | Interferon Signaling | 0.010 | 0.028 | ISG15
mLDA | Complement System | 0.011 | 0.027 | SERPING1
mLDA | Activation of IRF by Cytosolic Pattern Recognition Receptors | 0.019 | 0.015 | ISG15
mLDA | IL-7 Signaling Pathway | 0.023 | 0.013 | EBF1
mLDA | Role of Hypercytokinemia/hyperchemokinemia in the Pathogenesis of Influenza | 0.025 | 0.012 | ISG15
Lasso | Triacylglycerol Degradation | 0.017 | 0.035 | LIPC, TNFAIP6
Lasso | Tryptophan Degradation to 2-amino-3-carboxymuconate Semialdehyde | 0.020 | 0.167 | IDO1
Lasso | Natural Killer Cell Signaling | 0.032 | 0.015 | COL5A3, KIR2DL2, KIR2DS3
Lasso | Crosstalk between Dendritic Cells and Natural Killer Cells | 0.040 | 0.022 | HLA-DRB4, KIR2DL2
Lasso | NAD biosynthesis II (from tryptophan) | 0.044 | 0.077 | IDO1
Elastic Net | Histamine Biosynthesis | 0.003 | 1.000 | HDC
Elastic Net | HOTAIR Regulatory Pathway | 0.011 | 0.018 | MMP1, RBM38, SPP1
Elastic Net | Role Of Osteoclasts In Rheumatoid Arthritis Signaling Pathway | 0.011 | 0.013 | COL5A3, FCGR1A, MMP1, SPP1
Elastic Net | Triacylglycerol Degradation | 0.011 | 0.035 | LIPC, TNFAIP6
Elastic Net | Thioredoxin Pathway | 0.019 | 0.143 | NXN
SIS | Complement System | 0.009 | 0.027 | SERPING1
SIS | IL-7 Signaling Pathway | 0.019 | 0.013 | EBF1
SIS | WNK Renal Signaling Pathway | 0.026 | 0.009 | NR3C2
SIS | Adipogenesis pathway | 0.035 | 0.007 | EBF1
SIS | Aldosterone Signaling in Epithelial Cells | 0.043 | 0.006 | NR3C2
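The Ratio column defined in the Table 5 caption can be reproduced with a small sketch. The function name `pathway_ratio` and the placeholder gene names are hypothetical. The pathway size of 36 is an assumption inferred from the reported ratios (4 mapped genes at 0.111 for netQDA and 1 mapped gene at 0.028 for mLDA in Interferon Signaling); the source does not state the pathway membership.

```python
def pathway_ratio(candidate_genes, pathway_genes):
    """Number of candidate genes that map to the pathway,
    divided by the total number of genes in the pathway."""
    pathway = set(pathway_genes)
    if not pathway:
        raise ValueError("pathway_genes must not be empty")
    return len(set(candidate_genes) & pathway) / len(pathway)


# The four molecules reported for netQDA in Interferon Signaling, padded with
# 32 placeholder names (hypothetical) to reach an assumed pathway size of 36.
reported_hits = ["IFIT1", "IFIT3", "ISG15", "MX1"]
pathway = reported_hits + [f"PLACEHOLDER_{i}" for i in range(32)]

# Candidate list: the reported hits plus one candidate outside this pathway.
candidates = reported_hits + ["RSAD2"]

ratio = pathway_ratio(candidates, pathway)
print(round(ratio, 3))  # 0.111
```

Under the same assumed pathway size, a single mapped gene gives 1/36, which rounds to the 0.028 reported for mLDA.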
Table 6. Top enriched canonical pathways of potentially informative genes by netQDA and competing methods using Ingenuity Pathway Analysis in the GSE14166 allergy dataset. Ratio is the number of candidate genes that map to the pathway divided by the total number of genes in the same pathway. Lasso did not identify any genes as informative, so no enriched canonical pathways were identified for it. The set of enriched canonical pathways was the same for SIS and Elastic Net, as both were enriched by HRH4. Genes in bold denote potentially informative genes related to allergy uniquely identified by netQDA.
Method | Ingenuity Canonical Pathway | p Value | Ratio | Molecules
netQDA | FAK Signaling | <0.001 | 0.005 | CYSLTR2, HRH4, IL1RL1, IL5RA, P2RY14
netQDA | STAT3 Pathway | 0.002 | 0.015 | IL1RL1, IL5RA
netQDA | Breast Cancer Regulation by Stathmin1 | 0.004 | 0.005 | CYSLTR2, HRH4, P2RY14
netQDA | CREB Signaling in Neurons | 0.004 | 0.005 | CYSLTR2, HRH4, P2RY14
netQDA | Granulocyte Adhesion and Diapedesis | 0.005 | 0.011 | HRH4, IL1RL1
netQDA | PI3K/AKT Signaling | 0.005 | 0.010 | IL1RL1, IL5RA
netQDA | Phagosome Formation | 0.006 | 0.004 | CYSLTR2, HRH4, P2RY14
netQDA | G-Protein Coupled Receptor Signaling | 0.006 | 0.004 | CYSLTR2, HRH4, P2RY14
netQDA | S100 Family Signaling Pathway | 0.008 | 0.004 | CYSLTR2, HRH4, P2RY14
mLDA | Granulocyte Adhesion and Diapedesis | 0.001 | 0.011 | HRH4, IL1RL1
mLDA | FAK Signaling | 0.003 | 0.003 | CYSLTR2, HRH4, IL1RL1
mLDA | Breast Cancer Regulation by Stathmin1 | 0.012 | 0.003 | CYSLTR2, HRH4
mLDA | CREB Signaling in Neurons | 0.013 | 0.003 | CYSLTR2, HRH4
mLDA | Phagosome Formation | 0.016 | 0.003 | CYSLTR2, HRH4
Elastic Net/SIS | Granulocyte Adhesion and Diapedesis | 0.032 | 0.005 | HRH4
Elastic Net/SIS | Breast Cancer Regulation by Stathmin1 | 0.095 | 0.002 | HRH4
Elastic Net/SIS | CREB Signaling in Neurons | 0.098 | 0.002 | HRH4
Elastic Net/SIS | Phagosome Formation | 0.112 | 0.001 | HRH4
Elastic Net/SIS | G-Protein Coupled Receptor Signaling | 0.113 | 0.001 | HRH4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Zhou, X.; Chen, W.; Li, Y. netQDA: Local Network-Guided High-Dimensional Quadratic Discriminant Analysis. Mathematics 2024, 12, 3823.