Enhancing Gene Co-Expression Network Inference for the Malaria Parasite Plasmodium falciparum
Abstract
1. Introduction
1.1. Motivation and Related Work
1.2. Our Study and Contributions
2. Results and Discussion
2.1. Description of Overlap between Inferred Networks
2.2. Selecting the Best Clustering Parameter Values for the Inferred Networks
2.3. Validating the Inferred Networks in the Task of Predicting Gene-GO Term Associations
2.4. Ranking Predicted Gene–GO Term Associations and Gene–Gene Interactions
2.5. Validating the Inferred Networks Using Endocytosis-Related Biological Signatures
3. Conclusions
4. Materials and Methods
4.1. Data
4.1.1. Gene Expression Data
4.1.2. Processing Probe-Level Data
4.1.3. Imputing Missing Values in the Gene Expression Data
- Multivariate imputation by chained equations (MICE) [59] imputes a column (i.e., sample) by modeling each sample with missing values as a function of the other samples in a round-robin fashion. That is, given a sample column of interest, y, and all other sample columns, X, a regression model is fitted on the known values in X and y and is then used to predict the missing values in y.
- SVDimpute [60] is a singular value decomposition (SVD)-based imputation method. Intuitively, a matrix can be recovered asymptotically by using only its most significant singular values and the corresponding eigengenes. That is, given a gene u, a regression model of gene u against the k most significant eigengenes is fitted. Then, the learned coefficients of the linear combination of the k eigengenes are used to impute the missing values of gene u. This process is repeated iteratively until all missing values are imputed.
- KNNimpute [60] imputes missing values as follows. First, given a gene u with a missing value in sample j, k other genes without a missing value in sample j that are most similar to gene u are selected. Then, the weighted average expression level of the k selected genes in sample j is treated as an estimated expression level for gene u in sample j, where the weight is the expression similarity (measured using Euclidean distance) of a gene (i.e., among the k selected genes) to the gene u. We vary k from 1 to 24 with an increment of 2.
- Local least squares imputation (LLSimpute) [61] imputes missing values as follows. First, the k other genes that are most similar to (i.e., have the largest absolute Pearson correlation coefficients with) the gene u of interest are selected. Then, the missing values of gene u are estimated by a least squares regression on the k selected genes. LLSimpute differs from KNNimpute, where k is predefined, in that its k value is determined automatically.
- SoftImpute [62] imputes missing values by guessing values repeatedly. Specifically, the missing values in the gene expression data are initially filled as zero. Then, a guessed matrix is updated repeatedly by using the soft threshold SVD with different regularization parameters. If the smallest of the guessed singular values is less than the regularization parameter, then the desired guess is obtained. Please refer to [62] for methodological detail.
- BiScaler [63] was proposed based on SoftImpute but using alternating minimization algorithms. It introduced the quadratic regularization to shrink higher-order components more than the lower-order components such that it offers a better convergence compared to SoftImpute. Please refer to [63] for methodological detail.
- NuclearNormMinimization [64] imputes missing values by solving a simple convex optimization problem. It builds on the theory that, for a matrix M, the missing values can be recovered if the number of observed entries m obeys $m \geq c N^{1.2} r \log N$, where N is the number of rows in matrix M, c is a positive numerical constant, and r is the rank of M. This algorithm usually works well on smaller matrices.
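To make the KNNimpute idea described above concrete, the following is a minimal sketch in plain Python. It is not the implementation used in this study. The function name `knn_impute`, the use of `None` for missing entries, the root-mean-square Euclidean distance over commonly observed samples, and the inverse-distance weighting are all assumptions made for illustration.

```python
import math


def knn_impute(matrix, k):
    """Fill None entries of a genes x samples matrix in a KNNimpute-like way.

    matrix: list of rows (genes); each row is a list of floats or None.
    k: number of neighbouring genes used for each missing value.
    """
    imputed = [row[:] for row in matrix]
    for u, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for v, other in enumerate(matrix):
                # Only genes observed in sample j can donate a value.
                if v == u or other[j] is None:
                    continue
                # Distance over samples observed in both genes.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no donor gene available; leave the entry missing
            # Assumption: weight = 1 / distance, so closer genes count more.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            imputed[u][j] = (
                sum(w * x for w, (_, x) in zip(weights, nearest))
                / sum(weights))
    return imputed


toy = [[1.0, 2.0, None],
       [1.0, 2.0, 3.0],
       [5.0, 6.0, 7.0]]
filled = knn_impute(toy, k=1)
```

In this toy matrix, the first gene matches the second gene exactly on the observed samples, so with `k=1` its missing value is borrowed from the second gene.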
4.1.4. Accounting for Cyclical Stage Variation
4.1.5. Ground Truth GO Term Annotation Data
4.1.6. Endocytosis Data
4.2. Inference of Gene Co-Expression Networks
4.2.1. Network Construction Using ARACNe Framework
4.2.2. Network Construction Using Adaptive Lasso
4.2.3. Construction of the Consensus Network
- Given that we have four networks with different numbers of edges, we derive the following: network 1 has $n_1$ edges, and network 2 has $n_2$ edges, where $n_1 < n_2$. We argue that the least important edge in network 1 (ranked as $n_1$) should be equally important as the edges that have rank $n_1$ in network 2. This is because each network is inferred via methods with a well-established thresholding strategy, and a network with more edges intuitively has a loosened thresholding strategy compared to a network with fewer edges.
- We aim to make sure that the higher the rank value of an edge, the more important the edge is. For example, in network 1, an edge with rank $n_1$ means the edge is the most important edge in network 1. So, we reverse the way we rank edges from step 1.
- To ensure the above two steps are satisfied when we construct the Consensus network, we first calculate the number of edges in all four networks (i.e., AdaL, MI, absPCC-0.3, and RF-0.03). If network 4 is the largest network with $n_4$ edges, we use $n_4$ as our possible maximum rank such that the most important edge in each of the four networks has the same rank, which is $n_4$.
- According to step 3, network 1 has edge ranks from $n_4$ to $n_4 - n_1 + 1$, network 2 has edge ranks from $n_4$ to $n_4 - n_2 + 1$, network 3 has edge ranks from $n_4$ to $n_4 - n_3 + 1$, and network 4 has ranks from $n_4$ to 1.
- After we obtain all raw ranks for the edges of each network, we use min-max normalization to normalize the rank of each edge in each network. That is, we first find the maximum (i.e., also $n_4$) and minimum (i.e., also 1) edge rank across the four networks. Then, for a given edge between gene i and j with weight $w_{ij}$, the normalized rank is $\hat{w}_{ij} = \frac{w_{ij} - 1}{n_4 - 1}$. The resulting normalized rank of edges across four networks spans from 0 to 1.
- Finally, for a given edge, we sum the weights from the four networks. The collection of all such edges forms the Consensus network. That is, an edge between gene i and j in the Consensus network has a weight of $\sum_{l} \hat{w}_{ij}^{(l)}$ for $l \in \{1, 2, 3, 4\}$, where l is the network ID. Consequently, the resulting Consensus network has a maximum possible edge weight of 4 and a minimum possible edge weight of 0.
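The rank-aggregation steps above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code. The function name `consensus_weights` and the input layout (each network given as a list of edges already sorted from most to least important) are assumptions, and ties between equally weighted edges are ignored.

```python
from collections import defaultdict


def consensus_weights(ranked_networks):
    """Sum min-max normalized edge ranks across networks.

    ranked_networks: list of edge lists, one per network, each sorted from
    the most important edge to the least important one. Edges are
    (gene_i, gene_j) tuples; orientation is ignored.
    """
    # The largest network defines the maximum possible rank.
    n_max = max(len(edges) for edges in ranked_networks)
    weights = defaultdict(float)
    for edges in ranked_networks:
        for position, (gene_i, gene_j) in enumerate(edges):
            # The best edge of every network receives rank n_max, so a
            # network with n_k edges spans n_max down to n_max - n_k + 1.
            raw_rank = n_max - position
            # Min-max normalization with minimum rank 1 and maximum n_max.
            if n_max > 1:
                normalized = (raw_rank - 1) / (n_max - 1)
            else:
                normalized = 1.0
            weights[frozenset((gene_i, gene_j))] += normalized
    return dict(weights)


# Two hypothetical networks with 2 and 4 edges.
small = [("a", "b"), ("b", "c")]
large = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
consensus = consensus_weights([small, large])
```

With four input networks, an edge that is the most important edge in every network would receive the maximum weight of 4, in line with the range stated above.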
4.3. Clustering Methods
4.3.1. BigCLAM
4.3.2. MCL
4.4. Predicting and Evaluating Gene–GO Term Associations from Clusters
- For the given gene expression data, we test against relevant GO terms to predict and evaluate the accuracy of predicted gene–GO term associations for each of the combinations.
- For all clusters from a given combination, we use the hypergeometric test (Section 4.4.1) to compute the probability scores (i.e., p values) of the enrichment significance between each pair of a given cluster and a GO term. If a cluster is statistically significantly enriched in at least one GO term (i.e., p value < 0.05), we mark this cluster as an enriched cluster. We test all clusters and obtain the significantly enriched clusters.
- In parallel, we make gene–GO term association predictions using significantly enriched clusters and GO terms via leave-one-out cross-validation [80] as follows:
- First, we hide the GO term annotations of one gene i at a time.
- Second, we test whether each of the clusters that gene i belongs to is significantly enriched in any GO term. If such a cluster is statistically significantly enriched in a GO term j, we predict that gene i is annotated by GO term j.
- Third, we repeat the above two steps for every gene that has at least one existing GO term annotation. Then, we use precision and recall to evaluate prediction accuracy. The precision is the percentage of correct predictions among all predictions we make. The recall is the percentage of correct predictions among all existing gene–GO term associations. Because there is always a trade-off between precision and recall, we use precision as our parameter selection criterion (3). We do this because we believe that in biomedicine, for wet lab validation of predictions, it is more important to have a high precision if we cannot have both high precision and high recall [44,45].
- According to our three selection criteria, each combination has up to three clustering parameter values. Different selection criteria could end up with the same clustering parameter. This is why we have up to three selected clustering parameters for a given combination of a network and a clustering method.
- Because we also aim to compare prediction performance across different networks, we first select the best clustering parameter based on its leave-one-out cross-validation precision and recall for each of the 11 networks. Specifically, for two clustering parameters, if parameter 1 has a higher precision and a higher recall or a similar recall compared to parameter 2, we select parameter 1. If parameter 1 has a higher precision but a lower recall compared to parameter 2, we keep both. For those selected “best” parameters of each network, we further compare them using the same selection criteria to find the combinations that yield the best prediction accuracy in terms of precision and recall.
- We then qualitatively analyze our predicted gene–GO term associations using relevant biological pathways. In particular, we visualize how effectively each of the 22 combinations predicts true gene–GO term associations as a heat map showing the proportion of gene–GO term associations correctly predicted for each GO term (rows) by a given combination (columns). GO terms were grouped by semantic similarity using the web tool REVIGO [81]. Default REVIGO parameters were used to analyze the list of GO terms. The semantic similarity groupings and descriptions from REVIGO were exported by utilizing the “Export to TSV” option under the TREE MAP view.
- Finally, we perform a deep-dive analysis of the combinations selected in step 5 using the Jaccard index and the overlap coefficient. Both measure the overlap between two sets. In particular, given set A and set B, the Jaccard index is $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, and the overlap coefficient is $O(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$. The Jaccard index gives more accurate results when the sizes of sets A and B are close to each other, while the overlap coefficient gives more accurate results for small data when the sizes of sets A and B differ greatly. The number of predictions made by different combinations can be very different or similar, which is why we use both indices in our deep-dive analysis.
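The evaluation measures used in the steps above (precision, recall, Jaccard index, and overlap coefficient) all operate on sets of gene–GO term pairs and can be sketched as below. The function names and the toy identifiers such as `"g1"` and `"GO:A"` are hypothetical and only serve the example.

```python
def precision_recall(predicted, known):
    """Precision and recall of predicted (gene, GO term) pairs."""
    correct = predicted & known
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(known) if known else 0.0
    return precision, recall


def jaccard_index(a, b):
    """|A intersect B| / |A union B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """|A intersect B| / min(|A|, |B|); 0.0 if either set is empty."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical predictions from two combinations and known annotations.
known = {("g1", "GO:A"), ("g3", "GO:B"), ("g4", "GO:C"), ("g5", "GO:C")}
combo_1 = {("g1", "GO:A"), ("g2", "GO:A"), ("g3", "GO:B")}
combo_2 = {("g1", "GO:A"), ("g3", "GO:B"), ("g4", "GO:C"),
           ("g6", "GO:D"), ("g7", "GO:D"), ("g8", "GO:D")}

precision, recall = precision_recall(combo_1, known)
similarity = jaccard_index(combo_1, combo_2)
containment = overlap_coefficient(combo_1, combo_2)
```

In the toy data, the two prediction sets differ in size (3 versus 6), so the overlap coefficient (2/3) is noticeably higher than the Jaccard index (2/7), which illustrates why reporting both can be useful.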
4.4.1. Hypergeometric Test
- Within each network_cluster_parameter combination, we assume we have $c$ clusters and $g$ GO terms. We adjust the $c \times g$ p values (i.e., each corresponds to a cluster and a GO term pair). We then use the adjusted p values to determine whether a GO term is significantly enriched in a cluster.
- After we select up to six parameters per network based on our three systematic selection criteria, we recorrect the p values of the selected parameters across all 11 networks for fair comparison between networks for the given gene expression data. By recorrect, we mean we pool these p values and use FDR correction to obtain the adjusted p values. We do not correct across all tested clustering parameters because some of the parameters are tested only to make sure that we are not missing important clustering parameters. Therefore, adding p values from these parameters to the test correction can be too conservative and consequently remove many true positives.
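As a rough illustration of the enrichment test and the subsequent multiple-testing adjustment, the sketch below computes an upper-tail hypergeometric p value for one cluster and GO term pair and then adjusts a list of p values. The text above only states that FDR correction is used; the Benjamini–Hochberg procedure shown here is an assumption, as are the function and parameter names.

```python
from math import comb


def hypergeom_pvalue(population, annotated, cluster_size, overlap):
    """P(X >= overlap) for a cluster drawn without replacement.

    population: number of genes considered in total.
    annotated: genes in the population annotated by the GO term.
    cluster_size: number of genes in the cluster.
    overlap: genes in the cluster annotated by the GO term.
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, cluster_size - x)
        for x in range(overlap, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: all 3 genes of a cluster carry a term held by 3 of 10 genes.
p = hypergeom_pvalue(population=10, annotated=3, cluster_size=3, overlap=3)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

A cluster and GO term pair would then be called significantly enriched when its adjusted p value falls below 0.05.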
4.5. Assigning Confidence Scores to the Predicted Gene–GO Term Associations
- Recall that we have 11 co-expression networks and two clustering methods, for a total of 22 combinations of a network and a clustering method (i.e., 22 adjusted p values). Recall also that we selected up to three clustering parameters per combination, each corresponding to an adjusted p value. We select the smallest of these as the adjusted p value for the corresponding combination.
- We rank predictions based on the number of combinations that support this prediction and their corresponding adjusted p values. Specifically, we first take the negative log of the adjusted p values (transformed p values) such that the smaller the adjusted p value a prediction has (i.e., the more important the prediction is), the larger the transformed p value is. We then sum the 22 transformed p values and obtain one final index for each prediction, i.e., the confidence score. We rank the predictions from high to low based on their confidence scores.
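The scoring just described can be sketched as follows. The base of the logarithm is not specified above, so base 10 is assumed here. It is also assumed that a combination that does not support a prediction contributes nothing to the sum, and that adjusted p values are strictly positive. The function name and data layout are hypothetical.

```python
import math


def confidence_scores(combinations):
    """Rank predictions by summed negative log adjusted p values.

    combinations: list with one dict per network/clustering combination,
    mapping a prediction, e.g. (gene, GO term), to the adjusted p values
    obtained for the selected clustering parameters.
    """
    scores = {}
    for per_prediction in combinations:
        for prediction, pvalues in per_prediction.items():
            # Keep the smallest adjusted p value within a combination.
            best = min(pvalues)
            # Assumption: base-10 logarithm.
            scores[prediction] = scores.get(prediction, 0.0) - math.log10(best)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical combinations instead of the 22 used in the study.
ranked = confidence_scores([
    {("g1", "GO:A"): [0.01, 0.001], ("g2", "GO:B"): [0.01]},
    {("g1", "GO:A"): [0.01]},
])
```

Here the first prediction is supported by both combinations and therefore ranks above the second one.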
4.6. Assigning Confidence Scores to the Predicted Gene–Gene Interactions
- For each of the 22 combinations, we first identify the statistically significantly enriched clusters and obtain all genes in the clusters, along with their edges from the network.
- For each identified edge, we assign the negative log-transformed p value associated with the cluster that the edge belongs to as its weight. If an edge belongs to multiple clusters (e.g., genes can be grouped into multiple clusters via BigCLAM), we select the one with the smallest adjusted p value (i.e., the largest transformed p value).
- We sum up the 22 transformed p values (i.e., each corresponds to a combination) and obtain a final confidence score for each GGI. We rank the GGIs from high to low based on their confidence scores.
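For gene–gene interactions, the extra step is choosing one transformed p value per edge within each combination before summing across combinations. A minimal sketch under the same assumptions as before (base-10 logarithm, strictly positive adjusted p values, hypothetical names and data layout) could look like this:

```python
import math
from collections import defaultdict


def ggi_confidence(combinations):
    """Rank edges by summed per-combination transformed p values.

    combinations: list with one entry per network/clustering combination;
    each entry is a list of enriched clusters given as
    (adjusted_p, edges), with edges as (gene_i, gene_j) tuples.
    """
    totals = defaultdict(float)
    for clusters in combinations:
        best = {}
        for adjusted_p, edges in clusters:
            transformed = -math.log10(adjusted_p)  # assumed base 10
            for gene_i, gene_j in edges:
                edge = frozenset((gene_i, gene_j))
                # An edge in several clusters keeps the largest value.
                best[edge] = max(best.get(edge, 0.0), transformed)
        for edge, weight in best.items():
            totals[edge] += weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical combinations; edge a-b sits in two clusters of the first.
ranked_edges = ggi_confidence([
    [(0.01, [("a", "b"), ("b", "c")]), (0.001, [("a", "b")])],
    [(0.1, [("a", "b")])],
])
```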
4.7. Examining the Connectivity of Endocytosis-Related Genes in the Consensus Network
- Given a group of m endocytosis genes, we select the induced subgraph (i.e., the genes and their interactions) of these genes from a given network, and we calculate the observed density of the subgraph. Network density measures how close a network is to its complete version (i.e., all pairs of nodes are connected).
- We then randomly select m genes from the network, take their induced subgraph, and calculate its density. We repeat this process 1000 times and calculate the z score of the observed density compared to the densities from the 1000 random runs.
- We use 2.0 as the z score threshold to determine whether the observed density is significantly larger than the random densities. In particular, if the z score of a group of endocytosis genes is greater than 2.0, then the group of endocytosis genes is more densely connected than expected by chance.
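The density comparison above can be sketched with the standard library alone. This is an illustration rather than the code used in the study. It assumes an undirected network given as a list of unique edges without self-loops, uses the population standard deviation of the random densities (the text does not say which estimator is used), and fixes a random seed for reproducibility. All names are hypothetical.

```python
import random
import statistics


def induced_density(genes, edges):
    """Density of the subgraph induced by `genes`: edges / possible pairs."""
    genes = set(genes)
    n = len(genes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in genes and v in genes)
    return inside / (n * (n - 1) / 2)


def density_z_score(group, nodes, edges, runs=1000, seed=0):
    """z score of the observed density against same-size random gene sets.

    nodes must be a list so that it can be sampled from.
    """
    rng = random.Random(seed)
    observed = induced_density(group, edges)
    random_densities = [
        induced_density(rng.sample(nodes, len(group)), edges)
        for _ in range(runs)]
    mean = statistics.mean(random_densities)
    spread = statistics.pstdev(random_densities)
    if spread == 0:
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / spread


# Toy network: a 5-gene clique plus a sparse chain over 25 other genes.
toy_nodes = list(range(30))
toy_edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
toy_edges += [(i, i + 1) for i in range(5, 29)]
z = density_z_score(list(range(5)), toy_nodes, toy_edges)
```

In this toy network the clique genes form a complete subgraph, so their z score lies well above the 2.0 threshold.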
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
GO | Gene Ontology |
P. | Plasmodium |
MI | Mutual information |
absPCC | Absolute value of the Pearson correlation coefficient |
RF | Random Forest |
AdaL | Adaptive Lasso |
BigCLAM (BC) | Cluster affiliation model for big networks |
MCL | Markov Clustering |
PPI | Protein–protein interaction |
GGI | Gene–gene interaction |
References
- Tangpukdee, N.; Duangdee, C.; Wilairatana, P.; Krudsood, S. Malaria diagnosis: A brief review. Korean J. Parasitol. 2009, 47, 93.
- Oliveira-Ferreira, J.; Lacerda, M.V.; Brasil, P.; Ladislau, J.L.; Tauil, P.L.; Daniel-Ribeiro, C.T. Malaria in Brazil: An overview. Malar. J. 2010.
- Talapko, J.; Škrlec, I.; Alebić, T.; Jukić, M.; Včev, A. Malaria: The past and the present. Microorganisms 2019, 7, 179.
- Greenwood, B.; Mutabingwa, T. Malaria in 2002. Nature 2002, 415, 670–672.
- World Health Organization. Malaria. 2020. Available online: https://www.who.int/news-room/fact-sheets/detail/malaria (accessed on 17 January 2022).
- World Health Organization. The Potential Impact of Health Service Disruptions on the Burden of Malaria: A Modelling Analysis for Countries in Sub-Saharan Africa; WHO: Geneva, Switzerland, 2020.
- Yang, D.; He, Y.; Wu, B.; Deng, Y.; Li, M.; Yang, Q.; Huang, L.; Cao, Y.; Liu, Y. Drinking water and sanitation conditions are associated with the risk of malaria among children under five years old in sub-Saharan Africa: A logistic regression model analysis of national survey data. J. Adv. Res. 2020, 21, 1–13.
- Weiss, D.J.; Lucas, T.C.; Nguyen, M.; Nandi, A.K.; Bisanzio, D.; Battle, K.E.; Cameron, E.; Twohig, K.A.; Pfeffer, D.A.; Rozier, J.A.; et al. Mapping the global prevalence, incidence, and mortality of Plasmodium falciparum, 2000–17: A spatial and temporal modelling study. Lancet 2019, 394, 322–331.
- Dondorp, A.M.; Nosten, F.; Yi, P.; Das, D.; Phyo, A.P.; Tarning, J.; Lwin, K.M.; Ariey, F.; Hanpithakpong, W.; Lee, S.J.; et al. Artemisinin resistance in Plasmodium falciparum malaria. N. Engl. J. Med. 2009, 361, 455–467.
- Kochar, D.; Das, A.; Kochar, A.; Middha, S.; Acharya, J.; Tanwar, G.; Pakalapati, D.; Subudhi, A.; Boopathi, P.; Garg, S.; et al. A prospective study on adult patients of severe malaria caused by Plasmodium falciparum, Plasmodium vivax and mixed infection from Bikaner, northwest India. J. Vector Borne Dis. 2014, 51, 200.
- Manning, L.; Laman, M.; Law, I.; Bona, C.; Aipit, S.; Teine, D.; Warrell, J.; Rosanas-Urgell, A.; Lin, E.; Kiniboro, B.; et al. Features and prognosis of severe malaria caused by Plasmodium falciparum, Plasmodium vivax and mixed Plasmodium species in Papua New Guinean children. PLoS ONE 2011, 6, e29203.
- Iqbal, J.; Al-Awadhi, M.; Ahmad, S. Decreasing trend of imported malaria cases but increasing influx of mixed P. falciparum and P. vivax infections in malaria-free Kuwait. PLoS ONE 2020, 15, e0243617.
- Le Bras, J.; Durand, R. The mechanisms of resistance to antimalarial drugs in Plasmodium falciparum. Fundam. Clin. Pharmacol. 2003, 17, 147–153.
- Ippolito, M.M.; Moser, K.A.; Kabuya, J.B.B.; Cunningham, C.; Juliano, J.J. Antimalarial Drug Resistance and Implications for the WHO Global Technical Strategy. Curr. Epidemiol. Rep. 2021, 8, 46–62.
- Oberstaller, J.; Otto, T.D.; Rayner, J.C.; Adams, J.H. Essential Genes of the Parasitic Apicomplexa. Trends Parasitol. 2021, 37, 304–316.
- Hunt, P.; Martinelli, A.; Modrzynska, K.; Borges, S.; Creasey, A.; Rodrigues, L.; Beraldi, D.; Loewe, L.; Fawcett, R.; Kumar, S.; et al. Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites. BMC Genom. 2010, 11, 499.
- LaCount, D.J.; Vignali, M.; Chettier, R.; Phansalkar, A.; Bell, R.; Hesselberth, J.R.; Schoenfeld, L.W.; Ota, I.; Sahasrabudhe, S.; Kurschner, C.; et al. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature 2005, 438, 103–107.
- Hain, A.U.; Miller, A.S.; Levitskaya, J.; Bosch, J. Virtual screening and experimental validation identify novel inhibitors of the Plasmodium falciparum Atg8-Atg3 protein-protein interaction. ChemMedChem 2016, 11, 900–910.
- Ramaprasad, A.; Pain, A.; Ravasi, T. Defining the protein interaction network of human malaria parasite Plasmodium falciparum. Genomics 2012, 99, 69–75.
- Hu, G.; Cabrera, A.; Kono, M.; Mok, S.; Chaal, B.K.; Haase, S.; Engelberg, K.; Cheemadan, S.; Spielmann, T.; Preiser, P.R.; et al. Transcriptional profiling of growth perturbations of the human malaria parasite Plasmodium falciparum. Nat. Biotechnol. 2010, 28, 91–98.
- Weirauch, M.T. Gene co-expression networks for the analysis of DNA microarray data. Appl. Stat. Netw. Biol. Methods Syst. Biol. 2011, 1, 215–250.
- Siwo, G.H.; Tan, A.; Button-Simons, K.A.; Samarakoon, U.; Checkley, L.A.; Pinapati, R.S.; Ferdig, M.T. Predicting functional and regulatory divergence of a drug resistance transporter gene in the human malaria parasite. BMC Genom. 2015, 16, 115.
- Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4.
- Szklarczyk, D.; Gable, A.L.; Nastou, K.C.; Lyon, D.; Kirsch, R.; Pyysalo, S.; Doncheva, N.T.; Legeay, M.; Fang, T.; Bork, P.; et al. The STRING database in 2021: Customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021, 49, D605–D612.
- Huynh-Thu, V.A.; Irrthum, A.; Wehenkel, L.; Geurts, P. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 2010, 5, e12776.
- Krämer, N.; Schäfer, J.; Boulesteix, A.L. Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinform. 2009, 10, 384.
- Rider, A.K.; Milenković, T.; Siwo, G.H.; Pinapati, R.S.; Emrich, S.J.; Ferdig, M.T.; Chawla, N.V. Networks’ characteristics are important for systems biology. Netw. Sci. 2014, 2, 139–161.
- Marbach, D.; Costello, J.C.; Kuffner, R.; Vega, N.M.; Prill, R.J.; Camacho, D.M.; Allison, K.R.; Bonneau, R.; Chen, Y.; Collins, J.J.; et al. Wisdom of crowds for robust gene network inference. Nat. Methods 2012, 9, 796–804.
- Adjalley, S.H.; Scanfeld, D.; Kozlowski, E.; Llinas, M.; Fidock, D.A. Genome-wide transcriptome profiling reveals functional networks involving the Plasmodium falciparum drug resistance transporters PfCRT and PfMDR1. BMC Genom. 2015, 16, 1090.
- Tan, Q.W.; Mutwil, M. Malaria.tools—Comparative genomic and transcriptomic database for Plasmodium species. Nucleic Acids Res. 2019, 48, D768–D775.
- Yu, F.D.; Yang, S.Y.; Li, Y.Y.; Hu, W. Co-expression network with protein–protein interaction and transcription regulation in malaria parasite Plasmodium falciparum. Gene 2013, 518, 7–16.
- Margolin, A.A.; Nemenman, I.; Basso, K.; Wiggins, C.; Stolovitzky, G.; Dalla Favera, R.; Califano, A. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinform. 2006, 7, S7.
- Crawford, J.; Milenković, T. ClueNet: Clustering a temporal network based on topological similarity rather than denseness. PLoS ONE 2018, 13, e0195993.
- Yang, J.; Leskovec, J. Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, Rome, Italy, 4–8 February 2013; pp. 587–596.
- Enright, A.J.; Van Dongen, S.; Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30, 1575–1584.
- Pham, M.; Wilson, S.; Govindarajan, H.; Lin, C.H.; Lichtarge, O. Discovery of disease-and drug-specific pathways through community structures of a literature network. Bioinformatics 2020, 36, 1881–1888.
- Lu, K.; Yang, K.; Niyongabo, E.; Shu, Z.; Wang, J.; Chang, K.; Zou, Q.; Jiang, J.; Jia, C.; Liu, B.; et al. Integrated network analysis of symptom clusters across disease conditions. J. Biomed. Inform. 2020, 107, 103482.
- Gorovits, A.; Gujral, E.; Papalexakis, E.E.; Bogdanov, P. Larc: Learning activity-regularized overlapping communities across time. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1465–1474.
- Wang, Y.; Tang, H.; DeBarry, J.D.; Tan, X.; Li, J.; Wang, X.; Lee, T.h.; Jin, H.; Marler, B.; Guo, H.; et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012, 40, e49.
- Nelson-Sathi, S.; Sousa, F.L.; Roettger, M.; Lozada-Chávez, N.; Thiergart, T.; Janssen, A.; Bryant, D.; Landan, G.; Schönheit, P.; Siebers, B.; et al. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 2015, 517, 77–80.
- Liao, Q.; Liu, C.; Yuan, X.; Kang, S.; Miao, R.; Xiao, H.; Zhao, G.; Luo, H.; Bu, D.; Zhao, H.; et al. Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic Acids Res. 2011, 39, 3864–3878.
- Wang, Y.; Batra, A.; Schulenburg, H.; Dagan, T. Gene sharing among plasmids and chromosomes reveals barriers for antibiotic resistance gene transfer. Philos. Trans. R. Soc. B 2021, 377, 20200467.
- Carey, S.B.; Jenkins, J.; Lovell, J.T.; Maumus, F.; Sreedasyam, A.; Payton, A.C.; Shu, S.; Tiley, G.P.; Fernandez-Pozo, N.; Healey, A.; et al. Gene-rich UV sex chromosomes harbor conserved regulators of sexual development. Sci. Adv. 2021, 7, eabh2488.
- Li, Q.; Milenković, T. Supervised prediction of aging-related genes from a context-specific protein interaction subnetwork. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 2484–2498.
- Li, Q.; Newaz, K.; Milenković, T. Improved supervised prediction of aging-related genes via weighted dynamic network analysis. BMC Bioinform. 2021, 22, 520.
- Lawson, K.A.; Sousa, C.M.; Zhang, X.; Kim, E.; Akthar, R.; Caumanns, J.J.; Yao, Y.; Mikolajewicz, N.; Ross, C.; Brown, K.R.; et al. Functional genomic landscape of cancer-intrinsic evasion of killing by T cells. Nature 2020, 586, 120–126.
- Koskinen, P.; Törönen, P.; Nokso-Koivisto, J.; Holm, L. PANNZER: High-throughput functional annotation of uncharacterized proteins in an error-prone environment. Bioinformatics 2015, 31, 1544–1552.
- Ng, P.K.S.; Li, J.; Jeong, K.J.; Shao, S.; Chen, H.; Tsang, Y.H.; Sengupta, S.; Wang, Z.; Bhavana, V.H.; Tran, R.; et al. Systematic functional annotation of somatic mutations in cancer. Cancer Cell 2018, 33, 450–462.
- Birnbaum, J.; Scharf, S.; Schmidt, S.; Jonscher, E.; Hoeijmakers, W.A.M.; Flemming, S.; Toenhake, C.G.; Schmitt, M.; Sabitzki, R.; Bergmann, B.; et al. A Kelch13-defined endocytosis pathway mediates artemisinin resistance in malaria parasites. Science 2020, 367, 51–59.
- Pieperhoff, M.S.; Schmitt, M.; Ferguson, D.J.; Meissner, M. The role of clathrin in post-Golgi trafficking in Toxoplasma gondii. PLoS ONE 2013, 8, e77620.
- Henrici, R.C.; Edwards, R.L.; Zoltner, M.; van Schalkwyk, D.A.; Hart, M.N.; Mohring, F.; Moon, R.W.; Nofal, S.D.; Patel, A.; Flueck, C.; et al. The Plasmodium falciparum artemisinin susceptibility-associated AP-2 adaptin μ subunit is clathrin independent and essential for schizont maturation. Mbio 2020, 11, e02918-19.
- Thakur, V.; Asad, M.; Jain, S.; Hossain, M.E.; Gupta, A.; Kaur, I.; Rathore, S.; Ali, S.; Khan, N.J.; Mohmmed, A. Eps15 homology domain containing protein of Plasmodium falciparum (PfEHD) associates with endocytosis and vesicular trafficking towards neutral lipid storage site. Biochim. Biophys. Acta (BBA)-Mol. Cell Res. 2015, 1853, 2856–2869.
- Barabási, A.L. Network science. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2013, 371, 20120375.
- Hulovatyy, Y.; D’Mello, S.; Calvo, R.A.; Milenković, T. Network analysis improves interpretation of affective physiological data. J. Complex Netw. 2014, 2, 614–636.
- Gysi, D.M.; Voigt, A.; Fragoso, T.d.M.; Almaas, E.; Nowick, K. wTO: An R package for computing weighted topological overlap and a consensus network with integrated visualization tool. BMC Bioinform. 2018, 19, 392.
- Spielmann, T.; Gras, S.; Sabitzki, R.; Meissner, M. Endocytosis in Plasmodium and Toxoplasma parasites. Trends Parasitol. 2020, 36, 520–532.
- Karczewski, K.J.; Francioli, L.C.; Tiao, G.; Cummings, B.B.; Alföldi, J.; Wang, Q.; Collins, R.L.; Laricchia, K.M.; Ganna, A.; Birnbaum, D.P.; et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020, 581, 434–443.
- Hu, G.; Llinás, M.; Li, J.; Preiser, P.R.; Bozdech, Z. Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy. BMC Bioinform. 2007, 8, 350.
- Buuren, S.v.; Groothuis-Oudshoorn, K. MICE: Multivariate imputation by chained equations in R. J. Stat. Softw. 2010, 48, 1–67.
- Troyanskaya, O.; Cantor, M.; Sherlock, G.; Brown, P.; Hastie, T.; Tibshirani, R.; Botstein, D.; Altman, R.B. Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17, 520–525.
- Kim, H.; Golub, G.H.; Park, H. Missing value estimation for DNA microarray gene expression data: Local least squares imputation. Bioinformatics 2005, 21, 187–198.
- Mazumder, R.; Hastie, T.; Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 2010, 11, 2287–2322.
- Hastie, T.; Mazumder, R.; Lee, J.D.; Zadeh, R. Matrix completion and low-rank SVD via fast alternating least squares. J. Mach. Learn. Res. 2015, 16, 3367–3402.
- Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717.
- Stacklies, W.; Redestig, H.; Scholz, M.; Walther, D.; Selbig, J. PCAMethods—A bioconductor package providing PCA methods for incomplete data. Bioinformatics 2007, 23, 1164–1167.
- Rito, T.; Wang, Z.; Deane, C.M.; Reinert, G. How threshold behaviour affects the use of subgraphs for network comparison. Bioinformatics 2010, 26, i611–i617.
- Bansal, M.; Belcastro, V.; Ambesi-Impiombato, A.; Di Bernardo, D. How to infer gene networks from expression profiles. Mol. Syst. Biol. 2007, 3, 78.
- Ahsen, M.E.; Chun, Y.; Grishin, A.; Grishina, G.; Stolovitzky, G.; Pandey, G.; Bunyavanich, S. NeTFactor, a framework for identifying transcriptional regulators of gene expression-based biomarkers. Sci. Rep. 2019, 9, 12970.
- van Dam, S.; Vosa, U.; van der Graaf, A.; Franke, L.; de Magalhaes, J.P. Gene co-expression analysis for functional classification and gene–disease predictions. Briefings Bioinform. 2018, 19, 575–592.
- Montes, R.A.C.; Coello, G.; González-Aguilera, K.L.; Marsch-Martínez, N.; de Folter, S.; Alvarez-Buylla, E.R. ARACNe-based inference, using curated microarray data, of Arabidopsis thaliana root transcriptional regulatory networks. BMC Plant Biol. 2014, 14, 97.
- Lachmann, A.; Giorgi, F.M.; Lopez, G.; Califano, A. ARACNe-AP: Gene network reverse engineering through adaptive partitioning inference of mutual information. Bioinformatics 2016, 32, 2233–2235.
- Camacho, D.; De La Fuente, A.; Mendes, P. The origin of correlations in metabolomics data. Metabolomics 2005, 1, 53–63.
- Jahagirdar, S.; Suarez-Diez, M.; Saccenti, E. Simulation and Reconstruction of Metabolite–Metabolite Association Networks Using a Metabolic Dynamic Model and Correlation Based Algorithms. J. Proteome Res. 2019, 18, 1099–1113.
- Whittaker, J. Graphical Models in Applied Multivariate Statistics; Wiley Publishing: Hoboken, NJ, USA, 2009.
- Dobra, A.; Hans, C.; Jones, B.; Nevins, J.R.; Yao, G.; West, M. Sparse graphical models for exploring gene expression data. J. Multivar. Anal. 2004, 90, 196–212.
- Schäfer, J.; Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 2005, 4.
- Li, H.; Gui, J. Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics 2006, 7, 302–317.
- Wong, D.C.; Sweetman, C.; Ford, C.M. Annotation of gene function in citrus using gene expression information and co-expression networks. BMC Plant Biol. 2014, 14, 186.
- Bauer-Mehren, A.; Bundschus, M.; Rautschka, M.; Mayer, M.A.; Sanz, F.; Furlong, L.I. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PLoS ONE 2011, 6, e20284.
- Rund, S.S.; Yoo, B.; Alam, C.; Green, T.; Stephens, M.T.; Zeng, E.; George, G.F.; Sheppard, A.D.; Duffield, G.E.; Milenković, T.; et al. Genome-wide profiling of 24 hr diel rhythmicity in the water flea, Daphnia pulex: Network analysis reveals rhythmic gene expression and enhances functional gene annotation. BMC Genom. 2016, 17, 653.
- Supek, F.; Matko, B.; Skunca, N.; Smuc, T. REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms. PLoS ONE 2011, 6, e21800.
- Hahne, F.; Huber, W.; Gentleman, R.; Falcon, S.; Falcon, S.; Gentleman, R. Hypergeometric testing used for gene set enrichment analysis. In Bioconductor Case Studies; Springer: New York, NY, USA, 2008; pp. 207–220.
Network | Number of Nodes | Number of Edges | Density |
---|---|---|---|
MI | 4373 | 73,474 | 0.77% |
absPCC-0.3 | 3792 | 285,405 | 3.97% |
absPCC-0.4 | 3993 | 380,539 | 4.77% |
absPCC-0.5 | 4126 | 475,674 | 5.59% |
absPCC | 4322 | 951,347 | 10.19% |
RF-0.03 | 4188 | 25,000 | 0.29% |
RF-0.05 | 4323 | 43,406 | 0.46% |
RF-0.1 | 4372 | 86,811 | 0.91% |
RF | 4374 | 868,105 | 9.08% |
AdaL | 4082 | 7708 | 0.09% |
Consensus | 4374 | 333,162 | 3.48% |
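
The density column in the table above is consistent with the usual density of a simple undirected graph, 2E / (N(N − 1)), computed over the N nodes that actually appear in each network. The table itself does not state this formula, so treat it as an assumption; the minimal sketch below only checks that it reproduces the reported percentages. The dictionary and function names are illustrative.

```python
# Node and edge counts copied from the network statistics table.
# Assumption: density = edges / (possible undirected edges among the listed nodes).
NETWORKS = {
    "MI": (4373, 73474, 0.77),
    "absPCC-0.3": (3792, 285405, 3.97),
    "absPCC-0.4": (3993, 380539, 4.77),
    "absPCC-0.5": (4126, 475674, 5.59),
    "absPCC": (4322, 951347, 10.19),
    "RF-0.03": (4188, 25000, 0.29),
    "RF-0.05": (4323, 43406, 0.46),
    "RF-0.1": (4372, 86811, 0.91),
    "RF": (4374, 868105, 9.08),
    "AdaL": (4082, 7708, 0.09),
    "Consensus": (4374, 333162, 3.48),
}


def density_percent(num_nodes: int, num_edges: int) -> float:
    """Return the density of a simple undirected graph as a percentage."""
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return 100.0 * num_edges / possible_edges


if __name__ == "__main__":
    for name, (nodes, edges, reported) in NETWORKS.items():
        computed = density_percent(nodes, edges)
        print(f"{name:<12} computed={computed:.2f}%  reported={reported:.2f}%")
```

Because each density is computed over the network's own node count, networks with similar edge counts but fewer nodes (for example, the thresholded absPCC networks) report a slightly higher density than they would over all 4374 genes.
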
Network | BC: Criterion (i) | BC: Criterion (ii) | BC: Criterion (iii) | MCL: Criterion (i) | MCL: Criterion (ii) | MCL: Criterion (iii) |
---|---|---|---|---|---|---|
MI | 425 | 25 | 450 | I2.4 (3775) | I1.6 (133) | I1.7 (571) |
absPCC-0.3 | 350 | 100 | 450 | I1.5 (138) | I2.9 (908) | I2 (335) |
absPCC-0.4 | 650 | 125 | 500 | I1.6 (160) | I2.3 (522) | I2 (338) |
absPCC-0.5 | 650 | 275 | 475 | I1.32 (53) | I2.6 (724) | I2.4 (565) |
absPCC | 650 | 225 | 225 | I3.6 (1115) | I2.5 (361) | I3 (711) |
RF-0.03 | 700 | 25 | 200 | I2.8 (2315) | I1.2 (31) | I2 (1156) |
RF-0.05 | 600 | 50 | 350 | I3.2 (2985) | I1.24 (19) | I1.9 (830) |
RF-0.1 | 700 | 50 | 450 | I5 (3953) | I1.28 (10) | I2.1 (1129) |
RF | 475 | 50 | 75 | I5 (3519) | I1.7 (6) | I2.5 (201) |
AdaL | 550 | 25 | 75 | I1.24 (314) | I1.2 (238) | I1.7 (1105) |
Consensus | 275 | 275 | 200 | I5 (3656) | I1.36 (13) | I1.7 (243) |
Group | Time Points | Treatments |
---|---|---|
1 | 8 | Control, RoscovitineA, CyclosporineA, FK506 |
2 | 5 | Control, Colchicine, Na3VO4, StaurosporineA |
3 | 7 | Control, ML7, W7, KN93, Staurosporine |
4 | 6 | Control, Artemisinin, Chloroquine, Febrifugine, Quinine |
5 | 5 | Control, E64, Leupeptin, PMSF, RetinolA |
6 | 6 | Control, Apicidin (troph 5 nM), Apicidin (troph IC90) |
7 | 5 | Control, Apicidin (schiz IC50), Apicidin (schiz IC90) |
8 | 6 | Control, TrichostatinA (IC50), TrichostatinA (IC90) |
9 | 6 | Control, Chloroquine (IC50), Chloroquine (IC90), Chloroquine (2 × IC90) |
10 | 10 | Control, EGTA (IC50), EGTA (IC90) |
Number of Genes | Number of Samples | Number of Relevant GO Terms | Number of Relevant Gene–GO Term Associations |
---|---|---|---|
4374 | 183 | 255 | 3232 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Q.; Button-Simons, K.A.; Sievert, M.A.C.; Chahoud, E.; Foster, G.F.; Meis, K.; Ferdig, M.T.; Milenković, T. Enhancing Gene Co-Expression Network Inference for the Malaria Parasite Plasmodium falciparum. Genes 2024, 15, 685. https://doi.org/10.3390/genes15060685