Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes
Abstract
:1. Introduction
2. Materials and Methods
2.1. Consensus-Guided Unsupervised Feature Selection
2.2. Evaluation
2.3. Ensemble Consensus-Guided Unsupervised Feature Selection
3. Results
3.1. Gene Expression Data
3.2. Parameter Setting
3.3. Performance Comparison between Ensemble Consensus-Guided Unsupervised Feature Selection and Other Methods
3.4. Enrichment Analysis
4. Discussion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Barchet, T.M.; Amiji, M.M. Challenges and opportunities in CNS delivery of therapeutics for neurodegenerative diseases. Expert Opin. Drug Deliv. 2009, 6, 211–225. [Google Scholar] [CrossRef] [PubMed]
- Bateman, R. Alzheimer’s disease and other dementias: Advances in 2014. Lancet Neurol. 2015, 14, 4–6. [Google Scholar] [CrossRef]
- Wurtman, R. Biomarkers in the diagnosis and management of Alzheimer’s disease. Metab. Clin. Exp. 2014, 64, S47–S50. [Google Scholar] [CrossRef] [PubMed]
- Miller, D.B.; O’Callaghan, J.P. Biomarkers of Parkinson’s disease: Present and future. Metab. Clin. Exp. 2015, 64, S40–S46. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Luthi-Carter, R.; Apostol, B.L.; Dunah, A.W.; Dejohn, M.M.; Farrell, L.A.; Bates, G.P.; Young, A.B.; Standaert, D.G.; Thompson, L.M.; Cha, J.H. Complex alteration of NMDA receptors in transgenic Huntington’s disease mouse brain: Analysis of mRNA and protein expression, plasma membrane association, interacting proteins, and phosphorylation. Neurobiol. Dis. 2003, 14, 624–636. [Google Scholar] [CrossRef] [PubMed]
- Luthi-Carter, R.; Strand, A.; Peters, N.L.; Solano, S.M.; Hollingsworth, Z.R.; Menon, A.S.; Frey, A.S.; Spektor, B.S.; Penney, E.B.; Schilling, G. Decreased expression of striatal signaling genes in a mouse model of Huntington’s disease. Hum. Mol. Genet. 2000, 9, 1259–1271. [Google Scholar] [CrossRef] [PubMed]
- Romanoski, C.E.; Lee, S.; Kim, M.J.; Ingram-Drake, L.; Plaisier, C.L.; Yordanova, R.; Tilford, C.; Guan, B.; He, A.; Gargalovic, P.S. Systems Genetics Analysis of Gene-by-Environment Interactions in Human Cells. Am. J. Hum. Genet. 2010, 86, 399–410. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hong, F.; Breitling, R. A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics 2008, 24, 374–382. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jiang, X.; Zhang, H.; Zhang, Z.; Quan, X. Flexible non-negative matrix factorization to unravel disease-related genes. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018. [Google Scholar] [CrossRef] [PubMed]
- Xulvibrunet, R.; Li, H. Co-expression networks: Graph properties and topological comparisons. Bioinformatics 2010, 26, 205–214. [Google Scholar] [CrossRef] [PubMed]
- Iancu, O.D.; Kawane, S.; Bottomly, D.; Searles, R.; Hitzemann, R.; Mcweeney, S. Utilizing RNA-Seq data for de novo coexpression network inference. Bioinformatics 2012, 28, 1592–1597. [Google Scholar] [CrossRef] [PubMed]
- Jiang, X.; Zhang, H.; Quan, X.; Liu, Z. Disease-related gene module detection based on a multi-label propagation clustering algorithm. PLoS ONE 2017, 12, e178006. [Google Scholar] [CrossRef] [PubMed]
- Saeys, Y.; Abeel, T.; Peer, Y. Robust Feature Selection Using Ensemble Feature Selection Techniques. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, Antwerp, Belgium, 14–18 September 2008; pp. 313–325. [Google Scholar] [CrossRef]
- Wolf, L.; Shashua, A. Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weighted-based Approach. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; p. 378. [Google Scholar]
- Liu, H.; Shao, M.; Fu, Y. Consensus Guided Unsupervised Feature Selection. In Proceedings of the Association for the Advancement of Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
- Wan, S.; Duan, Y.; Zou, Q. HPSLPred: An ensemble multi-label classifier for human protein subcellular location prediction with imbalanced source. Proteomics 2017, 17, 1700262. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.; Ying, Z.; Ji, Q.; Liu, X.; Jiang, Y.; Ke, C.; Zou, Q. Hierarchical classification of protein folds using a novel ensemble classifier. PLoS ONE 2013, 8, e56499. [Google Scholar] [CrossRef]
- Zou, Q.; Guo, J.; Ju, Y.; Wu, M.; Zeng, X.; Hong, Z. Improving tRNAscan-SE annotation results via ensemble classifiers. QSAR Comb. Sci. 2015, 34, 761–770. [Google Scholar] [CrossRef] [PubMed]
- Chen, W.; Xing, P.; Zou, Q. Detecting N6-methyladenosine sites from RNA transcriptomes using ensemble Support Vector Machines. Sci. Rep. 2017, 7, 40242. [Google Scholar] [CrossRef] [PubMed]
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef] [Green Version]
- Mirkin, B. Reinterpreting the category utility function. Mach. Learn. 2001, 45, 219–228. [Google Scholar] [CrossRef]
- Wu, J.; Liu, H.; Xiong, H.; Cao, J. A Theoretic Framework of K-Means-Based Consensus Clustering. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; pp. 1799–1805. [Google Scholar]
- Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef] [Green Version]
- Langfelder, P.; Cantle, J.P.; Chatzopoulou, D.; Wang, N.; Gao, F.; Alramahi, I.; Lu, X.H.; Ramos, E.M.; Elzein, K.; Zhao, Y. Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice. Nat. Neurosci. 2016, 19, 623–633. [Google Scholar] [CrossRef] [PubMed]
- Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. EdgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2009, 26, 139–140. [Google Scholar] [CrossRef] [PubMed]
- Smyth, G.K. Limma: Linear Models for Microarray Data. In Bioinformatics & Computational Biology Solutions Using R & Bioconductor; Springer Science & Business Media: New York, NY, USA, 2005; pp. 397–420. [Google Scholar]
- Wang, H.Q.; Zheng, C.H.; Zhao, X.M. jNMFMA: A joint non-negative matrix factorization meta-analysis of transcriptomics data. Bioinformatics 2015, 31, 572–580. [Google Scholar] [CrossRef] [PubMed]
- Jiang, X.; Zhang, H.; Duan, F.; Quan, X. Identify Huntington’s disease associated genes based on restricted Boltzmann machine with RNA-seq data. BMC Bioinform. 2017, 18, 447. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Schuldt, C. Recognizing Human Action: A Local SVM Approach. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 23–26 August 2004. [Google Scholar]
- Huang, D.W.; Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 2009, 4, 44–57. [Google Scholar] [CrossRef] [PubMed]
- Waldvogel, H.J.; Kim, E.H.; Thu, D.C.; Tippett, L.J.; Faull, R.L. New perspectives on the neuropathology in Huntington’s Disease in the human brain and its relation to symptom variation. J. Huntingt. Dis. 2012, 1, 143–153. [Google Scholar]
- Difiglia, M.; Sapp, E.; Chase, K.O.; Davies, S.W.; Bates, G.P.; Vonsattel, J.P.; Aronin, N. Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain. Science 1997, 277, 1990–1993. [Google Scholar] [CrossRef] [PubMed]
- Lee, S.; Kim, H.J. Prion-like mechanism in Amyotrophic Lateral Sclerosis: Are protein aggregates the key? Exp. Neurobiol. 2015, 24, 1–7. [Google Scholar] [CrossRef] [PubMed]
- Lim, J.; Yue, Z. Neuronal aggregates: Formation, clearance, and spreading. Dev. Cell. 2015, 32, 491–501. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.; Huang, T.; Bu, G.; Xu, H. Dysregulation of protein trafficking in neurodegeneration. Mol. Neurodegener. 2014, 9, 1–9. [Google Scholar] [CrossRef] [PubMed]
Input: : The gene expression matrix; : The matrix of basic clustering results; , : Parameters; : The number of bags; : The number of samples in one bag. Initialize ; 1: For Initialize ,,; 2: repeat; 3: build the matrix ; 4: run K-means on to update and ; 5: update ; 6: update ; 7: until the value of the objective function remains unchanged. 8: Obtain the gene weights according to Equation (7); sort genes according to to get the gene-ranked list; 9: get the area under the ROC of the gene-ranked list ; 10: End 11: Calculate the ensemble gene weights according to Equation (8). 12: Output: |
FNMF | jNMFMA | CGUFS | ECGUFS | |
---|---|---|---|---|
AUC | 56.0 ± 1.9 | 56.7 ± 1.6 | 54.3 ± 1.5 | 59.2 ± 0.8 |
AUPR | 20.4 ± 1.9 | 20.7 ± 1.6 | 22.5 ± 1.8 | 29.4 ± 1.9 |
E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 | |
---|---|---|---|---|---|---|---|---|---|
E1 | 710 (71%) | 705 (70.5%) | 686 (68.6%) | 677 (67.7%) | 695 (69.5%) | 679 (67.9%) | 663 (66.3%) | 691 (69.1%) | 666 (66.6%) |
E2 | 697 (69.7%) | 686 (68.6%) | 657 (65.7%) | 686 (68.6%) | 721 (72.1%) | 676 (67.6%) | 737 (73.7%) | 682 (68.2%) | |
E3 | 689 (68.9%) | 677 (67.7%) | 691 (69.1%) | 683 (68.3%) | 655 (65.5%) | 678 (67.8%) | 665 (66.5%) | ||
E4 | 684 (68.4%) | 704 (70.4%) | 696 (69.6%) | 681 (68.1%) | 715 (71.5%) | 668 (66.8%) | |||
E5 | 659 (65.9%) | 657 (65.7%) | 674 (67.4%) | 665 (66.5%) | 664 (66.4%) | ||||
E6 | 670 (67.0%) | 670 (67.0%) | 691 (69.1%) | 690 (69.0%) | |||||
E7 | 666 (66.6%) | 707 (70.7%) | 669 (66.9%) | ||||||
E8 | 678 (67.8%) | 649 (64.9%) | |||||||
E9 | 682 (68.2%) |
E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 | |
---|---|---|---|---|---|---|---|---|---|
E1 | 1593 (79.7%) | 1598 (79.9%) | 1565 (78.3%) | 1570 (78.5%) | 1603 (80.2%) | 1569 (78.5%) | 1547 (77.4%) | 1589 (79.5%) | 1564 (78.2%) |
E2 | 1623 (69.7%) | 1589 (79.5%) | 1550 (77.5%) | 1621 (81.1%) | 1618 (80.9%) | 1534 (76.7%) | 1621 (81.1%) | 1595 (79.8%) | |
E3 | 1582 (79.1%) | 1589 (79.5%) | 1610 (80.5%) | 1603 (80.2%) | 1559 (78.0%) | 1599 (80.0%) | 1590 (79.5%) | ||
E4 | 1573 (78.7%) | 1605 (80.3%) | 1563 (78.2%) | 1570 (78.5%) | 1592 (79.6%) | 1567 (78.4%) | |||
E5 | 1569 (78.5%) | 1545 (77.3%) | 1575 (78.8%) | 1572 (78.6%) | 1567 (78.4%) | ||||
E6 | 1597 (79.9%) | 1570 (78.5%) | 1607 (80.4%) | 1619 (81.0%) | |||||
E7 | 1557 (77.9%) | 1615 (80.8%) | 1584 (79.2%) | ||||||
E8 | 1561 (78.1%) | 1550 (77.5%) | |||||||
E9 | 1596 (79.8%) |
Annotation Cluster | Category | Annotation | Count | p Value | Benjamini |
---|---|---|---|---|---|
1 Enrichment Score: 3.02 | GOTERM_CC_DIRECT | Postsynaptic density | 16 | 4.2 × 10−7 | 1.3 × 10−4 |
GOTERM_CC_DIRECT | Postsynaptic membrane | 10 | 2.2 × 10−3 | 9.1 × 10−2 | |
GOTERM_CC_DIRECT | Synapse | 14 | 1.3 × 10−2 | 2.4 × 10−1 | |
GOTERM_CC_DIRECT | Cell junction | 15 | 7.4 × 10−2 | 4.9 × 10−1 | |
2 Enrichment Score: 1.93 | GOTERM_BP_DIRECT | Fatty acid metabolic process | 9 | 9.3 × 10−4 | 4.8 × 10−1 |
GOTERM_BP_DIRECT | Fatty acid biosynthetic process | 5 | 1.5 × 10−2 | 8.4 × 10−1 | |
KEGG_PATHWAY | Fatty acid metabolism | 4 | 3.6 × 10−2 | 8.7 × 10−1 | |
GOTERM_MF_DIRECT | Transferase activity, transferring acyl groups other than amino-acyl groups | 3 | 3.7 × 10−2 | 7.6 × 10−1 | |
3 Enrichment Score: 1.81 | GOTERM_MF_DIRECT | Transferase activity | 37 | 2.4 × 10−4 | 1.1 × 10−1 |
GOTERM_BP_DIRECT | Phosphorylation | 17 | 5.7 × 10−3 | 6.4 × 10−1 | |
GOTERM_MF_DIRECT | kinase activity | 18 | 8.5 × 10−3 | 5.7 × 10−1 | |
GOTERM_MF_DIRECT | Nucleotide binding | 38 | 1.4 × 10−2 | 6.3 × 10−1 | |
GOTERM_MF_DIRECT | ATP binding | 30 | 2.6 × 10−2 | 7.3 × 10−1 | |
GOTERM_BP_DIRECT | Protein phosphorylation | 14 | 3.5 × 10−2 | 9.6 × 10−1 | |
GOTERM_MF_DIRECT | Protein kinase activity | 12 | 9.6 × 10−2 | 8.6 × 10−1 | |
GOTERM_MF_DIRECT | Protein serine/threonine kinase activity | 9 | 2.1 × 10−1 | 9.4 × 10−1 | |
4 Enrichment Score: 1.58 | GOTERM_BP_DIRECT | Learning or memory | 5 | 4.1 × 10−3 | 6.9 × 10−1 |
GOTERM_BP_DIRECT | Regulation of synaptic plasticity | 4 | 1.7 × 10−2 | 8.4 × 10−1 | |
GOTERM_BP_DIRECT | Embryo development | 3 | 2.7 × 10−1 | 9.9 × 10−1 | |
5 Enrichment Score: 1.40 | GOTERM_CC_DIRECT | Cell–cell adherens junction | 10 | 2.0 × 10−2 | 2.9 × 10−1 |
GOTERM_MF_DIRECT | Cadherin binding involved in cell–cell adhesion | 9 | 3.3 × 10−2 | 7.7 × 10−1 | |
GOTERM_BP_DIRECT | Cell–cell adhesion | 6 | 9.7 × 10−2 | 9.9 × 10−1 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guo, X.; Jiang, X.; Xu, J.; Quan, X.; Wu, M.; Zhang, H. Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes. Genes 2018, 9, 350. https://doi.org/10.3390/genes9070350
Guo X, Jiang X, Xu J, Quan X, Wu M, Zhang H. Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes. Genes. 2018; 9(7):350. https://doi.org/10.3390/genes9070350
Chicago/Turabian StyleGuo, Xia, Xue Jiang, Jing Xu, Xiongwen Quan, Min Wu, and Han Zhang. 2018. "Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes" Genes 9, no. 7: 350. https://doi.org/10.3390/genes9070350
APA StyleGuo, X., Jiang, X., Xu, J., Quan, X., Wu, M., & Zhang, H. (2018). Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes. Genes, 9(7), 350. https://doi.org/10.3390/genes9070350