Machine Learning Methods for COVID-19 Prediction Using Human Genomic Data †
Abstract
1. Introduction
2. Material and Method
2.1. Dataset
2.2. Feature Extraction
2.3. Machine Learning Algorithms
2.3.1. Support Vector Machines
2.3.2. Naive Bayes
2.3.3. K-Nearest Neighbor
2.3.4. Random Forest
3. Results
4. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Human Coronaviruses | Number of Sequences | Label |
---|---|---|
SARS-CoV-2 | 1000 | 1 |
Alphacoronavirus | 88 | 0 |
Betacoronavirus-1 | 140 | 0 |
Human Coronavirus 229E | 27 | 0 |
Human Coronavirus HKU1 | 18 | 0 |
Human Coronavirus NL63 | 61 | 0 |
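
The counts in the table above imply an imbalanced binary problem. As a rough sketch, assuming every listed sequence is used and that the evaluation data share this class distribution (neither of which the table itself states), the class totals and the accuracy of a trivial classifier that always predicts the majority class can be computed as follows. The variable names are illustrative only.

```python
# Sequence counts copied from the dataset table: (count, label).
# Label 1 = SARS-CoV-2, label 0 = other human coronaviruses.
counts = {
    "SARS-CoV-2": (1000, 1),
    "Alphacoronavirus": (88, 0),
    "Betacoronavirus-1": (140, 0),
    "Human Coronavirus 229E": (27, 0),
    "Human Coronavirus HKU1": (18, 0),
    "Human Coronavirus NL63": (61, 0),
}

positives = sum(n for n, label in counts.values() if label == 1)
negatives = sum(n for n, label in counts.values() if label == 0)
total = positives + negatives

# Accuracy of a classifier that always predicts the majority class
# (assumption: the evaluation set has the same class balance as the table).
majority_baseline = max(positives, negatives) / total

print(f"positive: {positives}, negative: {negatives}, total: {total}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Under these assumptions the baseline is roughly 0.75 (1000 of 1334 sequences), so reported accuracy values are best read against that figure rather than against 0.5.
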
Method | Precision | Recall | F-Measure | Accuracy |
---|---|---|---|---|
Support Vector Machine | 0.869 | 0.873 | 0.868 | 0.87 |
Naive Bayes | 0.882 | 0.885 | 0.879 | 0.88 |
K-Nearest Neighbor | 0.927 | 0.926 | 0.926 | 0.92 |
Random Forest | 0.93 | 0.93 | 0.93 | 0.93 |
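
The results table does not state how precision, recall, and F-measure were averaged across the two classes. One common convention, used for instance in WEKA's summary output, weights each per-class value by class support; whether that applies here is an assumption. The sketch below uses a hypothetical confusion matrix (not the paper's data) to show how such weighted averages are computed. The function name `weighted_metrics` and all counts are made up for illustration.

```python
def weighted_metrics(tp, fn, fp, tn):
    """Support-weighted precision, recall, F-measure, and accuracy
    for a binary confusion matrix.

    Assumes each class has at least one true and one predicted instance,
    so that no denominator is zero.
    """
    total = tp + fn + fp + tn
    # Per-class (precision, recall, support): positive class, then negative class.
    per_class = [
        (tp / (tp + fp), tp / (tp + fn), tp + fn),
        (tn / (tn + fn), tn / (tn + fp), tn + fp),
    ]
    precision = recall = f_measure = 0.0
    for p, r, support in per_class:
        weight = support / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * (2 * p * r / (p + r))
    accuracy = (tp + tn) / total
    return precision, recall, f_measure, accuracy


# Hypothetical counts chosen for illustration only.
precision, recall, f_measure, accuracy = weighted_metrics(tp=950, fn=50, fp=60, tn=274)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f_measure={f_measure:.3f} accuracy={accuracy:.3f}")
```

Support-weighted recall is algebraically equal to accuracy, so near-identical Recall and Accuracy columns are what this convention would produce. The weighted F-measure, by contrast, is not in general the harmonic mean of the weighted precision and weighted recall.
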
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arslan, H. Machine Learning Methods for COVID-19 Prediction Using Human Genomic Data. Proceedings 2021, 74, 20. https://doi.org/10.3390/proceedings2021074020