Machine Learning of the Whole Genome Sequence of Mycobacterium tuberculosis: A Scoping PRISMA-Based Review
Abstract
1. Introduction
2. Materials and Methods
2.1. Eligibility Criteria
2.2. Analysis of the Extracted Variables
3. Results
4. Discussion
- (a) Integrating multiple AI techniques or combining AI with other diagnostic modalities, such as imaging or transcriptomics, to improve prediction accuracy;
- (b) Developing more accurate and robust AI models that can handle complex and noisy data from different sources and settings;
- (c) Exploring the use of AI for predicting resistance to other drugs besides rifampicin, isoniazid, pyrazinamide, and fluoroquinolones;
- (d) Integrating AI with other technologies such as molecular diagnostics, biosensors, or nanotechnology for rapid and point-of-care detection of DR-TB;
- (e) Evaluating the cost-effectiveness, feasibility, and ethical implications of implementing AI for DR-TB diagnosis in low- and middle-income countries.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest


| Variable | Categories | Definition | Example |
|---|---|---|---|
| Sample size | <150, 150–1500, 1500–3600, 3600–8600, 8600–17,000, 17,000–32,700 | Number of samples included | NA |
| Publication year | 2017–2022 | Year of the publication date of the article | 2017–2022 |
| Country of study | Countries | Country where the study was published | USA, Mexico, Brazil |
| Input data type | Omic data | Second-generation sequencing platform output files | FASTQ files (Illumina) |
| | Sequence data | First-generation sequencing platform output files | FASTQ files (Sanger) |
| | Clinical data | Clinical records | SQL or any database |
| Type of features | Whole genome variants | All variants, including SNPs, deletions, and insertions | 1296_ins_3_a_attc, fprA_564_del_2_acg_a |
| | Whole genome SNPs | Only variants previously classified as SNPs | Known genetic positions registered in databases such as NC_000962.3: 1524 nt |
| | DNA variant | All variants of a resistance-related gene | rpoB, katG, embB |
| | SNPs | All SNPs of a resistance-related gene | rpoB_S450L, rpoB_L430P, katG_R463L |
| | Catalog of resistance mutations | Genomic positions selected from a catalog of resistance mutations published by WHO | Lys43Arg (aag/aGg) |
| Number of features | NA | Variables or attributes used to describe and quantify the input data on which a machine-learning model is trained | Binary or categorical representations of variants, complete sequence representations, and the number of patterns or relationships used for training |
| Origin of genomes | Countries | Countries from which genomes were taken | |
| Availability of data | Not available | No data are available | |
| | Available | Data are available | The data or code used is provided through web pages or GitHub |
| Type of algorithm | Artificial neural network | Artificial intelligence method | Convolutional neural network, recurrent neural network, multi-layer perceptron |
| | Bayesian methods | A method of statistical inference | Naïve Bayes |
| | Clustering | Organizing a set of objects into groups based on similarities between objects within the same group | k-means clustering, hierarchical clustering |
| | Decision tree | A graph that uses a branching method to illustrate every possible output for a specific input | Decision tree |
| | Discriminant analysis | A multivariate technique used to separate two or more groups of observations | Linear discriminant analysis |
| | Ensemble methods | Combines several base models | AdaBoost, random forest |
| | Instance-based learning | Family of techniques for classification and regression | k-nearest neighbor |
| | Logistic regression | Statistical analysis method to predict a binary outcome | Logistic regression |
| | Regression (other) | Statistical processes for estimating the relationships between a dependent variable and one or more independent variables | Linear regression |
| | Kernel methods | Supervised learning algorithms that use kernel functions to classify or regress data groups | Support vector machine |
| | Other | Algorithms not classified into one of the categories above | Reinforcement learning, graphical models |
| External validation | Yes | Performance of the algorithm tested on external data | Automated scoring of the genome compared with scoring by DST |
| | No | NA | NA |
| Reduction method | No reduction | No dimensionality reduction methods were used | NA |
| | Statistical | Statistical dimensionality reduction methods were used | PCA, RF, t-SNE |
| | Non-statistical | Features were reduced without statistical methods | Matching against a catalogue |
| Number of drugs | 1–14 | Number of drugs tested | NA |
| Treatment line | First line | WHO definition | RIF, INH, STR, EMB, PZA |
| | Second line | WHO definition | AMK, CAP, KAN, CIP, OFL, MOX, ETH, CYS, PAS |
| | Both lines | WHO definition | First and second line |

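
As a rough illustration of the "binary representations of variants" listed under Number of features, the sketch below encodes the presence or absence of each variant per isolate. The isolate names and the `build_feature_matrix` helper are hypothetical, and the variant labels only imitate the naming style of the examples in the table; none of the reviewed studies is claimed to use this exact procedure.

```python
# Hypothetical isolates, each mapped to the set of variants called against
# the reference genome. Labels imitate the gene_mutation style in the table.
isolates = {
    "isolate_A": {"rpoB_S450L", "katG_R463L"},
    "isolate_B": {"katG_R463L"},
    "isolate_C": set(),
}


def build_feature_matrix(variant_calls):
    """Encode variant presence/absence as binary rows.

    Returns a tuple (features, matrix) where `features` is the sorted list
    of all variants seen in any isolate and `matrix` maps each isolate name
    to a list of 0/1 values aligned with `features`.
    """
    # Union of every variant observed across all isolates, in a stable order.
    features = sorted(set().union(*variant_calls.values()))
    matrix = {
        name: [1 if feat in variants else 0 for feat in features]
        for name, variants in variant_calls.items()
    }
    return features, matrix


features, matrix = build_feature_matrix(isolates)
number_of_features = len(features)
```

Under this encoding the "number of features" is simply the number of distinct variants observed, and each row could be passed to any of the algorithm families listed under Type of algorithm.
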
| Type of Algorithm | % | 
|---|---|
| Artificial Neural Network | 28 | 
| Decision Tree | 22 | 
| Clustering | 13 | 
| Logistic Regression | 6 | 
| Kernel Methods | 6 | 
| Ensemble Methods | 6 | 
| Bayesian Methods | 3 | 
| Instance-Based Learning | 3 | 
| Other | 6 | 
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Perea-Jacobo, R.; Paredes-Gutiérrez, G.R.; Guerrero-Chevannier, M.Á.; Flores, D.-L.; Muñiz-Salazar, R. Machine Learning of the Whole Genome Sequence of Mycobacterium tuberculosis: A Scoping PRISMA-Based Review. Microorganisms 2023, 11, 1872. https://doi.org/10.3390/microorganisms11081872