*Article* **A Machine Learning-based Pipeline for the Classification of CTX-M in Metagenomics Samples**

**Diego Ceballos 1,2,\*, Diana López-Álvarez 3, Gustavo Isaza 2,\*, Reinel Tabares-Soto 1, Simón Orozco-Arias 1,2 and Carlos D. Ferrin 4**


Received: 16 February 2019; Accepted: 8 April 2019; Published: 24 April 2019

**Abstract:** Bacterial infections are a major global concern, since they can lead to public health problems. To address this issue, bioinformatics contributes extensively to the analysis and interpretation of in silico data by enabling the genetic characterization of different individuals or strains, such as bacteria. However, the growing volume of metagenomic data requires new infrastructure, technologies, and methodologies that support the analysis and prediction of this information from a clinical point of view, as intended in this work. At the same time, distributed computational environments allow the management of these large volumes of data, owing to significant advances in processing architectures such as multicore CPUs (Central Processing Units) and GPGPUs (General-Purpose Graphics Processing Units). For this purpose, we developed a bioinformatics workflow based on metagenomic data filtered with the Duk tool. Data formatting was done with the EMBOSS software and a workflow prototype. A machine learning-based pipeline was also designed and implemented as a Bash script. Further, the Python 3 programming language was used to normalize the training data of the artificial neural network, which was implemented in the TensorFlow framework, and its behavior was visualized in TensorBoard. Finally, the values from the initial bioinformatics process and the data generated during the parameterization and optimization of the artificial neural network are presented and validated based on the best result obtained for the identification of the CTX-M gene group.

**Keywords:** machine learning; metagenomics; bioinformatics; CTX-M
