**5. Conclusions**

Our robust comparative genomic analysis of three *Streptomyces* strains with varying keratinolytic activities identified a set of putative proteases that may be involved in the keratinolytic capacity of *Streptomyces* sp. G11C. Based on peptidase family classification and orthogroup identification, we consider the following to be promising candidates: (1) putative peptidases unique to the keratinolytic strain G11C (17 unassigned p-orthogroup peptidases), including those belonging to peptidase families C02 and S53; and (2) three peptidases in orthologous groups shared between strains G11C and CHD11 but absent from the non-keratinolytic strain Vc74B-19. Additionally, similarity network analysis identified three communities of keratinase-linked peptidases belonging to families S01, S08, and M04. By complementing this information with subcellular localization data and phylogenetic analysis, we identified seven promising genes from *Streptomyces* sp. G11C that likely encode keratinases, belonging to peptidase families S01 and S08. These findings provide genetic information for the proteomic analysis of the keratinolytic strain G11C described in related work [40], which functionally validates the predictions made in this study. This is the first comprehensive bioinformatics analysis that complements comparative genomics with phylogeny, network similarities, and cellular localization prediction to provide a set of genes considered to encode putative keratinases. This semi-supervised pipeline, involving t-SNE clustering on cellular localization data, is a novel approach in the keratinase literature, and we consider it a significant advance that will help build more sophisticated pipelines in the future. In addition, it may be useful for other hydrolytic enzyme families, such as lipases, glycosidases, and esterases.
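The orthogroup-based selection of candidates (1) and (2) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the orthogroup identifiers and the toy presence table are hypothetical, the function name `select_candidates` is an assumption, and peptidases unassigned to any orthogroup are simplified here to groups that contain only strain G11C.

```python
from typing import Dict, List, Set

# Hypothetical toy input: orthogroup ID -> strains with at least one
# peptidase in that group. These identifiers are illustrative only.
ORTHOGROUP_STRAINS: Dict[str, Set[str]] = {
    "OG_a": {"G11C", "CHD11", "Vc74B-19"},  # shared by all three strains
    "OG_b": {"G11C", "CHD11"},              # shared by the keratinolytic pair
    "OG_c": {"G11C"},                       # found only in G11C
    "OG_d": {"CHD11", "Vc74B-19"},          # absent from G11C
}


def select_candidates(
    orthogroup_strains: Dict[str, Set[str]],
    target: str = "G11C",
    partner: str = "CHD11",
    negative: str = "Vc74B-19",
) -> Dict[str, List[str]]:
    """Split orthogroups into the two candidate classes named in the text."""
    # Class (1): groups whose only member strain is the target strain.
    unique_to_target = sorted(
        og for og, strains in orthogroup_strains.items() if strains == {target}
    )
    # Class (2): groups present in target and partner, absent from the negative strain.
    shared_excluding_negative = sorted(
        og
        for og, strains in orthogroup_strains.items()
        if target in strains and partner in strains and negative not in strains
    )
    return {
        "unique_to_target": unique_to_target,
        "shared_excluding_negative": shared_excluding_negative,
    }


if __name__ == "__main__":
    for label, groups in select_candidates(ORTHOGROUP_STRAINS).items():
        print(label, groups)
```

In practice, the presence table would be derived from the orthogroup classification of proteases (Table S6) rather than written by hand.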

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/md19060286/s1, Figure S1: ANIm matrix of genomes employed for the phylogeny, Figure S2: Classification of putative peptidases from *Streptomyces* strains G11C, CHD11, and Vc74B-19 using MEROPS database, Figure S3: Clustering of t-SNE points by the DBSCAN algorithm, Figure S4: Occupancy plots, Figure S5: Maximum likelihood tree from the filtered MSA of t-SNE group 2 sequences, Figure S6: Protease similarity network using E-value threshold 1 × 10<sup>−80</sup> and Maximum likelihood trees from filtered MSA of t-SNE group 0 and t-SNE group 1 sequences, Table S1: List of strains utilized in this study, isolation sources, and 16S rRNA identification, Table S2: Assembly metrics of genomes incorporated in the phylogenomic tree, Table S3: ANIm matrix, Table S4: Annotation and selection of proteases, Table S5: Classification of peptidases by MEROPS database, Table S6: Orthogroup classification of proteases, Table S7: Functional keratinases from databases, Table S8: Promising genes of *Streptomyces* sp. G11C, Table S9: Putative non-keratinases from databases, Table S10: Nodes and edges of the network, Table S11: Edges of nodes connected with a functional keratinase, Table S12: Consolidated table of subcellular localization data for each protein sequence, Table S13: t-SNE group tables.

**Author Contributions:** R.V., V.G. and B.C. contributed to the conception and design of the experiments. V.G. performed the experiments of keratin degradation. L.Z.-L. and A.U. extracted the genomic DNA. V.G. and R.V. analyzed the data and prepared figures and tables. J.A.U. and B.C. supervised bioinformatic analyses. R.V., V.G., A.U. and B.C. wrote and edited the manuscript. All authors contributed to manuscript revision, read and approved the submitted version. All authors have read and agreed to the published version of the manuscript.

**Funding:** Financial support was provided by CONICYT FONDECYT N° 1171555 and CONICYT PIA GAMBIO Project N° ACT172128 (to BC). VG was funded by a Conicyt PhD fellowship and Conicyt Gastos Operacionales N° 21161188, and the PIIC program (UTFSM). AU was funded by CONICYT FONDECYT POSTDOCTORADO N° 3180399. LZ was supported by Conicyt PhD fellowship N° 21180908. JAU was supported by the ANID Millennium Science Initiative/Millennium Initiative for Collaborative Research on Bacterial Resistance, MICROB-R, NCN17\_081. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Complete genome sequences of *Streptomyces* sp. G11C, CHD11, and Vc74B-19 are available in NCBI GenBank under WGS accession numbers JABTTT000000000, JABTTS000000000, and JABTTR000000000, respectively. Prokka annotations of the genomes of *Streptomyces* sp. G11C, CHD11, and Vc74B-19 are available via Figshare at https://doi.org/10.6084/m9.figshare.13133270.v1 (accessed on 28 October 2020). FASTA headers are consistent with the sequence labels reported in this work.

**Acknowledgments:** We thank Brigitte Böckle for her initial guidance and Danilo Pérez-Pantoja for facilitating computational access to perform the necessary analyses.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Marine Drugs* Editorial Office E-mail: marinedrugs@mdpi.com www.mdpi.com/journal/marinedrugs


ISBN 978-3-0365-3300-1