### *2.5. Summary of the Main Results of the Bioinformatic Pipeline*

To determine the genetic determinants potentially involved in the keratinolytic capacity of marine *Streptomyces*, a comparative genomic analysis was carried out between three streptomycetes with different keratinolytic activities: strains G11C, CHD11, and Vc74B-19, with high, medium, and null activity, respectively (Figure 6). Initially, a search for proteases was performed by genomic annotation using three servers: Prokka, PANNZER2, and eggNOG. These proteases were manually curated using BLASTp and classified into peptidase families according to the MEROPS database. Subsequently, a comparative genomic analysis of the full protease complement of the three strains identified the protease orthogroups (p-orthogroups) shared between the three streptomycete genomes and highlighted the proteases unique to each strain. From this analysis, the genes of interest were those found exclusively in the keratinolytic strains: 17 genes unique to strain G11C and three peptidases belonging to orthogroups shared between strains CHD11 and G11C. Additionally, a similarity network analysis of all the proteases of the three strains, together with two databases of functional keratinases and hypothetical non-keratinases, identified three communities related to functional keratinases, belonging to peptidase families S01, S08, and M04. Eleven proteases from *Streptomyces* sp. G11C emerged from this analysis. Subsequently, to identify extracellular proteases related to keratinases, the information from the similarity networks and the p-orthogroup analysis was integrated with a subcellular localization analysis through t-SNE-based clustering. This analysis identified three groups containing extracellular proteases, named t-SNE groups 0, 1, and 2. In this analysis, most of the unique peptidases of strain G11C (unassigned p-orthogroup peptidases) and the peptidases shared between the keratinolytic strains CHD11 and G11C were predicted to be intracellular.
However, given their exclusive association with the keratinolytic strains, they are still considered interesting because their presence may explain the differences in keratinolytic activity between the three strains. In contrast, the peptidases predicted to be extracellular in the t-SNE groups were subjected to a phylogenetic analysis incorporating ancestral state reconstruction, which assigns to each ancestral node in the tree a probability distribution over four categories: functional keratinase, keratinase-linked protein, three-strain category, and non-keratinase. Finally, after applying selection criteria to the clades of the phylogenetic trees (presence of a functional keratinase, presence of a G11C sequence, and a 50% probability that the ancestral node is a keratinase or keratinase-linked sequence), seven coding sequences for potential extracellular keratinases were identified in *Streptomyces* sp. G11C.

**Figure 6.** Bioinformatic pipeline to predict potential keratinases in *Streptomyces* sp. G11C. This analysis integrates a series of steps, including comparative genomics with network similarities, cellular localization prediction, and phylogeny to provide a set of genes considered to encode putative keratinases.

In this analysis, sequences coding for (1) proteases related exclusively to keratinolytic strains and (2) proteases predicted to be extracellular and related to functional keratinases are considered interesting candidates that could explain the differences in keratinolytic activity between the three strains.
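The three clade selection criteria summarized above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the `Clade` structure, the tip identifiers, and the probability values are hypothetical, and the sketch assumes that the 50% threshold applies to the summed probability of the functional keratinase and keratinase-linked categories and is inclusive.

```python
from dataclasses import dataclass

# Categories treated as "keratinase or keratinase-linked" (labels follow the text).
KERATINASE_LIKE = ("functional_keratinase", "keratinase_linked")


@dataclass
class Clade:
    """Hypothetical container for one clade of a phylogenetic tree."""
    name: str
    tip_categories: dict   # tip id -> category assigned to that sequence
    ancestor_probs: dict   # category -> reconstructed probability at the ancestral node


def g11c_candidates(clade, min_prob=0.5, strain_prefix="G11C_"):
    """Return the G11C tips of a clade that meets all three criteria, else []."""
    # Criterion 1: the clade contains at least one functional keratinase.
    has_keratinase = "functional_keratinase" in clade.tip_categories.values()
    # Criterion 2: the clade contains at least one G11C sequence.
    g11c_tips = sorted(t for t in clade.tip_categories if t.startswith(strain_prefix))
    # Criterion 3 (assumption): summed ancestral probability of both categories >= 0.5.
    prob = sum(clade.ancestor_probs.get(c, 0.0) for c in KERATINASE_LIKE)
    if has_keratinase and g11c_tips and prob >= min_prob:
        return g11c_tips
    return []


# Invented example data, for illustration only.
clade_a = Clade(
    "clade_a",
    {"KER_ref1": "functional_keratinase", "G11C_tipA": "three_strain", "CHD11_tipB": "three_strain"},
    {"functional_keratinase": 0.35, "keratinase_linked": 0.25, "three_strain": 0.30, "non_keratinase": 0.10},
)
clade_b = Clade(
    "clade_b",
    {"G11C_tipC": "three_strain", "NK_ref2": "non_keratinase"},
    {"functional_keratinase": 0.05, "keratinase_linked": 0.05, "three_strain": 0.10, "non_keratinase": 0.80},
)

print(g11c_candidates(clade_a))  # clade_a meets all three criteria
print(g11c_candidates(clade_b))  # clade_b lacks a functional keratinase
```

In this toy example only `clade_a` passes, so its single G11C tip would be retained as a candidate.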

## **3. Discussion**

In this study, a multi-step bioinformatic pipeline applying several comparative genomics tools was developed to predict a set of genes that encode putative keratinases in a marine *Streptomyces* strain with keratinolytic activity. To determine whether genetic features related to this activity could be identified, three strains with differing keratinolytic activity were selected. *Streptomyces* sp. G11C showed a high percentage of feather degradation, reaching approximately 80%, with a relative keratinase activity of 60% after five days of incubation, whereas *Streptomyces* sp. CHD11 showed lower relative keratinase activity (less than 10%) [40]. In contrast, *Streptomyces* sp. Vc74B-19 showed no keratinolytic activity, even after 10 days of incubation [40].

Genome comparison showed that approximately 3% of the total gene count encoded putative peptidases, revealing an unexpectedly similar abundance and diversity in all three strains. Previous reports have described that, among bacterial species, the percentage of peptidases encoded in the genome ranges from 1.5% to 4% [53]. A similar diversity of peptidase families was found in all three strains, with the serine, metallo-, and cysteine super-families being the most abundant in all three genomes. Peptidase diversity may reflect adaptation to environmental conditions [46]. Considering the marine origin of the *Streptomyces* strains analyzed, the diversity of peptidases could be related to environmental characteristics such as the pH of the ocean, which varies from approximately neutral to alkaline [54]. Serine, cysteine, and metallo-peptidases are generally active under these conditions [55,56] and may contribute to the ecological success of these strains and, possibly, to their degradative abilities. In general, strains CHD11 and Vc74B-19 were more similar to each other, sharing a large number of peptidase families and p-orthogroups that are absent in strain G11C, in agreement with their close phylogenomic relatedness. By contrast, strain G11C presented more "unassigned p-orthogroup" peptidases (17 unique peptidases) than strains CHD11 and Vc74B-19 (8 and 3, respectively). Of these 17, 12 were detected in our previous secretome analysis of the keratinolytic strain G11C grown on feathers [40], highlighting them as interesting candidates. According to the subcellular localization analysis, most of these sequences are predicted to be cytoplasmic or membrane-associated proteases. Although most of the keratinolytic proteases known to date are extracellular, some cell-bound and intracellular keratinolytic proteases have also been described [57]. Therefore, the participation of the unique proteases of strain G11C in the keratinolytic activity cannot be ruled out.
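The comparison of p-orthogroups between strains amounts to set operations on orthogroup assignments. The following Python sketch illustrates the idea only: the peptidase identifiers and orthogroup labels are invented, and it assumes that an "unassigned p-orthogroup" peptidase is one that was not placed in any orthogroup.

```python
# Hypothetical assignments: strain -> {peptidase id: p-orthogroup, or None if unassigned}.
assignments = {
    "G11C": {"g1": "OG1", "g2": "OG2", "g3": None},
    "CHD11": {"c1": "OG1", "c2": "OG2", "c3": "OG3"},
    "Vc74B-19": {"v1": "OG1", "v2": "OG3"},
}


def orthogroups(strain):
    """Set of p-orthogroups represented in a strain."""
    return {og for og in assignments[strain].values() if og is not None}


def unassigned(strain):
    """Peptidases of a strain that fall outside every p-orthogroup."""
    return sorted(p for p, og in assignments[strain].items() if og is None)


# p-orthogroups shared by all three strains.
core = orthogroups("G11C") & orthogroups("CHD11") & orthogroups("Vc74B-19")

# p-orthogroups shared by the two keratinolytic strains but absent from Vc74B-19.
keratinolytic_only = (orthogroups("G11C") & orthogroups("CHD11")) - orthogroups("Vc74B-19")

print(sorted(core))
print(sorted(keratinolytic_only))
print(unassigned("G11C"))
```

With real orthogroup tables, the same three queries would yield the shared p-orthogroups, the p-orthogroups exclusive to the keratinolytic strains, and the unique peptidases of each strain.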

Most of the keratinases known to date have been classified as serine proteases [14,16,19,58,59], and a few as metalloproteases [20,60,61]. The latter mainly come from Gram-negative bacteria and fungi [62]. To compare and identify promising sequences related to known keratinases, we constructed a functional keratinase database, mainly from Gram-positive bacteria (Table S7), retrieving data from the NCBI. An analysis based on the MEROPS classification indicates that keratinases from *Bacillus* belong to peptidase family S08, and keratinases from *Actinobacteria*, such as *Nocardiopsis*, *Actinomadura*, and *Streptomyces*, belong mostly to family S01. Information about metalloprotease families associated with keratinases is scarce. In our database, only one keratinase, from a *Geobacillus* strain (AJD77429.1), belongs to the metalloprotease super-family, specifically to family M04. A protease similarity network, built from these functionally characterized keratinases together with our three streptomycete genomes and the non-keratinase database, highlighted three communities of nodes containing keratinase-linked peptidases belonging to the S01, S08 (serine proteases), and M04 (metalloprotease) families, allowing the identification of 11 promising sequences in the keratinolytic strain G11C. It is worth mentioning that community 41 (family M04) contained the fewest grouped sequences, which is possibly related to the presence of only one functional keratinase, as mentioned above. Furthermore, the three strains have few peptidases belonging to this family; for example, *Streptomyces* sp. G11C has only three M04 metallopeptidases, which were subsequently discarded because they did not meet the pipeline criteria (being extracellular and phylogenetically close to functional keratinases). On the other hand, the promising sequences, including the unique peptidases of strain G11C (unassigned p-orthogroup peptidases) and the peptidases belonging to p-orthogroups shared between the keratinolytic strains CHD11 and G11C, did not group in any community of the network, since most belong to other protease families that are divergent from the known keratinase families. These groups contain only two serine proteases of family S01, although they did not group in community 4 (family S01). In addition, one of these unique sequences of strain G11C (G11C\_00267) showed high abundance in the secretome analysis carried out previously [40], suggesting a relevant role in the degradation of keratin.
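The notion of a keratinase-linked peptidase, that is, a sequence grouped in the network with a functional keratinase, can be sketched as follows. This is a deliberate simplification rather than the method used here: connected components stand in for network communities, the edges are assumed to be sequence pairs whose similarity already passed some threshold, and all identifiers are invented.

```python
from collections import defaultdict


def components(edges):
    """Group nodes into the connected components of an undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collecting every node reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def keratinase_linked(edges, functional_keratinases, strain_prefix="G11C_"):
    """Strain sequences that share a component with a functional keratinase."""
    linked = set()
    for group in components(edges):
        if group & functional_keratinases:
            linked |= {n for n in group if n.startswith(strain_prefix)}
    return sorted(linked)


# Invented similarity edges, for illustration only.
edges = [("KER_1", "G11C_a"), ("G11C_a", "CHD11_b"), ("NK_1", "G11C_c")]
functional = {"KER_1"}

print(keratinase_linked(edges, functional))
```

In this toy network, `G11C_a` is flagged because it is connected to a functional keratinase, whereas `G11C_c` is connected only to a non-keratinase and is not.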

To narrow the search for potential keratinases, we complemented this information with subcellular localization data and phylogeny. In the t-SNE clustering analysis, we identified the putative peptidases that are predicted to be extracellular (including the keratinase-linked sequences mentioned above). In this analysis, we reasoned that predicted cellular localization is closely related to a putative keratinolytic function. This is in line with previous reports showing that the macromolecular characteristics of keratin prevent its direct absorption by microbial cells [2]. Thus, the utilization of keratin as a nutrient source usually requires the production of extracellular keratinases. Therefore, to finely separate and select potential extracellular keratinase sequences, we included a phylogenetic analysis of the clusters, which also provides a way to integrate the similarity network data into our selection process. The latter is one of the main challenges addressed in this work, since sequence similarity network data have mainly been used as a standalone tool for functional inference [63], partly because their graphical output is not fully compatible with tabular or structural sequence data. Finally, from this analysis, we identified seven gene sequences encoding potential keratinases (belonging to families S01 and S08) which, together with the 17 unique genes encoding unassigned p-orthogroup peptidases in *Streptomyces* sp. G11C, could explain the differences observed in keratinolytic activity between the three strains. The degradation of recalcitrant keratin wastes appears to require the cooperation of several keratinolytic proteases, as evidenced in some bacterial [36,64–67] and fungal species [7,68]. Recently, Huang et al. [65] reported the presence of five proteases in the culture of *Bacillus* sp. 8A6 grown with keratin-rich substrates, belonging to four protease families: M12, S01A, S8A, and T3.
In fungi, the participation of a set of proteases in keratin degradation has also been reported. Pathogenic fungi mainly secrete endoproteases, including proteases from families A1, S8A, M36, and M35 [68]. The non-pathogenic fungus *Onygena corvina* secretes proteases belonging to three protease families, S08, M28, and M03, when cultivated with pig bristles [7]. However, we have not found proteases belonging to these families in our analysis of the streptomycete genomes, except for some putative peptidases belonging to the M28 family, suggesting that the mechanisms of keratin degradation vary between fungi and bacteria. Most keratinolytic studies involving *Streptomyces* have focused on purifying and characterizing the main keratinolytic enzyme. For example, in the work reported by Bressollier et al. [14], at least six extracellular proteases were identified in the culture of *Streptomyces albidoflavus* grown on feather meal-based medium, but only the most abundant keratinolytic serine protease was further characterized. Recently, a transcriptomic analysis performed by Li et al. in 2020 elucidated a set of factors involved in the keratin degradation mechanism of *Streptomyces* sp. SCUT-3 [36]. In this analysis, 19 genes encoding potential extracellular proteases, along with 10 genes encoding potential intracellular proteases, belonging to the serine-type, cysteine-type, and metalloprotease classes, were up-regulated during growth in medium containing feathers. In addition, two genes involved in mycothiol synthesis and some genes related to sulfite production were also up-regulated, indicating a cooperative action of reducing agents in the breaking of feather disulfide bonds. According to this evidence, it is reasonable to propose that a set of enzymes acts together in keratin degradation by a single bacterium. The literature mentioned above is based solely on the functional exploration of keratin degradation, and no systematic bioinformatic analysis has been attempted. The advantage of our pipeline is that it considers whole genomes instead of single sequences, which has not previously been done for keratinases. With the advances in genome sequencing and the growing number of genome-scale studies, our pipeline could be enriched and further bioinformatic predictions tested.

To support this argument, in our recent work [40] we confirmed the presence of these predicted enzymes in the *Streptomyces* sp. G11C secretome: six of the seven keratinolytic proteases predicted by phylogenetic analysis and the products of 12 of the 17 unique genes of the G11C genome ("unassigned p-orthogroup" peptidases) were detected under culture conditions with feathers as the sole carbon source, indicating that a set of enzymes may act synergistically during keratin degradation. Interestingly, the product of one of the unique genes of strain G11C (G11C\_00267) showed one of the highest protein abundances in the proteomic analysis [40], suggesting an important role for this enzyme in the keratin degradation mechanism. These findings are consistent with our bioinformatic predictions. The coordination of all these extracellular and intracellular enzymes, including the unique peptidases of *Streptomyces* sp. G11C, could be potentiating its keratinolytic activity, giving it an advantage over the other *Streptomyces* strains analyzed, CHD11 and Vc74B-19. Genes encoding disulfide reductases and genes related to sulfite export were similar in the three strains (data not shown), suggesting that disulfide bond reduction is not related to the observed functional differences, at least not through the common mechanisms described in the literature [35]. Additional efforts to discover the reasons for such functional differences between these strains are part of our ongoing investigation. Our novel bioinformatic pipeline, together with the increasing number of sequenced genomes of keratinolytic strains, could serve as the basis for future predictions of keratinolytic proteases, facilitating the selection of potential keratinolytic bacteria. To our knowledge, this is the first comprehensive bioinformatic analysis that complements comparative genomics with phylogeny, similarity networks, and cellular localization prediction to provide a set of candidate genes considered to encode putative keratinases.

## **4. Materials and Methods**

### *4.1. Bacterial Strains*

Previously, bacterial strains belonging to our Chilean marine actinobacterial culture collection, isolated from marine sediments, sponges, and sea urchins collected from the coast of Chile [37–39], were analyzed for keratinolytic activity through a simple feather degradation test on agar plates and in culture tubes [40]. Based on these results, three *Streptomyces* strains were chosen for genomic comparison according to their keratinolytic activity: *Streptomyces* sp. G11C, isolated from marine sediments from Penas Gulf, with high keratinolytic activity; *Streptomyces* sp. CHD11, isolated from a marine sponge from Chañaral de Aceituno Island, with low keratinolytic activity; and *Streptomyces* sp. Vc74B-19, isolated from marine sediments from Valparaíso Bay, with no keratinolytic activity (Table S1).
