*2.2. Comparative Genomics of Differential Keratinolytic Streptomycete Strains*

To identify genes encoding putative keratinases that could contribute to keratin hydrolysis and explain the differences observed among *Streptomyces* strains G11C, CHD11, and Vc74B-19, the diversity and number of peptidases in the three strains were compared using the following approaches: genome annotation with the Prokka, PANNZER2, and eggNOG servers, classification against the MEROPS database, and identification of orthologous groups. In addition, a similarity network was built to compare these protein sequences with functionally characterized keratinases reported in the literature.

**Figure 1.** Phylogenomic tree of selected *Streptomyces* strains based on 634 single-copy orthogroups identified with OrthoFinder, followed by FastTree inference. Genomes of type (T) and reference (R) strains belonging to the *Streptomyces* genus were retrieved from the PATRIC database. *Catenulispora acidiphila* DSM44928 and *Kitasatospora setae* KM-6054 were used as outgroups, rooting the tree at the *Catenulispora* node. Bootstrap support values (b = 1000) are depicted for each branch. Strains sequenced in this study are displayed in bold font. The following abbreviations apply: *S.*, *Streptomyces*; *K.*, *Kitasatospora*; *C.*, *Catenulispora*.

#### 2.2.1. Protease Search

Multiplatform annotation revealed that all three *Streptomyces* strains carry a similar number of genes encoding putative peptidases, corresponding to 3% of their genomic content (Table 2, Table S4). According to the MEROPS classification, approximately 85% of their putative peptidases were assigned to a protease family (Table S5); strains G11C, CHD11, and Vc74B-19 presented 46, 48, and 49 peptidase families, respectively. In all three strains, the largest fractions of peptidases belong to the serine (47–48.6%) and metallo- (36.7–41.1%) super-families, whereas the minor fractions consist of the cysteine (6.6–11.4%), aspartic (1.1–1.3%), threonine (1.1–1.3%), and mixed (1.1–1.3%) super-families (Figure S2). This result is consistent with previous studies, in which serine, metallo-, and cysteine peptidases are the dominant proteolytic enzymes (>90%) of Bacteria, and aspartic and threonine peptidases contribute to a minor extent [46,47]. Interestingly, *Streptomyces* sp. G11C presents two putative peptidases belonging to two families (C02 and S53) that are absent from strains CHD11 and Vc74B-19. C02 is the calpain family of cysteine peptidases; its biological role in bacteria is unclear, and few studies have characterized its properties [48]. S53, in turn, is the sedolisin family of serine peptidases, which has been strongly correlated with an acidophilic lifestyle [46]. In contrast, strains CHD11 and Vc74B-19 share five peptidase families (C14, C15, C40, C56, and M103) that are absent from strain G11C. Strain G11C shares only one peptidase family (M17) exclusively with the non-keratinolytic strain Vc74B-19; this family is not present in strain CHD11. The similarities shared with the non-keratinolytic strain Vc74B-19 could indicate that such peptidases may not contribute to the degradative ability of the keratinolytic strains G11C and CHD11.
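The family-level comparison above can be illustrated with a minimal sketch. It relies on the MEROPS convention that a family identifier starts with a letter denoting the catalytic type (for example S for serine, M for metallo-). The function names and the toy family lists are hypothetical and do not reproduce the annotation of any strain in this study.

```python
from collections import Counter

# MEROPS family identifiers begin with a letter for the catalytic type.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "M": "metallo",
    "P": "mixed",
    "S": "serine",
    "T": "threonine",
}


def catalytic_type_shares(families):
    """Return the percentage of peptidases per catalytic type.

    `families` holds one MEROPS family identifier per peptidase, e.g. "S08".
    """
    counts = Counter(CATALYTIC_TYPE.get(f[0], "other") for f in families)
    total = sum(counts.values())
    return {ctype: 100.0 * n / total for ctype, n in counts.items()}


def families_unique_to(target, others):
    """Return families found in `target` but in none of the `others`."""
    return set(target) - set().union(*others)


# Toy input (hypothetical, NOT the real annotation of any strain).
toy_a = ["S08", "S01", "S53", "M04", "C02"]
toy_b = ["S08", "S01", "M04", "C40"]
toy_c = ["S08", "M04", "C40", "M17"]

shares = catalytic_type_shares(toy_a)
unique_a = families_unique_to(toy_a, [toy_b, toy_c])
print(shares)
print(sorted(unique_a))
```

Applied to the per-strain family assignments of Table S5, the same two functions would yield the super-family percentages and the strain-specific families discussed in the text, assuming one family identifier per classified peptidase.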


**Table 2.** Characteristics of putative protease genes in *Streptomyces* strains G11C, CHD11, and Vc74B-19.

Subsequently, the putative peptidases (584 sequences) of the three strains were classified into 140 protease orthologous groups (p-orthogroups) using OrthoFinder [42]. As expected, putative peptidases from the same p-orthogroup belong to a single peptidase family (Table S6). Of these, 102 p-orthogroups are shared by all three strains (Figure 2), confirming the similarity of the protease space among these streptomycete genomes. Strain G11C shares three p-orthogroups exclusively with strain CHD11 and eight p-orthogroups exclusively with strain Vc74B-19. In particular, the three p-orthogroups 121, 122, and 134 (Table S6), shared between the keratinolytic strain G11C and the low-keratinolytic strain CHD11 and belonging to the M50, S12, and S01 peptidase families, can be considered good candidates for potential keratinolytic proteases, assuming that both strains have similar degradation mechanisms. By contrast, the eight p-orthogroups shared between the keratinolytic strain G11C and the non-keratinolytic strain Vc74B-19 can potentially be discarded as keratinolytic peptidase candidates.

**Figure 2.** Venn diagram of common p-orthogroup representatives between streptomycete strains G11C, CHD11, and Vc74B-19. Numbers indicate the number of p-orthogroups found for each strain or between strains.
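The Venn partition of p-orthogroups can be sketched as a count of orthogroups per exact combination of strains. The snippet assumes that the orthogroup output has already been reduced to a mapping from orthogroup identifier to the set of strains contributing at least one sequence; the mapping shown is a hypothetical toy example, not the content of Table S6.

```python
from collections import Counter


def venn_partition(orthogroup_strains):
    """Count orthogroups per exact combination of strains that contain them."""
    return Counter(frozenset(s) for s in orthogroup_strains.values())


def exclusive_to(orthogroup_strains, strains):
    """Return orthogroups present in exactly the given strains, sorted by name."""
    wanted = set(strains)
    return sorted(og for og, s in orthogroup_strains.items() if set(s) == wanted)


# Hypothetical toy membership table (identifiers are illustrative only).
toy = {
    "OG0": {"G11C", "CHD11", "Vc74B-19"},
    "OG1": {"G11C", "CHD11", "Vc74B-19"},
    "OG2": {"G11C", "CHD11"},
    "OG3": {"G11C", "Vc74B-19"},
    "OG4": {"CHD11"},
}

partition = venn_partition(toy)
candidates = exclusive_to(toy, ["G11C", "CHD11"])
print(partition[frozenset({"G11C", "CHD11", "Vc74B-19"})])
print(candidates)
```

Using exact set equality, rather than a subset test, is what separates the orthogroups shared only by the two keratinolytic strains from those shared by all three.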

Most of the shared p-orthogroups belong to the serine (n = 37) and metallo- (n = 40) super-families, while the cysteine, aspartic, threonine, and mixed super-families are found in a smaller proportion (n = 2–5), which agrees with our previous MEROPS results. On the other hand, some peptidases are not classified into any p-orthogroup. Further exploration of these "unassigned p-orthogroup" peptidases could give insight into the differences in keratinolytic activity observed between the strains, considering that strain G11C presents the greatest keratin degradative capacity under the conditions analyzed [40]. Among the 17 putative peptidases unique to strain G11C, there are five serine peptidases (families S01, S16, S15, S53, and S51), three metallo-peptidases (families M86, M50, and M56), two cysteine peptidases (families C82 and C02), and seven peptidases not assigned to any family.

Putative proteases belonging to families S01, S08, and M04, where most known bacterial keratinases are found (Table S7), can serve as indicators in the search for putative keratinolytic proteases. Two S01-family sequences of *Streptomyces* sp. G11C that are absent from the orthogroups shared with the non-keratinolytic strain Vc74B-19 drew our attention: an unassigned p-orthogroup peptidase (G11C\_00267) and a peptidase (G11C\_00756) belonging to one of the three p-orthogroups shared between the keratin-degrading strains G11C and CHD11. Additionally, *Streptomyces* sp. G11C presents 8, 12, and 3 putative peptidases belonging to the families S08, S01, and M04, respectively, which may be of interest in the search for putative keratinases despite belonging to orthogroups shared among the three strains. Details of the promising sequences of *Streptomyces* sp. G11C (i.e., "unassigned p-orthogroup" peptidases, peptidases shared between strains G11C and CHD11, and peptidases belonging to the peptidase families S01, S08, and M04) are summarized in Table S8.
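The three shortlisting criteria stated above (no p-orthogroup, a p-orthogroup shared only by G11C and CHD11, or membership of family S01, S08, or M04) can be expressed as a simple filter. This is a sketch under assumptions: the record layout, the function name, and the gene identifiers are hypothetical and are not taken from Table S8.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

KERATINASE_FAMILIES = {"S01", "S08", "M04"}
KERATINOLYTIC_PAIR = frozenset({"G11C", "CHD11"})


@dataclass(frozen=True)
class Peptidase:
    gene_id: str
    family: Optional[str]  # MEROPS family, None if unclassified
    orthogroup_strains: Optional[FrozenSet[str]]  # None if in no p-orthogroup


def shortlist_reasons(p: Peptidase) -> List[str]:
    """Return every criterion under which a peptidase would be shortlisted."""
    reasons = []
    if p.orthogroup_strains is None:
        reasons.append("unassigned p-orthogroup")
    elif p.orthogroup_strains == KERATINOLYTIC_PAIR:
        reasons.append("shared only by G11C and CHD11")
    if p.family in KERATINASE_FAMILIES:
        reasons.append("family S01/S08/M04")
    return reasons


# Hypothetical records; gene identifiers are placeholders.
all_three = frozenset({"G11C", "CHD11", "Vc74B-19"})
toy = [
    Peptidase("toy_0001", "S01", None),
    Peptidase("toy_0002", "S12", KERATINOLYTIC_PAIR),
    Peptidase("toy_0003", "S08", all_three),
    Peptidase("toy_0004", "C40", all_three),
]

shortlist = {p.gene_id: shortlist_reasons(p) for p in toy if shortlist_reasons(p)}
print(shortlist)
```

Returning the list of reasons, instead of a single yes/no flag, keeps track of sequences that satisfy more than one criterion.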

#### 2.2.2. Network Analysis

To complement the previous analysis and identify sequences related to families of known keratinases, a similarity network was constructed using an all-vs-all local alignment. For this analysis, the 584 putative proteases of the three strains (hereinafter the "three-strain dataset"), 61 functional keratinases (mainly from Gram-positive bacteria) collected from NCBI (Table S7), and 50 selected trypsin, papain, and pepsin sequences representing our hypothetical non-keratinase database (Table S9) were compared. Each node in the network depicts a protease, and each edge represents a hit in the resulting alignment (Figure 3; Table S10). The protease similarity network graphically depicts the p-orthogroup distribution and identifies p-orthogroups that are related to functionally described keratinases. In total, 123 network communities composed of at least two nodes were detected by the Louvain algorithm.
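The construction of such a network can be sketched with the standard library alone. The snippet takes alignment hits as (query, subject, E-value) tuples, keeps hits at or below the E-value threshold reported for the network (1 × 10<sup>−40</sup>), and groups nodes into connected components. Connected components are used here only as a dependency-free stand-in: they are not the Louvain communities reported in this study, which can split a densely connected component further. All sequence identifiers are hypothetical.

```python
from collections import defaultdict, deque


def build_network(hits, max_evalue=1e-40):
    """Build an undirected adjacency map from (query, subject, evalue) hits."""
    adjacency = defaultdict(set)
    for query, subject, evalue in hits:
        # Skip self-hits and hits weaker than the threshold.
        if query == subject or evalue > max_evalue:
            continue
        adjacency[query].add(subject)
        adjacency[subject].add(query)
    return adjacency


def connected_groups(adjacency):
    """Return connected components via breadth-first search."""
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


# Hypothetical hits; identifiers and E-values are illustrative only.
hits = [
    ("toy_G11C_1", "toy_ker_1", 1e-120),
    ("toy_CHD11_1", "toy_ker_1", 1e-80),
    ("toy_Vc_1", "toy_nonker_1", 1e-60),
    ("toy_G11C_2", "toy_ker_1", 1e-10),  # weaker than the threshold
    ("toy_G11C_1", "toy_G11C_1", 0.0),   # self-hit
]
known_keratinases = {"toy_ker_1"}

network = build_network(hits)
groups = connected_groups(network)
linked = [g for g in groups if g & known_keratinases]
print(linked)
```

Every group returned has at least two nodes, because a node enters the adjacency map only through an edge; sequences without a hit below the threshold stay out of the network, which corresponds to the unconnected nodes in Figure 3.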

Three network communities (N° 1, 4, and 41) contain sequences linked to functional keratinases at an E-value threshold of 1 × 10<sup>−40</sup> (Table S11). The largest, community N° 1 (Figure 3B), groups sequences of the peptidase family S08; it contains most of the functional keratinases (n = 51 sequences) together with 3, 5, and 6 sequences from strains G11C, CHD11, and Vc74B-19, respectively (p-orthogroups 0 and 11). Community N° 4 (Figure 3C) is composed of putative peptidases belonging to the family S01, to which strains G11C, CHD11, and Vc74B-19 contribute 5, 6, and 6 sequences, respectively (p-orthogroups 8, 17, and 116). This cluster includes six functional keratinases from *Actinomadura*, *Streptomyces*, and *Nocardiopsis*. Curiously, two putative non-keratinase sequences from other *Streptomyces* strains, annotated as trypsin-like serine proteases (family S01), also fall within this community. This observation suggests that these two peptidases could have keratinolytic activity, although this has not yet been tested experimentally. Finally, the smaller community N° 41 (Figure 3D) contains sequences belonging to the peptidase family M04, consisting of 3, 2, and 5 sequences from strains G11C, CHD11, and Vc74B-19, respectively (p-orthogroups 16 and 64), which cluster with a functional keratinase from a *Geobacillus* strain (AJD77429.1). Scattered across the network, we found three keratinases from *Lactobacillus* and *Bifidobacterium* that do not group with any sequence from our three-strain dataset. In general, putative non-keratinase sequences appear as unconnected nodes or in separate communities, except for specific cases such as communities N° 4 (mentioned above) and N° 47, both belonging to the peptidase family S01 (Table S10).

**Figure 3.** Protease similarity network, including the three-strain (584 sequences), keratinase (61 sequences), and non-keratinase (50 sequences) datasets. The E-value threshold of the BLAST alignment for the network is 1 × 10<sup>−40</sup>. Each node represents an identified putative protease, and the color fill indicates the origin of the sequence: Red, strain G11C; green, strain CHD11; blue, strain Vc74B-19; black, functionally known keratinases; yellow, putative non-keratinases. Edge transparency was adjusted to represent E-value difference: Darker edges correspond to smaller E-values. (**A**) Entire network; (**B**–**D**) zoom into particular network clusters possessing known keratinases: communities 1, 4, and 41, respectively.

In summary, there are 11, 13, and 16 putative proteases linked to functional keratinases in *Streptomyces* strains G11C, CHD11, and Vc74B-19, respectively. Although this number is lowest for the keratinolytic strain G11C, its sequences show a higher percentage of identity with some functional keratinases (Table S11). For instance, the first hit for all three strains is a keratinase synthesized by *Streptomyces albidoflavus* TBG-S13A5 (AYM48028.1), which shares 96.4% amino acid identity with a predicted protein from strain G11C (G11C\_01512), 72.8% with one from strain CHD11 (CHD11\_02603), and 72.2% with one from strain Vc74B-19 (Vc74B-19\_03690). The G11C sequence, which belongs to peptidase family S01, p-orthogroup 8, and community N° 4, can be considered a putative keratinase given its high amino acid similarity to the known keratinase from *Streptomyces albidoflavus* TBG-S13A5; this similarity can also be explained by the phylogenomic closeness of the two strains.
