*2.4. Phylogeny on t-SNE Groups*

To identify extracellular peptidases of *Streptomyces* sp. G11C that are phylogenetically related to functional keratinases, multiple sequence alignments (MSAs) and phylogenetic analyses were performed on t-SNE groups 0, 1, and 2 (which contain sequences predicted as extracellular). The average occupancy (average number of residues per position in the alignment) [50] of the three MSAs was rather low: 35.8% for group 0, 19.3% for group 1, and 35.7% for group 2. Because the highly divergent sequences behind such low occupancy hinder phylogenetic tree construction, we removed positions with occupancy below 70% to generate more compact MSAs. The average occupancy of these compact alignments was 87.4% for group 0, 85.1% for group 1, and 90.3% for group 2 (Figure S4). We then constructed bootstrapped maximum likelihood trees from these compact MSAs (Figure 5). Given the high divergence observed among the sequences in each t-SNE group, we also applied ancestral state reconstruction to aid the interpretation of the trees.
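The occupancy filter can be illustrated with a minimal sketch. It assumes that occupancy is the fraction of non-gap characters in each alignment column and that columns at or above the 70% threshold are kept. The function names and the toy alignment are hypothetical and are not taken from the pipeline used in this work.

```python
# Illustrative sketch only: occupancy is assumed to be the fraction of
# non-gap characters per column; all names here are hypothetical.
GAP_CHARS = {"-", "."}


def column_occupancy(alignment):
    """Return the fraction of non-gap characters for each column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    return [
        sum(seq[i] not in GAP_CHARS for seq in alignment) / n_seqs
        for i in range(length)
    ]


def average_occupancy(alignment):
    """Return the mean column occupancy of the alignment."""
    occupancy = column_occupancy(alignment)
    return sum(occupancy) / len(occupancy) if occupancy else 0.0


def filter_by_occupancy(alignment, threshold=0.70):
    """Keep only the columns whose occupancy is at least `threshold`."""
    occupancy = column_occupancy(alignment)
    keep = [i for i, value in enumerate(occupancy) if value >= threshold]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment (made-up sequences, not data from this study).
toy_msa = ["AC-G", "A--G", "ACTG", "A--G"]
compact_msa = filter_by_occupancy(toy_msa, threshold=0.70)
```

In this toy case, the two sparsely populated middle columns are dropped, so the average occupancy of the compact alignment is higher than that of the original.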

**Figure 5.** Maximum likelihood trees from filtered MSA of t-SNE group 0 (**A**) and t-SNE group 1 (**B**) sequences. Ancestral state probabilities inferred in internal nodes for each category (functional keratinase, keratinase-linked protein, three-strain, and non-keratinase) are depicted as pie charts, with a total of 1 for each pie chart. Support values based on bootstrapping are indicated for each node. Selected clades under stipulated criteria are enclosed by light red boxes, while discarded clades are enclosed by light yellow boxes.

The same discrete categories used in the PCA and t-SNE plots (i.e., functional keratinase, keratinase-linked protein, three-strain category, and non-keratinase) were used for the ancestral state reconstruction analysis. With this tool, a probability distribution is assigned to each ancestral node within the tree, indicating how likely that ancestor is to belong to each of these categories. This provides a visual aid for studying which branches could be related to proteases with keratinolytic activity, as the ancestral state depends on both branch length (i.e., sequence similarity) and tree topology [51,52]. We applied three filters to select clades for a more detailed description. First, we analyzed branches that presented >50% probability of belonging to the functional keratinase or keratinase-linked categories. Second, we selected clades that contain at least one functional keratinase. Third, we focused only on clades in which at least one sequence from strain G11C is present. These criteria reduced the subsequent analysis to clade 2 of t-SNE group 0 (Figure 5A, Box 2) and clades 1 and 5b of t-SNE group 1 (Figure 5B, Boxes 1 and 5b). Clades of t-SNE group 2 did not meet these requirements and were therefore not analyzed (Figure S5).
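The three selection filters can be sketched as a simple predicate over clades. This is an illustration under stated assumptions: the `Clade` structure, the function names, and the example values are hypothetical, and the first filter is implemented here as the summed probability of the two keratinase-related categories, although the text does not specify whether the categories were combined or considered separately.

```python
# Illustrative sketch only: data structure, names, and the summed-probability
# reading of the first filter are assumptions, not the method used here.
from dataclasses import dataclass

KERATINASE_STATES = ("functional keratinase", "keratinase-linked protein")


@dataclass
class Clade:
    name: str
    ancestral_probs: dict  # category -> probability at the clade's ancestral node
    members: list          # (sequence_id, category, strain) tuples


def passes_filters(clade, prob_cutoff=0.5, target_strain="G11C"):
    """Apply the three clade-selection filters described in the text."""
    # Filter 1: >50% probability of a keratinase-related ancestral state
    # (assumed here to be the sum over both categories).
    keratinase_prob = sum(
        clade.ancestral_probs.get(state, 0.0) for state in KERATINASE_STATES
    )
    if keratinase_prob <= prob_cutoff:
        return False
    # Filter 2: at least one functional keratinase in the clade.
    if not any(cat == "functional keratinase" for _, cat, _ in clade.members):
        return False
    # Filter 3: at least one sequence from the target strain.
    return any(strain == target_strain for _, _, strain in clade.members)


def select_clades(clades, prob_cutoff=0.5, target_strain="G11C"):
    """Return the names of the clades that pass all three filters."""
    return [c.name for c in clades if passes_filters(c, prob_cutoff, target_strain)]
```

Expressing the criteria as independent checks makes it straightforward to see which filter excludes a given clade, or to change the probability cutoff.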

Clade 2 of t-SNE group 0 (Figure 5A, Box 2) harbors seven proteases from the three-strain set, annotated as serine proteases belonging to the S01 family, community 4: three from p-orthogroup 17 and four from p-orthogroup 8. Within this clade, there are four keratinases, belonging to *Actinomadura* (ASU91959.1, AMH86070.1), *Nocardiopsis* (AAO06113.1), and *Streptomyces* (CAH05008.1) strains. Interestingly, one sequence from the keratinolytic strain G11C (G11C\_05333) and one from the low-keratinolytic strain CHD11 (CHD11\_00976) are phylogenetically close to each other relative to the sequences of the non-keratinolytic strain Vc74B-19. It is possible that specific motifs or amino acids within these sequences, not present in strain Vc74B-19, enhance the activity of the encoded enzymes; these sequences should therefore be considered candidate keratinases. In addition, we observed a subclade in which Vc74B-19 and CHD11 sequences are related to the known *Streptomyces fradiae* K11 keratinase CAH05008.1. In this case, branch lengths indicate substantial sequence divergence, and therefore no putative keratinolytic activity can be assigned to the three-strain genes of this subclade.

In the t-SNE group 1 tree, clade 1 (Figure 5B, Box 1) groups seven three-strain proteases belonging to p-orthogroup 0, community 1: three from strain G11C (G11C\_02264, G11C\_03013, G11C\_05273), two from strain CHD11 (CHD11\_02120, CHD11\_02299), and two from strain Vc74B-19 (Vc74B-19\_00125, Vc74B-19\_05629). All these sequences are annotated as serine proteases of the S08 family. Only two functional keratinases fall within this clade: a partial sequence from *Streptomyces* sp. OWU 1633 (AAU94350.1) and a sequence from *Amycolatopsis* sp. BJA-103 (QGA70043.1), both belonging to family S08. In this clade, the sequences of strains CHD11 and Vc74B-19 are phylogenetically closer to each other than to those of strain G11C.

Clade 5b (Figure 5B, Box 5b) comprises five three-strain sequences of p-orthogroup 8, community 4: three from strain G11C (G11C\_01510, G11C\_01512, G11C\_02546), one from strain CHD11 (CHD11\_02602), and one from strain Vc74B-19 (Vc74B-19\_03689), annotated as streptogrisins A, B, and D. This clade has a branch that contains two keratinases from *Streptomyces albidoflavus*, strains Fea-10 (AQX39246.1) and TBG-S13A5 (AYM48028.1). Only one sequence of strain G11C (G11C\_01512) is present in this branch. Given its high similarity to these keratinases (96.4% amino acid identity), it is a likely keratinase, which could explain, to a certain extent, the differences in keratinolytic activity between our strains. All these sequences belong to the peptidase S01 family.
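For reference, amino acid identity between two aligned sequences can be computed as in the following minimal sketch. It assumes that identity is the percentage of matching residues over the aligned columns in which neither sequence has a gap. Other denominators (for example, the full alignment length or the shorter sequence length) are also in common use, and the text does not state which one was applied. The function name and the toy sequences are hypothetical.

```python
# Illustrative sketch only: identity is assumed to be computed over columns
# where neither sequence has a gap; the denominator choice is an assumption.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return the percent identity of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy pair of aligned sequences (made up, not data from this study).
toy_identity = percent_identity("ACDE-G", "ACDQFG")
```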

Focusing on *Streptomyces* sp. G11C, which showed the highest level of keratin degradation, we identified seven putative proteases of interest that are phylogenetically close to known keratinases and that could help explain the observed differences in keratinolytic activity. These sequences are G11C\_05333, G11C\_02264, G11C\_03013, G11C\_05273, G11C\_01510, G11C\_01512, and G11C\_02546. They belong to p-orthogroups 0, 8, and 17 and communities 1 and 4, which are related to peptidase families S01 and S08, thus supporting this prediction.
