### *3.3. Prediction of Secondary Metabolites Biosynthetic Gene Clusters in the Albidoflavus Phylogroup*

Isolates belonging to the *albidoflavus* phylogroup have been reported to produce bioactive compounds of pharmacological relevance, such as antibiotics. As mentioned previously, *Streptomyces albidoflavus* strain J1074 is the best-described member of the *albidoflavus* phylogroup to date. As such, several secondary metabolites produced by this isolate have been identified, including acyl-surugamides and surugamides with antifungal and anticancer activities, respectively [16], together with paulomycin derivatives with antibacterial activity [60]. The *Streptomyces* sp. FR-008 isolate has been shown to produce the antimicrobial compound FR-008/candicidin [61,62], while the *Streptomyces sampsonii* KJ40 isolate has been shown to produce a chitinase that possesses antifungal activity against plant pathogens [53]. Although no bioactive compounds have been characterised from *Streptomyces albidoflavus* SM254, this isolate has been shown to possess antifungal activity, specifically against the fungal bat pathogen *Pseudogymnoascus destructans*, which is responsible for white-nose syndrome [55,63]. The *Streptomyces* sp. SM17 isolate has also previously been shown to possess antibacterial and antifungal activities against clinically relevant pathogens, including methicillin-resistant *Staphylococcus aureus* (MRSA) [13]. However, no natural products derived from this strain have been identified and isolated until now.

To further assess, *in silico*, the potential of these *albidoflavus* phylogroup isolates to produce secondary metabolites, and to determine how similar or diverse they are within this phylogroup, secondary metabolite biosynthetic gene clusters (BGCs) were predicted using the antiSMASH (version 5) program [42]. The antiSMASH predictions were then processed using the BiG-SCAPE program [43], in order to cluster the BGCs into gene cluster families (GCFs) based on sequence and Pfam [64] protein family similarity, and to compare them to the BGCs available from the minimum information about a biosynthetic gene cluster (MiBIG) repository [44] (Figure 3). When compared to known BGCs from the MiBIG database, a considerable number of the BGCs predicted to be present in the *albidoflavus* phylogroup genomes could potentially encode the production of novel compounds, including those belonging to the non-ribosomal peptide synthetase (NRPS) and bacteriocin families of compounds (Figure 3). The presence/absence of homologous BGCs in the genomes of the *albidoflavus* isolates was determined using BiG-SCAPE and is represented in Figure 4. Interestingly, the vast majority of the BGCs predicted in the *albidoflavus* phylogroup are shared among all of its members (15 BGCs), while another large portion (8 BGCs) are present in at least two isolates (Figure 4). Among the five members of the *albidoflavus* phylogroup, only the J1074 strain and the SM17 strain appeared to possess unique BGCs when compared to the other strains. Three unique BGCs were predicted to be present in the J1074 genome: a predicted type I polyketide synthase (T1PKS)/NRPS without significant similarity to the BGCs from the MiBIG database; a predicted bacteriocin, which also did not show any significant similarity to the BGCs from the MiBIG database; and a BGC predicted to encode the production of the antibiotic paulomycin, with similarity to the paulomycin-encoding BGCs from *Streptomyces paulus* and *Streptomyces* sp. YN86 [65]. Paulomycin has also been experimentally shown to be produced by the J1074 strain [60]. One BGC predicted to encode a type III polyketide synthase (T3PKS), with no significant similarity to the BGCs from the MiBIG database, was also identified as being unique to the SM17 genome.
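The core/shared/unique breakdown summarised in Figure 4 amounts to set operations on a presence/absence table of gene cluster families. The following is a minimal sketch in Python of that bookkeeping, not the procedure actually used by BiG-SCAPE. The helper `classify_gcfs` and the identifiers `GCF_a` to `GCF_d` are hypothetical placeholders, and the toy table does not reproduce the BGC counts reported here.

```python
from collections import Counter


def classify_gcfs(presence):
    """Split GCFs into core (all isolates), shared (>= 2 but not all)
    and unique (exactly one isolate) sets.

    `presence` maps an isolate name to the set of GCF identifiers
    predicted in its genome.
    """
    n_isolates = len(presence)
    # Count in how many isolates each GCF occurs.
    counts = Counter(gcf for gcfs in presence.values() for gcf in set(gcfs))
    core = {g for g, c in counts.items() if c == n_isolates}
    shared = {g for g, c in counts.items() if 2 <= c < n_isolates}
    unique = {
        isolate: {g for g in gcfs if counts[g] == 1}
        for isolate, gcfs in presence.items()
    }
    return core, shared, unique


# Toy presence/absence table (illustrative identifiers only, an assumption).
toy = {
    "J1074": {"GCF_a", "GCF_b", "GCF_c"},
    "SM17": {"GCF_a", "GCF_b", "GCF_d"},
    "SM254": {"GCF_a", "GCF_b"},
    "FR-008": {"GCF_a"},
    "KJ40": {"GCF_a"},
}
core, shared, unique = classify_gcfs(toy)
```

In this toy table, `GCF_a` is core, `GCF_b` is shared, and only J1074 and SM17 carry a unique family, which mirrors the shape (though not the numbers) of the result described above.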

**Figure 3.** Biosynthetic gene clusters (BGCs) similarity clustering using BiG-SCAPE. Singletons, i.e., BGCs without significant similarity with the BGCs from the minimum information about a biosynthetic gene cluster (MiBIG) database or with the BGCs predicted in other genomes, are not represented.

**Figure 4.** Venn diagram representing BGCs presence/absence in the genomes of the members of the *albidoflavus* phylogroup, determined using antiSMASH and BiG-SCAPE.

Importantly, BGCs with similarity to the surugamide A/D BGC from "*Streptomyces albus* J1074" (now classified as *S. albidoflavus*) from the MiBIG database [16] were identified in all the other genomes of the members of the *albidoflavus* phylogroup. This raises the possibility that this BGC may be commonly present in *albidoflavus* species isolates. However, as only a few complete genomes of isolates belonging to this phylogroup are currently available, further data will be required to support this hypothesis. Nevertheless, these results further highlight the genetic similarities of the isolates belonging to the *albidoflavus* phylogroup, even with respect to their potential to produce secondary metabolites.

### *3.4. Phylogeny and Gene Synteny Analysis of Sur BGC Homologs*

In parallel to the previous phylogenomics analysis performed with the *albidoflavus* phylogroup isolates, sequence similarity and phylogenetic analyses were performed using the previously described and experimentally characterised *Streptomyces albidoflavus* LHW3101 surugamide biosynthetic gene cluster (*sur* BGC, GenBank accession number: MH070261) as a reference [18]. The aim was to assess how widespread in nature the *sur* BGC might be, and the degree of genetic variation, if any, that might be present in *sur* BGCs belonging to different microorganisms.

A nucleotide sequence similarity search for the *sur* BGC was performed against the GenBank database [35], using the NCBI BLASTN tool [36,37]. It is important to note that, since the quality of the data is crucial for sequence similarity, homology, and phylogeny inquiries, only complete genome sequences were employed in this analysis. For this reason, for example, the marine *Streptomyces* isolate in which surugamides and derivatives were originally identified, namely *Streptomyces* sp. JAMM992 [14], was not included, since its complete genome is not available in the GenBank database.
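As an illustration of how BLASTN hits against a long query such as a BGC can be screened, the sketch below parses the standard 12-column tabular BLAST output (`-outfmt 6`) and keeps subjects whose alignments jointly cover most of the query. The identity and coverage thresholds, the function names, and the toy hit table are assumptions made for this example. No thresholds are stated for the analysis described here, and the restriction to complete genomes is not modelled.

```python
def query_coverage(hsps, query_len):
    """Fraction of the query covered by the union of HSP intervals
    (1-based, inclusive query coordinates)."""
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_len


def filter_hits(tabular_text, query_len, min_identity=80.0, min_coverage=0.8):
    """Return {subject: coverage} for subjects passing both thresholds.

    Columns follow BLAST tabular format 6: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The default thresholds are hypothetical.
    """
    by_subject = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        sseqid, pident = fields[1], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if pident >= min_identity:
            by_subject.setdefault(sseqid, []).append((qstart, qend))
    kept = {}
    for sseqid, hsps in by_subject.items():
        coverage = query_coverage(hsps, query_len)
        if coverage >= min_coverage:
            kept[sseqid] = coverage
    return kept


# Toy hit table with placeholder subject names (not real accessions).
rows = [
    ("sur_ref", "subjA", "95.0", "600", "30", "0", "1", "600", "5001", "5600", "0.0", "1000"),
    ("sur_ref", "subjA", "92.0", "451", "36", "0", "550", "1000", "5550", "6000", "0.0", "700"),
    ("sur_ref", "subjB", "99.0", "300", "3", "0", "1", "300", "101", "400", "1e-150", "540"),
    ("sur_ref", "subjC", "70.0", "1000", "300", "0", "1", "1000", "1", "1000", "1e-50", "200"),
]
toy_table = "\n".join("\t".join(row) for row in rows)
hits = filter_hits(toy_table, query_len=1000)
```

Merging overlapping alignments before computing coverage avoids double-counting regions of the query hit by more than one alignment, which matters when a cluster is recovered as several adjacent local alignments.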

The sequence similarity analysis identified five microorganisms that possessed homologs to the *sur* BGC and had their complete genome sequences available in the GenBank database: *Streptomyces* sp. SM17; *Streptomyces albidoflavus* SM254; *Streptomyces* sp. FR-008; *Streptomyces albidoflavus* J1074; and *Streptomyces sampsonii* KJ40. Notably, these results overlapped with the isolates belonging to the previously discussed *albidoflavus* phylogroup (Figure 1), further highlighting the possibility that the *sur* BGC may be commonly present in and potentially exclusive to the *albidoflavus* species.

Phylogenetic analysis was performed on the genomic regions determined to be homologous to the *Streptomyces albidoflavus* LHW3101 *sur* BGC, using the MrBayes program [39] (Figure 5). Although a larger number of sequences should ideally be employed in this type of analysis, these results suggest the possibility of a clade of aquatic saline environment-derived *sur* BGCs (Figure 5). Thus, these aquatic saline environment-derived *sur* BGCs are likely to share more genetic similarities with each other than with those derived from terrestrial environments. Since this analysis took into consideration the whole genomic regions that contained the *sur* BGCs of each isolate, it is likely that the similarities and differences present in these regions involve not only coding sequences (CDSs) for biosynthetic genes and/or transcriptional regulators, but also promoter regions and other intergenic sequences.

**Figure 5.** Consensus phylogenetic tree of the *sur* BGC region of the *S. albidoflavus* LHW3101 reference *sur* BGC sequence, plus five *Streptomyces* isolates determined to have *sur* BGC homologs, generated using MrBayes and Mega X, with a 95% posterior probability cut-off. Aquatic saline environment-derived isolates are highlighted in cyan.

With this in mind, the genomic regions previously determined to share homology with the *sur* BGC from *S. albidoflavus* LHW3101 were further analysed with respect to the genes present in the surrounding region, the organisation of the BGCs, and the overall gene synteny (Figure 6). Translated CDSs predicted in the region were manually annotated using the NCBI BLASTP tool [36,37], together with the GenBank [35] and CDD [66] databases. These included the main biosynthetic genes, namely *surABCD*, the transcriptional regulator *surR*, and the thioesterase *surE*, all of which had previously been reported to have roles in the biosynthesis of surugamides and their derivatives [15–19] (Figure 6).

**Figure 6.** Gene synteny of the *sur* BGC region, including the reference *sur* BGC nucleotide sequence (LHW3101) and each of the *albidoflavus* phylogroup genomes. Arrows at different positions represent genes transcribed in different reading frames.

Interestingly, this result indicated that the gene synteny of the biosynthetic genes as well as the flanking genes is highly conserved, with the exception of the 3' flanking region of the BGC from *S. sampsonii* KJ40. Notably, even the reading frames of the *surE* gene and the *surABCD* genes are conserved amongst all the genomes. As indicated by the numbers in Figure 6, the 5' region in all the genomic regions consisted of: 1) an MbtH-like protein, a family whose members have been reported to be involved in the synthesis of non-ribosomal peptides, antibiotics, and siderophores in *Streptomyces* species [67,68]; 2) a putative ABC transporter, which belongs to a family of proteins with varied biological functions, including conferring resistance to drugs and other toxic compounds [69,70]; 3) a BcrA family ABC transporter, which belongs to a family commonly involved in resistance to peptide antibiotics [71,72]; 4) a hypothetical protein; followed by 5) the transcriptional repressor SurR, which has been experimentally demonstrated to repress the production of surugamides [16]; 6) a hypothetical membrane protein; 7) the thioesterase SurE, a penicillin-binding protein homolog reported to be responsible for the cyclisation of surugamide molecules [21]; and finally 8–11) the main surugamide biosynthetic genes *surABCD*, all of which encode non-ribosomal peptide synthetase (NRPS) proteins [19]. The 3' flanking region consisted of: 12) a predicted multi-drug resistance (MDR) transporter belonging to the major facilitator superfamily (MFS) of membrane transport proteins [73,74]; 13) a predicted TetR/AcrR transcriptional regulator, which belongs to a family of regulators reported to be involved in antibiotic resistance [75]; 14) a hypothetical protein; and 15) another predicted MDR transporter belonging to the MFS.
In contrast, the 3' flanking region of the KJ40 strain *sur* BGC consisted of: 16) a group of four hypothetical proteins, which may represent pseudogene versions of the first MDR transporter identified in the other isolates (gene number 12 in Figure 6); 17) a predicted rearrangement hotspot (RHS) repeat protein, which belongs to a family of proteins reported to be involved in mediating intercellular competition in bacteria [76]; 18) a hypothetical protein; and 19) an MDR transporter belonging to the MFS, which, interestingly, is a homolog of protein number 15, present in all the other isolates.
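The gene order described above can be compared programmatically by encoding each region as an ordered list of gene labels. The sketch below is a minimal illustration under stated assumptions: the labels are shorthand invented here from the numbering in Figure 6, KJ40 gene 19 is given the same label as gene 15 because the two are described as homologs, and strand and reading frame information are not modelled.

```python
def common_prefix_len(a, b):
    """Number of leading positions at which two gene-order lists agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Genes 1-11: 5' flanking genes, surR, surE and surABCD (shorthand labels).
FIVE_PRIME_AND_CORE = [
    "mbtH_like_1", "ABC_transporter_2", "bcrA_ABC_transporter_3",
    "hypothetical_4", "surR", "hypothetical_membrane_6", "surE",
    "surA", "surB", "surC", "surD",
]

# Genes 12-15: 3' flank in all isolates except KJ40.
CONSERVED_3P = ["MFS_MDR_12", "tetR_acrR_13", "hypothetical_14", "MFS_MDR_15"]

# Genes 16-19: 3' flank in KJ40. Gene 19 reuses the label of gene 15
# because it is described as its homolog (a labelling choice made here).
KJ40_3P = [
    "hypothetical_16a", "hypothetical_16b", "hypothetical_16c",
    "hypothetical_16d", "RHS_repeat_17", "hypothetical_18", "MFS_MDR_15",
]

reference_region = FIVE_PRIME_AND_CORE + CONSERVED_3P
kj40_region = FIVE_PRIME_AND_CORE + KJ40_3P

# Length of the collinear block shared from the 5' end.
conserved_block = common_prefix_len(reference_region, kj40_region)

# Genes of the 3' flank that are still shared despite the rearrangement.
shared_3p = set(CONSERVED_3P) & set(KJ40_3P)
```

With these labels, the collinear block spans the eleven genes from the MbtH-like protein through *surD*, and only the second MFS transporter is shared between the two 3' flanking regions, consistent with the description above.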

The conserved gene synteny observed in the *sur* BGC genomic region, particularly for the genes positioned upstream of the main biosynthetic *surABCD* genes, together with the observation that even the reading frames of the *surE* and *surABCD* genes are conserved among all the genomes analysed, and the previous phylogenetic and pan-genome analyses, suggests the following. Firstly, it is very likely that these strains share a common ancestry and that the *sur* BGC genes had a common origin. Secondly, there appears to be strong evolutionary pressure ensuring the maintenance not only of gene synteny, but also of the reading frames of the main biosynthetic genes involved in the production of surugamides. The latter raises the question of which other genes in this region may be involved in the production of these compounds, or may confer mechanisms of self-resistance to surugamides in the isolates, particularly since many of the genes have predicted functions that are compatible with the transport of small molecules and with multi-drug resistance. These observations are particularly interesting considering that these strains are derived from quite varied environments and geographic locations.
