*3.2. Analysis of Groups of Orthologous Genes in the Albidoflavus Phylogroup*

To provide further genetic evidence of the similarities among the members of the *albidoflavus* phylogroup (Figure 1), a pan-genome analysis was performed to determine the number of core, accessory, and unique genes present in this group of isolates. The Roary program [56] was used to identify groups of orthologous and paralogous genes (hereafter referred to simply as "genes") in the set of *albidoflavus* genomes, with a protein identity cut-off of 95%, the value recommended by the Roary manual for analysing organisms belonging to the same species.

A total of 7565 genes were identified in the *albidoflavus* pan-genome, of which 5177 were shared among all the *albidoflavus* isolates (i.e., the core genome) (Figure 2). The core genome thus represents approximately 68.4% of the pan-genome, a remarkably high proportion of genes conserved across all the isolates. Additionally, when considering the genomes individually (Table S1), the core genome accounts for approximately 84.5% of the FR-008 genome; 88.5% of J1074; 85.5% of KJ40; 86.7% of SM17; and 83.7% of SM254. In contrast, the accessory genome (i.e., genes present in at least two, but not all, isolates) consists of 1055 genes (~13.9% of the pan-genome), while the unique genome (i.e., genes present in only one isolate) consists of 1333 genes (~17.6% of the pan-genome). This high degree of gene conservation, together with the previous multi-locus phylogeny analysis, strongly suggests that these microorganisms belong to the same species.

**Figure 2.** Venn diagram representing the presence/absence of groups of orthologous genes in the organisms belonging to the *albidoflavus* phylogroup.
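The core/accessory/unique partition described above can be illustrated with a minimal Python sketch. This is not the Roary implementation: the function `classify_clusters`, the cluster names, and the toy presence/absence data are hypothetical and serve only to show the counting rule. The second half simply recomputes the pan-genome shares from the gene counts reported in the text.

```python
from collections import Counter


def classify_clusters(presence, n_isolates):
    """Label each gene cluster as 'core', 'accessory', or 'unique'.

    presence maps a cluster name to the set of isolates carrying it.
    core      = present in all isolates
    unique    = present in exactly one isolate
    accessory = present in at least two, but not all, isolates
    """
    labels = {}
    for cluster, isolates in presence.items():
        k = len(isolates)
        if k == 0:
            raise ValueError(f"cluster {cluster!r} is present in no isolate")
        if k == n_isolates:
            labels[cluster] = "core"
        elif k == 1:
            labels[cluster] = "unique"
        else:
            labels[cluster] = "accessory"
    return labels


# Hypothetical toy data: three made-up clusters across the five isolates.
isolates = ["FR-008", "J1074", "KJ40", "SM17", "SM254"]
presence = {
    "cluster_A": set(isolates),      # present everywhere
    "cluster_B": {"SM17", "J1074"},  # present in two isolates
    "cluster_C": {"KJ40"},           # present in one isolate
}
labels = classify_clusters(presence, len(isolates))
counts = Counter(labels.values())

# Gene counts as reported in the text for the albidoflavus pan-genome.
reported = {"core": 5177, "accessory": 1055, "unique": 1333}
pan_size = sum(reported.values())
shares = {name: round(100 * n / pan_size, 1) for name, n in reported.items()}

print(labels)
print(pan_size, shares)
```

The three reported categories are mutually exclusive, so their counts sum to the pan-genome size, and the rounded shares match the percentages quoted in the text.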

An additional pan-genome analysis was performed with the *Streptomyces koyangensis* strain VK-A60T included in the dataset (Figure S1), as this isolate was shown to be a close neighbour of the *albidoflavus* phylogroup (Figure 1, clade 2). Compared with the previous analysis, including the VK-A60T isolate markedly reduced the core genome, from 5177 genes (Figure 2) to 3912 genes (Figure S1), with an additional 1273 genes also shared among all of the *albidoflavus* isolates (Figure S1). The results also showed a much larger number of genes unique to the VK-A60T genome than to any of the other genomes: 2059 unique genes out of a total of 6245 CDSs, or approximately a third of its genes (Figure S1). This proportion is considerably higher than the proportions of unique genes observed in the *albidoflavus* phylogroup genomes (Figure 2), which accounted for only approximately 2.5% of the total number of genes in SM17; 4.2% in J1074; 4.9% in KJ40; 5% in FR-008; and 5.1% in SM254. Taken together, these results further demonstrate the similarities between the isolates belonging to the *albidoflavus* phylogroup, while the VK-A60T isolate is clearly more distantly related.
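As a quick check of the comparison above, the unique-gene proportion for VK-A60T can be recomputed from the two counts given in the text and set against the percentages reported for the *albidoflavus* isolates. The helper `percent` and the variable names are illustrative only; the per-isolate percentages are copied from the text rather than recomputed, since the underlying per-genome counts are in Table S1 and are not reproduced here.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Counts reported in the text for VK-A60T.
vk_unique_genes = 2059
vk_total_cds = 6245
vk_unique_pct = percent(vk_unique_genes, vk_total_cds)

# Unique-gene percentages reported in the text for the albidoflavus isolates.
reported_unique_pct = {
    "SM17": 2.5,
    "J1074": 4.2,
    "KJ40": 4.9,
    "FR-008": 5.0,
    "SM254": 5.1,
}
highest_albidoflavus = max(reported_unique_pct.values())

print(round(vk_unique_pct, 1))                         # about a third
print(round(vk_unique_pct / highest_albidoflavus, 1))  # fold difference
```

Even against the *albidoflavus* isolate with the highest reported share of unique genes (SM254), the VK-A60T proportion is more than six times larger.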

Thus, on the basis of previous studies [30,31], and in light of the phylogeny analysis and the further genomic evidence presented in this study, it is likely that all the isolates belonging to the *albidoflavus* phylogroup are in fact members of the same species. It is reasonable to infer, for example, that the isolates in the *albidoflavus* phylogroup with no species assignment thus far (i.e., strains SM17 and FR-008) are indeed members of the *albidoflavus* species. It is also possible that *Streptomyces sampsonii* KJ40 has been misassigned and requires reclassification as an *albidoflavus* isolate.

Misassignment and reclassification of *Streptomyces* species are common issues, and an increase in the quantity and quality of available data for these organisms (e.g., better-quality genomes in the databases) will provide better support for taxonomic claims, or allow their correction when new information becomes available [31,57–59].
