#### *3.3. Allocating Family and Subfamily to P450s*

The hit proteins that were collected were subjected to BLAST analysis against bacterial P450s at the website http://www.p450.unizulu.ac.za/. Based on the International P450 Nomenclature Committee rule [17–19], proteins with a percentage identity greater than 40% were assigned to the same family as named homolog P450s, and those that had greater than 55% identity were assigned to the same subfamily as named homolog P450s. Proteins that had a percentage identity less than 40% were assigned to a new family.
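The identity thresholds above can be expressed as a small decision function. The following is only an illustrative sketch, not the tool used in this study: the function name `assign_p450_rank` and its return labels are hypothetical, and the handling of a hit at exactly 40% identity (treated here as a new family) is an assumption, because the rule as stated covers only values above and below that threshold.

```python
def assign_p450_rank(percent_identity: float) -> str:
    """Classify a hit relative to its best-matching named homolog P450.

    Thresholds follow the rule described in the text:
    > 55% identity -> same subfamily, > 40% -> same family,
    otherwise a new family. Treating exactly 40% as a new family
    is an assumption of this sketch.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity > 55.0:
        return "same subfamily"
    if percent_identity > 40.0:
        return "same family"
    return "new family"


# Hypothetical identity values, for illustration only.
for identity in (72.4, 48.0, 31.5):
    print(identity, assign_p450_rank(identity))
```

A hit above 55% identity also lies within the same family as its homolog, so the subfamily check is applied first.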

#### *3.4. Streptomyces P450 Phylogenetic Analysis*

Phylogenetic analysis of the *Streptomyces* P450s was carried out following the method described in the literature [102]. First, the *Streptomyces* P450 sequences were aligned using the MAFFT v6.864 program with an automatically optimized model option [103], available at the Trex web server [104]. The alignments were then automatically subjected to inference and optimization of the tree by the Trex web server with its embedded weighting procedure, and the best inferred tree was visualized and annotated by iTOL [105].

#### *3.5. Streptomyces P450 Profile Heat-Maps*

P450 profile heat-maps were generated following a method published previously [22,27] to show the presence or absence of P450 families in *Streptomyces* species. Briefly, a tab-delimited file was imported into Multi-Experiment Viewer (MeV) [106], and the data were clustered by hierarchical clustering using a Euclidean distance metric. In total, 203 *Streptomyces* species formed the vertical axis and P450 family numbers formed the horizontal axis. Data were presented as −3 for family absence (green) and 3 for family presence (red).
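The input for such a heat-map can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the helper names (`build_profile`, `euclidean`, `to_tab_delimited`), the column header `species`, and the example species and family assignments are all hypothetical. It encodes presence as 3 and absence as −3, writes the tab-delimited layout, and computes the Euclidean distance between two species profiles, which is the metric the clustering relies on.

```python
from math import sqrt


def build_profile(species_families):
    """Return (species, families, matrix) with 3 = present, -3 = absent."""
    species = sorted(species_families)
    families = sorted({f for fams in species_families.values() for f in fams})
    matrix = [
        [3 if fam in species_families[sp] else -3 for fam in families]
        for sp in species
    ]
    return species, families, matrix


def euclidean(row_a, row_b):
    """Euclidean distance between two presence/absence profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))


def to_tab_delimited(species, families, matrix):
    """Format the profile as tab-delimited text, one species per row."""
    header = "\t".join(["species"] + families)
    rows = [
        "\t".join([sp] + [str(value) for value in row])
        for sp, row in zip(species, matrix)
    ]
    return "\n".join([header] + rows)


# Hypothetical example data; not taken from the study.
example = {
    "species_A": {"CYP105", "CYP107"},
    "species_B": {"CYP105"},
}
species, families, matrix = build_profile(example)
print(to_tab_delimited(species, families, matrix))
print(euclidean(matrix[0], matrix[1]))
```

Because every cell is either 3 or −3, each family on which two species differ contributes 36 to the squared distance, so the distance depends only on the number of differing families.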

#### *3.6. Identification of P450s That Are Part of Secondary Metabolite BGCs*

Analysis of secondary metabolite BGCs and identification of P450s that are part of these BGCs were carried out following the procedure described previously [102], with slight modification. For each *Streptomyces* species genome available at JGI IMG/M, the secondary metabolite BGCs were searched for the presence of P450s. The DNA sequences of BGCs with P450s were collected and converted to fasta format using PSPad editor (http://www.pspad.com/en/). The fasta-formatted files were then used to identify the type of cluster and the most similar known clusters using the Antibiotics and Secondary Metabolite Analysis Shell (anti-SMASH) program [107]. The results obtained were recorded in Excel spreadsheets and represented as species-wise BGCs, type and similar known BGCs, percentage similarity to known BGCs, and P450s that are part of specific BGCs. Some *Streptomyces* species genome IDs could not be processed by anti-SMASH, and these species were therefore not included in the analysis of P450s as part of secondary metabolite BGCs. A list of *Streptomyces* species subjected to anti-SMASH analysis is presented in Supplementary Table S4.
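In this study the fasta formatting was done in a text editor. For readers who prefer a scripted step, the same formatting could be sketched as below. The function name `format_fasta`, the 60-character line width, and the example record are assumptions made for illustration only.

```python
def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one fasta record: a '>' header line plus wrapped sequence lines."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Remove whitespace and line breaks left over from copying the sequence.
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([">" + header] + lines) + "\n"


# Hypothetical BGC identifier and sequence, for illustration only.
record = format_fasta("example_BGC_1", "acgt" * 20)
print(record, end="")
```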

#### *3.7. Data Analysis*

All calculations were done following the method described in the literature [23]. The average number of P450s was calculated using the formula: Average number of P450s = Number of P450s/Number of species. The average number of BGCs was calculated using the formula: Average number of BGCs = Total number of BGCs/Number of species. The percentage of P450s that formed part of BGCs was calculated using the formula: Percentage of P450s part of BGCs = 100 × Number of P450s part of BGCs/Total number of P450s present in species. For comparative analysis of P450s and BGCs, information for bacterial species belonging to the genera *Bacillus* [22] and *Mycobacterium* [21] and the phylum *Cyanobacteria* [23] was obtained from published articles.
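The three formulas above reduce to two small functions, sketched here for illustration. The function names and the input numbers are hypothetical and are not results from this study.

```python
def average_per_species(total_count: int, n_species: int) -> float:
    """Average number of P450s (or BGCs) per species."""
    if n_species <= 0:
        raise ValueError("number of species must be positive")
    return total_count / n_species


def percent_p450s_in_bgcs(p450s_in_bgcs: int, total_p450s: int) -> float:
    """Percentage of P450s that are part of BGCs."""
    if total_p450s <= 0:
        raise ValueError("total number of P450s must be positive")
    return 100 * p450s_in_bgcs / total_p450s


# Hypothetical counts, for illustration only.
print(average_per_species(total_count=10, n_species=4))
print(percent_p450s_in_bgcs(p450s_in_bgcs=5, total_p450s=20))
```

The same averaging function serves for both P450s and BGCs, since the two formulas differ only in the quantity being counted.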

#### **4. Conclusions**

In the last five decades, research on cytochrome P450 monooxygenases (CYPs/P450s) has mainly focused on their function and structural aspects, with little focus on evolutionary analysis, especially in microbes. The availability of a large number of microbial species genomes gives us an opportunity to explore the evolutionary aspects of P450s. Because a standard nomenclature system has been established for P450s, each species genome needs to be data-mined and its P450 proteins need to be annotated (assigned to a family and subfamily). In this way, researchers around the world can make use of uniform P450 names. In this study, we therefore annotated a large number of P450s in 203 *Streptomyces* species and found 38 new P450 families. Some P450 families were found to have bloomed in *Streptomyces* species, even at the subfamily level. Comparative analysis of key P450 features among different bacterial species revealed that *Streptomyces* species had a greater number of P450s, more secondary metabolite BGCs, and the highest number of P450s as part of BGCs compared to bacterial species belonging to the genera *Bacillus* and *Mycobacterium* and the phylum *Cyanobacteria*. This further confirmed that the higher the number of P450s, the higher the secondary metabolite diversity in a species. This was true for *Streptomyces* species, as a large number of P450s were found to be involved in the generation of diverse secondary metabolites. One interesting phenomenon observed was the linkage between a particular P450 family and a BGC. This indicates that these BGCs were horizontally transferred among different *Streptomyces* species. This study is a good addition to the comparative analysis of P450s and BGCs among different bacterial populations. Data presented in the study will serve as a reference for further annotation of P450s in *Streptomyces* species and other bacterial species. *In silico* predicted BGCs need to be experimentally validated to assess the biological properties of their secondary metabolites.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/21/13/4814/s1.

**Author Contributions:** Conceptualization, K.S.; data curation, F.C.M., T.P., W.C., D.G., J.-H.Y., D.R.N. and K.S.; formal analysis, F.C.M., T.P., W.C., D.G., J.-H.Y., D.R.N. and K.S.; funding acquisition, K.S.; investigation, F.C.M., T.P., W.C., J.-H.Y., D.R.N. and K.S.; methodology, F.C.M., T.P., W.C., D.G., J.-H.Y., D.R.N. and K.S.; project administration, K.S.; resources, K.S.; supervision, K.S.; validation, F.C.M., T.P., W.C., D.G., J.-H.Y., D.R.N. and K.S.; visualization, F.C.M., T.P., W.C., and K.S.; writing—original draft, F.C.M., T.P., W.C., J.-H.Y., D.R.N. and K.S.; writing—review and editing, F.C.M., T.P., W.C., J.-H.Y., D.R.N. and K.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work presented in this article is part of research funded by a National Research Foundation (NRF), South Africa grant awarded to Khajamohiddin Syed (Grant No. 114159), where all the international authors involved in the study are listed as international collaborators. Fanele Cabangile Mnguni thanks the NRF, South Africa for a DST-NRF Innovation Master's Scholarship for the year 2019 (Grant No. 117171). Honours student, Tiara Padayachee, thanks the NRF, South Africa for an honours bursary (Grant No. MND190619448759). Dominik Gront was supported by the National Science Centre, Poland (Grant No. 2018/29/B/ST6/01989). Khajamohiddin Syed expresses sincere gratitude to the University of Zululand Research Committee for funding (Grant No. C686) and for the laboratory facilities.

**Acknowledgments:** The authors want to thank Barbara Bradley, Pretoria, South Africa for English language editing.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.
