### *2.4. Cyanobacterial Species Has Highest Gene Cluster Diversity Percentage Compared to Bacillus and Mycobacterial Species*

Analysis of gene cluster types in 103 cyanobacterial species revealed 73 different types of secondary metabolite BGCs (Figure 7 and Table S4). Among these, terpene BGCs are dominant (235 clusters), followed by bacteriocin (183 clusters) and non-ribosomal peptide synthetase (NRPS) (64 clusters) BGCs (Figure 7). Forty types of BGCs are represented by only a single gene cluster, indicating high diversity in the types of gene clusters in cyanobacterial species (Figure 7). Comparative analysis of BGC types among different bacterial species revealed that cyanobacterial species have more types of BGCs than *Bacillus* and mycobacterial species, but fewer than *Streptomyces* species (Table 1).

**Figure 7.** Comparative analysis of types of secondary metabolite biosynthetic gene clusters (BGCs) in 103 cyanobacterial species (**A**) and most similar known clusters (**B**). Standard abbreviations representing secondary metabolite BGCs as indicated in anti-SMASH (antibiotics & Secondary Metabolite Analysis Shell) [65] were used in the figure. Detailed information is presented in Supplementary Table S4.

To compare BGC diversity accurately among different bacterial species, we developed a new equation, similar to the one we developed for the P450 diversity percentage calculation [55], with some modification. The formula below normalizes for the number of species used, so that the gene cluster diversity percentage can be compared between different populations.

$$\text{Gene cluster diversity percentage} = \frac{100 \times \text{Total number of types of clusters}}{\text{Total number of clusters} \times \text{Number of species}}$$

Based on the above formula, the gene cluster diversity percentage in cyanobacterial species was found to be four times higher than in *Bacillus* and mycobacterial species (Table 2). This indicates that although cyanobacterial species have the lowest number of gene clusters, these clusters are diverse and are expected to produce different types of secondary metabolites. This was evident from the most similar known clusters: among 770 clusters, only 228 showed similarity to 79 known clusters (Figure 7 and Table S4). Among the most similar known clusters, only four are dominant, with 25 (heterocyst glycolipids), 17 (1-heptadecene) and 12 (Nostopeptolide and Nostophycin) clusters (Figure 7 and Table S4). A detailed analysis of the most similar known clusters is presented in Supplementary Table S4. The remaining 542 BGCs have no similar known clusters, indicating that these BGCs might encode novel secondary metabolites, possibly with biotechnological value.
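As a rough illustration of the formula above, the sketch below applies it to the cyanobacterial counts quoted in the text (73 cluster types, 770 clusters, 103 species) and recomputes the number of clusters with no similar known cluster. The function and variable names are hypothetical and chosen only for this example; the values for the other genera are reported in Table 2 and are not reproduced here.

```python
def gene_cluster_diversity_percentage(n_cluster_types, n_clusters, n_species):
    """Return 100 * types / (clusters * species), following the formula in the text."""
    if n_clusters <= 0 or n_species <= 0:
        raise ValueError("cluster and species counts must be positive")
    return 100 * n_cluster_types / (n_clusters * n_species)


# Counts quoted in the text for the cyanobacterial data set.
cyano_types = 73      # different types of secondary metabolite BGCs
cyano_clusters = 770  # total number of BGCs
cyano_species = 103   # number of species analyzed

cyano_diversity = gene_cluster_diversity_percentage(
    cyano_types, cyano_clusters, cyano_species
)

# Clusters that showed similarity to a known cluster, and the remainder.
matched_clusters = 228
unmatched_clusters = cyano_clusters - matched_clusters

print(f"gene cluster diversity percentage: {cyano_diversity:.3f}")
print(f"clusters with no similar known cluster: {unmatched_clusters}")
```

Because the denominator multiplies the cluster count by the species count, a data set with many species and many clusters is not rewarded simply for its size, which is the normalization the text describes.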

### *2.5. Few Cyanobacterial Species P450s Found to be Part of Secondary Metabolite Biosynthetic Gene Clusters*

Analysis of P450s that are part of different secondary metabolite BGCs revealed that only a few P450s are part of BGCs in cyanobacterial species compared to *Bacillus*, mycobacterial and *Streptomyces* species (Tables 2 and 3). Only 8% of P450s are part of BGCs in cyanobacterial species, whereas 22% (*Bacillus* species), 11% (mycobacterial species) and 34% (*Streptomyces* species) of P450s were found to be part of BGCs in the other bacterial species (Table 2). Among 341 P450s, only 27 were found to be part of secondary metabolite BGCs in cyanobacterial species, indicating that cyanobacterial P450s might play a major role in primary metabolism. The 27 P450s that are part of BGCs belong to six P450 families (Table 3). P450s belonging to the CYP110 family are dominant in BGCs (17 P450s, 63%), followed by CYP213 (4 P450s, 15%) and CYP120 (3 P450s, 11%), with a single member each from the P450 families CYP1011, CYP1185 and CYP197 (Table 3). Notably, the CYP110 family is also the dominant P450 family in cyanobacterial species, suggesting that it is required for the production of secondary metabolites; the same phenomenon, in which dominant P450 families were found to be part of BGCs, was observed in *Bacillus*, mycobacterial and *Streptomyces* species [54,55]. The 27 P450s were found to be part of 10 types of clusters: nine P450s were part of NRPS, Type I PKS (polyketide synthase) clusters, followed by five P450s in terpene clusters and three P450s in bacteriocin clusters (Table 2). The P450s found in each of the clusters and the most similar known clusters are presented in Table 3. Analysis of the most similar known clusters revealed that CYP110AH1 from *Synechococcus* sp. PCC 7502 is certainly involved in the production of anabaenopeptin NZ 857/nostamide, as this P450 NRPS cluster showed 100% similarity to the gene cluster that produces the metabolite (Table 3). Apart from this match, the percentage similarity to the most similar known clusters is very low, and thus the metabolites produced by the other gene clusters cannot be predicted.
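The percentages quoted in this section can be rechecked from the counts given in the text. The following is a minimal sketch of that arithmetic; the dictionary layout and variable names are assumptions made for illustration, and only the cyanobacterial counts stated above are used.

```python
# P450 counts per family among BGC-associated P450s, as quoted in the text.
family_counts = {
    "CYP110": 17,
    "CYP213": 4,
    "CYP120": 3,
    "CYP1011": 1,
    "CYP1185": 1,
    "CYP197": 1,
}

p450_total = 341                           # all cyanobacterial P450s analyzed
p450_in_bgcs = sum(family_counts.values())  # P450s that are part of BGCs

# Share of all P450s that are part of BGCs.
share_in_bgcs = 100 * p450_in_bgcs / p450_total

# Share of each family among the BGC-associated P450s.
family_share = {
    family: 100 * count / p450_in_bgcs
    for family, count in family_counts.items()
}

print(f"P450s in BGCs: {p450_in_bgcs} of {p450_total} ({share_in_bgcs:.0f}%)")
for family, share in family_share.items():
    print(f"{family}: {family_counts[family]} P450s ({share:.0f}%)")
```

Summing the family counts gives the 27 BGC-associated P450s, which serves as a quick consistency check on the six-family breakdown.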
