*2.1. Presence of CYP128 P450s in Five of the Six Mycobacterial Category Species*

Comprehensive comparative analysis of *CYP128* P450s in 4250 mycobacterial species belonging to six different categories (MTBC, MCAC, MAC, MCL, NTM and SAP) revealed that this P450 family is present in 2646 mycobacterial species (Figures 1 and 2; Supplementary Dataset 1). Thirty-seven mycobacterial species were found to have short P450 sequences that have none or only one of the highly conserved P450 motifs, i.e., EXXR and CXG (Supplementary Dataset 1). Thus, these short sequences are not included in the study and the species are considered not to have *CYP128* P450. Among mycobacterial species that were used in this study, most of the species (99%) belonging to the MTBC category have this P450 (Figure 2). Among 2350 MTBC species only 38 species do not have *CYP128* (Figure 2). In contrast to MTBC species, most of the species belonging to another five mycobacterial categories do not have this P450 (Figure 2). CYP128 P450s were found in only 34% of NTM species, followed by 26% of MAC species, 12% of MCAC species and 18% of SAP species. The results observed in this study revealed that none of the MCL species has CYP128 P450s, which is consistent with previous observations that MCL species do not have this P450 [22]. Overall, based on the results of this study and comparison with the results reported earlier [22], we conclude that *CYP128* P450s can be found in species belonging to the five mycobacterial categories except in species of the MCL category. Most of the CYP128 P450s (81%) are 489 amino acids in length (Supplementary Dataset 1). An anomaly was observed that two CYP128 P450 sequences, one from *M. tuberculosis* BTB10-120 and the other from *M. tuberculosis* SIT745/EAI1-MYS, has 877 amino acids. However, analysis of genome sequences revealed that *CYP128* and the sulfotransferase (*stf3*) genes were present next to each other on different DNA strands, indicating an error in annotation leading to the fused protein. A single copy of the *CYP128* gene was found in almost all species, except for 25 species where two copies (22 species), three copies (two species) and four copies (one species) of this gene were found (Supplementary Dataset 1). The species with more than one copy of this P450 gene were found across four different categories, where eight species were from MTBC, 12 species from NTM, three species from MAC and two species from MCAC (Supplementary Dataset 1). CYP128 P450 sequences identified in the study are presented in Supplementary Dataset 2.

**Figure 1.** Phylogenetic tree of the cytochrome P450 monooxygenases (CYPs/P450s). Different mycobacterial categories are indicated indifferent colors. Abbreviations: MTBC, *Mycobacterium tuberculosis* complex; NTM, non-tuberculosis mycobacteria; MCAC, *Mycobacterium chelonae-abscessus* complex; MAC, *M. avium* complex and SAP, Saprophytes. A high-resolution phylogenetic tree is provided in Supplementary Dataset 3.

**Figure 2.** Comparative analysis of the CYP128 P450s in six different mycobacterial categories. Abbreviations: MTBC, *Mycobacterium tuberculosis* complex; NTM, non-tuberculosis mycobacteria; MCAC, Mycobacterium chelonae-abscessus complex; MAC, *M. avium* complex; SAP, saprophytes and MCL, mycobacteria causing leprosy. Information on mycobacterial species and CYP128 P450s is presented in Supplementary Dataset 1.

#### *2.2. Di*ff*erent Mycobacterial Category Species Have Di*ff*erent CYP128 Subfamily Preferences*

CYP128 subfamily analysis revealed the presence of a new CYP128 subfamily in mycobacterial species. Thus, at present two CYP128 subfamilies can be found in mycobacterial species, i.e., A and B (Figure 3). Phylogenetic analysis revealed an interesting feature in CYP128 subfamilies with respect to their categories (Figure 1). In a previous study, it was observed that P450s belonging to different mycobacterial categories grouped together according to their category, indicating high conservation of P450 protein sequences after speciation into different categories [22]. In this study, the same phenomenon was observed for CYP128 subfamilies A and B, as these subfamily proteins were grouped together according to their mycobacterial categories, despite the subfamilies being clearly separated on the tree (Figure 1). Also, contrasting subfamily features were observed among different mycobacterial categories (Figure 3). Except for one species, *M. tuberculosis* XTB13-223, all species belonging to the MTBC category have CYP128 subfamily A (Figure 3). However, in contrast to MTBC species, in other mycobacterial categories subfamily B was dominantly present (Figure 3). This indicates that during evolution mycobacterial species had both subfamilies, but owing to their lifestyles or ecological niches only one subfamily was favored. This phenomenon of enriching a particular type of P450 family or subfamilies in microbial species is well known [6,22–28]. An interesting pattern of subfamilies was found in species having more than one copy of the *CYP128* gene. Eight MTBC species have two copies of the *CYP128* gene, both belonging to the same subfamily A. However, in other category species subfamilies A and B are present; especially in species belonging to the NTM category, subfamily B is populated (Supplementary Dataset 1).

**Figure 3.** Comparative analysis of CYP128 P450 subfamilies in five different mycobacterial categories. Abbreviations: MTBC, *Mycobacterium tuberculosis* complex; NTM, non-tuberculosis mycobacteria; MCAC, *Mycobacterium chelonae-abscessus* complex; MAC, *M. avium* complex; and SAP, Saprophytes. Information on mycobacterial species and CYP128 P450s is presented in Supplementary Dataset 1.

#### *2.3. CYP128 Family Ranked Fifth among P450 Families*

It has been proposed that the present-day P450s are evolved from the ancient P450 CYP51 [29,30]. Evolution of different P450 families and their rate of evolution indicate that the higher the evolutionary rate, the more catalytically diverse they are [6,22,24]. Parvez and co-workers [22] proposed a ranking system for P450 families where different P450 families are given a rank based on the number of conserved amino acids and it was proposed that the higher the conservation, the less the catalytic diversity. Furthermore, a recent study revealed that better ranking of P450 families can be achieved using a larger sample size [31]. In a previous study, only 49 CYP128 P450s were used to predict the ranking [22]. Identification of quite a large number of CYP128 P450s in this study necessitates

re-assessment of ranking, in order to calculate the accurate number of conserved amino acids in P450s, as it was mentioned that P450s with similar amino acid length should be used [22,25,31,32]. Thus, in this study, 2191 CYP128 P450s with amino acid lengths ranging from 479 to 489 amino acids (Supplementary Dataset 1) were used to assess the conservation ranking of the CYP128 P450 family. PROfile Multiple Alignment with Predicted Local Structures and 3D Constraints (PROMALS3D) analysis [33] revealed the presence of 217 amino acids that are invariantly conserved in CYP128 P450s (Table 1). Comparison with other P450 families from different biological kingdoms placed CYP128 family in the fifth position, compared to 23rd position previously (Table 1), indicating that this P450 family is one of the best conserved families. A complete table with updated P450 family ranking is presented in Table S1.

**Table 1.** Comparative amino acid conservation analysis of CYP128 P450 family with top 10 ranked families. The conservation index score (5–9) is obtained as described elsewhere [33] using PROMALS3D, where the number 9 indicates invariantly conserved amino acids in P450 members. The CYP128 family is indicated in bold.


The bold shows the position of CYP128 as it been revised from 23rd place to 5th because of this study results.

#### *2.4. CYP128 Family Has Distinctive Amino Acid Patterns at EXXR and CXG Motif*

A study of the analysis of P450 motifs EXXR and CXG revealed that all P450 families have a unique amino acid pattern that serves as a signature characteristic of that particular P450 family [34]. Subsequent studies of P450 families strongly supported this hypothesis [25,31,32]. Analysis of the EXXR and CXG P450 motifs in the CYP128 family has not been performed, and the large number of CYP128 P450s identified in this study provides an opportunity to analyze the amino acid patterns at the EXXR and CXG-motifs for this family. Analysis of the amino acids at the EXXR motif indicated the presence of E-T(84%)/Q(11%)/H(1%) -W(84%)/L(15%)-R amino acid patterns at this motif, with most of the CYP128 P450s (84%) having the E-T-W-R amino acid pattern (Figure 4). Analysis of amino acid patterns at the CXG motif revealed the presence of F-G-S(88%)/Y(12%) -G-I(88%)/V(6%)/A(5%)/P(1%) -H-L(89%)/M(11%) -C-P(87%)/I(7%)/L(6%) -G amino acid patterns, with the majority of the CYP128 P450s (86%) containing the F-G-S-G-I-H-L-C-P-G amino acid patterns (Figure 4). Amino acids patterns at the EXXR and CXG motifs of the CYP128 family were found to be unique compared to P450 families from different biological kingdoms [26,31,32,34], further supporting the hypothesis that amino acid patterns at these motifs are a signature of the P450 family [34].

**Figure 4.** Analysis of amino acid patterns at the EXXR and CXG motif in the CYP128 P450 family. All CYP128 P450 sequences (total of 2674 sequences) were analyzed for the EXXR and CXG signature motifs.

#### *2.5. Most CYP128 P450s Exist in Secondary Metabolite Biosynthetic Gene Clusters*

P450s are well known to play a key role in the synthesis of different secondary metabolites [35,36] and a recent study undertaking comparative analysis of secondary metabolite biosynthetic gene clusters (BGCs) between 48 *Streptomyces* species and 60 mycobacterial species revealed that *CYP128* P450s are part of a secondary metabolite BGC [6]. The *CYP128* P450 BGCs were found to have one or two other P450s in the cluster and the *CYP128* P450 is always found with CYP124A1 or together with *CYP124A1* and *CYP121A1* P450s [6]. In this study, secondary metabolite BGC analysis revealed that 78% of CYP128 P450s were found to be part of secondary metabolite BGCs; 2090 *CYP128* P450s from 2674 *CYP128* P450s were found to be part of secondary metabolite BGCs. Analysis of cluster types revealed 1994 *CYP128* P450 clusters belonging to the tRNA-dependent cyclodipeptide synthases (CDPS) cluster type, followed by 94 having no cluster types, indicating a novel BGC cluster; one belongs to Type I Polyketide synthase (T1PKS) and the last one belongs to the non-ribosomal peptide synthetase (NRPS) cluster type (Supplementary Dataset 1). In 25 species with two or more *CYP128* P450s, four species (*M. tuberculosis* KI\_19771, *M. tuberculosis* 402267, *M. canettii* CIPT 140070010, and *M. canettii* CIPT 140010059) were found to have all their *CYP128* P450s as part of a cluster type, three species (*M. tuberculosis* BTB10-253, *M. tuberculosis* M1415, and *Mycobacterium* sp. GA-1199) had only one and the rest of the species had no *CYP128* P450s as part of any cluster types (Supplementary Dataset 1).

An interesting contrast pattern was observed when comparing *CYP128* P450 BGCs among different mycobacterial category species (Table 2). Most of the CYP128 P450s were found to be part of BGCs in MTBC species, whereas only a handful of CYP128 P450s were part of BGCs in species belonging to the categories MCAC, NTM, and MAC (Table 2). Furthermore, most BGCs of MTBC species have three P450s, *CYP128*, *CYP121,* and *CYP124,* followed by BGCs with *CYP128* P450 and BGCs with *CYP128* and *CYP124* P450s (Table 2). Interestingly, the remaining four mycobacterial category species BGCs have only *CYP128* P450 (Table 2). None of the BGCs from all five different mycobacterial categories has the combination of P450s, *CYP128,* and *CYP121* (Table 2).


**Table 2.** *CYP128* P450 gene cluster analysis in five mycobacterial category species.

Abbreviations: MTBC, *Mycobacterium tuberculosis* complex; NTM, non-tuberculosis mycobacteria; MCAC, *Mycobacterium chelonae-abscessus* complex; MAC, *M. avium* complex; and SAP, saprophytes; BGCs, biosynthetic gene clusters.

#### *2.6. CYP128A1 Protein Model Has High A*ffi*nity to Menaquinone-9*

Since the expression of *CYP128A1* in heterologous hosts such as *E. coli* was found to be difficult [14,15] and at present there is no scope to generate the CYP128A1 crystal structure, in this study a 3D model of CYP128A1 was built to explore its binding affinity with MK9 and different azole drugs. The CYP128A1 model was built using the structure of Vitamin D3 hydroxylase (Vdh) from *Pseudonocardia autotrophica* as a template (PDB ID: 3VRM, 33.8% identity) [37]. The model was optimized using molecular-dynamics (MD) simulations (Figure 5) as described in the subsection on Materials and Methods. Notably, other models based on templates sharing similar coverage and identity compared to 3VRM were tested. Other structures include 1Z8P (32.4% identity) [38], 3A4G (33.5% identity) [39] and 5GNM (33.6% identity) [40]. Although some of these structures have a bit better resolution than 3VRM, they either correspond to apo structures (5GNM) or structures bound to small ligands (1Z8P and 3A4G), leaving the catalytic site "packed up" and thus inaccessible through docking simulations. This was confirmed by visual inspection together with our inability to generate binding poses in the catalytic site when docking on models based on 5GNM or 3A4G templates (results not shown). Conversely, 3VRM corresponds to the structure of Vdh mutant bound with Vitamin D3 where the catalytic site is open enough to enable the binding of ligands as big as MK9. As a result, docking simulations on this model were successful in finding binding poses for all the ligands tested.

**Figure 5.** A 3D model of CYP128A1 with structured regions labeled. Heme and the iron atom are shown in grey and red, respectively.

Homology modeling was carried out using the Molecular Operating Environment (MOE) software as described in the Materials and Methods section. Importantly, sequence alignment did not reveal any major indel regions compared to our target sequence (see Figure S1). We reported a five-residue loop as the most significant insertion with a fairly large distance to the catalytic site (23.5 Å to the ferric ion of the HEME group) and a seven-residue gap as the most important deletion. During our protocol, 10 intermediate models were generated independently. For every model, the RMSD (root-mean-square deviation) to the mean conformation was calculated and a low standard deviation (STD) to the mean structure was obtained (STD = 2.654 *A* for residues in the five-residue insertion, STD = 0.102 *A* for residues on both sides of the seven-residue deletion, STD = 0.021 *A* for the whole structure), suggesting good convergence and a limited number of possible variations in the target model. Our final CYP128A1 model (Figure 5), selected based on MOE's built-in score, was used as a target in a first round of docking simulations. For each compound, the 10 top docked complexes were then equilibrated through MD (six complexes in the case of MK9, see Materials and Methods section). Next, MK9 and azole drugs were removed and re-docked independently onto each of their corresponding equilibrated receptors. In Table 3, we reported the results of our re-docking procedure with top scores (in kcal/mol) listed in column 2. The order of binding was as follows: MK9 > itraconozole > posaconazole > ketoconazole > econazole > miconzaole > voriconazole > clotrimazole > fluconazole (Table 2).

**Table 3.** Final results obtained after re-docking MK9 and azole drugs on equilibrated CYP128A1 models. The compounds are ranked according to their top docking score, from highest to lowest affinity (column 2). Column 3 shows the score of the first well-orientated pose with respect to the catalytic site. In the case of MK9, proper orientation means that the hydrophobic tail of MK9 is facing the ferric ion of heme, which is consistent with omega hydroxylation of the molecule reported elsewhere [13,16]. For azole drugs, the imidazole ring is facing Fe of heme and the tail should be out of the cavity. In column 3, numbers in parenthesis refer to the rank of the compound based on that score (over all compounds). In column 4, we reported the rank of the first well-oriented pose among all the poses generated for each ligand.


In P450 enzyme systems, substrates and azole drugs are known to coordinate with the ferric core of the heme group. In the case of MK9, coordination takes place at the end of its hydrophobic tail, consistent with omega hydroxylation of the molecule [13,16]. For azole drugs, coordination occurs via the imidazole ring. Importantly, docking of MK9 and azole drugs was performed in a non-covalent way in the present paper. Therefore, no coordination was predicted for the tested ligands, which would require more thorough investigation, especially considering changes in electronic density. However, since non-covalent binding is a critical step in the establishment of covalent interactions, we believe the present results still provide a good indication of how azoles drugs can interact with CYP128A1 and how they rank in terms of affinity.

Column 3 in Table 3 shows the score of the first well-oriented pose, i.e., where the end of the hydrocarbon tail (MK9) or the imidazole ring (azole drugs) faces the ferric core of the heme. Interestingly, considering well-oriented poses does not significantly affect the ranking of compounds. Column 4 shows the rank of the first well-oriented pose over all the poses generated for each ligand. While top-ranked

poses are already well oriented in some cases (posaconazole, miconazole, voriconazole, fluconazole), well-oriented poses of other compounds such as itraconazole and ketoconazole exhibit lower ranking (10 and 9, respectively). Nonetheless, we observed that at least one pose among the top 10 is well oriented for each compound, leading to a score value similar to the top-ranked pose in each case. We depicted the first well-oriented poses of each compound in Figure 6.

**Figure 6.** First well-oriented binding poses obtained for a substrate (menaquinone 9) and the azole compounds. The substrate/azole compounds are shown in orange/yellow, respectively, while the heme is shown in red and the ferric core is depicted in light green.

In summary, *in silico* results indicated that the CYP128A1 indeed binds to the substrate (MK9) with highest affinity compared to the azole drugs analyzed in the study (Table 3). Among azole drugs itraconazole, posaconazole and ketoconazole have the highest binding affinity with CYP128A1. One possible reason for the high binding affinity of these compounds compared to other azoles is that these molecules are more extended depicting the longer side chain of MK9. A point to be noted that same phenomenon of better interactions of azoles due to their more extended structures was also observed for CYP51 of *Sporothrix schenckii* [41]. CYP128A1 binding to different azole drugs is also consistent with other *M. tuberculosis* H37Rv P450s that are experimentally shown to bind azole drugs albeit with different affinities [21].
