*2.9. Comparison with Experiments and Previous Simulations*

There are no experimental data characterizing full length CYPs and their interactions with the membrane in atomic detail. However, various experiments have been performed to study the membrane topology of CYPs and their interactions beyond the N-terminal transmembrane helix. Engineered CYP 2C9 without an N-terminal helix remained membrane-associated through the catalytic domain, as seen by atomic force microscopy (AFM) [34]. The height of the catalytic domain above the membrane was reported as 35 ± 9 Å using atomic force microscopy [35], consistent with our simulations. The binding orientation and height were also reasonably consistent with site-directed antibody-antipeptide studies [36] and the surface hydrophobicity pattern in the crystal structure of mammalian CYP 2C5 [28], the first mammalian CYP to have its 3D structure determined.

The insertion depth of the catalytic domain in the membrane was studied by tryptophan fluorescence scanning of CYP 2C2, which suggested that L36 at the start of the A'-helix, F69 at the end of the β1-1 strand, and L380 in the β2-2 strand were inserted in the lipid bilayer, whereas Y225 in the F'–G' region is in or near the phospholipid head groups [37]. Primary sequence analysis of CYP 2C2, CYP 2C9, and CYP 2C19 showed that the CYP 2C2 residues identified by tryptophan fluorescence scanning are conserved in all three CYPs. Similar interactions were observed in our simulations of CYP 2C9 where residues L36 and L380 showed interactions with the lipid tail region (100% occupancy), while residue F69 interacted with the tail region for 40% of the simulation time (see Figure 5A). In CYP 2C19, residue L36 was buried in the membrane, while residue F69 showed interactions with the membrane tail region for 49% of the simulation time. Due to the difference in the orientation of the globular domain, strands β2-1 and β2-2 did not interact with the membrane in CYP 2C19, and therefore residue L380 remained outside the membrane. In both CYPs, the F'–G' helices formed a strong anchoring point and residue Y225 remained buried in the membrane.

The rearrangement of the linker region in the two CYPs during the simulations to expose polar residues in the flexible region next to the TM-helix (see Figure 5A) is consistent with the observed cytoplasmic accessibility of rat CYP 2B2 in rough microsomes to an antibody raised against residues 24–38 [36]. The linker region consists of a patch of polar residues, including several positively charged residues (22–30), a hydrophobic proline-rich patch (residues 30–40), and a patch of polar residues (40–49). The linker orientation and interactions in the two CYPs, the distribution of amino acid residues in the lipid bilayer and their propensity to reside in the lipid head or tail region matched well with the hydrophobicity scale for amino acids determined by various experiments and MD simulation studies [38,39]. During simulations, the linker remained highly mobile and changed conformation to keep polar residues in the polar patches outside the membrane core, including polar residues and charged residues that were buried in the hydrophobic core of the bilayer in the initial structure.

The orientations of CYP 2C9 in the membrane in the current study using the LIPID14 forcefield matched well with previously our published work on CYP 2C9 [22] (see Table 3 for heme-tilt angles). MD simulations of CYP 2C9 in a 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) bilayer by Berka et al. [40,41] resulted in orientations with a higher heme-tilt angle and greater burial of the globular domain in the bilayer, with more ligand pathways leading from the heme into the bilayer. The authors used a different initial crystal structure of CYP 2C9 (PDB 1OG2), and a different procedure to model and generate starting orientations. The Berger united atom forcefield for lipids was used, which could also contribute to different membrane-protein interactions and orientations. A bilayer of DOPC is slightly thinner than a bilayer of POPC (by 0.3 Å) and has a larger area per lipid (by 4 Å2) [42]. These small differences might lead to slightly more tilting of the TM-helix in DOPC than POPC, but we would expect the dipping of the CYP globular domain into the bilayer to be similar for DOPC and POPC, as they have the same head group. Interestingly, however, Berka et al. observed that the globular domain of CYP 2C9 was immersed in a depression in the bilayer, surrounded by phospholipid head groups [41].


**Table 3.** Comparison of the computed heme-tilt angle from AA MD simulations of CYP 2C9 and CYP 2C19 with previously reported values. The heme-tilt angle is the angle between the heme plane and the membrane normal (see Materials and Methods and Figure 2).

The predicted orientation of CYPs in a DOPC bilayer has been reported in the OPM (orientation of proteins in membranes) database (https://opm.phar.umich.edu/). The heme-tilt angle of the orientation of CYP 2C19 (PDB: 4GQS) reported in the OPM database is 74◦, notably higher than that observed in our MD simulations (see Table 3). A similar orientation with a heme-tilt angle of 72◦ is given in the OPM database for the CYP 2C9 crystal structure with PDB ID 1OG5, whereas the 1R9O structure has an orientation with a heme-tilt angle of 60◦. These discrepancies may have arisen because the OPM uses the crystal structure as is to predict the protein orientation in the membrane, and therefore the orientation may be affected by the missing (PDB 1R9O) or mutated (PDB 1OG5) residues in the F'–G' loop region, or the lack of flexible linker and TM-helix residues.

#### **3. Concluding Discussion**

The two isoforms of the CYP2C subfamily CYP 2C9 and CYP 2C19 exhibit ~92% sequence identity, yet they show distinct substrate specificity. Since mammalian CYPs are anchored in the endoplasmic reticulum membrane by an N-terminal helix and secondary contacts from the catalytic domain, differences in the sequence and 3D structure in the membrane-interacting region in the catalytic domain can lead to different membrane-protein interactions. As it has been hypothesized that lipophilic substrates enter into the binding pockets of CYPs from the membrane core, determining the orientation of CYPs in the membrane can provide insights into differences in the opening of ligand entrance tunnels to the membrane, substrate specificity, and the mechanism of drug selectivity. Here, we have used a multiscale simulation methodology to understand the differences in primary sequence and 3D structure of two CYPs and their impact on their interactions and orientations in the membrane.

The results of multiple CG and AA simulations showed consistency and a clear tendency for the two CYPs to adopt different orientations and positions with respect to the membrane bilayer. The orientations adopted by the globular domains of the two CYPs were classified into classes A, B, or A/B (intermediate orientation). CYP 2C9 mainly adopted a class A orientation, which has lower α, β and heme-tilt angles, and CYP 2C19 adopted a class B orientation. The class B orientation is similar to the orientations observed in simulations with the same bilayer and force field for CYP 3A4 and N-terminal mutants of CYP 17A1 and CYP 19A1, with measured heme-tilt angles of about 60◦ [27]. CYP 2C9 thus appears to be unusual in this set of CYPs (simulated under the same conditions) in adopting an orientation with a lower heme-tilt angle. Berka et al. simulated the six major drug-metabolizing CYPs (which vary much more in sequence than the CYP 2C9/CYP 2C19 pair studied here) in a DOPC bilayer [41]. The computed heme-tilt angle varied over the six proteins between 56 ± 5◦ and 72 ± 6◦, with CYP 2C9 having a heme-tilt angle of 61◦: at the lower end of this range, although higher than found in our simulations. The tendency to a low heme-tilt angle for CYP 2C9 was also observed by Cojocaru et al. [22], with a slightly lower heme-tilt angle when an F–G loop was present instead of the F'–G' helices. It is reasonable to expect that ligand passage may involve unwinding of the F'–G' helices into a loop extending further into the membrane that can open up the entrance to the ligand access route from the membrane. However, we do not expect this change in conformation to result in an increase in the heme-tilt angle.

The difference in the sequence and in conformations in the SRSs near the membrane interface resulted in different orientations and insertion depths in the membrane of the globular domains of the two CYPs. A mutational swap of the key residues differing at the membrane interface in CYP 2C9 (or 2C19) in the CG simulations resulted in similar orientations (about 50% of simulation results) to the wild-type CYP 2C19 (or 2C9) orientation. Therefore, we concluded that altering a few key residues that differ in the linker, the β1-1 strand, and the F'–G' region through mutation or a change in conformation can significantly influence the orientation and interactions of CYP-membrane systems. McDougle et al. showed that a double mutant in the F–G loop of CYP 2J2 lowered membrane insertion in MD simulations and tryptophan fluorescence studies [44]. Notably, despite the importance of the F'–G' region, we found that mutating residues in the F–G loop alone was not sufficient to switch the

orientation of CYP 2C9 to that of CYP 2C19 or vice versa. The mutation of additional residues in the β1-1 and 1-2 (K72E and P73R) and B–C loop (I99H) was necessary.

The orientation of the CYP globular domain in a membrane may be affected by simulation parameters such as the force field used and the procedure used to model and sample the conformational space of the simulated systems. We have shown previously that the force field and procedure employed here can give good agreement with experiments [25,27]. However, other factors relevant in vivo may affect the orientation of the CYP globular domain, such as homodimerization [37], heterodimerization, or the binding of redox partner proteins. Indeed, the orientational preferences of the different CYPs may affect their ability to present the proximal binding face for effective electron transfer from cytochrome P450 reductase, or to engage in CYP oligomers. The orientation of the protein also affects the access of substrates from the membrane to the active site. We have here observed the opening of tunnels to a water probe. Further exploration would require simulation of the passage of substrate molecules by standard or enhanced sampling approaches [26,29,45]. Furthermore, allosteric ligands may affect protein orientation and substrate tunnel opening, e.g., in CYP 3A4, the allosteric ligand binds at the protein-membrane interface [46].

The membrane composition may also have an influence on protein positioning in the membrane. We have here simulated the proteins in a pure POPC bilayer. Phosphatidylcholine is the main lipid component of the mammalian ER membrane [47], and POPC bilayers are often used as a simple ER mimic in in vitro studies. For example, we previously compared the heme-tilt angle computed from simulations of three different CYPs in a POPC bilayer with that measured in experiments done on these CYPs in a Nanodisc containing a POPC bilayer and found very good agreement [27]. However, the ER membrane in fact contains a variety of glycerophospholipids, as well as cholesterol and ceramide. This more heterogeneous membrane composition may affect protein positioning and dynamics as well as ligand entrance to the active site. Indeed, Navratilova et al. [48] found that the orientation of CYP 3A4 changed as the cholesterol content of a DOPC bilayer was changed, with the heme-tilt angle increasing with increasing cholesterol. The addition of cholesterol also altered the substrate access tunnel opening patterns due to interactions of the protein with the cholesterol and the ordering and thickening of the membrane due to cholesterol. Molecular simulation studies with more realistic membrane compositions, such as that employed by Park et al. in their simulation of CYP 19A1 to mimic the rat liver endoplasmic reticulum membrane [49], will be necessary to fully understand the structural and dynamics interplay between the CYP proteins, substrates, and the membrane.

In conclusion, our MD simulations demonstrate that small sequence changes at key positions can result in distinct orientations of CYP proteins in a phospholipid bilayer. These differences affect substrate access tunnels to the active site from the membrane. The differences observed for CYP 2C9 and CYP 2C19 are consistent with their differences in substrate selectivity, providing further evidence that substrate selectivity is governed by the residues lining the substrate access route as well as those in the active site.

#### **4. Materials and Methods**

*Preparation of structures of full-length models of CYP 2C9 and CYP 2C19*—Our simulations of CYP 2C9 (Uniprot id P11712) were based on the crystal structure of CYP 2C9 (PDB 1R9O, in the Protein Data Bank http://www.rcsb.org), resolved at 2.0 Å resolution in complex with flurbiprofen. This structure of the globular domain of CYP 2C9 was chosen because it was the highest resolution structure available that was determined with the wild-type sequence (apart from removal of the N-terminal residues 1–25 and addition of terminal expression tags). The structure had missing residues in the linker region (residues 38–42) and in the F'–G' region (residues 214–220). The crystal structure of CYP 2C19 (Uniprot id P33261) (PDB 4GQS: chain A) in complex with the inhibitor (2-methyl-1-benzofuran-3-yl) (4-hydroxy-3,5-dimethylphenyl) methanone (Protein Data Bank chemical component 0XV) was used. It was resolved at 2.87 Å, after truncating residues 1–28 from the N-terminus and adding expression tags [9]. The missing residues in the linker and globular domain of CYP 2C9 were similar to those in

CYP 2C19 as shown in Figure 1, where the sequence alignment and secondary structure were generated using the online tool ESPript3.0 (http://espript.ibcp.fr/ESPript/ESPript) [50]. Therefore, the crystal structure of CYP 2C19 was used as a template for modeling the missing linker residues and F'–G' residues in CYP 2C9. The TM-helix (residues 3–21) and missing linker (residues 22–25) of CYP 2C9 were modeled similarly to Cojocaru et al. [22], who modeled and simulated CYP 2C9 in a POPC membrane starting from the crystal structure (PDB 1R9O). The TM-helix of CYP 2C19 was predicted to span from residues 4–20 or 3–22 by the online server Psipred (http://bioinf.cs.ucl.ac.uk/psipred/), which uses the membrane protein structure and topology (MEMSAT3) software and transmembrane protein topology prediction using support vector machines (SVM-MEMSAT) software [51]. The PredictProtein server (https://www.predictprotein.org/) [52] suggested an N-terminal alpha-helical conformation spanning residues 2–23, and this assignment was used for the simulations of CYP 2C19, along with additional simulations with assignment of the TM-helix to residues 3–21 for consistency with the simulations of CYP 2C9. The final models of each protein consisted of the crystal structure of the globular domain with modeled missing regions (we have referred to these below as systems with the letter S). For each of the proteins, 10 different starting orientations of the globular domains above the membrane were generated by changing the dihedral angles in the linker regions to generate a diverse set of initial structures with the CYP globular domain positioned to ensure that it was outside the membrane bilayer when the protein is immersed in a bilayer (see below). These structures were used for the construction of CG models.

*Preparation of additional models of CYP 2C9*—Additional CG systems of CYP 2C9 were prepared by using four modeled structures. Since the crystal structure (PDB 1R9O) of CYP 2C9 lacks the F'–G' helices (or F–G loop), different modeling approaches with different template structures were used to assemble four structures of full-length CYP 2C9 (see Appendix A). The CG systems (M1–M4) prepared with these modeled structures of CYP 2C9 have been designated by the letter M for "models", which differ from the CG systems indicated by S, for which crystal structures were used (with modeled missing regions only) (Table S1).

*Modeling of chimeric mutants of CYP 2C9 and CYP 2C19*—The residues at the protein-membrane interface differing between CYP 2C9 and CYP 2C19 were substituted to create chimeric CYP 2C9/2C19 structures. The residues of CYP 2C9 substituted by CYP 2C19 residues were in the linker (G46D), β-strand 1–2 (K72E and P73R), B–C loop (I99H), and F'–G' helices (S220P and P221T). The corresponding substitutions were also made in CYP 2C19. Five different orientations of the wild-type all-atom models (S1) were selected to make the substitution mutations, while keeping the initial orientations of the globular domain of the mutant and wild-type structures the same. These modeled mutants are referred to as mt2C9 and mt2C19.

*Preparation of coarse-grained systems*—The MARTINI CG forcefield was used for CG simulations. A similar procedure was used to generate CG models of CYP 2C9 and CYP 2C19 in a POPC bilayer in water, as described in our previous work [25]. All-atom protein models were converted to MARTINI CG models using the martinize.py script (http://cgmartini.nl) and the TM-helix was immersed in a pre-equilibrated modeled CG POPC lipid bilayer consisting of 594 POPC molecules. The MARTINI version 2.2 forcefield with the standard non-polarizable water model (NPW) was used [53]. The elastic network model was used to apply additional harmonic restraints with an elastic force constant of 500 kJ·mol−1·nm−<sup>2</sup> and a distance cut-off of 5 to 9 Å to preserve the secondary and tertiary structure of the protein during simulation. The secondary structure information was provided in a DSSP file obtained from the DSSP server (www.cmbi.ru.nl/dssp.html).

The effects of differing linker flexibility on the final orientations of CYP 2C9 and CYP 2C19 were checked by defining two different flexible linker regions: residues 22–36 and residues 26–38. The linker was kept flexible by removing the restraints on specified residues in the elastic network. CG systems consisting of the globular domain only (S3), residues 47–490, were prepared for CYP 2C9 and CYP 2C19 to allow an unbiased conformational search of the protein orientation and to evaluate convergence of the orientations in the membrane. The CG systems were solvated using the MARTINI standard water model (NPW) (S1–S3). Tests were also performed for the MARTINI polarizable water (PW) model (S4–S5) (see SI, Table 1).

*Coarse-grained simulations*—After preparation of the CG models, several different CG simulations were performed with the Gromacs software [54]. The MARTINI standard water model (NPW) was used and the non-bonded interactions were treated with a reaction field (RF) for Coulomb interactions, and the cut-off distance for these and for van der Waals' interactions was set to 1.1 nm. We also tested the polarizable water (PW) model with electrostatic and van der Waals interactions calculated by the Shift method (Gromacs 4.5.5), PME and a cut-off, or RF and a cut-off (using Gromacs 5.0.4), as in Mustafa et al. [25]. Similar positioning of the globular domain with respect to the membrane was obtained as that in simulations with the NPW model, but simulation times to achieve convergence were much longer with the PW model, and convergence was not always achieved.

Each simulation started with a short steepest-descent energy minimization until the maximum force on a CG particle was less than 10 kJ·mol−1·nm<sup>−</sup>1. A 40 ns equilibration simulation at a constant temperature of 310 K and pressure of 1 atm was performed in the NPT ensemble, using velocity rescale (v-rescale) and the Berendsen procedure for pressure coupling before switching to the Parrinello–Rahman barostat method for production simulations of 12–20 μs. A coupling constant of 12 ps was used to maintain semi-isotropic pressure coupling with a compressibility of 3.0 <sup>×</sup> 10−5. A time step of 20 fs was applied.

*Convergence of coarse-grained simulations*—The CG simulations were considered converged when no further significant changes in the orientations of the CYP globular domains above the membrane were observed. The orientation and position of the CYP globular domain was specified by the angles and distances defined previously [22,25,55] (see Figure 2). The angles were computed by defining the following vectors: v1, from the center of mass (CoM) of the backbone particles/atoms of the first four residues to the CoM of the last four residues of the I-helix; v2, from the CoM of the first four residues of the C-helix to the CoM of the last four residues of the F-helix; v3, the vector between the CoMs of the first and last four residues of the TM-helix; and the z-axis perpendicular to the membrane. The angle α was then defined as the angle between v1 and the z-axis and angle β was defined as the angle between v2 and the z-axis. Angles α and β define the orientation of the globular domain above the lipid membrane. Similarly, the TM-helix tilt angle (γ) in the lipid membrane was defined as the angle between v3 and the z-axis. The axial distances of the CoM of the globular domain (residues 50–490), the linker region (residues 22–49), and the F'–G' helices (residues 210–220) to the CoM of the lipid bilayer were monitored during the trajectories.

*Back conversion from CG to AA models*—For each system, representative frames from the converged parts of each set of CG production runs were selected for back-conversion to an all-atom model. The representative frame was chosen to have angle and distance values within 1% of their mean value over the converged parts of the production runs [55]. The back-conversion of the POPC bilayer was performed as described in Cojocaru et al. [22], whereas the protein back-conversion was done using scripts backward.py and initram.sh, available at the MARTINI website (http://cgmartini.nl) [56]. In the absence of the heme cofactor in the CG model, conformational changes in the side chains of the heme-binding pocket residues were observed. Therefore, the globular domain (residues 50–490) from the crystal structure was superimposed on the back-mapped structure and used in subsequent AA simulations. The AA model of the globular domain contained the heme-cofactor. If there was a co-crystallized ligand in the crystal structure, it was also reincorporated in the model. The TM-helix and flexible linker region obtained from the back-conversion procedure were then connected to the globular domain, resulting in a full-length all-atom model. Finally, the all-atom model of the CYP was placed into the all-atom model of the POPC bilayer to obtain a complete CYP-membrane complex.

*All-atom molecular dynamics simulations of CYP 2C9 and CYP 2C19*—AA MD simulations were performed with two different starting orientations of the CYPs in the membrane for each CYP. Different orientations were obtained for each CYP from two different CG simulation systems, S1 and S2. AA forcefields AMBER ff14SB [57] and LIPID14 [58] were used for the protein residues and for the

POPC lipids, respectively. The heme parameters were provided by D. Harris with the partial atomic charges derived from DFT calculations [59]. The ionic concentration was maintained at 150 mM using Na<sup>+</sup> and Cl<sup>−</sup> ions in a periodic box of TIP3P [60] water molecules. The same procedure for AA MD simulation was used as described by Cojocaru et al. [22]. The simulation protocol began with energy minimization with a decreasing harmonic force constant of 1000 to 0 kcal/mol.Å<sup>2</sup> on non-hydrogen atoms of the protein and lipid residues, as described in Reference [22]. The system was then equilibrated using NAMD 2.10 [61] in a constant surface area, pressure, and temperature (NPAT) ensemble, for 1.5 ns with a gradual decrease in the harmonic restraints from 100 to 0 kcal/mol.Å2 on non-hydrogen atoms of the protein and lipid residues. The equilibration simulations in the NPAT ensemble were extended to 10 ns without harmonic restraints with a 1 fs integration time, keeping water molecules rigid. During subsequent production simulations, all bonds were kept rigid and the time step was increased to 2 fs. Anisotropic pressure coupling was applied, in which the cell fluctuates independently in the x, y, and z cell dimensions.

Control calculations were also performed with the GAFF lipid force field which was used in previous work [22]. The GAFF lipid forcefield requires surface tension to maintain the structural properties of the membrane bilayer, whereas the LIPID14 parameters are optimized for use without application of surface tension. We also assessed semi-isotropic pressure coupling for the simulations with the LIPID14 force field. The results show that the alternative simulation parameters gave the same class of orientation of the globular domain in the bilayer as obtained with LIPID14 and anisotropic pressure coupling. We have previously found that the combination of AMBERff14SB and LIPID14 gives better agreement with experiment than the GAFF force field and that it results in heme-tilt angles for CYPs in bilayers in excellent agreement with linear dichroism data for CYPs in Nanodiscs [27].

The orientation and position of the CYP globular domain in the AA MD simulations was characterized by computing the same angles and distances as for the CG MD simulations. In addition, the heme-tilt angle, the angle between the heme plane defined by the four nitrogen atoms coordinating the iron and the z-axis, was computed (see Figure 2). VMD (www.ks.uiuc.edu/Research/vmd/) [62] was used for the analysis and to generate the molecular graphics figures. Tunnels accessible to a water molecule probe were computed using the MOLEonline webserver (mole.upol.cz/) [63] with default parameters.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/20/18/ 4328/s1. Supplementary information is provided as a separate document containing Tables S1–S3 and Figures S1–S7, as well as coordinates from AA MD simulations.

**Author Contributions:** Conceptualization, G.M. and R.C.W.; methodology, G.M., P.P.N., N.J.B., R.C.W.; validation, G.M.; formal analysis, G.M., and P.P.N; investigation, G.M.; resources, R.C.W.; data curation, G.M.; writing—original draft preparation, G.M.; writing—review and editing, G.M. and R.C.W.; visualization, G.M. and P.P.N.; supervision, R.C.W.; project administration, R.C.W.; funding acquisition, G.M., P.P.N, R.C.W.

**Funding:** This research was funded by Klaus Tschira Foundation and the German Academic Exchange Service (DAAD) (scholarships to G.M., P.P.N.). The authors acknowledge support for computing resources from the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant INST 35/1134-1 FUGG and for use of the Hazel Hen Cray XC40 at the high-performance computing center Stuttgart, Germany (HLRS; Project Dynathor). The APC was funded by the Deutsche Forschungsgemeinschaft within the funding programme Open Access Publishing, by the Baden-Württemberg Ministry of Science, Research and the Arts and by Ruprecht-Karls-Universität Heidelberg.

**Acknowledgments:** G.M. gratefully acknowledges the support of the PhD program of the Institute of Pharmacy and Molecular Biotechnology, Heidelberg University. We thank Stefan Richter for assistance in optimizing software performance.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
