#### **1. Introduction**

Human cytochrome P450 (CYP) enzymes play important roles in the metabolism of drugs, steroids, fatty acids, and xenobiotics. CYPs also catalyze the conversion of some prodrugs into active drugs. About a dozen human CYPs account for the metabolism of 70–80% of all drugs, and these mainly belong to families CYP1, CYP2, and CYP3, and their subfamilies [1]. The human CYP2C subfamily contributes significantly to the hepatic clearance of many drugs. Although the members of the subfamily exhibit about 70% sequence similarity, they have unique substrate specificity profiles [2]. The human CYP2C subfamily consists of four isoforms: *CYP2C8*, *CYP2C18*, *CYP2C9*, and *CYP2C19* [3]. CYP 2C9 is the most highly expressed CYP protein after CYP3A4 and is responsible for the metabolism of 12.8% of drugs; its substrates are mostly weak acids, such as non-steroidal anti-inflammatory drugs (NSAIDs) [1]. CYP 2C19 has a 10-fold lower expression level than CYP 2C9 but contributes to the metabolism of 6.8% of drugs [1], although it lacks the specificity of CYP 2C9 for acidic drugs. Nevertheless, the polymorphism of *CYP2C19* can dramatically affect drug treatments. For example, in the treatment of *Helicobacter pylori* infections with proton-pump inhibitors that are substrates of CYP 2C19, such as omeprazole, the therapeutic efficiency is improved in patients with a poorly metabolizing *CYP2C19* genotype due to slower drug clearance [4]. Furthermore, CYP 2C19 is important for the enzymatic activation of the antiplatelet agent clopidogrel to its active thiol metabolite [5,6], and loss of function in the common *CYP2C19\*2* allele, which carries a splicing variant leading to truncation of the protein, results in a poor response to clopidogrel [7]. Polymorphisms of *CYP2C9*, on the other hand, result in reduced affinity for cytochrome P450 reductase (*CYP2C9\*2*) and altered substrate specificity (*CYP2C9\*3*) [8].

CYP 2C9 and CYP 2C19 have distinct substrate specificities, despite having high sequence conservation with 91.2% sequence identity (see sequence alignment in Figure 1). Crystal structures of the globular domains of the proteins have been resolved by X-ray crystallography after truncation to remove the N-terminal transmembrane (TM) domain and flexible linker sequences, and modification to introduce terminal expression tags (see Figure 1). Only one crystal structure of CYP 2C19 has been resolved (Protein Data Bank (PDB) identifier 4GQS) [9], whereas a number of crystal structures of CYP 2C9 in various liganded and mutated states have been determined (currently 11 PDB entries) [10–17]. The crystal structures show that CYP 2C19 differs from CYP 2C9 at two residues in the active site: L208 and L362 in CYP 2C9 are substituted by V208 and I362 in CYP 2C19 [9,11]. Other differences are seen on the outer surface of the globular domain. The three-dimensional fold of CYP 2C19 is closer to the structure of CYP 2C8 (PDB 2NNI) [12], which shares 78% sequence identity with CYP 2C19, than to the structures of CYP 2C9 (PDB 1R9O) [11] or CYP 2C9m7 (PDB 1OG2, 1OG5) [10], despite their higher sequence identity (91.2%). The latter structures were resolved after making seven substitutions (K206E, I215V, C216Y, S220P, P221A, I222L, and I223L) in the F'–G' loop region of CYP 2C9 for the purpose of crystallization, as this part of the protein is hydrophobic and interacts with the membrane [10]. The structure of CYP 2C9m7 differs from that of CYP 2C9 in the B–C loop, which is highly flexible in the 1R9O structure, and in the conformations of the F' and G' helices, which are missing in the 1R9O structure.
The F'–G' region shows high structural variation amongst the crystal structures of CYP 2C9; in structures in which the protein has the wild-type sequence, the F'–G' region is either missing (e.g., in PDB 1R9O) [11], has an extended loop conformation and a small G' helix (PDB 5W0C) [17], or has an F' helix followed by a loop in the G' region interacting with a peripherally bound ligand [15]. CYP 2C9m7 also differs in the position of the sidechain of R108, which points out of the binding cavity in the CYP 2C9m7 (1OG2) structure and inside in the CYP 2C9 (1R9O) structure. The structure of CYP 2C19 shows a Cα atom deviation of more than 3.0 Å from both the CYP 2C9 (1R9O) and CYP 2C9m7 (1OG2) structures in the entrance region on the outer surface of the protein, which is responsible for substrate access and selectivity. The main differences are observed in helices F, F', G', and G and their turns, the turn in β-hairpin 1, and the B–C loop region.
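
As an illustration of how a figure such as the 91.2% sequence identity can be obtained from a pairwise alignment, the following minimal Python sketch counts identical residues over the ungapped alignment columns and lists the positions that differ. The function names, the choice of denominator (ungapped columns only), and the short sequence fragments are assumptions made for illustration; they are not the alignment or the procedure used to produce Figure 1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap.

    Assumption: identity is normalized by the number of ungapped columns;
    other conventions (e.g., the shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def differing_positions(aligned_a, aligned_b, first_residue=1, gap="-"):
    """List (residue number in sequence a, residue in a, residue in b) for mismatches.

    Residue numbering follows sequence a and starts at first_residue;
    gap characters in sequence a do not advance the numbering.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    position = first_residue - 1
    diffs = []
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a != gap:
            position += 1
        if a == gap or b == gap:
            continue
        if a != b:
            diffs.append((position, a, b))
    return diffs


# Hypothetical aligned fragments (NOT the real CYP 2C9 / CYP 2C19 sequences).
frag_a = "MKLSDLVE"
frag_b = "MKVSDIVE"
print(percent_identity(frag_a, frag_b))     # 6 of 8 columns identical -> 75.0
print(differing_positions(frag_a, frag_b))  # [(3, 'L', 'V'), (6, 'L', 'I')]
```

Applied to a full-length alignment, the list returned by `differing_positions` could be compared with the SRS boundaries to see which substitutions fall inside or outside the substrate recognition sites.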

The sequence differences outside the CYP active site binding cavity may be responsible for the differential selection of drugs entering the binding pocket [18–21]. Indeed, differences in the use of the access tunnels have been suggested by mutagenesis studies on CYP 2C9/2C19 chimeras [18–21] and simulations of the globular domain of CYP 2C9 [22]. For example, CYP 2C19 selectively hydroxylates omeprazole and S-mephenytoin, whereas CYP 2C9 has little activity against these substrates. However, substitution of residues outside the binding site (I99H, S220, and P221T) at the entrance to tunnel 2b (using the nomenclature of Cojocaru et al. [23]) increased the omeprazole 5-hydroxylase activity of CYP 2C9 to a level similar to that of CYP 2C19 [18]. On the other hand, the E72K substitution in CYP 2C19 was shown to decrease its metabolic activity against three tricyclic antidepressant (TCA) substrates of CYP 2C19, namely amitriptyline, imipramine, and dothiepin, whereas the K72E mutation in CYP 2C9 increased its metabolic activity against these compounds [21]. Most of the residues that differ between CYP 2C9 and CYP 2C19 are found in the substrate recognition sites (SRS) identified by Zawaira et al. [24] (see Figure 1). We therefore hypothesized that the sequence differences in the SRS regions and, thereby, the conformational differences observed between the two CYPs, can contribute to different protein–membrane interactions which, in turn, can lead to differences in the substrate access tunnels to the binding cavity and the product release tunnels.

**Figure 1.** (Left) Sequence alignment of CYP 2C9 and CYP 2C19. Identical residues are shown with a red background, similar residues with a yellow background, and differing residues with a white background. The secondary structure in the crystal structure of CYP 2C19 (PDB 4GQS) is indicated by arrows for β-strands, springs for α-helices, and 'TT' for turns; long loops are unmarked. The substrate recognition sites (SRS) are shown by blue dashed line boxes. The residues in the globular domain differing at the membrane interface are highlighted by red numbers. The missing regions in the crystal structure (PDB 1R9O) of the globular domain of CYP 2C9 are shown by transparent green boxes. (Right) Cartoon representations of the crystal structures of CYP 2C9m7 (PDB 1OG5), CYP 2C9 (PDB 1R9O), and CYP 2C19 (PDB 4GQS), showing the structural differences in the F'–G' region highlighted by the green rings, the heme in stick representation, and key secondary structure elements colored as follows: β-strand regions in magenta, the B–C loop in yellow, the F and G helices in red, the F'–G' helices/loop in green, the I-helix in blue, and the linker in orange. The active site is lined by the heme and the I-helix.

To investigate the effect of sequence differences between CYP 2C9 and CYP 2C19 on the protein–membrane interactions and the orientation of the protein globular domain in the membrane, we applied our optimized multiscale modeling and simulation protocol [25] to model and simulate the two proteins in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) bilayer. We have previously applied a similar procedure to simulate CYP 2C9 in a POPC bilayer [22]. For each system, this protocol starts with building a model of the full protein in a POPC bilayer based on the crystal structure of the globular domain (see Figure 2), and then proceeds with optimizing the system to reach a converged arrangement by coarse-grained (CG) and all-atom (AA) molecular dynamics (MD) simulations (see Materials and Methods (Section 4) for details). We compared the behavior of the two proteins in the simulations and compared our results with previously reported experimental and computational data.

**Figure 2.** Initial model of human cytochrome P450 (CYP) 2C9 showing its three domains and the initial information on which modeling and simulation of its arrangement in the phospholipid bilayer was based. The crystal structure (PDB 1R9O) of the globular domain (residues 50–490) and part of the linker region (residues 37–49) are shown in cartoon representation. Secondary structure predictions indicate the length of the N-terminal transmembrane (TM)-helix (cyan, residues 3–21). These two components are connected by a modeled linker loop of unknown conformation (orange, residues 22–36). The flexible C-terminal tail (residues 491–492) was not included in the model. The F'–G' helices (residues 210–220) were not observed in the structure and were modeled from the crystal structure of CYP 2C19 (PDB 4GQS). Important secondary structure elements in the globular domain are colored as follows: β-strand region in magenta, F and G helices in red, I-helix in blue, B–C loop in yellow, and F'–G' helices in green. The heme is shown in brown stick representation. Experimentally, it is known that the globular domain interacts with the bilayer (shown in grey line representation with grey spheres representing the phosphorus atoms) and, during the coarse-grained (CG) simulations, it approached and dipped into the bilayer. The heme tilt angle and the angles α and β defining its orientation in the bilayer are shown on the right, along with the definition of the TM-helix tilt angle (γ), and the vectors (v1 along the I-helix, v2 shown by the red arrow from the C to the F helix, and v3 along the TM-helix) computed to define these angles; the definitions of these angles are given in the Materials and Methods section.
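
The orientation angles in Figure 2 can be thought of as angles between a vector fixed in the protein and the bilayer normal. The Python sketch below computes such an angle for an arbitrary vector. It assumes that the bilayer normal lies along the z-axis and that α, β, and γ are each measured between the corresponding vector (v1, v2, or v3) and this normal. These assumptions, the function names, and the example coordinates are illustrative only; the definitions actually used are those given in the Materials and Methods section, and the heme tilt angle is not covered here.

```python
import math


def vector_between(start, end):
    """Vector pointing from point `start` to point `end` (3D coordinates)."""
    return tuple(e - s for s, e in zip(start, end))


def angle_to_normal(vector, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between `vector` and the bilayer normal.

    Assumption: the bilayer lies in the xy-plane, so its normal is the z-axis.
    """
    dot = sum(v * n for v, n in zip(vector, normal))
    norm_v = math.sqrt(sum(v * v for v in vector))
    norm_n = math.sqrt(sum(n * n for n in normal))
    if norm_v == 0.0 or norm_n == 0.0:
        raise ValueError("zero-length vector has no defined angle")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, dot / (norm_v * norm_n)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical coordinates (in Å) for the two ends of a helix axis; in practice
# these could be the centers of mass of the first and last helical turns.
helix_start = (0.0, 0.0, 0.0)
helix_end = (0.0, 10.0, 10.0)
v = vector_between(helix_start, helix_end)
print(round(angle_to_normal(v), 1))  # vector at 45 degrees to the z-axis -> 45.0
```

Evaluating such a function for every frame of a trajectory would give a time series for each angle, which is one simple way to check whether the orientation of the globular domain has converged.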

#### **2. Results and Discussion**
