*Article* **A Rational Design of** α**-Helix-Shaped Peptides Employing the Hydrogen-Bond Surrogate Approach: A Modulation Strategy for Ras-RasGRF1 Interaction in Neuropsychiatric Disorders**

**Maria Rita Gulotta 1, \* , Riccardo Brambilla 2 , Ugo Perricone 1,† and Andrea Brancale 3,†**


**Abstract:** In the last two decades, abnormal Ras (rat sarcoma protein)–ERK (extracellular signalregulated kinase) signalling in the brain has been involved in a variety of neuropsychiatric disorders, including drug addiction, certain forms of intellectual disability, and autism spectrum disorder. Modulation of membrane-receptor-mediated Ras activation has been proposed as a potential target mechanism to attenuate ERK signalling in the brain. Previously, we showed that a cell penetrating peptide, RB3, was able to inhibit downstream signalling by preventing RasGRF1 (Ras guanine nucleotide-releasing factor 1), a neuronal specific GDP/GTP exchange factor, to bind Ras proteins, both in brain slices and in vivo, with an IC<sup>50</sup> value in the micromolar range. The aim of this work was to mutate and improve this peptide through computer-aided techniques to increase its inhibitory activity against RasGRF1. The designed peptides were built based on the RB3 peptide structure corresponding to the α-helix of RasGRF1 responsible for Ras binding. For this purpose, the hydrogen-bond surrogate (HBS) approach was exploited to maintain the helical conformation of the designed peptides. Finally, residue scanning, MD simulations, and MM-GBSA calculations were used to identify 18 most promising α-helix-shaped peptides that will be assayed to check their potential activity against Ras-RasGRF1 and prevent downstream molecular events implicated in brain disorders.

**Keywords:** Ras; RasGRF1; hydrogen-bond surrogate; computational residue scanning; molecular dynamics; MM-GBSA; protein–protein interaction; ERK signalling; cocaine addiction; intellectual disability (ID); autism spectrum disorder (ASD)

#### **1. Introduction**

Maladaptive signalling mechanisms in the brain have been demonstrated in several psychiatric and neurological conditions. Amongst the most severe and poorly treated brain disorders, cocaine and psychostimulant addiction is a poorly managed chronic disease characterised by high relapse rates and compulsive drug use [1,2]. Most notably, addictive drugs exploit cellular mechanisms and signalling pathways involved in normal learning and memory processes [3–6]. Modulation of such learned associations between drug-paired cues and the rewarding effects of these drugs significantly contribute to persistently elicited drug-seeking behaviours and high rates of relapse [7–22]. The Ras (rat sarcoma protein)–ERK (extracellular signal-regulated kinase) pathway is crucially involved in both the acute and long-term effects of cocaine in experimental animal models, and previous work has shown that brain-penetrating Ras–ERK inhibitors may ameliorate the

**Citation:** Gulotta, M.R.; Brambilla, R.; Perricone, U.; Brancale, A. A Rational Design of α-Helix-Shaped Peptides Employing the Hydrogen-Bond Surrogate Approach: A Modulation Strategy for Ras-RasGRF1 Interaction in Neuropsychiatric Disorders. *Pharmaceuticals* **2021**, *14*, 1099. https://doi.org/10.3390/ph14111099

Academic Editor: Osvaldo Andrade Santos-Filho

Received: 30 September 2021 Accepted: 26 October 2021 Published: 28 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

associated symptoms. Particularly relevant was the observation that Ras-RasGRF1 (Ras guanine nucleotide-releasing factor 1) interaction is responsible for the activation of the ERK cascade downstream of neurotransmitter receptor systems. Indeed, we previously showed that a cell-penetrating peptide, RB3, able to attenuate Ras-RasGRF1 binding could reduce downstream ERK signalling in response to cocaine in a mouse model. Similarly, RB3 was shown to ameliorate cellular effects and behavioural symptoms in two distinct mouse models of intellectual disability (ID) and autism spectrum disorder (ASD) [23,24].

ERK signalling in most tissues responds to extracellular signals and regulates cell proliferation, differentiation, and survival [25–27]. In this context, Ras proteins act as binary switches in signalling pathways by cycling between inactive GDP- and active GTPbound states [28]. Kinetic studies highlighted that the activation of Ras protein, proceeding from the conversion of Ras-GDP to Ras-GTP, initiates through the recruitment of the guanine nucleotide exchange factors (GEFs), such as RasGRF1 and Sos (Son of sevenless protein) [29–34], that catalyse GDP release and allow its replacement by GTP [35–40]. Then, the GTP molecule binds to this complex, promoting the release of the GEF protein [41]. The RasGRF1 and Sos region responsible for Ras-specific nucleotide exchange activity exhibits a Ras exchanger motif (Rem) domain of about 450 amino acids and a Cdc25 homology domain [42–46]. In addition, Sos requires allosteric activation through a second Ras-binding site that bridges the Rem and Cdc25 domains [47,48]. When Sos is activated, a helical hairpin belonging to the Cdc25 domain inserts between two flexible regions of Ras, switch I (amino acids 25–40) and switch II (amino acids 57–75) [49–58], causing Ras conversion to the transient state by opening the nucleotide-binding site of Ras for GDP release [43] (Figure 1). After this event, Ras can promptly accommodate and bind GTP into the nucleotide-binding site, thus exhibiting its active state. Therefore, a potential strategy to inhibit Ras-GEF interaction should target the open—or transient—state of Ras protein by designing modulators able to bind the nucleotide-exchange region.

**Figure 1.** Inactive and transient states of the Ras protein. On the left, a Ras protein (orange chain; PDB ID: 1XD2 [48]) bound to GDP (grey ligand in stick format) exhibits its inactive state, where the Switch I region (highlighted by a blue circle) is closed; on the right, after binding to a guanine nucleotide-exchange factor (Sos protein, green chain), Ras (orange chain; PDB ID: 1XD2) exhibits a transient state, where the Switch I region is open to accommodate the GEF α-helix.

In contrast to Sos, which requires Ras binding to the allosteric site for activity, the Cdc25 domain of RasGRF1 is active on its own [33,41,47,48]. The structure of the Cdc25 domain of RasGRF1 is very similar to that of Sos, registering 30% of sequence identity between the two Cdc25 domains. The orientation and conformation of the RasGRF1 helical hairpin resemble that of Sos in its active form, with an RMSD value of 2.3 Å for the helical hairpins after superposition on the Cdc25 domain core. Moreover, distance difference matrices demonstrated that the main differences between RasGRF1 and Sos in its inactive form have been identified in the helical hairpin, even in this case confirming that the RasGRF1 Cdc25 domain is more similar to active Sos [59]. Therefore, the analysis of the Ras-Sos complex might provide crucial insights, even for RasGRF1 interaction.

#### *1.1. Mutational Studies on Sos and Design of a Peptide-Based Ras Inhibitor*

To date, no complex structure of Ras-RasGRF1 is available in the Protein Data Bank [60]; thus, provided that Sos and RasGRF1 proteins share the Ras-specific nucleotide exchange domain [59], Ras-Sos X-ray crystallographic complexes were exploited for a comparative analysis.

Over the last decades, several mutational studies were conducted on Ras and Sos proteins to determine the key amino acids. In 1998, Boriack-Sjodin et al. [43] demonstrated that the contacts between Ras and Sos are mainly mediated by the Switch I and Switch II regions of Ras [43,49–58] and are essentially hydrophobic, polar, and charge–charge contacts. Hall et al. [61] performed site-directed mutagenesis to deeply investigate these contacts. The results shed light on the hydrophobic pocket of Sos protein, consisting of residues Ile825, Leu872, and Phe929, which embed the side chain of Tyr64 of Ras through hydrophobic contacts. In addition, the contribution of Tyr64 of Ras was explored by applying a mutation to alanine (Y64A). The result was a 50-fold reduction in the apparent binding affinities of Ras to Sos, but did not provide significant nucleotide dissociation. Then, the authors performed another binding assay by using wild-type (WT) Ras and mutated Phe929 of Sos to alanine (F929A). The Sos mutant reported a decrease of more than 50-fold in binding affinity for Ras. This data indicated that Tyr64 and Phe929 mediated crucial contacts for the formation of a stable Ras-Sos complex.

On the other hand, polar and charged interactions showed to be not essential for the binding affinity of Ras to Sos. Indeed, alanine mutations on Sos residues Arg826, Thr935, and Glu1002 weakly impacted on Ras binding and activation. In addition, the mutation of Ala59 of Ras to glycine (A59G) did not significantly affect the GDP-dissociation rate, displaying more than 50% of the inhibitory effect on Sos-catalysed GTP dissociation. Finally, the contribution of two Ras amino acids involved in the Switch I region were investigated: Tyr32 of Ras that established hydrophobic contacts with Lys939 of Sos, and Tyr40 of Ras that mediated stacking interaction with His911 of Sos. Tyr32 and Tyr40 of Ras were mutated to Ser (Y32S) and Ala (Y40A), respectively. Both mutations decreased the binding of Sos to Ras and accelerated the rate of intrinsic GDP/GTP exchange, suggesting that these residues are important for Ras-Sos recognition and the nucleotide stabilization. Consistent with these results, mutations of Sos Lys939 and His911 to alanine (K939A and H911A, respectively) also caused a reduction in Ras-Sos binding. Furthermore, the Y40A mutation had no significant effect on Sos-catalysed guanine nucleotide exchange, whereas the disruption of the contact between Tyr32 of Ras and Lys939 of Sos reduced the sensitivity of Ras to the exchange activity of Sos [43].

Several efforts have been reported in the literature to design and identify Ras inhibitors to block the nucleotide exchange. However, to date, the current scientific insights on ERK signalling in drug addiction has not been transferred into clinical treatments due to the lack of drugs with relatively low IC<sup>50</sup> values, toxicity, and ability to efficiently cross the blood–brain barrier (BBB) [62].

In 2016, Papale and colleagues [63] designed and generated an active cell-penetrating peptide, named RB3 [64], based on the interaction between Ras and a GEF protein; i.e., RasGRF1, able to attenuate cocaine-mediated activation of the Ras–ERK signalling cascade in vivo. Subsequently, the same peptide was successfully used, in combination with the KIM sequence containing RB1 peptide, to treat two genetic animal models of intellectual disability and autism spectrum disorder, both characterised by an abnormally high ERK signalling activity [23,24].

The cell-penetrating peptides have been shown to be promising for the treatment of neuropsychiatric disorders, especially due to their low reported toxicity and tolerability [65,66]. Although their biological activity spans a micromolar range, they usually show a potential advantage, as they are able to partially disrupt protein–protein interactions without preventing the enzymatic activity.

The RB3 peptide was designed by using molecular graphics tools, on the basis of the ternary complex consisting of Ras in its transient state bound to the Sos Cdc25 domain and Ras in its inactive state complexed with a GDP molecule (PDB ID: 1XD2) [48]. The Cdc25 domain of Sos involved in this ternary complex was compared to the crystal structure of the RasGRF1 Cdc25 domain (PDB ID: 2IJE [59]). The peptide sequence (amino acids 1173–1203 of the Cdc25 domain) includes an α-helix—from Met1181 to Glu1191—crucial for the GDP exchange activity on Ras proteins, linked to two loops—from Pro1173 to Gly1180 and from Gly1192 to Asn1203 (Figure 2).

**Figure 2.** RB3 peptide structure including a loop from Pro1173 to Gly1180, an α-helix from Met1181 to Glu1191, and another loop from Gly1192 to Asn1203.

Moreover, Papale and colleagues [67] added to the RB3 peptide sequence a portion retrieved from the HIV TAT protein known to exhibit a translocating behaviour [68]. In this way, the final structure of the cell-penetrating peptide was created able to cross the cell membranes and the BBB [67]. Thus, the peptide sequence is below reported:

#### GRKKRRQRRR—PPCVPYLGMYLTDLVFIEEGTPNYTEDGLVN TAT sequence RasGRF1 interacting region

Then, the RB3 peptide was tested in an ex vivo model of acute striatal brain slices to investigate its inhibitory potential on ERK phosphorylation after stimulation with 100 µM of glutamate. The result was a significant reduction of ERK activity, with an IC<sup>50</sup> of 6 µM. To deeply explore the effect of the RB3 peptide on Ras–ERK signalling pathway, Papale and colleagues investigated whether RB3 may also affect the phosphorylation of two well-characterised ERK substrates, (Ser10)-acetylated (Lys14) histone H3 (pAc-H3) and S6 ribosomal protein (pS6, Ser235/236 specific site) [69–71]. Even in this case, the peptide was effective in decreasing the phosphorylation of Ac-H3, with an IC<sup>50</sup> of 5.2 µM; and pS6 levels, with an IC<sup>50</sup> of 3.69 µM [63].

#### *1.2. RB3 Peptide Modifications by Using Hydrogen-Bond Surrogates*

In light of the above, the RB3 peptide was selected to enter a compound optimisation process to increase the biological activity. This work aimed to provide insights on Ras-RasGRF1 interaction and suggest novel potential modulators of this interaction based on the RB3 peptide structure, which will be further investigated through biological assays. For this purpose, multiple computational techniques were exploited to investigate modifications of the RB3 peptide in order to potentially increase its inhibitory activity of Ras-RasGRF1 interaction. Scheme 1 lists the steps of the workflow that are described in detail in the following sections.

**Scheme 1.** Overview of the computational workflow performed to identify potential peptide-based modulators to inhibit Ras-RasGRF1 interaction.

First, a computational alanine scanning of the Ras-Sos complex available from the Protein Data Bank [60] was performed and, in parallel, molecular dynamics (MD) simulations on the three complexes (Ras-Sos, Ras-RasGRF1, and the Ras-RB3 peptide), were run to identify the most stable and frequent interactions between protein partners. Surprisingly, the RB3 peptide exhibited helicity loss, where the helical hairpin corresponding to RasGRF1 interacting region lost helicity propensity, generating instability within the complex. In order to optimise the structure of this peptide and increase the inhibitory capacity of the peptide, from the analysis of the literature, a potential strategy arose to solve this issue. Indeed, in the literature, similar cases of helicity loss have been reported and faced by exploiting the hydrogen-bond surrogate (HBS) approach [72]. This methodology was developed especially for modulating biomolecular interactions, such as protein–protein contacts, through small-molecular-weight protein secondary structure mimetics, when designing small molecules could be a very challenging strategy [73–79]. The HBS approach is based on the helix-coil transition theory for peptides, whereas α-helices composed of a few amino acids are expected to be essentially unstable due to a low nucleation probability [80,81]. This approach is expected to overwhelm the intrinsic nucleation propensities of the amino acids by providing a preorganization of the residues upstream, which triggers the helix formation initialization [82,83]. An example of a successful case of HBS use was the HBS3 peptide [62,84], a synthetic α-helix that reduced the Ras nucleotide exchange in vitro and modestly activated ERK in cells [62,84]. This peptide was basically built on the Sos sequence able to bind the Ras protein, and incorporates the HBS strategy reported in the literature [62,72,84]. On the other hand, in the literature, there is no evidence that

the HBS methodology has been applied to the RB3 peptide. Thus, it appeared interesting to investigate the employment of this strategy aiming at avoiding the helicity loss of RB3 peptide.

In general, in an α-helix, the carbonyl group of the *i*th amino acid residue mediates a hydrogen bond with the amine group of the (*i* + 4)th amino acid residue by generating nucleation and stabilisation of the helical structure. Based on this evidence, the HBS strategy for generation of artificial α-helices involves the replacement of one of the main chain hydrogen bonds with a covalent linkage [73,85]. Indeed, to mimic the C=O · · · H-N hydrogen bond as closely as possible, a covalent bond of the type C=X–Y–N is included, where X and Y are usually carbon atoms that would be part of the *i*th and the (*i* + 4)th residues. However, the analysis of the RB3 α-helix highlighted that the first amino acid implicated in the helix H-bond ensemble does not establish a traditional hydrogen bond with the (*i* + 4)th amino acid, while it forms a contact with the (*i* + 3)th amino acid by creating the so-called 310-helix [86]. Therefore, in this work, the RB3 peptide was modified by creating a C-C bond between the first (Tyr1178) and the fourth amino acid (Met1181), hereafter called the 310-HBS RB3 peptide. An MD simulation of the complex Ras-310- HBS RB3 peptide was run, highlighting a stable peptide helical conformation during the entire trajectory.

Then, a computational residue scanning was run on the peptide to analyse and identify the most promising mutations to be considered in terms of ∆∆Gaffinity and ∆∆Gstability. The point-mutated peptides, in complex with the Ras protein, were exploited to run MD simulations and MM-GBSA calculations to guide the selection of the most interesting mutations. The resulting ones were further combined with each other in the 310-HBS RB3 peptide structure to obtain 48 combinatorial peptides. Thus, these latter were investigated through MD and MM-GBSA to retrieve the calculated ∆Gbinding average values for each couple Ras-combinatorial peptide. Finally, 18 combinatorial peptides were selected, and they will enter an experimental follow-up to further explore their potential activity against the Ras-RasGRF1 interaction. In the next sections of this manuscript, efforts to modify the peptide structure to increase the biological activity will be described.

#### **2. Results and Discussion**

#### *2.1. Sos and RasGRF1 Binding Interfaces Analysis*

To date, no PDB structures of the Ras-RasGRF1 complex are available in the literature, thus the Ras-Sos complex structure was exploited to collect key information useful to guide and address the computational studies described herein. In order to deeply explore and predict the hot-spot residues of the Sos Cdc25 domain (i.e., amino acids 924 to 957), all of the six available PDB structures of the Ras-Sos complex (PDB IDs: 1XD2 [48], 1BKD [43], 1NVW, 1NVV, 1NVU, and 1NVX [47]) were examined by using the Robetta Computational Interface Alanine Scanning Server [87,88]. In Table 1, the highest ∆∆G values from the alanine mutations on the Sos binding interface are reported.

**Table 1.** Predicted ∆∆G values from the alanine mutations on Sos binding interface amino acids retrieved by the Robetta Computational Interface Alanine Scanning Server [87,88].


n.a. = not available.

These predicted data were in accordance with mutational studies performed by Hall et al. [61], whereas Phe929, Thr935, and Lys939 were highlighted as Sos interact-

ing hot spots. In detail, from the computational alanine scanning, Phe929 and Thr935 were shared by most of the six PDB complexes as hot spots, while Lys939 resulted from the PDB 1NVV analysis. As can be seen, this computational tool pointed out another Sos hot spot not previously identified by Hall et al., Asn944, which was shared from all the six PDB complexes. Another hot spot on Glu942 was retrieved from PDB 1NVW.

These identified hot spots, both from biological assays [61] and computational alanine scanning, were considered equally important for the next steps, and were used for comparison to RasGRF1 amino acids in order to investigate similarities between the two GEF sequences (RasGRF1 and Sos). For this purpose, PDB 1XD2 [48], including Sos in complex with the Ras protein, was chosen for the low resolution, while the only available PDB structure of the RasGRF1 Cdc25 domain (PDB ID: 2IJE [59]) was used. However, the latter PDB was from *Mus musculus* as organism. Hence, before proceeding with the protein structure alignment between the Sos and RasGRF1 Cdc25 domains, a FASTA alignment was performed while considering human and murine RasGRF1 sequences through the Protein BLAST sequence alignment tool [89,90]. The resulted overall identity was 83.22%, whereas within the RasGRF1 region involved in Ras binding (i.e., from residue 1173 to 1203 of mouse sequence), the only detected difference was between Ala1198 for human and Val1187 for mouse, as illustrated in Figure S1 in the Supplementary Materials. These two amino acids exhibited side chains with very similar chemical properties, thus the PDB 2IJE was considered suitable for proceeding with this study.

Therefore, both PDB protein structures (2IJE [59] and 1XD2 [48]) were aligned through the "Protein Structure Alignment" tool of the Schrödinger suite, and the result is depicted in Figure 3. As can be seen, the two α-helices of Sos and RasGRF1 are perfectly aligned.

**Figure 3.** Superposition of the RasGRF1 binding region (green chain, retrieved from PDB 2IJE [59]), and the Sos binding region (pink chain, retrieved from PDB 1XD2 [48]) in complex with the Ras protein (blue chain from PDB 1XD2).

Furthermore, Sos and RasGRF1 binding regions share several amino acids, as shown in the sequence alignment of the Sos and RasGRF1 Cdc25 domains illustrated in Figure 4.

**Figure 4.** FASTA sequence alignment between the Sos and RasGRF1 regions able to bind the Ras protein.

RasGRF1 amino acids corresponding to Sos hot spots are reported in Table 2. As can be seen, the pairs Thr935-Thr1184 and Glu942-Glu1191 shared the same amino acid, while Phe929 (Sos) and Tyr1178 (RasGRF1) both presented hydrophobic side chains, and Asn944 (Sos) and Thr1193 (RasGRF1) shared polar uncharged side chains. Only Lys939 and Phe1188 were very different amino acids, whereas lysine showed an electrically charged side chain, while phenylalanine exhibited a hydrophobic side chain. The amino acids of RasGRF1 highlighted in the above-described comparison were considered for the next steps of the analysis.

**Table 2.** Correspondences of Sos hot-spot residues (from biological assays [61] and computational alanine scanning) to RasGRF1 amino acids identified by performing protein structures alignment.


#### *2.2. MD Simulations of Ras in Complex with Sos and RasGRF1 Binding Fragments*

In order to investigate the importance of the computationally predicted hot spots not reported in the literature; i.e., Glu942 and Asn944, and to explore the interactions between the Ras-Sos complex, a MD simulation was run. For this purpose, PDB 1XD2 [48] was chosen, and the simulation was run for a short time of 50 ns to observe and identify the most stable and frequent contacts. The stability of the system was monitored during the entire trajectory, thus registering the RMSD plot depicted in Figure S2 in the Supplementary Materials. Then, the MD frames were clustered using the "Desmond trajectory clustering" tool of the Schrödinger suite (Schrödinger Inc., New York, NY, USA, software release v2018-4) by setting five clusters to be generated, and the frame centroids representative for the clusters were analysed; i.e., frame 120, frame 540, frame 660, frame 780, and frame 820. These five frames were investigated to identify the interactions between the Ras and Sos proteins, where among the above-mentioned five Sos hot spots (please refer to Table 2), four residues established stable interactions with Ras during the trajectory; i.e., Phe929, Thr935, Glu942, and Asn944. The related interactions are listed in Table 3, and the H-bonds are plotted against the simulation time in Table S1 in the Supplementary Materials.


**Table 3.** Stable interactions established during MD trajectories of Ras-Sos and Ras-RasGRF1 complexes.

Based on the previously described comparison between the Sos interacting region and the RasGRF1 Cdc25 domain, an MD simulation of the Ras-RasGRF1 complex was performed through the Schrödinger suite [91] to investigate whether the five putative RasGRF1 key residues (Tyr1178, Thr1184, Phe1188, Glu1191, and Thr1193) were responsible for contacting the Ras protein and stabilising the complex.

Therefore, the protein–protein complex was generated by using the previous aligned structures involving PDBs 1XD2 [48] and 2IJE [59], where the RasGRF1 interacting region (residues 1173 to 1203) was located within the binding pocket of the Ras protein through performing a superimposition on the Sos chain, which was subsequently deleted. The complex was minimised, and the MD was run by setting 50 ns as the simulation time. The output was analysed, and the stability of the system was checked through the RMSD plot (Figure S3 in the Supplementary Materials). The trajectory was clustered by setting five clusters to be generated. Then, the MD frame centroids were analysed; that is, frame 40, frame 65, frame 110, frame 280, and frame 362. The most stable interactions between Ras and RasGRF1 were observed, and they are reported in Table 3. In Table S2 in the Supplementary Materials, the H-bonds are plotted against the simulation time. As can be observed, most of these interactions were similarly established between Ras and the corresponding Sos amino acids (please refer to Table 2) during the MD simulation. This provided interesting information to take forward in this work.

#### *2.3. MD Simulations of the Ras-RB3 Peptide Complex*

After collecting information about interactions between Ras and its GEFs (Sos and RasGRF1), other MD simulations were performed to explore the binding mode and the established contacts between Ras and the parental (WT) RB3 peptide.

The core sequence of this peptide (without TAT portion) corresponds to RasGRF1 sequence 1173-PPCVPYLGMYLTDLVFIEEGTPNYTEDGLVN-1203. Therefore, the complex Ras-RasGRF1 was used, and all those residues not included in the RB3 peptide sequence were deleted. Thus, this new complex, the Ras-RB3 peptide, was processed by running MD simulations of 500 ns each. The RMSD plot was generated, and it is depicted in Figure S4a in the Supplementary Materials. This plot revealed a certain instability of the systems, ranging from about 3.5 to 6.4 Å. Therefore, a second MD simulation was computed, and

even in this case the RMSD plot (Figure S4b in the Supplementary Materials) showed the same trend. On the other hand, the interactions between Ras and the peptide were investigated for both trajectories. The MD frames of both simulations were grouped into five clusters each, and the frame centroids were analysed to retrieve the most stable and frequent interactions: (a) first MD → centroid frame 390, centroid frame 2340, centroid frame 1400, centroid frame 3590, and centroid frame 4270; (b) second MD → centroid frame 3610, centroid frame 4800, centroid frame 2660, centroid frame 620, and centroid frame 1280. The observed contacts between the Ras protein and the RB3 peptide from the above-listed frames are reported in Table 4.

**Table 4.** The most stable interactions between Ras and the RB3 peptide highlighted from the MD trajectory analyses.


The two MD simulations shared only two interactions; i.e., one hydrogen bond between Tyr40 of Ras and Asp1185 of the RB3 peptide, and a pi–pi stacking between the aromatic ring of the Tyr40 side chain of Ras and the other aromatic ring of the Phe1188 side chain of the RB3 peptide. Furthermore, all the other interactions retrieved from the MD simulations registered low stability during the entire trajectories.

Thus, a visual check of both simulations shed light on an important behaviour of the RB3 peptide α-helix; i.e., a portion of the α-helix (from Met1181 to Thr1184) began to lose helicity propensity after about 50/60 ns of simulation time, resulting in a misfolding behaviour. Indeed, not surprisingly, the two above-mentioned interactions shared by both simulations from the MD analyses were involved into the folded region of the peptide over the entire trajectories; i.e., the residues from Asp1185 to Glu1191 not exhibiting the misfolding. Figure 5 illustrates the misfolding of the RB3 α-helix after 50 ns of the first MD simulation.

Therefore, a strategy to overcome this issue was implemented by applying the hydrogen-bond surrogate approach, which had already provided successful experimental evidence [62,73–79,84]. Thus, the RB3 peptide was processed by modifying the structure. First, two portions of the peptide were deleted; i.e., the residues belonging to the two loops of the peptide (residues 1171 to 1177 and 1194 to 1203), because they showed a lack of crucial interactions according to mutational studies reported in the literature [43,61] and the MD simulations of Ras-RasGRF1 (please refer to Table 3). Thus, residues 1171 to 1177 and 1194 to 1203 were considered not important for the purpose of this work, and were deleted from the structure. Then, the analysis of the α-helix highlighted that Tyr1178, the first amino acid implicated in the helix H-bonds ensemble, did not establish a traditional hydrogen bond with the (*i* + 4)th amino acid, while it formed a contact with the (*i* + 3)th amino acid, by creating the so-called 310-helix [86].

Hence, a MD simulation of 500 ns was performed on the Ras protein in complex with the RB3 peptide modified by deleting the two loops at the N- and C-termini and creating a covalent C-C bond between the carbonyl oxygen of the Tyr1178 backbone and the amine hydrogen of the Met1181 backbone. The resulting peptide was termed 310-HBS RB3 (Figure 6).

**Figure 5.** (**a**) Frame 0 of the first MD simulation on Ras (blue chain) in complex with a RasGRF1 fragment (pink chain, aa 1173 to 1203); (**b**) frame from the first MD simulation on Ras in complex with a RasGRF1 fragment after about 50 ns, depicting the RB3 peptide losing helicity propensity in the portion from Met1181 to Thr1184.

**Figure 6.** (**a**) Structure of the 310-HBS RB3 peptide; (**b**) the 310-HBS RB3 peptide including a covalent C-C bond (green bond) between the carbonyl oxygen of Tyr1178 backbone and the amine hydrogen of Met1181 backbone.

> The analysis of the output revealed a stable trend for the α-helicity of the peptide, which held its folded conformation. Even the RMSD plot (Figure S5 in the Supplementary Materials) showed a certain stability of the system, thus the frames were analysed to retrieve information about the most stable interactions, and the results were plotted as depicted in Figure 7.

**Figure 7.** (**a**) Histogram of the interactions established between the Ras protein binding region and the 310-HBS RB3 peptide; (**b**) Plot illustrating the frequency of interaction occurrences between the Ras protein binding region and the 310-HBS RB3 peptide.

This newly designed 310-HBS RB3 peptide showed that it could establish some of the key interactions identified in the previous MD simulation between the Ras and RasGRF1 proteins (please refer to Table 3) and other contacts with Ras amino acids (Tyr32 and Tyr40) highlighted as key residues from mutational studies [61]. Finally, an MM-GBSA calculation of the MD frame was computed to obtain the ∆Gbinding of the complex Ras-310-HBS RB3 peptide, which was −79.70 kcal/mol. This value was exploited as a reference for the peptide optimisation process described in the following sections.

#### *2.4. Computational Residue Scanning of the 310-HBS RB3 Peptide and MD Simulations of Point-Mutated Peptides*

In order to optimise the structure of the 310-HBS RB3 peptide, a computational residue scanning was performed on the amino acids of the peptide by using the "Residue scanning" tool of Bioluminate (Schrödinger Inc., software release v2018-4) [92]. The peptide was point-mutated where each residue was substituted with all the standard amino acids, and ∆∆Gaffinity and ∆∆Gstability values of the new complexes were computed. The aim was to identify the most promising mutations in terms of ∆∆Gaffinity and ∆∆Gstability. For this purpose, only mutations reporting both ∆∆Gaffinity and ∆∆Gstability values below −3 kcal/mol were considered, according to the work of Beard et al. [92], which reported a correlation between experimental results and the computationally predicted ones through the Schrödinger suite. Indeed, the authors demonstrated that a difference of 3 kcal/mol between the mutated and WT forms of a complex might be considered reliable in hot-spot prediction. Finally, 16 mutations reported ∆∆Gaffinity and ∆∆Gstability values lower than −3 kcal/mol, thus they were considered for the next steps of this work (Table 5).


**Table 5.** Computational residue scanning results for the peptide 310-HBS RB3, highlighting 16 promising mutations.

These 16 mutations were used to create as many complexes involving the Ras protein and the 310-HBS RB3 point-mutated peptides that underwent MD simulations. The trajectory time was set at 100 ns for each system, since this timeframe was considered suitable to detect potential misfolding of the peptides. Indeed, the previously described MD simulations on the WT RB3 peptide exhibited misfolded conformation by losing αhelicity after about 50/60 ns of simulation. From the analysis of the MD trajectories, all the point-mutated peptides were able to keep the helical conformation, thus MM-GBSA calculations were computed, and the related ∆Gbinding values are reported below in Table 6. Table S3 in the Supplementary Materials lists the ∆Gbinding average values of the interaction energies and the generalized Born solvation energy for the MD trajectories of the complexes' Ras-point-mutated 310-HBS RB3 peptides. The stability of the systems was investigated by

analysing the RMSD plots per each complex, resulting in suitable stationary shape for each system (Table S4 in the Supplementary Materials).

**Table 6.** MM-GBSA calculation results based on MD simulations of the 16 point-mutated 310-HBS RB3 peptides in complex with the Ras protein.


As previously mentioned, the ∆Gbinding of the complex between Ras and the WT 310- HBS RB3 peptide (−79.70 kcal/mol) was used as a reference to select the most promising mutations associated with ∆Gbinding values lower than the reference one. In light of the above, from the MM-GBSA results, only three mutated peptides showed higher ∆Gbinding values. Hence, the related mutations, F1188H, E1190H, and E1191I, were neglected. On the contrary, the other 13 mutations were considered for creating combinatorial peptides, as described in the next section.

### *2.5. Combinatorial Peptides Using 310-HBS RB3: Creation and MD Simulations*

The most promising mutations on the 310-HBS RB3 peptide were combined with each other to obtain 48 mutated peptides overall, as listed below in Table 7.


**Table 7.** Combinatorial peptides designed based on computational residue scanning performed on the 310-HBS RB3 peptide and MM-GBSA calculations on MD simulations.

The 48 combinatorial peptides in complex with the Ras protein were processed by running MD simulations of 100 ns each to investigate helix conformational stability and the contacts formed with the Ras protein. All the trajectories were observed by generating RMSD plots to ensure the reliability of the outputs, and the interaction frequency and stability were analysed. Finally, MM-GBSA calculations of the MD simulations were computed. Thus, all those combinatorial peptides not responding to the following criteria were neglected:


Finally, 18 combinatorial peptides overall fulfilled the above criteria by resulting in promising ∆Gbinding values and exhibiting a helical trend during the whole MD trajectory. Thus, Table 8 reports the MM-GBSA results of these 18 most promising combinatorial peptides, which will be considered for the follow-up of this study by carrying out biological assays. Table S5 in the Supplementary Materials lists the ∆Gbinding average values of the interaction energies and the generalized Born solvation energy for the MD trajectories of the complexes' Ras-combinatorial peptides. The related RMSD plots and interaction diagrams of these 18 selected peptides are reported in Tables S6 and S7, respectively, in the Supplementary Materials. Based on the above-mentioned plots, these selected combinatorial peptides were able to mainly reproduce the key interactions of the 310-HBS RB3 peptide reported above in Figure 7 by establishing contacts with Ras key residues, especially Tyr32, Tyr40, and Tyr64, which were highlighted as crucial by previous experimental assays [43,61].


**Table 8.** MM-GBSA calculation results based on MD simulations of 310-HBS combinatorial peptides not misfolded during the simulations in complex with Ras protein, and with ∆Gbinding values lower than the reference one (−79.70 kcal/mol).

> Furthermore, other contacts appeared, especially with Asp57, Gly60, and Gln61 of Ras. Indeed, these amino acids were involved in interactions with Sos and RasGRF1 according to previous MD simulations, thus confirming that these designed peptides might bind and inhibit the interaction between the Ras protein and the guanine nucleotide exchange factors, Sos and RasGRF1. Figure 8 depicts the binding mode of the 310-HBS combinatorial peptide forty-three, which reported the lowest ∆Gbinding average.

**Figure 8.** Frame of the MD simulation performed on the combinatorial peptide forty-three in complex with the Ras protein, depicting the binding mode of the peptide.

#### **3. Methods**

#### *3.1. Protein Preparation*

The 3D structures of the Ras-Sos complex (PDB IDs: 1XD2 [48], 1BKD [43], 1NVW, 1NVV, 1NVU, and 1NVX [47]) and RasGRF1 protein (PDB ID: 2IJE [59]) were retrieved from the Protein Data Bank [60] and optimised using the "Protein preparation" tool of the Schrödinger suite (Schrödinger Inc., New York, NY, USA) software release v2018-4) [93]. The bond orders for untemplated residues were assigned by using known HET groups based on their SMILES strings in the Chemical Component Dictionary. Hydrogens were added to the structure, zero-order bonds between metals and nearby atoms were added, and formal charges to metals and neighbouring atoms were corrected. Disulfide bonds were created according to possible geometries, and water molecules beyond 5.0 Å from any of the HET groups, including ions, were deleted. Then, protonation and metal charge states for the ligands, cofactors, and metals were generated [94,95]. Finally, PROPKA [95] was run under pH 7.0 to optimise hydroxyl groups and Asn, Gln, and His states.

#### *3.2. MD Simulations of Ras Protein in Complex with Sos, RasGRF1, RB3 Peptide, and the Designed 310-HBS Peptides*

In this work, 69 MD simulations were performed using Desmond [91,96–99], as follows: 1 MD simulation of 50 ns for the Ras-Sos complex, 1 MD simulation of 50 ns for the Ras-RasGRF1 complex, 2 MD simulations of 500 ns for Ras in complex with the WT RB3 peptide, 1 MD simulation of 500 ns for Ras in complex with the 310-HBS RB3 peptide, 16 MD simulations of 100 ns for Ras complexed with the point-mutated 310-HBS peptides, and 48 MD simulations of 100 ns for Ras in complex with the combinatorial 310-HBS peptides. All the trajectories were computed by applying the same MD settings below described. The systems were created using TIP3P [100] as a solvent model, and the orthorhombic shape box was chosen. The box side distances were set at 10 Å. The force field OPLS3e [101] was applied, and the systems were neutralized by adding Na<sup>+</sup> ions. The outputs were further processed by performing MD simulations with the above-reported simulation times.

The ensemble class NPT was chosen to maintain the number of atoms, the pressure, and the temperature constant for the entire trajectories. The thermostat method employed was the Nosé–Hoover chain with a relaxation time of 1.0 ps and a temperature of 300 K. The barostat method applied was Martyna–Tobias–Klein, with a relaxation time of 2.0 ps and an isotropic coupling style. The timestep for numerical integration was 2.0 fs for bonded interactions, 2.0 fs for nonbonded-near (van der Waals and short-range electrostatic interactions), and 6.0 fs for nonbonded-far (long-range electrostatic interactions). For Coulombic interactions, a cut-off radius of 9.0 Å was tuned as a short-range method. Pressure and temperature were set at 1.01325 bar and 300 K, respectively. Finally, the systems were relaxed before beginning the simulations according to the following steps:


6. Final step of 24 ps of relaxation in NPT ensemble using a Berendsen thermostat and barostat, a temperature of 300 K and a pressure of 1 atm, a fast temperature relaxation constant, and a normal pressure relaxation constant.

#### *3.3. MD Frame Clustering*

In order to retrieve the key contacts between the protein partners during the entire simulations, for the MD simulations performed for Ras-Sos, Ras-RasGRF1, and Ras-RB3 peptide complexes, the frames were clustered to identify the most representative centroids to be analysed. The RMSD matrix calculation was set using the protein backbone as reference, the frequency of frames analysis was set 10, and the hierarchical cluster linkage method as average. Finally, for each MD trajectory, five clusters were generated; the analysis was reported in the Results and Discussion section.

### *3.4. Computational Residue Scanning of Peptide 310-HBS RB3 in Complex with Ras*

The 310-HBS RB3 peptide in complex with Ras (from PDB 1XD2 [48]) was used to perform a computational residue scanning (Schrödinger Inc., software release v2018-4) to perform point mutations on the peptide residues. The predicted changes in binding affinity and stability were calculated according to Equation (1) [92]:

$$
\Delta\Delta G\_{Affinity} = \left(E\_{A\cdot B}^{MIT} - E\_A^{MIT} - E\_B^{MIT}\right) - \left(E\_{A\cdot B}^{WT} - E\_A^{WT} - E\_B^{WT}\right) \tag{1}
$$

where *E* is the calculated energy of each protein (A and B) or complex (A·B) after refinement while considering the mutant form (MUT) and the wild-type (WT) of the protein. The resulting structures were refined by selecting side-chain prediction with backbone minimization.

For the purpose of the model, ∆∆Gstability was computed while representing the unfolded ligand as a tripeptide, A-X-B, where X is the residue that is mutated, and A and B are its neighbours, capped with ACE and NMA. The assumption was that the remaining interactions in the unfolded state were negligible. Thus, ∆∆Gstability values were calculated according to Equation (2):

$$
\Delta\Delta G\_{Stability} = \left(E\_{L(u)}^{MLT} - E\_{L(f)}^{MLT}\right) - \left(E\_{L(u)}^{WT} - E\_{L(f)}^{WT}\right) \tag{2}
$$

where *E*, in this case, is the calculated energy for the unfolded parent ligand (L(u)) and the folded parent ligand (L(f)) while considering the mutant form (MUT) and the wild-type (WT) of the protein [92]. The calculations were done with Prime MM-GBSA [102,103], which employs an implicit (continuum) solvation model.

#### *3.5. MM-GBSA Calculations of All the Complexes Used to Perform MD*

The MD outputs of Ras protein in complex with 310-HBS RB3 peptide, the pointmutated peptides, and the combinatorial peptides were used to compute MM-GBSA calculations through the command line. For this purpose, the Python script "thermal\_mmgbsa.py" was used. Overall, 65 MM-GBSA calculations were carried out using VSGB as a solvation model, and OLPS3 FF was set for each MD trajectory. The ∆Gbinding values were computed for each trajectory frame according to Equation (3):

$$
\Delta \mathbf{G}\_{\text{binding}} = E\_{\text{A-B (minimized)}} - E\_{\text{A (minimized)}} - E\_{\text{B (minimized)}} \tag{3}
$$

where *E* is the calculated energy of complex (A·B) or each protein (A and B) after minimization [68]. Finally, the average of ∆Gbinding values of the entire trajectories was calculated; the results were reported above in Tables 7 and 8 in the "Results and Discussion" section.

#### **4. Conclusions**

The above-described work was intended to investigate potential modifications of a patented peptide, RB3 [64], to increase its inhibitory capacity of the Ras–ERK signalling pathway involved in cocaine abuse. This peptide has been reported to work as an inhibitor

targeting the interaction between Ras protein and the guanine nucleotide exchange factors [63]. In detail, assays carried out on an ex vivo model of acute striatal brain slices reported an inhibitory activity of RB3 peptide against ERK phosphorylation by significantly reducing the ERK activity, with an IC<sup>50</sup> of 6 µM. The inhibitory potential of this peptide was further explored by Papale and colleagues, who highlighted that this peptide was effective in decreasing the phosphorylation of two ERK substrates, (Ser10)-acetylated (Lys14) histone H3 (pAc-H3) and S6 ribosomal protein (pS6, Ser235/236 specific site) [69–71], with an IC<sup>50</sup> of 5.2 µM for pAc-H3 and 3.69 µM for pS6 [63]. Due to the increasing interest in the RB3 peptide, it was chosen for to improve and modify the structure, aiming at increasing its inhibitory activity to reduce cocaine relapses in drug-addicted patients. The below-described strategy allowed us to identify 18 peptides exploiting the peptide RB3 structure, including amino acid mutations, and employing an artificial construct, the hydrogen bond surrogate, to stabilise the helical conformation of the peptides. The MD simulations performed on these molecules in complex with the Ras protein registered stable and frequent contacts with key residues of the Ras protein, as known from the literature [43,61]. Furthermore, MM-GBSA calculation of the MD trajectories reported promising ∆Gbinding average values, where, for example, the combinatorial peptide forty-three showed a ∆Gbinding value of −123.50 kcal/mol. Interestingly, the selected combinatorial peptides showed an important interaction energy increase compared to the point-mutated ones, whereas the GB solvation term reported positive values (see Tables S3 and S5 in the Supplementary Materials). Thus, it seems that the presented combinatorial peptides' interaction patterns were crucial for the complex stabilisation, as also observed in the MD RMSD plots (see Table S6 in the Supplementary Materials).

Therefore, the 16 selected combinatorial peptides were chosen, and the next step of this work will be the biological screening of the ERK signalling pathway by measuring the phosphorylation rate of the two ERK substrates, pAc-H3 and pS6 [69–71]. The results of these assays will provide crucial information about the potential of these designed peptides in inhibiting Ras activation, thus preventing molecular effects' maladaptive behavioural manifestations associated with brain conditions in which this signalling pathway is abnormally enhanced, such as cocaine abuse and certain forms of ID and ASD.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/ph14111099/s1, Figure S1. FASTA sequence alignment between RasGRF1 interacting region from two different organisms (*Homo sapiens* and *Mus musculus*); Figure S2. RMSD plot of the MD simulation performed on the Ras-Sos complex (PDB 1XD2); Table S1. Plots of the H-bonds established between Ras and Sos during the MD simulation; Figure S3. RMSD plot of the MD simulation performed on the Ras-RasGRF1 complex; Table S2. Plots of the H-bonds established between Ras and RasGRF1 during the MD simulation; Figure S4. RMSD plots of first (a) and second (b) MD simulations performed on the Ras-RB3 peptide complex; Figure S5. RMSD plot of the MD simulation performed on the Ras protein in complex with the 310-HBS RB3 peptide; Table S3. ∆Gbinding average values of the interaction energies and the generalized Born solvation energy for the MD trajectories of the complexes' Ras-point-mutated 310-HBS RB3 peptides; Table S4. RMSD plots of 100 ns MD simulations performed on the Ras protein in complex with the 16 point-mutated 310-HBS RB3 peptides; Table S5. ∆Gbinding average values of the interaction energies and the generalized Born solvation energy for the MD trajectories of the complexes' Ras-combinatorial peptides; Table S6. RMSD plots of 100 ns MD simulations performed on the Ras protein in complex with the selected 18 310-HBS RB3 combinatorial peptides; Table S7. The bar charts of protein-ligand interactions for the 18 310-HBS RB3 combinatorial peptides (left column), and the plots illustrating the frequency of interaction occurrences between the combinatorial peptides and Ras protein (right column).

**Author Contributions:** Conceptualization, R.B. and A.B.; methodology, M.R.G. and A.B.; software, A.B.; formal analysis, U.P. and A.B.; investigation, M.R.G.; resources, A.B.; data curation, M.R.G.; writing—original draft preparation, M.R.G.; writing—review and editing, R.B, U.P. and A.B.; visualization, U.P. and A.B.; supervision, U.P. and A.B.; project administration, A.B.; funding acquisition, U.P. and A.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research activity of M.R.G. was funded by the Regione Siciliana, Assessorship of Productive Activities—Department of Productive Activities, funds: FSC 2014/2020, project name: Computational Molecular Design e Screening—CheMISt—CUPG77B17000110001, Scientific Research within the "Patto per il sud" of the Sicily Region. The APC were funded by Regione Siciliana, Assessorship of Productive Activities—Department of Productive Activities, Action 1.1.5 of PO FESR Sicilia 2014/2020, Project n. 086202000366—"OBIND", CUP G29J18000700007 to U.P.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Review* **Mechanistic Understanding from Molecular Dynamics in Pharmaceutical Research 2: Lipid Membrane in Drug Design**

**Tomasz Róg 1,\* , Mykhailo Girych <sup>1</sup> and Alex Bunker 2**


**Abstract:** We review the use of molecular dynamics (MD) simulation as a drug design tool in the context of the role that the lipid membrane can play in drug action, i.e., the interaction between candidate drug molecules and lipid membranes. In the standard "lock and key" paradigm, only the interaction between the drug and a specific active site of a specific protein is considered; the environment in which the drug acts is, from a biophysical perspective, far more complex than this. The possible mechanisms though which a drug can be designed to tinker with physiological processes are significantly broader than merely fitting to a single active site of a single protein. In this paper, we focus on the role of the lipid membrane, arguably the most important element outside the proteins themselves, as a case study. We discuss work that has been carried out, using MD simulation, concerning the transfection of drugs through membranes that act as biological barriers in the path of the drugs, the behavior of drug molecules within membranes, how their collective behavior can affect the structure and properties of the membrane and, finally, the role lipid membranes, to which the vast majority of drug target proteins are associated, can play in mediating the interaction between drug and target protein. This review paper is the second in a two-part series covering MD simulation as a tool in pharmaceutical research; both are designed as pedagogical review papers aimed at both pharmaceutical scientists interested in exploring how the tool of MD simulation can be applied to their research and computational scientists interested in exploring the possibility of a pharmaceutical context for their research.

**Keywords:** molecular modeling; permeability; membrane disruption; membrane proteins; drugs; antimicrobial peptides

#### **1. Introduction**

In the most breathtaking giant leap forward in life science since the determination of the double helix structure of DNA, the general structure from the sequence problem has now been solved for the case of an individual protein domain [1]. Additionally, in 2020, the Human Proteome project, after ten years of work, reported that their complete high-stringency blueprint of the human proteome is 90.4% complete [2]; thus, an accurate sequence of every human protein and the most common variants is in sight. Together, these developments mean that we can now foresee being in possession of accurate structures for the active sites of all human proteins and most common variants; with predicted advances in computational power and the development of better algorithms, it is no longer fantasizing to speculate that, in the near future, it could be possible to obtain a sufficiently accurate estimate of the binding free energy for any given drug candidate molecule for every single possible human protein active site it may encounter. This may represent a Holy Grail of drug design; however, it does not mean the problem of computational drug design will have become a solved problem—it leaves out the rest of the biophysical landscape within which drug action takes place. This review paper explores the role that

**Citation:** Róg, T.; Girych, M.; Bunker, A. Mechanistic Understanding from Molecular Dynamics in Pharmaceutical Research 2: Lipid Membrane in Drug Design. *Pharmaceuticals* **2021**, *14*, 1062. https://doi.org/10.3390/ph14101062

Academic Editor: Osvaldo Andrade Santos-Filho

Received: 20 September 2021 Accepted: 15 October 2021 Published: 19 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

computational modeling, using the toolkit of molecular dynamics (MD) simulation, can play regarding drug design outside of this "lock and key" paradigm, focusing on one fundamentally important aspect: the lipid membrane.

In our previous review paper [3], part 1 of this series, we presented many examples of how MD simulation has been used as a tool to provide mechanistic insight relevant to drug delivery and how this insight has been used, in concrete practical terms; we now continue the discussion of the role MD simulation has and will continue to play in pharmaceutical research, focusing, in this review paper, on its role in drug design, specifically covering its ability to elucidate a central element left out of the conventional drug design paradigm: lipid membranes. As with part 1, this is a pedagogical review, with two separate target audiences, pharmaceutical researchers interested in understanding the increasing role MD simulation can play in their research, and computational researchers interested in the possibility of developing a pharmaceutical context for their research.

In part 1, we reviewed what MD simulation is and the limitations of both the aforementioned conventional "lock and key" approach to drug design [4,5] and the absorption, distribution, metabolism, and excretion (ADME) approach to drug delivery [6–8] that has led to the diminishing returns known as "Eroom's law" [9]. We showed how MD simulation can provide mechanistic insight needed for the development of advanced drug delivery mechanisms. We argued that research has been hamstrung by the limited paradigm used and that MD simulation has the power to bring into consideration the broader biophysical context within which drug delivery occurs; we now proceed to argue the same for the case of drug design. As discussed in part 1, drug design has been carried out within a limited paradigm that only considered a compromise between the drug interaction with the active site of a specific target protein and its solubility; increasingly sophisticated techniques are being used to approximate the relative binding free energies of different drugs, in each case a balance being reached between maximizing accuracy and minimizing the computational resources used [10]. Although techniques have recently been developed to consider the importance of binding kinetics [10–14], in addition to thermodynamics, this still does not take the broader biophysical context in which the drug interaction with the protein occurs into consideration, i.e., the environment beyond the immediate drug protein interaction; additionally, it is possible for drugs to have an effect that does not even involve interaction with a protein, i.e., a mode of action entirely external to the "lock and key" paradigm.

The elucidation of all aspects of the biophysical context of drug action is currently beyond reach and even all aspects that MD simulation is capable of elucidating are beyond the scope of a single review paper; we thus focus on an important element involved in drug action left out of the "lock and key" paradigm: lipid membranes. Biomembranes, complex fluid structures formed primarily from bilayers of phospholipids, are one of the primary building blocks of life [15–18]. In all cases, drug access to its target and action involves, in some fashion, interaction with biomembranes. Drug molecules must traverse membranes that form biological barriers. The majority of proteins that are drug targets are associated with biomembranes and the membrane plays a role in the interaction with substrates, thus potential drug molecules; interactions of both target proteins and drugs, with the membrane to which the protein is associated, play a role in the drug–target protein interaction. Finally, there are cases where the mode of action of the drug does not even involve interaction with a protein; the drug acts directly on a specific biomembrane rather than with a protein. In all of these cases, MD simulation, a mature tool for the study of biomembranes [19–27], can provide a window on the role the lipid membrane plays in drug action; this review paper describes how MD provides mechanistic insight and includes many examples where it has been successfully applied. We will first discuss the behavior of drug molecules in the membrane, then how drugs transfect through membranes that form biological barriers and how drugs can, collectively, at sufficient concentration, affect the properties of the membrane. Finally, we will discuss the role the membrane plays in the selection of substrates, thus potential drug molecules, for membrane associated proteins. In all these areas, MD simulation has acted as a unique window capable of adding

mechanistic insight that can be applied in drug design; we describe many case studies where this has been used effectively.

While previous review papers have shown how MD simulation has helped elucidate the role the lipid membrane plays in substrate and thus drug selection for membrane proteins [28–32], drug membrane interactions [33–41], drug delivery [3,42–45], antimicrobial peptides [46–54], and methodologies [28,55–60], this is the first review paper, that we are aware of, focusing on the entirety of the use of MD simulation to incorporate the role played by interactions with lipid membranes in drug design. This can be seen, in turn, as a case study of the potential for MD simulation to expand the paradigm of drug design to all aspects of the broader biophysical environment within which drug action occurs.

We now refer the reader to our discussion of the basics of the MD simulation method in our previous review paper, i.e., "part 1" [3], or an equivalent discussion found in Braun et al. [61], where its ability to provide an effective visualization of the system studied with all atom resolution (also reviewed in [62–65]) and how coarse grained (CG) potential sets can be used to effectively zoom out to study larger length and time scales (also reviewed in [66–70]) are covered. However, the application of MD simulation to the study of the more complex systems involved in drug action rather than drug delivery, that we cover in this review, requires advanced MD simulation related methodologies not discussed in the aforementioned references; these will now be described and examples of their use will be encountered later, throughout the rest of the paper. As an aside, it is appropriate that we now draw attention to the fact that the chemical structures of the lipids discussed in this manuscript are shown in Figure 1.

#### **2. Advanced Simulation Methods**

In its pure form, MD simulation provides a window into the system, with all atom resolutions, at a timescale of up to 1–2 µs and a box size of up to 15–20 nm, significantly greater with CG models [3]. In our previous review paper [3], we discussed many examples where this has been used in issues related to drug delivery, and how it is a valuable tool when used its in pure form; however, significant limitations are encountered when one addresses issues related to drug design rather than delivery.

When expanding what we consider in drug design the oversimplified "lock and key" paradigm to the broader context of all the relevant biophysics, there are many challenges that result from the added complexity. Within the context of biophysics, drug design can be studied through the paradigm of the interaction of four different basic varieties of entities, each of which present their own challenges: (1) small drug molecules, which, while simple in structure, are often chemically/structurally novel as they are often synthetically created molecules that do not naturally exist, thus in many cases new parameter sets must be developed and extra challenges are encountered in attempting the development of CG models; (2) specifically structured large biomolecules, e.g., proteins and nucleic acids, that have a highly specific structure defined by a very rigid, steep, and complex energy landscape that molecules that interact with them must negotiate; (3) large unstructured molecules, e.g., polymers that, while without specific structure but with an uncomplicated energy landscape governing their interactions, exhibit complex behavior that is difficult to model due to topological effects that arise spontaneously due to their size and flexibility; (4) lipid membranes, that are flexible, complex, and diverse entities with very complex interactions with other elements. Clearly, one must adopt more advanced methodologies rooted in MD simulation to explore this complex landscape and these will now be introduced. Discussing this will, however, require the presentation of some mathematical and physical (thermodynamic) constructs.

**Figure 1.** Chemical structure of lipids mentioned in this paper.

Let us consider our window into a given physical system that is provided by MD simulation: a set of molecules with periodic boundary conditions being simulated at temperature *T*, e.g., a section of lipid membrane, protein, or other macromolecule or molecular complex in solvent. Technically, this can be seen as a set *N* particles governed by

a potential that is a function of the position of all particles: *U*( → *r N* ); → *r N* is the position of all particles, which can in turn be seen as a point in a 3*N* dimensional space with { → *r N* } representing the entirety of this space, i.e., the set of all possible particle positions, known as the *conformation space*, i.e., the system of *N* particles governed by potential *U*( → *r N* ) can be said to inhabit the space { → *r N* }. If we begin the simulation at a certain point in this space → *r N* <sup>0</sup> and simulate, with a thermostat holding the system at constant temperature *T*, using the MD simulation method described in our previous review paper "part 1" [3], then the system will move around in this space following what can be called a *trajectory*. In any real simulation there will be enough particles and *U*( → *r N* ) will be sufficiently non-trivial so that we can assume what is known as the ergodic hypothesis: given infinite time, the system will eventually pass through all possible points in { → *r N* }. The relative amount of time spent within a given 3*N* dimensional differential element *d* → *r N* will be equivalent to the probability of the system being within that element that we will signify as *P*( → *r N* , *T*)*d* → *r N* . For a system at constant temperature, i.e., a *canonical ensemble* [71],

$$P(\stackrel{\rightarrow}{r}^N, T) = \frac{e^{-\gamma}}{Z\_{\mathcal{U}}(T)} \quad \gamma = \frac{\mathcal{U}(\stackrel{\rightarrow}{r}^N)}{T} \tag{1}$$

where *Zu*(*T*) is a normalization factor known as the *partition function* that is, for any but the most trivial systems, impossible to determine analytically; in this equation, we have assumed temperature to be in units that match *U*( → *r N* ), thus have not included the Boltzmann constant normally seen in this equation. A clever way around the impossible to determine partition function makes use of the principle of detailed balance: given two points in the conformation space <sup>→</sup> *r N* <sup>1</sup> and <sup>→</sup> *r N* 2 , the ratio of the transition probabilities between these two points, *W*( → *r N* <sup>1</sup> → → *r N* 2 , *T*) and *W*( → *r N* <sup>2</sup> → → *r N* 1 , *T*) is equal to the ratio of the probabilities of being at the two points:

$$\frac{\mathcal{W}(\stackrel{\rightarrow}{r}\_2^N \to \stackrel{\rightarrow}{r}\_1^N, T)}{\mathcal{W}(\stackrel{\rightarrow}{r}\_1^N \to \stackrel{\rightarrow}{r}\_2^N, T)} = \frac{P(\stackrel{\rightarrow}{r}\_1^N, T)}{P(\stackrel{\rightarrow}{r}\_2^N, T)}\tag{2}$$

Given

$$
\Delta \gamma = \frac{\mathcal{U}(\stackrel{\rightarrow}{r\_2}^N)}{T} - \frac{\mathcal{U}(\stackrel{\rightarrow}{r\_1}^N)}{T} \,. \tag{3}
$$

an algorithm that satisfies all preceding relations, is: for a system at <sup>→</sup> *r N* 1 considering a transition to <sup>→</sup> *r N* 2 , if <sup>∆</sup>*<sup>γ</sup>* <sup>≤</sup> 0 the system moves to <sup>→</sup> *r N* <sup>2</sup> otherwise the transition occurs with probability *e* −∆*γ* . This can be continued indefinitely, following what is known as a *Markov chain*, moving through conformation space sampling a set of states that will have a distribution of *P*( → *r N* , *T*) in the limit of infinite steps. The result of carrying this out for a finite number of steps is thus an approximation of *P*( → *r N* , *T*), that we will refer to as *P*( → *r N* , *T*); as the number of steps in the Markov chain increases, the longer the trajectory, the greater the accuracy of our approximation. The simulation of a system in this fashion is known as a *Monte Carlo (MC) simulation* [72].

If one is able to concisely calculate *P*( → *r N* , *T*) over all { → *r N* } for any system, then one will know everything that is to be known about that system and any property of that system can be calculated; this is, however, impossible for all but the most trivial systems. Yet, both MC and MD simulation are able to provide an incomplete estimate of this: the above mentioned *P*( → *r N* , *T*); the longer the trajectory the more information, for the case of

MD as well as MC simulation. Depending on how the algorithms are constructed, each have their relative strengths and weaknesses and can be hybridized, particularly since MD simulation has been shown to be superior at relaxing the local degrees of freedom, while MC simulation, if designed properly, can more effectively relax the more global degrees of freedom [73–76]. While MD simulation, if carried out on its own, can also provide information related to dynamics, this information is usually of secondary importance; we

will only consider MC and MD simulation as means to provide a result for *P*( → *r N* , *T*) here. The problem with both MC and MD simulation, or even the two hybridized, in their

basic forms, is that the approximation of *P*( → *r N* , *T*) (*P*( → *r N* , *T*)) will provide the most information in the regions of the conformation space of the system where *P*( → *r N* , *T*) is greatest. For many cases, for example, the aforementioned systems related to drug delivery [3], this may be sufficient, i.e., the answer to the question "what does the system normally do?" may be all we seek. There are many cases where this will not be sufficient, for example, when calculating the free energy to cross certain barriers from one region of conformation space to another. A useful schematic is to express <sup>→</sup> *r N* as a single dimension *x* rather than 3*N* dimensions and plot *U*( → *r N* ) as *U*(*x*), as shown in Figure 2. The energy landscape is very rough with many peaks and valleys; with either MC or MD, in their basic form, crossing a very high energy barrier will be an extremely unlikely event. If you have managed to

determine the global energy minimum, then you will only obtain results for *P*( → *r N* , *T*) in the local valley of the global minimum. Even worse, usually, you cannot even be certain you have found the global minimum and thus will only be exploring the vicinity of a local minimum. In general, the primary concern is the region of { → *r N* } where you need to obtain information regarding *P*( → *r N* , *T*). If you wish to calculate the free energy difference, → *N*

between two different local minima, then you will need to obtain a result for *P*( *r* , *T*) in the region of conformation space along the path that connects them. For example, for the affinity of a drug for a protein, we need to determine the free energy difference between the drug in solution and the drug bound to the specific binding site of the protein that is being targeted. Moreover, in many cases, what is of interest in the simulation is what is known as a "rare event"; this corresponds to a transition through a region of conformation space with low *P*( → *r N* , *T*). We will now discuss how approximate values of such free energies can be calculated and rare events can be sampled more effectively, using advanced simulation

The most direct way to increase the region of space where *P*( → *r N* , *T*) is obtained, and to ensure the rare events that we wish to observe are sampled, is to globally accelerate the sampling. The simplest way to achieve this is to just artificially raise the concentration of the molecules that would trigger the event. This is known as a *flooding simulation* [77]. Another simple method is to perform an MD simulation at an artificially raised temperature, to move the system through conformation space more quickly and through greater thermal excitation push the system through energy barriers that would otherwise not be crossed. Obviously, just simply raising the temperature will often be problematic as the alterations to the system, e.g., passing through phase transitions, that happen in the real system with elevated temperature, will often occur in the model, producing a false result. A way around this is to anticipate the unphysical changes to the system the temperature rise will invoke and add a bias potential to *U*( → *r N* ) to block this from occurring while raising the temperature; this method is known as temperature accelerated molecular dynamics (TAMD) [78]. While TAMD has been used effectively in many cases, it suffers from the problem that the bias potential may not completely alleviate the undesired temperature effects. A safer but more complex method to achieve the same acceleration, but with a guarantee of obtaining an unbiased result, is Replica Exchange.

methods.

**Figure 2.** Schematic of the energy landscape of a system; with basic unbiased MD or MC simulation, if performed below a certain temperature, only the region of the local minimum will be explored, thus a result for *P*( → *r N* , *T*) will only be obtained here. Finding the free energy change to another region of conformation space, where the energy barrier must be crossed and *P*( → *r N* , *T*) determined along the path, is not practically possible without the advanced biasing schemes we discuss.

In replica exchange [79–81], originally described as *parallel tempering* in the seminal literature, the system of *N* particles governed by potential *U*( → *r N* ) that inhabits the space { → *r N* } being simulated at temperature *T*, using either MD or MC, can be simulated in parallel with a set of other systems with the same set of particles, however, with other parameters altered. The set of systems can be simulated as a single extended ensemble; this can be seen as a metaverse of parallel universes, a concept that a broader segment of the audience of this paper can understand intuitively now, due to the popularity of recent movies and television shows that have featured such plot elements. Within this new extended ensemble, switching the parameters between two, so to speak, universes within the multiverse that we have created, can be seen as an MC step that can be made with probability *e* <sup>−</sup>∆*<sup>γ</sup>* where *γ* is as above, however, now rather than alteration of particle positions, i.e., a change in <sup>→</sup> *r N* , the change in *γ*, ∆*γ*, results from temperatures and/or potential functions that governs the systems switching; a schematic of this algorithm is found in Figure 3A. It can be demonstrated mathematically that not only does the collective metaverse that we have created move to equilibrium, but each individual system, separately, moves to equilibrium, thus systems with parameter sets that move more slowly through conformation space are effectively pulled along by those that move more quickly. Initially, this was performed with multiple systems that differed in thermodynamic parameters of the system, like temperature and pressure [82]. Then, Bunker and Dünweg realized that one could more precisely accelerate the dynamics by having the systems differ not

in a thermodynamic parameter, but rather a term in the interaction potential *U*( → *r N* ); specifically, they proposed eroding the repulsive core of the Lennard–Jones potentials, allowing for a finite acceptance of MC pivot moves hybridized with MD simulation in the backbones of polymer chains [76] and referred to this as parallel excluded volume tempering. Fukushini et al., then generalized this and applied it to protein structure determination and coined the term by which this procedure is known today: Hamiltonian replica exchange [83].

The aforementioned techniques globally accelerate the motion through conformation

space, thus the efficiency with which the quality of *P*( → *r N* , *T*) as an approximation of *P*( → *r N* , *T*) is improved. There are, however, other techniques capable of focusing more specifically on the local region of the conformation space where the system is located specifically allowing the system to escape the local minimum within which it is trapped. For a system on a discrete space, that is where <sup>→</sup> *r N* is a set of specific points, or states, i.e., { → *r N* } → *xi*=1,*N*, Wang and Landau developed an algorithm [84] where an MC simulation of a system on the discrete space is performed and the number of times the system samples a certain state is recorded and periodically a bias is added to the potential, proportional to the number of times a state has been sampled. Metaphorically, this can be seen as filling up the energy wells that the system has already explored with sand, pushing the system over energy barriers into new energy wells, and then filling them up and so on. This algorithm was adapted to continuum systems, on which MD simulation can be performed, as Statistical Temperature Molecular Dynamics (STMD) [85] and this has been formally shown to be mathematically equivalent [86] to the better-known technique of metadynamics [87–91]. An example of metadynamics at work is shown in Figure 4. An alternative to metadynamics is the application of a specific alteration to the potential to push the system out of a specific local minimum; this is possible when the form of the local minimum can be approximated by an analytical function that can be cancelled out. This technique is known as the addition of a *flooding potential* [92].

We have now described methods to globally accelerate the exploration of conformation space and to, more specifically, escape from the local minima that constrain the system; this, however, still often leaves a conformation space too large to explore to obtain a sufficiently accurate result for *P*( → *r N* , *T*) in the region of conformation space that we are specifically interested in. Often, this is the transition between two specific local minima, in particular, calculating the free energy difference between them. A specific example of this is a ligand in solution vs. the same ligand docked to a specific active site of a specific protein. The most direct way to achieve this is to push the system through a path in conformation space between the two local minima. There are two means to achieve this: (1) umbrella sampling [93] and (2) steered MD [94,95].

In umbrella sampling, a set of what we refer to as *windows* are created along a path in conformation space that connect the two local minima. This is achieved through the introduction of a biasing force that is in the form of a harmonic interaction, i.e., a spring, centered on a certain distance along the path in conformation space that connects the two local minima. Histograms of the distribution in position along the path in conformation space are determined for each window; a schematic of umbrella sampling is shown in Figure 5. The free energy along any path in conformation space, known as the Potential of Mean Force (PMF), can be determined from these histograms, if the overlap between their tails is sufficient, through a technique known as Weighted Histogram Analysis Method (WHAM) [96]. WHAM is itself an extension of a technique developed by Ferrenberg and Swendsen [97,98] primarily for spin models [99], to specifically adapt it to systems governed by a molecular mechanics potential. A specific example that elucidates the use of umbrella sampling is the study by Nandy et al. [100], where it was used to study the binding of PAMAM dendrimers of different generations to strands of ds-DNA; in the supplementary information of this paper the overlapping set of histograms is shown. The efficiency of the umbrella sampling algorithm can, in many cases, be significantly improved through hybridization with the replica exchange algorithm [101]; the set of all windows

can be treated as a single extended ensemble where exchanges of the force biases between neighboring windows can be made through the aforementioned MC moves.

**Figure 3.** (**A**) Schematic representation of replica exchange (based on ref. [28]); (**B**) force bias/Jarzynsky, reproduced with permission from ref. [94]; (**C**) RAMD, reproduced with permission from ref. [102]; (**D**) thermodynamics integration, reproduced with permission from ref. [103].

Steered MD is another technique capable of determining the free energy change along a path in conformation space that, in many cases, is significantly more computationally efficient than umbrella sampling. It is based on an astounding, counterintuitive, piece of theoretical statistical mechanics proposed by by Jarzynski in 1997, almost 20 years before the computational power to apply this algorithm effectively became available; the relation he derived [104,105], known as the *Jarzynski equality*, can be expressed as

$$
\langle e^{\frac{-W}{T}} \rangle = e^{\frac{-\Delta F}{T}} \tag{4}
$$

where h i signifies the average over infinite samples, *W* is the work carried out along the path in conformation space, i.e., the integral of the force along the path, and ∆*F* is the difference in free energy between the two ends of the path. The free energy difference between two points in conformation space, a property defined only for a system in equilibrium, can actually be approximated from multiple samples of the work carried out pulling the system from one to the other, a driven system, i.e., a measurement of the system out of equilibrium. Steered MD involves simulating the system while pulling the system along a path in conformation space to determine the work carried out along this path; it is performed multiple times, each time a measurement of *W* is made. There are two

fashions in which this pulling can be performed [94]: constant force, where the velocity is measured along the path, or constant velocity, where the force is measured along the path; this is shown as a schematic in Figure 3B. This algorithm was initially thought not to be practical as, in its early use, it obtained very poor results for the free energy; however, more recently, it was found that if you perform a much larger number of samples than previously feasible, one is able to obtain a result for the free energy difference that, in many cases, is significantly more accurate than that obtained using umbrella sampling using the same computational resources. This algorithm is increasingly being used in drug design [95].

**Figure 4.** Schematic representation of metadynamics.

The above methods presuppose that the path in conformation space that the system will traverse is known; however, this is not always the case. Often, the active site of a protein is within an internal cavity and the path through channels and mobile loops of the protein that a ligand takes to reach the active site, is unknown. In this case, a technique to not only find the free energy along a path in conformation space must be found, but the path itself must also be found. A technique capable of achieving this is Randomly Accelerated Molecular Dynamics (RAMD) [106]; this involves starting the ligand in its docked position, moving it in a random direction until it gets stuck, then picking another direction and repeating until the ligand has left the protein, bouncing back and forth until it wiggles out of the box it is in; an example of this is shown in Figure 3C. A modified version of RAMD, known as τRAMD allows for the study of the residence time of ligands in the binding pocket [107]. Two additional methods have been developed to find the optimal pathway in complex multidimensional space that bear mentioning: the first is based on the combined use of metadynamics and a path-searching algorithm [108] and the second is based on what is known as a "string method" involving a swarm of trajectories [109,110].

All of the above methods to determine the free energy between two regions of conformation space, usually two different local minima, involved moving the system through a path in conformation space. While accurate, this is not always the most efficient method if all you are interested in is a single quantitative approximation for the free energy difference between the end points of the path and not the free energy along the path, i.e., the PMF. While this information is often useful, as it provides insight into the kinetics in addition to the thermodynamics of the system [10], it does not come without a price; if you is only interested in the free energy difference between the end points of the path, there are more efficient means to obtain an approximation of this. Instead of charting a path through

conformation space, you can instead move through a space composed of different possible forms of the interaction potentials, *U*( → *r N* ), itself to determine this; if you start with, for example a ligand bound to the active site of a protein, and that same ligand free in solution, in each case you can gradually dissolve the interactions around the molecules to find the free energy difference to the molecule in empty space: subtracting these from one another determines the free energy difference between them. There are several different related methods that follow this scheme, each with different advantages and disadvantages, including Thermodynamics Integration (TI) [111], Free Energy Perturbation (FEP) [112–114], and Bennett Acceptance Ratio (BAR) [115,116]. If one is also only interested in the difference of the binding affinities of two different molecules and the absolute free energy of binding is not important, then one can perform what is known as *molecular alchemy*, calculating the free energy change along the path of metamorphosis from one molecule to the other, both in solution and at the binding site and subtracting these two from one another, as shown in the schematic in Figure 3D. These techniques in the context of drug design are discussed in detail in a review paper by Limongelli et al. [10].

**Figure 5.** Schematic representation of umbrella sampling algorithm.

Finally, it bears mentioning that, as in other fields of computational science, an emergent trend is the incorporation of the toolkit of novel computational tools widely known as Machine Learning (ML) into hybrid MD algorithms; for a recent review (2020) of achievements in the hybridization of MD with ML, see Noé et al. [117]. Recently, a hybrid FEP + ML algorithm was found to provide a more accurate result, given the same expenditure of computational resources, than the use of FEP alone, for the prediction of hydration free energies [118]; a hybrid MD + ML algorithm was used to predict self-solvation free energies and limiting activity coefficients [119].

#### **3. Location and Orientation of Drug Molecules in the Lipid Bilayer**

ሺ⃗ேሻ The beginning of the story of the role of lipid membranes in drug design is a discussion of what individual drugs do when they initially interact with a lipid membrane, where in the membrane they locate to and how they orient within the membrane; this is clearly a result of both the nature of the lipid membrane, i.e., what particular phospholipids and other amphiphilic molecules make up the membrane formulation and the structure of the drug molecule itself, i.e., location and relative dominance of polar, charged, and non–polar regions of the molecule and the flexibility/rigidity of the molecule (Figure 6).

**Figure 6.** Chemical structure of drugs introduced in Section 3.

The partitioning of small molecules, including proposed drug candidates (see Table 1) but also natural molecules (see Table 2) and other xenobiotic molecules (see Table 3), into lipid bilayers has been extensively studied via a combination of experimental and computational methods. How molecules interact with the membrane, i.e., how they orient and the different regions within the membrane that they locate to, is determined by their structure; MD simulation can provide significant insight into this specific selection. Through simulations carried out on many different drug molecules, a wide range of drug behavior in the membrane, i.e., location and orientation, has been observed.

Berendsen and coworkers [120,121] developed a classification scheme for the behavior of molecules within the membrane according to the region of the membrane that they locate to; they defined four separate regions, as shown in Figure 7: region (1) at the interface between the bulk solvent phase and the lipid headgroups, i.e., the region of the headgroups and the first shells of water adjacent to the membrane, region (2) just below the lipid headgroups, region (3) at the position of the hydrophobic lipid tails, and region (4) at the very center of the membrane core, between the two leaves of the bilayer. Through MD simulations, it has been demonstrated that small hydrophobic molecules e.g., benzene or toluene [122,123], and gases like xenon, O2, NO, CO, and CO<sup>2</sup> [124–129] locate to region 4; larger hydrophobic molecules, e.g., porphyrins, locate to region 3 [130,131]; amphiphilic molecules, e.g., the majority of drug molecules, our primary interest in this review paper, locate to region 2, their polar components interacting with the polar groups of the lipids and water at the membrane surface and their hydrophobic components located among the hydrocarbon chains of the lipids, as shown in a study by Paloncýová et al., of drug molecules, natural compounds, and other xenobiotics [132]. Mostly polar molecules, including some drug molecules, locate to region 2. Finally, kanamycin A, a highly polar molecule with potential application as an antibiotic, locates to region 1 [133]; antibacterial peptides and saponins [134–136] also locate to this region prior to pore formation. It is possible for

drugs to locate to the membrane in a fashion that spans more than one of these regions; examples of this case are particularly interesting as they include several lipid molecules with therapeutic potential. For example, N-arachidonylglycine and oleoyl-L-carnitine are glycine transporter GlyT2 inhibitors that span over regions 2 to 4 [137]. Nevertheless, even molecules that do not directly interact with the membrane can have a long–range membrane association. For example, the concentration of polar molecules relative to the bulk concentration may increase or decrease in the ~1 nm layer of water adjacent to the lipid bilayer; also, acetylcholine is attracted to bilayers that contain negatively charged lipids [138–140] and repulsed by bilayers composed of zwitterionic lipids, as determined through both the MD simulation and experimental study [139]. The zwitterionic neurotransmitters GABA and glycine are attracted to bilayers that contain anionic lipids but are, however, not affected by zwitterionic lipids [139]. Glutamate, another polar, anionic, neurotransmitter, is repulsed from the vicinity of negatively charged lipids [139].

**Figure 7.** Location of the drugs and small molecules in four regions of a lipid bilayer. (**A**) Top panel: Snapshot of lipid bilayer composed of POPC; Lower panel: corresponding density profiles showing location of lipids, water, phosphate atoms, carbonyl atom (C35), and atoms of double bond of oleoyl chain (C9=C10) of POPC (see Figure 1). Data from 1000 ns simulation of hydrated POPC bilayer with our OPLSaa lipid force field [141–143]. (**B**) Snapshots showing location of small molecules in different regions of the bilayer: kanamycin A locates to region 1 (reproduced with permission from [133]), indocyanine green locates to region 2 and 4, and entangled in PEG corona (reproduced with permission from [144]), ubiquinone locates to region 3 and 4 (reproduced with permission from [145]).

Since drugs and lipids are generally small molecules without the structural complexity often found in biomolecules, studies of drug molecule–lipid bilayer (membrane) interactions are usually relatively straightforward to carry out, even given limited computational resources; the systems studied, lipid membranes with small drug molecules, equilibrate relatively quickly and can be studied effectively using relatively small systems. It is thus not surprising that interactions of proposed drug molecules with lipid bilayers were among the first topics to be studied using MD simulations of lipid membranes [19], for example the studies by Tu et al. [146], Pasenkiewicz-Gierula et al. [147] and Bemporad et al. [148,149] carried out in 1998, 2003, and 2005, respectively. More recently, a particularly noteworthy study by Abdiche and Myszka measured the liposome partitioning of 86 drugs and

observed an astounding affinity range of ~1000 fold [150]. In another impressive study, Natesan et al., proposed a theoretical method based on drug structure that allowed for the prediction of the location of drug molecules within the bilayer [151]; this work was performed using experimental results of 107 separate small molecules interacting with lipid membranes. An extensive database was compiled that contains over 3600 cases of compound–membrane interactions gathered from both experimental and theoretical studies, known as the "Molecules on Membranes Database" (MolMeDB), [152]; MolMeDB provides data concerning drug–membrane partitioning, penetration, and positioning.

As is the case for all phenomena studied through MD simulation, the precise values for the parameters of the potential set used are of critical importance, thus the testing and validation of the model are crucial [153–162]. Partition coefficients of the molecule (log K) provide an important test of the potential set that can be directly correlated to experimental measurements. For membrane partitioning of small molecules, comparative studies have demonstrated that the best models predict the partition coefficients (log K) with an accuracy of 0.5–0.8 log units [163]. This result thus provides us with a certain degree of confidence in the accuracy with which MD simulation is capable of effectively modeling the behavior of molecules in the membrane; the partition coefficient plays a dominant role in the drug–membrane interaction.

Simulation studies using MD have demonstrated that the location of the drug molecule is dependent on the hydration level or lipid composition; this relationship has also been studied using MD simulation: a reduced hydration level can be simulated by reducing the thickness of the solvent phase of the simulation box. For example, in a hydrated bilayer, curcumin has been found to locate below the water–membrane interface and orient parallel to the bilayer normal [164]. Interestingly, curcumin, in bilayers with low hydration, locates outside the bilayer and adopts an orientation parallel to the membrane surface, forming a carpet-like structure. Another MD simulation study determined that the distribution of carprofen derivatives along the bilayer normal is bimodal, with the first density maximum located at the position of the lipid headgroup and a second deeper below the membrane–water interface [165]. When the bilayer is in the liquid disordered phase, the second position has been found to be more frequent than in the liquid-ordered phase. The location of ibuprofen is sensitive to cholesterol [166], thus also to the membrane phase. In the DPPC bilayer with 25 mol% of cholesterol, ibuprofen locates slightly closer to the water–membrane interface in comparison to the case of the pure DMPC bilayer (0.81 and 0.95 nm below the position of the phosphate group, respectively); in the bilayer with 50 mol% of cholesterol, ibuprofen, however, is found below the lipid headgroups, within the hydrophobic bilayer core (regions 3 and 4).

Extra complexity to the drug molecule–bilayer interaction is added when a lipid membrane with polymers conjugated to a set of the lipid headgroups is considered [167]. For example, lipids functionalized with poly(ethylene–glycol) (PEG) are most frequently used to fabricate so-called stealth liposomes applied as drug carriers in drug delivery [168]; PEG forms a polymer mesh on the top of the membrane surface, the PEG corona covering the liposome, where hydrophobic and amphipathic molecules may locate instead of within the bilayer. Here, they are shielded, by the polymer, from unfavorable contact with the polar solvent (water), in the same fashion as they would be if located within the membrane if the PEG were not present, however, with a reduced free energy penalty to their normal location within the membrane, leading them to locate instead within the PEG corona [144,168–170]; this can thus be seen as a potential additional compartment for small molecules.

The importance of lipid affinity and exact location in various membrane compartments has also been demonstrated for antioxidants that protect lipids from peroxidation. Two MD simulation studies have found evidence that that the functionalization of antioxidants with lipophilic groups, incapable of free–radical scavenging, decreases the extent of lipid peroxidation [171,172]. This effect results from the increased concentration of functionalized antioxidants in the hydrocarbon phase (regions 3 and 4). Another study, combining MD simulation with experiments, elucidated the mechanism of the effect of flavonoids on

the antioxidant activity of tocopherol and ascorbic acid in lipid bilayers [173]; flavonoids, quercetin in this case, synergistically increase the antioxidant activity of tocopherol and ascorbic acid. Through MD simulation, it was demonstrated that the orientation and location within the lipid bilayer of the three antioxidants differ: tocopherol locates below the lipid headgroups (region 2) and is thus capable of translocating through the bilayer, while ascorbic acid locates among the lipid headgroups (region 1), and quercetin prefers an intermediate position (regions 2 and 3). Moreover, these three antioxidants can form noncovalent complexes: quercetin–tocopherol, quercetin–ascorbic acid, tocopherol–ascorbic acid, and tocopherol–tocopherol. The formation of these complexes was observed in MD simulations and subsequently confirmed through explicit quantum mechanical calculations; the formation of quercetin–tocopherol complexes was then also confirmed experimentally, via fluorescence spectroscopy [173]. This formation of non–covalent complexes of the molecules provides a rational explanation for the synergistic effect of using a mixture of antioxidants: (1) the regeneration of tocopherol (a reaction transforming a tocopherol radical back to its native form) is facilitated and (2) quercetin locates deeper within the hydrocarbon phase when complexed with tocopherol.

Biomembranes provide a variety of local environments with physicochemical properties that vary considerably, dependent on the lipid composition; this allows for the optimization of membranes for different roles [174–176]. Even within single biomembrane domains, the lipid composition and properties may vary [177,178]. Here, MD simulation of drug molecules in different membranes can be used as a tool to investigate how drug molecules will interact with different biomembranes encountered in the body or even different lipid domains. For example, Alves et al. [179] studied the well-known chemotherapy agent doxorubicin in both a cholesterol rich, ordered, environment and in a liquid-disordered cholesterol poor environment using a combination of experimental analysis and MD simulation. They found evidence that the presence of cholesterol reduces the effect on membrane fluidity, thus possibly increasing the density of doxorubicin in cholesterol rich membrane domains, where the efflux P-gP protein is found, thus providing a possible explanation of the observed heightened vulnerability to efflux proteins of doxorubicin. On the contrary, amantadine [180] and chlorzoxazone [181] are less soluble in bilayers containing cholesterol.

Drug molecules can be functionalized to polymers to improve their bioavailability and achieve passive targeting. A common choice is direct covalent conjugation to a PEG polymer; however, other polymers have also been considered [43,182]. Conjugation to a polymer will clearly alter the interaction between drug molecules and lipid membranes; this can be studied through MD simulation. Tetraphenyl–porphyrin is a photosensitizer used in photodynamic therapy for cancer treatment. In MD simulations, we observed that tetraphenyl-porphyrin locates to regions 1 and 2 and orients within the lipid bilayer such that the two hydroxyl groups that it possesses are located at the water–membrane interface; PEGylated tetraphenyl–porphyrin, however, remains in the water phase. This result has been validated through MD simulations performed with its initial position within the membrane; it was seen to leave the membrane as the system equilibrated [183].

As we discussed above, for the case of tetraphenyl porphyrin, in addition to location in the lipid bilayer, the orientation of the molecule with respect to the membrane normal will also characterize its behavior. The antifungal drug itraconazole is a rigid and long molecule with weakly polar groups distributed along its backbone (long axis of the molecule); as a result of this particular structure, it adopts an orientation parallel to the membrane surface and locates to the region of the upper segments of the lipid acyl tails (region 3), close to the lipid headgroups (Figure 8) [184]. Due to the rigidity of the itraconazole molecule, its presence can affect the orientation of other molecules in the bilayer. For example, the fluorescent probe 1,6-Diphenyl-1,3,5-hexatriene (DPH) alters its orientation from parallel to the lipid chains to parallel to the itraconazole molecules within the membrane [186]; this change could lead to an incorrect interpretation of steady–state fluorescence anisotropy and fluorescence lifetime measurements. Finally, in bilayers containing cholesterol, itraconazole

molecules prefer bilayer regions depleted of cholesterol due to the incompatible orientation of the cholesterol molecules parallel to the membrane normal [187]. For this reason, cholesterol, typically present in liposome formulations used in drug delivery [188,189], is not useful as a component of a vesicle used as a delivery mechanism for itraconazole. Conversely, there are also drug molecules that are sufficiently lipophilic to locate within the membrane that, however, do not adopt a specific orientation when there, i.e., their orientation in the membrane is isotropic. For example, there are many tautomers of the drug molecule piroxicam: six are uncharged (PxA, PxB, PxC, PxD, PxE, and PxF) [190] in addition to zwitterionic, cationic, and anionic tautomers (see Figure 8C). Although, PxA and zwitterionic tautomers are dominant, in vacuum and water solutions respectively, the probability of occurrence of all possible tautomers, when in the environment of the lipid headgroups (regions 2 and 3), is not known; MD simulation studies of four piroxicam tautomers demonstrated that the orientation of the molecules differs significantly between tautomers and, for most cases with the exception of the cationic tautomer, there are no strong preferences for a specific orientation in the bilayer (Figure 8D) [185].

**Figure 8.** (**A**) Distribution of values of the angle between the vector representing the itraconazole long axis (red arrow at the chemical structure of itraconazole) and the bilayer normal [184]. (**B**) Snapshot showing itraconazole molecules in a lipid bilayer, purple spheres are phosphate groups of the POPC molecules [184]. (**C**) Chemical structures of piroxicam tautomers. (**D**) Distribution of angles between the vector representing the piroxicam long and short axes (red arrows at the chemical structure of PxA) and bilayer normal; PxA—black line, PxE—read line, zwitterionic—green line, cationic—blue line [185].


**Table 1.** List of recent studies of drug–membrane interactions.


**Table 1.** *Cont.*

Pulmonary surfactants form a complex structure on the inner lung surfaces, the alveoli, including the only monolayer in the human body at the boundary between the pulmonary liquids and air. For drugs designed to treat lung conditions that are delivered through pulmonary administration, e.g., corticosteroid and salbutamol inhalers for the treatment of asthma, this monolayer is the first barrier that they encounter. Pulmonary surfactant monolayers are composed predominately of DPPC (85%), and the remaining components are POPG (11%) and cholesterol (4%). A study by Hu et al., comprised a set of MD simulations of a lung surfactant monolayer models at five different surface tensions, representing various stages of monolayer expansion and compression, demonstrated that the nonsteroidal anti-inflammatory drug ketoprofen changes its location in the monolayer depending on the degree of monolayer compression [263]. In expanded monolayers with a surface area per lipid of 0.9 nm<sup>2</sup> , ketoprofen locates to the water–membrane interface; in the condensed monolayer, with a surface area per lipid of 0.52 nm<sup>2</sup> , ketoprofen locates to the hydrocarbon phase, beneath the headgroups of the monolayer. In other studies that were performed on monolayers mimicking pulmonary surfactants at surface pressures of 55 and 43 mNm−<sup>1</sup> , the antibiotic levofloxacin located to the water–membrane interface with a tendency to locate deeper within the monolayer when the surface pressure was increased [238]. Moreover, levofloxacin aggregated in the hydrocarbon phase of the monolayer.


**Table 2.** List of recent studies of natural compound–membrane interactions.

The orientation and location of drug molecules in lipid membranes is clearly only the beginning of our story. Once molecules enter a biomembrane they can do three different things: (1) pass out of the membrane again, possibly on the other side, the membrane merely forming a biological barrier on their way to their eventual destination, (2) act collectively to instigate large scale alteration or even disruption of the membrane structure, or (3) interact with proteins that are associated with the membrane. These three phenomena are all of clear relevance to molecules designed to act as drugs. In the rest of this review article, we will discuss the use of MD simulation to elucidate each of these in the context of drug design.


**Table 3.** List of recent studies of xenobiotic–membrane interactions.

#### **4. Translocation through the Membrane**

For all drugs, there is a ubiquitous membrane interaction they need to perform in every case: transfection, or translocation, through biomembranes that form biological barriers that need to be crossed in order to reach the target site for their action (Figure 9). This process has been studied intensively using MD simulations; for previous extensive reviews, see references [34,560–563]. Unsurprisingly, the permeation of small molecules through the lipid bilayer was among the first membrane-related topics studied through MD simulation, e.g., in 2004, Bemporad et al. [149] studied permeation of small organic molecules like benzene or ethanol, and in older studies, Marrink and Berendsen (1994) studied permeation of water [564]. A significant amount of work has been performed to predict the permeability of small molecules through lipid bilayers using various molecular modeling methods and theoretical tools [565–574]; this includes new computational methodologies, developed specifically for the study of membrane permeability [575–577]. A web server and database PerMM is dedicated to gathering experimental and computational data related to small molecule membrane partitioning and translocation [578]. For a recent review of the development of experimental methods used for studying passive diffusion through membranes, see reference [579].

**Figure 9.** Chemical structure of the chemical compounds introduced in Section 4.

Regarding the study of drug translocation through the lipid bilayer, the main limitation is the time scale that is possible to reach with MD simulations with all atom resolution; this typically is not sufficient to allow for the observation of the entire translocation process without some form of force bias. Translocation in MD simulations can be observed for tiny molecules, e.g., gases [557] or water [548,552,555] in numbers allowing for the observation of a sufficient number of events to allow for a statistically significant result; for larger molecules, only single cases of translocation can be observed, e.g., [550,580–585]. Membrane transfection represents a transition through an energy barrier, as shown in Figure 10; the force biased methods discussed earlier in this review are thus the tools needed to effectively study this. The transition rate will be directly proportional to the height of the energy barrier and force biased methods are an effective means to calculate this. For the case of membrane translocation, the umbrella sampling method [93,586,587] is the most frequent choice to perform this calculation; umbrella sampling calculations show a profile of the potential of mean force (PMF) along selected transition pathways, which, for the case of membrane permeability, is normal to the bilayer surface (Figure 10). The recent review by Lee and Kuczera provides insight into methods used for the calculation of free energies of translocation through lipid bilayers [57].

**Figure 10.** (**A**) Free energy profiles along the bilayer normal of zwitterionic (dashed line) and neutral (full line) ciprofloxacin molecules; the membrane center is at z = 0. Snapshots of ciprofloxacin molecule in the bilayer center: (**B**) uncharged, (**C**) zwitterionic form. Lipids are not shown for clarity. Reproduced with permission from ref. [233].

As discussed previously, the computational cost of an umbrella sampling calculation is relatively high, but through clever tweaks to the algorithm this cost can be reduced; membrane transfection can be seen as a near perfect testbed for the comparison of variants of umbrella sampling and other algorithms for the study of transitions through energy barriers. Nitschke et al., developed a number of shortcuts regarding the calculation of the energy barrier to membrane transfection using umbrella sampling, including decreasing the size of the bilayer, parallel use of multiple solutes, and decreasing the cutoff radius for the Lennard–Jones interactions; this did not significantly affect the quantitative results of the calculations while reducing the computational resources required by up to a factor of 40 [274]. Other methodological studies have determined that adding the so-called flooding potential [92] to the force field allows for improved conformational sampling of the solute before umbrella sampling, thus improving the accuracy of the calculation of the free energy barrier [203]; this was discussed earlier in the paper. A comparison of the four methods used to obtain the free energy profiles (umbrella sampling, replica-exchange umbrella sampling, adaptive biasing force, and multiple-walker adaptive biasing force) for three solutes (urea, benzoic acid, and codeine) found no benefit in implementing the more advanced algorithms, i.e., no improvement in the quantitative accuracy of the results for a given expenditure of computational resources [287]. In another study, three methods were compared: metadynamics, umbrella sampling, and replica-exchange umbrella sampling [379]. Comparisons were performed for six compounds: arginine, sodium ion, side-chain analog of alanine (methane), alanine with neutral termini, zwitterionic alanine, and water. In partial disagreement with the previous result, this study found a significant deviation in the free energy profiles for the case of charged molecules: arginine, sodium ions, and zwitterionic alanine, while for neutral molecules such as methane and zwitterionic alanine, no significant differences were observed [379]. The reason for the discrepancy observed for charged molecules was the slow relaxation of electrostatic interactions between lipid headgroups and the solute. The only method available that allowed for sufficient sampling

was replica-exchange umbrella sampling. Recently, Bennett et al., combined MD simulations with ML to study transfer free energies, i.e., free energy difference between two environments [571]. They calculated free energies of transfer from water to cyclohexane for 15,000 small molecules using MD simulations and next used obtained data to train the ML algorithm. The mean absolute error of the obtained prediction in comparison to MD data was only ~4 kJ/mol.

A qualitative insight into the mechanism of drug translocation through a lipid bilayer, i.e., following what the molecule actually does in a physical sense as it passes through the membrane, can also be obtained through MD simulation. For example, Chipot and Comer demonstrated subdiffusive behavior in small molecules translocating through the membrane, with a mean squared displacement proportional to time *t* as *t* 0.7 [413] rather than the expected linear relationship. Cramariuc et al., proposed that the translocation of ciprofloxacin, a fluoroquinolone antibiotic, is facilitated through the collective entrance into the membrane of dimers, or even larger column-like stacks of ciprofloxacin molecules (Figure 11A,B) [233]. The column-like stacks of ciprofloxacin molecules have also been observed by Li et al. [343]. Complexed to form dimers, or larger aggregates, negatively and positively charged groups of ciprofloxacin neutralize each other and thus provide an easy avenue for proton translocation; this leads to the collective transformation of all the ciprofloxacin molecules to their uncharged form. Interestingly, in studies of the antibiotic mangostin (Figure 11C), the formation of a transmembrane aggregate of 16 molecules of the drug was observed [241]; MD simulations can find evidence of general trends connecting drug structure to permeability. For example, performing simulations of 49 compounds, Dickson et al., derived a simple correlation between passive permeation and the number of hydrogen bonds between solute and the surrounding molecules, i.e., water and lipids, with a correlation coefficient of 0.63 [588,589].

**Figure 11.** (**A**) Snapshots of MD simulation showing a stack of neutral ciprofloxacin entering the lipid bilayer. (**B**) Electrostatic potential maps at the molecular van der Waals surface in a dielectric continuum corresponding to the water phase calculated using the DFT method and chemical structure of ciprofloxacin. (**C**) Snapshots of MD simulation showing transmembrane arrangement of mangostin molecules and its chemical structure. A and B reproduced with permission from ref. [233]; C reproduced with permission from ref. [241].

The translocation of the drug molecules through lipid bilayers may be facilitated using additional molecules that assist the translocation of the drug through the membrane. Well known examples of such molecules include ionophores facilitating the translocation of ions, e.g., K<sup>+</sup> [590], K<sup>+</sup> and Mg2+ [591], Cl<sup>−</sup> [592–594] and positively charged doxorubicin [221]. Other examples are so-called molecular umbrellas, capable of transporting hydrophilic cargo through lipid bilayers [595–597] or hydrophobic cargo through aqueous solution [598]. In a similar spirit, the use of conjugated antioxidants with antiretroviral drugs has been proposed to increase drug penetration into the central nervous system [599]. Additionally, peptides, including antimicrobial peptides, were proposed as an effective enhancer [600,601]. Recent MD simulations demonstrated that glycyrrhizic acid enhanced the translocation of the antiparasitic drug praziquantel through a lipid bilayer by lowering the free energy barrier associated with the hydrophobic center of the membrane along with a rearrangement of the lipid headgroups [252]. Similarly, studies of menthol, as an enhancer of translocation, found a large decrease in the free energy barrier in the bilayer center for the translocation of quercetin [342]. Additionally, graphene quantum dots have also been found to decrease the free energy barrier against the translocation of doxorubicin and deoxyadenosine [484]. Interestingly, carbon dioxide increases the permeability of ethanol, 20,30-dideoxyadenosine, and trimethoprim through the POPC bilayer [242]. Recently, cyclic peptides have been proposed as potential enhancers of drug permeation through lipid bilayers [602]. The cyclic peptide (Trp-D-Leu)4-Gln-D-Leu has the ability to assemble into tube-like structures in lipid bilayers; this has been demonstrated to possibly be capable of acting as an enhancer for the antitumor drug 5-fluorouracil [206]—the drug is believed to pass the bilayer through the tube created by the peptide as free energy calculations indicate the presence of only a small 5 kJ/mol barrier against such a translocation. In coarse-grained simulations of lidocaine translocation through a lipid bilayer, two enhancers, ethanol and linoleic acid, were shown to have a synergistic effect on lidocaine permeability [603]. Finally, Gupta et al. [604] performed extensive screening of possible enhancers through a massively parallel array of umbrella sampling calculations using the coarse-grained MARTINI model [605,606] to obtain approximate results for the free energy barriers to translocation for each case (Figure 12) [604].

The effect of the lipid composition of the membrane on its permeability to a broad range of molecules has also been studied though MD simulation. The presence of cholesterol, a major component of the cell membrane, is known to reduce the permeability of the membrane to various solutes [607–609]; this phenomenon has been examined through MD simulation. For example, in POPC bilayers, the addition of 33 mol% of cholesterol has been found to reduce the permeability of the membrane to 9-anthroic acid and 2',3'-dideoxyadenosine by a factor of ten; however, the permeability to hydrocortisone is reduced by a remarkable factor of 600 [281]. In a lipid bilayer mimicking the cell membrane, with an asymmetric distribution of phospholipid types, i.e., the formulation of the two leaves of the membrane differed, the permeability was found to be lower by 5–6 orders of magnitude in comparison to that of a pure DOPC bilayer [219]. The permeability of this lipid bilayer was further reduced by an order of magnitude when 33 mol% of cholesterol was added to both leaflets [219]. In cancer cells, the cell membrane asymmetry frequently vanishes; thus, comparison of symmetric and asymmetric models of the cell membrane, are of significant interest. A comparison of the permeability to cisplatin of symmetric and asymmetric bilayers found a decrease in permeability in a membrane designed as a model of a cancer cell membrane by a factor of 11. It has also been shown through MD simulation studies that the addition of DOPE, representing a lipid type that does not form a lamellar structure, to a DOPC bilayer, reduces the permeability regarding small molecules (molecular weight less than 100 Da); however, for larger molecules the effect is the opposite [284]. Simulations of lipid bilayers containing products of lipid oxidations have found a decrease in permeability for the case of oxysterols [550] and tail oxidized phosphatidylcholine [481].

**Figure 12.** Snapshots of the *stratum corneum* model with selected enhancers: Oleic Acid (OLE), Palmitic Acid (PLA), Geranic Acid (GRA), Undecanoic acid (UND), DMSO (DMS), Geraniol (GOL), Glycerylmonooleate (GMO), Isopropyl palmiate (ISP), Limonene (LEM), and Octyl pyrrolidone (OCP). Reproduced from ref. [604].

From the standpoint of pharmaceutical research, the most interesting are studies of membranes that form a boundary with an extracellular environment, including bacterial membranes [415,610–616], the *stratum corneum*, i.e., the most external layer of the skin [617–626], membranes present in the eyes [557,627–631], and the ocular mucous membrane [632], or lung surfactant monolayers [263,558,633–638]. Specifically, MD simulations have been used in studies of enhancer effects on the permeability of the *stratum corneum* (e.g., [544]). For example, studies of the effects of the terpene derivative borneol on the enhancement of *stratum corneum* permeability for osthole [543] and gastrodin, catechin, quercetin, emodin, imperatorin, ligustrazine, ferulaic acid, colchicine, and baicalin found

that borneol facilitates drugs permeation via a destabilization of the condensed and ordered arrangement of ceramides and free fatty acids [282]. Another study found evidence that the destruction of the *stratum corneum*, caused by ethanol, leads to the extraction of lipids from the membrane and subsequently the formation of the pore–like structures that allow for benzoic acid to translocate through the membrane [387]. Next, extensive studies, using umbrella sampling methods, of the permeability of the *stratum corneum* for water, oxygen, ethanol, acetic acid, urea, butanol, benzene, dimethyl sulfoxide (DMSO), toluene, phenol, styrene, and ethylbenzene identified the locations of the free energy barriers to transit through the *stratum corneum* for these compounds [417]. The umbrella sampling method has also used been used for studies of the permeation of p–aminobenzoic acid, benzocaine, and butamben through a lipid bilayer in the gel phase composed of ceramide, one of the main components of the *stratum corneum* [290].

In recent studies, Liu et al., calculated the time of entry of 79 drugs into the lipid bilayer, mimicking the lipid composition of the COVID-19 envelope with and without the spike protein and five bilayers mimicking the lipid composition of the: (1) plasma membrane, (2) lysosome, (3) endoplasmic reticulum, (4) Golgi apparatus, and (5) mitochondrial membranes [639]. The set of 79 drugs was selected from the currently approved drugs as potential antiviral therapeutics [640]. These calculations demonstrated that the presence of the spike protein significantly reduces the time required for drugs to enter the lipid bilayer.

Important membrane-based systems that form barriers that, for the treatment of many conditions, drugs must pass, include the blood–brain barrier and the membrane lining the gastrointestinal tract. Intestinal permeability is essential for pharmacokinetics and is widely studied through both experimental and theoretical methods [641–644]. The blood–brain barrier is another significant intraorganismal barrier considered in treating numerous diseases, e.g., Parkinson's disease [645,646] and even in screening potential drugs against SARS-CoV-2, capable of infecting brain tissue [647,648]. Steered MD simulations have found a correlation between experimental parameters that describe drug permeability through the blood–brain barrier and both the maximum force needed for pulling molecules through it and the overall non-equilibrium work performed during the pulling simulation [199]. Studies have been performed for 26 compounds in simple DOPC and DOPC–Cholesterol bilayers. In another study, unconstrained MD simulations were performed at elevated temperature to observe the spontaneous translocations of drug molecules through a multicomponent asymmetric lipid bilayer [416]. These studies produced results in agreement with the experimental data. Additionally, propionylated amylose has been used as a carrier for hydrophobic drugs designed to cross the blood–brain barrier. Amylose forms a helical structure that captures hydrophobic drugs and transports them through hydrophilic environments; however, when the drug-loaded amylose helix enters the membrane–water interface (region 1) the drugs are released [649]. Lipid membranes are, however, only one component of the structure of the blood–brain barrier; in fact, the most important elements are tight junctions controlling the entrance to the paracellular compartments. The main proteins forming these junctions belong to the claudin family. Since these junctions are large and complex, relatively few MD simulation studies of them have been performed; one example is a recent study of Claudin-5 [650].

It is commonly assumed that small molecules transform into their uncharged (unionized) form when they translocate through lipid bilayers, due to the prohibitively high free energy barriers for transporting charged species through a membrane (Figure 10). This assumption can be justified by, e.g., QM calculations performed in a polarizable continuum that demonstrates that proton transfer from the NH3+ to the COO<sup>−</sup> group in the zwitterionic drug molecule will occur, so long as the dielectric constant is lower than 20 [233]; thus, molecules become neutral in the lipid headgroup region when the dielectric constant drops from 78.4 (water phase) to 12–18 (water–membrane interface) [651]. In order to understand the translocation process, one should thus perform calculations using both the uncharged and charged forms of the molecule. In recent studies, Yue et al., performed free energy calculations for uncharged and charged states of drugs and compared them with simulations

where dynamic protonation was applied [202]. These calculations demonstrated that the free energy profile obtained with a more realistic dynamic protonation approach cannot be modeled as a simple superposition of the two other profiles; free energy barriers do not directly correspond to those observed in the simulation of the uncharged/charged form of the molecule; this assumption is thus an oversimplification.

Another methodological issue related to translocation and partitioning of small molecules into the lipid bilayer is the absence of explicit polarizability in the majority of the force fields commonly used for the simulation of biomolecules, i.e., potential sets used in the model. Jämbeck and Lyubartsev proposed a scheme to calculate free energy profiles through simulations with the partial charges in the potential set derived from QM calculations performed in a polarizable continuum; the calculations were performed twice with the dielectric constant set to 78.4 and 2.04, in order to mimic both water and hexane, respectively [292]. Based on these calculations, the authors define polarization correction terms for calculations of free energy differences between water and the environment of the membrane core.

A combination of atomistic and coarse-grained (CG) simulations have been applied to study the permeability of both planar lipid bilayers and a model of a small liposome with a radius of 10.1 nm, to 5-aminolevulinic acid and its esters [380]. In the first step, the free energy profile of solute translocation through a planar lipid bilayer was calculated for both coarse-grained and atomistic models. These calculations found a qualitative agreement between both models. In the second step, free energy profiles were calculated for the coarsegrained model of a liposome and compared with a coarse-grained model of a flat bilayer, demonstrating significant differences in the free energy barrier against translocation. In another coarse-grained study of liposomes, encapsulation and translocation between the outer and inner liposome leaflets, for the local anesthetic prilocaine, was observed [296]. Coarse-grained simulations have also been successfully used in the study of nanoparticles partitioning into lipid bilayers, e.g., [652,653] or where large-scale screening is performed for hundreds of drugs [654–658]. It should be noted that the MARTINI model was recently reparametrized (MARTINI 3) [659] to fix problems related to the imbalance of interactions between beads of various sizes leading to unrealistically strong interactions, e.g., protein– protein interactions [660–662].

#### **5. Effect of Drug Molecules on Membrane Properties**

#### *5.1. Drug Molecules Can Do More in the Membrane than Merely Locate, Orient or Pass Through*

In many cases, once drug molecules locate to the membrane, they gather in sufficient numbers to alter the structure of the membrane itself; designing drugs to affect the function of membrane proteins via an indirect modification of the properties of the membrane environment of the protein has been proposed. This mechanism is clearly theoretically possible as it is well known that the function of membrane proteins is modulated by their membrane environment [21,663–667]. Additionally, drugs can have a mode of action that involves solely the disruption of specific membranes; the desired activity of the drug does not, in any fashion, involve affecting the behavior of proteins. Regarding this tantalizing possible drug design strategy, it must, however, be determined whether this kind of affect can be induced with a realistic concentration of the drug molecule; the same question is relevant when alteration of a membrane structure is considered as an unwanted side effect (Figure 13).

This question was first addressed, through experimental studies of the anesthetic isoflurane in physiologically relevant concentrations of 1 and 5 mM in 4 membrane types, using multiple methods including fluorescence microscopy, fluorescence recovery after photobleaching, and patch-clamp measurements, among others. The results of these studies indicated that isoflurane, at this concentration, significantly decreases the extent of lipid ordering [304]. In erythrocyte ghosts, i.e., erythrocytes with their contents removed, the effect of the presence of a 1 mM concentration of isoflurane was found to be more substantial than that of 52.2 mM of ethanol. In another experimental study, the anesthetic phenyl-ethanol

was shown to affect the properties of hippocampal membranes at a drug/membranevolume ratio as low as 0.008% [306]; this indicates that the modification of membrane properties as a mechanism of drug action could actually be feasible. Finally, some drugs accumulate in specific cellular organelles, e.g., cationic amphiphilic drugs accumulate in lysosomes, organelles with an internal pH of ~4–5, perturbing the lysosomal membrane [668,669], thus decreasing cellular viability [670]. Among these drugs are numerous psychotropic drugs, which have shown potency as anticancer therapeutics [671,672].

**Figure 13.** Chemical structure of drugs introduced in Section 5.

From what has been discussed so far in this review paper, regarding the use of MD simulation to study the behavior of drug molecules in lipid membranes, it should be clear that MD simulation can also act as a tool to study the effect on membrane properties of drug molecules that have entered the membrane; MD can model a small microcosm of the large scale global structural changes being made to the membrane. Through MD, it is also possible to measure changes to global properties of the membrane due to the presence of specific new foreign molecules; how this is determined through analysis of the MD trajectory is described in the following section.

#### *5.2. Relevant Aside: Membrane Properties That Can Be Measured in MD Simulations*

When studying the effect that drug molecules can have on lipid membranes, MD simulations can provide insight into numerous global structural, elastic, and dynamical properties of lipid bilayers [673,674]; however, in the context of drug–membrane interactions, one usually only calculates a few specific parameters. The results for the surface area per lipid molecule and membrane thickness provide information regarding the lipid bilayer size. The surface area per lipid molecule is easily calculated from MD simulations by dividing the simulation box size in the membrane plane by the number of lipids in a single leaflet of the bilayer. There is no unique definition of bilayer thickness; however, thickness can be estimated from the so-called P–P distance, the distance between the average positions along the membrane normal of the phosphorus atoms of the phosphate groups in opposite leaflets. Both of these parameters, the area per lipid and P–P distance, can be obtained from both X-ray and neutron scattering experiments [675–677]. Order parameters can be defined and calculated to describe the extent of ordering in the hydrocarbon chains; the most frequently used is the SCD deuterium order parameter (Figure 14) that can also be measured experimentally through NMR spectroscopy [678,679]. Thickness and surface area are both global parameters, i.e., they represent the large-scale structure of the membrane; thus, their values will usually not be significantly affected by the presence of drug molecules at low concentration. Nevertheless, through MD simulations, these parameters can be calculated for the local region of the membrane around a specific drug

molecule, i.e., by looking at a tiny microcosm of the membrane we can study the effect of a high concentration of many drug molecules entering the membrane.

**Figure 14.** (**A**) Profile of the order parameter SCD for lipids located in three zones: distances 0–1, 1–2, >2 nm from a drug molecule (reproduced with permission from ref. [187]). (**B**) Profile of the lateral pressure in bilayers composed of DPPC, DPPC and cholesterol and DPPC and other steroids (reproduced with permission from ref. [680]).

One more useful parameter is the profile of the lateral pressure (pressure in the direction parallel to the membrane surface) along the bilayer normal [680,681]. It has been hypothesized by Cantor (1997) [682–684] that changes in the lateral pressure profile of the bilayer may affect the structure of membrane proteins, affecting their functionality. The local pressure inside a lipid bilayer can reach a value of as high as 1000 bar (Figure 14); it is thus clearly plausible that the lateral pressure from the membrane can affect protein structure. Finally, MD simulations can provide information that describes the properties of the water–membrane interface. For example, increased membrane hydration due to the presence of nonsteroidal anti-inflammatory drugs has been suggested as a possible cause of some of the side effects of these drugs [261,278]. Moreover, it has been shown that changes in the orientation of the lipid headgroups, involved in signaling, can affect the binding behavior of peripheral membrane proteins [685,686]; the design of drugs to alter the orientation of lipid headgroups can thus be seen as a drug design strategy to alter their signaling and thereby the related metabolic pathways.

#### *5.3. Unwanted Side Effects of Drugs Due to Their Alteration of Membrane Properties*

While the possibility of drugs designed to affect membrane properties is indeed an attainable goal, where MD simulation has been brought to bear as a design tool, a more common issue remains the undesired effects that drug molecules can have on the

biomembranes they encounter, i.e., drug toxicity (aka side effects); MD simulation has also played a role in elucidating such phenomena. The most frequent effect that drugs have been found to have on biomembranes is a local decrease of the extent of ordering in the acyl tails of the lipids. Examples of such drugs include, e.g., acebutolol, oxprenolol, propranolol [194], and aminoadamantane [255]. A less frequent effect is an increase in the extent of lipid ordering; for example, the drug itraconazole increases the ordering of the acyl tails in the upper part of the chain, close to the water–membrane interface (regions 2 and 3) [186].

In most cases, for example, the aforementioned itraconazole, the mechanism of action of the drug is related to the inhibition of enzymes, channels, or receptors rather than their effect on a specific lipid membrane; alterations of the membrane properties caused by the drug are thus not relevant or even possibly an unwanted side effect. For example, glycyrrhizic acid is a saponin found in licorice root that is proposed as a potential component of drug delivery formulations, due to its ability to form complexes with a wide range of hydrophobic molecules. Using MD simulations, it has been demonstrated that glycyrrhizic acid locates inside the lipid bilayer (region 1 and 2) and significantly decreases the extent of lipid tail ordering, even at low concentration (Figure 15) [452,453]; this phenomenon could limit the pharmaceutical applications of this compound. Moreover, the gastrointestinal toxicity that has been observed in many nonsteroidal anti-inflammatory drugs, designed primarily as inhibitors of prostaglandin–endoperoxide synthase, is, at least in part, the result of the effect the drug has on membranes [687–689]. Furthermore, the cardiotoxicity of nonsteroidal anti-inflammatory drugs is in part induced by drug–lipid interactions [690]. In addition to demonstrating that certain molecules have unwanted effects on lipid membranes, as described above, MD simulation has found evidence of innocence on the part of several drug molecules; for example, lipid-like potential drugs N-arachidonylglycine and oleoyl-L-carnitine have been shown not to affect membrane properties [137].

**Figure 15.** Snapshot of a lipid bilayer with a single glycyrrhizic acid molecule (**a**), and profiles of order parameter for lipids near and far from glycyrrhizic acid molecule (**b**). Reproduced with permission from ref. [453].

#### *5.4. Drug Membrane Interaction Can Play a Role in the Mechanism of Drug Action—Case of Anesthetics*

Now, we address the possibility that drugs could influence the membrane structure in a fashion that is desirable, i.e., an element of a drug's mechanism of action could be a direct alteration of the membrane's properties; MD simulation can be used in the same fashion to elucidate such positive effects in addition to the negative effects described in the previous section. The very first instance of the proposal of a mechanism of drug action that solely involved an interaction with the lipid membrane, rather than the active site of a protein, was the case of the theorized mechanism of action of general anesthetics. This hypothesis originates from the correlation between the efficacy of different anesthetics and

their respective oil–water partition coefficients (the Meyer–Overton correlation [691,692]); it was later found that their efficacy has an even stronger correlation with their membrane– water partition coefficients [693]. Thus, unsurprisingly, several MD simulation studies of the interaction between general anesthetics and lipids have been undertaken [694]. For example, results obtained from MD simulations of chloroform in DOPC and DSPC– cholesterol bilayers, representing liquid disordered and liquid-ordered phases respectively, have demonstrated a difference in the fashion with which the addition of chloroform affects their properties [695]. In the DOPC bilayer, chloroform induced a small increase in the extent of tail ordering, while, in the DSPC–cholesterol bilayer, it was seen to have the opposite effect, inducing a pronounced reduction in the extent of tail ordering; it is hypothesized that the preference for chloroform to interact with flexible acyl chains and not with rigid cholesterol molecules leads to this effect. As a result, the number of direct cholesterol–cholesterol contacts increased while the number of cholesterol–DSPC contacts decreased.

In MD simulation studies, four anesthetics, desflurane, isoflurane, sevoflurane, and propofol, were shown to have negligible influence on bilayer properties when present in a POPC bilayer [305]. Additionally, chloroform, halothane, diethyl ether, and enflurane in a DPPC membrane, at a temperature of 310 K, were shown to have a negligible effect on the extent of acyl tail ordering, with the exception of the case of diethyl ether, where a small increase in the extent of lipid ordering was observed [298]. The presence of diethyl ether and sevoflurane in the membrane leads to a decrease in the extent of ordering of the acyl tails for the case of both DPPC and PSM bilayers; this effect is more pronounced for the case of bilayers that contain ~50 mol% of cholesterol [308]. Interestingly, evidence has been found that aspirin, one of the most frequently used analgesics, can also disrupt liquid-ordered domains or prevent their formation altogether [271]. The results of MD simulations have indicated that aspirin molecules affect the structure of the acyl tails of the lipids up to a depth of 1 nm into the bilayer with 30 mol% of cholesterol; coherent inelastic neutron scattering experiments have demonstrated liquid-ordered domain disruption, i.e., transition from the liquid ordered to the liquid disordered phase [266]. The local anesthetics lidocaine and articaine have also been found to decrease the extent of ordering of the acyl tails when in their charged (ionized) form; the results, however, indicate that, in their neutral form, their effect on the membrane structure is negligible [289].

Several experimental studies have been carried out that support the conclusions of the above MD simulations. For example, it has been demonstrated that the addition of chloroform loosens the structure of membranes in both gel and liquid-ordered phases [696]. Recent studies performed using multiple experimental techniques (Super–Resolution Microscopy dSTORM, Patch–Clamp, Fluorescence Resonance Energy Transfer) have demonstrated that both chloroform and isoflurane disrupt lipid domains (rafts) that contain the ganglioside GM1 [299]. Another anesthetic, propofol, at concentrations in the range 1 µM to 5 µM, has also been found to destabilize nanodomains in cellular membranes via Binned Imaging Fluorescence Correlation Spectroscopy [309]. Thus, the significant perturbations of the structure of bilayers in the liquid ordered phase, observed in MD simulation studies [271,308], are in agreement with experimental observations.

The aforementioned studies mainly focused on the effect of anesthetic molecules on the properties of the lipids related to the ordering behavior of the hydrocarbon chains of the lipids, i.e., order parameter, phase behavior, bilayer thickness, and surface area per lipid; this is, however, not the entire story. In 1997, Cantor proposed that general anesthesia is related to an alteration of the lateral pressure profile [683], a parameter not easily measured experimentally but relatively easy to calculate from MD simulations. An MD simulation study of ketamine, in a POPC bilayer, found evidence that the presence of the drug molecule could possibly give rise to a significant alteration to the lateral pressure profile with only a very limited alteration to the membrane structure, e.g., the global properties of the membrane discussed previously: membrane thickness, surface area, or the extent of ordering of the acyl chains [311]. Another MD simulation demonstrated that two

other anesthetics, diethyl ether and sevoflurane in (1) DPPC, (2) DPPC–cholesterol 50 mol% (3) PSM, and (4) PSM–cholesterol 50 mol% bilayers, were found to significantly affect the lateral pressure profile [308]. It has also been demonstrated, through MD simulation, that the anesthetic chloroform (CHCl3) and the relatively similar molecule carbon tetrachloride (CCl4), a non–anesthetic, affect the membranes in strikingly different fashions, as measured by the effect on the lateral pressure and electrostatic potential profiles of the presence of one or the other of these two molecules in the membrane [300]. Experimental studies have also been carried out that provide evidence of the activation of transmembrane proteins as a result of alterations to the lipid composition, e.g., using the mild detergent Triton X-100 [541,542], the relationship between lateral pressure profile and lipid composition was investigated both experimentally [697] and through MD simulation [698]. A few MD simulation studies found specific effects due to direct lipid interaction on protein behavior e.g., [699–705]; in most cases, the cholesterol molecules in the membrane were, however, most frequently shown to play the role of modulating the behavior of membrane proteins [706–710].

The entry of certain molecules into the biomembrane can affect its properties in a more complex fashion than just altering the aforementioned parameters: it can affect the extent to which undulations are present in the membrane structure. In MD simulation studies of lipid bilayers using the coarse-grained MARTINI potential, it was found that the addition of chloroform can decrease the degree to which undulations are present in a membrane that contains ordered and disordered domains [301]. In the bilayer free of the anesthetic, the ordered domains in opposite leaflets were not registered, i.e., they were not across from one another. In this case, the bilayer was characterized by clearly visible undulations; the addition of chloroform led to the rearrangement of the bilayer, and ordered domains became registered and the bilayer thus became flat.

Although the above discussed computational and experimental studies demonstrate the possible membrane–mediated mechanisms of action of molecules designed as general anesthetics, namely the disruption of ordered lipid domains, such drug–membrane interactions cannot explain the sensitivity of the potency of anesthetic molecules to drug stereochemistry or single point mutations on the proteins involved in anesthesia [711]. Moreover, binding sites for anesthetics have been found from crystal structures of the membrane proteins [712–714], docking calculations [715], and MD simulation [307]. For these reasons the membrane mediated mechanism of anesthesia is frequently questioned [711], however, the above discussed studies provide clear evidence of anesthetic induced effects on membrane properties; additionally, evidence that the properties of the membrane influence protein functionality exist. Thus, a membrane-associated mechanism of action for general anesthetics cannot be discarded as one of the mechanisms involved in what is, admittedly, a complex process. Nevertheless, the interpretation of the Meyer–Overton correlation as an indication of a membrane-mediated mechanism is incorrect; we return to this issue later, in Section 6.2.1; this point will become clearer to the reader when they have read this section.

#### *5.5. Can Drugs Prevent Amyloid Formation via the Modification of Membrane Properties?*

The formation of amyloid plaques deposited on the extracellular membranes of neurons is one of the primary features of Alzheimer's disease [716]. This phenomenon is affected by specific lipids including cholesterol, sphingomyelin, and gangliosides [717,718]: lipid types typically present in the outer leaflet of the cell membrane. Amyloid fibers at a membrane surface have thus been frequently studied using MD simulations, e.g., [719–730], however, a surprisingly small fraction of this work has been performed in the context of drug design [731,732]. Khondker et al., proposed designing drugs with a mode of action that involved modulating membrane properties to affect the aggregation of amyloid-β25- 35 at the bilayer surface [229]. They wished to study the modulation of the membrane thickness, ordering, and other properties, as possible modes of drug action; thus, they selected three molecules to study: curcumin, acetylsalicylic acid, and melatonin. Using MD

simulations, they demonstrated that curcumin decreases membrane thickness, decreasing the extent of membrane ordering and rigidity; however, acetylsalicylic acid increases membrane thickness, thereby increasing membrane ordering and rigidity while melatonin does not affect the membrane properties at all; it was thus verified that these three molecules represent the needed set of examples of molecules that affect membranes in different ways. It was then determined experimentally that the decrease in the volume fraction of the cross-β sheets was ~70% for the case of curcumin in the membrane; evidence was thus found that decreasing membrane thickness in the vicinity of the amyloid-β25-35 molecules could possibly be a mechanism of drug action for the treatment of Alzheimer's disease.

#### *5.6. A Clear Case of Drug Membrane Interaction as Mechanism of Action—Antimicrobial Agents*

Antimicrobial agents represent a clear case of an established therapy where the global alteration of the properties of specific membranes is the clear mechanism of drug action. In general, biomembranes have a high degree of stability under physiological conditions and possess the capacity to adapt to extreme changes in their environment via changes in the lipid composition, e.g., [733–736]; evolution has provided biomembranes with several tools to preserve themselves over a wide range of conditions. They are, however, not completely invulnerable: numerous chemical agents and certain external conditions can disrupt their structure. The lipid bilayer structure emerges from weak, nonspecific, interactions and, additionally, the structure of biomembranes is formed by the cell, in driven processes and can include lipids that do not spontaneously form bilayers; it is thus unsurprising that they have vulnerabilities to several sources of disruption. The structure of a lipid bilayer can be compromised through the formation of pores that perforate the membrane, due to the presence specific molecules or physical stimuli, e.g., electric field [737], ultrasound [738,739], and charge imbalance [740,741]. Finally, a lipid bilayer can be dissolved completely at the molecular level by, e.g., detergents [742,743] or organic solvents [744].

Procaryotic (bacterial) and eurkaryotic (animal, i.e., patient) membranes differ substantially regarding the lipids of which they are composed. The extracellular leaflet of a eukaryotic cell membrane is comprised of cholesterol, saturated phospholipids, and sphingolipids, while procaryotic membranes contain mainly phosphatidylethanolamines, phosphatidylglycerols, and cardiolipin. This substantial difference, combined with the fact that some molecules can disrupt a cell membrane catastrophically enough to lead to cell death, opens the door to a tantalizing opportunity: the design of molecules that severely disrupt the structure of bacterial membranes while leaving eukaryotic membranes, i.e., the membranes of the cells of the patient being treated, relatively intact: they act as an antibiotic. The bacterial membrane is, however, not the only possible target for peptides designed to destroy specific lipid membranes. A recent development is the emergence of a second important target: cancer cells. The overall lipid composition of the cancer cell resembles that of normal cells, however, the asymmetric distribution of lipid classes between the inner and outer leaflets, normally found in eukaryotic cell membranes [745], is lost; they instead display a substantial concentration of phosphatidylserine in the outer leaflet. The selective disruption of lipid membranes rich in phosphatidylserine becomes a possible mode of action for cancer therapy: anticancer peptides [746,747] or drugs affecting the membranes of cancer cells [748].

Peptides that selectively compromise bacterial but not eukaryotic (host/patient) membranes, are a large and diverse group of potential drugs with a long history of research [749]. Currently, the term "antimicrobial peptides" is used as an umbrella term for peptides that display a wide range of different activities against different pathogens. For example, the APD3 database (2016) of natural antimicrobial peptides includes 2169 antibacterial, 172 antiviral, 961 antifungal, 185 anticancer, 307 hemolytic, 80 antiparasitic, 11 spermicidal, 27 insecticidal, and four anti-protist peptides [750]. The current number of antimicrobial peptide sequences deposited in "DBAASP v3: the database of antimicrobial/cytotoxic peptides" is 16,633 (01.02.2021); this number includes over 12,000 synthetic peptides [751]. Interestingly, over 3200 peptides have been the subject of MD simulations. Finally, dbAMP

database contains over 12,000 entries [752]. There are at least six other databases dedicated to antimicrobial peptides [753–758] (for review, see [759,760]). Surprisingly, the number of clinical studies of antimicrobial peptides is, as of yet, relatively low (only 76) and the majority of these have ended in failure [761]; so far, only seven peptides have been approved for clinical use by the FDA [762]. However, the literature concerning antimicrobial peptides is extensive; prior comprehensive reviews of various aspects of antimicrobial and anticancer peptides have been published [763–785], including reviews focused specifically on computational studies [46–54]; we provide only a brief overview of this topic. Table 4 includes all antimicrobial peptides discussed in this paragraph.

Molecular dynamics simulations are frequently used to provide information regarding (1) peptide conformation (e.g., % of helicity) [786–798], (2) peptide location (at the membrane interface or inserted into the bilayer core), and (3) orientation in the membrane [783,787–790,792–794,796,799–806]. As we stated previously, simulations can also provide information regarding the effects of the peptides on membrane properties, including (1) the extent of ordering in the acyl tails, (2) the overall membrane thickness, and (3) curvature [793,794,799,800,807] as well as the formation of membrane defects [794]. MD simulations and other computational methods, combined with complementary experimental methods, can be used to design more effective antimicrobial [783,798,808–811] and anticancer peptides [747,789,812–814]. For designing new peptides, an understanding of the interactions between peptides and lipids is of particular interest. The results of MD simulations can provide explicit information concerning, for example, (1) hydrogen bonds, (2) hydrophobic contacts, and (3) electrostatic and Van der Waals interaction energies [792–794,799,800,815]. For example, studies of pardaxin in the membrane have demonstrated the importance of cationic residues and phenylalanine residues on peptide association with lipids [806]. Finally, MD simulations allow for the study of the effect of chemical modification on the peptides (e.g., amidation) [787] and the effect of helical kink on peptide insertion into the bilayer [816,817].

The above discussed studies considered only peptides statically located at a membrane; the main mechanism through which antimicrobial peptides kill cells is, however, the formation of transmembrane pores. Simulating pore formation is computationally expensive; thus in a few older studies, to avoid long simulations, antimicrobial peptides were initially placed in the lipid bilayer core [818–820] or simulations were performed using the coarse-grained MARTINI model [821–825]. A few studies set out to elucidate the entire process of pore formation; however, the protocol of the MD simulations was altered e.g., by running simulations at an artificially high temperature or through a biased selection of the initial configuration. For example, Sun et al., modeled the formation of transmembrane pores, created by melittin, by initializing the simulated membrane with artificially created defects already in place [826,827]. The N-terminus of the peptide stabilized these defects, resulting in their transformation into small pores following the recruitment of two additional peptides. Wang et al., performed MD simulation at temperatures in the range 80–120 ◦C; this allowed for the observation of pore formation by the peptide maculatin [828]. Interestingly, they observed the formation of every variety of possible oligomers, from dimers to octamers. In simulations of the antimicrobial peptide pleurocidin, using a model with all atom resolution, the formation of a small pore composed of two helices was observed; however, in much longer coarse-grained simulations a larger number of helices were seen to be recruited to form a single pore [829].

Unbiased MD simulations of transmembrane pore formation were used to design a new 14 residue long peptide, LDKL, which was then modified to include input from the analysis of known antimicrobial peptides [830]. The final peptide, LDKA, contained only four amino acid types: (1) leucine, (2) aspartic acid, (3) lysine, and (4) alanine. Large pores formed by the LDKA peptide consisted of two overlapping pores formed separately in both membrane monolayers and occurred at low peptide/lipid ratios (1:1000). The ability of the peptide to kill both Gram-positive and Gram-negative bacteria without harming erythrocytes has been confirmed experimentally [830].

The results of MD simulations of magainin 2 in its native form and its covalently bound dimer, provided evidence for the formation of a disordered toroidal pore. The concentration of the peptide necessary for pore formation was substantially lower for the dimeric than the native form of magainin [831].

Calculations of the free energy of collective pore formation using umbrella sampling (Figure 16A) found that six melittin molecules can stabilize a pore; formation is associated with a small free energy (activation) barrier of ~10 kJ/mol. For seven peptide molecules, the barrier is absent; however, after pore formation, one molecule was seen to be diffusing away from the pore. When the number of melittin molecules is lower than six, the free energy barrier becomes significant, but five peptide molecules were also able to form a stable pore [832]. In another study, the free energy for the insertion of a single melittin molecule into a bilayer containing 0 to 6 peptide molecules was calculated (Figure 16B,C), a pore composed of three or more peptide molecules was seen to form [833]. The free energy barrier is reduced when a larger number of peptides is present in the bilayer and almost disappears, i.e., is close to 0 kJ/mol, when six peptides are present in the membrane. Atomistic MD simulation studies have also found that synthetic polycations [834], itraconazole [258], and DMSO [258] induce a decrease in the free energy barrier against pore nucleation.

**Figure 16.** (**A**) Snapshots of stages of pore formation along the collective variable ξ. Reproduced with permission from ref [832]. Snapshot demonstrating the insertion of a single helix into the membrane (colored by residue polarity) where pores formed of (**B**) 3 and (**C**) 5 helices are present. Reproduced with permission from ref [833].

It is worth noting that the results of calculations of the free energy of pore formation are strongly dependent on the force field used [258,835]; other methodological issues related to the simulation of membrane active peptides are (1) the size of the simulation box, (2) electrostatic interaction treatment, (3) initial conditions, and (4) the simulation protocol used [836].

Peptoids, N-substituted glycine oligomers, are a possible alternative class of molecules to antimicrobial peptides with the advantage of increased flexibility in functionalization via the presence of an amide bond [837]. Peptoids have not been studied with theoretical methods in the context of their membrane activity, but rather in the context of their applications in nanotechnology, e.g., the study by Jin et al., in 2016 [838]. Recently, a coarse-grained force-field created within the MARTINI framework [605] for peptoids was developed [839]. Nevertheless, the interactions between peptoids and membranes have been investigated experimentally [840]. These studies have found that antimicrobial peptoids increase the permeability of bacterial membranes; this effect is increased via cyclization [841]. Moreover, peptoids have recently been found to be potent antiviral agents and have been successfully tested for activity against the SARS-CoV-2 virus [842].

Another class of drug molecules derived from peptides that has been proposed for use as an antimicrobial agent are the β 2,2-amino acid derivatives. In a simulation study carried out by Koivuniemi et al., MD simulations of these derivatives interacting with both prokaryotic and eukaryotic model membranes found evidence of important differences in their behavior in the two membrane types [843]. Additionally, antimicrobial peptides can have their desired properties enhanced through the introduction of non–natural amino acids, e.g., the azoALY peptide derived from ALY (full sequence: ALYLAIRKR) by functionalization of the sole tyrosine residue present with an azobenzene group [844]. Both MD simulations and experimental studies have found evidence that this modification results in a peptide with an enhanced interaction with lipid bilayers. Finally, peptides conjugated with dendrimers have been proposed as a novel form of potential antimicrobial compound; MD simulations have demonstrated the presence of strong interactions between these molecules and bilayers composed of POPG [802].

In the above discussed studies, the bacterial membranes were modeled as simple mixtures of either PC or PE with PG lipids, in spite of the fact that lipopolysaccharides (LPS) are known to be present in bacterial membranes; aspects of the membrane structure, behavior, and interactions with other molecules resulting from the presence of LPS in the membrane were not explored. The reason for this is that when LPS molecules are included in the membranes, the size of the system and timescale needed for system equilibration becomes too large for the system to be tractable for MD simulation with a model with all atom resolution. Theoretical studies have, however, been carried out that indicate LPS plays a protective role against antibacterial peptides, e.g., magainin [845].

Antimicrobial peptides are not the only group of chemical compounds capable of selectively disrupting the bacterial membrane; for example, recent studies of potential antibiotics kanamycin A [133] and bithionol [247], provide evidence that these drugs have an affinity towards model membranes that mimic the bacterial membrane and either do not interact with, or only weakly affect, model membranes designed to mimic eukaryotic membranes. Another set of potential antimicrobial compounds were synthetized on the basis of cholic acid esterified with three glycine and an aliphatic alcohol of length 1, 6, or 12 carbons; MD simulations found evidence of the presence of strong interactions between these compounds and bilayers composed of lipopolysaccharides from Gramm– negative bacteria [846]. The strongest effects on bilayer properties were observed for the compound with the six-carbon long alcohol. Membrane perturbation by antibiotics (e.g., fluoroquinolones and aminoglycosides) was proposed as an additional bactericidal mechanism for these drugs [847]; recent studies of aminoglycosides, macrolides, and fluoroquinolones determined that these antibiotics have a disordering effect on bacterial membranes; however, they found no evidence of large-scale membrane disruption or pore formation [848]. Interestingly, colchicine, a drug frequently used in chemotherapy, is capable of inducing pore formation in lipid bilayers [849].

#### *5.7. Other Effects on Lipid Layers—Pulmonary Surfactants and Indirect Effect on Membrane Proteins*

Drug molecules have also been shown to affect the properties of lipid monolayers, e.g., disrupting lung surfactants [264]. To condense pulmonary surfactant monolayers doped with ketoprofen, higher surface pressures were necessary in comparison to the case of the pure monolayer [263]. The difference was found to increase with decreasing surface area per lipid. This observation was in qualitative agreement with data obtained from Langmuir– Blodgett monolayer experiments. A similar result was obtained for pulmonary surfactant monolayers doped with levofloxacin [238]. The increase was more significant when the model was simulated with an elevated value for surface pressure. Additionally, the presence of levofloxacin in the membrane had a disordering effect on the acyl tails; again, the effect was more substantial for the model simulated with a higher surface pressure. Glycerol is often used in the formulation of pulmonary drugs and the liquid medium in electronic cigarettes. It has been demonstrated, both experimentally and through MD simulation, that glycerol affects lipid monolayers and bilayers; its presence leads to significant monolayer expansion, even at the low *w*/*w* concentration level of 1% [424].


**Table 4.** Antimicrobial peptides.


**Table 4.** *Cont.*

#### **6. Role of Membrane in Substrate (Drug) Selection of Membrane Proteins**

*6.1. Membrane Proteins: The Majority of Drug Targets*

In addition to direct drug–membrane interactions, lipid membranes also play a role in the interaction between substrates and membrane–associated proteins, with both substrate– membrane and protein–membrane interactions playing roles in the substrate, and thus the drug selection mechanism of membrane-associated proteins. Since membrane–associated proteins are, in fact, the majority of drug targets [880,881], this issue should be central to drug design. However, it can be argued that, so far, it has not received the attention it deserves (Figure 17).

**Figure 17.** Chemical structure of drugs introduced in Section 6.

Membrane associated proteins can be subdivided into three categories: (1) multi– pass (integral) membrane proteins, where the ligand/substrate binding site is frequently within the membrane core (39% of membrane proteins [882]), (2) bitopic (single-pass) membrane proteins with one or more domains outside the membrane but anchored to the membrane through a single trans-membrane helix, with the functional domain, thus

the active site, located outside the membrane (36% of membrane proteins [882]), and (3) peripheral membrane proteins, with no permanent association with the membrane (16% of membrane proteins [882]); however, for catalytic activity that involves interaction with a membrane, catalysis occurs at the membrane surface.

There is what can arguably be described as a fourth variety of membrane associated proteins that we, however, do not discuss in this review paper as they have not been studied to as significant an extent: 9% of membrane proteins are covalently bound with lipids; a common example is a glycosylphosphatidylinositol (GPI) molecule bound to the C terminus of the protein, known as a GPI-anchor [882]. As is the case with bitopic membrane proteins, the functional domain of the protein is located completely outside the membrane but is bound to a lipid membrane through an anchor.

The term "multi-pass" to describe integral membrane proteins refers to the structure of the trans-membrane domain, composed of several (up to 14 [883]) *α*-helices as the protein threads back and forth through the membrane; another class of integral membrane protein exists with a β–barrel rather than a set of *α*-helices as the trans-membrane domain. However, such proteins are not common [884] and have not been used as drug targets, thus they are not discussed in this review; yet, it should be mentioned that β-barrel membrane proteins are predominately found in bacterial membranes, e.g., the outer membrane of gram negative bacteria (in eukaryotic cells only in mitochondria and chloroplasts), thus could possibly in the future be potential targets for antibiotics, though, as far as we are aware, this is not yet the case.

A mechanistic understanding of the role the lipid plays in the substrate selection of membrane proteins is a key element of the design of drugs that target them, and MD simulation has a central role to play in obtaining this; we now discuss examples where MD simulation has obtained insight relevant to drug design, for all three varieties of membrane associated proteins.

In contrast to multi-pass (integral) membrane proteins, where considerable work has been performed, in particular the GPCR class of membrane proteins e.g., [885–899] (for review see references: [31,900–903]), that are a very hot topic, peripheral and bitopic (single pass) membrane proteins, which constitute 43–45% of transmembrane proteins [883], are underrepresented in MD simulation studies.

One of the reasons for this is that obtaining experimental insight into the nature of the membrane–protein interaction for weakly membrane associated proteins is extremely challenging. Obtaining structures from x-ray crystallography or cryo-EM, while less straightforward than for the case of water-soluble proteins, is still possible for integral membrane proteins within membrane-like nanodiscs [904–906] or a detergent environment [907–909]; obtaining such information for the more weakly associated proteins directly from experiment is challenging and rarely achieved. The rare cases of success in this regard represent a Herculean effort: one must work with incomplete structures combining fragments originating from several PDB entries and, frequently, the assistance of theoretical methods is necessary (e.g., homology modeling [910–912] and modeling of short loops [913]). Moreover, extracellular domains of bitopic receptors are heavily glycosylated and the precise sequences of these complex carbohydrate branches have, in most cases, not been determined. The few MD simulation studies of glycosylated receptor proteins that have been carried out have demonstrated that glycosylation determines the behavior of these receptors [914–916]. Interactions of these latter two classes of membrane proteins with lipids differ significantly in comparison to multi-pass (integral) membrane proteins since interactions with the lipid headgroups become dominant over interactions with the membrane core. As mentioned in the beginning of this review, the solution of the structure from sequence problem by the Alphafold project [1] completely changes the game; initial results show promise for even the structure of bitopic membrane proteins, so there may soon be rapid progress in this area.

#### *6.2. Multi-Pass (Integral) Membrane Proteins*

#### 6.2.1. Our Discussion Follows the Framework of Vauquelin

Multi pass membrane proteins are the class of membrane proteins that have been experimentally and computationally studied the most in their context as drug targets, primarily due to their functions as receptors (40% of multi pass membrane proteins), channels and transporters (another 24%), and enzymes (yet another 16%). The importance of this group of proteins may be exemplified by the Nobel prizes awarded for the study of various aspects of an important family of 901 (in human) integral membrane proteins [917], the G coupled receptors, including Nobel prizes in Medicine and Physiology: 1967 (studies of rhodopsin), 1988 (discovery of β-blockers), 1994 (discovery of G-protein), 2004 (odorant receptors), and even a Nobel prize in Chemistry: 2012 (3D structure of GPCRs) (GPCR) [918–922]. Issues relating to the design of drugs to target integral membrane proteins are covered in several already published reviews [923–926].

In describing the role played by MD simulation in the study of multi-pass membrane proteins, in their context as drug targets, we will follow the categorization scheme developed by Vauquelin, presented in ref [927], shown as a schematic in Figure 18, i.e., we draw attention to the contribution that MD simulations and related methods have made towards understanding drug–membrane–target protein relationships using their classification scheme, as a framework for our discussion. The scheme is summarized (see Figure 18) as: (1) the effective increase in both the kinetics and affinity due to drug accumulation in the membrane and the resultant reduced dimensionality of drug diffusion (Figure 18A–C); (2) how the membrane affects the entry of the ligand into the binding site (Figure 18D); (3) the role the membrane plays in the ligand interaction with the protein (Figure 18E); and (4) the possible design of hybrid drugs connecting orthosteric and allosteric pockets (Figure 18F). Our discussion will now follow this framework in order.

**(1) The accumulation of drug molecules in membranes significantly increases the local concentration of the drug molecule in the vicinity of the targeted membrane proteins in comparison to bulk solution, thus increasing the apparent affinity of the drug for the active site and making the drug kinetics more favorable (Figure 18A).**

Due to the high affinity of drugs for the lipid membrane, the release of the drug from the membrane environment is slow and the local concentration of the drug remains enhanced in relation to its bulk concentration in the solvent medium. This results in drug molecules being predominantly confined to the two-dimensional environment of the membrane that the protein is associated with, rather than the three-dimensional environment of the bulk solution around the protein. Thus, as a result of geometry alone, diffusion confined to two dimensions is significantly more limited in comparison to diffusion in three dimensions; the drug is more likely to remain in the vicinity of the protein with an increased chance of binding again as a result of locating preferentially to the lipid membrane as opposed to being randomly distributed in the bulk solution. This phenomenon is known as "rebinding" and has been documented experimentally [928] and considered in models of drug kinetics [929]. In order to take this into account, micro–pharmacokinetic models, which consider local drug concentration, have been proposed.

Extensive studies of ligands of the β2-adrenergic receptor have demonstrated that lipophilic drugs that accumulate on the lipid bilayer are characterized by a higher rate of drug−target protein association (*Kon*) in comparison to the more hydrophilic drugs with the same pharmacophore [930]. On the other hand, the dissociation rate (*Koff*) was not found to be dependent on drug lipophilicity [930], as lipids do not affect protein–drug interactions inside the binding site. Subsequent studies that used fluorescently labeled propranolol found an accelerated accumulation of the drug on the cell membrane and suggested that the actual affinity of propranolol for the β2-adrenergic receptor was significantly less than previously anticipated, based on a standard measurement assuming a uniform distribution of the drug in the volume of a sample [931].

**Figure 18.** Accumulation of drugs in the membrane and its consequences for drug–protein interactions. Based on Figure 2 from ref. [927].

In addition to the kinetics of binding, this phenomenon will affect the effective affinity, of the binding site, for the ligand; it will be a factor that is missed by conventional docking and scoring studies that do not take the membrane into account. While this will not affect the enthalpic term of the binding affinity, it will have a marked effect on the entropic component; bias of drug molecules to confinement to a two-dimensional plane rather than the three-dimensional bulk medium lowers the entropic penalty of binding. Techniques that use MD simulation with a force bias, that we discussed previously, can be used to obtain a quantitative estimation of this effect. For example, a deconvolution of the membrane and protein binding contributions, i.e., separating the contributions to affinity due to the membrane and the binding site, performed for β2 adrenergic receptor ligands, improved the accuracy of the results of a structure–activity relationship analysis [932].

We can now, in light of the discussion of the activity of membrane proteins, return to our discussion of the mechanism of action of anesthetics. The aforementioned Meyer–Overton correlation is the result that more lipophilic anesthetics have a tendency to produce stronger effects. The correlation can be most easily explained through a micropharmacokinetic model. However, this interpretation, while the most elegant, does not support a membrane mediated mechanism for the action of anesthetics but rather suggests the involvement of membrane proteins; nevertheless, the possibility that modification of the membrane properties plays some role in a more complex mechanism of action for anesthetics is not excluded.

**(2) The accumulation of drugs in the membrane will affect the entry pathway of a ligand into the binding site as this will create a bias, increasing the likelihood that drugs approach the protein from within the lipid membrane (Figure 18D).**

As discussed previously, the accumulation of ligand molecules in the membrane will increase both the entrance frequency and apparent affinity of ligands for the active site; this is, however, not the only effect of the preferential location of ligands to the membrane: the path taken to enter the protein will also be affected, resulting in further inaccuracy in the standard "lock and key" paradigm that does not take the membrane into account. In many cases, the active site is entirely within the membrane core, with entry being made from the lipid phase into an opening between trans-membrane helices. In fact, the direct entry of a ligand from a membrane into a receptor can be anticipated based on the crystal structure of the receptor. For example, for the case of the sphingosine 1–phosphate receptor 1 the extracellular opening is blocked by the N–terminus and extracellular loops [933]; the entry location for ligands is rather from within the membrane core to a receptor deep within the lipid phase of the membrane between trans membrane (TM) helices TMI and TMVII. For the protein Opsin, the retinal binding site opens directly into the membrane core at two points between TM1–TM7 and TM5–TM6 [934]. The two openings and the central cavity, constitute a channel through the protein, with a length of 7 nm and a width of 1.16–0.32 nm. The entry of a ligand into the binding pocket has been studied via MD simulations, in some cases using force bias methods and these will be discussed in detail later in this section. Membrane mediated ligand entry/exit pathways to the binding pocket have been discussed in previously published review articles for the case of GPCRs [935,936], but also for the larger group of membrane proteins that includes GPCRs, ion channels, and transporters [41].

**(3) Drug partitioning into the membrane can alter the conformation of the drug molecule, thus affecting its interaction with the binding site of the target protein.**

When immersed in the hydrophobic environment of the membrane core, the situation is roughly equivalent to immersion in a non-polar solvent. Many molecules exist that are soluble in both polar and non-polar solvents and achieve this through changing their conformation; it is thus possible for a molecule to exist that first enters the membrane, then alters its conformation within the membrane core and, only at this point, with its new conformation adopted within the hydrophobic environment of the membrane core, has a structure with a high affinity for a specific binding site of a membrane protein (Figure 18E). Through MD simulations, we can directly observe such behavior, for example, the candesartan antagonist of the GPCR AT1 receptor changes conformation after entry into the membrane [193]. Another example is the cannabinoid analog AMG3. This molecule is in possession of six flexible bonds, thus allowing for significant conformational change; MD simulations of AMG3 in a lipid bilayer were used to find the optimal conformation for binding to both the CB1 and CB2 receptors [338].

**(4) The exosite model (Figure 18F) assumes that the drug molecule is composed of two linked component ligands: the first binding to the orthosteric pocket and the second to the allosteric site [937].**

The allosteric sites are frequently located at the protein–lipid interphase and allow for increased specificity in comparison to a drug that targets only the orthosteric (binding) pocket. Due to the highly conserved structure of orthosteric pockets, e.g., among GPCRs, it is difficult to find a highly selective ligand based on this criterion alone; an inclusion of the allosteric site into the computational drug design approach allows for the necessary targeting specificity. The use of the allosteric site in addition to the orthosteric pocket in drug design has been referred to as the "exosite model" and "the design of bitopic ligands/drugs" by two separate research communities [938,939]. Modeling methods, including MD simulations, are one of the tools used in the design of bitopic drugs [900,940,941]; however, in these studies lipids are only considered as a passive component and their complete role in substrate selection is not fully elucidated.

6.2.2. Exploring the Complex Pathways to the Active Sites of Integral Membrane Proteins

The time scale that can be investigated through MD simulations is a limiting factor for a direct observation of ligand entry into the binding site of receptors or enzymes; only a

few studies have thus addressed this problem with unbiased MD simulations. A study of histamine entry into the H4 receptor found evidence of two possible entry pathways [407], one of which overlaps with the previously determined Na<sup>+</sup> entry pathway [556]. In extensive simulations of alprenolol, an inhibitor of the β2 adrenergic receptor, the entrance of the drug into the binding pocket of the receptor was observed 12 times; entry, however, occurred from the water phase despite strong preferences observed for the drug molecule to locate to the lipid phase and remain there for the majority of the simulation [195]; in this case, the membrane acts as a drug reservoir, and its mechanism described in Figure 18A. The ligand entrance into the protein binding site, directly from within the lipid core of the membrane, was demonstrated for the example of the cannabinoid sn-2 arachidonoylglycerol (2-AG) and the CB2 receptors [455]. The entrance gate for the ligand locates between TM6 and TM7. The entry of the ligand into the sphingosine-1-phosphate receptor-1 was characterized using multiple, microsecond long simulations [942]. The entry path began in the vestibule at the top of TM7, then preceded to enter the protein between TM7 and TM1. A similar MD study of the binding and activation of Orexin–A was also carried out [943]. In studies of the dopamine D3 receptor, a set of 1000 short 20 ns simulations of systems, where dopamine was initially located in the water phase, found only 22 cases of direct entrance of dopamine into the binding pocket of the receptor [394]. In 736 cases, the dopamine molecule located to the membrane, settling among the lipids but not in contact with the protein; in another 180 cases, it located to the protein–lipid interface and in the remaining 62 cases located to the protein surface exposed to the water phase. These numbers closely correspond to the surface area of the membrane occupied by lipids and protein in the studied model; thus, this provides evidence that the protein has no inherent properties that attract the ligand, i.e., the relative number of binding events correspond to the relative surface areas, an unexpected result.

Flooding simulation was used to study the interaction between the *Gloeobacter violaceus* ligand-gated ion channel and the general anesthetic desflurane [307]. These studies demonstrated that the desflurane molecule enters a binding site, with a structure known from X-ray crystallography [712], via a membrane-mediated pathway. Moreover, simulations revealed an additional binding site not anticipated in previous studies. In this case, desflurane entered the binding site directly from the lipid core (Figure 19). In another flooding simulation study, menthol, a small compound extracted from mint that acts as a local anesthetic, bound to the α4β2 nicotinic acetylcholine receptor, to a binding site located at the lipid–protein interface via a membrane–mediated pathway [434]. In other flooding simulations of two anesthetics, chloroform and isoflurane, the molecules first entered the lipid bilayer and then subsequently diffused to the allosteric binding sites of the vanilloid-1 receptor (TRPV1) [297]. These studies identified five binding sites for chloroform and three for isoflurane, in spite of the fact that the overall affinity of the aforementioned drug molecules toward TRPV1 was relatively small.

In some cases, MD simulations determined only a fragment of the drug entry pathway or merely demonstrated the accumulation of the drug in the vicinity of possible entry points. For example, multiple 500 ns MD simulations of the drugs clozapine and haloperidol, acting as ligands of the dopamine D2 and D3 receptors, determined the pathway between the binding site and the receptor vestibule, where drugs were placed into the initial structure; however, the pathway between the vestibule and membrane remains unknown [322]. Amantadine is a drug used to treat influenza. At the molecular level, amantadine inhibits the M2 transmembrane proton channel, preventing virus budding. Through MD simulation, it has been demonstrated that the more hydrophobic derivative of amantadine, spiro[pyrrolidine2,2′ -adamantane], has a tendency to aggregate around the M2 channel to a greater extent than amantadine; this could be the mechanism behind the experimentally observed higher binding affinity of spiro[pyrrolidine2,2′ -adamantane] for the M2 protein in comparison to the case for amantadine [254]. Simulations of candesartan, an antagonist of the GPCR AT1 receptor, an integral membrane protein, found evidence that that the drug approaches the opening between helices 4 and 5 from where entry into the

binding site has a high probability [193]. This study then made the important observation that the two-dimensional diffusion coefficient of the drug, in the plane of the membrane, calculated from the MD simulation trajectories, was in excellent agreement with the diffusion coefficient measured experimentally using liposomes as a membrane model. This result further supports the 2D model of drug diffusion when approaching the previously discussed receptor. Finally, Kiriakidi et al., suggest that lipids affect the conformation of the candesartan molecule, potentially affecting the affinity of the drug for the binding site. Yet, for two lipid-like drugs, N-arachidonylglycine, and oleoyl-L-carnitine, the only possible pathway to access the binding site is from within the membrane. Extensive MD simulations of the glycine transporter GlyT2 embedded into the lipid bilayer found evidence for numerous possible drug interactions with the protein; unfortunately, this study was unable to determine the specific binding site responsible for receptor inhibition [137].

**Figure 19.** Partitioning of desflurane into the membrane and Gloeobacter violaceus ligand-gated ion channel studied using a flooding simulation. Snapshots of the simulated model at the (**A**) beginning and (**B**) end of the simulation. Desflurane molecules were colored according to their location: blue—water phase, green—lipid bilayer, red—ion channel. Reproduced with permission from ref. [307].

As stated above, the timescales involved limit the extent to which unbiased MD simulation can elucidate the interaction between potential ligands and integral membrane proteins. As a result, several groups have made use of a variety of force bias, and other enhanced sampling techniques to gain further insight; the situation is similar to that regarding membrane translocation discussed earlier, though the specific methods that are optimal differ. Studies of the β-adrenergic receptor using RAMD found five possible exit pathways of the receptor antagonist carazolol [200]. The most frequent entry/exit pathway was passing through the extracellular opening, i.e., the opening exposed to the solvent, at the top of the receptor; however, in 30% of runs, carazolol exited the binding pocket through clefts between the transmembrane helices. The most frequently utilized alternate pathway was the cleft between the TM4 and TM5 helices (TM4–TM5); other pathways pass between helices: TM5–TM6, TM1–TM2, TM1–TM7, and TM6–TM7 (see Figure 20). For the case of dopamine, D3 receptor RAMD simulations revealed an alternative pathway that passed through a lateral gate between helices TM5 and TM6 [394]. Subsequent free energy calculation using the umbrella sampling algorithm [93] demonstrated that there is no energy barrier along this pathway. Temperature accelerated molecular dynamics simulation was applied to the study of protease-activated receptor-1, a GPCR activated by the binding of a peptide; it was determined that an exit/entry patch of a ligand from the binding pocket directly faces into the membrane hydrocarbon core [944]. The two observed exit/entry points were located between helices TM4, TM5, TM6, and TM7. These simulations also found evidence for a third possible entry/exit pathway located between the extracellular loops of the receptor and the water phase.

**Figure 20.** Entry/exit pathways of carazolol observed in the RAMD simulations. Left and right panels show sets of simulations with different acceleration magnitudes. The color scale is from yellow at the beginning of the simulation to blue at the end of the simulation. The starting conformation of the receptor is shown in a cartoon representation. Reproduced with permission from ref. [200].

Most drugs bind to the orthosteric binding pockets of proteins where their natural ligands bind; however, it is possible for proteins to be modulated by a drug binding to an allosteric site, e.g., for the GPCR family, 7 allosteric pockets facing into the membrane hydrocarbon core have been found in 12 GPCRs [945]. Yet, the most recent studies of M5 muscarinic acetylcholine receptor revealed the presence of three allosteric sites including one novel site [946]. Recent studies provide an example of such an allosteric modulator of the β2-adrenergic receptor: compound AS408 was shown to bind between transmembrane helices TM3 and TM5 at the surface, directly facing the membrane core, thus stabilizing the inactive conformation of the receptor [197]. In MD simulations, Yuan et al., demonstrated the binding of the P2Y1 receptor antagonist, BPTU, to the extra-helical site between helices 1 and 3 [329]. The binding took place via the lipid bilayer in three steps: first, BPTU enters the water–membrane interface from the water phase; next, BPTU forms an initial set of interactions with the extracellular loop 2 of the receptor; finally, the drug translocates to the binding site. Similarly, the binding site for ZM241385 in adenosine receptor type 2A was identified at the protein–lipid interface [947]; in this case, lipids stabilized the binding pose of the ligand.

#### *6.3. Bitopic Membrane Proteins*

#### 6.3.1. Proteins with a Tenuous, but Permanent Connection to a Specific Lipid Bilayer

As described above, for the case of bitopic proteins, the protein is tethered to the membrane through a single transmembrane helix, but with the functional domain/domains, thus active site/sites, located outside the membrane; there is, however, considerable evidence that the lipid membrane plays a role in substrate selection [381]. Monk et al., [948] in the very first complete structure of a bitopic protein to be determined, already found evidence that the trans–membrane helix and linker segment controls the orientation of the catalytic domain relative to the membrane, thus indicating a role for the membrane in catalysis; bitopic proteins cannot be thought of as merely a catalytic domain loosely attached to the membrane like a balloon with the linker segment as a piece of string attaching it to a the trans membrane helix acting like a pin in a cork board: the linker segment and trans-membrane helix play an active role in positioning the catalytic domain relative to the membrane.

We will now focus on four cases where MD simulation has seen considerable success in elucidating the role the membrane plays in substrate selection (thus drug affinity) for bitopic membrane proteins: (1) cytochrome P450, (2) the membrane bound isoform of catechol-o-methyltransferase (MB–COMT), (3) monoamine oxidase, and (4) tropomyosin receptor kinase B. In many cases, for example MB–COMT, these are proteins with water– soluble in addition to bitopic, membrane bound, isoforms.

#### 6.3.2. Cytochrome P450

Cytochrome P450 (CYP) is a family of 41 (in humans) bitopic membrane enzymes responsible for both the catabolism (synthesis) and anabolism (degradation) of a variety of small molecules [949]. For example, the synthesis of cholesterol, steroid hormones, bile acids, vitamin D3, and second messengers derived from polyunsaturated lipids is performed by CYPs. CYPs are also responsible for the degradation of small molecules, e.g., neurotransmitters, and various xenobiotics, including about 50% of all approved drugs [950–954]. A review of MD simulation studies of CYPs is given in ref. [955].

The catalytic domains of proteins belonging to the cytochrome P450 family have been studied extensively, in a water environment without the trans-membrane helix present, using MD simulation and related molecular modeling methodologies in the context of both drug design (e.g., [956,957]) and their mechanism of catalysis (e.g., [958]). It has been argued that these studies can elucidate the catalytic mechanism of membrane bound proteins, since the catalytic site is placed in an internal pocket of the protein and does not interact directly with the membrane. An analysis of the conformation of the active site of CYP 3A4 in both membrane bound and water-soluble forms reveals a lack of significant difference, the volume of the binding site was, however, larger in the water-soluble simulations [959].

The majority of known crystal structures of CYP P450 are only partial structures, containing only the catalytic domain. The only complete protein structure (including linker, transmembrane helix, and attached membrane) belonging to this family is yeast lanosterol 14α-demethylase [948]. The structure of lanosterol 14α-demethylase suggests that the transmembrane helix affects the orientation of the catalytic domain towards the lipid bilayer. Unsurprisingly, numerous MD simulation studies have been dedicated to elucidating the orientation of the catalytic domain relative to the membrane surface and its specific interactions with lipids: for example, this was achieved for CYP 2C9 [960]; CYP 17A1 [961]; CYP1A2, 2A6, 2C9, 2D6, 2E1, and 3A4 [326,962,963] through MD simulations, then subsequently validated experimentally. Interestingly, the orientation of the catalytic domains of CYP 2C9 and CYP 2C19 were found to differ from each other significantly, in spite of the relatively high extent of sequence similarity; through MD simulation it was determined that the difference was induced by only three residues located at the membrane–protein interface [964]. Interestingly, truncation of the transmembrane helix in CYP 19A1 and 17A1 did not change the orientation of the catalytic domain significantly; however, mutations in the transmembrane helix of CYP 17A1 destabilized the interaction between the membrane and catalytic domain of the protein [965]. MD simulations also demonstrated that lipid composition, e.g., the addition of cholesterol [966] or charged lipids [967], affects the orientation of the extra-membrane domain with respect to the membrane surface. Additionally, the choice of force field (parameters set) may have a measurable, but not significant, effect on the orientation of the domains of the protein that are outside the membrane [968].

The active site of CYP enzymes is located inside the protein, thus substrates and products enter/exit the active site through numerous possible channels [969] (Figure 21). Such channels have been observed in crystal structures [969], MD simulations [969], RAMD

calculations [970,971], and steered MD simulations [972]. The channels found can be subdivided into seven classes: channel 1, channel 2, channel 3, channel 4, channel 5, the solvent channel and the water channel; in turn, channel 2 has 7 subclasses. The aforementioned channels open towards the membrane core, membrane–water interface, and directly into the solvent phase (water) (Figure 21).

**Figure 21.** (**A**) Snapshots of five CYP enzymes simulated attached to the membrane (left) and simulated in water (right) with open (minimum radius 1.5 Å) channels shown. Reproduced with permission from ref. [962]. (**B**) Snapshots of the models of membrane-bound CYP 2C9 and (**C**) visualization of the channels observed in MD simulations of these models. Reproduced with permission from ref. [960].

Free energy calculations have demonstrated that the substrate may enter/exit via various channels; however, the free energy barriers encountered traversing the channels, i.e., largest barrier along the pathway for each case, differ by up to 8 kJ/mol [388]. An MD simulation study that compared the aforementioned channels in the human and human parasite *Trypanosoma brucei* versions of CYP51s provided evidence that subtle differences in structure exist which could possibly be taken advantage of to design a drug to specifically target the parasite enzyme [973], i.e., design drugs based on transit through the channels, selecting a drug that has a lower free energy barrier for traversing the channels of the form of the enzyme in the parasite than that present for the host.

The interactions of CYPs with lipids affects the behavior of the entry channels significantly. In studies of CYP 1A2, the authors demonstrated that the probability of the opening of a specific channel is dependent on contact with the membrane, with the radius of the channel regulated through interactions with lipids in the membrane [974]; through very subtle conformational change in the protein, the membrane facing channels 2b, 2d, and 4 are enlarged, i.e., "opened", when the protein is in contact with the membrane while the water facing channel 2c is enlarged when the protein is in the water phase. For the case of the protein CYP 2B4, free energy calculations have determined that the transition from the open to the closed state is associated with a free energy change of 10 Kcal/mol for the membrane-bound protein and 25 Kcal/mol for the protein in solution [975]. Moreover, for the case of CYP 17A1, the membrane facilitates the opening of the entry channels [961]. This opening of channels or catalytic pockets is also observed in peripheral membrane proteins (discussed in more detail in the following section) due to interactions with the lipid membrane, e.g., simulations have observed this occurring for the phospholipase A2 [976,977], dihydroorotate dehydrogenase [978], monoglycosyltransferase [979], and other proteins that we will later discuss.

Combined experimental and simulation studies of CYP 2B4 reveal the presence of a sphingomyelin binding region on the protein and a protein-induced increase in the level of cholesterol and sphingomyelin in nanodiscs incubated with micelles containing lipids [980]. Putting these together, the authors concluded that CYP 2B4 induces the formation of ordered raft-like domains in its environment as lipids are exchanged between the micelles and the nanodisks. In turn, at the ordered membrane, the thermal stability of CYP 2B4 is significantly increased. The lipid environment also affected the affinity of hydrophobic substrates for CYP 2B4: highest in the raft-like environment and lowest in water solution. The behavior of CYPs may also be modulated due to the formation of heterodimers with cytochrome b5 or cytochrome P450 reductase [981,982].

The results of older MD simulation studies have provided evidence that the location and membrane orientation of the entry channels of CYP 2C9 correspond well with possible substrate positions [272]. In recent unbiased MD simulations, spontaneous events of the entry into the catalytic pocket of CYP 2D6, by substrates paracetamol and butadiene, from the water–membrane interface, were seen [285]. In eight observed events of paracetamol entry into the catalytic pocket, three possible channels were used; thus, there is no specificity for the entry pathway for this compound. For butadiene, only two events of entry occurred, both through channel 2c. Using a combination of extensive, unbiased MD simulations and RAMD, the entrance pathway from the membrane to the binding site was determined for testosterone [408]; umbrella sampling calculations along this pathway revealed the free energy landscape.

Through MD simulations, the effect of point mutations on the enzymatic activity of CYP 2C19 was analyzed, providing evidence for a possible mechanism responsible for the reduction of the affinity of the enzyme for the substrate, e.g., local deformation of the secondary structure of the protein results in a change in the shape and the dynamics of the catalytic pocket that, in turn, gives rise to the formation of a new stable network of hydrogen bonds [983]. A study of CYP 1A1 found that point mutations may facilitate the interactions of organic pollutants with the protein [984], while CYP 2D6 is characterized by a high degree of genetic polymorphism that leads to several possible phenotypes [985]. The results of MD simulations of a few variants of CYP 2D6 indicated that a decrease, or even a complete loss, of enzymatic activity may arise from a decrease in the frequency of channel openings, decreased minimal diameters of channels, or decreased catalytic site volume [986]. On the other hand, an increase of active site volume facilitates enzyme activity. MD simulations have been used to design mutations of CYP 3A4, designed to enhance the rate with which it epoxides carbamazepine [328]. Finally, interactions of the drugs progesterone and carbamazepine [326,327] and atorvastatin and dronedarone [312] have been studied using a combination of experimental methods and MD simulations.

#### 6.3.3. Catechol-O-methyltransferase

The enzyme catechol-O-methyltransferase (COMT) is an exciting example of a protein that has two isoforms: a water–soluble form (S–COMT) that is generally thought to perform its catalysis in solution but with the capacity to act also as a peripheral membrane protein and a membrane–bound isoform (MB–COMT), a bitopic protein [987]. The difference between S–COMT and MB–COMT is a 26 residue long transmembrane α–helix and a 24-residue long linker segment connecting the catalytic domain with the transmembrane segment. The function of COMT is the catabolism of catechols (Figure 22), including the neurotransmitters dopamine, epinephrine, and norepinephrine. Inhibitors for COMT are thus used to treat Parkinson's disease together with the dopamine precursor L–DOPA [645,646]; the goal is to prevent dopamine deficiency in neurons, and the role of COMT inhibitors is to avoid L–DOPA degradation before it is converted to dopamine. Importantly, in the central nervous system, MB–COMT is the dominant form of the enzyme; thus, selective inhibition could possibly be profitable for patients. In our studies of MB–COMT, we found evidence, through MD simulation, that a group of inhibitors specific for MB–COMT is characterized by a clear orientation, when in the lipid membrane, with the catechol group oriented outwards and towards the water phase (Figure 22) [381]. The catalytic activity of COMT first involves the binding of the S-adenosyl methionine (ADOMET) cofactor, then an Mg++ ion is bound to form the binding pocket (Figure 22). As part of the same study, a simulation of the entire MB–COMT in the lipid membrane in both holo- and apo- (with and without ADOMET cofactor bound) states was performed and it was determined that the binding of ADOMET leads, i.e., orients, the catalytic surface towards the lipid headgroups, through the creation of a membrane binding patch similar to that found in peripheral membrane proteins, where the substrates are methylated at the membrane–water interface. Through MD simulation, the catalytic mechanism, unique to the membrane bound isoform of COMT, has been determined, presenting the tantalizing possibility of designing drugs that selectively target MB–COMT over S–COMT. Through datamining, other examples of proteins where the same mechanism can be exploited have been found.

#### 6.3.4. Monoamine Oxidase

Monoamine oxidase (MAO) A and B are bitopic proteins involved in the degradation of monoamines, particularly monoamine neurotransmitters: dopamine, serotonin, epinephrine, norepinephrine, histamine, and trace amines, as well as the catabolism of xenobiotics. As a result of this role, they are a drug target for inhibition in the treatment of Parkinson's and Alzheimer's disease and major depressive disorder [988–990]. Not surprisingly, MAO inhibitors are frequently studied using computational modeling methods including docking calculations, free energy calculations, and hybrid quantum mechanics/molecular mechanics (QM–MM) simulations, e.g., [991–993]. A recent example is the study of MAO–A and MAO–B inhibition by naturally occurring flavonoids that originate from medicinal plants [994]. Quantum mechanics/molecular mechanics simulations have also been used to elucidate the mechanism of enzymatic reaction catalyzed by MAO–A (e.g., [995]) and MOA–B (e.g., [996,997]); for a review of this topic, see ref. [998].

Simulations of MAO–A at the membrane−water interface provided evidence of the opening of a channel for the substrate and the reaction products that was not seen to occur in simulations performed without the lipid membrane, i.e., the channel remained closed [999]. For the case of MAO–B, the lipid bilayer was found to control the behavior of the two loops at the entrance of the active site, allowing for the opening of a channel [1000]. The other channel-facing membrane was observed in simulations of the MOA–B dimer [1001]. These studies also found evidence of changes in protein dynamics induced by the membrane, potentially affecting MOA–B activity.

–

**Figure 22.** (**A**) Schematic of the catalytic mechanism of MB–COMT. (**B**) The behavior of MB–COMT selective vs. non– selective inhibitors in the membrane. (**C**) Quantitative experimental and computational results for the interactions of the ADOMET and catalytic domain of COMT in complex and separately with lipids: (**C1**) the free energy changes calculated computationally with umbrella sampling methods; (**C2**) quartz crystal microbalance (QCM) frequency changes during interaction of COMT with the lipid bilayer; (**C3**) dissociation constant from liposomes determined by isothermal calorimetry. Reproduced with permission from ref. [381]. Copyright 2018 the Royal Society of Chemistry.

> It has been attempted, through experimental mutagenesis studies, to engineer MAOs from *Aspergillus niger* to act on larger substrates like benzyl-piperidine, thus transforming it into a device to synthesize chiral pharmaceuticals based on the benzo-piperidine scaffold; simulations were carried out to assist this endeavor [545] and residues in the channels leading to the catalytic pocket and residues inside the pocket were mutated in silico. The MD simulations demonstrated that increased hydrophobicity of the entrance channel and alteration of shape and size of the binding pocket provided the most efficient version of the enzyme.

#### 6.3.5. Tropomyosin Receptor Kinase B

The tropomyosin receptor kinase B (TRKB) belongs to the tyrosine kinase receptor family of proteins: a set of bitopic membrane proteins that operate as dimers with their trans–membrane α-helices paired in a crisscross orientation (see Figures 23 and 24); their main function is to pass signals through the cell membrane [1002,1003]. The specific tyrosine kinase receptor TRKB is the main receptor of brain-derived neurotrophic factor (BDNF) and plays an important role both in neuron survival and the growth and differentiation of new neurons and synapses; it can thus be seen, on the scale of overall brain function, as an agent that promotes neuron plasticity, the increase of which, through a combination of drug

and talk therapies, is currently seen as one of the most promising routes, following a mechanistic biophysical paradigm, for the treatment of clinical depression. Thus, combating the inhibition of TRKB activation by BDNF is a possible strategy for drug therapy [1004].

**Figure 23.** Transmembrane domain of TRKB dimer formed by the crisscross oriented helices forms a pocket for antidepressant binding. (**A**) Antidepressant fluoxetine is embedded in the crevice between the two helices of the TRKB transmembrane domain. Protein backbone shown in white cartoon and protein residues and fluoxetine in licorice representation. (**B**) Binding site for antidepressants at the outer opening of the crossed dimer is stabilized by the phospholipids. Protein backbone shown as green cartoon, the phospholipids in licorice and fluoxetine in vdW representations.

**Figure 24.** Schematic representation of how cholesterol and antidepressants regulate the activity of TRKB receptors through driving the orientation of its transmembrane helices. The rise of cholesterol content in the membrane increases its thickness. This forces the transition of TRKB transmembrane dimers towards the states with shorter C-terminal distances between the ends of the helixes (red rectangles). In turn, the C-terminal distances determine the arrangement of TRKB kinase domains (shown in orange) and the phosphorylation states of tyrosine 816 (black and yellow stars for native and phosphorylated states, respectively). The antidepressant fluoxetine (shown in blue), when bound to the pocket, preserves the stable transmembrane dimer conformation in a similar fashion to that observed at moderate cholesterol level and optimal for receptor activation.

A recent study of TRKB by Casarotto et al., that utilized numerous experimental and computational methods demonstrated a possible mechanism through which the antidepressant fluoxetine can increase the extent to which TRKB is activated by BDNF, resulting in the desired increase in neuronal plasticity [320]. The angle between the two alpha helices of the TRKB dimer, arranged in the crisscross orientation mentioned above (see Figure 23), is determined by the thickness of the hydrophobic core of the lipid membrane; the two identical alpha helices each have the same fixed hydrophobic length and cholesterol is known to affect the thickness of a lipid membrane, thus the level of cholesterol in the lipid membrane will alter this angle. The angle between the helices determines the distance between the intracellular ends of the helices that, in turn, controls the positioning of the intracellular domains of the proteins; this mechanism switches the protein between the active or inactive states. Through a combination of experimental methodologies, Casarotto et al., found that TRKB activation is affected by the level of cholesterol in the neuronal membranes where the TRKB receptors reside; the study reveals a bell-shaped dependence between the level of cholesterol in the membrane and the extent to which TRKB activation by BDNF occurs: both low and high levels of cholesterol were found to inhibit BDNF receptor activation, while a moderate level of cholesterol was found to be optimal, i.e., the level of TRKB activation had a maximum at a certain level of cholesterol in the membrane. The same bell-shaped dependency was also observed in a direct binding assay of BDNF to TRKB [1005]. It can thus be surmised that BDNF activation is dependent on the angle between the two trans-membrane helices with a certain finite angle being optimal. It was then found that the presence of the antidepressant fluoxetine caused the extent of TRKB activation not to decrease when the cholesterol level in the membrane was raised above the formerly optimal level, i.e., the inhibitory effect of elevated cholesterol level was negated by the presence of fluoxetine.

Next, Casarotto et al., performed MD simulations of a TRKB dimer in a membrane with the antidepressant fluoxetine present; they made a truly startling discovery: a binding site for the fluoxetine molecule was found to exist at the point the two helices crossed; when a fluoxetine molecule was bound to this site, the fluoxetine molecule effectively jammed the two transmembrane helices like a rock in a pair of scissors, as shown in Figure 23. The study thus demonstrates how MD simulation has connected a set of experimental results to provide a complete picture of a hypothesis of the mechanism of action of the antidepressant fluoxetine, shown as a schematic in Figure 24. Mutagenesis experiments were then carried out to verify this; the fluoxetine binding site, found from MD simulation, was removed and the effect of fluoxetine on the cholesterol dependence of TRKB activity was found to disappear, as expected.

This mechanism also explains why antidepressants start activating TRKB only after 2 weeks of postnatal development: data from lipidomics indicates that, at the end of the second week of postnatal development, the level of cholesterol in the synaptic membranes of rodents drastically increases [1006], exactly the time when, in separate studies, antidepressants were found to start to become capable of activating TRKB [1007,1008].

The revolutionary potential of this possible discovery and the key role MD has played in it cannot be understated and already have been widely commented upon in the neuroscience community [1009–1012]; attempts have already been made to find small molecules which directly activate TRKB [1013] or positively stimulate BDNF signaling [1014].

#### *6.4. Peripheral Membrane Proteins*

#### 6.4.1. Proteins That Live in the Cytoplasm, but Work at the Membrane Surface

We finally come to the last category of membrane-associated proteins, and arguably the most difficult to study of all: peripheral membrane proteins. While these proteins have no permanent association with the membrane, the activity of the protein takes place at a lipid membrane; interaction with the lipid membrane plays a clear role in its activity. Peripheral membrane proteins are considered as possible drug targets [1015], in particular, the protein domains known to specifically bind lipid headgroups at the membrane surface [1016]; for example, it has been demonstrated that domains that bind phosphatidylinositols could be a possible drug target in cancer therapy [1017,1018].

Peripheral membrane proteins are frequently studied through MD simulations, but frequently without association with the lipid bilayers involved in their activity. For example, galectins, an important class of protein that recognizes carbohydrates, including glycolipids, have been studied in the context of protein–ganglioside carbohydrate headgroup interactions [1019,1020] or in the context of interactions with potential inhibitors [1021,1022]; the lipid bilayer has, however, not been included in any computational studies of galectins.

We will now outline a few examples of peripheral membrane proteins, including domains that recognize phosphatidylinositols and phosphatidylserine, protein toxin, and protein kinase C (PKC) where MD simulation has elucidated key elements of their mechanism of action.

#### 6.4.2. Protein Kinase C

Protein kinase C (PKC) plays many important roles in human physiology and aspects of its activity are thus also involved in several pathologies, including heart failure, cancer, Alzheimer's disease, and diabetes [1023–1025]. The catalytic activity occurs at the surface of a biomembrane, when activated by di-acyl-glycerol (DAG) in the membrane. In many cases moderate activation has been found to be beneficial while prolonged activation detrimental. PKC is thus a drug target for controlled activation and a detailed mechanistic understanding of the factors that modulate its activity and the role the membrane plays in mediating the DAG induced activation is pharmaceutically relevant. For a review of the use of computational methods for the development of ligands of PKC, see Katti and Igumenova (2021) [1026].

A set of MD simulation studies of the C1 and C2 domains of PKC have demonstrated the importance of specific lipids in regard to the docking of the protein to the lipid bilayer; specifically, the importance of anionic lipids has been demonstrated [1027–1030]. In other studies, PKC and its activator aplysiatoxin have been studied using MD simulations of this complex interacting with a lipid bilayer, in order to describe the binding mode of this potential anticancer drug [230]. In another computational study of PKC, the C1B domain was placed at the water–membrane interface and docked to four potential activators bryostatin, a bryostatin analog, phorbol 12,13–dibutyrate, and prostratin [231]; long MD simulations demonstrated that the C1B domain adopted a different orientation and positioning for each of the activators. These differences originated from variation in activator interactions with both water and lipids.

A series of experimental assays were carried out on different DAG mimics based on hydrophobic isophthalic acid derivatives (HMI**s**) [1031–1034]. Among these, the HMI known as HMI-1a3 (Figure 25) was found to be a particularly effective activator of PKC, however its application is limited due to its low solubility. An analog of HMI-1a3 with a phenyl ring substituted with pyrimidine (PYR-1gP) was synthesized that, unlike HMI-1a3, had an acceptable solubility profile, but this was found in experimental assays not to be effective as a PKC activator. An MD simulation study was carried out by Lautala et al., to determine the cause of this; they found that one of the reasons for this observed low potency to be a drastic change in the orientation of the functional group involved in binding from exposed to the water phase to embedded into the bilayer; this results from the formation of an internal H-bond that increase its effective hydrophobicity [336].

#### 6.4.3. The Binding Domains of PIPs

Phosphatidylinositols (PIPs) are a class of lipid of particular importance in cellular signaling [1035]. There are 8 PIP headgroup types characterized by different behaviors at the membrane surface, e.g., orientation in the membrane [1036]. There are 14 known protein domains that bind PIPs [1037–1039] including: (1) pleckstrin homology (PH), (2) FYVE (the first letter of the first four proteins in which domain was identified), (3) phox homology (PX), (4) epsin N-terminal homology (ENTH), (5) FERM (F stands for 4.1 protein, E for

ezrin, R for radixin and M for moesin), (6) PROPPINS (b-propellers that bind polyphosphoinositides), (7) Tubby, and (8) BAR (name from first letter of proteins: Bin, Amphiphysin, Rvs) domains and others not discussed in this paper.

**Figure 25.** Maps showing orientation-position relations of the hydroxyl groups of HMI-1a3 and PYR-1gP. Reproduced with permission from ref. [336].

The PH domain is the most common PIP binding domain, there are at least 329 different PH domains present in 284 human proteins [1040]. The PH domain is built from 100–150 residues and binds various PIPs. Unsurprisingly, it is the most frequently studied among PIP binding domains. A few MD simulation studies provide insight into PH domain interactions with PIP headgroups and details of interactions with lipids in the bilayer in addition to the orientation of the domain towards the membrane surface [1041–1048] or characterized dynamics of bound domain at the membrane surface [1030]; for example, Yamamoto et al., characterized PH domains originating from 13 different proteins [1049]. Interactions of PH domains with lipid bilayers were also studied through computational free energy calculations. For example, it has been shown that the PH domain from the GRP1 protein has stronger binding to a lipid bilayer with PIP3 than a bilayer with PIP2 [1050]. In another study of the GRP1 PH domain, it was shown that in the canonical binding mode, the free energy difference between the bound and unbound state is about 9 kcal/mol, while in an alternate, non–canonical, binding mode, it is about 7 kcal/mol [1047]. Moreover, it has been shown that the GRP1 PH domain has the ability to bind multiple PIP3 molecules simultaneously, increasing the strength of the binding to the lipid bilayer [1051]. Finally, MD simulations have demonstrated that the binding of the PH domain to the membrane could possibly be facilitated by phosphatidylserine; however, for stable binding, PIP is necessary [1048].

The ACAP1 protein, recently studied with MD simulations, is in possession of two PIP binding domains: the BAR and PH domains [1052]; the orientation of the PH domain at the membrane surface is affected by the BAR domain. The PH domain of ACAP1 has two binding pockets for PIP2 headgroups (pocket 1 and pocket 2); free energy calculations determined that the PIP2 occupancy of pocket 1 decreases the free energy of binding to the membrane by 1 kcl/mol in comparison to when PIP2 is not present, while the occupancy of pocket 2 decreases the free energy of binding by 3.5 kcl/mol.

The over-expression of Grb2-associated binding protein 1 (GAB1) occurs in numerous cancers, thus drugs designed to inhibit the activity of GAB1 are studied, including drugs that target the PH domain. Extensive modeling studies, including virtual screening of five million compounds and MD simulations of the five most promising molecules, have demonstrated that selected molecules strongly bind to the PH domain and induced significant changes to the protein conformation [1053]. Next, it has been shown experimentally that these potential drugs have tumor-specific cytotoxicity against two breast cancer cell lines. The ceramide transfer protein contains both PH and START domains, and dimerization of these domains leads to protein inhibition. Docking calculations, combined with MD simulation, allowed for the selection of a few possible protein inhibitors, which bind to the PH–START domain interface, increasing the strength of the association between these two domains [1054]. The protein AKT1 kinase, also in possession of one or more PH domains, has been studied, using docking calculations and MD simulation, to understand the effect of the cancerous E17K mutation [1055]. It has been observed that inhibitor 7, known to be ineffective for the case of this mutation being present, is characterized by a much lower affinity towards the mutated version of the protein. Moreover, mutation leads to conformational changes in the protein.

The FYVE domain that contains only 70–80 amino acids, has been identified in 30 human proteins. The FYVE domain only binds PIP3 and not the other PIPs, i.e., it is specific for PIP3. The PX domain is present in 49 human proteins and binds several different PIPs [1056]. Studies of these two domain interactions with lipid bilayers, carried out through MD simulation, have demonstrated that, after binding with a lipid bilayer and PIP3, both domains lose their flexibility that is observed when the domains are in the water phase [1057]. This effect was more pronounced for the FYVE domain. Both domains remained firmly bound to the PIP3 molecule and both domains penetrate the bilayer core, the FYVE domain via the N-terminal hydrophobic loop and the PX domain via two loops: α1α2 and β1β2. In another MD simulation study, phospholipase D2 (PLD2) that has both PX and PH domains, was shown to interact with PIP2 predominantly via these specific domains [1058]. Moreover, a second anionic lipid, POPA, was found to participate in the protein-membrane interactions.

The FERM domain was included in a few MD simulation studies of the proteins talin [1059–1061] and focal adhesion kinase (FAK) [1062,1063]. Studies of FAK showed a strong effect of PIP2 on the protein orientation at the membrane surface [1063]. Studies of talin elucidated how talin, in cooperation with PIP2, affects the interaction of dimers of the transmembrane helices of integrins leading to dimer dissociation, the first step in integrin activation [1059,1060]. MD simulation also found a change in the FERM domain conformation [1059]; the FERM domain is composed of four units F0, F1, F2 and F3 which are organized in a linear fashion (Figure 26) as seen in its crystal structure (PBD ID 3IVF) [1064] that is, however, missing loop F1 which was approximated computationally. Due to the presence of the loop F1, the FERM domain became V-shaped (Figure 26). Additionally, a study that combined MD simulations with multiple experimental methods, demonstrated that FERM in talin is more compact and nonlinear [1061] (Figure 26).

The mechanism through which PROPPINS docks to the membrane has been studied using coarse-grained and atomistic MD simulations; the study demonstrated the importance of the loop 6CD for protein binding to the membrane [1065]; a combination of experimental and theoretical studies revealed the presence of two phosphoinositide binding sites in the β-propeller domain [1066]. Coarse-grained MD simulations were used in studies of the aggregation of the ENTH domains at membrane vesicles and membrane tubes, i.e., cylinders formed from lipid membranes [1067]. These studies found evidence that the formation of dimers and larger oligomers occurs within membrane tubes. Finally, MD simulation studies of the tubby domain have demonstrated that it is in possession of two binding sites for PIP2 [1068].

**Figure 26.** Conformation of talin FERM domain (**A**) extended conformation from crystal structure (PBD ID 3IVF) and after CG simulations [1059], (**B**) compact conformation [1061].

#### 6.4.4. Other Examples

Coagulation factor X is one of several proteins involved in the initiation of the coagulation cascade; its activity is dramatically increased when it binds to a membrane that contains negatively charged lipids, in particular phosphatidylserines. The GLA domain is responsible for the binding of factor X to the lipid bilayers, its characteristic feature is posttranslational modification of glutamic acid residues, which transform the glutamic acid sidechain into γ-carboxyglutamic acid (GLA); GLA is in possession of two carboxylic groups with the ability to strongly bind divalent cations, predominately Ca++. Through MD simulation, evidence has been found that GLA residues with bound Ca++ ions bind directly to phosphatidylserine [1069]. Altogether, 7 GLA residues participate in interactions with lipid bilayers and short loops insert three hydrophobic residues directly into the bilayer core.

Protein toxins, including bacterial toxin, have specific requirements to recognize and enter cells [1070], i.e., certain molecules must be present in the membrane. Cholera toxin is a protein, which requires gangliosides, e.g., GM1, to be present in the membrane in order for it to bind and is a potential drug target; for example, peptide mimicking GM1 carbohydrates prevent the binding of toxin to intestinal cells [337]. Through MD simulation, the mechanism of toxin docking to the lipid bilayer has been studied, including its interactions with lipids, in particular GM1 and the effects of the toxin binding on the overall properties of the lipid bilayer [525,1071]. Docking studies have proposed three possible drug candidates with sufficient potency to prevent the binding of the toxin with ganglioside; MD simulations were used next to evaluate the stability of the binding of drug candidates with toxin [1072]. Two compounds, A6225 and A16503, formed a stable complex with the toxin. Cardiotoxin CTII, a component of cobra venom, is known to disrupt mitochondrial membranes. Through MD simulation, the interactions of CTII with a lipid bilayer composed of lipids present in the outer mitochondrial membrane [1073] were elucidated. In particular, residues involved in interactions with cardiolipin were identified.

The enzymes responsible for the metabolism and transportation of lipids, are often peripheral membrane proteins not explored as a drug target although proposed as a target for antiviral therapy against flaviviruses [1074]. Examples of MD simulation studies of enzymes that catalyze the metabolism of lipids include monogalactosyl-diacyl-glycerol synthase 1 [1075], phosphatidylinositol phospholipase Cγ and its cancerous mutation [1076], and lysophosphatidic acid acyltransferase [1077]. Glutathione peroxidase 4 is an enzyme responsible for the removal of lipid hydroperoxides, thus preventing lipid oxidation; MD simulations have recently been used to elucidate the process through which a substrate enters into the active pocket of the enzyme from the membrane surface [1078]. Lipid transporters are proteins that shuttle lipids between two separate membranes, thus acting as peripheral membrane proteins when loading/unloading their cargo and water-soluble proteins when transporting it through the cytoplasm. Lipid transporters studied through MD simulation include the mammalian phosphatidylinositol transfer protein; these studies provided insight into the mechanism and energetics of the loading/unloading of PIPs into the binding cavity of the protein [1079] and studies of the UPS1 protein transporting phosphatidic acid between the inner and outer mitochondrial membrane [1080].

As a large group of peripheral membrane proteins recognize specific lipids, it is also possible to develop drugs that bind to specific lipids, thus preventing the recognition of these lipids by specific peripheral membrane proteins. An example of such a strategy is the study of siramesine, a small compound characterized by its high affinity to phosphatidic acid [335].

#### **7. Conclusions**

Through a detailed discussion of how MD simulation can be used to elucidate the roles the interaction with lipid membranes play in drug action, we hope we have provided the reader with an intuitive understanding of the broader landscape within which drug molecules act, beyond the limiting paradigm of ADME and "docking and scoring". Xenobiotics are foreign molecules travelling through the body, uninvited intruders that become involved in the complex melee that is physiology on the molecular length scale. A drug molecule is a xenobiotic with a specific purpose, designed by its maker, like a foreign agent with a mission to perform a specific task travelling across the countryside to reach the city where they must perform it. In the previous review paper, we discussed the role MD simulation can play in providing mechanistic insight relevant to the development of the mechanisms through which this agent can reach their destination; now we have discussed how MD simulation can aid in the design of a more complex set of tasks for the agent than possible within the confines of the "docking and scoring" paradigm. Through MD simulation, (1) drug molecules can be rationally designed to perform more complex tasks than merely fitting to a specific protein and (2) even if the target is the active site of a specific protein, far more is involved in the determination of the effective affinity of a specific drug molecule structure for a specific active site than can be investigated through the "lock and key" paradigm. While there are many possible factors other than the role played by lipid membrane interactions that MD simulation can address, we hope that this provides a case study that expands the intuitive picture the reader has of the scope of computational drug design and the role the specific toolkit of MD simulation can play.

Lipid membranes are fundamental structures with no less complex or important a role in molecular level physiology than proteins; the mechanisms through which small drug molecules can interact with them are equally complex. Drug molecules must cross them, can affect their properties through collective action, and can have their interactions with target proteins mediated by them; a comprehensive inclusion of the role the drug structure plays in its interaction with lipid membranes and the use of MD simulation to model this interaction is clearly a component of the future of drug design.

In the beginning of the paper, we mentioned that we can envision a time in the future when, through the combined solution of the human proteome and the general structure from sequence problem, a program can exist that could input any molecular structure and output its binding affinity for every active site of every protein in the human proteome. Hopefully, when reading this review paper, the reviewer can see that even this fantastic device would merely be one component of the advanced computational toolkit of drug design of the future. This information would form a starting foundation that then would be combined with MD simulation able to investigate the broader context of the protein environment, or possible modes of action of a specific drug molecule that involve different elements of physiology than active sites of proteins. The lipid membrane is an important component of this where we have shown that MD simulation can play a leading role. As in all cases, the computational effort must be matched with complementary experimental techniques that themselves are undergoing revolutionary upgrades as we speak. As recent events have shown, the health challenges to humanity are mounting, but so is our arsenal in combating them and MD simulation as a component of computational drug design will grow in importance, with the increase in computational power and data stored.

**Author Contributions:** All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** Open access funding provided by University of Helsinki.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **In Silico Studies of Potential Selective Inhibitors of Thymidylate Kinase from** *Variola virus*

**Danielle R. Garcia 1 , Felipe R. Souza 2 , Ana P. Guimarães 3 , Martin Valis 4 , Zbyšek Pavelek 4 , Kamil Kuca 5,6,\* , Teodorico C. Ramalho 5,7 and Tanos C. C. França 1,5, \***


**Abstract:** Continuing the work developed by our research group, in the present manuscript, we performed a theoretical study of 10 new structures derived from the antivirals cidofovir and ribavirin, as inhibitor prototypes for the enzyme thymidylate kinase from *Variola virus* (*Var*TMPK). The proposed structures were subjected to docking calculations, molecular dynamics simulations, and free energy calculations, using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method, inside the active sites of *Var*TMPK and human TMPK (*Hss*TMPK). The docking and molecular dynamic studies pointed to structures 2, 3, 4, 6, and 9 as more selective towards *Var*TMPK. In addition, the free energy data calculated through the MM-PBSA method, corroborated these results. This suggests that these compounds are potential selective inhibitors of *Var*TMPK and, thus, can be considered as template molecules to be synthesized and experimentally evaluated against smallpox.

**Keywords:** *Variola virus*; thymidylate kinase; smallpox; docking; molecular dynamics

#### **1. Introduction**

Despite the declaration of the World Health Organization (WHO) that the *Variola virus* was eradicated from the world by the 1980s, smallpox is still a matter of concern. Moreover, many studies on the development of drugs against this disease have been reported in the literature [1–16]. This is due to the fact that *Variola virus* strains may have been stored in clandestine sites around the world, and their potential use for bioterrorist purposes cannot be ignored [17,18]. Furthermore, the recreation of this kind of virus has proven to be easy to carry out, considering the current technologies [19]. As viruses can survive long periods in nature, there is no guarantee that smallpox will never return as a natural pandemic [20,21]. It is also important to mention that since the 1980s the public vaccination campaigns against smallpox do not exist anymore [4,7,9], a fact that makes all of the world's population under 40 years of age particularly vulnerable.

Currently, there is only one drug approved by the Food and Drug Administration (FDA) of the United States for the treatment of smallpox. This drug, named tecovirimat (Figure 1), acts as an inhibitor of the protein complex needed for the survival of extracellular

**Citation:** Garcia, D.R.; Souza, F.R.; Guimarães, A.P.; Valis, M.; Pavelek, Z.; Kuca, K.; Ramalho, T.C.; França, T.C.C. In Silico Studies of Potential Selective Inhibitors of Thymidylate Kinase from *Variola virus*. *Pharmaceuticals* **2021**, *14*, 1027. https://doi.org/10.3390/ph14101027

Academic Editors: Osvaldo Andrade Santos-Filho and Eynde Jean Jacques Vanden

Received: 11 August 2021 Accepted: 30 September 2021 Published: 9 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

viruses, and thereby prevents its spread throughout the body from the infected cells [1,2,10]. However, despite the fact that it was approved by the FDA, tecovirimat is still in phase II of clinical trials and has not been tested in humans yet, due to the lack of patients with smallpox. It has been effective in laboratory tests to protect animals against monkeypox and rabbitpox as well as presented low toxicity to humans [1,2,10,22,23].

**Figure 1.** Structures of tecovirimat, cidofovir, and ribavirin.

The risk of smallpox resurgence either, as a natural pandemic or consequence of a terrorist attack, combined with the huge number of non-immunized people in the world, and the scarcity of drugs to combat this disease, highlight the importance of the search for new drugs against it.

It is known from the literature that the Variola virus belongs to the genus Orthopoxvirus, which, similar to all poxviruses, are capable of encoding their own thymidine and thymidylate kinases (TK and TMPK) [24]. Caillat et al. [24] have detailed the sequence alignment of TMPKs from different Orthopoxviruses (vaccinia virus, smallpox, cowpox, camelpox, and monkeypox) and showed that those enzymes present highly similar amino acid sequencing, which are practically identical, except for a few residues. The majority of the different residues among the viral enzymes belong to the protein loops [24]. Therefore, the crystallographic structure of the Vaccinia virus TMPK (VaccTMPK) can be used as a template for modeling the structure of the Variola virus TMPK (VarTMPK), which is not available yet in the protein data bank (PDB) (https://www.rcsb.org/).

Moreover, the study by Caillat et al. [24] included data on practical experiments using compounds such as brivudine monophosphate to test their inhibition of VaccTMPK and human TMPK (HssTMPK). HssTMPK and VaccTMPK are only 42% similar in their amino acid sequence alignments and the dimer interface arrangements of both enzymes are different, as well as their active site geometry [24–26]. Due to these differences, as stated by the authors, VaccTMPK is able to accommodate more voluminous compounds, such as brivudine monophosphate, which stabilizes the enzyme and is phosphorylated. The specificity of the bond with brivudine monophosphate shows that selective antipox agents can be developed based on this finding. Therefore, these antivirals can also be investigated for their use in the treatment of TMPK-related diseases from other viruses of the Orthopoxvirus genus [24].

According to the literature, TMPK is responsible for the synthesis of thymidine 5 ′ -triphosphate (TTP) based on the phosphorylation of thymidine 5′ -monophosphate (TMP) [27,28]. TTP participates in DNA synthesis (as a building block) and its levels are controlled during the different phases of the cell cycle. Since TMPK is important for the biosynthesis of TTP, the enzyme is considered a molecular target for the development of antiviral drugs against many diseases, and the interruption of the metabolism of TTP can be used to stop the development of said illnesses. In addition to its direct effect on the metabolism of TTP, TMPK also interacts directly with the DNA synthesis process, since it plays an important role in the activation of DNA precursor nucleoside analogues. As a result, its inhibition directly affects the synthesis of genetic material, thus provoking the deactivation (death) of the virus [25,29,30].

Previously, we proposed VarTMPK as a potential target to the design of potential selective inhibitors and studied the behavior of known antivirals inside it, thus pointing to the relevant residues to be targeted in the drug design [7]. Next, we proposed the structures of 10 potential VarTMPK inhibitors based on theoretical studies of another series of antivirals [4,8,24,26]. The best ranked compounds after docking and MD simulations studies were, then, selected to design a new series of compounds based on alterations of their structural features, having in mind the synthetic viability and selectivity towards VarTMPK [9]. A few compounds of this new series presented promising theoretical results [9].

In the present work, we performed the theoretical study of 10 new compounds whose design was based on structural modifications of the antivirals cidofovir and ribavirin (Figure 1), seeking the application of these new compounds against the Variola virus. Cidofovir and ribavirin were chosen to serve as templates for this study, since they have already been extensively studied and validated for diseases caused by Orthopoxviruses. Therefore, they have much data available in specialized databanks [31,32].

Maintaining the main idea of our previous studies [4,7,9], the objective of this work was to minimize the structural complexity of the template compounds, in order to make the synthesis of these structures more feasible and less cumbersome.

To analyze the binding modes and selectivity of the proposed compounds inside *Var*TMPK and *Hss*TMPK, the molecular docking method was used [33,34]. In order to assess their dynamical behavior and corroborate the docking achievements, molecular dynamics (MD) simulations rounds were performed on the best poses obtained from docking. Finally, free energy calculations, applying the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method, were performed on the most promising compounds to verify the effectiveness, reliability, and selectivity of their binding inside each enzyme [35–45].

#### **2. Results**

#### *2.1. Docking Calculations*

Table 1 lists the active site residues of *Var*TMPK and *Hss*TMPK. It is possible to see that seven of these 14 residues are different. This 50% difference highlights the possibility of designing selective inhibitors for *Var*TMPK.

**Table 1.** Active site residues of VarTMPK and HssTMPK. The non-matching amino acids are shown in red.


Similar to our previous works [4,9], the ionization states of the compounds used in this study corresponded to the predominant microspecies at physiological conditions (pH = 7.4). According to the chemicalize server (www.chemicalize.com), our compounds except compound 1, are 100% in the neutral form at pH 7.4. The dominant microspecies for compound 1 predicted by chemicalize (www.chemicalize.com) is the one negatively charged in the phosphate group, with 74% of prevalence at pH 7.4. Moreover, according to the chemicalize server (www.chemicalize.com), all of our compounds meet the Lipinski's criterion of drug likeness [46], as shown in Table 2.


**Table 2.** Calculated values of acute toxicity, carcinogenicity (in mouse), and Lipinski's rule of the compounds and antivirals.

The docking studies were meant to identify the compounds with better interactions and higher selectivity towards *Var*TMPK. The best poses for each compound were selected in accordance with the best (more negative) interaction energies inside *Var*TMPK and *Hss*TMPK (intermolecular and hydrogen bond) and the best superposition onto TDP. As shown in Figure 2, the poses selected in the docking studies present an optimal superposition onto TDP for both *Var*TMPK and *Hss*TMPK. These poses were analyzed according to: Their interaction energy values with the enzymes (Einteraction) and the cofactor (Ecofactor); the H-bond energies between the compounds and the enzymes (Hbond); and, finally, the amino acids involved in the interactions with the compounds in both enzymes. The results obtained for those parameters are summarized in Table 3. Figure 3a–j shows the interactions observed between the proposed compounds inside *Var*TMPK and *Hss*TMPK. The active site residues of both enzymes are shown in red and the red spheres represent water molecules.

#### *2.2. Molecular Dynamics Simulations*

The best poses of compounds **2**, **3**, **4**, **5**, **6**, **8**, and **9** were selected for additional rounds of MD simulations, since they were pointed as more selective towards *Var*TMPK by the docking studies. The results are shown in the plots of root-mean-square deviation (RMSD) and number of H-bonds formed during the MD simulations, for their complexes inside *Var*TMPK and *Hss*TMPK, shown in Figures 4 and 5, respectively.

In order to validate our MD protocol, we performed three rounds of MD simulations for the complexes of compound **2** with both enzymes. The obtained plots of RMSD are shown in Figures S1–S6 of the Supplementary Material.

#### *2.3. MM-PBSA Calculations*

According to the literature, one of the limitations of docking studies lies in determining the affinities of ligand-protein complexes using calculations based solely on the poses generated for these complexes. In this regard, the MM-PBSA method tends to improve the results related to the binding energy, which is due to the fact that MM-PBSA provides more accurate results when the affinities between the ligand and protein are determined. Moreover, this is achieved by calculating the free binding energy associated with the formation of ligand-protein complexes [47] and decomposing it in the contributing components, as proposed before by Bren et al. [44,45]. Table 4 shows the average binding energy values calculated for compounds **2**, **3**, **4**, **6**, and **9** complexed with *Var*TMPK and *Hss*TMPK, while Figure 6 illustrates the favorable and unfavorable energy contributions of compounds **3** and **6** inside *Var*TMPK.

*Pharmaceuticals* **2021**,*14*, 1027


**Figure 2.** Best molecular docking poses for the prototypes inside *Var*TMPK and *Hss*TMPK: (**a**) *Var*TMPK/compound **1**, (**b**) *Hss*TMPK/compound **1**, (**c**) *Var*TMPK/compound **2**, (**d**) *Hss*TMPK/compound **2**, (**e**) *Var*TMPK/compound **3**, (**f**) *Hss*TMPK/compound **3**, (**g**) *Var*TMPK/compound **4**, (**h**) *Hss*TMPK/compound **4**, (**i**) *Var*TMPK/compound **5**, (**j**) *Hss*TMPK/compound **5**, (**k**) *Var*TMPK/compound **6**, (**l**) *Hss*TMPK/compound **6**, (**m**) *Var*TMPK/compound **7**, (**n**) *Hss*TMPK/compound **7**, (**o**) *Var*TMPK/compound **8**, (**p**) *Hss*TMPK/compound **8**, (**q**) *Var*TMPK/compound **9**, (**r**) *Hss*TMPK/compound **9**, (**s**) *Var*TMPK/compound **10**, (**t**) *Hss*TMPK/compound **10**.

**Table 4.** MM-PBSA results for compounds **2**, **3**, **4**, **6**, and **9**.


**Figure 3.** Interactions observed for the best poses obtained inside the *Var*TMPK and *Hss*TMPK active sites: (**a**) Compound **1**, (**b**) compound **2**, (**c**) compound **3**, (**d**) compound **4**, (**e**) compound **5**, (**f**) compound **6**, (**g**) compound **7**, (**h**) compound **8**, (**i**) compound **9**, (**j**) compound **10**.

**Figure 4.** RMSD of the systems formed by the enzymes (in black) and compounds (in red): (**a**) *Var*TMPK/compound **2**, (**b**) *Hss*TMPK/compound **2**, (**c**) *Var*TMPK/compound **3**, (**d**) *Hss*TMPK/compound **3**, (**e**) *Var*TMPK/compound **4**, (**f**) *Hss*TMPK/compound **4**, (**g**) *Var*TMPK/compound **5**, (**h**) *Hss*TMPK/compound **5**, (**i**) *Var*TMPK/compound **6**, (**j**) *Hss*TMPK/compound **6**, (**k**) *Var*TMPK/compound **8**, (**l**) *Hss*TMPK/compound **8**, (**m**) *Var*TMPK/compound **9**, (**n**) *Hss*TMPK/compound **9**.

**Figure 5.** H-bonds between (**a**) VarTMPK/compound **2**, (**b**) HssTMPK/compound **2**, (**c**) VarTMPK/compound **3**, (**d**) HssTMPK/compound **3**, (**e**) VarTMPK/compound **4**, (**f**) HssTMPK/compound **4**, (**g**) VarTMPK/compound **5**, (**h**) HssTMPK/compound **5**, (**i**) VarTMPK/compound **6**, (**j**) HssTMPK/compound **6**, (**k**) VarTMPK/compound **8**, (**l**) HssTMPK/compound **8**, (**m**) VarTMPK/compound **9**, (**n**) HssTMPK/compound **9**.

**Figure 6.** Main interactions of the complexes *Var*TMPK/compound **3** (**left**) and *Var*TMPK/compound **6** (**right**) calculated by MM-PBSA.

#### **3. Discussion**

As can be seen, Table 2 shows the toxicity and mouse carcinogenicity values of the investigated structures and the antivirals used as precursors. In short, with respect to the proposed compounds, it is important to point out that the acute toxicity values in algae calculated for all the structures were lower than the value found for cidofovir (1.19), which is not toxic. Furthermore, except for compounds **1** and **10**, all the structures had lower toxicity values in comparison with those of ribavirin (0.54).

According to the literature, the cavity of *Hss*TMPK acquired through the software Molegro virtual docker (MVD)®, presents a volume of 90.112 Å<sup>3</sup> , which is greater than the value of 76.288 Å<sup>3</sup> observed for the cavity of *Var*TMPK. It is important to point out that the *Hss*TMPK cavity is more outspread and narrower. Therefore, it is expected to have issues in the entrance and/or permanence of larger inhibitors in the binding site [7].

The results in Table 3 show that all the compounds are capable of binding to the active sites of both enzymes. This is reflected by the negative values of Einteraction observed for all of them. Comparing the docking results obtained, we can state that compounds **2**, **3**, **4**, **5**, **6**, **8**, and **9** form more stable complexes with VarTMPK. This enables us to infer that they might show higher affinity for the active site of this enzyme, since the systems formed among these compounds and the viral enzyme presented lower values of Einteraction, Ecofactor, and H-bond energy. Regarding the ∆Einteraction values, determined based on the difference between Einteraction inside *Var*TMPK (lower value) and *Hss*TMPK, the compounds listed above ranked: **2** (−62.85 kcal.mol−<sup>1</sup> ) < **5** (−55.40 kcal.mol−<sup>1</sup> ) < **9** (−45.90 kcal.mol−<sup>1</sup> ) < **4** (−38.57 kcal.mol−<sup>1</sup> ) < **3** (−35.41 kcal.mol−<sup>1</sup> ) < **6** (−27.85 kcal.mol−<sup>1</sup> ) < **8** (−11.32 kcal.mol−<sup>1</sup> ).

The best pose of compound 1 interacts with three active site residues of *Var*TMPK and two active site residues of *Hss*TMPK, as shown in Figure 3a. Among these residues, Arg93 from *Var*TMPK and Arg97 from *Hss*TMPK are equivalent between both enzymes.

Regarding compound **2** (Figure 3b), we observed interactions with three residues from *Var*TMPK, of which two belong to the active site, while for *Hss*TMPK, an interaction with only one residue was observed, which also belongs to the active site.

Compounds **3** (Figure 3c) and **4** (Figure 3d) interact with the same number of residues inside both enzymes. However, for *Var*TMPK, two interacting residues belong to the active site versus only one for *Hss*TMPK. Residues Arg93 from *Var*TMPK and Arg97 from *Hss*TMPK are the only equivalent interacting residues observed.

Figure 3e shows that compound 5 interacts with two active site residues from *Var*TMPK and one from *Hss*TMPK. No matching interacting residues were observed.

According to Figure 3f, compound 6 interacts solely with one active site residue from *Var*TMPK and has no interaction with the *Hss*TMPK active site residues. On the other hand, for compound **7**, Figure 3g shows interactions with two active site residues from *Var*TMPK and one from *Hss*TMPK. Arg93 from *Var*TMPK and Arg97 from *Hss*TMPK were the only equivalent interacting residues observed.

Regarding the complexes formed with compound **8** (shown in Figure 3h), we can see interactions with two active site residues from *Var*TMPK and one from *Hss*TMPK. In this case, Asp13 from *Var*TMPK and Asp15 from *Hss*TMPK are the equivalent interacting residues observed.

Figure 3i shows that compound **9** interacts with three active site residues from *Var*TMPK, and two from *Hss*TMPK. The equivalent interacting residues observed are: Asp13 (*Var*TMPK) and Asp15 (*Hss*TMPK), and Arg72 (*Var*TMPK) and Arg76 (*Hss*TMPK).

Finally, Figure 3j shows that compound **10** interacts with four active site residues from *Var*TMPK, and only one from *Hss*TMPK. The equivalent residues observed are Arg93 from *Var*TMPK and Arg97 from *Hss*TMPK.

The RMSD plots for the multiple MD simulations, performed for the systems enzyme/compound 2 (Figures S1–S6), showed the same behavior for all dynamics, with no variation above 0.1 nm. This validates our MD simulations protocol for the systems under study.

Comparing the RMSD plots in Figure 4 for both enzymes, it is possible to see that for all the systems, the RMSD values never passed 0.4 nm for the enzyme (black lines) and 0.2 nm for the ligands (red lines) during the simulated time. However, for most of the systems, both the enzyme and compounds presented more fluctuations inside *Hss*TMPK. Inside *Var*TMPK, this fluctuation never passed 0.05 nm for compounds **2**, **3**, **4**, **6**, and **9**.

Compounds **5** and **8** were the only ones showing a similar or more instable behavior inside *Var*TMPK compared to *Hss*TMPK (see Figure 4g,h,k,l). This does not corroborate with the docking results and suggests that these compounds are not selective towards *Var*TMPK.

In addition, we analyzed the number of H-bonds formed during the MD simulations for the complexes inside *Var*TMPK and *Hss*TMPK (Figure 5). The H-bonds profiles of the complexes with compounds **2** (Figure 5a,b), **3** (Figure 5c,d), **4** (Figure 5e,f), **6** (Figure 5i,j), and **9** (Figure 5m,n) show that these compounds were capable of keeping at least three H-bonds with *Var*TMPK during the whole simulated time, presenting a profile better or similar to *Hss*TMPK. Conversely, the H-bond graphs of the complexes with compounds **5** (Figure 5g,h) and 8 (Figure 5k,l) show a more unstable behavior with the prevalence of no more than two H-bonds during the whole simulation, presenting a worse (compound **5**) or similar (compound **8**) profile compared to *Hss*TMPK.

The results regarding the formation of H-bonds can be correlated to the RMSD observed during the MD simulations, and point to compounds **2**, **3**, **4**, **6**, and **9** as more selective towards *Var*TMPK. For this reason, these compounds were selected for the determination of their binding energies inside *Var*TMPK and *Hss*TMPK based on the MM-PBSA calculations.

Regarding the MM-PBSA calculations, the results in Table 4 show that for all compounds, the mean binding energy values of the complexes formed with *Var*TMPK were lower than the values determined for the complexes formed with *Hss*TMPK. These data confirm the results obtained by the docking and MD simulations, which suggested the selectivity of these compounds towards *Var*TMPK.

Moreover, the results in Table 4 show that compounds **3** and **6** stand out among the others in terms of binding energy with *Var*TMPK, which are the most stable ones. Furthermore, the difference related to *Hss*TMPK was higher than 90 KJ/mol in both cases, suggesting a high selectivity towards *Var*TMPK.

A possible explanation for the high free energy values of the complexes between *Var*TMPK and compounds **3** and **6**, is the presence of the trifluoroacetate group (–COOCF3) attached to the tetrahydrofuran ring. The presence of three fluorine atoms—a chemical element with high electronegativity and with isolated and free electron pairs on its structure—likely favors the selective interactions inside *Var*TMPK. Therefore, these interactions can help in maintaining the complexes formed between *Var*TMPK and compounds **3** and **6**—thus contributing to the higher stability of these systems.

#### **4. Materials and Methods**

#### *4.1. Compounds Studied*

The 2D structures of the compounds studied in this work are shown in Figure 7. As mentioned above, they are all derivatives from the antivirals cidofovir and ribavirin, designed to better explore the potential selective interactions with residues of the active site of *Var*TMPK.

The ionization states of each compound in Figure 7 at physiological pH (7.4)—as well as the drug likeness criteria established according to the Lipinski's rule of five [48]—were calculated with the aid of the Chemicalize databank (https://chemicalize.com) [46]. Their toxicity and carcinogenicity in mice were calculated with the online program PreADMET (https://preadmet.bmdrc.kr/adme-prediction/).

The tridimensional structures of these compounds, and the calculation of their atomic charges, were done through the Recife model 1 (RM1) semiempirical model [49,50], using the Spartan 08® Suite software.

**Figure 7.** Structures of the studied compounds.

#### *4.2. Applied Protocols*

In previous works [4,7,9], the tridimensional structures of *Var*TMPK and *Hss*TMPK that were used for our studies include, respectively, the homology model complexed with TDP and the Mg2+ cofactor, constructed by Guimarães et al. [7], and the crystallographic structure of *Hss*TMPK complexed with TDP and Mg2+, available in the protein data bank (PDB) under the code 1E2G [51]. According to Guimarães et al. [7], the Ramachandran plot of the homology model showed that 99.5% of the residues were situated in the most favored regions.

The docking protocol used, as well as its validation by re-docking studies, was the same as employed by Guimarães et al. [7]. The MolDock algorithm [52] from the Molegro virtual docker (MVD®) was used to perform the energy calculations of the ligands inside VarTMPK and *Hss*TMPK. The binding site was restricted to spheres with radii of 6 and 10 Å, respectively, around TDP, and all the residues inside these spheres were set to be flexible. The coordinates were centered on x = 8.95, y = 22.41, and z = 0.69 for VarTMPK, and x = 13.92, y = 75.19, and z = 25.05 for *Hss*TMPK [7]. The best poses for each ligand in the viral and human enzymes were selected for further MD simulations.

The protocol comprising the application of MD simulations rounds to the best poses selected from the docking studies was based on our most recent work [9]. Each ligand was parameterized in order to be recognized by the OPLS-AA force field [53]. The enzyme/ligand complexes were put inside cubic boxes of 450 nm<sup>3</sup> containing around 13,000 Tip4P type water molecules under periodic boundary conditions (PBCs), using the GROMACS 5.1.4 software [54]. The energy minimization steps used were: (1) Steepest descent with position restraint (PR) of the ligand; (2) steepest descent without PR; (3) conjugate gradients; and, lastly, (4) quasi Newton Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm [55], with a minimal energy of 1 kcal.mol−<sup>1</sup> . Thereafter, the systems were submitted to temperature (NVT) and pressure (NPT) balancing phases in order to attain equilibrium. The equilibrated systems were, then, submitted to 500 ps of MD at 310 K with a PR for the entire system, except for water molecules, in order to ensure the equilibrium of solvent molecules, and finally, 100,000 ps of free MD simulations at 310 K without PR, with 2 fs of integration time, and a cutting radius of 10 Å for long-distance

interactions [9]. All the Glu and Asp residues were assigned with negative charges, and the Lys and Arg residues were assigned with positive charges. As mentioned above, the visual molecular dynamics (VMD) software [56] was used to analyze the MD results of the systems. In order to validate our MD simulations protocol, three MD simulations were performed for the systems VarTMPK/compound 2 and *Hss*TMPK/compound 2.

The MM-PBSA method was employed to predict the free binding energies of the ligands inside the VarTMPK and *Hss*TMPK. This study allows us to infer whether the binding process is spontaneous (freeing energy) or nonspontaneous (requiring energy), thus enabling us to point to the prototype that is more interesting and promising regarding the selective inhibition of VarTMPK. The determination of the free energy of formation of the complexes, in association with the MD simulations, take into consideration three energetic terms in the calculation of binding energy: (1) Changes in the potential energy of the system in a vacuum; (2) polar and apolar solvation of the different species; and (3) the entropy related to the formation of the complexes during the gaseous phase [9,36,37,44,45]. For all the enzyme/ligand complexes, MM-PBSA calculations were performed using the g\_mmpbsa tool [37] from the GROMACS package. In order to consider non-correlated frames, the structures for the free energy calculations were obtained at 500 ps each after the stabilization of the systems [9].

The parameters used for the docking, MD simulations, and MM-PBSA calculations performed in this work are summarized in Table 5.


**Table 5.** Parameters used in the docking and MD studies.

\* Parameters of cubic boxes: 450 nm3/1300 H2O/Tip4P molecules.

#### **5. Conclusions**

According to the docking studies, it was found that only compounds **1**, **7**, and **10** did not demonstrate selectivity towards *Var*TMPK. The MD simulations studies corroborated the docking results for compounds **2**, **3**, **4**, **6**, and **9**, implying that the complexes formed between each of them and *Var*TMPK are more stable when compared to the same complexes with *Hss*TMPK. This was also observed in the further MM-PBSA calculations of the binding energies that showed lower binding values for these compounds inside *Var*TMPK than inside *Hss*TMPK. Moreover, the MM-PBSA studies pointed to compounds **3** and **6** as the most promising selective inhibitors of *Var*TMPK with the higher difference in binding energies. This is probably due to the presence of the—COOCF<sup>3</sup> group on these compounds that could be favoring their stabilization inside *Var*TMPK.

In conclusion, our results suggest that five of the 10 proposed compounds, derived from the antivirals cidofovir and ribavirin, can be excellent alternatives for the development of new drugs against smallpox. Since TMPK is highly conserved amongst other Orthopoxviruses, similar to the vaccinia and monkeypox viruses, the same theoretical approach used here can be done with the TMPKs of those viruses as a proof of concept. Furthermore, the experimental evaluation of these compounds should be first performed

against those viruses as a primarily model to check their potential in vitro. This would help in confirming the activity of those compounds before moving to the more complicated evaluation against the Variola virus, which will demand both an authorized laboratory and a stricter safety protocol.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3 390/ph14101027/s1, Figure S1: RMSD of the first MD simulation for the system HssTMPK/Compound 2, Figure S2: RMSD of the second MD simulation for the system HssTMPK/Compound 2, Figure S3: RMSD of the third MD simulation for the system HssTMPK/Compound 2, Figure S4: RMSD of the first MD simulation for the system VarTMPK/Compound 2, Figure S5: RMSD of the second MD simulation for the system VarTMPK/Compound 2, Figure S6: RMSD of the third MD simulation for the system VarTMPK/Compound 2.

**Author Contributions:** Conceptualization, T.C.C.F., A.P.G. and D.R.G.; methodology, D.R.G. and F.R.S.; software, T.C.R. and T.C.C.F.; validation, T.C.C.F. and D.R.G.; formal analysis, T.C.C.F. and D.R.G.; investigation, D.R.G. and F.R.S.; resources, T.C.C.F., Z.P., K.K. and M.V.; data curation, T.C.C.F. and T.C.R.; writing—original draft preparation, D.R.G.; writing—review and editing, D.R.G., T.C.C.F., M.V., Z.P. and K.K.; visualization, D.R.G.; supervision, A.P.G., T.C.R. and T.C.C.F.; project administration, T.C.C.F. and K.K.; funding acquisition, T.C.C.F., M.V., Z.P. and K.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Brazilian agencies Conselho Nacional de Pesquisa (CNPq), grant no. 308225/2018–0; Fundação de Amparo a Pesquisa do Estado do Rio de Janeiro (FAPERJ), grant no. E-02/202.961/2017; and the *Excelence project PˇrF* UHK 2011/2021-2022. This study was also partially supported by grants from the Ministry of Health of the Czech Republic (FN HK 00179906) and Charles University in Prague, Czech Republic (PROGRES Q40).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data is contained within the article and supplementary material.

**Acknowledgments:** The authors are grateful to the Military Institute of Engineering, Federal University of Lavras, Federal University of Viçosa and University of Hradec Kralové for the infrastructure.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Deep Modeling of Regulating Effects of Small Molecules on Longevity-Associated Genes**

**Jiaying You, Michael Hsing and Artem Cherkasov \***

Vancouver Prostate Centre, Department of Urologic Sciences, Faculty of Medicine, University of British Columbia, Vancouver, BC V6H 3Z6, Canada; jyou@prostatecentre.com (J.Y.); mhsing@prostatecentre.com (M.H.)

**\*** Correspondence: acherkasov@prostatecentre.com

**Abstract:** Aging is considered an inevitable process that causes deleterious effects in the functioning and appearance of cells, tissues, and organs. Recent emergence of large-scale gene expression datasets and significant advances in machine learning techniques have enabled drug repurposing efforts in promoting longevity. In this work, we further developed our previous approach—DeepCOP, a quantitative chemogenomic model that predicts gene regulating effects, and extended its application across multiple cell lines presented in LINCS to predict aging gene regulating effects induced by small molecules. As a result, a quantitative chemogenomic Deep Model was trained using gene ontology labels, molecular fingerprints, and cell line descriptors to predict gene expression responses to chemical perturbations. Other state-of-the-art machine learning approaches were also evaluated as benchmarks. Among those, the deep neural network (DNN) classifier has top-ranked known drugs with beneficial effects on aging genes, and some of these drugs were previously shown to promote longevity, illustrating the potential utility of this methodology. These results further demonstrate the capability of "hybrid" chemogenomic models, incorporating quantitative descriptors from biomarkers to capture cell specific drug–gene interactions. Such models can therefore be used for discovering drugs with desired gene regulatory effects associated with longevity.

**Keywords:** library of integrated network-based cellular signatures (LINCS); longevity; gene regulating effects; gene descriptors; molecular fingerprints; machine learning; deep neural network; drug repurposing

#### **1. Introduction**

Aging is an ultimate, intrinsic risk factor for all degenerative conditions, and the incidence of age-associated diseases, such as Alzheimer's, Parkinson's, dementia, and osteoporosis (among many others), increases dramatically as we age. Moreover, humans are likely to suffer from conditions, such as vision impairment, chronic diseases, and cancers in older ages, all of which can greatly reduce the quality of life. Numerous studies were conducted in recent years to reverse the biological aging clock in animals, and a recent work has successfully demonstrated restored vision in mice by switching certain cells to a "younger" state [1]; thus, promising the possibility to regenerate tissues and organs in mammals, and encouraging researchers to explore longevity beyond laboratory animals. For example, mTOR inhibitors marked a milestone in anti-aging drug discovery and produced an FDA-approved drug, rapamycin, which extended the life spans of several model organisms. Rapamycin succeeded in increasing the lifespans by nearly three-fold in mice [2] and was proven to prolong life in yeast, worms, and flies [3]. However, there are objections to rapamycin, including warnings that such an immunosuppressive drug could lead to the development of malignancies, such as skin cancer (noted in an FDA statement). Moreover, irreversible side effects, such as diabetes [4], are also main concerns that have prevented the use of rapamycin at a larger scale. In recent years, a variety of similar studies have proposed geroprotector candidates that could potentially promote life spans [5–7].

**Citation:** You, J.; Hsing, M.; Cherkasov, A. Deep Modeling of Regulating Effects of Small Molecules on Longevity-Associated Genes. *Pharmaceuticals* **2021**, *14*, 948. https:// doi.org/10.3390/ph14100948

Academic Editor: Osvaldo Andrade Santos-Filho

Received: 25 August 2021 Accepted: 18 September 2021 Published: 22 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

For example, acarbose [8], initially used to treat diabetes, showed significant effects in improving the health and life spans of mice.

Recent developments in genomics and transcriptomics have led to a vast collection of large-scale gene expression datasets. Connectivity Map (CMap) [9], introduced in 2006, is aimed to link connections among genes, drugs, and diseases, by comparing gene signatures with reference perturbations; thus, it is a great resource when developing drug candidates with desired efficacies. CMap data have greatly been used in the bioinformatics field, especially in drug discovery applications, to retrieve novel chemicals that share similar regulatory effects on gene expressions with known perturbations. The NIH Library of Integrated Network-based Cellular Signatures (LINCS), inspired by the success of CMap, was funded as a next generation platform, using a more advanced approach at a lower cost, producing high-throughput gene expression profiles that have outpaced CMap. LINCS, with data stored in NCBI Gene Expression Omnibus (GEO), describes over 1 M gene perturbations, inflicted by thousands of small molecules at a variety of conditions and across multiple cell lines. With the increasing availability in gene expression profiles, we now have the opportunity to study how small molecules affect genes in human cells and to utilize the available gene expression data to predict drug responses, offering tremendous value for drug discovery and repurposing. For example, the limited biological knowledge on the recent COVID-19 outbreak made it difficult to choose appropriate treatments; however, querying differentially expressed genes in similar diseases (SARS-CoV-2) against CMap, to detect similarly behaved drug candidates without any prior knowledge, was shown to be an efficient therapeutic strategy [10]. In addition, rapidly emerging machine learning technologies provide powerful computational tools to discover the underlying biological mechanisms in a variety of domains. Thus, our previous study, DeepCOP, has proven the capacity of deep learning models in predicting gene expression regulating effects using LINCS perturbation datasets [11].

In this work, we propose repurposed anti-aging drug candidates by analyzing their regulation effects on the expression of pro-longevity and anti-longevity genes from the LINCS dataset. While simply querying LINCS is still a valid method to repurpose the existing drugs, this approach is limited to a very small portion of the chemical space with only about 5000 compounds. Moreover, most of the experiments described in the CMap/LINCS depository were designed to measure perturbation responses in cancer cell lines; thus, making it challenging to study longevity effects of drugs in normal, nontumorous cell lines. Thus, it is essential to build more general machine learning models that can harness the existing data from LINCS and apply to larger chemical space and non-cancer cell lines.

Herein, we hypothesize that deep neural network (DNN) could learn from high dimensional features, including gene ontology terms, small molecule descriptors, cell line mutation, and methylation data to produce reliable predictions on drug–gene regulation effects across multiple cell lines. To build testable computational models to predict regulating effects on unknown data, we applied assorted classification approaches, including DNN, random forest (RF), Naïve Bayes, and logistic regression. We tested the drug (D)–gene (G) regulation effects on external normal cell lines using the pre-trained DNN models. We identified 13 small molecules from the LINCS dataset that demonstrated potential ability to regulate aging gene expressions with the desired effects. We further demonstrated that the efficacy of these repurposed drugs on longevity is supported by some examples from the literature.

#### **2. Results and Discussion**

#### *2.1. Sample Distributions*

We have labeled the upregulated and downregulated D–G–C interactions with the top/bottom 5% Z-score cut-off in LINCS for each cell line. This results in a comparably much smaller proportion in positive samples then the negatives. In addition, LINCS experiments are not distributed evenly across cell lines, so that the sample size differs from different cell lines. For example, cell line A375 contains 73,610 unique D–G–Cs, labeled as positive samples with top 5% threshold, while the remaining 1.2 million D–G–Cs with unknown regulating effects form the negative set. Table 1 demonstrates the unique drugs and genes in each cell line for the upregulated models.

**Table 1.** Sample distribution for each cell line in upregulated models; positive samples are defined as the top 5% gene signatures, while negative samples are the remaining 95%. Unique numbers of drugs and aging genes are summarized below. For example, cell line A375 contains 73,610 unique D–G–Cs as positive samples with the top 5% threshold, while the remaining 1.2 million D–G–Cs with unknown regulating effects form the negatives. (#: Counts.)


#### *2.2. CMAP LINCS Dataset Querying Results*

By diving into the positive samples from the model 1 dataset labeled as upregulated D–G–C pairs, we ranked compounds that interact with the most pro-longevity genes across all LINCS cell lines. For each small molecule in the positive samples of model 1, we built a pool of (drug)–(pro-longevity gene)–(cell line) pairs and selected the molecules with the most interactions. To avoid chemicals that only upregulated pro-longevity genes within a small range of cell lines, or chemicals that only interacted with a few certain genes, we calculated the unique number of pro-longevity genes and cell lines to ensure the diversity and robustness of the selected chemicals. Only pairs that covered above 100 pro-longevity genes and more than five cell lines were included for the final ranking. Table 2 shows the top 10 small molecules that upregulate the most pro-longevity genes across all LINCS cell lines.

**Table 2.** Top-ranked 10 small molecules that upregulate the most pro-longevity genes across all cell lines in LINCS. (#: Counts.)


Conversely, positive samples in model 2 indicate downregulated D–G–C interactions for anti-longevity genes. We ranked small molecules that downregulated the most antilongevity genes in model 2, using the same filter as Table 2, and obtained the 10 top-ranked chemicals, as shown in Table 3.

**Table 3.** Top-ranked 10 small molecules that downregulate the most anti-longevity genes across all cell lines in LINCS. (#: Counts.)


We observed seven identical drugs AT 7519, CGP-60474, trichostatin-a, alvocidib, narciclasine, oxetane, emetine in both tables, which showed not only upregulation effects with pro-longevity genes, but also downregulation effects with anti-longevity genes across multiple cancer cell lines in LINCS. In addition, PHA-793887, zibotentan, and mitoxantrone showed potential in upregulating pro-longevity gene expression, while chemical BI-2536, LSM-3353, and BMS-345541 showed downregulated expressions on anti-longevity genes. In total, we can repurpose 13 unique small molecules in LINCS perturbations for longevity purpose.

Figure 1 shows the top 10 ranked small molecules with D–G (pro-longevity genes)–C interactions from model 1, and Figure 2 illustrates D–G (anti-longevity genes)–C interactions on the top 10 ranked small molecules from model 2. The color indicates the occurrence on different cell lines. From green to blue, the line connecting longevity genes with repurposed chemicals demonstrates a higher occurrence on different LINCS cell lines. For example, drug BI-2536 connects with anti-longevity gene RAD51 with downregulating effects in 9 cancer cell lines, while drug trichostatin-a promotes pro-longevity gene expressions (*GDI1*, *ZNF224*, *MAP3K13*, *EPHB1*, *ZNF500*, *PPFIA3*) in 10 cancer cell lines.

**Figure 1.** Top 10 small molecules that upregulate pro-longevity genes across all LINCS cell lines. Pro-longevity genes are shown on chromosomes bands, repurposed chemicals are shown on the drug band. Colors of interactions indicate the relationship occurrence on different cell lines. Green D–G (pro-longevity genes) links indicate the interactions were captured in less than six cell lines; pairs found in above five cell lines are labeled in blue.

#### *2.3. Model Performance*

–

–

We estimated the accuracy parameter, AUC score, precision, and recall values for each model, as shown in Table 4. A skewed class distribution in models 3–8 failed accuracy on evaluation of the model performance. Another commonly used interpretation metric, ROC curve, was employed in binary classification problems [12] to diagnose the trade-off between sensitivity and specificity, and a higher ROC value indicates the trained model is better in distinguishing between categories. However, area under the ROC curve could be misleading when the one class significantly outweighs the other [13]. AUC score and ROC visualization could be deceptively appealing in this scenario. Instead, precision and recall provide a straightforward evaluation, focusing on the comparably small positive class based on the imbalanced dataset [13], given the concept of the precision-recall curve (PRC) –

being the indictor of true positives in all positive predictions. Our results demonstrate that DNN outperformed the other benchmark approaches including RF, Naïve Bayes, and ridge regression for every model, despite the selected features in terms of the APR score with an acceptable drop in AUC and accuracy. We also found that by concatenating the cell line methylation beta values and binarized mutation status, the deep neural network is more capable at extracting useful features from high-dimensional feature sets through the learning process and results in better performance than learning with single cell line annotation resource. Deep neural network models (model 1–2) have outstanding APR scores and, thus, encourage making reliable predictions in further investigations on unknown datasets. ROC and precision recall curves are shown in Figure 3 for model 1 and model 2, and curves for the rest of the DNN models are provided in the supplementary figures.

– **Figure 2.** Top 10 small molecules that downregulate anti-longevity genes across all LINCS cell lines. Anti-longevity genes are shown on chromosomes bands, repurposed chemicals are shown on the drug band. Colors of interactions indicate the relationship occurrence on different cell lines. Green D–G (anti-longevity genes) links indicate the interactions were captured in less than six cell lines; pairs found in above five are labeled in blue.


**Table 4.** Model performance on overall accuracy, Area under the ROC curve (AUC), and area under the precision-recall curve (PRC) for the positive class on deep neural network(DNN), random forest, naïve bayes and logistic regression models.


**Figure 3.** ROC and precision curves for model 1 ((**a**): PRC, (**b**): ROC) and model 2 ((**c**): PRC, (**d**): ROC). While the AUC score dropped compared with models 3–8, the dramatic increase in the APR score (positive class) gained confidence in predicting the positives in further exploration.

#### *2.4. Prediction on Normal Cell Lines*

Newly generated pairs with top-ranked repurposed chemicals and longevity genes were predicted with our best-performed models—model 1 to predict upregulating effects and model 2 for downregulating the effects on normal cell lines, NHBEC and HGEC6B. In each drug candidate pool (Equation (1)), we generated D (drug candidate)–G (longevity genes)–C (normal cell line) connections as input for the pre-trained deep neural network models, and explored the positive predictions with desired regulation effects, respectively. We summarized the total positive predictions along with the number of corresponding aging genes for each drug candidate in model 1 and model 2, respectively, in Figure 4. The

Overall samples of Dሺiሻ ൌ Pool൫Dሺiሻ൯ ൌ Dሺiሻ െ GሺAging geneሻ െ CሺNormal cell lineሻ

prediction results confirmed the efficacy of the potential desired aging gene regulation effects on normal cell lines. Overall samples of D(i) = Pool(D(i)) = D(i) − G(Aging gene) − C(Normal cell line) (1)

**Figure 4.** Bar charts on positive predictions for repurposed drugs in CMAP LINCS dataset, for normal cell lines, NHBEC and HGEC6B, using model 1(**a**) and model 2(**b**). Orange bars demonstrate the total number of positive predictions for each drug candidate, and blue bars illustrate the number of unique aging genes among positive predictions.

– – " " " " " " " " " " – – at 'BI 6′ Table 5 shows the proportion of positive predictions against the D–G–C pool for each promising drug candidate. We observed an above 80% positive prediction rate for drugs "BI-2536", "CGP-60474", "oxetane", "alvocidib" and "PHA-793887" in both model 1 and model 2, demonstrating their great potential to upregulate pro-longevity gene expression and downregulate anti-longevity gene expression in normal cells. All of the D–G–C connections in BI-2536 pool were predicted positive in model 2, meaning that 'BI-2536 ′ downregulated all of the anti-longevity genes we collected from GenAge.

– – **Table 5.** Percentage of positively predicted D (drug candidates)–G (aging genes)–C (normal cell line) pairs for each promising drug candidate in model 1 and model 2. Highlighted repurposed drugs showed great potential in regulating aging gene expressions on normal cell lines in both models. Drugs in bold achieved high positive rate (above 80%) on both models.


#### *2.5. Repurposed Drugs*

We finally identified 13 molecules that helped to promote pro-longevity gene expressions, inhibit anti-longevity gene expressions, or act in both desired ways. Structures of

repurposed molecules are shown in Figure 5. While performing experimental validations on these 13 molecules in longevity studies in model organisms is out of the scope for this paper, previous research has uncovered a number of relevant traits of those chemicals with our repurposed objectives. Table 6 summarizes the evidence that supports our findings.

**Figure 5.** Molecular structures of repurposed drug candidates for longevity purpose.

" " Among 13 discovered longevity-promoting chemicals, four (AT 7519, alvocidib, CGP-60474, and PHA-793887) are indicated as cyclin-dependent kinase (CDK) inhibitors. Interestingly, previous studies have shown inhibition on CDK-2 resulted in tolerance towards environmental stress and promoted anti-aging in *Caenorhabditis elegans* [14,15]. Moreover, "BI-2536" inhibits tumor growth in vivo by inducing apoptosis on cancer cells as an inhibitor of polo-like kinase 1 [16]. Experiments have found the effectiveness in anti-aging on emetine dihydrochloride treated to leukemic mice [17]. In addition, narciclasine was proven to attenuate diet-induced obesity by promoting oxidative metabolism [18] while trichostatin-a, a histone deacetylase (HDAC) inhibitor, was proven to increase lifespans by promoting hsp22 gene expression on *Drosophila melanogaster* [19]. Zibotentan was designed and tested on castration-resistant prostate cancer patients as an endothelin A receptor antagonist [20], it is also proven to prevent hypertension and maintains cerebral perfusion [21]. A study conducted among 42 women with breast cancer showed great potential in mitoxantrone as a treatment for advanced breast cancer with mild side effects compared to traditional treatments, such as chemotherapy [22].


**Table 6.** Summary of previous research findings for repurposed pro-longevity drugs.


**Table 6.** *Cont.*

#### **3. Materials and Methods**

#### *3.1. Datasets*

Connectivity map (CMap) is a pilot project that aims to characterize cellular responses under pharmacologic perturbagens, thus, fulfilling the underdeveloped space in diseaseassociated gene functions, and contributing toward drug development by estimating off-target activities and eliminating unfit candidates at early stages. To date, the CMap LINCS dataset encompasses more than 1.5 million gene expression signatures related to up to 5000 small molecules and more than 10,000 genes across a total of 77 cancer cell lines [9]. Such a vast amount of gene expression information enables computational approaches, such as the deep neural network, to learn data patterns, to predict gene regulation effects [11], and drug side effects [24]. In this work, we collected L1000 high-throughput gene expression data from LINCS phase I dataset; the dataset contains perturbation data points on gene expression level under small molecular treatments at different conditions, such as dosages, cell line cultures, and time points. To reduce the data size and to maintain consistency across multiple cell lines, only perturbations with 24-h treatment were kept, and samples with molecular dose units other than "µM" were excluded.

Aging-related genes were downloaded from the GenAge (the Aging Gene Database) source, which labels pro- and anti-longevity genes in various model organisms, including (but not limited to) *Caenorhabditis elegans*, *Drosophila melanogaster*, and *Zaprionus paravittiger*. GenAge has been developed through manual curation by experts and several collaborated associations. In this work, we collected a total of 2205 genes from GenAge that are considered to have either pro- or anti-longevity effects in 10 different model organisms, including *Caenorhabditis elegans*, *Mus musculus*, *Saccharomyces cerevisiae*, *Drosophila melanogaster*, *Mesocricetus auratus*, *Podospora anserina*, *Schizosaccharomyces pombe*, *Danio rerio*, and *Caenorhabditis briggsae*. These aging-related genes identified from the model organisms were mapped to 889 human genes in total, where 397 were labeled with prolongevity effects and 492 with anti-longevity effects, respectively (these datasets are downloadable online through https://genomics.senescence.info/download.html (accessed on 17 September 2021)). Out of a total of 889 collected aging genes, 729 were successfully mapped to the LINCS dataset and were further investigated in our models.

Only filtered LINCS perturbations that contained 729 aging-related genes were kept for further machine learning modeling and prediction. To label gene expression regulations, we used the left–right percentile method on the Z-score with a threshold of 5% for each cell line. Only the top 5% of gene expression values were considered as upregulation samples in the upregulation models, and the bottom 5% of gene expression values were marked as downregulated in the downregulation models, while the remaining 95% samples were treated with 'unknown' effects (Figure 6). In the training models that predicted upregulation effects, the above defined upregulation samples were treated as positive samples, while the remaining 95% were treated as negative samples. In the training models that predicted downregulation effects, the above defined downregulation samples were treated as positive-, while the remaining 95% were treated as negative samples. Figure 7 features gene expression Z-score distribution and data sampling in upregulation and downregulation models.

ples were treated with 'unknown' effects (Figure

ples were treated with 'unknown' effects (Figure

– **Figure 6.** Data pipeline in collecting and labelling LINCS perturbations. Only perturbations include aging genes were kept for further analysis. The left–right percentile method was applied to label upregulation and downregulation effects with 5% threshold on the Z-score for each cell line. –

**Figure 7.** Z-score normal distribution. Positive samples were selected from the top 5% gene expressions perturbations for each cell lines in models predicting up/non-upregulation effects. Similarly, the bottom 5% perturbations were identified as the positives in models predicting down/nondownregulation effects.

#### *3.2. Gene Descriptors*

Gene ontology (GO) terms have been commonly used for gene annotations in recent drug discovery applications [25,26]. The GO terms consist of a set of categories that describe the gene functions as in cellular components, biological processes, and molecular functions. R package "ontologySimliarity" was built for comparing gene semantic similarity as encapsulated by the GO annotations, including nearly 20,000 terms that relate to all branches of gene ontology. The gene ontology descriptors for the 729 aging related human genes in LINCS were generated to a list of binary integers using one-hot encoding with R package "ontologySimliarity". Only GO terms that shared with at least three age-related gene domains were selected to reduce the feature size, which resulted in 946 GO features in the final standard dataset of aging-related genes. Our previous works [11,27] already illustrated the efficacy of using GO terms as gene descriptors in machine learning models, especially with deep learning architectures.

#### *3.3. Molecular Fingerprints*

The molecular fingerprints were encoded into 0|1 binary vectors to encode chemical structures, where 1 indicated a specific substructure was found in a given molecule. In this paper, Morgan fingerprints [28] were generated for small molecules in the LINCS phase I dataset using the python library "RDKIT". The Morgan fingerprints were calculated by counting the path through each atom in the chemical given a specific radius and a bit number. By increasing the radius, more fragments can be included in the Morgan fingerprint computations and can output a larger chemical feature space. We set the radius to 2 in this work and generated 2048 Morgan fingerprints for each molecular using canonical SMILES.

#### *3.4. Cell Line Features*

Gene mutations play an important role in cancer genetics and can be utilized to represent cell line functionalities, as previous studies have demonstrated significant performance of mutation features in machine learning approaches [29,30]. We collected copy number alternations and coding variants in "pan-cancer" from the Genomics of Drug Sensitivity in Cancer public platform, and a total of 735 mutation markers were labeled for each cell line. The mutation annotation dataset for pan-cancer is freely downloadable at https://www.cancerrxgene.org/downloads (accessed on 17 September 2021).

Besides mutation markers, DNA methylation levels also contributed to drug response prediction applications [31,32]. Its impact in regulating gene expression determines organ functionalities and may cause severe diseases, such as cancer. The 450K BeadChip array provides high-throughput methylation data at more than 450K CpG sites, at a low cost, making it feasible for machine-learning algorithms to learn and extract informative features. We collected methylation profiling from the NCBI gene expression omnibus (GEO) series GSE68379, where a total of 1028 cell lines were tested with the methylation level for each CpG island. Ranging from 0 to 1, beta value was calculated as a ratio of methylated intensity verses the sum of methylated and unmethylated intensity at the probe level. The formular of beta value *B* at the specific *j* CpG site is defined as Equation (2):

$$B(j) = \frac{\max\left(y\_{j, metallicityisted}, 0\right)}{\max\left(y\_{j,methylated}, 0\right) + \max\left(y\_{j,monthlyated}, 0\right) + a} \tag{2}$$

where *yj*,*<sup>q</sup>* stands for *j th* probe intensities in q status. *α* is the added in denominator to avoid computational error. As recommended by Illumina [33], beta value is used in this work to represent the methylation level for cell lines. We used R package "FCBF" to select limited informative methylation beta values from 450 K CpG sites. The fast correlation based filter (FCBF) algorithm [34] selected the most relevant features towards histology sites of LINCS cancer cell line. Figure 8 shows the corresponding number of selected variables under a variety of (cell line)–(cancer histology sites) correlations. By choosing a correlation cut-off at 0.6, we obtained 1183 subset methylation levels for each cell line.

#### *3.5. Querying Camp LINCS Dataset*

To retrieve aging-related drug (D)-gene and (G)-cell line (C) combinations, we queried perturbations with the aging-related genes in the LINCS phase I dataset. Samples were consequently labeled through 5% left–right percentiles as upregulation effects and downregulation effects, respectively. Human pro-/anti-longevity genes extracted from the GenAge platform were used as input samples to query against CMAP LINCS dataset signatures. Drugs that upregulated pro-longevity gene expression or downregulate anti-longevity gene expression across multiple cell lines were identified and could be repurposed for promoting longevity. Top chemicals, ranked by the number of D–G–C interactions, showed great potential in increasing lifespans in humans, as supported by previous studies.

**Figure 8.** Histogram of correlation values and the corresponding variable counts using FCBF. Correlations were calculated between CpG sites and cancer histology categories.

#### *3.6. Machine Learning Models and Deep Neural Network*

– – – Machine learning (ML) models have demonstrated unprecedented performance in recent computational biology applications [35–37]. ML approaches are programmed without explicit knowledge to self-extract informative features by learning the parameters, such as weights, and illustrate patterns towards the output. The pervasive applications in ML have changed our day-to-day lives, e.g., via object recognition applied in auto-driving cars, recommender systems on social media, and in-depth understanding on drug behavior. The capable solutions that trained models can learn are generally divided into regression and classification problems, where a regression model predicts the true numeric value given a set of features, and a classification model gives a category the input sample belongs. Commonly deployed classification algorithms include logistic regression, random forest (RF), and neural network (NN), each with pros and cons. It is notable that in the family of ML, deep learning (DL) plays an important part and is capable of learning more complex patterns with neurons, just as human brains. The advantage of DNN lies in absorbing datasets with high dimensions and recognizing nonlinearity; thus, providing solutions to a vast range of practical problems.

– To better illustrate D–G–C relationships and have a clear evaluation on the feature power, here we differentiated the sample size and feature sets, respective to up- and downregulation effects predictions, and designed the following eight models. The selected feature set in each model is: model 1, model 2, model 3, and model 4: gene descriptor, drug descriptor, cell line mutation status, cell line methylations; model 5 and model 6: gene descriptor, drug descriptor, cell line mutation status; model 7 and model 8: gene descriptor, drug descriptor, cell line methylations.

Due to the harsh filter on the sample selection process of determining up- or downregulation effects, only the top 5% samples were labeled as positive samples from LINCS for each model. Such severe imbalanced datasets challenged the ML approaches and could be difficult in measuring model performance for future predictions. To avoid models from being heavily influenced only by the majority class, we randomly selected the same number of negative and positive samples in model 1 and model 2. We used samples across all the cell lines in LINCS for models (1–2), and compared them with the remaining models (3–8)

that used the imbalanced dataset extracted from two cell lines "U266" and "NOMO1", which contained less data points and, thus, were easier for traditional machine learning benchmarks to train (details are provided in Table 7).

**Table 7.** Detailed model layouts on predictive labels, sample cell lines, used feature set, and whether the negative class is being downsampled. For example, model 3 was trained with gene ontology term, drug descriptors, both cell line mutation status, and methylation values on perturbation responses on cell lines "U266" and "NOMO1". The positive samples from model 3 were D–G–C interactions with the top 5% upregulated gene expression signatures, whereas the negative samples were the remaining 95% perturbation data.


Feature definitions: Type 1: gene ontology descriptors + drug fingerprints + cell line mutational status + cell line methylation levels. Type 2: gene ontology descriptors + drug fingerprints + cell line mutational status. Type 3: gene ontology descriptors + drug fingerprints + cell line methylation levels.

We then compared DNN model performance with commonly used classification solving algorithms including RF, Naïve Bayes, and logistic regression. Due to the large feature size, L2 norm(ridge) regulation was applied in logistic regression models to avoid coverage failure and overfitting, by taking the squared value of trained weighs as the penalty term in the cost function. DNN was constructed with four layers (one input layer, two hidden layers, and one output layer), and the information was randomly dropped by 50% in forward propagation. Selu and Rule activation functions were used for the internal hidden layers, adding complexity and non-linearity to the model, followed by a SoftMax activation for the final output layer, to transfer values into possibilities. The neural numbers of the hidden layers were identical as those from the initial input feature list. Figure 9 illustrates the DNN structure with the number of neurons and activation functions for model 1 and model 2. For more robust evaluation results, early stopping and three-fold cross validation (CV) were applied in the DNN model to avoid overfitting. We initiated the model with hyperparameters, such as layer numbers identical with our previous studies that demonstrate decent performance, and slightly revised them by watching validation results in such a manner where model complexity must decrease when overfitting and increase when underfitting.

#### *3.7. Model Evaluation*

To evaluate the performance on the developed models, we computed the overall accuracy score, area under the ROC curve (AUC) parameter, as well as precision and recall values for each model. Accuracy is simply calculated as the correct prediction proportion on the whole dataset, whereas the receiver operating characteristic curve (ROC) visualizes the model performance at all classification thresholds by comparing true positive rate (TPR) versus the false positive rate (FPR). Accuracy and AUC score are commonly used in evaluating machine-learning models and offer fairly accurate insight on model performance. However, both values are easily dominated by the majority group in the imbalanced datasets and could achieve misleading high scores. To address this issue, we introduced precision and recall as supplementary evaluations. Precision (Equation (3)), also known as positive predictive value, signifies the proportion of positive samples that are predicted positive. Recall (Equation (4)), also referred as true positive rate or sensitivity, evaluates the proportion of true positives out of all predicted positive

samples. Overall, precision and recall estimate the prediction power on the positives, which is highly important in our case, given that future repurposed drug candidates are based on the positive predictions and a false positive is more disastrous and costly than a false negative. As an alternative visualization of ROC curve on imbalanced datasets, the precision-recall curve (PRC) illustrates the trade-off on precision and sensitivity on every possible cut-off. A reasonable PRC curve should be above the diagonal line, with the area under the curve more than 0.5.

– **Figure 9.** Deep neural network structure for model 1 and model 2. Features of models 1–2 were contributed by gene ontology terms (946 bits), molecular fingerprints (2048 bits), cell line mutation status (735 bits), and cell line methylation beta level (1183 bits), forming the length of 4788 bits in total.


#### *3.8. Prediction on Normal Cell Lines*

Given the fact that all data in LINCS are for cancer cell lines, it is essential to run computational predictions for repurposing drugs with expected pro-longevity effects in normal, non-cancerous cells. The determining factor in choosing normal cell lines for prediction is the availability of features that are identical with our trained models. Two normal cell lines "NHBEC" and "HGEC6B" were tested for methylations in beta values using the same technology—Illumina 450K BeadChip arrays—and were annotated with identical CpG sites, as in GSE92843 and GSM2438425, respectively. As for the mutational status, these two normal cell lines were simply annotated as having 'none' in the prediction models.

We tested the regulation effects on the top 10 ranked promising drugs that we previously queried from LINCS on these two normal cell lines, "NHBEC" and "HGEC6B", with our best performed models, and provided the probabilities on desired regulating effects with pro-/anti-longevity genes. These 10 potential pro-longevity chemicals were paired with age genes under two normal cell lines, forming in total 7940 D–G–C pairs to be tested with up/non-upregulating (Equation (5)) and 9840 D–G–C down/non-down (Equation (6)), respectively. Figure 10 illustrates a flowchart for applying the longevity prediction models to these two normal cell lines. The mutational profiling for normal cell lines was labeled "0" in feature representation, and the methylation beta levels were collected from the Gene Expression Omnibus (GEO) "GSE92843" and "GSM2438425".

10 (promising drugs) × 397 (pro − longevity genes) × 2 (cell lines) = 7940 (5)

10 (promising drugs) × 492 (anti − longevity genes) × 2 (cell lines) = 9840 (6)

**Figure 10.** Predictions on normal cell lines with aging related genes and promising drug candidates queried from LINCS. Positive predictions will be of interest with desired regulating effects towards pro-/anti-longevity genes.

#### **4. Conclusions**

– – – It is estimated that the anti-aging global market value was over 60 billion US dollars annually in 2020 [38]. Machine learning tools can utilize substantial transcriptional perturbation data from resources, such as CMap and LINCS, and transfer them into predictive models and actionable knowledge on modulation of longevity genes. In this study, we labeled gene expression changes using the left–right percentile at a 5% threshold for each drug–gene–cell perturbation in the LINCS datasets and analyzed the labeled samples with known human aging-related genes. We created several machine-learning models to classify the direction of gene expression changes by using combined descriptive features of small molecules and genes along with information on cell line mutations and methylation

– –

levels. The deep neural network models outperformed the other K-machine learning methods and demonstrated promising accuracy in predicting up- or down-gene-regulating effects on perturbations beyond the scope of the original LINCS dataset. In addition, we demonstrated that the longevity models, while trained from cancer cell lines, are applicable to normal cell lines, and the models predicted a list of drug candidates that could have potential to be repurposed as pro-longevity agents. Quantitative predictions on all possible combinations of D (repurposed drug)–G (aging gene)–C (normal cell line) demonstrated the desired regulating effects on normal cells for the repurposed drugs with high positive rates. As a result, we identified 13 repurposing drug candidates that could potentially promote longevity by regulating aging-related gene expressions towards the desired direction, either upregulating pro-longevity genes, downregulating anti-longevity genes, or both. Interestingly, some of the proposed drug candidates were previously reported with aging-related functionalities in a number of model organisms. For example, one of the repositioned drug candidates, trichostatin-a, was found efficient at promoting anti-aging gene expression among fruit flies [19].

Our study utilized knowledge transferring from high-throughput gene expression profiling to testable data models, and achieved accurate performance in validating regulation effects, despite the severe imbalance of the data classes. In comparison to our previous model, DeepCOP [11], which is limited to only drug and gene descriptors, our current model has incorporated additional cell line descriptors that allow knowledge to be transferred from one set of cells (for example, cancer cells) to another set of cells (such as normal cells) using a single unified DNN. This task would not have been possible with DeepCOP models, where separate, disconnected DNN models were built for each individual cell line.

The limitations of this study include the possible improved methodologies in NN schemes and the lack of experimental validation of repurposed chemicals. One noticeable NN scheme—the graph convolutional neural network (GCN)—was presented in various studies, from drug discovery [39] to gene interactions [40]. In bioinformatics applications, two major types of graph structures were applied [41]: molecular structures and interaction networks. A multi-relational interaction network could be explored in GCN models using our preprocessed dataset that contains three domains—cells, genes, and drugs. Experimental validations should also be considered in future studies, in conjunction with docking simulations and ADMET estimations (a work in progress) using our in-house drug development platforms [42].

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/ 10.3390/ph14100948/s1, Figure S1: Model 3 AUC curve, Figure S2: Model 3 PR curve, Figure S3: Model 4 AUC curve, Figure S4: Model 4 PR curve, Figure S5: Model 5 AUC curve, Figure S6: Model 5 PR curve, Figure S7: Model 6 AUC curve, Figure S8: Model 6 PR curve, Figure S9: Model 7 AUC curve. Figure S10: Model 7 PR curve, Figure S11: Model 8 AUC curve, Figure S12: Model 8 PR curve, Table S1: Sample distribution in down-regulated models, Table S2: Sample distribution in up-regulated models.

**Author Contributions:** Conceptualization, J.Y., M.H. and A.C.; methodology, J.Y., M.H. and A.C.; software, J.Y.; validation, J.Y., M.H. and A.C.; formal analysis, J.Y., M.H. and A.C.; investigation, J.Y., M.H. and A.C.; resources, J.Y., M.H. and A.C.; data curation, J.Y., M.H. and A.C.; writing—original draft preparation, J.Y.; writing—review and editing, J.Y., M.H. and A.C.; visualization, J.Y.; supervision, M.H. and A.C.; project administration, A.C.; funding acquisition, A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Canadian Institutes of Health Research (CIHR), grant number 20R34369.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are contained within the article or Supplementary Material.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**

