**Preface to "Computational Approaches: Drug Discovery and Design in Medicinal Chemistry and Bioinformatics"**

To date, computational approaches have been recognized as a key component in drug design and discovery workflows. Developed to help researchers save time and reduce costs, several computational tools have been developed and implemented in the last twenty years. At present, they are routinely used to identify a therapeutic target, understand ligand–protein and protein–protein interactions, and identify orthosteric and allosteric binding sites, but their primary use remains the identification of hits through ligand-based and structure-based virtual screening and the optimization of lead compounds, followed by the estimation of the binding free energy. The repurposing of an old drug for the treatment of new diseases, helped by in silico tools, has also gained a prominent role in virtual screening campaigns.

Moreover, the availability and the decreasing cost of hardware and software, together with the development of several web servers on which to upload and download computational data, have contributed to the success of computer-assisted drug design. These improved, accurate, and reliable methods should help to add new and more potent molecules to the group of approved drugs. Nevertheless, the ease of access of computational tools in drug design (software, databases, libraries, and web servers) should not encourage users with little or almost no knowledge of the underlying physical basis of the methods used, who could compromise the interpretation of the results. The role of the computational (medicinal) chemist should be recognized and included in all research groups. These considerations led us to promote a volume collecting original contributions regarding all aspects of the computational approaches, such as docking, induced-fit docking, molecular dynamics simulations, free energy calculations, and reverse modeling. We also include ligand-based approaches, such as molecular similarity fingerprints, shape methods, pharmacophore modeling, and QSAR. Drug design and the development process strive to predict the metabolic fate of a drug candidate to establish a relationship between the pharmacodynamics and pharmacokinetics and highlight the potential toxicity of the drug candidate. Even though the use of computational approaches is often combined, we tried to identify which of these play a central role in each manuscript.

In this Special Issue, the use of molecular dynamics simulations, both unbiased and biased, cover a major part of the contributions. The non-covalent inhibition of the immunoproteasome was investigated in-depth through MD binding and binding pose metadynamics. MD simulations provided insight into the structural features of hTSPO (Translocator Protein) and the previously unknown interplay between PK11195, a molecule routinely used in positron emission tomography (PET) for the imaging of neuroinflammatory sites, and cholesterol. The interaction of certain endogen substrates, drug substrates, and inhibitors with wild-type MRP4 (WT-MRP4) and its variants, G187W and Y556C, were studied to determine differences in the intermolecular interactions and affinity related to SNPs using several approaches, but particularly all-atom, coarse-grained, and umbrella sampling molecular dynamics simulations (AA-MDS and CG-MDS, respectively). Natural sodium–glucose co-transporter 2 (SGLT2) inhibitors were selected to explore their potential against an emerging uropathogenic bacterial therapeutic target, i.e., FimH, which plays a critical role in the colonization of uropathogenic bacteria on the urinary tract surface, and molecular dynamics simulations were carried out to study the potential interactions. Doxorubicin encapsulation in carbon nanotubes with haeckelite or Stone–Wales defects as drug carriers were investigated using a molecular dynamics approach. The combined use of different approaches has been reported in a series of papers associated with the virtual screening of libraries. Almeelebia and co. screened 224,205 natural compounds from the ZINC database against the catalytic site of the Mtb proteasome. Pharmacophore-based virtual screening and molecular docking were carried out to identify potential Src inhibitors starting from a total of 891 molecules. Finally, MD simulations identified two molecules as potential lead compounds against Src kinase. An in silico study identified a methotrexate analog as a potential inhibitor of drug-resistant human dihydrofolate reductase for cancer therapeutics. A structure-based method for high-throughput virtual screening aimed to identify new dual-target hit molecules for acetylcholinesterase, and the 7 nicotinic acetylcholine receptor was reported and confirmed in vitro. A new complementary computational analysis called "dock binning" evaluates antibody–antigen docking models to identify why and where they might compete in terms of possible binding sites on the antigen. Interesting drug repurposing strategies have been reported. Hudson and Samudrala presented a computational analysis of a novel drug opportunities (CANDO) platform for shotgun multitarget repurposing. It implements several pipelines for the large-scale modeling and simulation of interactions between comprehensive libraries of drugs/compounds and protein structures. Qi and co. data-mined the crowd extracted expression of differential signatures (CREEDS) database to evaluate the similarities between gene expression signature (GES) profiles from drugs and their indicated diseases for GES-guided drug-repositioning approaches. In late 2019, the SARS-CoV-2 pandemic focused the attention of many researchers on not only vaccines but also new antiviral drugs. These reasons boosted the use of computational approaches to explore large libraries of natural compounds, already approved drugs, and in-house and commercial compounds. In this Special Issue, Baig and co. studied the efficacy of the Mpro inhibitor PF-00835231 against Mpro and its reported mutants in clinical trials. Several in silico approaches were used to investigate and compare the efficacy of PF-00835231 and five drugs previously documented to inhibit Mpro. Li and co. computationally investigated the MPD3 phytochemical database along with the pool of reported natural antiviral compounds to be used against SARS-CoV-2. Pedretti and co., exploiting the availability of resolved structures, designed a structure-based computational approach. The innovative idea of their study was to exploit known inhibitors of SARS-CoV 3CL-Pro as a training set to perform and validate multiple virtual screening campaigns. In the context of antiviral drugs, Regad and co. investigated the emergence of HIV-2 resistance. They proposed a structural analysis of 31 drug-resistant mutants of HIV-2 protease (PR2), an important target against HIV-2 infection. A wide series of contributions regarding the use of QSAR, machine learning, and deep learning has reported interesting outcomes. A multiple-molecule drug design based on systems biology approaches and a deep neural network to mitigate human skin aging was developed by Yeh and co. With the proposed systems medicine design procedure, they not only shed light on the skin-aging molecular progression mechanisms, but they also suggested two multiple-molecule drugs to mitigate human skin aging. The construction of quantitative structure–activity relationship (QSAR) models was used to predict the biological activities of the compounds obtained with virtual screening and identify new selective chemical entities for the COX-2 enzyme. The three-dimensional QSAR model, employing a common-features pharmacophore as an alignment rule, was built on 20 highly active/selective HDAC1 inhibitors. The predictive power of the 3D QSAR model represents a useful filtering tool for screening large chemical databases, finding novel derivatives with improved HDAC1 inhibitory activity. Different machine learning (ML) and deep learning (DL) algorithms using various integer and binary-type fingerprints were evaluated to develop quantitative structure–activity relationship (QSAR) models, which are important for hERG potassium channel blocker prediction.

Throughout this book, all the recent aspects of the computational approaches applied to several research fields are reported. We express our deep gratitude to all the contributors to this Special Issue for their commitment, hard work, and outstanding papers. We also thank all the reviewers involved in the manuscript revisions for their unpaid contributions to improve any aspects of the submitted works. Last but not least, we deeply thank Mrs. Jessie Zhang for her assistance during the period in which we served as guest editors.

### **Marco Tutone, Anna Maria Almerico** *Editors*

### *Article* **Immunoproteasome and Non-Covalent Inhibition: Exploration by Advanced Molecular Dynamics and Docking Methods**

**Giulia Culletta 1,2 , Maria Zappalà 2 , Roberta Ettari <sup>2</sup> , Anna Maria Almerico <sup>1</sup> and Marco Tutone 1,\***


**\*** Correspondence: marco.tutone@unipa.it

**Abstract:** The selective inhibition of immunoproteasome is a valuable strategy to treat autoimmune, inflammatory diseases, and hematologic malignancies. Recently, a new series of amide derivatives as non-covalent inhibitors of the β1i subunit with *K*<sup>i</sup> values in the low/submicromolar ranges have been identified. Here, we investigated the binding mechanism of the most potent and selective inhibitor, *N*-benzyl-2-(2-oxopyridin-1(2H)-yl)propanamide (**1**), to elucidate the steps from the ligand entrance into the binding pocket to the ligand-induced conformational changes. We carried out a total of 400 ns of MD-binding analyses, followed by 200 ns of plain MD. The trajectories clustering allowed identifying three representative poses evidencing new key interactions with Phe31 and Lys33 together in a flipped orientation of a representative pose. Further, Binding Pose MetaDynamics (BPMD) studies were performed to evaluate the binding stability, comparing **1** with four other inhibitors of the β1i subunit: *N*-benzyl-2-(2-oxopyridin-1(2H)-yl)acetamide (**2**), *N*-cyclohexyl-3- (2-oxopyridin-1(2H)-yl)propenamide (**3**), *N*-butyl-3-(2-oxopyridin-1(2H)-yl)propanamide (**4**), and (*S*)-2-(2-oxopyridin-1(2H)-yl)-*N*,4-diphenylbutanamide (**5**). The obtained results in terms of free binding energy were consistent with the experimental values of inhibition, confirming **1** as a lead compound of this series. The adopted methods provided a full dynamic description of the binding events, and the information obtained could be exploited for the rational design of new and more active inhibitors.

**Keywords:** immunoproteasome; non-covalent inhibitors; molecular dynamics; MD binding; metadynamics; induced-fit docking

#### **1. Introduction**

Protein turnover is essential for cellular function and homeostasis; in eukaryotic cells, the ubiquitin-proteasome system (UPS) is the central non-lysosomal pathway devoted to protein degradation. Whereas the lysosomal pathway mainly degrades membrane proteins or extracellular proteins imported into the cell by endocytosis, UPS, present both in the cytoplasm and nucleus, controls the degradation of damaged, incorrectly synthesized, or no longer useful intracellular proteins. Proteins are firstly tagged with several ubiquitin units; then, the polyubiquitinated proteins are rapidly hydrolyzed to small peptides by the proteasome, whereas ubiquitin is released and recycled [1]. The 26S constitutive proteasome consists of a barrel-shaped 20S catalytic core and two 19S regulatory caps. The catalytic core is constituted of four packed rings, each composed of seven different subunits, the two outer α, and the two inner β, respectively. The proteolytic activities reside in the β1c, β2c, and β5c subunits that are responsible for caspase-like (C-L), trypsin-like (T-L), and chymotrypsin-like (ChT-L) activities, respectively.

**Citation:** Culletta, G.; Zappalà, M.; Ettari, R.; Almerico, A.M.; Tutone, M. Immunoproteasome and Non-Covalent Inhibition: Exploration by Advanced Molecular Dynamics and Docking Methods. *Molecules* **2021**, *26*, 4046. https://doi.org/ 10.3390/molecules26134046

Academic Editor: Brullo Chiara

Received: 20 May 2021 Accepted: 29 June 2021 Published: 2 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Viale Annunziata, 98168 Messina, Italy; maria.zappala@unime.it (M.Z.); roberta.ettari@unime.it (R.E.)

Immunoproteasome is a specialized form of proteasome present in the vertebrates, constitutively expressed in lymphocytes and monocytes and induced by cytokines, such as IFN-α and TNF-α, in many other cell types. In immunoproteasomes, the constitutive catalytic subunits (β1c, β2c, and β5c) are replaced by the corresponding immunosubunits: β1i, β2i, and β5i. While β2i and β5i maintain the same type of activities as the β2c and β5c subunits, β1i mainly performs a ChT-L activity, thus cleaving peptides after hydrophobic amino acids [2]. High levels of immunoproteasomes have been found in a wide number of inflammatory diseases, such as Crohn's disease or inflammatory bowel disease, and autoimmune diseases like rheumatoid arthritis or systemic lupus erythematosus [3]. Furthermore, immunoproteasomes are overexpressed in hematologic malignancies, including multiple myeloma or acute myeloid leukemia [4]. Therefore, the discovery of selective immunoproteasome inhibitors is pivotal to bring new chances for the treatment of the above-mentioned diseases. Exhaustive reports on selective covalent and non-covalent immunoproteasome inhibitors have been recently published [5,6]. The main class of covalent immunoproteasome inhibitors is that of peptide derivatives bearing an electrophile warhead able to interact with the nucleophilic hydroxyl group of catalytic Thr1. Just to give some examples, ONX-0914, a tripeptide α ′ ,β ′ -epoxyketone, was the first β5i-selective inhibitor identified; another α ′ ,β ′ -epoxyketone, UK-101, and the peptidyl aldehyde IPSI-001 showed a selective activity on the β1i subunit [5,6]. However, the covalent irreversible inhibition of a human enzyme is not always desirable in medicinal chemistry, as it can be responsible for potential toxicity due to off-target binding. Another drawback is that a single mutation in the catalytic amino acid (i.e., Thr1) could cause a loss of activity and acquired resistance. [7]. Non-covalent inhibition is therefore strongly desirable, because it is free of these disadvantages. Lacking a reactive warhead, non-covalent inhibitors may have an improved selectivity and less reactivity and instability and, therefore, may not exhibit the side effects that occur in covalent inhibitor therapies (e.g., liver toxicity and idiosyncratic adverse reactions) [8,9]. Furthermore, the enzyme–inhibitor complexes have reduced lifetimes, and this promotes an extensive tissue distribution of the drug [10]. To date, few non-covalent immunoproteasome inhibitors show selectivity towards the β1i and/or β5i subunits. One of them is Argyrin B, a natural cyclic peptide that is a reversible, noncompetitive inhibitor of β5i and β1i [8]. Other compounds are N,C-capped dipeptides, such as PKS2279 and PKS2252, in which the insertion of a β-amino acid markedly reduces the inhibitory potency against constitutive proteasomes, yet maintain potent inhibitory activity against immunoproteasomes [11]. Recently, some of us identified a panel of selective non-covalent inhibitors of the β1i and/or β5i subunits, characterized by a 2(1H)-pyridone scaffold linked to an amide function [12]. *N*-Benzyl-2-(2-oxopyridin-1(2H)-yl)propanamide (**1**) proved to be the most potent and selective inhibitor, with a *K*i = 21 nM against the β1i subunit. Four other compounds of this series, *N*-benzyl-2-(2 oxopyridin-1(2H)-yl)acetamide (**2**), *N*-cyclohexyl-3-(2-oxopyridin-1(2H)-yl)propanamide (**3**), *N*-butyl-3-(2-oxopyridin-1(2H)-yl)propanamide (**4**), and (*S*)-2-(2-oxopyridin-1(2H) yl)-*N*,4-diphenylbutanamide (**5**), showed remarkable inhibitory activity towards the β1i subunit (Figure 1). Derivatives **3**–**5** were also active against the β5i subunit, whereas none of the compounds **1**–**5** proved to affect the constitutive catalytic subunits.

β **Figure 1.** Structures and *K*<sup>i</sup> values of the selective β1i inhibitors **1**–**5**.

β α′ β′ β The available experimental structures of immunoproteasomes provided the basis for several computational investigations. In the recent past, most of these studies made use of molecular docking methods. In particular, the binding mode of the non-covalent amide derivatives **1** and **2** was investigated at this level [12], while, to the best of our knowledge, the most accurate computational investigations were performed just on the β1i subunit (Figure 2A,B) and the peptide α′ ,β ′ -epoxyketone UK101 (Figure 2C) using molecular dynamics (MD) simulations. The observed selectivity of UK101 for the β1i subunit is rationalized by the requirement for both a linear hydrocarbon chain at the N-terminus and a bulky group at the C-terminus of the inhibitor [13]. In recent years, the constant update of hardware capabilities allowed the development of enhanced MD methods able to provide a full dynamical description of the target–ligand-binding events [14]. These methods are usually employed given that the sampling issue is fundamental to describing these slow processes while docking methods continue to be pivotal to screening large libraries, also assisted by MD [15,16].

β In this manuscript, we investigated the binding mechanism of the previously identified most active non-covalent amide **1** in the β1i subunit. For this purpose, we employed advanced molecular dynamics methods, such as MD binding (MDB) [17] and Binding Pose MetaDynamics (BPMD) [18]. In particular, we used the MDB tools implemented in the BiKi suite [19] to analyze the binding mechanism and gain insights into the ligand entrance pathway. Then, plain MD was performed to study the stability and conformational space into the immunoproteasome–ligand complex, thus allowing elucidation of the compound dynamic behaviors within the binding cavity. Successively, a cluster analysis provided representative poses that were used to evaluate the binding stability using the BPMD protocol. To have a comparative point of view, we also carried out induced-fit docking (IFD) and BPMD studies of the other four compounds (**2**–**5**) that showed high inhibitory activities towards the β1i subunit. The results obtained could provide further information to develop the most selective and active immunoproteasome inhibitors.

β

**2021**, , x FOR PEER REVIEW 4 of 21

β β **Figure 2.** (**A**) 3D structure of the immunoproteasome, the two β1i subunits in red, (**B**) 3D structure of the β1i subunit, and (**C**) the structure of inhibitors UK-101 and ONX-0914.

#### **2. Results**

#### *2.1. MD-Binding (MDB) Analysis*

β β We began the study using the crystal structure of the murine immunoproteasome in complex with the inhibitor ONX-0914 (Figure 2C) bound to the β5i subunit (PDB ID: 3UNF) [20]. Murine and human immunoproteasomes share a sequence identity of more than 90%, and the few nonidentical residues were external to the active sites. In the literature, a crystal structure of human immunoproteasome was recently released in complex with a boronic acid derivative [21], but the docking of compound 1 was previously studied on the β1i subunit structure derived from the PDB ID:3UNF that do not bind any ligand. For these reasons, we used it as a starting point to carry out our simulations. To gain insights into the ligand-binding mode, we employed the MDB method to predict the path of ligand entrance into the cavity. This method has the advantage of describing complex events without incurring prohibitive time and computational costs. It is based on an adaptive, electrostatics-inspired bias and a campaign of trivially parallel short MD simulations to identify a near-native binding pose from the sampled configurations. At a reasonable computational cost, this method also provides accurate predictions when the starting protein conformation is different from that of the crystal complex, a known hurdle for traditional molecular docking [22]. The advanced proposed MDB protocol would require the identification of the binding pocket with NanoShaper software [23], which can identify the atoms facing the lumen pocket entrance in the protein target. According to NanoShaper software, the attractive protein residues selected were Thr1, Val20, Ser21, Phe31, Lys33, Leu45, Ser46, Gly47, Ser48, Ala49, Ala52, Ser129, and Ser168 (Figure 3).

**2021**, , x FOR PEER REVIEW 5 of 21

<sup>β</sup> **Figure 3.** Identification of the binding cavity of the β1i subunit (solid blue) by NanoShaper software with the residues involved in the binding pocket.

Compound **1** is positioned with a random orientation at a predetermined distance, measured in terms of the thickness of the solvation shell around the ligand. From the set, we started 20 replicas of 20 ns for each entrance starting from the apo structure, thus collecting a total of 400 ns of MDB simulations.

The analysis of the results showed that the simulations overcome the energetic barrier in an average time of 2 ns, reaching the binding site. The unavailability of crystallographic structures for non-covalent ligand 1 did not allow the comparison of the conformations, employing the RMSD of the bound ligand. For these reasons, the RMSD of the protein backbone was used as a reference for any uncommon behaviors. All replicas showed a protein backbone RMSD average <2 Å, decreasing when the ligand arrived at the binding site (Supplementary Materials, Figures S1–S7). In most replicas, the ligand entered into the active site in the following 8 ns of the simulations, and in the last 10 ns, its refinement at the binding site was registered (Figure 4).

After the first 20 ns, the electrostatic bias was removed, and the sampling time was increased starting from the final frames of each MDB replica to enhance the sampling conformational changes and interactions of the ligand inside the binding site. For each replica, 10 ns more of the simulation was carried out, collecting a total of 600 ns of MD simulations. The plain MD simulations performed after the bias switch-off provided the local refinement of the binding mode. Once the binding simulation campaign was completed, the replicas ending without the ligand into the binding site were pruned out, and the remaining replicas were analyzed. The major part of the simulations showed a high stability, with the ligand strictly bound to the binding pocket, and in a few simulations, the ligand rapidly drifted away. Twelve replicas maintained a high stability at the binding site, as shown by averaged value of RMSD 1.5 Å of the complex (Supplementary Materials, Figures S8–S10). Then, the 12 replica trajectories were clustered. Each trajectory was recorded in 1000 frames, and these frames were clustered considering the RMSD of the binding site backbone (12,000 frames total). Each replica returned three representative clusters for a total of 36 MD representative poses. These last ones were further clustered and took into consideration the conformations of the ligand into the binding site and the most occurred interactions. In the end, it was possible to identify three final representative poses (pose 1, pose 2, and pose 3).

β **Figure 4.** Time sequence of compound **1** approaching the active site of the β1i subunit studied by MD binding. The figure is representative of the 20 replicas.

The poses obtained from the clustering analysis were characterized by the important features observed during the simulations. In particular, in pose 1, two H-bonds were formed between the oxygen of the amide group and Ser21 and between the hydrogen of the amide group and Gly47 (Supplementary Information, Figure S11A). The binding of the ligand was strengthened by several van der Waals contacts between the benzyl group and the residues Val20, Phe31, Lys33, Gly47, Ala49, and Ala52. Val20, Ser21, Ser46, Gly47, Ala49, and Ala52 interacted with the linker between the two rings.

The identified pose 2 showed a series of noteworthy interactions that have not been previously identified. The benzyl group of pose 2 interacted by a pi-stacking interaction with Phe31. This pose was stabilized by several van der Waals contacts. The 2-pyridone moiety showed a series of contacts different from pose 1 (Lys33, Leu45, Ser46, Gly47, Ser48, Ala49, Ala52, Ser129, and Ser168). Ser21, Phe31, Ser46, and Gly47 interacted with the ethylene linker. It is worthy to note the absence of H-bonds in this pose (Supplementary Information, Figure S11B).

Pose 3 was characterized by the same H-bonds network observed in pose 1, with Ser21 and Gly47 residues. The 2-pyridone moiety formed one cation-pi-stacking interaction with the epsilon amino group of Lys33 (Supplementary Information, Figure S12A). As observed for the other poses, van der Waals contacts strengthened the ligand binding in pose 3. The benzyl group interacted with Ser21, Ala22, Leu45, and Ser46. The 2-pyridone moiety showed contacts with several residues: Phe31, Lys33, Gly47, Ser48, Ala49, and andAla52. The linker showed interactions with Thr1, Ser46, Gly47, and Ser168.

The major differences observed for these poses concerned the orientations of pose 1 and pose 3 related to their interactions with the residues of the binding site. In particular, besides the same H-bonds, a flipped orientation of the 2-pyridone and the benzyl moieties was observed. This evidence could reveal that the entrance mode of the ligand occurred in different ways without affecting the binding capability during the MD runs. The folded conformation assumed by the ligand in pose 2 seemed to represent an intermediate conformation. Concerning previous studies [12], two pi-stacking interactions and van der

Waals contacts between the rings and the residues Thr1, Val20, Phe31, Lys33, Leu45, Ser46, and Ala52 were identified (Figure 5). **2021**, , x FOR PEER REVIEW 8 of 21

β **Figure 5.** 3D and 2D-binding modes of compound 1: pose 1 (**A**,**B**), pose 2 (**C**,**D**), and pose 3 (**E**,**F**) after the MDB simulations and after IFD (**G**,**H**) into the β1i active site of murine immunoproteasome (PDB ID: 3UNF). In the 3D figures, the H-bonds are represented in yellow dashes, the cation-pi-stacking interactions in green dashes, and the pi-pi stacking in blue dashes.

#### *2.2. Induced Fit Docking (IFD)*

To add more information concerning the previous docking studies and compare the results obtained by the MDB, we used the more accurate protocol induced-fit docking (IFD) [24]. IFD predicts the ligand-binding modes and concomitant structural changes in the receptor. It confers flexibility to the protein side chains, allowing the ligand to adjust and optimize the binding interactions within the active site. IFD was carried out for compound 1 and for the other four compounds **2**–**5** that showed encouraging inhibitory activity on β1i. In the previous studies, classical docking was performed for compound **2**, while computational studies have not been performed yet for compounds **3**–**5**.

The best IFD pose of **1** reported the same two interactions observed after the MD simulation: the H-bonds between the residues Gly47 and Ser21 of the protein and NH amide and the carbonyl group of the molecule (Figure 5G,H). Besides, other hydrogen bonds were found. In particular, the carbonyl of 2-pyridone moiety formed an H-bond network with Ala49 and Ala50. Concerning the other MDB poses, this was a peculiar difference of the IFD pose that was not observed in the docking study. These residues, together with Ser48, stabilized the ligand binding by van der Waals contacts, such as observed in the previous docking study. Other van der Waals contacts were formed between the benzyl group and Val20, Phe31, Lys33, Leu45, Gly47, and Ala52. Finally, the ethylene linker between the rings interacted with Thr1, Val20, Ser21, Gly47, and Ala49. It is interesting to note that the pi-stacking interactions observed in pose 2 between the benzyl group and Phe31 and the cation-pi-stacking interaction in pose 3 between the 2-pyridone and Lys33 were not evidenced in the IFD pose but only as van der Waals contacts (Figure 5).

The other four analogs of amide 1 were characterized by structural variations at the *N*-substituent and the methylene/ethylene linker between the 2-pyridone scaffold and the amide function. Compound 2 showed a methylene linker between the 2-pyridone scaffold and the amide function, and the experimental activity was recorded with a *K*i value of 2.23 µM on the β1i subunit. The best IFD pose for 2 showed three H-bonds: Ser21 with the carbonyl of amide and Gly47 with the NH amide and the carbonyl of 2-pyridone. The benzyl moiety of the molecule formed a cation-pi-stacking interaction with Lys33, as also evidenced for 1 in pose 3 (Figure 6A,B).

The cyclohexyl derivative **3** (*K*i = 2.92 µM) formed four H-bonds. The residue Thr1 made two H-bonds with the carbonyl of amide and the carbonyl of 2-pyridone. Gly47 formed two H-bonds with NH amide and carbonyl of 2-pyridone (Figure 6C,D). The interactions of the best IFD pose of *n*-butyl derivative **4** (*K*i = 3.09 µM) were characterized by two H-bonds between the carbonyl and NH of the amide of the molecule with Ser21 and Gly47, respectively. The 2-pyridone moiety formed pi-pi stacking with the Phe31 (Figure 6E,F). The last compound, (*S*)-2-(2-oxopyridin-1(2H)-yl)-*N*,4-diphenylbutanamide (**5**) (*K*i = 5.9 µM), showed two H-bonds, one between Ser21 and carbonyl of amide and the other between Ala49 and carbonyl of 2-pyridone (Figure 6G,H). Additionally, for these molecules, the recurrent interactions were between the residues Ser21, Gly47, and the amide group, but it underlined the pi-stacking interactions with Phe31 and Lys33, which could constitute clear evidence of the key role of these residues in the inhibition pattern.

**2021**, , x FOR PEER REVIEW 10 of 21

β **Figure 6.** 3D and 2D-binding modes of compound **2** (**A**,**B**), compound **3** (**C**,**D**), compound **4** (**E**,**F**), and compound **5** (**G**,**H**) into the β1i active site of murine immunoproteasome (PDB ID: 3UNF) after the IFD study. In the 3D figures, the H-bonds are represented in yellow dashes, the cation-pi-stacking interactions in green dashes, and the pi-pi stacking in blue dashes.

#### *2.3. Binding Pose MetaDynamics Analysis*

Binding Pose MetaDynamics (BPMD) is an automated, enhanced sampling, metadynamicsbased protocol, in which the ligand is forced to move around its binding pose. The possible higher mobility of the ligand under a biasing potential is a mark of the binding mode instability. This method showed the ability to reliably discriminate between the ligandbinding pose retrieved by MDB and a plausible alternative generated with IFD studies [22].

We decided to use BPMD to evaluate the affinity of the three representative poses obtained from MDB and the pose of IFD into the binding site for compounds **1** and **2**–**5**. The results were defined in terms of the pose stability based on the PoseScore, which is the RMSD of the ligand related to the starting coordinates of the heavy atoms of the ligand. Moreover, to evaluate the results, another metric is used, such as the PersScore, which is a clue of the H-bond formed between the ligand and the target during the simulations. The linear combination of these two scores provides a third score, the CompScore, which is calculated with Equation (1):

#### CompScore = PoseScore − 5 × PersScore (1)

Lower values of the CompScore indicate more stable complexes.

During the metadynamics simulations, pose 1 reached a steady PoseScore = 1.397, considered stable, while the PersScore showed that the hydrogen bonds identified at the start of the metadynamics run were kept for 60% of the averaged time (Figure 7A). In particular, the H-bond between the NH amide group of the ligand and Gly47 was kept for 88% of the simulation time, while the H-bond between the carbonyl of the ligand and Ser21 for 36% (Figure 7B). The CompScore value was -1.694. Due to the absence of H-bonds recorded, pose 2 with recorded pi stacking and van der Waals interactions showed the same value for the PoseScore and CompScore, 3.129 (Figure 7C), while, for pose 3, the scores were PoseScore = 3.349, PersScore = 0.223, and CompScore = 2.235, respectively (Figure 7E). As for pose 1, pose 3 kept the H-bond between NH amide and Gly47 as 26% and 18% between carbonyl and Ser21 (Figure 7F).

The PoseScore for the pose of amide 1 obtained by the IFD was 4.576, and the PersScore showed that the hydrogen bonds identified at the start of the MetaDynamics run were kept for 13% of the averaged time. The value of the CompScore was 3.917 (Figure 7G). It is interesting to point out that, of the four H-bonds detected by IFD, the two interactions between the amide group and Ser21 and Gly47 were maintained—in particular, the interaction between NH amide and Gly47 for 43% and 9% between carbonyl and Ser21 (Figure 7H).

The RMSD values and the percentage of the H-bonds retrieved from BPMD studies for the amide 1 in the three MDB poses and in the IFD pose showed that pose 1 could be considered more stable. Pose 1, pose 3, and the IFD pose adopted the same plain conformation and H-bonds between Ser21, Gly47, and the amide group. The differences were in the additional interactions between Ala49, Ala50, and the carbonyl of 2-pyridone, which led to a rotation of 2-pyridone, causing the ring to be specular in the IFD pose and showed a high value of RMSD (4.02 Å).

The BPMD analysis was also carried out for compounds **2–5** to evaluate their binding stability with respect to the most active compound of the series, **1**. The results of the BPMD calculations are reported in Figure 8. As can be evidenced from the plots, all showed PoseScore values higher than the averaged PoseScore for **1**. The hydrogen bonds identified at the start of the MetaDynamics run were maintained for 10-30% of the averaged time (Figure 8B,D,F,H) The CompScore values for compounds **2**–**5** were 4.750, 4.276, 5.979, and 1.728, respectively. Moreover, MM-GBSA-binding free energy calculations for all the complexes were performed. The plot of the calculated ∆G binding vs. the *K*i values is reported in Figure 9, and it shows an R<sup>2</sup> = 0.8071 (compound **1** ∆G = −52.912 Kcal/mol, compound **2** ∆G = −41.684 Kcal/mol, compound **3** ∆G = −41.355 Kcal/mol, compound **4** ∆G = −36.701 Kcal/mol, and compound **5** ∆G = −35.340 Kcal/mol).

31.

32.

33.

34.

35.

36.

37.

38.

39.

40.

41.

42.

43.

44.

F.

**Figure 7.** Plots of the RMSD estimate averaged over all 10 trials vs. the simulation time for the BindingPoseMetaDynamics runs: pose 1 (**A**), pose 2 (**C**), pose 3 (**E**), and IFD pose (**G**). Persistence Score Plots: pose 1 (**B**) pose 2 (**D**), pose 3 (**F**), and IFD pose (**H**). The blue dots represent the values of the CV RMSD at different times (2 ns, 4 ns, 6 ns, 8 ns, and 10 ns) for each simulation trial. The blue lines represent the mean CV RMSD values along the 10 × 10 ns of the simulation trials. The orange and blue bars represent the fraction of H-bonds maintained during the simulation for each trial.

31.

32.

33.

34.

35.

36.

37.

38.

39.

40.

41.

42.

43.

44.

F.

**Figure 8.** Plots of the RMSD estimate averaged over all 10 trials vs. the simulation time for the BindingPoseMetaDynamics runs: compound **2** (**A**), compound **3** (**C**), compound **4** (**E**), and compound **5** (**G**). Persistence Score plots of compounds **2** (**B**), compound **3** (**D**), compound **4** (**F**), and compound **5** (**H**). The blue dots represent the values of the CV RMSD at different times (2 ns, 4 ns, 6 ns, 8 ns, and 10 ns) for each simulation trial. The blue lines represent the mean CV RMSD values along the 10 × 10 ns of the simulation trials. The orange and blue bars represent the fraction of H-bonds maintained during the simulation for each trial.

**Figure 9.** The plot of the MM-GBSA ∆G binding vs. the *K*i values of compounds **1**–**5**. The binding free energy is expressed in Kcal/mol, and the IC<sup>50</sup> is expressed in µM.

#### **3. Discussion**

The inhibition of the human immunoproteasome is a hot topic of recent years in medicinal chemistry due to its involvement in a wide range of diseases. Promising immunoproteasome inhibitors, both covalent and non-covalent, have been recently identified. In covalent inhibitors, the presence of a reactive warhead may cause significant off-target activities against other proteins, which may result in side effects (e.g., liver toxicity and idiosyncratic adverse reactions) and reduced selectivity over time [8,9]. For these reasons, the attention was focused on non-covalent immunoproteasome inhibitors. In this context, a series of amide derivative β1i subunit inhibitors with *K*<sup>i</sup> values in the low micromolar or submicromolar range have been recently identified [12]. The use of computational approaches could characterize the binding process of these inhibitors—in particular, the use of advanced molecular dynamics approaches able to explore the dynamic features of the protein/ligand complex could overcome the limitations of semiflexible molecular docking methods in which the protein target is treated as a rigid body. Several advanced methods have been proposed in the last years for computing association and dissociation mechanisms, and all of them were shown to be promising in the interpretation of such mechanisms. With the aim to gain more insights into non-covalent inhibitors of the immunoproteasome, we decided to exploit these enhanced sampling methods.

Here, we investigated the dynamic binding mechanism of compound **1**, the most active of a series of non-covalent amide derivatives. With the aim of collecting mechanistic insight on the binding process, we performed the MDB protocol implemented in BiKi software to simulate the events that elapsed among the ligand unbound and the ligand entrance in the binding pocket. Successively, plain MD simulations were performed to extend the sampling of the bound states. The clustering of the survived complexes trajectories allowed identifying three representative poses (pose 1, pose 2, and pose 3) observed during the simulation. The most important interactions for the inhibition pattern were, in pose 1, two H-bonds between the amide group and Ser21 and Gly47 and, in pose 2, the benzyl group interacting by pi-pi stacking with Phe31. The residues Ser21 and Gly47 of pose 3 formed H-bonds with carbonyl and NH amide, and at the same time, the 2-pyridone moiety made a cation-pi-stacking interaction with the epsilon amino group of Lys33. Moreover, pose 1 showed a different orientation of the 2-pyridone moiety with respect to the docking and IFD studies. The 2-pyridone moiety was stabilized in the binding pocket by van der Waals contacts, as observed in MDB, while, in docking and IFD studies, it was stabilized by

H-bonds with Ala49 and Ala50 beyond van der Waals contacts. In pose 3, the peculiarity is represented by the 2-pyridone moiety interacting with the pi-stacking interaction with Lys33, which determined a flipped orientation with respect to pose 1. Finally, in pose 2, a different folded conformation with respect to pose 1 and pose 3 was observed, with a new pi-stacking interaction between the benzyl group and Phe31. The flipped orientations obtained for pose 1 and pose 3 suggested a different entrance mode of the ligand into the active site. Through the BPMD studies, it was possible to observe that both poses were stable, but, according to the RMSD value, the conformation of pose 1 showed major stability compared to pose 3 in the active site. Beyond the van der Waals contacts observed, the conformation of pose 1 could be strengthened by the pi-stacking interactions shown in pose 2 and pose 3 to improve the potency and selectivity of the β1i subunit. To pursue the matter, we also carried out IFD calculations for the other four amide derivatives **2**–**5** that showed an appreciable inhibitory activity on β1i. These studies revealed that Phe31 and Lys33 residues could play a key role in the inhibition pattern, in addition to the already known Ser21 and Gly47 ones, showing the importance not only of the hydrogen bonds but also of the pi-stacking interactions for the stabilization of the binding of the inhibitors. Moreover, the BPMD analysis confirmed the higher binding stability of inhibitor 1 with respect to the inhibitors 2–5, as evidenced by in vitro tests. Compound **1** showed the best CompScore (−1.694) with respect to the other compounds. The consistency of the computational analysis with the experimental data was further confirmed by the MM-GBSA-binding free energy calculations. These outputs were plotted against the experimental *K*i values, and the R<sup>2</sup> = 0.8071 confirmed compound **1** as the best derivatives of this series.

#### **4. Materials and Methods**

#### *4.1. System and Ligand Preparation*

For the purposes of this study, we selected the catalytic subunit β1i (LMP2 and PSMB9) extracted from the murine i20S in complex with the inhibitor ONX-0914 bound to the β5i subunit (PDB ID: 3UNF) [20]. Both 20S subunits, murine and human, share a sequence identity of more than 90%, and the few nonidentical residues are external to the active sites. As reported in the literature, in the case of covalent cocrystallized inhibitors [25,26], the reactive residue at the catalytic site was rebuilt after removing the covalent inhibitor by breaking the covalent bond and filling in the open valence. In this case, the involved residue was Thr1. The protein was prepared with the Protein Preparation Wizard [27] included in the Maestro suite (Maestro, Schrödinger, LLC, 2021, New York, NY, USA): adding bond orders and hydrogen atoms to the crystal structure using the OPLS2005 force field. Next, Prime [28] was used to fix missing residues or atoms in the protein and to remove cocrystallized water molecules. The protonation states at pH 7.2 ± 0.2 of the protein and the ligand were evaluated using Epik 3.1 [29]. The hydrogen bonds were optimized through the reorientation of hydroxyl bonds, thiol groups, and amide groups. In the end, the systems were minimized with the value of convergence of the RMSD of 0.3 Å. The ligands were drawn using Marvin Sketch 19.25 [30]. Amide 1 was parameterized using the BiKi suite [19] at the AM1-BCC [22] level of theory. Partial charges were derived using the RESP method [23] in Antechamber [24]. Compounds **2**–**5** were prepared using Schrödinger LigPrep v. 2021-1 (LigPrep, Schrödinger, LLC, 2021, New York, NY, USA). The force field adopted was OPLS2005, and Epik was selected as the ionization tool at pH 7.0 ± 2.0. Tautomers generation was flagged, and the maximum number of conformers generated was set at 32.

#### *4.2. MD-Binding Simulations*

The MD-binding method [17] within the BiKi suite [19] (BiKi Technologies s.r.l., Genova, Italy) exploits an additive external force that is summed as the regular potential energy of the system to enhance the probability of observing the binding event. The bias is represented by external electrostatic-like forces acting between a subset of the residues of the binding site and the ligand. The intensity of the bias is controlled by the adaptivity

rules and gradually switches off as the ligand moves torward the subset of residues; after the conjectured passing of the transition state has occurred, it slowly recovers the behavior of classical unbiased MD [31].

The protocol for MD-binding consists of crucial steps: characterization of the binding pocket using NanoShaper [23] (BiKi Technologies s.r.l., Genova, Italy). NanoShaper calculations provide a characterization of the binding pocket, which identifies the atoms facing the pocket entrance in the protein structure. This information was then used by BiKi software for the initial ligand positioning outside the binding cavity. Subsequently, an additive external force was made to enhance the sampling of the binding event. Once the ligand was positioned through the "Residue Placement" tool in BiKi, the system was solvated in an orthorhombic box using the TIP3P water model [32]. A suitable number of counterions were added to neutralize the overall system. Then, the whole system underwent energy minimization by using the Amber99SB-ildn force field [33]. According to the standard protocol [17], four different consecutive equilibration steps were performed: (1) 100 ps in the NVT ensemble at 100 K with both the protein backbone and ligand restrained (1000 kJ/mol nm<sup>2</sup> ), (2) 100 ps in the NVT ensemble at 200 K with both the protein backbone and the ligand restrained, (3) 100 ps in the NVT ensemble at 300 K with the free protein and the ligand restrained, and (4) 1000 ps in the NPT ensemble at 300 K with the free protein and the ligand restrained. Electrostatic interactions were treated with the cutoff method for short-range interactions and with the particle mesh Ewald method for long-range interactions (rlist = 1.1 nm, cutoff distance = 1.1 nm, vdW distance = 1.1 nm, and PME order = 4). The constant temperature conditions were provided using the velocity rescale thermostat [34], which is a modification of Berendsen's coupling algorithm [35]. The coordinate output from the last simulation was then used as the input to produce the biased molecular dynamics. Finally, 20 replica production runs, 20-ns-long in the NVT ensemble at 300 K, were performed for each complex using C = 0.1 (the fraction of the felt force, here 10%), a smoothing window size of 1000 samples, and a maximal K(t) of 0.001 (maximal steering constant).

#### *4.3. Plain MD Simulations*

The plain MD simulations were carried out using Desmond 6.5 [36] using the OPLS2005 force field [37] (Desmond Molecular Dynamics System, D. E. Shaw Research, New York, NY, USA). The complexes were solvated in orthorhombic boxes using the TIP3P water model. Ions were added to neutralize the charges. The systems were minimized and equilibrated at a temperature of 303.15 K and a pressure of 1.013 bar. The system was simulated as an NPT ensemble; a Nose–Hover thermostat and Martyna–Tobia–Klein barostat were used. The integration time step was chosen to be 2 fs. To keep the hydrogen–heavy atom bonds rigid, the SHAKE algorithm was used. A 9 Å cutoff radius was set for the short-range Coulomb interactions, and smooth particle mesh Ewald was used for the long-range interactions. For each replica, we carried out 10-ns MD simulations for a total of 200 ns, with 1.2-ps detection ranges for energy and 4.8 ps for the trajectory frames. The stability of the systems was evaluated using the root mean square deviation (RMSD) of the aligned protein and ligand coordinate set calculated against the initial frame. Visualization and analysis of the MD trajectories were performed using the Desmond simulation analysis tools in Maestro. The trajectories frames were clustered according to the hierarchical cluster linkage method. The 1000 frames recorded in each simulation were clustered considering the binding site conformations into 10 clusters based on the atomic RMSDs.

#### *4.4. Binding Pose MetaDynamics (BPMD)*

Binding pose MetaDynamics (BPMD) is an automated, enhanced sampling, metadynamicsbased protocol in which the ligand is forced to move around its binding pose. This method showed the ability to reliably discriminate between the correct ligand binding pose and plausible alternatives generated with docking or plain MD studies [18].

According to the protocol, 10 independent metadynamics simulations of 10 ns were performed using as a collective variable (CV) the measure of the root mean square deviation (RMSD) of the ligand heavy atoms with respect to their starting positions. The alignment before the RMSD calculations was done by selecting protein residues within 3 Å of the ligand. The Cαs of these binding site residues were then aligned to those of the first frame of the metadynamics trajectory before calculating the heavy atom RMSD to the ligand conformation in the first frame. The hill height and width were set to 0.05 kcal/ mol (about 1/10 of the characteristic thermal energy of the system, kBT) and 0.02 Å, respectively. Before the actual metadynamics run, the system was solvated in a box of SPC water molecules [38], followed by several minimizations and restrained MD steps that allow the system to slowly reach the desired temperature of 300 K, as well as releasing any bad contacts and/or strain from the initial starting structure. The final snapshot of the short unbiased MD simulation of 0.5 ns was then used as the reference for the following metadynamics production phase. After the simulation, the stability of the ligand during the course was represented by three scores: PoseScore, PersistenceScore (PersScore), and CompositeScore (CompScore). The PoseScore is indicative of the average RMSD from the starting pose. A steep increase of this value is a symptom that the ligand is not in a well-defined energy minimum and, probably, it might not have been accurately modeled. PersScore is a measure of the hydrogen bond persistence calculated in the last 2 ns of the simulation that have the same number of hydrogen bonds as the input structure, averaged over all the 10 repeated simulations. It covers a range between 0 and 1, where 0 indicates that either the starting ligand pose did not have any interactions with the target or that the interactions were lost during the simulations, while 1 indicates that the interactions between the staring ligand pose and the last 2 ns of the simulations were retained. CompositeScore is the linear combination of the PoseScore and PersScore; lower values equate to more stable complexes. Each complex, previously obtained, was run, Country) on a single node with a 1 GPU card NVIDIA GeForce RTX2070.

#### *4.5. Induced-Fit Docking*

The induced-fit protocol (IFD)—developed by Schrödinger [24]—is a method for modeling the conformational changes induced by ligand binding. This protocol models induced-fit docking of one or more ligands using the following steps, as also reported in references [39–42]. The protocol starts with an initial docking of each ligand using a softened potential (van der Waals radii scaling). Then, a side-chain prediction within a given distance of any ligand pose (5 Å) is performed. Subsequently, a minimization of the same set of residues and the ligand for each protein/ligand complex pose is performed. After this stage, any receptor structure in each pose reflects an induced fit to the ligand structure and conformation. Finally, the ligand is rigorously docked, using Glide XP (Glide, Schrödinger, LLC, 2021, New York, NY, USA), into the induced-fit receptor structure. IFD was performed using a standard protocol, and the OPLS2005 force field was chosen. The receptor box was centered on the active site of β1i, according to the NanoShaper calculations. During the initial docking procedure, the van der Waals scaling factor was set at 0.5 for both the receptor and ligand. The Prime refinement step was set on the side chains of residues within 5 Å of the ligand. For each ligand docked, a maximum of 20 poses was retained to then be redocked in XP mode.

#### *4.6. MM-GBSA-Binding Free Energy Calculations*

Prime/MM-GBSA was used for the estimation of ∆G binding. The MM-GBSA approach employs molecular mechanics, the generalized Born model, and the solvent accessibility method to elicit free energies from structural information, circumventing the computational complexity of free energy simulations, wherein the net free energy is treated as a sum of a comprehensive set of individual energy components, each with a physical basis [25]. In our study, the VSGB solvation model was chosen using the OPLS2005 force field with a minimized sampling method [28].

#### **5. Conclusions**

In this study, we investigated the mechanism of non-covalent inhibition of the potent and selective immunoproteasome inhibitor 1. For this purpose, we employed advanced molecular dynamics methods such as MD binding (MDB) and Binding Pose MetaDynamics (BPMD) and advanced docking methods such as induced-fit docking (IFD). MD binding allowed analyzing the binding mechanisms and gained insights into the ligand entrance pathway. Then, plain MD was performed to study the stability and conformational space in the immunoproteasome–ligand complex, thus allowing elucidation of the compound dynamic behavior within the binding cavity. These results were compared with the IFD poses of the other four inhibitors, revealing new key residues in the binding pattern, and confirmed the different binding stability of **1** with respect to the other compounds, **2**–**5**. The collected information and outcome could be useful in ameliorating the activity of compound 1 and providing a dynamical point of view for the definition of the pharmacophoric features that could be exploited through dynamic pharmacophore modeling approaches, such as the Common Hits Approach (CHA) [43] or MYSHAPE [44], for the scaffold-hopping of new non-covalent inhibitors through a virtual screening campaign.

**Supplementary Materials:** The following supplementary materials are available online, Figures S1–S12. Figure S1: MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 1 (**A**,**B**) and Replica 2 (**C**,**D**). Figure S2. MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 3 (**A**,**B**), Replica 4 (**C**,**D**), Replica 5 (**E**,**F**). Figure S3. MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 6 (**A**,**B**), Replica 7 (**C**,**D**), Replica 8 (**E**,**F**). Figure S4. MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 9 (**A**,**B**), Replica 10 (**C**,**D**), Replica 11 (**E**,**F**). Figure S5. MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 12 (**A**,**B**), Replica 13 (**C**,**D**), Replica 14 (**E**,**F**). Figure S6. MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 15 (**A**,**B**), Replica 16 (**C**,**D**), Replica 17 (**E**,**F**). Figure S7. MD-binding: Ligand RMSD calculated from the centroid of binding pocket and protein backbone RMSD (20 ns) for Replica 18 (**A**,**B**), Replica 19 (**C**,**D**), Replica 20 (**E**,**F**). Figure S8. Ligand and protein RMSD during MD-plain(10 ns). Replica 1 (**A**); Replica 2 (**B**); Replica 8 (**C**); Replica10 (**D**). Figure S9. Ligand and protein RMSD during MD-plain(10 ns). Replica 11 (**A**); Replica 12 (**B**); Replica 13 (**C**); Replica 14 (**D**). Figure S10. Ligand and protein RMSD during MD-plain(10 ns). Replica 15 (**A**); Replica 16 (**B**); Replica 17 (**C**); Replica 20 (**D**). Figure S11. Ligand Interaction Diagram of pose1 (**A**) and pose2 (**B**). Purple arrows show H-bond interactions and green line Pi-Pi stacking. Figure S12. Ligand Interaction Diagram of pose3 (**A**) and IFD pose (**B**). Purple arrows show H-bond interactions and red line Pi-cation.

**Author Contributions:** Conceptualization, G.C. and M.T.; methodology, G.C.; formal analysis, G.C.; investigation and data curation G.C. and M.T.; writing draft preparation, M.T.; writing—review and editing, G.C., M.T., M.Z., R.E. and A.M.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sample Availability:** Samples of the compounds are not available from the authors.

#### **References**


### *Article* **The Interplay of Cholesterol and Ligand Binding in** *h***TSPO from Classical Molecular Dynamics Simulations**

**Hien T. T. Lai <sup>1</sup> , Alejandro Giorgetti 2,3,4, Giulia Rossetti 2,3,5,6 , Toan T. Nguyen 1,\* , Paolo Carloni 2,3,7,8 and Agata Kranjc 2,3,9,10,\***


**Abstract:** The translocator protein (TSPO) is a 18kDa transmembrane protein, ubiquitously present in human mitochondria. It is overexpressed in tumor cells and at the sites of neuroinflammation, thus representing an important biomarker, as well as a promising drug target. In mammalian TSPO, there are cholesterol–binding motifs, as well as a binding cavity able to accommodate different chemical compounds. Given the lack of structural information for the human protein, we built a model of human (*h*) TSPO in the apo state and in complex with PK11195, a molecule routinely used in positron emission tomography (PET) for imaging of neuroinflammatory sites. To better understand the interactions of PK11195 and cholesterol with this pharmacologically relevant protein, we ran molecular dynamics simulations of the apo and holo proteins embedded in a model membrane. We found that: (i) PK11195 stabilizes *h*TSPO structural fold; (ii) PK11195 might enter in the binding site through transmembrane helices I and II of *h*TSPO; (iii) PK11195 reduces the frequency of cholesterol binding to the lower, N–terminal part of *h*TSPO in the inner membrane leaflet, while this impact is less pronounced for the upper, C–terminal part in the outer membrane leaflet, where the ligand binding site is located; (iv) very interestingly, cholesterol most frequently binds *simultaneously* to the so-called CRAC and CARC regions in TM V in the free form (residues L150–X–Y152–X(3)– R156 and R135–X(2)–Y138–X(2)–L141, respectively). However, when the protein is in complex with PK11195, cholesterol binds equally frequently to the CRAC–resembling motif that we observed in TM I (residues L17–X(2)–F20–X(3)–R24) and to CRAC in TM V. We expect that the CRAC–like motif in TM I will be of interest in future experimental investigations. Thus, our MD simulations provide insight into the structural features of *h*TSPO and the previously unknown interplay between PK11195 and cholesterol interactions with this pharmacologically relevant protein.

**Keywords:** *h*TSPO; PK11195; cholesterol; homology modeling; molecular dynamics (MD) simulation

#### **1. Introduction**

The translocator protein (TSPO) is a transmembrane protein (18kDa), evolutionary conserved and expressed in different organisms, from bacteria to humans [1]. Its biological

**Citation:** Lai, H.T.T.; Giorgetti, A.; Rossetti, G.; Nguyen, T.T.; Carloni, P.; Kranjc, A. The Interplay of Cholesterol and Ligand Binding in *h*TSPO from Classical Molecular Dynamics Simulations. *Molecules* **2021**, *26*, 1250. https://doi.org/ 10.3390/molecules26051250

Academic Editor: Marco Tutone and Anna Maria Almerico

Received: 11 December 2020 Accepted: 3 February 2021 Published: 26 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

functions are conserved throughout the phylogenetic spectrum, like tetrapyrrole biosynthesis and/or sterol metabolism [1,2]. Indeed, the bacterial TSPO homology in *Rhodobacter sphaeroides* can be functionally replaced by rat TSPO [3], despite that these proteins share only about 30% sequence identity. The human protein (*h*TSPO) is expressed in all tissues and located in the outer mitochondrial membrane [4,5]. Its highest expression levels are found in steroid-synthesizing cells of endocrine organs indicating that it may play an important role in steroid synthesis from cholesterol [6]. Mammalian TSPO binds cholesterol with high affinity by the cholesterol recognition/interaction amino acid consensus (CRAC) motif (residues 150–156) [7,8]. This motif is preceded by a short three amino acids sequence L144–A145–F146 (LAF) that is highly conserved in mammalian TSPO. In the experiment with the *Rs*TSPO mutant, where the three amino acids (A136–T137–A138) preceding the CRAC region were replaced by mammalian LAF sequence, it was shown that the LAF motif greatly increased the binding affinity for cholesterol with respect to the original bacterial sequence [9]. An additional binding prediction sequence for cholesterol was found in TSPO—the inverse version of CRAC, the CARC motif (residues 135–141) [10]. While in the case of the nicotine acetylcholine receptor, functional studies clearly show that a substitution of a specific amino acid in CARC slows the kinetics of cholesterol binding, in the case of TSPO, it is not yet known whether CARC binds cholesterol as well [11]. In addition, TSPO has been proposed to play an important role in other cellular processes like porphyrin transport [12,13], mitochondrial respiration [4,14], and immunomodulation [15].

TSPO expression is highly upregulated in cancer and at the sites of neuroinflammation processes in cerebral ischemia, Alzheimer's, Parkinson's, and Huntington's diseases, and multiple sclerosis (reviewed in [16]). In addition, a human single nucleotide polymorphism of TSPO (A147T) is associated with different psychiatric disorders, like bipolar disorder, anxiety, and panic attacks [17–19], along with cancer. Thus, TSPO is an interesting target for the development of diagnostic and therapeutic ligands [16,20]. TSPO is overexpressed in the outer mitochondrial membrane of activated microglia [21–23] and reactive astrocytes [24]. Chronic activation of microglia leads to the release of neurotrophic and proinflammatory factors that are neurotoxic and cause neuronal damage and neurodegeneration [25–27]. The microglial activation is imaged in human brain in vivo by positron emission tomography (PET) of TSPO radiolabeled ligands. PK11195 (1-(2-chlorophenyl)- *N*-methyl-*N*-(1-methylpropyl)-3-isoquinolinecarboxamide) is one of the most commonly used, high affinity TSPO ligands for studying the diagnostics and treatment of brain inflammation and of other inflammatory diseases [28].

Second- (PRB28, PBR06, DAA1106) and third- (ER176 and GE-180) generation ligands have been developed (reviewed in [29,30]). PRB28, PBR06, and DAA1106 have higher binding affinity for *h*TSPO than PK11195. They provide a better signal-to-typical positron emission tomography (PET) noise ratio. ER176 and GE-180 are being developed to overcome differences in binding affinities for the WT or the A147T mutant *h*TSPO. This mutant emerges in 30% of Caucasians, 25% of Africans, 4% of Japanese, and 2% of Han Chinese according to the Hapmap database (http://hapmap.ncbi.nlm.nih.gov (accessed on 25 January 2020)). It binds PRB28 [31], PBR06 [32], and FEPPA [33]. This consequently leads to the lower PET signal intensity and can provide misleading results (absence of neuroinflammation) for the carrier of the A147T mutation.

TSPO folds into a bundle of five transmembrane (TM) helices and a short extramembrane helix placed in the cytoplasmic loop between helices TM I and TM II [34–36]. Bacterial and mouse TSPO can exist as monomers, dimers, and other oligomers as shown by experiments [34,35,37,38]. Different computational molecular modeling studies were carried out to further shed light on the dimerization of mouse and bacterial TSPO and on how ligands (small chemical compounds, porphyrin, cholesterol) interact with dimers or influence their stability [39–42]. In contrast, the oligomerization state of *h*TSPO has not been established. An in vitro study showed that it can adopt dimeric or trimeric forms under the inflammation conditions reproduced by high concentrations of reactive oxygen species (ROSs) [43]. Different experimental TSPO structures have been solved to date, notably the

NMR structure of mouse TSPO (*Mo*TSPO, PDB ID: 2MGY [36]) and the X-ray structures of *Rhodobacter sphaeroides* (*Rs*TSPO, PDB ID: 4UC1 [35]) and *Bacillus cereus* (*Bc*TSPO, PDB ID: 4RYI [34]) TSPOs.

The experimental structure of *h*TSPO has not yet been solved. Here, we built a monomeric structural model of *h*TSPO, alone and in complex with PK11195 by homology modeling and docking, and we ran molecular dynamics (MD) simulations of the protein in a lipid environment. We analyzed in detail the interactions of PK11195 and cholesterol with *h*TSPO and how PK11195 alters cholesterol interactions with this protein. Our structural model and results can be a valuable source for future studies of *h*TSPO and its interactions with cholesterol and/or other pharmacological ligands.

#### **2. Results and Discussion**

#### *2.1. The hTSPO Structural Model*

The sequence of the *h*TSPO was aligned with those of *Rs*TSPO, *Bc*TSPO, and *Mo*TSPO (the sequence identity is of 29%, 24%, and 81%, respectively; Figure 1). We decided to use *Rs*TSPO (PDB code: 4UC1 [35]) as a template in structural modeling of the human translocator protein [44,45].

Indeed, the latter is currently the best template choice in comparative modeling of mammalian TSPOs [40,41] for the following reasons: (i) the structural folds of the TSPOs on passing from the bacterial to the mammalian proteins are conserved [34–36]; (ii) the NMR *Mo*TSPO structure is affected by the ionic detergents used for the purification during the measurements [36]; as a result, the positions of highly conserved amino acids and of the transmembrane helices are altered [40]; (iii) the *Rs*TSPO crystal structure was resolved at a relatively good resolution (1.8 Å).

**Figure 1.** Multiple sequence alignment of TSPO from different organisms: human TSPO (*h*TSPO), *Mus musculus* (*Mo*TSPO) [36], *Rhodobacter sphaeroides* (*Rs*TSPO) [35], and *Bacillus cereus* (*Bc*TSPO) [34]. For the last three proteins, experimental structural information is available. Semi-conserved positions with more than 50% consensus according to ClustalO [46] are highlighted in cyan, while highly conserved positions with more than 90% consensus are shown in red. The oligomerization motif G83XXXG87 is indicated by the orange stars and rectangle. The W95XPXF99 motif is depicted with the blue stars and rectangle. The cholesterol-binding motif CRAC and its "mirror code" CARC are marked by dark green and violet rectangles and stars, respectively. The CRAC-like motif in TM I is highlighted by the light green rectangle.

We modeled the *h*TSPO based on the canonical *h*TSPO sequence reported in the UniProtKB database, Entry P30536 [47,48] (https://www.uniprot.org/uniprot/P30536 (accessed on 1 February 2019)). The model contains five transmembrane helices (TMs) arranged in the clockwise order TM I–TM II–TM V–TM IV–TM III. The four loops are located either in the cytoplasm, LP I and LP III, or in the cytosol, LP II and LP IV, respectively (Figure 2). LP I is composed of residues V26–H46 (Figure 1), and it was earlier suggested to play a role as the gate of the *h*TSPO ligand binding pocket [49]. Almost half of the residues composing LP I are fully conserved between *h*TSPO and *Rs*TSPO (Figure 1), which has a short *α*-helix in the middle of its loop. LP I in the *Mo*TSPO structure is shorter (10 residues) than in the bacterial TSPO structures (19 residues), and it lacks this *α*-helix.

**Figure 2.** Top and side view of the structural model of the *h*TSPO monomer (orange cartoon representation) aligned with the *Rs*TSPO template (black tube representation). Four important functional regions are highlighted: cholesterol recognition/interaction amino acid consensus region (CRAC, in green color) and reverse region of the CRAC (CARC, in purple color), both involved in cholesterol binding, the G83XXXG87 motif (blue color) relevant for monomer–monomer interactions, and the W95XPXF99 motif (shown in red color) important for ligands binding. The figure was prepared with the VMD program [50].

Different structurally and/or functionally important motifs are conserved among *Rs*TSPO and *h*TSPO:

(i) An important feature of the mammalian TSPO is its capability of binding cholesterol molecules [7]. Two motifs are involved in cholesterol binding: the cholesterol recognition/interaction amino acid consensus motif (CRAC) represented as L150–X–Y152–X(3)– R156 (Figure 1, dark green rectangle) [7] and the reverse region of CRAC (named CARC) described as R135–X(2)–Y138–X(2)–L141 (Figure 1, purple rectangle) [7,10]. One helical turn before CRAC, there is a short sequence L144–A145–F146 that enhances the binding affinity of mammalian TSPO for cholesterol [9]. Even though cholesterol is absent in bacterial membranes, the CRAC motif is conserved in *Rs*TSPO (Figure 1, dark green rectangle). This is most probably in order to accommodate hopanoids, which have a structure and function similar to that of cholesterol in higher organisms [51].

In order to interact with cholesterol, side chains of the key CRAC, CARC, and LAF residues need to face the membrane. In our model, CRAC Y152 and R156 side chains are membrane exposed, while L150 is oriented towards the ligand binding cavity in the center of the TSPO (Figure 2). Different mutagenesis studies showed the importance of Y152 and of R156 for cholesterol binding. If one of these two residues is mutated to serine or leucine, respectively, the binding of cholesterol is abolished [7,8]. CARC, in contrast to CRAC, is not conserved in *Rs*TSPO (Figure 1, purple box). However, in our *h*TSPO model, all key residues from this motif, R135, Y138, and L141, are membrane exposed and can interact with cholesterol (Figure 2). Among the LAF residues, L144 is facing the interior of the protein in our model, while A145 and F146 are membrane exposed and available for cholesterol binding.

(ii) The G83XXXG87 motif from *h*TSPO (Figure 1, orange rectangle) coincides with the A75XXXA79 motif in *Rs*TSPO. These motifs represent widespread helix–helix interactions across different membrane proteins [52–56]. Both motifs are located in the third TM domain (TM III) of the respective proteins and represent a binding interface for TSPO monomer– monomer interactions. The G83 and G87 residues are exposed to the membrane in our *h*TSPO model and can interact with the second monomer (Figure 2).

(iii) The W95XPXF99 motif is fully conserved among human, mouse, and *Rs*TSPO, while the *Bc*TSPO sequence differs significantly in this region (Figure 1, blue rectangle). This motif is conserved also in other prokaryotic and eukaryotic TSPO sequences [40,57], and it was suggested to play a role in oligomerization processes [49], as well as in ligand binding [34,58]. In the *Mo*TSPO experimental structure, W95 points into the binding cavity and F99 is oriented toward the membrane, whereas in the *Rs*TSPO structure, both residues point into the binding pocket. In the *Bc*TSPO–PK11195 complex, residues F90 and Q94, which correspond to W95 and F99 in mammalian TSPO, respectively, interact with PK11195. This is also the case for our *h*TSPO model (Figure 2).

#### *2.2. PK11195 Interactions with the hTSPO Model*

We docked PK11195 to our *h*TSPO model. Two 3D structures of the TSPO in complex with PK11195 exist, the NMR structure of *Mo*TSPO [36] and the X-ray structure of *Bc*TSPO [34]. The ligand binding cavity has the same location in both proteins, but PK11195 adopts different binding poses. In our studies, we used the *h*TSPO–PK11195 complex where the ligand binding pose was similar to the one observed in the *Bc*TSPO, since currently, there are no experimental structures available for the *Rs*TSPO–PK11195 complex, which is a template of our model. Other docking poses differed from the one used as the starting configuration for the MD simulations (Figure S1), showing a small degree of convergence. Indeed, these poses are already cluster representatives, since AutodockVina [59], used here, does the clustering automatically.

However, many residues crucial for PK11195 binding in the *Bc*TSPO crystal structure are fully conserved in *Rs*TSPO, like Y32(31), P42(41), W51(50), N87(84), W138(135), A142(139), and L145(142) (numbering is for *Bc*TSPO and in parentheses for *Rs*TSPO) (Figure 1). Furthermore, the main structural differences between *Bc* and the *Rs*TSPOs appear at the monomer–monomer interface in the dimer structure and not in the ligand binding pocket [49].

The binding cavity in our model is lined with residues belonging to four TM domains and to LP I: G18, C19, V21, G22, F25 (in TM I), Y34, H43 (in LP I), H46, L49, G50, W53 (in TM II), N92, W95, P96, F99, F100 (in TM III), W143, T147, L150 (in TM V). The residues L49, W53, W95, W143, A147, and L150 are involved in the binding of PK11195 in both the *Mo* and *Bc*TSPOs experimental structures [34,36]. Beside these residues, others bind PK11195 in *Mo* and *Bc*TSPO, but they are specific to each structure. One third of the binding residues in our model are fully conserved among mammalian [60] and other prokaryotic and eukaryotic species [57], like Y34, W53, N92, W143, L150, and A/T147 (Figure 1, for more complete alignments see [57,60]). It was shown that either threonine or alanine at position 147 has no impact on PK11195 binding to the TSPO since it binds to both polymorphs with the same binding affinity, in contrast to other radioligands, which bind with significantly smaller binding affinity to the protein with threonine [31,60,61].

Next, we ran a 1 µs long MD simulation and analyzed the interactions between our *h*TSPO model and the PK11195 ligand. Additional MD simulation replicas of 450 ns were later run for holo and apo *h*TSPOs (two for each protein), and the results are reported in the Supplementary Materials. During the MD simulation, at around 450 ns, we observed quite sudden movement of the PK11195 chlorophenyl-*N*-isoquinoline part towards different binding pose, while the alkyl part of PK11195 remained at its initial position. The rings horizontally slide in a way that the chlorophenyl ring, which initially faces TM I, is placed between TM I and TM II, closer to the latter helix (Figure 3a, center, and Figure S2). This movement indicates that the initial binding pose may not be optimal and/or that the TSPO binding pocket possesses certain plasticity allowing for different binding poses of the ligand. Interestingly, the pose of PK11195 after 450 ns was similar to one docking position (Figure S1, pose 5; the RMSD between the two poses is 1.8 Å). We compared the binding pose that PK11195 adopts during the first 450 ns (Figure 3b) with the one it takes up after the movement (Figure 3c). We observed that residues binding constantly PK11195 during the whole length of the MD run are: G22, F25 (in TM I), Y34, H43 (in LP I), L49, W53 (in TM II), W95, P96 (in TM III), and T147, L150 (in TM V). Residues Y34, W53, W95, A/T147, and L150 are well conserved among TSPOs from different species, while G22, F25, and L49 are semi-conserved [57]; this indicates their importance for the structure and/or function of the protein.

The F25 side chain is initially facing towards the membrane, later it moves inside the binding pocket, establishing the *π*-stacking and hydrophobic interactions with F100, Y34, and PK11195. In the *Mo*TSPO (PDB code: 2MGY [36]) and *Bc*TSPO (PDB code: 4RYI [34]) structures, this residue points out of the binding site in the same orientation as it does in our model at the beginning of the MD simulation.

In our model, Y34 forms hydrophobic interactions with PK11195 throughout the MD simulation. This residue is fully conserved among TSPOs from different species, from mammalians to bacteria [57]. However, in the crystal structure of *Bc*TSPO–PK11195, this residue binds PK11195, while in the *Mo*TSPO–PK11195 complex, it faces the cytosol. Despite this ambiguity in the TSPO–PK11195 experimental structures, our result is in very good agreement with mutational studies showing that mutations Y34F, Y34F/F100A, and Y34F/F99A cause a large decrease in the binding affinity for PK11195 with respect to the WT TSPO [58]. These results indicate that the aromatic phenyl rings are crucial at this place for PK11195 binding.

A stable hydrogen bond (H-bond) is formed between the W53 indole amino group and the carbonyl-oxygen atom of PK11195. The H-bond between W53 and PK11195 was observed also in the *Bc*TSPO–PK11195 crystal structure [34], but not in the *Mo*TSPO– PK11195 NMR structure, which lacks any H-bond interaction [36]. Another H-bond observed in the *Bc*TSPO–PK11195 structure was formed between W143 and the PK11195 ligand [34]. In our model, this H-bond is formed occasionally during the first 450 ns; after this time, W143 constantly interacts with PK11195 through VdW interactions.

**Figure 3.** PK11195 binding interactions with the *h*TSPO model. (**a**) Top: Open access to the binding pocket between TM I and TM II of the *h*TSPO structural model that persists throughout the MD simulation. We do not observe other openings, nor the change in the LP I conformation. Center: PK11195 moves in the binding site of the *h*TSPO model from its initial state observed in the first 450 ns (red color) to the new pose (blue color, 450–1000 ns). The image of every hundredth frame is shown smoothed with a five frame window. Bottom: Chemical formula of PK11195. (**b**,**c**) 3D and 2D representations of the PK11195 binding pocket during the first 450 ns (top) and after the ligand movement, from 450 ns till the end of the MD run (bottom). 3D plots show PK11195 (yellow and orange balls and sticksrepresentation for 0–450 ns and for 450–1000 ns, respectively) and the residues binding it for more than 90% of the simulation time; the backbone and hydrogen atoms were omitted for clarity reasons. F100 was kept in (**c**), despite that it does not bind PK11195 anymore, to show the change in its side chain conformation. The most constant interactions, formed for more than 75% of the simulation time between PK11195 and the *h*TSPO model, are shown in the 2D plots obtained by the Discovery tool [62]. Legend: green circles—hydrogen bonds, light green circles—VdW interactions, light pink circles—*π*-alkyl, and dark pink circles—*π*-*π* interactions.

PK11195 is additionally bound through VdW, hydrophobic, or stacking interactions by H43, L49, P96, W95, T147, and L150. Some of these residues interact with PK11195 also in the *Bc*TSPO structure (F90, S91, A142, and L145, respectively) and in the *Mo*TSPO structure.

In addition to all the above described residues, F100 and L112 steadily bind PK11195 during the first 450 ns. At this time, the ligand moves to a new binding pose, and these two interactions are lost; however, the interactions with V26 and F99 are established (Figure 3b,c). Interestingly, the F100 side chain flips out of the binding pocket at around 720 ns (Figure 3c). This residue is oriented towards the binding site in *Bc*TSPO, as it is in our model at the beginning of the MD simulations, while in *Mo*TSPO, it is facing the membrane like in our model at the end of the MD run. According to our results, we suggest that F100 spontaneously changes its position from inward to outward of the binding pocket and that the experimental structures captured it in one of these two different conformations.

The spontaneous change that we observe in the orientation of the F100 side chain may indicate that it is involved in placing the ligand inside the binding pocket, but not crucial for its binding, and that this role is left to F99 and Y34, as suggested by [58]. Deeper studies will of course need to be done to explore in detail the exact role of these three residues.

Different hypothesis were made in the literature about the possible binding pathways for PK11195: one suggested that the ligand enters the binding pocket from the cytosol and that LP I plays a role of the gate [49,63]; the second one proposes binding between the crevices in TM helices [34,63,64].

In the present work, we observed the opening between TM I and TM II of our *h*TSPO– PK11195 model, which during the whole simulation time gave direct access to the binding pocket through the membrane (Figure 3a). Furthermore, PK11195 adopts a position where its Cl-phenyl ring is placed in this crevice, enveloped by C19, G22, F25, and V26 from TM I and by H46, L49, and W53 from TM II. In contrast, in the apo *h*TSPO model, the TM I and TM II helices maintain the closed position, without any openings, during the full length of the MD simulation (Figure S3).

Furthermore, we noted that LP I is stabilized through a patch of interactions: cation*π* interactions formed between K39 (LP I) and Y34 (LP I) and the stacking interactions between Y34, F25 (TM I), F99, F100 (TM III), and PK11195. These interactions prevent within the time scale of our simulation—LP I from moving in a way to open the access to the binding site from the top of the protein and therefore from the cytosol. This result is consistent with the previous study [64]. However, we cannot exclude that on a longer time scale, LP I is able to perform larger movements, as was proposed by [63].

#### *2.3. PK11195 Stabilizes hTSPO Structural Fold*

The apo and holo *h*TSPO structural models were embedded into the membrane and evaluated by a 1 µs long MD simulation. We evaluated and compared the structural stability of the apo and holo *h*TSPOs by calculating the root mean squared deviations (RMSD) of backbone atoms, the root mean squared fluctuations (RMSF) of C*α* atoms (Figure 4), and the helices' flexibility (Figure 5).

The RMSD of apo *h*TSPO fluctuates more than that of the holo protein (Figure 4a); however, both systems reach a plateau at around 400 ns. The RMSD fluctuations in the apo protein (from 400 ns on) are principally due to the bending of the TM I helix and due to the flexibility of the loops LP I, II, and III. In contrast, the RMSD of the *h*TSPO– PK11195 complex is lower than for the apo protein and becomes steady from around 400 ns on, indicating that PK11195 stabilizes the *h*TSPO structural model. The higher structural stability of the holo protein can be observed as well from the principal component analysis (PCA) (Figure 6). These results are in line with experimental data for *Mo*TSPO, where the interactions with its cognate ligand PK11195 stabilize its structural fold [36,65]. Similar observations were obtained for *Rs*TSPO, which also showed an important flexibility, especially around the ligand binding site [35,66]. It was shown that the quality of *Rs*TSPO crystals was significantly improved by adding cholesterol and PK11195 to the crystallization medium [35], suggesting that PK11195 can have a positive impact also on the *Rs*TSPO and not only on *Mo*TSPO structural stability.

The RMSD values of backbone atoms for each transmembrane helix (Figure 4c,d) clearly show that PK11195 increases the stability of TM I and TM II, while TM IV and TM V are stable regardless of the absence/presence of the ligand. In both—apo and holo—proteins TM I is the most flexible helix and TM IV the most stable one.

**Figure 4.** (**a**) Evolution of the root mean squared deviation (RMSD) values of the apo (violet graph) and holo (green graph) *h*TSPOs during the 1 µs long MD simulation. RMSD values were calculated for the backbone atoms of residues W5 to N158, excluding the N– and C–termini and H atoms. (**b**) Root mean squared fluctuations (RMSF) of the C*α* atoms in the apo (violet graph) and holo (green graph) *h*TSPOs. RMSF values were calculated for equilibrated proteins (in the MD simulation range of 400 ns–1 µs). (**c**,**d**) RMSD values for each of the five transmembrane helices (TM I–TM V) in the apo (*h*TSPO) and holo (*h*TSPO–PK11195) proteins, respectively.

**Figure 5.** Analysis of the flexibility of each TM domain (TM I–TM V) in *h*TSPO–PK11195 and *h*TSPO structural models by means of Bendix [67]. y-axis: residue index number corresponding to the residues composing individual TM domain; x-axis: simulation time. The color scale indicates changes in helix angle/bending during the MD simulations, from blue: <6 ◦ to red: >24 ◦ .

**Figure 6.** The principal component analysis of the (**a**) *h*TSPO and (**b**) *h*TSPO–PK11195 models showing the flexible parts of the protein. The image of every hundredth frame is shown, spanning from the beginning (red color) to the end (blue color) of the MD simulation.

The RMSD for TM I in the apo system has two plateau levels indicating a conformational change. Indeed, there is a kink in the *α*-helix due to the P15 residue (Figure 5, TM I in *h*TSPO). Helix kinks are a common feature of long *α*-helices, which are frequent in transmembrane proteins, and proline residues are strongly associated with the helix being kinked [68,69]. TM I becomes straight at around 350 ns. The alteration between the kinked and straight form of TM I is the reason for the RMSD change and also for the higher RMSD values with respect to the other helices in our model (Figure 4c). TM III (P96–P97) and TM V (P139) are also slightly kinked. However, TM V in the holo protein is very stable most probably owing it to the presence of PK11195, though its binding site is distant from P139.

The root mean squared fluctuations (RMSF) of C*α* atoms were calculated in the range of 400 ns to 1000 ns, when both systems reach equilibration. The RMSF for both systems are very similar, but one can note important peaks at residues A35, E70, and F100 in the apo model (Figure 4b). These regions correspond to the first three loops fluctuating more in the apo than in the holo protein.

In the *h*TSPO–PK11195 complex, LP I with the small *α*-helix (residues G28 to G36) in the middle of it is stable during the whole MD simulation time. A crucial role in its high stability is played by a patch of interactions (described in details in the previous section) that hinders the free movement of LP I and contributes to the *α*-helix retaining its conformation. In contrast, LP I in the apo model varies in length (between F25–P45 and S23–P45), and the small *α*-helix is rarely formed. Due to its random coil structure and the absence of PK11195, LP I is more flexible than in the holo protein. LP II in the holo model is stable during the MD simulation, while in the apo protein, one helix turn at the C–terminus of TM II unfolds (data not shown), extending the length of LP II (W68–A78), which consequently fluctuates more than in the holo protein. Here, again, PK11195 seems to play an important role in the stability of TM II (Figure 4c,d).

LP III in the holo protein consists of residues G102–L109, and despite its length, it is more stable than in the apo protein where it is composed of residues F99–N104 (Figure 6). We observed that in both proteins, the parallel cation-*π* interactions are formed between R103 (LP III) and W33 (LP I *α*-helix) for more than 75% of the simulation time. Hydrophobic interaction between W33 and F100 (TM III) further stabilize the previous interaction in the holo protein (existent for 90% of the simulation time), but much less in the apo protein (existent for 40% of the simulation time). In addition, in the holo protein, F100 binds with Y34 (LP I) for 730 ns and for about 450 ns also with PK11195. This cascade stabilizes LP I, LP III, and the whole upper, C–terminal part of the holo *h*TSPO, namely the part in the outer membrane leaflet, where the ligand binding site is present. In the apo protein, this same cascade is not stabilized by PK11195, and indeed, LP III fluctuates more (Figure 6).

Finally, LP IV and LP V remain stable without changes in both models during all MD simulations, in line with our results that TM IV and TM V are the most stable helices regardless of the ligand's presence, and their termini do not unfold, as seen for some other helices described above (Figure 4c,d).

Taken together, the PK11195 ligand appears to reduce the fluctuations of the loops and to stabilize the overall structural fold of the TSPO protein.

#### *2.4. Cholesterol Interactions with hTSPO*

Our analyses show that the apo *h*TSPO model binds 1.5 times more cholesterol molecules than the holo one. This result is in good agreement with other studies postulating that PK11195 reduces the cholesterol binding to the TSPO [7].

We analyzed the average simulation time during which cholesterol interacts with each of the five TM helices (Table 1). In the holo protein, cholesterol interacts most often with TM I (47% of the simulation time) and TM V (48% of the simulation time), while in the apo protein, it interacts for 100% of the time with TM V and much less with other helices. Among other helices, TM II stands out, binding cholesterol for about 50% of the simulation time.


**Table 1.** The percentage (%) of the simulation time during which the individual transmembrane helices (TM I–TM V) bind the cholesterol molecule(s). The total simulation time is 1 µs.

Next we calculated the average number of cholesterol molecules bound to a single TM domain per frame (i.e., per ns). The holo TM I binds on average slightly more cholesterol than TM I in the apo protein, while for TM II, the result is inverted (Figure 7a). TM III and TM IV bind on average the same amount of cholesterol per frame, but holo TM V binds less than one cholesterol per frame, while apo TM V binds on average 1.5 cholesterol molecules per frame during the full length of the simulation time (Figure 7a). Indeed, the high affinity cholesterol binding motifs, CRAC and CARC, are located in TM V (Figures 1 and 2).

**Figure 7.** (**a**) The average number of cholesterol molecules (Naverage CHL) binding to the individual helix (TM I–TM V) in the apo (violet line) and holo (green line) *h*TSPOs at each frame of the 1 µs MD trajectory. (**b**) The total number of all cholesterol molecules (Tot No of CHL) binding either to the CRAC-like motif in TM I or to the CRAC and/or CARC in TM V during our 1 µs long MD simulation of apo *h*TSPO (violet) and holo *h*TSPO (green).

We analyzed why TM I of the holo protein binds cholesterol more frequently than other helices and why it binds more cholesterol than TM I in the apo protein (Figure 7b). We found that TM I has a CRAC–resembling motif, namely L17–X(2)–F20–X(3)–R24 (Figure 1), that attracts cholesterol. This motif is located in the upper, C–terminal part of the TSPO, in the outer membrane leaflet, next to the PK11195 binding site. We suggest that cholesterol binds more often to this motif in the holo than in the apo protein due to the higher stability of the TM I helix in the former protein (Figure 4c,d). The higher stability of holo TM I (i.e., it is less kinked than apo TM I (Figure 5, TM I) allows for the more optimal orientation of cholesterol binding residues L17, F20, and R24 and of cholesterol molecules with respect to the apo protein.

For TM II, the frequency of cholesterol binding is inverted: it interacts more often with the apo than with the holo protein. Cholesterol binds to both proteins only in the first half of the simulation time. Two cholesterols interact with apo TM II, one in the upper, C–terminal part (binding to residues P45–W47–V48–P51–V52) and one in the lower, N–terminal part of the protein (binding to residues T55-A59–Y62–L66). Cholesterol interacts almost twice more frequently with the lower part of the helix that is in the inner membrane leaflet than with the upper part of the helix that is in the outer membrane leaflet. In the holo protein instead, cholesterol binds exclusively to the upper part of TM II. Since apo TM II binds more cholesterol than holo, it is clear that cholesterol has higher binding affinity for the lower part of this helix (i.e., it rather binds to the N–terminal part of the *h*TSPO that is in the inner membrane leaflet than to its C–terminal part). We suggest that PK11195 reduces the frequency of cholesterol binding to the lower, N–terminal part of the *h*TSPO and in this particular case to TM II.

TM III and TM IV bind cholesterol equally frequently in both systems.

Finally, we determined the frequency of cholesterol binding to TM V, precisely to the CRAC (L150–X–Y152–X(3)–R156) and CARC (R135–X(2)–Y138–X(2)–L141) motifs. We defined how many cholesterol molecules bind individually to CRAC or CARC, as well as the frequency of cholesterol binding to both regions simultaneously (Figure 7b). For individual binding, we counted cases where only one motif at a time is occupied by cholesterol. For simultaneous binding, we counted only the cases when both motifs are occupied at the same time and with two different cholesterol molecules. Cases where one cholesterol molecule is bridging the two regions were excluded. Our results show that in the holo protein, cholesterol binds most often to CRAC and much less often to the CARC motif. Interestingly, the presence of PK11195 almost abolishes the simultaneous binding of cholesterol to both motifs.

In the apo protein, cholesterol binds more often to CRAC than to CARC. The number of cholesterol binding to these motifs is lower than in the holo protein, owing to the fact that cholesterol in the apo protein preferentially binds to both motifs at the same time.

Taken together, the apo protein binds more cholesterol molecules than the holo protein. In the apo protein indeed, cholesterol binds with about 50% frequency to TM II and with 100% frequency to the CRAC and CARC regions in TM V. In TM II, cholesterol binds more readily to the lower part of the helix, so to the N–terminal part of *h*TSPO present in the inner membrane leaflet. Very interestingly, in the apo protein, cholesterol binds most frequently to CRAC and CARC simultaneously (Figure 8, right panel), while in the holo protein, simultaneous binding to these motifs is almost abolished. Indeed, cholesterol in the holo protein binds mostly to the newly described motif in TM I and to the CRAC motif in TM V (Figure 8, left panel). Both motifs are present in the upper, C–terminal part of the *h*TSPO, next to the PK11195 binding site and in the outer membrane leaflet. In our study, the binding of cholesterol to the lower, N–terminal part of the holo *h*TSPO model is rarely observed.

**Figure 8.** Cholesterol molecules bind most frequently to CRAC and CRAC–like regions (green surface representation) that are in the vicinity of the PK11195 binding site (orange surface representation) in the *h*TSPO–PK11195 system and to CRAC, LAF(blue surface representation), and CARC (purple surface representation) motifs in the *h*TSPO system. Corresponding residues from each region that interact with cholesterol (color coded, respectively) are represented within ellipses.

#### **3. Materials and Methods**

#### *3.1. Building the 3D Structural Model of hTSPO*

We used the sequence reported in the UniProtKB [47,48] database with the ID: P30536. It contains 169 amino acids, spanning from M1 to E169. This sequence was aligned with those of: *Mus musculus* TSPO (*Mo*TSPO; UniProt ID: P50637), *Rhodobacter sphaeroides* TSPO (*Rs*TSPO; UniProt ID: Q9RFC8), and *Bacillus cereus* TSPO (*Bc*TSPO; UniProt ID: Q81BL7), using the Multalin [70] and ClustalO [46] web-servers. 3D structural models of the *h*TSPO were built based on the *Rs*TSPO template (PDB ID: 4UC1 [35]). Twenty models of *h*TSPO were generated using the MODELLER program, Version 9.19 [71]. All models were analyzed according to the Discrete Optimized Protein Energy (DOPE) score using the built-in script of the MODELLER package [71,72]. In addition, the local structural quality of the *h*TSPO models in the biological membrane were examined using the QMEANBranescoring function [73] from the Swiss-model server [74–78]. All models were visually inspected and compared to the template and to the available mutagenesis data. The model corresponding best to the available experimental data, having the lowest DOPE score according to the MODELLER program [71,72] and the appropriate local structural quality as defined by the QMEANBrane tool [73], was chosen for docking and molecular dynamics (MD) simulation [79,80] studies. We checked that the orientation and tilt angles of the helices—once the model is inserted in a membrane—were appropriate. These parameters were computed by the Positioning the Proteins in Membranes (PPM) server [81] for the model and the *Rs*TSPO template and compared between them (Table 2).


**Table 2.** Comparison of the hydrophobic thickness and protein tilt angles for the *Rs*TSPO template and the *h*TSPO structural model. All values were obtained from the Positioning the Proteins in Membranes (PPM) server [81].

#### *3.2. Docking of the PK11195 Ligand*

We docked the PK11195 ligand (*N*-[(2R)-butan-2-yl]-1-(2-chlorophenyl)-*N*methylisoquinoline-3-carboxamide) to the *h*TSPO structural model. The initial 3D structure of PK11195 was obtained from the PubChem database (https://pubchem.ncbi.nlm.nih. gov/compound/1345 (accessed on 1 February 2019)). Molecular docking was performed using the UCSF Chimera program [82] and AutoDock Vina package [59]. Protein and ligand input files were prepared by AutoDockTools. The ligand had fully flexible torsion of freedom, while the receptor side chains were kept rigid. Non-polar hydrogen atoms of the protein and the ligand were merged. The center of the grid was placed at X = −14.451 Å, Y = 25.618 Å, and Z = 25.297 Å. The grid dimensions were 76 × 72 × 66 Å, and the spacing between the grid points was set to 0.375 Å. The exhaustiveness parameter of the global search was set to 8 (default). Ten ligand binding modes were generated in search for a ligand pose with the lowest binding affinity.

We selected the model where the ligand had the lowest, i.e., the most negative docking binding affinity score, and it interacted with the two conserved residues W53 and W95 [57] shown to be important for binding [34].

#### *3.3. Molecular Dynamics Simulations*

The apo (*h*TSPO) and holo (*h*TSPO–PK11195 complex) models were then inserted into the lipid bilayer composed of phosphatidylcholine (POPC)—phosphatidylethanolamine (POPE)—cholesterol (CHL) with the ratio of 3:3:1 for POPC:POPE:CHL, respectively. The choice of the membrane composition was made based on the experimental studies of the mitochondrial membrane and the protein–lipid monolayers [83,84]. The membrane thickness was 3.04 nm and was built by the Mem-Builder web-server [85,86]. The *h*TSPO models were placed and properly oriented at the center of the membrane box by the Lambada and InflateGRO2 tools [87] (the tilt angle of all TM helices is 10◦ ). Principally, these values correspond to those of the *Rs*TSPO template (Table 2).

The apo and holo models of *h*TSPO inserted in the POPC–POPE–CHL membrane were solvated with 12,758 water molecules enclosed in a solvation box with dimensions of 10.5 nm × 10.5 nm × 11.0 nm. 161 sodium (*Na*+) and 166 chloride (*Cl*−) ions were added to neutralize the system net charge and to reproduce the physiologic electronic strength of 0.15M. The MD simulations were run using the GROMACS 2018.6 package [88,89] and applying the SLIPIDSforce field [90] for the membrane, the AMBER99SB-ILDNforce field [91] for the *h*TSPO model and ions, and the TIP3P [92] force field for water. The force field parameters of PK11195 were prepared using the General Amber force field (GAFF) [93,94], introducing the RESPatomic charges and electrostatic potential (ESP) as calculated based on the B3LYP/6-31G\* basis set using the Gaussian09 package [95]. The topology file of the PK11195 ligand was converted to GROMACS format using the ACPYPE tool [96]. The geometry of the *h*TSPO models was optimized by steepest descent minimization performed for 50,000 steps with a maximum force constant value of 1000 kJ/mol/nm. After the geometrical optimization, the systems underwent NPTequilibration for 10 ns with a time step of 2 fs. The systems were maintained at the reference pressure of 1 bar by coupling to the Parrinello–Rahman barostat [97,98] with uniform scaling of x-y box vectors and independent scaling for the z-axis (i.e., perpendicular to the membrane). The systems were coupled to the Nose–Hoover thermostat [99–101] to maintain the temperature at 310 K. A 1.2 nm

cut-off was set for the short-range non-bonded interaction. The LINCSalgorithm [102] was chosen to constrain all bonds involving hydrogen atoms. The holo and apo structural models of *h*TSPO were then simulated for 1 µs to detect the stable structure of *h*TSPO in the mitochondrial membrane. MD simulation parameters were the same as in the NPT equilibration run; only the thermostat was changed to the V-rescale thermostat [101].

#### *3.4. Analysis*

**The root mean squared deviation (RMSD)** for the entire *h*TSPO model and for the individual helices was calculated for the backbone atoms omitting the hydrogen atoms. The RMSD for the entire protein was calculated for the sequence from W5 to N158, excluding the N– and C–termini and the hydrogen atoms.

**The root mean squared fluctuation (RMSF)** was calculated for the C*α* atoms of each *h*TSPO residue. They were calculated for equilibrated proteins, that is in the MD simulation range 400 ns–1 µs. For these calculations, the *g\_rms* and *g\_rmsf* tools from the GROMACS package were used [88,89].

**Helices flexibility analyses** was done by the Bendix plugin [67] in the VMD program [50]. To define the color code, we saved the values of the angle changes along each helix during the MD simulation time. The average values ranged from 0◦ to 24◦ ; therefore, we divided the color code into 5 parts with the corresponding 6 angle values' extents. The highest change in the helical angle was observed for the TM I helix in the apo *h*TSPO model, i.e., 51◦ . The color code is described as blue: <6◦ , cyan: 6–12◦ , green: 12–18◦ , yellow: 18–24◦ , and red: >24◦ .

**The principal component analysis (PCA)** was performed on the apo and holo *h*TSPO including residues W5 to N158. The C– and N–termini, as well as hydrogen atoms were ignored. We used the *g\_covar* and *g\_anaeig* tools in the GROMACS package [88,89].

**The analysis of PK11195 interactions** with the *h*TSPO model (Figure 3) was carried out by home-made TCLand AWKscripts. To define a binding site of the ligand, we searched for all residues that were within 4.5 Å of any PK11195 heavy atom. We defined residues that bind PK11195 ligand for more than 90% of the simulation time as constant or principal binders. All of them form hydrophobic or stacking interactions with the ligand. For W53 that is H-bonding the carbonyl oxygen of PK11195, we calculated the frequency of H-bond formation using the distance criteria of 3.5 Å. The residues binding the PK11195 ligand for more than 90% of the simulation time are shown in the VMD representation in Figure 3. Residues that interact with the PK11195 for at least 75% were determined as frequent binders. All residues binding PK11195 for at least 75% are shown in the 2D plots in Figure 3. The subfigures in Figure 3 were made using the Visual Molecular Dynamics (VMD) [50] and Discovery Studio Visualizer [62] programs.

**The cholesterol analysis** was done by in-house written TCL and AWK scripts. The cholesterol was counted as bound to the *h*TSPO model, to the defined TM helix, or to different motifs (CRAC-like in TM I, CRAC/CARC in TM V) if it was found within 5 Å from any residue belonging to the analyzed region, respectively. Only contacts between heavy atoms were taken into account.

The total number of cholesterols bound to the apo and holo *h*TSPO systems throughout the 1 µs were counted and expressed as the ratio of the cholesterol molecules interacting with each system.

The data in Table 1 were obtained by counting all frames where the cholesterol interacts with the transmembrane helix in question. The percentage of the simulation time was calculated as the number of frames divided by the total number of frames (1000).

To obtain the average number of cholesterol molecules that bind the individual helix at each frame (as reported in Figure 7a), we counted the total number of cholesterol molecules interacting with the individual TM helix, and we divided this number by the total number of frames (1000).

For Figure 7b, we just counted the total number of cholesterol molecules that bind only to the CRAC–like motif in TM I, only to CRAC/CARC in TM V, or to both motifs (CRAC+CARC in TM V) at the same time.

#### **4. Conclusions**

The interplay between PK11195 and cholesterol interactions with *h*TSPO were studied by MD simulations of a homology model of the protein based on *Rs*TSPO. The ligand increases the stability of the protein in terms of RMSD, PCA, and Bendix analyses. During the MD simulation, PK11195 slides to a new position, in which its CL-phenyl ring initially facing TM I is placed between TM I and TM II, closer to the latter helix (Figures 3 and S2).The two helices detach from one other, while they stay close to each other in the apo protein. The ligand forms mostly hydrophobic and stacking interactions with the protein. Its carbonyl oxygen forms an H-bond with the W53 side chain. Two and three cholesterol molecules per ns bind, on average, to the holo and apo *h*TSPO, respectively. Hence, the presence of the ligand reduces the frequency of cholesterol binding to the protein. In the apo protein, the cholesterol molecules bind most of the time simultaneously to two well-known cholesterol binding motifs, CRAC and CARC in TM V. In the holo protein instead, cholesterol interacts with the CRAC–like motif in TM I and with the CRAC motif in TM V. Cholesterol binds much more rarely to the lower, N–terminal part of the holo *h*TSPO, that is in the inner membrane leaflet. Thus, PK11195 reduces cholesterol binding to this latter region, but it favors cholesterol interactions with the upper, C–terminal part of the protein in the outer membrane leaflet. Further studies are required to understand more in detail why this is the case.

**Supplementary Materials:** The following data are available online: Table S1: Experimentally defined dissociation constants (*K<sup>d</sup>* ) of the PK11195 molecule for different TSPOs. Figure S1: 3D and 2D representations of the PK11195 binding poses obtained by docking. Figure S2: MD snapshots showing the PK11195 ligand in its binding pocket at 0 ns, 250 ns, 500 ns, 750 ns and 1000 ns. Figure S3: Apo hTSPO does not exhibit an open access channel to the ligand binding pocket (located between the TM I and TM II helices in the holo protein). Figure S4: The number of cholesterol molecules binding to each helix of the apo and holo *h*TSPO proteins, averaged over our three MD simulations. Figure S5: The number of cholesterol molecules binding to apo and holo *h*TSPO averaged over our MD trajectories. Figure S6: Ligands motion in the binding site in our three MD simulations. Figure S7: Open access channels to the ligand binding pocket in our three MD trajectories.

**Author Contributions:** Conceptualization, T.T.N., G.R., A.G., and P.C.; methodology, T.T.N.; investigation, H.T.T.L, analyses, H.T.T.L. and A.K.; writing, original draft preparation, H.T.T.L. and A.K.; writing, review and editing, all authors; supervision, T.T.N., P.C., G.R., A.G., and A.K.; project administration, T.T.N.; funding acquisition, T.T.N. All authors read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the World Bank and the Ministry of Science and Technology of Vietnam joint project "Fostering Innovation through Research, Science and Technology", Subproject Number 13/FIRST/1.a/VNU1, and the financial support of the Vietnam National University, Hanoi, for the Key Laboratory of Multiscale Simulations of Complex System, Annual Grant Number TXTCN.20.03.

**Data Availability Statement:** MD trajectories and protein models are available upon request to the corresponding authors.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


### *Article* **Study of Endogen Substrates, Drug Substrates and Inhibitors Binding Conformations on MRP4 and Its Variants by Molecular Docking and Molecular Dynamics**

**Edgardo Becerra 1,2 , Giovanny Aguilera-Durán 1,3 , Laura Berumen <sup>2</sup> , Antonio Romo-Mancillas 3,\* and Guadalupe García-Alcocer 2,\***


**Abstract:** Multidrug resistance protein-4 (MRP4) belongs to the ABC transporter superfamily and promotes the transport of xenobiotics including drugs. A non-synonymous single nucleotide polymorphisms (nsSNPs) in the ABCC4 gene can promote changes in the structure and function of MRP4. In this work, the interaction of certain endogen substrates, drug substrates, and inhibitors with wild type-MRP4 (WT-MRP4) and its variants G187W and Y556C were studied to determine differences in the intermolecular interactions and affinity related to SNPs using protein threading modeling, molecular docking, all-atom, coarse grained, and umbrella sampling molecular dynamics simulations (AA-MDS and CG-MDS, respectively). The results showed that the three MRP4 structures had significantly different conformations at given sites, leading to differences in the docking scores (DS) and binding sites of three different groups of molecules. Folic acid (FA) had the highest variation in DS on G187W concerning WT-MRP4. WT-MRP4, G187W, Y556C, and FA had different conformations through 25 ns AA-MD. Umbrella sampling simulations indicated that the Y556C-FA complex was the most stable one with or without ATP. In Y556C, the cyclic adenosine monophosphate (cAMP) and ceefourin-1 binding sites are located out of the entrance of the inner cavity, which suggests that both cAMP and ceefourin-1 may not be transported. The binding site for cAMP and ceefourin-1 is quite similar and the affinity (binding energy) of ceefourin-1 to WT-MRP4, G187W, and Y556C is greater than the affinity of cAMP, which may suggest that ceefourin-1 works as a competitive inhibitor. In conclusion, the nsSNPs G187W and Y556C lead to changes in protein conformation, which modifies the ligand binding site, DS, and binding energy.

**Keywords:** MRP4; SNPs; variants; protein threading modeling; molecular docking; molecular dynamics; binding site

#### **1. Introduction**

The transport of xenobiotics out of the cell across membranes is a mechanism used by cells to detoxify. This mechanism is mediated by ATP-binding cassette (ABC) transporters [1]. Multidrug resistance protein-4 (MRP4) is a member of the ABCC subfamily and mediates the transport of xenobiotics such as cardiovascular, antiviral, and anticancer drugs. The substrates for MRP4 are mainly glucuronide conjugates and organic anions [2]. MRP4 can modify drug pharmacokinetics and contributes to the manifestation of side effects or multidrug resistance. In addition, the tumor energy metabolism is related to multidrug resistance due to the high production of ATP to enhance the activity of MRP4

**Citation:** Becerra, E.; Aguilera-Durán, G.; Berumen, L.; Romo-Mancillas, A.; García-Alcocer, G. Study of Endogen Substrates, Drug Substrates and Inhibitors Binding Conformations on MRP4 and Its Variants by Molecular Docking and Molecular Dynamics. *Molecules* **2021**, *26*, 1051. https:// doi.org/10.3390/molecules26041051

Academic Editors: Marco Tutone and Anna Maria Almerico

Received: 17 December 2020 Accepted: 8 February 2021 Published: 17 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

and other ABC transporters [3]. An increase in MRP4 activity or expression leads to a decrease in drug efficacy. In another instance, a decrease in MRP4 activity or expression could enhance toxicity due to drug accumulation. Since MRP4 is expressed in the kidneys, liver, erythrocytes, lymphocytes, adrenal glands, platelets, brain, and pancreas in humans, it can modify cellular exposure to drugs. In addition, the toxicity produced by MRP4 will depend on the type of drug or endogen substrate [4]. The MRP4 dysregulation has been reported in several pathological disorders, especially in cancer [5]; thus, MRP4 represents an attractive therapeutic target. The design of pharmacological agents with the ability to selectively modulate the activity of this ABC transporter or modify its affinity of a given substrate represents a challenge in chemical biology and drug design [6,7]. and intracellular nucleotide binding domains (NBDs). The domain arrangement for MRP4 is TMD-NBD-TMD-NBD [9]. Each TMD consists of six transmembrane helixes (TMHs) that determine the ligand specificity and allow ligand binding. In addition, NDBs bind and hydrolyze ATP to trigger substrate transport [10]. MRP4 is codified by the ABCC4 gen, located on chromosome 13q32.1 [11]. Alternative splicing leads to four isoforms, of which isoform 1 has been the most studied [1]. MRP4 is a highly polymorphic gene [12]; however, limited data are available on the function of MRP4 variants. Recent studies have

substrate represents a challenge in chemical biology and drug design [6,7].

Molecules 2021, 26, x FOR PEER REVIEW 2 of 22

effects or multidrug resistance. In addition, the tumor energy metabolism is related to multidrug resistance due to the high production of ATP to enhance the activity of MRP4 and other ABC transporters [3]. An increase in MRP4 activity or expression leads to a decrease in drug efficacy. In another instance, a decrease in MRP4 activity or expression could enhance toxicity due to drug accumulation. Since MRP4 is expressed in the kidneys, liver, erythrocytes, lymphocytes, adrenal glands, platelets, brain, and pancreas in humans, it can modify cellular exposure to drugs. In addition, the toxicity produced by MRP4 will depend on the type of drug or endogen substrate [4]. The MRP4 dysregulation has been reported in several pathological disorders, especially in cancer [5]; thus, MRP4 represents an attractive therapeutic target. The design of pharmacological agents with the ability to selectively modulate the activity of this ABC transporter or modify its affinity of a given

MRP4 consists of 1325 amino acids and is the shortest member of the ABCC subfam-

ily [8]. The basic MRP4 core structure is comprised of transmembrane domains (TMDs)

MRP4 consists of 1325 amino acids and is the shortest member of the ABCC subfamily [8]. The basic MRP4 core structure is comprised of transmembrane domains (TMDs) and intracellular nucleotide binding domains (NBDs). The domain arrangement for MRP4 is TMD-NBD-TMD-NBD [9]. Each TMD consists of six transmembrane helixes (TMHs) that determine the ligand specificity and allow ligand binding. In addition, NDBs bind and hydrolyze ATP to trigger substrate transport [10]. MRP4 is codified by the ABCC4 gen, located on chromosome 13q32.1 [11]. Alternative splicing leads to four isoforms, of which isoform 1 has been the most studied [1]. MRP4 is a highly polymorphic gene [12]; however, limited data are available on the function of MRP4 variants. Recent studies have been focused on the relationship between ABCC4 nsSNPs and drug disposition. In most cases, nsNSPs have little or no effect on the protein structure or function, but sometimes nsSNPs promotes non-functional or highly functional proteins [13]. The nsSNPs that occur in protein coding regions always alter the encoded amino acid, and the effect on the structure or function of the protein depends on the mutated site [14]. been focused on the relationship between ABCC4 nsSNPs and drug disposition. In most cases, nsNSPs have little or no effect on the protein structure or function, but sometimes nsSNPs promotes non-functional or highly functional proteins [13]. The nsSNPs that occur in protein coding regions always alter the encoded amino acid, and the effect on the structure or function of the protein depends on the mutated site [14]. Nada-Abla and coworkers in 2008 reported that the MRP4 variants G187W and G487E show a significantly reduced function of azidothymidine and adefovir transport compared to wild-type MRP4 (WT-MRP4). G187W is a non-synonymous ABCC4 variant and the mutation is located at the cytosolic loop 1 in the TMD (Figure 1); it has undergone the greatest structural change in terms of composition, polarity, and molecular volume.

Nada-Abla and coworkers in 2008 reported that the MRP4 variants G187W and G487E show a significantly reduced function of azidothymidine and adefovir transport compared to wild-type MRP4 (WT-MRP4). G187W is a non-synonymous ABCC4 variant and the mutation is located at the cytosolic loop 1 in the TMD (Figure 1); it has undergone the greatest structural change in terms of composition, polarity, and molecular volume. G187W also has a 50% reduction in function, and this could be clinically relevant [15]. On the other hand, Y556C is another non-synonymous ABCC4 variant and is located at NBD1 (Figure 1). Mayukh-Banerjee and coworkers in 2016 reported that the Y556C variant exhibited a 1.8-fold increase in dimethylarsinic acid effectiveness relative to WT-MRP4. Experiments on MRP4 transfection into the HEK cell line showed that the Y556C variant had 50% less expression than WT-MRP4. Both G487E and Y556C had appropriate cellular membrane localization [16]. G187W also has a 50% reduction in function, and this could be clinically relevant [15]. On the other hand, Y556C is another non-synonymous ABCC4 variant and is located at NBD1 (Figure 1). Mayukh-Banerjee and coworkers in 2016 reported that the Y556C variant exhibited a 1.8-fold increase in dimethylarsinic acid effectiveness relative to WT-MRP4. Experiments on MRP4 transfection into the HEK cell line showed that the Y556C variant had 50% less expression than WT-MRP4. Both G487E and Y556C had appropriate cellular membrane localization [16].

Figure 1. MRP4 structure of the cell membrane and localization of MRP4 variants (adapted from Banerjee at al., 2016) [16]. **Figure 1.** MRP4 structure of the cell membrane and localization of MRP4 variants (adapted from Banerjee at al., 2016) [16].

The crystallographic structure of MRP4 is not available; thus, a protein threading The crystallographic structure of MRP4 is not available; thus, a protein threading model can be built based on the homology of the template and the construction of loops.

Protein threading by I-TASSER relies in the identification of structural templates from the

Protein threading by I-TASSER relies in the identification of structural templates from the Protein Data Bank (PDB) using a local meta-threading server, a method for templatebased protein structure prediction. Three-dimensional models are generated for a given sequence by collecting high-scoring structural templates from locally installed threading programs [17,18]. After model building, it can be refined by molecular mechanics calculations, such as energy minimization and molecular dynamics simulations. Protein threading models are considered working tools that can be used to generate hypotheses related to protein structure, protein function, and protein–ligand interactions. The molecular docking of drug molecules into their binding sites allows us to identify relevant amino acids for ligand–protein interactions in order to select such amino acids for further site-directed mutagenesis studies [9]. In the present study, three different MRP4 structures (WT and its variants, G187W and Y556C) were built through protein threading to study, by molecular docking, the interactions between endogen substrates, drug substrates, and inhibitors, with MRP4-WT, G187W, and Y556C, allowed to observe changes in the pattern of intermolecular interactions, adapted from Russel and coworkers in 2008, where they report the IC<sup>50</sup> of several molecules, in vitro, over the MRP4 protein [8], and to calculate the binding energy (∆G) between FA and cAMP and MRP4 structures.

#### **2. Results and Discussion**

#### *2.1. WT-MRP4, G187W, and Y556C Model Building and CG-MD Simulations*

MRP4 plays a critical role in the distribution of different xenobiotics and endogen substrates, which can lead to different effects in the organism. Differences in the MRP4 activity depend, among other factors, on the expression or mutational changes of the ABCC4 gene, leading to significantly higher or lower transport activity [4]. The WT-MRP4 and its variants Y556C and G187W were built by protein threading in the I-TASSER server [19]. Protein threading and homology modeling are based on the principle that similar primary sequences will lead to similar 3D protein structures. According to the BLAST server, the template structure MRP1 from *Bos taurus* and human MRP4 had a 36.56% identity sequence similarity. When the primary sequence of a protein has 30% of identity as referred to a template (crystallographic structure), the protein threading and homology models are considered functional because the root mean standard deviation (RMSD) of the positions of their atoms is 2.0 Å or less with regard to the template structure [20–22].

The best model by the I-TASSER of each MRP4 structure was selected for further analysis with coarse-grained molecular dynamics simulations (CG-MDS) of 1 µs. Figure 2 shows the three MRP4 models and the most representative structures (cluster 1) obtained in I-TASSER and by CG-MDS at timesteps 630.40 ns for WT-MRP4, 564.90 ns for G187W, and 674.90 ns for Y556C. The conformations of WT-MRP4 and variants were in an "inwardfacing conformation" [23], while, in CG-MDS, the three MRP4 structures were in a closed state. All the loops that connect the alpha helixes of the three MRP4 structures have different conformations and distributions over the protein.

In this work, RMSD values higher than 2.0 Å were considered significant, hence the protein conformations were considered different. Figure 3 shows the different MRP4 sites studied and each region is illustrated with a different color, where the green color represents WT-MRP4; those sites are the nucleotide-binding domains (NBD), the transmembrane domains (TMD), and the residues relevant to substrate interaction (r85-236 and r715-866).

The WT-MRP4, G187W, and Y556C conformations during the first 100 ns of CG-MDS changed significantly, according to the RMSD values (Figure 4a), which indicates a large movement of the protein to further stabilization from 250 to 1000 ns. The RMSD of WT-MRP4 was higher than those of its variants, considering the complete structure. In addition, different regions of the MRP4 structure were studied focusing on the ligand binding sites, nsSNPs, and ATP pocket binding. Figure 4b shows the RMSD values for TMDs of the WT-MRP4 and its variants. According to the RMSD plot, the changes in the TMDs' conformations are quite similar among the three MRP4 structures.

Molecules 2021, 26, x FOR PEER REVIEW 4 of 22

Figure 2. MRP4 models built by homology modeling in I-TASSER and cluster 1 from CG-MDS. Green, (A) WT-MRP4. Cyan, (B) G187W. Magenta, (C) Y556C. (D–F) represent cluster 1 obtained from GC-MDS for WT-MRP4, G187W, and Y556C, respectively. The arrows indicate the location of mutations. **Figure 2.** MRP4 models built by homology modeling in I-TASSER and cluster 1 from CG-MDS. Green, (**A**) WT-MRP4. Cyan, (**B**) G187W. Magenta, (**C**) Y556C. (**D**–**F**) represent cluster 1 obtained from GC-MDS for WT-MRP4, G187W, and Y556C, respectively. The arrows indicate the location of mutations. studied and each region is illustrated with a different color, where the green color represents WT-MRP4; those sites are the nucleotide-binding domains (NBD), the transmembrane domains (TMD), and the residues relevant to substrate interaction (r85-236 and r715-866).

**Figure 3.** Representation of different sites of the MRP4 protein in the complete structure. (**A**) Blue represents NBD1. (**B**) Red represents TMDs. (**C**) Magenta represents NBD2. (**D**) Orange represents r85-236. (**E**) Gray represents r715-866.

conformations are quite similar among the three MRP4 structures.

Molecules 2021, 26, x FOR PEER REVIEW 5 of 22

Molecules 2021, 26, x FOR PEER REVIEW 5 of 22

r85-236. (E) Gray represents r715-866.

r85-236. (E) Gray represents r715-866.

Figure 3. Representation of different sites of the MRP4 protein in the complete structure. (A) Blue represents NBD1. (B) Red represents TMDs. (C) Magenta represents NBD2. (D) Orange represents

Figure 3. Representation of different sites of the MRP4 protein in the complete structure. (A) Blue represents NBD1. (B) Red represents TMDs. (C) Magenta represents NBD2. (D) Orange represents

changed significantly, according to the RMSD values (Figure 4a), which indicates a large movement of the protein to further stabilization from 250 to 1000 ns. The RMSD of WT-MRP4 was higher than those of its variants, considering the complete structure. In addition, different regions of the MRP4 structure were studied focusing on the ligand binding sites, nsSNPs, and ATP pocket binding. Figure 4b shows the RMSD values for TMDs of

changed significantly, according to the RMSD values (Figure 4a), which indicates a large movement of the protein to further stabilization from 250 to 1000 ns. The RMSD of WT-MRP4 was higher than those of its variants, considering the complete structure. In addition, different regions of the MRP4 structure were studied focusing on the ligand binding sites, nsSNPs, and ATP pocket binding. Figure 4b shows the RMSD values for TMDs of the WT-MRP4 and its variants. According to the RMSD plot, the changes in the TMDs'

The WT-MRP4, G187W, and Y556C conformations during the first 100 ns of CG-MDS

The WT-MRP4, G187W, and Y556C conformations during the first 100 ns of CG-MDS

Figure 4. plot for the complete WT-MRP4 structures and their variants (A) and the TMDs (B) throughout 1000 ns of MDS. **Figure 4.** plot for the complete WT-MRP4 structures and their variants (**A**) and the TMDs (**B**) throughout 1000 ns of MDS. In contrast to the complete structure, the TMDs do not obtain stabilization according to the RMSD values, which increase and decrease over 1000 ns. Moreover, the RMSD val-

In contrast to the complete structure, the TMDs do not obtain stabilization according to the RMSD values, which increase and decrease over 1000 ns. Moreover, the RMSD values for G187W remain increasing from 750 to 1000 ns, and such behavior could be due to the mutation is in the cytosolic loop 1 that connects the transmembrane helix (TMH) 1 and TMH2. The NBD1 conformation in WT-MRP4 remained unstable and the RMSD values kept increasing throughout the 1000 ns of CG-MDS, while the NBD1 conformation in G187W and Y556C was stable with an RMSD value around 6.0 Å (Figure 5). The RMSD values for NBD2 of the WT-MRP4 and its variants were similar even though there was no In contrast to the complete structure, the TMDs do not obtain stabilization according to the RMSD values, which increase and decrease over 1000 ns. Moreover, the RMSD values for G187W remain increasing from 750 to 1000 ns, and such behavior could be due to the mutation is in the cytosolic loop 1 that connects the transmembrane helix (TMH) 1 and TMH2. The NBD1 conformation in WT-MRP4 remained unstable and the RMSD values kept increasing throughout the 1000 ns of CG-MDS, while the NBD1 conformation in G187W and Y556C was stable with an RMSD value around 6.0 Å (Figure 5). The RMSD values for NBD2 of the WT-MRP4 and its variants were similar even though there was no stabilization through the simulation. In addition, the RMSD values of G187W and Y556C tended to increase while the RMSD values of WT-MRP4 tended to decrease at the end of the simulation. ues for G187W remain increasing from 750 to 1000 ns, and such behavior could be due to the mutation is in the cytosolic loop 1 that connects the transmembrane helix (TMH) 1 and TMH2. The NBD1 conformation in WT-MRP4 remained unstable and the RMSD values kept increasing throughout the 1000 ns of CG-MDS, while the NBD1 conformation in G187W and Y556C was stable with an RMSD value around 6.0 Å (Figure 5). The RMSD values for NBD2 of the WT-MRP4 and its variants were similar even though there was no stabilization through the simulation. In addition, the RMSD values of G187W and Y556C tended to increase while the RMSD values of WT-MRP4 tended to decrease at the end of the simulation.

Figure 5. RMSD plot for NBD1 and NBD2 of WT-MRP4 and its variants throughout the 1000 ns of MDS. According to the RMSD plot in Figure 6, the conformation in residues (r) 85 to 236 (r85-236 site corresponding to TMH1-TMH4) of WT-MRP4 remained stable from 250 to 1000 ns while in G187W it stabilized at the last 250 ns. r85-236 site in Y556C did not stabilize throughout the 1000 ns. In the same Figure 6, the RMSD plot indicates that the conformation in the r715-866 site (TMH7-TMH10) in WT-MRP4 and Y556C kept constant, with a tendency toward increasing motion in Y556C and decreasing motion in WT-MRP4. Besides this, the conformation in r715-866 site in G187W did not stabilize and the RMSD

values suggest high motion throughout the 1000 ns of simulation. In the case of ATP sites 1 and 2 (Figure 7), the conformation remained unstable, suggesting that it is a site with high motion, with exception of ATP site 2 of WT-MRP4, which kept stable through the simulation. Regarding the three different MRP4 structures, alignments on the NBDs, TMDs, and TMH1-TMH4 and TMH7-TMH10 sites were performed to determine differences among the structures. In all the alignments (Supplementary Material S1), it was observed that all the sites studied presented significant structural differences, according to the RMSD values, comparing WT-MRP4 vs. its variants and G187W vs. Y566C. Moreover, the TMHs are responsible for the specificity for the substrate, and the r-85-236 and r715-866 sites in WT-MRP4 were significantly different with respect to G187W and Y556C, which could lead to differences in the ligand affinity, the ligand binding site, and the motion of the protein [23,24]. high motion, with exception of ATP site 2 of WT-MRP4, which kept stable through the simulation. Regarding the three different MRP4 structures, alignments on the NBDs, TMDs, and TMH1-TMH4 and TMH7-TMH10 sites were performed to determine differences among the structures. In all the alignments (Supplementary Material S1), it was observed that all the sites studied presented significant structural differences, according to the RMSD values, comparing WT-MRP4 vs. its variants and G187W vs. Y566C. Moreover, the TMHs are responsible for the specificity for the substrate, and the r-85-236 and r715- 866 sites in WT-MRP4 were significantly different with respect to G187W and Y556C, which could lead to differences in the ligand affinity, the ligand binding site, and the motion of the protein [23,24]. simulation. Regarding the three different MRP4 structures, alignments on the NBDs, TMDs, and TMH1-TMH4 and TMH7-TMH10 sites were performed to determine differences among the structures. In all the alignments (Supplementary Material S1), it was observed that all the sites studied presented significant structural differences, according to the RMSD values, comparing WT-MRP4 vs. its variants and G187W vs. Y566C. Moreover, the TMHs are responsible for the specificity for the substrate, and the r-85-236 and r715- 866 sites in WT-MRP4 were significantly different with respect to G187W and Y556C, which could lead to differences in the ligand affinity, the ligand binding site, and the motion of the protein [23,24].

Molecules 2021, 26, x FOR PEER REVIEW 6 of 22

Molecules 2021, 26, x FOR PEER REVIEW 6 of 22

According to the RMSD plot in Figure 6, the conformation in residues (r) 85 to 236

According to the RMSD plot in Figure 6, the conformation in residues (r) 85 to 236

(r85-236 site corresponding to TMH1-TMH4) of WT-MRP4 remained stable from 250 to 1000 ns while in G187W it stabilized at the last 250 ns. r85-236 site in Y556C did not stabilize throughout the 1000 ns. In the same Figure 6, the RMSD plot indicates that the conformation in the r715-866 site (TMH7-TMH10) in WT-MRP4 and Y556C kept constant, with a tendency toward increasing motion in Y556C and decreasing motion in WT-MRP4. Besides this, the conformation in r715-866 site in G187W did not stabilize and the RMSD values suggest high motion throughout the 1000 ns of simulation. In the case of ATP sites 1 and 2 (Figure 7), the conformation remained unstable, suggesting that it is a site with

(r85-236 site corresponding to TMH1-TMH4) of WT-MRP4 remained stable from 250 to 1000 ns while in G187W it stabilized at the last 250 ns. r85-236 site in Y556C did not stabilize throughout the 1000 ns. In the same Figure 6, the RMSD plot indicates that the conformation in the r715-866 site (TMH7-TMH10) in WT-MRP4 and Y556C kept constant, with a tendency toward increasing motion in Y556C and decreasing motion in WT-MRP4. Besides this, the conformation in r715-866 site in G187W did not stabilize and the RMSD values suggest high motion throughout the 1000 ns of simulation. In the case of ATP sites 1 and 2 (Figure 7), the conformation remained unstable, suggesting that it is a site with high motion, with exception of ATP site 2 of WT-MRP4, which kept stable through the

Figure 6. RMSD plot for r85-236 and r715-866 of WT-MRP4 and its variants. r86-236 represents TMH1-TMH4 and r715-866 represents TMH7-TMH10. **Figure 6.** RMSD plot for r85-236 and r715-866 of WT-MRP4 and its variants. r86-236 represents TMH1-TMH4 and r715-866 represents TMH7-TMH10. TMH1-TMH4 and r715-866 represents TMH7-TMH10.

Figure 7. RMSD plot for ATP binding site 1 (ATP site 1) and ATP binding site 2 (ATP site 2) of WT-MRP4 and its variants. Figure 7. RMSD plot for ATP binding site 1 (ATP site 1) and ATP binding site 2 (ATP site 2) of WT-MRP4 and its variants. **Figure 7.** RMSD plot for ATP binding site 1 (ATP site 1) and ATP binding site 2 (ATP site 2) of WT-MRP4 and its variants.

#### *2.2. Molecular Docking in MRP4 and Variants*

2.2. Molecular Docking in MRP4 and Variants By using cluster 1 from CG-MDS for each structure of MRP4, molecular docking was performed to explore the effect of the MRP4 variants on the affinity of three different groups of molecules, previously reported as substrates or inhibitors in vitro. Table 1 presents the docking score (DS), expressed as kcal/mol, related to the interaction between 2.2. Molecular Docking in MRP4 and Variants By using cluster 1 from CG-MDS for each structure of MRP4, molecular docking was performed to explore the effect of the MRP4 variants on the affinity of three different groups of molecules, previously reported as substrates or inhibitors in vitro. Table 1 presents the docking score (DS), expressed as kcal/mol, related to the interaction between By using cluster 1 from CG-MDS for each structure of MRP4, molecular docking was performed to explore the effect of the MRP4 variants on the affinity of three different groups of molecules, previously reported as substrates or inhibitors in vitro. Table 1 presents the docking score (DS), expressed as kcal/mol, related to the interaction between endogenous substrates and WT-MRP4, Y556C, and G187W. In this work, significant differences between docking poses were considered when a difference greater than 1 kcal/mol in DS was

present. According to the DS, most endogen substrates significantly changed their DS in G187W and Y556C with respect to WT-MRP4. The Y556C mutation is located at NBD1 and leads to a different conformation with respect to WT-MRP4, which causes the most substantial change in the ligand binding site of all the molecules studied, even more than those molecules with significant changes in DS compared to WT-MRP4.

**Table 1.** Endogen substrate DS (kcal/mol) in the WT-MRP4 and MRP4 variants.


cAMP: cyclic adenosine monophosphate. \* Represents more than 1 kcal/mol of difference in the DS compared with WT.

Since cAMP is considered to be the main molecule transported by MRP4 [25], it was used as a control to compare the effect of nsSNPs and ATP binding. The cAMP DS was not considered significantly different in WT-MRP4 with respect to Y556C and G187W. Cholic acid DS was significantly different, with more than 3 kcal/mol when comparing WT-MRP4 with respect to G187W, while cholic acid DS in WT-MRP4 with respect to Y556C was significantly different at over 2 kcal/mol. The taurocholic acid DS was significantly different, with more than 4 kcal/mol when comparing WT-MRP4 with respect to G187W. Folic acid (FA) was the molecule with the highest variation in DS when comparing WT-MRP4 to G187W and Y556C, with 7.05 and 6.17 kcal/mol differences.

Table 2 presents the DSs of drug substrates. Cefazoline and olmesartan DSs were significantly different in WT-MRP4 with respect to G187W, while cefazoline, furosemide, leucovorin, methotrexate, tenofovir, and topotecan DSs were significantly different in WT-MRP4 with respect to Y556C. According to DS, most drug substrates could present more affinity for Y556C than for WT-MRP4.

**Table 2.** Drug substrate DS (kcal/mol) in the WT-MRP4 and MRP4 variants.


\* Represents more than 1 kcal/mol difference in the DS compared against WT.

The DS of inhibitors is presented in Table 3. Glafenine DS was significantly different in WT-MRP4 with respect to both G187W and Y556C. Ceefourin-1, indomethacin, and sildenafil DSs were significantly different in WT-MRP4 with respect to G187W, while losartan DS was significantly different between WT-MRP4 and Y556C. Endogen substrates were the group of molecules with more variation in the DS, and thus such mutations on the ABCC4 gene could lead to changes in cell metabolism related to changes in the distribution of endogen substrates across cell membranes. Several mutational studies on MRP1 have demonstrated that amino acids in several TMDs are involved in substrate binding and nsSNPs can modify the ligand binding site or the affinity of ligands in a selective manner [11]. Yet, the DS values are not totally related to function. Regarding drug substrates and inhibitors, a small number of molecules changed their DS significantly to G187W and Y556C as related to WT-MRP4, which could modify the pattern of transport of cefazoline, ceftizoxime, olmesartan, topotecan, and ceefourin-1, which could, possibly, mean that they act as substrates or inhibitors depending on the MRP4 variant.


**Table 3.** Drug inhibitor DS (kcal/mol) in the WT-MRP4 and MRP4 variants.

\* Represents more than 1 kcal/mol difference in the DS compared against WT.

#### *2.3. Differences in the Interaction Pattern in MRP4 Structures*

Ligand interaction diagrams (LIDs) represent the pattern of intermolecular interactions of molecules with MRP4 amino acids. Those molecules with a >2 kcal/mol difference in the three different MRP4 structures appear in LIDs in Supplementary Material S2–S5 and the LIDs for FA appear in Figures 8–10. The interaction sites in the three different MRP4 structures were different, mainly in Y556C, for all the molecules exhibited in the LIDs. Cholic acid interacts mainly with hydrophobic residues on WT-MRP4 and Y556C, while on G187W it interacts with hydrophobic and polar residues. H-bonds are only exerted through hydrophobic residues, except on Y556C, where arginine exerts an H-bond with cholic acid. In addition, cholic acid interacts with positively charged amino acids in the three MRP4 structures, lysine on G187W and WT-MRP4, and arginine on Y556C (Supplementary Material S2). The taurocholic acid binding site was different in WT-MRP4 with respect to G187W and was totally different with respect to Y556C. The intermolecular interactions in Y556C and G187W were H-bonds, polar, hydrophobic, interactions with positively and negatively charged amino acids, while interactions in WT-MRP4 were hydrophobic and polar but did not interact with negatively charged amino acids and did not exert H-bonds (Supplementary Material S3). The cefazoline binding site was quite different on the three different MRP4 structures. Hydrophobic and polar residues on WT-MRP4, Y556C, and G187W interact with cefazoline, but only on G187W do H-bonds with arginine occur. WT-MRP4 and Y556C interact with cefazoline through negatively charged amino acids (glutamate), while the G187W interacts through positively charged amino acids (arginine) (Supplementary Material S4). Ceefourin-1 interacts with hydrophobic, polar, negative, and positively charged residues on WT-MRP4 and Y556C, even though the binding site is different in each MRP4 structure, which leads to only one difference; ceefourin-1 interacts via H-bonds in Y556C. Additionally, ceefourin-1 interacts in G187W

with hydrophobic, polar, and positively charged residues, and via H-bonds and π-π stacking (Supplementary Material S5). The Y556C mutation led to a substantial change in the binding site of most of the molecules analyzed, while G187W mutation did not substantially change the binding site with respect to WT-MRP4. Although the molecule's binding site was different among the MRP4, the pattern of intermolecular interactions could be similar to that observed in the LIDs. The molecules presented in the LIDs had a substantial modification to their binding site and intermolecular interactions in Y556C, which could be related to changes in the transport-rate, IC50, entrance to the inner cavity, and effect on the Y556C conformational movement. did not change its binding site throughout all the AA-MD simulation, but it kept moving throughout 25 ns to obtain a DS greater than that of the most stable conformation at 8 ns (Supplementary Material S12–S17). The G187W mutation is located at cytosolic loop 1, close to the entrance of the inner cavity [26], and leads to changes in the MRP4 conformation which could block or interfere with ligand binding or the entrance to its binding site. Notwithstanding this, it is not possible to determine with this study whether there is a relevant effect on the FA transport by G187W. Figure 9 shows the FA LID in G187W at T0 and 8.0 ns in AA-MDS. The FA binding site was slightly different, with intermolecular interactions by an H-bond and π-cation at T0, while at 8.0 no intermolecular interactions were observed, but the pocket binding was composed of hydrophobic amino acids in both time steps. did not change its binding site throughout all the AA-MD simulation, but it kept moving throughout 25 ns to obtain a DS greater than that of the most stable conformation at 8 ns (Supplementary Material S12–S17). The G187W mutation is located at cytosolic loop 1, close to the entrance of the inner cavity [26], and leads to changes in the MRP4 conformation which could block or interfere with ligand binding or the entrance to its binding site. Notwithstanding this, it is not possible to determine with this study whether there is a relevant effect on the FA transport by G187W. Figure 9 shows the FA LID in G187W at T0 and 8.0 ns in AA-MDS. The FA binding site was slightly different, with intermolecular interactions by an H-bond and π-cation at T0, while at 8.0 no intermolecular interactions were observed, but the pocket binding was composed of hydrophobic amino acids in both time steps.

S6–S23). As mentioned above, FA did not change its conformation significantly in WT-MRP4 from 0 to 25 ns, while in G187W the conformation was significantly different at 20 and 25 ns, as the carboxyl group was responsible for the FA conformational changes and multiple intermolecular interactions, such as H-bonds, π-π, and π-cation. FA in G187W

S6–S23). As mentioned above, FA did not change its conformation significantly in WT-MRP4 from 0 to 25 ns, while in G187W the conformation was significantly different at 20 and 25 ns, as the carboxyl group was responsible for the FA conformational changes and multiple intermolecular interactions, such as H-bonds, π-π, and π-cation. FA in G187W

Molecules 2021, 26, x FOR PEER REVIEW 10 of 22

Molecules 2021, 26, x FOR PEER REVIEW 10 of 22

Figure 8. FA binding interaction diagram in WT-MRP4 at T0 (A) and 16.20 ns (B) in AA-MDS. The nomenclature for the intermolecular interactions is shown in the bottom of the figure. **Figure 8.** FA binding interaction diagram in WT-MRP4 at T0 (**A**) and 16.20 ns (**B**) in AA-MDS. The nomenclature for the intermolecular interactions is shown in the bottom of the figure. Figure 8. FA binding interaction diagram in WT-MRP4 at T0 (A) and 16.20 ns (B) in AA-MDS. The nomenclature for the intermolecular interactions is shown in the bottom of the figure.

Figure 9. FA binding diagram interactions on G187W at T0 (A) and at 8.0 ns (B) in AA-MDS. **Figure 9.** FA binding diagram interactions on G187W at T0 (**A**) and at 8.0 ns (**B**) in AA-MDS. different across the AA-MDS (Supplementary Material S18–S23).

Figure 9. FA binding diagram interactions on G187W at T0 (A) and at 8.0 ns (B) in AA-MDS.

Figure 10 shows the FA LID in Y556C at T0 and 15.00 ns in AA-MDS. In this case, the

Figure 10. FA binding interaction diagrams in Y556C at T0 (A) and 15.0 ns (B) in AA-MDS. **Figure 10.** FA binding interaction diagrams in Y556C at T0 (**A**) and 15.0 ns (**B**) in AA-MDS.

The AA-MDS studies were conducted in cAMP as a substrate control to determine if

First, AA-MDS were performed on the wild type and variants for MRP4-cAMP complex to obtain the C1 conformations. Further, ATP and Mg2+ were added into the MRP4 structures to obtain a more complete system, and further AA-MDS were performed to determine a new C1. This procedure was applied in the MRP4-FA complexes as well. Both C1 conformations before ATP and after ATP were used for umbrella sampling studies to obtain the free binding energy (ΔG) of each ligand. Since two ATP molecules bind to MRP4, we referred to them as ATP1 and ATP2 for the NBD1 and NBD2, respectively. In all the MRP4-FA and MRP4-cAMP complexes with ATP bonds, the binding site and the ligand conformation were slightly different concerning the MRP4-FA and MRP4-cAMP complexes without ATP. The presence or the absence of ATP modifies the pattern of intermolecular interactions as well. The H-bonds, π-π stacking, and π-cation interactions are present together or individually in all the complexes, except in G187W-FA without ATP. Given the fact that the ATP influences the ligand conformation, it modifies the intermolecular interactions as well. Figures 11 and 12 show the ATP binding site in NBD1 and

Figure 11. Representation of the ATP binding site 1 in the complete MRP4 structure and an amplification of such site. Color pink in the complete MRP4 structure represents the ATP site 1. Color red

NBD2 from the WT-MRP4, G187W, and Y556C structures.

Table 4 shows the most important residues required for the interactions of endogen substrates, drug substrates, and inhibitors in each MRP4 structure. It is worth noting that in the inhibitor group, only Lys 329 and Arg 951 appear in both endogen and drug substrates, suggesting that such residues may play an important role in the binding site. In the endogen and drug substrate groups, at least two residues are repeated—Arg 946 and Lys 702.


**Table 4.** Residues considered as important for ligand–MRP4 interactions.

#### *2.4. All-Atom Molecular Dinaymics (AA-MD) Simulations and Umbrella Sampling Studies*

FA was the molecule with the highest DS variation in G187W and Y556C related to WT-MRP4 in molecular docking studies, considering that ATP was not bound; hence, to study the effect of the presence of ATP and the mutations, changes in the pattern of intermolecular interactions and affinity to MRP4 structures, 25 ns AA-MDS, and 10 ns of Umbrella sampling simulations were carried out on the MRP4-FA complexes and compared with cAMP as a control molecule. The C1 in AA-MDS was at 16.2 ns in WT-MRP4, 8.0 ns in G187W, and 15.0 ns in Y556C. Figure 8 shows the FA LID in WT-MRP4 at T0 and 16.2 ns. In this simulation, the differences in the FA binding site and the pattern of intermolecular interaction can be observed according to the WT-MRP4 conformation at a given time. The FA binding site was the same at T0 and 16.2 ns, suggesting that the protein can be in an inward-facing conformation for 25 ns or even more. Moreover, the pattern of intermolecular interactions between FA and WT-MRP4 is quite similar, consisting of H-bonds; π-π stacking; and interactions with polar, hydrophobic, and negatively and positively charged residues. The differences in the pattern of intermolecular interactions rely on the H-bonds, with 4 H-bonds at T0 with Phe698 and Leu835, Glu1002 and Thr839. Meanwhile, at 16.2 ns the interactions were mainly hydrophobic (π-π stacking) and there was one H-bond with Glu1002. Therefore, it seems that FA from 0 to 16.2 ns interacts with MRP4 to achieve its optimal bonding only. A longer simulation will help to determine all the FA binding sites across the WT-MRP4. Besides this, the AA-MDS studies were performed on each MRP4-FA complex to determine differences in the patterns of interactions and conformations in frames from 0 to 25 ns every 5 ns referred to T0 (Supplementary Material S6–S23). As mentioned above, FA did not change its conformation significantly in WT-MRP4 from 0 to 25 ns, while in G187W the conformation was significantly different at 20 and 25 ns, as the carboxyl group was responsible for the FA conformational changes and multiple intermolecular interactions, such as H-bonds, π-π, and π-cation. FA in G187W did not change its binding site throughout all the AA-MD simulation, but it kept moving throughout 25 ns to obtain a DS greater than that of the most stable conformation at 8 ns (Supplementary Material S12–S17). The G187W mutation is located at cytosolic loop 1, close to the entrance of the inner cavity [26], and leads to changes in the MRP4 conformation which could block or interfere with ligand binding or the entrance to its binding site. Notwithstanding this, it is not possible to determine with this study whether there is a relevant effect on the FA transport by G187W. Figure 9 shows the FA LID in G187W at T0 and 8.0 ns in AA-MDS. The FA binding site was slightly different, with intermolecular

interactions by an H-bond and π-cation at T0, while at 8.0 no intermolecular interactions were observed, but the pocket binding was composed of hydrophobic amino acids in both time steps. Figure 10. FA binding interaction diagrams in Y556C at T0 (A) and 15.0 ns (B) in AA-MDS.

Molecules 2021, 26, x FOR PEER REVIEW 11 of 22

different across the AA-MDS (Supplementary Material S18–S23).

contrary, the pattern of intermolecular interactions as well as the FA conformation was

Figure 10 shows the FA LID in Y556C at T0 and 15.00 ns in AA-MDS. In this case, the FA binding site was the same, but the intermolecular interactions by H-bonds at 15.00 ns were higher than at T0, which seems to confer to FA a stronger binding over time. Y556C promoted the greatest change in the FA binding site with respect to WT-MRP4 and G187W, but throughout the 25 ns in AA-MDS the binding site did not change. On the contrary, the pattern of intermolecular interactions as well as the FA conformation was different across the AA-MDS (Supplementary Material S18–S23). The AA-MDS studies were conducted in cAMP as a substrate control to determine if the nsSNPs affect the pattern of intermolecular interactions in the same manner as FA. First, AA-MDS were performed on the wild type and variants for MRP4-cAMP complex to obtain the C1 conformations. Further, ATP and Mg2+ were added into the MRP4 structures to obtain a more complete system, and further AA-MDS were performed to deter-

The AA-MDS studies were conducted in cAMP as a substrate control to determine if the nsSNPs affect the pattern of intermolecular interactions in the same manner as FA. First, AA-MDS were performed on the wild type and variants for MRP4-cAMP complex to obtain the C1 conformations. Further, ATP and Mg2+ were added into the MRP4 structures to obtain a more complete system, and further AA-MDS were performed to determine a new C1. This procedure was applied in the MRP4-FA complexes as well. Both C1 conformations before ATP and after ATP were used for umbrella sampling studies to obtain the free binding energy (∆G) of each ligand. Since two ATP molecules bind to MRP4, we referred to them as ATP1 and ATP2 for the NBD1 and NBD2, respectively. In all the MRP4-FA and MRP4-cAMP complexes with ATP bonds, the binding site and the ligand conformation were slightly different concerning the MRP4-FA and MRP4-cAMP complexes without ATP. The presence or the absence of ATP modifies the pattern of intermolecular interactions as well. The H-bonds, π-π stacking, and π-cation interactions are present together or individually in all the complexes, except in G187W-FA without ATP. Given the fact that the ATP influences the ligand conformation, it modifies the intermolecular interactions as well. Figures 11 and 12 show the ATP binding site in NBD1 and NBD2 from the WT-MRP4, G187W, and Y556C structures. mine a new C1. This procedure was applied in the MRP4-FA complexes as well. Both C1 conformations before ATP and after ATP were used for umbrella sampling studies to obtain the free binding energy (ΔG) of each ligand. Since two ATP molecules bind to MRP4, we referred to them as ATP1 and ATP2 for the NBD1 and NBD2, respectively. In all the MRP4-FA and MRP4-cAMP complexes with ATP bonds, the binding site and the ligand conformation were slightly different concerning the MRP4-FA and MRP4-cAMP complexes without ATP. The presence or the absence of ATP modifies the pattern of intermolecular interactions as well. The H-bonds, π-π stacking, and π-cation interactions are present together or individually in all the complexes, except in G187W-FA without ATP. Given the fact that the ATP influences the ligand conformation, it modifies the intermolecular interactions as well. Figures 11 and 12 show the ATP binding site in NBD1 and NBD2 from the WT-MRP4, G187W, and Y556C structures.

Figure 11. Representation of the ATP binding site 1 in the complete MRP4 structure and an amplification of such site. Color pink in the complete MRP4 structure represents the ATP site 1. Color red **Figure 11.** Representation of the ATP binding site 1 in the complete MRP4 structure and an amplification of such site. Color pink in the complete MRP4 structure represents the ATP site 1. Color red and represent the residues interacting with ATP, and rainbow colors represent ATP. The yellow dotted lines indicate polar interactions.

Molecules 2021, 26, x FOR PEER REVIEW 12 of 22

dotted lines indicate polar interactions.

and represent the residues interacting with ATP, and rainbow colors represent ATP. The yellow

Figure 12. Representation of the ATP binding site 2 in the complete MRP4 structure and an amplification of such site. Color cyan in the complete MRP4 structure represents the ATP site 2. Color red represent the residues interacting with ATP, and rainbow colors represent ATP. The yellow dotted lines indicate polar interactions. **Figure 12.** Representation of the ATP binding site 2 in the complete MRP4 structure and an amplification of such site. Color cyan in the complete MRP4 structure represents the ATP site 2. Color red represent the residues interacting with ATP, and rainbow colors represent ATP. The yellow dotted lines indicate polar interactions.

Figures 13–15 show the FA binding sites in WT-MRP4, G187W, and Y556C, respectively. All the afore mentioned figures contain the full MRP4 structure and a close-up of the ligand binding site. As observed in Figures 13 and 14, the FA binding site is almost the same and it is surrounded by the TMH1-TMH3; meanwhile, Figure 15 shows that the FA binding site in Y556C is surrounded by TMH2-TMH7 deep in the inner cavity. This finding is interesting, due to the fact that the Y556C mutation could promote the greater affinity of FA for the Y556C variant by switching into a "high affinity inward-facing orienta-Figures 13–15 show the FA binding sites in WT-MRP4, G187W, and Y556C, respectively. All the afore mentioned figures contain the full MRP4 structure and a close-up of the ligand binding site. As observed in Figures 13 and 14, the FA binding site is almost the same and it is surrounded by the TMH1-TMH3; meanwhile, Figure 15 shows that the FA binding site in Y556C is surrounded by TMH2-TMH7 deep in the inner cavity. This finding is interesting, due to the fact that the Y556C mutation could promote the greater affinity of FA for the Y556C variant by switching into a "high affinity inward-facing orientation", making the binding of FA easier [27]. Nevertheless, this does not mean that the FA can be transported properly, since MRP4 has an altered conformation within the NBD2, which blocks the ATP hydrolysis but not ATP binding. However, MRP4 can function with the energy provided by one ATP hydrolysis, but the Y556C mutation could interfere with such ATP hydrolysis [28].

tion", making the binding of FA easier [27]. Nevertheless, this does not mean that the FA can be transported properly, since MRP4 has an altered conformation within the NBD2, which blocks the ATP hydrolysis but not ATP binding. However, MRP4 can function with the energy provided by one ATP hydrolysis, but the Y556C mutation could interfere with such ATP hydrolysis [28]. Figure 16 shows that the cAMP binding site in Y556C is quite different compared to WT-MRP4 and G187W. The cAMP is located out of the normal binding site out of the TMDs and it may not be transported at a proper rate or efficacy. It seems that Y556C is in the "low-affinity outward-facing orientation", where the main characteristic in this stage is that the affinity of the transported entity switches from high affinity (low chemical potential of the substrate) to low affinity (high chemical potential). Such change in affinity is due to the gate to the inside is closed, and the gate to the outside of the membrane is opened [23]. The global effect will depend on the cell type. Focusing on leukemia cells, could be harmful to them, because the intracellular cAMP levels would be increased and thus, leading to apoptosis [25]. On the other hand, the cAMP binding site in Y556C could represent the initial interaction only and further re-location such as the WT-MTP4. Longer AA-MDS are required to test this hypothesis.

To contrast the effect of nsSNPs in MRP4, AA-MDS and umbrella sampling studies were performed in ceefourin-1, which is possesses a high selectivity over MRP4 inhibition. It has been suggested that ceefourin-1 may act as a competitive inhibitor in MRP4 [7], which is consistent to the AA-MDS where the binding site for cAMP and ceefourin-1 is similar

56

Figure 13. FA binding site in WT-MRP4 and amplification of such site. Rainbow colors in both the complete MRP4 structure and amplification of the binding site represent the FA structure while the

in WT-MRP4, G187W and Y556C with ATP in the complex (Figures 17–19). Interestingly, ceefourin-1 binds at the same abnormal binding site as cAMP in Y556C-ATP. Moreover, Y556C seems to be in the same "low-affinity outward-facing orientation" where the low affinity (high chemical potential) of the ligand leads to switch in the binding site [23]. As mentioned before, such abnormal binding site could be the initial binding site but longer AA-MDS is required to demonstrate it. tion", making the binding of FA easier [27]. Nevertheless, this does not mean that the FA can be transported properly, since MRP4 has an altered conformation within the NBD2, which blocks the ATP hydrolysis but not ATP binding. However, MRP4 can function with the energy provided by one ATP hydrolysis, but the Y556C mutation could interfere with such ATP hydrolysis [28].

Molecules 2021, 26, x FOR PEER REVIEW 12 of 22

dotted lines indicate polar interactions.

lines indicate polar interactions.

and represent the residues interacting with ATP, and rainbow colors represent ATP. The yellow

Figure 12. Representation of the ATP binding site 2 in the complete MRP4 structure and an amplification of such site. Color cyan in the complete MRP4 structure represents the ATP site 2. Color red represent the residues interacting with ATP, and rainbow colors represent ATP. The yellow dotted

Figures 13–15 show the FA binding sites in WT-MRP4, G187W, and Y556C, respectively. All the afore mentioned figures contain the full MRP4 structure and a close-up of the ligand binding site. As observed in Figures 13 and 14, the FA binding site is almost the same and it is surrounded by the TMH1-TMH3; meanwhile, Figure 15 shows that the FA binding site in Y556C is surrounded by TMH2-TMH7 deep in the inner cavity. This finding is interesting, due to the fact that the Y556C mutation could promote the greater affinity of FA for the Y556C variant by switching into a "high affinity inward-facing orienta-

Figure 13. FA binding site in WT-MRP4 and amplification of such site. Rainbow colors in both the complete MRP4 structure and amplification of the binding site represent the FA structure while the **Figure 13.** FA binding site in WT-MRP4 and amplification of such site. Rainbow colors in both the complete MRP4 structure and amplification of the binding site represent the FA structure while the yellow spheres and stick represent the ATP. The yellow dotted lines indicate polar interactions. The LID of WT-MRP4-FA is presented at the bottom. yellow spheres and stick represent the ATP. The yellow dotted lines indicate polar interactions. The LID of WT-MRP4-FA is presented at the bottom.

Figure 14. FA binding site in G187W and amplification of such site. Rainbow colors in both the complete WT-MRP4 and amplification of the binding site represent the FA structure while the yellow spheres and stick represent the ATP. The yellow dotted lines indicate polar interactions. The LID of G187W-FA is presented at the bottom. **Figure 14.** FA binding site in G187W and amplification of such site. Rainbow colors in both the complete WT-MRP4 and amplification of the binding site represent the FA structure while the yellow spheres and stick represent the ATP. The yellow dotted lines indicate polar interactions. The LID of G187W-FA is presented at the bottom.

Figure 15. FA binding site in Y556C and amplification of such a site. Rainbow colors in both the complete G187W and amplification of the binding site represent the FA structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar interactions. The LID of Y556C-

Figure 16 shows that the cAMP binding site in Y556C is quite different compared to WT-MRP4 and G187W. The cAMP is located out of the normal binding site out of the TMDs and it may not be transported at a proper rate or efficacy. It seems that Y556C is in the "low-affinity outward-facing orientation", where the main characteristic in this stage is that the affinity of the transported entity switches from high affinity (low chemical potential of the substrate) to low affinity (high chemical potential). Such change in affinity is due to the gate to the inside is closed, and the gate to the outside of the membrane is opened [23]. The global effect will depend on the cell type. Focusing on leukemia cells, could be harmful to them, because the intracellular cAMP levels would be increased and thus, leading to apoptosis [25]. On the other hand, the cAMP binding site in Y556C could

FA is presented at the bottom.

Figure 14. FA binding site in G187W and amplification of such site. Rainbow colors in both the complete WT-MRP4 and amplification of the binding site represent the FA structure while the yellow spheres and stick represent the ATP. The yellow dotted lines indicate polar interactions. The

yellow spheres and stick represent the ATP. The yellow dotted lines indicate polar interactions. The

Molecules 2021, 26, x FOR PEER REVIEW 13 of 22

LID of WT-MRP4-FA is presented at the bottom.

LID of G187W-FA is presented at the bottom.

Figure 15. FA binding site in Y556C and amplification of such a site. Rainbow colors in both the complete G187W and amplification of the binding site represent the FA structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar interactions. The LID of Y556C-FA is presented at the bottom. **Figure 15.** FA binding site in Y556C and amplification of such a site. Rainbow colors in both the complete G187W and amplification of the binding site represent the FA structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar interactions. The LID of Y556C-FA is presented at the bottom. represent the initial interaction only and further re-location such as the WT-MTP4. Longer AA-MDS are required to test this hypothesis.

Figure 16. cAMP binding site in Y556C and amplification of such site. Rainbow colors in both the complete Y556C and amplification of the binding site represent the cAMP structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar interactions. The LID of Y556C-**Figure 16.** cAMP binding site in Y556C and amplification of such site. Rainbow colors in both the complete Y556C and amplification of the binding site represent the cAMP structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar interactions. The LID of Y556CcAMP is presented at the bottom.

site but longer AA-MDS is required to demonstrate it.

To contrast the effect of nsSNPs in MRP4, AA-MDS and umbrella sampling studies

The cAMP and ceefourin-1 abnormal binding site in Y556C was one of the most re-

markable findings in this work and showed the importance of studying the global effect

were performed in ceefourin-1, which is possesses a high selectivity over MRP4 inhibition. It has been suggested that ceefourin-1 may act as a competitive inhibitor in MRP4 [7], which is consistent to the AA-MDS where the binding site for cAMP and ceefourin-1 is similar in WT-MRP4, G187W and Y556C with ATP in the complex (Figures 17–19). Interestingly, ceefourin-1 binds at the same abnormal binding site as cAMP in Y556C-ATP. Moreover, Y556C seems to be in the same "low-affinity outward-facing orientation" where the low affinity (high chemical potential) of the ligand leads to switch in the binding site [23]. As mentioned before, such abnormal binding site could be the initial binding

of Y556C in different cell lines.

site but longer AA-MDS is required to demonstrate it.

Molecules 2021, 26, x FOR PEER REVIEW 14 of 22

AA-MDS are required to test this hypothesis.

cAMP is presented at the bottom.

of Y556C in different cell lines.

represent the initial interaction only and further re-location such as the WT-MTP4. Longer

Figure 16. cAMP binding site in Y556C and amplification of such site. Rainbow colors in both the complete Y556C and amplification of the binding site represent the cAMP structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar interactions. The LID of Y556C-

To contrast the effect of nsSNPs in MRP4, AA-MDS and umbrella sampling studies

The cAMP and ceefourin-1 abnormal binding site in Y556C was one of the most re-

markable findings in this work and showed the importance of studying the global effect

were performed in ceefourin-1, which is possesses a high selectivity over MRP4 inhibition. It has been suggested that ceefourin-1 may act as a competitive inhibitor in MRP4 [7], which is consistent to the AA-MDS where the binding site for cAMP and ceefourin-1 is similar in WT-MRP4, G187W and Y556C with ATP in the complex (Figures 17–19). Interestingly, ceefourin-1 binds at the same abnormal binding site as cAMP in Y556C-ATP. Moreover, Y556C seems to be in the same "low-affinity outward-facing orientation" where the low affinity (high chemical potential) of the ligand leads to switch in the binding site [23]. As mentioned before, such abnormal binding site could be the initial binding

**Figure 17.** Ceefourin-1 binding site in WT-MRP4 and amplification of such site. Rainbow colors in both the complete WT-MRP4 and amplification of the binding site represent the ceefourin-1 structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar interactions. The LID of WT-MRP4-ceefourin-1 is presented at the bottom. Figure 17. Ceefourin-1 binding site in WT-MRP4 and amplification of such site. Rainbow colors in both the complete WT-MRP4 and amplification of the binding site represent the ceefourin-1 structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar interactions. The LID of WT-MRP4-ceefourin-1 is presented at the bottom.

Figure 18. Ceefourin-1 binding site in G187W and the amplification of such a site. Rainbow colors in both the complete G187W and amplification of the binding site represent the ceefourin-1 structure, while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar interactions. The LID of G187W-ceefourin-1 is presented at the bottom. **Figure 18.** Ceefourin-1 binding site in G187W and the amplification of such a site. Rainbow colors in both the complete G187W and amplification of the binding site represent the ceefourin-1 structure, while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar interactions. The LID of G187W-ceefourin-1 is presented at the bottom.

teractions. The LID of Y556C-ceefourin-1 is presented at the bottom.

Figure 19. Ceefourin-1 binding site in Y556C and the amplification of such a site. Rainbow colors in both the complete Y556C and amplification of the binding site represent the ceefourin-1 structure, while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar in-

Umbrella sampling simulations were performed to determine the theoretical affinity,

expressed as the free energy of binding (ΔG), of an event related to protein–ligand interactions along a reaction coordinate. Table 5 shows the ΔG obtained by Umbrella sampling through 10 ns of simulation on each complex. According to Umbrella sampling results, the Y556C-FA complex is the most stable with or without ATP bound to MRP4. Interestingly, ΔG for WT-FA and Y556C-FA complexes were lower in those MRP4 without ATP. In addition, ΔGs for G187W-cAMP and Y556C-cAMP complexes were higher without ATP, suggesting that the ATP bound is required for both FA and cAMP binding

Figure 18. Ceefourin-1 binding site in G187W and the amplification of such a site. Rainbow colors in both the complete G187W and amplification of the binding site represent the ceefourin-1 structure, while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar

Figure 17. Ceefourin-1 binding site in WT-MRP4 and amplification of such site. Rainbow colors in both the complete WT-MRP4 and amplification of the binding site represent the ceefourin-1 structure while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar

interactions. The LID of G187W-ceefourin-1 is presented at the bottom.

Molecules 2021, 26, x FOR PEER REVIEW 15 of 22

interactions. The LID of WT-MRP4-ceefourin-1 is presented at the bottom.

Figure 19. Ceefourin-1 binding site in Y556C and the amplification of such a site. Rainbow colors in both the complete Y556C and amplification of the binding site represent the ceefourin-1 structure, while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar interactions. The LID of Y556C-ceefourin-1 is presented at the bottom. **Figure 19.** Ceefourin-1 binding site in Y556C and the amplification of such a site. Rainbow colors in both the complete Y556C and amplification of the binding site represent the ceefourin-1 structure, while yellow spheres represent the ATP. The yellow dotted lines indicate polar and non-polar interactions. The LID of Y556C-ceefourin-1 is presented at the bottom.

Umbrella sampling simulations were performed to determine the theoretical affinity, The cAMP and ceefourin-1 abnormal binding site in Y556C was one of the most remarkable findings in this work and showed the importance of studying the global effect of Y556C in different cell lines.

expressed as the free energy of binding (ΔG), of an event related to protein–ligand interactions along a reaction coordinate. Table 5 shows the ΔG obtained by Umbrella sampling through 10 ns of simulation on each complex. According to Umbrella sampling results, the Y556C-FA complex is the most stable with or without ATP bound to MRP4. Interestingly, ΔG for WT-FA and Y556C-FA complexes were lower in those MRP4 without ATP. In addition, ΔGs for G187W-cAMP and Y556C-cAMP complexes were higher without ATP, suggesting that the ATP bound is required for both FA and cAMP binding Umbrella sampling simulations were performed to determine the theoretical affinity, expressed as the free energy of binding (∆G), of an event related to protein–ligand interactions along a reaction coordinate. Table 5 shows the ∆G obtained by Umbrella sampling through 10 ns of simulation on each complex. According to Umbrella sampling results, the Y556C-FA complex is the most stable with or without ATP bound to MRP4. Interestingly, ∆G for WT-FA and Y556C-FA complexes were lower in those MRP4 without ATP. In addition, ∆Gs for G187W-cAMP and Y556C-cAMP complexes were higher without ATP, suggesting that the ATP bound is required for both FA and cAMP binding stabilization. ∆G for WT-MRP4-cAMP with or without ATP was the same. This represents an interesting finding because in WT-MRP4, the binding of ATP may promote the stabilization and conformational changes on MRP4 protein to promote an adequate interaction with FA to be properly transported out of the cell. Regarding the information provided by the LIDs, it is not possible to link the intermolecular interactions to ∆G because of the lack of a pattern in all the complexes. The intermolecular interactions do not define the increase or decrease of ∆G in this case. To confirm these observations, experimentally testing the FA efflux on cells expressing equal levels of WT-MRP4, G187W, and Y556C, as well as MRP4 inhibition by ceefourin-1 to measure the transport rate would afford further information of the differences among mutants and wild-type MRP4, experiments that are currently being carried out and will be reported in future articles.

The ∆G for the WT-MRP4-ATP1 complex was lower than that for FA and cAMP, which is reasonable considering its role in MRP4 functioning and needs to be bound to NBDs for longer time than ligands. In the case of ATP1/2 complexes with MRP4 variants, the most remarkable finding was that the Y556C-ATP1 complex was the least stable, theoretically demonstrated through the Umbrella sampling results, and it may be related to the mutation in the NBD1. The mechanism for the MRP4 function requires the

transmission of the molecular motion from the NBDs to the TMDs. At this point, the ATP-binding can be considered as the power stroke in which the chemical potential of the transported entity changed, and ATP hydrolysis leads to the formation of an extra negative charge, thus opening the closed nucleotide sandwich structure and the opening of the nucleotide sandwich structure facilitates Pi release and ADP dissociation, which in turn allows the TMDs and access gates to reset to the high-affinity orientation on the original side of the membrane to continue the transport cycle [23,29]. If the Y556C mutation promotes a decreased affinity of ATP1 for its binding site and, in turn, a blockade of ATP1 hydrolysis, the Y556C variant activity could be diminished or truncated, also considering that WT-MTP4 and its variants lack the ability to hydrolyze ATP2 in NBD2; thus, the lack of Y556C efficacy in the transport substrates. Additionally, ATP2 had the highest affinity for Y556C, and it would be interesting to test if high affinity could reestablish the ATP2 hydrolysis in NBD2.


**Table 5.** Binding free energies (∆Gs) obtained by umbrella sampling.

FA-ATP, cAMP-ATP, and ceefourin-1-ATP: the ∆G is referred to FA, cAMP or ceefourin-1 in complex with ATP. FA-ATP1, FA-ATP2, cAMP-ATP1, cAMP-ATP2, ceefourin-1-ATP1, and ceefourin-1-ATP2: ∆G calculation is focused on ATP1 or ATP2

The abnormal cAMP and ceefourin-1 binding in Y556C (Figures 16 and 19) is consistent with the ∆G for Y556C-cAMP and Y556-ceefourin-1, which had the lowest affinity with respect to the complexes with WT-MRP4 and G187W, and it indicates a low affinity at that binding site, suggesting two possibilities: (a) it is an initial cAMP or ceefourin-1 binding site to further binding at the inner cavity, or (b) possible deficiency in cAMP and ceefourin-1 transport. The latter is the most feasible due to cAMP interacts to G187W and WT-MRP4 in its normal binding site, but Y556C seems to be in the "low affinity outwardfacing orientation" as mentioned before. To predict the effect of nsSNPs on the efficacy of chemotherapeutics, it is important to determine in a further study, the relation between ∆G in silico and the transport rate in vitro of several substrates. This suggests that ∆Gs higher than that for the WT-cAMP-ATP complex may be related to a low rate of cAMP transport. The ∆Gs for ceefourin-1 in all the complexes with MRP4 and its variants had the best affinity compared to those complexes of cAMP, supporting the idea that ceefourin-1 may act as a competitive inhibitor at least with cAMP. However, the presence of ATP in WT-MRP4 and its variants promotes a better affinity to cAMP than ceefourin-1 in the complexes with G187W and Y556C. A competitive assay in vitro is required to determine if ceefourin-1 had the best affinity to WT-MRP4 and its variants. The Figures 20 and 21 represent the LIDs for FA and cAMP, respectively. In both figures, the intermolecular interactions of ligands with the absence of ATP are presented in the upper panel while the intermolecular interactions of ligands with the presence of ATP are shown in the bottom panel. The FA binding site is almost the same with or without ATP, but the number and the type of intermolecular interactions is different. It seems that ATP promotes a greater number of H-bonds between FA and all WT-MRP4 and its variants. Regarding the cAMP

binding site, the ATP binding did not modify it; thus, the effect of ATP seems to be related with changes in the intermolecular interactions and affinity that, according to Umbrella sampling results, the ATP decreases the ∆G in the G187W-cAMP-ATP and Y556C-cAMP-ATP complexes. Figure 22 presents the LIDs for ceefourin-1. The binding site is different in all the complexes; interestingly, such a binding site does not change in the presence of ATP. The main difference induced for ATP is the intermolecular interaction pattern, which led to great amount of π-π stacking and H-bonds. ATP complexes. Figure 22 presents the LIDs for ceefourin-1. The binding site is different in all the complexes; interestingly, such a binding site does not change in the presence of ATP. The main difference induced for ATP is the intermolecular interaction pattern, which led to great amount of π-π stacking and H-bonds. The MRP4 variants may predispose the population to a given disease regarding the site of the MRP4 that was affected by the mutation and the change in the affinity of a given substrate. The clinical implications of MRP4 variants have been studied over the past years and it is crucial to describe the relation of the MRP4 variants with diseases [30].

Molecules 2021, 26, x FOR PEER REVIEW 17 of 22

binding site to further binding at the inner cavity, or (b) possible deficiency in cAMP and ceefourin-1 transport. The latter is the most feasible due to cAMP interacts to G187W and WT-MRP4 in its normal binding site, but Y556C seems to be in the "low affinity outwardfacing orientation" as mentioned before. To predict the effect of nsSNPs on the efficacy of chemotherapeutics, it is important to determine in a further study, the relation between ΔG in silico and the transport rate in vitro of several substrates. This suggests that ΔGs higher than that for the WT-cAMP-ATP complex may be related to a low rate of cAMP transport. The ΔGs for ceefourin-1 in all the complexes with MRP4 and its variants had the best affinity compared to those complexes of cAMP, supporting the idea that ceefourin-1 may act as a competitive inhibitor at least with cAMP. However, the presence of ATP in WT-MRP4 and its variants promotes a better affinity to cAMP than ceefourin-1 in the complexes with G187W and Y556C. A competitive assay in vitro is required to determine if ceefourin-1 had the best affinity to WT-MRP4 and its variants. The Figures 20 and 21 represent the LIDs for FA and cAMP, respectively. In both figures, the intermolecular interactions of ligands with the absence of ATP are presented in the upper panel while the intermolecular interactions of ligands with the presence of ATP are shown in the bottom panel. The FA binding site is almost the same with or without ATP, but the number and the type of intermolecular interactions is different. It seems that ATP promotes a greater number of H-bonds between FA and all WT-MRP4 and its variants. Regarding the cAMP binding site, the ATP binding did not modify it; thus, the effect of ATP seems to be related with changes in the intermolecular interactions and affinity that, according to Umbrella sampling results, the ATP decreases the ΔG in the G187W-cAMP-ATP and Y556C-cAMP-

Figure 20. LIDs for FA with the different MRP4 structures with or without ATP. (A) WT-MRP4-FA complex. (B) G187W-FA complex. (C) Y556C-FA complex. (D) WT-MRP4-FA-ATP complex. (E) G187W-FA-ATP complex. (F) Y556C-FA-ATP complex. **Figure 20.** LIDs for FA with the different MRP4 structures with or without ATP. (**A**) WT-MRP4-FA complex. (**B**) G187W-FA complex. (**C**) Y556C-FA complex. (**D**) WT-MRP4-FA-ATP complex. (**E**) G187W-FA-ATP complex. (**F**) Y556C-FA-ATP complex. Molecules 2021, 26, x FOR PEER REVIEW 18 of 22

Commented [M61]: Please replace with a sharper

image.

Figure 21. LIDs for cAMP with the different MRP4 structures with or without ATP. (A) WT-MRP4 cAMP complex. (B) G187W-cAMP complex. (C) Y556C-cAMP complex. (D) WT-MRP4-cAMP-ATP complex. (E) G187W-cAMP-ATP complex. (F) Y556C-cAMP-ATP complex. **Figure 21.** LIDs for cAMP with the different MRP4 structures with or without ATP. (**A**) WT-MRP4 cAMP complex. (**B**) G187W-cAMP complex. (**C**) Y556C-cAMP complex. (**D**) WT-MRP4-cAMP-ATP complex. (**E**) G187W-cAMP-ATP complex. (**F**) Y556C-cAMP-ATP complex.

Figure 22. LIDs for ceefourin-1 with the different MRP4 structures with or without ATP. (A) WT-MRP4-ceefourin-1 complex. (B) G187W-ceefourin-1 complex. (C) Y556C-ceefourin-1 complex. (D) WT-MRP4-ceefourin-1-ATP complex. (E) G187W-ceefourin-1-ATP complex. (F) Y556C-ceefourin-1-

Structure prediction by protein threading for MRP4 was performed using its primary sequence (code: O15439) from the UniProt database [31,32]. MRP4 mutant models were

3.1. Protein Threading for WT-MRP4 and Its Variants G187W and Y556C

ATP complex.

3. Materials and Methods

Molecules 2021, 26, x FOR PEER REVIEW 18 of 22

Figure 22. LIDs for ceefourin-1 with the different MRP4 structures with or without ATP. (A) WT-MRP4-ceefourin-1 complex. (B) G187W-ceefourin-1 complex. (C) Y556C-ceefourin-1 complex. (D) WT-MRP4-ceefourin-1-ATP complex. (E) G187W-ceefourin-1-ATP complex. (F) Y556C-ceefourin-1- ATP complex. **Figure 22.** LIDs for ceefourin-1 with the different MRP4 structures with or without ATP. (**A**) WT-MRP4-ceefourin-1 complex. (**B**) G187W-ceefourin-1 complex. (**C**) Y556C-ceefourin-1 complex. (**D**) WT-MRP4-ceefourin-1-ATP complex. (**E**) G187W-ceefourin-1-ATP complex. (**F**) Y556C-ceefourin-1- ATP complex.

3. Materials and Methods 3.1. Protein Threading for WT-MRP4 and Its Variants G187W and Y556C Structure prediction by protein threading for MRP4 was performed using its primary sequence (code: O15439) from the UniProt database [31,32]. MRP4 mutant models were The MRP4 variants may predispose the population to a given disease regarding the site of the MRP4 that was affected by the mutation and the change in the affinity of a given substrate. The clinical implications of MRP4 variants have been studied over the past years and it is crucial to describe the relation of the MRP4 variants with diseases [30].

#### **3. Materials and Methods**

#### *3.1. Protein Threading for WT-MRP4 and Its Variants G187W and Y556C*

Structure prediction by protein threading for MRP4 was performed using its primary sequence (code: O15439) from the UniProt database [31,32]. MRP4 mutant models were made by the substitution of amino acids, G187W, or Y556C into the WT-MRP4 primary sequence. Each primary sequence was uploaded into the I-TASSER [17,18] server for the calculations of the models. The crystallography of MRP1 (PDBD code: 5UJ9) from *Bos taurus* was used as a template in all cases [33,34].

#### *3.2. Coarse-Grained Molecular Dynamics (CG-MD) Simulations*

All the systems (WT, G187W, and Y556C) for the simulations were built in the Martini Maker module [35] from CHARMM-GUI [36,37] using the Martini2.2p force field and adding a phosphatidylcholine (POPC) lipid bilayer membrane in an isothermal-isobaric ensemble (NPT) at 310.15 K. The simulations of 1 µs CG-MD were carried out in the Gromacs 2018.7 program [38,39], after of the minimizations and equilibrium protocols suggested by CHARM-GUI server. The module "cluster" of Gromacs 2018.7 was used to find the relevant conformations of the simulation using the "gromos" algorithm and the backbone for alignment. The most representative structure of the largest conformation cluster of the three simulations was converted into all atoms in the Martini Maker/All-Atom converter from CHARMM-GUI for the following calculations.

All the CG-MD simulations were performed in the ADA cluster of the National Laboratory of Advanced Scientific Visualization at Campus Juriquilla of the National Autonomous University of Mexico (LAVIS-UNAM).

#### *3.3. Molecular Docking on WT-MRP4, G187W and Y556C*

The 3D structures of the selected ligand groups, substrate drugs, endogenous substrates, and inhibitors were obtained from the PubChem public database [40]. Molecular docking was performed with AutoDock 4.2.6 optimized for graphical-processing units using a total of 50 runs and 25,000,000 evaluations; a grid box of 22.5 × 22.5 × 22.5 Å<sup>3</sup> centering on the relevant amino acids reported by Ravna 2008 and 2009, El-Sheink 2008, and Chen 2018 [9,41–43]; in a Lamarckian genetic algorithm and Solis-Wets local search [44].

#### *3.4. All-Atom Molecular Dynamics (AA-MD) Simulations*

For molecular dynamic studies with FA as a ligand, the protein–ligand complexes WT-MRP4-FA, G187W-FA, or Y556C-FA, obtained from the molecular docking, were used, and AA-MD simulations were performed in Desmond 3.6 as an application of the Maestro software [45,46] as graphical interface. The AA-MDS systems were built using the System Setup module with an OPLS force-field (Optimized Potentials for Liquid Simulations), adding a POPC lipid membrane and simple point charge (SPC) water model in an NPT assembly at 310.15 K. Once the system was built, the standard relaxation protocol for system relaxation with increasing temperatures and decreasing restraints was used and an MD production simulation of 25 ns (100 frames) was performed in the Molecular Dynamics module. Clustering was performed in the Desmond Trajectory Clustering module to obtain the most representative conformation of the largest cluster (cluster1, C1). Trajectory analyses were performed in the Simulation Event Analysis module in Maestro.

#### *3.5. Umbrella Sampling*

Using the C1 of the 25 ns AA-MD trajectory, the system was built under the previously mentioned conditions. Once the system was obtained and relaxed using the AA-MD relaxation protocol, a 10 ns (100 frames) MD simulation was performed in the Metadynamics module of Desmond using the protein and ligand center-of-mass distance as the collective variable, with 0.3 kcal/mol height and 0.1 kcal/mol width as the Gaussian parameters for the umbrella protocol, on an NPT ensemble at 310.15 K and 1.01325 bar. Finally, the analysis was performed in the Metadynamics Analysis module of Desmond.

#### *3.6. Manipulation of the Complexes and Figures*

All the protein figures and alignments presented in this work were made in PyMOL software (The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger) [47]. The manipulation of the WT-MRP4, G187W, and Y556C structures was performed in Maestro (Schrödinger Release 2019-3: Maestro, Schrödinger, LLC, New York, NY, USA) [48].

#### **4. Conclusions**

To obtain the 3D structure of MRP4 and its variants, which are not resolved by NMR or X-ray, protein threading was performed in this study and relaxing the structures was performed by CG-MDS to carry out the molecular docking, while MDS and umbrella sampling studies were performed to yield relevant information regarding the residues involved in the binding of the studied molecules groups and changes in the ∆G of FA and cAMP in the presence or the absence of ATP, which also allowed us to observe the relevance of the mutations in the binding and movement of MRP4 and its variants. According to our results, the nsSNPs G187W and Y556C led to changes in the ligand binding site, DS, and binding energy (∆G). In addition, the ATP binding to MRP4 significantly modifies the intermolecular interactions (at least, for FA and cAMP) and the binding energy compared to the complexes where ATP was not bound to MRP4. The effect of the abnormal binding site of cAMP in Y556C is consistent with the highly selective MRP4 inhibitor ceefourin-1, which makes it interesting to study such mutations in vitro. The affinity of ceefourin-1 for WT-MRP4 and its variants is higher than the affinity of cAMP. Cofactors such as ATP and Mg2+ should be included in the in silico analyses related to MRP4. It is well known that nonsynonymous mutations usually affect the protein function or activity and its conformation, but this is the first report that suggests that most endogen substrates change their affinity and binding site in G187W and Y556C, which could modify the cell metabolism. We will report in further works the measure of substrate efflux and the relation between G187W or Y556C expression, location, and cell viability to determine the overall effect of the nsSNPs G187W and Y556C in vitro.

**Supplementary Materials:** The following are available online: Table S1: RMSD values obtained for the alignment of C1 cluster from AA-MDS for different sites of WT-MRP4, G187W and Y556C, Figure S2: Binding site and intermolecular interaction of cholic acid in WT-MRP4 (A), G187W (B) and Y556C (C). D Nomenclature of LIDs, Figure S3: Binding site and intermolecular interaction of taurocholic acid in WT-MRP4 (A), G187W (B) and Y556C (C), Figure S4: Binding site and intermolecular interactions of cefazoline in WT-MRP4 (A), G187W (B) and Y556C (C), Figure S5: Binding site and intermolecular interaction of ceefourin-1 in WT-MRP4 (A), G187W (B) and Y556C (C), Figure S6: WT-MRP4-FA complex at T0 in AA-MDS, Figure S7: WT-MRP4-FA complex at 5 ns in AA-MDS, Figure S8: WT-MRP4-FA complex at 10 ns in AA-MDS, Figure S9: WT-MRP4-FA complex at 15 ns in AA-MDS, Figure S10: WT-MRP4-FA complex at 20 ns in AA-MDS, Figure S11: WT-MRP4-FA complex at 25 ns in AA-MDS, Figure S12: G187W-FA complex at T0 in AA-MDS, Figure S13: G187W-FA complex at 5 ns in AA-MDS, Figure S14: G187W-FA complex at 10 ns in AA-MDS, Figure S15: G187W-FA complex at 15 ns in AA-MDS, Figure S16: G187W-FA complex at 20 ns in AA-MDS, Figure S17: G187W-FA complex at 25 ns in AA-MDS, Figure S18: Y556C-FA complex at T0 in AA-MDS, Figure S19: Y556C-FA complex at 5 ns in AA-MDS, Figure S20: Y556C-FA complex at 10 ns in AA-MDS, Figure S21: Y556C-FA complex at 15 ns in AA-MDS, Figure S22: Y556C-FA complex at 20 ns in AA-MDS, Figure S23: Y556C-FA complex at 25 ns in AA-MDS.

**Author Contributions:** Conceptualization: E.B., L.B., and G.G.-A.; calculations: E.B., G.A.-D., and A.R.-M.; data analysis: E.B., G.A.-D., and A.R.-M.; writing—original draft preparation: E.B., G.A.-D.; writing—review and editing: E.B., L.B., G.G.-A., G.A.-D., L.B., and A.R.-M.; project administration and funding acquisition: L.B. and G.G.-A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the University of Queretaro (UAQ) through the FOFI-UAQ research grant FCQ2018-39 and the National Council of Research and Technology (CONACYT) doctoral scholarship no. 335389 for Edgardo Becerra and no. 335496 for Giovanny Aguilera-Durán.

**Institutional Review Board Statement:** Not Applicable.

**Informed Consent Statement:** Not Applicable.

**Acknowledgments:** The authors gratefully acknowledge the technical support from Luis Aguilar, Alejandro de León, Carlos Flores, and Jair García of the National Laboratory of Advanced Scientific Visualization (LAVIS-UNAM).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Biocomputational Prediction Approach Targeting FimH by Natural SGLT2 Inhibitors: A Possible Way to Overcome the Uropathogenic Effect of SGLT2 Inhibitor Drugs**

**Mutaib M. Mashraqi 1,†, Navaneet Chaturvedi 2,3,† , Qamre Alam <sup>4</sup> , Saleh Alshamrani <sup>1</sup> , Mosa M. Bahnass 1,5 , Khurshid Ahmad <sup>6</sup> , Amany I. Alqosaibi <sup>7</sup> , Mashael M. Alnamshan <sup>7</sup> , Syed Sayeed Ahmad <sup>6</sup> , Mirza Masroor Ali Beg <sup>6</sup> , Abha Mishra <sup>2</sup> , Sibhghatulla Shaikh 6,\* and Syed Mohd Danish Rizvi 8,\***


**Abstract:** The Food and Drug Administration (FDA) approved a new class of anti-diabetic medication (a sodium–glucose co-transporter 2 (SGLT2) inhibitor) in 2013. However, SGLT2 inhibitor drugs are under evaluation due to their associative side effects, such as urinary tract and genital infection, urinary discomfort, diabetic ketosis, and kidney problems. Even clinicians have difficulty in recommending it to diabetic patients due to the increased probability of urinary tract infection. In our study, we selected natural SGLT2 inhibitors, namely acerogenin B, formononetin, (−)-kurarinone, (+)-pteryxin, and quinidine, to explore their potential against an emerging uropathogenic bacterial therapeutic target, i.e., FimH. FimH plays a critical role in the colonization of uropathogenic bacteria on the urinary tract surface. Thus, FimH antagonists show promising effects against uropathogenic bacterial strains via their targeting of FimH's adherence mechanism with less chance of resistance. The molecular docking results showed that, among natural SGLT2 inhibitors, formononetin, (+)-pteryxin, and quinidine have a strong interaction with FimH proteins, with binding energy (∆G) and inhibition constant (ki) values of −5.65 kcal/mol and 71.95 µM, −5.50 kcal/mol and 92.97 µM, and −5.70 kcal/mol and 66.40 µM, respectively. These interactions were better than those of the positive control heptyl α-Dmannopyranoside and far better than those of the SGLT2 inhibitor drug canagliflozin. Furthermore, a 50 ns molecular dynamics simulation was conducted to optimize the interaction, and the resulting complexes were found to be stable. Physicochemical property assessments predicted little toxicity and good drug-likeness properties for these three compounds. Therefore, formononetin, (+)-pteryxin, and quinidine can be proposed as promising SGLT2 inhibitors drugs, with add-on FimH inhibition potential that might reduce the probability of uropathogenic side effects.

**Citation:** Mashraqi, M.M.; Chaturvedi, N.; Alam, Q.; Alshamrani, S.; Bahnass, M.M.; Ahmad, K.; Alqosaibi, A.I.; Alnamshan, M.M.; Ahmad, S.S.; Beg, M.M.A.; et al. Biocomputational Prediction Approach Targeting FimH by Natural SGLT2 Inhibitors: A Possible Way to Overcome the Uropathogenic Effect of SGLT2 Inhibitor Drugs. *Molecules* **2021**, *26*, 582. https://doi.org/10.3390/ molecules26030582

Academic Editors: Marco Tutone and Anna Maria Almerico Received: 18 December 2020 Accepted: 19 January 2021 Published: 22 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Keywords:** sodium–glucose co-transporters 2; FimH; uropathogenic bacteria; urinary tract infections; diabetes

#### **1. Introduction**

Globally, diabetes mellitus is one of the most prevalent metabolic diseases and is estimated to increase to 552 million cases by 2030 [1]. Diabetes has been considered to enhance vulnerability to infectious diseases and has also been linked with an enhanced risk of death from infectious illness in some [2,3], but not all [4], investigations.

Sodium–glucose cotransporter-2 (SGLT2) inhibitors are a novel group of drugs used to treat patients with type 2 diabetes mellitus (T2DM). These drugs exert their effects by preventing glucose reabsorption at the proximal renal tubule and enhancing the excretion of urinary glucose [5]. Owing to the elevated urine glucose levels, SGLT2 inhibitors enhance the risk of urinary tract infections (UTIs) [6]. In addition, pharmacologicallyinduced urine glucose levels with SGLT2 inhibitors in diabetic patients might further sustain bacterial growth [7]. By themselves, SGLT2 inhibitors can possibly enhance the risk of UTIs and susceptibility to genital infections when used to manage patients [8]. SGLT2 inhibitors might cause serious UTIs, as the FDA warned in December 2015 [9]. Empagliflozin and canagliflozin are the preferred drugs suggested for T2DM patients with established cardiovascular disease [10,11]. The United States-based public safety advisory reported 19 cases of fatal renal or blood infection from March 2013 to October 2014. The origin of these cases was traced to a UTI induced by SGLT2 inhibitor intake [9].

Bacterial pili are proteinaceous projections extending from the bacterial cell surface and are used for attachment and cell motility [12]. Gram-negative bacteria use Type 1 pili to adhere to the host tissue and, therefore, Type 1 pili have been established as an important virulence factor in UTIs. Type 1 pili are made up of repeated subunits of FimA protein. These subunits are arranged to form a helical wound cylinder that composes a thick pilus rod. The distal flexible tip of the pilus rod is comprised of a single copy of proteins—FimF and FimG—while the tip adhesin bears FimH. The distinct binding of FimH (terminal adhesin) to mannosylated host glycoproteins mediates the adhesion of bacterial pathogens to the host tissue. UTI pathogenesis is caused by the FimH and, therefore, is a promising curative target [13].

Acerogenin B is a cyclic diarylheptanoid obtained from the bark of *Acer nikoense*, and it has been found to inhibit both SGLT1 and SGLT2 [14]. Kurarinone and formononetin are flavonoids isolated from the dried root of *Sophora flavescens.* Kurarinone has demonstrated inhibitory activity against both SGLT1 and SGLT2; however, formononetin was reported to inhibit only SGLT2 and not SGLT1 [15]. Quinidine is isolated from the bark of the cinchona tree and (+)-pteryxin is extracted from the plant *Peucedanum* spp. Recently, both of these natural compounds have been found to inhibit both SGLT1 and SGLT2 [16]. Heptyl α-D-mannopyranoside is a FimH antagonist [17] and, in this study, it was used as a reference compound.

In the present study, we explored natural SGLT2 inhibitors (acerogenin B, kurarinone, formononetin, quinidine, and (+)-pteryxin) that might have less severe uropathogenic side effects than does the approved SGLT2 inhibitor canagliflozin, also known as gliflozins. We speculated that formononetin, (+)-pteryxin, and quinidine would be promising SGLT2 inhibitors with less severe uropathogenic side effects.

#### **2. Methodology**

#### *2.1. SGLT2 Inhibitors and Target Protein Structure Retrieval*

The 3-dimensional structure of FimH was taken from the protein data bank (PDB ID: 4AV5), while the 3-dimensional structure of SGLT2 was made by employing the SWISS-MODEL Workspace after retrieving its amino acid sequence from Uniprot (P31639). The PDB structure 2XQ2 was used as a template and the generated model was validated

using various in silico tools, viz., SAVES v6.0 and VADAR (Volume, Area, Dihedral Angle Reporter). The PDB structure of the compounds canagliflozin (CID: 24812758), acerogenin B (CID: 10913542), formononetin (CID: 5280378), (−)-kurarinone (CID: 10812923), (+) pteryxin (CID: 511787), quinidine (CID: 441074), and the FimH antagonist heptyl α-Dmannopyranoside (CID: 11300413) were retrieved from the PubChem database.

#### *2.2. Physicochemical Properties and Toxicity Potential Prediction*

The physicochemical properties and toxicity potential of SGLT2 inhibitors and FimH antagonists were calculated by applying the Orisis Datawarrior property explorer tool (http://www.openmolecules.org/datawarrior/download.html). At first, molecular weight, the number of rotatable bonds, the number of hydrogen bond acceptors and donors, cLogP value, topological polar surface area (TPSA), and the Lipinski's rule violation [18] were estimated to check physicochemical parameters. Later, the method outlined by Zhao et al. [19] was applied to calculate percentage of absorption; here, the following formula was used: absorption % = 109 − (0.345 × TPSA). Toxicity was also predicted for SGLT2 inhibitors and FimH antagonists by using toxicity features such as irritability, reproductive effects, tumorigenicity, and mutagenicity. Orisis Datawarrior property explorer tool toxicity predictions are based on comparative analysis of our tested compounds with the preestimated set of known structural molecules.

#### *2.3. Molecular Docking*

The ligands were docked to the SGLT2 and FimH proteins using "AutoDock 4.2" by following the protocol described by Rizvi et al. [20]. To minimize the energy usage of the ligand molecules, a Merck molecular force field (MMFF94) was employed. The ligand atoms were added with Gasteiger partial charges. Docking calculations were done on the target proteins. Essential hydrogen atoms, Kollman united atom type charges and solvation parameters were added by using AutoDock tools. Consequently, the binding pocket was added with conserved water molecules to mimic the *in vivo* environment. An auto grid program was used to generate the affinity (grid) maps sized at 60 × 60 × 60◦ , the aim of which was to target the grid coordinates in the catalytic site of the target protein (SGLT2 and FimH). The x, y, and z coordinate values for the FimH protein targeting the catalytic site were taken as 3.305, −16.68, and −13.57, respectively. Docking simulations were done using the Lamarckian genetic algorithm and the Solis and Wets local search method. The initial position, orientation, and torsions of the ligands were set randomly. Each docking experiment was derived from 100 different runs that were set to terminate after a maximum of 2,500,000 energy evaluations. The population size was set to 150. The Discovery Studio Visualizer 2.5 (Accelrys, San Diego, CA, USA) was used to generate the final figures.

#### *2.4. LIGPLOT+ ANALYSIS*

After the completion of all docking experiments, the best combination of ligand–FimH and ligand–SGLT2 was selected and analyzed using LIGPLOT+ Version v.2.1 (EMBL-EBI, Cambridgeshire, UK). The hydrogen and hydrophobic interactions between the important amino acid residues of FimH/SGLT2 with each ligand were analyzed by LIGPLOT. For each interaction, the 3-D structures generated were converted into 2-D figures by applying the LIGPLOT algorithm.

#### *2.5. Molecular Dynamics Simulation*

System building: GROMACS 4.6.7 packages [21,22] were used to prepare the system and perform molecular dynamics (MD) simulations using the gromos53a6 force field [23]. The protein solute was solvated by explicit SPC216 water [24] in a dodecahedral box with a margin of 10 Å between the solute and the box walls. Systems were brought to neutrality by the addition of sodium counter ions.

Simulation detail: A 10 Å cut-off distance was taken under the particle-mesh Ewald method [25] to calculate long-range electrostatic interactions, and a 10 Å cut-off distance was also considered to evaluate van der Waals interactions. The LINCS algorithm of fourth-

order expansion was used to constrain bond lengths [26]. The steepest descent algorithm was applied to optimize the removal of steric clashes between atoms for 10,000 steps. The system was equilibrated for 1 ns with position restraints on all heavy atoms. Berendsen weak coupling schemes were used to maintain the system at 300 K and 1 atom, using separate baths for the system. Initial velocities were generated randomly using a Maxwell– Boltzmann distribution corresponding to 300 K. Finally, the production run was performed for 20 ns. Furthermore, xmgrace (http://plasma-gate.weizmann.ac.il) was used for preparing graphs. Ligand topology preparation was implemented by using the PRODRG server, using the option specifying no chirality, full charge, and no energy minimization [27].

Trajectory analysis: The g\_rms tool of the GROMACS package was used to calculate the root-mean-square deviation (RMSD) of each trajectory. The covariance matrix and eigenvectors of the trajectories were calculated through g\_covar and g\_anaeig programs.

#### **3. Results and Discussion**

#### *3.1. Physicochemical Properties and Molecular Docking*

Many varieties of plant-based natural compounds have been reported, which have significant anti-diabetic effects. In streptozotocin-stimulated diabetic mice, bakuchiol (a polyphenol compound) decreases glucose levels and enhances serum insulin levels [28]. Caffeine (an alkaloid) lowers glucose, induces insulin secretion in vitro, and improves glucose absorption in skeletal muscle [29]. Vanillic acid isolated from *Fagara tessmannii Engl.* (Family: Rutaceae) exhibits α-glucosidase inhibitory actions in vitro [30]. Christinin-A is a triterpenoidal saponin glycoside that exhibits an anti-hyperglycemic effect in both type 1 and type 2 diabetic rats [31].

Molecular docking has an important role in drug discovery, assisting in digging out the active or lead compounds from a library of natural compounds [32]. It is one of the most widely used virtual screening tools, particularly when the three-dimensional structure of the target protein is available. Docking enables the prediction of both ligand–target binding affinity and the structure of the protein–ligand complex, which are useful for optimizing the lead [33,34].

Prior to the molecular docking study, we checked the physicochemical properties and toxicity potential of SGLT2 inhibitors and FimH antagonists (Tables 1 and 2). The selected SGLT2 inhibitors were known drug canagliflozin and natural SGLT2 inhibitors, namely, acerogenin B, formononetin, (−)-kurarinone, (+)-pteryxin, and quinidine, while heptyl α-Dmannopyranoside was selected as an FimH antagonist. During physicochemical property assessment, we found that out of all the compounds tested, (−)-kurarinone showed one violation of the Lipinski rule [18], i.e., a cLogP value (Logarithm of compound partition coefficient between n-octanol) higher than 5 (Table 1). On the other hand, all the tested compounds showed no toxicity except (+)-pteryxin. (+)-pteryxin was predicted to have a high irritant effect with no mutagenic, tumorigenic, or reproductive toxicity (Table 2).

**Table 1.** Physicochemical properties of natural sodium–glucose co-transporter 2 (SGLT2) inhibitors and control compounds.


\* Control drugs/compounds; \*\* Percentage of Absorption (% of Absorption) was calculated by the following equation: % of Absorption = 109 − (0.345 × Topological Polar Surface Area); \*\*\* Logarithm of the compound partition coefficient between n-octanol and water.


**Table 2.** Toxicity potential of natural SGLT2 inhibitors and control compounds.

\* Control drugs/compounds.

The predicted model of SGLT2 revealed that 81.25 percent of the residues had an average 3D-1D score of ≥0.2, resulting in a "pass" in SAVES v6.0.0 by the VERIFY3D tool. In addition, the Ramachandran plot (showing 93% of the residues in the allowed region), fractional accessible surface area, stereo/packing quality index, fractional residue volume, and 3-D profile quality index (produced by the VADAR 1.8 server) showed that the predicted 3-D model was within an appropriate range (Figure 1).

**Figure 1.** *Cont*.

**Figure 1.** *Cont*.

**Figure 1.** Validation of the predicted 3-D structure of SGLT2.

The inhibition of SGLT2 has been considered a novel pharmacotherapy for T2DM treatment [35]. Accordingly, molecular docking studies have revealed that the natural SGLT2 inhibitors formononetin, (+)-pteryxin, and quinidine were efficiently bound with SGLT2. Formononetin was found to interact with the F98, E99, A102, L149, K152, T153, V286, S287, Y290, W291, I456, Q457, and S460 binding pocket residues of SGLT2 (Figure 2a); while the S74, G79, H80, K154, I155, V157, D158, S161, S393, I397, and I456 residues of SGLT2 were observed to bind with (+)-pteryxin (Figure 2b). Furthermore, quinidine was found to interact with the S74, N75, H80, L84, F98, E99, V286, S287, Y290, W291, F453, I456, and Q457 residues of SGLT2 (Figure 2c). Consistent with this, the amino acid residues H80, F98, T153, K154, V157, D158, V290, I397, F453, I456, Q457, and S460 were shown to be important for the inhibition of SGLT2 [36]. Amino acid residues H80, V286, Y290, W291, and I456 are the main hydrophobic residues of SGLT2, interacting with the dock inhibitors formononetin, (+)-pteryxin and quinidine (Figure 3a–c). This concurs with a previous report wherein the amino acid residues H80, Y290, and I456 of SGLT2 have also been reported have a hydrophobic interaction with the inhibitor [36,37]. The binding energy (BE) for the catalytic domain interactions of formononetin–SGLT2, (+)-pteryxin–SGLT2, and quinidine–SGLT2 were found to be −7.63 kcal/mol, −9.01 kcal/mol, and −8.77 kcal/mol,

respectively, while their inhibition constants (Ki) were 2.57 µM, 0.245 µM, and 0.371 µM, respectively (Table 3).

**Figure 2.** The amino acid residue of SGLT2 interacting with formononetin (**a**), (+)-pteryxin (**b**), quinidine (**c**), and canagliflozin (**d**). The ligands (formononetin, (+)-pteryxin, quinidine, and canagliflozin) are represented as green stick forms and hydrogen bonds are indicated as green dashed lines.


**Table 3.** The docking results of the molecular interactions of SGLT2 and FimH with natural SGLT2 inhibitors and control compounds.

\* Control compound for SGLT2; \*\* Control compound for FimH.

Heptyl α-D-mannopyranoside \*\* - - −4.46 109.49

(+)-pteryxin −9.01 0.248 −5.50 92.97 Quinidine −8.77 0.371 −5.70 66.40

> The FDA approved canagliflozin, an SGLT2 inhibitor, for use in T2DM treatment in 2013 [38]. In the present study, canagliflozin was used as a positive control against SGLT2. Canagliflozin was observed to bound with G79, H80, Y150, K154, D158, W289, Y290, D294, S393, I397, S460, and I456 residues of SGLT2 (Figures 2d and 3d). Interestingly, these amino acids of SGLT2 have also been found to interact with SGLT2 inhibitors (formononetin,

(+)-pteryxin, and quinidine). I456 was one of the most reactive common hydrophobic residues of SGLT2, interacting with formononetin, (+)-pteryxin, and quinidine, as well as canagliflozin (Figure 3a–d). The Y290 residue of SGLT2 was found to make H-bonds with quinidine, while the same residue was observed to hydrophobically interact with formononetin and canagliflozin (Figure 3a,c,d). In addition, G79, H80, and I397 were the common hydrophobic interacting residues of SGLT2 with (+)-pteryxin and canagliflozin (Figure 3b,d).

**Figure 3.** Ligplot analysis of formononetin (**a**), (+)-pteryxin (**b**), quinidine (**c**), and canagliflozin (**d**) in terms of their interaction with SGLT2. The amino acid residues forming hydrophobic interactions are shown as red arcs while the hydrogen bonds are shown as green dashed lines with indicated bond lengths.

The function of human SGLT1 protein is dramatically affected by amino acid at position 457. It has been shown that residue 457 (i.e., Q457) in human SGLT1 directly interacts with sugar for its reabsorption [39,40]. The amino acid sequences of SGLT1 and SGLT2 revealed that both of these proteins have glutamine at the residue 457 position, and glucose–galactose malabsorption occurs due to mutation in the glutamine residue (Q457) [41]. Consistent with this, in the present study, formononetin and quinidine were observed to interact with the Q457 residue that is suggested to impair SGLT2 function.

Since the discovery of the first natural SGLT2 inhibitor (i.e., phlorizin), several other synthetic glucoside analogs have been developed [42]. Tofogliflozin is a selective SGLT2 inhibitor that enhances urinary glucose excretion in a dose-dependent manner [43]. Luseogliflozin is an orally active second-generation SGLT2 inhibitor that has protective effects on pancreatic beta-cell mass and function [44]. Furthermore, efforts have been made to find new active natural ingredients from *S. Flavescents* [15], alkaloids from *A. macrophylla* [45], and *Schisandrae Chinensis Fructus* for the development of SGLT2 inhibition [46]. A 4-O-methyl derivative of sergliflozin-A has been reported to exhibit SGLT2 inhibition activity [47].

SGLT2 inhibitors are some of the most promising anti-diabetic agents introduced into clinical practice over the last decade. The therapeutic benefits of SGLT2 inhibitors include weight loss, a reduction in blood pressure, and an enhancement in high-density lipoprotein level [48,49]. However, the SGLT2 inhibitors dapagliflozin, canagliflozin, and empagliflozin have been found to cause UTIs and genital infections in diabetic patients [50–52]. Additionally, the FDA has issued warnings about the possible UTI-inducing side effects of SGLT2 inhibitors in December 2015 [9].

UTIs pose a severe medical issue worldwide, with more than 85% of UTIs caused by uropathogenic *Escherichia coli* [53]. FimH is a bacterial adhesin that facilitates the colonization of uropathogenic *E. coli* on the cell surface of the human and murine bladder and leads to the formation of biofilm [54]. Therefore, this adhesin has been considered a virulence factor and a promising therapeutic target for the treatment of UTIs [55]. Interestingly, the docking results indicate that the SGLT2 inhibitors formononetin, (+)-pteryxin, and quinidine show strong binding with FimH. FimH was found to interact with formononetin through 11 amino acid residues, namely F1, N46, D47, Y48, I52, D54, Q133, N135, Y137, N138, and D140 (Figure 4a), while (+)-pteryxin was found to interact with 6 amino acid residues, namely Y48, T51, I52, Y137, N138, and D140 (Figure 4b). Similarly, 10 amino acid residues, namely the F1, I13, Y48, I52, D54, Q133, N135, Y137, N138, and D140 residues of FimH, were found to interact with quinidine (Figure 4c). The BE for formononetin–FimH, (+)-pteryxin–FimH, and quinidine–FimH interactions were found to be −5.65 kcal/mol, −5.50 kcal/mol, and −5.70 kcal/mol, respectively. The corresponding estimated inhibition constants (Ki) for the above-mentioned complexes were determined to be 71.95 µM, 92.97 µM, and 66.40 µM, respectively (Table 3).

**Figure 4.** The amino acid residue of FimH's interaction with formononetin (**a**), (+)-pteryxin (**b**), quinidine (**c**), and heptyl α-D-mannopyranoside (**d**). The ligands (formononetin, (+)-pteryxin, quinidine, and heptyl α-D-mannopyranoside) are represented as green stick forms and hydrogen bonds are represented as green dashed lines.

The FimH protein has two domains, the C-terminal pilin domain and the N-terminal lectin domain. The FimH lectin domain has a mannose-binding pocket (N46, D47, D54, Q133, N135, N138, and D140), in which sugar contributes to the formation of various hydrogen bonds with FimH. Hydrophobic regions are present in the surrounding region of the mannose-binding pocket and consist of hydrophobic support (F1, I13, and F142), the tyrosine gate (Y48, I52, and Y137), and the residue T51 [56]. Consistent with this, in the present study, the D47, D54, and N138 of the FimH protein were involved in hydrogen binding with formononetin (Figure 5a), while Y48, I52, Y137, and D140 were the common hydrophobic amino acid residues interacting with formononetin, (+)-pteryxin and quinidine (Figure 5a–c).

**Figure 5.** Ligplot analysis of the interactions of formononetin (**a**), (+)-pteryxin (**b**), quinidine (**c**), and heptyl α-Dmannopyranoside (**d**) with SGLT2. The amino acid residues forming hydrophobic interactions are shown as red arcs while the hydrogen bonds are shown as green dashed lines with indicated bond lengths.

In the present study, heptyl α-D-mannopyranoside (a FimH antagonist) was used as a positive control against the FimH protein. Molecular docking analysis revealed that amino acid residues, namely the F1, I13, Y48, T51, Q133, N135, N138, D140, and I52 residues of the FimH protein, have a vital role in binding with heptyl α-D-mannopyranoside (Figures 4d and 5d). The main FimH binding pocket amino acid residues are F1, N46, D47, D54, E133, N135, D140, and F142, and mutation in these individual main residues leads to them affecting FimH function and reducing its virulence [57,58]. Interestingly, we found that these amino acid residues of FimH were also determined to interact with natural SGLT2 inhibitors (formononetin, (+)-pteryxin, and quinidine).

O- and C-linked alpha-D-mannosides with hydrophobic and aryl substituents are effective FimH antagonists. The substitution of biphenyl derivatives may provide additional advantages for the FimH antagonist molecule. The addition of 1,10-biphenyl pharmacophore and various aglycone atoms enhanced the alpha-D-mannose derivatives' suitability as FimH inhibitors [59,60]. Further, glycomimetics based on mannose scaffolding

has also been synthesized and widely studied for their aptitude as FimH antagonists [61,62]. Besides synthetic compounds, natural substances, like cranberry and its derivatives (such as myricetin, proanthocyanidin (PAC)-standardized cranberry extract, and polyphenol metabolites extracted from PAC), have anti-adhesive effects on uropathogenic *E. coli* [63].

In docking experiments, it is usually crucial to look for a ligand that can bind efficiently with the receptor, using Gibbs free energy as a parameter of better binding [64,65]. The strength of an interaction between a ligand and a receptor is measured in terms of the free energy of binding. The lowest BE is the output of the efficient binding of the drug/ligand to the active site of the receptor [66]. A higher (negative) BE is a sign of efficient interaction between the ligand and the receptor [67]. Accordingly, in the present study, formononetin, (+)-pteryxin, and quinidine exhibited strong interaction with the FimH protein, with a high BE relative to that of the positive control heptyl α-D-mannopyranoside, suggesting that these compounds could be promising SGLT2 inhibitors with less severe uropathogenic side effects.

Diabetic ketoacidosis and its associated events have been reported at a low frequency in T2DM patients treated with canagliflozin [68]. Interestingly, in the present study, the BE of canagliflozin with FimH relative to the other SGLT2 inhibitor indicates that canagliflozin has less potency in inhibiting the FimH protein, thereby having a high susceptibility to causing diabetic ketosis and UTIs.

#### *3.2. Root Mean Square Deviation*

Root mean square deviation (RMSD) is the most significant dynamic to explore in terms of conformational changes by means of stability in the structure and dynamic behavior of the receptor [69]. The complexes of formononetin, quinidine, and (+)-pteryxin with FimH obtained in molecular docking that interacted the best were further probed by MD simulation. The binding of ligands to their receptor protein can result in large conformational deviations in the resulting complex, specifically at the binding site. In this study, values of RMSD were increased at the beginning, with respect to the native structure of the RMSD. Figure 6A reports the RMSD of the backbone atoms of FimH protein as they interacted with each ligand molecule, where formononetin–FimH, quinidine–FimH and (+)-pteryxin–FimH interactions are represented with a black, red, and green color code, respectively. RMSD changes in all systems were initiated at the same point (~0.125 nm); however, in the formononetin–FimH complex, the conformational changes took place throughout the stabilization process. The change in the RMSD values in the formononetin– FimH complex indicated a small fluctuation until ~30 ns, and, afterwards, RMSD was found to be stable, remaining at around 0.35 nm in value. Due to the initial ups and downs in the RMSD values of the formononetin–FimH complex, the topologies of the structures were observed to significantly change during simulation. The observed initial RMSD change predicted the large conformational changes in regions near the binding pocket. Although the simulation lasted for 50 ns, the discrepancy from the initial structure within the first 30 ns was sufficient to point out the protein structures that were substantially denatured at binding sites and adjacent areas. While the RMSD values of the quinidine– FimH complex showed little turbulence until ~20 ns, the values reached a plateau later on. The value observed was greater than 0.3 nm initially; however, after 35 ns it went steady at 0.37 nm. In general, overall RMSD values fluctuated within 0.5 nm ± 0.05 nm throughout the simulation.

Like the above complexes, RMSD values showed frequent conformational changes and binding residue incorporation during the simulation analysis of the pteryxin–FimH complex. Conformational changes were observed at a functional domain near the binding site of the complex, and a considerable movement of the pteryxin ligand was also predicted at that site. Moreover, the RMSDs of all three ligand candidates were calculated to evaluate the behavior of the ligands within the binding site of the protein. Figure 6B shows the plot representation of RMSD for each ligand. Ligands (formononetin and quinidine) have generally shown high stability within the close contact residues of protein, whereas, in the

case of pteryxin, this ligand demonstrated several fluctuations within the binding site. Due to the movement of the pteryxin ligand, an initial sudden elevation (~0.13 nm) followed by fluctuations until 0.15 ns was observed during the simulation. Although the RMSDs maintained a plateau later on, an average value of 0.21 nm was calculated throughout the simulation. Moreover, the difference in RMSDs due to the movement of pteryxin from the initial to the final frame at the binding site was observed as 0.15 nm, which revealed that pteryxin shows considerably weaker binding with interacting amino acid residues during the simulation. On the other hand, it was observed that formononetin and quinidine stably interacted during simulation, and the pattern of contact between the binding residues and both ligands (formononetin and quinidine) predicted regulation of the activity of the FimH protein.

**Figure 6.** Representation of an RMSD plot of each complex in terms of their relationship between RMSD values and simulation time, shown in three different color codes. Panel (**A**) shows an RMSD plot of the backbone atoms of FimH protein receptor interacting with three ligand candidates, while panel (**B**) represents the RMSD plots of ligand candidates during simulation, which are color coded black (Formononetin), red (Quinidine), and green (Pteryxin). FimH protein complexes with Formononetin and Quinidine possessed high stability during simulation, while the third complex with Pteryxin demonstrated comparatively less stability.

Consequently, the RMSDs of the formononetin–FimH and quinidine–FimH complexes were found to have comparatively more stable trajectories than the (+)–pteryxin–FimH complex during simulation analysis (Figure 6).

#### *3.3. Principle Component Analysis*

To identify the overall patterns of motion in the complexes during simulation, we utilized principal components analysis (PCA) for the three complexes. The covariance matrices of the simulations were calculated to produce the eigenvectors, and then a screening of the trajectories was performed around each of the diverse eigenvectors. Furthermore, the dominant motions during a four-vector RMSD for each complex were observed (Figure 7). A large portion of the overall variations in the receptor protein can be explained by a few reduced amplitude eigenvectors with large eigenvalues. The analysis illustrated that the fourth eigenvector accounts for the depletion of the RMSD value, although the rest of the three vectors showed elevation at the initial time of the simulation. It was shown that after ~8 ns the RMSD values reached a plateau and maintained their value until the end of the simulation (Figure 7).

To know the movement of the backbone atoms of the receptor protein, 100 frames of the first principal component were collectively loaded into VMD, which show the motility of the residues (Figure 8), although there might be differences between each complex with respect to the motions sampled. The width of the band is proportional to amplitude; a narrow band signifies those segments that hardly moved, while wide bands signify the sections most affected by the transitions. The middle domain of the protein showed less movement than the adjacent domains that predicted the stability of the middle domain.

The rest of the protein exhibited comparatively wider bands, which might be due to an increase in functionality.

**Figure 7.** Each panel (**A**–**C**) reports the calculation of projection of particular trajectory on eigenvectors for each three complexes. The RMSDs for every four eigenvectors are depicted as four subplots in each panel. Plot shows the projections of each vector are fitted to the structure in the eigenvector.

**Figure 8.** The motion of the backbone atoms of the protein receptor during simulation of the trajectory as calculated by principal component analysis. The width of the bands is proportional to amplitude.

It is worth mentioning that the binding energy values and MD simulations obtained can only demonstrate the binding efficacy and stability of inhibitors with the target protein. Further benchwork experiments are required to optimize these compounds (formononetin, (+)-pteryxin, and quinidine) as promising SGLT2 inhibitors with add-on FimH inhibition potential that might reduce the probability of uropathogenic side effects.

#### **4. Conclusions**

SGLT2 inhibitors are a newer class of drugs that have enormous anti-diabetic potential. Unfortunately, the side effects, specifically those that contribute to the development of UTIs, have impeded their success rate. Several clinicians, as well as diabetic patients, are still in a dilemma over the use of this class of anti-diabetic medication. In contrast, FimH is blooming as a potential alternative target for UTI treatment, and FimH inhibitors are currently under development. These inhibitors act against the uropathogenic bacterial strains adherence to the mucosal surface of the urinary tract, resulting in a lower chance of encountering resistance. In the present study, we docked natural SGLT2 inhibitors with FimH target proteins to predict their FimH interaction potential. To the best of our knowledge, this is the first time that SGLT2 inhibitors have been explored in terms of their use against FimH. Our findings suggest that among all SGLT2 inhibitors examined, formononetin,

(+)-pteryxin, and quinidine exhibited strong interactions with the FimH protein with high binding energy, in comparison to the positive control heptyl α-D-mannopyranoside. On the other hand, the FDA-approved SGLT2 inhibitor canagliflozin indicated lower interaction with FimH. Thus, we hypothesize that if explored further, natural SGLT2 inhibitors might reduce the probability of UTIs when used against FimH. The authors anticipate researchers to duly use the findings of this study to design more versatile and potent dual inhibitors against SGLT2 and FimH to cope with the uropathogenic side effects of SGLT2 inhibitor class anti-diabetic medications.

**Author Contributions:** Conceptualization, S.S. and S.M.D.R.; methodology, S.S., N.C., and S.M.D.R.; writing—Original draft preparation, S.S. and S.M.D.R.; writing—Review and editing, M.M.M., Q.A., S.A., M.M.B., K.A., S.S.A., M.M.A.B., A.M., A.I.A., and M.M.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Deanship of Scientific Research, Najran University, Najran, Saudi Arabia grant number NU/MID/17/091.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Doxorubicin Encapsulation in Carbon Nanotubes Having Haeckelite or Stone–Wales Defects as Drug Carriers: A Molecular Dynamics Approach**

**Leonor Contreras 1,\* , Ignacio Villarroel <sup>2</sup> , Camila Torres <sup>2</sup> and Roberto Rozas <sup>1</sup>**


**Abstract:** Doxorubicin (DOX), a recognized anticancer drug, forms stable associations with carbon nanotubes (CNTs). CNTs when properly functionalized have the ability to anchor directly in cancerous tumors where the release of the drug occurs thanks to the tumor slightly acidic pH. Herein, we study the armchair and zigzag CNTs with Stone–Wales (SW) defects to rank their ability to encapsulate DOX by determining the DOX-CNT binding free energies using the MM/PBSA and MM/GBSA methods implemented in AMBER16. We investigate also the chiral CNTs with haeckelite defects. Each haeckelite defect consists of a pair of square and octagonal rings. The armchair and zigzag CNT with SW defects and chiral nanotubes with haeckelite defects predict DOX-CNT interactions that depend on the length of the nanotube, the number of present defects and nitrogen doping. Chiral nanotubes having two haeckelite defects reveal a clear dependence on the nitrogen content with DOX-CNT interaction forces decreasing in the order 0N > 4N > 8N. These results contribute to a further understanding of drug-nanotube interactions and to the design of new drug delivery systems based on CNTs.

**Keywords:** carbon nanotubes; Stone–Wales defects; haeckelite defects; doxorubicin encapsulation; drug delivery system; binding free energies; noncovalent interactions; molecular dynamics

#### **1. Introduction**

Doxorubicin (DOX), an antineoplasic drug, approved for medical use by the FDA [1,2], has been used for more than 40 years to combat various types of cancers despite the cardiological risks associated with its use. Researchers at the Mayo Clinic [3] have reviewed its mechanism of action and, mainly thanks to the work of Denard et al. and Zhang et al. [4,5], have proposed two alternatives that explain its function. In one of them, DOX would stabilize a complex formed by double-stranded DNA and topoisomerase, which later it would cut both strands of DNA. The alternative is the production of a larger quantity of ceramides which would produce the translocation of a CREB3L-1 protein from the endoplasmic reticulum to the Golgii apparatus. There, some proteases would break the CREB3L-1 protein in such a way that its N-terminal fragment would be translocated to the nucleus where it would direct the DNA transduction, to finally express p21 proteins, which would be those that inhibit tumor growth. Other mechanisms of action of DOX reviewed by Ferreira et al. consider the intercalation of DOX in nuclear DNA and mitochondrial DNA, inhibition of topoisomerase-IIß, and epigenetic factors that involve methylation and deacetylation reactions [6].

In order to increase drug bioavailability and avoid its adverse effects, several types of drug carriers have been used, among which carbon nanotubes (CNTs) have shown

**Citation:** Contreras, L.; Villarroel, I.; Torres, C.; Rozas, R. Doxorubicin Encapsulation in Carbon Nanotubes Having Haeckelite or Stone–Wales Defects as Drug Carriers: A Molecular Dynamics Approach. *Molecules* **2021**, *26*, 1586. https:// doi.org/10.3390/molecules26061586

Academic Editors: Marco Tutone and Anna Maria Almerico

Received: 2 March 2021 Accepted: 11 March 2021 Published: 13 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

to form stable associations with DOX [7]. However, CNTs also exhibit some toxicity problems [8–11]. Fortunately, CNTs can be functionalized with fragments to increase their water solubility, which prevents them from being deposited as agglomerates in the body. Under certain conditions of concentration and purity CNTs are non-toxic [12]. Additionally, the functionalization of the CNTs facilitates the anchoring of the nanotubes right in the tumor to be attacked. The physicochemical and conductive properties of CNTs give them a versatility of applications in various fields such as electronics, photonics, catalysts, drug carriers, biotechnology, bone tissue engineering and others [13–15].

In the current work we are interested in the ability of CNTs to adsorb drugs and transport them to the target site. It is important to determine the structural parameters that facilitate the DOX-CNT association and to allow the development of strong DOX-CNT intermolecular interaction forces, which help to inhibit the drug from being released before reaching its target. Once there, the acidic pH of the tumor environment causes the release of the drug. Indeed, several studies show experimentally that DOX release is favored at pH of 5 or less [16–19], which is also demonstrated at a theoretical level [20].

Another technologically important characteristic of CNTs is their chirality which significantly modify their conductive properties. For example, armchair (n, n) nanotubes are conductive while zigzag (n, 0) and chiral (n, m) nanotubes are semiconductors except when the value of the difference (n − m) is a multiple of 3, since nanotubes become conductive [21–24]. The diameter of the nanotube has also been shown to be an important structural point since, depending on the diameter, the degree of curvature of the nanotube can be controlled, which seems to be a decisive factor in the stability of the DOX-CNT associations.

As can be deduced, the chirality and the diameter of the nanotubes are properties that determine their behavior and also the presence of structural defects which can change the chirality [25]. The carbon rings of the nanotube that differ from the hexagons are called defects, and depending on their ordering and distribution in the nanotube they confer different properties. For example, the five- and seven-membered rings, when distributed around the perimeter of the nanotube, constitute defects called bumpy. If they are distributed axially in the nanotube they are called zipper defects [26]. Other fourmembered rings, along with eight-membered rings, are called haeckelite defects [22,27]. These defects are formed by the addition of two carbon atoms or ad-dimers. However, there are other defects that are formed by rearrangement of their bonds, such as the Stone–Wales defects formed by a pair of rings of five and eight members [28].

The presence of some of these defects has significant technological importance. For example, zigzag nanotubes in the presence of ad-dimers can induce plastic transformations in a material that would otherwise be brittle [29]. Chiral nanotubes stand out, which in the presence of bumpy defects considerably increase their conductivity over armchair and zigzag nanotubes according to DFT studies considering dispersion-corrected B3LYP-D3 functional [30]. Zigzag nanotubes that contain bumpy defects show greater conductive ability and capacity to be reduced [30]; nitrogen doping increases the conductive ability of armchair nanotubes [30]. In addition, armchair nanotubes with bumpy defects notably increase their ability to adsorb hydrogen with very convenient hydrogen adsorption energy values, for their use in the management of clean energy [30].

CNTs form stable associations with doxorubicin [7]. Various theoretical and molecular dynamics studies predict a high capacity of CNTs as carriers [20,31–36] and several molecular dynamics studies of functionalized DOX-CNT systems with various organic groups have contributed to the study of DOX loading and release [37–43]. Although there are several works on the adsorption of DOX in CNT, there are no studies that report on the optimal structure that a nanotube should have to behave as a DOX nanocarrier. The situation is complicated because there are also no experimental data available on the formation and characterization of DOX-CNT complexes and the determination of their DOX-CNT binding energies. For non-functionalized nanotubes, Wang and Xu [20], systematically studied the adsorption and encapsulation of DOX in armchair nanotubes of different diameters,

using the theoretical methods PM6-DH2 and M06-2X in the ONIOM scheme and found that the diameter of the nanotube at which the best DOX encapsulation occurred was 14 Å and corresponded to (10,10) armchair nanotubes. This same behavior was confirmed through a study of molecular dynamics for armchair, zigzag and chiral nanotubes, finding that the strongest DOX-CNT interactions were produced for 14 Å in diameter nanotubes, regardless of chirality [32]. A different situation occurs in the presence of defects in the nanotube. In the case of bumpy defects, a dependence on chirality is observed, since armchair nanotubes with bumpy defects present weaker DOX-CNT interactions than armchair nanotubes without defects. In contrast, bumpy defects in chiral nanotubes favor the DOX-CNT interaction [32].

Several methods of synthesis of CNTs have been reviewed [44]. However, there is a lack of comparative systematic experimental antecedents on this issue, which makes it possible to pose as a valid hypothesis for a theoretical study that the presence of defects in the nanotube, the type and number of defects and their position, also modify the DOX-CNT association properties, along with the chirality and size of the nanotube.

The previous antecedents also lead us to investigate if there is a general trend for some type of nanotube, for example, with chirality or type of defect that accounts for the degree of DOX-CNT association. Our research questions include: (i) how does structural or nitrogen doping defects affect the ability of CNTs as drug carriers, in this case, DOX. (ii) Is the effect produced by the defects the same, regardless of the chirality and the size of the nanotube? (iii) Is there any type of defect that has better characteristics than the others? (iv) How does it affect the number of defects present?

In this work DOX-CNT binding energies are determined for chiral nanotubes with haeckelite defects (with rings of 4 and 8 carbon atoms) and for armchair and zigzag nanotubes with Stone–Wales defects (SW), by means of the MM/PBSA and MM/GBSA methods implemented in the AMBER program of molecular dynamics.

#### **2. Results**

Below are the results obtained by molecular dynamics (MD) simulation for DOX encapsulation systems in chiral CNTs with haeckelite (Hk) defects and also in armchair and zigzag CNTs with Stone–Wales (SW) defects. The Hk defects consist of a pair of rings of 4 and 8 members each, while the SW defects are made up of a pair of rings of 5 and 7 members each, as shown in Figure 1.

**Figure 1.** Representation of carbon nanotube structural defects and their C–C bond distances in Å. (**a**) haeckelite defect; (**b**) Stone–Wales defect.

#### *2.1. Chiral Nanotubes with Hk Defects*

Chiral nanotubes Ch(13,08) with one Hk defect (named Hk1) and two Hk defects (named Hk2) having 0N, 4N and 8N were studied considering different initial positions of the DOX: in the region of the defect (D), in the regular region of the nanotube (R) (there

Å

are no defects in that area) with the DOX NH<sup>2</sup> group pointing towards the center of the nanotube (v1 orientation) or to the inverse direction (v2 orientation) as shown in Figure 2 for Hk2 chiral nanotubes. Other additional DOX orientations refer to Hk1 nanotubes: when the DOX NH<sup>2</sup> group is oriented in a direction proximal to the defect (p) or is in the direction opposite to the defect (o) as shown in Figure 3.

Å

**Figure 2.** DOX orientation when encapsulated in a chiral nanotube having two haeckelite defects. (**a**) v1 orientation; (**b**) v2 orientation. Lateral views.

**Figure 3.** The encapsulated DOX orientations into a chiral nanotube having one haeckelite defect. (**a**) Hk1-DoxDIn.v1p; the DOX NH<sup>2</sup> group is oriented in a direction proximal to the defect; (**b**) Hk1-DoxDIn.v1o; the DOX NH<sup>2</sup> group is oriented in an opposite direction. Lateral and frontal views shown.

2.1.1. Chiral Nanotubes with HK1 Defects

For chiral nanotubes with Hk1 defects (Hk1 chiral nanotubes), the results predict a similar behavior for both undoped, 0N, and nitrogen doped nanotubes having 4N and 8N. In all three cases, the DOX-CNT interaction is favored when the DOX is located in the defect area with the NH<sup>2</sup> group pointing towards the center of the nanotube (v1 orientation)

π

Å

in the proximal direction close to the defect as shown in Table 1 (runs 2, 10 and 18 for 0N, 4N and 8N, respectively) with Poisson–Boltzman (PB) binding energies of −102, −99 and −102 kcal/mol, respectively. Coherently, most of these systems exhibit equilibrium distances with values between 3.2 and 3.6 Å evidencing stronger DOX-CNT interactions which are favored by the orientation of the DOX that facilitates the NH-π interaction. In Figure 4 the initial conformation of the Ch(13,08)8N-Hk1-DoxDIn.v1p complex is shown together with the final conformations after 2 ns of MD simulation and after 100 ns. It is observed that DOX does not move towards the regular part of the nanotube but interacts with the defect and as a result, in that area, there is a significant deformation of the nanotube. These results obtained for chiral nanotubes show the same behavior as was reported for armchair nanotubes with more favorable DOX-CNT interactions for systems in which DOX is located in the defect region and when it is oriented with its nitrogen atom directed towards the center of the nanotube. However, armchair nanotubes having one haeckelite defect exhibit DOX-CNT binding energies that are more exothermic suggesting stronger DOX-CNT interactions [33].

**Table 1.** DOX-CNT Poisson–Boltzman (PB) and generalized bond (GB) binding energies (in kcal/mol) for the nitrogen doped and undoped chiral nanotubes (34 Å length) having one haeckelite defect, Hk1, considering encapsulated system. dp-NT are the distances from the DOX anthraquinonic plane to the nanotube wall; dN-NT is the distance from the DOX nitrogen atom to the nanotube wall. All distances are expressed in Å.


<sup>1</sup> DoxD means DOX position is in the defect zone; DoxR is for the DOX in the regular part of the nanotube; v1 means the nitrogen atom of the DOX is oriented towards the center of the tube; v2 indicates the inverse orientation; v1p indicates that the DOX nitrogen atom is located in a proximal space regarding the defect meanwhile v1o is used to indicate that the DOX nitrogen atom is located in an opposite space regarding the defect as shown in Figure 3.

DOX-CNT systems, doped with 4N and containing Hk1 defect exhibit quite similar DOX-CNT PB binding energy values between −79 and −77 kcal/mol, probably accounting for an electronic distribution that interacts with the drug in a similar way regardless of DOX position and orientation. This could be due to the arrangement of nitrogen atoms which are part of two pyrimidine rings placed opposite each other on the walls of the nanotube.

**Figure 4.** Representation of encapsulated DOX-CNT complex for nitrogen-doped chiral nanotube with one haeckelite defect, Ch(13,08)8N-Hk1-DoxDIn.v1p, at different simulation times. (**a**) 0 ns; (**b**) 2 ns; (**c**) 100 ns. Two laterals and one frontal view are shown.

#### 2.1.2. Chiral Nanotubes with Hk2 Defects

Carbon nanotube diameter effect. The nitrogen doped and undoped Hk2 chiral (13,08) CNTs of 14 Å diameter and 19 Å length showed better DOX-CNT PB and GB binding energies than the corresponding Hk2 chiral (13,10) CNTs of 16 Å diameter (calculated with RESP charges for DOX) as shown in Figure 5. This was an expected result considering PM6-DH2 and M06-2X theoretical calculations in the scheme of ONIOM for the DOX encapsulation in armchair CNTs without defects [20], and also molecular dynamics studies on armchair, zigzag and chiral nanotubes with reported values of 14 Å as an optimal value of the nanotube diameter for encapsulating the DOX [32]. A diameter of 14 Å allows the proper curvature of the nanotube for the formation of different attractive and complementary non-covalent interactions between the nanotube and the DOX that stabilize the entire system which is also fulfilled in this case of nanotubes containing two haeckelite defects in their structure.

Hk2 chiral nanotubes of diameter 14 Å exhibit less favorable DOX-CNT PB binding energies for nitrogen doped nanotubes in the order 0N > 4N > 8N with values of −101, −97 and −74 kcal/mol, respectively (see Figure 5), which predicts stronger DOX-CNT interactions for undoped Hk2 chiral (13,08) nanotubes. These values were calculated using RESP charges for DOX. RESP (restrained electrostatic potential) approach to derive partial charges has been reported as having a lower average error than MM3 and CHARMm in a study considering 55 molecules [45]. Mean SD for the PB binding energy values between 2.6 and 3.2 kcal/mol and between 2.6 and 3.1 kcal/mol for the GB binding energy values were observed, being in all cases less than 4.2%.

**Figure 5.** DOX-CNT binding energies for nitrogen-doped and undoped chiral nanotubes of 19 Å length with different diameter, having Hk2 defects and calculated with RESP charges for DOX. (**a**) PB binding energies; (**b**) GB binding energies. Blue is for 14 Å diameter and red is for 16 Å diameter.

Carbon nanotube length and DOX pose effects. Longer Hk2 chiral nanotubes (34 Å length) exhibit more exothermic DOX-CNT PB binding energy values than Hk2 chiral nanotubes of 19 Å length, for both nitrogen doped and undoped nanotubes. For the undoped nanotubes (0N) and those doped with 4N, a clear preference of the DOX for the nanotube defect zone is shown with DOX-CNT PB binding energy values of −109 and −104 kcal/mol, respectively (runs 1 and 5, Table 2). When the DOX is in the regular zone of the nanotube, no significant differences are observed between the v1 or v2 orientations of the DOX. However, in cases where the DOX is initially located in the defect zone, stronger interactions are predicted for v1 DOX orientation for both undoped and 4N doped systems as shown in Table 2 (runs 1 vs. 2 and 5 vs. 6).

**Table 2.** Poisson–Boltzman (PB) and generalized bond (GB) binding energies (in kcal/mol) for the nitrogen doped and undoped chiral nanotubes (34 Å length) having two haeckelite defects, Hk2, considering encapsulated system. dp-NT are the distances from the DOX anthraquinonic plane to the nanotube wall; dN-NT is the distance from the DOX nitrogen atom to the nanotube wall. All distances are expressed in Å.


<sup>1</sup> DoxD means DOX position is in the defect zone; DoxR is for the DOX in the regular part of the nanotube; v1 means the nitrogen atom of the DOX is pointing towards the center of the tube; v2 indicates the inverse orientation.

The best system for DOX encapsulation in Hk2 chiral nanotubes is therefore Ch(13,08)0N-HK2-DoxDIn.v1 (run 1, Table 2) with the DOX in the defect region and v1 orientation (with the nitrogen pointing towards the center of the nanotube). In this conformation, the formation of the non-covalent DOX-CNT interactions is facilitated, which are mainly constituted by π–π stacking interactions complemented by NH-π, CH-π, C=O-π and van der Waals interactions [20,31,46]. Figure 6 shows the non-covalent interactions for the most favorable case with the DOX in the area of the nanotube defect (a large green surface is observed) and as a comparison, the same nanotube with the DOX encapsulated in the regular area (less green regions and more red regions are observed), with PB DOX-CNT binding energy values of −109 and −80 kcal/mol, respectively (runs 1 and 3, Table 2). A program specially developed for the visualization of non-covalent interactions (NCI) was used [47].

Under similar MD simulation conditions but considering RESP charges for DOX, Hk2 chiral (13,08) nanotubes (−101 kcal/mol, Figure 5) predict stronger DOX-CNT interactions than reported Hk2 armchair (10,10) nanotube (−83.4 kcal/mol) with the encapsulated DOX located in the defect zone in v1 orientation [33]. Hk2 chiral nanotubes predict stronger DOX-CNT interactions than reported Hk2 zigzag (18,0) nanotubes which exhibit values of DOX-CNT PB binding energies of −78.7 kcal/mol for undoped nanotubes [33]. The three types of CNTs in comparison have diameters of 14 Å. In this way, in terms of chirality and according to the indicated results, Hk2 nanotubes exhibit the following order of ability to encapsulate the DOX: chiral > armchair > zigzag, despite chiral nanotubes are shorter than zigzag and armchair nanotubes (19 vs. 34 Å length). The enhanced ability of chiral nanotubes with respect to other nanotubes to encapsulate DOX was reported also for perfect CNTs through MD simulation studies considering RESP charges for DOX [32].

**Figure 6.** Representation of non-covalent interactions (NCI) for DOX encapsulation inside chiral carbon nanotubes having two haeckelite defects. (**a**) DOX D position; (**b**) DOX R position. Cutplot 0.01 0.1. Blue surfaces indicate strong interactions; green means weak interactions; red means repulsion [47]. Different comparative lateral and frontal views shown.

#### *2.2. Nanotubes with Stone–Wales Defects*

The encapsulation of the DOX was studied in armchair and zigzag nanotubes that have one and two Stone–Wales defects (SW1 and SW2, respectively) as shown in Figure 7.

**Figure 7.** Representation of carbon nanotubes having one and two Stone–Wales defects. (**a**) SW1 armchair; (**b**) SW2 armchair; (**c**) SW1 zigzag; (**d**) SW2 zigzag. Side and front views shown.

#### 2.2.1. SW1 and SW2 Armchair Nanotubes

Armchair (10,10) nanotubes of 20 Å and 34 Å in length having Stone–Wales defects were studied, which showed different behaviors in DOX encapsulation. The shorter nanotubes (20 Å length) exhibit a significant stronger interaction with the DOX in two situations: (i) when they are of the SW2 type (with two defects, doped and undoped) in comparison with SW1 nanotubes and (ii) when they are doped with 4N (SW1 and SW2) as is clearly depicted in Figure 8a. Nitrogen doped SW1 and SW2 armchair (10,10) nanotubes of 20 Å length predict stronger interactions with the DOX than corresponding longer SW1 and SW2 nanotubes of 34 Å in length as shown in Figure 8 particularly for 4N doped nanotubes.

**Figure 8.** DOX-CNT PB binding energies for nitrogen-doped and undoped armchair (10,10) nanotubes of different length having one and two Stone–Wales defects. (**a**) 20 Å length; (**b**) 34 Å length.

In contrast, longer SW1 armchair nanotubes predict somewhat stronger interactions than longer SW2 armchair nanotubes the difference being more significant for the undoped SW1 and SW2 nanotubes with PB DOX-CNT binding energy values of −105 and −80 kcal/mol for, respectively, with DOX v1 orientation, and for DOX v2 orientation −92 vs. −81 kcal/mol, respectively, as shown in Figure 8b.

The most exothermic PB DOX-CNT binding energy with a value of −110 kcal/mol correspond to the shorter 4N-doped SW2 armchair nanotube which predicts the stronger DOX-CNT interactions. In the shorter SW2 armchair nanotubes the DOX is symmetrically located and can interact with both of the two defects which favors DOX-CNT interactions. Meanwhile for shorter SW1 armchair nanotubes the DOX interacts with just one defect only. In contrast, the less exothermic PB DOX-CNT binding energies correspond to longer SW1 and SW2 armchair nanotubes with values between −92 and −80 kcal/mol. The only exception is the longer SW1 armchair nanotube where the DOX is in the v1 orientation showing a PB binding energy of −105 kcal/mol. Figure S1 (Supplementary Materials) clearly shows the differences in relative DOX-CNT binding energies. For short nanotubes (20 Å long), SW2 exhibits more exothermic DOX-CNT binding energies than SW1, with 4N doping predicting the strongest DOX-CNT interactions. Within the long nanotubes (34 Å long) the most exothermic DOX-CNT binding energies correspond to the undoped SW1 nanotubes with v1 orientation. In Figure 9 its initial structure is shown and also at 2 ns and 74 ns of simulation where a probable double π–π interaction of the DOX with the two opposite walls of the nanotube is appreciated which generates a significant deformation of the nanotube in addition to its NH-π interaction with the DOX amino group that helps stabilize the system. In the longer nanotubes, the DOX interacts with the regular part of the nanotube also. Apparently the interactions DOX-Stone–Wales defects are stronger than the interactions DOX-regular CNTs.

π

π π

**Figure 9.** Representation of encapsulated DOX-CNT complex for undoped *armchair* nanotube with one Stone–Wales defect, A(10,10)0N-SW1-DoxDIn.v1 at different molecular dynamics simulation times. (**a**) 0 ns; (**b**) 2 ns; (**c**) 74 ns. Two lateral and frontal views shown.

#### 2.2.2. SW1 and SW2 Zigzag Nanotubes

Zigzag (18,0) nanotubes of 20 Å in length and 14 Å diameter having one and two Stone– Wales defects were studied. In Figure 10 the PB and GB DOX-CNT binding energies are shown for undoped nanotubes and for nanotubes doped with 4 and 8 nitrogen atoms. It is observed that both types of binding energies show the same tendency (as was also observed in Figure 5 for chiral nanotubes) and that undoped zigzag nanotubes predict stronger DOX-CNT interactions than doped ones, both with one or two SW defects.

Zigzag nanotubes having one SW defect (SW1) exhibit stronger DOX-CNT interactions than those having two SW defects (SW2). This behavior is more significant for nanotubes doped with 4 nitrogen atoms. Structurally, the presence of 4N means that there are two pyrimidine rings in the nanotube. Zigzag nanotubes with a SW1 defect doped with 8 nitrogen atoms (have four pyrimidine rings in the nanotube) show weaker DOX-CNT interactions than in the case of nanotubes doped with 4N. Undoped SW1 and SW2 nanotubes show similar DOX-CNT PB binding energies of −102 and −100 kcal/mol, respectively. Apparently DOX accommodates better in a space with SW defects but free from the influence of the nitrogen electron cloud. No great difference is observed between the DOX-CNT binding energies obtained for undoped SW1 and SW2 nanotubes and those for SW1 nanotubes doped with 4N with a PB binding energy of −101 kcal/mol; but for 4N-doped SW2 nanotubes, the DOX-CNT PB binding energy decreases significantly to −78 kcal/mol (probably DOX is prevented from accommodating in a narrower space). The same effect is observed when SW1 zigzag nanotubes doped with 4N are compared with those doped with 8N (−101 kcal/mol vs. −82 kcal/mol). Mean SD for the PB binding energy values ranged between 2.7 and 4.0 kcal/mol and between 2.9 and 4.1 kcal/mol for the PB and GB binding energy values, respectively, being in all cases less than 4%. Figure S2 (Supplementary Materials) clearly shows the DOX-CNT relative binding energies with respect to the non-doped nanotubes. SW1 nanotubes have more exothermic DOX-CNT binding energies than SW2. Furthermore, the non-doped SW1 nanotubes exhibit DOX-CNT binding energy values not very distant from the 4N-doped nanotubes, being the 8N-doped nanotubes the ones with the least favorable binding energies.

**Figure 10.** Representation of the PB and GB DOX-CNT binding energies for nitrogen-doped and undoped zigzag (18,0) nanotubes having one and two Stone–Wales defects. (**a**) PB binding energy; (**b**) GB binding energy.

The equilibrium distances dp-NT between the DOX anthraquinonic planar part and the nanotube wall ranged between 3.3 and 3.7 Å and the equilibrium distances d′ p-NT between the same DOX planar part and the nanotube wall in the opposite direction it ranged between 3.8 and 3.9 Å for cases having DOX-CNT PB binding energy between −100 and −102 kcal/mol. On the other hand, for cases with PB binding energy between −78 and −82 kcal/mol, the distances d′ p-NT ranged between 7.6 and 8.2 Å. So, for binding energies that predict strongest DOX-CNT interactions, a deformation of the nanotube is observed, which can also be seen in Figures 4 and 9 for chiral and armchair nanotubes, respectively, probably due principally to a double π−π stacking between the DOX anthraquinonic planar rings and both of the opposite CNT walls, and DOX-CNT van der Waals interactions. The equilibrium dN-NT distances between the DOX nitrogen atom and the nanotube inner wall exhibit values between 3.3 and 4.0 Å regardless of the value of the binding energies.

#### **3. Discussion**

It is interesting to find out how the structural parameters of nanotubes affect the relative DOX-CNT binding energies which could allow us to infer about the DOX-CNT interactions for nanotubes of different types. The aim is to predict nanotube structures that can develop more exothermic binding energies; that is, that produce more favorable DOX-CNT interactions, without fear that the desorption of the drug be difficult since the

acidic pH existing in the tumor environment facilitates protonation and release of the drug, which has been verified in different systems [14–17,38].

In the present work, it was found that a diameter of 14 Å favors the DOX-CNT interaction for Hk2 chiral nanotubes, in agreement with works reported for other defectfree nanotubes [20,32]. Furthermore, it was found that the strength of the DOX-CNT interaction decreases as the number of doping nitrogen atoms increases (0N > 4N > 8N). This relative trend was found both for long Hk2 chiral nanotubes (33 Å length) using Mulliken charges for DOX, as well as for short Hk2 chiral nanotubes (19 Å length) using RESP charges for DOX. However, using similar simulation conditions, Hk1 and Hk2 armchair and zigzag nanotubes (33 Å length) showed that nitrogen-doped systems had more favorable binding energies than non-doped ones [33]. Chirality shows to be an important parameter that controls the effect of the presence of nitrogen in the nanotube, modifying the distribution of electron density and therefore the DOX-CNT interactions in such a way that the presence of nitrogen in the chiral nanotubes destabilizes the association DOX-CNT unlike of what happens in armchair and zigzag nanotubes.

The initial poses of the DOX in the CNT proved to be important, finding better interactions for chiral nanotubes when the DOX is located in the defect part, oriented in such a way that its nitrogen atom points towards the center of the nanotube and is close to the defect. Apparently, this conformation favors π–π, van der Waals interactions and also electrostatic interactions of the NH-π, OH-π, C=O-π type as was observed in Figure 6. These DOX-CNT interactions in the cases of more electronegative binding energies translate into a significant deformation of the nanotube, as observed in Figures 4 and 9 with PB binding energies of −102 and −105 kcal/mol, for chiral and armchair nanotubes, respectively, which could be explained by the π–π interaction of the flat anthraquinonic system of the DOX with both opposite walls of the nanotube. The deformation of the nanotube has been observed in other MD simulation works [32,33], and also in works carried out using the PM6-DH2 and M06-2X methods [20].

On the other hand, the short SW2 armchair nanotubes (20 Å length) showed better DOX-CNT binding energies than the long SW2 armchair nanotubes (34 Å length). Although the short SW2 armchair nanotubes showed more exothermic DOX-CNT binding energies than the respective short SW1 armchair nanotubes, this trend did not hold for the long armchair nanotubes.

In contrast, the short SW1 zigzag nanotubes showed more exothermic DOX-CNT binding energies than the short SW2 zigzag nanotubes. For the SW1 and SW2 zigzag nanotubes it was found that the strength of the DOX-CNT interactions decreased in the presence of nitrogen in the order 0N > 4N > 8N, in a similar way to that shown by the Hk2 chiral nanotubes. Nitrogen-doped SW1 armchair and zigzag nanotubes showed stronger DOX-CNT interactions than the respective Hk2 chiral nanotubes.

There is a lack of experimental studies on the determination of DOX-CNT binding energies. Only one estimation is known in an aqueous system, of about 11.5 kcal/mol for DOX-CNT complexes 100 nm long and 2 to 3 nm in diameter [19]. However, the formation of stable DOX-CNT conjugates has been determined experimentally, verified by atomic force microscopy (AFM) and scanning tunneling microscopy (STM) images, using single-walled CNTs (SWCNTs) with diameter 1–1.5 nm and a few hundred nanometers long, purchased from ILJIN Co., Inc., Korea. [7]. In none of these cases is the chirality of the nanotube or the presence of defects specified.

It is not easy to confront theoretical studies that depend on a variety of factors, such as, in the field of molecular dynamics, the assignment of the type of atom involved in the consideration of the bonded and non-bonded parameters, or the definition of dihedrals and other interpretations of the description of the force field used [48].

The method used in the present work is very useful to obtain the DOX-CNT bonding energy values since it is comparatively fast and reproducible. Its calculation is based on the MM/PBSA and MM/GBSA approaches starting from the equilibrium energy values obtained through molecular dynamics simulation. Although the calculation does not

consider the translational entropy (huge computational cost), it has been shown that it does provide adequate values of the relative binding energies, which have been experimentally validated in biological systems [49,50] and it has recently been reported that when the MM/PBSA and MM/GBSA are used together with empirical corrections, they allow better experimental correlations [51].

DOX-CNT binding energies for the encapsulation of DOX on the walls of armchair nanotubes without defects at the PM6-DH2 level, in aqueous solution, showed values between −51.6 and −53.7 kcal/mol as a function of diameter [20].

As a reference, the nitrosamine adsorption free energy on open-ended Stone–Wales defective (5,5) armchair nanotube calculated by DFT methods at the B3LYP/6-31G(d) level of theory showed a value of −137.14 kcal/mol [52]. DOX-CNT binding free energy values obtained by the MM/PBSA and MM/GBSA methods for (10,10) armchair nanotubes without defects were −43 kcal/mol for DOX adsorption and fluctuated between −109 and −90 kcal/mol for DOX encapsulation [32]. For (10,10) armchair nanotubes with bumpy defects they ranged between −96 and −83 kcal/mol. (10,10) armchair nanotubes with haeckelite defects ranged from −104 to −80 kcal/mol depending on the number of defects and the presence of nitrogen [33].

Interestingly, although the DOX-CNT binding free energy values previously reported by us using the current methodology [32], are far from the values calculated by Wang and Xu [20] by means of PM6-DH2 and M06-2X methods, in both works, after a systematic study, it was found that the best nanotube diameter for encapsulate the DOX is 14 Å. This fact, together with the deformation of the nanotube observed with both methods means that the behavior and ability ranking of the nanotubes to associate with DOX coincides in both works, so it could be considered as an indirect validation of the use of the MM/PBSA and MM/GBSA methods to predict the DOX-CNT binding free energies for different nanotube structures.

The binding energy calculations are very useful to study molecular interactions, complex stability, getting information necessary for drug design, carrier design, inhibitor design, etc. The methods that allow obtaining the most accurate values are expensive in computational time and resources. Semi-empirical methods have been found to allow a good level of accuracy when compared to experimental measurements, and to be relatively fast and of low computational cost. The prediction of binding energies through the MM/PBSA approach increases in efficiency and accuracy when using the partial charges of the ligand determined by semi-empirical methods. A study with 50 protein-ligand systems revealed an excellent performance of the PM7 and AM1 systems, comparable to B3LYP which requires a huge computational cost, reaching the conclusion that the semi-empirical methods AM1 or PM7 provide partial charges of the proteins that help to improve the prediction of binding energies through methods such as solvated interaction energy (SIE) and MM/PBSA in a highly efficient and accurate way [53]. These results validate in some way the use of AM1 in this work.

To summarize, the structural parameters that dominate the behavior of defective nanotubes related to DOX encapsulation are interdependent and in order of importance, from what is observed in this work, the following can be mentioned: chirality, diameter and length of the nanotube; presence of nitrogen atoms as dopants; number of defects present and type of defects. Additionally, the initial pose of the DOX in the nanotube also affects the intermolecular attractive forces. We have found that the initial pose of DOX affects the DOX-CNT binding energies and we have been able to clarify that DOX-CNT interactions are favored when the DOX amino group points towards the center of the nanotube with an orientation close to the defects. It would be interesting in a next stage to do a simulation that considers different conformations of DOX.

As future work, we are doing research on chiral nanotubes with Stone–Wales defects. However, based on the known simulation studies, which are not directly comparable, we believe a global systematic approach is needed. This new comparable approach should consider the best values reported for each system (and also some of the values found as

unfavorable, for comparison) to perform the new simulations. To have a valid ranking of CNTs activity, as drug carriers, with or without defects, at least two conditions should be considered for the new simulations: (i) comparable structural parameters (for example, same diameter, length and number of defects, including nitrogen); and (ii) comparable simulation conditions (same version of the package of programs, same force field, type of charges, type of solvation, among others). Due to the complexity and the high number of variables, we believe that the use of Artificial Intelligence tools will be very useful to improve the study of the design of new drug carriers and also for other related applications. Anyway, refinement of the results for the best systems found could be done using simulation methods and conditions that guarantee greater accuracy.

#### **4. Simulation Methods**

The nanotubes were prepared as single-walled open nanotubes finished in hydrogen with the help of HyperTube [54] and Hyperchem [55] and optimized to the Austin Model One (AM1) level. Chiral CNTs with one and two haeckelite defects (Hk1 and Hk2, respectively) and also armchair and zigzag CNTs with one and two Stone–Wales defects (SW1 and SW2, respectively) were prepared as shown in Figure 7. Molecular dynamics simulations were performed with AMBER16 [56,57]. A program made at home was use for the preparation of the necessary files and instructions to run the MD program [32]. The combined GAFF and ff99SB force fields were used. The DOX was optimized at the level of HF/6-31G\* for some specific cases, just to get the restrained electrostatic potential (RESP) partial charges by using the antechamber AMBER program [45,58]. For the rest of the structures the AM1-Mulliken charges were used. All the DOX-CNT complexes were neutral systems, solvated in an explicit solvent, in a 10 Å octahedron water box using bondi radii under periodic boundary conditions. TIP3P was used as water model.

Simulations were run after some steps of minimization (1000 stages) and heating (from 0 to 300 K), both at constant volume, density balance (50 ps) and equilibrium (500 ps) both of them at constant pressure. Then, the production step consisting of six independent short stages (250 ps each) at constant pressure was carried out. This procedure of using several independent short simulations instead of a single long simulation has being recognized as efficient and accurate [50,59]. In effect, as it is shown in Figure 4, a simulation performed for 100 ns (using several short simulation) revealed similar energy and geometry results than the simulation procedure with short stages as it was indicated above. The drugnanotube binding free energies were determined through the MM/PBSA and MM/GBSA methods implemented in AMBER [56]. These methods were applied on an ensemble of 200 uncorrelated snapshots collected from the equilibrated molecular dynamics simulation for calculating the Gibbs free energy difference of the solvated bound (GCNT-DOX) and unbound states of the drug (GDOX) and nanotube (GCNT) molecules Equation (1).

$$
\Delta \mathbf{G} = \mathbf{G}\_{\text{CNT-DOX}} - \mathbf{G}\_{\text{CNT}} - \mathbf{G}\_{\text{DX}} \tag{1}
$$

Each of these terms were obtained according to Equation (2):

$$G = E\_{vdW} + E\_{bond} + E\_{el} + E\_{pol} + E\_{np} - TS \tag{2}$$

where *EvdW* (van der Waals), *Ebond* (bond, angle, dihedral) and *Eel* (electrostatic) are the standard molecular mechanics (MM) energy terms; *Epol* (polar term) is calculated by solving the Poisson–Boltzmann (PB) or the generalized born (GB) equation; *Enp* (non-polar term) is estimated from a linear relation with the solvent accessible surface area (SA); *T* is the absolute temperature and *S* is the entropy term (estimated through a normal-mode analysis of the vibrational frequencies). The fact that the binding free energies are calculated without considering the translational entropy must be borne in mind when interpreting the results. To obtain a good ranking of nanotube ability for encapsulating DOX based on the DOX-CNT binding energy values this is a convenient method in terms of computational resources, but not for obtaining absolute DOX-CNT binding energy values.

#### **5. Conclusions**

The DOX-CNT binding energies calculated in this work allow us to establish interesting trends such as:


**Supplementary Materials:** The following are available online, Figure S1: DOX-CNT PB relative binding energies for nitrogen-doped and undoped (10,10) armchair nanotubes of different length having one and two Stone–Wales defects. (a) 20 Å length; reference: 4N doped SW2 nanotube with −110 kcal/mol PB DOX-CNT binding energy; (b) 34 Å length; reference: 0N SW1 nanotube with −105 kcal/mol PB DOX-CNT binding energy. Figure S2: Representation of the PB and GB DOX-CNT relative binding energies for nitrogen-doped and undoped (18,0) zigzag nanotubes having one and two Stone–Wales defects. (a) PB binding energy. Reference: 0N SW1 nanotube with −102 kcal/mol PB DOX-CNT binding energy; (b) GB binding energy. Reference: 0N SW1 nanotube with −107 kcal/mol GB DOX-CNT binding energy.

**Author Contributions:** Conceptualization, L.C. and R.R.; funding acquisition, L.C.; investigation, L.C. and R.R.; methodology, L.C. and C.T.; project administration, L.C.; resources, C.T.; software, I.V. and C.T.; validation, L.C. and C.T.; visualization, I.V.; writing—original draft, L.C.; writing review and editing, L.C., I.V. and R.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by University of Santiago de Chile, USACH, grant number DICYT 061941CF and CIA SDT 2981.

**Institutional Review Board Statement:** Does not apply.

**Informed Consent Statement:** Does not apply.

**Data Availability Statement:** Does not apply.

**Acknowledgments:** We are grateful to David A. Case for facilitating access to using AMBER software and Rodrigo Yáñez for computer facilities.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sample Availability:** Not available.

#### **References**


### *Article* **Identification of New** *Mycobacterium tuberculosis* **Proteasome Inhibitors Using a Knowledge-Based Computational Screening Approach**

**Tahani M. Almeleebia <sup>1</sup> , Mesfer Al Shahrani <sup>2</sup> , Mohammad Y. Alshahrani 2,3 , Irfan Ahmad <sup>2</sup> , Abdullah M. Alkahtani <sup>4</sup> , Md Jahoor Alam <sup>5</sup> , Mohd Adnan Kausar <sup>6</sup> , Amir Saeed 7,8, Mohd Saeed <sup>5</sup> and Sana Iram 9,10,\***


**Abstract:** *Mycobacterium tuberculosis* (Mtb) is a deadly tuberculosis (TB)-causing pathogen. The proteasome is vital to the survival of Mtb and is therefore validated as a potential target for anti-TB therapy. Mtb resistance to existing antibacterial agents has enhanced drastically, becoming a worldwide health issue. Therefore, new potential therapeutic agents need to be developed that can overcome the complications of TB. With this purpose, in the present study, 224,205 natural compounds from the ZINC database have been screened against the catalytic site of Mtb proteasome by the computational approach. The best scoring hits, ZINC3875469, ZINC4076131, and ZINC1883067, demonstrated robust interaction with Mtb proteasome with binding energy values of −7.19, −7.95, and −7.21 kcal/mol for the monomer (K-chain) and −8.05, −9.10, and −7.07 kcal/mol for the dimer (both K and L chains) of the beta subunit, which is relatively higher than that of reference compound HT1171 (−5.83 kcal/mol (monomer) and −5.97 kcal/mol (dimer)). In-depth molecular docking of top-scoring compounds with Mtb proteasome reveals that amino acid residues Thr1, Arg19, Ser20, Thr21, Gln22, Gly23, Asn24, Lys33, Gly47, Asp124, Ala126, Trp129, and Ala180 are crucial in binding. Furthermore, a molecular dynamics study showed steady-state interaction of hit compounds with Mtb proteasome. Computational prediction of physicochemical property assessment showed that these hits are non-toxic and possess good drug-likeness properties. This study proposed that these compounds could be utilized as potential inhibitors of Mtb proteasome to combat TB infection. However, there is a need for further bench work experiments for their validation as inhibitors of Mtb proteasome.

**Keywords:** *Mycobacterium tuberculosis*; tuberculosis; proteasome; natural compounds

**Citation:** Almeleebia, T.M.; Shahrani, M.A.; Alshahrani, M.Y.; Ahmad, I.; Alkahtani, A.M.; Alam, M.J.; Kausar, M.A.; Saeed, A.; Saeed, M.; Iram, S. Identification of New *Mycobacterium tuberculosis* Proteasome Inhibitors Using a Knowledge-Based Computational Screening Approach. *Molecules* **2021**, *26*, 2326. https://doi.org/10.3390/ molecules26082326

Academic Editor: Marco Tutone

Received: 8 March 2021 Accepted: 13 April 2021 Published: 16 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

*Mycobacterium tuberculosis* (Mtb) is a deadly tuberculosis (TB)-causing pathogen. TB is a communicable disease that ranks in the world's top 10 causes of death. Besides, it is the leading cause of single infectious agent fatality (higher than HIV/AIDS), and approximately 10 million people fell ill with TB in 2019 [1]. A person with a weakened immune system is highly susceptible to TB infection; thus, their involvement with HIV is the major cause of fatality for these patients [2]. Mtb resistance to existing antibacterial agents has enhanced drastically as well as multidrug-resistant and extensively drug-resistant Mtb strains. These strains are becoming a worldwide health issue [3,4] and are involved in the host immune system's resistance to nitric oxide stress.

Proteasomes are multi-subunit proteolytic complexes that have a vital role in various cellular functions. Proteasome inhibition has appeared as a prevailing approach for the management of various infectious diseases [5]. Mtb proteasome is vital for the bacterium pathogenesis; hence, it is regarded as an attractive target for the development of new agents that may inhibit Mtb. The Mtb mutans lacking the proteasome (proteasome subunits silencing) are viable in vitro, but the infection cannot be maintained in the TB mouse model [6,7]. Hence, it seems that Mtb proteasomes are vital for their propagation in mammalian hosts and are involved in the host immune system's resistance to nitric oxide stress [8].

The proteasome is a heap-shaped protein made up of four rings of heptamers. Its length and width are 150 and 115 Å, respectively [9,10]. Inner beta rings are formed by seven identical prcB subunits, and outer alpha rings are formed by seven identical prcA subunits, which give a path to inner beta rings with active sites when they are open. As a result, it provides the overall organization of α7β7β7α7. The active site of the bacterial proteasome is identical to that of the archaeal and eukaryotic proteasomes and is found primarily in β subunits [9,11].

In the literature, several effective Mtb proteasome inhibitors have been documented. Among them, HT1171, GL5, MLN273, and fellutamide-B are the most potent Mtb proteasome inhibitors [9,12,13]. The Mtb proteasome was revealed to be inhibited by 15 psoralens from a library of 92 analogs, and compounds 8, 11, 13, and 15 exhibited potent inhibition in a fluorescence-based enzymatic assay [14]. Several plant-derived natural products were discovered to inhibit Mtb proteasome with IC<sup>50</sup> values ranging from 25 to 120 M using the chymotrypsin substrate Suc-LLVY-AMC [15]. Various bortezomib analogs have been developed, with the phenol- and halogen-substituted analogs being more specific for the Mtb proteasome than the human proteasome [16].

Drug development requires the identification of some potential hits from a huge library of chemical components. Screening such a huge compounds library using wetlab assays can be a difficult task. Protein-ligand docking is a powerful tool in drug development because it aids in the identification of active or lead compounds from a library of compounds [17,18]. This method is also capable of accurately identifying inhibitor binding modes to the target proteins [17,19]. Computational screening before laboratory testing is a successful approach in decreasing the number of candidate inhibitors for benchwork-based screening [20–22]. The present study aimed to identify new possible hits from the natural compounds databases using in silico, state-of-the-art techniques that could serve as Mtb proteasome inhibitors to combat TB infection.

#### **2. Methodology**

#### *2.1. Protein Structure Preparation*

Mtb proteasome 3D structure (PDB ID: 5TRG) was retrieved from the protein data bank and prepared in monomer form by Discovery Studio (DS) 2020. Since the Mtb proteasome core particle has 14 chains in the beta subunit, all of which have the same active sites, the present study focused on chain K as a monomer, and the K and L chains as a dimer.

#### *2.2. Database Collection and Refinement*

Natural compounds were accessed from the ZINC database (https://zinc.docking.org accessed on 29 January 2021), limiting the outcomes by choosing "natural products" as a subset, resulting in a total of 224,205 compounds and was then downloaded in SDF format. Furthermore, these compounds were processed for minimization and preparation for screening using the "ligand preparation" tool in DS 2020.

#### *2.3. Receptor-Based Virtual Screening*

In order to identify possible leads, the prepared librarian was screened against the Mtb proteasome active site using AutoDock vina (version 1.1.2). Then, the top-scored hits were further processed for in-depth molecular docking studies.

#### *2.4. Molecular Docking*

Lead hits were docked with Mtb proteasome (monomer; K-chain) by Autodock4.2 to determine the ligand–protein interaction and their binding affinities [23]. All input files were prepared using AutoDock Tools, adding polar hydrogen in protein, and assigning the charges with the Kollman charges method. The grid center points X, Y, and Z were kept as 36.731, −11.255, and 43.313, respectively. Grid points were fixed as 60 × 60 × 60 Å with the spacing of 0.375 Å. Additionally, these hits were also docked with the dimer form (K and L chains) of the beta subunit of the Mtb proteasome, keeping the grid center points X, Y, and Z as 41.22, 0.60, and 32.31, respectively. All docking calculation parameters were kept as a default value. The highest negative binding energy (BE) value was ranked as the most promising binding pose.

#### *2.5. LIGPLOT<sup>+</sup> Analysis*

The H-bond and hydrophobic interactions between "hit compounds–Mtb proteasome" complexes were determined by the LIGPLOT+ Version v.2.1. The 3-D structures of the "compound–Mtb proteasome" interaction produced were transformed into 2-D figures using the LIGPLOT algorithm.

#### *2.6. Drug-Likeness*

Top-scoring molecules (ZINC3875469, ZINC4076131, and ZINC1883067) were further used to estimate drug-likeness, toxicity, and pharmacokinetic properties using the pkCSM and SwissADME tools [24]. SMILE IDs of the molecules retrieved from the ZINC database were entered into the pkCSM tool to evaluate drug-likeness [25].

#### *2.7. Molecular Dynamics (MD) Simulations*

To study the dynamic behavior of the protein–ligand complex in simulated physiological conditions, MD simulations were performed using Gromacs Ver. 2020.4. The protein–ligand complexes were solvated in a 10 × 10 × 10 Å orthorhombic periodic box built with TIP3P water molecules. By adding a sufficient number of 9 Na counterions, the entire system was neutralized. This solvated system was energy-minimized and positionrestrained with CHARMM 36 as a force-field [26]. Further, 20 ns of MD run was carried out at 1 atm pressure and 300 K temperature implementing NPT ensemble with a recording interval of 100 ps. This resulted in a total of 1000 reading frames. In the end, various parameters of MD simulation study such as ligand binding site analysis, root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), radius of gyration, Mindistance, H-bond analysis, etc., were analyzed for the stability, compactness, structural variations, and protein–ligand interactions in the solvated system.

#### **3. Results and Discussion**

–

#### *3.1. Virtual Screening, Molecular Docking, and LIGPLOT*

Mtb is the only known bacterial pathogen that has proteasome activity [6]. The increase in drug-resistant TB is a major public health concern and requires the development of new agents that can evade the resistance and effectively control TB. With this purpose, we conducted the computational screening of 224,205 natural compounds from the ZINC database targeting the Mtb proteasome. Among them, the selected hits ZINC3875469, ZINC4076131, and ZINC1883067 showed strong binding with the Mtb proteasome. Twodimensional structures of hit compounds are shown in Figure 1. ZINC3875469 interacted with proteasome through 10 amino acid residues: Thr1, Ser20, Thr21, Gln22, Gly23, Ala46, Gly47, Thr48, Gly140, and Ser141 (Figure 2a); while Thr1, Arg19, Ser20, Asn24, Thr21, Gln22, Gly23, Asn24, Lys33, Gly47, Ala49, and Ala180 residues of the proteasome were observed to bind with ZINC4076131 (Figure 2b). In a similar way, ZINC1883067 was found to interact with Thr1, Arg19, Ser20, Thr21, Lys33, Ala46, Gly47, Gly140, Ser141, and Ala180 residues of the proteasome (Figure 2c).

**Figure 1.** 2D structures of hit compounds.

The active site pocket residues of Mtb proteasome were determined as Ser20, Thr21, Gln22, Val31, Ile45, Ala46, Thr48, Ala49, Val53, Asp124, Asp128, and Asp130 [27]. Interestingly, ZINC3875469, ZINC4076131, and ZINC1883067 were also found to bind with these residues of Mtb proteasome. In a study, small molecules were reported to interact with Thr1, Arg19, Ser20, Thr21, Gln22, Lys33, Gly47, Ala49, Gly140, and Ser141 residues of Mtb proteasome [27]. Consistent with this, in the present study, the selected hits were observed to bind with the similar residues of Mtb proteasome.

Oxathiazol-2-one compounds altered the Mtb proteasome by interacting with the Thr1 residues of the core complex beta-subunit. Consequently, Thr1 is cyclocarbonylated, which greatly alters the active site environment and causes an alternative protein conformation in which the substrate-binding pocket is disrupted. As a result, Mtb proteasome substrates were unable to obtain access to the proteasome, causing toxic proteins and peptides to accumulate within mycobacterial cells. Notably, the oxathiazol-2-one compounds were thought to have no effect on the substrate-binding pocket of human proteasomal beta-subunits. It was proposed that this was due to the fact that the residues involved in preserving the altered conformation were largely non-conserved and thus not susceptible to cyclocarbonylation inactivation. This provided highly selective inhibition of Mtb proteasomes while leaving host proteasomal activity unaffected [13]. Interestingly, in this study, the selected hits ZINC3875469, ZINC4076131, and ZINC1883067 were found to interact with Thr1 residues of the Mtb proteasome.

**Figure 2.** Interacting residues of Mtb proteasome (monomer; K-chain) with ZINC3875469 (**a**), ZINC4076131 (**b**), ZINC1883067 (**c**), and HT1171 (**d**).

We also performed the molecular docking of selected hits with the dimer model (K and L chains) of the Mtb proteasome to determine possible interactions with the adjacent chain (L-chain). Asp124 of the adjacent chain of the Mtb proteasome has been reported as an important residue for inhibitor interaction [28]. Interestingly, in addition to interacting with K-chain residues, ZINC3875469 and ZINC4076131 interacted with several other Mtb proteasome L-chain residues including the Asp124 (Figure 3a,c). Despite the fact that ZINC1883067 did not interact with the Asp124 residue of the L- chain, it did interact with several other residues of both chains of the Mtb proteasome (Figure 3b).

The BE values for hits ZINC3875469, ZINC4076131, and ZINC1883067 with the Mtb proteasome were found to be −7.19, −7.95, and −7.21 kcal/mol, respectively, for the monomer, and −8.05, −9.10, and −7.07 kcal/mol, respectively, for the dimer (Table 1).

HT1171 is a well-characterized inhibitor of the Mtb proteasome [13], which was used as the control compound in this study. HT1171 has been reported to bind with Thr1, Thr21, Arg19, Ser20, Val31, and Ala49 residues of Mtb proteasome [27]. Interestingly, Thr1, Ser20, and Thr21 are the common interacting residues of the Mtb proteasome with HT1171 and the selected hit compounds in this study (Figure 2a–d). BE of HT1171 against the Mtb proteasome was noted as −5.83 kcal/mol for the monomer and -5.97 kcal/mol for the dimer (Table 1).

**Figure 3.** Interacting residues of Mtb proteasome (dimer; K and L chains) with ZINC3875469 (**a**), ZINC1883067 (**b**), and ZINC4076131 (**c**).


HT1171 \* −5.83 −5.97 45.36 47.01

**Table 1.** Binding energy of hit compounds with Mtb proteasome (monomer and dimer).

Mtb proteasome was noted as −5.83 kcal/m \* Positive control for Mtb proteasome.

The hydrophobic interaction and H-bond help to elucidate the potency of inhibitors to the target protein and contribute an important role in "inhibitor–protein" complex stability [29,30]. The Mtb proteasome residues involved in H-bond (Table 2) and hydrophobic interaction with selected compounds are shown in Figure 4.

–

−7.19 −8.05 **Table 2.** H-bond interactions between compounds and the Mtb proteasome (monomer; K-chain).


−5.83 −

"

–

"

**Figure 4.** H-bond (green dashed line) and hydrophobic interacting (red arc) residues of the Mtb proteasome (monomer; K-chain) with ZINC3875469 (**a**), ZINC4076131 (**b**), ZINC1883067 (**c**), and HT1171 (**d**).

In the docking study, more negative BE corresponded to the strong binding of hits to the target protein. Furthermore, it is a fact that weaker binding will ultimately have a rapid dissociation rate [31]. Accordingly, in this study, the hits (ZINC3875469, ZINC4076131, and ZINC1883067) exhibited lower BE (strong binding) with the Mtb proteasome than the control compound (HT1171), suggesting that these compounds could be utilized as an inhibitor of the Mtb proteasome to combat TB. The results of the pkCSM and SwissADME tool show that top-scored compounds (ZINC3875469, ZINC4076131, and ZINC1883067) retained an acceptable range of ADMET and drug-likeness (Lipinski) (Table 3).


**Table 3.** Pharmacokinetic properties of top-scoring ligands.

#### *3.2. MD Simulation*

MD simulations of the protein–ligand complex were performed using Gromacs 2020.4 on the Linux platform. MD simulation provides information about the receptor–ligand complex with time, so we performed the MD simulation for 20 ns on the three complexes (hit compounds). After the simulation, we analyzed the trajectory files for RMSD, RMSF, protein–ligand interactions, etc.

#### 3.2.1. RMSD

The RMSD value determines the mean deviation of the complex at a specific time. It is an indicator that defines the average change in an atom's displacement in the specific molecular conformation of the reference conformation. In trajectory analysis, the complex RMSD was found within the range of 0.25 Å. The initial RMSD value of the complex was 0.1 Å. The backbone atoms were monitored, and the stability, compactness, structural fluctuations, and protein–ligand interactions in a solvated system were examined. RMSD of the backbone was also noted to 0.2 Å (Figure 5). RMSD was found to be constant at 0.2 Å

after 10 ns. The low RMSD value suggests that complexes are more stable. Moreover, it was noted that among the three complexes, complex ZINC3875469 had the lowest RMSD values. On the other hand, complex ZINC3873067 had a comparatively larger RMSD value. This suggests that complex ZINC3875469 has more stability.

–

–

–

−

–

atom's displacement in the specific

**Figure 5.** RMSD of protein and ligand after the initial RMSD values were stabilized. The RMSD graph for the backbone is shown in black color (complex ZINC1883067), red color (complex ZINC3875469), and green color (complex ZINC4076131).

#### 3.2.2. RMSF

– – The RMSF is useful for characterizing local protein mobility in the protein–ligand complex, which is calculated throughout the simulation. It relates to the root-mean-square displacement of each frame conformation residue relative to the average conformation used to determine the flexibility of a protein region. In an RMSF plot, the peak shows the protein area fluctuates more throughout the simulation, while the lower RMSF values reflect the less conformational transition. The atomic profile fluctuations were found to be almost similar in all three complexes. The analysis revealed that the RMSF plot (Figure 6) displayed minimal fluctuations in the protein structures for complex ZINC3875469. The protein–ligand complex displayed lower flexibility, and the RMSF plot revealed variations in certain regions of protein residues. It is suggested that the ligand binding site remained approximately rigid throughout the simulation.

**Figure 6.** RMSF protein backbone and ligand complex is shown in black color (complex ZINC1883067), red color (complex ZINC3875469), and green color (for complex ZINC4076131).

–

#### 3.2.3. Radius of Gyration (Rg)

Rg is used to assess a characterization parameter that evaluates changes in protein structures. For the measurement of the transition in Rg of the protein backbone atoms, the gmx gyrate software was used. Figure 7 shows that the Rg values of complexes ZINC1883067, ZINC3875469, and ZINC4076131 did not change significantly throughout the simulation and continued to fluctuate at 1.73 and 1.69 nm, respectively, indicating that the ligand had little influence on protein structures. It was observed that the Rg value of complex ZINC3875469 was lower and had little fluctuation comparatively throughout the 20 ns of simulation. This suggests that the ligand–protein interaction in complex ZINC3875469 is very high, which makes its structure more compact. –

**Figure 7.** Plot for Rg of backbone atoms of the proteins in the presence of ligands. Black color (complex ZINC1883067), red color (complex ZINC3875469), and green color (complex ZINC4076131).

#### 3.2.4. Minimum Distance

The minimum distance between protein and ligand is given in Figure 8. The average value was found to be 1.5 nm. Interestingly, complex ZINC3875469 had the comparatively lowest minimum distance of 0.25 nm during the entire simulation, indicating more compactness and stability of complex ZINC3875469. This suggests that complex ZINC3875469 is more stable than other complexes comparatively.

**Figure 8.** Plot showing minimum distance for complexes. Black color (complex ZINC1883067), red color (complex ZINC3875469), and green color (complex ZINC4076131).

–

#### 3.2.5. Number of Hydrogen Bonds (H-Bond Number)

The number of H-bonds was measured to find out the robustness of the complex using a cut-off value 0.35 nm. It was noticed that complexes ZINC1883067 and ZINC3875469 had the most H-bonds, but the complex ZINC3875469 H-bonds were more stable over the entire simulation and thus play a significant role in stabilizing protein–ligand interactions (Figure 9). –

**Figure 9.** Plot showing number of hydrogen bonds for complexes. Black color (complex ZINC1883067), red color (complex ZINC3875469), and green color (complex ZINC4076131).

It should be noted that the BE values and MD simulations obtained can only show the binding effectiveness and stability of inhibitors with the target protein. However, further bench work studies are required to validate these hits (ZINC3875469, ZINC4076131, and ZINC1883067) as Mtb proteasome inhibitors.

#### **4. Conclusions**

In summary, natural compounds from the ZINC database were screened against the Mtb proteasome. The top hit compounds (ZINC3875469, ZINC4076131, and ZINC1883067) demonstrated robust binding with the monomer as well as dimer Mtb proteasome. Molecular docking of these selected hits with the Mtb proteasome dimer model revealed that, in addition to interacting with K-chain residues, they also interacted with many other residues of the L-chain. These results open the way for further experimental confirmation in the quest for a novel Mtb proteasome inhibitor to combat TB.

**Author Contributions:** Conceptualization, T.M.A., and S.I.; methodology, T.M.A., and S.I.; writing original draft preparation, T.M.A., and S.I.; writing—review and editing, M.A.S., M.Y.A., I.A., A.M.A., M.J.A., M.A.K., A.S., and M.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors would like to express their gratitude to the Research Center of Advanced Materials—King Khalid University, Saudi Arabia for support by grant number (RCAMS/KKU/ 0020/20).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sample Availability:** Not available.

#### **References**


### *Article* **Identification of Novel Src Inhibitors: Pharmacophore-Based Virtual Screening, Molecular Docking and Molecular Dynamics Simulations**

### **Yi Zhang, Ting-jian Zhang, Shun Tu, Zhen-hao Zhang and Fan-hao Meng \***

School of Pharmacy, China Medical University, Shenyang 110122, China; cruckzhang0304@163.com (Y.Z.); todayfy@outlook.com (T.-j.Z.); tushun2018@163.com (S.T.); zhangzhenhao314@163.com (Z.-h.Z.) **\*** Correspondence: fhmeng@cmu.edu.cn; Tel.: +86-133-8688-7639

Academic Editors: Marco Tutone and Anna Maria Almerico Received: 25 July 2020; Accepted: 5 September 2020; Published: 8 September 2020

**Abstract:** Src plays a crucial role in many signaling pathways and contributes to a variety of cancers. Therefore, Src has long been considered an attractive drug target in oncology. However, the development of Src inhibitors with selectivity and novelty has been challenging. In the present study, pharmacophore-based virtual screening and molecular docking were carried out to identify potential Src inhibitors. A total of 891 molecules were obtained after pharmacophore-based virtual screening, and 10 molecules with high docking scores and strong interactions were selected as potential active molecules for further study. Absorption, distribution, metabolism, elimination and toxicity (ADMET) property evaluation was used to ascertain the drug-like properties of the obtained molecules. The proposed inhibitor–protein complexes were further subjected to molecular dynamics (MD) simulations involving root-mean-square deviation and root-mean-square fluctuation to explore the binding mode stability inside active pockets. Finally, two molecules (ZINC3214460 and ZINC1380384) were obtained as potential lead compounds against Src kinase. All these analyses provide a reference for the further development of novel Src inhibitors.

**Keywords:** Src inhibitors; pharmacophore model; virtual screening; molecular docking; molecular dynamics simulations

#### **1. Introduction**

The Src family kinases (SFKs) are a family of non-receptor tyrosine kinases, which are involved in a wide variety of essential functions to sustain cellular homeostasis, where they regulate cell cycle progression, motility, proliferation, differentiation and survival, among other cellular processes [1]. As a prototypical member of the SFKs, Src contains Yes, Fyn, Lyn, Lck, Hck, Fgr, Yrk, Frk and Blk kinases [2]. Src consists of four homology domains (SH1, SH2, SH3 and SH4) and a unique domain (Figure 1). The SH1 domain (also called the catalytic domain) is composed of two subdomains (generally termed N-terminal and C-terminal lobes) separated by a cleft. The N-terminal lobe contains the highly conserved hinge region that is implicated in the interaction with the ATP-adenine ring and to which most of the Src inhibitors anchor through hydrogen bonding. The C-terminal lobe is larger, comprises an activation loop that contains a tyrosine residue that can be autophosphorylated (Tyr419 in human c-Src) and is the positive regulatory site responsible for maximizing kinase activity. The phosphorylation of this residue stabilizes the kinases in an active conformation accessible to ATP and substrates. On the contrary, when another tyrosine residue located in the C-terminal lobe tail (Tyr530 in human c-Src) is phosphorylated, a closed conformation is induced [3]. The SH2 and SH3 domains regulate the Src catalytic activity through both intramolecular and protein–protein interactions. The SH4 domain is a 15-amino acid sequence whose myristoylation allows the binding of Src members to the inner surface of the plasma membrane. The unique domain is included in the N-terminal segment of the proteins, together with SH4, and is composed of 50–70 residues. Unlike the SH domains, it displays the greatest divergence among the SFKs and thus probably contributes to the differentiation of their biological functions [4]. Src is a central signaling hub that can be activated by many factors, including immune-response receptors, integrins and other adhesion receptors, receptor protein tyrosine kinases, G protein-coupled receptors and cytokine receptors [5]. In normal cells, Src is only transiently activated during the multiple cellular events in which it is involved. Conversely, Src is overexpressed and/or hyperactivated in a large variety of solid tumors and is probably a strong promoting factor for the development of metastatic cancer phenotypes [6]. Src is responsible for many human cancers such as lung [7], neuronal [8], ovarian [9], esophageal [10] and gastric cancers [11], as well as melanoma [12] and Kaposi's sarcoma [13]. Due to its involvement in many cellular processes related to cancer development, Src has long been considered a potential drug target in oncology. – – and Kaposi's sarcoma

**Figure 1.** The crystal structure of the Src kinase and schematic domain structure.

– The Src inhibitors developed to date are generally categorized into three major classes: (1) tyrosine kinase activity inhibitors (ATP-competitive inhibitors); (2) protein–protein interaction inhibitors (SH2, SH3 or substrate-binding domain); (3) enzyme destabilizers that provide a correlation between Src and its united molecular chaperone, i.e., heat shock protein 90 (Hsp90) [14,15]. The search for small molecules with an inhibitory activity toward Src kinases constitutes a growing field of study. Several compounds have entered clinical trials, with two compounds ultimately approved by the FDA: dasatinib, approved in 2006, and bosutinib, approved in 2012 [16]. However, dasatinib is known to inhibit over 40 kinases, while bosutinib inhibits over 45 kinases, making it impossible to use these compounds as selective mechanistic probes for Src-dependent pharmacology [17,18]. Furthermore, most Src inhibitors reported share similar scaffolds such as pyrazolo [3,4-d] pyrimidine, quinoline and quinazoline (Figure 2). To this end, it is meaningful to find more effective and selective Src inhibitors with new chemical scaffolds.

In this work, we report an integrated screening method containing pharmacophore-based virtual screening; molecular docking; absorption, distribution, metabolism, elimination and toxicity (ADMET) prediction; and molecular dynamics (MD) simulations to find novel Src inhibitors.

**Figure 2.** Chemical structures of previously reported Src inhibitors.

#### **2. Results and Discussion**

#### *2.1. Preparation of Chemical Database*

using filters such as "in stock" and "drug like", based on Lipinski's rule of five (molecular the " " " " Prior to performing the virtual screening, the database needed to undergo several filtering and preparation steps to reduce the enormous number of compounds [19]. All the selected ligands were downloaded from the ZINC database (https://zinc.docking.org/) using filters such as "in-stock" and "drug-like", and the selection of ligands was performed based on Lipinski's rule of five (molecular weight limit of 300 to 600 Da, hydrogen-bond acceptor limit of 10, hydrogen-bond donor limit of 5, rotatable-bond limit of 7, and log P limit of 5). To produce a more refined and precise set of chemical data, some built-in functions such as the "partial charges" and "energy minimize" tools of the Molecular Operating Environment software (MOE, Version 2015.10) were applied on the data set. The resulting database comprised 1,033,419 molecules with lowest energy in 3D format.

#### *2.2. Generation and Validation of Pharmacophore Model*

– – – The protein–ligand complex serves as the starting template for this modeling, wherein intermolecular interactions are perceived as feature points for subsequent virtual screening. Most often, a single protein–ligand complex is used as the template to align and score the database molecules, from which the best-fitted molecules are prioritized as potential hits [20]. Based on the crystal structure of Src kinase selected (PDB ID: 3F3V), five key pharmacophore features were generated, including one hydrogen-bond donor (Don), two hydrogen-bond acceptors (Acc), one hydrophobic and aromatic center (Hyd/Aro) and one aromatic center (Aro). As shown in Figure 3, the pharmacophore model was designed in consideration of the binding poses of the original ligand (RL45, yellow sticks). Three hydrogen-bond features were present for the ligand–protein complex: the hydrogen-bond donor (F1, purple sphere) of RL45 interacted with Glu310; the hydrogen-bond acceptors (F2 and F3, cyan sphere) of RL45 interacted with Asp404 and Met341, respectively. One hydrophobic and aromatic center (F4, yellow sphere) and one aromatic center (F5, orange sphere) were also present for the ligand.

For the validation of generated pharmacophore model, a test database was built including 18 Src reported inhibitors and 18 collected decoy molecules, which can be seen in the Supplementary Materials. The test database was then subjected to screening against the pharmacophore model to validate its precision. As a result, we obtained 14 active molecules as hits, and none of the inactive molecules was

mapped to the pharmacophore model. The results from the test database revealed the precision of the generated pharmacophore model.

**Figure 3.** Pharmacophore features generated by Molecular Operating Environment (MOE).

#### *2.3. Pharmacophore-Based Virtual Screening*

– The pharmacophore-model-based screening of databases has been considered as an important tool for computer-aided drug discovery techniques and provides information about geometric and electronic features that are involved in interaction with receptors [21]. In this part, the chemical database comprised 1,033,419 molecules with lowest energy in 3D format and was generated by applying various filters. We utilized the protein–ligand complex reported for Src for pharmacophore model generation and performed virtual screening of the prepared database to find out the best matches against the model. As a result, 891 molecules were obtained as hits on the basis of pharmacophore features.

#### *2.4. Molecular Docking*

– The hits obtained from the pharmacophore-based virtual screening were subjected to molecular docking studies; the top 10 molecules with the highest docking scores were selected for the study of binding modes (Table 1). Among these 10 molecules, ZINC23247639 and ZINC10479320 have been reported as broad-spectrum kinase inhibitors that interacted with the highly conserved ATP-binding sites of many human protein kinases [22–24]; thus, we filtered them out in this investigation. As we know, hydrogen bonds established between receptor and ligand play a major role in the functionality and stability of the complex. Hence, we observed the dominant hydrogen-bond interactions between the groups of the other eight hit molecules and the residues of the active site, and then, ZINC3214460 and ZINC1380384 caught our attention.

4′ −9. and − bonding energy components contributed −4.6 kcal/mol to the binding. − Using the default GBVI/WSA dG as a docking function in the MOE software, ZINC3214460 and ZINC1380384 ′ s docking scores were calculated as −9.6287 and −8.9096 kcal/mol, respectively. The binding interactions of ZINC3214460 and ZINC1380384 were illustrated by the PyMOL [25] software (Figure 4). As we can see, some key amino acid residues were involved in hydrogen-bonding interactions with ZINC3214460; a carbonyl group accepted an H-bond from the Asp404, and an N atom of the isoxazole formed an H-bond with Met341. The H-bond distances were 2.87 and 3.19 Å, respectively, and hydrogen-bonding energy components contributed −4.6 kcal/mol to the binding. Correspondingly, ZINC1380384 formed three H-bond interactions with the kinase. The amide fragment formed two H-bonds with Asp404 and Glu310, respectively, and an N atom of the benzimidazole accepted an H-bond from a Met341 residue. The hydrogen-bonding energy components contributed −6.9 kcal/mol to the binding. On the basis of good binding energies and their pattern of binding interaction with active pocket, two selected molecules showed strong interaction with the key amino acid residues of Src kinase. Furthermore, these two molecules have not been reported as kinase

inhibitors or related with cancer yet, so our attention is focused on ZINC3214460 and ZINC1380384, which were further subjected to ADMET prediction and molecular dynamics simulations.


**Table 1.** The structures and docking results of molecules.


**Table 1.** *Cont.*

**Figure 4.** Binding interactions of two hit molecules with active pocket of Src kinase. The hit molecules ZINC3214460 (**A**) and ZINC1380384 (**B**) are displayed in yellow sticks, and catalytic residues are displayed in green sticks. Hydrogen bonds are shown as red dashes.

#### *2.5. ADMET Prediction*

The lead compounds in drug development were found to have favorable absorption, distribution, metabolism, elimination and toxicity (ADMET) properties [26]. Pharmacokinetic properties predict the drug-likeness of ligand molecules. Therefore, the ADMET properties of molecules are essential for the development of an effective druggable molecule. In this section, ADMET characteristics such as buffer solubility, blood–brain barrier penetration (BBB), Caco-2 permeability, human intestinal absorption (HIA), plasma protein binding (PPB), cytochrome P450 2D6 (CYP2D6) modulation and hERG inhibition were studied for ZINC3214460, ZINC1380384, dasatinib and bosutinib. The results are summarized in Table 2. Solubility and human intestinal absorption are two key factors that affect oral bioavailability. We found that ZINC3214460 and ZINC1380384 show extremely high values of solubility. Low blood–brain barrier permeability was found, which served in reducing the side effects and toxicity to the brain, and the values of all the compounds are less than 1 (C.brain/C.blood < 1), suggesting that they are inactive in the CNS (central nervous system). Caco-2 permeability was used to evaluate the suitability of compounds for oral dosing, and the proposed compounds have slightly worse human intestinal permeability than two drugs approved by the FDA. In addition, the comparable intestinal absorption (HIA) for dasatinib and bosutinib indicated that two proposed compounds possess good bioavailability. The high plasma-protein binding of ZINC3214460 means a long half-life and stable efficacy, which could maintain a durable potency and adequate stability of the compound. The inhibition of CYP2D6 by a drug constitutes the majority of cases of drug–drug interaction. It was found that none of the compounds may inhibit CYP2D6. The cardiotoxicity may be related with the high risk of hERG inhibition, as shown in Table 2; ZINC3214460 has a low risk of inhibiting hERG, meaning that it has little chance of causing cardiac problems. However, according to the high-risk level of hERG inhibition, the potential cardiotoxicity of ZINC1380384 should be considered in the future. More ADMET prediction data for these two proposed molecules can be found in the Supplementary Materials (Figures S1 and S2).


**Table 2.** The absorption, distribution, metabolism, elimination and toxicity (ADMET) prediction for the investigated compounds.

<sup>1</sup> Buffer solubility: water solubility in buffer system (SK atomic types, mg/L), <sup>2</sup> BBB: blood–brain barrier penetration (C.brain/C.blood), <sup>3</sup> Caco-2: in vitro Caco-2 cell permeability (nm/sec), <sup>4</sup> HIA: human intestinal absorption (%), <sup>5</sup> PPB: plasma protein binding (%).

#### *2.6. Molecular Dynamics Simulations*

MD simulations were conducted to check the stability of the complexes predicted by molecular docking. After 50 ns MD simulations, the root-mean-square deviation (RMSD) of the backbone of Src kinase and the ligands at 300 K was plotted against time (ns). As can be seen in Figure 5, the RMSDs of 3F3V-ZINC3214460 and 3F3V-ZINC1380384 were discovered to be relatively stable at about 0.27 and 0.22 nm, respectively. There were some fluctuations in the beginning, and then, the complexes gradually tended to equilibrium until the time reached 25 ns of simulation. The RMSD values of RL45 and ZINC1380384 in the binding site of Src are similar; however, the 3F3V-ZINC3214460 complex has a larger RMSD value at around 0.30 nm. The root-mean-square fluctuation (RMSF) is an important parameter that yields data about the structural adaptability of every residue in the system. The RMSF values for all the residues in the 3F3V-ZINC3214460 and 3F3V-ZINC1380384 complexes were calculated (Figure 6). In general, the Src inhibitors are settled to the highly conserved region of

the protein (Glu339, Asp348 and Asp404 in the affinity pocket and Glu310 and Leu317 in the hinge region) through hydrogen bonds, van der Waals forces and hydrophobic interaction, etc. We found that the RMSF fluctuation values of the conserved amino acids (e.g., Glu310, Leu317 and Glu339) in 3F3V-ZINC3214460 and 3F3V-ZINC1380384 were lower than those in 3F3V-RL45, reflecting that the two proposed molecules could form stronger interactions with these conserved residues. The proteins showed distinct behavior with a varied magnitude of fluctuations at the loop region (e.g., Ser522 and Tyr527), which could be used to discover more selective inhibitors of Src. Based on the above analysis, we can conclude that these two selected molecules matched very well with the Src binding pocket, suggesting reasonable binding modes.

– **Figure 5.** The root-mean-square deviation (RMSD) trajectories of 3F3V–inhibitor complexes during 50 ns simulations. –

**Figure 6.** The root-mean-square fluctuation (RMSF) maps of 3F3V–inhibitor complexes during simulations.

–

–

#### **3. Materials and Methods**

### *3.1. Generation and Validation of Pharmacophore Model*

In this study, a protein–ligand complex-based pharmacophore model was generated by using the pharmacophore query editor protocol of MOE. Binding interactions induce significant chemical features, which were taken into account for the creation of the pharmacophore model [27]. To generate a pharmacophore model with good quality, MOE utilizes an in-built set of pharmacophore features including hydrogen-bond donor (Don), hydrogen-bond acceptor (Acc), aromatic center (Aro), Pi ring center (PiR), aromatic ring or Pi ring normal (PiN), hydrophobic atom (HydA), anionic atom (Ani), cationic atom (Cat) and so on. We summarized all the binding interactions of Src complexes in the DFG-out state from the RCSB Protein Data Bank (PDB, https://www.rcsb.org/), and several common features (e.g., hydrophobic interactions, hydrogen bonding modes and catalytic residues) were utilized for generating the model. Finally, the crystal structure of Src kinase in complex with a substrate-based inhibitor, 1-(4-((6-aminoquinazolin-4-yl) amino) phenyl)-3-(3-(tert-butyl)-1-(m-tolyl)-1H-pyrazol-5-yl) urea (RL45) (PDB ID: 3F3V; resolution, 2.6 Å), was chosen for the creation of a complex-based pharmacophore system. The inhibitor RL45 had strong interaction with key amino acid residues of Src kinase (i.e., Asp404, Glu310 and Met341) and was further used as a reference in molecular docking (Figure 7). The binding interactions of RL45 with the active pocket of Src are illustrated by the PyMOL software (2.3.2). –

The generated pharmacophore model was validated via a test database including 18 Src reported inhibitors and 18 collected decoy molecules. The molecules of Src inhibitors were downloaded from DrugBank (https://www.drugbank.ca/). The decoy molecules refer to those compounds with reported activities not related to Src, whereas their physical properties, including molecular weight, number of hydrogen-bond donors and acceptor, number of rotatable bonds, and Log P were similar to those of known Src inhibitors. The test database can be seen in the Supplementary Materials.

**Figure 7.** Binding interaction of RL45 with active pocket of Src kinase. The ligand RL45 is displayed in yellow sticks, and catalytic residues are displayed in green sticks. Hydrogen bonds are shown as red dashes.

#### *3.2. Pharmacophore-Based Virtual Screening*

In Computer-Aided Drug Designing (CADD), virtual screening is one of the time-saving methods for the discovery of novel, potent and drug-like compounds [28]. Pharmacophore models have the advantage that they can be used not only to identify novel active compounds in virtual screening but also for anti-target modeling to avoid side-effects resulting from off-target activity [29]. Especially, when structural information about the target protein or the ligand's active conformation is available, pharmacophore-based models are superior to docking and quantitative structure–activity relationship (QSAR) methods [30]. Based on the pharmacophore model generated above, virtual screening was conducted by a pharmacophore search protocol in MOE with an EHT scheme. MOE's pharmacophore search is used to apply a query to a database of molecular conformations and report those conformations as hits that satisfy the pharmacophore features. The hit molecules were preferred and kept in a separate database for the further evaluation of interactions.

#### *3.3. Molecular Docking*

The crystal structure of Src kinase was obtained from the RCSB Protein Data Bank (PDB ID: 3F3V; resolution, 2.6 Å). The protein was prepared for docking using the quickprep tool of MOE, including correcting structural issues, protonating the structure, deleting unbound water molecules and minimizing the structure to a specified gradient, to make the pocket available for the docking of new molecules. The original ligand (RL45) was used to define the binding site of the Src active pocket. For the docking parameters, we set the force field to MMFF94x and used the triangle matcher placement algorithm [31], which returned thirty poses; we also used the Rigid Receptor refinement method, which returned five poses. The London dG method was applied to score the poses in both steps [32]. By studying the top-scored docking poses, only the molecular poses for which the binding modes satisfied the pharmacophore features were retained. Each molecule with the highest docking score was regarded as a docking result for further analysis.

#### *3.4. ADMET Prediction*

Owing to poor pharmacokinetic parameters, many drugs have not passed through clinical trial stages [33]. Thus, it is necessary to predict the pharmacokinetics and toxicity of newly obtained molecules, which were selected from the docking results for further analysis. In this section, the Pre-ADMET server application (https://preadmet.bmdrc.kr) was used. The Pre-ADMET approach is based on different classes of molecular parameters that are considered for generating quantitative structure properties [34].

#### *3.5. Molecular Dynamics Simulations*

Based on the docking results, the best-posed complex was subjected to MD simulation studies using the Groningen Machine for Chemicals Simulations (GROMACS) 5.0 package [35] with a CHARMM36 force field [36] under periodic boundary conditions for molecules. Ligand topology files were generated using the CHARMM General Force Field [37]. The charge of the system was neutralized by the addition of the ions. The energy was minimized using a steepest-gradient method to remove any close contacts. The particle mesh Ewald (PME) method was employed for energy calculation and for electrostatic and Van der Waals interactions. The systems were equilibrated in the NVT ensemble for 50,000 steps, followed by equilibration in the NPT ensemble for an additional 50,000 steps. Finally, 50 ns molecular dynamics simulations were performed at 300 K with a 2.0 fs time step, and coordinates were saved every picosecond for analysis [38,39].

#### **4. Conclusions**

In this study, in order to find novel Src inhibitors, an integrated screening method was employed; through pharmacophore-based virtual screening and molecular docking, the top 10 molecules with

good binding scores were selected for the study of binding modes. ADMET prediction and molecular dynamics simulations were used to predict the pharmacokinetic properties and stabilities of proposed ligand–protein complexes. Finally, two molecules (ZINC3214460 and ZINC1380384) were selected with excellent properties and stable binding modes. In addition, ZINC1380384, possessing a novel benzo [d] imidazole scaffold, was valuable for further optimization and provides a reference for the development of novel potent Src inhibitors.

**Supplementary Materials:** Figure S1: The ADMET prediction for ZINC3214460; Figure S2: The ADMET prediction for ZINC1380384.

**Author Contributions:** Conceptualization, T.-j.Z.; methodology, Y.Z.; software, Y.Z. and T.-j.Z.; validation, Y.Z., S.T. and Z.-h.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z.; project administration, F.-h.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (Grant No. 81803355 and 81573687) and the Scientific Research Fund of Liaoning Provincial Education Department (LQNK201739).

**Conflicts of Interest:** The authors have no conflict of interest to declare.

#### **Abbreviations**

Src: sarcoma; SFKs: Src family kinases; MOE: molecular operating environment; ADMET: absorption, distribution, metabolism, elimination and toxicity; MD: molecular dynamics; RMSD: root-mean-square deviation; RMSF: root-mean-square fluctuation

#### **References**


**Sample Availability:** Samples of the compounds are not available from the authors.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
