# **In Silico Strategies for Prospective Drug Repositionings**

Edited by

Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Printed Edition of the Special Issue Published in *Pharmaceutics*

www.mdpi.com/journal/pharmaceutics

## **In Silico Strategies for Prospective Drug Repositionings**


Editors

**Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan, Mihai Udrescu**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Lucreția Udrescu, Department I-Drug Analysis, "Victor Babeș" University of Medicine and Pharmacy, Timișoara, Romania

Ludovic Kurunczi, Department I-Physical Chemistry, "Victor Babeș" University of Medicine and Pharmacy, Timișoara, Romania

Paul Bogdan, Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, United States

Mihai Udrescu, Department of Computer and Information Technology, Politehnica University of Timișoara, Timișoara, Romania

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Pharmaceutics* (ISSN 1999-4923) (available at: www.mdpi.com/journal/pharmaceutics/special_issues/drug_reposition).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-6134-9 (Hbk) ISBN 978-3-0365-6133-2 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


Drug Repurposing Using Modularity Clustering in Drug-Drug Similarity Networks Based on Drug–Gene Interactions. Reprinted from: *Pharmaceutics* **2021**, *13*, 2117, doi:10.3390/pharmaceutics13122117


## **About the Editors**

#### **Lucreția Udrescu**

Lucreția Udrescu is a specialist pharmacist and Professor in the Department I–Drug Analysis and Environmental Chemistry, Hygiene, and Nutrition at the Faculty of Pharmacy, the Victor Babeș University of Medicine and Pharmacy Timișoara, Romania. She received her Ph.D. in Chemistry from the Romanian Academy in 2015. Her research interests are computational drug repositioning, drug interactions (clinical evaluation and prediction), the internet of medical things applied in pharmacovigilance, and biopharmaceutical drug profile optimization using cyclodextrins.

#### **Ludovic Kurunczi**

Ludovic Kurunczi is a retired Professor from the Faculty of Pharmacy of the Victor Babeș University of Medicine and Pharmacy Timișoara, Romania. Before 1989, he was a researcher at the Institute of Chemistry Timișoara, Romania, and a member of a research team running a multi-year technological project that took the production of two insecticides from the laboratory to an industrial unit (during this period, he coauthored 10 Romanian patents). Since 1990, he has been a professor of Physical Chemistry, Methodology of Scientific Research, and Drug Design at the Faculty of Pharmacy. In his career, he has addressed the following research domains: the chemistry of organophosphorus compounds, computational chemistry (molecular modeling, quantum chemistry), Quantitative Structure Activity (Property) Relationships (QSA(P)R), cheminformatics (similarity search, Discriminant Analysis (DA), Principal Component Analysis (PCA), Partial Least Squares (PLS)), protein structure handling, molecular docking, and virtual screening. His publications include five books, 14 chapters, and 120 research articles with over 1260 citations. Over the years, he has participated in more than 15 research projects. He is the supervisor of 9 doctoral theses (one in progress) coordinated at the Department of Computational Chemistry, "Coriolan Drăgulescu" Institute of Chemistry Timișoara of the Romanian Academy.

#### **Paul Bogdan**

Paul Bogdan is the Jack Munushian Early Career Chair and associate professor (with tenure) in the Ming Hsieh Department of Electrical and Computer Engineering at the University of Southern California. His work has been recognized with a number of distinctions, including the 2012 A.G. Jordan Award from the Electrical and Computer Engineering Department, Carnegie Mellon University for outstanding Ph.D. thesis and service, the 2012 Best Paper Award from the Networks-on-Chip Symposium (NOCS), the 2012 D.O. Pederson Best Paper Award from IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the 2012 Best Paper Award from the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), and the 2013 Best Paper Award from the 18th Asia and South Pacific Design Automation Conference. His main areas of interest are network science, complex systems, synthetic and systems biology, systems pharmacology, neuroscience, cyber–physical systems, fractal modeling and control of biological systems, machine learning, artificial intelligence, and design methodologies for large-scale heterogeneous many-core systems.

#### **Mihai Udrescu**

Mihai Udrescu is a Professor of Computer Engineering in the Department of Computer and Information Technology at the Politehnica University of Timișoara, Romania. Between September 2019 and February 2020, he was a Fulbright Visiting Scholar in the Electrical and Computer Engineering department at Carnegie Mellon University. He received his Ph.D. in Computer Engineering from the Politehnica University of Timișoara in 2006. Mihai Udrescu's present research concentrates on developing machine learning techniques and emerging (quantum) computing methods for analyzing complex biological, technological, and social systems.

## **Preface to "In Silico Strategies for Prospective Drug Repositionings"**

Drug design means planning a new chemical structure that purposely interacts with the biological targets known to be relevant for a given medical condition. The discovery of new drugs is one of pharmaceutical research's most exciting and challenging tasks. Unfortunately, the conventional drug discovery procedure is time-consuming and cumbersome. However, over time, successfully developed medicines, acting as planned on their intended targets, have also proven to work on other targets as efficient therapies for other diseases. Medicines have a proven proclivity for multiple functions; a well-known example is Aspirin, initially used as an analgesic but later found to act as an antiplatelet drug at low doses.

In this context, the process of systematically finding new functions for approved drugs—often called drug repositioning—becomes a valuable strategy in drug discovery. The literature also mentions similar terms: repurposing, reprofiling, redirecting, rediscovery, retasking, rescuing, recycling, redirection, therapeutic switching, etc. [1]. The common denominator of all these taxonomic variants is identifying a new indication for an existing drug. Nevertheless, there are significant differences in the characterizations of the repurposed entity—from "old drugs" [2] to "drug candidates from academic institutions and public sector laboratories not yet fully pursued" [3], "drugs that have previously passed safety testing for human use", or "drugs that have advanced to the clinical trial stage of development but have failed or stalled at that stage" [4].

Pursuing drug repurposing has obvious financial reasons. In 2004, Ashburn and Thor [5] argued that a solution for the pharma industry, which faces high costs for launching new drugs, is to focus on drug repositioning. Indeed, in a June 2020 FDA virtual conference [6], participants estimated the cost for a repurposed drug to reach the market to be USD 500 thousand (for a development period of 1–3 years), in contrast to over USD 1.5 billion for a new drug (for a development period of 12–19 years). Such significant cost reductions mean drug repurposing is also suitable for identifying existing drugs that are effective in rare diseases with low economic incentives. Drug repositioning's massive potential to impact healthcare practices is further emphasized by the growth of this subject in the scientific literature. The number of papers containing the terms "drug" and "repurposing" grew exponentially between 2005 and 2019 [7] (790 documents accumulated in 2019, according to Scopus).

Twenty years ago, pharmacists and medical doctors relied on fortuity to reposition medicines. The reason is that the drug-target search space is enormous, and exploring it requires vast human and material resources. Nevertheless, efficiently benefiting from serendipity requires that opportunity (drugs' apparent propensity for multiple functions) meets preparation (systematic, practical methods). The recent progress in machine learning, complex systems, and big data computational analysis provides systematic methods for drug repositioning. Such approaches successfully prune the drug-repositioning search space, providing valuable hints to experimental biologists and biochemists. As such, computational repositioning was recently employed as a valuable tool for identifying potential therapies for COVID-19.

Following this new and exciting trend, our Special Issue's scope was to collect papers introducing innovative computational methods to identify potential candidates for drug repositioning. The papers in this Special Issue introduce a wide array of in silico strategies, such as complex network analysis, big data, machine learning, molecular docking, molecular dynamics simulation, and QSAR; these strategies target diverse diseases and medical conditions: COVID-19 and post-COVID-19 pulmonary fibrosis, non-small cell lung cancer, multiple sclerosis, toxoplasmosis, psychiatric disorders, or skin conditions such as hidradenitis suppurativa.

The editors are grateful to *Pharmaceutics* for allowing us to host this highly interdisciplinary, distinctive Special Issue. Our special thanks go to Candice Zhuo, the Assistant Editor, who strongly supported this project and facilitated our communication with the Editorial Office at *Pharmaceutics*. We also appreciate the work of all Assistant Editors involved in this Special Issue.

#### References:

[1] Langedijk, J. Continuous Innovation in the Drug Life Cycle. Ph.D. Thesis, Utrecht University, Utrecht, The Netherlands, 19 December 2016. ISBN 978-94-6233-489-2.

[2] Avram, S.; Curpan, R.; Halip, L.; Bora, A.; Oprea, T.I. Off-Patent Drug Repositioning. Journal of Chemical Information and Modeling 2020, 60(12), 5746–5753.

[3] Allarakhia, M. Open-Source Approaches for the Repurposing of Existing or Failed Candidate Drugs: Learning from and Applying the Lessons across Diseases. Drug Design, Development and Therapy 2013, 7, 753–766.

[4] Pharmaceutical Companies Repurposing Drugs to Accelerate Growth.

[5] Ashburn, T.T.; Thor, K.B. Drug Repositioning: Identifying and Developing New Uses for Existing Drugs. Nature Reviews Drug Discovery 2004, 3(8), 673–683.

[6] FDA Drug Topics: CURE ID: Capturing Clinicians' Experiences Repurposing Drugs to Inform Future Studies in the Era of COVID-19.

[7] Shkrob, M.; Wilson, J. Drug Repurposing Could Open the Door to New Therapies. Technology Networks, 4 September 2020.

#### **Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan, and Mihai Udrescu** *Editors*

## *Article* **A COVID-19 Drug Repurposing Strategy through Quantitative Homological Similarities Using a Topological Data Analysis-Based Framework**

**Raul Pérez-Moraga <sup>1,2,†</sup>, Jaume Forés-Martos <sup>1,2,3,†</sup>, Beatriz Suay-García <sup>1,2</sup>, Jean-Louis Duval <sup>4</sup>, Antonio Falcó <sup>1,2,\*</sup> and Joan Climent <sup>1,5,\*</sup>**


**Abstract:** Since its emergence in March 2020, the SARS-CoV-2 global pandemic has produced more than 116 million cases and 2.5 million deaths worldwide. Despite the enormous efforts carried out by the scientific community, no effective treatments have been developed to date. We applied a novel computational pipeline that compares three-dimensional protein structures in order to accelerate the identification of drug repurposing candidates. Its use in conjunction with two in silico validation strategies (molecular docking and transcriptomic analyses) allowed us to identify a set of potential drug repurposing candidates targeting three viral proteins (3CL viral protease, NSP15 endoribonuclease, and NSP12 RNA-dependent RNA polymerase), which included rutin, dexamethasone, and vemurafenib. This is the first time that a topological data analysis (TDA)-based strategy has been used to compare a massive number of protein structures with the final objective of performing drug repurposing to treat SARS-CoV-2 infection.

**Keywords:** COVID-19; drug repurposing; topological data analysis; persistent Betti function

#### **1. Introduction**

On 11 March 2020, the World Health Organization (WHO) declared the Coronavirus Disease 2019 (COVID-19) outbreak, produced by the novel SARS-CoV-2 virus, a global pandemic [1]. To date, three previously approved antiviral drugs and one antimalarial medication (remdesivir, lopinavir, interferon-β1, and hydroxychloroquine) have been tested for efficacy against SARS-CoV-2 infection by the WHO SOLIDARITY consortium in a large multicentric study. The results of the trial suggested that these treatments had little or no effect on a set of clinical outcomes which included overall mortality, time to initiation of mechanical ventilation, and duration of hospital stay [2].

With the third wave ongoing in many countries, herd immunity a distant prospect, and new strains challenging the existing vaccines, finding adequate treatments for the disease remains a pressing need. De novo drug development and testing, including preclinical research and clinical trials, is a slow process that could take more than 12 years [3,4]. However, the current health emergency makes it imperative to shorten this time frame. Therefore, sustained efforts to identify potential candidates for drug repurposing are necessary.

**Citation:** Pérez-Moraga, R.; Forés-Martos, J.; Suay-García, B.; Duval, J.-L.; Falcó, A.; Climent, J. A COVID-19 Drug Repurposing Strategy through Quantitative Homological Similarities Using a Topological Data Analysis-Based Framework. *Pharmaceutics* **2021**, *13*, 488. https://doi.org/10.3390/pharmaceutics13040488

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 21 February 2021 Accepted: 31 March 2021 Published: 2 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In the context of COVID-19, Kumar and co-workers compiled sets of genes linked to the disorder and studied their distribution in the human interactome [5]. They first identified the hub genes of the interactome subnetworks in which the disease-related genes were placed. Then, they queried the Drug–Gene Interaction database to identify Food and Drug Administration (FDA)-approved drugs that had the hub genes as their targets (e.g., chloroquine, lenalidomide, and pentoxifylline) [6,7]. Zhou and collaborators compiled a list of human proteins that physically interact with four previous human coronaviruses (SARS-CoV, MERS-CoV, HCoV-229E, and HCoV-NL63) and used network proximity measures to prioritize 16 potential anti-human coronavirus repurposable drugs, including melatonin, mercaptopurine, and sirolimus [8]. Drug repurposing studies using virtual screening procedures based on molecular docking have also been reported. To cite an example, Keretsu et al. used a protease inhibitor database (MEROPS) and the geometric structure of the 3C-like viral protease (3CLpro) to identify 15 potential inhibitors using the Surflex-Dock software [9].

Here, we present a general-purpose drug repositioning workflow and its application to the specific case of COVID-19. Our procedure is based on recent developments in the field of topological data analysis (TDA) and its use in the study of biological geometric structures [10]. In particular, our method relies on the idea that drugs that are known to target a specific protein would likely target other proteins that present high degrees of topological similarity with the initial protein. Therefore, the accumulated knowledge of drug–protein interactions available in public repositories such as DrugBank, in combination with the information about protein three-dimensional structures found in the Protein Data Bank (PDB), can be used to predict new potential drug protein targets based on the computation of protein–protein topological similarities. Figure 1 contains a brief summary of the general methodology.
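The transfer idea described above (a drug known to target one protein is proposed for other proteins that are topologically similar to it) can be illustrated with a small lookup. This is only a sketch under stated assumptions: the function name `transfer_drugs`, all protein and drug identifiers, and the similarity scores are hypothetical placeholders rather than data or code from the study, and the 0.9 default mirrors the threshold quoted in the Figure 1 caption.

```python
from collections import defaultdict

def transfer_drugs(similarity, drug_targets, threshold=0.9):
    """Map each query protein to drugs whose known targets are similar to it.

    similarity: dict {(query_protein, known_target): score in [0, 1]}
    drug_targets: dict {drug: set of known target proteins}
    """
    # Invert drug -> targets into target -> drugs for fast lookup.
    target_drugs = defaultdict(set)
    for drug, targets in drug_targets.items():
        for target in targets:
            target_drugs[target].add(drug)

    # Collect, for every query protein, the drugs of all sufficiently similar targets.
    candidates = defaultdict(set)
    for (query, target), score in similarity.items():
        if score > threshold:
            candidates[query] |= target_drugs.get(target, set())
    return dict(candidates)

# Hypothetical toy input (names and scores are illustrative only).
similarity = {
    ("viral_protein_A", "human_target_1"): 0.95,
    ("viral_protein_A", "human_target_2"): 0.40,
    ("viral_protein_B", "human_target_2"): 0.92,
}
drug_targets = {
    "drug_x": {"human_target_1"},
    "drug_y": {"human_target_2"},
    "drug_z": {"human_target_1", "human_target_2"},
}
candidates = transfer_drugs(similarity, drug_targets)
```

In this toy case, `viral_protein_A` inherits the drugs of `human_target_1` only, because its similarity to `human_target_2` falls below the threshold.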

**Figure 1.** Bioinformatic workflow used. (**A**) Data preprocessing and acquisition. (**B**) Topological data analysis phase: Vietoris–Rips complexes at scale ε are computed to generate the barcodes. Each ε-associated Betti number captures a unique topological feature of the protein. (**C**) To compare barcodes of viral proteins against structures with known drugs, it is necessary to transform barcodes into comparable curves using persistent Betti functions (PBFs). (**D**) Candidate drugs from proteins with a mean persistent similarity score above 0.9 were validated by a dual in silico strategy. We used AutoDock 4 to analyze the capacity of the drug to bind to viral proteins. Transcriptomic analysis was performed to test the capacity of the candidate drugs to revert the transcriptomic effects induced by COVID-19.
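To make panel (B) more concrete, the following minimal sketch counts the connected components (the zeroth Betti number) of a Vietoris–Rips complex over a range of scales ε. It is an illustration only, not the authors' pipeline: the function name `betti0_curve` is hypothetical, the convention that two points are joined when their distance is at most ε is an assumption (some libraries use 2ε), and the persistent Betti functions actually used in the study are defined in Section 4.2 and may differ from this plain Betti curve.

```python
import numpy as np

def betti0_curve(points, scales):
    """Number of connected components of the Vietoris-Rips complex at each scale.

    Two points are joined when their Euclidean distance is <= scale, so the
    component count equals the zeroth Betti number at that scale.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    def count_components(eps):
        parent = list(range(n))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        components = n
        for i in range(n):
            for j in range(i + 1, n):
                if dist[i, j] <= eps:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        parent[ri] = rj  # merge the two components
                        components -= 1
        return components

    return [count_components(eps) for eps in scales]

# Toy point cloud: two tight pairs far apart (coordinates are illustrative).
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
curve = betti0_curve(cloud, scales=[0.5, 1.0, 5.0, 9.0])
```

For real protein structures one would typically use atomic coordinates and a dedicated TDA library that also tracks higher-dimensional features; the quadratic loop here is kept only for readability.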

#### **2. Results**

#### *2.1. Drugs, Protein Targets, and PDB Structures Included in This Study*

DrugBank queries yielded 1825 drugs approved by the US Food and Drug Administration (FDA). The identified drugs had 1821 known unique protein targets, for which 27,839 three-dimensional structures were available in the Protein Data Bank. The first three persistent Betti functions (PBFs, see Section 4.2) were successfully calculated for 25,800 of the 27,839 structures, whereas computational limitations prevented us from estimating the remaining 1622 structures' PBFs. We also retrieved multiple protein structures from SARS-CoV-2 that were available in PDB, including the Spike protein receptor binding domain, the RNA-dependent RNA polymerase (NSP12), the endoribonuclease (NSP15), the ADP ribose phosphatase (NSP3), the RNA binding protein (NSP9), the 3C-like protease, and NSP8 and NSP7. In total, we calculated the PBFs of 23 viral protein structures. Table 1 shows the complete information regarding the included SARS-CoV-2 protein structures.

**Table 1.** Protein Data Bank (PDB) structures of SARS-CoV-2 proteins analyzed in the study. Entry ID (column 1) encodes the PDB identifiers of the analyzed protein structures, Structure Title (column 2) provides the protein structure description, Macromolecule Name (column 3) is the protein short name, and Chain ID (column 4) lists the studied chains.

| Entry ID | Structure Title | Macromolecule Name | Chain ID |
|---|---|---|---|
| 6LVN | 2019-nCoV HR2 Domain | Spike protein S2 | A, B, C, D |
| 6YI3 | The N-terminal RNA-binding domain of the SARS-CoV-2 nucleocapsid phosphoprotein | Nucleoprotein | A |
| 6M3M | SARS-CoV-2 nucleocapsid protein N-terminal RNA binding domain | SARS-CoV-2 nucleocapsid protein | A, B, C, D |
| 6VYO | RNA binding domain of nucleocapsid phosphoprotein from SARS coronavirus 2 | Nucleoprotein | A, B, C, D |
| 6WJI | C-terminal Dimerization Domain of Nucleocapsid Phosphoprotein from SARS-CoV-2 | SARS-CoV-2 nucleocapsid protein | A, B, C, D, E, F |

#### *2.2. TDA Results, Viral Proteins Showing Mean Persistent Similarities above 0.9 with Structures Targeted by Known FDA-Approved Drugs*

We compared 23 PDB structures derived from SARS-CoV-2 with 25,800 structures belonging to proteins that are known targets of FDA-approved drugs through the computation of 593,400 persistent similarity measures. We selected a stringent threshold of 0.9 for the mean of the persistent similarity measures (see Section 4.2) in order to call two protein structures similar. Three viral structures, the 3CL protease (6M2Q), the RNA-dependent RNA polymerase (6M71), and the NSP15 endoribonuclease (6W01), presented a mean of the persistent similarity measures higher than the selected threshold with proteins known to be targeted by approved drugs. The 3CL protease was found to be associated with 284 PDB structures (Supplementary Table S1), most of them classified as aldo/keto reductases and protein kinases, which were targeted by 55 different pharmacological compounds (Supplementary Table S2). The RNA-dependent RNA polymerase was found to be significantly associated with 361 PDB structures (Supplementary Table S3), which in many cases belonged to the protein kinase and flavin-containing oxidoreductase families, and which were found to be targeted by 204 unique drugs (Supplementary Table S4). Finally, the viral NSP15 endoribonuclease presented topological similarity values higher than 0.9 with 13 PDB structures (Supplementary Table S5), where the most abundant group was the poly(ADP-ribose) polymerase catalytic domain. These structures were targeted by 45 drugs (Supplementary Table S6).
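The comparison described in Section 2.2 (23 viral structures against 25,800 drug-target structures, with a 0.9 threshold on the mean persistent similarity) can be sketched as follows. The structure counts come from the text; everything else is an assumption made for illustration: the function name `similar_pairs`, the array layout with one similarity value per PBF, and the toy scores are hypothetical, and the actual similarity measure is the one defined in Section 4.2.

```python
import numpy as np

N_VIRAL = 23          # viral structures with PBFs (from the text)
N_TARGET = 25_800     # drug-target structures with PBFs (from the text)
n_pairs = N_VIRAL * N_TARGET  # number of structure pairs to compare

def similar_pairs(sim, threshold=0.9):
    """Return (viral_index, target_index) pairs whose mean similarity exceeds threshold.

    sim: array of shape (n_viral, n_target, n_pbf) holding one persistent
    similarity per PBF; the mean is taken over the last axis.
    """
    mean_sim = np.asarray(sim, dtype=float).mean(axis=-1)
    rows, cols = np.nonzero(mean_sim > threshold)
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical scores for 2 viral and 3 target structures, 3 PBFs each.
toy = np.array([
    [[0.95, 0.93, 0.91], [0.99, 0.80, 0.70], [0.50, 0.60, 0.70]],
    [[0.10, 0.20, 0.30], [0.92, 0.94, 0.96], [0.91, 0.90, 0.88]],
])
hits = similar_pairs(toy)
```

Averaging before thresholding means a pair with one very high score but two mediocre ones (such as the second target for the first viral structure above) is rejected, which matches the intent of a stringent cut-off.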


Drugs known to target proteins presenting a mean of the persistent similarity measures larger than 0.9 with the SARS-CoV-2 structures were subjected to blind docking with the viral proteins. Blind docking was carried out using the complete viral protein and drug structure information preprocessed as detailed in Section 4, which included polar hydrogen addition. A set of potential repurposable candidates was then selected based on the topological similarity criterion (a mean of the persistent similarity measures), the correlations between the transcriptomic profiles observed in patients infected by SARS-CoV-2 and those generated by treating cell lines with the candidate drugs, and the results of the blind docking analyses. Therefore, the selected candidates are known to target proteins with large topological similarities with a specific viral protein, present high affinities with the viral structures, and have the capacity to partially revert the transcriptomic effects induced by the viral infection. Figure 2 provides a schematic overview of the narrowing-down process followed to identify the final 16 drug candidates. Furthermore, the full description of the candidates can be consulted in Table 2.

We identified six repurposable candidates to target the 3CL viral protease (6M2Q). Cholic acid, an amphipathic sterol, presented the strongest binding energy (BE = −15.06 kcal/mol), and was found to negatively correlate with transcriptomic dataset 2 (DS2 r = −0.11). Rutin (BE = −14.52 kcal/mol, DS2 r = −0.184, DS3 r = −0.1), a flavonoid-3-O-glycoside with known antioxidant and cytoprotective activity, was also selected [11,12]. Two non-steroidal anti-inflammatory drugs, indomethacin (BE = −13.31 kcal/mol, DS2 r = −0.12) and sulindac (BE = −13.14 kcal/mol, DS2 r = −0.12), were also identified. Whereas indomethacin presents antipyretic and analgesic properties [13], sulindac is used to treat conditions that involve chronic inflammation, such as arthritis [14]. Finally, sulfisoxazole (BE = −11.59 kcal/mol, DS2 r = −0.13), a sulfanilamide used as a broad-spectrum antibiotic, and dasatinib (BE = −10.94 kcal/mol, DS2 r = −0.15), a tyrosine kinase inhibitor indicated for the treatment of chronic myeloid leukaemia [15], were also identified as drugs with the potential of targeting the viral 3CL protease.

Five compounds were found to be candidates to target the SARS-CoV-2 NSP15 endoribonuclease (6W01), which included two corticosteroids, dexamethasone (BE = −11.42 kcal/mol, DS2 r = −0.15) and spironolactone (BE = −10.99 kcal/mol, DS1 r = −0.12 and DS2 r = −0.1), which are indicated for the treatment of allergies and asthma and resistant hypertension, respectively [14,16,17]; phenolphthalein (BE = −11.15 kcal/mol, DS1 r = −0.13), a compound historically used as a laxative [18]; mifepristone (BE = −10.04 kcal/mol, DS1 r = −0.13, DS2 r = −0.14), a synthetic steroid progesterone antagonist drug that is indicated for Cushing's syndrome and is also used as an emergency contraceptive pill [19,20]; and, finally, carbamazepine (BE = −9.66 kcal/mol, DS2 r = −0.15), a pharmacologically active molecule related to the group of tricyclic antidepressants, mainly used as anticonvulsant [14,21].

Lastly, the analysis of the NSP12 RNA-dependent RNA polymerase (6M71) yielded multiple antineoplastic drugs as possible repurposing candidates: vemurafenib (BE = −8.09 kcal/mol, DS2 r = −0.16), a BRAF inhibitor [22,23]; sorafenib (BE = −7.34 kcal/mol, DS1 r = −0.11, DS2 r = −0.15), a multitarget protein kinase inhibitor [24]; levonorgestrel (BE = −7.21 kcal/mol, DS2 r = −0.14), a synthetic progestogen used as a first-line oral emergency contraceptive pill [14]; the opioid antagonist naloxone (BE = −7.07 kcal/mol, DS2 r = −0.11); and raloxifene (BE = −7.05 kcal/mol, DS1 r = −0.13 and DS2 r = −0.17), a selective estrogen receptor modulator mainly used to treat osteoporosis in postmenopausal women and avoid bone loss [25]. Supplementary File 2 shows the interacting residues between the three viral proteins and the 16 drugs identified as potential repurposing candidates.

#### *2.3. Transcriptomic Data Analysis Results*

Differential gene expression analyses were carried out with the three identified datasets, which included samples infected with SARS-CoV-2 and uninfected controls, and were followed by Gene Set Enrichment Analysis (GSEA) and LINCS L1000 analysis. GSEA allows the identification of coordinated changes in the expression of genes belonging to specific biological processes and pathways in case samples compared to controls. GSEA results are reported using the Normalized Enrichment Score (NES) and the *p*-value adjusted for multiple comparisons (p-adj). LINCS L1000 analyses aim to find drugs capable of reverting the transcriptomic effects produced by SARS-CoV-2 infection.

Differential gene expression analysis of DS1 yielded 451 deregulated genes (DEGs), of which 213 were found to be upregulated and 238 were downregulated in SARS-CoV-2-infected samples compared to controls. The top upregulated genes were derived from the virus open reading frames. GSEA showed that pathways linked to the immune response were heavily upregulated in SARS-CoV-2-infected samples. Instances of such pathways included immune response mediated by circulating immunoglobulin (p-adj = 1.8 × 10<sup>−25</sup>), B-cell-mediated immunity (p-adj = 3.2 × 10<sup>−22</sup>), and adaptive immune response (p-adj = 2.0 × 10<sup>−20</sup>). The FDA-approved drugs showing the strongest negative correlation in LINCS L1000 analysis were niclosamide, bisacodyl, and perhexiline (r = −0.21, −0.19, and −0.18, respectively). GSEA of the transcriptomic signatures produced by these medications suggested that they induce significant gene expression changes in pathways linked to interleukin signaling and NF-κB activation. Genes included in the set of potential 105 therapeutics for SARS were also found to be upregulated in the bisacodyl signature (NES = 1.61, p-adj = 2.19 × 10<sup>−2</sup>). The JAK-STAT complex and the TCF-dependent signaling pathways were found to be downregulated in the perhexiline and niclosamide signatures, respectively.
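The reversal criterion used above, where a drug whose signature correlates negatively with the infection signature is a candidate for reverting it, can be illustrated with a plain Pearson correlation. This is a simplified sketch rather than the LINCS L1000 procedure itself: the function name `rank_by_reversal`, the drug labels, and the five-gene signatures are hypothetical, and the real analysis operates on much larger signatures with its own preprocessing.

```python
import numpy as np

def rank_by_reversal(disease_signature, drug_signatures):
    """Rank drugs by Pearson correlation with a disease signature, most negative first."""
    disease = np.asarray(disease_signature, dtype=float)
    scores = {}
    for drug, signature in drug_signatures.items():
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is Pearson r.
        r = np.corrcoef(disease, np.asarray(signature, dtype=float))[0, 1]
        scores[drug] = float(r)
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical log-fold-change values over the same five genes.
disease = [2.0, 1.0, 0.0, -1.0, -2.0]
drugs = {
    "drug_reverting": [-2.0, -1.0, 0.0, 1.0, 2.0],   # mirrors the disease signature
    "drug_mimicking": [1.0, 0.5, 0.0, -0.5, -1.0],   # same direction as the disease
    "drug_unrelated": [1.0, -2.0, 2.0, -2.0, 1.0],   # uncorrelated in this toy case
}
ranking = rank_by_reversal(disease, drugs)
```

The drug at the top of `ranking` is the one whose expression changes most closely oppose those of the disease, which is the property the negative r values reported in this section are meant to capture.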

**Figure 2.** Schematization of the narrowing-down process followed to identify the final 16 drug candidates. **Table 2.** Drug repurposing candidates based on the topological, trascriptomic, and docking criteria. PC: Pearson correla-**Figure 2.** Schematization of the narrowing-down process followed to identify the final 16 drug candidates.

tion. LE: Lowest energy conformation in the cluster. Candidates with a PC of <−0.1 may revert the transcriptomic effects of SARS-CoV-2 infection. The maximum number of the AutoDock cluster is 150. Drug ID (colum 2) encodes the DrugBank ID of the corresponding drug (column 1). **6M2Q (SARS-CoV-2 3CL Protease) Drug Name Drug ID PC DS1 (GSE150316) PC DS2 (CRA002390) PC DS3 (GSE147507) AutoDock LE (kcal/mol) AutoDock cluster** CholicAcid DB02659 –0.09 –0.11 –0.08 –15.06 74 Rutin DB01698 –0.07 –0.18 –0.1 –14.52 149 Indomethacin DB00328 –0.07 –0.12 –0.05 –13.31 146 Sulindac DB00605 –0.07 –0.12 –0.07 –13.14 73 Sulfisoxazole DB00263 –0.05 –0.13 –0.09 –11.59 77 Dasatinib DB01254 –0.04 –0.15 –0.09 –10.94 43 **6W01 (NSP15 Endoribonuclease)** Dexamethasone DB01234 –0.07 –0.15 −0.08 −11.42 49 Phenolphthalein DB04824 –0.13 –0.1 −0.04 −11.15 101 Spironolactone DB00421 –0.12 –0.1 −0.09 −10.99 110 Mifepristone DB00834 –0.13 –0.14 −0.06 −10.04 28 Carbamazepine DB00564 –0.08 –0.14 −0.07 −9.66 86 **6M71 (NSP12 RNA-dependent RNA polymerase)** Vemurafenib DB08881 −0.09 −0.16 −0.08 −8.09 13 Sorafenib DB00398 −0.11 −0.15 −0.05 −7.34 30 Levonorgestrel DB00367 −0.08 −0.14 −0.08 −7.21 89 Naloxone DB01183 −0.06 −0.12 −0.09 −7.07 69 Raloxifene DB00481 −0.13 −0.17 −0.07 −7.05 6 We identified six repurposable candidates to target the 3CL viral protease (6M2Q). Cholic acid, an amphipathic sterol, presented the strongest binding energies (BE = −15.06 A total of 8380 DEGs were identified in the DS2 analysis. A total of 4606 genes were found to be upregulated, and 3774 were found to be downregulated in SARS CoV-2 infected samples compared to uninfected controls. Upregulated genes were enriched in components of the humoral immune response, epidermis development, keratinization, and B-cellmediated immunity (p-adj = 1.1 <sup>×</sup> <sup>10</sup>−20, 8.2 <sup>×</sup> <sup>10</sup>−20, 1.3 <sup>×</sup> <sup>10</sup>−18, 2.5 <sup>×</sup> <sup>10</sup>−10, respectively), among others. 
The top negatively correlated drugs included instances of several different compound families, such as anti-inflammatories (phenylbutazone, r = −0.21), antidiabetics (troglitazone, r = −0.20), antimalarials (chloroquine, r = −0.20), and other compounds such as nicotine (r = −0.17). Treatment with phenylbutazone was found to upregulate the gene expression of genes included in the interleukin-12 and 17 signaling pathways. In contrast, interleukin-4 and 13 signaling-related genes tended to be downregulated by chloroquine treatment (NES = <sup>−</sup>1.45, p-adj = 4.30 <sup>×</sup> <sup>10</sup>−<sup>2</sup> ). Genes involved in the viral mRNA translation and the ISG15 antiviral mechanism were also upregulated in the gene expression profiles induced by treatment with chloroquine, phenylbutazone, and troglitazone. In addition, the SARS-CoV infection pathway was found to be upregulated in samples treated by chloroquine and troglitazone. ADORA2B-mediated anti-inflammatory cytokine production-related genes were downregulated by the treatment of the three top negatively correlated drugs.

kcal/mol), and was found to negatively correlate with transcriptomic dataset 2 (DS2 r = −0.11). Rutin (BE = −14.52 kcal/mol, DS2 r = −0.184 DS3 r = −0.1), a flavonoid-3-o-glycoside with known antioxidant and cytoprotective activity, was also selected [11,12]. Two nonsteroidal anti-inflammatory drugs, indomethacin (BE = −13.31 kcal/mol, DS2 r = −0.12) and sulindac (BE = −13.14 kcal/mol, DS2 r = −0.12), were also identified. Whereas indomethacin presents antipyretic and analgesic properties [13], sulindac is used to treat conditions that DS3 presented the lowest yield in terms of differentially expressed genes. A total of 188 genes were found to be upregulated to controls, whereas 31 genes were found to be downregulated in infected samples compared to controls. Twenty-nine biological processes were found to be significantly upregulated and were mainly linked to mechanisms aimed to

fight the viral infection and immune system-related processes including, defense response to virus (p-adj = 7.2 <sup>×</sup> <sup>10</sup>−13), myeloid leukocyte-mediated immunity (p-adj = 8.8 <sup>×</sup> <sup>10</sup>−15), regulation of cytokine production (p-adj = 1.5 <sup>×</sup> <sup>10</sup>−<sup>8</sup> ), and response to interferon-gamma (p-adj = 1.9 <sup>×</sup> <sup>10</sup>−<sup>8</sup> ), among others. Chloroquine was found to be the top negatively correlated drug (r = −0.11), followed by others such as pazopanib, spectinomycin, and troglitazone (r = −0.11, −0.11, −0.10, respectively). The correlations observed in this dataset tended to be weaker than those computed for DS1 and DS2. GSEA analyses of the drug signatures showed that troglitazone increased the expression of genes classified as potential therapeutics for SARS (NES = 1.46, p-adj = 4.65 <sup>×</sup> <sup>10</sup>−<sup>2</sup> ), in addition to antiviral pathways such as the ISG15 and IFN-stimulated antiviral mechanisms. Spectinomycin was found to reduce the expression of interferon-gamma signaling 135 and interleukin 2, 3, and 5 pathway-related genes, whereas pazopanib was found to upregulate viralrelated pathways such as viral mRNA translation influenza and SARS-CoV-2 infection. Supplementary File 1 includes the complete differential gene expression and enrichment analysis results for transcriptomic datasets 1, 2, and 3, whereas Supplementary File 2 contains the full LINCS L1000 analysis information.

**Table 2.** Drug repurposing candidates based on the topological, trascriptomic, and docking criteria. PC: Pearson correlation. LE: Lowest energy conformation in the cluster. Candidates with a PC of <−0.1 may revert the transcriptomic effects of SARS-CoV-2 infection. The maximum number of the AutoDock cluster is 150. Drug ID (colum 2) encodes the DrugBank ID of the corresponding drug (column 1).


#### *2.4. GSEA Analysis of the Repurposing Candidates*

We determined the transcriptomic impact of treatment with the selected candidates on two sets of biological processes linked to COVID-19, viral infections, and immune-related pathways by performing Gene Set Enrichment Analysis (GSEA) of their gene expression signatures derived from LINCS L1000. The transcriptomic profiles generated by cholic acid, rutin, sulfisoxazole (sulfafurazole), and sulindac treatment (candidates to target the 3CL protease) were found to be enriched in the ISG15 antiviral mechanism. Furthermore, genes related to interleukin-1 and 12 signaling tended to be upregulated in rutin's signature, in addition to genes belonging to the potential therapeutics for SARS gene set (NES = 1.51, p-adj = 3.85 × 10<sup>−2</sup>), whereas WNT ligand biogenesis and trafficking genes were found to be downregulated by rutin treatment (NES = −1.99, p-adj = 2.12 × 10<sup>−3</sup>) (Supplementary Table S7). The signatures of the RNA-dependent RNA polymerase drug candidates levonorgestrel and raloxifene were found to be enriched in pathways related to antiviral processes, such as the ISG15 antiviral mechanism (levonorgestrel, NES = 2.08, p-adj = 9.95 × 10<sup>−4</sup>; raloxifene, NES = 2.06, p-adj = 8.13 × 10<sup>−4</sup>) and the antiviral mechanism by IFN-stimulated genes (levonorgestrel, NES = 1.95, p-adj = 1.22 × 10<sup>−3</sup>; raloxifene, NES = 1.94, p-adj = 1.12 × 10<sup>−3</sup>). In addition, interferon alpha/beta signaling was observed to be depleted in raloxifene-treated cells (NES = −1.52, p-adj = 4.59 × 10<sup>−2</sup>) (Supplementary Table S8). Finally, in the case of NSP15 endoribonuclease candidate drugs, dexamethasone produced gene expression signatures upregulated in pathways associated with viral infection response, such as the ISG15 antiviral mechanism (NES = 1.82, p-adj = 3.17 × 10<sup>−3</sup>) and the antiviral mechanism by IFN-stimulated genes (NES = 1.59, p-adj = 1.20 × 10<sup>−2</sup>). This pathway was also found to be upregulated in the gene expression profiles of carbamazepine and mifepristone. In contrast, interleukin-7 signaling (NES = −1.64, p-adj = 3.47 × 10<sup>−2</sup>) and interferon alpha/beta signaling (NES = −1.68, p-adj = 5.48 × 10<sup>−3</sup>) were downregulated by dexamethasone treatment (Supplementary Table S9). Figure 3 shows a dot plot representation of the GSEA results.

**Figure 3.** Gene Set Enrichment Analysis (GSEA) results for candidate drugs for 6M2Q, 6M71, and 6W01 SARS-CoV-2 structures with the expression signature yields from correlation analyses from DS2. Reactome pathways related to the immune system and viral infections. Only drugs with at least one pathway with an adjusted *p*-value < 0.05 are displayed. The GSEA table with the results is available in Supplementary Tables S7–S9.

#### **3. Discussion**

On 31 December 2019, the World Health Organization (WHO) was officially notified about several cases of pneumonia in Wuhan City, China, later attributed to COVID-19, a disease that had neither an effective treatment nor a specific vaccine at that time, and whose history and search for a cure are constantly being rewritten. As specific antiviral treatments are still under development and the vaccination campaign has faced difficulties derived from unmet production and distribution forecasts, drug repurposing strategies based on FDA-approved drugs continue to be a valuable option to find candidate drugs for the effective treatment of COVID-19 in a short timeframe.

Here, we report a novel TDA-based strategy for drug repurposing in combination with current methodologies of molecular docking, differential expression analysis of SARS-CoV-2-infected cells, and correlation with the transcriptomic profiles of FDA-approved drugs. Our results indicate that the proposed TDA-based formalism is a promising tool to address biological problems from a dual perspective. First, from a structural biology perspective, we used the Vietoris–Rips complex to compute the PBF encoding the shape of each protein structure. Next, to measure the degree of similarity between proteins, we introduced the persistent similarity measure (PSM, see Section 4.2). This allowed us to classify proteins based solely on the Cα atomic coordinates. TDA-based methods have previously been proposed to study the topological invariants of the three-dimensional structure of biomolecules. Several studies have employed this framework to classify protein structures using only the three-dimensional coordinates of the atoms from crystallographically resolved proteins. For instance, Xia and collaborators applied TDA-based methods to three-dimensional biomolecular structures to study their structural characteristics, flexibility prediction, and folding properties [10]. To this end, they defined molecular topological fingerprints (MTFs) to extract the topological information from protein structures using the so-called persistent Betti numbers [26]. K. Dey and colleagues proposed another topology-based method to generate protein signatures to create a fast domain classifier using a support vector machine [27]. Interestingly, our mean persistent similarity metric achieved results comparable to those obtained by the state-of-the-art structural alignment method DALI [28], and showed high predictive power when clustering proteins with respect to external classifications.

Molecular docking simulation is a rapid screening method to test compound binding activity. Additionally, transcriptomic data represent a rich alternative resource for inferring non-obvious relationships between drugs and genes. Previous in silico molecular docking studies have highlighted the potential of repurposed drugs for the treatment of COVID-19 [29–35]. Here, however, we combined in silico molecular docking with transcriptomic small-molecule treatment data from LINCS L1000 to determine which FDA-approved drugs may reverse the effects of SARS-CoV-2 infection. The gene expression profiles in response to the identified drugs support the docking results and offer a plausible perspective on the pathways associated with protein responses to drugs binding to SARS-CoV-2 proteins. To our knowledge, this is the first time that barcode-based similarity measures have been applied to the analysis of large datasets of PDB structures.

The generation of PBFs depends on the prior construction of Vietoris–Rips complexes, whose storage cost scales exponentially with the number of points defining a particular structure. Moreover, in the worst case, the standard algorithm to compute the barcodes has cubic complexity in the number of simplices. Although our analyses were carried out on a cluster with 32 cores and up to 500 GB of RAM, the barcode generation for the excluded 1622 genes exceeded the available amount of RAM or required an exponential amount of runtime.

Among all of the SARS-CoV-2 proteins analyzed (*n* = 23, Table 1), only three showed a persistent similarity score above 0.9 against other protein structures targeted with known drugs. Interestingly, these proteins are key components in coronavirus replication and structural assembly: the viral 3CL protease (6M2Q), a chymotrypsin-like protease that is essential for the production of non-structural proteins [36]; the nsp12 RNA-dependent RNA polymerase (6M71), the main component of the coronavirus replication and transcription machinery and, because of that, an excellent target for new therapeutics [37]; and the nsp15 endoribonuclease (6W01), a protein with a poorly defined role in SARS-CoV-2 infection, but which has been linked to pRB downregulation affecting host cell cycle division and coronavirus infection in other coronaviruses (SARS-CoV), and which acts as an antagonist of host dsRNA sensors during coronavirus infection in macrophages to evade innate immune system defenses [38,39]. Hence, in this study, we selected these three proteins from the SARS-CoV-2 coronavirus as the best candidates to find repurposed drugs to combat the disease.

Our differential expression analyses revealed that troglitazone, niclosamide, and chloroquine, among multiple candidates, were the top negatively correlated drugs that may revert the effects of SARS-CoV-2 infection on the cell transcriptome. Chloroquine is already under study in several clinical trials, although recent results reported by the WHO SOLIDARITY study stated that chloroquine has no significant effect on overall mortality in hospitalized COVID-19 patients [2]. Niclosamide is also being evaluated in a Phase 2 clinical trial [40]. In addition, the antiviral activity of niclosamide has been demonstrated against SARS-CoV in in vitro studies [41], in recent investigations against SARS-CoV-2 [42], and previously against MERS coronavirus [43].

To date, no therapeutic agents have been proven to be effective against SARS-CoV-2. Several treatments identified through drug repurposing strategies are under investigation specifically to treat COVID-19 [44,45] and, as this draft is being written, up to 700 research papers have already been published. The number of clinical trials using repurposed drugs such as hydroxychloroquine, remdesivir, and lopinavir/ritonavir, among others, alone or in combination, is also growing exponentially, although unfortunately in most cases the results are not as good as initially expected [46–48]. Recently, a new treatment, plitidepsin, has been reported as the most potent antiviral drug against the coronavirus [49].

Our most promising candidates arise from the combination of molecular docking and transcriptomic results with the cornerstone of our work, the TDA-based formalism. Among the 16 compounds related to the three SARS-CoV-2 proteins analyzed, nine have been described as possible candidates in other repurposing studies, and five of these have already shown antiviral activity or have already been described as possible COVID-19 treatments (Supplementary Table S10), although preclinical studies will be required to determine their efficacy. In addition, 3 of the 16 compounds are being evaluated in different clinical trials (indomethacin (*n* = 2), dexamethasone (*n* = 40), and spironolactone (*n* = 4)).

Rutin and indomethacin were among the notable compounds selected for the 3CL main protease. In addition, they have been identified as good candidates in other studies. Rutin is a polyphenolic flavonoid that has shown a wide range of pharmacological applications due to its significant antioxidant properties [50]. Our results from GSEA revealed that rutin might act in the early stages of SARS-CoV-2 infection by activating the interferon-induced ISG15 pathway. ISG15 is an interferon-induced protein that has been implicated as a central player in the host antiviral response, and is the key element of the innate immune response against viral infection [51]. Furthermore, ISG15 modulates the immune system by stimulating IFN-gamma production by NK cells, which promotes the early viral response [52]. Although a possible interaction between rutin and the 3CL protease has been reported by other studies using an in silico approach [53], our results provide a transcriptomic dimension to the possible effect of rutin during infection with SARS-CoV-2. Moreover, to our knowledge, this is the first time the natural compound rutin has been related to the antiviral activity induced by the protein ISG15.

Dexamethasone, a corticosteroid used in a wide range of conditions for its anti-inflammatory and immunosuppressive effects, could be one of the most promising repurposed drugs chosen to treat COVID-19, based on results showing a decrease in the incidence of death versus the usual care group among patients receiving invasive mechanical ventilation [54]. This compound was chosen because of its immunosuppressant properties to treat the cytokine storm induced by the immune response to coronavirus infection in late stages of the disease. Nonetheless, our results indicated that dexamethasone could also be a good candidate to target the nsp15 endoribonuclease, although some repurposing studies have also suggested that it targets the main protease [55]. These data could support the idea of administering corticosteroids not just at the advanced infection stage, but also at the beginning. However, a recent study tested multiple steroid-derived pharmacological compounds in vitro and demonstrated that dexamethasone has no antiviral activity against SARS-CoV-2 [56]. Nevertheless, we also found other steroid compounds that could interact with the nsp15 protein, such as mifepristone, which suppressed viral growth, conferring a cell survival rate of more than 95% after viral infection and drug administration in vitro [56].

Lastly, the RNA-dependent RNA polymerase nsp12 of SARS-CoV-2 is a protein that performs essential functions in the coronavirus life cycle with no host cell homolog. This is an advantage for antiviral drug development, reducing the risk of affecting any protein present in human cells, as has been proven by many drug repurposing studies directed against nsp12 RdRP [57–60]. Vemurafenib, sorafenib, and raloxifene may be potential candidates against nsp12 RdRP. Vemurafenib can disturb the cellular Raf/MEK/ERK signaling cascade via binding in the ATP-binding site of BRAF(V600E) kinase and inhibiting its function [61], whereas sorafenib is another kinase inhibitor that targets VEGFR, PDGFR, and RAF kinases [62]. Interestingly, SARS-CoV-1 uses Raf/MEK/ERK signaling pathways to promote its replication via various mechanisms, indicating that this signaling cascade is a critical therapeutic target for host-directed SARS-CoV-2 antivirals [63–65].

#### **4. Materials and Methods**

#### *4.1. Data Acquisition*

DrugBank queries were carried out to retrieve the information regarding drugs with known protein targets [66]. In short, the DrugBank database version 5.1.5 (https://go.drugbank.com/releases/5-1-5, accessed on 21 March 2020) was downloaded in XML format, and the dbparser package and custom R scripts were employed to extract the relevant information [67]. We only selected drugs approved by the U.S. Food and Drug Administration (FDA) and retrieved the names and UniProt identifiers of their protein targets. Then, UniProt IDs were mapped to their respective Protein Data Bank (PDB) structures using the Retrieve/ID mapping tool available at UniProt. All of the PDB structures targeted by FDA-approved drugs were downloaded in PDB format and stored for downstream analysis. Protein Data Bank queries were also performed to identify the three-dimensional structures of SARS-CoV-2 proteins.

#### *4.2. A Topological Data Analysis Based Formalism to Compare, at Quantitative Level, the Homological Similarities of Pairwise Three-Dimensional Molecules Considered as Surfaces*

In this paper, we used an adapted TDA-based strategy that combines concepts and results from Algebraic Topology to compare three-dimensional protein structures [68–70]. More precisely, we considered the shape of the protein structure as a surface for which we only know a sample of points, given by the coordinates of its Cα atoms. Using this information, we construct a set of simplicial complexes associated with that protein. This set is composed of three classes of geometrical objects: isolated points, non-intersecting segments connecting these points, and non-intersecting triangles composed of non-intersecting segments. To quantify the above geometrical information, we associate a non-negative continuous function with each of the three components of a simplicial complex. The first function, denoted by $f_0$, represents the structure of the position of the individual points; the second function, $f_1$, corresponds to the non-intersecting segments; and the third function, $f_2$, corresponds to the triangles. These three functions are called the persistent Betti functions (PBFs) and allow us to characterize the representation of a protein's tertiary structure.
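To make the idea of a Betti function concrete, the following Python sketch evaluates a simple step-function version: at each filtration value it counts how many barcode intervals are alive. This is only an illustration under stated assumptions. The half-open interval convention, the name `betti_function`, and the toy barcode are hypothetical, and the continuous PBFs used in this work (computed in R) may be constructed differently.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (birth, death) of one barcode bar


def betti_function(barcode: Iterable[Interval], grid: Iterable[float]) -> List[int]:
    """Count the bars alive at each grid value, using birth <= x < death."""
    bars = list(barcode)
    return [sum(1 for birth, death in bars if birth <= x < death) for x in grid]


# Toy dimension-0 barcode (hypothetical values, not taken from a real protein).
toy_barcode = [(0.0, 1.0), (0.0, 2.5), (0.0, 4.0)]
toy_grid = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_curve = betti_function(toy_barcode, toy_grid)
```

One such curve would be computed per homological dimension (0, 1, and 2), on a grid shared by all structures so that the curves can be compared point by point.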

Therefore, we computed the persistent Betti functions using PDB structures from DrugBank. To compare the shape of two structures, one given by the PBFs $\{f_i\}_{i=0}^{2}$ of a structure from DrugBank and the other by the PBFs $\{f_i^{SARS\text{-}CoV\text{-}2}\}_{i=0}^{2}$ of a SARS-CoV-2 protein, we construct the persistent similarity measure (PSM), which is defined as

$$PSM_i = \frac{\int \min\left(f_i(\mathbf{x}), f_i^{SARS\text{-}CoV\text{-}2}(\mathbf{x})\right) d\mathbf{x}}{\int \max\left(f_i(\mathbf{x}), f_i^{SARS\text{-}CoV\text{-}2}(\mathbf{x})\right) d\mathbf{x}} \text{ for } i = 0, 1, 2. \tag{1}$$

Then, we calculate the mean of the persistent similarity measures:

$$\overline{PSM} = \frac{1}{3}(PSM_0 + PSM_1 + PSM_2) \tag{2}$$

for each protein comparison. A threshold of $\overline{PSM} \geq 0.9$ was established: drugs whose target protein had a mean persistent similarity measure of 0.9 or higher with a SARS-CoV-2 protein were considered drug repurposing candidates.
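Equations (1) and (2) can be approximated numerically once both sets of functions are sampled on a common, equally spaced grid. The Python sketch below is a minimal illustration under those assumptions; the trapezoidal rule, the function names, and the handling of two identically zero functions are choices made here for illustration and are not taken from the original R implementation.

```python
from typing import Sequence


def trapezoid(values: Sequence[float], step: float) -> float:
    """Integrate equally spaced samples with the trapezoidal rule."""
    if len(values) < 2:
        return 0.0
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def psm(f: Sequence[float], g: Sequence[float], step: float = 1.0) -> float:
    """Ratio of the integrals of the pointwise min and max, as in Equation (1)."""
    if len(f) != len(g):
        raise ValueError("both functions must be sampled on the same grid")
    lower = [min(a, b) for a, b in zip(f, g)]
    upper = [max(a, b) for a, b in zip(f, g)]
    denominator = trapezoid(upper, step)
    # Assumption: two identically zero functions are treated as identical.
    return 1.0 if denominator == 0 else trapezoid(lower, step) / denominator


def mean_psm(target_pbfs, viral_pbfs, step: float = 1.0) -> float:
    """Average the per-dimension PSM values, as in Equation (2)."""
    scores = [psm(f, g, step) for f, g in zip(target_pbfs, viral_pbfs)]
    return sum(scores) / len(scores)
```

With three sampled functions per structure (dimensions 0, 1, and 2), `mean_psm` returns a value between 0 and 1 that can be compared against the 0.9 threshold.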

#### *4.3. Data Preprocessing and Persistent Similarity Measures Computation*

All protein structures in PDB format were loaded into the R environment using the bio3d package [71]. Then, the coarse-grain representation of each structure was generated by selecting only the three-dimensional atomic coordinates of the alpha-carbons of the amino acids [26]. Two main reasons compelled us to work with this reduced representation. First, the construction of simplicial complexes scales exponentially with the number of initial points present in the point cloud. Therefore, structures defined by a very large number of points are not computationally tractable even on state-of-the-art computers. Second, all-atom models present a high degree of detail that could mask the general structure of the protein. Barcodes were constructed using the TDAstats R package [72]. Internally, TDAstats uses the Ripser C++ library [73], an optimized software package for the fast construction of simplicial complexes and barcodes.
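The coarse-graining step can be illustrated with a short Python sketch that reads the fixed-width ATOM records of a PDB file and keeps only the alpha-carbon coordinates. It is a simplified stand-in for the bio3d-based R workflow described above; the function name `alpha_carbons` is hypothetical, and alternate locations and multi-model files are deliberately not handled.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def alpha_carbons(pdb_text: str) -> List[Point]:
    """Return the (x, y, z) coordinates of every C-alpha ATOM record."""
    coords = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based, inclusive): record name 1-6,
        # atom name 13-16, x 31-38, y 39-46, z 47-54.
        # HETATM records are skipped, which excludes calcium ions named "CA".
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords
```

The resulting point cloud (one point per residue) is the kind of input from which a Vietoris–Rips barcode would then be computed.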

#### *4.4. Protein–Ligand Binding with AutoDock 4.2*

Ligand preparation was carried out as follows. First, the FDA-approved drugs in SDF format were retrieved from DrugBank. A custom R script and Open Babel v.3.0.0 were used to transform SDF into the mol2 format [74–77]. Next, the MGLTools v.1.5.7 toolkit was employed to add polar hydrogens and assign protonation at pH 7.4. Then, mol2 drug structures were converted into PDBQT format, and their stereochemical properties were computed using AutoDock 4.2 [78]. A virtual screening library was then constructed using the preprocessed drug structures. Drugs containing atoms other than those in the following list (H, C, N, O, F, Mg, P, S, Cl, Ca, Mn, Fe, Zn, Br, I) were discarded from the subsequent analyses because AutoDock does not include force field parameters for them and is, therefore, unable to perform molecular docking with them. Polar hydrogens were also added to the SARS-CoV-2 protein PDB structures, which were also transformed to the PDBQT format. Docking was carried out using AutoDock 4.2 [78], a molecular docking software package developed by the Scripps Research Institute. A grid box spanning the whole protein structure was set to perform blind docking. AutoDock was configured following the manual recommendations [79]. We increased the parameter ga\_runs from 10 to 150 to improve the accuracy of the results.
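The element filter described above can be sketched in a few lines of Python. The element list is the one given in the text; the function names and the dictionary-based library (drug name mapped to its element symbols) are hypothetical simplifications, since the original workflow operated on SDF/mol2 files with R and Open Babel.

```python
SUPPORTED_ELEMENTS = frozenset(
    ["H", "C", "N", "O", "F", "Mg", "P", "S", "Cl", "Ca", "Mn", "Fe", "Zn", "Br", "I"]
)


def is_dockable(elements):
    """True when every atom of the ligand belongs to the supported element list."""
    return all(element in SUPPORTED_ELEMENTS for element in elements)


def filter_library(ligands):
    """Keep only the ligands (name -> element symbols) made of supported elements."""
    return {name: atoms for name, atoms in ligands.items() if is_dockable(atoms)}
```

For example, a ligand containing platinum would be dropped from the screening library, whereas a purely organic ligand would be kept.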

#### *4.5. Differential Gene Expression Analyses of SARS-CoV-2 Infected Human Samples and Cell Lines and Uninfected Controls*

We carried out searches for transcriptomic datasets of patients and human-derived cell lines including samples infected with SARS-CoV-2 and uninfected controls. At the time the searches were carried out, three datasets were identified. Dataset 1 (DS1) was found in the Gene Expression Omnibus (GEO) under ID GSE150316 [80]. This includes formalin-fixed paraffin-embedded samples from multiple tissues (i.e., lung, jejunum, heart) derived from SARS-CoV-2-infected individuals and uninfected controls obtained in autopsies. We restricted our analysis to lung samples. Twenty-one samples (16 cases and five controls) were selected for downstream analysis.

Dataset 2 (DS2) gathers samples derived from bronchoalveolar lavage fluids (BALF) of SARS-CoV-2-infected patients (four samples derived from two patients with two technical replicates) and three healthy controls [81]. Samples derived from infected patients were stored at the National Genomics Data Center under accession number CRA002390, whereas control samples were downloaded from the NCBI SRA database and were available under the identifiers SRR10571724, SRR10571730, and SRR10571732. Sequence alignment using the human reference genome GRCh38 and count extraction were carried out using the Rsubread package [82].

Finally, the third dataset (DS3) was available in GEO under accession ID GSE147507 [83]. It presented a complex design including both primary cell lines derived from the human lung epithelium and transformed lung alveolar cell lines, which were either mock treated or infected with different viruses including the influenza A virus (IAV), the respiratory syncytial virus (RSV), and SARS-CoV-2, in addition to samples derived from infected ferrets and two technical replicates of a lung sample derived from a SARS-CoV-2-infected human patient. We restricted our analysis to the cell lines NHBE, A549, and Calu-3, which were either infected with SARS-CoV-2 or mock treated. The infected human lung samples and the healthy lung biopsies were also included. Overall, 28 samples were analyzed in this dataset.

For each dataset, differential gene expression analysis between SARS-CoV-2 infected samples and uninfected controls was carried out using the DESeq2 package [84].

#### *4.6. Identification of LINCS L1000 Signatures Negatively Correlated with the SARS-CoV-2 Differential Gene Expression Profiles*

LINCS L1000 contains an extensive collection of gene expression profiles generated using thousands of perturbagens (i.e., small molecules, ligands, micro-environments, CRISPR gene over-expression, and knockdown perturbations) and different cell lines, doses, and exposure times [85]. In particular, LINCS L1000 Level 5 data include differential gene expression signatures computed by comparing three technical replicates of the same perturbation to appropriate controls. Level 5 LINCS L1000 phase I (GSE92742) and phase II (GSE70138) datasets were downloaded from GEO. Signatures involving FDA-approved drugs were identified with the help of the information contained in the files *repurposing\_drugs\_20180907.txt* and *repurposing\_samples\_20180907.txt* available at the LINCS L1000 repurposing hub [85] (see Supplementary Materials). DrugBank and LINCS L1000 data were merged based on PubChem compound identifiers. Then, the subset of signatures corresponding to FDA-approved medications with known PubChem identifiers was selected. Overall, we obtained 52,144 expression signatures generated using 1313 approved drugs. To identify drugs with the potential of reverting the differential expression profiles generated by SARS-CoV-2 infection, we computed Pearson's correlations between each expression signature derived from LINCS L1000 and the differential expression profiles from DS1, DS2, and DS3, and picked those drugs exhibiting the most negative correlations.
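The correlation-based ranking can be sketched as follows in Python. The sketch assumes that each drug signature and the infection profile are vectors over the same genes in the same order; the function names are hypothetical, and the default cutoff of −0.1 is borrowed from the Table 2 caption rather than from a documented parameter of the original R scripts.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def rank_reversers(
    infection_profile: Sequence[float],
    signatures: Dict[str, Sequence[float]],
    threshold: float = -0.1,
) -> List[Tuple[str, float]]:
    """Return (drug, r) pairs with r below the threshold, most negative first."""
    scored = [(drug, pearson(sig, infection_profile)) for drug, sig in signatures.items()]
    return sorted((item for item in scored if item[1] < threshold), key=lambda item: item[1])
```

A drug whose signature moves genes in the opposite direction to the infection profile receives a strongly negative correlation and appears at the top of the ranking.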

#### *4.7. Gene Set Enrichment Analysis (GSEA)*

Dysregulated biological processes were identified for each transcriptomic dataset using the pre-ranked Gene Set Enrichment Analysis (GSEA) implementation of the fgsea package [86]. The C5 molecular signatures collection, which contains gene sets derived from the three branches of Gene Ontology (GO), was used as a source of functional information. GO terms including more than 500 or fewer than 15 genes were filtered out. GSEA analyses were also performed for those LINCS L1000 Level 5 expression signatures negatively correlated with the differential gene expression profiles generated by SARS-CoV-2 infection to determine their effect on specific pathways and biological processes. Reactome (version 73) was used as a source of pathway information, and analyses were carried out using the clusterProfiler R package (https://www.rdocumentation.org/packages/clusterProfiler/versions/3.0.4, accessed on 21 March 2020) [87]. Biological processes and pathways presenting false discovery rate (FDR)-adjusted *p*-values < 0.05 were considered significantly deregulated.
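Two of the steps above, the gene set size filter and the FDR adjustment, can be illustrated with a brief Python sketch. It assumes the Benjamini–Hochberg procedure as the FDR method, which the text does not state explicitly, and it does not reproduce the enrichment statistic itself, which was computed with fgsea and clusterProfiler in R. Function names are hypothetical.

```python
def filter_gene_sets(gene_sets, min_size=15, max_size=500):
    """Drop gene sets with fewer than min_size or more than max_size genes."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Gene sets or pathways would then be reported as significant when their adjusted *p*-value falls below the chosen cutoff.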

#### **5. Conclusions**

In conclusion, our strategy of quantitative homological similarities using a TDA-based formalism would allow researchers and clinicians to select optimal drug repurposing candidates for a desired target, not only regarding the SARS-CoV-2 coronavirus, but also any new viruses that may appear in the future, by choosing the best targets among all virus proteins. In this specific case, targeting the nsp15 endoribonuclease and the nsp12 RNA polymerase, in addition to other promising drug targets such as the 3CL main protease, could support the development of a cocktail of anti-coronavirus treatments that could also potentially be used for the discovery of broad-spectrum antivirals. In particular, we identified 16 potential repurposable drug candidates: cholic acid, rutin, indomethacin, sulindac, sulfisoxazole, dasatinib, dexamethasone, phenolphthalein, spironolactone, mifepristone, carbamazepine, vemurafenib, sorafenib, levonorgestrel, naloxone, and raloxifene. Furthermore, by choosing a precision multidrug treatment, we could rescue any specific drug failure or avoid future drug resistance due to possible acquired mutations in any of the proteins as a consequence of continuous virus replication and spreading, because the virus would be attacked from different fronts. Nevertheless, our results based on multidrug combinations should be validated in both in vitro and in vivo experiments, not just to prove the effectiveness of the treatment, but also to select the best combination against SARS-CoV-2 infection and the consequent disease symptoms.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics13040488/s1, File S1: Differential gene expression and GSEA analyses results for the three transcriptomic datasets, File S2: LINCS L1000 analyses results for the three transcriptomic datasets, File S3: Supplementary tables, File S4: Interacting residues between the viral proteins and the drugs identified as potential repurposing candidates, File S5: repurposing\_drugs\_20180907, File S6: repurposing\_samples\_20180907, Table S1: Proteins targeted by Drugbank FDA-approved medications showing average persistent similarity measures higher than 0.9 with 6M2Q, Table S2: Transcriptomic and molecular docking analyses results for drugs with the potential of targeting the SARS-CoV-2 3CL protease in apo conformation (6M2Q), Table S3: Proteins targeted by Drugbank FDA-approved medications showing average persistent similarity measures higher than 0.9 with 6M71, Table S4: Transcriptomic and molecular docking analyses results for drugs with the potential of targeting the SARS-CoV-2 RNA dependent RNA polymerase (6M71), Table S5: Proteins targeted by Drugbank FDA-approved medications showing average persistent similarity measures higher than 0.9 with 6W01, Table S6: Transcriptomic and molecular docking analyses results for drugs with the potential of targeting the SARS-CoV-2 NSP15 Endoribonuclease (6W01), Table S7: GSEA results for top drugs targeting the 3CL protease (6M2Q), Table S8: GSEA results for top drugs targeting the RNA-dependent RNA polymerase (NSP12) (6M71), Table S9: GSEA results for top drugs targeting the NSP15 Endoribonuclease (6W01), Table S10: Previous research analyzing the effects of our candidate drugs in SARS-CoV-2 infection.

**Author Contributions:** Conceptualization, A.F. and J.C.; methodology, A.F., R.P.-M. and J.F.-M.; software, A.F., R.P.-M. and J.F.-M.; validation, A.F., R.P.-M. and J.F.-M.; formal analysis, J.F.-M. and R.P.-M.; investigation, J.F.-M., R.P.-M. and J.C.; resources, A.F. and R.P.-M.; data curation, J.F.-M. and R.P.-M.; writing—original draft preparation, A.F., R.P.-M., J.F.-M. and J.C.; writing—review and editing, B.S.-G., J.F.-M. and A.F.; visualization, A.F., J.F.-M., B.S.-G. and J.C.; supervision, A.F., J.-L.D. and J.C.; project administration, A.F. and J.-L.D.; funding acquisition, A.F. and J.-L.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is partially supported by grants FONDOS SUPERA COVID-19, 2020–2021 and Fundación BBVA a equipos de investigación científica SARS-CoV-2 y COVID-19, IA4COVID19 2020–2022.

**Data Availability Statement:** All data used in this work were obtained from the following public repositories: Drug Bank (https://go.drugbank.com/ (accessed on 21 March 2020)), Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/ (accessed on 21 March 2020)), Protein Data Bank (https://www.rcsb.org/ (accessed on 21 March 2020)), and the Genome Sequence Archive (https://bigd.big.ac.cn/gsa/browse/CRA002390 (accessed on 21 March 2020)).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Drug Repurposing for COVID-19 Treatment by Integrating Network Pharmacology and Transcriptomics**

**Dan-Yang Liu <sup>1</sup>, Jia-Chen Liu <sup>2</sup>, Shuang Liang <sup>2</sup>, Xiang-He Meng <sup>2</sup>, Jonathan Greenbaum <sup>3</sup>, Hong-Mei Xiao <sup>2</sup>, Li-Jun Tan <sup>1,</sup>\* and Hong-Wen Deng <sup>1,2,3,</sup>\***


**Abstract:** Since coronavirus disease 2019 (COVID-19) is a serious new worldwide public health crisis with significant morbidity and mortality, effective therapeutic treatments are urgently needed. Drug repurposing is an efficient and cost-effective strategy with minimum risk for identifying novel potential treatment options by repositioning therapies that were previously approved for other clinical outcomes. Here, we used an integrated network-based pharmacologic and transcriptomic approach to screen novel drug candidates for COVID-19 treatment. Network-based proximity scores were calculated to identify the drug–disease pharmacological effect between drug–target relationship modules and COVID-19 related genes. Gene set enrichment analysis (GSEA) was then performed to determine whether drug candidates influence the expression of COVID-19 related genes and to examine the sensitivity of peripheral immune cell types to the repurposing drug treatment. Moreover, we used the Complementary Exposure model to recommend potential synergistic drug combinations. We identified 18 individual drug candidates, including nicardipine, orantinib, tipifarnib and promethazine, which have not previously been proposed as possible treatments for COVID-19. Additionally, 30 synergistic drug pairs were ultimately recommended, including fostamatinib plus tretinoin and orantinib plus valproic acid. The differentially expressed genes of most repurposing drugs were significantly enriched in B cells. The findings may accelerate the discovery and establishment of an effective therapeutic treatment plan for COVID-19 patients.

**Keywords:** SARS-CoV-2; COVID-19; drug repurposing; network-based pharmacology

#### **1. Introduction**

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the coronavirus disease 2019 (COVID-19) and triggered the largest pandemic since 1918 [1], which was responsible for >100 million cases and >2 million deaths reported globally [2]. However, there are no specific antiviral drugs for SARS-CoV-2 infection so far. To reduce the morbidity and mortality of COVID-19, active symptomatic support was urgently needed [3].

According to recent reports [4–6], the majority of COVID-19 patients are currently given antiviral and antibiotic treatments or combination therapy including oseltamivir, ribavirin, lopinavir, ritonavir, and moxifloxacin. Additionally, several drugs are under clinical trials to verify their safety and efficacy for COVID-19 treatment, such as favipiravir, remdesivir, and hydroxychloroquine [7]. However, existing therapeutic options for the treatment of COVID-19 remain controversial. For example, remdesivir is a viral RNA polymerase inhibitor with an FDA Emergency Use Authorization (not FDA approval) which has

**Citation:** Liu, D.-Y.; Liu, J.-C.; Liang, S.; Meng, X.-H.; Greenbaum, J.; Xiao, H.-M.; Tan, L.-J.; Deng, H.-W. Drug Repurposing for COVID-19 Treatment by Integrating Network Pharmacology and Transcriptomics. *Pharmaceutics* **2021**, *13*, 545. https://doi.org/10.3390/pharmaceutics13040545

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 10 February 2021 Accepted: 26 March 2021 Published: 14 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

been widely used in COVID-19 patients [8]; however, a recent randomized clinical trial demonstrated no significant beneficial effect [9]. Similarly, the COVID-19 WHO SOLIDARITY trial showed that other proposed treatments such as hydroxychloroquine, lopinavir, and interferon regimens appeared to have little or no effect on hospitalized COVID-19 patients [10]. Therefore, there is an urgent need to develop novel potential candidates for COVID-19 treatment.

Traditional drug development is a time-consuming and costly process that frequently takes 10–15 years and costs about 2–3 billion dollars from initial lab-scale experiments through the three phases of clinical trials and final approval for clinical usage [11]. Drug repurposing, an effective and rapid strategy for discovering new uses of existing drugs [11,12], is considered the most practical rapid response to the emergent pandemic, since the candidate treatments have already been tested for their safety [13]. The availability of the genomic sequence of SARS-CoV-2 has rapidly accelerated the development of clinical perspectives and recommendations. For example, David E. Gordon et al. identified 332 SARS-CoV-2 human protein–protein interactions and 69 drug candidates including 29 FDA-approved drugs, 12 clinical trial drugs, and 28 drugs at a preclinical stage [14]. Additionally, gene set enrichment analysis (GSEA) can be applied to identify underlying pathological processes using gene expression of COVID-19 patients, and can retrieve efficient drugs from patient-derived gene expression data using drug–target gene sets [15]. Therefore, the application of GSEA for drug targets based on drug–transcriptome response datasets and disease-associated gene sets can serve as an excellent screening tool for diseases that lack a safe and reliable cellular model for in vitro screening, such as COVID-19 [16].

This study uses an integrated network-based pharmacologic and transcriptomic approach to screen drug candidates for COVID-19 treatment. Network-based pharmacology is an effective and holistic tool to identify drug treatments, where the drug effects are provided by the distance between drugs and disease in the interactome [17]. Additionally, several databases containing genome-wide expression profiles of human cell lines treated with bioactive compounds have been developed for drug discovery [18]. Transcriptional profiling studies have successfully identified potential therapies for diseases such as breast cancer [19], diabetes [20], and Parkinson's [21]. Using a network-based pharmacology approach combined with the transcriptional profiling databases, we detected 18 single drug candidates (e.g., dexamethasone, chloroquine, and tretinoin) and 30 synergistic drug combinations as potential therapies for COVID-19.

#### **2. Materials and Methods**

We screened novel drug combinations for COVID-19 by integrating network-based pharmacology and transcriptome analysis in the following steps: (1) collection of COVID-19 related genes; (2) collection of target-available drugs and construction of drug–target modules; (3) calculation of network-based proximity between drug–target modules and COVID-19 related genes; (4) filtering of drugs based on gene set enrichment analysis (GSEA); (5) network-based prediction of drug combinations (Figure 1). These steps are detailed below.


**Figure 1.** Schematic illustration of the computational framework. (1) Collection of the coronavirus disease 2019 (COVID-19) related genes from published SARS-CoV-2 human host data and differential expression genes (DEGs) from a single-cell study of the peripheral immune response in patients with severe COVID-19 (GSE150728). (2) Drug–target information retrieved from DrugBank and SuperTarget. (3) Quantify the therapeutic effect by computing the proximity between drug targets and COVID-19 related genes. (4) Gene set enrichment analysis (GSEA) to determine whether COVID-19 related genes show significance in drug-induced gene expression profiles. (5) Drug candidates were further prioritized for drug combinations using the "Complementary Exposure" model.

#### *2.1. Genes Related to COVID-19*

Genes related to COVID-19 were retrieved from the latest SARS-CoV-2 human host data and a single-cell transcriptomic study of the peripheral immune response to severe COVID-19 (GSE150728). SARS-CoV-2 protein sequences, viral genomes, literature, and clinical resources submitted to the National Center for Biotechnology Information (NCBI) under the SARS-CoV-2 special subject have been rapidly evolving [22]. In total, 65 SARS-CoV-2 human host proteins were selected from the coronavirus genomes of NCBI datasets, and 1070 potential COVID-19 related genes were obtained from the transcriptomic study by selecting the differentially expressed genes (DEGs) between individual COVID-19 samples (*n* = 7) and healthy controls (*n* = 6) in 7 cell types, namely 409 genes from CD14+ Monocytes, 257 genes from CD16+ Monocytes, 261 genes from Dendritic Cells, 173 genes from NK (natural killer) cells, 180 genes from CD8+ T cells, 172 genes from CD4+ T cells and 481 genes from B cells (Tables S1 and S2) [23]. All the identified proteins were mapped to the official human gene symbols reported by the HUGO Gene Nomenclature Committee (HGNC). Finally, 63 SARS-CoV-2 related genes derived from human host proteins and 971 DEGs were retained as the potential COVID-19 related genes after removing duplicates.

Gene Ontology (GO) enrichment analysis was performed on the potential COVID-19 related genes to identify significant pathways. By using the R package clusterProfiler [24], all potential COVID-19 related genes were functionally categorized according to their biological processes, cellular components, and molecular functions. Functional term enrichment analysis was performed to provide insights into the biological mechanisms underlying the COVID-19 related genes. Using this approach, only genes involved in the significantly enriched GO terms (*p*-value < 0.05) were retained for further analysis as COVID-19 related genes in the context of networks.

#### *2.2. Drug–Target Relationship Modules*

The drug information was obtained from DrugBank and SuperTarget [25,26]. Briefly, 7485 drugs with 21,335 drug–protein links were selected from DrugBank (version 5.1.6), and 3138 drugs with 16,579 drug–protein links were retrieved from SuperTarget. After removing drugs without targets as well as duplications, and converting all target genes into human gene symbols, 31,139 interactions containing 3121 targets of 7811 drugs were finally identified (Supplemental Table S3). A drug–target relationship module was defined by the drug–target interaction information, where multiple targets share one drug.

#### *2.3. Network-Based Proximity between Drugs and COVID-19*

A network-based approach was used to analyze the correlation between drug and disease, in which proximity scores were quantified by calculating the closest distance between the drug–target module and COVID-19 related genes in the context of the human protein–protein interaction (PPI) network. The PPI data were obtained from Pathway Commons (version 12), which contains over 5772 pathways and 2.4 million interactions [27]. Genes (nodes) and their interactions (links) formed the PPI network graph, in which the interaction between two nodes was undirected and unweighted. Here, a proximity score was defined by the average shortest path length between the drug target genes and their nearest disease proteins in the context of PPI to quantify the therapeutic effect of drugs [28,29]. Given the set of COVID-19 related genes sourced from SARS-CoV-2 proteins (*S*), the set of drug target genes (*T*), and the shortest distance *d*(*s*, *t*) between two genes in the PPI network, where *s* ∈ *S* and *t* ∈ *T*, the proximity is defined as (Equation (1)):

$$d(S, T) = \frac{1}{|T|} \sum_{t \in T} \min_{s \in S} (d(s, t) + w) \tag{1}$$

where *w* is the drug influencing weight, defined as *w* = − ln(*D* + 1) if the drug target is one of the COVID-19 related genes sourced from DEGs (D is the connectivity degree of targets) and *w* = 0 otherwise.
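The proximity in Equation (1) can be illustrated with a minimal Python sketch. It assumes the PPI network is stored as a dictionary mapping each gene to the set of its neighbors and that every drug target can reach at least one disease gene; the function names `hop_distances` and `proximity`, the `deg_genes` argument, and the toy five-gene chain are hypothetical and are not taken from the study's code.

```python
import math
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop counts from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def proximity(graph, disease_genes, drug_targets, deg_genes=frozenset()):
    """Equation (1): mean over targets t of min over s of (d(s, t) + w)."""
    total = 0.0
    for t in drug_targets:
        dist = hop_distances(graph, t)
        # Distance from t to its nearest reachable disease gene.
        nearest = min(dist[s] for s in disease_genes if s in dist)
        # w = -ln(D + 1) if t is a DEG-derived COVID-19 gene (D = degree of t), else 0.
        # w does not depend on s, so it can be added after taking the minimum.
        w = -math.log(len(graph[t]) + 1) if t in deg_genes else 0.0
        total += nearest + w
    return total / len(drug_targets)


# Toy network (hypothetical): a simple chain A - B - C - D - E.
toy_ppi = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
unweighted = proximity(toy_ppi, {"A"}, {"C", "E"})
weighted = proximity(toy_ppi, {"A"}, {"C", "E"}, deg_genes={"C"})
```

In the toy chain the two targets lie 2 and 4 hops from the disease gene, so the unweighted proximity is 3.0. Marking target C as a DEG lowers the score, because the weight term is negative.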

A simulated reference distance score distribution corresponding to the drug was generated to assess the significance of the results by linking the drug's random target modules and disease-related genes. Referenced drug modules were constructed by selecting random genes (denoted as R) with the same degree of drug target sets in the network, where the distance *d*(*S*, *R*) indicates the relationship between a simulated drug and COVID-19. The reference distribution was established based on 30,000 replications. A drug with a score lower than 98% of the reference distribution scores was considered significant [28]. The network proximity was converted to Z-score based on permutation tests (Equation (2)):

$$Z(S,T) = \frac{d(S,T) - \mu_{d(S,R)}}{\sigma_{d(S,R)}} \tag{2}$$

where *µ*<sub>*d*(*S*,*R*)</sub> and *σ*<sub>*d*(*S*,*R*)</sub> are the mean and standard deviation of the reference distribution obtained from the permutation tests.
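As a hedged illustration of Equation (2) and of the 98% significance rule, the sketch below converts an observed distance into a Z-score against a list of reference distances. The function names are hypothetical, the reference values are made up for the example, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def z_score(observed, reference):
    """Equation (2): (d(S,T) - mean of reference) / standard deviation of reference."""
    mu = statistics.mean(reference)
    # Assumption: population standard deviation of the simulated distances.
    sigma = statistics.pstdev(reference)
    return (observed - mu) / sigma


def is_significant(observed, reference, fraction=0.98):
    """True if the observed score is lower than at least `fraction` of the reference scores."""
    n_higher = sum(1 for r in reference if r > observed)
    return n_higher / len(reference) >= fraction


# Made-up reference distances standing in for the 30,000 simulated drug modules.
reference_distances = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_score(1.0, reference_distances)
```

A negative Z-score means the drug targets sit closer to the disease genes than the degree-matched random modules do.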

#### *2.4. Biological Enrichment Analysis of COVID-19 Related Genes on the Drug-Induced Expression Profiles*

We performed GSEA as a further prioritization strategy to screen drug candidates by examining the distribution of disease-related genes in drug-induced gene expression profiles. GSEA was utilized to determine whether a priori defined sets of genes showed statistically significant enrichment in a collected gene list [30], which could identify whether drug candidates affected the expression of disease pathways. We first collected perturbation-driven gene expression profiles from LINCS (Library of Integrated Network-based Cellular Signatures), which provided transcriptional responses of human cells to chemical and genetic perturbation [31]. The human myeloid leukemia mononuclear (THP-1) cell line from blood was selected due to the important association of peripheral blood and myelomonocytic cells with COVID-19 [32–34]. The goal of GSEA was to determine whether the COVID-19 related genes sourced from the SARS-CoV-2 related gene set were randomly distributed throughout the drug-induced expression data set, sorted by correlation with the phenotype of interest, or enriched at either the top or bottom. Drugs with an FDR (False Discovery Rate) less than 0.25 and an ES (Enrichment Score) higher than 0 were identified as potential drug candidates for COVID-19.
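The idea behind the enrichment score can be sketched in a few lines of Python. This is a simplified, unweighted running-sum statistic, not the exact procedure used in the study: published GSEA implementations usually weight hits by the ranking metric and estimate the FDR by permutation, neither of which is shown here. The function name and the toy gene list are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    Walking down the ranking, the sum rises by 1/n_hit at each gene in the set
    and falls by 1/n_miss otherwise; the score is the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking (hypothetical), ordered from most up-regulated to most down-regulated.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranking, {"g1", "g2"})     # set concentrated at the top
es_bottom = enrichment_score(ranking, {"g5", "g6"})  # set concentrated at the bottom
```

A positive score indicates that the gene set clusters toward the top of the ranked list, which is the direction selected by the ES > 0 criterion; a negative score indicates clustering at the bottom.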

#### *2.5. GSEA Analysis of Repurposing Drugs in Specific Cell-Types*

According to the Seurat data provided by Aaron J. Wilk [23], we chose "Cell type (coarse)" as the standard to select scRNA-seq data of seven cell types, including B Cells, CD14+ Monocytes, CD16+ Monocytes, CD4+ T Cells, CD8+ T Cells, Dendritic Cells, and NK (natural killer) Cells, and calculated differentially expressed genes between total COVID-19 samples (*n* = 7) and all healthy controls (*n* = 6). Each cell type was divided into two groups, diseased and healthy controls, according to whether the donor had COVID-19. Subsequently, differential gene expression profiles between the diseased and healthy controls in specific cell types were calculated by using the "FindMarkers" function in Seurat (Supplemental Tables S7–S13) [35]. GSEA of the repurposing drug-induced THP-1 differentially expressed genes (logFC > 1) against the cell-type-specific transcriptomes was used to assess the enrichment of each gene set (repurposing drug DE genes) in each cell type (scRNA-seq gene list). For each repurposing drug, a cell type with FDR < 0.05 and ES < 0 was identified as a potentially drug-sensitive cell type for COVID-19.

#### *2.6. Network-Based Prediction of Drug Combinations*

Drug combination therapies can be more beneficial than individual drugs since synergistic drug pairs can target more genes and play a role in multiple complicated pathways [36]. The Complementary Exposure model has previously been demonstrated as an effective approach to predict useful combinations [37]. The model is based on the following conditions: drug targets and disease genes overlap topologically (*Z*<sub>*DA*</sub> < 0, *Z*<sub>*DB*</sub> < 0), and the two sets of drug targets are separated topologically (*S*<sub>*AB*</sub> > 0). In the Complementary Exposure model, the network proximity between a drug (*A* or *B*) and a disease (*D*) is defined by the z-score (Equation (3)):

$$Z_{DA} = \frac{d_{DA} - \mu_d}{\sigma_d} \tag{3}$$

The z-score is calculated by randomly sampling degree-matched nodes for both sets (drug targets and disease genes) with 1000 replications. The mean distance *µ*<sub>*d*</sub> and standard deviation *σ*<sub>*d*</sub> of the reference distribution are used to convert *d*<sub>*DA*</sub> to a normalized distance. Here, *d*<sub>*DA*</sub> is the average shortest path length *d*(*d*, *a*) between each disease gene (*d* ∈ *D*) and its closest drug target (*a* ∈ *A*) (Equation (4)).

$$d_{DA} = \frac{1}{\|D\|} \sum_{d \in D} \min_{a \in A} d(d, a) \tag{4}$$

The network-based separation *S*<sub>*AB*</sub> of two drug–target modules *A* and *B* is quantified using the mean shortest distances within each module, *d*<sub>*AA*</sub> and *d*<sub>*BB*</sub> (Equation (5)):

$$d_{AA} = \frac{1}{\|A\|} \sum_{a \in A} \min_{a' \in A} d\left(a, a'\right) \tag{5}$$

where *a*′ (*a*′ ∈ *A*) is the closest node to *a* (*a* ∈ *A*) within the interactome network. The mean shortest distance *d*<sub>*AB*</sub> between the proteins of the two modules is defined by the "closest" measure, where *d*(*a*, *b*) is the shortest path length between *a* (*a* ∈ *A*) and *b* (*b* ∈ *B*) in the interactome network (Equation (6)).

$$d_{AB} = \frac{1}{\|A\| + \|B\|} \left(\sum_{a \in A} \min_{b \in B} d(a, b) + \sum_{b \in B} \min_{a \in A} d(a, b)\right) \tag{6}$$

The network-based separation of a drug pair, *A* and *B*, can be calculated as follows (Equation (7)):

$$S_{AB} = \langle d_{AB} \rangle - \frac{\langle d_{AA} \rangle + \langle d_{BB} \rangle}{2} \tag{7}$$

where *d*<sub>*AB*</sub> = 0 if genes are included in both the drug *A* and *B* target modules [38].
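Equations (5)–(7) can be sketched in Python on a toy network. This is an illustrative sketch under stated assumptions, not the study's implementation: the graph is a dictionary of neighbor sets, each module has at least two reachable members, and the within-module distance excludes a node's distance to itself (otherwise Equation (5) would always be zero). The function names and the six-node chain are hypothetical.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: hop counts from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distances(graph, from_nodes, to_nodes, exclude_self=False):
    """For each node in from_nodes, the distance to its closest node in to_nodes."""
    result = []
    for a in from_nodes:
        dist = hop_distances(graph, a)
        result.append(min(dist[b] for b in to_nodes
                          if b in dist and not (exclude_self and b == a)))
    return result


def separation(graph, module_a, module_b):
    """Equation (7): S_AB = d_AB - (d_AA + d_BB) / 2."""
    within_a = closest_distances(graph, module_a, module_a, exclude_self=True)
    within_b = closest_distances(graph, module_b, module_b, exclude_self=True)
    # Equation (6): pool the closest distances taken in both directions.
    # A target shared by both modules contributes a distance of 0.
    between = (closest_distances(graph, module_a, module_b)
               + closest_distances(graph, module_b, module_a))
    d_aa = sum(within_a) / len(within_a)
    d_bb = sum(within_b) / len(within_b)
    d_ab = sum(between) / len(between)
    return d_ab - (d_aa + d_bb) / 2


# Toy network (hypothetical): a chain n1 - n2 - n3 - n4 - n5 - n6.
chain = {
    "n1": {"n2"}, "n2": {"n1", "n3"}, "n3": {"n2", "n4"},
    "n4": {"n3", "n5"}, "n5": {"n4", "n6"}, "n6": {"n5"},
}
s_separated = separation(chain, {"n1", "n2"}, {"n5", "n6"})    # distant modules
s_overlapping = separation(chain, {"n2", "n3"}, {"n3", "n4"})  # modules sharing n3
```

In the toy chain the distant modules give a positive separation, whereas the modules sharing a node give a negative one, mirroring the *S*<sub>*AB*</sub> > 0 and *S*<sub>*AB*</sub> < 0 cases that distinguish complementary from overlapping exposure.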

#### **3. Results**

#### *3.1. GO Enrichment Analysis of COVID-19 Related Genes*

To obtain meaningful molecular mechanisms underlying COVID-19, GO enrichment analysis classified potential COVID-19 related genes into enriched terms (Supplemental Table S4). All 63 SARS-CoV-2 related genes were categorized functionally into 1035 Gene Ontology terms including biological processes, cellular components, and molecular functions. Among the 971 COVID-19 DEGs, 860 genes were enriched in 1399 Gene Ontology terms. The COVID-19 related genes we identified were significantly enriched in blood pressure regulation (*p*-value = 5.29 × 10<sup>−23</sup>), inflammatory response (*p*-value = 3.62 × 10<sup>−9</sup>), neutrophil activation (*p*-value = 6.16 × 10<sup>−60</sup>), and response to virus (*p*-value = 8.68 × 10<sup>−32</sup>) (Figure 2). The results are consistent with previous studies, indicating that the renin-angiotensin system (RAS) plays an important role in the biological mechanisms of COVID-19 [39,40].

**Figure 2.** GO enrichment analysis of COVID-19 related genes. Dot plots are used to visualize the enriched terms. (**a**) Enrichment visualization and category interpretation of the COVID-19 related genes (*n* = 63). (**b**) Pathway enrichment analysis visualization of the single-cell DEGs (*n* = 860).

#### *3.2. Network-Based Proximity Scores between Drug–Target Modules and COVID-19 Related Genes*

We obtained the drug–disease proximity scores to evaluate the drug effect on COVID-19 through a network-based calculation. Drugs with low proximity scores are more likely to be effective against SARS-CoV-2 infection since the proximity scores reflect the distance between drug target sets and COVID-19 related genes in the interactome networks. Using this approach, we explored the distance of 7811 drug–target modules and COVID-19 related genes. The distance distribution of the drug targets to COVID-19 related genes was in the range of −2.66 to 2.79, and both real drugs and simulated drugs were widely distributed near the point of 1.70 (Figure 3). A ranked list of the potential drugs was clearly distributed in the range of −2.66 to 0.99, suggesting that the targets of existing drugs were closer to the COVID-19 genes than the reference sets (simulated drugs). We selected a distance smaller than 0.99 as the threshold to screen the potential drug candidates for COVID-19, where the corresponding Z-score was approximately −2.33 after converting into the proximity value. Finally, 468 drugs with proximity less than −2.33 were included in further analyses (Supplemental Table S5).

**Figure 3.** Distance distribution of all 7811 drugs and simulated reference. Peaks suggest that the distance corresponding to most members was around this value. The red line shows the distribution of the distance of the 7811 drugs to COVID-19 related genes. The black line illustrates the distance distribution of the simulated reference based on 30,000 replications. The blue line shows the threshold (distance < 0.99, Z-score < −2.33) to screen the drug candidates for COVID-19.

#### *3.3. GSEA Analysis of COVID-19 Related Genes in Drug-Induced Signatures*

To further estimate the drug candidates' efficacy on the disease and explore the underlying signaling pathways, we performed GSEA to examine their impact on the transcriptome of THP-1 cells. Since drugs were not fully matched between DrugBank and LINCS, some drugs were removed during the matching process. Of the 7811 drugs included in DrugBank and the 377 from LINCS (THP-1 cell line), 112 drugs were matched by common name and 101 were matched by InChIKey (International Chemical Identifier Key). After removing overlaps, 131 drugs were included in both DrugBank and LINCS, 27 of which had low proximity scores (Z < −2.33) and were retained for further GSEA.

We identified 18 drugs (FDR < 0.25 and ES > 0, Table 1) as potential therapeutic candidates since they significantly affected the expression of COVID-19 related genes in the mononuclear cells (Supplemental Table S6). These candidates included anti-viral agents (curcumin, dexamethasone, chloroquine), anti-diabetic agents (glibenclamide), analgesics (resveratrol), anti-convulsant (valproic acid), anti-cholesteremic agents (simvastatin), anti-carcinogenic agents (phenethyl isothiocyanate), anti-neoplastic agents (tretinoin), immunosuppressive agents (fostamatinib, atorvastatin, cyclosporine), anti-estrogen (tamoxifen), anti-hypertensive (nicardipine, nifedipine), anti-allergic agents (promethazine), and anti-cancer agents (orantinib, tipifarnib).

**Table 1.** Eighteen repurposable candidates for COVID-19.

| DrugBank ID | Z-Score | Drug Name | Structure | Pharmacodynamics | Reported Studies of COVID-19 |
|---|---|---|---|---|---|

#### *3.4. Repurposing Drugs Sensitivity in Specific Cell Type*

Differential expression analyses in 7 cell types between COVID-19 patients (*n* = 7) and controls (*n* = 6) were performed based on the scRNA-seq data (Tables S7–S13). According to the GSEA analysis, the DE genes of most repurposing drugs were significantly enriched in B cells (Table 2, Table S14). CD14+ monocytes and dendritic cells also showed sensitivity to the repurposing drug treatment. No repurposing drug DE genes were significantly enriched in the single-cell gene expression profiles of NK cells, CD8+ T cells, or CD4+ T cells.


**Table 2.** GSEA analysis of drug-induced differentially expressed (DE) genes in scRNA profiles.

<sup>1</sup> Significant: drug-induced DE genes showed statistically significant enrichment in the scRNA profile; <sup>2</sup> NA: drug-induced DE genes showed no statistically significant enrichment in the scRNA profile.

#### *3.5. Identification of Synergistic Drug Combinations*

Based on the Complementary Exposure model, we evaluated the 153 possible pairwise combinations of the 18 potential therapeutic candidates for COVID-19. Among these combinations, 123 drug pairs were excluded due to close drug–target modules (*S*<sub>*AB*</sub> < 0), while 30 drug combinations conformed to the Complementary Exposure model and may therefore be effective in the treatment of COVID-19 (Table 3).
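The pair count follows from simple combinatorics: 18 candidates yield 18 × 17 / 2 = 153 unordered pairs, and 153 − 123 = 30 pairs remain after the separation filter. The short Python sketch below enumerates the pairs and applies the Complementary Exposure conditions. The helper name `is_complementary` is hypothetical, the Z-scores are those reported in the text for two example pairs, and the separation values 0.1 and −0.1 are placeholders chosen only for their sign, since the actual values are not given here.

```python
from itertools import combinations
from math import comb

# The 18 candidates listed in the text.
candidates = [
    "curcumin", "dexamethasone", "chloroquine", "glibenclamide", "resveratrol",
    "valproic acid", "simvastatin", "phenethyl isothiocyanate", "tretinoin",
    "fostamatinib", "atorvastatin", "cyclosporine", "tamoxifen", "nicardipine",
    "nifedipine", "promethazine", "orantinib", "tipifarnib",
]

# All unordered drug pairs: C(18, 2) = 153.
pairs = list(combinations(candidates, 2))
n_pairs = len(pairs)


def is_complementary(z_da, z_db, s_ab):
    """Complementary Exposure: both drugs overlap the disease module, targets are separated."""
    return z_da < 0 and z_db < 0 and s_ab > 0


# Z-scores from the text; the separation values are sign-only placeholders.
fostamatinib_tretinoin = is_complementary(-3.68, -2.44, 0.1)
promethazine_nicardipine = is_complementary(-2.58, -2.81, -0.1)
```

With these inputs the fostamatinib plus tretinoin pair passes the filter, while the promethazine plus nicardipine pair fails it because its target modules overlap.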

**Table 3.** All predicted possible combinations for COVID-19.




One notable potential drug combination was fostamatinib (*F*) plus tretinoin (*T*). The targets of fostamatinib (*Z*<sub>*DF*</sub> = −3.68) and tretinoin (*Z*<sub>*DT*</sub> = −2.44) both overlapped with the COVID-19 disease module, indicating that the drug combination might have a therapeutic effect on the disease. At the same time, the targets of fostamatinib and tretinoin were topologically separated (*S*<sub>*FT*</sub> > 0), and the pair therefore fit the Complementary Exposure pattern (Figure 4a). We also used a Sankey diagram to represent the drug–target–disease interactions (Figure 4b). Apart from drugs directly targeting COVID-19 related genes, indirect drug–disease effects arose from interactions between drug targets and COVID-19 related genes in the PPI network, as reflected by the proximity scores. Promethazine (*P*) plus nicardipine (*N*) serves as a counterexample. The targets of promethazine (*Z*<sub>*DP*</sub> = −2.58) and nicardipine (*Z*<sub>*DN*</sub> = −2.81) overlapped with the COVID-19 disease module but fell into the Overlapping Exposure pattern. Although each drug individually showed a potential therapeutic effect on the disease, the overlapping pair (*S*<sub>*PN*</sub> < 0) was not considered synergistic because of possible adverse effects such as overlapping drug toxicity (Figure 4c). Additionally, the shared targets of promethazine and nicardipine meant that the pair offered limited coverage of distinct therapeutic pathways (Figure 4d).

**Figure 4.** Network-based stratification of hypertensive drug combinations. (**a**) A network-based separation of a drug pair, fostamatinib (F) and tretinoin (T). For *Z*<sub>*DF*</sub> < 0 and *Z*<sub>*DT*</sub> < 0, the drug–target modules of fostamatinib (F) and tretinoin (T) overlapped with the disease module (D). For *S*<sub>*FT*</sub> > 0, the two sets of drug targets are separated topologically. Fostamatinib and tretinoin targets both separately hit the COVID-19 module, which was captured by the Complementary Exposure pattern. The disease module in orange (D) included disease-related genes (nodes) and their undirected and unweighted interactions (links), while the drug module (F or T) in blue (green) included drug targets (nodes) and their undirected and unweighted interactions (links). (**b**) Sankey diagram visualizing the mechanism hypothesis for the drug pair: drugs are on the left, and COVID-19 related genes are on the right. Links show drugs that were mapped onto COVID-19 related genes through drug–target associations and human protein–protein interactions. (**c**) Nicardipine (N) and promethazine (P) drug–target modules overlapped in the network. For *S*<sub>*PN*</sub> < 0, the two sets of drug targets fell into the Overlapping Exposure pattern, which meant more adverse effects and less efficacy compared to the Complementary Exposure pattern. (**d**) Sankey diagram showing how the drug targets of nicardipine and promethazine overlapped and interacted with COVID-19 related genes.

#### **4. Discussion**

This study used a network-based drug repurposing combined with a transcriptomics strategy to identify potential drug candidates and drug pairs for COVID-19 treatment. The joint analysis of the proximity of drug–target relationship modules, SARS-CoV-2 genomics, transcriptomics, and synergistic drug effects could overcome the limitations of analyzing data from only network distance or transcriptome and improve drug candidate prediction. We proposed 18 drugs and 30 drug combinations including broad-spectrum antiviral agents, receptor antagonists, channel blockers, and renin-angiotensin system agents.

Some medications such as dexamethasone, chloroquine, curcumin [41], glyburide [42], tretinoin [43,44], cyclosporine [45,46], valproic acid [47], fostamatinib [48,49], atorvastatin [50–52], and phenethyl-isothiocyanate [53] have recently received major attention for the treatment of COVID-19 and have been validated by previous studies, supporting the reliability of our findings. Nicardipine, promethazine, orantinib, and tipifarnib have not previously been reported as potential treatments for COVID-19. Therefore, we will discuss these novel drug candidates in the following.

• Nicardipine

With a similar structure to nifedipine (Z = −2.68), nicardipine (Z = −2.75) was initially developed to regulate high blood pressure as a dihydropyridine calcium channel blocker [54]. Nifedipine is indicated to potentially be effective in the treatment regimens of elderly patients with hypertension hospitalized with COVID-19 [55,56]. Therefore, nicardipine might play a similar role with nifedipine in the adjuvant treatment of COVID-19 patients.

• Promethazine

Promethazine (Z = −5.65) antagonizes various receptors including dopaminergic, histamine, and cholinergic receptors, and is commonly used for indications such as allergic conditions, motion sickness, sedation, nausea, and vomiting [57]. The proximity score of promethazine was significantly low, partly because it targets genes including CALM1, KCNS1, LPAR4, LPAR6, P2RY12, P2RY8, and P2RX5, which were DEGs between T cell subsets of COVID-19 samples and healthy controls. Characteristics of the bronchoalveolar immune genes have been explored as potential mechanisms underlying pathogenesis in COVID-19 [58]. These findings implied that promethazine might be effective for COVID-19 by regulating the immune cell microenvironment.

• Orantinib and Tipifarnib

Orantinib (Z = −2.54) showed preliminary efficacy and safety in advanced hepatocellular carcinoma [59]. Tipifarnib (Z = −2.40) was studied in the treatment of acute myeloid leukemia (AML) and other types of cancer [60]. Although orantinib and tipifarnib are both not yet approved by the FDA, anticancer drugs identified by our study such as phenethyl isothiocyanate have been reported to be an effective treatment strategy to treat COVID-19 [53]. Drug repurposing against COVID-19 focused on anticancer agents was previously predicted to be effective and it was speculated that drugs interfering with specific cancer cell pathways may be effective in reducing viral replication [61]. Therefore, the anticancer drugs orantinib and tipifarnib might also be potential candidates for the treatment of COVID-19.

In contrast with our results, tamoxifen (Z = −4.75) was reported to increase the COVID-19 risk due to its anti-estrogen and P-glycoprotein inhibitory effects [62]. Data from previous experiments suggested that estrogen could regulate the expression of angiotensin-converting enzyme 2 (ACE2) [63], which was reported to be the critical natural cellular receptor for SARS-CoV-2 and an important factor for infection. However, a recent study discussed the uncertain effects of RAS blockers on ACE2 levels and activity in humans and proposed an alternative hypothesis that ACE2 might more likely be beneficial than harmful in patients with lung injury [64]. The controversies over ACE2 system inhibition attempt to explain the relationship between the virus and the RAS [65], but existing research is too limited to support or refute these hypotheses. Our research suggested that tamoxifen may influence cytokine storm syndrome, a severe clinical manifestation of COVID-19 [66,67], by regulating cytokine-mediated signaling pathways (ES = 0.67, P = 0.14). Several studies have indicated that tamoxifen could reduce cytokines to normal levels, and it has been demonstrated to be beneficial for inflammation in rats [68,69]. Overall, we propose that tamoxifen may protect against cytokine storms and alleviate ARDS in COVID-19 patients as well as reduce the incidence of critical illness and mortality.

There are some limitations to our strategy. First, the proximity calculation treats protein interactions as nodes and links, which may not completely capture important information about the interaction types. Second, the LINCS and DrugBank databases are only partly matched, and therefore many important drug candidates may be ignored. Additionally, some of the potentially interesting drugs, such as alemtuzumab (Z = −3.27), could not be included in the final screening. Third, although THP-1 cells might be a useful tool in the research of monocyte and macrophage-related mechanisms [70], heterogeneity still exists between the gene expression profiles of the mononuclear cells of COVID-19 patients and those of THP-1 cells. Additionally, considering that impaired function of the heart, brain, lung, and liver are complications of COVID-19 [71], more types of infection-related cell lines could be taken into account to fully investigate drugs and treatment outcomes in COVID-19.

In conclusion, our drug repurposing strategy combined network-based pharmacology and transcriptome methods to identify 18 potential COVID-19 drugs and to recommend 30 drug combinations. Although several candidate repurposing drugs were previously reported to have an anti-COVID-19 effect, four drugs, namely nicardipine, promethazine, orantinib, and tipifarnib, were recommended for the first time for COVID-19 treatment. Additionally, based on our repurposing drug sensitivity analysis, DE genes of most repurposing drugs were enriched significantly in B cells. Our analysis may help guide and accelerate research in COVID-19 drug development, and this method would be applicable to drug repurposing research for other complex diseases in the future. However, the identified drug candidates still require experimental validation and large-scale clinical trials before their use in COVID-19 management.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics13040545/s1, Table S1: DE genes list compared individual COVID-19 samples with all healthy controls in each cell type; Table S2: Number of DE genes between individual COVID-19 samples and the healthy controls in 7 cell types; Table S3: Drug target interactions from the well-known database; Table S4: GO enrichment of COVID-19 related genes; Table S5: Network-based proximity scores of drug–disease relationships; Table S6: GSEA results on the SARS-CoV-2 related gene set in the THP-1 cells; Table S7: COVID-19 DE genes in CD14+ Monocytes; Table S8: COVID-19 DE genes in Dendritic Cells; Table S9: COVID-19 DE genes in CD8+ T cells; Table S10: COVID-19 DE genes in CD16+ Monocytes; Table S11: COVID-19 DE genes in B cells; Table S12: COVID-19 DE genes in NK cells; Table S13: COVID-19 DE genes in CD4+ T cells; Table S14: GSEA results on the repurposing drugs-related gene set in the specific cells.

**Author Contributions:** Conceptualization, H.-W.D., L.-J.T. and H.-M.X.; methodology, D.-Y.L., J.-C.L., X.-H.M. and S.L.; software, D.-Y.L.; data analysis, D.-Y.L.; writing—original draft preparation, D.-Y.L.; writing—review and editing, J.-C.L., S.L., L.-J.T., X.-H.M., J.G. and H.-W.D.; visualization, D.-Y.L.; supervision, L.-J.T. and H.-W.D.; funding acquisition, L.-J.T. and H.-W.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported in part by the Natural Science Foundation of China (NSFC; 81570807). HWD was partially supported by grants from the National Institutes of Health [U19AG05537301, R01AR069055, P20GM109036, R01MH104680, R01AG061917, U54MD007595].

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors are grateful to Ying Liu and Yun Gong for technical assistance with the data analysis. The authors acknowledge the NCBI and World Health Organization for their work on collecting, processing, and sharing datasets about COVID-19.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Combination Therapy with Fluoxetine and the Nucleoside Analog GS-441524 Exerts Synergistic Antiviral Effects against Different SARS-CoV-2 Variants In Vitro**

**Linda Brunotte <sup>1</sup> , Shuyu Zheng <sup>2</sup> , Angeles Mecate-Zambrano <sup>1</sup> , Jing Tang <sup>2</sup> , Stephan Ludwig <sup>1</sup> , Ursula Rescher <sup>3</sup> and Sebastian Schloer 3,\***


**Abstract:** The ongoing SARS-CoV-2 pandemic requires efficient and safe antiviral treatment strategies. Drug repurposing represents a fast and low-cost approach to the development of new medical treatment options. The direct antiviral agent remdesivir has been reported to exert antiviral activity against SARS-CoV-2. Whereas remdesivir has only a very short half-life and a bioactivation that relies on pro-drug activating enzymes, its plasma metabolite GS-441524 can be activated through various kinases including the adenosine kinase (ADK) that is moderately expressed in all tissues. The pharmacokinetics of GS-441524 argue for a suitable antiviral drug that can be given to patients with COVID-19. Here, we analyzed the antiviral property of a combined treatment with the remdesivir metabolite GS-441524 and the antidepressant fluoxetine in a polarized Calu-3 cell culture model against SARS-CoV-2. The combined treatment with GS-441524 and fluoxetine was well tolerated and displayed synergistic antiviral effects against three circulating SARS-CoV-2 variants in vitro in the commonly used reference models for drug interaction. Thus, combinatory treatment with the virus-targeting GS-441524 and the host-directed drug fluoxetine might offer a suitable therapeutic treatment option for SARS-CoV-2 infections.

**Keywords:** combination therapy; SARS-CoV-2; nucleoside GS-441524; fluoxetine; synergy

#### **1. Introduction**

The Coronavirus Disease 2019 (COVID-19) caused by the Severe Acute Respiratory Syndrome Related Coronavirus 2 (SARS-CoV-2) has resulted in over 2 million deaths within one year and demonstrates the risk of newly emerged pathogens [1,2].

In contrast to other human circulating coronaviruses, SARS-CoV-2 leads to a severe disease with multiple organ failures, especially in elderly patients and those with chronic medical conditions [3–5]. Although vaccines are available, their production, distribution and vaccine hesitancy are critical limiting factors in healthcare. Thus, additional therapeutic strategies to combat the SARS-CoV-2 infection are needed. However, the development and production of new antiviral drugs is a time-consuming process that can be accelerated by the repurposing of already clinically licensed drugs [6,7].

One of the repurposed FDA-approved drugs that has received considerable attention as an antiviral agent against SARS-CoV-2 is remdesivir, a nucleotide monophosphate analogue of adenosine monophosphate (AMP) that interferes with the viral RNA-dependent RNA polymerase [8,9]. Remdesivir was originally developed by Gilead for the treatment of Ebola [10], and has been shown to have strong therapeutic efficacy in in vivo models of coronaviruses (MERS-CoV, SARS-CoV, SARS-CoV-2) in mice and primates [11–13]. However, it has a very limited half-life in the plasma of patients [14–16]. Remdesivir is converted into its predominant serum metabolite GS-441524, which maintains the antiviral properties [12,15–18]. A study conducted in rhesus macaques infected with SARS-CoV-2 and treated with remdesivir revealed 1000-fold higher serum levels of GS-441524 than of remdesivir [16]. The benefit of GS-441524 over remdesivir is its lower molecular weight and its hydrophilicity, which make it easier to produce an aerosolized formulation for inhalable therapeutic treatment. An inhalable formulation would allow a high concentration of the drug in lung cells and minimize systemic toxicity [17]. Hence, GS-441524 has a higher potential to be used for antiviral treatments of respiratory pathogens like SARS-CoV-2.

**Citation:** Brunotte, L.; Zheng, S.; Mecate-Zambrano, A.; Tang, J.; Ludwig, S.; Rescher, U.; Schloer, S. Combination Therapy with Fluoxetine and the Nucleoside Analog GS-441524 Exerts Synergistic Antiviral Effects against Different SARS-CoV-2 Variants In Vitro. *Pharmaceutics* **2021**, *13*, 1400. https://doi.org/10.3390/pharmaceutics13091400

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 20 July 2021 Accepted: 1 September 2021 Published: 3 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

While the majority of antiviral drugs such as remdesivir or GS-441524 directly target viral proteins and are quite efficient at eliminating the pathogen, they pose the risk of emerging viral resistance [19–21]. Thus, combination therapies that include virus- and host-directed drugs are considered to cause less resistance. We recently reported the importance of the endosomal lipid balance for the entry process of enveloped viruses like SARS-CoV-2. The clinically licensed antidepressant fluoxetine, a drug belonging to the class of functional inhibitors of acid sphingomyelinase (FIASMA), blocks the sphingomyelin-converting acid sphingomyelinase (ASMase) within the late endosomal/lysosomal (LEL) compartments [22]. The inhibitory effects of fluoxetine rely on its ability to interfere with the endosomal lipid balance, preventing the entry of SARS-CoV-2 [23].

Here, we evaluated the antiviral potential of GS-441524 in a polarized Calu-3 cell culture model when administered alone or in combination with the host-directed drug fluoxetine. The drug combination of fluoxetine and GS-441524 showed stronger antiviral activities against three different SARS-CoV-2 variants compared to the monotherapies. Notably, both drugs act synergistically, as calculated with the commonly used reference models for drug interaction studies.

#### **2. Materials and Methods**

#### *2.1. Cells and Compounds*

The human bronchial epithelial cell line Calu-3 and the Vero E6 cells derived from the kidney of an African green monkey were cultivated in Dulbecco's modified Eagle's medium (DMEM, Sigma-Aldrich, Darmstadt, Germany) with 10% standardized fetal bovine serum (FBS Advance; Capricorn, Ebsdorfergrund, Germany), 2 mM L-glutamine, 100 U/mL penicillin, 0.1 mg/mL streptomycin, and 1% non-essential amino acids (Merck, Darmstadt, Germany) in a humidified incubator at 5% CO<sub>2</sub> and 37 °C. Calu-3 monolayers were polarized and cultured as described [24]. Fluoxetine (5 mM, Sigma-Aldrich, Darmstadt, Germany) and GS-441524 (100 mM, Biomol, Hamburg, Germany) were solubilized in DMSO.

#### *2.2. Cytotoxicity Assay*

Calu-3 cells were cultured for 48 h with either the solvent DMSO, GS-441524, fluoxetine, or combinations of fluoxetine/GS-441524 at the indicated concentrations. To estimate cytotoxic effects, a staurosporine solution (1 µM) was used as a positive control. The cell viability was evaluated by adding MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide; Sigma-Aldrich, Darmstadt, Germany) to the cells for 4 h and measuring OD<sub>562</sub> according to the manufacturer's protocols (Sigma-Aldrich, Darmstadt, Germany).

#### *2.3. Virus Infection and Drug Treatment*

The Muenster SARS-CoV-2 isolate hCoV-19/Germany/FI1103201/2020 (EPI-ISL\_463008, mutation D614G in spike protein), and the two newly emerged variants B.1.1.7 UK VOC (alpha) and B.1.351 SA VOC (beta) were amplified on Vero E6 cells (passage 1) and used for the infection assays. Polarized Calu-3 cells were washed once with PBS and inoculated with the virus diluted in infection-PBS (containing 0.2% BSA, 1% CaCl<sub>2</sub>, 1% MgCl<sub>2</sub>, 100 U/mL penicillin and 0.1 mg/mL streptomycin) at a multiplicity of infection (MOI) of 0.1 at 37 °C for 1 h. Following infection, cells were washed with PBS and cultured in infection-DMEM (serum-free DMEM containing 0.2% BSA, 1 mM MgCl<sub>2</sub>, 0.9 mM CaCl<sub>2</sub>, 100 U/mL penicillin, and 0.1 mg/mL streptomycin) at 5% CO<sub>2</sub> and 37 °C. Calu-3 cells were then treated with the solvent DMSO or the indicated GS-441524 or fluoxetine concentration at 2 h post-infection (hpi) for the entire 48 h infection period. Afterwards, the apical culture supernatants were collected and immediately frozen at −80 °C to determine the number of infectious particles.

#### *2.4. Plaque Assay*

The number of infectious particles in the supernatant of treated cells was determined via a standard plaque assay. Briefly, monolayers of Vero E6 cells cultured in six-well dishes were washed with PBS and infected with serial dilutions of the respective supernatants in infection-PBS for 1 h at 37 °C. Subsequently, the inoculum was replaced with 2x MEM (MEM containing 0.2% BSA, 2 mM L-glutamine, 1 M HEPES, pH 7.2, 7.5% NaHCO<sub>3</sub>, 100 U/mL penicillin, 0.1 mg/mL streptomycin, and 0.4% Oxoid agar) and incubated at 37 °C for 72 h. A neutral red staining was performed to visualize virus plaques, and virus titers were calculated and expressed as plaque-forming units (PFU) per mL.
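The titer calculation mentioned at the end of this protocol follows the usual plaque-assay arithmetic: plaques counted, divided by the product of the dilution factor and the inoculum volume. The sketch below illustrates that arithmetic only; the function name, the plaque count, the dilution, and the inoculum volume are hypothetical, since the article does not report them.

```python
def plaque_titer(plaque_count: int, dilution: float, inoculum_ml: float) -> float:
    """Return the virus titer in PFU/mL.

    plaque_count: plaques counted in one well
    dilution:     dilution of the supernatant plated in that well (e.g. 1e-4)
    inoculum_ml:  volume of diluted supernatant added to the well, in mL
    """
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if dilution <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution and inoculum_ml must be positive")
    return plaque_count / (dilution * inoculum_ml)


# Hypothetical example: 20 plaques in a well that received 0.5 mL
# of a 10^-4 dilution, i.e. roughly 4 x 10^5 PFU/mL.
titer = plaque_titer(plaque_count=20, dilution=1e-4, inoculum_ml=0.5)
print(f"{titer:.2e} PFU/mL")
```

In practice, counts from wells with a countable number of plaques are typically averaged across replicates before reporting.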

#### *2.5. Data and Statistical Analysis*

The required sample sizes (to detect a > 90% reduction in virus titers at a power > 0.8) were determined by using the a priori power analysis G\*Power 3.1 (Faul et al., 2007). Data were analyzed using the software GraphPad Prism version 8.00 (GraphPad).

To define dose–response curves, virus titers were normalized to the percentages of titers detected in cells treated with the solvent DMSO (control), and drug concentrations were log-transformed. EC values were calculated from the sigmoidal curve fits using a four-parameter logistic (4PL) model. The combinatory effects of the drug pair fluoxetine/GS-441524 were analyzed by using SynergyFinder, an open-source, free, standalone web application for the analysis of drug combination data [25]. The synergy was evaluated based on the Zero Interaction Potency (ZIP), Bliss independence, and highest single agent (HSA) reference models. Additionally, we analyzed the overall drug combination sensitivity score (CSS) by using the CSS method [26]. For statistical analysis of cytotoxicity assays, values were normalized to the percentages of toxicity detected in the control cells (cells treated with the solvent DMSO); significant differences were evaluated using a one-way ANOVA followed by Dunnett's multiple comparison test. \*\* *p* < 0.01, \*\*\* *p* < 0.001, \*\*\*\* *p* ≤ 0.0001.
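To make the four-parameter logistic (4PL) model concrete, the sketch below writes out the curve and the relation between EC50, the Hill slope, and any other ECx. It is an illustration under stated assumptions rather than the GraphPad Prism procedure the authors used: it assumes inhibition runs from a bottom of 0% to a top of 100%, and the Hill slope is back-calculated from the EC50 and EC90 values reported in the Results, not taken from the article.

```python
import math


def four_pl(conc: float, bottom: float, top: float, ec50: float, hill: float) -> float:
    """Percent inhibition at concentration `conc` under a 4PL model."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def ec_x(ec50: float, hill: float, x: float) -> float:
    """Concentration giving x% inhibition, assuming bottom = 0 and top = 100."""
    if not 0.0 < x < 100.0:
        raise ValueError("x must lie strictly between 0 and 100")
    return ec50 * (x / (100.0 - x)) ** (1.0 / hill)


def implied_hill(ec50: float, ec90: float) -> float:
    """Hill slope implied by an EC50/EC90 pair (bottom = 0, top = 100)."""
    return math.log(9.0) / math.log(ec90 / ec50)


# EC50 = 0.28 uM and EC90 = 1.33 uM are the values reported for GS-441524.
hill = implied_hill(0.28, 1.33)
print(f"implied Hill slope: {hill:.2f}")
print(f"EC90 recovered:     {ec_x(0.28, hill, 90.0):.2f} uM")
print(f"inhibition at EC50: {four_pl(0.28, 0.0, 100.0, 0.28, hill):.1f} %")
```

Under these assumptions the reported pair corresponds to a Hill slope of about 1.4; a fit with a free bottom or top plateau would give a somewhat different value.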

#### **3. Results**

We have recently reported that the clinically used antidepressant fluoxetine in combination with the viral RNA-dependent RNA polymerase inhibitor remdesivir exhibits synergistic antiviral effects against the SARS-CoV-2 infection in vitro [27]. A major drawback for the in vivo use of the prodrug remdesivir is the very short plasma half-life of approximately 20 min [17]. Remdesivir is converted into its main plasma metabolite GS-441524 when administered to patients [14,17]. Thus, we wanted to assess the antiviral potential of GS-441524 in a polarized Calu-3 cell culture model. We infected Calu-3 cells with the isolate hCoV-19/Germany/FI1103201/2020 at MOI 0.1 for 48 h and quantified the production of infectious SARS-CoV-2 particles by a plaque assay. Control Calu-3 cells that were treated with the solvent DMSO yielded viral titers up to 2 × 10<sup>6</sup> PFU, whereas treatment with the nucleoside GS-441524 at 2 hpi significantly inhibited the production of the circulating SARS-CoV-2 variant in a dose-dependent manner (Figure 1). Fitting of the experimental dose–response values to a nonlinear four-parameter logistic model resulted in a half-maximal inhibitory concentration (EC<sub>50</sub>) and a 90% inhibitory concentration (EC<sub>90</sub>) of 0.28 µM and 1.33 µM, respectively, for the Muenster isolate (Figure 1). Validation of Calu-3 cell viability after administration of GS-441524 via an MTT assay revealed that only a very high concentration of GS-441524 resulted in detectable cytotoxicity, whereas all concentrations further used in the pharmacological interaction studies had no influence on the cell viability (Figure S1a, Supplementary Material). The calculated 50% cytotoxic concentration (CC<sub>50</sub>) of the remdesivir metabolite is 47.66 µM with a selectivity index (SI) of 170.21, which emphasizes a safe antiviral treatment window.
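The selectivity index quoted above is the ratio of the cytotoxic to the antiviral concentration, SI = CC50 / EC50. A minimal check with the reported values (the function name is illustrative only):

```python
def selectivity_index(cc50_um: float, ec50_um: float) -> float:
    """Selectivity index: 50% cytotoxic concentration over 50% effective concentration."""
    if ec50_um <= 0:
        raise ValueError("ec50_um must be positive")
    return cc50_um / ec50_um


# Reported values for GS-441524: CC50 = 47.66 uM, EC50 = 0.28 uM.
print(round(selectivity_index(47.66, 0.28), 2))  # 170.21, matching the text
```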


**Figure 1.** Analysis of GS-441524-mediated reduction of infectious SARS-CoV-2 particle production. Polarized Calu-3 cells were infected with 0.1 MOI of SARS-CoV-2 (hCoV-19/Germany/FI1103201/2020) for 48 h. At 2 hpi, cells were treated with GS-441524 at the indicated concentrations. Data were expressed as mean infectious viral titers ± SEM or as mean percent inhibition ± SEM of SARS-CoV-2 replication (control cells that were treated with the solvent DMSO were set to 100%), *n* = 5. LogEC<sub>50</sub> and LogEC<sub>90</sub> values were determined by fitting a four-parameter non-linear regression model.

We next addressed whether a combinatory treatment with the drug pair fluoxetine-GS-441524 had a synergistic interaction to limit the SARS-CoV-2 infection. To study the antiviral properties of the drug combinations, we used, for both drugs, concentrations that were previously reported to have an individual antiviral activity below 90%, whereas their combination was able to achieve a more than 90% reduction in viral titers [27]. The highest single dose of GS-441524 (1000 nM) was able to achieve 95% inhibition of viral titers, whereas treatment with the highest single dose of fluoxetine reduced viral titers by up to 75% (Figure 2).

**Figure 2.** Antiviral activities of a single treatment against SARS-CoV-2. Polarized Calu-3 cells were infected with SARS-CoV-2 and treated with the indicated GS-441524 or fluoxetine concentrations for 48 h. Bars represent mean percent inhibition ± SEM of infectious virus production, with mean virus titer produced in control cells (treated with the solvent DMSO) set to 100%; *n* = 5. Dotted line, 90% reduction in viral titer.

Next, we determined the number of infectious virus particles in Calu-3 cells that were treated with a combination of both drugs. On the basis of our recent publications [27] on the antiviral potential of fluoxetine alone or in combination with remdesivir, we now analyzed the antiviral effects of a combined fluoxetine/GS-441524 treatment (Figure 3A,B). We observed a noticeable increase in the pharmacological inhibition of infectious virus production (>90%) when cells were treated with a concentration of 500 nM GS-441524 and 1000 nM fluoxetine or higher doses of the drug pair (Figure 3A,B), thus showing the great potential of a combination treatment of the remdesivir metabolite GS-441524 with fluoxetine. Additionally, we assessed the cytotoxic effects of the combinatory treatments via an MTT assay to exclude the potential synergistic toxicity of the drug pair. The MTT assay is based on the reduction of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide to formazan crystals by NAD(P)H-dependent oxidoreductase enzymes in metabolically active cells; this colorimetric assay measures the metabolic activity as an integrated indicator of changes in the cell viability, cytotoxicity, and proliferation. As the analysis of the combination treatments with fluoxetine and GS-441524 did not reveal any toxicities (Figure S1b), we continued to analyze the drug synergy without the subtraction of cytotoxicity.

**Figure 3.** Antiviral activities of combination treatment with fluoxetine and GS-441524. Polarized Calu-3 cells were infected with SARS-CoV-2 and treated with the indicated drug combinations for 48 h. (**A**) Data were expressed as plaque-forming units (PFU) per mL detected in a single experimental sample, lines indicate means; *n* = 5/treatment or as (**B**) percent pharmacological inhibition of infectious virus production for the drug pair fluoxetine and GS-441524 (with mean virus titer produced in control cells (treated with the solvent DMSO) set to 100%).

Although drug synergy is not necessarily required for clinical benefits, synergy scoring remains an important parameter for the evaluation of drug combination therapies. Thus, we next evaluated the drug interaction profile of fluoxetine and GS-441524 by using three commonly used reference synergy models: ZIP, Bliss independence and highest single agent (HSA). Even though these reference synergy models analyze drug interactions based on different basic interaction assumptions, they all indicated a synergistic action of GS-441524 and fluoxetine (Figure 4). The drug interaction relationships and landscape visualizations revealed a high synergy score in all models when cells were treated with a combination of 500–1000 nM GS-441524 and ~1000 nM fluoxetine. The strong synergy of the combinatory treatment with both drugs led to an overall drug combination sensitivity score (CSS) of 92.42.

We further assessed the antiviral capacity of the combination therapy with GS-441524 and fluoxetine against the SARS-CoV-2 alpha and beta variants of concern (VOC). Both strains have mutations in the spike protein's receptor binding domain (for example, 501Y, a change from asparagine (N) to tyrosine (Y) in amino-acid position 501), which impair angiotensin-converting enzyme 2 (ACE2) binding specificity and lead to an increased [...] receptor-binding domain enables a partial immune escape from neutralization induced by vaccination or previous virus infection and is therefore considered to be of concern [28]. Importantly, the combination of GS-441524 and fluoxetine potently reduced viral titers of both VOCs synergistically when compared to monotherapy (Figure 5). While monotherapy reduced viral titers by 60 to 70% for fluoxetine or by up to 90% for GS-441524, the combination of both drugs resulted in a viral inhibition above 99% (Figure 5). Thus, the combination of the host-directed fluoxetine and the virus-targeting GS-441524 showed great antiviral potential against SARS-CoV-2 variants that have significant changes in the spike protein's receptor-binding domain.

**Figure 4.** Pharmacological interaction profile of the drug pair GS-441524 and fluoxetine. Drug interactions were analyzed based on three commonly used reference models: (**A**) Zero Interaction Potency (ZIP), (**B**) Bliss independence, and (**C**) highest single agent (HSA). While the HSA model defines a synergistic drug combination as one that produces additional benefits on top of what the drugs can achieve alone, the Bliss independence model uses probabilistic theory to model the effects of individual drugs in a combination as independent yet competing events. Synergy calculations via the ZIP model include the comparison of potency changes of the dose–response curves between individual drugs and their combinations. A color-coded interaction surface was used to illustrate the synergy scores of the responses, where high synergistic scores are colored in red. Synergy score calculations via the ZIP and Bliss independence models revealed a synergy of ~15, while the HSA model showed a higher synergy score of ~23.

We further assessed the antiviral capacity of the combination therapy with GS-441524 and fluoxetine against the SARS-CoV-2 alpha and beta variants of concern (VOC). Both strains have mutations in the spike protein's receptor-binding domain (for example, 501Y, a change from asparagine (N) to tyrosine (Y) at amino acid position 501), which impair angiotensin-converting enzyme 2 (ACE2) binding specificity and lead to an increased transmissibility [28–31]. At least for the beta variant, changes in the spike protein's receptor-binding domain enable a partial immune escape from neutralization induced by vaccination or previous virus infection, and the variant is therefore considered to be of concern [28]. Importantly, the combination of GS-441524 and fluoxetine potently and synergistically reduced viral titers of both VOCs when compared to monotherapy (Figure 5). While monotherapy reduced viral titers by 60 to 70% for fluoxetine and by up to 90% for GS-441524, the combination of both drugs resulted in a viral inhibition above 99% (Figure 5). Thus, the combination of the host-directed fluoxetine and the virus-targeting GS-441524 showed great antiviral potential against SARS-CoV-2 variants that have significant changes in the spike protein's receptor-binding domain.
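To make the reference models concrete, the following minimal sketch computes the Bliss independence and HSA expectations for a single dose pair and compares them with the observed combination effect. The input fractions are read off the ranges quoted above (the 65% value for fluoxetine is an assumed midpoint of the 60–70% range), and the function names are illustrative. The resulting single-point excesses are not the surface-averaged synergy scores reported in Figure 4, and the ZIP model is omitted because it requires fitted dose–response curves.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition if both drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b


def hsa_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition under the highest-single-agent model."""
    return max(effect_a, effect_b)


# Fractional inhibition of viral titers (illustrative values from the text).
fluoxetine = 0.65    # assumed midpoint of the reported 60-70% reduction
gs441524 = 0.90      # "up to 90%" reduction
combination = 0.99   # "above 99%" reduction, taken here as exactly 99%

bliss = bliss_expected(fluoxetine, gs441524)
hsa = hsa_expected(fluoxetine, gs441524)

# A positive excess means the observed effect exceeds the reference expectation.
bliss_excess = combination - bliss
hsa_excess = combination - hsa

print(f"Bliss expectation: {bliss:.3f}, excess: {bliss_excess:.3f}")
print(f"HSA expectation:   {hsa:.3f}, excess: {hsa_excess:.3f}")
```

Under these assumed inputs both excesses are positive, which is the qualitative pattern expected for a synergistic pair; the HSA excess is larger because HSA is the less stringent reference.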

**Figure 5.** Antiviral effect of the combination therapy with GS-441524 and fluoxetine against two newly emerged SARS-CoV-2 variants. Polarized Calu-3 cells were infected with the (**A**) alpha or (**B**) beta variant of SARS-CoV-2 and treated with 2.5 µM fluoxetine, 1 µM GS-441524 or the combination of both drugs for 48 h. (**A**) Data were expressed as plaque-forming units (PFU) per mL detected in a single experimental sample, lines indicate means; *n* = 5/treatment or as (**B**) percent pharmacological inhibition of infectious virus production for the drug pair fluoxetine and GS-441524 (with mean virus titer produced in control cells (treated with the solvent DMSO) set to 100%), *n* = 5. Dotted line, 99% reduction in viral titer. One-way ANOVA followed by Dunnett's multiple comparison test. \*\*\* *p* ≤ 0.001, \*\*\*\* *p* ≤ 0.0001.


#### **4. Discussion**

Emerging zoonotic diseases such as the current SARS-CoV-2 pandemic are global threats to humans and health care systems. SARS-CoV-2, which causes COVID-19, has already led to more than 2 million deaths within one year. Thus, vaccines and antivirals are urgently needed to slow the global spread and community transmission of SARS-CoV-2. Antiviral therapy often includes a combination of several drugs, each targeting a different step in the virus life cycle, to circumvent the emergence of drug resistance. The benefit of antiviral combinations has been reported in a large number of studies [32–35]. The most significant recent successes of antiviral combination therapy were achieved in the fight against HIV-1 and HCV, where drugs that interfere with virus entry and replication were used [36–38]. While host-directed drugs mostly impair viral replication without completely eradicating the pathogen, antivirals that directly target viral proteins are much more efficient in eradicating viruses. However, a major concern of direct antiviral therapy is the risk of inducing new resistant virus strains [39], an adaptive step that has already been observed in antiviral therapy against influenza and HIV [40,41]. Combining antivirals with host-directed drugs makes it much less likely that a virus can overcome the antiviral barrier and develop resistance. Thus, the combination of both is routinely explored for enhanced treatment success [42–44].

One critical step in the life cycle of enveloped viruses such as SARS-CoV-2 is the entry into the host cell. SARS-CoV-2, similar to other enveloped viruses, needs to overcome the host cell membrane to transfer the viral genome into the cytosol, a step that is limited by the fusion of viral and cellular membranes [23,45]. SARS-CoV-2 binds via its spike protein, a viral envelope protein, to the host cell receptor ACE2 [46–48]. Attachment of virus particles facilitates priming of the spike protein via proteolytic cleavage, which is mediated by several host proteases and is a prerequisite for membrane fusion. Cleavage by the cellular transmembrane protease serine 2 (TMPRSS2) triggers fusion with the plasma membrane, whereas other endosome-residing proteases are required for the fusion of endocytosed SARS-CoV-2 particles with endosomes [23,45]. Thus, the endosomal compartment is a critical host/pathogen interface for SARS-CoV-2 [23]. The antiviral mode of action of fluoxetine is most likely based on its inhibitory effect on the endolysosome-residing enzyme sphingomyelin phosphodiesterase ("acid sphingomyelinase", ASM). The blocking of ASM activity results in sphingomyelin accumulation, which negatively affects cholesterol release from the endolysosomal compartment, creating the desired antiviral barrier [22,23].

In our recent study [27], we showed that the combination of the host-directed drug fluoxetine and the viral RNA-dependent RNA polymerase inhibitor remdesivir results in a synergistic antiviral effect on the production of infectious virus particles. Remdesivir was originally developed for the treatment of Ebola [18] but exerts antiviral activity against a number of viruses, including Ebolavirus, Marburg virus, MERS-CoV and SARS-CoV-2 [9,11,15,18,49]. Remdesivir was the first drug that received an FDA emergency use authorization for severe COVID-19 treatment. Since remdesivir has a very short half-life in the plasma of patients (approximately 20 min) and, moreover, requires activation by prodrug-converting enzymes (such as carboxylesterase 1 (CES1), cathepsin A (CTSA) and histidine triad nucleotide-binding proteins (HINT)), which are preferentially expressed in the liver [17,50–53], it is unsuitable for lung-specific delivery and its clinical use remains controversial [17]. Structural similarity studies between the main remdesivir metabolite GS-441524 and human enzymes suggest that the bioactivation of GS-441524 relies on adenosine kinase (ADK) [17]. ADK is moderately expressed across all tissues and, thus, GS-441524 would be more suitable for systemic and lung-specific delivery. GS-441524 has been reported to potently inhibit SARS-CoV-2 replication in vitro and in a mouse model of SARS-CoV-2 infection and pathogenesis [11,12], suggesting this metabolite as a promising drug candidate for further evaluation. The favorable safety profile of GS-441524 (shown by the better SI values [10,54] and by animal models [55,56]) suggests a wider therapeutic window, which allows for a higher dosing of GS-441524 compared to remdesivir without causing adverse side effects.

Our data are consistent with recent studies demonstrating that monotherapy with the remdesivir metabolite GS-441524 yields EC<sub>50</sub> and EC<sub>90</sub> values similar to those of remdesivir in polarized Calu-3 cells (GS-441524: EC<sub>50</sub> = 0.28 µM, EC<sub>90</sub> = 1.33 µM; remdesivir: EC<sub>50</sub> = 0.28 µM, EC<sub>90</sub> = 2.48 µM, ref. [11]).
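As a rough illustration of what these paired values imply, the sketch below assumes a standard Hill dose–response model, which the text does not state explicitly. Under that assumption, EC90 = EC50 · 9^(1/h), so the Hill slope h follows directly from the two reported values. The function names are hypothetical.

```python
import math


def hill_slope(ec50: float, ec90: float) -> float:
    """Hill slope implied by EC50 and EC90, assuming E(c) = c^h / (EC50^h + c^h)."""
    return math.log(9.0) / math.log(ec90 / ec50)


def fraction_inhibited(conc: float, ec50: float, slope: float) -> float:
    """Fractional inhibition at concentration conc under the same Hill model."""
    return conc**slope / (ec50**slope + conc**slope)


# EC50 / EC90 values in µM as quoted from ref. [11] in the text.
h_gs = hill_slope(ec50=0.28, ec90=1.33)    # GS-441524
h_rdv = hill_slope(ec50=0.28, ec90=2.48)   # remdesivir

print(f"Implied Hill slope, GS-441524:  {h_gs:.2f}")
print(f"Implied Hill slope, remdesivir: {h_rdv:.2f}")
```

With identical EC50 values, the lower EC90 of GS-441524 corresponds to a steeper implied curve, meaning that near-complete inhibition would be reached at a lower concentration if the Hill assumption holds.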

We further evaluated the overall antiviral effect of the combination of GS-441524 and fluoxetine, which was larger than the expected sum of the independent drug effects, showing a synergistic effect against three circulating SARS-CoV-2 variants (Figures 4 and 5). Treatment with GS-441524 in combination with fluoxetine shows a synergistic activity comparable to that of the recently published combination of fluoxetine and remdesivir [27]. Both combination treatments lead to an average synergy score of ~15 (in the ZIP or Bliss independence reference model) or of ~23 in the HSA reference model, with a high synergy score when cells were treated with a combination of 500–1000 nM GS-441524 or remdesivir and 1000–2500 nM fluoxetine [27]. Of note, no cytotoxic effects were observed when the cells were treated with the combination of both drugs. Successful monotherapy with the individual drugs requires high drug doses, and prolonged treatment is often associated with poor patient compliance. The synergistic action of fluoxetine and GS-441524 allows the administration of lower concentrations of the individual drugs, which can reduce potential side effects.

The transfer of in vitro data to the in vivo situation is critical in antiviral research. Thus, we compared the concentrations shown to be effective in our in vitro study with reachable plasma concentrations in patients when drugs were administered. The nucleoside analog GS-441524 can reach plasma concentrations up to 1000-fold higher than remdesivir (maximum plasma levels 3 mg/L directly after intravenous infusion and 80–170 µg/L after 1 h when given intravenously) [14], whereas orally administered fluoxetine (20 mg/day) has a high bioavailability with plasma levels of 350 µg/L after two weeks and up to 1055 µg/L for longer treatment periods in patients [57,58]. For both drugs, plasma concentrations are well within the ranges that equal effective drug concentrations in vitro.
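The comparison between plasma levels (given in mass per volume) and in vitro concentrations (given in molar units) requires a unit conversion, sketched below. The molar masses are not stated in the text and are assumptions taken from the molecular formulas (GS-441524, C12H13N5O4, about 291.26 g/mol; fluoxetine free base, C17H18F3NO, about 309.33 g/mol); the helper name is illustrative.

```python
# Molar masses in g/mol (assumed from the molecular formulas, not from the text).
MOLAR_MASS = {"GS-441524": 291.26, "fluoxetine": 309.33}


def ug_per_l_to_um(conc_ug_per_l: float, molar_mass: float) -> float:
    """Convert µg/L to µmol/L: (µg/L) / (g/mol) = µmol/L."""
    return conc_ug_per_l / molar_mass


# Plasma levels quoted in the text, in µg/L (3 mg/L = 3000 µg/L).
plasma_ug_per_l = {
    "GS-441524": [3000.0, 80.0, 170.0],
    "fluoxetine": [350.0, 1055.0],
}

plasma_um = {
    drug: [ug_per_l_to_um(c, MOLAR_MASS[drug]) for c in levels]
    for drug, levels in plasma_ug_per_l.items()
}

for drug, levels in plasma_um.items():
    print(drug, [f"{c:.2f} µM" for c in levels])
```

These molar values can then be set against the concentrations used in the cell culture experiments (1 µM GS-441524 and 2.5 µM fluoxetine in Figure 5).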

Our results demonstrate a strain-independent potential therapeutic capacity of combined treatment with the directly acting antiviral nucleoside analog GS-441524 and the host-directed drug fluoxetine to combat SARS-CoV-2 infection and limit deleterious COVID-19 outcomes. At least the mutations occurring in the spike protein's receptor-binding domain had no influence on the antiviral efficacy of the combination or of the monotherapies with GS-441524 or fluoxetine (Figure 5) [28–31]. The suitability of combining host-directed drugs with antivirals in SARS-CoV-2 therapy was recently confirmed in a double-blind, randomized, placebo-controlled trial in which combination therapy with remdesivir and the host-directed Janus kinase inhibitor baricitinib was beneficial in the treatment of hospitalized COVID-19 patients [59,60].

However, combined medications pose the risk of drug–drug interactions (DDI), which may lead to a reduced therapeutic benefit or even severe adverse effects. Thus, it is indispensable to survey the drug interactions and to carefully evaluate the appropriate treatment strategy against SARS-CoV-2. While clinical data from healthy donors showed that remdesivir and its metabolite GS-441524 are metabolized through cytochrome P450 (CYP) enzymes (CYP2C8, CYP2D6, and CYP3A4), clinical studies that examined drug–drug interactions were not yet complete, although the mathematical prediction of DDI liability suggested that remdesivir and GS-441524 might elevate the levels of co-prescribed drugs that depend on these CYP enzymes [61–63]. However, the influence of remdesivir on CYP-enzyme-dependent metabolism is suggested to be weak [61,63]. Thus, simultaneous administration with fluoxetine, another known inhibitor of CYPs (CYP2D6 and CYP2C9/10), should be carefully monitored [64–66]. As fluoxetine is also a serotonin-reuptake inhibitor (SRI), simultaneous administration with other SRIs should also be avoided (including amphetamines and other sympathomimetic appetite suppressants) [67,68]. For further information about possible drug–drug interactions, visit Drugs.com (accessed on 18 February 2021) [69]. Since fluoxetine can exert serious side effects in some patients, we do not recommend self-medication. The careful administration of drugs should exclusively rely on medical advice.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics13091400/s1, Figure S1: Analysis of the cytotoxicity of GS-441524 monotherapy and of the combinatory treatment with fluoxetine.

**Author Contributions:** Conceptualization and methodology, S.S.; validation, formal analysis, investigation, and data curation, L.B., A.M.-Z., S.S. and S.Z.; resources, L.B., S.S., J.T., S.L. and U.R.; writing—original draft preparation, S.S.; writing—review and editing, S.S., L.B., S.L., J.T. and U.R.; visualization, L.B., S.Z. and S.S.; supervision, U.R., S.L., J.T. and S.S., project administration, S.S.; funding acquisition, S.S., L.B., U.R., J.T. and S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by grants from the German Research Foundation (DFG), CRC1009 "Breaking Barriers", Project A06 (to U.R.) and B02 (to S.L.), CRC 1348 "Dynamic Cellular Interfaces", Project A11 (to U.R.), DFG Lu477/23-1 (to S.L.), KFO342 TP6, Br5189/3-1 (to L.B.), Lu477/30-1 (to S.L.), the European Research Council No. 716063 (to S.Z. and J.T.), the Academy of Finland No. 317680 (to S.Z. and J.T.), the Interdisciplinary Center for Clinical Research (IZKF) of the Münster Medical School, grant number Re2/022/20 (to U.R.), Bru2/015/19 (to L.B.), the Innovative Medizinische Forschung (IMF) of the Münster Medical School, grant number SC121912 (to S.S.) and from BR111502 (to L.B.). Further funding was provided by the German Federal Ministry for Education and Research (BMBF), grant number 01KI20218 (CoIMMUNE) and NUM-COVID-19, Organo-Strat 01KX2021 (to L.B. and S.L.). S.S., S.L. and U.R. are members of the German FluResearchNet, a nationwide research network on zoonotic influenza. S.S. and U.R. are also members of the British Pharmacological Society.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

**Acknowledgments:** We thank Jonathan Hentrey for help with the cell culture. We acknowledge support from the Open Access Publication fund at the University of Muenster.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **3D-ALMOND-QSAR Models to Predict the Antidepressant Effect of Some Natural Compounds**

**Speranta Avram <sup>1,†</sup>, Miruna Silvia Stan <sup>1,2</sup>, Ana Maria Udrea <sup>2,3,†</sup>, Cătălin Buiu <sup>4,\*</sup>, Anca Andreea Boboc <sup>5,6</sup> and Maria Mernea <sup>1</sup>**


**Abstract:** The current treatment of depression involves antidepressant synthetic drugs that have a variety of side effects. In searching for alternatives, natural compounds could represent a solution, as many studies reported that such compounds modulate the nervous system and exhibit antidepressant effects. We used bioinformatics methods to predict the antidepressant effect of ten natural compounds with neuroleptic activity, reported in the literature. For all compounds we computed their drug-likeness, absorption, distribution, metabolism, excretion (ADME), and toxicity profiles. Their antidepressant and neuroleptic activities were predicted by 3D-ALMOND-QSAR models built by considering three important targets, namely serotonin transporter (SERT), 5-hydroxytryptamine receptor 1A (5-HT1A), and dopamine D2 receptor. For our QSAR models we have used the following molecular descriptors: hydrophobicity, electrostatic, and hydrogen bond donor/acceptor. Our results showed that all compounds present drug-likeness features as well as promising ADME features and no toxicity. Most compounds appear to modulate SERT, and fewer appear as ligands for 5-HT1A and D2 receptors. From our prediction, linalyl acetate appears as the only ligand for all three targets, neryl acetate appears as a ligand for SERT and D2 receptors, while 1,8-cineole appears as a ligand for 5-HT1A and D2 receptors.

**Keywords:** antidepressant; natural compounds; QSAR; molecular docking

#### **1. Introduction**

Depression is a common mental disorder that affects 264 million people worldwide, according to the WHO. A severe consequence of depression is suicide, and nearly 800,000 people commit suicide every year [1]. Depression can be treated by psychotherapy and medication involving antidepressant and antipsychotic drugs. Although these drugs have beneficial effects in the management of depression, their usage could lead to severe side effects such as hepatotoxicity, weight gain, sexual dysfunction, cardiovascular disorders, and central nervous system disturbances [2].

**Citation:** Avram, S.; Stan, M.S.; Udrea, A.M.; Buiu, C.; Boboc, A.A.; Mernea, M. 3D-ALMOND-QSAR Models to Predict the Antidepressant Effect of Some Natural Compounds. *Pharmaceutics* **2021**, *13*, 1449. https://doi.org/10.3390/pharmaceutics13091449

Academic Editors: Lucreția Udrescu, Paul Bogdan and Mihai Udrescu

Received: 13 July 2021 Accepted: 7 September 2021 Published: 10 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Natural compounds may represent a viable alternative in depression treatment with possibly fewer side effects, supporting their administration even in patients with comorbidities [3,4]. Here we used an in silico approach to predict the antidepressant activity of the following natural compounds: resveratrol, quercetin, limonene, sabinene, 1,8-cineole, chamazulene, linalyl acetate, germacrene D, nerol, and neryl acetate.

Resveratrol is a polyphenol with benefits in inflammation, brain diseases, and depression [4]. A study in a rat model of irritable bowel syndrome showed that resveratrol had inhibitory activity on the 5-hydroxytryptamine receptor 1A (5-HT1A), thus improving the brain–gut axis [5]. A review of twenty-two studies concluded that resveratrol has positive effects in animal models of depression, comparable with those of antidepressant drugs. Regarding safety, the same review concluded that resveratrol has an exceptional safety profile and only a few side effects [6].

Quercetin is a flavonoid whose antidepressant activity was studied in diabetic mice and compared with that of the antidepressants fluoxetine and imipramine. Results show that quercetin had effects similar to those of these drugs in diabetic mice but not in naive mice [7]. Another study in mice concluded that pre-administered quercetin decreases stress-induced behaviour, regulates cholinergic and serotoninergic functions, has an anxiolytic and antidepressant effect, and boosts memory function [8]. Quercetin also inhibited the behavioral effects induced by corticotropin-releasing factor (anxiety and depression) in a mouse model study [9].

Limonene is a monocyclic monoterpene known for its antiviral, antibacterial, anticancer, and anti-inflammatory activities [10]. This compound also shows neuroprotective effects in a *Drosophila* model [11] and antidepressant-like activity (mediated by its antineuroinflammatory action and by lowering hippocampal nitrite levels) in a mouse model [12]. Studies in mouse models showed that limonene regulates dopamine levels and 5-HT function [13] and increases dopamine and norepinephrine levels [14].

Sabinene is a monoterpene with antimicrobial and antifungal activity, with possible effects on the central nervous system [15,16]. Sabinene is a component of *Origanum vulgare* (oregano) essential oil (4.95%), and it may show antidepressant-like activity in a rat model [17].

Eucalyptol (1,8-cineole) is a monoterpenoid with benefits as a mucolytic or spasmolytic agent [18]. Additionally, 1,8-cineole inhalation shows an anxiolytic effect in both mice and humans [19,20]. Eucalyptol is found in various plants, including the oil from the aerial parts of *Rosmarinus officinalis* (rosemary) (8.58%). This oil shows inhibitory activity on 5-HT1A [21]. Although this compound has antidepressant effects, its mechanism of action is not yet clear [14].

Chamazulene is an aromatic compound found in *Matricaria chamomilla* (chamomile) and is known as a compound with antioxidant, anti-inflammatory, and hepatoprotective effects [22]. Chamomile is known for its benefits in generalized anxiety and insomnia [23].

The monoterpene linalyl acetate is the main constituent of *Lavandula angustifolia* (lavender) essential oil. Lavender essential oil is known for its antidepressant and anxiolytic properties [24,25]. Linalyl acetate's mechanism of action as an antidepressant is well studied: this compound has binding affinity for the N-methyl-D-aspartate (NMDA) receptor [26] and reduces the activity of the 5-HT1A receptor [24], but not of the serotonin transporter (SERT) [26]. The monoterpene nerol can also be found in *Lavandula angustifolia* essential oil [27] or in *Rosa damascena* (damask rose) oil. It can decrease lipid peroxidation levels, a process occurring under chronic mild stress [28].

Germacrene D is a sesquiterpenoid found in *Anthriscus nemorosa* (chervil) essential oil in a proportion of 5.6%. Studies in rats revealed that scopolamine-induced memory impairment, anxiety, and depression can be improved using chervil essential oil [14]. Neryl acetate is one of the main constituents of *Cananga odorata* (ylang-ylang) essential oil. This oil decreases dopamine levels and increases serotonin levels in mice [29].

The compounds that we have chosen to analyze are indexed in FooDB, a resource on food constituents comprising detailed descriptions of compounds, including their physiological and presumed health effects [30]. Except for sabinene, linalyl acetate, nerol, and neryl acetate, the compounds are also indexed in DrugBank, a database comprising extensive information on drugs and drug targets, including natural compounds and herbs used in therapeutic products [31]. The applications of the selected compounds as indexed in the two databases are summarized in Table 1. It is essential to highlight that DrugBank (or any other database, for that matter) does not index the considered compounds as antidepressants. Taking into account their promising benefits in depression, as revealed mostly by studies conducted in animal models, we considered here the possibility of repositioning some of these compounds as antidepressants and even as neuroleptics to be used in the treatment of humans.

**Table 1.** Applications of the natural compounds indexed in the databases DrugBank 5.1.8 [31] and FooDB [30].


Previously, we used structure-activity relationship (SAR) models to investigate the potential of natural compounds from *Mentha spicata* essential oil to modulate acetylcholinesterase and the NMDA receptor, two important targets considered in Alzheimer's disease therapy [32]. A similar approach can be applied in the case of other nervous system diseases, like depression. In depression, important therapy targets are SERT, dopamine receptor 2 (D2), and 5-HT1A [33,34]. In our previous work, we built quantitative structure-activity relationship (QSAR) models to predict the effect of candidate compounds against these targets [33,34].

Particularly useful in drug design, discovery, and development is the 3D-QSAR methodology, which helps to understand the relationship between spatial parameters of molecules and their biological properties [35]. Recent studies have used 3D-QSAR as a screening step in drug repositioning strategies [36–39]. In these cases, QSAR models were used to screen compounds from: (i) the DrugBank database [31] in order to identify promising candidates that modulate histone deacetylases [37] or inhibit the SARS-CoV main protease [38]; (ii) FDA-approved drugs from the ZINC database [40] that inhibit Sirt2 [39]; or (iii) FDA-approved drugs from the e-Drug3D database [41] to identify druggable compounds for iatrogenic botulism treatment [36]. 3D-QSAR is valuable in drug repositioning, being complementary to other methods like molecular docking, because it predicts the activity of compounds (high/low activity) and yields the molecular features important for their effect [39].

In the present study we built three QSAR models to screen our collection of natural compounds identified from the literature against SERT, D2, and 5-HT1A receptor in order to identify the most promising compound with antidepressant and neuroleptic activity. Additionally, we analyzed the interactions between receptors and lead ligands using molecular docking.

#### **2. Materials and Methods**

#### *2.1. Preparation of Natural Compounds Structures*

The present study looked at ten natural compounds, namely resveratrol, quercetin, limonene, sabinene, 1,8-cineole, chamazulene, linalyl acetate, germacrene D, nerol, and neryl acetate, selected for their potential antidepressant effects identified in the literature, as described in Section 1. The antidepressant effect of these compounds was evaluated against SERT and the 5-HT1A receptor, and their neuroleptic effect against the D2 receptor.

MOE software was used to model and optimize the 3D structures of molecules. We minimized the energy using the MMFF94x force field at a 0.005 gradient and Gasteiger-type charges [42].

#### *2.2. Prediction of Compounds Drug- and Lead-Likeness Features*

The Lipinski [43], Veber [44], Ghose [45], and Egan [46] filters, as implemented in the SwissADME online tool [47], were used to evaluate the drug-likeness of the natural compounds. The analyzed compounds should not violate more than three of these rules. According to the Lipinski criteria, compounds must have a molecular weight lower than 500 Daltons, no more than 10 hydrogen bond acceptors, no more than 5 hydrogen bond donors, and a logarithm of the octanol/water partition coefficient (Log P(o/w)) lower than 5.

According to the Ghose filter, the molecular weight must be between 160 and 480, the Log P(o/w) must be between −0.4 and 5.6, the molar refractivity must be between 40 and 130, and the number of atoms must be between 20 and 70.

The Veber rule is satisfied if the topological polar surface area (TPSA) is less than 140 Å<sup>2</sup> and the number of rotatable bonds is less than 10. According to the Egan filter, the Log P(o/w) must be less than 5.88 and the TPSA must be less than 131 Å<sup>2</sup>.
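The four filters described above can be expressed as simple threshold checks. The sketch below is only an illustration of the stated rules, not the SwissADME implementation; the class, field, and function names are hypothetical, the lower Log P bound of the Ghose filter is taken as −0.4 as in the original Ghose definition, and the two descriptor sets are invented inputs rather than values for any compound in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float          # Da
    log_p: float               # Log P(o/w)
    h_bond_acceptors: int
    h_bond_donors: int
    molar_refractivity: float
    n_atoms: int
    tpsa: float                # topological polar surface area, Å^2
    rotatable_bonds: int


def lipinski_violations(d: Descriptors) -> int:
    """Count violated Lipinski criteria, using the thresholds stated in the text."""
    return sum([
        d.mol_weight >= 500,
        d.h_bond_acceptors > 10,
        d.h_bond_donors > 5,
        d.log_p >= 5,
    ])


def passes_ghose(d: Descriptors) -> bool:
    return (160 <= d.mol_weight <= 480 and -0.4 <= d.log_p <= 5.6
            and 40 <= d.molar_refractivity <= 130 and 20 <= d.n_atoms <= 70)


def passes_veber(d: Descriptors) -> bool:
    return d.tpsa < 140 and d.rotatable_bonds < 10


def passes_egan(d: Descriptors) -> bool:
    return d.log_p < 5.88 and d.tpsa < 131


# Hypothetical descriptor sets for illustration only.
mid_sized = Descriptors(300.0, 3.0, 4, 1, 85.0, 40, 60.0, 5)
small_terpene_like = Descriptors(136.0, 3.3, 0, 0, 47.0, 26, 0.0, 1)

for name, d in [("mid_sized", mid_sized), ("small_terpene_like", small_terpene_like)]:
    print(name, lipinski_violations(d), passes_ghose(d), passes_veber(d), passes_egan(d))
```

The second input shows how a small, nonpolar molecule can satisfy the Lipinski, Veber, and Egan rules while failing the Ghose filter on molecular weight alone.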

#### *2.3. Computational Pharmacokinetics and Pharmacogenomics Profiles of Natural Compounds*

The SMILES files of the natural compounds were used to predict their absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles using the pkCSM web server [48].

We started by calculating all of the ADMET entries returned by the web server. The following were selected as relevant to our research: (i) intestinal absorption (percentage): a molecule with an absorption rate of less than 30% is deemed poorly absorbed; (ii) permeability of the blood-brain barrier (BBB), expressed as log BB (logarithm of the brain-to-plasma drug concentration ratio): a value higher than 0.3 indicates high BBB permeability, while a value lower than −1 indicates low BBB permeability; (iii) central nervous system (CNS) permeability: a compound with a logarithm of the permeability-surface area product (logPS) higher than −2 can penetrate the CNS; (iv) fraction unbound (human), the fraction of the compound that is not bound to plasma proteins; (v) substrate status for renal organic cation transporter 2 (OCT2), the main renal uptake transporter, which is expressed on the basolateral side of the proximal tubule.
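The interpretation thresholds listed above can be summarized in a few lines of code. This is a minimal sketch with hypothetical function names; it assumes the low-permeability cutoff for log BB is −1, and it labels the range between the two log BB cutoffs as "intermediate", a category the text does not name.

```python
def classify_absorption(percent_absorbed: float) -> str:
    """Intestinal absorption below 30% is deemed poor."""
    return "poor" if percent_absorbed < 30.0 else "adequate"


def classify_bbb(log_bb: float) -> str:
    """Blood-brain barrier permeability from log BB."""
    if log_bb > 0.3:
        return "high"
    if log_bb < -1.0:
        return "low"
    return "intermediate"  # label assumed; not defined in the text


def penetrates_cns(log_ps: float) -> bool:
    """A compound with logPS above -2 is considered able to penetrate the CNS."""
    return log_ps > -2.0


# Hypothetical predicted values for illustration only.
print(classify_absorption(95.0), classify_bbb(0.5), penetrates_cns(-1.5))
print(classify_absorption(20.0), classify_bbb(-1.4), penetrates_cns(-3.2))
```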

To predict their pharmacogenomic profile, we investigated the potential of the compounds to serve as inhibitors or substrates of cytochromes involved in the metabolism of neuropsychiatric medications, such as CYP2D6, CYP3A4, CYP1A2, CYP2C19, and CYP2C9.

The prediction of toxicity was a significant aspect of our research. We assessed AMES toxicity, hepatotoxicity, LD50 (median lethal dose), and maximum tolerated dose (human).

#### *2.4. Building 3D-ALMOND-QSAR to Predict Natural Compounds Effects*

We used Pentacle software to create three 3D-ALMOND-QSAR models to predict the action of natural compounds on SERT, D2, and 5-HT1A receptors [33,49]. The three models are hereafter called QSAR-SERT, QSAR-D2, and QSAR-5-HT1A, according to the target they address.

For each molecule, we computed several molecular descriptors. Each descriptor's contribution to biological activity was assessed singly or in groups of different combinations. The hydrophobicity, electrostatic, and hydrogen bond donor/acceptor features were found to be the most important statistical combination of molecular descriptors for all QSAR models.

The chemometric analysis was performed by partial least squares (PLS) regression within Pentacle. The number of PLS components (latent variables, LVs = 5) was chosen to achieve optimum values of the statistical parameters, namely r<sup>2</sup> > 0.8 (fitted correlation coefficient) and q<sup>2</sup> > 0.6 (cross-validated correlation coefficient). Additionally, SDEP (standard deviation of error prediction) and SDEC (standard deviation of error calculation) were evaluated.
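The four statistics named above share the same two building blocks: the explained variance (r<sup>2</sup> from fitted values, q<sup>2</sup> from cross-validated values) and the root of the mean squared error (SDEC from fitted values, SDEP from cross-validated values). The sketch below illustrates these definitions under stated assumptions: a one-descriptor ordinary least-squares line stands in for the PLS model, leave-one-out is assumed as the cross-validation scheme, and the data are invented for illustration.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (stand-in for PLS)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_predictions(x, y):
    """Predict each compound from a model fitted on all the other compounds."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


def explained_variance(y, y_hat):
    """1 - SS_res / SS_tot: r2 for fitted values, q2 for cross-validated values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def std_error(y, y_hat):
    """Root mean squared error: SDEC for fitted values, SDEP for cross-validated values."""
    return sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y))


# Hypothetical data: one descriptor value and one pKi per compound.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
pki = [6.1, 6.9, 7.4, 8.2, 8.6, 9.5, 9.9]

slope, intercept = fit_line(descriptor, pki)
fitted = [slope * xi + intercept for xi in descriptor]
cross_validated = loo_predictions(descriptor, pki)

r2 = explained_variance(pki, fitted)
q2 = explained_variance(pki, cross_validated)
sdec = std_error(pki, fitted)
sdep = std_error(pki, cross_validated)
print(f"r2={r2:.3f} q2={q2:.3f} SDEC={sdec:.3f} SDEP={sdep:.3f}")
```

Because each cross-validated prediction is made without the compound being predicted, q<sup>2</sup> cannot exceed r<sup>2</sup> for a least-squares model, which is why the acceptance threshold quoted for q<sup>2</sup> is lower than the one for r<sup>2</sup>.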

The generation of consistent statistical models depends on the quality of training, validation, and testing sets in terms of structural diversity and property values distribution.

QSAR-SERT model has amitriptyline, citalopram, clomipramine, desipramine, doxepin, escitalopram, fluoxetine, imipramine, lofepramine, paroxetine, sertraline, trazodone, venlafaxine, aripiprazole, chlorpromazine, clozapine, fluphenazine, haloperidol, risperidone, sertindole, and zotepine in the training set and, in the validation set, bupropion, olanzapine, quetiapine, thioridazine, ziprasidone, and fluvoxamine.

QSAR-5-HT1A model has amitriptyline, desipramine, doxepin, escitalopram, fluoxetine, trazodone, aripiprazole, chlorpromazine, fluphenazine, haloperidol, iloperidone, loxapine, olanzapine, prochlorperazine, quetiapine, risperidone, spiperone, trifluoperazine, and ziprasidone in the training set and, in the validation set, clozapine, sertindole, thioridazine, and zotepine.

QSAR-D2 model has amitriptyline, desipramine, fluphenazine, haloperidol, iloperidone, loxapine, prochlorperazine, risperidone, spiperone, trifluoperazine, clomipramine, clozapine, mesoridazine, olanzapine, promazine, remoxipride, sertindole, thioridazine, and zotepine in the training set and, in the validation set, doxepin, aripiprazole, quetiapine, chlorpromazine, and ziprasidone.

The test set used in all QSAR models is represented by the ten natural compounds that we investigated.

The biological activities of compounds in the training and validation sets, expressed as Ki values (inhibition constants), were retrieved from the Ki Database of the Psychoactive Drug Screening Program (PDSP) [50]. The three models that we built predict the biological activities of compounds as pKi values (log 1/Ki) for a better statistical analysis.
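The conversion between Ki and pKi mentioned above is a one-line logarithmic transform. The sketch below assumes that Ki is supplied in nanomolar units (an assumption about the input, not something stated in this section) and that pKi is defined on the molar scale; the function names are illustrative.

```python
from math import log10


def ki_nm_to_pki(ki_nm: float) -> float:
    """pKi = log10(1 / Ki) with Ki in mol/L; 1 nM = 1e-9 M, hence the constant 9."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    return 9.0 - log10(ki_nm)


def pki_to_ki_nm(pki: float) -> float:
    """Inverse transform: Ki in nM recovered from a pKi value."""
    return 10.0 ** (9.0 - pki)


# A tenfold lower Ki (tighter binding) raises pKi by exactly one unit.
print(ki_nm_to_pki(10.0), ki_nm_to_pki(1.0))
```

Working on the pKi scale compresses Ki values that span several orders of magnitude into a narrow, roughly linear range, which is better suited to regression.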

#### *2.5. Molecular Docking Protocol*

The interactions of the lead compounds identified using our QSAR models with SERT, D2, and 5-HT1A receptors were predicted using molecular docking.

The 3D protein structures were imported from the Protein Data Bank in the case of SERT (PDB ID: 6VRH [51]) and D2 (PDB ID: 6CM4 [52]) receptors; the structure of the 5-HT1A receptor was imported from AlphaFold [53].

The molecular docking was performed using the CDOCKER algorithm [54] implemented in Biovia Discovery Studio v16.1.0.15350 (BIOVIA Dassault Systemes, San Diego, CA, USA). The ligands were docked in the drug binding cavities according to the PDB files used [51,52]. In the case of 5-HT1A, the binding site was identified by similarity with the site of the D2 receptor [52]. The docking protocol was applied as described in the study of Rao et al. [55].

#### **3. Results**

#### *3.1. Drug-Likeness, Pharmacokinetics, and Pharmacogenomics Profiles of Compounds*

The structures of compounds were retrieved from PubChem database [56] as SMILES (Simplified Molecular Input Line Entry) files, as presented in Table 2.

**Table 2.** The name of natural compounds and PubChem ID, 2D structure, SMILES [56], natural source, compound FooDB ID [30], and the drug-likeness features [47].

| Compound [PubChem ID] | SMILES | Natural Source [FooDB ID] | Lipinski | Veber | Ghose | Egan |
|---|---|---|---|---|---|---|
| sabinene [18818] | CC(C)C12CCC(=C)C1C2 | lemon, mint [FDB001454] | YES | YES | No; 1 violation: MW < 160 | YES |
| resveratrol [445154] | C1=CC(=CC=C1C=CC2=CC(=CC(=C2)O)O)O | skin of grapes [FDB031212] | YES | YES | YES | YES |
| chamazulene [10719] | CCC1=CC2=C(C=CC2=C(C=C1)C)C | German chamomile, Roman chamomile [FDB015363] | YES | YES | YES | YES |
| germacrene D [5317570] | CC1=CCCC(=C)C=CC(CC1)C(C)C | peppermint [FDB003856] | YES | YES | YES | YES |
| linalyl acetate [8294] | CC(=CCCC(C)(C=C)OC(=O)C)C | sage [FDB019133] | YES | YES | YES | YES |
| nerol [643820] | CC(=CCCC(=CCO)C)C | common grapes [FDB014945] | YES | YES | YES | YES |
| neryl acetate [1549025] | | lemon balm, peppermint [FDB014946] | YES | YES | YES | YES |
| quercetin [5280343] | C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O)O | grape [FDB011904] | YES | YES | YES | YES |

*Pharmaceutics* **2021**, *13*, 1449 6 of 19

To determine the drug-likeness of compounds, we applied different filters, as presented in Table 2. As can be seen, all compounds comply with Lipinski, Veber, and Egan rules. In the case of the Ghose rule, only 1,8-cineole, limonene, and sabinene present one violation of the rule. These results show that the compounds present drug-likeness features and could present a good bioavailability.

ADME and toxicity profiles of compounds were computed, with an emphasis on human intestinal absorption (HIA), BBB and CNS permeabilities, human fraction unbound (HFU), renal OCT2 substrate, mutagenesis features-AMES, hepatotoxicity, maximum tolerated dose (human), and LD50 (Table 3).

**Table 3.** Computed human intestinal absorption (HIA), blood-brain barrier permeability (log BBB), CNS permeability, human fraction unbound (HFU), maximum tolerated dose (human), and LD50 for selected compounds.

| Compound | HIA | Log BBB | CNS Permeability | HFU | Max. Tolerated Dose (Human) | LD50 |
|---|---|---|---|---|---|---|
| 1,8-cineole | 96.50 | 0.36 | −2.97 | 0.55 | 0.55 | 2.01 |
| limonene | 95.89 | 0.72 | −2.37 | 0.48 | 0.77 | 1.88 |
| sabinene | 95.35 | 0.83 | −1.46 | 0.29 | 0.36 | 1.54 |
| resveratrol | 89.05 | −0.04 | −2.09 | 0.18 | 0.48 | 1.79 |
| chamazulene | 94.50 | 0.79 | −1.82 | 0.24 | 0.05 | 1.45 |
| germacrene D | 95.59 | 0.72 | −2.13 | 0.26 | 0.49 | 1.63 |
| linalyl acetate | 95.27 | 0.51 | −2.37 | 0.42 | 0.54 | 1.72 |

Here, we also predicted the activities of the considered natural compounds at human cytochromes that are highly relevant in neuropsychiatric disorders, namely CYP2D6, CYP3A4, CYP1A2, CYP2C19, and CYP2C9 [57].

The biological activities of natural compounds were expressed as inhibitors or substrates of human cytochromes, as presented in Table 4.


**Table 4.** The inhibitor/substrate features of natural compounds at CYP2D6, CYP3A4, CYP1A2, CYP2C19 and CYP2C9.

We intended to generate a pharmacogenomic pathway of natural compounds through these predictions, establishing if these are metabolized by the same cytochromes as classical antidepressants or neuroleptics, which is relevant when a combinatorial therapy involving classical antidepressants, classical neuroleptics, and natural compounds is indicated.

#### *3.2. Natural Compounds' Antidepressant Activities Predicted by 3D-ALMOND-QSAR*

Three QSAR models (QSAR-SERT, QSAR-D2, and QSAR-5-HT1A) were built to predict the biological effect of natural compounds against SERT, D2, and 5-HT1A receptors. In building the models, we initially considered individual descriptors such as hydrophobicity, hydrogen bond donor/acceptor, electrostatic, or steric features. These single-descriptor models could not reproduce the experimental activities.

Further, we considered the contribution of several descriptors at the same time, which led to a significant improvement of the prediction accuracy of our models (r<sup>2</sup> > 0.9, q<sup>2</sup> > 0.8, SDEP < 0.5), the statistical parameters being given in Table 5.

**Table 5.** Summary of the ALMOND statistical parameters in QSAR-SERT, QSAR-5-HT1A, and QSAR-D2.


The predicted activity of classical neuropsychiatric drugs in the training and validation sets was calculated according to the QSAR equations previously generated and was compared with experimental activity on SERT, 5-HT1A, and D2 receptors (Table 6).

Our QSAR models are described by very good statistical parameters, which allow us to predict the biological activities of 1,8-cineole, limonene, sabinene, resveratrol, chamazulene, germacrene D, linalyl acetate, nerol, neryl acetate, and quercetin at SERT, 5-HT1A, and D2 by following the QSAR equation generated in ALMOND-Pentacle (Table 6).


**Table 6.** Predicted and experimental biological activities of compounds at SERT, 5-HT1A, and D2 receptors. The biological activities of molecules in the validation set are in italics. In brackets are the predicted biological activities of natural compounds versus the most active compounds of each QSAR model (paroxetine in QSAR-SERT; ziprasidone in QSAR-5-HT1A; spiperone in QSAR-D2).

| Compound | QSAR-SERT | QSAR-5-HT1A | QSAR-D2 |
|---|---|---|---|
| resveratrol | 8.68 (−1.41) | 5.23 (−3.49) | 6.72 (−3.43) |
| chamazulene | 9.50 (−0.59) | 6.51 (−2.21) | 7.96 (−2.19) |
| germacrene D | 9.52 (−0.57) | 6.49 (−2.23) | 7.90 (−2.25) |
| linalyl acetate | 9.40 (−0.69) | 6.89 (−1.83) | 8.11 (−2.04) |
| neryl acetate | 10.61 (0.52) | 6.06 (−2.66) | 8.14 (−2.01) |
| quercetin | 6.42 (−3.67) | 5.87 (−2.85) | 8.39 (−1.76) |

The correlation between the training and validation sets of our QSAR models is also represented in Figure 1.

**Figure 1.** The correlation between experimental and predicted values of QSAR SERT (**a**), QSAR 5-HT1A (**b**), and QSAR D2 (**c**) models. Data were plotted and fitted using Origin Pro, version 9.2 (2015), OriginLab Corporation, Northampton, MA, USA.

#### *3.3. Molecular Docking*

The interactions of the most promising compounds with the three protein targets were investigated by molecular docking. Therefore, we docked linalyl acetate at SERT, D2, and 5-HT1A; neryl acetate was docked at SERT and 5-HT1A; and 1,8-cineole was docked at D2 and 5-HT1A (see Section 4.3). The binding of ligands was evaluated based on CDOCKER energy and CDOCKER interaction energy; values are presented in Table 7.


**Table 7.** Molecular docking predictions of interactions between molecular targets in depression, natural compounds linalyl acetate, neryl acetate, and 1,8-cineole and CDOCKER scores calculated for analyzed ligands.

We further analyzed the structural basis of the interaction between ligands and targets. The 2D interaction maps are presented in Figure 2.

**Figure 2.** Interaction maps (2D) calculated based on the complexes formed by linalyl acetate with 5-HT1A (**a**), SERT (**c**), or D2 (**g**); 1,8-cineole with 5-HT1A (**b**) or D2 (**f**) and neryl acetate with SERT (**d**) or D2 (**e**). The maps were generated using Discovery Studio Visualizer v21.1.0.20298 (BIOVIA Dassault Systemes, San Diego, CA, USA).

#### **4. Discussion**

Drug repositioning involves the identification of novel treatments for diseases based on "old" drugs or compounds [58]. Several strategies have been developed to identify druggable compounds against novel targets, such as QSAR studies in conjunction with molecular docking [36], with molecular docking and molecular dynamics [37,38], and even with quantum mechanics/molecular mechanics methods [39]. High performance QSAR models can be obtained by using machine learning approaches to classify the molecular descriptors of compounds from large datasets [59]. Repositioning drug candidates can be identified using drug-drug interaction networks [60]; the method even allows the ranking of compounds into simple and complex multi-pathology therapies [61]. Unsupervised machine learning approaches can be used to establish drug–drug similarity networks based on drug–target interactions, which also lead to the identification of repositioning candidates [62]. Other approaches in drug repositioning, as well as limitations and recommendations, are presented in [58].

In the present study, we performed a computational investigation of the possibility of repositioning some natural compounds as antidepressants and neuroleptics. Our tentative hypotheses were supported by previous experimental studies that report their antidepressant effects, mainly in animals, as presented in Section 1. Our strategy involved an initial filtering of compounds based on their drug-like properties and their predicted pharmacokinetic and pharmacogenomic profiles (Sections 4.1 and 4.2). In the following step, QSAR models were built to predict the most active inhibitors of three druggable targets in depression, namely SERT, 5-HT1A, and D2 receptors. We selected the potent compounds that modulate three or at least two targets at once (Section 4.3). The interactions between lead compounds and the targets were addressed by molecular docking (Section 4.4).

#### *4.1. Assessment of Compounds Drug-Likeness Features*

Generally, all the natural compounds studied here are in agreement with the medicinal chemistry rules (Table 2). One rule of the Ghose filter (MW < 160) was violated only by 1,8-cineole, limonene, and sabinene. The molecular descriptors of resveratrol, chamazulene, germacrene D, linalyl acetate, nerol, neryl acetate, and quercetin presented values within the ranges defined by Lipinski, Veber, Ghose, and Egan rules. Our results suggest that the natural compounds considered here present drug-likeness features and should be further characterized by computing their pharmacokinetic and pharmacodynamic profiles.

#### *4.2. Computational Pharmacokinetics and Pharmacogenomics Profiles of Natural Compounds*

The ADME profiles predicted for the compounds showed that the human intestinal absorption parameter lies in the 77.20% (quercetin) to 96.50% (1,8-cineole) range, as shown in Table 3. Very good human intestinal absorption values were recorded for 1,8-cineole, neryl acetate, and limonene.

Regarding the distribution of the natural compounds in the human body, the human fraction unbound parameter presented a large variation, from 0.18 (resveratrol) to 0.55 (1,8-cineole). Unfortunately, the selected natural compounds presented low human fraction unbound values. Other critical parameters for describing the natural compounds' distribution in the body are BBB and CNS permeabilities. As shown in Table 3, the considered natural compounds recorded very good BBB permeability values, with log BBB ranging from 0.83 (sabinene) to −1.09 (quercetin). An easy BBB penetration was recorded for sabinene, limonene, chamazulene, germacrene D, and nerol, our results being in good agreement with the experimental studies [14,63]. The predicted CNS permeability values range from −1.46 (sabinene) to −3.06 (quercetin). These values suggest that the considered natural compounds should have a good CNS permeability, the most permeable compounds being sabinene, chamazulene, resveratrol, germacrene D, and nerol (Table 3). These results are supported by experimental studies showing a possible activity of these compounds at the CNS level [15].

The metabolism of compounds was addressed by predicting their affinities for human cytochrome P450 proteins (CYPs): CYP2D6, CYP3A4, CYP1A2, CYP2C19, and CYP2C9 (Table 4). Our results revealed that resveratrol inhibits CYP2D6, CYP3A4, CYP1A2, CYP2C19, and CYP2C9, while quercetin and chamazulene inhibit CYP1A2. Our results are in agreement with an experimental study mentioning that resveratrol modified the metabolism of aripiprazole by CYP2D6 and CYP3A4 [64]. The predicted inhibitory activity of quercetin on CYP1A2 is supported by a previous study [65] showing that quercetin is able to change the metabolism of melatonin by CYP1A2. Quercetin was also reported to be a strong inhibitor of CYP2D6 and a moderate inhibitor of CYP3A4 [66]. Chamazulene was experimentally proved to be a potent inhibitor of CYP1A2, CYP3A4, and CYP2D6 [67].
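The reasoning above, which compares the cytochromes inhibited by a natural compound with those that metabolize a co-administered drug, reduces to a set intersection. The sketch below is illustrative: it encodes only the predicted inhibition profiles and the drug-enzyme pairs mentioned in this paragraph, and the dictionary and function names are assumptions.

```python
# Predicted CYP inhibition as summarized in the paragraph above.
predicted_inhibition = {
    "resveratrol": {"CYP2D6", "CYP3A4", "CYP1A2", "CYP2C19", "CYP2C9"},
    "quercetin": {"CYP1A2"},
    "chamazulene": {"CYP1A2"},
}

# Enzymes metabolizing the co-administered drugs cited in the paragraph above.
metabolized_by = {
    "aripiprazole": {"CYP2D6", "CYP3A4"},
    "melatonin": {"CYP1A2"},
}


def shared_cyps(compound: str, drug: str) -> list:
    """Return the CYPs that the compound is predicted to inhibit and the drug relies on."""
    inhibited = predicted_inhibition.get(compound, set())
    needed = metabolized_by.get(drug, set())
    return sorted(inhibited & needed)


for compound in predicted_inhibition:
    for drug in metabolized_by:
        print(compound, drug, shared_cyps(compound, drug))
```

A non-empty result flags a pair that may deserve attention when a combined therapy is considered; an empty result does not by itself rule out an interaction.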

Regarding excretion, none of the ten natural compounds are predicted to be renal OCT2 substrates. In our study, high importance was given to predicting the toxicity of compounds. Our results predict that none of the compounds should present hepatotoxicity, cardiotoxicity, or AMES mutagenicity. Additionally, we evaluated the maximum tolerated dose (human) and LD50 of natural compounds. Predicted LD50 values vary in a narrow range, from 1.54 (sabinene) to 2.47 (quercetin). A significant fluctuation was recorded for the maximum tolerated dose, from 0.05 mol/kg (chamazulene) to 0.85 mol/kg (nerol). Taken together, our results suggest that none of the natural compounds that we considered are toxic.

#### *4.3. Predicted Pharmacodynamic Profiles of Natural Compounds on SERT, 5-HT1A, and D2 Active Sites by 3D-ALMOND-QSAR*

The application of the QSAR-SERT model to molecules from the training and validation sets resulted in a suitable correlation between experimental and calculated biological activities. The differences between experimental and predicted biological activities for the molecules in the training set vary from 0.00 (fluphenazine) to −0.53 (zotepine), while the differences for the molecules in the validation set vary from 0.15 (fluvoxamine) to −1.40 (thioridazine). The statistical parameters (Table 5) and the good correlation between experimental and predicted biological activities of classical neuropsychiatric drugs support the strong predictive power of the QSAR-SERT model. Therefore, the model was used to predict the biological activities of natural compounds against SERT. These are presented in Table 6.

To evaluate the potency of natural compounds against SERT, we computed the difference between their predicted pKi and the value obtained for paroxetine, the most active compound from the training set (experimental pKi = 10.09), i.e., ΔpKi = pKi(compound) − pKi(paroxetine). The results show that natural compounds such as limonene (ΔpKi = −0.64), sabinene (ΔpKi = −0.72), chamazulene (ΔpKi = −0.59), germacrene D (ΔpKi = −0.57), linalyl acetate (ΔpKi = −0.69), nerol (ΔpKi = −0.35), and neryl acetate (ΔpKi = 0.52) are predicted to bind SERT with an affinity close to that of paroxetine, which suggests a strong antidepressant character.
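The comparison against a reference drug can be reproduced directly from the values in Table 6. The sketch below computes the difference between the predicted pKi of a natural compound and the experimental pKi of the most active compound of each model (paroxetine for SERT, ziprasidone for 5-HT1A, spiperone for D2); the dictionary layout and function name are illustrative choices, and only three compounds are included for brevity.

```python
# Experimental pKi of the reference compound of each QSAR model, as given in the text.
REFERENCE_PKI = {"SERT": 10.09, "5-HT1A": 8.72, "D2": 10.15}

# Predicted pKi values for a subset of the natural compounds (from Table 6).
predicted_pki = {
    "resveratrol": {"SERT": 8.68, "5-HT1A": 5.23, "D2": 6.72},
    "linalyl acetate": {"SERT": 9.40, "5-HT1A": 6.89, "D2": 8.11},
    "neryl acetate": {"SERT": 10.61, "5-HT1A": 6.06, "D2": 8.14},
}


def delta_pki(compound: str, target: str) -> float:
    """Predicted pKi of the compound minus the reference pKi for that target."""
    return round(predicted_pki[compound][target] - REFERENCE_PKI[target], 2)


for name in predicted_pki:
    print(name, {target: delta_pki(name, target) for target in REFERENCE_PKI})
```

With this sign convention, a value close to zero or positive means that the predicted affinity approaches or exceeds that of the reference drug, while a strongly negative value means a much weaker predicted affinity.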

Our results are in accord with other studies [14,68], which mention that limonene and sabinene reduce depression-related behaviors in a manner similar to fluoxetine [69], that linalyl acetate increases 5-HT levels in the amygdala, hypothalamus, and hippocampus of mice [70], and that nerol effectively reduces the symptoms of depression and sensitivity.

In the QSAR-5-HT1A model, a good correlation between predicted and experimental biological activities was noticed for molecules from the training set, where the residual value varies from 0.04 (fluoxetine, iloperidone) to −0.28 (amitriptyline), and also for the validation set, where the residual value varies from 0.46 (thioridazine) to 1.65 (clozapine). The biological activities of natural compounds at 5-HT1A were evaluated relative to the activity of ziprasidone, the most active compound of the training set (experimental pKi = 8.72). We noticed that the natural compounds presented a moderate antidepressant activity. A good affinity was predicted in the case of 1,8-cineole (pKi(1,8-cineole) − pKi(ziprasidone) = −1.83) and linalyl acetate (pKi(linalyl acetate) − pKi(ziprasidone) = −1.83) (Table 6).

The neuroleptic activity of natural compounds was evaluated by their affinity at the D2 receptor. The predictive power of the QSAR-D2 model was sustained by the good statistical parameters (Table 5). Similar to the previous QSAR models, a good correlation between predicted and experimental biological activities was obtained in both training and validation sets (Table 6). The neuroleptic activity of natural compounds was evaluated versus spiperone (experimental pKi = 10.15), and we noticed that a good neuroleptic activity is recorded by quercetin (pKi(quercetin) − pKi(spiperone) = −1.76), neryl acetate (pKi(neryl acetate) − pKi(spiperone) = −2.01), linalyl acetate (pKi(linalyl acetate) − pKi(spiperone) = −2.04), and 1,8-cineole (pKi(1,8-cineole) − pKi(spiperone) = −2.04). The affinity of quercetin at D2 is close to the affinity of mesoridazine, loxapine, and olanzapine, our results being supported by experimental studies [71].

#### *4.4. Molecular Basis of the Interaction between Lead Compounds and Targets*

The docking scores that we calculated were CDOCKER energy (calculated based on ligand strain energy and receptor-ligand interaction energy) and CDOCKER interaction energy (calculated based on ligand-receptor nonbonded interaction energy). In the case of linalyl acetate acting on the three targets, neryl acetate acting on SERT and 5-HT1A, and 1,8-cineole acting on D2 and 5-HT1A, we obtained negative CDOCKER interaction energies, confirming that the ligands present favorable interaction energies with the targets (Table 7). The most favorable interaction energies were obtained for linalyl acetate and neryl acetate, while less favorable energies were calculated for 1,8-cineole.

By analyzing the 2D interaction maps presented in Figure 2, we observe that compounds form hydrogen bonds with the targets in the case of linalyl acetate with SERT and 5-HT1A and in the case of neryl acetate with SERT. Other types of interactions established between the compounds and the targets fall into the following categories: (i) van der Waals interactions, important for the binding of linalyl acetate to SERT, linalyl acetate to D2, or 1,8-cineole to D2; (ii) alkyl or π-alkyl interactions, important for the binding of linalyl acetate to 5-HT1A or D2, neryl acetate to D2, or 1,8-cineole to D2; and (iii) π-σ interactions, which appear only in the case of linalyl acetate binding to SERT. The types of interactions that we identified are consistent with the molecular properties relevant for target binding that we identified using our QSAR models.

#### **5. Conclusions**

Medication repositioning is a quick way to employ an existing drug to treat new diseases [72–74]. The present study has investigated the opportunity to reposition ten natural compounds identified from the literature, namely resveratrol, quercetin, limonene, sabinene, 1,8-cineole, chamazulene, linalyl acetate, germacrene D, nerol, and neryl acetate, as antidepressants and even as neuroleptics. These compounds are found in common fruits, spices, and tea herbs. All compounds are indexed in databases (DrugBank, FooDB) with different effects like anti-inflammatory, antioxidant, antimicrobial, antiallergic, or anticancer, but none of them are indexed as antidepressants. Several experimental studies conducted mostly in animal models point toward their antidepressive effects. Therefore, we used computational methods to address the ability of compounds to modulate three major targets in depression, namely SERT, 5-HT1A, and D2 receptors, and compared their predicted effect with the effect of potent drugs used in clinics.

All ten compounds present drug-likeness features and no toxicity, meaning that they could be used in therapy. Their ADME features showed a very good intestinal absorption, as well as a good BBB and CNS permeability, suggesting that the compounds can reach the brain, where they should exert their biological effects.

Their biological activities relevant to depression were determined against SERT, 5-HT1A, and D2 receptors. For each target, we built powerful QSAR models that were trained and validated based on synthetic drugs that modulate their function. When predicting the effect of natural compounds, we determined that most compounds, namely limonene, sabinene, chamazulene, germacrene D, linalyl acetate, nerol, and neryl acetate, should inhibit SERT to an extent similar to paroxetine. Only two compounds appear as candidates to modulate 5-HT1A, namely 1,8-cineole and linalyl acetate, in a manner comparable with fluoxetine. Concerning the neuroleptic effect of compounds, quercetin, neryl acetate, linalyl acetate, and 1,8-cineole could be active against D2 receptors, in a manner similar to ziprasidone. Overall, we identified linalyl acetate as a strong affinity ligand for all three targets (SERT, 5-HT1A, and D2 receptor), and we consider it to be a promising antidepressant compound. Neryl acetate appeared as a promising ligand for both SERT and D2, while 1,8-cineole appears as a common ligand for 5-HT1A and D2 receptors. Molecular docking results confirm the favorable interaction between lead compounds and the targets.

The results obtained here show that linalyl acetate, neryl acetate, and 1,8-cineole target the proteins relevant in depression and present drug-likeness features, suitable ADME profiles, and no toxicity, suggesting they represent viable candidates for repurposing as antidepressants. Our simulation study offers evidence on the molecular mechanism of these compounds and the results should be confirmed experimentally. Results obtained here can be the starting point for studies on the repositioning of natural compounds and plants for an alternative treatment of depression, with significant efficiency, but reduced side effects, that can be administered even to patients with comorbidities or during pregnancy.

**Author Contributions:** Conceptualization, S.A.; methodology, S.A.; software, S.A. and C.B.; investigation, S.A. and A.M.U.; formal analysis, C.B. and A.A.B.; writing—original draft preparation, A.M.U. and S.A.; writing—review and editing, M.M. and M.S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was supported by UEFISCDI through the projects PN-III-P2-2.1-PED-2019-1471, PN-III-P1-1.2-PCCDI-2017-0728, PN-III-P2-2.1-PED-2019-4771, PN-III-P2-2.1-PED-2019-1264, and PN-III-P4-ID-PCE-2020-0620.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.


## **Exploring Drugs and Vaccines Associated with Altered Risks and Severity of COVID-19: A UK Biobank Cohort Study of All ATC Level-4 Drug Categories Reveals Repositioning Opportunities**

**Yong Xiang <sup>1</sup>, Kenneth Chi-Yin Wong <sup>1</sup> and Hon-Cheong So <sup>1,2,3,4,5,6,7,</sup>\***


**Abstract:** Effective therapies for COVID-19 are still lacking, and drug repositioning is a promising approach to address this problem. Here, we adopted a medical informatics approach to repositioning. We leveraged a large prospective cohort, the UK-Biobank (UKBB, *N* ~ 397,000), and studied associations of prior use of all level-4 ATC drug categories (*N* = 819, including vaccines) with COVID-19 diagnosis and severity. Effects of drugs on the risk of infection, disease severity, and mortality were investigated separately. Logistic regression was conducted, controlling for main confounders. We observed strong and highly consistent protective associations with statins. Many top-listed protective drugs were also cardiovascular medications, such as angiotensin-converting enzyme inhibitors (ACEI), angiotensin receptor blockers (ARB), calcium channel blockers (CCB), and beta-blockers. Some other drugs showing protective associations included biguanides (metformin), estrogens, thyroid hormones, proton pump inhibitors, and testosterone-5-alpha reductase inhibitors, among others. We also observed protective associations with influenza, pneumococcal, and several other vaccines. Subgroup and interaction analyses were also conducted, which revealed differences in protective effects in various subgroups. For example, protective effects of flu/pneumococcal vaccines were weaker in obese individuals, while protection by statins was stronger in cardiovascular patients. To conclude, our analysis revealed many drug repositioning candidates, for example several cardiovascular medications. Further studies are required for validation.

**Keywords:** COVID-19; drug repositioning; UK Biobank; vaccine

#### **1. Introduction**

Coronavirus disease 2019 (COVID-19) has resulted in a pandemic affecting more than a hundred countries worldwide [1–3]. More than 220 million confirmed infections and 4.56 million fatalities have been reported worldwide as of 6 September 2021 (https://coronavirus.jhu.edu/map.html, accessed on 6 September 2021). Besides the burden due to the disease itself, COVID-19 has created heavy burdens on the medical systems in many countries and has led to delays in the diagnosis and treatment of other types of diseases [4,5]. Therefore, it is of urgent public interest to gain deeper understanding into the disease, including identifying risk factors (RFs) for infection and severe disease, and uncovering new treatment strategies.

**Citation:** Xiang, Y.; Wong, K.C.-Y.; So, H.-C. Exploring Drugs and Vaccines Associated with Altered Risks and Severity of COVID-19: A UK Biobank Cohort Study of All ATC Level-4 Drug Categories Reveals Repositioning Opportunities. *Pharmaceutics* **2021**, *13*, 1514. https://doi.org/10.3390/pharmaceutics13091514

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 13 August 2021 Accepted: 10 September 2021 Published: 18 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Although vaccines have been developed for COVID-19, their distribution is highly uneven and only a small proportion of the world's population has been fully vaccinated so far. In addition, vaccine hesitancy remains a major issue that has led to suboptimal vaccination coverage [6,7]. Inadequate knowledge and awareness of COVID-19, especially among the younger population, may also contribute to the continuous rise in the number of cases [8]. Coupled with viral variants that may be associated with increased transmission and reduced vaccine effectiveness [9], the search for drugs that may reduce susceptibility to disease and/or disease severity remains highly important.

A number of clinical risk factors (e.g., age, obesity, cardiometabolic disorders, renal diseases, presence of multiple comorbidities) [10–15] have been found to increase the risk of infection or complications. However, it is less well-known how different drugs may affect the risks of COVID-19 or its severity. Importantly, drugs with protective effects may be potentially repurposed for the prevention or treatment of the disease, as development of a new drug is often extremely lengthy and costly.

Drug repositioning by computational or statistical approaches for COVID-19 is an area of intense interest. Please refer to other reviews (e.g., [16–18]) for an overview of recent studies. For instance, one widely used methodology is the network-based approach, which can integrate different data sources, including omics data and drug–protein–disease interaction networks [16,19–21]. Another methodology is the structure-based approach, which enables a large number of compounds to be screened for their ability to bind to known or predicted molecular targets for COVID-19 treatment [16,22–25]. These methodologies are promising but may have their limitations. For example, they generally do not provide direct evidence for the candidates' effectiveness in real-world or clinical settings. In addition, these approaches may be limited by inadequate knowledge of the pathophysiology and molecular basis of COVID-19. Another limitation is that most drug repositioning studies did not consider patient characteristics; for example, a drug may be more effective within a certain age group or in those with a certain comorbidity. In addition, the effect size (e.g., relative risk reduction) of individual drugs and the level of statistical significance usually cannot be easily estimated by network/structure-based approaches.

Here, we employed a different methodology *not* previously applied to drug repositioning studies for COVID-19. We adopted a medical informatics approach which involves screening a large number of drugs for their associations with the disease, leveraging a large-scale population cohort. In brief, we performed a comprehensive study on all Anatomical Therapeutic Chemical Classification System (ATC) level-4 drug categories (*N* = 819) and assessed their associations with susceptibility to, and severity of, COVID-19 in the UK Biobank (UKBB), controlling for possible confounders. Vaccines were also included for analysis. To our knowledge, this is the most comprehensive analysis to date to screen for drug associations and repositioning candidates for COVID-19, leveraging real-world population data.

While pharmacoepidemiology studies are typically focused on one or a few drugs, COVID-19 is a new disease, and we still have limited understanding of its pathophysiology and treatment. As a result, a hypothesis-driven approach may have important limitations of missing potential drug associations and new repositioning candidates. In the field of genetic epidemiology, it has been observed that hypothesis-driven candidate gene studies are not as reliable as genome-wide association studies (GWAS) [26] which are relatively unbiased, indicating merits of the latter approach. In the same vein, here we adopted a "drug-wide" association study approach, which provides a systematic and unbiased assessment of drug associations and repositioning candidates. This approach has also been advocated before [27].

In the present study, we performed rigorous analyses on the impact of medications/vaccinations on the risk of infection, disease severity, and mortality. Analyses were also conducted within infected patients, tested subjects, and the whole population respectively, and for five different time windows of prescriptions. We also performed further subgroup and interaction analyses to reveal differential effects of the drugs in people with different clinical backgrounds. This may enable more "personalized" drug repositioning, i.e., prioritizing drug candidates for specific patient subgroups.

#### **2. Methods**

#### *2.1. UK Biobank Data*

The UK Biobank is a large-scale prospective cohort comprising over 500,000 subjects aged 40–69 years who were recruited in 2006–2010 [28]. In this study, subjects with recorded mortality before 31 January 2020 (*N* = 28,930) were excluded, as this was the date of the first recorded case in the UK. This study was conducted under project 28732.

#### *2.2. COVID-19 Phenotypes*

COVID-19 outcome data were downloaded from the UKBB data portal. Information regarding COVID-19 data in the UKBB can be viewed at http://biobank.ndph.ox.ac.uk/showcase/exinfo.cgi?src=COVID19 (accessed on 3 November 2020). Briefly, the latest COVID test results were downloaded on 6 November 2020 (last update 3 November 2020). We considered inpatient (hospitalization) status at testing as a proxy for severity. Data on date and cause of mortality were also extracted (latest update on 21 October 2020). Cases indicated by U07.1 were considered to be (laboratory-confirmed) COVID-19-related fatalities.

A case was considered as having "severe COVID-19" if the subject was hospitalized and/or if the cause of mortality was U07.1. We required both test result and origin to be 1 (positive test and inpatient origin) for a subject to be considered a hospitalized case. For a small number of subjects with initial outpatient origin and positive test result, which changed to inpatient origin and negative result within 2 weeks, we still considered these subjects inpatient cases (i.e., assuming the hospitalization was related to the infection).

To be conservative, a minority of subjects (*N* = 19) whose cause of mortality was U07.1 but whose test results were negative within one week were excluded from subsequent analyses.
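The core of the severity definition can be sketched in Python as follows. This is a minimal illustration only, not the authors' code: the function names and the `(result, origin)` tuple layout are hypothetical, and the two special cases described in the text (the outpatient-to-inpatient transition within 2 weeks and the 19 excluded subjects) are deliberately left out.

```python
def is_hospitalized_case(tests):
    """Return True if any test record is both positive and of inpatient origin.

    `tests` is a hypothetical list of (result, origin) tuples, where
    result == 1 means a positive test and origin == 1 means inpatient.
    """
    return any(result == 1 and origin == 1 for result, origin in tests)


def is_severe(tests, cause_of_death=None):
    """Severe COVID-19: hospitalized case and/or cause of mortality U07.1."""
    return is_hospitalized_case(tests) or cause_of_death == "U07.1"
```

Under these assumptions, a subject with only a positive outpatient test is not classified as severe unless the cause of mortality is U07.1.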

#### *2.3. Medication Data*

Medication data were obtained from the primary care data for COVID-19 research in UKBB (details available at https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/gp4covid19.pdf, accessed on 9 November 2020). We made use of the latest release of General Practice (GP) records released by UKBB, which contains prescription data from two electronic health record (EHR) systems (TPP or EMIS) for ~397,000 UKBB participants. The drug code and issue date of each drug are available. Please also refer to Figure 1 for an overview of our analysis workflow.

**Figure 1.** An overview of the analytic workflow. We considered five exposure time windows and multiple statistical models. We conducted analyses within infected patients, tested subjects, and the whole population, respectively. Effects of prescribed medications/vaccinations on the risk of infection, severity of disease (hospitalization as proxy) and mortality were investigated separately. Missing data were accounted for by multiple imputation. Inverse probability weighting (IPW) of the probability of being tested (Prob(tested)) was employed to reduce testing bias. Multivariable logistic regression was conducted, controlling for main confounders. We primarily focused on drugs with protective effects, as residual confounding tends to bias towards harmful effects. In addition, we performed further subgroup and interaction analysis to identify factors that may modify the drug effects.

#### 2.3.1. Time Window of Prescriptions

Since the GP records cover many years of prescriptions, we set time windows to restrict prescriptions with a certain time period as the "exposure". The "index date" was defined as (1) the date of the first positive COVID-19 test for infected subjects (for U07.1 cases, the mortality date was regarded as the index date if no test record was found); or (2) the date of last test for those who were tested negative; or (3) 3 November 2020 (the date of the latest update of COVID-19 test results) for those who were untested.

The issue date of each prescription was available, but the duration was not. Time windows were determined by whether the drug was issued within a specified period before the index date. The following windows were considered for medications: 6 months, 1 year, 2 years, and 5 years. Narrower time windows (<6 months) may not be desirable and may lead to many prescriptions being missed, as the latest issue date was 25 July 2020, but the latest index date was 3 November 2020.

As for vaccines, unlike many medications, vaccines are not prescribed regularly, and most vaccines only need to be given once or less than a few times; hence, a narrow time window is not optimal due to sparsity of data. For seasonal vaccines, namely flu vaccines, they are usually given in autumn (September to November) or early winter in the UK. A time window of 6 months will lead to missing most of the flu vaccines given. On the other hand, it is also reasonable to consider a longer time window (e.g., 10 years) as vaccine effects can be more long-lasting [29]. In view of the above, we considered time windows of 1, 2, 5, and 10 years for vaccinations. For flu vaccines, we defined "past 1 year" as prescriptions from 1 September 2019 onwards (and similarly for past *k* years) to account for the seasonal nature of vaccination.
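The index-date and exposure-window rules can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the day counts used for each window, the inclusion of the index date itself in the window, and the reading of "past *k* years" for flu vaccines as "from 1 September of year 2020 − *k*" are all assumptions not specified in the text.

```python
from datetime import date, timedelta

# Assumed day counts for the medication windows named in the text.
WINDOW_DAYS = {"6m": 183, "1y": 365, "2y": 730, "5y": 1826}

DATA_FREEZE = date(2020, 11, 3)  # latest update of COVID-19 test results


def index_date(first_positive=None, last_negative=None, death_u071=None):
    """Apply rules (1)-(3): positive test, else U07.1 death date when no
    test record exists, else last negative test, else the data freeze."""
    if first_positive is not None:
        return first_positive
    if death_u071 is not None and last_negative is None:
        return death_u071
    if last_negative is not None:
        return last_negative
    return DATA_FREEZE


def exposed(issue_dates, index, window):
    """True if any prescription was issued within the window before index."""
    start = index - timedelta(days=WINDOW_DAYS[window])
    return any(start <= d <= index for d in issue_dates)


def flu_window_start(k, season_year=2020):
    """Assumed start of the 'past k years' window for flu vaccines."""
    return date(season_year - k, 9, 1)
```

For example, a subject whose last negative test was on 1 June 2020 and whose only prescription was issued on 15 January 2020 would count as exposed under the 6-month window in this sketch.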

#### 2.3.2. Mapping to ATC

All the medications were mapped to the ATC Classification (https://www.genome.jp/kegg-bin/get\_htext?br08303, accessed on 9 November 2020). Drug categories were defined by the fourth level of ATC classification.

#### *2.4. Covariate Data*

We performed multivariable regression analysis with adjustment for potential confounders including basic demographic variables (age, sex, ethnic group), comorbidities (coronary artery disease (CAD), diabetes (DM), hypertension, asthma, chronic obstructive pulmonary disease (COPD), depression, dementia, history of cancer, blood urea and creatinine reflecting renal function), indicators of general health (number of medications taken, number of non-cancer illnesses), anthropometric measures (body mass index (BMI)), socioeconomic status (Townsend deprivation index) and lifestyle risk factor (smoking status). For disease traits, we included information from ICD-10 diagnoses (code 41270) and self-reported illnesses (code 20002), and incorporated data from all waves of follow-ups. Subjects with no records of the relevant disease from either self-report or ICD-10 were regarded as having no history of the disease.

#### *2.5. Sets of Analysis*

We performed a total of eight sets of analysis (Table 1). The impact of prescribed medication/vaccination on the risk of infection (Models E and F), severity of infection (Models A, C, and G) and risk of mortality (Models B, D, and H) from COVID-19 were investigated separately. Both hospitalized and fatal cases were grouped under the "severe" category.

**Table 1.** The eight sets of analyses based on infected patients (models A and B), tested subjects (models F, G, and H) and the population (models C, D, and E).


U07.1 is the code for fatal (laboratory-confirmed) COVID-19 infection based on the latest ICD coding. Dx, diagnosis; -ve, negative.

We also considered different study designs and conducted our analyses with different comparison samples. Models A and B are restricted to the infected subjects, while models C, D, and E involve comparison of severe, fatal and general infected cases to the general population (with no known diagnosis of COVID-19). On the other hand, models F, G, and H compared infected, severe, and fatal cases, respectively, against subjects who tested negative for SARS-CoV-2.

There were 397,000 subjects in the UKBB with available GP prescription records. Among them, 30,835 subjects had received at least one COVID-19 test, and 3858 had tested positive. There were 1318 cases classified as "severe" (hospitalized or mortality from COVID-19) and 170 fatal cases. In total, 393,142 UKBB participants did not have a known diagnosis of COVID-19. The detailed count of participants for each model is listed in Table 2.


**Table 2.** Number of available subjects for analysis for the 8 models.

Only subjects with available GP prescription records are shown.

#### *2.6. Statistical Analysis Methods*

Logistic regression (using the R package speedglm) was used to examine the impact of medication on different outcomes in the eight sets of analysis. For more stable estimates, analysis was not performed if the number of subjects taking the drug in the affected or unaffected group was less than five. All statistical analyses were conducted using R. The false discovery rate (FDR) approach by Benjamini and Hochberg [30] was applied to control for multiple testing. This approach controls the expected proportion of false positives among the rejected null hypotheses.
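For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal Python sketch is given below. It is not the code used in the study (which was conducted in R); the function name is hypothetical, and the sketch simply computes adjusted *p*-values in the standard step-up manner.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A drug category would then be reported as significant at FDR < 0.05 when its adjusted *p*-value falls below 0.05.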

#### *2.7. Imputation of Missing Data*

Missing values of remaining features were imputed with the R package "missRanger". The program is based on missForest, which is an iterative imputation approach based on random forest (RF). It has been widely used and shown to produce low imputation errors and good performance in predictive models [31]. The program missRanger is largely based on the algorithm of missForest, but uses the R package "ranger" [32] to build RF for improvement in speed (we found that other packages, such as MICE and missForest, are computationally too slow to produce results for the large-scale analyses here). Predictive mean matching (pmm) was employed to avoid imputation of values not present in the original data, and to increase variance to more realistic levels for multiple imputation (MI). We followed the default settings with pmm.k = 5 and num.trees = 100. We performed the analyses on multiply imputed datasets (imputed 10 times) and combined the results by Rubin's rules [33] using the function "mi.meld" under the R package "Amelia". Another advantage of missRanger is that out-of-bag errors (in terms of classification errors or normalized root-mean-squared error) could be computed, which provides an estimate of imputation accuracy.
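The pooling step by Rubin's rules can be sketched in Python as follows. This is an illustrative sketch for a single coefficient, assuming one estimate and one standard error per imputed dataset; the function name is hypothetical and the code is not the "mi.meld" implementation itself.

```python
from math import sqrt


def pool_rubin(estimates, std_errors):
    """Combine per-imputation estimates and standard errors (Rubin's rules).

    Returns the pooled estimate and its pooled standard error.
    Requires at least two imputed datasets.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m  # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, sqrt(total)
```

The between-imputation term inflates the standard error to reflect the uncertainty introduced by the missing covariate values.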

#### *2.8. Inverse Probability Weighting of the Probability of Being Tested*

Bias due to non-random testing has been discussed previously in other works [34,35]. As a person has to be tested to be diagnosed with COVID-19, factors leading to increased probability of being tested will also lead to an apparent increase in the risk of infection [35]. In addition, it has been noted that collider bias can occur when analyses are conditioned on the tested group. This could result in spurious associations, for example, between a risk factor and COVID-19 severity if both increase the probability of being tested (Pr(tested)). One way to reduce this kind of bias is to employ inverse probability weighting (IPW) of Pr(tested). Essentially, we wish to create a pseudo-population, or mimic a scenario under which testing is random instead of selected for certain subgroups. The IPW approach up-weights those who are less likely to be tested and down-weights those who have a high chance of being tested. This may yield less biased estimates of the effects of drugs.

We followed the approach described in [34] to analyze the data with IPW. Following our recent work [36], which aimed to predict COVID-19 severity with machine learning (ML), here we also employed an ML model (XGBoost) to predict Pr(tested) based on a range of factors. An advantage of using ML models is that nonlinear and complex interactions can be considered, which may improve predictive performance over logistic models. We employed the same set of predictors as in our previous work [36], and followed the same analysis strategy of hyper-parameter tuning and cross-validation to obtain predicted probabilities (please refer to [36] for details). Beta-calibration [37] was performed, and the resulting average AUC was 0.622. The predicted probabilities (i.e., Pr(tested)) were used to construct weights for IPW. Stabilized weights [38] were used.
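As an illustration of how stabilized weights can be built from predicted testing probabilities, a minimal Python sketch follows. It assumes the common form in which the marginal probability of the observed testing status is divided by its predicted probability given covariates; the exact weight construction used in the study is described in [34,38] and may differ, and the function name is hypothetical.

```python
def stabilized_weights(tested, p_tested):
    """Stabilized inverse probability weights for testing status.

    `tested` holds 0/1 indicators; `p_tested` holds the predicted
    Pr(tested) for each subject (assumed strictly between 0 and 1).
    """
    marginal = sum(tested) / len(tested)  # overall proportion tested
    weights = []
    for t, p in zip(tested, p_tested):
        if t:
            weights.append(marginal / p)
        else:
            weights.append((1 - marginal) / (1 - p))
    return weights
```

Subjects who were tested despite a low predicted probability receive weights above 1, which is the up-weighting behavior described in the text.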

#### *2.9. Subgroup Analysis*

For selected drugs showing tentative protective effects, we also performed further subgroup and interaction analyses. These drugs included cardiovascular medications listed in Table 3, four vaccines with protective associations (influenza, pneumococcal, typhoid, and combined bacterial/viral vaccines), and other top drugs with consistent protective associations across multiple models/time windows as listed in Table 4.

**Table 3.** Cardiometabolic medications showing significant protective associations (limited to FDR < 0.05) within time windows of 6, 12, and 24 months.


Owing to space limits, only results with FDR < 0.05 are shown. Please refer to Tables S3 and S6 for full results. OR, odds ratio; conf.low, lower 95% CI for OR; conf.high, upper 95% CI for OR; FDR.BH, false discovery rate by the Benjamini–Hochberg method.


**Table 4.** Drugs showing consistent protective associations across 4 time-windows and 8 models (ranked by the frequency of being nominally significant, i.e., *p* < 0.05).

Frequency (freq) calculated based on results from time windows of 6 months to 5 years. Ophthalmological and dermatological agents are not listed in the above table.

Subgroup analysis was performed with respect to main demographic features (age, sex, and ethnicity) and main comorbidities (same as the diseases listed under "covariate data"). We also compared log(OR) estimates across the subgroups with or without the risk factor of interest. The test statistic was obtained by $z = (\beta_1 - \beta_2)/\sqrt{\mathrm{var}(\beta_1) + \mathrm{var}(\beta_2)}$, where $\beta_1$ and $\beta_2$ refer to the coefficients under the two independent subgroups.
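The comparison of two independent subgroup coefficients can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; it takes the standard errors of the two log(OR) estimates as inputs and assumes a two-sided test against the standard normal distribution.

```python
from math import erfc, sqrt


def subgroup_difference_z(beta1, se1, beta2, se2):
    """z statistic and two-sided p-value for the difference of two
    independent log(OR) estimates."""
    z = (beta1 - beta2) / sqrt(se1 ** 2 + se2 ** 2)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value
```

For instance, coefficients of 1.0 (SE 0.3) and 0.0 (SE 0.4) give *z* = 2.0 in this sketch.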

#### *2.10. Interaction Analysis*

As a complementary approach, we also performed analysis with a logistic model including an interaction term (drug\*risk\_factor). The same set of drugs and risk factors was studied. The two approaches are similar in principle; however, stratified analysis yields less biased estimates if confounders have subgroup-dependent associations, while the interaction term approach produces more precise (lower-SE) estimates (hence higher power to detect interactions) [39].

#### *2.11. Controlling for Other Drugs*

We also performed additional regression analyses controlling for other top-ranked drugs. Two sets of analyses were conducted. In the first set of analyses, we controlled for the top 10 or 20 protective and harmful drugs in each time window and model. In the second set, for drugs with protective associations, we controlled for all other protective drugs with FDR < 0.05 or 0.1 (this analysis was performed for protective drugs only, as there were too many drugs associated with harmful effects to be included as covariates).

#### **3. Results**

Due to the large number of models and drugs being studied, we highlight the main results and findings from different sensitivity analyses.

Confounding by indication and other comorbidities is unavoidable, and, in particular, drugs showing harmful effects may possibly be explained by such confounding. On the other hand, as it is expected that most diseases tend to *increase* the risk/severity of infection, drugs showing *protective* effects are much less likely to be affected by confounding, and such associations may be relatively more reliable. We therefore place a greater emphasis on protective drugs in the sections below; this is also in line with our primary objective to prioritize repositioning candidates. Drugs with harmful effects are briefly discussed for comprehensiveness.

A summary of the demographic and covariate data of the original UKBB dataset is shown in Table S1. The missing rates and out-of-bag (OOB) errors for different variables from multiple imputations are shown in Table S2.

#### *3.1. Primary Analysis with Multiple Imputation of Covariates*

Full results of all drug categories across all time windows (including 6, 12, 24, 60, and 120 months; the last time window only for vaccines) are shown in Tables S6–S10. All protective associations (with at least nominal significance, i.e., *p* < 0.05) are shown in Table S3, while all association results with vaccines are presented in Table S4. For drugs associated with increased odds of infection/severity, we also summarize the top 10 drugs (ranked by *p*-value) from each model and time window, and organize them together in Table S5.

#### 3.1.1. Overview

Across all categories, statins showed the strongest and most consistent protective associations. Highly significant protective effects were seen across infected subjects, tested subjects, or the whole population, especially in reducing the severity or mortality of infection. Albeit with smaller effect sizes, we also observed that statins might be linked to lower susceptibility to infection (model E). Interestingly, a number of top-listed drugs are also cardiovascular medications, such as angiotensin-converting enzyme inhibitors (ACEI), angiotensin receptor blockers (ARB), calcium channel blockers (CCB), and beta-blockers.

For simplicity, odds ratios (OR) are presented for a time horizon of 1 year if not further specified.

#### 3.1.2. Drugs for Cardiometabolic Disorders

Significant protective associations with FDR < 0.05 are shown in Table 3. Statins showed protective effects across models A, C, D, E, and G. Significant protective effects against severe infection were seen among infected subjects (OR for prescriptions within a 12-month window, same below: 0.50, 95% CI: 0.42–0.60), tested subjects (OR = 0.63, 0.54–0.73), or when comparing severe cases to the general population (OR = 0.49, 0.42–0.57). In addition, a protective association against fatal infection was observed (OR = 0.51, CI 0.34–0.74). Statins were also associated with lower susceptibility to infection, with ORs of 0.83 (CI: 0.77–0.91) and 0.86 (CI: 0.79–0.93) for prescriptions within 1 year and 2 years, respectively.

Another group of drugs with highly consistent protective associations were *ACEI and ARB*. ACEI showed protective associations against severe disease among infected subjects (model A: OR for 1-year time window, same below: 0.68, CI: 0.54–0.86), and when compared to the general population (model C: OR 1 year = 0.61, CI: 0.51–0.74) or test-negative subjects (model G: OR 1 year = 0.71, CI: 0.59–0.85). We also observed association with lower odds of infection at a population level (model E: OR 1 year = 0.81, CI: 0.73–0.90); the effect size seemed to decrease over longer time windows. ARBs also showed protective associations against severe disease in the population (model C: OR 1 year = 0.68, CI: 0.54–0.85) or among tested individuals (model G: OR 1 year = 0.68, CI: 0.55–0.87).

Biguanides (mainly metformin) were associated with lower odds of severe illness among the infected (model A: OR for 2-year time window = 0.60, CI: 0.42–0.86) and in the population (model C; OR 1 year = 0.67, CI: 0.51–0.88). Other drugs of interest include beta-blockers, which were associated with lower risk of infection among tested subjects (model F, OR 1 year = 0.80, CI: 0.70–0.91), and CCBs (C08CA) which were associated with lower odds of severe disease in the population (model C, OR 1 year: 0.76, CI: 0.64–0.90).

#### 3.1.3. Vaccines

Significant associations for vaccines with FDR < 0.05 are shown in Table 5. One of the most consistent associations was observed for influenza vaccines. Protective associations were observed across almost all models (B to H), and across all time windows. Flu vaccination was associated with lower odds of infection when compared to population controls (model E; OR 1 year = 0.73, CI: 0.65–0.83) or compared to test-negative individuals (model F; OR 1 year = 0.60, CI: 0.53–0.68). Similar protective effects were also observed when restricting the cases to severe cases (model C: OR 1 year = 0.74; CI: 0.60–0.91; model G: OR 1 year = 0.61, CI: 0.50–0.76). Association with lower odds of mortality was also observed, although the confidence interval is wide as the number of fatal cases was small (model D: OR 1 year = 0.28, CI: 0.13–0.63; model H: OR 1 year = 0.23, CI: 0.11–0.52). The effect sizes in general became weaker with longer time windows.

**Table 5.** Vaccines with significant protective associations (limited to FDR < 0.05) within time windows of 1, 2, 5, and 10 years.


For space limits, only results with FDR < 0.05 are shown. Please refer to Table S4 for full results.
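The distinction between nominal significance and the FDR threshold used to select table rows can be illustrated with a small sketch. It assumes the Benjamini–Hochberg step-up procedure, which this section does not name explicitly; the function name `bh_adjust` and the p-values are hypothetical and are not results from the study.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five drug categories (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = bh_adjust(example_p)
significant = [q < 0.05 for q in example_q]
```

In this toy input, four of the five p-values are nominally significant (*p* < 0.05), but only the two smallest remain below the 0.05 threshold after adjustment.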

In view of the significant findings, we repeated the analyses on flu vaccines with other ways to define the exposure (Table S14). First, we defined the exposure based on the actual season of vaccination instead of any vaccines received in the past *k* years. For people who had received flu vaccination in 2019–2020 (regardless of vaccination in other years), the OR for infection was 0.60 (CI: 0.53–0.68), compared to those who had not (test-negative subjects as controls, model F; same below). The OR was attenuated to 0.76 (CI: 0.67–0.87) if the exposure was defined as flu vaccination in 2015–2016 (regardless of vaccination in other years). We then narrowed down the exposure as receiving flu vaccine in the last season (2019–2020) but *not* in 2018–2019; the resulting OR was 0.67 (CI: 0.53–0.83). On the other hand, if we considered exposure as vaccination in 2018–19 but not 2019–20, the OR became weaker and nonsignificant (OR = 0.80, CI: 0.63–1.01). Those who received the vaccine consecutively for the last two seasons had similar but slightly stronger protection from infection (OR = 0.59, CI: 0.51–0.69); however, the CI overlaps with other estimates. A similar pattern of association was observed for model E (population controls). In general, more recent vaccination was associated with stronger protective effects.
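The season-based exposure definitions described above can be sketched as simple flags derived from each subject's vaccination seasons. This is only an illustration: the season labels as strings, the function name `flu_exposure_flags`, and the dictionary keys are assumptions, and the reference group used for each comparison is not encoded here.

```python
def flu_exposure_flags(seasons_vaccinated):
    """Map a set of vaccination seasons to season-based exposure definitions."""
    last = "2019-2020" in seasons_vaccinated
    previous = "2018-2019" in seasons_vaccinated
    return {
        # Vaccinated in 2019-2020, regardless of other years.
        "last_season_any": last,
        # Vaccinated in 2019-2020 but not in 2018-2019.
        "last_only": last and not previous,
        # Vaccinated in 2018-2019 but not in 2019-2020.
        "previous_only": previous and not last,
        # Vaccinated in both of the last two seasons.
        "both_seasons": last and previous,
    }


# Hypothetical subject vaccinated in both recent seasons.
flags = flu_exposure_flags({"2018-2019", "2019-2020"})
```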

Pneumococcal vaccines were also associated with protection against infection, especially within tested subjects (model F: OR 1 year = 0.50, CI: 0.31–0.82), which shows a trend of attenuation with longer time windows (OR for 10-year window = 0.67, CI: 0.51–0.87). Another group of vaccines showing protective effects is J07CA (bacterial and viral vaccines), which was significant under model F (OR for 1-year window: 0.56, CI: 0.38–0.84); it also showed weakening of effect over time. Other significant associations included tetanus and typhoid vaccines, which were observed to be protective against infections.

#### 3.1.4. Other Drugs Showing Protective Associations

Significant results for other drugs having protective effects and FDR < 0.05 are shown in Table 6. Proton pump inhibitors (PPI) were associated with lower odds of infection when we compared test-positive against test-negative patients (model F: OR 1 year = 0.77, CI: 0.71–0.83); the ORs showed a gradient, with the largest effect within 6 months of use (OR = 0.72) and a weaker effect at the 5-year time window (OR = 0.87). PPI use was also significantly associated with lower severity of disease.

Natural and semisynthetic estrogens (ATC G03CA) were linked to lower risk of infection and severity in the tested population (model F: OR 1 year = 0.67, CI: 0.58–0.78), which showed attenuation of effect over time. The largest effect size was noted within 6 months of use (OR = 0.63), which was attenuated for a 5-year time window (OR = 0.73). Similar protective associations were observed under model G, with severity as the outcome.

Prior use of thyroid hormones was consistently associated with lower risk of infection and severity, no matter whether the general population or test-negative individuals were considered as controls. The ORs were similar across all time windows. For model E (infected vs. population), the OR for 1-year time window was 0.80 (CI 0.71 to 0.92), which was close to the effect size under model F (infected vs. test-negative). For model C (hospitalized/fatal cases vs. population), the OR for 1-year time window was 0.62 (CI 0.48 to 0.79), and it was similar when constrained to tested subjects.


**Table 6.** Other drugs with significant protective associations (limited to FDR < 0.05) within time windows of 6, 12, and 24 months.

For space limits, only results with FDR < 0.05 are shown. Please refer to Tables S3 and S6 for full results. Ophthalmological and other topical agents are not listed in the above table.

#### 3.1.5. Drugs Ranked by Consistency of Protective Associations

We also ranked the drugs in terms of their *consistency* of protective associations. Briefly, drugs were ranked by their frequency of being at least nominally significant (*p* < 0.05) across the four time windows and eight models (Table 4). This serves as an alternative approach to prioritize drugs. For some drugs, the results may not be significant after FDR correction. Nevertheless, if a drug showed consistent associations (at least nominally) across multiple models or time-frames, it may also be worthy of further investigation.
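The ranking by consistency can be sketched as a simple count per drug. The tuple layout, the function name `rank_by_consistency`, the tie-breaking by drug name, and the toy rows are all hypothetical; the sketch only assumes that a result counts when it is protective (OR < 1) and nominally significant (*p* < 0.05).

```python
from collections import Counter


def rank_by_consistency(results, alpha=0.05):
    """Rank drugs by how often they show a nominally significant protective result.

    `results` is an iterable of (drug, model, window, odds_ratio, p_value) tuples,
    one per combination of model and time window.
    """
    counts = Counter()
    for drug, _model, _window, odds_ratio, p_value in results:
        hit = odds_ratio < 1 and p_value < alpha
        # Adding 0 still registers the drug, so drugs with no hits are listed.
        counts[drug] += int(hit)
    # Most consistent first; ties broken alphabetically by drug name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical rows (illustration only, not results from the study).
toy = [
    ("drug_X", "A", "1y", 0.70, 0.010),
    ("drug_X", "C", "1y", 0.65, 0.003),
    ("drug_X", "E", "2y", 0.90, 0.200),
    ("drug_Y", "F", "1y", 0.80, 0.040),
    ("drug_Y", "F", "5y", 1.20, 0.010),  # significant but harmful: not counted
    ("drug_Z", "E", "1y", 0.95, 0.500),
]
ranking = rank_by_consistency(toy)
```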

#### 3.1.6. Drugs Associated with Increased Odds of Risk/Severity of Infection

Among the drugs with harmful associations, the more frequently top-listed ones include laxatives, opioids (N02AA), benzodiazepines, tetracycline, penicillins, other antipsychotics (N05AX), and antidementia drugs (N06DA/DX). The full results are presented in Tables S6–S10, and a summary is also provided in Table S5.

#### *3.2. Analysis Restricted to Subjects with Complete Covariate Data, and Models with/without IPW*

As a sensitivity analysis, for the above analysis with imputed covariates, we also repeated models A to H *without* IPW of Pr(tested). In addition, we also repeated the analyses, limiting to subjects with complete covariate data, with or without the IPW approach. In general, we observed similar drugs with significant results, and the top-ranked protective or harmful drugs were similar to the above. Comparing results with and without IPW, the list of significant drugs remained similar although the OR estimates and SE were adjusted. The full results are presented in Tables S7 and S8 (complete covariate data with and without IPW) and Table S9 (imputed covariates without IPW).

#### *3.3. Subgroup Analysis*

The proportion of subjects falling into each subgroup is presented in Table S10, while full results are presented in Table S11. We performed a statistical test to compare the log(OR) across the two subgroups with and without the risk factor; drugs with protective effect in one subgroup but significantly different OR in the other subgroup are listed in Table 7. For example, the protective effects of pneumococcal and flu vaccines were significantly weaker in obese (BMI > 30) subjects under model F. With regard to age, several drugs, such as PPI and ACEI, showed larger protective effects in those with age > 70 under models F and E, respectively. Statins, ACEIs, and PPI showed stronger protective associations in hypertensive patients under models C, E, and F, respectively. Regarding ethnicity as a subgroup, a number of drugs, including several vaccines, appeared to have stronger protective effects in white compared to non-white subjects. However, fewer than 10% of the UKBB subjects included here were non-white, and the non-white subgroup was heterogeneous and composed of several different ethnicities. We did not observe clear evidence of sex-specific effects in this analysis.


**Table 7.** Summary of subgroup analysis, showing drugs having significant protective association in one subgroup but significantly different OR in the other subgroup (FDR < 0.2).



OR\_Y, odds ratio within the subgroup defined in the 1st column; OR\_N, OR in the other subgroup. Sig\_Y, sig\_N, significance in the two subgroups, 1 denotes significant protective effect, 0 denotes nonsignificant effect, −1 denotes significant harmful effect. p\_OR\_cmp, *p*-value based on comparison of ORs; p.adjust\_OR\_cmp, corresponding FDR. Ethnicity as a subgroup is not shown here; please refer to Table S11 for details. CAD, coronary artery disease, HT, hypertension.
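A comparison of ORs between two subgroups can be sketched as a Wald z-test on the difference of the log(OR) values. The exact test behind p\_OR\_cmp is not specified in this section, so this is an assumption; the standard errors are recovered from 95% CIs under a normal approximation on the log scale, and the CIs in the example are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def se_from_ci(lower, upper):
    """Standard error of log(OR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def compare_odds_ratios(or_y, ci_y, or_n, ci_n):
    """Wald z-test for log(OR_Y) - log(OR_N) in two non-overlapping subgroups."""
    diff = math.log(or_y) - math.log(or_n)
    # The subgroups share no subjects, so the variances of the estimates add.
    se = math.sqrt(se_from_ci(*ci_y) ** 2 + se_from_ci(*ci_n) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical subgroup estimates (the CIs are invented for illustration).
z_stat, p_cmp = compare_odds_ratios(0.76, (0.60, 0.96), 0.54, (0.46, 0.63))
```

A positive z here indicates that the OR in the first subgroup is closer to (or above) 1 than in the second, i.e., a weaker protective association.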

#### *3.4. Interaction Analysis*

A summary of results (results with FDR < 0.2) is presented in Table 8, while a fuller version is given in Table S12. Full results are given in Table S13. More significant results (at FDR < 0.2) are observed compared to stratified analysis, presumably due to the higher power of this approach. For example, we found that most vaccines showing protective effects, including influenza and pneumococcal vaccines, interacted with BMI and obesity significantly. Higher BMI was associated with *reduced* protective effects, in line with evidence from subgroup analysis.

On the other hand, statins, biguanides (metformin), and antiplatelet drugs showed positive interactions with BMI. For CAD, significant interaction was observed with several cardiometabolic drugs, including beta-blockers (nonselective), antiplatelet drugs, and statins, suggesting larger protective effects for such drugs in CAD patients. In a similar vein, most cardiometabolic medications showed interaction with HT, indicating more prominent protective associations in HT patients.

Considering age as an interacting variable, interaction was observed with a large number of drugs, most suggesting weaker protection as age increases. Considering specific medications, statins interacted with multiple risk factors and demonstrated larger protective effects with CAD, obesity, DM, HT, dementia, and in males. However, their effect tended to be weaker with increasing age. Interaction analysis with flu vaccines showed that their effect may be weaker in the obese and with increasing age, but stronger in the white population and the asthmatic subgroup. ACEI and ARB showed stronger protective effects in white and HT patients, but weaker effects with advanced age.

#### *3.5. Controlling for Other Medications*

We primarily focused on protective drugs, as the number of drugs with significant harmful associations is large and it is hard to control for all of them. Overall, most drugs with protective effects remained significant (at least for a subset of models) despite controlling for other medications (Table S15). However, biguanides (A10BA), CCB (C08CA), and platelet aggregation inhibitors excluding heparin (B01AC) showed a relatively consistent trend of nonsignificant association with outcome when other protective drugs were controlled for. The findings were similar when controlling for top-10/20 drugs or all protective drugs having FDR < 0.05/0.1.


**Table 8.** Summary of interaction analysis, showing pairs of variables with significant interactions (FDR < 0.2).


We added an interaction term drug\*interacting factor in the regression model. For "interaction term", 1 denotes significant interaction effects towards protection (i.e., presence of the interacting factor tends to increase the protective effect of the drug); −1 denotes significant interaction effects towards harmful side (presence of the interacting factor tends to reduce the protective effect of the drug). We consider significant results in any model or time window. For age and BMI, they were modeled as continuous variables unless otherwise specified. For full results, please refer to Tables S12 and S13.
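The meaning of the interaction term can be illustrated with a short sketch. In a logistic model containing the terms `beta_drug * drug` and `beta_interaction * drug * factor`, the log(OR) for the drug at a given level of the interacting factor is `beta_drug + beta_interaction * factor`. The function names and coefficients below are hypothetical, and the sign coding assumes the interaction has already been judged significant.

```python
import math


def drug_odds_ratio(beta_drug, beta_interaction, factor_value):
    """OR for the drug at a given value of the interacting factor."""
    return math.exp(beta_drug + beta_interaction * factor_value)


def interaction_code(beta_interaction):
    """Code a significant interaction: 1 towards protection, -1 towards harm.

    A negative coefficient lowers the drug's OR when the factor is present,
    which corresponds to a stronger protective effect.
    """
    return 1 if beta_interaction < 0 else -1


# Hypothetical coefficients: OR 0.60 without the factor, 0.75 with it.
beta_drug = math.log(0.60)
beta_interaction = math.log(1.25)
or_without = drug_odds_ratio(beta_drug, beta_interaction, 0)
or_with = drug_odds_ratio(beta_drug, beta_interaction, 1)
code = interaction_code(beta_interaction)
```

For a continuous factor such as age or BMI, `factor_value` would be the (possibly centered) value of that variable rather than 0 or 1.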

#### **4. Discussion**

In this work, we performed a thorough and rigorous analysis on the effect of drugs and vaccines on COVID-19 susceptibility and severity. We uncovered a number of drugs with potentially protective effects, which may be further explored as candidates for drug repositioning.

As an approach based on observational data, different kinds of bias, such as confounding and selection bias, may affect the results. We performed analysis on infected subjects (models A and B), the whole population (models C, D, E) and the tested population (models F, G, H) to obtain a more comprehensive picture of drug effects under different settings, and to avoid limitations (e.g., selection bias, collider bias, unscreened controls) of some designs.

#### *4.1. Highlights of Relevant Drugs*

Below, we highlight drugs that are tentatively associated with altered risk or severity of infection. We preferentially consider drugs that showed significant associations across multiple models and time windows, those with stronger statistical significance, and those with protective effects, as confounding by indication is much less likely.

#### 4.1.1. Drugs for Cardiometabolic Disorders with Protective Effects

Interestingly, many drugs with potential protective effects are indicated for cardiometabolic (CM) disorders. Cardiometabolic risk factors, such as obesity, hypertension, DM, and CAD, have consistently been shown to be associated with risk and severity of infection [15,40]; as such, it is biologically plausible that drugs for treating CM disorders may be beneficial.

Among all drugs, the strongest and most consistent protective association was observed for statins. The beneficial effects of statins are supported by several previous studies. For example, a recent meta-analysis of four retrospective studies of COVID-19 patients [41] showed a significantly decreased hazard of severity or mortality of infection (pooled HR = 0.70) when comparing statin users against nonusers. Another retrospective study by Tan et al. [42] also reported lower risk of intensive care unit (ICU) admission among statin users in infected patients. Yet another work showed that statins may be effective in reducing in-hospital mortality among diabetic patients [43]. Potential mechanisms for the protective actions of statins have been discussed elsewhere [44–46]. It has been postulated that, besides reducing CVD risks, statins may reduce risk/severity of infection by inhibiting inflammation and excessive immune response, producing direct antiviral effects, improving endothelial function, and exerting an antithrombotic effect, among other actions [44–46].

Another group of drugs worth highlighting is ACEI and ARB. There have been intense discussions on whether ACEI/ARB may affect risk or severity of infection from early on, as ACE2 is a receptor for SARS-CoV-2. Nevertheless, a recent study showed that ACE2 is localized in respiratory cilia, and the use of ARB/ACEI does not change its expression [47]. Recent systematic reviews and meta-analyses (for example, see [48] with continuous updates) of observational studies do not support an association between ACEI/ARB prior use and severity of infection. However, several studies [47,49–55] reported protective effects of ACEI/ARB on severity or mortality of disease. Here, we observed consistent association of prior use of ACEI/ARB with reduced risks of severe/fatal infection (models A, C, G) and overall infection risk in the population (model E).

For several other kinds of cardiometabolic drugs, the associations were not as strong, but may still be worthy of further studies. Biguanides (mainly metformin) were observed to be protective against severe COVID-19 infection, both among the infected and at a population level. In line with this, in a meta-analysis of four observational studies of hospitalized patients mostly with type 2 DM, the use of metformin was associated with a lower risk of mortality (OR = 0.75, 95% CI = 0.67–0.85) [56]. A number of mechanisms have been proposed [56,57]. For example, besides improving glycemic control and weight reduction, metformin may lead to AMPK activation, which potentially reduces viral entry by phosphorylation of the ACE2 receptor. It may also lead to mTOR pathway inhibition and prevent hyperactivation of the immune system [56].

Other drugs of interest may include beta-blockers and calcium channel blockers (C08CA, dihydropyridine derivatives). It was suggested that beta-blockers may be useful in preventing hyperinflammation and hence beneficial for COVID-19 [58]. For calcium channel blockers (CCBs), a study using cell culture suggested that CCBs, especially amlodipine and nifedipine, were useful in blocking viral entry and infection in epithelial lung cells [59]. In another retrospective study [60], both beta-blockers and CCBs were associated with lower mortality. Another relevant study in the UK [61] utilized data from the UK Clinical Practice Research Datalink (CPRD) and found that ACEI/ARB, CCBs, and thiazide diuretics were all associated with lower odds of diagnosis, while beta-blockers did not show any association after adjusting for consultation frequency. None of the above drugs were associated with mortality in that study [61].

#### 4.1.2. Vaccines

There has been intense interest in whether vaccines indicated for other diseases may protect against COVID-19. Here, we observed that a number of vaccines showed protection against infection or severe infection. For example, pneumococcal vaccines were protective against infection in the population and tested subjects, and risk of severe infection (model G). Significant protective associations were also observed for tetanus and typhoid vaccines at a time horizon of 10 years (the power to detect associations is likely stronger over longer periods due to larger number of people having received the vaccine; it does not exclude the possibility that the vaccines may have effects over shorter time windows). We also observed associations with the J07CA category, which contains various bacterial and viral vaccines (see https://www.whocc.no/atc\_ddd\_index/?code=J07CA, accessed on 9 November 2020).

For influenza vaccines, we observed highly consistent protective associations. It has been proposed that "trained innate immunity", which may involve epigenetic reprogramming of innate immune cells, may enable a vaccine to protect against other diseases [62,63]. Interestingly, two studies in Italy reported that higher coverage rate of flu vaccine was associated with lower rate of infection, hospitalization, and mortality from COVID-19. Another larger-scale study, based on electronic records of 137,037 subjects who have received viral PCR tests, showed that a number of vaccines (given in the past 1, 2, or 5 years) were associated with lower risks of infection [64]. These included flu and pneumococcal vaccines also implicated in the present study. Another recent study in the Netherlands [65] also showed a reduced risk (Relative risk = 0.61, 95% CI: 0.46–0.82) of infection among recipients of flu vaccine, and this effect size was similar to that observed here. In vitro studies by the same authors showed that the vaccine was able to induce a trained immunity response, including an increase of cytokine responses after stimulation of immune cells with SARS-CoV-2.

We note that this is an observational study, and residual confounding may be present. For example, it is possible that people receiving flu vaccines are more health-conscious and observe preventive measures better. However, we observed waning protective effects over time, which makes sense biologically but could not be entirely explained by the above confounder alone. In addition, the vaccine appears to have stronger effect sizes if fatal infection is considered as the outcome (although the confidence interval is large), which cannot be easily explained by health-consciousness. On the other hand, as flu vaccines are more likely to be received by the elderly and those with chronic illnesses, residual confounding by these factors tends to push the effects towards the harmful side.

Taken together, we believe that the protective effects of vaccines may not be easily and fully explained away by other confounders. Further experimental and clinical studies are warranted to investigate the nonspecific effects of flu and other vaccines, especially since COVID-19 vaccines may not be easily available to many people (especially those in low-income countries) in a short timeframe.

#### 4.1.3. Other Potential Protective Drugs

We briefly highlight a few other drugs with potential protective effects. Estrogens (G03CA) were among the drugs showing protective associations. As many studies reported higher risks of severe disease in men than in women, it has been hypothesized that estrogen may play a part in the sex-discordant outcomes, for example via its effects on immune response to infections [66–68].

Thyroid hormones (TH) were also among the top-ranked drugs. It was postulated that TH may ameliorate tissue injury due to hypoxia by suppression of p38 MAPK [69]. Clinical trials on TH are ongoing [69,70], and our findings support a protective role of TH in COVID-19.

Another drug category of note is proton pump inhibitors (PPI). Several studies have suggested harmful effects of PPI on disease severity, which may be related to reduced gastric acid production with subsequent bacterial overgrowth [71–73]. However, an in vitro screening study revealed that PPIs may serve as a potent inhibitor of SARS-CoV-2 replication [74]. The difference in findings between the current study and previous works may be due to heterogeneity in study samples and designs, differences in the outcome studied (e.g., hospitalization vs. ICU admission used in some other studies; infection risk vs. severity of disease, etc.), and variations in the covariates being adjusted for. Residual confounding, such as by other comorbidities and drugs given, may also affect the results. Interestingly, we observed that effects of PPI may be stronger in certain subgroups (e.g., older age, HT), which may also account for the discrepancy in results across different studies.

Several other top-ranked drug categories in Table 4 may also be worth discussing. Testosterone-5-alpha reductase inhibitors (5ARis) were recently shown in a small randomized controlled trial (RCT) to reduce the time to remission [75]. Two earlier observational studies also reported lower risk of ICU admission and frequency of symptoms [76,77]; 5ARis block the conversion of testosterone to its more potent form, dihydrotestosterone. Of note, one of the key receptors for the SARS-CoV-2 virus is TMPRSS2 [78], and the only known promoter of the gene is an androgen response element in the promoter region [79].

Another drug category of interest is platelet aggregation inhibitors (B01AC). It has been reported that COVID-19 is associated with higher risk of thrombotic events, including deep vein thrombosis and pulmonary embolism [80]. Antithrombotic therapies have been hypothesized to reduce thrombo-inflammatory processes as a result of endothelial dysfunction related to viral infection [81]. An observational study reported that aspirin is associated with reduced risk of mechanical ventilation and mortality in hospitalized patients [82]; however, RCTs are lacking.

For some of the protective drugs highlighted above, we note that their significance weakened (or became nonsignificant) when controlling for other medications. However, we expect multicollinearity among the drug variables, as cardiometabolic disorders are highly comorbid and one patient often takes multiple medications. Multicollinearity may render interpretation of individual predictors difficult due to unstable coefficient estimates [83].

In our secondary analyses, we also considered *subgroup and interaction effects*. While this is a more exploratory analysis and further replications are required, it shed light on how the effects of drugs/vaccines may differ in people with different clinical background and may contribute to more "personalized" drug repositioning in the future. For instance, we observed a consistent trend that the protective associations of flu and pneumococcal vaccines were weaker in obese individuals. As an example, comparing those who received flu vaccine in the past season (2019–2020) against those who did not, the estimated OR for infection was 0.76 in the obese group and 0.54 in the non-obese group (model F). It has been observed before that obese individuals respond less well to flu and other vaccines due to impaired immunological responses [84,85]. As another example, statins were observed to have more prominent protective effects in those with cardiometabolic abnormalities, such as DM, HT, CAD, and obesity. This is also supported by a recent study [43] which showed mortality reduction in statin users in diabetic patients only.

#### 4.1.4. Drugs with Potentially Harmful Effects

We noted a number of drugs with potentially harmful effects, but we caution that residual confounding, such as confounding by indication, other comorbidities, and general poor health, may lead to bias towards increased odds of infection or severe disease.

For example, people who have poorer health in general may visit their GPs more often and be prescribed drugs (e.g., laxatives, antibiotics, painkillers), which may lead to confounding. Nevertheless, it is possible that some of the top-ranked drugs may indeed increase the risk/severity of infection. For instance, it is slightly unexpected that laxatives were highly significant across multiple models and time windows. It has recently been postulated that dysregulation of gut microbiome may be associated with susceptibility or resilience to infection [86,87], and laxatives represent a main category of drugs that affect the gut microbiome [88]. Interestingly, several associations involve psychiatric medications such as benzodiazepines, antipsychotics, and antidementia drugs. The association may be due to underlying neuropsychiatric conditions (e.g., anxiety, psychosis, dementia, etc.), or the effect of the drugs, or a combination of both. Some of the above drugs overlap with those revealed in a recent study using primary care data in Scotland. In a univariate analysis restricted to nonresidents in care homes and those without major conditions, laxatives, anxiolytics, penicillins, and opioid analgesics were significantly associated with ICU admission or mortality from COVID-19 when compared to population controls [89]. These drugs were also top-listed as drugs with harmful effects in this study.

Patients taking immunosuppressants are more susceptible to viral infections in general, and it is possible that these drugs are also associated with increased vulnerability to COVID-19 infection [90]. On the other hand, such drugs may dampen excessive immune responses ("cytokine storm") that may occur in severe infections [91]. However, here we did not find consistent evidence of associations between immunosuppressive agents and COVID-19. Across immunosuppressive drugs (ATC category L04), we only found two significant associations (FDR < 0.05). Interleukin inhibitors were associated with higher susceptibility to infection (model E) and selective immunosuppressants (L04AA) were associated with higher risk of severe infection (model C), respectively, when compared to population controls (Table S6). No other significant associations were observed. Of note, a few preclinical studies reported that thiopurines, a type of immunosuppressant, may lead to reduced viral replication [92,93] via other mechanisms, although clinical studies suggested possible harmful effects [94,95]. However, the number of patients taking such drugs was too small for meaningful analysis in this study.

#### 4.1.5. Different Results under Different Models

We note that sometimes the different models may yield different results. One main observation is that analysis on the tested population appears to result in more findings of drugs with protective effects. We also observed that some drugs in model F (infected vs. tested negative) may show different effects under model E (infected vs. general population). Several reasons may explain this finding. First, confounding by indication is inevitable and may play a more important role when analyzing general population samples. It is possible that apparent harmful effects of drugs are due to the diseases/conditions that the prescription is related to, or poorer health in general. Based on a machine learning model for predicting testing probability (see Figure S1), we observed that people who are older, having more comorbidities and taking more medications, suffering from cardiovascular conditions, etc. were more likely to be tested. Compared to the general population, the tested group may represent a more "homogeneous" population, enriched for people with poorer health and more comorbidities in general. Therefore, a proportion of confounders which overlap with factors associated with higher Pr(tested) are essentially controlled for by stratification, if we only study the tested subjects. On the other hand, in the general population, as there is a higher proportion of healthy subjects, the effect of confounding by indication may be stronger. Another possibility is collider bias due to conditioning on a subgroup of subjects. For example, a drug may be associated with certain conditions which, in turn, are associated with higher chance of being tested; on the other hand, those who have more severe symptoms or complications are more likely to be tested. Conditioning on testing may result in spurious associations between the drug and severity of infection. 
However, we have tried to minimize this type of bias by the IPW approach, and we did not observe substantial difference in results with or without IPW correction for most drugs. Even with adjustment by IPW, there is still a chance of residual selection or collider bias; for example, some factors associated with Pr(tested) may not be captured in the prediction model. A third possibility to consider is that a drug may truly produce different effects in different subgroups, due to effect modification by other factors or diseases. For instance, a recent study reported that the protective effect of statins is more marked in patients with diabetes [43]. The fact that risk factor associations may differ between a whole-population- or tested-population-based study has also been noted previously, for example in [35].
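The collider-bias argument, and the role of inverse probability weighting, can be illustrated with an exact toy calculation. All numbers are hypothetical: drug use and severe symptoms are independent in the population (true OR = 1), but both raise the probability of being tested. The weights use the true Pr(tested), whereas in practice it must be estimated, so the correction would be imperfect.

```python
from itertools import product

P_DRUG = 0.3    # hypothetical prevalence of drug use
P_SEVERE = 0.2  # hypothetical prevalence of severe symptoms


def pr_tested(drug, severe):
    """Hypothetical testing probability, raised by both drug use and severity."""
    return 0.1 + 0.3 * drug + 0.4 * severe


def odds_ratio(cells):
    """Odds ratio from a dict mapping (drug, severe) to a probability or weight."""
    return (cells[1, 1] * cells[0, 0]) / (cells[1, 0] * cells[0, 1])


population, tested, tested_ipw = {}, {}, {}
for drug, severe in product((0, 1), repeat=2):
    p_d = P_DRUG if drug else 1 - P_DRUG
    p_s = P_SEVERE if severe else 1 - P_SEVERE
    joint = p_d * p_s  # independence in the whole population
    population[drug, severe] = joint
    # Probability of falling in this cell and being tested.
    tested[drug, severe] = joint * pr_tested(drug, severe)
    # Reweight tested subjects by 1 / Pr(tested).
    tested_ipw[drug, severe] = tested[drug, severe] / pr_tested(drug, severe)

or_population = odds_ratio(population)  # no association by construction
or_tested = odds_ratio(tested)          # spurious association after selection
or_tested_ipw = odds_ratio(tested_ipw)  # association removed by weighting
```

Restricting to tested subjects yields an OR of about 0.4 in this example, an apparently protective association that arises purely from selection; weighting by the inverse of the true testing probability restores an OR of 1.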

#### *4.2. Strengths and Limitations*

This study has a number of strengths. First and foremost, the study was performed on a large cohort with a sample size close to half a million. The sample was not limited to one or a few medical centers, and covered the entire UK population, although this is not an entirely random sample and participation bias still exists [34]. The large and well-characterized sample also enables analysis of the infected and tested populations, as well as the whole population. We have studied *all* level-4 ATC drug categories, allowing an unbiased and systematic analysis of the association of different drugs with COVID-19 risks or outcomes. This reduces the risk of publication bias, in particular the tendency for negative results to go unreported. Drugs showing null associations can still be of important public health interest, as this may suggest that patients on such medications may not need to change their regimen in view of the pandemic. In addition, medication history was retrieved from GP records, which minimizes recall bias and errors from self-reporting. Another strength is that we performed a variety of statistical analyses to reduce bias, including control for potential confounders, multiple imputation, IPW to reduce effects of testing bias, and study of different time windows and multiple models. Some of our findings were also corroborated by previous studies. Many previous clinical studies were limited to hospitalized or infected individuals, and therefore could not study the effect of drugs on susceptibility to infection. Selection on hospitalized/infected subjects may also be prone to selection/collider bias, as discussed elsewhere [34]; therefore, we included multiple models with the infected and tested populations, as well as the whole population, as samples, which aims to reduce limitations due to specific designs.

There are also various limitations, some of which have been mentioned above. First and foremost, this is an observational study based on a retrospective cohort of UKBB. As this is not a randomized controlled trial, confounding is inevitable, especially confounding by indication. Although we have controlled for main confounders in the regression model, residual confounding is still likely. Since confounding by indication will likely bias towards *increased* odds of infection or severe disease, null or protective associations may be more reliable. Confounding by the use of other types of drugs is also possible. In addition, the UKBB cohort is not random, and participants are on average healthier than the general population [96]. The majority of participants are of European descent, so the findings may not be generalizable to other ethnicities. In addition, the subjects are mostly >50 years old, and drug effects in younger individuals may be different.

Regarding drug history, it is worth noting that vaccination records are not complete, as individuals may receive vaccination outside GP practices. Over-the-counter prescriptions were not counted, and it cannot be guaranteed that all drugs issued are dispensed by the pharmacy (see https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/tppgp4covid19.pdf, accessed on 9 November 2020). However, if this misclassification is nondifferential (unrelated to outcome), the bias will be towards the null. There is a relatively high missing rate of GP prescription records for deceased COVID-19 patients, which leads to reduced power to detect associations. While the UKBB cohort sample is large, we still have low power to detect associations for drugs that are uncommonly prescribed. Another limitation with the GP records is that only the issue date, but no duration or dosage, is available.

As for the outcome, hospitalization is a rough proxy for severity only. For models comparing to the general population, it is likely that a proportion of the population may be infected but were not tested. This tends to lead to bias on the conservative side (akin to the use of unscreened controls in genetic studies [97,98]), especially under model E. Patients with more severe symptoms are less likely to remain untested, so other models may be less affected by this bias. We note that this study focuses on prior (or pre-diagnostic) use of drugs and their association with infection risk/severity, and does not provide direct evidence for whether newly prescribed drugs to recently diagnosed patients will be useful or not. The current study represents one approach to drug repositioning with real-world population data, yet integrating results from other repositioning approaches (e.g., network/structure-based) may further improve the reliability of candidates.

#### *4.3. Clinical Implications*

We highlight a few clinical implications here, although we stress that further studies are required to confirm our findings. We discovered a number of drugs with potential protective effects that, if replicated and tested in further trials, may represent promising repurposing candidates (for prevention or treatment of disease). As CM disorders are a major risk factor for severe infection, this study also provides further support for the safety of CM medications and reinforces the need to continue these drugs for those indicated. In a similar vein, negative findings (nonsignificant associations with COVID-19) in this study may also be of value, given that some patients or physicians may have concerns over the risk of COVID-19 induced by existing drugs.

Another important finding is that flu (and possibly others, e.g., pneumococcal) vaccines may be associated with lower odds of infection and severity of disease. If further confirmed, the finding is clinically important as COVID-19 vaccines are not fully available yet to a large part of the world's population (especially those in developing countries), some may be hesitant to take the new vaccine, and the efficacy of existing vaccines varies and is less than perfect. At least, the present work supports that flu and other vaccinations should be continued and encouraged amid the pandemic. For any vaccines/drugs that may be repurposed for COVID-19, we believe that even a modest reduction in the risk/severity of infection may still be highly useful, given the huge number of people at risk for COVID-19 and its complications.

#### **5. Conclusions**

Here, we observed that a number of drugs, including many for cardiometabolic disorders, may be associated with lower odds of infection/severity of COVID-19. Several existing vaccines, especially flu vaccines, may be beneficial against COVID-19 as well. Due to the observational nature of the study, confounding cannot be excluded, and other limitations may be present. We acknowledge that causal relationships between drugs and disease cannot be reliably concluded from this study alone, and we regard the findings as more exploratory than confirmatory. Nevertheless, to our knowledge, this is the most comprehensive study to date on drug/vaccine associations with COVID-19. We believe that the current work provides a valuable resource to prioritize repositioning candidates for future meta-analyses, clinical trials, and/or experimental studies.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics13091514/s1, All supplementary Tables and notes are available at the journal's website and at https://drive.google.com/drive/folders/1\_noITkBAsef\_7Kb6bUd\_RI\_3VQK5jafH?usp=sharing (accessed on 28 January 2021) or https://doi.org/10.6084/m9.figshare.14828112 (accessed on 23 June 2021). Table S1: Demographic and other characteristics of the original UKBB data, Table S2: Out-of-bag (OOB) errors for different variables from multiple imputations, Table S3: (a) All protective associations with at least nominal significance (*p* < 0.05) (6 month to 5 years), Table S4: All association results with vaccines (time windows of 1, 2, 5 and 10 years), Table S5: (a) Top 10 drugs with harmful effects (ranked by *p*-value) from each model and time window (time window of 6 month to 5 years) (b) Summary table by frequency of being listed among the top 10, Table S6: All association results based on subjects with available GP prescription records, with multiple imputation of covariates and inverse probability weighting (IPW) of probability of being tested, Table S7: Analysis restricted to subjects with complete covariate data, with IPW, Table S8: Analysis restricted to subjects with complete covariate data, without IPW, Table S9: Analysis with imputed covariates without IPW, Table S10: Proportion of subjects in each subgroup, Table S11: Full results of subgroup analysis, Table S12: Summary table of interaction analyses (results with FDR < 0.2), Table S13: Full results of interaction analyses, Table S14: Further analysis on associations of flu vaccine and risks/severity of infection, according to the season of vaccination, Table S15: Results of analyses after controlling for other top medications. Figure S1: Shapley dependence plot of top 15 variables contributing to Pr(tested) from the XGboost prediction model. Variables are ranked by absolute mean Shapley value. Please refer to Lundberg et al. (https://doi.org/10.1038/s42256-019-0138-9) for details on Shapley values.

**Author Contributions:** Conceptualization, H.-C.S.; Data curation, Y.X. and K.C.-Y.W.; Formal analysis, Y.X. (lead) and K.C.-Y.W.; Funding acquisition, H.-C.S.; Methodology, H.-C.S. (lead) and Y.X.; Project administration, Y.X.; Supervision, H.-C.S.; Writing—original draft, H.-C.S. and Y.X.; Writing review & editing, Y.X., K.C.-Y.W. and H.-C.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Natural Science Foundation of China, grant number 81971706; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, China; Lo Kwee Seong Biomedical Research Fund, The Chinese University of Hong Kong.

**Institutional Review Board Statement:** The UK Biobank study has received ethical approval from the NHS National Research Ethics Service North West (16/NW/0274). Details of UK Biobank research ethics approval can be found at https://www.ukbiobank.ac.uk/learn-more-about-ukbiobank/about-us/ethics.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** UK Biobank data are available to eligible researchers after completing an application procedure (https://www.ukbiobank.ac.uk/enable-your-research).

**Acknowledgments:** We thank Pak SHAM for support on data access. We thank Carlos CHAU for help with data presentation, Liangying YIN for help with data cleaning and Shitao RAO and Jinghong QIU for help in manuscript preparation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Discovery of a Potent Candidate for RET-Specific Non-Small-Cell Lung Cancer—A Combined In Silico and In Vitro Strategy**

**Priyanka Ramesh <sup>1</sup> , Woong-Hee Shin 2,3,\* and Shanthi Veerappapillai 1,\***


**Abstract:** Rearranged during transfection (RET) is an oncogenic receptor tyrosine kinase, activated in several cancers including non-small-cell lung cancer (NSCLC). Multiple kinase inhibitors such as vandetanib and cabozantinib are commonly used in the treatment of RET-positive NSCLC. However, limited specificity, toxicity, and reduced efficacy restrict the usage of multiple kinase inhibitors in targeting RET protein. Thus, in the present investigation, we aimed to identify novel and potent candidates for the inhibition of RET protein using combined in silico and in vitro strategies. In the initial stage, 11,808 compounds from the DrugBank repository were screened using different hypotheses, namely pharmacophore, e-pharmacophore, and receptor cavity-based models. The results from the different hypotheses were then integrated to eliminate false positive predictions. The inhibitory activities of the screened compounds were assessed using the Glide docking algorithm. Moreover, RF score, Tanimoto coefficient, Prime MM/GBSA, and density functional theory calculations were utilized to re-score the binding free energy of the docked complexes with high precision. This procedure resulted in three lead molecules, namely DB07194, DB03496, and DB11982, against the RET protein. The screened lead molecules together with reference compounds were then subjected to a long molecular dynamics simulation of 200 ns to validate the inhibitory activity. Further analysis of the compounds using MM-PBSA and mutation studies resulted in the identification of the potent compound DB07194. Finally, a cell viability assay with the RET-specific lung cancer cell line LC-2/ad was carried out to confirm the in vitro biological activity of the resultant compound, DB07194. The results from our study suggest that DB07194 can be effectively translated for this new therapeutic purpose, in contrast to the properties for which it was originally designed and synthesized.

**Keywords:** LC-2/ad cell line; drug discovery; docking; MM-GBSA calculation; molecular dynamics; cytotoxicity assay

#### **1. Introduction**

Targeted therapies using tailored inhibitors against oncogenic driver kinases have transformed the landscape of cancer management, including non-small-cell lung cancer (NSCLC) [1]. Notably, first-generation inhibitors against oncogenic drivers such as gefitinib, erlotinib (EGFR mutations), and crizotinib (ALK rearrangement) have established a novel treatment paradigm for the use of targeted inhibitors in genetically defined NSCLC patients [2,3]. Despite the earlier success of these strategies, the emergence of acquired resistance against the therapy has become a significant challenge in developing selective and more potent next-generation inhibitors.

**Citation:** Ramesh, P.; Shin, W.-H.; Veerappapillai, S. Discovery of a Potent Candidate for RET-Specific Non-Small-Cell Lung Cancer—A Combined In Silico and In Vitro Strategy. *Pharmaceutics* **2021**, *13*, 1775. https://doi.org/10.3390/pharmaceutics13111775

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 29 August 2021 Accepted: 19 October 2021 Published: 24 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Rearranged during transfection (RET), a transmembrane tyrosine kinase receptor, was found to be overexpressed in 1–2% of never-smoking NSCLC patients [4]. In general, it plays a vital role in the development of neural crest cells in the nervous system and in kidney morphogenesis. RET consists of three domains: adhesion, tyrosine kinase, and extracellular domain. Activation of RET involves autophosphorylation of a fusion protein complex with glial cell line-derived neurotrophic factor (GDNF) and GFR-α, a cell membrane-bound coreceptor [5,6]. The downstream signaling of RET assists in cell migration, proliferation, and differentiation. Nevertheless, genetic alteration of RET oncogenes promotes ligand-independent activation of driver kinases, resulting in tumorigenesis. A study in late 2011 revealed that pericentric inversion, rearrangement, dimerization, and activation of RET proteins with KIF5B and CCDC6 in NSCLC were analogous to the mechanism of ALK [7]. Multiple Kinase Inhibitors (MKIs), including cabozantinib and vandetanib, gave the first glimmer of hope for the treatment of RET-positive NSCLC patients. However, these nonselective MKIs demonstrated limited response durability and off-target side effects in NSCLC patients [8]. Thus, selective inhibitors such as selpercatinib and pralsetinib were developed to overcome the limitations of the multiple kinase inhibitors.

Recently, the emergence of solvent front mutations and gatekeeper mutations in RET-positive NSCLC patients has been reported as the primary cause for the development of acquired resistance against the targeted kinase inhibitors [9]. A similar pattern of solvent front and gatekeeper mutations was observed in several types of oncogene-driven NSCLCs. Typical examples of other alterations associated with resistance in NSCLC include ALK rearrangement, ROS-1 positivity, and EGFR mutations. A significant number of reports are available on tackling resistance caused by the above genes [10]. However, studies on RET mutations in NSCLC remain scarce and are not satisfactory [11]. In addition, it is to be noted that MKIs were the only choice of drug to treat RET-driven NSCLC. Recently, the selective inhibitor pralsetinib was administered in both naïve and platinum-based chemotherapy-treated patients. Among the cohort, 10% of the patients were detected with solvent front mutations (G810C/S), 15% with MET amplification, and 5% with KRAS amplification [12]. Although the study produced satisfactory results and the drug was found to overcome gatekeeper mutations during the clinical trials, its adverse side effects limit its efficacy, and it failed to overcome solvent front mutations [13]. Moreover, the resistance mechanism of solvent front mutations to selective inhibitors is not yet reported in the literature [14]. Hence, developing next-generation targeted kinase inhibitors, particularly against RET solvent front mutations, is urgently needed to overcome the acquired resistance.

Virtual screening of active compounds for hit identification and lead optimization has been made possible by advancements in bioinformatics and computer modeling in modern drug research [15]. For instance, Misra et al. identified two potent human great wall kinase inhibitors using the ZINC database that mitigate mitotic division in various types of cancer [16]. Similarly, Tamta et al. identified and validated three natural inhibitors against Mpro of SARS-CoV-2 using different in silico strategies including molecular docking, dynamics and MM-PBSA analysis [17]. In view of the successful evidence mentioned above, we implemented an integrated approach using pralsetinib as the reference inhibitor towards the screening of potent candidates against RET protein. Three different models were generated for performing a virtual screening process using FDA approved, experimental and investigative subsets of the DrugBank database, followed by docking analysis, to identify potent and highly selective RET inhibitors. The combined assessment in this study provides a highly potent drug-like candidate tailored for RET oncogenic drivers that can overcome acquired resistance in NSCLC patients.

#### **2. Materials and Methods**

#### *2.1. Dataset Retrieval and Structural Refinement*

The 3D structure of RET tyrosine kinase (PDB ID: 2IVU; resolution of 2.5 Å) was retrieved from the Protein Data Bank (PDB) (www.rcsb.org/pdb, accessed on 27 August 2021). RET protein was prepared using the Protein Preparation Wizard of the Schrödinger suite [18]. This process involves eliminating water molecules and impurities, adding hydrogen atoms, and assigning ionization states to the protein. The optimization and minimization of 2IVU were performed using the Optimized Potentials for Liquid Simulations 2005 (OPLS\_2005) force field, to increase the protein's binding efficiency during docking analysis.

Table S1 (see Supplementary Materials) lists the existing RET inhibitors retrieved from the literature. They were utilized for pharmacophore hypothesis generation [19–21]. In addition, the structure-data files (SDF) of molecules in different subsets of the DrugBank repository, containing a total of 11,808 compounds, were extracted for standalone library generation and the virtual screening process. The existing inhibitors and the generated library were refined by adding hydrogen atoms, generating stereoisomers, and identifying the significant ionization states using the LigPrep module of Schrödinger. Finally, the OPLS\_2005 force field was used to optimize the ligand structures considered in our study [22].

#### *2.2. Hypothesis Generation and Molecular Docking*

The screening hypotheses were generated using three different approaches, based on ligands, protein structure, and energetics of protein–ligand interactions, with the aid of the Phase module of Schrödinger (version 5.3). Initially, the reference ligands were divided into actives and inactives based on their IC<sub>50</sub> values (Table S1, see Supplementary Materials). Compounds with IC<sub>50</sub> values higher than 5.0 µM were classified as inactive molecules. Consequently, the ligand-based pharmacophoric hypothesis was generated based on the common features of the active ligands using a tree-based partitioning algorithm [23]. Each common pharmacophore hypothesis (CPH) was scored rigorously based on the alignment score, volume score, and vector score of the active ligands. The best CPH with a high survival score was chosen for the virtual screening analysis. In the e-pharmacophore strategy, the CPH was generated by docking the reference ligand pralsetinib and by mapping the energetic scores onto the atoms [24]. Similarly, the receptor cavity-based CPH was developed based on the potential binding site of the RET protein using the SiteMap module of Schrödinger. Altogether, the chosen CPHs contained four basic pharmacophoric features, namely a hydrophobic group (H), aromatic ring (R), hydrogen bond acceptor (A), and donor (D) [25]. Finally, each of the above-generated high-precision CPHs was used independently to screen the subsets of the DrugBank database. The resultant set in each screening was subjected to three hierarchical docking strategies, namely high-throughput virtual screening (HTVS), standard precision (SP), and extra precision (XP), which were implemented using the Glide module to distinguish binders from nonbinders [26]. It is worth noting that pralsetinib was used as the reference inhibitor throughout the investigation. Finally, the interaction pattern and the essential pharmacokinetic parameters such as stars, central nervous system response (CNS), and human oral absorption (HOA) were analyzed using the QikProp module of Schrödinger.

#### *2.3. Machine Learning-Based Standalone Rescoring Function*

A Random Forest score (RF score)-based rescoring function was implemented to estimate the binding affinity between the ligand and RET for virtual screening, using the Open Drug Discovery Toolkit implementation available at https://github.com/oddt/rfscorevs (accessed on 27 August 2021). This scoring function is built using an RF algorithm with descriptors generated from the distances between the atoms of the protein and the ligand that lie within 12 Å [27]. Compounds with an RF score greater than that of pralsetinib were considered for further evaluation.
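The distance-based descriptors mentioned above can be illustrated with a minimal sketch. It assumes descriptors of the RF-Score v1 kind, that is, counts of protein–ligand element pairs closer than the 12 Å cutoff; the element lists, the atom representation, and the toy coordinates are hypothetical and are not taken from the rfscorevs code.

```python
from itertools import product
from math import dist

# Assumed element sets for illustration; the real descriptor set is larger.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "S")


def contact_counts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand element pairs closer than `cutoff` (angstroms).

    Each atom is an (element, (x, y, z)) tuple.
    """
    counts = {pair: 0 for pair in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)}
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            key = (p_elem, l_elem)
            if key in counts and dist(p_xyz, l_xyz) < cutoff:
                counts[key] += 1
    return counts


# Toy coordinates, invented for illustration only.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (30.0, 0.0, 0.0))]
ligand = [("C", (1.0, 1.0, 0.0)), ("O", (4.0, 0.0, 0.0))]
features = contact_counts(protein, ligand)
```

A feature vector of this kind would then be passed to a trained random forest regressor; the trained model itself is outside the scope of this sketch.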

#### *2.4. Chemical Similarity Calculations*

The Tanimoto coefficient (T<sub>c</sub>) was calculated based on the MACCS fingerprint to evaluate the structural similarity of all the compounds to the reference molecule. A higher value of T<sub>c</sub> indicates higher structural similarity of a compound with the reference molecule. Hence, a cut-off T<sub>c</sub> value of >0.4 was used in this analysis to identify the fraction of compounds that exhibit structural similarity to pralsetinib [28]. In the present study, the RDKit Python library was used to generate the MACCS fingerprints and to calculate the T<sub>c</sub> of the compounds.
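For readers unfamiliar with the measure, the Tanimoto coefficient of two binary fingerprints is the number of shared on-bits divided by the number of on-bits present in either fingerprint. The sketch below is a library-free illustration that assumes each fingerprint is represented as a set of on-bit indices; the bit sets and compound names are made up and do not correspond to real MACCS keys of pralsetinib or any DrugBank entry.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


T_C_CUTOFF = 0.4  # threshold quoted in the text

# Hypothetical on-bit indices, for illustration only.
reference = {3, 17, 42, 88, 120, 161}
candidates = {
    "cand_1": {3, 17, 42, 88, 99},
    "cand_2": {5, 120},
}

similar = {
    name for name, bits in candidates.items()
    if tanimoto(reference, bits) > T_C_CUTOFF
}
```

Here `cand_1` shares four of seven distinct on-bits with the reference and passes the cutoff, whereas `cand_2` shares only one of seven and does not.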

#### *2.5. Binding Free Energy and DFT Calculations*

The Prime module of the Schrödinger suite was used to determine the binding free energy of RET protein–ligand complexes. Binding free energies calculated using the MM-GBSA method have generally been found to correlate with experimental results. The pose viewer file of the protein–ligand complex generated during Glide XP docking was used as a query for binding free-energy calculations. Further, the Prime module utilizes the VSGB 2.0 solvation model to optimize hydrogen bonds, hydrophobic interactions, π–π interactions, and self-contact interactions [29]. Energy terms such as electronic interactions, van der Waals interactions, entropy terms, and polar and nonpolar contributions were considered for the binding free-energy calculations in the Prime package of Schrödinger.
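For orientation, end-point methods such as MM-GBSA are commonly written in the following general form. This is the textbook expression rather than one quoted from the Prime documentation, and the grouping of terms is an assumption made for illustration:

```latex
\Delta G_{\text{bind}} = G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right),
\qquad
G = E_{\text{MM}} + G_{\text{solv}}^{\text{polar}} + G_{\text{solv}}^{\text{nonpolar}}
```

Under this convention, a more negative $\Delta G_{\text{bind}}$ corresponds to more favorable predicted binding.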

Density functional theory (DFT) calculations were performed for the hit compounds obtained during the virtual screening process. Jaguar v8.7 was employed to characterize the nature of the interaction between the protein and ligand and to compute molecular electronic properties such as the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO). Frontier orbital gaps of the hit compounds were calculated to analyze their kinetic stability and chemical reactivity [30].

#### *2.6. Assessing the Stability and Binding Mode of 2IVU–Ligand Complex*

A molecular dynamics (MD) simulation of each RET–ligand complex was used in this study to assess the stability and conformational changes of the protein–ligand complex. GROMACS v5.1.2 (Virginia Tech Department of Biochemistry, Blacksburg, VA, United States) with the GROMOS96 43a1 force field was used for the simulation. The topology files and the parameters for the ligands were generated using the PRODRG server. A dodecahedron box with dimensions of 1 nm × 1 nm × 1 nm was configured using the editconf tool of GROMACS. Subsequently, the explicit Simple Point Charge water model was used for solvating the complex system in the dodecahedron box. After solvation, the system exhibited a total charge of +8. Hence, eight chloride counter ions were added to neutralize the system. The weak van der Waals contacts were removed using the steepest descent algorithm to minimize the energy of the complex. Electrostatic interactions were computed by applying the Particle-Mesh Ewald method. The LINCS algorithm was implemented for constraining the hydrogen bonds and for truncating the van der Waals interactions. Equilibration in the NVT (Number of particles, Volume, and Temperature) and NPT (Number of particles, Pressure, and Temperature) ensembles was carried out with position restraints. The complex system was heated using a Berendsen thermostat at 300 K with a relaxation time of 0.1 ps and a pressure of 1 bar. Prior to the MD simulation, a pre-run was performed with a 1000 kJ mol<sup>−1</sup> nm<sup>−2</sup> force constant as a positional restraint for 50 ps. Ultimately, the final MD runs for the apoprotein (without ligand) and the protein–ligand complexes were carried out for 200 ns [31]. Trajectories for the complex system were saved every 2 fs. Root Mean Square Deviation (RMSD), Root Mean Square Fluctuation (RMSF), H-bond linkages, free-energy landscape, and the salt bridge between the ligand and the protein were also evaluated using GROMACS utilities. In addition, the MM-PBSA strategy was implemented to calculate the empirical free energies between the RET receptor and the identified potential ligands with high precision [32].

#### *2.7. In Vitro Analysis*

The anticancer activities of the potential compounds together with pralsetinib were determined using the MTT assay [33]. The LC-2/ad cell line was purchased from the European Collection of Authenticated Cell Cultures (Catalogue number: ECACC 94072247, Merck KGaA, Darmstadt, Germany) and grown in high-glucose Dulbecco's Modified Eagle Medium (AL149, Himedia, Mumbai, India) for 24 h. The cell line contains the CCDC6-RET driver gene fusion and was isolated from the lung of a 51-year-old Japanese adenocarcinoma patient. This cell line is widely used to study intracellular signaling pathways, resistance mechanisms, and drug sensitivity against RET fusion in NSCLC samples. The chemical compounds pralsetinib and DB07194 were purchased from MolPort (Catalogue number: HY-112301, MolPort, Riga, Latvia) and Merck (Catalogue number: 574715-2mg, Merck KGaA, Darmstadt, Germany), respectively. Subsequently, the grown LC-2/ad cells were exposed to reference and hit compound concentrations ranging from 6.25 µM/mL to 100 µM/mL for four days at 37 °C in a 5% CO<sub>2</sub> atmosphere. The absorbance of the samples was read at 570 nm, with 630 nm as the reference wavelength to correct for nonspecific background values. The experiment was performed in triplicate, and the mean value of the assays was considered in our analysis. Finally, the IC<sub>50</sub> of each compound was determined using a linear regression equation and the viability graph. In addition, a statistical comparison of cell viability between control and drug candidates was carried out using one-way ANOVA. For all comparisons, a *p*-value of less than 0.05 was regarded as statistically significant.
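The IC<sub>50</sub> estimation from a linear regression can be sketched as follows. This is only an illustration under stated assumptions: viability (%) is regressed directly on concentration by ordinary least squares, and the IC<sub>50</sub> is the concentration at which the fitted line crosses 50% viability. The data points are synthetic and are not measurements from this study; many analyses regress on log-concentration instead.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ic50_from_line(slope, intercept, target=50.0):
    """Concentration at which the fitted line reaches `target` % viability."""
    return (target - intercept) / slope


# Synthetic dose-response values (same concentration series as in the text).
concentrations = [6.25, 12.5, 25.0, 50.0, 100.0]
viability = [90.3125, 85.625, 76.25, 57.5, 20.0]

slope, intercept = linear_fit(concentrations, viability)
ic50 = ic50_from_line(slope, intercept)
```

Because the synthetic points lie exactly on the line 95 − 0.75·c, the fitted line crosses 50% viability at a concentration of 60 in the units of the input series.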

#### **3. Results and Discussion**

#### *3.1. Pharmacophore Modeling and Virtual Screening*

A pharmacophore is a collection of chemical features and spatial properties required for the ligand to interact with a macromolecular target and elicit a biological response [34]. In the present investigation, about 193 ligand-based pharmacophore hypotheses were developed with the assistance of actives and inactives (Table S1, see Supplementary Materials) using the Phase module of Schrödinger (v5.3). Depending on the survival score, a five-feature CPH containing one hydrogen bond acceptor (A), one hydrogen bond donor (D), one hydrophobic group (H), and two aromatic rings (R) was selected. Likewise, two other hypotheses, DHRRR and ADDHR, were generated from the e-pharmacophore and receptor cavity-based strategies, respectively. A total of 3673, 1198, and 4595 compounds were obtained after Phase screening using the pharmacophore, e-pharmacophore, and receptor cavity-based hypotheses, respectively. The screened compounds were subjected to three tiers of docking, namely HTVS, SP, and XP, using pralsetinib (−7.79 kcal/mol) as a reference compound. In each stage, the top-scoring 50% of leads were passed on to further analysis. This process yielded a total of 887 compounds (pharmacophore: 208; e-pharmacophore: 103; receptor cavity: 576) possessing better binding capability than the reference compound, which were carried forward for further analysis.

#### *3.2. Rescoring Methodologies*

Random Forest scoring is a machine-learning approach implemented extensively in virtual screening to forecast binding affinity on a varied range of targets, using descriptors based on RF Score versions v1–3. Despite being less precise on physicochemical properties, the RF scoring function typically outperformed conventional scoring systems in estimating binding affinity [35]. Hence, in the current investigation, rescoring was conducted using a random forest approach for all the hit molecules obtained in the screening process. The results show that 500 out of 887 compounds had a higher RF score than pralsetinib. Further, T<sub>c</sub> was calculated between the reference ligand and the hit molecules to measure the structural similarity [36]. The results indicate that 406 molecules were highly similar to pralsetinib, with a T<sub>c</sub> value greater than the 0.4 threshold. RF score and T<sub>c</sub> values of compounds obtained from the pharmacophore, e-pharmacophore, and receptor cavity-based strategies are tabulated in Tables S2–S4, respectively (see Supplementary Materials). On comparing the RF score and T<sub>c</sub> results of all the hit molecules, 78, 39, and 59 compounds from the pharmacophore, e-pharmacophore, and receptor cavity-based strategies, respectively, were found to possess both better similarities and better RF scores. The results from all three hypotheses were then integrated to eliminate false positive predictions. Notably, only 18 lead molecules were found to be in common among all three approaches with high similarities and RF scores. The combined results for the 18 lead molecules and their scores are tabulated in Table 1.
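The integration step described above amounts to filtering each strategy's hit list on both criteria and then taking the set intersection. The following sketch shows that logic under stated assumptions: the record layout (`id`, `rf_score`, `tc`), the compound identifiers, the scores, and the reference RF score are all hypothetical and do not reproduce the values in Tables S2–S4.

```python
def passes_rescoring(hit, ref_rf_score, tc_cutoff=0.4):
    """Keep a hit only if it beats the reference RF score and the Tc cutoff."""
    return hit["rf_score"] > ref_rf_score and hit["tc"] > tc_cutoff


def consensus(hit_lists, ref_rf_score):
    """IDs that survive rescoring in every screening strategy."""
    surviving = [
        {hit["id"] for hit in hits if passes_rescoring(hit, ref_rf_score)}
        for hits in hit_lists
    ]
    return set.intersection(*surviving)


# Hypothetical records, for illustration only.
A = {"id": "cmpd_A", "rf_score": 6.5, "tc": 0.50}
B = {"id": "cmpd_B", "rf_score": 6.2, "tc": 0.45}
C = {"id": "cmpd_C", "rf_score": 5.0, "tc": 0.60}  # fails the RF score filter
D = {"id": "cmpd_D", "rf_score": 6.8, "tc": 0.30}  # fails the Tc filter

pharmacophore_hits = [A, B, C]
e_pharmacophore_hits = [A, B, D]
cavity_hits = [A, C, D]

REF_RF_SCORE = 6.0  # assumed placeholder for the reference compound's score
leads = consensus(
    [pharmacophore_hits, e_pharmacophore_hits, cavity_hits], REF_RF_SCORE
)
```

In this toy example only `cmpd_A` passes both filters in all three lists, which mirrors how requiring agreement across strategies reduces the candidate set.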


**Table 1.** Docking and rescoring evaluation of lead molecules using different strategies.

#### *3.3. Postdocking MM-GBSA Analysis*

The binding free energies of the complexes were determined to validate the binding ability of the ligands to the target protein. The binding free energy of each complex is summarized in Table 2. It can be observed that the ∆G<sub>bind</sub> of the complexes ranged from −69.235 kcal/mol to −39.610 kcal/mol. Note that only eight compounds resulted in a binding free-energy value above −55 kcal/mol. The van der Waals energy for all the compounds was observed to be highly favorable to the overall binding energy. The Coulomb energy provided the second-highest contribution to the interaction in all the compounds; however, the high solvation energy compensated for the Coulomb contribution in ∆G<sub>bind</sub>. The contribution of covalent energy is unfavorable or negligible to the binding of the compounds DB08583, DB07606, and DB04751. Additionally, ligand strain energy reflects the deformation of ligands during the interaction and is considered one of the most important parameters in MM/GBSA analysis [37,38]. It is clear from the table that almost all the predicted compounds undergo less deformation than pralsetinib during interaction with the target protein, except DB08583, DB07606, and DB04751. Although the compounds DB08583, DB07606, and DB04751 exhibited better binding free energy, higher ligand strain energy decreases the binding efficacy of compounds with target receptors. Overall, DB07194, DB03496, DB11982, DB12672, and DB12848 showed more satisfactory Coulombic potential and ligand strain energy than the other compounds, facilitating tight binding to the RET protein. Of note, these compounds exhibited minimal covalent energy contributions towards ∆G<sub>bind</sub>, a key factor for forming a thermostable complex with RET protein.


**Table 2.** The predicted binding free energy of RET-complex structures calculated using MM-GBSA approach.

#### *3.4. HOMO–LUMO Theory Analysis*

All five compounds with high binding free energy and lower ligand strain energy were optimized using B3LYP-D3 theory and the LACVP++ basis set (Schrödinger, Bangalore, India). Since the reactivity of a compound is directly related to the energy gap, the HOMO and LUMO parameters are significant [39]. A molecule with a minimal energy gap between frontier orbitals usually shows substantial chemical reactivity and low kinetic stability, which indicates a high propensity to react [40]. The energy gap between HOMO and LUMO is shown in Figure 1. It is observed that DB07194, DB03496, and DB11982 exhibited an energy gap lower than or equivalent to that of pralsetinib, unlike DB12672 and DB12848. These results suggest that compounds such as DB07194, DB03496, and DB11982 may exhibit better biological activities than pralsetinib.

#### *3.5. Interaction Pattern and Pharmacokinetic Analysis*

The interaction pattern of hit compounds in the binding pocket of the receptor is represented in Figure S1 (see Supplementary Materials). On analyzing the binding pattern of pralsetinib, two hydrogen bonds were found between the cyclohexane carboxamide group and the ALA807 residue of RET, and one additional hydrogen bond was observed between the pyridine ring of pralsetinib and the SER811 residue of the protein. The ligand interaction diagram of DB07194 clearly shows the formation of two hydrogen bonds between the amino pyrimidine group of DB07194 and the residues ASN879 and ASP892 of the RET protein. Likewise, the N-methylpiperidinyl and flavone groups of DB03496 displayed hydrogen bonds with the ARG878 and ALA807 residues of the receptor. In addition, a salt bridge was formed between the tertiary amino group of N-methylpiperidinyl and the residue ASP892 in the DB03496–RET complex. In the case of DB11982, a hydrogen bond between the pyridine carboxamide and ARG878 of the RET protein was observed. It is interesting to note that the anticancer property of the functional groups of the hit compounds involved in the interaction with the RET protein has been reported recently [41–43]. Interactions with the key residues ASN879 and ASP892 of the hydrophobic pocket in RET have also been observed for the other approved drugs crizotinib and sorafenib, respectively. The interaction pattern of these drugs is given in Figure S2 (see Supplementary Materials).

**Figure 1.** Graphical representation of HOMO–LUMO energy gap calculation for reference (**a**) Pralsetinib and hit compounds (**b**) DB07194, (**c**) DB03496, (**d**) DB11982, (**e**) DB04751, (**f**) DB12672 and (**g**) DB12848.

Furthermore, the essential pharmacokinetic parameters were analyzed to prevent the elimination of the compounds during clinical trials in the future. Table S5 (see Supplementary Materials) characterizes the interaction patterns and pharmacokinetic features of the lead compounds. The hit compounds displayed satisfactory pharmacokinetic and pharmacodynamic properties. Of note, key properties such as solubility, blood–brain barrier, stars, human oral absorption (HOA), and CNS activity were found to be in the acceptable range. Stars denote the number of pharmacokinetic features that lie outside the required range. Interestingly, none of the hit compounds were found to have outliers based on the star values. Moreover, the capability of the hit molecules to stimulate a central nervous system response was comparable to that of pralsetinib (CNS = −2). Undeniably, the HOA of all the predicted molecules was higher than pralsetinib (HOA = 2), which shows the efficacy of a drug that can be attained easily through oral administration in humans.

#### *3.6. Protein–Ligand Complex Stability Analysis*

The stability and dynamic characteristics of the protein–lead inhibitor complexes were investigated using MD simulations, which provide precise insights into protein–ligand interactions and allow visualization of the influence of ligand binding on the protein and its contribution to the stable, bound conformation [32]. The RET protein complexed with three hit compounds alongside the reference complex was analyzed using 200 ns MD simulations. The extent of deviation of atoms in the protein–lead complex during the simulation is described using RMSD plots. It is interesting to note that the obtained results correlate well with our initial findings. The results are shown in Figure 2a–d. Figure 2 reveals that all the compounds showed an increasing RMSD within the first 0–30 ns of simulation time. A minimal deviation in the pattern was observed between 30 ns and 75 ns. Subsequently, all the compounds maintained a stable equilibrium of ~0.30 nm from 75 ns to the end of the simulation. Towards the end of the simulation, RMSD values of 0.345 nm, 0.323 nm, and 0.371 nm were observed for DB07194, DB03496, and DB11982, respectively, smaller than those of pralsetinib (0.385 nm) and the apoprotein (0.414 nm). In all cases, the RMSD of the apoprotein was higher than that of the ligand-bound structures investigated in our analysis. This suggests that the hit compounds could adopt a more stable conformation than pralsetinib in the binding pocket of the RET protein. Moreover, the overall deviation of hit molecules was less than ~5 nm, depicting the stability of the RET protein in the presence of lead molecules. Thus, we hypothesize that the predicted DB07194 compound could have a higher inhibitory potential against RET protein than pralsetinib.

Guterres and Im showed that active compounds have lower RMSD than inactive compounds in 100 ns MD simulations [44]. From the DUD-E set, they randomly selected 56 targets. For each target, 10 compounds (five actives and five decoys) were selected. They observed that the active compounds have a unimodal RMSD distribution centered at 4 Å, whereas the decoys have a right-skewed distribution, showing that many of them leave the binding pocket during the simulation. As mentioned, our molecules, including pralsetinib, have an RMSD of ~0.3 nm (~3 Å), which is consistent with the work of Guterres and Im. This implies that the three compounds could act as active compounds.
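Because the RMSD values in this section are given in nm while the benchmark of Guterres and Im is quoted in Å, it may help to see the quantity and the unit conversion side by side. The sketch below is a simplified illustration in plain Python: it assumes the two coordinate sets are already superimposed (production tools normally perform a least-squares fit first), and the function and constant names are illustrative choices rather than anything taken from the study.

```python
import math

NM_TO_ANGSTROM = 10.0  # 1 nm = 10 Angstrom


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists.

    Each argument is a list of (x, y, z) tuples in the same unit.
    No superposition is performed here; the inputs are assumed pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom displaced by 0.3 nm along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3)]
rmsd_nm = rmsd(reference, displaced)
rmsd_angstrom = rmsd_nm * NM_TO_ANGSTROM
```

With this conversion, an RMSD of about 0.3 nm corresponds to about 3 Å, which is the scale on which the comparison with the 4 Å center of the active-compound distribution is made.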

#### *3.7. Residue Mobility Analysis (RMSF)*

RMSF depicts the flexibility of protein residues within the protein–ligand complex. As demonstrated in Figure 3, a similar pattern of backbone fluctuation was observed among all four systems. The region Val871–Asp898 exhibited the least fluctuation, less than ~0.05 nm, indicating the contribution of these residues to stable binding of the predicted inhibitors to the RET receptor. Notably, important residues such as Asn879 and Asp892, which lie within the conserved interaction region, showed fluctuations of ~0.04 nm. The high stability of the protein–ligand complexes is attributed to the formation of hydrogen bonds between these residues and the inhibitors. The other regions, Met700–Lys722 and Pro957–Arg982, showed high flexibility of ~0.1 nm, suggesting that these residues contributed less to the RET–ligand interaction. These results correlate well with the ligand interaction pattern discussed earlier. Moreover, a lower RMSF value depicts a well-organized region, whereas a high RMSF value indicates loosely structured terminal ends of the complex [33]. In the present study, the apoprotein exhibited an RMSF value of 0.0696 nm, whereas the complexes RET–pralsetinib, RET–DB07194, RET–DB03496, and RET–DB11982 showed RMSF values of 0.069, 0.0665, 0.0773, and 0.033 nm, respectively. The RET–DB07194 and RET–DB11982 complexes showed decreased RMSF values in comparison with the apoprotein and the RET–pralsetinib complex. This indicates that the binding of lead molecules resulted in decreased flexibility of the catalytic residues. Hence, the identified lead compounds were very well positioned in the binding pocket of RET protein compared with other compounds considered in our analysis.
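As a reminder of what the RMSF measures, the sketch below computes it per atom as the root-mean-square displacement from that atom's time-averaged position. It is a plain-Python illustration on a toy trajectory, assuming the frames are already fitted to a common reference; the data layout and function name are hypothetical and do not reproduce the tool used in the study.

```python
import math


def rmsf(frames):
    """Per-atom root-mean-square fluctuation over a trajectory.

    `frames` is a list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, with the same atom order in every frame.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    fluctuations = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


# Toy trajectory (nm): atom 0 is rigid, atom 1 oscillates by +/- 0.1 nm in x.
toy_frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_frames)
```

Averaging such per-atom or per-residue values over a region gives single summary numbers of the kind compared between the apoprotein and the complexes.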

#### *3.8. Hydrogen Bond Analysis*

The stability of a protein–ligand complex is usually analyzed based on different types of transient interactions, including electrostatic interactions, Van der Waals interactions, hydrogen bonds, and many others [45]. Among them, the hydrogen bond is regarded as an important transient interaction facilitating the binding of ligands to the protein. The number of hydrogen bonds in the complex structures was calculated from the MD trajectory. The RET–DB07194, RET–DB03496, and RET–DB11982 complexes showed 0–8, 0–4, and 0–6 H-bonds, respectively (Figure 4). These observations demonstrated that the predicted hits showed a higher number of H-bonds than the reference drug during simulation. From the results of the H-bond analysis, it can be concluded that the predicted compounds form a more stable interaction with the RET protein than pralsetinib.
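Counting hydrogen bonds along a trajectory relies on a geometric criterion applied frame by frame. The following is a minimal sketch of one common form of that test: a donor–acceptor distance cutoff of 0.35 nm (the distance mentioned in the Figure 4 caption) combined with a hydrogen–donor–acceptor angle cutoff. The 30° angle threshold is an assumption chosen here because it is a widely used default, not a setting confirmed by the text, and the function name is illustrative.

```python
import math


def is_hbond(donor, hydrogen, acceptor, max_dist_nm=0.35, max_angle_deg=30.0):
    """Geometric hydrogen-bond test on three (x, y, z) positions in nm.

    A bond is counted when the donor-acceptor distance is within
    `max_dist_nm` and the hydrogen-donor-acceptor angle is within
    `max_angle_deg`. Both cutoffs are adjustable assumptions.
    """
    donor_to_acceptor = [a - d for a, d in zip(acceptor, donor)]
    donor_to_hydrogen = [h - d for h, d in zip(hydrogen, donor)]
    dist = math.sqrt(sum(c * c for c in donor_to_acceptor))
    dh_len = math.sqrt(sum(c * c for c in donor_to_hydrogen))
    if dist == 0.0 or dh_len == 0.0 or dist > max_dist_nm:
        return False
    cos_angle = sum(x * y for x, y in zip(donor_to_acceptor, donor_to_hydrogen))
    cos_angle /= dist * dh_len
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


def count_hbonds(triplets):
    """Number of (donor, hydrogen, acceptor) triplets that satisfy the test."""
    return sum(1 for d, h, a in triplets if is_hbond(d, h, a))
```

Applying `count_hbonds` to every frame yields a time series whose minimum and maximum correspond to ranges such as 0–8 reported per complex.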

**Figure 2.** Time evolution of RMSD values for the apoprotein and protein–ligand complexes: (**a**) Pralsetinib, (**b**) DB07194, (**c**) DB03496 and (**d**) DB11982.


**Figure 3.** RMSF values for apoprotein and the protein–ligand complexes system during MD simulations.

#### *3.9. Free Energy Landscape (FEL)*

An inbuilt GROMACS tool, gmx\_sham, was further employed to investigate the conformational stability of the protein–ligand complexes. The exchange of heat in a closed protein–ligand complex system is measured in Gibbs free energy [46]. This analysis provides information on energy-minimum conformations and molecular fluctuation. Initially, the covariance matrix was constructed using the gmx\_covar tool of GROMACS. Subsequently, the eigenvectors and their eigenvalues were obtained by diagonalizing the constructed matrix. Finally, the projections onto the first two principal components (PC1 and PC2), i.e., the eigenvectors with the largest eigenvalues, were obtained using the gmx\_anaeig tool [47]. Figure 5 was plotted using the obtained PC1 and PC2, demonstrating the free energy landscape of the complexes. A dark blue color corresponds to the energetically stable, energy-minimum conformation of the complex, whereas a yellow color indicates an unfavorable conformation.

A deep energy basin observed during the MD simulation indicates high stability of the complex system, while a shallow basin denotes lower stability. The RET–pralsetinib complex contained two connected energy minima and one distinct energy minimum. In the case of RET–DB03496 and RET–DB11982, one deep energy basin and one shallow energy basin were observed, whereas, in the case of RET–DB07194, three deep energy basins were observed. Moreover, the Gibbs free energies of the two compounds DB07194 and DB03496 were 14.8 kJ/mol and 14.4 kJ/mol, respectively, similar to that of pralsetinib (14.8 kJ/mol). Nevertheless, the Gibbs free energy of DB11982 (16.2 kJ/mol) was higher than that of the other two complexes. From Figure 5, it is evident that the energy basins were broad, clear, and distinct in all three compounds, and exhibited lower Gibbs free energy, which shows the stable conformation of all three protein–ligand complexes.
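The landscape in Figure 5 follows from a Boltzmann inversion of how often the trajectory visits each region of the PC1–PC2 plane. The sketch below illustrates that relation, G = −k<sub>B</sub>T ln(P/P<sub>max</sub>), with a plain-Python two-dimensional histogram. It is not the GROMACS implementation; the bin width, the temperature of 300 K, and the function name are assumptions made for the example.

```python
import math
from collections import Counter

# Molar gas constant in kJ/(mol K), i.e. Boltzmann constant per mole.
KB_KJ_PER_MOL_K = 0.0083144626


def free_energy_landscape(pc1, pc2, bin_width=1.0, temperature=300.0):
    """Map each occupied (PC1, PC2) bin to a relative free energy in kJ/mol.

    The most populated bin is assigned 0 kJ/mol; rarer bins get higher values.
    `bin_width` and `temperature` are illustrative assumptions.
    """
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    max_count = max(counts.values())
    kt = KB_KJ_PER_MOL_K * temperature
    return {
        bin_index: -kt * math.log(count / max_count)
        for bin_index, count in counts.items()
    }


# Toy projections: three frames fall in one bin, one frame in another.
toy_pc1 = [0.1, 0.2, 0.3, 1.5]
toy_pc2 = [0.1, 0.2, 0.3, 1.5]
toy_landscape = free_energy_landscape(toy_pc1, toy_pc2)
```

In this picture a deep basin is a bin (or cluster of bins) visited far more often than its surroundings, and the largest value on the map is set by the least-visited occupied bin.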

**Figure 4.** Comparative H-Bond analysis of pairs within 0.35 nm of the complex structures from MD simulation: (**a**) Pralsetinib and DB07194, (**b**) Pralsetinib and DB03496, and (**c**) Pralsetinib and DB11982.


**Figure 5.** Contour plot demonstrating free energy landscapes of (**a**) RET–Pralsetinib, (**b**) RET–DB07194, (**c**) RET–DB03496 and (**d**) RET–DB11982 during 200 ns MD simulation.

#### *3.10. MM-PBSA*

The binding free energies of the three hit compounds and the reference molecule were calculated using trajectories extracted from the last 10 ns of the simulation. The binding energies for RET–pralsetinib (−9.445 ± 65.091 kJ/mol), RET–DB07194 (−111.920 ± 17.179 kJ/mol), RET–DB03496 (−74.514 ± 77.458 kJ/mol) and RET–DB11982 (−37.949 ± 42.465 kJ/mol) are listed in Table 3. RET–DB07194 exhibited a stable conformation with the lowest (most favorable) binding energy among all the compounds screened in our study. The total binding energy is composed of Van der Waals energy, electrostatic energy, polar solvation energy, and solvent-accessible surface area (SASA) energy. Among them, Van der Waals energy made the highest contribution to the overall binding energy, followed by polar solvation energy, SASA energy, and electrostatic energy. It is to be noted that the estimated pattern of binding free energies was similar to that of the MM-GBSA strategy. The predicted binding energies correlated well with the RMSD and hydrogen-bond analyses.
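The MM-PBSA totals here are in kJ/mol, whereas the MM-GBSA values earlier in the paper are in kcal/mol, so a quick unit conversion helps when reading the two side by side. The sketch below uses only the four mean values and standard deviations quoted in the paragraph above; the dictionary layout and helper names are illustrative, and the conversion does not make the two methods numerically comparable beyond their ranking.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie

# (mean, standard deviation) in kJ/mol, as quoted in the text above.
reported_kj_per_mol = {
    "RET-pralsetinib": (-9.445, 65.091),
    "RET-DB07194": (-111.920, 17.179),
    "RET-DB03496": (-74.514, 77.458),
    "RET-DB11982": (-37.949, 42.465),
}


def to_kcal_per_mol(value_kj: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return value_kj / KJ_PER_KCAL


def rank_most_favorable(energies: dict) -> list:
    """Order complexes from most negative (most favorable) mean energy."""
    return sorted(energies, key=lambda name: energies[name][0])


ranking = rank_most_favorable(reported_kj_per_mol)
converted = {
    name: (to_kcal_per_mol(mean), to_kcal_per_mol(sd))
    for name, (mean, sd) in reported_kj_per_mol.items()
}
```

Keeping the standard deviations next to the means is deliberate: several of them are comparable in size to the means themselves, which is worth bearing in mind when interpreting the ranking.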


**Table 3.** Total binding energy of the lead molecules against RET protein obtained from MM-PBSA analysis.

#### *3.11. In Silico Evaluation of Lead Compounds against Point Mutant RET Receptor*

As reported by Solomon et al., point mutations at different locations of RET result in the development of acquired resistance against the existing inhibitors. Specifically, resistance due to solvent front mutations prevents the inhibitors from accessing the binding pocket of the protein [9,10]. Hence, we evaluated the binding capability of the lead compounds against mutant RET receptors using docking studies and MM-GBSA analysis. The results of docking and MM-GBSA analysis are tabulated in Table S6 (see Supplementary Materials). Eleven point-mutated RET protein structures, comprising 4 mutations at the gatekeeper region, 4 mutations at the solvent front region, and 3 mutations at other regions, were generated using the homology modeling suite of Schrödinger. The docking analysis of the three lead compounds against the RET mutants revealed that DB07194 overcame the G810C and G810V solvent front mutations with higher (more favorable) binding free energy than pralsetinib and the other two hit molecules. On the other hand, the compound DB03496 exhibited significant inhibitory activity against the G810R solvent front mutation. In addition, both DB07194 and DB03496 inhibited the M918T mutant with high binding free energy, at −74.11 kcal/mol and −87.16 kcal/mol, respectively.

In some cases, including the V804M mutational study, all three lead compounds exhibited a high docking score. In contrast, the binding free energy of the lead compounds was lower than that of pralsetinib, preventing them from overcoming resistance. Unfortunately, DB11982 did not overcome the acquired resistance in any of the RET mutant structures investigated in our study. On analyzing the interaction pattern of DB07194, three hydrogen bonds formed between the amino pyrimidine group and ARG874, ARG878, and ASN879 helped the compound overcome the acquired resistance caused by the G810C mutation in RET. Interestingly, a similar pattern of interaction was observed against the G810V mutation. In the case of the M918T mutation, three hydrogen bonds were formed between DB07194 and the SER811 and ALA807 residues of the RET protein. Overall, DB07194 showed higher inhibitory activity against RET mutants, including G810C, G810V, and M918T, than DB03496 and DB11982. It is to be noted that pralsetinib has a comparatively lower potential than DB07194 to overcome the solvent front mutations, which might be due to the absence of an amino pyrimidine group in its structure.
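The mutant labels used above (for example G810C or V804M) follow the usual convention of wild-type residue, sequence position, and replacement residue. The short sketch below only illustrates that notation on a made-up three-residue fragment; it is not how the mutant structures in the study were built (those came from Schrödinger's homology modeling suite), and the function names and the toy sequence are hypothetical.

```python
import re

# One-letter wild-type residue, position, one-letter mutant residue.
_MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str):
    """Split a label such as 'G810C' into ('G', 810, 'C')."""
    match = _MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_mutation(sequence: str, label: str, first_residue_number: int = 1) -> str:
    """Return `sequence` with the point mutation applied.

    `first_residue_number` is the sequence number of sequence[0], so that
    fragments can keep the numbering of the full-length protein.
    """
    wild_type, position, mutant = parse_mutation(label)
    index = position - first_residue_number
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Hypothetical fragment numbered 809-811 with a glycine at position 810.
toy_fragment = "AGA"
mutated_fragment = apply_mutation(toy_fragment, "G810C", first_residue_number=809)
```

The check against the wild-type residue guards against applying a label to a sequence with different numbering.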

#### *3.12. Cell Viability Analysis of DB07194 against LC-2/ad*

Finally, the inhibitory activity of pralsetinib and DB07194 was assessed against LC-2/ad cell lines using a colorimetric MTT assay. The compounds were examined and compared at five different concentrations: 6.25, 12.5, 25, 50, and 100 µM/mL. The experiment was performed in triplicate to reduce experimental error. Figure 6 and Table S7 (see Supplementary Materials) present the comparative cell viability upon treatment with pralsetinib and DB07194. A similar pattern of inhibition was observed for pralsetinib and DB07194 at concentrations of 6.25 and 12.5 µM/mL. Interestingly, a sudden rise in the inhibition of LC-2/ad cells by DB07194 was noted at 25 µM/mL, whereas only a smaller variation was observed with pralsetinib at the same concentration. The inhibitory action did not change much beyond 50 µM/mL, which indicates saturation of inhibition. Overall, LC-2/ad showed higher sensitivity to DB07194 (IC<sub>50</sub> = 12.48 µM) than to pralsetinib (IC<sub>50</sub> = 23.31 µM). Consistent with their anticancer activity, both pralsetinib and DB07194 decreased cell viability significantly more than the control. Moreover, the anticancer property of DB07194 reveals a pharmacological property distinct from the one for which the compound was tested in earlier experiments, namely as an SYK inhibitor [48,49]. Subsequently, one-way ANOVA was used to examine the significance of the difference in cell viability between the control and drug-treated samples. A *p*-value of less than 0.001 was observed between the control and drug-treated samples, which highlights the statistical significance of the experimental data obtained in our study. In addition, no literature evidence has been reported on the toxicity and side effects of the compound. Hence, the toxicity of the hit molecule was also assessed using the ProTox II server and compared against pralsetinib [50]. The predicted LD50 values of pralsetinib and DB07194 were found to be 800 mg/kg and 681 mg/kg, respectively, and thus both fall into the class four (slightly toxic) category of compounds. All these data indicate that the identified hit molecule, DB07194, which belongs to the experimental subset of the DrugBank database, displays favorable drug-like properties and potential for progression into clinical application. Thus, it could be considered for the treatment of RET-positive NSCLC, a use that contrasts with the purpose for which it was originally designed and synthesized.

**Figure 6.** In vitro evaluation of DB07194 inhibitory activity against LC-2/ad cell lines. Panel title: comparative % cell viability of pralsetinib-treated LC-2/ad cells (mean of *n* = 3).
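To make the step from MTT absorbance readings to an IC<sub>50</sub> estimate concrete, the sketch below shows one simple way to do it: convert absorbance to percent viability relative to the untreated control, then interpolate linearly on a log-concentration axis between the two doses that bracket 50% viability. The text does not state how the reported IC<sub>50</sub> values were fitted, so this interpolation is an assumption for illustration, and the viability numbers in the example are hypothetical placeholders rather than data from Table S7.

```python
import math


def percent_viability(abs_treated, abs_control, abs_blank=0.0):
    """Cell viability (%) from MTT absorbance, relative to untreated control."""
    return 100.0 * (abs_treated - abs_blank) / (abs_control - abs_blank)


def ic50_log_interpolation(concentrations, viabilities):
    """Estimate the concentration giving 50% viability.

    `concentrations` must be ascending and positive. The estimate is a linear
    interpolation in log10(concentration) between the two neighbouring doses
    whose viabilities bracket 50%. Returns None if 50% is never crossed.
    """
    points = list(zip(concentrations, viabilities))
    for (c_low, v_low), (c_high, v_high) in zip(points, points[1:]):
        if v_low >= 50.0 >= v_high and v_low != v_high:
            fraction = (v_low - 50.0) / (v_low - v_high)
            log_ic50 = math.log10(c_low) + fraction * (
                math.log10(c_high) - math.log10(c_low)
            )
            return 10 ** log_ic50
    return None


# Doses from the text; viabilities are HYPOTHETICAL, for illustration only.
doses = [6.25, 12.5, 25.0, 50.0, 100.0]
hypothetical_viability = [90.0, 70.0, 30.0, 20.0, 15.0]
estimate = ic50_log_interpolation(doses, hypothetical_viability)
```

A four-parameter logistic fit is the more common choice when enough dose points are available; the interpolation above is only the simplest estimate that uses the same inputs.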

#### **4. Limitations and Future Prospective**

Acquired drug resistance is the major restraint among RET inhibitors, resulting in reduced efficacy of drugs in NSCLC patients. Therefore, we examined the activity of the hit compound against 11 different RET mutations in this study. Although the identified hit can demonstrate potent activity against solvent front mutations (G810C, G810V, and M918T), experimental validation of the compound's activity using mutant cell lines is needed to confirm this finding. Toxicity studies of this compound, either by in vivo micronucleus assays or in vitro genotoxicity assays, are also interesting future directions. The in vitro activity of the compound identified with the LC-2/ad cell line in our study opens up a new avenue for biologists to explore the synergistic activity of the compound. Finally, the results of our study will facilitate hit-to-lead optimization to reach novel compounds with economic value in the near future.


#### **5. Conclusions**

The current research focuses on the identification of potential candidates against RET and its associated solvent front mutations using high-throughput drug discovery strategies. Different pharmacophore models were employed along with docking, Tanimoto coefficient calculations, rescoring with RF score, and MM-GBSA to deduce the structural characteristics and binding poses that govern the activity of inhibitors against the RET receptor. Comparative DFT analysis was carried out, and it was observed that the lead molecules exhibited a lower energy gap than pralsetinib, suggesting more inhibitory potential against the protein. Furthermore, the stability and flexibility of the complex systems were analyzed using molecular dynamics for 200 ns. The interaction surface of the protein, Val871–Asp898, was found to be conserved and contained a series of important residues that formed hydrogen bonds with the lead molecules. Moreover, the aminopyrimidine group in DB07194 facilitated inhibition of both native and mutant forms of RET with higher binding free energy than pralsetinib. Ultimately, the cell line studies supported the efficacy of the predicted RET inhibitor, showing a lower IC<sub>50</sub> against the RET-positive LC-2/ad cell line than the existing FDA-approved drug pralsetinib. J.L. Kutok's patent, WO2017223422A1, also mentions the chemical compound DB07194 as a potential third chemotherapeutic agent used in combination with phosphoinositide 3-kinase inhibitors for cancer treatment. Taken together, the results from our study provide a new gateway for developing DB07194 as a potent anticancer agent targeting the RET protein and overcoming the RET-associated solvent front mutations.

**Supplementary Materials:** The following contents are available at https://www.mdpi.com/article/10.3390/pharmaceutics13111775/s1. Table S1. List of existing RET inhibitors retrieved from literature. Table S2. Rescoring and structure similarity analysis of screened compounds obtained using pharmacophore based strategy. Table S3. Rescoring and structure similarity analysis of screened compounds obtained using e-pharmacophore based strategy. Table S4. Rescoring and structure similarity analysis of screened compounds obtained using receptor cavity based strategy. Table S5. Interaction and pharmacokinetic evaluation of screened RET inhibitors. Table S6. Mutational analysis of predicted compounds using docking and MM-GBSA studies. Table S7. Examination of cell viability at different drug concentrations (µM/mL) against LC-2/ad cell line. Figure S1. Binding mode analysis of reference compound (a) Pralsetinib, (b) DB07194, (c) DB03496 and (d) DB11982 with the target RET protein. Figure S2. Comparative interaction analysis of (a) crizotinib and (b) sorafenib with RET protein.

**Author Contributions:** P.R. performed the data collection, machine learning model generation, and validation. S.V. and W.-H.S. conceived this study and are responsible for the overall design, interpretation, manuscript preparation, and communication. All authors have read and agreed to the published version of the manuscript.

**Funding:** W.-H.S. acknowledges support from the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1075998, 2020R1A4A1016695).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors (P.R. and S.V.) thank the management of Vellore Institute of Technology for providing the necessary facility to carry out this research work.

**Conflicts of Interest:** We wish to confirm that there is no known conflict of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

#### **References**


## *Article* **Combining Human Genetics of Multiple Sclerosis with Oxidative Stress Phenotype for Drug Repositioning**

**Stefania Olla <sup>1,\*,†</sup>, Maristella Steri <sup>1,†</sup>, Alessia Formato <sup>2</sup>, Michael B. Whalen <sup>3</sup>, Silvia Corbisiero <sup>2</sup> and Cristina Agresti <sup>2,\*</sup>**


**Abstract:** In multiple sclerosis (MS), oxidative stress (OS) is implicated in the neurodegenerative processes that occur from the beginning of the disease. Unchecked OS initiates a vicious circle caused by its crosstalk with inflammation, leading to demyelination, axonal damage and neuronal loss. The failure of MS antioxidant therapies relying on the use of endogenous and natural compounds drives the application of novel approaches to assess target relevance to the disease prior to preclinical testing of new drug candidates. To identify drugs that can act as regulators of intracellular oxidative homeostasis, we applied an in silico approach that links genome-wide MS associations and molecular quantitative trait loci (QTLs) to proteins of the OS pathway. We found 10 drugs with both central nervous system and oral bioavailability, targeting five out of the 21 top-scoring hits, including arginine methyltransferase (CARM1), which was first linked to MS. In particular, the direction of brain expression QTLs for CARM1 and protein kinase MAPK1 enabled us to select BIIB021 and PEITC drugs with the required target modulation. Our study highlights OS-related molecules regulated by functional MS variants that could be targeted by existing drugs as a supplement to the approved disease-modifying treatments.

**Keywords:** GWAS; multiple sclerosis; oxidative stress; repurposing; ADME-Tox

#### **1. Introduction**

Multiple sclerosis (MS) is the most common chronic inflammatory and progressively disabling disease of the central nervous system (CNS), affecting young adults and leading to demyelination and neuronal degeneration [1]. It is found worldwide, with the highest prevalence (>100 cases per hundred thousand) in the populations of Western Europe, North America and Australasia, with considerably lower prevalence (<30 cases per hundred thousand) in populations that live nearer to the equator [2]. MS is likely the result of an interaction between genetic and environmental factors, but its etiology remains unknown. Although approved immunomodulatory therapies are effective in the early stages of the disease, they have little or no benefit in terms of preventing the transition to a more steadily progressive phase, characterized by accumulation of neuronal injury and loss. Thus, the search for agents that slow neurodegeneration and disability progression in MS is urgent.

Neuroinflammation is recognized as a key player in MS pathogenesis. It is present in all stages of the disease and involves adaptive and innate immune responses. Histopathological studies of MS indicate that demyelination and neurodegeneration are associated with the production of inflammatory molecules by both blood-derived immune cells recruited to the CNS and activated resident microglia [3]. Prolonged or chronic generation of cytokines, chemokines, reactive oxygen species (ROS) and reactive nitrogen species (RNS) creates a self-perpetuating loop that provokes CNS damage and is considered to play a key role in the onset and progression of the disease [4].

**Citation:** Olla, S.; Steri, M.; Formato, A.; Whalen, M.B.; Corbisiero, S.; Agresti, C. Combining Human Genetics of Multiple Sclerosis with Oxidative Stress Phenotype for Drug Repositioning. *Pharmaceutics* **2021**, *13*, 2064. https://doi.org/10.3390/pharmaceutics13122064

Academic Editors: Lucreția Udrescu, Mihai Udrescu, Ludovic Kurunczi and Paul Bogdan

Received: 14 October 2021 Accepted: 30 November 2021 Published: 2 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

ROS and RNS, including superoxide ions, hydrogen peroxide, nitric oxide and peroxynitrite, are generated by NADPH oxidase and nitric oxide synthase during normal cellular metabolism. However, these molecules are deleterious if overproduced because they can damage lipids, proteins and nucleic acids, eventually leading to cell death. Significant evidence indicates that the sustained inflammatory phase of MS creates an imbalance between ROS/RNS generation and the antioxidant defense systems, causing oxidative/nitrosative stress which has a role in CNS tissue damage [5]. Antioxidant defense is normally achieved with enzymes, such as superoxide dismutase, catalases and peroxiredoxins, as well as systems of antioxidant production, like the thioredoxin and glutathione systems. In addition, reactive species directly interact with critical signaling molecules, such as the transcription factors nuclear factor-erythroid 2-related factor 2 (Nrf2) and nuclear factor κB (NfkB) and mitogen activated kinases (MAPK) [6–8] which regulate antioxidant gene expression and cell survival. A recent gene expression study of MS brain areas adjacent to perivascular inflammatory cell infiltrates showed a significant induction of antioxidant genes in actively demyelinating and chronically active white matter lesions as part of a counter-regulatory response aimed at containing inflammation and limiting tissue damage [9]. Hence, the identification of drugs able to effectively support the maintenance of redox homeostasis represents a rational approach to limit MS-associated neurodegenerative processes.

Among current MS drugs, only dimethyl fumarate has been linked to the induction of antioxidant pathways, specifically through direct activation of Nrf2, a transcription factor with a crucial role in the regulation of the antioxidant defense response [10]. In addition, the clinical efficacy of natalizumab and fingolimod could in part be explained by their ability to increase antioxidant molecules and reduce oxidative stress (OS) biomarkers in MS patients [11,12], even though the mechanism responsible for these effects has not yet been established. Nevertheless, most complementary antioxidant therapies relying on endogenous and natural compounds have been investigated without succeeding in MS clinical evaluation [13]. A possible explanation for this failure is that the rationale behind the use of small molecules acting as scavengers was based on misconceptions linked to an incomplete understanding of antioxidant defense processes during disease development [14]. Hence, novel approaches should be used to assess the disease relevance of antioxidant targets prior to preclinical testing of new drug candidates.

It is now widely accepted that the selection of targets based on genetics significantly increases the success rates of clinical development programs [15,16]. The idea is to identify targets involved in disease processes that can be therapeutically modulated [17,18]. Over the past fifteen years, genome-wide association studies (GWAS), in increasingly larger sample sets, have succeeded in identifying more than 200 susceptibility loci for MS outside the major histocompatibility complex (MHC) [19]. In parallel, new functional genomic techniques assessing molecular quantitative trait loci (QTLs), such as chromatin interactions, protein level and gene expression regulation, have proven to be useful for the systematic identification of genes through which trait-association variants act, improving the clinical impact of GWAS [20]. Computational searches for existing drugs that modulate the molecular targets identified by genetic studies offer the advantage of repositioning, reducing the costs and timescales of drug development. In addition, in silico approaches are currently being used for the prediction of physicochemical properties, such as the blood–brain barrier (BBB) permeability and oral bioavailability of drugs, further reducing the risk of failure [21].

Here, we designed and applied an integrated approach that combines MS GWAS, molecular QTLs and in silico techniques of drug discovery, providing support for single drug candidates known to act as modulators of genes and/or gene products that are linked to OS pathways (Figure 1).

**Figure 1.** Schematic illustration of the in silico workflow. Multiple sclerosis (MS) genetic variants were collected from the Genome-Wide Association Studies (GWAS) Catalog and molecular Quantitative Trait Loci (QTLs) were exploited for each hit in the LinDA browser to identify gene targets. In parallel, all proteins from 22 oxidative stress-related pathways were retrieved from the Reactome database. The overlap of these data allowed for the identification of 85 common targets which were then prioritized through score assignment. Query of public drug databases for the 21 top targets enabled the selection of 35 drugs either already approved or in clinical trials that bind to six MS molecular targets. Absorption, Distribution, Metabolism, Excretion and Toxicity (ADME-Tox) selection highlighted 10 drugs with CNS localization and oral bioavailability for repurposing in MS.

#### **2. Materials and Methods**

#### *2.1. Data Collection*

MS GWAS summary statistics were extracted from the GWAS Catalog [22,23]. The selected genetic variants represent the most associated signal (top variant) in each genomic region (locus) given a significance threshold of *p*-value < 1 × 10<sup>−5</sup>. All variants have been annotated by their rsID in dbSNP154, when available, or by chromosome and genomic positions encoded in the Genome Assembly GRCh38/hg38. To assign the most reliable gene target to each associated variant, molecular QTLs were searched for each hit in a large manually curated QTL resource, the LinDA browser [24,25]. Data from protein QTLs (pQTLs), expression QTLs (eQTLs), splicing QTLs (sQTLs), polyadenylation QTLs (polyQTLs) and methylation QTLs (mQTLs) were collected. Because the genomic positions in the LinDA browser are encoded in the Genome Assembly GRCh37/hg19, genomic coordinates were converted from GRCh38/hg38 to GRCh37/hg19 using the LiftOver tool in the UCSC Genome Browser [26,27].

Top variants were searched for molecular QTLs, including all variants showing a linkage disequilibrium (LD) r<sup>2</sup> > 0.7 with the top variants (proxies) in the European population. LD was calculated using the --ld option in the plink v.1.9 software [28,29] on data from the 1000 Genomes Project reference panel [30].
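The proxy definition can be illustrated with a minimal sketch. It is not the plink implementation used in the study: it assumes phased haplotypes coded 0/1 for two biallelic variants and computes r<sup>2</sup> = D<sup>2</sup> / (p<sub>A</sub>(1 − p<sub>A</sub>) p<sub>B</sub>(1 − p<sub>B</sub>)). The function names, variant identifiers and toy haplotypes are hypothetical.

```python
from typing import Dict, List, Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic variants from phased haplotypes coded 0/1."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at variant A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at variant B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                         # a monomorphic variant carries no LD information
    d = p_ab - p_a * p_b                                   # coefficient of linkage disequilibrium D
    return d * d / denom


def find_proxies(top: str, panel: Dict[str, List[int]], threshold: float = 0.7) -> List[str]:
    """Variants in the panel whose r^2 with the top variant exceeds the threshold."""
    return sorted(
        vid for vid, hap in panel.items()
        if vid != top and ld_r2(panel[top], hap) > threshold
    )


# Toy reference panel (hypothetical identifiers, eight haplotypes).
panel = {
    "var_top": [1, 1, 1, 1, 0, 0, 0, 0],
    "var_a":   [1, 1, 1, 1, 0, 0, 0, 0],   # perfectly correlated with var_top
    "var_b":   [1, 1, 1, 0, 0, 0, 0, 0],   # r^2 = 0.6, below the 0.7 threshold
    "var_c":   [1, 0, 1, 0, 1, 0, 1, 0],   # independent of var_top
}
proxies = find_proxies("var_top", panel)   # ["var_a"]
```

In practice the haplotypes would come from the European samples of the reference panel, and a dedicated tool handles missing data and unphased genotypes, which this sketch ignores.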

The functional role of each tested variant was further evaluated by the Variant Effect Predictor (VEP) tool [31], and missense or more deleterious variants with a deleteriousness score (combined annotation dependent depletion, CADD-Phred) > 15 were prioritized [32].

Genes regulated at the RNA or protein level by a hit variant (or by a variant in strong LD with a hit variant) or tagged by a functionally relevant variant were flagged as "gene targets".

The direction of the effect of each disease risk variant on the target product was calculated to establish the direction of the gene target modulation by therapy. To this end, the disease risk alleles available from the GWAS Catalog were coupled with the molecular QTL alleles by applying the plink --ld option to the European ancestry genotypes encoded in the 1000 Genomes Project reference panel [30]. The direction of the effect of the disease risk allele on the molecular QTL was thus indicated as positive if the coupled molecular QTL allele showed a positive effect, and negative otherwise.
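A minimal sketch of this sign logic follows. It assumes a biallelic QTL variant whose effect size (beta) is reported for one effect allele, so that the effect of the other allele is the negated beta; the mapping from direction to therapeutic modulation follows the rule applied later in the Results (increased expression suggests inhibition, decreased expression suggests activation). All function and parameter names are hypothetical.

```python
def risk_allele_direction(coupled_qtl_allele: str,
                          qtl_effect_allele: str,
                          qtl_beta: float) -> str:
    """Direction of the disease risk allele's effect on the molecular trait.

    coupled_qtl_allele: QTL allele found on the same haplotype as the risk allele
                        (derived from LD in the reference panel).
    qtl_effect_allele:  allele for which qtl_beta is reported.
    """
    # Assumption: for a biallelic variant, the non-effect allele has effect -beta.
    beta = qtl_beta if coupled_qtl_allele == qtl_effect_allele else -qtl_beta
    return "positive" if beta > 0 else "negative"


def suggested_modulation(direction: str) -> str:
    """Risk allele raises the product -> inhibit; lowers it -> activate."""
    return "inhibit" if direction == "positive" else "activate"


# Hypothetical example: the risk allele travels with QTL allele "G",
# while the QTL study reports beta = +0.30 for allele "A".
direction = risk_allele_direction("G", "A", 0.30)   # "negative"
action = suggested_modulation(direction)            # "activate"
```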

In parallel, 22 pathways related to OS were identified by Reactome [33,34], and all proteins belonging to the pathways were extracted.

Genes and/or proteins obtained by the overlapping between the MS-related genes and the OS-related proteins were recorded as "targets".
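The overlap step amounts to a set intersection, sketched below. The sketch assumes that both lists have already been mapped to the same gene-symbol nomenclature, which the text does not detail; the function name and the placeholder symbols `GENE_A` and `PROTEIN_B` are hypothetical.

```python
from typing import Iterable, List


def shared_targets(ms_gene_targets: Iterable[str],
                   os_pathway_proteins: Iterable[str]) -> List[str]:
    """Symbols present in both the MS-related and the OS-related lists."""
    ms = {symbol.strip().upper() for symbol in ms_gene_targets}
    os_related = {symbol.strip().upper() for symbol in os_pathway_proteins}
    return sorted(ms & os_related)


# Toy input: GENE_A and PROTEIN_B are placeholders, not real study entries.
ms_list = ["KEAP1", "HDAC1", "CARM1", "GENE_A"]
os_list = ["keap1", "HDAC1", "CARM1", "PROTEIN_B"]
targets = shared_targets(ms_list, os_list)   # ["CARM1", "HDAC1", "KEAP1"]
```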

For each target, a prioritization score was defined by leveraging the gene-level information derived from GWAS and from LD. In particular, for each target, all top variants, together with their molecular QTL proxies pointing to the same gene, were collected. A score was attributed to the target for each of the following criteria met by at least one collected variant:

- Top hit significantly associated with MS (*p*-value < 5 × 10<sup>−8</sup>): score = 5; if lying in the MHC region (chr6:27–33 Mb in GRCh37): score = 2;
- Top hit having a high effect on the disease compared to all top hit effects (odds ratio, OR > 1.2), with a decreasing score depending on the LD with gene-level molecular QTLs (LD ≥ 0.99: score = 4; LD range (0.95–0.99): score = 3; LD range (0.90–0.95): score = 2; LD range (0.80–0.90): score = 1);
- eQTL available (score = 10; if the eQTL acts in the brain: additional score = 5);
- LD level between the top hit and the eQTL (LD ≥ 0.99: score = 5; LD range (0.95–0.99): score = 3; LD range (0.90–0.95): score = 2; LD range (0.80–0.90): score = 1);
- QTL (except eQTL) with LD ≥ 0.99 with top hit: score = 3.

An overall score was calculated as the sum of the partial scores and the top 25% targets were then prioritized.
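The scoring scheme can be sketched as follows. This is one possible reading of the criteria rather than the authors' code: a genome-wide significant hit scores 5 outside the MHC and 2 inside it, each criterion contributes the best value reached by any collected variant, and LD ranges are treated as half-open intervals. The `VariantEvidence` record and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VariantEvidence:
    p_value: float            # GWAS p-value of the top hit
    odds_ratio: float         # GWAS odds ratio of the top hit
    in_mhc: bool              # top hit lies in chr6:27-33 Mb (GRCh37)
    qtl_type: Optional[str]   # "eQTL", "pQTL", "sQTL", ... or None
    qtl_in_brain: bool        # QTL measured in brain tissue
    ld_r2: float              # LD between the top hit and the QTL variant


def _ld_tier(r2: float, best: int) -> int:
    """Score by LD band; `best` is the score for LD >= 0.99."""
    if r2 >= 0.99:
        return best
    if r2 >= 0.95:
        return 3
    if r2 >= 0.90:
        return 2
    if r2 >= 0.80:
        return 1
    return 0


def target_score(variants: Iterable[VariantEvidence]) -> int:
    """Sum of partial scores, each taken as the best value over all variants."""
    significance = effect = eqtl = eqtl_ld = other_qtl = 0
    for v in variants:
        if v.p_value < 5e-8:
            significance = max(significance, 2 if v.in_mhc else 5)
        if v.odds_ratio > 1.2 and v.qtl_type is not None:
            effect = max(effect, _ld_tier(v.ld_r2, best=4))
        if v.qtl_type == "eQTL":
            eqtl = max(eqtl, 15 if v.qtl_in_brain else 10)
            eqtl_ld = max(eqtl_ld, _ld_tier(v.ld_r2, best=5))
        elif v.qtl_type is not None and v.ld_r2 >= 0.99:
            other_qtl = max(other_qtl, 3)
    return significance + effect + eqtl + eqtl_ld + other_qtl


# Hypothetical target supported by a brain eQTL and a pQTL.
evidence = [
    VariantEvidence(1e-9, 1.3, False, "eQTL", True, 1.0),   # 5 + 4 + 15 + 5
    VariantEvidence(1e-6, 1.1, False, "pQTL", False, 0.99), # + 3
]
score = target_score(evidence)   # 32
```

Under this reading, a target with no eQTL support cannot exceed a score of 12, so any cut-off above that value effectively requires eQTL evidence.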

#### *2.2. g:Profiler Analysis*

To perform functional enrichment analysis, g:Profiler e94\_eg41\_p11\_9f195a1 was used [35,36]. The parameters for the enrichment analysis were as follows. A specific organism was chosen: *H. sapiens* (human). Gene Ontology (GO) analyses, GO molecular function (GO:MF), GO cellular component (GO:CC) and GO biological process (GO:BP) were carried out sequentially. The biological pathways used were the Kyoto encyclopedia of genes and genomes (KEGG), Reactome (REAC) and WikiPathways (WP) databases. The protein databases used were the Human Protein Atlas and CORUM databases. The statistical domain scope was used only for annotated genes. The significance threshold was the g:SCS threshold. The user threshold was 0.05.
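The over-representation statistic behind this kind of analysis is a cumulative hypergeometric test, sketched below for a single term. The sketch omits the g:SCS multiple-testing correction, which is specific to g:Profiler, and treats the background as the set of annotated genes, in line with the domain scope stated above. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_term: int,
                           n_query: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for X ~ Hypergeometric(n_background, n_term, n_query).

    n_background: annotated genes in the domain
    n_term:       background genes annotated to the term
    n_query:      genes in the query list (present in the background)
    n_overlap:    query genes annotated to the term
    """
    if not (0 <= n_overlap <= min(n_term, n_query) and
            max(n_term, n_query) <= n_background):
        raise ValueError("inconsistent counts")
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, min(n_term, n_query) + 1)
    )
    return tail / total


# Hypothetical term: 5 of 10 background genes annotated, all 5 query genes hit it.
p = hypergeom_enrichment_p(n_background=10, n_term=5, n_query=5, n_overlap=5)
# p equals 1/252; a term would be reported when the corrected p-value is below 0.05.
```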

#### *2.3. Drug Searching*

Four different databases, OpenTarget [37,38], SuperTarget [39], DrugBank [40,41] and DGIdb [42,43] were used to search for drugs related to the targets of interest.

#### *2.4. In Silico Prediction of Physicochemical Properties of Drugs*

The Absorption, Distribution, Metabolism, Excretion and Toxicity (ADME-Tox) profile of the investigated compounds was predicted using the Schrödinger QikProp tool (Small-Molecule Drug Discovery Suite 2021–1, Schrödinger, LLC, New York, NY, USA). QikProp uses several indicators to estimate the activity in the CNS and thus also the ability of a compound to cross the BBB. The three most important are: (i) LogBB, which represents the blood–brain partition coefficient; (ii) the Madin–Darby canine kidney (MDCK) cell model (apparent MDCK permeability), which estimates the penetration of the substances through a layer of these cells, measured in nm/sec; (iii) the predictor of activity in the CNS. The indicators used to evaluate oral absorption include Human Oral Absorption, Percent Human Absorption and apparent Caco2 permeability, Caco2 being a human colon carcinoma cell line used to predict human intestinal permeability and to investigate drug efflux.

#### **3. Results**

#### *3.1. Genetic-Driven Identification of Targets Linked to Oxidative Pathways in MS*

We systematically collected GWAS data for MS from the GWAS Catalog (Methods), identifying 698 different genetic variants (hits; Table S1, Supplementary Materials). We then examined molecular QTLs to identify gene targets by searching each hit or its proxies (with r<sup>2</sup> > 0.7) in the LinDA browser (Table S2). This LD-based searching strategy allowed us to maximize the information collected, considering that differences in the genetic map and/or in the sample size used in each study (both on disease and molecular QTLs) could lead to the identification of different genetic variants representing the same genetic signal. In addition, we evaluated the functional role of each tested variant by VEP, focusing on missense or more deleterious variants with a CADD-Phred score >15 (Table S3). Thus, each gene regulated at RNA or protein level by a hit variant or tagged by a functionally relevant variant, excluding MHC genes, was recorded for a total of 2,085 unique gene targets (Table S4). In parallel, we extracted the proteins encoded by 931 unique targets included in 22 OS-related pathways from the Reactome database [44] (Tables S5 and S6). The overlap between the 2,085 MS-related gene targets and the 931 OS-related proteins led to the identification of 85 shared targets (Table S7), including KEAP1 and HDAC1, which are both known to be modulated by drugs currently in use for MS (dimethyl fumarate and fingolimod, respectively). Among the 85 targets, 18 are supported by molecular QTLs in the brain (ASF1A, ATP6V1G2, BBC3, BCL2L11, CAPN1, CARM1, CHAC1, CRTC3, CSNK2B, DNM2, FOXO3, HSPA1L, KEAP1, MAPK1, NUP85, POM121C, PSMB9 and TRMT112). In addition, for each variant whose risk allele effect on the gene product was available in the brain, we were able to establish the direction of action (up or down) on the transcript/protein level and, consequently, to choose drugs with the proper mode of modulation: inhibition or activation (Table S8).
In particular, 10 targets were regulated by MS risk variants at some level in the brain, and among them we observed increased expression levels for seven targets (ASF1A, CAPN1, CARM1, CHAC1, NUP85, POM121C and TRMT112) and decreased levels for three targets (BBC3, MAPK1 and PSMB9).

#### *3.2. Functional Enrichment Analysis of the Identified Targets*

To obtain the enrichment information for the 85 candidate targets showing QTLs, g:Profiler analysis was performed [36]. The default analysis implemented in g:Profiler searches for pathways whose genes are significantly enriched (i.e., over-represented) in the target list of interest and compares them to all genes in the genome. Among the most significant pathways detected by REAC, "cellular response to stress" (*p*-value = 4.248 × 10<sup>−36</sup>) and "cellular responses to external stimuli" (*p*-value = 1.326 × 10<sup>−35</sup>) have been pointed out, consistent with OS being the investigated disease phenotype (Figure 2). "Proteasome" (*p*-value = 8.289 × 10<sup>−7</sup>) and "proteasome degradation" (*p*-value = 9.576 × 10<sup>−7</sup>) have been identified as the most represented pathways by KEGG and WP, respectively. The three most significant cellular functions outlined by GO were "transcription factor binding" (GO:MF, 1.942 × 10<sup>−6</sup>), "cellular response to stress" (GO:BP, 8.022 × 10<sup>−19</sup>) and "cytosol" (GO:CC, 1.303 × 10<sup>−19</sup>). Table S9 gives details of all the individual targets involved in the described analyses.


**Figure 2.** g:Profiler analysis of 85 targets. (**A**) Graphic representation of the results. (**B**) The most significant results for Gene Ontology (GO) and pathway enrichment are shown. GO molecular function (GO:MF); GO biological process (GO:BP); GO cellular component (GO:CC); Kyoto encyclopedia of genes and genomes (KEGG); Reactome (REAC); WikiPathways (WP).

#### *3.3. Target Prioritization and Drug Search*

To prioritize the 85 selected targets, we assigned to each of them a genetic-based score which considers the strength of association (variant effect magnitude and significance) with the disease, the presence of QTLs regulating the gene target at protein expression level, particularly in the brain, and the extent of LD supporting all the molecular information.

Based on the score distribution, we then fixed a threshold of score ≥20, which corresponds to the top 25% of the OS-related targets (Figure S1). We prioritized 21 targets (Table S10), including seven targets regulated by eQTLs in the brain for which we established the required direction of modulation (TRMT112, CAPN1, ASF1A, NUP85 and CARM1, suggested to be inhibited, and BBC3 and MAPK1, suggested to be activated).

In four different databases, we searched for modulators of the 21 top-ranking targets (Table S10), selecting only: (i) drugs approved or in clinical trials; (ii) drugs known to act directly on the specific target or as transcriptional target modulators based on established criteria (DGIdb interaction score >0.50 and published data on experimental validation); (iii) drugs having a mode of action consistent with the direction of the eQTL for the risk allele in the brain, if present. This analysis identified 35 modulators of six out of the 21 top targets (MAPK1, MAPK3, CARM1, CDK4, STAT3 and FOS), with a substantial number of drugs for each target, except for CARM1, which had only one. To increase the modulators of CARM1 and to investigate the druggability of the remaining top-ranking targets, we also looked for experimental drug trials, finding five CARM1 inhibitors and 11 compounds for two additional targets (NR1D1 and CAPN1). In addition, the presence of at least one modulator on Pharos makes the targets ASF1A, HVCN1 and YWHAQ druggable [45,46]. We then compiled a final list of 50 compounds for the next selection phase.
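The three selection criteria for repurposable drugs can be sketched as a filter. This is an illustration under stated assumptions, not the authors' procedure: criterion (ii) is read here as "acts directly on the target, or has a DGIdb interaction score above 0.50 together with published experimental validation", and the record fields, status labels and candidate names are hypothetical. The required actions for MAPK1 and CARM1 follow the directions reported above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CLINICAL_STATUSES = {"approved", "clinical_trial"}


@dataclass(frozen=True)
class Candidate:
    name: str
    target: str
    status: str                        # "approved", "clinical_trial" or "experimental"
    action: str                        # "inhibitor" or "activator"
    acts_directly: bool
    dgidb_score: Optional[float] = None
    validated: bool = False            # published experimental validation available


def keep_candidate(c: Candidate, required_action: Dict[str, str]) -> bool:
    # (i) approved or in clinical trials
    if c.status not in CLINICAL_STATUSES:
        return False
    # (ii) direct action, or interaction evidence (assumed reading of the criterion)
    has_evidence = c.acts_directly or (
        c.dgidb_score is not None and c.dgidb_score > 0.50 and c.validated
    )
    if not has_evidence:
        return False
    # (iii) action consistent with the brain eQTL direction, when one is known
    needed = required_action.get(c.target)
    return needed is None or c.action == needed


def select(candidates: List[Candidate], required_action: Dict[str, str]) -> List[str]:
    return [c.name for c in candidates if keep_candidate(c, required_action)]


required = {"MAPK1": "activator", "CARM1": "inhibitor"}
pool = [
    Candidate("drug_1", "MAPK1", "clinical_trial", "activator", True),
    Candidate("drug_2", "MAPK1", "clinical_trial", "inhibitor", True),    # wrong direction
    Candidate("drug_3", "CDK4", "approved", "inhibitor", False, 0.8, True),
    Candidate("drug_4", "CARM1", "experimental", "inhibitor", True),      # not yet in trials
    Candidate("drug_5", "STAT3", "approved", "inhibitor", False, 0.3, True),
]
selected = select(pool, required)   # ["drug_1", "drug_3"]
```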

#### *3.4. Pharmacokinetic Prioritization of the Selected Drugs*

By QikProp, the ADME-Tox properties of 35 repurposable drugs and 15 experimental compounds associated with the eight selected targets were predicted (Table S11). Among the selection criteria, we prioritized the expected penetration into the CNS and the oral bioavailability, which are essential for maintaining drug function and potency towards the respective targets. In addition, physicochemical descriptors and other general properties related to good overall pharmacokinetics and metabolism profiles were considered. In detail, we selected compounds having (i) a value ≥0 for predicted CNS activity; (ii) medium–good values of logBB and MDCK apparent permeability; (iii) high values of human oral absorption and percent human oral absorption; (iv) medium–good values of Caco2 apparent permeability (Table S11). Overall, this analysis identified 10 repurposable drugs (Table 1) and seven experimental compounds. The selected drugs include: (i) the CARM1 inhibitor BIIB021 in clinical trials for breast and gastrointestinal tumors; (ii) the MAPK1 activator PEITC in clinical trials for lung and oral cancer; (iii) four CDK4 inhibitors, ABEMACICLIB approved for breast cancer, ALVOCIDIB, MILCICLIB and PHA-793887 in clinical trials for several tumors; (iv) three STAT3 modulators, ERLOTINIB approved for lung cancer, ENMD1198 and ATIPRIMOD in trials for neuroendocrine cancer and multiple myeloma; (v) PILOCARPINE approved for the treatment of presbyopia as an inducer of FOS expression. Some of the drugs that are presented in Table 1 do not directly modulate the identified targets but may act through indirect mechanisms. The MAPK1-3 inhibitors, MK8353 and LY3214996, were removed from the list since they have a mechanism of modulation not consistent with the direction of eQTLs that we identified for MAPK1 in the brain.
The seven experimental compounds that passed the pharmacokinetics investigation comprise three CARM1 inhibitors (MS049, MS023, TP064) and four NR1D1 modulators (agonists GSK4112, SR9009 and SR9011 and antagonist SR8278) (Table S11).
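The pharmacokinetic selection can be sketched as a threshold filter over predicted descriptors. Only the CNS criterion (value ≥ 0) is stated numerically in the text; every other cut-off below is an assumption chosen for illustration, as are the dictionary keys, the coding of the oral absorption class and the example profiles.

```python
from typing import Dict, List

# Only "min_cns" comes from the text; all other cut-offs are illustrative assumptions.
ASSUMED_CUTOFFS: Dict[str, float] = {
    "min_cns": 0,               # predicted CNS activity >= 0 (from the text)
    "min_logbb": -1.0,          # assumed lower bound for "medium-good" logBB
    "min_mdck_nm_s": 25.0,      # assumed lower bound for MDCK permeability (nm/sec)
    "min_caco2_nm_s": 25.0,     # assumed lower bound for Caco2 permeability (nm/sec)
    "min_oral_class": 3,        # assumed coding: 1 = low, 2 = medium, 3 = high
    "min_percent_oral": 80.0,   # assumed bound for "high" percent oral absorption
}


def passes_adme(profile: Dict[str, float],
                cutoffs: Dict[str, float] = ASSUMED_CUTOFFS) -> bool:
    """True when a predicted profile meets all four groups of criteria."""
    return (
        profile["cns"] >= cutoffs["min_cns"]
        and profile["logbb"] >= cutoffs["min_logbb"]
        and profile["mdck_nm_s"] >= cutoffs["min_mdck_nm_s"]
        and profile["oral_class"] >= cutoffs["min_oral_class"]
        and profile["percent_oral"] >= cutoffs["min_percent_oral"]
        and profile["caco2_nm_s"] >= cutoffs["min_caco2_nm_s"]
    )


def select_compounds(profiles: Dict[str, Dict[str, float]]) -> List[str]:
    return sorted(name for name, p in profiles.items() if passes_adme(p))


# Hypothetical predicted profiles (not values from Table S11).
profiles = {
    "compound_1": {"cns": 1, "logbb": 0.1, "mdck_nm_s": 600.0,
                   "caco2_nm_s": 900.0, "oral_class": 3, "percent_oral": 95.0},
    "compound_2": {"cns": -1, "logbb": -1.4, "mdck_nm_s": 40.0,
                   "caco2_nm_s": 120.0, "oral_class": 3, "percent_oral": 85.0},
}
kept = select_compounds(profiles)   # ["compound_1"]
```

Keeping the cut-offs in a single dictionary makes it straightforward to rerun the selection with stricter or looser limits.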


**Table 1.** Repurposable candidates for oxidative-stress phenotype in MS. The table shows drug candidates with their mechanism of action and clinical trial status for each target. The queried databases are also reported.


\* Only the highest phase is shown.

#### **4. Discussion**

Advanced genetic analysis in MS has identified variants that clearly influence gene expression of CNS-resident immune cells [19], highlighting potential functional consequences for dysregulation of genes involved in the generation of inflammatory and oxidative mediators that trigger neurodegenerative processes. Our purpose was to link genome-wide MS associations and the correlated molecular QTLs to targets of OS pathways, improving the prediction of drug candidates that act as regulators of intracellular oxidative homeostasis. We selected 10 drugs already in use for cancer therapies that are specific for five out of the 21 top-scoring targets involved in the interplay between oxidation–apoptosis–autophagy–inflammation. Of these, MAPK1, STAT3, CDK4 and FOS targets have been indicated in previous MS GWAS [19,47–49], while the potential genetic link of CARM1 with MS is novel. However, drugs with CNS and oral bioavailability have not been predicted for any of these targets.

GWAS-associated genes have already resulted in candidate targets for drug discovery and repositioning in both complex and monogenic diseases [50]. Concerning MS, several studies have outlined the functional consequences of a set of disease variants [47] but these findings have not yet been translated into clinical practice. Moreover, the crosstalk between OS, neurodegeneration and neuroinflammation has a central role in the pathogenesis of MS [51].

In this study, we correlated MS susceptibility loci to OS pathways, finding those alleles (outside the MHC) that influence risk for this relevant disease phenotype. Notably, 85 shared targets were identified and ranked by assigning a score to each genetic outcome available. The reliability of our results is supported by the high score for KEAP1 and HDAC1, known targets of two drugs currently in use for MS, the antioxidant dimethyl fumarate and the immunomodulator fingolimod, respectively. As expected, our selected targets are linked with OS at different levels, in line with the dynamic outline of this process, which accounts for various interrelated events occurring in different cellular compartments. Our list includes: NCF4, a component of the NADPH oxidase system, and the proton channel HVCN1, which are involved in ROS generation [52]; MAPK1, MAPK3, STAT3 and FOS, inflammatory signaling molecules directly activated by ROS [53,54]; the arginine methyltransferase CARM1, a transcriptional co-activator known to regulate NFkB-dependent gene expression [55] and to be involved in cellular processes, such as autophagy, control of the cell cycle and differentiation [56]; the kinase CDK4, which promotes cellular growth by stimulation of mitochondrial biogenesis and concomitantly increases ROS generation [57]; the circadian gene NR1D1, which improves cellular bioenergetics and is regulated by OS and inflammation [58,59]. Interestingly, targets involved in complex regulatory mechanisms have recently attracted interest in the treatment of multifactorial diseases, such as neurodegenerative diseases, in which several biochemical events and molecular targets operate simultaneously [60].

Our approach of genetic-driven target identification is based on the integration of GWAS with eQTLs, especially those measured in brain tissues, to assess genes whose expression levels are modulated by non-coding disease-related variants [49]. The fact that 80% of the genetic variants identified by GWAS map in non-coding regions highlights the potential of functional genomic tools [50,61]. The use in this pipeline of different MS GWAS datasets, including those not containing complete whole-genome results, increased the number of potential candidate targets. Moreover, when the correspondence between a disease-risk variant and an eQTL allele has been derived, we were able to obtain important information about the direction of drug target modulation to be considered.

Query of public databases, combined with in silico pharmacokinetics, allowed for the selection of 10 drugs acting as modulators of five targets associated with oxidative pathways in MS. The direction of brain eQTLs for CARM1 and MAPK1 enabled us to identify two drugs with the required target modulation, prioritizing BIIB021 and PEITC over modulators of targets without the direction of their allelic effect. In particular, BIIB021 is a CARM1 and HSP90 inhibitor currently in clinical trials for treating hematopoietic malignancies and solid tumors (NCT01004081, NCT00618319 and NCT00344786) which easily crosses the BBB and can be administered orally. The drug mechanism responsible for CARM1 inhibition has not yet been defined, and there is the possibility that it acts indirectly via the inhibition of HSP90, which was identified as a CARM1 interactor (EP 3 208 615 B1). In addition, we also indicated highly selective inhibitors of CARM1, recently developed and tested in experimental models [62–64]. PEITC is an organosulfur bioactive compound, known as an MAPK1 activator, that is currently in trial for lung cancer and leukemia treatment (NCT00691132 and NCT00968461). Notably, the anti-inflammatory and antioxidant activity of PEITC has been extensively demonstrated in both in vitro and in vivo models [65,66]. Of note, our in silico ADME analysis confirmed previous data on the BBB permeability of this drug [67].

Lack of data on the direction of the effects of MS risk variants in the modulation of STAT3, CDK4 and FOS in the brain does not allow the selection of drugs with adequate therapeutic modulation (activation or inhibition). Previous studies based on genetic variants and QTLs have suggested drugs for repurposing without exploiting the direction of effects [49,68], further supporting the potential relevance of our results.

In our study, we exclusively selected drugs that had passed clinical phase I and which therefore should be free of serious side effects regardless of their selectivity. Nevertheless, some drugs, including the CDKs inhibitor Alvocidib, present dose-dependent adverse effects that might be evaluated in the disease of interest by a risk–benefit analysis. As shown for CARM1, small molecules with a higher selectivity can be found among compounds active in preclinical studies but, by definition, these are not currently repurposable compounds.

The knowledge about targets relevant to OS in MS for which no approved modulators are currently available could be exploited in future drug discovery studies. Our search for experimental modulators of these targets led to the identification of NR1D1 agonists and antagonists [69], thus proving the druggability of an additional target.

A major limitation of our in silico approach concerns the finding that only about 22% of protein-coding genes are druggable [70], which is consistent with the low proportion of top-identified targets engaged by approved or in clinical trial drugs. A more stringent selection of genes strongly associated with disease may result in the loss of relevant targets showing small effect sizes [71]. In addition, the smaller number of QTLs assessed in the brain compared to other tissues and the lack of protein-QTLs significantly reduce the number of candidate genes to be matched with the selected disease phenotype. It should also be kept in mind that public databases for GWAS, drug targets and pathways make available data that are usually not uniform, often incomplete and frequently not up-to-date, and these represent important constraints for the achievement of a comprehensive analysis.

#### **5. Conclusions**

This study highlights how genetics can support the identification of targets whose dysregulation may unbalance OS-related pathways in MS, and of existing drugs that can be repositioned against these targets. We showed for the first time an increased expression of CARM1 genetically linked to MS. This finding agrees with the emerging dysregulation of methylation pathways in MS, which may impact immune and neurological processes [72]. Notably, several links between arginine methylation and neurodegenerative diseases, such as amyotrophic lateral sclerosis, Alzheimer's and Huntington's disease, have been established over the last few years [73]. However, preclinical studies will be necessary to validate the best drug candidates in cellular or animal models before their therapeutic application. A network pharmacology analysis could be helpful in identifying combinations of drugs targeting different unbalanced signaling pathways consistent with omics data integration and a multitarget drug development approach [74].

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics13122064/s1, Table S1: Genome-wide association results for multiple sclerosis; Table S2: Molecular QTLs related to MS variants; Table S3: Genes tagged by functional relevant MS-related variants; Table S4: MS-related gene targets derived from GWAS and molecular QTLs; Table S5: OS-related pathways; Table S6: Unique proteins related to oxidative stress; Table S7: List of the 85 OS-related targets; Table S8: Direction of effects of MS variants on molecular gene features; Table S9: G-profiler analysis of 85 OS-related targets; Figure S1: Prioritization score distribution; Table S10: Prioritization score; Table S11: In silico ADME studies.

**Author Contributions:** Conceptualization, C.A., S.O. and M.S.; methodology, S.O. and M.S.; software, S.O., M.S., A.F. and S.C.; data analysis, S.O. and M.S.; investigation, C.A., S.O., A.F. and S.C.; writing original draft preparation, C.A., S.O. and M.S.; writing—review and editing, C.A., S.O., M.S., M.B.W., A.F. and S.C.; supervision, C.A.; funding acquisition C.A. and S.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by Progressive MS Alliance (collaborative research network PA-1604-08492-BRAVEinMS) to S.O. and C.A.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We thank Francesca Aloisi, Caterina Veroni and Andrea Angius for help with the manuscript revision and Michele Marongiu for technical support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Drug Repurposing Using Modularity Clustering in Drug-Drug Similarity Networks Based on Drug–Gene Interactions**

**Vlad Groza <sup>1</sup>, Mihai Udrescu <sup>1,\*</sup>, Alexandru Bozdog <sup>1</sup> and Lucreția Udrescu <sup>2</sup>**


**Abstract:** Drug repurposing is a valuable alternative to traditional drug design based on the assumption that medicines have multiple functions. Computer-based techniques use ever-growing drug databases to uncover new drug repurposing hints, which require further validation with in vitro and in vivo experiments. Indeed, such a scientific undertaking can be particularly effective in the case of rare diseases (resources for developing new drugs are scarce) and new diseases such as COVID-19 (designing new drugs requires too much time). This paper introduces a new, completely automated computational drug repurposing pipeline based on drug–gene interaction data. We obtained drug–gene interaction data from an earlier version of DrugBank, built a drug–gene interaction network, and projected it as a drug–drug similarity network (DDSN). We then clustered the DDSN by optimizing modularity resolution, used the ATC code distribution within each cluster to identify potential drug repurposing candidates, and verified repurposing hints with the latest DrugBank ATC codes. Finally, using the best modularity resolution found with our method, we applied our pipeline to the latest DrugBank drug–gene interaction data to generate a comprehensive drug repurposing hint list.

**Keywords:** bioinformatics; drug repurposing; complex network analysis; modularity clustering; ATC code

#### **1. Introduction**

The growth in the number of newly approved pharmaceutical substances has stagnated despite the ever-growing resources that the industry allocates [1–4]. Designing, developing, and testing new medicines is an expensive, long, and cumbersome process [5], which becomes especially burdensome for rare diseases (because funds are limited) and new pathogen epidemics (because stopping the disease spread requires a rapid therapeutic solution) [6,7]. One convenient alternative to the pharmaceutical industry's productivity challenges is drug repurposing, underpinned by R&D in the pharmaceutical industry, as well as by observations and long-time experience indicating the favorable polypharmacological profile of drugs (in other words, most pharmaceutical substances tend to have multiple functions) [8–10]. The trend that calls for drug repurposing techniques is in sync with the recent expansion of Big Data and machine learning in genetics, biology, and medicine; therefore, we have witnessed the development of a wide array of computer-based methodologies to uncover new drug repurposing opportunities [11–13].

A significant area in computational repurposing (or repositioning) relies on the complex network representations of various drug interaction/relationship types, e.g., drug–drug [14], drug–target [15–17], drug–side effect [18], and drug–gene. The networks consist of nodes/vertices (representing drugs, targets, genes, or side effects) and links/edges (representing interactions or other types of relationships) [19]. The network of specific drug interactions allows for the characterization of a complex biological system under therapy; therefore, researchers can use computational techniques and network science principles to explore the interplay between microscale interactions and macroscale behavior [14]. An important area in network science is community/cluster detection and analysis [20,21]. The assumption is that nodes from a distinct cluster have similar topological properties and, thus, share a common feature; this results in drug repurposing opportunities [6]. (If most drugs in a cluster have a particular therapeutic function, then it is reasonable to assume that the function also exists in at least some of the other drugs in the cluster.) Many network-based computational drug repurposing methods use topological network features, such as centralities (topological indicators/measures of a node's importance in the network) and modularity, to identify potential repositionings [22,23].

**Citation:** Groza, V.; Udrescu, M.; Bozdog, A.; Udrescu, L. Drug Repurposing Using Modularity Clustering in Drug–Drug Similarity Networks Based on Drug-Gene Interactions. *Pharmaceutics* **2021**, *13*, 2117. https://doi.org/10.3390/pharmaceutics13122117

Academic Editor: David Barlow

Received: 24 October 2021 Accepted: 2 December 2021 Published: 8 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

All computational drug repositioning methods produce lists of hints or predictions that require testing or confirmation in silico (e.g., molecular docking) [24], in vitro, and in vivo [25]. One can also indirectly prove the effectiveness of the computational technique by applying it to an earlier database version and testing the predictions on the latest data [14,22]. Existing computational pipelines have predicted several important drug repurposings. Moreover, the crisis generated by the COVID-19 pandemic called for drug repurposing solutions to counter SARS-CoV-2 infections.

In our prior studies, we also approached the problem of drug repositioning by building a drug–drug interaction network [14] and a drug–drug similarity network based on drug–target interactions [22]; we used the corresponding drug–drug and drug–target interaction data from DrugBank 4.1 and 4.2, respectively. In [14], we used community detection with energy-based layouts and fixed modularity; in [22], we also used energy-based layouts and fixed modularity, as well as ranking nodes by network centralities; in both previous approaches, we labeled the clusters and confirmed predictions with expert analysis.

In this paper, we also use a method based on network community detection and analysis. To this end, we build a drug–drug similarity network, because similarity networks are better suited for community detection: nodes in the same community are more likely to be similar. Indeed, many other computational drug repurposing methods operate on similarity networks [26,27], with similarity defined on various criteria, from drug–target interactions [22] to adverse effects [18]. We find inspiration in the diseasome project [28,29] based on processing a disease–gene bipartite network (i.e., with two types of nodes, namely, genes and diseases); the processing of the disease–gene network projects it as either a gene–gene similarity or a disease–disease similarity network. In the gene–gene network, a link between two genes exists if there is at least one common disease with which they interact; in the disease–disease network, a link between two diseases exists if at least one gene is responsible for both diseases.

Our method builds a drug–gene interaction network with drug–gene interaction data from the earlier DrugBank 5.0.9 version, then projects it as a drug–drug similarity network; to the best of our knowledge, this is the first drug repurposing method derived from a gene-based drug–drug similarity network. Our drug–drug similarity network is weighted: the weight of the link between two nodes/drugs represents the number of genes with which the two drugs interact in the same manner. We then use modularity-based network clustering to identify drug communities/clusters. We adopt the same assumption as in the diseasome analysis in [30], namely that nodes inside the same community most probably share a common function or property. In this manner, if a drug inside one community does not have the level 1 ATC code of the majority, then we hypothesize that the drug can be repurposed accordingly. Moreover, we improve the efficiency of the approach by providing an automated procedure for tuning modularity resolution [31]: we compare the level 1 ATC codes predicted with our method applied to DrugBank 5.0.9 [32] with the level 1 ATC codes of the same drugs in the latest DrugBank version 5.1.8 [33]. Finally, we apply our pipeline, with the optimized modularity resolution, to the latest DrugBank data to generate a new list of repurposing hints, which we support with existing literature findings. Refer to the overview of our proposed methodology in Figure 1. We only considered drugs listed as *approved* in DrugBank.

**Figure 1.** The overview of our proposed computational drug repurposing pipeline. In the first step, we use drug–gene interaction information from DrugBank 5.0.9 to build the (bipartite) drug–gene interaction network, which we then project as a drug–drug similarity network (DDSN). In the second step, we use modularity class network clustering to identify drug communities with shared properties, analyze the DrugBank 5.0.9 first-level ATC code histograms in each community to predict new drug properties, and check these predictions against the latest DrugBank 5.1.8 level 1 ATC codes. The procedure in the second step allows maximizing the number of confirmed repositionings by adjusting modularity resolution. The third step uses our method with the optimized resolution value determined in the second step to generate a repurposing hints list according to DrugBank 5.1.8.

Three arguments support the novelty of the research presented in this paper. First, this manuscript is, to the best of our knowledge, the first to build and process a DDSN based on drug–gene interaction data. Second, we present a novel method (based on level 1 ATC codes) that labels clusters and generates repositioning hints automatically. Third, we tuned modularity resolution algorithmically and automatically confirmed repositioning hints by comparing two chronologically distinct DrugBank versions.

From a pharmacological perspective, our overarching contribution is to develop, for the first time, and promote the drug–gene interaction networks as a valuable analytical, screening, and visualization tool in drug repositioning. Our method can complement existing computational repositioning pipelines; therefore, it can be integrated into more sophisticated ensemble methods.

#### **2. Materials and Methods**

In this section, we present the conceptual description of our algorithmic drug repositioning method from Figure 1. The thorough technical implementation and description are provided on our GitHub page https://github.com/GrozaVlad/Drug-repurposing-using-DDSNs-and-modularity-clustering (last commit on 21 October 2021). We used *Node.js* with the packages *xml-js* (for parsing the DrugBank XML files) and *pg* (for interacting with the *PostgreSQL* database), and *Docker* and *Docker-compose* for containerized databases [34]. For building and clustering the DDSN, we used the *Python* packages *Psycopg2*, *Pandas* [35], *NetworkX* [36], and *Cdlib* [37]; for visualizing the networks, we used *Gephi* [38]. The hardware platform for running this project was a MacBook Pro with an Intel Core i9 processor (2400 MHz), 16 GB RAM, and a Radeon Pro 560X GPU with 4 GB of memory.

#### *2.1. Databases*

To facilitate an automated procedure for validating our drug repurposing pipeline, we used the earlier DrugBank version 5.0.9 to generate repurposing predictions in one of the anatomical or pharmacological groups described by the first-level ATC codes; then, we validated the predictions against the ATC codes in the latest DrugBank version 5.1.8 (last accessed on 30 September 2021).

In DrugBank version 5.0.9, there are 1966 drugs, 2352 genes, and 7249 drug–gene interactions; the interaction types are part of the set *I<sup>e</sup>* = {inhibitor, agonist, antagonist, other/unknown, ligand, partial agonist, inducer, other, suppressor, binder, antibody, modulator, allosteric modulator, potentiator, neutralizer, stimulator, activator, component of, substrate, inactivator, blocker, antisense oligonucleotide}. In the latest DrugBank version 5.1.8, there are 3117 drugs, 4108 genes, and 8396 drug–gene interactions with interaction types part of the set *I<sup>l</sup>* = {inhibitor, agonist, antagonist, other/unknown, antibody, substrate, ligand, partial agonist, inducer, other, suppressor, binder, potentiator, modulator, activator, cofactor, degradation, positive allosteric modulator, incorporation into and destabilization, allosteric modulator, neutralizer, stimulator, binding, inactivator, inverse agonist, blocker, chaperone, inhibition of synthesis, antisense oligonucleotide, gene replacement, regulator}. Refer to Section 4.1 for explanations.

We chose DrugBank [33] because it is a comprehensive, versioned, and scientifically curated (i.e., robust) database with consistent support for in silico drug design and repositioning space exploration [32].

#### *2.2. Building the Drug–Drug Similarity Network*

The bipartite drug–gene interaction network is a graph $\mathcal{G} = (V, E)$, where $V$ is the set of vertices or nodes, and $E$ is the set of edges. The network $\mathcal{G}$ is bipartite because $V = V_D \cup V_G$, where $V_D$ is the set of drugs and $V_G$ is the set of genes, and every edge connects a drug to a gene. The edges $e_{ij} \in E$ represent interactions between a drug $D_i \in V_D$ and a gene $G_j \in V_G$ (the interaction is of the type $T_k \in I$, with $I$ defined in Section 2.1). An example of such a drug–gene bipartite graph is presented in Figure 2a, with 4 drugs, 3 genes, and 3 types of drug–gene interactions.

**Figure 2.** An illustrative example of projecting the bipartite drug–gene interaction graph $\mathcal{G}$ (**a**) into a weighted drug–drug similarity network $\mathcal{W}$ (**b**). In our example, $\mathcal{G}$ has 4 drugs ($D_1$, $D_2$, $D_3$, and $D_4$), 3 genes ($G_1$, $G_2$, and $G_3$), and 3 types of drug–gene interactions. In the drug–drug similarity network from panel (**b**), nodes are drugs, and the weight of a link between two drugs represents the number of genes with which the drugs interact in the same manner. For instance, as shown, the link $w_{1,3}$ between nodes/drugs $D_1$ and $D_3$ has a weight of 3 because $D_1$ and $D_3$ have the same type of interaction with genes $G_1$, $G_2$, and $G_3$.

From the drug–gene bipartite network $\mathcal{G}$, we generated the weighted drug–drug similarity network $\mathcal{W} = (V_D, W)$ using network projection [39]. In the DDSN, the nodes represent drugs, and a link between two nodes exists if there is at least one gene with which the two drugs interact in the same manner (i.e., the interactions are of the same type $T_k \in I$). In Figure 2b, we present the DDSN projection of the drug–gene example network in Figure 2a. The network is weighted because two drugs $D_i$ and $D_j$ can have the same type of interactions with $m$ genes; therefore, the weight of edge $w_{ij} \in W$ is $m$.
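The projection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (which is available in their repository); the function name `project_ddsn`, the triple-based input format, and the toy interactions are assumptions made for illustration only. The sketch counts, for every pair of drugs, the genes on which both drugs have the same interaction type.

```python
from collections import defaultdict
from itertools import combinations


def project_ddsn(interactions):
    """Project (drug, gene, interaction_type) triples into a weighted DDSN.

    Returns a dict mapping a sorted drug pair (d1, d2) to the number of genes
    with which both drugs interact in the same manner.
    """
    # Group drugs by (gene, interaction type).
    drugs_by_gene_type = defaultdict(set)
    for drug, gene, itype in interactions:
        drugs_by_gene_type[(gene, itype)].add(drug)

    # For each drug pair, collect the genes shared with an identical type.
    # Using a set means a gene is counted once per pair.
    shared_genes = defaultdict(set)
    for (gene, _itype), drugs in drugs_by_gene_type.items():
        for d1, d2 in combinations(sorted(drugs), 2):
            shared_genes[(d1, d2)].add(gene)

    return {pair: len(genes) for pair, genes in shared_genes.items()}


# Hypothetical toy data (not the exact content of Figure 2).
toy_interactions = [
    ("D1", "G1", "inhibitor"), ("D3", "G1", "inhibitor"),
    ("D1", "G2", "agonist"), ("D3", "G2", "agonist"),
    ("D1", "G3", "inducer"), ("D3", "G3", "inducer"),
    ("D2", "G1", "inhibitor"),
    ("D4", "G3", "agonist"),  # different type on G3, so no link to D1 or D3
]
ddsn = project_ddsn(toy_interactions)
```

In this toy input, D1 and D3 share the same interaction type on three genes, so their link has weight 3, while D4 remains isolated because its only interaction type differs from those of the other drugs on the same gene.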

#### *2.3. Network Clustering Analysis*

The clustering of network $\mathcal{G} = (V, E)$ is the process of classifying all nodes $v_i \in V$ in one of the $n$ (disjoint) subsets $C_j$, with $V = \bigcup_{j=1}^{n} C_j$, according to their topological properties. In this paper, we use modularity-based clustering because of its proven effectiveness in drug network analysis [14,22,23]. Following [40], the modularity of a clustering $\mathcal{C}$ in a weighted network such as our DDSN (represented as $\mathcal{W}$) is defined as follows.

$$M = \frac{1}{2a} \sum_{ij} \left( w_{ij} - \frac{k_i k_j}{2a} \right) p\left(C_i, C_j\right). \tag{1}$$

In Equation (1), $a = \frac{1}{2}\sum_{ij} w_{ij}$; $i$ and $j$ are the indexes of nodes $v_i, v_j \in V_D$; $k_i$ and $k_j$ are the node degrees (i.e., the sums of weights of incident edges) for nodes $v_i, v_j \in V_D$; $w_{ij}$ is the entry of the weighted adjacency matrix of $\mathcal{W}$ for nodes $v_i$ and $v_j$; $C_i$ and $C_j$ are the communities that include nodes $v_i$ and $v_j$, respectively; and $p(x, y)$ is a function that returns 1 if $x = y$ and 0 otherwise. (In our DDSN, nodes $v_i$ and $v_j$ are drugs $D_i$ and $D_j$, respectively.)
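As a worked illustration of Equation (1), the following Python sketch evaluates the modularity of a given partition by direct summation. The function name `modularity`, the edge-dictionary input format, and the four-node toy network are assumptions for illustration; the quadratic double loop is meant for readability, not for networks of DrugBank size.

```python
def modularity(weights, community):
    """Evaluate Equation (1) for an undirected weighted network.

    weights: dict {(u, v): w} with each undirected edge listed once (u != v).
    community: dict {node: community label} covering every node.
    """
    # Build the symmetric weighted adjacency: w_ij = w_ji.
    w = {}
    for (u, v), wt in weights.items():
        w[(u, v)] = wt
        w[(v, u)] = wt

    nodes = list(community)
    # k_i: sum of the weights of the edges incident to node i.
    k = {n: 0.0 for n in nodes}
    for (u, _v), wt in w.items():
        k[u] += wt

    two_a = sum(k.values())  # 2a = sum over i, j of w_ij
    total = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:  # p(C_i, C_j) = 1
                total += w.get((i, j), 0.0) - k[i] * k[j] / two_a
    return total / two_a


# Hypothetical toy network: two disconnected edges of weight 1.
toy_edges = {("A", "B"): 1.0, ("C", "D"): 1.0}
m_split = modularity(toy_edges, {"A": 0, "B": 0, "C": 1, "D": 1})
m_merged = modularity(toy_edges, {"A": 0, "B": 0, "C": 0, "D": 0})
```

For this toy network, placing each connected pair in its own community gives $M = 0.5$, whereas merging all four nodes into a single community gives $M = 0$, which matches the intuition that modularity rewards partitions with dense intra-cluster and sparse inter-cluster links.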

The modularity of clustering $\mathcal{C}$ is a value $M_{\mathcal{C}} \in [-1, 1]$, representing the edge density within the clusters with respect to the edge density between clusters. Clustering algorithms based on modularity search for the partitioning $\mathcal{C}$ of the node set that maximizes the value of $M$. The problem is that an exhaustive search for the best modularity entails a large computational burden. Consequently, in practice, heuristic algorithms approximate optimal modularity clustering. However, if the network is very large, such approximations cannot identify small-size clusters, even if the density of internal edges is high and the density of edges between these small clusters and the rest of the network is low.

In this paper, we use the modularity-based clustering algorithm from [41], which controls the resolution of the clustering using a recursive procedure that starts with each node being a cluster and then moves a node $v_i$ (i.e., $D_i$ in our DDSN) to a different cluster $C_j$ if this generates a positive modularity gain, expressed as follows.

$$
\Delta M = \left[ \frac{K^{*}_{C_j} + K_i^{C_j}}{2a} - \left( \frac{K_{C_j} + K_i}{2a} \right)^{2} \right] - \left[ \frac{K^{*}_{C_j}}{2a} - \left( \frac{K_{C_j}}{2a} \right)^{2} - \left( \frac{K_i}{2a} \right)^{2} \right]. \tag{2}
$$

In Equation (2), $K^{*}_{C_j}$ is the sum of the weights of all edges within cluster $C_j$; $K_{C_j}$ is the sum of the weights of all edges incident to nodes in cluster $C_j$; $K_i$ is the sum of the weights of all edges incident to node $v_i$ ($D_i$ in DDSN); and $K_i^{C_j}$ is the sum of the weights of links from $v_i$ to all nodes in cluster $C_j$. The algorithm controls the clustering resolution using the value of $\lambda = \Delta M$; a lower $\lambda$ determines a higher number of clusters.
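To make Equation (2) concrete, the sketch below transcribes it literally into Python. The function and parameter names are hypothetical, and the numeric inputs are arbitrary illustrative values rather than quantities taken from the DDSN. Expanding the squares shows that the $K^{*}_{C_j}$ terms cancel, so the expression as written reduces to $K_i^{C_j}/(2a) - K_{C_j} K_i/(2a^2)$; the code keeps the bracketed form so that it can be compared term by term with the equation.

```python
def modularity_gain(k_within, k_cluster, k_node, k_node_to_cluster, a):
    """Literal transcription of Equation (2).

    k_within:          sum of the weights of all edges within cluster C_j
    k_cluster:         sum of the weights of all edges incident to nodes in C_j
    k_node:            sum of the weights of all edges incident to node v_i
    k_node_to_cluster: sum of the weights of links from v_i to nodes in C_j
    a:                 half of the sum of all adjacency entries w_ij
    """
    two_a = 2.0 * a
    # First bracket: the cluster after absorbing node v_i.
    after = (k_within + k_node_to_cluster) / two_a \
        - ((k_cluster + k_node) / two_a) ** 2
    # Second bracket: the cluster and the isolated node before the move.
    before = k_within / two_a \
        - (k_cluster / two_a) ** 2 \
        - (k_node / two_a) ** 2
    return after - before


# Arbitrary illustrative values (not taken from DrugBank).
gain = modularity_gain(k_within=4.0, k_cluster=6.0, k_node=3.0,
                       k_node_to_cluster=2.0, a=10.0)
reduced = 2.0 / 20.0 - 6.0 * 3.0 / (2.0 * 10.0 ** 2)
```

With these values, both the bracketed form and the reduced form give a gain of 0.01, so the move would be accepted under the positive-gain rule described above.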

#### *2.4. Tuning Resolution λ*

Using Algorithm 1, we tune the modularity resolution to achieve efficiency in predicting new drug properties. To this end, we try $\lambda$ values in the [0.1, 5] interval, with a step of 0.1, generate the modularity clustering $\mathcal{C}$ for each resolution value (Clustering($\mathcal{G}$, $\lambda$)), and determine the dominant property $\mathcal{P}_i$ in each cluster $C_i \in \mathcal{C}$. The dominant property $\mathcal{P}_i$ corresponds to the level 1 ATC code of the majority of drugs $D_j \in C_i$, as resulting from the level 1 ATC code histogram of $C_i$, and is denoted $\mathcal{A}^1(C_i)$. Then, for each drug $D_j$ in each cluster $C_i$, we check the list of first-level ATC codes for drug $D_j$ (denoted $A^1_{D_j}$) against the dominant property $\mathcal{P}_i$ of the drug's cluster. If $\mathcal{P}_i$ is not in the list of DrugBank 5.0.9 level 1 ATC codes for $D_j$ (i.e., $A^1_{D_j}$), but it is present in the list of DrugBank 5.1.8 level 1 ATC codes (i.e., $A^{1c}_{D_j}$), then we consider this as a confirmed repositioning of $D_j$ to property $\mathcal{P}_i$. As such, we add drug $D_j$ to the list of repositionings confirmed with DrugBank 5.1.8 level 1 ATC codes, $\mathcal{R}^c$. Value $\lambda_{max}$ corresponds to the $\mathcal{R}^c$ with the biggest number of elements, namely $\max\{|\mathcal{R}^c|\}$.

**Algorithm 1** Find the parameter *λ*, such that the clustering C of nodes/drugs *D<sup>i</sup>* in G with modularity resolution *λ* (i.e., Clustering(G, *λ*)) produces the biggest number of repositionings confirmed with the level 1 ATC codes in DrugBank 5.1.8.

**Input:** Drug–drug similarity network $\mathcal{G} = (V_D, E)$ based on drug–gene interaction data from DrugBank 5.0.9, and the ATC codes for drugs in DrugBank versions 5.0.9 and 5.1.8.

**Output:** The $\lambda$ value that generates the highest number of confirmed repositionings.

1: **for** $\lambda$ in range (0.1 to 5), with 0.1 steps **do**

2: $\mathcal{C} \Leftarrow$ Clustering($\mathcal{G}$, $\lambda$)

3: **for all** $C_i \in \mathcal{C}$ **do**

4: $\mathcal{P}_i \Leftarrow \mathcal{A}^1(C_i)$

5: $\mathcal{R}^c_i \Leftarrow \emptyset$

6: **for all** $D_j \in C_i$ **do**

7: **if** $\mathcal{P}_i \notin A^1_{D_j}$ and $\mathcal{P}_i \in A^{1c}_{D_j}$ **then**

8: $\mathcal{R}^c_i \Leftarrow \mathcal{R}^c_i \cup \{D_j\}$

9: **end if**

10: **end for**

11: **end for**

12: $\mathcal{R}^c \Leftarrow \bigcup_i \mathcal{R}^c_i$

13: **end for**

14: **Return** the value of $\lambda_{max}$ corresponding to $\max\{|\mathcal{R}^c|\}$
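The resolution sweep of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the clustering routine is passed in as a hypothetical callable `cluster_fn(graph, lam)` that returns a list of drug collections, the ATC dictionaries map each drug to a set of level 1 codes, and ties between resolutions are broken in favor of the larger value (an assumption that happens to mirror the choice of 2.0 over 1.9 reported in the paper). The stub `toy_cluster_fn` and all toy data are invented for the example.

```python
from collections import Counter


def dominant_atc(cluster, atc):
    """Most frequent level 1 ATC code among the drugs of a cluster."""
    counts = Counter(code for drug in cluster for code in atc.get(drug, ()))
    return counts.most_common(1)[0][0] if counts else None


def confirmed_repositionings(clusters, atc_old, atc_new):
    """Drugs whose cluster-dominant code is absent in the old data, present in the new."""
    found = set()
    for cluster in clusters:
        prop = dominant_atc(cluster, atc_old)
        if prop is None:
            continue
        for drug in cluster:
            if prop not in atc_old.get(drug, ()) and prop in atc_new.get(drug, ()):
                found.add(drug)
    return found


def tune_resolution(graph, cluster_fn, atc_old, atc_new):
    """Sweep lambda over 0.1, 0.2, ..., 5.0 and keep the best resolution."""
    best_lam, best = None, set()
    for step in range(1, 51):
        lam = round(0.1 * step, 1)
        found = confirmed_repositionings(cluster_fn(graph, lam), atc_old, atc_new)
        if len(found) >= len(best):  # assumption: ties keep the larger lambda
            best_lam, best = lam, found
    return best_lam, best


# Hypothetical stub standing in for a real modularity clustering routine.
def toy_cluster_fn(graph, lam):
    if 1.5 <= lam <= 2.0:
        return [["a", "b", "c", "d"]]
    return [["a", "b", "c"], ["d"]]


atc_old = {"a": {"N"}, "b": {"N"}, "c": {"N"}, "d": {"C"}}
atc_new = {"a": {"N"}, "b": {"N"}, "c": {"N"}, "d": {"C", "N"}}
best_lam, confirmed_drugs = tune_resolution(None, toy_cluster_fn, atc_old, atc_new)
```

In the toy run, only resolutions between 1.5 and 2.0 merge drug `d` with the N-dominated cluster, so the sweep selects 2.0 with one confirmed repositioning. When two codes are equally frequent in a cluster, the dominant code returned by this sketch depends on iteration order; a real pipeline would need an explicit tie-breaking rule.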

#### *2.5. Generating New Repurposing Hints*

We generated a list of new repositioning hints using the modularity clustering with the resolution value determined by Algorithm 1 in Section 2.4. Algorithm 2 presents the method we follow: cluster the DDSN built with drug–gene interaction information from DrugBank 5.1.8 using the tuned resolution $\lambda_{max}$ ($\mathcal{C}$ = Clustering($\mathcal{G}$, $\lambda_{max}$)); determine the dominant property $\mathcal{P}_i$ of each cluster $C_i \in \mathcal{C}$ as resulting from the level 1 ATC code histogram of $C_i$ (denoted $\mathcal{A}^1(C_i)$); and check, for each drug $D_j$ in each cluster $C_i$, the list of first-level ATC codes of $D_j$ (denoted $A^1_{D_j}$) against the dominant property $\mathcal{P}_i$ of its cluster. If the cluster's dominant property $\mathcal{P}_i$ is not in $A^1_{D_j}$ (the list of level 1 ATC codes of $D_j$), we hint that $D_j$ can be repositioned to $\mathcal{P}_i$. Consequently, we add these repositioning cases as drug–predicted property pairs $(D_j, \mathcal{P}_i)$ to the repositioning hints list $\mathcal{N}$.

**Algorithm 2** Generate the list of drug repurposing hints by clustering the DDSN G with the tuned modularity resolution.

**Input:** Drug–drug similarity network G = (*VD*, *E*) based on drug–gene interaction data from DrugBank 5.1.8, *λmax*, and the ATC codes for drugs in DrugBank 5.1.8.

**Output:** The repositioning hints $\mathcal{N}$ as a list of drug–predicted property pairs $(D_j, \mathcal{A}^1(C_i))$.

1: $\mathcal{C} \Leftarrow$ Clustering($\mathcal{G}$, $\lambda_{max}$)

2: $\mathcal{N} \Leftarrow \emptyset$

3: **for all** $C_i \in \mathcal{C}$ **do**

4: $\mathcal{P}_i \Leftarrow \mathcal{A}^1(C_i)$

5: **for all** $D_j \in C_i$ **do**

6: **if** $\mathcal{P}_i \notin A^1_{D_j}$ **then**

7: $\mathcal{N} \Leftarrow \mathcal{N} \cup \{(D_j, \mathcal{P}_i)\}$

8: **end if**

9: **end for**

10: **end for**

11: **Return** the list of drug repositionings $\mathcal{N}$ as drug–predicted property pairs
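The hint-generation loop of Algorithm 2 can be sketched in a few lines of Python. The function name `repurposing_hints`, the input format (a list of clusters plus a dictionary from drug to its set of level 1 ATC codes), and the placeholder drug names are assumptions for illustration; the clusters are taken as already computed with the tuned resolution.

```python
from collections import Counter


def repurposing_hints(clusters, atc):
    """Return (drug, predicted level 1 ATC code) pairs, as in Algorithm 2."""
    hints = []
    for cluster in clusters:
        # Level 1 ATC code histogram of the cluster.
        counts = Counter(code for drug in cluster for code in atc.get(drug, ()))
        if not counts:
            continue  # no ATC information, hence no dominant property
        dominant = counts.most_common(1)[0][0]
        for drug in sorted(cluster):
            if dominant not in atc.get(drug, ()):
                hints.append((drug, dominant))
    return hints


# Hypothetical placeholder drugs and codes (not DrugBank records).
toy_clusters = [{"drugA", "drugB", "drugC"}, {"drugD", "drugE"}]
toy_atc = {
    "drugA": {"J"}, "drugB": {"J"}, "drugC": {"L"},
    "drugD": {"N"}, "drugE": {"N"},
}
hints = repurposing_hints(toy_clusters, toy_atc)
```

Here `drugC` is the only drug whose codes do not include its cluster's dominant code, so it is the single hint produced. Note that, in this sketch, a drug with no ATC code at all would also be reported as a hint for its cluster's dominant code.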

#### **3. Results**

#### *3.1. DDSN Using Drug–Gene Interactions from DrugBank 5.0.9*

Following the algorithmic approach presented in Figure 1, according to the methods described in Sections 2.2–2.5, we employ cluster-based network analysis on the drug–drug similarity network (DDSN) built with drug–gene interaction information from DrugBank 5.0.9 to search for the most effective modularity resolution $\lambda_{max}$, that is, the modularity resolution that produces the highest number of drug repositionings confirmed with level 1 ATC codes from DrugBank 5.1.8. Figure 3 presents the result of running Algorithm 1 from Section 2.4; the best results correspond to resolutions 1.9 and 2.0 (the same nine confirmed repositionings in both cases). Henceforth, we will consider $\lambda_{max} = 2.0$.

**Figure 3.** The number of confirmed repositionings $|\mathcal{R}^c|$ for resolution $\lambda$ values in the [0.1, 5] interval, with a step of 0.1, after running Algorithm 1 on the DDSN $\mathcal{G}$ built with drug–gene interaction information from DrugBank 5.0.9. The highest number of repositionings confirmed with level 1 ATC codes from DrugBank 5.1.8 (i.e., 9) corresponds to resolutions 1.9 and 2.0.

Figure 4 presents the largest connected component of the DDSN, constructed with drug–gene interaction data from DrugBank 5.0.9 and clustered with modularity resolution $\lambda_{max} = 2.0$; the node labels indicate the topological positions of the repositionings confirmed with DrugBank 5.1.8 data.

**Figure 4.** Drug–drug similarity network (DDSN) built with drug–gene interaction data from DrugBank 5.0.9, clustered using modularity classes for resolution $\lambda_{max} = 2.0$. We indicate the positions of the repositioned drugs confirmed with level 1 ATC codes from DrugBank 5.1.8 by labeling the corresponding nodes with their names. The brown nodes represent drugs in cluster $C_0$ (512 drugs), yellow nodes represent drugs in cluster $C_1$ (238 drugs), green nodes represent drugs in cluster $C_2$ (197 drugs), pink nodes represent drugs in cluster $C_3$ (143 drugs), and light blue nodes represent drugs in cluster $C_4$ (88 drugs).

In Figure 4, nodes represent drugs, and links represent similarity relationships based on drug–gene interactions, as described in Section 2.2; node colors correspond to specific clusters, as determined by the modularity class, and all links are represented with grey lines.

In Appendix A.1, Figures A1–A3 present zoomed details of the DDSN from Figure 4 in the vicinity of the nine confirmed repositionings corresponding to $\lambda_{max} = 2.0$. The repositionings come from cluster $C_0$ (brown nodes) and cluster $C_2$ (green nodes). We indicated the drug repositionings confirmed with DrugBank 5.1.8 data with red arrows (→) in Figures A1 and A2; in Figure A3, we have many confirmed repurposed drugs and a high density of nodes; hence, red diamonds were used instead of arrows.

The zoomed details provided by Figures A1 and A2 show that mepolizumab and naloxone are within cluster $C_0$ (brown nodes), where the dominant property is given by the level 1 ATC code N–*Nervous system*, followed by code R–*Respiratory system*. As such, our method automatically predicts that mepolizumab (listed as L–*Antineoplastic and immunomodulating agents* in DrugBank 5.0.9) acts as a drug with level 1 ATC code R. (In Appendix A.2, Figure A4 shows that in cluster $C_0$, in addition to the dominant level 1 ATC code N, we also have many subcluster drugs with level 1 ATC codes A–*Alimentary tract and metabolism*; R–*Respiratory system*; and C–*Cardiovascular system*.) Our method predicts that naloxone (an opioid overdose antidote in DrugBank 5.0.9) also acts on the nervous system (first-level ATC code N). The more recent DrugBank 5.1.8 confirms the predictions, listing mepolizumab with first-level ATC code R and naloxone with N (see more details in Section 3.3.1).

In Appendix A.1, Figure A3, we zoom in to the region of the DrugBank 5.0.9 DDSN with the confirmed repositionings in cluster $C_2$ (green nodes), with the dominant level 1 ATC code G–*Genitourinary system and sex hormones* (see the histogram in Appendix A.2, Figure A4). The confirmed repositionings in cluster $C_2$ are torasemide (ATC level 1 code C, cardiovascular system), quinetazone (C), methazolamide (S, sensory organs), acetazolamide (S), dorzolamide (S), and brinzolamide (S). Zonisamide (N, nervous system) is a brown node (cluster $C_0$) but in the close vicinity of cluster $C_2$; therefore, one can expect functional overlaps [14]. Our method automatically predicts that all these drugs have genitourinary system properties, and DrugBank 5.1.8 confirms the predictions (see the detailed description in Section 3.3.1).

Using ATC codes as references for drug repurposing is already established in state-of-the-art approaches, although confirmations based on ATC codes are very conservative (i.e., the World Health Organization assigns new ATC codes only after a long and thorough process) [25,42]. Confirming the predicted drug repositionings by performing a research literature review will reveal many more confirmations [25,43]. By this logic, our analysis of DrugBank 5.0.9 does not reveal many confirmed repurposings, yet it helps tune the modularity resolution $\lambda$.

#### *3.2. DDSN Using Drug–Gene Interactions from DrugBank 5.1.8*

According to the algorithmic approach presented in Figure 1, we generated the DDSN based on the drug–gene interactions reported in DrugBank 5.1.8 and clustered DDSN using the modularity classes obtained for resolution *λmax* (by employing Algorithm 1 with the results presented in Section 3.1). We display the largest connected component of the DrugBank 5.1.8 DDSN in Figure 5, with cluster *C*<sup>0</sup> (brown nodes) having the dominant level 1 ATC code N–*Nervous system*; clusters *C*<sup>1</sup> and *C*<sup>2</sup> (green and orange nodes) J–*Anti-infectives for systemic use*; cluster *C*<sup>3</sup> (light blue nodes) L–*Antineoplastic and immunomodulating agents*; and cluster *C*<sup>4</sup> (pink nodes) A–*Alimentary tract and metabolism*.

By running Algorithm 2 on the DDSN built with DrugBank 5.1.8 data and clustered with modularity classes at resolution *λmax*, we generated lists of drug repurposing hints for each drug cluster. In the Supplementary Materials Table S1 file *DDSN-results.xls*, tab *DB 5.1.8 resolution 2.0*, we present the first 10 drug clusters and the entire list of drug repurposing candidates generated with Algorithm 2 (759 candidates).

Experimentally confirming the entire list of 759 drug repurposing candidates generated with the latest DrugBank data is beyond the focus of our paper; therefore, we selected the first 10 drugs in each cluster in terms of betweenness/degree centrality (the methodology used in [22]) and checked them against the state-of-the-art scientific literature. For checking repositioning hints, we searched for articles in PubMed. The search terms were the name of the drug and the words/pharmacological terms that form level 1 of the ATC code. For example, our methodology predicted the level 1 ATC code J–*Anti-infectives for systemic use* for methotrexate; we searched for the confirmation of this prediction by using the keywords *methotrexate anti-infective*, as well as keywords representing therapeutic groups included in class J (i.e., *methotrexate antiviral*, *methotrexate antibacterial*, or *methotrexate antimycotic*). The confirmation results of our extensive literature check are presented in Table 1, showing the drug name, cluster number, current level 1 ATC code, predicted level 1 ATC code, and confirmation references. We also added a detailed discussion of the repurposing hints from Table 1 in Section 3.2.
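The keyword scheme described above (drug name combined with the level 1 ATC term and with the therapeutic groups of the predicted class) can be expressed as a small Python helper. The function name `literature_queries` is hypothetical, and the sketch only builds the query strings; it does not contact PubMed or assume any particular search API.

```python
def literature_queries(drug, atc_level1_term, group_terms):
    """Combine a drug name with the predicted ATC class term and its group terms."""
    terms = [atc_level1_term, *group_terms]
    return [f"{drug} {term}" for term in terms]


# Keywords taken from the methotrexate example in the text.
queries = literature_queries(
    "methotrexate",
    "anti-infective",
    ["antiviral", "antibacterial", "antimycotic"],
)
```

For the methotrexate example, the helper yields the four keyword strings quoted in the text, which can then be submitted manually or through a search client of the reader's choice.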

**Figure 5.** Drug–drug similarity network (DDSN) built with drug–gene interaction data from DrugBank 5.1.8, clustered using modularity classes for resolution $\lambda_{max} = 2.0$. The brown nodes represent drugs in cluster $C_0$ (479 drugs), green nodes represent drugs in cluster $C_1$ (346 drugs), light blue nodes represent drugs in cluster $C_2$ (270 drugs), orange nodes represent drugs in cluster $C_3$ (129 drugs), and pink nodes represent drugs in cluster $C_4$ (12 nodes).

We present the topological DDSN placement of pyridoxal phosphate (a predicted repositioning from cluster $C_0$) in Figure 6, where a red diamond marks the exact position.

In Figure 7, we illustrate the position of albendazole and methotrexate in the DDSN built with DrugBank 5.1.8 data as predicted drug repositionings from cluster $C_1$. Other drug repurposing candidates from cluster $C_1$ (presented in Table 1) are shown in Appendix B.1, Figure A5: simvastatin, fluvastatin, lovastatin, and atorvastatin.

Figure 8 displays the DrugBank 5.1.8 DDSN placement of cholecalciferol, ergocalciferol, and calcifediol, which are drug repurposing candidates from cluster *C*2. In Appendix B.1, Figures A6–A8, we identify the topological positions of the other drug repurposing candidates in cluster *C*<sup>2</sup> (Table 1): meloxicam, theophylline, and chloroquine.

**Table 1.** The list of drug repurposing candidates generated with our methodology in Figure 1 on data from DrugBank 5.1.8, and confirmed with scientific literature. The rows correspond to drugs or drug classes (for example, simvastatin, fluvastatin, lovastatin, and atorvastatin are statins). The columns indicate—from left to right—the name, the cluster, the current level 1 ATC code in DrugBank 5.1.8, the predicted level 1 ATC code, and the confirmation references for the drug (or drug class) in each row.


We also show the placement of drug repurposing candidates mecasermin and mecasermin rinfabate (in Figure 9, in cluster *C*4, with red diamonds ♦) and ornithine (in Figure 10, in cluster *C*25, with a red arrow →).

The histograms showing the dominant properties (as level 1 ATC codes) in clusters *C*0, *C*1, *C*2, and *C*<sup>4</sup> are presented in Appendix B.2, Figure A9.

**Figure 6.** The DrugBank 5.1.8 DDSN network's zoomed detail shows the repositioning within cluster *C*<sup>0</sup> (brown nodes) with a red diamond (♦). Our repositioning pipeline predicts that pyridoxal phosphate (currently at ATC level 1 code A–*Alimentary tract and metabolism*) has properties described by the level 1 ATC code N–*Nervous system*.

**Figure 7.** The DrugBank 5.1.8 DDSN network's zoomed detail shows two repositionings within cluster *C*<sup>1</sup> (green nodes) with red diamonds (♦). Our repositioning pipeline predicts that albendazole and methotrexate (currently at ATC level 1 codes P–*Antiparasitic products, insecticides, and repellents* and L–*Antineoplastic and immunomodulating agents*, respectively) have properties described by the level 1 ATC code J–*Anti infectives for systemic use*.

**Figure 8.** The DrugBank 5.1.8 DDSN network's zoomed detail shows three repositionings (vitamin D derivatives) within cluster *C*<sup>2</sup> (light blue nodes) with red diamonds (♦). Our repositioning pipeline predicts that cholecalciferol, ergocalciferol, and calcifediol (currently at ATC level 1 codes A–*Alimentary tract and metabolism* and M–*Musculo-skeletal system*) have properties described by the level 1 ATC code L–*Antineoplastic and immunomodulating agents*.

**Figure 9.** The DrugBank 5.1.8 DDSN network's zoomed detail shows two repositionings within cluster *C*<sup>4</sup> (pink nodes) with red diamonds (♦). Our repositioning pipeline predicts that mecasermin and mecasermin rinfabate (currently at ATC level 1 code H–*Systemic hormonal preparations, excluding sex hormones and insulins*) have properties described by the level 1 ATC code A–*Alimentary tract and metabolism*.

**Figure 10.** The DrugBank 5.1.8 DDSN network's zoomed detail shows a repositioning within cluster *C*<sup>25</sup> (light orange) with a red arrow (→). Our method predicts that ornithine (currently at ATC level 1 code A–*Alimentary tract and metabolism*) has properties described by the level 1 ATC code N–*Nervous system*.

#### *3.3. Repositioning Confirmations*

#### 3.3.1. Confirmed Drug Repositionings in DrugBank 5.0.9

This section discusses the drug repositioning hits generated with our methodology in DrugBank 5.0.9 and confirmed with the level 1 ATC codes in DrugBank 5.1.8. Our procedure confirmed the predicted hints in modularity classes 0 and 2.

#### Modularity Cluster *C*<sup>0</sup>

In modularity cluster *C*0, DrugBank 5.1.8 confirms mepolizumab and naloxone (see Figures A1 and A2). Naloxone (ATC code V03AB15) is a µ-opioid receptor antagonist indicated in the treatment of opioid overdose. In DrugBank 5.0.9, naloxone's first level ATC is V–*Various*; its level 4 (V03AB) means naloxone is in the *Antidotes* category.

Our methodology predicts naloxone's level 1 ATC as N–*Nervous system*; the latest DrugBank 5.1.8 adds two N level 1 ATC codes to naloxone (level 4 ATC category *Natural opium alkaloids* for the combinations with hydromorphone and oxycodone), thus confirming our prediction.

Mepolizumab (ATC code L04AC06) is a monoclonal antibody acting as an antagonist of interleukin-5, included in the L–*Antineoplastic and immunomodulating agents* level 1 ATC category by DrugBank 5.0.9.

DrugBank 5.1.8 does not list the L04AC06 code anymore for mepolizumab; instead, it uses the level 1 ATC code R–*Respiratory system* (the level 4 ATC is R03DX, which includes *other systemic drugs for obstructive airways diseases*, as mepolizumab is indicated in severe eosinophilic asthma).

#### Modularity Cluster *C*<sup>2</sup>

In modularity cluster *C*2, DrugBank 5.1.8 confirms torasemide, methazolamide, acetazolamide, dorzolamide, brinzolamide, zonisamide, and quinetazone (see Figure A3).

Torasemide, quinetazone, methazolamide, acetazolamide, dorzolamide, zonisamide, and brinzolamide (ATC codes: C03CA04, C03BA02/C03BB02, S01EC05, S01EC01, S01EC03, N03AX15, and S01EC04/S01EC54, respectively) are sulfonamide compounds with various pharmacodynamic effects. According to DrugBank 5.0.9, torasemide and quinetazone are diuretics used as antihypertensive drugs, included in the C–*Cardiovascular system* level 1 ATC category. Zonisamide is an antiepileptic drug (level 1 ATC N–*Nervous system*). Methazolamide, acetazolamide, dorzolamide, and brinzolamide are carbonic anhydrase inhibitors used in glaucoma (level 1 ATC S–*Sensory organs*).

Our methodology predicts G–*Genito urinary system and sex hormones* as the level 1 ATC code for torasemide, quinetazone, methazolamide, acetazolamide, dorzolamide, zonisamide, and brinzolamide. Indeed, the latest DrugBank 5.1.8 version includes all these drugs in the G level 1 ATC category, more precisely in the G01AE level 4 ATC category of *Anti-infective and antiseptics* having a sulfonamide-based chemical structure.
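The confirmation procedure applied in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `is_confirmed` and the dictionary layout are assumptions, the level 1 codes for naloxone and mepolizumab follow the description above, and `drug_x` is a made-up entry added only to show an unconfirmed hint.

```python
def is_confirmed(predicted, old_codes, new_codes):
    """A hint counts as confirmed when the predicted level 1 code is absent
    from the older database version and present in the newer one."""
    return predicted not in old_codes and predicted in new_codes


# Level 1 ATC codes per drug in the older and newer database versions.
hints = {
    "naloxone": {"predicted": "N", "old": {"V"}, "new": {"V", "N"}},
    "mepolizumab": {"predicted": "R", "old": {"L"}, "new": {"R"}},
    "drug_x": {"predicted": "J", "old": {"C"}, "new": {"C"}},  # hypothetical
}

confirmed = sorted(
    name
    for name, h in hints.items()
    if is_confirmed(h["predicted"], h["old"], h["new"])
)
print(confirmed)
```

Requiring the code to be absent from the older version keeps the check from counting properties that were already known when the prediction was made.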

#### 3.3.2. Drug Repositioning Hints in DrugBank 5.1.8

This section discusses the validity of some drug repositioning hints generated with our methodology in DrugBank 5.1.8; as this is the latest database version, we cannot use the same confirmation procedure based on ATC codes. Consequently, we provide evidence found in the state-of-the-art literature as confirmation clues. However, as both the number of clusters and their size prohibit an exhaustive literature search, we focus on the clusters with confirmed drug repurposing candidates—clusters *C*0, *C*1, *C*2, *C*4, and *C*25.

Pyridoxal phosphate (cluster *C*0, ATC code A11HA06) is the active form of vitamin B6 and belongs to the A–*Alimentary tract and metabolism* level 1 ATC category, along with the rest of water-soluble and fat-soluble vitamins. Our method predicts pyridoxal phosphate as level 1 ATC code N–*Nervous system* (see Figure 6); H-S Wang et al. reported that pyridoxal phosphate controls idiopathic intractable epilepsy in children [44]. P.B. Mills and team identified two groups of patients with neonatal epileptic encephalopathy (determined by PNPO mutations) that respond to pyridoxal phosphate [45].

Albendazole (cluster *C*1, ATC code P02CA03) is an antiparasitic drug (first level ATC P–*Antiparasitic products, insecticides and repellents*) efficient in various helminthic infections. Our methodology predicts J as level 1 ATC code, suggesting potential systemic anti-infective effects (see Figure 7). Of note, ATC lists drug classes such as antivirals, antibacterials, antimycotics, and vaccines in the J–*Anti-infectives for systemic use* category. In vitro results show that albendazole exerts antifungal activity against *Aspergillus* spp. [46]; moreover, experiments on mice revealed antifungal effects against *Pneumocystis carinii* [47], confirming the new potential antifungal medical use of albendazole.

Methotrexate (cluster *C*1, ATC codes L04AX03, L01BA01) is an anticancer and immunosuppressant agent; therefore, the level 1 ATC is L–*Antineoplastic and immunomodulating agents*. We predict the first level J–*Anti infectives for systemic use* (see Figure 7). The literature survey reveals several papers reporting in vitro antiviral effects of methotrexate in a dose-dependent manner on SARS-CoV-2 [48] and Zika virus replication [49]; methotrexate also prevents the replication of human cytomegalovirus and inhibits viral DNA synthesis [50].

Simvastatin, fluvastatin, lovastatin, and atorvastatin (cluster *C*1, ATC codes A10BH51/C10AA01/C10BX04/C10BA02/C10BX01/C10BA04, C10AA04, C10AA02/C10BA01, and C10BX15/C10AA05/C10BX03/C10BA05/C10BX11/C10BX08/C10BX06/C10BX12) are HMG-CoA reductase inhibitors (also called statins) that lower serum lipid levels, reducing the risk of cardiovascular events caused by hyperlipidemia; they are in the level 1 ATC C–*Cardiovascular system* class. The first level of their ATC code, as predicted by our method, is J–*Anti infectives for systemic use* (see Figure A5), which is confirmed by the literature; for example, simvastatin exhibits an in vitro antimicrobial effect on methicillin-susceptible *Staphylococcus aureus* [51]. S.P. Parihar et al. [52] review the literature reporting preclinical and clinical evidence of statins' effects in viral, parasitic, fungal, and bacterial infections, pointing out the factors that influence the response to statins, such as human polymorphism, metabolism, and drug interactions; this review includes data on all mentioned statins. Our algorithm predicts that all statins in cluster *C*<sup>1</sup> are potential anti-infective agents. As shown, for the statins we highlighted in Figure A5, we found literature confirming our prediction; for the other statins, new experiments and studies may provide confirmation.

Theophylline (cluster *C*2, ATC codes R03DA54, R03DA74, R03DA20, R03DA04, and R03DB04) is a methylxanthine derivative used to treat obstructive respiratory conditions, such as asthma and COPD, hence having R–*Respiratory system* as first level ATC code. Our methodology indicates theophylline's *Anticancer and immunomodulating properties*, as reflected by the predicted ATC first level L (see Figure A7), thus further confirming the repositioning proposed by our previous research [14]. Indeed, recent literature demonstrates the anticancer properties of theophylline in breast and cervical cell lines [53].

Meloxicam (cluster *C*2, ATC codes M01AC56 and M01AC06) is an oxicam derivative with anti-inflammatory and antirheumatic properties of the M–*Musculo-skeletal system* ATC category. Our network-based methodology predicts L as the first level of the ATC code (see Figure A6). The literature confirms our prediction of the anticancer properties of meloxicam: Meloxicam inhibits tumor growth in COX-2 positive colorectal cancer [54]. Tsubouchi et al. report that COX-2 plays a significant role in the pathogenesis and progression of non-small cell lung cancer (NSCLC), demonstrating the inhibitory effect of meloxicam on the NSCLC growth by preferentially inhibiting COX-2 [55]. Reference [56] shows that meloxicam is efficient in osteosarcoma in both COX-2-dependent and independent inhibitory manners.

Cholecalciferol, ergocalciferol, and calcifediol (cluster *C*2, ATC codes M05BB09/M05BX53/M05BB07/M05BB08/A11CC55/M05BB05/A11CC05/M05BB03/M05BB04, A11CC01, and A11CC06) are vitamin D analogs. Cholecalciferol (vitamin D3) is a fat-soluble vitamin (ATC level 1 A–*Alimentary tract and metabolism*, a category which includes hydro-soluble and lipo-soluble vitamins) with a well-established role in bone mineralization (ATC second level M05–*Musculo-skeletal system, drugs for treatment of bone diseases*). Ergocalciferol and calcifediol are also grouped in A–*Alimentary tract and metabolism* level 1 ATC. We predict these drugs as targeting diseases at level 1 ATC code L–*Antineoplastic and immunomodulating agents* (see Figure 8). There is extensive literature reporting the beneficial effects of vitamin D analogs in different cancers and highlighting the epidemiological, preclinical, and clinical results; all these back up their evolution as prophylactic and curative anticancer drugs [57,58].

Chloroquine (cluster *C*2, ATC code P01BA01) is an antimalarial drug; consequently, it belongs to the P–*Antiparasitic products, insecticides and repellents* level 1 ATC category. According to our results, the predicted first-level ATC is L–*Antineoplastic and immunomodulating agents* for chloroquine (dominant in cluster *C*2, see Figure A8). Multiple research reviews report in vitro, in vivo, and clinical trials testing chloroquine's anticancer effect in glioblastoma [59] and other types of cancers [60–63], hence supporting the potential repositioning of chloroquine as an anticancer drug, as uncovered by our methodology.

Mecasermin and mecasermin rinfabate (cluster *C*4, ATC codes H01AC03, H01AC05) are recombinant insulin-like growth factor-1 drugs indicated in growth failure in children with primary IGF-1 deficiency and, hence, are included in the H–*Systemic hormonal preparations, excluding sex hormones and insulins* level 1 ATC category. The literature and reports from medicines regulatory authorities describe the secondary pharmacologic actions of mecasermin and mecasermin rinfabate, including the anabolic and insulin-like effects (i.e., hypoglycemia) [64–66]; these pharmacologic effects could place the drugs in the A–*Alimentary tract and metabolism* level 1 ATC, as predicted by our methodology (see Figure 9).

Ornithine (cluster *C*25, ATC code A05BA06) is a non-essential amino acid indicated as nutritional supplementation and for supporting liver function, and it is included in the A–*Alimentary tract and metabolism* level 1 ATC. M. Miyake et al. suggest that L-ornithine may act on the central nervous system, following a randomized, double-blind controlled trial that demonstrated that L-ornithine relieved stress and improved sleep quality in humans compared to the placebo group [67]. Indeed, we predicted ornithine at level 1 ATC N–*Nervous system* (see Figure 10).

#### **4. Discussion**

In this section, we discuss the particularities of our method, namely the data we use, the limitations of our method and its validation with ATC codes, and the way to integrate it into an ensemble drug repositioning framework.

#### *4.1. Drug–Gene Interactions*

The method we propose in this paper uses drug–gene interaction data from DrugBank versions 5.0.9 and 5.1.8. Table 2 presents examples of drug–gene interactions and their corresponding types, as defined by DrugBank 5.1.8 (see a detailed list of drug–gene interaction types in the Supplementary Materials Table S1 file *DDSN-results.xls* and how to retrieve such drug–gene interactions from DrugBank in the GitHub page https://github.com/GrozaVlad/Drug-repurposing-using-DDSNs-and-modularity-clustering (last commit on 21 October 2021)).

#### *4.2. Method Limitations*

The mechanisms that influence the polypharmacological profile of drugs are highly complex. Indeed, the medicinal compound interacts with a complex system represented by the human organism. Complex systems are context-dependent; in other words, any detail at the micro-scale influences the macroscale behavior. As such, many factors can be considered when analyzing the functions of any pharmaceutical substance: from the chemical structure to various types of relationships and interactions, as well as pharmacokinetics and pharmacodynamics. By this logic, our approach is limited to considering a narrow informational angle, namely drug–gene interactions. Nonetheless, considering many mechanisms and types of data simultaneously within the same model would be prohibitively complex, and the networks would become much too dense for any centrality or community analysis. Even considering one type of information has become significantly complex; for instance, the drug–drug interaction (DDI) networks in DrugBank 3.0 had an average degree of ∼20, whereas in DrugBank 5.1.8 the average DDI network degree is ∼600. Recent literature [68–70] advances the so-called ensemble methods to address this new situation of being confronted with an overabundance rather than scarcity of data (see Section 4.4).


**Table 2.** Examples of drug–gene interactions listed in DrugBank.

#### *4.3. Labeling and Validation with ATC Codes*

Employing computational methods (i.e., data mining and machine learning) in drug repositioning is generally hampered because we do not have a robust ground truth. Indeed, databases such as DrugBank record positive information about the drugs' known properties and functions, yet the absence of evidence is not evidence of absence (some drug properties may be hidden, and only future experiments can fully reveal them). That is why performance evaluation and validation of computational drug repositioning models are still an open issue; therefore, researchers adopt ad hoc, particular strategies, which are hard to compare [71]. Consequently, we resorted to making predictions with an older database version and then validating them with the latest version. However, even the latest database still cannot contain exhaustive information about drug functions. Furthermore, negative information on drug functions/effects (stating what properties a drug does not have) would help prune the vast search space in drug repositioning. Unfortunately, negative information is scarce and scattered throughout the literature; to the best of our knowledge, no comprehensive dataset contains such data based on experimental results. As such, the existing negative information cannot be used algorithmically/automatically. As explained, one feasible method for filtering the noise and navigating the search space affected by uncertainty (an approach supported by recent research) is to integrate tools such as the one we propose here in ensemble methods.

Many computational drug repositioning methods based on complex networks rely on community detection and community labeling. However, labeling can be cumbersome and subjective; thus, we decided to use ATC codes, since this system is the standard for classifying medicines accepted by the WHO. Furthermore, the automated approach is fostered because the ATC code aggregates all information about a drug in a combination of letters and numbers, which are easier to process algorithmically. The ATC code classifies drugs on five levels considering three criteria simultaneously: anatomical (A), the first level; therapeutic (T), levels 2 and 3; and chemical (C), levels 4 and 5. The anatomical criterion indicates the anatomical level or the physiological organ systems on which a specific drug acts. Each anatomical level is indicated in the ATC code by a letter (e.g., A–*Alimentary tract and metabolism*, C–*Cardiovascular system*, M–*Musculoskeletal system*, or R–*Respiratory system*); the ATC system contains 14 anatomical groups. Level 2 represents the therapeutic classification criterion and is encoded by two digits. Level 3 (encoded by a letter) indicates the particular pharmacological group of the drug. Level 4 (encoded by a letter) indicates the chemical class of the drug. Level 5 (encoded by two digits) indicates the chemical structure of the drug. This paper only used the first-level ATC codes for labeling and validation of prediction, although drug function is more precisely expressed by levels 1–3; we opted for this simplification because the sophisticated hierarchical clustering algorithms entailed by a multi-level approach would have unnecessarily intensified the computational character of our study.
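The five-level structure of an ATC code can be made concrete with a short sketch. The function name `atc_levels` and the regular expression are illustrative assumptions that encode the layout described above (letter, two digits, letter, letter, two digits); the example code A11HA06 is the one quoted for pyridoxal phosphate.

```python
import re

# One letter, two digits, one letter, one letter, two digits.
ATC_PATTERN = re.compile(r"^([A-Z])(\d{2})([A-Z])([A-Z])(\d{2})$")


def atc_levels(code):
    """Split a full ATC code into its five cumulative hierarchy levels."""
    match = ATC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a full five-level ATC code: {code!r}")
    parts = match.groups()
    # Level k is the concatenation of the first k parts of the code.
    return {level: "".join(parts[:level]) for level in range(1, 6)}


levels = atc_levels("A11HA06")
print(levels[1])  # anatomical group, the only level used for labeling here
print(levels)
```

Only `levels[1]` is needed for the cluster labeling and validation described in this paper; the deeper levels would matter for the hierarchical extension mentioned as future work.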

#### *4.4. Method Application*

When the problem at hand is too complex to solve by employing a single model, machine learning uses an ensemble strategy [72], which trains several models on the same set of data to operate collectively for solving the problem. This strategy is already used in bioinformatics to approach complex problems such as motif discovery in ChIP-Seq data [73]. The problem of drug repositioning is also very complex; however, prediction accuracy is not the primary indicator of success (the benefit of correctly predicting even a few drug repositionings is more significant than the cost of experiments entailed by testing the wrong predictions [74]). As such, very recent literature advances the idea of using ensemble methods for drug repositioning [69,70].

In this context, considering that (as explained in Section 4.2) our method uses drug–gene interaction data that only partially describe the behavior of drugs, we indicate the ensemble strategy as a way to apply our method. As shown in Figure 11, drug repositioning prediction based on drug–gene interaction data may be Method*i* from the group of machine learning methods based on distinct models {Method1, Method2, . . . Method*m*}. The repositioning hints lists produced by all methods are aggregated (i.e., via voting, averaging, or other procedures) to produce a final drug repositioning hints list. The aggregation process may use pharmacological expertise, e.g., to adjust the weights of a weighted average. However, implementing the ensemble strategy is beyond the scope of this paper, which aims to analyze and promote, for the first time, the beneficial role of drug–gene interaction networks for computational drug repositioning.

**Figure 11.** Overview of the ensemble strategy in drug repositioning. A group of machine learning and data mining methods {Method1, Method2, . . . Method*m*} implements various models and uses distinct features (e.g., drug–drug interactions, drug–target interactions, drug–gene interactions, drug–adverse reactions relationships, pharmacokinetic properties) from the same comprehensive dataset to predict drug repositioning hints. Each method Method*i* generates its repositioning hints list, and an aggregation process assembles all lists in the final repurposing hints list.
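One possible aggregation step for the ensemble strategy is weighted voting, sketched below. This is an illustration under stated assumptions, not an implementation from the paper: the function `aggregate_hints`, the threshold `min_score`, and the three example hint lists are all hypothetical.

```python
from collections import defaultdict


def aggregate_hints(hint_lists, weights=None, min_score=0.5):
    """Weighted voting over the hint lists produced by several methods.

    Each hint is a (drug, predicted level 1 ATC code) pair. A hint's score is
    the weight share of the methods that proposed it; hints scoring below
    min_score are dropped.
    """
    if weights is None:
        weights = [1.0] * len(hint_lists)
    total = sum(weights)
    scores = defaultdict(float)
    for hints, weight in zip(hint_lists, weights):
        for hint in set(hints):  # a method votes at most once per hint
            scores[hint] += weight / total
    kept = [(hint, score) for hint, score in scores.items() if score >= min_score]
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical outputs of three methods (illustrative values only).
method_1 = [("methotrexate", "J"), ("meloxicam", "L")]
method_2 = [("methotrexate", "J")]
method_3 = [("meloxicam", "L"), ("ornithine", "N"), ("methotrexate", "J")]

final_hints = aggregate_hints([method_1, method_2, method_3])
for hint, score in final_hints:
    print(hint, round(score, 2))
```

The `weights` argument is where pharmacological expertise could enter, for example by giving a larger weight to methods whose features are considered more reliable for a given drug class.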

#### **5. Conclusions**

In this paper, we propose a new drug repurposing methodology based on algorithmic complex network analysis. To this end, we introduce an original method of building the Drug–Drug Similarity Network (DDSN) using drug–gene interactions from DrugBank, clustering DDSN with modularity classes, and labeling each cluster with the dominant first level ATC code of drugs within the cluster. The assumption that results in drug repurposing hints is that drugs in a cluster share the dominant property of the cluster. We use an automated procedure to tune modularity resolution, to apply our methodology on a DDSN built with data from DrugBank 5.0.9, to generate the list of drug repurposing hints (i.e., drugs for which the first level ATC does not match the dominant cluster label), and to check it against ATC codes in DrugBank 5.1.8.
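The labeling and hint-generation steps summarized above can be sketched in a few lines. This is a simplified illustration, assuming the clusters have already been computed; the function names, the alphabetical tie-breaking rule, and the toy drugs are hypothetical and are not taken from the paper's code.

```python
from collections import Counter


def dominant_label(drugs, atc_level1):
    """Return the most frequent level 1 ATC code among the drugs of a cluster.

    Ties are broken alphabetically (an assumption made for reproducibility).
    Returns None when no drug in the cluster has an ATC code.
    """
    counts = Counter(code for drug in drugs for code in atc_level1.get(drug, ()))
    if not counts:
        return None
    return max(sorted(counts), key=counts.__getitem__)


def repurposing_hints(clusters, atc_level1):
    """List (drug, cluster, label) for drugs lacking their cluster's dominant label."""
    hints = []
    for cluster_id, drugs in clusters.items():
        label = dominant_label(drugs, atc_level1)
        if label is None:
            continue
        for drug in sorted(drugs):
            # Drugs without any ATC code also become hints in this sketch.
            if label not in atc_level1.get(drug, set()):
                hints.append((drug, cluster_id, label))
    return hints


# Toy data with made-up drug names.
clusters = {"C1": {"drug_a", "drug_b", "drug_c", "drug_d"}}
atc_level1 = {
    "drug_a": {"J"},
    "drug_b": {"J"},
    "drug_c": {"J", "L"},
    "drug_d": {"P"},
}

hints = repurposing_hints(clusters, atc_level1)
print(hints)
```

In the toy cluster the dominant label is J, so the only hint is `drug_d`, whose current code P does not match the cluster label.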

By running our method on the DrugBank 5.1.8 DDSN, we generated a consistent list of drug repositioning candidates; we selected the top betweenness/degree drugs in each cluster and performed a preliminary validation with state-of-the-art experimental results reported in the literature. Because we collected many literature confirmations of our method's predictions, we argue that our fully automated pipeline, based on Big Data and unsupervised machine learning, is a practical tool that can substantially narrow the enormous search space in drug repositioning.

To summarize, the overarching methodological contributions of our paper are listed as follows:


In the present context, affected by the COVID-19 pandemic, we believe that the most promising findings presented in our paper are the anti-infective effects of statins, especially their potential antiviral effects. Indeed, the very recent comprehensive study [6] also finds, following in vitro screening, that fluvastatin presents what the authors call a "strong effect" against SARS-CoV-2.

Considering all aspects presented in Section 4.2, we will extend our research on drug–gene interaction networks by implementing hierarchical clustering to predict ATC codes on levels 1–3, developing a dedicated cluster overlapping algorithm as a drug repositioning prediction strategy (i.e., one would reasonably expect that drugs in the overlapping zone would inherit the dominant properties of the respective clusters) and integrating the drug–gene network method into an ensemble strategy. These future objectives require substantial reliance on developing bioinformatic tools, entailing algorithm design, machine learning, and Big Data analytics.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics13122117/s1, Table S1: DDSN-results.

**Author Contributions:** Conceptualization, M.U. and L.U.; methodology, V.G., M.U., A.B. and L.U.; software, V.G. and A.B.; validation, V.G., A.B. and L.U.; formal analysis, M.U.; investigation, V.G. and A.B.; resources, M.U. and L.U.; data curation, V.G. and L.U.; writing—original draft preparation, M.U. and L.U.; writing—review and editing, V.G., A.B. and L.U.; visualization, V.G., M.U. and L.U.; supervision, M.U. and L.U.; project administration, M.U. and L.U.; funding acquisition, L.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by a grant of the Romanian Ministry of Education and Research, CCCDI-UEFISCDI, project number PN-III-P2-2.1-PED-2019-2842, within PNCDI III.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** This study uses only public database data.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

| Abbreviation | Meaning |
|---|---|
| ATC | Anatomical Therapeutic Chemical |
| COPD | Chronic Obstructive Pulmonary Disease |
| COX-2 | Cyclooxygenase-2 |
| DDSN | Drug–Drug Similarity Network |
| NSCLC | Non-Small Cell Lung Cancer |

#### **Appendix A. Repositionings and Statistics for DrugBank 5.0.9 DDSN**

*Appendix A.1. DDSN Zoomed Details*

**Figure A1.** The zoomed detail of the DDSN network built with drug–gene interaction data from DrugBank 5.0.9, which shows the relative position of mepolizumab within cluster *C*<sup>0</sup> (brown nodes) with a red arrow (→). Our repositioning pipeline predicts that mepolizumab—listed as antineoplastic in DrugBank 5.0.9—also acts as a drug with level 1 ATC code R (*Respiratory system*), confirmed by the more recent DrugBank version 5.1.8.

**Figure A2.** The zoomed detail of the DrugBank 5.0.9 DDSN network showing the relative position of naloxone within cluster *C*<sup>0</sup> (brown nodes) with a red arrow (→). Our repositioning pipeline predicts that naloxone—listed as opioid overdose antidote in DrugBank 5.0.9—also acts as a drug with level 1 ATC code N (*Nervous system*), confirmed by the more recent DrugBank version 5.1.8.

**Figure A3.** The DrugBank 5.0.9 DDSN network's zoomed detail shows the confirmed repositionings within cluster *C*<sup>2</sup> (green nodes) with red diamonds (♦). Our repositioning pipeline predicts that torasemide and quinetazone (both with ATC level 1 code C–*Cardiovascular system* in DrugBank 5.0.9), methazolamide, acetazolamide, dorzolamide, and brinzolamide (all with ATC level 1 code S–*Sensory organs* in DrugBank 5.0.9) are *Genito urinary system and sex hormones* drugs (first level ATC G). Zonisamide (N–*Nervous system*) is a brown node (cluster *C*0) but lies in the close vicinity of cluster *C*2; therefore, it is also predicted at level 1 ATC code G.

*Appendix A.2. DDSN Cluster Histograms*

**Figure A4.** Histograms of level 1 ATC codes in the DrugBank 5.0.9 DDSN clusters holding drug repositionings confirmed by DrugBank 5.1.8: cluster *C*<sup>0</sup> (brown nodes) in the left panel and cluster *C*<sup>2</sup> (green nodes) in the right panel. The dominant property in cluster *C*<sup>0</sup> is N–*Nervous system*, with many subcluster drugs with level 1 ATC codes A, R, and C (*Alimentary tract and metabolism*, *Respiratory system*, and *Cardiovascular system*, respectively). The dominant properties in cluster *C*<sup>2</sup> are G, C, and D (*Genito urinary system and sex hormones*, *Cardiovascular system*, and *Dermatologicals*, respectively).

#### **Appendix B. Repositionings and Statistics for DrugBank 5.1.8 DDSN**

*Appendix B.1. DDSN Zoomed Details*

**Figure A5.** The DrugBank 5.1.8 DDSN network's zoomed detail shows four repositionings within cluster *C*<sup>1</sup> (green nodes) with red diamonds (♦). Our repositioning pipeline predicts that simvastatin, fluvastatin, lovastatin, and atorvastatin (currently at ATC level 1 code C–*Cardiovascular system*) have properties described by the level 1 ATC code J–*Anti infectives for systemic use*.

**Figure A6.** The DrugBank 5.1.8 DDSN network's zoomed detail shows a repositioning within cluster *C*<sup>2</sup> (light blue nodes) with a red diamond (♦). Our repositioning pipeline predicts that meloxicam (currently at ATC level 1 code M–*Musculo-skeletal system*) has properties described by the level 1 ATC code L–*Antineoplastic and immunomodulating agents*.

**Figure A7.** The DrugBank 5.1.8 DDSN network's zoomed detail shows a repositioning within cluster *C*<sup>2</sup> (light blue nodes) with a red diamond (♦). Our repositioning pipeline predicts that theophylline (currently at ATC level 1 code R–*Respiratory system*) has properties described by the level 1 ATC code L–*Antineoplastic and immunomodulating agents*.

**Figure A8.** The DrugBank 5.1.8 DDSN network's zoomed detail shows a repositioning within cluster *C*<sup>2</sup> (light blue nodes) with a red diamond (♦). Our repositioning pipeline predicts that chloroquine (currently at ATC level 1 code P–*Antiparasitic products, insecticides and repellents*) has properties described by the level 1 ATC code L–*Antineoplastic and immunomodulating agents*.

*Appendix B.2. DDSN Cluster Histograms*

**Figure A9.** Histograms of level 1 ATC codes in the DrugBank 5.1.8 DDSN clusters holding drug repositionings confirmed by literature review: cluster *C*<sup>0</sup> (brown nodes), cluster *C*<sup>1</sup> (green nodes), cluster *C*<sup>2</sup> (light blue nodes), and cluster *C*<sup>4</sup> (pink nodes). The dominant property is N–*Nervous system* in cluster *C*<sup>0</sup>, J–*Anti-infectives for systemic use* in cluster *C*<sup>1</sup>, L–*Antineoplastic and immunomodulating agents* in cluster *C*<sup>2</sup>, and A–*Alimentary tract and metabolism* in cluster *C*<sup>4</sup>.

#### **References**


## *Article* **In Silico Screening of Available Drugs Targeting Non-Small Cell Lung Cancer Targets: A Drug Repurposing Approach**

**Muthu Kumar Thirunavukkarasu <sup>1</sup> , Utid Suriya <sup>2</sup> , Thanyada Rungrotmongkol 3,4,\* and Ramanathan Karuppasamy 1,\***


**Abstract:** The RAS–RAF–MEK–ERK pathway plays a key role in malignant cell progression in many tumors. The high structural complexity of the upstream kinases limits treatment progress. Thus, MEK inhibition is a promising strategy, since MEK is easy to inhibit and is a gatekeeper for the many malignant effects of its downstream effector. Even though MEK inhibitors are under investigation in many cancers, drug resistance continues to be the principal limiting factor to achieving cures in patients with cancer. Hence, we performed a high-throughput virtual screening to overcome this bottleneck through the discovery of a dual-targeting therapy for cancer treatment. Here, a total of 11,808 DrugBank molecules were assessed through high-throughput virtual screening for their activity against MEK. Further, the Glide docking, MLSF and prime-MM/GBSA methods were implemented to extract the potential lead compounds from the database. Two compounds, DB012661 and DB07642, outperformed the others in all the screening analyses. Further, the study results reveal that the lead compounds also have a significant binding capability with the co-target PIM1. Finally, the SIE-based free energy calculation reveals that the binding of the compounds was mainly affected by the van der Waals interactions with the MEK receptor. Overall, the in silico binding efficacy of these lead compounds against both MEK and PIM1 could be of significant therapeutic interest to overcome drug resistance in the near future.

**Keywords:** drug-repositioning; MEK inhibitor; MM/GBSA; Glide docking; MD simulation; MM/PBSA

**Citation:** Thirunavukkarasu, M.K.; Suriya, U.; Rungrotmongkol, T.; Karuppasamy, R. In Silico Screening of Available Drugs Targeting Non-Small Cell Lung Cancer Targets: A Drug Repurposing Approach. *Pharmaceutics* **2022**, *14*, 59. https://doi.org/10.3390/pharmaceutics14010059

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 30 September 2021 Accepted: 20 December 2021 Published: 28 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Lung cancer accounts for about a quarter of all cancer deaths, 82% of which are caused directly by cigarette smoking. Advanced therapies for managing the early and metastatic stages of lung cancer have not been discovered over the past 40 years. Although some treatment measures are available to control the earlier stages of lung cancer, poor outcomes reduce the overall patient survival rates. One of the common clinical symptoms of lung cancer is frequent coughing over a prolonged period. For example, patients in the United States who had been coughing for three weeks were eventually diagnosed with lung cancer [1]. In the United Kingdom, smoking is responsible for 71% of lung cancer deaths, whereas 1% of the deaths were reported in passive smokers. Canadian researchers report that lung cancer deaths were 15% higher in smokers than in non-smokers. In India, 9.3% of cancer deaths, in both male and female patients, were associated with lung cancer [2]. The low lung cancer survival rates reflect the high proportion of patients diagnosed with metastatic disease (57%). Currently, surgery, radiation therapy, chemotherapy and targeted therapies are used to treat lung cancer patients. Among these methods, targeted therapies have demonstrated better outcomes during cancer treatment [3,4]. Gene expression and mutational studies are used to identify definitive targets for lung cancer. The incidence of particular mutations varies depending on ethnicity and location. EGFR mutations were reported at a rate of 10% in Caucasians, whereas a rate of 60% was reported in Asian people [5]. Ultimately, the tyrosine kinase pathway plays a major role in the tremendous increase in lung cancer deaths. The mitogen-activated protein kinase (MAPK) pathway is one of the promising growth signaling pathways. The aberrant activation of this pathway's intermediates leads to uncontrolled cell growth and differentiation. In many cancer types, concomitant mutations occur in RAS and BRAF, which causes the consecutive activation of ERK, which in turn activates many transcription factors [6]. Hence, targeting the pathway receptors using checkpoint inhibitors leads to effective therapy in most cancers. However, the strong association between RAS and GTP impedes the direct inhibition of RAS. The lack of understanding of its allosteric sites also hinders the development of RAS-targeting inhibitors [7]. The next intermediate, RAF, is another important target when BRAF<sup>V600</sup> mutations are present. Nevertheless, acquired resistance to RAF-selective inhibitors is the reason for the constant activation of the MAPK pathway in many cancers [8]. Therefore, the MAPK pathway can instead be cut off downstream at a protein kinase called MEK. MEK is a key node in the MAPK pathway, and its only known substrate is its downstream effector, ERK.
In the recent decade, hundreds of MEK inhibitors have been discovered that target the allosteric binding site of MEK [9]. Although these selective inhibitors are effective at the allosteric site, their poor toxicity profiles limit their treatment progress. For instance, the most potent kinase inhibitors, such as binimetinib, selumetinib, cobimetinib and refametinib, caused diarrhea, elevated lipase levels and rashes as adverse effects [10,11]. It is important to note that a recently approved MEK-selective inhibitor, trametinib, showed the greatest efficacy in BRAF-mutant tumors in combination with dabrafenib [12,13]. However, trametinib alone showed additional side effects during the treatment period in non-small cell lung cancer (NSCLC) patients. In particular, trametinib affects the ocular region, and a severe ocular complication may lead to permanent vision loss [14]. In addition to these toxic effects, resistance to several MEK inhibitors arises in BRAF-mutant tumors through the activation of adjacent signaling pathway receptors. The initial solution to the problem of resistance to therapy is the dual inhibition of crucial targets with the administration of a single therapy. It is also interesting to note that the inhibition of multiple kinases can produce better outcomes during clinical trials. For instance, the combination of MEK and JAK2/STAT3 pathway inhibition reduces the potential impact of drug resistance in colon cancer [15]. Similarly, a combination of MEK and PI3K inhibitors is a powerful treatment option for NSCLC patients who have developed resistance to EGFR–TKIs [16]. Note that dual inhibitors are more beneficial than combinations of inhibitors in terms of cost and the time taken for approval. Note also that PIM1 is a critical effector facilitating cross-talk across several neighboring pathways, in particular the MAPK pathway. Recent studies highlight that MEK inhibitors lead to the increased expression of PIM1, thereby increasing cancer cell growth [17,18]. Keeping this in mind, we framed an in silico drug repurposing workflow to screen for potential inhibitors that act against both MEK and PIM1.

Drug repurposing has become one of the most popular ways of increasing the efficiency and cost-effectiveness of drug development. Importantly, the discovery of novel indications for existing drugs is the major application of drug repurposing strategies. In recent years, almost 30% of the FDA-approved drugs and vaccines were discovered by in silico approaches. For instance, the discovery of zanamivir was made possible by a computer-aided drug design technique based on the crystal structure of influenza virus neuraminidase [19]. Combining machine learning principles with virtual screening would enhance the accuracy of screening results. Machine-learning-based approaches produce more reliable results and provide faster outcomes by learning from existing experimental data [20]. Hence, we incorporated machine-learning-based scoring functions (MLSF) to screen the potential compounds against the MEK receptor, since classical scoring functions have attained a plateau in their performance in binding affinity prediction [21]. We are confident that the outcome of this study is of immense importance for experimental biologists involved in the screening of MEK inhibitors.

#### **2. Methodology**

#### *2.1. Dataset*

Structural information on the proteins and ligand molecules was retrieved from the Protein Data Bank (PDB) and the DrugBank database, respectively. The 3D structures of the two proteins, MEK1 (PDB ID: 3W8Q) and PIM1 (PDB ID: 5KZI), were downloaded in PDB format [22,23]. The DrugBank molecules were then downloaded as three subsets containing FDA (Food and Drug Administration)-approved drugs (*n* = 3085), experimental drugs (*n* = 5689) and investigational drugs (*n* = 3034) for screening.

#### *2.2. Protein and Ligand Preparation*

Preparation of the receptor molecules was carried out using the Protein Preparation Wizard in the Maestro workspace. The four major pre-processing steps were: (i) bond order assignment, (ii) addition of missing hydrogen atoms, (iii) creation of zero-order bonds to metal atoms and (iv) disulfide bond creation. The preprocessed proteins were then subjected to hydrogen bond optimization. During the optimization, the protonation state of each amino acid residue was calculated, and the pH was adjusted to 7 ± 0.5 using the predicted pKa values. The predicted and adjusted pKa values of the amino acid residues of the proteins are presented in Supplementary Materials Table S1. If the predicted pKa was greater than the pH value, the amino acid functional groups were protonated during the optimization process. On the other hand, if the pKa was less than the pH value, those functional groups were deprotonated. Note that, if the pKa and pH values were equal, 50% protonation and 50% deprotonation took place. Subsequently, the excess water molecules were removed because of their higher occupancy at the receptor binding pocket. Finally, a restrained minimization was run until the heavy atoms converged to an RMSD (root-mean-square deviation) of 0.30 Å [24].
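The dependence of the protonation state on pKa and pH follows the Henderson–Hasselbalch relation. The short sketch below is an illustration only and is not part of the Maestro workflow; the function name `protonated_fraction` and the example pKa values are assumptions introduced here.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable group that is in its protonated form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical pKa values, chosen only to show the three regimes.
    for pka in (4.0, 7.0, 10.5):
        frac = protonated_fraction(pka, ph=7.0)
        print(f"pKa = {pka:4.1f}: {frac:.1%} protonated at pH 7.0")
```

A group whose pKa equals the pH is half protonated, a group with a pKa well above the pH is almost fully protonated, and a group with a pKa well below the pH is almost fully deprotonated.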

The ligand molecules were processed using the LigPrep module in the Maestro workspace. Initially, all the ligand molecules were subjected to energy minimization using the OPLS\_2005 (optimized potentials for liquid simulations) force field at pH 7.0 ± 2. To avoid generating additional stereoisomers, the original chirality of every ligand molecule was retained. Notably, only one structural conformation was generated for each ligand molecule.

#### *2.3. Binding Site Analysis and Grid Generation*

Binding site prediction and pocket druggability analysis are important prerequisites in a drug repurposing strategy [25,26]. Here, we used the SiteMap algorithm to predict the binding and druggable pockets present in the target receptor. SiteMap predicts hot spots based on the number of hydrogen bond donors and acceptors, hydrophobic atoms and concave sites present in the receptor [27]. Later, grid generation was executed using the Receptor Grid Generation wizard. The grid box was generated around the predicted hot spot residues with a partial charge cut-off of 0.25 and a scaling factor of 1.

#### *2.4. Glide Docking and MM/GBSA Analysis*

All prepared ligand molecules were screened through the high-throughput virtual screening (HTVS) method and then docked into the predicted binding sites using the Glide XP (extra-precision) protocol. We used a flexible docking method with a van der Waals radii scaling factor of 1 to soften the receptor binding site. The atoms of the protein with partial charges less than or equal to 0.25 were scaled with a van der Waals scale factor of 0.8 [28]. Later, the ligand interaction diagram was visualized for an in-depth understanding of the ligand contacts with the target receptor. Further, the docking score was revalidated by binding free energy calculations using Prime-MM/GBSA (molecular mechanics with generalized Born surface area) analysis. The XP-docked complexes were further subjected to minimization using the local optimization feature with the OPLS\_2005 force field. Prime estimates the binding free energy by comparing the energy of the complex with the energies of the individual protein and ligand molecules [29].
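In its usual end-point form, this comparison can be written as follows. The symbols are introduced here for illustration and do not appear in the original text:

$$
\Delta G_{\text{bind}} = E_{\text{complex}} - \left( E_{\text{protein}} + E_{\text{ligand}} \right)
$$

where each energy is evaluated for the minimized structure, so that a more negative value indicates more favorable predicted binding.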

#### *2.5. Scoring Functions*

#### 2.5.1. RF-Score Analysis

The MLSF analyzes the molecular docking outputs of existing protein–ligand complexes to predict the binding affinity of unknown compounds [30]. Here, we used RF-Score-VS, which uses a random forest algorithm to predict the binding affinity of the molecules. It is a standalone program (https://github.com/oddt/rfscorevs, accessed on 3 June 2021) that was run from the Ubuntu terminal. In this tool, the random forest model was set to generate a maximum of 500 trees. It is worth noting that the random forest model used in this study implicitly captures binding effects that are hard to model explicitly. Protein and ligand molecules were supplied in PDB and SDF format, respectively, for the RF-Score calculation.

#### 2.5.2. Tanimoto Coefficient Calculation

The Tanimoto coefficient is one of the most important similarity measures in the virtual screening process [31]. The BulkTanimotoSimilarity() function in the RDKit package takes a query fingerprint and a collection of target fingerprints and returns a list with the similarity of the query to each target. This metric estimates the proportion of common bits between two chemical fingerprints on a scale of 0 to 1. In this section, the Tanimoto similarity of every DrugBank compound was calculated against the fingerprint generated from trametinib.
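To make the metric concrete, the sketch below computes the Tanimoto coefficient on fingerprints represented as plain sets of on-bit indices. It is a dependency-free illustration of the calculation and not the RDKit code used in the study; the helper names `tanimoto` and `bulk_tanimoto` and the toy fingerprints are assumptions.

```python
from typing import Iterable, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def bulk_tanimoto(query: Set[int], targets: Iterable[Set[int]]) -> List[float]:
    """Similarity of one query fingerprint to each target, in input order."""
    return [tanimoto(query, t) for t in targets]


if __name__ == "__main__":
    # Toy on-bit sets; real fingerprints would come from a cheminformatics toolkit.
    reference = {1, 4, 7, 9, 12}
    library = {
        "cmpd_A": {1, 4, 7, 9, 15},  # shares 4 of 6 distinct bits
        "cmpd_B": {2, 3, 5},         # shares no bits
        "cmpd_C": {1, 4, 7, 9, 12},  # identical to the reference
    }
    scores = bulk_tanimoto(reference, library.values())
    for name, score in zip(library, scores):
        print(f"{name}: {score:.3f}")
```

A coefficient of 1 means that the two fingerprints set exactly the same bits, and 0 means that they share none.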

#### *2.6. Molecular Dynamics (MD) Simulations*

The complex structures of the two focused compounds and the known drug obtained from molecular docking were subjected to MD simulations that approximate physiological motion. The AMBER ff14SB force field and the generalized AMBER force field version 2 (GAFF2) were employed to treat the bonded and non-bonded interaction parameters of all simulated complexes [32]. The TIP3P water model [33] was used to solvate the system with a minimum padding of 10.0 Å between the protein surface and the solvation box edge. Then, either sodium or chloride ions were randomly added to neutralize the overall charge of the molecular system. Minimization of the hydrogen atoms and water molecules was performed using 500 steps of steepest descent (SD) followed by 1500 steps of conjugate gradient (CG) minimization. All studied systems were run under periodic boundary conditions with the isothermal–isobaric (NPT) scheme, following previous studies [34–38]. The electrostatic interactions were treated by the particle mesh Ewald summation method [39], whereas the SHAKE algorithm [40] was used to constrain all covalent bonds involving hydrogen atoms. The temperature was gradually increased from 10 to 310 K and controlled by the Langevin thermostat [41] with a collision frequency of 2 ps<sup>−1</sup>. In addition, the Berendsen barostat [42] was employed to control the pressure with a relaxation time of 1 ps. Each system was subsequently simulated under the NPT ensemble (310 K, 1 atm) for a 100 ns production run with a time step of 2 fs. The root-mean-square deviation (RMSD) and hydrogen bond (H-bond) occupancies were calculated with the cpptraj module, while the per-residue decomposition energy (∆*G*<sub>binding</sub><sup>residue</sup>) was estimated by MM/PBSA.py implemented in AMBER16.

#### *2.7. End-Point Binding Free Energy Calculations*

To evaluate the ligand-binding capability, the total binding free energy (∆*G*<sub>bind</sub>) of each complex was estimated based upon the solvated interaction energy (SIE) approach [43]. In theory, ∆*G*<sub>bind</sub> can be estimated as the summation of the van der Waals (*E*<sub>vdW</sub>), electrostatic (*E*<sub>ele</sub>(*D*<sub>in</sub>)), reaction field (∆*G*<sub>RF</sub>(*ρ*, *D*<sub>in</sub>)) and cavity (*γ*∆*SA*(*ρ*)) terms, scaled by a coefficient *α*, plus a constant (*C*), which is expressed as the following equation:

$$
\Delta G_{\text{bind}}\left(\rho, D_{\text{in}}, \alpha, \gamma, C\right) = \alpha \left[ E_{\text{vdW}} + E_{\text{ele}}(D_{\text{in}}) + \Delta G_{\text{RF}}(\rho, D_{\text{in}}) + \gamma \, \Delta SA(\rho) \right] + C
$$

where *D*<sub>in</sub> is the solute interior dielectric constant. *E*<sub>vdW</sub> and *E*<sub>ele</sub> denote the intermolecular van der Waals and Coulombic interaction energies in the bound state, respectively. ∆*G*<sub>RF</sub> is the electrostatic polarization component of the solvation free energy contribution to binding, and ∆*G*<sub>cavity</sub> (*γ*∆*SA*) represents the nonpolar contribution of the solvation free energy to binding. The coefficients used in every calculation were *α* = 0.105, *γ* = 0.013 and *C* = −2.89.
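As a worked illustration of how the terms combine, the sketch below evaluates the SIE expression with the coefficients quoted above. It is a minimal example rather than the SIE program itself; the function name, the energy inputs and the units (kcal/mol for energies, Å<sup>2</sup> for the surface area change) are assumptions made for the illustration.

```python
ALPHA = 0.105  # global scaling coefficient quoted in the text
GAMMA = 0.013  # surface-area coefficient quoted in the text
C = -2.89      # constant term quoted in the text


def sie_binding_free_energy(e_vdw: float, e_ele: float, dg_rf: float,
                            delta_sa: float, alpha: float = ALPHA,
                            gamma: float = GAMMA, c: float = C) -> float:
    """Combine the SIE terms: alpha * (E_vdW + E_ele + dG_RF + gamma * dSA) + C."""
    return alpha * (e_vdw + e_ele + dg_rf + gamma * delta_sa) + c


if __name__ == "__main__":
    # Hypothetical component values, not results from the study.
    dg = sie_binding_free_energy(e_vdw=-55.0, e_ele=-10.0,
                                 dg_rf=20.0, delta_sa=-800.0)
    print(f"Estimated binding free energy: {dg:.3f} kcal/mol")
```

Because every term inside the bracket is multiplied by the same coefficient, a large van der Waals term dominates the estimate whenever the Coulombic and reaction field terms largely cancel.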

#### **3. Result and Discussion**

#### *3.1. Binding Site Prediction*

The druggable binding pockets of the MEK1 receptor were identified and characterized using the SiteMap module. The five best binding sites of MEK1 and their physicochemical characteristics predicted by SiteMap are tabulated in Table 1. The larger number of hydrophobic residues at the top three sites indicates better pocket adaptation for ligand binding. Notably, the druggability score (Dscore) of each pocket was in the range of 0.6 to 1. Sites 4 and 5 have a Dscore of less than 0.7, which implies the poor druggability of those pockets. By contrast, sites 1, 2 and 3 have a Dscore of ~1, which indicates that these sites strongly favor the binding of drug-like molecules to their pocket residues [44]. Although the enclosure of site 3 (0.673) is lower, its higher Dscore (1.005) and SiteScore (0.974) make the pocket suitable for molecule binding. The top three sites, which displayed significant physicochemical characteristics for the binding of drug-like molecules, are shown in Figure 1. Among these three binding sites, site 1 encompasses the end of the activation loop region where the substrate ERK binds to MEK. In addition, site 1 comprises the amino acid residues important for the activation of the MEK receptor and the DFG motif, which is an important motif involved in the MEK phosphorylation process. Site 1 also comprises amino acid residues such as VAL127, SER212, LYS97 and VAL211, as well as the ATP binding site [45]. Since site 1 comprises these crucial pockets, we used the results obtained from site 1 during the validation step and the other analyses.

**Table 1.** The top five binding sites of MEK1 receptor predicted by sitemap.


**Figure 1.** (**a**) Schematic representation of top three predicted binding sites. (**b**) Functionally important residues in site 1.

#### *3.2. Validation of Molecular Docking*

The validation of Glide XP docking and RF-Score-VS was accomplished using an external dataset (Table S2). The dataset consists of 25 active compounds and 75 decoy compounds against mitogen-activated protein kinase, which were randomly sampled from the Database of Useful Decoys-Enhanced (DUD-E) using the 'sample()' function in pandas. The results were incorporated into the Maestro workspace for enrichment analysis [46], and the 'enrichment calculator' tool was used to evaluate the screening process. In both screening analyses, the compounds were sorted by the respective scoring function, namely the Glide XP score for molecular docking and RF-Score-VS\_v2 for the RF-Score-VS analysis. Later, the ability of the screening methodologies to differentiate the actives from the decoys was tested by producing a receiver operating characteristic (ROC) curve (Figure S1). A total of 11 decoys were outranked during the screening process using RF-Score-VS. On the other hand, seven decoys were outranked during the molecular docking analysis. The smaller number of outranked compounds indicates the effectiveness of these screening algorithms. Importantly, the ROC values of docking and RF-Score-VS were 0.902 and 0.850, respectively. Moreover, the area under the curve (AUC) was calculated as 0.801 and 0.762 for molecular docking and RF-Score-VS, respectively. Since the AUC values of docking and RF-Score-VS are above 0.7, we believe that both algorithms have the potential to discriminate the active compounds from the target database. Further, we assessed Pearson's and Spearman's correlations between the docking score and the experimentally determined binding affinity of the 25 active compounds. It is worth noting that Pearson's and Spearman's correlation values of 0.758 and 0.818, respectively, were observed. All of these findings indicate that the lead compounds produced through these screening approaches may potentially be effective in further experimental work.
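The AUC used in this validation can be read as the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy. The sketch below computes it directly from that pairwise definition. It is an illustration on synthetic scores and not the Maestro enrichment calculator; the function name `roc_auc` and all score values are assumptions.

```python
import random
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float],
            higher_is_better: bool = True) -> float:
    """AUC as the fraction of (active, decoy) pairs ranked correctly; ties count 0.5."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    sign = 1.0 if higher_is_better else -1.0
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if sign * a > sign * d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy example is repeatable
    # Synthetic docking-like scores (more negative = better); not data from the study.
    actives = [rng.gauss(-8.0, 1.5) for _ in range(25)]
    decoys = [rng.gauss(-6.0, 1.5) for _ in range(75)]
    print(f"AUC = {roc_auc(actives, decoys, higher_is_better=False):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from decoys, which is why a value above 0.7 is taken here as evidence of useful discrimination.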

#### *3.3. Virtual Screening*

A total of 11,808 molecules from the three subsets of DrugBank were screened through the HTVS docking method. Later, the screened hit molecules (*n* = 7075) were docked into the best predicted binding site (site 1) using the Glide XP method. Note that trametinib was used as a reference compound in all the analyses. The XP docking score of the reference compound in site 1 (−3.423 kcal/mol) was then used as a threshold for further screening of the hit molecules. Subsequently, the top 50% of the molecules resulting from the XP docking on site 1 were redocked to site 2 and site 3. A total of 3125 and 2813 compounds were predicted to bind better than the reference compound on site 2 and site 3, respectively. The results from the docking study were then integrated to eliminate false positive compounds. The results indicate that 2468 compounds were able to bind tightly to all three binding sites predicted by the algorithm.

Recently, machine-learning-based scoring functions have evolved to measure the binding affinity of compounds using multiple characteristic features. In particular, RF-Score-VS achieves a remarkable hit rate of up to 88.6% across the DUD-E targets [21]. Hence, we analyzed the binding ability of all the screened hit compounds using RF-Score-VS. Notably, the reference compound trametinib showed an RF-Score of 6.565. A total of 5152 compounds were ranked better than the reference compound in the RF-Score-VS analysis. The comparison of the docking study and the RF-Score calculation yielded a total of 1654 compounds. These compounds were screened through the Tanimoto coefficient calculation using the RDKit package. Fingerprints of all the compounds were generated and tested for structural similarity against the reference compound. The Tanimoto coefficients of the screened hit compounds are tabulated in Table S3. Here, we chose a Tanimoto coefficient of 0.6 as the threshold value for screening the compounds [47]. Overall, 368 compounds had a Tanimoto coefficient above 0.6 and were taken forward for further screening studies.
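The successive filters described above amount to intersecting sets of compound identifiers that beat the reference compound under each criterion. The sketch below shows that logic on toy data. Only the site 1 docking reference (−3.423 kcal/mol) and the RF-Score reference (6.565) come from the text; the function name `consensus_hits`, the compound names, the site 2 reference and all scores are hypothetical.

```python
from typing import Dict, Optional, Set


def consensus_hits(docking_by_site: Dict[str, Dict[str, float]],
                   docking_refs: Dict[str, float],
                   rf_scores: Dict[str, float],
                   rf_ref: float) -> Set[str]:
    """Compounds that beat the reference at every site and in the RF score.

    Docking scores: more negative is better. RF scores: higher is better.
    """
    hits: Optional[Set[str]] = None
    for site, scores in docking_by_site.items():
        passing = {cid for cid, s in scores.items() if s < docking_refs[site]}
        hits = passing if hits is None else hits & passing
    rf_passing = {cid for cid, s in rf_scores.items() if s > rf_ref}
    return (hits or set()) & rf_passing


if __name__ == "__main__":
    docking = {
        "site1": {"cmpd_a": -5.0, "cmpd_b": -4.0, "cmpd_c": -2.0},
        "site2": {"cmpd_a": -6.0, "cmpd_b": -1.0, "cmpd_c": -7.0},
    }
    refs = {"site1": -3.423, "site2": -3.0}  # site2 value is made up
    rf = {"cmpd_a": 7.0, "cmpd_b": 7.5, "cmpd_c": 6.0}
    print(sorted(consensus_hits(docking, refs, rf, rf_ref=6.565)))
```

Requiring agreement between independent scoring schemes is a simple way to reduce false positives, at the cost of discarding compounds that score well under only one of them.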

#### *3.4. MM/GBSA Analysis*

Recent literature highlights that the total binding free energy values predicted by the MM/GBSA calculation correlate well with experimentally measured biological activity [48]. Thus, Prime-MM/GBSA was implemented as a post-scoring process for the validation of the screened hit molecules. The pose viewer file generated during the Glide XP docking on site 1 was used as the input file for this analysis. The results of the MM/GBSA studies on the top 15 hit compounds and their associated energy values are presented in Table 2. Moreover, the replicability of the binding affinity from Glide docking was evaluated through three-fold validation of XP docking on the 15 hit compounds. The values obtained during the three iterations are presented in Table S4. It is evident from the table that 14 of the hit compounds displayed a better docking score than the reference compound in all three docking processes. Although the docking score differs slightly between docking simulations, the ranking of the compounds was largely the same as in the initial docking simulation. These results demonstrate the excellent consistency of the compound ranking during the docking simulation. It is evident from Table 2 that the binding free energy values of the compounds varied from −46 to −87 kcal/mol. The available literature indicates that lipophilicity and van der Waals energy are key factors for the proper binding of ligand molecules to the target receptor [49,50]. It is evident from the table that the lipophilicity of the compounds DB12661, DB07642, DB01771 and DB07177 was highly favorable for ligand binding. Although two compounds, DB01711 and DB07177, showed better lipophilicity, their minimal van der Waals interaction limits the total binding free energy of these compounds.

It should be noted that, except for DB02849 and DB04841, the binding of most compounds was highly favored by the van der Waals interaction energy. In particular, the compounds DB12661 and DB07642 displayed large van der Waals interaction energy values of −57.476 and −55.062 kcal/mol, respectively. Although these compounds show limited Coulombic potential, the dominant contribution of the van der Waals interaction energy is responsible for their tight binding to the MEK1 receptor. Moreover, the total binding free energy values of DB12661 and DB07642 were the most favorable (more negative than −80 kcal/mol) among the compounds investigated in this analysis. Hence, we believe that the compounds DB12661 and DB07642 may bind more tightly to the MEK1 receptor than the other compounds screened in our analysis.

**Table 2.** Molecular docking and binding free energy calculations of hit compounds against MEK1 receptor.


#### *3.5. Structural Properties of Hit Compounds*

The similarity between the ligand molecules was evaluated by mapping the pharmacophoric structure of the hit compounds. Here, we used the "2D structure alignment" utility in the Maestro workspace to align the structures of the compounds. Moreover, we predicted the ADME/T properties of the hit compounds using the QikProp module available in the Schrödinger package. These results are presented in Table 3. Note that the structures were aligned against the reference compound trametinib. Interestingly, trametinib and four hit compounds, DB08251, DB02849, DB04241 and DB12847, had pyridine as a common scaffold in their structures. Pyridine is an essential pharmacophore and an extraordinary heterocyclic system in the realm of anti-cancer drug development [51]. The hit compounds also displayed acceptable ADME/T values in the QikProp analysis. The central nervous system (CNS) activity prediction is one of the main properties in ADME/T prediction [52]. All the compounds except DB12661 and DB07642 were predicted to be CNS-inactive, as indicated by a CNS value of −2. Moreover, the other properties, such as stars (acceptable range: 0–5) and HOA (acceptable range: 1–3), were in the acceptable range for all the hit compounds.

**Table 3.** 2D structure of hit compounds with their predicted ADME properties.


DB08251 1 −2 −3.274 1

DB13174 0 −2 −2.449 2



(acceptable range: 1–3), were in the acceptable range in all the hit compounds.

(acceptable range: 1–3), were in the acceptable range in all the hit compounds.

**Table 3.** 2D structure of hit compounds with their predicted ADME properties.

**Table 3.** 2D structure of hit compounds with their predicted ADME properties.

**DrugBank ID 2D Strucure Stars a CNS b QPlogS c HOA d**

**DrugBank ID 2D Strucure Stars a CNS b QPlogS c HOA d**

Reference 1 −2 −8.042 1

Reference 1 −2 −8.042 1

analysis.

analysis.

*3.5. Structural Properties of Hit Compounds* 

*3.5. Structural Properties of Hit Compounds* 

DB12661 0 0 −5.177 3

DB12661 0 0 −5.177 3

DB12661 0 0 −5.177 3

DB12661 0 0 −5.177 3

DB12661 0 0 −5.177 3

**Table 3.** *Cont.*


*Pharmaceutics* **2022**, *14*, 11 of 19

*Pharmaceutics* **2022**, *14*, 11 of 19

DB07773 0 −2 −1.457 1

DB07773 0 −2 −1.457 1

DB02849 1 −2 −2.647 2

DB02849 1 −2 −2.647 2

DB04241 0 −2 −3.902 2

DB04241 0 −2 −3.902 2

DB07125 0 −2 −1.666 1

DB07125 0 −2 −1.666 1

DB01771 0 −2 −2.794 3

DB01771 0 −2 −2.794 3

DB02366 0 −2 −5.171 3

DB02366 0 −2 −5.171 3

**Table 3.** *Cont.*

a—The number of attributes or descriptor values that are beyond the 95% range of similar values for identified drugs; b—predicted central nervous system activity; c —predicted aqueous solubility; d— Human Oral Absorption; pink color indications in the 2D structure represent the 2D structure aligna—The number of attributes or descriptor values that are beyond the 95% range of similar values for identified drugs; b—predicted central nervous system activity; c —predicted aqueous solubility; d— Human Oral Absorption; pink color indications in the 2D structure represent the 2D structure aligna—The number of attributes or descriptor values that are beyond the 95% range of similar values for identified drugs; b—predicted central nervous system activity; c —predicted aqueous solubility; d— Human Oral Absorption; pink color indications in the 2D structure represent the 2D structure alignment of the compounds against the reference compound. a—The number of attributes or descriptor values that are beyond the 95% range of similar values for identified drugs; b—predicted central nervous system activity; c —predicted aqueous solubility; d— Human Oral Absorption; pink color indications in the 2D structure represent the 2D structure alignment of the compounds against the reference compound. <sup>a</sup>—The number of attributes or descriptor values that are beyond the 95% range of similar values for identified drugs; <sup>b</sup>—predicted central nervous system activity; <sup>c</sup>—predicted aqueous solubility; <sup>d</sup>—Human Oral Absorption; pink color indications in the 2D structure represent the 2D structure alignment of the compounds against the reference compound.
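The acceptance ranges quoted for the QikProp descriptors can be applied mechanically to the Table 3 values. The snippet below is a minimal sketch of such a check, not the QikProp workflow itself: the `AdmeRow` record and the helper names are hypothetical, and only the rows listed in Table 3 are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdmeRow:
    """One row of Table 3 (hypothetical container, for illustration only)."""
    drugbank_id: str
    stars: int      # descriptors outside the 95% range of known drugs
    cns: int        # predicted CNS activity (-2 means inactive)
    qplogs: float   # predicted aqueous solubility
    hoa: int        # human oral absorption


# Values copied from Table 3.
TABLE_3 = [
    AdmeRow("Reference", 1, -2, -8.042, 1),
    AdmeRow("DB12661", 0, 0, -5.177, 3),
    AdmeRow("DB07773", 0, -2, -1.457, 1),
    AdmeRow("DB02849", 1, -2, -2.647, 2),
    AdmeRow("DB04241", 0, -2, -3.902, 2),
    AdmeRow("DB07125", 0, -2, -1.666, 1),
    AdmeRow("DB01771", 0, -2, -2.794, 3),
    AdmeRow("DB02366", 0, -2, -5.171, 3),
    AdmeRow("DB08251", 1, -2, -3.274, 1),
    AdmeRow("DB13174", 0, -2, -2.449, 2),
]


def passes_ranges(row: AdmeRow) -> bool:
    """Check the ranges stated in the text: stars 0-5 and HOA 1-3."""
    return 0 <= row.stars <= 5 and 1 <= row.hoa <= 3


def is_cns_inactive(row: AdmeRow) -> bool:
    """A CNS value of -2 marks a compound as predicted CNS-inactive."""
    return row.cns == -2


within_range = [r.drugbank_id for r in TABLE_3 if passes_ranges(r)]
not_cns_inactive = [r.drugbank_id for r in TABLE_3 if not is_cns_inactive(r)]
```

With the rows available here, every compound falls inside the stated stars and HOA ranges, and `not_cns_inactive` collects the entries whose CNS value differs from −2.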

#### *3.6. Binding Mode Analysis*

The binding frequencies of the top 14 compounds at the three binding sites are shown in Figure S2. Notably, the binding positions of the compounds were more or less the same within site 1 and within site 3. Because the binding site residues are more widely dispersed in site 2, a few compounds, such as DB12661, DB02709, DB12847 and DB08251, were positioned differently from the other compounds. Most of the ligand molecules bound tightly in site 1, as indicated by the better docking scores in Table S3. Hence, the ligand binding conformations of the top hit compounds in site 1 were analyzed (Figure 2). The figure shows that all the hit compounds formed two hydrogen bond interactions with the MEK1 receptor, while the reference compound formed three hydrogen bond interactions with the binding site of MEK1. The iodoaniline moiety of trametinib forms a hydrogen bond with SER 194 of the MEK1 receptor. On the other hand, the cyclopropyl moiety of trametinib makes two hydrogen bond interactions with SER 194 and ASN 195 of the MEK1 receptor. Surprisingly, the quinazoline moiety of DB07642 and the methoxyphenyl group of DB12661 interacted with LYS 97, an important catalytic residue located at the rooftop of the MEK1 binding pocket. LYS 97, located in the β strand, is responsible for pairing the ATP phosphate with GLU 114 on an adjacent α helix [45]. Moreover, the oxygen atom linked to the pyrimidine group of DB12661 forms a hydrogen bond with MET 146, a hinge residue that connects the N and C lobes of the MEK1 receptor [53]. Most importantly, the quinazoline moiety of DB07642 forms an additional hydrogen bond with the activation loop residue SER 212, which plays a major role in the phosphorylation of MEK1. The literature shows that most MEK1/2 ligands interact strongly with SER 212 [54]. It is important to note that both lead compounds bind in the same pattern as the known MEK inhibitors. For instance, refametinib and RO4987655 interact with LYS 97 and SER 212 of the MEK receptor, while CI-1040, PD-0325901, cobimetinib, TAK-733 and GDC-0623 make contact with SER 212 [6,45]. Based on this evidence, we are confident that DB07642 and DB12661 make strong contact with the functionally important amino acid residues of MEK.

The compound DB12661, also known as urapidil, is an antihypertensive drug that inhibits α-adrenoceptor activity. It is worth noting that urapidil has also shown substantial inhibitory activity in several cancer cell lines [55]. The compound DB07642 (5-[1-(2-Fluorobenzyl)piperidin-4-yl]methoxyquinazoline-2,4-diamine), on the other hand, contains crucial pharmacophores. For instance, piperidine, a heterocyclic pharmacophore, is of great importance in drug development: piperidine derivatives effectively blocked several kinase targets (ERK-2, VEGFR-2 and Abl-1) in an in vitro assessment in the liver cancer cell line HepG2 [56]. Quinazoline is another important pharmacophore, present in many approved anticancer drugs such as erlotinib and vandetanib [57]. Overall, we believe that these compounds may block the activation of MEK, thereby reducing the risk of many malignant effects.

#### *3.7. Binding Analysis of Lead Compounds with PIM1*

The binding abilities of the lead compounds were also tested on the PIM1 receptor, which frequently engages in cross-talk with the MAPK pathway. The molecular docking and Prime MM/GBSA results for the lead compounds against PIM1 are tabulated in Table S5. The recently identified dual inhibitor (MEK1 and PIM1) KZ-02 was used as the reference compound in this analysis. KZ-02 obtained a docking score of −4.892 kcal/mol and a binding free energy of −50.61 kcal/mol. Notably, both lead compounds displayed better docking scores and binding free energies than the PIM1 reference compound. The interactions of the lead compounds with the PIM1 receptor are shown in Figure S3. Interestingly, DB07642 formed three hydrogen bonds and two π–π stacking interactions with the PIM1 receptor, which implies a greater binding potential of this compound for PIM1. Altogether, we hypothesize that the lead compounds identified in this study may significantly inhibit the activation of both MEK1 and PIM1.

**Figure 2.** Ligand interaction diagram of the hit compounds. (**a**) Reference; (**b**) DB12661; (**c**) DB07642 with MEK1 receptor.

#### *3.8. SIE-Based Free Energy of Binding*

Since molecular recognition and drug binding are dynamic processes, it is particularly important to evaluate the protein–ligand binding capabilities in a dynamic system. To this end, free energy of binding (∆*G*<sub>bind</sub>) calculations based on the solvated interaction energy (SIE) were used to predict the inhibitory activity, since ∆*G*<sub>bind</sub> is directly related to the experimental dissociation constant *K*<sub>d</sub> (∆*G*<sub>bind</sub> = −*RT* ln(1/*K*<sub>d</sub>)) [58]. The ∆*G*<sub>bind</sub> values of the two focused compounds, extracted from snapshots of the last 10 ns (90–100 ns), when the systems were considered to have reached equilibrium (Figure S4), are listed in Table 4 in comparison with trametinib. The molecular mechanics calculations showed that van der Waals (vdW) interaction is the main force contributing to the molecular complexation of all the focused compounds as well as trametinib (more than five- to six-fold larger than the electrostatic interaction energy), which agrees with the molecular docking study by Glide XP. Apart from that, the average ∆*G*<sub>bind</sub> values of the focused compounds and the reference drug were nearly the same, within the range of −8.4 to −7.5 kcal/mol. In particular, DB12661 possessed a slightly lower ∆*G*<sub>bind</sub> than trametinib (−8.41 and −8.17 kcal/mol, respectively), suggesting a marginally higher binding strength than the known drug. In contrast, DB07642 exhibited a slightly higher ∆*G*<sub>bind</sub> (−7.52 kcal/mol), which may imply a slight reduction in ligand binding capability. Nevertheless, we believe that both screened compounds are thermodynamically able to bind to MEK1 at the ATP-binding site and are of particular interest for subsequent experimental studies, with DB12661 as the first priority and DB07642 as the second.
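The relation ∆*G*<sub>bind</sub> = −*RT* ln(1/*K*<sub>d</sub>) can be inverted to give an order-of-magnitude feel for the reported free energies. The Python sketch below does this for the three ∆*G*<sub>bind</sub> values quoted in the text. The temperature of 298.15 K and the function names are assumptions made here for illustration; the resulting constants are rough estimates relative to a 1 M standard state, not measured values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_bind: float, temperature: float = 298.15) -> float:
    """Invert dG = -RT ln(1/Kd), i.e. Kd = exp(dG / RT). Result in mol/L."""
    return math.exp(dg_bind / (R_KCAL * temperature))


def dg_from_kd(kd: float, temperature: float = 298.15) -> float:
    """Forward relation dG = -RT ln(1/Kd), in kcal/mol."""
    return -R_KCAL * temperature * math.log(1.0 / kd)


# Average SIE-based binding free energies quoted in the text (kcal/mol).
DG_BIND = {"trametinib": -8.17, "DB12661": -8.41, "DB07642": -7.52}

# Assumed temperature: 298.15 K (not stated in this section).
KD = {name: kd_from_dg(dg) for name, dg in DG_BIND.items()}

# Smaller Kd means tighter predicted binding.
ranked = sorted(KD, key=KD.get)

for name in ranked:
    print(f"{name}: dG = {DG_BIND[name]:.2f} kcal/mol, Kd ~ {KD[name]:.2e} M")
```

Because the three free energies differ by less than 1 kcal/mol, the corresponding estimates stay within roughly one order of magnitude of each other, which matches the statement that the compounds bind with nearly the same strength.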

#### *3.9. Key Binding Residues*

To elucidate the key binding residues within the ATP-binding pocket located at the ATPase domain of MEK1, the per-residue decomposition free energy (∆*G*<sub>bind</sub><sup>residue</sup>) was predicted using the MM/GBSA method, and the total contribution of each amino acid in the known drug and focused complexes was plotted (Figure 3). Negative and positive decomposition free energy values indicate ligand stabilization and destabilization, respectively. The contributing residues in all the complexes stabilized the ligand mainly through van der Waals (vdW) interactions rather than electrostatic forces. This indicates that the two candidate compounds may rely on a mechanism of inhibitory action similar to that of trametinib. In particular, the amino acids that contributed most to trametinib binding (∆*G* < −1.0 kcal/mol) include ASN78, VAL82, LYS97, SER150, SER194, ASN195, LEU197 and ASP208, of which SER194 and ASN195 were also found in the docking pose. Among these, ASN78, LYS97 and ASN195 played a pivotal role in complex stabilization (∆*G* < −2.0 kcal/mol). For the candidate compounds, the key residues contributing to DB07642 binding are mostly the same as those responsible for trametinib binding (ASN78, VAL82, LYS97 and ASN195); one additional residue, MET143, was observed. Compound DB12661 was primarily stabilized by the hydrophobic residue VAL82 (∆*G*<sub>bind</sub><sup>residue</sup> = −2.73 kcal/mol), while seven other residues (LEU74, GLY80, VAL81, LYS97, HIS145, MET146 and LEU197) also stabilized the complex via vdW interactions, with ∆*G*<sub>bind</sub><sup>residue</sup> in the range of −2.0 to −1.0 kcal/mol. Nevertheless, one negatively charged residue, ASP208, was found to be slightly destabilizing, probably because of charge–charge repulsion in the complex.

To sum up, with a higher number of residues contributing strongly to its binding, DB12661 possessed, as expected, the lowest vdW interaction energy and total binding free energy (Table 4), with vdW interactions acting as the main driving force for complex formation. In contrast, some contributing residues (observed for both trametinib and DB12661) may be lost during MEK1–DB07642 complex formation, resulting in a slightly weaker (less negative) ∆*G*<sub>bind</sub> than that of trametinib. These results correlate well with the calculated SIE-based ∆*G*<sub>bind</sub> and each energy component, as listed in Table 4.

**Figure 3.** Per-residue decomposition free energy (∆*G*<sub>bind</sub><sup>residue</sup>) of the ATPase pocket of MEK1 for the binding of the (**A**,**B**) two screened compounds and (**C**) the known drug, trametinib.

**Table 4.** Average ∆*G*<sub>bind</sub> values (kcal/mol) of focused compounds as well as trametinib in complex with MEK1 calculated by the SIE method using α = 0.105, γ = 0.013 and C = −2.89, respectively.

| Compound | E<sub>vdW</sub> | E<sub>ele</sub> | Reaction Field | Cavity | ∆*G*<sub>bind</sub> |
|---|---|---|---|---|---|
| Trametinib | −51.05 ± 0.34 | −9.58 ± 0.20 | 19.25 ± 0.26 | −9.05 ± 0.07 | −8.17 ± 0.04 |
| DB12661 | −52.08 ± 0.32 | −4.29 ± 0.17 | 12.18 ± 0.24 | −8.52 ± 0.05 | −8.41 ± 0.04 |
| DB07642 | −43.91 ± 0.37 | −6.90 ± 0.21 | 14.62 ± 0.36 | −8.02 ± 0.06 | −7.52 ± 0.04 |
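The energy components in Table 4 can be combined to cross-check the reported ∆*G*<sub>bind</sub> values. The sketch below assumes the standard SIE functional form, in which the sum of the four components is scaled by α and shifted by the constant C, and it further assumes that the tabulated cavity term already includes the factor γ. The function and variable names are hypothetical.

```python
ALPHA = 0.105   # global scaling factor from the Table 4 caption
C_CONST = -2.89  # constant term from the Table 4 caption (kcal/mol)

# Mean energy components copied from Table 4 (kcal/mol):
# (E_vdW, E_ele, reaction field, cavity, reported dG_bind)
TABLE_4 = {
    "Trametinib": (-51.05, -9.58, 19.25, -9.05, -8.17),
    "DB12661": (-52.08, -4.29, 12.18, -8.52, -8.41),
    "DB07642": (-43.91, -6.90, 14.62, -8.02, -7.52),
}


def sie_dg_bind(e_vdw: float, e_ele: float, reaction_field: float,
                cavity: float) -> float:
    """Assumed SIE form: dG = alpha * (E_vdW + E_ele + dG_RF + cavity) + C.

    The cavity term is taken as already multiplied by gamma.
    """
    return ALPHA * (e_vdw + e_ele + reaction_field + cavity) + C_CONST


recomputed = {
    name: sie_dg_bind(*row[:4]) for name, row in TABLE_4.items()
}

# Ratio of vdW to electrostatic interaction energy for each complex.
vdw_to_ele = {name: row[0] / row[1] for name, row in TABLE_4.items()}

for name, row in TABLE_4.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}, reported {row[4]:.2f}, "
          f"vdW/ele = {vdw_to_ele[name]:.1f}")
```

Under these assumptions the recomputed values agree with the reported averages to within a few hundredths of a kcal/mol, and the vdW term exceeds the electrostatic term by more than five-fold in every complex, consistent with the discussion in Section 3.8.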

#### *3.10. Ligand–Protein Hydrogen Bonding*

Hydrogen bonding is one of the non-covalent interactions observed in the formation of protein–ligand complexes and can influence the ligand binding strength. Hence, the intermolecular hydrogen bond interactions were investigated in terms of percentage occupation and plotted in Figure 4. As expected, only a few strong hydrogen bonds were observed for the screened compounds and even for trametinib, since they are intrinsically hydrophobic ligands. The reference drug trametinib formed a strong hydrogen bond with ASN195 (65%), which was also observed in the docking pose (Figure 2). In addition, ALA76 and ASN78 moderately stabilized the drug, with hydrogen bond occupations of 45% and 44.5%, respectively, while ASN78 formed an additional hydrogen bond with the drug at 35% occupation. For the MEK1–DB12661 complex, the H atom in the backbone (-NH2) of MET146 exhibited a very strong hydrogen bond, while the polar H atom in the imidazole ring of HIS145 showed a moderate one. In the case of DB07642, three amino acid residues stabilized binding: ASN195, ASP208 and SER194. Among these, the H atom in the amino side chain of ASN195 displayed the highest hydrogen bond occupation (26%), while the other two residues exhibited only weak hydrogen bonds (≈17%). Altogether, these results suggest that intermolecular hydrogen bonds did not play a major role in complex stabilization for any of the studied compounds, including trametinib. Instead, ligand binding within the ATP-binding pocket of MEK1 was predominantly driven by vdW interactions, as discussed previously.

**Figure 4.** Percentage of hydrogen bond occupations contributing to the binding of two screened compounds (DB12661 and DB07642) and the trametinib within the ATPase domain of MEK1, using two criteria: a distance of ≤3.5 Å between the hydrogen bond donor (HD) and hydrogen bond acceptor (HA) and an angle of ≥120°.
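The percentage occupation of a hydrogen bond is the fraction of simulation snapshots in which both geometric criteria are met. The following Python sketch illustrates that calculation for a single donor–acceptor pair; it is not the analysis tool used in the study, the function name is hypothetical, and the per-frame distances and angles are made-up toy numbers.

```python
from typing import Sequence


def hbond_occupancy(distances: Sequence[float],
                    angles: Sequence[float],
                    max_distance: float = 3.5,
                    min_angle: float = 120.0) -> float:
    """Percentage of frames satisfying both hydrogen bond criteria.

    distances: donor-acceptor distance per frame, in angstroms.
    angles: donor-hydrogen-acceptor angle per frame, in degrees.
    """
    if len(distances) != len(angles):
        raise ValueError("distances and angles must have the same length")
    if not distances:
        return 0.0
    hits = sum(
        1 for d, a in zip(distances, angles)
        if d <= max_distance and a >= min_angle
    )
    return 100.0 * hits / len(distances)


# Toy data for four frames (illustrative only, not simulation output).
toy_distances = [2.9, 3.1, 3.6, 3.4]
toy_angles = [150.0, 118.0, 160.0, 125.0]

# Frame 2 fails the angle cutoff and frame 3 fails the distance cutoff.
toy_occupancy = hbond_occupancy(toy_distances, toy_angles)
```

Applying the same function to each donor–acceptor pair over the analyzed trajectory window would give per-residue percentages of the kind plotted in Figure 4.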


#### **4. Conclusions**

In conclusion, the DrugBank compounds were screened using several computational approaches to discover potential MEK inhibitors. Initially, molecular docking and various scoring functions were used to screen for molecules active against the MEK protein. Overall, the screening demonstrated that compounds such as DB07642 and DB12661 were able to bind tightly to the MEK receptor. Notably, the presence of crucial pharmacophore moieties in the hit compounds gives additional support to their inhibitory activity. In addition, the modes of action of these compounds were examined through the interactions of the ligands with the MEK active site residues. Most importantly, the compounds' inhibitory activity was also examined against the PIM1 receptor, since it is upregulated during the action of several MEK inhibitors. Further, the MD simulation and end-point free energy calculations validated the binding mode of the lead compounds with the MEK receptor. Thus, we anticipate that further experimental validation of our findings will help advance cancer treatment in the near future.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pharmaceutics14010059/s1. Table S1: Predicted pKa values of each amino acid residue under different conditions. Table S2: Compounds used for validation of docking and RF-Score-VS. Table S3: Multiple screening analysis of the compounds against MEK1. Table S4: Three-fold validation on glide docking analysis of hit compounds. Table S5: Molecular docking and binding free energy calculations of top hit compounds against PIM1 receptor. Figure S1: ROC analysis of screening methods. (a) Docking; (b) RF-Score-VS. Figure S2: Binding frequency of the ligand molecules on the top three predicted binding sites. The coloured dots represent the binding sites: site 1 (red), site 2 (orange), and site 3 (yellow). The coloured chemical structures depict the binding positions of the ligand molecules on the various binding sites: ligand bound in site 1 (purple); site 2 (green); site 3 (sky blue). Figure S3: Ligand interaction diagram of hit compounds (a) KZ-02 (Reference); (b) DB012661; (c) DB07642 with PIM1 receptor. Figure S4: Root-mean-square displacement (RMSD) plot for the backbone amino acid residues within a 5-Å sphere around the ligand. The data were derived from three independent runs with different initial velocities.

**Author Contributions:** M.K.T. performed the data collection, preparation and virtual screening. U.S. performed the molecular dynamics simulation and binding free energy analysis. M.K.T. and U.S. performed the result analysis and wrote the initial version of the manuscript. R.K. and T.R. conceived this study and are responsible for the overall design, interpretation, manuscript preparation, and communication. All authors have read and agreed to the published version of the manuscript.

**Funding:** T.R. acknowledges support from the Thailand Research Fund (grant number RSA6280085).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors (M.K.T. and R.K.) thank the management of Vellore Institute of Technology. The authors (U.S. and T.R.) would like to thank the Science Achievement Scholarship (SAST) of Thailand for the Ph.D. scholarship, the 90th Anniversary of Chulalongkorn University Fund (Ratchadaphiseksomphot), and the Overseas Presentations of Graduate Level Academic Thesis from Graduate School.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


## *Article* **A Single-Cell Network-Based Drug Repositioning Strategy for Post-COVID-19 Pulmonary Fibrosis**

**Albert Li <sup>1</sup>, Jhih-Yu Chen <sup>1</sup>, Chia-Lang Hsu <sup>2,3</sup>, Yen-Jen Oyang <sup>1</sup>, Hsuan-Cheng Huang <sup>4,</sup>\* and Hsueh-Fen Juan <sup>1,5,6,</sup>\***


**Abstract:** Post-COVID-19 pulmonary fibrosis (PCPF) is a long-term complication that appears in some COVID-19 survivors. However, there are currently limited options for treating PCPF patients. To address this problem, we investigated COVID-19 patients' transcriptome at single-cell resolution and combined it with biological network analyses to repurpose drugs for treating PCPF. We revealed a novel gene signature of PCPF. The signature is functionally associated with viral infection and lung fibrosis. Further, the signature performs well in diagnosing and assessing pulmonary fibrosis. Next, we applied a network-based drug repurposing method to explore novel treatments for PCPF. By quantifying the proximity between the drug targets and the signature in the interactome, we identified several potential candidates and provided a drug list ranked by their proximity. Taken together, we revealed a novel gene expression signature as a theragnostic biomarker for PCPF by integrating different computational approaches. Moreover, we showed that network-based proximity could be used as a framework to repurpose drugs for PCPF.

**Keywords:** single-cell RNA sequencing; COVID-19; pulmonary fibrosis; biological networks; drug repurposing

#### **1. Introduction**

Since 2019, the outbreak of the COVID-19 pandemic has caused millions of infections globally. Some patients may suffer from sequelae of the viral infection [1]. Post-COVID-19 pulmonary fibrosis (PCPF) is one of the long-term complications being emphasized recently [1]. Considering that the medical treatments for this disease are limited, it is crucial to leverage pharmacogenomic data to repurpose drugs for treating this disease. In this study, we combine single-cell analysis, machine learning, and network biology to identify a novel transcriptomic signature. We show that this signature is promising for assessing the disease and for surveying drugs that can potentially treat pulmonary fibrosis.

Previously, network-based methods have successfully repurposed drugs for treating several diseases [2–5]. Based on the property of biological networks, drugs with smaller proximity tend to be more effective than those with larger proximity [3]. However, since the choice of disease-related genes largely impacts results and inferences [6], whether the network-based approach can be applied to PCPF needs further verification.

**Citation:** Li, A.; Chen, J.-Y.; Hsu, C.-L.; Oyang, Y.-J.; Huang, H.-C.; Juan, H.-F. A Single-Cell Network-Based Drug Repositioning Strategy for Post-COVID-19 Pulmonary Fibrosis. *Pharmaceutics* **2022**, *14*, 971. https://doi.org/10.3390/pharmaceutics14050971

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 26 February 2022 Accepted: 29 April 2022 Published: 30 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Single-cell RNA-sequencing analysis (scRNA-seq) has been used to investigate the host response in severe COVID-19 cases [7]. Melms et al. discovered that two cell types, pathological and intermediate-pathological fibroblasts, are associated with the pathogenesis of pulmonary fibrosis; these cells strongly express markers of pathological fibroblasts (*CTHRC1*) and pathological extracellular matrix (*COL1A1* and *COL3A1*) [7]. They also revealed a clear relationship between fibrosis score and mortality, highlighting the importance of pulmonary fibrosis in patients' survival. Although the roles of pathological fibroblasts have been elucidated, whether these cells are applicable in clinical diagnosis, severity assessment, and treatment still needs further investigation.

Here, we aim to reveal a novel signature of PCPF by interrogating scRNA-seq data. We show that the signature can be used to diagnose and assess pulmonary fibrosis. Further, this signature can also be used to repurpose and prioritize potentially effective drugs for treating PCPF.

#### **2. Materials and Methods**

#### *2.1. Construction and Evaluation of the PCPF Signature*

The preprocessed single-cell gene expression profile underwent linear dimensionality reduction by principal component analysis (PCA). We used the Louvain algorithm to cluster the cells on the K-nearest neighbors (KNN) graph, which was constructed in the principal component (PC) space. We referred to the cell (sub)type information provided by Melms et al. [7] and annotated each cell cluster with the majority cell subtype in that cluster. Next, we made a case-control comparison to calculate the proportion difference of each cell cluster. To characterize the cluster with the greatest proportional change, we conducted differential gene expression analysis to compare the gene expression profiles of the cases and controls. We selected the top 200 up-regulated differentially expressed genes (DEGs) as the PCPF signature. We defined the signature score as the mean expression of the signature genes. We implemented the single-cell analysis with Scanpy [8].
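The last two steps (selecting the top up-regulated DEGs and averaging their expression) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Scanpy-based implementation used in the study: the function names, the significance cut-off `alpha`, the ranking by log fold change, and all gene names and values are hypothetical.

```python
from statistics import mean


def select_signature(deg_table, n_top=200, alpha=0.05):
    """Return up to n_top up-regulated genes, ranked by log fold change.

    deg_table: iterable of (gene, log_fold_change, adjusted_p) tuples.
    The alpha cut-off and the ranking key are assumptions for illustration.
    """
    up = [(gene, lfc) for gene, lfc, padj in deg_table if lfc > 0 and padj < alpha]
    up.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in up[:n_top]]


def signature_score(expression, signature):
    """Mean expression of the signature genes measured in one sample."""
    values = [expression[gene] for gene in signature if gene in expression]
    if not values:
        raise ValueError("no signature gene found in this sample")
    return mean(values)


# Toy differential expression results (hypothetical values).
deg_table = [
    ("GENE_A", 3.2, 1e-10),
    ("GENE_B", 2.9, 1e-8),
    ("GENE_C", 2.5, 1e-6),
    ("GENE_D", -1.8, 1e-5),  # down-regulated, excluded
    ("GENE_E", 0.4, 0.30),   # not significant, excluded
]
signature = select_signature(deg_table, n_top=2)

# Toy expression profile of one sample (hypothetical values).
sample = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 1.0}
score = signature_score(sample, signature)
```

With `n_top=200` and a real DEG table, `select_signature` would return a gene list of the same size as the PCPF signature, and `signature_score` would be applied to each sample in turn.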

We used DAVID (available online: https://david.ncifcrf.gov/; accessed in July 2021) [9] to infer the signature-related biological functions. We selected the Benjamini–Hochberg procedure for the adjustment of multiple hypothesis testing.
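For readers unfamiliar with the Benjamini–Hochberg adjustment, the following minimal sketch shows the standard step-up computation of adjusted *p*-values. DAVID performs this adjustment internally; the function name and the toy *p*-values below are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four enrichment terms (hypothetical values).
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
```

Each raw *p*-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.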

#### *2.2. Support Vector Machine (SVM)*

Samples from GSE32537 were randomly split so that 80% of the samples were used for model training and the remainder for testing. A non-linear decision boundary based on the radial kernel function was used to maximize the margin *M* that separates the two classes (i.e., cases and controls). Ten-fold cross-validation was used to select the optimal tuning parameters *C* and γ, where *C* determines the tolerance of violations of the margin and γ controls how far the influence of each support vector reaches. We compared the SVM decision values between cases and controls in the testing dataset (Wilcoxon rank-sum test). The procedure was implemented with the R package e1071.
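The role of γ can be made concrete with a small sketch of the radial kernel, written here as exp(−γ‖x − z‖²), which is the usual form of this kernel. The snippet is an illustration in plain Python rather than the e1071 implementation; the function name and the toy vectors are assumptions.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two toy expression vectors at squared distance 2 (hypothetical values).
x = [0.0, 0.0]
z = [1.0, 1.0]

# A large gamma makes similarity decay quickly with distance (local influence);
# a small gamma lets each training sample influence more distant points.
narrow = rbf_kernel(x, z, gamma=2.0)
wide = rbf_kernel(x, z, gamma=0.1)
```

In this picture, cross-validation over *C* and γ trades off a smooth, wide-reaching boundary against a tightly fitted one.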

#### *2.3. Principal Component Regression*

Observations from GSE32537 were randomly split so that 2/3 of the samples were used for model training and the remaining samples were used for testing. Expression levels of genes within the signature were dimensionally reduced to PCs. We used PCs as features to predict DLCO and FVC. Suppose there are *m* observations, **y** represents the response vector in ℝ<sup>*m*</sup>, and *n* is the total number of PCs. We composed a design matrix **P** ∈ ℝ<sup>*m* × (*k*+1)</sup> with a constant column and the first *k* PCs, and fitted a linear regression model as:

$$\mathbf{y} = \mathbf{P}\boldsymbol{\beta} + \boldsymbol{\epsilon} \tag{1}$$

with the lowest loss (mean square error, MSE), where **β** ∈ ℝ<sup>*k*+1</sup> is the coefficient vector, **ε** ∈ ℝ<sup>*m*</sup> is the error vector, and *k* ∈ [1, *n*]. Ten-fold cross-validation was used to assess the models for different *k*. Since the cut-offs of abnormal DLCO and FVC (% predicted) are typically set at 75% and 80% [10], respectively, we filtered out samples beyond those thresholds. The testing dataset was used to predict clinical traits (DLCO and FVC). Correlation analysis (Pearson's r) was conducted to assess the association between predicted and observed values. We implemented the procedure with the R package *pls* [11].
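The two quantities used to judge the regression, the MSE loss and Pearson's r between predicted and observed values, can be sketched as follows. This is a plain-Python illustration and not the *pls* implementation; the function names and the toy lung function values are hypothetical.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Toy observed and predicted values, in % predicted (hypothetical values).
observed = [40.0, 55.0, 60.0, 72.0]
predicted = [45.0, 50.0, 62.0, 70.0]

mse = mean_squared_error(observed, predicted)
r = pearson_r(observed, predicted)
```

In a cross-validation loop, `mean_squared_error` would be evaluated on each held-out fold for every candidate *k*, and `pearson_r` would be computed once on the testing dataset for the selected model.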

#### *2.4. Calculation of Network-Based Proximity*

Proximity is the shortest path length between two sets of nodes (drug targets and disease-related proteins) in the interactome. Suppose that *T* is the set of protein target(s) of a drug, *D* is the set of proteins relating to the disease, and *l*(*t*, *d*) is the shortest path length between node *t* and *d*. Therefore, the shortest proximity (ds) is defined as follows:

$$\text{ds} = \frac{1}{\|T\|} \sum_{t \in T} \frac{1}{\|D\|} \sum_{d \in D} l(t, d) \tag{2}$$

To reduce the degree effect in proximity, we calculated the relative proximity Zds by stratifying the nodes according to their degrees. Specifically, nodes in the interactome were first sorted by node degree and assigned to bins sequentially, where each bin contains at most 100 nodes. Nodes in the same bin therefore have similar, if not identical, degrees. Second, for each node in the sets *T* and *D*, we randomly selected a node from the same bin and then computed the shortest proximity between the two random sets. The procedure was iterated 100 times to obtain the mean (μds) and standard deviation (σds) of ds. The relative proximity (Zds) is defined as:

$$\text{Zds} = \frac{\text{ds} - \mu_{\text{ds}}}{\sigma_{\text{ds}}} \tag{3}$$
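Equations (2) and (3) can be sketched with the Python standard library as below. This is a minimal illustration under several assumptions, not the authors' code: the interactome is a connected, unweighted graph stored as an adjacency dictionary, random nodes are drawn independently (so a random set may contain repeats), and the function names, the toy graph, and the `seed` parameter are hypothetical.

```python
import random
from collections import deque
from statistics import mean, stdev


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def shortest_proximity(graph, targets, disease):
    """Equation (2): mean of l(t, d) over all target-disease pairs."""
    total = 0
    for t in targets:
        dist = shortest_path_lengths(graph, t)
        total += sum(dist[d] for d in disease)  # assumes a connected graph
    return total / (len(targets) * len(disease))


def degree_bins(graph, bin_size=100):
    """Sort nodes by degree and cut the sorted list into consecutive bins."""
    ordered = sorted(graph, key=lambda node: len(graph[node]))
    bins = [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]
    return {node: members for members in bins for node in members}


def relative_proximity(graph, targets, disease, n_iter=100, bin_size=100, seed=0):
    """Equation (3): z-score of ds against degree-matched random node sets."""
    rng = random.Random(seed)
    node_bin = degree_bins(graph, bin_size)
    observed = shortest_proximity(graph, targets, disease)
    null = []
    for _ in range(n_iter):
        random_targets = [rng.choice(node_bin[t]) for t in targets]
        random_disease = [rng.choice(node_bin[d]) for d in disease]
        null.append(shortest_proximity(graph, random_targets, random_disease))
    return (observed - mean(null)) / stdev(null)


# Toy path graph a-b-c-d-e (hypothetical interactome).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
ds = shortest_proximity(graph, targets=["a"], disease=["c", "e"])
zds = relative_proximity(graph, ["a"], ["c", "e"], n_iter=100, bin_size=2)
```

On a real interactome, precomputing or caching the breadth-first distances per node would avoid repeating the search in every iteration; the sketch favors readability over speed.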

#### **3. Results**


#### *3.1. An Overview of the Analytical Pipeline*

The aims of this study are to discover a novel PCPF signature and to leverage the network-based drug repurposing method to explore medications for treating PCPF. The analytical pipeline is shown in Figure 1. We first identify the cell (sub)types and annotate cell clusters. We next construct the PCPF signature and evaluate its roles in diagnosing and assessing pulmonary fibrosis. Finally, we use a network-based method to explore effective treatments for PCPF.

**Figure 1.** An overall analytical pipeline of this study. Schematic representation of the scRNA-seq analysis, signature construction, and application of the signature by integrating various computational methods. DEA: differential expression analysis; PCR: principal component regression; SVM: support vector machine.


#### *3.2. Identifying PCPF-Related Cell Clusters at the Single-Cell Level*

To explore cell clusters contributing to PCPF, we first visualized the lung tissue cells on the dimensionally reduced 2D plane (Figure 2A). To discover which cell cluster is mainly associated with PCPF, we conducted a case-control comparison of the proportion of each cell cluster (Figure 2B). We noticed that cluster 12, pathological fibroblasts (PFBs), shows the largest difference (Figure 2C). Therefore, we posited that PFBs play crucial roles in PCPF pathogenesis and further explored their clinical impact.
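The case-control comparison of cluster proportions can be sketched as follows. The snippet pools all cells of each group before computing proportions, which is an assumption made for brevity (the study may have compared proportions per sample); the function names, cluster labels, and cell counts are hypothetical.

```python
from collections import Counter


def cluster_proportions(labels):
    """Fraction of cells assigned to each cluster label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def proportion_difference(case_labels, control_labels):
    """Case proportion minus control proportion for every cluster."""
    case = cluster_proportions(case_labels)
    control = cluster_proportions(control_labels)
    clusters = set(case) | set(control)
    return {c: case.get(c, 0.0) - control.get(c, 0.0) for c in clusters}


# Toy cluster assignments for ten case cells and ten control cells.
case_labels = [12, 12, 12, 3, 3, 7, 7, 7, 7, 7]
control_labels = [12, 3, 3, 3, 7, 7, 7, 7, 7, 7]

diff = proportion_difference(case_labels, control_labels)
# Cluster with the largest absolute change in proportion.
top_cluster = max(diff, key=lambda c: abs(diff[c]))
```

In this toy example, cluster 12 makes up 30% of the case cells and 10% of the control cells, so it is the cluster that would be carried forward for differential expression analysis.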

**Figure 2.** Single-cell transcriptome analysis of the lung tissues in COVID-19 cases. (**A**) Single-cell analysis of 116,314 cells from lung tissues. Nineteen cell clusters were identified and annotated based on the cell (sub)types provided by the literature [7]. (**B**) Visualization of the proportional difference of cells between COVID-19 patients and healthy controls. (**C**) Comparison of cluster 12 (PFBs) proportion between COVID-19 patients and healthy controls. (**D**) Differentially expressed gene analysis of cluster 12. Up-regulated and down-regulated genes are highlighted in red and blue, respectively. (**E**) Functional enrichment analysis of the differentially expressed genes. Enriched biological processes are shown in a bar plot. pFB: pathological fibroblast. PCPF: post-COVID-19 pulmonary fibrosis.


#### *3.3. Comparison of Pathological Fibroblasts (PFBs) to Other Cell Types*

To deduce the roles of PFBs in PCPF, we compared the gene expression profiles of PFBs and other cells (Figure 2D and Supplementary Figure S1). To infer the biological functions in which the DEGs are involved, we performed a functional enrichment analysis to identify the enriched biological processes (BP) in PFBs (Figure 2E). We found that viral transcription is the most enriched term, followed by terms related to fibrosis formation (e.g., extracellular matrix organization and collagen fibril organization). The DEGs derived from PFBs show meaningful and related biological functions, suggesting that PFBs may contribute to PCPF pathogenesis. Therefore, we constructed a transcriptome signature (Supplementary Table S1) to represent the distinct expression profile of these PFBs and further explored the roles of the signature in pulmonary fibrosis patients' outcomes.

#### *3.4. Difference in PFB Signature between the Patients and Healthy Controls*

To further examine the signature derived from the scRNA-seq of COVID-19 samples, we externally validated the PFB signature in another cohort comprising 119 idiopathic pulmonary fibrosis (IPF) patients and 50 healthy controls [12]. IPF patients and healthy people show distinct signature patterns (Figure 3A,B). Next, we examined whether patients' symptoms (SGRQ) and lung function (FVC and DLCO) could also be clearly visualized within the two main PCs. DLCO and FVC show an increasing trend from the top left to the bottom in the first two principal component dimensions (Figure 3C,D), suggesting that patients with different IPF severity are dissimilar in terms of their signature. Although not as clear as for lung function, SGRQ shows a similar trend: more severely affected patients appear in the top left, and less impaired patients appear in the bottom right (Figure 3E).

**Figure 3.** Discovery of distinct expression of the signature in pulmonary fibrosis patients. (**A**) Hierarchical clustering of samples based on the signature expression. Heatmap values are the scaled gene expression. (**B**) Visualization of patients and controls in the two main principal components. (**C**–**E**) Visualization of DLCO (**C**), FVC (**D**), and SGRQ (**E**) in the two main principal components. DLCO: diffusing capacity for carbon monoxide; FVC: forced vital capacity; SGRQ: St. George's Respiratory Questionnaire.

#### *3.5. The Signature Can Be Used in the Diagnosis and Severity Assessment of Pulmonary Fibrosis*

Current genetic tools for the diagnosis and assessment of pulmonary fibrosis are limited. Therefore, we explored whether the signature can be applied to these clinical challenges. We first revealed that FVC, DLCO, and SGRQ are significantly correlated with the signature score (Figure 4A–C). Moreover, as a potential confounder of clinical traits, age has a very weak correlation with SGRQ, FVC, and DLCO (Supplementary Figure S2). Next, we compared signature scores between IPF patients and healthy people and found IPF patients have significantly higher scores compared to the controls (Figure 4D).


**Figure 4.** Investigating the association between signature expression and lung functions. (**A**–**C**) Correlation analysis of signature score and DLCO (**A**), FVC (**B**), and SGRQ (**C**). The dashed line represents the linear regression line. (**D**) Comparison of signature expression between IPF patients and healthy controls. DLCO: diffusing capacity for carbon monoxide; FVC: forced vital capacity; IPF: idiopathic pulmonary fibrosis; SGRQ: St. George's Respiratory Questionnaire.

Considering the correlation between the gene signature and the clinical traits, we next used the signature to train machine learning models to predict clinical outcomes of pulmonary fibrosis patients. We found that an SVM could perfectly differentiate pulmonary fibrosis patients from healthy controls (Figure 5A,B) without adding extra clinical features. We next explored whether the signature could predict patients' lung function test results (% of predicted DLCO and FVC). PC regression was used to fit the training data. The correlation coefficients between the predicted and observed DLCO and FVC are 0.61 (*p* = 2.91 × 10<sup>−4</sup>) and 0.77 (*p* = 2.52 × 10<sup>−6</sup>), respectively (Figure 5C,D).

Altogether, the signature performs well in classifying pulmonary fibrosis patients and predicting lung function test results; this implies its potential applicability in clinical diagnosis and severity assessment.

**Figure 5.** Signature as a diagnosis and assessment tool for pulmonary fibrosis using machine learning models. (**A**) The SVM scores for IPF patients and healthy controls. (**B**) Comparison of SVM decision value between IPF patients and healthy controls. (**C**,**D**) Correlation analysis between observed and predicted DLCO (**C**) and FVC (**D**). The dashed line represents the linear regression line. DLCO: diffusing capacity for carbon monoxide; FVC: forced vital capacity; IPF: idiopathic pulmonary fibrosis; SVM: support vector machine.

#### *3.6. The Network-Based Proximity between Anti-Pulmonary Fibrosis Drugs and the Signature*

Considering the roles of the signature in the diagnosis and assessment of pulmonary fibrosis, we defined the top-20 genes in the signature as the disease-related genes. Since the network proximity has been used to evaluate drugs for various diseases [3,4], we postulated that this method could also prioritize and repurpose the anti-PCPF drugs. In this case, anti-pulmonary fibrosis drugs should have closer proximity than the drugs with unknown anti-pulmonary fibrosis effects.

We calculated the shortest proximity (ds) between drug targets and PCPF-related proteins on the interactome (Figure 6A). Since our hypothesis is that shorter proximity is associated with therapeutic effects, it is necessary to examine other factors that simultaneously affect proximity. In particular, node degree is known to be anti-correlated with proximity [3], which we refer to here as the degree effect. The degree effect can lead to a biased interpretation of proximity in drug repurposing analyses. For instance, cytotoxic agents typically have lower proximity than other drug categories because the targets of anti-cancer drugs tend to have higher node degrees [2]. In this study, we also observed this phenomenon (Supplementary Figure S3A,B). We then calculated the relative proximity (Zds) by randomly selecting degree-stratified nodes on the interactome (Figure 6B). The degree effect is clearly less prominent in Zds (Supplementary Figure S3C,D). Next, to test whether the known-effect (anti-pulmonary fibrosis) drugs have smaller proximity than the unknown-effect drugs, we compared Zds between these two categories. We found that the known-effect drugs have significantly lower proximity (Figure 6C), with a predictive performance (AUC) of 0.672 (Figure 6D). To further validate the results, we used another set of anti-fibrosis drugs (not restricted to pulmonary fibrosis) [13] and found identical trends (Supplementary Figure S4A,B). Based on the above results, Zds can be used as a predictor to assess anti-pulmonary fibrosis effects. Therefore, we summarized the drugs with high repurposing potential in Table 1. The full drug list and their proximity information can be found in Supplementary Table S2.
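The AUC reported above has a simple rank-based reading: it is the probability that a randomly chosen known-effect drug has a lower Zds than a randomly chosen unknown-effect drug. The sketch below computes it by direct pairwise comparison; the function name and the toy Zds values are hypothetical and unrelated to the study's data.

```python
def auc_lower_is_positive(known, unknown):
    """AUC when lower scores indicate the positive (known-effect) class.

    Counts the fraction of (known, unknown) pairs in which the known-effect
    drug has the lower score; ties contribute one half.
    """
    wins = 0.0
    for k in known:
        for u in unknown:
            if k < u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known) * len(unknown))


# Toy Zds values (hypothetical).
known = [-3.0, -1.5, 0.2]
unknown = [-2.0, 0.0, 0.5, 1.0]
auc = auc_lower_is_positive(known, unknown)
```

An AUC of 0.5 would mean that Zds carries no information about anti-pulmonary fibrosis effects, and values closer to 1 indicate better separation of the two drug categories.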

**Figure 6.** Characterizing the roles of proximity on drug repurposing for anti-pulmonary fibrosis drugs. (**A**) Schematic representation of the method. (**B**) Distribution of different proximity measures. (**C**) Comparison of proximity, Zds, between drugs with known and unknown anti-pulmonary fibrosis effects. (**D**) Analysis of the predictive performance of Zds on anti-pulmonary fibrosis effects using the ROC curve.

**Table 1.** Selected top-ranked drugs with high anti-pulmonary fibrosis potential.

| **Name** | **Z-Shortest Proximity (Zds)** | **Shortest Proximity (ds)** |
|---|---|---|
| Benzoic Acid | −17.91 | 0.726 |
| Artenimol | −14.18 | 2.019 |
| Quercetin | −12.48 | 2.060 |
| Emodin | −10.25 | 1.238 |
| Valproic Acid | −10.11 | 2.373 |
| Cerulenin | −10.03 | 0.688 |
| Naringenin | −9.40 | 2.204 |
| Fisetin | −9.18 | 1.325 |
| Vitamin D | −9.18 | 1.690 |
| Fluvastatin | −10.03 | 2.379 |

Chemical structure images are not reproduced here. References cited in the table: [14–17,20,22–27]; [20] refers to fluvastatin.


#### **4. Discussion**

This study integrates various computational approaches to reveal a crucial theragnostic signature in PCPF. We show that the signature is associated with viral infections, pulmonary fibrosis, and clinical outcomes. Moreover, we demonstrate that machine learning models trained with the signature perform reasonably well in diagnosing pulmonary fibrosis and predicting patients' lung function. Lastly, we show that drugs with known anti-pulmonary fibrosis effects have closer proximity than those with unknown effects, suggesting that a network-based framework can also be applied to prioritize and repurpose drugs in PCPF.


Considering that this study was designed for PCPF, we note that the viral infection-related GO term is the most enriched (Figure 2E). This phenomenon also appears in the network-based analysis, where drugs with strong anti-COVID-19 effects have significantly closer (smaller) proximity than drugs with weak or no effect (Supplementary Figure S3D). This observation suggests that the signature may be associated with two events: COVID-19 viral infection and pulmonary fibrosis. Although pulmonary fibroblasts are less well known as target cells of the virus, recent studies revealed that alveolar fibroblasts could also be infected by the virus due to their expression of ACE2 receptors [28]. Aloufi et al. found that IPF fibroblasts have an even higher expression of the ACE2 receptor, highlighting the roles of pathological fibroblasts in COVID-19 infection [29].

We also observe some medical procedure-related terms (e.g., response to mechanical stimulus). Although these terms are not significantly enriched (Figure 2E), they still imply that patients may undergo specific medication therapies or receive mechanical ventilation during hospital treatment.

One of the advantages of performing scRNA-seq on clinical samples is the high-resolution mapping of each cell. However, a deeper inspection usually comes with a smaller patient sample size, because the number of patients enrolled can rarely be as large as that in bulk RNA analysis. There are 26 cases in the scRNA-seq dataset, and it is reasonable to challenge any inference made from only 26 patients. Therefore, externally validating the results derived from scRNA-seq in a broader population can generate more confidence in the results. Nonetheless, the results from scRNA-seq may not be fully concordant with bulk RNA analysis. Zero inflation, for instance, can lead to the underestimation of lowly expressed genes [30]. Another challenge is that the result in one patient cohort may not be reproducible in another simply due to numerous uncontrollable factors between the two cohorts. However, in our study, the signature derived from scRNA-seq also plays a vital role in another bulk-sample patient cohort, suggesting that the signature is reproducible and can be externally validated.

There are limitations to this study. First, we applied the signature derived from PCPF to IPF patients. The etiologies of PCPF and IPF are unlikely to be identical. The causes of PCPF may include the viral infection and the host immune response; the causes of IPF, on the other hand, remain unclear, even though several studies have revealed genetic predispositions or causal variants of IPF using genome-wide association studies with fine-mapping [31] or polygenic risk scores [32]. However, regardless of the causes, both PCPF and IPF involve fibrogenesis and fibrosis in the lung tissue. Considering the limited clinical information on PCPF, we used IPF as a surrogate to investigate the potential impacts and clinical insights of this PCPF signature, in particular its application in drug repurposing. We understand that population structure and other baseline demographic characteristics could influence the performance of the gene signature score, and thus the signature score should be carefully interpreted when applied to other ethnic groups, such as Asians. Another limitation is the lack of lung function test results in the single-cell cohort. This makes it harder to compare the baseline characteristics of the IPF and PCPF patients.

The rationale for the network-based drug repurposing approach is that a drug may still be effective when its target proteins are 'close' to the disease-related protein(s) in the interactome [3,33,34]. If this argument is true, drugs with known effects on a disease should have closer proximity than the unknown-effect drugs. Testing it therefore requires identifying a significant difference in proximity between known-effect and unknown-effect drugs. However, for some diseases, such as IPF, medical treatment options are very limited [35,36]. There are, in fact, only two FDA-approved drugs, nintedanib and pirfenidone, that seem to be associated with a slower progression of IPF [36]. Therefore, if we simply assign drugs to either known or unknown effects based on current clinical knowledge, hypothesis testing between the two drug categories (known vs. unknown effect) can hardly be conducted due to highly unbalanced sample sizes. To address this problem, we searched the published literature on drug repurposing for pulmonary fibrosis [37] and pan-fibrosis [13] and used the repurposed drugs as the known-effect drugs.
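The comparison between known-effect and unknown-effect drugs can be summarized as an AUC. The sketch below is a minimal illustration that computes the AUC as the fraction of (known, unknown) drug pairs in which the known-effect drug has the lower score, counting ties as one half. The function name and the Zds values are hypothetical toy inputs, not data from this study, and the study's own ROC analysis may have been computed differently.

```python
def auc_lower_is_positive(known_scores, unknown_scores):
    """AUC when a lower score (e.g., Zds) indicates the positive class.

    Equals the probability that a randomly chosen known-effect drug has a
    lower score than a randomly chosen unknown-effect drug (ties count 0.5).
    """
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k < u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


if __name__ == "__main__":
    # Hypothetical Zds values for illustration only.
    known = [-10.0, -4.0, 1.0]
    unknown = [-5.0, 0.0, 2.0, 3.0]
    print(auc_lower_is_positive(known, unknown))
```

An AUC of 0.5 corresponds to no separation between the two drug categories, and values closer to 1 indicate that known-effect drugs tend to have smaller proximity.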

Previous studies have applied the network-based drug repurposing framework to various diseases [3,38]. Nonetheless, due to the complexity of disease mechanisms, validating this method is necessary when dealing with different conditions. For instance, previously, we found that, in lung adenocarcinoma, the closest proximity on the weighted interactome shows the best performance in identifying promising drugs [2]. In this study, however, we noticed that z-transformed shortest proximity, Zds, has better performance. This observation implies that the performance of proximity metrics may be context-dependent.

Although proximity may be associated with drug effectiveness, we urge caution when interpreting the ranked drug list, as proximity is not the only factor contributing to drug effectiveness. For instance, we found that nintedanib, one of the two currently approved drugs for IPF, has small proximity (Zds = −3.22; rank = 798/5643). However, the other approved anti-IPF agent, pirfenidone, has large proximity (Zds = 1.45; rank = 5115/5643). Therefore, this observation suggests that drugs with distant proximity could still be effective, as proximity may be only one of the many factors affecting drug effectiveness. Other crucial factors, such as binding affinity, also matter.
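To put the reported ranks in perspective, the short sketch below converts a rank into a percentile of the ranked list. The function name is hypothetical; the ranks and the list length of 5643 are the values quoted in the text.

```python
def rank_percentile(rank, total):
    """Position of a drug in the ranked list, as a percentage of the list length."""
    return 100.0 * rank / total


if __name__ == "__main__":
    for name, rank in [("nintedanib", 798), ("pirfenidone", 5115), ("artenimol", 28)]:
        print(f"{name}: top {rank_percentile(rank, 5643):.1f}% of the ranked list")
```

Under this calculation, nintedanib falls within roughly the top 14% of the list, whereas pirfenidone lies near the bottom 10%.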

Within the top-ranked repurposed drugs (top 3% of the drugs in Supplementary Table S2), we found some drugs belonging to antibiotic or antiviral agent categories, which may be related to pneumonia treatment [39], acute exacerbation of pulmonary fibrosis [40], or other morbidities, such as pneumonitis, opportunistic infection, or tissue inflammation [41]. They may not truly show strong anti-fibrosis effects. On the other hand, we noticed that many top-ranked candidates on this list show promising anti-pulmonary fibrosis effects. Artenimol (Zds = −14.18; rank = 28/5643) (also known as dihydroartemisinin), for instance, can reduce lung fibrosis by suppressing the Notch signaling pathway [42] and pro-fibrotic pathways [43]. Another example is dinoprostone (also known as prostaglandin E2). It was reported that inhaling liposomal prostaglandin E2 can treat pulmonary fibrosis by restricting inflammation and fibrotic injury in the lungs [21].

Another interesting drug category is statins, a well-known class of lipid-lowering agents. A retrospective study surveying 323 IPF patients found that statin users have a slower annual decline in DLCO and FVC than non-users [20]. We then searched our drug list for the statins used in that study [20] and found that all of them have very small Zds: atorvastatin (Zds = −10.5), fluvastatin (Zds = −10.03), rosuvastatin (Zds = −8.37), pravastatin (Zds = −6.73), and simvastatin (Zds = −4.43).

#### **5. Conclusions**

We reveal a novel theragnostic signature for PCPF and provide a prioritized drug list based on network-based proximity, Zds. Our study shows the applicability of integrating various computational methods when analyzing biomedical data and, importantly, provides useful information for diagnosing, assessing, and treating PCPF.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics14050971/s1, Figure S1: The expression of DEGs in PCPF patients and the controls, Figure S2: The correlation analysis between age and other clinical features, Figure S3: The degree effect on proximity, Figure S4: Characterizing the roles of proximity on the repurposing of anti-fibrosis drugs, Table S1: The transcriptome signature of pathological fibroblasts in PCPF, Table S2: The full list with 5644 drugs and their proximity (Zds).

**Author Contributions:** Conceptualization, A.L., H.-C.H. and H.-F.J.; methodology, A.L., C.-L.H., Y.-J.O., H.-C.H. and H.-F.J.; validation, A.L. and J.-Y.C.; data analysis, A.L. and J.-Y.C.; investigation, A.L., C.-L.H., H.-C.H. and H.-F.J.; data curation, A.L., J.-Y.C., H.-C.H. and H.-F.J.; writing—original draft preparation, A.L., H.-C.H. and H.-F.J.; writing—review and editing, H.-C.H. and H.-F.J.; visualization, A.L. and J.-Y.C.; supervision, H.-C.H. and H.-F.J.; project administration, H.-C.H. and H.-F.J.; funding acquisition, H.-C.H. and H.-F.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Science and Technology (MOST 109-2221-E-002-161-MY3, MOST 109-2221-E-010-012-MY3, MOST 109-2327-B-002-009, MOST 111-2321-B-002-017), the Higher Education Sprout Project (NTU-110L8808 and NTU-CC-109L104702-2), and the Emerging Infectious and Major Disease Research Program and Taiwan Biotech Innovation Academy (AS-KPQ-110-EIMD).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The COVID-19 scRNA-seq dataset was derived from Melms et al. [7]. It comprises autopsy lung tissues from 26 patients, totaling 116,314 cells. The pulmonary fibrosis cohort was downloaded from GEO (GSE32537) [12], which provided the gene expression profiles and clinical traits of 119 idiopathic pulmonary fibrosis (IPF) patients and 50 healthy controls. The clinical traits include St. George's Respiratory Questionnaire (SGRQ) and lung function test results (diffusing capacity for carbon monoxide (DLCO) and forced vital capacity (FVC)). For the proximity calculations, we adapted human protein–protein interaction data from Guney et al. [3], which comprise 140,637 interactions among 13,101 proteins. We retrieved and adapted drug-related information, including drug targets and their anti-SARS-CoV-2 effects, from the DrugBank database [44] and Gysi et al. [4], respectively.

**Acknowledgments:** We thank Chen-Hao Huang for his help and suggestions on this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Integration of In Silico Strategies for Drug Repositioning towards P38α Mitogen-Activated Protein Kinase (MAPK) at the Allosteric Site**

**Utid Suriya <sup>1</sup>, Panupong Mahalapbutr <sup>2</sup> and Thanyada Rungrotmongkol <sup>3,4,</sup>\***


**Abstract:** P38α mitogen-activated protein kinase (p38α MAPK), one of the p38 MAPK isoforms participating in a signaling cascade, has been identified for its pivotal role in the regulation of physiological processes such as cell proliferation, differentiation, survival, and death. Herein, using docking- and 100-ns dynamics-based screening of 3210 FDA-approved drugs, we found that lomitapide (a lipid-lowering agent) and nilotinib (a Bcr-Abl fusion protein inhibitor) could alternatively inhibit phosphorylation of p38α MAPK at the allosteric site. All-atom molecular dynamics simulations and free energy calculations, including end-point and QM-based ONIOM methods, revealed that the binding affinity of the two screened drugs was comparable to that of the known p38α MAPK inhibitor (BIRB796), suggesting their high potential as novel p38α MAPK inhibitors. In addition, noncovalent contacts and the number of hydrogen bonds were found to correspond with the strong binding recognition. Key influential amino acids were mostly hydrophobic residues, while the two charged residues E71 and D168 were considered crucial due to their ability to form very strong H-bonds with the focused drugs. Altogether, the results obtained here could provide theoretical guidance for further experimental preclinical studies necessary for developing therapeutic agents targeting p38α MAPK.

**Keywords:** drug repositioning; p38α MAPK; molecular docking; MD simulation; allosteric inhibitors; in silico screening; computer-aided drug discovery

#### **1. Introduction**

Mitogen-activated protein kinase (MAPK) signaling pathways are a cascade comprising three kinases including extracellular signal-regulated kinase (ERK), c-Jun NH2-terminal kinase (JNK), and p38, in which the upstream kinase (MAPKKK) responds to various extra- and intracellular signals and activates the middle kinase (MAPKK) by direct phosphorylation [1]. Then, MAPKKs phosphorylate and activate a MAPK, resulting in cell-specific physiological phenomena such as cell proliferation, differentiation, survival, and death [2]. MAPKs are known to respond to a wide range of input signals including hormones, cytokines, and growth factors, as well as endogenous stress and environmental factors. Accordingly, they are classified into two distinct groups: mitogen-activated (ERK) and stress-activated kinases (JNK and p38) [3]. Substantial studies revealed that the p38 pathway is a key player in the response to environmental stress signals and inflammatory stimuli and is responsible for the production of some inflammatory cytokines such as tumor necrosis factor-α (TNF-α), interleukin-1β, interleukin-6, and interleukin-12 in response to proinflammatory signaling [4,5]. Furthermore, p38 can act as a restraint in cancer tumorigenesis (e.g., breast, lung, colon, and liver cancer) by inducing a p38-mediated proapoptotic mechanism and the killing of incipient tumor cells through a mechanism involving the production of reactive oxygen species (ROS) [6]. However, p38 activity functions conversely once a tumor has already been established, supporting its growth [7]. Experimental evidence indicates that tumor cells need to modulate the level of p38 MAPK activity in order to metastasize, and this signaling occurs in a variety of diseases [8]. For these reasons, inhibition of the p38 pathway has attracted much attention as a promising strategy in the management of cancer, neurodegeneration, inflammation, and even the newly emerged pandemic, COVID-19 [9].

**Citation:** Suriya, U.; Mahalapbutr, P.; Rungrotmongkol, T. Integration of In Silico Strategies for Drug Repositioning towards P38α Mitogen-Activated Protein Kinase (MAPK) at the Allosteric Site. *Pharmaceutics* **2022**, *14*, 1461. https://doi.org/10.3390/pharmaceutics14071461

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 16 May 2022 Accepted: 10 July 2022 Published: 13 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Structurally, there are four homologues of p38 MAPK: p38α, p38β, p38γ, and p38δ [3]. Among these, p38α is the best characterized and seems to be the most physiologically relevant protein involved in inflammatory responses [4,10]. According to the site of ligand modulation, there are two different generations of p38α MAPK inhibitors, type I and type II, which modulate the activity of the enzyme at the ATP-binding site and the allosteric site, respectively. However, targeting the ATP-binding site has limited clinical use due to the high level of sequence and structural similarity among kinase enzymes [2], which could result in non- or low-selective behavior and cause undesirable side effects and toxicities [11]. To overcome this issue, recent research has focused on utilizing a novel allosteric regulatory site, which is distinct from the ATP pocket at about 60° spatially, with no structural overlap between compounds bound to the allosteric site and ATP [12]. In this state, the conserved Asp-Phe-Gly (DFG) motif in the active site is conformationally altered, which is known as the DFG-out conformation and seems to be more stable in protein Tyr kinases [12]. To date, even though a number of clinical p38 MAPK inhibitors have emerged for inflammatory disease indications such as rheumatoid arthritis, there have been no approved agents [13,14] due to the lack of target modulation, adverse events, toxicities, and poor pharmacokinetics [4,14]. Toxicities reported in clinical studies of the well-known p38 MAPK inhibitors BIRB796 (doramapimod), VX-745 (Vertex), and SCIO-469 (talmapimod) included hepatotoxic elevation of liver transaminases, skin rash, and so forth [15–17]. Accordingly, searching for novel compounds capable of impeding p38 MAPK remains important to provide bottom-up preclinical information guiding the development of therapeutic agents that disrupt the MAPK signaling pathway.

Herein, building on advances in computational biology that contribute to the preclinical stage of drug discovery and development, we aimed to search for novel agents capable of binding to p38α MAPK at the allosteric site using a drug repositioning approach. Bioinformatic databases and in silico methods including docking-based virtual screening, molecular dynamics (MD) simulations, and free energy calculations were employed to guide the discovery of hit compounds that may present significant potential for further optimization. The results obtained here provide useful information and may outline the next steps for experimental studies in drug discovery and development against p38α MAPK.

#### **2. Materials and Methods**

#### *2.1. Preparation of the 3D Structure of P38α MAPK and Ligands*

The three-dimensional structure of p38α MAPK in complex with a known inhibitor, BIRB796, was retrieved from the RCSB Protein Data Bank (PDB ID: 1KV2). The missing residues (170–184) of p38α MAPK were constructed by homology modeling as implemented in the SWISS-MODEL server [18]. The newly generated structure was subsequently validated by plotting the Ramachandran diagram (Figure S1) using PROCHECK [19]. The protonation states of all ionizable amino acids were predicted based on their pKa values using the PROPKA 3.0 web interface [20] and were then set in the modeled complex structure before performing molecular docking and MD simulations.

Partial atomic charges of BIRB796 were derived from electrostatic potential (ESP) calculations on its geometry, performed with the Gaussian 09W program (G09) using the Hartree–Fock method at the 6-31G(d) level of theory [21]. Atom types were then assigned to the structure, and its topology file was generated using the Antechamber program [22]. The 'mol2' file was converted into the 'pdbqt' format using AutoDockTools (ADT). For the virtual screening studies, all focused compounds were obtained from the 3210 FDA-approved drugs available in the ZINC database (http://zinc.docking.org, accessed on 12 November 2019). These ligands were also subsequently converted from the 'mol2' format into the 'pdbqt' format using ADT.

#### *2.2. Molecular Docking and Visual Inspection*

Docking calculations were carried out on a Linux operating system using AutoDock VinaXB, which provides a new empirical halogen bond scoring function [23]. Three docking parameters, including exhaustiveness, num\_modes, and energy\_range, were set to 20, 50, and 5 kcal/mol, respectively. For system validation, the crystallized ligand was redocked into the same binding site (Figure S2), and the verified grid box was then employed for all ligands in virtual screening. Predicted binding affinity (Ebinding, kcal/mol) of the most likely occurring conformation was a parameter used to rank the studied compounds, and the structure coordinate was employed to be the initial structure for the MD run.
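The docking settings described above can be captured in a configuration file. The following sketch writes one in Python, assuming that AutoDock VinaXB accepts the standard AutoDock Vina configuration keys; the file names and the grid-box center and size are hypothetical placeholders, since the box coordinates are not given in the text.

```python
def vina_config(receptor, ligand, center, size,
                exhaustiveness=20, num_modes=50, energy_range=5):
    """Build the text of a Vina-style configuration file.

    The defaults for exhaustiveness, num_modes, and energy_range (kcal/mol)
    follow the values stated in the text; everything else is a placeholder.
    """
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {center[0]}",
        f"center_y = {center[1]}",
        f"center_z = {center[2]}",
        f"size_x = {size[0]}",
        f"size_y = {size[1]}",
        f"size_z = {size[2]}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
        f"energy_range = {energy_range}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names and grid box; replace with the validated box.
    text = vina_config("p38a_receptor.pdbqt", "ligand.pdbqt",
                       center=(0.0, 0.0, 0.0), size=(20.0, 20.0, 20.0))
    print(text)
```

Reusing one validated configuration for every ligand keeps the search space identical across the screened library, which is what makes the predicted affinities comparable.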

To reduce the chance of false-positive scoring, a visual inspection of the intermolecular interactions between each ligand and the amino-acid residues lining the focused allosteric site was carried out by specifically examining (i) hydrogen bonding with E71, M109, and D168 as well as (ii) hydrophobic interactions with V38, A51, K53, R67, L75, I84, L104, L108, A157, L167, and F169, which were derived from the binding mode observed for the inhibitor prototype, BIRB796. Compounds sharing more than five of these intermolecular interactions with BIRB796 were then selected.
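The selection rule above can be expressed as a simple set comparison. The sketch below is only an illustration: the residue lists are those named in the text, while the function names and the example contact lists are hypothetical, and in practice the contacts would come from inspecting each docked pose.

```python
# Reference interactions of BIRB796 as listed in the text.
HBOND_RESIDUES = {"E71", "M109", "D168"}
HYDROPHOBIC_RESIDUES = {
    "V38", "A51", "K53", "R67", "L75", "I84",
    "L104", "L108", "A157", "L167", "F169",
}


def shared_interactions(hbond_contacts, hydrophobic_contacts):
    """Count interactions a ligand shares with the BIRB796 binding mode."""
    shared_hbonds = HBOND_RESIDUES & set(hbond_contacts)
    shared_hydrophobic = HYDROPHOBIC_RESIDUES & set(hydrophobic_contacts)
    return len(shared_hbonds) + len(shared_hydrophobic)


def passes_visual_inspection(hbond_contacts, hydrophobic_contacts, min_shared=5):
    """Keep a compound only if it shares more than min_shared interactions."""
    return shared_interactions(hbond_contacts, hydrophobic_contacts) > min_shared


if __name__ == "__main__":
    # Hypothetical contacts for two docked ligands.
    print(passes_visual_inspection(["E71", "D168"], ["V38", "A51", "L75", "F169"]))
    print(passes_visual_inspection(["E71"], ["V38", "A51", "L75", "F169"]))
```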

#### *2.3. Molecular Dynamics (MD) Simulations*

The protein–ligand complex coordinates of the two screened drugs and the inhibitor prototype obtained from molecular docking were simulated under periodic boundary conditions in the isothermal–isobaric (NPT) ensemble [24–28]. The AMBER ff14SB force field and the generalized AMBER force field version 2 (GAFF2) [29] were selected to govern the bonded and nonbonded interaction parameters. Electrostatic interactions were treated by the particle mesh Ewald summation method [30] with a cutoff distance of 10 Å for nonbonded interactions. The SHAKE algorithm [31] was applied to constrain bonds involving hydrogen atoms. The temperature was controlled by the Langevin thermostat [32]; the systems were heated from 10 to 310 K and then kept at 310 K. The pressure was controlled by the Berendsen barostat [33] with a relaxation time of 1 ps. Moreover, the TIP3P water model [34] was used to solvate the system with a minimum padding of 10.0 Å between the protein surface and the solvation box edge. The overall charge of the molecular system was neutralized by randomly adding either sodium or chloride ions. Minimization of the added hydrogen atoms and water molecules was carried out using 500 steps of steepest descent (SD) followed by 1500 steps of conjugate gradient (CG) methods before running the MD simulations with constrained solvent molecules. The whole complex was then fully minimized using the same procedure. Production MD runs of 100 ns with a 2 fs time step were performed for all systems. The root-mean-square deviation (RMSD), the number of hydrogen bonds (H-bonds), and the number of contact atoms were calculated using the cpptraj module, whilst the per-residue decomposition energy (∆G residue binding) was estimated by MM/PBSA.py implemented in AMBER16.
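Two quantities in this protocol lend themselves to a quick numerical check: the number of integration steps implied by a 100 ns run with a 2 fs time step, and the RMSD between two coordinate sets. The sketch below is a simplified illustration with hypothetical function names; unlike a typical cpptraj analysis, it computes the RMSD without first superimposing the structures.

```python
import math


def n_steps(total_ns, timestep_fs):
    """Number of MD integration steps for a given length and time step."""
    # 1 ns = 1,000,000 fs
    return round(total_ns * 1_000_000 / timestep_fs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets.

    No fitting or superposition is performed (a simplifying assumption).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    print(n_steps(100, 2))
    # Two toy atoms shifted by 1 Å along z.
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```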

#### *2.4. End-Point Binding Energy Calculations*

To estimate the ligand-binding affinity, the end-point binding free energy (∆Gbind) of each system was predicted by the solvated interaction energy (SIE) approach [35]. ∆Gbind can be estimated by the summation of the van der Waals (EvdW), electrostatic (Eele), reaction field (∆GRF), and cavity (γ∆SA(ρ)) terms and a constant (C). The mathematical equation can be expressed as follows:

$$\Delta G_{\text{bind}}(\rho, D_{\text{in}}, \alpha, \gamma, C) = \alpha \left[ E_{\text{vdW}} + E_{\text{ele}}(D_{\text{in}}) + \Delta G_{\text{RF}}(\rho, D_{\text{in}}) + \gamma \Delta SA(\rho) \right] + C$$

where Din denotes the solute dielectric value. EvdW and Eele represent intermolecular van der Waals and Coulombic interaction energies in the bound state, respectively. ∆GRF is the alteration of the reaction field energy between the bound and free states, ∆Gcavity (γ∆SA) denotes the change in the non-electrostatic solvation free energy between the bound and free forms, and C is the constant value. The coefficients were set as α = 0.105, γ = 0.013, and C = −2.89.
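The SIE expression can be evaluated directly once the individual terms are available. The sketch below uses the coefficients stated above (α = 0.105, γ = 0.013, C = −2.89); the function name and the input energies are hypothetical toy values chosen only to show the arithmetic, and the units (kcal/mol, with ∆SA in Å²) are an assumption.

```python
ALPHA = 0.105   # global scaling factor stated in the text
GAMMA = 0.013   # surface-area coefficient stated in the text
C = -2.89       # constant term stated in the text


def sie_binding_energy(e_vdw, e_ele, dg_rf, delta_sa,
                       alpha=ALPHA, gamma=GAMMA, c=C):
    """Solvated interaction energy estimate of the binding free energy.

    e_vdw, e_ele : intermolecular van der Waals and Coulombic energies
    dg_rf        : change in reaction field energy upon binding
    delta_sa     : change in molecular surface area upon binding
    """
    return alpha * (e_vdw + e_ele + dg_rf + gamma * delta_sa) + c


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(sie_binding_energy(e_vdw=-55.0, e_ele=-15.0, dg_rf=20.0, delta_sa=-1000.0))
```

With these toy inputs the bracketed sum is −63, so the estimate is 0.105 × (−63) − 2.89 = −9.505.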

#### *2.5. QM-Based ONIOM Binding Energy Calculations*

A quantum mechanics (QM)-based Our Own N-layered Integrated Molecular Orbital and Molecular Mechanics (ONIOM) approach [36,37] was used to further assess the binding strength of BIRB796 and the screened drug candidate(s). Before calculating the binding energy, the constructed complexes were optimized using the Hartree–Fock method combined with a molecular mechanics force field (HF/6-31G(d):UFF). After optimization, two-layered ONIOM calculations (B3LYP/6-31G(d):PM6) were applied to determine and compare the binding energies of the three systems. The residues within 5 Å of the ligand, namely Y35, V38, A51, V52, K53, L55, R67, T68, R70, E71, L74, L75, I84, L86, L104, V105, T106, H107, M109, H148, R149, L167, D168, F169, G170, and L171, were selected to represent the allosteric site of p38α MAPK and assigned to the low-level layer, while each screened drug was assigned to the high-level layer. Then, the selected amino acid residues and the ligand were again calculated individually with the B3LYP/6-31G(d) level of theory and the PM6 method, respectively. The polarizable continuum model (PCM) was applied to account for the effect of water solvent on the binding energy. All calculations were performed using the GAUSSIAN16 software package [38], and the binding energy was estimated using the equation below [39].

$$E_{\text{binding}}^{\text{solvation}} = E_{\text{complex}}^{\text{PCM}} - E_{\text{residues}}^{\text{PCM}} - E_{\text{ligand}}^{\text{PCM}}$$

where $E_{\text{binding}}^{\text{solvation}}$ is the binding energy of the drug–receptor complex in the solvation system, $E_{\text{complex}}^{\text{PCM}}$ is the extrapolated ONIOM energy of the complex, $E_{\text{residues}}^{\text{PCM}}$ is the potential energy of the residues within 5 Å of the ligand, and $E_{\text{ligand}}^{\text{PCM}}$ is the potential energy of the studied ligand.
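The binding energy is a simple difference of three energies. The sketch below illustrates the calculation under the assumption that the energies are read in Hartree, as is usual for Gaussian output, and converted to kcal/mol; the function name and the numerical energies are hypothetical.

```python
# Conversion factor: 1 Hartree is approximately 627.5095 kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def binding_energy_kcal(e_complex, e_residues, e_ligand):
    """Binding energy in kcal/mol from three energies given in Hartree.

    e_complex  : extrapolated ONIOM energy of the complex (PCM)
    e_residues : energy of the binding-site residues alone (PCM)
    e_ligand   : energy of the ligand alone (PCM)
    """
    return (e_complex - e_residues - e_ligand) * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Hypothetical energies for illustration only.
    print(binding_energy_kcal(-1000.05, -700.00, -300.00))
```

A more negative value indicates stronger predicted binding between the ligand and the selected residues.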

#### **3. Results and Discussion**

#### *3.1. Docking-Based Screening and Visual Inspection*

Finding existing drugs that can inhibit novel targets is a great challenge. To this end, 3210 compounds retrieved from the FDA-approved drugs available in the ZINC database were docked into the allosteric site of p38α MAPK, whose 3D structure and inhibitor binding site are illustrated in Figure 1. The compounds were selected and ranked according to their binding affinity (Ebinding) predicted by the scoring function of the AutoDock VinaXB software package. Among the 3210 compounds, only ZINC27990463 exhibited a higher binding affinity (Ebinding = −12.10 kcal/mol) than the reference ligand, BIRB796 (Ebinding = −11.9 kcal/mol). However, to reduce false-negative selection, compounds exhibiting Ebinding lower than −10.00 kcal/mol were also retained, which yielded 22 compounds in total. The in silico filtering scheme and the plot of binding affinity of the selected first-round screened compounds are illustrated in Figures 2 and 3, respectively. All these first-round screened compounds were then inspected for intermolecular interactions inside the cleft of the allosteric site, a step commonly known as "visual inspection". This method has been widely used in the decision-making step for a great number of drug discovery campaigns [40].
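The first-round filter amounts to keeping every compound whose predicted Ebinding is below the cutoff and ranking the survivors. The sketch below illustrates this with the two scores quoted in the text; the function name and the remaining entries, named hypothetical_A and hypothetical_B, are invented placeholders.

```python
def first_round_hits(scores, cutoff=-10.0):
    """Return (name, Ebinding) pairs with Ebinding below the cutoff.

    Results are sorted from the most to the least favorable (most negative
    first). Scores are predicted binding affinities in kcal/mol.
    """
    hits = [(name, e) for name, e in scores.items() if e < cutoff]
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    scores = {
        "ZINC27990463": -12.10,    # value quoted in the text
        "BIRB796": -11.9,          # reference ligand, value quoted in the text
        "hypothetical_A": -10.4,   # placeholder
        "hypothetical_B": -9.7,    # placeholder, fails the cutoff
    }
    for name, energy in first_round_hits(scores):
        print(f"{name}: {energy:.2f} kcal/mol")
```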


**Figure 1.** The ribbon representation of the 3D structure of p38α MAPK (PDB ID: 1KV2). The close-up regions illustrate two common inhibitors, VX-745 (green) and BIRB796 (yellow), indicating the ATP-binding site and the allosteric pocket, which are distinct from each other at about 60° spatially. An orientation of F169 exhibiting the unique DFG-out conformation is also shown. Additionally, the hydrophobic nature (obtained via UCSF ChimeraX 1.4, Resource for Biocomputing, Visualization, and Informatics (RBVI), San Francisco, CA, USA) within the focused allosteric cleft is depicted in a close-up view.

**Figure 2.** (**A**) In silico filtering scheme, which includes first-round docking-based screening, visual inspection, and SIE-based dynamic screening, as well as the program used during each step. (**B**) Chemical structures of a well-known p38α MAPK allosteric inhibitor (BIRB796), lomitapide, and nilotinib, obtained via this computational platform.

**Figure 3.** Binding affinity in kcal/mol of selected first-round screened compounds that were successfully docked into the focused allosteric site of p38α MAPK compared to BIRB796. Note that the prediction was based upon the scoring function implemented in AutoDock VinaXB, Sirimulla Research Group at the University of Texas at El Paso, TX, USA.

It is well known that noncovalent interactions are essential for ligand binding. Compounds that form sufficient interactions, both in type and in number, tend to exhibit greater binding capability and could form a more stable complex. Thus, the intermolecular interactions between each ligand and the amino-acid residues lining the focused allosteric site were visually inspected by specifically examining (i) hydrogen bonding with E71, M109, and D168, as well as (ii) hydrophobic interactions with V38, A51, K53, R67, L75, I84, L104, L108, A157, L167, and F169, which were derived from the binding mode observed for the inhibitor prototype, BIRB796. Compounds sharing more than five intermolecular interactions with BIRB796 were then selected; the detailed information for all 22 compounds is given in Figure 4. This yielded 10 promising compounds (Figure 2A), hereinafter referred to as "hit compounds". These 10 hit compounds were subsequently subjected to second-round screening by MD simulations, and the MD output was used for SIE-based end-point free energy calculations.
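The interaction-sharing criterion can be expressed as a simple set comparison. The sketch below is a hedged illustration only: the reference residue sets are taken from the text, whereas the two example ligands and their contact lists are hypothetical, and the actual inspection in the study was carried out visually rather than with code like this.

```python
# Reference interaction pattern of BIRB796 (residues taken from the text).
REFERENCE_HBOND = {"E71", "M109", "D168"}
REFERENCE_HYDROPHOBIC = {
    "V38", "A51", "K53", "R67", "L75", "I84",
    "L104", "L108", "A157", "L167", "F169",
}

def shared_features(ligand_hbond, ligand_hydrophobic):
    """Count interactions of a ligand that match the reference by type and residue."""
    shared_hbond = REFERENCE_HBOND & set(ligand_hbond)
    shared_hydrophobic = REFERENCE_HYDROPHOBIC & set(ligand_hydrophobic)
    return len(shared_hbond) + len(shared_hydrophobic)

def is_hit(ligand_hbond, ligand_hydrophobic, minimum=5):
    """Apply the 'greater than five shared interactions' rule from the text."""
    return shared_features(ligand_hbond, ligand_hydrophobic) > minimum

# Hypothetical docked poses (illustrative contact lists, not data from the study).
ligand_x = ({"D168"}, {"V38", "L75", "I84", "L104", "L167", "F169"})
ligand_y = ({"E71", "H174"}, {"L75", "L171"})

hit_x = is_hit(*ligand_x)  # 1 H-bond + 6 hydrophobic contacts shared
hit_y = is_hit(*ligand_y)  # 1 H-bond + 1 hydrophobic contact shared
```

Matching is done per interaction type, so a residue contacted hydrophobically does not count toward a reference hydrogen bond; this is one possible reading of "sharing features", stated here as an assumption.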

#### *3.2. Dynamic-Based Screening and End-Point Binding Free Energy Calculations*

To screen the hit compounds' binding capability under near-physiological, dynamic conditions, each constructed protein-ligand complex was subjected to a 100 ns MD simulation. The trajectories from the last ten nanoseconds (90–100 ns), which were considered to have reached equilibrium (supported by the plot of root-mean-square displacement (RMSD) for the backbone amino acids within 5 Å of the ligand, shown in Figure S3), were used to calculate the binding free energy (∆Gbind). This parameter was used to indicate the protein-ligand binding affinity and served as a dynamics-based screening tool for ranking the hit compounds after rigid docking.

**Figure 4.** Map of intermolecular interactions of BIRB796 and all 22 screened compounds, as well as the total number of features sharing interactions with BIRB796. Each type of noncovalent interaction is illustrated in a different color. It is worth noting that these interactions were based upon the best docked conformation and visualized by Accelrys Discovery Studio 2.5. \* The compounds selected to run MD simulations.

As listed in Table 1, the ∆Gbind values of all hit compounds were in the range of −11.4 to −7.2 kcal/mol, whilst the ∆Gbind of BIRB796 was −11.95 kcal/mol. Importantly, the predicted ∆Gbind of BIRB796 (−11.95 ± 0.04 kcal/mol) is close to the experimentally derived value (−10.98 kcal/mol [41]), which supports the predictive method and the reliability of the results obtained. For screening purposes, only the two hit compounds displaying a level of binding strength similar to that of BIRB796, lomitapide (∆Gbind = −11.39 ± 0.05 kcal/mol) and nilotinib (∆Gbind = −11.21 ± 0.04 kcal/mol), were selected for further investigation; the chemical structures of these three compounds are illustrated in Figure 2B. For nilotinib, p38 MAPK was previously reported as a possible new off-target in a myoblast cell line [42], which could support our theoretical findings. In particular, the calculated energy terms shown in Table 1 could indicate which types of noncovalent interactions are responsible for drug recognition. In this case, we found that all three drugs possessed a considerably higher contribution of van der Waals (vdW) interaction energies than of other types of interaction energies, agreeing well with a previous study that suggested the hydrophobicity of the binding pocket [43]. Additionally, the higher contribution of vdW interaction energies might imply that the screened drugs preferentially target the hydrophobic regions within the focused binding site, as similarly observed for previously reported potent inhibitors [11]. For the solvation effect, the polar solvation energies, expressed as ∆GRF, were in the range of 10.52 to 20.55 kcal/mol. Lomitapide showed a slightly higher ∆GRF (17.01 ± 0.24 kcal/mol) than nilotinib and BIRB796 (15.54 ± 0.19 and 15.60 ± 0.20 kcal/mol, respectively), implying a slightly higher polar solvation energy in the lomitapide complex. For ∆Gcavity, the nonpolar solvation energies were in the range of −7.13 to −14.43 kcal/mol. Among the candidates, lomitapide possessed the largest ∆Gcavity contribution (−14.43 ± 0.04 kcal/mol), suggesting that the drug could be well buried in the cleft of the binding site, while nilotinib and BIRB796 showed slightly smaller nonpolar solvation contributions (−12.35 ± 0.03 and −13.63 ± 0.04 kcal/mol, respectively). When the solvation free energy was included, the vdW term (∆E<sub>vdW</sub> + ∆G<sub>sol</sub><sup>nonpolar</sup>) was the main contribution to the total binding free energies of both drug candidates as well as BIRB796, whilst the electrostatic term (∆E<sub>ele</sub> + ∆G<sub>sol</sub><sup>polar</sup>) became much less favorable to binding (Figure S4).
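For orientation, the way these energy terms are usually combined in the SIE scheme can be written as follows. This is a hedged summary of the general SIE functional form as commonly published, not a formula stated in this section; the symbol ∆MSA (change in molecular surface area upon binding) is an assumption introduced here, while α, γ, and the constant C correspond to the coefficients quoted in the Table 1 caption.

```latex
\Delta G_{\mathrm{bind}} \approx
  \alpha \left( \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}}
              + \Delta G_{\mathrm{RF}} + \Delta G_{\mathrm{cavity}} \right) + C,
\qquad
\Delta G_{\mathrm{cavity}} = \gamma \, \Delta \mathrm{MSA}
```

Read this way, the favorable vdW and cavity terms and the unfavorable reaction-field term are summed and then scaled by α before the constant is added, which is why the individual components in Table 1 are much larger in magnitude than the resulting ∆Gbind.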

**Table 1.** ∆Gbind values (kcal/mol) of the candidate compounds as well as BIRB796 in complex with p38α MAPK calculated by the SIE-based end-point method using α, γ, and constant coefficients of 0.10, 0.01, and −2.89, respectively.


\* The experimental binding free energy was derived from the IC<sub>50</sub> of 0.018 µM [41] and was calculated by the equation ∆Gbind = RT ln IC<sub>50</sub>.
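The conversion in the footnote can be reproduced with a few lines of Python. The temperature is not stated in this section, so the default of 310 K used below is an assumption; with that value the result comes out close to the reported −10.98 kcal/mol, whereas 298.15 K gives a less negative value of about −10.6 kcal/mol. The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dg_from_ic50(ic50_molar, temperature_k=310.0):
    """Estimate a binding free energy (kcal/mol) as RT ln(IC50), with IC50 in mol/L.

    The default temperature of 310 K is an assumption, not a value from the text.
    """
    return R_KCAL * temperature_k * math.log(ic50_molar)

# IC50 of BIRB796 from the footnote: 0.018 µM = 0.018e-6 mol/L.
dg_exp_310 = dg_from_ic50(0.018e-6)
dg_exp_298 = dg_from_ic50(0.018e-6, temperature_k=298.15)
```

Note that this relation treats the IC<sub>50</sub> as a stand-in for a dissociation constant, which is an approximation.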

#### *3.3. Contact Atoms and Numbers of Hydrogen Bond Formation*

The number of atoms surrounding a ligand is a useful indicator of how well the drug is recognized within the focused allosteric site. Herein, all atoms within 5.0 Å of the ligand were counted, and the number of surrounding atoms averaged over the last 10 ns of each complex followed the order lomitapide (429 ± 17 atoms) > BIRB796 (424 ± 6 atoms) > nilotinib (405 ± 2 atoms), as illustrated in Figure 5. The number of surrounding atoms in the lomitapide complex was slightly higher than in the BIRB796 and nilotinib complexes, suggesting that the binding pocket residues pack closely around the ligand during complexation.
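A contact count of this kind can be sketched in plain Python as below. The coordinates are toy values chosen for illustration, the function names are hypothetical, and whether the reported ± values are standard deviations over frames or over replicate runs is not stated in this section; the sketch simply averages over frames.

```python
import math
import statistics

def count_contacts(ligand_xyz, surrounding_xyz, cutoff=5.0):
    """Count surrounding atoms lying within `cutoff` Å of at least one ligand atom."""
    return sum(
        1
        for atom in surrounding_xyz
        if min(math.dist(atom, lig_atom) for lig_atom in ligand_xyz) <= cutoff
    )

def mean_and_sd(counts):
    """Mean and sample standard deviation of per-frame contact counts."""
    return statistics.mean(counts), statistics.stdev(counts)

# Toy coordinates in Å (hypothetical, for illustration only).
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_1 = [(0.0, 0.0, 4.0), (6.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 7.0, 0.0)]
frame_2 = [(0.0, 0.0, 4.0), (6.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0)]

per_frame = [count_contacts(ligand, frame) for frame in (frame_1, frame_2)]
average, spread = mean_and_sd(per_frame)
```

For real trajectories, a trajectory-analysis package with neighbor searching would be far more efficient than this all-pairs loop; the sketch only makes the counting rule explicit.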

Furthermore, the number of hydrogen bonds (H-bonds), considered one of the strong interactions responsible for drug-receptor binding, was analyzed over 90–100 ns in three independent replicates. As shown in Figure 5, the average numbers of H-bonds for BIRB796 and nilotinib were at a similar level (≈3–4 bonds), while lomitapide displayed fewer interactions of this kind (≈2–3 bonds). This indicated that both BIRB796 and nilotinib could form more H-bonds than lomitapide. This is likely because lomitapide intrinsically has fewer hydrogen bond donors and acceptors than BIRB796 and nilotinib: the total numbers of H-bond donors and acceptors of BIRB796, nilotinib, and lomitapide are 7, 8, and 5, respectively (analyzed by the PharmaGist web interface [44], as listed in Table S1).


**Figure 5.** (**A**) Numbers of surrounding atoms counted within 5.0 Å of the ligand and numbers of H-bonds within the p38α MAPK-BIRB796 complex and the complexes of the two focused drugs over the last 10 nanoseconds (90–100 ns). The results are shown for three independent runs. (**B**) Percentage of H-bond occurrence during complex formation of the two screened drugs and BIRB796, using two criteria: (1) a distance between the hydrogen bond donor (HD) and hydrogen bond acceptor (HA) of ≤3.5 Å and (2) an angle of ≥120°.
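The two geometric criteria stated in the Figure 5 caption translate directly into an occupancy calculation. The following is a minimal sketch with hypothetical per-frame distances and angles; it assumes that both quantities have already been measured for one donor-acceptor pair in every frame, and the function name is illustrative.

```python
def hbond_occupancy(distances, angles, max_distance=3.5, min_angle=120.0):
    """Percentage of frames in which one donor-acceptor pair satisfies both criteria.

    distances: donor-acceptor distance per frame, in Å
    angles:    hydrogen-bond angle per frame, in degrees
    """
    if len(distances) != len(angles):
        raise ValueError("distances and angles must cover the same frames")
    if not distances:
        return 0.0
    hits = sum(
        1
        for distance, angle in zip(distances, angles)
        if distance <= max_distance and angle >= min_angle
    )
    return 100.0 * hits / len(distances)

# Hypothetical four-frame example: two frames satisfy both criteria.
example_distances = [2.9, 3.1, 3.6, 3.4]
example_angles = [165.0, 150.0, 170.0, 110.0]
occupancy = hbond_occupancy(example_distances, example_angles)
```

In the toy data, the third frame fails the distance criterion and the fourth fails the angle criterion, so the pair counts as hydrogen-bonded in half of the frames.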

In addition, the intermolecular H-bond interactions were analyzed in terms of percentage occupation (Figure 5B), which indicates how often transient H-bonds occurred during the simulated time. As expected, only a few strong hydrogen bonds were seen for the focused drugs and even for BIRB796, since their binding was mainly driven by hydrophobic interactions (Table 1). All three compounds formed very strong H-bonds with D168 (99.7%, 92.7%, and 98.7% occupation for BIRB796, lomitapide, and nilotinib, respectively). Additionally, BIRB796 showed another very strong H-bond with E71 (≈99%), while the occupation of this interaction was reduced to 70% and 34.5% for nilotinib. The partial loss of this interaction in nilotinib might cause its slight reduction in binding affinity compared to BIRB796 (Table 1). We could not observe an H-bond with E71 for lomitapide, since it lacks H-bond donors at that position. Hence, we hypothesized that adding a functional group containing H-bond donors (e.g., -NH2) onto a carbon atom of the piperidine ring in its structure might allow additional H-bond interactions with E71. To test this, we ran MD simulations of the modified lomitapide structure and analyzed the binding energy over the last 10 ns using the end-point SIE-based method (the same protocol used for the other compounds). As expected, an H-bond with E71 was formed with 82.45% occupation during the simulated time. Moreover, the binding energy decreased from −11.39 ± 0.05 kcal/mol to −12.15 ± 0.06 kcal/mol (better binding affinity, Table S3), slightly lower than that of BIRB796 (−11.95 ± 0.04 kcal/mol). This finding suggested that modifying the lomitapide structure to permit interaction with E71 could improve its binding affinity, which encouraged us to investigate further.

#### *3.4. Key Binding Residues*

To elucidate the key binding amino acids responsible for drug recognition within the allosteric pocket of p38α MAPK, the per-residue decomposition of free energy (∆G<sub>bind</sub><sup>residue</sup>), based on the MM/GBSA method, was computed. Negative and positive ∆G<sub>bind</sub><sup>residue</sup> values indicate ligand stabilization and destabilization, respectively. The contribution of each amino acid in the known inhibitor complex and the two focused complexes is shown in Figure 6. Note that among residues 5–352 of p38α MAPK, only residues 5–250 are shown.

**Figure 6.** Per-residue free energy decomposition of amino acids involved in ligand binding, where the highest to lowest ∆G<sub>bind</sub><sup>residue</sup> contribution (more negative value) is shaded from dark blue to white.

For BIRB796, E71 and D168 clearly played a pivotal role in stabilizing the protein–ligand complex, as shown by their large ∆G<sub>bind</sub><sup>residue</sup> values (approximately −6 and −4 kcal/mol, respectively). In addition, four hydrophobic residues (L75, I84, L107, and L108) and one polar uncharged amino acid (T106) were found to be involved in complex formation. These key binding residues agreed well with previous reports on the binding mode of BIRB796 [11,41]. Apart from the reference ligand, the amino acids contributing most to lomitapide binding (∆G<sub>bind</sub><sup>residue</sup> < −1.5 kcal/mol) include L74, L75, T106, L167, L171, and H174. Almost all were hydrophobic residues (except H174, a polar positively charged residue), suggesting that ensembles of hydrophobic interactions dominated the binding (supported by the per-residue vdW interaction energy illustrated in Figure S4). Among these, the four residues L75, I84, T106, and L167 shared binding features in common with BIRB796. Interestingly, unlike BIRB796, lomitapide could bind to L171 and H174. Interaction with amino acids in the region of 170–199 has attracted considerable attention, as similarly observed in a new series of benzooxadiazole-based p38 inhibitors that was granted a patent in 2014–2015 (Allinky Biopharma Co., Madrid, Spain) [11]. In the case of nilotinib, the key amino acids contributing to its binding were mostly the same residues responsible for BIRB796 binding (E71, L75, I84, L107, L108, and D168), since it belongs to the same type of inhibitor (kinase inhibitor). Among these, E71 and D168 were essentially responsible for stabilizing the complex via H-bonds, while the others relied on hydrophobic interactions (Figure S4). Two additional residues, K53 and L74, were also observed. These results correlate well with the calculated SIE-based ∆Gbind and with each energy component listed in Table 1.

#### *3.5. QM-Based ONIOM Binding Energy*

ONIOM binding energies were calculated to further assess the binding ability of the two screened drugs within the focused allosteric site of p38α MAPK. The calculations were based on the QM method, which could provide a more reliable prediction than the end-point estimation [45]. As shown in Table 2, the calculated binding energy (E<sub>binding</sub><sup>solvation</sup>) values ranged from approximately −41.0 to −48.5 kcal/mol. The E<sub>binding</sub><sup>solvation</sup> values displayed a trend similar to the SIE-based prediction, in which the binding affinity of BIRB796 was slightly higher than that of lomitapide and nilotinib. Even though the E<sub>binding</sub><sup>solvation</sup> values of the two screened drugs indicated slightly lower affinity, their predicted binding strength was still high and comparable to that of the reference inhibitor, BIRB796. Accordingly, we believe that lomitapide and nilotinib could inhibit the phosphorylation of p38α MAPK, and the ONIOM-based method theoretically supported their inhibitory capability towards p38α MAPK at the allosteric site. It is worth noting that the trend in binding affinity predicted by the ONIOM energy calculations was in good agreement with the SIE-based end-point method.
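As a reminder of what a two-layer ONIOM energy represents, the standard extrapolation formula is given below, together with a supermolecular binding-energy expression. The first relation is the general ONIOM definition; the second is an assumption about how a binding energy is typically assembled from such energies, since the exact layer partitioning and the treatment of solvation are not described in this section. Per the Table 2 caption, the high level would correspond to B3LYP/6-31G(d) and the low level to PM6.

```latex
E^{\mathrm{ONIOM}} =
  E^{\mathrm{high}}_{\mathrm{model}}
  + E^{\mathrm{low}}_{\mathrm{real}}
  - E^{\mathrm{low}}_{\mathrm{model}},
\qquad
E_{\mathrm{binding}} =
  E^{\mathrm{ONIOM}}_{\mathrm{complex}}
  - \left( E^{\mathrm{ONIOM}}_{\mathrm{protein}} + E^{\mathrm{ONIOM}}_{\mathrm{ligand}} \right)
```

Here "model" denotes the small region treated at the high level (typically the ligand and nearby residues) and "real" the full system treated at the low level.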

**Table 2.** Calculated binding energy (E<sub>binding</sub><sup>solvation</sup>) in kcal/mol of the two screened drugs and BIRB796 by means of ONIOM at the B3LYP/6-31G(d):PM6 level of theory.


#### **4. Conclusions**

Since no drugs have been approved as therapeutic agents for p38α MAPK, our research is one of the collective efforts to search, by a drug repurposing approach, for effective drugs acting on this target at the allosteric site. Verified docking- and dynamics-based screening revealed that lomitapide and nilotinib could impede the function of p38α MAPK with high binding affinity and favorable binding characteristics. The binding affinity estimated by both the end-point and QM-based ONIOM methods was comparable to that of the inhibitor prototype (BIRB796), supported by the calculated numbers of atoms within 5.0 Å of the ligand. Specifically, vdW interaction energies were the main force driving complex formation. Moreover, all drugs could form a few H-bonds with the amino acids lining the allosteric site, ranking in the order BIRB796 ≈ nilotinib > lomitapide. Two residues (E71 and D168) played a pivotal role in forming very strong H-bonds with the focused drugs. More importantly, we proposed that modifying the structure of lomitapide to allow it to interact with E71 via H-bonds could improve its binding affinity. Altogether, our in silico study not only presented potential inhibitors, but also provided useful information at the atomic level to shed light on the rational design of more potent inhibitors disrupting the MAPK signaling pathway. However, experiments determining the biological activities of these compounds, including enzyme- and cell-based assays, should be carried out.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics14071461/s1, Figure S1: Ramachandran plot analysis of a homologically constructed protein structure, Figure S2: Alignment of the re-docked pose and available crystallized ligand (BIRB796) of p38α MAPK indicating a verified docking protocol used in this study, Figure S3: Plot of root-mean-square displacement (RMSD) for the backbone amino acids within 5 Å from the ligand, Figure S4: Analysis of per-residue vdW and electrostatic decomposition energy, Table S1: Summary of key pharmacophore features of BIRB796, lomitapide, and nilotinib detected by using the PharmaGist web interface, Table S2: The ∆Gbind value (kcal/mol) in each run and the averaged ∆Gbind of the two focused drug candidates and BIRB796 in complex with p38α MAPK, Table S3: The ∆Gbind value (kcal/mol) of the modified structure of lomitapide in complex with p38α MAPK, calculated by using the end-point SIE method.

**Author Contributions:** U.S. carried out the preparation, data collection, virtual screening, molecular dynamic simulations, binding free energy calculations, and wrote the initial version of the manuscript. P.M. and T.R. conceived this study and are responsible for the overall design, interpretation, manuscript preparation, and communication. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was financially supported by the Thailand Research Fund (grant number RSA6280085). P.M. would like to thank the Fundamental Fund of Khon Kaen University and the National Science, Research and Innovation Fund (NSRF) for the funding support.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** U.S. would like to thank the Science Achievement Scholarship (SAST) of Thailand for the Ph.D. scholarship, the 90th Anniversary of Chulalongkorn University Fund (Ratchadaphiseksomphot Endowment Fund; GCUGR1125651029D), and the Overseas Presentations of Graduate Level Academic Thesis from Graduate School.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Drug Repurposing Based on Protozoan Proteome: In Vitro Evaluation of In Silico Screened Compounds against** *Toxoplasma gondii*

**Débora Chaves Cajazeiro <sup>1</sup> , Paula Pereira Marques Toledo <sup>1</sup> , Natália Ferreira de Sousa <sup>2</sup> , Marcus Tullius Scotti <sup>2</sup> and Juliana Quero Reimão <sup>1,</sup>\***


**Abstract:** *Toxoplasma gondii* is a protozoan that infects up to a third of the world's population. This parasite can cause serious problems, especially when a woman is infected during pregnancy, in which case toxoplasmosis can cause miscarriage or serious complications for the baby, or in immunocompromised people, in whom the infection can affect the eyes or brain. To identify potential drug candidates that could counter toxoplasmosis, we selected 13 compounds, pre-screened in silico based on the proteome of *T. gondii*, to be evaluated in vitro against the parasite in a cell-based assay. Among the selected compounds, three demonstrated in vitro anti-*T. gondii* activity in the nanomolar range (almitrine, bortezomib, and fludarabine), and ten compounds demonstrated anti-*T. gondii* activity in the micromolar range (digitoxin, digoxin, doxorubicin, fusidic acid, levofloxacin, lomefloxacin, mycophenolic acid, ribavirin, trimethoprim, and valproic acid). Almitrine demonstrated a Selectivity Index (the ratio between the Half Cytotoxic Concentration against human foreskin fibroblasts and the Half Effective Concentration against *T. gondii* tachyzoites) higher than 47 and was therefore considered a lead compound against *T. gondii*. Almitrine showed interactions with the Na<sup>+</sup>/K<sup>+</sup> ATPase transporter of *Homo sapiens* and *Mus musculus*, indicating a possible mechanism of action of this compound.

**Keywords:** bioinformatics; drug repurposing; toxoplasmosis; *Toxoplasma gondii*; in vitro screening; drug targets; drug discovery

#### **1. Introduction**

*Toxoplasma gondii* is an obligate intracellular protozoan parasite that belongs to the phylum Apicomplexa and is the etiological agent of toxoplasmosis [1]. The parasite diverged from closely related species through its ability to infect a wide range of hosts, reinforced by flexible transmission pathways [2]. Because of this, it is estimated that more than 60% of the population throughout the world has been infected [3] and, in Brazil, the serologic prevalence of *T. gondii* human infection ranges from 50% to 80% [4].

Despite the importance of toxoplasmosis to public health, considering its high prevalence in the human population and the serious clinical manifestations, mainly in immunocompromised patients and in cases of congenital infection [5], there are still very few therapeutic options available, these being effective only against the acute form of the disease [6].

Ideal drugs for toxoplasmosis treatment should be effective against the chronic form of infection, be affordable, and present low or no toxicity [7].

**Citation:** Cajazeiro, D.C.; Toledo, P.P.M.; de Sousa, N.F.; Scotti, M.T.; Reimão, J.Q. Drug Repurposing Based on Protozoan Proteome: In Vitro Evaluation of In Silico Screened Compounds against *Toxoplasma gondii*. *Pharmaceutics* **2022**, *14*, 1634. https://doi.org/10.3390/pharmaceutics14081634

Academic Editors: Paul Bogdan, Lucreția Udrescu, Mihai Udrescu and Ludovic Kurunczi

Received: 18 May 2022 Accepted: 2 August 2022 Published: 5 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Ideally, they should also present no risk of congenital malformation, allowing pregnant women to use them safely. However, several of these characteristics are not found in the drugs currently used in standard toxoplasmosis therapy, which has remained unchanged since the beginning of the 1990s [7]. Since current chemotherapy is insufficiently effective, requires extended treatments that vary from weeks to over a year in duration, or shows high toxicity [8], alternative therapeutic options for toxoplasmosis treatment are of utmost importance.

The research and development of new drugs represent a slow and onerous process. New techniques have been proposed to speed up this process, and one of these is called 'drug repositioning'. It consists of a strategy that seeks new applications for an existing drug, which have not been previously referenced and are not currently prescribed or researched [9].

Aiming to find new uses for already known compounds, the international organization Medicines for Malaria Venture (MMV) and the Drugs for Neglected Diseases initiative (DNDi), together with researchers from the industrial and academic fields, created the Pandemic Response Box and the COVID Box. Together, these collections consist of 560 structurally diverse active compounds, all set for trial against infectious and neglected diseases. These compounds were selected from an extensive list of antibacterial, antifungal, and antiviral compounds, all of which are already being commercialized or are in the clinical development phase [10].

Malaria Box and Pathogen Box are two other collections created by MMV that gather around 800 compounds with confirmed activity against the most socio-economically relevant diseases worldwide, such as malaria, tuberculosis, sleeping sickness, leishmaniasis, schistosomiasis, ancylostomiasis, toxoplasmosis, cryptosporidiosis, and dengue. These collections have been used to identify new drug candidates for the treatment of many diseases, including toxoplasmosis [2,11,12].

Bioactivity databases, such as ChEMBL and DrugBank, provide information about the interactions between compounds and proteins. Sarteriale et al. [13] presented an approach to pre-screen the entire proteome of any organism with available genomic data against known drug targets, using a combination of Ruby scripts and freely available resources. This method was used to predict inhibitors for disease-causing protozoan parasites. The authors validated the in silico results in vitro, using a cell-based *Cryptosporidium parvum* growth assay, and showed that the predicted inhibitors were significantly more likely to be active than expected by chance. However, the identified compounds had not been evaluated against *T. gondii* in a cell-based assay until now. Here, we tested some of the inhibitors identified by Sarteriale et al. 2014 [13], aiming to confirm the in silico predicted activity against *T. gondii* in a cell-based assay.

Amongst the compounds that presented a *T. gondii* protein as their target in the virtual screening, 13 were selected for evaluation against *T. gondii* in the present work. This selection was based on the presence of these compounds in the MMV Pandemic Response Box and COVID Box collections, aiming to evaluate in vitro the predicted activity against *T. gondii* (Figure 1).

We crossed the in silico screening results achieved by Sarteriale et al. 2014 [13] with the MMV libraries, aiming to build a small enriched compound collection for in vitro drug testing.

Our objective was to test only the in silico predicted *T. gondii* inhibitors available in the Pandemic Response Box and COVID Box collections, enabling a more efficient use of laboratory resources. We obtained 100% accuracy, since all these 13 compounds showed anti-*T. gondii* activity in the micromolar or nanomolar range, this being the first report about the in vitro anti-*T. gondii* activity of almitrine, bortezomib, and fludarabine.

Drug development is a lengthy, complex, and costly process, fraught with a high degree of uncertainty about whether a drug will succeed. In this context, drug repurposing, a strategy for identifying new clinical uses for existing drugs, becomes an attractive approach to drug discovery, as it involves potentially lower development costs as well as shorter timelines [14].

**Figure 1.** Study design and workflow. Following the publication of predicted drugs for *T. gondii* via DrugBank alignments by Sarteriale et al. (2014) [13], we selected 13 compounds from the Pandemic Response and COVID Boxes to be evaluated in vitro against *T. gondii* and by molecular docking and dynamics simulations in the present work. Among the 13 selected compounds, three demonstrated in vitro anti-*T. gondii* activity in the nanomolar range (almitrine, bortezomib, and fludarabine), and ten compounds demonstrated anti-*T. gondii* activity in the micromolar range.

Because repurposing screens can be costly and time consuming, an in silico drug screen with the ability to identify drugs with a high likelihood of activity improves the chances of success by enabling the pre-selection of compounds to test in vitro.

Here we connected traditional drug discovery techniques with computer-based tools to deliver robust drug repurposing hints. We used a target-based pre-screen that utilized simple sequence alignment techniques to discover potential drugs [13]. Drugs' structural and physicochemical properties and the predicted drug-target interactions were explored to select potential re-positioned compounds to treat toxoplasmosis. Therefore, the contributions of this manuscript are:


The goal of the present work is therefore to contribute to the discovery of new candidates for toxoplasmosis chemotherapy, using repositioned compounds. The strategy of drug repositioning allows for efficient progress in the drug discovery process since many of the compounds are clinically safe and have well-established pharmacological action.

#### **2. Materials and Methods**

#### *2.1. Drugs and Chemicals*

Pyrimethamine (PYR), dimethyl sulfoxide (DMSO), chlorophenol red-β-D-galactopyranoside (CPRG), phosphate-buffered saline (PBS) and 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide (MTT) were purchased from Sigma-Aldrich Corporation. Dulbecco's Modified Eagle's Medium (DMEM), fetal bovine serum (FBS), dithiothreitol (DTT), HEPES and sodium dodecyl sulfate (SDS) were purchased from Thermo Fisher Scientific. The Pandemic Response Box (PRB) and COVID Box (CB) were kindly donated by the Medicines for Malaria Venture (MMV) foundation. Other analytical reagents were purchased from Sigma-Aldrich, unless otherwise stated.

#### *2.2. Cell Culture and Parasite Propagation*

Tachyzoites of the RH strain encoding a transgenic copy of β-galactosidase (type I, clone 2F1) [15] were continually passaged in confluent monolayers of human foreskin fibroblasts (HFF), cultured in DMEM supplemented with 2% FBS (D2 medium), L-glutamine (2 mM) and gentamycin (10 µg/mL) [16]. Freshly emerged tachyzoites were counted, diluted in fresh culture medium, and added to 96-well plates containing HFF monolayers as described below. All HFF and parasite cultures were grown in a 37 °C incubator supplemented with 5% CO<sub>2</sub>.

#### *2.3. β-Galactosidase-Based Growth Inhibition Assays*

First, 5 × 10<sup>3</sup> HFF cells/well (in 100 µL volume) were seeded in 96-well plates and incubated overnight to adhere. Afterwards, the wells were emptied and refilled with fresh D2 medium containing 5 × 10<sup>3</sup> RH-2F1 parasites (in 100 µL volume) and incubated for 3 h at 37 °C, 5% CO<sub>2</sub>. Subsequently, compounds were serially diluted in D2 medium, added to the infected plates, and incubated for 72 h at 37 °C, 5% CO<sub>2</sub>. Each drug concentration was assessed in two replicate wells. Finally, β-galactosidase activity was evaluated as previously described [17]. Infected cells were incubated with 100 µL of lysis buffer (100 mM HEPES, 1 mM MgSO<sub>4</sub>, 0.1% Triton X-100, 5 mM DTT) for 15 min. Afterwards, the lysates were mixed with 160 µL of assay buffer (100 mM phosphate buffer pH 7.3, 102 mM β-mercaptoethanol, 9 mM MgCl<sub>2</sub>) and, subsequently, with 40 µL of 6.25 mM CPRG. After incubating the reaction mixtures for 30 min, the β-galactosidase activity was measured at 570 nm using a microplate reader (Thermo Scientific™ Varioskan LUX). Pyrimethamine was used as a reference drug (positive control) in all assays. Data presented are representative of the results of two or more biological replicates. Dose-response inhibition curves (log(inhibitor) vs. normalized response, variable slope) were obtained using SkanIt Software (Thermo Scientific, Waltham, MA, USA).
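The curve model named above can be illustrated with a short sketch. The Python example below is not the SkanIt fitting routine; it assumes the commonly used normalized variable-slope form, response = 100/(1 + 10^((logEC50 − logC) × Hill slope)), and recovers the EC50 from synthetic data by a coarse grid search. The dilution series, the "true" parameters and all function names are hypothetical and serve only to show how an EC50 is read from such a curve.

```python
import math

def normalized_response(log_conc, log_ec50, hill_slope):
    """Normalized variable-slope model: response as % of the untreated control."""
    return 100.0 / (1.0 + 10.0 ** ((log_ec50 - log_conc) * hill_slope))

def fit_ec50(log_concs, responses):
    """Coarse grid search that minimises the sum of squared residuals.

    This is an illustrative stand-in for a proper non-linear regression.
    """
    best = None
    for i in range(-300, 301):        # log10(EC50) from -3.00 to 3.00 (log10 µM)
        log_ec50 = i / 100.0
        for j in range(1, 61):        # Hill slope magnitude from 0.1 to 6.0
            hill = -j / 10.0          # negative slope: signal falls as dose rises
            sse = sum((normalized_response(x, log_ec50, hill) - y) ** 2
                      for x, y in zip(log_concs, responses))
            if best is None or sse < best[0]:
                best = (sse, log_ec50, hill)
    return best[1], best[2]

# Hypothetical two-fold dilution series (µM) with noise-free synthetic responses.
concs = [20.0 / 2 ** k for k in range(10)]
log_concs = [math.log10(c) for c in concs]
true_log_ec50, true_hill = math.log10(0.5), -1.5
responses = [normalized_response(x, true_log_ec50, true_hill) for x in log_concs]

log_ec50, hill = fit_ec50(log_concs, responses)
ec50 = 10 ** log_ec50
print(f"Estimated EC50 = {ec50:.3f} µM, Hill slope = {hill:.1f}")
```

With real plate data, the responses would first be normalized to the untreated (100%) and fully inhibited (0%) controls, and a dedicated non-linear regression routine would normally replace the grid search.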

#### *2.4. Cytotoxicity in Mammalian Cells*

HFF were seeded at 5 × 10<sup>4</sup> cells/well in 96-well microplates and incubated overnight to adhere to the plate. After that, the cells were incubated in the presence of increasing concentrations of the compounds for 72 h at 37 °C in a 5% CO<sub>2</sub> humidified incubator. The viability of the cells was determined by the MTT assay as previously described [18]. The medium in each well was replaced by PBS (100 µL/well), MTT (5 mg/mL) was added (20 µL/well), and the plate was incubated for 4 h at 37 °C. Formazan extraction was performed using 10% SDS for 18 h (80 µL/well) at room temperature, and the optical density was measured at 550 nm using a microplate reader (Thermo Scientific™ Varioskan LUX). HFF incubated in D2 without drug treatment were used as the viability control. After normalization, 100% viability was defined by the optical density of untreated HFF cells. The Selectivity Index (SI) was calculated as the ratio between the CC<sub>50</sub> against HFF cells and the EC<sub>50</sub> against *T. gondii* tachyzoites. Data presented are representative of the results of two or more biological replicates. Dose-response inhibition curves (log(inhibitor) vs. normalized response, variable slope) were obtained using SkanIt Software (Thermo Scientific).
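The Selectivity Index calculation can be written out as a one-line ratio. The sketch below uses the almitrine values reported in this article (EC<sub>50</sub> = 0.424 µM; CC<sub>50</sub> above 20 µM, the highest concentration tested); the function name is hypothetical. Because the CC<sub>50</sub> is only known to exceed 20 µM, the result is a lower bound on the SI rather than a point estimate.

```python
def selectivity_index(cc50_um, ec50_um):
    """SI = CC50 (host cells) / EC50 (parasite); both values in the same unit."""
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return cc50_um / ec50_um

# Almitrine: the CC50 exceeded the top tested concentration (20 µM),
# so using 20 µM gives a lower bound on the selectivity index.
si_lower_bound = selectivity_index(20.0, 0.424)
print(f"SI > {int(si_lower_bound)}")
```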

#### *2.5. Molecular Docking*

Molecular docking was used to investigate the possible mechanism of action of the 13 compounds included in the study, by estimating the binding affinity between each compound and its predicted molecular target in *T. gondii* [19]. The 3D structure of each enzyme was obtained from the Protein Data Bank (PDB) (https://www.rcsb.org/pdb/home/home.do accessed on 14 March 2022) [20]. Initially, all water molecules were removed from the crystal structure, and the root mean square deviation (RMSD) was calculated from the redocked poses as an indicator of the reliability of the fit. The RMSD measures how close the docked binding mode is to the experimental structure, and redocking is considered successful if the value is less than 2.0 Å [21]. We used two software packages: Molegro Virtual Docker v.6.0.1 (MVD) (CLC Bio Company, Aarhus, Denmark) and the PyRx Virtual Screening Tool (SourceForge, Slashdot Media, 2022). The complexed ligand was used to define the active site. Each compound was then imported to analyze the stability of the system through the interactions identified with the active site of the enzyme, taking the MolDock Score energy value as a reference [22]. The MolDock SE (Simplex Evolution) algorithm was used with the following parameters: a total of 10 runs with a maximum of 1500 iterations, using a population of 50 individuals, with 2000 minimization steps for each flexible residue and 2000 global minimization steps per run. The MolDock Score (GRID) scoring function was used to calculate docking energy values. The grid resolution was set at 0.3 Å and the search sphere was set at 15 Å in radius. For the analysis of ligand energy, internal electrostatic interactions, internal hydrogen bonds and sp2-sp2 torsions were evaluated [23,24].
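The redocking criterion can be made concrete with a minimal sketch. The example below computes the RMSD between a crystallographic ligand pose and a redocked pose and applies the 2.0 Å cutoff. It assumes that both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied; the coordinates and function name are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation (Å) between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))

# Hypothetical heavy-atom coordinates (Å) for a three-atom ligand fragment.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked_pose = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]

value = rmsd(crystal_pose, redocked_pose)
redocking_ok = value < 2.0   # success threshold used in the text
print(f"RMSD = {value:.3f} Å, redocking successful: {redocking_ok}")
```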
PyRx bundles two main docking programs. The first is AutoDock (version 4.2.6) (Center for Computational Structural Biology, San Diego, CA, USA), which uses force fields such as AMBER in conjunction with free energy scoring functions, plus affinity maps and pre-calculated electrostatic maps for specific atoms [25,26]. The second is AutoDock Vina (version 1.2) (Center for Computational Structural Biology, San Diego, CA, USA), which corresponds to a more recent and improved version of the calculation platform. The software uses a semi-flexible docking algorithm by default. The docking site of the receptor was defined within the binding site of the co-crystallized ligand, identified through the coordinates of the ligand after importing and labeling the macromolecule [27,28]. The program was used with the default plug-in parameters. Furthermore, the hydrogen bonding distance (O-H) was defined at <2.50 Å between the donor and acceptor atoms, with a minimum hydrogen donor-acceptor angle of 120°. Grid size was adjusted to 25 Å in each dimension. The proteins used in the study were, respectively: thymidylate synthase in complex with 2-amino-5-(phenylsulfanyl)-3,9-dihydro-4H-pyrimido[4,5-b]indol-4-one (PDB: 4KY4) [29], purine nucleoside phosphorylase in complex with 1,4-dideoxy-4-aza-1-(s)-(9-deazahypoxanthine-9-yl)-d-ribitol (PDB: 3MB8) [30], enoyl-acyl carrier protein reductase (ENR) in complex with triclosan (PDB: 2O2S) [31], and calcium-dependent protein kinase 1 in complex with 5-amino-1-tert-butyl-3-(quinolin-2-yl)-1H-pyrazole-4-carboxamide (PDB: 4M84) [32]. In addition, to evaluate the specificity of the mechanism of action with Na<sup>+</sup>/K<sup>+</sup>-transporting ATPase alpha 1, models of this macromolecule were constructed for the species *Homo sapiens* and *Mus musculus* [32], with thapsigargin [33] as a positive control.
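One way to picture how a search box can be derived from the co-crystallized ligand is sketched below. The example centres a cubic box of 25 Å per side on the centroid of the ligand atoms; using the centroid is an assumption made for illustration, since the text only states that the site was identified through the ligand coordinates. The coordinates and function names are hypothetical, and the dictionary keys simply mirror the centre and size fields of an AutoDock Vina configuration.

```python
def search_box(ligand_coords, edge=25.0):
    """Cubic search box centred on the centroid of the ligand atoms (Å)."""
    n = len(ligand_coords)
    if n == 0:
        raise ValueError("ligand has no atoms")
    cx, cy, cz = (sum(p[k] for p in ligand_coords) / n for k in range(3))
    return {
        "center_x": cx, "center_y": cy, "center_z": cz,
        "size_x": edge, "size_y": edge, "size_z": edge,
    }

def inside_box(point, box):
    """True if a point lies within the box along all three axes."""
    axes = ("x", "y", "z")
    return all(abs(point[i] - box[f"center_{a}"]) <= box[f"size_{a}"] / 2
               for i, a in enumerate(axes))

# Hypothetical coordinates (Å) of a co-crystallized ligand.
ligand = [(10.0, 20.0, 30.0), (12.0, 22.0, 32.0), (14.0, 24.0, 34.0)]
box = search_box(ligand)
print(box)
```

A residue atom can then be checked with `inside_box` to confirm that the binding-site residues of interest fall within the search space.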

#### Docking Consensus

To increase the accuracy of the results, a docking consensus analysis was performed to provide a better selection of the compounds under study. For Molegro Virtual Docker v.6.0.1 (MVD) (CLC Bio Company, Aarhus, Denmark), the values of the MolDock Score and PLANTS Score scoring functions were used. For the PyRx Virtual Screening Tool (SourceForge, Slashdot Media, 2022), AutoDock Vina (version 1.2) (Center for Computational Structural Biology, San Diego, CA, USA) was used.

The affinity of the 13 compounds under study for the *T. gondii* targets and the ATPase alpha 1 transporter was established by probability calculations. For each algorithm, the probability was calculated by dividing the score of the molecule under study by the lowest energy score (p = compound score/lowest score) (Supplementary Tables S1–S5). An overall average across the algorithms was then calculated to generate the enzyme average ((p) Enzyme = ((p) MolDock Score + (p) PLANTS Score + (p) Vina Score)/3) [33,34]. The total probability (Total P) was obtained by summing the enzyme averages and dividing by the number of enzymes evaluated.
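The consensus calculation described above can be sketched in a few lines. The example assumes that all docking scores are negative energies (more negative means better), so that dividing a compound's score by the lowest score for the same scoring function gives a value between 0 and 1. All scores below are hypothetical and the function names are illustrative only.

```python
def enzyme_probability(scores, lowest_scores):
    """Mean of score/lowest-score ratios over the scoring functions."""
    ratios = []
    for name, score in scores.items():
        lowest = lowest_scores[name]
        if lowest == 0:
            raise ValueError(f"lowest score for {name} must be non-zero")
        ratios.append(score / lowest)
    return sum(ratios) / len(ratios)

def total_probability(enzyme_probabilities):
    """Mean of the enzyme averages over all enzymes evaluated."""
    return sum(enzyme_probabilities) / len(enzyme_probabilities)

# Hypothetical lowest (best) energy score per scoring function.
lowest = {"moldock": -150.0, "plants": -100.0, "vina": -10.0}

# Hypothetical scores of one compound against two enzymes.
enzyme_a = {"moldock": -120.0, "plants": -80.0, "vina": -8.0}
enzyme_b = {"moldock": -90.0, "plants": -70.0, "vina": -5.0}

p_a = enzyme_probability(enzyme_a, lowest)
p_b = enzyme_probability(enzyme_b, lowest)
total_p = total_probability([p_a, p_b])
print(f"p(A) = {p_a:.2f}, p(B) = {p_b:.2f}, Total P = {total_p:.2f}")
```

In practice the lowest score is specific to each enzyme and scoring function; a single `lowest` dictionary is used here only to keep the sketch short.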

#### *2.6. Alignment of Protein Sequences*

The sequences of the two proteins for which no 3D structure is available in the Protein Data Bank [35] were obtained from the GenBank database [36]. These proteins were Na<sup>+</sup>/K<sup>+</sup>-transporting ATPase alpha 1 of *M. musculus* (NP\_659149.1) and Na<sup>+</sup>/K<sup>+</sup>-transporting ATPase alpha 1 of *H. sapiens* (NP\_000695.2). A global alignment was then performed with the sequence of a protein with a known three-dimensional structure, using the Clustal Omega web tool (EMBL-EBI, 2022 https://www.ebi.ac.uk/Tools/msa/clustalo/ accessed on 14 March 2022) [37], which aligns all protein sequences entered by a user. The alignment facilitated the investigation of the active site and the determination of similarity and shared identity between proteins.
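As an illustration of how shared identity can be read from an alignment, the sketch below computes percent identity from two already aligned sequences. It uses one common convention, counting identical residues over the columns in which neither sequence has a gap; alignment tools may use other denominators. The short aligned fragments are hypothetical and are not taken from the ATPase sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Hypothetical aligned fragments (gaps shown as '-').
target = "MGKGV-GRDKY"
template = "MGKGAAGRD-Y"

identity = percent_identity(target, template)
print(f"Identity = {identity:.1f}%")
```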

#### *2.7. Modeling by Homology*

Target sequences were obtained as amino acid sequences in FASTA format and were imported into the SWISS-MODEL website (https://swissmodel.expasy.org/ accessed on 14 March 2022) [38]. For each identified template, the quality of the resulting model was assessed from the SWISS-MODEL outputs (ProMod3 modelling, QMEAN and GMQE). The stereochemical quality of the models was evaluated with the PSVS (Protein Structure Validation Software suite) web server (http://psvs-1\_5-dev.nesg.org/ accessed on 14 March 2022), using PROCHECK [39]. PROCHECK generates a Ramachandran plot [34,35], which shows whether the backbone dihedral angles of each residue fall in allowed or disallowed regions.

#### *2.8. Molecular Dynamics Simulations*

Molecular dynamics (MD) simulations were performed to estimate the flexibility of interactions between proteins and ligands, using GROMACS 5.0 software (European Union Horizon 2020 Program, Uppsala, Sweden) [40,41]. The protein and ligand topologies were prepared using the GROMOS96 54a7 force field. The MD simulation was performed using the SPC (simple point charge) water model in a cubic box [42]. The system was neutralized by the addition of ions (Cl<sup>−</sup> and Na<sup>+</sup>) and energy-minimized to remove bad contacts between the molecules of the complex and the solvent. The system was then equilibrated at 300 K for 100 ps using the V-rescale thermostat in the NVT ensemble (constant number of particles, volume and temperature). MD simulations were run for 5,000,000 steps, corresponding to 10 ns. To determine the flexibility of the structure and whether the complex is stable close to the experimental structure, RMSD values of all Cα atoms were calculated relative to the starting structures. RMSF values were also analyzed to understand the roles played by residues near the receptor binding site. The RMSD and RMSF graphs were generated using Grace software (Grace Development Team, http://plasma-gate.weizmann.ac.il/Grace/ accessed on 23 June 2022) [43].
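The relation between the step count and the simulated time can be checked with simple arithmetic. The sketch below derives the integration time step implied by 5,000,000 steps over 10 ns and, under the assumption that the same time step was used during the 100 ps equilibration (which the text does not state), the corresponding number of equilibration steps. The function names are hypothetical.

```python
FS_PER_NS = 1_000_000   # femtoseconds per nanosecond
FS_PER_PS = 1_000       # femtoseconds per picosecond

def time_step_fs(total_ns, n_steps):
    """Integration time step (fs) implied by a run length and a step count."""
    return total_ns * FS_PER_NS / n_steps

def steps_for(duration_ps, dt_fs):
    """Number of steps needed to cover a duration at a given time step."""
    return round(duration_ps * FS_PER_PS / dt_fs)

dt = time_step_fs(10, 5_000_000)   # 10 ns over 5,000,000 steps
nvt_steps = steps_for(100, dt)     # assumes the same time step for equilibration
print(f"time step = {dt} fs, equilibration steps = {nvt_steps}")
```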

#### **3. Results**

#### *3.1. In Vitro Anti-T. gondii Activity and Cytotoxicity against HFF*

We tested 13 compounds that have been in silico selected against *T. gondii* from the MMV foundation's Pandemic Response Box and COVID Box. Table 1 and Figure 2 show the structures and general characteristics of the tested compounds.


**Table 1.** General characteristics of the 13 compounds tested against *T. gondii* in vitro.



<sup>a</sup> Compounds are named by their MMV identifier codes. <sup>b</sup> Molecular formula, molecular weight (Mol wt), aLogP values, and information about rule of five were obtained from the Pandemic Response Box and COVID Box supporting information. Pharmaceutics 2022, 14, x FOR PEER REVIEW 8 of 17

**Figure 2.** Structures of the 13 compounds tested against *T. gondii* in vitro. The structures were obtained from http://www.ebi.ac.uk/chembl.

We used a 96-well plate assay based on β-galactosidase expression to estimate the viability of *T. gondii* tachyzoites. Of the 13 tested compounds, three demonstrated anti-*T. gondii* activity in the nanomolar range, namely almitrine (MMV1804175), bortezomib (MMV009415), and fludarabine (MMV637413), with activity comparable to the reference drug pyrimethamine. Ten compounds demonstrated EC<sub>50</sub> values in the micromolar range (digitoxin, digoxin, doxorubicin, fusidic acid, levofloxacin, lomefloxacin, mycophenolic acid, ribavirin, trimethoprim, and valproic acid). The cytotoxicity against mammalian cells was evaluated for the three most active compounds (almitrine, bortezomib, and fludarabine). Almitrine presented the highest selectivity (SI > 47), with a CC<sub>50</sub> value greater than 20 µM (the highest tested concentration) against HFF. Results concerning the anti-*T. gondii* activity and mammalian cytotoxicity are shown in Table 2.

**Table 2.** In vitro activity of the selected compounds against *T. gondii*, with pyrimethamine as the reference drug.


<sup>a</sup> Half Effective Concentration (EC<sub>50</sub>) against *T. gondii* tachyzoites. <sup>b</sup> Half Cytotoxic Concentration (CC<sub>50</sub>) against HFF cells. <sup>c</sup> Selectivity indexes (SI) were calculated based on the CC<sub>50</sub> HFF cells/EC<sub>50</sub> *T. gondii* ratio. n.d.: not determined.

Based on these results, almitrine was considered a promising anti-*T. gondii* drug candidate. The 13 compounds were subjected to molecular docking screening against four *T. gondii* proteins, and almitrine was additionally subjected to docking simulations with the Na<sup>+</sup>/K<sup>+</sup>-ATPase alpha 1 transporter of *H. sapiens* and *M. musculus*.

#### *3.2. In Silico Results*

The in silico screening was carried out in two stages: the first evaluated the probabilities of the compounds against the specific *T. gondii* targets, and the second screened the compounds against the ATPase alpha 1 transporter of *H. sapiens* and *M. musculus*. Prior to the molecular docking simulations, redocking was performed to validate the docking protocol for the enzymes used in the study. The redocking results (Supplementary Table S1) showed that all *T. gondii* targets obtained from the PDB had RMSDs below 2.0 Å, indicating that the generated poses of the co-crystallized ligands are correctly positioned in the enzymes' active sites.

Docking results were generated using three scoring functions (MolDock Score, PLANTS Score and AutoDock Vina). In addition, the probability of activity in each of the enzymes was calculated. The probability obtained with each algorithm is shown for the *T. gondii* enzymes (Supplementary Tables S2–S5) and for the ATPase alpha 1 transporter (Supplementary Tables S6 and S7). The total probability of each compound was also calculated for *T. gondii* and for the ATPase alpha 1 transporter (Supplementary Tables S8 and S9, respectively). A compound was considered active against a protein when its probability was higher than, or close to, the value obtained by the co-crystallized ligand in at least one scoring function. The reference ligands selected in the study are therefore those co-crystallized in the structures obtained from the PDB, which have experimental validation for the respective enzymes.

For *T. gondii* enzymes, the compound doxorubicin achieved the highest total probability, corresponding to 0.8816 (Supplementary Table S8). Furthermore, the compounds almitrine (0.8461) and bortezomib (0.8383) presented probabilities greater than 0.80, which are close to those obtained for the PDB ligands.

Almitrine presented a significant probability for the ATPase alpha 1 transporter (*H. sapiens*), equivalent to 0.8362 (Supplementary Table S9). Furthermore, it was the most likely compound for the ATPase transporter (*M. musculus*), with *p* = 0.9508, and presented the highest total probability for the two enzymes under study (0.8935). This suggests a high predicted affinity of this compound for this macromolecule. The molecular docking of almitrine with the transporters of *M. musculus* and *H. sapiens* can be seen in Supplementary Tables S4 and S5. The docking study of almitrine indicated steric, hydrophobic and hydrogen bonding interactions. In addition, almitrine interacted with residues also contacted by the positive control thapsigargin, including hydrogen bonds with the Arg 551 and Asp 619 residues.

After the analysis of the potential activity of the 13 compounds under study against important *T. gondii* enzymes, molecular dynamics simulations were carried out with almitrine to assess the flexibility of the transporting ATPase alpha 1 and the stability of the enzyme interactions in the presence of factors such as solvent, ions, pressure and temperature. This information is important since it complements the docking results and allows one to evaluate whether the compound remains strongly bound to the studied enzymes in the presence of factors that are found in the host organism. Almitrine was selected to evaluate the stability with the ATPase alpha 1 transporter because it presented the highest total probability for this transporter, taking into account the two species under study, *H. sapiens* and *M. musculus* (Supplementary Table S9). The RMSD was then calculated for the Cα atoms of the complexed enzyme and the structures of each ligand, separately.

The RMSD analysis of the transporting ATPase alpha 1 of *H. sapiens* with almitrine showed conformations with RMSD values ranging from 0.12 to 0.15 nm over 10 ns, indicating high stability (Figure 3). The stability of this protein is essential to keep compounds bound to the active site. Furthermore, stability prevents the ligand from losing important contacts with the enzyme's active site.

**Figure 3.** RMSD values of the Cα atoms of almitrine and the control (thapsigargin) with the transporting ATPase alpha 1. Legend: Green: ATPase of *H. sapiens* complexed with thapsigargin; and Red: ATPase of *H. sapiens* complexed with almitrine.

Regarding the analysis of the flexibility of the ligands through the RMSD calculations of the protein (Figure 4), the profile of the isolated protein was similar to that observed for the control, remaining stable up to 0.4 ns. Almitrine maintained stability up to a certain point, showing a peak in the period from 8.0 to 9.0 ns. Despite the small variation in the protein structure indicated by this peak, there was no interference in the structure of the ligands within the active site, even when the protein changed its conformation. Therefore, in the presence of solvents, ions and other factors, almitrine was able to establish strong interactions with the active site.


**Figure 4.** RMSD values for the Cα atoms of the transporting ATPase alpha 1 of *H. sapiens* complexed with almitrine and the control (thapsigargin). Legend: Green: ATPase of *H. sapiens* complexed with almitrine; Blue: ATPase of *H. sapiens* complexed with thapsigargin; and Red: *H. sapiens* transporting ATPase homologous protein.

To understand the flexibility of the residues that contribute to the conformational changes in the transporting ATPase alpha 1 of *H. sapiens*, the root-mean-square fluctuation (RMSF) was calculated for each amino acid in each enzyme. High RMSF values suggest greater flexibility. Considering that amino acids with fluctuations above 0.3 nm contribute to the flexibility of the protein structure, we found that residues at positions 39, 41, 86, 122, 123, 124, 125, 275, 276, 277, 278, 497, 498, 499, 500, 564, 566, 567, 568, 570, 575, 649, 835, 1011, 1012, 1013 and 1016 contribute to conformational changes in the transporting ATPase alpha 1 of *H. sapiens* (Figure 5). We also found that none of the amino acids that affect the structural conformations identified in the transporting ATPase alpha 1 of *H. sapiens* is a component of the active site. This helps almitrine to remain in the active site.
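The RMSF criterion used above can be illustrated with a minimal sketch. The example computes the per-residue RMSF of Cα positions over a set of trajectory frames and flags residues whose fluctuation exceeds 0.3 nm. It assumes that the frames have already been superposed on a reference structure; the tiny two-residue trajectory and the function names are hypothetical and do not reproduce the analysis tool used in the study.

```python
import math

def rmsf(frames):
    """Per-residue RMSF (nm) from frames of Cα coordinates.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue, assumed to be superposed on a common reference.
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    values = []
    for i in range(n_residues):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msf = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        values.append(math.sqrt(msf))
    return values

def flexible_residues(values, first_residue=1, cutoff_nm=0.3):
    """Residue numbers whose RMSF exceeds the cutoff."""
    return [first_residue + i for i, v in enumerate(values) if v > cutoff_nm]

# Hypothetical trajectory: residue 1 is rigid, residue 2 oscillates along x.
frames = [
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
]

values = rmsf(frames)
flagged = flexible_residues(values)
print(values, flagged)
```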

**Figure 5.** Root-mean-square fluctuation (RMSF) for the Cα atoms of the transporting ATPase alpha 1 of *H. sapiens* complexed with almitrine and the control thapsigargin. Legend: Green: ATPase of *H. sapiens* complexed with almitrine; Blue: ATPase of *H. sapiens* complexed with the control thapsigargin; and Red: *H. sapiens* transporting ATPase homologous protein.

#### **4. Discussion**

Sarteriale et al. (2014) [13] performed an in silico study based on the proteome of *T. gondii* to identify potential drug candidates for toxoplasmosis therapy. Among the inhibitors previously identified, we selected 13 compounds from the MMV collections to be tested against the parasite in a cell-based assay. We found that the selected compounds were active in vitro against the parasite, with EC<sub>50</sub> values ranging from 0.22 to 99.69 µM.

were in vitro active against the parasite, with EC50 values ranging from 0.22 to 99.69 μM. The obtained results indicated that this method is valuable and can be used to build enriched compound libraries for in vitro drug testing, which could enable a more efficient use of laboratory resources, as suggested by Sarteriale et al. 2014 [13], bringing the advantage of reduced speed and cost and extra broadness. We also confirmed that the com-The obtained results indicated that this method is valuable and can be used to build enriched compound libraries for in vitro drug testing, which could enable a more efficient use of laboratory resources, as suggested by Sarteriale et al. 2014 [13], bringing the advantage of reduced speed and cost and extra broadness. We also confirmed that the compound collections from MMV are promising sources of anti-*T. gondii* agents.

pound collections from MMV are promising sources of anti-T. gondii agents. In our study, diverse antitoxoplasmic compounds were identified, representing the In our study, diverse antitoxoplasmic compounds were identified, representing the first time that this combined set of compounds has been evaluated against *T. gondii* in vitro.

first time that this combined set of compounds has been evaluated against T. gondii in vitro. A total of three compounds showed EC50 values against T. gondii at the nanomolar A total of three compounds showed EC<sup>50</sup> values against *T. gondii* at the nanomolar range. Two of them (MMV1804175 and MMV009415) belong to the COVID Box and one of them is part of the Pandemic Response Box (MMV637413).

range. Two of them (MMV1804175 and MMV009415) belong to the COVID Box and one of them is part of the Pandemic Response Box (MMV637413). Compound MMV1804175, commercially named almitrine, was the most selective, with an EC50 value of 0.424 μM against the parasite and a CC50 value higher than 20 μM, the top concentration evaluated. The ratio between the CC50 against HFF and the EC<sup>50</sup> against the parasite resulted in a selectivity index greater than 47. Almitrine is a selective pulmonary vasoconstrictor, which has been proposed as an interesting therapeutic option to manage severe hypoxemia in patients with the Coronavirus 2019 disease [44]. This is the first report about the anti-T. gondii activity of almitrine. Previously published work has demonstrated the in vitro activity of this drug against chloroquine-susceptible and chloroquine-resistant P. falciparum, with EC50 values ranging from 2.6 to 19.8 μM [45]. When almitrine bismesylate was administered to young subjects in single or multiple oral Compound MMV1804175, commercially named almitrine, was the most selective, with an EC<sup>50</sup> value of 0.424 µM against the parasite and a CC<sup>50</sup> value higher than 20 µM, the top concentration evaluated. The ratio between the CC<sup>50</sup> against HFF and the EC<sup>50</sup> against the parasite resulted in a selectivity index greater than 47. Almitrine is a selective pulmonary vasoconstrictor, which has been proposed as an interesting therapeutic option to manage severe hypoxemia in patients with the Coronavirus 2019 disease [44]. This is the first report about the anti-*T. gondii* activity of almitrine. Previously published work has demonstrated the in vitro activity of this drug against chloroquine-susceptible and chloroquine-resistant *P. falciparum*, with EC<sup>50</sup> values ranging from 2.6 to 19.8 µM [45]. 
When almitrine bismesylate was administered to young subjects in single or multiple oral doses, the physiological and blood parameters indicated that the drug was safe at all doses tested, up to 400 mg per day, with symptoms of mild nausea and headache [46].
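The selectivity index quoted for almitrine follows directly from the two reported concentrations. The short sketch below reproduces the arithmetic; the function name is hypothetical, and because the CC50 exceeded the highest concentration tested, the result is only a lower bound.

```python
def selectivity_index(cc50_um, ec50_um):
    """Ratio of host-cell cytotoxicity (CC50) to antiparasitic potency (EC50).

    Both values must be given in the same unit (here µM).
    """
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return cc50_um / ec50_um

# Almitrine: the CC50 against HFF was above 20 µM (top concentration tested),
# so using 20 µM yields a lower bound for the selectivity index.
si_lower_bound = selectivity_index(20.0, 0.424)
print(f"Selectivity index > {si_lower_bound:.1f}")
```

The bound is consistent with the "greater than 47" stated in the text.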

Bortezomib (MMV009415) is a proteasome inhibitor and antineoplastic agent that is used in the treatment of refractory multiple myeloma and certain lymphomas [47]. The compound was equally effective against drug-sensitive and -resistant *P. falciparum*, blocking its intraerythrocytic development prior to DNA synthesis [48]. Here, we report for the first time the anti-*T. gondii* activity of bortezomib. This compound was the most active, with an EC<sub>50</sub> value of 0.223 µM against *T. gondii*. However, this compound presented low selectivity, with a CC<sub>50</sub> value of 0.079 against the mammalian lineage HFF, indicating the need for structural modifications aimed at finding more selective analogues.

The purine analogue fludarabine (MMV637413) is an antineoplastic agent used in the therapy of chronic lymphocytic leukemia and in immunosuppressive regimens in preparation for hematopoietic cell transplantation. This small molecule is an analog of the antiviral agent vidarabine and acts by interrupting DNA synthesis and inhibiting tumor cell growth. Fludarabine is associated with a low rate of transient serum enzyme elevations during therapy and has only rarely been implicated in cases of clinically apparent acute liver injury [49]. To the best of our knowledge, this is the first report of the anti-parasitic activity of this compound.

Among the ten compounds presenting anti-*T. gondii* activity in the micromolar range, we can highlight doxorubicin, an antibiotic isolated from *Streptomyces peucetius* var. *caesius*. The compound triggers oxidative stress causing cardiotoxicity, which compromises its clinical use as an antineoplastic agent [50]. This anti-*T. gondii* candidate also showed activity against another three parasitic protozoan species, namely *C. parvum*, *Trichomonas vaginalis* and *P. falciparum* [51]. To the best of our knowledge, this is the first report of the anti-*T. gondii* activity of this compound.

Antibiotics have a history of repurposing success for Apicomplexan parasites and are the conventional treatment for human toxoplasmosis, in the form of pyrimethamine + sulphadiazine, trimethoprim + sulphamethoxazole and pyrimethamine + clindamycin [52]. Other antibiotics with anti-*T. gondii* activity identified in the present work were lomefloxacin, mycophenolic acid, fusidic acid, levofloxacin, and trimethoprim. Mycophenolic acid is an antineoplastic antibiotic derived from various *Penicillium* fungal species. It was previously reported that this drug triggers the differentiation of extracellular *T. gondii* tachyzoites into cyst-like structures [53]. Fusidic acid, an antibiotic that inhibits the growth of bacteria by preventing the release of translation elongation factor G from the ribosome, has been shown to be effective in tissue culture against *P. falciparum* and *T. gondii* [54]. Trimethoprim is an antimicrobial used to treat and prevent toxoplasmosis and many bacterial infections [55]; its in vitro activity against *T. gondii* is therefore not novel. Lomefloxacin is used to treat bacterial infections including bronchitis and urinary tract infections [56]. Levofloxacin is an antibacterial drug with a broad spectrum of activity. This drug diffuses through the bacterial cell wall and acts by inhibiting DNA gyrase (bacterial topoisomerase II), leading to blockage of bacterial cell growth [57]. The in vitro anti-*T. gondii* activity of lomefloxacin and levofloxacin is reported here for the first time.

Digitoxin is a lipid-soluble cardiac glycoside that inhibits the plasma membrane Na<sup>+</sup>/K<sup>+</sup>-ATPase, with anticancer effects when used at therapeutic concentrations [58]. In addition, digoxin is a cardiac glycoside long used to treat congestive heart failure and has been found more recently to show anticancer activity [59]. Ribavirin is an inhibitor of the hepatitis C virus polymerase with a broad spectrum of activity against DNA and RNA viruses [60]. To the best of our knowledge, the in vitro anti-*T. gondii* activity of digitoxin, digoxin and ribavirin is reported here for the first time. Valproic acid, a mood-stabilizing and antipsychotic drug, presents efficacy against chronic *T. gondii* infection, as previously demonstrated [61].

Among the three compounds presenting anti-*T. gondii* activity in the nanomolar range, we consider almitrine to be the most promising, since this compound showed selective in vitro anti-*T. gondii* activity and presents good oral availability and low human toxicity. These results encourage the future evaluation of the efficacy of almitrine in *T. gondii*-infected animals.

#### **5. Conclusions**

Promising anti-*T. gondii* candidates were identified and previously published in silico data were confirmed, indicating that in silico screening is a useful tool in the search for active compounds in the target-based drug development process. In addition, we suggest that almitrine represents a lead compound against *T. gondii*, which may be useful for antitoxoplasmic chemotherapy.

The 13 selected compounds showed interaction with specific enzymes of *T. gondii*, whilst the compounds almitrine, bortezomib, digoxin, digitoxin, doxorubicin, mycophenolic acid, ribavirin, fludarabine and fusidic acid presented greater affinity than the ligands under study for the selected mechanisms. Almitrine showed a lower score than the positive control thapsigargin for the Na<sup>+</sup>/K<sup>+</sup>-ATPase transporter of *H. sapiens* and *M. musculus* according to the Plantscore algorithm. In addition, almitrine showed interactions similar to those of the positive control thapsigargin, thus indicating a possible mechanism of action of this compound.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pharmaceutics14081634/s1. Refs. [62,63] are mentioned in Supplementary Materials.

**Author Contributions:** Investigation: D.C.C., P.P.M.T. and N.F.d.S.; Supervision: M.T.S. and J.Q.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the São Paulo Research Foundation (FAPESP) (grant numbers 2018/18954-4 and 2020/03399-5). The FAPESP process number 2022/05069-8 was also involved.

**Institutional Review Board Statement:** Not applicable.

**Acknowledgments:** We would like to thank the Medicines for Malaria Venture foundation (MMV; Switzerland) for having provided the open-access Boxes. We would also like to thank Tiago W. P. Mineo, Samuel C. Teixeira (Universidade Federal de Uberlândia), André G. Tempone and Cristina Meira-Strejevitch (Instituto Adolfo Lutz of São Paulo) for their help through their sharing of cell cultures, protocols and experience.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Drug-Disease Severity and Target-Disease Severity Interaction Networks in COVID-19 Patients**

**Verena Schöning and Felix Hammann \***

Clinical Pharmacology and Toxicology, Department of General Internal Medicine, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland

**\*** Correspondence: felix.hammann@insel.ch

**Abstract:** Drug interactions with other drugs are a well-known phenomenon. Similarly, however, pre-existing drug therapy can alter the course of diseases for which it has not been prescribed. We performed network analysis on drugs and their respective targets to investigate whether there are drugs or targets with protective effects in COVID-19, making them candidates for repurposing. These networks of drug-disease interactions (DDSIs) and target-disease interactions (TDSIs) revealed a greater share of patients with diabetes and cardiac co-morbidities in the non-severe cohort treated with dipeptidyl peptidase-4 (DPP4) inhibitors. A possible protective effect of DPP4 inhibitors is also plausible on pathophysiological grounds, and our results support repositioning efforts of DPP4 inhibitors against SARS-CoV-2. At target level, we observed that the target location might have an influence on disease progression. This could potentially be attributed to disruption of functional membrane micro-domains (lipid rafts), which in turn could decrease viral entry and thus disease severity.

**Keywords:** COVID-19; network analysis; drug-disease interaction; target-disease interaction; DPP4 inhibitors; lipid rafts; drug repurposing

#### **1. Introduction**

In Switzerland, patients seen by general practitioners have a median of two chronic conditions, and receive a median of two prescribed drugs [1]. The most common conditions are cardiovascular diseases, including arterial hypertension and lipid disorders, and diabetes [2]. Not only do drug-drug interactions increase with pill burden, but also the risk for drug-disease interactions (DDSIs), where drugs that are beneficial in one disease may be harmful in another [3]. A drug's action is brought about by its interaction with molecular targets. The relationship is asymmetric, meaning that a given drug can interact with multiple targets, and one target with multiple drugs [4]. By consequence, the interaction of drugs with specific molecular targets can also influence the progression or severity of a disease, which could lead to a target-disease interaction (TDSI).

The current pandemic of coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). By now, several risk factors for severe COVID-19 progression are known, such as age [5,6], male sex [7,8], or obesity [9–12]. Additionally, common co-morbidities such as diabetes [13–15], cardiac [16,17] and pulmonary diseases [18,19], or dementia [20] can influence prognosis of COVID-19. Furthermore, both the number and the combination of certain co-morbidities have been found to be predictors of severity [21]. Several studies have already been conducted to analyze the influence of specific co-medications on COVID-19 incidence and progression. For example, hypertension is a common chronic condition and a risk factor for severe COVID-19 progression [22]. Some researchers analyzed the influence of anti-hypertensive drugs acting on the renin-angiotensin-aldosterone system (RAAS) [23,24]. The majority of these studies provided evidence that angiotensin converting enzyme (ACE) inhibitors and angiotensin-receptor blockers (ARBs) do not adversely affect COVID-19 progression or may even be beneficial [22–28]. In general, studies showed that polypharmacy increases the risk for severe COVID-19 [29,30].

Network analysis is used to investigate a group of objects (e.g., friends, internet servers, patients, enzymes, or proteins) and their connections with each other. The objects are the nodes of the network, whereas the relationships are the edges connecting the nodes. One famous example is Zachary's "karate club" network, which displays the pattern of friendships amongst the members of a university karate club [31]. In recent years, network analysis has increasingly been applied in the context of pharmacology, e.g., to investigate the relationships between drugs and their respective targets [4] or the relationship between proteins and metabolites [32]. In addition, several network studies on the repurposing of drugs against SARS-CoV-2 have been conducted, mainly as drug-target, target-human, viral-human, or protein-protein interactions, or combinations thereof [32–34]. Furthermore, transcriptomes of COVID-19 patients, patients with related conditions and healthy controls were compared to identify possible drug candidates for repurposing [35]. However, none of these studies used clinical data to investigate the influence of pre-existing drug treatment on patient outcomes as a measure of disease severity.

The aim of this study was to analyze the impact of DDSIs and TDSIs on COVID-19 severity using network analysis as a tool to inform drug repurposing efforts and increase drug safety. We compared drugs on admission (i.e., drugs patients were taking before or on the day of admission) and their molecular targets in patients who tested positive for SARS-CoV-2, and used severe (required critical care or died) or non-severe outcome (outpatient or never requiring critical care) as endpoint.

**Citation:** Schöning, V.; Hammann, F. Drug-Disease Severity and Target-Disease Severity Interaction Networks in COVID-19 Patients. *Pharmaceutics* **2022**, *14*, 1828. https://doi.org/10.3390/pharmaceutics14091828

Academic Editors: Paul Bogdan, Lucreția Udrescu, Mihai Udrescu and Ludovic Kurunczi

Received: 16 May 2022 Accepted: 27 August 2022 Published: 30 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Materials and Methods**

#### *2.1. Study Population*

We carried out this retrospective study at the Insel Hospital Group (IHG), a tertiary hospital network with six locations and about 860,000 patients treated per year, making it the biggest health care provider in Switzerland. The Cantonal Ethics Committee of Bern approved the protocol (2020-00973). We considered all patients who tested positive for SARS-CoV-2 by reverse-transcriptase polymerase chain reaction (RT-PCR) assay on nasopharyngeal swabs at the IHG between 1 February and 16 November 2020, covering the 'first wave' and most of the 'second wave' of COVID-19 in the region (Figure 1).

**Figure 1.** Flowchart of patient selection process.

For patients with no registered general research consent status, a waiver of consent was granted by the ethics committee. Objection to the general research consent of the IHG was an exclusion criterion for this study, whereas participation in other trials (including COVID-19 related treatment studies) was not. Disease progression was classified as *severe* if, for any reason, an intensive care unit (ICU) admission was required at any stage, or the patient died during the stay. All other patients were classified as *non-severe*. We selected only patients for whom drugs on admission had been recorded. Therefore, this study included 115 severe and 390 non-severe COVID-19 patients. We identified pre-existing conditions using Natural Language Processing from a previous study [36]. For a total of 28 patients (14 non-severe and 14 severe cases), we could not perform disease detection. Characteristics of the study population are provided in Table 1.
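The inclusion and severity rules described above can be sketched as follows. This is an illustration only: the record structure and field names are hypothetical and do not reflect the actual layout of the electronic health records.

```python
from dataclasses import dataclass

@dataclass
class Stay:
    # Hypothetical fields; the real EHR schema is not described in the text.
    icu_admission: bool
    died_during_stay: bool
    drugs_on_admission_recorded: bool

def classify(stay):
    """Severe if ICU admission was required at any stage or the patient died."""
    if stay.icu_admission or stay.died_during_stay:
        return "severe"
    return "non-severe"

toy_stays = [
    Stay(icu_admission=True, died_during_stay=False, drugs_on_admission_recorded=True),
    Stay(icu_admission=False, died_during_stay=False, drugs_on_admission_recorded=True),
    Stay(icu_admission=False, died_during_stay=True, drugs_on_admission_recorded=True),
    Stay(icu_admission=False, died_during_stay=False, drugs_on_admission_recorded=False),
]

# Only patients with recorded drugs on admission are analyzed.
included = [s for s in toy_stays if s.drugs_on_admission_recorded]
labels = [classify(s) for s in included]
```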


**Table 1.** Characteristics of study population.

To study the effects of co-morbidities, we created four sub-groups:


Note: patients can be members of more than one group, e.g., 70 patients suffered from diabetes as well as from cardiac conditions.

#### *2.2. Network Analysis*

Drugs on admission (drugs taken before admission to the IHG) were obtained from the electronic health records (EHR). As this part of the EHR was not always complete, we also considered drugs administered in-house on the day of admission. This also mitigates the effect of patients transferred from other hospitals compared to patients who were initially admitted to the IHG.

We evaluated different levels of detail in drug classification. First, we compared the fourteen main groups of the Anatomical Therapeutic Chemical (ATC) classification system [37]. Then we selected 90 pharmacological or chemical subgroups or substances, which we categorized into 30 therapeutic groups. We identified drugs in the EHR by ATC codes. By and large, the drug groups and subgroups are based on the categorization of the ATC classification, but some minor deviations are present, e.g., acetylsalicylic acid was included among the antithrombotic agents, whereas in the ATC classification it is grouped with the analgesics, an uncommon indication in Switzerland. Considering the hyper-thrombotic state of COVID-19 patients [38], we considered its rheological effect to be more important than its analgesic effect. Further information on our grouping is available in the Supplements, Table S1.
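As a rough illustration of the first classification level, the anatomical main group can be read from the first letter of an ATC code, with a small override table for deliberate deviations such as acetylsalicylic acid. The override table and function names below are hypothetical; the actual grouping used in the study is the one given in Table S1.

```python
# The fourteen anatomical main groups of the ATC classification (first letter of the code).
ATC_MAIN_GROUPS = {
    "A": "Alimentary tract and metabolism",
    "B": "Blood and blood forming organs",
    "C": "Cardiovascular system",
    "D": "Dermatologicals",
    "G": "Genito-urinary system and sex hormones",
    "H": "Systemic hormonal preparations, excl. sex hormones and insulins",
    "J": "Antiinfectives for systemic use",
    "L": "Antineoplastic and immunomodulating agents",
    "M": "Musculo-skeletal system",
    "N": "Nervous system",
    "P": "Antiparasitic products, insecticides and repellents",
    "R": "Respiratory system",
    "S": "Sensory organs",
    "V": "Various",
}

# Hypothetical override: acetylsalicylic acid coded as an analgesic (N02BA01)
# is counted with the antithrombotic agents, i.e., in main group B.
MAIN_GROUP_OVERRIDES = {"N02BA01": "B"}

def atc_main_group(atc_code):
    """Main group according to the ATC code itself."""
    return ATC_MAIN_GROUPS[atc_code[0].upper()]

def study_main_group(atc_code):
    """Main group after applying the (assumed) study-specific overrides."""
    letter = MAIN_GROUP_OVERRIDES.get(atc_code.upper(), atc_code[0].upper())
    return ATC_MAIN_GROUPS[letter]
```

With this sketch, `atc_main_group("N02BA01")` returns the nervous system group, whereas `study_main_group("N02BA01")` places the same record in the blood and blood forming organs group.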

Lastly, we analyzed the molecular targets of the drugs on admission. We used DrugBank [39] to map drugs to targets and their target locations.

As the two severity cohorts are imbalanced, we normalized the number of patients in each drug (sub-)group for network analysis by dividing it by the total number of patients in the respective cohort. The obtained value was used as weight in the network analysis.
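The normalization can be sketched in a few lines. The cohort sizes are those reported for this study (115 severe and 390 non-severe patients); the patient counts in the example are hypothetical and serve only to show why raw counts would be misleading.

```python
# Cohort sizes as reported in the study population section.
COHORT_SIZE = {"severe": 115, "non-severe": 390}

def normalized_weight(n_patients, cohort):
    """Share of the cohort receiving a drug (sub-)group, used as network weight."""
    return n_patients / COHORT_SIZE[cohort]

# Hypothetical counts: fewer severe patients in absolute terms,
# but twice the share once cohort size is taken into account.
w_severe = normalized_weight(23, "severe")
w_non_severe = normalized_weight(39, "non-severe")
```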

A network consists of nodes connected by edges. A node's weight is determined by the number of patients receiving the drug, and an edge's weight by the number of patients receiving two drugs simultaneously. In our analysis, drugs, drug classes, and targets were represented as nodes and concurrent use or interaction was represented by connecting undirected edges. Therefore, the weights of nodes or edges are both positively correlated with drug use or target engagement.
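A minimal sketch of this construction is given below, assuming that each patient is represented by the set of drugs (or drug groups) recorded on admission. The function name and the toy cohort are hypothetical; the sketch returns raw patient counts, which would then be divided by the cohort size to obtain the normalized weights.

```python
from collections import Counter
from itertools import combinations

def build_network(patients):
    """Build node and edge weights from per-patient drug sets.

    patients: iterable of sets of drug (group) names, one set per patient.
    Node weight = number of patients receiving the drug.
    Edge weight = number of patients receiving both drugs (undirected).
    """
    node_weight = Counter()
    edge_weight = Counter()
    for drugs in patients:
        node_weight.update(drugs)
        # Sorting gives each undirected edge a single canonical key.
        for a, b in combinations(sorted(drugs), 2):
            edge_weight[(a, b)] += 1
    return node_weight, edge_weight

# Hypothetical toy cohort of three patients.
toy_cohort = [
    {"statins", "beta blockers"},
    {"statins", "beta blockers", "loop diuretics"},
    {"statins"},
]
nodes, edges = build_network(toy_cohort)
```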

#### *2.3. Software and Statistical Tests*

Data wrangling, analysis, and visualization were performed in GNU R (version 4.0.2, R Foundation for Statistical Computing, http://www.R-project). Statistical significance levels were defined at a *p* value of <0.05, and determined with the Student's *t*-test for continuous parameters and Chi-square test for categorical parameters using the *stats* package (version 4.0.2). Network analysis was performed using the *igraph* package (Version 1.2.6) [40]. For network visualization, we used Gephi (Version 0.9.2) [41].
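For readers who wish to retrace the categorical comparisons, the sketch below implements a Chi-square test for a 2 × 2 table in plain Python. It rests on two assumptions that are not stated in the text: that a continuity (Yates) correction was applied, which is the default of `chisq.test` in R for 2 × 2 tables, and that the patient counts can be back-calculated from the reported percentages. For *Vitamin K and other hemostatics* (0.51% of 390 non-severe and 3.48% of 115 severe patients), this gives 2 and 4 patients, respectively.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson Chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    Returns the test statistic and the p value.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; never lets the corrected difference go negative.
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the Chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Rows: non-severe, severe; columns: drug taken, drug not taken.
# Counts are back-calculated from the reported percentages (assumption).
chi2, p = chi_square_2x2(2, 388, 4, 111)
chi2_uncorrected, p_uncorrected = chi_square_2x2(2, 388, 4, 111, yates=False)
```

Under these assumptions, the corrected test gives a *p* value below the 0.05 threshold, whereas the uncorrected test would give a noticeably smaller one; with expected counts this low, an exact test would be an alternative worth considering.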

#### **3. Results**

#### *3.1. Network Metrics*

The main network metrics are presented in the Supplements, Table S2. Main nodes (hubs) and main edges are defined as those with the highest weight, i.e., the largest share of patients taking this drug or drug combination. All main nodes and edges are identical between the severity cohorts, except for one edge in the drug subgroups (non-severe: *other analgesics and antipyretics – heparin*; severe: *other analgesics and antipyretics – antibiotics*). The diameter of the network (the maximum distance between any two nodes, i.e., the longest shortest path) was in general larger in the severe cohort. More drug subgroups had a higher node betweenness centrality (betweenness, indicating how often a node lies on the shortest path between two other nodes) in the non-severe than in the severe cohort (43 and 24 drug subgroups, respectively). Likewise, more molecular targets had a higher betweenness in the non-severe than in the severe cohort (418 and 124 molecular targets, respectively). In addition, betweenness values in the non-severe cohort were higher (median: 150 vs. 48 and mean: 225 vs. 104, respectively). In Table S2, we show nodes with the greatest differences in the betweenness between the cohorts.
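The two metrics can be made concrete with a small, self-contained sketch. It treats the network as unweighted and undirected, which is a simplification: the analysis in this study was carried out with the *igraph* package, which can take edge weights into account. The betweenness routine follows Brandes' algorithm, and the toy graph is hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Longest shortest path; unreachable pairs are ignored."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def betweenness(adj):
    """Node betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: a simple chain a - b - c - d.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_diameter = diameter(toy)
toy_betweenness = betweenness(toy)
```

In the chain, the two inner nodes each lie on two shortest paths between other nodes, whereas the end nodes lie on none.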

#### *3.2. DDSI Network*

There are significant differences (*p* < 0.05) in all three networks (anatomical/pharmacological group, drug group, and drug subgroup) with regards to the drugs (nodes, Table 2) and drug combinations (edges, Table 3) taken on admission. In all nodes and edges with significant differences, the percentage of occurrence was higher in the cohort with severe disease progression unless stated otherwise. As an example, visualization of the anatomical/pharmacological group network is shown in Figure 2.

**Figure 2.** Drug group networks for severe (**A**) and non-severe (**B**) COVID-19 (only nodes with three or more edges are shown).

No differences can be observed in the anatomical/pharmacological group *Alimentary tract and metabolism* or any of the corresponding (sub-)groups between the severity cohorts considering all diseases. However, *anti-hyperglycemics*, specifically *dipeptidyl peptidase-4 (DPP4) inhibitors* and *sodium glucose co-transporter 2 (SGLT2) inhibitors* (only borderline significant, *p* = 0.06), were taken more often by non-severe COVID-19 patients with cardiovascular conditions or cardiovascular conditions and diabetes (see Supplements, Table S3).

In the anatomical/pharmacological group *Blood and blood forming organs*, *anti-hemorrhagics* and *anti-platelet agents* (even though only borderline significant with *p* = 0.095), and within these groups, especially *Vitamin K and other hemostatics* and *acetylsalicylic acid* (only borderline significant, *p* = 0.085), respectively, differed between the severity cohorts.

In the anatomical/pharmacological group *Cardiovascular system*, which showed no cohort difference, the drug groups *diuretics* and *cardiovascular drugs* had a higher percentage in severe COVID-19. In the former group, *loop diuretics*, and in the latter, *beta blockers*, differ significantly across all patients regardless of co-morbidity.

**Table 2.** Significant nodes of the DDSI network.

| Drug subgroup | Non-severe (%) | Severe (%) | *p* value |
|---|---|---|---|
| Vitamin K and other hemostatics | 0.51 | 3.48 | 0.037 |
| Opioids <sup>1</sup> | 10.51 | 17.39 | 0.068 |
| Acetylsalicylic acid | 21.28 | 29.57 | 0.085 |

<sup>1</sup> Drug subgroups that were associated with death or severe COVID-19 by Iloanusi et al. and McKeigue et al. [29,30].


**Table 3.** Significant edges of the DDSI network.

<sup>1</sup> Drug subgroups that were associated with death or severe COVID-19 by Iloanusi et al. and McKeigue et al. [29,30].

Additionally, non-steroidal anti-inflammatory drugs (NSAIDs) were more often taken by patients with non-severe COVID-19, whereas the opposite was true for opioids (but only borderline significant).

Considering all patients, there are differences in combinations of anatomical/pharmacological groups, drug groups, and drug subgroups, but the weight of these edges (percentage of patients) is relatively low in most cases (<15%) (Table 3).

However, the disease-specific analysis revealed that the combination of *anti-hyperglycemics* and *anti-coagulants* was more common in non-severe COVID-19 in patients with cardiac conditions or cardiac conditions and diabetes. In the latter cohort, the combination of *anti-hyperglycemics* and *statins* had a higher percentage in non-severe COVID-19 (see Supplements, Table S4).

#### *3.3. TDSI Network*

The main molecular targets and their relative frequency per cohort are shown in the Supplements, Figure S1. Molecular targets with highly significant (*p* < 0.001) differences are given in the Supplements, Figure S2.

Differences in molecular targets can be divided into two groups. The first group comprises targets which interact with only one specific group of drugs, e.g., antithrombotic agents mostly interact with *coagulation factor X*, *P-selectin*, and *antithrombin-III*, whereas diuretics may target members of the *solute carrier family 12*. The second group includes targets that cannot be assigned to just one indication or drug group. *Beta adrenergic receptors* are targets for anti-depressants, anti-hypertensives, and anti-arrhythmics. There are over 2690 significantly different edges in the molecular target network. In the Supplements, we included the 30 most common edges in the network (Figure S3) and the highly significant edges (*p* < 0.001) (Figure S2).

In Figure 3, we present a filtered version of the molecular target networks of both severity cohorts, where only nodes with three or more edges are shown.
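The degree filter behind Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the node names `T1` to `T5` and the edge list are hypothetical placeholders, and it assumes that the degree is counted once in the unfiltered network (a single pass, not an iterative pruning).

```python
from collections import defaultdict

def filter_by_degree(edges, min_degree=3):
    """Keep nodes with at least `min_degree` edges and the edges among them.

    Degrees are counted in the full (unfiltered) network; this is an assumption.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges

# Hypothetical toy network: T5 has a single edge and is filtered out.
toy_edges = [("T1", "T2"), ("T1", "T3"), ("T1", "T4"),
             ("T2", "T3"), ("T2", "T4"), ("T3", "T4"), ("T4", "T5")]
kept_nodes, kept_edges = filter_by_degree(toy_edges, min_degree=3)
```

Raising `min_degree` trades completeness for readability of the plotted network; the threshold of three is the one stated in the text.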


**Figure 3.** Molecular target networks for severe (**A**) and non-severe (**B**) COVID-19 (only nodes with three or more edges are shown); *ADRA1A/2A*: Alpha-1A/2A adrenergic receptor; *ADRB1/2*: Beta-1/2 adrenergic receptor; *AGTR1*: Type-1 angiotensin II receptor; *AKR1C1*: Aldo-keto reductase family 1 member C1; *ATP4A*: Potassium-transporting ATPase alpha chain 1; *CASP1/3*: Caspase-1/3; *CCND1*: G1/S-specific cyclin-D1; *CHRM1/M2/M3*: Muscarinic acetylcholine receptor M1/M2/M3; *CYCLA*: Cyclin A; *DDAH1*: N(G),N(G)-dimethylarginine dimethylaminohydrolase 1; *DRD2*: Dopamine D2 receptor; *EDNRA*: Endothelin-1 receptor; *F10*: Coagulation factor X; *HDAC2*: Histone deacetylase 2; *HMGCR*: 3-hydroxy-3-methylglutaryl-coenzyme A reductase; *HRH1*: Histamine H1 receptor; *HSPA5*: 78 kDa glucose-regulated protein; *HTR1A/2A/2C*: 5-hydroxytryptamine receptor 1A/2A/2C; *IKBKB*: Inhibitor of nuclear factor kappa-B kinase subunit beta; *MAPK1*: Mitogen-activated protein kinase 1; *MYC*: Myc proto-oncogene protein; *NFKBIA*: NF-kappa-B inhibitor alpha; *NR3C1*: Glucocorticoid receptor; *OPRD1*: Delta-type opioid receptor; *OPRK1*: Kappa-type opioid receptor; *OPRM1*: Mu-type opioid receptor; *PCNA*: Proliferating cell nuclear antigen; *PRKAA1*: 5'-AMP-activated protein kinase; *PTGES3*: Prostaglandin E synthase 3; *PTGS1/2*: Prostaglandin G/H synthase 1/2; *RSK*: Ribosomal protein S6 kinase alpha-3; *SERPINC1*: Antithrombin-III; *SLC12A1/2*: Solute carrier family 12 member 1/2; *SLC6A4*: Sodium-dependent serotonin transporter; *TP53*: Cellular tumor antigen p53; *TRPV1*: Transient receptor potential cation channel subfamily V member 1; *TSG-6*: Tumor necrosis factor-inducible gene 6 protein. The color of the nodes indicates the location of the molecular target within the cell. In the non-severe cohort, more molecular targets are located within the cell membrane, whereas in the severe cohort more targets are located within the cytoplasm.

#### **4. Discussion**

The network analysis of drugs and their molecular targets revealed differences between the severity cohorts of COVID-19. Except for one edge, the main nodes (hubs) and edges are identical; however, the weights were often slightly higher in the severe cohort. This suggests that the most important drugs and drug combinations are the same in both cohorts, but that slightly more drugs and drug combinations are taken by the severe cohort. This may be indicative of a subpopulation with more co-morbidities. The larger diameter of the severe network indicates that the drugs and drug combinations are more heterogeneous in this cohort. This is supported by the generally lower betweenness of most nodes in this cohort, both in absolute values and in comparison to the non-severe cohort.
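For readers less familiar with the two metrics used above, the following sketch computes the diameter and the betweenness centrality of a small graph. It is an illustration under simplifying assumptions: edge weights are ignored, the graph is assumed to be connected and undirected, and the four-node path graph is a hypothetical example rather than study data. The betweenness routine follows Brandes' algorithm and is not taken from the authors' repository.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path (in hops) of a connected graph."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}  # dependency of s on each node
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical path graph A - B - C - D.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
path_diameter = diameter(path)
path_betweenness = betweenness(path)
```

In this toy graph the inner nodes B and C each lie on two shortest paths between other nodes, whereas the end nodes lie on none.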

However, co-morbidities and co-medications did not always result in a more severe course. Noteworthy here is the higher percentage of patients with cardiac conditions, or cardiac conditions and diabetes, using anti-hyperglycemics, especially DPP4 inhibitors, and to a lower degree SGLT2 inhibitors in the non-severe COVID-19 cohort. These patients had at least two co-morbidities, which are considered risk factors for a severe course [15,42,43], but had a more favorable outcome under these treatment regimens. DPP4 inhibitors have been shown to be reno- and cardio-protective through the suppression of oxidative stress and inflammation and the improvement of endothelial function [44]. Furthermore, there is evidence that SARS-CoV-2, like MERS-CoV (Middle East respiratory syndrome-related coronavirus), also uses the membrane-bound DPP4 enzyme for viral entry. Inhibition of this enzyme is speculated to reduce viral entry and replication [45,46]. In SARS-CoV-2, a functional network analysis revealed that DPP4 is required in viral processes for viral entry and infection. Furthermore, protein-chemical interaction networks revealed important interactions between DPP4 and the DPP4 inhibitor sitagliptin [47]. Additionally, in animal experiments, DPP4 inhibition resulted in a rise of soluble DPP4 [48,49], which could bind to plasma SARS-CoV-2, reducing the amount of virus able to infect cells [50]. Mutations in DPP4 genes, leading to reduced levels of soluble DPP4, were identified as risk factors for increased susceptibility to MERS-CoV [51]. Within an infected cell, sitagliptin inhibited the SARS-CoV-2 papain-like protease (PLpro) in an in-cell protease assay [52]. Clinical literature on DPP4 inhibitors in COVID-19 is ambiguous; several studies and meta-analyses have shown favorable effects [53–56], while some have not [57–59]. A review of clinical trials with the DPP4 inhibitor sitagliptin found that most studies showed a favorable effect on COVID-19 progression [50]. Several potential modes of action are discussed apart from the above-mentioned decrease in viral entry, increase in soluble DPP4, or inhibition of viral proteases. It is hypothesized that DPP4 inhibitors might attenuate COVID-19-related cardiovascular injury including arrhythmia, acute coronary syndrome and heart failure [60]. In addition, DPP4 inhibition has anti-inflammatory and immunomodulatory properties by decreasing the activation of nuclear factor kappa B (NF-κB) and the expression of inflammatory cytokines [61,62]. These factors could also influence disease progression.

A benefit of SGLT2 inhibitors is supported on pathophysiological grounds. SGLT2 inhibitors have been shown to downregulate systemic and adipose tissue inflammation by decreasing the expression of pro-inflammatory cytokines, lessen oxidative stress, and reduce sympathetic activity [63]. Furthermore, treatment with an SGLT2 inhibitor alleviated myocardial and renal fibrosis in mice [64]. However, in a large randomized trial with COVID-19 patients, treatment with dapagliflozin, an SGLT2 inhibitor, did not result in a statistically significant risk reduction in organ dysfunction and death, or in speedier recovery [65].

Considering all patients, regardless of the diagnosed co-morbidities, there are some noteworthy differences in the drugs (nodes of the network, Table 2) used within the cohorts. Despite doubts early in the pandemic regarding the use of NSAIDs during COVID-19 [66], a systematic review and meta-analysis was not able to confirm this theoretical risk [67]. In human cell cultures and mice, NSAIDs reduced pro-inflammatory cytokines and dampened the humoral immune response to SARS-CoV-2 [68]. This protective effect might be explained by reversing the progressive inflammation in different organs [69]. Even though this study included only a few patients on NSAIDs, they were still more common in non-severe patients and thus corroborated earlier studies. Comparisons to other antipyretics with no anti-inflammatory action (e.g., acetaminophen) are necessary.

Some drugs with significant differences between cohorts might be more indicative of the severity of the underlying condition and not interact with COVID-19 prognosis directly. Loop diuretics, for instance, are used in more advanced stages of renal failure [70]. As poor renal function is indicative of severe COVID-19 [71,72], this correlation might be due to the severity of the pre-existing condition, not the drug itself. Beta blockers were more often used in the severe cohort, but this might be explained by the higher prevalence of cardiovascular co-morbidities in this cohort. However, loop diuretics, beta blockers, and opioids are also associated with death or severe COVID-19 in a polypharmacy setting [29,30].

Overall, a relatively small percentage of patients received antipsychotic drugs, and the difference between cohorts was not significant (8% and 13% in the non-severe and severe cohorts, respectively). However, combinations with other drugs such as loop diuretics, opioids, beta-blockers, or proton pump inhibitors were more often seen in patients in the severe cohort. The influence of antipsychotic drugs on COVID-19 infection risk and prognosis is currently under discussion. A retrospective study in 698 patients using antipsychotic drugs revealed a lower infection risk and a better prognosis compared to non-users [73]. Comparable results were also reported from a study in patients with a pre-existing diagnosis of schizophrenia, schizoaffective disorder, or bipolar disorder [74]. On the other hand, a systematic review and meta-analysis showed a correlation between antipsychotics and COVID-19 mortality [75]. However, the reviewed studies included patients on antipsychotics independently of diagnoses, considered antipsychotics as a single homogeneous pharmacological group, and did not test for adherence [76]. Our results suggest that it is not the use of a specific drug per se, but its combination with other drugs, that influences the risk for severe COVID-19. Therefore, a detailed analysis of the most significantly different drug combinations (edges of the network, Table 3) was performed. Most drug combinations were taken by fewer than 15% of the patients, which makes a detailed analysis of cause and effect difficult, but trends are visible. In all cases but one (NSAIDs/other analgesics), a greater proportion was seen in the severe cohort. However, this difference is not due to general polypharmacy, which is known to influence disease severity in COVID-19 [29,30], as the number of drugs on admission was not significantly different between the cohorts. Not only polypharmacy, but also specific drug classes influence severity in COVID-19 [29,30]. 
Drug classes with an increased risk for severe COVID-19 are highlighted in red in Table 3. In seven and in eleven drug combinations, one or both drugs, respectively, were considered high risk. All these combinations were more prevalent in the severe cohort. Only in one combination (NSAIDs/other analgesics), neither drug was considered high risk. Interestingly, a higher proportion of non-severe patients took that combination.
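A cohort comparison of this kind, i.e., whether a drug combination is more prevalent in one severity cohort than in the other, can be sketched with a 2 × 2 contingency table. The statistical test actually applied to the edges is not restated in this section, so Fisher's exact test is used here purely as an assumption, and all counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cohort (e.g., severe / non-severe); columns: combination taken / not taken.
    Sums the probabilities of all tables that are at most as likely as the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so that ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: 3 of 4 severe vs. 1 of 4 non-severe patients on a combination.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With several thousand edges tested, a correction for multiple comparisons would normally be considered as well; that step is outside this sketch.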

The effect of combination with other drugs can be seen for proton pump inhibitors (PPIs). PPIs are taken to the same extent by the non-severe and severe cohorts (32.6% and 33.9%, respectively, *p* = 0.88, data not shown). However, combinations of PPIs with antipsychotics or platelet inhibitors were more prevalent in severe patients. Several review articles evaluating the effects of PPIs on COVID-19 progression and mortality revealed high heterogeneity in the outcomes [77–80]. However, those studies did not control for co-medication, except for one which looked at NSAIDs [80]. In summary, studies on drug effects should record co-medication and ideally control for it.

In the molecular target network of the non-severe cohort, there are more targets located in the cell membrane. Several hypotheses could help explain this finding. One hypothesis is that interaction of drugs with cell membrane receptors might interfere with viral entry into the cell. The host protein angiotensin-converting enzyme 2 (ACE2) is considered the main entry receptor for SARS-CoV-2 and the transmembrane serine protease 2 (TMPRSS2) an important priming enzyme required during this process [81,82]. In addition, other cell membrane receptors may be involved in cellular entry of SARS-CoV-2 [81,83,84], like neuropilin-1 [85,86] or DPP4 [45,46]. Interference may be direct if a drug targets a protein that is also important for viral entry. Studies on SARS-CoV-2–human protein-protein interaction revealed hundreds of further possible targets [87–90]; however, there is only minimal overlap with the targets we identified. Interference may also be indirect, due to changes in membrane organization that negatively impact any part of the viral replication cycle. Functionally organized micro-domains (lipid rafts) within the cell membrane, characterized by highly ordered and tightly packed lipid molecules, may play a pivotal role in different processes during the life-cycle of viruses, including coronaviruses [91]. Lipid raft involvement in viral entry was already shown for the murine hepatitis virus, a betacoronavirus like SARS-CoV-2 [92]. A further study used SARS-CoV-2 pseudoviruses to demonstrate the importance of cholesterol-rich membrane lipid rafts for infection [93]. Micro-domains may increase the efficiency of infection by clustering enzymes and receptors in certain membrane areas, thus allowing multivalent binding of virus particles, but are not an absolute requirement for the entry process [94]. Several drugs acting on specific cell membrane targets were shown to disrupt lipid rafts [95]. These included targets we identified in the non-severe cohort, such as alpha- and beta-adrenergic receptors, and opioid receptors. As the network visualizations only include nodes with three or more edges, one might conclude that the combination of several drugs that interfere with the integrity of the lipid rafts has an influence on COVID-19 progression.

Our study has some limitations. The severity cohorts had some significant differences in demographics and co-morbidities. The severe cohort was significantly older, had a higher BMI, and a higher share of male patients, all factors which are known risk factors for severe COVID-19 [5–12]. Even though the differences are significant, they are still rather small (median age difference three years, median BMI difference 1.68 points), so that a detailed analysis of these factors would require a larger sample size to obtain enough power with an unknown effect size. Additionally, more patients in the severe cohort suffered from arterial hypertension, chronic heart failure and/or coronary heart disease, again established risk factors in COVID-19 [16,17,22]. However, the presence and even the severity of co-morbidities were indirectly accounted for by the analysis of prescribed drugs. The number of drugs on admission was not significantly different between the cohorts. Even though we were able to include a total of 505 patients in our analysis, the number of patients receiving one specific drug was still relatively low, especially in the disease cohorts. Therefore, significant differences were in some cases only seen in the high-level pooled groups. For this reason, we also reported borderline significant results (0.05 < *p* < 0.1), which could be interpreted as a weak signal and should be investigated in further research. Furthermore, we were mainly able to consider hospitalized patients because of data availability issues. As the IHG is an important regional medical center, some patients were transferred from smaller hospitals. Drugs given in these smaller hospitals are recorded on the drugs on admission list.

The analysis only focused on dual combinations. While we did perform cluster analyses to find more complex combinations, the data available did not support this. Furthermore, even though there were significant differences between the severity cohorts with regard to age and sex, we did not control for these factors.

#### **5. Conclusions**

In summary, the use of a network approach allowed for studying the impact of drugs from a novel vantage point. Most importantly, autonomic targets appear to be influential on the course of disease in COVID-19, mostly in the form of off-target effects, possibly by disrupting lipid rafts and impeding viral entry. This also holds for DPP4 inhibitors, which are known to interact with adrenergic receptors [96]. The impact of interference with autonomic receptors merits further study into potential future treatments for infection with SARS-CoV-2 and other viruses. Overall, our network analysis indicates that DPP4 inhibitors are related to a better prognosis for COVID-19 and thus represent potential repositioning drugs against SARS-CoV-2. Additionally, our study revealed (i) that drug-induced changes in cell membrane architecture might influence disease progression and (ii) that the influence of specific drugs on disease progression might be dependent on concurrent co-medication.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pharmaceutics14091828/s1, Table S1: Assignment of ATC codes to the drug groups and subgroups; Table S2: Important network metrics of DDSI and TDSI network; Table S3: Significant nodes of the DDSI network, disease specific; Table S4: Significant edges of the DDSI network, disease specific; Figure S1: Main target nodes; Figure S2: Target nodes with significant differences; Figure S3: Main target edges; Figure S4: Target edges with highly significant differences (*p* < 0.001).

**Author Contributions:** V.S.: formal analysis, investigation, methodology, software, visualization, writing; F.H.: conceptualization, formal analysis, methodology, software, supervision, visualization, writing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The study was approved by the Cantonal Ethics Committee of Bern (Project-ID 2020-00973).

**Informed Consent Statement:** Participants either agreed to a general research consent or, for participants with no registered general research consent status (neither agreement nor rejection), a waiver of consent was granted by the ethics committee.

**Data Availability Statement:** The source code is available on GitHub: https://github.com/cptbern/Covid19-network-analysis.

**Acknowledgments:** We thank Noel Frey, Myoori Wijayasingham, and the Insel Data Science Center for database and infrastructure support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Systematic Review* **Hidradenitis Suppurativa and Comorbid Disorder Biomarkers, Druggable Genes, New Drugs and Drug Repurposing—A Molecular Meta-Analysis**

**Viktor A. Zouboulis <sup>1</sup>, Konstantin C. Zouboulis <sup>2</sup> and Christos C. Zouboulis <sup>3,</sup>\***


**Abstract:** Chronic inflammation and dysregulated epithelial differentiation, especially of hair follicle keratinocytes, have been suggested as the major pathogenetic pathways of hidradenitis suppurativa/acne inversa (HS). In addition, obesity and metabolic syndrome have been considered important risk factors. With adalimumab, one drug has already been approved, and numerous other compounds are in advanced-stage clinical studies. A systematic review was conducted to detect and corroborate HS pathogenetic mechanisms at the molecular level and identify HS molecular markers. The obtained data were used to confirm studied and off-label administered drugs and to identify additional compounds for drug repurposing. A robust, strongly associated group of HS biomarkers was detected. The triad of HS pathogenesis, namely upregulated inflammation, altered epithelial differentiation and dysregulated metabolism/hormone signaling, was confirmed; the molecular association of HS with certain comorbid disorders, such as inflammatory bowel disease, arthritis, type I diabetes mellitus and lipids/atherosclerosis/adipogenesis, was verified; and common biomarkers were identified. The molecular suitability of compounds in clinical studies was confirmed and 31 potential HS repurposing drugs, among them 10 drugs already launched for other disorders, were detected. This systematic review provides evidence for the importance of molecular studies to advance the knowledge regarding pathogenesis, future treatment and biomarker-supported clinical course follow-up in HS.

**Keywords:** hidradenitis suppurativa; acne inversa; transcriptome; proteome; comorbid disorder; biomarker; drug repurposing; signaling pathway; druggable gene

#### **1. Introduction**

Hidradenitis suppurativa/acne inversa (HS) is a chronic, inflammatory, recurrent, debilitating skin disease of the hair follicle that usually presents after puberty with painful, deep-seated, inflamed lesions in the apocrine gland-bearing areas of the body, most commonly at the axillae, inguinal and anogenital regions [1]. A consistent finding, regardless of disease duration, is follicular hyperkeratosis, leading to follicular rupture, inflammation and possible secondary bacterial colonization. The deep part of the follicle appears to be involved. HS is further associated with an initial lymphohistiocytic inflammation, granulomatous reaction, sinus tract formation and scarring [2].

Our own recent transcriptome and proteome studies highlighted a panel of immune-related drivers in HS, which induce an innate immunity response in epithelial skin cells in a targeted manner [3]. An inflammatory process coupled to impaired barrier function and bacterial activity was detected at the follicular and epidermal keratinocyte level and, to a minor degree, at the skin-gland level. In addition, the adipose tissue was shown to be involved in HS in a real-world immunohistochemical study [4].

**Citation:** Zouboulis, V.A.; Zouboulis, K.C.; Zouboulis, C.C. Hidradenitis Suppurativa and Comorbid Disorder Biomarkers, Druggable Genes, New Drugs and Drug Repurposing—A Molecular Meta-Analysis. *Pharmaceutics* **2022**, *14*, 44. https://doi.org/10.3390/pharmaceutics14010044

Academic Editors: Lucreția Udrescu, Ludovic Kurunczi, Paul Bogdan and Mihai Udrescu

Received: 29 November 2021 Accepted: 23 December 2021 Published: 26 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Despite the therapeutic effectiveness of several compounds [5,6], treatment of HS is still challenging, since most patients respond only partially and experience subsequent recurrences. The large unmet need for new therapies requires the elucidation of disease-driving mechanisms and the recognition of the skin compartment initially involved [7,8]. This need can be covered by the development of novel therapeutic regimens for HS [9,10] or by drug repurposing through drug–gene interaction profiling [11,12].

New technologies, including inverse virtual screening [13] and computational drug repurposing screening approaches [14], are widely used to identify existing compounds as potential drugs for various diseases. The degree of interaction between the molecular profile patterns of a disease and of a compound defines the probability of therapeutic activity of a certain drug. The aim of this study is to provide a wide and robust application of molecular pharmacology in HS through a systematic review of the relevant literature and identification of key molecular mediators in a real-world setting. Using the latter data, therapeutic agents that are currently available or under development for other indications are identified and potential paths for use in the medical management of HS are proposed.

#### **2. Materials and Methods**

#### *2.1. Literature Search*

This systematic review was conducted and narrated in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [15] utilizing datasets from publicly available studies, as previously described [11]. A rigorous search of academic databases including PubMed, Web of Science and Ovid databases through August 2021 was conducted. A search strategy predefined and adapted for each aforementioned database included the following keywords: (transcriptome OR proteome OR biomarker(s) OR repurposing OR repositioning OR reprogramming) AND (hidradenitis suppurativa OR acne inversa OR Verneuil's disease). Additional records were obtained through the Gene Expression Omnibus, National Institutes of Health (Bethesda, MD, USA) [16] and the citation search of the bibliographic records obtained from the academic databases. There were no search filters pertaining to language or publication year.
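The Boolean search string above can be assembled programmatically, which makes it easier to adapt the same term lists to each database. The sketch below is only an illustration; the helper name `or_group` is hypothetical, and database-specific syntax (field tags, truncation symbols) is deliberately left out.

```python
def or_group(terms):
    """Join search terms into a parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"

# Term lists copied from the search strategy described in the text.
molecular_terms = ["transcriptome", "proteome", "biomarker(s)",
                   "repurposing", "repositioning", "reprogramming"]
disease_terms = ["hidradenitis suppurativa", "acne inversa", "Verneuil's disease"]

# Both groups must match, so they are combined with AND.
query = or_group(molecular_terms) + " AND " + or_group(disease_terms)
print(query)
```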

#### *2.2. Study Selection*

First, duplicates among the bibliographic records were removed. Titles and abstracts were then scrutinized by two reviewers (V.A.Z. and K.C.Z.) working independently according to predefined inclusion and exclusion criteria. This was followed by scrutiny of the full texts of eligible studies. Discrepancies were resolved by discussion with the senior investigator (C.C.Z.). After eligible studies were identified, their bibliographies were screened for studies judged suitable for inclusion. Original investigations of HS molecular signatures and protein studies followed by the identification of molecular mediators were selected for further analysis.

#### *2.3. Data Extraction*

Data pertaining to characteristics of publications under study and quantitative data were extracted by two of the reviewers (V.A.Z. and K.C.Z.) working independently using a predetermined customized extraction form. Characteristics of publications included publication year and affiliation of corresponding authors. Molecular characteristics included transcriptome and/or proteome of HS, and drug repurposing/repositioning/reprogramming.

#### *2.4. Data Analysis*

Qualitative gene/protein data from the studies were pooled to detect HS signature pathways. Gene nomenclature was verified through the HUGO Gene Nomenclature Committee, European Bioinformatics Institute (Cambridge, UK) public domain [17]. Gene taxonomy was assessed through the biological DataBase network, National Cancer Institute (Frederick, MD, USA) [18]. The molecular pathways were assessed according to the g:Profiler, University of Tartu (Tartu, Estonia) [19], the Kyoto Encyclopedia of Genes and Genomes [KEGG, gene ontology (GO); Kyoto, Japan] [20], the Reactome (REAC), Ontario Institute for Cancer Research (Toronto, ON, Canada), New York University (New York, NY, USA), Oregon Health and Science University (Portland, OR, USA) and the European Molecular Biology Laboratory—European Bioinformatics Institute (Heidelberg, Germany) [21], the WikiPathways (WP) [22] and the Human Phenotype Ontology (HP; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA) [23] public domains. Random effects were applied throughout the analysis due to expected clinical heterogeneity encountered in different studies supported by g:Profiler [19]. This approach allows heterogeneity in the data to be addressed by considering that differences between studies are random.

#### *2.5. Drug Repurposing Sources*

For drug repurposing, the detected overall HS molecular signature was compared with the drugs' molecular signatures of The Drug Repurposing Hub public domain, Eli and Edy L. Broad Institute, MIT and Harvard University (Cambridge, MA, USA) [24] and the Gene Cards, Weizmann Institute of Science (Rehovot, Israel) [25] public domains.

#### *2.6. Statistics*

Statistics were performed automatically by the public domains used [19–23].

#### **3. Results**

#### *3.1. Study Selection Process*

A total of 123 bibliographic records were identified after electronic database searches, 36 through other sources and six through bibliographic record citation search. Among them, 61 records were removed as duplicates, leaving 104 titles and abstracts to be screened. After careful screening and manual search, six records were excluded based on title and abstract and 49 records due to inappropriate design and two records due to overlapping data sets with another record, resulting in 47 studies that were included in the quantitative synthesis [3,4,11,26–69] (Figure 1).
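The record counts of the selection process can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the paragraph above; the dictionary keys are descriptive labels chosen for illustration.

```python
# Records identified, by source (numbers from the text).
identified = {"databases": 123, "other_sources": 36, "citation_search": 6}
duplicates_removed = 61
# Records excluded after screening, by reason (numbers from the text).
excluded = {"title_abstract": 6, "inappropriate_design": 49, "overlapping_data": 2}

total_identified = sum(identified.values())           # 123 + 36 + 6 = 165
screened = total_identified - duplicates_removed      # 165 - 61 = 104
included = screened - sum(excluded.values())          # 104 - 57 = 47

print(total_identified, screened, included)
```

The totals agree with the 104 screened records and the 47 included studies reported in the text.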

#### *3.2. Differentially Expressed Genes and Proteins in HS*

The comparison of lesional skin vs. non-lesional skin as well as of blood of patients vs. controls at the mRNA and protein levels (cumulatively reported as "targets") without restrictions revealed 386 differentially expressed genes (DEGs) in HS (Table S1).

#### *3.3. HS Biomarkers*

DEGs and differentially expressed proteins in blood and involved skin of HS patients in comparison to controls in at least two relevant articles or two targets were defined as HS biomarkers. Among the 109 detected genes/proteins out of the 386 genes/proteins detected without restrictions, which fulfilled this requirement, 43 DEGs (including the coding genes of detected differentially expressed proteins) have been described in 2/4 targets in two articles, seven in 3/4 targets (*CXCL10*, *IL6*, *IL17A*, *IL36A*, *IL36G*, *S100A8*, *S100A9*) and none in all four targets (Table 1). Additional 10 DEGs have been described in 2/4 targets, however, in a diversified direction (upregulated/downregulated). Among the 109 HS biomarkers, 65 are druggable.
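The biomarker rule stated above (support from at least two articles or at least two targets, with a separate flag for conflicting directions) can be expressed as a small filter. This is a sketch under assumptions: the record layout, the target labels and the gene names `GENE_A` to `GENE_D` are hypothetical placeholders and do not reproduce the extraction form used in the study.

```python
from collections import defaultdict

# Hypothetical extraction records: (gene, target, article, direction).
records = [
    ("GENE_A", "skin_mRNA", "study_1", "+"),
    ("GENE_A", "skin_protein", "study_2", "+"),
    ("GENE_B", "skin_mRNA", "study_1", "+"),       # single report: not a biomarker
    ("GENE_C", "blood_protein", "study_3", "+"),
    ("GENE_C", "skin_mRNA", "study_4", "-"),       # conflicting directions
    ("GENE_D", "skin_mRNA", "study_2", "-"),
    ("GENE_D", "skin_mRNA", "study_5", "-"),
]

def classify(records, min_support=2):
    """Return genes reported in >= min_support articles or >= min_support targets."""
    targets, articles, directions = defaultdict(set), defaultdict(set), defaultdict(set)
    for gene, target, article, direction in records:
        targets[gene].add(target)
        articles[gene].add(article)
        directions[gene].add(direction)
    result = {}
    for gene in targets:
        if len(articles[gene]) >= min_support or len(targets[gene]) >= min_support:
            consistent = len(directions[gene]) == 1
            result[gene] = {
                "n_targets": len(targets[gene]),
                "n_articles": len(articles[gene]),
                # "+/-" marks diversified dysregulation across studies.
                "direction": next(iter(directions[gene])) if consistent else "+/-",
            }
    return result

biomarkers = classify(records)
```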

**Figure 1.** Preferred reporting items for systematic reviews and meta-analyses (PRISMA 2020 [15]) flow diagram.

**Table 1.** HS biomarkers resulting from the DEGs after transcriptomic profiling and protein expression studies between lesional HS and non-lesional skin biopsies and blood samples from HS patients and healthy controls, respectively, and reported in at least two relevant articles. Bold letters indicate druggable genes. Background: white = similar results reported in one target (biological material) in at least two independent studies; orange = similar results reported in two targets in at least two independent studies; yellow = similar results reported in three targets in at least two independent studies; gray = diversified result reported in at least two independent studies. + = upregulation; − = downregulation; +/− = diversified dysregulation in different studies; () = lower level of evidence.





diphtheria

IL32 + [30,40,61] Interleukin 32 Cutaneous




#### *3.4. Enrichment Analysis of HS-Associated Genes*

The 386 detected HS-associated DEGs and the 109 HS biomarkers were enriched into relevant signaling pathways, which were assessed according to the g:Profiler [19], the KEGG GO [20], the REAC [21], the WP [22] and the HP [23] public domains in order to identify the major organismal and signal transduction pathways involved in HS. Gene clustering on chromosomes 2 and 4 was detected.

Among the 386 HS-associated DEGs, 101 genes were enriched in the cytokine–cytokine (C–C) receptor interaction pathway (−log<sub>10</sub> = 2.5 × 10<sup>−74</sup>), 51 in the JAK-STAT signaling pathway (2.6 × 10<sup>−34</sup>), 39 in the chemokine signaling pathway (2.7 × 10<sup>−18</sup>), 32 in the IL-17 signaling pathway (1.8 × 10<sup>−22</sup>), 31 in the Th17 cell differentiation pathway (2.6 × 10<sup>−18</sup>), 28 in the Toll-like receptor (TLR) pathway (2.2 × 10<sup>−16</sup>) and 26 in the inflammatory bowel disease pathway (3.6 × 10<sup>−26</sup>) (Figure S1).

Furthermore, 45 HS biomarkers were enriched in the C–C receptor interaction pathway (5.6 × 10<sup>−43</sup>, Figure 2), 19 in the IL-17 signaling pathway (8.8 × 10<sup>−19</sup>, Figure 3), 19 in the JAK-STAT signaling pathway (6.0 × 10<sup>−14</sup>, Figure 4), 18 in the inflammatory bowel disease pathway (1.1 × 10<sup>−20</sup>), 18 in the rheumatoid arthritis pathway (1.2 × 10<sup>−17</sup>), 13 in the Th17 cell differentiation pathway (1.5 × 10<sup>−9</sup>), 13 in the lipid and atherosclerosis pathway (1.2 × 10<sup>−5</sup>), 10 in the TLR pathway (4.3 × 10<sup>−6</sup>), 9 in the C-type leptin receptor signaling pathway (6.1 × 10<sup>−5</sup>), 8 in the tumor necrosis factor (TNF) signaling pathway (1.1 × 10<sup>−3</sup>) and 7 in the type I diabetes mellitus pathway (8.5 × 10<sup>−6</sup>) (Figure 5).
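Pathway enrichment of this kind is commonly framed as an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test, not the exact procedure used by the tools cited above, which apply their own multiple-testing corrections. The pathway size and the background gene count in the example are hypothetical placeholders chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for a hypergeometric draw.

    k: query genes found in the pathway
    n: size of the query gene list
    K: genes annotated to the pathway in the background
    N: size of the background gene set
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass of observing k or more pathway genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustration: 101 of 386 DEGs fall in one pathway (counts from the text).
# The pathway size (295) and background size (20000) are ASSUMED values.
p_uncorrected = hypergeom_enrichment_p(k=101, n=386, K=295, N=20000)
print(f"uncorrected p-value: {p_uncorrected:.3e}")
```

Exact integer arithmetic via `math.comb` avoids an external dependency; for large screens a statistics library and a multiple-testing correction would normally be preferred.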

**Figure 2.** Hierarchical clustering of HS biomarkers in the KEGG GO C-C receptor interaction pathway. Genes which are positively regulated in HS are shown in green color, those downregulated with red color. Gray color corresponds to genes with a diversified reported regulation.


**Figure 3.** Hierarchical clustering of HS biomarkers in the KEGG GO IL-17 signaling pathway. Genes which are positively regulated in HS are shown in green color. Gray color corresponds to genes with a diversified reported regulation.

**Figure 4.** Hierarchical clustering of HS biomarkers in the KEGG GO JAK-STAT signaling pathway. Genes which are positively regulated in HS are shown in green color. Gray color corresponds to genes with a diversified reported regulation.

**Figure 5.** Enrichment of HS biomarkers resulting from the comparison of transcriptomic profiles and protein expression studies between lesional HS and non-lesional skin biopsies and blood samples from HS patients and healthy controls, respectively, in signaling pathways.


Concerning the individual cytokine signaling, IL-17, IL-4, IL-13, IL-10, IL-20 family, IL-1 family, IL-18, IL-36, IL-2 family, IL-21 and IL-12 family signaling included DEGs in HS (Figure 5).

Epithelial differentiation signaling dysregulation in HS was represented by the epidermal growth factor receptor (EGFR), IL-1, IL-1 receptor, formation of the cornified envelope, TLRs and antimicrobial peptides (Figure 5).

Metabolic/obesity-associated dysregulation in HS was detected through type I diabetes mellitus signaling, lipid and atherosclerosis, C-type leptin receptor signaling, estrogen-dependent nuclear events and extranuclear signaling, adipogenesis and resistin signaling (Figure 5).

Interestingly, infection-indicating signaling pathways did not exhibit any major involvement in our study (Figure 5).

Finally, the REAC evaluation of globally involved pathways [70] revealed the innate immune system, cytokine signaling in the immune system (major pathways: regulation of *IFNG* signaling), signal transduction (nuclear receptor, *GPCR* and leptin pathways) and developmental biology (formation of the cornified envelope pathway) as the main HS-associated pathways (Figure S2).

The protein-based connectivity map arising from an assumed gene biomarker translation (103 proteins out of 109 genes) resulted in 2465 interactions compared with the expected 531 interactions (4.64-fold; *p* < 0.0001), a result that indicates a robust protein–protein association in HS (Figure 6). On the other hand, the protein-based connectivity map arising from the 386 HS-associated DEGs (372 proteins out of 386 genes) resulted in 19,823 interactions compared with the expected 6502 interactions (3.05-fold; *p* < 0.0001), indicating that the biomarker selection procedure increased the HS/protein association.
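The fold values quoted above follow directly from the ratio of observed to expected interactions. A short sketch of that arithmetic, using only the counts reported in the text (the function name is an illustrative choice, not part of any tool used in the study):

```python
def fold_enrichment(observed: int, expected: int) -> float:
    """Ratio of observed to expected protein-protein interactions."""
    if expected <= 0:
        raise ValueError("expected interaction count must be positive")
    return observed / expected


# Counts taken from the text: biomarker set and full DEG set.
biomarker_fold = fold_enrichment(2465, 531)
deg_fold = fold_enrichment(19823, 6502)

print(f"biomarkers: {biomarker_fold:.2f}-fold")  # biomarkers: 4.64-fold
print(f"all DEGs:   {deg_fold:.2f}-fold")        # all DEGs:   3.05-fold
```

The higher ratio for the biomarker subset is what supports the statement that the selection procedure increased the protein association.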

**Figure 6.** Biomarker-resulting protein-based connectivity map of HS. **Figure 6.** Biomarker-resulting protein-based connectivity map of HS.

#### *3.5. Enrichment Analysis of HS Druggable Genes*

Among the 386 HS-associated DEGs, 105 druggable genes were recognized. With the 11 additional druggable genes described by Zouboulis et al. [12], namely *ABAT*, *ADRA1A*, *CYP3A4*, *GRM4*, *HRH1*, *OPRD1*, *OPRM*, *PRKAB1*, *PTGS1*, *PTGS2* and *SLC6A4*, the total number of druggable genes detected in HS is 116.

The 116 druggable genes were enriched in relevant signaling pathways according to the KEGG GO [20] and the GeneCards [25] public domains to identify the major targeted organismal and signal transduction pathways (Figure S3). Twenty-two druggable genes were enriched in the lipid and atherosclerosis pathway (8.4 × 10<sup>−13</sup>), 19 in the JAK-STAT signaling pathway (6.2 × 10<sup>−12</sup>), 17 in the Th17 cell differentiation pathway (5.2 × 10<sup>−13</sup>), 17 in the IL-17 signaling pathway (6.0 × 10<sup>−14</sup>), 16 in the inflammatory bowel disease pathway (1.5 × 10<sup>−16</sup>), 14 in the TLR signaling pathway (6.0 × 10<sup>−14</sup>), 14 in the C-type leptin receptor signaling pathway (2.4 × 10<sup>−9</sup>) and 13 in the TNF signaling pathway (8.4 × 10<sup>−8</sup>).

#### *3.6. Study Drugs and Drug Repurposing for HS*

The majority of registered, studied or off-label administered drugs modify HS-associated DEGs. On the other hand, the evaluation of the detected 105 HS-associated druggable genes proposed 452 potentially therapeutic compounds, among them 120 launched drugs, 178 compounds in clinical studies and 154 in preclinical evaluation (Table S2). Among these potentially therapeutic compounds, 31 drugs that regulate either three or more genes, all of them being HS-associated DEGs, or at least four genes, with 60% of them being DEGs, were classified as probable repurposing drugs for HS (Table 2).
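The classification rule described above can be written as a small predicate. This is a sketch of one reading of the rule, assuming that "60%" means "at least 60%" of a drug's regulated genes; the drug names and the `GENE_*` placeholders are hypothetical, while the DEG symbols are taken from genes named in this review.

```python
def is_probable_repurposing(targets: set[str], degs: set[str]) -> bool:
    """Apply the two-branch rule for 'probable' repurposing drugs.

    Branch 1: three or more regulated genes, all of them HS-associated DEGs.
    Branch 2: four or more regulated genes, at least 60% of them DEGs
              (the 'at least' is an ASSUMED reading of the text).
    """
    n_targets = len(targets)
    n_deg = len(targets & degs)
    all_deg_rule = n_targets >= 3 and n_deg == n_targets
    # Integer form of n_deg / n_targets >= 0.6, avoiding float comparison.
    majority_rule = n_targets >= 4 and 5 * n_deg >= 3 * n_targets
    return all_deg_rule or majority_rule


# A few HS-associated DEGs named in the text; the drugs are hypothetical.
hs_degs = {"IL6", "IFNG", "IL18", "TLR2", "TLR4"}
example_drugs = {
    "drug_A": {"IL6", "TLR2", "TLR4"},                      # 3 targets, all DEGs
    "drug_B": {"IL6", "IFNG", "IL18", "GENE_X", "GENE_Y"},  # 5 targets, 60% DEGs
    "drug_C": {"IL6", "GENE_X", "GENE_Y", "GENE_Z"},        # 4 targets, 25% DEGs
    "drug_D": {"IL6", "TLR2"},                              # only 2 targets
}

probable = sorted(
    name for name, genes in example_drugs.items()
    if is_probable_repurposing(genes, hs_degs)
)
print(probable)  # ['drug_A', 'drug_B']
```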

**Table 2.** Probable HS repurposing drugs \* and molecular profile of drugs registered \*\* or off-label administered in HS.



\* The differentially regulated genes in HS are presented with bold letters.

#### **4. Discussion**

#### *4.1. HS Pathogenesis*

Inflammation doubtlessly plays a major role in the pathogenesis of HS [3,7,8]. Proteome studies provide evidence that the innate immune system and both the *IL-1* and *IL-17* signaling pathways are activated in HS lesions and circulating neutrophils [27,40,45,71–73], findings that have been confirmed in our systematic review. In addition, Th17 differentiation of CD4+ lymphocytes is activated in HS [57]. Among others, Kelly et al. [38] provided evidence that CD45+CD4+ T cells are responsible for IL-17 production and that CD11c+CD1a-CD14+ dendritic cells are the main producers of IL-1β in lesional HS skin. The IL-17 cytokine family has been linked to the pathogenesis of diverse autoimmune and inflammatory diseases and also plays an essential role in host defense against extracellular microorganisms [2,74]. IL-17 has been shown to increase the expression of skin antimicrobial peptides, including human β-defensin 2, psoriasin (S100A7) and calprotectin (S100A8/9), in keratinocytes, as well as that of a number of neutrophil-attracting cytokines [75]. Thus, IL-17 may contribute to inflammation by increasing the influx of neutrophils, dendritic cells and memory T cells into the lesions. On the other hand, the involvement of the *IL-1* signaling pathway is also prominent in HS, with upregulation of molecules causing immune cell infiltration and extracellular matrix degradation, which could be reversed by application of an IL-1 receptor antagonist [40,76]. *IL1B* signaling pathway-associated genes, such as *IL1R1*, *IL1RN*, *IFNG*, *IL6*, *IL18*, *IL18R1*, *IL32*, *IL33*, *IL36A*, *IL36B*, *IL36G*, *IL36RN*, *IL37*, *TLR2*, *TLR3*, *TLR4*, *S100A7*, *S100A7A*, *S100A8*, *S100A9* and *S100A12*, were HS-associated DEGs, as detected in our systematic review.

The inflammatory process in HS seems to be coupled with impaired barrier function, altered epidermal cell differentiation, formation of the cornified envelope, TLRs and antimicrobial peptides [3], the latter not being associated with any infection, as clearly shown in the present study. These events have been observed in the follicular and epidermal keratinocytes and, to a lesser degree, in the skin glands [3]. Moreover, we could confirm a dysregulated expression pattern of serpins, small proline-rich proteins and certain keratins, which further supports the involvement of the follicular infundibulum in the initiation of the lesions, especially at the anatomic area of communication with the apocrine gland duct and the ductus seboglandularis [3].

Although HS has well-documented associations with the metabolic syndrome, which is characterized by systemic inflammation identified at a molecular level [77], the role of adipose tissue in HS has barely been investigated. Obesity has recently been shown to represent the primary risk factor in HS at the molecular level [4,28]. A chronic low-grade subclinical inflammatory response is strongly implicated in the pathogenesis of insulin resistance and metabolic syndrome. The clinically relevant peroxisome proliferator-activated receptor (PPAR) pathway was down-regulated in adipocytes of HS lesions [4]. In agreement with these data, reduced serum levels of adiponectin were recently found in non-diabetic patients with HS [28]. Since adiponectin inhibits the production of TNF-α, IL-6 and chemokines by human macrophages, the upregulation of *ADIPOQ* and *PLIN1*, shown in this systematic review, might be beneficial in HS treatment. Indeed, thiazolidine derivatives act as PPARγ agonists and effectively increase the adiponectin concentration and adipogenic gene expression [28,78]. Unsaturated fatty acids, eicosanoids and non-steroidal anti-inflammatory drugs function in a similar manner [79]. Further metabolic pathways, e.g., the IGF transport and uptake of IGF-binding proteins pathway, type I diabetes mellitus signaling, lipid and atherosclerosis, C-type leptin receptor signaling, estrogen-dependent nuclear events and extranuclear signaling and signaling of *RETN*, which encodes resistin, are dysregulated in HS, as shown in the present review.

In conclusion, inflammatory signaling, mainly innate immunity signaling pathways, mostly that of IL-1 and IL-17, epithelial differentiation signaling pathways, primarily of follicular keratinocytes and skin gland duct cells and metabolic signaling pathways, especially that of obesity/adipogenesis, represent pathogenetic HS cascades, whose activity may be targeted by future therapeutic means.

#### *4.2. HS Comorbid Disorders*

HS has been associated with a variety of comorbid disorders, such as inflammatory bowel diseases, especially Crohn's disease, axial spondylarthritis with or without follicular occlusion triad signs, genetic keratin disorders associated with follicular occlusion, such as pachyonychia congenita, steatocystoma multiplex and Dowling-Degos disease with or without arthritis, as well as other genetic disorders, such as keratitis–ichthyosis–deafness syndrome and Down syndrome [80]. Moreover, HS has been associated with reduced quality of life, metabolic syndrome, sexual dysfunction, working disability, depression and anxiety. As in psoriasis, HS patients have a higher prevalence of cardiovascular disease risk factors and suicide risk [81]. Finally, the development of epithelial tumors on chronic HS lesions in the anogenital region may be considered a consequence of chronic severe inflammatory skin disease. The current work has provided molecular evidence of an association of HS with the inflammatory bowel disease pathway, the rheumatoid arthritis pathway, type I diabetes mellitus signaling, lipid and atherosclerosis and adipogenesis signaling.

#### *4.3. Study Drugs and Drug Repurposing for HS*

In addition to the only registered drug in HS, namely adalimumab [9,82,83], the majority of studied and off-label administered drugs also regulate differentially expressed genes and their proteins in HS, as shown in the present review [10,65,76,81–95]. On the other hand, the 452 potentially therapeutic compounds proposed from the HS-associated druggable genes can mostly be classified as receptor ligands, enzyme/protein inhibitors, JAK-STAT inhibitors, PI3K inhibitors, sodium/potassium/calcium channel activators and MMP inhibitors. Additionally, Gentamicin, Ibudilast, Spironolactone, Trastuzumab, Thalidomide, Apremilast, Glucosamine, Interferon-α-2b, Binimetinib and Midostaurin have previously been reported as repurposing drugs for HS [11]. The majority of the 31 probable repurposing drugs shown in Table 2 are JAK inhibitors, with cytokine inhibitors, such as anti-IL-17 compounds, tyrosine kinase receptor inhibitors, TNF inhibitors, cyclooxygenase inhibitors, EGF receptor inhibitors, MMP inhibitors and PPARγ ligands, among others, also being represented. Ten of these drugs, which have not yet been administered in HS, are already launched for other indications and 17 are in clinical studies, not including HS.

#### **5. Conclusions**

The current review provides robust molecular evidence on the pathogenetic triad of HS, namely upregulated inflammation, dysregulated epithelial cell differentiation and obesity signaling/hormone involvement. In addition, evidence of the negligible role of infectious agents is included. Moreover, HS biomarkers with strong protein–protein connectivity in HS are presented. While adalimumab, the only currently registered drug in HS, and the majority of studied and off-label administered drugs regulate DEGs and their proteins in HS, numerous compounds are eligible for HS repurposing due to their molecular signaling. Among them, 31 compounds are designated probable, following our classification, with 10 of them already being launched for other indications.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/pharmaceutics14010044/s1: Figure S1: Enrichment of HS-associated DEGs in signaling pathways; Figure S2: Global REAC evaluation of possibly involved signaling pathways in HS; Figure S3: Enrichment of druggable HS-associated genes in signaling pathways; Table S1: DEGs resulting from the comparison of transcriptomic profiles and protein expression studies between lesional HS and non-lesional skin biopsies and blood samples from HS patients and healthy controls, respectively; Table S2: Drugs regulating HS-associated DEGs.

**Author Contributions:** Conceptualization, C.C.Z.; methodology, V.A.Z., K.C.Z. and C.C.Z.; software, V.A.Z. and K.C.Z.; validation, V.A.Z., K.C.Z. and C.C.Z.; formal analysis, V.A.Z. and K.C.Z.; investigation, V.A.Z. and K.C.Z.; resources, V.A.Z., K.C.Z. and C.C.Z.; data curation, V.A.Z. and K.C.Z.; writing—original draft preparation, V.A.Z., K.C.Z. and C.C.Z.; writing—review and editing, C.C.Z.; visualization, V.A.Z.; supervision, C.C.Z.; project administration, C.C.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Ethical review and approval were waived because the article reviews ethically approved published studies involving humans.

**Informed Consent Statement:** Patient consent was waived because the article reviews published studies involving humans. No patient can be identified.

**Data Availability Statement:** Data sets related to this article are hosted at the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/ (accessed on 21 November 2021)) data repositories GSE72702, GSE79150, GSE128637, GSE137141, GSE144801, GSE148027, GSE154773, GSE154775, GSE155176, GSE155850 and GSE175990.

**Acknowledgments:** The Departments of Dermatology, Venereology, Allergology and Immunology, Dessau Medical Center, Dessau, Germany are health care providers of the European Reference Network for Rare and Complex Skin Diseases (ERN Skin—ALLOCATE Skin group).

**Conflicts of Interest:** V.A.Z. and K.C.Z. declare no conflict of interest. C.C.Z. has received subject-relevant honoraria from AbbVie, Bayer Healthcare, Boehringer-Ingelheim, Idorsia, Incyte, Inflarx, Janssen, Novartis, Regeneron, UCB and Viatris, which were not associated with and had no influence on this study. His departments have received grants from AbbVie, AOTI, AstraZeneca, Celgene, Galderma, Inflarx, NAOS-BIODERMA, Novartis, PPM and UCB for his participation as clinical investigator, which were not associated with this study.

#### **References**


## *Review* **Repurposing Drugs via Network Analysis: Opportunities for Psychiatric Disorders**

**Trang T. T. Truong <sup>1</sup>, Bruna Panizzutti <sup>1</sup>, Jee Hyun Kim <sup>1,2</sup> and Ken Walder <sup>1,</sup>\***


**Abstract:** Despite advances in pharmacology and neuroscience, the path to new medications for psychiatric disorders remains largely stagnant. Drug repurposing offers a more efficient pathway compared with de novo drug discovery, with lower cost and less risk. Various computational approaches have been applied to mine the vast amount of biomedical data generated over recent decades. Among these methods, network-based drug repurposing stands out as a potent tool for the comprehension of multiple domains of knowledge considering the interactions or associations of various factors. Aligned well with the poly-pharmacology paradigm shift in drug discovery, network-based approaches offer great opportunities to discover repurposing candidates for complex psychiatric disorders. In this review, we present the potential of network-based drug repurposing in psychiatry focusing on the incentives for using network-centric repurposing, major network-based repurposing strategies and data resources, applications in psychiatry and challenges of network-based drug repurposing. This review aims to provide readers with an update on network-based drug repurposing in psychiatry. We expect the repurposing approach to become a pivotal tool in the coming years to battle debilitating psychiatric disorders.

**Keywords:** network analysis; drug repurposing; psychiatric disorders; medications; psychiatry; drug discovery; mental disorders

#### **1. Challenges of Drug Research for Psychiatric Disorders**

Psychiatric disorders are leading causes of disability, with an increasing burden and significant repercussions for health, society and the economy [1,2]. Despite some pharmacological advances, drug discovery for psychiatric disorders is particularly challenging and remains virtually stagnant. Out of 101 new drugs approved by the FDA in 2019 and 2020, only two were indicated for psychiatric disorders [3,4]. Such an outcome suggests that, compared with other diseases, drug development for psychiatric disorders has intrinsic bottlenecks that hinder the roadmap to new medications. In particular, there is a lack of understanding of the pathological mechanisms of neuropsychiatric disorders, largely due to their complex and ambiguous aetiology (genetics, environment, brain structure and function) [5,6]. Therefore, these disorders pose great challenges to the identification and characterization of biomarkers and molecular targets, as well as utilizing animal models adequately representing the disease.

Drug development is an inherently laborious, expensive, and time-consuming process, which becomes even more difficult for psychiatric disorders subserved by poorly understood mechanisms. Conventional drug discovery has long been considered a costly and risky journey (Figure 1a). The whole process usually takes approximately 13–15 years from initial discovery to final regulatory approval, and costs USD 2–3 billion [7]. The expenditure is dominated by failed candidates, which are common given the low success rate of <10% [8].

**Citation:** Truong, T.T.T.; Panizzutti, B.; Kim, J.H.; Walder, K. Repurposing Drugs via Network Analysis: Opportunities for Psychiatric Disorders. *Pharmaceutics* **2022**, *14*, 1464. https://doi.org/10.3390/pharmaceutics14071464

Academic Editors: Ludovic Kurunczi, Paul Bogdan, Mihai Udrescu and Lucreția Udrescu

Received: 1 June 2022 Accepted: 12 July 2022 Published: 14 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1. The comparison between (a) conventional drug discovery and (b) drug repurposing.** (**a**) De novo drug discovery usually requires 13–15 years and may cost up to USD 3 billion from initial experiments to final marketing approval. Moreover, the overall success rate is only ~10%. (**b**) Drug repurposing typically bypasses several steps of the conventional approach, including not only early discovery and preclinical stages but also Phase I clinical trials. Hence, time and cost can be optimized to 5–11 years and USD 0.35 billion respectively, with an improved success rate of 30%.

In de novo drug discovery, a hypothesis related to the inhibition or activation of a protein/pathway would form the basis for the first step (target discovery, as shown in Figure 1a) [9]. However, psychiatric disorders are multi-faceted conditions, and it is still unknown whether targeting a key factor/pathway could lead to successful treatments [10]. The lack of experimental models not only poses further hurdles to answering that key mechanistic question but also prevents the next step of de novo drug discovery, i.e., lead discovery and optimisation (Figure 1a). This step is generally based on high-throughput compound screening and/or structure-based design, but such approaches would require credible models to measure expected phenotypic traits [9]. Furthermore, novel compounds would undergo pharmacokinetics and pharmacodynamics testing, including blood–brain barrier (BBB) penetration, another unique challenge for drugs targeting central nervous system (CNS) diseases such as psychiatric disorders [11].

#### **2. Drug Repurposing: An Accelerated Framework for Psychiatric Drug Development**

In recent years, drug repurposing or repositioning, i.e., finding new indications for drugs previously developed and/or marketed for a different disease, has become an attractive alternative to conventional drug discovery. Considering the high attrition rate of de novo drug discovery, a plethora of abandoned candidate drugs, including some that have passed safety assessment but failed due to lack of efficacy, can be recycled and utilized for new therapeutic purposes. Given the known safety profiles and bioavailability, as well as established manufacturing processes, drug repurposing can bypass some steps of conventional drug discovery and hence shorten the timeline from bench to bedside with lower cost and less risk (Figure 1b) [12–14].

Drug repurposing is playing an increasingly important role in the pharmaceutical industry. Out of 64 new drugs and biologics approved by the FDA in 2018, only 8 were first-in-class agents (i.e., novel drugs with a unique mechanism of action) [15]. As a shortcut to drug development, drug repurposing provides more feasible paradigms for organizations and institutions with limited resources, and potentially better financial incentives for companies to invest in rare, orphan diseases [16]. Importantly, governments and regulatory bodies are providing substantial support, including funding programs and public drug repurposing databases [17].

In the field of neuropharmacology, there have been a substantial number of repurposed drugs approved or in development. A review by Caban et al. in 2017 reported a total of 118 repurposed drugs for 203 cases in neurology and psychiatry (some drugs have been repurposed for more than one neuropsychiatric disease) [18]. Although most approved drug cases originated from the same discipline (i.e., neuropharmacology), the majority of developing cases are from outside the field [18]. For example, there are recent investigational candidates with positive results, such as tamoxifen repurposed from oncology for use as an antimanic agent (completed phase 3 clinical trials) [19], and quinidine which was repositioned from an anti-arrhythmia drug to an antipsychotic (currently entering phase 3 clinical trials) [20]. The early success of these candidates may be a glimpse of the vast untapped potential of recycling drugs from beyond the scope of neuropharmacology.

#### **3. Why Networks Matter for Psychiatric Drug Research**

Across the entire process of drug repurposing (Figure 1b), the first step of compound identification is critical. Such repurposing compounds could be recognized from empirical or even serendipitous observations, with the prominent examples of valproic acid for bipolar disorder and ketamine for major depression [21,22]. While these empirical findings have earned great success in psychiatric drug research, the advent of computational techniques as well as high-throughput data from "omics" technologies has enabled us to adopt a more systematic approach to discover new therapeutic agents. These approaches also require the design of methodologies that integrate the high-dimensional but noisy data efficiently to acquire useful insights for drug discovery, leading to the application of network science in medical research. Network science is the use of multiple layers of information to identify connections among biological components that are inherently and physiologically relevant [23].

The fusion of network science and drug research was first conceptualized by Andrew L. Hopkins based on the premise of poly-pharmacology (one drug, multiple targets) [24]. This holistic view has been appreciated in psychiatry, in which many psychotropic drugs have been shown to exhibit promiscuity as an intrinsic feature of their therapeutic effects [25]. Antipsychotics are prominent examples: each antipsychotic drug typically targets multiple receptors and possesses a distinct pharmacological profile [5]. Hence, poly-pharmacological profiles demand consideration of multiple factors (e.g., interactions with molecular targets, downstream affected pathways) to elucidate the mechanism(s) of action of known drugs as well as to discover new therapeutic agents for psychiatric disorders [6]. Network science enables the integration of various biological elements and simultaneous consideration of their relationships in complex systems, making it a powerful framework for the poly-pharmacological paradigm.

Despite their pathological heterogeneity, psychiatric disorders have been suggested to share overlapping molecular mechanisms, especially at the genetic level [26–29]. Comorbidity is the norm rather than the exception for psychiatric disorders [30–33]. While such commonality has posed challenges to the characterisation of distinct disorders, it also offers opportunities for the utilisation of existing drugs in multiple mechanistically related disorders [34]. Therefore, network-based approaches can leverage the interconnection between different disorders to find latent connections that suggest recycling the known targets of one disorder in another.

#### **4. Network-Based Drug Repurposing in Psychiatry**

Previous publications have offered comprehensive reviews on network science theory [35] and capabilities in the context of medicine [36,37]. Herein, we will present major terminologies, repurposing strategies, main data resources and applications in psychiatric drug research.

Network-based interpretation comprises three major steps, from understanding to predicting and possibly manipulating biological systems: (1) network inference (reconstruction of network relationships from biomedical data, mostly from high-throughput assays), (2) network analysis (harnessing the topological relationships of networks), and (3) network modelling (dynamic representations of time-course perturbations of network elements under different conditions) [38,39]. Most studies so far have utilised the first two steps for static networks, but very few have advanced to dynamic network modelling [36].

A network inference approach involves "simplifying" complex systems by describing them as a map of nodes connected by edges denoting their relationships or interactions [40] (Figure 2). While networks can represent a wide range of biological processes, in the context of drug discovery research, nodes are generally molecular targets (genes, proteins), compounds (drugs) or diseases, with their relationships inferred from structural interactions (e.g., protein–protein interactions), correlation (e.g., co-expression networks) or conditional dependences (e.g., Bayesian networks) [41]. Many real-world networks, including biological networks, tend to exhibit scale-free properties, which means only a minority of nodes ("hubs") have a greater number of neighbours than average, while most nodes have only a few connections [42–44]. Selective targeting of hubs can therefore have a much greater impact on the function of the networks than modulation of peripheral nodes, making hubs ideal drug targets [45].

**Figure 2. Main elements of a network.** In the network, nodes (circles) are connected via edges (lines). For biological networks, nodes are usually biological entities (genes, proteins) and edges denote their relationships (interaction, association, similarity). From the networks, modules are clusters of closely connected nodes. Degree is the number of direct connections a node has to other nodes. Hubs are nodes with the highest degrees in the networks, meaning they have the highest number of connections. The shortest distance between node A and B is the path with the minimum number of edges from A to B. Created with BioRender.com (accessed on 2 June 2022).
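The terms in Figure 2 can be made concrete with a minimal sketch. The edge list below is a hypothetical toy network with placeholder node names, not curated interaction data, and the helper names (`build_adjacency`, `degrees`, `hubs`) are assumptions introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical toy edge list; node names are placeholders, not real interactions.
edges = [
    ("n1", "n2"), ("n1", "n3"), ("n1", "n4"),
    ("n1", "n5"), ("n4", "n6"), ("n5", "n7"),
]

def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def degrees(adjacency):
    """Degree = number of direct neighbours of each node."""
    return {node: len(neigh) for node, neigh in adjacency.items()}

def hubs(adjacency, top_n=1):
    """Return the top_n nodes ranked by degree (ties broken alphabetically)."""
    deg = degrees(adjacency)
    return sorted(deg, key=lambda n: (-deg[n], n))[:top_n]

adjacency = build_adjacency(edges)
print(degrees(adjacency))
print(hubs(adjacency))
```

Ranking by raw degree is the simplest way to nominate hubs. In practice, other centrality measures or degree cut-offs relative to the network average may be preferred.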

Network-based drug repurposing efforts are generally based on Swanson's ABC model to retrieve unknown latent knowledge from multiple sources of data incorporated in the networks [46]. An assumption of this approach is that when term A is connected to term B, and term B is connected to term C, we can assume that terms A and C are also connected. For example, an indirect link between drug and disease can be inferred from a direct drug–target connection and a direct target–disease connection. In the ABC model, A and C must originate from different domains to yield new knowledge, and B can include multiple steps to bridge from A to C (A → B<sub>1</sub> → B<sub>2</sub> → … → B<sub>n</sub> → C) [47,48] (Figure 3).

**Figure 3. ABC model for network-based drug repurposing.** Latent repurposing relationships can be inferred from multiple layers of network-based knowledge such as disease–target (diseasome), target–target (e.g., protein interactome), and drug–target interactions. As an example, disease A has target B<sub>1</sub> exhibiting direct interaction with target B<sub>2</sub>, which in turn is targeted by drug C, suggesting drug C might be relevant for disease A (A → B<sub>1</sub> → B<sub>2</sub> → C). Created with BioRender.com (accessed on 2 June 2022).
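The inference chain in Figure 3 can be sketched in a few lines. All identifiers below (`disease_A`, `B1`, `drug_C`, and the function `abc_candidates`) are hypothetical placeholders, and the sketch only follows chains of at most two intermediate targets, which is an assumption made for brevity.

```python
# Hypothetical toy knowledge layers; all identifiers are placeholders.
disease_targets = {"disease_A": {"B1"}}               # disease -> targets
target_interactions = {("B1", "B2"), ("B2", "B3")}    # target - target (undirected)
drug_targets = {"drug_C": {"B2"}, "drug_D": {"B9"}}   # drug -> targets

def neighbours(target, interactions):
    """Targets that directly interact with `target` (undirected edges)."""
    out = set()
    for a, b in interactions:
        if a == target:
            out.add(b)
        elif b == target:
            out.add(a)
    return out

def abc_candidates(disease, disease_targets, interactions, drug_targets):
    """Return {drug: [paths]} for chains A -> B1 -> C or A -> B1 -> B2 -> C."""
    candidates = {}
    for b1 in sorted(disease_targets.get(disease, ())):
        # Either the disease target itself, or one of its direct interactors.
        chains = [(b1,)] + [(b1, b2) for b2 in sorted(neighbours(b1, interactions))]
        for chain in chains:
            for drug, targets in sorted(drug_targets.items()):
                if chain[-1] in targets:
                    path = (disease, *chain, drug)
                    candidates.setdefault(drug, []).append(path)
    return candidates

result = abc_candidates("disease_A", disease_targets, target_interactions, drug_targets)
print(result)
```

Returning the full path rather than only the drug name keeps the supporting evidence for each latent link visible, which helps when candidates are later triaged.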

Another common approach is "guilt-by-association" (GBA), which uses similarity measures to suggest new disease indications for drugs [49]. There are two main assumptions of GBA: (1) if two diseases share a significant number of characteristics (e.g., indications, medical descriptions, mechanisms), a drug known to treat one of them may also treat the other (Figure 4A); and (2) if a drug with unknown indications and another drug with known indications share similar properties (e.g., chemical structures, transcriptional effects), they may have the same indication profile (Figure 4B). The major challenge of this approach is how to define a robust similarity metric between drugs or diseases that concurs with similarity in mechanisms of action.

**Figure 4. Guilt-by-association for network-based drug repurposing using (A) disease–disease or (B) drug–drug similarity.** (**A**) Disease–disease similarity is generally inferred from one or several disease-related properties such as overlapping disease genes, symptoms or comorbidities. A weighted disease network (diseasome) can be built based on the similarity metric; herein, modules of similar nodes (diseases) can be identified. The module containing the disease of interest (highlighted in the brown dashed circle) might suggest potential shared mechanism(s) for repurposing drugs. Within this module, if multiple connected diseases have known drugs with similar mechanism X, such drugs might be repurposed for the disease of interest. (**B**) Drug–drug similarity can be calculated based on one or several properties such as chemical structures, targets, side effects or transcriptional profiles. Using the similarity metric as the weight of edges for network construction, one can identify modules of highly similar nodes (drugs) suggesting similar mechanisms of action. In the context of a certain disease A, it would be of interest to focus on the module containing multiple known drugs for disease A (highlighted as the brown dashed square). Within such a module, a drug that has yet to be used for disease A might be a potential repurposing candidate due to its high similarity with other drugs used for disease A. Created with BioRender.com (accessed on 2 June 2022).
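The drug–drug variant of GBA (Figure 4B) can be illustrated with a small sketch. It assumes each drug is described by a set of features (for instance targets or side effects) and uses Jaccard similarity as the metric. The feature sets, the 0.5 threshold and the function names are hypothetical choices for illustration, not values taken from any cited study.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical feature sets (e.g., targets or side effects); placeholders only.
drug_features = {
    "drug_1": {"t1", "t2", "t3"},
    "drug_2": {"t1", "t2", "t4"},
    "drug_3": {"t7", "t8"},
    "drug_X": {"t1", "t2", "t3", "t4"},
}
known_for_disease_A = {"drug_1", "drug_2"}

def gba_rank(drug_features, known, threshold=0.5):
    """Rank drugs not yet used for the disease by mean similarity to known drugs."""
    if not known:
        return []
    scores = {}
    for drug, feats in drug_features.items():
        if drug in known:
            continue
        sims = [jaccard(feats, drug_features[k]) for k in known]
        mean_sim = sum(sims) / len(sims)
        if mean_sim >= threshold:
            scores[drug] = mean_sim
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = gba_rank(drug_features, known_for_disease_A)
print(ranking)
```

The choice of features drives the result: two drugs can look similar on side effects yet differ in mechanism, which is the robustness problem noted for GBA above.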


Data for network construction can be sourced from experimental data (e.g., high-throughput screening), text mining or databases (e.g., phenotypic profiles, protein interactions). Text mining is also the main strategy of literature-based drug repurposing, which shares many integrative opportunities with network-centric approaches. Hence, readers can refer to previous reviews in this domain for an in-depth methodological presentation [50,51]. The advantage of network-based approaches is the possible integration of multiple data layers to compensate for the incompleteness of each domain's knowledge. Therefore, studies using network-based drug repurposing tend to utilise multiple data sources rather than one. There are various ways of incorporating data to find repurposing insights, as shown in Figure 5. However, one should consider the relevance to the disease of interest (e.g., data from brain tissue versus muscle tissue) and the robustness of the evidence supporting such a relationship (e.g., experimental evidence versus co-expression). Multi-omics integration plays a major role in current biological interpretation, and readers can refer to previous reviews for specific updates and recommendations on this approach [52]. Herein, we will focus on different types of biomedical database resources and their utility in the context of psychiatric drug discovery research (summarised in Table 1). A summary of studies using network-based drug repurposing in psychiatry is given in Table 2.

**Figure 5. Different data sources for network-based drug repurposing.** Curved arrows represent the associations of entities within one type (e.g., drug–drug). Multiple data sources (coloured correspondingly to their main domains such as transcriptome) can be applied to infer these associations, usually for the creation of similarity or interacting networks. Straight arrows represent the relationships between entities of different types (e.g., drug–target). For drug repurposing, the aim generally is to find a latent drug–disease connection, which can be achieved by taking the inference route from Drugs–Targets–Diseases (and vice versa) as in the ABC model, or via Diseases–Diseases–Drugs (or Drugs–Drugs–Diseases) as in the GBA model. Created with BioRender.com (accessed on 2 June 2022).


**Table 1. Summary of major data sources and their usage examples in psychiatry.**


**Table 2. Summary of studies using network-based drug repurposing for psychiatric disorders.** Abbreviations: ABC: ABC model; ASD: autism spectrum disorder; ADHD: attention-deficit/hyperactivity disorder; BD: bipolar disorder; GBA: guilt-by-association model; MDD: major depressive disorder; SCZ: schizophrenia; SUD: substance use disorder; TWAS: transcriptome-wide association study; ?: unclear mechanism.





#### *4.1. Structural Data (Structome)*

Structural data from compounds and biological entities such as proteins and RNAs have been extensively utilized in structure-based drug repurposing [123]. The conventional structure-based approach usually requires a few predefined specific target molecules, which is not suitable for psychiatric disorders with complex pathology as mentioned in Section 3. However, network-centric approaches can incorporate the structome as a layer of information in a non-biased way to find new indications for drugs. Tan et al. used descriptions of 3D chemical structures from PubChem to calculate the similarity profiles of 965 drugs [59]. The Tanimoto-based 3D similarity scores were then combined with gene semantic similarity information and drug–target interactions to construct a drug similarity network. From this GBA approach, Tan et al. predicted new indications for 143 drugs and missing indications for 42 drugs without Anatomical Therapeutic Chemical (ATC) codes (indications not yet listed in ATC database) (Table 2). Psychotropic drugs suggested for repurposing from this study included raloxifene (from postmenopausal osteoporosis to schizophrenia) and cyclobenzaprine (from muscle spasms to sleep disorders) [59]. Raloxifene has passed a phase 4 clinical trial in participants with schizophrenia [124,125] while a phase 2 clinical trial of cyclobenzaprine was terminated prematurely due to inadequate recruitment [126].
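For reference, the Tanimoto coefficient in its common binary-fingerprint form is given below. This is the textbook set-based definition, where the symbols $A$, $B$, $a$, $b$ and $c$ are introduced here only for illustration. The 3D scores used by Tan et al. are derived from PubChem's 3D structure descriptions and may be computed differently from this set-based form.

$$
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c}, \qquad 0 \le T(A, B) \le 1,
$$

where $a = |A|$ and $b = |B|$ are the numbers of features present in compounds $A$ and $B$, and $c = |A \cap B|$ is the number of features they share. A value of 1 indicates identical feature sets, and 0 indicates no shared features.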

#### *4.2. Genome*

Using the phenotype-to-genotype concept, multiple large-scale genome-wide association studies (GWAS) have identified thousands of genetic variants across the genome associated with psychiatric disorders [127,128]. Disease-associated genes located in risk loci can be inferred from GWAS data and are usually used in network analysis as a filtering layer to prioritise targets relevant to the disease. Ganapathiraju et al. used schizophrenia-associated genes in combination with protein–protein interactions to create a schizophrenia interactome [88]. Such a disease-specific network can be harnessed for target identification and testing of repurposed agents [122]. However, a major limitation of using GWAS data is the lack of directionality, making it difficult to determine whether a risk gene is up- or down-regulated in the disease phenotype. Gaspar et al. partially addressed this shortcoming via the incorporation of the GWAS summary statistics with gene expression to predict expression levels in different tissues, which were incorporated with drug–target interactions to build a bipartite tissue-specific drug–target network for major depression [73] (Table 2).

#### *4.3. Transcriptome*

Among the wealth of "omics" data, transcriptomic profiling has emerged as an efficient source for computational drug repurposing due to its standardized data format, multiple comprehensive public databases, and possible implementation with network biology approaches for complex diseases [12,129,130]. The expression patterns of gene products that are connected by signalling cascades or protein complexes are expected to be more similar than those of random gene products [40,131]. With this premise, co-expression networks built upon multi-dimensional data such as transcriptomics have aided in the identification of latent mechanistic patterns of psychiatric disorders and their medications, which could be missed by conventional differential expression analysis [131,132].

Psychiatric disease-related transcriptional profiles, generally from post-mortem brain samples, can be readily obtained from experiments, public databases, or psychiatric-centric consortiums such as PsychENCODE and CommonMind [66,68]. The transcriptomic data can be used on its own (gene expression levels) or incorporated with GWAS data to predict genetically regulated gene expression. As an example of the former, Cabrera-Mendoza et al. used transcriptional profiles from post-mortem brain samples of substance-use disorder individuals with and without suicidal behaviour to build gene co-expression networks associated with each phenotype (Table 2). The hub genes from these networks were then subjected to drug–gene interaction testing using the DGIdb database [94] to identify drug repurposing candidates [75]. Integration of transcriptomic profiles with GWAS data was adopted by Rodriguez-López et al. for finding druggable targets in schizophrenia. The authors estimated polygenic scores based on predicted expression and associated these scores with co-expression modules to find relevant hub target genes for early intervention [74]. Gaspar et al. also applied the genetically predicted gene expression approach [73].
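The co-expression idea can be sketched as follows: compute pairwise Pearson correlations between gene expression vectors and keep pairs above a cut-off as edges. The expression values, the 0.8 hard threshold and the function names are hypothetical. Established co-expression workflows often use soft thresholding and module detection instead, so this is a simplified stand-in rather than the method of any cited study.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical expression matrix: gene -> expression across four samples.
expression = {
    "gene_1": [1.0, 2.0, 3.0, 4.0],
    "gene_2": [2.0, 4.0, 6.0, 8.0],
    "gene_3": [1.5, 2.5, 3.5, 4.5],
    "gene_4": [4.0, 1.0, 3.0, 2.0],
}

def coexpression_edges(expr, threshold=0.8):
    """Connect gene pairs whose absolute correlation reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges

edges = coexpression_edges(expression)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

Hub genes can then be nominated from the resulting edge list by degree and passed to a drug–gene interaction resource for candidate lookup.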

Major sources of drug-induced transcriptional profiles are generated from cell lines after treatment exposure, utilising seminal reference databases for drug responses such as Connectivity Map (CMap) [133] and the Library of Integrated Network-based Cellular Signatures (LINCS) [134]. While transcriptional profiles have been used extensively in signature-based drug repurposing for the generation and comparison of selective genes representing the phenotype of interest [129,135], their network-centric drug repurposing application is still very limited in psychiatry. An emerging systems-level approach constructing gene-regulatory networks associated with each drug treatment-cell line pair using CMap expression data can offer a comprehensive characterisation of the mechanism of action of drugs. Such a systems-level approach includes information on complex interactions between multiple entities, beyond the reductionist consideration of several signature genes [119,136].

The major challenge of using drug-induced gene expression in psychiatry is the lack of biological and pathological representation of the treated model systems. Transcriptional perturbations are highly context-dependent; hence, the cancerous cells used commonly in CMap and LINCS might not recapitulate the tissue-specific effects in neuronal or glial cells. The advancement in stem cell technology has propelled the generation of patient-derived induced pluripotent stem cells (iPSC), leading to the genesis of the NeuroLINCS center of omics data generation for human iPSC response in neurological diseases [137]. Since iPSCs carry the genetic information of the patients, they recapitulate the disease-related mutations that would be more representative for diseases with significant genetic factors such as psychiatric disorders [138].

#### *4.4. Interactome*

Interactomes encompass the functional interactions of biological components, which might include physical contact between proteins (protein–protein interaction networks), metabolites (metabolic networks), transcription factors and putative regulatory elements (gene regulatory networks), or purely functional relationships such as phenotypic profiling networks (phenome networks) [40]. The interactome might be placed in specific biological contexts such as signalling pathways or disease-related pathways [139]. The functional interactome based on phenotypic profiles has been broadly applied for drug discovery and will be discussed separately in the context of phenome-based networks. Interactome networks tend to possess the small-world property: nodes are well connected, and the shortest path between any two nodes requires only a few edges (Figure 2). This is particularly relevant for functionally associated nodes, ensuring a quick flow of regulatory information between them [140]. With the premise that risk genes tend to be more connected in the network than a set of random genes, Kauppi et al. utilised the protein interactome to map the targets of antipsychotic drugs onto networks of schizophrenia risk genes (Table 2). Using network topological analysis of shortest distance, they found that risk genes were significantly localised into a distinct module and overlapped with antipsychotic drug targets. Kauppi et al. then evaluated druggable risk genes without direct links to known antipsychotic drug targets to find potential novel targets for schizophrenia, such as nicotinic acetylcholine receptor genes [89].
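A shortest-distance analysis of this kind can be sketched with a breadth-first search. The toy interactome, the node names and the summary statistic (`mean_closest_distance`) are assumptions for illustration. The exact proximity measure and the randomisation procedure used by Kauppi et al. are not reproduced here.

```python
from collections import deque

def shortest_distance(adjacency, source, targets):
    """BFS: edges from `source` to the nearest node in `targets` (None if unreachable)."""
    targets = set(targets)
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adjacency.get(node, ()):
            if nb in seen:
                continue
            if nb in targets:
                return dist + 1
            seen.add(nb)
            queue.append((nb, dist + 1))
    return None

def mean_closest_distance(adjacency, drug_targets, risk_genes):
    """Average, over drug targets, of the distance to the closest risk gene."""
    dists = [shortest_distance(adjacency, t, risk_genes) for t in sorted(drug_targets)]
    reachable = [d for d in dists if d is not None]
    return sum(reachable) / len(reachable) if reachable else None

# Hypothetical toy interactome: a simple chain p1 - p2 - p3 - p4 - p5.
adjacency = {
    "p1": {"p2"}, "p2": {"p1", "p3"}, "p3": {"p2", "p4"},
    "p4": {"p3", "p5"}, "p5": {"p4"},
}
risk_genes = {"p1", "p2"}

print(mean_closest_distance(adjacency, {"p3", "p5"}, risk_genes))
```

To judge whether an observed distance is meaningful, it would usually be compared against the distribution obtained from randomly drawn gene sets of matching size.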

Given the key contribution of transcription factors to the modulation of gene expression and to driving phenotypic perturbations, the transcriptional regulome has been employed by De Bastiani et al. for drug repurposing in bipolar disorder [91]. Their study inferred transcription factor–target interactions via a reverse-engineering prediction algorithm applied to human prefrontal cortex microarray data. The transcription factor-centric network comprised modules of genes targeted by each transcription factor, called "regulons". Based on case-control transcriptomics data, gene set enrichment analysis (GSEA) was applied to the regulons to find those enriched in bipolar disorder. These regulons were used as gene expression signatures to query the Connectivity Map for potential drug candidates that revert disease-related regulon signatures. Several compounds with known clinical relevance in bipolar disorder were identified, such as antipsychotics (chlorpromazine, haloperidol) and antidepressants (maprotiline, mianserin, and desipramine). The study also found novel repurposing candidates including non-steroidal anti-inflammatory agents (meclofenamic acid, ketorolac, acetylsalicylsalicylic acid and diflorasone) and an antioxidant agent (trolox C) (Table 2) [91].
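The notion of a drug "reverting" a disease signature can be illustrated with a deliberately simplified score based on set overlap. This is not the enrichment statistic used by the Connectivity Map or by De Bastiani et al.; the gene sets, drug names and the function `reversal_score` are hypothetical and serve only to show the direction of the comparison.

```python
def reversal_score(disease_up, disease_down, drug_up, drug_down):
    """Return (reversed - mimicked) / signature size; positive values suggest reversal."""
    reversed_count = len(disease_up & drug_down) + len(disease_down & drug_up)
    mimicked_count = len(disease_up & drug_up) + len(disease_down & drug_down)
    total = len(disease_up) + len(disease_down)
    return (reversed_count - mimicked_count) / total if total else 0.0

# Hypothetical disease signature (e.g., genes of an enriched regulon).
disease_up = {"g1", "g2"}
disease_down = {"g3", "g4"}

# Hypothetical drug-induced signatures: drug -> (up-regulated, down-regulated) genes.
drug_signatures = {
    "drug_rev": ({"g3"}, {"g1", "g2"}),   # mostly opposes the disease signature
    "drug_mim": ({"g1"}, {"g4"}),         # mimics the disease signature
    "drug_none": ({"g8"}, {"g9"}),        # unrelated genes
}

scores = {
    drug: reversal_score(disease_up, disease_down, up, down)
    for drug, (up, down) in drug_signatures.items()
}
ranked = sorted(scores, key=lambda d: (-scores[d], d))
print(ranked, scores)
```

Drugs at the top of the ranking are those whose perturbation opposes the disease signature most strongly. A rank-based enrichment statistic would additionally take the magnitude of expression changes into account.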

#### *4.5. Phenome*

Phenotypic data collected from drug-induced phenotypes (indications, side effects) or disease-associated phenotypes (symptoms, disease genes) have been extensively used for drug repurposing, supported by comprehensive public sources such as DrugBank and PharmGKB [55,93]. Zhou et al. built a drug side effect–gene system comprising two networks: a drug phenotypic network of side effect profiles from SIDER [92] and a protein interactome network from STRING [141]. The two networks were interconnected via drug–target associations from DrugBank [55]. Zhou et al. then applied this phenome-driven drug discovery system to find repurposing agents for opioid use disorders. Rather than finding drugs targeting the pathological mechanism of the disorder, which is still mainly unknown, the system explored repurposing candidates sharing similar side effects or common targets with drugs causing or indicated for opioid use disorders. Using a network-based iterative algorithm, top-ranked repurposing candidates including tramadol, olanzapine, mirtazapine, bupropion and atomoxetine were identified with supporting clinical corroboration (Table 2) [116].
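One widely used family of iterative network algorithms is random walk with restart, shown here as a stand-in because the text does not specify which algorithm Zhou et al. used. The toy graph mixes drug and gene nodes in a single layer, and the node names, the restart probability of 0.3 and the function name are all assumptions for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until the change is below tol.

    W spreads each node's probability equally over its neighbours. Every node is
    assumed to have at least one neighbour so that probability mass is conserved.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neigh = adjacency[n]
            if not neigh:
                continue
            share = (1.0 - restart) * p[n] / len(neigh)
            for nb in neigh:
                nxt[nb] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical chain: seed drug - gene_1 - candidate drug - gene_2 - distant drug.
adjacency = {
    "seed_drug": {"gene_1"},
    "gene_1": {"seed_drug", "candidate_drug"},
    "candidate_drug": {"gene_1", "gene_2"},
    "gene_2": {"candidate_drug", "far_drug"},
    "far_drug": {"gene_2"},
}

scores = random_walk_with_restart(adjacency, seeds={"seed_drug"})
drug_scores = {n: s for n, s in scores.items() if n.endswith("_drug") and n != "seed_drug"}
print(sorted(drug_scores.items(), key=lambda kv: -kv[1]))
```

Seeding the walk with drugs known to cause or treat the disorder ranks the remaining drugs by how strongly they are connected to those seeds through shared side effects or targets.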

As presented in Section 3, psychiatric disorders tend to share mechanisms, such as pleiotropic genes associated with multiple disorders. By incorporating disease phenome and disease genome networks together, one can explore the common pathophysiology between diseases and infer potential reusable targets of one disease in a different disease. Such a disease-gene network was first proposed by Goh et al. as a "diseasome"—a bipartite graph including all known genetic disorders and disease genes connected by the association of genetic mutations to disorders [142]. Such a network can be interpreted for gene-gene similarity (connected if two genes share a disorder), or disease–disease similarity (linked if two disorders share a gene). While the specific application of diseasome in psychiatric disorders is still limited, Lüscher Dias et al. built a diseasome network considering multiple psychiatric and neurological disorders using text mining. They found several clusters shared by multiple disorders and their enriched functional annotations, e.g., depression with anxiety disorder (enriched for inflammatory response), bipolar disorder with schizophrenia (enriched for long-term potentiation and circadian entrainment). However, Lüscher Dias et al. did not consider common genes for their drug repurposing steps but focused on unique genes associated with each disorder as potential targets for the corresponding disorder (ABC model), shifting back to a single-disease context [118]. To our knowledge, there have been no cases using disease–disease similarity networks for drug repurposing in psychiatric disorders. An example outside of psychiatry from Langhauser et al. demonstrated how the repurposing hypothesis can be generated from a disease–disease similarity network of the diseasome, even from seemingly distinct diseases [143]. 
They built diseasome networks for 132 diseases based on four different relationships: shared genes, protein interactome, common symptoms and co-morbidity. From the diseasome, Langhauser et al. found the cGMP signalling pathway was associated with a cluster of disease phenotypes including neurological, cardiovascular, metabolic and respiratory diseases. This GBA approach suggested cGMP modulators as treatments for diseases belonging to this cluster. Based on this premise, the authors repurposed soluble guanylate cyclase (sGC) activators—cGMP generation facilitators—from their exclusive indications for cardiovascular diseases to neurological disorders and successfully validated their neuroprotection effects in vivo [143].
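The two readings of the diseasome described above, disease–disease and gene–gene similarity, correspond to the two one-mode projections of a bipartite graph. The sketch below uses hypothetical disorder and gene identifiers, and the function names are introduced only for illustration.

```python
from itertools import combinations

# Hypothetical bipartite diseasome: disorder -> associated genes.
disease_genes = {
    "disorder_1": {"gA", "gB"},
    "disorder_2": {"gB", "gC"},
    "disorder_3": {"gD"},
}

def disease_projection(disease_genes):
    """Link two disorders if they share a gene; weight = number of shared genes."""
    edges = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges

def gene_projection(disease_genes):
    """Link two genes if they share a disorder; weight = number of shared disorders."""
    edges = {}
    for genes in disease_genes.values():
        for g1, g2 in combinations(sorted(genes), 2):
            edges[(g1, g2)] = edges.get((g1, g2), 0) + 1
    return edges

disease_edges = disease_projection(disease_genes)
gene_edges = gene_projection(disease_genes)
print(disease_edges)
print(gene_edges)
```

In a GBA setting, the disease projection is the network of interest: drugs indicated for one disorder become candidates for its neighbours, with the edge weight serving as a crude measure of shared mechanism.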

#### *4.6. Network-Based Drug Repurposing Platforms*

There are various approaches to yielding network-based repurposing insights from biomedical data if one would like to build networks from the ground up, and these have been comprehensively reviewed [36,37,41]. However, several platforms can serve as a "one-stop shop" for network repurposing, incorporating multiple biological datasets, pre-constructed networks and pre-set analyses for easy access and queries of existing or user-generated data. One example is GRAND, a web-based database of gene regulatory networks specific for disease- or drug-related phenotypes, inferred from prior experimental data such as protein–protein interactions, transcriptional profiles, transcription factor binding motifs and predicted miRNA targets [119]. Using similarity scores based on properties of inferred regulatory networks, the CLUEreg tool of GRAND allows users to query a list of "high-targeted" and "low-targeted" genes or transcription factors of the disease to identify single compounds or combinations of compounds that might "reverse" aberrant regulatory patterns [119]. Other examples of open-source platforms include PharmOmics and NeDRex; the former is a knowledgebase supporting gene-network-based drug repurposing and the latter allows heterogeneous network construction to mine disease modules for drug prioritization [100,120]. While these platforms are easy to use with curated networks, users are limited by the scope of the current platforms and by how regularly they are updated. Reproducibility is a challenge, especially with commercial platforms such as IBM Watson for Drug Discovery, where detailed analysis workflows are not publicly accessible [121]. Moreover, most of the incorporated datasets were derived from other domains such as oncology, weakening the robustness of interpretations in psychiatry.

#### **5. Challenges of Network-Based Drug Repurposing in Psychiatry**

Despite its great potential, major obstacles still prevent network-based drug repurposing from making a substantial impact:

(1) While previous knowledge plays a major role in network construction, our current understanding of psychiatric disorders remains inadequate and biased towards well-studied mechanisms and biological entities. Even high-throughput screening data, such as those for protein interactions, can only capture 20% of all potential interactions, leaving an interactome network that is 80% incomplete, with many gaps and fragmented clusters [144].

(2) Furthermore, the integration of heterogeneous and high-dimensional datasets generally has to deal with disparate, incompatible or missing information [145]. Merging multiple datasets into a homogeneous network compromises accuracy because it disregards the biological and experimental variation associated with each dataset [146].

(3) Regardless of the scale of the network and data integrated, network representation in drug repurposing so far has only recapitulated static snapshots of the biological systems despite their dynamic nature. However, dynamic network modelling is still a major challenge due to the limited knowledge of interaction kinetics [147].

(4) Whilst phenotypic profiles are important data for network-based drug repurposing, similar phenotypes are not necessarily the result of similar modes of action. Genes, medication histories, and traits all play a significant role in the phenotypic outcomes of a drug's mode of action [148].

(5) Repurposing candidates have been proposed by various network-based approaches, yet the preclinical validation of these candidates is limited. Even though biological follow-ups are the gold standard, the lack of representative experimental models for psychiatric disorders has posed a great obstacle to in vitro and in vivo validation of drug efficacy [6]. Most studies in psychiatry have resorted to in silico validation such as literature cross-referencing, domain expert consultation and electronic health records (EHR) [149]. Literature-based validation is undertaken by mining clinical trials or PubMed articles to find supporting evidence, as in the work of Lüscher Dias et al. [118]. Expert consultation is employed for a more credible evaluation of results and literature support, as done by Tan et al. [59]. While these validations depend on inference from prior knowledge, EHR-based validation can provide observational corroboration based on real-world clinical data. Zhou et al. used the EHR of nearly 73 million patients provided by the IBM Watson Health platform to validate repurposing candidates for opioid use disorders (OUD), using the odds of OUD remission as the outcome measure [116]. To validate repurposing drug X, they identified a cohort of OUD patients diagnosed with drug X's original indication (disease A). This cohort was then split into an exposure group (patients with OUD and disease A who used drug X) and a comparison group (patients with OUD and disease A who did not use drug X). The odds ratio of remission between these groups was then calculated. They reported that patient cohorts using top-ranked repurposing candidates had higher odds of OUD remission than the corresponding groups without these drugs, supporting the repurposing potential of these candidates for OUD [116]. A list of EHR resources can be found in the collection of databases compliant with the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [150]. Most of the databases on this list are commercial or private, and their utility is hampered by restrictive access policies. However, recent initiatives such as "All of Us" have been collecting large-scale EHR data and making them widely available to approved researchers, offering valuable resources for biomedical research [151].
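The exposure-versus-comparison design used for EHR-based validation reduces to a 2×2 table, from which an odds ratio and its confidence interval can be computed. The sketch below is a minimal, unadjusted calculation with a Wald interval on the log scale; the counts are invented for illustration and are not taken from Zhou et al. [116], and it is an assumption that an unadjusted estimate is adequate, since a real cohort analysis would normally account for confounders.

```python
from math import exp, log, sqrt


def odds_ratio(exposed_remission, exposed_total,
               comparison_remission, comparison_total, z=1.96):
    """Unadjusted odds ratio of remission with a Wald confidence interval.

    exposed_*    : patients with OUD and disease A who used drug X
    comparison_* : patients with OUD and disease A who did not use drug X
    z            : normal quantile (1.96 gives an approximate 95% interval)
    """
    a = exposed_remission                          # exposed, remission
    b = exposed_total - exposed_remission          # exposed, no remission
    c = comparison_remission                       # comparison, remission
    d = comparison_total - comparison_remission    # comparison, no remission

    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if min(a, b, c, d) == 0:
        a, b, c, d = (value + 0.5 for value in (a, b, c, d))

    estimate = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * standard_error)
    upper = exp(log(estimate) + z * standard_error)
    return estimate, lower, upper


# Invented counts: 30 of 100 exposed patients and 20 of 100 comparison
# patients reach remission.
estimate, lower, upper = odds_ratio(30, 100, 20, 100)
```

An odds ratio above 1 whose interval excludes 1 would support the candidate; an interval spanning 1, as can happen with small cohorts, would leave the question open.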

#### **6. Conclusions and Future Perspectives**

Drug repurposing has emerged as a promising alternative to de novo drug discovery and represents a vital shift in the pharmaceutical industry. Taking advantage of the expanding accumulation of biomedical data, various computational drug repurposing approaches have been facilitating informed decisions in drug research. Among those, network-based approaches offer a unique opportunity to integrate various domains of biological knowledge to discover latent repurposing candidates for complex diseases such as psychiatric disorders. Given the virtually stagnant progress of drug discovery in psychiatry, we have presented the incentives for using network-based drug repurposing for psychiatric disorders: the efficiency of repurposing drugs with verified safety records and the compatibility of network science with the poly-pharmacology concept for complex disorders. We then summarised the major concepts and main strategies for network-based drug repurposing, including the ABC model and GBA approaches. Data sources and current repurposing applications for psychiatric disorders were then summarised to give readers an update on the progress of this approach in psychiatry. However, no methodology is without limitations; thus, we presented common challenges of using network-centric approaches for drug repurposing, mostly concerning the noisiness and insufficiency of data resources, the lack of appropriate models for follow-up validation and the dynamic representation of complex systems.

Nevertheless, network-based repurposing holds great potential for expanding the knowledge of drug research, especially for complex disorders. Emerging techniques and resources will complement its capabilities for psychiatric research. Neuroimaging techniques such as functional magnetic resonance imaging (fMRI) can detect drug-induced perturbations of brain activity for predicting the efficacy of drug action [152]. A library of drug-related fMRI patterns might offer biomarker references for comparing repurposing drugs with existing ones [153,154]. The unique ability of fMRI to non-invasively capture functional differences at the level of brain systems would be beneficial for psychiatric drug research, given the complex nature of these diseases and the inadequate experimental models. However, it is still an open challenge to incorporate the human connectome, i.e., the map of neural connections obtained via brain imaging, into network-based drug repurposing, given that most biological data resources were measured at the molecular level. The emerging application of more pathologically representative preclinical models for psychiatric disorders, such as iPSCs and organoids, is also expected to provide more phenotypically relevant datasets for drug repurposing and validation. A patient-derived stem cell library of drug responses specific to psychiatric disorders would offer a more accurate, context-specific overview of drug action and therefore improve the robustness of network-based drug repurposing.

To address the incompleteness of data, computational approaches are being developed for the integration of multi-dimensional data with differences in statistical properties and biological objectives. It is challenging to represent relationships between multitudinous omics data solely with traditional linear modelling. Therefore, multi-omics tools employing multivariate statistics, machine learning (ML) and deep learning (DL) approaches have been proposed to extract and predict complex non-linear patterns [52,155]. While much development and optimization are needed to generalize ML/DL models for systems-level capture of the dynamics and kinetics underlying phenotypes, ML/DL has been aiding network inference and improving network coverage via the prediction of missing connections with supervised and unsupervised analyses [52,156]. While data integration is a cornerstone of network-based inference, most aggregation results in a single network endeavoring to represent a population with a broad spectrum of phenotypic differences. Despite being informative in terms of finding shared characteristics of the inspected population, aggregated networks generally ignore population heterogeneity. Growing attention to precision medicine has facilitated the development of personalized characterization of biological perturbations. Several efforts have been made in network medicine to account for individual-level estimations, e.g., by overlaying sample-specific expression data on known biological networks, or by interpolating between aggregated networks built with and without a sample to estimate the network contribution of that sample [157,158].
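The second strategy, comparing aggregated networks built with and without a given sample, can be sketched as a leave-one-out linear interpolation. The formula used here, N × (edge with all samples − edge without sample q) + edge without sample q, is one commonly used form of such an interpolation; whether it matches the exact estimators in the cited work [157,158] is an assumption. The Pearson co-expression network, the function names and the toy expression values are likewise illustrative choices.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def coexpression_network(samples):
    """Aggregate network: one Pearson edge weight per gene pair.

    samples is a list of dicts mapping gene name -> expression value.
    """
    genes = sorted(samples[0])
    return {
        (g1, g2): pearson([s[g1] for s in samples], [s[g2] for s in samples])
        for g1, g2 in combinations(genes, 2)
    }


def sample_specific_network(samples, q, build=coexpression_network):
    """Estimate the network of sample q by leave-one-out interpolation."""
    n = len(samples)
    with_all = build(samples)
    without_q = build(samples[:q] + samples[q + 1:])
    return {
        edge: n * (with_all[edge] - without_q[edge]) + without_q[edge]
        for edge in with_all
    }


# Toy expression data for three hypothetical genes in five samples.
samples = [
    {"A": 1.0, "B": 2.1, "C": 0.5},
    {"A": 2.0, "B": 3.9, "C": 0.7},
    {"A": 3.0, "B": 6.2, "C": 0.2},
    {"A": 4.0, "B": 7.8, "C": 0.9},
    {"A": 5.0, "B": 1.0, "C": 0.4},
]
network_of_last_sample = sample_specific_network(samples, q=4)
```

When the aggregation is linear in the samples (for example, a mean of per-sample products), this interpolation recovers the contribution of sample q exactly. For non-linear measures such as Pearson correlation it is only an estimate, and the resulting edge weights can fall outside the range of −1 to 1.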

Empowered by the ever-growing amount of biomedical data and new computational analyses, the network-centric approach will keep proving itself as a powerful tool for the comprehension of vast knowledge to shed light on new repurposing candidates for psychiatric disorders.

**Author Contributions:** Conceptualization: T.T.T.T., J.H.K. and K.W.; writing—original draft preparation, T.T.T.T. and B.P.; writing—review and editing, T.T.T.T., B.P., J.H.K. and K.W.; visualization, T.T.T.T.; supervision, J.H.K. and K.W.; funding acquisition, K.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Health and Medical Research Council (NHMRC) Project Grant (1078928) and Centre of Research Excellence (1153607).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Pharmaceutics* Editorial Office E-mail: pharmaceutics@mdpi.com www.mdpi.com/journal/pharmaceutics
