Article

Computing Drug-Drug Similarity from Patient-Centric Data

Department of Computer Science, Najran University, Najran 61441, Saudi Arabia
Bioengineering 2023, 10(2), 182; https://doi.org/10.3390/bioengineering10020182
Submission received: 9 January 2023 / Revised: 22 January 2023 / Accepted: 29 January 2023 / Published: 1 February 2023

Abstract

In modern biology and medicine, drug-drug similarity is a major task with various applications in pharmaceutical drug development. Various direct and indirect sources of evidence obtained from drug-centric data such as side effects, drug interactions, biological targets, and chemical structures are used in the current methods to measure the level of drug-drug similarity. This paper proposes a computational method to measure drug-drug similarity using a novel source of evidence that is obtained from patient-centric data. More specifically, patients’ narration of their thoughts, opinions, and experience with drugs in social media are explored as a potential source to compute drug-drug similarity. Online healthcare communities were used to extract a dataset of patients’ reviews on anti-epileptic drugs. The collected dataset is preprocessed through Natural Language Processing (NLP) techniques and four text similarity methods are applied to measure the similarities among them. The obtained similarities are then used to generate drug-drug similarity-based ranking matrices which are analyzed through Pearson correlation, to answer questions related to the overall drug-drug similarity and the accuracy of the four similarity measures. To evaluate the obtained drug-drug similarities, they are compared with the corresponding ground-truth similarities obtained from DrugSimDB, a well-known drug-drug similarity tool that is based on drug-centric data. The results provide evidence on the feasibility of patient-centric data from social media as a novel source for computing drug-drug similarity.

1. Introduction

Drug-Drug Similarity (DDS) has received considerable attention from biomedical researchers in recent years because of its usefulness in addressing medical problems. It aims to find drugs with traits similar to those of a given drug, resting on the general assumption that similar drugs share characteristics such as chemical structure [1], gene expression profiles [2], side-effect profiles [3], and biological targets [4]. In pharmaceutical drug development in particular, DDS has been successfully applied to drug repositioning [4,5], drug-drug interaction prediction [6,7], drug target identification [3], and drug side-effect prediction [7]. Each of these applications is driven by an application-specific hypothesis. Drug repositioning, for instance, is motivated by the idea that if two different medications, D1 and D2, have comparable modes of action and properties, and D1 is used to treat a certain condition S, then D2 is a candidate for treating condition S. In drug-drug interaction prediction, the hypothesis is that if drug D1 interacts with drug D2, and drug D3 is similar to D1, then D3 should also interact with D2 (the argument also holds if the roles of D1 and D2 are exchanged). Drug side-effect prediction is based on the hypothesis that if drug D1 is similar to drug D2 and D1 is known to cause a certain side effect, then D2 should also cause the same side effect.
The computation of DDS is essentially based on applying data similarity methods to drug-centric data of different types. In doing so, these methods utilize different data similarity measures, which vary according to the type of data. Nonetheless, these measures can be divided into three broad categories [8]. The first category measures DDS using different features of drugs and targets, such as Anatomical Therapeutic Chemical (ATC) codes, the molecular structure of drugs, and the sequences and gene ontology of targets. The second category measures DDS using relationships such as drug-drug interactions, drug–disease associations, and drug–target associations. Finally, the third category integrates information from multiple data sources to measure DDS. It should be noted that the computation of DDS has been made possible by recent developments in high-throughput biology, which have generated enormous quantities of data focused on drugs. Pharmacological side effects, gene ontologies, chemical structures, targets, and ATC codes are some examples of the data curated in biomedical databases such as DrugBank, CHEMBL, PubChem, SIDER, and KEGG. The chemical structure of a drug is a description of its three-dimensional structure encoded as strings of characters. The World Health Organization has adopted the ATC classification system, which divides each level of drug classification into a number of classes based on each level’s characteristics and therapeutic effects. Drug targets are biological macromolecules, such as proteins and nucleic acids, on which a drug acts to carry out its pharmacodynamic actions in the body. Furthermore, the gene ontology of drug targets is a representation of the ways in which gene products function in the biological realms. It is a helpful data source for biomedical research that is employed in the computational analysis of large-scale genetics and biological experiments. Finally, drug side effects, that is, reported undesirable effects that may occur at standard doses, should be considered throughout the drug targeting procedure [8].
Social media has recently become a valuable data source for healthcare informatics [9]. The emergence of Web 2.0 and Health 2.0 has made it possible for patients to share on social media their experiences with illnesses, treatments, drug names, physicians, and therapists. Consequently, a massive amount of health information has become available, representing potentially valuable, yet largely unexploited, data sources that could be leveraged for drug knowledge discovery [10]. In this regard, the enormous amount of healthcare text generated from social media sites such as Google, Twitter, and YouTube has been used to tackle a number of medical issues such as detection of psychopathic class [11,12], classification of depression [13], identification of diseases [14], and detection of adverse drug reactions [15].
On this basis, this paper argues that social media data, in the form of patients’ narration of their thoughts, opinions, and experience with drugs, represent a potential source of data that could be utilized for measuring DDS. It is based on a new drug-drug similarity hypothesis which states that similar drugs should share similar aspects of patients’ experience. As the patients’ experience in social media is expressed in textual form, the problem of DDS is formulated as a text similarity problem to which text similarity approaches can be applied. In Natural Language Processing (NLP), text similarity plays an important role in many tasks such as automatic translation, information retrieval, intelligent responses, and machine matching for dialogues and documents [16]. Over the past three decades, various semantic similarity techniques have been proposed and used in different contexts. Following this idea, each drug is modeled as a document that contains all posts written about it. In this space of documents, text similarity can be applied to measure the similarity among them. The idea of utilizing patient-centric data in social media as a data source for measuring DDS is distinguished from the drug-centric data sources in three ways. First, unlike the drug-centric data, which are stored in structured databases, the patient-centric data are unstructured. Second, the patient-centric data are produced by patients, who typically write simply and plainly without using professional medical terms when expressing their experiences on medical concerns, as opposed to the drug-centric data, for which professional medical language is employed [17]. Because of this, the DDS method used with social media differs from the method used with more conventional drug-centric data sources in that it mainly relies on NLP techniques to extract pertinent information from social media. Third, the drug-centric data represent professionals’ perspectives, whereas the patient-centric data reflect patients’ thoughts and opinions. It is worth mentioning that the contribution of this research is two-fold: it introduces a new domain of applications where social media can be utilized, and it adds a new data source that is worth exploring.
Finally, from a practical perspective, the outcomes of this research are expected to have a significant impact on the practical applications of DDS in drug discovery and development, such as drug repositioning, drug side-effect prediction, and drug-drug interaction prediction. This is because pharmaceutical corporations now place a high priority on incorporating patient perspectives into drug discovery and development [10]. Since the currently used methods for computing DDS depend only on drug-related data, this research would meet the requirement of incorporating the patients’ perspective in DDS and its practical applications. For example, in the application of DDS to the prediction of drug-drug interactions, the proposed patient-centric data can be integrated with the traditional drug-centric data for a more robust computation of DDS, which consequently improves the prediction of drug-drug interactions.

2. Background

The computation of DDS measures the similarity between drugs from drug-centric data sources. Usually, the resultant similarities are used as input to a target application. In this section, the previous DDS works which rely on computing DDS from various drug-centric data sources, regardless of the target application, are reviewed in the following dimensions: source of drug-centric data, similarity measures, target applications.
In the first dimension, source of drug-centric data, the previous DDS works utilize different drug-centric data sources such as chemical structure [1,18], gene expression profiles [2], protein targets [19,20], side-effect profiles [3,21], and clinical information [22]. It should be mentioned that in addition to the previous DDS works that utilize a single drug-related data source, many DDS works utilize multiple drug-centric data sources to compensate for missing data across individual data sources and to provide a multi-view perspective for predicting related medications, thereby offering new insights into the target application [23,24]. In some of these works, a drug characteristic is regarded as the combination of many drug similarities. GIPAE [23], for instance, combines chemical structure similarity from SMILES data, as measured by the Chemistry Development Kit (CDK), and association similarity from drug–disease association profiles, as measured by the GIP kernel, to represent drug features. Using the combined similarities as drug features, the computation of DDS has improved drug–disease association prediction. Many works have proposed various integration approaches to leverage multimodal data and fuse similarities more effectively, but some of the earlier works integrated multiple similarities to yield multiple similarity matrices. These integration approaches can be categorized as either linear or nonlinear integration [8].
As for the similarity measures employed, they can be either general similarity measures or measures specific to drug-related data. General similarity measures such as the Jaccard Coefficient, Euclidean distance, and Cosine similarity are domain-independent and can be used in any domain, whereas drug-centric data-specific measures such as CDK [25], SIMCOMP [26], the normalized Smith–Waterman algorithm [27], and GOSemSim [28] were developed specifically to measure drug-drug similarity. The CDK is a library for structural chemoinformatics and bioinformatics, written in the Java programming language, that performs many molecular informatics tasks such as two- and three-dimensional representation of chemical structures, structure diagram generation, SMILES parsing and generation, I/O routines, isomorphism checking, and ring searches. SIMCOMP is a method that compares chemical structures represented as two-dimensional graphs, with vertices for atoms and edges for covalent bonds. It determines how similar two chemical compounds are by counting the matching atoms between their corresponding graphs. The Smith–Waterman method compares the canonical protein sequences of two drug targets by using local sequence alignment to compare segments of all possible lengths; the similarity between the matching segments is then calculated.
Based on the target applications in which the computation of DDS is utilized, the following domains of applications can be identified:
  • Drug repositioning: to discover new uses for existing drugs according to the similar compounds of drugs that are expected to interact with similar signs. Because it is a very effective strategy with low risk and cost, drug repositioning for DDS has many successful applications in drug development [29].
  • Drug side effect prediction: to predict unexpected side effects of a drug based on computing ligand similarity and protein interactions. Knowing affected biological pathways and binding partners of a given drug is important for predicting both its efficacy and side effects [30]. The similarity-based drug side-effect prediction is an effective strategy, because the currently used laboratory assay method for evaluating potential adverse drug effects is a time-consuming method with high cost.
  • Drug-Drug Interaction: the interaction between two drugs taken concomitantly occurs when the action of one of them intervenes with the activity of the other. The discovery of the interaction between drugs is of significant benefit for guidance of clinical medications, because it could lead to adverse drug reactions or complicate disease treatments on patients. The similarity-based method is one of the successful methods to identify drug-drug interactions [8].
  • Drug–disease associations: the discovery of yet-unknown links between drugs and diseases has gained significant attention. In this regard, the similarity-based methods play an important role in complementing or guiding costly and exhausting wet experiments. In addition, the prediction of novel associations between drug and disease can be done utilizing the previously known drug–disease associations and the features of drug and disease as well [31].
  • Drug–target interaction prediction: to forecast a possible relationship between a medicine and a target. It is a necessary stage for tasks such as drug discovery and repositioning. In the database, similar medications and targets can be found using similarity-based algorithms and based on the known interactions between these drugs and targets, the interaction can be predicted [32].
  • Personalized medicine: to tailor a treatment to the characteristics of each patient. It requires grouping patients into subgroups with a predictable response to a specific treatment. In this regard, the exploratory and predictive analysis provided by the similarity-based methods supports clinical decision-making, which is a key step in personalized medicine [1].
In all the previous works, the drug-centric data are a key factor for computing DDS. Moreover, in all of them, the source of drug-related data represents the professional perspective on the drugs. From the perspective of modern-day business dynamics, integrating patients’ perspectives into drug discovery and development is a critical issue. Furthermore, in recent years, patient-perceived benefits have been receiving increasing attention from pharmaceutical regulatory authorities when decisions on drug approval, pricing, and reimbursement are made. The analysis of existing research shows that people with major diseases and disabilities have a propensity to use social media to seek self-help by sharing their experiences with their conditions [10,33]. Interestingly, the examination of patient posts on these social media platforms could be used to glean insightful information that opens the door to patient-centered drug development.
On this basis, this work intends to incorporate the patients’ perspective in the computation of DDS by considering their experience and opinions on a drug as a new source of data for computing DDS. Unlike the conventional drug-related data sources, in which data are curated in a structured form, the patients’ experience with drugs in social media is unstructured and, therefore, the computation of DDS requires employing text similarity.
Text similarity is a ubiquitous notion within the natural language processing (NLP) community. It is utilized in a wide range of tasks such as question answering [34], automatic essay grading [35], and paraphrase recognition [36]. Text similarity methods can be divided into three broad categories [37]: string-based, corpus-based, and knowledge-based similarity methods. The string-based similarity approach (also known as lexical-based similarity) relies on a string metric that measures the similarity or dissimilarity (distance) between two strings. The corpus-based similarity method (semantic-based similarity) calculates how similar two words are by using data from huge corpora. The knowledge-based similarity method, on the other hand, calculates the degree of similarity between words using data from semantic networks such as WordNet, a sizable lexical database of English words created specifically for this purpose. Knowledge-based similarity metrics are further divided into semantic similarity measures and semantic relatedness measures. While the semantic similarity measures evaluate the similarity between concepts based on their likeness, semantic relatedness measures employ a more general notion of relatedness that is not tied specifically to the form or shape of the concept.

3. Materials and Methods

The task of computing drug-drug similarity can be viewed as a use case of the general task of drug knowledge discovery that is concerned with extracting insights from available data. The five key stages of the standard approach for extracting drug-related datasets from social media [10] are: (1) resource selection, (2) dataset extraction, (3) data preparation, (4) data analysis, and (5) overall evaluation. The main elements of the process are frequently preserved, even though the specifics of each step may change depending on the final application. Figure 1 depicts the specific use case of the general drug knowledge discovery methodology that is concerned with computing drug-drug similarity from social media platforms.
In the first stage of this process, the social media resource of patients’ reviews should be identified. In general, patients’ reviews of drugs can be drawn from social media platforms, which are divided into general platforms and specialized healthcare platforms. Facebook, Twitter, Instagram, and Reddit represent general social media platforms. The specialized healthcare social media platforms are divided into three types: generic health-centered platforms, drug-focused sharing platforms, and disease-specific platforms. While the generic health-centered platforms, such as PatientsLikeMe, DailyStrength, MedHelp, WebMD, and CureTogether, permit patients to communicate their experiences on health-related issues, the drug-focused sharing platforms, such as Askapatient and Medications.com, permit patients to discuss and share their experiences with medications. On the other hand, disease-specific platforms focus on particular diseases, e.g., the TalkStroke forum [15].
After the identification of the social media data source, the second step is to extract patient-centric data from the identified social media platforms. For this purpose, two types of processes can be utilized: focused crawling and Web scraping. Focused crawling refers to automatically collecting websites that satisfy given criteria, e.g., all websites on Alzheimer’s disease or all websites on public health topics from a particular domain. In this process, the crawling algorithm should implement hyperlink analysis and prioritization processes to exclude many irrelevant sites. On the other hand, the Web scraping process refers to automated and systematic extraction of specific content of interest from given webpages. The decision of which process can be utilized is made based on the type of identified social media platform from which patients’ reviews are extracted. More specifically, to extract patients’ reviews from generic health-centered platforms, specific application programming interfaces can be used; however, an adapted web crawler to collect web pages and web scraper is usually used to obtain the patients’ reviews from specialized healthcare social networks [38].
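The Web scraping step described above can be illustrated with a minimal sketch. It assumes a hypothetical page layout in which every review sits in a `<td class="comment">` cell; the markup, the class name, the sample text, and the names `ReviewScraper` and `scrape_reviews` are all illustrative assumptions rather than the structure of any real platform or the tooling used in this work.

```python
from html.parser import HTMLParser

# Hypothetical markup and invented sample text, for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><td class="rating">4</td><td class="comment">Fewer seizures, some dizziness.</td></tr>
  <tr><td class="rating">2</td><td class="comment">Bad rash in week two.</td></tr>
</table>
"""


class ReviewScraper(HTMLParser):
    """Collects the text of every <td class="comment"> cell."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_comment = False

    def handle_starttag(self, tag, attrs):
        # Start a new review when a comment cell opens.
        if tag == "td" and dict(attrs).get("class") == "comment":
            self._in_comment = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_comment = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the open review.
        if self._in_comment:
            self.reviews[-1] += data


def scrape_reviews(html):
    """Return the review texts found in one HTML page."""
    parser = ReviewScraper()
    parser.feed(html)
    parser.close()
    return [review.strip() for review in parser.reviews]


reviews = scrape_reviews(SAMPLE_PAGE)
print(reviews)
```

A real scraper would additionally have to fetch pages, follow pagination, and respect the terms of use of the platform; none of that is shown here.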
The third step in the methodology of computing DDS is to generate drug documents. For each drug, a single document is generated by aggregating all collected patients’ reviews on that drug. The result of this step is a set of documents whose size equals the number of drugs under consideration.
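As a minimal sketch of this aggregation step, the following assumes that the extracted reviews are available as (drug name, review text) pairs; the function name `build_drug_documents`, the placeholder drug names, and the sample reviews are hypothetical.

```python
from collections import defaultdict


def build_drug_documents(reviews):
    """Concatenate all review texts of each drug into one document.

    `reviews` is an iterable of (drug_name, review_text) pairs.
    """
    grouped = defaultdict(list)
    for drug, text in reviews:
        grouped[drug].append(text.strip())
    # One space-joined document per drug.
    return {drug: " ".join(texts) for drug, texts in grouped.items()}


# Invented sample data, for illustration only.
sample_reviews = [
    ("drug_a", "Helped with seizures."),
    ("drug_b", "Made me very sleepy."),
    ("drug_a", "Mild headache at first."),
]
documents = build_drug_documents(sample_reviews)
print(len(documents))  # one document per distinct drug
```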
The fourth step is the preprocessing of drug documents using NLP techniques to facilitate insightful analysis by reducing noise and structuring the text of drug documents. Preprocessing is often carried out in two steps: data preparation and data reduction. Data preparation comprises data cleaning, standardization, and transformation. Data cleaning aims to ensure that complete and concise data are available and free from duplicates by applying appropriate techniques such as word removal and repost removal. Data standardization aims to ensure that the data are expressed in a unified medical form by identifying all occurrences of imprecise medical terms and concepts in social media posts and replacing them with appropriate ones. In data transformation, the data are transformed into a format that can be used for analysis. In the data reduction step, the dimensionality of the data is decreased using techniques including feature selection, feature transformation, and instance selection. When the data dimensionality is enormous, as in the case of text in drug documents, feature transformation, which seeks to condense the original features into a limited set, is a critical procedure. Instance selection, on the other hand, seeks to reduce the size of the data without sacrificing important information, for example by removing posts that are not relevant. Finally, feature selection is carried out by removing as many redundant and unnecessary features from the data as is practical.
After obtaining the drug documents in vector space model format, the DDS can be computed by applying similarity metrics to determine how similar the vector space models of each pair of drugs are to one another. Each value in the drug similarity matrix created during this phase indicates how similar a particular pair of drugs is. In data mining, calculating similarity is a frequent task with a large range of potential measures. The Cosine similarity and Euclidean distance are two of the most often used data similarity metrics. It should be emphasized that because the selection of a data similarity metric is domain-specific, it is difficult to know whether one metric is better or worse than another in general.

3.1. Computing DDS of Anti-Epileptic Drugs: A Case Study

This section explains how to compute DDS among a specific group of drugs used mostly to treat epilepsy using the methods given above. Anti-Epileptic Drugs (AEDs) are drugs primarily used to treat epilepsy, a neurological condition characterized by a variety of seizure forms, therapeutic sensitivity, and prognosis. Although the currently available AEDs provide greater treatment options for different types of seizures, none of them treats the disease etiology as they all work by suppressing the seizures when they occur. Additionally, more than one-third of epilepsy patients are still unable to manage their seizures using the AEDs that are now available [39].
The AEDs interact with a wide range of molecular targets to produce their desired effects. They primarily target two broad target groupings [40]: specific aspects of the damaged membrane, which typically concern aberrant ion permeability (calcium, sodium, and potassium), and compromised synaptic functioning (heightened excitation or inadequate transmission of suppression). Even though the majority of recently developed AEDs, such as lacosamide and perampanel, have numerous modes of action, several older AEDs, such as valproate, also have other pharmacological activities whose relation to their anticonvulsant activity is uncertain. Undoubtedly, the ongoing effort to identify the targets of the AEDs currently in use will advance knowledge of the pathophysiological mechanisms underlying epileptic seizures and the creation of novel therapeutic approaches.

3.2. AEDs Related Patients’ Reviews Extraction

The raw data of AEDs are extracted from the Askapatient platform through a web crawler. The extracted data involve patients’ experiences and ratings of AEDs, reasons for using an AED, side effects of the AED, comments, gender, age, duration/dosage, and posting dates. After extraction, the number of reviews per AED ranges from 1860 reviews for Lamotrigine to a single review for Aptiom. Therefore, this research does not consider AEDs that have fewer than 150 reviews on the Askapatient platform. Table 1 lists the AEDs considered in this work.
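The selection rule can be sketched as a simple threshold filter. In the illustration below, only the counts for Lamotrigine (1860) and Aptiom (1) and the threshold of 150 come from the text; `drug_x`, `drug_y`, and the function name `select_drugs` are hypothetical.

```python
MIN_REVIEWS = 150  # threshold stated in the text


def select_drugs(review_counts, min_reviews=MIN_REVIEWS):
    """Keep only drugs that have at least `min_reviews` reviews."""
    return {drug for drug, n in review_counts.items() if n >= min_reviews}


counts = {
    "Lamotrigine": 1860,  # from the text
    "Aptiom": 1,          # from the text
    "drug_x": 150,        # hypothetical boundary case: kept
    "drug_y": 149,        # hypothetical boundary case: dropped
}
selected = select_drugs(counts)
print(sorted(selected))
```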
Moreover, Figure 2 is a snapshot of the detailed data extracted from Askapatient for Lamictal (Lamotrigine).

3.3. AEDs Documents Generation

In this step, the relevant data, which include side effects and comments, for each AED are selected from the extracted patients’ reviews and then compiled into a unified single document for each AED.

3.4. AEDs Documents Preprocessing

As pointed out above, several NLP techniques are applied to preprocess the AED documents and transform them into a vector space model representation. The applied NLP techniques are:
  • Text cleaning: removing all digits, numerals, and punctuation.
  • Text normalization: converting all text to lowercase.
  • Stop-word removal: eliminating stop words, because they have no bearing on the DDS computation.
  • N-gram generation: generating unigrams, bigrams, and trigrams from all terms in an AED document, with three as the maximum n-gram length.
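The four steps above can be sketched in plain Python as follows. The stop-word list is a tiny illustrative assumption (the text does not say which list was used), raw n-gram counts are assumed as the term weights, and the names `preprocess` and `ngram_counts` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; an assumption, not the list used in the study.
STOP_WORDS = {"a", "an", "and", "the", "i", "it", "is", "was", "of", "to", "my", "in"}


def preprocess(text):
    """Clean, lowercase, and remove stop words; returns a list of tokens."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # drop digits and punctuation
    tokens = text.lower().split()             # normalize case and tokenize
    return [t for t in tokens if t not in STOP_WORDS]


def ngram_counts(tokens, max_n=3):
    """Count all n-grams of length 1..max_n (unigrams, bigrams, trigrams)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


tokens = preprocess("I was dizzy and tired, 2 weeks in!")
vector = ngram_counts(tokens)
print(tokens)
print(sorted(vector))
```

The resulting `Counter` is one possible sparse vector space representation of a document; other weightings (for example TF-IDF) would slot in at the same point.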

3.5. Computing DDS of AEDs

In this study, the similarities among the AED documents are determined using four data similarity metrics that are widely used in the text similarity area: Cosine Similarity, Euclidean Distance, Manhattan Distance, and Jaccard Coefficient.

3.5.1. Cosine Similarity (CS)

A popular method to gauge text similarity is via the Cosine Similarity (CS) metric [15]. In an inner product space, it calculates the cosine of the angle formed by two non-zero vectors. The vector’s absolute length has no effect on the CS measure. The CS measure between two vectors X = (x1…xn) and Y = (y1…yn) is defined as:
$$CS(X, Y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
An interesting aspect of the CS measure is that it varies under general linear transformations but is invariant to rotation. As noted above, it is also unaffected by vector length [41].
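A minimal sketch of the CS measure is given below. It assumes that each document vector is stored as a sparse dictionary mapping a term to its weight; the function name `cosine_similarity` and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two sparse vectors (dicts term -> weight)."""
    dot = sum(w * y.get(term, 0.0) for term, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)


a = {"dizzy": 2, "rash": 1}
b = {"dizzy": 4, "rash": 2}  # same direction as a, twice the length
c = {"nausea": 3}            # shares no term with a

print(cosine_similarity(a, b))  # close to 1: length does not matter
print(cosine_similarity(a, c))  # 0: no shared terms
```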

3.5.2. Euclidean Distance (ED)

The Euclidean Distance (ED) is the metric most typically employed for geometrical problems. It is the straight-line distance between two data points in n-dimensional space. In data mining, it has been widely applied to many tasks such as clustering [42]. Given two vectors representing two data points, X = (x1 …xn) and Y = (y1 …yn), the ED measure between them is defined as follows:
$$ED(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
The ED measure has several intriguing qualities, although it suffers from a number of issues related to data sparsity, distribution, noise, and feature relevance, particularly in high-dimensional space. One interesting feature is its invariance to rotation, that is, the straight-line distance is unaffected by the orientation of the axis system [43]. This suggests that the distance is unaffected by orthogonal changes of basis such as those used in singular value decomposition and principal component analysis. The straightforward interpretability of the ED measure is another essential feature.
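The ED measure can be sketched in the same way as a function over sparse term-weight dictionaries; this representation, the name `euclidean_distance`, and the sample vectors are assumptions for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two sparse vectors (dicts term -> weight)."""
    terms = set(x) | set(y)  # a missing term counts as weight 0
    return math.sqrt(sum((x.get(t, 0.0) - y.get(t, 0.0)) ** 2 for t in terms))


a = {"dizzy": 3}
b = {"rash": 4}

print(euclidean_distance(a, b))  # 5.0, the 3-4-5 right triangle
print(euclidean_distance(a, a))  # 0.0 for identical vectors
```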

3.5.3. Manhattan Distance (MD)

The Manhattan Distance (MD) and ED measures are related in that both are special cases of the Minkowski distance [43]. The MD measure is defined as the “city block” distance, as in a place such as New York City’s Manhattan island, where the streets are laid out in a grid. Like the ED measure, MD is easy to interpret and experiences the same difficulties in high-dimensional space; unlike ED, however, it is not invariant to rotation, because the city-block distance depends on the orientation of the axes. The MD measure between two vectors, X = (x1…xn) and Y = (y1…yn), which represent two data points, is defined as:
$$MD(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$$
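A corresponding sketch of the MD measure, again assuming sparse term-weight dictionaries and a hypothetical function name `manhattan_distance`:

```python
def manhattan_distance(x, y):
    """Sum of absolute coordinate differences between two sparse vectors."""
    terms = set(x) | set(y)  # a missing term counts as weight 0
    return sum(abs(x.get(t, 0.0) - y.get(t, 0.0)) for t in terms)


a = {"dizzy": 3}
b = {"rash": 4}

print(manhattan_distance(a, b))  # 7.0: three blocks one way, four the other
print(manhattan_distance(a, a))  # 0 for identical vectors
```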

3.5.4. Jaccard Coefficient (JC)

The Jaccard Coefficient (JC) measure is defined as the similarity between two finite sets, calculated as the size of their intersection over the size of their union [16]. Thus, if the two sets have no elements in common, JC equals zero; if the two sets are identical, JC equals one. Given two sets X and Y, the JC measure is defined as follows:
$$JC(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$
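For the JC measure, a minimal sketch can treat each document as the set of its distinct terms (or n-grams); that choice, the name `jaccard_coefficient`, and the sample sets are illustrative assumptions.

```python
def jaccard_coefficient(x, y):
    """Size of the intersection over the size of the union of two sets."""
    x, y = set(x), set(y)
    if not x and not y:
        raise ValueError("Jaccard coefficient is undefined for two empty sets")
    return len(x & y) / len(x | y)


doc_a = {"dizzy", "rash", "nausea"}
doc_b = {"dizzy", "rash", "fatigue"}

print(jaccard_coefficient(doc_a, doc_b))  # 0.5: two shared terms out of four
```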

4. Results and Discussion

This section presents the main findings from calculating the degree of similarity between the text documents of AEDs using the four similarity measures. See Table 2, Table 3, Table 4 and Table 5.
The results shown in the above tables indicate that the four measures yield quite different results because of the differences between their working mechanisms. In other words, although all of them evaluate how two documents, represented as two points in the vector space, are related, each measure evaluates that relationship differently because “similarity” means something different for each. This is obvious from the differences in their scales and ranges of values. For example, since the Euclidean and Manhattan distances define similarity in terms of the distance between two vectors, their scales fall in the range [0, ∞), where 0 means that the two documents are identical, and the more dissimilar the documents are, the higher the value. Nonetheless, because the two measures define distance differently, the Euclidean distance results are somewhat lower than the Manhattan distance results. More precisely, while the Euclidean distance measures the straight-line distance between two points in the vector space, the Manhattan distance is the sum of the absolute differences between the points across all dimensions.
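The ordering between the two distance measures noted above follows directly from their definitions. As a short worked argument, using the same X and Y as in Section 3.5:

```latex
\begin{aligned}
MD(X, Y)^2 &= \Bigl(\sum_{i=1}^{n} |x_i - y_i|\Bigr)^2
            = \sum_{i=1}^{n} (x_i - y_i)^2
              + \sum_{i \neq j} |x_i - y_i|\,|x_j - y_j| \\
           &\ge \sum_{i=1}^{n} (x_i - y_i)^2 = ED(X, Y)^2 .
\end{aligned}
```

Since the cross terms are non-negative, $ED(X, Y) \le MD(X, Y)$ for every pair of vectors, with equality only when X and Y differ in at most one coordinate.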
The cosine and Jaccard coefficient measures, on the other hand, deal with the similarity between a two documents from a different perspective. Unlike distance-based similarity measures, these measures interpret the similarity between two documents in terms of the closeness of the two documents to each other; therefore, their scales fall in the range [0, 1], where 0 means the two documents are totally dissimilar and 1 means the two documents are identical. Nonetheless, the two measures are different in their interpretation of the similarity between two documents. While the cosine measure interprets the similarity in terms of the orientation of the two vectors in vector space, the Jaccard coefficient interprets the similarity in terms of the size of the intersection divided by the size of the union of the two sets representing the documents. Another important difference between distance-based measures (Euclidian and Manhattan) and closeness-based measures is that the distance-based measures account for the magnitude of the values representing the dimension, whereas closeness-based measures are much less effected by magnitude, or how large the numbers are.
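The effect of magnitude can be illustrated with a small, hedged Python sketch. The two vectors are hypothetical term counts with the same word proportions but different document lengths; the function names are illustrative and do not come from the original study.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (depends on orientation only)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute differences across all dimensions."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical term-count vectors: same proportions, ten times more text.
short_review = [1.0, 2.0, 0.0]
long_review = [10.0, 20.0, 0.0]

cs = cosine_similarity(short_review, long_review)
ed = euclidean_distance(short_review, long_review)
md = manhattan_distance(short_review, long_review)
print(f"CS={cs:.2f}  ED={ed:.2f}  MD={md:.2f}")
```

In this toy case the cosine similarity is 1 (up to floating-point rounding) because the vectors point in the same direction, while both distances are large; the Euclidean value is also never larger than the Manhattan value for the same pair of vectors.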
To overcome the above-mentioned variance in measuring the DDS, a unified scale measurement scale can be used. For this problem, a similarity-based ranking method is applied, where for each drug, the remaining drugs are ordered descendingly based on the obtained DDS from each measure and the ranking values are used instead. The results of applying the similarity-based ranking method are presented in Table 6, Table 7, Table 8 and Table 9.
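The ranking step can be sketched as follows. The helper `similarity_ranks` is a hypothetical name, and breaking ties by input order is an assumption, since the tie-breaking rule is not stated in the text. The scores are the first five entries of the Carbamazepine rows of Table 2 (CS) and Table 3 (ED); note that for distance measures the most similar drug is the one with the smallest value.

```python
def similarity_ranks(scores: dict, higher_is_more_similar: bool = True) -> dict:
    """Rank drugs from most similar (rank 1) to least similar.

    Assumption: ties are broken by input order (Python's sort is stable).
    """
    ordered = sorted(scores, key=scores.get, reverse=higher_is_more_similar)
    return {drug: rank for rank, drug in enumerate(ordered, start=1)}


# Carbamazepine versus the first five AEDs, cosine similarity (Table 2).
cs_scores = {"Carbamazepine": 1.00, "Oxcarbazepine": 0.65, "Gabapentin": 0.63,
             "Pregabalin": 0.58, "Acetazolamide": 0.45}
# Same drugs, Euclidean distance (Table 3): smaller means more similar.
ed_scores = {"Carbamazepine": 0.00, "Oxcarbazepine": 0.36, "Gabapentin": 0.61,
             "Pregabalin": 0.97, "Acetazolamide": 0.55}

cs_ranks = similarity_ranks(cs_scores, higher_is_more_similar=True)
ed_ranks = similarity_ranks(ed_scores, higher_is_more_similar=False)
print(cs_ranks)
print(ed_ranks)
```

Because only five of the fourteen drugs are included, the ranks in this sketch differ from the full-table ranks; the drug itself always receives rank 1.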
In contrast to the raw similarities, the similarity-based rankings look more consistent and show, for each AED, the ranks of the remaining AEDs with respect to their similarity to it. In addition to providing a unified measurement scale, the ranking values allow two types of analysis to be performed. The first is the analysis of drug-drug correlations, which is motivated by the observed consistency between the ranking values of drugs in different rows of the same table; it provides insights into the overall drug-drug similarity. The second is the analysis of the agreement between the similarity measures, which is motivated by the observed consistency between the ranking values of corresponding drugs across tables; it provides insights into the performance of the similarity measures relative to each other. For both analyses, rank correlation coefficient methods can be applied. A rank correlation coefficient assesses the strength of the relation between two rankings by measuring the degree of similarity between them. In this work, Pearson’s rank correlation coefficient [44] over the obtained drug rankings is defined for two variables X = (x1 …xn) and Y = (y1 …yn) as follows:
$$P_r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
where Pr is the Pearson correlation coefficient, n is the number of paired values, and xi and yi are the values of the X and Y variables.
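The following Python sketch implements the computational form of the Pearson coefficient and applies it to the ED-based rankings of Carbamazepine and Oxcarbazepine as read from Table 7. The function name `pearson_correlation` is an illustrative choice, and the code is a sketch rather than the implementation used in the study.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / denominator


# ED-based rankings of the 14 AEDs for two drugs (rows of Table 7).
carbamazepine_ed = [1, 2, 10, 13, 7, 12, 4, 14, 5, 9, 6, 11, 8, 3]
oxcarbazepine_ed = [2, 1, 10, 13, 8, 12, 4, 14, 6, 9, 5, 11, 7, 3]
print(round(pearson_correlation(carbamazepine_ed, oxcarbazepine_ed), 2))  # 0.99
```

The result agrees with the ED entry for this drug pair in Table 10.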

4.1. Drug-Drug Correlations Analysis

The drug-drug correlation analysis is performed by applying Pearson’s rank correlation coefficient to the ranking values of each pair of drugs within the same table. This can be considered a second-order similarity measurement between AEDs, which captures how similarly two drugs rank the other drugs. Table 10 presents the degree of agreement between each pair of AEDs in how the other AEDs are ranked, as measured by each of the four measures. This unified scale makes it possible to derive a final score for the similarity-based correlation between each pair of AEDs.
Based on the obtained drug-drug similarity-based correlations, an overall AEDs similarity-based correlation can be calculated as shown in Table 11.
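As a small worked example, and assuming that the overall score is the unweighted mean of the four per-measure correlations (as the caption of Table 11 suggests), the Oxcarbazepine and Carbamazepine entry can be reproduced from the Oxcarbazepine rows of Table 10:

```python
from statistics import mean

# Oxcarbazepine vs. Carbamazepine correlations per measure (Table 10).
per_measure = {"CS": 0.85, "ED": 0.99, "MD": 0.98, "JC": 0.93}

# Assumption: the overall score is the plain average over the four measures.
overall = mean(per_measure.values())
print(round(overall, 2))  # 0.94
```

The rounded value matches the corresponding entry of Table 11.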

4.2. Agreement Analysis of Similarity Measures

As pointed out above, the second analysis examines the agreement between the similarity measures to provide insights into their performance relative to each other. Again, this analysis is performed by applying Pearson’s rank correlation coefficient to the drug ranking values presented in Table 6, Table 7, Table 8 and Table 9, comparing, for each drug, the rankings produced by each pair of measures.
The results of the agreement analysis show various levels of agreement between the four measures in measuring the similarities between AEDs. The values in the last row of Table 12 represent the average agreement between each pair of measures over all AEDs. The Euclidean and Manhattan measures have the highest agreement (0.91 on average). This can be attributed to their similar working mechanisms, since both measure similarity in terms of the distance between vectors in a Cartesian space. In addition, the Manhattan and Jaccard measures show a fairly high degree of agreement (0.80), although they evaluate similarity on different bases; the simplicity of the two measures could explain this agreement. On the other hand, the Cosine similarity measure shows low agreement with the other measures, the lowest being with Jaccard (0.37). This reflects the inherent differences between the Cosine measure and the others.
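The agreement computation for a single drug can be sketched as below. The rankings are the Carbamazepine rows as read from Table 6, Table 7, Table 8 and Table 9; the helper `pearson` (written here in mean-centered form) and the variable names are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation in mean-centered form."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Carbamazepine's ranking of the 14 AEDs under each measure (Tables 6-9).
rankings = {
    "CS": [1, 2, 3, 5, 14, 4, 10, 6, 11, 13, 9, 12, 8, 7],
    "ED": [1, 2, 10, 13, 7, 12, 4, 14, 5, 9, 6, 11, 8, 3],
    "MD": [1, 2, 10, 12, 8, 13, 4, 14, 5, 9, 6, 11, 7, 3],
    "JC": [1, 2, 10, 12, 9, 14, 4, 13, 3, 8, 7, 11, 6, 5],
}

# Agreement between every pair of measures for this one drug.
agreement = {
    (a, b): pearson(rankings[a], rankings[b])
    for a, b in combinations(rankings, 2)
}
for (a, b), value in agreement.items():
    print(f"{a} & {b}: {value:.2f}")

average = sum(agreement.values()) / len(agreement)
print(f"average: {average:.2f}")
```

The six pairwise values and their average reproduce the Carbamazepine row of the agreement table (0.11 for CS and ED, 0.99 for ED and MD, and 0.54 on average).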

4.3. Evaluation

To evaluate the discovered similarity-based correlations among AEDs, it is meaningful to compare the similarities obtained from social media with AED similarities that are based on the drug-centric data mentioned above. For this purpose, this research uses the DrugSimDB tool [45], which integrates multiple sources of drug-centric data to compute DDS among a comprehensive list of drugs. It includes 238,635 significant multi-modal DDS for 10,317 small-molecule drugs, among them 2466 approved and 7212 investigational drugs as well as illicit or withdrawn ones. DrugSimDB draws on a variety of public datasets covering protein sequences and their functional annotations, drug-induced pathways, chemical structure descriptors, and protein-protein and drug-drug interactions, to determine the degree to which each pair of drugs shares the same targets, structures, activities, and pathways. DrugSimDB is a web-based application that enables users to browse or download the complete drug database or any of the key processed files. Table 13 presents the AED similarities obtained from DrugSimDB and Figure 3 shows their representation as a network.
Taking the average DDS obtained from the DrugSimDB tool as ground truth, the AED DDS obtained from social media can be evaluated in terms of Precision (P), Recall (R), and F1, computed in the standard way from the numbers of true positives (TP), false positives (FP), and false negatives (FN): P = TP/(TP + FP), R = TP/(TP + FN), and F1 = 2PR/(P + R). To do so, a threshold on the AEDs’ drug-drug similarity-based correlations shown in Table 11 needs to be specified, so that two drugs are considered similar when their similarity-based correlation is above the threshold. Table 14 reports the obtained P, R, and F1 values for several threshold values. As shown in Table 14, the best F1 is obtained when the chosen threshold is 0.75. These results provide evidence on the feasibility of using patient-centric data from social media.
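The thresholding and scoring steps can be sketched as follows. This is a toy illustration only: it uses ten of the averaged correlations listed in Table 11, and it assumes that a pair counts as similar in the ground truth when it appears among the DrugSimDB pairs listed in this article. That ground-truth rule, the helper names, and the chosen subset are all assumptions, so the output is not expected to reproduce Table 14.

```python
def pair(a, b):
    """Order-independent key for a drug pair."""
    return frozenset((a, b))


def similar_pairs(correlations, threshold):
    """Pairs whose similarity-based correlation is above the threshold."""
    return {p for p, value in correlations.items() if value > threshold}


def precision_recall_f1(predicted, truth):
    """Precision, recall and F1 of a predicted set against a ground-truth set."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Subset of averaged similarity-based correlations (values from Table 11).
correlations = {
    pair("Carbamazepine", "Oxcarbazepine"): 0.94,
    pair("Carbamazepine", "Phenytoin"): 0.82,
    pair("Oxcarbazepine", "Phenytoin"): 0.78,
    pair("Clonazepam", "Diazepam"): 0.96,
    pair("Lamotrigine", "Topiramate"): 0.79,
    pair("Diazepam", "Topiramate"): -0.47,
    pair("Diazepam", "Lamotrigine"): -0.16,
    pair("Clonazepam", "Lamotrigine"): -0.05,
    pair("Gabapentin", "Pregabalin"): 0.81,
    pair("Levetiracetam", "Phenytoin"): 0.90,
}

# ASSUMPTION: the DrugSimDB pairs listed in the article are the positives.
truth = {
    pair("Clonazepam", "Diazepam"), pair("Carbamazepine", "Phenytoin"),
    pair("Carbamazepine", "Oxcarbazepine"), pair("Oxcarbazepine", "Phenytoin"),
    pair("Diazepam", "Topiramate"), pair("Diazepam", "Lamotrigine"),
    pair("Clonazepam", "Lamotrigine"), pair("Lamotrigine", "Topiramate"),
}

predicted = similar_pairs(correlations, threshold=0.75)
p, r, f1 = precision_recall_f1(predicted, truth)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Repeating the last three lines for several thresholds gives one row of P, R, and F1 per threshold, which is the shape of the comparison described above.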

5. Conclusions

In this research, a framework is proposed for computing drug-drug similarity from a novel data source that represents the patient perspective on drugs. The proposed framework employs text similarity methods to compute DDS from patients’ reviews collected from social media. A case study computing the DDS of a specific set of drugs, AEDs, is presented, and the obtained results are analyzed using Pearson’s correlation coefficient to investigate the AEDs’ DDS and the performance of the four similarity measures. The AEDs’ DDS are compared with the DDS obtained from DrugSimDB, which depends on the commonly used drug-centric data, and the results provide evidence on the feasibility of using patient-centric data for computing DDS. The outcomes of this research are expected to contribute to healthcare at both a practical and a theoretical level. At the theoretical level, this research is considered the first of its kind to investigate patient-centric data for computing DDS, which can inspire further research in this direction to fully exploit this novel source of data. At the practical level, this research can inform applications of drug discovery and development that rely on computing DDS, offering a new source of data to compensate for missing data across professional data sources and to provide a multi-view perspective for computing DDS.
This research can be extended in several directions. First, there are many other text similarity methods that can be investigated for improving the computation of DDS. Second, more sophisticated NLP methods can be used in the preprocessing of the textual data of drug documents to improve the computation of DDS. Finally, for the sake of generality, the proposed DDS framework can be tested on an extended set of central nervous system (CNS)-acting drugs such as anti-Alzheimer, anti-Parkinson’s, and antipsychotic drugs.

Funding

The author is thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Priorities and Najran Research funding program grant code NU/NRP/SERC/11/17.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The author is thankful to the Deanship of Scientific Research at Najran University for funding this work under the Research Priorities and Najran Research funding program grant code NU/NRP/SERC/11/17.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Zhang, P.; Wang, F.; Hu, J.; Sorrentino, R. Towards personalized medicine: Leveraging patient similarity and drug similarity analytics. AMIA Jt. Summits Transl. Sci. 2014, 2014, 132–136.
  2. Cha, K.; Kim, M.S.; Oh, K. Drug Similarity Search Based on Combined Signatures in Gene Expression Profiles. Healthc. Inform. Res. 2014, 20, 52–60.
  3. Campillos, M.; Kuhn, M.; Gavin, A.C.; Jensen, L.J. Drug target identification using side-effect similarity. Science 2008, 321, 263–266.
  4. Zhang, P.; Wang, F.; Hu, J. Towards drug repositioning: A unified computational framework for integrating multiple aspects of drug similarity and disease similarity. In Proceedings of the AMIA Annual Symposium Proceedings, Washington, DC, USA, 15–19 November 2014.
  5. Luo, H.; Wang, J.; Li, M.; Luo, J.; Peng, X.; Wu, F.X.; Pan, Y. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics 2016, 32, 2664–2671.
  6. Ferdousi, R.; Safdari, R.; Omidi, Y. Computational prediction of drug-drug interactions based on drugs functional similarities. J. Biomed. Inform. 2017, 70, 54–64.
  7. Sridhar, D.; Fakhraei, S.; Getoor, L. A probabilistic approach for collective similarity-based drug-drug interaction prediction. Bioinformatics 2016, 32, 3175–3182.
  8. Huang, L.; Luo, H.; Li, S.; Wu, F.X.; Wang, J. Drug–drug similarity measure and its applications. Briefings Bioinform. 2021, 22, bbaa265.
  9. Nawaz, M.S.; Mustafa, R.U.; Lali, M.I. Role of Online Data from Search Engine and Social Media in Healthcare Informatics. In Applying Big Data Analytics in Bioinformatics and Medicine; IGI Global: Hershey, PA, USA, 2018; pp. 272–293.
  10. Koss, J.; Rheinlaender, A.; Truebel, H.; Bohnet-Joschko, S. Social media mining in drug development—Fundamentals and use cases. Drug Discov. Today 2021, 26, 2871–2880.
  11. Alotaibi, F.M.; Asghar, M.Z.; Ahmad, S. A Hybrid CNN-LSTM Model for Psychopathic Class Detection from Tweeter Users. Cogn. Comput. 2021, 13, 709–723.
  12. Asghar, J.; Akbar, S.; Asghar, M.Z.; Ahmad, B.; Al-Rakhami, M.S.; Gumaei, A. Detection and Classification of Psychopathic Personality Trait from Social Media Text Using Deep Learning Model. Comput. Math. Methods Med. 2021, 2021, 5512241.
  13. Ahmad, H.; Asghar, M.Z.; Alotaibi, F.M.; Hameed, I.A. Applying Deep Learning Technique for Depression Classification in Social Media Text. J. Med Imaging Health Inform. 2020, 10, 2446–2451.
  14. Pervaiz, S.; Ul-Qayyum, Z.; Bangyal, W.H.; Gao, L.; Ahmad, J. A Systematic Literature Review on Particle Swarm Optimization Techniques for Medical Diseases Detection. Comput. Math. Methods Med. 2021, 2021, 5990999.
  15. Pappa, D.; Stergioulas, L.K. Harnessing social media data for pharmacovigilance: A review of current state of the art, challenges and future directions. Int. J. Data Sci. Anal. 2019, 8, 113–135.
  16. Wang, J.; Dong, Y. Measurement of text similarity: A survey. Information 2020, 11, 421.
  17. Sarker, A.; Ginn, R.; Nikfarjam, A.; O’Connor, K.; Smith, K.; Jayaraman, S.; Upadhaya, T.; Gonzalez, G. Utilizing social media data for pharmacovigilance: A review. J. Biomed. Inform. 2015, 54, 202–212.
  18. O’Boyle, N.M.; Sayle, R.A. Comparing structural fingerprints using a literature-based similarity benchmark. J. Cheminform. 2016, 8, 36.
  19. Vilar, S.; Hripcsak, G.J. Leveraging 3D chemical similarity, target and phenotypic data in the identification of drug-protein and drug-adverse effect associations. J. Cheminform. 2016, 8, 35.
  20. Wang, W.; Yang, S.; Zhang, X.; Li, J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 2014, 30, 2923–2930.
  21. Tatonetti, N.P.; Ye, P.P.; Daneshjou, R.; Altman, R.B. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 2012, 4, 125ra31.
  22. Zeng, X.; Jia, Z.; He, Z.; Chen, W.; Lu, X.; Duan, H.; Li, H. Measure clinical drug-drug similarity using Electronic Medical Records. Int. J. Med Inform. 2019, 124, 97–103.
  23. Jiang, H.J.; Huang, Y.A.; You, Z.H. Predicting Drug-Disease Associations via Using Gaussian Interaction Profile and Kernel-Based Autoencoder. BioMed Res. Int. 2019, 2019, 2426958.
  24. Liu, H.; Song, Y.; Guan, J.; Luo, L.; Zhuang, Z. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinform. 2016, 17, 269–277.
  25. Steinbeck, C.; Han, Y.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An open-source Java library for chemo-and bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493–500.
  26. Hattori, M.; Tanaka, N.; Kanehisa, M.; Goto, S. SIMCOMP/SUBCOMP: Chemical structure search servers for network analyses. Nucleic Acids Res. 2010, 38, W652–W656.
  27. Smith, T.F.; Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 1981, 147, 195–197.
  28. Yu, G.; Li, F.; Qin, Y.; Bo, X.; Wu, Y.; Wang, S. GOSemSim: An R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 2010, 26, 976–978.
  29. Xue, H.; Li, J.; Xie, H.; Wang, Y. Review of drug repositioning approaches and resources. Int. J. Biol. Sci. 2018, 14, 1232–1244.
  30. Hopkins, A.L. Network pharmacology: The next paradigm in drug discovery. Nat. Chem. Biol. 2008, 4, 682–690.
  31. Zhang, W.; Yue, X.; Lin, W.; Wu, W.; Liu, R.; Huang, F.; Liu, F. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC Bioinform. 2018, 19, 233.
  32. Shi, J.Y.; Yiu, S.M.; Li, Y.; Leung, H.C.; Chin, F.Y. Predicting drug–target interaction for new drugs using enhanced similarity measures and super-target clustering. Methods 2015, 83, 98–104.
  33. Nzali, M.D.; Bringay, S.; Lavergne, C.; Mollevi, C.; Opitz, T. What patients can tell us: Topic analysis for social media on breast cancer. JMIR Med. Inform. 2017, 5, e77–e79.
  34. Lin, D.; Pantel, P. Discovery of inference rules for question-answering. Nat. Lang. Eng. 2001, 7, 343–360.
  35. Attali, Y.; Burstein, J. Automated essay scoring with e-rater® V. 2. J. Technol. Learn. Assess. 2006, 4. Available online: https://ejournals.bc.edu/index.php/jtla/article/view/1650 (accessed on 2 January 2023).
  36. Dolan, W.; Quirk, C.; Brockett, C.; Dolan, B. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland, 23–27 August 2004.
  37. Gomaa, W.H.; Fahmy, A.A. A survey of text similarity approaches. Int. J. Comput. Appl. 2013, 68, 13–18.
  38. Lardon, J.; Abdellaoui, R.; Bellet, F.; Asfari, H.; Souvignet, J.; Texier, N.; Jaulent, M.-C.; Beyens, M.-N.; Burgun, A.; Bousquet, C. Adverse drug reaction identification and extraction in social media: A scoping review. J. Med. Internet Res. 2015, 17, e171.
  39. Chen, Z.; Brodie, M.J.; Liew, D.; Kwan, P. Treatment Outcomes in Patients with Newly Diagnosed Epilepsy Treated with Established and New Antiepileptic Drugs: A 30-Year Longitudinal Cohort Study. JAMA Neurol. 2018, 75, 279–286.
  40. Kuzmanova, R.; Stefanova, I. Basic mechanisms of action of the antiepileptic drugs. Acta Medica Bulgarica 2017, 44, 52–58.
  41. Hand, D.; Mannila, H.; Smyth, P. Principles of Data Mining; MIT Press: Cambridge, MA, USA, 2001.
  42. Xu, R.; Wunsch, D. Survey of Clustering Algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678.
  43. Aggarwal, C.C. Data Mining: The Textbook; Springer: Berlin/Heidelberg, Germany, 2015.
  44. Sedgwick, P. Pearson’s correlation coefficient. BMJ 2012, 345, e4483.
  45. Azad, A.K.M.; Dinarvand, M.; Nematollahi, A.; Swift, J.; Lutze-Mann, L.; Vafaee, F. A comprehensive integrated drug similarity resource for in-silico drug repositioning and beyond. Briefings Bioinform. 2021, 22, bbaa126.
Figure 1. Drug–Drug similarity computation from social media.
Figure 2. Snapshot of patients’ reviews extracted from Askapatient.com (accessed on 2 January 2023) for Lamictal (Lamotrigine).
Figure 3. AEDs Drug-Drug Similarity-based correlations from DrugSimDB.
Table 1. List of AEDs.
| Number | Generic Name | Number of Reviews |
| --- | --- | --- |
| 1 | Carbamazepine | 283 |
| 2 | Oxcarbazepine | 357 |
| 3 | Gabapentin | 914 |
| 4 | Pregabalin | 1392 |
| 5 | Acetazolamide | 155 |
| 6 | Lamotrigine | 1845 |
| 7 | Levetiracetam | 190 |
| 8 | Topiramate | 1764 |
| 9 | Phenytoin | 183 |
| 10 | Diazepam | 393 |
| 11 | Clonazepam | 324 |
| 12 | Klonopin | 217 |
| 13 | Divalproex | 783 |
| 14 | Divalproex-ER | 566 |
| Total | | 9366 |
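Table 2. AEDs Drug-Drug Similarity using CS Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 1.00 | 0.65 | 0.63 | 0.58 | 0.45 | 0.62 | 0.54 | 0.58 | 0.52 | 0.46 | 0.54 | 0.51 | 0.56 | 0.56 |
| Oxcarbazepine | 0.65 | 1.00 | 0.60 | 0.55 | 0.47 | 0.69 | 0.56 | 0.59 | 0.49 | 0.47 | 0.57 | 0.53 | 0.61 | 0.62 |
| Gabapentin | 0.63 | 0.60 | 1.00 | 0.75 | 0.46 | 0.58 | 0.47 | 0.56 | 0.42 | 0.51 | 0.56 | 0.54 | 0.54 | 0.54 |
| Pregabalin | 0.58 | 0.55 | 0.75 | 1.00 | 0.44 | 0.53 | 0.43 | 0.53 | 0.38 | 0.45 | 0.49 | 0.47 | 0.50 | 0.51 |
| Acetazolamide | 0.45 | 0.47 | 0.46 | 0.44 | 1.00 | 0.45 | 0.38 | 0.58 | 0.36 | 0.34 | 0.40 | 0.37 | 0.42 | 0.42 |
| Lamotrigine | 0.62 | 0.69 | 0.58 | 0.53 | 0.45 | 1.00 | 0.55 | 0.60 | 0.48 | 0.48 | 0.57 | 0.55 | 0.61 | 0.62 |
| Levetiracetam | 0.54 | 0.56 | 0.47 | 0.43 | 0.38 | 0.55 | 1.00 | 0.47 | 0.48 | 0.41 | 0.48 | 0.46 | 0.49 | 0.49 |
| Topiramate | 0.58 | 0.59 | 0.56 | 0.53 | 0.58 | 0.60 | 0.47 | 1.00 | 0.43 | 0.42 | 0.50 | 0.47 | 0.59 | 0.57 |
| Phenytoin | 0.52 | 0.49 | 0.42 | 0.38 | 0.36 | 0.48 | 0.48 | 0.43 | 1.00 | 0.37 | 0.43 | 0.41 | 0.43 | 0.43 |
| Diazepam | 0.46 | 0.47 | 0.51 | 0.45 | 0.34 | 0.48 | 0.41 | 0.42 | 0.37 | 1.00 | 0.64 | 0.65 | 0.41 | 0.43 |
| Clonazepam | 0.54 | 0.57 | 0.56 | 0.49 | 0.40 | 0.57 | 0.48 | 0.50 | 0.43 | 0.64 | 1.00 | 0.76 | 0.49 | 0.51 |
| Klonopin | 0.51 | 0.53 | 0.54 | 0.47 | 0.37 | 0.55 | 0.46 | 0.47 | 0.41 | 0.65 | 0.76 | 1.00 | 0.47 | 0.49 |
| Divalproex | 0.56 | 0.61 | 0.54 | 0.50 | 0.42 | 0.61 | 0.49 | 0.59 | 0.43 | 0.41 | 0.49 | 0.47 | 1.00 | 0.81 |
| Divalproex-ER | 0.56 | 0.62 | 0.54 | 0.51 | 0.42 | 0.62 | 0.49 | 0.57 | 0.43 | 0.43 | 0.51 | 0.49 | 0.81 | 1.00 |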
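Table 3. AEDs Drug-Drug Similarity using ED Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 0.00 | 0.36 | 0.61 | 0.97 | 0.55 | 0.88 | 0.46 | 1.16 | 0.48 | 0.59 | 0.52 | 0.83 | 0.57 | 0.43 |
| Oxcarbazepine | 0.36 | 0.00 | 0.62 | 0.98 | 0.55 | 0.81 | 0.45 | 1.14 | 0.51 | 0.60 | 0.50 | 0.81 | 0.52 | 0.39 |
| Gabapentin | 0.61 | 0.62 | 0.00 | 0.59 | 0.77 | 0.82 | 0.76 | 1.05 | 0.82 | 0.72 | 0.66 | 0.79 | 0.70 | 0.69 |
| Pregabalin | 0.97 | 0.98 | 0.59 | 0.00 | 1.10 | 1.00 | 1.12 | 1.11 | 1.16 | 1.07 | 1.02 | 1.05 | 0.98 | 1.04 |
| Acetazolamide | 0.55 | 0.55 | 0.77 | 1.10 | 0.00 | 1.04 | 0.61 | 1.16 | 0.63 | 0.72 | 0.67 | 0.98 | 0.71 | 0.57 |
| Lamotrigine | 0.88 | 0.81 | 0.82 | 1.00 | 1.04 | 0.00 | 0.95 | 0.94 | 1.02 | 0.99 | 0.88 | 0.89 | 0.80 | 0.90 |
| Levetiracetam | 0.46 | 0.45 | 0.76 | 1.12 | 0.61 | 0.95 | 0.00 | 1.26 | 0.50 | 0.65 | 0.58 | 0.89 | 0.65 | 0.49 |
| Topiramate | 1.16 | 1.14 | 1.05 | 1.11 | 1.16 | 0.94 | 1.26 | 0.00 | 1.30 | 1.29 | 1.19 | 1.19 | 1.04 | 1.18 |
| Phenytoin | 0.48 | 0.51 | 0.82 | 1.16 | 0.63 | 1.02 | 0.50 | 1.30 | 0.00 | 0.68 | 0.63 | 0.94 | 0.71 | 0.55 |
| Diazepam | 0.59 | 0.60 | 0.72 | 1.07 | 0.72 | 0.99 | 0.65 | 1.29 | 0.68 | 0.00 | 0.44 | 0.65 | 0.76 | 0.63 |
| Clonazepam | 0.52 | 0.50 | 0.66 | 1.02 | 0.67 | 0.88 | 0.58 | 1.19 | 0.63 | 0.44 | 0.00 | 0.52 | 0.66 | 0.55 |
| Klonopin | 0.83 | 0.81 | 0.79 | 1.05 | 0.98 | 0.89 | 0.89 | 1.19 | 0.94 | 0.65 | 0.52 | 0.00 | 0.88 | 0.86 |
| Divalproex | 0.57 | 0.52 | 0.70 | 0.98 | 0.71 | 0.80 | 0.65 | 1.04 | 0.71 | 0.76 | 0.66 | 0.88 | 0.00 | 0.35 |
| Divalproex-ER | 0.43 | 0.39 | 0.69 | 1.04 | 0.57 | 0.90 | 0.49 | 1.18 | 0.55 | 0.63 | 0.55 | 0.86 | 0.35 | 0.00 |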
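Table 4. AEDs Drug-Drug Similarity using MD Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 0.00 | 7.06 | 11.47 | 16.41 | 9.40 | 19.22 | 7.38 | 21.86 | 7.72 | 9.99 | 9.30 | 15.27 | 9.36 | 7.34 |
| Oxcarbazepine | 7.06 | 0.00 | 10.51 | 15.60 | 9.75 | 17.65 | 7.46 | 21.09 | 8.89 | 10.38 | 9.18 | 14.79 | 8.72 | 7.19 |
| Gabapentin | 11.47 | 10.51 | 0.00 | 9.43 | 14.39 | 15.06 | 12.64 | 17.15 | 13.70 | 12.32 | 10.81 | 12.86 | 10.74 | 12.30 |
| Pregabalin | 16.41 | 15.60 | 9.43 | 0.00 | 18.95 | 14.79 | 17.75 | 15.74 | 18.75 | 16.92 | 15.33 | 15.01 | 14.46 | 17.17 |
| Acetazolamide | 9.40 | 9.75 | 14.39 | 18.95 | 0.00 | 23.12 | 9.42 | 22.97 | 9.98 | 12.04 | 11.96 | 18.56 | 12.65 | 8.95 |
| Lamotrigine | 19.22 | 17.65 | 15.06 | 14.79 | 23.12 | 0.00 | 20.29 | 14.70 | 21.51 | 20.37 | 18.59 | 16.78 | 16.19 | 20.03 |
| Levetiracetam | 7.38 | 7.46 | 12.64 | 17.75 | 9.42 | 20.29 | 0.00 | 23.17 | 7.54 | 10.10 | 9.37 | 15.62 | 9.96 | 7.16 |
| Topiramate | 21.86 | 21.09 | 17.15 | 15.74 | 22.97 | 14.70 | 23.17 | 0.00 | 24.34 | 23.18 | 21.39 | 19.54 | 18.85 | 22.79 |
| Phenytoin | 7.72 | 8.89 | 13.70 | 18.75 | 9.98 | 21.51 | 7.54 | 24.34 | 0.00 | 10.82 | 10.52 | 16.95 | 11.41 | 8.29 |
| Diazepam | 9.99 | 10.38 | 12.32 | 16.92 | 12.04 | 20.37 | 10.10 | 23.18 | 10.82 | 0.00 | 6.89 | 11.64 | 11.74 | 9.98 |
| Clonazepam | 9.30 | 9.18 | 10.81 | 15.33 | 11.96 | 18.59 | 9.37 | 21.39 | 10.52 | 6.89 | 0.00 | 10.15 | 10.13 | 9.30 |
| Klonopin | 15.27 | 14.79 | 12.86 | 15.01 | 18.56 | 16.78 | 15.62 | 19.54 | 16.95 | 11.64 | 10.15 | 0.00 | 14.29 | 15.98 |
| Divalproex | 9.36 | 8.72 | 10.74 | 14.46 | 12.65 | 16.19 | 9.96 | 18.85 | 11.41 | 11.74 | 10.13 | 14.29 | 0.00 | 7.86 |
| Divalproex-ER | 7.34 | 7.19 | 12.30 | 17.17 | 8.95 | 20.03 | 7.16 | 22.79 | 8.29 | 9.98 | 9.30 | 15.98 | 7.86 | 0.00 |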
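Table 5. AEDs Drug-Drug Similarity using JC Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 1.00 | 0.51 | 0.37 | 0.29 | 0.40 | 0.24 | 0.47 | 0.25 | 0.47 | 0.41 | 0.44 | 0.32 | 0.44 | 0.45 |
| Oxcarbazepine | 0.51 | 1.00 | 0.40 | 0.31 | 0.39 | 0.27 | 0.48 | 0.27 | 0.46 | 0.43 | 0.47 | 0.34 | 0.46 | 0.46 |
| Gabapentin | 0.37 | 0.40 | 1.00 | 0.56 | 0.25 | 0.46 | 0.30 | 0.48 | 0.31 | 0.40 | 0.42 | 0.51 | 0.49 | 0.27 |
| Pregabalin | 0.29 | 0.31 | 0.56 | 1.00 | 0.20 | 0.52 | 0.23 | 0.53 | 0.24 | 0.33 | 0.34 | 0.51 | 0.42 | 0.20 |
| Acetazolamide | 0.40 | 0.39 | 0.25 | 0.20 | 1.00 | 0.16 | 0.40 | 0.17 | 0.38 | 0.32 | 0.33 | 0.21 | 0.30 | 0.42 |
| Lamotrigine | 0.24 | 0.27 | 0.46 | 0.52 | 0.16 | 1.00 | 0.19 | 0.58 | 0.21 | 0.27 | 0.28 | 0.48 | 0.39 | 0.17 |
| Levetiracetam | 0.47 | 0.48 | 0.30 | 0.23 | 0.40 | 0.19 | 1.00 | 0.20 | 0.47 | 0.39 | 0.42 | 0.27 | 0.38 | 0.48 |
| Topiramate | 0.25 | 0.27 | 0.48 | 0.53 | 0.17 | 0.58 | 0.20 | 1.00 | 0.21 | 0.27 | 0.28 | 0.48 | 0.39 | 0.17 |
| Phenytoin | 0.47 | 0.46 | 0.31 | 0.24 | 0.38 | 0.21 | 0.47 | 0.21 | 1.00 | 0.38 | 0.39 | 0.27 | 0.38 | 0.43 |
| Diazepam | 0.41 | 0.43 | 0.40 | 0.33 | 0.32 | 0.27 | 0.39 | 0.27 | 0.38 | 1.00 | 0.53 | 0.41 | 0.41 | 0.36 |
| Clonazepam | 0.44 | 0.47 | 0.42 | 0.34 | 0.33 | 0.28 | 0.42 | 0.28 | 0.39 | 0.53 | 1.00 | 0.43 | 0.46 | 0.40 |
| Klonopin | 0.32 | 0.34 | 0.51 | 0.51 | 0.21 | 0.48 | 0.27 | 0.48 | 0.27 | 0.41 | 0.43 | 1.00 | 0.45 | 0.23 |
| Divalproex | 0.44 | 0.46 | 0.49 | 0.42 | 0.30 | 0.39 | 0.38 | 0.39 | 0.38 | 0.41 | 0.46 | 0.45 | 1.00 | 0.36 |
| Divalproex-ER | 0.45 | 0.46 | 0.27 | 0.20 | 0.42 | 0.17 | 0.48 | 0.17 | 0.43 | 0.36 | 0.40 | 0.23 | 0.36 | 1.00 |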
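Table 6. AEDs Drug-Drug Similarity-based Rankings using CS Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 1 | 2 | 3 | 5 | 14 | 4 | 10 | 6 | 11 | 13 | 9 | 12 | 8 | 7 |
| Oxcarbazepine | 3 | 1 | 6 | 10 | 14 | 2 | 9 | 7 | 12 | 13 | 8 | 11 | 5 | 4 |
| Gabapentin | 3 | 4 | 1 | 2 | 13 | 5 | 12 | 7 | 14 | 11 | 6 | 9 | 10 | 8 |
| Pregabalin | 3 | 4 | 2 | 1 | 12 | 6 | 13 | 5 | 14 | 11 | 9 | 10 | 8 | 7 |
| Acetazolamide | 5 | 3 | 4 | 7 | 1 | 6 | 11 | 2 | 13 | 14 | 10 | 12 | 8 | 9 |
| Lamotrigine | 3 | 2 | 7 | 11 | 14 | 1 | 9 | 6 | 13 | 12 | 8 | 10 | 5 | 4 |
| Levetiracetam | 4 | 2 | 10 | 12 | 14 | 3 | 1 | 9 | 7 | 13 | 8 | 11 | 6 | 5 |
| Topiramate | 5 | 4 | 8 | 9 | 6 | 2 | 11 | 1 | 13 | 14 | 10 | 12 | 3 | 7 |
| Phenytoin | 2 | 3 | 10 | 12 | 14 | 5 | 4 | 6 | 1 | 13 | 7 | 11 | 8 | 9 |
| Diazepam | 7 | 6 | 4 | 8 | 14 | 5 | 12 | 10 | 13 | 1 | 3 | 2 | 11 | 9 |
| Clonazepam | 7 | 5 | 6 | 10 | 14 | 4 | 12 | 9 | 13 | 3 | 1 | 2 | 11 | 8 |
| Klonopin | 7 | 6 | 5 | 9 | 14 | 4 | 12 | 10 | 13 | 3 | 2 | 1 | 11 | 8 |
| Divalproex | 6 | 3 | 7 | 8 | 13 | 4 | 10 | 5 | 12 | 14 | 9 | 11 | 1 | 2 |
| Divalproex-ER | 6 | 3 | 7 | 9 | 14 | 4 | 10 | 5 | 13 | 12 | 8 | 11 | 2 | 1 |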
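Table 7. AEDs Drug-Drug Similarity-based Rankings using ED Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 1 | 2 | 10 | 13 | 7 | 12 | 4 | 14 | 5 | 9 | 6 | 11 | 8 | 3 |
| Oxcarbazepine | 2 | 1 | 10 | 13 | 8 | 12 | 4 | 14 | 6 | 9 | 5 | 11 | 7 | 3 |
| Gabapentin | 3 | 4 | 1 | 2 | 10 | 13 | 9 | 14 | 12 | 8 | 5 | 11 | 7 | 6 |
| Pregabalin | 3 | 5 | 2 | 1 | 11 | 6 | 13 | 12 | 14 | 10 | 7 | 9 | 4 | 8 |
| Acetazolamide | 2 | 3 | 10 | 13 | 1 | 12 | 5 | 14 | 6 | 9 | 7 | 11 | 8 | 4 |
| Lamotrigine | 6 | 3 | 4 | 12 | 14 | 1 | 10 | 9 | 13 | 11 | 5 | 7 | 2 | 8 |
| Levetiracetam | 3 | 2 | 10 | 13 | 7 | 12 | 1 | 14 | 5 | 8 | 6 | 11 | 9 | 4 |
| Topiramate | 8 | 6 | 4 | 5 | 7 | 2 | 12 | 1 | 14 | 13 | 11 | 10 | 3 | 9 |
| Phenytoin | 2 | 4 | 10 | 13 | 6 | 12 | 3 | 14 | 1 | 8 | 7 | 11 | 9 | 5 |
| Diazepam | 3 | 4 | 10 | 13 | 9 | 12 | 6 | 14 | 8 | 1 | 2 | 7 | 11 | 5 |
| Clonazepam | 4 | 3 | 9 | 13 | 11 | 12 | 7 | 14 | 8 | 2 | 1 | 5 | 10 | 6 |
| Klonopin | 6 | 5 | 4 | 13 | 12 | 10 | 9 | 14 | 11 | 3 | 2 | 1 | 8 | 7 |
| Divalproex | 4 | 3 | 7 | 13 | 9 | 11 | 5 | 14 | 8 | 10 | 6 | 12 | 1 | 2 |
| Divalproex-ER | 4 | 3 | 10 | 13 | 8 | 12 | 5 | 14 | 6 | 9 | 7 | 11 | 2 | 1 |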
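Table 8. Drug-Drug Similarity-based Rankings of AEDs using MD Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 1 | 2 | 10 | 12 | 8 | 13 | 4 | 14 | 5 | 9 | 6 | 11 | 7 | 3 |
| Oxcarbazepine | 2 | 1 | 10 | 12 | 8 | 13 | 4 | 14 | 6 | 9 | 7 | 11 | 5 | 3 |
| Gabapentin | 6 | 3 | 1 | 2 | 12 | 13 | 9 | 14 | 11 | 8 | 5 | 10 | 4 | 7 |
| Pregabalin | 9 | 7 | 2 | 1 | 14 | 4 | 12 | 8 | 13 | 10 | 6 | 5 | 3 | 11 |
| Acetazolamide | 3 | 5 | 10 | 12 | 1 | 14 | 4 | 13 | 6 | 8 | 7 | 11 | 9 | 2 |
| Lamotrigine | 9 | 7 | 4 | 3 | 14 | 1 | 11 | 2 | 13 | 12 | 8 | 6 | 5 | 10 |
| Levetiracetam | 3 | 4 | 10 | 12 | 7 | 13 | 1 | 14 | 5 | 9 | 6 | 11 | 8 | 2 |
| Topiramate | 9 | 7 | 4 | 3 | 11 | 2 | 12 | 1 | 14 | 13 | 8 | 6 | 5 | 10 |
| Phenytoin | 3 | 5 | 10 | 12 | 6 | 13 | 2 | 14 | 1 | 8 | 7 | 11 | 9 | 4 |
| Diazepam | 4 | 6 | 11 | 12 | 10 | 13 | 5 | 14 | 7 | 1 | 2 | 8 | 9 | 3 |
| Clonazepam | 4 | 3 | 10 | 12 | 11 | 13 | 6 | 14 | 9 | 2 | 1 | 8 | 7 | 5 |
| Klonopin | 8 | 6 | 4 | 7 | 13 | 11 | 9 | 14 | 12 | 3 | 2 | 1 | 5 | 10 |
| Divalproex | 4 | 3 | 7 | 12 | 10 | 13 | 5 | 14 | 8 | 9 | 6 | 11 | 1 | 2 |
| Divalproex-ER | 4 | 3 | 10 | 12 | 7 | 13 | 2 | 14 | 6 | 9 | 8 | 11 | 5 | 1 |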
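Table 9. AEDs Drug-Drug Similarity-based Rankings using JC Measure.
| | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | 1 | 2 | 10 | 12 | 9 | 14 | 4 | 13 | 3 | 8 | 7 | 11 | 6 | 5 |
| Oxcarbazepine | 2 | 1 | 9 | 12 | 10 | 14 | 3 | 13 | 7 | 8 | 4 | 11 | 6 | 5 |
| Gabapentin | 10 | 8 | 1 | 2 | 14 | 6 | 12 | 5 | 11 | 9 | 7 | 3 | 4 | 13 |
| Pregabalin | 10 | 9 | 2 | 1 | 14 | 4 | 12 | 3 | 11 | 8 | 7 | 5 | 6 | 13 |
| Acetazolamide | 4 | 5 | 10 | 12 | 1 | 14 | 3 | 13 | 6 | 8 | 7 | 11 | 9 | 2 |
| Lamotrigine | 10 | 8 | 5 | 3 | 14 | 1 | 12 | 2 | 11 | 9 | 7 | 4 | 6 | 13 |
| Levetiracetam | 5 | 3 | 10 | 12 | 7 | 14 | 1 | 13 | 4 | 8 | 6 | 11 | 9 | 2 |
| Topiramate | 10 | 9 | 5 | 3 | 14 | 2 | 12 | 1 | 11 | 8 | 7 | 4 | 6 | 13 |
| Phenytoin | 2 | 4 | 10 | 12 | 8 | 14 | 3 | 13 | 1 | 9 | 6 | 11 | 7 | 5 |
| Diazepam | 6 | 3 | 7 | 11 | 12 | 14 | 8 | 13 | 9 | 1 | 2 | 5 | 4 | 10 |
| Clonazepam | 5 | 3 | 7 | 11 | 12 | 14 | 8 | 13 | 10 | 2 | 1 | 6 | 4 | 9 |
| Klonopin | 10 | 9 | 2 | 3 | 14 | 4 | 12 | 5 | 11 | 8 | 7 | 1 | 6 | 13 |
| Divalproex | 6 | 3 | 2 | 7 | 14 | 10 | 12 | 9 | 11 | 8 | 4 | 5 | 1 | 13 |
| Divalproex-ER | 4 | 3 | 10 | 12 | 6 | 14 | 2 | 13 | 5 | 8 | 7 | 11 | 9 | 1 |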
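Table 10. AEDs Drug-Drug Similarity-based Correlations.
| Drug | Measure | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Klonopin | Divalproex | Divalproex-ER |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Carbamazepine | CS | 1.00 | 0.98 | 0.31 | −0.48 | 0.84 | −0.61 | 0.96 | −0.64 | 0.90 | 0.72 | 0.72 | 0.05 | 0.85 | 0.94 |
| | ED | 1.00 | 0.99 | 0.33 | −0.07 | 0.91 | 0.03 | 0.96 | −0.50 | 0.93 | 0.71 | 0.63 | 0.28 | 0.81 | 0.88 |
| | MD | 1.00 | 0.98 | 0.31 | −0.48 | 0.84 | −0.61 | 0.96 | −0.64 | 0.90 | 0.72 | 0.72 | 0.05 | 0.85 | 0.94 |
| | JC | 1.00 | 0.93 | −0.56 | −0.69 | 0.76 | −0.71 | 0.89 | −0.74 | 0.97 | 0.48 | 0.52 | −0.66 | 0.05 | 0.89 |
| Oxcarbazepine | CS | 0.85 | 1.00 | 0.62 | 0.62 | 0.39 | 0.98 | 0.74 | 0.69 | 0.53 | 0.15 | 0.27 | 0.22 | 0.89 | 0.90 |
| | ED | 0.99 | 1.00 | 0.37 | −0.02 | 0.87 | 0.13 | 0.96 | −0.46 | 0.89 | 0.72 | 0.67 | 0.34 | 0.86 | 0.91 |
| | MD | 0.98 | 1.00 | 0.36 | −0.41 | 0.81 | −0.56 | 0.93 | −0.58 | 0.85 | 0.67 | 0.71 | 0.07 | 0.90 | 0.96 |
| | JC | 0.93 | 1.00 | −0.45 | −0.58 | 0.71 | −0.62 | 0.87 | −0.64 | 0.88 | 0.61 | 0.68 | −0.56 | 0.20 | 0.86 |
| Gabapentin | CS | 0.86 | 0.62 | 1.00 | 0.95 | 0.45 | 0.57 | 0.12 | 0.38 | 0.05 | 0.48 | 0.45 | 0.47 | 0.49 | 0.50 |
| | ED | 0.33 | 0.37 | 1.00 | 0.78 | 0.24 | 0.21 | 0.26 | 0.02 | 0.15 | 0.30 | 0.33 | 0.36 | 0.44 | 0.31 |
| | MD | 0.31 | 0.36 | 1.00 | 0.54 | 0.07 | 0.15 | 0.21 | 0.11 | 0.08 | 0.23 | 0.41 | 0.61 | 0.55 | 0.27 |
| | JC | −0.56 | −0.45 | 1.00 | 0.96 | −0.85 | 0.88 | −0.74 | 0.88 | −0.64 | 0.09 | 0.06 | 0.96 | 0.73 | −0.77 |
| Pregabalin | CS | 0.87 | 0.62 | 0.95 | 1.00 | 0.56 | 0.57 | 0.06 | 0.53 | 0.00 | 0.30 | 0.25 | 0.27 | 0.59 | 0.58 |
| | ED | −0.07 | −0.02 | 0.78 | 1.00 | −0.14 | 0.55 | −0.20 | 0.50 | −0.29 | −0.11 | −0.02 | 0.21 | 0.20 | 0.02 |
| | MD | −0.48 | −0.41 | 0.54 | 1.00 | −0.74 | 0.85 | −0.59 | 0.80 | −0.71 | −0.43 | −0.22 | 0.51 | −0.13 | −0.51 |
| | JC | −0.69 | −0.58 | 0.96 | 1.00 | −0.93 | 0.94 | −0.83 | 0.95 | −0.75 | −0.07 | −0.10 | 0.95 | 0.57 | −0.86 |
| Acetazolamide | CS | 0.51 | 0.39 | 0.45 | 0.56 | 1.00 | 0.36 | 0.01 | 0.80 | −0.01 | −0.27 | −0.22 | −0.26 | 0.41 | 0.33 |
| | ED | 0.91 | 0.87 | 0.24 | −0.14 | 1.00 | −0.14 | 0.87 | −0.42 | 0.87 | 0.59 | 0.47 | 0.14 | 0.70 | 0.78 |
| | MD | 0.84 | 0.81 | 0.07 | −0.74 | 1.00 | −0.84 | 0.89 | −0.77 | 0.87 | 0.62 | 0.53 | −0.17 | 0.63 | 0.85 |
| | JC | 0.76 | 0.71 | −0.85 | −0.93 | 1.00 | −0.97 | 0.89 | −0.98 | 0.79 | 0.18 | 0.24 | −0.90 | −0.39 | 0.93 |
| Lamotrigine | CS | 0.79 | 0.98 | 0.57 | 0.57 | 0.36 | 1.00 | 0.72 | 0.71 | 0.49 | 0.20 | 0.34 | 0.29 | 0.87 | 0.90 |
| | ED | 0.03 | 0.13 | 0.21 | 0.55 | −0.14 | 1.00 | −0.05 | 0.49 | −0.20 | −0.01 | 0.15 | 0.42 | 0.39 | 0.19 |
| | MD | −0.61 | −0.56 | 0.15 | 0.85 | −0.84 | 1.00 | −0.70 | 0.97 | −0.82 | −0.67 | −0.49 | 0.10 | −0.35 | −0.63 |
| | JC | −0.71 | −0.62 | 0.88 | 0.94 | −0.97 | 1.00 | −0.86 | 0.99 | −0.78 | −0.16 | −0.18 | 0.92 | 0.49 | −0.89 |
| Levetiracetam | CS | 0.51 | 0.74 | 0.12 | 0.06 | 0.01 | 0.72 | 1.00 | 0.34 | 0.82 | −0.16 | −0.01 | −0.07 | 0.58 | 0.58 |
| | ED | 0.96 | 0.96 | 0.26 | −0.20 | 0.87 | −0.05 | 1.00 | −0.59 | 0.94 | 0.72 | 0.64 | 0.27 | 0.75 | 0.82 |
| | MD | 0.96 | 0.93 | 0.21 | −0.59 | 0.89 | −0.70 | 1.00 | −0.73 | 0.94 | 0.72 | 0.67 | −0.03 | 0.79 | 0.96 |
| | JC | 0.89 | 0.87 | −0.74 | −0.83 | 0.89 | −0.86 | 1.00 | −0.88 | 0.92 | 0.34 | 0.38 | −0.80 | −0.24 | 0.99 |
| Topiramate | CS | 0.59 | 0.69 | 0.38 | 0.53 | 0.80 | 0.71 | 0.34 | 1.00 | 0.22 | −0.24 | −0.12 | −0.18 | 0.77 | 0.71 |
| | ED | −0.50 | −0.46 | 0.02 | 0.50 | −0.42 | 0.49 | −0.59 | 1.00 | −0.66 | −0.73 | −0.68 | −0.40 | −0.16 | −0.33 |
| | MD | −0.64 | −0.58 | 0.11 | 0.80 | −0.77 | 0.97 | −0.73 | 1.00 | −0.85 | −0.75 | −0.56 | 0.02 | −0.39 | −0.65 |
| | JC | −0.74 | −0.64 | 0.88 | 0.95 | −0.98 | 0.99 | −0.88 | 1.00 | −0.79 | −0.14 | −0.17 | 0.92 | 0.47 | −0.91 |
| Phenytoin | CS | 0.46 | 0.53 | 0.05 | 0.00 | −0.01 | 0.49 | 0.82 | 0.22 | 1.00 | −0.23 | −0.09 | −0.16 | 0.33 | 0.30 |
| | ED | 0.93 | 0.89 | 0.15 | −0.29 | 0.87 | −0.20 | 0.94 | −0.66 | 1.00 | 0.64 | 0.55 | 0.16 | 0.66 | 0.77 |
| | MD | 0.90 | 0.85 | 0.08 | −0.71 | 0.87 | −0.82 | 0.94 | −0.85 | 1.00 | 0.67 | 0.57 | −0.13 | 0.65 | 0.87 |
| | JC | 0.97 | 0.88 | −0.64 | −0.75 | 0.79 | −0.78 | 0.92 | −0.79 | 1.00 | 0.38 | 0.42 | −0.72 | −0.09 | 0.89 |
| Diazepam | CS | 0.16 | 0.15 | 0.48 | 0.30 | −0.27 | 0.20 | −0.16 | −0.24 | −0.23 | 1.00 | 0.96 | 0.98 | −0.07 | 0.07 |
| | ED | 0.71 | 0.72 | 0.30 | −0.11 | 0.59 | −0.01 | 0.72 | −0.73 | 0.64 | 1.00 | 0.96 | 0.71 | 0.46 | 0.54 |
| | MD | 0.72 | 0.67 | 0.23 | −0.43 | 0.62 | −0.67 | 0.72 | −0.75 | 0.67 | 1.00 | 0.94 | 0.42 | 0.60 | 0.65 |
| | JC | 0.48 | 0.61 | 0.09 | −0.07 | 0.18 | −0.16 | 0.34 | −0.14 | 0.38 | 1.00 | 0.99 | 0.05 | 0.63 | 0.30 |
| Clonazepam | CS | 0.19 | 0.27 | 0.45 | 0.25 | −0.22 | 0.34 | −0.01 | −0.12 | −0.09 | 0.96 | 1.00 | 0.99 | 0.05 | 0.19 |
| | ED | 0.63 | 0.67 | 0.33 | −0.02 | 0.47 | 0.15 | 0.64 | −0.68 | 0.55 | 0.96 | 1.00 | 0.84 | 0.45 | 0.50 |
| | MD | 0.72 | 0.71 | 0.41 | −0.22 | 0.53 | −0.49 | 0.67 | −0.56 | 0.57 | 0.94 | 1.00 | 0.58 | 0.69 | 0.63 |
| | JC | 0.52 | 0.68 | 0.06 | −0.10 | 0.24 | −0.18 | 0.38 | −0.17 | 0.42 | 0.99 | 1.00 | 0.01 | 0.64 | 0.35 |
| Klonopin | CS | 0.17 | 0.22 | 0.47 | 0.27 | −0.26 | 0.29 | −0.07 | −0.18 | −0.16 | 0.98 | 0.99 | 1.00 | 0.01 | 0.14 |
| | ED | 0.28 | 0.34 | 0.36 | 0.21 | 0.14 | 0.42 | 0.27 | −0.40 | 0.16 | 0.71 | 0.84 | 1.00 | 0.31 | 0.26 |
| | MD | 0.05 | 0.07 | 0.61 | 0.51 | −0.17 | 0.10 | −0.03 | 0.02 | −0.13 | 0.42 | 0.58 | 1.00 | 0.26 | −0.02 |
| | JC | −0.66 | −0.56 | 0.96 | 0.95 | −0.90 | 0.92 | −0.80 | 0.92 | −0.72 | 0.05 | 0.01 | 1.00 | 0.63 | −0.83 |
| Divalproex | CS | 0.71 | 0.89 | 0.49 | 0.59 | 0.41 | 0.87 | 0.58 | 0.77 | 0.33 | −0.07 | 0.05 | 0.01 | 1.00 | 0.98 |
| | ED | 0.81 | 0.86 | 0.44 | 0.20 | 0.70 | 0.39 | 0.75 | −0.16 | 0.66 | 0.46 | 0.45 | 0.31 | 1.00 | 0.96 |
| | MD | 0.85 | 0.90 | 0.55 | −0.13 | 0.63 | −0.35 | 0.79 | −0.39 | 0.65 | 0.60 | 0.69 | 0.26 | 1.00 | 0.89 |
| | JC | 0.05 | 0.20 | 0.73 | 0.57 | −0.39 | 0.49 | −0.24 | 0.47 | −0.09 | 0.63 | 0.64 | 0.63 | 1.00 | −0.27 |
| Divalproex-ER | CS | 0.70 | 0.90 | 0.50 | 0.58 | 0.33 | 0.90 | 0.58 | 0.71 | 0.30 | 0.07 | 0.19 | 0.14 | 0.98 | 1.00 |
| | ED | 0.88 | 0.91 | 0.31 | 0.02 | 0.78 | 0.19 | 0.82 | −0.33 | 0.77 | 0.54 | 0.50 | 0.26 | 0.96 | 1.00 |
| | MD | 0.94 | 0.96 | 0.27 | −0.51 | 0.85 | −0.63 | 0.96 | −0.65 | 0.87 | 0.65 | 0.63 | −0.02 | 0.89 | 1.00 |
| | JC | 0.89 | 0.86 | −0.77 | −0.86 | 0.93 | −0.89 | 0.99 | −0.91 | 0.89 | 0.30 | 0.35 | −0.83 | −0.27 | 1.00 |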
Table 11. Average AEDs Drug-Drug Similarity-based Correlations.
Table 11. Average AEDs Drug-Drug Similarity-based Correlations.
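| AVG | Carbamazepine | Oxcarbazepine | Gabapentin | Pregabalin | Acetazolamide | Lamotrigine | Levetiracetam | Topiramate | Phenytoin | Diazepam | Clonazepam | Divalproex | Divalproex-ER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Carbamazepine | 1.00 | 0.94 | 0.23 | −0.09 | 0.75 | −0.13 | 0.83 | −0.32 | 0.82 | 0.52 | 0.52 | 0.60 | 0.85 |
| Oxcarbazepine | 0.94 | 1.00 | 0.22 | −0.10 | 0.70 | −0.01 | 0.87 | −0.25 | 0.78 | 0.54 | 0.58 | 0.71 | 0.91 |
| Gabapentin | 0.23 | 0.22 | 1.00 | 0.81 | −0.02 | 0.45 | −0.04 | 0.35 | −0.09 | 0.27 | 0.31 | 0.55 | 0.08 |
| Pregabalin | −0.09 | −0.10 | 0.81 | 1.00 | −0.31 | 0.73 | −0.39 | 0.69 | −0.44 | −0.08 | −0.02 | 0.31 | −0.19 |
| Acetazolamide | 0.75 | 0.70 | −0.02 | −0.31 | 1.00 | −0.40 | 0.66 | −0.34 | 0.63 | 0.28 | 0.25 | 0.34 | 0.72 |
| Lamotrigine | −0.13 | −0.01 | 0.45 | 0.73 | −0.40 | 1.00 | −0.22 | 0.79 | −0.33 | −0.16 | −0.05 | 0.35 | −0.11 |
| Levetiracetam | 0.83 | 0.87 | −0.04 | −0.39 | 0.66 | −0.22 | 1.00 | −0.47 | 0.90 | 0.40 | 0.42 | 0.47 | 0.84 |
| Topiramate | −0.32 | −0.25 | 0.35 | 0.69 | −0.34 | 0.79 | −0.47 | 1.00 | −0.52 | −0.47 | −0.38 | 0.17 | −0.30 |
| Phenytoin | 0.82 | 0.78 | −0.09 | −0.44 | 0.63 | −0.33 | 0.90 | −0.52 | 1.00 | 0.37 | 0.36 | 0.39 | 0.71 |
| Diazepam | 0.52 | 0.54 | 0.27 | −0.08 | 0.28 | −0.16 | 0.40 | −0.47 | 0.37 | 1.00 | 0.96 | 0.41 | 0.39 |
| Clonazepam | 0.52 | 0.58 | 0.31 | −0.02 | 0.25 | −0.05 | 0.42 | −0.38 | 0.36 | 0.96 | 1.00 | 0.46 | 0.42 |
| Divalproex | 0.60 | 0.71 | 0.55 | 0.31 | 0.34 | 0.35 | 0.47 | 0.17 | 0.39 | 0.41 | 0.46 | 1.00 | 0.64 |
| Divalproex-ER | 0.85 | 0.91 | 0.08 | −0.19 | 0.72 | −0.11 | 0.84 | −0.30 | 0.71 | 0.39 | 0.42 | 0.64 | 1.00 |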
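
The averaged entries in Table 11 are consistent with a simple arithmetic mean of the four measure-specific correlations (CS, ED, MD, JC) reported for each drug pair. The sketch below checks this for three entries of the Diazepam row, using values copied from the per-measure rows above. The variable names and the unweighted averaging rule are assumptions made for illustration, not a description of the author's implementation.

```python
from statistics import mean

# Per-measure correlations for Diazepam against three other drugs,
# copied from the CS/ED/MD/JC rows of the per-measure table.
diazepam = {
    "Clonazepam":    {"CS": 0.96,  "ED": 0.96, "MD": 0.94, "JC": 0.99},
    "Divalproex":    {"CS": -0.07, "ED": 0.46, "MD": 0.60, "JC": 0.63},
    "Divalproex-ER": {"CS": 0.07,  "ED": 0.54, "MD": 0.65, "JC": 0.30},
}

# Averages reported in the Diazepam row of Table 11.
reported = {"Clonazepam": 0.96, "Divalproex": 0.41, "Divalproex-ER": 0.39}

# Assumption: each Table 11 entry is the unweighted mean of the four measures.
averaged = {drug: mean(vals.values()) for drug, vals in diazepam.items()}

for drug, value in averaged.items():
    print(f"{drug}: mean={value:.4f}, reported={reported[drug]:.2f}")
```

The recomputed means agree with the reported entries to within two-decimal rounding.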
Table 12. AEDs Drug-Drug Similarities from DrugSimDB.
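| Drug | CS & ED | ED & MD | CS & MD | CS & JC | ED & JC | MD & JC | AVG. Performance |
|---|---|---|---|---|---|---|---|
| Carbamazepine | 0.11 | 0.99 | 0.13 | 0.10 | 0.95 | 0.97 | 0.54 |
| Oxcarbazepine | 0.34 | 0.98 | 0.33 | 0.31 | 0.96 | 0.95 | 0.65 |
| Gabapentin | 0.64 | 0.94 | 0.58 | 0.60 | 0.24 | 0.40 | 0.57 |
| Pregabalin | 0.84 | 0.78 | 0.69 | 0.62 | 0.52 | 0.89 | 0.72 |
| Acetazolamide | 0.11 | 0.96 | 0.02 | −0.01 | 0.95 | 1.00 | 0.51 |
| Lamotrigine | 0.84 | 0.60 | 0.55 | 0.32 | 0.48 | 0.93 | 0.62 |
| Levetiracetam | 0.56 | 0.97 | 0.53 | 0.46 | 0.96 | 0.98 | 0.74 |
| Topiramate | 0.87 | 0.89 | 0.65 | 0.30 | 0.61 | 0.86 | 0.70 |
| Phenytoin | 0.51 | 0.99 | 0.45 | 0.56 | 0.96 | 0.96 | 0.74 |
| Diazepam | 0.40 | 0.96 | 0.27 | 0.53 | 0.74 | 0.71 | 0.60 |
| Clonazepam | 0.57 | 0.95 | 0.42 | 0.51 | 0.85 | 0.89 | 0.70 |
| Klonopin | 0.78 | 0.86 | 0.73 | 0.58 | 0.22 | 0.49 | 0.61 |
| Divalproex | 0.47 | 0.98 | 0.42 | 0.34 | 0.23 | 0.29 | 0.46 |
| Divalproex-ER | 0.35 | 0.95 | 0.20 | 0.02 | 0.85 | 0.95 | 0.55 |
| Average Performance | 0.53 | 0.91 | 0.43 | 0.37 | 0.68 | 0.80 | |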
Table 13. Agreement Analysis of Similarity Measures Performance.
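| Drug_1 | Drug_2 | Structure Similarity | Target Similarity | Pathway Similarity | GO_CC Similarity | GO_MF Similarity | GO_BP Similarity | Average |
|---|---|---|---|---|---|---|---|---|
| Clonazepam | Diazepam | 0.47 | 0.95 | 1 | 0.85 | 0.86 | 0.85 | 0.796 |
| Carbamazepine | Phenytoin | 0.42 | 0.65 | NA | 0.92 | 0.95 | 0.9 | 0.768 |
| Carbamazepine | Oxcarbazepine | 0.64 | 0.48 | NA | 0.77 | 0.8 | 0.78 | 0.694 |
| Oxcarbazepine | Phenytoin | 0.38 | 0.57 | NA | 0.79 | 0.79 | 0.83 | 0.672 |
| Diazepam | Topiramate | 0 | 0.86 | 1 | 0.84 | 0.8 | 0.72 | 0.644 |
| Diazepam | Lamotrigine | 0.22 | 0.78 | NA | 0.69 | 0.75 | 0.7 | 0.628 |
| Clonazepam | Lamotrigine | 0.2 | 0.74 | NA | 0.72 | 0.7 | 0.68 | 0.608 |
| Lamotrigine | Topiramate | 0 | 0.73 | NA | 0.65 | 0.82 | 0.75 | 0.59 |
| Clonazepam | Topiramate | 0 | 0.83 | 1 | 0.64 | 0.73 | 0.66 | 0.572 |
| Phenytoin | Valproic Acid | 0 | 0.57 | NA | 0.65 | 0.73 | 0.61 | 0.512 |
| Gabapentin | Pregabalin | 0.21 | 0.34 | 1 | 0.69 | 0.67 | 0.62 | 0.506 |
| Carbamazepine | Valproic Acid | 0.01 | 0.41 | NA | 0.74 | 0.72 | 0.6 | 0.496 |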
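
In the DrugSimDB rows above, the Average column is numerically consistent with the mean of five scores (Structure, Target, GO_CC, GO_MF, GO_BP), leaving out Pathway Similarity, which is NA for most pairs. The following sketch recomputes the average for three rows. The column grouping is inferred from the numbers and should be read as an assumption, not as DrugSimDB's documented formula; the function name is hypothetical.

```python
# Rows copied from the DrugSimDB table: (drug_1, drug_2, structure, target,
# pathway, go_cc, go_mf, go_bp, reported_average). None stands for "NA".
rows = [
    ("Clonazepam", "Diazepam", 0.47, 0.95, 1.0, 0.85, 0.86, 0.85, 0.796),
    ("Carbamazepine", "Phenytoin", 0.42, 0.65, None, 0.92, 0.95, 0.90, 0.768),
    ("Gabapentin", "Pregabalin", 0.21, 0.34, 1.0, 0.69, 0.67, 0.62, 0.506),
]


def average_without_pathway(structure, target, go_cc, go_mf, go_bp):
    """Unweighted mean of the five non-pathway scores (assumed rule)."""
    return (structure + target + go_cc + go_mf + go_bp) / 5


recomputed = {}
for d1, d2, structure, target, _pathway, go_cc, go_mf, go_bp, reported in rows:
    # The pathway score is deliberately ignored, whether it is 1.0 or NA.
    recomputed[(d1, d2)] = average_without_pathway(
        structure, target, go_cc, go_mf, go_bp
    )
    print(d1, d2, round(recomputed[(d1, d2)], 3), reported)
```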
Table 14. AEDs Drug-Drug Similarity Evaluation Results.
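| Threshold | 0.5 | 0.6 | 0.7 | 0.75 | 0.80 |
|---|---|---|---|---|---|
| Precision (P) | 0.29 | 0.35 | 0.44 | 0.54 | 0.40 |
| Recall (R) | 0.67 | 0.67 | 0.67 | 0.58 | 0.33 |
| F1 | 0.40 | 0.46 | 0.53 | 0.56 | 0.36 |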
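
The F1 row of Table 14 follows from the precision and recall rows through the standard harmonic-mean formula F1 = 2PR / (P + R). A minimal check is sketched below, using the rounded values printed in the table, so small rounding differences are expected. The helper name `f1_score` is chosen here for illustration.

```python
# Precision and recall per threshold, copied from Table 14.
thresholds = [0.5, 0.6, 0.7, 0.75, 0.80]
precision = [0.29, 0.35, 0.44, 0.54, 0.40]
recall = [0.67, 0.67, 0.67, 0.58, 0.33]
reported_f1 = [0.40, 0.46, 0.53, 0.56, 0.36]


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


computed_f1 = [f1_score(p, r) for p, r in zip(precision, recall)]

for t, f1, rep in zip(thresholds, computed_f1, reported_f1):
    print(f"threshold={t}: F1={f1:.3f} (reported {rep:.2f})")

# Threshold with the highest recomputed F1.
best_threshold = thresholds[computed_f1.index(max(computed_f1))]
```

On these values, F1 peaks at the 0.75 threshold, matching the highest entry in the table's F1 row.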
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Asiri, Y. Computing Drug-Drug Similarity from Patient-Centric Data. Bioengineering 2023, 10, 182. https://doi.org/10.3390/bioengineering10020182

