Review

A (Comprehensive) Review of the Application of Quantitative Structure–Activity Relationship (QSAR) in the Prediction of New Compounds with Anti-Breast Cancer Activity

Faculty of Pharmacy, Medical University of Sofia, 1000 Sofia, Bulgaria
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1206; https://doi.org/10.3390/app15031206
Submission received: 24 December 2024 / Revised: 19 January 2025 / Accepted: 22 January 2025 / Published: 24 January 2025

Abstract

Computational approaches applied in drug discovery have advanced significantly over the past few decades. These techniques are commonly grouped under the term “computer-aided drug design” (CADD) and are now considered one of the key pillars of pharmaceutical discovery pipelines in both academic and industrial settings. In this work, we review Quantitative Structure–Activity Relationships (QSARs), one of the most widely used ligand-based drug design (LBDD) methods, with a focus on its application in the discovery and development of anti-breast cancer drugs. We examine critical steps in the QSAR methodology that are essential for its correct application but are often overlooked, leading to insignificant or misleading models. Additionally, current anti-breast cancer treatment strategies are briefly reviewed, along with some targets for future treatments. The review covers QSAR studies from the past five years and includes a discussion of notable works that could serve as models for future applications of this interdisciplinary and complex method and that may help in future drug design and development.

1. Introduction

Most drugs exert their pharmacological effects through specific interactions with target macromolecules. Several theories explain this unique interaction, including “lock-and-key” [1], “induced fit” [2], and “conformational selection” (also known as the “population shift hypothesis”) [3]. These theories are based on the chemical structure of the molecules, their dynamic conformational properties, and how these factors influence receptor binding [4,5,6,7].
The formation of a drug–biomacromolecule complex involves only certain atoms from both molecules, referred to as the interface area [8]. In biochemistry and molecular biology, a binding site is a region on a biomacromolecule, such as a protein, functioning as a receptor, enzyme, etc., that specifically binds another molecule. The binding partner of the biomacromolecule is often referred to as a ligand. The portion of the interface area belonging to the drug is called the biophore. A biophore is defined as the essential geometric arrangement of atoms or functional groups in the ligand that can bind to the biomacromolecule.
A pharmacophore is a set of structural features in a molecule recognized at a receptor site, responsible for the molecule’s biological activity [9]. The pharmacophore is common to all active molecules and is essential for their action. Chemical groups that are not part of the interface area but support the pharmacophore conformationally are called linkers or spacers. These groups usually differ in structure among various active molecules.
One of the most important goals in the pharmaceutical industry is to identify new chemical entities (NCEs) with the potential to become approved drugs. An established technology adopted by the pharmaceutical industry to achieve this goal is wet-lab high-throughput screening (HTS). However, the high cost and low hit rate associated with HTS have driven the development of computational alternatives, enabling cheaper and faster in silico screening. The concept of using computers to perform “virtual” screening before experimental testing is now referred to as “high-throughput virtual screening” (HTVS) [10,11].
The completion of the Human Genome Project has unveiled a vast array of attractive druggable targets. Druggable targets are proteins capable of binding drug-like compounds with a certain binding affinity. In this context, “druggability” refers to the ability of a target protein to be therapeutically modulated by medicines. The “druggable genome” encompasses all genes encoding such proteins [12,13].
Meanwhile, advances in structural biology techniques, including X-ray crystallography, nuclear magnetic resonance spectroscopy, and cryogenic electron microscopy (cryo-EM), have paved the way for the development of structure-based virtual screening (SBVS) [14,15,16,17,18].
When the exact structure of the biological target of interest is unknown, information from ligands can be extracted and leveraged along with previously obtained experimental data. This concept forms the basis of Quantitative Structure–Activity Relationship (QSAR) modeling, which is a key component of ligand-based drug design (LBDD) methods.
LBDD, also known as pharmacophore-based design, enables the development of new active compounds by using the critical features of one or more molecules with the same mechanism of action, in other words, molecules that interact with the same target macromolecule [19]. The primary goal is to analyze input data to identify the pharmacophore responsible for the pharmacological activity. This identification is based on structural similarity, typically achieved using the alignment approach [20,21]. The identified pharmacophore can be used to design new active molecules through various methods, including the design of structural analogs, 3D database searching, de novo design, and manual design.
There are three major categories of ligand-based drug design (LBDD): QSAR, pharmacophore modeling, and similarity searching [22,23]. Similarity searching is based on the similarity-property principle, which states that similar molecules are likely to exhibit similar biological properties and activities. Similarity can be quantified using molecular fingerprints, 3D shape descriptors, or physicochemical properties [22]. Similarity searching is used to identify compounds with chemical or physicochemical properties similar to those of known active compounds [24,25]. Additionally, it can be applied to expand structure–activity relationships within a given series and to prioritize compounds for synthesis or biological evaluation.
Pharmacophore modeling identifies the spatial arrangement of key electronic and steric features of ligands, such as hydrogen bond donors/acceptors, aliphatic hydrophobic groups, aromatic rings, ionizable groups, and others, which are responsible for their activity, specifically their interactions with the targeted protein [26]. It is applied in virtual screening to identify novel hits, facilitate de novo design and support various applications such as lead optimization and multitarget drug design.
QSAR is an effective method for analyzing and harnessing the relationship between chemical structures and their biological activities. Through the use of mathematical models, it enables the prediction of biological activity for chemical compounds based on their structural and physicochemical features. This method plays a crucial role in discovering new drug candidates and improving the properties of lead compounds, thereby advancing the development of potential therapeutics.
This review is organized into three main sections. The first section provides a brief overview of the history of QSAR, its methodology, and the main steps in QSAR modeling. The second section focuses on anti-breast cancer drug discovery, reviewing the different types of breast cancer, some of the current treatment strategies, and potential targets for future treatments. Finally, QSAR studies from the past five years are summarized with a discussion of some key studies.

2. QSAR—A Brief Overview

The roots of QSAR can be traced back more than 100 years, when Meyer and Overton made the important observation that the narcotic properties of anesthetizing gases and organic solvents correlated with their solubility in olive oil [27,28,29,30]. A significant advancement in the development of QSAR was the introduction of the so-called Hammett constants, which represent the effects induced by substituents in organic molecules on the rate of chemical reactions [31]. Different substituents shift the reaction equilibrium. Hammett proposed a simple equation (Equation (1)) involving a substituent constant, σ, specific to each substituent in the molecule, and a reaction constant, ρ, dependent on the reaction type.
$$\log K = \log K_0 + \rho\sigma \qquad (1)$$
where $K_0$ and $K$ are the dissociation constants of the unsubstituted and substituted benzoic acid, respectively, σ is the Hammett substituent constant, and ρ is the reaction constant.
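As a worked illustration of Equation (1), the following minimal sketch predicts the pKa of para-substituted benzoic acids. The numerical values (pKa of benzoic acid of about 4.20, ρ = 1 for the reference ionization in water, and the listed para σ constants) are commonly tabulated figures used here as assumptions, and the function names are hypothetical.

```python
# Illustrative values, treated as assumptions for this sketch.
PKA_BENZOIC_ACID = 4.20        # unsubstituted benzoic acid in water at 25 °C
RHO_BENZOIC_IONIZATION = 1.0   # reaction constant of the reference reaction
SIGMA_PARA = {"H": 0.00, "NO2": 0.78, "CH3": -0.17}  # para substituent constants


def hammett_log_k(log_k0: float, rho: float, sigma: float) -> float:
    """Equation (1): log K = log K0 + rho * sigma."""
    return log_k0 + rho * sigma


def predicted_pka(substituent: str) -> float:
    """Predict the pKa of a para-substituted benzoic acid (pKa = -log K)."""
    log_k0 = -PKA_BENZOIC_ACID
    log_k = hammett_log_k(log_k0, RHO_BENZOIC_IONIZATION, SIGMA_PARA[substituent])
    return -log_k


if __name__ == "__main__":
    for group in SIGMA_PARA:
        print(group, round(predicted_pka(group), 2))
```

An electron-withdrawing group (positive σ) lowers the predicted pKa, while an electron-donating group (negative σ) raises it.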
QSAR formally began in the early 1960s with the works of Hansch and Fujita [32,33] and Free and Wilson [34]. Hansch and Fujita extended Hammett’s equation by combining the electronic properties of substituents with a lipophilicity term, as follows:
$$\log(1/C) = b_0 + b_1\sigma + b_2\log P \qquad (2)$$
where log(1/C) stands for the biological activity, defined as the logarithm of the reciprocal of the effective dose (C = ED) or inhibitory concentration (C = IC) needed to produce a certain biological effect, σ is the Hammett substituent constant, P is the octanol/water partition coefficient, and $b_0$, $b_1$, and $b_2$ are regression coefficients.
The Free–Wilson method quantifies the observation that changing a substituent at one position of a molecule often has an effect independent of substituent changes at other positions. This effect is considered additive in nature [34].
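The additivity assumption can be sketched in a few lines. The baseline activity and the group contributions below are invented numbers for illustration only, and `free_wilson_activity` is a hypothetical helper name; in a real Free–Wilson analysis the contributions would be estimated by regression from measured activities.

```python
# Hypothetical baseline activity and per-position group contributions.
BASELINE = 5.0
CONTRIBUTIONS = {
    "R1": {"H": 0.0, "Cl": 0.4, "OCH3": -0.2},
    "R2": {"H": 0.0, "F": 0.3},
}


def free_wilson_activity(baseline, contributions, substituents):
    """Add one independent contribution per substituted position to the baseline."""
    return baseline + sum(
        contributions[position][group] for position, group in substituents.items()
    )


if __name__ == "__main__":
    print(free_wilson_activity(BASELINE, CONTRIBUTIONS, {"R1": "Cl", "R2": "F"}))
```

Because the terms are additive, replacing H by Cl at R1 changes the predicted activity by the same amount regardless of the group present at R2.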
Briefly, QSAR modeling begins with a library of chemical compounds that are assayed for their biological activity using a suitable assay system. Chemical descriptors are then calculated for these compounds, and the resulting numerical data are correlated with biological activities using appropriate data analysis or machine learning algorithms. This process leads to the development of a QSAR model, as schematically presented in Figure 1.
The generated model must be validated; if found valid, it can be used to interpret chemical effects involved in biological activity and/or to predict the biological activities of new compounds.
QSAR modeling focuses on the chemical properties of a series of compounds. The chemical variation within this series defines a theoretical space known as chemical space. A compound’s position in this space determines its biological activity. As a result, QSAR models typically consider a small subset of the entire chemical space, where model predictions for a specific set of compounds are considered reliable [35].

2.1. Chemical Space

The process of drug discovery is an extremely costly and time-consuming endeavor aimed at ensuring the safety and quality of new chemical entities entering the market. It is estimated that developing a novel molecule can take up to 14 years and cost more than USD 1 billion from target identification to regulatory approval [36].
One significant challenge in the medical field is the vast chemical space, forming what is known as the “drug-like” environment. An estimated $10^{200}$ drug-like molecules could theoretically be synthesized. Screening all of them would take approximately $2 \times 10^{193}$ years if one molecule were tested every second [37]. This highlights the necessity of employing smarter methods such as Statistical Molecular Design (SMD) [38]. SMD is an approach used to intelligently select chemical features, effectively expanding the chemical space and increasing the informational content of a chemical library. All available compounds or chemical structural blocks (n) are described using (p) descriptors, forming an n × p data matrix X, where each row represents a compound from the library and each column corresponds to the value of a specific descriptor. The matrix X is then subjected to Principal Component Analysis (PCA) [39,40], offering two key options: (1) summarizing the information from p descriptors into m < p principal components that explain most of the variance in the X matrix; (2) using orthogonal (uncorrelated) principal components for fractional factorial designs. The basic workflow for SMD is presented in Figure 2.
PCA reduces the complexity of original variables by transforming them into principal component scores—linear combinations of the original data—making them easier to interpret. These scores, referred to as principal properties (PPs), represent key chemical characteristics and are mathematically independent. The values of each principal component score vector reflect changes in chemical properties when transitioning between compounds or substituents. The PPs are systematically varied across the entire dataset, ensuring that each PP is explored at all levels of the others. In full and fractional factorial designs, PPs are typically evaluated at two levels: a low level (minus) and a high level (plus), although some designs may include three to five levels. This results in a table of plus and minus signs that defines the optimal combinations of PPs, representing the desired chemical properties for a subset of representative compounds. This approach enables the selection of a subset of compounds or substituents that are both representative and diverse. The combination of full or fractional factorial design with PCA results is referred to as Statistical Molecular Design (Figure 2). By selecting a training set based on SMD principles, the degree of multicollinearity in the latent variables (PPs) is automatically reduced, thereby improving subsequent regression analysis [38,41,42,43].
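The selection step can be sketched as follows, assuming the PCA has already been run and the principal property (PP) scores are available. The compound names and scores are hypothetical, and picking the nearest compound to each design point is one simple selection rule among several possible ones, not necessarily the procedure used in the cited works.

```python
from itertools import product
from math import dist

# Hypothetical PP scores (PP1, PP2) for six candidate compounds.
PP_SCORES = {
    "c1": (-2.0, -1.5), "c2": (-1.8, 1.4), "c3": (2.1, -1.6),
    "c4": (1.9, 1.5), "c5": (0.1, 0.0), "c6": (0.3, -0.2),
}


def two_level_full_factorial(n_factors):
    """All combinations of low (-1) and high (+1) levels for n_factors PPs."""
    return [list(levels) for levels in product((-1, 1), repeat=n_factors)]


def select_by_design(scores, design):
    """Pick, for each design point, the closest not-yet-selected compound."""
    n_pp = len(design[0])
    lows = [min(row[j] for row in scores.values()) for j in range(n_pp)]
    highs = [max(row[j] for row in scores.values()) for j in range(n_pp)]
    # Rescale every PP to the interval [-1, 1] so scores match design levels.
    scaled = {
        name: [2 * (row[j] - lows[j]) / (highs[j] - lows[j]) - 1 for j in range(n_pp)]
        for name, row in scores.items()
    }
    chosen = []
    for point in design:
        candidates = [name for name in scaled if name not in chosen]
        chosen.append(min(candidates, key=lambda name: dist(scaled[name], point)))
    return chosen


if __name__ == "__main__":
    design = two_level_full_factorial(2)
    print(design)
    print(select_by_design(PP_SCORES, design))
```

With two PPs at two levels the design has four points, so four compounds are selected, one near each corner of the PP space; compounds near the center (here c5 and c6) are left out.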

2.2. Activities in QSAR and QSPR

Quantitative Structure–Property Relationships (QSPRs) are closely related to QSAR, with the primary distinction being the type of activity studied. While QSAR focuses on biological activities such as Kd, Ki, ED50, IC50, and binary representations like active/inactive, QSPR investigates chemical or physical properties of compounds, including melting point, vapor pressure, pKa, logP, and more.
In QSAR/QSPR analysis, the measured observation is often referred to as an “end-point”, indicating that the value has been obtained through a series of experimental steps, ultimately yielding data suitable for analysis. End-points are typically classified into the following categories:
  • Physicochemical end-points: these include properties such as logP, logD, pKa, logS, and water stability.
  • ADMET end-points: these involve parameters like permeability through passive transport measured via PAMPA (Parallel Artificial Membrane Permeability Assay) [44], blood–brain barrier permeability (PAMPA-BBB) [45], absorption [46], drug transporters assessed through P-glycoprotein transporter assays [47], and metabolites [48].
  • Toxicological, drug interaction, and secondary pharmacology end-points: this category covers cytotoxicity [49], carcinogenicity [50], genotoxicity [51,52], metabolism induction/inhibition [53,54], and drug transporter inhibition [55].
There is some overlap among these categories, as certain end-points may influence more than one classification.

3. QSAR Today

Today, we continue to apply the same principles that were used in the development of the first QSAR model. However, significant advancements in technology now allow for the generation of many new descriptors for chemical compounds and improved methods for data acquisition.
QSAR models today can be broadly divided into two categories: “global” QSAR and “local” QSAR. Global QSAR models are built from datasets containing hundreds of thousands of chemical compounds, covering extensive regions of the chemical space. These models are typically available only for highly standardized cases, where consistent data are gathered across various compound types. Their primary aim is to provide predictions for virtually all types of compounds. These models are widely used in drug profiling, a standard practice in the early stages of drug discovery, where compounds are tested to identify unsuitable properties for early elimination. However, the high costs involved make testing every compound across all assays impractical. When sufficient data are available, QSAR models can replace some wet-laboratory tests. The pharmaceutical industry has made significant progress in this area, with models built on data from over 100,000 compounds that can perform as effectively as wet-laboratory tests. By leveraging these models, in silico profiling of all compounds can be conducted even before synthesis, enhancing efficiency in drug discovery. Examples include models predicting drug permeability in the Caco-2 cell system [56], mutagenicity in the Ames test [57], drug–plasma protein binding [58], logP [59], pKa [60], and water solubility of chemical compounds [61]. The development of global QSAR models relies heavily on experimental data that are consistently collected under the same laboratory conditions, using the same assay, and often by the same person or through automated processes. This reliance can be seen as a disadvantage: when such consistency cannot be guaranteed, the resulting models may be inaccurate.
In contrast, local QSAR models are developed using smaller datasets, sometimes comprising fewer than 100 compounds, and in some cases, only 10. In lead optimization projects, these models are frequently developed in an iterative process, where data from newly synthesized compounds are incorporated into the existing dataset [62]. Local QSAR studies often focus on closely related compounds, typically varying only in the substituents attached to a shared scaffold. These models can provide valuable insights to direct further synthesis aimed at improving the target activities of the compounds. One disadvantage of these models is that their applicability domain (see Section 3.9) is often restricted to a specific class of compounds with similar properties and close structural relationships.
Additionally, there is a third type of QSAR model developed for classification purposes. Examples include models that broadly classify chemical compounds as active or inactive against specific target groups (e.g., kinase-active or -inactive [63]), toxic or non-toxic [64], or water-soluble or insoluble [65].
QSAR can be defined as a method for building computational or mathematical models that attempt to find a statistically significant correlation between chemical structure and biological activity using chemometric techniques [66,67,68]. In drug design, structural information is represented by molecular descriptors, while biological activity (BA) is expressed as a function of these descriptors (Equation (3)).
$$\mathrm{BA} = f(\text{molecular descriptors}) \qquad (3)$$
The QSAR model development process typically follows an iterative approach with the following steps:
1. Selection of Chemical Compounds: identify relevant compounds for analysis.
2. Assessment of Biological Activity: evaluate the biological activity of the selected compounds.
3. Descriptors Calculation: calculate molecular descriptors.
4. Data Matrix Setup: preprocess data into a uniform matrix.
5. Data Partitioning: set aside a portion of the data for external validation.
6. Feature Selection and Model Building: use chemometric methods to select relevant features and develop a QSAR model.
7. Model Validation: internal using the training dataset and external using the test dataset.
8. Model Interpretation and Prediction: interpret the model’s results and apply the model for predictions.

3.1. Selection of Chemical Compounds: Identify Relevant Compounds for Analysis

The first step in QSAR model development is gathering a set of chemical compounds (molecules). These compounds can be sourced from earlier in-house efforts, synthesized as needed, or obtained from chemical vendors specializing in early-phase drug discovery. Several databases are widely used in QSAR modeling. PubChem [69], ChemSpider [70], and DrugBank [71] provide extensive data on small molecules. Interaction databases like ChEMBL [72], NCI60 [73], STITCH [74], and BioAssay [75] offer valuable insights into biological and chemical entity interactions, bioactivities, and ADMET properties, supporting compound selection and modeling efforts. The use of database sources or literature references for selecting active chemical compounds should be approached with caution, as discussed in the next section. Relying on diverse sources of biological activity data or improperly preprocessed information—such as data that require recalibration between different sources to align with a common reference compound—can result in QSAR models that are unreliable, statistically unsound, and lack meaningful insights.
Regardless of the source, this is where Statistical Molecular Design (SMD) [38,76] becomes essential for selecting chemical compounds that sufficiently cover a large volume of the chemical space (see Section 2.1).
The compounds in the QSAR chemical library may be structurally diverse (lacking a common chemical scaffold) or structurally similar (sharing a common chemical scaffold). To minimize collinearity between variables, the molecules should be selected to maximize their independent behavior and orthogonality [77]. This approach enables the selected compounds to map significant descriptor space with a minimum number of compounds. Additionally, the selected compounds should be synthetically feasible.

3.2. Assessment of Biological Activity: Evaluate the Biological Activity of the Selected Compounds

QSAR can be applied to heterogeneous data, but certain considerations are essential to maintain the accuracy of biological data [78,79]:
  • Mechanism Consistency: chemical compounds should have the same mechanism of action and binding mode.
  • Congeneric Series: compounds should belong to a congeneric series.
  • Correlation with Binding Affinity: biological activity should correlate with binding affinity and be measurable.
  • Uniform Data Acquisition: Biological data should be obtained using consistent protocols, preferably from a single source (cells, tissues, or organs) and a single laboratory. Inter-assay and inter-laboratory variations can be managed by using assay and laboratory descriptors.
  • Standardized Units: activity data should be of the same type (e.g., Ki, IC50, or binding) and expressed in the same units (mol/L).
  • Sufficient Activity Range: the activity range should cover more than three logarithmic units with an even data distribution.
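The last two requirements in the list above lend themselves to a simple check. The sketch below converts concentrations reported in mixed units to mol/L, takes the negative logarithm (for example, pIC50), and measures the span of the resulting values. The unit table and function names are assumptions made for illustration.

```python
from math import log10

# Conversion factors to mol/L; "uM" stands for micromolar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_p_activity(value, unit):
    """Return -log10 of a concentration after converting it to mol/L."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -log10(value * UNIT_TO_MOLAR[unit])


def activity_range(p_values):
    """Span of the activities in logarithmic units."""
    return max(p_values) - min(p_values)


if __name__ == "__main__":
    measurements = [(10, "uM"), (250, "nM"), (1, "nM")]  # hypothetical IC50 values
    p_values = [to_p_activity(value, unit) for value, unit in measurements]
    print([round(p, 2) for p in p_values])
    print("range > 3 log units:", activity_range(p_values) > 3)
```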

3.3. Descriptors Calculation: Calculate Molecular Descriptors

The biological, chemical, and physical properties of chemical compounds are determined by their chemical structure. Molecular description addresses another significant topic: the computer-based representation of molecular structures. Descriptors should be computed from the chemical structure of compounds. If the compounds are well described, QSAR models can accurately predict the activities of novel compounds before their synthesis.
The “Handbook of Molecular Descriptors”, first published in 2000, provides a comprehensive overview of this topic [80]. New descriptors are introduced every year, aiming to identify the most suitable descriptor for specific chemical, biological, or physical properties. Various software tools are available for calculating molecular descriptors, including ACD/Labs (logP, logS, logD, pKa) [81], DRAGON and E-DRAGON (constitutional, topological, 2D-autocorrelations, geometrical, WHIM, GETAWAY, RDF, functional groups, etc.) [82,83,84], CDK (topological, geometrical, electronic, and constitutional) [85], CODESSA PRO (constitutional, topological, geometrical, charge-related, semi-empirical, thermodynamical) [86], MOE (topological, physical properties, structural keys, etc.) [87,88], MOLD2 (1D and 2D descriptors) [89,90], PowerMV (constitutional, atom pairs, fingerprints, BCUT) [91], and PreADMET (constitutional, topological, geometrical, and physicochemical descriptors) [92]. While some of these tools are commercial, others are available as freeware.
Descriptor types can be classified based on dimensionality, referring to the number of geometric dimensions considered. Thus, 0D and 1D descriptors provide basic molecular information like molecular weight and number of constituent elements. Two-dimensional descriptors include topological indices computed from structural formulas, using graph theory [93]. Common examples are the connectivity index [94,95,96], Wiener’s index [97], Zagreb indices [98], Balaban’s J Index [99] and Kier shape indices [100]. Three-dimensional descriptors rely on three-dimensional atomic coordinates. They can be alignment-dependent, including Comparative Molecular Field Analysis (CoMFA) [101], Comparative Molecular Similarity Indices Analysis (CoMSIA) [102], Comparative Binding Energy Analysis (CoMBINE) [103], Comparative Residue Interaction Analysis (CoRIA) [104], and Hint Interaction Field Analysis (HIFA) [105], and alignment-independent including Comparative Molecular Moment Analysis (CoMMA) [106], Comparative Spectral Analysis (CoSA) [107], and Hologram-QSAR (HQSAR or Holo-QSAR) [108].
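To make the idea of a two-dimensional topological descriptor concrete, the sketch below computes Wiener's index, the sum of shortest-path distances (counted in bonds) over all pairs of heavy atoms in the hydrogen-suppressed molecular graph. The adjacency-list encoding and the function name are illustrative choices, not the interface of any of the packages listed above.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered pairs of atoms."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives bond-count distances from `start`.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # every pair was counted from both ends


# Carbon skeletons as adjacency lists (hydrogens omitted).
N_BUTANE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ISOBUTANE = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

if __name__ == "__main__":
    print(wiener_index(N_BUTANE), wiener_index(ISOBUTANE))
```

The two isomers share the same constitutional descriptors (four carbons, same molecular weight) but differ in Wiener's index, which is the kind of connectivity information topological indices add.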
Descriptors can be categorized based on the molecular properties they represent, including constitutional, topological, geometrical, electronic, and thermodynamic characteristics [109,110]. Constitutional descriptors provide information about the molecular composition without considering atom connectivity. Examples include molecular weight, atom and bond counts, and elemental composition. Topological descriptors describe how atoms are connected within a molecule using molecular graph representations. Geometrical descriptors use the 3D coordinates of atoms to capture molecular size, shape, and atom distribution, including steric parameters such as molecular volume and surface area. Electronic descriptors characterize the electronic structure of a molecule, including properties like dipole and quadrupole moments, polarizabilities, atomic charges, and the energies of HOMO and LUMO orbitals. Thermodynamic descriptors reflect chemical behaviors related to energy, such as heat of formation and molar refractivity. For further reading, see references [109,110,111,112,113,114].

3.4. Data Matrix Setup: Preprocessing Data into a Uniform Matrix

Before building a QSAR model, we must set up a data matrix. The QSAR data matrix consists of rows and columns, where each row represents a compound and contains the independent (X) variables (the descriptors of the compound) and the dependent (Y) variables (the measured biological activities or properties). Most QSAR models have only one Y variable, but it is possible to model several Y variables simultaneously.
Data preprocessing includes tasks such as smoothing, normalization, aggregation, data reduction, sampling when the dataset is too large, noise elimination, feature selection (see Section 3.6), data cleaning, data integration, and discretization. The result should be a uniform matrix, meaning a matrix with the same number of descriptors in each row, which will be used to build a QSAR model.
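Two of the most common preprocessing operations can be sketched with the standard library alone: dropping descriptors that are constant across all compounds (they carry no information) and autoscaling the remaining columns to zero mean and unit variance. The descriptor names and values are hypothetical, and autoscaling is shown as one normalization option among several.

```python
from statistics import mean, stdev


def drop_constant_columns(matrix, names):
    """Remove descriptor columns that take a single value for every compound."""
    keep = [j for j in range(len(names)) if len({row[j] for row in matrix}) > 1]
    return [[row[j] for j in keep] for row in matrix], [names[j] for j in keep]


def autoscale(matrix):
    """Center each column on its mean and divide by its sample standard deviation."""
    columns = list(zip(*matrix))
    means = [mean(column) for column in columns]
    sds = [stdev(column) for column in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


if __name__ == "__main__":
    names = ["MW", "logP", "nRings"]  # hypothetical descriptors
    X = [[100.0, 1.0, 2], [200.0, 2.0, 2], [300.0, 3.0, 2]]
    X, names = drop_constant_columns(X, names)
    print(names)
    print(autoscale(X))
```

Constant columns must be removed before autoscaling, since their standard deviation is zero.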

3.5. Data Partitioning: Set Aside a Portion of the Data for External Validation

A QSAR model is valuable if it can predict the activity of new molecules. To assess the quality of a model, scientists typically divide the data into training and test datasets. The training set is used to build the QSAR model, while the test set evaluates its predictive accuracy. Generally, around 15–30% of the data are set aside for external prediction, though this proportion depends on the homogeneity and size of the dataset.
Various methods are used for data splitting. These methods are based either on analyzing the similarity of compounds (X response) using techniques like PCA [39,40,115] or cluster analysis or on activity sampling (Y response) through methods such as random selection, Statistical Molecular Design (SMD) [38,76], k-means clustering [116,117], Kennard–Stone selection [118], and Kohonen’s self-organizing map selection [119,120].
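As one example of these splitting methods, the sketch below implements the basic Kennard–Stone procedure on descriptor vectors: it starts from the two most distant compounds and then repeatedly adds the compound whose nearest already-selected neighbor is farthest away. Using Euclidean distance on raw (unscaled) values is an assumption of this sketch; in practice the descriptors would normally be scaled first.

```python
from math import dist


def kennard_stone(points, n_select):
    """Return the indices of n_select points chosen by Kennard-Stone selection."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    # Seed the selection with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Maximin rule: take the point farthest from its closest selected point.
        selected.append(
            max(remaining, key=lambda i: min(dist(points[i], points[s]) for s in selected))
        )
    return selected


if __name__ == "__main__":
    descriptors = [(0.0,), (1.0,), (2.0,), (5.0,), (10.0,)]  # hypothetical 1D data
    training = kennard_stone(descriptors, 3)
    test = [i for i in range(len(descriptors)) if i not in training]
    print("training:", training, "test:", test)
```

The selected compounds span the descriptor space and would form the training set, while the remaining compounds serve as the external test set.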

3.6. Feature Selection and Model Building: Use Chemometric Methods to Select Relevant Features and Develop a QSAR Model

Chemometrics is a discipline that applies mathematical and statistical methods to analyze chemical data. Since the relationships between the structural properties of chemical compounds and their biological activities are often non-linear, both linear and non-linear modeling methods are used.
Chemometric methods are generally grouped into regression and classification techniques based on the type of predictions they make. Regression methods provide quantitative predictions of the Y variable(s), while classification methods categorize compounds, for example, as active or non-active.
A commonly used regression method is Multiple Linear Regression (MLR) [121], where the dependent variable (Y), such as biological activity or physicochemical properties, is predicted utilizing more than one independent variable (X), typically molecular descriptors (Equation (4)).
$$y = a + b_1x_1 + b_2x_2 + \dots \qquad (4)$$
where $y$ is the dependent variable (biological activity or physicochemical property), $a$ is the intercept, $x_i$ is an independent variable (a molecular descriptor), and $b_i$ is the corresponding regression coefficient.
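A minimal sketch of how the coefficients of Equation (4) can be estimated by ordinary least squares is given below. It solves the normal equations with Gauss–Jordan elimination using only the standard library; the data are synthetic and the function names are hypothetical. Dedicated numerical libraries would normally be used for real datasets.

```python
def fit_mlr(X, y):
    """Return [a, b1, ..., bp] that minimize the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]  # leading 1 carries the intercept a
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [v - factor * w for v, w in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefficients, x):
    """Evaluate y = a + b1*x1 + b2*x2 + ... for one compound."""
    return coefficients[0] + sum(b * v for b, v in zip(coefficients[1:], x))


if __name__ == "__main__":
    # Synthetic data generated from y = 1 + 2*x1 - 0.5*x2.
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    y = [2.0, 4.5, 5.0, 7.5, 8.5]
    coefficients = fit_mlr(X, y)
    print([round(c, 3) for c in coefficients])
    print(round(predict(coefficients, [6, 2]), 3))
```

The singularity check illustrates why perfectly collinear descriptors cannot be handled by MLR: the normal equations then have no unique solution.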
When working with descriptors, the next critical step is determining which of these parameters are relevant to the biological activity and should be included in the regression equation. Two essential points must be considered when using Multiple Linear Regression (MLR) [121] for analysis: (1) compound-to-descriptor ratio—the ratio of compounds to descriptors should be at least 5:1—and (2) multicollinearity—high correlations among descriptors can cause spurious solutions. This issue can be addressed by performing feature selection to reduce the number of descriptors or by applying multivariate techniques such as Partial Least Squares (PLS) [122,123] or Principal Component Regression (PCR) [39,124].
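Both points can be checked before fitting a model. The sketch below tests the compound-to-descriptor ratio against the 5:1 rule and lists descriptor pairs whose absolute Pearson correlation exceeds a cutoff. The cutoff of 0.9, the descriptor values, and the function names are assumptions chosen for illustration.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    covariance = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return covariance / (spread_u * spread_v)


def ratio_ok(n_compounds, n_descriptors, minimum=5.0):
    """True if there are at least `minimum` compounds per descriptor."""
    return n_compounds / n_descriptors >= minimum


def collinear_pairs(columns, threshold=0.9):
    """List descriptor pairs with |r| at or above the threshold."""
    names = list(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson_r(columns[a], columns[b])) >= threshold
    ]


if __name__ == "__main__":
    columns = {  # hypothetical descriptor values for five compounds
        "MW": [100, 150, 200, 250, 300],
        "heavy_atoms": [7, 11, 14, 18, 21],
        "logP": [2.0, 0.5, 3.1, 1.2, 2.4],
    }
    print("ratio ok:", ratio_ok(n_compounds=5, n_descriptors=len(columns)))
    print("collinear:", collinear_pairs(columns))
```

Here the ratio check fails (five compounds for three descriptors), and molecular weight and heavy-atom count are flagged as redundant, so one of them would be dropped or a latent-variable method such as PLS used instead.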
Classification or clustering involves grouping similar data points to maximize similarity within groups while ensuring dissimilarity between groups. Commonly used methods include hierarchical clustering and k-means clustering. Hierarchical clustering uses distances between objects, typically measured by Euclidean distances, to form clusters based on their dissimilarities [125]. K-means clustering is a non-hierarchical method that partitions data points into clusters based on k-centroids, determined by the mean values of the objects in each cluster [116,117]. Other widely used classification techniques include linear discriminant analysis [126,127] and logistic regression [128].
Machine learning (ML) comprises advanced mathematical and statistical techniques for analyzing complex and large-scale datasets, including big data. ML strategies are broadly categorized into unsupervised learning and supervised learning (Table 1) [129]. In unsupervised learning, the system identifies patterns or structures within data without labeled outputs, simplifying data interpretation. In contrast, in supervised learning, the system is trained on input–output pairs, enabling it to predict correct outputs for new inputs.
Popular ML techniques in QSAR modeling include k-nearest neighbor (k-NN) [136], Artificial Neural Networks (ANNs) [137], Support Vector Machines (SVMs) [138,139], Decision Trees (DTs) [133,134], Random Forest (RF) [140], XGBoost [141], etc.
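The simplest of these techniques, k-NN, can be sketched in a few lines: a new compound receives the majority class of its k closest training compounds in descriptor space. The two-descriptor data, the labels, and the choice of Euclidean distance with k = 3 are assumptions for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training compounds."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical scaled descriptors for six training compounds.
    train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
    train_labels = ["inactive"] * 3 + ["active"] * 3
    print(knn_predict(train_X, train_labels, (0.95, 1.0)))
    print(knn_predict(train_X, train_labels, (0.05, 0.1)))
```

The number of neighbors k is a typical example of a tunable parameter: a very small k follows noise in the training data, while a very large k blurs the class boundary.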
Machine learning (ML) and Deep Learning (DL) are subfields of Artificial Intelligence (AI), each specializing in different aspects of data processing. The primary distinction between ML and DL lies in the scale and complexity of the data they handle. Classical ML typically works with smaller, more manageable datasets and focuses on decision-making and problem-solving. It employs algorithms that detect patterns in categorized data, improving their accuracy as they process more information over time. Deep Learning, a subset of ML, utilizes neural networks with multiple hidden layers to analyze and learn from vast and complex datasets. These advanced neural networks are designed to capture intricate relationships and functions within the data. AI, as an overarching concept, encompasses technologies that simulate human intelligence, enabling machines to sense, reason, act, and adapt, effectively replicating or imitating human cognitive abilities [142,143,144].
Each method has specific parameters that can be tuned to improve model performance. The objective is to optimize these settings to create models that accurately predict the properties or activities of previously unseen compounds. A model that is too simple may underfit the data, while an overly complex model may overfit, performing well on training data but failing to generalize to new data.
Feature selection minimizes the descriptor set, retaining only those essential for defining the model.
For “global” QSAR models, which may involve hundreds or even thousands of compounds, simple descriptors are generally preferred due to the model’s inability to capture the full complexity of molecular interactions. Examples of such simple descriptors include molecular weight, logP, polar surface area, and the number of hydrogen bond donors and acceptors.
In contrast, “local” QSAR models require more detailed descriptors. These models are typically used in lead optimization projects where improving specific molecular properties is essential. Therefore, more comprehensive descriptors related to the chemical structure are necessary, enabling targeted suggestions for structural modifications [145].
There are three main methods for feature selection: filter, wrapper, and embedded methods (Table 2). Filter methods select a subset of descriptors based on the intrinsic properties of the data [146,147]. They use relevant scoring metrics to rank features and select the top-ranked ones. Wrapper methods apply a learning algorithm to evaluate different subsets of features [148]. They conduct a search within the space of possible parameters, selecting features that optimize the model’s performance. Embedded methods, also known as hybrid methods, integrate feature selection into the training process of a specific learning algorithm. Like wrapper methods, they are sensitive to the structure of the underlying classifier and are tailored to a particular learning algorithm [149].
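A filter method can be sketched in a few lines. The example below ranks descriptors by the absolute Pearson correlation with the measured activity and keeps the top-ranked ones. The scoring metric, the function names `pearson` and `filter_select`, and the descriptor values are assumptions chosen for illustration; other scoring metrics are equally possible.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / math.sqrt(sxx * syy)

def filter_select(descriptors, activity, top_n):
    """Rank descriptors (dict: name -> values) by |r| with activity; keep top_n."""
    scores = {name: abs(pearson(vals, activity)) for name, vals in descriptors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores

# Hypothetical data for five compounds.
activity = [5.0, 5.5, 6.1, 6.4, 7.0]
descriptors = {
    "logP": [1.0, 1.5, 2.1, 2.4, 3.0],
    "MW": [300.0, 310.0, 305.0, 320.0, 330.0],
    "noise": [2.0, 1.0, 3.0, 1.0, 2.0],
    "const": [1.0, 1.0, 1.0, 1.0, 1.0],
}
selected, scores = filter_select(descriptors, activity, top_n=2)
print(selected)
```

Because the score is computed per descriptor and independently of any learning algorithm, this approach is fast but ignores interactions between descriptors, which is the gap that wrapper and embedded methods address.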

3.7. Model Validation: Internal Using the Training Dataset and External Using the Test Dataset

The goal of modeling is to find the optimal settings of the model parameters to accurately predict the activity of new, previously unseen molecules. This is achieved through validation of the developed model [150]. Validation helps avoid chance correlations from numerous descriptors used during model development and prevents data overfitting. In 2003, the Organization for Economic Co-operation and Development (OECD) defined five principles for QSAR validation for regulatory purposes [78]:
  • A defined endpoint.
  • An unambiguous algorithm.
  • A defined domain of application.
  • Appropriate measures for goodness-of-fit, robustness, and predictivity.
  • Mechanistic interpretation, if possible.
Models can be validated through internal and external validation procedures. Internal validation of QSAR models involves using compounds from the training set to test the model’s predictability and accuracy. The most common method for internal validation is cross-validation, using either leave-one-out (LOO) or leave-group-out (LGO), the latter also known as k-fold cross-validation [151,152]. In LGO cross-validation, the data are divided into k groups. An iterative process excludes one group at a time, and a model is built using the remaining k − 1 groups. The excluded group is then predicted, and the process continues until all groups have been left out once. Within a run, the model parameters are kept constant. The whole procedure is repeated several times, with the scientist adjusting the model parameters between runs until optimal model performance is achieved. The LOO procedure is similar, except that each compound from the training set is excluded one at a time. This approach is time-consuming and can provide an overly optimistic model evaluation, leading to potential errors.
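The bookkeeping behind LGO and LOO can be sketched as an index generator. This is an illustrative helper, not code from any QSAR package. The name `k_fold_indices` and the interleaved assignment of compounds to groups are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, held_out_idx) pairs; each sample is held out exactly once."""
    # Assumption: interleaved groups (0, k, 2k, ...), (1, k+1, ...), and so on.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out

# LGO with k = 3 on six compounds.
for train, held_out in k_fold_indices(6, 3):
    print("build on", train, "predict", held_out)

# LOO is the special case k = n_samples: every group holds one compound.
loo_splits = list(k_fold_indices(5, 5))
```

In each iteration a model is fitted on the `train` indices and used to predict the held-out indices. Collecting those predictions over all iterations gives the values needed for the cross-validated statistics.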
Commonly used quality assessments for regression QSAR models include the following:
  • The Square of the Correlation Coefficient (R2): R2 measures the strength of the linear relationship between predicted and experimental activity values (y). It is calculated using the following equation:
$$R^2 = 1 - \frac{\sum (y_{calc} - y_{exp})^2}{\sum (y_{exp} - y_{mean})^2} = 1 - \frac{SSR}{SS} \qquad (5)$$
where ycalc is the value calculated by the model, yexp is the observed value, and ymean is the mean observed value. SSR is the sum of squares of the residuals and SS is the total sum of squares. R2 values range from 0 to 1, where values closer to 1 indicate a better fit. However, R2 never decreases as descriptors are added, so a high value may simply reflect overfitting, particularly when the number of descriptors approaches or exceeds the number of observations.
  • Predictive Squared Correlation Coefficient (Q2): Q2, also known as the cross-validation correlation coefficient when LGO (Q2LGO) or LOO (Q2LOO) cross-validation is applied, is calculated as follows:
$$Q^2 = 1 - \frac{\sum (y_{pred} - y_{exp})^2}{\sum (y_{exp} - y_{mean})^2} = 1 - \frac{PRESS}{SS} \qquad (6)$$
where ypred is the predicted activity and PRESS is the predictive residual sum of squares. Q2 values typically range from 0 to the R2 value. A negative Q2 indicates a non-predictive model. Values above 0.5 suggest a valid model, while values above 0.8 are considered excellent [153,154]. If the margin between R2 and Q2 exceeds 0.2–0.3, this may signal chance correlations, irrelevant descriptors, or data outliers.
  • Root Mean Squared Error (RMSE): RMSE measures the standard deviation of residuals, indicating model accuracy:
$$RMSE = \sqrt{\frac{\sum (y_{calc} - y_{exp})^2}{n}} = \sqrt{\frac{SSR}{n}} \qquad (7)$$
where n is the number of observations.
  • Root Mean Squared Error of Prediction (RMSEP): RMSEP has the same meaning as RMSE but corresponds to Q2. It is calculated as follows:
$$RMSEP = \sqrt{\frac{\sum (y_{pred} - y_{exp})^2}{n}} = \sqrt{\frac{PRESS}{n}} \qquad (8)$$
Both RMSE and RMSEP provide insight into the model’s accuracy relative to the measurement error in the original response data.
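The four statistics above can be computed together for a toy case. The sketch below fits a one-descriptor ordinary least squares line (a stand-in for a real QSAR model), then obtains R2 and RMSE from the fitted values and Q2LOO and RMSEP from leave-one-out predictions. The function names and the data are assumptions made for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def regression_metrics(x, y):
    """Return R2, Q2 (leave-one-out), RMSE and RMSEP for the one-descriptor model."""
    n = len(y)
    y_mean = sum(y) / n
    ss = sum((yi - y_mean) ** 2 for yi in y)  # total sum of squares (SS)
    a, b = fit_line(x, y)
    ssr = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict the excluded compound.
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (a_i + b_i * x[i] - y[i]) ** 2
    return {
        "R2": 1 - ssr / ss,
        "Q2_LOO": 1 - press / ss,
        "RMSE": math.sqrt(ssr / n),
        "RMSEP": math.sqrt(press / n),
    }

# Hypothetical descriptor values and activities for six compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.1, 5.4, 6.2, 6.3, 7.1, 7.3]
print(regression_metrics(x, y))
```

For a least squares model such as this one, PRESS is never smaller than SSR, so Q2 does not exceed R2 and RMSEP is not smaller than RMSE.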
High Q2 values alone do not guarantee model validity. If the dataset contains duplicate compounds or clusters of similar compounds, the model may be overfitted with an artificially high Q2. Similarly, if the scientist conducts excessive parameter tuning or variable selection, the internal cross-validation results may be misleading due to chance correlations.
External validation addresses this issue by testing the model on an independent dataset that was set aside before model development. This dataset is never used in model training. The key external validation parameters are as follows:
  • Q2test (or Q2ext): calculated using the same equation as Q2 (Equation (6)) but applied to the test dataset.
  • RMSEPtest: calculated using the same equation as RMSEP (Equation (8)) but applied to the test dataset.
If the Q2test value is reasonably similar to Q2, along with valid R2 and Q2 values, the QSAR model is considered valid. If Q2test is significantly lower than Q2, the model is likely overfitted, requiring further refinement.
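These acceptance checks can be written down as a small helper. The sketch below computes Q2test from test-set predictions and flags the three conditions discussed in this section. The 0.5 threshold and the 0.3 upper bound on the R2 to Q2 gap come from the text. The tolerance `max_drop` for "reasonably similar" is a hypothetical value, as are the function names. By default the mean of the test-set observations is used as ymean, whereas some variants use the training-set mean instead.

```python
def q2_external(y_exp, y_pred, y_mean=None):
    """Q2 on an external test set; y_mean defaults to the test-set mean (assumption)."""
    if y_mean is None:
        y_mean = sum(y_exp) / len(y_exp)
    press = sum((p - e) ** 2 for p, e in zip(y_pred, y_exp))
    ss = sum((e - y_mean) ** 2 for e in y_exp)
    return 1 - press / ss

def validation_flags(r2, q2, q2_test, max_gap=0.3, max_drop=0.2):
    """Flag the checks described in the text; max_drop is a hypothetical tolerance."""
    return {
        "q2_above_0.5": q2 > 0.5,
        "r2_q2_gap_ok": (r2 - q2) <= max_gap,
        "external_consistent": (q2 - q2_test) <= max_drop,
    }

# Hypothetical test-set observations and predictions.
q2_test = q2_external([5.0, 6.0, 7.0], [5.1, 5.9, 7.2])
print(q2_test, validation_flags(r2=0.92, q2=0.85, q2_test=q2_test))
```

A model that fails the last flag while passing the first two matches the overfitting pattern described above: good internal statistics that do not carry over to unseen compounds.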
The most popular measures for assessing QSAR model performance are R2 and RMSE for goodness-of-fit, Q2LGO or Q2LOO for robustness, and Q2test and RMSEPtest for predictivity. These parameters ensure a well-validated and reliable QSAR model. For further reading, see references [153,154,155,156,157,158].

3.8. Model Interpretation and Prediction: Interpret the Model Results and Apply the Model for Predictions

A validated QSAR model can be used for both prediction and interpretation. For prediction, the same descriptors used during QSAR model training are calculated for new chemical compounds. These values are then input into the model to obtain a predicted response. When used this way, the QSAR model functions like a “black box”, where its internal processes remain hidden or not easily understood. As a result, the model generates predictions without explaining why specific results are produced. This approach makes QSAR models useful for virtual screening, enabling the evaluation of large virtual libraries to identify new compounds with desirable properties. A key benefit of this method is that the required computations are relatively straightforward and cost-efficient.
Another important application of QSAR models is identifying which chemical properties of studied compounds contribute most significantly to their biological effects. The interpretability of a QSAR model depends on both the selected descriptors and the applied machine learning techniques.
A QSAR model is considered interpretable when its descriptors have clear, well-defined meanings that correspond to specific physicochemical or structural properties. Examples include structural keys, molecular fragments, and 3D molecular interaction field descriptors like CoMFA and GRID. In contrast, descriptors such as topological indices or hashed fingerprints lack straightforward interpretations and are better suited for black-box QSAR models.
Machine learning techniques also play a crucial role in model interpretability. Non-linear methods like neural networks (NNs) [159] and Support Vector Machines (SVMs) [139] generally do not provide simple interpretations, making them more suitable for black-box applications [160,161].
Conversely, Partial Least Squares (PLS), a linear machine learning method, is widely used in QSAR modeling because it generates regression equations with coefficients that directly indicate how specific descriptors influence the biological activity [162]. However, PLS assumes a linear relationship between descriptors and biological activity, which can be a limitation when the true relationship is non-linear. In such cases, it becomes necessary to introduce square and cross-terms of descriptors by multiplying pairs of descriptors to create “cross-term blocks”. While these terms can enhance the model’s predictive accuracy, they also make interpretation more complex, especially when higher-order terms are included.
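The expansion with square and cross-terms is mechanical, as the sketch below shows for one compound. The function name `expand_with_cross_terms`, the naming scheme for the new terms, and the descriptor values are assumptions.

```python
from itertools import combinations

def expand_with_cross_terms(row, names):
    """Add squared terms and pairwise products to one compound's descriptor row."""
    expanded = dict(zip(names, row))
    # Square terms, one per original descriptor.
    for name, value in zip(names, row):
        expanded[f"{name}^2"] = value * value
    # Cross-terms, one per unordered pair of descriptors.
    for (n1, v1), (n2, v2) in combinations(zip(names, row), 2):
        expanded[f"{n1}*{n2}"] = v1 * v2
    return expanded

# Hypothetical scaled descriptor values for a single compound.
terms = expand_with_cross_terms([2.0, 3.0, 0.5], ["logP", "HBD", "PSA"])
print(len(terms), terms)
```

With k original descriptors the expanded block holds k + k + k(k − 1)/2 terms, so three descriptors become nine and thirty become 495. This growth is one reason the expanded models are harder to interpret and easier to overfit.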

3.9. Applicability Domain and Methods for Its Identification

The applicability domain (AD) defines a theoretical region within the chemical space determined by the descriptors used to build a QSAR model. It indicates where the model can be reliably applied. Predictions for new compounds are trustworthy only if their chemical space falls within the AD established by the model’s training set.
Several methods are used to determine a model’s AD [163]. One common approach involves checking whether the compound being predicted is an outlier relative to the training set. Regression methods like PLS have built-in tools for outlier detection [122,164]. A PLS regression model can identify outliers in two ways: by evaluating the compound’s position in the PLS score plot and by analyzing the residuals of its descriptors. Algorithms have been developed for both methods to calculate the statistical probability of a compound being an outlier [122,164]. Another approach is based on chemical similarity, assuming that a compound’s prediction will be reliable if it resembles one or more compounds from the training set. Similarity can be quantified using metrics such as Euclidean or Manhattan distances and linkage methods like single, average, or complete linkage [165].
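A similarity-based AD check can be sketched with Euclidean distances alone. In the example below a query compound is considered inside the domain if its distance to the nearest training compound does not exceed the largest nearest-neighbor distance observed within the training set. This threshold rule, the function name `in_domain`, and the data are assumptions chosen for simplicity. Published approaches use a variety of distance measures and cut-offs.

```python
import math

def in_domain(query, train_X):
    """Distance-based AD check using the nearest training compound."""
    # Nearest-neighbor distance of every training compound to the rest of the set.
    nn_train = [
        min(math.dist(x, other) for j, other in enumerate(train_X) if j != i)
        for i, x in enumerate(train_X)
    ]
    threshold = max(nn_train)  # assumption: the loosest spacing seen in training
    nearest = min(math.dist(query, x) for x in train_X)
    return nearest <= threshold

# Hypothetical training compounds in a two-descriptor space.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), train_X))  # lies between training compounds
print(in_domain((5.0, 5.0), train_X))  # far from all training compounds
```

Descriptors should be put on a common scale before distances are computed, otherwise the descriptor with the largest numerical range dominates the result.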
A more advanced method is conformal prediction, which addresses limitations in the predictive accuracy of traditional statistical methods like regression and classification [166,167]. This approach has two main components: (1) prediction regions, which take the form of intervals for regression problems and sets of labels for classification tasks; and (2) a nonconformity score, which measures how “unusual” a new observation is compared to examples in the training set.
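For regression, both components can be shown in a few lines using the split (inductive) variant, in which a calibration set held out from training supplies the nonconformity scores. The sketch assumes the absolute residual as the nonconformity score. The function name `conformal_interval` and all numbers are hypothetical.

```python
import math

def conformal_interval(cal_true, cal_pred, new_pred, alpha=0.25):
    """Split conformal interval around new_pred with target coverage 1 - alpha."""
    # Nonconformity score: absolute residual on each calibration compound.
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)

# Hypothetical calibration activities and model predictions.
cal_true = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]
cal_pred = [5.25, 5.0, 6.125, 6.0, 7.75, 7.5, 9.0]
print(conformal_interval(cal_true, cal_pred, new_pred=6.5))  # (5.75, 7.25)
```

The width of the interval is set entirely by how well the model performed on the calibration compounds, so a poorly fitting model yields wide, less informative intervals rather than overconfident point predictions.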

4. Lead Identification and Optimization

Drug design and discovery is a complex and time-intensive process, with most drug candidates failing during pre-clinical or clinical trials. QSAR approaches help accelerate this process by identifying potential hits from large compound libraries. These hits can then be acquired and tested experimentally for biological activity. Molecules with confirmed activity undergo further optimization to design promising drug candidates, reducing the need for extensive synthesis and experimental testing, thereby saving both time and costs.
Figure 3 illustrates the concept behind rational drug design [168]. Certain regions of the chemical space are well-characterized based on existing experimental data. QSAR models built on these data can predict the activity of new compounds with potentially improved properties.
Statistical Molecular Design (SMD) [38,76] organizes experiments within an information-gathering framework, enabling optimal predictions in specific regions of the chemical space. The goal is to create an accurate, valid model that effectively covers targeted chemical areas. Additionally, SMD can be used to relate the chemical features of compounds to their biological activity (BA) through regression modeling [38]. A regression equation similar to Equation (4) is constructed, where x1 represents a chemical compound expressed by k predictor variables. These variables capture chemical features, including higher-order or cross-product terms that account for the non-linear nature of BA. The regression coefficient, b, corresponds to each k variable, while a represents the residual BA for each compound. The objective is to minimize the error in the regression coefficients b of the QSAR model and to enhance the likelihood of deriving a robust and predictive model [169].
When predicted compounds fall outside previously explored regions, the process is termed lead discovery, involving the identification of entirely new chemical entities with novel properties. If the predicted compounds lie within previously explored areas, the process is referred to as lead optimization, focusing on refining the molecular structure for improved performance.
Additional experiments can be conducted to link distinct regions of the chemical space, enabling QSAR models to be updated with new data. This process expands the model’s applicability, unifies previously separate regions, and enhances prediction accuracy for unexplored areas. The cycle continues as more regions are modeled and explored.
In drug development, three key stages of molecules are recognized: hit, lead, and drug [170]. A hit is a molecule that contains a pharmacophore but has low affinity for its target receptor (~50 mmol/L). Through molecular design and optimization, the hit evolves into a lead, which exhibits improved activity and selectivity. Further optimization of the lead structure transforms it into a drug, characterized by optimal potency, selectivity, and pharmacokinetic properties. In this context, the drug must comply with Lipinski’s Rule of Five [171], a set of guidelines for evaluating drug-likeness based on molecular properties such as molecular weight, lipophilicity, hydrogen bond donors, and acceptors. This rule helps ensure good absorption, distribution, metabolism, and excretion (ADME) profiles for potential drug candidates. The distinguishing features of these three molecular types are summarized in Table 3.
QSAR is one of the LBDD methods used for lead identification and optimization. Table 4 presents additional techniques categorized based on whether the ligand structure or the target structure is known.
Most of these methods are beyond the scope of the current review; therefore, they will not be discussed.
Once the lead structure has been identified, the molecule must undergo further optimization to improve its affinity, selectivity, and ADMET properties. ADMET stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity, which all influence how the drug behaves in the body and its pharmacological activity. It is important to recognize that lead optimization is a multi-objective process, with several factors influencing a compound’s druggability.
Historically, traditional lead optimization methods often relied on trial-and-error, as well as the expertise and judgment of the scientist. In most cases, lead optimization has not been a systematic process, with various properties being optimized at different stages. As a result, improvements in one area, such as absorption, may lead to deterioration in another, such as excretion. This is because it can be difficult to simultaneously analyze and incorporate all available data and information into the optimization process.
Today, it is standard practice to exclude compounds from screening libraries (whether virtual or real) that do not meet Lipinski’s Rule of Five [171], along with other filters like Ghose [172], Egan [173], Veber [174], and Muegge [175]. These filters help significantly narrow the search space. Once optimized, the drug can then proceed to pre-clinical and, eventually, clinical trials.
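A Rule of Five filter is simple to express once the four properties have been calculated. The sketch below uses the standard cut-offs (molecular weight up to 500 Da, logP up to 5, at most 5 hydrogen bond donors and 10 acceptors). The function names, the default of tolerating one violation, and the property values of the two example compounds are assumptions. In a real workflow the properties would come from a cheminformatics toolkit.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Rule of Five violations from precomputed molecular properties."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])

def passes_rule_of_five(mw, logp, hbd, hba, max_violations=1):
    """Assumption: a compound is kept if it has at most max_violations violations."""
    return lipinski_violations(mw, logp, hbd, hba) <= max_violations

# Hypothetical library entries: (name, MW, logP, H-bond donors, H-bond acceptors).
library = [
    ("cmpd_A", 350.4, 2.1, 2, 5),
    ("cmpd_B", 720.0, 6.3, 6, 12),
]
kept = [name for name, *props in library if passes_rule_of_five(*props)]
print(kept)
```

The other filters named above follow the same pattern with different properties and cut-offs, so they can be chained as successive passes over the library.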
The era of AI has brought significant advancements, transforming productivity, effectiveness, innovation, and success rates in drug discovery and development. AI has demonstrated notable successes in CADD, including target identification [176], high-throughput screening [177], structure-based drug design [178], de novo design [179], protein folding prediction from sequences [180], ADMET prediction [181], druggability prediction [182], and QSAR modeling and lead optimization [183,184], among others [185,186,187].

5. Breast Cancer and Current Treatment Strategies

Cancer refers to a group of diseases marked by uncontrolled cell growth and division, resulting in the formation of a primary tumor that invades and damages surrounding tissues. It can also spread to other parts of the body through a process called metastasis, responsible for 90% of cancer-related deaths globally [188].
Breast cancer (BC) can occur in women of any age after puberty and is widespread globally. According to the World Health Organization (WHO), as reported on 13 March 2024, BC affected 2.3 million women and caused 670,000 deaths in 2022 [189]. Factors that increase the risk of breast cancer include age, obesity, excessive alcohol consumption, a family history of breast cancer, a history of radiation exposure, reproductive history (such as the age at first menstrual period or first pregnancy), tobacco use, and postmenopausal hormone status [190]. Unfortunately, half of BC cases occur in women with no specific risk factors other than age (over 40 years) and gender (female).
In breast tissue, the hormones estrogen and progesterone are primary regulators of cell proliferation and differentiation [191]. These hormones pass through the cell membrane and bind to their corresponding receptor monomers—ERs and PRs. Upon binding, the receptors dimerize and are transported into the nucleus, where they activate signaling pathways. Extracellular estrogens bind to membrane receptors and activate the PI3K/AKT signaling pathway, which controls various cellular processes such as cancer formation and progression, apoptosis, angiogenesis, and cell cycle progression [192].
As a result, hormone therapy, including ER blockers (also known as antiestrogens) and aromatase inhibitors (AIs), is a major therapeutic strategy (Figure 4) [193]. Antiestrogens are medications that prevent transcriptional stimulation by estrogen receptor complexes [194]. There are two main types of antiestrogens: (1) nonsteroidal antiestrogens, also known as selective estrogen receptor modulators (SERMs), and (2) pure antiestrogens, which are analogs of natural hormones with long, flexible side chains at the C-7 position.
An alternative strategy to achieving antiestrogenic effects involves inhibiting aromatase, the enzyme responsible for converting androgens into estradiol and estrone [195,196,197]. Aromatase inhibitors (AIs) are used to treat postmenopausal women with hormone-positive breast cancer, where the primary source of estrogen is aromatase activity in tissues such as the breast, bone, vascular endothelium, and central nervous system [193]. AIs are typically categorized into steroidal (type 1) or nonsteroidal (type 2) inhibitors.
In the absence of estrogen, certain growth factors can activate estrogen receptors, establishing an alternative pathway that promotes endocrine-resistant breast tumors [198]. This activation occurs via phosphorylation of a specific domain in the ER known as the activating function (AF1), triggered by a cascade of kinase-mediated events [199,200]. A critical therapeutic target in oncology is the human epidermal growth factor receptor (EGFR) family [201]. The family comprises four transmembrane tyrosine kinases: EGFR1/ErbB1, HER2/ErbB2, HER3/ErbB3, and HER4/ErbB4. When HER2 is overexpressed, which occurs in 20–30% of aggressive breast cancer cases, it can lead to resistance to hormone therapies, particularly Tamoxifen [202]. HER2 inhibitors include small molecules from various chemical classes and monoclonal antibodies designed to target the receptor [203]. A key example is Trastuzumab (Herceptin®), a humanized monoclonal antibody that binds to the extracellular domain of the HER2 receptor, causing its internalization and degradation [204,205].
Targeting EGFR with small-molecule tyrosine kinase inhibitors (TKIs) is a promising cancer therapy approach [206,207]. Despite the initial success of these drugs, the emergence of EGFR mutations presents a significant challenge to their efficacy. Imatinib, approved in 2001, was the first TKI to reach the clinic, although it targets BCR-ABL rather than EGFR. The first EGFR-TKIs followed, with Gefitinib in 2003 and Erlotinib in 2004, both specifically targeting non-small-cell lung cancer (NSCLC). Lapatinib was approved in 2007 for breast cancer treatment. In the 2010s, seven FDA-approved EGFR-TKIs emerged, including Afatinib, Brigatinib, and Osimertinib. Additionally, four non-FDA-approved EGFR-TKIs, such as Icotinib and Olmutinib, were introduced in China and South Korea.
Other promising biological targets that are integral to cancer cell biology and are currently the focus of extensive research for the development of new compounds with potential anticancer activity include topoisomerases, kinases, tubulin (addressed by microtubule-targeting agents, MTAs), regulators of gene expression, and others. Topoisomerases are crucial enzymes in DNA replication and key targets for topoisomerase inhibitors (TOPO I and II), a vital class of anticancer drugs used to treat various solid tumors, including metastatic breast cancer [208,209]. These inhibitors disrupt DNA strand management during replication, causing cytotoxic effects in cancer cells. Anthracyclines, a foundational class of topoisomerase inhibitors, include clinically significant derivatives such as Doxorubicin, Epirubicin, and Daunorubicin. Doxorubicin, notably versatile, is effective against breast cancer, leukemia, lymphoma, sarcomas, and other cancers [210,211]. Synthetic anthracenediones, such as Mitoxantrone, target TOPO II, while acridine derivatives like Amsacrine inhibit topoisomerases [212,213]. Camptothecin derivatives, including Topotecan and Irinotecan, target TOPO I, and epipodophyllotoxins like Etoposide disrupt TOPO II, interfering with DNA replication in cancer cells [214,215].
Tubulin is a key protein involved in cell division and intracellular transport, making it an important target in cancer therapy [216]. Inhibiting tubulin’s role in microtubule formation induces apoptosis, with microtubule-targeting agents (MTAs) disrupting mitosis and arresting cells during interphase [217]. MTAs interact with tubulin at four key binding sites: Laulimalide, Taxane/Epothilone, Vinca Alkaloid, and colchicine. These agents disrupt microtubule dynamics, halting cell division and promoting cancer cell death [218,219,220,221,222]. The development of multitarget agents, particularly those targeting the colchicine binding site, is a promising strategy for overcoming drug resistance and improving therapeutic efficacy [223]. Fosbretabulin, an FDA-approved drug, targets the colchicine site, offering a potential treatment for thyroid cancer while being less susceptible to multi-drug resistance [224].
Epigenetic therapy aims to modify gene expression without changing the genetic code itself. It involves altering gene activity through mechanisms that do not affect the DNA sequence [225]. Key epigenetic processes include DNA methylation, histone modifications (such as acetylation and methylation), and non-coding RNAs. Among these, DNA methylation and histone tail acetylation/methylation are some of the most studied epigenetic mechanisms [226,227]. DNA methylation is an important epigenetic modification that regulates gene expression and plays a role in normal cell function and embryonic development [228]. Abnormal DNA methylation, such as hypermethylation of tumor suppressor genes or hypomethylation of oncogenes, can lead to tumor development [229]. DNA methylation is mediated by DNA methyltransferases (DNMTs), and inhibiting these enzymes shows potential for cancer therapy. Several DNMT inhibitors (DNMTis), such as Azacitidine and Decitabine, have been approved for clinical use in treating myelodysplastic syndrome and hematological cancers.
Histone methylation is regulated by histone methyltransferases (HMTs), with enzymes like histone lysine-specific demethylases, LSD1 and LSD2, responsible for removing methyl marks [230]. LSD1 is a key regulator of transcription, and research has focused on developing LSD1 inhibitors for cancer treatment [231]. Nine LSD1 inhibitors are currently in clinical trials [232].
Another important epigenetic modification is histone acetylation. Abnormal activity of histone deacetylases (HDACs) has been linked to diseases such as neurodegenerative and cardiovascular disorders [233]. HDAC inhibitors, including Vorinostat, Romidepsin, and Panobinostat, are FDA-approved for cancer treatment and are used primarily for hematologic cancers. These inhibitors have shown strong anticancer effects, particularly for cutaneous T-cell lymphoma and peripheral T-cell lymphoma [234].
Inhibiting signaling pathways involved in tumor growth and proliferation, such as through the use of kinase inhibitors, represents a cornerstone of targeted cancer therapies [235]. These inhibitors function by blocking the activity of kinases, the enzymes responsible for transferring phosphate groups from ATP to specific substrates. This phosphorylation process is essential for regulating cell signaling pathways that govern critical cellular functions, including growth, proliferation, differentiation, and apoptosis. Promising targets for breast cancer therapy include the vascular endothelial growth factor (VEGF) and its receptor VEGFR [236,237], the PI3K/PDPK1/AKT/mTOR kinase pathway [238,239], the Rho Kinase (ROCK) pathway [240,241], the Aurora kinase pathway [242,243,244], and the JAK/STAT pathway [245,246].

6. Case Studies of QSAR Modeling of Anti-Breast Cancer Agents

Cancer, including breast cancer, remains one of the most challenging and extensively studied topics in drug design and discovery, attracting significant efforts from numerous scientific groups. This section reviews some of the most significant and recent QSAR studies on anti-breast cancer agents from the past five years. These studies are divided into two groups based on the type of biological activity assessed. The first group includes QSAR models developed using datasets with measured activities against specific targets. These studies are summarized in Table 5, providing additional information on the compounds’ scaffolds, their mechanism of action on specific targets, the type of QSAR models based on the descriptors used (2D- or 3D-QSAR), and the software employed. The second group includes studies where activities were evaluated in various cell lines, with exact targets and underlying mechanisms yet to be determined. A similar summary of these studies is presented in Table 6.
The following sections showcase successful QSAR case studies in which the developed models were applied to design new derivatives that were subsequently synthesized and experimentally validated. This process validates the models and supports their refinement or the development of improved QSAR models. These studies are marked with asterisks in Table 5 and Table 6.
An outstanding study on 39 selective estrogen receptor (ER) modulators, including partial agonists and antagonists, employed atom-based 3D-QSAR and 3D-pharmacophore searches using PHASE software from Schrödinger’s suite [247]. The dataset selection is noteworthy, as only compounds from co-crystallized structures of complexes with either wild-type or mutated ERs, retrieved from the PDB databank, were considered. The authors built a unique 3D-pharmacophore/3D-QSAR model combining structure-based and ligand-based alignment rules. The derived model was validated using an external test set of 13 compounds and applied to screen 4411 compounds from the National Cancer Institute (NCI) datasets. One of the top predicted compounds, the naturally occurring Brefeldin A (BFA), was identified as showing promising activity against ERα and the MCF-7 cell line, with notable selectivity against the MDA-MB-231 cell line. Further hit-to-lead structure optimization led to the development of twelve novel BFA derivatives. These compounds underwent in vitro and in vivo tests, demonstrating picomolar to nanomolar potencies against ERα. They were evaluated for their antiproliferative activities as p53 stimulators and as agents for BC cell cycle arrest. The selected leads showed promising anticancer activity, a favorable preclinical profile, and notable safety. Some of these compounds are potential candidates for preclinical and clinical trials, offering a promising future for SERM-related breast cancer therapy.
An exemplary study that incorporates several key steps discussed in this review for developing a reliable and robust QSAR model with strong predictive ability focused on ROCK1 and ROCK2 inhibitors [267]. The researchers developed two 3D-QSAR models for these systems, utilizing 34 and 32 compounds from the literature for the training sets and 15 compounds for each test set. Initially, the compounds were divided into four structural clusters, carefully modeled, and optimized at varying computational levels (semi-empirical and ab initio calculations). These compounds were subsequently docked into their respective binding sites, and the resulting bioactive conformations were used for alignment and the calculation of GRIND 3D descriptors. To identify the most significant GRIND variables, a fractional factorial design was applied, and these variables were then incorporated into PLS regression. The authors conducted an in-depth discussion of the 3D-QSAR models, identifying key structural features such as moieties, substituents, linkers, and optimal distances, as well as their favorable and unfavorable contributions to biological activity. Common structural characteristics for both ROCK1 and ROCK2 were also summarized. Additionally, the intermolecular interactions derived from Molecular Docking calculations were thoroughly examined for both systems. The insights gained from the 3D-QSAR models and Molecular Docking studies were employed to design and synthesize nine novel ROCK inhibitors. These compounds were docked into both binding sites, and their predicted intermolecular interactions were analyzed in detail. Experimental testing of the compounds against ROCK1 and ROCK2 revealed that compound C-19 stood out, displaying remarkable activity, with experimental pIC50 values deviating by only 0.4 units from the predicted values. 
Further testing showed that C-19 strongly inhibited ROCK1 (72.64%) and demonstrated potent anticancer effects, including enhanced apoptosis and cell cycle modulation, particularly in pancreatic cancer cell lines. Further assays on C-19 revealed its potential and mechanisms as a multitarget anticancer agent [267].
Another study focused on a 2D-QSAR-guided strategy for designing arylsulfonylhydrazone derivatives targeting human ER-positive breast adenocarcinoma and triple-negative breast (TNBC) adenocarcinoma [274]. The authors developed 2D-QSAR models with strong predictive performance for a series of 26 arylsulfonylhydrazones. A QSAR-based design was conducted, leading to the creation of nine novel derivatives, which were subsequently synthesized and tested. Their predicted anticancer activity, along with the anticipated impact of the indole ring on cytotoxic properties, was confirmed. Notably, seven and eight compounds demonstrated higher activity against MCF-7 and MDA-MB-231 cell lines, respectively, than originally anticipated. This QSAR-driven approach successfully identified several promising leads for anticancer drug development.
A study by Tomorowicz and colleagues serves as an excellent example of rational drug design using a small in-house dataset and 2D-QSAR modeling [273,283]. Their four-step QSAR-assisted rational design of novel 2-[(4-amino-6-R2-1,3,5-triazin-2-yl)methylthio]-N-(1-R1-imidazolidin-2-ylidene)-4-chloro-5-methylbenzenesulfonamide derivatives resulted in a roughly 10-fold enhancement in activity against the HCT-116 and MCF-7 cell lines. The activity improved from 45 μM and 46 μM, respectively, for compound 31 (included in the first step of QSAR modeling) to 3.6 μM and 4.5 μM, respectively, for compound 150 (included in the fourth step of QSAR modeling). The authors further investigated the potential mechanisms of cytotoxicity of the compounds with high selectivity indices by conducting various assays. This comprehensive study indicated that the compound scaffolds hold promise for further lead optimization and development.
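The fold change quoted above can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the reported micromolar values are IC50 concentrations (the text does not name the endpoint) and converts them to the pIC50 scale, pIC50 = −log10(IC50 in mol/L), on which QSAR models are commonly built. The function names are hypothetical.

```python
import math

def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert a concentration in micromolar to the -log10(molar) scale."""
    return -math.log10(ic50_um * 1e-6)

def fold_improvement(before_um: float, after_um: float) -> float:
    """Ratio of the starting to the final concentration (higher = more potent)."""
    return before_um / after_um

# Values quoted in the text for compound 31 -> compound 150
# (assumed here to be IC50 values in micromolar).
reported = {
    "HCT-116": (45.0, 3.6),
    "MCF-7": (46.0, 4.5),
}

for cell_line, (before, after) in reported.items():
    fold = fold_improvement(before, after)
    gain = pic50_from_micromolar(after) - pic50_from_micromolar(before)
    print(f"{cell_line}: {fold:.1f}-fold, +{gain:.2f} log units")
```

Under this assumption, the ratios are 12.5 for HCT-116 and about 10.2 for MCF-7, which corresponds to a gain of about one log unit in both cases.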

7. QSAR: Benefits, Challenges, and Limitations

QSAR serves as an essential tool for streamlining the time-intensive and resource-demanding process of drug design and discovery by identifying promising hits from extensive compound libraries and supporting hit-to-lead optimization. By reducing the need to synthesize and test a large number of compounds, QSAR significantly lowers both time and financial costs. Additionally, QSAR allows the biological activity of newly designed compounds to be predicted before chemical synthesis, facilitating the development of innovative analogs. It also provides critical insights into molecular mechanisms by revealing interactions between specific functional groups in designed molecules and their target enzymes or proteins. Despite its many benefits, QSAR has faced criticism because of the poor statistical quality of many studies in the literature, which has contributed to its undeserved negative reputation. The poor quality of such studies can be attributed to the complexity and specificity of QSAR modeling. To ensure the reliability of QSAR models, scientists must meticulously follow best practices, including the careful selection and preparation of biological data, appropriate selection and reduction of descriptors based on their relevance and quantity, thorough model validation, and clear definition of the applicability domain. Neglecting any of these steps can result in flawed or unreliable QSAR models.
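The validation step mentioned above can be illustrated with leave-one-out (LOO) cross-validation. The following is a minimal sketch, not a procedure taken from any of the reviewed studies: it fits a one-descriptor least-squares line and computes q² = 1 − PRESS/SS, where PRESS is the sum of squared LOO prediction errors and SS is the total sum of squares. The descriptor and activity values are made-up numbers, and all function names are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def r_squared(x, y):
    """Coefficient of determination of the model fitted on all compounds."""
    a, b = fit_line(x, y)
    y_bar = mean(y)
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / ss

def q_squared_loo(x, y):
    """Leave-one-out q2: each compound is predicted by a model built without it."""
    y_bar = mean(y)
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (y[i] - (a + b * x[i])) ** 2
    ss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - press / ss

# Made-up illustrative data: one descriptor and the corresponding pIC50 values.
descriptor = [1.2, 1.8, 2.1, 2.9, 3.4, 3.9, 4.5, 5.0]
activity = [4.9, 5.3, 5.2, 5.9, 6.1, 6.6, 6.8, 7.3]

print(f"r2 = {r_squared(descriptor, activity):.3f}")
print(f"q2 (LOO) = {q_squared_loo(descriptor, activity):.3f}")
```

With this definition, q² can never exceed the fitted r², so a large gap between the two is a simple warning sign of overfitting. Real QSAR models use many descriptors and also require an external test set.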
While QSAR offers numerous advantages, it also has some limitations, including (i) dependence on biological data: the accuracy of QSAR depends on the quality of biological data, which can be affected by experimental errors, potentially leading to false correlations; (ii) limited training datasets: small training sets may fail to adequately represent the studied properties, diminishing the predictive power of the QSAR model; (iii) descriptor challenges: selecting and reducing descriptors can be challenging, especially when they lack clear physical or chemical significance, leading to QSAR models being perceived as “black boxes”; and (iv) validation and applicability domain: ensuring robust model validation and defining the applicability domain are crucial for regulatory applications, but these processes cannot fully guarantee the model’s reliability in all scenarios. By addressing these challenges and adopting rigorous modeling practices, QSAR can continue to be a cornerstone in the advancement of drug discovery and development.
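One common way to make the applicability domain in point (iv) concrete is the leverage approach used in Williams plots. The sketch below is an illustration under stated assumptions rather than a procedure from the reviewed studies: for a model with a single descriptor, the leverage of a compound is h = 1/n + (x − x̄)²/Σ(x_j − x̄)², and a compound is flagged as outside the domain when h exceeds the customary warning value h* = 3(p + 1)/n, where p is the number of descriptors and n is the number of training compounds. The data and function names are hypothetical.

```python
from statistics import mean

def leverage(x_new, x_train):
    """Leverage of a query compound for a one-descriptor linear model."""
    n = len(x_train)
    x_bar = mean(x_train)
    sxx = sum((xi - x_bar) ** 2 for xi in x_train)
    return 1 / n + (x_new - x_bar) ** 2 / sxx

def warning_leverage(n_compounds, n_descriptors=1):
    """Customary threshold h* = 3(p + 1)/n."""
    return 3 * (n_descriptors + 1) / n_compounds

def in_domain(x_new, x_train):
    """True when the query compound lies within the leverage threshold."""
    return leverage(x_new, x_train) <= warning_leverage(len(x_train))

# Made-up descriptor values for a training set of eight compounds.
training_descriptor = [1.2, 1.8, 2.1, 2.9, 3.4, 3.9, 4.5, 5.0]

for query in (3.0, 9.0):
    h = leverage(query, training_descriptor)
    status = "inside" if in_domain(query, training_descriptor) else "outside"
    print(f"descriptor = {query}: h = {h:.2f}, {status} the domain")
```

Here the query at 3.0 lies close to the training mean and falls inside the domain, whereas the query at 9.0 is far beyond the training range and is flagged as an extrapolation. As noted in the text, passing such a check does not guarantee a reliable prediction.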

8. Conclusions

In this review, we reviewed Quantitative Structure–Activity Relationships (QSARs), one of the most widely used ligand-based drug design (LBDD) methods, focusing on their application in the discovery and development of anti-breast cancer drugs over the past five years. We examined critical steps in the QSAR methodology that are essential for its correct application but are often overlooked, which leads to insignificant or misleading models. Current anti-breast cancer treatment strategies were briefly overviewed, along with some targets for future treatments. Finally, we discussed notable works that could serve as models for future applications of this interdisciplinary and complex method and that may help in future drug design and development.

Author Contributions

Conceptualization, M.A. and B.V.; writing—original draft preparation, B.V.; writing—review and editing, M.A. and B.V.; visualization, B.V.; supervision, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study is financed by the European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, project № BG-RRP-2.004-0004-C01.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

During the preparation of this work, the authors used an AI tool for English editing. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design and writing of the review.

Abbreviations

AD: applicability domain
ADMET: Absorption, Distribution, Metabolism, Excretion and Toxicity
ANNs: Artificial Neural Networks
AI: Artificial Intelligence
AIs: aromatase inhibitors
BC: breast cancer
BFA: Brefeldin A
Caco-2: Cancer coli, “colon cancer”
CADD: computer-aided drug design
CoMFA: Comparative Molecular Field Analysis
CoMSIA: Comparative Molecular Similarity Indices Analysis
CoMBINE: Comparative Binding Energy Analysis
CoMMA: Comparative Molecular Moment Analysis
CoRIA: Comparative Residue Interaction Analysis
CoSA: Comparative Spectral Analysis
DHFR: Dihydrofolate reductase
DL: Deep Learning
DNA: Deoxyribonucleic acid
DNMTs: DNA methyltransferases
DTs: Decision Trees
ER: estrogen receptor
EGFR: epidermal growth factor receptor, also known as ErbB
EGFR-TKI: EGFR tyrosine kinase inhibitors
FDA: Food and Drug Administration
HDACs: histone deacetylases
HDAC6: histone deacetylase 6
HER2: human epidermal growth factor receptor 2
HIFA: Hint Interaction Field Analysis
HMTs: histone methyltransferases
Holo-QSAR: Hologram Quantitative Structure–Activity Relationship
HQSAR: Hologram QSAR
HOMO: Highest Occupied Molecular Orbital
HTS: high-throughput screening
HTVS: high-throughput virtual screening
JAK2: Janus kinase 2
k-NN: k-nearest neighbor
LAP: Leucine aminopeptidase
LBDD: ligand-based drug design
LUMO: Lowest Unoccupied Molecular Orbital
LOO: leave-one-out
LGO: leave-group-out
LSD1/LSD2: histone lysine-specific demethylase
MAO: Monoamine oxidases
ML: machine learning
MLR: Multiple Linear Regression
MTAs: microtubule-targeting agents
NCEs: new chemical entities
NCI: National Cancer Institute
NSCLC: Non-small cell lung cancer
OECD: the Organization for Economic Co-operation and Development
PAMPA: Parallel Artificial Membrane Permeability Assay
PAMPA-BBB: Parallel Artificial Membrane Permeability Assay across the Blood–Brain Barrier
PCA: Principal Component Analysis
PCM: Proteochemometrics
PCR: Principal Component Regression
PDB: Protein Data Bank
PKIs: Protein kinase inhibitors
PLS: Partial Least Squares (Regression)
PPs: principal properties
PR: Progesterone Receptor
PI3K/AKT: Phosphatidylinositol 3-kinase and Akt (protein kinase B)
QSARs: Quantitative Structure–Activity Relationships
RMSE: Root Mean Squared Error
RMSEP: Root Mean Squared Error of Prediction
ROCKs: Rho-associated coiled-coil-containing protein kinases
SBDD: structure-based drug design
SERMs: selective estrogen receptor modulators
SMD: Statistical Molecular Design
SSR: sum of squares of the residuals
SS: total sum of squares
SVMs: Support Vector Machines
TOPO I/II: topoisomerase inhibitor 1/2
TNBC: Triple-Negative Breast Cancer
VEGFR: Receptors for vascular endothelial growth factor
WHO: World Health Organization

References

  1. Fischer, E. Einfluss Der Configuration Auf Die Wirkung Der Enzyme. Berichte Dtsch. Chem. Ges. 1894, 27, 2985–2993. [Google Scholar] [CrossRef]
  2. Koshland, D.E. Application of a Theory of Enzyme Specificity to Protein Synthesis. Proc. Natl. Acad. Sci. USA 1958, 44, 98–104. [Google Scholar] [CrossRef]
  3. Ramanathan, A.; Savol, A.; Burger, V.; Chennubhotla, C.S.; Agarwal, P.K. Protein Conformational Populations and Functionally Relevant Substates. Acc. Chem. Res. 2014, 47, 149–156. [Google Scholar] [CrossRef] [PubMed]
  4. Kar, G.; Keskin, O.; Gursoy, A.; Nussinov, R. Allostery and Population Shift in Drug Discovery. Curr. Opin. Pharmacol. 2010, 10, 715–722. [Google Scholar] [CrossRef]
  5. Kenakin, T. Principles: Receptor Theory in Pharmacology. Trends Pharmacol. Sci. 2004, 25, 186–192. [Google Scholar] [CrossRef]
  6. Belorkar, S.A.; Jogaiah, S. Enzymes—Past, Present, and Future. In Protocols and Applications in Enzymology; Elsevier: Amsterdam, The Netherlands, 2022; pp. 1–15. [Google Scholar]
  7. Bardal, S.K.; Waechter, J.E.; Martin, D.S. Basic Principles and Pharmacodynamics. In Applied Pharmacology; Elsevier: Amsterdam, The Netherlands, 2011; pp. 3–16. [Google Scholar]
  8. Mannhold, R.; Kubinyi, H.; Folkers, G. Protein-Ligand Interactions; Böhm, H.-J., Schneider, G., Eds.; Methods and Principles in Medicinal Chemistry; Wiley: Hoboken, NJ, USA, 2003; ISBN 9783527305216. [Google Scholar]
  9. Seidel, T.; Schuetz, D.A.; Garon, A.; Langer, T. The Pharmacophore Concept and Its Applications in Computer-Aided Drug Design. In Progress in the Chemistry of Organic Natural Products 110: Cheminformatics in Natural Product Research; Springer: Berlin/Heidelberg, Germany, 2019; pp. 99–141. [Google Scholar]
  10. Tripathi, N.M.; Bandyopadhyay, A. High Throughput Virtual Screening (HTVS) of Peptide Library: Technological Advancement in Ligand Discovery. Eur. J. Med. Chem. 2022, 243, 114766. [Google Scholar] [CrossRef] [PubMed]
  11. Rester, U. From Virtuality to Reality–Virtual Screening in Lead Discovery and Lead Optimization: A Medicinal Chemistry Perspective. Curr. Opin. Drug Discov. Devel. 2008, 11, 559–568. [Google Scholar] [PubMed]
  12. Finan, C.; Gaulton, A.; Kruger, F.A.; Lumbers, R.T.; Shah, T.; Engmann, J.; Galver, L.; Kelley, R.; Karlsson, A.; Santos, R.; et al. The Druggable Genome and Support for Target Identification and Validation in Drug Development. Sci. Transl. Med. 2017, 9, eaag1166. [Google Scholar] [CrossRef]
  13. Oprea, T.I.; Hasselgren, C. Predicting Target and Chemical Druggability. In Comprehensive Medicinal Chemistry III.; Elsevier: Amsterdam, The Netherlands, 2017; pp. 429–439. [Google Scholar]
  14. Carvalho, A.L.; Trincão, J.; Romão, M.J. X-Ray Crystallography in Drug Discovery. In Ligand-Macromolecular Interactions in Drug Discovery: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2010; pp. 31–56. [Google Scholar]
  15. Emwas, A.-H.; Szczepski, K.; Poulson, B.G.; Chandra, K.; McKay, R.T.; Dhahri, M.; Alahmari, F.; Jaremko, L.; Lachowicz, J.I.; Jaremko, M. NMR as a “Gold Standard” Method in Drug Design and Discovery. Molecules 2020, 25, 4597. [Google Scholar] [CrossRef] [PubMed]
  16. Nogales, E.; Mahamid, J. Bridging Structural and Cell Biology with Cryo-Electron Microscopy. Nature 2024, 628, 47–56. [Google Scholar] [CrossRef]
  17. Appasani, K. Cryo-Electron Microscopy in Structural Biology; CRC Press: Boca Raton, FL, USA, 2024; ISBN 9781003326106. [Google Scholar]
  18. Wishart, D.S. Identifying Putative Drug Targets and Potential Drug Leads. In Molecular Modeling of Proteins; Springer: Berlin/Heidelberg, Germany, 2008; pp. 333–351. [Google Scholar]
  19. Muhammed, M.T.; Aki-Yalcin, E. Pharmacophore Modeling in Drug Discovery: Methodology and Current Status. J. Turk. Chem. Soc. Sect. A Chem. 2021, 8, 749–762. [Google Scholar] [CrossRef]
  20. Lee, J.Y.; Krieger, J.M.; Li, H.; Bahar, I. Pharmmaker: Pharmacophore Modeling and Hit Identification Based on Druggability Simulations. Protein Sci. 2020, 29, 76–86. [Google Scholar] [CrossRef] [PubMed]
  21. Giordano, D.; Biancaniello, C.; Argenio, M.A.; Facchiano, A. Drug Design by Pharmacophore and Virtual Screening Approach. Pharmaceuticals 2022, 15, 646. [Google Scholar] [CrossRef]
  22. López-Pérez, K.; Avellaneda-Tamayo, J.F.; Chen, L.; López-López, E.; Juárez-Mercado, K.E.; Medina-Franco, J.L.; Miranda-Quintana, R.A. Molecular Similarity: Theory, Applications, and Perspectives. Artif. Intell. Chem. 2024, 2, 100077. [Google Scholar] [CrossRef]
  23. Stumpfe, D.; Bajorath, J. Similarity Searching. WIREs Comput. Mol. Sci. 2011, 1, 260–282. [Google Scholar] [CrossRef]
  24. Yu, W.; MacKerell, A.D. Computer-Aided Drug Design Methods. In Antibiotics: Methods and Protocols; Springer: Berlin/Heidelberg, Germany, 2017; pp. 85–106. [Google Scholar]
  25. Shim, J.; MacKerell, A.D., Jr. Computational Ligand-Based Rational Design: Role of Conformational Sampling and Force Fields in Model Development. Medchemcomm 2011, 2, 356. [Google Scholar] [CrossRef] [PubMed]
  26. Yang, S.-Y. Pharmacophore Modeling and Applications in Drug Discovery: Challenges and Recent Advances. Drug Discov. Today 2010, 15, 444–450. [Google Scholar] [CrossRef] [PubMed]
  27. Meyer, H. Zur Theorie Der Alkoholnarkose. Arch. Für Exp. Pathol. Pharmakol. 1899, 42, 109–118. [Google Scholar] [CrossRef]
  28. Meyer, H. Zur Theorie Der Alkoholnarkose. Arch. Für Exp. Pathol. Pharmakol. 1901, 46, 338–346. [Google Scholar] [CrossRef]
  29. Meyer, K.H. Contributions to the Theory of Narcosis. Trans. Faraday Soc. 1937, 33, 1062. [Google Scholar] [CrossRef]
  30. Overton, C.E. Studien Über Die Narkose Zugleich Ein Beitrag Zur Allgemeinen Pharmakologie; Fischer: Jena, Germany, 1901. [Google Scholar]
  31. Hammett, L.P. The Effect of Structure upon the Reactions of Organic Compounds. Benzene Derivatives. J. Am. Chem. Soc. 1937, 59, 96–103. [Google Scholar] [CrossRef]
  32. Hansch, C.; Fujita, T. P-σ-π Analysis. A Method for the Correlation of Biological Activity and Chemical Structure. J. Am. Chem. Soc. 1964, 86, 1616–1626. [Google Scholar] [CrossRef]
  33. Fujita, T.; Iwasa, J.; Hansch, C. A New Substituent Constant, π, Derived from Partition Coefficients. J. Am. Chem. Soc. 1964, 86, 5175–5180. [Google Scholar] [CrossRef]
  34. Free, S.M.; Wilson, J.W. A Mathematical Contribution to Structure-Activity Studies. J. Med. Chem. 1964, 7, 395–399. [Google Scholar] [CrossRef] [PubMed]
  35. Weaver, S.; Gleeson, M.P. The Importance of the Domain of Applicability in QSAR Modeling. J. Mol. Graph. Model. 2008, 26, 1315–1326. [Google Scholar] [CrossRef] [PubMed]
  36. Drews, J. Drug Discovery: A Historical Perspective. Science 2000, 287, 1960–1964. [Google Scholar] [CrossRef]
  37. Franco, L.S.; de Jesus, B.d.S.M.; Pinheiro, P.d.S.M.; Fraga, C.A.M. Remapping the Chemical Space and the Pharmacological Space of Drugs: What Can We Expect from the Road Ahead? Pharmaceuticals 2024, 17, 742. [Google Scholar] [CrossRef] [PubMed]
  38. Linusson, A.; Elofsson, M.; Andersson, I.E.; Dahlgren, M.K. Statistical Molecular Design of Balanced Compound Libraries for QSAR Modeling. Curr. Med. Chem. 2010, 17, 2001–2016. [Google Scholar] [CrossRef]
  39. Jolliffe, I.T.; Cadima, J. Principal Component Analysis: A Review and Recent Developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef]
  40. Salem, N.; Hussein, S. Data Dimensional Reduction and Principal Components Analysis. Procedia Comput. Sci. 2019, 163, 292–299. [Google Scholar] [CrossRef]
  41. Ramachandran, K.M.; Tsokos, C.P. Design of Experiments. In Mathematical Statistics with Applications in R.; Elsevier: Amsterdam, The Netherlands, 2015; pp. 459–494. [Google Scholar]
  42. Lundstedt, T.; Seifert, E.; Abramo, L.; Thelin, B.; Nyström, Å.; Pettersen, J.; Bergman, R. Experimental Design and Optimization. Chemom. Intell. Lab. Syst. 1998, 42, 3–40. [Google Scholar] [CrossRef]
  43. Linusson, A.; Gottfries, J.; Lindgren, F.; Wold, S. Statistical Molecular Design of Building Blocks for Combinatorial Chemistry. J. Med. Chem. 2000, 43, 1320–1328. [Google Scholar] [CrossRef] [PubMed]
  44. He, S.; Zhiti, A.; Barba-Bon, A.; Hennig, A.; Nau, W.M. Real-Time Parallel Artificial Membrane Permeability Assay Based on Supramolecular Fluorescent Artificial Receptors. Front. Chem. 2020, 8, 597927. [Google Scholar] [CrossRef]
  45. Mensch, J.; Melis, A.; Mackie, C.; Verreck, G.; Brewster, M.E.; Augustijns, P. Evaluation of Various PAMPA Models to Identify the Most Discriminating Method for the Prediction of BBB Permeability. Eur. J. Pharm. Biopharm. 2010, 74, 495–502. [Google Scholar] [CrossRef]
  46. van Breemen, R.B.; Li, Y. Caco-2 Cell Permeability Assays to Measure Drug Absorption. Expert Opin. Drug Metab. Toxicol. 2005, 1, 175–185. [Google Scholar] [CrossRef]
  47. Li, N.; Kulkarni, P.; Badrinarayanan, A.; Kefelegn, A.; Manoukian, R.; Li, X.; Prasad, B.; Karasu, M.; McCarty, W.J.; Knutson, C.G.; et al. P-Glycoprotein Substrate Assessment in Drug Discovery: Application of Modeling to Bridge Differential Protein Expression Across In Vitro Tools. J. Pharm. Sci. 2021, 110, 325–337. [Google Scholar] [CrossRef] [PubMed]
  48. Ladumor, M.K.; Tiwari, S.; Patil, A.; Bhavsar, K.; Jhajra, S.; Prasad, B.; Singh, S. High-Resolution Mass Spectrometry in Metabolite Identification. In Comprehensive Analytical Chemistry; Elsevier: Amsterdam, The Netherlands, 2016; pp. 199–229. [Google Scholar]
  49. Niles, A.L.; Moravec, R.A.; Riss, T.L. Update on in Vitro Cytotoxicity Assays for Drug Development. Expert Opin. Drug Discov. 2008, 3, 655–669. [Google Scholar] [CrossRef]
  50. van der Laan, J.-W.; Spindler, P. The in Vivo Rodent Test Systems for Assessment of Carcinogenic Potential. Regul. Toxicol. Pharmacol. 2002, 35, 122–125. [Google Scholar] [CrossRef] [PubMed]
  51. Guy, R.C. Ames Test. In Encyclopedia of Toxicology; Elsevier: Amsterdam, The Netherlands, 2024; pp. 377–379. [Google Scholar]
  52. Ishidate, M.; Miura, K.F.; Sofuni, T. Chromosome Aberration Assays in Genetic Toxicology Testing in Vitro. Mutat. Res. Mol. Mech. Mutagen. 1998, 404, 167–172. [Google Scholar] [CrossRef] [PubMed]
  53. Nettleton, D.O.; Einolf, H.J. Assessment of Cytochrome P450 Enzyme Inhibition and Inactivation in Drug Discovery and Development. Curr. Top. Med. Chem. 2011, 11, 382–403. [Google Scholar] [CrossRef] [PubMed]
  54. Lin, J.H. CYP Induction-Mediated Drug Interactions: In Vitro Assessment and Clinical Implications. Pharm. Res. 2006, 23, 1089–1116. [Google Scholar] [CrossRef]
  55. Sugimoto, H.; Matsumoto, S.-I.; Tachibana, M.; Niwa, S.-I.; Hirabayashi, H.; Amano, N.; Moriwaki, T. Establishment of In Vitro P-Glycoprotein Inhibition Assay and Its Exclusion Criteria to Assess the Risk Of Drug–Drug Interaction at the Drug Discovery Stage. J. Pharm. Sci. 2011, 100, 4013–4023. [Google Scholar] [CrossRef] [PubMed]
  56. Lanevskij, K.; Didziapetris, R. Physicochemical QSAR Analysis of Passive Permeability Across Caco-2 Monolayers. J. Pharm. Sci. 2019, 108, 78–86. [Google Scholar] [CrossRef]
  57. Furuhama, A.; Kitazawa, A.; Yao, J.; Matos dos Santos, C.E.; Rathman, J.; Yang, C.; Ribeiro, J.V.; Cross, K.; Myatt, G.; Raitano, G.; et al. Evaluation of QSAR Models for Predicting Mutagenicity: Outcome of the Second Ames/QSAR International Challenge Project. SAR QSAR Environ. Res. 2023, 34, 983–1001. [Google Scholar] [CrossRef]
  58. Sun, L.; Yang, H.; Li, J.; Wang, T.; Li, W.; Liu, G.; Tang, Y. In Silico Prediction of Compounds Binding to Human Plasma Proteins by QSAR Models. ChemMedChem 2018, 13, 572–581. [Google Scholar] [CrossRef] [PubMed]
  59. Cappelli, C.I.; Benfenati, E.; Cester, J. Evaluation of QSAR Models for Predicting the Partition Coefficient (LogP) of Chemicals under the REACH Regulation. Environ. Res. 2015, 143, 26–32. [Google Scholar] [CrossRef] [PubMed]
  60. Mansouri, K.; Cariello, N.F.; Korotcov, A.; Tkachenko, V.; Grulke, C.M.; Sprankle, C.S.; Allen, D.; Casey, W.M.; Kleinstreuer, N.C.; Williams, A.J. Open-Source QSAR Models for PKa Prediction Using Multiple Machine Learning Approaches. J. Cheminform. 2019, 11, 60. [Google Scholar] [CrossRef]
  61. Gozalbes, R.; Pineda-Lucena, A. QSAR-Based Solubility Model for Drug-like Compounds. Bioorg. Med. Chem. 2010, 18, 7078–7084. [Google Scholar] [CrossRef]
  62. Srimathi, R.; Kathiravan, M. Lead Optimization of 4-(Thio)-Chromenone 6-O-Sulfamate Analogs Using QSAR, Molecular Docking and DFT—A Combined Approach as Steroidal Sulfatase Inhibitors. J. Recept. Signal Transduct. 2021, 41, 123–137. [Google Scholar] [CrossRef]
  63. Arian, R.; Hariri, A.; Mehridehnavi, A.; Fassihi, A.; Ghasemi, F. Protein Kinase Inhibitors’ Classification Using K-Nearest Neighbor Algorithm. Comput. Biol. Chem. 2020, 86, 107269. [Google Scholar] [CrossRef] [PubMed]
  64. Myshkin, E.; Brennan, R.; Khasanova, T.; Sitnik, T.; Serebriyskaya, T.; Litvinova, E.; Guryanov, A.; Nikolsky, Y.; Nikolskaya, T.; Bureeva, S. Prediction of Organ Toxicity Endpoints by QSAR Modeling Based on Precise Chemical-Histopathology Annotations. Chem. Biol. Drug Des. 2012, 80, 406–416. [Google Scholar] [CrossRef]
  65. Lowe, C.N.; Charest, N.; Ramsland, C.; Chang, D.T.; Martin, T.M.; Williams, A.J. Transparency in Modeling through Careful Application of OECD’s QSAR/QSPR Principles via a Curated Water Solubility Data Set. Chem. Res. Toxicol. 2023, 36, 465–478. [Google Scholar] [CrossRef]
  66. Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inform. 2010, 29, 476–488. [Google Scholar] [CrossRef]
  67. Bassani, D.; Moro, S. Past, Present, and Future Perspectives on Computer-Aided Drug Design Methodologies. Molecules 2023, 28, 3906. [Google Scholar] [CrossRef]
  68. Golbraikh, A.; Wang, X.S.; Zhu, H.; Tropsha, A. Predictive QSAR Modeling: Methods and Applications in Drug Discovery and Chemical Risk Assessment. In Handbook of Computational Chemistry; Springer International Publishing: Cham, Switzerland, 2017; pp. 2303–2340. [Google Scholar]
  69. PubChem. Available online: https://pubchem.ncbi.nlm.nih.gov (accessed on 24 December 2024).
  70. ChemSpider. Available online: https://www.chemspider.com (accessed on 24 December 2024).
  71. DrugBank. Available online: https://go.drugbank.com (accessed on 24 December 2024).
  72. ChEMBL. Available online: https://www.ebi.ac.uk/chembl (accessed on 24 December 2024).
  73. NCI60. Available online: https://dtp.cancer.gov/discovery_development/nci-60/ (accessed on 24 December 2024).
  74. STITCH. Available online: http://stitch.embl.de/ (accessed on 18 January 2025).
  75. BioAssay. Available online: https://www.ncbi.nlm.nih.gov/guide/chemicals-bioassays/ (accessed on 24 December 2024).
  76. Andersson, C.D.; Hillgren, J.M.; Lindgren, C.; Qian, W.; Akfur, C.; Berg, L.; Ekström, F.; Linusson, A. Benefits of Statistical Molecular Design, Covariance Analysis, and Reference Models in QSAR: A Case Study on Acetylcholinesterase. J. Comput. Aided. Mol. Des. 2015, 29, 199–215. [Google Scholar] [CrossRef] [PubMed]
  77. Brereton, R.G. Orthogonality, Uncorrelatedness, and Linear Independence of Vectors. J. Chemom. 2016, 30, 564–566. [Google Scholar] [CrossRef]
  78. Gómez-Jiménez, G.; Gonzalez-Ponce, K.; Castillo-Pazos, D.J.; Madariaga-Mazon, A.; Barroso-Flores, J.; Cortes-Guzman, F.; Martinez-Mayorga, K. The OECD Principles for (Q)SAR Models in the Context of Knowledge Discovery in Databases (KDD). In Advances in Protein Chemistry and Structural Biology; Springer: Berlin/Heidelberg, Germany, 2018; pp. 85–117. [Google Scholar]
  79. Burge, S.; Attwood, T.K.; Bateman, A.; Berardini, T.Z.; Cherry, M.; O’Donovan, C.; Xenarios, L.; Gaudet, P. Biocurators and Biocuration: Surveying the 21st Century Challenges. Database 2012, 2012, bar059. [Google Scholar] [CrossRef] [PubMed]
  80. Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Methods and Principles in Medicinal Chemistry; Wiley: Hoboken, NJ, USA, 2000; ISBN 9783527299133. [Google Scholar]
  81. ACD/Labs. Available online: www.acdlabs.com (accessed on 24 December 2024).
  82. DRAGON. Available online: https://www.talete.mi.it/products/dragon_description.htm (accessed on 24 December 2024).
  83. E-DRAGON. Available online: https://vcclab.org/lab/edragon/ (accessed on 24 December 2024).
  84. Tetko, I.V.; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V.A.; Radchenko, E.V.; Zefirov, N.S.; Makarenko, A.S.; et al. Virtual Computational Chemistry Laboratory—Design and Description. J. Comput. Aided. Mol. Des. 2005, 19, 453–463. [Google Scholar] [CrossRef]
  85. CDK. Available online: https://cdk.github.io/ (accessed on 24 December 2024).
  86. CODESSA. Available online: https://www.codessa-pro.com/index.htm (accessed on 24 December 2024).
  87. Chemical Computing Group Inc Molecular Operating Environment (MOE). 2016. Available online: https://www.chemcomp.com/en/Products.htm (accessed on 24 December 2024).
  88. MOE. Available online: www.chemcomp.com (accessed on 24 December 2024).
  89. MOLD2. Available online: https://www.fda.gov/science-research/bioinformatics-tools/mold2 (accessed on 24 December 2024).
  90. Hong, H.; Xie, Q.; Ge, W.; Qian, F.; Fang, H.; Shi, L.; Su, Z.; Perkins, R.; Tong, W. Mold 2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics. J. Chem. Inf. Model. 2008, 48, 1337–1344. [Google Scholar] [CrossRef]
  91. PowerMV. Available online: https://www.niss.org/research/software/powermv (accessed on 24 December 2024).
  92. PreADMET. Available online: https://preadmet.webservice.bmdrc.org/ (accessed on 24 December 2024).
  93. Sylvester, J.J. Chemistry and Algebra. Nature 1878, 17, 284. [Google Scholar] [CrossRef]
  94. Randic, M. Characterization of Molecular Branching. J. Am. Chem. Soc. 1975, 97, 6609–6615. [Google Scholar] [CrossRef]
  95. Randić, M. The Connectivity Index 25 Years After. J. Mol. Graph. Model. 2001, 20, 19–35. [Google Scholar] [CrossRef]
  96. Kier, L.B.; Hall, L.H. Molecular Connectivity VII: Specific Treatment of Heteroatoms. J. Pharm. Sci. 1976, 65, 1806–1809. [Google Scholar] [CrossRef]
  97. Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17–20. [Google Scholar] [CrossRef] [PubMed]
  98. Bonchev, D.; Trinajstić, N. Overall Molecular Descriptors. 3. Overall Zagreb Indices. SAR QSAR Environ. Res. 2001, 12, 213–236. [Google Scholar] [CrossRef]
  99. Balaban, A.T. Highly Discriminating Distance-Based Topological Index. Chem. Phys. Lett. 1982, 89, 399–404. [Google Scholar] [CrossRef]
  100. Kier, L.B. A Shape Index from Molecular Graphs. Quant. Struct. Relatsh. 1985, 4, 109–116. [Google Scholar] [CrossRef]
  101. Cramer, R.D.; Patterson, D.E.; Bunce, J.D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959–5967. [Google Scholar] [CrossRef] [PubMed]
  102. Klebe, G.; Abraham, U.; Mietzner, T. Molecular Similarity Indices in a Comparative Analysis (CoMSIA) of Drug Molecules to Correlate and Predict Their Biological Activity. J. Med. Chem. 1994, 37, 4130–4146. [Google Scholar] [CrossRef]
  103. Ortiz, A.R.; Pisabarro, M.T.; Gago, F.; Wade, R.C. Prediction of Drug Binding Affinities by Comparative Binding Energy Analysis. J. Med. Chem. 1995, 38, 2681–2691. [Google Scholar] [CrossRef] [PubMed]
  104. Datar, P.A.; Khedkar, S.A.; Malde, A.K.; Coutinho, E.C. Comparative Residue Interaction Analysis (CoRIA): A 3D-QSAR Approach to Explore the Binding Contributions of Active Site Residues with Ligands. J. Comput. Aided. Mol. Des. 2006, 20, 343–360. [Google Scholar] [CrossRef]
  105. Kellogg, G.E.; Semus, S.F.; Abraham, D.J. HINT: A New Method of Empirical Hydrophobic Field Calculation for CoMFA. J. Comput. Aided. Mol. Des. 1991, 5, 545–552. [Google Scholar] [CrossRef] [PubMed]
  106. Silverman, B.D.; Platt, D.E. Comparative Molecular Moment Analysis (CoMMA): 3D-QSAR without Molecular Superposition. J. Med. Chem. 1996, 39, 2129–2140. [Google Scholar] [CrossRef] [PubMed]
  107. Bursi, R.; Dao, T.; van Wijk, T.; de Gooyer, M.; Kellenbach, E.; Verwer, P. Comparative Spectra Analysis (CoSA): Spectra as Three-Dimensional Molecular Descriptors for the Prediction of Biological Activities. J. Chem. Inf. Comput. Sci. 1999, 39, 861–867. [Google Scholar] [CrossRef] [PubMed]
  108. David, R.L. HQSAR: A New, Highly Predictive QSAR Technique. Available online: https://pdfs.semanticscholar.org/efb9/de3a7d30dc445dfc0904da4ff225237be50c.pdf (accessed on 24 December 2024).
  109. Cartier, A.; Rivail, J.-L. Electronic Descriptors in Quantitative Structure—Activity Relationships. Chemom. Intell. Lab. Syst. 1987, 1, 335–347. [Google Scholar] [CrossRef]
  110. Wang, L.; Ding, J.; Pan, L.; Cao, D.; Jiang, H.; Ding, X. Quantum Chemical Descriptors in Quantitative Structure–Activity Relationship Models and Their Applications. Chemom. Intell. Lab. Syst. 2021, 217, 104384. [Google Scholar] [CrossRef]
  111. Danishuddin; Khan, A.U. Descriptors and Their Selection Methods in QSAR Analysis: Paradigm for Drug Design. Drug Discov. Today 2016, 21, 1291–1302. [Google Scholar] [CrossRef] [PubMed]
  112. Roy, K. Topological Descriptors in Drug Design and Modeling Studies. Mol. Divers. 2004, 8, 321–323. [Google Scholar] [CrossRef]
  113. Mapari, S.; Camarda, K.V. Use of Three-Dimensional Descriptors in Molecular Design for Biologically Active Compounds. Curr. Opin. Chem. Eng. 2020, 27, 60–64. [Google Scholar] [CrossRef]
  114. van Speybroeck, V.; Gani, R.; Meier, R.J. The Calculation of Thermodynamic Properties of Molecules. Chem. Soc. Rev. 2010, 39, 1764. [Google Scholar] [CrossRef] [PubMed]
  115. Yao, F.; Coquery, J.; Lê Cao, K.-A. Independent Principal Component Analysis for Biologically Meaningful Dimension Reduction of Large Biological Data Sets. BMC Bioinform. 2012, 13, 24. [Google Scholar] [CrossRef] [PubMed]
  116. Andrada, M.F.; Vega-Hissi, E.G.; Estrada, M.R.; Garro Martinez, J.C. Application of K-Means Clustering, Linear Discriminant Analysis and Multivariate Linear Regression for the Development of a Predictive QSAR Model on 5-Lipoxygenase Inhibitors. Chemom. Intell. Lab. Syst. 2015, 143, 122–129. [Google Scholar] [CrossRef]
  117. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-Means Clustering Algorithms: A Comprehensive Review, Variants Analysis, and Advances in the Era of Big Data. Inf. Sci. 2023, 622, 178–210. [Google Scholar] [CrossRef]
  118. Andrada, M.F.; Vega-Hissi, E.G.; Estrada, M.R.; Garro Martinez, J.C. Impact Assessment of the Rational Selection of Training and Test Sets on the Predictive Ability of QSAR Models. SAR QSAR Environ. Res. 2017, 28, 1011–1023. [Google Scholar] [CrossRef]
  119. Kohonen, T. The Self-Organizing Map. Proc. IEEE 1990, 78, 1464–1480. [Google Scholar] [CrossRef]
  120. Kohonen, T. Exploration of Very Large Databases by Self-Organizing Maps. In Proceedings of the International Conference on Neural Networks (ICNN’97), Houston, TX, USA, 12 June 1997; IEEE: Piscataway, NJ, USA; Volume 1, pp. PL1–PL6. [Google Scholar]
  121. Krzywinski, M.; Altman, N. Multiple Linear Regression. Nat. Methods 2015, 12, 1103–1104. [Google Scholar] [CrossRef]
  122. Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130. [Google Scholar] [CrossRef]
  123. Abdi, H. Partial Least Squares Regression and Projection on Latent Structure Regression (PLS Regression). WIREs Comput. Stat. 2010, 2, 97–106. [Google Scholar] [CrossRef]
  124. Jolliffe, I.T. A Note on the Use of Principal Components in Regression. Appl. Stat. 1982, 31, 300. [Google Scholar] [CrossRef]
  125. Nielsen, F. Hierarchical Clustering. In Introduction to HPC with MPI for Data Science; Springer: Berlin/Heidelberg, Germany, 2016; pp. 195–211. [Google Scholar]
  126. Qu, L.; Pei, Y. A Comprehensive Review on Discriminant Analysis for Addressing Challenges of Class-Level Limitations, Small Sample Size, and Robustness. Processes 2024, 12, 1382. [Google Scholar] [CrossRef]
  127. Lachenbruch, P.A. Discriminant Analysis. In Encyclopedia of Statistical Sciences; Wiley: Hoboken, NJ, USA, 2005. [Google Scholar]
  128. Bewick, V.; Cheek, L.; Ball, J. Statistics Review 14: Logistic Regression. Crit. Care 2005, 9, 112. [Google Scholar] [CrossRef]
  129. Allenbrand, C. Supervised and Unsupervised Learning Models for Pharmaceutical Drug Rating and Classification Using Consumer Generated Reviews. Healthc. Anal. 2024, 5, 100288. [Google Scholar] [CrossRef]
  130. Kaneko, H. Clustering Method for the Construction of Machine Learning Model with High Predictive Ability. Chemom. Intell. Lab. Syst. 2024, 246, 105084. [Google Scholar] [CrossRef]
  131. Ma, C.Y.; Buontempo, F.V.; Wang, X.Z. Inductive Data Mining: Automatic Generation of Decision Trees from Data for QSAR Modelling and Process Historical Data Analysis. In Computer Aided Chemical Engineering; Elsevier: Amsterdam, The Netherlands, 2008; pp. 581–586. [Google Scholar]
  132. Kingsford, C.; Salzberg, S.L. What Are Decision Trees? Nat. Biotechnol. 2008, 26, 1011–1013. [Google Scholar] [CrossRef]
  133. Hecht-Nielsen, R. Counterpropagation Networks. Appl. Opt. 1987, 26, 4979. [Google Scholar] [CrossRef]
  134. Mitchell, M. An Introduction to Genetic Algorithms; The MIT Press: Cambridge, MA, USA, 1996; ISBN 9780262280013. [Google Scholar]
  135. Baskin, I.I.; Ait, A.O.; Halberstam, N.M.; Palyulin, V.A.; Zefirov, N.S. An Approach to the Interpretation of Backpropagation Neural Network Models in QSAR Studies. SAR QSAR Environ. Res. 2002, 13, 35–41. [Google Scholar] [CrossRef]
  136. Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  137. Kohonen, T. An Introduction to Neural Computing. Neural Netw. 1988, 1, 3–16. [Google Scholar] [CrossRef]
  138. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  139. Vapnik, V.N. The Support Vector Method. In International Conference on Artificial Neural Networks; Springer: Berlin/Heidelberg, Germany, 1997; pp. 261–271. [Google Scholar]
  140. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  141. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  142. Blanco-González, A.; Cabezón, A.; Seco-González, A.; Conde-Torres, D.; Antelo-Riveiro, P.; Piñeiro, Á.; Garcia-Fandino, R. The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies. Pharmaceuticals 2023, 16, 891. [Google Scholar] [CrossRef] [PubMed]
  143. Selvaraj, C.; Chandra, I.; Singh, S.K. Artificial Intelligence and Machine Learning Approaches for Drug Design: Challenges and Opportunities for the Pharmaceutical Industries. Mol. Divers. 2022, 26, 1893–1913. [Google Scholar] [CrossRef]
  144. Askr, H.; Elgeldawi, E.; Aboul Ella, H.; Elshaier, Y.A.M.M.; Gomaa, M.M.; Hassanien, A.E. Deep Learning in Drug Discovery: An Integrative Review and Future Challenges. Artif. Intell. Rev. 2023, 56, 5975–6037. [Google Scholar] [CrossRef]
  145. Kawakami, J.K.; Martinez, Y.; Sasaki, B.; Harris, M.; Kurata, W.E.; Lau, A.F. Investigation of a Novel Molecular Descriptor for the Lead Optimization of 4-Aminoquinazolines as Vascular Endothelial Growth Factor Receptor-2 Inhibitors: Application for Quantitative Structure–Activity Relationship Analysis in Lead Optimization. Bioorg. Med. Chem. Lett. 2011, 21, 1371–1375. [Google Scholar] [CrossRef]
  146. Saeys, Y.; Inza, I.; Larrañaga, P. A Review of Feature Selection Techniques in Bioinformatics. Bioinformatics 2007, 23, 2507–2517. [Google Scholar] [CrossRef] [PubMed]
  147. Hira, Z.M.; Gillies, D.F. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data. Adv. Bioinform. 2015, 2015, 198363. [Google Scholar] [CrossRef]
  148. Chen, G.; Chen, J. A Novel Wrapper Method for Feature Selection and Its Applications. Neurocomputing 2015, 159, 219–226. [Google Scholar] [CrossRef]
  149. Naik, A.K.; Kuppili, V. An Embedded Feature Selection Method Based on Generalized Classifier Neural Network for Cancer Classification. Comput. Biol. Med. 2024, 168, 107677. [Google Scholar] [CrossRef]
  150. Eriksson, L.; Jaworska, J.; Worth, A.P.; Cronin, M.T.D.; McDowell, R.M.; Gramatica, P. Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs. Environ. Health Perspect. 2003, 111, 1361–1375. [Google Scholar] [CrossRef]
  151. Berrar, D. Cross-Validation. In Encyclopedia of Bioinformatics and Computational Biology; Elsevier: Amsterdam, The Netherlands, 2019; pp. 542–545. [Google Scholar]
  152. Stone, M. Cross-Validatory Choice and Assessment of Statistical Predictions. J. R. Stat. Soc. Ser. B Stat. Methodol. 1974, 36, 111–133. [Google Scholar] [CrossRef]
  153. Golbraikh, A.; Tropsha, A. Beware of Q2! J. Mol. Graph. Model. 2002, 20, 269–276. [Google Scholar] [CrossRef] [PubMed]
  154. Tropsha, A.; Gramatica, P.; Gombar, V.K. The Importance of Being Earnest: Validation Is the Absolute Essential for Successful Application and Interpretation of QSPR Models. QSAR Comb. Sci. 2003, 22, 69–77. [Google Scholar] [CrossRef]
  155. Alexander, D.L.J.; Tropsha, A.; Winkler, D.A. Beware of R 2: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models. J. Chem. Inf. Model. 2015, 55, 1316–1322. [Google Scholar] [CrossRef]
  156. Schüürmann, G.; Ebert, R.-U.; Chen, J.; Wang, B.; Kühne, R. External Validation and Prediction Employing the Predictive Squared Correlation Coefficient—Test Set Activity Mean vs Training Set Activity Mean. J. Chem. Inf. Model. 2008, 48, 2140–2145. [Google Scholar] [CrossRef] [PubMed]
  157. Consonni, V.; Ballabio, D.; Todeschini, R. Evaluation of Model Predictive Ability by External Validation Techniques. J. Chemom. 2010, 24, 194–201. [Google Scholar] [CrossRef]
  158. Király, P.; Kiss, R.; Kovács, D.; Ballaj, A.; Tóth, G. The Relevance of Goodness-of-Fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type. Mol. Inform. 2022, 41, 2200072. [Google Scholar] [CrossRef] [PubMed]
  159. Mienye, I.D.; Swart, T.G.; Obaido, G. Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications. Information 2024, 15, 517. [Google Scholar] [CrossRef]
  160. Shi, Y. Support Vector Regression-Based QSAR Models for Prediction of Antioxidant Activity of Phenolic Compounds. Sci. Rep. 2021, 11, 8806. [Google Scholar] [CrossRef]
  161. Xu, Y. Deep Neural Networks for QSAR. In Artificial Intelligence in Drug Design; Springer: Berlin/Heidelberg, Germany, 2022; pp. 233–260. [Google Scholar]
  162. Stanton, D.T. QSAR and QSPR Model Interpretation Using Partial Least Squares (PLS) Analysis. Curr. Comput. Aided Drug Des. 2012, 8, 107–127. [Google Scholar] [CrossRef] [PubMed]
  163. Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Consonni, V.; Todeschini, R. Comparison of Different Approaches to Define the Applicability Domain of QSAR Models. Molecules 2012, 17, 4791–4810. [Google Scholar] [CrossRef] [PubMed]
  164. Fernández Pierna, J.A.; Jin, L.; Daszykowski, M.; Wahl, F.; Massart, D.L. A Methodology to Detect Outliers/Inliers in Prediction with PLS. Chemom. Intell. Lab. Syst. 2003, 68, 17–28. [Google Scholar] [CrossRef]
  165. Liu, R.; Wallqvist, A. Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds. J. Chem. Inf. Model. 2019, 59, 181–189. [Google Scholar] [CrossRef] [PubMed]
  166. Alvarsson, J.; Arvidsson McShane, S.; Norinder, U.; Spjuth, O. Predicting With Confidence: Using Conformal Prediction in Drug Discovery. J. Pharm. Sci. 2021, 110, 42–49. [Google Scholar] [CrossRef]
  167. Lampa, S.; Alvarsson, J.; Arvidsson Mc Shane, S.; Berg, A.; Ahlberg, E.; Spjuth, O. Predicting Off-Target Binding Profiles With Confidence Using Conformal Prediction. Front. Pharmacol. 2018, 9, 1256. [Google Scholar] [CrossRef] [PubMed]
  168. Mavromoustakos, T.; Durdagi, S.; Koukoulitsa, C.; Simcic, M.; Papadopoulos, M.G.; Hodoscek, M.; Grdadolnik, S.G. Strategies in the Rational Drug Design. Curr. Med. Chem. 2011, 18, 2517–2530. [Google Scholar] [CrossRef] [PubMed]
  169. Eriksson, L.; Andersson, P.L.; Johansson, E.; Tysklind, M. Megavariate Analysis of Environmental QSAR Data. Part I—A Basic Framework Founded on Principal Component Analysis (PCA), Partial Least Squares (PLS), and Statistical Molecular Design (SMD). Mol. Divers. 2006, 10, 169–186. [Google Scholar] [CrossRef]
  170. Keserű, G.M.; Makara, G.M. Hit Discovery and Hit-to-Lead Approaches. Drug Discov. Today 2006, 11, 741–748. [Google Scholar] [CrossRef]
  171. Lipinski, C.A. Lead- and Drug-like Compounds: The Rule-of-Five Revolution. Drug Discov. Today Technol. 2004, 1, 337–341. [Google Scholar] [CrossRef] [PubMed]
  172. Ghose, A.K.; Viswanadhan, V.N.; Wendoloski, J.J. A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases. J. Comb. Chem. 1999, 1, 55–68. [Google Scholar] [CrossRef] [PubMed]
  173. Egan, W.J.; Merz, K.M.; Baldwin, J.J. Prediction of Drug Absorption Using Multivariate Statistics. J. Med. Chem. 2000, 43, 3867–3877. [Google Scholar] [CrossRef] [PubMed]
  174. Veber, D.F.; Johnson, S.R.; Cheng, H.-Y.; Smith, B.R.; Ward, K.W.; Kopple, K.D. Molecular Properties That Influence the Oral Bioavailability of Drug Candidates. J. Med. Chem. 2002, 45, 2615–2623. [Google Scholar] [CrossRef] [PubMed]
  175. Muegge, I.; Heald, S.L.; Brittelli, D. Simple Selection Criteria for Drug-like Chemical Matter. J. Med. Chem. 2001, 44, 1841–1846. [Google Scholar] [CrossRef] [PubMed]
  176. You, Y.; Lai, X.; Pan, Y.; Zheng, H.; Vera, J.; Liu, S.; Deng, S.; Zhang, L. Artificial Intelligence in Cancer Target Identification and Drug Discovery. Signal Transduct. Target. Ther. 2022, 7, 156. [Google Scholar] [CrossRef] [PubMed]
  177. Wallach, I.; Bernard, D.; Nguyen, K.; Ho, G.; Morrison, A.; Stecula, A.; Rosnik, A.; O’Sullivan, A.M.; Davtyan, A.; Samudio, B.; et al. AI Is a Viable Alternative to High Throughput Screening: A 318-Target Study. Sci. Rep. 2024, 14, 7526. [Google Scholar] [CrossRef]
  178. Isert, C.; Atz, K.; Schneider, G. Structure-Based Drug Design with Geometric Deep Learning. Curr. Opin. Struct. Biol. 2023, 79, 102548. [Google Scholar] [CrossRef]
  179. Janet, J.P.; Mervin, L.; Engkvist, O. Artificial Intelligence in Molecular de Novo Design: Integration with Experiment. Curr. Opin. Struct. Biol. 2023, 80, 102575. [Google Scholar] [CrossRef]
  180. Gomes, P.S.F.C.; Gomes, D.E.B.; Bernardi, R.C. Protein Structure Prediction in the Era of AI: Challenges and Limitations When Applying to in Silico Force Spectroscopy. Front. Bioinforma. 2022, 2, 983306. [Google Scholar] [CrossRef]
  181. Siramshetty, V.B.; Xu, X.; Shah, P. Artificial Intelligence in ADME Property Prediction. In Computational Drug Discovery and Design; Springer: New York, NY, USA, 2024; pp. 307–327. [Google Scholar]
  182. Raies, A.; Tulodziecka, E.; Stainer, J.; Middleton, L.; Dhindsa, R.S.; Hill, P.; Engkvist, O.; Harper, A.R.; Petrovski, S.; Vitsios, D. DrugnomeAI Is an Ensemble Machine-Learning Framework for Predicting Druggability of Candidate Drug Targets. Commun. Biol. 2022, 5, 1291. [Google Scholar] [CrossRef]
  183. Karampuri, A.; Perugu, S. A Breast Cancer-Specific Combinational QSAR Model Development Using Machine Learning and Deep Learning Approaches. Front. Bioinforma. 2024, 3, 1328262. [Google Scholar] [CrossRef] [PubMed]
  184. Huang, Z.; Jiang, S.; Xiao, W. Optimization Method of an Antibreast Cancer Drug Candidate Based on Machine Learning. Comput. Math. Methods Med. 2022, 2022, 4133663. [Google Scholar] [CrossRef] [PubMed]
  185. Shahab, M.; Zheng, G.; Khan, A.; Wei, D.; Novikov, A.S. Machine Learning-Based Virtual Screening and Molecular Simulation Approaches Identified Novel Potential Inhibitors for Cancer Therapy. Biomedicines 2023, 11, 2251. [Google Scholar] [CrossRef] [PubMed]
  186. Lv, Q.; Zhou, F.; Liu, X.; Zhi, L. Artificial Intelligence in Small Molecule Drug Discovery from 2018 to 2023: Does It Really Work? Bioorg. Chem. 2023, 141, 106894. [Google Scholar] [CrossRef]
  187. Rehman, A.U.; Li, M.; Wu, B.; Ali, Y.; Rasheed, S.; Shaheen, S.; Liu, X.; Luo, R.; Zhang, J. Role of Artificial Intelligence in Revolutionizing Drug Discovery. Fundam. Res. 2024, in press. [Google Scholar] [CrossRef]
  188. Gerstberger, S.; Jiang, Q.; Ganesh, K. Metastasis. Cell 2023, 186, 1564–1579. [Google Scholar] [CrossRef] [PubMed]
  189. National Cancer Institute. Cancer Stat Facts: Female Breast Cancer. Available online: https://seer.cancer.gov/statfacts/html/breast.html (accessed on 13 March 2024).
  190. Xu, H.; Xu, B. Breast Cancer: Epidemiology, Risk Factors and Screening. Chinese J. Cancer Res. 2023, 35, 565–583. [Google Scholar] [CrossRef] [PubMed]
  191. Saha, T.; Makar, S.; Swetha, R.; Gutti, G.; Singh, S.K. Estrogen Signaling: An Emanating Therapeutic Target for Breast Cancer Treatment. Eur. J. Med. Chem. 2019, 177, 116–143. [Google Scholar] [CrossRef] [PubMed]
  192. He, Y.; Sun, M.M.; Zhang, G.G.; Yang, J.; Chen, K.S.; Xu, W.W.; Li, B. Targeting PI3K/Akt Signal Transduction for Cancer Therapy. Signal Transduct. Target. Ther. 2021, 6, 425. [Google Scholar] [CrossRef]
  193. Kümler, I.; Knoop, A.S.; Jessing, C.A.R.; Ejlertsen, B.; Nielsen, D.L. Review of Hormone-Based Treatments in Postmenopausal Patients with Advanced Breast Cancer Focusing on Aromatase Inhibitors and Fulvestrant. ESMO Open 2016, 1, e000062. [Google Scholar] [CrossRef]
  194. Ali, S.; Buluwela, L.; Coombes, R.C. Antiestrogens and Their Therapeutic Applications in Breast Cancer and Other Diseases. Annu. Rev. Med. 2011, 62, 217–232. [Google Scholar] [CrossRef]
  195. Generali, D.; Berardi, R.; Caruso, M.; Cazzaniga, M.; Garrone, O.; Minchella, I.; Paris, I.; Pinto, C.; De Placido, S. Aromatase Inhibitors: The Journey from the State of the Art to Clinical Open Questions. Front. Oncol. 2023, 13, 1249160. [Google Scholar] [CrossRef]
  196. Seruga, B.; Tannock, I.F. Up-Front Use of Aromatase Inhibitors As Adjuvant Therapy for Breast Cancer: The Emperor Has No Clothes. J. Clin. Oncol. 2009, 27, 840–842. [Google Scholar] [CrossRef] [PubMed]
  197. Riemsma, R.; Forbes, C.A.; Kessels, A.; Lykopoulos, K.; Amonkar, M.M.; Rea, D.W.; Kleijnen, J. Systematic Review of Aromatase Inhibitors in the First-Line Treatment for Hormone Sensitive Advanced or Metastatic Breast Cancer. Breast Cancer Res. Treat. 2010, 123, 9–24. [Google Scholar] [CrossRef] [PubMed]
  198. Hartkopf, A.D.; Grischke, E.-M.; Brucker, S.Y. Endocrine-Resistant Breast Cancer: Mechanisms and Treatment. Breast Care 2020, 15, 347–354. [Google Scholar] [CrossRef] [PubMed]
  199. Lannigan, D.A. Estrogen Receptor Phosphorylation. Steroids 2003, 68, 1–9. [Google Scholar] [CrossRef] [PubMed]
  200. Anbalagan, M.; Rowan, B.G. Estrogen Receptor Alpha Phosphorylation and Its Functional Impact in Human Breast Cancer. Mol. Cell. Endocrinol. 2015, 418, 264–272. [Google Scholar] [CrossRef] [PubMed]
  201. Rude Voldborg, B.; Damstrup, L.; Spang-Thomsen, M.; Skovgaard Poulsen, H. Epidermal Growth Factor Receptor (EGFR) and EGFR Mutations, Function and Possible Role in Clinical Trials. Ann. Oncol. 1997, 8, 1197–1206. [Google Scholar] [CrossRef] [PubMed]
  202. Fan, W.; Chang, J.; Fu, P. Endocrine Therapy Resistance in Breast Cancer: Current Status, Possible Mechanisms and Overcoming Strategies. Future Med. Chem. 2015, 7, 1511–1519. [Google Scholar] [CrossRef] [PubMed]
  203. Behl, A.; Wani, Z.A.; Das, N.N.; Parmar, V.S.; Len, C.; Malhotra, S.; Chhillar, A.K. Monoclonal Antibodies in Breast Cancer: A Critical Appraisal. Crit. Rev. Oncol. Hematol. 2023, 183, 103915. [Google Scholar] [CrossRef] [PubMed]
  204. Swain, S.M.; Shastry, M.; Hamilton, E. Targeting HER2-Positive Breast Cancer: Advances and Future Directions. Nat. Rev. Drug Discov. 2023, 22, 101–126. [Google Scholar] [CrossRef]
  205. Shepard, H.M. Trastuzumab: Dreams, Desperation and Hope. Nat. Rev. Cancer 2024, 24, 287–288. [Google Scholar] [CrossRef]
  206. Masuda, H.; Zhang, D.; Bartholomeusz, C.; Doihara, H.; Hortobagyi, G.N.; Ueno, N.T. Role of Epidermal Growth Factor Receptor in Breast Cancer. Breast Cancer Res. Treat. 2012, 136, 331–345. [Google Scholar] [CrossRef] [PubMed]
  207. Maennling, A.E.; Tur, M.K.; Niebert, M.; Klockenbring, T.; Zeppernick, F.; Gattenlöhner, S.; Meinhold-Heerlein, I.; Hussain, A.F. Molecular Targeting Therapy against EGFR Family in Breast Cancer: Progress and Future Potentials. Cancers 2019, 11, 1826. [Google Scholar] [CrossRef] [PubMed]
  208. Mastrangelo, S.; Attina, G.; Triarico, S.; Romano, A.; Maurizi, P.; Ruggiero, A. The DNA-Topoisomerase Inhibitors in Cancer Therapy. Biomed. Pharmacol. J. 2022, 15, 553–562. [Google Scholar] [CrossRef]
  209. Yakkala, P.A.; Penumallu, N.R.; Shafi, S.; Kamal, A. Prospects of Topoisomerase Inhibitors as Promising Anti-Cancer Agents. Pharmaceuticals 2023, 16, 1456. [Google Scholar] [CrossRef] [PubMed]
  210. Vuger, A.T.; Tiscoski, K.; Apolinario, T.; Cardoso, F. Anthracyclines in the Treatment of Early Breast Cancer Friend or Foe? Breast 2022, 65, 67–76. [Google Scholar] [CrossRef]
  211. Venkatesh, P.; Kasi, A. Anthracyclines; StatPearls: St. Petersburg, FL, USA, 2023. [Google Scholar]
  212. Malik, M.S.; Alsantali, R.I.; Jassas, R.S.; Alsimaree, A.A.; Syed, R.; Alsharif, M.A.; Kalpana, K.; Morad, M.; Althagafi, I.I.; Ahmed, S.A. Journey of Anthraquinones as Anticancer Agents—A Systematic Review of Recent Literature. RSC Adv. 2021, 11, 35806–35827. [Google Scholar] [CrossRef]
  213. Kozurkova, M. Acridine Derivatives as Inhibitors/Poisons of Topoisomerase II. J. Appl. Toxicol. 2022, 42, 544–552. [Google Scholar] [CrossRef]
  214. Wang, X.; Zhuang, Y.; Wang, Y.; Jiang, M.; Yao, L. The Recent Developments of Camptothecin and Its Derivatives as Potential Anti-Tumor Agents. Eur. J. Med. Chem. 2023, 260, 115710. [Google Scholar] [CrossRef] [PubMed]
  215. Guo, Q.; Jiang, E. Recent Advances in the Application of Podophyllotoxin Derivatives to Fight Against Multidrug-Resistant Cancer Cells. Curr. Top. Med. Chem. 2021, 21, 1712–1724. [Google Scholar] [CrossRef] [PubMed]
  216. Khwaja, S.; Kumar, K.; Das, R.; Negi, A.S. Microtubule Associated Proteins as Targets for Anticancer Drug Development. Bioorg. Chem. 2021, 116, 105320. [Google Scholar] [CrossRef] [PubMed]
  217. Čermák, V.; Dostál, V.; Jelínek, M.; Libusová, L.; Kovář, J.; Rösel, D.; Brábek, J. Microtubule-Targeting Agents and Their Impact on Cancer Treatment. Eur. J. Cell Biol. 2020, 99, 151075. [Google Scholar] [CrossRef] [PubMed]
  218. Churchill, C.D.M.; Klobukowski, M.; Tuszynski, J.A. The Unique Binding Mode of Laulimalide to Two Tubulin Protofilaments. Chem. Biol. Drug Des. 2015, 86, 190–199. [Google Scholar] [CrossRef]
  219. Willson, M.L.; Burke, L.; Ferguson, T.; Ghersi, D.; Nowak, A.K.; Wilcken, N. Taxanes for Adjuvant Treatment of Early Breast Cancer. Cochrane Database Syst. Rev. 2019, 9, CD004421. [Google Scholar] [CrossRef] [PubMed]
  220. Martino, E.; Casamassima, G.; Castiglione, S.; Cellupica, E.; Pantalone, S.; Papagni, F.; Rui, M.; Siciliano, A.M.; Collina, S. Vinca Alkaloids and Analogues as Anti-Cancer Agents: Looking Back, Peering Ahead. Bioorg. Med. Chem. Lett. 2018, 28, 2816–2826. [Google Scholar] [CrossRef]
  221. Jung, H., II.; Shin, I.; Park, Y.M.; Kang, K.W.; Ha, K.-S. Colchicine Activates Actin Polymerization by Microtubule Depolymerization. Mol. Cells 1997, 7, 431–437. [Google Scholar] [CrossRef] [PubMed]
  222. Dhyani, P.; Quispe, C.; Sharma, E.; Bahukhandi, A.; Sati, P.; Attri, D.C.; Szopa, A.; Sharifi-Rad, J.; Docea, A.O.; Mardare, I.; et al. Anticancer Potential of Alkaloids: A Key Emphasis to Colchicine, Vinblastine, Vincristine, Vindesine, Vinorelbine and Vincamine. Cancer Cell Int. 2022, 22, 206. [Google Scholar] [CrossRef]
  223. Podolak, M.; Holota, S.; Deyak, Y.; Dziduch, K.; Dudchak, R.; Wujec, M.; Bielawski, K.; Lesyk, R.; Bielawska, A. Tubulin Inhibitors. Selected Scaffolds and Main Trends in the Design of Novel Anticancer and Antiparasitic Agents. Bioorg. Chem. 2024, 143, 107076. [Google Scholar] [CrossRef] [PubMed]
  224. Smolarczyk, R.; Czapla, J.; Jarosz-Biej, M.; Czerwinski, K.; Cichoń, T. Vascular Disrupting Agents in Cancer Therapy. Eur. J. Pharmacol. 2021, 891, 173692. [Google Scholar] [CrossRef]
  225. Fritz, A.J.; El Dika, M.; Toor, R.H.; Rodriguez, P.D.; Foley, S.J.; Ullah, R.; Nie, D.; Banerjee, B.; Lohese, D.; Tracy, K.M.; et al. Epigenetic-Mediated Regulation of Gene Expression for Biological Control and Cancer: Cell and Tissue Structure, Function, and Phenotype. In Nuclear, Chromosomal, and Genomic Architecture in Biology and Medicine; Springer: Berlin/Heidelberg, Germany, 2022; pp. 339–373. [Google Scholar]
  226. Vietri, M.; D’elia, G.; Benincasa, G.; Ferraro, G.; Caliendo, G.; Nicoletti, G.; Napoli, C. DNA Methylation and Breast Cancer: A Way Forward (Review). Int. J. Oncol. 2021, 59, 98. [Google Scholar] [CrossRef] [PubMed]
  227. Zhuang, J.; Huo, Q.; Yang, F.; Xie, N. Perspectives on the Role of Histone Modification in Breast Cancer Progression and the Advanced Technological Tools to Study Epigenetic Determinants of Metastasis. Front. Genet. 2020, 11, 603552. [Google Scholar] [CrossRef] [PubMed]
  228. Moore, L.D.; Le, T.; Fan, G. DNA Methylation and Its Basic Function. Neuropsychopharmacology 2013, 38, 23–38. [Google Scholar] [CrossRef]
  229. Sonar, S.; Nyahatkar, S.; Kalele, K.; Adhikari, M.D. Role of DNA Methylation in Cancer Development and Its Clinical Applications. Clin. Transl. Discov. 2024, 4, e279. [Google Scholar] [CrossRef]
  230. Verde, G.; Querol-Paños, J.; Cebrià-Costa, J.; Pascual-Reguant, L.; Serra-Bardenys, G.; Iturbide, A.; Peiró, S. Lysine-Specific Histone Demethylases Contribute to Cellular Differentiation and Carcinogenesis. Epigenomes 2017, 1, 4. [Google Scholar] [CrossRef]
  231. Yang, G.-J.; Liu, Y.-J.; Ding, L.-J.; Tao, F.; Zhu, M.-H.; Shi, Z.-Y.; Wen, J.-M.; Niu, M.-Y.; Li, X.; Xu, Z.-S.; et al. A State-of-the-Art Review on LSD1 and Its Inhibitors in Breast Cancer: Molecular Mechanisms and Therapeutic Significance. Front. Pharmacol. 2022, 13, 989575. [Google Scholar] [CrossRef]
  232. Noce, B.; Di Bello, E.; Fioravanti, R.; Mai, A. LSD1 Inhibitors for Cancer Treatment: Focus on Multi-Target Agents and Compounds in Clinical Trials. Front. Pharmacol. 2023, 14, 1120911. [Google Scholar] [CrossRef]
  233. Yoon, S.; Eom, G.H. HDAC and HDAC Inhibitor: From Cancer to Cardiovascular Diseases. Chonnam Med. J. 2016, 52, 1. [Google Scholar] [CrossRef] [PubMed]
  234. Mottamal, M.; Zheng, S.; Huang, T.; Wang, G. Histone Deacetylase Inhibitors in Clinical Studies as Templates for New Anticancer Agents. Molecules 2015, 20, 3898–3941. [Google Scholar] [CrossRef] [PubMed]
  235. Fu, D.; Hu, Z.; Xu, X.; Dai, X.; Liu, Z. Key Signal Transduction Pathways and Crosstalk in Cancer: Biological and Therapeutic Opportunities. Transl. Oncol. 2022, 26, 101510. [Google Scholar] [CrossRef] [PubMed]
  236. Brogowska, K.K.; Zajkowska, M.; Mroczko, B. Vascular Endothelial Growth Factor Ligands and Receptors in Breast Cancer. J. Clin. Med. 2023, 12, 2412. [Google Scholar] [CrossRef]
  237. Liu, Y.; Li, Y.; Wang, Y.; Lin, C.; Zhang, D.; Chen, J.; Ouyang, L.; Wu, F.; Zhang, J.; Chen, L. Recent Progress on Vascular Endothelial Growth Factor Receptor Inhibitors with Dual Targeting Capabilities for Tumor Therapy. J. Hematol. Oncol. 2022, 15, 89. [Google Scholar] [CrossRef]
  238. Glaviano, A.; Foo, A.S.C.; Lam, H.Y.; Yap, K.C.H.; Jacot, W.; Jones, R.H.; Eng, H.; Nair, M.G.; Makvandi, P.; Geoerger, B.; et al. PI3K/AKT/MTOR Signaling Transduction Pathway and Targeted Therapies in Cancer. Mol. Cancer 2023, 22, 138. [Google Scholar] [CrossRef] [PubMed]
  239. Zhu, K.; Wu, Y.; He, P.; Fan, Y.; Zhong, X.; Zheng, H.; Luo, T. PI3K/AKT/MTOR-Targeted Therapy for Breast Cancer. Cells 2022, 11, 2508. [Google Scholar] [CrossRef] [PubMed]
  240. Kim, S.; Kim, S.A.; Han, J.; Kim, I.-S. Rho-Kinase as a Target for Cancer Therapy and Its Immunotherapeutic Potential. Int. J. Mol. Sci. 2021, 22, 12916. [Google Scholar] [CrossRef] [PubMed]
  241. Zhang, C.; Zhang, S.; Zhang, Z.; He, J.; Xu, Y.; Liu, S. ROCK Has a Crucial Role in Regulating Prostate Tumor Growth through Interaction with C-Myc. Oncogene 2014, 33, 5582–5591. [Google Scholar] [CrossRef]
  242. Du, R.; Huang, C.; Liu, K.; Li, X.; Dong, Z. Targeting AURKA in Cancer: Molecular Mechanisms and Opportunities for Cancer Therapy. Mol. Cancer 2021, 20, 15. [Google Scholar] [CrossRef] [PubMed]
  243. Borisa, A.C.; Bhatt, H.G. A Comprehensive Review on Aurora Kinase: Small Molecule Inhibitors and Clinical Trial Studies. Eur. J. Med. Chem. 2017, 140, 1–19. [Google Scholar] [CrossRef]
  244. Kovacs, A.H.; Zhao, D.; Hou, J. Aurora B Inhibitors as Cancer Therapeutics. Molecules 2023, 28, 3385. [Google Scholar] [CrossRef] [PubMed]
  245. Shao, F.; Pang, X.; Baeg, G.H. Targeting the JAK/STAT Signaling Pathway for Breast Cancer. Curr. Med. Chem. 2021, 28, 5137–5151. [Google Scholar] [CrossRef] [PubMed]
  246. Shawky, A.M.; Almalki, F.A.; Abdalla, A.N.; Abdelazeem, A.H.; Gouda, A.M. A Comprehensive Overview of Globally Approved JAK Inhibitors. Pharmaceutics 2022, 14, 1001. [Google Scholar] [CrossRef] [PubMed]
  247. Kurtanović, N.; Tomašević, N.; Matić, S.; Proia, E.; Sabatino, M.; Antonini, L.; Mladenović, M.; Ragno, R. Human Estrogen Receptor Alpha Antagonists, Part 3: 3-D Pharmacophore and 3-D QSAR Guided Brefeldin A Hit-to-Lead Optimization toward New Breast Cancer Suppressants. Molecules 2022, 27, 2823. [Google Scholar] [CrossRef]
  248. Rajagopal, K.; Kalusalingam, A.; Bharathidasan, A.R.; Sivaprakash, A.; Shanmugam, K.; Sundaramoorthy, M.; Byran, G. In Silico Drug Design of Anti-Breast Cancer Agents. Molecules 2023, 28, 4175. [Google Scholar] [CrossRef]
  249. Khaled, D.M.; Elshakre, M.E.; Noamaan, M.A.; Butt, H.; Abdel Fattah, M.M.; Gaber, D.A. A Computational QSAR, Molecular Docking and In Vitro Cytotoxicity Study of Novel Thiouracil-Based Drugs with Anticancer Activity against Human-DNA Topoisomerase II. Int. J. Mol. Sci. 2022, 23, 11799. [Google Scholar] [CrossRef]
  250. Phanus-umporn, C.; Prachayasittikul, V.; Nantasenamat, C.; Prachayasittikul, S.; Prachayasittikul, V. QSAR-Driven Rational Design of Novel DNA Methyltransferase 1 Inhibitors. EXCLI J. 2020, 19, 458–475. [Google Scholar] [CrossRef]
  251. Jawarkar, R.D.; Bakal, R.L.; Mukherjee, N.; Ghosh, A.; Zaki, M.E.A.; AL-Hussain, S.A.; Al-Mutairi, A.A.; Samad, A.; Gandhi, A.; Masand, V.H. QSAR Evaluations to Unravel the Structural Features in Lysine-Specific Histone Demethylase 1A Inhibitors for Novel Anticancer Lead Development Supported by Molecular Docking, MD Simulation and MMGBSA. Molecules 2022, 27, 4758. [Google Scholar] [CrossRef] [PubMed]
  252. Xu, Y.; Fan, B.; Gao, Y.; Chen, Y.; Han, D.; Lu, J.; Liu, T.; Gao, Q.; Zhang, J.Z.; Wang, M. Design Two Novel Tetrahydroquinoline Derivatives against Anticancer Target LSD1 with 3D-QSAR Model and Molecular Simulation. Molecules 2022, 27, 8358. [Google Scholar] [CrossRef] [PubMed]
  253. Xu, Y.; He, Z.; Yang, M.; Gao, Y.; Jin, L.; Wang, M.; Zheng, Y.; Lu, X.; Zhang, S.; Wang, C.; et al. Investigating the Binding Mode of Reversible LSD1 Inhibitors Derived from Stilbene Derivatives by 3D-QSAR, Molecular Docking, and Molecular Dynamics Simulation. Molecules 2019, 24, 4479. [Google Scholar] [CrossRef] [PubMed]
  254. Xu, Y.; He, Z.; Liu, H.; Chen, Y.; Gao, Y.; Zhang, S.; Wang, M.; Lu, X.; Wang, C.; Zhao, Z.; et al. 3D-QSAR, Molecular Docking, and Molecular Dynamics Simulation Study of Thieno[3,2-b]Pyrrole-5-Carboxamide Derivatives as LSD1 Inhibitors. RSC Adv. 2020, 10, 6927–6943. [Google Scholar] [CrossRef]
  255. Aljanabi, R.; Alsous, L.; Sabbah, D.A.; Gul, H.I.; Gul, M.; Bardaweel, S.K. Monoamine Oxidase (MAO) as a Potential Target for Anticancer Drug Design and Development. Molecules 2021, 26, 6019. [Google Scholar] [CrossRef] [PubMed]
  256. Balbuena-Rebolledo, I.; Rivera-Antonio, A.M.; Sixto-López, Y.; Correa-Basurto, J.; Rosales-Hernández, M.C.; Mendieta-Wejebe, J.E.; Martínez-Martínez, F.J.; Olivares-Corichi, I.M.; García-Sánchez, J.R.; Guevara-Salazar, J.A.; et al. Dihydropyrazole-Carbohydrazide Derivatives with Dual Activity as Antioxidant and Anti-Proliferative Drugs on Breast Cancer Targeting the HDAC6. Pharmaceuticals 2022, 15, 690. [Google Scholar] [CrossRef] [PubMed]
  257. Shirbhate, E.; Pandey, J.; Patel, V.K.; Veerasamy, R.; Rajak, H. Exploration of Structure-Activity Relationship Using Integrated Structure and Ligand Based Approach: Hydroxamic Acid-Based HDAC Inhibitors and Cytotoxic Agents. Turk. J. Pharm. Sci. 2023, 20, 270–284. [Google Scholar] [CrossRef]
  258. Bülbül, E.F.; Robaa, D.; Sun, P.; Mahmoudi, F.; Melesina, J.; Zessin, M.; Schutkowski, M.; Sippl, W. Application of Ligand- and Structure-Based Prediction Models for the Design of Alkylhydrazide-Based HDAC3 Inhibitors as Novel Anti-Cancer Compounds. Pharmaceuticals 2023, 16, 968. [Google Scholar] [CrossRef]
  259. Moussaoui, M.; Baammi, S.; Soufi, H.; Baassi, M.; El Allali, A.; Belghiti, M.E.; Daoud, R.; Belaaouad, S. QSAR, ADMET, Molecular Docking, and Dynamics Studies of 1,2,4-Triazine-3(2H)-One Derivatives as Tubulin Inhibitors for Breast Cancer Therapy. Sci. Rep. 2024, 14, 16418. [Google Scholar] [CrossRef]
  260. Mirzaei, S.; Ghodsi, R.; Hadizadeh, F.; Sahebkar, A. 3D-QSAR-Based Pharmacophore Modeling, Virtual Screening, and Molecular Docking Studies for Identification of Tubulin Inhibitors with Potential Anticancer Activity. Biomed Res. Int. 2021, 2021, 6480804. [Google Scholar] [CrossRef] [PubMed]
  261. Banerjee, S.; Mahmud, F.; Deng, S.; Ma, L.; Yun, M.-K.; Fakayode, S.O.; Arnst, K.E.; Yang, L.; Chen, H.; Wu, Z.; et al. X-Ray Crystallography-Guided Design, Antitumor Efficacy, and QSAR Analysis of Metabolically Stable Cyclopenta-Pyrimidinyl Dihydroquinoxalinone as a Potent Tubulin Polymerization Inhibitor. J. Med. Chem. 2021, 64, 13072–13095.
  262. Abdullahi, S.H.; Uzairu, A.; Shallangwa, G.A.; Uba, S.; Umar, A.B. Pharmacokinetic Profiling of Quinazoline-4(3H)-One Analogs as EGFR Inhibitors: 3D-QSAR Modeling, Molecular Docking Studies and the Design of Therapeutic Agents. J. Taibah Univ. Med. Sci. 2023, 18, 1018–1029.
  263. Anwar, S.; Alanazi, J.; Ahemad, N.; Raza, S.; Chohan, T.A.; Saleem, H. Deciphering Quinazoline Derivatives’ Interactions with EGFR: A Computational Quest for Advanced Cancer Therapy through 3D-QSAR, Virtual Screening, and MD Simulations. Front. Pharmacol. 2024, 15, 1399372.
  264. Simeon, S.; Jongkon, N. Construction of Quantitative Structure Activity Relationship (QSAR) Models to Predict Potency of Structurally Diversed Janus Kinase 2 Inhibitors. Molecules 2019, 24, 4393.
  265. Tian, Y.-Y.; Tong, J.-B.; Liu, Y.; Tian, Y. QSAR Study, Molecular Docking and Molecular Dynamic Simulation of Aurora Kinase Inhibitors Derived from Imidazo[4,5-b]Pyridine Derivatives. Molecules 2024, 29, 1772.
  266. Bathula, S.; Sankaranarayanan, M.; Malgija, B.; Kaliappan, I.; Bhandare, R.R.; Shaik, A.B. 2-Amino Thiazole Derivatives as Prospective Aurora Kinase Inhibitors against Breast Cancer: QSAR, ADMET Prediction, Molecular Docking, and Molecular Dynamic Simulation Studies. ACS Omega 2023, 8, 44287–44311.
  267. Beljkas, M.; Petkovic, M.; Vuletic, A.; Djuric, A.; Santibanez, J.F.; Srdic-Rajic, T.; Nikolic, K.; Oljacic, S. Development of Novel ROCK Inhibitors via 3D-QSAR and Molecular Docking Studies: A Framework for Multi-Target Drug Design. Pharmaceutics 2024, 16, 1250.
  268. Ziemska, J.; Solecka, J.; Jarończyk, M. In Silico Screening for Novel Leucine Aminopeptidase Inhibitors with 3,4-Dihydroisoquinoline Scaffold. Molecules 2020, 25, 1753.
  269. Kim, J.-H.; Jeong, J.-H. Structure-Activity Relationship Studies Based on 3D-QSAR CoMFA/CoMSIA for Thieno-Pyrimidine Derivatives as Triple Negative Breast Cancer Inhibitors. Molecules 2022, 27, 7974.
  270. Subramani, A.K.; Sivaperuman, A.; Natarajan, R.; Bhandare, R.R.; Shaik, A.B. QSAR and Molecular Docking Studies of Pyrimidine-Coumarin-Triazole Conjugates as Prospective Anti-Breast Cancer Agents. Molecules 2022, 27, 1845.
  271. Gandhi, A.; Masand, V.; Zaki, M.E.A.; Al-Hussain, S.A.; Ghorbal, A.B.; Chapolikar, A. Quantitative Structure–Activity Relationship Evaluation of MDA-MB-231 Cell Anti-Proliferative Leads. Molecules 2021, 26, 4795.
  272. Szafrański, K.; Sławiński, J.; Tomorowicz, Ł.; Kawiak, A. Synthesis, Anticancer Evaluation and Structure-Activity Analysis of Novel (E)-5-(2-Arylvinyl)-1,3,4-Oxadiazol-2-Yl)Benzenesulfonamides. Int. J. Mol. Sci. 2020, 21, 2235.
  273. Tomorowicz, Ł.; Sławiński, J.; Żołnowska, B.; Szafrański, K.; Kawiak, A. Synthesis, Antitumor Evaluation, Molecular Modeling and Quantitative Structure–Activity Relationship (QSAR) of Novel 2-[(4-Amino-6-N-Substituted-1,3,5-Triazin-2-Yl)Methylthio]-4-Chloro-5-Methyl-N-(1H-Benzo[d]Imidazol-2(3H)-Ylidene)Benzenesulfonamides. Int. J. Mol. Sci. 2020, 21, 2924.
  274. Angelova, V.T.; Tatarova, T.; Mihaylova, R.; Vassilev, N.; Petrov, B.; Zhivkova, Z.; Doytchinova, I. Novel Arylsulfonylhydrazones as Breast Anticancer Agents Discovered by Quantitative Structure-Activity Relationships. Molecules 2023, 28, 2058.
  275. Sarhan, M.O.; Abd El-Karim, S.S.; Anwar, M.M.; Gouda, R.H.; Zaghary, W.A.; Khedr, M.A. Discovery of New Coumarin-Based Lead with Potential Anticancer, CDK4 Inhibition and Selective Radiotheranostic Effect: Synthesis, 2D & 3D QSAR, Molecular Dynamics, In Vitro Cytotoxicity, Radioiodination, and Biodistribution Studies. Molecules 2021, 26, 2273.
  276. Salas, C.O.; Zarate, A.M.; Kryštof, V.; Mella, J.; Faundez, M.; Brea, J.; Loza, M.I.; Brito, I.; Hendrychová, D.; Jorda, R.; et al. Promising 2,6,9-Trisubstituted Purine Derivatives for Anticancer Compounds: Synthesis, 3D-QSAR, and Preliminary Biological Assays. Int. J. Mol. Sci. 2019, 21, 161.
  277. Nikolova-Mladenova, B.; Momekov, G.; Zhivkova, Z.; Doytchinova, I. Design, Synthesis and Cytotoxic Activity of Novel Salicylaldehyde Hydrazones against Leukemia and Breast Cancer. Int. J. Mol. Sci. 2023, 24, 7352.
  278. Stanton, D.T.; Baker, J.R.; McCluskey, A.; Paula, S. Development and Interpretation of a QSAR Model for in Vitro Breast Cancer (MCF-7) Cytotoxicity of 2-Phenylacrylonitriles. J. Comput. Aided. Mol. Des. 2021, 35, 613–628.
  279. Lawal, H.A.; Uzairu, A.; Uba, S. QSAR, Molecular Docking, Design, and Pharmacokinetic Analysis of 2-(4-Fluorophenyl) Imidazol-5-Ones as Anti-Breast Cancer Drug Compounds against MCF-7 Cell Line. J. Bioenerg. Biomembr. 2020, 52, 475–494.
  280. Bennani, F.E.; Doudach, L.; Karrouchi, K.; El rhayam, Y.; Rudd, C.E.; Ansar, M.; El Abbes Faouzi, M. Design and Prediction of Novel Pyrazole Derivatives as Potential Anti-Cancer Compounds Based on 2D-QSAR Study against PC-3, B16F10, K562, MDA-MB-231, A2780, ACHN and NUGC Cancer Cell Lines. Heliyon 2022, 8, e10003.
  281. Altaf, R.; Nadeem, H.; Ilyas, U.; Iqbal, J.; Paracha, R.Z.; Zafar, H.; Paiva-Santos, A.C.; Sulaiman, M.; Raza, F. Cytotoxic Evaluation, Molecular Docking, and 2D-QSAR Studies of Dihydropyrimidinone Derivatives as Potential Anticancer Agents. J. Oncol. 2022, 2022, 7715689.
  282. Beč, A.; Mioč, M.; Bertoša, B.; Kos, M.; Debogović, P.; Kralj, M.; Starčević, K.; Hranjec, M. Design, Synthesis, Biological Evaluation and QSAR Analysis of Novel N-Substituted Benzimidazole Derived Carboxamides. J. Enzym. Inhib. Med. Chem. 2022, 37, 1327–1339.
  283. Tomorowicz, Ł.; Żołnowska, B.; Szafrański, K.; Chojnacki, J.; Konopiński, R.; Grzybowska, E.A.; Sławiński, J.; Kawiak, A. New 2-[(4-Amino-6-N-Substituted-1,3,5-Triazin-2-Yl)Methylthio]-N-(Imidazolidin-2-Ylidene)-4-Chloro-5-Methylbenzenesulfonamide Derivatives, Design, Synthesis and Anticancer Evaluation. Int. J. Mol. Sci. 2022, 23, 7178.
  284. Aloui, M.; El fadili, M.; Mujwar, S.; Er-rahmani, S.; Abuelizz, H.A.; Er-rajy, M.; Zarougui, S.; Elhallaoui, M. Design of Novel Potent Selective Survivin Inhibitors Using 2D-QSAR Modeling, Molecular Docking, Molecular Dynamics, and ADMET Properties of New MX-106 Hydroxyquinoline Scaffold Derivatives. Heliyon 2024, 10, e38383.
Figure 1. Workflow of QSAR modeling. Initially, a set of chemical compounds needs to be evaluated for their biological activity (BA). Next, the chemical structures of the compounds are described using various chemical descriptors, followed by the application of appropriate chemometric techniques to correlate the measured activity with the descriptors, leading to a QSAR model. The derived model must then be validated to ensure accurate predictions and interpretations. The model shown, along with its statistical parameters and graph, is arbitrary and serves only as an illustration.
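The workflow in Figure 1 can be sketched in a few lines of code. The example below is a minimal illustration, not the procedure of any study reviewed here: it assumes a single descriptor (a hypothetical logP column), made-up pIC50 values, ordinary least squares as the chemometric technique, and leave-one-out cross-validation with q2 = 1 - PRESS/SS_tot as the internal validation step. All function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Goodness of fit of the model trained on all compounds."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def q_squared_loo(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_tot."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - press / ss_tot


# Hypothetical data: one descriptor and one measured activity per compound.
logp = [1.0, 1.5, 2.1, 2.4, 3.0, 3.3]
pic50 = [5.1, 5.4, 6.2, 6.3, 7.1, 7.2]

r2 = r_squared(logp, pic50)
q2 = q_squared_loo(logp, pic50)
print(f"r2 = {r2:.3f}, q2 (LOO) = {q2:.3f}")
```

A q2 value well below r2 would suggest that the model fits the training compounds better than it predicts unseen ones, which is the kind of issue the validation step in Figure 1 is meant to expose.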
Figure 2. The workflow of Statistical Molecular Design is illustrated as follows: characteristics or descriptors of selected compounds are calculated to form the X matrix, which is then reduced using Principal Component Analysis (PCA). This results in two independent (orthogonal) principal components (PCs) in the given example. The scores of these components, referred to as principal properties (PPs), serve as the starting data for a full or fractional factorial design. In this design, the PPs are evaluated at two levels (low and high), resulting in $2^{2} = 4$ possibilities (base: number of levels; exponent: number of PPs/factors), as shown in the table on the right. A scatterplot provides a graphical representation of the PP scores. By aligning the PPs with the settings of the design, a diverse set of compounds is selected. In the given example, the “corner” selected compounds (1, 8, 13, and 20) are denoted in red circles on the scatterplot, with their values highlighted in red and bold in the PCA table.
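The selection step described in Figure 2 can be sketched as follows. This is a simplified illustration under stated assumptions: the PP scores are taken as already computed by PCA (the values below are hypothetical), each PP is rescaled to the range [-1, 1], and for every point of the two-level full factorial design the compound closest to that "corner" is picked. The function name `select_corner_compounds` is illustrative.

```python
from itertools import product
from math import dist


def select_corner_compounds(scores):
    """Pick, for each two-level design point, the compound nearest to it.

    scores maps a compound id to a tuple of principal property (PP) scores.
    Returns a dict mapping each design point, e.g. (-1, 1), to a compound id.
    """
    ids = list(scores)
    n_pp = len(scores[ids[0]])
    lows = [min(scores[i][k] for i in ids) for k in range(n_pp)]
    highs = [max(scores[i][k] for i in ids) for k in range(n_pp)]

    def scale(value, k):
        # Map the observed range of PP k onto [-1, 1] (low and high levels).
        span = highs[k] - lows[k]
        return 0.0 if span == 0 else 2 * (value - lows[k]) / span - 1

    scaled = {i: tuple(scale(scores[i][k], k) for k in range(n_pp)) for i in ids}

    selection = {}
    # Full factorial design: every combination of low (-1) and high (+1).
    for corner in product((-1, 1), repeat=n_pp):
        selection[corner] = min(ids, key=lambda i: dist(scaled[i], corner))
    return selection


# Hypothetical PP scores (PC1, PC2) for six compounds.
pp_scores = {
    1: (-2.1, -1.8),
    2: (-0.3, 0.2),
    3: (1.9, -2.0),
    4: (0.4, -0.1),
    5: (-1.7, 2.2),
    6: (2.3, 1.6),
}

selected = select_corner_compounds(pp_scores)
for corner, compound in selected.items():
    print(corner, "->", compound)
```

With two PPs the design has four points, so four structurally diverse compounds are selected; compounds near the center of the score plot (2 and 4 in this made-up set) are not chosen.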
Figure 3. The iterative process of rational drug design is depicted. 1. The chemical space is represented by libraries of compounds (colored ovals) with known activities against a specific target, which are utilized for QSAR modeling and predictions. 2. Newly predicted compounds with promising activity are depicted as red stars. Statistical Molecular Design (SMD) is applied to select compounds based on predictor variables—higher-order terms or cross-products of chemical features—that correlate with activity. 3. The newly selected compounds, resulting from SMD, are represented as yellow and white stars. These compounds are subsequently synthesized and tested for their biological activity (BA). 4. The chemical space is expanded with novel data from these compounds and their activities, represented by yellow and white stars within the corresponding ovals. This expanded chemical space is used for further QSAR model updates and development.
Figure 4. The pathways involved in estrogen-mediated carcinogenesis are illustrated. The yellow ellipses highlight different classes of medications and their targets. Aromatase inhibitors block the enzyme aromatase, which converts androgens into estrogens, while antiestrogens prevent the activation of transcription in cells caused by estrogen–receptor complexes. High estrogen levels contribute to carcinogenesis through two mechanisms: genotoxic, involving DNA mutations and increased cell proliferation (the upper pathway in the enclosed rectangle), and nongenotoxic, involving mutagenic species generated from estradiol metabolism that undergo oxidation, producing reactive oxygen species that cause oxidative stress and cell damage (the lower pathway in the enclosed rectangle).
Table 1. Unsupervised and supervised ML methods employed in QSAR modeling.

| Unsupervised ML | Supervised ML |
|---|---|
| PCA [39,40,115] | MLR [121] |
| Clustering [119] | PLS [122,123] |
| Kohonen networks (self-organizing maps) [130] | Counter-propagation networks [131] |
| | Genetic algorithms (GAs) [132] |
| | Decision Trees [133,134] |
| | Back-propagation network [135] |
Table 2. A brief description of feature selection methods, along with their advantages, disadvantages, and examples.

| Method | Filter | Wrapper | Embedded |
|---|---|---|---|
| Description | Selection is based on a feature relevance score, which identifies and prioritizes the most impactful features by their contribution to the dependent variable (Y), evaluating the intrinsic properties of the data | Selection of a subset of relevant features involves generating all possible feature/descriptor subsets, training a machine learning model for each subset, and comparing their performance | Selection is based on constructing a classifier using a specific learning method, training the model on all descriptors, and extracting the importance of each feature |
| Advantages | Scales to high-dimensional datasets; faster and computationally more affordable than wrapper methods | Considers feature/descriptor dependencies; interacts with the classifier; simple to implement | Interacts with the classifier; considers feature dependencies |
| Disadvantages | No interaction with the classifier; does not consider feature dependencies/redundancy | Prone to overfitting; selection depends on the classifier; computationally demanding | Selection depends on the classifier |
| Applications | Statistical tests that associate X and Y variables, such as the chi-squared test, correlation coefficient scores, information gain, and the t-test | Genetic search, exhaustive search, sequential forward selection/backward elimination | Decision Tree, Weighted Naïve Bayes, Weighted Vector of SVM |
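As an illustration of the filter approach summarized in Table 2, the sketch below ranks descriptors by the absolute value of their Pearson correlation with the activity and keeps the top k. The descriptor names and values are hypothetical, and the helper names (`pearson`, `filter_select`) are not taken from any package discussed in this review.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / sqrt(sxx * syy)


def filter_select(descriptors, activity, k):
    """Filter method: score each descriptor on its own, keep the k best."""
    scores = {name: abs(pearson(values, activity))
              for name, values in descriptors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical descriptor matrix (X) and activity vector (Y).
descriptors = {
    "logP": [1.0, 1.5, 2.1, 2.4, 3.0],
    "MW": [300.0, 250.0, 320.0, 270.0, 310.0],
    "HBD": [4.0, 4.0, 3.0, 2.0, 1.0],
}
activity = [5.0, 5.5, 6.1, 6.4, 7.0]

selected, scores = filter_select(descriptors, activity, k=2)
print(selected)
```

Because every descriptor is scored independently of the others and of the final model, two highly intercorrelated descriptors can both be retained; this is the redundancy limitation listed for filter methods in Table 2.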
Table 3. Features of the three generations of molecules: hit, lead, and drug.

| Hit | Lead | Drug |
|---|---|---|
| bring the pharmacophore | Mw < 300 Da; logP < 3; up to 3 H-bond donors; up to 3 H-bond acceptors; positions for replacement | Mw < 500 Da; logP < 3; up to 5 H-bond donors; up to 10 H-bond acceptors; up to 10 rotatable bonds |
| affinity < 50 mmol/L | affinity < 10 µmol/L | affinity < 10 nmol/L |
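The physicochemical criteria in Table 3 translate directly into simple rule checks. The sketch below is an illustration only: the thresholds are copied from Table 3, "up to" is read as "less than or equal to", the `Molecule` container and the two example molecules are hypothetical, and the affinity and "positions for replacement" criteria are not encoded.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Minimal property record; values would come from a descriptor tool."""
    mw: float        # molecular weight, Da
    logp: float      # octanol-water partition coefficient
    hbd: int         # number of H-bond donors
    hba: int         # number of H-bond acceptors
    rot_bonds: int   # number of rotatable bonds


def is_lead_like(m: Molecule) -> bool:
    # Lead column of Table 3.
    return m.mw < 300 and m.logp < 3 and m.hbd <= 3 and m.hba <= 3


def is_drug_like(m: Molecule) -> bool:
    # Drug column of Table 3.
    return (m.mw < 500 and m.logp < 3 and m.hbd <= 5
            and m.hba <= 10 and m.rot_bonds <= 10)


# Hypothetical examples.
small = Molecule(mw=250.3, logp=2.1, hbd=2, hba=3, rot_bonds=4)
larger = Molecule(mw=450.5, logp=2.8, hbd=4, hba=8, rot_bonds=7)

for name, mol in (("small", small), ("larger", larger)):
    print(name, "lead-like:", is_lead_like(mol), "drug-like:", is_drug_like(mol))
```

The stricter lead criteria leave room for the increases in size and polarity that usually accompany optimization toward a drug candidate.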
Table 4. Techniques for lead identification and optimization based on the knowledge of ligand and target structures.

| Ligand \ Target Structure | Unknown | Known |
|---|---|---|
| Unknown | HTS; Combinatorial Chemistry | De Novo Design; Target-based Pharmacophore Identification; Molecular Docking |
| Known | Proteochemometrics (PCM); QSAR; Pharmacophore Identification; Similarity | SBDD including Molecular Docking, Molecular Dynamics, De Novo Design |
Table 5. QSAR case studies leveraging activities against various targets.

| Compounds’ Scaffold | Target | Action | Modeling | Software | Ref. |
|---|---|---|---|---|---|
| Heterogenic group (all available SERMs) | Human Estrogen Receptor Alpha (ERα) | Selective ER modulators (SERMs, mixed agonists/antagonists of ERα) | 3D-Pharmacophore; 3D-QSAR | PHASE program from the Schrödinger suite 2015-2 | [247] * |
| Heterogenic group | Human Estrogen Receptor Alpha (ERα) | ERα inhibitors | 3D-QSAR | Schrödinger suite 2021-4 | [248] |
| Thiouracil-based indeno pyrido pyrimidines | Human-DNA topoisomerase II | Inhibition of Human-DNA topoisomerase II | 2D-QSAR | MINITAB v. 19 | [249] |
| Indole and Oxazoline/1,2-oxazole scaffolds | DNA methyltransferases | Inhibition of DNA methyltransferases (an epigenetic modification) | 2D-QSAR | Weka 3.6 | [250] |
| Heterogenic group | Lysine-specific histone demethylase 1A (LSD1) | LSD1 inhibition (an epigenetic modification) | 2D-QSAR | QSARINS v2.2.4 | [251] |
| Tetrahydroquinoline derivatives | Lysine-specific histone demethylase 1A (LSD1) | LSD1 inhibition (an epigenetic modification) | 3D-QSAR | SYBYL-X 2.0 | [252] |
| Stilbene derivatives | Lysine-specific histone demethylase 1A (LSD1) | LSD1 inhibition (an epigenetic modification) | 3D-QSAR | SYBYL-X 2.0 | [253] |
| Thieno[3,2-b]pyrrole-5-carboxamide derivatives | Lysine-specific histone demethylase 1 (LSD1) | LSD1 inhibition (an epigenetic modification) | 3D-QSAR | SYBYL-X 2.0 | [254] |
| Xanthone, Pyrrole, Pyridazine, and Phenyl alkylamine derivatives | MAO inhibitors/LSD inhibitors | Inhibition of MAO | 2D-3D QSAR | SYBYL software 6.3 | [255] |
| Dihydropyrazole-Carbohydrazide derivatives | Histone deacetylase 6 (HDAC6) | Inhibition of HDAC6 (an epigenetic modification) | 2D-QSAR | Sigma Stat 3.5 | [256] |
| Heterogenic group | Histone deacetylase (HDAC) | Inhibition of HDAC (an epigenetic modification) | 3D-QSAR | Schrödinger suite (Maestro v 9.3, LLC, New York) | [257] |
| N-monosubstituted hydrazide derivatives | Histone deacetylase 3 | Inhibition of HDAC3 (an epigenetic modification) | 3D-QSAR | MOE program 2019 (Molecular database calculator–RAND) | [258] |
| 1,2,4-triazine-3(2H)-one derivatives | Tubulin protein | Tubulin Polymerization Inhibitor | 2D-QSAR | XLSTAT v2019 | [259] |
| Quinolines derivatives | Tubulin protein | Tubulin Polymerization Inhibitor | 3D-QSAR | Phase (v4.3) module of Schrödinger 2016-1 | [260] |
| 1H-Pyrazole-1-carbothioamide derivatives | Epidermal growth factor receptor (EGFR) | EGFR-TK inhibitors (tyrosine kinase inhibitor) | 2D-QSAR | IBM SPSS statistics v.23 | [261] |
| Quinazoline-4(3H)-one analogs | Epidermal growth factor receptor (EGFR) | EGFR-TK inhibitors (tyrosine kinase inhibitor) | 3D-QSAR | SYBYL-X 2.1.1 | [262] |
| Quinazoline analogs | Epidermal growth factor receptor (EGFR) | EGFR-TK inhibitors (tyrosine kinase inhibitor) | 3D-QSAR | SYBYL-X 1.3 | [263] |
| Chemical Space of JAK2 Inhibitors | Janus kinase 2 (JAK2) | JAK2 inhibitors (tyrosine kinase inhibitor) | 2D-QSAR | R package 2018 | [264] |
| Imidazo[4,5-b]pyridine derivatives | Aurora kinase | Aurora kinase inhibitors (serine-threonine kinase inhibitor) | 3D-QSAR | SYBYL 2.0 | [265] |
| 2-Amino Thiazole derivatives | Aurora kinase | Aurora kinase inhibitors (serine-threonine kinase inhibitor) | 2D-QSAR | QSARINS | [266] |
| Heterogenic group: isoquinoline, pyridine, indazole, and pyrazole derivatives | Rho-associated coiled-coil-containing protein kinases (ROCKs) | ROCK inhibitors (serine-threonine kinase inhibitor) | 3D-QSAR | Pentacle 1.07 | [267] * |
| Dihydroisoquinoline analogs | Leucine aminopeptidase (LAP) | Leucine aminopeptidase inhibitors | 3D-QSAR | Forge software 10.6.0 | [268] |
| Thieno-pyrimidine derivatives | Vascular endothelial growth factor receptor 3 (VEGFR-3) | Inhibitors of VEGFR-3 | 3D-QSAR | SYBYL-X 2.1.1 | [269] |
| Pyrimidine–coumarin–triazole conjugates | Dihydrofolate reductase (DHFR) | Inhibitors of dihydrofolate reductase (DHFR) | 2D-QSAR | QSARINS | [270] |

* QSAR models were subsequently employed for the design, synthesis, and experimental testing of novel compounds.
Table 6. QSAR studies based on activity data from various cell lines.

| Compounds’ Scaffold | Cell Line | Modeling | Software | Ref. |
|---|---|---|---|---|
| Heterogenic group | MDA-MB-231 | 2D-QSAR | QSARINS v2.2.4 | [271] |
| (E)-5-(2-Arylvinyl)-1,3,4-oxadiazol-2-yl)benzenesulfonamides | HCT-116, MCF-7, and HeLa | 2D-QSAR | Statistica v13, TIBCO | [272] |
| 2-[(4-Amino-6-N-substituted-1,3,5-triazin-2-yl)methylthio]-4-chloro-5-methyl-N-(1H-benzo[d]imidazol-2(3H)-ylidene)benzenesulfonamides | HCT-116, MCF-7, and HeLa | 2D-QSAR | MOE 2016 | [273] * |
| Arylsulfonylhydrazones | MCF-7 and MDA-MB-231 | 2D-QSAR | MDL QSAR v. 2.2 | [274] * |
| 6-bromo-coumarin-ethylidene-hydrazonyl-thiazolyl and 6-bromo-coumarin-thiazolyl-based derivatives | MCF-7, A-549, and CHO-K1 | 2D- and 3D-QSAR | MOE 2016.08 | [275] |
| 2,6,9-Trisubstituted purine derivatives | CFPAC-1, NCI-H460, HL-60, CACO2, HCT-116, K562, MCF-7, MRC-5 | 3D-QSAR | SYBYL-X 1.2 | [276] |
| Salicylaldehyde hydrazones | HL-60, KE-37, K-562, BV-173, SaOS-2, MCF-7, MDA-MB-231, HEK-293 | 2D-QSAR | MDL QSAR v. 2.2 | [277] |
| 2-Phenylacrylonitriles | MCF-7 | 2D-QSAR | winMolconn v. 1.0.2.1 | [278] |
| Novel series of 2-(4-fluorophenyl) imidazol-5-ones | MCF-7 | 2D-QSAR | Materials Studio v8 | [279] |
| Pyrazole derivatives | PC-3, B16F10, K562, MDA-MB-231, A2780, ACHN, and NUGC | 2D-QSAR | XLSTAT 2014 | [280] |
| Dihydropyrimidinone derivatives | MCF-7 | 2D-QSAR | QSARINS | [281] |
| N-substituted benzimidazole derivatives | HCT 116, H 460, MCF-7, and HEK 293 | 3D-QSAR | VolSurf+ 3-D | [282] |
| 2-[(4-Amino-6-N-substituted-1,3,5-triazin-2-yl)methylthio]-N-(imidazolidin-2-ylidene)-4-chloro-5-methylbenzenesulfonamide derivatives | HCT-116, MCF-7, HeLa, HaCaT | 2D-QSAR | STATISTICA software v.13 | [283] * |
| Hydroxyquinoline scaffold derivatives | MDA-MB-435 | 2D-QSAR | na | [284] |

* QSAR models were subsequently employed for the design, synthesis, and experimental testing of novel compounds; na: not available.
Share and Cite

MDPI and ACS Style

Vasilev, B.; Atanasova, M. A (Comprehensive) Review of the Application of Quantitative Structure–Activity Relationship (QSAR) in the Prediction of New Compounds with Anti-Breast Cancer Activity. Appl. Sci. 2025, 15, 1206. https://doi.org/10.3390/app15031206
