1. Introduction
Serotonin receptors are an important group of biological targets belonging mainly to G protein-coupled receptors (GPCRs). Among them, we can distinguish as many as 13 subtypes of receptors, namely 5-HT1A, 5-HT1B, 5-HT1D, 5-HT1E, 5-HT1F, 5-HT2A, 5-HT2B, 5-HT2C, 5-HT4, 5-HT5A, 5-HT5B, 5-HT6, and 5-HT7. The only receptor outside this group is the 5-HT3 receptor, which belongs to the group of ionotropic receptors. Serotonin receptors play a key role in various physiological functions, as they are widely distributed in the nervous system and peripheral tissues. These receptors are currently being extensively explored as targets in the drug discovery process (against migraine, depression, and schizophrenia, or in the treatment of nausea) [1].
Among the processes focusing on the discovery of new therapeutic molecules, there are two main streams: Structure-Based Drug Design and Ligand-Based Drug Design. The first approach includes molecular docking and virtual screening. Most of the structures of serotonin receptors can be found on the UniProt platform [2]. The literature includes publications describing the structural and functional differences between serotonin receptors, including their localization in different brain areas [3,4,5,6,7]. Other studies have described the relationship between descriptors and affinity or activity values for specific receptors. Bukhari et al. discussed the effect of PyDescriptors and PaDEL descriptors on the pKi values of ligands relative to serotonin 5-HT6 receptors. They developed QSAR (quantitative structure–activity relationship) models based on a database of more than 1200 molecules and applied molecular docking to visualize the potential effect of selected descriptors on ligand–receptor interactions [8]. In turn, the paper by Petković et al. presented datasets of 50 molecules with observed inhibitory effects on the serotonin transporter (pIC50). They created QSAR models using Monte Carlo optimization on local graph invariants and descriptors based on SMILES notation, and a genetic algorithm based on two-dimensional PaDEL descriptors [9].
The present study was informed by a review by A. Sandri [10], in which the author conducted a thorough analysis of different approaches to drug discovery together with the number of approved molecules. According to this review, focusing solely on biological targets may have limited chances of success. However, it is worth noting that this opinion does not exclude the usefulness of methods based on biological targets and affinity. On the contrary, such methods may represent one of many insights into a comprehensive research process. They are likely to remain an important part of successful drug discovery in the future, provided that other approaches, such as those based on observable phenotypic effects, are also taken into account.
In this article, the analysis concentrates on differences between active and inactive compounds in the serotonin system and between serotonin receptors, based on ligand characteristics represented by molecular descriptors. Moreover, we aim to obtain models of serotonergic activity and of selective binding to a chosen serotonin receptor. To the best of our knowledge, this type of concept has not yet been developed for serotonin receptors. These research objectives were pursued using statistical methods and machine learning, with an emphasis on Automated Machine Learning with SHapley Additive exPlanations analysis (SHAP analysis). The results obtained provide a basis for the search for molecular features important for active and selective interaction with serotonin receptors. These descriptors could play a role analogous to Lipinski’s features indicating ‘druglikeness’; in our case, they would be features indicating serotonergic activity and selectivity for selected serotonin receptors [11]. Such selectivity may have a positive effect by deliberately reducing the occurrence of adverse effects. In the case of the serotonergic system, adverse events are varied; for example, 5-HT2A receptor activation can lead to psychedelic effects, whereas 5-HT3 receptor activation can lead to nausea.
4. Discussion
The findings presented in this article indicate that a single descriptor alone may not clearly differentiate the presence or absence of serotonergic activity or demonstrate selectivity towards serotonin receptors. Despite this observation, the statistical analysis results reveal a possibility to highlight a group of descriptors to collectively establish rules for determining activity and selectivity. These insights provide a basis for the further exploration and understanding of the relationships within serotonergic receptors.
The binary classification model highlights the substantial importance of features associated with both descriptor groups (ATSC, GATS, JGI, GGI, Kier, MATS, PEOE_VSA, SlogP, VSA_EState) and structural elements (aasC, aaaC, ssO, sssN, nFRing, nBase). Moreover, the selectivity model for the 11 serotonin receptors highlights the following groups of descriptors: ATS, BalabanJ, and Xch; those related to the number of structural elements (ddssS, sssCH, aasN); and those associated with the distance between atoms (MDEC-33).
In both models, ATS-type Mordred descriptors are present. The AATSC features represent the averaged, centered Moreau–Broto autocorrelation of the topological structure, defined as $\mathrm{AATSC}_k = \mathrm{ATSC}_k/\Delta_k$, where $\Delta_k$ is the number of vertex pairs at a topological distance equal to $k$. The ATSC descriptors measure the correlation of a property between atoms in a molecule as a function of the distance between them: atomic property values are centered on their mean and then multiplied pairwise for atoms separated by $k$ bonds. This captures information about the topological structure of the molecule, that is, the arrangement of its atoms and bonds, in a compact form suitable for further analysis [25]. Moreover, both models use descriptors representing the spectral mean absolute deviation from the Barysz matrix (SpMAD_Dzare, SpMAD_DzZ). Another group of descriptors present in the models of serotonergic activity and selectivity comprises the Chi descriptors. In the case of the binary model, these are AXp-7dv (averaged seven-ordered valence-electron-weighted Chi path) and Xch-7dv (seven-ordered valence-electron-weighted Chi chain), and in the selectivity model they are Xc-5dv (five-ordered valence-electron-weighted Chi cluster) and Xch-5d (five-ordered sigma-electron-weighted Chi chain) [26].
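The centered autocorrelation described above can be illustrated with a minimal Python sketch. It assumes an unweighted molecular graph given as an adjacency list, counts each atom pair once, and only handles lags k ≥ 1; the function names and the four-atom example with its property values are hypothetical, and the conventions of a descriptor package such as Mordred (for example at k = 0) may differ.

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def centered_autocorrelation(adjacency, weights, k):
    """Return (ATSC_k, AATSC_k) for lag k >= 1 (assumed convention: unordered pairs)."""
    n = len(weights)
    mean = sum(weights) / n
    centered = [w - mean for w in weights]  # mean-centered atomic property
    dist = topological_distances(adjacency)
    # Delta_k: atom pairs separated by exactly k bonds, each pair counted once
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == k]
    atsc = sum(centered[i] * centered[j] for i, j in pairs)
    aatsc = atsc / len(pairs) if pairs else 0.0
    return atsc, aatsc


# Hypothetical four-atom chain with an arbitrary atomic property
chain = [[1], [0, 2], [1, 3], [2]]
prop = [1.0, 2.0, 3.0, 4.0]
for lag in (1, 2, 3):
    print(lag, centered_autocorrelation(chain, prop, lag))
```

Dividing by the number of pairs makes values at different lags comparable across molecules of different size, which is the purpose of the averaged form.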
For the binary classification of serotonergic activity, more descriptors were detected. The first group, GATS, comprises the Geary coefficient descriptors. These are autocorrelation descriptors calculated with the Geary autocorrelation function, and they describe how an atomic property is distributed across the molecule. Autocorrelation involves measuring the similarity or correlation of a property between different atoms at varying topological distances within a molecule. Secondly, both JGI and GGI fall under the category of topological charge descriptors, which capture information about the electronic distribution and charge-related properties of atoms within a molecule. Another group of descriptors present in the binary model is Kier, the kappa shape indices, which measure molecular shape based on specific atom paths. Furthermore, MATS (Moran autocorrelation descriptor) is given by the equation $\mathrm{MATS}_k = \mathrm{AATSC}_k / \left(\frac{1}{A}\sum w_c^2\right)$, where $w$ is the atomic property vector, $w_c$ its mean-centered form, and $A$ the number of atoms. An important group of descriptors appearing only in the binary classification model is PEOE_VSA (van der Waals surface area descriptors based on the partial equalization of orbital electronegativity). PEOE is a method of calculating partial atomic charges in which charge is transferred between bonded atoms until equilibrium is reached. To ensure convergence, the quantity of charge transferred in each iteration is damped by an exponentially decreasing scale factor. PEOE charges depend only on the connectivity of the input structures: elements, formal charges, and bond orders. Also associated with the van der Waals surface area are the VSA_EState2 and VSA_EState7 descriptors found in the binary model. These are MOE-type descriptors using EState indices and surface area contributions. In the serotonergic activity model, there are also single descriptors describing the neighborhoods of the atoms (IC1, the first-order neighborhood information content; ZMIC3, the three-ordered Z-modified information content) or the shape of the molecule (TopoShapeIndex, the topological shape index). Specific types of atoms and their surroundings are also distinguished (aasC, aaaC, ssO, sssN), as shown in Figure 8. Moreover, SHAP analysis highlighted the number of basic groups (nBase) and the fused ring count (nFRing). In drug design, an important feature of molecules is the logP value, which describes the lipophilicity of a molecule and appears in one of Lipinski’s rules. The binary model contains two descriptors derived from logP, namely SLogP (the Wildman–Crippen LogP) and SlogP_VSA1 (a MOE-type descriptor using the Wildman–Crippen LogP and surface area contribution) [25,26,27].
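To make the Moran (MATS) and Geary (GATS) descriptors more concrete, the following Python sketch evaluates both at a given lag from a precomputed topological distance matrix. It follows the textbook definitions of Moran's I and Geary's C with each atom pair counted once; the function name and the example values are hypothetical, and a descriptor package may use slightly different conventions.

```python
def moran_geary(dist, weights, k):
    """Return (MATS_k, GATS_k) for lag k >= 1 from a topological distance matrix."""
    n = len(weights)
    mean = sum(weights) / n
    centered = [w - mean for w in weights]
    spread = sum(c * c for c in centered)  # sum of squared centered properties
    # Atom pairs exactly k bonds apart, each pair counted once
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == k]
    if n < 2 or not pairs or spread == 0:
        return float("nan"), float("nan")
    delta_k = len(pairs)

    # Moran: averaged centered autocorrelation scaled by the mean squared deviation
    aatsc = sum(centered[i] * centered[j] for i, j in pairs) / delta_k
    mats = aatsc / (spread / n)

    # Geary: mean squared difference between paired atoms scaled by the sample variance
    mean_sq_diff = sum((weights[i] - weights[j]) ** 2 for i, j in pairs) / (2 * delta_k)
    gats = mean_sq_diff / (spread / (n - 1))
    return mats, gats


# Hypothetical four-atom chain: the distance matrix is written out by hand
dist = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
print(moran_geary(dist, [1.0, 2.0, 3.0, 4.0], 1))
```

Moran values above zero and Geary values below one both indicate that atoms at the given distance tend to carry similar property values, so the two descriptor families encode related but differently scaled information.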
In the selectivity model, the descriptors describing the structure are BalabanJ and the Extended Topochemical Atom descriptor (ETA_dPsi_B). The first is a topological index of the molecular graph. Its calculation involves the graph’s distance matrix and the circuit rank, which depends on the number of nodes, edges, and connected components in the graph, and it yields a numerical value that characterizes the graph’s complexity [28]. The serotonin receptor selectivity model also focuses on individual atom types in a specific environment (descriptors ddssS, sssCH, and aasN, shown in Figure 8). Additionally, it considers the molecular distance edge between tertiary carbon atoms, represented by MDEC-33 [25,26,27].
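As an illustration of how the distance matrix and the circuit rank enter the Balaban index, the sketch below computes J for a connected, unweighted graph. This is the plain graph-theoretical form; the BalabanJ descriptor used in cheminformatics packages additionally accounts for heteroatoms and bond orders, which this sketch deliberately ignores. The function name and the two example graphs are illustrative assumptions.

```python
from collections import deque
from math import sqrt


def balaban_j(n_atoms, bonds):
    """Balaban J index of a connected, unweighted graph (no heteroatom corrections)."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def distance_sum(start):
        # Row sum of the distance matrix for one vertex, via breadth-first search
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        if len(seen) != n_atoms:
            raise ValueError("graph must be connected")
        return sum(seen.values())

    s = [distance_sum(i) for i in range(n_atoms)]
    m = len(bonds)
    mu = m - n_atoms + 1  # circuit rank of a connected graph
    return m / (mu + 1) * sum(1 / sqrt(s[a] * s[b]) for a, b in bonds)


# Carbon skeletons only: a six-membered ring and a four-atom chain
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
chain = [(0, 1), (1, 2), (2, 3)]
print(balaban_j(6, ring), balaban_j(4, chain))
```

Normalizing by the circuit rank keeps the index comparable between acyclic and cyclic structures of similar size.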
Examining these two models side by side reveals the essential role of descriptors such as ATS, SpMAD, and Xch in predicting both serotonergic activity and selectivity. A comparative analysis underscores that, for selectivity, information about the presence of sulfur in a specific arrangement within the molecular structure assumes greater importance. In contrast, in the serotonergic activity model, these elements do not emerge among the descriptors responsible for 50% of the influence on the prediction. Descriptors NddssS and SddssS give the number of ddssS-type sulfur atoms and the sum of their electrotopological states, respectively. The sum of SHAP values for these features constitutes almost 17% of the influence on selectivity towards serotonin receptors. In Figure 9, we present a variable derived from these two descriptors, namely the remainder obtained by dividing the ‘NddssS’ variable by the ‘SddssS’ variable. This derived variable differentiates over half of the serotonin receptors.
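The notion of a feature's percentage influence can be sketched as follows. The example assumes that influence is measured as each feature's share of the total mean absolute SHAP value, which is a common convention but is not confirmed as the exact procedure used here; the function names, feature names, and SHAP values are hypothetical.

```python
def shap_influence_shares(shap_values, feature_names):
    """Share of total mean |SHAP| per feature, sorted from most to least influential.

    shap_values: list of rows (one per sample), one SHAP value per feature.
    """
    n_samples = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    shares = {name: value / total for name, value in zip(feature_names, mean_abs)}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


def features_covering(shares, threshold=0.5):
    """Smallest leading set of features whose cumulative share reaches the threshold."""
    selected, cumulative = [], 0.0
    for name, share in shares.items():  # relies on shares being sorted in descending order
        if cumulative >= threshold:
            break
        selected.append(name)
        cumulative += share
    return selected


# Hypothetical SHAP values for three features over two samples
values = [[0.2, -0.1, 0.7], [-0.2, 0.1, -0.7]]
shares = shap_influence_shares(values, ["f1", "f2", "f3"])
print(shares)
print(features_covering(shares, threshold=0.5))
```

For a multiclass model such as the selectivity model, SHAP values are produced per class, so the absolute values would first need to be aggregated over classes; that step is omitted in this sketch.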
All the descriptors discussed above are the results of extensive modeling experiments carried out in this work. However, they were chosen from a predefined set of descriptors falling into the category of “2D” descriptors from the Mordred package [16]. Our initial choice to limit the descriptor search space to 2D descriptors was based on our previous experience with the numerical instability of 3D structure optimization methods, which results in the variability of 3D descriptors. In this view, we forgo the added value of more sophisticated descriptors [29] in exchange for the robustness and stability of the whole system.
Models of serotonergic activity, as well as the selectivity model, have become new extensions of SerotoninAI, a new web application related to serotonergic QSAR models [30] described in the article [31]. Applicability domain information was implemented in a form unified with other SerotoninAI modules. The ‘Serotonergic activity’ and ‘Selectivity’ sections provide radial charts of the ten most important descriptors. If their values for a tested compound are within the range for at least seven descriptors, the compound is in the applicability domain and predictions have a high probability of success.
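The applicability domain rule described above (at least seven of the ten most important descriptors within range) can be sketched in a few lines of Python. The function name, the descriptor names, and the ranges are hypothetical placeholders, and the sketch assumes that "within the range" means lying between the minimum and maximum observed for the reference compounds; the actual SerotoninAI implementation may differ.

```python
def in_applicability_domain(compound, reference_ranges, min_within=7):
    """Return (is_in_domain, n_within) for a compound's descriptor values.

    compound: dict mapping descriptor name to value.
    reference_ranges: dict mapping descriptor name to a (low, high) tuple.
    """
    n_within = 0
    for name, (low, high) in reference_ranges.items():
        value = compound.get(name)
        # A missing descriptor is treated as out of range
        if value is not None and low <= value <= high:
            n_within += 1
    return n_within >= min_within, n_within


# Ten hypothetical descriptors, each with an assumed reference range of [0, 1]
ranges = {f"d{i}": (0.0, 1.0) for i in range(10)}
compound = {f"d{i}": 0.5 for i in range(7)}
compound.update({"d7": 1.8, "d8": -0.3, "d9": 2.4})
print(in_applicability_domain(compound, ranges))
```

Returning the count alongside the decision makes it easy to report how close a borderline compound is to the threshold.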
The role of the sulfur atom in relation to the pIC50 value for the 5-HT6 receptor is also considered in a study by Bukhari S.N.A. et al. Based on their QSAR model, followed by a docking step to confirm the results obtained, a sulfur atom was identified, among other features, as one that should be taken into account when optimizing a molecule for its effect on the 5-HT6 receptor [8].
In summary, this study demonstrated the importance of a comprehensive set of descriptors for understanding both serotonergic activity and receptor selectivity. The significant differences in the importance of the descriptors between the two models highlight the complex nature of predicting these pharmacological features.