Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: closed (31 July 2024) | Viewed by 43813

Editor

Dr. Bono Lučić

Guest Editor

The Ruđer Bošković Institute, Bijenička 54, 10000 Zagreb, Croatia
Interests: chemoinformatics; structural bioinformatics; structure–activity modeling; QSAR; QSPR; molecular modeling; computational chemistry; molecular structural biophysics; development of model validation algorithms; variable selection algorithms; classification modeling; chance accuracy estimation; development of accuracy parameters; computational research in bioprospecting research; protein structure analysis and prediction

Special Issue Information

Dear Colleagues,

At a time of universal digitization of data in various fields of research, including molecular sciences, a growing number of studies model continuous or classification endpoints (activities/properties) of molecules. In such studies, endpoints are most often discretized into two classes (active or inactive), although the data are sometimes grouped into three or more classes.

Quantitative structure–activity/property relationships (QSAR/QSPR) are the most common, but not the only, forms of structure–endpoint models in molecular sciences. Model accuracy is assessed through validation procedures, and many quality parameters are defined in the OECD guidance document on regulatory structure–activity models for health and environmental protection [1]. That document covers the accuracy parameters of classification models only sparsely. Numerous accuracy parameters are nevertheless in use today, and those used for classification models are calculated from the elements of the confusion matrix [2]. The procedures for validating regulatory structure–activity models also need to be defined more precisely in the OECD document [1]. The application of such models in environmental and health protection (toxicity, bioavailability, sorption, biodegradability, etc.) is defined by the EU REACH regulation [3].
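
As an illustration of how accuracy parameters follow from the confusion matrix elements, the sketch below computes several widely used ones for a two-class model. It is a minimal example, not a procedure taken from the OECD document or from [2]: the function name `classification_metrics`, the helper `_ratio`, the convention of returning 0.0 for an empty denominator, and the example counts are all assumptions made for illustration.

```python
import math


def _ratio(numerator, denominator):
    # Convention chosen for this sketch: an empty denominator yields 0.0.
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fn, fp, tn):
    """Accuracy parameters of a two-class model from its confusion matrix.

    tp, fn, fp, tn are the counts of true positives, false negatives,
    false positives, and true negatives.
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    informedness = sensitivity + specificity - 1.0
    markedness = precision + npv - 1.0
    # Matthews correlation coefficient
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "informedness": informedness,
        "markedness": markedness,
        "mcc": mcc,
    }


# Hypothetical counts for 100 compounds (50 active, 50 inactive).
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a two-class problem the Matthews correlation coefficient has the same magnitude as the geometric mean of informedness and markedness, which gives a quick consistency check on the computed values.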

The development of structure–activity modeling of different types of endpoints of molecules (usually various types of biological activities) is accelerated using chemoinformatics and bioinformatics tools, servers, algorithms, and databases developed for small molecules and proteins.

Research on novel chemoinformatics and bioinformatics tools is particularly important for this Special Issue, including work on:

Valuable databases, servers, and data mining tools;
Drug or lead structure identification or dereplication approaches used in bioprospecting research;
Structure optimization tools;
Molecular descriptors;
Modeling and variable selection algorithms;
Computational model validation methods;
Multivariate linear and nonlinear methods;
Machine learning and deep learning algorithms;
Predictive or descriptive structure–activity models;
Different visualization tools;
Protein–ligand (target/small compound) interactions;
Protein–protein interactions;
Molecular docking, etc.

All these topics are of the highest importance for structure–activity modeling in molecular sciences.

This Special Issue aims to collect relevant contributions (papers) belonging to one or more of the topics listed above (and those related to them), which are important for the acceleration of structure–activity research in molecular sciences. Applications aimed at modeling a broad spectrum of chemical, biological, pharmaceutical, biochemical, and environmentally relevant activities and properties of molecules are also appreciated.

All forms of scientific articles covering the topics mentioned above or related topics are welcome, i.e., original papers, reviews, and communications.

[1] Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models. https://www.oecd.org/env/guidance-document-on-the-validation-of-quantitative-structure-activity-relationship-q-sar-models-9789264085442-en.htm

[2] D. M. W. Powers, Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation, Journal of Machine Learning Technologies, 2011, 2, 37–63.

[3] Regulation (EC) No 1907/2006: REACH - Registration, Evaluation, Authorisation and Restriction of Chemicals. http://ec.europa.eu/enterprise/sectors/chemicals/reach/index_en.htm

Dr. Bono Lučić
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

chemoinformatics tools
bioinformatics tools
structure–activity modeling
structure–property modeling
QSAR
QSPR
drug/structure identification in bioprospecting research
molecular docking
molecular interactions
protein–protein interactions
development of algorithms
databases and web servers
data mining
structure representation and optimization
molecular descriptors
modelling of health and environmentally relevant endpoints/activities/properties
toxicity
carcinogenicity
computational methods
model validation approaches
multivariate modeling
predictive modeling
descriptive modeling
classification modeling
machine learning
deep learning
structure visualization

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)

Research

17 pages, 3977 KB

Open Access Article

A Point Cloud Graph Neural Network for Protein–Ligand Binding Site Prediction

by Yanpeng Zhao, Song He, Yuting Xing, Mengfan Li, Yang Cao, Xuanze Wang, Dongsheng Zhao and Xiaochen Bo

Int. J. Mol. Sci. 2024, 25(17), 9280; https://doi.org/10.3390/ijms25179280 - 27 Aug 2024

Cited by 10 | Viewed by 5613

Abstract

Predicting protein–ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein–ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein–ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the geometric and chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on the inter-point distances, and the point cloud graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket’s advancement and practicality for protein–ligand binding site prediction. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)
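
The abstract above describes constructing a graph over surface points from their inter-point distances. One simple way to do this is a radius graph, sketched below in plain Python. This is not the PGpocket implementation: the function name `radius_graph`, the fixed distance cutoff, and the toy coordinates are assumptions chosen for illustration, and the published method may use a different edge rule (for example, nearest neighbours).

```python
import math
from itertools import combinations


def radius_graph(points, cutoff):
    """Connect every pair of points whose Euclidean distance is <= cutoff.

    points: sequence of (x, y, z) coordinates.
    Returns an adjacency list mapping each point index to its neighbours.
    """
    neighbours = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            neighbours[i].append(j)
            neighbours[j].append(i)
    return neighbours


# Hypothetical surface points; the cutoff value is arbitrary.
surface_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
graph = radius_graph(surface_points, cutoff=1.5)
print(graph)
```

The all-pairs loop is quadratic in the number of points, which is fine for a toy example; real protein surfaces with many thousands of points would normally use a spatial index instead.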

19 pages, 2243 KB

Open Access Article

An Ensemble Classifiers for Improved Prediction of Native–Non-Native Protein–Protein Interaction

by Nor Kumalasari Caecar Pratiwi, Hilal Tayara and Kil To Chong

Int. J. Mol. Sci. 2024, 25(11), 5957; https://doi.org/10.3390/ijms25115957 - 29 May 2024

Cited by 5 | Viewed by 2519

Abstract

In this study, we present an innovative approach to improve the prediction of protein–protein interactions (PPIs) through the utilization of an ensemble classifier, specifically focusing on distinguishing between native and non-native interactions. Leveraging the strengths of various base models, including random forest, gradient boosting, extreme gradient boosting, and light gradient boosting, our ensemble classifier integrates these diverse predictions using a logistic regression meta-classifier. Our model was evaluated using a comprehensive dataset generated from molecular dynamics simulations. While the gains in AUC and other metrics might seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of various approaches, we compared the performance of logistic regression to four baseline models. Our results indicate that logistic regression consistently underperforms across all evaluated metrics. This suggests that it may not be well-suited to capture the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handling datasets effectively and incorporating regularizations to avoid over-fitting. Our findings indicate that the ensemble method enhances the predictive capability of PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

21 pages, 3849 KB

Open Access Article

Unraveling Divergent Transcriptomic Profiles: A Comparative Single-Cell RNA Sequencing Study of Epithelium, Gingiva, and Periodontal Ligament Tissues

by Ali T. Abdallah and Anna Konermann

Int. J. Mol. Sci. 2024, 25(11), 5617; https://doi.org/10.3390/ijms25115617 - 22 May 2024

Cited by 9 | Viewed by 3764

Abstract

The periodontium, comprising the periodontal ligament (PDL), gingiva, and epithelium, plays a crucial role in maintaining tooth integrity and function. Understanding tissue cellular composition and gene expression is crucial for illuminating periodontal pathophysiology. This study aimed to identify tissue-specific markers via scRNA-Seq. Primary human PDL, gingiva, and epithelium tissues (n = 7) were subjected to cell hashing and sorting. scRNA-Seq library preparation using 10× Genomics protocol and Illumina sequencing was conducted. The analysis was performed using Cellranger (v3.1.0), with downstream analysis via R packages Seurat (v5.0.1) and SCORPIUS (v1.0.9). Investigations identified eight distinct cellular clusters, revealing the ubiquitous presence of epithelial and gingival cells. PDL cells evolved in two clusters with numerical superiority. The other clusters showed varied predominance regarding gingival and epithelial cells or an equitable distribution of both. The cluster harboring most cells mainly consisted of PDL cells and was present in all donors. Some of the other clusters were also tissue-inherent, while the presence of others was environmentally influenced, revealing variability across donors. Two clusters exhibited genetic profiles associated with tissue development and cellular integrity, respectively, while all other clusters were distinguished by genes characteristic of immune responses. Developmental trajectory analysis uncovered that PDL cells may develop after epithelial and gingival cells, suggesting the inherent PDL cell-dominated cluster as a final developmental stage. This single-cell RNA sequencing study delineates the hierarchical organization of periodontal tissue development, identifies tissue-specific markers, and reveals the influence of environmental factors on cellular composition, advancing our understanding of periodontal biology and offering potential insights for therapeutic interventions. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

11 pages, 461 KB

Open Access Article

Merging Counter-Propagation and Back-Propagation Algorithms: Overcoming the Limitations of Counter-Propagation Neural Network Models

by Viktor Drgan, Katja Venko, Janja Sluga and Marjana Novič

Int. J. Mol. Sci. 2024, 25(8), 4156; https://doi.org/10.3390/ijms25084156 - 9 Apr 2024

Cited by 6 | Viewed by 1991

Abstract

Artificial neural networks (ANNs) are nowadays applied as the most efficient methods in the majority of machine learning approaches, including data-driven modeling for assessment of the toxicity of chemicals. We developed a combined neural network methodology that can be used in the scope of new approach methodologies (NAMs) assessing chemical or drug toxicity. Here, we present QSAR models for predicting the physical and biochemical properties of molecules of three different datasets: aqueous solubility, acute fish toxicity toward fat head minnow, and bio-concentration factors. A novel neural network modeling method is developed by combining two neural network algorithms, namely, the counter-propagation modeling strategy (CP-ANN) with the back-propagation-of-errors algorithm (BPE-ANN). The advantage is a short training time, robustness, and good interpretability through the initial CP-ANN part, while the extension with BPE-ANN improves the precision of predictions in the range between minimal and maximal property values of the training data, regardless of the number of neurons in both neural networks, either CP-ANN or BPE-ANN. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

28 pages, 6051 KB

Open Access Article

The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks

by Pierre-Yves Libouban, Samia Aci-Sèche, Jose Carlos Gómez-Tamayo, Gary Tresadern and Pascal Bonnet

Int. J. Mol. Sci. 2023, 24(22), 16120; https://doi.org/10.3390/ijms242216120 - 9 Nov 2023

Cited by 12 | Viewed by 5362

Abstract

Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein–ligand binding affinities. Despite advancements in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, prompting the question of whether it is truly solved or if the current performance is overly optimistic and reliant on biased, easily predictable data. Like other DL-related problems, this issue seems to stem from the training and test sets used when building the models. In this work, we investigate the impact of several parameters related to the input data on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, it is more important to train a model with as much data as possible than to restrict the training to only high-quality datasets. Finally, we also confirm the bias in the typically used current test sets. Therefore, several types of evaluation and benchmarking are required to understand models’ decision-making processes and accurately compare the performance of models. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

16 pages, 1330 KB

Open Access Article

Utilization of Supervised Machine Learning to Understand Kinase Inhibitor Toxophore Profiles

by Andrew A. Bieberich and Christopher R. M. Asquith

Int. J. Mol. Sci. 2023, 24(6), 5088; https://doi.org/10.3390/ijms24065088 - 7 Mar 2023

Cited by 2 | Viewed by 3718

Abstract

There have been more than 70 FDA-approved drugs to target the ATP binding site of kinases, mainly in the field of oncology. These compounds are usually developed to target specific kinases, but in practice, most of these drugs are multi-kinase inhibitors that leverage the conserved nature of the ATP pocket across multiple kinases to increase their clinical efficacy. To utilize kinase inhibitors in targeted therapy and outside of oncology, a narrower kinome profile and an understanding of the toxicity profile is imperative. This is essential when considering treating chronic diseases with kinase targets, including neurodegeneration and inflammation. This will require the exploration of inhibitor chemical space and an in-depth understanding of off-target interactions. We have developed an early pipeline toxicity screening platform that uses supervised machine learning (ML) to classify test compounds’ cell stress phenotypes relative to a training set of on-market and withdrawn drugs. Here, we apply it to better understand the toxophores of some literature kinase inhibitor scaffolds, looking specifically at a series of 4-anilinoquinoline and 4-anilinoquinazoline model libraries. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

16 pages, 2323 KB

Open Access Article

MSEDDI: Multi-Scale Embedding for Predicting Drug—Drug Interaction Events

by Liyi Yu, Zhaochun Xu, Meiling Cheng, Weizhong Lin, Wangren Qiu and Xuan Xiao

Int. J. Mol. Sci. 2023, 24(5), 4500; https://doi.org/10.3390/ijms24054500 - 24 Feb 2023

Cited by 18 | Viewed by 3976

Abstract

A norm in modern medicine is to prescribe polypharmacy to treat disease. The core concern with the co-administration of drugs is that it may produce adverse drug—drug interaction (DDI), which can cause unexpected bodily injury. Therefore, it is essential to identify potential DDI. Most existing methods in silico only judge whether two drugs interact, ignoring the importance of interaction events to study the mechanism implied in combination drugs. In this work, we propose a deep learning framework named MSEDDI that comprehensively considers multi-scale embedding representations of the drug for predicting drug—drug interaction events. In MSEDDI, we design three-channel networks to process biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding, respectively. Finally, we fuse three heterogeneous features from channel outputs through a self-attention mechanism and feed them to the linear layer predictor. In the experimental section, we evaluate the performance of all methods on two different prediction tasks on two datasets. The results show that MSEDDI outperforms other state-of-the-art baselines. Moreover, we also reveal the stable performance of our model in a broader sample set via case studies. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

15 pages, 1614 KB

Open Access Article

Toward Quantitative Models in Safety Assessment: A Case Study to Show Impact of Dose–Response Inference on hERG Inhibition Models

by Fjodor Melnikov, Lennart T. Anger and Catrin Hasselgren

Int. J. Mol. Sci. 2023, 24(1), 635; https://doi.org/10.3390/ijms24010635 - 30 Dec 2022

Cited by 6 | Viewed by 3991

Abstract

Due to challenges with historical data and the diversity of assay formats, in silico models for safety-related endpoints are often based on discretized data instead of the data on a natural continuous scale. Models for discretized endpoints have limitations in usage and interpretation that can impact compound design. Here, we present a consistent data inference approach, exemplified on two data sets of Ether-à-go-go-Related Gene (hERG) K⁺ inhibition data, for dose–response and screening experiments that are generally applicable for in vitro assays. hERG inhibition has been associated with severe cardiac effects and is one of the more prominent safety targets assessed in drug development, using a wide array of in vitro and in silico screening methods. In this study, the IC₅₀ for hERG inhibition is estimated from diverse historical proprietary data. The IC₅₀ derived from a two-point proprietary screening data set demonstrated high correlation (R = 0.98, MAE = 0.08) with IC₅₀ values derived from six-point dose–response curves. Similar IC₅₀ estimation accuracy was obtained on a public thallium flux assay data set (R = 0.90, MAE = 0.2). The IC₅₀ data were used to develop a robust quantitative model. The model’s MAE (0.47) and R² (0.46) were on par with literature statistics and approached assay reproducibility. Using a continuous model has high value for pharmaceutical projects, as it enables rank ordering of compounds and evaluation of compounds against project-specific inhibition thresholds. This data inference approach can be widely applicable to assays with quantitative readouts and has the potential to impact experimental design and improve model performance, interpretation, and acceptance across many standard safety endpoints. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)

17 pages, 4850 KB

Open Access Article

Maximizing the Performance of Similarity-Based Virtual Screening Methods by Generating Synergy from the Integration of 2D and 3D Approaches

by Ningning Fan, Steffen Hirte and Johannes Kirchmair

Int. J. Mol. Sci. 2022, 23(14), 7747; https://doi.org/10.3390/ijms23147747 - 13 Jul 2022

Cited by 3 | Viewed by 2942

Abstract

Methods for the pairwise comparison of 2D and 3D molecular structures are established approaches in virtual screening. In this work, we explored three strategies for maximizing the virtual screening performance of these methods: (i) the merging of hit lists obtained from multi-compound screening using a single screening method, (ii) the merging of the hit lists obtained from 2D and 3D screening by parallel selection, and (iii) the combination of both of these strategies in an integrated approach. We found that any of these strategies led to a boost in virtual screening performance, with the clearest advantages observed for the integrated approach. On test sets for virtual screening, covering 50 pharmaceutically relevant proteins, the integrated approach, using sets of five query molecules, yielded, on average, an area under the receiver operating characteristic curve (AUC) of 0.84, an early enrichment among the top 1% of ranked compounds (EF1%) of 53.82 and a scaffold recovery rate among the top 1% of ranked compounds (SRR1%) of 0.50. In comparison, the 2D and 3D methods on their own (when using a single query molecule) yielded AUC values of 0.68 and 0.54, EF1% values of 19.96 and 17.52, and SRR1% values of 0.20 and 0.17, respectively. In conclusion, based on these results, the integration of 2D and 3D methods, via a (balanced) parallel selection strategy, is recommended, and, in particular, when combined with multi-query screening. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)
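
The AUC and EF1% values quoted in the abstract above are standard virtual screening metrics. The sketch below shows one common way to compute them from a ranked list; it is not the authors' code, and the exact definitions used in the paper may differ in detail. The function names, the tie handling, the rounding of the top fraction, and the toy data are assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen active (label 1) scores higher than a randomly chosen inactive
    (label 0); ties count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in inactives)
    return wins / (len(actives) * len(inactives))


def enrichment_factor(scores, labels, fraction=0.01):
    """Ratio of the active rate among the top-ranked fraction of compounds
    to the active rate in the whole library.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))  # at least one compound
    active_rate_top = sum(y for _, y in ranked[:n_top]) / n_top
    active_rate_all = sum(labels) / len(labels)
    return active_rate_top / active_rate_all


# Hypothetical library of 200 compounds with 4 actives; higher score = better rank.
toy_scores = list(range(200, 0, -1))
toy_labels = [1, 1, 0, 1] + [0] * 195 + [1]

print("AUC:", roc_auc(toy_scores, toy_labels))
print("EF1%:", enrichment_factor(toy_scores, toy_labels, fraction=0.01))
```

Under this definition the largest possible EF1% is 100, reached when every compound in the top 1% is active and the library is large enough; a value of 1 corresponds to random ranking.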

19 pages, 1445 KB

Open Access Article

Protein–Protein Interaction Prediction for Targeted Protein Degradation

by Oliver Orasch, Noah Weber, Michael Müller, Amir Amanzadi, Chiara Gasbarri and Christopher Trummer

Int. J. Mol. Sci. 2022, 23(13), 7033; https://doi.org/10.3390/ijms23137033 - 24 Jun 2022

Cited by 12 | Viewed by 7966

Abstract

Protein–protein interactions (PPIs) play a fundamental role in various biological functions; thus, detecting PPI sites is essential for understanding diseases and developing new drugs. PPI prediction is of particular relevance for the development of drugs employing targeted protein degradation, as their efficacy relies on the formation of a stable ternary complex involving two proteins. However, experimental methods to detect PPI sites are both costly and time-intensive. In recent years, machine learning-based methods have been developed as screening tools. While they are computationally more efficient than traditional docking methods and thus allow rapid execution, these tools have so far primarily been based on sequence information, and they are therefore limited in their ability to address spatial requirements. In addition, they have to date not been applied to targeted protein degradation. Here, we present a new deep learning architecture based on the concept of graph representation learning that can predict interaction sites and interactions of proteins based on their surface representations. We demonstrate that our model reaches state-of-the-art performance using AUROC scores on the established MaSIF dataset. We furthermore introduce a new dataset with more diverse protein interactions and show that our model generalizes well to this new data. These generalization capabilities allow our model to predict the PPIs relevant for targeted protein degradation, which we show by demonstrating the high accuracy of our model for PPI prediction on the available ternary complex data. Our results suggest that PPI prediction models can be a valuable tool for screening protein pairs while developing new drugs for targeted protein degradation. Full article

(This article belongs to the Special Issue Chemoinformatics and Bioinformatics Tools in Structure-Activity Modelling in Molecular Sciences 2.0)
