Machine Learning Applications in Bioinformatics and Biomedicine

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: closed (31 March 2024) | Viewed by 18738

Special Issue Editor


Prof. Dr. Hao Lin
Guest Editor
Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
Interests: protein informatics; peptide sequence analysis; machine learning applications in biological macromolecular data; biomarkers; protein post-translational modification sites; systems biology; clinical data analysis; disease risk prediction; analysis and identification of DNA regulatory elements

Special Issue Information

Dear Colleagues,

Machine learning has been under development for over 40 years. In recent years, with the rapid accumulation of data in the biological and medical fields, it has been widely used in both. The purpose of this Special Issue is to provide a platform for publishing the latest cutting-edge work on the application of machine learning in the biological and medical fields, and to promote the development of related fields. This Special Issue will focus on the development and application of computational methods and techniques for discovering disease markers in biological and medical data. The subtopics include, but are not limited to, the following:

  • Identification of disease markers from the genome, transcriptome, proteome and metabolome;
  • Discovery of drug targets using machine learning;
  • Drug design based on machine learning;
  • Using machine learning to analyze clinical data;
  • Research on big data from physical examinations based on machine learning and artificial intelligence;
  • Prediction of drug side effects based on machine learning;
  • Epigenetic marker discovery for disease using artificial intelligence;
  • Discovery of molecular network markers for disease diagnosis and therapy;
  • Early screening of diseases based on artificial intelligence.

Prof. Dr. Hao Lin
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • genome
  • transcriptome
  • proteome
  • metabolome
  • drug target
  • machine learning
  • prediction of drug side effects
  • epigenetics markers discovery for disease
  • molecular network marker
  • early screening of diseases

Published Papers (14 papers)

Research

13 pages, 10553 KiB  
Article
Advancing Adverse Drug Reaction Prediction with Deep Chemical Language Model for Drug Safety Evaluation
by Jinzhu Lin, Yujie He, Chengxiang Ru, Wulin Long, Menglong Li and Zhining Wen
Int. J. Mol. Sci. 2024, 25(8), 4516; https://doi.org/10.3390/ijms25084516 - 20 Apr 2024
Viewed by 408
Abstract
The accurate prediction of adverse drug reactions (ADRs) is essential for comprehensive drug safety evaluation. Pre-trained deep chemical language models have emerged as powerful tools capable of automatically learning molecular structural features from large-scale datasets, showing promising capabilities for the downstream prediction of molecular properties. However, the performance of pre-trained chemical language models in predicting ADRs, especially idiosyncratic ADRs induced by marketed drugs, remains largely unexplored. In this study, we propose MoLFormer-XL, a pre-trained model for encoding molecular features from canonical SMILES, in conjunction with a CNN-based model to predict drug-induced QT interval prolongation (DIQT), drug-induced teratogenicity (DIT), and drug-induced rhabdomyolysis (DIR). Our results demonstrate that the proposed model outperforms conventional models applied in previous studies for predicting DIQT, DIT, and DIR. Notably, an analysis of the learned linear attention maps highlights amines, alcohol, ethers, and aromatic halogen compounds as strongly associated with the three types of ADRs. These findings hold promise for enhancing drug discovery pipelines and reducing the drug attrition rate due to safety concerns.
(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine)

11 pages, 444 KiB  
Article
Optimizing Neural Networks for Chemical Reaction Prediction: Insights from Methylene Blue Reduction Reactions
by Ivan Malashin, Vadim Tynchenko, Andrei Gantimurov, Vladimir Nelyub and Aleksei Borodulin
Int. J. Mol. Sci. 2024, 25(7), 3860; https://doi.org/10.3390/ijms25073860 - 29 Mar 2024
Viewed by 534
Abstract
This paper offers a thorough investigation of hyperparameter tuning for neural network architectures using datasets encompassing various combinations of Methylene Blue (MB) Reduction by Ascorbic Acid (AA) reactions with different solvents and concentrations. The aim is to predict coefficients of decay plots for MB absorbance, shedding light on the complex dynamics of chemical reactions. Our findings reveal that the optimal model, determined through our investigation, consists of five hidden layers, each with sixteen neurons and employing the Swish activation function. This model yields an NMSE of 0.05, 0.03, and 0.04 for predicting the coefficients A, B, and C, respectively, in the exponential decay equation $A + B \cdot e^{x/C}$. These findings contribute to the realm of drug design based on machine learning, providing valuable insights into optimizing chemical reaction predictions.
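As a rough illustration of the quantities named in this abstract, the sketch below evaluates the decay curve A + B · e^(x/C) and computes a normalized mean squared error. The abstract does not say how NMSE is normalized, so dividing by the variance of the observed values is an assumption, and the coefficient values are hypothetical numbers chosen only for the example.

```python
import math


def decay(x, a, b, c):
    """Evaluate the curve A + B * exp(x / C) at a single point x."""
    return a + b * math.exp(x / c)


def nmse(y_true, y_pred):
    """Mean squared error divided by the variance of y_true.

    This normalization is an assumption; other NMSE definitions exist.
    """
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    return mse / var


# Hypothetical coefficients (a negative C makes the curve decay).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [decay(x, 0.1, 0.9, -2.0) for x in xs]
predicted = [decay(x, 0.1, 0.9, -2.2) for x in xs]
score = nmse(observed, predicted)
print(f"NMSE between the two curves: {score:.4f}")
```

Under this definition a score of 0 means a perfect match, and a score of 1 means the prediction is no better than always returning the mean of the observed values.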

13 pages, 679 KiB  
Article
ReporType: A Flexible Bioinformatics Tool for Targeted Loci Screening and Typing of Infectious Agents
by Helena Cruz, Miguel Pinheiro and Vítor Borges
Int. J. Mol. Sci. 2024, 25(6), 3172; https://doi.org/10.3390/ijms25063172 - 9 Mar 2024
Cited by 1 | Viewed by 1058
Abstract
In response to the pressing need for continuous monitoring of emergence and circulation of pathogens through genomics, it is imperative to keep developing bioinformatics tools that can help in their rapid characterization and classification. Here, we introduce ReporType, a versatile bioinformatics pipeline designed for targeted loci screening and typing of infectious agents. Developed using the snakemake workflow manager, ReporType integrates multiple software for read quality control and de novo assembly, and then applies ABRicate for locus screening, culminating in the production of easily interpretable reports for the identification of pathogen genotypes and/or screening of specific genomic loci. The pipeline accommodates a range of input formats, from Illumina or Oxford Nanopore Technology (ONT) reads (FASTQ) to Sanger sequencing files (AB1), or FASTA files, making it flexible for application in multiple pathogens and with different purposes. ReporType is released with pre-prepared databases for some viruses and bacteria, yet it remains easily configurable to handle custom databases. ReporType performance and functionality were validated through proof-of-concept exercises, encompassing diverse pathogenic species, including viruses such as measles, Newcastle disease virus (NDV), Dengue virus (DENV), influenza, hepatitis C virus (HCV) and Human T-Cell Lymphotropic virus type 1 (HTLV-1), as well as bacteria like Chlamydia trachomatis and Legionella pneumophila. In summary, ReporType emerges as a simple, dynamic and pan-pathogen tool, poised to evolve in tandem with the ever-changing needs of the fields of pathogen genomics, infectious disease epidemiology, and one health bioinformatics. ReporType is freely available at GitHub.

15 pages, 541 KiB  
Article
scMGCN: A Multi-View Graph Convolutional Network for Cell Type Identification in scRNA-seq Data
by Hongmin Sun, Haowen Qu, Kaifu Duan and Wei Du
Int. J. Mol. Sci. 2024, 25(4), 2234; https://doi.org/10.3390/ijms25042234 - 13 Feb 2024
Viewed by 770
Abstract
Single-cell RNA sequencing (scRNA-seq) data reveal the complexity and diversity of cellular ecosystems and molecular interactions in various biomedical research. Hence, identifying cell types from large-scale scRNA-seq data using existing annotations is challenging and requires stable and interpretable methods. However, the current cell type identification methods have limited performance, mainly due to the intrinsic heterogeneity among cell populations and extrinsic differences between datasets. Here, we present a robust graph artificial intelligence model, a multi-view graph convolutional network model (scMGCN) that integrates multiple graph structures from raw scRNA-seq data and applies graph convolutional networks with attention mechanisms to learn cell embeddings and predict cell labels. We evaluate our model on single-dataset, cross-species, and cross-platform experiments and compare it with other state-of-the-art methods. Our results show that scMGCN outperforms the other methods regarding stability, accuracy, and robustness to batch effects. Our main contributions are as follows: Firstly, we introduce multi-view learning and multiple graph construction methods to capture comprehensive cellular information from scRNA-seq data. Secondly, we construct a scMGCN that combines graph convolutional networks with attention mechanisms to extract shared, high-order information from cells. Finally, we demonstrate the effectiveness and superiority of the scMGCN on various datasets.
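For readers unfamiliar with graph convolution, the sketch below shows one propagation step of the widely used symmetric-normalized form, ReLU(D^(-1/2) (A + I) D^(-1/2) X W), on a single cell graph. It is not the scMGCN implementation: the multi-view integration and attention mechanism described in the abstract are omitted, and the tiny adjacency matrix, features, and weights are hypothetical values for illustration only.

```python
import math


def gcn_layer(adj, feats, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    n_in = len(feats[0])
    n_out = len(weights[0])
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of both node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    # Apply the linear map followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][h] for f in range(n_in)))
             for h in range(n_out)] for i in range(n)]


# Hypothetical toy graph: cells 0 and 1 are neighbours, cell 2 is isolated.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [[1.0], [1.0]]
embeddings = gcn_layer(adj, feats, weights)
print(embeddings)
```

In a multi-view setting, a step like this would be applied to each graph separately before the per-view embeddings are combined; how scMGCN combines them is not specified in the abstract.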

16 pages, 6709 KiB  
Article
Identifying Explainable Machine Learning Models and a Novel SFRP2+ Fibroblast Signature as Predictors for Precision Medicine in Ovarian Cancer
by Ziyi Yang, Dandan Zhou and Jun Huang
Int. J. Mol. Sci. 2023, 24(23), 16942; https://doi.org/10.3390/ijms242316942 - 29 Nov 2023
Viewed by 1289
Abstract
Ovarian cancer (OC) is a type of malignant tumor with a consistently high mortality rate. The diagnosis of early-stage OC and identification of functional subsets in the tumor microenvironment are essential to the development of patient management strategies. However, the development of robust models remains unsatisfactory. We aimed to utilize artificial intelligence and single-cell analysis to address this issue. Two independent datasets were screened from the Gene Expression Omnibus (GEO) database and processed to obtain overlapping differentially expressed genes (DEGs) in stage II–IV vs. stage I diseases. Three explainable machine learning algorithms were integrated to construct models that could determine the tumor stage and extract important characteristic genes as diagnostic biomarkers. Correlations between cancer-associated fibroblast (CAF) infiltration and characteristic gene expression were analyzed using TIMER2.0 and their relationship with survival rates was comprehensively explored via the Kaplan–Meier plotter (KM-plotter) online database. The specific expression of characteristic genes in fibroblast subsets was investigated through single-cell analysis. A novel fibroblast subset signature was explored to predict immune checkpoint inhibitor (ICI) response and oncogene mutation through Tumor Immune Dysfunction and Exclusion (TIDE) and artificial neural network algorithms, respectively. We found that Support Vector Machine–Shapley Additive Explanations (SVM-SHAP), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) successfully diagnosed early-stage OC (stage I). The area under the receiver operating characteristic curves (AUCs) of these models exceeded 0.990. Their overlapping characteristic gene, secreted frizzled-related protein 2 (SFRP2), was a risk factor that affected the overall survival of OC patients with stage II–IV disease (log-rank test: p < 0.01) and was specifically expressed in a fibroblast subset. 
Finally, the SFRP2+ fibroblast signature served as a novel predictor in evaluating ICI response and exploring pan-cancer tumor protein P53 (TP53) mutation (AUC = 0.853, 95% confidence interval [CI]: 0.829–0.877). In conclusion, the models based on SVM-SHAP, XGBoost, and RF enabled the early detection of OC for clinical decision making, and SFRP2+ fibroblast signature used in diagnostic models can inform OC treatment selection and offer pan-cancer TP53 mutation detection.

19 pages, 7726 KiB  
Article
An In Silico Study for Expanding the Utility of Cannabidiol in Alzheimer’s Disease Therapeutic Development
by Kyudam Choi, Yurim Lee and Cheongwon Kim
Int. J. Mol. Sci. 2023, 24(21), 16013; https://doi.org/10.3390/ijms242116013 - 6 Nov 2023
Viewed by 1246
Abstract
Cannabidiol (CBD), a major non-psychoactive component of the cannabis plant, has shown therapeutic potential in Alzheimer’s disease (AD). In this study, we identified potential CBD targets associated with AD using a drug-target binding affinity prediction model and generated CBD analogs using a genetic algorithm combined with a molecular docking system. As a result, we identified six targets associated with AD: Endothelial NOS (ENOS), Myeloperoxidase (MPO), Apolipoprotein E (APOE), Amyloid-beta precursor protein (APP), Disintegrin and metalloproteinase domain-containing protein 10 (ADAM10), and Presenilin-1 (PSEN1). Furthermore, we generated CBD analogs for each target that optimize for all desired drug-likeness properties and physicochemical property filters, resulting in improved pIC50 values and docking scores compared to CBD. Molecular dynamics (MD) simulations were applied to analyze each target’s CBD and highest-scoring CBD analogs. The MD simulations revealed that the complexes of ENOS, MPO, and ADAM10 with CBD exhibited high conformational stability, and the APP and PSEN1 complexes with CBD analogs demonstrated even higher conformational stability and lower interaction energy compared to APP and PSEN1 complexes with CBD. These findings demonstrated the capable binding of the six identified targets with CBD and the enhanced binding stability achieved with the developed CBD analogs for each target.

26 pages, 24628 KiB  
Article
Network Pharmacology Combined with Machine Learning to Reveal the Action Mechanism of Licochalcone Intervention in Liver Cancer
by Fangfang Guo, Xiaotang Yang, Chengxiang Hu, Wannan Li and Weiwei Han
Int. J. Mol. Sci. 2023, 24(21), 15935; https://doi.org/10.3390/ijms242115935 - 3 Nov 2023
Viewed by 1122
Abstract
There are reports indicating that licochalcones can inhibit the proliferation, migration, and invasion of cancer cells by promoting the expression of autophagy-related proteins, inhibiting the expression of cell cycle proteins and angiogenic factors, and regulating autophagy and apoptosis. This study aims to reveal the potential mechanisms of licochalcone A (LCA), licochalcone B (LCB), licochalcone C (LCC), licochalcone D (LCD), licochalcone E (LCE), licochalcone F (LCF), and licochalcone G (LCG) inhibition in liver cancer through computer-aided screening strategies. By using machine learning clustering analysis to search for other structurally similar components in licorice, quantitative calculations were conducted to collect the structural commonalities of these components related to liver cancer and to identify key residues involved in the interactions between small molecules and key target proteins. Our research results show that the seven licochalcones molecules interfere with the cancer signaling pathway via the NF-κB signaling pathway, PDL1 expression and PD1 checkpoint pathway in cancer, and others. Glypallichalcone, Echinatin, and 3,4,3′,4′-Tetrahydroxy-2-methoxychalcone in licorice also have similar structures to the seven licochalcones, which may indicate their similar effects. We also identified the key residues (including ASN364, GLY365, TRP366, and TYR485) involved in the interactions between ten flavonoids and the key target protein (nitric oxide synthase 2). In summary, we provide valuable insights into the molecular mechanisms of the anticancer effects of licorice flavonoids, providing new ideas for the design of small molecules for liver cancer drugs.

11 pages, 4258 KiB  
Article
Deep Learning Approach for Differentiating Etiologies of Pediatric Retinal Hemorrhages: A Multicenter Study
by Pooya Khosravi, Nolan A. Huck, Kourosh Shahraki, Stephen C. Hunter, Clifford Neil Danza, So Young Kim, Brian J. Forbes, Shuan Dai, Alex V. Levin, Gil Binenbaum, Peter D. Chang and Donny W. Suh
Int. J. Mol. Sci. 2023, 24(20), 15105; https://doi.org/10.3390/ijms242015105 - 12 Oct 2023
Cited by 2 | Viewed by 1114
Abstract
Retinal hemorrhages in pediatric patients can be a diagnostic challenge for ophthalmologists. These hemorrhages can occur due to various underlying etiologies, including abusive head trauma, accidental trauma, and medical conditions. Accurate identification of the etiology is crucial for appropriate management and legal considerations. In recent years, deep learning techniques have shown promise in assisting healthcare professionals in making more accurate and timely diagnosis of a variety of disorders. We explore the potential of deep learning approaches for differentiating etiologies of pediatric retinal hemorrhages. Our study, which spanned multiple centers, analyzed 898 images, resulting in a final dataset of 597 retinal hemorrhage fundus photos categorized into medical (49.9%) and trauma (50.1%) etiologies. Deep learning models, specifically those based on ResNet and transformer architectures, were applied; FastViT-SA12, a hybrid transformer model, achieved the highest accuracy (90.55%) and area under the receiver operating characteristic curve (AUC) of 90.55%, while ResNet18 secured the highest sensitivity value (96.77%) on an independent test dataset. The study highlighted areas for optimization in artificial intelligence (AI) models specifically for pediatric retinal hemorrhages. While AI proves valuable in diagnosing these hemorrhages, the expertise of medical professionals remains irreplaceable. Collaborative efforts between AI specialists and pediatric ophthalmologists are crucial to fully harness AI’s potential in diagnosing etiologies of pediatric retinal hemorrhages.
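The three metrics reported in this abstract (accuracy, sensitivity, and AUC) can be computed for any binary classifier as in the minimal sketch below. The labels and scores are made-up values, not data from the study, and treating label 1 as the "positive" class is an assumption, since the abstract does not say which etiology plays that role.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """True positives divided by all actual positives (recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} auc={auc:.3f}")
```

Accuracy and sensitivity depend on the chosen threshold, whereas AUC is computed from the raw scores, which is one reason different models can lead on different metrics.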

12 pages, 439 KiB  
Article
A Machine-Learning-Based Approach to Prediction of Biogeographic Ancestry within Europe
by Anna Kloska, Agata Giełczyk, Tomasz Grzybowski, Rafał Płoski, Sylwester M. Kloska, Tomasz Marciniak, Krzysztof Pałczyński, Urszula Rogalla-Ładniak, Boris A. Malyarchuk, Miroslava V. Derenko, Nataša Kovačević-Grujičić, Milena Stevanović, Danijela Drakulić, Slobodan Davidović, Magdalena Spólnicka, Magdalena Zubańska and Marcin Woźniak
Int. J. Mol. Sci. 2023, 24(20), 15095; https://doi.org/10.3390/ijms242015095 - 11 Oct 2023
Cited by 1 | Viewed by 1806
Abstract
Data obtained with the use of massive parallel sequencing (MPS) can be valuable in population genetics studies. In particular, such data harbor the potential for distinguishing samples from different populations, especially from those coming from adjacent populations of common origin. Machine learning (ML) techniques seem to be especially well suited for analyzing large datasets obtained using MPS. The Slavic populations constitute about a third of the population of Europe and inhabit a large area of the continent, while being relatively closely related in population genetics terms. In this proof-of-concept study, various ML techniques were used to classify DNA samples from Slavic and non-Slavic individuals. The primary objective of this study was to empirically evaluate the feasibility of discerning the genetic provenance of individuals of Slavic descent who exhibit genetic similarity, with the overarching goal of categorizing DNA specimens derived from diverse Slavic population representatives. Raw sequencing data were pre-processed, to obtain a 1200 character-long binary vector. A total of three classifiers were used: Random Forest, Support Vector Machine (SVM), and XGBoost. The most-promising results were obtained using SVM with a linear kernel, with 99.9% accuracy and F1-scores of 0.9846–1.000 for all classes.
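Since this abstract reports an F1-score per class, the sketch below shows how such per-class scores are typically derived from true and predicted labels in a multi-class setting. The population labels "A", "B", and "C" are placeholders and the predictions are invented for the example; none of it comes from the study.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} using one-vs-rest counts for each label."""
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        denom = 2 * tp + fp + fn
        result[label] = 2 * tp / denom if denom else 0.0
    return result


# Placeholder labels and predictions, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]
f1_scores = per_class_f1(y_true, y_pred)
print(f1_scores)
```

Reporting the range of per-class scores, as the abstract does, shows whether a high overall accuracy hides a weaker class.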

17 pages, 790 KiB  
Article
Graph Convolutional Network and Contrastive Learning Small Nucleolar RNA (snoRNA) Disease Associations (GCLSDA): Predicting snoRNA–Disease Associations via Graph Convolutional Network and Contrastive Learning
by Liangliang Zhang, Ming Chen, Xiaowen Hu and Lei Deng
Int. J. Mol. Sci. 2023, 24(19), 14429; https://doi.org/10.3390/ijms241914429 - 22 Sep 2023
Cited by 1 | Viewed by 905
Abstract
Small nucleolar RNAs (snoRNAs) constitute a prevalent class of noncoding RNAs localized within the nucleoli of eukaryotic cells. Their involvement in diverse diseases underscores the significance of forecasting associations between snoRNAs and diseases. However, conventional experimental techniques for such predictions suffer limitations in scalability, protracted timelines, and suboptimal success rates. Consequently, efficient computational methodologies are imperative to realize the accurate predictions of snoRNA–disease associations. Herein, we introduce GCLSDA—graph Convolutional Network and contrastive learning predict snoRNA disease associations. GCLSDA is an innovative framework that combines graph convolution networks and self-supervised learning for snoRNA–disease association prediction. Leveraging the repository of MNDR v4.0 and ncRPheno databases, we construct a robust snoRNA–disease association dataset, which serves as the foundation to create bipartite graphs. The computational prowess of the light graph convolutional network (LightGCN) is harnessed to acquire nuanced embedded representations of both snoRNAs and diseases. With careful consideration, GCLSDA intelligently incorporates contrast learning to address the challenging issues of sparsity and over-smoothing inside correlation matrices. This combination not only ensures the precision of predictions but also amplifies the model’s robustness. Moreover, we introduce the augmentation technique of random noise to refine the embedded snoRNA representations, consequently enhancing the precision of predictions. Within the domain of contrast learning, we unite the tasks of contrast and recommendation. This harmonization streamlines the cross-layer contrast process, simplifying the information propagation and concurrently curtailing computational complexity. In the area of snoRNA–disease associations, GCLSDA constantly shows its promising capacity for prediction through extensive research. 
This success not only contributes valuable insights into the functional roles of snoRNAs in disease etiology, but also plays an instrumental role in identifying potential drug targets and catalyzing innovative treatment modalities.

20 pages, 7040 KiB  
Article
A Robust Drug–Target Interaction Prediction Framework with Capsule Network and Transfer Learning
by Yixian Huang, Hsi-Yuan Huang, Yigang Chen, Yang-Chi-Dung Lin, Lantian Yao, Tianxiu Lin, Junlin Leng, Yuan Chang, Yuntian Zhang, Zihao Zhu, Kun Ma, Yeong-Nan Cheng, Tzong-Yi Lee and Hsien-Da Huang
Int. J. Mol. Sci. 2023, 24(18), 14061; https://doi.org/10.3390/ijms241814061 - 14 Sep 2023
Cited by 1 | Viewed by 1713
Abstract
Drug–target interactions (DTIs) are considered a crucial component of drug design and drug discovery. To date, many computational methods were developed for drug–target interactions, but they are insufficiently informative for accurately predicting DTIs due to the lack of experimentally verified negative datasets, inaccurate molecular feature representation, and ineffective DTI classifiers. Therefore, we address the limitations of randomly selecting negative DTI data from unknown drug–target pairs by establishing two experimentally validated datasets and propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships of drugs and targets, which adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning and the message-passing neural network (MPNN) for the 2-D graph feature extraction of compounds to accurately and robustly identify drug–target interactions. We compared the performance of CapBM-DTI with state-of-the-art methods using four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. The case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.

11 pages, 1280 KiB  
Communication
A Benchmark Study of Graph Models for Molecular Acute Toxicity Prediction
by Rajas Ketkar, Yue Liu, Hengji Wang and Hao Tian
Int. J. Mol. Sci. 2023, 24(15), 11966; https://doi.org/10.3390/ijms241511966 - 26 Jul 2023
Cited by 1 | Viewed by 1493
Abstract
With the wide usage of organic compounds, the assessment of their acute toxicity has drawn great attention to reduce animal testing and human labor. The development of graph models provides new opportunities for acute toxicity prediction. In this study, five graph models (message-passing [...] Read more.
With the wide usage of organic compounds, the assessment of their acute toxicity has drawn great attention to reduce animal testing and human labor. The development of graph models provides new opportunities for acute toxicity prediction. In this study, five graph models (message-passing neural network, graph convolution network, graph attention network, path-augmented graph transformer network, and Attentive FP) were applied on four toxicity tasks (fish, Daphnia magna, Tetrahymena pyriformis, and Vibrio fischeri). With the lowest prediction error, Attentive FP was reported to have the best performance in all four tasks. Moreover, the attention weights of the Attentive FP model helped to construct atomic heatmaps and provide good explainability. Full article
(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine)

12 pages, 1609 KiB  
Article
A Random Forest Model for Peptide Classification Based on Virtual Docking Data
by Hua Feng, Fangyu Wang, Ning Li, Qian Xu, Guanming Zheng, Xuefeng Sun, Man Hu, Guangxu Xing and Gaiping Zhang
Int. J. Mol. Sci. 2023, 24(14), 11409; https://doi.org/10.3390/ijms241411409 - 13 Jul 2023
Cited by 2 | Viewed by 1300
Abstract
The affinity of peptides is a crucial factor in studying peptide–protein interactions. Despite the development of various techniques to evaluate peptide–receptor affinity, the results may not always reflect the actual affinity of the peptides accurately. The current study provides a free tool to assess the actual peptide affinity based on virtual docking data. This study employed a dataset that combined actual peptide affinity information (active and inactive) with virtual peptide–receptor docking data, and applied several machine learning algorithms to it. Compared with the other algorithms, the random forest (RF) algorithm showed the best performance and was used to build three RF models using different numbers of significant features (four, three, and two). Further analysis revealed that the four-feature RF model achieved the highest accuracy of 0.714 in classifying an independent unknown peptide dataset designed with the PEDV spike protein, and it also revealed overfitting problems in the other models. This four-feature RF model was used to evaluate peptide affinity by constructing the relationship between the actual affinity and the virtual docking scores of peptides to their receptors.
(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine)

25 pages, 5379 KiB  
Article
Using Machine Learning Methods to Study Colorectal Cancer Tumor Micro-Environment and Its Biomarkers
by Wei Wei, Yixue Li and Tao Huang
Int. J. Mol. Sci. 2023, 24(13), 11133; https://doi.org/10.3390/ijms241311133 - 6 Jul 2023
Cited by 1 | Viewed by 2107
Abstract
Colorectal cancer (CRC) is a leading cause of cancer deaths worldwide, and the identification of biomarkers can improve early detection and personalized treatment. In this study, RNA-seq data and gene chip data from TCGA and GEO were used to explore potential biomarkers for CRC. The SMOTE method was used to address class imbalance, and four feature selection algorithms (MCFS, Boruta, mRMR, and LightGBM) were used to select genes from the gene expression matrix. Four machine learning algorithms (SVM, XGBoost, RF, and kNN) were then employed to obtain the optimal number of genes for model construction. Through interpretable machine learning (IML), co-predictive networks were generated to identify rules and uncover underlying relationships among the selected genes. Survival analysis revealed that INHBA, FNBP1, PDE9A, HIST1H2BG, and CADM3 were significantly correlated with prognosis in CRC patients. In addition, the CIBERSORT algorithm was used to investigate the proportion of immune cells in CRC tissues, and gene mutation rates for the five selected biomarkers were explored. The biomarkers identified in this study have significant implications for the development of personalized therapies and could ultimately lead to improved clinical outcomes for CRC patients.
(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine)