International Journal of Molecular Sciences

Research

14 pages, 1815 KB

Open Access Article

Machine Learning as a Support for the Diagnosis of Type 2 Diabetes

by Antonio Agliata, Deborah Giordano, Francesco Bardozzo, Salvatore Bottiglieri, Angelo Facchiano and Roberto Tagliaferri

Int. J. Mol. Sci. 2023, 24(7), 6775; https://doi.org/10.3390/ijms24076775 - 5 Apr 2023

Cited by 53 | Viewed by 9406

Abstract

Diabetes is a chronic, metabolic disease characterized by high blood sugar levels. Among the main types of diabetes, type 2 is the most common. Early diagnosis and treatment can prevent or delay the onset of complications. Previous studies examined the application of machine learning techniques for prediction of the pathology, and here an artificial neural network shows very promising results as a possible valuable aid in the management and prevention of diabetes. Additionally, its superior ability for long-term predictions makes it an ideal choice for this field of study. We utilized machine learning methods to uncover previously undiscovered associations between an individual’s health status and the development of type 2 diabetes, with the goal of accurately predicting its onset or determining the individual’s risk level. Our study employed a binary classifier, trained from scratch, to identify potential nonlinear relationships between the onset of type 2 diabetes and a set of parameters obtained from patient measurements. Three datasets were utilized, i.e., the National Center for Health Statistics’ (NHANES) biennial survey, MIMIC-III and MIMIC-IV. These datasets were then combined to create a single dataset with the same number of individuals with and without type 2 diabetes. Since the dataset was balanced, the primary evaluation metric for the model was accuracy. The outcomes of this study were encouraging, with the model achieving accuracy levels of up to 86% and a ROC AUC value of 0.934. Further investigation is needed to improve the reliability of the model by considering multiple measurements from the same patient over time. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)
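
The abstract above describes balancing the combined dataset and then reporting accuracy and ROC AUC. The following is a minimal sketch of those two ideas in plain Python, not the authors' pipeline: the record layout (`label` key), the downsampling strategy, the 0.5 decision threshold, and all function names are assumptions made for illustration only.

```python
import random


def balance_by_downsampling(records, label_key="label", seed=0):
    """Return a shuffled list with equal numbers of positive (1) and negative (0) records.

    Assumption: the larger class is randomly downsampled to the size of the smaller one.
    """
    rng = random.Random(seed)
    pos = [r for r in records if r[label_key] == 1]
    neg = [r for r in records if r[label_key] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases where (score >= threshold) agrees with the 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On a balanced set, a classifier that always predicts one class scores 50% accuracy, which is why accuracy is an interpretable headline metric there; ROC AUC complements it because it does not depend on the chosen threshold.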

16 pages, 8803 KB

Open Access Article

Molecular Docking and Dynamics Simulation Revealed the Potential Inhibitory Activity of New Drugs against Human Topoisomerase I Receptor

by Francesco Madeddu, Jessica Di Martino, Michele Pieroni, Davide Del Buono, Paolo Bottoni, Lorenzo Botta, Tiziana Castrignanò and Raffaele Saladino

Int. J. Mol. Sci. 2022, 23(23), 14652; https://doi.org/10.3390/ijms232314652 - 24 Nov 2022

Cited by 18 | Viewed by 5491

Abstract

Human Topoisomerase I (hTop1p) is a ubiquitous enzyme that relaxes supercoiled DNA through a conserved mechanism involving transient breakage, rotation, and binding. hTop1p is the molecular target of the chemotherapeutic drug camptothecin (CPT). It causes the hTop1p-DNA complex to slow down the binding process and clash with the replicative machinery during the S phase of the cell cycle, forcing cells to activate the apoptotic response. This gives hTop1p a central role in cancer therapy. Recently, two artesunic acid derivatives (compounds c6 and c7) have been proposed as promising inhibitors of hTop1p with possible antitumor activity. We used several computational approaches to obtain in silico confirmations of the experimental data and to form a comprehensive dynamic description of the ligand-receptor system. We performed molecular docking analyses to verify the ability of the two new derivatives to access the enzyme-DNA interface, and a classical molecular dynamics simulation was performed to assess the capacity of the two compounds to maintain a stable binding pose over time. Finally, we calculated the noncovalent interactions between the two new derivatives and the hTop1p receptor in order to propose a possible inhibitory mechanism like that adopted by CPT. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

13 pages, 1304 KB

Open Access Article

hgtseq: A Standard Pipeline to Study Horizontal Gene Transfer

by Simone Carpanzano, Mariangela Santorsola, nf-core community and Francesco Lescai

Int. J. Mol. Sci. 2022, 23(23), 14512; https://doi.org/10.3390/ijms232314512 - 22 Nov 2022

Cited by 2 | Viewed by 5180

Abstract

Horizontal gene transfer (HGT) is well described in prokaryotes: it plays a crucial role in evolution, and has functional consequences in insects and plants. However, less is known about HGT in humans. Studies have reported bacterial integrations in cancer patients, and microbial sequences have been detected in data from well-known human sequencing projects. Few of the existing tools for investigating HGT are highly automated. Thanks to the adoption of Nextflow for life sciences workflows, and to the standards and best practices curated by communities such as nf-core, fully automated, portable, and scalable pipelines can now be developed. Here we present nf-core/hgtseq to facilitate the analysis of HGT from sequencing data in different organisms. We showcase its performance by analysing six exome datasets from five mammals. Hgtseq can be run seamlessly in any computing environment and accepts data generated by existing exome and whole-genome sequencing projects; this will enable researchers to expand their analyses into this area. Fundamental questions are still open about the mechanisms and the extent or role of horizontal gene transfer: by releasing hgtseq we provide a standardised tool which will enable a systematic investigation of this phenomenon, thus paving the way for a better understanding of HGT. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

26 pages, 3641 KB

Open Access Article

Antiproliferative Activity Predictor: A New Reliable In Silico Tool for Drug Response Prediction against NCI60 Panel

by Annamaria Martorana, Gabriele La Monica, Alessia Bono, Salvatore Mannino, Silvestre Buscemi, Antonio Palumbo Piccionello, Carla Gentile, Antonino Lauria and Daniele Peri

Int. J. Mol. Sci. 2022, 23(22), 14374; https://doi.org/10.3390/ijms232214374 - 19 Nov 2022

Cited by 9 | Viewed by 4377

Abstract

In vitro antiproliferative assays still represent one of the most important tools in the anticancer drug discovery field, especially to gain insights into the mechanisms of action of anticancer small molecules. The NCI-DTP (National Cancer Institute Developmental Therapeutics Program) undoubtedly represents the most famous project aimed at rapidly testing thousands of compounds against multiple tumor cell lines (NCI60). The large amount of biological data stored in the National Cancer Institute (NCI) database and many other databases has led researchers in the fields of computational biology and medicinal chemistry to develop tools to predict the anticancer properties of new agents in advance. In this work, based on the available antiproliferative data collected by the NCI and the manipulation of molecular descriptors, we propose the new in silico Antiproliferative Activity Predictor (AAP) tool to calculate the GI₅₀ values of input structures against the NCI60 panel. This ligand-based protocol, validated by both internal and external sets of structures, has proven to be highly reliable and robust. The obtained GI₅₀ values of a test set of 99 structures present an error of less than ±1 unit. The AAP is more powerful for GI₅₀ calculation in the range of 4–6, showing that the results strictly correlate with the experimental data. The encouraging results were further supported by the examination of an in-house database of curcumin analogues that have already been studied as antiproliferative agents. The AAP tool identified several potentially active compounds, and a subsequent evaluation of a set of molecules selected by the NCI for the one-dose/five-dose antiproliferative assays confirmed the great potential of our protocol for the development of new anticancer small molecules. The integration of the AAP tool in the free web service DRUDIT provides an interesting device for the discovery and/or optimization of anticancer drugs to the medicinal chemistry community. The training set will be updated with new NCI-tested compounds to cover more chemical spaces, activities, and cell lines. Currently, the same protocol is being developed for predicting the TGI (total growth inhibition) and LC₅₀ (median lethal concentration) parameters to estimate toxicity profiles of small molecules. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

15 pages, 1212 KB

Open Access Article

Artificial Intelligence Predictor for Alzheimer’s Disease Trained on Blood Transcriptome: The Role of Oxidative Stress

by Luigi Chiricosta, Simone D’Angiolini, Agnese Gugliandolo and Emanuela Mazzon

Int. J. Mol. Sci. 2022, 23(9), 5237; https://doi.org/10.3390/ijms23095237 - 7 May 2022

Cited by 10 | Viewed by 3756

Abstract

Alzheimer’s disease (AD) is an incurable neurodegenerative disease diagnosed by clinicians through healthcare records and neuroimaging techniques. These methods lack sensitivity and specificity, so new antemortem non-invasive strategies to diagnose AD are needed. Herein, we designed a machine learning predictor based on transcriptomic data obtained from the blood of AD patients and individuals without dementia (non-AD) through an 8 × 60 K microarray. The dataset was used to train different models with different hyperparameters. The support vector machines method allowed us to reach a Receiver Operating Characteristic score of 93% and an accuracy of 89%. High score levels were also achieved by the neural network and logistic regression methods. Furthermore, the Gene Ontology enrichment analysis of the features selected to train the model along with the genes differentially expressed between the non-AD and AD transcriptomic profiles shows the “mitochondrial translation” biological process to be the most interesting. In addition, inspection of the KEGG pathways suggests that the accumulation of β-amyloid triggers electron transport chain impairment, enhancement of reactive oxygen species and endoplasmic reticulum stress. Taken together, all these elements suggest that the oxidative stress induced by β-amyloid is a key feature trained by the model for the prediction of AD with high accuracy. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

15 pages, 3660 KB

Open Access Article

Integrated OMICs Approach for the Group 1 Protease Mite-Allergen of House Dust Mite Dermatophagoides microceras

by Rei-Hsing Hu, Chun-Wen Cheng, Chia-Ta Wu, Jiunn-Liang Ko, Ko-Huang Lue and Yu-Fan Liu

Int. J. Mol. Sci. 2022, 23(7), 3810; https://doi.org/10.3390/ijms23073810 - 30 Mar 2022

Cited by 3 | Viewed by 2903

Abstract

House dust mites (HDMs) are one of the most important allergy-causing agents of asthma. In central Taiwan, the prevalence of sensitization to Dermatophagoides microceras (Der m), a particular mite species of HDMs, is approximately 80% and is related to the IgE cross-reactivity of Dermatophagoides pteronyssinus (Der p) and Dermatophagoides farinae (Der f). Integrated OMICs examination was used to identify and characterize the specific group 1 mite-allergic component (Der m 1). De novo draft genomic assembly and comparative genome analysis predicted in silico that the full-length Der m 1 allergen gene encodes 321 amino acids. Proteomics verified this result, and its recombinant protein production implicated the cysteine protease and α chain of fibrinogen proteolytic activity. In the sensitized mice, pathophysiological features and increased neutrophil accumulation were evident in the lung tissues and BALF with the combination of Der m 1 and 2 inhalation, respectively. Principal component analysis (PCA) of mouse cytokines revealed that the cytokine profiles of the allergen-sensitized mouse model with combined Der m 1 and 2 were similar to those with Der m 2 alone but differed from those with Der m 1 alone. Regarding the possible sensitizing roles of Der m 1 in the cells, the fibrinogen cleavage products (FCPs) derived from combined Der m 1 and Der m 2 induced the expression of pro-inflammatory cytokines IL-6 and IL-8 in human bronchial epithelium cells. Der m 1 biologically functions as a cysteine protease and contributes to the α chain of fibrinogen digestion in vitro. The combination of Der m 1 and 2 could induce similar cytokine expression patterns to Der m 2 in mice, and the FCPs derived from Der m 1 have a synergistic effect with Der m 2 to induce the expression of pro-inflammatory cytokines in human bronchial epithelium cells. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)

13 pages, 2228 KB

Open Access Article

High-Integrity Sequencing of Spike Gene for SARS-CoV-2 Variant Determination

by Yu-Chieh Liao, Feng-Jui Chen, Min-Chieh Chuang, Han-Chieh Wu, Wan-Chen Ji, Guann-Yi Yu and Tsi-Shu Huang

Int. J. Mol. Sci. 2022, 23(6), 3257; https://doi.org/10.3390/ijms23063257 - 17 Mar 2022

Cited by 8 | Viewed by 3494

Abstract

For tiling of the SARS-CoV-2 genome, the ARTIC Network provided a V4 protocol using 99 pairs of primers for amplicon production and is currently the widely used amplicon-based approach. However, this technique has regions of low sequence coverage and is labour-, time-, and cost-intensive. Moreover, it requires 14 pairs of primers in two separate PCRs to obtain spike gene sequences. To overcome these disadvantages, we proposed a single PCR to efficiently detect spike gene mutations. We proposed a bioinformatic protocol that can process FASTQ reads into spike gene consensus sequences to accurately call spike protein variants from sequenced samples or to fairly express the cases of missing amplicons. We evaluated the in silico detection rate of primer sets that yield amplicon sizes of 400, 1200, and 2500 bp for spike gene sequencing of SARS-CoV-2 to be 59.49, 76.19, and 92.20%, respectively. The in silico detection rate of our proposed single PCR primers was 97.07%. We demonstrated the robustness of our analytical protocol against 3000 Oxford Nanopore sequencing runs of distinct datasets, thus ensuring high-integrity sequencing of spike genes for variant SARS-CoV-2 determination. Our protocol works well with the data yielded from versatile primer designs, making it easy to determine spike protein variants. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)
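
The abstract above reports in silico detection rates for primer sets but does not define the criterion here. As a rough illustration only, the sketch below assumes that a genome counts as "detected" when every primer pair matches it exactly: the forward primer occurs in the genome and the reverse complement of the reverse primer occurs downstream of it. The exact-match rule and all function names are hypothetical and are not taken from the article.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair_matches(genome, forward, reverse):
    """True if `forward` occurs and the reverse complement of `reverse` occurs after it.

    Assumption: exact string matching, no mismatches or degenerate bases allowed.
    """
    start = genome.find(forward)
    if start == -1:
        return False
    return genome.find(reverse_complement(reverse), start + len(forward)) != -1


def detection_rate(genomes, primer_pairs):
    """Percentage of genomes in which every (forward, reverse) primer pair matches."""
    detected = sum(
        all(primer_pair_matches(g, f, r) for f, r in primer_pairs) for g in genomes
    )
    return 100.0 * detected / len(genomes)
```

Under this assumed criterion, a set with fewer primer pairs has fewer sites where a mutation can break a match, which is one plausible reading of why a single-PCR design could reach a higher rate than a multi-amplicon tiling scheme.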

Other

12 pages, 1403 KB

Open Access Perspective

What Is a Digital Twin? Experimental Design for a Data-Centric Machine Learning Perspective in Health

by Frank Emmert-Streib and Olli Yli-Harja

Int. J. Mol. Sci. 2022, 23(21), 13149; https://doi.org/10.3390/ijms232113149 - 29 Oct 2022

Cited by 32 | Viewed by 5996

Abstract

The idea of a digital twin has recently gained widespread attention. While, so far, it has been used predominantly for problems in engineering and manufacturing, it is believed that a digital twin also holds great promise for applications in medicine and health. However, a problem that severely hampers progress in these fields is the lack of a solid definition of the concept behind a digital twin that would be directly amenable to such big data-driven fields requiring a statistical data analysis. In this paper, we address this problem. We will see that the term ’digital twin’, as used in the literature, is like a Matryoshka doll. For this reason, we unstack the concept via a data-centric machine learning perspective, allowing us to define its main components. As a consequence, we suggest using the term Digital Twin System instead of digital twin because this highlights its complex interconnected substructure. In addition, we address ethical concerns that result from treatment suggestions for patients based on simulated data and a possible lack of explainability of the underlying models. Full article

(This article belongs to the Special Issue Data Mining and Bioinformatic Tools for Health)
