MDPI - Publisher of Open Access Journals

24 pages, 4049 KB

Open Access Article

Transcriptome-Wide Analysis and Experimental Validation from FFPE Tissue Identifies Stage-Specific Gene Expression Profiles Differentiating Adenoma, Carcinoma In-Situ and Adenocarcinoma in Colorectal Cancer Progression

by Faisal Alhosani, Reem Sami Alhamidi, Burcu Yener Ilce, Alaa Muayad Altaie, Nival Ali, Alaa Mohamed Hamad, Axel Künstner, Cyrus Khandanpour, Hauke Busch, Basel Al-Ramadi, Rania Harati, Kadria Sayed, Ali AlFazari, Riyad Bendardaf and Rifat Hamoudi

Int. J. Mol. Sci. 2025, 26(9), 4194; https://doi.org/10.3390/ijms26094194 - 28 Apr 2025

Viewed by 1932

Abstract

Colorectal cancer (CRC) progression occurs through three stages: adenoma (pre-cancerous lesion), carcinoma in situ (CIS) and adenocarcinoma, with tumor stage playing a pivotal role in prognosis and treatment outcomes. Despite therapeutic advancements, the lack of stage-specific biomarkers hinders the development of accurate diagnostic tools and effective therapeutic strategies. This study aims to identify stage-specific gene expression profiles and key molecular mechanisms in CRC, providing insights into molecular alterations across disease progression. Our methodological approach integrates absolute gene set enrichment analysis (absGSEA) on formalin-fixed paraffin-embedded (FFPE)-derived transcriptomic data with large-scale clinical validation and experimental confirmation. A comparative whole transcriptomic analysis (RNA-seq) was performed on FFPE samples including adenoma (n = 10), carcinoma in situ (CIS) (n = 8) and adenocarcinoma (n = 11) samples. Using absGSEA, we identified significant cellular pathways and putative molecular biomarkers associated with each stage of CRC progression. Key findings were then validated in a large independent CRC patient cohort (n = 1926), with survival analysis conducted on 1336 patients to assess the prognostic relevance of the candidate biomarkers. The key differentially expressed genes were experimentally validated using real-time PCR (RT-qPCR). Pathway analysis revealed that in CIS, apoptotic processes and Wnt signaling pathways were more prominent than in adenoma samples, while in adenocarcinoma, transcriptional co-regulatory mechanisms and protein kinase activity, which are critical for tumor growth and metastasis, were significantly enriched compared to adenoma. Additionally, extracellular matrix organization pathways were significantly enriched in adenocarcinoma compared to CIS. Distinct gene signatures were identified across CRC stages that differentiate between adenoma, CIS and adenocarcinoma.
In adenoma, ARRB1, CTBP1 and CTBP2 were overexpressed, suggesting their involvement in early tumorigenesis, whereas in CIS, RPS3A and COL4A5 were overexpressed, suggesting their involvement in the transition from benign to malignant stage. In adenocarcinoma, COL1A2, CEBPZ, MED10 and PAWR were overexpressed, suggesting their involvement in advanced disease progression. Functional analysis confirmed that ARRB1 and CTBP1/2 were associated with early tumor development, while COL1A2 and CEBPZ were involved in extracellular matrix remodeling and transcriptional regulation, respectively. Experimental validation with RT-qPCR confirmed the differential expression of the candidate biomarkers (ARRB1, RPS3A, COL4A5, COL1A2 and MED10) across the three CRC stages, reinforcing their potential as stage-specific biomarkers in CRC progression. These findings provide a foundation for distinguishing between the CRC stages and for developing accurate stage-specific diagnostic and prognostic biomarkers, which may in turn support the development of more effective therapeutic strategies for CRC.

(This article belongs to the Special Issue The Molecular Network, Key Biomarkers, and Therapeutic Targets in Colorectal Cancer)

16 pages, 4868 KB

Open Access Article

Drosophila Eye Gene Regulatory Network Inference Using BioGRNsemble: An Ensemble-of-Ensembles Machine Learning Approach

by Abdul Jawad Mohammed and Amal Khalifa

BioMedInformatics 2024, 4(4), 2186-2200; https://doi.org/10.3390/biomedinformatics4040117 - 29 Oct 2024

Viewed by 1617

Abstract

Background: Gene regulatory networks (GRNs) are complex gene interactions essential for organismal development and stability, and they are crucial for understanding gene-disease links in drug development. Advances in bioinformatics, driven by genomic data and machine learning, have significantly expanded GRN research, enabling deeper insights into these interactions. Methods: This study proposes and demonstrates the potential of BioGRNsemble, a modular and flexible approach for inferring gene regulatory networks from RNA-Seq data. Integrating the GENIE3 and GRNBoost2 algorithms, the BioGRNsemble methodology focuses on providing trimmed-down sub-regulatory networks consisting of transcription and target genes. Results: The methodology was successfully tested on a Drosophila melanogaster Eye gene expression dataset. Our validation analysis using the TFLink online database verified 3703 of the 534,843 predicted gene links. Conclusion: Although the BioGRNsemble approach presents a promising method for inferring smaller, focused regulatory networks, it encounters challenges related to algorithm sensitivity, prediction bias, validation difficulties, and the potential exclusion of broader regulatory interactions. Improving accuracy and comprehensiveness will require addressing these issues through hyperparameter fine-tuning, the development of alternative scoring mechanisms, and the incorporation of additional validation methods.
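
The validation figures quoted in the Results can be put in proportion with a short sketch. It is only an illustration of checking predicted links against a reference set, not the BioGRNsemble code: the function `verified_fraction` and the toy gene names are hypothetical, and only the two counts (3703 and 534,843) come from the abstract.

```python
def verified_fraction(predicted_links, reference_links):
    """Share of predicted (regulator, target) links found in a reference set."""
    predicted = set(predicted_links)
    if not predicted:
        return 0.0
    return len(predicted & set(reference_links)) / len(predicted)

# Toy example with made-up gene names (not taken from the study).
predicted = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g1"), ("tfB", "g3")]
reference = [("tfA", "g1"), ("tfC", "g9")]
toy_fraction = verified_fraction(predicted, reference)  # 1 of 4 links matches

# Counts reported in the abstract: 3703 verified out of 534,843 predictions.
reported_fraction = 3703 / 534_843
reported_percent = round(100 * reported_fraction, 2)
```

On the reported counts this comes to roughly 0.7% of predictions, which is consistent with the validation difficulties the Conclusion mentions.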

(This article belongs to the Section Applied Biomedical Data Science)

14 pages, 4951 KB

Open Access Article

Novel Methodology for the Design of Personalized Cancer Vaccine Targeting Neoantigens: Application to Pancreatic Ductal Adenocarcinoma

by Kush Savsani and Sivanesan Dakshanamurthy

Diseases 2024, 12(7), 149; https://doi.org/10.3390/diseases12070149 - 11 Jul 2024

Cited by 4 | Viewed by 2756

Abstract

Personalized cancer vaccines have emerged as a promising avenue for cancer treatment or prevention strategies. This approach targets the specific genetic alterations in individual patients’ tumors, offering a more personalized and effective treatment option. Previous studies have shown that generalized peptide vaccines targeting a limited scope of gene mutations were ineffective, emphasizing the need for personalized approaches. While studies have explored personalized mRNA vaccines, personalized peptide vaccines have not yet been studied in this context. Pancreatic ductal adenocarcinoma (PDAC) remains challenging in oncology, necessitating innovative therapeutic strategies. In this study, we developed a personalized peptide vaccine design methodology, employing RNA sequencing (RNAseq) to identify prevalent gene mutations underlying PDAC development in patient solid tumor tissue. The RNAseq analysis comprised adapter trimming, read alignment, and somatic variant calling. We also developed a Python program called SCGeneID, which validates the alignment of the RNAseq analysis. The Python program is freely available to download. Using chromosome number and locus data, SCGeneID identifies the target gene along the UCSC hg38 reference set. Based on the gene mutation data, we developed a personalized PDAC cancer vaccine that targeted 100 highly prevalent gene mutations in two patients. We predicted peptide-MHC binding affinity, immunogenicity, antigenicity, allergenicity, and toxicity for each epitope. Then, we selected the top 50 and 100 epitopes based on our previously published vaccine design methodology. Finally, we generated pMHC-TCR 3D molecular model complex structures, which are freely available to download. The designed personalized cancer vaccine contains epitopes commonly found in PDAC solid tumor tissue. Our personalized vaccine was composed of neoantigens, allowing for a more precise and targeted immune response against cancer cells.
Additionally, we identified mutated genes that were also reported in the reference study from which we obtained the sequencing data, thus validating our vaccine design methodology. This is the first study designing a personalized peptide cancer vaccine targeting neoantigens using human patient data to identify gene mutations associated with the specific tumor of interest.

26 pages, 10062 KB

Open AccessArticle

Identifying Key Genes Involved in Axillary Lymph Node Metastasis in Breast Cancer Using Advanced RNA-Seq Analysis: A Methodological Approach with GLMQL and MAS

by Mostafa Rezapour, Robert Wesolowski and Metin Nafi Gurcan

Int. J. Mol. Sci. 2024, 25(13), 7306; https://doi.org/10.3390/ijms25137306 - 3 Jul 2024

Cited by 9 | Viewed by 3367

Abstract

Our study aims to address the methodological challenges frequently encountered in RNA-Seq data analysis within cancer studies. Specifically, it enhances the identification of key genes involved in axillary lymph node metastasis (ALNM) in breast cancer. We employ Generalized Linear Models with Quasi-Likelihood (GLMQLs) to manage the inherently discrete and overdispersed nature of RNA-Seq data, marking a significant improvement over conventional methods such as the t-test, which assumes a normal distribution and equal variances across samples. We utilize the Trimmed Mean of M-values (TMMs) method for normalization to address library-specific compositional differences effectively. Our study focuses on a distinct cohort of 104 untreated patients from the TCGA Breast Invasive Carcinoma (BRCA) dataset to maintain an untainted genetic profile, thereby providing more accurate insights into the genetic underpinnings of lymph node metastasis. This strategic selection paves the way for developing early intervention strategies and targeted therapies. Our analysis is exclusively dedicated to protein-coding genes, enriched by the Magnitude Altitude Scoring (MAS) system, which rigorously identifies key genes that could serve as predictors in developing an ALNM predictive model. Our novel approach has pinpointed several genes significantly linked to ALNM in breast cancer, offering vital insights into the molecular dynamics of cancer development and metastasis. These genes, including ERBB2, CCNA1, FOXC2, LEFTY2, VTN, ACKR3, and PTGS2, are involved in key processes like apoptosis, epithelial–mesenchymal transition, angiogenesis, response to hypoxia, and KRAS signaling pathways, which are crucial for tumor virulence and the spread of metastases. 
Moreover, the approach has also emphasized the importance of the small proline-rich protein (SPRR) family, including SPRR2B, SPRR2E, and SPRR2D, recognized for their significant involvement in cancer-related pathways and their potential as therapeutic targets. Important transcripts such as H3C10, H1-2, PADI4, and others have been highlighted as critical in modulating the chromatin structure and gene expression, fundamental for the progression and spread of cancer.
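
The idea behind the trimmed-mean normalization named in this abstract can be illustrated with a deliberately simplified sketch. It is an assumption-laden toy, not the authors' pipeline: it takes an unweighted trimmed mean of per-gene log2 ratios (M-values) only, whereas the standard TMM procedure additionally trims on average abundance and applies precision weights. The function name `simple_tmm_factor`, the 30% trim value and the toy counts are all illustrative.

```python
from math import log2

def simple_tmm_factor(sample, reference, trim=0.3):
    """Unweighted trimmed mean of log2 ratios (M-values), returned as a scale factor."""
    n_sample, n_ref = sum(sample), sum(reference)
    # M-values are only defined for genes with non-zero counts in both libraries.
    m_values = sorted(
        log2((s / n_sample) / (r / n_ref))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    if not m_values:
        return 1.0  # nothing to compare, leave the library unscaled
    cut = int(len(m_values) * trim)
    kept = m_values[cut:len(m_values) - cut] or m_values
    return 2 ** (sum(kept) / len(kept))

# Toy libraries: the sample matches the reference except for one strongly
# over-expressed gene, which inflates the sample's total library size.
reference = [100, 200, 300, 400, 500]
sample = [100, 200, 300, 400, 5000]
factor = simple_tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
```

In this toy case the single extreme gene is trimmed away, so the effective library size of the sample falls back to that of the reference. This is the kind of compositional difference the abstract says the normalization is meant to address.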

(This article belongs to the Special Issue Targeting Breast Cancer: Strategies and Hope—2nd Edition)

19 pages, 968 KB

Open Access Article

Gene Expression Analysis for Uterine Cervix and Corpus Cancer Characterization

by Lucía Almorox, Laura Antequera, Ignacio Rojas, Luis Javier Herrera and Francisco M. Ortuño

Genes 2024, 15(3), 312; https://doi.org/10.3390/genes15030312 - 28 Feb 2024

Cited by 2 | Viewed by 2990

Abstract

The analysis of gene expression quantification data is a powerful and widely used approach in cancer research. This work provides new insights into the transcriptomic changes that occur in healthy uterine tissue compared to those in cancerous tissues and explores the differences associated with uterine cancer localizations and histological subtypes. To achieve this, RNA-Seq data from the TCGA database were preprocessed and analyzed using the KnowSeq package. Firstly, a kNN model was applied to classify uterine cervix cancer, uterine corpus cancer, and healthy uterine samples. Through variable selection, a three-gene signature was identified (VWCE, CLDN15, ADCYAP1R1), achieving consistent 100% test accuracy across 20 repetitions of a 5-fold cross-validation. A similar supplementary analysis using miRNA-Seq data from the same samples identified an optimal two-gene miRNA-coding signature potentially regulating the three-gene signature previously mentioned, which attained optimal classification performance with an 82% F1-macro score. Subsequently, a kNN model was implemented for the classification of cervical cancer samples into their two main histological subtypes (adenocarcinoma and squamous cell carcinoma). A single-gene signature (ICA1L) was identified, achieving 100% test accuracy through 20 repetitions of a 5-fold cross-validation and externally validated through the CGCI program. Finally, an examination of six cervical adenosquamous carcinoma (mixed) samples revealed a pattern where the gene expression value in the mixed class aligned closer to the histological subtype with lower expression, prompting a reconsideration of the diagnosis for these mixed samples. In summary, this study provides valuable insights into the molecular mechanisms of uterine cervix and corpus cancers. The newly identified gene signatures demonstrate robust predictive capabilities, guiding future research in cancer diagnosis and treatment methodologies.
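
The evaluation scheme described here (a kNN classifier scored over 20 repetitions of 5-fold cross-validation) can be sketched in a few lines of standard-library Python. This is a hedged illustration on synthetic data, not the KnowSeq workflow used in the study: the helper names, the choice of k = 3, the random seeds and the three made-up "expression" clusters are all assumptions.

```python
import random
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) tuples.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def repeated_kfold_accuracy(data, k_neighbors=3, folds=5, repetitions=20, seed=0):
    """Mean test accuracy over `repetitions` reshuffled `folds`-fold splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repetitions):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for fold in range(folds):
            test = shuffled[fold::folds]  # every `folds`-th item is held out
            train = [item for i, item in enumerate(shuffled) if i % folds != fold]
            correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
            accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Synthetic three-gene expression vectors for three well-separated classes.
rng = random.Random(1)
centers = {"cervix": (0.0, 0.0, 0.0), "corpus": (5.0, 5.0, 5.0), "healthy": (10.0, 0.0, 10.0)}
data = [
    (tuple(c + rng.uniform(-1, 1) for c in center), label)
    for label, center in centers.items()
    for _ in range(10)
]
mean_accuracy = repeated_kfold_accuracy(data)
```

Because the synthetic classes are far apart relative to their spread, every held-out sample is classified correctly. Real expression data would rarely be this clean, which is what makes the consistent 100% figure reported in the abstract notable.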

(This article belongs to the Special Issue New Advances and Challenges in Bioinformatics. IWBBIO-2023 Selection)

13 pages, 2320 KB

Open Access Article

Exploring Promising Biomarkers for Alzheimer’s Disease through the Computational Analysis of Peripheral Blood Single-Cell RNA Sequencing Data

by Marios G. Krokidis, Aristidis G. Vrahatis, Konstantinos Lazaros and Panagiotis Vlamos

Appl. Sci. 2023, 13(9), 5553; https://doi.org/10.3390/app13095553 - 29 Apr 2023

Cited by 8 | Viewed by 3665

Abstract

Alzheimer’s disease (AD) represents one of the most important healthcare challenges of the current century, characterized as an expanding, “silent pandemic”. Recent studies suggest that the peripheral immune system may participate in AD development; however, the molecular components of these cells in AD remain poorly understood. Although single-cell RNA sequencing (scRNA-seq) allows various biological processes to be explored at the cellular level, the number of existing works is limited, and no comprehensive machine learning (ML) analysis has yet been conducted to identify effective biomarkers in AD. Herein, we introduce a computational workflow using both deep learning and ML processes to examine scRNA-seq data obtained from the peripheral blood of both Alzheimer’s disease patients with an amyloid-positive status and healthy controls with an amyloid-negative status, totaling 36,849 cells. The output of our pipeline contained transcripts ranked by their level of significance, which could serve as reliable genetic signatures of AD pathophysiology. The comprehensive functional analysis of the most dominant genes in terms of biological relevance to AD demonstrates that the proposed methodology has great potential for discovering blood-based fingerprints of the disease. Furthermore, the present approach paves the way for the application of ML techniques to scRNA-seq data from complex disorders, posing new challenges in identifying key biological processes from a molecular perspective.

(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

11 pages, 569 KB

Open Access Article

RNA-Seq Study on the Longissimus thoracis Muscle of Italian Large White Pigs Fed Extruded Linseed with or without Antioxidants and Polyphenols

by Jacopo Vegni, Ying Sun, Stefan E. Seemann, Martina Zappaterra, Roberta Davoli, Stefania Dall’Olio, Jan Gorodkin and Paolo Zambonelli

Animals 2023, 13(7), 1187; https://doi.org/10.3390/ani13071187 - 28 Mar 2023

Viewed by 1926

Abstract

The addition of n-3 polyunsaturated fatty acids (n-3 PUFAs) to the swine diet increases their content in muscle cells, and the additional supplementation of antioxidants promotes their oxidative stability. However, to date, the functionality of these components within muscle tissue is not well understood. Using a published RNA-seq dataset and a selective workflow, the study aimed to find the differences in gene expression and investigate how differentially expressed genes (DEGs) were implicated in the cellular composition and metabolism of muscle tissue of 48 Italian Large White pigs under different dietary conditions. A functional enrichment analysis of DEGs, using Cytoscape, revealed that the diet enriched with extruded linseed and supplemented with vitamin E and selenium promoted a more rapid and massive immune system response because the overall function of muscle tissue was improved, while the diet enriched with extruded linseed and supplemented with grape skin and oregano extracts promoted the presence and oxidative stability of n-3 PUFAs, increasing the anti-inflammatory potential of the muscular tissue.

(This article belongs to the Special Issue Innovative Production Strategies for High-Quality, Traditional Pig Products—2nd Edition)

11 pages, 1685 KB

Open Access Article

An Ensemble Feature Selection Approach for Analysis and Modeling of Transcriptome Data in Alzheimer’s Disease

by Petros Paplomatas, Marios G. Krokidis, Panagiotis Vlamos and Aristidis G. Vrahatis

Appl. Sci. 2023, 13(4), 2353; https://doi.org/10.3390/app13042353 - 11 Feb 2023

Cited by 18 | Viewed by 3282

Abstract

Data-driven analysis and characterization of molecular phenotypes is an efficient way to decipher complex disease mechanisms. Using emerging next generation sequencing technologies, important disease-relevant outcomes are extracted, offering the potential for precision diagnosis and therapeutics in progressive disorders. Single-cell RNA sequencing (scRNA-seq) allows the inherent heterogeneity between individual cellular environments to be exploited and provides one of the most promising platforms for quantifying cell-to-cell gene expression variability. However, the high-dimensional nature of scRNA-seq data poses a significant challenge for downstream analysis, particularly in identifying genes that are dominant across cell populations. Feature selection is a crucial step in scRNA-seq data analysis, reducing the dimensionality of data and facilitating the identification of genes most relevant to the biological question. Herein, we make the case for an ensemble feature selection methodology for scRNA-seq data, specifically in the context of Alzheimer’s disease (AD). We combined various feature selection strategies to obtain the most dominant differentially expressed genes (DEGs) in an AD scRNA-seq dataset, providing a promising approach to identify potential transcriptome biomarkers through scRNA-seq data analysis, which can be applied to other diseases. We anticipate that feature selection techniques, such as our ensemble methodology, will dominate analysis options for transcriptome data, especially as datasets increase in volume and complexity, leading to more accurate classification and the generation of differentially significant features.

(This article belongs to the Special Issue Machine Learning in Bioinformatics: Latest Advances and Prospects)

17 pages, 2798 KB

Open Access Article

Exploring the Potential of Metatranscriptomics to Describe Microbial Communities and Their Effects in Molluscs

by Magalí Rey-Campos, Raquel Ríos-Castro, Cristian Gallardo-Escárate, Beatriz Novoa and Antonio Figueras

Int. J. Mol. Sci. 2022, 23(24), 16029; https://doi.org/10.3390/ijms232416029 - 16 Dec 2022

Cited by 6 | Viewed by 3138

Abstract

Metatranscriptomics has emerged as a very useful technology for the study of microbiomes from RNA-seq reads. This method provides additional information compared to the sequencing of ribosomal genes because the gene expression can also be analysed. In this work, we used the metatranscriptomic approach to study the whole microbiome of mussels, including bacteria, viruses, fungi, and protozoans, by mapping the RNA-seq reads to custom assembly databases (including the genomes of microorganisms publicly available). This strategy allowed us not only to describe the diversity of microorganisms but also to relate the host transcriptome and microbiome, finding the genes more affected by the pathogen load. Although some bacteria abundant in the metatranscriptomic analysis were undetectable by 16S rRNA sequencing, a common core of the taxa was detected by both methodologies (62% of the metatranscriptomic detections were also identified by 16S rRNA sequencing, the Oceanospirillales, Flavobacteriales and Vibrionales orders being the most relevant). However, differences in microbiome composition were observed among different tissues of Mytilus galloprovincialis, with the fungal kingdom being especially diverse, and among molluscan species. These results confirm the potential of a meta-analysis of transcriptome data to obtain new information on the molluscs’ microbiome.

(This article belongs to the Special Issue Molecular Bacteria-Invertebrate Interactions)

15 pages, 611 KB

Open Access Article

A Bioinformatics View on Acute Myeloid Leukemia Surface Molecules by Combined Bayesian and ABC Analysis

by Michael C. Thrun, Elisabeth K. M. Mack, Andreas Neubauer, Torsten Haferlach, Miriam Frech, Alfred Ultsch and Cornelia Brendel

Bioengineering 2022, 9(11), 642; https://doi.org/10.3390/bioengineering9110642 - 3 Nov 2022

Cited by 4 | Viewed by 3304

Abstract

“Big omics data” pose the challenge of extracting meaningful information with clinical benefit. Here, we propose a two-step approach, an initial unsupervised inspection of the structure of the high dimensional data followed by supervised analysis of gene expression levels, to reconstruct the surface patterns on different subtypes of acute myeloid leukemia (AML). First, Bayesian methodology was used, focusing on surface molecules encoded by cluster of differentiation (CD) genes to assess whether AML is a homogeneous group or segregates into clusters. Gene expression in 390 patient samples measured using microarray technology and in 150 samples measured via RNA-Seq was compared. Beyond acute promyelocytic leukemia (APL), a well-known AML subentity, the remaining AML samples were separated into two distinct subgroups. Next, we investigated which CD molecules would best distinguish each AML subgroup against APL, and validated discriminative molecules of both datasets by searching the scientific literature. Surprisingly, a comparison of both omics analyses revealed that CD339 was the only overlapping gene differentially regulated in APL and other AML subtypes. In summary, our two-step approach for gene expression analysis revealed two previously unknown subgroup distinctions in AML based on surface molecule expression, which may guide the differentiation of subentities in a given clinical–diagnostic context.

(This article belongs to the Special Issue Machine Learning for Biomedical Applications)

26 pages, 3146 KB

Open Access Review

Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges

by Samarendra Das, Anil Rai and Shesh N. Rai

Entropy 2022, 24(7), 995; https://doi.org/10.3390/e24070995 - 18 Jul 2022

Cited by 20 | Viewed by 9842

Abstract

With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. Through scRNA-seq, a huge amount of expression data for several thousand genes over millions of cells is generated in a single experiment. Differential expression analysis is the primary downstream analysis of such data to identify gene markers for cell type detection and also provide inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Therefore, we critically discuss the underlying statistical principles of the approaches and distinctly divide them into six major classes, i.e., generalized linear, generalized additive, Hurdle, mixture models, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations that are specific to each class of approaches, and how they are addressed by other subsequent classes of approaches. A number of challenges are identified in this study that must be addressed to develop the next class of innovative approaches. Furthermore, we also emphasize the methodological challenges involved in differential expression analysis of scRNA-seq data that researchers must address to draw maximum benefit from this recent single-cell technology. This study will serve as a guide to genome researchers and experimental biologists to objectively select options for their analysis.

(This article belongs to the Section Entropy Reviews)

12 pages, 1542 KB

Open Access Review

The Power of Single-Cell RNA Sequencing in eQTL Discovery

by Maleeha Maria, Negar Pouyanfar, Tiit Örd and Minna U. Kaikkonen

Genes 2022, 13(3), 502; https://doi.org/10.3390/genes13030502 - 12 Mar 2022

Cited by 12 | Viewed by 10179

Abstract

Genome-wide association studies have successfully mapped thousands of loci associated with complex traits. During the last decade, functional genomics approaches combining genotype information with bulk RNA-sequencing data have identified genes regulated by GWAS loci through expression quantitative trait locus (eQTL) analysis. Single-cell RNA-Sequencing (scRNA-Seq) technologies have created new exciting opportunities for spatiotemporal assessment of changes in gene expression at the single-cell level in complex and inherited conditions. A growing number of studies have demonstrated the power of scRNA-Seq in eQTL mapping across different cell types, developmental stages and stimuli that could be obscured when using bulk RNA-Seq methods. In this review, we outline the methodological principles, advantages, limitations and the future experimental and analytical considerations of single-cell eQTL studies. We look forward to the explosion of single-cell eQTL studies applied to large-scale population genetics to take us one step closer to understanding the molecular mechanisms of disease.

(This article belongs to the Special Issue Integrative Multi-Omics, Single-Cell and Spatial Approaches to Study Complex Diseases)

29 pages, 8043 KB

Open Access Article

Rapid Transient Transcriptional Adaptation to Hypergravity in Jurkat T Cells Revealed by Comparative Analysis of Microarray and RNA-Seq Data

by Christian Vahlensieck, Cora S. Thiel, Jan Adelmann, Beatrice A. Lauber, Jennifer Polzer and Oliver Ullrich

Int. J. Mol. Sci. 2021, 22(16), 8451; https://doi.org/10.3390/ijms22168451 - 6 Aug 2021

Cited by 11 | Viewed by 4273

Abstract

Cellular responses to micro- and hypergravity are rapid and complex and appear within the first few seconds of exposure. Transcriptomic analyses are a valuable tool to analyze these genome-wide cellular alterations. For a better understanding of the cellular dynamics upon altered gravity exposure, it is important to compare different time points. However, since most of the experiments are designed as endpoint measurements, combining them in cross-experiment meta-studies is unavoidable. Microarray and RNA-Seq analyses are two of the main methods to study transcriptomics. In the field of altered gravity research, both methods are frequently used. However, the generation of these data sets is difficult and time-consuming and therefore the number of available data sets in this research field is limited. In this study, we investigated the comparability of microarray and RNA-Seq data and applied the results to a comparison of the transcriptomic dynamics between the hypergravity conditions during two real flight platforms and a centrifuge experiment to identify temporal adaptation processes. We performed a comparative study on an Affymetrix HTA2.0 microarray and a paired-end RNA-Seq data set originating from the same Jurkat T cell RNA samples from a short-term hypergravity experiment. The overall agreement was high, with better sensitivity of the RNA-Seq analysis. The microarray data set showed weaknesses on the level of single upregulated genes, likely due to its normalization approach. On an aggregated level of biotypes, chromosomal distribution, and gene sets, both technologies performed equally well. The microarray showed better performance on the detection of altered gravity-related splicing events. We found that all initially altered transcripts fully adapted after 15 min to hypergravity and concluded that the altered gene expression response to hypergravity is transient and fully reversible.
Based on the combined multiple-platform meta-analysis, we could demonstrate rapid transcriptional adaptation to hypergravity, the differential expression of the ATPase subunits ATP6V1A and ATP6V1D, and the cluster of differentiation (CD) molecules CD1E, CD2AP, CD46, CD47, CD53, CD69, CD96, CD164, and CD226 in hypergravity. We could experimentally demonstrate that it is possible to develop methodological evidence for the meta-analysis of individual data.

(This article belongs to the Special Issue Molecular Mechanobiology in Space and on Earth)

13 pages, 1293 KB

Open AccessArticle

Gene Set Analysis Using Spatial Statistics

by Angela L. Riffo-Campos, Guillermo Ayala and Francisco Montes

Mathematics 2021, 9(5), 521; https://doi.org/10.3390/math9050521 - 3 Mar 2021

Viewed by 2221

Abstract

Gene differential expression is the study of the possible association between gene expression, measured with technologies such as DNA microarrays or RNA-Seq, and the phenotype. This can be performed marginally for each gene (differential gene expression) or using a gene set collection (gene set analysis). A preliminary (marginal) per-gene analysis of differential expression is usually performed to obtain a set of significant genes or marginal p-values, which are used later to study the association between phenotype and gene expression. This paper proposes the use of spatial statistics methods for testing gene set differential expression using paired samples of RNA-Seq counts. This approach is not based on a previous per-gene differential expression analysis. Instead, we compare the paired counts within each sample/control pair using a binomial test. Each pair produces one p-value per gene, so the gene expression profile is transformed into a vector of p-values, which is treated as an event belonging to a point pattern. This is the first component of a bivariate point pattern. The second component is generated by applying two different randomization distributions to the correspondence between samples and treatment. The self-contained null hypothesis considered in gene set analysis can be formulated in terms of the associated point pattern as a random labeling of the considered bivariate point pattern. The gene sets were defined by the Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The proposed methodology was tested on four RNA-Seq datasets of colorectal cancer (CRC) patients and the results were contrasted with those obtained using the edgeR-GOseq pipeline. The proposed methodology proved to be consistent at the biological and statistical level, in particular when using the Cuzick and Edwards test with one realization of the second component and the between-pair distribution.
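As a rough illustration of the per-pair binomial step described in this abstract, the sketch below turns one sample/control pair of count vectors into a vector of p-values, one per gene. It is not the authors' implementation: the function names are hypothetical, and testing against a success probability of 0.5 assumes the two libraries are of comparable size, which the abstract does not state.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Assumption: p = 0.5 stands in for 'no difference between sample and control'.
    """
    if n == 0:
        return 1.0  # no reads in either library: no evidence of a difference
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def paired_pvalues(sample_counts, control_counts):
    """Return one p-value per gene for a single sample/control pair (hypothetical helper)."""
    return [
        binomial_two_sided_p(s, s + c)
        for s, c in zip(sample_counts, control_counts)
    ]


if __name__ == "__main__":
    # Two toy genes: one balanced, one seen only in the control.
    print(paired_pvalues([5, 0], [5, 10]))
```

Enumerating the full probability mass function is fine for small toy counts; for realistic RNA-Seq depths a library routine for the exact binomial test would be the more practical choice.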

(This article belongs to the Special Issue Spatial Statistics with Its Application)

13 pages, 1567 KB

Open AccessArticle

Genome-Wide Co-Expression Distributions as a Metric to Prioritize Genes of Functional Importance

by Pâmela A. Alexandre, Nicholas J. Hudson, Sigrid A. Lehnert, Marina R. S. Fortes, Marina Naval-Sánchez, Loan T. Nguyen, Laercio R. Porto-Neto and Antonio Reverter

Genes 2020, 11(10), 1231; https://doi.org/10.3390/genes11101231 - 20 Oct 2020

Cited by 1 | Viewed by 6358

Abstract

Genome-wide gene expression analyses are routinely used to gain a systems-level understanding of complex processes, including network connectivity. Network connectivity tends to be built on a small subset of extremely high co-expression signals that are deemed significant, but this overlooks the vast majority of pairwise signals. Here, we developed a computational pipeline that assigns the pairwise genome-wide co-expression distribution of every gene to one of eight template distribution shapes, varying between unimodal, bimodal, skewed, or symmetrical and representing different proportions of positive and negative correlations. We then used a hypergeometric test to determine whether specific genes (regulators versus non-regulators) and properties (differentially expressed or not) are associated with a particular distribution shape. We applied our methodology to five publicly available RNA sequencing (RNA-seq) datasets from four organisms in different physiological conditions and tissues. Our results suggest that genes can be assigned consistently to pre-defined distribution shapes, with respect to the enrichment of differentially expressed and regulatory genes, in situations involving contrasting phenotypes, time-series, or physiological baseline data. There is indeed a striking additional biological signal present in the genome-wide distribution of co-expression values, which would be overlooked by currently adopted approaches. Our method can be applied to extract further information from transcriptomic data and help uncover the molecular mechanisms involved in the regulation of complex biological processes and phenotypes.
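The hypergeometric enrichment step mentioned in this abstract can be sketched as follows. This is a generic upper-tail hypergeometric test written under stated assumptions, not the authors' pipeline: the function name, argument names, and toy numbers are all hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, marked: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric variable X.

    population: total number of genes considered
    marked:     genes with the property of interest (e.g. regulators)
    draws:      genes assigned to one distribution shape
    k:          genes in that shape that also carry the property
    """
    total = comb(population, draws)
    upper = min(marked, draws)
    # Count the ways to obtain k or more marked genes among the draws.
    favourable = sum(
        comb(marked, i) * comb(population - marked, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Toy case: 10 genes, 5 regulators, 5 genes in the shape, all 5 are regulators.
    print(hypergeom_upper_tail(5, 10, 5, 5))
```

A small upper-tail probability would suggest that the property is over-represented among genes with that distribution shape; any multiple-testing correction across shapes and properties is left out of this sketch.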

(This article belongs to the Section Technologies and Resources for Genetics)
