Computational Approaches for Disease Gene Identification

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (30 June 2018) | Viewed by 51776

Special Issue Editors


Guest Editor
College of Life Science, Shanghai University, Shanghai 200244, China
Interests: systems biology; bioinformatics; protein sequence; machine learning
Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China
Interests: bioinformatics; genetics; genomics; machine learning; ceRNA network; predictive modeling
College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Interests: bioinformatics; computational biology; graph theory; algorithm design and analysis

Special Issue Information

Dear Colleagues,

Identifying disease genes is fundamental to medicine. Many experimental approaches have been used to screen candidate disease genes. Genome-wide association studies (GWAS) can establish associations between single-nucleotide polymorphisms (SNPs) and disease phenotypes. Clustered regularly interspaced short palindromic repeats (CRISPR) can be used to knock down genes; by comparing gene expression profiles before and after knockdown and observing the phenotype of the cells, the downstream genes can be investigated and their functions inferred. However, all of these experimental approaches have substantial limitations. For example, many GWAS-identified SNPs are located in intergenic regions and cannot be annotated to specific genes. A CRISPR knockdown may affect many genes, and the cells may exhibit many irrelevant phenotypes; without knowing which phenotypes to look at, one may miss the events that actually matter. Besides the limitations of the experimental approaches themselves, the integration of multiple data types is another key problem. If the GWAS results indicate that a gene is disease-associated but the CRISPR results in cells indicate otherwise, how should one decide? To overcome these problems, computational methods, such as network-based prioritization of GWAS candidates, expression quantitative trait loci (eQTL) regulatory network construction and analysis, and machine learning-based integration of multi-omics data, should be combined with experimental technologies to identify disease genes. The aim of this Special Issue is to introduce the latest developments in interdisciplinary research on disease gene identification. Original research and review articles related to the described topics are welcome.

Dr. Yudong Cai
Dr. Tao Huang
Dr. Lei Chen
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • disease gene prioritization
  • expression quantitative trait loci (eQTL)
  • deleterious single amino acid polymorphisms (SAP) identification
  • decipher the effects of gene knockout
  • multi-omics data integration
  • deep learning biosystem modeling
  • key driver analysis

Published Papers (8 papers)


Research

14 pages, 698 KiB  
Article
Identification of Novel Candidate Markers of Type 2 Diabetes and Obesity in Russia by Exome Sequencing with a Limited Sample Size
by Yury A. Barbitoff, Elena A. Serebryakova, Yulia A. Nasykhova, Alexander V. Predeus, Dmitrii E. Polev, Anna R. Shuvalova, Evgenii V. Vasiliev, Stanislav P. Urazov, Andrey M. Sarana, Sergey G. Scherbak, Dmitrii V. Gladyshev, Maria S. Pokrovskaya, Oksana V. Sivakova, Aleksey N. Meshkov, Oxana M. Drapkina, Oleg S. Glotov and Andrey S. Glotov
Genes 2018, 9(8), 415; https://doi.org/10.3390/genes9080415 - 17 Aug 2018
Cited by 22 | Viewed by 5675
Abstract
Type 2 diabetes (T2D) and obesity are common chronic disorders with multifactorial etiology. In our study, we performed an exome sequencing analysis of 110 patients of Russian ethnicity together with a multi-perspective approach based on biologically meaningful filtering criteria to detect novel candidate variants and loci for T2D and obesity. We have identified several known single nucleotide polymorphisms (SNPs) as markers for obesity (rs11960429), T2D (rs9379084, rs1126930), and body mass index (BMI) (rs11553746, rs1956549 and rs7195386) (p < 0.05). We show that a method based on scoring of case-specific variants together with selection of protein-altering variants can allow for the interrogation of novel and known candidate markers of T2D and obesity in small samples. Using this method, we identified rs328 in LPL (p = 0.023), rs11863726 in HBQ1 (p = 8 × 10⁻⁵), and rs112984085 in VAV3 (p = 4.8 × 10⁻⁴) for T2D and obesity; rs6271 in DBH (p = 0.043), rs62618693 in QSER1 (p = 0.021), rs61758785 in RAD51B (p = 1.7 × 10⁻⁴), rs34042554 in PCDHA1 (p = 1 × 10⁻⁴), and rs144183813 in PLEKHA5 (p = 1.7 × 10⁻⁴) for obesity; and rs9379084 in RREB1 (p = 0.042), rs2233984 in C6orf15 (p = 0.030), rs61737764 in ITGB6 (p = 0.035), rs17801742 in COL2A1 (p = 8.5 × 10⁻⁵), and rs685523 in ADAMTS13 (p = 1 × 10⁻⁶) for T2D as important susceptibility loci in the Russian population. Our results demonstrate the effectiveness of whole exome sequencing (WES) technologies for searching for novel markers of multifactorial diseases in cohorts of limited size in poorly studied populations.
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

12 pages, 2118 KiB  
Article
GADD45B as a Prognostic and Predictive Biomarker in Stage II Colorectal Cancer
by Zhixun Zhao, Yibo Gao, Xu Guan, Zheng Liu, Zheng Jiang, Xiuyun Liu, Huixin Lin, Ming Yang, Chunxiang Li, Runkun Yang, Shuangmei Zou and Xishan Wang
Genes 2018, 9(7), 361; https://doi.org/10.3390/genes9070361 - 19 Jul 2018
Cited by 25 | Viewed by 5907
Abstract
GADD45B is a member of the growth arrest DNA damage-inducible gene family, which has been demonstrated to play critical roles in DNA damage repair, cell growth, and apoptosis. This study aimed to explore the potential relationship between GADD45B expression and tumor progression and to evaluate the clinical value of GADD45B in stage II colorectal cancer (CRC). The expression patterns and prognostic value of GADD45B in CRC were analyzed based on The Cancer Genome Atlas (TCGA). GADD45B expression features of 306 patients with stage II CRC and 201 patients with liver metastasis of CRC were investigated using immunochemical staining on tissue microarrays. Afterward, survival analysis and stratification analysis were performed in the stage II cohort to explore the prognostic and predictive significance of GADD45B. Overexpressed GADD45B was associated with poorer prognosis for CRC patients in both overall survival (OS) (p < 0.001) and disease-free survival (DFS) (p = 0.001) based on the TCGA database. Analysis of the stage II CRC cohort and the liver metastatic CRC cohort revealed that GADD45B was progressively upregulated across normal mucosa, primary colorectal cancer (PCC), and colorectal liver metastases (CLM) tissues, in that order (normal tissue vs. PCC, p = 0.005; PCC vs. CLM, p = 0.001). The low GADD45B group had a significantly longer five-year OS (p = 0.001) and progression-free survival (PFS) (p < 0.001) than the high GADD45B group for the stage II patients. The multivariate Cox regression analysis showed that the expression level of GADD45B was an independent prognostic factor for stage II CRC after radical surgery (OS: hazard ratio (HR) 0.479 [95% confidence interval (CI) 0.305–0.753]; PFS: HR 0.490 [95% CI 0.336–0.714]). In the high GADD45B expression subgroup of the stage II cohort, the patients who underwent adjuvant chemotherapy had longer PFS than those who did not (p = 0.008).
High GADD45B expression is an independent prognostic factor of decreased OS and PFS in stage II CRC patients. Stage II CRC patients with high GADD45B expression might benefit from adjuvant chemotherapy.
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

9 pages, 770 KiB  
Article
SEGF: A Novel Method for Gene Fusion Detection from Single-End Next-Generation Sequencing Data
by Hai Xu, Xiaojin Wu, Dawei Sun, Shijun Li, Siwen Zhang, Miao Teng, Jianlong Bu, Xizhe Zhang, Bo Meng, Weitao Wang, Geng Tian, Huixin Lin, Dawei Yuan, Jidong Lang and Shidong Xu
Genes 2018, 9(7), 331; https://doi.org/10.3390/genes9070331 - 02 Jul 2018
Cited by 6 | Viewed by 5537
Abstract
With the development and application of next-generation sequencing (NGS) and target capture technology, the demand for an effective analysis method to accurately detect gene fusion from high-throughput data is growing. Hence, we developed a novel fusion gene analysis method called single-end gene fusion (SEGF), which starts from single-end DNA-seq data. This approach takes raw sequencing data as input and integrates the commonly used alignment tools basic local alignment search tool (BLAST) and short oligonucleotide analysis package (SOAP) with stringent filters to achieve successful fusion gene detection. To evaluate SEGF, we compared it with four other fusion gene discovery methods by analyzing sequencing results of 23 standard DNA samples and DNA extracted from 286 lung cancer formalin-fixed paraffin-embedded (FFPE) samples. The results indicated that SEGF not only detected the fusion genes from standard samples and clinical samples, but also had the highest accuracy and sensitivity among the five compared methods. In addition, SEGF was capable of detecting complex gene fusion types from single-end NGS data compared with other methods. By using SEGF to acquire gene fusion information at the DNA level, more useful information can be retrieved from DNA panels or other DNA sequencing methods without generating RNA sequencing information, to the benefit of clinical diagnosis or medication guidance. It is a timely and cost-effective measure for research or diagnosis. Considering all the above, SEGF is a straightforward method that does not require tuning complicated arguments, providing a useful approach for the precise detection of gene fusion variation.
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

9 pages, 767 KiB  
Article
Weighted Gene Co-Expression Network Analysis Reveals Dysregulation of Mitochondrial Oxidative Phosphorylation in Eating Disorders
by Liulin Yang, Yun Li, Turki Turki, Huizi Tan, Zhi Wei and Xiao Chang
Genes 2018, 9(7), 325; https://doi.org/10.3390/genes9070325 - 28 Jun 2018
Cited by 4 | Viewed by 4593
Abstract
The underlying mechanisms of eating disorders (EDs) are very complicated and still poorly understood. The pathogenesis of EDs may involve the interplay of multiple genes. To investigate the dysregulated gene pathways in EDs, we analyzed gene expression profiles in dorsolateral prefrontal cortex (DLPFC) tissues from 15 ED cases, including 3 with anorexia nervosa (AN), 7 with bulimia nervosa (BN), 2 AN-BN cases, and 3 cases of EDs not otherwise specified, and from 102 controls. We further used a weighted gene co-expression network analysis to construct a gene co-expression network and to detect functional modules of highly correlated genes. The functional enrichment analysis of genes in co-expression modules indicated that an altered mitochondrial oxidative phosphorylation process may be involved in the pathogenesis of EDs.
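As a rough illustration of the first step of a weighted gene co-expression network analysis, the sketch below builds an unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^beta, from a toy expression table. The gene names, the expression values, and the power beta = 6 are assumptions made for illustration only; the abstract does not state the parameters the authors used, and module detection is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta for every ordered pair of distinct genes.

    expr maps a gene name to its expression values across samples.
    beta is the soft-thresholding power (6 is an assumed value here).
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical toy data: three genes measured in four samples.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [1.0, -1.0, 1.0, -1.0],
}
adjacency = soft_threshold_adjacency(toy_expr)
print(adjacency[("GENE_A", "GENE_B")], adjacency[("GENE_A", "GENE_C")])
```

Raising the correlation to a power keeps strongly co-expressed pairs close to 1 while pushing weak correlations toward 0, which is why highly correlated genes end up grouped into the same module.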
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

17 pages, 7271 KiB  
Article
Identifying Patients with Atrioventricular Septal Defect in Down Syndrome Populations by Using Self-Normalizing Neural Networks and Feature Selection
by Xiaoyong Pan, Xiaohua Hu, Yu Hang Zhang, Kaiyan Feng, Shao Peng Wang, Lei Chen, Tao Huang and Yu Dong Cai
Genes 2018, 9(4), 208; https://doi.org/10.3390/genes9040208 - 12 Apr 2018
Cited by 36 | Viewed by 6312
Abstract
Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies during birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes in DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by using the copy number of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers to identify optimal features. Results show that 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on the optimal features yielded a Matthews correlation coefficient (MCC) value of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features. This method achieved an optimal MCC value of 0.582 when the top 132 features were utilized. Finally, we analyzed some key features derived from the optimal features in SNNs and found support in the literature that further reveals their essential roles.
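For readers unfamiliar with the metric reported in this abstract, the Matthews correlation coefficient (MCC) of a binary classifier can be computed from the four cells of its confusion matrix, as in the minimal sketch below. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from true/false positive and negative counts of a binary classifier."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(example_mcc, 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it stays informative when the two classes are of unequal size, which is a common reason for preferring it to plain accuracy.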
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

16 pages, 2681 KiB  
Article
Identification of Key Pathways and Genes in the Dynamic Progression of HCC Based on WGCNA
by Li Yin, Zhihui Cai, Baoan Zhu and Cunshuan Xu
Genes 2018, 9(2), 92; https://doi.org/10.3390/genes9020092 - 14 Feb 2018
Cited by 115 | Viewed by 11989
Abstract
Hepatocellular carcinoma (HCC) is a devastating disease worldwide. Though many efforts have been made to elucidate the process of HCC, its molecular mechanisms of development remain elusive due to its complexity. To explore the stepwise carcinogenic process from pre-neoplastic lesions to the end stage of HCC, we employed weighted gene co-expression network analysis (WGCNA), which has proved to be an effective method in many diseases, to detect co-expressed modules and hub genes using eight pathological stages including normal, cirrhosis without HCC, cirrhosis, low-grade dysplastic, high-grade dysplastic, very early and early, advanced HCC and very advanced HCC. Among the eight consecutive pathological stages, five representative modules were selected to perform canonical pathway enrichment and upstream regulator analysis using ingenuity pathway analysis (IPA) software. We found that cell cycle-related biological processes were activated at four neoplastic stages, and the degree of activation of the cell cycle corresponded to the degree of deterioration of HCC. The orange and yellow modules were enriched in energy metabolism, especially oxidative metabolism, and the expression values of their genes decreased only at four neoplastic stages. The brown module, enriched in protein ubiquitination and ephrin receptor signaling pathways, correlated mainly with the very early stage of HCC. The darkred module, enriched in hepatic fibrosis/hepatic stellate cell activation, correlated with the cirrhotic stage only. The high-degree hub genes were identified based on the protein-protein interaction (PPI) network and were verified by Kaplan-Meier survival analysis. The novel signature of five high-degree hub genes identified in our study may shed light on future prognostic and therapeutic approaches. Our study brings a new perspective to the understanding of the key pathways and genes in the dynamic changes of HCC progression. These findings shed light on further investigations.
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

15 pages, 3548 KiB  
Article
Molecular Network-Based Identification of Competing Endogenous RNAs in Thyroid Carcinoma
by Minjia Lu, Xingyu Xu, Baohang Xi, Qi Dai, Chenli Li, Li Su, Xiaonan Zhou, Min Tang, Yuhua Yao and Jialiang Yang
Genes 2018, 9(1), 44; https://doi.org/10.3390/genes9010044 - 19 Jan 2018
Cited by 25 | Viewed by 5760
Abstract
RNAs may act as competing endogenous RNAs (ceRNAs), a critical mechanism in determining gene expression regulation in many cancers. However, the roles of ceRNAs in thyroid carcinoma remain elusive. In this study, we have developed a novel pipeline called Molecular Network-based Identification of ceRNA (MNIceRNA) to identify ceRNAs in thyroid carcinoma. MNIceRNA first constructs micro RNA (miRNA)–messenger RNA (mRNA)–long non-coding RNA (lncRNA) networks from the miRcode database and weighted correlation network analysis (WGCNA), and uses them to identify key drivers of differentially expressed RNAs between normal and tumor samples. It then infers ceRNAs of the identified key drivers using the long non-coding competing endogenous database (lnCeDB). We applied the pipeline to The Cancer Genome Atlas (TCGA) thyroid carcinoma data. As a result, 598 lncRNAs, 1025 mRNAs, and 90 miRNAs were inferred to be differentially expressed between normal and thyroid cancer samples. We then obtained eight key driver miRNAs, among which hsa-mir-221 and hsa-mir-222 were key driver RNAs identified by both the miRNA–mRNA–lncRNA and WGCNA networks. In addition, hsa-mir-375 was inferred to be significant for patients’ survival with 34 associated ceRNAs, among which RUNX2, DUSP6 and SEMA3D are known oncogenes regulating cellular proliferation and differentiation in thyroid cancer. These ceRNAs are critical in revealing the mechanisms behind thyroid cancer progression and may serve as future therapeutic biomarkers.
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)

10 pages, 5345 KiB  
Article
SNCA Is a Functionally Low-Expressed Gene in Lung Adenocarcinoma
by Yuanliang Yan, Zhijie Xu, Xiaofang Hu, Long Qian, Zhi Li, Yangying Zhou, Shuang Dai, Shuangshuang Zeng and Zhicheng Gong
Genes 2018, 9(1), 16; https://doi.org/10.3390/genes9010016 - 04 Jan 2018
Cited by 37 | Viewed by 5049
Abstract
There is increasing evidence for the contribution of synuclein alpha (SNCA) to the etiology of neurological disorders, such as Parkinson’s disease (PD). However, little is known about the detailed role of SNCA in human cancers, especially lung cancers. Here, we evaluated the effects of SNCA on the occurrence and prognosis of lung adenocarcinoma (ADC). Comprehensive bioinformatics analyses of data obtained from the Oncomine platform, the human protein atlas (HPA) project and the cancer cell line encyclopedia (CCLE) demonstrated that SNCA expression was significantly reduced in both ADC tissues and cancer cells. The results of relevant clinical studies indicated that down-regulation of SNCA was statistically correlated with shorter overall survival time and post-progression survival time. Through analysis of datasets obtained from the Gene Expression Omnibus database, significantly low levels of SNCA were identified in cisplatin-resistant ADC cells. Moreover, small interfering RNA (siRNA)-mediated knockdown of protein tyrosine kinase 7 (PTK7) elevated the expression of SNCA in the ADC cell lines H1299 and H2009. Our work demonstrates that low levels of SNCA are specifically found in ADC and that this gene may be a potential therapeutic target for this subset of lung cancers. Determining the role of SNCA in ADC biology would provide useful information for further investigations.
(This article belongs to the Special Issue Computational Approaches for Disease Gene Identification)