1. Introduction
The International Conference on Intelligent Biology and Medicine (ICIBM 2019) was organized and hosted by the International Association for Intelligent Biology and Medicine (IAIBM) and the Department of Biomedical Informatics at Ohio State University from 9 to 11 June 2019 in Columbus, Ohio, USA. A detailed description of the conference, its organization, and its achievements is given in [1]. The conference attracted 164 researchers, and over 100 manuscripts covering a broad range of scientific topics were submitted. Each manuscript went through ICIBM internal peer review by a minimum of two reviewers. Manuscripts selected for publication in the Special Issue in Genes were further reviewed by a minimum of two additional external reviewers. In the end, 18 manuscripts were selected for publication in the Special Issue, covering the topics of cancer research, gene expression, single cell sequencing, novel computational algorithms, and microbiome research. In this editorial, we introduce the 18 selected research manuscripts.
2. Cancer Informatics
Chen et al. published “Computational Cancer Cell Models to Guide Precision Breast Cancer Medicine” [2]. In this work, the authors introduced an optimal two-layer decision system model for predicting drug sensitivity. The model is intended for precision or personalized medicine, where a bridge between conventional in vitro cancer cell models and clinical patient response to cancer drugs is still lacking. The model obtained an average accuracy of 90.8% on simulated data. The authors concluded that this two-layer model could be easily extended to multiple cancer types and could help basic scientists who are seeking optimal cancer cell models for an individual tumor, while prioritizing clinical drug recommendations in practice.
A pathway is a summary of a set of genes that can be connected via their biological process, regulation, mechanism, or phenomenon. Pathways with important functions can be alternatively activated in cancer. Wang et al. published “Identification of Alternatively-Activated Pathways between Primary Breast Cancer and Liver Metastatic Cancer Using Microarray Data”, in which the authors proposed an alternatively-activated pathway mining method [3] based on microarray data and applied it to primary breast cancer and breast liver metastatic cancer. Their analysis identified three types of alternatively-activated pathways: pathways in which the active states of some gene pairs were reversed, subpathways that were active only in primary cancer or only in metastatic cancer, and subpathways that were alternatively activated by different genes.
Circulating tumor DNA (ctDNA) is DNA that originates from cancerous cells and is found in the bloodstream. Research on ctDNA has expanded over the last decade, resulting in substantial advances in the identification of single nucleotide variants from ctDNA. Copy number variation (CNV), which is also considered an important cancer biomarker, has been very difficult to detect from ctDNA because of the low amount of ctDNA and the complexity of CNV features. Peng et al. published “CNV Detection from Circulating Tumor DNA in Late-Stage Non-Small Cell Lung Cancer Patients” to address the critical issue of CNV identification in ctDNA [4]. Their method could detect CNVs from a 150-gene panel using a very low amount of ctDNA.
DNA methylation plays a variety of roles in cancer, including a critical role in the control of gene activity, which helps to convert gene expression in normal tissue to a cancerous pattern. Utilizing deep learning techniques, Liu et al. published “DNA Methylation Markers for Pan-Cancer Prediction by Deep Learning”, in which they studied the prognostic value of DNA methylation [5]. Using data from 27 cancer types covering 10,140 cancer samples and 3386 normal samples, the authors identified two categories of markers: 12 CpG markers and 13 promoter markers. The results were validated using cell-free DNA methylation data of 163 prostate cancer samples, where the markers achieved excellent sensitivity and specificity. This study indicates that the identified biomarkers can be utilized in both cancer tissue and a cell-free setting.
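For readers less familiar with the two validation metrics mentioned above, the following minimal sketch shows how sensitivity and specificity are computed from binary labels. The labels are made-up toy values, not data from the study, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = cancer and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers called cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals called normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normals called cancer
    sensitivity = tp / (tp + fn)  # fraction of true cancers detected
    specificity = tn / (tn + fp)  # fraction of true normals correctly rejected
    return sensitivity, specificity


# Toy example (assumed values): 4 cancer samples followed by 6 normal samples.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_calls = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_truth, toy_calls)
```

Here three of the four toy cancer samples are detected and five of the six toy normal samples are correctly rejected, so the sketch yields a sensitivity of 3/4 and a specificity of 5/6.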
A gene fusion is a hybrid gene formed from two independent genes. Gene fusions are a common feature of cancer genomes and have served as molecular targets in therapeutic development. In Helm et al.’s “Gene Co-Expression Networks Restructured Gene Fusion in Rhabdomyosarcoma Cancers”, the authors studied gene fusion features in rhabdomyosarcoma [6]. They used co-expression networks to study fusions between FOXO1 and PAX3/7 and observed substantial restructuring of co-expression networks related to fusion status and fusion type.
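The idea of comparing co-expression networks between conditions can be illustrated with a minimal sketch. It assumes a simple construction (absolute Pearson correlation with a hard threshold), which may differ from the procedure used in the paper; the gene names, expression values, and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, threshold=0.8):
    """Return gene pairs whose absolute correlation reaches the threshold.

    expr maps a gene name to its expression values across samples.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


# Toy expression profiles for two hypothetical sample groups.
fusion_positive = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # tracks geneA in this group
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
fusion_negative = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [3.0, 1.0, 4.0, 1.5, 2.0],
    "geneC": [1.2, 2.1, 2.9, 4.2, 5.1],  # tracks geneA in this group
}

edges_pos = coexpression_edges(fusion_positive)
edges_neg = coexpression_edges(fusion_negative)
only_in_positive = edges_pos - edges_neg  # edges lost without the fusion
only_in_negative = edges_neg - edges_pos  # edges gained without the fusion
```

Edges that appear in only one of the two networks are a simple proxy for the kind of network restructuring described above. A real analysis would use many more genes and samples and a principled choice of threshold.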
Tumor-infiltrating leukocytes (TILs) are immune cells surrounding tumor cells, and several studies have shown that TILs are potential survival predictors in several types of cancer, including liver cancer, which is highly associated with hepatitis virus infection. Hsiao et al. studied TIL abundance and composition in relation to survival in hepatocellular carcinoma (HCC) in their manuscript entitled “Tumor-Infiltrating Leukocyte Composition and Prognostic Power in Hepatitis B- and Hepatitis C-Related Hepatocellular Carcinomas” [7]. The authors found that the total abundance of TILs was higher in non-tumor tissue regardless of the HCC subtype. In contrast, the specific TILs associated with overall survival (OS) and recurrence-free survival (RFS) varied between subtypes.
Network and biomarker analyses have been heavily utilized in cancer research. Liu et al. combined the network and biomarker approaches in their paper entitled “Network as a Biomarker: A Novel Network-Based Sparse Bayesian Machine for Pathway-Driven Drug Response Prediction” [8]. The authors developed a network-based sparse Bayesian machine (NBSBM) approach, which attempts to use a network as a drug response biomarker. NBSBM makes use of the information encoded in a disease-specific (differentially expressed) network to improve its prediction performance in problems with a reduced amount of training data and a very high-dimensional feature space. This method provides a disease-specific, network-based drug sensitivity prediction approach and can uncover potential mechanisms of drug action by selecting the most predictive sub-networks from the disease-specific network.
In the paper entitled “Kinetic Modeling of DUSP Regulation in Herceptin-Resistant HER2-Positive Breast Cancer”, Buiga et al. focused on the analysis of dual-specificity phosphatases (DUSPs) in HER2-positive breast cancer [9], a highly aggressive subtype of breast cancer. By building kinetic models, the authors investigated whether inhibiting certain DUSPs could resensitize Herceptin-resistant breast cancer cells to the drug. The authors observed good concordance between their model and real tumor data. Their results show that kinetic modeling of signaling pathways can generate predictions that assist experimental research in the identification of potential targets for cancer treatment.
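To give a sense of what a kinetic model of phosphatase regulation looks like, the toy sketch below integrates a single mass-action equation for the active (phosphorylated) fraction of a hypothetical signaling protein. It is not the model from the paper: the equation, the rate constants, and the phosphatase levels are all assumptions chosen for illustration.

```python
def simulate_active_fraction(k_act, k_dephos, phosphatase, t_end=20.0, dt=0.01):
    """Forward-Euler integration of a one-variable toy model.

    d(active)/dt = k_act * (1 - active) - k_dephos * phosphatase * active

    `active` is the phosphorylated fraction of a hypothetical protein,
    `phosphatase` is a relative phosphatase level (1.0 = uninhibited).
    """
    active = 0.0
    for _ in range(int(t_end / dt)):
        d_active = k_act * (1.0 - active) - k_dephos * phosphatase * active
        active += dt * d_active
    return active


def steady_state(k_act, k_dephos, phosphatase):
    """Analytic steady state of the toy model, obtained by setting d(active)/dt = 0."""
    return k_act / (k_act + k_dephos * phosphatase)


# Assumed rate constants; compare full phosphatase activity with 75% inhibition.
baseline = simulate_active_fraction(1.0, 1.0, phosphatase=1.0)
inhibited = simulate_active_fraction(1.0, 1.0, phosphatase=0.25)
```

In this toy setting, lowering the phosphatase level raises the steady-state active fraction from 0.5 to 0.8. Published kinetic models of signaling pathways couple many such equations and fit their parameters to experimental data.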
5. Transcriptomic Analysis and Tools
In the era of big data, data visualization tools are essential for analyzing massive amounts of information and making data-driven decisions. Transcriptomic data analysis is no different. Al-Ouran et al. published “A Portal to Visualize Transcriptome Profiles in Mouse Models of Neurological Disorders”, in which they described a new web-based platform for visualizing mouse transcriptome data [14]. The web portal was developed to help nominate the best mouse models for studying neurological diseases. The portal can be used to examine gene expression changes across multiple mouse model studies, including Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, amyotrophic lateral sclerosis, spinocerebellar ataxia, and aging-related diseases.
Non-coding RNA has been the focus of many research studies over the last decade. Porto et al. published “Long Non-Coding RNA Expression Levels Modulate Cell-Type-Specific Splicing Patterns by Altering Their Interaction Landscape with RNA-Binding Proteins”, a study [15] in which the authors explored the role of lncRNAs in modulating alternative splicing and their impact on downstream protein–RNA interaction networks. On the basis of their analysis, the authors proposed that lncRNA sponges could extensively rewire post-transcriptional gene regulatory networks by altering the protein–RNA interaction landscape in a cell-type-specific manner.
A circadian rhythm is a natural internal process that regulates the sleep-wake cycle. While the canonical circadian clock genes and their regulatory mechanisms appear highly conserved, the evolution of clock gene families is still unclear due to several rounds of whole genome duplication in vertebrates. Sun et al. studied circadian clock genes in spotted gar, a non-teleost ray-finned fish, and published their findings in the manuscript “The Molecular Evolution of Circadian Clock Genes in Spotted Gar (Lepisosteus oculatus)” [16]. Phylogenetic analysis showed that nine of the 11 spotted gar circadian clock gene families have the same number of genes as in humans, while the members of the nfil3 and cry families differ between spotted gar and humans. These findings help decipher the repertoire of the spotted gar’s circadian system and shed light on how vertebrate circadian clock systems have evolved.
6. Statistical Methods
In data science, a large dataset is often assembled from multiple smaller, heterogeneous datasets. Missing variables are a common problem when combining datasets and pose a major challenge for downstream analysis. Bartlett et al. published “Forming Big Datasets through Latent Class Concatenation of Imperfectly Matched Databases Features” to address this problem [17]. The authors introduced ROSETTA, a statistical method for handling missing variables. It empirically derives a set of common latent trait metrics for each related measurement domain, using a novel variation of factor analysis to ensure equivalence across the constituent datasets. The advantages of combining datasets in this way are the simplicity, statistical power, and modeling flexibility of a single joint analysis of all the data.
Genotyping data have supported large genetic association studies for the last two decades. Imputation is an important preprocessing step for combining genotyping data or increasing coverage. Traditional genotype imputation methods are typically based on haplotype-clustering algorithms, hidden Markov models (HMMs), and statistical inference. Chen et al. described their new deep learning-based imputation method [18] in their manuscript “Sparse Convolutional Denoising Autoencoders for Genotype Imputation”. The authors proposed a deep learning model, a sparse convolutional denoising autoencoder, to impute missing genotypes. Their method showed strong robustness and outperformed popular reference-free imputation methods.
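A denoising autoencoder is trained to reconstruct clean input from deliberately corrupted input, and imputation methods are commonly scored the same way: known genotypes are hidden, imputed, and compared with the truth. The sketch below illustrates only that mask-and-score loop. A per-site most-common-genotype baseline stands in for the neural network, and the genotype matrix, the hidden positions, and the function names are hypothetical.

```python
from collections import Counter


def mask_entries(genotypes, positions):
    """Return a copy of the matrix with the given (sample, site) entries set to None."""
    masked = [row[:] for row in genotypes]
    for i, j in positions:
        masked[i][j] = None
    return masked


def impute_mode(genotypes):
    """Fill each missing entry with the most common observed genotype at that site.

    Assumes every site has at least one observed genotype.
    """
    imputed = [row[:] for row in genotypes]
    for j in range(len(genotypes[0])):
        observed = [row[j] for row in genotypes if row[j] is not None]
        fill = Counter(observed).most_common(1)[0][0]
        for row in imputed:
            if row[j] is None:
                row[j] = fill
    return imputed


def masked_accuracy(truth, imputed, positions):
    """Fraction of hidden entries that were recovered exactly."""
    hits = sum(1 for i, j in positions if truth[i][j] == imputed[i][j])
    return hits / len(positions)


# Toy matrix: rows are samples, columns are sites, values are alternate-allele counts.
truth = [
    [0, 1, 2],
    [0, 1, 2],
    [0, 2, 2],
    [1, 1, 0],
]
hidden = [(0, 0), (2, 1), (3, 2)]  # entries hidden for evaluation
masked = mask_entries(truth, hidden)
imputed = impute_mode(masked)
accuracy = masked_accuracy(truth, imputed, hidden)
```

On this toy matrix the baseline recovers one of the three hidden genotypes, because it can only predict the majority genotype at each site. Learned models aim to do better by exploiting correlations between neighboring sites.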
8. Conclusions
ICIBM is an annual international conference that has been held every year since 2012 (except 2017). It provides a highly interactive and friendly platform for both young and senior researchers to exchange their research, foster collaboration, and expand educational activities. Approximately one hundred and seventy researchers and trainees from around the world joined the 2019 conference and contributed to a rich conference program, which included four keynote lectures, four eminent scholar talks, five tutorials and workshops, twelve concurrent sessions, a poster session, and other conference activities. Among the 105 original research manuscripts, we selected 18 for the Special Issue after two rounds of peer review. These 18 manuscripts describe innovative computational work in the field. We expect these manuscripts to promote further investigation of the same or similar topics and to lead to more research toward translational clinical applications.