1. Introduction
The use of model organisms is an essential step toward understanding neurological diseases and their progression. While several mouse models exist for studying neurological disorders, nominating the best model for functional follow-up is not an easy task. As sequencing technology has become common and affordable, resources holding large collections of high-throughput transcriptomic profiles have been created, and the data are made accessible through centralized portals. Some portals hold large pools of samples from single nation-wide projects, such as the Genotype-Tissue Expression (GTEx) project [1], with non-disease tissues from human samples, and The Cancer Genome Atlas (TCGA) (https://www.cancer.gov/tcga), with tumor and adjacent normal tissues from tumor patients. These portals allow users to download the processed expression data for further analysis. Other portals hold data from studies focusing on particular tissues, such as the Allen Brain Atlas (https://portal.brain-map.org/), with brain tissue data, and the Brain RNA-seq portal [2,3], with brain cell expression data. These two databases contain expression data from both human and mouse experiments and allow users to search and compare the expression changes of genes between the human and mouse experiments. This feature is very valuable, as insight gained from model organisms is essential to understanding diseases and their progression in humans. In addition, numerous studies from different laboratories have profiled model transcriptomes and deposited the data into databases such as the Gene Expression Omnibus (GEO) database. However, the collection from each of these studies is relatively small, and the data are available in various file formats, ranging from raw files, such as sequencing reads, to normalized expression values or analysis results from an array of methods. Few centralized portals collect the expression data deposited in GEO. One is the Gene eXpression Database (GXD) [4], which allows users to obtain the expression of single or multiple genes from mouse experiments. However, it does not provide comparisons to human expression data. Another portal, ARCHS4 [5], has human and mouse studies from the GEO database processed through a unified pipeline, with gene counts available for download for further processing and analysis. ARCHS4 allows users to search for studies using metadata such as tissues and cell lines, by GEO study identification number, or by gene symbol; a gene symbol search shows tissue expression levels and finds genes with similar co-expression patterns. However, it is difficult to gain specific insights into genes associated with a certain disease, because identifying the relevant studies within the vast space of all studies included in ARCHS4 is challenging. Furthermore, it does not allow gene expression changes to be investigated in the context of a specific experiment. To date, there is no centralized platform where mouse models can be jointly and fairly compared or evaluated using a set of target genes to comprehensively and specifically study neurological diseases in humans.
In a recent study [6], a large collection of mouse studies related to neurological disorders was assembled and processed using a unified RNA-seq pipeline running on Amazon Web Services (AWS). The studies were collected from the GEO database and the Accelerating Medicines Partnership-Alzheimer’s Disease (AMP-AD) Knowledge portal and include nine neurological diseases, 14 cell types, and 251 mouse experiment-control comparisons (referred to here as Differentially Expressed Genes (DEGs)). The analysis and processed data are available at https://www.synapse.org/#!Synapse:syn16779040, but there is no efficient way to examine genes of interest across all the processed studies without downloading the large dataset. Since this is a rich resource that could be utilized by the research community, there is a need for a portal to access these data efficiently and intuitively. To address this shortcoming, we developed a centralized portal, available at http://mmad.nrihub.org, to access and visualize the mouse transcriptomic data from a comprehensive set of studies on neurological disorders processed by Wan et al. [6].
Using the portal, users can efficiently look up their genes of interest in a large number of mouse studies related to neurological disorders and draw insights that could help identify the best mouse model for their planned experiments or functional follow-up. The portal can be queried with a set of gene symbols and produces an interactive heatmap that displays the expression changes of the queried genes across all included mouse data sets (i.e., experiment-control comparisons, or DEGs). Our portal provides several advantages over existing resources for query-based lookup of gene expression in experimental data. First, it gathers mouse model studies related to neurological disorders into a centralized portal, with annotations added to each study for efficient searching. Second, the portal is specifically focused on mouse models related to neurological disorders.
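The lookup described above, a set of gene symbols mapped to expression changes across comparisons, can be sketched as a small gene-by-comparison matrix. This is a minimal illustration only, not the portal's implementation: the record layout, the function name `build_heatmap_matrix`, and all fold-change values are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical long-format records: (comparison ID, gene symbol, log2 fold-change).
# The values are invented for illustration and are not taken from the portal.
RECORDS = [
    ("M46", "Trem2", 0.4),
    ("M52", "Trem2", 3.1),
    ("M52", "Tyrobp", 2.7),
    ("M132", "Tyrobp", -0.2),
    ("M132", "Snca", 1.5),
]


def build_heatmap_matrix(records, query_genes):
    """Return (comparisons, genes, matrix) for the queried gene symbols.

    The matrix has one row per queried gene found in the records and one
    column per comparison; None marks a gene not reported in a comparison.
    """
    wanted = {g.upper() for g in query_genes}  # case-insensitive symbol match
    lookup = defaultdict(dict)                 # SYMBOL -> {comparison: lfc}
    comparisons = []                           # column order = first appearance
    for comp, gene, lfc in records:
        if comp not in comparisons:
            comparisons.append(comp)
        key = gene.upper()
        if key in wanted:
            lookup[key][comp] = lfc
    # Keep only queried genes that appear in at least one comparison.
    genes = [g for g in query_genes if g.upper() in lookup]
    matrix = [[lookup[g.upper()].get(c) for c in comparisons] for g in genes]
    return comparisons, genes, matrix


comparisons, genes, matrix = build_heatmap_matrix(RECORDS, ["trem2", "Tyrobp", "Gfap"])
```

In this sketch, a queried symbol that is absent from every comparison (here `Gfap`) is dropped rather than shown as an empty row; whether the portal behaves the same way is not stated in the text.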
3. Results and Discussion
To illustrate the use of our portal, we present three use cases, with gene sets based on human transcriptional and genetic data. We first examined two distinct Alzheimer’s disease (AD)-associated transcriptional signatures, based on published co-expression analyses of human postmortem brains [9,10]. In an analysis of 1647 samples from three brain regions (dorsolateral prefrontal cortex, visual cortex, and cerebellum), Zhang et al. [9] identified a promising module (yellow) consisting of 1098 genes. Using an integrative, network-based approach, the yellow module was ranked highest for its association with late-onset AD and was highly enriched for immune and microglial functions. We used our transcriptome portal to examine whether genes from the yellow module are differentially expressed in mouse models relevant to AD and other neurological disorders.
Figure 3a shows the heatmap produced using the 1098 genes from the yellow module. From the heatmap, we identified three regions of interest with high or low log fold-change values (R1, R2, R3). R1 consists of 14 DEGs (i.e., mouse comparisons), 12 of which are derived from seven AD-related studies, with predominantly upregulated genes (Figure 3b). In contrast, R2 (six DEGs) shows a set of significantly downregulated genes in models for studying different cell types in the mouse (Figure 3c). Lastly, R3 consists of five DEGs with highly upregulated genes; these comparisons come from three studies related to inflammation and immunity (Figure 3d). This is consistent with findings from human brain samples observed in the original study [9]. Inspecting individual subsets of the vast collection of mouse model gene expression data allows users to interpret the results in depth. Interpretation is facilitated by the interactive heatmap provided through the portal, which allows users to zoom into specific regions of interest and to download the heatmap.
In a second example, we explored another co-expression module (m109), which was significantly associated with cognitive decline in a more recent study by Mostafavi et al. [10]. The authors studied RNA-seq data from the human dorsolateral prefrontal cortex (DLPFC) in two cohorts, the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP), collectively referred to as ROSMAP, which together comprised 478 subjects. Sets of co-expressed genes (modules) were identified, and association analysis was performed between the co-expression modules and clinical traits. Using this approach, the authors found that m109 had a significant module-level association with cognitive decline. We used the top 112 reported genes from m109 with high levels of association with cognitive decline to examine expression changes across mouse model studies.
Figure 4a shows the heatmap produced using the 112 genes from m109. There are two regions with consistent expression changes across the majority of genes: R1 (four DEGs), with genes consistently upregulated, and R2 (four DEGs), with genes consistently downregulated. In region R1 (Figure 4b), three DEGs (M132, M133, M140) are from the same paper, which studied the effect of ethanol on the synaptic transcriptome and synaptic plasticity [GSE73018], while M52 is from a paper investigating the biology of microglia [11]. In region R2 (Figure 4c), two DEGs (M48, M63) are from a paper investigating the transcriptomic profiles of different cell populations in the mouse brain, with M48 on neurons and M63 on dopaminergic neurons from the midbrain [12]. The other two DEGs (M136, M138) are from the same study as the R1 comparisons M132, M133, and M140. This is interesting because it implies that the expression levels of the input genes are significantly altered in opposite directions depending on which experimental treatment was used (M136, M138 vs. M132, M133, M140). Faced with such a result, users might consult the original study to better understand its experimental design and determine the causal mechanism.
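Regions such as R1 and R2 were identified from the heatmap; a user working with exported values could flag candidate comparisons programmatically in a similar spirit. The sketch below is one possible heuristic under stated assumptions, not the method used in this work: the function name `consistent_comparisons`, the cutoff of 1.0 on the log fold-change, the 75% agreement threshold, and the example values are all hypothetical.

```python
def consistent_comparisons(comparisons, matrix, lfc_cutoff=1.0, min_fraction=0.75):
    """Label a comparison 'up' or 'down' when at least min_fraction of its
    reported genes pass the log fold-change cutoff in that direction.

    matrix holds one row per gene and one column per comparison;
    None marks a gene that was not reported in a comparison.
    """
    labels = {}
    for j, comp in enumerate(comparisons):
        values = [row[j] for row in matrix if row[j] is not None]
        if not values:
            continue  # nothing reported for this comparison
        up = sum(v >= lfc_cutoff for v in values) / len(values)
        down = sum(v <= -lfc_cutoff for v in values) / len(values)
        if up >= min_fraction:
            labels[comp] = "up"
        elif down >= min_fraction:
            labels[comp] = "down"
    return labels


# Invented values for three hypothetical comparisons and four genes.
example_comparisons = ["cmp_A", "cmp_B", "cmp_C"]
example_matrix = [
    [1.8, -1.6, 0.2],
    [2.1, -2.0, -0.1],
    [1.2, -1.1, 1.4],
    [0.3, -1.9, None],
]
labels = consistent_comparisons(example_comparisons, example_matrix)
```

Comparisons from the same study that receive opposite labels, as with the ethanol study above, are natural candidates for a closer look at the original experimental design.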
Lastly, we used our tool to interrogate a set of 90 candidate susceptibility genes for Parkinson’s disease (PD) from a recently reported genome-wide association study (GWAS) [13]. Figure 5a shows the resulting heatmap, and Figure 5b shows a set of DEGs (M49, M46, M52) with high log fold-change values that profile three mouse cell types (newly formed oligodendrocytes, myelinating oligodendrocytes, and microglia, respectively). Figure 5c shows a set of DEGs (M211, M214, M213, M212), all from the same study [14] of microglial neurodevelopment.
Using our portal, we can efficiently examine gene expression changes across a large number of mouse studies related to neurological disorders, which can help identify the most appropriate mouse model for further study and functional validation. It is important to note that while we nominate mouse models for functional follow-up, we do not provide a significance measure or ranking score for the nominated models. We therefore recommend that users closely examine each nominated mouse model study. First, they should check how well the model fits their hypothesis and study goals and how closely it follows the transcriptional changes in the human data they are trying to validate. Second, they should check whether the model actually represents the human condition under investigation.