Metascape Gene List Analysis Report
metascape.org [1]
Bar Graph Summary
Gene Lists
User-provided gene identifiers are first converted into their corresponding H. sapiens Entrez gene IDs using the latest version of the database (last updated on 2023-09-01). If multiple identifiers correspond to the same Entrez gene ID, they will be considered as a single Entrez gene ID in downstream analyses. The gene lists are summarized in Table 1.
Table 1. Statistics of input gene lists.
Name | Total | Unique |
MyList | 100 | 100 |
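The identifier-collapsing step described above can be sketched as follows. This is a minimal illustration, not Metascape's implementation: `TOY_MAP` is a hypothetical stand-in for the database lookup, and the function name `to_unique_entrez` is an assumption made for this example.

```python
def to_unique_entrez(identifiers, id_map):
    """Map identifiers to Entrez gene IDs, keeping each Entrez ID only once.

    Returns (unique Entrez IDs in first-seen order, identifiers with no mapping).
    """
    seen = set()
    unique = []
    unmapped = []
    for ident in identifiers:
        entrez = id_map.get(ident.upper())
        if entrez is None:
            unmapped.append(ident)      # no Entrez ID found for this identifier
        elif entrez not in seen:
            seen.add(entrez)            # first identifier seen for this gene
            unique.append(entrez)
    return unique, unmapped


# Hypothetical toy lookup table: two symbols/aliases per gene.
TOY_MAP = {"TP53": 7157, "P53": 7157, "EGFR": 1956, "ERBB1": 1956}

unique_ids, unmapped_ids = to_unique_entrez(["TP53", "p53", "EGFR", "FOO"], TOY_MAP)
print(len(unique_ids), "unique genes;", len(unmapped_ids), "unmapped")
```

In Table 1 the "Total" and "Unique" columns are both 100, which is what such a step would report if no two input identifiers mapped to the same gene.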
Gene Annotation
The following annotations were retrieved from the latest version of the database (last updated on 2023-09-01) (Table 2).
Table 2. Gene annotations extracted
Name | Type | Description |
Gene Symbol | Description | Primary HUGO gene symbol. |
Description | Description | Short description. |
Biological Process (GO) | Function/Location | Descriptions summarized based on gene ontology database, where up to three most informative GO terms are kept. |
Kinase Class (UniProt) | Function/Location | Detailed kinase classes. |
Protein Function (Protein Atlas) | Function/Location | Protein Function (Protein Atlas) |
Subcellular Location (Protein Atlas) | Function/Location | Subcellular Location (Protein Atlas) |
Drug (DrugBank) | Genotype/Phenotype/Disease | Drug information for the given gene as target. |
Canonical Pathways | Ontology | Canonical Pathways |
Hallmark Gene Sets | Ontology | Hallmark Gene Sets |
Pathway and Process Enrichment Analysis
For each given gene list, pathway and process enrichment analysis has been carried out with the following ontology sources: KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, WikiPathways, and PANTHER Pathway. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. More specifically, p-values are calculated based on the cumulative hypergeometric distribution [2], and q-values are calculated using the Benjamini-Hochberg procedure to account for multiple testing [3]. Kappa scores [4] are used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 are considered a cluster. The most statistically significant term within a cluster is chosen to represent the cluster.
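The statistics named in this paragraph can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not Metascape's code: the background size `N_BACKGROUND`, the term size of 200, and all function names are hypothetical, chosen only to make the example concrete.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    N: background genes, K: background genes in the term,
    n: genes in the input list, k: input genes found in the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_factor(k, N, K, n):
    """Observed count divided by the count expected by chance (n * K / N)."""
    return k / (n * K / N)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from largest p to smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min          # enforce monotone q-values
    return qvalues


def passes_filter(k, p, ef):
    """Thresholds quoted in the text: p < 0.01, count >= 3, enrichment > 1.5."""
    return p < 0.01 and k >= 3 and ef > 1.5


N_BACKGROUND = 20000                         # assumed genome size, illustration only
p = hypergeom_pvalue(10, N_BACKGROUND, 200, 100)
ef = enrichment_factor(10, N_BACKGROUND, 200, 100)
print(f"p = {p:.3g}, enrichment factor = {ef:.1f}, kept = {passes_filter(10, p, ef)}")
```

Exact integer binomials are used here to avoid a dependency; a statistics library's hypergeometric survival function would serve the same purpose.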
Table 3. Top 20 clusters with their representative enriched terms (one per cluster). "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the p-value in log base 10. "Log10(q)" is the multi-test adjusted p-value in log base 10.
GO | Category | Description | Count | % | Log10(P) | Log10(q) |
WP2882 | WikiPathways | Nuclear receptors meta-pathway | 10 | 10.00 | -7.03 | -3.11 |
GO:0040008 | GO Biological Processes | regulation of growth | 13 | 13.00 | -6.90 | -3.11 |
GO:0032119 | GO Biological Processes | sequestering of zinc ion | 3 | 3.00 | -6.86 | -3.11 |
hsa05200 | KEGG Pathway | Pathways in cancer | 11 | 11.00 | -5.82 | -2.26 |
GO:0010038 | GO Biological Processes | response to metal ion | 9 | 9.00 | -5.51 | -2.01 |
R-HSA-1280218 | Reactome Gene Sets | Adaptive Immune System | 12 | 12.00 | -5.05 | -1.83 |
R-HSA-211859 | Reactome Gene Sets | Biological oxidations | 7 | 7.00 | -5.03 | -1.83 |
R-HSA-6785807 | Reactome Gene Sets | Interleukin-4 and Interleukin-13 signaling | 5 | 5.00 | -4.52 | -1.41 |
hsa05202 | KEGG Pathway | Transcriptional misregulation in cancer | 6 | 6.00 | -4.35 | -1.29 |
GO:0071900 | GO Biological Processes | regulation of protein serine/threonine kinase activity | 7 | 7.00 | -4.34 | -1.29 |
GO:0097006 | GO Biological Processes | regulation of plasma lipoprotein particle levels | 4 | 4.00 | -4.33 | -1.29 |
R-HSA-2022090 | Reactome Gene Sets | Assembly of collagen fibrils and other multimeric structures | 4 | 4.00 | -4.30 | -1.29 |
GO:0009725 | GO Biological Processes | response to hormone | 11 | 11.00 | -4.30 | -1.29 |
WP5094 | WikiPathways | Orexin receptor pathway | 6 | 6.00 | -4.25 | -1.27 |
GO:0042445 | GO Biological Processes | hormone metabolic process | 6 | 6.00 | -4.19 | -1.27 |
hsa04927 | KEGG Pathway | Cortisol synthesis and secretion | 4 | 4.00 | -4.19 | -1.27 |
GO:0006656 | GO Biological Processes | phosphatidylcholine biosynthetic process | 3 | 3.00 | -4.12 | -1.27 |
WP2880 | WikiPathways | Glucocorticoid receptor pathway | 4 | 4.00 | -4.06 | -1.27 |
R-HSA-9759194 | Reactome Gene Sets | Nuclear events mediated by NFE2L2 | 4 | 4.00 | -3.86 | -1.12 |
R-HSA-453279 | Reactome Gene Sets | Mitotic G1 phase and G1/S transition | 5 | 5.00 | -3.86 | -1.12 |
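The "%", "Log10(P)" and "Log10(q)" columns of Table 3 can be read back into plain numbers with a short helper. This is only a reading aid; the function names are assumptions introduced for this example.

```python
def from_log10(value):
    """Convert a Log10(P) or Log10(q) table entry back to a probability."""
    return 10.0 ** value


def percent_of_list(count, annotated_genes):
    """The '%' column: share of annotated input genes found in the term."""
    return 100.0 * count / annotated_genes


# First row of Table 3: Count = 10, % = 10.00, Log10(P) = -7.03.
p_value = from_log10(-7.03)
share = percent_of_list(10, 100)
print(f"p = {p_value:.2e}, {share:.2f}% of the list")
```

On this scale the p-value cutoff of 0.01 corresponds to Log10(P) < -2, so every row in the table sits well below the threshold.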
To further capture the relationships between the terms, a subset of enriched terms has been selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape [5], where each node represents an enriched term and is colored first by its cluster ID (Figure 2.a) and then by its p-value (Figure 2.b). These networks can be interactively viewed in Cytoscape through the .cys files (contained in the Zip package, which also contains a publication-quality version as a PDF) or within a browser by clicking on the web icon. For clarity, term labels are only shown for one term per cluster, so it is recommended to use Cytoscape or a browser to visualize the network in order to inspect all node labels. We can also export the network into a PDF file within Cytoscape, and then edit the labels using Adobe Illustrator for publication purposes. To switch off all labels, delete the "Label" mapping under the "Style" tab within Cytoscape, and then export the network view.
Figure 2. Network of enriched terms: (a) colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; (b) colored by p-value, where terms containing more genes tend to have a more significant p-value.
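The similarity rule behind the network edges can be illustrated with Cohen's kappa computed on gene membership. This is a minimal sketch under assumptions: the gene universe, the three toy terms, and the function names are hypothetical, and the report does not state which gene universe Metascape uses for the calculation.

```python
from itertools import combinations


def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa for two terms, treating each gene as in/out of a term."""
    n = len(universe)
    a = genes_a & universe
    b = genes_b & universe
    both = len(a & b)
    neither = n - len(a | b)
    p_observed = (both + neither) / n
    p_a, p_b = len(a) / n, len(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:                    # degenerate case: no variation
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


def similarity_edges(terms, universe, threshold=0.3):
    """Connect every pair of terms whose kappa exceeds the threshold."""
    return [
        (name_a, name_b)
        for (name_a, genes_a), (name_b, genes_b) in combinations(terms.items(), 2)
        if kappa_score(genes_a, genes_b, universe) > threshold
    ]


# Hypothetical toy data: ten genes and three terms.
universe = {f"g{i}" for i in range(10)}
terms = {
    "A": {"g0", "g1", "g2", "g3"},
    "B": {"g0", "g1", "g2", "g4"},
    "C": {"g8", "g9"},
}
edges = similarity_edges(terms, universe)
print(edges)
```

Terms A and B share three of their four genes and end up connected, while C shares none with either and stays isolated.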
Protein-protein Interaction Enrichment Analysis
For each given gene list, protein-protein interaction enrichment analysis has been carried out with the following databases: STRING [6], BioGrid [7], OmniPath [8], and InWeb_IM [9]. Only physical interactions in STRING (physical score > 0.132) and BioGrid are used. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. If the network contains between 3 and 500 proteins, the Molecular Complex Detection (MCODE) algorithm [10] has been applied to identify densely connected network components. The MCODE networks identified for individual gene lists have been gathered and are shown in Figure 3.
Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Figure 3.
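The network-construction rule described above (keep only proteins that interact with at least one other list member, then check the 3 to 500 size window) can be sketched as follows. This sketch does not implement MCODE itself; the edge list, gene names, and function names are hypothetical.

```python
def induced_network(interactions, gene_list):
    """Keep interactions whose two endpoints are both in the gene list.

    Self-interactions are dropped, so every remaining protein interacts
    with at least one *other* member of the list.
    """
    genes = set(gene_list)
    kept = {
        frozenset((a, b))
        for a, b in interactions
        if a in genes and b in genes and a != b
    }
    nodes = set().union(*kept) if kept else set()
    return nodes, kept


def eligible_for_mcode(nodes, low=3, high=500):
    """Size window quoted in the text for running MCODE."""
    return low <= len(nodes) <= high


# Hypothetical interaction table and gene list.
interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "X9"), ("G4", "G4")]
nodes, kept = induced_network(interactions, ["G1", "G2", "G3", "G4", "G5"])
print(sorted(nodes), eligible_for_mcode(nodes))
```

Here G4 (self-interaction only) and G5 (no interactions) drop out, and the G3 to X9 edge is discarded because X9 is not in the list.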
Figure 3. Protein-protein interaction network and MCODE components identified in the gene lists.
GO | Description | Log10(P) |
hsa05200 | Pathways in cancer | -7.4 |
GO:0001558 | regulation of cell growth | -7.2 |
GO:0040008 | regulation of growth | -6.8 |

MCODE | GO | Description | Log10(P) |
MCODE_1 | hsa04915 | Estrogen signaling pathway | -5.1 |
MCODE_1 | R-HSA-9658195 | Leishmania infection | -4.9 |
MCODE_1 | R-HSA-9824443 | Parasitic Infection Pathways | -4.9 |
MCODE_2 | hsa05204 | Chemical carcinogenesis - DNA adducts | -7.9 |
MCODE_2 | hsa00982 | Drug metabolism - cytochrome P450 | -7.9 |
MCODE_2 | hsa00980 | Metabolism of xenobiotics by cytochrome P450 | -7.8 |
Quality Control and Association Analysis
Gene list enrichments are identified in the following ontology categories: COVID, Cell_Type_Signatures, DisGeNET, PaGenBase, TRRUST, Transcription_Factor_Targets. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. The top few enriched clusters (one term per cluster) are shown in Figures 4-9. The algorithm used here is the same as that used for pathway and process enrichment analysis.
Figure 4. Summary of enrichment analysis in COVID [11].
GO | Description | Count | % | Log10(P) | Log10(q) |
COVID347 | RNA_Wilk_B-cells_patient-C6_Up | 6 | 6 | -5.70 | -3.10 |
COVID258 | RNA_Wilk_CD14+Monocytes_patient-C7_Up | 4 | 4 | -5.40 | -2.90 |
COVID309 | RNA_Wilk_CD8+T-cells_patient-C3_Up | 4 | 4 | -5.40 | -2.90 |
COVID052 | RNA_Xiong_BALF_Up | 8 | 8 | -5.10 | -2.70 |
COVID337 | RNA_Wilk_B-cells_patient-C1B-severe_Up | 5 | 5 | -4.60 | -2.30 |
COVID036 | RNA_Sun_Calu-3_0h_Up | 7 | 7 | -4.20 | -2.00 |
COVID363 | Interactome_Laurent_HEK293_24h_E | 7 | 7 | -4.20 | -2.00 |
COVID293 | RNA_Wilk_NK-cells_patient-C3_Up | 3 | 3 | -4.10 | -2.00 |
COVID271 | RNA_Wilk_CD16+Monocytes_patient-C7_Up | 3 | 3 | -4.10 | -1.90 |
COVID228 | Translatome_Bojkova_Caco-2_24h_Down | 5 | 5 | -3.60 | -1.60 |
COVID071 | Proteome_Bouhaddou_Vero_E6_24h_Down | 4 | 4 | -3.50 | -1.60 |
COVID264 | RNA_Wilk_CD16+Monocytes_patient-C3_Up | 3 | 3 | -3.50 | -1.60 |
COVID313 | RNA_Wilk_CD8+T-cells_patient-C5_Up | 3 | 3 | -3.40 | -1.50 |
COVID007 | RNA_Blanco-Melo_A549_Down | 6 | 6 | -3.30 | -1.40 |
COVID038 | RNA_Sun_Calu-3_12h_Up | 6 | 6 | -3.30 | -1.40 |
COVID040 | RNA_Sun_Calu-3_24h_Up | 6 | 6 | -3.30 | -1.40 |
COVID243 | RNA_Riva_Vero-E6_24h_Up | 6 | 6 | -3.30 | -1.40 |
COVID385 | Interactome_Laurent_HEK293_24h_ORF7A | 6 | 6 | -3.30 | -1.40 |
COVID386 | Interactome_Laurent_HEK293_24h_ORF7B | 6 | 6 | -3.30 | -1.40 |
COVID345 | RNA_Wilk_B-cells_patient-C5_Up | 4 | 4 | -3.20 | -1.30 |
Figure 5. Summary of enrichment analysis in Cell Type Signatures [12].
GO | Description | Count | % | Log10(P) | Log10(q) |
M41656 | TRAVAGLINI LUNG MUCOUS CELL | 9 | 9 | -9.40 | -5.80 |
M41659 | TRAVAGLINI LUNG ALVEOLAR EPITHELIAL TYPE 1 CELL | 12 | 12 | -8.20 | -4.90 |
M39111 | AIZARANI LIVER C7 EPCAM POS BILE DUCT CELLS 2 | 9 | 9 | -7.30 | -4.30 |
M40175 | DESCARTES FETAL EYE CORNEAL AND CONJUNCTIVAL EPITHELIAL CELLS | 9 | 9 | -6.90 | -3.90 |
M39174 | MURARO PANCREAS ACINAR CELL | 14 | 14 | -6.90 | -3.90 |
M39209 | HAY BONE MARROW STROMAL | 14 | 14 | -6.60 | -3.80 |
M41713 | FAN OVARY CL11 MURAL GRANULOSA CELL | 11 | 11 | -6.60 | -3.70 |
M40007 | BUSSLINGER GASTRIC PREZYMOGENIC CELLS | 5 | 5 | -5.80 | -3.20 |
M41669 | TRAVAGLINI LUNG BRONCHIAL VESSEL 2 CELL | 8 | 8 | -5.50 | -2.90 |
M39303 | CUI DEVELOPING HEART C6 EPICARDIAL CELL | 7 | 7 | -5.40 | -2.90 |
M40292 | DESCARTES FETAL SPLEEN MESOTHELIAL CELLS | 7 | 7 | -5.40 | -2.90 |
M41655 | TRAVAGLINI LUNG GOBLET CELL | 6 | 6 | -5.00 | -2.60 |
M39278 | DURANTE ADULT OLFACTORY NEUROEPITHELIUM SUSTENTACULAR CELLS | 4 | 4 | -4.70 | -2.40 |
M41705 | FAN OVARY CL3 MATURE CUMULUS GRANULOSA CELL 1 | 7 | 7 | -4.70 | -2.40 |
M41697 | TRAVAGLINI LUNG EREG DENDRITIC CELL | 10 | 10 | -4.70 | -2.40 |
M41700 | TRAVAGLINI LUNG OLR1 CLASSICAL MONOCYTE CELL | 11 | 11 | -4.50 | -2.30 |
M39225 | LAKE ADULT KIDNEY C6 PROXIMAL TUBULE EPITHELIAL CELLS FIBRINOGEN POS S3 | 6 | 6 | -4.50 | -2.30 |
M39321 | CUI DEVELOPING HEART VASCULAR ENDOTHELIAL CELL | 6 | 6 | -4.30 | -2.10 |
M41717 | FAN OVARY CL15 SMALL ANTRAL FOLLICLE GRANULOSA CELL | 10 | 10 | -4.30 | -2.10 |
M40229 | DESCARTES FETAL LIVER MYELOID CELLS | 6 | 6 | -4.20 | -2.00 |
Figure 6. Summary of enrichment analysis in DisGeNET [13].
GO | Description | Count | % | Log10(P) | Log10(q) |
C0521158 | Recurrent tumor | 20 | 20 | -12.00 | -8.30 |
C0019158 | Hepatitis | 18 | 18 | -11.00 | -7.30 |
C0860207 | Drug-Induced Liver Disease | 16 | 16 | -11.00 | -6.70 |
C3203102 | Idiopathic pulmonary arterial hypertension | 18 | 18 | -10.00 | -6.30 |
C0042373 | Vascular Diseases | 17 | 17 | -10.00 | -6.20 |
C4086152 | Childhood Astrocytoma | 16 | 16 | -9.70 | -6.00 |
C0030297 | Pancreatic Neoplasm | 17 | 17 | -9.30 | -5.70 |
C0031099 | Periodontitis | 16 | 16 | -9.10 | -5.60 |
C0007107 | Malignant neoplasm of larynx | 12 | 12 | -8.70 | -5.30 |
C1868683 | B-CELL MALIGNANCY, LOW-GRADE | 12 | 12 | -8.70 | -5.30 |
C0025286 | Meningioma | 15 | 15 | -8.60 | -5.20 |
C0153381 | Malignant neoplasm of mouth | 16 | 16 | -8.40 | -5.10 |
C0015672 | Fatigue | 16 | 16 | -8.40 | -5.10 |
C0278488 | Carcinoma breast stage IV | 14 | 14 | -8.20 | -4.90 |
C0007785 | Cerebral Infarction | 15 | 15 | -8.10 | -4.80 |
C0154830 | Proliferative diabetic retinopathy | 9 | 9 | -8.00 | -4.80 |
C0038525 | Subarachnoid Hemorrhage | 13 | 13 | -7.90 | -4.70 |
C0279550 | Adult Rhabdomyosarcoma | 13 | 13 | -7.90 | -4.70 |
C0220611 | Childhood Rhabdomyosarcoma | 13 | 13 | -7.80 | -4.60 |
C0023465 | Acute monocytic leukemia | 14 | 14 | -7.70 | -4.50 |
Figure 7. Summary of enrichment analysis in PaGenBase [14].
GO | Description | Count | % | Log10(P) | Log10(q) |
PGB:00002 | Cell-specific: HEPG2 | 9 | 9 | -4.90 | -2.50 |
PGB:00081 | Cell-specific: Bronchial Epithelial Cells | 5 | 5 | -4.00 | -1.90 |
PGB:00071 | Cell-specific: Vaginal Epithelial | 3 | 3 | -4.00 | -1.90 |
PGB:00101 | Tissue-specific: Colorectal adenocarcinoma | 3 | 3 | -3.50 | -1.50 |
PGB:00082 | Cell-specific: Breast cell | 3 | 3 | -3.10 | -1.20 |
PGB:00018 | Tissue-specific: lung | 7 | 7 | -2.90 | -1.10 |
PGB:00014 | Cell-specific: DRG | 6 | 6 | -2.40 | -0.75 |
PGB:00060 | Tissue-specific: retinoblastoma | 3 | 3 | -2.40 | -0.75 |
PGB:00022 | Tissue-specific: adrenal gland | 4 | 4 | -2.30 | -0.72 |
PGB:00004 | Tissue-specific: kidney | 6 | 6 | -2.30 | -0.66 |
PGB:00034 | Cell-specific: OVR278E | 3 | 3 | -2.00 | -0.48 |
Figure 8. Summary of enrichment analysis in TRRUST.
GO | Description | Count | % | Log10(P) | Log10(q) |
TRR01256 | Regulated by: SP1 | 19 | 19 | -14.00 | -9.50 |
TRR01158 | Regulated by: RELA | 11 | 11 | -7.80 | -4.70 |
TRR00875 | Regulated by: NFKB1 | 11 | 11 | -7.80 | -4.60 |
TRR00484 | Regulated by: HIF1A | 6 | 6 | -6.00 | -3.30 |
TRR01277 | Regulated by: STAT3 | 6 | 6 | -4.80 | -2.40 |
TRR01557 | Regulated by: ZEB1 | 3 | 3 | -4.60 | -2.30 |
TRR00366 | Regulated by: FOXO3 | 3 | 3 | -4.40 | -2.20 |
TRR00869 | Regulated by: NFE2L2 | 3 | 3 | -4.40 | -2.10 |
TRR00645 | Regulated by: JUN | 5 | 5 | -3.60 | -1.60 |
TRR00908 | Regulated by: NR3C1 | 3 | 3 | -3.10 | -1.30 |
TRR01259 | Regulated by: SP3 | 4 | 4 | -3.10 | -1.20 |
TRR00270 | Regulated by: EP300 | 3 | 3 | -2.90 | -1.10 |
TRR00230 | Regulated by: E2F1 | 4 | 4 | -2.90 | -1.10 |
TRR00110 | Regulated by: CEBPB | 3 | 3 | -2.80 | -1.10 |
TRR01512 | Regulated by: USF1 | 3 | 3 | -2.70 | -0.95 |
TRR00466 | Regulated by: HDAC1 | 3 | 3 | -2.60 | -0.90 |
TRR00280 | Regulated by: ETS1 | 3 | 3 | -2.60 | -0.88 |
TRR01275 | Regulated by: STAT1 | 3 | 3 | -2.40 | -0.76 |
Figure 9. Summary of enrichment analysis in Transcription Factor Targets.
GO | Description | Count | % | Log10(P) | Log10(q) |
M13012 | TGCTGAY UNKNOWN | 10 | 10 | -4.90 | -2.50 |
M15719 | YYCATTCAWW UNKNOWN | 6 | 6 | -4.40 | -2.10 |
M29904 | BCL6B TARGET GENES | 5 | 5 | -3.60 | -1.60 |
M3647 | GR Q6 01 | 6 | 6 | -3.50 | -1.50 |
M3403 | GTGACGY E4F1 Q6 | 9 | 9 | -3.40 | -1.50 |
M30176 | SOX3 TARGET GENES | 8 | 8 | -3.30 | -1.40 |
M11838 | FOXD3 01 | 5 | 5 | -3.20 | -1.40 |
M551 | TEF1 Q6 | 5 | 5 | -3.00 | -1.20 |
M6378 | TTCYNRGAA STAT5B 01 | 6 | 6 | -3.00 | -1.20 |
M11587 | CHX10 01 | 5 | 5 | -3.00 | -1.20 |
M18386 | STAT5A 02 | 4 | 4 | -2.90 | -1.10 |
M7349 | WYAAANNRNNNGCG UNKNOWN | 3 | 3 | -2.90 | -1.10 |
M3263 | STAT3 02 | 4 | 4 | -2.80 | -1.10 |
M17318 | CGTSACG PAX3 B | 4 | 4 | -2.80 | -1.10 |
M19455 | PR 01 | 4 | 4 | -2.80 | -1.10 |
M16200 | ISRE 01 | 5 | 5 | -2.80 | -1.10 |
M8585 | TFIIA Q6 | 5 | 5 | -2.80 | -1.00 |
M5 | STAT 01 | 5 | 5 | -2.80 | -1.00 |
M17883 | STAT5A 01 | 5 | 5 | -2.80 | -1.00 |
M17779 | PAX2 02 | 5 | 5 | -2.70 | -1.00 |
References
1. Zhou Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications (2019) 10(1):1523.
2. Zar J.H. Biostatistical Analysis, 4th edn. NJ: Prentice Hall (1999), p. 523.
3. Hochberg Y., Benjamini Y. More powerful procedures for multiple significance testing. Statistics in Medicine (1990) 9:811-818.
4. Cohen J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960) 20:27-46.
5. Shannon P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. (2003) 13(11):2498-2504.
6. Szklarczyk D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. (2019) 47:D607-D613.
7. Stark C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. (2006) 34:D535-D539.
8. Türei D. et al. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods (2016) 13:966-967.
9. Li T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods (2017) 14:61-64.
10. Bader G.D. et al. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics (2003) 4:2.
11. https://metascape.org/COVID.
12. Subramanian A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. (2005) 102:15545-15550.
13. Piñero J. et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. (2017) 45:D833-D839.
14. Pan J.B. et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One (2013) 8:e80747.