Metascape Gene List Analysis Report

metascape.org [1]

Bar Graph Summary

Figure 1. Bar graph of enriched terms across input gene lists, colored by p-values.
Metascape visualizes only the top 20 clusters. Up to 100 enriched clusters, as well as the top-level Gene Ontology biological processes, are available through links in the original report.

Gene Lists

User-provided gene identifiers are first converted into their corresponding H. sapiens Entrez gene IDs using the latest version of the database (last updated on 2023-09-01). If multiple identifiers correspond to the same Entrez gene ID, they will be considered as a single Entrez gene ID in downstream analyses. The gene lists are summarized in Table 1.

Table 1. Statistics of input gene lists.
Name Total Unique
MyList 50 31
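
The "Total" and "Unique" columns of Table 1 can be illustrated with a minimal sketch. This is not Metascape's implementation: the `to_entrez` lookup and the `summarize_gene_list` helper are hypothetical stand-ins for the database conversion, and the identifiers and IDs are toy values.

```python
def summarize_gene_list(identifiers, to_entrez):
    """Return (total, unique) counts in the style of Table 1.

    total  = number of identifiers submitted
    unique = number of distinct Entrez gene IDs they map to
    """
    mapped = [to_entrez[i] for i in identifiers if i in to_entrez]
    return len(identifiers), len(set(mapped))


# Hypothetical lookup: two identifiers resolve to the same Entrez gene ID,
# so they count as a single gene downstream.
to_entrez = {"GENE_A": 101, "GENE_A_alias": 101, "GENE_B": 202}
total, unique = summarize_gene_list(["GENE_A", "GENE_A_alias", "GENE_B"], to_entrez)
print(total, unique)  # prints: 3 2
```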

Gene Annotation

The following annotations were retrieved from the latest version of the database (last updated on 2023-09-01) (Table 2).

Table 2. Gene annotations extracted
Name | Type | Description
Gene Symbol | Description | Primary HUGO gene symbol.
Description | Description | Short description.
Biological Process (GO) | Function/Location | Descriptions summarized based on gene ontology database, where up to three most informative GO terms are kept.
Kinase Class (UniProt) | Function/Location | Detailed kinase classes.
Protein Function (Protein Atlas) | Function/Location | Protein Function (Protein Atlas)
Subcellular Location (Protein Atlas) | Function/Location | Subcellular Location (Protein Atlas)
Drug (DrugBank) | Genotype/Phenotype/Disease | Drug information for the given gene as target.
Canonical Pathways | Ontology | Canonical Pathways
Hallmark Gene Sets | Ontology | Hallmark Gene Sets

Pathway and Process Enrichment Analysis

For each given gene list, pathway and process enrichment analysis has been carried out with the following ontology sources: KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, WikiPathways, and PANTHER Pathway. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. More specifically, p-values are calculated based on the cumulative hypergeometric distribution [2], and q-values are calculated using the Benjamini-Hochberg procedure to account for multiple testing [3]. Kappa scores [4] are used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 are considered a cluster. The most statistically significant term within a cluster is chosen to represent the cluster.
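
The p-value and enrichment-factor filters described above can be sketched as follows. This is an illustration under stated assumptions, not Metascape's code: the function names are hypothetical, and the background size (20,000 genes) and term size (500 genes) in the usage lines are made-up values chosen only to exercise the functions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Cumulative hypergeometric p-value P(X >= k).

    N = genes in the background, K = background genes annotated to the term,
    n = genes in the input list, k = input genes annotated to the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_factor(k, N, K, n):
    """Observed count divided by the count expected by chance (n * K / N)."""
    return k / (n * K / N)


def passes_filters(k, N, K, n, p_cutoff=0.01, min_count=3, min_enrichment=1.5):
    """Apply the three thresholds quoted in the text."""
    return (
        k >= min_count
        and enrichment_factor(k, N, K, n) > min_enrichment
        and hypergeom_sf(k, N, K, n) < p_cutoff
    )


# Hypothetical numbers: 21 of 31 input genes hit a 500-gene term
# against an assumed 20,000-gene background.
print(hypergeom_sf(21, 20000, 500, 31))
print(enrichment_factor(21, 20000, 500, 31))
print(passes_filters(21, 20000, 500, 31))
```

Exact integer arithmetic via `math.comb` avoids the underflow that very small p-values would cause with naive floating-point factorials; a statistics library's hypergeometric survival function would be the usual choice in practice.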

Table 3. Top 20 clusters with their representative enriched terms (one per cluster). "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the p-value in log base 10. "Log10(q)" is the multi-test adjusted p-value in log base 10.
GO | Category | Description | Count | % | Log10(P) | Log10(q)
hsa05200 | KEGG Pathway | Pathways in cancer | 21 | 67.74 | -29.48 | -25.13
WP4806 | WikiPathways | EGFR tyrosine kinase inhibitor resistance | 13 | 41.94 | -25.37 | -21.50
WP2263 | WikiPathways | Androgen receptor network in prostate cancer | 13 | 41.94 | -23.75 | -20.10
WP2034 | WikiPathways | Leptin signaling pathway | 12 | 38.71 | -23.47 | -19.90
GO:0016310 | GO Biological Processes | phosphorylation | 17 | 54.84 | -19.38 | -16.34
WP5087 | WikiPathways | Pleural mesothelioma | 14 | 45.16 | -17.50 | -14.56
R-HSA-2219528 | Reactome Gene Sets | PI3K/AKT Signaling in Cancer | 9 | 29.03 | -15.06 | -12.39
GO:0048732 | GO Biological Processes | gland development | 11 | 35.48 | -12.68 | -10.19
WP3651 | WikiPathways | Pathways affected in adenoid cystic carcinoma | 7 | 22.58 | -12.42 | -9.96
WP4685 | WikiPathways | Melanoma | 7 | 22.58 | -12.28 | -9.86
WP3303 | WikiPathways | RAC1/PAK1/p38/MMP2 pathway | 7 | 22.58 | -12.28 | -9.86
WP3850 | WikiPathways | Factors and pathways affecting insulin-like growth factor (IGF1)-Akt signaling | 6 | 19.35 | -11.88 | -9.50
M91 | Canonical Pathways | PID TCPTP PATHWAY | 6 | 19.35 | -11.46 | -9.11
WP138 | WikiPathways | Androgen receptor signaling pathway | 7 | 22.58 | -11.37 | -9.03
GO:0030335 | GO Biological Processes | positive regulation of cell migration | 11 | 35.48 | -11.20 | -8.88
GO:0040008 | GO Biological Processes | regulation of growth | 11 | 35.48 | -10.88 | -8.58
WP4205 | WikiPathways | MET in type 1 papillary renal cell carcinoma | 6 | 19.35 | -10.53 | -8.29
hsa04210 | KEGG Pathway | Apoptosis | 7 | 22.58 | -10.12 | -7.91
hsa01524 | KEGG Pathway | Platinum drug resistance | 6 | 19.35 | -9.95 | -7.76
R-HSA-9009391 | Reactome Gene Sets | Extra-nuclear estrogen signaling | 6 | 19.35 | -9.81 | -7.63
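
The "Log10(q)" column comes from a Benjamini-Hochberg adjustment of the p-values. A minimal sketch of that procedure is given below, with the hypothetical function name `benjamini_hochberg` and toy p-values. The q-values in Table 3 cannot be reproduced from the 20 rows shown, because the adjustment depends on every term that was tested, not only on the representatives listed.

```python
from math import log10


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Toy p-values, not taken from the report.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
for p, q in zip(toy_p, toy_q):
    print(f"Log10(P)={log10(p):.2f}  Log10(q)={log10(q):.2f}")
```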

To further capture the relationships between the terms, a subset of enriched terms has been selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using Cytoscape [5], where each node represents an enriched term and is colored first by its cluster ID (Figure 2.a) and then by its p-value (Figure 2.b). These networks can be viewed interactively in Cytoscape through the .cys files (contained in the Zip package, which also contains a publication-quality version as a PDF) or within a browser by clicking on the web icon. For clarity, term labels are shown for only one term per cluster, so it is recommended to use Cytoscape or a browser to inspect all node labels. The network can also be exported to a PDF file within Cytoscape, and the labels can then be edited in Adobe Illustrator for publication purposes. To switch off all labels, delete the "Label" mapping under the "Style" tab within Cytoscape, and then export the network view.
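
The similarity edges of the network can be illustrated with Cohen's kappa computed on gene membership. The sketch below is an assumption-laden illustration rather than Metascape's code: the function names are hypothetical, the term and gene names are placeholders, and using the input gene list as the universe over which agreement is counted is an assumption.

```python
from itertools import combinations


def kappa(genes_a, genes_b, universe):
    """Cohen's kappa between two terms, treating each gene in `universe`
    as one observation labeled member / non-member by each term."""
    universe = set(universe)
    a, b = set(genes_a) & universe, set(genes_b) & universe
    n = len(universe)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / n**2
    if expected == 1.0:  # both terms cover all genes or none: identical labels
        return 1.0
    return (observed - expected) / (1.0 - expected)


def similarity_edges(terms, universe, threshold=0.3):
    """Connect every pair of terms whose kappa exceeds the threshold."""
    return [
        (s, t)
        for s, t in combinations(sorted(terms), 2)
        if kappa(terms[s], terms[t], universe) > threshold
    ]


# Placeholder data: ten genes and three terms.
universe = [f"g{i}" for i in range(1, 11)]
terms = {
    "term_1": {"g1", "g2", "g3", "g4"},
    "term_2": {"g1", "g2", "g3", "g5"},
    "term_3": {"g8", "g9"},
}
print(similarity_edges(terms, universe))  # prints: [('term_1', 'term_2')]
```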

Figure 2. Network of enriched terms: (a) colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; (b) colored by p-value, where terms containing more genes tend to have a more significant p-value.

Protein-protein Interaction Enrichment Analysis

For each given gene list, protein-protein interaction enrichment analysis has been carried out with the following databases: STRING [6], BioGrid [7], OmniPath [8], and InWeb_IM [9]. Only physical interactions in STRING (physical score > 0.132) and BioGrid are used. The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. If the network contains between 3 and 500 proteins, the Molecular Complex Detection (MCODE) algorithm [10] has been applied to identify densely connected network components. The MCODE networks identified for individual gene lists have been gathered and are shown in Figure 3.
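
The network-construction step (before MCODE is run) can be sketched as below. This is a minimal illustration with hypothetical function names and placeholder protein names. The interaction pairs stand in for the filtered database records, and MCODE itself is not implemented here.

```python
def induced_network(genes, interactions):
    """Keep only interactions between two distinct list members, and only
    the proteins that take part in at least one such interaction."""
    genes = set(genes)
    edges = {
        frozenset((a, b))
        for a, b in interactions
        if a in genes and b in genes and a != b
    }
    nodes = set().union(*edges) if edges else set()
    return nodes, edges


def eligible_for_mcode(nodes, min_size=3, max_size=500):
    """Size rule quoted in the text: between 3 and 500 proteins."""
    return min_size <= len(nodes) <= max_size


# Placeholder data: P4 has no partner inside the list, P9 is not in the list.
gene_list = ["P1", "P2", "P3", "P4"]
physical_interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P9"), ("P2", "P1")]
nodes, edges = induced_network(gene_list, physical_interactions)
print(sorted(nodes), len(edges), eligible_for_mcode(nodes))
# prints: ['P1', 'P2', 'P3'] 2 True
```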

Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Figure 3.

Figure 3. Protein-protein interaction network and MCODE components identified in the gene lists.
GO | Description | Log10(P)
hsa05200 | Pathways in cancer | -29.5
WP2261 | Glioblastoma signaling pathways | -28.0
WP4806 | EGFR tyrosine kinase inhibitor resistance | -25.4

MCODE | GO | Description | Log10(P)
MCODE_1 | hsa01521 | EGFR tyrosine kinase inhibitor resistance | -19.2
MCODE_1 | WP4806 | EGFR tyrosine kinase inhibitor resistance | -19.0
MCODE_1 | GO:0042327 | positive regulation of phosphorylation | -16.6
MCODE_2 | WP2034 | Leptin signaling pathway | -13.1
MCODE_2 | R-HSA-5663202 | Diseases of signal transduction by growth factor receptors and second messengers | -9.2
MCODE_2 | R-HSA-982772 | Growth hormone receptor signaling | -8.4
MCODE_3 | hsa05207 | Chemical carcinogenesis - receptor activation | -8.6
MCODE_3 | M176 | PID FOXM1 PATHWAY | -8.1
MCODE_3 | hsa05200 | Pathways in cancer | -7.0

Quality Control and Association Analysis

Gene list enrichments are identified in the following ontology categories: COVID, Cell_Type_Signatures, DisGeNET, PaGenBase, TRRUST, and Transcription_Factor_Targets. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are collected and grouped into clusters based on their membership similarities. The top few enriched clusters (one term per cluster) are shown in Figures 4-9. The algorithm used here is the same as that used for pathway and process enrichment analysis.

Figure 4. Summary of enrichment analysis in COVID [11].

GO | Description | Count | % | Log10(P) | Log10(q)
COVID055 | Phosphoproteome_Bouhaddou_Vero_E6_0h_Down | 5 | 16.00 | -5.10 | -3.70
COVID133 | Phosphoproteome_Stukalov_A549-ACE2_6h_Up | 3 | 9.70 | -4.30 | -3.00
COVID282 | RNA_Wilk_Dendritic-cells_patient-C5_Up | 3 | 9.70 | -4.20 | -2.90
COVID345 | RNA_Wilk_B-cells_patient-C5_Up | 3 | 9.70 | -3.60 | -2.40
COVID048 | RNA_Wyler_Calu-3_12h_Up | 4 | 13.00 | -3.60 | -2.30
COVID057 | Phosphoproteome_Bouhaddou_Vero_E6_12h_Down | 4 | 13.00 | -3.60 | -2.30
COVID061 | Phosphoproteome_Bouhaddou_Vero_E6_2h_Down | 4 | 13.00 | -3.60 | -2.30
COVID234 | Phosphoproteome_Klann_Caco-2_24h_Down | 4 | 13.00 | -3.60 | -2.30
COVID236 | Proteome_Klann_Caco-2_24h_Down | 4 | 13.00 | -3.60 | -2.30
COVID343 | RNA_Wilk_B-cells_patient-C4_Up | 3 | 9.70 | -3.60 | -2.30
COVID011 | RNA_Blanco-Melo_A549-ACE2-ruxolitinib_Down | 3 | 9.70 | -2.80 | -1.50
COVID134 | Proteome_Stukalov_A549-ACE2_24h_Down | 3 | 9.70 | -2.50 | -1.20
COVID032 | RNA_Liao_BALF-severe_Up | 3 | 9.70 | -2.50 | -1.20
COVID040 | RNA_Sun_Calu-3_24h_Up | 3 | 9.70 | -2.50 | -1.20
COVID050 | RNA_Wyler_Calu-3_24h_Up | 3 | 9.70 | -2.50 | -1.20
COVID052 | RNA_Xiong_BALF_Up | 3 | 9.70 | -2.50 | -1.20
COVID059 | Phosphoproteome_Bouhaddou_Vero_E6_24h_Down | 3 | 9.70 | -2.50 | -1.20
COVID062 | Phosphoproteome_Bouhaddou_Vero_E6_2h_Up | 3 | 9.70 | -2.50 | -1.20
COVID065 | Phosphoproteome_Bouhaddou_Vero_E6_8h_Down | 3 | 9.70 | -2.50 | -1.20
COVID364 | Interactome_Laurent_HEK293_24h_M | 3 | 9.70 | -2.50 | -1.20

Figure 5. Summary of enrichment analysis in Cell Type Signatures [12].

GO | Description | Count | % | Log10(P) | Log10(q)
M39153 | GAO LARGE INTESTINE 24W C2 MKI67POS PROGENITOR | 3 | 9.70 | -3.70 | -2.40
M39061 | MANNO MIDBRAIN NEUROTYPES HPROGFPM | 4 | 13.00 | -3.30 | -2.10
M39135 | AIZARANI LIVER C39 EPCAM POS BILE DUCT CELLS 4 | 3 | 9.70 | -3.10 | -1.80
M39062 | MANNO MIDBRAIN NEUROTYPES HNPROG | 3 | 9.70 | -2.80 | -1.50
M39036 | FAN EMBRYONIC CTX NSC 2 | 3 | 9.70 | -2.80 | -1.50
M39096 | ZHONG PFC C1 OPC | 3 | 9.70 | -2.70 | -1.50
M39059 | MANNO MIDBRAIN NEUROTYPES HPROGBP | 3 | 9.70 | -2.50 | -1.20
M39058 | MANNO MIDBRAIN NEUROTYPES HPROGM | 3 | 9.70 | -2.40 | -1.20
M39060 | MANNO MIDBRAIN NEUROTYPES HPROGFPL | 3 | 9.70 | -2.40 | -1.10
M39053 | MANNO MIDBRAIN NEUROTYPES HRGL2C | 3 | 9.70 | -2.40 | -1.10
M39052 | MANNO MIDBRAIN NEUROTYPES HOPC | 3 | 9.70 | -2.20 | -0.98
M39174 | MURARO PANCREAS ACINAR CELL | 4 | 13.00 | -2.20 | -0.96
M39077 | ZHONG PFC MAJOR TYPES MICROGLIA | 3 | 9.70 | -2.00 | -0.77

Figure 6. Summary of enrichment analysis in DisGeNET [13].

GO | Description | Count | % | Log10(P) | Log10(q)
C1960398 | HER2-positive carcinoma of breast | 21 | 68.00 | -35.00 | -31.00
C1257931 | Mammary Neoplasms, Human | 23 | 74.00 | -34.00 | -30.00
C4704874 | Mammary Carcinoma, Human | 23 | 74.00 | -33.00 | -29.00
C0027859 | Acoustic Neuroma | 17 | 55.00 | -32.00 | -28.00
C2145472 | Urothelial Carcinoma | 20 | 65.00 | -29.00 | -25.00
C0025286 | Meningioma | 21 | 68.00 | -28.00 | -24.00
C0279550 | Adult Rhabdomyosarcoma | 20 | 65.00 | -28.00 | -24.00
C0220611 | Childhood Rhabdomyosarcoma | 20 | 65.00 | -28.00 | -24.00
C4551686 | Malignant neoplasm of soft tissue | 21 | 68.00 | -27.00 | -23.00
C0035412 | Rhabdomyosarcoma | 20 | 65.00 | -27.00 | -23.00
C0007138 | Carcinoma, Transitional Cell | 20 | 65.00 | -26.00 | -23.00
C1512127 | HER2 gene amplification | 15 | 48.00 | -26.00 | -22.00
C1328504 | Hormone refractory prostate cancer | 20 | 65.00 | -25.00 | -22.00
C0025500 | Mesothelioma | 19 | 61.00 | -25.00 | -22.00
C0751690 | Malignant Peripheral Nerve Sheath Tumor | 16 | 52.00 | -25.00 | -21.00
C0521158 | Recurrent tumor | 20 | 65.00 | -25.00 | -21.00
C0032580 | Adenomatous Polyposis Coli | 19 | 61.00 | -24.00 | -21.00
C0030297 | Pancreatic Neoplasm | 20 | 65.00 | -24.00 | -21.00
C4721209 | Metastatic human epidermal growth factor 2 positive carcinoma of breast | 11 | 35.00 | -24.00 | -21.00
C0544886 | Somatic mutation | 14 | 45.00 | -24.00 | -21.00

Figure 7. Summary of enrichment analysis in PaGenBase [14].

GO | Description | Count | % | Log10(P) | Log10(q)
PGB:00065 | Tissue-specific: Breast | 4 | 13.00 | -4.80 | -3.40
PGB:00060 | Tissue-specific: retinoblastoma | 3 | 9.70 | -3.90 | -2.60
PGB:00034 | Cell-specific: OVR278E | 3 | 9.70 | -3.50 | -2.20
PGB:00081 | Cell-specific: Bronchial Epithelial Cells | 3 | 9.70 | -3.40 | -2.10
PGB:00031 | Cell-specific: HUVEC | 3 | 9.70 | -2.20 | -0.96

Figure 8. Summary of enrichment analysis in TRRUST.

GO | Description | Count | % | Log10(P) | Log10(q)
TRR01419 | Regulated by: TP53 | 15 | 48.00 | -25.00 | -22.00
TRR00075 | Regulated by: BRCA1 | 8 | 26.00 | -15.00 | -13.00
TRR00466 | Regulated by: HDAC1 | 8 | 26.00 | -14.00 | -12.00
TRR01158 | Regulated by: RELA | 11 | 35.00 | -14.00 | -12.00
TRR01277 | Regulated by: STAT3 | 9 | 29.00 | -13.00 | -11.00
TRR00230 | Regulated by: E2F1 | 8 | 26.00 | -12.00 | -9.80
TRR01256 | Regulated by: SP1 | 11 | 35.00 | -12.00 | -9.70
TRR00875 | Regulated by: NFKB1 | 9 | 29.00 | -10.00 | -8.50
TRR01546 | Regulated by: YBX1 | 5 | 16.00 | -9.50 | -7.70
TRR00275 | Regulated by: ESR1 | 6 | 19.00 | -9.50 | -7.70
TRR00889 | Regulated by: NKX3-1 | 4 | 13.00 | -9.20 | -7.50
TRR00011 | Regulated by: AR | 6 | 19.00 | -8.90 | -7.20
TRR01521 | Regulated by: VHL | 4 | 13.00 | -8.10 | -6.50
TRR01018 | Regulated by: PAX5 | 4 | 13.00 | -8.00 | -6.40
TRR00138 | Regulated by: CTNNB1 | 4 | 13.00 | -8.00 | -6.30
TRR01062 | Regulated by: PPARG | 5 | 16.00 | -7.80 | -6.20
TRR01029 | Regulated by: PGR | 4 | 13.00 | -7.70 | -6.10
TRR00645 | Regulated by: JUN | 6 | 19.00 | -7.70 | -6.10
TRR00660 | Regulated by: KDM4B | 3 | 9.70 | -7.50 | -5.90
TRR00198 | Regulated by: DNMT1 | 4 | 13.00 | -7.40 | -5.80

Figure 9. Summary of enrichment analysis in Transcription Factor Targets.

GO | Description | Count | % | Log10(P) | Log10(q)
M30025 | IGLV5 37 TARGET GENES | 6 | 19.00 | -4.40 | -3.10
M572 | TGCCAAR NF1 Q6 | 6 | 19.00 | -4.10 | -2.80
M5866 | HNF4 01 B | 4 | 13.00 | -3.90 | -2.60
M13672 | HNF4 01 | 4 | 13.00 | -3.80 | -2.50
M29962 | F10 TARGET GENES | 4 | 13.00 | -3.80 | -2.50
M9412 | RTTTNNNYTGGM UNKNOWN | 3 | 9.70 | -3.30 | -2.00
M4764 | TGTTTGY HNF3 Q6 | 5 | 16.00 | -3.00 | -1.80
M533 | STTTCRNTTT IRF Q6 | 3 | 9.70 | -3.00 | -1.70
M14066 | IRF Q6 | 3 | 9.70 | -2.70 | -1.50
M15389 | PTF1BETA Q6 | 3 | 9.70 | -2.70 | -1.40
M10836 | SP3 Q3 | 3 | 9.70 | -2.70 | -1.40
M16200 | ISRE 01 | 3 | 9.70 | -2.70 | -1.40
M2727 | ICSBP Q6 | 3 | 9.70 | -2.70 | -1.40
M8549 | AREB6 04 | 3 | 9.70 | -2.70 | -1.40
M12258 | IRF7 01 | 3 | 9.70 | -2.60 | -1.40
M30288 | ZNF257 TARGET GENES | 4 | 13.00 | -2.60 | -1.40
M13889 | PPAR DR1 Q2 | 3 | 9.70 | -2.60 | -1.40
M4644 | DR1 Q3 | 3 | 9.70 | -2.60 | -1.40
M17779 | PAX2 02 | 3 | 9.70 | -2.60 | -1.40
M13332 | HNF4 DR1 Q3 | 3 | 9.70 | -2.60 | -1.30

References

  1. Zhou et al., Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications (2019) 10(1):1523.
  2. Zar, J.H. Biostatistical Analysis, 4th edn. Prentice Hall, NJ (1999), p. 523.
  3. Hochberg Y., Benjamini Y. More powerful procedures for multiple significance testing. Statistics in Medicine (1990) 9:811-818.
  4. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960) 20:27-46.
  5. Shannon P. et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. (2003) 13:2498-2504.
  6. Szklarczyk D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. (2019) 47:D607-613.
  7. Stark C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. (2006) 34:D535-539.
  8. Turei D. et al. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods. (2016) 13:966-967.
  9. Li T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods. (2017) 14:61-64.
  10. Bader G.D. et al. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics (2003) 4:2.
  11. https://metascape.org/COVID.
  12. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545-15550 (2005).
  13. Pinero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic acids research 45, D833-D839 (2017).
  14. Pan JB, et al. PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 8, e80747 (2013).