1. Introduction
Cervical cancer remains a prevalent disease globally. Despite the introduction of screening and vaccination programs, there were approximately 570,000 new cases and 311,000 deaths from cervical cancer worldwide in 2018, making it the fourth most frequently diagnosed cancer and the fourth leading contributor to cancer-related mortality in women [
1]. The main risk factors for cervical cancer include human papillomavirus (HPV) infection, initiation of sexual activity at a young age, multiple sexual partners, smoking, and long-term use of oral contraceptives [
2]. Moreover, it has been reported that during carcinogenesis of cervical cancer, HPV DNA is frequently integrated into the human genome. Although the combination of surgery and radiochemotherapy has improved overall survival (OS), progression-free survival (PFS) and disease-free survival of cervical cancer patients and reduced the recurrence rate, the 5-year survival rate of patients with advanced cervical cancer, especially metastatic disease, remains dismally low, ranging from 5% to 15% [
3]. Moreover, the incidence and mortality of cervical cancer tend to be higher in countries or regions with a low human development index; cervical cancer is the most frequently occurring cancer type in women in sub-Saharan Africa and Southeast Asia [
1]. Remarkably, China contributed more than a sixth of the global cervical cancer burden, with 106,000 new cases and 48,000 deaths in 2018 [
4]. Consequently, greater efforts are needed to further elucidate the molecular mechanisms underlying tumor initiation and progression, especially in Chinese cervical cancer patients, which could facilitate the discovery of novel biomarkers for early cervical cancer screening and better molecular targets for the treatment of cervical cancer.
In recent years, many genomic sequencing studies have been carried out to uncover gene variations specific to cervical cancer. Kyrgiou et al. undertook a genome-wide association study (GWAS) of 273,377 women, including 4769 patients with cervical intraepithelial neoplasia (CIN) grade 3 or invasive cervical cancer, and showed that six independent genetic susceptibility variants, PAX8 (rs10175462), CLPTM1L (rs27069), HLA-DQA1 (rs9272050), MICA (rs6938453), HLA-DQB1 (rs55986091) and HLA-B (rs92666183), were associated with CIN3 and invasive cervical cancer, suggesting disruptions in apoptotic and immune function pathways [
5]. Yang et al. found that targeting β-catenin reverses radioresistance of cervical cancer carrying PIK3CA-E545K, the most common hotspot mutation of PIK3CA in cervical cancer [
6]. Zhang et al. identified key genes (such as TSPO and CCND1) and pathways (such as DNA replication, organelle fission, chromosome segregation and cell cycle phase transition) closely related to cervical cancer by reanalyzing a cervical cancer-associated gene expression dataset that included 10 normal cervix samples and 21 cervical cancer samples [
7]. Burk et al. identified SHKBP1, ERBB3, CASP8, HLA-A and TGFBR2 as significantly mutated genes and found amplifications in BCAR4, CD274 and PDCD1LG2 in 228 primary cervical cancers; several of these genes are potential therapeutic targets [
8].
Although a growing number of cervical cancer-related mutations have been uncovered, the pathogenesis of cervical cancer remains unclear in a considerable proportion of patients, and data on the genetic characteristics of Chinese cervical cancer patients are especially limited. In the current study, we used a multigene next generation sequencing (NGS) panel to analyze 32 cervical cancer samples and paired normal control samples from Chinese cervical cancer patients. The panel contains 571 validated tumor-related genes and simultaneously identifies single nucleotide variants (SNVs), small insertions and deletions (indels), copy number variations (CNVs), splice variants and gene rearrangements. We uncovered frequent and novel genetic alterations and performed enrichment analysis of the related signaling pathways, revealing mutation characteristics distinct from those of Caucasian patients.
2. Materials and Methods
2.1. Cervical Cancer Patients and Tissue Cohort
This study, carried out between March 2019 and March 2020 at the Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China, prospectively enrolled 32 consecutive patients with pathologically proven primary cervical cancer for ultradeep NGS using a 571-gene targeted sequencing panel. All specimens were collected during surgical resection of the primary tumor. Data were acquired from the hospital’s electronic medical records system and by direct interviews of participants. Surgical tumor tissue samples and paired blood samples were acquired at the Department of Gynecological Oncology of the Hospital.
2.2. Genomic DNA Isolation and Targeted NGS
All samples were processed in a next-generation sequencing laboratory (Xinshu Healthcare Technology Company, Shanghai, China). Libraries were prepared according to the manufacturers' instructions. Genomic DNA was extracted using a QIAamp DNA Mini kit (Qiagen GmbH, Dusseldorf, Germany). The quantity and purity of DNA were assessed using a Qubit® 3.0 fluorometer (Invitrogen; Thermo Fisher Scientific, Singapore) and a NanoDrop ND-1000 (Thermo Fisher Scientific, Wilmington, NC, USA). DNA fragmentation was evaluated by Genomic DNA ScreenTape assays (Agilent Technologies, Santa Clara, CA, USA) using the Agilent 2200 TapeStation system to produce a DNA integrity number. Sheared genomic DNA underwent end repair, A-tailing and adapter ligation with a KAPA library preparation kit (Kapa Biosystems, Wilmington, NC, USA). Libraries were captured using Agilent SureSelect human exon probes and amplified. Finally, the constructed sample libraries were sequenced on an Illumina NextSeq500 System (Illumina, San Diego, CA, USA).
2.3. Preprocessing of Sequencing Reads
Raw short sequence reads were trimmed and filtered by fastp. Clean reads were mapped to the human reference genome hg19 using BWA-MEM with default parameters. Following GATK4 best practices, PCR duplicates were first removed from the BAM files, and the remaining reads were subsequently realigned and recalibrated.
2.4. Somatic Variant Identification
Somatic SNVs and indels were identified using MuTect2. Tumor samples were used to call somatic mutations against the paired normal samples. Artifacts were filtered using the GATK FilterMutectCalls tool. Filtered variants were annotated using SnpEff with the ExAC, 1000G, dbSNP, ClinVar and COSMIC databases. The average sequencing depth was 1814X. To filter out likely false positives, only mutations with a sequencing depth larger than 10X, supported by at least four variant reads, with a variant allele frequency (VAF) >0.01 and a global frequency <0.05 in ExAC and 1000G were used for further analysis.
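The filtering criteria above can be written as a simple predicate. The following is a minimal sketch, not the pipeline used in the study: the `VariantCall` record, its field names and the `passes_filter` helper are hypothetical, and the population frequency is assumed to be the higher of the ExAC and 1000G values.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical, simplified record for one annotated variant."""
    gene: str
    depth: int            # total reads covering the site
    alt_reads: int        # reads supporting the variant allele
    population_af: float  # assumed: max allele frequency across ExAC and 1000G

    @property
    def vaf(self) -> float:
        # Variant allele frequency = supporting reads / total depth.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filter(v: VariantCall, min_depth: int = 10, min_alt_reads: int = 4,
                  min_vaf: float = 0.01, max_population_af: float = 0.05) -> bool:
    """Apply the thresholds stated in the text (defaults: somatic criteria)."""
    return (v.depth > min_depth
            and v.alt_reads >= min_alt_reads
            and v.vaf > min_vaf
            and v.population_af < max_population_af)


# Illustrative calls only; these are not variants reported in the study.
kept = VariantCall("PIK3CA", depth=1800, alt_reads=90, population_af=0.0)
too_few_reads = VariantCall("GENE_X", depth=1800, alt_reads=3, population_af=0.0)
low_vaf = VariantCall("GENE_Y", depth=1800, alt_reads=12, population_af=0.0)
common_snp = VariantCall("GENE_Z", depth=500, alt_reads=250, population_af=0.2)
```

The germline criteria in Section 2.5 would correspond to calling the same helper with `min_depth=20`, `min_alt_reads=10` and `min_vaf=0.1`.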
2.5. Germline Variant Identification
Germline SNVs and indels were identified from the BAM files of the blood samples using GATK HaplotypeCaller. Filtered variants were annotated using SnpEff with the ExAC, 1000G, dbSNP, ClinVar and COSMIC databases. To filter out likely false positives, only mutations with a sequencing depth larger than 20X, supported by at least 10 variant reads, with a VAF >0.1 and a global frequency <0.05 in ExAC and 1000G were used for further analysis.
2.6. Copy Number Variations (CNVs)
CNVs were determined using CNVkit. A copy number of 1 indicated copy number loss, 0 indicated homozygous deletion and ≥3 indicated copy number gain. ABSOLUTE was used to estimate tumor purity and ploidy from the CNV and SNV results. CNVs were then corrected for tumor purity and ploidy. In addition, significantly recurrent focal genomic regions with somatic copy number alterations (SCNAs) that were gained or lost in cervical cancer samples were identified using the Genomic Identification of Significant Targets in Cancer (GISTIC 2.0) algorithm [
9]. Default parameters of GISTIC were used and focal events with q-value below 0.25 were considered as significantly recurrent. Significant focal events in individual samples were classified according to the amplitude threshold of GISTIC: GISTIC status = 0, below threshold; GISTIC status = 1, amplified (gain); GISTIC status = 2, highly amplified (amplification); GISTIC status = −1, deleted (loss); GISTIC status = −2, highly deleted (deletion). The rates of CNVs and SCNAs in early and advanced stage cervical cancer were analyzed.
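The copy number categories and GISTIC status codes described above can be sketched as follows. This is illustrative only: the function names are hypothetical, treating a copy number of 2 as "neutral" is an assumption (the text does not name that state), and the purity correction uses a common linear mixture model rather than the exact adjustment applied in this study.

```python
# GISTIC status codes as listed in the text.
GISTIC_LABELS = {
    -2: "deletion",
    -1: "loss",
    0: "below threshold",
    1: "gain",
    2: "amplification",
}


def classify_copy_number(copy_number: int) -> str:
    """Map an integer copy number to the categories given in the text."""
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    if copy_number == 0:
        return "homozygous deletion"
    if copy_number == 1:
        return "loss"
    if copy_number >= 3:
        return "gain"
    return "neutral"  # assumption: copy number 2 is the diploid baseline


def purity_corrected_copy_number(observed: float, purity: float,
                                 normal_copies: float = 2.0) -> float:
    """Assumed mixture model: observed = purity * tumor + (1 - purity) * normal."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return (observed - normal_copies * (1.0 - purity)) / purity


def is_significant_focal_event(q_value: float, threshold: float = 0.25) -> bool:
    """Focal events with a q-value below 0.25 count as significantly recurrent."""
    return q_value < threshold
```

For example, under this mixture model an observed copy number of 1.5 in a sample with 50% tumor purity corresponds to a tumor copy number of 1, which `classify_copy_number` reports as a loss.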
2.7. Gene Ontology (GO) and KEGG Pathway Enrichment Analyses
Gene ontology analysis (GO) is commonly used for annotating large scale genes and gene products [
10,
11]. KEGG is a collection of databases dealing with genomes, diseases, biological pathways, drugs and chemical materials [
12]. It is built from molecular-level information and can be used to predict the pathways in which a given set of genes is enriched. It covers information resources such as diseases and pathways. GO and KEGG analyses were performed with the DAVID tools (DAVID. Available online:
https://david.ncifcrf.gov/, accessed on 10 December 2021) [
13]. Statistical significance was set at
p < 0.01. DAVID is an online bioinformatics tool designed to identify the functions of large numbers of genes or proteins. DAVID was used to visualize the enrichment of the candidate genes in biological process (BP), molecular function (MF) and cellular component (CC) terms and in pathways (
p < 0.05).
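The idea behind term enrichment can be illustrated with a plain hypergeometric over-representation test. This sketch is not DAVID's implementation (DAVID reports its own modified Fisher exact statistic, so its p-values will differ), and the function name, the choice of background and all counts used with it are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(hits: int, list_size: int,
                       term_size: int, background_size: int) -> float:
    """P(X >= hits) when drawing `list_size` genes without replacement from
    a background of `background_size` genes, `term_size` of which carry
    the annotation of interest."""
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy example: 2 of 2 selected genes carry a term annotated on
# 3 of 10 background genes.
toy_p = enrichment_p_value(hits=2, list_size=2, term_size=3, background_size=10)
```

With a targeted panel, one reasonable design choice is to use the panel genes rather than the whole genome as the background, since only those genes could have been selected; whether that was done here is not stated.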
2.8. Protein Interaction Assay
Protein interaction enrichment was analyzed using Metascape (Metascape. Available online:
https://metascape.org/, accessed on 10 December 2021) [
14]. The protein networks constructed were based on physical interactions among all input protein (gene) candidates.
4. Discussion
Cervical cancer is a common female malignancy with an incidence of 126.94/100,000 in China [
19]. Understanding the molecular mechanisms and genetic susceptibility underlying the occurrence and development of cervical cancer is critically important for early diagnosis and clinical therapy. Ojesina et al. performed whole exome sequencing (WES) of 115 paired cervical carcinoma and normal samples, RNAseq of 79 cases, and whole genome sequencing (WGS) of 14 tumor-normal pairs. They revealed that squamous cell carcinomas have higher frequencies of somatic nucleotide substitutions at cytosines preceded by thymines (Tp*C sites) than adenocarcinomas. They observed previously unknown somatic mutations in the MAPK1 gene, inactivating mutations in the HLA-B gene, and mutations in EP300, FBXW7, NFE2L2, TP53 and ERBB2 in squamous cell carcinoma samples, and somatic ELF3 and CBFB mutations in adenocarcinomas. They also reported that gene expression levels at HPV integration sites were statistically significantly higher in tumors with HPV integration than in tumors without viral integration at the same site [
16]. Chung et al. performed WES in 15 Chinese cervical cancer patients. They observed frequently altered genes including FAT1, ARID1A, ERBB2 and PIK3CA. They also found HPV sequences in 13 samples and suggested that the HPV genome integrates into exons and may affect tumorigenesis pathways [
20]. One of the largest cervical cancer sequencing efforts, The Cancer Genome Atlas (TCGA) Project, revealed novel mutations in several genes, including SHKBP1, ERBB3, CASP8, HLA-A and TGFBR2, as well as amplifications in the immune targets PD-L1 and PD-L2, by sequencing 228 primary cervical cancers. The project also confirmed previously reported mutations in PIK3CA, EP300, FBXW7, HLA-B, PTEN, NFE2L2, ARID1A, KRAS and MAPK1. This study illuminates new therapeutic targets in cervical cancer [
8]. Unfortunately, we did not find any genomic alterations that are peculiar to the Chinese population in this study, probably due to the small sample size and the limitation of the clinical samples from a single center. We are actively recruiting more samples for further investigation.
In this study, using a 571 tumor-related gene panel for NGS, we identified 810 significant somatic variations, 2730 germline mutations and 701 CNVs from 32 cervical cancer samples and paired blood samples. PIK3CA and MTOR were the most frequently mutated genes, demonstrating that the PI3K/Akt/mTOR signaling pathway is commonly activated in cervical cancer [
16]. Previous studies revealed PIK3CA mutation is frequent in cervical cancer, and is associated with a poor OS and PFS [
8,
20,
21,
22,
23]. PIK3CA was mutated in 14% of cervical squamous cell carcinoma patients in the study by Ojesina et al. Genomic profiling of advanced cervical cancer in the CLAP trial also showed longer PFS in women with mutated PIK3CA receiving second-line or later camrelizumab plus apatinib [
24]. This clinical benefit remains inconclusive in women receiving chemotherapy [
25]. Scholl et al. showed that patients with altered PI3K and epigenetic pathways had significantly poorer PFS [
26]. Nevertheless, these women only received conventional therapy. These findings indicate that biomarkers may have different predictive functions for cancer patients receiving conventional therapy versus immunotherapy, and a distinct set of predictive biomarkers should be developed for cervical cancer patients receiving immune checkpoint inhibitors.
Furthermore, the pathways enriched in the mutated genes in this study could offer some insight into signaling pathways associated with cervical carcinogenesis that can be therapeutically targeted such as the PI3K/Akt signaling pathway and DNA damage response pathways. DNA repair pathway genes, such as BRCA1, BRCA2 and ATM, which are potential predictive biomarkers [
27,
28,
29], also had a high frequency of CNVs in the present study, suggesting that homologous recombination (HR) defects might serve as a therapeutic target in cervical cancer. Furthermore, we performed overlap analysis of the genes carrying SNVs and the genes carrying CNVs and selected 34 genes for subsequent bioinformatics analysis. Protein interaction analysis of the proteins encoded by the 34 genes showed that PIK3CA occupied the central position in the network. Additionally, the results indicated that both HLA-A and HLA-B play an important role in this network and interact closely with PIK3CA. Moreover, the somatic SNV mutation rate of PIK3CA was higher in HPV 16 (37.5%) than in non-HPV 16 (12.5%) patients. However, the difference was not significant (Fisher exact test,
p = 0.35), probably due to the limited sample size.
GO term enrichment analysis showed that the 34 genes were significantly enriched in cell cycle regulation, cellular response, and metabolic process, suggesting that some of these genes could promote cell proliferation by activating cell cycle-related signaling pathways and could be promising candidate targets for antitumor drugs. KEGG pathway enrichment analysis found that the virus infection-related pathways were significantly enriched with many pathogenic genetic variations, which might contribute to persistent HPV infection.
We also observed chromosome 6q27 loss in advanced stage cervical cancer, occurring in 42% of advanced stage cervical cancer samples versus none in early stage cervical cancer samples. Chromosome 6 is frequently affected in cervical cancer [
30,
31]. Loss of heterozygosity (LOH) in 6q27 has been reported in up to 39% of patients with invasive squamous cell carcinomas of the cervix [
32]. TRIM10, 15, 26, 31 and 40 loss has not been previously described in cervical cancer and their role in carcinogenesis of the cervix remains to be defined. Most TRIM family proteins are E3 ubiquitin ligases and have been reported to be involved in carcinogenesis.
When cross-checking our results against TCGA, we found that the virus infection-related pathways were significantly enriched with many pathogenic genetic variations. This is consistent to some extent with TCGA, in which the driver genes of cervical cancer in the TCGA-CESC dataset were mainly enriched in KEGG pathways including human T-cell leukemia virus 1 infection, human papillomavirus infection, viral carcinogenesis, human cytomegalovirus infection and the PI3K-Akt signaling pathway. GO terms enriched in TCGA were cellular response to abiotic stimulus, negative regulation of cell differentiation, positive regulation of cell death and negative regulation of protein modification process, whereas our GO term enrichment analysis showed significant enrichment in cell cycle regulation, cellular response and metabolic process. These differences are highlighted in the paper.
Comparison with the OncoKB database further highlighted the role of PIK3CA in cervical cancer, especially the E545K and E542K mutations. This study also identified oncogenic somatic mutations in BRCA1, CHD1, KRAS and FBXW7, and likely oncogenic somatic mutations in EP300, HLA-A, KMT2D, PTEN and TP53, among others. In addition, we observed fifteen likely oncogenic germline mutations in thirteen genes recorded in the OncoKB database. Notably, seven patients in the present study carried a likely oncogenic germline mutation in HLA-B, a well-known cervical cancer susceptibility gene [
8,
16]. Moreover, we also used the OncoKB database to identify available therapeutic targets and found that eight genes can be targeted by fulvestrant plus alpelisib, olaparib, talazoparib, rucaparib and others, highlighting the potential clinical significance of therapeutic agents targeting these mutated genes.
In this article, we validated the presence of many specific pathogenic mutations in Chinese cervical cancer patients. The current study has several limitations. We did not uncover new molecular targets for cervical cancer in the Chinese population, possibly because of the small number of samples and the restriction to clinical samples from a single center. However, we performed a comprehensive analysis and presented our findings in detail, and they have been partially confirmed by other studies. We did not investigate the relationship between HPV integration and cervical cancer because of the lack of HPV status in the patients. We will include this information in future studies. We did not investigate the association between PIK3CA mutations and survival outcomes in our cohort. As of December 2020, apart from four patients lost to follow-up, only two patients had lymph node metastasis or died (one each), and the remaining patients were progression-free. In the future, we will expand the study and continue to follow patients to evaluate the significance of these variations for patient outcomes.