1. Introduction
Glioblastoma (GBM) is the most common and aggressive primary malignant tumor of the central nervous system (CNS), with a median overall survival of less than 15 months despite standard therapy [1,2]. Conventional treatments, including surgery, radiotherapy and chemotherapy, rarely achieve satisfactory results because tumor cells are highly invasive and infiltrative and resist chemotherapy. Several established prognostic factors have been identified, such as age, histological grade, Karnofsky Performance Status (KPS), extent of resection (EOR) and gene mutation status. In recent years, genetic profiles have received increasing attention, and it is widely accepted that isocitrate dehydrogenase (IDH) mutation, 1p/19q codeletion and O6-methylguanine-DNA methyltransferase (MGMT) gene promoter methylation affect treatment response and survival [3,4].
With the development of targeted sequencing and proteomic profiling technology, neuro-oncology researchers have established some new tumor types in clinical practice, and a series of novel molecular markers related to tumor development, treatment and prognosis have also been identified. For this reason, the 2021 updated World Health Organization (WHO) classification of CNS tumors focused on advancing the role of molecular diagnosis in the classification of CNS tumors [5]. At the same time, molecular diagnosis still needs to be combined with established approaches to CNS tumor diagnosis, such as histology and immunohistochemistry.
In the fifth edition of the WHO classification, one of the major modifications is the division of glioma into pediatric and adult types, suggesting that there are clear molecular genetic differences in the occurrence and development of adult and pediatric glioma. Traditional bulk tumor analysis and bioinformatics analysis have identified some key genes, transcriptome changes and pathways that drive malignancy in GBM cells, but they are limited in their ability to explore the diversity and similarity between the two types. In contrast, single-cell RNA sequencing (scRNA-seq) provides a new approach that reveals the heterogeneity of cancers at the resolution of individual cells. Heterogeneity of cancer has a profound impact on the prognosis of patients, and a major aspect of tumor heterogeneity is the tumor microenvironment. Many studies have demonstrated the influence of the tumor microenvironment on GBM progression [6,7]. One main application of scRNA-seq is the study of the differentiation and development of tumor cells. Another major application is the identification of key molecules involved as tumor cells acquire malignant potential.
In the present study, we used scRNA-seq analysis to investigate in detail the similarities and differences between the pediatric and adult types. We further explored the impact of cellular heterogeneity on adult GBM prognosis and revealed the malignancy-associated pathways through which cells gradually acquire invasive ability. Finally, we identified and validated, for the first time, two potential genes involved in adult GBM migration.
2. Materials and Methods
2.1. Dataset Downloading and Preprocessing
The raw data used in this study were downloaded from the Broad Institute Single-Cell Portal database (https://singlecell.broadinstitute.org/single_cell/study/SCP393/single-cell-rna-seq-of-adult-and-pediatric-glioblastoma/, accessed on 5 August 2022), the Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/, accessed on 5 August 2022) and the Gene Expression Omnibus (GEO) database (no. GSE131928), which includes 24,131 cells (7930 cells from the Smartseq2 sequencing platform and 16,201 cells from the 10X sequencing platform) and 28 samples with GBM. Neftel et al. [8] divided the cells from the Smartseq2 sequencing platform into malignant and non-malignant cells. As we focused on the analysis of GBM cells, we selected the malignant cells from the Smartseq2 sequencing platform for analysis, giving a total of 6863 cells and 28 samples (the number of samples was consistent with the original data, with none missing). These cells were further divided into adult and child groups according to known labels, with 4916 cells and 20 samples in the adult group and 1947 cells and 8 samples in the child group.
2.2. Data Normalization and Unsupervised Clustering
The R package “Seurat” is widely used for systematic processing of single-cell sequencing data [9]. Here, we followed its normalization steps. First, genes not expressed in any of the cells were removed, leaving a total of 23,686 genes. Then, the adult and child groups were processed separately. To accelerate the downstream analysis and improve its accuracy, feature selection was carried out. The “FindVariableFeatures” function was used to screen the highly variable genes based on the mean.var.plot (MVP) method. A total of 2703 highly variable genes in the adult group and 2989 highly variable genes in the child group were identified.
Next, Principal Component Analysis (PCA) was used for feature extraction of highly variable genes, and the JackStraw and Elbow methods were used to select the best number of principal components, which was 54 for the adult group and 25 for the child group. Using the selected principal components, the “FindClusters” function, based on the original Louvain algorithm, was used to perform unsupervised clustering for each of the two groups. The clustering results were visualized using Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP). The “FindAllMarkers” function was used to identify specific genes for each cell cluster with a significance threshold of adjusted p < 0.05 and |log2FC| > 1. The similarity between cell clusters of the two groups was measured using the Jaccard coefficient. Specifically, the similarity between two cell clusters was equal to the number of specific genes shared by the two clusters (the intersection) divided by the number of specific genes found in either cluster (the union).
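The marker thresholds and the Jaccard coefficient described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R and “Seurat”); the function names `marker_set` and `jaccard`, the gene labels and all numeric values are hypothetical and serve only to show the arithmetic.

```python
def marker_set(rows, p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with adjusted p < p_cutoff and |log2FC| > lfc_cutoff.

    rows: iterable of (gene, adjusted_p, log2fc) tuples (hypothetical layout).
    """
    return {gene for gene, p_adj, log2fc in rows
            if p_adj < p_cutoff and abs(log2fc) > lfc_cutoff}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical marker tables for one adult and one child cluster.
adult_cluster = [("GENE_A", 0.001, 2.3), ("GENE_B", 0.020, -1.4),
                 ("GENE_C", 0.300, 3.0), ("GENE_D", 0.010, 1.2)]
child_cluster = [("GENE_A", 0.004, 1.8), ("GENE_D", 0.030, 1.1),
                 ("GENE_E", 0.001, 2.0), ("GENE_F", 0.010, 0.5)]

adult_markers = marker_set(adult_cluster)   # GENE_A, GENE_B, GENE_D
child_markers = marker_set(child_cluster)   # GENE_A, GENE_D, GENE_E
similarity = jaccard(adult_markers, child_markers)
print(similarity)  # 2 shared genes / 4 genes in the union = 0.5
```

Returning 0.0 for two empty marker sets is a convention chosen for this sketch; the source does not say how such a case was handled.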
2.3. Single-Cell Pseudotime Trajectory Reconstruction
To further investigate the similarities and differences in the GBM development process between adults and children from a dynamic perspective, the R package “Monocle” was used to conduct pseudotime trajectory analysis of the two groups of cells. The default DDRTree method of the “reduceDimension” function was used for dimension reduction, and the “orderCells” function was then used to construct the cell pseudotime trajectory. The pseudotime trajectories were visualized by cluster, by state and by pseudotime. To reveal the chronological order of each cell cluster in the pseudotime trajectory more clearly, the trajectories were also drawn separately for each cell cluster, using the cluster colors. The cell states were automatically assigned by “Monocle” according to the trajectories, mainly on the basis of branches and bifurcation points.
The pseudotime value of each cell was calculated by “Monocle” based on changes in gene expression. Each cell was placed along the trajectory according to its pseudotime value, so as to fit the developmental process of the cell. Coloring the trajectory by pseudotime value therefore reveals the developmental process of the cells visually.
2.4. Survival Analysis
To investigate the influence of different cell clusters on the prognosis of patients, GBM transcriptome data and corresponding survival data were downloaded from the TCGA database. A total of 174 samples were collected, with 59,427 genes, including 169 GBM samples from 160 patients. The CIBERSORT method, which is based on the support vector regression algorithm, was used to infer the abundance of each cell cluster in each sample [10]. The algorithm relies on a known background gene set and scores the fit by gene expression. Since the specific genes reflect the biological characteristics of each cell cluster, we used the specific genes of each cell cluster to construct the background gene set. We divided all the patients into two groups based on survival time. The patients with longer survival time were assigned to the good prognosis group, and the patients with shorter survival time were assigned to the poor prognosis group. Survival curves were generated using the R packages “survival” and “survminer” and compared using the log-rank test, with p < 0.05 considered statistically significant. Box plots were generated using the R package “ggplot2” to show the differences in each cell cluster between the two groups. Statistical significance was calculated using the Wilcoxon rank-sum test through the R package “ggpubr”.
2.5. Prognosis Analysis of Pseudotime Values
To quantify the heterogeneity of different cell clusters, we calculated the pseudotime values for each patient based on the trajectories. The pseudotime value of a cell cluster is equal to the average of the pseudotime values of all cells in the cell cluster; the pseudotime value of a patient is equal to the sum of the cluster pseudotime values, each weighted by the proportion of that cell cluster in the patient. Patients were regrouped based on their pseudotime values, using the “surv_cutpoint” function in the R package “survminer”, to evaluate the influence of pseudotime on prognosis.
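The two definitions above amount to a mean followed by a weighted sum. The following Python sketch shows the calculation under stated assumptions: the cluster labels, pseudotime values and proportions are invented for illustration, and the helper names `cluster_pseudotime` and `patient_pseudotime` are hypothetical rather than taken from the study.

```python
from statistics import mean


def cluster_pseudotime(cell_pseudotimes):
    """Average the pseudotime of all cells in each cluster."""
    return {cluster: mean(values) for cluster, values in cell_pseudotimes.items()}


def patient_pseudotime(cluster_values, proportions):
    """Sum of cluster pseudotime values weighted by cluster proportions."""
    return sum(cluster_values[cluster] * share
               for cluster, share in proportions.items())


# Hypothetical per-cell pseudotime values for two clusters.
cells = {"cluster_9": [8.0, 10.0, 12.0], "cluster_10": [1.0, 3.0]}
cluster_values = cluster_pseudotime(cells)  # cluster_9 -> 10.0, cluster_10 -> 2.0

# Hypothetical inferred cluster proportions for two patients.
patient_a = {"cluster_9": 0.75, "cluster_10": 0.25}
patient_b = {"cluster_9": 0.25, "cluster_10": 0.75}

score_a = patient_pseudotime(cluster_values, patient_a)  # 10*0.75 + 2*0.25 = 8.0
score_b = patient_pseudotime(cluster_values, patient_b)  # 10*0.25 + 2*0.75 = 4.0
print(score_a, score_b)
```

In this toy example the patient dominated by the later cluster receives the higher pseudotime value, which is the quantity that was subsequently dichotomized for the survival comparison.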
2.6. Functional Analysis of Key Cell Cluster
The key cell cluster was identified based on trajectory and prognosis analysis. To further evaluate the biological function of the key cell cluster, the specific genes of the cluster were identified and visualized in a volcano plot. The R package “clusterProfiler” was used to perform Gene Ontology (GO) functional annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment and Gene Set Enrichment Analysis (GSEA) for the specific genes. Biological functions or pathways with adjusted p values < 0.05 and q values < 0.05 were considered significantly enriched. The significantly enriched KEGG pathways were displayed in the form of tree diagrams using the R package “treemap”, and GO items were displayed in the form of bar graphs.
2.7. Identification of Hub Genes
Weighted gene co-expression network analysis (WGCNA) was performed using the R package “WGCNA” to explore the key gene modules that regulate cell clusters. Cells and specific genes of the key cell cluster identified previously were used to build the feature expression profiles for WGCNA. To reduce the influence of noisy data, median absolute deviation (MAD) > 0 was used as the threshold to screen genes. Cytoscape software was used for further processing and visualization. We performed the following quality control on nodes and edges in the network: (1) the genes (nodes) assigned to the “Grey” module in WGCNA were deleted because these genes were not assigned to a valid gene module; (2) since the weight of each edge represents the connection strength between two nodes, only the edges whose weights ranked in the top 1% were retained.
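The two quality-control rules can be sketched in Python as follows. This is an illustration, not the authors' pipeline (which used the R package “WGCNA” and Cytoscape). Several points are assumptions: the function name `filter_network` is hypothetical, grey-module nodes are removed before the top 1% of edges is computed (the source does not state the order), and the number of retained edges is rounded up.

```python
import math
from itertools import combinations


def filter_network(module_of, edges, top_fraction=0.01):
    """Drop grey-module nodes, then keep the strongest fraction of edges.

    module_of: dict mapping gene -> module colour.
    edges: list of (gene_a, gene_b, weight) tuples.
    """
    kept_nodes = {gene for gene, module in module_of.items()
                  if module.lower() != "grey"}
    candidates = [edge for edge in edges
                  if edge[0] in kept_nodes and edge[1] in kept_nodes]
    candidates.sort(key=lambda edge: edge[2], reverse=True)
    # Rounding up is an assumption; the source only says "top 1%".
    n_keep = math.ceil(len(candidates) * top_fraction) if candidates else 0
    return kept_nodes, candidates[:n_keep]


# Hypothetical network: 21 genes, g0 unassigned (grey), all others in one module.
genes = [f"g{i}" for i in range(21)]
module_of = {gene: ("grey" if gene == "g0" else "blue") for gene in genes}
edges = [(genes[i], genes[j], (i * 21 + j) / 1000)
         for i, j in combinations(range(21), 2)]

kept_nodes, kept_edges = filter_network(module_of, edges)
print(len(kept_nodes), len(kept_edges))  # 20 nodes; 2 of the 190 remaining edges
```

Filtering nodes first means the 1% cutoff is taken over edges between valid-module genes only; applying the cutoff before node removal could retain a different set of edges.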
2.8. Validation of Hub Genes
We verified the validity of the hub genes and evaluated their potential association with GBM through gene expression trend analysis, immunohistochemistry (IHC) images and experiments. Analyzing how the expression of a gene varies over the whole process of cell development, in terms of pseudotime value and cell state, is highly informative. We drew scatterplots depicting changes in gene expression over pseudotime using the “plot_genes_in_pseudotime” function and fitted the change curves. IHC can reflect the expression of genes in tissues. IHC staining images were downloaded from the Human Protein Atlas database for verification.
2.9. Cell Culture and Antibodies
The human GBM cell lines, LN229 and A172, were provided by American Type Culture Collection (ATCC, Manassas, VA, USA) and cultured in Dulbecco’s Modified Eagle Medium (DMEM, BasalMedia, Shanghai, China) with 10% fetal bovine serum (FBS, Excell, Suzhou, China). ScienCell provided normal human astrocytes (HAs), which were cultivated in astrocyte growth medium with 5% FBS. All the cells were grown at 37 °C with 5% CO2. Abcam provided antibodies against SHISA9 and α-tubulin. Invitrogen provided the antibody against ASTN2.
2.10. Small Interfering RNAs and Transfection
LN229 and A172 cells were transiently transfected with human ASTN2 small interfering RNA (siRNA); Ribobio (Guangzhou, China) designed and synthesized three siRNAs against ASTN2. The target sequences are listed below:
si3121: 5′-CACCAGTGCTGCTGGAAAT-3′
si3181: 5′-GCACAAAGGAGGCCTTCAA-3′
si2941: 5′-GCAGCAAGAAGGAGCTCAA-3′
All transfections were carried out according to the manufacturer’s instructions using Lipofectamine 3000 (Invitrogen; Thermo Fisher Scientific-CN).
2.11. Western Blotting
Total proteins were extracted from cells cultured in vitro, and RIPA buffer with protease inhibitors was used to lyse the cells. Equal amounts of protein were loaded for electrophoresis. The proteins were then transferred to a polyvinylidene difluoride membrane with 0.45 μm pore size (Millipore, Billerica, MA, USA). The membranes containing the target proteins were cut according to the marker and the molecular weights of the proteins of interest, and were then blocked with 5% milk for 1.5 h at room temperature. The membranes were incubated with the primary antibodies overnight at 4 °C and then with the secondary antibodies for 2 h at room temperature. Finally, the proteins of interest were visualized using enhanced chemiluminescence.
2.12. Wound Healing
Cells were grown in 6-well plates for 24 h before being scratched with a sterile pipette tip. After the cells were rinsed with phosphate-buffered saline (PBS) to remove cellular debris, each wound was photographed by inverted microscopy (Leica, Wetzlar, Germany) at 0 and 48 h. The total wound area was analyzed using ImageJ software to evaluate the migration capacity.
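The source does not state how the wound areas were converted into a migration measure. One common choice, shown here purely as an assumption, is the percentage of wound closure between 0 and 48 h. The function name `wound_closure_percent` and the area values are hypothetical.

```python
def wound_closure_percent(area_0h, area_48h):
    """Percentage of the initial wound area that has closed after 48 h.

    Assumed formula (not stated in the source):
    closure = (area_0h - area_48h) / area_0h * 100
    """
    if area_0h <= 0:
        raise ValueError("initial wound area must be positive")
    return (area_0h - area_48h) / area_0h * 100.0


# Hypothetical wound areas in arbitrary ImageJ units.
control = wound_closure_percent(2000.0, 500.0)     # 75.0
knockdown = wound_closure_percent(2000.0, 1500.0)  # 25.0
print(control, knockdown)
```

Normalizing to the area at 0 h makes wells with slightly different scratch widths comparable.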
2.13. Statistical Analysis
The results were expressed as mean ± SEM. Comparisons were performed using the two-tailed Student’s t-test, or ANOVA for multiple comparisons. p-values less than 0.05 were considered statistically significant. All statistical analyses were performed using GraphPad Prism 9 (GraphPad Software Inc., San Diego, CA, USA).
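For readers less familiar with the summary statistic, the following Python sketch shows how a mean ± SEM value is obtained from replicate measurements. It is a generic illustration, not output from GraphPad Prism; the function name `mean_sem` and the replicate values are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for the SEM")
    return mean(values), stdev(values) / math.sqrt(len(values))


# Three hypothetical replicate measurements.
replicates = [2.0, 4.0, 6.0]
m, sem = mean_sem(replicates)
text = f"{m:.2f} ± {sem:.2f}"
print(text)  # 4.00 ± 1.15
```

The sample standard deviation here uses the n - 1 denominator, so the three replicates give a standard deviation of 2.0 and an SEM of 2.0 / sqrt(3).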
4. Discussion
In the present study, we used scRNA-seq analysis to investigate in detail the similarities and differences between pediatric and adult types. The 2021 updated WHO classification of CNS tumors was the first to classify glioma into adult and pediatric types based on molecular diagnosis. Here, we used scRNA-seq analysis to explore the diversity and similarities in the occurrence and development of adult and pediatric types, and to reveal the heterogeneity of cancers.
The unsupervised clustering of cells identified multiple cell clusters with clear boundaries in adult GBM and pediatric GBM, suggesting that these clusters had distinct biological characteristics and significant heterogeneity. Cell heterogeneity is an important reason for the different prognoses of GBM patients, and intertumoral molecular heterogeneity poses a significant challenge for treatment [13]. Other studies classified GBM based on promoter DNA methylation, the microRNA profile and intragenic breakpoints [14,15,16]. Therefore, targeted treatment based on molecular diagnosis is both necessary and reasonable. Our results also show that the number of cell clusters in the adult group was higher than that in the child group, which indicates that adult GBM has a higher level of complexity, heterogeneity and refractoriness. This was further supported by the pseudotime trajectory analysis. The occurrence and development of tumors is a dynamic process, and cells in different temporal states within the same tissue give rise to temporal heterogeneity [17]. By reconstructing the cell trajectory and pseudotime based on scRNA-seq data, dynamic processes can be calculated and simulated, which is of great significance for understanding the transition between cell states in cancer [18]. The results show that the adult cell trajectory was divided into 11 time states by five branch points, whereas the child cell trajectory was divided into only five time states by two branch points. Moreover, the distribution of cell clusters in the child group was also simpler than that in the adult group.
However, adult glioma and child glioma are not completely different from each other. Most specific genes of cell clusters appeared in both the adult group and the child group. Combining the Jaccard coefficient with the cell development trajectory analysis, we also found that cell clusters located at similar positions in the time sequence had a higher degree of correlation. These results indicate that GBM is evolutionarily conserved between children and adults, and that only a few genes cause the biological differences.
The considerable heterogeneity of tumor tissue samples between different patients is an important factor in treatment failure and affects the prognosis of patients [19,20]. We analyzed the differences in adult GBM cell cluster content between different prognostic groups. The results show that cluster 9 was more abundant in the poor-prognosis group and cluster 10 was more abundant in the good-prognosis group. Since cell cluster 9 was in the middle–late stages of the trajectory and cell cluster 10 was in the early stage, we hypothesized that a higher pseudotime value could be a risk factor for poorer prognosis in adult GBM patients. The survival analysis shows that patients with higher pseudotime values had a worse prognosis. Pseudotime is a measure of the progress of individual cells in processes such as cell differentiation [21]. Therefore, the pseudotime value reflects the state of the cells to some extent, and the effect of pseudotime value on prognosis is actually the effect of cell state on prognosis. Few studies have focused on the impact of the pseudotime value on prognosis, and our study found that the pseudotime value is a potential prognostic indicator. This may provide new ideas for the establishment of prognostic models in the future.
In 2014, Patel et al. analyzed 430 cells from five primary GBMs through scRNA-seq and found that these cells differed in the expression of various features, including cell proliferation, hypoxic stress, immune response and carcinogenic signaling pathways [22]. Therefore, understanding the functions of single tumor cells and recognizing cell subset characteristics are of great significance for treatment strategies. Identifying specific genes by single-cell gene expression profiling allows the elucidation of mechanisms underlying tumor invasion and migration that are critical for preventing metastasis [23]. Cell cluster 9 had an adverse effect on the prognosis of patients with GBM. The GO, KEGG and GSEA functional enrichment analyses were performed for the specific genes of cell cluster 9 to reveal the underlying causes that affect the prognosis.
The GO enrichment analysis of these genes shows that they were mainly related to cell adhesion, which is closely associated with the development of cancer, especially the further metastasis of cancer cells. The KEGG analysis also reveals that cell adhesion was one of the most significantly enriched pathways, which was consistent with the GO enrichment analysis results. The results of the functional enrichment analysis offer an explanation for why cell cluster 9 is a poor prognostic factor for GBM patients. Specific cells in cell cluster 9 are involved in the regulation of cell adhesion, which promotes cell invasion and migration, thus leading to poor prognosis of patients. In addition, the migration and metastasis of tumor cells generally occur in the late stage of the disease, which may also explain why cell cluster 9 is mainly located in the middle–late stage of the pseudotime trajectory. The GSEA enrichment analysis reveals that the most significantly down-regulated pathways were the antigen-processing and p53 pathways. p53 is a tumor suppressor gene, and its down-regulation predicts the further deterioration and development of cancer [24,25]; the weakening of antigen-processing ability also leads to the further invasion of cancer cells [26,27]. In conclusion, the malignant characteristics of cell cluster 9 were further confirmed at the level of biological function.
The in-depth analysis of tumor cells at single-cell resolution is conducive to the identification of potential therapeutic targets, contributing to the development of new drugs and the improvement of survival. In 2017, Darmanis et al. performed scRNA-seq on 3589 cells taken from four GBM patients. A group of genes involved in inhibition of apoptosis, regulation of adhesion and CNS development were identified, which provided new ideas for treatment [28]. In 2019, Wang et al. first identified RAD51AP1 as an oncogene in GBM using scRNA-seq [29]. This study revealed a new possibility for treatment, which could enhance the therapeutic effect and prolong survival.
The WGCNA method was applied to the scRNA-seq results to investigate the key genes in cell cluster 9, and SHISA9 and ASTN2 were identified as the hub genes. These two genes may play an important role in the malignant characteristics of cell cluster 9. A search in the GeneCards database [30] revealed that SHISA9 may be involved in the regulation of AMPA receptor activity and short-term neuronal synaptic plasticity, and it has been identified as a risk gene for schizophrenia [31]. ASTN2, which encodes astrotactin 2, has been reported to be dysregulated in various neurodevelopmental disorders [32].
Although some studies have demonstrated that the ASTN2 gene plays an important role in glial-guided neuronal migration, there are no studies about its impact on GBM cell migration. ASTN2 is not specific to GBM and is also moderately expressed in normal glial cells. However, both bioinformatics analysis and our experiments show that ASTN2 expression in GBM was significantly higher than that in normal glial cells. These results suggest that ASTN2 may play a role in the development of GBM. Considering the functional enrichment analysis together with its role in glial-guided neuronal migration, it is reasonable to speculate that ASTN2 may have an effect on GBM migration. We found for the first time that the down-regulation of ASTN2 could inhibit the migration of the GBM cell lines LN229 and A172, which provides a new direction for future studies on the inhibition of glioma migration and metastasis. Although the underlying mechanism was not studied further, we found a correlation between ASTN2 and SHISA9. The down-regulation of ASTN2 reduced the expression of SHISA9, which may also provide a starting point for studying the mechanism by which migration is inhibited.
In summary, our findings demonstrate that the pseudotime value may be a predictive factor for GBM prognosis. It is particularly noteworthy that the overexpression of ASTN2 in GBM cell lines is associated with the expression of SHISA9 in GBM patients. Overall, these findings indicate that ASTN2 could be a promising target for GBM migration inhibition.