**4. Discussion**

NF1 is the most common of all neurofibromatosis syndromes and is caused by the loss of function of *NF1* gene (a known tumor suppressor) due to mutation or deletion. However, NF1 patients show a grea<sup>t</sup> deal of phenotypic heterogeneity [1]. Identification of candidate cellular signaling pathways that di fferentiate between various tumor types is key for understanding the biology underlying such phenotypic diversity, as well as predicting progression of tumor types towards malignancy. In this study, we integrated various *in silico* resources and analytical techniques to identify candidate genes or pathways unique to di fferent tumor types in an attempt to generate testable hypotheses. By capturing complex gene expression patterns using latent variables (LVs), we identified combinations of LVs that were important to classify tumor types. An ensemble of random forests was then used to select 98 latent variables that were important and su fficient for identifying and classifying the various tumor types with reasonably high accuracy (Figure 2). The selected LVs were then subjected to downstream analyses using di fferent data modalities to gain insight into the composition and relevance of the LVs in the context of NF1 (Figures 3–6). The selected latent variables that correlated with known pathways using tumor deconvolution methods confirmed the presence of previously described tumor microenvironment components. Investigation into previously uncharacterized LVs in the context of NF1 suggested the presence of (a) candidate genes for targeted experiments, (b) previously known as well as unknown tumor microenvironment components, and (c) candidate tissue-specific protein regulatory networks for future drug screening experiments.

To interpret the results of these analyses, we first evaluated the gene loadings of the latent variables that were associated with one or more tumor types. We found that two of the top LVs with lower expression in cNF but higher expression in other tumor types, LV 384 and LV 624 (Figures 3C and 6), had ties to known Schwann cells and NF1 tumor biology, as well as presence of immune cells in the tumor microenvironment. Specifically, the *EGR2* gene, one of the major components of LV 384, has been implicated in diseases associated with the myelin sheath such as Charcot–Marie–Tooth Disease (another disease of the Schwann cell) and is thought to play a role in pathways associated with myelination [53,54]. Indeed, clinical case studies of patients with concurrent NF1-induced tumors and Charcot–Marie–Tooth disease have been reported in the literature, suggesting a possible overlap in underlying biology [55–57]. Although the role of *EGR2* in Schwann cell di fferentiation is still under active investigation [58], *EGR2*-driven pathways have been found to be significantly downregulated in Lats1/2-deficient Schwann cell-based MPNST models [59]. Similarly, the SOCS6 gene, the top component of LV 624, is known to be directly involved in immune signaling via repression of cytokines [60] and has also been found to be a selective tumor suppressor [61]. Its upregulation in the more malignant NF tumors (Figure 3C) suggests that this gene plays a distinct role in NF-related tumors. Additionally, the *RUNX2* gene (a major component of LV 624), has been shown to drive

neurofibromagenesis by repression of the *PMP22* gene encoded myelin protein (a Schwann cell component) [62]. The enrichment of such genes in the selected LVs that di fferentiate nerve sheath tumors from cNFs showcase the importance of Schwann cell biology in these tumors and serves as a proof of principle for our analyses. Furthermore, EGR2 expression in immune cells has been shown to be important for activation of M1/M2 macrophages [63] as well as regulation of CD4+ T cells [64]. As discussed later in greater detail, our downstream analysis, using all the selected LVs and tumor deconvolution techniques, identified the enrichment of M2 macrophage and CD4+ T cell markers as tumor microenvironment components in our samples (Figure 5A).

Alternatively, we also evaluated LVs that are uniquely associated with specific tumor types. For example, LV 24 was found to be an important feature for classifying MPNST tumor samples but not the other benign tumor samples (Figure 3Biii). LV 24 was found to be significantly associated with the ΔNp63 pathway (FDR < 0.05), a pathway with implications in determining malignancy and poor survival in various subtypes of pancreatic and squamous cell carcinomas [65]. ΔNp63 signaling in the central nervous system is believed to play a role in neural precursor cell survival and neural stem cell dynamics [66,67], but its role in formation or maintenance of malignant NF1 tumors such as MPNSTs is relatively unexplored. Overall, the evidence surrounding EGR2, SOCS6, RUNX2, and ΔNp63 pathway sugges<sup>t</sup> that LV 384, LV 624, and LV 24 may be promising candidates for experimental follow-up.

Beyond looking at individual genes that comprise the latent variables, we employed orthogonal algorithms that measured tumor immune activity and regulatory protein activity to identify specific signatures represented by the latent variables. Tumor deconvolution methods (CIBERSORT, MCP-counter) confirmed previously described observations such as the presence of mast cells in NF1 model systems and patient tumors [68–74]. They also suggested the presence of T cells in human tumor samples. This has previously been described in mouse models of NF1 tumors [75]. In humans, systemic T cell burden has been found to correlate with NF1 nerve sheath tumor progression; T cell presence has also been observed in NF1 gliomas [10,76].

Through the metaVIPER protein regulatory activity predictions, we were able to identify putative therapeutic candidates for further evaluation. Specifically, we found two clusters of latent variables expressed in NF, pNF, and MPNST that had regulatory proteins enriched in targets of dovitinib and drug-like small molecules that inhibit receptor tyrosine kinases, as well as cyclin-dependent kinase (CDK) inhibitors such as abemaciclib or dinaciclib. These findings are consistent with previous studies that identify CDK inhibitors as useful in models of *NF1*-deficient or RAS-dysregulated tumors [77–80]. Because latent variables that are correlated with histone deacetylases (HDACs) were found to be expressed most highly in cNFs (cluster 5, Figure 6C, Figure S4C) and MPNSTs (cluster 1, Figures 4A and 6C), compounds such as CUDC-101 and analogs could be potential candidates for treating cNF and MPNST. Indeed, HDAC inhibitors were previously found to be e fficacious in in-vitro and in-vivo models of MPNSTs [77,81]. Thus, our results further sugges<sup>t</sup> that this therapeutic approach might be feasible in both MPNSTs and cNFs. Drugs found in other clusters such as dovitinib and lestaurinib in cluster 2, which is expressed most in MPNSTs, pNFs, and NFs (Figure 6C,D), may also merit further study.

Although the present *in-silico* study brings forth various candidate genes and pathways for further follow up, it also presents a few limitations that could be mitigated with future studies, particularly for tumor types with limited samples such as MPNSTs and neurofibromas. Most notably, analyzing genetic variants across tumor types failed to identify relevant variant signatures (Figure 4). This highlights the challenges in variant analyses using samples with limited class representation and motivates our focus on transcriptional signatures. Additional genomic and transcriptomic data from the same biobanks or additional tumor datasets will improve our ability to identify recurrent genetic markers of tumor type. Furthermore, additional data from tumor-adjacent normal tissue would greatly add value to additional analyses on the basis of di fferential gene expression. Such di fferential expression analyses were not possible within the scope of this work since these data are not currently available. Additionally, future studies comparing genomic signatures identified here to other publicly available tumor expression and variant data (e.g., the Cancer Genome Atlas or the International Cancer Genome Consortium [19]) may identify genomic similarities between peripheral nerve sheath tumors and other tumors.
