*2.3. Histology*

Images of the hematoxylin-eosin sections were taken (20X magnification) using an Olympus BX-51 microscope using an Olympus DP71 digital camera, and DP Controller software. Tumor purity was estimated based on morphologic review of the entire hematoxylin-eosin stained section estimating the number of tumor cells, stromal cells, lymphocytes, and extravasated red blood cells. Two pathologists reviewed these slides independently providing an estimated percentage of total tumors cells per slide.

#### *2.4. Sequencing and Bioinformatics Analysis*

Whole exome sequencing (WES), RNA sequencing (RNA-Seq), and copy number analysis (CNVkit) [26] were performed on each sample and compared to a blood sample as a germline DNA control. Both Illumina Whole Genome Sequencing (eWGS) of 3 tumor samples and 1 PBMC normal sample, and Illumina RNA Sequencing of the 3 tumor samples were generated from the sampled areas.

#### 2.4.1. Library Construction and Sequencing

Each tumor had 2 enriched libraries constructed (n = 6), and the PBMCs had a single enriched library constructed (n = 1). Exome libraries were captured with an IDT exome reagent, then pooled with a WGS library for sequencing on an Illumina HiSeq4000 with at least 1000x coverage. RNA was prepared with a TrueSeq stranded total RNA library kit, then sequenced on an Illumina HISeq4000 with 72M reads per sample.

#### 2.4.2. IDT Exome Sequencing Variant Detection

Genomic data were aligned against reference sequence hg38 via BWA-MEM [27] with Base Quality Score Recalibration (BQSR). Structural variants (SVs) and large indels were detected using manta [28]. SNVs and small indels were detected using VarScan2 [29], Strelka2 [30], MuTect2 [31], and Pindel [32] via the somatic pipelines available at https://github.com/genome/analysis-workflows, which includes best-practice variant filtering and annotation with VEP (Variant Effect Predictor, version 95) [33]. Manual review was used to remove additional sequencing artifacts. Germline variants and somatic variants reported on variant detecting pipeline were compared to see any intersection of variants. Any intersecting variants were removed from the somatic variant gene list, thus filtering out the germline variants. Common variants with 1000 genome MAF (minor allele frequency) > 0.05 were filtered out. Waterfall somatic variant plots were created with GenVisR [34] by including somatic variants that occurred in each area. Variants reported on the waterfall plot are most likely to be pathogenic, which is reported via VEP. These variants were not reported as a somatic variant in COSMIC (Catalogue Of Somatic Mutations In Cancer) [35] and ClinVar [36] archive, thus these variants are best classified as variants with unknown significance. In order to predict clinical significance and predictions of the functional effects of these variants, each variant was reviewed on SIFT [37] and Polyphen [38]. IMPACT rating was determined by VEP for each non-coding variant.

#### 2.4.3. Copy Number Analysis

CNVkit was used to infer and visualize copy number from high-throughput DNA sequencing data. Coverage for each bait position in the exome reagen<sup>t</sup> was calculated, then segments of constant copy number were identified using circular binary segmentation. Data were plotted to provide visualization of CNVs.

#### 2.4.4. Inference of Clonal Phylogeny

SciClone [39] and ClonEvol [40] were utilized to attempt to perform a phylogeny inference. However, the analysis was complicated by the abundance of copy number-altered regions in these tumors, and these standard algorithms were unable to automatically perform that inference. Manual review of the shared and private single nucleotide variants and large copy number altered areas, though, revealed only one possible phylogeny for this tumor.

#### 2.4.5. RNA Sequence Preprocessing

RNA-Sequence (RNA-seq) was trimmed from 3--end with a minimum quality Phred score of 20 and aligned against hg38—Ensembl Transcripts release 99 via BWA-MEM. Pre/pos<sup>t</sup> quality control and full expectation-maximization (EM) quantification were run via Partek® Flow® [41]. Gene counts and transcript counts were normalized by CPM (counts per million) by using edgeR [42] package. Heatmap visualizations were created using gplots [43] R package (Warnes, G.R. Seattle, WA, USA).

#### 2.4.6. Gene Differential Expression Analysis

The gene-specific analysis (GSA) method was used to test for differential expression of genes or transcript between sample regions in Partek® Flow® [44]. Differential expressed genes were defined as the following statistic parameters: *p*-value <= 0.05; FDR step up <= 0.05; Fold Change < −2 or >2. From differentially expressed genes, a GO enrichment test was used to functionally profile this set of genes, to determine which GO terms appear more frequently than would be expected by chance when examining the set of terms annotated to the input genes, each associated with a *p*-value.

## 2.4.7. Pathway Analysis

A list of genes in copy number aberrant (CNA) regions was extracted. CNA regions were defined as copy number regions greater than 3 or copy number regions less than 1. For each area, we intersected the list of genes that are located in the CNA regions with the differentially expressed gene list reported in the RNA differential expression analysis (*p*-value <= 0.05). PantherDB [45] was utilized to discover GO terms and pathways that may be affected by these genes.
