*Article* **Identification of Somatic Structural Variants in Solid Tumors by Optical Genome Mapping**

**David Y. Goldrich 1,†, Brandon LaBarge 1,†, Scott Chartrand <sup>2</sup> , Lijun Zhang <sup>2</sup> , Henry B. Sadowski <sup>3</sup> , Yang Zhang <sup>3</sup> , Khoa Pham <sup>3</sup> , Hannah Way <sup>3</sup> , Chi-Yu Jill Lai <sup>3</sup> , Andy Wing Chun Pang <sup>3</sup> , Benjamin Clifford <sup>3</sup> , Alex R. Hastie <sup>3</sup> , Mark Oldakowski <sup>3</sup> , David Goldenberg <sup>1</sup> and James R. Broach 2,\***


**Abstract:** Genomic structural variants comprise a significant fraction of somatic mutations driving cancer onset and progression. However, such variants are not readily revealed by standard next-generation sequencing. Optical genome mapping (OGM) surpasses short-read sequencing in detecting large (>500 bp) and complex structural variants (SVs) but requires isolation of ultra-highmolecular-weight DNA from the tissue of interest. We have successfully applied a protocol involving a paramagnetic nanobind disc to a wide range of solid tumors. Using as little as 6.5 mg of input tumor tissue, we show successful extraction of high-molecular-weight genomic DNA that provides a high genomic map rate and effective coverage by optical mapping. We demonstrate the system's utility in identifying somatic SVs affecting functional and cancer-related genes for each sample. Duplicate/triplicate analysis of select samples shows intra-sample reliability but also intra-sample heterogeneity. We also demonstrate that simply filtering SVs based on a GRCh38 human control database provides high positive and negative predictive values for true somatic variants. Our results indicate that the solid tissue DNA extraction protocol, OGM and SV analysis can be applied to a wide variety of solid tumors to capture SVs across the entire genome with functional importance in cancer prognosis and treatment.

**Keywords:** optical genome mapping; solid tumors; cancer genomics

#### **1. Introduction**

One of the hallmarks of cancer is genomic instability, which often affects genes controlling cell division and genome integrity. The resulting alterations include single-nucleotide variant (SNV) point mutations as well as structural variants (SVs), in which larger DNA segments undergo chromosomal perturbations such as deletions, insertions, duplications, inversions, and translocations. For instance, recurrent translocations, such as the Philadelphia chromosome, can activate oncogenes but at the same time reveal avenues for implementing or developing effective targeted drug therapies [1–4]. Likewise, SV identification plays an increasingly important role in cancer diagnosis and prognosis [5,6], and SVs have been shown to play a crucial role in intra-tumoral genetic heterogeneity [7]. Therefore, SV identification and analysis are important to understanding oncogenesis and tumorbehavior.

**Citation:** Goldrich, D.Y.; LaBarge, B.; Chartrand, S.; Zhang, L.; Sadowski, H.B.; Zhang, Y.; Pham, K.; Way, H.; Lai, C.-Y.J.; Pang, A.W.C.; et al. Identification of Somatic Structural Variants in Solid Tumors by Optical Genome Mapping. *J. Pers. Med.* **2021**, *11*, 142. https://doi.org/10.3390/ jpm11020142

Academic Editor: Silvia Fantozzi

Received: 22 January 2021 Accepted: 15 February 2021 Published: 18 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Short-read sequencing can readily detect many SNVs, but is less successful in detecting SVs, by either alignment-based or assembly-based methods [8]. Since alignment-based approaches rely on mapping reads to unique positions, repetitive and low-complexity genomic regions can lead to misalignment and false-positive SV calls. Additionally, homologous alleles may be incorrectly combined, leading to haploid assembly only representing a single allele or chimeric assemblies mixing alleles. Whole-genome and cytogenetic approaches such as whole-genome sequencing (WGS), karyotyping, fluorescent in situ hybridization (FISH) and CNV microarrays also contain significant limitations. Karyotyping provides a comprehensive view of the entire genome but carries limited resolution of ~5 Mb and in most cases requires culturing cells before preparing chromosomes. FISH has a higher resolution but requires prior knowledge as to which loci to test and has limited throughput. CNV microarrays offer a resolution down to multiple Kb but are insensitive to balanced chromosomal aberrations such as translocations and inversions, are unable to detect low-frequency allelic changes, and cannot distinguish tandem duplications from insertions in trans. Finally, WGS has difficulty with de novo genome assembly and resolving duplications and repeated sequences [8–10]. Therefore, alternative methods are required to preserve long-range genomic structural information.

Optical genome mapping (OGM) has emerged as a viable option for analyzing large genomes for SVs. OGM preserves long-range information by imaging entire intact molecules of DNA in their native state and, as a result, has contributed to constructing reference genome assemblies, including those for maize, mouse, goat, and humans [11–28]. OGM can detect large (>500 bp) and complex SVs, such as chromothrypsis, that are difficult to detect using traditional short-read sequencing alone. OGM preparation and analysis workflow has been successfully applied to liquid-phase tumor and cell culture SV analyses. For instance, investigators have analyzed primary leukemic cells with OGM to identify previously unrecognized SVs implicated in oncogenesis and patients' survival and have combined OGM with chromosome conformation capture to demonstrate enhancer highjacking resulting from SVs [5,29,30]. Similarly, investigators used OGM to visualize complex gene fusions and novel somatic SVs in liposarcoma, melanoma and other well-studied cancer cell lines [31,32].

Despite its success in visualizing SVs in liquid tumors and cell lines, OGM has not yet seen widespread application in solid tissue tumors, due primarily to the difficulty of obtaining high-quality, high-molecular-weight DNA from solid tumor samples. Nonetheless, previous work has shown the feasibility of high-quality high-molecular-weight DNA isolation and analysis using earlier workflow iterations [33], and recent feasibility studies have shown the importance of OGM application to solid tumor analysis [7,34,35]. Peng et al. demonstrated large SVs not detected by WGS implicated in metastatic lung squamous cell carcinoma [7], and Jaratlerdiri et al. and Crumbaker et al. similarly found SVs impacting oncogenic and tumor-suppressing genes not identified by NGS or WGS alone in prostate cancer [34,35]. However, these previous methods for extracting gDNA from solid tissue were either prohibitively expensive or yielded low quantities of DNA [36]. We demonstrate here the successful implementation of a workflow to generate ultra-high-molecular-weight gDNA and subsequent SV analysis for 20 solid tumor samples comprising a wide variety of solid tissue organ systems.

#### **2. Materials and Methods**

#### *2.1. Tumor Samples*

Solid tissue was collected following surgical resection for 10 tumors: four squamous cell carcinomas of the tongue, three anaplastic carcinomas of the thyroid, one liver hepatocellular carcinoma, one lung pleomorphic carcinoma, and one bladder tumor. Patients consented under protocols approved by the Penn State Health Institution Review Board and tissue was flash frozen and stored at −80 ◦C in the Penn State Institute for Personalized Medicine (IPM). Ten additional fresh frozen solid tumor samples were acquired from BioIVT for the following tumor types: lung adenosquamous carcinoma, liver hepato-

cellular carcinoma, bladder papillary urothelial carcinoma, kidney renal cell carcinoma, breast ductal carcinoma in situ, prostate invasive adenocarcinoma, brain anaplastic astrocytoma, ovarian serous carcinoma, colon adenocarcinoma, and papillary thyroid carcinoma. For some of the samples, two or three separate sections of the tumor were excised and processed independently to provide duplicate or triplicate biological replicates.

#### *2.2. Bionano Optical Genome Mapping*

*Ultra-High-Molecular-Weight gDNA Isolation from Solid Tissue.* The following protocol is diagrammed in Figure 1 and described in greater detail in a support document from Bionano Genomics (https://bionanogenomics.com/support-page/sp-tissue-and-tumordna-isolation-kit/). Briefly, tissue sections with a target mass of 10 mg were sliced from a frozen parent piece on a sterilized aluminum block over dry ice. The tissues were minced briefly and placed into a 15 mL conical tube on ice containing homogenization buffer (HB) for subsequent blending with a Tissueruptor II (Qiagen). Following tissue disruption, samples were washed in additional HB, poured through a 40 µm filter, and centrifuged to pellets, from which the supernatants were decanted.

**Figure 1.** Workflow for isolation of high-molecular-weight DNA from solid tumors.

Pellets were resuspended in Wash Buffer A (Bionano, San Diego, CA, USA) and transferred to microcentrifuge tubes for additional washing. Supernatants were then decanted, and pellets resuspended in residual volume. Proteinase K (Bionano Genomics, San Diego, CA, USA) was added to samples, followed by Lysis and Binding Buffer (LBB, Bionano Genomics, San Diego, CA, USA) and mixed to produce a lysate containing highmolecular-weight DNA. Phenylmethylsulfonyl Fluoride Solution (PMSF, Millipore Sigma) was added to inactivate Proteinase K, followed by Salting Buffer (SB, Bionano Genomics, San Diego, CA, USA).

A single paramagnetic Nanobind Disc (Bionano Genomics, San Diego, CA, USA) was added to the lysate with 100% isopropanol, to facilitate binding and washing of gDNA strands. With gDNA captured on the disc, the supernatants were carefully removed and discs were washed with rounds of ethanol-based wash buffer. Discs were then transferred to clean tubes, where gDNA was eluted in buffer and homogenized at room temperature.

*Ultra-High-Molecular-Weight gDNA Isolation from Blood.* Previously frozen EDTAstabilized blood aliquots were thawed, inverted to mix, and measured for white blood cell counts (HemoCue, Brea, CA USA, WBC). Blood volumes corresponding to 1.5 × 10<sup>6</sup> cells were transferred to a microcentrifuge tubes, then spun to obtain cell pellets. After removing supernatants, pellets were resuspended in 40 µL Stabilizing Buffer and 50 µL Proteinase K (Bionano Genomics, San Diego, CA, USA). Lysis and Binding Buffer (LBB, Bionano Genomics, San Diego, CA, USA) was then added and mixed to produce a lysate, after which isolation of DNA was performed essentially as described above for tumor tissue.

*Direct Label and Staining (DLS).* For both tumor- and blood-derived samples, gDNA was labeled in Direct Label and Stain reactions, in which fluorescent labels are enzymatically conjugated to a six-base pair recognition sequence followed by DNA counterstaining. Briefly, 750 ng gDNA was diluted and mixed with a labeling master mix containing DLE-1 Enzyme and DL-Green (Bionano Genomics, San Diego, CA, USA). Reactions were shielded from light and incubated at 37 ◦C for 2 h. A Proteinase K solution then inactivated the enzyme, and successive membrane adsorption steps were used for cleanup. A portion of each sample was then carried forward into a staining master mix addition, slowly homogenized, and incubated overnight at room temperature.

The DNA concentration of each labeled sample was confirmed within 4–12 ng/µL by High-Sensitivity dsDNA Qubit Assay and then loaded onto a Bionano Saphyr® Chip (Bionano Genomics, San Diego, CA, USA, Part#20366) and run on the Bionano Saphyr® instrument, targeting approximately 300× human genome coverage.

#### *2.3. Bionano Access and Solve Pipeline*

Genome analysis was performed using Rare Variant Analysis in Bionano Access 1.6 and Bionano Solve 3.6, which captures somatic SVs occurring at low allelic fractions. Briefly, molecules of a given sample dataset were first aligned against the public Genome Reference Consortium GRCh38 human assembly. SVs were identified based on discrepant alignment between sample molecules and GRCh38, with no assumptions about ploidy. Consensus genome maps (\*.cmaps) were then assembled from clustered sets of at least three molecules that identify the same variant. Finally, the genome maps were realigned to GRCh38, with SV data confirmed by consensus forming final SV calls. SVs were then annotated with known canonical gene set present in GRCh38, as well as estimated population frequency for each structural variant detected by comparing to a custom control database (*n* = 297) from Bionano Genomics.

#### *2.4. Data Comparison*

Whole-genome imaging data were compared to the human reference genome GRCh38 (hg38) to retain only those SVs not present in the reference genome. SVs were further filtered to eliminate any variant observed in any of the Bionano control samples or, if available, patient-matched blood. Bionano Access-created csv files containing filtered SVs were analyzed to compare SV content across samples. For tissue samples with associated

blood samples, control database filtration efficacy was compared to blood-filtering efficacy at identification of somatic mutations. For duplicate/triplicate samples, filtered SVs were compared to determine intra-sample reliability. For identification of cancer-related genes, the set of genes affected by SVs in each of the samples was compared to the list of genes causally implicated in cancer available in the Cosmic Cancer Gene Census database (v92) [37] (https://cancer.sanger.ac.uk/census).

#### **3. Results**

*Patient Clinical Characteristics.* Clinical data for the patients from whom tumor samples were acquired are shown in Table 1. A total of 60% (12/20) patients were male, with a mean age of 73.5 years at sample acquisition. A total of 45% (9/20) patients identified as Caucasian, 40% (8/20) as Asian, and 5% (1/20) as Hispanic, with 10% (2/20) not identifying. The majority of IPM-sourced tumor samples were obtained from Caucasian patients (7/10), while the majority of the BioIVT-sourced tumor samples were obtained from patients of Asian ethnicity (8/10). In terms of overall risk factors, 55% (11/20) of patients were self-described current or former tobacco users and 45% (9/20) endorsed some history of alcohol use.

**Table 1.** Patient demographics and tumor characteristics.



**Table 1.** *Cont.*

\* SCC: squamous cell carcinoma; AP: anaplastic.† ~Age (≥Age-3 and ≤Age+3) ‡ . Pathologic Staging: Tumor, Node Metastasis (TNM) staging is the internationally accepted system set forth by the American Joint Committee on Cancer (AJCC) used to determine cancer disease stage and guide prognosis and treatment (https://www.cancerstaging.org) [38].

> The tumor samples consisted of a variety of stages (Table 1). A total of 75% (3/4) of tongue cancer samples and 100% (3/3) anaplastic thyroid cancers were stage IV cancers, while 100% (2/2) lung and (2/2) bladder cancers were stage II. Limited tumor data were available for the commercially available BioIVT-sourced tumor samples.

> *DNA Quality Metrics*: All 20 solid tumors yielded high-molecular-weight gDNA (Table 2). The average concentration across all samples following gDNA isolation was 120 ng/µL by Broad Range dsDNA Qubit Assay. All eluted gDNA were well above the minimal concentration required for DLS labeling (35 ng/µL) and the average final DNA yields for each tumor ranged from 1.2 to 16.4 µg/10 mg input tissue. Analysis on a Saphyr instrument following DLS labeling revealed that samples achieved an average label density of 14.4/100 Kbp, average filtered N50 (>20 Kbp) DNA size of 242 Kbp, average filtered N50 (>150 Kbp) DNA size of 315 Kbp, map rate of 82.62%, effective reference coverage of 320× and average effective DNA throughput (≥150 Kbp) of 50 Gbp/scan. Rare Variant Pipeline Analysis of the samples yielded an average of 82.4% of molecules aligning to the reference genome. These values are all well above the acceptable range for obtaining high-quality data and none of the samples failed any of these quality control metrics.

> *Identification of somatic structural variants*. Rare Variant analysis of the samples revealed a large numbers of variants in each sample, only a fraction of which were likely somatic. The unfiltered analysis yielded an average of 1633 total SVs per sample (range 1241–2000), which include both somatic and germline polymorphic variants (Figure 2, upper panel). These consisted predominantly of insertions and deletions, with an average of 712 insertions and 604 deletions, a fewer number of inversion (an average of 153) and duplications (an average of 123), and relatively few translocations (an average of 41). Eliminating those SVs found in Bionano's control database of known polymorphic SVs reduced the number of putative somatic structural mutations by 91% to an average of 124 total SVs per sample (Figure 2, lower panel). Most of the variants eliminated were insertions and deletions, of which on average 97% and 94%, respectively, were removed. On the other hand, less than 0.2% of the translocations were flagged as polymorphic, consistent with the fact that almost no translocations persist in the population as polymorphisms.


**Table 2.** Single-molecule quality report metrics.

Average values are presented for samples with multiple replicates.

ymorphic SV found in Bionano's GRCh38 control database. SV counts are averages **Figure 2.** Total and somatic structural variants present in tumor samples. Upper panel: SV counts as determined using the Bionano Rare Variant pipeline, before control database filtration. SV counts are averages for duplicate and triplicate samples. Lower panel: SV counts after filtering total SVs to remove known polymorphic SV found in Bionano's GRCh38 control database. SV counts are averages for duplicate and triplicate samples, which are indicated by (\*).

To determine the efficacy of identifying somatic SVs by filtering against Bionano's database of known polymorphisms, we used as a gold standard the blood samples from four patients from whom we had obtained tongue tumors. That is, we determined the true somatic mutations in each of these four tumors by eliminating those SVs identified in each of the tumors that were also present in the corresponding blood sample. We could then compare those true somatic variants to the list of somatic variants predicted by filtering against the database of polymorphisms. For these four tongue tumor samples, we identified an average of 1474 total SVs per sample. Filtering these SVs using the Rare Variant Analysis pipeline for SVs not found in the Bionano control database yielded an average of 72 total SVs per sample, consisting of 11 insertions (range 9–15), 31 deletions (range 11–47), 3 inversions (range 1–6), 14 duplications (range 2–23), and 14 translocations (6–19) (Figure 3, right upper panel). Filtering against the variants found in the corresponding blood samples returned an average of 58 total SVs per sample, consisting of 10 insertions (range 9–10), 20 deletions (range 7–35), 2 inversions (range 0–4), 13 duplications (range 4–24), and 14 translocations (range 6–19) (Figure 3, left upper panel). Comparing the residual SV sets obtained by filtering against Bionano's control database to the sets of true somatic SVs for each sample demonstrated that the control database filtration exhibited strong statistical accuracy (Figure 3, lower panel). Across the four separate samples, the control database exhibited an average sensitivity of 92% (83–96%) and specificity of 98% (range 97–99%). That is, filtering with the control database retained most of the true somatic mutations while eliminating almost all of the polymorphic SVs. Similarly, the average negative predictive value of the filter was 99.6%, demonstrating that an SV identified as germline was indeed a germline variant, while the positive predictive value of 74% (range 60–81%) indicates that a majority, but not all, the variants identified as somatic are in fact somatic. In other words, the results obtained by filtering SVs against Bionano's control database retained almost all the true somatic mutations. However, several of the SVs identified as somatic were actually germline. Those SVs inaccurately identified as somatic were rare germline variants, predominantly insertions or deletions, essentially private to the patient's genome. As above, we noted that the filtering process did not affect all SV types equally: while most deletions and insertions were flagged as polymorphic and eliminated from the list of somatic mutations, very few duplications and essentially no translocations were identified as polymorphic. This is consistent with observation that few translocations or duplications are stable through meiosis. *Duplicate Sample Analysis*. We compared SV calls from separate isolates of the same sample to assess consistency and reproducibility of the method, albeit without knowing the extent of tumor heterogeneity of the individual samples. Six samples underwent triplicate analysis, and four samples underwent duplicate analysis (Table 3). After identifying SVs using the Rare Variant Analysis pipeline and filtering them against the Bionano control database of known polymorphisms, we recovered an average of 116 somatic SVs shared among the separate isolates of the same tumor. These comprised an average of 23 insertions, 29 deletions, 10 inversions, 11 duplications and 43 translocations (Table 3). As noted above, the number of SVs identified in a tumor varied widely across the different tumors examined, with lung, breast, brain and ovarian tumors showing a high level of somatic SVs while the others containing a relative low number of SVs. Moreover, the percentage of SVs shared among different isolates of the same tumor also varied among the different tumor types. However, the percentage of shared SVs and the total number of SVs were uncorrelated. Assuming that the higher values for shared SVs reflect the reproducibility of the method, then we might postulate that the lower shared values represent both the reproducibility and the tumor heterogeneity. That is, we would suggest that the reproducibility of the method across multiple biological replicates is 85–95%, corresponding to the values obtained from those samples with the least variability. Thus, we would suggest that the residual variability in those samples with lower reproducibility (50–75%) reflects heterogeneity of SVs in the tumors. This would suggest that these brain, liver, lung and prostate tumors had a relatively high level of tumor heterogeneity.

as determined by filtering against SVs in the patient's genome from peripheral blood (left) or against Bionano's control database of known polymorphisms. Lower Panel: V **Figure 3.** Efficacy of the somatic variant identification using a control database of known polymorphisms. Upper Panel: Number and distribution of somatic structural variant in four tongue tumors as determined by filtering against SVs in the patient's genome from peripheral blood (left) or against Bionano's control database of known polymorphisms. Lower Panel: Values for sensitivity (SN), specificity (SP) and positive (PPV) and negative predictive values (NPV) for identification of somatic structural variants obtained by filtering total identified SVs to remove those present in a control database of know human polymorphisms. Data obtained by filtering against the control database were compared to those obtained by filtering total SVs to remove those present in the genomes obtained from peripheral blood from the each of the patients from whom the tumors were removed.

onano's control database to the sets of true somatic SVs for each sample

–

–


**Table 3.** Duplicate Sample Analysis. Shown are the number of somatic structural variants shared among the multiple isolates of the same sample and the percentage of those relative the total number of somatic variants found in all the isolates of the same sample.

\* = duplicate sample, ‡ = triplicate sample. % = % of SV calls shared among duplicate/triplicate samples.

The number and types of somatic variants in a tumor varied substantially across the collection of samples (Figure 4). Several tumor samples, including those from colon, bladder, kidney and all four from thyroid, contained relatively few somatic SVs whereas others, including those from prostate, ovaries, lung and brain, carried a large number of somatic SVs. Since these samples for the most part serve as single representatives of each tumor type, we cannot extrapolate to the tumor types as a whole the contribution of SVs to cancer onset and development for each class of tumor. However, it is noteworthy that the SNV mutational burden in thyroid cancers is among the lowest among all tumor types and that measure of genome instability is mirrored in the low number of somatic SVs in all four of the samples examined [39]. Similarly, the SNV mutational burden in lung cancers is among the highest across all tumor types and both of the lung tumors examined here also carry a high level of somatic SV. Finally, the extent of somatic SVs observed in our collection of tumors does not correlate with either cancer stage nor with obvious lifestyle characteristics (Table 1). For instance, neither smoking nor drinking history has a stronger influence on SV mutation burden than does site of origin of the tumor. However, further data examining the correlation of lifestyle characteristics and tumor stages with SV mutational burden are warranted to assess the impact of these behaviors on SV formation and persistence.

*Identification of Cancer Gene Mutations*. While, as noted above, we cannot generalize regarding the role of structural variants in onset and progression of different tumor types, our results indicate that we can extract from the structural variant list clinically relevant data on individual tumors that might inform prognosis or treatment options. We examined the somatic structural variants in each tumor sample for those that affected genes previously associated with cancer. In particular, we annotated those genes altered by a structural variant, either by disruption, duplication, deletion or fusion, and intersected that list with the set of cancer-related genes in the Cosmic database (v92) [37]. The resultant list by tumor type is provided in Table 4 and subdivided into oncogenes, tumor suppressor genes and gene fusions. We included only those oncogenes that were potentially activated by duplication or gene fusion and only those tumor suppressor genes that were potentially inactivated by deletion, insertion or fusion. As evident, every tumor sample carried at least one such cancer gene mutation and most contained multiple hits. Several of these genes offer the opportunity for targeted therapies, focused either directly on the oncogene, as

would be the case for CDK6 and ERBB2, or at the pathway downstream of the affected gene, as would be the case for BRAF and CDKN2A. Other affected genes, such as MSH2, RAD51B, RAD21 and RAD18, suggest the potential of therapy based on possible ensuing genome instability, such as immunotherapy or PARP inhibitors. Many of these variants would not be readily identified by targeted gene panels generally used for clinical assessment of tumor genomes. Moreover, in many cases, the cancer genes altered by SVs were not previously associated with the cancer type in which we observed it. For instance, we observed a fusion of CDK6 in one of the tongue tumors while it has previously been associated predominantly only with ALL. Similarly, LRP1B is often inactivated in CLL or ovarian cancer, while we find it inactivated by deletion in one of the lung tumors. Thus, the identification of somatic structural variants by OGM could provide useful clinical insights not readily available through standard next-generation sequencing or targeted panels.

**Figure 4.** Global view of structural variants in solid tumor samples. Diagrams of somatic structural variants in all the solid tumor genomes, filtered to remove known polymorphisms, showing translocations and inversions in the center, copy number on the inner ring and insertions (green), deletions (orange) inversions (light blue) and duplications (violet) on the next to most outer ring. Chromosomes are ordered sequentially in a clockwise orientation in the outer ring on which are indicated cytological banding patterns and the centromere (red bar).

**ion** 

RMP

ZFHX3

Diagrams of somatic structural variants in all the solid tumor genomes, filtered to remove known polymorphisms, showing translocations and inversions in the center, copy number on the inner ring and insertions (green), deletions (orange) inversions (light blue) and duplications (violet) on the next to most outer ring. Chromosomes are ordered sequentially in a clockwise orientation in the outer ring on which are indicated cytological banding patterns and the centromere (red bar).

In addition to identifying individual cancer-related genes in tumor types, our results provide a panoramic view of the entire tumor genome and reveal large-scale genomic features not readily available from standard sequencing techniques. As evident in the results in Figure 4, our data provide a rapid snapshot of the extent of genomic instability in each of the tumors. Such images present an integrated picture of the aneuploidies, translocations, inversions, deletions and insertions, which offers a readily digestible impression of the extent of genetic instability underlying a tumor. Moreover, several large-scale features are evident in these data. For instance, chromothripsis is a massive cluster of chromosomal rearrangements localized to a restricted region of a chromosome, which often results from a single catastrophic event [40]. Figure 5 details a chromothripsis event on a portion of chromosome 5 in one of the lung tumor samples. In fact, such events are readily evident in four of the Circos plots in Figure 4, consistent with previous estimates of 2–3% prevalence across all cancers, albeit with different frequency in different cancers [41]. The detection and mapping of such a feature are difficult to achieve by short-read sequencing [41] but can indicate poor prognosis and the corresponding need for aggressive therapy.

**Table 4.** Structural variants affecting cancer relevant genes.



**Table 4.** *Cont.*

–

T, translocation; Dup, duplication; I, insertion; Del, deletion.

**Figure 5.** Chromothrypsis of chromosome 5p in a lung tumor. Shown is a truncated Circos plot of the lung tumor, focused on the region of chromosome 5, highlighting the chromothrypsis event that occurred on its p arm. The organization of the Circos plot is as indicated in the legend to Figure 4.

– —

—

#### **4. Discussion**

In this report, we described the application of optical genome mapping to solid tumors, which we suggest can significantly augment the genomic analysis of such tumors obtained by next-generation sequencing. Genomic analysis of tumors has stimulated major advances in cancer diagnosis, prognosis and treatment, shifting the focus from morphological and histochemical characterization to consideration of the landscape of driver mutations in the tumor [42–44]. Somatic driver events in a tumor—point mutations and structural variants (SVs) including insertions, deletions, inversions, translocations and copy number changes are currently identified in solid tumors by some combinations of RNA sequencing and genome sequencing of either targeted gene panels, whole exomes or whole genomes. As noted in this report, OGM can provide a pervasive view of the structural variants in a tumor and the cancer-related genes on which they impinge, thus identifying affected genes agnostically, without prior bias imposed by gene panels.

Some prior studies have begun to demonstrate the utility of Bionano DNA isolation protocols in solid tissue tumor analysis. These include studies of lung squamous cell carcinoma and metastatic prostate carcinoma [7,34,35]. This current report demonstrates the utility of the DNA isolation protocol and SV analysis in a wide variety of solid tissue types, and expands the feasibility of such analysis for previously unused human tissue types. The high DNA yield, high effective coverage, map rate and other molecular quality metrics shown across tumor types confirm how our extraction and analysis workflow can be effectively applied to many solid tissue tumors.

This current DNA isolation protocol carries a number of advantages. Tissue handling can be performed at room temperature. The current protocol showed successful DNA isolation in solid tissue samples of <20 mg, and even as low as 6 mg. The low tissue input requirement carries important applications for rare cancer samples, human tissue biopsy testing and other low-quantity specimen acquisition. Additionally, utilizing the novel paramagnetic Nanobind disks rather than prior agarose gel plugs greatly decreases time needed to complete DNA isolation to only 5 h. The ability to isolate DNA from up to eight simultaneous samples using the current protocol greatly amplifies throughput and reduces tissue-to-data processing time, increasing both laboratory convenience as well as expanding potential for clinical utility where rapid data turnaround is paramount. Furthermore, the strong inter-sample SV correspondence shown by most tissue types in duplicate/triplicate sample analysis demonstrates the reproducibility of this technique; intra-sample heterogeneity of select samples may be attributed to non-tumor normal tissue within some tissue fragments, or attributed to specific cancer subtype, and merits further investigation. Although the isolation protocol described here affords many advantages, there are some limitations to this protocol. While high-quality DNA isolation and OGM SV analysis was obtained for a wide variety of tumor types that were tested, it may not be generalizable to every additional untested solid tumor type. Future directions include continuing to validate this protocol in additional tissue types, and assessing additional tumor samples to assess broader trends in the role of specific OGM-identified SVs in individual cancer subtypes.

In clinical evaluation of liquid tumors such as leukemia, genomic analysis is augmented by karyotyping, which gives a panoramic, albeit low resolution, view of the entire genome. Despite the low resolution, the genome wide view of the structural changes afforded by karyotyping reveals diagnostic features of the tumor that have strong prognostic value. Given the consistent correlation of clinical outcomes with specific mutation classes, the World Health Organization (WHO), National Comprehensive Cancer Network (NCCN) and European Leukemia Net (ELN) agencies developed recommendations for diagnosis and management of acute myeloid leukemia in adults based on the spectrum of somatic point mutations and SVs generally revealed by karyotyping [45]. SVs, particularly translocations and inversions, are major considerations in this diagnosis. Since karyotyping is a very challenging technique to apply to solid tumors, the clinician does not have access to a comparable global view of a solid tumor's genome and the role of SVs in prognosis

has likely been underappreciated. Applying OGM broadly to cancer types and correlating SVs revealed by that analysis with clinical outcomes could provide new genomic markers for prognosis and treatment selection.

#### **5. Conclusions**

We demonstrate the utility of a DNA isolation protocol for high-molecular-weight DNA extraction and OGM SV analysis of a wide variety of solid human tumor types on the Bionano Saphyr system, including breast, colon, liver, brain, bladder, kidney, lung, ovary, prostate and thyroid cancer tissue. The system can be used to accurately detect genetic mutation hallmarks in cancer tissue samples, including rearrangements such as translocations, gene fusions and copy number alterations. Somatic SVs can be determined by comparison filtering with the Bionano control sample database, or against a matched pair sample. Importantly, Bionano SV pipelines can detect SVs with complex breakpoint structures that are difficult to detect with other technologies. Our results indicate that the solid tissue DNA extraction protocol can be applied to a wide variety of solid tumors, and that the Saphyr system can capture, in a streamlined workflow, a broad spectrum of SVs. These SVs have functional importance and provide great utility in cancer prognosis and treatment.

**Author Contributions:** Conceptualization, D.G., A.R.H., and J.R.B.; methodology, H.B.S., H.W., K.P. M.O., and Y.Z.; software, L.Z.; validation, C.-Y.J.L.; formal analysis, A.W.C.P., B.C., C.-Y.J.L., D.Y.G., and L.Z.; investigation, D.Y.G., B.L., and S.C.; writing—initial draft, D.G. and A.R.H.; writing editing, D.G., A.W.C.P., and A.R.H.; writing—final draft, J.R.B.; visualization, D.Y.G., A.R.H., and B.C.; supervision, A.R.H., D.G., J.R.B., H.B.S., and M.O.; project administration, A.R.H., J.R.B.; funding acquisition, D.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** Funding for this study was provided in part by a grant from the George Laverty Foundation (DG).

**Institutional Review Board Statement:** Patients consented under protocol 40532 approved by the Penn State Health Institution Review Board.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Primary Bionano Saphyr data are available on request (jbroach@pennstatehealth.psu.edu).

**Conflicts of Interest:** H.B.S., Y.Z., K.P., H.W., C.Y.J.L., A.W.C.P., B.C., M.O., and A.R.H. are employees of Bionano Genomics. The authors have no other conflicts of interest.

#### **References**


**Yigit Koray Babal <sup>1</sup> , Basak Kandemir <sup>2</sup> and Isil Aksan Kurnaz 1,\***


**Abstract:** The ETS domain family of transcription factors is involved in a number of biological processes, and is commonly misregulated in various forms of cancer. Using microarray datasets from patients with different grades of glioma, we have analyzed the expression profiles of various *ETS* genes, and have identified *ETV1, ELK3, ETV4, ELF4,* and *ETV6* as novel biomarkers for the identification of different glioma grades. We have further analyzed the gene regulatory networks of ETS transcription factors and compared them to previous microarray studies, where Elk-1-VP16 or PEA3-VP16 were overexpressed in neuroblastoma cell lines, and we identify unique and common regulatory networks for these ETS proteins.

**Keywords:** Ets; Elk-1; PEA3; Ets-1; glioma; biomarker

**Citation:** Babal, Y.K.; Kandemir, B.; Kurnaz, I.A. Gene Regulatory Network of ETS Domain Transcription Factors in Different Stages of Glioma. *J. Pers. Med.* **2021**, *11*, 138. https://doi.org/10.3390/ jpm11020138

Academic Editor: Greg Gibson

Received: 21 January 2021 Accepted: 13 February 2021 Published: 17 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

ETS proteins are present in metazoan lineages [1] and play a role in diverse biological processes. Intriguingly, ETS proteins also exhibit extensive overlaps in their tissue expression profiles, with many members of this superfamily having ubiquitous expression [2]. Not surprisingly, members of this family also tend to exhibit overlapping and sometimes redundant DNA binding, as analyzed by genome-wide occupancy and other assays [3].

The ETS (E26 transformation specific) domain transcription factor superfamily includes 27 members in humans in 11 subfamilies [2,4]. Taking its name from the founding member of this superfamily, namely oncogenic v-ets [5], ETS domain transcription factors are typically defined by their DNA binding domain, called the ETS domain, which binds the consensus motif 5′ -GGA(A/T)-3′ , called the ets motif of the ETS binding site (EBS) [6]. Their DNA binding property, as well as transactivation function, is regulated by the MAPK signaling pathway [7,8].

In addition to their roles in normal growth and development, ETS proteins are commonly involved in cancer formation and progression through the regulation of cell proliferation, adhesion, migration, or vascularization, as well as regulation of epithelial–stromal interactions and epithelial–mesenchymal transition [9]. The expressions of several ETS family members, such as PEA3, ETS-1, and ETS-2, are upregulated in tumors, playing a role in different aspects of tumorigenesis, including tumor initiation, epithelial-mesenchymal transition, metastasis, and angiogenesis [4,10]. In some cases, ETS members are amplified and/or rearranged, such as c-ETS1 in acute myelomonocytic leukemia, or undergo chromosomal relocations that result in fusions, like in the case of chromosomal translocation of 5′ TMPRSS2 to the ETS genes, resulting in TMPRSS:ERG fusion proteins in nearly half of prostate cancers, or chromosomal translocation that yields EWS-FLI1 fusion in Ewing sarcoma. The transcriptional potency of ETS proteins is also often increased in various cancers as a result of changes in protein–protein interactions, post-translational modifications, and/or protein stabilization [4].

Ets-1 was found to be overexpressed in breast cancer, which was reported to be associated with poor prognosis [10]. Ets-1 is not only a critical regulator of invasion [10], but is also involved in regulating cancer energy metabolism in ovarian and breast cancer cell lines [11]. It has also been shown to play a role in telomere maintenance through the regulation of hTERT expression [12]. T417 phosphorylation of Elk-1, a member of the ternary complex factor (TCF) subfamily, was found to be associated with the differentiation grade of colonic adenocarcinomas [13]. Additionally, in high clinical stage prostate cancer, ELK-1, not TCF members ELK3 or ELK4, was found to be associated with disease recurrence [14]. In fact, ELK1 expression was reported to be higher than ELK3 in many cancer cell lines, including brain, skin, and myeloid tumors and sarcomas [15]. PKCα expression, parallel to cell migration and tumorigenicity, of hepatocellular carcinoma was increased by MZF/Elk-1 transcription factor complex [16]. The PEA3 subfamily of ETS domain transcription factors was also involved in a number of cancers, such as in lung tumors with MET amplification, and PEA3 subfamily members were found to play a role in migration and invasion [17]. In colorectal carcinoma, PEA3 was shown to promote invasiveness and metastatic potential [18]. In ovarian cancer, the loss of repressors of the PEA3 subfamily was shown to cause overaccumulation of ETV4 and ETV5 [19].

Malignant gliomas are the most common and lethal primary tumors of the brain. Grading of diffuse gliomas is based largely on the mitotic activity and vascular proliferation states, and molecular markers are also used as diagnostic entities. Still, further molecular information will be important in a more detailed description and categorization of central nervous system (CNS) tumors, in particular for reliable and reproducible classification of grade II and grade III diffuse gliomas [20,21]. Glioblastoma multiforme (GBM), WHO grade IV, is the most aggressive and lethal among all gliomas. High-grade gliomas are composed of a highly proliferative tumor core, with highly invasive cells surrounding them [22].

ETV2, an early regulator of vascular development, was found to be overexpressed in high-grade gliomas, and was reported to play a critical role in endothelial transdifferentiation of CD133+ GBM stem cells, which is thought to render them resistant to anti-angiogenic therapy [23]. Another ETS-related gene, ERG, was found to be a novel and highly specific marker for endothelial cells within CNS tumors, a feature that can be used in studying the vascularization of gliomas [24]. A transposon-based study of gliomagenesis identified friend leukemia integration 1 transcription factor (Fli1), among other genes, to be expressed in gliomas, although Fli1 expression is limited to a subset of glioma cells [25], and ETS protein PU.1, known for its critical role in hematopoietic development, was also reported to be highly expressed in glioma patients, indicating its role in the progression of glioma [26]. In addition to their role in tumor vascularization, ETS proteins can also regulate other aspects of tumorigenesis. In a network analysis based on complexity, as measured by betweenness, Etv5 was identified as a regulator in low-grade optic gliomas in Nf1 mutant mice, and experiments validated the increased expression of both Etv5 and its target genes in optic gliomas [27].

Gliomas can be broadly classified as diffuse and non-diffuse (circumscribed) gliomas. Diffuse gliomas, namely oligodendrogliomas and astrocytomas, exhibit similarities to glial precursors, and are identified and categorized based on the WHO classification of CNS tumors [28]. Due to their rather heterogeneous nature, the reproducibility in diagnosis of low-grade (WHO grade II/III) diffuse gliomas can be a challenge. Several molecular markers, such as isocitrate dehydrogenase (IDH) mutations or telomerase reverse transcriptase (TERT) promoter mutations, which create ETS binding sites [12], are used to assist in differential diagnosis [20]. Previously, ETS gene status in clinical prostate tumor samples has been determined, and ERG+ and ETV1/4/5+ cases were found to be associated with worse prognosis, indicating that ETS status may act as a prognostic biomarker and be used in combination with other existing molecular determinants [29]. In this study, we have analyzed microarray data from patients with different grades of glioma for relative expression of ETS genes, and identified different ETS genes that are upregulated at different glioma grades. We show that, while ETV1 is expressed at high

levels in grade 2 glioma, its expression gradually decreases with glioma stage, and on the other hand, ELK3 and ETV4 expressions are increased with progression of the glioma stage. Furthermore, both ELF4 and ETV6 expressions are downregulated at grade 2 glioma, but upregulated at increasing levels in grades 3 and 4, indicating that these genes can also be utilized as additional molecular determinants to distinguish glioma grades. We further compare these data to microarray results from Elk-1-VP16 or PEA3-VP16 overexpression SH-SY5Y cells in order to narrow down transcriptional regulons, and identify common and unique transcriptional regulatory networks for these ETS proteins.

#### **2. Materials and Methods**

#### *2.1. Data Collection*

Microarray datasets related to glioma were searched from the Gene Expression Omnibus data repository [30], and GSE4290 datasets, including expression data of brain tissue from glioma patients, were selected in this study [31]. The dataset contains the brain tissue of three glioma grades (grade 2–4) from glioma patients and epilepsy patients as a non-tumor control by obtaining brain tissue from surgery patients from Henry Ford Hospital. Patient classification and tissue preparation for microarray were described in [30]. A preprocessed expression matrix was imported into an R programming interface by using R package GEOquery from the Bioconductor project [32]. The methodological flow chart of the study is shown in Figure 1. –

**Figure 1.** Methodology of transcriptomic profiling and gene regulatory network inference algorithm.

#### *2.2. Data Processing and Differentially Gene Expression Analysis*

The expression matrix of the study was Log2 normalized, used in principal component analysis (PCA) to investigate the relationship between samples. Outlier samples were determined by using the PCA plot and eliminated from the expression matrix before the differential gene expression analysis. Differential gene expression analysis was performed with the R package limma from Bioconductor with the contrast of each glioma grade versus non-tumor samples [33]. Differentially expressed genes (DEGs) were determined with a Benjamini & Hochberg corrected *p*-value < 0.05 significance level and absolute Log2 fold change > 0.6. DEGs were visualized with a volcano plot by using R package Enhanced-Volcano from Bioconductor [34]. Additionally, the intersection of DEGs was performed between limma results, Elk-1-VP16, and the PEA3-VP16 overexpression microarray, which were previously published from our laboratory for use in further analysis.

#### *2.3. Gene Regulatory Network Construction*

Transcriptional gene regulatory network (GNR) mediated by the ETS transcription factor family was constructed by using the R package RTN from Bioconductor [35–37]. The expression matrix and ETS transcription factor list from differentially expressed genes were used as an input for the transcriptional network inference (TNI) algorithm with a Benjamini & Hochberg corrected *p*-value < 0.05 significance level and 1000 permutation number parameters. Regulon activity scores were calculated from TNI by using a twotailed gene set enrichment analysis (GSEA) algorithm built-in RTN package and visualized as a heatmap. To construct an edge-weighted gene regulatory network mediated with ETS members (Regulons), transcriptional regulatory networks from TNI and differentially expressed genes were integrated by a transcriptional regulatory analysis (TNA) algorithm with two-tailed GSEA. A gene regulatory network of ETS members was constructed by using significant network interaction from TNA (*p*-value < 0.05). Additionally, the constructed network was filtered with the intersection of DEGs and previous microarray studies (Elk-1-VP16 and PEA3-VP16 overexpression studies). The final GNR was visualized by using Cytoscape software [38].

#### *2.4. Functional Enrichment Analysis*

Gene ontology (GO) and KEGG pathway enrichment analysis were performed by using R package clusterProfiler from Bioconductor to analyze the functional profile of gene clusters from differentially expressed genes (up- and downregulated genes of individual glioma grade) and transcription factor regulon clusters (positively and negatively regulated targets for each ETS regulon) [39]. The enriched GO term and KEGG pathways were determined by using a Benjamini & Hochberg corrected *p*-value < 0.05 significance level and context manner term filtration.

#### **3. Results**

#### *3.1. Identification of Differentially Expressed Genes in Gliomas*

Before analyzing the expression of genes within the ETS superfamily, we have obtained and analyzed microarray datasets corresponding to different stages of glioma from patients (grades 2–4), using epilepsy patients as the non-tumor control [31].

Analysis of the transcriptome profiles of these glioma samples yielded a set of differentially expressed genes (DEGs), and the performance of the differential expression levels in discriminating the tumor cells from non-tumor cells was verified through principal component analysis (Figure 2). According to the PCA analysis, it was found that, while non-tumor and glioma samples were readily separated, discrimination of glioma stages was less pronounced; grade 2 and grade 4 gliomas were relatively separate, however, grade 3 tumor samples were not clearly separated from the other tumor groups (Figure 2A). Therefore, any outlier data were eliminated for differential gene expression analysis. According to differential gene expression analysis, 8402 of 19,225 genes were found to be significantly changed (with adjusted *p*-value < 0.05 and absolute log2 fold change > 0.6 thresholds) in

the tumor samples (grades 2–4) (Supplementary Table S1). Up- and downregulated genes in all three grades of gliomas were obtained by differential gene expression analysis, and the volcano plot was used to visualize the DEGs, which shows that the distribution of differentially expressed genes was compatible (Figure 2B–D).

**– Figure 2.** (**A**). Differential gene expression analysis results. Principal component analysis (PCA) plot of the normalized expression matrix. Each point represents individual samples. (**B**–**D**). Corresponding differentially expressed genes (DEGs) were obtained from comparisons of non-tumor vs. individual glioma grades by using the limma package with a 0.6 absolute log2 fold change and adjusted *p*-value < 0.01 with FDR cutoff, which is indicated with red points. (**E**). Relative fold change of significantly changed ETS members for glioma grade compared with non-tumor samples by differential gene expression analysis. (**F**). Intersection of DEGs between limma results, Elk1VP16 and PEA3VP16, which are our previously published micro arrays, were represented as venn diagrams.

We have next asked whether ETS genes were among the DEGs, and, if so, how their expressions were affected in different glioma stages compared to the non-tumor control (Figure 2E). To that end, we have focused our studies to ETS subfamilies that have been previously reported to be involved in gliomas, namely the ETS subfamily [40–42], TCF subfamily [43–45], ELF subfamily [46,47], PEA3 subfamily [27,48], and TEL subfamily [49,50]. The results showed that the expressions of ETS2 and ELK-1 were downregulated at all grades, while the expressions of ELK4, ELF1, ELK3, ETS1, ETV4, and ETV1 were upregulated at all grades; intriguingly, ETV6 and ELF4 expressions were downregulated at grade 2, but upregulated at grades 3 and 4. The expression of ELK3, ETV4, and to some extent ELK4 was found to increase gradually with glioma grade, while ETV1 expression was highest in grade 2 glioma, and progressively decreased with glioma stage (Figure 2E). The only member of the SPI subfamily of ETS transcription factors represented was PU.1, which was previously shown to be involved in glioma progression, and its levels were indeed found to be increased with glioma grades (Figure 2E; [26]).

Previous microarray studies where either constitutively active Elk-1-VP16 or PEA3- VP16 was overexpressed in SH-SY5Y neuroblastoma cells have identified a number of transcriptional targets for these ETS proteins. In order to narrow down our search and focus on gene regulatory networks of ETS proteins, we have identified overlapping genes by comparing DEGs from glioma grades 2–4 with the microarray results from Elk-1-VP16 and PEA3-VP16 overexpressing cells; 2637 genes were found to be at the intersections of these two experiments, with 63 genes commonly regulated in both glioma tumor samples (DEGs), as well as cell line studies (Elk-1-VP16 and PEA3-VP16) (Figure 2F).

To clarify the functional profiles of the identified DEGs in glioma grades, enrichment analysis was performed, and significant GO and KEGG annotations were obtained (Figure 3). For the GO enrichment analysis of biological processes, initially, up- and downregulated genes of the different glioma grades were analyzed. While the upregulated gene clusters of grades were observed with cell cycle related phenotype in all glioma grades, the downregulated gene clusters of grades showed neuronal phenotype, such as synaptic transmission, as would be expected (Figure 3A, Supplementary Table S2). In the KEGG enrichment analysis, while the upregulated genes of glioma grades were enriched in p53, TGF-β, and Notch signaling pathways, prominent downregulated gene clusters fell into synaptic function related pathways, such as glutamatergic, GABAergic, serotonergic, and dopaminergic pathways (Figure 3B, Supplementary Table S3). For instance, the TGF-β signaling pathway was found to be altered through glioma progression, as observed by an increase in the level of BMP molecules, including BMP3 and BMP4, as well as their targets, such as Smad1/5/8 and Id. Additionally, the expression of cMyc and p15 associated with cell cycle were significantly increased. On the other hand, Notch signaling pathway genes, such as Delta, Notch, and Fringe, were observed to be upregulated in glioma. The MAPK signaling pathway was found to be enriched in downregulated clusters, with the expression of genes, such as Ras, MEK1, ERK, JNK, and Elk-1, being downregulated. All of these functional enrichment analysis results confirmed that, in all of the glioma grades, cells had downregulated pathways directly related with neuronal function, but upregulated signaling pathways related to cell proliferation and survival. It is interesting to note that the grade 4 glioma samples did not exhibit significant TGF-β pathway upregulation, but PI3K pathway upregulation instead (Figure 3B).

**Figure 3.** GO and KEGG enrichment analysis of differentially expressed genes for glioma grades. Each glioma grade was clustered as up- and downregulated gene clusters. (**A**). GO and (**B**). KEGG analysis were performed by clusterProfiler with an adjusted *p*-value < 0.05. The gene ratio indicates the number of genes enriched with a corresponding GO or KEGG term among the total gene number introduced into the enrichme.nt analysis.

#### *3.2. Transcriptional Gene Regulatory Network Construction*

In order to investigate ETS transcriptional regulation networks specific for each glioma grade, we have integrated DEGs obtained from the analysis of glioma grades with the normalized expression matrix, where significantly changed ETS members are referred as regulons. The initial network obtained using the transcriptional network inference (TNI) algorithm contained 10 ETS member regulons, 11,762 target genes, and 23,181 total interactions (Figure 4A). We have focused on the expression changes of ETS members in different grades of glioma, and regulon activity scores were calculated from the initial network. The analysis of regulon activity scores showed that, while ELK-1 and ETS2 showed high regulon activity in the non-tumor condition (magenta, Figure 4A), ETV1 showed a high activity score on mainly grade 2 glioma (green, Figure 4A). The other ETS proteins showed high regulon activity in the grade 4 glioma cluster, however, regulon activity of grade 3 glioma was dispersed between grade 2 and grade 4 (Figure 4A). After including DEGs into the initial network using the transcriptional network analysis (TNA) algorithm, a focused network of glioma grades was constructed. According to TNA

algorithm, nine significant ETS regulons were found to be enriched with different numbers of target genes. ETV1, which expressed the least significance among the significant ETS regulons, was marked in blue (Figure 4B).

**Figure 4.** Regulon activity and size of the ETS-mediated gene regulatory network. (**A**). The correlation distance heatmap of regulon activity for non-tumor and glioma grades. (**B**). Regulon size of the individual transcription factor in the gene regulatory network resulting from the transcriptional network analysis (TNA).

To determine the direction of regulation between each regulon and its targets, twotailed GSEA was performed, and positively and negatively regulated co-expression patterns in target gene distribution were constructed for individual ETS regulons (Figure 5). In this analysis, genes were ranked for their fold changes in the x-axis, and enrichment scores were given in the y-axis; the peak of each plot is the enrichment score for the gene indicated (dotted lines), while the colored bar shows the positively and negatively correlated genes. These results suggest that enriched ETS regulons have both unique and common gene targets in gliomas, as indicated by a clear separation of negatively and positively correlated targets in regulons such as ETS2 and ELK1 (unique), and overlapping negative and positive regulons, such as those of ELF4 and ELK3 (common).

**Figure 5.** Two-tailed GSEA analysis associated positively and negatively regulated targets of individual regulons. Target genes are ranked by gene expression analysis, and scored by enrichment analysis that indicates the edge weight of the gene regulatory network.

The network from the TNA algorithm was filtered by DEGs from Elk-1-VP16 and PEA3-VP16 overexpression microarray results to create a much more unique regulatory network of ETS members. As a result of filtering, a final regulatory network was constructed with 3366 target genes and 6610 interactions with ETS regulons, and the gene regulatory network was visualized with Cytoscape (Figure 6). This network representation shows fold changes of DEGs, as well as their interaction with the ETS regulons, showing the common targets to be clustered in the middle (Figure 6).

**Figure 6.** Gene regulatory network of glioma grades under the regulation of ETS transcription factors. Diamond nodes represent ETS members as a regulator, and circle nodes correspond to the target genes. The color of the circle indicates the mean fold change of glioma grades compared to non-tumor samples, resulting from differential gene expression analysis. Edge colors show the enrichment score of each target gene with corresponding regulators resulted from GSEA analysis.

> – – Focusing on the functional investigation of the gene regulatory network, GO and KEGG enrichment analysis was performed with positively and negatively regulated targets of ETS regulons on the regulatory network (positive regulation indicates similar coexpression patterns, i.e., when ETS protein is downregulated, its targets are also downregulated, and vice versa). It was observed that positive and negative cluster targets of ETS regulons were enriched in biological processes, such as cell–cell adhesion, synapse formation, and protein localization, some of which are common across members, while some are unique for one or few family member(s) (Figure 7A,B, Supplementary Tables S4 and S5). ELF1 and ELF4 regulons appear to belong to similar biological processes; ELK1 and ETS2 also were found to have targets within the same biological pathways (Figure 7A). The ETV1 regulon appears to have a distinct set of positively regulated targets, while ELK3, ELK4,

ETS1, and, to some extent, ETV4 appear to regulate similar biological processes (Figure 7A). This classification is not conserved for negatively regulated targets, however; here, ELF1, ELK3, ELK4, ETS1, and ETV4 appear to regulate similar biological processes, while the ELF4 regulon and ETS2 regulon each are comprised of distinct targets (Figure 7B). It is important to note that, in positively regulated targets of the ETV1 regulon, nucleosome and chromatin disassembly related processes were prominent, while no significant negatively regulated targets were identified for the ETV1 regulon. The ELK3 regulon included positively regulated targets in ECM organization, protein maturation, and processing pathways, and negatively regulated targets in synaptic vesicle signaling, synaptic transmission, and synaptic plasticity pathways.

**Figure 7.** GO enrichment analysis of individual regulons and their (**A**). positively and (**B**). negatively regulated targets. GO analysis was performed by clusterProfiler with an adjusted *p*-value < 0.05. The gene ratio indicates the number of genes enriched with corresponding GO terms among the total gene number introduced into the enrichment analysis.

Similar comparative analysis using KEGG pathway enrichment showed that, unlike the GO biological processes analyzed above, distinct signaling pathways were regulated by each ETS regulon, while a positively regulated cluster of the ETV4 regulon was enriched for the MAPK and PI3K-Akt signaling pathways, and a negatively regulated cluster of this protein was enriched for endocytosis and the synaptic vesicle cycle (Figure 8A,B, Supplementary Tables S6 and S7); the positively regulated cluster of ELK1 was enriched for cholinergic and dopaminergic synapses, as well as calcium signaling, while its negatively regulated cluster was enriched Hippo signaling, signaling of pluripotent stem cells, and cell cycle (Figure 8A,B). Interestingly, endocytosis and the synaptic vesicle cycle were common signaling pathways in almost all ETS regulons, except for ELK1 (Figure 8B).

**Figure 8.** KEGG enrichment analysis of individual regulons and their (**A**). positively and (**B**). negatively regulated targets. KEGG analysis was performed by clusterProfiler with an adjusted *p*-value < 0.05. The gene ratio indicates the number of genes enriched with corresponding KEGG pathways among the total gene number introduced into the enrichment analysis.

#### **4. Discussion**

The five stages of gliomagenesis are the initial growth stage, oncogene-dependent senescence stage, growth stage, replicative senescence stage, and, finally, the immortalization stage [28]. Disease stage classification and identification of stage-dependent or grade-dependent biomarkers is important in the accurate diagnosis of gliomas.

Graph complexity analysis in low-grade glioma has shown Etv5 and its network expression to be critical features of the neoplastic state [27]. Unfortunately, ETV5 of the PEA3 subfamily does not appear to be significantly altered in tumor vs. non-tumor samples in the microarray datasets used in this study (data not shown). However, we have identified another PEA3 subfamily member, ETV1, to be expressed at high levels in low-grade glioma and decrease in expression in higher grades (Figure 2). ELK-1 protein is known to be a critical partner for the androgen receptor (AR) in prostate cancer, and its expression was found to be associated with a higher clinical stage and prognostic marker of disease recurrence in prostate cancer [14]. No such distinction was apparent in our study on glioma grades 2–4. However, we have identified ELF4 and ETV6 to be downregulated in grade 2 gliomas, and upregulated in increasing amounts in grades 3 and 4 (Figure 2). It should be noted, however, that the ETS expression profile is also different in epilepsy; ELF1, ELK1, ELK4, ETS1, ETS2, and ETV1 are expressed at higher levels than ELF4, ELK3, and ETV4, and there is also variability in expression among different types of epilepsy (Supplementary Figure S1). However, since the type of epilepsy used in the datasets analyzed in this study were not known, it was not possible to normalize for ETS gene expression (Supplementary Figure S1) [51].

ETS proteins focused on in this study (namely, class I subfamilies ETC, TCF, and PEA3 and class II subfamilies ELF and TEL) exhibit little tissue specificity, and, in fact, many family members are ubiquitously expressed [2,15]. It is therefore not surprising that gene regulatory network analysis of ETS transcription factors exhibits extensive overlap of targets, confirming that functional redundancy exists at least to a certain extent (Figure 6). However when positively and negatively regulated targets of ETS regulons were analyzed, negatively regulated targets were found to extensively overlap (mostly related to synaptic vesicle trafficking and synaptic transmission, Figure 7B), while positively regulated targets appeared to be selective for groups of ETS family members, with ELF1 and ELF4 comprising one class (targets in synapse pruning, immune function, and cell to cell adhesion related biological processes), ELK1 and ETS2 forming another class (targets in synapse function and synaptic vesicle trafficking related processes), and ELK3, ELK4, ETS1, and ETV4 forming a third class (targets in extracellular matrix related processes, as well as protein processing and localization, ER stress, and cell cycle related processes); ETV1 appeared to be a class by itself (targets in nucleosome and chromatin disassembly) (Figure 7A). These regulon classes were not directly related to ETS subfamily assignments.

Although more physiologically relevant non-tumor controls are required to fine-tune the results in the future, this is a proof of concept study that shows that expression levels of ETS genes can be used as diagnostic markers for glioma grade identification, in addition to already existing molecular markers. In addition, the gene regulatory network analysis for ETS regulons can be used to identify target gene clusters in positively and negatively regulated pathways and processes, which can help in understanding the molecular mechanisms of transcriptional redundancy among family members. We propose that such network analysis can also be extended to differentiate stages of tumorigenesis in other types of tumors, as well as to developmental stages of various tissues.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2075-442 6/11/2/138/s1, Figure S1: Expression level of ETS members on the neocortex and hippocampus in epilepsy patients and the healthy control from GSE134697; Table S1: GO biological process enrichment analysis result table of glioma grades from DEGs; Table S2: KEGG pathway enrichment analysis result table of glioma grades from DEGs; Table S3: GO biological process enrichment analysis result table of positively regulated targets of ETS regulons; Table S4: GO biological process enrichment analysis result table of negatively regulated targets of ETS regulons; Table S5: KEGG pathway

enrichment analysis result table of positively regulated targets of ETS regulons; Table S6: KEGG pathway enrichment analysis result table of negatively regulated targets of ETS regulons; Table S7: Table formation of the gene regulatory network in Figure 6.

**Author Contributions:** Conceptualization, B.K. and I.A.K.; Data curation, Y.K.B.; Formal analysis, Y.K.B. and B.K.; Investigation, Y.K.B. and I.A.K.; Methodology, B.K.; Project administration, I.A.K.; Supervision, B.K. and I.A.K.; Writing—original draft, Y.K.B., B.K., and I.A.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data is available in Supplementary Files and upon request.

**Acknowledgments:** We would like to thank members of the AxanLab for helpful discussions. IAK is a GYA member.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **ETS-Domain Transcription Factor Elk-1 Regulates Stemness Genes in Brain Tumors and CD133+ BrainTumor-Initiating Cells**

**Melis Savasan Sogut 1,2,3, Chitra Venugopal <sup>4</sup> , Basak Kandemir 1,2,3,†, Ugur Dag 3,‡, Sujeivan Mahendram <sup>4</sup> , Sheila Singh <sup>4</sup> , Gizem Gulfidan <sup>5</sup> , Kazim Yalcin Arga <sup>5</sup> , Bayram Yilmaz 6,\* and Isil Aksan Kurnaz 1,2,\***

	- Cambridge, MA 02139-4307, USA.

**Abstract:** Elk-1, a member of the ternary complex factors (TCFs) within the ETS (E26 transformationspecific) domain superfamily, is a transcription factor implicated in neuroprotection, neurodegeneration, and brain tumor proliferation. Except for known targets, *c-fos* and *egr-1*, few targets of Elk-1 have been identified. Interestingly, *SMN*, *SOD1*, and *PSEN1* promoters were shown to be regulated by Elk-1. On the other hand, Elk-1 was shown to regulate the CD133 gene, which is highly expressed in brain-tumor-initiating cells (BTICs) and used as a marker for separating this cancer stem cell population. In this study, we have carried out microarray analysis in SH-SY5Y cells overexpressing Elk-1-VP16, which has revealed a large number of genes significantly regulated by Elk-1 that function in nervous system development, embryonic development, pluripotency, apoptosis, survival, and proliferation. Among these, we have shown that genes related to pluripotency, such as *Sox2*, *Nanog*, and *Oct4*, were indeed regulated by Elk-1, and in the context of brain tumors, we further showed that Elk-1 overexpression in CD133+ BTIC population results in the upregulation of these genes. When Elk-1 expression is silenced, the expression of these stemness genes is decreased. We propose that Elk-1 is a transcription factor upstream of these genes, regulating the self-renewal of CD133+ BTICs.

**Keywords:** ETS; Elk-1; stem cell; microarray; brain-tumor-initiating cell (BTIC)

#### **1. Introduction**

The ternary complex factor (TCF) Elk-1 of the ETS domain superfamily is a ubiquitous transcription factor, yet it interacts with neuronal microtubules and motor proteins, is found mainly in neuronal axons and dendrites, and is phosphorylated at Serine 383 residue in fear conditioning or synaptic plasticity paradigms [1–6]. Phosphorylation of Elk-1 by MAPKs, in particular Serine 383 and Serine 389 within the activation domain, was shown to induce its binding to DNA [7,8].

Elk-1 transcription factor has been widely studied with respect to its mitogen-induced activation through phosphorylation by mitogen-activated protein kinases (MAPKs) and regulation of the *c-fos* promoter in complex with serum response factor (SRF) [9]. However,

**Citation:** Sogut, M.S.; Venugopal, C.; Kandemir, B.; Dag, U.; Mahendram, S.; Singh, S.; Gulfidan, G.; Arga, K.Y.; Yilmaz, B.; Kurnaz, I.A. ETS-Domain Transcription Factor Elk-1 Regulates Stemness Genes in Brain Tumors and CD133+ BrainTumor-Initiating Cells. *J. Pers. Med.* **2021**, *11*, 125. https:// doi.org/10.3390/jpm11020125

Academic Editor: Chiara Villa

Received: 20 January 2021 Accepted: 9 February 2021 Published: 14 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Elk-1 and other ternary complex factor (TCF) members have a rather large number of targets, some of which have a high degree of redundancy [10]. A thousand new promoters were identified for Elk-1 binding using a ChIP-chip assay, with two distinct binding modes: SRF-dependent and SRF-independent; furthermore, it was shown that there was a redundancy of promoter occupancy by other ETS proteins in a subset of promoters [10]. Elk-1 was also shown to regulate survival in neuronal cell models by regulating the Survival of Motor Neuron (SMN) promoter as a novel target [11]. CD133, a widely-accepted Cancer Stem Cell (CSC) marker [12–14], was also shown to be regulated through *ets* motifs as well as hypoxia-inducible elements, through the interaction of HIF-1α and Elk-1 on the promoter [15].

Elk-1 was recently found to have both activating and repressive role in human embryonic stem cells (hESCs), particularly through SRF interaction, and found to be upregulated in mesoderm differentiation [16].

In this study, we have first aimed to identify novel targets of Elk-1 using SH-SY5Y neuroblastoma cell line in a transcriptomics approach. We have identified novel pathways and genes that were up- or downregulated upon Elk-1-VP16 overexpression, and when promoters of a subset of these genes were analyzed, several *ets* motifs were identified. Among these, genes related to pluripotency or early neuronal development were particularly interesting, hence we have further analyzed and verified the regulation of a selected set of genes by Elk-1 using qPCR and investigated the regulation of *SOX2*, *NANOG*, and *POU5F1* promoters by Elk-1 and its binding to predicted *ets* motifs in neuroblastoma and glioblastoma (GBM) cell lines. Considering Elk-1 was previously shown to regulate CD133 expression [15], we have also studied Elk-1 expression levels in CD133− and CD133+ cell lines as well as primary brain tumors, indicating Elk-1 was indeed overexpressed in CD133+ cells, and when Elk-1 expression was silenced by RNAi, *SOX2*, and *NANOG* expression were reduced in both CD133+ primary GBMs, as well as CD133+ cell lines in a cell context-dependent manner.

#### **2. Materials and Methods**

#### *2.1. Cell Culture and BTIC Isolation from Cell Lines and Primary Tumors*

SK-N-BE (2) (ATCC CRL-2271) and SH-SY5Y (ATCC CRL-2266) human neuroblastoma cell lines as well as U-87 MG (ATCC® HTB-14), A172 (ATCC CRL-1620), and T98G (ATCC CRL-1690) human GBM cell lines were used. U87-MG, A172, and T98G cell lines were provided by Assist. Prof. Tugba Bagci Onder from Koc University. For all the stated cell lines for monolayer culture, DMEM high-glucose (4.5 g/L) medium (Gibco, #41966029, Waltham, MA, USA) was used as a basal medium and supplemented with one percent penicillin-streptomycin solution (Gibco, #15140122, Rockville, MA, USA) and 10 percent fetal bovine serum (FBS) (Life Technologies, #10500064, Carlsbad, CA, USA). Cells were grown in 37 ◦C and 5 percent CO2 incubator.

To form tumorsphere cultures from monolayer cells and support brain-tumor-initiating cells after conducting CD133+ isolation, initial proliferation media (IPM), N2 media, and coated culture plates were used. Plates were prepared by coating with poly-HEMA (poly (2-hydroxyethyl methacrylate) solution. To prepare poly-HEMA solution, 38 mL absolute ethanol was mixed with two mL double distilled water. Following the addition of 1.2 g of poly-HEMA (Sigma Aldrich, #P3932, Taufkirchen, Germany) powder into the mixture, it was placed in a shaker at 37 ◦C with a vigorous shake for four-five hours until no powder could be seen with the naked eye. This poly-HEMA solution was filtered through a 0.22-micron filter and kept at 4 ◦C up to six months. Initial proliferation medium (IPM) is necessary for culturing tumorspheres and isolated brain-tumor-initiating cells up to three passages. IPM is made up of neurobasal medium (Gibco, #21103049, Waltham, MA, USA), 1X B27 (Gibco, #17504044, Waltham, MA, USA), 1X GlutaMAX (Gibco, #35050061, Waltham, MA, USA), one percent penicillin-streptomycin solution (Gibco, #15140122, Waltham, MA, USA), 20 ng/mL FGF-2 (Gibco, #13256029, Waltham, MA, USA), and 20 ng/mL EGF (Gibco, #SRP3027, Waltham, MA, USA). N2 medium is necessary for culturing spheroids and iso-

lated brain-tumor-initiating cells over three passages. N2 medium is made up of neurobasal medium (Gibco, #21103049), 1X N2 (Gibco, #17502048, Waltham, MA, USA), 1X GlutaMAX (Gibco, 35050061, Waltham, MA, USA), one percent penicillin-streptomycin solution (Gibco, #15140122, Waltham, MA, USA), 20 ng/mL FGF-2 (Gibco, #13256029, Waltham, MA, USA), and 20 ng/mL EGF (Waltham, MA, USA, #SRP3027, Waltham, MA, USA).

For brain-tumor-initiating cells' (BTICs) isolation from cell lines, SK-N-BE (2) neuroblastoma cells were grown as monolayer cells up to 80 percent confluency, and on the day of isolation, the media was removed, cells were washed with five mL PBS/flask, and three mL of StemPro Accutase/flask was added onto the cells. The suspension was centrifuged at 300× *g* for five minutes. The cells were resuspended with MACS buffer [two percent bovine serum albumin (BSA), two mM EDTA, and phosphate-buffered saline (PBS) pH 7.2]. To prevent the clogging of the columns at the ongoing isolation procedure, cells were passed through the first 70-micron cell strainer several times until they could pass freely through it. Then, they were passed through a 30-micron filter several times, cell aggregates were removed, and the single-cell suspension was prepared. Cells could be counted at this stage of the procedure. The viable cell number was determined by staining the cells with 0.4 percent Trypan Blue Solution (Gibco, #15250061, Waltham, MA, USA). To continue, cells were centrifuged at 300× *g* for 10 min, and the supernatant was removed. Cells were resuspended in 60 µL MACS buffer/10<sup>7</sup> cells and 20 µL FcR Blocking Agent/10<sup>7</sup> cells, and 20 µL CD133 Microbeads/10<sup>7</sup> cells were added (CD133 MicroBead Kit—Tumor Tissue, human, Miltenyl Biotec, #130-100-857, Gladbach, Germany). The cells were incubated at 4 ◦C for 30 min at a constant, slow rotation (12 rpm). Following incubation, two mL buffer/10<sup>7</sup> cells were added to wash the cells, and then, they were centrifuged at 300× *g* for 10 min again. The supernatant was aspirated, and the pellet was resuspended in 500 µL MACS buffer/10<sup>7</sup> cells and continued with magnetic separation part.

MACS MS column (Miltenyl Biotec, #130-042-201, Germany) was placed on the MACS Mini Separation stand and was equilibrated with 500 µL MACS buffer. The cells prepared in the previous step were loaded onto that column, and with gravity effect, the suspended cells flow through the column for positive selection. That is, the cells labeled for CD133 (CD133+) were kept in the column, while marker-free cells (CD133−) would not bind to the column and were collected in a tube. The column was washed three times with 500 µL of the buffer to wash column-retaining CD133+ cells, the flowing liquid was collected again, and the resulting cells were combined to assemble CD133− cells. For elution of CD133+ cells, the column was separated from the magnetic stand and allowed to stand in the non-magnetic field for about two minutes and flushed out with one mL MACS buffer with the supplied plunger. Cells were counted with Trypan Blue, centrifuged for five minutes at 150× *g* and resuspended in complete IPM and cultured for 7–10 days in a humidified incubator at 37 ◦C and five percent CO2, replacing the medium with freshly prepared IPM every three–four days until their size reached 200 microns, or they started dying from the center. When they reached the limitations, they were passaged. For passaging the cells, the suspension cells were collected from the dishes to a falcon and centrifuged at 300× *g* for 10 min. Following the centrifugation, the medium was aspirated, and one mL StemPro Accutase Cell Dissociation Reagent (Gibco, #A1110501, Waltham, MA, USA) was added onto the cells, and cells were incubated at 37 ◦C for five minutes. Cells were triturated about 40 times until the spheroids become single-cell suspension. Onto this single-cell suspension, five mL of PBS with antibiotics was added. Cells were counted at this stage if necessary or to continue cells were centrifuged at 300× *g* for five minutes. The cells were resuspended in complete IPM at the proper volume.

#### *2.2. Dissociation and Culture of Primary GBM Tissue*

Human GBM samples were obtained from consenting patients, as approved by the Hamilton Health Sciences/McMaster Health Sciences Research Ethics Board. Brain tumor samples were dissociated as previously described [17] and cultured as neurospheres in Neurocult complete (NCC) media, a chemically defined serum-free neural stem cell

medium (STEMCELL Technologies, Vancouver, BC, Canada), supplemented with human recombinant epidermal growth factor (20 ng/mL: STEMCELL Technologies, Vancouver, Canada), basic fibroblast growth factor (20 ng/mL; STEMCELL Technologies, Vancouver, Canada), heparin (2 µg/mL 0.2% Heparin Sodium Salt in PBS; STEMCELL Technologies, Vancouver, Canada), antibiotic-antimycotic (10 mg/mL; Wisent Bioproducts, Saint Bruno, QC, Canada) in ultra-low attachment plates (Corning, New York, NY, USA). Primary GBM cells (BT 428, BT 458 and BT 624) were cultured in NSC complete media and flow-sorted for CD133+ and CD133− populations as described previously [18,19]. Transfections were carried out by Lipofectamine 2000 as per the manufacturer's instructions.

#### *2.3. Transient Transfection of Cells*

For transfection of adherent cells, single-cell suspensions of adherent cell cultures were prepared and seeded at 0.3–0.6 × 10<sup>6</sup> cells/cm<sup>2</sup> density in complete DMEM medium, and they were incubated in 37 ◦C, five percent CO<sup>2</sup> incubator, so that they would be 85–90 percent confluent at the time of transfection. On day one, for the formation of the carrier liposome complex, the desired plasmid and PEI were mixed at the determined ratio for each cell line in serum-free DMEM and incubated at room temperature for 20 min. At the end of the period, a complete DMEM medium with 10 percent FBS was added to the mixture at half the volume of the mix. Two hours later, complete DMEM medium containing 10 percent FBS was added to the wells/dishes and the cells were incubated for 48 h in 37 ◦C, five percent CO<sup>2</sup> incubator for the transgene expression. Cells were transfected with empty pCDNA3 or pCMV plasmids, pCMV-Elk-1 and pRSV-Elk-1-VP16 (courtesy of Prof. A.D. Sharrocks) using the PEI reagent (CellnTech), in 3 replicas per sample. psiSTRIKE hMGFP-scrRNA (from here on referred to as scrRNA) and psiSTRIKE hMGFP-siElk-1 (from here on referred to as siElk-1) has been described elsewhere [11].

For transfection of BTICs, the suspension cells were collected from the dishes to a falcon and centrifuged at 300× *g* for 10 min. Following the centrifugation, the medium was aspirated, and one mL StemPro Accutase Cell Dissociation Reagent (Gibco, #A1110501) was added onto the cells, and cells were incubated 37 ◦C for five minutes. Cells were triturated for about 40 times until the spheroids become single-cell suspension. Onto this single-cell suspension, five mL of PBS with antibiotics was added and centrifuged at 300× *g* for five minutes. The cells were resuspended in complete IPM at the proper volume. Cell density and Lipofectamine 2000 (Thermo Fischer Scientific, Waltham, MA, USA) ratio were determined. Cells were seeded at 0.3–0.6 × 10<sup>6</sup> cells/cm<sup>2</sup> density in complete IPM without antibiotics on the day of transfection. Following the cell seeding, Lipofectamine 2000 and the nucleic acids were diluted in neurobasal medium without antibiotics, incubated at room temperature for five minutes, then the diluted Lipofectamine 2000 was gently combined with the dilute nucleic acids, and the mixture was incubated at room temperature for 20 min to form liposome. Then, the mixture was added directly onto wells containing cells, and the cells were incubated for 24–72 h in 37 ◦C, 5% CO<sup>2</sup> incubator for the transgene expression

#### *2.4. Soft Agar Assay*

For softy agar assay, 100 cells in 100 µL IPM and an equal volume of 2.8% lowmelting-point (LMP) agarose solution were mixed to generate 1.4% agarose-cell solution per well in a 96-well plate, and the mixture was incubated at 37 ◦C, 5% CO<sup>2</sup> incubator for 14 days. At the end of 14 days, colonies were counted under a 10× magnification or stereo microscope. For staining, crystal violet was dissolved in PBS with two percent ethanol at a final concentration of 0.04 percent, filtered with 0.45 µm filter, and dishes were stained with 50 µL of this solution for one hour at room temperature. The plates were checked every ten minutes to prevent the staining of the background. Then, the staining solution was removed carefully, and the wells were washed with water three times for 30 min. At the last wash, water was kept in the wells overnight to remove the background. The assay was performed in quadruplicate; colonies ≥ 20 µm were counted and analyzed using MS Excel software; results were reported as mean ± standard deviation.

#### *2.5. Limiting Dilution Analysis (LDA)*

Limiting dilution analysis (LDA) has been extensively used to find out differences within multiple groups for a particular trait. In our case, LDA was used for determining the cancer cell initiating frequency of CD133+ and CD133− SKNBE (2) cells; in other words, to evaluate the self-renewing capacity of BTICs. For LDA, following the BTIC isolation procedure, cells were counted so that 10,000 cells/50 µL complete IPM would be present in the first tube. Through serial dilution by factor two up to 1 cell/50 µL, cells were seeded on poly-HEMA coated 96-well plates. For each condition/cell number, samples were seeded in quintuplet. Twenty-five microliters of culture media were added to each well every three–four days, and cells were examined for the presence/absence of spheres and quantified on day 10.

#### *2.6. RNA Isolation, cDNA Synthesis, Reverse Transcription Polymerase Chain Reaction (RT-PCR), and Real-Time PCR*

PureLink RNA Mini Kit (Life Technologies Ambion, #12183-018) and PicoPure RNA Isolation Kit (Arcturus, #KIT0202) were used for RNA isolation throughout the experiments. In summary, adherent cells grown in cell culture plates (usually 1.5x10<sup>6</sup> cells/10 cm culture dish) were washed with cold PBS; then, the resuspended cells were centrifuged at 300× *g* for 5 min at 4 ◦C. An amount of 0.3–0.6 mL of lysis solution with beta-mercaptoethanol was added onto the cells depending on the number of cells, and they were mechanically burst and homogenized by triturating through an insulin syringe 15 times. The cells were centrifuged at 2000× *g* for five minutes at 4 ◦C, followed by the addition of 70 percent ethanol equal to the volume of the present cell lysate. The lysates were transferred to the filter cartridges and were centrifuged at 12,000× *g* for 30 s. This step was repeated until the whole sample was finished, and the washing process was started. For washing, 700 µL of wash buffer I was added and centrifuged at 12,000× *g* for 15 s. Following the first washing step, 500 µL of wash buffer II was added and repeated twice after centrifugation at 12,000× *g* for 30 s. Tubes were centrifuged for two minutes to dry the membrane. In the elution stage, the cartridges were transferred to new Eppendorf tubes, and depending on the starting number of cells, 20–35 µL of nuclease-free water was put onto the membrane surface and incubated for three minutes at room temperature. Total RNA isolation was completed with centrifugation at 12,000× *g* for one minute. The concentrations of RNA samples obtained were determined with NanoDrop Spectrophotometer (Thermo Fisher Scientific, Paisley, UK), and the samples were stored at −80 ◦C in the presence of RNase inhibitors or used for further experiments.

Following total RNA isolation, cDNA synthesis was performed using modified MMLVderived reversible transcriptase using the iScript cDNA Synthesis Kit (BioRad, #1708891, Hercules, CA, USA). For this purpose, a maximum of one µg total RNA sample was diluted to a maximum volume of 15 µL. The RNA sample was denatured at 70 ◦C for five minutes and centrifuged briefly. After the addition of the 5X reaction buffer and iScript reversible transcriptase, the mix was ready for the cycling. Prepared cDNA samples were diluted with nuclease-free water to the desired concentration immediately before use in qPCR and/or stored at −20 ◦C for a maximum of one month.

PrimerQuest (Integrated DNA Technologies, IDT, Coralville, IA, USA), a free online software, was used for the qPCR primer design. The mRNA sequences of the target genes were obtained from the NCBI Gene (https://www.ncbi.nlm.nih.gov/gene/, accessed on 20 January 2021) database, the exon regions of the respective genes were determined, and the primers were designed to be at the exon–exon boundary (if possible). Potential primer pairs were evaluated for GC content, melting temperatures (Tm), and the hairpin formation and appropriate primers were determined. The NCBI BLAST database (https: //blast.ncbi.nlm.nih.gov/, accessed on 20 January 2021) was used to check the specificity of the designed primers. The designed primers are listed in Table 1.


**Table 1.** qPCR primers used.


**Table 1.** *Cont.*

All the qPCR experiments were performed using SSOAdvanced Universal SYBR Green Supermix (Biorad, #1725274, Hercules, CA, USA) and Applied Bioscience StepOne Plus Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA); essentially, 1–10 ng cDNA was used as template; primers were used at 300 nM each, and the reaction was carried out at 60 ◦C for 40 cycles. The differences between the expression of target genes were normalized by the expressions of *β-actin* and *gapdh* genes. Each setup was prepared in triplicate and analyzed by the ∆∆CT method as described previously [20]. The fold changes in target gene expressions were calculated based on the mean of the reference

gene expression and logarithm-transformed. All qPCR experiments were repeated at least 3 times, unless otherwise noted. The mean and standard deviation values were calculated for each group, and the differences in the gene expression levels were determined considering the control group. For statistical analysis, depending on the context, one-way ANOVA with Tukey post hoc test or Student's t-test depending on the context with Prism 5 GraphPad software was used. *p*-value under 0.05 was considered statistically significant.

Total RNA was extracted using a Norgen Total RNA isolation kit and quantified using the NanoDrop Spectrophotometer ND-1000. Complementary DNA was synthesized from 0.5–1 µg RNA by using qScript cDNA Super Mix (Quanta Biosciences, Beverly, MA, USA) and a C1000 Thermo Cycler (Bio-Rad, Hercules, CA, USA) with the following cycle parameters: 4 min at 25 ◦C, 30 min at 42 ◦C, 5 min at 85 ◦C, hold at 4 ◦C. qRT-PCR was performed by using Perfecta SybrGreen (Quanta Biosciences, Waltham, MA, USA) and an Opticon Chroma4 instrument (Bio-Rad, Hercules, CA, USA). Gene expression was quantified by using Opticon software, and expression levels were normalized to 28srRNA expression. For statistical analysis, multiple Student's t-tests with Prism 5 GraphPad software were used. *p*-value under 0.05 was considered statistically significant.

#### *2.7. Microarray and Data Analysis*

For microarray analysis, SH-SY5Y cells were transfected with Elk-1-VP16 expression plasmid or empty pCDNA3 plasmid as described above, and 48 h after transfection, RNA samples were isolated using Ambion Tri-pure RNA isolation kit, checked for quality, converted to cDNA, and confirmed for Elk-1 expression as described above. Thereafter, RNA was converted to cDNA using the Superscript Double-Stranded cDNA Synthesis (Invitrogen, Carlsbad, CA, USA) Kit and labeled with NimbleGen One Color DNA Labeling (NimbleGen, Roche, Madison, WI, USA). The labeled cDNA was hybridized to NimbleGen Human Gene Expression Array 12x135K (NimbleGen, Roche, Wisconsin, USA), which covers 45.033 genes with 3 probes per gene, containing 12 arrays per slide. After hybridization, slides were scanned using Genepix 4000B scanner and analyzed with NimbleScan 2.5 software using three arrays from the pCDNA3-transfected cell as reference samples. The expression datasets were normalized using the Robust Multi-Array Average expression measure [21], and differentially expressed genes (DEGs) and their fold-changes were identified from the normalized expression values using two-tailed Student's t-test assuming equal variances and Benjamini-Hochberg's method as the multiple testing option to control the false discovery rate. An adjusted *p*-value threshold of 0.15 was used to determine the statistical significance of differential expression. The dataset is accessible from EBI ArrayExpress, with the accession number of E-MTAB-9938.

Gene IDs were converted to official gene symbol, and gene set enrichment analyses of DEGs were performed through ConsensusPathDb (r.32) [22] using KEGG [23], Reactome [24], and Biocarta [25] as the data source for molecular pathways, and Gene Ontology Biological Process annotations [26] as the data source for biological processes. Wholegenome annotation for the human genome was used as the background reference set. *p*-values were determined through a modified Fisher exact test and adjusted via Benjamini-Hochberg's method. A threshold of adjusted *p*-value < 0.05 was used to determine the statistical significance of the enrichment results. Besides, to characterize the molecular functions of each gene product, and their association with diseases, we manually searched GeneCards Human Gene Database [27].

#### *2.8. Promoter Clonings and Site-Directed Mutagenesis*

To identify the putative Elk-1 transcription factor binding sites in selected stemness gene promoters (*SOX2*, *NANOG*, *POU5F1*), the Cold Spring Harbor Laboratory— Transcriptional Regulatory Element Database (TRED), Swiss Institute of Bioinformatics— The Eukaryotic Promoter Database (EPD), and Alggen-Promo algorithmic analysis program were used. The promoter sequences that correspond to the genes of interest were retrieved from either the Transcriptional Regulatory Element Database (TRED) (http:

//rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=home, accessed on 20 January 2021), or the Eukaryotic Promoter Database (EPD) (http://epd.vital-it.ch/, accessed on 20 January 2021). The obtained promoter sequences were analyzed with Promo 3.0 (http: //alggen.lsi.upc.es/cgi-bin/promo\_v3/promo/promoinit.cgi?dirDB=TF\_8.3, accessed on 20 January 2021). The promoter binding regions for transcription factors can be analyzed by the Promo 3.0 tool, and the results are displayed as "dissimilarity rate". The dissimilarity matrix expresses the similarity pair to pair between Elk-1 DNA binding sequence and the putative sequences at analyzed genes. From this point of view, the smaller dissimilarity rates are the indicators of a higher possibility for the interaction between Elk-1 and the promoter of interest. The binding ability of Elk-1 to the predicted sites on the promoters could be confirmed by luciferase and chromatin immunoprecipitation assays, thereby verifying the microarray results (Table 2).

**Table 2.** Number of ets motifs predicted on selected promoters and their dissimilarity score (DS) range; DS of 0% means perfect match to consensus; TRED, Transcriptional Regulatory Element Database; EPD, Eukaryotic Promoter Database. \*


\* For dissimilarity scores of individual ets motifs, see Supplementary Table S3 for URL of databases, please refer to text.

Cloning primers for human *SOX2*, *NANOG*, and *POU5F1* promoters were designed and analyzed with NetPrimer (http://www.premierbiosoft.com/netprimer/, accessed on 20 January 2021) and PrimerBlast (http://www.ncbi.nlm.nih.gov/tools/primer-blast/, accessed on 20 January 2021) softwares. The designed cloning primers are listed in Table 3. Gradient PCR with five different annealing temperatures was performed to detect the optimum annealing temperature of the primers. The PCR reactions were prepared with i-Taq DNA Polymerase (Intron, #25024, Seoul, Korea) kit using the genomic DNA isolated from the SH-SY5Y cell line as a template. After optimization, the preparation of the insert was carried out using Pfu DNA Polymerase (Thermo Scientific, #EP0571, Waltham, MA, USA) suitable annealing temperatures as indicated in text for 30 cycles. Following amplification, PCR products were purified by PureLink PCR Purification Kit (Invitrogen, # K3100-01) and cloned into pGL3luciferase reporter plasmid.

Intentional deletion mutations were made on cloned promoter sequences with sitedirected mutagenesis (SDM). The promoter sequences were analyzed with Promo 3.0, as stated previously. Potential Elk-1 binding sites on stemness promoters were chosen according to the dissimilarity rate Promo 3.0. Accordingly, *ets1* motif on *NANOG* promoter, *ets1* and *ets2* motifs on *SOX2* promoter, and *ets1*, *ets2*, and *ets3* motifs on *POU5F1* promoter were deleted in corresponding pGL3 luciferase reporter constructs. SDM primers were designed using the NEB Base Changer (http://nebasechanger.neb.com/, accessed on 20 January 2021) website, and Q5® Site-Directed Mutagenesis Kit Protocol (NEB, #E0554) was followed for the mutations. The primer pairs designed flanking the region to be deleted

and eventually forming deletion mutants from the cloned promoter sequences are given in Table 4. The mutagenesis was carried out according to the manufacturer's instructions.



**Table 4.** Primers used for site-directed mutagenesis of cloned promoters.


#### *2.9. Luciferase Reporter Assay*

For each cell line, the necessary optimization experiments were performed, and cell numbers and DNA: PEI ratios were determined for co-transfections. For 24-well cell culture plates for luciferase analysis, for SK-N-BE (2), T98G, and A172 cells 80 × 10<sup>4</sup> cells/well, and for SH-SY5Y and U87-MG cells, 60 × 10<sup>4</sup> cells/well were seeded with triplicates for each transfection group. The following day, *SOX2*-luc, *NANOG*-luc, *POU5F1*-luc, or one of the deletion mutant of these plasmids, one of the Elk-1 series plasmids (pCDNA3.1, Elk1-VP16, Elk1-EN, siElk1, or scrRNA), Renilla-luc plasmid (pRL-TK (Promega, #E2241, Madison, WI, USA)) and the proper ratio of PEI mixture was prepared. After transfection, the cells were incubated for 42 h in a normoxic medium and subjected to one percent hypoxia for the last six hours for the normoxia–hypoxia experiments. At the end of hypoxia treatment, luciferase analysis was performed with Thermo Luminoskan Ascent device by using Dual-Glo Luciferase kit (Promega, Wisconsin, USA) with some modifications. For luciferase analysis of monolayer cell lines, 48 h of incubation was necessary before performing luciferase analysis.

On the day of the luciferase assay, the medium on the cells was aspirated, and the wells were washed with PBS. Cells were lysed with 100 µL of 5X Passive Lysis Buffer (PLB) (Promega, #E1941, Wisconsin, USA) diluted to 1X. Seventy-five microliters of the cells were transferred to luminometer compatible white-bottomed 96-well plates. To measure the Firefly luciferase activity, 75 µL of Dual-Glo® Luciferase Reagent was added onto the

lysed cells. For at least 15 min, the plates were incubated at room temperature, and the luminescence for Firefly luciferase activity was measured. To measure the Renilla luciferase activity, 75 µL of Dual-Glo® Stop&Glo Luciferase Reagent was added to the wells. They were incubated at room temperature for the equal time that was done for Firefly luciferase, and the luminescence for Renilla luciferase activity was measured. Firefly/Renilla ratios were calculated, normalizations were done, and the results were graphed as relative luciferase activity. For statistical analysis, one-way ANOVA with Tukey post hoc test or Student's t-test depending on the context with Prism 5 GraphPad software was used. *p*-value under 0.05 was considered significant.

#### *2.10. Chromatin Immunoprecipitation (ChIP) Assay*

In this assay, proteins and interacting DNA are crosslinked with formaldehyde; the chromatin is sheared with either sonication mechanically or micrococcal nuclease enzymatically. The nucleoprotein complex is enriched by immunoprecipitation, and through the reversal of the crosslinking, DNA and the interacting protein are separated. In the end, the interacting DNA fragment is purified and quantified with ChIP-qPCR. To determine the promoter fragment to be amplified in ChIP PCR, Promo3.0 analysis used for predicting *ets* motifs and their dissimilarity scores was used (Supplementary Table S3). The amplicon size was arranged between 75–150 bp; CpG islands were checked for the potential binding sites of Elk-1, and the position of Elk-1 to those sequences was considered for primer design. The UCSC in silico PCR tool was used to verify the amplicon (https://genome.ucsc.edu/cgi-bin/hgPcr, accessed on 20 January 2021); primers used for ChIP PCR are listed in Table 5.

**Table 5.** The list of primers used in chromatin immunoprecipitation (ChIP) assay.



**Table 5.** *Cont.*

Essentially, cells were seeded in three separate 150 mm cell culture dishes of 2 × 10<sup>6</sup> cells/dish per experimental group on day zero. On day 1, cells were transfected with either an empty pCDNA3.1 plasmid or an expression plasmid for Elk1-VP16 plasmids and incubated 48 h at 37 ◦C, 5% CO2. Cells were then treated with 1% formaldehyde at room temperature for 20 min; glycine was then added to the dishes to a final concentration of 0.125 M and incubated for 5 min at room temperature. The dishes were washed three times with cold PBS on ice and then centrifuged at 400× *g* for five minutes at 4 ◦C with 1× protease inhibitor cocktail (PIC) (Roche, 4693159001). The supernatant was aspirated, and lysis buffer was added onto the cells with a volume of at least 10 times the pellet obtained. The suspension was incubated on ice for 10 min and passed through an insulin needle 20 times. One volume of the sample was mixed with an equal volume of 0.4 percent Trypan Blue Dye, and the cell nuclei were checked under the microscope. The volume of the sonication buffer to be used to dissolve the pellet was adjusted to 2–3 × 10<sup>6</sup> nuclei/mL and sonicated in the Biorupter UCD-200 Sonicator (Diagenode, Denville, NJ, USA). Following the sonication, cell lysates were centrifuged at 22,000× *g* for 20 min at 4 ◦C to remove insoluble materials. The supernatant was then diluted five-fold with dilution buffer and pre-cleared for 4 h with slow rotation with protein A/G mixture beads. After incubation, the samples were precipitated at 150× *g* for 5 min at 4 ◦C, and 10% of the total supernatant was removed as total input control and kept in −20 ◦C. The rest of the supernatant was divided into two fractions of the negative control (IgG-mock) and immunoprecipitation (IP) per group.

Sixty microliters of ANTI-FLAG® M2 Affinity Gel (Sigma Aldrich, #A2220, Taufkirchen, Germany) resin per group were washed and equilibrated with five volumes of dilution buffer and centrifuged three times at 400× *g* for one minute each at 4 ◦C. The negative control and

IP fractions separated from the dilution in the previous step were mixed with Protein G-Plus agarose beads and anti-Flag M2 resin, respectively. The tubes were incubated at 4 ◦C overnight with slow rotation. The following day, the mix was centrifuged at 4 ◦C and 600× *g* for five minutes, and the pellet was collected. The beads were washed with one mL of low salt, high salt, LiCl, and TE buffers at 4 ◦C with rotation, respectively. Following each of the washing steps, the beads were centrifuged at 4 ◦C and 600× *g* for five minutes.

At the elution step, the inputs that were collected and frozen a day before were thawed and added as the third fraction of each group. After the last wash, 250 µL fresh elution buffer, pre-heated at 65 ◦C, was added onto the beads, and they were incubated on a shaker for 15 min. The tubes were vortexed with five-minute intervals and then centrifuged at 4 ◦C and 18,000× *g* for five minutes. The supernatant was collected for each fraction of each group, and the elution step was repeated with another 250 µL elution buffer. After elution of the crosslinked DNA–protein complex, 10 µL of RnaseA (10 mg/mL) (Intron, #BR003) and 25 µL of 5 M NaCl was added onto the elutes and incubated for at least five hours or overnight at 65 ◦C. The following day, 10 µL of 0.5 M EDTA, 20 µL 1 M Tris–HCl (pH 6.5), and two µL Proteinase K (20 mg/mL) (Invitrogen, #25530049, Carlsbad, CA, USA) mix were added and incubated again at 65 ◦C for two more hours. Using MEGAquick-spin™ Plus Total Fragment DNA Purification Kit (Intron Bio, #17290, Sungnam, Korea), the DNA was cleaned up. The resulting fractions were used for qPCR analysis.

SSOAdvanced Universal SYBR Green Supermix (Bio-Rad, #1725274, Hercules, CA, USA) and Applied Biosciences StepOne Plus Real-Time System were used for qPCR analysis with DNA isolated from ChIP. Ten microliters of PCR reaction were prepared by mixing 2X SSO Advanced Universal SYBR Green Supermix, 300 nM forward and reverse primers each, and 1 µL template. In the analysis phase, qPCR signals obtained from the ChIP samples were normalized by the signals obtained from the input, and the mock samples and the results are presented as fold change. For statistical analysis, one-way ANOVA with Tukey post hoc test or Student's t-test depending on the context with Prism 5 GraphPad software was used. *p* value under 0.05 was considered significant.

#### **3. Results**

#### *3.1. Microarray Analyses Reveal Novel Targets in Elk-1-VP16 Overexpressing SH-SY5Y Cells*

Elk-1 is a ubiquitous transcription factor, yet it has been implicated in different biological processes in the nervous system. In order to identify novel target genes of Elk-1 with respect to survival in neurons, we have overexpressed Elk-1-VP16 constitutively active fusion protein in SH-SY5Y neuroblastoma cells. The comparative analysis of the transcriptome profiles indicated 11,018 differentially expressed genes (DEGs), of which 4212 were downregulated and 6806 were upregulated, when SH-SY5Y neuroblastoma cells were transfected with Elk1-VP16. The gene set enrichment analysis (GSEA) of these genes up- or downregulated by exogenous Elk-1-VP16 presented overrepresentation of quite a high number of biological processes such as anatomical structure development, cell proliferation, single-organism developmental process, developmental growth, and organ and tissue development, including forebrain and midbrain development (Supplement Tables S1 and S2). When a subset of these genes was analyzed further, stemness genes such as *POU5F1*, *SOX2*, and *NANOG*, as well as growth factors and receptors or transcription factors including FGFR1, WNT16, WNT 3, PDGFA, PAX6, PAX7, HIF3A, NOTO, among many others were found to be upregulated, whereas genes such as EGLN2, FEV, JUNB, and GLI4 were found to be downregulated upon overexpression of Elk-1-VP16 (Figure 1A,B).

Prediction of putative Elk-1 binding sites (i.e., ets motifs) on the promoters of these genes was assessed via Alggen PROMO 3.0 online software [28]. Among the genes of interest for which human promoter sequences were available, the analysis was performed for human ELK-1 (TRANSFAC database accession no. T00250) binding, thereby limiting the number of promoters investigated, and out of these, promoters with at least one motif are listed (Supplement Table S3). Among the selected subset of genes, *SOX2* promoter was found to contain one ets motif with a dissimilarity score of 2.16, *NANOG* was found to

contain one ets motif with a dissimilarity score of 2.3, and *POU5F1* contained one ets motif with a dissimilarity score of 3.12, among other potential ets binding sites, indicating a high probability of binding (Supplement Table S3). Other promoters of the microarray-determined set of putative Elk-1 target genes, whose promoters contained low dissimilarity score ets motifs, included transcription factors such as *RXRB*, *TCF7L1*, *MEF2B*, *PAX6*, *SOX10*, *CREB3*, *SMAD6*, *CREM*, and *HES7* and signal transduction pathway elements such as *RHO*, *IRAK3*, *WNT3A*, *LIFR*, *FRZB*, *NGFR*, *MAPK6*, *NOTCH4*, *FGF11*, and *NODAL*, among many others (Supplement Table S3).

**Figure 1.** (**A**) Heatmap of a subset of genes regulated by Elk-1-VP16 showing increased (green) or decreased (red) expression in Elk1-VP16 overexpressing SH-SY5Y neuroblastoma cells; 557A10, 557A11, 557A12 correspond to SH-SY5Y cells transfected with Elk-1-VP16 expression plasmid, 557A01,557A02,557A03 control SH-SY5Y cells transfected with empty plasmid; color key shows up- and downregulation levels. (**B**) Schematic representation of the relation between selected genes in pluripotency and early embryonic development pathways that were found to be regulated by Elk-1-VP16 in microarray analysis.

#### *3.2. Regulation of Nervous System Development Related Genes by Elk-1*

To validate regulation of selected candidate genes identified through microarray experiments by Elk-1 transcription factor, we have either overexpressed Elk-1-VP16 constitutively active fusion protein or knocked down endogenous Elk-1 expression in SH-SY5Y and SK-N-BE (2) neuroblastoma cell lines and A172 and T98G GBM cell lines (Figure 2).

qPCR results in SH-SY5Y cells were parallel to those obtained from the microarray analysis, especially in the genes related to pluripotency such as SOX2, NANOG, POU5F1, RXRB, GLUT3, TCF7L1, NODAL, and CREB3 (Figure 2A,B). SOX2 was upregulated in SH-SY5Y overexpressing Elk-1-VP16 protein, similar to microarray, but not in other cell types, while it was repressed when Elk-1 was knocked down (siElk-1) in all cell types (Figure 2). Similarly, NANOG and POU5F1 was upregulated in SH-SY5Y cell overexpressing Elk-1-VP16, but downregulated in cells transfected with siElk-1 plasmid (Figure 2A,B), whereas both genes were repressed in SK-N-BE (2) cells overexpressing Elk-1-VP16 and upregulated in siElk-1 knockdown (Figure 2C,D; Table 6), indicating a cell context-dependent regulation. TCF7L1 and NODAL expression increased in Elk-1-VP16 overexpressing SH-SY5Y and SK-N-BE (2) but decreased in siElk-1 silencing; BRACHYURY (T) expression was upregulated in Elk-1-VP16 overexpressing but decreased in siElk-1 silenced SK-N-BE (2) cells (Figure 2; Table 6). GLUT3 expression was upregulated in all cell types overexpressing Elk-1-VP16, and decreased in all cells with siElk-1 silencing, paralleling the microarray results (Figure 2, Table 6). The expression of ARC and CREB3 increased in A172 and T98G cells overexpressing Elk-1-VP16 but decreased in siElk-1 knockdown cells (Figure 2E–H; Table 6). GLI4 and ALS genes increased in A172 cells overexpressing Elk-1-VP16 and decreased with siElk-1 silencing (Figure 2E,F).

**Figure 2.** qPCR expression profiles of selected genes in different cell lines upon overexpression of Elk-1-VP16 (**A**,**C**,**E**,**G**) or knockdown of endogenous Elk-1 (**B**,**D**,**F**,**H**). Expression profiles after (**A**). over-expression with Elk1-VP16 and (**B**). after knock-down with siElk1 in SH-SY5Y neuroblastoma cell line; expression profiles after (**C**). over-expression with Elk1-VP16 and (**D**). after knock-down with siElk1 in SK-N-BE(2) neuroblastoma cell line; expression profiles after (**E**). over-expression with Elk1-VP16 and (**F**). after knock-down with siElk1 in A172 GBM cell line; expression profiles after (**G**). over-expression with Elk1-VP16 and (**H**). after knock-down with siElk1 in T98G GBM cell line. Unpaired t-test; \*\*\*\* *p* < 0.0001, \*\*\* *p* < 0.001, \*\* *p* < 0.01, \* *p* < 0.05.


**Table 6.** Summary of qPCR and microarray comparisons of selected potential Elk-1 target genes after Elk1-VP16 overexpression or siElk-1 silencing in neuroblastoma and glioblastoma cell lines.

\* N/A: the expression level is not available.

The promoters of a subset of genes have been selected for chromatin immunoprecipitation to address whether predicted binding sites were indeed binding to Elk-1 (Figure 3). To that end, we have transfected SK-N-BE (2) neuroblastoma and T98G GBM cell lines with Elk-1-Flag expression vector and pulled down exogenous Elk-1 using Flag-agarose beads. The known targets *SRF* (*p* = 0.0451) and *MCL1* (*p* = 0.0102) showed significant binding in SK-N-BE (2) cells, but the binding was not statistically significant in T98G cells (Figure 3A). Among the novel promoters identified in this study, *GLUT3* promoter showed Elk-1 binding in both cell types, albeit not to the same extent, while *KLF4* (*p* = 0.0496) only showed significant binding in T98G cells (Figure 3B). LIF1, however, did not show significant Elk-1 binding in either cell type.

**Figure 3.** Chromatin immunoprecipitation assay for the identification of Elk-1 binding sites on the target gene promoters in pCMV-transfected (pCMV) vs. Elk-1 over-expressing cells (Elk-1) in (**A**). SK-N-BE (2) cells and (**B**). T98G cells. Lysates were immunoprecipitated with either Flag antibody (Flag IP) for exogenous Elk-1 or IgG (IgG IP) as control. qPCR results were analyzed as explained in Materials and Methods and reported as average fold change.

#### *3.3. Regulation of SOX2, NANOG, and POU5F1 by Elk-1 in CD133+ Cells*

Since Elk-1 was previously shown to be important in human embryonic stem cells (hESCs) maintenance of self-renewal capacity through co-occupation of promoters with *ERK2* [29], and to regulate the promoter of CD133, a cell surface protein commonly used as a cancer stem cell marker [15], we addressed whether the cell context-dependent regulation was due to heterogenous nature of some cell lines used in terms of their tumorsphere forming abilities. SK-N-BE (2) neuroblastoma cells were shown to form CD133+ tumorspheres, unlike SH-SY5Y cells, hence we have first sorted CD133− and CD133+ SK-N-BE (2) cells and showed that expression of *CD133*, *ELK*-*1*, *SOX2*, *NANOG*, and *POU5F1* were all significantly more in CD133+ cells than in CD133− cells (Figure 4A). Intriguingly, ELK-1 levels increased in different passages (p1 and p2) of CD133+ sorted cells, while *CD133* levels declined with each passage; *NANOG* and *POU5F1* levels also increased slightly in p2 cells, albeit not significantly (Figure 4B). Both passages (p1 and p2) of CD133+ SK-N-BE (2) cells were shown to be Nestin+ (data not shown). To address whether this coexpression of ELK-1 with stemness genes studied is through direct regulation, we have silenced endogenous Elk-1 expression in CD133+ SK-N-BE (2) cells, and observed that *NANOG* and *SOX2* but not *POU5F1* were downregulated significantly upon silencing (Figure 4C). It must be noted, however, that overexpression of Elk-1-VP16 in CD133− cells did not yield upregulation of *NANOG*, *SOX2* or *POU5F1* in SK-N-BE (2) cells (data not shown). − − −

− − − **Figure 4.** qPCR expression profiles of stemness genes in CD133− vs. CD133+ SK-N-BE(2) cells and in primary brain tumors. (**A**). Stemness gene expression analysis of SKNBE(2) passage 0, passage 1, and passage 2 cells (\*\*\* *p* < 0.001, two-way ANOVA w/Dunnett multiple comparison test); (**B**). Stemness gene expression analysis of SKNBE(2) CD133+ BTICs vs. CD133− spheroids (unpaired t-test, \* *p* < 0.05, \*\*\* *p* < 0.001, \*\*\*\* *p* < 0.0001); (**C**). *left*, *NANOG*, *middle POU5F1* and *right*, *SOX2* gene expressions in CD133+ cells upon silencing of Elk-1 expression (unpaired t-test; \*\*\*\* *p* < 0.0001, \*\* *p* = 0.0051); (**D**). endogenous Elk-1 expression levels in CD133− and CD133+ primary brain tumor samples (sample no 428, 458 and 624); relative gene expression is reported as normalized to 28S rRNA level (unpaired t-tests; \* *p* < 0.05, \*\*\* *p* < 0.0001); (**E**). primary brain tumor cells from sample no 624 were transfected with either scrRNA or siElk-1 plasmids and analyzed for expression level of endogenous *ELK-1*, *CD133*, *NANOG*, *SOX2*, and *POU5F1* normalized to 28S rRNA level (unpaired t-tests; \*\*\* *p* < 0.0001).

To investigate whether similar regulation could be observed in primary GBM, primary brain tumor samples from three different patients (patient no. 428, 458, 624) were analyzed for ELK-1 expression in CD133− vs. CD133+ cells. Although there was variability between samples, in all three GBMs, CD133+ cells expressed significantly more ELK-1 than CD133− cells (Figure 4D). This was parallel to our analysis of GBM cell lines, where tumorspheres of A172, T98G, and U87 GBM cells expressed significantly more Elk-1 protein than the monolayer cultures did, whereas ELK-1 expression level did not alter significantly in SH-SY5Y tumorsphere vs. monolayer cultures (data not shown). Furthermore, when endogenous ELK-1 was silenced in the primary tumor culture of patient 624 (middle level of ELK-1 expression), *CD133*, *NANOG*, *SOX2*, and *POU5F1* levels were all downregulated as compared to scramble RNA control (Figure 4E).

#### *3.4. Effect of Elk-1 Expression on Colony Formation of SK-N-BE (2) Cells on Soft Agar*

The ability of transformed cells to grow in anchorage-free conditions is one of the hallmarks of cancer formation, and soft agar colony assay is a commonly used tool to assay for this feature [30]. It was shown in endometrial tumors, for instance, that CD133+ cells exhibited higher colony formation than CD133− cells in soft agar assay [31]. We have, therefore, addressed whether the same scenario was true for CD133+/CD133− SK-N-BE (2) cells, and whether overexpression of Elk-1-VP16 or silencing of endogenous Elk-1 would affect the number of colonies. To that end, we have sorted SK-N-BE (2) cells into CD133+ BTICs and CD133− cells, and both CD133+ and CD133− spheroids were grown in IPM culture conditions for three days in limiting dilution assay (LDA), and the frequency of spheroid formation was found to be almost tenfold more in CD133+ BTIC cells, indicating that sorting of cells was successful.

Next, the effects of Elk-1 overexpression or silencing were studied; to that end, we have transfected CD133− cells with Elk-1-VP16 expression vector, while CD133+ cells were transfected with siElk-1 silencing vector as described in Materials and Methods, and colony formation frequencies were determined in soft agar assay. In untransfected SK-N-BE (2) cells, CD133− cells and unsorted cells showed a similar number of colonies (24 ± 11 vs. 25 ± 11, respectively), whereas CD133+ BTICs had almost 50% more colonies formed (33 ± 9 colonies). CD133− cells transfected with either pCDNA3- Elk-1-VP16 (37 ± 10 colonies) or pCMV6-Flag-Elk-1-VP16 (50 ± 15 colonies) showed higher colony number than their counterparts transfected with empty vectors, pCDNA3.1 (32 ± 3 colonies) or pCMV-Flag (24 ± 10 colonies). On the other hand, CD133+ cells where endogenous Elk-1 was silenced by RNAi exhibited a decreased colony number (18 ± 5) compared to scrambled RNA control (26 ± 9) (Supplement Table S4).

#### *3.5. Regulation of NANOG, POU5F1, and SOX2 Promoters by Elk-1*

To assess whether the regulation of these genes by Elk-1 was direct or indirect, the promoters for *NANOG*, *POU5F1*, and *SOX2* were cloned to luciferase reporter vectors and tested for Elk-1 regulation in different cell lines.

Initially, SK-N-BE (2) (Figure 5A) and SH-SY5Y (Figure 5B) neuroblastoma cells and U87-MG (Figure 5C), A172 (Figure 5D), and T98G (Figure 5E) GBM cells were either transfected with constitutively active Elk-1-VP16 and/or dominant-negative Elk-1-EN fusion protein expression vectors for overexpression (i), or with siElk-1 or scrRNA vectors for silencing (ii) experiments to study the regulation of *SOX2* promoter by Elk-1 protein (Figure 5). Although there appear to be cell type-specific variations, *SOX2* promoter appeared to be upregulated upon Elk-1-VP16 overexpression in all cell types (Figure 5Ai– Ei), whereas only SH-SY5Y and U87 cells exhibited downregulation of SOX2-dependent luciferase activity upon the silencing of endogenous Elk-1, indicating that other proteins are involved in the regulation of this promoter (Figure 5Bii,Cii).

**Figure 5.** *SOX2* promoter activity analysis with respect to (**i**) Elk-1 variants over-expression and (**ii**) endogenous Elk-1 silencing in (**A**) SK-NBE (2), (**B**) SH-SY5Y, (**C**) U87-MG, (**D**) A172, and (**E**) T98G cell lines. Luminometric measurements were normalized to *Renilla*-luc activity. ANOVA, Tukey's multiple comparative tests, \*\* *p* < 0.01, \*\*\* *p* < 0.001, \*\*\*\* *p* < 0.0001 for Ai, Ci; unpaired two-tailed t-test, \* *p* < 0.5, \*\* *p* < 0.01, \*\*\*\* *p* < 0.0001 was done for Bi, Bii, Cii, Di, Ei.

We next studied *NANOG* promoter; while *SOX2* promoter was found to have 1 consensus Elk-1 binding motif with dissimilarity score (DS) of less than 1%, and 5 ets motifs with DS 1–10%, *NANOG* promoter was found to contain three consensus ets motifs (Figure 6A), two of which had DS of 1–10% (Table 2). We have constructed a wildtype *NANOG* promoter reporter vector (*NANOG*-Luc), and one where the higher similarity consensus ets motif (ets1) was deleted (*NANOG*∆-Luc), and studied the regulation of this promoter by Elk-1 in different cell lines (Figure 6). *∆*

*∆ Δ Δ Δ* **Figure 6.** Regulation of *NANOG* promoter by Elk-1. (**A**) Schematic diagram of predicted *ets* motifs *ets1-3* on *NANOG* promoter. *Ets1* was predicted to be a stronger binding motif for Elk-1 and was deleted to generate *NANOG*∆*-*Luc reporter plasmid. (**B**) Luciferase assay for (**i**) wildtype *NANOG*-Luc and (**ii**) *NANOG*∆-Luc reporters in SK-N-BE (2) cells after transfection of expression plasmids with Elk1-VP16, Elk1-EN, or empty control plasmid pCDNA3.1 (**left graphs**) or co-transfection of silencing plasmids for scrRNA control or siElk-1 (**right graphs**). Luminometric measurements were normalized to Renilla-luc activity. ANOVA, Tukey's multiple comparative tests, \*\*\* *p* < 0.001 for (**i**) and (**ii**) left graphs; unpaired two-tailed t-test, \*\* *p* < 0.01 was done for (**i**) and (**ii**) right graphs. (**C**) Luciferase assay for (**i**) wildtype *NANOG*-Luc and (**ii**) *NANOG*∆-Luc reporters in SH-SY5Y cells after transfection of expression plasmids with Elk1-VP16, Elk1-EN or empty control plasmid pCDNA3.1 (**left graphs**) or co-transfection of silencing plasmids for scrRNA control or siElk-1 (**right graphs**). Luminometric measurements were normalized to *Renilla*-luc activity. ANOVA, Tukey's multiple comparative tests, \*\*\* *p* < 0.001 for (**i**) and (**ii**) left graphs; unpaired two-tailed t-test; \* *p* < 0.5, \*\*\* *p* < 0.001 for (**i**) and (**ii**) right graphs. (**D**) Luciferase assay for (**i**) wildtype *NANOG*-Luc and (**ii**) *NANOG*∆-Luc reporters in U87-MG cells after transfection of expression plasmids with Elk1-VP16, Elk1-EN, or empty control plasmid pCDNA3.1 (**left graphs**) or co-transfection of silencing plasmids for scrRNA control or siElk-1 (**right graphs**). Luminometric measurements were normalized to *Renilla*-luc activity. ANOVA, Tukey's multiple comparative tests, \*\* *p* < 0.01, \*\*\* *p* < 0.001 for (**i**) and (**ii**) left graphs; unpaired *t*-test; \*\* *p* < 0.01 for (**i**) and (**ii**) right graphs.

*∆*

*∆*

*∆*

*∆*

Elk-1-VP16 overexpression in SK-N-BE (2) cells resulted in upregulation from wildtype *NANOG* promoter, but the upregulation was slightly less in *NANOG*∆-Luc reporter; Elk-1-EN repressed both promoter activities to control levels (Figure 6Bi vs. Figure 6Bii). Parallel to this, when Elk-1 was silenced using siElk-1, *NANOG*-Luc reporter activity was decreased (Figure 6Bi), whereas there was no significant change in NANOG∆-Luc activity in SK-N-BE (2) cells (Figure 6Bii). On the other hand, there was no significant difference between *NANOG* vs. *NANOG*∆ promoter activity by Elk-1-VP16 overexpression in SH-SY5Y or U87 cells, while Elk-1-EN repressed both wildtype and mutant promoter activities (Figure 6C,D). There was a slight albeit significant increase in *NANOG* promoter activity in siElk-1 SH-SY5Y cells, whereas *NANOG*∆ promoter activity was decreased upon siElk-1 silencing (Figure 6Ci vs. Figure 6Cii); in U87 silencing, endogenous Elk-1 did not significantly alter wildtype *NANOG*-Luc activity but resulted in a decrease in *NANOG*∆- Luc (Figure 6Di vs. Figure 6Dii). There was no significant change in either Elk-1-VP16 overexpression or siElk-1 silencing in A172 and T98G cells (data not shown).

In *POU5F1* promoter, of the four predicted ets motifs, three of them were predicted to have DS score of 1-10% DS (Table 2; Figure 7A). Wildtype *POU5F1* promoter was cloned, and deletion constructs for these motifs (ets1-ets3) were generated for luciferase reporter assays as described in Materials and Methods. When wildtype *POU5F1* promoter activity was compared to deletion constructs in SK-N-BE (2) cells transfected with Elk-1- VP16 expression plasmid, *POU5F1*∆*ets2*-Luc deletion construct exhibited less upregulation (around 2.4 units) than wildtype, *POU5F1*∆*ets1*, and *POU5F1*∆*ets3* promoters (around 3 units), while Elk-1-EN overexpression resulted in similar level of activation to control in all cases (Figure 7B). On the other hand, siElk-1 silencing did not result in a significant change in wildtype *POU5F1* promoter activity or the *POU5F1*∆*ets3* deletion mutant, whereas it resulted in a decrease in luciferase activity in both *POU5F1*∆*ets1* and *POU5F1*∆*ets2* constructs in SK-N-BE (2) cells (Figure 7B).

Elk-1-VP16 overexpression upregulated wildtype *POU5F1*-Luc reporter activity, while Elk-1-EN repressed it in SH-SY5Y cells; there was no significant change in this profile in either of the three ets deletion constructs, indicating the regulation might be through a different motif or could be indirect (Figure 7C). Interestingly, siElk-1 silencing upregulated wildtype *POU5F1*-Luc and *POU5F1*∆*ets1*-Luc reporter activity, while decreasing *POU5F1*∆*ets2*-Luc and *POU5F1*∆*ets3*-Luc reporter activity (Figure 7C). In U87-MG GBM cells, however, wildtype *POU5F1*-Luc and *POU5F1*∆*ets2*-Luc reporters were upregulated to similar levels in Elk-1-VP16 overexpression (1.2 units in Figure 7Di and 1.5 units in Figure 7Diii), while *POU5F1*∆*ets1*-Luc was upregulated more (2.4 units, Figure 7Dii), and upregulation was significantly less in *POU5F1*∆*ets3*-Luc reporter (Figure 7Div). Elk-1-EN overexpression did not significantly alter promoter activity (Figure 7D), and while siElk-1 silencing did not cause any change in wildtype promoter, it resulted in a downregulation in all deletion constructs to a different extent (Figure 7D). Wildtype *POU5F1*-Luc promoter was upregulated by Elk-1-VP16 overexpression in both A172 and T98G cells, although siElk-1 silencing did not significantly change with respect to scrambled RNA control.

*∆*

*∆ ∆*

*∆*

*∆*

*∆*

*∆*

∆ *∆ ∆ ∆ ∆ ∆* ∆ ∆ **Figure 7.** Regulation of *POU5F1* promoter by Elk-1. (**A**) Schematic diagram of predicted *ets* motifs *ets1-4* on *POU5F1* promoter. Motifs e*ts1-3* were predicted to be stronger binding motifs for Elk-1 and were individually deleted to generate *POU5F1*∆e*ts1-*Luc, *POU5F1*∆e*ts2-*Luc, and *POU5F1*∆e*ts3-*Luc reporter plasmids. (**B**) Luciferase assay for (**i**) wildtype *POU5F1-*Luc and its deletion mutant reporters (**ii**) *POU5F1*∆e*ts1-*Luc, (**iii**) *POU5F1*∆e*ts2-*Luc, and (**iv**) *POU5F1*∆e*ts3-* Luc in SK-N-BE (2) cells after transfection of expression plasmids with Elk1-VP16, Elk1-EN, or empty control plasmid pCDNA3.1 (**left graphs**) or co-transfection of silencing plasmids for scrRNA control or siElk-1 (**right graphs**). Luminometric measurements were normalized to Renilla-luc activity. ANOVA, Tukey's multiple comparative tests, \*\* *p* < 0.01, \*\*\* *p* < 0.001 for left graphs (**i**–**iv**); unpaired two-tailed t-test, \*\* *p* < 0.01 and \*\*\* *p* < 0.001 for right graphs (**i**–**iv**). **C.** Luciferase assay for (**i**) wildtype *POU5F1-*Luc and its deletion mutant reporters (**ii**) *POU5F1*∆e*ts1-*Luc, (**iii**) *POU5F1*∆e*ts2-*Luc, and (**iv**) *POU5F1* e*ts3-*Luc in SH-SY5Y cells after transfection of expression plasmids with Elk1-VP16, Elk1-EN, or empty control plasmid pCDNA3.1 (**left graphs**) or co-transfection of silencing plasmids for scrRNA control or siElk-1 (**right graphs**). Luminometric measurements were normalized to Renilla-luc activity. ANOVA, Tukey's multiple comparative tests, \* *p* < 0.5, \*\* *p* < 0.01, \*\*\* *p* < 0.001 for left graphs (**i**–**iv**); unpaired two-tailed t-test, \* *p* < 0.5, \*\* *p* < 0.01 for right graphs (**i**–**iv**). (**D**) Luciferase assay for (**i**) wildtype *POU5F1-*Luc and its deletion mutant reporters (**ii**) *POU5F1*∆e*ts1-*Luc, (**iii**) *POU5F1*∆e*ts2-*Luc, and (**iv**) *POU5F1*∆e*ts3-*Luc in U87-MG cells after transfection of expression plasmids with Elk1-VP16, Elk1-EN, or empty control plasmid pCDNA3.1 (**left graphs**) or co-transfection of silencing plasmids for scrRNA control or siElk-1 (**right graphs**). Luminometric measurements were normalized to Renilla-luc activity. ANOVA, Tukey's multiple comparative tests, \* *p* < 0.5, \*\* *p* < 0.01, \*\*\* *p* < 0.001 for left graphs (**i**–**iv**); unpaired two-tailed t-test, \*\* *p* < 0.01, \*\*\* *p* < 0.001 for right graphs (**i**–**iv**).

#### *3.6. Binding of Elk-1 to Predicted ets Motifs on SOX2, NANOG, and POU5F1 Promoters*

Elk-1-VP16 overexpression was found to upregulate expression of *SOX2*, *NANOG*, and *POU5F1* expression in qPCR analysis, and wildtype promoter luciferase reporters were found to be upregulated by Elk-1-VP16 in a cell context-dependent manner, yet deletion of predicted ets motifs did not significantly change reporter activities, indicating that either there are other ets motifs in distal promoters that are not cloned in this study, or that the regulation is not through direct Elk-1 binding to these predicted ets motifs. To address this second point, we have carried out chromatin immunoprecipitation (ChIP) experiments in SK-N-BE (2) neuroblastoma and T98G GBM cell lines (Figure 8).

The cells were transfected with pCMV-Flag-Elk-1 (empty pCMV was used as control), and immunoprecipitation was carried out using Flag agarose beads (Flag IP); IgG beads were used as control (IgG IP). Elk-1 binding motifs on *SRF* and *MCL*-1 promoters were used as a positive control for Elk-1 binding. All three of the predicted ets motifs on the *NANOG* promoter exhibited Elk-1 binding in SK-N-BE (2) cells (Figure 8A) but not on T98G cells (Figure 8B). Similarly, all four predicted ets motifs on *POU5F1* promoter showed Elk-1 binding, albeit to different extents, in SK-N-BE (2) cells (Figure 8A) but not on T98G cells (Figure 8B). Likewise, all five predicted ets motifs showed Elk-1 binding in SK-N-BE (2) cells (Figure 8A), whereas only the ets3 motif showed significant binding to Elk-1 in T98G (Figure 8B). This indicates that, while Elk-1 is capable of binding to these predicted motifs, this binding is affected by cell-dependent circumstances, which may be a transcriptional partner or posttranslational modification status of the Elk-1 protein in that particular cell type.

150

#### **4. Discussion**

ETS transcription factors are involved in a number of biological processes in different tissues, and it was shown that in embryonic development, expression of several ETS proteins including Elf3 and SpiC increased after fertilization until the blastocyst stage, and silencing of ETS expression affected Oct3/4 gene expression [32]. It was shown in human pluripotent stem cells (hPSCs) with different X chromosome inactivation states (Xa, active, Xi, inactive) that Elk-1 overexpression mimicked XaXa in terms of decreased pluripotency, the differences being diminished in low oxygen [33].

One study has shown Elk-1 to be essential for human embryonic stem cells, and that it co-occupies promoters of genes in cell proliferation pathways with *ERK2*, and in the absence of *ERK2*, the promoters were repressed by Polycomb proteins [29]. In fact, Elk-1 was further found to be upregulated, while Nanog, Oct4, and Sox2 were found to be repressed during mesoderm differentiation of hESCs, and it was shown to bind to and activate promoters such as *EGR-1* while repressing a subset of promoters such as *FOSL1* [16]. Intriguingly, mice deficient for Elk-1 were viable albeit with mild neuronal impairment, indicating other Ets proteins may act redundantly and compensate for its embryonic functions [34]. During neuronal differentiation of mES cells, Sox2 chromatin interaction profiles were altered, and promoters of neuronal differentially expressed gene clusters were enriched in Elk-1, among other transcription factors [35]. Similarly, during reprogramming of fibroblasts into neural stem cells (NSCs) using pharmacological molecules, Elk-1 was found to be one of the transcription factors to regulate reprogramming, particularly through binding Sox2 promoter [36].

Another Ets protein, Pea3/ETV4, was shown to regulate Nanog and Oct4 expression in pluripotent NCCIT embryonic carcinoma cells [37,38]. Interestingly, members of the Pea3 subfamily of ETS proteins, ETV4 and ETV5, were found to be expressed in undifferentiated ES cells, and suppression of Oct3/4 was found to result in downregulation of their expression, and ETV4 and ETV5 were found to be important for proliferation of undifferentiated ES cells through regulation of stem cell-related genes such as Tcf15, Gbx2, and Zic3 [39]. A transcriptional partner of Elk-1, namely serum response factor (SRF), was shown to repress the reprogramming induced by ERK pathway inhibition, and to negatively regulate pluripotency [40], which may be independent of Elk-1 interaction.

CD133 is a cell surface protein that has been used alone [12] or in combination with CD15 [41] to isolate and culture brain-tumor-initiating cells from a variety of tumors. ERK/MAPK pathway was shown to be required for CD133 expression [42], and HIF-1α was shown to bind to the CD133 promoter through Elk-1 [15], which is supported in our study by overexpression of Elk-1 in CD133+ BTIC subpopulation.

In a genome-wide study in the human embryonic stem cell (hESC) population, ELK1 was found to be essential for hESCs, and some of the promoters bound by ELK1 were determined to be important in the maintenance of embryonic identity, spinal cord development, and neuron fate development [29]. Furthermore, induced neural stem cells were found to contain relatively high levels of phosphorylated Elk-1, along with Gli2, and both were shown to bind to *Sox2* promoter upon neural reprogramming [36], and distinct GABPA/Elk-1 motifs were found in Sox2 promoter, identified as a neuronal cluster gene involved in differentiation of embryonic stem cells to neuronal precursors [35]. It is intriguing whether tumorigenesis reactivates this mechanism in a cell context-dependent manner.

#### **5. Conclusions**

We propose that not only does ELK1 present a novel target for tumor therapy directed at eliminating BTIC population, but also can be used as a molecular diagnostic molecule to identify potential for tumor recurrence. It should be noted, however, that posttranslational modifications such as phosphorylation and SUMOylation regulate ELK1 protein, which can differ among gliomas and must be studied in more detail.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/xxx/s1, supppl1: Biological processes among enriched pathways that were upregulated in Elk-1-VP16 transfected SH-SY5Y cells, Table S2: Biological processes among enriched pathways that were downregulated in Elk-1-VP16-transfected SH-SY5Y cells, Table S3: Transcription factor binding site analysis for promoters of genes identified in microarray analysis, Table S4: Soft agar assay colony formation assay.

**Author Contributions:** Conceptualization, I.A.K.; data curation, U.D.; formal analysis, M.S.S., C.V., B.K., U.D., S.M., G.G., and I.A.K.; funding acquisition, I.A.K.; investigation, M.S.S., C.V., B.K., U.D., and S.S.; methodology, M.S.S., B.K., S.M., S.S., G.G., and K.Y.A.; project administration, B.Y. and I.A.K.; resources, S.S. and I.A.K.; software, G.G.; supervision, K.Y.A., B.Y., and I.A.K.; writing—original draft, M.S.S., C.V., B.K., and S.S.; writing—review and editing, M.S.S., C.V., S.S., K.Y.A., and B.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by TUBITAK project no 211T167 as part of the COST TD901 Action, and TUBITAK project no 115Z804 as part of COST TN1301 Action.

**Institutional Review Board Statement:** Human GBM samples were obtained from consenting patients, as approved by the Hamilton Health Sciences/McMaster Health Sciences Research Ethics Board (REB# 07-366; approved 2007, last updated September 2020).

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Transcriptomics raw data have been uploaded to and available at EBI ArrayExpress, with accession number of E-MTAB-9938. The rest of the data are available upon request.

**Acknowledgments:** We would like to thank members of the Kurnaz and Singh laboratories for helpful discussions and Eray Sahin for initial data analysis of the Elk-1-VP16 microarray with demo IPA software. IAK is a GYA member and GG is a YOK 100/2000 student.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Review* **Current State of "Omics" Biomarkers in Pancreatic Cancer**

**Beste Turanli 1,† , Esra Yildirim 1,†, Gizem Gulfidan <sup>1</sup> , Kazim Yalcin Arga 1,2,\* and Raghu Sinha 3,\***

	- † These authors contributed equally to this work.

**Abstract:** Pancreatic cancer is one of the most fatal malignancies and the seventh leading cause of cancer-related deaths related to late diagnosis, poor survival rates, and high incidence of metastasis. Unfortunately, pancreatic cancer is predicted to become the third leading cause of cancer deaths in the future. Therefore, diagnosis at the early stages of pancreatic cancer for initial diagnosis or postoperative recurrence is a great challenge, as well as predicting prognosis precisely in the context of biomarker discovery. From the personalized medicine perspective, the lack of molecular biomarkers for patient selection confines tailored therapy options, including selecting drugs and their doses or even diet. Currently, there is no standardized pancreatic cancer screening strategy using molecular biomarkers, but CA19-9 is the most well known marker for the detection of pancreatic cancer. In contrast, recent innovations in high-throughput techniques have enabled the discovery of specific biomarkers of cancers using genomics, transcriptomics, proteomics, metabolomics, glycomics, and metagenomics. Panels combining CA19-9 with other novel biomarkers from different "omics" levels might represent an ideal strategy for the early detection of pancreatic cancer. The systems biology approach may shed a light on biomarker identification of pancreatic cancer by integrating multi-omics approaches. In this review, we provide background information on the current state of pancreatic cancer biomarkers from multi-omics stages. Furthermore, we conclude this review on how multi-omics data may reveal new biomarkers to be used for personalized medicine in the future.

**Keywords:** pancreatic cancer; systems biology; omics; biomarker; genomics; transcriptomics; proteomics; metabolomics; glycomics; metagenomics; personalized medicine

#### **1. Introduction**

Pancreatic cancer is one of the most fatal malignancies and the seventh leading cause of cancer-related deaths considering both sexes worldwide according to the latest global cancer statistics reported in 2018 [1]. Pancreatic cancer has a difficult diagnosis at an early stage and a 5 year survival rate of 10% at the time of diagnosis in the United States, where the poor survival rates have hardly changed for almost 40 years since most patients reporting to the hospital have either unresectable or metastatic disease. Only 10.8% of these patients are at a locally advanced stage at the time of diagnosis [2,3]. Unfortunately, pancreatic cancer is projected to become the third leading cause of cancer deaths in the future [1].

It is a great challenge to intervene at the early stages of pancreatic cancer that is in initial diagnosis or postoperative recurrence because of the difficulties in early diagnosis and inadequacy in precise prognostic biomarkers, and this challenge may result in undesirable overdiagnosis and/or overtreatment, causing the high mortality rate [4–7].

Pancreatic cancer can be divided into two large groups; (a) endocrine pancreatic tumors, including gastrinoma, glucagonoma, and insulinoma, and (b) exocrine (nonendocrine) pancreatic tumors, including adenoma, ductal adenocarcinoma, acinar cell

**Citation:** Turanli, B.; Yildirim, E.; Gulfidan, G.; Arga, K.Y.; Sinha, R. Current State of "Omics" Biomarkers in Pancreatic Cancer. *J. Pers. Med.* **2021**, *11*, 127. https://doi.org/ 10.3390/jpm11020127

Academic Editor: Raghu Sinha Received: 22 January 2021 Accepted: 11 February 2021 Published: 14 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

carcinoma, cystadenocarcinoma, adenosquamous carcinoma, signet ring cell carcinoma, hepatoid carcinoma, colloid carcinoma, undifferentiated carcinoma, pancreatoblastoma, and pancreatic mucinous cystic neoplasm [8,9]. Most of the pancreatic cancers are exocrine types—namely, ductal adenocarcinoma, which comprises 80–90% of all pancreatic cancers; whereas endocrine (neuroendocrine) pancreatic tumors are rare with 1–2% of all pancreatic cancers [7].

Moreover, pancreatic neoplasms can be categorized by their gross appearance as solid, cystic, or intraductal. The solid pancreatic tumors contain pancreatic ductal adenocarcinoma (PDAC), neuroendocrine (islet cell) neoplasms, acinar cell carcinomas, and pancreatoblastoma. The cystic types of pancreatic tumors tend to be less aggressive and include mucinous cystic neoplasms, serous cystadenoma, intraductal papillary mucinous neoplasms, and solid-pseudopapillary neoplasms [10]. Pancreatoblastoma is mostly observed in childhood, and it has a poor prognosis if an adult is diagnosed with it. Mucinous cystic neoplasms consist of a range from benign to malignant [7].

The World Health Organization (WHO) classifies the morphological variants of PDAC differently from the conventional pancreatic adenocarcinoma classification. These variants have different histological features besides molecular signatures and prognosis. According to WHO, the different subtypes of PDAC are adenosquamous carcinoma, colloid/mucinous carcinoma, undifferentiated/anaplastic carcinoma, signet ring cell carcinoma, medullary carcinoma, and hepatoid carcinoma [11].

Like most cancer types, pancreatic cancer has also several known risk factors, such as cigarette smoking, diabetes, obesity, lack of physical activity, and chronic pancreatitis [12,13]. Currently, computed tomography (CT), magnetic resonance imaging (MRI), endoscopic ultrasound (EUS), positron emission tomography (PET), and other imaging methods are used in the diagnosis and prognosis of pancreatic cancer [12–14].

Unsurprisingly, early detection of PDAC by effective screening approaches is crucial to improve a better prognosis of the disease. The absence of clinical symptoms in the early stage of pancreatic cancer could lead to a delay in confirmed diagnosis even though tumor biomarkers and imaging techniques are being developed. Therefore, using circulating biomarkers for primary screening and its combination with imaging and histopathologic results might be the future strategy for diagnosing PDAC. Candidate circulating biomarkers in PDAC are not limited to circulating tumor cells (CTC) but also consist of metabolites, cellfree DNA and non-coding RNA, exosomes, autoantibodies, and inflammatory or growth factors, which are recently summarized [15]. The presence of CTCs in the blood usually correlates with the systemic spread of the tumor, and the characteristics of these CTCs could be used as potential biomarkers. Moreover, the challenging tasks of CTC isolation and detection are being overcome [16,17], and the emerging area of profiling CTCs has been recognized in prognosis of pancreatic cancer [18].

Sample source is very critical in the identification of biomarkers for the detection and diagnosis of early-stage pancreatic cancer [19]. The pancreas is located in the back of the abdomen and is surrounded by the stomach, small intestine, liver, and spleen, so it becomes a big challenge in getting a biopsy. The most common way to get pancreatic tumor samples is by fine-needle aspiration (FNA). However, a core needle biopsy using a larger needle than an FNA can provide a larger sample, often useful for molecular profiling. These biopsies can be taken with an EUS. Other biopsy types, like brush biopsy or forceps biopsy, can be done during an endoscopic cholangiopancreatography (ERCP). However, body fluids such as blood, cyst fluid, pancreatic juice, bile, as well as urine are characteristically enriched with biomarkers that can be a potential source of diagnostic, predictive, and/or prognostic biomarkers in PDAC. As a source of pancreatic cancer biomarker, saliva has also been used. In omics biomarker studies, blood is a frequently preferred sample source due to its easy accessibility, noninvasiveness, and cost-effectiveness [20]. As an alternative rich source for the discovery of biomarkers, pancreatic juice has recently been identified. Pancreatic juice contains pancreatic cancer-specific markers such as DNA, RNA, proteins, and cancer cells, but the collection procedure for this sample source is invasive [19]. Although urine

contains limited protein, DNA, and RNA, it can be considered as an ideal source sample for proteomic and genomic biomarkers [21]. Furthermore, accurate staging is very important for providing appropriate treatment. The majority of the time, surgical excision is used for treatment, and traditional chemoradiotherapy has very restricted effectiveness, despite the development of novel therapy options [7]. In this review, we present a systems-level outlook of PDAC biomarkers from different "omics" levels (Figure 1) as well as a comprehensive overview of methodology and sampling used in biomarker studies for PDAC (Table 1).

**Figure 1.** A conceptual review of pancreatic cancer biomarkers from a variety of "omics" levels.

#### **2. Recent Insights from Different Omics Levels**

Despite the substantial advancement in pancreatic cancer research, there has not been any remarkable reduction in the mortality-to-incidence ratio. This is mainly a result of the limited early diagnostic characteristic symptoms and reliable biomarkers, besides the unresponsiveness to the treatments due to the tumor heterogeneity, plasticity, and the aggressive metastasis that presents in more than 50% of the diagnosed patients [22].

Systems biology studies of pancreatic cancer rely on the integration of omics data from different biological levels. With the frequently arising challenges regarding cancer diagnosis and treatment—mainly due to its complex pathogenic landscape and cellular heterogeneity—the holistic view provided by the systems biology approach allowed for having a global understanding of the mechanisms of the disease and gaining more insight toward diagnostic or prognostic biomarkers and drug target discovery [23,24].

Likewise, systems biology also augments current diagnosis and therapy options. Aggressiveness and chemoresistance of PDAC are caused by the desmoplastic reactions induced by immune cells, stromal cells, neural cells, and the extracellular matrix surrounding and forming the bulk of the tumor mass. Therefore, single-cell sequencing may shed a better insight into cellular differences. Moreover, altered metabolism is caused by limited delivery of the needed oxygen and nutrients in such a hypoxic and acidic microenvironment; a direct impact on the drug delivery mechanisms is common [25,26].

#### **3. Genomic Signatures**

Next-generation sequencing (NGS) provides support for the early diagnosis and screening of PDAC as well as many other diseases. Genomics techniques may assist in the early diagnosis of pancreatic cancer in patients with specific alleles that predispose them to cancer development. Different potential biomarkers discovered by genomics

methods can be categorized as chromosomal aberrations, driver changes, single nucleotide polymorphisms (SNPs), or copy-number alterations.

Previous studies pointed out the most prominent genetic features of PDAC, such as oncogenic activation of K-RAS, which is a standard feature in more than 90% of the patients, and with the early onset mutation of that gene, it is considered a critical driver of PDAC initiation and progression [27]. Along with the oncogenic activation, inactivating mutations of the tumor suppressor gene CDKN2A/2B are also observed in more than 80% of the early-stage lesions, while later stages of PDAC exhibit inactivating mutations and deletions of tumor suppressor genes most prominently including TP53 and SMAD4 [28].

Metabolic reprogramming is considered a prominent hallmark of PDAC. Therefore, tackling this aggressive cancer might be possible through establishing a clear understanding regarding its metabolism in addition to genomics [29]. Recent studies have shown the crucial role of both glucose and glutamine metabolism in the progression of PDAC tumors that are regulated by the K-RAS oncogene to maintain tumor growth [30–32]. Inducible oncogenic K-RAS mouse model of PDAC showed—in addition to being a key driver of PDAC initiation—that it plays a central role in rewiring the tumor glucose metabolism by stimulating the glucose uptake and driving glycolysis intermediates toward nonoxidative pentose phosphate pathways [31]. It was also reported that the PDAC cells maintain the tumor growth by relying on the distinct pathway of glutamine metabolism and that this reprogramming is mediated by K-RAS [30].

Therefore, not only genomics biomarkers but also network reconstructions [33], including different omics levels, become an essential tool for exploring the disease under the systems biology perspective. Network models and computational platforms for integrating and analyzing these data, as well as investigating more thoroughly into these networks by simulations, are prominent efforts.

#### **4. Coding and Noncoding RNA Signatures of Pancreatic Cancer**

Initial transcriptome studies were performed for analysis of the mRNA profiles, which focused on protein-coding genes in PDAC. Thereafter, researchers compared gene expression levels between tumors and normal pancreas tissues and determined the genes with altered expression profiles in the disease state; this assisted in discovering potential diagnostic or prognostic biomarkers [34]. Over the years, microarray and RNAseq technology have been utilized not only to obtain coding but also non-coding RNA signatures. Although transcriptomic studies of non-coding RNAs are mainly focused on microRNAs (miRNA) and long non-coding RNAs (lncRNAs), other non-coding RNA types such as piwi interacting RNA (piRNAs), circular (circRNAs), small nucleolar RNA (snoRNA), and small nuclear RNA (snRNA) [35] are also promising biomarker candidates as they are quantitatively assessed, providing opportunities for noninvasive and early diagnosis of PDAC [20].

miRNAs involve in the expression of posttranscriptional regulatory mechanisms [36] and act as oncogenes or inhibit tumor suppressors in PDAC. Overexpression of the oncogene miRNAs (oncomir) increases in tumor progression, while tumor suppressors inhibit cell proliferation and induce apoptosis [37] by inactivating TP53, P16, and SMAD4 in PDAC [38]. miRNAs have the advantage of being stable in serum, hence these show remarkable potential as diagnostic biomarkers or a prognostic tool for noninvasive detection and convenient screening [39]. Therefore, the use of miRNA expression profiling has gained importance for the early detection of cancer [40,41].

Dysregulation of miRNAs in PDAC has been investigated not only in pancreatic tumors but also in blood samples, pancreatic juice, stool, urine, and saliva [39,42]. In several studies, the expression levels of miR-21, miR-155, and miR-196 have been reported to be upregulated in PDAC [43–46]. The higher concentration of miR-155 and miR-210 in the sera of pancreatic cancer patients as compared to normal healthy individuals has been proposed as a potential diagnostic marker in the early stages of pancreatic cancer [47,48]. Moreover, miR-155 and miR-21 were also found to have increased expression in pancreatic juices, while expressions are linked with histological progression characteristics [49]. In addition, the evaluation of more than 700 miRNAs in a study using blood samples compared between pancreatic cancer patients and healthy individuals emphasized miR-1290 as a promising biomarker [50]. Likewise, multiple studies have proposed not only miR-21, miR-155, miR-196, and miR-1290 but also miR-200, miR-18a, miR-210, miR-192, miR-22, miR-642b, miR-885-5p, and miR-375 as candidate biomarkers for PDAC patients [47,51–55]. Another comparison between cancer patients and healthy individuals clearly showed a distinct miRNA expression profile that included upregulation of miR-21, miR-23a, miR-31, miR-100, miR-143, miR-155, miR-2214, and downregulation of miR-148a, miR-375, and miR-217 [43].

The combination of various biomarkers such as CA19-9 with miR-16 and miR-196a provoked distinct improvement to distinguish between PDAC patients and healthy controls [56]. Similarly, the miR-27a-3p expression profile coupled with CA19-9 differentiated PDAC patients and healthy controls with a sensitivity and specificity of more than 80% [57,58]. Among diagnostic features of miRNAs, poor survival in PDAC patients was determined regarding overexpression of miR-221/222 and miR-744 levels in tumor tissue and plasma, respectively, as well as low-expression levels of miR-218 and miR-494 in tumor tissue [59–62].

In addition to microRNAs, other non-coding RNAs—such as long non-coding RNAs (lncRNAs), small nuclear RNAs (snRNAs), or circular RNAs (circRNAs)—have also been identified that might have potential as diagnostic or prognostic markers for PDAC. Long non-coding RNAs (lncRNAs) consist of more than 200 nucleotides, and some of them are circulating in body fluids which makes them promising markers for disease detection [63]. Although the biological functions of lncRNAs are not fully understood, the expression of lncRNAs (HOTAIR, MALAT-1, GAS5, MEG3, HULC, BC008363, and HSATII) showed significant alterations in pancreatic cancer cell lines. Besides, HOTAIR and PVT1 had higher concentrations in saliva in PDAC patients than saliva taken from healthy individuals. Therefore, these lncRNAs in saliva offer a potential noninvasive detection method for PDAC [35]. To date, U2snRNA, which is overexpressed in PDAC, has been the only reported snRNA biomarker in PDAC patients [64].

Circular RNAs (circRNAs), as another type of non-coding RNAs, have drawn increased attention through their regulatory roles in cancer. Generally, these are generated from precursor mRNA (pre-mRNA) by canonical splicing and head-to-tail back splicing, which makes them circular. Moreover, their structure without a polyA tail makes circRNAs favorably insensitive to ribonuclease and more desirable as clinically useful biomarkers. These function as miRNA sponges and overwhelm the ability of the miRNA to bind its mRNA targets [65]. Therefore, the associations of miRNAs and circRNAs with their potential regulatory role were also investigated in PDAC. For instance, hsa\_circ\_0005785 is potentially able to bind miR181a and miR181b as "oncomiRs" in pancreatic cancer, while miR-181a plays a critical role in regulating cancer growth and migration [66]. In another study, two upregulated circRNAs (hsa\_circ\_0001946, hsa\_circ\_0005397) and five downregulated circRNAs (hsa\_circ\_0006913, hsa\_circ\_0000257, hsa\_circ\_0005785, hsa\_circ\_0041150, and hsa\_circ\_0008719) were proposed as biomarkers after microarray analysis. They also validated the expression pattern of the above seven proposed circRNAs via qRT-PCR in PDAC tissues and adjacent normal tissues [67]. More recently, circRNAs expression in PDAC was explored by comparing PDAC tissues versus normal tissues by using microarray again. As a result, 256 differentially expressed circRNAs and 20 differentially expressed miRNAs were proposed to be associated with PDAC development [68].

Seimiya and coworkers [69] applied circular RNA-specific RNA sequencing and determined more than 40,000 previously unknown circRNAs that were altered in PDAC. Their research resulted in a novel circRNA, named circPDAC RNA, with no peptide production but the aberrant expression in PDAC tissues as well as patient serum. Another recent study involving a 208-case cohort of patients with PDAC identified a novel circRNA, named circBFAR or hsa\_circ\_0009065. The expression of circBFAR correlated positively with the tumor-node-metastasis stage and was related to the poor prognosis of patients

with PDAC. Likewise, circBFAR knockdown dramatically inhibited the proliferation and motility of PDAC cells in vitro and their tumor-promoting and metastatic properties in the in vivo models [70]. A recent systematic review designating the roles of circRNAs in pancreatic and biliary tract cancers gathered detailed information and provided an understanding of the role of circRNAs in pancreatic cancer [71].

In recent studies, single-cell transcriptomics has paved the way to elucidate molecular biomarkers for early diagnosis of PDAC. Peng et. al. [72] found that a subset of ductal cells with unique proliferative features were associated with an inactivation state in tumorinfiltrating T cells, providing novel markers for the prediction of an antitumor immune response. EGLN3, MMP9, and PLAU have been reported as participating in PDAC carcinogenesis regarding dysregulated gene expression in malignant ductal cells [72]. In another single-cell RNA-sequencing study, sampling was from the mouse pancreas during the progression from preinvasive stages to tumor formation. While metaplastic cells were found to express two transcription factors, ONECUT2 and FOXQ1, the altered expression profiles of MARCKSL1, MMP7, and IGFBP7 were also observed, which could be accomplished as candidate markers for early detection of PDAC [72].

Consequently, findings provided by transcriptomic analysis of PDAC have been a valuable resource not only for deciphering the intra-tumoral heterogeneity and disease mechanism but also suggesting potential biomarkers for diagnosis, targeted therapy, or immunotherapy.

#### **5. Proteomic Signatures of Pancreatic Cancer**

Proteomics is a powerful approach that encompasses an extensive range involving the systematic analysis of protein structure, function, expression, protein–protein interactions, and posttranslational modifications [73]. Over many years, proteomics has been a key player for researchers to pinpoint biomarkers, which can be used as a tool for a faster disease diagnosis, prognosis, and enhanced treatment [74,75]. In terms of making contributions to clinical disease prediction, protein-based biomarkers are promising. The analysis and verification of unique protein biomarkers have been achieved by using highly sensitive and reliable mass spectrometry-based proteomics. Moreover, this technique is crucial in terms of querying protein modifications [20]. Numerous clinical specimens of pancreatic cancer such as pancreatic juice, pancreatic tumor tissue, pancreatic cyst fluid, urine, and plasma/serum have become targets for the proteomics field to dig into mechanisms of disease, improve novel biomarkers, and enhance drug development [76–78]. Identifying proteins or peptides detected in body fluids in cases of cancer might be useful for the early diagnosis of PDAC [78].

Sample type is a critical concern for the study of biomarkers. Since blood serum or plasma is convenient for periodic collections and includes a reproducible quantification, it is presumably the most preferred option. Although blood samples are easily accessible and noninvasive, the fundamental disadvantage of blood collection for the discovery of novel biomarkers is that not every protein carrying diagnostic potential is secreted into the bloodstream [79]. Investigation of the human pancreatic proteome has been done in patients with premalignant neoplasia, PDAC, and benign pancreatic disease. Although one of the most potent samples from the pancreas is the pancreatic juice, involving a high amount of proteins that might display the disease status, its collection is onerous since this procedure requires an endoscopy and cannulation of the pancreatic duct [80–86]. Collecting and conserving the intact tumor tissue and adjacent normal tissue is challenging due to the presence of digestive enzymes secreted by the pancreas. Nonetheless, pancreatic tissue is considered an excellent specimen for investigation of the pathological mechanisms underlying PDAC as well as for determining drug targets in virtue of its proximity to the lesion and its greater ingredient of tumor-related proteins [87]. Pancreatic cysts, which possess peculiarly stagnated fluids, are extensively seen as the most hopeful origin for the discovery of potential biomarkers since these tend to turn into pancreatic cancer [88]. In terms of urine, this is an effortlessly approachable biological specimen for biomarker

detection, and its proteins are generated from both glomerular filtration and kidney [89]. Due to their accessibility and noninvasiveness, various urinary protein biomarkers have been examined to improve clinical assays for the diagnosis of several cancer types. As yet, merely a restricted amount of proteomics studies have been carried out to investigate the urinary proteome [90].

A retrospective study using a comprehensive proteomic analysis of pancreatic juice and pancreatic cell line samples from PDAC patients demonstrated that regenerating Family Member 1 Beta (REG1B) and syncollin (SYCN) could represent potential PDAC biomarkers [84,91]. Sogawa et al. [92] carried out a comparative proteomics analysis using a tandem mass tag (TMT) labeling and demonstrated that C4b-binding protein α-chain (C4BPA) is a novel serum biomarker for the early diagnosis of PDAC as well as for discrimination between PDAC and other gastroenterological cancers. Based on the results of a combinatorial proteomics strategy, Yoneyama et al. [93] indicated that insulin-like growth factor-binding proteins, IGFBP2 and IGFBP3, are compensatory biomarkers that can allow more accuracy through the combination with CA19-9 for the early detection of PDAC. In an MS-based proteomic study, Guo et al. [94] have demonstrated that dysbindin as a potential biomarker improved the accuracy of diagnosis in distinguishing PDAC from other pancreatic diseases. In a recent study, Cohen et al. [95] observed that the combination of testing circulating tumor DNA (ctDNA) with protein biomarkers (CA19-9, CEA, hepatocyte growth factor (HGF), and osteopontin) shows better performance than the CA19-9 test alone to distinguish PDAC from healthy controls. The improved accuracy of the biomarker panel—which is composed of a gold standard biomarker CA19-9, tissue factor pathway inhibitor (TFPI), and an isoform of tenascin C (TNC- FNIII-B)—in the differentiation of early-stage PDAC from different diseases was also demonstrated in a clinical cohort study [96]. In addition, Capello et al. [97] reported that the combination of TIMP1, LRG1, and CA19-9 performed better diagnostic accuracy than CA19-9 alone in differentiating early-stage PDAC from benign PDAC. Kim et al. [98] identified another biomarker panel that has high plasma THBS-2 and CA19-9 concentrations, which showed a remarkable differentiation ability between PDAC and healthy patients with 87% sensitivity and 98% specificity. The clinical significance of serum survivin was also reported in PDAC patients [99].

The pancreatic ductal fluid has been proposed as a good biological fluid for identifying prognostic biomarkers [100]. Focusing on the content of the ductal fluid, high concentrations of mucins and S100A8 or S100A9 were associated with the low survival rate in PDAC [100]. Ger et al. [101] recently investigated the proteome of 37 samples from pancreatic cancer and healthy subjects and identified that FLT3 and PCBP3 are promising prognostic biomarkers of pancreatic cancer.

Targeted proteomics is a rapidly evolving technological tool that conceptually represents an important advancement in alleviating the bottleneck in the preclinical biomarker assessment processes. In a targeted proteomics pilot study [102], five pancreatic cancer biomarker candidates—including 14-3-3 protein sigma, gelsolin, lumican, transglutaminase 2, and tissue inhibitor of metalloproteinase 1—were investigated in 60 plasma samples using a simple and robust selected reaction monitoring (SRM) multiplexed assay. Their results showed that gelsolin, lumican, and tissue inhibitor of metalloproteinase 1 have better area under curve (AUC) values than CA19-9 to discriminate pancreatic cancer from healthy controls and chronic pancreatitis controls. Yoneyama and colleagues [103] developed a quantification method specific for α-fibrinogen hydroxylated at proline residues 530 and 565 by SRM/multiple reaction monitoring (SRM/MRM). To validate these modifications as pancreatic cancer biomarkers, they quantified these posttranscriptional modifications in plasma samples from 70 pancreatic cancer patients and 27 healthy controls. They demonstrated that the plasma concentration of proline-hydroxylated α fibrinogen is significantly greater in pancreatic cancer patients.

In light of the rapidly developing accuracy and efficiency of proteomic approaches, our knowledge of the underlying molecular mechanism of pancreatic cancer has greatly

increased [104,105]. However, there are still various limitations and analytic challenges that have resulted from the dynamic nature of the proteome of tissues and cells and the variation in the forms and functions of proteins due to several modifications [106]. Although several standardizations and improvements are required, proteomics is certainly a promising approach for the early diagnosis, prognosis, and discovery of targets for the treatment of pancreatic cancer.

#### **6. Metabolomic Signature of Pancreatic Cancer**

Metabolomics or metabolite profiling is a novel promising approach for the identification of robust biomarkers for diagnosis, prognosis, and assessment of treatment in pancreatic cancer [107–111]. Although there is currently no clinically validated metabolic biomarker that can help to provide early diagnosis of pancreatic cancer, the number of studies focusing on metabolic profiling and phenotyping of pancreatic cancer is increasing drastically [111–114]. As compared to other omics technologies, metabolic phenotyping is a sensitive indicator due to rapid and more precise results for new biomarker discovery [115]. The largest case-control study to discover a blood-derived metabolic biomarker signature that enables one to distinguish PDAC from chronic pancreatitis (ChP) was conducted by Mayerle et al. [114]. They investigated metabolomic profiles of plasma and serum samples from 914 subjects (patients with PDAC, ChP, liver cirrhosis, healthy, and non-pancreatic disease control), and a tumor biomarker signature (nine metabolites and additionally CA 19-9) was identified for differential diagnosis between PDAC and ChP with an AUC of 0.96. In a retrospective study investigating tissue metabolomics from 25 pancreatic cancer patients who had to undergo tumor resection surgery and gemcitabine-based adjuvant therapy, high lactic acid levels were observed in patients with poor clinical outcomes after gemcitabine therapy. Moreover, the combined evaluation of hENT1 with lactic acid showed superior performance in differentiating patients according to their overall survival [116]. In another study, Battini et al. [117] investigated tissue samples from 106 patients after PDAC resection to find metabolic biomarkers associated with long-term survival using metabolomic analysis methods. While the network analysis results revealed that higher levels of glucose, ascorbate, and taurine associated with long term survivors, decreased levels of choline, ethanolamine, glycerophosphocholine, phenylalanine, tyrosine, aspartate, threonine, succinate, glycerol, lactate, glycine, glutamate, glutamine, and creatine were estimated in long-term survivors. Due to the association of higher ethanolamine levels with worse survival, the metabolite with the highest accuracy in distinguishing between long-term and short-term survivors was ethanolamine.

An animal study was conducted to obtain metabolite profiling of pancreatic intraepithelial neoplasia (PanIN) and PDAC tissue samples from rats. They observed that the levels of kynurenate and methionine decreased in PDAC but increased in PanIN, demonstrating the potential of these metabolites to be biomarkers to differentiate PDAC from PanIN [116,118]. Laconti et al. identified that circulatory metabolite signatures can be used to differentiate animals with early-stage lesions with a diagnostic accuracy of 81.5% and 73.2% respectively [110].

Since the metabolic changes are quite important to detect and treat cancer regardless of the disease stage [119], genome-scale metabolic models (GEMs) might be a very helpful source to create and/or test the hypothesis for the elucidation of physiological mechanisms or novel biomarkers [120,121] so that GEMs can be used as a tool in both "top-down" and "bottom-up" methods in the context of biomarker discovery. GEMs have been employed for studying cancer metabolism utilizing either generic/personalized or tumor/cell-specific methods, which may translate into clinically relevant applications. They can also be used to identify drug targets leading to inhibition of cancer-related phenotypes or drug resistance in cancer therapy. Furthermore, the fortification of GEMs can be obtained via the integration of omics data like genomic, transcriptomic, and proteomic data, as well as the incorporation of regulatory molecules to the metabolism [122]. GEMs also provide valuable insight into the interaction between cancer cells and supporting cells in their niches as paving the way for whole-cell modeling [123,124].

In addition to all these, there are still some challenges in metabolomic studies. Whether significant changes in the metabolite level are due to the occurrence of the targeted disease, the use of non-confirmed metabolites with small sample size and the variability of patients' parameters would affect the accuracy and reliability of the results [125]. Therefore, further standardization and improvement of currently available metabolomics techniques is a prospective requirement for the designation of highly accurate biomarkers that will provide significant clinical benefits and may help to obtain new target signatures for accurate diagnosis, imaging, and possible therapeutic options [126,127].

#### **7. Glycomic Signatures of Pancreatic Cancer**

Cancer studies are performed mostly based on alterations in genome, transcriptome, proteome, and metabolome levels, with a relatively small number of studies in alterations in glycan compositions and/or structures and glycoproteins [128]. However, the glycan studies have been increasing day by day to identify potential glycan alterations and glycoprotein biomarkers for cancer owing to the developments in glycans profiling [129]. In cancer cells, alterations in carbohydrate structures of secreted proteins are functionally significant and may offer promising targets to develop potential diagnostic and therapeutic strategies [130–132].

Since pancreatic cancer does not indicate any noticeable symptom during the early stages, it is a very difficult cancer type to diagnose [131]. It is an important challenge to detect new diagnostic biomarkers for pancreatic cancer. The glycoproteome occurring after co-translational or posttranslational modifications (PTM) and its role in the mechanism of pathogenesis have not been explained completely in pancreatic cancer. Besides, the available information about glycoproteome in normal pancreas and pancreatic cancer is very limited [133,134].

Glycosylation—the covalent attachment of a glycan to protein, lipid, carbohydrate, or other organic molecules—is the most common and complex PTM of proteins and significantly affects the function of proteins. Glycosylation of proteins plays an important role in various biological functions, including immune response and cellular regulation. Abnormal glycosylation is accepted as a molecular characteristic of transformation into malignant tumors for many epithelial cancers, including PDAC. Therefore, targeting aberrant glycosylation associated with cancer would be a useful approach to improve accurate diagnosis and possibly therapeutic strategies [129,133].

Several studies were published about glycan alterations and glycoproteome in pancreatic cancer. Pan et al. investigated protein N-glycosylation in pancreatic tumor tissue compared to the normal pancreas and chronic pancreatitis tissue through a quantitative glycoproteomics approach using HPLC and MS. This study presented a set of glycoproteins having aberrant N-glycosylation levels in pancreatic cancer, including mucin-5AC (MUC5AC), carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5), insulin-like growth factor binding protein (IGFBP3), and galectin-3-binding protein (LGALS3BP) [133]. MUC5AC and CEACAM5 have been shown to play a role in tumor progression and metastasis in pancreatic cancer [133,135,136]. On the other hand, LGALS3BP was significantly hyperglycosylated in tumor tissue. Additionally, increased N-glycosylation on many cancer-associated aberrant glycoproteins was reported on pancreatic cancer-associated pathways such as TGF-β, TNF, NF-kappa-B, and TFEB-related lysosomal changes [133].

Yue et al. studied sera from pancreatic cancer patients to determine certain glycan alterations and their possible usage in the diagnosis of pancreatic cancer. To that end, they characterized glycan and protein levels of specific mucins and carcinoembryonic antigenrelated proteins of these patients through the antibody-lectin sandwich array method previously developed. They found that MUC16 protein was frequently increased (65% of the patients) in the cancer patients, whereas MUC1 (30%) and MUC5AC (35%) proteins were less frequently elevated. In addition to this, MUC1 and MUC5AC proteins indicated

highly extensive and diverse glycan alterations, while MUC16 protein did not. The most frequent glycan elevations that affected these proteins involved the Thomsen–Friedenreich antigen, fucose, and Lewis antigens. Additionally, they reported an unanticipated enhancement in the exposure of alpha-linked mannose on MUC1 and MUC5AC. Moreover, the CA19-9 on MUC1 had the most important increase (87%) in cancer patients with 4% of the control subjects [130].

In another study, N-glycosylation at Asn88 in serum human pancreatic ribonuclease 1 (RNase1) was substantially elevated in pancreas cancer patients compared with normal human subjects [131]. Similarly, increased fucosylation levels of serum α-1-acid glycoprotein (AGP) glycoforms were reported in pancreatic cancer compared to healthy controls and pancreatitis patients via numerous analytical methods consisting of MS, capillary zone electrophoresis (CZE), and enzyme-linked lectin assays (ELLA) [134].

As an alternative therapy option having fewer adverse effects than others, regional intra-arterial chemotherapy (RIAC) is preferred for advanced pancreatic cancer. Qian and colleagues [137] took advantage of the presence of Glypican-1 (GPC1) in extracellular vesicles (EVs) to determine if the change in GPC1+ cells in EVs could be a predictor of the consequences of RIAC for advanced pancreatic cancer patients. They concluded that patients with advanced pancreatic cancer who displayed a decrease in GPC1+ EVs experienced enhanced overall survival rates with the aid of RIAC therapy.

Another cell-surface glycoprotein, CD44 is a known prognostic biomarker and therapeutic target in pancreatic cancer [138]. The overexpression of CD44 was shown to be associated with aggressive malignant attitudes, cell migration, and distance metastasis, therefore with poor overall survival in patients with pancreatic cancer [138]. On the other hand, the reduction in CA19-9 levels envisaged a good prognosis after neoadjuvant therapy with a low incidence of recurrence after surgery [139].

All of these studies provide an insight into the potential biomarker candidates for effective diagnosis, prognosis, and treatment in pancreas cancer using measurements in glycan alterations on precise glycoproteins.

#### **8. Metagenomic Biomarkers of Pancreatic Cancer**

In recent studies, the interaction between microbiomes and the initiation and progression of pancreatic cancer has become recognized, raising the possibility of identifying novel diagnostic and prognostic factors for PDAC [140]. The existence of intratumoral microbiota is considered to have a potential etiologic impact on pancreatic carcinogenesis, including inflammation, immunosuppression, and stimulation of cellular carcinogenic pathways [141–143].

It is becoming clear that there is a correlation between oral microbiota and PDAC, and the abnormalities of oral microbiota have been proposed to appear before the development of cancer [144]. Available literature data provide knowledge on the oral bacteria that might play a pathogenic role in the progression of PDAC, and these are *Porphyromonas gingivalis*, *Fusobacterium*, *Neisseria elongata*, and *Streptococcus mitis* [145]. In this context, a large metagenomic study comparing PDAC patients and healthy controls revealed that *P. gingivalis* was associated with an approximately 60% greater risk of PDAC [146]. Mitsuhashi et al. [147] indicated that the existence of approximately 10% *Fusobacterium* in pancreatic cancer tissue is independently associated with poor prognosis of PDAC but not with its clinical and molecular features. It is also thought that *Fusobacterium* species may be a candidate prognostic biomarker for pancreatic cancer and should be considered for further oral microbiota studies. On the other hand, some studies have revealed that *Fusobacteria* are associated with reduced risk of PDAC, revealing that the role of *Fusobacteria* on PDAC could be controversial [144,146,148].

Fecal microbial transplantation (FMT) possesses an enormous amount of microbiota compared to usually preferred probiotic supplements and might provide a significant movement in reducing the immunosuppression and in increasing the response rate to treatment in cancer patients having a probable low survival [149]. In a recent cohort study,

Riquelme and colleagues [150] made a metagenomic analysis from 68 tumor samples of tumor microbiome composition of PDAC patients with short-term survival (STS) and long-term survival (LTS) phenotypes using 16S rRNA gene sequencing. They reported that the tumor microbiome diversity of long-term survivors was higher than that of short-term survivors, potentially representing a strong interrelation between the gut microbiome and patients' survival rate. Besides, animal studies by human-into-mice FMT experiments from STS, LTS, and healthy donors conspicuously confirmed that the transference of the longterm survivors' gut microbiome can modulate the intratumoral microbiome. According to a study encompassing a comparative analysis of fecal microbiota from PDAC patients and control donors in murine models, a certain type of bacteria—namely, Proteobacteria, Actinobacteria, Fusobacteria, and Verrucomicrobia—are found in higher amounts in the gut of PDAC patients. Specifically, the gut microbiota of PDAC patients contains greater amounts of Proteobacteria (45%), Bacteroidetes (31%), and Firmicutes (22%). This study remarkably highlights that the intratumoral microbiome associated with pancreatic cancer has relatively distinct proportions in comparison to the microbiome of normal pancreatic tissue [143].

In an animal study, Mendez et al. [151] demonstrated a substantial correlation between microbial dysbiosis and the release of tumor-inducing metabolites in the early-stage, while showing significantly elevated serum polyamine concentrations in PDAC patients; this may be postulated as a predictive biomarker for early detection of pancreatic cancer. It is among the current assumptions that bacteria in the pancreatic microbiome may contribute to the resistance of gemcitabine, which is widely used in the treatment of PDAC. Based on this assumption, 76% of the tested pancreatic tissue was found to be positive for bacteria, particularly Gammaproteobacteria [152].

Several studies also suggest that the composition of oral [146,148,153], fecal [154], and pancreatic microbiome [143,155] may be used for early diagnosis of PDAC. With the accumulation and advanced evaluations of data on the pancreas, gut, and oral microbiota, it might be possible to develop microbiome screening methods that can be considered as a promising tool in the prediction of PDAC risk and treatment of disease progression.

#### **9. Biomarkers Leading to Improved Personalized Medicine**

On the way to personalized medicine, there are promising and on-going efforts for the integration of multi-omic data. As an aim of precision medicine, the first attempt is to stratify patients according to their disease subtypes, biomarkers, clinical features, or demography. Later, in addition to the stratification process, more features such as environment, medication history, behaviors, and habits are utilized to create smaller groups. In theory, this stratification technique should avoid failures in clinical trials since the suitable diagnosis and targeted treatments are applied to small patient populations or directly to individuals. Instead of "one-size-fits-all" treatment approaches, the best therapy options or medications for each individual or a small group can be achieved through disease stratification and then personalization by the integration of multi-omics networks. In addition, personalized medicine treatment necessitates the co-development of diagnostic tools (preferably within noninvasive methods) to characterize the ideal therapy for patients. There is an urgent need for multi-omic data integration not only for pancreatic cancer but also for many other diseases from the personalized medicine perspective in the future (Figure 1).

According to the present clinical data, using only chemotherapeutic approaches in the treatment of pancreatic cancer will likely be insufficient in terms of the increase in survival time and response rate in the near future. Therefore, there is an urgent need for precision medicine, which aims at tailoring the best treatment option for individual patients based on their genomic information, together with molecular, environmental, and lifestyle factors, to identify the suitable biomarkers and targeted therapies for cancer patients. Personalized medicine stratifies the patients by considering the individual differences among cancer

patients, unlike conventional therapy. As in other types of cancer, studies on precision medicine in pancreatic cancer have increased in recent years [156,157].

There are several precision medicine programs and clinical studies run by various initiatives from different countries to offer the best personalized treatment options for pancreatic cancer patients according to their molecular tumor profiling [156]. These programs have demonstrated that a small patient cohort had better progression-free survival after switching their therapies from standard-of-care treatment to molecular-targeted therapy [158]. Further, molecular profiling of tumors from patients with all stages of pancreatic cancer was performed using NGS to develop response rates and therapeutic biomarkers [159]. Besides, different clinical studies were performed to discover biomarkers for prognosis or treatment response [160], focusing on alterations in genome and epigenome in tumor tissue [161]. The Comprehensive Molecular Characterization of Advanced Pancreatic Ductal Adenocarcinoma for Better Treatment Selection (COMPASS) trial was the prospective translational study that investigated the feasibility of comprehensive real-time genomic analysis of advanced PDAC, integrating genomic and transcriptomic subtypes and chemotherapy response [162].

The alterations in the genome, epigenome, proteome, and metabolome cause the changes in the phenotype in pancreatic cancer, and thus studies carried out on these alterations could help with the stratification of pancreatic cancer. The identification of new biomarkers for subtyping, diagnosing cancer, and predicting therapy response is an ongoing process in preclinical studies. However, the difficulties in the translation of promising preclinical findings into clinical practice make the application of precision medicine approaches in clinics a great challenge. These difficulties arise from the evaluation of basic science findings in the clinical settings and the selection of the best effective scientific data for clinical trials [156]. Moreover, it is very important and vital to building collaborations among basic scientists, clinicians, and bioinformaticians to overcome these challenges.

For patients with pancreatic cancer, CA19-9 is the only routinely used serum biomarker in prognosis and early diagnosis of recurrence after therapy [156]. Although the increase in CA19-9 level indicates advanced pancreatic cancer and poor prognosis [139], this elevation can be only observed in 65% of the patients with resectable pancreatic cancer, in addition to patients with other diseases such as pancreatitis or cirrhosis [163]. Besides, 10% of patients with pancreatic cancer cannot synthesize CA19-9 even if they are in the advanced stage, since they are negative for Lewis antigen a or b. Moreover, it is not a screening biomarker for pancreatic cancer to be used alone [156].

Numerous gene alterations that play important roles in tumorigenesis can provide the development of novel treatments that target specific genes for pancreatic cancer patients. Personalized medicine can certainly improve the management of patients and outcomes of novel treatments with the administration of the right therapy using the right dose at the right time to the right patient when applied to pancreatic cancer patients. The generation of well-designed clinical trials allowing the construction of molecular profiling of tumors of patients will further guide the development of novel and effective strategies for the overall survival of patients in this highly lethal cancer [157,160].

#### **10. Conclusions**

There are big initiatives, various research programs, and databases in which researchers are able to collect different omics datasets of pancreatic cancer. However, many biomarker studies have been challenged by low case numbers, non-specificity of molecular markers and their low reproducibility, and the absence of preclinical or clinical as well as feasibility studies.

The well-known example of pancreatic cancer biomarkers is CA19-9, but as a single biomarker it cannot offer a potential to be used in the clinic. Recent studies on noncoding RNAs such as miRNAs, circRNAs, and lncRNAs hold great promise not only as biomarkers but also for understanding the regulatory network components in pancreatic

cancer. Targeted or shotgun proteomic approaches also provide an opportunity for more sensitive or novel biomarker identification. Metagenomics is another emerging technique that measures altered microorganism abundance and may act as a potential biomarker. On the other hand, although the pancreas is at the center of many metabolic pathways, the metabolic rewiring of pancreatic cancer is an underestimated topic since the number of metabolomics studies are not as numerous as some of the other omics investigations.

Although many novel markers have been discovered through omics studies of PDAC in the past decade, none of those novel biomarkers have yet been brought into routine clinical practice. However, there is a hope that various combinations of these biomarkers as a biomarker panel may result in a clinical output, and this fact makes the integration of multi-omics data more challenging on the way to translating omics markers into the clinic.

Another point that has a crucial role in translation to the clinic is sampling, where body fluids are favorable for the detection of the biomarkers. Later, these biomarkers also assist oncologists in deciding optimal therapeutic management by defining the way for precision treatment.

In conclusion, there is great attention focusing on multi-omics biomarkers in terms of their diagnostic, predictive, and prognostic potentials to fight against pancreatic cancer as well as other cancer types. One of the major medical concerns raised by oncologists is the identification of robust, reasonable, and reliable diagnostic biomarkers since early detection of pancreatic cancer is crucial for personalized therapy options and improved survival outcomes. This strategy can be accomplished by a systems biology approach that aims to organize multi-omics data despite the challenges. Successfully accomplishing multi-omics data integration by systems biology approaches will fulfill future expectations and the need for robust, accurate, and feasible biomarker panels for pancreatic cancer.

**Table 1.** A summary of methodology and sampling used in biomarker studies for pancreatic cancer.



```
Table 1. Cont.
```


#### **Table 1.** *Cont*.


#### **Table 1.** *Cont*.

\* CZE: capillary zone electrophoresis, ELISA: enzyme-linked immunosorbent assay, ELLA: Enzyme-linked lectin assay, FISH: fluorescence in situ hybridization, GC: gas chromatography, HILIC: Hydrophilic interaction chromatography, HRMAS: high-resolution magic angle spinning, LC: liquid chromatography, MS: mass spectrometry, NMR: nuclear magnetic resonance, qRT-PCR: quantitative reverse transcription polymerase chain reaction, TOF: time of flight, WB: Western blot, WES: Whole exome sequencing, WGS: whole genome sequencing.

> **Author Contributions:** Conceptualization, B.T., K.Y.A., and R.S.; writing—original draft preparation, B.T., E.Y., and G.G.; review and editing, B.T., K.Y.A., and R.S.; visualization, G.G.; supervision, B.T., K.Y.A., and R.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **Drug Repurposing for Triple-Negative Breast Cancer**

**Marta Ávalos-Moreno 1,**† **, Araceli López-Tejada 1,2,**† **, Jose L. Blaya-Cánovas 1,2 , Francisca E. Cara-Lupiañez 1,2 , Adrián González-González 1,2 , Jose A. Lorente 1,3 , Pedro Sánchez-Rovira <sup>2</sup> and Sergio Granados-Principal 1,2,\***


Received: 24 September 2020; Accepted: 28 October 2020; Published: 29 October 2020

**Abstract:** Triple-negative breast cancer (TNBC) is the most aggressive type of breast cancer which presents a high rate of relapse, metastasis, and mortality. Nowadays, the absence of approved specific targeted therapies to eradicate TNBC remains one of the main challenges in clinical practice. Drug discovery is a long and costly process that can be dramatically improved by drug repurposing, which identifies new uses for existing drugs, both approved and investigational. Drug repositioning benefits from improvements in computational methods related to chemoinformatics, genomics, and systems biology. To the best of our knowledge, we propose a novel and inclusive classification of those approaches whereby drug repurposing can be achieved in silico: structure-based, transcriptional signatures-based, biological networks-based, and data-mining-based drug repositioning. This review specially emphasizes the most relevant research, both at preclinical and clinical settings, aimed at repurposing pre-existing drugs to treat TNBC on the basis of molecular mechanisms and signaling pathways such as androgen receptor, adrenergic receptor, STAT3, nitric oxide synthase, or AXL. Finally, because of the ability and relevance of cancer stem cells (CSCs) to drive tumor aggressiveness and poor clinical outcome, we also focus on those molecules repurposed to specifically target this cell population to tackle recurrence and metastases associated with the progression of TNBC.

**Keywords:** triple-negative breast cancer; personalized medicine; computational methods; drug repurposing; clinical trials; cancer stem cells

#### **1. Introduction**

Breast cancer is the second most common cancer and the second cause of cancer death among US women, after lung cancer [1]. In 2020, it is estimated that 279,100 new cases will be diagnosed in the United States and more than 42,000 deaths will be a consequence of this type of cancer [2]. It is a heterogeneous disease that has been classified using immunohistochemical techniques to measure the presence of three receptors: estrogen receptor (ER), progesterone receptor (PR), and overexpression of human epidermal growth factor receptor 2 (HER2). Triple-negative breast cancer (TNBC) is characterized by the lack of expression of these receptors and, consequently, there are no approved targeted therapies [3]. Approximately 10% to 20% of new cases of breast cancer would be included in this subtype, which presents poor prognosis with high risk of relapse compared to other breast cancer subtypes [4]. TNBC is the breast cancer subtype with the poorest overall survival (OS) and the highest rates of metastases [5], most commonly in lungs and brain [6]. Furthermore, it is more frequent in

women in younger ages and black race, presenting an incidence rate about twice as high compared with white race [1].

Histopathologically, TNBC is a heterogeneous group that mostly presents features of ductal invasive carcinomas, but also metaplastic, medullary, or apocrine characteristics. Based on the gene expression profile, TNBC is divided into four subtypes: basal-like 1 (BL1), basal-like 2 (BL2), luminal androgen receptor (LAR), and mesenchymal (M) [6]. As a result of the variety and the lack of receptors of TNBC, there are not targeted therapies, making it necessary the application of personalized medicine. Whereas TNBC has a higher sensitivity to chemotherapeutics in comparison to other breast cancers, this subtype presents a higher risk of recurrence, which makes the unraveling of new treatments important [5]. Nevertheless, the process of creating and testing a new drug for TNBC is a cost- and time-consuming challenge that requires a huge investment and comprises high failure rates. For this reason, drug repurposing has been considered an increasingly successful approach for developing new therapies [7].

#### **2. Current Treatments for TNBC**

Besides surgery, nowadays, chemotherapy is the only treatment approved by the Food and Drug Administration (FDA) for non-metastatic TNBC [8], which includes microtubule inhibitors, anthracyclines, alkylating agents, antimetabolites, and platinum (Table 1) [7,9]. The current standard of treatment is based on a combination of anthracyclines and taxane agents [10]. In spite of initial chemosensitivity of tumors and the use of different drug combinations to potentiate treatments, later chemoresistance is frequently developed and it is related to the high presence of cancer stem cells (CSC) [9]. All of these compounds are repurposed drugs as they have been previously approved for diseases other than TNBC [7,11,12].


**Table 1.** Summarized approved agents for non-metastatic triple-negative breast cancer (TNBC).

Additional therapeutic options have been recently approved by the FDA for metastatic TNBC, when patients do not respond to traditional treatments (Table 2) [13]. For instance, olaparib and

talazoparib, two PARP (poly[adenosine diphosphate-ribose] polymerase) inhibitors of enzymes were approved for patients harboring germline mutations in *BRCA1*/*2* [8,13–15].


**Table 2.** Novel approved agents for metastatic TNBC.

Furthermore, the use of patient's immune system as an approach for cancer treatment, or immunotherapy, has strongly emerged as the fifth pillar of cancer therapy [16]. Immune escape is hallmark of tumor cells that promotes their development and progression, by decreasing immune recognition, for example, through the expression of immune suppressive molecules, or immune checkpoints, like cytotoxic T-lymphocyte-associated antigen-4 (CTLA-4) or programmed cell death-1 and their ligands (PD-1, PD-L1/2)(19–21). Ligand-receptor binding inhibits T-lymphocytes activity through their exhaustion. Physiologically, these molecules are checkpoint regulators of strength and last of LT-mediated immune response [16]. Interaction of PD-1/PD-L1 represents a mechanism of resistance to adaptative immune system by tumor cells in response to the endogenous antitumor response [16]. Nowadays, several checkpoint inhibitors (CPIs) (antibodies anti-CTLA-4, anti-PD-1, and anti-PD-L1) are under clinical use in cancer. In TNBC, combination of CPIs with targeted therapies and/or chemotherapy have been shown to be more effective than monotherapy, which showed a modest effectivity and durability [17]. Recently, atezolizumab, an inhibitor that targets PD-L1, has been approved in combination with paclitaxel for the treatment of patients with previously untreated metastatic TNBC (IMpassion130 study, NCT02425891) [18,19]. Despite of the great expectative on this new and expensive therapy, a small percentage of patients respond to it [16] because of several reasons such as the low tumor infiltration of lymphocytes (TILs, tumor infiltrating lymphocytes), presence of which is associated with a higher survival and good prognosis in early stage TNBC patients [17], low expression of PD-L1 on tumor cells, or the expression of other inhibitor molecules of immune system (IDO, CD73, TIGIT, or VISTA) [20].

Lastly, antibody-drug conjugates (ADC) represent a big potential to improve cancer treatment as they allow to target toxic drugs directly into cancer cells by using specific receptors. Sacituzumab govitecan is the newest therapeutic option available only after the failure of at least two other treatments [13]. This FDA-approved drug is an anti-trophoblast cell-surface antigen 2 (Trop-2) antibody conjugated with SN-38, a DNA damaging agent [21].

#### **3. Drug Repurposing**

The discovery and development of a new drug is a time-consuming process which requires great investments, being estimated to take between 10 and 17 years and a cost of US\$2–3 billion [22,23]. Moreover, it comprises high failure rates in clinical trials, where almost 90% of the drugs are rejected because of unexpected properties [7]. Drug repurposing (also known as drug repositioning or drug reprofiling) is a strategy for identifying new uses for existing drugs, both approved and under investigation (Figure 1). This relatively new strategy allows to significantly shorten the time and reduce the costs of drug development, especially in the case of FDA-approved repurposed drugs, which would likely go through accelerated clinical trials owing to their previous safety and toxicological clinical studies [24]. It has been estimated that repurposing a drug would cost, on average, US\$300 million [23]. Several methodologies can be considered for drug repurposing, from non-computational approaches

including high-throughput screening [25] and methods based on experimental findings and previous literature, e.g., target-based, to computational strategies. Indeed, drug repurposing process can be highly improved via computational methods related to chemoinformatics, genomics, and systems biology. These methods allow to select, prior to in vitro experiments, drug candidates for repositioning in a rational manner [24,26,27].

**Figure 1.** Comparison between de novo drug development and drug repurposing. Adapted from Ashburn and Thor [22].

#### *3.1. Common Computational Approaches for Drug Repurposing*

– There are many different computational approaches for drug repurposing based on different types of data, including drug and target structures, drug–target interactions, or transcriptomes. Accordingly, several classifications have been suggested [24,28,29]. To date, it has not been determined which approach would be the best option for in silico drug repositioning, and no standardized method has been adopted. Hence, analyzing the retrieved literature, it was considered of interest reviewing and summarizing the most accessible, commonly used approaches (Figure 2), so as to provide a fuller view of the current strategies and the possibilities that in silico analysis has to offer. Thus, these various computational approaches have been categorized in: (1) structure-based, (2) transcriptional signatures-based, (3) biological networks-based, and (4) data-mining-based drug repurposing.

#### 3.1.1. Structure-Based Drug Repurposing

Structure-based methods, which rely on both drug and receptor structure, are mainly based on virtual high-throughput screening (VHTS) of small chemical compounds from different databases such as PubChem (https://pubchem.ncbi.nlm.nih.gov/), DrugBank (www.drugbank.ca/), ChemSpider (www.chemspider.com/) or CheEMBL (www.ebi.ac.uk/chembl/). It allows the user to find, in silico, multiple drugs that will potentially interact with the target's binding site [24]. The 3D structure of the target, which is usually a protein, can be found in the Protein Data Bank (PDB, www.rcsb.org/). VHTS comprises a computational modelling technique known as molecular docking, which enables to predict ligand-receptor biding affinity via different scoring functions. There are several molecular docking programs, such as Glide (www.schrodinger.com/glide), GOLD (www.ccdc. cam.ac.uk/solutions/csd-discovery/components/gold/), UCSF DOCK (http://dock.compbio.ucsf.edu/), AutoDock Vina (http://vina.scripps.edu/), or Ledock software [30]. VHTS can also be inversely approached by finding a variety of biological targets that may have affinity for a particular ligand. Apart from molecular docking, the user can also perform pharmacophore mapping, which consists of searching of ligands that can be matched to a pharmacophore, i.e., a set of molecular features such as hydrogen bonds, hydrophobic groups, or chemical substructures, that enable the recognition of a ligand by a receptor and their biological activity. Pharmacophore features can be derived

from protein-binding site or protein–ligand complexes structures, and software packages such as Catalyst (www.3dsbiovia.com/), Unity (Tripos, www.tripos.com), or PharmMapper can be used for pharmacophore searching [24,26]. Structure-based methods also encompass ligand/receptor profiling, based on a guilt-by-association principle. Ligand profiling consists of finding compounds that are chemically similar to a given drug, and consequently may have similar functional and biological properties. Likewise, receptor profiling consists of finding proteins that have similar binding sites to a particular receptor, therefore being likely to bind with the same ligands [24,26].

**Figure 2.** Diagram of the main computational approaches and software for drug repurposing.

#### 3.1.2. Transcriptional Signature-Based Drug Repurposing

multiple drugs that will potentially interact with the target's binding site Transcriptional signatures related to a disease or transcriptional responses associated to a specific treatment can be used for drug repurposing. Potential drug candidates can be identified via negative correlation between the gene expression profile from a disease and the transcriptional signature induced by a small compound, with the aim of finding a drug that would reverse the disease state toward the normal one. Similarly, positive correlation can be used to identify small compounds that have similar transcriptional signatures to a genetically or chemically induced perturbation, so as to induce a similar gene expression [31]. Signature-based drug repurposing is also known as connectivity mapping, a concept first introduced with the creation of the Connectivity Map (CMap) database [32,33], which comprises a genome-wide dataset of transcriptional expression responses of human cell lines to perturbagens, e.g., chemical treatments or genetic perturbations [34]. Transcriptional data can be found in different public databases such as Gene Expression Omnibus (GEO; www.ncbi.nlm.nih.gov/geo/), Ensembl (www.ensembl.org/), or The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/), and several tools are available for analyzing and comparing drug

–

and disease transcriptional profiles. Examples of tools for signature-based repurposing are CMap (https://clue.io/), L1000CDS<sup>2</sup> (http://amp.pharm.mssm.edu/L1000CDS2/), and ksRepo free source [24].

#### 3.1.3. Network-Based Drug Repurposing

Biological networks are data representations used to model biological interactions of any kind, where nodes represent various biological components, such as genes or proteins, and whereas edges represent the associations between them [28]. Network-based drug repositioning methods help inferring unknown disease-associated signaling pathways and therefore new therapeutic targets. There are different biological networks depending on the main source of biological data. Some interesting examples are protein–protein interaction (PPI) networks and drug–target interaction (DTI) networks. In PPI networks, nodes represent proteins. Most proteins are associated with other proteins, but only a limited number of them interact with multiple others. PPI networks allow to identify the most highly connected central proteins, generally known as hubs or hub proteins [35]. Alterations of hubs may affect the structure of the biological network, leading to dysfunction and disease [36]. Accordingly, PPI networking methods help predicting new disease-related targets for drug repurposing. PPI analysis can be performed with PRISM (Protein Interactions by Structural Matching; http://gordion.hpc.eng.ku.edu.tr/prism) server [36], or OmicsNet (https://omicsnet.ca/). Regarding DTIs, they are considered bipartite networks, where nodes represent both drugs and targets. There are several tools for predicting potential DTIs, such as DT-web (https://alpha.dmi.unict.it/dtweb/) or STITCH (http://stitch.embl.de/). Moreover, systems biology combines different network models with quantitative mathematical network models to infer the dynamics of biological systems, providing a more complete perspective for drug repurposing [24]. Complex biological networks can be found in the Causal Biological Networks (CBN, http://causalbionet.com/) database, and complex biological pathways can be found in KEGG database (www.kegg.jp/).

#### 3.1.4. Data-Mining-Based Drug Repurposing

All the previously described methods are based on drug–target interactions. However, meta-analysis of data from clinical trials is another interesting approach for drug repurposing. Su et al. [37] described a novel method for drug repositioning using ClinicalTrials.gov (https: //clinicaltrials.gov/) public database and two text mining tools, I2E (Linguamatics) and PolyAnalyst (Megaputer). It consists of, first, the extraction of Serious Adverse Event (SAE) data to identify drugs with fewer SAEs on the test arm than on the control arm and, second, the ranking of said drugs. Therefore, it allows to discover potential drug candidates for diseases different from those in the testing conditions.

#### **4. Drug Repurposing for TNBC**

The urgent necessity to find effective molecularly targeted treatments for TNBC has been translated into efforts by the research community to characterize and divide it into different subtypes with a more approachable profile. One of the first transcriptomic-based breast cancer classifications was performed by Perou et al., using cDNA microarrays and hierarchical clustering analysis to distinguish variations in gene expression patterns [38]. It gave a different approach to the commonly immunohistochemical characterization of breast cancers. Afterwards, several studies conducted similar genome-wide analyses [39–41], up until 2009 when Bernard et al. developed a qRT-PCR-based assay using only fifty genes (PAM50) to classify tumors into four intrinsic subtypes of breast cancer: luminal A, luminal B, HER2-enriched, and basal-like [42]. In 2007, Kreike et al. performed the first gene-expression-based classification of TNBC. After gene profiling, they identified all triple-negative breast tumors as basal-like, and classified them in five different subgroups [43]. In opposition, Prat et al. proved that basal-like cancers were not interchangeable with TNBCs [44], similarly to the findings of the study conducted by Lehman et al. in 2014 [45]. While the majority of TNBCs are basal-like, and vice versa, they should not be considered synonymous. These studies highlighted the necessity to further classify TNBC in well-defined subtypes in order to successfully develop personalized therapies. The first transcriptomic-based TNBC classification which differentiated between basal-like and non-basal like TNBC subtypes was performed by Lehman et al. in 2011. They identified six TNBC subtypes with representative gene expression signatures and signaling pathways, including two basal-like (BL1, BL2), an immunomodulatory (IM), a mesenchymal (M), a mesenchymal stem-like (MSL), and a luminal androgen receptor (LAR) subtype [46]. A web-based tool (TNBCtype) was also developed for the classification of TNBC samples into the six mentioned subtypes [47]. Later in 2016, Lehman et al. refined their own classification algorithm and developed a new one (TNBCtype-4), which scaled down the number of subtypes to four: BL1, BL2, M, and LAR [48]. While several other TNBC classifications followed different approaches and described varying number of subtypes, they all broadly concurred in those four main subgroups [49–51]. Recently, based on both Lehman et al. and Ring et al. algorithms [48,52], Espinosa et al. identified various TNBC cell lines whose signatures remained stable between cell lines and xenografts for each of the four subtypes: HCC2157 for BL1 subtype; HCC70, SUM149PT and HCC1806 for BL2 subtype; BT-549 for M subtype; and MDA-MB-453 for LAR subtype [53]. Thus, those cell lines, representative of each subtype, should be considered for in vitro studies on the effectiveness of targeted therapies in all different subtypes. Among the previously mentioned TNBC subtypes, the dependency on androgen receptor (AR) signaling of the LAR subtype provides a feasible target for directed therapies, which makes it an excellent candidate for drug repurposing. Whereas patients with AR-dependent TNBCs, which have a better prognosis than those with other TNBC subtypes [54], would benefit from AR inhibition therapy, it has been suggested that this may also be beneficial for non-LAR patients with relatively lower AR expression [50,55,56]. However, not all TNBCs express AR, so a quadruple negative breast cancer subtype has also been addressed [57,58]. This subtype would not benefit from AR antagonist repurposing treatments, and so forth different molecular pathways would need to be targeted. Accordingly, we offer an insight on the main repurposed therapies which are currently being investigated for the treatment of TNBC based on their molecular targets, including both AR-directed and non-AR-directed therapies, as shown in Figure 3. We have also summarized drugs in preclinical phase for TNBC in Table 3 and those under clinical trials in Table 4.

**Figure 3.** Overview of the different pathways investigated by drug repurposing. Repurposed inhibitors under investigation are shown in red. Created with BioRender.com.


#### **Table 3.**Summarized repurposed drugs to treat TNBC that are under investigation in the preclinical phase.



**Table 4.**Summarized repurposed drugs for TNBC under current investigation in clinical trials.



1 Last access toClinicalTrials.govon October 16th, 2020.

#### *4.1. Androgen Receptor*

LAR subtype is highly enriched in hormonally regulated pathways, despite being negative for both ER and PR. All ER, PR, and AR belong to the nuclear steroid hormone receptor family, and it has been proposed that AR overexpression may replace ER signaling, resulting in similar functional effects. In fact, both epidemiological and preclinical studies suggest that the androgenic signaling pathways may be linked to the development of breast cancer [50,51,54]. AR plays a central role in regulating gene expression, is mainly located in the cytoplasm, and it can be found complexed with heat shock proteins, HSP70 and HSP90, in order to maintain its inactive conformation. Upon binding of androgens, the receptor dissociates from HSPs and homodimerizes, enabling nuclear translocation. Once in the nucleus, AR binds to the promoter of target genes and induces the recruitment of coactivators and other transcription factors, therefore inducing transcriptional activation [54,95]. In TNBC, it has been suggested that AR activation alters the tumor microenvironment, hence suppressing the antitumor response and upregulating the secretion of the epidermal growth factor receptor (EGFR) ligand amphiregulin (AREG), both stimulating tumor growth and progression. AR activation has also been linked to metastasis via promotion of epithelial-to-mesenchymal transition (EMT), survival of anchorage-independent cell population, and maintenance of a CSC-like population [56,58]. However, the mechanisms by which AR-associated pathways may influence TNBC development and progression still remain unclear and are currently under research. Considering the crucial role that AR may play in AR-positive TNBC, different AR-targeted agents first intended for the treatment of metastatic castration-resistant prostate cancer (mCRPC) are being repurposed and tested in clinical trials on TNBC patients. It includes several FDA-approved drugs, such as bicalutamide, enzalutamide, or abiraterone acetate, as well as experimental drugs such as orteronel or seviteronel [88,96,97]. In fact, enzalutamide has proved to prolong survival in men with mCRPC after developing drug resistance to chemotherapy [98]. Therefore, they might represent an alternative treatment to avoid resistance in TNBC. Additionally, selective AR modulators or SARMs (e.g., enobosarm), investigational drugs first intended to be used as an alternative to testosterone therapies for male hypogonadism as well as related conditions such as muscle dystrophy, sarcopenia, or osteoporosis, are also currently being tested in clinical trials for both prostate cancer and TNBC [95,99,100].

Bicalutamide. It was the first drug to be repurposed in clinical trials as a potential treatment for AR-positive TNBC. Bicalutamide is a first-generation, non-steroidal antiandrogen developed for prostate cancer. It acts as a competitive inhibitor that directly binds to AR, stabilizing its association with HSPs. Whereas it maintains the receptor in an inactivated conformation, it does not prevent nuclear translocation and binding to DNA, which entails possible partial agonistic activity [58,101]. In vitro studies showed that bicalutamide significantly reduced cellular proliferation and colony formation, and induced cell apoptosis in MDA-MB-453 and MDA-MB-231 breast cancer cells. Reduction of tumorigenicity was associated with the inhibition of Wnt/β-catenin signaling pathway through downregulation of c-Myc transcripts. Moreover, assays with xenografts tumors of MDA-MB-453 and MDA-MB-231 cells further demonstrated that bicalutamide decreased cellular viability and induced apoptosis in vivo [82]. A single-arm, nonrandomized, phase II clinical trial with bicalutamide was performed in AR-positive TNBC (NCT00468715). The criteria to define AR positivity was an AR expression higher than 10% by immunohistochemistry (IHC). Among all AR-positive patients (*n* = 51), 26 were treated with bicalutamide. The clinical benefit rate (CBR), defined as the total number of patients who showed a complete response, partial response, or stable disease at 6 months, was 19%, and the median progression-free survival (PFS) was 12 weeks. The drug had grade 1–3 adverse events (AEs), such as fatigue, limb edema, or hot flashes, indicating a moderate toxicity. This study suggested the potential of AR blockade in AR-positive metastatic TNBC [81]. Other clinical trials are currently under development, including a phase II (NCT02605486) and a phase III (NCT03055312) trial.

Enzalutamide. It is a second-generation, non-steroidal antiandrogen developed for prostate cancer, with higher binding affinity than bicalutamide. Upon binding to AR, enzalutamide blocks nuclear translocation, recruitment of AR cofactors, and transcriptional activation which, oppositely to bicalutamide, results in a lack of agonistic activity [54,55,58]. Different in vitro studies demonstrated that enzalutamide reduced cell proliferation, migration, and invasion and increased apoptosis [55,56,84], and it was correlated with decreased AREG mRNA expression in SUM159 cells after treatment with enzalutamide [56]. In vivo studies showed that enzalutamide inhibited tumor viability in TNBC xenografts by inducing cell apoptosis [56,84]. A single-arm, non-randomized phase II clinical trial evaluated the efficacy of enzalutamide in advanced AR-positive TNBC (NCT01889238). In this study, AR positivity was defined as AR expression higher than 0% by IHC (intent-to-treat population, ITT) or higher than 10% by IHC (evaluable subgroup). The ITT population (*n* = 118) and the evaluable subgroup (*n* = 78) showed a CBR at 16 weeks of 25 and 33%, respectively. Median PFS was 2.9 months in the ITT group and 3.3 in the evaluable group. Median OS was 12.7 and 17.6 in ITT and evaluable subgroup, respectively. The only treatment-related AE with grade 3 or higher was fatigue, meaning enzalutamide was well tolerated by AR-positive TNBC patients. This study supported further study of enzalutamide [83]. Moreover, other clinical studies are currently investigating the use of enzalutamide as an adjuvant in treating patients with AR-positive TNBC, including a phase II trial (NCT02689427) for enzalutamide in combination with paclitaxel and a phase Ib/II trial for enzalutamide in combination with taselisib (NCT02457910).

Abiraterone acetate. It was the first androgen-production inhibitor developed for the treatment of prostate cancer. It is a steroidal, non-selective inhibitor of 17α-hydroxylase/17,20-lyase (CYP17), a central, rate-limiting enzyme which plays a critical role in the androgen biosynthesis pathway [54,58,102]. The efficacy of abiraterone acetate was investigated in a phase II clinical trial in combination with prednisone in metastatic or locally advanced AR-positive TNBC patients (NCT01842321). AR positivity was defined as AR expression greater than 10% by IHC. Evaluable patients (*n* = 30) showed a CBR at 6 months of 20%, and the median PFS was 2.8 months. The most common treatment-related AEs were hypertension, fatigue, nausea, and hypokalemia, all grade 1–2 [85]. After this clinical trial, both in vitro and in vivo studies were performed to assess whether combining abiraterone acetate with a Chk1 inhibitor would enhance its efficacy. They showed that combination treatment with the inhibitor GDC-0575 had an additive effect on both MDA-MB-453 and SUM185PE cell lines in reducing cell proliferation. Whereas abiraterone acetate alone had a weak effect inducing apoptosis, Chk1 inhibitors doubled the effect, achieving statistical significance in MDA-MB-453 cells. Interestingly, a xenograft model with MDA-MB-453 cells injected orthotopically in the mammary gland ducts of NSG mice showed that abiraterone alone reduced tumor growth, and combination with GDC-0575 enhanced this effect [86].

Orteronel (TAK-700). It is a non-steroidal, selective, second-generation CYP17 inhibitor. Whereas clinical trials for the treatment of prostate cancer with orteronel were terminated in phase III because of a lack of significant effect on OS [54,58,103], it is currently being investigated in a phase II clinical study of women with AR-positive metastatic TNBC (NCT01990209).

Seviteronel (VT-464). It is another non-steroidal, selective, second-generation CYP17 inhibitor which, in contrast to orteronel, also inhibits AR activation [54,58]. It was demonstrated that seviteronel inhibited cellular growth and tumor volume in MDA-MB-453 cells and patient-derived xenografts (PDX), respectively [88,89]. Moreover, Michmerhuizen et al. proved that the AR inhibition with seviteronel induced radiosensitization, both in vitro and in vivo, whereas enzalutamide did not [104]. A phase I/II clinical study is investigating the activity of seviteronel in women with AR-positive TNBC (NCT02580448). Out of 16 patients with AR-positive TNBC, 6 were evaluable. Two patients (33%) had a 16-week CBR. The most common AEs were fatigue, nausea, and decreased appetite, all grade 1–2 [87]. A second phase II clinical trial is also currently investigating the effects of seviteronel in AR-positive TNBC patients (NCT02130700).

Enobosarm (MK-2866, ostarine, GTx-024). It is a non-steroidal SARM that achieves a tissue-selective modulation of AR action, hence minimizing the undesirable side-effects caused by antiandrogens [105]. In vitro studies showed that enobosarm inhibited cellular proliferation of MDA-MB-231 cells transiently expressing AR. Moreover, tumor growth was completely inhibited by enobosarm in a nude mice

xenograft model with MDA-MB-231-AR cells [106]. There was a phase II clinical trial for enobosarm in AR-positive TNBC (NCT02368691), but it was terminated because of lack of efficacy.

#### *4.2. Adrenergic Receptor*

Adrenergic receptors (ADR), which can be classified as α or β receptors, belong to the G protein-coupled receptor (GPCR) superfamily. The activation of ADR, stimulated through the catecholamines, epinephrine and norepinephrine, derives in several stress response signaling pathways key in maintaining physiological homeostasis [107]. However, there is an increasing evidence that altered ADR stimulation may play a significant role in breast cancer progression, promoting cell proliferation, metastasis, tumor invasion, and angiogenesis [68,108,109]. Accordingly, it has been addressed that ADR-directed therapies, widely used for the treatment of hypertension and other pathologies, could be repurposed for TNBC. Several preclinical studies have investigated the effects of both α- and β-ADR antagonists in TNBC [61,64,66,67,110,111], and retrospective epidemiological studies have explored whether TNBC cancer patients under treatment with beta-blockers for hypertension had a significant better outcome that non-treated patients [63,68,108,112].

#### 4.2.1. α-Adrenergic Receptor

α-adrenergic receptors can be subclassified as α1 (α1a, α1b, α1c) and α2 (α2a, α2b, α2c). Their ligands activate GPCRs and initiate a signaling cascade that, in the case of α1 receptors, increases intracellular calcium levels and is involved in blood pressure regulation, whereas α2 receptors signaling cascade decreases intracellular cyclic AMP (cAMP) levels and regulates neurotransmitters release [107]. Interestingly, activation of α-ADR has been associated with both tumor growth and chemoresistance in TNBC cell lines. Vazquez et al. showed that both epinephrine and norepinephrine, the natural ADR agonists, as well as clonidine, a synthetic α(2)-ADR agonist used in the treatment of hypertension [113], promoted cell proliferation in MDA-MB-231 cells [110]. Similarly, Bruzzone et al. demonstrated that clonidine increased tumor growth, whereas α(2)-ADR antagonist α**-**yohimbine reversed clonidine stimulation in breast cancer [114].

α-yohimbine (rauwolscine). It is an alkaloid and α(2)-ADR antagonist used as a mydriatic and in the treatment of impotence [115]. Piñero et al. found that yohimbine diminished tumor growth in vitro, and it was associated with inhibition of ERK1/2 phosphorylation in vivo [61]. It was also proved that α**-**yohimbine could reverse tumor growth after stimulation with clonidine in vivo [59]. Additionally, Flint et al. demonstrated that MDA-MB-231 cells developed resistance to paclitaxel when treated in combination with catecholamines and/or cortisol [60]. In the light of these results, we suggest the investigation of α-ADR antagonists for the treatment of TNBC and prevention of drug resistance.

#### 4.2.2. β-Adrenergic Receptor

β-adrenergic receptors can also be subclassified as β1, β2, and β3. Activation of β1- and β2-ADR increases intracellular cAMP levels, as opposed to α2-ADR, regulating the sympathetic nervous system's stress response in several different tissues [107]. The signaling cascade induced by higher cAMP levels includes two main pathways. First, cAMP activation of protein kinase A (PKA) induces phosphorylation of several transcription factors, such as GATA family, and β-ADR kinase (BARK). The latter inhibits β-ADR signaling and activates Src kinase, leading to the activation of different transcription factors, including STAT3, and several kinases like focal adhesion kinase (FAK). Conversely, cAMP also leads to Rap1A activation, which induces B-Raf/mitogen-activated protein kinase (MAPK) signaling pathway and activation of multiple genes with effects on several cellular events [116]. It has been addressed that, in breast cancer, β-ADR signaling in β-ADR-expressing tumor cells activates metastatic-associated genes involved in inflammation, angiogenesis, and EMT processes, whereas it downregulates the expression of antitumoral response genes. Moreover, activation of β-ADR pathway in tumor stromal cells and tumor-associated macrophages seem to promote tumor growth and metastasis [109,116]. Several in vitro studies with different TNBC cell lines showed that

β-ADR agonists stimulated cell migration, whereas β-ADR antagonists, such as atenolol and ICI118551, reverted this process [66,67,111]. Moreover, it was also demonstrated that β-blockers propranolol and ICI118551 decreased cell proliferation in TNBC, arresting the cell cycle and inducing cell apoptosis [62]. Oppositely, Slotkin et al. showed that treatment with β-ADR agonist isoproterenol lowered DNA synthesis and decreased cell proliferation, and that these effects were reverted by propranolol [64]. Similarly, in an experimental mouse model of breast cancer, β-ADR agonists isoprenaline and salbutamol inhibited breast cancer cell proliferation and tumor growth [61]. There seems to be conflicting results in the role of β-ADR signaling in breast cancer, indicating that it might be dependent on the cancer subtype. Accordingly, different retrospective observational cohort studies have been developed to further study the effects of different non selective β1/β2-blockers (propranolol, timolol) and selective β1-blockers (atenolol, bisoprolol, metoprolol) in breast cancer, more precisely in TNBC, so as to determine their effects in the cancer biology of each subtype [63,68,108,112]. The first observational study was performed by Powe et al. [108], in which breast cancer patients were divided into three subgroups: non-hypertensive control group (*n* = 374), hypertensive patients treated, prior to cancer diagnosis, either with β-blockers (*n* = 43) or with other antihypertensives (*n* = 49). Most β-blocker users had received selective blockers (25 with atenolol, 7 bisoprolol), but several had received non-selective ones (7 propranolol, 4 timolol). β-blocker users group suggested a significant lower risk of metastasis development, tumor recurrence, and breast cancer mortality. However, differences in β-ADR antagonists used by patients, and the lack of information in their cancer subtype made it necessary to perform further studies to assess the efficacy of non-selective β1/β2-blockers versus selective β1-blockers in TNBC.

Non-selective β1/β2-blockers (propranolol). Different studies showed that propranolol inhibited cell proliferation, arrested the cell cycle at G0/G1 and S, and induced cell apoptosis in vitro, and inhibited tumor growth in vivo [61,62,65]. Moreover, the anti-tumorigenic effects of thisβ-blocker were associated with a decrease in phosphorylation levels of ERK1/2 and the expression levels of cyclooxygenase 2 (COX-2) [62]. Interestingly, Pasquier et al. reported that, whereas combination of propranolol with chemotherapeutic drug paclitaxel seemed to have no additive effects in cellular cytotoxic effects in vitro, propranolol increased the anti-tumor efficacy of paclitaxel in an orthotopic xenograft model of TNBC, significantly increasing the median survival [65]. Barron et al. performed a study on women treated with propranolol for hypertension (*n* = 70) in the year before breast cancer diagnosis, in comparison with matching (1:2) non-users (*n* = 4738), and suggested that the use of propranolol was significantly associated with less advanced disease at diagnosis and decreased risk of metastasis and mortality [63]. However, like Ganz et al. pointed out, the limited size of the β-blocker users' group may be insufficient to prove propranolol benefits in breast cancer [117]. Moreover, the patient population was not subclassified based on cancer subtype or receptor status, so no conclusions can be drawn for TNBC subtype.

Selective β1-blockers (atenolol, metoprolol). In vitro studies demonstrated that atenolol inhibited cell proliferation in MDA-MB-435 cells [69], and enhanced metformin activity in vivo by reducing angiogenesis and metastasis [70]. In the same study mentioned above, Barron et al. also evaluated breast cancer patients treated with selective β1-blocker atenolol (*n* = 525) in the year before cancer diagnosis. However, they found no significant difference in between atenolol users and matched non-users in tumor incidence, risk of metastasis and mortality rates. These results indicated that the effects of propranolol in breast cancer were mediated by β2-ADR [63]. Melhem-Bertrandt et al. performed another retrospective study comparing breast cancer patients treated with β-blockers (*n* = 102), who received neoadjuvant chemotherapy, with non β-blockers users (*n* = 1311), as well as TNBC patients taking β-blockers (*n* = 29) compared to non-users (*n* = 348) [68]. The most commonly prescribed β-blockers were selective β1-blockers, first metoprolol (42%) followed by atenolol (37%). Interestingly, after age, race, stage, and receptor status adjustment, among some other parameters, users of β-blockers proved to have significantly lower recurrence but no significant OS among both breast cancer and TNBC patients, which seemed to contradict the findings of Barron et al. However, a subset analysis demonstrated that

the subgroup of ER-positive breast cancer patients had no significant differences in tumor recurrence. Consequently, these results suggested that, whereas patients with any breast cancer subtype could benefit from a treatment with non-selective β-blockers via β2-ADR antagonism, only TNBC patients could benefit from a treatment with non-selective β-ADR inhibitors. Nevertheless, it has to be noted that not statistically significant results in the ER-positive subgroup may have been due to the relatively short follow-up time in the study of Melhem-Bertrandt et al. Additionally, in a retrospective study on TNBC patients taking β-blockers (*n* = 74), compared to non-users (*n* = 726), Botteri et al. also demonstrated that a treatment with β-blockers was associated with a decreased risk of recurrence, metastasis, and mortality, supporting previous findings [112]. Nevertheless, new prospective studies will be required to clarify whether the efficacy of β-blockers depends on breast cancer subtype and/or receptor status.

#### *4.3. STAT3*

Signal transducer and activator of transcription 3 (STAT3) is a tumor marker for early diagnosis and the activation of its pathway is related to breast cancer aggressiveness, as it plays an important role in progression, proliferation, apoptosis, metastasis, and chemoresistance [118]. The activation of this pathway involves several cytokines such as, interleukin 6 (IL-6) and interleukin 10 (IL-10), and growth factors, including epidermal growth factor (EGF), fibroblast growth factor (FGF), and insulin-like growth factor (IGF), which bind their receptors and activate Janus kinases (JAKs). JAKs phosphorylate themselves in a tyrosine domain included in their cytoplasmic fractions and they subsequently activate STAT3 via tyrosine phosphorylation. Once STAT homodimers are produced, they are translocated to the nucleus in order to create a complex with coactivators (e.g., p68) and ending up into the activation of transcription [118]. The upregulation of IL-6/STAT3/ROS can lead to the transcription of genes involved in breast cancer progression, as well as an augmentation in inflammation and generation of breast cancer stem cells (BCSCs). Furthermore, the activation of JAK2/STAT3 favors proliferation and motility of breast cancer cells by different mechanisms, including the suppression of apoptosis by upregulation of cyclin D-1, c-Myc, and Bcl-2, and promotion of EMT. Finally, resistance to several drugs like paclitaxel may be a consequence of this pathway. Because of its complexity and wide regulation of breast cancer cells, STAT3 is an interesting target candidate to treat in TNBC. As a matter of fact, several compounds that inhibit different mechanisms are being investigated. We will highlight some of them: bazedoxifene, flubendazole, niclosamide, osthole, and zoledronic acid [118].

Bazedoxifene. It is a selective ER modulator approved in 2013 by the FDA to treat and prevent osteoporosis in postmenopausal women [71]. Using a structure-based study for repurposing drugs, bazedoxifene was discovered as a novel inhibitor of IL-6 receptor by blocking signals of glycoprotein 130 [119]. Hence, in TNBC, its mechanism involves the upstreaming disruption of STAT3 pathway as ER is not expressed. Studies in in vitro and in vivo models of TNBC confirmed the decrease of cell viability, migration, colony formation, and increase of apoptosis. Furthermore, when this compound was administered in combination with paclitaxel, a synergistic effect as well as an improvement of sensitivity to paclitaxel was found, probably because of the inhibition of the resistance effect induced by IL-6 [71,72]. Those doses were administered in safety ranges that are registered in other indication trials of bazedoxifene. Subsequently, safe effects can be assured in endometrial, ovarian, and breast tissues, but it would be necessary to study possible secondary effects in other tissues that express ER [72]. Considering the association between STAT3 and EMT, their interplay in CSCs, and the in vitro effects of bazedoxifene, we suggest that this compound could act as an inhibitor of tumor-initiating cells, although this hypothesis must be further investigated.

Flubendazole. It is an FDA-approved anthelmintic agent to treat intestinal parasites whose mechanism of action is the disruption of tubulin polymerization. For this reason, it was considered as a repurposed candidate to treat breast cancer [120]. Even though flubendazole causes cell cycle arrest at G2/M phase and, consequently, inhibits cell proliferation in vitro and tumor growth in vivo at clinical doses, it also presents additional properties. As an STAT3 inhibitor, it also causes a reduction of CD44high/CD24low CSC population, mammosphere-forming ability, and the expression of stemness genes [73]. This fact is a positive characteristic as CSCs might have an essential role in metastasis and aggressiveness of TNBC [120]. Furthermore, in some studies flubendazole is shown to increase cytotoxicity activity of fluorouracil and doxorubicin, meaning it could reduce tumor chemoresistance [73].

Niclosamide. It is a FDA-approved anthelmintic agent to treat tapeworms, which is known to inhibit cell growth in vitro and tumor growth in vivo in TNBC studies [74]. Niclosamide was identified as an inhibitor of BCSCs owing to a high-throughput drug screening [76]. It reverses EMT and inhibits the stem-like phenotype in cancer cells suggesting that it may reverse cisplatin resistance [74]. Furthermore, Lu et al. proved that niclosamide is a radiosensitizer both in vitro and in vivo models of TNBC as it reversed radioresistance generated by activation of STAT and Bcl-2 and reduction of reactive oxygen species (ROS) [75].

Osthole (7-methoxy-8-isopentenoxycoumarin). It is a coumarin-derivative extract isolated from *C. monnieri* that presents interesting properties, such as anti-inflammatory and vasorelaxant [121]. Osthole has successful results in vivo treating osteoporosis as it stimulates osteoblast proliferation and differentiation and bone formation [77]. It also possesses anti-tumoral characteristics and, hence it can be a candidate for repositioning in TNBC. Dai et al. elucidated that osthole inhibits STAT3 phosphorylation, induced by IL-6, in a dose-dependent manner by avoiding the translocation of STAT3 to the nucleus, what causes cell cycle arrest and induction of apoptosis in TNBC cell lines. Moreover, in vivo assays with osthole confirmed the suppression of STAT3 phosphorylation as well as reduction of tumor growth in TNBC xenograft mice [78].

Risedronate sodium and zoledronic acid. They are two oral bisphosphonates to treat osteoporosis that were found to be possible candidates as STAT3 inhibitors by a comparative docking study in silico. Svranthi et al. also proved their toxicity in TNBC cells in vitro [79]. Furthermore, zoledronic acid has been largely analyzed for TNBC. Schech et al. proved that it inhibited cell viability, induced cell cycle arrest, reduced proliferative capacity, inhibited self-renewal capability, and decreased the expression of EMT markers (N-cadherin, Twist, and Snail). Mechanistically, they discovered that zoledronic acid inhibited phosphorylation of RelA, an active subunit of nuclear factor κB (NF-κB). Consequently, direct inactivation of NF-κB induced the loss of EMT transcription factor gene expression [91]. In vivo studies in mice also support the antitumor potential of zoledronic acid in combination with doxorubicin [92]. In a randomized phase II clinical trial (UMIN000003261), the combination of zoledronic acid and neoadjuvant chemotherapy was evaluated in TNBC patients. The pathologic complete response rate (pCR) was ameliorated in the combination group (35.3%) (*n* = 17) compared to patients treated with chemotherapy alone (11.8%) (*n* = 17). Such an improvement of pCR rate was translated into a higher disease-free survival in the combination group (70.6%) versus the chemotherapy group (94.1%) [90]. In contrast, a phase II clinical trial studying the application of pre-operative zoledronate prematurely ended because of a low accrual rate (NCT02347163). Further trials to assess the anti-tumor activity of zoledronic acid are currently ongoing in combination with atorvastatin and neoadjuvant standard chemotherapy (NCT03358017), as well as to evaluate the potential of zoledronic acid as an adjuvant therapy (NCT02595138, NCT04045522).

#### *4.4. Nitric Oxide Synthase*

Nitric oxide (NO) is a small molecule that is involved in several functions in the organism. It can be synthesized by three isoforms of nitric oxide synthase (NOS): neuronal (NOS1/nNOS), inducible (NOS2/iNOS), and endothelial (NOS3/eNOS). NO has a short half-life and interacts with different targets, which produces nitrites, nitrates, S-nitrosothiols, and nitrosamines, these being compounds that induce DNA damage and, therefore, gene mutations [122]. Glynn et al. proved that an increased expression of iNOS in ER– breast cancer is correlated with poor survival of patients [123]. We later proved that iNOS is a biomarker of poor prognosis and a good therapeutic target in a cohort of 73 TNBC patients [93]. In a previous report, we identified two genes, *RPL39* (ribosomal protein L39) and *MLF2* (myeloid leukemia factor 2), that are commonly mutated in lung metastases from breast cancer patients, and their inhibition significantly reduced BCSC self-renewal and number, tumor cell migration, invasion and generation of lung metastases, and tumor growth in in vitro and patient-derived xenografts (PDX) models of TNBC. Mechanistically, RPL39 and MLF2 expression was associated with iNOS signaling, and their mutations were associated with shorter median time to relapse in a cohort of 53 breast cancer patients, which suggests that iNOS inhibition represents a promising strategy for the treatment of TNBC [124]. In this regard, we reported that iNOS inhibitors diminish cancer cell proliferation and migration, CSC self-renewal and EMT by a targeting HIF1α and endoplasmic reticulum stress-transforming growth factor (TGFβ)-ATF4/ATF3 crosstalk [93]. Furthermore, we later confirmed that ATF4 is a transcriptional target of TGFβ-Smad2/3, is a biomarker of poor prognosis in TNBC patients, and promotes tumor progression by modulating CSCs, metastasis, relapse, and growth in PDX of TNBC [125]. Among the inhibitors tested, we reported that the pan-NOS inhibitor L-NMMA (NG-monomethyl-L-arginine) decreased cell proliferation, migration, and CSC self-renewal in vitro, and tumor growth (associated with less expression of Ki67), CSC self-renewal and tumor initiation in xenograft models of TNBC. Accordingly, we designed a safe and effective targeted therapy in TNBC by repurposing L-NMMA, previously studied in septic shock, with a dose regimen in combination with docetaxel that restrained tumor growth and prolonged mice survival [93]. Moreover, in combination with docetaxel, iNOS inhibition with L-NMMA enhanced the response to chemotherapy in PDX models of TNBC [94]. The translation of this therapeutic approach into clinic is under investigation in a phase Ib/II study in refractory locally advanced or metastatic TNBC patients (NCT02834403) [93,94]. Finally, iNOS has been associated with different signal transduction pathways such as vascular endothelial growth factor (VEGF). Increased levels of VEGF have been found in TNBC and it is known that NO can be responsible for it. Both iNOS and eNOS can induce VEGF and promote angiogenesis, thus L-NMMA (pan-NOS inhibitor) may be a good option to target this pathway [126].

#### *4.5. Anexelekto (AXL)*

AXL, named from the Greek word *anexelekto* which means "uncontrolled," is one of the TAM (Tyro3, AXL, and Mer) family of receptors tyrosine kinase (RTK) [127]. Structurally, in the extracellular part, it is composed of two immunoglobulin-like domains and two fibronectin III domains. The intracellular part presents an RTK domain that contains a KWIAIES motif of TAM family. Its activation results in the autophosphorylation at the cytoplasmic domain that unleashes different cascades and downstream targets that are highly context dependent. Some of these pathways are PI3K/protein kinase B (Akt), extracellular-signal-regulated kinase (ERK), and STAT, which can stimulate tumorigenic processes such as cell motility, invasion, or proliferation [128]. In TNBC patients, the high expression of AXL is a predictor of poor prognosis, produces mesenquimal phenotypes, by promoting EMT through the expression of Vimentin, Twist, Snail, and Slug, higher chemoresistance, tumorigenesis, metastases, and CSCs, which make it a potential candidate to treat TNBC [80,128,129]. AXL can be activated by mechanisms dependent and independent of the ligand GAS6. If it is mediated by GAS6, AXL activates signaling pathways like PI3K/Akt, MAPK, NF-κB, and JAK/STAT, which can stimulate tumorigenic processes. On the other hand, the GAS6-independent pathway involves EGFR that activates AXL, which finally unleashes Akt transcription and produces an increase of tumor cell proliferation and migration [128]. Targeted inhibition of EGFR may not be a good option in TNBC because AXL can be activated thought other pathways and the response to EGFR inhibitors is limited [130]. Because of drug repositioning three drugs included in the same family are considered as a possible CSC-targeted therapy.

Phenotiazines. Goyette et al. carried out a research of drug repurposing based on AXL knockdown gene signature. Using CMap, they found that three phenothiazines (thioridazine, fluphenazine, trifluoperazine) could produce a similar gene signature. These dopamine receptor antagonists are used as anti-psychotics and were tested in TNBC, obtaining good results both in vitro and in vivo. In vitro, decrease of cell invasion, proliferation and viability, and increase of apoptosis were seen in TNBC cell lines. Interestingly, an increased sensitivity to standard chemotherapy was also observed in

combination with paclitaxel. In vivo, a significant reduction of tumor growth and metastasis were observed. Furthermore, mechanistic insights revealed that these compounds did not exert their activities by antagonizing with dopamine receptor. AXL activity was not decreased but a reduction of PI3K/Akt/mammalian target of rapamycin (mTOR) and ERK signaling was produced, unravelling that repurposed drugs generate the same consequences as AXL knockdown [80].

#### **5. Drug Repositioning to Target Cancer Stem Cells in TNBC**

The CSC model for tumor propagation underlines that solid tumors are hierarchically organized, and contain a subset of cancer cells with stem-cell-like characteristics known as CSCs or tumor-initiating cells, which are able to sustain tumor growth, progression, and recurrence, as well as metastasis. Consequently, this model would explain intra-tumor heterogeneity and dormant behavior of several types of cancer [131–133]. CSCs phenotype varies according to the type of cancer. BCSC are characterized by surface markers CD44+/CD24–/low and aldehyde dehydrogenase 1 (ALDH1) enzyme activity. Interestingly, it has been suggested that the acquisition of a stemness phenotype in CD44+/CD24–/low subpopulation is connected to EMT [134], key event in metastatic spread [131,135,136]. EMT is known to be regulated by different pathways, including the TGFβ, PI3K/Akt/mTOR, MAPK, or Wnt/β-catenin, which can be abnormally regulated during malignant processes in TNBC [131]. In fact, several studies have demonstrated that activation of EMT induced by TGFβ increases the subpopulation of CSCs in breast cancers [137,138]. Interestingly, CSCs have been proved to be more abundant in TNBC than in other breast cancer subtypes, which could explain its higher aggressiveness [139,140]. Therefore, efforts are being focused on the development of CSC-targeted therapies [141]. Additionally, several studies have shown that CSCs are intrinsically resistant to chemotherapy and radiotherapy, therefore, targeting CSCs in combination with conventional chemotherapy might decrease the aggressiveness of TNBC and prevent cancer relapse and improve survival [131–133]. It has been suggested that EMT inhibitors could be potential CSC-targeted therapies in breast cancer. In fact, activation of Wnt/β-catenin signaling has been correlated with the expression of CD44+/CD24–/low CSC subpopulation. Whereas different Wnt inhibitors are currently under development for the treatment of cancer, several FDA-approved drugs, such as salinomycin, vitamin D3, or pyrvinium pamoate, have proven to inhibit this pathway, being possible candidates for repurposing [50,142]. Some other FDA-approved drugs have also been demonstrated to regulate EMT and/or affect CSCs via different molecular pathways, such as all-trans retinoic acid (ATRA) [143], benztropine mesylate [144,145], and chloroquine [146]. Moreover, some of the previously mentioned TNBC-directed repurposed drugs were shown to target EMT or CSCs as well, including flubendazole, niclosamide, zoledronic acid, and L-NMMA. All breast CSCs-targeted drugs that are being investigated are summarized in Table 5.


#### **Table 5.**Summary of drug candidates to target cancer stem cells (CSCs) under investigation by drug repurposing.


**Table 5.** *Cont.*

Salinomycin. It has been shown that LRP6, a co-receptor in the Wnt/β-catenin signaling pathway, is upregulated in TNBC, [158], and transcriptional knockdown decreased Wnt/β-catenin signaling, suppressing tumor growth in vivo [159]. Interestingly, the antibiotic salinomycin was demonstrated to induced the degradation of LRP6, inhibiting the Wnt pathway [147]. Gupta et al. studied the effects of salinomycin both in vitro and in vivo in comparison with paclitaxel. Salinomycin was found to decrease CD44+/CD24−/low population both in cell culture and tumorspheres, whereas paclitaxel induced an increase of this cell population, showing that CSCs were resistant to paclitaxel but sensitive to salinomycin. This effect was later confirmed in mice orthotopically injected with SUM159 cells; it was shown that, compared to paclitaxel, salinomycin was able to inhibit tumor growth and the expression of CSC genes [149]. Moreover, a study investigating the efficacy of salinomycin in combination with LBH589 was proven to be a potential BCSCs-targeted therapy in TNBC by inducing apoptosis, arresting the cell cycle, and regulating EMT in breast CSCs [148].

Pyrvinium pamoate. This FDA-approved anthelmintic was discovered to inhibit the Wnt/β-catenin signaling pathway using a high-throughput screen in a *Xeropus* egg extract [160]. As a consequence of this inhibition, this drug is able to suppress self-renewal of CSC, it reduces both CD44+/CD24−/low and ALDH<sup>+</sup> BCSCs and expression of EMT markers such as N-cadherin, vimentin, and Snail [142]. Furthermore, pyrvinium pamoate inhibits PI3K-dependent pathway via suppression of Akt/P70S6K signaling axis [151], as well as mitochondrial respiration function [161] and fatty acids and cholesterol anabolism, lipids that are crucial to Wnt/β-catenin pathways [150]. Reduction of tumor growth was observed in in vivo assays [142,151]. Xu et al. suggested that pyrvinium pamoate's effect on chemoresistance should be assessed in combination with traditional treatments based on the known association between BCSCs and Wnt pathways and the development of drug resistance [142].

Vitamin D3. Upon binding to its ligand, the vitamin D3 nuclear receptor (VDR) heterodimerizes with the retinoid X receptors (RXRs) and regulates the transcription of several genes involved in Wnt, TGFβ and Notch pathways in different types of cancer [143]. In breast cancer, vitamin D3 has been proved to decrease transcriptional levels of the Notch ligands, resulting in the inhibition of Notch-1 signaling, and levels of NF-κB1 [152,153]. Moreover, vitamin D3 has been shown to induce the downregulation of BRCA-1 expression, a commonly mutated gene in breast cancer, including TNBC [162]. In addition, Vitamin D3 was shown to reduce cell proliferation, CD44+/CD24−/low population, and mammosphere formation [153]. Interestingly, Pervin et al. reported that, in breast cancer, VDR silencing was associated with EMT and a higher ability to form mammospheres, whereas its over-expression was followed by a decrease in mammosphere-forming ability. Moreover, in accordance with the inherent aggressiveness of TNBC, they reported that VDR was significantly downregulated in TNBC cells, which resulted in a relative insensitivity to vitamin D3 treatment. Accordingly, these authors showed that a combination therapy with DETA NONOate achieved a significant decrease in mammosphere formation in vitro and tumor growth in vivo [154]. Accordingly, vitamin D3 has been suggested to be a potential inhibitor of breast CSCs.

All-trans retinoic acid (ATRA). Also called tretinoin, is a retinoid used in dermatology which was approved to treat acute promyelocytic leukemia and has been investigated for the treatment of other cancers like lymphoma, leukemia, melanoma, lung cancer, or cervix [143]. In a HER2+ breast cancer cell line, Zanetti et al. proved that treatment of both ATRA and EGF suppressed tumorigenic effects of EGF. While EGF-treated cells developed an increase of Notch1 transcription and TGFβ pathway stimulation via SMAD3, ATRA+EGF-treated cells did not enhance levels of Notch1, and SMAD3 active form was also decreased as phosphorylation did not ensue. Hence, ATRA modulated and reduced EMT by inhibiting transcription of Notch1 and switching TGFβ pathway from a pro-migratory to anti-migratory program. In TNBC, further studies are needed to be done to verify these mechanisms [163]. Using CMap and introducing six analyses of up and down-regulated genes related to CSCs, Bhat-Nakshatri et al. found ATRA to be a good candidate for a CSC targeted therapy in breast cancer, although its effectiveness depends on tumor type. These gene signatures were obtained by comparison of gene expression in two opposite contexts: one associated with CSC versus a non-CSC conditioned control. In TNBC, it was more interesting in those subtypes having mesenchymal properties, as they are enriched for CD44+/CD24–/low subpopulations. In vitro, ATRA produced a decrease in CSC self-renewal, determined by a mammosphere assay, and its effectiveness was augmented in cell lines with higher SOX2 expression. In addition, ATRA reduced levels of EGFR, SERPINE1, and Slug in a cell-line-type-dependent manner. MDA-MB-231 cell line was less sensitive to ATRA because of SOX2-independent characterization and KRAS mutation, which was responsible for resistance to ATRA. Thus, better results in mammosphere assays were obtained after the inhibition of KRAS pathway [155]. Furthermore, Ginestier et al. proved that treatment of ATRA reduced breast ALDH1<sup>+</sup> CSC population [156].

Benztropine mesylate. It is used for the treatment of Parkinson's disease. It acts as a central anticholinergic agent, as well as an antihistamine and a dopamine re-uptake inhibitor. Cell-based phenotypic screening and functional assays showed that benztropine mesylate inhibited mammosphere formation and self-renewal, reduced CSC subpopulations (both ALDH1<sup>+</sup> and CD44+/CD24–/low), and improved chemotherapy in vitro. In vivo, it impaired CSC frequency and their tumor-initiating potential [144]. In addition, Sogawa et al. studied that benztropine could modulate EMT via STAT3, NF-κβ, and β-catenin in colorectal cancer [145].

Chloroquine. It is an autophagy inhibitor primarily used as an antimalarial drug. Interestingly, autophagy has been associated with drug resistance and maintenance of CSC population. In accordance with this mechanism, Choi et al. identified chloroquine as a potential repurposed BCSC inhibitor after in silico gene expression signature analysis of CD44+/CD24−/low population. In vitro assays showed that chloroquine alone reduced the mammosphere formation efficiency and CD44+/CD24−/low population in SUM159 and MDA-MB-231 cells, which was associated with a decrease in the expression of Jak2 and DNA methyltransferase 1 (DNMT1). Moreover, chloroquine sensitized TNBC cells to paclitaxel through the inhibition of autophagy. In vivo assays with an orthotopic xenograft model proved that chloroquine plus paclitaxel significantly reduced tumor growth and CD44+/CD24−/low population, as opposed to paclitaxel alone, which had no effect on tumor growth and increased the CD44+/CD24−/low population, compared to controls, in accordance with previous in vitro assays [146]. A phase II clinical trial demonstrated the efficacy of chloroquine in combination with taxanes in the treatment of patients with advanced or metastatic anthracycline-refractory breast cancer (NCT01446016). Among their results, objective response rate (ORR) was 45.16%, patients showed a median PFS of 12.4 months and a median OS of 25.4 months. The combination was well tolerated, with only up to 13.15% of patients experiencing Grade ≥ 3 adverse events. These results suggest that chloroquine, in combination with taxanes, could be used for the treatment of TNBC patients [157].

Several of the previously mentioned target pathways in TNBC have been associated with EMT mechanisms, maintenance of tumor-initiating cells and/or tumor invasion, and drug resistance, including AR, ADR, STAT3, and AXL pathways. Correspondingly, we hypothesize that AR antagonists [56,58], the β-blocker propranolol [65] and atenolol [66,67,111], the STAT3 inhibitor bazedoxifene [71,72,118] and zoledronic acid [91], and phenothiazines (thioridazine, fluphenazine, trifluoperazine) [80] could act as potential inhibitors of BCSCs. Nevertheless, further investigations would still need to be performed. The pathways altered by these drug candidates to be potentially repurposed, as well as those included in Table 5, have been summarized in Figure 4.

*J. Pers. Med.* **2020**, *10*, 0200

**Figure 4.** Overview of the different pathways investigated by drug repurposing to target breast cancer stem cells (BCSCs) and their potential inhibitors/modulators. Repurposed inhibitors under investigation are shown in red. Hypothesized inhibitors are shown in yellow. Created with BioRender.com.

#### **6. Conclusions**

The absence of targeted therapies for the treatment of TNBC, besides its inherent molecular and histopathologic complexity, strongly reduces the chance of patient recovery and life expectancy. It has therefore become imperative to find effective molecularly targeted treatments to overcome the aggressive progression of this breast cancer subtype. Whereas de novo research is a costly and long-term process, drug repurposing provides the possibility to reduce the time and investment needed to translate a drug from bench to bedside for a specific therapeutic purpose. Drug repositioning is achieved by means of different strategies, especially those including computational methods. Accordingly, several therapies with different molecular targets are currently being investigated for repurposing in TNBC, including androgen receptor, adrenergic receptor, STAT3, nitric oxide synthase, or AXL-directed therapies. However, because of the importance of CSCs in the progression and aggressiveness of this subtype of cancer, current efforts are also being directed to the search of compounds targeting this subset of tumor-initiating cells in TNBC. Herein, according to all repurposed drugs that are currently being studied for the treatment of TNBC, a few of them can be highlighted. AR antagonists bicalutamide, enzalutamide, and seviteronel, currently under clinical trials, seem to be particularly promising drugs in light of their association with the Wnt pathway, reduction of drug resistance, and induction of radiosensitization, respectively. However, clinical trials are evaluating the efficacy of these antiandrogens only in patients with a LAR subtype and, as a consequence, these drugs might not be successful in treating the rest TNBC patients. Other drugs that are currently in the clinical stage are also highlighted, including zoledronic acid, L-NMMA, and chloroquine. They decrease tumor viability, reduce CSC population and their capacity of self-renewal both in vitro and in vivo. Furthermore, they seem to sensitize these cells to chemotherapeutics, hence diminishing drug resistance. Finally, there are other drugs at preclinical stage that must be highlighted because they

target CSCs or have been associated with a reduction of drug resistance, such as salinomycin, pyrvinium, vitamin D3, ATRA, benztropine, flubendazole, niclosamide, or propanolol. These drugs could be used as a monotherapy or in combination with chemotherapy to enhance the therapeutic response.

At the core of precision oncology, the high heterogeneity and molecular subtypes of TNBC should drive the diversity of approaches to tackle it, however, most studies do not discriminate between different subtypes. To date, only LAR subtype has really been addressed as an example of successful personalized drug repurposing. Besides the variety of molecular targets, a plethora of computational strategies hinder the ability to efficiently find potential repurposed drugs for TNBC patients. While having different tools for drug repositioning offers indeed a wide range of possibilities for personalized medicine, lack of a standardized protocol and a resolution of the most effective approach in the search of new uses for old drugs, raises the question: can computational drug repurposing actually be implemented as an improved method for drug discovery in personalized medicine and, more particularly, for TNBC? Factually, it is noticeable that some of the reviewed studies date from some years ago but none of those repurposed compounds have been yet approved for TNBC. While drug repurposing might increase the chances to help find new molecularly targeted candidates, hence improving the development of a more personalized medicine, the results suggest that not all candidates were as adequate as they might have seemed during in silico analysis, meaning that computational drug repurposing could not be as efficient as expected. It is therefore necessary for computational approaches to be validated and standardized, so as to reduce the chances of failure and allow drug repurposing to become an improved and attainable alternative with guarantees for personalized medicine. Be that as it may, drug repositioning has allowed to find new candidates that would not have been considered otherwise, making it still a powerful alternative for the search of a personalized treatment for TNBC patients.

**Author Contributions:** Conceptualization, S.G.-P.; writing, review and editing, M.Á.-M., A.L.-T., S.G.-P.; preparation of tables and figures, M.Á.-M., A.L.-T., F.E.C.-L., J.L.B.-C.; revision, S.G.-P., F.E.C.-L., J.L.B.-C., A.G.-G., J.A.L., P.S.-R.; funding acquisition, S.G.-P., P.S.-R., J.A.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work and S.G.P. was funded by Instituto de Salud Carlos III (CP19/00029, CP14/00197, PI19/01533, PI15/00336), Ministerio de Economía y Competitividad (RTC-2016-5674-1) and European Regional Development Fund (European Union). A.L.T. is funded by the Ministerio de Ciencia, Innovación y Universidades (FPU19/04450) and J.B.C. is supported by Fundación Científica Asociación Española Contra el Cáncer, Junta Provincial de Jaén (AECC) (PRDJA19001BLAY).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**




#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
