# **Genetic Complexity of Hormone Sensitive Cancers**

Edited by Jyotsna Batra and Rupert Ecker

Printed Edition of the Special Issue Published in *Genes*

www.mdpi.com/journal/genes


Editors

**Jyotsna Batra Rupert Ecker**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Jyotsna Batra Queensland University of Technology Australia

Rupert Ecker TissueGnostics GmbH Austria

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Genes* (ISSN 2073-4425) (available at: https://www.mdpi.com/journal/genes/special_issues/Hormone_Cancers).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-7116-4 (Hbk) ISBN 978-3-0365-7117-1 (PDF)**

Cover image courtesy of TissueGnostics GmbH

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**




## **About the Editors**

#### **Jyotsna Batra**

Associate Professor Jyotsna Batra, PhD, is a Laboratory Head at the School of Biomedical Sciences and an Advance Queensland Industry Research Fellow at the Centre for Genomics and Personalised Health at Queensland University of Technology, Australia. Dr Batra leads a research group working on the molecular genetics of cancer. Her current research focus is to identify cancer risk-associated genetic variants and to understand their molecular consequences on cancer initiation and progression. She aims to develop better biomarkers to detect cancer early and to identify genetic biomarkers that can distinguish slow-growing diseases from very aggressive diseases at an early stage, so that better therapeutic interventions can be performed. Dr Batra has contributed to >200 research articles, including those in high-impact journals such as *Nature Genetics*. Dr Batra has received several poster and oral prizes for her research work. She has also been a finalist for the prestigious ASMR and Women in Technology (WiT) Awards and won the Queensland Young Tall Poppy Award (2019) and Cure Cancer Researcher of the Year Award (2018). She has received >AUD 8 million in funding, including international funding from the US DoD Idea Development grant.

#### **Rupert Ecker**

Rupert Ecker, PhD, is an Adjunct Professor at Queensland University of Technology (QUT) and studied cell biology at the University of Vienna. He is an entrepreneur, holder of multiple patents, and co-founder of the TissueGnostics group in Austria, Romania, the USA, and the Asia-Pacific region. He is Chief Executive Officer of TissueGnostics GLOBAL. His current research focus is on artificial intelligence (AI)-based decision support systems for next-generation histopathology/tissue cytometry. He aims to develop ISO 13485- and in-vitro-diagnostics-compliant image cytometry software for biomedical research, as well as clinical diagnosis. Dr. Ecker is an inventor of more than 10 patents and has contributed to >50 research articles and book chapters. Together with his collaborators, he won the science2business Award 2011 of the Austrian Chamber of Commerce.

## **Preface to "Genetic Complexity of Hormone Sensitive Cancers"**

Cancer is a multi-faceted phenomenon and requires a concerted action across different disciplines to define a roadmap for precision medicine to treat cancer, from genomics, bioinformatics, biochemistry, proteomics, cellular and tumor biology, pharmacology, oncology, hematology, radiology, and surgery to relapse monitoring and palliative care. In this Special Issue, we focus on the initial segments of this roadmap by elucidating the 'Genetic Complexity of Hormone-Sensitive Cancers', and the book chapters provide insights into selected topics and technologies at the current frontiers of cancer research.

As scientists working in the field of biomedical research, we see novel and innovative techniques and technologies appearing at an increasing pace. These are exciting times for cancer research. Technologies that simultaneously determine the expression levels of large groups of genes and link the genotype with cellular phenotypes and functions are essential for understanding regulatory networks, not only those made up of proteins (enzymes, hormones and the signal transduction cascades they constitute), but also genetic regulation.

Today, completely novel technologies and instruments are revolutionizing our view of biological entities such as genes. So, too, are novel extensions to and revisions of existing, well-established technologies, such as genetic probes and bar-coding techniques, which add a whole new dimension to microscopy by allowing us to visualize DNA sequences and mRNA in situ. It is the combination of well-established technologies with novel approaches, for example, the identification of newly spliced isoforms of key oncogenes, that invites new perspectives on old questions and allows us to connect that which previously appeared to be separate or even contradictory.

Only 2% of genetic material belongs to defined genes, while the remaining 98%, regarded as evolutionary detritus, was long considered to be unused. There is emerging evidence, however, that this genetic material is neither unused nor detritus, but rather exerts regulatory functions, both physiological and pathological. In Chapters 3 and 5, the authors present data on long non-coding RNAs (lncRNAs), which code for small peptides referred to as micropeptides (miPEPs). Finding coding information in "non-coding gene sequences" and identifying huge regulatory networks that had until now been elusive was completely unexpected, and it calls into question everything we had assumed about intracellular regulatory networks and our genetics. By analogy with the work of cosmologists, who found that the vast majority of mass in the universe is dark matter, we term this 98% of the genetic material of unknown information and function *Genomic Dark Matter*. Hence, we are close to understanding how genes and proteins are intertwined, going far beyond the classical model of a transcription-based unilateral information flow from genes to proteins, and we might even witness a paradigm shift over the coming years. The regulation of genes and proteins through small, "transient regulatory units" that may act simultaneously with lncRNAs through their transcription products, miPEPs, is a revolutionary and refreshing new concept, and we might soon observe the formation of new sub-disciplines in genetic science.

The editors of this Special Issue wanted to create a unique forum for the international biomedical research community to present their research and review articles on the genetic complexity of cancer. It is notable that the guest editors and contributors are based all over the globe. We express our gratitude to the contributing authors from multiple institutions in nine countries. While the chapters in this book cannot claim completeness in any way, they represent an attractive collection across different aspects of hormone-dependent cancers and will hopefully stimulate further targeted research.

The first chapter is a review article on the state of the art of computational pathology, the next step of digital pathology. Over the previous two decades, automated slide scanning and the digital representation of tissue sections for histopathological examination have become mainstream, and many commercial options have become available. The terminology generally used to describe this technological approach is Digital Pathology. This term is actually misleading in that the "act of pathology" is not digital. Pathology is the "ability" to recognize relevant structures in histological sections of human tissue and to deduce diagnostic and prognostic conclusions. In Digital Pathology, human experts look at digital representations of tissue samples on monitors, rather than through the oculars of a microscope, but the pathological analysis and interpretation as such is still performed visually. The next step of development aims to introduce a digital workflow into the actual analysis by using contemporary tools such as tissue cytometry, spatial biology, and in situ genomics. These novel approaches are also referred to as Next-Generation Digital Histopathology.

Chapter 2 is a review on the Genetic Complexity of Prostate Cancer (PCa), its various pathways and mechanisms, its genetically distinct subsets, and the extraordinary complexity of genomic alterations in this form of cancer. The authors provide an overview of genetic changes that can occur during carcinogenesis and tumor progression and reflect on potential therapeutic approaches.

An intriguingly novel research topic, the regulatory function of long non-coding RNAs (lncRNAs), and its emerging importance in prostate and breast cancer are reviewed in Chapter 3. Recently, germline genetic variations associated with cancer risk have been correlated with lncRNA expression and/or function. In addition, single nucleotide polymorphisms (SNPs) may occur within cancer-associated lncRNAs; these are correlated with cancer risk and may have potential as therapeutic targets for cancer treatment. As some of these lncRNAs have a tissue-specific expression profile, they require further investigation to explore their potential as biomarkers for specific cancers.

The authors of Chapter 4 present their research work on the gene signatures fundamental to controlling cancer progression and their importance for the development of biomarkers and diagnostic tools. Two sets of overlapping signatures linking hormone signaling, cell cycle progression, and control pathways are discussed for prostate cancer, as these pathways may act together to promote aggressive cancer development.

Chapter 5 scrutinizes the alternative splicing of Iroquois-class homeodomain protein 4 (IRX4), a homeobox transcription factor, which has been implicated in PCa as a tumor suppressor through genome-wide association studies (GWAS). Additionally, tumor growth, metastasis, and therapy resistance benefit from aberrant splicing. In the study presented, twelve IRX4 transcripts in PCa cell lines were investigated, including seven novel transcripts, demonstrating unique expression profiles between androgen-responsive and non-responsive cell lines. These IRX4 isoforms might induce distinct functional programming that could contribute to PCa hallmarks, thus providing novel insights into diagnostic, prognostic, and therapeutic significance in PCa management.

Chapter 6 provides an in-depth comparison between six prostate cancer cell lines and tumor tissues with respect to gene-expression levels of androgen, insulin, estrogen, and oxysterol signaling. Major differences between the PCa tissue and cell lines and between different cell lines are described, and these elucidate the importance and limitations of cell lines for understanding PCa formation and progression. This systematic characterization provides a solid basis for the PCa research community to choose the appropriate cell line model for any hormone pathway of interest.

Acquired resistance to cyclin-dependent kinases 4 and 6 (CDK4/6) inhibition in estrogen receptor-positive (ER+) breast cancer remains a significant clinical challenge. Chapter 7 investigates differentially expressed genes associated with an acquired resistance to palbociclib in ER+ breast cancer.

Chapter 8 reviews alternative splicing in gynecological cancers, which is also induced by human papillomavirus and may cause cervical cancer, the fourth most common cancer on a global scale and the most common cancer in developing countries with rapidly increasing mortality rates. Serine/arginine-rich (SR) proteins and heterogeneous ribonucleoproteins (hnRNPs) have prominent roles in modulating alternative splicing. Aberrant splicing events in cancer-related genes lead to chemo- and radio-resistance, and the authors reflect on the role of alternative splicing events and splicing variants in cervical cancer as potential biomarkers for diagnosis, prognosis, and the development of novel drug targets.

Bladder carcinoma (BC), another hormone-dependent form of cancer with increasing incidence and mortality rates, is the subject of the study presented in Chapter 9. The study uses a global proteomics approach to identify differentially expressed proteins in bladder cancer cell lines and identified hundreds of proteins that were overexpressed or downregulated in several BC cell lines compared with a normal human urothelial cell line. It identified UDP-N-acetylhexosamine pyrophosphorylase (UAP1) as a promising therapeutic target for bladder cancer.

The last chapter reviews the current research on head and neck cancers with a focus on alternative splicing. These forms of cancer originate from the mouth, nasal cavity, throat, sinuses, and salivary glands, and the growing incidence rate makes them the sixth most common cancers worldwide. As with other cancer forms discussed in previous chapters, human papillomavirus (HPV) is a major risk factor in head and neck squamous cell carcinomas (HNSCC), followed by tobacco use and alcohol consumption. The genetic basis of the development and progression of HNSCC includes aberrant non-coding RNA levels. Different protein isoforms, the alternate methylation of proteins, and changes in the transcription of non-coding RNAs (ncRNA) can be used as diagnostic or prognostic markers and targets for the development of new therapeutic agents.

Recent methodological advancements in genetics, cell biology, biochemistry, bioinformatics, and information technology allow us to perform tests, experiments, and in situ investigations that the previous generation of life scientists might not even have dreamt of. Currently available commercial solutions permit us not only to identify specific genes in cells within their native tissue environment, but also to detect genomic signatures in situ and at the single-cell level. Tools for spatial biology, also referred to as contextual tissue cytometry, allow us to simultaneously quantify a continuously increasing number of phenotypic/proteomic markers and relate them to the genetic profile of specific cells. These novel conceptual and experimental approaches are essential aspects of today's research endeavors and aim to successfully establish a roadmap towards Precision Cancer Medicine. As these chapters present such advancements, we hope this book may contribute towards achieving this bold aim, and we wish to inspire and enlighten all of our readers in order for them to further their own research.

**Jyotsna Batra and Rupert Ecker** 

*Editors* 

## *Review* **Next-Generation Digital Histopathology of the Tumor Microenvironment**

**Felicitas Mungenast 1,2,\*, Achala Fernando 3,4, Robert Nica 2, Bogdan Boghiu 5, Bianca Lungu 5, Jyotsna Batra 3,4 and Rupert C. Ecker 2,3,4,\***


**Abstract:** Progress in cancer research is substantially dependent on innovative technologies that permit a concerted analysis of the tumor microenvironment and the cellular phenotypes resulting from somatic mutations and post-translational modifications. In view of a large number of genes, multiplied by differential splicing as well as post-translational protein modifications, the ability to identify and quantify the actual phenotypes of individual cell populations in situ, i.e., in their tissue environment, has become a prerequisite for understanding tumorigenesis and cancer progression. The need for quantitative analyses has led to a renaissance of optical instruments and imaging techniques. With the emergence of precision medicine, automated analysis of a constantly increasing number of cellular markers and their measurement in spatial context have become increasingly necessary to understand the molecular mechanisms that lead to different pathways of disease progression in individual patients. In this review, we summarize the joint effort that academia and industry have undertaken to establish methods and protocols for molecular profiling and immunophenotyping of cancer tissues for next-generation digital histopathology—which is characterized by the use of whole-slide imaging (brightfield, widefield fluorescence, confocal, multispectral, and/or multiplexing technologies) combined with state-of-the-art image cytometry and advanced methods for machine and deep learning.

**Keywords:** next-generation digital histopathology; tissue cytometry; multiplexing; RNA ISH; cancer; tumor immune microenvironment; tumor microenvironment

**Citation:** Mungenast, F.; Fernando, A.; Nica, R.; Boghiu, B.; Lungu, B.; Batra, J.; Ecker, R.C. Next-Generation Digital Histopathology of the Tumor Microenvironment. *Genes* **2021**, *12*, 538. https://doi.org/10.3390/genes12040538

Academic Editor: Gael Roue

Received: 11 March 2021 Accepted: 1 April 2021 Published: 7 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Cancer is a crucial global health challenge. The incidence of new cancer cases is predicted to increase by around 70% over the coming two decades [1]. Because cancer is thought to originate from a deranged genome, exploring the genomic, transcriptomic, and proteomic nature of cancer is vital for understanding the disease and developing treatments for it [2]. The tumor cells and the surrounding microenvironment, which includes various types of immune cells, signaling cells and molecules, fibroblasts, and the extracellular matrix comprised of adjacent blood vessels, are highly interdependent compartments. We have started to understand the complex interplay between each of these compartments, with the tumor itself shaping and directing its surroundings, the tumor microenvironment (TME), while at the same time obtaining signals from this microenvironment for further progression [3]. It is often believed that cancer seeds are germinated primarily in an appropriate microenvironment [4]. Precise localization of molecular indicators inside the microenvironment by spatial immunophenotyping techniques permits a more comprehensive analysis of the tumor to foresee its progression and therapy response [5–7].

In the past few years, in-depth profiling of cancer cells/tissues has established the cancer genome, the transcriptome, and the proteome as powerful sources of diagnostic, prognostic, and predictive markers/biomarkers [8,9]. In this regard, spatially mapped cellular gene expression has emerged as a critical method to understand the localization and complicated multicellular interactions of DNA, RNA, and proteins within cells located in the tumor as well as in the TME [10,11]. Interrogating the cellular organization of the tumor at the single-cell level, together with each cell's interactions with neighboring cells, leads to a better understanding of the heterogeneity of the TME between individuals as well as within the same tumor sample [12,13]. Thus, a need arises for multi-omics approaches in which many DNA, RNA, splice variant, and protein targets can be visualized in situ by various staining techniques. This in turn requires stained tissue sections to be quantified in terms of intensity, presence (expression levels), and/or spatial distribution in an unbiased, objective, fast, and automated way. Next-generation digital pathology is able to fulfil these requirements in research as well as in clinics. Even though the term "digital pathology" has been used for decades, its practical definition is still limited to digitizing samples. The actual analysis in digital pathology is still performed visually, by pathologists looking at a monitor rather than through a microscope's oculars. Converting immunohistochemistry (IHC), immunofluorescence (IF), or RNA in situ hybridization (RNA ISH) stained markers within tissue sections into digitized images is a prerequisite, but for pathology to become really "digital" and automated, further processing and extraction of quantitative data, termed image cytometry, is required.
Several commercial systems are available [14] that offer specialized software solutions utilizing image cytometry, but are methodically focused on the analysis of histological sections and are thus referred to as tissue cytometry [15,16]. Owing to the constant evolution and increasing reliability of these systems, two commercially available whole slide imagers have been approved by the U.S. Food and Drug Administration (FDA) and can be used for clinical applications [17]. To evaluate the reliability of these next-generation digital pathology platforms in clinics in terms of prognostication and patient management, Nagpal et al. conducted a comprehensive study using prostatectomy specimens. They established a deep convolutional neural network addressing Gleason scoring, which was trained by pathologists on 912 hematoxylin and eosin (HE) stained tissue slides. Next, they compared the classification of 29 additional pathologists with the results of the deep learning-based system. As an outcome of the study, the deep learning-based Gleason classification system showed a significantly higher sensitivity and specificity than 9 out of 10 pathologists [18]. Further studies that applied deep convolutional, deep learning, or machine learning networks to cancer tissue/TME classification on HE cancer samples for follow-up alignment with clinicopathological parameters are those by Jiao et al. [19], Kwak et al. [20], and Bidal et al. [21] on colon cancer samples, Mittal et al. on breast cancer [22], Wang et al. on lung adenocarcinoma [23], and Diao et al. on skin cutaneous melanoma, stomach adenocarcinoma, breast cancer, lung adenocarcinoma, and lung squamous cell carcinoma [24].
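The sensitivity and specificity comparison mentioned above can be illustrated with a minimal sketch. It assumes that each specimen's grading has been reduced to a binary decision (for example, at or above some grade threshold), which is a simplification and not necessarily how the cited study evaluated its classifier. The function name and the toy labels are hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    truth     -- reference labels (True = condition present)
    predicted -- labels from the classifier or reader under evaluation
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
reference = [True, True, True, True, False, False, False, False, False, False]
model = [True, True, True, False, False, False, False, False, True, False]

sens, spec = sensitivity_specificity(reference, model)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Computing the same two numbers for each human reader against the same reference labels would allow a reader-by-reader comparison with the automated system.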

All the above-mentioned studies are good examples showing that tissue cytometry may provide the methodological basis for next-generation digital pathology, which is the state-of-the-art technology to use and constitutes an enabling factor for precision medicine in clinics as well as in research. Within this review, we go one step further by addressing the concepts of next-generation digital pathology using imaging-based tissue cytometry, in combination with multiplexing and RNA ISH technologies, as an emerging and central method within precision diagnostics, and by discussing various applications.

#### **2. Multiplexing Techniques as Useful Tools for High-Content Phenotyping**

To achieve high-content phenotyping, optionally in combination with genetic markers for well-defined DNA loci as well as total RNA or specific mRNA measurements, the importance of multiplexing staining techniques continues to increase in research and clinics, especially for the purpose of determining the complex immune and tumor microenvironment status in patients suffering from cancer, graft versus host disease, and other pathological conditions related to immune responses [25]. In clinics, the assessment of various immune cell markers as well as immune cell populations is required for prognosis, diagnosis, and selecting the therapeutic intervention strategy. Conventional IHC/IF staining techniques are restricted in the number of markers that can be detected at once within one tissue section. This problem was bypassed by staining consecutive tissue sections, with the main limitation being that high-dimensional co-expression analysis is not possible and very precious information is lost [14]. In recent years, however, multiplexing, i.e., visualizing a large number of markers at one time within a sample, has matured and now represents a powerful tool for investigating complex molecular/functional processes and interactions within cells as well as in the complex native tissue environment. In this section, we discuss various immunohistochemistry and immunofluorescence multiplexing techniques.

IHC-based multiplexing methods: Conventional IHC staining, as usually used in pathology, only allows the detection of one marker per tissue section, and therefore no co-expression analysis is possible. With IHC multiplexing techniques, the number of stained markers per tissue section can be increased drastically, which leads to more detailed staining of patient tissue and is especially important for clinical applications with respect to diagnostics and prognosis [26]. Previously published multiplexing methods based on IHC are "multiplexed immunohistochemical consecutive staining on single slide" (MICSSS) [27] and the "Sequential Immunoperoxidase Labelling and Erasing Method" (SIMPLE) [28]. These two techniques exploit the alcohol solubility of the peroxidase substrate 3-amino-9-ethylcarbazole (AEC). The protocol is similar to conventional IHC but, after image acquisition, includes the removal of AEC with an organic solvent-based destaining buffer and restaining with new antibodies targeting other markers of interest. Thereby, MICSSS and SIMPLE enable multiple staining rounds. As a final step, the images taken after each staining round are overlaid and sometimes even transferred into a pseudo-color IF-like image. The advantages of these two staining techniques are that they allow co-expression analysis and that there are no limitations in terms of antibody species (antibodies from the same host species can be used for each marker), which is a limitation in conventional IHC staining techniques. However, MICSSS and SIMPLE allow only one marker per staining round and are therefore limited in the number of markers (according to published data, up to 5–10 in total) and are highly time intensive [28,29].
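How overlaying sequential staining rounds enables co-expression analysis can be sketched as follows. The sketch assumes that the images from all rounds have already been registered and segmented, so that every cell carries the same identifier in each round, and that each round yields a positive/negative call per cell for its single marker. The function name, marker names, and data are hypothetical and are not taken from the MICSSS or SIMPLE protocols.

```python
from collections import Counter


def merge_rounds(rounds):
    """Combine per-round marker calls into one phenotype per cell.

    rounds -- list of (marker_name, {cell_id: is_positive}) tuples,
              one tuple per staining round (one marker per round).
    Returns {cell_id: frozenset of markers the cell is positive for}.
    """
    phenotypes = {}
    for marker, calls in rounds:
        for cell_id, is_positive in calls.items():
            markers = phenotypes.setdefault(cell_id, set())
            if is_positive:
                markers.add(marker)
    return {cell_id: frozenset(m) for cell_id, m in phenotypes.items()}


# Toy data: three staining rounds on the same section, three cells.
rounds = [
    ("marker_A", {1: True, 2: True, 3: False}),
    ("marker_B", {1: True, 2: False, 3: False}),
    ("marker_C", {1: True, 2: False, 3: True}),
]

phenotypes = merge_rounds(rounds)

# Count cells per marker combination, and cells co-expressing A and B.
phenotype_counts = Counter(phenotypes.values())
co_expressing = sum(
    1 for markers in phenotypes.values() if {"marker_A", "marker_B"} <= markers
)
print(phenotype_counts)
print("cells positive for marker_A and marker_B:", co_expressing)
```

With consecutive tissue sections, the per-cell identifiers would not match between stainings, which is why this kind of per-cell merge is only possible when all rounds are performed on the same section.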

IF-based multiplexing methods: Multiplexing methods based on immunofluorescence are much more common and offer many advantages over IHC-based multiplexing methods. With IF multiplexing techniques, conventional immunofluorescence staining/imaging can be extended from around 6 to up to 60 markers. Published IF multiplexing techniques include "TSA Opal multiplex immunohistochemistry" (Opal mIHC, PerkinElmer, Waltham, MA, USA) [30], the "in silico multiplexing workflow" [31], "tissue-based cyclic immunofluorescence" (t-CyCIF), MultiOmyx (MxIF), and "multi-epitope-ligand cartography" (MELC) technology, as well as DNA barcoding-based techniques such as "CO detection by InDEXing" (CODEX, Akoya Biosciences, Marlborough, MA, USA) and GeoMx® (NanoString, Seattle, WA, USA). IF-based methods are much more effective and faster than the IHC-based methods, given that more than one marker can be stained simultaneously in each staining round [14,31–35]. The Opal mIHC technique is based on sequential staining rounds, and the secondary antibodies are tagged with tyramide signal amplification system (TSA)-conjugated fluorescence molecules. Heat-treated stripping of the tissues in between the staining rounds removes the primary and secondary antibodies but not the TSA-conjugated fluorescence molecules. After multiple staining rounds, the slides can be imaged. There is no limitation in the number of different antibody species, but there is a restriction in the number of fluorochromes [30]. Blenman et al. established a workflow for multiplexing that includes multiple staining rounds of the tissue, whole-slide imaging with the tissue cytometer TissueFAXS PLUS (TissueGnostics, Vienna, Austria), dye inactivation by chemical bleaching after each acquisition step, as well as merging the images from all staining rounds and quantitative analysis of the stained markers/cell populations with StrataQuest software (TissueGnostics) [31]. A similar strategy is used by the t-CyCIF and the MxIF techniques [32,33]. One big advantage of these chemical bleaching-based methods is that they substantially reduce autofluorescence of the tissue after each acquisition step [36]. However, chemical bleaching-based technologies are still time consuming; for a staining protocol of 30 markers, approximately 1–2 weeks are needed. One main disadvantage of the repeated chemical bleaching steps for fluorochrome removal after each staining/imaging round is that the preservation of cell and tissue integrity cannot be guaranteed. Lin et al. reported that after 10 staining rounds, a loss of 2–46% of the cells within various tissue types was observed [31–33]. Another technology used for multiplexing is MELC, which is based on fully automated and repeated rounds of multiple-marker IF staining, imaging, and chemical and photobleaching (at the excitation wavelength) of the fluorochromes on a tissue section. The main limitation of the MELC technology is that the photobleaching and imaging step can only be applied to one microscopic field of view [35]. A rather novel technology able to deal with a very high number of different target antigens is the DNA barcoding-based method CODEX.
A cocktail of up to 50 unique oligo-DNA (barcode)-conjugated antibodies specific for the target markers is applied at once to the tissue section. Next, the barcodes are detected by highly specific dye-labeled reporters, which are barcode-complementary oligonucleotides labeled with fluorochromes. Multiple rounds of staining, imaging, and removal of the reporters allow high-dimensional phenotyping [34]. A similar technology is used by GeoMx® (NanoString, Seattle, WA, USA), which is also based on oligonucleotide tags (barcodes) in combination with microscopic imaging to identify a high number of markers (proteins, mRNA, miRNA, etc.) in one hybridization reaction [26]. A summary of the above-mentioned staining methods is provided in Table 1.
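The trade-off between panel size, number of staining rounds, and cumulative cell loss described above can be made concrete with a little arithmetic. The following sketch is illustrative only: the number of markers per round is an assumed parameter, and the per-round loss is derived under the simplifying assumption that the same fraction of cells is lost in every round, which the cited studies do not claim.

```python
import math


def staining_rounds(n_markers, markers_per_round):
    """Number of staining rounds needed to cover a marker panel."""
    if n_markers < 1 or markers_per_round < 1:
        raise ValueError("both arguments must be positive integers")
    return math.ceil(n_markers / markers_per_round)


def per_round_loss(total_loss, n_rounds):
    """Constant per-round loss fraction that compounds to total_loss.

    Assumes the same fraction of the remaining cells is lost in every
    round, i.e. (1 - per_round) ** n_rounds == 1 - total_loss.
    """
    if not 0.0 <= total_loss < 1.0 or n_rounds < 1:
        raise ValueError("total_loss must be in [0, 1) and n_rounds >= 1")
    return 1.0 - (1.0 - total_loss) ** (1.0 / n_rounds)


# One marker per round (as in the sequential IHC methods): 10 markers.
print(staining_rounds(10, 1))

# Hypothetical cyclic IF panel: 30 markers, 4 markers per round (assumed).
print(staining_rounds(30, 4))

# Reported range of cell loss after 10 rounds: 2% to 46%.
for total in (0.02, 0.46):
    print(f"total loss {total:.0%} -> per-round loss {per_round_loss(total, 10):.2%}")
```

Under the constant-loss assumption, even the upper end of the reported range corresponds to a loss of only a few percent of the remaining cells per round, which shows how quickly small per-round losses compound over long protocols.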

The enhanced number of stained markers offered by several multiplexing methods also increases the necessity of appropriate next-generation digital pathology platforms that provide fully automated acquisition of the stained tissue sections as well as computer-assisted/digital high-content phenotypic analysis and high-dimensional data mining.





#### **3. Advanced Imaging for Digital Pathology**

The first step in a tissue cytometry/next-generation digital pathology workflow is whole slide scanning, or at least acquisition of a region of interest of the stained slide. The second and even more important step is the subsequent computer-assisted quantitative image analysis. Next-generation digital pathology technology aims to guide the workflow away from visual observation with a standard microscope and subjective estimations, which are funneled into scoring schemes describing marker expression with "+/++/+++", towards a fully automated computerized platform for the detection and numerical quantification of stained markers in defined cell subpopulations in relation to specific histological structures. Not only do these platforms provide fast analysis of markers, they also aim to deliver accurate, unbiased, reproducible, and standardized results. These platforms are already well integrated and used in various fields of research [37,38]. Additionally, in 2017 the FDA approved the first next-generation digital pathology program (Philips IntelliSite; PIPS) as a clinical digital diagnostics tool in routine diagnosis [39].

Several whole slide imaging platforms (with or without image analysis software) are commercially available in various configurations (e.g., TissueGnostics, Akoya Biosciences, Leica Biosystems, Hamamatsu, Zeiss, 3DHistech, PerkinElmer, Roche, Philips). As the name already indicates, these scanners are able to acquire whole slides instead of only individual captures of fields of view, and thereby provide complete composite digitized images of slides in high resolution. The technology used is image acquisition by either tile scanning or line scanning with a follow-up stitching of the images [14,40]. Depending on the specific next-generation digital pathology platform configuration, these scanners are able to perform whole slide imaging in different imaging modes such as brightfield, widefield fluorescence, confocal, structured illumination, multiplexing, and/or multispectral. The hardware components are usually the following: microscopy stand (upright, inverted) or boxed system without a phototube, cameras (color and/or monochrome), light sources for fluorescence and/or brightfield mode, multiple filter sets for multicolor fluorescence imaging (may include single-, dual-, and/or multi-band filters), high-quality objective lenses for acquisition with different magnifications (1× to 100×), motorized slide scanning stage or high-throughput slide loading systems. Some platforms offer objective auto-oiling for high magnifications and/or provide a slide bar-code reader for higher efficiency. A powerful computer workstation and high-resolution computer monitors for the viewing of the digitized slide as well as for the potential follow-up image analysis [37,41] are mandatory. For controlling all the individual components, for digitized slide viewing and data management, the platforms also include highly functional slide imaging software [40]. In some instances, the platforms also offer or are equipped with image analysis solutions for quantitative analysis. 
Such quantification is not stoichiometric, and hence does not provide chemical concentration of markers, but is rather based on comparison with negative controls, which is referred to as cytometric.

The basic Theory of Scales of Measurements defines four different types of scales: nominal, ordinal, interval, and ratio [42], the first one being referred to as qualitative and the other three being accepted as quantitative. On all three quantitative scales, systematic, observer-independent measurements of well-defined attributes of objects can be performed, resulting in numerical values that allow for comparison of the objects under investigation as well as statistical evaluation of the assigned attributes.

Most slide scanners today provide area and distance measurements in metric units; these measurements fall into the ratio type of scale. The amount of any given molecular marker expressed in certain cells, however, is usually determined as a relative value (e.g., mean relative fluorescence or optical density in brightfield microscopy) rather than an absolute value (e.g., μmol or nanogram). Hence, such measurements belong to the interval type of scale. They permit comparisons of more and less, but it is not possible to draw conclusions from the ratio of two values. If a cell or cell population expresses a certain molecule at a mean relative fluorescence of 7000 and another cell or cell population exhibits a value of 14,000, in the scope of a cytometric measurement it is safe to state that "the second cell/population contains more of that molecule than the first" and that "the mean relative fluorescence increases from 7000 to 14,000" in a comparison of these two entities, but it cannot be concluded that the amount of molecules doubles. This is similar to our daily temperature readings in Celsius or Fahrenheit: 30 °C is not "twice as hot as 15 °C". Cytometric measurements belong to the interval scale and are thus to be considered quantitative.
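The temperature analogy can be made concrete with a few lines of Python. The following is a minimal sketch with illustrative names and values (not taken from any cytometry software): it shows that the ratio of two interval-scale readings changes when the unit and zero point change, whereas the ratio of two differences taken against a common reference does not.

```python
def celsius_to_fahrenheit(c):
    """Affine change of unit: scale by 9/5 and shift the zero point by 32."""
    return c * 9.0 / 5.0 + 32.0

# Two readings and a common reference value (all values are illustrative).
a_c, b_c, ref_c = 15.0, 30.0, 10.0
a_f, b_f, ref_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, ref_c))

# The ratio of raw readings depends on the arbitrary zero point of the scale.
ratio_c = b_c / a_c   # 30 / 15
ratio_f = b_f / a_f   # 86 / 59, no longer a factor of two

# The ratio of differences to a common reference is unaffected by the unit change.
diff_ratio_c = (b_c - ref_c) / (a_c - ref_c)
diff_ratio_f = (b_f - ref_f) / (a_f - ref_f)

print(ratio_c, round(ratio_f, 3), diff_ratio_c, diff_ratio_f)
```

In this picture, a relative fluorescence value plays the role of the temperature reading: statements about "more" or "less" survive a change of scale, while statements such as "twice as much" do not.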

Depending on the platform, image analysis options support a broad range of applications and may enable, among others, basic single cell analysis, dot detection, cellular co-expression as well as subcellular co-localization analysis, metastructure detection, multiplexed high-content phenotyping, proximity measurements, structural tracing (e.g., neurites and/or axons), particle and/or single cell tracking, as well as the analysis of spatial relationships for next-generation digital pathology [41,43]. A representative example of high-dimensional data analysis and the power of these platforms is shown in Figure 1.

**Figure 1.** A representative example of high-dimensional automated tissue cytometry shown on a colon sample stained for seven markers. (**a**) Original multicolor immunofluorescence image data set acquired by a multispectral imaging technology. Nuclei stained by 4′,6-diamidino-2-phenylindole (DAPI) in blue; immune markers/immune checkpoint markers CD4 in green/PD-L1 in yellow/PD1 in red/CD68 in pink/CD8 in orange; pan-cytokeratin marker in turquoise. As this raw data image contains overlapping emission signals from the fluorochromes, the colors appear mixed. (**b**) Image with clearly separated fluorescent signals obtained by a mathematical procedure referred to as spectral unmixing. (**c**) Nuclei detection, highlighted by the green contour mask shown in overlay to the original image. (**d**) Metastructure detection of epithelial cells, highlighted in orange overlay. (**e**) Proximity measurements in relation to detected metastructures with various distance zones highlighted by different colors. (**f**) Analysis of spatial connections among and between single cells of a specific cellular phenotype highlighted by a green mask and white connecting lines. The images were provided by and analyzed using TissueGnostics' image cytometry solution StrataQuest.
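Spectral unmixing, as mentioned for panel (b), is often modeled as a linear problem in which each detection channel records a weighted sum of the fluorochrome signals. The sketch below assumes a hypothetical two-channel, two-fluorochrome case with made-up crosstalk coefficients; it illustrates the principle only and is not the algorithm of any specific platform.

```python
def unmix_two_channels(measured, mixing):
    """Solve mixing * abundances = measured for a 2x2 mixing matrix (Cramer's rule).

    mixing[i][j] is the assumed fraction of fluorochrome j's signal recorded
    in channel i (hypothetical values; in practice such coefficients are
    typically derived from single-stained reference samples).
    """
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular; the spectra cannot be separated")
    m0, m1 = measured
    return ((d * m0 - b * m1) / det, (a * m1 - c * m0) / det)


# Hypothetical crosstalk: 10% of dye 0 bleeds into channel 1, 20% of dye 1 into channel 0.
MIXING = [[0.9, 0.2],
          [0.1, 0.8]]

true_abundances = (100.0, 20.0)
# Simulate the mixed pixel intensities that the two channels would record.
measured = (
    MIXING[0][0] * true_abundances[0] + MIXING[0][1] * true_abundances[1],
    MIXING[1][0] * true_abundances[0] + MIXING[1][1] * true_abundances[1],
)

recovered = unmix_two_channels(measured, MIXING)
print(measured, recovered)
```

With seven markers, as in Figure 1, the same idea extends to a larger system of equations that is usually solved per pixel by least squares.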

#### **4. Role of Machine Learning**

A fundamental improvement in recent years that has advanced the next-generation digital pathology approach is the integration of artificial intelligence (AI) algorithms for pattern recognition into the image analysis/image cytometry process [44]. Over the past few years, these AI tools have become more robust and, with only minimal user input, can be applied to automatically detect objects such as nuclei and specific structures as well as to classify various anatomical tissue entities within an entire digitized slide [44,45].

Understanding molecular and cellular interdependencies quickly leads to complex questions, which require the elaboration of extensive algorithms and enormous amounts of computing power to answer. Machine learning has been known to have great potential in this field for many decades. In the recent past, its practical use has advanced greatly owing to the availability of powerful computer technology, in particular parallel computing on multiple CPUs and/or CPU cores, as well as to new software tools, programming languages, and advanced machine learning techniques, which have made the technologies much easier to use without the requirement of advanced theoretical knowledge [44,45]. By engaging state-of-the-art technologies, computer scientists and engineers try to generate models that can provide answers to the complex problems given by nature. Machine learning's power resides in its robustness in generating customized models designed to solve (very) specific problems [44]. Machine learning models are generated by learning from examples consisting of observations. Current techniques of machine learning comprise supervised, unsupervised, transfer, federated, and reinforcement learning [45].

In the case of supervised machine learning, the observations are assigned to a class by a human expert, and the model's performance therefore depends strongly on the quality of the training data set used. An optimal training data set, for example a well-annotated slide, should cover a wide enough range of the variability expected in real-world data. Failing to do so increases the risk of misclassification [46].
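As a minimal illustration of supervised learning, the following sketch trains a nearest-centroid classifier on a handful of expert-labeled observations. The feature choice (nuclear area and circularity), the class names, and all values are hypothetical and only serve to show how labeled examples determine the model.

```python
def train_centroids(samples):
    """Average the feature vectors of each expert-assigned class.

    samples: list of (feature_vector, label) pairs.
    Returns a dict mapping each label to its mean feature vector.
    """
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((f - c) ** 2 for f, c in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical annotated nuclei: (area in square micrometers, circularity), class label.
training = [
    ((30.0, 0.9), "lymphocyte"),
    ((34.0, 0.8), "lymphocyte"),
    ((90.0, 0.5), "epithelial"),
    ((110.0, 0.4), "epithelial"),
]
model = train_centroids(training)

print(classify(model, (40.0, 0.7)))
print(classify(model, (95.0, 0.5)))
```

A nucleus unlike anything in the training set would still be forced into one of the two classes, which reflects the point made above: the training data must cover the variability expected in real samples.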

Unsupervised learning refers to a machine learning method where the algorithm learns from examples without being able to refer to predefined target values or classes (untagged/unlabeled data). The algorithm tries to identify patterns by creating an internal representation of the data and looks for probability densities (e.g., clustering analysis). This method is suited to searching for patterns that are not obvious or are difficult to identify even for the human eye [47].
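Clustering can be sketched with a small k-means routine that groups unlabeled intensity values without any predefined classes. This is one common clustering technique chosen for illustration; the intensity values and the deterministic initialization are assumptions made to keep the example reproducible.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group one-dimensional values into k clusters (plain k-means)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic initialization: spread the k centers over the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical mean marker intensities of eight cells (no labels provided).
intensities = [110, 95, 120, 105, 900, 950, 1010, 880]
centers, clusters = kmeans_1d(intensities, k=2)
print(centers, clusters)
```

The two groups that emerge could then be interpreted as marker-negative and marker-positive cells, but that interpretation is supplied by the analyst, not by the algorithm.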

The transfer machine learning method is characterized by the fact that an already trained algorithm can be used to answer different, but related questions. It means an existing trained model can be adapted/tweaked to solve new tasks without the need to train a new model from scratch [48].

In the federated machine learning method, the algorithm learns from data spread/stored on multiple devices. Federated machine learning is similar to distributed learning, but the focus is on training on heterogeneous data and not on parallelization. No training data information is shared between the devices as part of the learning process [49].
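One widely used way to combine locally trained models is to average their parameters weighted by local sample counts (often called federated averaging). The sketch below assumes this scheme and uses hypothetical sites and parameter values; only parameters and sample counts are exchanged, never the training data themselves.

```python
def federated_average(client_updates):
    """Combine locally trained parameters, weighted by local sample counts.

    client_updates: list of (weights, n_samples) tuples, one per device/site.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total
        for i in range(dim)
    ]


# Hypothetical parameters after one round of local training at two sites.
site_a = ([0.2, 1.0], 100)   # trained on 100 local images
site_b = ([0.6, 3.0], 300)   # trained on 300 local images

global_weights = federated_average([site_a, site_b])
print(global_weights)
```

The site with more data has a proportionally larger influence on the shared model, which is one simple way of handling heterogeneous data volumes across devices.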

Another machine learning method, reinforcement learning, is based on an algorithm that needs to take optimal decisions based on the new data presented and the accumulated experience (knowledge). The learning process is continuous; each decision taken by the algorithm is labeled using a system of rewards and punishments. The aim is to solve the task by maximizing the cumulative positive feedback [50]. More precisely, the step-by-step development of machine learning aims at human-like learning: humans learn from existing experience, even from unrelated fields, and can transfer knowledge to newly arising tasks rather than starting from the basics, which is still largely the case in machine learning.
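The reward-driven loop of reinforcement learning can be illustrated with a toy two-action problem solved by an epsilon-greedy strategy. This is a deliberately simple stand-in chosen for illustration; the reward probabilities, step count, and exploration rate are arbitrary assumptions.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays off best from rewards alone (epsilon-greedy)."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)   # running mean reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: choose the action with the best estimate so far.
            action = max(range(len(values)), key=values.__getitem__)
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental update of the mean reward observed for this action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward


# Action 1 is rewarded far more often than action 0 (hypothetical probabilities).
values, counts, total_reward = run_bandit([0.2, 0.8])
print(values, counts, total_reward)
```

The algorithm is never told which action is correct; it accumulates experience from rewards and gradually shifts its decisions toward the action with the higher cumulative payoff.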

Due to the versatile range of applications of next-generation digital pathology discussed in the following section, these platforms (with or without AI) can be seen as a crucial part of precision medicine by providing a solid and fully automated tool for the gaining of novel information on the pathology of specific diseases, identification of novel predictive and prognostic biomarkers, as well as targets for therapy [37,38,44].

#### **5. Current Applications of Next-Generation Digital Pathology**

#### *5.1. RNA In Situ Hybridization (ISH)*

In clinical settings, a routinely used method to measure RNA is real-time PCR [51]. However, this grind-and-bind technique cannot visualize individual cell signals in their original context; it is prone to contamination by unintended cell and tissue types and masks the different cellular subpopulations and phenotypes in the heterogeneous TME [6,52]. Next-generation sequencing and single-cell sequencing technologies can detect RNA expression at the single-cell level, but dissociating cells from their native setting discards the information on their spatial relationships [53]. With the latest developments in RNA ISH, multiple approaches came into play to gather spatial data, such as non-isotopic, fluorescently labeled ISH (fluorescence in situ hybridization, FISH) and ISH with biotin- or hapten-labeled nucleic acid probes (chromogenic in situ hybridization, CISH) [52,54–57]. These methods opened a new data dimension, supporting localization and quantitation of target RNA in single cells to detect precise RNA expression in specific cell types [52,58]. However, these techniques only allow a restricted number of labels to be integrated into the probes, leading to reduced detection sensitivity for most genes [52]. Because of the high likelihood of cross-hybridization and non-specific binding in complex tumors, the signal-to-noise ratio is limited, and considerable technical complexity restricts the performance of these methods [52,58]. In Figure 2, a representative example of the automated quantitative analysis of FISH and RNA ISH is shown.

**Figure 2.** A representative example of automated analysis of fluorescence in situ hybridization (FISH) and RNA in situ hybridization (ISH) stained cells using a next-generation digital pathology platform. (**a**) FISH staining (blue, nuclei stained with 4′,6-diamidino-2-phenylindole (DAPI); red and yellow dots, FISH probes); on the left the original image is shown, in the middle the corresponding analyzed image including cell and dot detection mask, and on the right the analyzed data visualized in a scattergram. (**b**) RNAscope staining (blue, nuclei stained with hematoxylin; brown, RNAscope staining); on the left the original image is shown, in the middle the original image overlaid with the detected dot mask, and on the right the original image overlaid with the nuclei mask, the cellular mask, and the identified dot mask. Both images were provided by and analyzed using TissueGnostics' image cytometry solution StrataQuest.

RNAscope by Advanced Cell Diagnostics Inc. (ACD), Hayward, CA, USA, has presented the most pragmatic method that overcomes these limitations of traditional RNA ISH by a unique probe design and an advanced signal amplification system [52,59]. This technology excels due to its specificity, sensitivity, short turnaround time, and robustness in a wide range of applications across various disciplines including infectious diseases, neuroscience, cell or gene therapy, and single-cell transcriptomic profiling in cancer [52,60–64]. In the TME, RNAscope has prominent advantages such as spatially mapping a cell atlas [65,66], visualizing and characterizing gene signatures, generating the immune landscape, and even identifying novel cell subtypes [67,68], classifying and identifying highly heterogeneous and immunotherapeutic cell types [69,70], and identifying and characterizing gene signatures of stem cells [71–73] and circulating tumor cells [74,75] as well as analyzing or predicting their response to drug treatments [76,77]. Compared with a one-probe RNA ISH hybridization system, the possibility of nonspecific amplification in RNAscope is considerably lower because it employs a double-probe independent hybridization system, which improves the sensitivity and the signal-to-noise ratio and allows better quantification of RNA expression [52,78].

The RNAscope method allows robust detection of mRNAs, long non-coding RNAs, and microRNAs [57,79–82], as well as multiple gene transcripts generated by alternative splicing [83,84], simultaneously in fresh-fixed, fresh-frozen, and formalin-fixed paraffin-embedded (FFPE) clinical specimens, revealing the full potential of RNA [85]. For example, the expression of most androgen receptor (AR) splice variants other than the full-length AR variant remains unclear in prostate cancer progression. RNAscope has been proposed as a capable technique for detecting the expression and localization of splice variants by designing probes that specifically target distinct splice variants. For instance, AR and AR-V7 expression have been detected in FFPE prostate tumors by RNAscope; AR expression was found to be 3-fold higher in primary tumor cells than in benign glands, while AR-V7 expression was higher in metastatic castration-resistant prostate cancer than in primary prostatic tissues [84].

Emerging therapeutic strategies increasingly target both cellular and non-cellular components of the TME, through therapies such as immune checkpoint blockade, dendritic cell vaccination, and antiangiogenic therapy [86]. Detecting RNA targets in the TME that are involved in tumor immunotherapy with the RNAscope assay can substantially support these therapies. RNAscope applications enable the determination of the localization of specific immune cell types (i.e., cytotoxic lymphocytes and regulatory T cells) in the TME [87], spatial relationships between different cell types in the TME [88], and the immune activation state and function of tumor-infiltrating immune cells in the TME [89,90]. For example, Monte et al., using the RNAscope assay, reported that infiltrating basophils in the TME regulate tumor-promoting Th2 inflammation and reduce survival in pancreatic cancer patients [89]. In addition, this technique is an attractive strategy to determine cell type-specific expression of immune checkpoint markers [91] and to differentiate activated CAR+ T cells from endogenous T cells [5]. RNAscope's ability to precisely identify the cellular sources of secreted proteins (e.g., cytokines and chemokines) is a distinct benefit because mRNA remains localized in the cells of origin, whereas secreted proteins tend to dilute and diffuse in the intercellular space [67,87,92]. Furthermore, RNAscope provides valuable information for distinguishing paracrine from autocrine signaling, which aids in the classification of subtypes of several cancers [93]. A dual gene analysis approach with RNAscope has been utilized for simultaneous detection of CD44+ cells and PD-L1 in head and neck squamous cell carcinoma; this analysis found that CD44+ cells in the TME induce expression of PD-L1, thereby suppressing T cell-mediated immunity in the TME [94].
The simultaneous localization and quantification of multiple RNAs from several genes by RNAscope saves time and yields meaningful results from a single technique. However, rapid mRNA translation and RNA degradation in cells can affect RNAscope applications, and thus BaseScope, a variant of RNAscope, has been recommended for short RNA targets of 50–300 nucleotides [95]. Instead of using 20 probe pairs, BaseScope uses only 1–6 probe pairs to target small regions of RNA more effectively. Thus, BaseScope is a successful method for detecting and quantifying small nucleolar RNAs (snoRNAs), microRNAs, and RNAs that are prone to degradation or are transiently expressed in the TME [95].

The newest RNAscope approach, a combination with IHC called dual RNAscope ISH/IHC, has proven to offer an ideal platform to generate more reliable data that can be used to study gene expression signatures at the RNA and protein level with spatial and single-cell resolution in the complex TME [5]. It allows correlation of RNA and protein expression on a single slide while simultaneously validating antibody specificity [78,96–98]. For example, combined detection of HPV RNA by RNAscope and Cdc2 protein expression by IHC has been useful to predict the prognosis of oropharyngeal squamous cell carcinoma patients. Moreover, the results indicated that the sensitivity of RNAscope was higher than that of PCR reverse dot hybridization [98]. The automated RNAscope is a significant advancement over manual RNAscope and improves the clinical advantage by allowing more samples to be analyzed simultaneously in a standardized, observer-independent manner with less time, less inter-user variability, and less manpower [86]. The method has proven consistent and provides reproducible results in quantifying transcript levels. Overall, the spatial resolution offered by the RNAscope method brings a novel dimension to the precise localization of target RNA in single cells and allows localization and quantitation of RNA expression in specific cell types in the TME [86].

#### *5.2. Assessment of the Tumor Immune Microenvironment*

One of the most promising fields in biomarker and therapy target detection in oncology is dedicated to the exploration of the patient-specific immune contexture in situ with conventional and multiplexing IF and IHC staining techniques in combination with automated quantification [14].

One prominent approach for immune cell assessment within a particular tumor tissue, colorectal cancer (CRC), was developed by Galon et al., who established a patient stratification strategy named Immunoscore, based on the detection/identification of T cell populations within the tumor core and the invasive margin (ratio of the markers CD3 and CD45RO, CD3 and CD8, or CD8 and CD45RO). It is currently undergoing evaluation/implementation as a routine parameter for prognostic and predictive diagnosis in clinics for colon cancer [99,100]. To demonstrate its power, Pagès et al. conducted a large-scale study in which the Immunoscore was assessed with a digital pathology method in a large patient cohort (*n* = 2681 CRC patients) and aligned with clinicopathological data, thereby showing the value of the Immunoscore for predicting survival and treatment response in CRC patients [101]. In order to provide a representative (yet not complete) overview of recent applications, Table 2 shows further examples of studies using conventional and/or multiplexing IF and/or IHC staining techniques in which next-generation digital pathology was the central method for the quantification of various immune cell markers/populations in different cancer types and aligned with clinicopathological parameters.


**Table 2.** Studies using next-generation digital pathology for the assessment of the tumor immune microenvironment.



CRC, colorectal cancer; CRCLM, colorectal cancer metastasis in the liver; HCC, hepatocellular carcinoma; HNSCC, head and neck squamous cell carcinoma; NSCLC, non-small cell lung cancer; n.s., not specified.

The examples summarized in Table 2, as well as the example shown in Figure 3 from Desbois et al. [153], show the immense power of next-generation digital pathology for the assessment of the tumor immune microenvironment. In order to integrate the Immunoscore or other immune cell screening strategies into clinical research as well, such fully automated next-generation digital pathology platforms should be implemented in the process of quantifying the rate of infiltration of various immune cell populations/markers. Ongoing clinical studies are aiming at the integration of such platforms in combination with the staining of a set of immune-related biomarkers including main subpopulation markers and immune checkpoint markers [14].

**Figure 3.** Analysis of the tumor immune microenvironment using next-generation digital pathology. A representative example of the automated detection of CD8+ immune cells within the tumor microenvironment of ovarian cancer by Developer XD (Definiens, Munich, Germany). Figure adapted from Desbois et al., 2020 [153].

To sum up, the automated in situ assessment of immune cell markers, together with the analysis of spatial relationships, provides a better understanding of the various immune cell populations and their interactions and is crucial for the detection of novel predictive and prognostic biomarkers as well as for clinical therapy strategies.

#### *5.3. Detection of Blood Vessels*

Neoangiogenesis and the resulting vascularization are required by tumors just as by healthy tissues. In both normal and tumor tissue, cell survival and proliferation depend on oxygen and nutrient supply as well as on removal of carbon dioxide and metabolic wastes. In contrast to regulated neoangiogenesis in healthy tissues, tumor angiogenesis is characterized by an uncontrolled, ineffective, often incomplete (and therefore leaky) growth of new blood vessels within the tumor tissue in order to supply the tumor mass with oxygen and nutrients [172]. The in situ density of blood vessels stained by specific markers such as CD31 or CD34 was shown to correlate with the aggressiveness of the tumor in a variety of tumor types such as CRC, breast cancer, gastric cancer, and small cell and non-small cell lung cancer [173]. Furthermore, specific therapies such as neutralizing antibodies targeting vascular endothelial growth factor are widely used in several cancer types [174]. However, inhibition of vessel growth has only been shown to provide limited or even no long-term improvement for cancer types including hepatocellular carcinoma and CRC [175,176]. In addition, the use of different non-standardized methods for detection and quantitation of blood vessel density leads to contradictory data on its influence on patient survival [177]. Therefore, the unbiased automated quantification of blood vessels could help to identify patient groups that would benefit from anti-angiogenic therapies.
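Automated vessel density quantification can be sketched as counting connected stained regions in a binary mask and dividing by the analyzed area. The example below assumes that a thresholded CD31 or CD34 mask is already available and uses a toy mask with an unrealistically coarse pixel size; real pipelines require segmentation, artifact handling, and validation that are not shown here.

```python
from collections import deque


def count_objects(mask):
    """Count 4-connected foreground regions in a binary mask (list of rows)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    n_objects = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # New region found: flood-fill it so it is counted only once.
                n_objects += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return n_objects


def vessel_density(mask, pixel_size_um):
    """Vessel profiles per square millimeter of analyzed tissue."""
    area_mm2 = len(mask) * len(mask[0]) * (pixel_size_um / 1000.0) ** 2
    return count_objects(mask) / area_mm2


# Toy mask: 1 marks pixels positive for the vessel marker (hypothetical data).
toy_mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
print(count_objects(toy_mask), vessel_density(toy_mask, pixel_size_um=50.0))
```

Because the counting rule and the area definition are fixed in code, the same slide always yields the same density, which is the kind of standardization the non-automated methods lack.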

Table 3 summarizes studies in which next-generation digital pathology was used to detect blood vessels/blood vessel densities. These examples emphasize that the next-generation digital pathology approach is highly versatile and can be applied to various research needs and questions, not only to single cell detection or dot (RNA ISH) detection but also to the analysis of more complex structures such as blood vessels.


**Table 3.** Studies using next-generation digital pathology for the quantification of blood vessels.



CRC, colorectal cancer; n.s., not specified; ESCC, esophageal squamous cell carcinoma.

#### **6. Conclusions**

Within this review, we show several application fields that contribute to next-generation digital pathology, including the analysis of RNA ISH, conventional and/or multiplexed immunophenotyping, and blood vessel detection in the tumor microenvironment. Due to new staining technologies that allow a higher number of markers, one of the new challenges is high-dimensional data mining, which needs to be addressed by next-generation digital pathology platform providers. Several platforms are available on the market tackling different kinds of requests, including slide scanning, management of a large amount of data, follow-up image analysis with integrated AI modules, as well as high-dimensional data mining. Next-generation digital pathology has the potential to elevate research and clinics by providing automated, unbiased, fast, reproducible, and therefore reliable image cytometry.

The integration of data obtained by automated analyses, referring to the levels of DNA, RNA—including non-coding RNA—and proteins, will allow the development of tools for research and diagnostics within the scope of precision medicine, i.e., with a focus on the molecular mechanisms involved in disease formation in individual patients rather than averaged cohorts, in which individual yet decisive details get lost. This is the expanding area of tissue cytometry—and the era of next-generation digital pathology.

**Author Contributions:** Conception and design, F.M., R.C.E. and J.B.; Writing—original draft, F.M., A.F., B.B., B.L. and R.N.; Writing—review and editing, R.C.E. and J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** Some of the authors (F.M., R.C.E., B.B., R.N. and B.L.) are employees of TissueGnostics, which is a for-profit company.

#### **References**


### *Review* **The Genetic Complexity of Prostate Cancer**

#### **Eva Compérat <sup>1,2,3,</sup>\*, Gabriel Wasinger <sup>3</sup>, André Oszwald <sup>3</sup>, Renate Kain <sup>3</sup>, Geraldine Cancel-Tassin <sup>1</sup> and Olivier Cussenot <sup>1,4</sup>**


Received: 28 September 2020; Accepted: 23 November 2020; Published: 25 November 2020

**Abstract:** Prostate cancer (PCa) is a major concern in public health, with many genetically distinct subsets. Genomic alterations in PCa are extraordinarily complex, and both germline and somatic mutations are of great importance in the development of this tumor. The aim of this review is to provide an overview of genetic changes that can occur in the development of PCa and their role in potential therapeutic approaches. Various pathways and mechanisms proposed to play major roles in PCa are described in detail to provide an overview of current knowledge.

**Keywords:** prostate cancer; germline mutations; somatic mutations; PTEN; TMPRSS2; ERG; androgen receptors

#### **1. Introduction**

Prostate cancer (PCa) is a major concern in public health, with more than 1.1 million cases worldwide detected every year [1]. Several risk factors for developing PCa are known, e.g., older age, family history and African ethnicity. Despite the refinement of existing treatments and emergence of new management strategies, such as active surveillance and focal therapy, metastatic disease is frequent and mortality is still relatively high, with 26,730 estimated deaths in the United States in 2017 [2].

Based on the severity of disease at diagnosis according to differentiation, extension and stage, PCa may be treated in different ways; of particular importance is its initial hormone dependency, which allows specific treatments, especially during early-stage disease [1]. Further differences exist between primary, metastatic (mPCa) and castration-resistant PCa (CRPCa). Therefore, it is important to know the key players, especially the role of genetics in this hormone-sensitive cancer.

Several studies have shown many genetically distinct subsets of PCa. Various drivers are known, such as androgen-related fusions of the ETS-related gene (ERG) and other ETS family members, speckle-type POZ (pox virus and zinc finger) protein (SPOP) mutations, DNA hypermethylation, PI3K/RAS/RAF pathway alterations and alterations in DNA damage repair (DDR) pathways.

For better understanding, it is necessary to mention genetic screening, which was explored in the early 2000s and abandoned in 2012 after it was demonstrated that many of the detected PCa were clinically insignificant and did not affect patient life expectancy [3]. In recent years, powerful genetic tests were developed that provide polygenic risk scores for individual patients [4,5]. However, a remaining challenge is the recrudescence of clinically relevant PCa, which may also benefit from personalized approaches for risk assessment or therapy.

Despite recent advances, standard pathology remains a fundamental tool in managing PCa. The Gleason score (GS), reflecting tumor differentiation, is a staple of clinical decision-making, and recent meetings refined the consensus and reduced interobserver variability [6]. Nevertheless, GS alone will not give all the necessary information; molecular profiling of PCa could provide further information. For instance, a study by Haffner et al. [7] showed that metastasis was not strictly linked to the highest-grade tumor region and may arise from a tumor region with lower grade, alongside observing PTEN loss. Clinicians need to be increasingly aware that although classical histopathology is a firmly established basis for clinical decisions, it is not the single determinant of PCa behavior.

In recent studies taking a scrutinous approach to Gleason grading of PCa, pathologists showed that cribriform and intraductal PCa, which should be considered Gleason grade 4, were probably more aggressive than the classical Gleason grade 4 pattern. Moreover, several studies underlined a more aggressive behavior of cribriform PCa, partly explained by the underlying molecular aberrations in these tumor patterns. A recent study tested genomic instability by determining the portion of the genome altered and somatic copy number alterations (CNA). Patients with cribriform and/or intraductal PCa and GS ≥ 7 had significantly higher percentages of the genome altered than men without this pattern in both the Cancer Genome Atlas (TCGA) cohort (2.2 fold; *p* = 0.0003) and the Canadian Prostate Cancer Genome Network (CPC-GENE) cohort (1.7 fold; *p* = 0.004) [8]. These patterns were associated with deletions of different chromosomal regions, such as 8p, 16q, 10q23, 13q22, 17p13 and 1q22, and amplification of 8q24, which plays a major role in PCa evolution and is specifically addressed in this review. CNAs comprised a total of 1299 gene deletions and 369 amplifications in the TCGA dataset. Several of the affected genes were known to be associated with aggressive prostate cancer, such as loss of PTEN, CDH1 and BCAR1 and gain of MYC. Point mutations in TP53, SPOP and FOXA1 were also associated with these PCa patterns but occurred less frequently than CNAs. This study clearly shows that cribriform/intraductal patterns are associated with increased genomic instability, clustering to genetic regions involved in aggressive PCa.
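The "portion of the genome altered" used as an instability measure can be computed as the summed length of copy-number-altered segments divided by the total length profiled. The sketch below is a minimal illustration: the segment format, the toy genome length, and the log2-ratio threshold of 0.2 are assumptions for this example, not the parameters of the cited study.

```python
def fraction_genome_altered(segments, genome_length, threshold=0.2):
    """Fraction of the profiled genome covered by copy number alterations.

    segments: non-overlapping (start, end, log2_ratio) tuples, end exclusive.
    threshold: assumed absolute log2 ratio above which a segment counts as altered.
    """
    altered = sum(
        end - start
        for start, end, log2_ratio in segments
        if abs(log2_ratio) >= threshold
    )
    return altered / genome_length


# Toy copy number profile over a 1000-base "genome" (hypothetical values).
toy_segments = [
    (0, 400, 0.0),      # copy-neutral
    (400, 500, -0.8),   # deletion
    (500, 900, 0.05),   # within noise, treated as neutral
    (900, 1000, 0.6),   # gain
]
pga = fraction_genome_altered(toy_segments, genome_length=1000)
print(pga)
```

Comparing this fraction between patient groups, for example with and without cribriform architecture, gives fold differences of the kind reported above.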

The very complex genomic situation of PCa can be broken down into two major aspects, which need to be considered, namely, the germline genetic background and the somatic changes in PCa (Figure 1).

**Figure 1.** Common genomic alterations in prostate cancer. Germline mutations are highlighted in blue, while typical somatic mutations are highlighted in orange. For abbreviations see full text.

#### **2. Germline Mutations Driving PCa**

Hereditary prostate cancer (HPC) is defined by strict clinical criteria and represents 5% of all newly diagnosed PCa [9]. Inherited predisposition to PCa is genetically determined by the presence of a deleterious mutation of DNA repair genes also related to breast/ovarian cancers (i.e., BRCA1 and BRCA2, ATM, etc.) or PCa-specific risk genes (HOXB13 and 8q24 region) [10]. A recent study performed germline sequencing and analysis of DNA repair genes [11] in 5545 men of European ancestry, including 2775 nonaggressive (localized disease, stage T1/T2 and GS ≤ 6 tumors) and 2770 aggressive (lethal or metastatic disease, stage T4 or both T3 and GS ≥ 8) PCa cases. The authors found that BRCA2 and PALB2 showed the most statistically significant gene-based disease associations, with 2.5% of aggressive and 0.8% of nonaggressive cases carrying deleterious BRCA2 alleles, and 0.65% of aggressive and 0.11% of nonaggressive cases carrying deleterious PALB2 alleles. ATM had a nominal association, with 1.6% of aggressive and 0.8% of nonaggressive cases carrying deleterious ATM alleles. Depending on the type of genetic risk, predisposition exposes the individual to an earlier age of onset or a more aggressive form of the disease, increasing the risk of death from this cancer. Germline mutations of DNA repair factors are found in only up to 8% of these patients due to the rarity of the mutations [12,13].
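The carrier frequencies quoted above can be turned into a rough effect size. The sketch below computes a crude (unadjusted) odds ratio directly from the two carrier proportions for each gene. It is an illustration of the arithmetic only: the function name is hypothetical, and the results are not the gene-based estimates reported by the study [11], which would rest on a formal statistical model.

```python
def crude_odds_ratio(p_aggressive, p_nonaggressive):
    """Crude odds ratio of carrying a deleterious allele, aggressive vs. nonaggressive.

    Both arguments are carrier proportions strictly between 0 and 1.
    """
    for p in (p_aggressive, p_nonaggressive):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    odds_aggressive = p_aggressive / (1.0 - p_aggressive)
    odds_nonaggressive = p_nonaggressive / (1.0 - p_nonaggressive)
    return odds_aggressive / odds_nonaggressive


# Carrier proportions as quoted in the text (aggressive, nonaggressive).
carrier_rates = {
    "BRCA2": (0.025, 0.008),
    "PALB2": (0.0065, 0.0011),
    "ATM": (0.016, 0.008),
}

crude = {gene: crude_odds_ratio(*rates) for gene, rates in carrier_rates.items()}

for gene, value in crude.items():
    print(f"{gene}: crude OR = {value:.2f}")
```

With these inputs the crude odds ratios come out at roughly 3 for BRCA2, 6 for PALB2 and 2 for ATM, which matches the ordering of the associations described in the text.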

DNA damage response (DDR) pathways are extremely promising targets in PCa treatment. Two prominent factors are breast cancer 1 and 2 genes (BRCA1 and BRCA2). The TCGA research network, testing 333 primary PCas, reported mutations in DNA repair genes in 19% of primary PCas. Among these, 3% involved BRCA2, including germline as well as somatic truncating mutations. Only one case displayed a BRCA1 frameshift mutation [14]. If these tumor suppressor genes are mutated heterozygously in the germline, they can give rise to aggressive forms, especially mPCa [15]. In a normal setting, BRCA1 and BRCA2 repair double-strand breaks by homologous recombination. In case of a BRCA germline mutation, a somatic loss of function in the wild-type BRCA allele is frequently observed together with defective homologous recombination. Although deleterious BRCA germline variants are rare, these patients were shown to more frequently develop PCa and mPCa, and also exhibit high Gleason scores (grade group 3–5) and worse outcomes [16]. Nevertheless, these tumors did not seem to have specific histological aspects allowing them to be recognized on a standard slide. Interestingly, it was observed that ATM and BRCA1/2 germline mutations were associated with Gleason grade reclassification during active surveillance of carrier patients, with an upgrading of GS from 6(3 + 3) to GS 7(3 + 4) or 7(4 + 3) [17].

Another gene implicated in DNA repair mechanisms is ATM, which mediates downstream checkpoint signaling in the DNA damage response. The prevalence of ATM mutations in mPCa is 1.6% and therefore it must be considered, especially in these forms of PCa [15]. ATM is also part of the so-called homologous repair deficiency profile [18]. These mutations were reported in nearly 2.5% of PCa patients, some detected in lethal PCa (death due to metastatic PCa), with fewer in localized PCa (low-risk disease, GS ≤ 6, organ confined) [19]. Interestingly, in Chinese patients, ATM mutations were found in 4.5% of lethal PCa. In the mentioned American–African–Asian study, no BRCA1/2–ATM co-mutations were detected, indicating that these mutations are probably mutually exclusive. This study also demonstrated that mutation carriers displayed higher overall lethality, higher mPCa rates and lower PCa-specific survival in patients with diagnosed mPCa, especially in young patients under 60 years. On the other hand, the lowest carrier rate was among patients who died of PCa at > 75 years or > 10 years after their initial PCa diagnosis.

The potential of detecting patients with the above-mentioned germline mutations opens the door to specific treatment approaches, one of the most promising drug classes being inhibitors of poly ADP-ribose polymerase (PARP). PARP inhibitors induce cell death because they interfere with a cellular mechanism of single-strand DNA break repair. These breaks occur normally during the cell cycle, but also during oncogenesis. In case of mutation of other abovementioned repair genes, PARP is required to repair both strands, which means that the cell is entirely dependent on PARP for single-strand repair. In the context of BRCA mutation and concomitant PARP inhibition, for example, a tumor cell would not be able to perform these repairs, resulting in chromosomal instability, cell-cycle arrest and apoptosis [20]. A recent study by Hussain et al. [21] showed that patients with PCa harboring at least one mutation in BRCA1, BRCA2 or ATM who received the PARP inhibitor olaparib had significantly longer survival than those who received enzalutamide or abiraterone plus prednisone as control therapy. Other studies also demonstrated that metastatic, castration-resistant PCa (mCRPCa) with BRCA2 germline mutations or deleterious variants showed greater response to platinum-based chemotherapy [22]. This relationship between genotype and responsiveness to platinum chemotherapy was also observed among BRCA2 patients with breast and ovarian cancers. Another study showed concordant results, underlining that biallelic BRCA2 inactivation in mCRPCa could serve as a biomarker for predicting sensitivity to platinum chemotherapy, which showed a clear benefit in biallelic, BRCA-mutated patients [23].

These types of studies underline the necessity of germline testing in special patient groups. Criteria for referring patients to genetic counseling and testing include a known mutation in a cancer-susceptibility gene within the family, mPCa, high-risk, localized PCa (grade group 4/5, PSA ≥ 20, WHO group ≥ 3), young age at PCa detection, family history suggestive of Lynch syndrome and hereditary breast, ovarian or prostate cancer. Prostate cancers with deficiency of the DNA mismatch repair (MMR) system are rare, with 1% at the localized stage and 5% in the advanced stages [24]. PCa rarely occurs as an unconventional malignancy of the Lynch syndrome spectrum (HNPCC) [25]. These forms of PCa have aggressive anatomopathological features, with intraductal forms (25%), a high incidence of Gleason score 8 and high rates (40–50%) of de novo metastatic disease, which is frequently visceral (30% of metastatic patients). At the molecular level, they are characterized by a high rate of mutation (tumor mutational burden). Considering their advanced stage, they respond relatively well to androgen deprivation, but less so to taxanes once castration-resistant. Affected patients form a subgroup currently considered as candidates for, and being evaluated in, anti-PD1 or anti-CTLA4 immunotherapy [26,27]. At the germline level, Pritchard et al. [15] reported that, of 692 patients with mPCa, 12% showed deleterious germline mutations (1% involving MSH2, 1% MSH6 and 2% PMS2). Abida et al. [26] reported that, of 1346 PCa patients, 3% showed high microsatellite instability, of which 22% possessed a germline mutation in a gene associated with Lynch syndrome. Most patients (46%) had MSH6 mutations.

#### **3. 8q24 Region**

8q24 is a hotspot of susceptibility loci for PCa. These risk loci, identified by genome-wide association studies (GWAS), do not affect the coding DNA and are frequently associated with single-nucleotide polymorphisms (SNPs). It was shown that amplification of 8q24 (harboring MYC) is frequent. Often, enhancers harboring variants such as rs6983267 interact with MYC and alter sensitivity to certain crucial signaling pathways, e.g., WNT [28,29]. With ongoing research, it is evident that 8q24 variants play a role in PCa carcinogenesis. A recent meta-analysis determined significant associations between PCa risk and 15 variants in 8q24. The 8q24 region is dense with SNPs; some of these variants might enhance genes implicated in PCa carcinogenesis [29].

Although the inherited component for PCa was previously acknowledged, the identification of genetic variation on 8q24 conferring cancer-specific susceptibility may help to improve screening strategies. Genome-wide studies reported low penetrance of signals influencing the risk. Nevertheless, since risk alleles are relatively common in the population, their cumulative impact is potentially substantial. One recent study underlined several independent signals in different regions and yielded a biological annotation enriched with different elements, such as promoters, enhancers and transcription factor-binding sites, such as AR, ERG and FOXA1 [30]. In another study by the same group, 12 independent risk signals with different variants were described, some of them for the first time (rs1914295, rs190257175 and rs12549761). They were weakly correlated with already known PCa risk markers. On the other hand, this study showed that men with a cumulative risk score had a greater risk of developing PCa than the average population. The described 12 variants accounted for around 25% of what could be explained as familial genetic risk factor, highlighting the contribution of germline variation on 8q24. However, although the 8q24 region is now established as a major region for PCa susceptibility, the underlying biological mechanisms still require further elucidation [10].
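The idea that common, low-penetrance alleles add up to a substantial cumulative effect can be sketched with a simple additive risk score. The example below assumes a log-additive (multiplicative) model in which each risk allele multiplies the odds by a per-allele odds ratio. The three rs identifiers are those named in the text, but the odds ratios attached to them are invented placeholders, and the function names are hypothetical; this is not the scoring method of the cited studies [10,30].

```python
import math

# Per-allele odds ratios: PLACEHOLDER values chosen for illustration only.
HYPOTHETICAL_ODDS_RATIOS = {
    "rs1914295": 1.10,
    "rs190257175": 1.50,
    "rs12549761": 1.20,
}


def risk_score(genotype, odds_ratios):
    """Sum of log odds ratios weighted by risk-allele count (0, 1 or 2)."""
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        count = genotype.get(snp, 0)  # SNPs missing from the genotype count as 0
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
        score += count * math.log(odds_ratio)
    return score


def relative_odds(genotype, odds_ratios):
    """Odds relative to a person carrying no risk alleles at these SNPs."""
    return math.exp(risk_score(genotype, odds_ratios))


# A carrier of two risk alleles at one SNP and one risk allele at another.
carrier = {"rs1914295": 2, "rs12549761": 1}
carrier_odds = relative_odds(carrier, HYPOTHETICAL_ODDS_RATIOS)
```

Under this assumed model, individually modest odds ratios multiply, which is why a man carrying several common risk alleles can end up with a noticeably higher risk than the population average even though no single variant is strongly penetrant.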

#### **4. Somatic Mutations Driving PCa**

Outlier androgen-regulated genes play an important role in PCa development. These genes are generally expressed at low levels, but can show variation in expression of different genomic subsets [31,32]. Especially in light of emerging personalized medicine, it is increasingly important to take into consideration the individual mechanisms driving aggressive PCa.

Gene fusions in PCa mostly place members of the ETS (E26 transformation-specific) family of transcription factors under the control of androgen-regulated genes [33]. One of the most cited gene fusions in PCa is the TMPRSS2–ERG (T2E) gene fusion. The fusion results in overexpression of oncogenic factors and is present in about 50% of PCa globally [33]. TMPRSS2 also fuses to other ETS family genes, such as ETV1, ETV4 and, rarely, ETV5. From a morphological point of view, these tumors frequently display particular aspects, such as macronuclei, signet ring cells, cribriform aspects and intraductal carcinoma. The prognostic impact of T2E is still a matter of controversy. T2E acts as an aberrant transcription factor with oncogenic properties. Several papers indicated that its presence was an indicator of aggressive phenotype and poor prognosis [34–36]. Some authors claimed that it was the most important prognostic factor in patients treated with prostatectomy [37]. Furthermore, it was proposed to be a risk marker for lymph-node metastasis and poorly differentiated disease, as well as biochemical recurrence at five years [37,38]. Nevertheless, the data are still controversial, and T2E is not taken into consideration for decision-making at the moment. Several contrasting studies found no clinical significance of the TMPRSS2–ERG fusion [38–40]. Interestingly, recent studies showed that T2E-positive and -negative PCa are two different molecular groups of PCa, suggesting that the T2E status determines the nature of metastasis-related gene signatures [40,41]. In T2E-positive cases, overexpression in mPCa was seen for several genes (GMNN, TROAP and WEE1). In T2E-negative cases, completely different metastasis-associated genes were expressed, such as ASPN, BGN and TYMS. Therefore, the authors concluded that, according to the T2E status, different genes are linked to the development of mPCa. Patients with overexpression of ASPN, BGN and TYMS exhibited shorter event-free survival. TYMS (thymidylate synthetase), for instance, plays a role in DNA replication and repair. Interestingly, neither PTEN nor TP53 mutations biased the results. Finally, the study showed that the T2E status is not a strong prognostic biomarker per se, but determines the prognostic value of other biomarkers [41].

Another interesting outlier gene is SPINK1, which seems to be, in most cases, mutually exclusive with ERG overexpression [41]. SPINK1, located on 5q32, encodes a protein which functions as a trypsin inhibitor. SPINK1 positivity was shown to be an independent predictor of shorter biochemical recurrence and progression-free survival. Recombinant SPINK1 protein is able to stimulate cell proliferation in benign prostate tissue. A recent study showed that SPINK1 overexpression is linked to higher PTEN expression and lower AR (androgen receptor) expression, with the authors further suggesting that SPINK1 protein expression may not be a predictor of recurrence or lethal PCa amongst men treated by radical prostatectomy. SPINK1 and ERG proteins were not entirely mutually exclusive in this study, as some previous studies suggested [42].

Another important outlier to mention is SChLAP1, a noncoding RNA gene, located on 2q31. Apparently, SChLAP1 has no coding potential, is located in the nucleus and is associated with ETS gene fusion as well as with mPCa. SChLAP1 seems to coordinate cell proliferation and metastatic spread. High expression is associated with lethal PCa, independently of tumor differentiation (Gleason score), tumor stage, PTEN status and age. In a multivariate analysis, SChLAP1 predicted mPCa within 10 years (odds ratio = 2.45) [43].

Obviously, single gene outliers alone are limited in their results and application. These genes might become the basis of patient risk stratification and adaptations in treatment, e.g., more intense earlier treatment or targeted therapies, and will probably become more available in the upcoming years to facilitate genetic and individual subclassification [44].

#### **5. Recurrently Altered Genes**

PCa has a lower mutational burden than many other epithelial tumors. The recent TCGA paper showed several significantly mutated genes, such as SPOP, TP53, FOXA1, PTEN and others [14]. Clinically relevant genes, such as BRAF, HRAS and AKT1, as well as the β-catenin pathway and the DNA repair pathway (see above), also showed importance.

Loss of PTEN (located on 10q23) is frequently observed and is closely related to MYC overexpression. The latter is found on 8q24. Both together seem to play a role in high-risk PCa.

PTEN loss is associated with adverse findings in early PCa and occurs in approximately 15% as homozygous deletions. PTEN loss or mRNA-based genomic signatures can be useful to help determine whether definitive therapy is required, and its loss seems to be more frequent in patients with African ancestry [45]. Early investigations already showed that loss of PTEN, even when detected by immunohistochemistry, was a predictor of aggressive metastatic disease. A paper by Haffner et al. [7] demonstrated that PTEN mutation was not present in morphologically higher-grade lesions surrounding a tumor with PTEN loss when the Gleason score was lower. Associated TP53 and SPOP mutations were reported in the same patient. Interestingly, lymph node metastasis did not harbor the same mutations, suggesting an independent clonal/subclonal origin of these lesions. Genetically, there is strong evidence to suggest that poorly differentiated PCa (Gleason score 9/10) has a higher level of genomic instability with an increased rate of copy-number alterations and alterations in key signaling pathways (TP53, PTEN and RB1) associated with resistance to androgen deprivation therapy (ADT) [14]. The function of PTEN is closely linked to the PIK3 pathway, in which PTEN is generally considered a negative regulator. PIK3CA was shown to be frequently mutated, either via activation of mutational hotspots, coincidently activating mutations or amplification. According to the literature, PTEN-deleted tumors are likely to be PIK3CB-dependent; a coexistent loss and mutation of PTEN and PIK3CB might increase PIK3 pathway output and indicate PCa with AR signaling inhibition. PIK3 pathway and DNA repair alterations seem to be more frequent in metastatic specimens [14].

MYC overexpression is an early event in prostate cancer development. Apparently, concomitant with PTEN loss via characterized HOXB13 transcription control, genomic instability and aggressive disease with a high risk of metastases are initiated. These tumors typically show a high Gleason score and disease progression. On the other hand, isolated MYC activation and PTEN loss were reported to be insufficient to induce invasive PCa, with cells remaining in a precancerous stage (high-grade prostate intraepithelial neoplasia (PIN)) [46]. A recent study from Liu et al. [47] showed that gain of MYC and loss of PTEN resulted in elevated PCa mortality associated even with single copy-number changes. Regarding the interplay of these two factors, MYC overexpression can induce genetic instability, while PTEN might repress this process in PCa cells. Recent results suggested that PTEN might play a role in DNA repair, and if PTEN is lost, high levels of DNA damage that normally repress apoptosis due to increased PIK3 signaling are introduced [48].

One of the genes recurrently mutated in PCa is SPOP, an E3 ubiquitin ligase adaptor protein of the ubiquitin–proteasome system. Mutations may affect the degradation of developmental regulators of PCa, including AR. SPOP plays a role in several further important cellular functions, and intervenes in transcription, cell-cycle regulation and apoptosis.

SPOP mutations were previously described in precancerous lesions and primary tumors of the prostate, suggesting that SPOP mutations are early and recurrent events in the development of PCa [7]. SPOP mutations were identified in 6–13% of primary prostate adenocarcinomas and 14.5% of metastatic prostate cancers, but data regarding expression in distant metastasis are sparse [49]. Recent studies showed that SPOP mutations are less frequent in metastatic than in primary tumors (8% versus 11%) [14]. In the abovementioned article by Haffner et al. [7] describing the clonal evolution of a PCa in metastasis, SPOP was shown to be mutated in the lethal metastatic cell clone. In this context, it is intriguing that SPOP is also a component of the DNA damage response (DDR).

One PCa-relevant direct substrate of SPOP is the androgen receptor (AR) [50], which harbors a SPOP-binding motif. When binding to SPOP, AR undergoes ubiquitin-mediated degradation. AR splice variants that lack the SPOP-binding motif escape this degradation. Interestingly, PCa-associated SPOP mutants do not bind to AR or promote its degradation. SPOP-mediated degradation of AR is driven by antiandrogens and blocked by androgens. It is not clear whether SPOP can interact with other nuclear receptors [51].

It is unclear whether the same SPOP mutations are present in different ethnicities, since some authors described differences according to the patients' ancestry. We [52] and others observed different mutation frequencies between African and European patients, with SPOP mutations differing significantly between both groups. In our study, SPOP mutations were found in more than 20% of patients with African origin compared to 10% of European origin. In contrast, a recent study analyzing 720 PCa samples from six international cohorts, including Caucasian, African–American and Asian PCa patients, showed that SPOP mutations were variably frequent (4.6–14.4%), but found no association with ethnicity. Hence, the authors concluded that SPOP mutations were not associated with ethnicity, biochemical recurrence, clinical parameters or pathological parameters [53]. In light of inconsistent data, more studies are needed to make a conclusive statement.

CDK12 is a gene also implicated in DNA repair, as it regulates the expression levels of different DNA damage response genes. CDK12 is recurrently mutated in aggressive localized PCa [52] and in mPCa [54]. CDK12 biallelic loss is associated with focal tandem duplications. Moreover, CDK12 mutations are related to increased gene fusion, which can yield neoantigens and induce strong immune infiltration, suggesting that patients with these mutations could benefit from immune checkpoint immunotherapy [54].

#### **6. Androgen Receptors**

PCa is a hormone-sensitive cancer, and androgen receptors (AR) play a major role in the treatment and development of PCa. For advanced PCa, ADT is the standard of care, but is decisively ineffective in castration-resistant PCa (CRPCa) [1]. Under ADT treatment, circulating testosterone indicates the androgen suppression level. In CRPCa, the activity of AR remains elevated, despite reduced androgen levels. In recent years, second-generation, AR-targeting therapies were developed in order to treat CRPCa with agents to antagonize AR and to suppress extragonadal androgen (coming from the adrenal gland, for instance) [55].

The AR gene, located on Xq11-12, is a major transcriptional regulator in the normal prostate, but also in PCa cells. The AR, a steroid hormone receptor, forms a complex with heat-shock protein 90 (HSP-90). When binding with androgen, it undergoes a change, allowing nuclear translocation, DNA binding and regulation of gene transcription [56]. Different structural variants exist, and some AR were previously implicated in aggressive tumor activity (see below). The AR gene may undergo genomic alterations such as point mutations, which can induce structural changes. These alterations, specifically seen in CRPCa, can help to understand the dependence of CRPCa on androgen-independent AR signaling.

After ligand binding, the AR is translocated into the nucleus, forms a dimer and binds to the androgen-response element of the promoter or enhancer of target genes [57]. Furthermore, the AR dimer forms a complex with coactivator and coregulatory proteins in different regions and regulates the expression of genes with diverse functions. These are located downstream of the androgen response element, including fusion genes (TMPRSS2–ERG), transcription factors (FOXP1, NKX3.1) and others. Many coactivators interacting with different AR domains are implicated in AR activation in therapy-resistant PCa. While normal AR activity on transcription is ligand-driven, AR transcript variants can encode truncated AR proteins lacking a ligand-binding domain, which can activate AR-target genes in the absence of androgens [58].

The spectrum of genes regulated by AR deserves special attention. AR has transcriptional activity and structural variants exist which play roles in the outcome of the patient. However, it is unknown to what extent individual primary PCa tumors differ in androgen sensitivity or dependence. Androgen activity is a central axis in PCa evolution, and drives most ETS fusion genes [26]. ETS fusion genes are under AR control, but the ETS fusion-positive groups have different AR transcriptional activity. More frequent events, such as androgen-regulated fusions of ERG or ETS family members, form distinct PCa groups (T2E, see above). The most frequent drivers are ERG, ETV1-4, SPOP and FOXA1 mutations [14]. TMPRSS2 is the most frequent fusion partner of all ETS fusions and is androgen-related. Tumors with SPOP or FOXA1 mutations have the highest AR transcriptional activity, as SPOP mutations deregulate AR and AR co-activators [59]. In PCa, two different changes affect the AR pathway. The output is controlled by AR mRNA and protein expression, but also by the expression and mutations of AR cofactors [60]. FOXA1, which can be mutated, is a transcription factor that targets AR and plays an obvious role in PCa oncogenesis. A subset of FOXA1-mutated tumors also harbor SPOP mutations and consequently produce high AR levels.

It must be mentioned that the AR is the most frequently aberrant gene in mCRPCa, with around 63% aberrant expression [12]. Point mutations are found in 15–30% of CRPCa, most of the mutations residing in the ligand-binding domain. A specific point mutation (T878A) can activate AR and also confers resistance to second-generation AR antagonists [61]. These mutations were detected in 13% of CRPCa patients with abiraterone resistance. A second observed mechanism was amplification of the AR gene, seen in up to 50% of patients with CRPCa [62]. Under ADT, low androgen levels still exist. With AR amplification, PCa cells can survive under ADT, leading to progression of CRPCa. Change in androgen biosynthesis also plays a role. For instance, during ADT, the adrenal gland still produces androgens, while CRPCa overexpresses converting enzymes that turn weak androgens such as androstenedione into DHT to activate AR [63].

The AR splice variants (AR-Vs) have different mRNA sequences than the full-length AR (AR-FL). Around 22 AR-Vs are currently known, most of them lacking the ligand-binding domain (LBD), which is the target of existing AR therapy. Most of these AR-Vs mediate active AR signaling, which means that they can act without the presence of androgens or AR-FL [64]. One of the best-described variants is AR-V7, which also lacks the LBD. AR-V7, known to be an important factor in the treatment of CRPCa, can already be detected in primary tumors and surrounding normal prostate tissue. This is surprising, since truncated AR splice variants were proposed to be expressed predominantly in mCRPCa, and their presence was associated with hormone-therapy resistance, at least for AR-V7. However, the TCGA study showed that these splice variants are already expressed in therapy-naive primary PCa [14]. Structural alterations of AR-Vs may occur in the same allele, leading to a generation of AR-Vs displaying deletions and duplications within the AR LBD. AR-V expression can be regulated by both splicing enhancer sequences and also protein kinase pathways [65].

It could be beneficial to obtain AR-V7 data before treating patients with mCRPCa, as a recent study suggested [66]. The authors showed that patients with mCRPCa exhibiting nuclear localized AR-V7 in circulating tumor cells had better outcomes when treated by taxanes than by AR signaling inhibitors. In a follow-up study, Graf et al. [67] showed that AR-V7-positive patients who were treated with taxanes exhibited better survival, while those who were AR-V7-negative exhibited better survival when treated with AR signaling inhibitors. Therefore, in the era of emerging personalized medicine, the status of the most important splice variants could become an important clinical criterion in the future.

#### **7. Conclusions**

As this short overview shows, genomic changes in PCa development are extremely complex and many different factors need to be considered. One major challenge of advanced disease is that many proposed underlying mechanisms are still insufficiently understood or unclear due to contrasting results, and thus cannot yet be leveraged as refined therapeutic approaches. However, science is progressing rapidly. This review provides a snapshot of the current knowledge, but in the upcoming years, more data will be available to treat this frequent cancer more effectively.

**Author Contributions:** Conceptualization, E.C. and O.C.; methodology, E.C., O.C. and G.C.-T.; software, G.W., A.O.; validation, E.C., O.C. and G.C.-T.; formal analysis, G.W., A.O., R.K.; investigation, E.C.; writing (original draft preparation), E.C., O.C., G.W., A.O.; writing (review and editing), R.K.; visualization, G.W.; supervision, E.C. and O.C.; project administration, R.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Long Non-Coding RNAs at the Chromosomal Risk Loci Identified by Prostate and Breast Cancer GWAS**

**Panchadsaram Janaththani 1,2, Sri Lakshmi Srinivasan 1,2 and Jyotsna Batra 1,2,\***


**\*** Correspondence: jyotsna.batra@qut.edu.au

**Abstract:** Long non-coding RNAs (lncRNAs) are emerging as key players in a variety of cellular processes. Deregulation of the lncRNAs has been implicated in prostate and breast cancers. Recently, germline genetic variations associated with cancer risk have been correlated with lncRNA expression and/or function. In addition, single nucleotide polymorphisms (SNPs) at well-characterized cancer-associated lncRNAs have been analyzed for their association with cancer risk. These SNPs may occur within the lncRNA transcripts or in spanning regions, where they may alter the structure, function, and expression of these lncRNA molecules and contribute to cancer progression; the affected lncRNAs may have potential as therapeutic targets for cancer treatment. Additionally, some of these lncRNAs have a tissue-specific expression profile, suggesting them as biomarkers for specific cancers. In this review, we highlight some of the cancer risk-associated SNPs that modulate lncRNAs with a potential role in prostate and breast cancers and speculate on how these lncRNAs may contribute to cancer development.

**Keywords:** long non-coding RNA; prostate cancer; breast cancer; single nucleotide polymorphisms; genome-wide association studies

#### **1. Introduction**

Hormone-related cancers, prostate and breast cancer, accounted for more than 3.6 million newly diagnosed cancer cases worldwide in 2020 [1]. Genetic predisposition has been identified as one of the factors contributing to the risk of these cancers. Genome-wide association studies (GWASs), analyzing common low-penetrance variants, have identified specific risk loci for these cancers [2,3]. Most of the risk-associated single nucleotide polymorphisms (SNPs) identified through GWAS are present in non-protein-coding DNA [4–7]. This non-coding DNA can regulate the expression of protein-coding genes and maintain the 3D structure of the genome by serving as a scaffold for transcription factors. Alternatively, some non-coding DNA is now found to be transcribed as non-protein-coding RNA (ncRNA) using high-throughput next-generation sequencing platforms [8]. Even though ncRNAs are not translated into proteins, they play a vital part in human complexity from maintaining normal cellular function to playing a broader role in human diseases including cancer [9–12]. Both ncRNAs and protein machinery involved in the development of diseases have become targets of novel therapeutic approaches [13–15]. Based on transcript size, these ncRNAs are grouped into two major classes: small non-coding RNAs (<200 bp) and long non-coding RNAs (lncRNAs) (>200 bp). The small ncRNA class comprises miRNAs, tRNAs, snRNAs, siRNAs, and piRNAs [3,16,17]. LncRNAs have recently been identified as important mediators in many diseases, including cancer [16,18,19]. Long non-coding RNAs (lncRNAs) are RNA transcripts that lack translational potential into functional proteins. The biogenesis of lncRNAs is similar to that of mRNAs. Most lncRNAs are transcribed by RNA polymerase II while some are also transcribed by RNA polymerase III. Most of the lncRNAs undergo post-transcriptional modifications, such as splicing, polyadenylation, and 5′ capping, like protein-coding RNAs [20]. However, these molecules have several short open reading frames (sORFs) and have very little protein coding potential, which discriminates them from mRNA [21]. Based on their origin, these lncRNAs can be classified as intronic, exonic, intergenic, intragenic, antisense, 3′ and 5′ UTR, promoter-associated (paRNA), and enhancer-associated (eRNA) [22]. Most of the lncRNAs are localized in the nucleus while some are found in both the nucleus and cytoplasm, and some are specifically distributed in the cytoplasm. These lncRNAs play a functional role in gene expression regulation by either cis (targeting genomically local genes) or trans (targeting distant genes) action [22]. Interestingly, recent advanced research has identified several putative coding sORFs, suggesting that lncRNAs may be translated into micropeptides with a functional role [23–25].

**Citation:** Janaththani, P.; Srinivasan, S.L.; Batra, J. Long Non-Coding RNAs at the Chromosomal Risk Loci Identified by Prostate and Breast Cancer GWAS. *Genes* **2021**, *12*, 2028. https://doi.org/10.3390/genes12122028

Academic Editor: Mikko P. Turunen

Received: 4 November 2021 Accepted: 17 December 2021 Published: 20 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Even though lncRNAs constitute a major part of the human transcriptome, the functional characterization of these lncRNAs and the identification of the molecular pathways in which they are involved remain a challenge. Nevertheless, the lncRNAs that have been well characterized to date show considerable variability in function. Many lncRNAs are deregulated in prostate and breast cancer, and the expression of some has been significantly associated with different stages of cancer. These lncRNAs are proposed to be involved in cancer development by playing functional roles in chromatin remodeling, transcriptional regulation, or post-transcriptional regulation. They show tumor-suppressive or oncogenic potential, emphasizing their potential in targeted therapeutics for prostate and breast cancer [26,27]. In addition, lncRNAs show tissue- and cancer-specific expression patterns, enabling them to be better diagnostic and prognostic tools for cancer therapies [26,28]. Moreover, SNPs could affect the expression and molecular function of lncRNAs, for instance, by disrupting their secondary structure, and thereby play critical roles in tumorigenesis [29]. Recently, there has been increasing evidence linking genetic variants that modulate lncRNA expression to prostate or breast cancer risk [6,30]. In this review, we summarize lncRNAs regulated by risk-associated genetic variants (Table 1) in these two hereditary cancers to gain insights into the contribution of lncRNAs to cancer etiology, oncogenic function, and treatment resistance.

#### **2. Prostate Cancer Risk-Associated SNPs Modulating lncRNAs**

As a multifactorial disease, prostate cancer has several aspects contributing to its etiology, comprising both modifiable and non-modifiable factors [31]. Diet and environmental exposure disruptors, such as bisphenol A, chlordecone, and pesticides [31], are reported as modifiable prostate cancer risk factors. Age is a well-known non-modifiable risk factor for prostate cancer, with the risk of developing cancer increasing with age [32,33]. Ethnicity is another non-modifiable contributing factor to the development of prostate cancer: Asians have been reported to have lower prostate cancer incidence rates than European and American populations [31]. Furthermore, family history and/or heredity is also a known non-modifiable prostate cancer risk factor [33]. There is a considerable amount of evidence for a genetic basis (up to ~57%) contributing to the risk of prostate cancer [34,35]. Recently, a large prostate cancer GWAS identified novel risk loci, bringing the total to 269 risk loci to date [36], and the study led to the identification of a genetic risk score for prostate cancer predisposition. Nevertheless, identification of the causal genes has been a major challenge, given that a large proportion of these variants are located in non-coding regions. Functional studies are known to complement GWAS results by identifying specific genes whose expression is associated with the disease phenotype. Such approaches include expression quantitative trait locus (eQTL) analysis, which can identify the association between risk genotype and gene expression, and transcriptome-wide association studies (TWASs), which can assess the association with disease risk throughout the transcriptome.
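As an illustration of the eQTL idea described above, the following minimal sketch regresses expression on risk-allele dosage by ordinary least squares. It is not the method of any cited study; the function name `eqtl_slope` and the toy values are hypothetical, and a real analysis would also adjust for covariates and test significance.

```python
from statistics import mean

def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on risk-allele dosage (0, 1 or 2)."""
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need matched genotype and expression values")
    g_bar, e_bar = mean(dosages), mean(expression)
    # Sum of squared deviations of the genotype dosages.
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype does not vary across samples")
    # Cross-product of genotype and expression deviations.
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    return sxy / sxx

# Hypothetical toy data: risk-allele counts and lncRNA expression in six samples.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope = eqtl_slope(dosages, expression)
```

A positive slope would indicate that expression rises with each additional copy of the risk allele in these toy data.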

One of the few studies to explore prostate cancer GWAS SNP-associated lncRNAs found that the prostate cancer-associated SNPs are less polymorphic in the flanking regions, but that the SNP density was similar in protein-coding and lncRNA gene regions, indicating that the sequences of lncRNAs are evolutionarily conserved [37]. This study reported that 52 loci were located within lncRNA genes, including a new prostate cancer risk-related SNP, rs3787016, in a predicted lncRNA, *AC1127096.1* [37]. This locus was initially identified as being associated with prostate cancer risk in members of families with multiple cases of prostate cancer [38]. An independent case-control study in Chinese men reported that the rs3787016 SNP 'A' allele was associated with a higher risk of developing prostate cancer in younger individuals as well as in individuals with a smoking history, a Gleason score ≥7 (4 + 3), or aggressive disease [39].

Using a systematic approach that prioritized lncRNAs based on their position in the promoter region, intercellular functional correlation, eQTL with one or more risk SNPs (cis-eQTL), and differential expression between tumor and normal prostate tissues, Guo et al. shortlisted 45 potential lncRNAs with 50% of prostate cancer risk loci from 122 loci [30]. These included already well-known lncRNAs, such as *KCNQ1OT1*, *H19*, and prostate cancer-associated transcript 1 (*PCAT1*). LncRNA *KCNQ1OT1* is known to act as an miRNA sponge and to compete with either miR-211-5p or miR-15 to promote prostate cancer progression [40,41]. Polymorphisms in *H19* were also associated with risk and some of the clinical parameters of bladder cancer [42], hepatocellular carcinoma [43], cervical cancer [42], and urothelial cell carcinoma [44]. Two additional SNPs in *H19*, rs3024270 and rs3741219, were shown to be associated with the risk of perineural invasion of prostate cancer [45]. Increased expression of *H19* was observed in prostate cancer patients with a high Gleason score compared to those with a low Gleason score or benign prostatic hyperplasia (BPH) [46]. The function of *H19* remains controversial in multiple cancers, including prostate cancer. While knockdown of *H19* in PC3 and DU145 prostate cancer cells reduced cell proliferation and glucose and lactate levels [46], the *H19*-derived miR-675 axis is described as a suppressor of prostate cancer metastasis, regulating the extracellular matrix protein transforming growth factor β-induced protein (TGFBI) [47]. This suggests that the tumor microenvironment and cell types should be accounted for when determining the functional role of *H19*. In addition, *H19* overexpression increased the expression of the stem cell markers *Oct4* and *Sox2* and increased colony formation in the RWPE-1 prostate epithelial cell line [48]. *H19*-dependent transcriptional regulation by estrogen and hypoxia redirected the cells from epithelial-to-mesenchymal transition (EMT) to β integrin-mediated invasion [49].

Guo et al. also reported that, of the 45 lncRNAs regulated by non-coding SNPs, the expression of 18 lncRNAs was significantly correlated with 15 prostate cancer risk loci [30]. Moreover, some of these risk SNPs are enriched in the promoter regions of five lncRNAs: *PCAT1*, *RP11-400F19.18*, *RP11-24D8.1*, *RP11-552F3.10*, and *RP11-328M4.2* [30]. This study further identified that the risk SNP rs7463708 in the enhancer region of *PCAT1* increased the binding of a novel AR-interacting partner, ONECUT2, which then looped to the *PCAT1* promoter. Moreover, *PCAT1* was identified as an androgen late-response gene and interacted with AR and lysine-specific demethylase 1 (LSD1) upon prolonged androgen treatment to promote prostate cancer growth [30]. *PCAT1*, which has been identified as the top-ranked lncRNA based on its overexpression in prostate tumors [50], is also known to promote prostate cell proliferation through upregulation of the cMyc protein [51]. Apart from this distal enhancer locus at the rs7463708 SNP, another independent risk locus tagged by the rs10086908 SNP was associated with *PCAT1* expression, with nine SNPs located across the promoter and exons of the *PCAT1* gene [30]. Moreover, the rs1902432 SNP in *PCAT1* was also found to be associated with an increased risk of prostate cancer [52]. Interestingly, a meta-analysis of five lncRNA polymorphisms in prostate cancer-associated non-coding RNA 1 (*PRNCR1*, also known as *PCAT8*) and susceptibility to multiple cancers reported that four of the SNPs (rs16901946, rs13252298, rs1016343, and rs1456315) were associated with overall cancer risk, while no association was found with the rs7007694 SNP [53]. Another small case-control study of 178 prostate cancer patients and 180 BPH cases in the Iranian population identified the rs13252298, rs1456315, and rs7841060 SNPs in *PRNCR1* as being associated with prostate cancer risk [54]. There was no significant association between the rs7007694 SNP and prostate cancer risk [54], as previously reported for overall cancer risk [53], including after adjusting for clinicopathological characteristics, such as age, tumor stage, prostate-specific antigen (PSA) levels, Gleason score, perineural invasion, and surgical margin [54]. However, it is important to validate these associations in a larger cohort. *PRNCR1* was reported to be highly expressed in aggressive prostate cancer and to enhance both ligand-dependent and ligand-independent AR-mediated transcriptional activity by directly binding to amino acids 549–623 of AR, thereby promoting prostate cancer growth [55]. On the contrary, another study by Parolia et al. reported significantly lower expression levels of *PRNCR1* in the prostate cancer models they tested, raising questions about its involvement in AR activation in prostate cancer [56]. Similarly, *PRNCR1* was excluded from the study by Guo et al. due to its undetectable expression in the LNCaP prostate cancer cell line and in TCGA prostate adenocarcinoma RNA-sequencing data [30]. In addition to prostate cancer, polymorphisms in *PRNCR1* at the 8q24 locus are also associated with gastric [57], colorectal [58], and lung cancer risk [59], indicating its functional role in multiple cancers.

A multiethnic meta-analysis of prostate cancer GWAS covering >10 million SNPs in ~80,000 individuals identified a novel prostate cancer risk locus at 9p21 [60]. The prostate cancer risk-associated variant at this locus, rs17694493, is predicted to disrupt the binding motifs of the transcription factors STAT1 and RUNX1 and is positioned in the intronic region of a novel lncRNA gene, *CDKN2B-AS1* (also known as *ANRIL*). Moreover, the SNPs rs4977574, rs1333048, and rs10757278 in the *ANRIL* gene were also associated with BPH and prostate cancer risk in the Iranian population [61]. Overexpression of *ANRIL* in prostate cancer cells increased cell proliferation and migration by regulating the let-7a/TGFB1/Smad signaling pathway [62], demonstrating a potential molecular mechanism by which this lncRNA mediates cancer progression.

Prostate cancer risk-associated SNPs, rs11672691 and rs887391, were identified to regulate two *PCAT19* lncRNA isoforms with two distinct transcription start sites, *PCAT19*-short and *PCAT19*-long, through a promoter-to-enhancer switching mechanism [63]. The rs11672691 SNP on chromosome 19 was identified to be associated with both non-aggressive and aggressive prostate cancer risk [64], prostate cancer-specific mortality [65], and poor prognosis after diagnosis [63]. *PCAT19*-long promoted prostate cancer progression by interacting with a nuclear riboprotein, Heterogeneous Nuclear Ribonucleoprotein A/B (HNRNPAB), to upregulate a subset of cell-cycle genes [63], suggesting a novel mechanism for the HNRNPAB role in prostate cancer progression.

Some SNPs are also found to regulate distant lncRNAs by chromosome looping. For instance, the prostate cancer risk SNP rs378854 at the 8q24 locus was found to regulate the expression of the lncRNA *PVT1*, which is located 0.5 Mb away from this variant, by long-range chromatin looping [66]. Exon 9 of the *PVT1* gene was overexpressed in aggressive prostate cancer cases with African ancestry, suggesting that it could be used as a biomarker for metastatic disease [67]. Knockdown of *PVT1* was shown to reduce prostate cancer growth in vitro and in vivo and to increase apoptosis in prostate cancer cells [68]. Some of these studies are summarized in Table 1.

#### **3. Breast Cancer Risk-Associated SNPs Modulating lncRNAs**

Breast cancer is the most commonly diagnosed cancer in females worldwide. It is a heterogeneous disease at the molecular and clinical level and has four distinct subtypes: Luminal A, Luminal B, human epidermal growth factor receptor 2 (HER2) overexpression, and triple-negative, based on the status of estrogen receptor (ER), progesterone receptor (PR), and HER2 [69,70]. Breast cancer GWASs have identified more than 200 risk loci, including differential associations with ER+, ER−, or triple-negative breast cancer [7,71,72].

A transcriptome-wide association study by Wu et al. identified 26 lncRNAs through eQTL analysis of breast cancer risk loci [73]. The functional role of three of these lncRNAs, *RP11-218M22.1*, *RP11-467J12.4*, and *CTD-3032H12.1*, was confirmed by the significant reduction in cell proliferation on lncRNA knockdown in three breast cancer cell lines, 184A1, MCF7, and T47D, and by reduced colony-forming efficiency in MCF7 cells. *RP11-467J12.4*, also known as *PR-lncRNA-1*, is mainly localized in the nucleus and is regulated by P53 in human and mouse cells [74]. LncRNA *CTD-3032H12.1* is predicted to interact with another lncRNA, *RP11-20F24.2*, and with the mRNA of *ANKRD30A*, a transcription factor implicated in breast cancer progression, using a tissue-specific co-expression regulatory network model [75].

A recent study by Marjaneh et al. exploring multi-exonic non-coding RNA (mencRNA) genes at 139 breast cancer GWAS loci identified more than 4000 mencRNAs using RNA-capture sequencing [76]. Interestingly, the breast cancer risk variants were enriched in the exonic regions of these RNAs, suggesting that these risk variants may impact RNA stability, structure, or function. One example of such enrichment reported in this study is the 2q14.2 locus, where three of the four independent risk signals were in the exonic regions of mencRNAs. Furthermore, eQTL analysis shortlisted 800 mencRNAs, including seven signals, XLOC\_022678, XLOC\_093918, XLOC\_112072, XLOC\_142280, XLOC\_169717, XLOC\_195543, and XLOC\_209276, overlapping with breast cancer risk signals [76]. Four of these eQTLs were found to regulate mencRNAs through distal interactions, as confirmed by Capture Hi-Seq. These include the potential causal variants at the estrogen-regulated enhancer of two lncRNAs, *CUPID1* and *CUPID2*, at the 11q13 locus, which were previously known to promote homologous-based DNA repair [77].

A study carried out by Suvanto et al., analyzing 84 lncRNA and 44 transcribed ultra-conserved RNA (T-UCR, a subtype of lncRNAs) regions, identified SNPs in seven lncRNAs and eight T-UCRs associated with breast cancer risk, which had not previously been reported by GWAS [6]. These include the risk SNPs rs71124350 and rs28489579 at the 15q21.1 locus, which correlate with the expression of GA-binding protein transcription factor-β subunit 1 antisense RNA 1 (*GABPB1-AS1*). This lncRNA was predicted to be associated with two miRNA networks, hsa-miR-3613-3p and hsa-miR-7106-5p, which were differentially regulated in breast cancer compared to adjacent normal breast tissues [78]. Although the functional role of *GABPB1-AS1* in breast cancer is not known, it was reported to have a function in other cancers. For instance, *GABPB1-AS1* is known to regulate oxidative stress by regulating translation of its sense protein GABPB1 when hepatocellular carcinoma (HCC) cells are exposed to Erastin, a small-molecule compound that induces non-apoptotic iron-dependent oxidative cell death (ferroptosis) [79]. Moreover, high expression of *GABPB1-AS1* was correlated with better overall survival of HCC patients [79]. Similarly, high *GABPB1-AS1* expression was correlated with better prognosis and inversely correlated with tumor size, TNM stage, and Fuhrman stage in clear cell renal cell carcinoma patients [80]. This was further validated using in vitro and in vivo assays with *GABPB1-AS1* overexpression models, which showed reduced proliferation, migration, and invasion in 786-O and Caki-1 renal cell cancer cells and reduced tumor growth in xenograft models [80].

In addition to these transcriptome-wide lncRNA findings at risk loci, some studies have focused on the risk association of genetic variants in well-known breast cancer-related genes. For instance, the rs1899663 and rs7958904 SNPs at the lncRNA *HOTAIR* (*HOX* transcript antisense intergenic RNA) were associated with an increased risk of breast cancer in a Southeast Chinese Han population of 969 breast cancer cases and 970 healthy controls [81]. Moreover, the rs1899663 SNP was also associated with both disease-free and overall survival in younger cases. On the contrary, Yan et al. reported that the rs1899663 and rs4759314 SNPs were associated with reduced breast cancer risk among women with age at menarche >14, while the rs920778 SNP was associated with an increased risk, in a Chinese population of 502 cases and 504 matched healthy controls [82]. The rs1899663 and rs12826786 SNPs were associated with a reduced breast cancer risk in the southeast Iranian population, while the rs920778 SNP was associated with an increased risk, similar to the association reported by Yan et al. [83]. A recent meta-analysis showed that the rs12826786 and rs920778 SNPs at *HOTAIR* were correlated with an increased overall cancer risk [84]. *HOTAIR* belongs to the conserved genomic region of several *HOX* family coding and non-coding genes and is known to play a functional role in embryonic development [85]. Overexpression of *HOTAIR* was correlated with metastasis and poor prognosis in various cancers, including breast cancer [85,86]. Overexpression of *HOTAIR* in breast cancer cells increased the invasiveness of these cells in a polycomb repressive complex 2 (PRC2)-dependent manner by reprogramming the polycomb binding profile to resemble that of embryonic fibroblasts [87]. Interestingly, *HOTAIR* induction has also been shown to be important for the invasive growth of Claudin-low breast cancer cells, a triple-negative cancer subtype with low expression of claudin-3, claudin-4, and claudin-7 [88].

Another well-known cancer-associated lncRNA, *H19*, a maternally inherited imprinted gene, is reported to be overexpressed in breast cancer and associated with poor prognosis in breast cancer patients, especially in the triple-negative molecular subtype [89,90]. A genetic association study in a Chinese Han population of 464 breast cancer cases and 467 healthy controls did not observe any significant association with breast cancer risk for two SNPs, rs3741219 and rs217727, in the *H19* gene in the overall analysis, but on stratified analysis, the rs217727 SNP was found to correlate with breast cancer risk in patients with ER+ or HER2+ disease and in women who had had more than two pregnancies [89]. Overexpression of *H19* in breast cancer cells promoted cell proliferation and migration [91], while *H19* knockdown reduced estrogen-induced growth of breast cancer cells [92]. However, a meta-analysis by Mathias et al. analyzing 31 SNPs in 12 lncRNAs could not find any association of these SNPs with breast cancer susceptibility, including the rs920778, rs1899663, rs12826786, and rs4759314 SNPs at the *HOTAIR* locus and the rs217727, rs3741219, rs2107425, and rs2839698 SNPs at the *H19* locus [93], likely due to the small number of studies included in this analysis. This emphasizes the need for comprehensive functional analysis with experimental evidence to improve our understanding of how these genetic variants contribute to breast cancer pathology (Table 1).



#### **4. Conclusions**

Recently, there has been remarkable progress in our understanding of the multifaceted role of lncRNAs and the genetic variants impacting lncRNA expression and function, recognizing them as critical players in prostate and breast cancer progression. Although a majority of these cancer risk-associated genetic variants are found in non-coding RNA loci, only a few studies have focused on uncovering the role of these SNPs in modulating the structure and function of lncRNAs in cancer progression. Emerging sequencing techniques and bioinformatic analysis are helpful in predicting the putative function of the lncRNA. Databases, such as lncRNASNP2 [94] and LincSNP 3.0 [95], provide information on how these SNPs modulate the lncRNA structure and function. Some of these lncRNAs are differentially expressed in disease progression models and cancer subtypes, highlighting their potential to be used as a diagnostic and prognostic biomarker. For example, PCA3 (also known as DD3) is the only FDA-approved lncRNA used for prostate cancer diagnosis [96], which is overexpressed in prostate tumors compared to non-malignant tissues. Nevertheless, using lncRNA-SNPs to predict the disease progression and/or therapeutic options is still in an early stage, since how these SNPs regulate the expression or function of lncRNAs remains uncertain. Most of these studies have identified cancer-associated lncRNAs through their expression correlation with the SNP genotype. Moreover, whether these SNPs are true causal SNPs or correlated variants in high linkage disequilibrium with the causal SNPs still needs to be clarified. Advancing techniques including CRISPR genome editing may provide comprehensive insights in this field to help identify the functional role of the cancer-associated risk variants through lncRNAs in disease progression and identify their applicability as novel therapeutic targets or biomarkers for multiple cancers.

**Author Contributions:** Conceptualization, P.J. and J.B.; writing—original draft preparation, P.J.; writing—review and editing, S.L.S. and J.B.; supervision, J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the DoD Prostate Cancer Idea Development Grant under Award No. W81XWH-19-1-0343. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the Department of Defense. J.B. and S.S. are supported by the Advance Queensland Industry Fellowship.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## **Convergence of Prognostic Gene Signatures Suggests Underlying Mechanisms of Human Prostate Cancer Progression**

**Bogdan-Alexandru Luca 1,2, Vincent Moulton 1,**†**, Christopher Ellis 1, Shea P. Connell 2, Daniel S. Brewer 2,3,**† **and Colin S. Cooper 2,\*,**†


Received: 8 June 2020; Accepted: 10 July 2020; Published: 16 July 2020

**Abstract:** The highly heterogeneous clinical course of human prostate cancer has prompted the development of multiple RNA biomarkers and diagnostic tools to predict outcome for individual patients. Biomarker discovery is often unstable with, for example, small changes in discovery dataset configuration resulting in large alterations in biomarker composition. Our hypothesis, which forms the basis of this current study, is that highly significant overlaps occurring between gene signatures obtained using entirely different approaches indicate genes fundamental for controlling cancer progression. For prostate cancer, we found two sets of signatures that had significant overlaps suggesting important genes (*p* < 10<sup>−34</sup> for paired overlaps, hypergeometric test). These overlapping signatures defined a core set of genes linking hormone signalling (HES6-AR), cell cycle progression (Prolaris) and a molecular subgroup of patients (PCS1) derived by Non Negative Matrix Factorization (NNMF) of control pathways, together designated as SIG-HES6. The second set (designated SIG-DESNT) consisted of the DESNT diagnostic signature and a second NNMF signature PCS3. Stratifications using SIG-HES6 (HES6, PCS1, Prolaris) and SIG-DESNT (DESNT) classifiers frequently detected the same individual high-risk cancers, indicating that the underlying mechanisms associated with SIG-HES6 and SIG-DESNT may act together to promote aggressive cancer development. We show that the use of combinations of a SIG-HES6 signature together with DESNT substantially increases the ability to predict poor outcome, and we propose a model for prostate cancer development involving co-operation between the SIG-HES6 and SIG-DESNT pathways that has implications for therapeutic design.

**Keywords:** prostate cancer; prognostic signature; diagnostic signature; biomarkers; cancer progression; aggressive cancer

#### **1. Introduction**

A major problem in the management of human prostate cancer is the high variability in its clinical course, making prediction of outcome at the time of diagnosis or following radical therapy extremely difficult [1,2]. A critical challenge is to improve prediction of outcome beyond the use of standard clinical predictors including D'Amico stratification and CAPRA score [3]. For prostate cancer, the development of expression-based prognostic biomarkers has proven very fruitful with over 20 predictive signatures and classifications reported. Many signatures were derived using supervised approaches involving comparisons of aggressive and nonaggressive disease [4–17]. Several biomarkers represent particular biological functions [18–22]. For example, the Prolaris biomarker [19] contains genes known to be involved in controlling transition through the cell cycle.

Unsupervised approaches may also be used for classification and biomarker identification [23–26]. We have used an unsupervised mathematical approach called Latent Process Decomposition (LPD) that takes into account the issue of heterogeneity within individual prostate samples to identify a new poor prognosis category of prostate cancer called DESNT [23,27]. In an alternative unsupervised approach, the status of control pathways deduced from expression datasets was analysed using Non-Negative Matrix Factorisation (NNMF), leading to the identification of a poor prognosis category called PCS1 [26]. The presence of somatic copy number alterations, sometimes linked to the expression of genes within regions of alteration, has also been utilised for biomarker identification [28].

An interesting feature of biomarker discovery involving comparisons of expression linked to different clinical states is the small overlap between different predictive gene lists for the same biological endpoint. This observation, and its underlying causes, are well documented for human breast cancer [29–31]. A series of studies demonstrated that the lack of overlap cannot simply be attributed to trivial reasons such as the use of different patient cohorts, different detection technologies, and different analytical methods. This is illustrated by the work of Ein-Dor et al. [29], who repeated the analysis performed by van't Veer et al. [32] during their derivation of the MammaPrint 70-gene predictive signature for breast cancer outcome. Using transcriptome data from many subsets of training samples that were selected from the complete van't Veer et al. dataset, they demonstrated that multiple different but equally predictive 70-gene signatures could be derived. They noted that for many hundreds of individual genes, the correlation with survival had intermediate predictive values and that the differences between values were very small. The relative ranking of genes changed dramatically when slightly different training sets were used, leading to the selection of poorly overlapping predictive signatures.

In the current study, we wished to examine the relationship between biomarker signatures that were derived using a variety of different approaches. Our hypothesis is that the progression of prostate cancer occurs via one or a small number of underlying biological processes and that the significant overlaps between prognostic signatures obtained by independent methods of discovery and using different datasets may indicate sampling from genes fundamental for controlling cancer progression.
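To make the notion of a "significant overlap" concrete, the sketch below computes the upper tail of the hypergeometric distribution for the number of genes shared by two signatures. It is only an illustration under stated assumptions: the function name `overlap_p_value`, the choice of gene universe and the toy numbers are hypothetical and are not taken from the study.

```python
from math import comb

def overlap_p_value(universe, size_a, size_b, shared):
    """P(overlap >= shared) when size_b genes are drawn at random, without
    replacement, from a universe containing size_a genes of signature A."""
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(shared, max_overlap + 1)
    )
    return tail / total

# Hypothetical toy example: 10 genes in total, two 3-gene signatures sharing 2 genes.
p = overlap_p_value(universe=10, size_a=3, size_b=3, shared=2)
```

The result depends strongly on the assumed universe size, so the set of genes measurable on all platforms would need to be chosen with care.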

#### **2. Materials and Methods**

#### *2.1. The You et al. Discovery Cohort (DISC)*

To repeat the work of You et al., we compiled and normalised expression profiles from the same set of 38 public datasets as in You et al. [26] (DISC cohort) except that we did not include the ArrayExpress dataset E-SMDB-2486, which contains the same samples as the GEO dataset GSE3933. Where available, the raw data have been retrieved from the GEO and ArrayExpress repositories; otherwise the normalised data provided with the datasets were used. For the TCGA dataset, the RNA-seq level 3 raw expression data have been downloaded from the TCGA data portal. Only the 217 TCGA samples uploaded before 24/04/2013 have been included in the DISC cohort. Two-channel microarray datasets have been internally normalised using the loess method [33] and across arrays using the quantile method [34], both implemented in the limma R package [35]. One-channel microarrays have been normalised across arrays using the RMA algorithm [36] implemented in either the *affy* [37] or *oligo* [38] R packages, depending on the microarray platform. RNA-seq raw read counts have been processed using the variance stabilizing transformation implemented in DESeq2 [39]. For datasets that contained samples from more than one platform, samples from each platform have been normalised separately. The probes from the three platforms used in the GSE6919 dataset have been merged into a single sample for each patient ID. Probes from each platform have been annotated to Entrez gene identifiers using the corresponding Bioconductor annotation packages, if available; otherwise the probe identifiers have been converted to Entrez IDs using the SOURCE interface (http://source-search.princeton.edu/help/SOURCE/resultsBatchHelp.html), Agilent annotation lists or the biomaRt package [40]. The Multi-Dimensional-Scaling (MDS) decomposition of the expression profiles of the DISC cohort is shown in Figure S1.

For each dataset, duplicate probes for each Entrez ID have been removed, keeping only the probe with the highest mean expression across samples. The gene intensities have been centred by subtracting the median across samples. The DISC cohort has then been assembled by matching the Entrez IDs, resulting in a cohort of 2707 samples and 32,832 genes. Subsequently, the primary tissue samples without an associated Gleason score have been removed, resulting in a set of 1381 samples. To remove dataset- and platform-specific effects, the data were median-centred and then quantile normalised (MCQ) as described in You et al. [26]. Potential differences compared to the original protocol of You et al. may arise because of the removal of sample duplication and the use of the most probable approach when the published protocol was not completely clear.
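The three preprocessing steps described above (collapsing duplicate probes, median centring and quantile normalisation) can be sketched as follows. This is a simplified, standard-library illustration and not the pipeline used in the study, which relied on R packages; the function names and the tie-handling rule are assumptions made for the example.

```python
from statistics import mean, median

def collapse_probes(probes):
    """probes: list of (gene_id, [value per sample]). For each gene, keep
    the probe with the highest mean expression across samples."""
    best = {}
    for gene, values in probes:
        if gene not in best or mean(values) > mean(best[gene]):
            best[gene] = values
    return best

def median_centre(values):
    """Subtract the across-sample median from one gene's values."""
    m = median(values)
    return [v - m for v in values]

def quantile_normalise(columns):
    """columns: one list of gene values per sample, all of equal length.
    Each value is replaced by the mean of the values holding the same rank
    in every sample (ties are broken by position, a simplification)."""
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n_genes)]
    normalised = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalised.append(out)
    return normalised

# Hypothetical toy data: gene "1" has two probes, gene "2" has one.
collapsed = collapse_probes([("1", [1.0, 3.0]), ("1", [2.0, 6.0]), ("2", [5.0, 5.0])])
centred = median_centre(collapsed["1"])
normalised = quantile_normalise([[1.0, 3.0, 2.0], [6.0, 4.0, 5.0]])
```

After quantile normalisation every sample shares the same set of values, so remaining differences between samples lie only in how genes are ranked.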

#### *2.2. Validation Datasets*

Four publicly available transcriptome microarray datasets derived from prostatectomy samples from men with prostate cancer were used as validation datasets and are referred to as: Memorial Sloan Kettering Cancer Centre (MSKCC) [41], CancerMap [23], CamCap [24], and SWD [42]. From the MSKCC dataset, only prostatectomy specimens were used, both in the derivation of the original DESNT classification and for validation analyses in the current studies. The CamCap dataset was produced by combining two Illumina HumanHT-12 V4.0 expression beadchip datasets (GEO: GSE70768 and GSE70769) obtained from two prostatectomy series (Cambridge and Stockholm). The original CamCap [24] and CancerMap [23] datasets have 40 patients in common and thus 20 of the common samples were excluded at random from each dataset. Each Affymetrix Exon microarray dataset was normalised using the RMA algorithm [36] implemented in the Affymetrix Expression Console software. For CamCap and Stephenson, previously normalised values were used. The ComBat algorithm from the sva R package and quantile transformation were used to mitigate series-specific effects.

#### *2.3. Replicating You et al. Analysis*

#### 2.3.1. Pathway Activation Z-Score

For a given pathway and a given sample, the pathway activation score has been calculated as indicated in Levine et al. [43], namely:

$$Z_{tS} = \frac{\overline{X}_{tS} - \overline{X}_t}{\sigma_t} \sqrt{|S|} \tag{1}$$

where $\overline{X}_{tS}$ is the mean expression level of the genes in pathway *S* and sample *t*, $\overline{X}_t$ is the mean expression level of all genes in sample *t*, $\sigma_t$ is the standard deviation of all genes in sample *t*, and $|S|$ is the number of genes in the set *S*.
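As an illustration of Equation (1), the following Python sketch computes the score for one pathway in one sample. It is only a sketch under stated assumptions: the function name and the dictionary input are hypothetical, the sample standard deviation is used (the text does not say whether the sample or population form applies), and |S| is taken as the number of pathway genes actually measured in the sample.

```python
from math import sqrt
from statistics import mean, stdev

def pathway_z_score(sample_expr, pathway_genes):
    """Pathway activation score for one sample, following Equation (1).

    sample_expr: dict gene id -> expression value for a single sample.
    pathway_genes: iterable of gene ids forming the pathway S.
    """
    members = [g for g in pathway_genes if g in sample_expr]
    if not members:
        raise ValueError("no pathway genes measured in this sample")
    all_values = list(sample_expr.values())
    x_ts = mean(sample_expr[g] for g in members)  # mean over pathway genes
    x_t = mean(all_values)                        # mean over all genes
    sigma_t = stdev(all_values)                   # assumption: sample SD
    return (x_ts - x_t) / sigma_t * sqrt(len(members))
```

With five genes expressed at 1 to 5 and a pathway made of the two highest, the score is 1.5 × √2 / √2.5 ≈ 1.34.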

#### 2.3.2. Non-Negative Matrix Factorization

The non-negative matrix factorization (NNMF) algorithm implemented in the NNMF R package [44] was used with default parameters.

#### 2.3.3. NNMF Random Forest Classifier

A random forest classifier was trained on the DISC cohort to discriminate between the three NNMF clusters, using as features the 14-pathway z-scores calculated as described above. The model has been built using the implementation from the randomForest R package. The number of trees has been set to 5001, and samples within each class are down-sampled to the frequency of the smallest class; otherwise, the default settings have been used. The model obtained an out-of-bag (OOB) overall accuracy of 92.5%, and a per-class AUC of 0.98–0.99 (Figure S2a–c).
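The class-balancing step mentioned above can be sketched in Python as follows. This is an illustration of the idea only, not the randomForest implementation: the function name is hypothetical, and sampling without replacement is an assumption, since the text does not state which scheme was used.

```python
import random
from collections import Counter, defaultdict

def balanced_downsample(labels, rng):
    """Return sample indices in which every class is reduced to the size of the smallest class.

    labels: list of class labels, one per sample.
    rng: a random.Random instance, passed in so that results are reproducible.
    """
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    n_min = min(len(ix) for ix in by_class.values())
    chosen = []
    for lab in sorted(by_class):
        # assumption: draw without replacement within each class
        chosen.extend(rng.sample(by_class[lab], n_min))
    return sorted(chosen)

labels = ["NMF1"] * 5 + ["NMF2"] * 3 + ["NMF3"] * 8
idx = balanced_downsample(labels, random.Random(0))
counts = Counter(labels[i] for i in idx)
```

In a random forest, a draw of this kind would be repeated for every tree, so that each tree sees a different balanced subset.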

#### *2.4. Replicating the Ramos-Montoya Classifier*

To reproduce the Ramos-Montoya classification, a random forest model has been trained on the MSKCC dataset. It uses as training labels the assignment of the MSKCC samples into two classes available in Figure 4a of Ramos-Montoya et al. [22] and, as features, the 222 genes in the Ramos-Montoya signature. The model has been built using the *randomForest* R package. The number of trees has been set to 5001, and samples within each class are down-sampled to the frequency of the smallest class; otherwise, the default settings have been used. The model obtained an out-of-bag (OOB) accuracy of 92.67%, and a per-class AUC of 0.99 (Figure S2d).

#### *2.5. Replicating the Prolaris Classifier*

For the Prolaris classification on a given sample, a score is calculated by averaging the within-sample *z*-score normalised expression of the 31 CCP genes [19]. For a given dataset, the top 25% of patients with the highest score are considered high-risk.
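The scoring rule described above can be sketched in Python. This is a simplified illustration, not the commercial Prolaris assay: the function names are hypothetical, the sample standard deviation is assumed for the within-sample *z*-score, and the top 25% is taken by rounding the group size down.

```python
from statistics import mean, stdev

def ccp_score(sample_expr, ccp_genes):
    """Average within-sample z-score of the CCP genes for one sample.

    sample_expr: dict gene id -> expression value for a single sample.
    ccp_genes: iterable of CCP gene ids (the 31-gene list is not reproduced here).
    """
    values = list(sample_expr.values())
    mu, sd = mean(values), stdev(values)
    return mean((sample_expr[g] - mu) / sd for g in ccp_genes if g in sample_expr)

def high_risk_flags(scores, fraction=0.25):
    """Flag the patients whose score lies in the top `fraction` of the dataset.

    scores: dict patient id -> score.
    """
    n_high = int(len(scores) * fraction)  # assumption: round down
    ranked = sorted(scores, key=scores.get, reverse=True)
    high = set(ranked[:n_high])
    return {p: p in high for p in scores}
```

Because the threshold is defined per dataset, the same patient could in principle be classified differently when scored within a different cohort.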

#### *2.6. LPD (Latent Process Decomposition) DESNT*

LPD [45,46] is an unsupervised Bayesian approach which breaks down (decomposes) each sample into component sub-elements (signatures). Each signature is a representative gene expression pattern. LPD is able to classify complex data based on the relative representation of these signatures in each sample and can objectively assess the most likely number of signatures. The approach can take into account the heterogeneous composition of individual prostate cancer samples. The LPD procedure was carried out exactly as described previously [23,27]. The OAS-LPD algorithm is a modified version of the LPD algorithm in which new sample(s) are decomposed into LPD signatures, without retraining the model (i.e., without re-estimating the model parameters μ<sub>gk</sub>, σ<sup>2</sup><sub>gk</sub>, and α). OAS-LPD was carried out exactly as previously described [27].

#### *2.7. Statistical Analysis*

The statistical analyses have been carried out in R version 3.3.2. For determining the statistical significance of the intersection between two sets of genes, the hypergeometrical test has been used. Genes were defined as differentially expressed if they had an FDR-adjusted *p*-value < 0.001 and a fold change > 1.4, and were identified for each comparison using a moderated *t*-test implemented in the limma R package. Gene set enrichment analysis was performed using the *Fast Gene Set Enrichment Analysis* Bioconductor package [47] with 10,000 permutations. Survival analyses were performed using the log-rank test and Kaplan–Meier estimator, as implemented in the *survival* R package, with biochemical recurrence after prostatectomy as the end point. All survival analyses were performed on the combined CancerMap, CamCap and MSKCC cancer datasets (*n* = 482) unless otherwise stated.
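To make the overlap test concrete, the Python sketch below computes the upper-tail hypergeometric probability of observing at least a given number of shared genes between two signatures. The function name is hypothetical, and the choice of gene universe (the background gene count) is an assumption that is not specified in the text and that strongly affects the resulting *p*-value.

```python
from math import comb

def overlap_p_value(universe, size_a, size_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, size_a, size_b).

    universe: number of genes in the background (an assumption of the analyst).
    size_a, size_b: sizes of the two gene signatures.
    overlap: number of genes shared by the two signatures.
    """
    upper = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy example: 10 genes in total, signatures of 4 and 3 genes sharing 2.
p = overlap_p_value(10, 4, 3, 2)
```

In the toy example the probability is (6 × 6 + 4 × 1) / 120 = 1/3, so an overlap of two genes would not be remarkable.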

#### **3. Results**

#### *3.1. Relationships between Prostate Cancer Signatures*

As a starting point for this study, we compared 25 published mRNA expression signatures derived to predict aggressive human prostate cancer (Table 1, Data S1). The majority of the gene signatures were determined by comparisons of expression patterns with clinical endpoints, and are predicted to have small overlaps [29]. The pattern of overlaps observed in general fitted this model (Figure S3). We noted two highly significant sets of overlaps (Figure 1a,b, Figure S3, *p* < 10<sup>−34</sup> for paired overlaps, hypergeometrical test) involving signatures that were derived using unsupervised approaches or that involved investigations of particular biological pathways.

**Table 1.** Prognostic and Classification gene signatures. Abbreviations are as follows: A, signature discovered by association with clinically distinct states; B, signature representing a biological function; U, signature identified by unsupervised approach; LPD, Latent Process Decomposition; OAS-LPD, One Added Sample-LPD; HCA, Hierarchical Cluster Analysis; ADT, Androgen Deprivation Therapy; NNMF, Non-Negative Matrix Factorisation; RP, Radical Prostatectomy; PSA, Prostate Specific Antigen.


\* Applied HCA for subgroup identification and partial-least-squares regression for signature development. All studies cited are listed in the reference section.

First, there was an overlap between the DESNT genes detected as important by two different LPD procedures, LPD-DESNT [23] and OAS-LPD DESNT [27], and the genes differentially over-expressed in the PCS3 subgroup detected by You et al. [26] (*p* = 2.6 × 10<sup>−35</sup> and 2.1 × 10<sup>−41</sup>; hypergeometrical test) (Figure 1a, Data S2). In the work of You et al., three groups designated PCS1 (86 genes), PCS2 (123 genes) and PCS3 (219 genes) were detected by non-negative matrix factorisation (NNMF) of the control pathway status calculated from the observed cancer expression profiles. This match could be considered as a match to PCS1 because DESNT genes are under-expressed and the genes overexpressed in PCS3 are also under-expressed in PCS1 [26] (Figure S4). This pathway is referred to as SIG-DESNT and the set of genes within PCS3 that match the DESNT genes is referred to as PCS3-U (U = underexpressed).

Secondly, there was a three-way overlap between the genes associated with PCS1 of You et al. [26], the Prolaris test genes [19], and the signature of Ramos-Montoya et al. [22] (Figure 1b, Data S3). The Prolaris genes were chosen based on their role in controlling cell cycle. Ramos-Montoya et al. selected genes that were controlled by HES6, a transcription factor that has a critical role in driving the androgen receptor (AR) program. This pathway is referred to as SIG-HES6. The 20 genes within PCS1 matching Prolaris and the HES6 signature are all overexpressed. These genes are referred to as PCS1-O (overexpressed) and are distinct from the genes overlapping with the DESNT signature (Data S2).

We were interested to compare cancers detected by these two groups of biomarkers (SIG-DESNT and SIG-HES6) to test whether they are sampling from the same or different cancers at high risk of PSA failure. To examine high risk cancers detected by NNMF of control pathways, we needed first to repeat the analyses carried out by You et al.

**Figure 1.** Highly significant signature overlaps. (**a**) Overlaps between LPD DESNT, OAS-LPD DESNT and PCS3. (**b**) Overlap between Prolaris, Ramos-Montoya and PCS1 gene signatures. For each pair of signatures, the probability of the observed overlap occurring by chance was calculated as described in the materials and methods.

#### *3.2. Cancer Subgroups Identified by Non-Negative Matrix Factorisation of Control Pathways*

To repeat the work of You et al., we initially compiled and normalised expression profiles from 37 public datasets as outlined by the authors (DISC cohort). This resulted in a combined dataset with linked clinical data consisting of primary prostate cancer (*n* = 1059), non-malignant prostate tissue (*n* = 746), and metastatic samples from men with castration-resistant prostate cancer (*n* = 254) (Materials and Methods, Figure S1). We separately compiled a validation cohort consisting of prostatectomy specimens from the MSKCC, CancerMap, CamCap and SWD datasets. For analysis, we used the 14 pathways that were selected by You et al. on the basis of their likely involvement in prostate cancer development. Scores representing the activation status of each pathway in each sample were aggregated into a *z*-score. Computation of the cophenetic coefficient using a putative number of subgroups between two and six indicated three as the most likely number of subgroups (Figure 2a,b).

Assignment of samples to the three subgroups was carried out by NNMF using a 14 × N matrix populated with *z*-scores as a starting point. The results showed that three subgroups were detected, each with a distinct pattern of pathway activation (Figure 2c). The three subgroups were designated NMF1, NMF2 and NMF3. The NMF1 subgroup exhibited overexpression of the AV, AR-V, PRF, PTEN and ES control pathways, while NMF2 had overexpression of the ERG, AR and FOXA1 pathways (Figure 2d). NMF3 was characterised by overexpression of the PRC, PN, MES and RAS pathways. These patterns of pathway activation correspond to the three subgroups PCS1, PCS2 and PCS3, respectively, identified by You et al. The assignment of the majority of the samples (94%) was the same (Table S1). However, because of the small differences, a distinct nomenclature is used in our study (e.g., NMF1 instead of PCS1). The differences may reflect small deviations in the datasets and in the methodology used (Materials and Methods).

We identified 262 genes (NMF1 *n* = 74; NMF2 *n* = 85; NMF3 *n* = 103; FDR < 0.0001; fold-change > 1.4) that were differentially expressed between the three subgroups NMF1, NMF2 and NMF3: 155 of these overlapped with the 428 differentially expressed genes that distinguished PCS1, PCS2 and PCS3 (Figure S5, Data S4). The overlap between the 262 differentially expressed genes and DESNT genes remained highly significant (Data S5): OAS-LPD DESNT and NMF3 (13 genes, *p* = 2.8 × 10<sup>−17</sup>; hypergeometrical test); and DESNT and NMF3 (11 genes, *p* = 2.2 × 10<sup>−14</sup>).

Finally, a random forest classifier trained on the division into NMF1, NMF2, and NMF3 in the original dataset was used to interrogate four test datasets (Figure 2e). In each case, there was a significantly worse outcome for patients assigned to the NMF1 subgroup compared to the NMF2 and NMF3 subgroups, consistent with the poor outcome observed for the PCS1 subgroup of You et al. We conclude that we have achieved a similar, although not identical, stratification of cancer samples to that achieved by You et al. and that this may be used as a comparator with DESNT cancer and other stratifications.

**Figure 2.** Non-Negative matrix factorisation of control pathways identified three prostate cancer categories. (**a**) Consensus matrix showing three cancer categories. (**b**) Cophenetic coefficient from rank 2 to 6. (**c**) Pathway activation profiles for each cancer (*n* = 1381) arranged according to the three cancer categories NMF1, NMF2 and NMF3. (**d**) The distribution of pathway activation scores within each cluster. The panels correspond to the three groups of pathways that are over-expressed in each cluster in the You et al. paper. (**e**) Kaplan–Meier plots for four different datasets showing clinical outcome for cancers assigned to the three different cancer categories NMF1, NMF2 and NMF3. \* ≤0.05; \*\* ≤0.01; \*\*\* ≤0.001.

#### *3.3. Overlaps in the Detection of Cancers at High Risk of PSA Failure*

Returning to the comparisons of cancers detected by the SIG-DESNT and SIG-HES6 groups of signatures, we combined data from the CancerMap, CamCap and MSKCC datasets (*n* = 482 patients) with PSA failure as an end point, and then separately applied the DESNT, HES6, NMF1, and Prolaris tests. DESNT is calculated as a continuous variable designated γ representing the proportion of the analysed sample that contains the DESNT signature. Cancers were assigned as a "DESNT cancer" when this gene expression pattern had a larger γ value than any other contributing signatures. Random forest classifiers were used to detect NMF1 and Ramos-Montoya et al. high-risk cancers (Figure S2e–g). We used a published formula to calculate the Prolaris index [19] and selected the 25% of cancers exhibiting highest risk.
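The DESNT assignment rule described above amounts to checking whether the DESNT signature has the largest γ in a sample. A minimal Python sketch follows; the function name and the signature labels are hypothetical, and treating ties as non-DESNT is an assumption based on the wording "larger than any other".

```python
def is_desnt(gammas, desnt_signature):
    """Return True when the DESNT signature has the strictly largest gamma.

    gammas: dict signature name -> proportion (gamma) for one sample,
            as produced by an LPD decomposition.
    desnt_signature: name of the signature corresponding to DESNT.
    """
    desnt_gamma = gammas[desnt_signature]
    return all(
        desnt_gamma > g
        for name, g in gammas.items()
        if name != desnt_signature
    )

sample_a = {"LPD1": 0.2, "DESNT": 0.5, "LPD3": 0.3}   # DESNT dominates
sample_b = {"LPD1": 0.6, "DESNT": 0.3, "LPD3": 0.1}   # another signature dominates
```

Here `is_desnt(sample_a, "DESNT")` holds while `is_desnt(sample_b, "DESNT")` does not, even though sample_b still contains a substantial DESNT component.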

Based on the assumption that each of the two signature groups (SIG-DESNT and SIG-HES6) represents a separate underlying molecular mechanism, there are two predicted results. Each signature group could represent an entirely separate progression mechanism, in which case two non-overlapping groups of cancers with PSA failures should be detected. Alternatively, the two underlying mechanisms may cooperate to cause cancer progression, meaning that the SIG-DESNT and SIG-HES6 predictors will detect the same or overlapping groups of cancers with PSA failure.

The overlaps in memberships of each group at high risk of PSA failure are shown in Figure 3, supporting the second of these models. A total of 30 cancers were assigned as high risk by all of the prognostic markers and, of these, 20 had undergone PSA failure (66.7%). Of 100 cancers with 50 PSA failures assigned the SIG-DESNT signature (DESNT), 61 (37 PSA failures) were also detected by at least one of the SIG-HES6 signatures (HES6, Prolaris, and/or NMF1) (Figure 3). A combination of the Ramos-Montoya et al. (SIG-HES6) and DESNT (SIG-DESNT) models predicted the majority of PSA failures present in the high-risk cancer groups (76 of 84, 90.5%), with 32 failures overlapping.

**Figure 3.** Detection of high-risk cancers. For each sample in the combined dataset obtained by merging the CamCap, CancerMap and MSKCC datasets, we determined whether the patient was deemed high risk using four biomarkers: NMF1, Prolaris, Ramos-Montoya et al. and DESNT. (**a**) The intersections between the four high-risk categories. The numbers in brackets indicate the number of PSA failures. NMF refers to NMF1. (**b**) Kaplan–Meier plot when patients are grouped by the number of biomarkers that indicate that they are high risk. Endpoint is the time to biochemical recurrence. (**c**) Kaplan–Meier plot when patients are grouped by whether they are deemed high risk for DESNT, for at least one of the component biomarkers of SIG-HES6, or for both.

Kaplan–Meier analyses were performed to investigate the interactions between the high-risk categories. A particularly poor outcome was observed for patients designated as high risk by all four biomarkers (Figure 3b). Looking at the interactions between SIG-HES6 and SIG-DESNT biomarkers, we found intermediate rates of progression for patients deemed high risk either for DESNT (*p* = 0.0037 Benjamini–Hochberg adjusted for multiple testing (BH); pairwise comparison between DESNT only and neither; 26.9 vs. 93.0 months to 25% events) or for at least one of the SIG-HES6 biomarkers (Prolaris and/or PCS1 and/or Ramos-Montoya et al.) (BH *p* = 0.0039; 42.5 vs. 93.0 months to 25% events; Figure 3c). However, when patients were designated as high risk both by DESNT and by at least one of the SIG-HES6 biomarkers, their outcome was considerably worse (time to 25% events = 8.0 months; BH *p* < 2 × 10<sup>−16</sup> both vs. neither; Figure 3c). This observation is consistent with our hypothesis that two underlying mechanisms represented by SIG-HES6 and SIG-DESNT are interacting to cause cancer progression.
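The "months to 25% events" summary used above can be read off a Kaplan–Meier curve as the first time at which the estimated event-free proportion falls to 75% or below. The Python sketch below illustrates this with toy data; it is not the *survival* R package implementation, and the function names and the exact definition of the summary are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. biochemical recurrence) occurred, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_t = 0
        # group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_at_t += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t
    return curve

def time_to_fraction_events(curve, fraction=0.25):
    """First time at which survival drops to 1 - fraction or below (None if never)."""
    for t, s in curve:
        if s <= 1 - fraction:
            return t
    return None

# Toy data: four patients, one censored at month 10.
curve = kaplan_meier([5, 10, 15, 20], [1, 0, 1, 1])
```

In this toy example the first event at month 5 already takes the estimate to 0.75, so the time to 25% events is 5 months.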

Upon investigation of whether poor outcome was simply determined by Gleason Score, we found that the number of signatures that indicated that a patient was at high risk was an independent prognostic indicator when Gleason was included as a covariate (IQR HR = 1.98; 95% CI 1.54–2.55; *p* = 1.01 × 10<sup>−7</sup>; Cox proportional hazards regression model). In addition, the combination of a high risk defined by at least one member of SIG-HES6 and DESNT is an independent prognostic indicator when Gleason is included as a covariate [HR = 3.86 (95% CI 2.41–6.19)]. This compares to DESNT only [HR = 1.85 (0.99–3.46)], and SIG-HES6 only [HR = 1.61 (95% CI 1.02–2.52)] (Cox proportional hazards regression models). These results show that designation as high risk provides additional prognostic information to that determined by Gleason Score.

#### *3.4. Comparison of DESNT and Non-Negative Matrix Factorisation Subgroups*

We wished to further investigate the relationship between the DESNT and NMF1 poor prognosis groups. To achieve this, the MSKCC, CancerMap, CamCap, and TCGA datasets were combined and DESNT was plotted as a continuous variable (DESNT γ), as described in Luca et al. [27]. DESNT γ was significantly higher in NMF1 cancers compared to NMF2 and NMF3 cancers (Figure 4a) and the results of Gene Set Enrichment Analysis (GSEA) show a highly significant association (*p* < 1 × 10<sup>−6</sup>), giving an enrichment score of 0.61 (Figure 4b).
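The idea behind the enrichment score can be sketched with a simplified, unweighted running-sum statistic in Python. This is not the fgsea algorithm, which weights each step by the ranking statistic and estimates significance by permutation; the function name is hypothetical and the sketch only shows how a set of cancers (here, those labelled as hits) concentrated at the top of a ranking yields a large positive score.

```python
def enrichment_score(ranked_items, hit_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    ranked_items: items ordered from highest to lowest ranking value
                  (e.g. cancers ranked by DESNT gamma).
    hit_set: items belonging to the set of interest (e.g. NMF1 cancers).
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    n_hit = sum(1 for x in ranked_items if x in hit_set)
    n_miss = len(ranked_items) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    running = 0.0
    best = 0.0
    for x in ranked_items:
        # step up on a hit, step down on a miss
        running += 1 / n_hit if x in hit_set else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranking = ["a", "b", "c", "d"]
top_enriched = enrichment_score(ranking, {"a", "b"})     # hits at the top
bottom_enriched = enrichment_score(ranking, {"c", "d"})  # hits at the bottom
```

A score near 1 indicates that the hits sit at the top of the ranking, a score near −1 that they sit at the bottom.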

**Figure 4.** Comparison of DESNT and non-negative matrix classifications. (**a**) Distribution of DESNT γ for cancers assigned to NMF1, NMF2 and NMF3. (**b**) Gene Set Enrichment Analysis. Cancers were ranked according to DESNT γ (Lower Panel). The enrichment for cancers assigned to the NMF1 high-risk group (vertical lines) is shown (Upper Panel). (**c**) Pathway activation profiles for each cancer arranged according to DESNT and NMF1 subgroup status. The key is shown at the bottom of the figure. (**d**) Kaplan–Meier plots for the different cancer categories. The outcome used is time to biochemical recurrence post prostatectomy. \* ≤0.05; \*\* ≤0.01; \*\*\* ≤0.001.

We next calculated pathway status (*z*-scores, as shown in Figure 2b) for all samples in the MSKCC, CancerMap, CamCap and TCGA datasets and grouped the samples according to NMF1 and DESNT status. The results are shown in Figure 4c. Cancers assigned both as DESNT and NMF1 had the strongest association with time to progression (Figure 4d, *p* = 4.4 × 10<sup>−16</sup>, log-rank test) followed by DESNT-non-NMF1 cancers (*p* = 4.19 × 10<sup>−7</sup>) and non-DESNT-NMF1 cancers (*p* = 1.45 × 10<sup>−2</sup>). Membership of DESNT accounted for 36% (31/86) of NMF1 cancers in this series but 59% (31/45) of its PSA failures. Notably, activation of the PTEN, ES, AR-V, PRF and EZH2 pathways, a feature of NMF1 cancers, was not present in DESNT-non-NMF1 cancers.

We conclude from these studies that NMF1 and DESNT are overlapping but distinct cancer categories.

#### **4. Discussion**

A number of critical observations arise from the presented studies. Signatures derived by comparisons of expression profiles to clinical features (e.g., to Gleason Score and to PSA failure) exhibited only modest overlaps in gene lists. This was exactly as predicted from previous analyses of breast cancer datasets [29]. When a normal cell changes into a cancer cell or when the clinical state of a cancer is altered, many thousands of genes may exhibit altered expression levels and multiple control pathways may be modulated. Based only on the analyses performed to identify these biomarkers, it is not possible to determine whether the genes identified are central to cancer development or represent secondary events. Nonetheless, when biomarker analyses are combined with additional studies, useful individual genes are highlighted. For example, HOXB13 was identified in expression array studies as a gene highly upregulated in prostate cancer [48] but its central importance to cancer development was not established until the analyses of cancer families were performed [49]. AMACR was first identified as a gene upregulated in three of four expression array datasets from prostate cancer, but its importance as a cancer marker was not recognised until immunohistochemical studies of tissue sections were performed [50].

When significant overlaps do occur between predictive gene lists developed using entirely different approaches, it is our belief that this indicates genes fundamental for controlling cancer progression. The observation that the HES6 signature reported by Ramos-Montoya et al. [22] overlaps with the PCS1 [26] and Prolaris [19] signatures supports this view. HES6 drives castration-resistant tumour growth by enhancing the transcriptional activity of the androgen receptor, while the Prolaris signature contains many genes known to be critical for cell cycle control; both processes are already known to be essential for prostate cancer growth. A second overlap occurred between downregulated DESNT genes [23,27] and a set of genes overexpressed in PCS3 [26]. We propose that genes from these two categories are also involved in processes fundamental to the development of prostate cancer. The precise mechanism is currently unknown, although different possible models were suggested by both Luca et al. [23] and You et al. [26].

Support for this model was obtained from analyses of the impact of DESNT and SIG-HES6 signatures on clinical outcome. When patients were designated as poor prognosis by DESNT and by one or more of the SIG-HES6 signatures (Prolaris, PCS1, Ramos-Montoya et al.), a considerably worse outcome was observed compared with use of DESNT or SIG-HES6 signatures alone, consistent with interaction. This observation also has implications for patient management, indicating that use of DESNT classification together with, for example, the Prolaris biomarker or the Ramos-Montoya et al. biomarker could greatly increase the ability to predict whether a patient with organ-confined prostate cancer will progress following treatment. This would allow targeting of treatment to the patients who need it, hence avoiding the side effects, including impotence, of unnecessary treatment in men with indolent disease.

The overlapping signatures DESNT, PCS3, HES6, Prolaris and PCS1 are all derived using unsupervised approaches or by investigation of biological function. It is of interest that not all signatures derived using these approaches demonstrated highly significant overlaps. The derivation of the 70-gene biomarker proposed by Walker et al. [25] represents an interesting case. A 222-gene signature was originally generated using an unsupervised approach. The 70-gene signature represents a subset of these genes derived by a combination of unsupervised and supervised approaches. We failed to observe highly significant overlaps involving this signature (Figure S3). Additional signatures involving unsupervised steps that failed to show gene overlaps include those derived by Lalonde et al. [28] and by Ross Adams et al. [24].

We provide evidence that classifications based on NNMF analysis of control pathways and the DESNT classification are overlapping but distinct. The methods of clinical application of the two tests are also different. Assignment to the PCS1 poor prognosis category is based on the use of a 37-gene classifier [26] selected from genes differentially expressed between the PCS1, PCS2, and PCS3 groups. In contrast, the poor prognosis DESNT signature is only considered to be present in part of the cancer, with the exact proportion (or DESNT γ) calculated by LPD carried out on genes with the most variable levels of expression across samples [23,27]; the DESNT gene signature itself cannot be used to calculate outcome. Once calculated, the proportion of DESNT cancer can be used in a nomogram together with clinical variables to estimate likelihood of PSA failure [27]. Additionally, PCS1 and PCS3 had been assigned as having, respectively, luminal and basal phenotypes based on the expression of a set of 12 genes [26]. In contrast, we failed to find differential expression of these same genes when comparing DESNT and non-DESNT cancers (result not shown).

An important finding is that all of the highly overlapping signatures predicting poor outcome appear to be sampling from the same high-risk cancer group: the SIG-HES6 and SIG-DESNT groups of signatures are not detecting entirely separate groups of high-risk cancers. This result, as well as the observed interactions between DESNT and SIG-HES6 signatures in identifying patients with poor outcome, is consistent with a model where underlying molecular processes represented by SIG-HES6 and SIG-DESNT interact, leading to aggressive disease. This observation may have relevance to approaches for therapeutic targeting. In the clinical setting, the HES6-associated signature can be pharmacologically targeted by inhibition of PLK1 with restoration of sensitivity to castration [22]. For the DESNT signature, many of the genes with downregulated expression in prostate cancer are hypermethylated [23], indicating that 5-azacytidine could be used to enhance gene expression. Thus, a prediction of the current studies is that the combined use of inhibitors of HES6 function, androgen withdrawal and strategies for gene re-expression would synergise in preventing the growth of castration-sensitive prostate cancer. Our results also have an implication for the use of biomarkers in general, since the use of DESNT together with a SIG-HES6 biomarker may represent a much more effective method for detecting patients with aggressive disease.

#### **5. Conclusions**

To our knowledge, this is the first publication to systematically analyze the relationships between multiple distinct prognostic signatures for prostate cancer. We start with the hypothesis that highly significant overlaps between signatures derived using different approaches indicate genes and processes fundamental to prostate cancer progression, leading to the identification of two sets of overlaps designated SIG-HES6 and SIG-DESNT. First, we conclude that our results support a model whereby SIG-HES6 and SIG-DESNT genes co-operate to cause cancer progression. Secondly, consistent with this model, the use of a SIG-HES6 signature in combination with DESNT provides a much better predictor of poor outcome than the use of either alone. Thirdly, for the drug treatment of patients we predict a synergy between (i) inhibitors of HES6 function, and (ii) agents, such as 5-azacytidine, that can induce re-expression of DESNT genes.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/7/802/s1. Figure S1: MDS (Multi-Dimensional-Scaling) decomposition of the expression profiles of the DISC cohort, Figure S2: The out-of-bag (OOB) performance of NNMF random forest classifier on the DISC cohort, Figure S3: Overlaps between the gene signatures considered in this study, Figure S4: Genes identified as subtype-enriched by You et al., Figure S5: Comparison of altered genes identified by non-negative matrix factorization (NNMF) of control pathways; Table S1: Repetition of the non-negative matrix factorization (NNMF) categorisation of prostate cancer described by You et al.; Supplementary materials: Data S1: List of biomarker and signature genes used in Figure S3, Data S2: Overlaps between the DESNT, OAS-DESNT, and PCS3 gene signatures, Data S3: Overlap gene signatures for the Prolaris, PCS1, and Ramos-Montoya et al. (HES6) gene signatures; Data S4: Comparisons of differentially expressed genes, Data S5: Overlaps between the DESNT, OAS-DESNT, and NMF3 gene signatures.

**Author Contributions:** B.-A.L.—Conceptualization, Writing—original draft, Visualization, Formal analysis, Data curation, Methodology; V.M.—Conceptualization, Supervision, Funding acquisition; C.E.—Formal analysis. S.P.C.: Writing—review & editing, Formal analysis; D.S.B.—Conceptualization, Writing—original draft, Supervision, Methodology; C.S.C.—Conceptualization, Writing—original draft, Supervision, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Bob Champion Cancer Trust, The Masonic Charitable Foundation successor to The Grand Charity, The King Family, The Hargrave Foundation and The University of East Anglia. We acknowledge support from Movember, from Prostate Cancer UK, The Big C Cancer Charity, Callum Barton and from The Andy Ripley Memorial Fund.

**Acknowledgments:** The research presented in this paper was carried out on the High Performance Computing Cluster supported by the Research and Specialist Computing Support service at the University of East Anglia. These authors contributed equally: Bogdan-Alexandru Luca, Daniel S. Brewer. These authors jointly supervised this work: Vincent Moulton, Daniel S. Brewer, Colin S. Cooper. We thank Charlie Massie for helpful comments.

**Conflicts of Interest:** Colin Cooper, Daniel Brewer, Bogdan-Alexandru Luca and Vincent Moulton are co-inventors on a patent application from the University of East Anglia on the detection of DESNT prostate cancer.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Identification and Characterization of Alternatively Spliced Transcript Isoforms of** *IRX4* **in Prostate Cancer**

**Achala Fernando 1,2, Chamikara Liyanage 1,2, Afshin Moradi 1,2, Panchadsaram Janaththani 1,2 and Jyotsna Batra 1,2,\***


**Abstract:** Alternative splicing (AS) is tightly regulated to maintain genomic stability in humans. However, tumor growth, metastasis and therapy resistance benefit from aberrant RNA splicing. Iroquois-class homeodomain protein 4 (IRX4) is a TALE homeobox transcription factor which has been implicated in prostate cancer (PCa) as a tumor suppressor through genome-wide association studies (GWAS) and functional follow-up studies. In the current study, we characterized 12 IRX4 transcripts in PCa cell lines, including seven novel transcripts, by RT-PCR and sequencing. They demonstrate unique expression profiles between androgen-responsive and nonresponsive cell lines. These transcripts were significantly overexpressed in PCa cell lines and The Cancer Genome Atlas (TCGA) PCa clinical specimens, suggesting their probable involvement in PCa progression. Moreover, a PCa risk-associated SNP rs12653946 genotype GG was correlated with lower IRX4 transcript levels. Using mass spectrometry analysis, we identified two IRX4 protein isoforms (54.4 kDa, 57 kDa) comprising all the functional domains and two novel isoforms (40 kDa, 8.7 kDa) lacking functional domains. These IRX4 isoforms might induce distinct functional programming that could contribute to PCa hallmarks, thus providing novel insights into diagnostic, prognostic and therapeutic significance in PCa management.

**Keywords:** prostate cancer; IRX4; alternative splicing; transcript; isoforms

#### **1. Introduction**

Alternative splicing (AS) in precursor mRNA plays a vital role in the regulation of gene expression by expanding the coding capacity of genomes. Diverse combinations of splice sites and alternative promoters in pre-mRNA are chosen to produce structurally distinct mRNA and, thus, protein isoforms that range from slightly different to having the opposite functions [1]. Different mechanistic modes of AS have been identified such as exon skipping, retention of introns, alternative 5′/3′ splice sites, alternative promoters, alternative polyadenylation sites and alterations in spliceosomes [1]. Apart from the contribution to the greater diversity of the proteome, AS leads to every hallmark of cancer progression [2]. Recent study findings predict that the vast heterogeneity of human cancers may be a result of the distinct roles of protease isoforms resulting from AS [3]. AS has been reported to modify the network of protein interactions in a disruptive, non-regulated manner, thus disrupting normal cell function via mediating cancer driven pathways [2]. Additionally, AS may induce degradation of the tumor suppressor transcripts by nonsense-mediated decay or induce mutations on the splice-sites, thus having intron-retention in tumor suppressors [2]. AS can considerably alter the coding region of the drug targets of proteins, which leads to drug and therapy resistance in many cancers [4]. Although the activities of transcription factors are extremely and coordinately regulated during embryonic growth and differentiation, AS can alter the transcriptional regulation and may switch cells from physiological to pathological transformation [5].

**Citation:** Fernando, A.; Liyanage, C.; Moradi, A.; Janaththani, P.; Batra, J. Identification and Characterization of Alternatively Spliced Transcript Isoforms of *IRX4* in Prostate Cancer. *Genes* **2021**, *12*, 615. https://doi.org/10.3390/genes12050615

Academic Editor: Anelia Horvath

Received: 6 March 2021 Accepted: 13 April 2021 Published: 21 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In 2020, prostate cancer (PCa) was the second most commonly diagnosed cancer and the fifth leading cause of cancer-related death among men worldwide [6]. Alternative splicing plays a significant role in PCa, regulating malignant progression, aggressiveness, tumor cell lineage plasticity and therapy resistance [7,8]. Although androgen receptor (AR) splicing has been well studied in PCa, knowledge of AS in other PCa oncogenes is scarce [9]. Continued efforts to understand AS and translate this expertise to PCa may accelerate the discovery of novel diagnostic and therapeutic targets to improve PCa patient care.

Iroquois-class homeodomain protein IRX4, also known as Iroquois homeobox protein 4, encoded by the gene *IRX4*, is a member of the homeobox gene family [10]. Homeobox family genes mainly act as transcription factors and are found in almost all multicellular organisms [11]. They play a crucial role in the regulation of many aspects of embryonic development including pattern formation [11]. The *Homo sapiens* *IRX* genes consist of six members and are organized as two clusters containing three genes each: the *IRX1, 2* and *4* cluster on chromosome 5 and the *IRX3, 5* and *6* cluster on chromosome 16, which are separated by large intergenic regions [12]. *IRX4* is located at the chromosome *5p15.3* locus and has been described as the most divergent member of the IRX family [10,13]. The *IRX* homeobox gene family differs from other homeobox genes in having an extra three-amino-acid loop extension (TALE) in the homeodomain (63 amino acids) and a nine-amino-acid domain (the Iro box) [12]. The *IRX4* gene is expressed in different human organs including the developing central nervous system, breast, esophagus, skin, prostate and vagina, but is predominantly expressed in the cardiac ventricles [10]. IRX4 plays a crucial role in the regulation of the ventricular chamber-specific gene expression by triggering the ventricular myosin heavy chain-1 (VMHC1) gene and suppressing the atrial myosin heavy chain-1 (AMHC1) gene [14]. IRX4 supports the maintenance of cardiac contractile function and has a protective role in cardiomyopathy, cardiac hypertrophy and congenital heart disease [15,16]. In addition, *IRX4* expression was identified in the retina as a crucial regulator of *Slit1* expression [17]. Beyond its physiological role, the differential expression and oncogenic role of IRX4 have been suggested in recent studies. High levels of IRX4 in breast cancer plasma samples have been reported, which suggest the potential of IRX4 as a biomarker for breast cancer [18].
IRX4 expression was found to be downregulated in the mesenchymal cell population compared to epithelial cells in breast cancer [19]. Overexpression of IRX4 drives cell proliferation in non-small cell lung cancer (NSCLC) and is directly associated with the overall survival of patients [20]. The *IRX4* promoter region was found to be frequently hypermethylated in pancreatic cancer, which provides an advantage for cell growth in pancreatic cancer [21]. *IRX4* has been identified recently as a potential candidate gene in PCa after genome-wide association studies (GWAS) discovered the *5p15* locus to be associated with PCa risk. Many studies carried out by Batra et al. [22], Wang et al. [23], Lindstrom et al. [24] and Qi et al. [25] have identified the association of SNP rs12653946 at *5p15*, a cis-eQTL of *IRX4,* with PCa risk in multiethnic populations. IRX4 has been described as a tumor suppressor in PCa with the interaction of vitamin D receptor [26]. The differential roles of IRXs in a tumor microenvironment have been reported [27–29], possibly suggesting their tissue-specific role and/or their splicing, which affect the protein level, thereby modifying the functional capacity of normal cells to prompt and withstand multiple mechanisms related to tumor progression.

*IRX4* is a human multi-exon transcription factor gene, which has been reported as alternatively spliced and highly expressed in PCa [26]. To date, five transcripts of *IRX4* have been identified, along with four predicted alternative promoters [26,30]. *IRX4* has mainly two types of transcripts that differ in the presence of an additional fourth exon; these are further divided into five transcripts differing only in the 5′ UTR region, and the expression of these transcripts has been reported in PCa [26]. The five transcripts are predicted to translate into two IRX4 protein isoforms that differ only in the presence of an additional 27 amino acids encoded by the extra fourth exon (UniProt database), the functional role of which is still unknown. In this article, we present the diversity of *IRX4* transcripts and IRX4 protein expression in PCa cell lines and clinical samples, as IRX4 protein isoforms may have differential roles in PCa progression. This study enables a broad understanding of the regulation and mechanisms in prostate tumorigenesis, which is very limited at present.

#### **2. Materials and Methods**

#### *2.1. Cell Culture*

A panel of PCa cell lines (LNCaP, VCaP, DuCaP, C42B, PC3, DU145, RWPE2, 22RV1) and benign prostate (BPH1, RWPE1) cell lines were purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA). RWPE-1 and RWPE-2 cell lines were grown in keratinocyte serum-free medium supplemented with 5 ng/mL recombinant human epidermal growth factor (EGF) and 50 μg/mL bovine pituitary extract (Gibco™, Invitrogen, Carlsbad, CA, USA), whereas all other cell lines were grown in RPMI1640 (1X) with no phenol red (Life Technologies, Grand Island, NY, USA) supplemented with either 5% or 10% fetal bovine serum (FBS, Life Technologies, Thornton, NSW, Australia). The cell lines were authenticated by short tandem repeat (STR) profiling and tested negative for mycoplasma. The cells were maintained at 37 °C in a humidified incubator with 5% CO2.

#### *2.2. RNA Isolation and cDNA Synthesis*

Total RNA was extracted from PCa cells using the Isolate II RNA Mini Kit (Bioline, London, UK) according to standard protocol. RNA concentration and purity were measured using a NanoDrop™ 1000 (Thermo Scientific, BiolaB, Scoresby, VIC, Australia). A total of 1 μg of RNA was reverse transcribed to cDNA using the SensiFAST™ cDNA synthesis kit (Bioline, GmbH, Luckenwalde, Germany). The cDNA was diluted to 100 μL before being used as a template for PCR.

#### *2.3. Reverse Transcription-Polymerase Chain Reaction (RT-PCR)*

The primers for RT-PCR and qRT-PCR were designed using the NCBI Primer-BLAST tool. Several primer sets were designed to recognize the boundary between two exons (exon-exon spanning region) to specifically identify the expression of the *IRX4* transcripts. All the primer sequences are given in Table S1. RT-PCR was performed with a reaction comprising 1X PCR buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 0.2 μM each of forward and reverse primers (Sigma Aldrich, Castle Hill, NSW, Australia), 1 U Platinum™ Taq DNA Polymerase (Invitrogen, Carlsbad, CA, USA) and 1 μL of the cDNA template. The PCR was performed on a Mastercycler® nexus machine (Eppendorf, North Ryde, NSW, Australia). The samples mixed with loading dye (NEB #B7024S) were loaded onto 0.7–2% agarose gels (Bioline, Alexandria, NSW, Australia) prepared in Tris-borate-EDTA (TBE) buffer (89 mM Tris base, 89 mM Borate, 2 mM EDTA) containing 0.5 μg/mL ethidium bromide (Invitrogen, Carlsbad, CA, USA). Approximately 0.5 μg of 1 kb ladder (New England Biolabs, Ipswich, MA, USA) was loaded to compare the size of the DNA products. Images were captured by the gel documentation system QUANTUM ST5 (Fisher Biotec, Wembley, WA, Australia).

#### *2.4. Relative Quantification by Real-Time Quantitative RT-PCR (qRT-PCR)*

Quantitative RT-PCR was performed using the ViiA7 Real-Time PCR system (Applied Biosystems, Foster City, CA, USA). Each reaction contained 1X final concentration of SYBR Green PCR Master Mix 2X (Applied Biosystems, Foster City, CA, USA), 50 nM forward and reverse primer, 2.0 μL of diluted cDNA (1:5) and nuclease-free water at a final volume of 8 μL. The cycling parameters were 95 °C for 10 min, 40 cycles of 95 °C for 15 s and 60 °C for 1 min followed by a dissociation step. All the CT values were normalized to the expression of the housekeeping gene *RPL32* (ΔCT) [31]. Relative expression compared to control was calculated by the comparative CT (ΔΔCT) method.
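As an illustration of the comparative CT calculation described above, the following minimal Python sketch computes a fold change as 2^(−ΔΔCT). The function names and the CT values are hypothetical and are not data from this study; the sketch also assumes roughly 100% amplification efficiency for both the target and *RPL32*.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize a target CT value to the housekeeping gene CT (dCT)."""
    return ct_target - ct_reference


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change versus control by the comparative CT method: 2 ** (-ddCT)."""
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


# Hypothetical CT values, for illustration only (not measurements from this study).
fold = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,    # a cell line of interest
    ct_target_control=28.0, ct_ref_control=18.0,  # the control cell line
)
print(fold)  # dCT = 6 vs. 10, ddCT = -4, so the fold change is 16.0
```

A lower ΔCT in the sample than in the control gives a negative ΔΔCT and therefore a fold change above 1.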

#### *2.5. Androgen Deprivation Assay*

LNCaP, VCaP and DuCaP cells were seeded in RPMI1640 media (Life Technologies, Grand Island, NY, USA) supplemented with 5% FBS and incubated at 37 °C for 3 days. The medium was then replaced with an androgen-depleted culture medium (RPMI1640) containing 5% charcoal-stripped serum (CSS, Sigma-Aldrich, Castle Hill, Australia). After 48 h, the cells in CSS were supplemented with 10 nM dihydrotestosterone (DHT), 10 nM DHT + 10 μM anti-androgens (bicalutamide or enzalutamide) and ethanol (EtOH) control and incubated at 37 °C for 48 h.

#### *2.6. In Silico Analysis of IRX4 Transcripts and Isoforms*

The UCSC genome browser (https://genome.ucsc.edu/, accessed on 24 November 2019) [30], the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/, accessed on 18 October 2019) [32], the GTEx portal (https://gtexportal.org/home/, accessed on 10 January 2020), the Human Protein Atlas database (https://www.proteinatlas.org/, accessed on 25 January 2020) and the UniProt database (https://www.uniprot.org/, accessed on 12 July 2020) were used to obtain *in silico* data for the expression of *IRX4* transcripts and protein isoforms.

#### *2.7. RNA-seq and Genotype Data for cis-eQTL Analysis*

The RNA-seq data (BAM format) of 483 PCa patients and 49 matched controls of The Cancer Genome Atlas Program (TCGA) were used in the study [33]. RASflow was used for RNA-seq analysis [34]. HISAT2, a fast and sensitive alignment program, was selected for alignment to the transcriptome [35]; featureCounts was utilized for quantification, and DESeq2 was used for normalization of the data. Cleaned genotype data of PCa risk SNP rs10866528 were used for cis-eQTL analysis.
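To make the normalization step more concrete, the sketch below illustrates the median-of-ratios idea that DESeq2 is known for, written in plain Python. It is not the DESeq2 API (DESeq2 is an R package) and not the pipeline used in this study; the function names and the toy counts are assumptions for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps a sample name to a list of per-gene read counts
    (same gene order in every sample).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each gene across samples; genes with a zero
    # count in any sample are skipped (marked as None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo_means[g]
                      for g in range(n_genes)
                      if log_geo_means[g] is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy counts (hypothetical): sample_b is sequenced twice as deeply as sample_a.
toy = {"sample_a": [10, 20, 30], "sample_b": [20, 40, 60]}
normalized = normalize(toy)
```

In this toy case the two samples differ only in sequencing depth, so their normalized counts coincide gene by gene.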

#### *2.8. Reprocessing PRIDE PCa Cell Line LC-MS/MS Data*

LC-MS/MS raw data files of the LNCaP cell nuclear extractions were retrieved from the Proteomics Identifications (PRIDE) database, belonging to the project ID: PXD003262 [36]. All files were converted to MASCOT generic format (MGF) peak files using MSConvertGUI (Version 3) [37]. Next, all peak files were searched in SearchGUI (Version 3.3.17) and PeptideShaker (Version 1.16.43) against a FASTA database comprising novel IRX4 peptide sequences merged with the UniProt human reference database and contaminant proteins [38,39]. The X! Tandem search algorithm was used with the following search parameters: precursor mass error: 10 p.p.m.; fragment mass error: 0.05 Da; fixed modification: carbamidomethylation of cysteine; variable modification: oxidation of methionine. Minimal peptide length was set to six amino acids. A 1% false discovery rate (FDR) was set to identify specific tryptic peptides representing each IRX4 protein isoform.
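The precursor tolerance above is expressed in parts per million, that is, relative to the theoretical mass. A minimal sketch of such a check follows; the function names and the mass values are hypothetical, and this is not code from SearchGUI or X! Tandem.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million relative to the theoretical value."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=10.0):
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Hypothetical values: at a theoretical mass of 500.0, a deviation of 0.004
# is about 8 ppm (accepted) and a deviation of 0.006 is about 12 ppm (rejected).
accepted = within_tolerance(500.004, 500.0)
rejected = within_tolerance(500.006, 500.0)
```

Because the tolerance is relative, the absolute window widens with increasing precursor mass, unlike the fixed 0.05 Da fragment tolerance.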

#### *2.9. DNA Sequencing*

The PCR products were purified by Wizard® SV Gel and PCR Clean-Up System (Promega, Madison, USA) and sequenced by AGRF (Gehrmann Laboratories, Research Rd, University of Queensland, Brisbane, Australia). A total of 11 μL of the purified PCR products were sent with 1 μL primers (10 μM) in standard 1.5 mL Eppendorf tubes, and the results were obtained via the AGRF online website. For the identification of amplified DNA fragments, DNA sequences were aligned against the NCBI database [32] and UCSC genome browser [40].

#### *2.10. LC-MS/MS Analysis of PCa Cells*

Cell pellets were obtained and lysed using sodium deoxycholate (SDC) buffer (1% SDC in 1 M Tris pH 8.0). The samples were then sonicated in an ultrasonic bath (Thermo Scientific™, Waltham, MA, USA) for 15 min (at 4 °C, 100% power) to denature proteins and shear DNA. The concentration of proteins was calculated using a bicinchoninic acid (BCA) assay with Pierce™ Bovine Serum Albumin (BSA) Standards (Thermo Scientific™). A total of 10 μg of the protein extract was denatured at 95 °C for 5 min using a thermomixer (Eppendorf ThermoMixer® F1.5, Hamburg, Germany). Denatured protein samples were reduced by 10 mM Tris (2-carboxyethyl) phosphine (TCEP) (Sigma-Aldrich, Castle Hill, NSW, Australia), alkylated by 40 mM 2-chloroacetamide (2CAA) (Sigma-Aldrich) and incubated for 30 min in the dark at room temperature. Samples were then digested overnight at 37 °C by adding trypsin (Sigma-Aldrich) at a 1:50 enzyme-protein ratio. Peptides were desalted using Pierce™ C18 Spin Tips (Thermofisher, Waltham, MA, USA), washed in 0.1% TFA and eluted in 80% acetonitrile (ACN) (HPLC grade, Sigma-Aldrich) elution buffer. Solvents were evaporated in a SpeedVac centrifuge (Savant Speed Vac, SPD121P-230, Thermo Electron Corporation, Milford, MA, USA) at 35 °C, and peptides were re-suspended using iRT calibration mix including 2% ACN and 0.1% TFA (Biognosys AG, Schlieren, Switzerland). Samples were prepared in 3 biological replicates and analyzed by a sequential window acquisition of all theoretical mass spectra (SWATH-MS) approach, following our previously published protocol [41].

Data-dependent acquisitions were imported into ProteinPilot™ software (Version 5.0.1, AB SCIEX) and searched using the Paragon™ algorithm against the FASTA database consisting of novel IRX4 peptide sequences merged with the UniProt human reference database and contaminant proteins. The following search parameters were used: Sample type: Identification; Cys Alkylation: Iodoacetamide; Digestion: Trypsin; Instrument: TripleTOF 5600+; Species: None; Search effort: Thorough ID; Results Quality: Detected protein threshold [Unused ProtScore (Conf)] ≥ 0.05 with FDR. The generated ion library was imported into the PeakView® SWATH micro app (Version 2.1, AB SCIEX), saved in text format and cleaned using the iSwathX tool (Version 2.0). The curated ion library was imported into Skyline software (Version 1.1) [42]. The following peptide and transition settings were used: Enzyme: Trypsin [KR|P]; Max missed cleavages: 1; Min length: 6; Max length: 35; Precursor charges: 2+, 3+, 4+; Ion charges: 1+, 2+, 3+; Ion types: y/b; Ion match tolerance: 0.5. Data-independent acquisitions were imported, and peptide quantification was performed using the MSstats R-based statistical tool (Version 2.0) [43]. Only peptides specific to each IRX4 protein isoform were used for the quantification, and normalized peptide intensities were used to calculate the relative fold expression.
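The peptide settings listed above (cleavage after K or R unless followed by P, at most one missed cleavage, length 6 to 35) can be illustrated with a small in silico digestion. This is a sketch of the rule only, not Skyline's implementation; the function name and the toy sequence are hypothetical, and the sequence is not an IRX4 sequence.

```python
def tryptic_peptides(sequence, max_missed=1, min_len=6, max_len=35):
    """In silico tryptic digestion: cleave after K or R, but not before P."""
    # Collect cleavage positions, including both ends of the sequence.
    cut_points = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(sequence))

    peptides = []
    for start_idx in range(len(cut_points) - 1):
        # Allow 0..max_missed missed cleavage sites within one peptide.
        for missed in range(max_missed + 1):
            end_idx = start_idx + 1 + missed
            if end_idx >= len(cut_points):
                break
            peptide = sequence[cut_points[start_idx]:cut_points[end_idx]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Toy sequence (hypothetical): the R in "DRP" is followed by P, so it is not cleaved.
toy_sequence = "MASTEEK" + "LLNDRPGAVSTWK" + "AAAAAAR"
peptides = tryptic_peptides(toy_sequence)
print(peptides)
```

For the toy sequence this yields three fully cleaved peptides and two peptides with one missed cleavage; peptides shorter than six residues would be dropped by the length filter.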

#### *2.11. Statistical Analysis*

All statistical data were analyzed by GraphPad Prism 9.0.0 (121). The comparison was analyzed by paired t-test (two groups) and Kruskal–Wallis test with Dunn's multiple comparisons (more than two groups). The results were considered statistically significant if \* *p* < 0.05 at a 95% confidence interval. All the experiments were performed in 3 biological replicates.

#### **3. Results**


#### *3.1. In Silico Identification and Characterisation of Human IRX4 Transcripts*

The human *IRX4* gene at the *5p15.33* locus comprises six coding exons, seven introns and 3′ and 5′ UTR regions, and spans from 1,877,413 to 1,887,236 bp (hg38) [32,40]. Although not fully characterized, the IRX4 protein consists of a homeodomain or DNA binding domain, a transactivation domain and an Iro box (Figure 1a). Five *IRX4* transcripts have been reported to date in NCBI [32]. Transcripts 1 (NM\_001278632.1), 3 (NM\_001278634.2) and 5 (NM\_016358.3) differ from transcripts 2 (NM\_001278633.1) and 4 (NM\_001278635.2) in that the latter contain an additional fourth exon (78 bp). *IRX4* transcripts 1, 3 and 5 are similar in their coding regions but differ from each other in their 5′ UTR regions; likewise, transcripts 2 and 4 are similar in their coding regions but differ from each other in their 5′ UTR regions (Figure 1b) [30]. According to the alternative splicing graph of *IRX4* by the Swiss Institute of Bioinformatics (Figure 1c), there is clear evidence that *IRX4* is alternatively spliced and is likely to encode different transcripts [30].

**Figure 1.** (**a**) Schematic representation of IRX4 domains. The predicted localization of the homeodomain, putative transactivation domain and Iro box, and the number of amino acids (aa) in the different domains of the IRX4 proteins. The domain-encoding regions of the *IRX4* gene are shown with arrows; C: C-terminal region, N: N-terminal region. (**b**) The predicted *IRX4* transcripts according to gene databases. The five known *IRX4* transcripts, transcript 1 (NM\_001278632.1), transcript 2 (NM\_001278633.1), transcript 3 (NM\_001278634.2), transcript 4 (NM\_001278635.2) and transcript 5 (NM\_016358.3), the two predicted *IRX4* transcripts (PV1: HTR011738.5.69.3 and PV2: HTR011738.5.69.2) according to the Swiss Institute of Bioinformatics gene predictions, and the novel putative exon (exon 3a) are presented in alignment with the Reference Variant (RV). Introns are labelled from I1 to I7. (**c**) Alternative splicing graph detailing alternative splicing (AS) events of the *IRX4* gene by the Swiss Institute of Bioinformatics, adapted from the UCSC genome browser. Lines on the plot show the exon–exon junctions. The overlapping lines denote the two types of exon–exon junction, suggestive of the presence of transcripts of *IRX4.* (**d**) The characteristics of PCa cell lines used in the study.

Four alternative promoters were predicted to be localized upstream of the 5′ UTR of *IRX4* by the UCSC genome browser: alternative promoter 1 (chr5:1887187–1887336), alternative promoter 2 (chr5:1887169–1887318), alternative promoter 3 (chr5:1882876–1883025) and alternative promoter 4 (chr5:1881044–1881193), as shown in Figure 1b [30]. The predicted alternative promoters 1 and 2 and the presence of an EST (AI246240) in the 5′ UTR region facilitated the identification of novel 5′ UTR exons (exon 1a and 1b) (Figure 1b). Nguyen et al. suggested the presence of a diverse 5′ UTR region of *IRX4*, later confirmed by Northern blot analysis and RACE [26]. Further, a few additional transcripts of *IRX4* have been predicted by the Swiss Institute of Bioinformatics gene predictions based on mRNA and EST expression [30]. A novel transcript (PV1) (HTR011738.5.69.3) with a putative novel exon (exon 3a, 89 bp) localized between the second and third exons has been predicted (Figure 1b). The presence of this putative exon is also reported for *IRX4* in an EST dataset (BY799479) and the alternative splicing prediction plot of *IRX4* by the Swiss Institute of Bioinformatics (Figure 1c). The expression of this exon has been reported in skin with no sun exposure and in esophagus mucosa according to the GTEx portal data. Moreover, the presence of an alternative promoter region (alternative promoter 4) has been predicted in this region, which further supports a new starting point for this transcript (Figure 1b). Another short transcript (PV2) (HTR011738.5.69.2) starting from a novel open reading frame (ORF) in exon 5 of *IRX4* has also been reported (Figure 1b) [30]. The 5′ UTR region of this transcript consists of four exons, slightly differing from the other transcript exons by retention of part of intron 6 (I6) and a longer first UTR region. The 3′ UTR end of both predicted transcripts is similar to that of the other transcripts (Figure 1b). *IRX4* transcripts 1–4 may deploy alternative promoters 1 and 2; transcript 5 utilizes alternative promoter 3, while PV1 makes use of alternative promoter 4. The existence of alternative promoters reflects the possibility of multiple transcripts for the *IRX4* gene that differ in their size and are regulated separately at the transcriptional level.

Expression of most of the *IRX4* exons was observed across several tissue types such as skin, prostate, salivary glands, esophagus, vagina and heart, but the last exon (exon 6) was abundantly expressed in most of the tissues studied (GTEx portal data). IRX4 proteins were mainly localized in the nucleus and vesicles. Enhanced expression of IRX4 has been detected in some human cell lines, such as brain cancer (BEWO), skin immortalized (HaCaT), lung immortalized (HBEC3-KT) and breast cancer (MCF7) cell lines (the Human Protein Atlas database).

Even though IRX4 has been identified for its transcriptional role in the human heart and tumor suppressor role in PCa, the IRX4 proteins have so far not been fully characterized, structurally or functionally, with respect to their interactions and functional domains. Three domains have been predicted to date for the IRX4 protein: the homeodomain (DNA binding domain) (63 aa), the putative transactivation domain (18 aa) and the Iro box (9 aa) (Figure 1a). The homeodomain and the transactivation domain of IRX4 are encoded by part of the fifth exon of the transcript, and the Iro box region is encoded by part of the sixth exon [14]. The IRX4 protein isoform 1 (UniProt ID: P78413-1; length: 519 aa; mass: 54.4 kDa) is encoded by transcripts 1, 3 and 5 and differs by only 26 amino acids from IRX4 protein isoform 2 (UniProt ID: P78413-2; length: 545 aa; mass: 57.0 kDa), which is encoded by transcripts 2 and 4. The addition of 26 amino acids in isoform 2 may result in a structural change that may affect the function. A panel of PCa cell lines representing the benign and metastatic nature has been used in the study. The features of each PCa cell line are summarized in Figure 1d.
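The relationship between the additional fourth exon and the two protein isoforms can be checked with simple arithmetic. The short sketch below uses only the numbers quoted in this section (a 78 bp exon and isoform lengths of 519 and 545 aa); the variable names are illustrative.

```python
EXON4_BP = 78        # length of the additional fourth exon (Section 3.1)
ISOFORM1_AA = 519    # UniProt P78413-1, encoded by transcripts 1, 3 and 5
ISOFORM2_AA = 545    # UniProt P78413-2, encoded by transcripts 2 and 4

# A 78 bp exon is a whole number of codons, so including it keeps the reading frame.
codons, remainder = divmod(EXON4_BP, 3)
in_frame = remainder == 0

# The isoform length difference should match the number of codons in the exon.
extra_aa = ISOFORM2_AA - ISOFORM1_AA

print(codons, in_frame, extra_aa)  # 26 True 26
```

Both routes give 26 residues, which is consistent with the fourth exon being inserted in frame.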

#### *3.2. Identification of Human IRX4 Transcripts in PCa Cell Lines*

Firstly, we sought to identify the expression of the five known *IRX4* transcripts in a panel of PCa cell lines. Two main bands (369 bp and 291 bp) were detected on the agarose gel in RT-PCR with primer set 1, designed to amplify the region spanning exons 2 to 5, confirming the presence of two transcripts, where one transcript contains a fourth exon and the other does not (Figure 2a,f). The sequencing analysis of the resulting bands confirmed the presence of at least two *IRX4* transcripts, with and without exon 4, in PCa cell lines. The upper band may correspond to either of transcripts 2 and 4 (Figure 2a). The lower band may correspond to any of transcripts 1, 3 and 5.

**Figure 2.** *IRX4* transcript expression in a panel of PCa cell lines. (**a**) Expression of *IRX4* transcripts 1–5 (RT-PCR) in a panel of cell lines representing benign prostate (BPH1, RWPE1), PCa (RWPE2, DuCaP, VCaP, LNCaP, C42B, 22RV1, PC3, DU145) and an immortalized prostate stromal cell line (WPMY-1). Lane 1: 1 kb ladder; Lanes 2–12: cell lines; Lane 13: non-template control (NTC). *RPL32* was used as the endogenous housekeeping control. (**b**) Identification of predicted *IRX4* transcripts 6 and 7 in LNCaP and C42B cells (451 bp and 529 bp). (**c**) Identification of predicted *IRX4* transcript 8 (176 bp) in C42B and LNCaP cells. (**d**) Identification of predicted *IRX4* transcripts 10 and 11 in DuCaP and C42B cells (386 bp and 464 bp). (**e**) Identification of predicted *IRX4* transcript 12 in BPH1, RWPE1 and RWPE2 cells (396 bp). (**f**) The sequencing BLAT results of gel bands (a) to (h) from panels (a) to (e), obtained from the UCSC genome browser; the localization of primers is shown with arrows. The sequencing BLAT results for each transcript are presented in alignment with the Reference Variant (RV). (**g**) The quantitative expression of *IRX4* transcripts 1–5 across a panel of prostate cell lines (BPH1, RWPE1, RWPE2, DuCaP, VCaP, LNCaP, C42B, 22RV1, PC3, DU145). *RPL32* was used as the endogenous housekeeping control. The relative fold expression was determined using the ΔΔCT method with respect to BPH1 (N = 3 biological and technical replicates, Mean ± SD, Kruskal–Wallis test with Dunn's multiple comparisons, \* *p* < 0.05).

Next, we investigated the expression of the predicted novel transcript 1 (PV1) (HTR011738.5.69.3, transcript 6) in C42B and LNCaP cells. The presence of this transcript was identified by RT-PCR after designing a specific primer set that targets the novel exon (exon 3a) and the fifth exon (primer set 2). Unexpectedly, two bands larger (about 451 bp and 529 bp) than the expected size (281 bp) were detected (Figure 2b). The Sanger sequencing analysis of the purified PCR product identified intron retention (I4) between the novel exon (exon 3a) and the third exon. The Sanger sequencing analysis of the purified product of the upper band confirmed the existence of the additional fourth exon, suggestive of an AS event of transcript 6 (Figure 2b,f). For subsequent identification, the transcript that corresponds to the upper band was termed transcript 7 (Figure 3).

**Figure 3.** Summary of all identified *IRX4* predicted transcripts in PCa cells. All the exons and predicted UTRs are shown with red and black boxes, respectively, while introns are shown with black lines. The pink boxes indicate the location of predicted alternative promoters. Introns are labelled from I1 to I7. Each *IRX4* transcript is presented with the alignment with the Reference Variant (RV).

Next, we were interested to know whether the novel exon (exon 3a) is the first exon of transcripts 6 and 7. We designed a specific primer set complementary to the second exon and the novel intron region (I4) (primer set 3) and checked the expression in C42B and LNCaP cells. Interestingly, PCR produced a band of around 176 bp (Figure 2c), below the expected size (265 bp). Sequencing analysis confirmed a new *IRX4* transcript (which we designated transcript 8) corresponding to this band (Figure 2c,f). This transcript lacks the novel exon 3a but retains the intron 4 (I4) region; intron 4 and the third exon appear as a single exon, which we called exon 3b. We found that this transcript undergoes an AS event involving the fourth exon, giving rise to a further transcript with the additional fourth exon, which we termed transcript 9 (Figure 3).

Interestingly, two new transcripts were identified with primers in the second and sixth exons in DuCaP and C42B cells. Two unexpected lower bands (386 bp and 464 bp) were detected with primer set 4, in addition to the two expected bands (544 bp and 622 bp, Figure 2d). The sequencing results confirmed that the lower bands lack part of exon 5 (exon 5a) of *IRX4* (Figure 2d,f). The two lower bands differ from each other by the additional fourth exon. The transcript that lacks part of exon 5 and the fourth exon is termed transcript 10, and the transcript that lacks part of exon 5 but retains the fourth exon is termed transcript 11 (Figure 3).
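The band sizes reported in this section can be cross-checked against the 78 bp length of the fourth exon given in Section 3.1. The sketch below uses only the band sizes quoted in the text; the dictionary labels are descriptive assumptions, and the size of the missing exon 5 segment is simply the value implied by the band sizes rather than a separately reported measurement.

```python
EXON4_BP = 78  # length of the additional fourth exon (Section 3.1)

# (lower band, upper band) in bp, as reported for each primer set.
band_pairs = {
    "primer set 1 (transcripts 1-5)": (291, 369),
    "primer set 2 (transcripts 6 and 7)": (451, 529),
    "primer set 4, lower bands (transcripts 10 and 11)": (386, 464),
    "primer set 4, expected bands": (544, 622),
}

# Within each pair, the two bands should differ by exactly the fourth exon.
differences = {label: upper - lower for label, (lower, upper) in band_pairs.items()}
all_match_exon4 = all(diff == EXON4_BP for diff in differences.values())

# Size of the exon 5 segment missing from transcripts 10 and 11,
# as implied by comparing the expected and unexpected bands of primer set 4.
missing_exon5_bp = 544 - 386

print(all_match_exon4, missing_exon5_bp)  # True 158
```

Every band pair differs by 78 bp, in agreement with inclusion or exclusion of the fourth exon, and both comparisons for primer set 4 (544 vs. 386 bp and 622 vs. 464 bp) imply the same 158 bp deletion.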

We were then able to confirm the presence of the second transcript (PV2) predicted by the Swiss Institute of Bioinformatics. The transcript has an extended fifth exon (exon 5b) in its 5′ UTR region; the forward primer was designed specifically for this region to exclude the expression of other transcripts (primer set 5). Expression of this transcript was detected at 396 bp, as expected, in a few PCa cell lines (Figure 2e). This short transcript, which we denoted as transcript 12, was predicted to have a coding region starting from the fifth exon, and all the preceding exons have been predicted as 5′ UTR regions (Figure 3).

The relative expression levels of *IRX4* transcripts 1–5 were measured by qRT-PCR with three different primer sets, corresponding to all transcripts (primer set-12), transcripts 1, 3 and 5 (primer set-13) and transcripts 2 and 4 (primer set-14) (Figure 2g). The castration-resistant PCa cell line, C42B, had the highest expression of *IRX4* (more than 20-fold) compared to the expression in BPH1, according to the primers capturing all *IRX4* transcripts (Figure 2g). The RWPE-1 and RWPE-2 prostate epithelial cell lines showed significantly higher expression of all *IRX4* transcripts compared to BPH1. However, a poor correlation between the intensity of gel bands (Figure 2a) and qPCR graphs (Figure 2g) was detected due to the different efficiencies of the primers used in the PCRs. Compared to the androgen-responsive cell lines LNCaP, DuCaP and VCaP, the androgen-nonresponsive PCa cell lines PC3, 22RV1 and DU145 had minimal expression of *IRX4*. Interestingly, overall *IRX4* expression was significantly higher in the castration-resistant C42B cell line (a derivative of LNCaP) compared to LNCaP, indicating a role of IRX4 in castration-resistant PCa progression.

The lower band that corresponds to transcripts 1, 3 and 5 was also prominently detected in C42B cells (more than 25-fold) compared to BPH1, but there was comparatively low expression in 22RV1 and PC3 cell lines (Figure 2g), consistent with the RT-PCR results. *IRX4* transcripts 2 and 4 were prominently detected in RWPE1 and RWPE2, but were weakly detected in 22RV1, DU145 and PC3 cells, similar to the RT-PCR upper band of the gel (Figure 2g). Overall, the expression of *IRX4* transcripts showed a high disparity between PCa cell lines in relation to their androgen responsiveness, which may be suggestive of differential regulation of *IRX4* transcripts by androgen in PCa cells. Our results indicate that human *IRX4* is extensively alternatively spliced, and many *IRX4* transcripts could be identified in most of the PCa cell lines. The *IRX4* transcripts have still not been completely annotated in gene databases. All the transcripts identified for *IRX4* in PCa cells are summarized in Figure 3. The sequence alignment of the altered regions of transcripts 6, 8, 10 and 12 is shown in the Supplementary Data (Figure S1).

#### *3.3. Characterisation of Human Novel IRX4 Transcripts in a Panel of PCa Cell Lines*

Differential expression of *IRX4* transcripts 1–5 was detected in a panel of PCa cell lines (Figure 2a). The RT-PCR primers (grey) were designed to target one transcript at a time, excluding all the other transcripts (Figure 4a). However, transcripts 6 and 7 (primer set 15) and transcripts 8 and 9 (primer set 16) could not be distinguished from each other with specific qRT-PCR primers (green), whereas the expression of transcripts 10, 11 and 12 was measured individually with specific primer sets (primer sets 17, 18 and 19) to obtain a relative quantification (Figure 4b). The sequences of all the bands obtained were confirmed by Sanger sequencing.

The highest expression of transcripts 6 and 7 was detected in DuCaP cells (~15-fold compared to BPH1), consistent with the RT-PCR results (Figure 4b). C42B, RWPE1 and RWPE2 cells also showed significantly higher expression of transcripts 6 and 7 than BPH1 (Figure 4). The highest expression of transcripts 8, 9, 10, 11 and 12 was detected in C42B cells, and it reached statistical significance compared to BPH1, in line with the intensity of the gel bands (Figure 4b), which is suggestive of a role of IRX4 in the metastatic progression of PCa. The identified novel transcripts showed minimal or no expression in 22RV1, PC3 and DU145 cell lines (Figure 4b), except for transcript 12, which showed comparatively higher expression in DU145 cells. The PC3 cell line had minimal expression of all identified novel transcripts. The results showed a high disparity in the expression of novel *IRX4* transcripts between PCa cell lines. They suggest that the identified novel transcripts contribute significantly to the overall overexpression of *IRX4* in PCa and may have a role in PCa progression.

**Figure 4.** *IRX4* novel transcripts 6–12. (**a**) Qualitative and (**b**) quantitative expression across a panel of PCa cell lines. Expression analysis of each novel *IRX4* transcript with RT-PCR using transcript-specific primer sets (shown in grey arrows) and qRT-PCR using specific primer sets (shown in green arrows) in a panel of cell lines (BPH1, RWPE1, RWPE2, DuCaP, LNCaP, C42B, 22RV1, PC3, DU145). Lane 1: 1 kb ladder; lanes 2–10: cell lines; lane 11: non-template control (NTC). *RPL32* was used as the endogenous control. The relative fold expression was determined using the ΔΔCT method with respect to BPH1 (N = 3 biological and technical replicates, Mean ± SD, Kruskal–Wallis test with Dunn's multiple comparisons, \* *p* < 0.05).
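The relative fold expression reported with the ΔΔCT method can be illustrated with a minimal sketch. The function name and all Ct values below are hypothetical and are not taken from the study; the sketch also assumes roughly 100% primer efficiency, which is the usual simplification behind the 2^(−ΔΔCT) formula.

```python
from statistics import mean


def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative fold expression by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values. The reference gene plays
    the role of the endogenous control (RPL32 in the text) and the calibrator
    plays the role of the baseline cell line (BPH1 in the text).
    """
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
sample_target = [24.0, 24.0, 24.0]      # target transcript in a test cell line
sample_reference = [18.0, 18.0, 18.0]   # endogenous control in the same line
calib_target = [28.0, 28.0, 28.0]       # target transcript in the calibrator
calib_reference = [18.0, 18.0, 18.0]    # endogenous control in the calibrator

fold = fold_change_ddct(sample_target, sample_reference,
                        calib_target, calib_reference)
print(fold)  # 16.0 for these illustrative values
```

A target that crosses the threshold four cycles earlier than in the calibrator, after normalization to the control gene, therefore corresponds to a 16-fold higher relative expression.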

#### *3.4. Androgen Regulation of IRX4 Transcripts in PCa Cell Lines*

The comparatively higher expression of *IRX4* transcripts in androgen-responsive cell lines and their low expression in androgen-nonresponsive cell lines may indicate that the expression of *IRX4* transcripts is regulated by androgens and may have a role in therapy resistance. Thus, the expression of each *IRX4* transcript was determined in androgen-responsive cell lines (LNCaP, VCaP and DuCaP) treated with androgen (DHT) and antiandrogens (bicalutamide and enzalutamide). *KLK3* (PSA) expression was used as a positive control to validate both treatments. *KLK3* was overexpressed with DHT treatment in all three cell lines (VCaP ~500-fold, DuCaP >10-fold and LNCaP ~40-fold) and downregulated with antiandrogen treatment (Figure 5) compared to the ethanol (EtOH) control. Almost all *IRX4* transcripts were upregulated with DHT treatment in VCaP and DuCaP cells compared to the EtOH control, and the upregulation of transcripts 10 and 11 was the most prominent. With DHT, transcript 10 was overexpressed more than 300-fold and transcript 11 more than 200-fold compared to EtOH in VCaP cells, and by 15-fold and 12-fold, respectively, in DuCaP cells, consistent with the treatment effect. Interestingly, this expression pattern was not observed in LNCaP cells, even with 40-fold *KLK3* overexpression (higher than in DuCaP) (Figure 5). Bicalutamide and enzalutamide treatment effectively reduced the androgen-mediated expression of all identified *IRX4* transcripts in VCaP and DuCaP cells compared to DHT, except for transcripts 6 and 7 in DuCaP. This suggests that antiandrogen treatment does not effectively reduce the expression of transcripts 6 and 7, which show their highest expression in DuCaP cells. Although the expression of *KLK3* was reduced in LNCaP cells with antiandrogen treatment, no significant upregulation or downregulation of *IRX4* transcripts was observed in LNCaP cells (Figure 5).

**Figure 5.** Regulation of *IRX4* transcripts expression by androgen and anti-androgen treatment in PCa cells. VCaP, DuCaP and LNCaP PCa cell lines were treated with 10 nM of androgens (DHT) or DHT + 10 μM bicalutamide (BIC) or DHT + 10 μM enzalutamide (ENZ) or EtOH/control. Relative fold expression of *IRX4* transcripts compared to EtOH/control expression was measured using the ΔΔCT method using *RPL32* as the endogenous control. (N = 3 biological and technical replicates, Kruskal–Wallis test with Dunn's multiple comparisons, Mean ± SD, \* *p* < 0.05).

#### *3.5. Identification and Characterisation of IRX4 Protein Isoforms by Mass Spectrometry*

The ORFs of all known and novel *IRX4* transcripts presented here were examined with the ExPASy translate tool for their potential to encode novel protein isoforms of IRX4. The two protein isoforms, IRX4 isoform 1 and 2, encoded by transcripts 1 to 5, have been reported in the UniProt database (P78413-1 and P78413-2), and their expression has been validated in the human heart. The ORF ranges from the first exon to exon 6 in *IRX4* transcripts 1 to 5 and encodes full-length proteins (54.4 kDa and 57 kDa). For most of the newly identified transcripts, the coding frame shifts relative to the main transcripts, and novel stop codons are incorporated (Table 1). Therefore, the new transcripts produce truncated proteins. There is more than one coding frame for a few transcripts, such as transcripts 6, 7, 8 and 9. The proteins generated from those novel transcripts differ from the two main protein isoforms and may lack domains essential for their functional activity. Splice deletion of exons 1 and 2 and insertion of intron 4 (I4) produces transcripts 6 and 7, which can therefore produce truncated proteins with two probable coding frames (ORFs), exon 3a–exon 5 and exon 5–exon 6 (Table 1). Moreover, transcripts 8 and 9 have intron retention between exons 3 and 3a, excluding exon 3a, which changes the whole coding frame to encode a short protein isoform (11 kDa) due to the insertion of a new stop codon in their intron region (I4). The predicted second frame for these two transcripts is similar to the exon 5–exon 6 frame. Among the 12 *IRX4* SVs, two transcripts had a deletion of exon 5 (transcripts 10 and 11), which introduces a new stop codon at exon 6; they are predicted to encode two protein isoforms (21 kDa and 23.6 kDa), respectively. Transcript 12 has a coding frame from exon 5 to exon 6 that is predicted to encode a novel protein isoform (40 kDa). Transcripts 6, 7, 8, 9 and 12 also have the potential to encode this protein isoform.
We then tried to identify the exact coding frame for each *IRX4* transcript by examining protein expression. The details of the ExPASy tool analysis of the probable protein coding frames of the identified *IRX4* transcripts, and of the presence of their predicted domains, are given in Table 1.
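The idea of scanning a transcript for candidate coding frames can be sketched as follows. This is a simplified illustration and not the ExPASy translate tool itself: it covers only the three forward frames with the standard genetic code, and the function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence codon by codon ('*' marks a stop)."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with unknown bases
        for i in range(0, len(seq) - 2, 3)
    )


def longest_orfs(seq):
    """Longest Met-to-stop ORF in each of the three forward reading frames."""
    result = {}
    for frame in range(3):
        protein = translate(seq[frame:])
        best = ""
        # Drop the last piece: it is not terminated by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
        result[frame + 1] = best
    return result


# Toy sequence (not an IRX4 sequence): frame 1 reads ATG GCC AAG TGA.
toy_transcript = "ATGGCCAAGTGACC"
print(longest_orfs(toy_transcript))
```

Comparing the per-frame output for a full-length transcript and for a variant with a skipped exon or retained intron shows whether the variant keeps the original frame or runs into a new stop codon.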


**Table 1.** ExPASy tool analysis for predicted proteins from identified transcripts.

✓ = Presence of predicted domain.

Due to the lack of IRX4 protein isoform-specific antibodies, we analyzed the predicted protein coding frames of all identified *IRX4* transcripts against available PCa cell line data from the Proteomics Identifications database (PRIDE) using the PeptideShaker and SearchGUI software. We could detect the MS/MS fragmentation ion spectra of peptides that correspond to the coding frames of four IRX4 protein isoforms in LNCaP cells (Figure S2). They are the following: the first isoform (54.4 kDa) encoded by transcripts 1, 3 and 5, the second isoform (57 kDa) encoded by transcripts 2 and 4, the third isoform (40 kDa) encoded by transcripts 12, 8 or 9 and the fourth isoform (8.7 kDa) encoded by transcripts 6 and 7. A peptide sequence specific to isoform 2 and a peptide sequence specific to isoform 4 were each detected in the mass spectrometry data (Figure S2). We also detected the common peptides related to isoforms 1 and 3 but cannot exactly differentiate these isoforms, since isoform 3 shares its sequence with isoform 1. Unfortunately, we did not detect peptides generated from the coding frames of transcripts 10 (21 kDa) and 11 (23.6 kDa) or from the short coding frame of transcripts 8 and 9 (11 kDa). Thus, the fate of these short proteins is still unknown; their absence may be suggestive of nonsense-mediated decay of these transcripts, or their expression might be temporal. The identified peptides in the IRX4 protein isoforms and the MS/MS fragmentation ion spectrum files of the peptides are presented in Figure S2.

Following the identification of the IRX4 protein isoforms in the published PRIDE data, we quantified the isoform-specific peptide expression in PCa cell lines and BPH1 using the SWATH-MS/MS approach. The specific peptides used to determine isoform-specific expression in PCa cell lines are shown in Figure 6. Protein isoform 3 was excluded from the analysis since it does not contain a specific peptide that differentiates it from other isoforms. According to the analysis, IRX4 protein isoform 1 showed significantly higher expression in the VCaP cell line (~10-fold) and the LNCaP cell line (~6-fold) compared to BPH1 (Figure 6a). Although not statistically significant, we observed a moderately higher expression of protein isoform 2 in LNCaP and DuCaP cell lines (Figure 6b). IRX4 protein isoform 4 showed significantly higher expression in the LNCaP cell line (~2-fold) compared to BPH1 (Figure 6c). In summary, IRX4 protein isoforms showed marked expression in androgen-responsive cell lines (LNCaP, DuCaP and VCaP) compared to androgen-nonresponsive cell lines (22RV1 and DU145). However, all three IRX4 protein isoforms were highly expressed in the castration-resistant C42B cell line compared to the other androgen-nonresponsive cell lines used for the analysis.
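Why an isoform can lack a specific peptide is easier to see with a small in silico digest. The sketch below is an assumption-laden illustration, not the SearchGUI or PeptideShaker workflow: it applies a simple trypsin rule (cleave after K or R, but not before P, with no missed cleavages), and the three toy protein sequences and their names are invented for the example and are not IRX4 sequences.

```python
import re


def tryptic_peptides(protein, min_len=6):
    """In silico tryptic digest: cleave after K/R unless followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def isoform_specific_peptides(isoforms, min_len=6):
    """Map each isoform name to the peptides found in no other isoform."""
    digests = {name: tryptic_peptides(seq, min_len)
               for name, seq in isoforms.items()}
    specific = {}
    for name, peptides in digests.items():
        others = set().union(
            *(p for other, p in digests.items() if other != name)
        )
        specific[name] = peptides - others
    return specific


# Hypothetical toy sequences: two full-length forms that differ by an
# internal segment, and an N-terminally truncated form.
shared_core = "AAPEELRQWLTNDHVK"
toy_isoforms = {
    "full_length_a": "MSYPQFGK" + "LDTSEAR" + shared_core,
    "full_length_b": "MSYPQFGK" + "GGSVTPAR" + shared_core,
    "truncated": shared_core,
}

for name, peptides in isoform_specific_peptides(toy_isoforms).items():
    print(name, sorted(peptides))
```

In this toy case the truncated form yields no unique peptide, because every peptide it produces is also produced by the longer forms. That is the same kind of situation that led to protein isoform 3 being excluded from quantification.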

#### *3.6. Validation of Expression of IRX4 Transcripts in PCa Patients*

Since the expression of individual *IRX4* transcripts has not been determined in patients with PCa earlier, the expression of the identified *IRX4* transcripts was analysed in 49 PCa patient sample tumors (*N* = 49) and their matched non-malignant tissues (*N* = 49) from the TCGA database. We observed significant overexpression of *IRX4* transcripts 3, 5 (protein isoform 1) and 6 (protein isoform 4) in PCa tissues compared with their non-malignant tissues (Figure 7). However, the expression of *IRX4* transcripts 2 and 4 (protein isoform 2) in PCa tumor samples did not reach statistical significance compared to their normal tissues. The number of samples that express *IRX4* transcript 1 was quite low in the TCGA database and was thus excluded from the study. Since the identified novel transcripts are still not annotated with precise IDs, we could not extract the data related to the expression levels from the RNAseq data of the TCGA patient samples.

#### *3.7. Association between PCa Risk SNP rs12653946 and Expression Levels of IRX4 Transcripts*

Numerous PCa-associated SNPs have been identified by GWAS, and the regulatory effect of these SNPs on the expression of nearby genes is investigated by expression quantitative trait loci (eQTL) methods [44]. SNPs that can influence the expression of nearby genes are called cis-eQTLs. Xu *et al*. identified fifty-nine SNPs from 39 distinct PCa risk loci; among them, the rs12653946 SNP was found to be a cis-eQTL that has the strongest association with *IRX4* [44]. The risk-associated genotype (GG) was associated with lower IRX4 levels in PCa [44]. In view of this, we were interested to see whether the expression pattern of the *IRX4* transcripts varies with PCa patients' genotypes. rs10866528 is a tag SNP for the PCa-risk SNP rs12653946, with which it is in linkage disequilibrium (LD), and it was later found that this SNP itself can act as a PCa-risk SNP [45]. We extracted the *IRX4* transcript expression data of 483 patients from the TCGA database and correlated them with the patients' SNP rs10866528 genotype. According to the analyzed expression of *IRX4* transcripts 2, 4, 5 and 6, we found that the GG PCa risk-associated genotype is associated with lower levels of all the analyzed *IRX4* transcripts (Figure 8). Although we did not see a significantly lower expression of *IRX4* transcript 3 with the GG genotype, the number of samples that showed expression of transcript 3 was quite low.

**Figure 6.** Identification and quantification of IRX4 isoform-specific peptides across a panel of prostate cell lines. Figures demonstrate the extracted ion chromatogram (XIC), MS/MS spectrum, charge and mass/charge ratio (m/z) of (**a**) IRX4 protein isoform 1, (**b**) IRX4 protein isoform 2 and (**c**) IRX4 protein isoform 4-specific peptides. The expression of each isoform-specific peptide was measured in BPH1, DuCaP, VCaP, LNCaP, C42B, 22RV1 and DU145 cell lines. The relative fold expression was measured compared to the BPH1 cell line. (N = 3 biological replicates, Mean ± SD, Kruskal–Wallis test with Dunn's multiple comparisons, \* *p* < 0.05).

**Figure 7.** The expression of *IRX4* transcripts 2, 3, 4, 5 and 6 in PCa patient tumors and their matched normal tissue samples according to the TCGA database. A single patient is shown by a blue dot (the normal tissue) and a red dot (prostate tumor tissue), and the paired values are joined by a line. The number of patients in the normal and tumor categories is given under each graph. (paired *t*-test, \* *p* < 0.05).

**Figure 8.** The expression of *IRX4* transcripts (mapped on DESeq2-normalized counts) by genotypes (AA, AG, GG) related to risk SNP rs10866528 in PCa. The number of patients in each genotype is mentioned under each genotype of the graph. (Mean ± SEM, Kruskal–Wallis test with Dunn's multiple comparisons, \* *p* < 0.05).
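The genotype comparison in Figure 8 rests on a Kruskal–Wallis test across three groups. A minimal pure-Python sketch of that test is given below under stated assumptions: the function names and the expression values are hypothetical, the p-value uses the chi-square approximation (which for exactly three groups has two degrees of freedom and reduces to exp(−H/2)), and Dunn's post hoc comparisons are not implemented.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Tie-corrected H statistic and approximate p-value for three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    pooled = [v for values in groups.values() for v in values]
    n = len(pooled)
    ranks = average_ranks(pooled)

    rank_term, position = 0.0, 0
    for values in groups.values():
        rank_sum = sum(ranks[position:position + len(values)])
        rank_term += rank_sum ** 2 / len(values)
        position += len(values)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction

    p_value = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p_value


# Hypothetical normalized counts per genotype, for illustration only.
toy_expression = {
    "AA": [9.0, 10.0, 11.0, 12.0],
    "AG": [5.0, 6.0, 7.0, 8.0],
    "GG": [1.0, 2.0, 3.0, 4.0],
}
h_stat, p = kruskal_wallis_three_groups(toy_expression)
print(round(h_stat, 3), p < 0.05)
```

With small groups the chi-square approximation is rough, so a dedicated statistics package with exact or permutation-based p-values would be preferable for real data.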

#### **4. Discussion**

Recent research has shown that protein isoforms are sophisticated expression-based biomarkers in cancer prediction and progression [46–51]. Systematic sequencing of the human genome and transcriptome has revealed that more than 90% of genes express multiple mRNAs via AS events, suggesting a major impact on the functional diversity of proteins [52]. However, cancer cells frequently exhibit abnormalities in RNA splicing to survive, grow and progress to therapeutic resistance [53]. Many studies have highlighted that alternative RNA splicing is a common intrinsic mechanism leading to therapy and drug resistance in PCa [8,9,54]. For example, constitutively active AR transcript 7 (*AR-V7*) in prostate tumor cells confers a primary or an acquired resistance to androgen deprivation therapy [55]. In addition to *AR*, several other genes undergoing AS, such as *FGFR, VEGF, Bcl-x, SH3GLB1* and *CCND1*, were found to be associated with PCa development and progression [8]. Homeodomain transcription factors have been shown to be frequently alternatively spliced [56]. For example, the homeodomain gene *HNF1B*, which encodes three protein isoforms, A, B and C, has been shown to have different functions depending on the transcript: the HNF1B A and B protein isoforms act as transcriptional activators, while the HNF1B C protein isoform, which lacks the transactivation domain, functions as a transcriptional repressor [56–58].

Nguyen et al. have identified four novel transcripts of *IRX4* in PCa cells [26]. Although the sequences of exons 1 to 6 were highly conserved in these four *IRX4* transcripts, their upstream exons, which encode the 5′ UTRs, are diverse in sequence and length [26]. We now reveal considerably greater diversity in human *IRX4* transcripts than previously realized. In the present study, we identified 12 *IRX4* transcripts by RT-PCR analysis, including seven novel transcripts in PCa cells. This diversity of the *IRX4* gene mainly arises from alternative promoter usage, intron retention, exon skipping and alternative 3′ and 5′ splice site usage. The 5′ UTRs of mRNAs play important roles in the post-transcriptional regulation of gene expression [59]. Nguyen et al. have identified additional exons of the *IRX4* 5′ UTR region by Northern blot analysis of *IRX4* transcripts [26]. The UTR regions of the identified novel *IRX4* transcripts are yet to be characterized. This complex isoform diversity of the *IRX4* gene may not be confined to PCa but may extend to other cancers with significant expression of IRX4; thus, its pathological and/or physiological impact is worth exploring.

Androgens and AR play a critical role in PCa pathogenesis. Androgen deprivation therapy (ADT) has been the mainstay of management for advanced PCa. Despite the initial strong responses to androgen deprivation therapy, the majority of patients with advanced PCa relapse with fatal castration-resistant PCa (CRPC) [60]. According to our data, *IRX4* and its isoforms are differentially expressed depending on the cell lines' androgen responsiveness. Castration-resistant cell line C42B showed the highest expression of most of the *IRX4* transcripts at the mRNA level. However, RNA expression does not correlate with proteomic data and complicates the precise understanding of the castration resistance and its relationship with IRX4. *IRX4* transcriptomic datasets in PCa showed a poor correlation with the proteomic data in line with recent research insights in PCa [41,61,62] due to varied discordant regulation between the transcriptional and translational regulation, post-transcriptional modifications associated with translation regulation, lack of temporal synchronization between transcription and translation and kinetic changes between protein generation and turnover in complex biological samples [63]. However, in consonance with *IRX4* RNA expression, the expression level of IRX4 protein isoforms was more prominent in the androgen-responsive cell lines than androgen-nonresponsive cell lines. Overall, the results suggest that IRX4 protein isoforms are androgen regulated. The simultaneous overexpression of *IRX4* transcripts together with *KLK3* with androgen may indicate a role of IRX4 in the early stages of PCa progression, and the antiandrogen treatment with bicalutamide and enzalutamide is not completely effective in eliminating the overexpression of *IRX4* transcripts in PCa cells. 
The effect of androgens and antiandrogens on the regulation of *IRX4* transcripts is lower in LNCaP cells than in VCaP and DuCaP cells, which is suggestive of differential androgen regulation of *IRX4* in LNCaP cells and needs to be further elucidated. Thus, IRX4 may be a useful therapeutic target in combination with anti-androgen therapy for the treatment of PCa.

A lack of response of *IRX4* transcripts 6 and 7 to antiandrogen treatment was seen in DuCaP cells. IRX4 protein isoform 4 is encoded by transcripts 6 and 7, and these two transcripts use an alternative promoter, distinct from that of the other transcripts, which is located immediately upstream of exon 3a. This may critically alter the androgen regulation of IRX4 protein isoform 4 in PCa cells and could lead to resistance to antiandrogen therapy in PCa patients. Although the function of IRX4 is tightly regulated at both transcriptional and post-transcriptional levels in human heart ventricles [64], the transcriptional regulation of IRX4 is still not completely clear in PCa. Nguyen et al. have explored the tumor suppressor role of IRX4 through its interaction with vitamin D receptors in PCa, but the isoform-specific roles are unknown [26]. Since IRX4 protein isoform 4 lacks both the N-terminus and C-terminus as well as the predicted functional domains of the full-length IRX4 proteins, we question whether this isoform can act as a transcription factor. Thus, isoform 4 may act distinctly from the full-length isoforms by various mechanisms. Recent studies have shown that cancer-associated AS of transcription factors generates isoforms with altered activity, opposite transcriptional effects or antagonistic functions that severely impact tumor initiation and progression [3]. As Belluti et al. summarized, transcription factors lacking binding domains can show opposite functions in cancer progression [65]. For instance, the AP-2B isoform produced by AS of *AP-2* lacks a DNA binding domain, has an inhibitory effect on the transactivation of *AP-2* and leads to increased tumorigenicity, anchorage-independent growth, invasiveness and angiogenesis in melanoma [66,67].
In addition, transcription factor isoforms lacking functional domains can be mislocalized within the cancer cell and exert a dominant-negative activity as a result of the excessive expression of the non-functional isoform over the functional one, cytoplasmic titration of the functional isoform, or direct or indirect regulation of full-length isoforms by non-functional isoforms [65]. For example, the HELIOS-V1 isoform lacks exon 6, which carries the nuclear localization signal, and therefore has a cytoplasmic localization in human leukemic T-cell lines and contributes to T-cell growth and survival. Further, the expression of HELIOS isoforms triggers the deregulation of various downstream target genes in T-cells compared to full-length isoforms [68]. Isoform 3, encoded by transcripts 12, 8 or 9, is expected to have the essential domains, such as the transactivation domain and homeodomain, which are required for DNA binding and transcription factor activity. However, compared to full-length IRX4 isoforms, isoform 3 lacks an N-terminal region, which may affect its structure and function in PCa and needs to be further elucidated. Unfortunately, we could not detect any protein expression in the mass spectrometry data for several transcripts (10 and 11) identified in PCa cell lines, which may suggest nonsense-mediated decay of these transcripts. In view of this, although the potential importance of IRX4 isoforms in PCa progression is still unknown, the diagnostic and therapeutic value of these isoforms cannot be ignored. Additional functional studies with isoform-specific overexpression and knockdown models are essential to establish the differential roles of IRX4 isoforms in PCa.

GWAS have identified over 160 PCa risk loci, some of which act as cis-eQTL regulatory elements that modulate the expression of nearby genes. The PCa risk SNP rs12653946 identified in the *5p15* locus was found to have a strong relationship with the *IRX4* gene (*P* = 4.91 × 10<sup>−5</sup>, FDR = 0.00468) [22,44]. rs10866528 has been used to tag the SNP rs12653946, and lower *IRX4* expression was found with the PCa high-risk homozygous genotype GG than with the common heterozygous AG genotype in a 50-patient sample cohort [44]. Similar to the reported results, with the rs10866528 SNP we observed low levels of expression of *IRX4* transcripts in common AA genotypes and the PCa risk-associated GG genotype in a large sample cohort (483 samples). All the analyzed *IRX4* transcripts showed a similar pattern, and we were not able to observe a disparity between the analyzed transcripts. This suggests that *IRX4* transcripts contribute equally to PCa risk and that their individual value in the progression of PCa cannot be ignored.

Isoform discovery is a challenging task in cancer research. Although many transcripts are identified at a cellular level, only a small proportion of transcripts encode isoforms, and they do so in a context-dependent manner [63]. Most isoforms share overlapping regions; therefore, the identification of unique isoforms is extremely difficult at both the laboratory assay and computational levels. The commonly used techniques, such as PCR, RNA-seq and mass spectrometry, have their own limitations in identifying and quantifying low-abundance isoforms. We selected a small number of PCa cell lines to quantify the expression of IRX4 isoforms by mass spectrometry, but these isoforms need to be measured in a large sample cohort that includes clinical specimens. Low reproducibility, low sensitivity, high variability and high data noise of the instruments are real challenges in isoform annotation. Moreover, the limited correlation between the transcriptomic and proteomic data makes it difficult to interpret differences in isoform expression between cancer cell lines.

Analyzing isoform-specific expression in prostate tumor progression can provide clear insights to develop drugs against specific isoforms that promote tumors. Besides PCa, the diversity of the *IRX4* gene can be a hallmark for other cancers; thus, the characterization of these isoforms would augment our knowledge for the development of specific therapeutic strategies. Considering the IRX4 isoform-specific expression, our study suggests that IRX4 can have distinct roles in PCa, which suggests the clinical importance of isoform-specific therapeutic targets in improving PCa patient care.

#### **5. Conclusions**

In summary, identifying isoform-specific expression is a challenging but critically important task, especially in cancer progression. We identified a subset of *IRX4* transcripts in PCa whose mRNA and encoding isoforms show distinct expression profiles across a panel of androgen-responsive and nonresponsive PCa cell lines. Apart from the prominent *IRX4* 1–5 transcripts, a large contribution has been imparted by other *IRX4* transcripts to the overall *IRX4* gene expression in PCa. Some transcripts are not only lacking in regulatory and essential coding regions but also possess additional sequence features via intron retention, which suggests that the functional roles of their encoding isoforms may be distinct from the primary full-length isoforms. Given the experimental evidence associated with *IRX4* AS, our results prioritize those splicing events that can show a clear expression signal of functional importance. Thus, this highlights the importance of exploring the differential roles of potential isoforms in cancer and how cells obtain the benefit of AS in tumor progression, therapy and drug resistance. Therefore, understanding the AS of *IRX4* in PCa could provide insights into tumor development and lead to the development of new therapeutic targets.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/genes12050615/s1, Figure S1: Alignment of predicted *IRX4* transcripts 6, 8, 10 and 12 sequences, Figure S2: Identification of IRX4 isoform-specific peptides by reprocessing LNCaP LC-MS/MS data, Table S1: Primer sequences for *IRX4* transcripts used in the study.

**Author Contributions:** Conceptualization, A.F., P.J. and J.B.; methodology, A.F., P.J., C.L. and A.M.; investigation, A.F., C.L. and A.M.; formal analysis, A.F., C.L. and A.M.; validation, A.F. and J.B.; writing (original draft preparation), A.F.; writing (review and editing), C.L., A.M., P.J. and J.B.; supervision, P.J. and J.B.; project administration, J.B.; funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project was supported by grant 1124950 awarded through the 2016 Priority-driven Collaborative Cancer Research Scheme and funded by Cancer Australia awarded to J. Batra. J. Batra was supported by a National Health and Medical Research Council (NHMRC) Career Development Fellowship and Advance Qld Industry Research Fellowship. Achala Fernando acknowledges the Research Training Stipend (RTP) and QUT HDR Tuition Fee Sponsorship. Chamikara Liyanage and Afshin Moradi acknowledge the QUT Postgraduate Research Award (QUTPRA) and QUT HDR Tuition Fee Sponsorship.

**Institutional Review Board Statement:** The cell line-based studies are approved by the QUT ethics committee (QUT Ethics approval number: 1500001082).

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** The authors acknowledge Srilakshmi Srinivasn for her guidance in the experiments and Mr Adil Malik for his contribution to PRIDE data reprocessing. Pawel Sadowski and Raj Gupta are acknowledged for carrying out the LC-MS/MS analysis at the Central Analytical Research Facility (CARF), operated by the Institute for Future Environments at QUT.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


5. Biamonti, G.; Infantino, L.; Gaglio, D.; Amato, A. An Intricate Connection between Alternative Splicing and Phenotypic Plasticity in Development and Cancer. *Cells* **2019**, *9*, 34.


*Article*

## **Characterization of Hormone-Dependent Pathways in Six Human Prostate-Cancer Cell Lines: A Gene-Expression Study**

**Andras Franko 1,2,3, Lucia Berti 2,3,\*, Alke Guirguis 4, Jörg Hennenlotter 5, Robert Wagner 1,2,3, Marcus O. Scharpf 6, Martin Hrabě de Angelis 3,7, Katharina Wißmiller 8,9,10, Heiko Lickert 3,8,9,10, Arnulf Stenzl 5, Andreas L. Birkenfeld 1,2,3, Andreas Peter 2,3,4, Hans-Ulrich Häring 1,2,3, Stefan Z. Lutz 1,11,**† **and Martin Heni 1,2,3,4,**†


Received: 25 August 2020; Accepted: 30 September 2020; Published: 7 October 2020

**Abstract:** Prostate cancer (PCa), the most frequently diagnosed cancer in men, is tightly regulated by endocrine signals. A number of different PCa cell lines are commonly used for in vitro experiments, but these are of diverse origin and have very different cell-proliferation rates and hormone-response capacities. By analyzing the gene-expression pattern of the main hormone pathways, we systematically compared six PCa cell lines and parental primary cells. We compared these cell lines (i) with each other and (ii) with PCa tissue samples from 11 patients. We found major differences in the gene-expression levels of androgen, insulin, estrogen, and oxysterol signaling between PCa tissue and cell lines, and between different cell lines. Our systematic characterization gives researchers a solid basis to choose the appropriate PCa cell model for the hormone pathway of interest.

**Keywords:** androgen receptor; estrogen receptor; gene expression; insulin receptor; prostate cancer

#### **1. Introduction**

Genetic and metabolic alterations can both drive the development of prostate cancer (PCa) [1,2]. Metabolic alterations can influence autocrine and paracrine signaling pathways, which are crucial in the carcinogenic processes of PCa [3]. A tumor is a complex milieu and, in addition to prostate-cancer cells, it contains fibroblasts, endothelial cells, and mesenchymal stem cells [4]. These various cells communicate with PCa cells via paracrine factors that drive carcinogenesis [4,5]. For instance, paracrine factors secreted by PCa cells were shown to stimulate osteoblasts, which led to excess bone formation [6]. Metabolic alterations, including obesity, metabolic syndrome, insulin resistance, and diabetes, are known to exacerbate PCa [7,8]. These conditions mediate the aggressivity of PCa via endocrine signaling [1]. The major hormone pathways in PCa are androgen, insulin, estrogen, and oxysterol signaling [1,9,10]. In vitro studies use different cell models that are of diverse origin, and have very different cell-proliferation rates and hormone-response capacities [11–13]. Although all these cell lines represent valuable tools to study the development of PCa *in vitro*, whether these cell lines are suitable for studying endocrine pathways, which regulate PCa development in humans, is poorly understood. Therefore, it is vital to use a suitable in vitro model that resembles the physiology of endocrine pathways in human PCa. To characterize the major hormone pathways in human PCa cell lines, we systematically compared six commonly applied PCa cell lines and parental primary cells with (i) each other and (ii) PCa tissue samples of patients with PCa by analyzing their gene-expression patterns.

#### **2. Materials and Methods**

#### *2.1. Cell Culture*

Human prostate adenocarcinoma cell lines PC3 and LNCaP, isolated from bone and lymph-node metastasis respectively, were purchased from CLS-Cell line services (Eppelheim, Baden-Württemberg, Germany). DU145 cells (isolated from brain metastasis) were obtained from the DSMZ-German collection of microorganisms and cell cultures (Braunschweig, Niedersachsen, Germany). NCI-H660 cells (isolated from lymph-node metastasis) and MDA-PCa-2b (isolated from bone metastasis) were purchased from the ATCC-American type culture collection (Manassas, VA, USA). CWR-R1ca cells (isolated from human xenograft tumors) and human prostate epithelial cells (HPEC) were purchased from Merck (Darmstadt, Hessen, Germany). HPEC cells were used at passages 3 and 4. All cell lines were propagated according to the supplier's instructions and maintained in a medium supplemented with 100 IU/mL penicillin, 0.1 mg/mL streptomycin, and 2 mM glutamine (Gibco/Thermo Fisher Scientific, Karlsruhe, Baden-Württemberg, Germany) in a 5% CO2 humidified atmosphere at 37 °C. LNCaP cells for 14 days of androgen deprivation were cultured in a basal medium (ATCC #CRL-1740, ThermoFisher Scientific #21875034) containing 2 mM L-glutamine and 10% FBS Good (Pan-Biotech P40-37500, Aidenbach, Freistaat Bayern, Germany). The medium used for androgen deprivation (ThermoFisher Scientific #11835030) was supplemented with 10% charcoal-stripped FBS (Sigma-Aldrich, F6765, Munich, Freistaat Bayern, Germany) and 10 nM dihydrotestosterone (Sigma-Aldrich, D-073) for 14 days. Detailed information regarding culture medium and supplements is given in Table 1. Cells were routinely cultivated until 90% confluency before harvesting.



#### *2.2. Human Samples*

Newly diagnosed PCa patients who had not received treatment before surgery were recruited prior to radical prostatectomy. Tissue sampling was performed by an experienced uropathologist. PCa tissues were immediately snap-frozen in liquid nitrogen and stored at −80 °C. Hematoxylin and eosin staining was performed on paraffinized samples for histological confirmation. Histopathological features were assessed, and pT- and postoperative Gleason scores were determined [14,15]. Informed written consent was obtained from all participants, and the Ethics Committee of the University of Tübingen (575/2018BO2) approved the protocol according to the Declaration of Helsinki.

#### *2.3. Real-Time qPCR*

Total RNA was isolated using an AllPrep DNA/RNA/miRNA kit (Qiagen, Hilden, Germany) according to the manufacturer's description, and cDNA was synthesized (Roche, Basel, Switzerland). Real-time PCR was performed with LightCycler 480 Probes Master (Roche) with universal probe library using LightCycler 480 (Roche) [16]. Delta–delta crossing-point (Cp) values were calculated, and data were normalized to ubiquitin c (*UBC*) [8,15]. For real-time PCR analysis, the following primers and probes were applied: *ESR1* 5′-TCCTAACTTGCTCTTGGACAGG and 3′-GTAGCCAGCAGCATGTCG (probe nr 22), *ESRRA* 5′-GTGGGCGGCAGAAGTACA and 3′-TCAACCACCAGCAGATGAGA (probe nr 3), *CYP46A1* 5′-GCTGGACAACTTCGTCACCT and 3′-CATCACTGTGAACGCCAAGT (probe nr 53), *CCND1* 5′-GACCTTCGTTGCCCTCTGT and 3′-GGTTCAGGCCTTGCACTG (probe nr 87), *TP53* 5′-AAGTCTAGAGCCACCGTCCA and 3′-AGTCTGGCTGCCAATCCA (probe nr 3), *GLUT1* 5′-GGTTGTGCCATACTCATGACC and 3′-CAGATAGGACATCCAGGGTAGC (probe nr 67), *GLUT12* 5′-TGCTGCTTTTTCAATTGGTCT and 3′-AGGAAAGATCTCGCTGAGCA (probe nr 37), *IRS1* 5′-GCCTATGCCAGCATCAGTTT and 3′-TTGCTGAGGTCATTTAGGTCTTC (probe nr 71), *IRS2* 5′-TGACTTCTTGTCCCACCACTT and 3′-CATCCTGGTGATAAAGCCAGA (probe nr 49), *FGFR1* 5′-GGCAGCATCAACCACACATA and 3′-TACCCAGGGCCACTGTTTT (probe nr 42), *CH25H* 5′-CCTTCTTCCCGGTCATCTTC and 3′-GATATCCAGGACCACGAAGG (probe nr 9), *CDKN1A* 5′-CCGAAGTCAGTTCCTTGTGG and 3′-CATGGGTTCTGACGGACAT (probe nr 82), *CDKN1B* 5′-GAGAGCCAGGATGTCAGCG and 3′-TTGTTTTGAGTAGAAGAATCGTCGGT (probe CCTTTAATTGGGGCTCCGGCTAACT), *NOS2* 5′-GCTCAAATCTCGGCAGAATC and 3′-GCCATCCTCACAGGAGAGTT (probe nr 42), *SEPP1* 5′-GGAGCTGCCAGAGTAAAGCA and 3′-ACATTGCTGGGGTTGTCAC (probe nr 38). Primer sequences for *AR, CYP27A1, CYP7B1, ESR2, FOLH1, IGF1R, INSRA, INSRB, KLK3, UBC* and *MKI67* are given in [9]. Primer sequences for *HIF1A, RELA* and *BIRC5* are given in [14]. For the human PCa samples, the gene-expression levels of *GLUT1*, *HIF1A*, *BIRC5*, *NOS2*, *RELA* and *MKI67* were previously studied [15].
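The delta-delta Cp normalization to *UBC* mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it assumes roughly 100% amplification efficiency (one doubling per cycle) and an explicit calibrator sample, neither of which is specified in the text, and all function names and Cp values are hypothetical.

```python
# Minimal sketch of the 2^-(delta-delta Cp) calculation.
# Assumptions (not from the source): ~100% PCR efficiency, an explicit
# calibrator sample, and made-up Cp values used purely for illustration.

def relative_expression(cp_target, cp_ubc, cp_target_cal, cp_ubc_cal):
    """Fold change of a target gene versus a calibrator, normalized to UBC."""
    delta_sample = cp_target - cp_ubc              # normalize sample to UBC
    delta_calibrator = cp_target_cal - cp_ubc_cal  # normalize calibrator to UBC
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


def expression_ratio(numerator, denominator):
    """Ratio of two relative-expression values (e.g. INSRA/INSRB).

    Returns None when the denominator transcript was not detected.
    """
    if denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Hypothetical Cp values: relative to UBC, the target crosses the threshold
# two cycles earlier in the sample than in the calibrator.
fold = relative_expression(24.0, 20.0, 26.0, 20.0)
print(f"fold change vs calibrator: {fold:.2f}")
print("example ratio:", expression_ratio(fold, 2.0))
```

The same ratio helper would apply to derived quantities such as an isoform ratio, with undetected transcripts reported as missing rather than as zero.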

#### *2.4. Western Blot*

Total cellular protein was extracted using RIPA buffer (50 mmol/L Tris-HCl pH 7.4, 150 mmol/L NaCl, 1% NP-40, 0.25% Na-deoxycholate, 1 mmol/L phenyl-methyl-sulfonyl-fluoride, 1 mmol/L dithiothreitol) containing a protease and phosphatase inhibitor cocktail (Roche Molecular Biochemicals, Mannheim, Baden-Württemberg, Germany), and cleared by centrifugation. Protein concentration was determined using a BCA protein assay from Bio-Rad (Hercules, CA, USA). Protein lysates (20 μg) were separated on a 4–12% Bis-Tris gel (Invitrogen, Carlsbad, CA, USA). After electrophoresis, proteins were transferred using nitrocellulose ministacks and the iBlot dry-blotting system (Invitrogen, Carlsbad, CA, USA). Membranes were blocked for two hours in Odyssey Blocking Buffer (LiCor, Lincoln, NE, USA) and further incubated with antibodies against androgen receptor (AR, ab133273), prostate-specific antigen (PSA, KLK3, ab53774), β-tubulin (ab21057), prostate-specific membrane antigen (PSMA, FOLH1, ab19071) (Abcam, Cambridge, Cambridgeshire, UK), IGF1R β subunit (D23H3, #9750), and insulin receptor β subunit (L55B10, #3020) (Cell Signaling Technologies, Danvers, MA, USA). IRDye® or AlexaFluor® secondary antibodies (LiCor or Abcam, Cambridge, UK) were used, and signals were detected and quantified using the iBright device (Invitrogen, Carlsbad, CA, USA).

#### **3. Results**

For our comprehensive analysis, we chose six commonly investigated human PCa cell lines (CWR-R1ca, DU145, LNCaP, NCI-H660, MDA-PCa-2b, and PC3). Human prostate epithelial cells (HPEC) were included as parental, nontumorous primary prostate cells. To compare gene expression for hormone pathways in the PCa cell lines to the human situation, we analyzed 11 PCa samples isolated from patients who underwent radical prostatectomy due to their tumor. Histopathological screening confirmed the presence of prostate cancer in the collected tissues. As prostate-cancer metabolism could be different at various tumor stages, we specifically selected patients at a similar tumor stage with comparable Gleason scores (7a and 7b) and without lymph-node metastasis (Table 2). Data for the 11 human samples are shown as pooled values (mean ± standard deviation) in Figures 1–3.

As a first step, we quantified the transcripts of the main hormone receptors. The expression levels of androgen receptor (*AR*) and its target genes, prostate-specific antigen (*PSA, KLK3*), and prostate-specific membrane antigen (*PSMA, FOLH1*) were most pronounced in LNCaP and MDA-PCa-2b cells (Figure 1A–C). The insulin receptor isoform A (*INSRA*)/insulin receptor isoform B (*INSRB*) ratio was highest in MDA-PCa-2b cells (Figure 1D–F), suggesting the activation of a mitogenic insulin pathway [17]. PC3 cells demonstrated the largest insulin receptor substrate (IRS) IRS1/IRS2 ratio (Figure 1G–I).

**Table 2.** Patient characteristics. Abbreviations: BMI: body-mass index, PSA: prostate-specific antigen, N: number of patients, Stdev: standard deviation. pT and Gleason scores represent prostate-cancer (PCa) pathological stages; pN denotes lymph-node status.

**Figure 1.** Transcript levels of hormone receptors and downstream substrates in prostate-cancer cell lines and in human prostate-cancer tissue. Transcript levels of indicated genes measured using real-time PCR. (**A**) *AR*, (**B**) *KLK3*, (**C**) *FOLH1*, (**D**) *INSRA*, (**E**) *INSRB*, (**F**) *INSRA*/*INSRB* ratio, (**G**) *IRS1*, (**H**) *IRS2*, (**I**) *IRS1*/*IRS2* ratio. PCa: prostate cancer, HPEC: parental primary prostate cells, CWR-R1ca: xenograft PCa cells, DU145: brain metastasis PCa cells, LNCaP: lymph-node metastasis PCa cells, NCI-H660: lymph-node metastasis PCa cells, MDA-PCa-2b: bone metastasis PCa cells, PC3: bone metastasis PCa cells, nd: not detected. For human PCa samples, data shown as pooled samples: mean ± standard deviation (*n* = 11).

The highest gene levels of estrogen receptors α (*ESR1*) and β (*ESR2*) were observed in NCI-H660 and PC3 cells, whereas the expression of estrogen-related receptor α (*ESRRA*) was the highest in LNCaP cells (Figure 2A–C). The gene-expression levels of insulin-like growth factor 1 receptor (*IGF1R*) were the highest in HPEC and DU145, whereas fibroblast growth factor receptor 1 (*FGFR1*) showed the strongest expression in CWR-R1ca and DU145 cells (Figure 2D–E).

**Figure 2.** Transcript levels of hormone receptors and potential oncogenic mediators in prostate-cancer cell lines and in human prostate-cancer tissue. Transcript levels of indicated genes were measured using real-time PCR. (**A**) *ESR1*, (**B**) *ESR2*, (**C**) *ESRRA*, (**D**) *IGF1R*, (**E**) *FGFR1*, (**F**) *CYP27A1*, (**G**) *CYP7B1*, (**H**) *CYP46A1*, (**I**) *CH25H*. PCa: prostate cancer, HPEC: parental primary prostate cells, CWR-R1ca: xenograft PCa cells, DU145: brain metastasis PCa cells, LNCaP: lymph-node metastasis PCa cells, NCI-H660: lymph-node metastasis PCa cells, MDA-PCa-2b: bone metastasis PCa cells, PC3: bone metastasis PCa cells, nd: not detected. For human PCa samples, data shown as pooled samples: mean ± standard deviation (*n* = 11).

As a second step, we quantified the mRNA levels of further major intracellular regulators of PCa. Among the three investigated enzymes of oxysterol metabolism, the strongest expression for cytochrome P450 family 27 subfamily A member 1 (*CYP27A1*) was observed in DU145 and PC3 cells (Figure 2F), whereas cytochrome P450 family 7 subfamily B member 1 (*CYP7B1*) was highly represented in CWR-R1ca cells (Figure 2G). The transcript for cytochrome P450 family 46 subfamily A member 1 (*CYP46A1*) was detected only in CWR-R1ca, DU145, and NCI-H660 cells (Figure 2H). To apply a statistical test across the analyzed cell lines, we examined cell-type-dependent gene expression in a mixed model by bundling different genes into pathways: androgen pathway: *AR*, *PSA*, and *PSMA*; insulin pathway: *INSRA, INSRB, IRS1*, and *IRS2*; estrogen pathway: *ESR1, ESR2,* and *ESRRA*; oxysterols: *CYP7B1, CYP46A1*, and *CYP27A1*. The expressed genes were considered as random effects, and the pathways, cell types, and their interactions as fixed effects. We observed a significant interaction between cell type MDA-PCa-2b and the androgen pathway, which suggested that the MDA-PCa-2b cell line has a different gene expression for this pathway. Cholesterol 25-hydroxylase (*CH25H*) was only identified in four cell lines (Figure 2I).

Next, we analyzed three genes that are involved in proliferative pathways [18,19]. The expression of Ki-67 (*MKI67*) was the highest in NCI-H660 and MDA-PCa-2b cells (Figure 3A). CWR-R1ca cells showed the strongest expression of cyclin D1 (*CCND1*) (Figure 3B). The mRNA level of tumor protein p53 (*TP53*) was high in LNCaP and MDA-PCa-2b cells (Figure 3C). Hypoxia inducible factor 1 subunit α (*HIF1A*) and RELA proto-oncogene NF-kB subunit (*RELA*), as well as the NF-kB target genes, including baculoviral IAP repeat containing 5 (*BIRC5*), nitric oxide synthase 2 (*NOS2*), and selenoprotein P (*SEPP1*), are important regulators of carcinogenic processes in the development of PCa [20–23]. *HIF1A* was strongly expressed in PC3 cells (Figure 3D). mRNA levels of *RELA* and *BIRC5* were comparable among the cell lines (Figure 3E,F). *NOS2* was detected in three cell lines (Figure 3G). The transcript levels of *SEPP1* were the highest in LNCaP and MDA-PCa-2b cells (Figure 3H). Cyclin-dependent kinases and their inhibitors are implicated in mitogenic signaling and were demonstrated to be regulated by androgen signaling [18]. Cyclin-dependent kinase inhibitor 1A (*CDKN1A*) showed strong gene expression in HPEC cells (Figure 3I). The mRNA levels of *CDKN1B* were the highest in LNCaP and MDA-PCa-2b cells (Figure 3J). Glucose transporters regulate crucial tumorigenic pathways, which are of particular interest in the context of diabetes [24]. Among the analyzed glucose transporters, solute carrier family 2 member 1 (*GLUT1*) showed the strongest expression in HPEC cells (Figure 3K). The strongest gene expression for solute carrier family 2 member 12 (*GLUT12*) was found in CWR-R1ca cells (Figure 3L).

Ghandi and colleagues recently assessed the transcript pattern of five PCa cell lines using RNA sequencing [25]. The differences in gene-expression patterns observed among the analyzed cell lines in our study resemble these previous data for DU145, LNCaP, NCI-H660, MDA-PCa-2b, and PC3 cells (Table 3, https://www.cbioportal.org/, and [25]).


**Figure 3.** Transcript levels of potential oncogenic mediators in prostate-cancer cell lines and in human prostate-cancer tissue. Transcript levels of indicated genes measured using real-time PCR. (**A**) *MKI67*, (**B**) *CCND1*, (**C**) *TP53*, (**D**) *HIF1A*, (**E**) *RELA*, (**F**) *BIRC5*, (**G**) *NOS2*, (**H**) *SEPP1*, (**I**) *CDKN1A*, (**J**) *CDKN1B*, (**K**) *GLUT1*, (**L**) *GLUT12*. PCa: prostate cancer, HPEC: parental primary prostate cells, CWR-R1ca: xenograft PCa cells, DU145: brain metastasis PCa cells, LNCaP: lymph-node metastasis PCa cells, NCI-H660: lymph-node metastasis PCa cells, MDA-PCa-2b: bone metastasis PCa cells, PC3: bone metastasis PCa cells, nd: not detected. For human PCa samples, data shown as pooled samples: mean ± standard deviation (*n* = 11).


**Table 3.** RNA sequencing data from https://www.cbioportal.org/ [25].



Numbers denote median values. \* The applied method did not distinguish between INSRA and INSRB isoforms, which were, however, differentiated by our qPCR analysis.

Furthermore, we measured the protein levels for AR, PSMA, IR, IGF1R, and PSA using Western blotting (Figure 4A). Most of the observed changes at the gene-expression level among HPEC, LNCaP, and PC3 cells were mirrored well at the protein level. The gene-expression profiles of CWR-R1ca, DU145, NCI-H660, and MDA-PCa-2b cells are in line with protein-expression data of previous studies [11,12,26–29]. As LNCaP cells showed pronounced PSA and PSMA expression at the protein level, we treated these cells with dihydrotestosterone (DHT) after serum deprivation. DHT treatment increased the protein level of PSA (Figure 4B), suggesting that LNCaP cells are partly sensitive to hormone treatment after serum deprivation.

**Figure 4.** Protein levels of androgen receptor (AR), PSA, prostate-specific membrane antigen (PSMA), insulin receptor (IR), and insulin-like growth factor 1 receptor (IGF1R) in prostate-cancer cell lines. Proteins detected using SDS-PAGE and Western blot. (**A**) For HPEC, LNCaP, and PC3 cell lysates, AR, PSA, PSMA, IR, and IGF1R antibodies were applied. (**B**) For LNCaP cell lysates, PSA, PSMA, and IR antibodies were applied. As loading control, β-tubulin was used. Protein intensities were normalized to (**A**) β-tubulin and HPEC cells or (**B**) control condition. These relative protein intensities are shown below the blot images. For quantifying PSA intensity, upper protein bands were evaluated. Control: LNCaP cells grown in growth medium; DHT: dihydrotestosterone treatment of LNCaP cells after serum deprivation. Numbers on the right side represent molecular-weight markers.

#### **4. Discussion**

All analyzed PCa cell lines had lower insulin-receptor expression levels than PCa samples obtained from patients. On the other hand, the transcript levels of potential oncogenic mediators and proliferation markers were similar in the different PCa cell lines and in human PCa tissue. These data indicate that certain PCa cell lines can serve as an appropriate model for investigating the main hormone pathways at the transcript level.

Our comprehensive data regarding the mRNA levels of endocrine-signaling components identified major differences among the six human PCa cell lines and the parental nontumorous primary cells. Androgen-receptor signaling is the best-studied pathway in PCa [1,22]. LNCaP and MDA-PCa-2b cells have very high transcript levels for androgen-signaling components; the present data suggest that, out of the six analyzed PCa cell lines, LNCaP and MDA-PCa-2b cells are probably the best choice for studies on androgen signaling (Table 4).


**Table 4.** Summary table.

In contrast to the androgen-signaling pathway, there is less knowledge about insulin, estrogen, and oxysterol signaling in PCa cell models. A high insulin-receptor isoform A to insulin-receptor isoform B ratio in PCa suggests the prevailing activation of the mitogenic insulin pathway [17,30]. Our analysis indicates that MDA-PCa-2b cells could be highly responsive to mitogenic insulin signaling (Table 4). On the other hand, NCI-H660 and PC3 cells appear to be suitable for analyzing estrogen signaling due to the high expression levels of *ESR1* and *ESR2* (Table 4). Cholesterol derivatives, including oxysterols [31], were recently implicated in the regulation of PCa growth by modulating androgen-receptor signaling [9]. Our current data indicate that, at the mRNA level, the major enzymes of oxysterol metabolism show significant differences in the analyzed PCa cell lines.

Furthermore, the activation of HIF1A and NF-kB pathways plays a pivotal role in oncogenic processes [24,32]. In human prostate samples, we observed a positive correlation of the transcript levels of HIF1A and NF-kB pathways to the oncometabolite fumarate [14]. These real-time PCR data indicate that, for investigating HIF1A pathways *in vitro*, PC3 cells could serve as an appropriate model (Table 4). Patients with Type 2 diabetes develop more aggressive PCa [33,34] and show increased risk of PCa mortality [35]; therefore, altered glucose metabolism and the upregulation of glucose transporters could be involved in the progression of PCa in patients with diabetes [7,24]. HPEC cells had high *GLUT1* expression levels, whereas CWR-R1ca cells showed the highest expression for *GLUT12* transcripts, indicating that these cell lines could be used to analyze the involvement of glucose metabolism in cancer progression.

Androgen-deprivation therapy (ADT) is one of the common treatment options for localized PCa [36] and was shown to modify intracellular hormonal metabolism [37]. PCa cell lines serve as in vitro models for studying the consequences of ADT and hormone stimulus [13]. In LNCaP cells, PSA showed high androgen sensitivity, which is in line with previous studies [38,39]. Nevertheless, ADT has additional effects on PCa cells, as it also stimulates neuroendocrine differentiation in LNCaP cells [40]; thus, the in vitro application of hormone treatment after serum deprivation should be carefully interpreted.

The cell lines in our study were cultivated under various culture conditions, such as different media, hormonal supplements, and FBS concentrations, following the provider's recommendations for each cell line. Nevertheless, these different conditions could have introduced a bias in gene expression that we could not control.

In summary, our systematic characterization gives researchers a solid basis to choose the most appropriate PCa cell-culture model to characterize the hormone pathway of interest (Table 4).

**Author Contributions:** Conceptualization, S.Z.L. and H.-U.H.; methodology, A.F., L.B., J.H., A.G., K.W., and R.W.; formal analysis, A.F., L.B., and R.W.; resources, H.L., A.S., A.L.B., M.H.d.A., and H.-U.H.; writing—original-draft preparation, A.F.; writing—review and editing, L.B., A.G., J.H., R.W., M.O.S., M.H.d.A., K.W., H.L., A.S., A.L.B., A.P., H.-U.H., S.Z.L., and M.H.; All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was supported in part by a grant from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Transcriptomic Profiling Identifies Differentially Expressed Genes in Palbociclib-Resistant ER+ MCF7 Breast Cancer Cells**

**Lilibeth Lanceta 1, Conor O'Neill 2, Nadiia Lypova 1, Xiahong Li 3, Eric Rouchka 4, Sabine Waigel 1, Jorge G. Gomez-Gutierrez 5,6, Jason Chesney 1,6 and Yoannis Imbert-Fernandez 1,6,\***


Received: 20 March 2020; Accepted: 18 April 2020; Published: 24 April 2020

**Abstract:** Acquired resistance to cyclin-dependent kinases 4 and 6 (CDK4/6) inhibition in estrogen receptor-positive (ER+) breast cancer remains a significant clinical challenge. Efforts to uncover the mechanisms underlying resistance are needed to establish clinically actionable targets effective against resistant tumors. In this study, we sought to identify differentially expressed genes (DEGs) associated with acquired resistance to palbociclib in ER+ breast cancer. We performed next-generation transcriptomic RNA sequencing (RNA-seq) and pathway analysis in ER+ MCF7 palbociclib-sensitive (MCF7/pS) and MCF7 palbociclib-resistant (MCF7/pR) cells. We identified 2183 up-regulated and 1548 down-regulated transcripts in MCF7/pR compared to MCF7/pS cells. Functional analysis of the DEGs using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database identified several pathways associated with breast cancer, including 'cell cycle', 'DNA replication', 'DNA repair' and 'autophagy'. Additionally, Ingenuity Pathway Analysis (IPA) revealed that resistance to palbociclib is closely associated with deregulation of several key canonical and metabolic pathways. Further studies are needed to determine the utility of these DEGs and pathways as therapeutics targets against ER+ palbociclib-resistant breast cancer.

**Keywords:** palbociclib; estrogen receptor; breast cancer; CDK4/6; CDK4/6 inhibitors; therapy resistance; DNA repair; metabolic rewiring

#### **1. Introduction**

Breast cancer is the most frequent malignancy among women, and approximately 60–70% of cases are estrogen receptor-positive (ER+). Selective inhibition of cyclin-dependent kinases 4 and 6 (CDK4/6) and ER signaling is now standard-of-care therapy for ER+ metastatic breast cancer [1]. Three CDK4/6 inhibitors, palbociclib, ribociclib and abemaciclib, are currently used in combination with endocrine therapy given their shown improvement in progression-free survival compared to endocrine therapy alone in the metastatic setting [2]. Despite the clear benefit of this combination, approximately 10% of patients remain insensitive, whereas nearly all patients become resistant within 12 to 36 months of therapy initiation [3]. Therefore, determining the underlying mechanisms of resistance is required to design novel treatment strategies that delay or overcome clinical resistance.

Previous studies have shown that resistance to palbociclib is commonly associated with cyclin E or CDK6 amplification, CDK2 activation and loss of the retinoblastoma (Rb) protein in ER+ breast cancer cells [4–6]. Analysis of circulating tumor DNA from patients enrolled in the PALOMA-3 trial (fulvestrant or fulvestrant + palbociclib) identified an enrichment of Rb mutations, although this only occurred in 4.5% of the palbociclib-treated cohort [7,8]. Importantly, acquired alterations in *ESR* and *PIK3CA* were also observed; however, these alterations occurred in both treatment arms indicating distinct events driving resistance to palbociclib versus fulvestrant [9]. Additional studies have implicated fibroblast growth factor receptor (FGFR) or aurora kinase A amplifications, enhanced MAPK or AKT signaling and decreased DNA repair as mechanisms of resistance against CDK4/6 inhibition [10–13]. Taken together, these studies have provided rationale for the testing of CDK4/6 inhibitors in combination with MEK or PI3K inhibitors [11,14].

The major goal of this study was to identify additional mechanisms of resistance to palbociclib in ER+ breast cancer cells through transcriptomic analyses. We previously demonstrated that ER+ palbociclib-resistant cells exhibit a marked decrease in the cellular antiviral interferon (IFN) response [6], and thus we expected that other drivers of resistance remained to be identified. Here, we determined the transcriptional landscape of ER+ MCF7 palbociclib-sensitive (MCF7/pS) and palbociclib-resistant (MCF7/pR) breast cancer cells via next-generation transcriptomic RNA sequencing (RNA-seq). Gene expression profile and pathway analysis identified significant canonical pathways associated with resistance to palbociclib including cell cycle regulation, immune responses and DNA damage repair (DDR) among others. Importantly, we identified several metabolic pathways uniquely enriched in palbociclib-resistant cells compared to palbociclib-sensitive cells. These studies provide a mechanistic base for the further validation of these pathways in mediating resistance to palbociclib.

#### **2. Materials and Methods**

#### *2.1. Cell Culture, Generation of Palbociclib-Resistant Cells and Palbociclib Treatment*

MCF7 (HTB-22) cells were purchased from the American Type Culture Collection (ATCC) and maintained at 37 °C with 5% CO2. MCF7 cells were cultured in IMEM (Corning) supplemented with 10% fetal bovine serum (FBS, Invitrogen). Drug-resistant MCF7 cells were established by culturing in media containing palbociclib (0.1–4 μM). Drug was replenished every 3 days. Cells were subcultured every 1–2 weeks with 25% increments in drug concentration. The resistant cells were established after 6 months and maintained in the presence of 1 μM palbociclib. Cells were authenticated by the short tandem repeat (STR) assay (Genetica).
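As a rough plausibility check of the escalation schedule described above, the number of 25% steps needed to go from 0.1 μM to 4 μM can be counted. This sketch assumes that each increment compounds on the previous concentration, which the text does not state explicitly; the function name is hypothetical.

```python
# Sketch: count the dose-escalation steps from 0.1 uM to 4 uM palbociclib.
# Assumption (not stated in the source): each 25% increment is applied to
# the current concentration, i.e. the schedule is multiplicative.

def escalation_steps(start_um, target_um, increment=0.25):
    """Number of increments needed for the dose to reach or exceed target_um."""
    steps = 0
    concentration = start_um
    while concentration < target_um:
        concentration *= 1.0 + increment
        steps += 1
    return steps


steps = escalation_steps(0.1, 4.0)
# One subculture (and dose increase) every 1-2 weeks gives a time window.
print(f"{steps} steps, about {steps * 1} to {steps * 2} weeks")
```

Under this assumption the schedule takes 17 steps, or about 17 to 34 weeks at one step every 1–2 weeks, which is compatible with the reported 6 months.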

#### *2.2. RNA Extraction and Next-Generation Sequencing*

MCF7/pS and MCF7/pR cells were seeded in 10 cm2 dishes at a density of 2 × 10<sup>6</sup> cells and allowed to incubate overnight prior to RNA extraction using the RNeasy kit (Qiagen) for a total of three independent replicates per cell line. Libraries were prepared simultaneously for all replicates and cell lines using the TruSeq Stranded mRNA LT Sample Prep Kit - Set A (Cat# RS-122-2101) with poly-A enrichment. Sequencing was performed on the University of Louisville Center for Genetics and Molecular Medicine's (CGeMM) Illumina NextSeq 500 using the NextSeq 500/550 1 × 75 cycle High Output Kit v2 (Cat# FC-404-2005). A second run was performed on all samples to achieve an average of 45 million reads per sample.

#### *2.3. DEG Analysis*

The resulting samples were downloaded from Illumina's BaseSpace [15] (https://basespace.illumina.com/). Sequences were directly aligned to the Homo sapiens hg38 reference genome assembly (hg38.fa) using tophat2 (version 2.0.13), generating alignment files in bam format. DEGs were identified for the pairwise comparison MCF7/pS versus MCF7/pR using the tuxedo suite programs including cufflinks-cuffdiff2 (version 2.2.1). A total of 60,603 ENSEMBL genes were considered. Of these, 26,837 showed no gene expression and were excluded. A q-value cutoff ≤ 0.05 with |log2FC| ≥ 1 and gene expression greater than 1 in at least one replicate was used to determine differential expression. RNA-seq data are available (GEO accession number GSE130437). Gene Ontology Biological Processes (GO:BP) and KEGG pathway analysis was performed by using CategoryCompare [16].
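The filtering criteria stated above can be expressed as a short sketch. It is illustrative only: the record layout, the gene names, and the sign convention (positive log2FC meaning higher expression in MCF7/pR) are assumptions and do not reproduce the actual cuffdiff output format.

```python
# Sketch of the DEG filter: q <= 0.05, |log2FC| >= 1, and expression > 1
# in at least one replicate. Record layout and gene names are hypothetical.
from dataclasses import dataclass

# Genes retained after excluding those with no expression (from the text).
EXPRESSED_GENES = 60_603 - 26_837


@dataclass
class GeneResult:
    gene: str
    log2_fc: float         # assumed convention: log2(MCF7/pR over MCF7/pS)
    q_value: float
    replicate_expr: tuple  # expression values across all replicates


def is_deg(result, q_cutoff=0.05, lfc_cutoff=1.0, min_expr=1.0):
    """Apply the three criteria described in the text to one gene."""
    return (
        result.q_value <= q_cutoff
        and abs(result.log2_fc) >= lfc_cutoff
        and max(result.replicate_expr) > min_expr
    )


def split_degs(results):
    """Return (up-regulated, down-regulated) gene names among the DEGs."""
    up = [r.gene for r in results if is_deg(r) and r.log2_fc > 0]
    down = [r.gene for r in results if is_deg(r) and r.log2_fc < 0]
    return up, down


example = [
    GeneResult("GENE_A", 2.3, 0.001, (0.4, 5.1, 6.0)),   # passes, up
    GeneResult("GENE_B", -1.8, 0.010, (3.2, 2.9, 0.2)),  # passes, down
    GeneResult("GENE_C", 0.6, 0.001, (9.0, 8.5, 9.2)),   # fold change too small
    GeneResult("GENE_D", 3.0, 0.200, (4.0, 5.0, 6.0)),   # q-value too high
    GeneResult("GENE_E", -2.0, 0.010, (0.2, 0.5, 0.9)),  # expression too low
]
print(EXPRESSED_GENES, split_degs(example))
```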

#### *2.4. In Silico Ingenuity Network Analysis*

Pathway and biological processes analysis of all differentially expressed genes was performed using Ingenuity Pathway Analysis (Qiagen).

#### *2.5. GFP-LC3 Visualization*

Plasmid vector containing green fluorescent protein linked to microtubule-associated protein 1 LC3 was used to detect autophagosome formation in MCF7/pS and MCF7/pR cell lines [17]. Cells were treated with either vehicle control or palbociclib after 24 h of transfection. The expression of GFP was monitored by fluorescence microscopy 48 h after treatment. Cells were classified as having a predominantly diffuse GFP stain or having numerous punctate structures representing autophagosomes. Images were taken at 40× magnification with the EVOS FL Imaging System (Thermo Fisher Scientific, Waltham, MA, USA) under 357/44 and 447/60 nanometers (nm) excitation and emission visualization, respectively.

#### **3. Results**

#### *3.1. RNA-Seq Profiling Reveals a Distinct Transcriptomic Profiling in Palbociclib-Resistant Cells*

To characterize transcriptional alterations driven by acquired resistance to palbociclib, we performed gene expression profiling in MCF7/pS and MCF7/pR cells. These cells were developed by our group and were previously shown to be resistant to palbociclib [6]. Hierarchical clustering based on differentially expressed RNA transcripts revealed a distinct transcriptomic profile in MCF7/pR cells compared to MCF7/pS (Figure 1). Using a q-value cutoff ≤ 0.05 with |log2FC| ≥ 1, we identified 2183 up-regulated genes and 1548 down-regulated transcripts in MCF7/pR cells. Table 1 shows the top 20 up-regulated and down-regulated genes in MCF7/pR cells compared to MCF7/pS cells.

**Figure 1.** Differential expression heatmap of estrogen receptor-positive (ER+) MCF7 palbociclib-sensitive (MCF7/pS) compared to MCF7 palbociclib-resistant (MCF7/pR) cells. Next-generation transcriptomic RNA sequencing (RNA-seq) was performed and the raw expression of genes is shown as a heatmap. Replicate samples are clustered. Red and yellow indicate lower and higher gene expression, respectively.

**Table 1.** Top 20 up-regulated and down-regulated genes between MCF7/pS and MCF7/pR ranked by *p*-value (pval ≤ 0.05; qval ≤ 0.05; |log2FC| ≥ 1).

#### *3.2. KEGG Annotation of DEG and Enriched Biological Processes Analysis*

To gain insight into the molecular mechanisms underlying palbociclib resistance, we performed KEGG pathway analysis of all DEGs identified using CategoryCompare [16]. Table 2 lists the enriched KEGG pathways identified in MCF7/pS vs. MCF7/pR cells (false discovery rate (FDR) ≤ 0.05 and *p*-value ≤ 0.001). The KEGG terms associated with resistance to palbociclib included 'cell cycle', 'DNA replication', 'mismatch repair' and 'phagosome'. Subsequent analysis of GO:BP identified many enriched biological processes that correlated with palbociclib resistance (Figure 2). Importantly, we observed distinct groups of nodes such as DNA replication, cell cycle transition, mitosis, protein–DNA assembly and organization and response to virus revealing multiple functional 'themes' associated with resistance to palbociclib.
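Pathway enrichment of this kind is commonly assessed with a hypergeometric over-representation test. The sketch below shows that generic test and is not a description of CategoryCompare's internals; the choice of gene universe, the pathway size, and the overlap count in the usage line are hypothetical.

```python
# Sketch of a hypergeometric over-representation test for one pathway.
# Generic illustration; the authors' tool may differ in its details.
from math import comb


def hypergeom_enrichment_p(universe, pathway_size, deg_count, overlap):
    """P(X >= overlap) when deg_count genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    upper = min(pathway_size, deg_count)
    total = comb(universe, deg_count)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical usage: 33,766 expressed genes as universe, 2183 + 1548 DEGs,
# and an invented pathway of 120 genes of which 40 are DEGs.
p_value = hypergeom_enrichment_p(33_766, 120, 2183 + 1548, 40)
print(f"enrichment p-value: {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large gene universes, at the cost of some speed.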

**Table 2.** Top enriched KEGG terms between MCF7/pS and MCF7/pR ranked by *p*-value. (pval ≤ 0.05; qval ≤ 0.05; |log2FC| ≥ 1).


**Figure 2.** Enriched biological processes (BP) analysis of ER+ palbociclib-resistant breast cancer cells.

#### *3.3. Resistance to Palbociclib Is Associated with Increased Autophagosome Formation*

Characterization of MCF7/pR cells by KEGG pathway analysis revealed an enrichment in genes associated with phagosomes (Table 2). Given a previous observation suggesting a crosstalk between phagocytosis and autophagy, we sought to investigate autophagy levels in the context of resistance to palbociclib [18]. We performed hierarchical clustering of autophagy-related genes in MCF7/pS and MCF7/pR cells (Figure 3A). Using a *p*-value cutoff ≤ 0.05, we identified a significant number of autophagy-related genes differentially expressed in MCF7/pR compared to MCF7/pS cells. Next, we measured autophagosome formation by monitoring the conversion of cytoplasm-diffuse GFP-LC3-I to punctate forms of membrane-associated GFP-LC3-II, which indicates LC3-II incorporation into the autophagosomes. We observed that MCF7/pR cells displayed a significant increase in autophagosome formation compared to MCF7/pS and that the addition of palbociclib led to a marked increase in autophagosome formation in both MCF7/pS and MCF7/pR cells (Figure 3B). These results confirmed an increase in autophagy in MCF7/pR cells and are in line with previous studies [19]. Numerous studies have demonstrated that autophagy contributes to the resistance of breast cancer cells to targeted therapies by promoting tumor cell survival and blocking apoptosis [20–22]. Recently, it has been shown that autophagy inhibitors synergize with palbociclib in ER+ MCF7 and T47D breast cancer cells resulting in a significant increase in growth inhibition [19]. Our results provide rationale for the use of autophagy inhibitors to treat palbociclib-resistant cells in addition to palbociclib-sensitive cells. Future studies will test the efficacy of this combination against CDK4/6 inhibition in the resistance setting and determine the molecular mechanisms driving the increase in autophagy in resistant tumors.

**Figure 3.** Increased autophagy is associated with palbociclib resistance in ER+ MCF7 cells. (**A**) Hierarchical clustering of autophagy-related genes performed by MetaCore analysis. (**B**) Cells were transfected with a pEGFP-LC3 plasmid and treated with either vehicle control (0.5% water) or 500 nM palbociclib for 24 h. Formation of autophagosomes is depicted by punctate structures (arrows). Images were taken at 40× magnification with an EVOS microscope.

#### *3.4. Pathway Enrichment Analysis of DEG*

To identify potential targetable pathways, all altered transcripts were mapped to known pathways using Ingenuity Pathway Analysis (IPA). We observed significant enrichment of several canonical pathways including four pathways involved in cell cycle regulation ('Estrogen-mediated S-phase entry', 'Cell cycle control of chromosomal replication', 'Mitotic roles of Polo-Like Kinase' and 'Role of CHK proteins in cell cycle checkpoint control'), four involved in DDR ('ATM signaling', 'Role of BRCA in DNA damage response', 'Mismatch repair in eukaryotes' and 'G2/M DNA damage checkpoint regulation'), eight involved in immune responses ('IL-17A signaling', 'Interferon signaling', 'STAT3 pathway', 'April mediated signaling', 'Tec Kinase signaling', 'Antigen presentation pathway', 'Production of nitric oxide and reactive oxygen species in macrophages' and 'IL-15 production') among other pathways (Figure 4).

**Figure 4.** Canonical pathway analysis of ER+ palbociclib-resistant breast cancer cells. A higher −log(B-H *p*-value), shown on the left Y axis, represents a more significant pathway. The ratio (right Y axis) refers to the number of genes from the dataset that map to the pathway divided by the total number of genes that map to the canonical pathway in the Ingenuity Pathway Analysis (IPA) database. pval ≤ 0.05; qval ≤ 0.05; |log2FC| ≥ 1.
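As a rough illustration of the two quantities plotted in Figure 4, the sketch below computes a pathway ratio and the −log10 of a Benjamini-Hochberg (B-H) adjusted *p*-value. It assumes that the enrichment *p*-value comes from a right-tailed hypergeometric (Fisher-type) test, which may differ in detail from the IPA implementation. The function names and all gene counts are hypothetical and are not taken from this study.

```python
from math import comb, log10


def hypergeom_right_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes among n DEGs,
    when K of the N background genes belong to the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Return B-H adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pathway_ratio(overlap, pathway_size):
    """Genes from the dataset in the pathway / all genes in the pathway."""
    return overlap / pathway_size


if __name__ == "__main__":
    # Hypothetical counts: (overlap, pathway size) for three made-up pathways,
    # with 500 DEGs drawn from a background of 20,000 genes.
    pathways = {"pathway_A": (8, 40), "pathway_B": (3, 60), "pathway_C": (12, 90)}
    raw = [hypergeom_right_tail(k, 500, K, 20000) for k, K in pathways.values()]
    for (name, (k, K)), q in zip(pathways.items(), benjamini_hochberg(raw)):
        print(name, round(pathway_ratio(k, K), 3), round(-log10(q), 2))
```

Taller bars in such a plot correspond to smaller adjusted *p*-values, while the ratio indicates how much of each pathway is covered by the dataset.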

#### *3.5. Metabolic Pathways Associated with Resistance to Palbociclib*

Previous reports have indicated that cellular metabolism is a downstream target of CDK4/6 inhibition. Specifically, it has been shown that palbociclib administration increases glucose utilization in cancer, whereas cyclin D3-CDK6 can directly phosphorylate and inhibit the activity of two key enzymes in the glycolytic pathway [23,24]. To identify metabolic pathways associated with resistance to palbociclib, we performed metabolic pathway analysis of all DEGs using IPA (Figure 5). We observed an enrichment of several metabolic pathways including three pathways involved in ribonucleotides synthesis ('Pyrimidine deoxyribonucleotides de novo biosynthesis I', 'dTMP de novo biosynthesis' and 'Salvage pathway of pyrimidine ribonucleotides'), six pathways involved in inositol metabolism ('3-Phosphoinositide biosynthesis', '3-Phosphoinositide degradation', 'D-myo-inositol(1,4,5,6)-tetrakisphosphate biosynthesis', 'D-myo-inositol-5-phosphate metabolism' and 'Superpathway of inositol phosphate compounds'). Among other pathways, we found 'Glycerol-3-phosphate shuttle', 'Asparagine degradation' and 'NAD biosynthesis II (from tryptophan)' to be enriched in our dataset. These results indicate that deregulated metabolism may play an essential role in mediating resistance to palbociclib.

**Figure 5.** Metabolic pathway analysis of ER+ palbociclib-resistant breast cancer cells. A higher −log(*p*-value), shown on the left Y axis, represents a more significant pathway. The ratio (right Y axis) refers to the number of genes from the dataset that map to the pathway divided by the total number of genes that map to the canonical pathway in the IPA database. pval ≤ 0.05; qval ≤ 0.05; |log2FC| ≥ 1.
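The thresholds listed in the figure legends (pval ≤ 0.05; qval ≤ 0.05; |log2FC| ≥ 1) can be read as a simple filter over per-gene results. The following is a minimal sketch of such a filter, assuming each gene carries a log2 fold change, a *p*-value and a *q*-value. The record layout, function names and example genes are hypothetical and do not reproduce the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str      # gene identifier (hypothetical placeholder names below)
    log2fc: float  # log2 fold change, resistant vs. sensitive
    pval: float    # raw p-value
    qval: float    # multiple-testing adjusted p-value


def is_deg(r, p_cut=0.05, q_cut=0.05, lfc_cut=1.0):
    """True when a gene passes all three thresholds from the figure legends."""
    return r.pval <= p_cut and r.qval <= q_cut and abs(r.log2fc) >= lfc_cut


def split_degs(results):
    """Return (up-regulated, down-regulated) gene names among the DEGs."""
    degs = [r for r in results if is_deg(r)]
    up = [r.gene for r in degs if r.log2fc > 0]
    down = [r.gene for r in degs if r.log2fc < 0]
    return up, down


if __name__ == "__main__":
    demo = [
        GeneResult("GENE_A", 2.3, 0.001, 0.01),    # passes, up-regulated
        GeneResult("GENE_B", -1.5, 0.002, 0.02),   # passes, down-regulated
        GeneResult("GENE_C", 0.4, 0.001, 0.01),    # fold change too small
        GeneResult("GENE_D", 3.0, 0.04, 0.20),     # q-value too large
    ]
    print(split_degs(demo))
```

Requiring all three conditions keeps only genes whose change is both statistically supported and at least two-fold in either direction.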

#### **4. Discussion**

Three orally available inhibitors of CDK4/6 are currently used in combination with endocrine therapy (ET) as first-line therapy for ER+ metastatic breast cancer patients [25]. Although initially beneficial, resistance to CDK4/6 inhibition arises in almost all patients within two years, thus limiting durable responses. Currently, there are no biomarkers that can predict treatment response or early resistance [26]. Here, we identified a number of clinically relevant pathways that are associated with resistance to palbociclib, largely focusing on metabolic alterations and oncogenic signaling such as nucleotide metabolism, inositol metabolism, cell cycle, immune regulation and DDR.

Previous efforts to identify mechanisms of resistance to CDK4/6 inhibition have found that lack of Rb protein, increased cyclin E expression, IL6/STAT3 pathway activation and decreased DNA repair are some of the underlying mechanisms of resistance in ER+ breast cancer cells [6,13,19,27,28]. Analysis of ctDNA or tumor mRNA from patients enrolled in the PALOMA-3, NeoPalAna and MONALEESA-3 trials has identified Rb mutations, activating mutations in PIK3CA and ESR1, increased cyclin E1 and activation of the PDK1-AKT axis as some of the drivers of resistance [7,9,11]. Consistent with previous findings, we observed a significant enrichment in pathways involved in DDR [13]. Furthermore, we observed an increase in autophagy in MCF7/pR cells, which is consistent with the previously described increase in autophagy driven by CDK4/6 inhibition in palbociclib-sensitive ER+ breast cancer cells [19]. Previous studies have shown that resistance to CDK4/6 inhibition is associated with a loss of ER/progesterone receptor (PR) expression in tumor biopsies of patients treated with the CDK4/6 inhibitor abemaciclib [5]. Notably, we observed a significant decrease in PR expression in palbociclib-resistant cells (Supplementary File 1). This finding is relevant given that unliganded PR sustains ER expression levels by maintaining a low methylation status of the ER gene [29]. Taken together, these observations suggest that PR loss may drive breast cancer cells to escape CDK4/6 inhibition by altering ER methylation, thereby resulting in the down-regulation of ER expression. Additionally, our results highlight that ER methylation status can potentially be used to predict acquired resistance to CDK4/6 inhibition.

While our findings are in line with previously identified mechanisms of resistance, our analysis uncovered additional potential mechanisms of resistance such as deregulation of 'Polo-Like Kinase (PLK)', 'April mediated signaling' and 'Tec Kinase signaling'. Of these, targeting PLK1 is of high relevance due to its role as a master regulator of the G2-M phase and DNA replication [30,31]. Importantly, PLK1 has been shown to play a role in mediating tamoxifen resistance in ER+ breast cancer cells, and thus we will conduct additional studies evaluating the role of PLK1 as a novel target for the ER+ breast cancer resistant to CDK4/6 inhibition [32]. Importantly, a potent PLK1 inhibitor, volasertib (BI6727), has been recently approved for the treatment of acute myeloid leukemia and would be a promising therapeutic agent against palbociclib-resistant breast cancer [33,34].

Close examination of the DEGs revealed significant expression changes in many genes involved in tumorigenesis and chemoresistance. For example, up-regulation of three of the small leucine-rich family of proteoglycans (SLRP), decorin, epiphycan and lumican, was observed in our dataset (Table 1). These proteoglycans are known for their ability to regulate cell signaling, adhesion, migration, proliferation and apoptosis in many types of cancer [35,36]. Notably, accumulated evidence supports a role for both decorin and lumican in mediating drug resistance, and thus our data suggest a potential role for these proteoglycans in mediating resistance to palbociclib [37–41]. Other promising genes that were shown to be up-regulated in our dataset were cystatin S and alpha B-crystallin. Elevated blood levels of cystatin-C have been detected in women with breast cancer and are shown to correlate with cancer progression [42,43]. Alpha B-crystallin expression has been associated with high metastatic potential, poor clinical outcome and drug resistance in breast cancer [44–46]. Our findings raise the possibility of using alpha B-crystallin and cystatin-C as biomarkers of sensitivity to CDK4/6 inhibition. Of the top 20 down-regulated genes, miR-646 host gene and homeobox A10 (HOXA10) are of great relevance given their emerging tumor suppressive functions. Expression of miR-646 has been shown to directly regulate CDK6 and FOXK1 expression in gastric cancer, suggesting its utility as a potential therapeutic target [47,48]. A lack of HOXA10 in breast cancer has been shown to decrease apoptosis and promote metastasis, and thus the role of HOXA10 in the context of palbociclib resistance warrants further investigation [49]. A limitation of our studies is the lack of validation of gene expression changes by real-time PCR; however, we believe that our initial profiling will help guide further efforts to better understand the molecular mechanisms driving drug resistance.

Metabolic reprogramming is a well-established oncogenic driver that allows cells to support increased bioenergetic and anabolic demands [50]. Importantly, CDK4/6 are key regulators of metabolic pathways, and therefore we anticipated that metabolic rewiring would be observed upon the development of resistance to palbociclib. While a previous study described an increase in glucose dependence in ER+/Her2- palbociclib-sensitive compared to palbociclib-resistant cells [51], to date little is known about global metabolic changes driving resistance to CDK4/6 inhibition. Our unbiased analysis of DEGs and metabolic pathways began to define metabolic hubs linked to palbociclib resistance. Specifically, we observed alterations in nucleotide metabolism in MCF7/pS vs. MCF7/pR cells. Importantly, these results are in line with previous reports indicating that increased expression of thymidine kinase-1 (TK1), an enzyme of the pyrimidine salvage pathway, correlates with poor prognosis in breast cancer patients treated with palbociclib [52–55].

Our findings indicate that inositol metabolism was altered in ER+ palbociclib-resistant cells. Given the role of inositols as essential membrane components crucial for the generation of secondary messengers, our results are of high biological significance and provide a direct link between signal transduction and metabolic alterations contributing to resistance. Future metabolomic profiling will be needed to confirm our initial findings and provide further evidence as to how inositol alteration contributes to resistance to palbociclib.

Collectively, our RNA-seq analysis uncovered key canonical and metabolic pathways altered in ER+ palbociclib-resistant cells and provided new insights into the molecular mechanisms and potential therapeutic targets underlying resistance to CDK4/6 inhibition.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/4/467/s1, File S1: List of all DEG identified.

**Author Contributions:** Conceptualization, Y.I.-F.; methodology, L.L., C.O., N.L., J.G.G.-G., S.W.; software, E.R., X.L. and S.W.; validation, L.L., E.R., X.L. and Y.I.-F.; formal analysis, E.R., X.L., S.W. and Y.I.-F.; investigation, L.L., N.L. and Y.I.-F.; resources, E.R., X.L., S.W., J.C. and Y.I.-F.; data curation, E.R., X.L. and Y.I.-F.; writing—original draft preparation, Y.I.-F.; writing—review and editing, N.L., X.L., S.W., J.G.G.-G. and Y.I.-F.; visualization, J.G.G.-G. and Y.I.-F.; supervision, Y.I.-F.; project administration, Y.I.-F.; funding acquisition, J.C. and Y.I.-F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Twisted Pink (Yoannis Imbert-Fernandez, PI). Sequencing and bioinformatics support for this work was provided by the National Institutes of Health grants P20GM103436 (Nigel Cooper, PI) and P20GM106396 (Donald Miller, PI).

**Acknowledgments:** We want to thank the University of Louisville's Genomics Core facility and Nigel Cooper and Donald Miller for providing support for the Next-Gen RNA-seq studies.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Splicing Genomics Events in Cervical Cancer: Insights for Phenotypic Stratification and Biomarker Potency**

**Flavia Zita Francies 1, Sheynaz Bassa 2, Aristotelis Chatziioannou 1,3, Andreas Martin Kaufmann 1,4 and Zodwa Dlamini 1,\***


**Abstract:** Gynaecological cancers are the second most diagnosed cancers in women after breast cancer. On a global scale, cervical cancer is the fourth most common cancer and the most common cancer in developing countries, with rapidly increasing mortality rates. Human papillomavirus (HPV) infection is a major contributor to the disease. HPV infections cause prominent cellular changes, including alternative splicing, to drive malignant transformation. A fundamental characteristic attributed to cancer is the dysregulation of cellular transcription. Alternative splicing is regulated by several splicing factors, and molecular changes in these factors lead to cancer mechanisms such as tumour development and progression and drug resistance. The serine/arginine-rich (SR) proteins and heterogeneous ribonucleoproteins (hnRNPs) have prominent roles in modulating alternative splicing. Evidence shows molecular alterations and changes in expression levels of these splicing factors in cervical cancer. Furthermore, aberrant splicing events in cancer-related genes lead to chemo- and radioresistance. Clinically relevant modifications in alternative splicing events and splicing variants in cervical cancer are scrutinised as potential biomarkers for their role in cancer progression and therapy resistance. This review will focus on the molecular mechanisms underlying the aberrant splicing events in cervical cancer that may serve as potential biomarkers for diagnosis, prognosis, and novel drug targets.

**Keywords:** cervical cancer; alternative splicing; biomarkers; SR proteins; hnRNP; drug resistance

**Citation:** Francies, F.Z.; Bassa, S.; Chatziioannou, A.; Kaufmann, A.M.; Dlamini, Z. Splicing Genomics Events in Cervical Cancer: Insights for Phenotypic Stratification and Biomarker Potency. *Genes* **2021**, *12*, 130. https://doi.org/10.3390/genes12020130

Received: 19 November 2020 Accepted: 12 January 2021 Published: 20 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Cervical cancer, also known as cervix uteri cancer, is the fourth most frequently diagnosed cancer globally and the most common malignancy in developing countries [1]. It is the most frequently diagnosed cancer in women in Sub-Saharan Africa (SSA) and the leading cause of cancer-related mortality in this region (Figure 1) [2,3]. An estimated 90% of cervical cancer-related mortality occurs in low- and middle-income countries [4]. Cervical cancer is predominantly categorised into two main histopathological subtypes: squamous cell carcinoma and adenocarcinoma [5]. Over 75–80% of all cervical cancers are squamous cell carcinomas [6,7]. Cervical cancer is attributed to a number of risk factors such as sexually transmitted infections including human immunodeficiency virus (HIV) infection, human papillomavirus (HPV) infection, socioeconomic factors, obesity, smoking [8], alcohol consumption [9], unprotected sex and multiple sexual partners, prolonged usage of oral contraceptives, and family history of cervical cancer [10]. HPV infection is the major contributor to cervical cancer [11]. HPV is a circular double-stranded DNA virus with capsid proteins, with more than 200 subtypes identified and categorised as high and low risk. Of these, about 40 subtypes have an affinity for genital mucosa and are sexually transmitted. The low-risk subtypes are generally associated with genital warts, whereas high-risk subtypes cause invasive cervical cancer. The most prominent high-risk HPV genotypes are HPV16 and 18. Persistent infection with these high-risk subtypes contributes to over 99% of cervical cancers [11]. HPV infections can be prevented by vaccination that confers protection against HPV 6, 11, 16, and 18 subtypes, and depending on the vaccine, subtypes 31, 33, 45, 52, and 58 can also be prevented. The vaccinations are available as a quadrivalent vaccine to target all four subtypes, as a bivalent vaccine to target only the high-risk subtypes [12], or as a combined 9-valent vaccine that targets nine subtypes [13].

**Figure 1.** Global cancer mortality. The age standardised rates (ASRs) of various leading cancers worldwide in 2018. Cervical cancer is a major burden in most parts of Africa. Reprinted from [2].

In addition to HPV infections, dysregulated pathways are a fundamental feature in cervical cancer development and progression. For this reason, research in elucidating modifications in cancer-related pathways and alternative splicing is rapidly emerging. Several studies show aberrant alternative splicing and the dysregulation of gene expression in cervical cancer [14–17]. The related molecular signatures offer potential therapeutic targets for novel drug development and improved strategies in cervical cancer management, particularly for advanced disease in developing countries where HPV infections are the major contributor of cervical cancer.

The burden of cervical cancer mortality due to HPV infections is felt prominently in developing nations. Novel therapeutic targets are warranted to address this issue. Moreover, prevention strategies such as HPV vaccinations and pap smears play a significant role in cervical cancer prevention. Modifications in cervical tissue are detected through pap smears and HPV tests, and early diagnosis allows effective management of the disease [1,10]. This review will focus on the molecular mechanisms underlying the aberrant splicing events in cervical cancer that may serve as potential biomarkers for diagnosis and prognosis and as novel drug targets for their therapeutic properties.

#### **2. Alternative Splicing and Its Implications in Cervical Cancer**

Alternative splicing is an important process in gene expression and proteome diversity. In this cellular process, introns are spliced to join exons for the production of proteins through several mechanisms (Figure 2). Alternative splicing maintains cellular diversity and regulates the synthesis of multiple protein isoforms from the same gene. These protein isoforms perform several biological functions that are necessary for normal cellular functionality. Alternative splicing is an intricate process that is closely regulated by numerous spliceosome factors that aid in recognition of intron and splice sites, such as small nuclear ribonucleoprotein (snRNP) particles and the serine/arginine-rich (SR) proteins [18]. In this process, proteins are synthesised when introns are spliced and functional exons are joined together. The negative regulation of alternative splicing is achieved by heterogeneous ribonucleoproteins (hnRNPs) that block the intron and exon boundaries [19]. These two protein families, the SR proteins and the hnRNPs, are important trans-acting regulatory factors in splicing and are known to be altered in cervical cancer [20–23]. Enhanced levels of SR proteins lead to splicing induction, whereas splicing is inhibited when hnRNPs are overexpressed. Aberrant alternative splicing, resulting from DNA damage, mutations and expression alterations in splicing factors, miRNA disruptions, and unregulated gene expression, is implicated in cancer mechanisms such as sustained cell proliferation, apoptotic evasion, tumour suppressor inhibition, angiogenesis, metastasis, and drug resistance [19,24–26]. Evidence suggests that aberrant alternative splicing plays an important role in the development of cervical cancer. In cervical cancer, alternative splicing is primarily HPV-mediated. Next generation sequencing (NGS) offers a platform to identify potential disease-causing splice variants and genomic changes in splicing regulatory factors/proteins. Elucidating the functions of these splice variants may provide underlying information on malignant transformation and be beneficial in developing novel strategies for therapeutic interventions [27]. For these reasons, modifications in alternative splicing are becoming a significant biomarker with diagnostic and therapeutic potential.

Alternative splicing of key genes may facilitate the development and progression of cervical malignancy. For instance, the 5′ alternative splicing of the *KLHDC7B* gene is closely associated with cellular differentiation and tumour size in 67.5% of squamous cell carcinoma [28]. Similarly, 35% of exon skipping in the *SYCP2* gene was reported in cervical squamous cell carcinoma and associated with invasion and metastases [28]. Evidence also shows the association of cervical cancer and the aberrant alternative splicing of the *IL1RAP* gene. SRSF10 regulates the splicing of the *IL1RAP* gene and promotes the production of its oncogenic isoform, MIL1RAP. This in turn facilitates the malignant cell evasion of phagocytosis by macrophages. Therefore, aberrant alternative splicing of the *IL1RAP* gene promotes immune evasion and cervical cancer [29]. A recent study by Ouyang et al. (2020) provides evidence that supports the notion that aberrant splicing events are closely associated with cervical cancer development, and the identification of these splicing biomarkers may provide useful prognostic and therapeutic tools [30]. The authors identified 2860 alternative splicing events. Of these, SNRPA and CCDC12 were associated with the tumour suppressor gene, p53, and were identified as hub genes in cervical cancer [30]. These results highlight the need to screen candidate biomarkers associated with cervical cancer that may have a clinical utility in diagnosis, prognosis, and therapy. Biomarkers related to cervical cancer are shown in Table 1.

**Figure 2.** Frequent types of alternative splicing mechanisms. Alternative spliced mRNA produces mature transcripts, namely, cassette exons (CEs), mutually exclusive exons (MXEs), alternative 5′ or 3′ splice sites (A5SS and A3SS), intron retention (IR), and alternative polyadenylation (AP). Coloured boxes: exons; black lines: introns [31–35].

HPV contributes to the development and progression of cervical cancer by disrupting alternative splicing and other cellular functions. The HPV genome is double-stranded and circular; it is divided into three regions, namely, the long control region (LCR) and the early and late regions. Each region produces proteins that have different functions in the life cycle of HPV and in cancer development [36]. Persistent HPV infection gives rise to malignancy by producing oncoproteins and the viral proteins necessary to maintain virus replication. Viral oncoproteins facilitate disease development and progression by disrupting the normal regulation of G1 arrest, cell proliferation, apoptosis, and DNA repair, and by promoting chromosomal instability [37]. In addition, HPV oncoproteins bind to splicing factors and induce aberrant alternative splicing events. Moreover, HPV-related cervical cancer has a number of genes and splicing factors that are significantly upregulated. These include genes with vital functions such as immune surveillance, inflammatory response, and tumour suppression [29,37]. Collectively, the interference of HPV in alternative splicing and cellular function promotes transformation, leading to carcinogenesis.

**Table 1.** Overview of biomarkers associated with cervical cancer.


#### *2.1. HPV-Mediated Disruptions in Serine/Arginine-Rich (SR) Proteins*

The spliceosome is crucial in regulating alternative splicing. In addition, other regulatory factors that are short DNA sequences, known as exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs), and intronic splicing silencers (ISSs), ensure accurate splicing [60]. The splicing regulators have either a positive or negative effect on alternative splicing. ESE and ISE are cis-acting elements that are capable of binding the SR protein family to facilitate recognition of exons and initiate assembly of the spliceosome prior to alternative splicing (Figure 3) [61]. SR proteins are also known as SR splicing factors (SRSF) with SRSF1–12 as the major proteins in this family that have been identified as splicing regulators [62]. SR proteins have other vital cellular functions that are hallmarks of cancer, namely, cell cycle regulation, apoptosis, genome stability, cell adhesion and metastasis [27,63], and angiogenesis [64].

**Figure 3.** Alternative splicing regulation by SR binding. The domain structure of the 12 serine/arginine-rich (SR) proteins indicating the RRM (RNA recognition motif), RRMH (RRM homology), RS (arginine/serine-rich domain), and Zn (zinc knuckle). SR proteins bind to exonic splicing enhancers (ESEs), facilitating splice site recognition and stimulating the activation of splicing. In comparison, splicing is inhibited by the binding of the SR proteins to introns [64–66].

The SRSF regulates splicing by determining the cycle of phosphorylation of SR proteins. CDC-like kinases (Clks), SR protein-specific kinases (SRPKs), and Topoisomerase 1 modulate the activation of SRSFs through a cycle of phosphorylation and dephosphorylation [18]. In the event of dephosphorylation, SRSFs begin to accumulate in the cytoplasm [64]. In comparison, phosphorylated SRSFs are transported to the nucleus to stimulate splicing. The SRPKs are capable of splicing regulation by the binding action to Clks in the nucleus and the cytoplasm [21]. In addition to splicing regulation, evidence suggests that SRPKs are able to modulate viral genomic material such as the HPV [18,21,31]. Evidence shows the binding of HPV E4 protein to SRPK1 [18]. This binding action impedes activation of SR protein by inhibiting the phosphorylation of SRSF1, SRSF3, SRSF4, and SRSF7 and impedes the pre-mRNA processing (Figure 4) [67]. This leads to aberrant cellular splicing that results in oncoprotein production and cervical cancer [18].

**Figure 4.** Human papillomavirus (HPV)-mediated disruption of accurate mRNA processing. The binding of HPV E4 protein leads to the deactivation of SR splicing factors (SRSFs) 1, 3, 4, and 7 by the loss of phosphorylation. This concomitantly inhibits pre-mRNA splicing, leading to inaccurate splicing and production of oncoproteins that give rise to malignancy [18,67]. SRPK: SR protein-specific kinase.

Evidence sheds light on the oncogenic role of SRSF1 [68] and a recent report shows its involvement in cervical malignancy [21]. Mole et al. (2020) showed enhanced levels of SRSF1 in cervical cancer cells. The authors showed the trans-activation of the SRSF1 gene promoter by the high-risk HPV16 E2 protein, with differing levels in the nucleus and cytoplasm [21]. Modifications of SRSF1 abrogate alternative splicing and facilitate genomic instability and cervical malignancy. Hence, the results suggest that the increased cytoplasmic levels of SRSF1 are associated with early tumour progression [21]. Other evidence shows that SRSF1 binds to long non-coding RNAs (lncRNAs) to regulate expression levels of keratin 17. Cervical cancer cells display enhanced levels of keratin 17. Dong et al. (2019) showed the interplay between SRSF1 and lncRNA to modulate expression of keratin 17 through alternative splicing in cervical cancer [22]. In addition to SRSF1, SRSF3 regulates the expression of a number of genes and the overexpression of SRSF3 has been shown to modulate cell proliferation by inducing G2/M cell cycle arrest and apoptosis [69,70]. SRSF3 induces production of interleukin enhancer binding factor 3 (ILF3) isoform 1 and 2 through aberrant alternative splicing. These isoforms are involved in malignant transformation [71]. Furthermore, SRSF3 regulates expression of p300, a tumour suppressor, and induces cell proliferation [70]. In HPV-infected cervical cells, SRSF3 plays a significant part in the E6\* splicing that is vital for E7 production [72] and in E1^E4 splicing for viral replication [73]. Silencing SRSF3 in HPV-infected cells shows downregulation of viral E6 and E7 [72] and suppresses the E1^E4 splicing [73]. These results highlight the oncogenic potential of SRSF3 that may lead to cellular transformation and may contribute to cervical cancer [69].

DNA damage response plays a vital role in maintaining genomic stability and preventing carcinogenesis. Several important genes are involved in DNA damage pathways such as RAD51, ATM, p53, and ERCC1 [74]. Detecting modifications in DNA repair genes could be beneficial as biomarkers for diagnosis, prognosis, and targets for therapy. For instance, evidence shows the upregulated RAD51 mRNA levels in cervical cancer, which are associated with poor prognosis [75]. In addition to somatic mutations, HPV induces DNA damage in cervical cancer cells [76] and the resulting DNA damage response gene expression serves as prognostic biomarkers [77]. New evidence shows the association between SRSF6 and DNA damage genes. Yang et al. (2020) evaluated the function of SRSF6 in cervical cancer cells and showed that overexpressed SRSF6 influenced the alternative splicing of DNA damage genes [78]. SRSF6-induced aberrant alternative splicing of DNA damage genes is associated with the hallmarks of cancer such as cell proliferation, tumour progression, and apoptosis [78]. Elucidating the functional impact of SRSF6 in alternative splicing of DNA damage genes could offer a target for cervical cancer therapy.

#### *2.2. HPV-Mediated Disruptions in Heterogeneous Ribonucleoproteins (hnRNPs)*

The ESS and ISS act as negative regulators to repress alternative splicing and bind to the hnRNP family of proteins. Similar to SR proteins, the hnRNPs can either positively or negatively regulate splicing; by binding to ESS and ISS, they negatively affect exon definition (Figure 5). There are currently at least 20 hnRNPs identified with several important cellular functions including alternative splicing [31]. Loss of regulation in hnRNPs leads to modified gene expression of tumour suppressors and other cancer-related genes [27,79]. Hence, hnRNPs are implicated in malignant transformation and could be scrutinised as potential cancer-related biomarkers.

**Figure 5.** Alternative splicing regulation and the structural domains of hnRNP family. The domain structure of the hnRNP showing the RRM (RNA recognition motif), KH (K homology domain), and other RNA binding domain that is structurally different from RRM. hnRNP negatively regulates this process by binding to either exonic splicing silencers (ESSs) or intronic splicing silencers (ISSs). In addition, hnRNP blocks the activity of exonic splicing enhancers (ESEs) by binding to it [34,65,80].

Alternative splicing events are frequent in cervical cancer and are significantly associated with diagnosis and prognosis. Major splicing factors promote cervical malignancy by facilitating the production of the required HPV mRNAs and oncoproteins. In addition, cellular oncogenic protein production is favoured to enhance the development of cervical cancer (Table 2). Cervical cancer cells have elevated expression of hnRNPs. For instance, hnRNPA1 is highly expressed in cervical cancer cells and can disrupt cancer-related genes. The alternative splicing of pyruvate kinase mRNA is induced by hnRNPA1 and favours aerobic glycolysis, resulting in uncontrolled cell proliferation. In the event where hnRNPA1 is downregulated, cancer-specific apoptosis is induced. hnRNPA1 is thus a good biomarker for cervical cancer diagnosis [23]. Another recent study investigated prognostic biomarkers of alternative splicing in cervical cancer and revealed hnRNPA1, ubiquitin C, and RNA polymerase II subunit L as effective prognostic biomarkers [81]. As a crucial component in alternative splicing, scrutinising the aberrant splicing induced by hnRNPA1 in cervical cancer is critical. Additionally, during the HPV infection-related differentiation of cervical epithelial cells, hnRNPA1 is further upregulated and enables oncoviral protein transcription. Deleterious mutations in hnRNPA1 have been identified and may alter expression levels contributing to aberrant alternative splicing, mRNA processing, and translation [82].

**Table 2.** The role of major splicing factors, the human papillomavirus (HPV) binding region, and function of transcripts in cancer progression.


UTR: Untranslated region. Reviewed in [31].

Prolonged HPV infections influence cellular and viral alternative splicing to enhance viral oncogene production, leading to malignant transformation of the cervix. Malignant transformation is initiated and sustained by the high-risk HPV16 E6 and E7 proteins that interact with the tumour suppressor proteins p53 and retinoblastoma protein (pRb), respectively. The interaction of E6 with p53 results in apoptosis, whereas E7 steers cell proliferation by interacting with pRb [83,84]. Moreover, E6 and E7 are essential in viral replication [85]. Zheng et al. (2020) showed splicing regulation of E6 and E7 by cellular hnRNPA1 and hnRNPA2 [20]. This study revealed the direct interaction of hnRNPA1 and hnRNPA2 with the high-risk HPV16 splice sites SA742 and SA409. The authors showed that overexpression of hnRNPA1 inhibits SA409, favouring viral E6 mRNA production. In comparison, when hnRNPA2 is upregulated, the viral E6^E7, E1, and E4 mRNA transcripts are favoured [20]. Adequate amounts of both E6 and E7 transcripts are required for the development of cervix carcinoma. Furthermore, evidence also shows that HPV interacts with hnRNPA1 and that the silencing of hnRNPA1 suppresses E6 intron retention [73]. Hence, targeting hnRNPA1 and hnRNPA2 to modulate viral E6 and E7 mRNA transcripts may provide novel therapeutic strategies.

#### **3. Alternative Splicing and Therapy Resistance**

Drug resistance is a considerable hurdle in cancer treatment and management. Aberrant alternative splicing events are a common theme in cancer drug resistance and, therefore, strategies targeted to silence variants that promote drug resistance are highly warranted. Aberrant splice variants can promote resistance to chemotherapy and radiotherapy [24,86–88] by mechanisms that include apoptotic regulation, modified drug metabolism, response to DNA damage, and regulation of cell proliferation (Figure 6) [89]. Radiotherapy is an important therapeutic modality for the management of advanced cervical cancer and radioresistance may be detrimental. In cervical cancer, a splice variant of nucleophosmin (NPM) protein resulting from alternative splicing causes radioresistance [86]. NPM functions in mRNA processing, genome stability, and apoptotic regulation [90]. Enhanced expression of the NPM2 variant is correlated with a radio-protective function. Evidence shows that silencing the NPM2 splice variant decreases radioresistance in cervical cancer cells in a dose-dependent manner [86]. Similarly, enhanced levels of ΔNp73, a splice variant of p73, have anti-apoptotic functions and display radioresistance in cervical cancer cells [91]. p73 (i) is a p53 homologue that expresses the oncogenic isoform ΔNp73 [92]; (ii) functions in DNA damage repair, cell cycle regulation, and apoptosis with p73 polymorphism closely associated with cervical cancer [93]; and (iii) is a prognostic biomarker for cervical cancer [94]. Cervical cancer cells exposed to high-LET radiation degrade ΔNp73 to exhibit enhanced apoptosis and cell cycle arrest at the G2/M phase when compared with low-LET radiation [91]. In addition, ΔNp73 promotes malignant transformation by interacting with RAS and inducing drug resistance to chemotherapy and radiotherapy [87]. Furthermore, the HPV oncoprotein, E6, suppresses the activity of p53 expression and alters sensitivity to radiotherapy. The overexpression of the splice variant, p73α, in p53 deficient cervical cancer cells, enhances sensitivity to radiotherapy [95]. These results highlight the importance of targeting aberrant splice variants to reverse radioresistance in cervical cancer, which is significantly relevant in treating advanced metastatic disease.

Cervical cancer is often managed with chemotherapy and radiotherapy concurrently. An estimated 50% of patients do not attain a complete response to therapy due to resistance. Alterations in molecular pathways that promote drug resistance are potential drug targets to counteract resistance [96]. For instance, the CRK-like (CRKL) adapter protein is overexpressed in approximately 50% of cervical cancers. Moreover, evidence shows that CRKL significantly regulates alternative splicing of pre-mRNA in cancer-related genes in cervical carcinoma to promote malignant transformation, metastases, and chemoresistance by binding to BCR-ABL and activating the Src and Akt signalling pathways through phosphorylation [97,98]. Additionally, recent evidence shows the role of AKT3 mRNA in inducing cisplatin resistance [99]. By blocking the activity of Src and Akt through pharmacological inhibitors such as dasatinib [97] and fucoxanthin [100], respectively, aberrant splicing events that facilitate chemoresistance in cervical cancer can be reversed, promoting a complete therapy response in advanced metastatic disease.

**Figure 6.** Alternative splicing-induced drug resistance. Aberrant splicing events of vital genes in cervical cancer cells promote drug resistance through several mechanisms by regulating apoptosis, cell cycle arrest, cell proliferation, and DNA damage response. In addition, splice variants may also alter drug targets that affect drug metabolism and lead to chemoresistance and alter the sensitivity to radiotherapy [86,87,89,95,97–99].

#### **4. Clinical Utility of Biomarkers in Cervical Cancer**

Altered expression of splicing regulators, deleterious mutations in splicing regulators and splicing regulatory sequences, and suppressed activity of splicing regulators can cause aberrant alternative splicing, which may result in tumourigenesis and therapy resistance (Figure 7). However, alternative splicing biomarkers have been studied extensively as potential targets of novel therapy [24,27]. The current diagnostic and prognostic indicators of cervical cancer are largely clinicopathology and HPV screening intensive. With the introduction of NGS, large-scale RNA sequencing can be clinically utilised to identify tissue-specific molecular biomarkers. Subsequent to the identification of onco-biomarkers, functional biological assays are imperative to characterise the properties of effective and clinically significant biomarkers for novel clinical utility in diagnosis, prognosis, and therapeutic interventions [27].

**Figure 7.** Overview of clinical biomarker identification. Aberrant alternative splice variants are often expressed in significantly higher levels compared with normal splice variants that can be identified through next generation sequencing (NGS). These aberrations can contribute to the development of tumourigenesis, therapy resistance, and poor prognosis. The effects of aberrant alternative splicing can be addressed by identifying cervical cancer-specific genomic and splicing aberrations that are clinically relevant for novel diagnostic, prognostic, and therapeutic purposes [24,81].

Reversing aberrant alternative splicing or silencing oncogenic variants could offer therapeutic strategies in managing cervical cancer. Pharmacological agents are frequently evaluated for their splicing inhibitory or silencing effects in cancer cells. The current alternative splicing modulators studied are small molecule splicing inhibitors, trans-splicing, antisense oligonucleotides, and gene therapy. These modulators can regulate alternative splicing by controlling the functioning of spliceosomal activity [27]. For instance, caffeine suppresses the expression of SRSF2/3 and p53α, while upregulating the alternatively spliced variant p53β. Caffeine regulates cellular functions such as cell cycle arrest, DNA damage, and apoptosis by modulating SRSF3 [101,102]. Cervical cancer cells treated with caffeine showed tumour suppression through the modulation of splicing factors. In addition, recent evidence shows that pladienolide B inhibits the splicing factor SF3b1, which is a subunit of the spliceosome, to induce G2/M cell cycle arrest, apoptosis, and p73 splicing in cervical cancer cells [103]. Other small molecules evaluated in cervical cancer include RI-1, a RAD51 inhibitor [104]. Modified gene expression is a central characteristic of cancer cells, such as the altered expression of RAD51 mRNA in cervical cancer cells compared with healthy cells [75]. RI-1 promotes cell cycle arrest from G0/G1 to S phase and inhibits the RAD51-induced cell proliferation in cervical cancer cells [104]. These results indicate the potential of pharmacological agents to regulate alternative splicing in cervical cancer and their therapeutic potential.

Inhibiting splicing factors can evoke a tumour suppressive function. For instance, blocking the function of SRSF1 may contribute to apoptotic activity. Cervical cancer cells treated with an AURKA kinase inhibitor, such as the pharmacological agent VX-680, downregulate the post-transcriptional expression levels of SRSF1 [105]. AURKA kinases, part of the aurora family of proteins, are cell division regulators. Dysregulation of these proteins leads to uncontrolled cell division and proliferation, resulting in malignancy [42]. Cervical cancer cells treated with VX-680 promote aberrant alternative splicing of apoptotic regulating genes, Bcl-x and Mcl-1, and inhibit the anti-apoptotic function of SRSF1, leading to apoptosis [105]. Silencing of SRSF1, therefore, signifies a novel therapeutic target for cervical cancer.

#### **5. Conclusions**

The mortality associated with cervical cancer is increasing at an alarming rate. The development of cervical cancer is largely influenced by HPV infections in low- and middle-income countries that add to this encumbrance. Vaccination programs addressing HPV have been successful in lowering HPV infections in high-risk women. Moreover, screening and prevention programs are useful in early detection and treatment. In addition to HPV infections, molecular alterations at the RNA level contribute to cervical carcinoma. These include modifications in cellular alternative splicing induced by HPV. RBPs like SRs and hnRNPs are essential in maintaining the stability and packing of mRNAs, as well as transport to the cytoplasm for further processing. These processes are intricately balanced by several splicing factors and proteins to ensure accurate alternative splicing. Despite the stringent regulation, SR proteins and hnRNPs are often dysregulated in cervical cancer and lead to aberrant alternative splicing of many important cancer-related genes, including those linked to therapy resistance. For these reasons, SR proteins and hnRNPs are ideal candidates for drug targets. Hence, identifying biomarkers crucial to the development of cervical malignancy, its pathogenesis, and splice variants that are highly expressed in cervical cancer will be beneficial in developing novel therapeutic targets, especially in low- and middle-income countries where the burden of cervical cancer is rapidly increasing.

**Author Contributions:** F.Z.F. was involved in writing original draft preparation and editing; S.B. was involved in review and editing; A.C. was involved in final review and editing; A.M.K. was involved in final review and editing; Z.D. conceived the idea, funding acquisition, supervision, and involved in review and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Medical Research Council of South Africa, grant number: 23108.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We would like to thank the Medical Research Council of South Africa (SAMRC) and the University of Pretoria for funding this research.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


## *Article*

## **Quantitative Proteomics of Urinary Bladder Cancer Cell Lines Identify UAP1 as a Potential Therapeutic Target**

**Vinuth N. Puttamallesh 1,2, Barnali Deb 1,3, Kirti Gondkar 1,2, Ankit Jain 1, Bipin Nair 2, Akhilesh Pandey 1,3,4,5,6, Aditi Chatterjee 1,2,3, Harsha Gowda 1,2,3 and Prashant Kumar 1,2,3,\***


Received: 4 May 2020; Accepted: 28 May 2020; Published: 8 July 2020

**Abstract:** Bladder carcinoma (BC) incidence and mortality rates are increasing worldwide. The development of novel therapeutic strategies is required to improve clinical management of this cancer. Aberrant protein expression may lead to cancer initiation and progression. Therefore, the identification of these potential protein targets and limiting their expression levels would provide alternative treatment options. In this study, we utilized a liquid-chromatography tandem mass spectrometry-based global proteomics approach to identify differentially expressed proteins in bladder cancer cell lines. A total of 3913 proteins were identified in this study, of which 479 proteins were overexpressed and 141 proteins were downregulated in 4 out of 6 BC cell lines when compared with normal human urothelial cell line (TERT-NHUC). We evaluated the role of UDP-N-acetylhexosamine pyrophosphorylase (UAP1) in bladder cancer pathogenesis. The silencing of UAP1 led to reduction in proliferation, invasion, colony formation and migration capability of bladder cancer cell lines. Thus, our study reveals UAP1 as a promising therapeutic target for bladder cancer.

**Keywords:** quantitative proteomics; urothelial carcinoma; molecular subtypes; therapeutic target

#### **1. Introduction**

Bladder carcinoma (BC) is the most common malignancy of the urinary tract. The majority of patients are diagnosed with non-muscle invasive bladder cancer (NMIBC; 60%), which has a recurrence rate of 50%–70%, and about 20% of patients are diagnosed with muscle invasive bladder cancer (MIBC) [1]. The risk of progression from NMIBC to MIBC after 5 years ranges from 6% to 45% [2,3]. MIBCs are biologically aggressive, with a five-year survival rate of less than 15% [4]. In the past few years, surgical advancements have improved patient outcomes. However, despite the use of neoadjuvant and adjuvant therapies, the long-term survival rates in BC patients have remained unchanged for over a decade [5]. Furthermore, progress is required to develop a common molecular classifier that can be used for effective clinical decisions [6].

Several FDA-approved immune checkpoint inhibitors, such as atezolizumab, nivolumab, pembrolizumab, durvalumab and avelumab, have been used to treat bladder cancer, with response rates ranging from 15% to 25% [7,8]. Moreover, extensive efforts have been made to identify therapeutic targets for BC, by targeting proteins belonging to various biological functions. These include casein kinase 2 (CK2), C-X-C motif chemokine ligand 1 (CXCL1), eukaryotic translation initiation factor 3 subunit D (EIF3D), adenylate kinase 4 and yes-associated protein 1 (YAP1) [9–12]. Alternatively, several proteins which are studied in other cancer types, such as androgen receptor (AR), aurora kinase A (AURKA), epidermal growth factor receptor (EGFR), focal adhesion kinase (FAK), fibroblast growth factor receptor (FGFR) and transforming growth factor-β-induced (TGFBI), have also been explored in BC for their potential as therapeutic targets [13–16]. Though all these efforts provided multiple avenues for reducing the tumor burden in BC, they fell short of successful translation into treatment options.

Proteomics is a high-throughput technology that can be used to identify differentially expressed proteins that potentially play an important role in pathogenesis. As a rapidly evolving technology platform, it has the potential to identify novel proteins involved in key biological processes in the cell that may serve as potential drug targets. Therefore, an unbiased investigation of the proteomic alterations in BC using a high-resolution mass spectrometry-based approach will aid in identifying alternative therapeutic targets.

In this study, we investigated six BC cell lines (SW780, RT112, VMCUB1, T24, J82 and UMUC3), which were previously characterized as luminal, basal and non-type subtype [17]. Previously, it was shown that the EMT score of non-type molecular subtype is more "mesenchymal-like," whereas the luminal/basal subtypes are "epithelial-like." The non-type subtype cell lines show an increased migratory and invasive phenotype, reflecting typical characteristics of a mesenchymal-like phenotype [18]. We performed quantitative proteomic analysis to identify global proteomic changes in these BC cell lines. Our study identified and quantified 3913 proteins. We explored the role of UAP1 for the first time in bladder carcinoma, which was overexpressed in basal and non-type subtype cell lines. UAP1 plays an important role in energy metabolism and has been reported to be involved in prostate cancer pathogenesis [19]. Furthermore, the functional characterization of UAP1 using siRNA-based silencing in non-type subtype BC cell lines resulted in a decrease in the cell proliferation, colony formation, invasion and migration properties of highly aggressive bladder cancer cell lines.

#### **2. Materials and Methods**

#### *2.1. Cell Lines and Culture Conditions*

BC cell lines SW780, RT112, VMCUB1, T24, J82 and UMUC3 and the normal human urothelial cell line TERT-NHUC (JTERT) were received from Prof. Jean Paul Thiery (Department of Biochemistry, National University of Singapore, Singapore). The non-type (UMUC3, J82, T24), luminal (RT112, SW780) and basal (VMCUB1) BC cell lines were cultured in DMEM medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin. TERT-NHUC was grown in KGM Gold™ keratinocyte growth medium containing supplements (bovine pituitary extract, human epidermal growth factor, insulin, hydrocortisone, epinephrine, transferrin, gentamicin and amphotericin-B). All the cell lines were maintained at 37 °C in a humidified 5% CO2 incubator.

#### *2.2. Cell lysis and Protein Digestion*

Cells were lysed in lysis buffer (50 mM triethyl ammonium bicarbonate (TEABC) pH 8.0, 2% SDS, 1 mM sodium orthovanadate, 2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate, 1 mM sodium fluoride), sonicated and centrifuged at 16,000× *g* for 20 min. Protein estimation of the clarified lysate was done using the bicinchoninic acid (BCA) assay (Pierce, Waltham, MA, USA) according to the manufacturer's instructions. Equal amounts of protein from all the cell lines were reduced using 10 mM dithiothreitol (DTT) for 30 min at 60 °C, followed by alkylation using 20 mM iodoacetamide (IAA) in the dark for 10 min at room temperature. The reduced and alkylated protein lysate was subjected to acetone precipitation, using 7 volumes of pre-chilled acetone to remove SDS from the solution. Protein digestion was performed using sequencing grade trypsin (Promega, Madison, WI, USA) at a 1:20 enzyme:substrate ratio at 37 °C for 12–14 h. The tryptic peptides were vacuum dried and stored until further use.

#### *2.3. TMT Labeling and Basic pH Reverse Phase Liquid Chromatography (bRPLC)*

TMT labeling was done according to the manufacturer's instructions, with minor modifications. Briefly, the TMT labels were reconstituted in 41 μL of anhydrous acetonitrile (ACN) and trypsin digested peptide samples were reconstituted in 100 μL of 50 mM TEABC (pH 8.0). Tandem mass tag (TMT) labels 128N, 128C, 129C, 130N, 130C and 131 were used for labeling bladder cancer cell line samples, and TMT label 126 was used for the TERT-NHUC control cell line. TMT labels were mixed with the respective samples and the reaction was incubated for 1 h at room temperature. After incubation, the reaction was quenched with 8 μL of 5% hydroxylamine. The labeled peptides were lyophilized and subjected to basic pH reverse phase chromatography (bRPLC). Lyophilized samples were reconstituted in bRPLC solvent A (10 mM TEABC, pH 9) and were separated on an XBridge BEH C18 Column (Waters, UK), using solvent B (10 mM TEABC with 90% acetonitrile, pH 9). The column was equilibrated at 5% Solvent A from 0 to 5 min; the solvent B percentage was gradually increased from 5% to 55% in the following 60 min, then increased from 55% to 90% over the following 10 min, and then maintained at 90% Solvent B for 10 min before being decreased to 5% for 2 min on an Agilent 1100 Liquid Chromatography (LC) system, with a flow rate of 1 mL/min. A total of 96 fractions were collected over the 60 min gradient and later concatenated into 12 fractions. Fractionated samples were lyophilized before LC-MS/MS analysis.
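The concatenation scheme used to reduce 96 fractions to 12 is not described above. As an illustration only, the following sketch assumes the commonly used interleaved pooling, in which every 12th fraction is combined so that each pooled sample spans the whole gradient. The function name and the interleaving rule are assumptions, not details taken from this study.

```python
def concatenate_fractions(n_fractions=96, n_pools=12):
    """Assign 1-based fraction numbers to pools by interleaving.

    Assumed scheme (not stated in the Methods): fraction f goes to
    pool ((f - 1) mod n_pools) + 1.
    """
    pools = {pool: [] for pool in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        pool = (fraction - 1) % n_pools + 1
        pools[pool].append(fraction)
    return pools


pools = concatenate_fractions()
print(pools[1])   # [1, 13, 25, 37, 49, 61, 73, 85]
print(pools[12])  # [12, 24, 36, 48, 60, 72, 84, 96]
```

Under this assumption each of the 12 pooled samples contains 8 of the original fractions.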

#### *2.4. LC-MS/MS Analysis*

LC-MS/MS analysis was done on an Orbitrap Fusion mass spectrometer (Thermo Electron, Bremen, Germany), interfaced with Easy-nLC1000 nanoflow LC system (Thermo Scientific, Odense, Denmark). The peptides were reconstituted in 0.1% formic acid and loaded onto a trap column (nanoviper 2 cm, 3 μ magic C18Aq, Thermo Scientific). Peptides were resolved on an analytical column (nanoviper 25 cm (75 μm silica capillary, 3 μm magic C18, Thermo Scientific)), at a flow rate of 300 nL/min, using a linear gradient of 7–35% solvent B (0.1% formic acid in 100% ACN) over 100 min. The total run time including sample loading and column reconditioning was 120 min. Data-dependent acquisition with full scans in 350–1700 m/z range was carried out using an Orbitrap mass analyzer at a mass resolution of 120,000 at 400 m/z. Most intense precursor ions from a survey scan were selected for MS/MS fragmentation using higher energy collision dissociation (HCD) fragmentation, with 35% normalized collision energy and detected at a mass resolution of 30,000 at 400 m/z. AGC target value was set to 50,000, with maximum ion injection time of 150 ms. For MS3 analysis, synchronous precursor selection was enabled and 10 precursor ions were selected for fragmentation with 55% HCD collision energy.

#### *2.5. Data Analysis*

The mass spectrometry data were searched using the MASCOT (version 2.2.0) and SEQUEST search algorithms against the Human RefSeq database (version 89), using Proteome Discoverer (version 2.2; Thermo Fisher Scientific, Bremen, Germany). The search parameters for both algorithms included carbamidomethylation of cysteine residues (57.02 Da) and TMT modification at the peptide N-terminus and lysine side chain as fixed modifications, and oxidation of methionine (15.99 Da) as a dynamic modification. MS/MS spectra were searched with a precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.1 Da. Trypsin was specified as the protease, and a maximum of two missed cleavages were allowed. The data were searched against a target-decoy database and the false discovery rate was set to 1% at the peptide level. The TMT ratio for each peptide-spectrum match was calculated by the quantitation node.
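The idea behind a target-decoy false discovery rate (FDR) cutoff can be sketched as follows. This is a simplified illustration, not the procedure implemented in Proteome Discoverer: it assumes the FDR at a given score cutoff is estimated as the number of decoy matches divided by the number of target matches at or above that cutoff. The function name, the scores and the decoy flags are all hypothetical.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest target score whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    FDR estimate at a cutoff (assumed estimator): decoys / targets above it.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        if decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical search results: 100 high-scoring targets, 3 decoys, 50 weaker targets.
psms = (
    [(s, False) for s in range(101, 201)]
    + [(s, True) for s in (100, 99, 98)]
    + [(s, False) for s in range(48, 98)]
)
print(score_threshold_at_fdr(psms))  # 101
```

In this toy example, accepting any match scoring below 101 would admit three decoys and push the estimated FDR above 1%, so the cutoff stays at 101.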

#### *2.6. Protein-Protein Interaction Networks*

Interaction network was analyzed using the STRING functional protein association network (https://string-db.org; version: 11.0; University of Zurich, Zurich, Switzerland) [20]. The input consisted of the 122 proteins that were overexpressed in all the BC cell lines and was set to highest confidence (0.90) of active interaction. The disconnected nodes were hidden, and K-means clustering was conducted to identify three clusters in the dataset.

#### *2.7. Western Blotting*

Cell lines were cultured up to 70% confluency and cells were harvested using RIPA lysis buffer (10 mM Tris pH 7.4, 150 mM NaCl, 5 mM EDTA, 1% Triton-X-100, 0.1% SDS containing protease and phosphatase inhibitor cocktails) and sonicated to extract proteins. Western blot analysis was performed as previously described, using 30 μg protein lysates [21]. Nitrocellulose membranes were hybridized with primary antibodies and developed using Luminol reagent (Santa Cruz Biotechnology, Dallas, TX, USA), as per the manufacturer's instructions. Anti-UAP1 (HPA014659) antibody and β-actin antibody were obtained from Sigma (St. Louis, MO, USA). Anti-GAPDH antibody was obtained from Abcam (Cambridge, MA, USA).

#### *2.8. siRNA Transfection*

ON-TARGETplus SMARTpool control siRNA and UAP1 siRNA (catalog ID: L-017160-01-0005) were procured from Dharmacon (Lafayette, CO, USA), and BC cell lines (UMUC3, J82 and T24) were transfected with control and UAP1 siRNA using RNAiMAX reagent (Invitrogen, Carlsbad, CA, USA), according to the manufacturer's instructions. Cells were subjected to invasion assays and colony formation assays 48 h post-transfection, unless otherwise stated.

#### *2.9. Cell Proliferation Assay*

Cell proliferation assays were carried out as described previously [22]. Cells from UMUC3, J82 and T24 cell lines were seeded at a density of 4000 cells/well into a 96-well plate and cells were counted subsequently after every 48 h. Cell proliferation was determined using MTT (3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) assays, as described. Absorbance was measured at 570 nm and 650 nm over a period of 4 days and growth kinetics were plotted. All the experiments were carried out in triplicate.

#### *2.10. Colony Formation Assay*

Colony formation assays were carried out as described previously, with minor modifications [21]. UMUC3, J82 and T24 cells were transfected with UAP1 siRNA or scramble siRNA. At 48 h post-transfection, 1000 cells/well were seeded in 6-well plates with complete media and allowed to grow for 7 days. The resulting colonies were fixed with methanol and stained with Giemsa (Sigma, St. Louis, MO, USA). The number of colonies per dish was counted. All experiments were performed in triplicate and the standard deviation was calculated.

#### *2.11. Wound Healing Assay*

The wound healing assays were performed as described previously [23]. Briefly, 100,000 cells were seeded in 6-well plates in triplicate for each condition. UMUC3, J82 and T24 cells were treated with UAP1 siRNA or control siRNA. The cells were allowed to grow until full confluency was achieved. A wound of uniform size was introduced from end to end of each well for each condition and a photograph was taken at 0 h. Cells were then observed periodically for wound healing and a photomicrograph was taken at 8 h under a microscope. All experiments were performed in triplicate, unless otherwise indicated.

#### *2.12. Invasion Assay*

Invasion assays were performed in a transwell system (BD Biosciences), with Matrigel-coated filters and cellular invasion was evaluated after 48 h, as described previously [24]. Briefly, invasiveness of the cells was assayed in the membrane invasion culture system using polyethylene terephthalate (PET) membrane (8-μm pore size), in the upper compartment of a transwell coated with Matrigel (BioCoat Matrigel Invasion Chamber; BD Biosciences). The cells were transfected with either control or UAP1 siRNA and seeded at 2.0 × 10<sup>4</sup> cells per 500 μL of media on the Matrigel-coated PET membrane in the upper compartment. The lower compartment was filled with complete growth media and the plates were maintained at 37 °C for 48 h. At the end of the incubation time, the upper surface of the membrane was wiped with a cotton-tip applicator, to remove non-migratory cells. Cells that migrated to the lower side of membrane were fixed and stained using 4% methylene blue. The membranes were cut out from the transwell and mounted on glass slide using Dibutylphthalate Polystyrene Xylene (DPX) and covered with a microscope cover slip. The number of cells that penetrated was counted for six randomly selected viewing fields at 10× magnification.

#### *2.13. Statistical Analysis*

Statistical analyses were carried out using GraphPad Prism version 6 (GraphPad Software, La Jolla, CA, USA). For cell culture-based assays (proliferation, invasion, colony formation and wound healing) non-parametric test (Mann–Whitney U test) was used to assess statistical significance. For proteomics data, a statistical analysis was done using one-way ANOVA for individual proteins.
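For readers unfamiliar with the Mann–Whitney U test, the sketch below computes the U statistic and an exact two-sided p-value by enumerating every possible group assignment of the pooled observations. It is a plain-Python illustration rather than the GraphPad Prism implementation used in the study, and the per-field cell counts are invented solely for the example.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: count of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value from all relabelings of the pooled data."""
    pooled = list(x) + list(y)
    n, n1 = len(pooled), len(x)
    mean_u = n1 * (n - n1) / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    hits = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed:
            hits += 1
    return hits / total


# Hypothetical invaded-cell counts in six viewing fields per condition.
control_sirna = [41, 38, 45, 40, 43, 39]
uap1_sirna = [17, 21, 15, 19, 22, 18]

u = mann_whitney_u(control_sirna, uap1_sirna)
p = exact_two_sided_p(control_sirna, uap1_sirna)
print(u)  # 36.0
print(p)  # 2/924, about 0.0022, because the two groups do not overlap
```

The exhaustive enumeration is practical only for small samples such as these; statistical packages switch to a normal approximation for larger groups.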

#### **3. Results**

#### *3.1. Identification of Proteins with Altered Expression in Urinary Bladder Cancer Cell Lines through Quantitative Proteomics*

We studied the protein expression of BC cell lines (SW780, RT112, VMCUB1, T24, J82 and UMUC3) and a normal human urothelial cell line (TERT-NHUC) using tandem mass tag (TMT)-based labeling technology, coupled with high-resolution mass spectrometry to identify differentially expressed proteins. The experimental workflow used for differential proteomic expression analysis in this study is depicted in Figure 1. We identified and quantified 3913 proteins across all the cell lines in triplicate mass spectrometry analysis and calculated their fold-change values based on reporter ion intensities (Supplementary Table S1; Supplementary Figure S1).

**Figure 1.** Workflow for quantitative proteomic analysis of urinary bladder cancer cell lines. Cultured bladder cancer (BC) cell lines and the normal human urothelial cell line (TERT-NHUC) were harvested in cell lysis buffer. Protein extraction was done using a probe sonicator, followed by protein estimation, normalization, trypsin digestion and tandem mass tag (TMT) labeling. TMT labeled peptides were pooled and subjected to fractionation and LC-MS/MS analysis. Proteomics data were analyzed and the candidate molecule was validated.

#### *3.2. Distinct Protein Expression Pattern of Non-Type/Basal as Compared to Luminal Bladder Carcinoma Cell Lines*

We compared the protein expression pattern of the non-type/basal and luminal subtypes of bladder carcinoma cell lines. Unsupervised clustering based on both rows and columns was conducted with one minus Kendall's correlation with average linkage and we observed that the luminal cell lines showed a distinct expression pattern (Figure 2A). We compared differentially expressed proteins (>2 fold) from all the non-type/basal and luminal cell lines. We identified 135 proteins that were overexpressed and 53 proteins that were downregulated exclusively in non-type/basal cell lines (Figure 2B). Principal component analysis revealed that the luminal cell lines show a distinct expression pattern as compared to the non-type/basal cell lines (Figure 2C).
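The distance measure used for the clustering, one minus Kendall's correlation, can be illustrated with a short plain-Python sketch. It computes Kendall's tau-a (which assumes no tied values) between expression profiles and converts it into a distance; the clustering step itself is omitted. The profile names and log2 fold-change values are hypothetical and are not data from this study.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical log2 fold changes of five proteins in three cell lines.
profiles = {
    "luminal_1": [0.2, 1.5, -0.7, 0.9, -1.2],
    "luminal_2": [0.1, 1.8, -0.5, 1.1, -1.0],
    "nontype_1": [1.4, -0.3, 0.8, -0.9, 0.6],
}

# Distance = 1 - tau: 0 for identical rankings, up to 2 for reversed rankings.
distance = {
    (p, q): 1 - kendall_tau(profiles[p], profiles[q])
    for p, q in combinations(profiles, 2)
}
for pair, d in distance.items():
    print(pair, round(d, 2))
```

Here the two luminal-like profiles rank the proteins identically and so have a distance of 0, whereas the non-type-like profile lies at a distance of 1.2 from both; average-linkage clustering would therefore join the luminal-like profiles first.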

**Figure 2.** Comparison of the protein expression of non-type/basal and luminal subtype. (**A**) Unsupervised clustering showing the distinct expression pattern of non-type/basal and luminal cell lines. (**B**) Overexpressed and downregulated proteins in non-type/basal and luminal cell lines. (**C**) Principal component analysis of the protein expression of the bladder carcinoma cell lines.

#### *3.3. DNA Replication Process and Cell Cycle Regulation and Progression Were Dysregulated across All Bladder Carcinoma Cell Lines*

We identified 122 proteins to be overexpressed and 34 proteins to be downregulated across all 6 BC cell lines, compared to the TERT-NHUC (control) cell line (Figure 3A). A total of 479 proteins were overexpressed, and 141 proteins were downregulated, ≥2 fold in four or more BC cell lines when compared with the TERT-NHUC (control) cell line. We checked the interaction between the proteins that were overexpressed across all BC cell lines. We observed that the proteins formed two major clusters: one cluster comprised the MCM proteins, namely MCM2, MCM3, MCM4, MCM5, MCM6 and MCM7, which closely interact with each other and are involved in DNA replication, while the other cluster comprised proteins involved in cell cycle regulation and progression, such as AURKB, AURKA, CDC20, RRM2, TOP2A, RACGAP, KIF23, KIF4A, TPX2, CEP55, NUSAP1, CKAP2, CENPF, and so on (Figure 3B).
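The selection rule described above (a change of at least 2-fold relative to the control in four or more of the six BC cell lines) can be written out as a short sketch. The intensities below are hypothetical reporter ion values for two unnamed proteins, and the function name and the symmetric cutoff for downregulation (ratio ≤ 0.5) are assumptions made for illustration.

```python
CONTROL = "TERT-NHUC"
BC_LINES = ["SW780", "RT112", "VMCUB1", "T24", "J82", "UMUC3"]


def classify(intensities, fold=2.0, min_lines=4):
    """Label a protein from its per-cell-line reporter ion intensities.

    Ratios are taken against the control channel; a protein is called
    overexpressed (or downregulated) when at least `min_lines` BC cell
    lines show a ratio >= fold (or <= 1 / fold).
    """
    ratios = [intensities[line] / intensities[CONTROL] for line in BC_LINES]
    up = sum(r >= fold for r in ratios)
    down = sum(r <= 1 / fold for r in ratios)
    if up >= min_lines:
        return "overexpressed"
    if down >= min_lines:
        return "downregulated"
    return "unchanged"


# Hypothetical intensities, for illustration only.
protein_x = {"TERT-NHUC": 100, "SW780": 150, "RT112": 210, "VMCUB1": 320,
             "T24": 400, "J82": 260, "UMUC3": 500}
protein_y = {"TERT-NHUC": 100, "SW780": 40, "RT112": 45, "VMCUB1": 120,
             "T24": 30, "J82": 48, "UMUC3": 90}

print(classify(protein_x))  # overexpressed (5 of 6 ratios are >= 2)
print(classify(protein_y))  # downregulated (4 of 6 ratios are <= 0.5)
```

Setting `min_lines=6` reproduces the stricter criterion of a change across all six BC cell lines.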

#### *3.4. UAP1 Was Overexpressed in Bladder Carcinoma*

We sought to identify the proteins that show higher expression in the basal and non-type subtypes of bladder carcinoma and that might therefore contribute to the aggressive phenotype of these cells. We identified 39 proteins that were overexpressed in the non-type and basal subtypes (Figure 3C; Supplementary Table S2). We selected UDP-N-acetylhexosamine pyrophosphorylase (UAP1) as a candidate molecule for functional studies in BC, based on its function and its association with cancer pathogenesis. The expression pattern of UAP1 in BC cell lines is shown as a box plot in Supplementary Figure S2.

#### *3.5. Silencing of UAP1 Decreases Cellular Proliferation in Urinary Bladder Cancer Cells*

Western blot analysis confirmed the overexpression of UAP1 in most of the BC cell lines compared with the TERT-NHUC cell line (Figure 4A). We selected the three non-type BC cell lines (UMUC3, J82 and T24) to study the role of UAP1 in cellular proliferation. Endogenous expression of UAP1 was silenced using siRNA-mediated knockdown, and Western blot analysis confirmed the successful knockdown of UAP1 in the BC cell lines (Figure 4B). We then assessed the effect of UAP1 knockdown on the cellular proliferation of BC cell lines. BC cell lines transfected with UAP1 siRNA or control siRNA were grown in triplicate for 96 h, and cell viability was assessed at 24 h intervals using the MTT assay. We observed a significant reduction in the cellular proliferation of UAP1 knockdown BC cell lines compared with the control cells (Figure 4C). These results indicate that UAP1 plays a role in cellular proliferation in non-type BC cells.

#### *3.6. Silencing of UAP1 Decreases Colony Forming Ability in Urinary Bladder Cancer Cells*

After demonstrating the role of UAP1 in cellular proliferation, we went on to study the role of UAP1 in the colony forming ability of the BC cells. Cells transfected with UAP1 siRNA or control siRNA were seeded onto 6-well plates at a density of 1000 cells per well and allowed to grow for 7 days. The colonies formed were fixed, stained, counted under a microscope and photographed for representation. Silencing of UAP1 reduced both the number and the size of the colonies formed by non-type BC cells (Figure 5A,B). Our results show the importance of UAP1 in the colony formation of BC cells.

**Figure 3.** Global proteomic profiling of bladder carcinoma cell lines. (**A**) Total dysregulated proteins identified in the study in four or more bladder carcinoma cell lines, with 479 proteins overexpressed and 141 proteins downregulated. (**B**) Protein-protein interaction network showing the clusters of proteins with highest confidence (0.90) of interaction. (**C**) Heatmap showing 39 proteins that were overexpressed in the basal and non-type bladder carcinoma cell lines.

**Figure 4.** UAP1 silencing reduces cell proliferation of urinary bladder cancer cells. (**A**) Western blot analysis of UAP1 expression in BC cell lines and TERT-NHUC cell line validates the high expression of UAP1 in BC cell lines, as discovered from mass spectrometry analysis. (**B**) Western blot analysis of UAP1 silencing using siRNA mediated knockdown confirms the reduced expression of UAP1 after silencing in BC cell lines; β actin was used as a loading control. (**C**) Inhibition of UAP1 reduces cellular proliferation of BC cell lines (\* *p* < 0.05; \*\* *p* < 0.01).

**Figure 5.** Inhibition of UAP1 affects the colony forming ability and reduces the invasive ability of urinary bladder cancer cell lines. (**A**) Colony formation assay was performed after siRNA-mediated knockdown of UAP1 or transfection with control siRNA in BC cell lines. Colonies formed were fixed and stained using methylene blue. Images were captured and colonies were counted; reduced colony forming ability of BC cells after siRNA silencing was observed. (**B**) Graphical representation of the reduction in colony forming ability of BC cell lines after UAP1 silencing (\*\*\* *p* < 0.001; \*\*\*\* *p* < 0.0001). (**C**) BC cell lines were transfected with UAP1 siRNA or control siRNA and an invasion assay was performed. The assay was done in a transwell system using Matrigel-coated filters; the cells that migrated to the lower chamber were fixed and stained, and images were captured for counting and representation. (**D**) Graphical representation of the reduction in invasive ability of BC cell lines after UAP1 silencing (\*\* *p* < 0.01; \*\*\* *p* < 0.001).

#### *3.7. Silencing of UAP1 Decreases Invasive Property in Urinary Bladder Cancer Cells*

We further examined whether UAP1 has a role in BC invasiveness. Invasion assays were performed after silencing the endogenous expression of UAP1 using siRNA. BC cells transfected with UAP1 siRNA or control siRNA were transferred to the Matrigel-coated PET membrane in the upper compartment, and the plates were incubated at 37 °C for 48 h. We fixed and stained the migrated cells and counted them under a microscope. siRNA-mediated silencing of UAP1 reduced the invasive capability of all the non-type BC cell lines (Figure 5C,D). Our results indicate a clear role for UAP1 in regulating invasive capability, which is a critical process during metastasis.

#### *3.8. Silencing of UAP1 Decreases Cell Motility in Urinary Bladder Cancer Cells*

Since the silencing of UAP1 reduces both the cell proliferation and the colony forming ability of non-type BC cells, we decided to explore whether UAP1 has any role in tumor cell motility. We performed scratch wound assays using non-type BC cells with or without UAP1 silencing. Cells were grown to full confluence in 6-well plates and wounds of uniform size were made. After incubation for 8 h, control BC cells showed increased migration compared with UAP1-silenced BC cells. We imaged the wounds at 0 h and 8 h and calculated the distance covered by the cells using ImageJ software (Figure 6A,B). Our results from the cell motility assay clearly showed that UAP1 influences cell motility in BC cells.

**Figure 6.** Inhibition of UAP1 reduces the migratory ability of bladder cancer cell lines. (**A**) Wound healing assay was carried out after transfection of BC cell lines using either UAP1 siRNA or control siRNA; scratches were made after cell confluence and monitored for 8 h till wound healing. (**B**) Representative graph shows the distance migrated by BC cell lines (\* *p* < 0.05; \*\* *p* < 0.01; \*\*\* *p* < 0.001; \*\*\*\* *p* < 0.0001).
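The distance covered by cells in a scratch wound assay can be derived from the wound widths measured at the two time points. The following minimal sketch assumes that a width has already been measured from each image (for example in ImageJ); the function names and the widths are hypothetical and do not reproduce the measurements of this study.

```python
def migration_distance(width_0h, width_8h):
    """Distance by which the wound narrowed between 0 h and 8 h (same units)."""
    if width_0h <= 0 or width_8h < 0 or width_8h > width_0h:
        raise ValueError("expected 0 <= width_8h <= width_0h and width_0h > 0")
    return width_0h - width_8h


def percent_closure(width_0h, width_8h):
    """Wound closure expressed as a percentage of the initial width."""
    return 100.0 * migration_distance(width_0h, width_8h) / width_0h


# Hypothetical wound widths in micrometres (illustrative values only).
control = {"0h": 800.0, "8h": 300.0}
knockdown = {"0h": 800.0, "8h": 600.0}

control_distance = migration_distance(control["0h"], control["8h"])
knockdown_distance = migration_distance(knockdown["0h"], knockdown["8h"])
control_closure = percent_closure(control["0h"], control["8h"])
knockdown_closure = percent_closure(knockdown["0h"], knockdown["8h"])
```

Reporting closure as a percentage of the initial width makes wells comparable even when the scratches are not perfectly uniform.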

#### **4. Discussion**

BC management is challenging due to its presentation, histological subtypes and high recurrence rates. Surgical intervention and systemic chemotherapy are the treatment options preferred by clinicians, owing to the lack of bladder cancer-specific targeted therapies [25]. Deeper knowledge of the underlying molecular mechanisms would greatly help the development of precision therapeutics. In addition, alternative therapeutic approaches need to be explored to improve the overall survival of patients, particularly in the case of aggressive tumors. In this study, we conducted a global quantitative proteomics analysis to identify dysregulated proteins and potential targets that could lead to better treatment strategies in bladder carcinoma.

Cancer cell lines provide abundant information about cancer cell biology and about alterations in the protein repertoire during disease pathogenesis. In this study, we identified 122 proteins that were overexpressed across all the BC cell lines. These proteins might play an important role in cancer pathogenesis. The protein-protein interaction network revealed interactions between MCM proteins, which formed the hub of the network. These proteins include a panel of minichromosome maintenance complex component (MCM) proteins, namely MCM2, MCM3, MCM4, MCM5, MCM6 and MCM7 [26,27]. The MCM complex plays an important role in the initiation of DNA replication. The active MCM2-7 double hexamer conducts bidirectional DNA synthesis during the S phase of the cell cycle [28,29]. MCM complex subunits have been implicated in cell proliferation, invasion and metastasis in cancer [30–32]. A meta-analysis of the MCM proteins also reported that higher expression of these proteins was related to a worse prognosis in cancer patients [28]. Indeed, MCM expression is often observed in all epithelial layers, and a high frequency of MCM-positive cells correlates with adverse clinical outcome in bladder carcinoma [33].

Apart from MCM family proteins, we also found thymidine kinase (TK1), aurora kinase B (AURKB), cell division cycle protein 20 (CDC20), DNA topoisomerase 2-α (TOP2A), exportin-5 (XPO5) and several other proteins to be significantly overexpressed in our study [34–37]. Pharmaceutical compounds targeting Aurora A (such as alisertib) have been extensively studied in preclinical models and have been observed to act in synergy with several other targeted therapies, leading to tumor regression in various cancers [38]. Furthermore, TK1 has been reported as a tumor biomarker, and it also shows potential in drug discovery and as a therapeutic target [39]. Interestingly, TK1 is not essential for normal cell growth, but it is important in the repair of DNA damage that may be caused by chemotherapy [6]. Consequently, the overexpression of TK1 can also reduce the efficacy of chemotherapy [39]. Cell cycle proteins are being widely studied for their potential in cancer therapeutics, and further mechanistic studies would support their clinical use in personalized therapy.

Our proteomics data identified 39 proteins that were specifically overexpressed more than 2-fold in basal and non-type cell lines compared with the luminal subtype (Supplementary Table S2). Of these, only a few have previously been studied in bladder cancer tumorigenesis. Annexin-6 was overexpressed in pT1 grade 3 urothelial carcinoma [40]. Expression of vimentin was found in 69% of cases of transitional cell carcinoma, and its expression is associated with the grade of transitional cell carcinoma [41]. In accordance with our study, Wu et al. also found the expression of anillin actin-binding protein to be subtype-specific [42]. Similarly, the expression of anillin actin-binding protein was higher in the basal subtype than in the luminal subtype in bladder cancer cell lines.

Aberrant glycosylation has been gaining importance as one of the hallmarks of cancer. Alterations in metabolism and dysregulation of cellular energetics lead to cancer progression [43]. A tight association between the hexosamine biosynthetic pathway (HBP) and cell metabolism is widely reported [44]. Post-translational modifications such as glycosylation play a key role in cell adhesion, migration, immune surveillance, cell signaling and cellular metabolism. HBP mediates glycosylation events through its final product, UDP-GlcNAc. The conversion of GlcNAc-1P to UDP-GlcNAc is mediated by UDP-N-acetylhexosamine pyrophosphorylase (UAP1) [45].

In our data, UDP-N-acetylhexosamine pyrophosphorylase (UAP1) was overexpressed more than 2-fold in all the basal and non-type bladder cancer cell lines. Although the lack of biological replicates for the proteomic analysis of each cell line is a limitation, UAP1 was chosen for further characterization because it was overexpressed in four of the six cell lines analyzed. Functional assays to evaluate the role of UAP1 in regulating proliferation, cell migration and colony formation were carried out in biological replicates across three different cell lines. UAP1 is the final enzyme in the hexosamine biosynthetic pathway (HBP), producing UDP-N-acetylglucosamine (UDP-GlcNAc), which is a substrate for protein glycosylation. Abnormal protein glycosylation was previously shown to be associated with poor prognosis in bladder cancer patients [45]. A previous study in prostate cancer showed that UAP1 can protect cancer cells from endoplasmic reticulum stress and provide a growth advantage. High expression of UAP1 conferred resistance to inhibitors of N-linked glycosylation (tunicamycin and 2-deoxyglucose), and targeting UAP1 blocked anchorage-independent growth [19]. Earlier work from our group on the N-glycoproteomic profile of bladder cancer cell lines identified aberrant glycosylation in aggressive non-type BC cell lines [46]. These results prompted us to further investigate the role of UAP1 in BC. Western blot analysis in our study confirmed that UAP1 was also overexpressed in the luminal cell lines, although not to the same extent as in the basal or non-type cell lines. Further studies are required to understand its role in cancer progression and in the aggressive phenotype. Furthermore, knockdown of UAP1 using siRNA-based silencing in vitro reduced the proliferation, invasion, colony formation and migration of BC cell lines. These results are novel to BC and support a role for UAP1 in tumorigenesis.

#### **5. Conclusions**

In conclusion, comprehensive quantitative proteomic analysis of urinary BC cell lines identified, for the first time, the overexpression of UAP1. Knockdown of UAP1 reduced the proliferation, invasion, colony formation and migration of these cell lines. Our data suggest that UAP1 could be a promising therapeutic target for aggressive urinary bladder cancer.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/7/763/s1, Figure S1: Unsupervised clustering of all the proteins identified and quantified across 6 BC cell lines, Figure S2: Box plot representing expression pattern of UAP1 in BC cell lines, Table S1: Complete list of proteins identified and quantified in the study, Table S2: Proteins overexpressed in non-type and basal subtype BC cell lines.

**Author Contributions:** Conceptualization, P.K., A.P., and H.G.; methodology, V.N.P., A.J., and K.G.; software, V.N.P. and B.D.; validation, V.N.P., A.C., and A.J.; formal analysis, V.N.P., H.G., and P.K.; investigation, V.N.P., H.G., and P.K.; writing – original draft preparation, V.N.P., B.D., K.G., and P.K.; writing – review and editing, A.C., B.N., H.G., and P.K.; visualization, V.N.P., H.G., and P.K.; supervision, H.G. and P.K.; project administration, P.K.; funding acquisition, H.G., A.P., and P.K. All authors have read and approved the final version of the manuscript.

**Funding:** This research was funded by Department of Science and Technology (DST), Ramanujan Fellowship, Government of India, grant number SB/S2/RJN-077/2015, and the APC was funded by SB/S2/RJN-077/2015, Department of Science and Technology (DST), Ramanujan Fellowship, Government of India.

**Acknowledgments:** We thank Jean Paul Thiery (Department of Biochemistry, National University of Singapore, Singapore) for providing bladder cancer cell lines and the normal human urothelial cell line TERT-NHUC (JTERT). We thank the Department of Biotechnology (DBT), Government of India, for research support to the Institute of Bioinformatics (IOB), Bangalore. We thank the Manipal Academy of Higher Education, Madhav Nagar, Manipal 576104, India, for research support to the Institute of Bioinformatics. Harsha Gowda is a Wellcome Trust/DBT India Alliance Intermediate Career Fellow. Prashant Kumar is a recipient of the Ramanujan Fellowship awarded by the Department of Science and Technology (DST), Government of India. BD is a recipient of the INSPIRE Fellowship, Department of Science and Technology, Government of India. KG is a recipient of the Senior Research Fellowship from the University Grants Commission (UGC), Government of India.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Genetic Drivers of Head and Neck Squamous Cell Carcinoma: Aberrant Splicing Events, Mutational Burden, HPV Infection and Future Targets**

**Zodwa Dlamini 1,\*, Mohammed Alaouna 2, Sikhumbuzo Mbatha 3, Ahmed Bhayat 4, Mzubanzi Mabongo 5, Aristotelis Chatziioannou 1,6,7 and Rodney Hull <sup>1</sup>**


**Abstract:** Head and neck cancers include cancers that originate from a variety of locations. These include the mouth, nasal cavity, throat, sinuses, and salivary glands. These cancers are the sixth most diagnosed cancers worldwide. Due to the tissues they arise from, they are collectively named head and neck squamous cell carcinomas (HNSCC). The most important risk factors for head and neck cancers are infection with human papillomavirus (HPV), tobacco use and alcohol consumption. The genetic basis behind the development and progression of HNSCC includes aberrant non-coding RNA levels. However, one of the most important differences between healthy tissue and HNSCC tissue is changes in the alternative splicing of genes that play a vital role in processes that can be described as the hallmarks of cancer. These changes in the expression profile of alternately spliced mRNA give rise to various protein isoforms. These protein isoforms, alternate methylation of proteins, and changes in the transcription of non-coding RNAs (ncRNA) can be used as diagnostic or prognostic markers and as targets for the development of new therapeutic agents. This review aims to describe changes in alternative splicing and ncRNA patterns that contribute to the development and progression of HNSCC. It will also review the use of the changes in gene expression as biomarkers or as the basis for the development of new therapies.

**Keywords:** head and neck squamous cell carcinoma (HNSCC); aberrant splicing events; human papillomavirus (HPV) infection; non-coding RNA (ncRNA); methylation; mutational burden

**Citation:** Dlamini, Z.; Alaouna, M.; Mbatha, S.; Bhayat, A.; Mabongo, M.; Chatziioannou, A.; Hull, R. Genetic Drivers of Head and Neck Squamous Cell Carcinoma: Aberrant Splicing Events, Mutational Burden, HPV Infection and Future Targets. *Genes* **2021**, *12*, 422. https://doi.org/10.3390/genes12030422

Academic Editor: Selvarangan Ponnazhagan

Received: 27 January 2021 Accepted: 11 March 2021 Published: 15 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

The term head and neck cancer is used to describe a variety of tumors that arise in the mouth, nose, throat, sinuses or salivary glands [1]. Head and neck cancers are the sixth most common form of malignancy, with a total of 600,000 reported cases around the globe each year [2]. Over 90% of these cases are head and neck squamous cell carcinomas (HNSCC) [3]. More than two-thirds of HNSCC cases are diagnosed in developing countries [4]. The estimated average age of patients is 60 years, and the incidence rate is highest in males [5]. The first indications that a patient is suffering from these types of cancers include changes in the sound of the voice, a persistent sore throat, difficulty in swallowing and, most notably, the development of lumps or lesions in the throat [6]. Even with major advances in diagnosis, radiation therapy and immunotherapy, the 5-year survival rate for HNSCC patients has not improved in recent decades [7,8]. Additionally, due to the lack of appropriate biomarkers for the early diagnosis of HNSCC, in many patients the cancer is only detected at the later stages of the disease, leading to a poor prognosis [4,9].

The primary risk factors for HNSCC are smoking and heavy alcohol use [10]. Human papillomavirus (HPV) is classified as a distinct risk factor, giving rise to tumors that are distinct from those caused by other risk factors [11]. Genome-wide systematic sequencing of mRNAs, microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs has led to the identification of probable methylation sites, single nucleotide polymorphisms (SNPs), mutations and variations in copy number in a variety of different genes. This has led to the identification of numerous potential biomarkers for HNSCC [12–16]. In addition to these genomic and epigenetic changes, alternative splicing events have also been implicated in the initiation and progression of head and neck cancer [17].

#### **2. The Role Played by HPV Infection in HNSCC Development and Progression**

Another important factor in the changes in gene expression that occur in head and neck cancer is infection with HPV. This is an independent etiological factor in the development of HNSCC and has been the focus of much recent research [11,18,19]. Human papillomaviruses are epitheliotropic DNA viruses with an average genome size of 8 kb [20]. The virus generates two oncoproteins, encoded by the E6 and E7 genes, that effectively inhibit the proteins p53 and pRb. This leads to the initiation of the cell cycle and DNA synthesis, which is required for viral replication [20]. Numerous studies have shown that HPV positive (HPV+) and HPV negative (HPV-) HNSCCs are separate entities with distinct etiologies, clinical behaviors, treatment outcomes, pathological toxicity, and molecular profiles [21–24]. HPV type 16 is identified as the causative agent in more than 90% of HPV+ HNSCC cases [25].

Tumors expressing HPV genes (particularly HPV16) displayed no TP53 mutations and low losses of segments of chromosomes 3p, 9p and 17p [19]. Traditionally, HPV+ HNSCC is more sensitive to treatments such as chemotherapy, radiation, and surgery; however, resistance to these treatments is on the rise [26]. Most HPV+ HNSCC tumors were found to be infected with HPV16 alone and showed expression of the HPV16-E6 oncogene [26]. A distinct class of 123 genes was specifically deregulated in HPV16-positive HNSCC. These genes were deregulated in both smokers and non-smokers [26]. The symptoms of HPV+ and HPV- HNSCC are very different from each other, which has led to uncertainty over whether these cancers should be considered distinct tumors [27]. HPV+ oral cancers show changes in the expression of genes regulating the cell cycle or a decrease in the levels of tumor suppressor proteins, such as pRb and cyclin D1. These proteins are usually overexpressed in oral HPV- tumors [27,28].

#### *HPV-Infected HNSCC Expression Profiles*

HPV+ HNSCC tumors overexpress retinoblastoma-binding protein factor-C replication gene and transcription factor partner E2F-dimerization protein (TFDP2). A large group of genes that play a role in the defense against viral infection and in the immune response, including interleukin and interferon-induced proteins, have been shown to be ineffective against HPV [26].

A study has demonstrated that the expression pattern of host genes differs among HPV+ tumors from smokers, ex-smokers and non-smokers. HPV16+ tumors from smokers could be monitored through the expression of p53 or E2F-transcription factors, such as insulin growth factor (IGF), replication factor C4 (RFC4), cell division cycle 7 (CDC7), cytochrome P450 (CYP4V2), mini-chromosome maintenance protein complex 2 (MCM2) and mitotic checkpoint complex protein (MCC) [29,30]. It was proposed that the expression of some of these genes might be linked to tobacco use [26]. Cyclin-dependent kinase inhibitor 2C (CDKN2C) and retinoblastoma (RB) genes were among the genes whose expression is consistently and greatly altered in HPV16-positive tumors from non-smokers. Neither of these genes has been conclusively linked with HPV16 in smokers despite their upregulation [26]. CDKN2C encodes p18, a tumor suppressor and cyclin-dependent kinase inhibitor. It binds protein kinases and acts in conjunction with the retinoblastoma tumor protein (pRb) to inhibit cell cycle progression and regulate growth [26]. Enhanced CDKN2C and RB expression suggests the lack of the negative feedback loop, a situation that is observed when the expression of HPV16-E7 is repressed [22]. Studies found that the expression of CDKN2 was able to regulate the growth of cell lines derived from HPV16+ HNSCC tumors [31]. Cancer development and progression can also be inhibited in HPV-infected cells through the ability of pRb to suppress the function of the E2F transcription factors [32,33].

In head and neck cancer, p16 is inactivated by gene mutation or methylation, which triggers the functional inactivation of pRb [26,34]. HPV-E6/E7-related overexpression of p16 protein occurs in oral lesions [35]. In HPV16+ HNSCC, elevated expression of the cell cycle control genes (CDC7, MCM2) was also recorded [22,31]. Regulation of interferon-inducible protein (IFN-inducible) and interleukin-1 receptor antagonist (IL-1RA) was recorded in HPV16-expressing immortalized head and neck tumor cell lines [36]. Spontaneous degradation of the early HPV protein E2 led to increased transcription of viral DNA. This is accompanied by a rise in antiviral gene expression in the form of IFN and an increase in viral E6/E7 oncoprotein production [37]. Other studies also demonstrated that transfection of malignant cells with oncogene E7 would render them more vulnerable to IFN-alpha-induced apoptosis [38]. This suggests that active chronic oncogenic HPV infection can affect the vulnerability of cells to IFN-induced apoptosis in tumor tissue and may similarly affect IFN-based HPV therapy in associated diseases, such as HNSCC [34].

#### **3. Alternative Splicing in HNSCC**

A study published in 2019 reported the occurrence of alternative splicing (AS) events in 519 HNSCC patients. In these 519 samples, there were 4626 survival-associated AS events in 3280 genes. These changes in AS signatures resulted in multiple, cumulative survival outcomes [39]. A study in 2020 identified 4068 splicing events associated with changes in the survival of HNSCC patients, using records from The Cancer Genome Atlas (TCGA). These results imply that a patient's AS signature can be used as a prognostic biomarker [17]. The top five AS events that correlated with survival were exon skipping, use of alternate promoter sites, use of alternate terminator sites, use of alternate acceptor sites, and use of alternate donor sites (Figure 1). These results imply that AS events could serve not only as diagnostic and prognostic biomarkers but also as therapeutic targets for the treatment of HNSCC patients. GO and KEGG analyses indicate that most genes whose splicing is altered in HNSCC are implicated in functions such as apoptosis, DNA repair, mRNA splicing and metabolism [39].

**Figure 1.** Analysis of alternative splicing (AS) events in head and neck squamous cell carcinomas (HNSCC). (**A**) Seven types of AS events were identified in HNSCC patient samples. These include changes in the location of alternate acceptor sites (AA), alternate donor sites (AD), alternate promoter sites (AP) and alternate terminator sites (AT). Other AS events involve changes in the incorporation of exons and exclusion of introns due to exon skipping (ES), the use of mutually exclusive exons (ME) and retained introns (RI) [40]. (**B**) The number of each type of AS event taking place in 519 HNSCC patients. AA, alternative acceptor; AD, alternative donor; AP, alternative promoter; AT, alternative terminator site; ES, exon skip; ME, mutually exclusive exons; RI, retained intron [39].
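The per-type counts shown in panel B amount to a tally of annotated events over the seven AS categories. A minimal sketch of such a tally is given below; the event records and gene labels are hypothetical placeholders, not data from the cited studies, and only the seven type codes are taken from the figure legend.

```python
from collections import Counter

# The seven AS event types named in the figure legend.
AS_EVENT_TYPES = {
    "AA": "alternate acceptor site",
    "AD": "alternate donor site",
    "AP": "alternate promoter site",
    "AT": "alternate terminator site",
    "ES": "exon skipping",
    "ME": "mutually exclusive exons",
    "RI": "retained intron",
}


def tally_events(events):
    """Count AS events per type; events is a list of (gene, type_code) pairs."""
    counts = Counter(code for _, code in events)
    unknown = set(counts) - set(AS_EVENT_TYPES)
    if unknown:
        raise ValueError(f"unknown AS event type(s): {sorted(unknown)}")
    # Report every type, including those with no events.
    return {code: counts.get(code, 0) for code in AS_EVENT_TYPES}


# Hypothetical event records (illustrative only).
events = [
    ("gene_1", "ES"),
    ("gene_1", "AP"),
    ("gene_2", "ES"),
    ("gene_3", "RI"),
]
counts = tally_events(events)
genes_affected = len({gene for gene, _ in events})
```

A single gene can contribute several events of different types, which is why the number of events reported in such analyses exceeds the number of genes.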

Further analysis indicated that HNSCC patient survival was associated with AS of five specific genes. These five genes are *C5orf30*, eEF1A lysine and N-terminal methyltransferase (*METTL13*), Ras homolog gene family member T1 (*RHOT1*), ATP-binding cassette sub-family C member 5 (*ABCC5*), and Myelin protein zero-like protein 1 (*MPZL1*). The role played by AS in these five genes is currently not fully known. METTL13 di-methylates eukaryotic elongation factor 1A (eEF1A), leading to increased translation and protein expression, and can promote cancer formation and progression [41]. METTL13 is alternately spliced to give rise to five isoforms. The full-length protein contains two methyltransferase domains, the second of which is missing in at least two of the isoforms. These isoforms would then be less efficient at methylating targets, implying that they could help prevent cancer formation and progression (Figure 2A). RHOT1 is a membrane receptor that promotes proliferation and cancer [42]. It is spliced to give rise to six isoforms. Some of these isoforms lack the transmembrane domain, implying that they can block signaling and could, therefore, prevent cell migration (Figure 2B). MPZL1 activates Src kinases, which results in increased cancer cell proliferation and migration [43]. The full-length variant has two transmembrane domains. Two of the splice variants lack one of these domains. This may interfere with the recognition of ligands by this receptor (Figure 2C). In addition to these genes, a variety of other genes have been found to be alternately spliced in HNSCC. Examples of these genes are discussed in more detail below.

**Figure 2.** Isoforms of some of the proteins where alternative splicing had a strong correlation with survival. (**A**) eEF1A lysine and N-terminal methyltransferase (METTL13) is responsible for the activation of the eukaryotic elongation factor 1A (eEF1A) through methylation. There are 5 known isoforms of this protein. Some of these isoforms are missing one of the two methyltransferase domains. (**B**) Ras homolog gene family member T1 (RHOT1) is the gene that codes for the Mitochondrial Rho GTPase protein, a membrane receptor that is spliced to give rise to at least 6 isoforms. Some of these isoforms lack the transmembrane domain, implying that these isoforms can block signaling. (**C**) Myelin protein zero-like protein 1 (MPZL1) is spliced to give rise to 5 known isoforms, some of which lack one of the transmembrane domains. Numbered boxes represent the exons making up each isoform, and the size of each box indicates the relative size of the exon. Colored boxes show the position of the corresponding domain.

#### *3.1. DOCK5*

Dedicator of cytokinesis 5 (DOCK5) is an intracellular signaling protein that is alternately spliced to give rise to at least two isoforms. By analyzing the expression changes of different DOCK5 isoforms and comparing these to clinical parameters, a link was discovered between the expression of certain DOCK5 variants and the patient's tobacco usage. This indicates that smoking decreases the overall survival of a patient by altering the expression of DOCK5 variants [44]. AS of the DOCK5 mRNA gives rise to two splice variants. One variant contains an exon with an alternate terminator site, resulting in a truncated variant of DOCK5. This variant enabled the proliferation, migration, and invasion of HPV-negative HNSCC cells [44]. The DOCK family of proteins are members of the guanine nucleotide exchange factor (GEF) group. These contain two DOCK homology (DHR) domains, DHR-1 and DHR-2, where DHR-2 is the GEF catalytic element [45]. The AS events in DOCK5 observed in HNSCC result in the loss of this catalytic domain [44]. The expression of a truncated variant lacking the catalytic domain can promote the development and progression of HPV-negative HNSCC [44]. This also implies that this splice variant, as well as the processes it is associated with, can serve as possible therapeutic targets.

#### *3.2. Lysyl Oxidase (LOXL2) Facilitates the Development of HPV-Negative HNSCC*

Lysyl oxidases (LOXL) are a family of copper-containing amine oxidases that catalyze the deamination of lysine residues in collagen and/or elastin. These lysines are involved in the formation of crosslinking in the extracellular matrix leading to increased fibroblast growth and adhesion. In this way, the overexpression of members of the LOXL family promotes metastasis [46]. There are four members of the LOXL family, and these are named LOXL1-4 [47,48]. One of these family members, LOXL2, contributes to the initiation and development of tumors [49]. LOXL2 overexpression seems to enhance the ability of cancer cells to invade tissue layers and promote metastasis [49,50]. LOXL2 activity can be inhibited by hypoxia and hydrogen peroxide (Figure 3A).

Several splice variants of LOXL2 have been reported in various cancers, including esophageal squamous cell carcinoma. Two isoforms, the form lacking exon 13 (Figure 3B) and the form containing a 72-nucleotide deletion, promote tumor progression through a molecular mechanism distinct from the canonical model for LOXL2 [51,52]. This alternate LOXL2 mechanism involves the activation of signaling pathways, such as focal adhesion kinase (FAK) and protein kinase B (PKB, also known as AKT), and leads to the epithelial-to-mesenchymal transition [53]. Overexpression of the isoform lacking exon 13 activated both of these pathways to a higher degree in HPV-negative HNSCC patients. This is associated with changes in p-FAK, p-AKT (T308), p-AKT (S473) and p-S6 (phospho-S6 ribosomal protein) [54]. Therefore, it is predicted that this LOXL2 splice variant may promote the proliferation, migration, and invasion of HPV-negative HNSCC cells [54].

All these findings support the idea that LOXL2 isoforms can be used as biomarkers or therapeutic targets. The use of LOXL2 as a biomarker is further supported by the fact that it is secreted [55] and can, therefore, be detected in non-invasive liquid biopsies.

**Figure 3.** Alternate splicing of the LOXL2 mRNA. (**A**) The lysyl oxidase-like (LOXL2) signaling pathway stimulates invasion and angiogenesis in cancer cells. The pathway is also inhibited because of hypoxia. Green lines with green arrows indicate stimulation or induction, while red lines with red diamonds indicate inhibition. (**B**) The e13 splice variant of LOXL2 results from the exclusion of the final exon, exon 13. This variant promotes stronger signaling promoting invasion and metastasis. Both isoforms contain all four SR domains (scavenger receptor Cys-rich domain). These are shown in the figure as the blue boxes. These domains are responsible for facilitating binding to cell membranes during phagocytosis. FAK, focal adhesion kinase; PI3K/AKT, phosphatidylinositol 3-kinase/protein kinase B; HIF1α, hypoxia-inducible factor 1α; VEGF, vascular endothelial growth factor; EMT, epithelial-mesenchymal transition.

#### *3.3. Transcription Factor Dp-2 (TFDP2)*

The transcription factor Dp-2 (*TFDP2*) gene, also known as E2F dimerization partner 2, encodes a member of a family of transcription factors that heterodimerize with E2F proteins to enhance their DNA-binding activity and promote transcription of E2F target genes. Some of these E2F target genes control the transcriptional activity of numerous genes involved in the progression from the G1 to the S phase of the cell cycle. The expression of TFDP2 is upregulated in HPV16-positive HNSCC tumors from non-smokers [26]. TFDP2 is alternately spliced to give rise to 8 isoforms (Figure 4). One of these isoforms lacks the E2F transcription factor dimerization partner domain. This DNA-binding domain stimulates E2F transcription. Increased expression of this isoform may inhibit cell cycle progression (Figure 4).

**Figure 4.** (**A**–**H**) Isoforms of TFDP2. The TFDP2 transcription factor is known to be expressed at high levels in HNSCC and is alternately spliced to give rise to 8 isoforms. Isoform 7 is missing the E2F/DP dimerization domain. This isoform can block E2F associated transcription and, therefore, inhibit cell cycle progression. Although the precise role of these isoforms in HNSCC is unknown, the high expression levels of the wild-type variant in HNSCC imply that isoform 7 may serve as a negative regulator of cancer progression. The blocks in this figure represent the different exons making up each isoform. The size of the boxes indicates the relative size of the exons.

#### *3.4. Splicing of p53 in HNSCC*

The abolition of normal p53 function is one of the most common genetic changes in human cancer. P53 mutations are assumed to contribute significantly to the development of around 40% of HNSCCs [11,56]. AS of p53 gives rise to at least 12 isoforms. These isoforms all retain the mutation hotspot sequence (exons 5–8). The canonical p53 protein (Figure 5B) is named p53α; it is normally the most abundant isoform and contains all seven functional domains: two N-terminal transactivation domains, a proline-rich domain (PXXP), a DNA-binding domain, an oligomerization domain (OD), a nuclear localization signaling domain and a negative-regulation domain [57]. The isoforms are divided into three main groups, α, β and γ, based on the splicing changes at the C terminal. Isoforms in group α have the C-terminal basic domain. Isoforms in groups β and γ lack this domain and use alternate exon 9 splice variants, exon 9a in group β and exon 9b in group γ (Figure 5) [58]. The other isoforms are the result of a variety of mechanisms, including alternative promoter usage and alternative translation initiation sites. Each group contains its own truncated variants that arise due to internal promoters: Δ40p53, Δ133p53 and Δ160p53 [57].

The isoform detected at the highest level in HNSCC was p53β [58]. Unlike the canonical p53, p53β preferentially binds to the promoter of the proapoptotic *Bax* gene and is unable to efficiently induce the expression of the p53 regulator MDM2. This allows it to induce apoptosis in a p53-independent manner [59]. Overexpressed p53β cooperates with full-length p53 and contributes to cellular senescence. This increase in p53β was observed in vivo in senescent colon adenomas [58]. The truncated Δ40p53 isoforms are created by AS in intron 2, and the resulting isoforms lack the transactivation domain. These isoforms have a dominant-negative effect on the activity of full-length p53 [60].

The shorter isoforms Δ133p53 and Δ160p53 lack the transactivation domain, the proline-rich domain and a part of the DNA-binding domain (Figure 5). These isoforms can interact directly with the canonical p53 and regulate its transcriptional activity [58]. A mouse model expressing a p53 protein with a deletion of the first 122 amino acids was developed and used to study the role of the Δ133p53 isoforms [61]. This truncated p53 promoted hyperproliferation in cancer and inflammation in the mice studied [61]. Other studies undertaken using mouse xenograft models have shown that Δ133p53α could stimulate cell migration and angiogenesis [62]. The Δ133p53α isoform is also able to prevent p53-mediated replicative senescence, G1 cell-cycle arrest and apoptosis [63]. In summary, p53β promotes replicative senescence, and the action of this isoform is opposite to that of Δ133p53α, which promotes proliferation. Therefore, the ratio of p53β to Δ133p53α could be used as a measure of cancer risk. A decrease in the ratio would favor cancer development and progression [58].
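As a minimal sketch of how such a ratio might be tracked, the following Python snippet computes the p53β/Δ133p53α ratio from two expression measurements and checks whether it has fallen relative to a baseline. The function names and all numeric values are hypothetical and serve only to illustrate the arithmetic; no threshold or clinical cut-off is implied.

```python
def isoform_ratio(p53_beta, delta133_p53_alpha):
    """Return the p53β/Δ133p53α expression ratio.

    Both inputs are expression levels in the same (arbitrary) units.
    """
    if delta133_p53_alpha <= 0:
        raise ValueError("Δ133p53α expression must be positive")
    return p53_beta / delta133_p53_alpha


def ratio_decreased(baseline_ratio, current_ratio):
    """True when the ratio has fallen relative to a baseline measurement."""
    return current_ratio < baseline_ratio


# Hypothetical expression values, not measured data.
baseline = isoform_ratio(p53_beta=8.0, delta133_p53_alpha=2.0)
current = isoform_ratio(p53_beta=3.0, delta133_p53_alpha=6.0)
print(baseline, current, ratio_decreased(baseline, current))
```

In this toy example the ratio falls from 4.0 to 0.5, which, following the reasoning above, would be the direction of change associated with cancer development and progression.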

**Figure 5.** (**A**–**M**) Splicing isoforms of p53. Alternate splicing of p53 gives rise to at least 12 isoforms. The canonical isoform (**B**) contains all 10 exons and all 6 domains. The main classification of these isoforms relies on differences in the C terminal. The alpha group contains the basic domain encoded by exon 10. The beta group of isoforms contains exon 9a, and the gamma group contains exon 9b. The extent of the deletions at the N terminal further divides these into separate groups: full length, Δ40, Δ133 and Δ160 [64]. The blocks in this figure represent the different exons making up each isoform. The size of the boxes indicates the relative size of the exons.

#### *3.5. PITX2*

The homeobox gene paired-like homeodomain transcription factor 2 (PITX2) is one of the bicoid transcription factors and is spliced to give rise to four isoforms (PITX2A, PITX2B, PITX2C, PITX2D) (Figure 6). These transcription factors play a role in controlling the transcription of procollagen lysyl hydroxylase, an enzyme responsible for the formation of many body structures during development [65]. PITX2 mutations are responsible for the Axenfeld–Rieger type I condition, a disorder that affects the development of teeth, hair, and abdominal structures [66]. The expression of the PITX2 gene can control the Wnt pathway and interferes with the activation of transcription and the β-catenin cell adhesion mediator. PITX2 is also required for the induction of Cyclins A1 and D2 by recruiting coactivators [67,68]. The different isoforms of PITX2 (A, B and C) induce the transcription of different target genes. Isoform D acts as a negative regulator of the other isoforms by suppressing their transcriptional activity [69]. Isoforms A, B and C are expressed at higher levels in various cancers, where different isoforms stimulate the expression of different members of the TGFβ family [70].

**Figure 6.** Isoforms of PITX2. Alternate splicing of the PITX2 mRNA gives rise to four protein isoforms. Three of these isoforms (**A**–**C**) have a similar function, each inducing the transcription of different genes that stimulate growth and proliferation. The final isoform, isoform (**D**), acts as a negative regulator of the other three isoforms. The blocks in this figure represent the different exons making up each isoform. The size of the boxes indicates the relative size of the exons (**E**).

#### *3.6. Aberrant Expression of Splicing Factors and Associated Proteins in HNSCC*

The expression of HPV proteins is also dependent on the host's splicing factors. The most important of these splicing factors are hnRNPA1 and hnRNPA2, which control the expression level of the E6 protein, which is directly responsible for the increased levels of HPV-related HNSCC. The viral E7 and E6 proteins are produced through AS of viral mRNA. The use of the 5'-splice site SD226 and the 3'-splice site SA409 produces E7 mRNAs. Unspliced viral mRNAs produce E6 mRNA; hnRNPA1 promotes this by inhibiting the use of the SA409 splice site, decreasing the levels of E7 and increasing those of E6. The levels of E6 are also decreased through splicing induced by hnRNPA2. This splicing factor inhibits the use of the SA409 splice site and promotes the use of a downstream 3'-splice site named SA742 [71].

#### **4. Non-Coding RNAs in HNSCC**

Recent studies have indicated that non-coding RNAs (ncRNAs) play an important role in the development and progression of HNSCC. These RNAs regulate the expression of coding genes. MicroRNAs (miRNAs) can either promote or inhibit the expression of target genes by binding directly to their target mRNA and affecting its stability [72]. This is why the aberrant regulation of miRNAs is an important contributing factor in the development of this disease [73]. LncRNAs may control gene expression by promoting transcription, silencing transcription or by promoting or inhibiting translation [74]. Not only do these ncRNAs regulate the expression of protein-coding genes, but they also regulate the expression of other ncRNAs, and since these molecules act by binding to target mRNA, they also compete for the binding sites on these targets. Both types of ncRNA can also serve as biomarkers for cancer diagnosis and prognosis, as they are found in body fluids [75].

#### *4.1. MicroRNA Profile in HNSCC*

A number of miRNAs have been identified as playing an important role in the development and progression or prevention of HNSCC by acting as either oncogenes or as tumor suppressors [72,76–79]. AS can generate mRNA with different MicroRNA response elements (MREs) that can alter the ability of miRNA to target them. Different miRNAs can easily be generated through the use of alternate promoters and alternate termination sequences to generate miRNAs with different 5' and 3' UTRs. The sequence of miRNAs can also be altered by alternate polyadenylation [80]. An early study that examined miRNA profile changes in HNSCC found that the expression of 20 miRNAs was different in HNSCC samples when compared to normal tissue [76], while a later study using more sensitive deep sequencing found 365 miRNAs with significantly different expression levels in HNSCC samples [81]. Further characterization of these miRNAs that are differentially expressed in HNSCC revealed that 49 of these miRNAs were associated in some way with p53. Sixteen of these miRNAs were also associated with lower survival rates in HNSCC patients [82].

MiRNAs whose expression changes in HNSCC cell lines and patient samples and that play a tumor suppressor role include miR-200 [83], miR-375 [84], miR-26a [85], miR-7 [86], miR-107 [87], miR-218 [88] and members of the let-7 microRNA family [89]. In addition, multiple miRNAs were reported to be downregulated in HNSCC. These include miR-206 [90], miR-10a-5p, miR-125a-5p, miR-144-3p, miR-195-5p and miR-203 [91]. MiR-200 knockdown results in the development of aggressive cancer, while increased levels of miR-200 inhibit cell growth [83]. Another miRNA that acts as a tumor suppressor in multiple cancers, including HNSCC, is miR-375; however, it was found to act as an oncogene in other cancers, such as lung cancer. It was also found that the expression ratio of miR-21 to miR-375 in tumors compared to normal tissue is a good indicator of patient survival. The lower this ratio is, the worse the survival outcome [84]. MiR-26a acts as a tumor suppressor by inhibiting cell migration and metastasis as well as lowering the expression of the enhancer of zeste homolog 2 (EZH2). This results in decreased cell growth [85]. Many of the other tumor suppressor miRNAs function by inhibiting the expression of genes that promote cell proliferation. MiR-7 inhibits EGFR expression [86], and miR-107 inhibits Akt, Stat3 and Rho GTPases via protein kinase Cε (PKCε) [87]. Other tumor suppressor miRNAs function by inhibiting cell migration, invasion, and metastasis through the inhibition of signaling cascades. For example, miR-218 inhibits the focal adhesion pathway, preventing cell migration [88].

The changes in miRNA expression in HPV+ HNSCC have also been studied. Specific effects of HPV infection in the development of HNSCC rely on the dysregulation of miRNA expression levels and changes in the location of cellular miRNA. MiR-363 is overexpressed in HPV+ HNSCC, where it functions in cell cycle regulation and reduces cell growth and invasion [92,93]. Analysis of the transcription levels of miR-106a and miR-92a did not reveal any variations in expression between HPV+ and HPV- HNSCC cell lines [93], yet in the presence of HPV-16, miR-155 has been shown to be downregulated [93]. Studies have also shown that miR-181a and miR-29a are downregulated in HPV+ HNSCC cells in comparison to HPV- HNSCC cells [93]. MiR-29a interacts with and stabilizes p53 [94]. Since HPV-16 E6 increases the rate of p53 degradation [20], miR-29a deregulation in conjunction with E6 expression could further decrease p53 levels following chronic HPV infection [93].

MiRNAs that were found to function as oncogenes include miR-21 [95], miR-375 [96] and miR-184 [97]. Some of the miRNAs that promote HNSCC development and progression, such as miR-21, function by inhibiting apoptosis [95]. Additionally, many of the miRNAs whose expression is increased in HNSCC are also associated with decreased HNSCC survival; an example of this is miR-21 [96]. MiRNAs that were found to be expressed at higher levels in HNSCC, but whose effects are not known, include miR-133b, miR-455-5p and miR-196 [98], miR-26a, miR-21 [95], miR-106b-3p, miR-2, miR-19a, miR-33a and miR-31 [97].

#### *4.2. LncRNAs in HNSCC*

Multiple studies have identified numerous lncRNAs whose expression is altered in many cancers [97]. As in many other cancers, the lncRNA HOX antisense intergenic RNA (*HOTAIR*) is deregulated in HNSCC. This lncRNA is overexpressed in poorly differentiated HNSCCs, and higher expression is associated with more advanced stages of the disease [99]. The lncRNAs whose expression is increased in HNSCC include nuclear paraspeckle assembly transcript 1 (NEAT1) [100], HOXA transcript at the distal tip (HOTTIP), urothelial cancer associated 1 (UCA1) [101], lncRNA-regulator of reprogramming (ROR) [102] and H19 [103]. The expression of other lncRNAs is downregulated in HNSCC, and this lower expression is associated with a poorer prognosis. This is a possible indication that they have an antitumor function. These include AC026166.2-001, RP11-169D4.1-001, growth arrest-specific 5 (GAS5) [100], LET [104], X-inactive specific transcript (XIST) [105], maternally expressed 3 (MEG3) [106], and lnc-JPH1-7 [107].

#### **5. The Contribution of Genomic Mutations to HNSCC**

The genomic changes that have been observed in HNSCC include chromosome amplification, chromosome deletion and mutations. Mutations in the genome of HNSCC patients are commonly observed and are known to contribute to cancer development and progression. This has been observed in both HPV+ and HPV- tumors [108]. Some of the genes most commonly found to be mutated in HNSCC play a role in cell cycle regulation and progression. The gene cyclin-dependent kinase inhibitor 2a (*CDKN2A*) was found to be mutated in up to 87% of HPV- HNSCC tumors. However, mutations in this gene are not common in HPV+ HNSCC [109]. Other groups of genes that are found to be mutated in HNSCC include receptor tyrosine kinases, mitogen-activated protein kinases, growth factors and growth factor receptors [109].

#### *5.1. Mutations in P53 and Associated miRNAs*

Mutations in TP53 are linked to poor overall survival when compared to patients with wild-type p53 [110]. Most of the mutations observed in the p53 gene occur in the DNA-binding region, commonly referred to as the mutation hotspot. As a result, mutation analysis of the genomic DNA sequence is usually confined to exons 5–8 or 5–9 [111]. A genomic DNA sequence analysis of exons 5–9 was performed in a study of 166 HNSCC patients. Mutations in p53 were found in 65 tumors (39%) [11]. Another study of 32 HNSCC tumors showed that only 8 (25%) had mutations in p53. Once again, this study focused on exons 5–8 [112]. A later study indicated that in HNSCC, mutations in exons 5–9 were reported in 22% of all tumors [111]. Mutations are also found in the non-coding regions of p53. One mutation is found between exons 6 and 7 (63 bp downstream of exon 6). The resulting mutant protein was found more commonly in samples from cancer patients. It is expected that this mutation results in the stabilization and accumulation of wild-type p53 [113]. Studies estimating the frequency of mutations in p53 in HNSCC samples have found a wide variety of results. These range from 78% of HPV- tumors [114] and 60% of freshly frozen HNSCC samples regardless of HPV status [115] to frequencies as low as 30% in some HNSCC samples regardless of HPV infection [108].
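The percentages quoted for the two sequencing studies follow directly from the reported counts, as the short Python sketch below shows. The helper `mutation_frequency` and the dictionary keys are illustrative names only; the counts are those given in the text.

```python
def mutation_frequency(mutated, total):
    """Percentage of tumors carrying a mutation, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * mutated / total, 1)


# Counts quoted in the text: (tumors with p53 mutations, tumors analyzed).
studies = {"ref_11": (65, 166), "ref_112": (8, 32)}

for name, (mutated, total) in studies.items():
    print(name, mutation_frequency(mutated, total))
```

The first study gives 39.2%, which the text reports as 39%, and the second gives exactly 25%.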

In HNSCC, the TP53 tumor suppressor gene is most often mutated in tumors negative for HPV [110]. The tumor suppressor activity of p53 is accomplished through its activity as a transcription factor. Some of the genes whose transcription is initiated by p53 encode miRNAs and mRNAs coding for proteins involved in cell stress control, apoptosis, and DNA damage repair [110]. One example of this is the expression of miR-377-3p, the primary regulator of Sestrin 1 (SESN1), which encodes a Sestrin family member and is also known as p53-regulated protein PA26 [110]. The expression of SESN1 correlates with the expression of genes that control autophagy. It triggers TP53 expression [110] and helps to stimulate the response to DNA damage and oxidative stress in cells [110]. MiR-377-3p downregulates SESN1 by directly targeting its 3′ untranslated region [110]. The lower expression of miR-377-3p is an indicator of a poor outcome for patients.

#### *5.2. Mutation in PIK3CA*

The alpha catalytic subunit of phosphatidylinositol-4,5-bisphosphate 3-kinase (*PIK3CA*) has been identified as being the most affected gene in HNSCC. In HPV+ tumors, the mutation hotspots for this gene occur in the area of the gene that codes for the protein's helical domain [109]. This domain conducts inhibitory signals to the kinase catalytic domain. The kinase activity activates downstream signaling in response to growth factors, leading to cell growth. Therefore, mutations that affect the activity of the helical domain result in uncontrolled growth and cancer [116]. In HPV- tumors, the mutations are not localized to any single region and occur throughout the gene [109].

#### **6. Diagnostic and Therapeutic Applications**

The differences in mRNA splicing, the resulting changes in the expression profiles of various protein isoforms and the changes in the transcription levels of ncRNAs observed in HNSCC compared to healthy tissue could be used to develop new diagnostic or prognostic biomarkers and may also be utilized as targets for the development of new therapies. There have been multiple clinical trials evaluating alterations in splicing as therapeutic targets, and they have been reviewed in detail elsewhere [117,118].

Drugs have already been developed that target AS in cancer. Some of these drugs are shown in Table 1. One of the therapeutic approaches targeting splicing events for the treatment of cancer is the search for small-molecule inhibitors of splicing. This is most commonly done using a bioprospecting approach involving the search for natural products derived from bacteria or plants. Many of the most successful of these compounds inhibit splicing factors, such as SF3B (Table 1). This results in the inhibition of spliceosome assembly [118]. These compounds include the pladienolides, isolated from *Streptomyces platensis*, which display cytotoxic effects and the ability to induce cell cycle arrest. Pladienolide itself is not stable, but a stable derivative known as E7107 was developed from it [119]. Although SF3B consists of 7 subunits, named SF3B1, SF3B2, SF3B3, SF3B4, SF3B5, SF3B14, and PHF5A, all these small molecules target SF3B1 [120]. Other than SF3B, many of these compounds inhibit the spliceosome by binding to other snRNPs (Table 1). However, despite the promise of these drugs, cancer cell lines have been shown to develop resistance to them. For instance, after continuous exposure to pladienolide B, human colorectal cell lines developed resistance to the drug. This resistance results from a point mutation arising in *SF3B1* (R1074H), which reduces the binding affinity of these compounds to SF3B [120].

Another means to target specific splicing events is through the use of oligonucleotides that hybridize to specific regions of mRNA and regulate splicing to favor one isoform over another (Table 1) [121]. These oligonucleotides need to have the antisense sequence of the target mRNA region to facilitate binding. Therefore, they have been named antisense oligonucleotides (ASOs). Another type of oligonucleotide-based therapy is known as splice-switching oligonucleotides (SSOs), many of which function by blocking the sites on the mRNA where silencers or enhancers can bind. These exonic splicing enhancer (ESE) sites or intronic splicing silencer (ISS) sites can lead to the incorporation of different exons and introns [122]. Despite the promise of these oligonucleotides as therapeutic interventions targeting aberrant splicing, it is important to note that none of these oligonucleotides have been approved by the FDA for the treatment of cancer [121].

Although cancer screening panels do exist that detect splicing-factor mutations, there are as yet none that detect changes in AS events. Additionally, changes in the AS of many genes are shared across many different types of cancers. These changes and the accompanying changes in molecules downstream of these splice variants can be used to provide diagnostic and prognostic biomarkers for many cancers and then be used in specific combinations to stratify individual cancer types. Studies have shown that SESN1 mRNA, UHRF1BP1 mRNA and miR-377-3p are important biomarkers for predicting prognosis for patients with HPV- HNSCC [110]. This can help to stratify patients and possibly introduce new clinical strategies for managing HNSCC patients [110]. Further analysis indicated that the mRNAs for ubiquitin-like containing PHD and RING finger domains 1-binding protein 1 (UHRF1BP1) and p53-regulated protein PA26 (SESN1) are both associated with mutated TP53 [110].

Practical use of these isoform profiles as diagnostic or prognostic markers can be achieved using tissue biopsies from patients. The development of labeled riboprobes specific to the alternately spliced mRNA or labeled antibodies raised against isoform-specific protein regions would allow for the detection of AS variants in these tissue sections. Previously, the quantification of the staining intensity in such samples was based on subjective ranking by an experienced histologist. Recently, automated systems have been developed that can scan and analyze slides prepared in this manner. An example of such a system is the TissueFAXS (TissueGnostics®, Vienna, Austria). This system has previously been shown to accurately detect the transcription and expression levels of different members of the nerve growth factor (NGF) and the neurotrophin receptor (NTR) families. The expression of different members of the NTR family has been found to be mutually exclusive in different cells, with one member (NTRK1, the high-affinity NTR) being expressed in HNSCC at higher levels than the other (p75NTR, the low-affinity NTR). Through the use of riboprobes and antibodies specific to these two family members, the TissueFAXS system was able to accurately detect and quantify the levels of these different family members in HNSCC samples [123]. An example of the workflow using such a system is given in Figure 7.


**Table 1.** Different classes of drugs that target alternate splicing in cancer.


**Figure 7.** An example of a workflow to detect alternative splicing events using an automated detection and quantification system. The differences in the profiles of splicing events can be used to diagnose HNSCC, detect if the HNSCC is related to human papillomavirus (HPV) infection or stratify patients based on cancer stage and severity. Either riboprobes specific for mRNA transcripts or antibodies that are raised against portions of different protein isoforms that are unique to specific isoforms can be used to detect variants. An automated quantification system, such as the TissueFAXS Plus from TissueGnostics, can be used to rapidly and accurately detect and quantify the levels of these variants without the possible inaccuracies introduced by a subjective assessment by a histologist.

#### **7. Conclusions**

As the sixth most diagnosed cancer in the world, HNSCC is a large public health burden. The use of changes in the expression profile of alternate protein isoforms or ncRNA profiles as biomarkers could prove to be a useful diagnostic tool. In addition to this, the isoforms that contribute to cancer development and progression could serve as useful targets for the development of new therapies. These new diagnostic tools and therapeutic targets could assist in the development of personalized healthcare and more precise patient stratification. However, more research is required to establish if these different profiles can truly be used as diagnostic or prognostic tools. The main problem with these markers is that entire profiles of the splicing changes must be evaluated, as many different tumors will not show changes in the splicing of the same genes. For example, only a certain percentage of tumors will show alterations in the splicing of one gene, while the remaining tumors will not show any changes in the splicing of the same gene. For an accurate diagnosis and patient stratification, a large amount of data concerning the changes in the populations of various isoforms or mutations will need to be analyzed in order to come to the correct conclusion. This large amount of data could be expensive to obtain for individual patients and may be time-intensive to analyze. As technology advances, this cost will decrease. Additionally, as artificial intelligence and machine learning technology advance, the time needed to analyze data will decrease.
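The point about evaluating whole profiles rather than single genes can be illustrated with a toy calculation. In the Python sketch below, the tumor identifiers and splicing calls are entirely hypothetical, and `coverage` is an illustrative helper that reports the fraction of tumors showing altered splicing in at least one gene of a panel.

```python
# Hypothetical splicing calls: for each tumor, the set of genes whose splicing is altered.
tumors = {
    "tumor_1": {"DOCK5"},
    "tumor_2": {"LOXL2"},
    "tumor_3": {"DOCK5", "TFDP2"},
    "tumor_4": set(),
}


def coverage(panel, calls):
    """Fraction of tumors with an altered splicing call in at least one panel gene."""
    if not calls:
        raise ValueError("no tumors supplied")
    panel_genes = set(panel)
    hits = sum(1 for altered in calls.values() if altered & panel_genes)
    return hits / len(calls)


print(coverage(["DOCK5"], tumors))                    # single marker
print(coverage(["DOCK5", "LOXL2", "TFDP2"], tumors))  # three-gene panel
```

With these made-up calls, a single marker flags half of the tumors while the three-gene panel flags three quarters, which is why a profile of many splicing events is likely to be needed in practice.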

These molecular differences between healthy and HNSCC tissue allow patients to be grouped by their prognosis and the management of the disease to be optimized through specific treatments or, in the worst circumstances, palliative care. Multiple methods of targeting alternative splicing are being developed. However, problems have been encountered with many of these approaches. Cancer cells can develop resistance to small molecules that target the spliceosome and inhibit the formation of specific splice variants. However, the development of new drugs derived from naturally occurring compounds that can inhibit splicing may provide novel therapies to which cancer cells cannot readily develop resistance. Despite the hurdles and the large amount of further study required to develop new biomarkers and therapeutic targets based on genetic drivers of HNSCC, these molecular profiles still offer a promising target for improving treatment outcomes and creating new diagnostic and prognostic tests for HNSCC.

**Author Contributions:** Conceptualization, Z.D., A.B., S.M. and M.M.; writing—original draft preparation, M.A. and R.H.; writing—review and editing, Z.D., S.M., A.B., M.M. and R.H.; visualization, Z.D., M.A., R.H.; supervision, Z.D and A.C.; project administration, Z.D and A.C.; funding acquisition, Z.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by The South African Medical Research Council.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Genes* Editorial Office E-mail: genes@mdpi.com www.mdpi.com/journal/genes
