**Genomic Analysis of Localized High-Risk Prostate Cancer Circulating Tumor Cells at the Single-Cell Level**

#### **Aline Rangel-Pozzo 1,\*, Songyan Liu 2, Gabriel Wajnberg 3, Xuemei Wang 1, Rodney J. Ouellette 3, Geo**ff**rey G. Hicks 2, Darrel Drachenberg <sup>4</sup> and Sabine Mai 1,\***


Received: 7 July 2020; Accepted: 5 August 2020; Published: 8 August 2020

**Abstract:** Accurate risk classification of men with localized high-risk prostate cancer directly affects treatment management decisions and patient outcomes. A wide range of risk assessments and classifications are available. However, each one has significant limitations to distinguish between indolent and aggressive prostate cancers. Circulating tumor cells (CTCs) may provide an alternate additional source, beyond tissue biopsies, to enable individual patient-specific clinical assessment, simply because CTCs can reveal both tumor-derived and germline-specific genetic information more precisely than that gained from a single diagnostic biopsy. In this study, we combined a filtration-based CTC isolation technology with prostate cancer CTC immunophenotyping to identify prostate cancer CTCs. Next, we performed 3-D telomere profiling prior to laser microdissection and single-cell whole-exome sequencing (WES) of 21 CTCs and 4 lymphocytes derived from 10 localized high-risk prostate cancer patient samples. Localized high-risk prostate cancer patient CTCs present a high number of telomere signals with lower signal intensities (short telomeres). To capture the genetic diversity/heterogeneity of high-risk prostate cancer CTCs, we carried out whole-exome sequencing. We identified 202,241 single nucleotide variants (SNVs) and 137,407 insertion-deletions (indels), where less than 10% of these genetic variations were within coding regions. The genetic variation (SNVs + indels) and copy number alteration (CNAs) profiles were highly heterogeneous and intra-patient CTC variation was observed. The pathway enrichment analysis showed the presence of genetic variation in nine telomere maintenance pathways (patients 3, 5, 6, and 7), including an important gene for telomere maintenance called telomeric repeat-binding factor 2 (TRF2). Using the PharmGKB database, we identified nine genetic variations associated with response to docetaxel. A total of 48 SNVs can affect drug response for 24 known cancer drugs. Gene Set Enrichment Analysis (GSEA) (patients 1, 3, 6, and 8) identified the presence of CNAs in 11 different pathways, including the DNA damage repair (DDR) pathway. In conclusion, single-cell approaches (WES and 3-D telomere profiling) showed to be useful in unmasking CTC heterogeneity. DDR pathway mutations have been well-established as a target pathway for cancer therapy. However, the frequent CNA amplifications found in localized high-risk patients may play critical roles in the therapeutic resistance in prostate cancer.

**Keywords:** localized high-risk prostate cancer; circulating tumor cells; three-dimensional (3-D) telomere profiling; laser microdissection; whole-exome genome sequencing; somatic single nucleotide variants; copy number alterations; precision medicine

#### **1. Introduction**

Prostate cancer is a heterogeneous disease with indolent and aggressive forms. Prostate cancer is the most commonly diagnosed type of cancer in men [1]. A wide range of risk assessments and classifications are available. However, each one has significant limitations to distinguish between indolent and aggressive prostate cancers [2]. Patients are categorized into aggressive and potentially lethal disease based on tumor (T) stage, Gleason grade, the number of cores with tumor in the diagnostic biopsy, prostate-specific antigen (PSA), and imaging [3]. For some men with the highest heterogeneity or widest range of outcomes, recommendations range from active surveillance to surgery, radiation, or androgen deprivation therapy [4–7]. At present, despite improvements in prostate cancer management, relapse is still reported in the order of 30% and about 10% with rapid disease progression [8]. In addition, changes in prostate-specific antigen (PSA) concentrations was shown to not be a reliable parameter to inform prognosis [8]. The ultimate consequence of imprecise clinical prognostic grouping is that some patients with indolent tumors are overtreated, while others with an aggressive tumor are undertreated.

Recent studies have shown that specific genetic variations and copy number alterations (CNAs) are associated with disease aggressiveness and prediction of post-radical prostatectomy biochemical recurrence [9–13]; patients with high-risk polyclonal tumors relapse more frequently after primary therapy [13]. The problematic aspect of applying this information into clinical care is the associated risks of biopsy sampling, as well as the extensive spatial heterogeneity of the multifocal tumors typically present at diagnosis [14,15].

Other recent studies have addressed the potential use of circulating tumor cells (CTCs) as an additional source, beyond tissue biopsies, for a patient's clinical assessment. CTCs can reveal tumor-derived and germline genetic information with more precision than the information obtained from a single diagnostic biopsy [14,15]. The use of CTCs, as liquid biopsies, in prostate cancer provides the opportunity for multiple and minimally invasive sampling for disease monitoring, response to treatment, and molecular profiling of the disease [16–18]. CTCs isolated from blood samples have shown to be found in early stages of the disease as well as in localized high-risk prostate cancer; however, the clinical significance of this has not yet been established [19–22]. In order to use CTCs as a minimally invasive sampling of prostate cancer and as biomarkers for patient stratification and selection of targeted therapy, it is important to ensure efficient enrichment (isolation), detection (identification imaging), and characterization (molecular profiling) strategies [23–27]. The limitations of the Food and Drug Administration (FDA)-approved platform Epithelial cell adhesion molecule (EpCAM)-based capture assays for the detection and enumeration of CTCs stimulated the development of many other technologies, including size-based capture enrichment devices [28].

In this study, we combined a filtration-based CTC isolation technology with prostate cancer CTC immunophenotyping to identify the prostate cancer CTCs [29,30]. After identification, we performed 3-D telomere profiling prior to laser microdissection and single-cell whole-exome sequencing (WES) of 21 CTCs and 4 lymphocytes from 10 localized high-risk prostate cancer patients. Our goal was to identify unique and common single nucleotide variants (SNVs), insertion-deletions (indels) mutations, and copy number alterations (CNAs) that could be used to predict high-risk lethal prostate cancer and treatment response for patients with clinically localized high-risk prostate cancers. Three-dimensional telomere profiling was performed prior to single-cell sequencing in the same patient sample, since alterations in telomere biology are one of the earliest events in prostate cancer tumorigenesis that continue during tumor progression [29,30]. The ability of CTCs' 3-D telomere profiling in displaying

tumor cell-dependent alterations in telomere architecture and its role as an important structural indicator of genomic instability present in each tumor cell genome have appeared in previous studies [31–35].

#### **2. Materials and Methods**

#### *2.1. Patient Samples*

Ten treatment-naïve patients with confirmed localized high-risk prostate cancer Gleason 8 or 9 had their CTCs and/or lymphocytes collected and analyzed. This study was conducted between 2017 and 2019. The patient clinical characteristics are summarized in Supplementary Table S1. This study obtained University of Manitoba Ethics Board approval and informed consent (HS14085; H2011:336; CCMB RRIC number 59-2011).

#### *2.2. Isolation of CTCs Using the ScreenCell Filtration Technique and Immunostaining*

CTC isolation by size-based filtration and immunostaining was performed as previously reported [34]. All samples were processed within 2h. The CTCs were isolated from the blood of prostate cancer patients using Screen Cell filtration devices (ScreenCell, Sarcelles, France), according to the manufacturer's instruction [28]. All prostate cancer CTCs were recognized with a combination of prostate cancer cell-specific antibodies. Anti-androgen receptor conjugated with Alexa Fluor 488 (AR Antibody (441): sc-7305, Santa Cruz Biotechnology, Dallas, Texas, EUA), anti-cytokeratin 8,18,19 antibodies (Anti-Cytokeratin 8 + 18 + 19 antibody—ab41825, abcam, Cambridge, United Kingdom), as well as a negative marker for prostate cancer CTCs, CD45 (Anti-CD45 antibody (ab10558), abcam, Cambridge, United Kingdom) for lymphocyte staining was used. Dried isolation supports (ISs) were stored at 4 ◦C or -20 ◦C prior to quantitative 3-D telomere fluorescent in situ hybridization and laser microdissection for single-cell isolation. Two ISs were collected per patient.

#### *2.3. Co-Immuno Telomere Three-Dimensional Quantitative Fluorescent In Situ Hybridization (3-D-QFISH)*

The 3-D-QFISH was performed as previously described [32–35]. Briefly, the ISs or filters were rehydrated with 1x PBS (phosphate-buffered saline) for 5 min followed by a 10-min fixation in 3.7% formaldehyde/1x PBS. The filters were blocked in 4%BSA/4×SSC blocking solution for 5 min, then incubated with primary antibody anti-AR (1:500 dilution), anti-Cytokeratin 8 + 18 + 19 antibody (1:200 dilution), and anti-CD45 antibody (1:100 dilution) for 45 min at 37 ◦C in a humidified atmosphere. Then, 1× PBS for 5 min (3×) were performed to wash away the extra unbound primary antibody. Incubation with secondary goat anti-mouse antibody (1:500 dilution, Alexa Fluor 680 (Cy 5.5) ThermoFisher Scientific, Waltham, MA, USA) and secondary goat anti-rabbit antibody (1:500 dilution, Alexa Fluor 647 (Cy5) ThermoFisher Scientific, Waltham, MA, USA) was followed for 30 min at 37 ◦C in a humidified atmosphere. Then, 1× PBS three times for 5 min washes were performed to wash away the extra unbound antibody. Filters were dehydrated in an ethanol series and air-dried. Cyan 3 (Cy3) telomere-specific peptide nucleic acid (PNA) probe (DAKO, Agilent Technologies, USA) was applied before denaturation at 80 ◦C for 3 min, hybridization for 2 h (h) at 30 ◦C. Filters were washed in 70% deionized formamide (Sigma-Aldrich, St. Louis, MI, USA) in 10 mM Tris pH 7.4 for 15 min. Filters were removed from the metal support ring using an 8-mm biopsy punch, placed on a new slide, DAPI (4- ,6-diamidino-2-phenylindole), ThermoFisher Scientific, Waltham, MA, USA) counterstained, and mounted with Vectashield (Vector Laboratories, Burlington, ON, Canada) with a coverslip.

#### *2.4. Imaging and Analysis*

For each patient sample, 30 CTC nuclei were analyzed using TeloViewTM software [36] (used with the permission of Telo Genomics Corp Inc. Toronto, ON, Canada). Telomeres were imaged using fluorescence microscopy (Zeiss AxioImager Z1 microscope (Carl Zeiss, Toronto, ON, Canada) equipped with an AxioCam HRm camera, using a 63×/1.4 oil plan apochromat objective lens). The imaging software ZEN 2.3 software was used for image acquisition. Three-dimensional imaging of telomeres was performed using 40 *z*-stacks, each with a thickness of 0.2 μm (*z* plane). The sampling distance of the *x*- and *y*-planes was 102 nm. The exposure time for Cy3 (telomeres) was maintained at a constant 444.54 milliseconds, whereas that for FITC, Cy5, Cy5.5, and DAPI varied. An FITC filter was used to determine the presence of AR antibodies, Cy5.5 for anti-cytokeratin 8,18,19 antibodies and Cy5 for CD45 antibodies. Images were deconvolved using a constrained iterative algorithm [36] at the manual strength of 7 for CY3 and 6 for DAPI. Each cell was analyzed for the number of telomere signals per nucleus, intensity of signal, presence of telomere aggregates (two or more signals that cannot be resolved due to proximity and defined as a signal with an intensity above the standard deviation of signal intensity for that cell), *a*/*c* ratio, and nuclear volume. These measurements were determined for CTCs from each patient isolated at diagnosis. When cells are captured on the ScreenCell filtration device, they are flattened due to the mild vacuum applied during isolation [32]. Therefore, the nuclear volumes and *a*/*c* ratios discussed here can only be seen in a comparative manner (CTCs vs. CTCs) and do not represent absolute measurements.

#### *2.5. Laser Microdissection and Whole-Exome Amplification*

Prostate cancer CTCs and lymphocytes were isolated by laser microdissection. Giemsa (Millipore, Billerica, MA, USA) was used to stain the filters, allowing single CTCs and lymphocytes to be identified and isolated by Laser Microdissection Olympus IX microscope MMI CellCut (MMI GmbH—Molecular Machines & Industries, Eching, Germany) (Figure 1). Once isolated at the single-cell level, CTCs underwent whole-genome amplification (WES). The DNA of isolated CTCs and lymphocytes was amplified using the Ampli1™ WES kit (Menarini Silicon Biosystems, San Diego, CA, USA) according to the manufacturer's instructions. Briefly, reactions conducted in the same tube followed these steps: Cell lysis, DNA digestion, ligation, and primary PCR according to the procedure of the supplier, resulting in a final volume of 50 μL of WES product. Genome integrity and quality were evaluated using the Ampli1™ QC kit (Menarini Silicon Biosystems San Diego, CA, USA) and PCR products were visualized via 1.5% agarose gel.

**Figure 1.** Principle of the laser capture microdissection. After circulating tumor cells (CTC) isolation, the CTCs were attached in a track-etched polycarbonate filter. The filter pores measure 6.5 ± 0.33 μm in diameter and retain 85–100% of tumor cells and only 0.1% of lymphocytes. (**A**) May-Gruenwald-Giemsa stain was performed on the filters for CTC identification by morphological and cytopathological criteria. Then, a UV laser beam was focused and used to cut a circle around the area of the target CTC or lymphocyte via an inverted microscope (Laser Microdissection Olympus IX microscope MMI CellCut—MMI GmbH—Molecular Machines & Industries, Eching, Germany). The dissected CTC was collected by photonic pressure using laser pressure to lift the dissected CTC into a collecting cap (**B**). The empty area that had contained the target cell can be visualized in **C**.

#### *2.6. Whole-Exome Sequencing and Bioinformatics Analysis*

DNA fragments of 180–280 bp in length were generated by a hydrodynamic shearing system (Covaris, MA, USA) with 1.0 μg of genomic DNA per sample. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities and enzymes from a TruSeq preparation kit were removed. After adenylation of the 3' ends of DNA fragments, adapter oligonucleotides (TruSeq adaptors) were ligated. DNA fragments with ligated adapter molecules on both ends were selectively enriched in a PCR reaction. The PCR products were purified using an AMPure XP system (Beckman Coulter, Beverly, MA, USA) and quantified using the Agilent high sensitivity DNA assay on the Agilent Bioanalyzer 2100 system. The fragmented sequences were hybridized with probes using an Agilent SureSelect Human All Exon kit (Agilent Technologies, CA, USA). The clustering of the index-coded samples was performed on a cBot Cluster Generation System using a TruSeq PE Cluster Kit v4-cBot-HS (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. After cluster generation, the output was loaded into a chip and sequenced with the HiSeq X sequencer (Illumina, San Diego, CA, USA). A total of 21 CTCs and 4 lymphocytes were sequenced (Table 1). The number of CTC and or lymphocyte analysed per patient is shown in the Supplementary Materials Table S1.


**Table 1.** List of the 25 samples sequenced: 21 CTCs and 4 lymphocytes.

The sequencing experiment produced raw fastq files (sequencing data available at SRA database, SRA accession: PRJNA633995; Temporary Submission ID: SUB7472754), which were preprocessed with Trim Galore (version 0.3.7) [37] in order to remove adapters and perform quality trimming. The mapping was performed with the Burrows-Wheeler Aligner (BWA) [38], more precisely the bwa-mem (version 0.7.8) algorithm, to the human reference genome (GRCh37 + decoy). The aligned bam file was processed using Samtools (version 1.0) [39] and Picard (version 1.111) [40]. SNV and indels calling were performed with GATK (version 3.1) [41] and respective annotation with Ensemble Effect Variant Predictor (VEP) web-version [42] using dbSNP (build 144) [43]. The copy number variations were identified using Control-FREEC (version 6.7) [44] and the genomic amplification was calculated between each CTC generating a file with merged SNVs from all lymphocytes. For the SNV and indel cutoff, we discarded variants with a mapping quality lower than 30 and read depth lower than 100. For the CNV calling, we considered only CNVs with gains above 2 copies. The SNVs and indels identified in the lymphocytes were used only to filter out variations not associated with the cancer genotype. For example, in the cluster analyses, we removed every SNV or indels also

present in at least one lymphocyte. For CNV analysis, lymphocytes were used to set out the number of gains of copies in the CTCs. Gene set enrichment analysis (GSEA) was performed with EnrichR [45]. To compare our findings with known pathways and terms, we used KEGG [46], Gene Ontology [47], and Reactome [48] databases. We also used the PharmKGB database to associate genetic variations to known cancer drugs' effect [49].

To reduce the probability of false positives, we selected the SNVs and indels with genotype quality ≥ 30 and reading depth ≥ 100 from all samples to be entered into a customized database (MySQL). We filtered out all SNVs and indels found in any of the lymphocyte samples in order to analyze only somatic variants.

#### **3. Results**

#### *3.1. High-Risk Prostate Cancer CTCs Were Selected Based on Their Positive Staining for the Androgen Receptor and Cytokeratin 8, 18, and 19 and Negativity for CD45.*

The study involved 10 patients with high-risk localized prostate cancer, aged 55–80 years (median, 75 years) at diagnosis. Their median PSA level and Gleason score were 6.4 ng/mL and 9, respectively. A summary of the clinical features of all 10 patients at baseline is shown in Supplementary Table S1. CTCs were collected at diagnosis and prior to treatment, using a size-based filtration technique (ScreenCell) [28]. We identified the prostate cancer CTCs based on their positive immunostaining for androgen receptor (AR) and cytokeratin 8, 18, and 19 (Cy) and negative staining for CD45 (Figure 2). Figure 2B shows an isolated CTC stained with AR antibody conjugated with Alexa Fluor 488 in green and Cytokeratin in red. AR stains both the intracytoplasmic region and the cell membrane, while cytokeratin identifies CTCs with epithelial origin.

**Figure 2.** Example of a circulating tumor cell from a high-risk localized prostate cancer patient captured on top of a filter pore (**A**). The arrows show empty filter pores in A. Prostate cancer CTCs are recognized based on their androgen receptor (AR) and cytokeratin-positive staining and -negative staining of CD45 (**B**). (B1) Two-dimensional image showing a CTC AR positive in fluorescein isothiocyanate (FITC—green); (B2) CTC with the telomeres labeled with Cy3-labeled probe (red); (B3) Merge between FITC and telomeres; (B4) CTC cytokeratin positive in Cy5.5 (red); (B5) CTC CD45 negative; and (B6) merge of CTC counterstained with 4- ,6-diamidino-2-phenylindole (DAPI) in blue. In (**C**), the same cell is shown in three-dimensional representation. Red spots represent telomere signals; and the blue is DAPI.

#### *3.2. CTCs from Localized High-Risk Prostate Patients Showed Telomere-Related Heterogeneity at the Single-Cell Level*

The three-dimensional telomere profile was used to adequately characterize prostate cancer CTCs and possible CTCs subpopulations. The TeloView® (Telo Genomics, Toronto, ON, Canada) program provides information on the average intensity/telomere length distributions, which can be related to the clonality. We used the Wilcoxon score to explore the cell distribution of all TeloView® parameters obtained for the 30 readings in each patient (Figure 3). At the individual patient level, it was apparent that CTC telomeres displayed considerable length heterogeneity (Figure 3B,C). The same was observed for the total number of telomere signals and *a*/*c* ratio (Figure 3D,E). The *a*/*c* ratio is correlated with cell cycle phase; a higher *a*/*c* ratio corresponds to the telomeres becoming organized in a disk-like formation in preparation for mitosis (later stages in the cell cycle) [36]. However, for the number of telomere aggregates and nuclear volume, the heterogeneity among CTCs were less evident. While all patients exhibited a high number of telomere aggregates (Figure 3F), the patients were dispersed in two subpopulations for nuclear volume (Figure 3A), based on the clear distinction of samples provided by this parameter. The telomere parameters were similar to those previously described by our group for high-risk localized prostate patients [30]. The lymphocytes' telomere parameter distribution for each patient is provided in the Supplementary Materials Figure S1.

**Figure 3.** *Cont*.

**Figure 3.** Representative bar plots to illustrate the intra and inter-sample variability of all telomere parameters. (**A**) Nuclear volume. (**B**) Total telomere signal intensity. (**C**) Average intensity (proportional of telomere length). (**D**) Total number of telomere signals. (**E**) a/c ratio (see material and methods). (**F**) Total number of telomere aggregates (see material and methods). The x-axis assigns one box for the CTC population analyzed per patient. The y-axis refers to output from Kruskal–Wallis test represented as Wilcoxon mean scores (determined using SAS software). Whiskers show minimum and maximum values, boxes represent 25–75% data ranges, horizontal lines within boxes are medians, and diamond symbols are means.

#### *3.3. Whole-Exome Sequencing Showed Genetic Variation (SNVs and Indels) Associated with Telomere Maintenance Genes, Prostate Cancer, and Known Cancer Drug Response*

We first analyzed the distribution of SNVs and indels among CTCs and lymphocytes. The genetic variation analysis (SNVs and indels) of single CTCs can detect important somatic mutations at diagnosis and new alterations acquired during the disease evolution or after treatment. Twenty-one single CTCs and 4 single lymphocytes DNA from 10 different patients were isolated and sequenced. We identified a total of 202,241 SNVs and 137,407 indels where less than 10% of these genetic variations were within coding regions (Table 2). We used the term genetic variations for the sum of SNVs and indels. Table 3 shows the number of SNVs and indels found in each CTC. As demonstrated in Table 3, each CTC showed a different number of SNVs and indels alterations. The lowest count of genetic variation (SNVs + indels) were found in the CTC sample 902\_4 of patient 8 with 982 total and the highest number was found in the CTC sample 3013 of patient 2 with a total of 137.854 affected genes. No common genetic variation (SNVs + indels) was found in all 21 CTCs. However, we found common SNVs and indels in all patients in at least one of the CTCs. They all presented a deletion of four nucleotides (AAAG) in the *ITSN1* (Intersectin 1 gene) (rs71322246) and four SNVs in *PDE4DIP* (phosphodiesterase 4D interacting protein gene) (A/G, rs4997150) and (G/T, rs4997149); gene *ITSN1* (A/G, rs10222139); and *RCF1* (Respiratory supercomplex factor 1 gene) (A/G, rs2306596).

**Table 2.** Total number of unique, coding, and non-coding regions affected by single nucleotide variants (SNVs) and insertion-deletions (indels).


Hierarchical clustering analysis was performed to compare patterns of SNVs and indels between the samples. Closer clusters in the dendrogram have more similar genetic variations than distant ones. As shown in Figure 4, all lymphocytes sit in the same cluster, which highlights the similarities between them. The lymphocyte cluster is very different from the cluster formed by CTCs samples 6016, 5015, and 7017, which is located in the opposite site of the dendrogram.


**Table 3.** Single nucleotide variants (SNVs) and insertion-deletions (Indels) count by CTCs and annotation into dbSNP [43] and COSMIC [50] databases.

Unannotated: not found in both database (dbSNP and COSMIC). Annotated: found in at least one database (dbSNP and/or COSMIC). P = Patient.

**Figure 4.** Hierarchical clustering of 21 CTCs (in black) and 4 lymphocytes (in blue) used in the analysis. The SNVs and indels are clustered using the average method, which performs a hierarchical cluster analysis using a set of dissimilarities between the samples. The y-axis (the heigh) are values of the distance in which two groups can split or merge, using the calculation of the Euclidean distance. P = patient.

Next, we evaluated the SNVs and indels in coding regions to highlight the presence of clinically relevant mutations. Patients 3, 5, 6, and 7 presented the highest number of genes among all patients, with genetic variation in coding regions (frameshift indels and/or missense SNVs). In Figure 5, we used a Venn diagram to illustrate the CTC-shared SNVs and indels between those four prostate cancer patients. We combined all genetic variation found in different CTCs isolated from the same patient. Patients 3, 5, 6, and 7 shared 758 genetic variations. To further understand the potential biological functions of CTC-shared SNVs and indels, we performed KEGG pathway (http://www.kegg.jp/) and GO (http://www.geneontology.org/) biological process analyses. In the total affected genes (4698), nine are associated with telomere maintenance (Gene Ontology term, GO:0000723) (*ATM*, *PARP1*, *HNRNPC*, *RAD50*, *PINX1*, *TERF2*, *NAT10*, *HNRNPA1*, and *TNKS*) (9 of 36 genes found in this pathway) and 18 genes were related to prostate cancer in the KEGG database (hsa05215) (18/89 found associated with prostate cancer) (Figure 5 in bold). In Supplementary Materials Figure S2, we illustrate the CTC-shared SNVs and indels results found for each patient and important genes affected.

**Figure 5.** CTC-shared SNVs and indels for four prostate cancer patients. Venn diagram of genes with genetic variation within the coding sequence (frameshift indels and missense SNVs) in patient 3 (green), patient 5 (yellow), patient 6 (red), and patient 7 (blue). The genes highlighted in italic are from the GO term telomere maintenance (GO:0000723) and in bold from the KEGG related to prostate cancer (hsa05215).

Lastly, we assessed the impact of SNVs found in all CTCs (10 patients) on known drug response targets according to the PharmGKB database. We identified nine genetic variations associated with response to docetaxel. In Figure 6, we show the number of SNVs associated with drug response. A total of 48 SNVs can affect drug response for 24 known cancer drugs. The 48 SNVs were identified in at least one CTC (Figure 6). To better contextualize Figure 6, we included a Supplementary Materials Table S2, where we list the number of SNVs found in our cohort associated with different cancer drug response. The percentage of our findings in all described SNVs associated with the same drug in the PharmKGB database and the patients where at least one SNV was found are shown in the sequential columns (Supplementary Materials Table S2). We used the Fisher exact test to perform an enrichment analysis of each drug in which we found correlated SNVs using the PharmGKB database. In summary, we checked if the group of SNVs previously identified had enough overrepresentation within a certain

drug in the total all cancer drugs listed in the PharmGKB database and identified the following three drugs (*p*-value < 0.05): Cyclophosphamide, Docetaxel, and Thalidomide. These are the main drugs used in prostate cancer therapy (highlighted in blue in Supplementary Materials Table S2). The false discovery rate (FDR) using Benjamini–Hochberg correction was applied (threshold of 0.05) to the list of drugs and these three drugs remained the ones showing statistical significance.

**Figure 6.** Bar plot of SNVs associated with known cancer drugs in the PharmGKB database. The PharmGKB database gathers all currently reported variant–drug interactions by at least two different scientific publications. The x-axis illustrates the cancer drugs, anthracyclines (**A**), capecitabine (**B**), carboplatin (**C**), cisplatin (**D**), cyclophosphamide (**E**), docetaxel (**F**), doxorubicin (**G**), doxorubicinol (**H**), exemestane (**I**), fluorouracil (**J**), gemcitabine (**K**), imatinib (**L**), irinotecan (**M**), letrozole (**N**), leucovorin (**O**), lonafarnib (**P**), methotrexate (**Q**), oxaliplatin (**R**), paclitaxel (**S**), platin compounds (**T**), thalidomide (**U**), tipiracil HCL (**V**), trastuzumab (**W**), and trifluridine (**X**). The y-axis is the number of SNVs found associated with each drug. The same SNV can affect the response of different drugs.

#### *3.4. Copy Number Alterations Identify Gene Amplifications Associated with High-Risk Prostate CTCs*

Somatic copy number alterations (CNAs) have an important role in genome instability and tumorigenesis. In contrast to SNVs and indels, which show substantial cell-to-cell heterogeneity, CNAs seem to exhibit genomic homogeneity in their patterns [51]. Genomic analyses of CTCs are crucial for understanding the underlying mechanisms required for cancer metastasis, including escape from the primary tumor site, entry in peripheral blood, and survival in the circulation [51]. To reduce the number of false negative genes for CNAs due to the procedure of single cells' DNA amplification, we focused only on gene amplification. We noticed that significant portions of chromosomes in different samples, such as chromosome 1 in 1011; chromosomes 6 and 11 in 6016; chromosome 5 in 6016 and 2012; chromosome 7 in 2012; and chromosome 13 in 6016 and 7017 (Figure 7), were amplified. No common CNAs were found in all 21 CTCs or in all patients (found in at least one CTC).

Four of the nine patients analyzed had the highest number of amplifications (patients 1, 3, 6, and 8). The CTC-shared CNAs for those four patients are shown in Figure 8. Thirty-three amplifications have already been described as being associated with high-risk prostate and they are highlighted in Figure 8 [52,53]. In addition, 37 amplified genes were identified to be commonly shared by those four patients (Supplementary Materials Table S3). In order to understand how those 37 shared amplified genes are conected in known biological pathways, we next performed Gene Set Enrichment Analysis (GSEA) in the Reactome database (Table 4). The GSEA revealed that Poly(ADP-Ribose) Polymerase 1 (PARP1) amplification can affect three important DNA damage repair (DDR) pathways: Single strand break repair (SSBR), base excision repair (BER), and nucleotide excision

repair (NER) [54]. In addition, USP21 amplification/ overexpression was positively correlated with human pancreatic ductal adenocarcinoma disease progression. USP21 promotes cell proliferation, tumor progression, and colony formation, and enhances cancer stem cell self-renewal. USP21 stabilizes the Wnt (Wingless-related integration site) pathway transcription factor 7 (TCF7) to activate gene expression in the Wnt network [55].

**Figure 7.** Heatmap showing significant large-scale amplification events in CTCs per whole chromosome. In this heat map, the chromosome number are arranged from left top to right, and 21 CTCs analysed flow vertical, top to bottom. Significant genomic amplifications are represented in red and the red intensity can vary according to the number of copies amplified. P = patient.

**Figure 8.** Venn diagram representing genes with amplification in at least one copy in patients 1 (blue), 3 (yellow), 6 (green), and 8 (red). The genes highlightened are similar genes' amplification found in a previous study using CTCs from clinically localized high-risk prostate cancer (Friedlander et al. 2019) [52]. Some of the affected genes were also found by Ikeda et al. 2019 [53]. The list with the 37 commonly amplified genes is shown in Supplementary Materials Table S3.


**Table 4.** Gene set enrichment analysis of the 37 shared amplified genes using the Reactome database found in patients 1, 3, 6, and 8.

#### **4. Discussion**

Accurate risk classification of men with localized high-risk prostate cancer directly affects treatment management decisions and patient outcomes [2]. A wide range of risk assessment methods is available, each with significant limitations in discriminating between indolent and aggressive prostate cancers [6]. Sampling error, due to tumor multifocality tumors, failure of currently available imaging modalities to detect and assess local disease burden, and low-volume metastatic disease, can also increase the changes of misclassification [9]. Studies have shown that specific genetic alterations, such as mutations and copy number alterations, are associated with disease aggressiveness [9–11]. In addition, prostate patients with polyclonal tumors (and distinct mutational signatures) also relapse more frequently after primary therapy [13]. The main problem to apply this information into clinical care is the risk associated with biopsy sampling and the extensive spatial heterogeneity of the multifocal tumors, typically present at diagnosis. Repeated biopsy sampling can lead to infectious complications and even death [14]. CTCs have shown great clinical utility to characterize the genetic landscape of underlying tumors in prostate cancer and other solid tumors [34–36]. Obtaining the molecular profiles from patients with clinically localized disease may reduce the risk of misclassification and increase the detection of aggressive/lethal disease in need of immediate treatment. In addition, tumor cell-dependent alterations in telomere architecture represent a structural indicator of genomic instability present in prostate cancer CTCs [34–36]. The combination of telomere-related genomic instability with novel blood-based molecular profiling technologies, such as single-cell whole-exome sequencing, can improve our ability to monitor clonal evolution during therapy and disease progression.

Here, we performed 3-D telomere profiling prior to laser microdissection and single-cell whole-exome sequencing in localized high-risk prostate cancer patient samples. Our telomere measurements using TeloView® showed that CTC telomeres displayed considerable length heterogeneity as well as the total number of telomere signals and *a*/*c* ratio, in agreement with our previously published

results [35]. The CTCs of localized high-risk prostate cancer patients present a higher number of telomere signals than normal lymphocytes, with lower signal intensities (length), which signal an increase in telomere-related genomic instability [30]. We could see clearly that the nuclear volume measurements identified two subpopulations: Subpopulation 1 (including patients 3, 4, 5, 6, and 7) and subpopulation 2 (including patients 1, 2, 8, 9, and 10) (Figure 3A). We then used WES of single CTCs in order to detect the presence of multiple mutations within the same cell and further investigated tumor heterogeneity. In total, 21 single CTCs and 4 single lymphocytes from 10 different patients were isolated and sequenced. We identified a total of 202,241 SNVs and 137,407 indels where less than 10% of these genetic variations were within coding regions (Table 2). Since many regions of noncoding DNA play a role in the control of gene activity, it is possible that the number of genetic variations identified in noncoding regions is affecting the expression of a variate of genes. In addition, indels that can lead to frameshifts are usually under negative selection pressure [56]. Indels are the second most frequent type of genetic variation, followed by single nucleotide variations, and account for almost a quarter of the genetic variation implicated in human diseases [57]. We identified that the genetic variations (SFNVs + indels) and CNAs profiles were highly heterogeneous. Intra-patient CTC variation was observed for both SNVs + indels and CNAs (Figures 5 and 8, and Supplementary Materials Figure S2). However, in reality, all 21 CTCs lacked common genetic variations (SNPs + indels) or CNAs, which is an indication of an extremely heterogeneous disease. In fact, localized prostate cancers are known to be genetically variable and frequently multifocal with multiple independently evolving clones [11]. To date, there is no understanding of whether this genetic variability can aid in management decisions for patient care. However, all patients presented a deletion of four nucleotides (AAAG) in the ITSN1 (intersectin 1 gene). ITSN1 inhibition is associated with cell proliferation and cell apoptosis inhibition. The ITSN1 gene is being considered a key biological target candidate for breast cancer [58]. The importance of ITSN1 deletion in prostate cancer still awaits future studies. We also found, in all patients, SNPs in the PDE4DIP and RCF1 genes. PDE4DIP (also known as myomegalin, MMGL) is a tumor marker for diagnosis and prognosis in patients with esophageal squamous cell carcinoma [59]. RCF1 is a member of the conserved hypoxia-induced gene 1 (Hig1) protein family [60]. The role of PDE4DIP and RCF1 genes in PC still awaits full investigation.

To explore the biological significance of genetic variants found in prostate cancer CTCs, we performed pathway enrichment analysis of the affected genes. Patients 3, 5, 6, and 7 showed 758 commonly genetic variations, where 9 telomere maintenance pathways are affected. This includes an important gene for telomere maintenance, called telomeric repeat-binding factor 2 (TERF2, also known as TRF2), which is one of the critical members of the shelterin complex. Loss or mutation of TRF2 results in telomere shortening, DNA damage, senescence, or apoptosis [61]. Alterations in TERF2 could explaining the increased telomere-related genomic instability in patients 3, 5, 6, and 7. A key opportunity arising from whole-exome sequencing analysis is the early identification of the patient's drug response. To this end, we used the PharmGKB database to investigate the impact of the SNVs found in all CTCs on drug response. We identified nine genetic variations associated with response to docetaxel. Adjuvant docetaxel-based chemotherapy improved the overall survival and disease-free survival among high-risk nonmetastatic prostate cancer, when added to the standard treatment of radiotherapy and long-term androgen suppression. Rosenthal et al. (2019) showed a reduction in the rate of distant metastasis with the addition of docetaxel to standard treatment in men [62].

Another WES data application explored was CNA analysis. We found a high-level gain of a chromosomal segment in some CTCs (Figure 8). In the total of nine patients analyzed, four of them had the highest number of amplifications found in different chromosomes (patients 1, 3, 6, and 8). Due to the absence of studies using WES to investigate CNAs in single CTCs from localized high-risk prostate cancer patients before treatment, we compared our finding with those of Friedlander et al. (2019) [52]. The authors performed single-cell whole-genome analysis in CTCs of 14 patients with localized high-risk prostate cancer within 2 to 5 months after prostatectomy. We found amplification in 33 similar genes (Figure 8). As observed by Friedlander et al. (2019) and corroborated by our study, *MYCN* and

*AR* amplications was not frequenty observed in CTCs from localized high-risk prostate cancer. None of our CTC-shared CNAs, represented in Figure 8, were observed by Friedlander et al. (2019). In order to investigate which pathways the CTC-shared CNAs (37 genes) could affect, we performed Gene Set Enrichment Analysis (GSEA). PARP1 amplification can affect two important DNA damage repair (DDR) pathways. DDR gene amplification can lead to chemotherapy resistance and short overall survival [53]. PARP1 is a multifunctional enzyme, which binds to DNA breaks and recruits DNA repair proteins to the damaged site [63]. The use of PARP inhibitors in cancer treatment is based on the combination of PARP inhibition with DNA-damaging drugs [63]. Four of the PARP inhibitors are currently approved by FDA for ovarian and breast cancer. However, only a few early phase studies have been completed to propose the use of PARP inhibitors for prostate cancer treatment [63]. A high proportion of prostate cancer patients carry DDR gene defects. Here, we found a higher frequency of amplification on DDR genes as a novel finding of our study. Copy number amplification of DNA damage repair pathways potentiates therapeutic resistance in cancer [63]. These patients may represent a new subgroup that would benefit from therapeutics targeting DNA damage response pathways, such as PARP inhibitors [63]. In addition, USPs (ubiquitin-specific protease) amplification has been reported in prostate cancer, such as USP2a, USP7, and USP10. We showed that USP21 is also amplified in prostate cancer CTCs. USP21 amplification can increase proliferation, migration, and invasion [64]. In non-small-cell lung cancer (NSCLC), USP21 amplification is highly prevalent and it is speculated that inhibition of USP21 might serve as a promising therapeutic approach in NSCLC [64].

The nuclear volume measurements identified two subpopulations: Subpopulation 1 (including patients 3, 4, 5, 6, and 7) and subpopulation 2 (including patients 1, 2, 8, 9, and 10). We found 153 genes commonly affected by missense SNV or frameshift indels in the subpopulation 1 but not in the subpopulation 2. Supplementary Materials Figure S3 shows a heat map in clustered grouping order and a list of all 153 genes found. To date, no study appeared in the literature investigating the association between smaller or larger CTCs with prognosis using CTCs from localized high-risk prostate cancer. In breast cancer patients, for example, smaller CTCs were associated with poor overall survival [65] and the authors suggested that smaller isolated CTCs could be cancer stem cells, and the more cancer stem cells, the more aggressive the disease. For the CNA analysis, we found just one gene commonly amplified in subpopulation 1 (patients 3, 5, 6, and 7) that was not amplified in subpopulation 2 (patients 1, 2, 8, 9, and 10), which was the *MUC12* gene. MUC12 overexpression is an independent marker of prognosis in stage II and stage III colorectal cancer. However, the role of MUC12 overexpression in prostate cancer has not been explored. It is especially important in cancer to distinguish driver mutations from passenger mutations, i.e., to distinguish meaningful events from random background aberrations. Control-FREEC software (version 6.7) identifies those regions of the genome that are aberrant more often than would be expected by chance, with greater weight given to high-amplitude events (high-level copy-number gains or homozygous deletions), which are less likely to represent random aberrations or sequencing errors, and filters for recurrent CNVs that exceed a significance probability threshold of 0.01 [44]. The frequencies of the altered CNAs and SNVs/indels in each group were compared between the subpopulations. A chi square was used to evaluate the statistical significance of the differences. The amplification of the MUC12 gene, which was described in the four patients in group A but not in any of group B, was statistically significant between subpopulations (*p*-value = 6.198 <sup>×</sup> 10−12). The same chi-square test showed a statistically significant difference (*p*-value = 2.2 <sup>×</sup> 10−16) between the pattern of SNVs and indels of the same two subpopulations.

In conclusion, single-cell approaches (WES and 3-D telomere profiling) were shown to be useful in unmasking CTC heterogeneity in a treatment-naïve prostate cancer patient risk group. Tumor heterogeneity is one of the major causes of failure in prostate cancer prognosis and prediction. Accurately detecting tumor heterogeneity and resistant clones is one of the main goals for the identification of new biomarkers for clinical assessment. DDR pathway mutations have been well-established as a target pathway for cancer therapy. However, frequent CNA amplifications found in localized high-risk

patients may play critical roles in the therapeutic resistance in prostate cancer. Hence, the single-cell profiling techniques described here, together with other clinical parameters, may aid in the classification of prostate cancer patients and contribute to understanding the predictive value alluded to the presence of genetic alterations, such as SNVs, indels, and CNAs in CTCs subclones.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4409/9/8/1863/s1, Figure S1: Representative bar plots to illustrate the lymphocytes telomere parameters. Figure S2: Venn diagram representing CTC-shared SNVs/indels and CNA per patient. Genes highlight in the CNA section were also found in Friedlander et al. 2019 and genes highlight in the SNVs/indels were associated with prostate cancer in bold and telomere maintenance in italic. We also emphasize in red CTC-shared genes. Figure S3: A heat map of gene alterations found in two subpopulations identified by nuclear volume measurements. Table S1: List of all patients included with number of CTCs and/or lymphocytes analyzed, corresponding Gleason score, TMN staging and PSA levels at diagnosis. Table S2: SNVs associated with known cancer drugs in PharmGKB database. Table S3: List of 37 commonly amplified-genes for patients 1, 3, 6 and 8.

**Author Contributions:** Experimental part, A.R.-P., X.W.; Analysis, A.R.-P., S.L. and G.W.; Writing—Original Draft Preparation, A.R.-P., G.W.; Clinical data, D.D.; Writing—Review and Editing, A.R.-P., S.L., G.W., R.J.O., G.G.H., D.D., S.M.; Supervision, A.R.-P., S.M.; Project Administration, S.M.; Ethics approval, S.M.; Funding Acquisition, S.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was funded by the Prostate Cancer Fight Foundation/Manitoba Ride for Dad.

**Acknowledgments:** The authors would like to thank the prostate cancer patients who contributed to this study in Manitoba/Canada and the research nurse, Paula Sitarik, for blood collection. We thank Telo Genomics Corp. for the use of TeloView®, Mary Cheang for statistical analysis and Elizabete Cruz for helping in the manuscript preparation. This study was supported by the Manitoba Tumor Bank, Winnipeg, Manitoba, funded in part by the CancerCare Manitoba Foundation and the Canadian Institutes of Health Research and is a member of the Canadian Tissue Repository Network. The authors also thank the Genomic Centre for Cancer Research and Diagnosis (GCCRD) for imaging. The GCCRD is funded by the Canada Foundation for Innovation and supported by CancerCare Manitoba Foundation, the University of Manitoba and the Canada Research Chair Tier 1 (S.M.). The GCCRD is a member of the Canadian National Scientific Platforms (CNSP) and of Canada BioImaging.

**Conflicts of Interest:** S.M. is a shareholder, director and chair of the clinical and scientific advisory board of Telo Genomics Corp. (Toronto, ON, Canada). The other authors declare that they have no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Single-Cell Transcriptomes Reveal Characteristic Features of Mouse Hepatocytes with Liver Cholestatic Injury**

#### **Na Chang** †**, Lei Tian** †**, Xiaofang Ji, Xuan Zhou, Lei Hou, Xinhao Zhao, Yuanru Yang, Lin Yang and Liying Li \***

Department of Cell Biology, Municipal Laboratory for Liver Protection and Regulation of Regeneration, Capital Medical University, Beijing 100069, China; changna@ccmu.edu.cn (N.C.); tianlei2700@126.com (L.T.); jixiaofang@healtech.com.cn (X.J.); zhouxuanyee@126.com (X.Z.); houleizzu@163.com (L.H.);

xinhaozhao0010@163.com (X.Z.); yyr\_rose@126.com (Y.Y.); yang\_lin@ccmu.edu.cn (L.Y.)

**\*** Correspondence: liliying@ccmu.edu.cn

† These authors contributed equally.

Received: 21 August 2019; Accepted: 11 September 2019; Published: 11 September 2019

**Abstract:** Hepatocytes are the main parenchymal cells of the liver and play important roles in liver homeostasis and disease process. The heterogeneity of normal hepatocytes has been reported, but there is little knowledge about hepatocyte subtype and distinctive functions during liver cholestatic injury. Bile duct ligation (BDL)-induced mouse liver injury model was employed, and single-cell RNA sequencing was performed. Western blot and qPCR were used to study gene expression. Immunofluoresence was employed to detect the expressions of marker genes in hepatocytes. We detected a specific hepatocyte cluster (BDL-6) expressing extracellular matrix genes, indicating these hepatocytes might undergo epithelia-mesenchymal transition. Hepatocytes of BDL-6 also performed tissue repair functions (such as angiogenesis) during cholestatic injury. We also found that four clusters of cholestatic hepatocytes (BDL-2, BDL-3, BDL-4, and BDL-5) were involved in inflammatory process in different ways. To be specific, BDL-2/3/5 were inflammation-regulated hepatocytes, while BDL-4 played a role in cell chemotaxis. Among these four clusters, BDL-5 was special. because the hepatocytes of BDL-5 were proliferating hepatocytes. Our analysis provided more knowledge of hepatocyte distinctive functions in injured liver and gave rise to future treatment aiming at hepatocytes.

**Keywords:** single-cell RNA sequencing; cholestatic liver injury; hepatocyte heterogeneity; inflammation; liver tissue repair

#### **1. Introduction**

Cholestatic liver injury is a common clinic symptom that is characterized by impaired bile flow in the liver. There are various reasons for cholestasis, such as acute hepatitis, viral infection, alcoholic liver disease, and drug-induced liver injury. In the liver, the accumulation of highly toxic bile acids in the hepatocytes leads to cytotoxicity and causes the death of hepatocytes. If left untreated, cholestasis will cause liver fibrosis, cirrhosis, and liver failure [1,2]. Inflammation is a character of cholestasis and anti-inflammation has been considered as a therapeutic target of cholestasis [3]. For this reason, the role of hepatic non-parenchymal cells (NPCs), especially immune cells (such as neutrophils and macrophages), has been well studied in cholestatic liver injury [1,3]. However, little research has been done on hepatocyte function during cholestasis.

As the main component of liver, hepatocytes make up ~80% of liver cells and are also important players in liver injury [4–6]. It has been reported that hepatocytes release cytokines and inflammatory extracellular vesicles to mediate liver inflammation [7–9]. In a recent study, hepatocyte-specific suppression of microRNA mitigates liver fibrosis [10]. On the other hand, hepatocytes are also involved in liver injury by participating in liver repair process. During liver injury, the reparation includes angiogenesis, extracellular matrix (ECM) component alteration and ECM reorganization. If the repair process is out of control, collagen will accumulate abnormally in the liver tissue. Then, liver fibrosis will occur. It has been reported that hepatocytes undergo epithelial-mesenchymal transition (EMT) during liver fibrosis [11]. EMT is a pathological process characterized by loss of epithelial features (such as low-expression of E-cadherin) and high expression of mesenchymal cell-related genes (such as Nestin and Cx43). The EMT of hepatocytes has been well studied in hepatocellular carcinoma [12]. However, whether EMT occurs in the process of liver fibrosis remains a controversial issue. Some studies have shown that hepatocytes play an important role in tissue repair and fibrogenesis through EMT. In this way, hepatocytes are transformed into cells that produce ECM and produce collagen to participate in tissue repair or fibrogenesis [11,13]. However, some researchers also reported that EMT did not occur in mouse liver fibrosis because no ECM-producing hepatocytes were found in lineage-tracking mice [14]. Therefore, it is worthwhile to study whether hepatocytes undergo EMT during cholestatic liver injury.

Single-cell RNA sequencing (scRNA-seq) could reveal the transcriptional heterogeneity of complex tissues or cells [15–17]. Recent research has identified different clusters of normal hepatocytes [18–21]. However, the knowledge of injured hepatocyte variation is limited. Therefore, figuring out hepatocyte changes after cholestatic liver injury will provide new view in cholestasis-injured liver treatment.

In this work, we aimed at exploring the cellular heterogeneity and characterizing the transcriptomic profile of cholestatic hepatocytes at the single-cell level. Bile duct ligation (BDL) was preformed to induce mouse cholestatic liver injury. scRNA-seq (10× Genomics) was used to identify expression profile of cells isolated from injured liver. We identified six clusters of hepatocytes isolated from injured liver. Among these cholestatic hepatocytes, we identified hepatocytes involved in tissue repair and liver inflammation. Furthermore, the repair-related hepatocytes underwent EMT during cholestasis-induced liver injury. The analysis revealed different functions of hepatocyte subtypes and their changes after liver injury.

#### **2. Materials and Methods**

#### *2.1. Materials*

LPS and Collagenase IV were obtained from Sigma (St. Louis, MO, USA). PCR reagents were from Applied Biosystems (Foster City, CA, USA). Fetal bovine serum was from Biochrom (Berlin, Germany). Other common reagents were from Sigma (St. Louis, MO, USA).

#### *2.2. Mouse Models of Liver Injury*

Mouse models of liver injury were induced by BDL. BDL was performed on male ICR mice (30.0 ± 1.0 g, six weeks age, *n* = 6). Sham-operated mice, used as controls, underwent a laparotomy with exposure, but no ligation of the common bile duct was performed. Mice were sacrificed at 7/14 days of BDL. For scRNA-seq, hepatocytes were isolated from one BDL mouse or one Sham mouse. All animal work was conformed to the Ethics Committee of Capital Medical University and in accordance with the approved guidelines (approval number AEEI-2014-131).

#### *2.3. Mouse Primary Hepatocytes Preparation*

Primary murine hepatocytes were isolated as previous research [9] and were used for immunofluorescence, qPCR and Western blot. For in vitro experiments, isolated mouse hepatocytes were cultured in William's Medium E (Gibco, Life Technologies, Foster City, CA, USA) with 10% FBS on 24-well collagen-coated plate for four hours. Hepatocytes were incubated in the presence or absence of lipopolysaccharide (LPS, 100 ng/mL), and then the cells were used for qPCR.

#### *2.4. Single-Cell RNA Sequencing*

scRNA-seq was performed by Capitalbio Technology Corporation (Beijing, China). Cell suspensions were loaded on a Chromium Single Cell Controller (10× Genomics, San Francisco, CA) to generate single-cell gel beads in emulsion, following the manufacture's introduction of Single Cell 3- Library and Gel Bead Kit V2 (10× Genomics). Following Drop-seq droplet collection, cDNA amplification and sequencing library preparation were carried out exactly as described previously [22], and the libraries were sequenced on an Illumina HiSeq X Ten. For Drop-seq data from normal and cholestatic cells, the libraries from one batch of droplets were sequenced individually.

#### *2.5. scRNA-Seq Data Analysis*

Data analysis was mainly performed by Capitalbio Technology Corporation (Beijing, China). We used Cell Ranger 2.0.1 to analyze the sequencing data and generated the single cell information. Cell Ranger also provided pre-built mouse (mm10-1.2.0) reference packages for read alignment which finished by STAR-2.5.1b. For analysis of mix cells, the cells of different samples were merged together by Cell Ranger aggr pipeline and normalized by equalizing the read depth among libraries. Principal-component analysis and t-distributed Stochastic Neighbor Embedding (t-SNE) were performed using the prcomp and Rtsne package of the R software (Version 3.4.1). Pseudotime analysis was performed using Monocle 2 [23]. Gene hierarchical cluster was performed by Cluster 3.0.

#### *2.6. Gene Ontology (GO) and Pathway Analysis*

GO analysis and pathway analysis were performed using STRING database (https://string-db.org/). Benjamini & Hochberg adjusted *p*-value < 0.05 was recommended to present significantly differential the term.

#### *2.7. Immunofluorescence*

Primary hepatocytes were fixed in 4% paraformaldehyde and made into smear. Immunofluorescence was performed as previous described [9]. Albumin antibody (1:100, Santa Cruz Biotechnology, Santa Cruz, CA, USA), CD31 antibody (1:50, Santa Cruz Biotechnology, Santa Cruz, CA, USA), Laminin antibody (1:100, Abcam, Cambridge, UK), and Nestin antibody (1:50, Chemicon, Billerica, MA, USA) were used. FITC-conjugated donkey anti-goat antibody or Cy3-conjugated goat anti-rabbit antibody was used as secondary antibodies (1:100, Jackson Immuno-Research, West Grove, PA). At last, nuclei were stained with DAPI.

#### *2.8. qPCR*

qPCR was performed as described previously [9]. All primers were synthesized by Biotech (Beijing, China). Primers used for qPCR were as follows: 18S rRNA: sense, 5- -GTAACCCGTTGAACCCCATT-3- ; antisense, 5- -CCATCCAATCGGTAGTAGCG-3- . *Ccl8*: sense, 5- -TACGCAGTGCTTCTTTGCCTG-3- ; antisense, 5- -TTATCTGGCCCAGTCAGCTTCTC-3- . *Laminin*: sense, 5- -ATGTTTAGTGGGGGCGATG-3- ; antisense, 5- -AGCGGTAGCGTTCAAAGGT-3- . *Hgf*: sense, 5- -AGCACCATCAAGGCAAGGT-3- ; antisense, 5- -GACCAGGAACAATGACACCA-3- . *Cdh1*: sense, 5- -ATCCTCGCCCTGCTGATT-3- ; antisense, 5- -ACCACCGTTCTCCTCCGTA-3- . *Cx43*: sense, 5- -TGTGCCCACACTCCTGTACTTG-3- ; antisense, 5- -TTTCTTGTTCAGCTTCTCTTCCTTT-3- .

#### *2.9. Western Blot*

Western blot analysis was carried out with standard procedures and followed primary antibodies against Laminin (1:2000, Abcam, Cambridge, UK), HGF (1:2000, Abcam, Cambridge, UK), and Cx43 (1:1000, Sigma, St. Louis, MO, USA). IRDyeTM 800-conjugated goat anti-rabbit IgG (1:10,000, LI-COR Biosciences, Lincoln, NE, USA) was applied as secondary antibodies. Protein expression was visualized and quantified by the LI-COR Odyssey® Imaging System and Odyssey® software (LI-COR Biosciences, Lincoln, NE, USA), respectively. Results were normalized relative to β-Tubulin (1:1000; Cell Signaling, Beverly, MA, USA) expression to correct for variations in protein loading and transfer.

#### *2.10. Statistical Analysis*

The results are expressed as mean ± SEM from at least three independent experiments performed. Statistical significance was assessed by Student's t-test or ANOVA for analysis of variance when appropriate. Correlation coefficients were calculated by a Pearson test. *p* < 0.05 was considered to be significant.

#### **3. Results**

#### *3.1. Cholestasis-Injured Hepatocytes are Heterogeneous, Separating in Six Distinct Clusters*

To identify the heterogeneity and variation of hepatocytes in cholestasis-injured liver, BDL injury model was performed. After two weeks, we isolated hepatocytes from a mouse liver with BDL treatment and performed scRNA-seq (Figure 1A). We first employed immunofluorescence to detect the purity of isolated hepatocytes. The result showed that almost all cells expressed albumin (Alb, the marker of hepatocytes). At the same time, there are almost no NPCs in the isolated cells. These results indicated the isolated cells were hepatocytes with high purity (Figure 1B). Then, scRNA-seq was performed by 10× Genomics. The 10× Genomics sequenced the resultant single-cell transcriptomes to an average depth of more than 300,000 reads per cell (median genes per cell: 3303). We obtained single-cell transcriptomes from 1186 cells derived from mouse BDL liver (Figure 1C,D, Table S1). All the cells expressed *Alb*, indicating these obtained cells were all hepatocytes (Figure 1C). To clarify the differences in cell populations captured in our experiments, we selected the highly variable genes and performed hierarchical clustering on the significant principal components and visualized the cell clusters with t-SNE. Six cell clusters as cholestatic hepatocytes were obtained (named as BDL-1 to 6) (Figure 1D). We also found that *Alb* level in cholestatic hepatocyte clusters were different. *Alb* expression in BDL-1 cells was high while other five clusters were *Alb*-low hepatocytes (Figure 1C,E).

#### *3.2. Hepatocytes undergo Gene Expression Profile Change after Liver Injury*

To understand the changes of hepatocytes after liver injury, we isolated normal hepatocytes from one Sham mouse and compared gene expression profile between normal and cholestatic hepatocytes. We obtained 1173 single cell transcriptome data from normal hepatocytes (Table S1 and Figure S1). First, some known representative gene expressions were studied (Figure 2A). *Alb* was down-regulated after liver injury. Major urinary protein 3 (*Mup3*), which regulates glucose metabolism and is considered as another marker of hepatocytes, was also decreased. Furthermore, apolipoprotein a1 (*Apoa1*), which plays a role in lipid transfer, was reduced. However, the expression of lipoprotein lipase (*Lpl*), participating in lipid regulation, was increased after liver injury. At the same time, the inflammation related gene, C-type lectin domain family 4, member f (*Clec4f*), and heme oxygenase 1 (*Hmox1*) were up-regulated (Figure 2A). These results proved that hepatocyte gene expression profile was changed, suggesting functional change of hepatocytes in cholestasis.

Second, we analyzed all the 2359 hepatocytes from BDL and Sham mouse together. The result of t-SNE analysis clearly identified eight clusters (Figure 2B). Among these clusters, Mix-1 was the largest group which included BDL-2, BDL-3, and BDL-5. Mix-2 cells were all normal hepatocytes. The cells of Mix-6 were from BDL-4 hepatocytes, and most cells of the Mix-8 were from BDL-6 hepatocytes. All of the remaining hepatocyte clusters (Mix-3/4/5/7) were composed by BDL-1 cells and normal hepatocytes (Figure 2C,D). Correlation analysis also indicated that these eight clusters could be correlatively separated into two big groups. One was composed by Mix-1, Mix-6, and Mix-8. Cells belonged to the three clusters were from BDL-2 to 6. Another one was comprised by Mix-2, Mix-3, Mix-4, Mix-5, and Mix-7. The five clusters were composed of almost all of the normal hepatocytes and BDL-1 cells

(Figure 2E). These results showed that BDL-1 hepatocytes were similar to normal hepatocytes, which indicated that BDL-1 was a cluster of "normal" hepatocytes injured less.

Third, we performed pseudotime analysis on the mixed cells. Pseudotime analysis is usually used to contribute cell developmental trajectory based on transcriptional similarities [23]. Here, we employed it to further study the changes of hepatocytes after injury. The result showed that these cells were separated into three different states. State 1/2 cells were from BDL and Sham sample, while State 3 cells were mainly from BDL sample (Figure 3A). Meanwhile, BDL hepatocytes belonged to State1/2 were from BDL-1 cluster. This result further indicated that BDL-1 hepatocytes were similar with normal hepatocytes. At the same time, the cells belonged to cluster 2-6 of BDL sample composed state 3 cells, which we named as "injured hepatocytes". Since State 1 and State 2 cells were all normal hepatocytes, we asked what the difference was between these two states of hepatocytes. The top-10 most expressed genes of these two states were used to perform Gene Ontology (GO) analysis (Table S2). The results showed that hepatocytes belonged to State 1 mainly performed lipid metabolic functions, while State 2 hepatocytes were involved in transport (Figure 3B). These data proved that normal hepatocytes were heterogeneity and performed different functions.

**Figure 1.** Six clusters were classified of cholestatic hepatocytes depending on scRNA-seq analysis. (**A**) Workflow depicts isolation of hepatocytes from liver for generating scRNA transcriptome profiles. (**B**) The staining of albumin (Alb) on isolated cholestatic hepatocyte smear. Scale bars, 20 μm. (**C**) The expressions of Alb in all cells were shown. (**D**) 2D visualization of single-cell clustering of hepatocytes profiles inferred from RNA-seq data. Six major classes of hepatocytes in injured liver were detected. The count of each cell population was indicated. Colored bar coded as indicated. (**E**) The expression of Alb in each cluster was shown.

**Figure 2.** Eight clusters were classified of all the cells depending on scRNA-seq analysis. (**A**) The expressions of representative genes in normal and cholestatic hepatocytes were shown. (**B**) 2D visualization of single-cell clustering of hepatocytes profiles inferred from RNA-seq data in the mixed cells. Eight major classes of hepatocytes in normal and injured liver were detected. Colored bar coded as indicated in Figure 2D. (**C**) The components of each mix clusters. (**D**) The count of each cell population was indicated. Colored bar coded as indicated. (**E**) Correlation analysis of eight clusters.

#### *3.3. Hepatocytes Responsible for Liver Repair are Identified*

To further define the functions of cholestatic hepatocytes, the top-30 most expressed genes (absolute value of Log2 fold change ≥ 1, Benjamini & Hochberg adjusted *p*-value < 0.05) of each BDL clusters were selected for GO analysis. Among these clusters, we defined a cluster of cells (BDL-6) which were involved in tissue repair.

To further clarify the functions of these hepatocytes during liver repair, we first chose more highly expressed genes of BDL-6 (Top 200) to perform hierarchical cluster. The result showed that all these 200 genes were divided into three gene groups (Figure 4A). Then, we analyzed the functions of the three gene groups by GO analysis. There were 113 genes belonged to Gene Group 1 and these genes were angiogenesis-related gene, suggesting the important role of hepatocytes in angiogenesis. The 40 genes belonged to Gene Group 2 were mainly involved in cellular response to stimulus and signal transduction. The top 5 GO terms of Gene Group 3 (47 genes) included extracellular structure organization and ECM organization, indicating these hepatocytes were involved in ECM reorganization after liver injury (Figure 4A).

**Figure 3.** Pseudotime analysis indicated hepatocyte function transformation during liver injury. (**A**) Pseudotime analysis of hepatocytes was performed in the mixed cells. (**B**) GO analysis of top 10 genes of State 1 and State 2.

The expressions of representative genes were also analyzed. In BDL-6 hepatocytes, multimerin 2 (*Mmrn2*) and *Hgf* were highly expressed (Figure 4B, Table S3). The two genes are important mediators of angiogenesis [24,25]. Furthermore, *Hgf* is also a factor improving liver regeneration and inducing EMT of liver tumor cells [26,27]. On the other hand, the expressions of ECM genes were also detected in this cluster, such as laminin, collagen type IV alpha 1 (*Col4a1*), *Col4a2*, and heparan sulfate proteoglycan 2 (*Hspg2*) (Figure 4B).

Then, isolated primary hepatocytes were used to confirm these results. Since we detected the expression of endothelial cells (ECs) marker, *Pecam1* (also known as Cd31), in BDL-6 cells (Figure 5A), we first asked whether these cells formed hepatocytes-EC pair during scRNA-seq [28]. We employed immunofluorescence assay to detect Cd31 expression on isolated cholestatic hepatocyte smear. Hepatocytes with Cd31<sup>+</sup> signal were found on smear, while hepatocyte-EC pair was not found (Figure 5A). The expressions of representative genes were also detected in isolated hepatocytes. The results of qPCR and Western blot showed that laminin and *Hgf* expressions were increased in cholestatic hepatocytes (Figure 5B,C). Next, we treated primary hepatocytes with LPS to induce hepatocyte injury and found that laminin and *Hgf* expressions were also up-regulated in LPS-treated hepatocytes (Figure 5D).

We also employed pathway analysis to study the mechanism under the formation and function of tissue repair-related hepatocytes. The results showed that various signaling pathways might involve in these processes, including PI3K-AKT, Relaxin, AGE-RAGE, Rap1, and Ras signaling pathways (Figure 5E).

*Cells* **2019**, *8*, 1069

Taken together, hepatocytes involving in repair of liver injury (especially angiogenesis) were defined. Our results indicated the important role of hepatocytes during cholestatic liver injury.

**Figure 4.** Elucidation of hepatocyte clusters participating in tissue repair. (**A**) Hierarchical cluster and GO analysis of Top 200 high expressed genes of BDL-6 in BDL sample. (**B**) Enrichment pattern of genes in the BDL-6.

**Figure 5.** The expressions of tissue repair-related genes were changed in isolated cholestatic hepatocyte. (**A**) The detection of Alb and CD31, Laminin, Nestin on cholestatic hepatocyte smear. Scale bars, 50 μm. (**B**) The mRNA expressions of representative tissue repair-related genes were examined in isolated normal and cholestatic livers. (**C**) Western blot was employed to study the protein level of tissue repair-related genes. (**D**) Isolated normal hepatocytes were cultured with 100 ng/mL LPS and the mRNA expression of Laminin, E-cadherin and Cx43 were detected by qPCR. (**E**) Pathway analysis of top 200 high expressed genes of BDL-6. Data are presented as the means ± SEM. \**p* < 0.05 vs. control (*n* = 7 for each group in Figure 5B, *n* = 3 for each group in Figure 5D).

#### *3.4. The Liver Repair–Related Hepatocytes undergo EMT during Liver Injury*

As mentioned earlier, the expressions of ECM genes and EMT-related gene (*Hgf*) were specifically detected in BDL-6 hepatocytes. Since ECM production is one of the features of hepatocyte EMT, we asked whether BDL-6 hepatocytes occurred EMT during liver injury. First, the expressions of EMT marker genes were studied in scRNA-seq. Nestin (*Nes*) and gap junction protein alpha-1 (*Gja1*, also named as Cx43) were highly expressed, while cadherin 1 (*Cdh1*, also known as E-cadherin) was lowly expressed (Figure 4B). Then, we examined whether EMT-occurring hepatocytes could be detected in cholestatic hepatocytes smear. Laminin and Nestin, which were high expressed in BDL-6, were chosen as marker genes and were detected by immunofluorescence. The results showed that Nestin+Alb<sup>+</sup> or Laminin+Alb<sup>+</sup> cell was existed (Figure 5A). Finally, the results of qPCR and Western blot showed that expressions of Laminin, Hgf, and Cx43 were increased in cholestatic and LPS-treated hepatocytes, while E-cadherin level was decreased (Figure 5B–D). In brief, these data proved that hepatocytes underwent EMT during liver injury.

#### *3.5. Hepatocytes are Important Players in Liver Inflammation During Cholestatic Injury*

It has been reported that hepatocytes are important players of liver inflammation during liver injury. We then studied the inflammatory functions of hepatocytes in our scRNA-seq data. Among cholestatic hepatocytes clusters, BDL-2/3/4/5 were involved in immune process (Figure 6C). However, these four inflammation-related clusters showed different gene expression profiles and functions.

First, gene heatmap showed that BDL-2/3/5 shared similar gene expression profiles (Figure 6A). We also performed correlation analysis for all cholestatic hepatocyte clusters to confirm this conclusion. The result showed that BDL-3 were highly correlated with BDL-2 and BDL-5 (Figure 6B). Herein, we merged these three clusters for further analysis. The top signature genes of BDL-2/3/5 includes *Clec4f*, V-set and immunoglobulin domain containing 4 (*Vsig4*), integrin alpha L (*Itgal*), *Hmox1*, and IL-18 binding protein (*Il18bp*) (Figure 6A, Table S4). All these genes were related to immune system process. For example, *Vsig4* is reported to regulate inflammatory factor expressions negatively [29]. *Hmox1*, who is reported as an anti-fibrogenetic protein in liver fibrosis, also functions as a modulator of inflammation and enhances autophagy [30–32]. Through the analysis of gene functions, we considered BDL-2/3/5 cells as inflammation-regulating hepatocytes, who were involved in liver inflammation positively or negatively.

Second, we found that BDL-5 was a specific cluster in the three clusters. BDL-5 cells specifically expressed genes that regulate cell division and cell cycle, such as centromere protein E (*Cenpe*), nucleolar and spindle associated protein 1 (*Nusap1*), antigen identified by monoclonal antibody Ki 67 (*Mki67*), cyclin A2 (*Ccna2*) and cyclin B1 (*Ccnb1*) (Figure 7, Table S4) [33–36]. BDL-5 also distinguished from other clusters by GO terms associated with cell cycle and cell division (Figure 7B). The results of cell cycle analysis confirmed this conclusion, since cells of BDL-5 were almost proliferating cells (Figure 7C). Taken together, the cells belonging to BDL-5 were proliferative hepatocytes.

Third, BDL-4 was different from BDL-2/3/5 as cells in these cluster expressed chemokines and their receptors, such as *Ccl5*, *Ccl8*, and *Cxcl*2 (Figure 6A, Table S4). Owing to chemokine expression, GO analysis showed that the terms of chemotaxis regulation were enriching in this cluster (Figure 6C). Overall, the analysis indicated that BDL-4 hepatocytes significantly regulated leukocyte migration and affected immune function. We also performed qPCR to detect the expression of representative gene (*Ccl8* was chosen). The results of qPCR showed that *Ccl8* expression was up-regulated in cholestatic or LPS-treated hepatocytes (Figure 6D,E). These results illustrated that hepatocytes are important players in liver inflammation during cholestatic liver injury.

**Figure 6.** Elucidation of hepatocyte clusters participating in inflammation. (**A**) Heatmap of highly expressed genes of BDL-2, BDL-3, BDL-4 and BDL-5 in BDL sample. (**B**). Correlation analysis of cholestatic hepatocyte clusters. (**C**) GO enrichment analysis of BDL-2/3/5 and BDL-4. (**D**) Hepatocytes were isolated from Sham or BDL mouse livers and Ccl8 expression was detected. (**E**) Ccl8 expression in LPS-treated primary hepatocytes. Data are presented as the means ± SEM. \**p* < 0.05 vs. control (*n* = 7 for each group in Figure 6D, *n* = 3 for each group in Figure 6E).

**Figure 7.** Characterization of proliferative hepatocyte cluster. (**A**) Heatmap showed top 20 highly expressed genes of BDL-5. (**B**) GO analysis of BDL-5 highly expressed genes. (**C**) Cell cycle analysis of six clusters of BDL sample. (**D**) Enrichment pattern of genes in BDL-5.

#### *3.6. Characterization of a Less Damage Hepatocyte Cluster in Cholestatic Injured Liver*

Final, we analyzed the characters of BDL-1 hepatocytes. BDL-1 was the biggest cluster of cholestatic hepatocytes and contained 300 cells (25.3%) (Figure 1D). Despite BDL-1 hepatocytes were isolated from injured liver, they still highly expressed some genes of normal hepatocytes, including choline kinase alpha (*Chka*), *Alb*, *Mup3*, and *Apoa1* (Figure S2A,C). The top discriminative genes also included glucose metabolism related genes, such as glucose-6-phosphatase (*G6pc*), solute carrier family 2 member 2 (*Slc2a2*) and solute carrier family 22 member 30 (*Slc22a30*) (Figure S2A,C, Table S5). According to GO enrichment analysis, BDL-1 distinctness was driven by terms associated with liver development and bile acid metabolic process (Figure S2B). All these results indicated that hepatocytes of BDL-1 were injured less and were similar with normal hepatocytes.

#### **4. Discussion**

In the current study, we studied the role of hepatocytes during cholestatic liver injury. Through scRNA-seq, we identified six clusters of cholestatic hepatocytes in BDL-treated mouse liver. BDL-1 hepatocytes expressed some genes which highly expressed in normal hepatocytes, indicating that BDL-1 hepatocytes were less injured. BDL-2, BDL-3 and BDL-5 cells participated in immune system regulation. Furthermore, the genes regulating cell cycle and division were increased in BDL-5 hepatocytes. BDL-4 cells also played a role in immune, but its function focused on leukocyte chemotaxis. Moreover, BDL-6 hepatocytes highly expressed the marker genes of EMT and genes mediating tissue repair, especially angiogenesis. Taken together, these analyses proved that hepatocytes were important players in liver injury.

EMT is a pathological process which is well studied in cancer research, but not in liver injury. During liver fibrogenesis, hepatic stellate cells are well-known source of collagen, but some researches also prove that hepatocytes contribute to collagen production via undergoing EMT. It has been reported that EMT is occurred in TGF-β1 (a well-known fibrotic factor)-stimulated hepatocytes [37,38]. In vitro, hepatocytes with TGF-β1 treatment show mesenchymal morphology, lose epithelial marker gene expression (like E-cadherin), and express mesenchymal genes (for example, vimentin, fibronectin, ZO-1) [38,39]. In vivo, a research based on lineage tracing mouse proves that hepatocytes contribute to the population of FSP1-positive fibroblasts in liver fibrosis [37]. After undergoing EMT, hepatocytes become a kind of ECM-producing cell and are involved in liver fibrogenesis. However, there are also reports against this conclusion. In a study based on another lineage tracing mouse, the researchers do not find any hepatocyte undergo EMT [14]. Therefore, it is still a controversy whether EMT occurs during liver fibrosis. In our current study, our data showed that EMT was occurred in some hepatocytes (BDL-6) during liver cholestatic injury. We have three pieces of evidence to support our conclusion. (1) scRNA-seq data showed EMT marker gene expressions in BDL-6 hepatocytes. (2) We found EMT marker gene-expressed hepatocytes on isolated cholestatic hepatocyte smear. (3) The changes of EMT marker gene expressions were confirmed by qPCR and western blot using isolated cholestatic hepatocytes and LPS-treated hepatocytes. Our research proved the existence of EMT in liver injury and added the first scRNA-seq evidence for this controversial issue.

During liver injury, the formation of new blood vessels, sinusoidal remodeling, and changes in ECM composition/organization were observed. In our scRNA-seq data, we found BDL-6 hepatocytes were also involved in these processes. Angiogenesis, the sprouting of new vessels from preexisting ones, is an essential pathophysiological process required for embryogenesis, growth, regeneration, and wound healing [40]. It has been reported that liver injury and pathological angiogenesis are interdependent processes that occur in parallel. Hepatic stellate cells play a key role in angiogenesis [41–44]. Furthermore, many kinds of cells have been discovered participating in angiogenesis, including macrophages, dendritic cells and so on [45,46]. However, the function of hepatocytes on angiogenesis is still not clear. The scRNA-seq results defined BDL-6 hepatocytes highly expressed angiogenesis-related genes, which illustrated that hepatocytes might meditate angiogenesis. We noticed the highly expressed genes and functions of BDL-6 cells were similar with ECs, and cells of this cluster were Cd31+. Therefore, we asked whether cells belonged to this cluster were cell pair formed by hepatocytes and ECs, as a recent research reported [28]. We performed immunofluorescence and did not find a hepatocyte-EC cell pair. Instead, Cd31+Alb<sup>+</sup> cells were found. Since NPCs (include ECs) had been removed when hepatocytes were isolated, our data suggested the transdifferentiation between hepatocytes and ECs, but the details and direction (hepatocytes to ECs, or ECs to hepatocytes) of the transition should be studied in the future. It should be noted that the percentage of Cd31+Alb<sup>+</sup> cells was low on the isolated hepatocyte smear (three cells among 150 detected hepatocytes).

There are many immunocytes mediating liver injury and inflammation, such as neutrophils, macrophages, and nature killer cells [47]. Recently, more and more studies focus on the immunologic function of hepatocytes [5]. It has been reported that the injured hepatocytes could secrete pro-inflammatory cytokines, such as IL-33, which promotes liver injury and fibrogenesis directly [48]. Macrophage migration inhibitory factor is another inflammatory cytokine secreted by hepatocytes and plays a critical role in liver damage [7,9]. Our previous study has also shown that MCP-1 expression is increased in injured hepatocytes in vitro [9]. Therefore, hepatocytes are one of the important components in liver inflammation and are considered as effective therapeutic target of liver diseases. Depending on our analysis, inflammatory hepatocytes participated in inflammation via two different manners during cholestatic liver injury. Hepatocytes belonging to BDL-2, BDL-3, and BDL-5 highly expressed inflammation-related genes. BDL-4 hepatocytes, different from the three cluster cells, expressed chemotaxis-related genes and mediated immunocyte chemotaxis. In brief, our results proved that the inflammatory process mediated hepatocytes were complex since different hepatocytes performed quite different immunologic functions. Our data provided more information on hepatocyte-involved hepatic inflammation.

scRNA-seq has been used to study the heterogeneity of normal hepatocytes [18–20]. At present, liver zonation has been identified by scRNA-seq, and the liver is divided into nine layers with different gene expression profiles [20]. Recent studies based on a normal human liver further support this opinion [19,21]. In these studies, hepatocytes are divided into three groups—pericentral hepatocytes, periportal hepatocytes, and middle-layer hepatocytes. Since the localization of hepatocytes is critical for their response to injury, further analyses are needed to study the relationship between hepatocyte zonation and hepatocyte heterogeneity during cholestasis.

In summary, these comprehensive analyses provide first scRNA transcriptome profiles of cholestatic hepatocytes. The analyses show the heterogeneity of cholestatic hepatocyte, which may give rise to further study on hepatocyte function during liver injury. Therefore, our data provide much more information for future treatment of hepatic injury aiming at hepatocytes and open new perspectives for treatment of hepatic injury.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4409/8/9/1069/s1, Figure S1: Quality characterization of drop-seq scRNA-seq data. Figure S2: Characterization of a hepatocyte cluster less injured isolated from cholestasis injury liver. Table S1: metrics summary. Table S2: Top10 gene among state. Table S3: High expressed genes of BDL-6. Table S4: High expressed genes of BDL-2-5. Table S5: High expressed genes of BDL-1.

**Author Contributions:** Conceptualization, N.C. and L.L.; Methodology, L.T., X.J., and X.Z.; Formal Analysis, N.C. and L.T.; Investigation, X.Z., L.H., X.Z., and Y.Y.; Data Curation, N.C.; Writing – Original Draft Preparation, L.T. and N.C.; Writing – Review & Editing, L.L.; Project Administration, L.Y.; Funding Acquisition, L.L.

**Funding:** This research was funded by National Natural and Science Foundation of China (81430013, 81670550, and 81770603), Beijing Municipal Natural Science Foundation (7172019), and the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges under the Beijing Municipality (IDHT20150502).

**Conflicts of Interest:** The authors indicate no potential conflicts of interest.

#### **Abbreviations**



#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Advances of Single-Cell Protein Analysis**

### **Lixing Liu 1,2, Deyong Chen 1,2,3, Junbo Wang 1,2,3,\* and Jian Chen 1,2,3,\***


Received: 29 April 2020; Accepted: 18 May 2020; Published: 20 May 2020

**Abstract:** Proteins play a significant role in the key activities of cells. Single-cell protein analysis provides crucial insights in studying cellular heterogeneities. However, the low abundance and enormous complexity of the proteome posit challenges in analyzing protein expressions at the single-cell level. This review summarizes recent advances of various approaches to single-cell protein analysis. We begin by discussing conventional characterization approaches, including fluorescence flow cytometry, mass cytometry, enzyme-linked immunospot assay, and capillary electrophoresis. We then detail the landmark advances of microfluidic approaches for analyzing single-cell protein expressions, including microfluidic fluorescent flow cytometry, droplet-based microfluidics, microwell-based assay (microengraving), microchamber-based assay (barcoding microchips), and single-cell Western blotting, among which the advantages and limitations are compared. Looking forward, we discuss future research opportunities and challenges for multiplexity, analyte, throughput, and sensitivity of the microfluidic approaches, which we believe will prompt the research of single-cell proteins such as the molecular mechanism of cell biology, as well as the clinical applications for tumor treatment and drug development.

**Keywords:** single-cell analysis; protein characterization; conventional approaches; microfluidic technologies

#### **1. Introduction**

As the physical basis for all life and the main component of living organisms, proteins dominate or participate in almost all biological activities and biological functions like providing structural supports, molecule transportations, cell growth and adhesion, signal transductions, catalytic biochemical processes, etc. [1,2]. Under the controls of internal genes and external environments, the differences in protein expressions affect cell differentiations, nerve conductions, immune responses, and disease occurrence, which is a crucial indicator of changes in life activities [3,4]. Therefore, protein expression analysis is critical for the studies of cellular molecular mechanisms, clinical diagnosis and treatments, and drug developments [5]. In the past few decades, various methods have been developed for protein analysis, such as gel electrophoresis [6], immunoassay [7], chromatography and mass spectrometry [8], and Raman imaging [9]. These methods provide a comprehensive understanding of the biological functions of different proteins, which facilitate the developments of molecular biology and medicine [10]. However, most of these conventional approaches are limited to protein analysis at tissue levels and only able to measure population-averaged protein expressions from large amounts of cells [11], masking the single-cell heterogeneity within a population [12,13]. As a result, many rare but critical individual cells are typically overlooked in conventional studies though these cells play essential roles

in, for example, cancer metastasis and stem cell differentiation. Although single-cell genomic and transcriptomic analysis with high throughputs have developed rapidly to address the issue of cellular heterogeneity in recent years, studies have located poor correlations between RNA and protein levels in single cells [14]. Due to the stochasticity of gene expressions, variations occur in RNA and protein copy numbers of cells with the identical gene, which indicates the disconnection between single-cell proteomic and transcriptomic analysis and the necessity of single-cell proteomic analysis. Single-cell protein analysis enables protein analysis at the single-cell level and provides a feasible approach to distinguish and identify those rare but important single cells from large average populations, facilitating the corresponding studies related to fundamental mechanisms, disease developments, and drug therapies [15].

The big challenges of single-cell protein analysis are the enormous complexity and low abundance of the proteome [16,17]. Thus, single-cell protein analysis must be high-multiplexity, high-throughput, and high-sensitivity to provide quantitative information [18,19]. Some of the conventional technologies can solve the problems by single-cell separation and signal analysis (such as fluorescence or mass spectrometry) for protein detection. Besides, microfluidics provides a reliable technology for manipulating cells at very tiny volumes, thus can effectively fit single-cell analysis.

In this review, we mainly summarize the recent two-decade advances of various single-cell protein analysis approaches and techniques. We first present the developments of several key conventional approaches including fluorescence flow cytometry, mass spectrometry flow cytometry, enzyme-linked immunospot assay, and capillary electrophoresis. Then we focus on the latest advances enabled by microfluidic technologies for single-cell protein detection, including microfluidic fluorescent flow cytometry, droplet-based microfluidics, microwell-based assay (microengraving), microchamber-based assay (barcoding microchips), and single-cell Western blotting. We discuss the performance of each system in terms of multiplexity, analyte (e.g., membrane, intracellular, and secreted proteins), throughput and sensitivity, comparing advantages and limitations, and providing our perspectives on the potential development directions of future studies.

#### **2. Conventional Approaches**

#### *2.1. Fluorescence Flow Cytometry*

Fluorescence flow cytometry is the golden-standard approach for profiling of proteins at the single-cell level, which enables measurements of fluorescence characteristics of single cells or any other particles in a fluid stream when they pass through a light source [20–22]. Specifically, when single cells stained with fluorescent labelled antibodies rapidly travel through the detection region in the flow chamber, stained cells are excited by a laser, and a detector measures the emitted fluorescent intensities [23,24]. By building calibration curves using beads that have been coated with proteins under precise controls, fluorescent intensities could be translated to single-cell protein expressions [25,26] (Figure 1A).

Since its emergence in the 1960s [30], as the most established method for single-cell protein analysis, fluorescence flow cytometry made remarkable technological advancements and was featured with high throughputs and multiplexing [31]. Based on the working principle of continuous flow, it enables high-throughput detection of measuring ~10<sup>4</sup> cells per second [32]. With fluorescent labelled antibodies, it is capable of analyzing ~20 multiplexing protein parameters for membrane and intracellular proteins associated with signaling pathways in single cells [33].

Furthermore, fluorescence flow cytometry has transformed from a primitive cell counter to a powerful tool for semi-quantitative analysis, especially for analyzing pathways underlying diseases [34], discovering surface markers [35], and processing drug screening [36]. For example, it is a generally accepted method to determine the type of leukemia by detecting CD series differentiation antigens on the surface of cell membranes and estimating the proportions of immune cell subtypes [37]. More

specifically, Chattopadhyay et al. found diversely complex phenotypic patterns in total CD8+ T cells with a modified flow cytometry of 17 fluorescence emissions based on fluorescent quantum dots [38].

**Figure 1.** Schematics of conventional approaches for single-cell protein analysis. (**A**) Fluorescence flow cytometry. Single cells stained with fluorescent labelled antibodies rapidly travel through the flow chamber, stained cells are excited by a laser, and the emitted fluorescent intensities are measured by the detector. Additionally, the fluorescent intensities could reflect the single-cell proteins expression. Adapted with permission from [22]. (**B**) Mass cytometry. Stained single cells with element isotopes labelled antibodies are pushed into a nebulizer and ionized, and an elemental mass spectrum is acquired for each cell. The integrated elemental reporter signals for each cell can then be analyzed by flow cytometry. Adapted with permission from [27]. (**C**) Enzyme-linked immunospot. Single cells are localized on a plate coated with capture antibodies against specific secreted proteins. When the cells secrete proteins after stimulation, the secreted proteins are captured by the primary antibody and the signal is further amplified by secondary antibody. Each visual spot signal readout represents a single cell expressing the target protein and intensity of spot indicates proteins secretion level. Adapted with permission from [28]. (**D**) Capillary electrophoresis. Single cells are injected into the capillary under electromigration or pressure and then lysed via chemical, optical, or electrical methods resulting in lysed ions of diverse levels of migration properties. Combined with electrochemical, laser induced fluorescence, mass spectrometry, and other technologies, the detector outputs an electrophoretic spectrum which can reflect the protein expression. Adapted with permission from [29].

However, due to the rapid flowing of samples, neither measurement of secreted proteins nor the dynamic monitoring of cells over time is easy to achieve. The multiplexing capacity is limited due to spectral overlap even if fluorescence compensation is conducted. Due to the significant loss during the sample preparation process, mass populations of single cells are required, making it difficult to detect rare samples. In addition, because the cells are exposed to physical stressors such as fluidic pressure and laser beams, this can damage the cellular integrity and hamper recovery [39].

#### *2.2. Mass Cytometry*

Mass cytometry is a technique that integrates flow cytometry and mass spectrometry to analyze single-cell protein expressions with distinct transition element isotopes labelled antibodies on and within cells rather than fluorescence [8,27]. Stained single cells are pushed into a nebulizer, ionized through an argon plasma, and separated by the ions mass-to-charge ratio. Based on the time-of-flight mass spectrometer, results for each cell's constituent ions are sampled, transformed, and integrated to electric signals, which can be further quantified as single-cell protein measurements [27,40] (Figure 1B).

Compared to fluorescence flow cytometry, mass cytometry uses heavy metal element labels to avoid cross-talks among channels in fluorescence and reduces background noise interferences, which enables high multiplexed detections of surface and intracellular proteins with over 40 different proteins simultaneously measured [41]. Nolan's group used mass cytometry to profile primary human bone marrow cells with multiple parameters simultaneously for phenotype analysis. They monitored signaling behaviors of cell subpopulations based on subtype-specific surface markers [27]. As for the throughput, it depends on the time-of-flight sampling resolution, mass cytometry enables the measurement up to ~10<sup>3</sup> cells per second, inferior to that of fluorescent labelled analysis approaches. A new method named mass-tag cellular barcoding was developed by Nolan's group, which improved throughput by using n metal ion tags to multiplex up to 2n samples and applied to characterize signaling proteins and pathways in human peripheral blood mononuclear cells [40]. Besides, compared to quantum-efficient fluorophores, mass reporters show lower sensitivities, which makes it difficult to measure low expressed proteins in single cells. Moreover, since mass cytometry requires ionization, cellular recovery and preserving integrity are still infeasible. The common limitations for both mass cytometry and fluorescence flow cytometry are incapable of analyzing secreted proteins at the single-cell level, for lack of approaches that maintain small molecules and binding agents associated with the cells.

In order to obtain information on cell localization and interactions, several improved methods based on mass cytometry came into being later [42]. Imaging mass cytometry is applied to tissue analysis with a high-resolution laser ablation system to time-of-flight mass cytometry, which achieves measurements of over 100 markers possible with the availability of additional isotopes [43]. Compared with mass cytometry, which is only applied to cell suspensions, imaging mass cytometry allows spatial information of cells through tissue analysis. Another approach, multiplexed ion beam imaging, is a secondary ion imaging method that operates an ion beam to release metal ion reporters and uses mass spectrometry to quantify, which can simultaneously determine more than 100 targets [44]. As the advances and complements to mass cytometry, these methods can achieve higher resolutions and multiplexing parameters for single-cell protein analysis.

#### *2.3. Enzyme-Linked Immunospot Assay*

Enzyme-linked immunospot assay, developed in the 1980s, is a quantitative approach for detecting secreted protein at the single-cell level [45,46]. Single cells are localized on a plate coated with capture antibodies against specific secreted proteins. After stimulation to cells, the secreted proteins are captured by the primary antibodies and the signal is further amplified by secondary antibodies. Each visual spot represents a single cell expressing the target proteins and intensities of spot indicate secretion levels of target proteins [28,47] (Figure 1C).

Enzyme-linked immunospot is highly sensitive for detection of secreted proteins with a six spots per 10<sup>5</sup> cells detection limit [48]. It is widely used in the studies of immune responses, such as detecting cytokine-secreting cells [49,50] and monitoring immune system activations [51,52]. Herr et al. proposed a fast enzyme-linked immunospot assay to quantitate CD8 + T lymphocytes of HIV patients and proved a reliable detection of T cell reactivity due to previous exposure to HIV [53]. Karlsson et al. made a comparison of enzyme-linked immunospot and flow cytometry to assay CMV and HIV-1 proteins in chronically HIV-1-infected patients. Though results of T cell responses were statistically correlated between two approaches, it showed consistently lower results in the enzyme-linked immunospot assay, which suggested that it was preferable to detect low-level responses [54]. Kornum et al. presented an enzyme-linked immunospot assay to test hypocretin in CD4+ T-cells and indicated that epitope frequency was lower than the detection limit (1:10,000 cells) among peripheral CD4+ T-cells from narcolepsy type I patients [55]. However, this approach can only detect no more than three secreted

proteins simultaneously. Compared with flow cytometry, the throughput is insufficient because it is a static assay.

#### *2.4. Capillary Electrophoresis*

Capillary electrophoresis is a separation and detection approach based on a high-voltage electric field in a micron capillary whose inner diameter is compatible with single cells [29,56]. Single cells are injected into the capillary under electromigration or pressure and then lysed via chemical, optical, or electrical methods resulting in lysed ions of diverse levels of migration properties. Combined with electrochemical analysis, laser-induced fluorescence, mass spectrometry, and other technologies, according to the migration times, the detector outputs an electrophoretic spectrum, in which a peak corresponds to a type of protein. The abundance of each protein can be reflected to the statistics of the peaks, such as height or area [57–59] (Figure 1D).

Capillary electrophoresis exhibits a high sensitivity and only requires ultra-low injection. Schultz et al. described a capillary electrophoresis with laser-induced fluorescence, realizing a detection limit of 3 nM or ~6 fg injection for secreted insulin, which demonstrated the capability of rapidly determining a low level of protein in single cells [60]. Sobhani et al. presented an ultrasensitive fluorescence detection that proteins were separated and analyzed by 2-dimensional capillary electrophoresis. They used the tool to characterize the single-cell proteins and biogenic amines from the murine macrophage cell line, revealing large variations in component expressions among single cells [61]. As a technology well-suited for analysis of small heterogeneous samples, capillary electrophoresis was reported by Phillips et al. to measure protein tyrosine phosphatases in single cells of human epidermoid carcinoma, which provided a powerful tool for the analysis of human biopsy specimens [62]. In spite of these advantages, several intermediate steps, such as cell injection, lysis, and separation, result in the whole process being time-consuming and having low throughput.

#### **3. Microfluidic Approaches**

Microfluidics is a technology to process and manipulate small amounts of fluids (10-9–10-18 L) based on microfabricated channels [63]. Due to the dimensional compatibility with biological cells, microfluidic systems capable of miniaturization, integration, and parallelization have become an ideal platform for the analysis of single-cell proteins [64,65]. In the recent two decades, some microfluidic approaches have been developed and made great improvements on single-cell analysis of protein expressions.

#### *3.1. Microfluidic Flow Cytometry*

Microfluidic flow cytometry is a miniaturized version of flow cytometry for analysis of a small number of cells and enables integration of sample handling and single-cell analysis on a single microfluidic chip, where protein analysis is conducted [66]. By integrating microfluidic fabrication, optical sources and fluorescence detection together, microfluidic flow cytometry facilitates single- cell protein analysis and achieves quantification based on calibration curves (Figure 2).

Quake's group developed a microfabricated flow cytometer for sorting various biological cells in 1999 [67], and since then, microfluidic flow cytometry for single-cell protein analysis has developed rapidly. Preckel et al. demonstrated a commercially available microfluidic system for analysis of protein expressions of fluorescently stained primary cells, with a small number down to 625 cells per sample [68]. In order to achieve dynamic detections, a microfluidic platform combining multi-color flow cytometry and fluorescence microscopy was proposed by Wu et al. for probing signaling events spanning multiple timescales and intercellular locations [69]. Chen et al. reported an improved microflow cytometry platform based on a constriction channel enabling the quantification of numbers of multiple intracellular proteins simultaneously from tens of thousands single cells from both tumor cell lines and patient samples [70,71].

**Figure 2.** Microfluidic flow cytometry for single-cell protein analysis. (**A**) A commercially available microfluidic flow cytometry for analysis of protein expression with a small number down to 625 cells per sample. (a) Schematic of the microfluidic chip; (b) layout of the microfluidic glass chip with sample wells (S), buffer wells (B), the well for the reference dye (D), and the priming well (P); (c) cross-section micrograph of a channel with dimensions of 25 × 75 μm after bonding top and glass plate. Adapted with permission from [68]. (**B**) A microfluidic chip for global profiling of cellular pathways. (a) TLR4 signaling events occur at different timescales and subcellular locations; (b) shows the workflow procedures integrated and performed on the chip shown in (c); all the representative events in the cell diagram can be profiled using both fluorescent microscopy (d) and flow cytometry (e). Adapted with permission from [69]. (**C**) An improved microflow cytometry platform based on a constriction channel for absolute quantification of multiple intracellular proteins. Cells stained with multiple fluorescent labelled antibodies (a) are aspirated into the constriction microchannel with excited fluorescent signals detected by photomultiplier tubes (b); for each travelling cell, time coordinated fluorescent pulses are obtained with fluorescent levels (c); the calibration curves are obtained by the gradient solutions of multiple types of fluorescent labelled antibodies (d,e); based on raw parameters and calibration curves, numbers of multiple types of intracellular proteins are obtained (f). Adapted with permission from [71].

Microfluidic imaging flow cytometry is a modified method to collect spatial information at a high throughput. McKenna et al. presented a parallel microfluidic cytometer with 384 parallel flow channels for protein localization in a yeast model with a high throughput of several thousand events per second [72]. Furthermore, Holzner et al. proposed a microfluidic imaging flow cytometer for the ultra-high-throughput (60,000 and 400,000 cells per second for blur-free fluorescence and brightfield detection, respectively) quantitative imaging analysis of cytoplasmic proteins in human cells. It was capable of multi-parametric fluorescence quantification and subcellular localization analysis of cellular structures down to 0.5 μm with microscopy image quality [73].

Compared to conventional flow cytometry, microfluidic flow cytometry greatly reduces the amount requirements of samples which is helpful for applications in studying rare samples such as primary cells and rare tumor cells. In addition, it can obtain intracellular spatial information of single cells with a high throughput and is featured with the capacity of absolute quantification. The microfluidic flow cytometry improves some features; however, it has several similar limitations as conventional flow cytometry, i.e., the limited multiplexing capacity and incapability of quantifying secreted proteins.

#### *3.2. Droplet-Based Microfluidics*

Droplet-based microfluidics allows the quantification of secreted proteins, thereby overcoming the major limitations for protein analysis by microfluidic flow cytometry [74,75]. Typically, single cells and reagents, including fluorescent probes and target antibodies, are encapsulated simultaneously in the pico- or nanoliter water-in-oil emulsion-droplets. After incubation, fluorescent labelled antibodies bind to the secretions within the droplets. Subsequently, the droplets are loaded into a continuous flow channel, and the signal intensities are quantified, enabling a high-throughput droplet generation and protein analysis [76] (Figure 3).

**Figure 3.** *Cont.*

**Figure 3.** Droplet-based microfluidics for single-cell protein analysis. (**A**) A microfluidic device of picoliter droplets for enzymatic reaction. (a) Single *Escherichia coli* and substrate 3-O-methylfluorescein-phosphates are encapsulated within single droplets where the substrate is enzymatically hydrolyzed by the target enzyme alkaline phosphatase expressed by *E. coli*, generating a fluorescent signal; (b) and (c) show the droplet formation that occurred by confluence of three aqueous inlet streams (substrate, buffer and cells). Adapted with permission from [77]. (**B**) A new approach for absolute quantification of proteins combining proximity ligation assay and droplet digital PCR. Targeted proteins are isolated, lysed, and converted to dsDNA by standard proximity ligation assay. The dsDNA is distributed among 20,000 droplets at limiting dilution. Single dsDNA molecules in the droplets are then amplified by PCR and counted by measuring the fluorescence using droplet reader based on calibration curve. Adapted with permission from [78]. (**C**) A droplet-based microfluidic system for enzyme secretion from circulating tumor cells (CTCs) based on size purification. The system isolates CTCs by size, exchanges fluid around CTCs to remove contaminants, introduces a matrix metalloprotease substrate, and encapsulates CTCs into microdroplets. The cells can then be incubated and imaged by an imaging cytometer in the droplet generator. Adapted with permission from [79].

By confining single cells within tiny rooms by droplets, droplet microfluidics has worked as a well-established tool in single-cell protein analysis. Huebner et al. described an approach based on picoliter microdroplets initially, performing high-throughput screening by detecting the enzyme alkaline phosphatase expressed by *Escherichia coli* cells [77,80]. Weitz's team presented droplet-based microfluidics for high-throughput analysis of proteins released from or secreted by cells, screening individual enzyme expressions at a rate of ~10<sup>7</sup> per hour [81,82]. To realize the absolute quantification of tiny protein concentrations, a new approach that combines a proximity ligation assay and droplet-based digital PCR for protein quantification was developed by Albayrak et al. They counted both endogenously (CD147) and exogenously (GFP-p65) expressed proteins from hundreds of single cells [78]. Stoeckius et al. introduced a method of cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) based on droplet-based microfluidics to analyze protein and RNA expressions simultaneously for thousands of single cells. They exploited this method to detect multiplexed protein markers of cord blood mononuclear cells and enabled classifications of immune subpopulations [83]. Furthermore, Dhar et al. described a droplet-based microfluidic system integrated with vortex capture for estimating single-cell protease activities, which concentrated rare circulating tumor cells >106-fold from whole blood into 2-nL droplets and characterized the collagenase enzymes with a high-sensitivity of ~7 molecules per droplet [79].

As a popular approach of single-cell protein analysis, droplet-based microfluidics is capable of compartmentalizing highly controllable activities for a high-sensitivity analysis of intracellular, membrane, and especially secreted proteins. Nevertheless, it is a low efficient detection approach for limited cell encapsulation by the Poisson distribution, which would cause invalid analysis of empty or multiple cells in a droplet. Besides, changes in the microenvironments of single cells in droplets may cause unclear effects on cell activities in comparison to in vivo situations.

#### *3.3. Microwell-Based Assay (Microengraving)*

The microwell-based assay (microengraving) is a technique to monitor the temporal dynamics of secreted proteins from single cells based on microwells (~1 nL) in a large array [84]. In this method, single cells are distributed in large-array wells with antibody-coated microengraved substrates, and the corresponding antibodies capture the secreted proteins. After short periods of incubation, the slide with captured proteins is removed and analyzed by the conventional enzyme-linked immunosorbent assay [85] (Figure 4).

After Love's group first proposed this technology in 2006, a series of microengraving approaches have been applied in single-cell protein analysis. To improve the sensitivity, a hybridization chain reaction was integrated into this platform to amplify signals resulting from sandwich immunoassay for simultaneous detections of three secreted proteins, improving the sensitivity by an average of 200-fold compared to direct fluorescence detections [86]. Furthermore, it can provide a dynamical scope when immune responses of white blood cells (such as T-cells and B-cells) are monitored [87,89–91]. For example, Jia et al. presented a study of evaluating multiple parameters based on microengraving to analyze the protein-conjugate vaccine responses in adult nonhuman primates of B-cells. Compared to the enzyme-linked immunospot assay, the nanowell-based assay increases the sensitivity with a 106-fold higher concentration of analytes from given cells and enables the recovery of cells for further genetic analysis [91]. To detect low numbers of proteins with a broad dynamic range, another microwell-based assay design named "single molecule array" was presented by Walt et al. They demonstrated a wide range of expression of prostate-specific antigens with variation over several orders of magnitudes, revealing that genetic instabilities in cancer cells can affect protein expressions [88].

In all, the microengraving method is a powerful dynamics tool for single-cell protein analysis with advantages of high sensitivity, wide dynamic range, and capability of cell recovery. However, it characterizes only secreted proteins, but not membrane and intracellular proteins. Additionally, due to the spectral overlaps of colorimetric fluorescent proteins, its multiplexing capacity is limited to no more than four proteins. In addition, the throughput is also a limitation, because of the limited size of the microchip and the filling rate of single cells in each well requiring complex manipulations.

**Figure 4.** Microwell-based assay (microengraving) for single-cell protein analysis. (**A**) An integrated platform for microengraving and hybridization chain reaction. (a) Schematic illustration for detection of secreted products from single cells. Single cells are deposited onto an array of microwells on a glass slide with antibody coated. After incubation, the slide is removed, and immune-hybridization chain reaction is used to amplify the signal related to each capture event; (b) fluorescent micrographs for secreted proteins following microengraving and immune-hybridization chain reaction. Adapted with permission from [86]. (**B**) Process schematic for the integrated analysis of B cells using microengraving and on-chip cytometry. Microwells loaded with stained cell are imaged on a microscope cytometry to record the expressed phenotypes of every cell and the occupancy of each well. Microengraving can then be performed to capture secreted anti-bodies. Cells of interest can be recovered with an automated micromanipulator, and then sequenced further. Adapted with permission from [87]. (**C**) A single molecule array approach for quantifying phenotypic responses. Cultured cells are isolated, lysed, and loaded into the analyzer of single molecule array, and then incubated with capture beads, target antibody, and enzyme conjugate. The enzyme substrate is added, and the oil seal is used after the immune complex is formed on the beads, and then the imaging is detected. Adapted with permission from [88].

#### *3.4. Microchamber-Based Assay (Barcoding Microchips)*

In the same period, other than microwell-based assay, microchamber-based assays (barcoding microchips) function as an effective approach for analyzing proteins in single cells [92]. As an approach of absolute quantification in the number of protein molecules, this approach utilizes control microvalves to isolate single cells within known volumes of microchambers that contain capture antibodies in a barcode array. When proteins are captured, each microchamber containing an entire barcode can be quantitatively analyzed via a surface-bound immune sandwich assay (Figure 5).

**Figure 5.** Microchamber-based assay (barcoding microchips) for single-cell protein analysis. (**A**) A single-cell barcode chip for quantitative measurements of membrane, intracellular, and secreted proteins from single cells. (a) Image of the microchip and a fluorescence micrograph of a cellular assay unit (20 microchambers); (b) workflow of the on-chip operation. Fully open: cells are loaded into the microchambers. Close-I state: microchambers are sealed by a low pressure on the microchip but lysis buffer can be introduced to the channel. Close-II state: cells are isolated completely in the microchambers from the channel by a high pressure; (c) workflow of detecting of membrane, intracellular, and secreted protein via the sandwich-type fluorescence immunoassay; (d) single-cell proteomic result of fluorescence intensity, cell numbers and cell positions; (e) fluorescence data for secreted and intracellular protein assays. Adapted with permission from [93]. (**B**) A microchamber-based platform combined with spatial and spectral encoding. (a) Workflow illustration of high-throughput profiling of single cells in basal and stimulated conditions for 42 secreted effector proteins; (b) representative optical image showing a block of microchambers loaded with U937-derived macrophage cells and the corresponding scanned fluorescence images showing protein detections with three colors; (c) representative heat maps showing single-cell protein profiles measured on U937-derived macrophages; (d) correlation of protein secretion expressions between two replicate microchip experiments at single-cell levels, and (e) between single-cell levels measured using microchips and population levels measured using conventional methods. Adapted with permission from [94]. (**C**) A barcoding microchip for identifying most stable separation distance between two cells. (a) Schematic of a single microchamber with valves and barcodes (top) and the fluorescent sandwich immunoassay protein detection scheme (bottom); (b) a representative time-lapse image of a two-cell chamber over 8 h and a typical fluorescence image of a barcode for the five assayed proteins. Adapted with permission from [95].

Heath's team first demonstrated this method and a series of follow-up studies. Ma et al. presented a single-cell barcode chip for quantitative measurements of over 10 secreted proteins from single cells and applied the chip to quantify the effector molecules of T cells, observing the functional heterogeneity in cytotoxic T lymphocytes [96]. Apart from secreted proteins, Shi et al. described a new barcode chip for quantification of cytoplasmic and membrane proteins, and the microchip evaluated protein interactions related to PI3K signaling pathway mediated by EGF receptor [97]. Moreover, Wang et al. extended the function to the detection of comprehensive analytes (including membrane, intracellular, and secreted proteins) based on a modified barcode chip [93]. To further increase the multiplexity, Lu et al. designed a combination of spatial spectrum coding and microchambers, and realized detection of 42 secreted proteins. Through a comparative analysis of differentiated macrophages between different stimulations, distinct functional heterogeneity was exposed [94]. Additionally, another

barcoding microchip was used to examine secreted proteins in isolated cell pairs to identify the most stable separation distances between two cells [95].

This approach has been conducted with advantages of precise quantification, comprehensive analyte detection and multiplexing capacity, and a commercial instrument of "Isoplexis" has been developed. Despite these advantages, it also has some limitations. Due to the complex fabrication of microvalves on the chip, the effective area of the barcoding microchip is restricted, resulting in a limited detection throughput, as well as the requirement of sophisticated operations. Additionally, a balance is needed that either maintains the multiplexing capacity or detection sensitivity; that is to say multiplexing capacity would decrease assay sensitivity.

#### *3.5. Single-Cell Western Blotting*

Existing methods are almost antibody-based assays, which may cause a false-positive signal because of the non-specific binding from antibody cross-reactivity. As a recently proposed technology, single-cell Western blotting is a combination of microfluidics and conventional Western blotting to achieve protein expression analysis at a single-cell resolution [98]. Due to separation by electrophoresis before the antibody probing, it overcomes the issue of cross reactions. In single-cell Western blotting, a layer of polyacrylamide gel is coated on a glass and patterned with a large-array microwells. Single cells are dropped on the thousands of microwells and lysed in situ, and then proteins are separated by gel electrophoresis, immobilized via photoinitiated blotting, and detected by fluorescent labelled antibodies [99,100] (Figure 6).

**Figure 6.** Single-cell Western blotting for single-cell protein analysis. (**A**) Schematic of single-cell phenotype imaging and Western blotting. (a) The array consists of thousands of microwells patterned in a thin layer (30 μm) photoactive polyacrylamide gel seated on a glass slide; (b) fluorescent imaging of single cells in microwells provides phenotype information; (c) single cells are lysed in situ after imaging and the lysate is used for Western blot analysis; (d) workflow of single-cell Western blotting for proteomic assay. Adapted with permission from [100]. (**B**) Single-cell Western blotting of rare cells. (a) Enrich circulating tumor cells (CTCs) from whole blood samples based on cell size; (b) deposit enriched cells into the microwell and identify each CTC by nuclear staining; (c) for each cell in microwell, proceed as in-microwell chemical CTCs lysis, single-CTC protein polyacrylamide gel electrophoresis, covalent immobilization of proteins to the gel (photo-blotting) and in-gel immunoprobing; (d) single-CTC lysate is analyzed and rounds of immunoprobing support the multiplexing of 12 proteins, thus creating a protein expression profile for each rare cell. Adapted with permission from [101].

As a young approach of single-cell protein analysis, single-cell Western blotting has developed rapidly in recent years since Herr's group first reported it. Kang et al. described a useful protocol to measure single-cell variation in protein expressions based on single-cell Western blotting, enabling detection of more than 10 proteins in each cell during 4 h [102]. Due to cell loss, thousands of cells are required in single-cell Western blot. To solve the problem, Sinkala et al. introduced a single-cell resolution microfluidic Western blotting for multiple membrane and intracellular proteins expressions in circulating tumor cells with only two starting cells to monitor the response to therapy [101]. To improve identification specificity in single-cell Western blotting, Kim et al. established a molecular mass standard with a "solid phase" protein marker. The magnetic field was used to guide the protein-coated particles into most (>75%) microwells, accomplishing His protein marker release subsequently and protein solubilization and cell lysis simultaneously [103]. To improve analytical sensitivity and throughput, Gumuscu et al. recently introduced a hybrid single-cell Western blotting integrated with separation-encoded microparticles. The dehydrated microparticles were reduced dimensionally based on the hydrogel molding and release method, thereby enhancing the sensitivity obviously. Meanwhile, ERα expression from breast tumor cells were quantified with a reduced immunoprobing time of ~36 h based on mass transport in microparticles [104].

Although single-cellWestern blotting represents a new technology for single-cell protein expression analysis, some limitations are obvious. It is a relative quantification approach due to lack of calibration and it is unable to quantify the secreted proteins. Furthermore, single-cell Western blotting has limited detection sensitivity because proteins are easily lost during processing procedures such as cell lysing, protein immobilization, and repeated antibody stripping.

#### **4. Conclusions and Outlook**

In this review, we summarized the key advances of conventional and microfluidic technologies for single-cell protein analysis in the past two decades, and made an approach comparison for multiplexity, analyte, throughput, and sensitivity (Table 1). The rapid developments and enormous progress of single-cell protein research offer unprecedented opportunities in studying multiplexed, high-throughput, and high-sensitivity of single-cell proteins (including membrane, intracellular, and secreted proteins). Apart from improving our understanding of the cellular molecular mechanisms (cellular heterogeneity), it is helpful for applications of clinical diagnosis, tumor treatments, and drug developments.


**Table 1.** Approach comparison of single-cell protein analysis for multiplexity, analyte, throughput, and sensitivity.

In the field of single-cell protein expression analysis, conventional approaches often have certain advantages, for instance, fluorescence flow cytometry—high throughput; mass cytometry—multiplexed capacity; enzyme-linked immunospot assay—high sensitivity; capillary electrophoresis—comprehensive analytes. Compared with conventional technologies, microfluidic approaches usually integrate several strengths, which makes assays of rare cells possible.

Despite the recent technological advances, the limitations of current single-cell protein analysis technologies are also obvious. From the aspect of multiplexity, current multiplexing is still not enough for whole proteomics detections (>10,000 proteins in a single cell). As for the analyte, comprehensive detections of the membrane, intracellular, and secreted proteins are a mainstream trend that most approaches are capable of only one or two specific types, while in this review only droplet-based microfluidics and barcoding microchips could simultaneously achieve detections but limited in other aspects. Throughput is another important evaluation parameter because of the analysis requiring large numbers of cells and a large amount of data; flow cytometry-based techniques usually beat other techniques in terms of throughput. In addition, high sensitivity is necessary for accurate and quantitative analysis for single-cell proteins; however, most current approaches still cannot reach the single-cell level limit of detecting single-molecule protein quantification.

In addition, in order to achieve comprehensive analysis of single-cell proteins, single-cell proteomic analysis can be combined with multi-omics (e.g., genomics, transcriptomics, or metabolomics). Increasing evidence shows that integrating multiple genetic data was essential to obtain accurate understanding of biological information [105,106]. Moreover, as an important supplementary information in addition to protein abundance, spatial information is also necessary for single-cell proteomic characterization. It includes both protein characteristics such as protein locations and cell characteristics such as cellular phenotypes and cellular dynamics. Combining information from comprehensive multi-omics and spatial-omics, a complete new insight of cellular status and heterogeneity can be obtained.

In the future work, researchers will still focus on improving multiplexity, analyte, throughput, and sensitivity uniformly based on combination, parallelization, and automation. The combination of multiple technologies can leverage the advantages of different approaches, for example, applying continuous cell flow detections in large-array microchips to increase multiplexity and throughput [107], combining droplets with signal amplification technologies to increase sensitivity, such as immunoassay [108], proximity ligation/extension assay [109,110], and sequence-topology assembly for multiplexed profiling [111]. Besides, parallelization of microchannels for single-cell processing enables increased throughput. Automation is also critical to provide commercial services of transforming technologies into a reliable and effective instrument that can apply to clinical diagnosis and treatments.

**Author Contributions:** Conceptualization, L.L. and J.C.; methodology, L.L. and J.W.; visualization, L.L. and D.C.; writing—original draft, L.L.; writing—review and editing, J.W. and J.C.; supervision, D.C.; funding acquisition, D.C., J.W. and J.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors would like to acknowledge financial supports from the National Natural Science Foundation of China (Grant No. 61431019, 61825107, 61671430, 61922079); Key Project (QYZDB-SSW-JSC011), Youth Innovation Promotion Association and Interdisciplinary Innovation Team of Chinese Academy of Sciences.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Highly Sensitive and Multiplexed In-Situ Protein Profiling with Cleavable Fluorescent Streptavidin**

#### **Renjie Liao 1,**†**, Thai Pham 1,**†**, Diego Mastroeni 2,3, Paul D. Coleman 2,3, Joshua Labaer <sup>1</sup> and Jia Guo 1,\***


Received: 31 January 2020; Accepted: 25 March 2020; Published: 1 April 2020

**Abstract:** The ability to perform highly sensitive and multiplexed in-situ protein analysis is crucial to advance our understanding of normal physiology and disease pathogenesis. To achieve this goal, we here develop an approach using cleavable biotin-conjugated antibodies and cleavable fluorescent streptavidin (CFS). In this approach, protein targets are first recognized by the cleavable biotin-labeled antibodies. Subsequently, CFS is applied to stain the protein targets. Though layer-by-layer signal amplification using cleavable biotin-conjugated orthogonal antibodies and CSF, the protein detection sensitivity can be enhanced at least 10-fold, compared with the current in-situ proteomics methods. After imaging, the fluorophore and the biotin unbound to streptavidin are removed by chemical cleavage. The leftover streptavidin is blocked by biotin. Upon reiterative analysis cycles, a large number of different proteins with a wide range of expression levels can be profiled in individual cells at the optical resolution. Applying this approach, we have demonstrated that multiple proteins are unambiguously detected in the same set of cells, regardless of the protein analysis order. We have also shown that this method can be successfully applied to quantify proteins in formalin-fixed paraffin-embedded (FFPE) tissues.

**Keywords:** proteomics; immunofluorescence; immunohistochemistry

#### **1. Introduction**

Multiplexed molecular profiling in single cells in situ holds great promise to reveal cell-to-cell variations, cell-microenvironment interactions and tissue architecture at the single cell level, which are masked by population-based measurements [1,2]. Various methods [3–10] have been developed for multiplexed single-cell analysis. An increasing number of studies have been focused on proteins, for their central roles in biological processes. Immunofluorescence (IF) is a well-established single-cell in-situ protein analysis platform. However, on each specimen, only a couple of proteins can be profiled by IF, due to spectral overlap of commonly available organic fluorophores [11].

To enable multiplexed in-situ protein profiling, a number of methods [12–20] have been developed recently. In these methods, the detection tags are either conjugated to the primary antibodies or the secondary antibodies. Without signal amplification, the existing methods have limited detection sensitivity, which limits the analysis of low-expression proteins. Moreover, the low sensitivity is exacerbated in highly autofluorescent, formalin-fixed, paraffin-embedded (FFPE) tissues [13], which are the most common type of archived clinical tissue samples [21]. Additionally, due to their weak sensitivity, the current methods require long imaging exposure times, which results in limited sample throughput and long assay times.

Here, we report a highly sensitive and multiplexed in-situ protein analysis method. In this approach, protein targets are sensitively detected by cleavable biotin-conjugated antibodies and cleavable fluorescent streptavidin (CFS) using a layer-by-layer signal amplification method. Through reiterative cycles of protein labeling, signal amplification, fluorescence imaging, signal removal and streptavidin blocking, comprehensive protein profiling can be achieved in individual cells at the optical resolution. To demonstrate the feasibility of this approach, we designed and synthesized cleavable biotin-conjugated antibodies and CFS. We showed that the detection sensitivity of our approach is at least one order of magnitude higher than the current in-situ proteomics methods. We demonstrated that our approach enables accurate multiplexed protein analysis in single cells, without prior knowledge of the protein expression levels. We also showed proteins in FFPE tissues can be successfully profiled using our approach.

#### **2. Materials and Methods**

#### *2.1. General Information*

Chemicals and solvents were purchased from Sigma-Aldrich or TCI America, and used directly without further purification, unless otherwise noted. Bioreagents were purchased from Invitrogen, unless otherwise indicated.

#### *2.2. Cell Culture*

HeLa CCL-2 cells (ATCC) were maintained in Dulbelcco's modified Eagle's Medium (DMEM) supplemented with 10% fetal bovine serum, 100 U/mL penicillin and 100 g/mL streptomycin in a humidified atmosphere at 37 ◦C with 5% CO2. Cells were plated on chambered coverglass (0.2 mL medium/chamber) (Thermo Fisher Scientific) and allowed to reach 60% confluency in 1–2 days.

#### *2.3. Cell Fixation and Permeabilization*

Cultured HeLa CCL-2 cells were fixed with 4% formaldehyde (Polysciences) in 1X PBS (phosphate buffered saline) at 37 ◦C for 15 min, followed by washing with 1X PBS for 3 × 5 min. Cells were then permeabilized with PBT (0.1% Triton-X 100 in 1X PBS) for 10 min at room temperature, and subsequently washed three times with 1X PBS, each for 5 min.

#### *2.4. Preparation of Cleavable Fluorescent Streptavidin*

Cleavable Cy5 NHS ester was prepared according to the literature [14]. To 20 μL of streptavidin solution at a concentration of 1 mg/mL, 1 nmol of cleavable Cy5 NHS ester and 2 μL of NaHCO3 solution (1 M) were added. The mixture was incubated in the dark and at room temperature for 15 min. The labeled streptavidin was purified by p-6 biogel column.

#### *2.5. Preparation of Biotin-SS-Ab*

To 20 μL of primary antibody solution at a concentration of 1 mg/mL, 3 nmol of EZ link Sulfo-NHS-SS-Biotin (Thermo Fisher Scientific) and 2 μL of NaHCO3 solution (1 M) were added. The mixture was incubated in the dark and at room temperature for 15 min, and then the conjugation product was purified by p-6 biogel column.

#### *2.6. Immunofluorescence with CFS*

Fixed HeLa cells were incubated with antibody-blocking buffer (10% normal goat serum (*v*/*v*), 1% bovine serum albumin (*w*/*v*), 0.1 Triton-X 100 in 1X PBS) for 1 h at room temperature, and then washed three times with PBT, each for 5 min. To block the cell endogenous biotin, the cells were treated with 0.1 mg/mL streptavidin in 1X PBS for 15 min at room temperature, and washed three times with

1X PBS, each for 5 min. Subsequently, the cells were incubated with 0.5 mg/mL biotin in 1X PBS for 30 min at room temperature, and washed with 1X PBS three times, each for 5 min. After blocking, the cells were incubated with Biotin-SS-Ab in antibody-blocking buffer (concentration varied and was suggested by the manufacturers) for 45 min at room temperature, and washed with PBT three times, each for 10 min. Subsequently, the cells were incubated with 10 μg/mL cleavable fluorescent streptavidin in 1% BSA (bovine serum albumin) in PBT for 30 min, and washed three times with 1X PBS, each for 5 min. The cells were washed with GLOX buffer (0.4% glucose and 10 mM Tris HCl in 2 X SSC) for 1–2 min at room temperature, and then imaged in GLOX solution (0.37 mg mL−<sup>1</sup> glucose oxidase and 1% catalase in GLOX buffer).

#### *2.7. Signal Amplification*

To amplify the staining signal, the cells were incubated with cleavable biotin-conjugated goat anti-chicken antibodies (Thermo Fisher Scientific) in 1% BSA in PBT at a concentration of 10 μg/mL for 30 min, and then washed three times with 1X PBS, each for 5 min. Afterwards, the cells were incubated with cleavable fluorescent streptavidin in 1% BSA in PBT at a concentration of 10 μg/mL, and again washed three times with 1X PBS, each for 5 min. Multiple amplification cycles can be repeated to obtain the desired signal intensity.

#### *2.8. Fluorophore and Biotin Cleavage*

Fluorophore and biotin cleavage was performed by incubating the specimen with tris(2-carboxyethyl)phosphine (TCEP, pH = 9.5, 100 mM in deionized water) for 30 min at 37 ◦C. Subsequently, the cells were washed three times with PBT and three times with 1X PBS, each for 5 min.

#### *2.9. Streptavidin Blocking*

After cleavage, the cells were incubated with 0.5 mg/mL biotin in 1X PBS for 30 min at room temperature, and then washed three times with 1X PBS, each for 5 min.

#### *2.10. Quantification of the Fluorophore Cleavage E*ffi*ciency*

Fixed and blocked HeLa CCL-2 cells were incubated with 10 μg/mL cleavable biotin-labeled rabbit anti-Ki67 (Thermo Fisher Scientific) for 45 min. Subsequently, the cells were stained by 10 μg/mL cleavable fluorescent streptavidin. Then, one, two, three and four rounds of amplification were applied to different sets of cells. In each round of amplification, the cells were first incubated with cleavable biotin-labeled goat-anti-chicken antibodies and then with cleavable fluorescent streptavidin. The cells were then incubated with TCEP (100 mM, pH = 9.5) for 30 min at 37 ◦C. Subsequently, the cells were washed three times with PBT and three times with 1X PBS, each for 5 min.

#### *2.11. Quantification of the Biotin Cleavage E*ffi*ciency*

Fixed and blocked HeLa CCL-2 cells were incubated with 10 μg/mL cleavable biotin-labeled rabbit anti-Ki67 (Thermo Fisher Scientific) for 45 min. Subsequently, cells were stained by 10 μg/mL cleavable fluorescent streptavidin. Following that, one, two, three and four rounds of amplification were applied to different sets of cells. In each round of amplification, the cells were first incubated with cleavable biotin-labeled goat-anti-chicken antibodies and then with cleavable fluorescent streptavidin. Biotin and fluorophores were cleaved by TCEP (100 mM, pH = 9.5). The cells were then incubated with cleavable fluorescent streptavidin.

#### *2.12. Quantification of the Streptavidin Blocking E*ffi*ciency*

Fixed and blocked HeLa CCL-2 cells were incubated with 10 μg/mL cleavable biotin-labeled rabbit anti-Ki67 (Thermo Fisher Scientific) for 45 min. Subsequently, cells were stained by 10 μg/mL cleavable fluorescent streptavidin. Following that, one, two, three and four rounds of amplification were applied to different sets of cells. In each round of amplification, the cells were first incubated with cleavable biotin-labeled goat-anti-chicken antibodies and then with cleavable fluorescent streptavidin. Biotin and fluorophores were cleaved by TCEP (100 mM, pH = 9.5). The cells were blocked with 0.5 mg/mL biotin. The cells were incubated with cleavable biotin-labeled goat anti-chicken and then cleavable fluorescent streptavidin.

#### *2.13. Multiplexed Protein Analysis in HeLa Cells*

Fixed and blocked HeLa CCL-2 cells were incubated with 10 μg/mL cleavable biotin-labeled primary antibodies. Subsequently, cells were stained by 10 μg/mL cleavable fluorescent streptavidin. One to three amplification cycles were applied. In each amplification cycle, cells were first incubated with cleavable biotin-labeled goat anti-chicken antibodies (Thermo Fisher Scientific) and then cleavable fluorescent streptavidin. After imaging, cells were incubated with TCEP (100 mM, pH = 9.5) to cleave the fluorophores and biotin. Cells were then blocked with 50 mM iodoacetamide, 0.1 mg/mL streptavidin and 0.5 mg/mL biotin, followed by the next immunofluorescence cycle. Rabbit anti-c-erbB-2 (Thermo Fisher Scientific), rabbit anti-Ki67 (Thermo Fisher Scientific), and rabbit anti-Histone H4 (mono methyl K20) (Abcam) were used as primary antibodies.

#### *2.14. Conventional Immunofluorescence*

Fixed and blocked HeLa CCL-2 cells were incubated with 10 μg/mL Cy5-labeled or unconjugated primary antibodies. Subsequently, cells were stained by 10 μg/mL Cy5-labeled goat anti-rabbit secondary antibodies (Thermo Fisher Scientific). Rabbit anti-c-erbB-2, rabbit anti-Ki67, and rabbit anti-Histone H4 (mono methyl K20) were used as primary antibodies. Cy5-labeled rabbit anti-Ki67 was prepared according to the literature [14].

#### *2.15. Depara*ffi*nization and Antigen Retrieval of FFPE Tissues*

A brain FFPE tissue slide was deparaffinized in xylene three times, for 10 min each. Then the slide was immersed in 100% ethanol for 2 min, 95% ethanol for 1 min, 70% ethanol for 1 min, 50% ethanol for 1 min, 30% ethanol for 1 min. The slide was rinsed with deionized water. Afterwards, a combination of 'heat induced antigen retrieval' (HIAR) and 'enzymatic antigen retrieval' was used. HIAR was done using a pressure cooker (Cuisinart). The slide was immersed in antigen retrieval buffer (10 mM sodium citrate, 0.05% Tween 20, pH = 6.0), and water-bathed in pressure cooker for 20 min with the 'High pressure' setting. Subsequently, the slide was rinsed three times with 1X PBS, each for 5 min. The slides were treated with pepsin digest-all 3 (Life Technologies) for 10 min, and then washed three times with 1X PBS, each for 5 min.

#### *2.16. Protein Staining in FFPE Tissues*

To block the endogenous biotin, the slide was treated with 0.1 mg/mL streptavidin in 1X PBS for 15 min at room temperature, and washed three times with 1X PBS, each for 5 min. Subsequently, the slides were incubated with 0.5 mg/mL biotin in 1XPBS for 30 min at room temperature, and washed three times with 1X PBS, each for 5 min. The slide was incubated with 10 μg/mL cleavable biotin-labeled rabbit anti-H3K4me3 (Cells Signaling) in antibody blocking buffer for 45 min, and washed three times with PBT, each for 10 min. The slide was stained by 10 μg/mL cleavable fluorescent streptavidin for 30 min, and then washed three times with 1X PBS, each for 5 min. Two cycles of amplification were applied. In each round of amplification, the cells were first incubated with cleavable biotin-labeled goat-anti-chicken antibodies and then with cleavable fluorescent streptavidin. After imaging, the slide was incubated with TCEP (100 mM, pH = 9.5) for 30 min at 37 ◦C and washed three times with PBT and three times with 1X PBS, each for 5 min. Streptavidin was blocked with 0.5 mg/mL Biotin. The tissue was re-stained with 10 μg/mL cleavable biotin-labeled goat anti-chicken and then 10 μg/mL cleavable fluorescent streptavidin.

#### *2.17. Imaging and Data Analysis*

Stained cells and brain FFPE tissue were imaged under a Nikon Ti-E epifluorescence microscope equipped with a 20× objective. Images were captured using a CoolSNAP HQ2 camera and Chroma filter 49009. Image data was analyzed with NIS-Elements Imaging software.

#### **3. Results**

#### *3.1. Platform Design*

As shown in Figure 1, each staining cycle of our multiplexed protein profiling technology is composed of five major steps. First, proteins of interest are targeted by cleavable biotin-labeled primary antibodies and cleavable fluorescent streptavidin (CFS). Second, the specimen is incubated with a cleavable biotin-labeled orthogonal antibody or protein, which does not bind to any specific targets in the specimen or the primary antibodies. Then, CFS can be applied again to amplify the signal. This second step can be repeated several times to achieve the desired signal intensities through layer-by-layer signal amplification. Third, the specimen is imaged to generate quantitative single-cell protein expression profiles. Fourth, the fluorophore and the biotin unbound to streptavidin are efficiently removed by chemical cleavage. Finally, the leftover streptavidin is blocked with biotin. Through reiterative cycles of staining, amplification, imaging, cleavage and streptavidin blocking, a large number of different proteins with a wide range of expression levels can be characterized in single cells in situ.

**Figure 1.** Highly sensitive and multiplexed in-situ protein profiling with cleavable fluorescent streptavidin (CFS). In each cycle, the protein of interest is first targeted by cleavable biotin-labeled primary antibodies, and then stained with CFS. Though layer-by-layer signal amplification using cleavable biotin-conjugated orthogonal antibodies and CFS, highly sensitive protein detection is achieved. After imaging, the fluorophore and the biotin unbound to streptavidin are chemically cleaved and subsequently streptavidin is blocked by biotin. Through reiterative cycles of target staining, signal amplification, fluorescence imaging, chemical cleavage and streptavidin blocking, comprehensive protein profiling can be achieved.

#### *3.2. Design and Synthesis of Cleavable Biotin-Conjugated Antibodies and CFS*

To demonstrate the feasibility of this approach, we conjugated biotin to antibodies through a disulfide-bond-based cleavable linker and Cy5 to streptavidin through an azide-based cleavable linker, according to a previously described method [14]. In this way, both biotin and Cy5 can be simultaneously removed by the reducing reagent tris(2-carboxyethyl)-phosphine (TCEP).

#### *3.3. Significantly Enhanced Detection Sensitivity*

We then evaluated the detection sensitivity of our approach by comparing it with direct and indirect immunofluorescence (IF). Protein Ki67 in HeLa cells was stained with these three methods with the same concentration of primary antibodies (Figure 2). The staining patterns obtained by the three methods closely resemble each other. Compared with direct and indirect immunofluorescence, the CFS method does not lose the staining resolution (Figure 2A). Without any signal amplification steps, the CFS method is ~4.5 times more sensitive than direct immunofluorescence (*p* = 6.6e-5) and is comparable to indirect immunofluorescence (*p* = 0.36) (Figure 2B). With four rounds of signal amplification, the original staining intensities were further increased by more than 10 times (*p* = 3.8e-12) (Figure 3), while the staining background remained almost the same (Figure 3C). These results demonstrate that our approach has at least one order of magnitude higher detection sensitivity compared with the existing in-situ proteomics methods. The staining patterns obtained by direct IF, indirect IF and our approach closely resemble each other (Figures 2A and 3A), suggesting that our signal amplification method does not lose the staining resolution.

**Figure 2.** (**A**) Fluorescent images of protein Ki67 stained with direct IF, indirect IF and cleavable fluorescent streptavidin (CFS). Scale bars, 20 μm. (**B**) Comparison of the averaged signal integration in single cells (*n* = 30) for the three methods. Error bars, standard deviation.

**Figure 3.** (**A**) Fluorescent images of protein Ki67 stained with 0 to 4 amplification cycles in HeLa cells. Scale bars, 20 μm. (**B**) Averaged signal integration in single cells (*n* = 30) in amplification cycles 0 to 4. Error bars, standard deviation. (**C**) Fluorescence intensity profiles corresponding to the indicated line positions in amplification cycles 0 to 4.

#### *3.4. E*ffi*cient Fluorophore and Biotin Cleavage and Streptavidin Blocking*

To enable multiplexed protein analysis by reiterative analysis cycles, three major requirements exist. (1) Fluorescence signals need to be efficiently erased by chemical cleavage. (2) The biotin not bound to streptavidin has to be efficiently removed to avoid false positive signals in the next staining cycle. (3) As TCEP can not effectively cleave the biotin bound to streptavidin (data not shown), the free binding sites on the leftover streptavidin need to be efficiently blocked before the next staining cycle. To assess whether these three requirements are met by the CFS approach, we stained protein Ki67 with 0 to 4 amplification cycles in different sets of HeLa cells, and first quantified the cleavage efficiency (Figure 4). After TCEP incubation, ~95% of signal was removed regardless of the number of amplification rounds. To test whether the biotin unbound to streptavidin can be removed by TCEP, we stained protein Ki67 with 0 to 4 amplification cycles in different sets of HeLa cells (Figure 5). After TCEP cleavage, the cells were incubated the CFS again. No further fluorescence signal enhancement was introduced, suggesting that the free biotin is efficiently removed during the cleavage step. To evaluate the streptavidin blocking efficiency, we stained protein Ki67 with 0 to 4 amplification cycles in different sets of HeLa cells (Figure 6). Subsequently, the cells were incubated with TCEP and then with biotin to block streptavidin. Another round of signal amplification was applied and no further fluorescence signal enhancement was detected. These results indicate that streptavidin is efficiently blocked by biotin.

**Figure 4.** (**A**) Fluorescent images of protein Ki67 stained with 0 to 4 amplification cycles in HeLa cells and those after cleavage. The exposure time in amplification cycles 0 to 4 is 1 s, 500 ms, 250 ms, 125 ms, 62 ms, respectively. Scale bars, 20 μm. (**B**) Fluorescence intensity profiles corresponding to the indicated line positions in amplification cycles 0 to 4.

**Figure 5.** (**A**) Fluorescent images of protein Ki67 stained with 0 to 4 amplification cycles in HeLa cells, after cleavage and re-stained with CFS. The exposure time in amplification cycles 0 to 4 is 1 s, 500 ms, 250 ms, 125 ms, 62 ms, respectively. Scale bars, 20 μm. (**B**) Fluorescence intensity profiles corresponding to the indicated line positions in amplification cycles 0 to 4.

**Figure 6.** (**A**) Fluorescent images of protein Ki67 stained with 0 to 4 amplification cycles in HeLa cells and those after cleavage. Following streptavidin blocking, the cells were re-stained with cleavable biotin-labeled orthogonal antibodies and CFS. The exposure time in amplification cycles 0 to 4 is 1 s, 500 ms, 250 ms, 125 ms, 62 ms, respectively. Scale bars, 20 μm. (**B**) Fluorescence intensity profiles corresponding to the indicated line positions in amplification cycles 0 to 4.

#### *3.5. Multiplexed In-Situ Protein Profiling*

To demonstrate the feasibility of applying the CFS method for multiplexed protein analysis, we labeled protein c-erbB-2, Ki67 and H4K20me through reiterative staining cycles in the same set of HeLa cells (Figure 7). The staining signals generated in the previous cycles do not reappear in the following cycles, confirming that the fluorophore and free biotin are efficiently cleaved and streptavidin is efficiently blocked. We also stained these three proteins by conventional immunofluorescence (Figure S1). The staining results obtained by our approach and conventional immunofluorescence closely resemble each other. These results suggest that the CFS method enables the multiplexed protein profiling in single cells in situ.

**Figure 7.** (**A**) Protein c-erbB-2, Ki67 and H4K20me were detected with CFS through reiterative staining cycles in the same set of HeLa cells. (**B**) Nuclei were stained with DAPI. (**C**) Digital overlay of the three staining images in (**A**). (**D**) Staining intensity (*n* = 40) for the three proteins. Error bars, standard deviation. Scale bars, 20 μm.

Existing reiterative protein profiling methods [12–19] require knowledge of the relative protein expression levels in advance. With that prior knowledge, proteins are quantified in the order of their increasing expression levels, to minimize the interference from the leftover signals generated in the previous cycles. However, due to the limited amount of the biological and clinical samples, to obtain prior knowledge of protein expression levels is sometimes not possible. In addition, the relative protein expression levels in different cell types in the same specimen can be different, which makes it difficult to develop a desired protein analysis order for all the cell types. Moreover, the process to generate such prior knowledge can be time-consuming and expensive. Our CSF method addresses all of these issues by eliminating the requirement of knowing protein expression levels in advance. In our approach, the protein staining signal in each analysis cycle can be amplified with a certain number of amplification cycles until the satisfied staining intensities are achieved. In this way, following the analysis of high-expression proteins in the previous cycles, the low-expression proteins can be accurately quantified by more amplification cycles. To demonstrate the feasibility of this concept, we profiled protein H4K20me, Ki67 and c-erbB-2 in the same set of HeLa cells in the order of decreasing expression levels (Figure 8A–C). As a result of efficient fluorophore and biotin cleavage and also efficient streptavidin blocking, protein Ki67 was successfully detected following the analysis of high-expression H4K20me. However, due to the extremely low expression level of c-erbB-2 and the accumulated leftover signals produced in the previous two cycles, it was difficult to detect protein c-erbB-2 without signal amplification (Figure S2). After one cycle of signal amplification, the staining signal of c-erbB-2 was significantly enhanced. The significant stochastic protein expression heterogeneity resulted in the relatively large error bars in Figures 7D and 8D [14]. With the improved signal-to-background ratio, the low-expression c-erbB-2 was unambiguously detected following the analysis of two high-expression proteins. These results indicate that the CFS method does not require the prior knowledge of protein expression levels and enables accurate protein analysis regardless of the protein analysis order.

**Figure 8.** (**A**) H4K20me and Ki67 were detected with CFS through reiterative staining cycles without signal amplification. Afterwards, protein c-erbB-2 was detected by signal amplification with CFS in the same set of HeLa cells. (**B**) Nuclei were stained with DAPI. (**C**) Digital overlay of the three staining images in (**A**). (**D**) Staining intensity (*n* = 40) for the three proteins. Error bars, standard deviation. Scale bars, 20 μm.

#### *3.6. In-Situ Protein Profiling in Human FFPE Tissues*

Archived tissues are important biological samples to study normal physiology and disease pathogenesis. Formalin-fixed, paraffin-embedded (FFPE) tissue is the most common form of archived tissue in clinics and pathology labs [21]. FFPE tissues often display high autofluorescence [13] and partially degraded proteins [22], which makes them difficult to profile by fluorescence imaging methods with low detection sensitivity. To demonstrate the feasibility of applying the CFS approach to analyze FFPE tissues, we stained H3K4me3 in FFPE human brain tissue (Figure 9). With two rounds of signal amplification, the signal-to-background ratio was significantly improved. After cleavage, the fluorescence signal was efficiently removed. Another round of signal amplification cycle after cleavage and streptavidin blocking did not further increase the staining intensities. These results confirm that the flurophore and the free biotin can be efficiently removed by TCEP and streptavidin can be efficiently blocked by biotin. These results also imply that the CFS approach can be successfully applied to quantify the proteins in FFPE tissues.

**Figure 9.** Protein H3K4me3 in human FFPE brain tissue was stained with CFS in amplification cycles (**A**) 0, (**B**) 1 and (**C**) 2. (**D**) Afterwards, the stained tissue was incubated with TCEP. (**E**) Following chemical cleavage and streptavidin blocking, the tissue was incubated with cleavable biotin-conjugated antibodies and CFS again. (**F**) Fluorescence intensity profiles corresponding to the indicated line positions in (**A**) to (**E**). Scale bars, 25 μm.

#### **4. Discussion**

In summary, we have designed and synthesized CFS and demonstrated that this multiplexed in-situ protein analysis approach enhances the detection sensitivity of the existing in-situ proteomics approaches by at least ten times. This improved sensitivity enables our approach to precisely quantify the low-expression proteins, which are not detectable by other current in-situ proteomics methods. In this way, our approach also improves the dynamic range of protein detection by one order of magnitude. We have also shown that multiple proteins can be accurately detected in the same specimen using our approach, regardless of the analysis order of proteins with varied expression levels. With its dramatically-improved sensitivity, our approach enables the quantitative analysis of low-expression proteins, especially in the highly autofluorescent FFPE tissue samples.

Similarly to other in-situ proteomics assays, our approach applies the signal intensities generated by the target-bound antibodies to infer the relative abundances of the proteins. As a result, it can be difficult to compare the results obtained using different antibodies with varied binding affinities and specificities. To precisely quantify the amount of proteins in the sample, mass spectrometry can be applied first, to determine the absolute copy number of the proteins in standard cells. Then, these standard cells can be analyzed together with the sample of interest using our approach. By calibrating the generated results with the standard cells, we can calculate the exact copy number of the protein target in the sample.

The multiplexing capacity of this approach depends on two factors: the number of reiterative analysis cycles and the number of proteins quantified in each cycle. We have shown previously that protein antigenicity is preserved after incubation with TCEP for at least 24 h [14], which suggests that more than 40 cycles can be carried out on the same specimen. In each cycle, varied protein targets can be first recognized by primary antibodies labeled with distinct cleavable haptens, such as biotin, fluorescein, TAMRA, and digoxigenin (DIG). Subsequently, streptavidin, anti-fluorescein, anti-TAMRA, and anti-DIG antibodies labeled with different fluorophores can be applied to stain the protein targets and amplify the signals. In this way, at least four proteins can be quantified simultaneously in each cycle. Thus, we anticipate this method has the potential to analyze over 100 protein targets in the same specimen.

In addition to protein profiling, this cleavable layer-by-layer signal amplification approach developed here can also be applied for highly sensitive in-situ DNA [6], RNA [23,24] and metabolic analysis [25]. By combining these applications, integrated in-situ genomics, proteomics and metabolomics analysis can be achieved in the same specimen at the optical resolution. This highly sensitive and multiplexed molecular imaging platform will have wide applications in systems biology and biomedical research.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4409/9/4/852/s1, Figure S1: Fluorescent images obtained by conventional immunofluorescence; Figure S2: Fluorescent image of protein c-erbB-2 stained in the third analysis cycle without signal amplification.

**Author Contributions:** Conceptualization, J.L. and J.G.; methodology, R.L., T.P. and J.G.; software, R.L., and T.P.; validation, R.L., and T.P.; formal analysis, R.L., T.P., D.M., P.D.C. and J.G.; investigation, R.L., T.P., D.M., P.D.C. and J.G.; resources, D.M., P.D.C. and J.G.; data curation, R.L., T.P. and J.G.; writing—original draft preparation, R.L., T.P. and J.G.; writing—review and editing, R.L., T.P., J.L. and J.G.; visualization, R.L., T.P. and J.G.; supervision, J.G.; project administration, J.G.; funding acquisition, J.G.". All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Institute of General Medical Sciences (1R01GM127633), the National Institute of Allergy and Infectious Diseases (R21AI132840), Arizona State University startup funds, and Arizona State University/Mayo Clinic seed grant (ARI-219693).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
