**Potential Involvement of lncRNAs in the Modulation of the Transcriptome Response to Nodavirus Challenge in European Sea Bass (***Dicentrarchus labrax* **L.)**

**Patricia Pereiro 1,2 , Raquel Lama <sup>1</sup> , Rebeca Moreira <sup>1</sup> , Valentina Valenzuela-Muñoz 2, Cristian Gallardo-Escárate 2, Beatriz Novoa <sup>1</sup> and Antonio Figueras 1,\***


Received: 8 June 2020; Accepted: 13 July 2020; Published: 15 July 2020

**Abstract:** Long noncoding RNAs (lncRNAs) are being increasingly recognised as key modulators of various biological mechanisms, including the immune response. Although investigations in teleosts are still lagging behind those conducted in mammals, current research indicates that lncRNAs play a pivotal role in the response of fish to a variety of pathogens. During the last several years, interest in lncRNAs has increased considerably, and a small but notable number of publications have reported the modulation of the lncRNA profile in some fish species after pathogen challenge. This study was the first to identify lncRNAs in the commercial species European sea bass. A total of 12,158 potential lncRNAs were detected in the head kidney and brain. We found that some lncRNAs were not common for both tissues, and these lncRNAs were located near coding genes that are primarily involved in tissue-specific processes, reflecting a degree of cellular specialisation in the synthesis of lncRNAs. Moreover, lncRNA modulation was analysed in both tissues at 24 and 72 h after infection with nodavirus. Enrichment analysis of the neighbouring coding genes of the modulated lncRNAs revealed many terms related to the immune response and viral infectivity but also related to the stress response. An integrated analysis of the lncRNAs and coding genes showed a strong correlation between the expression of the lncRNAs and their flanking coding genes. Our study represents the first systematic identification of lncRNAs in European sea bass and provides evidence regarding the involvement of these lncRNAs in the response to nodavirus.

**Keywords:** RNA-Seq; lncRNAs; *Dicentrarchus labrax*; viral infection; nodavirus; immune response

#### **1. Introduction**

European sea bass (*Dicentrarchus labrax* L.) is a very valuable fish species, especially for Mediterranean countries. This species was the first non-salmonid species to be cultured in Europe, and since the 1990s, sea bass aquaculture has grown exponentially and is currently one of the main cultured fish species in Europe [1]. However, different diseases cause important economic losses and represent a major limiting factor for production. These effects are observed for nervous necrosis virus (NNV), or nodavirus, a member of the family Nodaviridae, genus *Betanodavirus*, which can affect numerous aquatic animals, including a wide variety of marine and freshwater fish species [2]. This icosahedral, nonenveloped, positive-sense single-stranded RNA virus is the causative agent of viral encephalopathy and retinopathy (VER), which is characterised by damage to its target tissues,

the nervous system (e.g., brain, retina and spinal cord) [3]. Among the four nodavirus genotypes, European sea bass seems to be primarily affected by red-spotted grouper nervous necrosis virus (RGNNV), especially during larval and juvenile stages [2,3].

In recent years, some publications have reported the involvement of numerous immune factors in the defence against nodavirus in different tissues from *D. labrax* [4–12]. The development of next-generation sequencing (NGS) technologies also enabled us to analyse the complete transcriptome response after NNV infection both in vitro [13,14] and, more recently, in vivo [15]. Nevertheless, the potential role of noncoding RNAs (ncRNAs) in the modulation of the response to NNV infection in *D. labrax* has not been determined.

The non-coding regions of the genome remained largely unexplored until the last decade. Nevertheless, when it was discovered that the genome is transcribed into many non-coding RNAs (ncRNAs), in addition to the well-known transfer RNAs (tRNAs) or ribosomal RNAs (rRNAs), numerous investigations were conducted in a variety of species in an attempt to identify them and to describe their functions and expression profiles. The long noncoding RNAs (lncRNAs) represent a subset of ncRNAs with a length over 200 nucleotides and transcribed in the same way as mRNA [16]. However, lncRNAs are among least well-understood ncRNAs. This is probably due to the variety of regulatory mechanisms that they could affect [16], but also to their rapid evolution, and consequently, to the absence of recognisable homologs for most of the lncRNAs even in evolutionarily close species [17,18] and to the divergent subcellular location and function of certain conserved lncRNAs [19]. LncRNAs regulate the expression of adjacent genes (*cis*-acting regulation) or genes located at other genomic locations, even on different chromosomes (*trans*-acting) [16]. The gene expression promotion or repression mediated by lncRNAs can be conducted through different mechanisms, including chromatin remodelling, promoter inactivation, transcription interference, activation and/or transport of transcription factors and epigenetic regulation [16]. This wide variety of potential regulatory mechanisms makes it difficult to establish concrete functions for specific lncRNAs. Nevertheless, lncRNAs have been demonstrated to be involved in immune system regulation in mammals [20,21].

In fish, although functional studies remain highly limited, several publications reported the modulation of lncRNAs after different stimuli and/or conditions with a special focus on the immune system [19]. Because lncRNAs can alter the expression of their adjacent genes, the functionality of those lncRNAs identified in fish is usually predicted based on the function of their neighbouring protein-coding genes [22].

In this work, we identified the lncRNA repertoire in the head kidney and brain from European sea bass. We selected these tissues because the head kidney is the main immune and hematopoietic organ in fish and the brain is a target tissue for NNV. Our results showed the existence of tissue-specific lncRNAs, with a specialised role in the maintenance of the basic functions of each tissue, as well as a broad modulation of the lncRNAs after NNV challenge. The functionality of these infection-induced lncRNAs was predicted based on the analysis of their nearby coding genes, revealing a variety of processes in which they could be involved. Because samples were taken at 24 and 72 h post-infection (hpi), we also observed a highly variable lncRNA profile depending on the sampling point, which could be indicative of the fine-tuned gene regulation mediated by these lncRNAs.

#### **2. Materials and Methods**

#### *2.1. Fish and Virus*

Juvenile *D. labrax* (average weight 70 g) were obtained from the Estación de Ciencias Mariñas de Toralla (ECIMAT, Universidad de Vigo, Vigo, Galicia, Spain) facilities. Prior to experiments, fish were acclimatised to the laboratory conditions for 2 weeks and maintained in 500 litres fibreglass tanks with a re-circulating saline-water system (total salinity approximately 35 g/L) with a light-dark cycle of 12:12 h at 20–22 ◦C and fed daily with a commercial diet. Animals were euthanised via MS-222 (Sigma-Aldrich, St. Louis, MO, USA) overdose (500 mg/L). All the experimental procedures were reviewed and approved by the CSIC National Committee on Bioethics under approval number ES360570202001/20/FUN.01/INM06/BNG01.

The nodavirus red-spotted grouper nervous necrosis virus (RGNNV) (strain 475-9/99) was propagated at 25 ◦C in the SSN-1 cell line (ECACC 96082808) cultured in Leibovitz's L-15 medium (Gibco, Carlsbad, CA, USA) supplemented with 2 mM glutamine (Gibco), 2% foetal bovine serum (FBS) (Gibco) and 1% penicillin/streptomycin solution (Invitrogen, Waltham, MA, USA). The virus stock was titrated into 96-well plates according to established protocols [23], and aliquots were stored at −80 ◦C until use.

#### *2.2. Fish Infection and Sampling*

A total of 18 sea bass were intramuscularly injected with 100 μL of a nodavirus suspension (106 TCID50/mL), whereas the same number of fish served as the control and were inoculated with the same volume of medium (L-15 + glutamine + 2% FBS + penicillin/streptomycin). Head kidney and brain samples were taken from nine fish at 24 and 72 h post-challenge. The same quantity of tissue from three animals was pooled, obtaining three biological replicates (three fish/replicate) per tissue at each sampling point.

For this challenge experiment, mortality data and viral replication analysis in head kidney and brain from infected fish were previously provided [15].

#### *2.3. RNA Isolation and High-Throughput Transcriptome Sequencing*

Total RNA from the different samples was extracted using the Maxwell 16 LEV simplyRNA Tissue kit (Promega, Madison, WI, USA) with the automated Maxwell 16 Instrument in accordance with instructions provided by the manufacturer. The quantity and purity of the total RNA was measured in a NanoDrop ND-1000 (NanoDrop Technologies, Inc., Wilmington, DE, USA), and RNA integrity was analysed in the Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA) according to the manufacturer's instructions. All the samples showed an RIN over 8.0; therefore, they were used for Illumina library preparation. Double-stranded cDNA libraries were constructed using the TruSeq RNA Sample Preparation Kit v2 (Illumina, San Diego, CA, USA), and sequencing was performed using Illumina HiSeq™ 4000 technology at Macrogen Inc. Korea (Seoul, Republic of Korea). The read sequences were deposited in the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) under the accession number PRJNA589774.

#### *2.4. Sequence Assembly and LncRNA Mining*

CLC Genomics Workbench, v. 11.0.2 (CLC Bio, Aarhus, Denmark) was used to filter, assemble and perform the RNA-Seq and statistical analyses. Raw reads were trimmed to remove adaptor sequences and low-quality reads (quality score limit 0.05). A reference transcriptome was constructed by de novo assembly using all the samples with an overlap criterion of 70% and a similarity of 0.9 to exclude paralogous sequence variants. The settings used were a mismatch cost = 2, deletion cost = 3, insert cost = 3 and minimum contig length = 200 base pairs. From the de novo assembly, the selection of the lncRNAs was conducted following the pipeline previously described [24] but with some modifications. Briefly, the contigs were annotated using Blastx (e-value <sup>&</sup>lt; <sup>1</sup> <sup>×</sup> <sup>10</sup><sup>−</sup>5) against the proteins for all the bony fish species included in the NCBI GenBank and UniProt/Swiss-Prot databases. All the annotated contigs were deleted from the assembly, as well as all the contigs with an average coverage <50. The potential open reading frames (ORFs) were predicted for the remaining contigs, and those with a potential ORF >200 bp were discarded. After this step, the Coding Potential Assessment Tool (CPAT) [25] was used to discard sequences with coding potential. The contigs that passed all the filters were considered putative lncRNAs and retained for further analyses.

#### *2.5. Di*ff*erential Expression Analysis*

RNA-Seq analyses of the putative lncRNAs were conducted with the following settings: mismatches = 2, length fraction = 0.8, similarity fraction = 0.8. The expression values were set as transcripts per million (TPM). To identify and quantify the directions of variability in the data, a principal component analysis (PCA) plot was constructed by using the original expression values. Finally, Baggerley's statistical analysis test was used to compare gene expression levels and to identify differentially expressed lncRNAs. Transcripts with absolute fold change (FC) values >2 and false discovery rate (FDR) <0.05 were retained for further analyses. Those significantly modulated lncRNAs were selected, and the average TMP value of the three biological replicates is represented in heat maps. To this end, the distance metric was calculated by Pearson's method, and lncRNAs were hierarchically clustered by average linkage. Venn diagrams were constructed with the Venny 2.1 tool (http://bioinfogp.cnb.csic.es/tools/venny/).

#### *2.6. Genome Mapping and Identification of LncRNA Neighbouring Coding Genes*

To position the putative lncRNAs on the genome, the *D. labrax* genome [26], composed of 25 scaffolds, was uploaded into the CLC Genomics Workbench. These scaffolds were constructed based on linkage/radiation hybrid groups (LG1A - LGx), in which approximately 83% of the contigs were located [26]. The remaining unanchored scaffolds were concatenated into a virtual chromosome ('UN') [26].

LncRNAs were mapped using the following parameters: length fraction = 0.8, similarity fraction = 0.8, mismatch cost = 2, insertion cost = 3 and deletion cost = 3. The correlation between the chromosome length and the number of mapped lncRNAs per chromosome was calculated with IBM SPSS Statistics V25.0 by using Pearson's correlation coefficient (r). The coding genes flanking up to 10 kb upstream and downstream from the mapped and differentially expressed lncRNAs for each comparison were identified and selected for Gene Ontology (GO).

#### *2.7. GO Enrichment Analyses*

For the significantly differentially expressed lncRNAs in each comparison (Brain 24 h infected vs. Brain 24 h control; Brain 72 h infected vs. Brain 72 h control; HK 24 h infected vs. HK 24 h control; HK 72 h infected vs. HK 72 h control), the lncRNAs-neighbouring coding-genes were extracted for GO enrichment analyses using Blast2GO software [27]. For the GO enrichments, Fisher's exact test was run with default settings, a p-value cut-off of 0.01 was applied, and the enriched list was reduced to the most specific GO terms. The most enriched biological processes among the protein coding genes proximal to lncRNAs were represented. When the number of significantly enriched GO terms exceeded 30, only the 30 most significant terms were represented.

#### *2.8. Correlation Analyses Between LncRNAs and Coding Genes*

Correlations in the expression of lncRNAs and their neighbouring genes were calculated by using Pearson's correlation coefficient and Spearman's rank correlation coefficient with IBM SPSS Statistics V25.0. Correlations were considered to be significant when p-values were <0.01. To illustrate the expression correlations, heat maps were illustrated with the free software Heatmapper [28] and the average TPM values of the different experimental conditions and sampling points were normalised by row. The protein–protein interaction networks were constructed with String 11.0 (https://string-db.org) [29].

#### *2.9. Quantitative PCR (qPCR) Validation of LncRNA Expression*

cDNA synthesis of the samples was conducted with the NZY First-Strand cDNA Synthesis kit (NZYTech, Lisbon, Portugal) using 0.5 μg of total RNA. A total of 12 lncRNAs were used to validate the RNA-Seq results. Specific qPCR primers for the selected lncRNAs were designed using Primer 3

software [30], and their amplification efficiency was calculated with the threshold cycle (CT) slope method [31]. Primer sequences are listed in Supplementary Table S1. Individual qPCR reactions were carried out in a 25-μL reaction volume containing 12.5 μL of SYBR GREEN PCR Master Mix (Applied Biosystems, Foster City, CA, USA), 10.5 μL of ultrapure water, 0.5 μL of each specific primer (10 μM) and 1 μL of two-fold diluted cDNA template in MicroAmp optical 96-well reaction plates (Applied Biosystems). Reactions were conducted using technical triplicates in a 7300 Real-Time PCR System thermocycler (Applied Biosystems). qPCR conditions consisted of an initial denaturation step (95 ◦C, 10 min) followed by 40 cycles of a denaturation step (95 ◦C, 15 s) and one hybridization-elongation step (60 ◦C, 1 min). The relative expression level of the lncRNAs was normalised following the Pfaffl method [31] and using *18S ribosomal RNA* (*18S*) as a reference gene.

#### **3. Results**

#### *3.1. Assembly, Annotation and LncRNA Mining*

A summary of the reads, assembly data, contig annotation and lncRNA information is included in Table 1. Approximately 680 million reads were obtained from the different samples of European sea bass with an average of 28 million per sample, and over 99% of raw reads met the quality control standards. From the de novo assembly, a total of 347,317 contigs were obtained with an average length of 723 bp. A total of 69% of the contigs (240,274) were successfully annotated against the bony fish sequences. From the non-annotated contigs (107,043), a total of 12,158 passed all the filters to be considered potential lncRNAs (Supplementary Table S2). The length of these selected contigs ranged from 200 to 6829 bp with an average length of 667 bp (Figure 1a). The GC content (%) of the predicted lncRNAs ranged from 11 to 80% with an average value of 38% (Figure 1b).


**Table 1.** Summary of the Illumina sequencing, assembly, annotation and lncRNA identification.

From the 12,158 putative lncRNAs, 95.96% (11,667) were successfully mapped to the *D. labrax* genome (Figure 1c). The scaffold with a higher number of lncRNAs was the virtual chromosome 'UN' (2004 lncRNAs). This finding is most likely due to the greater length of this virtual chromosome compared to the true ones. Indeed, a strong positive correlation (Pearson's r = 0.99) existed between the chromosome length and the number of predicted lncRNAs per chromosome (Figure 1d), reflecting that there is no enrichment in lncRNA abundance in any particular chromosome.

**Figure 1.** Features of predicted lncRNAs in *D. labrax*. (**a**) Guanine-cytosine (GC) content and (**b**) length distribution of the 12158 predicted lncRNAs. (**c**) LncRNA abundance and localisation per chromosome. (**d**) Correlation between chromosome length and lncRNA abundance.

#### *3.2. Tissue Distribution of the LncRNAs*

To identify lncRNAs expressed in head kidney but not in brain and vice versa, only those lncRNAs with a TPM value of 0 in all the samples from the same tissue (12 samples) were considered absent in the corresponding tissue. A Venn diagram of the lncRNAs detected in the head kidney and brain was constructed (Figure 2a). While 30 lncRNAs did not show a TPM value in any of the samples both in head kidney and brain, most of the lncRNAs were detected in both tissues (10823). A total of 971 lncRNAs were expressed in the brain but not in the head kidney, whereas 334 lncRNAs were only found in the head kidney.

As expected, GO enrichment analysis of the lncRNAs expressed only in one of the tissues revealed biological processes directly related to specific aspects of the functionality of each organ (Figure 2b,c). In the head kidney, numerous immune terms were enriched, which were mainly related to the production of cytokines, Toll-like receptor signalling, inflammation, leukocyte activation and proliferation, complement pathway and phagocytosis (Figure 2b). On the other hand, neighbouring genes of those lncRNAs expressed only in the brain showed enrichment in terms directly related to the nervous system (Figure 2c).

To illustrate the relevance of the tissue and of the infection in the lncRNA profile, a PCA score plot was constructed (Figure S1). Interestingly, all the head kidney samples clustered separately from the brain samples, reflecting the higher weight of the tissue compared to the nodavirus infection.

**Figure 2.** LncRNAs identified per tissue. (**a**) Venn diagram reflecting the number of lncRNAs with a transcripts per million (TPM) value in at least one of the samples from each tissue. Most of the predicted lncRNAs were detected in both the head kidney and brain. In contrast, a total of 30 predicted lncRNAs obtained a TPM value of 0 in all the samples. (**b**,**c**) The neighbouring coding genes of the lncRNAs expressed in the head kidney but not in the brain (**b**) and vice versa (**c**) were analysed by GO enrichment analyses (biological processes). Only the 30 most significant terms were represented.

#### *3.3. Di*ff*erential Expression of LncRNAs after Nodavirus Challenge*

The differential expression analysis (FC > 2, FDR < 0.05) for each tissue and sampling point is provided in Table S3. A total of 204 and 93 lncRNAs were significantly modulated in the head kidney at 24 and 72 h after nodavirus infection, respectively (Figure 3a). In the brain, 931 and 342 lncRNAs were affected at these sampling points (Figure 3a). In both tissues, the number of upregulated lncRNAs was higher than that of downregulated lncRNAs at 24 h post-challenge, but these differences were substantially reduced after 72 h.

**Figure 3.** Temporal expression of the predicted lncRNAs after nodavirus infection in head kidney and brain. (**a**) Number of lncRNAs up- and downregulated in each tissue at 24 and 72 hpi with nervous necrosis virus (NNV). (**b**) Venn diagram representing the shared and unshared lncRNAs modulated after NNV challenge in both tissues at the different sampling points.

A Venn diagram was constructed to illustrate the shared and unshared lncRNAs modulated in both tissues at different days post-challenge (Figure 3b). Most of the differentially expressed lncRNAs were only modulated in one of the tissues and their expression pattern also completely changed in the same tissue depending on the day after infection (Figure 3b).

Indeed, by analysing the lncRNAs affected by the infection per tissue, we observed that in the head kidney, only 5 lncRNAs were commonly modulated at both 24 and 72 h post-challenge, representing 1.7% of the total lncRNAs affected by the infection in this tissue (Figure 4a). In the brain, the number of shared lncRNAs between both sampling points was 61, representing 5% of the total (Figure 4b). This almost complete switch in the lncRNA profile over time is also reflected in the corresponding heat maps (Figure 4a,b). Interestingly, in both tissues, the pattern of the analysed lncRNAs showed a notably different profile between the control at 24 h and at 72 h (Figure 4a,b). This highlights the importance of including the corresponding controls for each sampling point, especially in the case of

a very stressful manipulation for the challenge (anaesthesia and intramuscular injection). However, the absence of a time 0 control did not allow to determine the basal lncRNA expression profile in the analysed tissues, which could be also an interesting question. In general terms, for both HK and the brain, four main clusters of lncRNAs were observed. For HK, cluster 1 included lncRNAs mainly downregulated at 72 hpi, cluster 2 grouped lncRNAs induced at 24 h by nodavirus, cluster 3 included lncRNAs induced at both 24 and 72 h after infection and cluster 4 included lncRNAs downregulated at both sampling points (Figure 4a). In the brain, cluster 1 was mainly composed of lncRNAs overexpressed at 72 h post-challenge, cluster 2 included those lncRNAs induced both at 24 and 72 h, cluster 3 included lncRNAs highly expressed in controls at 72 h and finally, cluster 4 included the lncRNAs highly represented in the control at 24 h (Figure 4b).

**Figure 4.** Modulation of lncRNAs in (**a**) head kidney and (**b**) brain after nodavirus challenge. Venn diagrams represent the number of shared and unshared lncRNAs significantly modulated at each sampling point (FC > 2, FDR < 0.05). Heat maps of the lncRNAs significantly affected by the infection in each tissue and hierarchical clustering of the different samples constructed with the TPM values.

#### *3.4. GO Enrichment of the LncRNA Neighbouring Coding Genes*

The coding genes flanking the differentially expressed lncRNAs were extracted (Supplementary Table S4) for GO enrichment analysis. The 30 most significantly enriched biological processes are represented in Figures 5 and 6 for head kidney and brain samples, respectively.

For head kidney samples (Figure 5), numerous biological process terms directly involved in immunity were found to be enriched at 24 h post-challenge ('negative regulation of mast cell activation', 'positive regulation of immunoglobulin production', 'positive regulation of B cell proliferation', 'antigen processing and presentation of exogenous peptide antigen via MHC class II' and 'neutrophil mediated immunity'). Nevertheless, this immune representation seemed to be diluted at 72 h, although the immune-related term 'leukotriene production involved in inflammatory response' also appeared to be enriched. In this tissue, some biological process terms suggesting DNA damage and cell cycle arrest were observed both at 24 h ('signal transduction involved in G2 DNA damage checkpoint,' 'signal transduction involved in mitotic DNA damage checkpoint') and 72 h post-challenge ('DNA

damage induced protein phosphorylation,' 'cell cycle arrest'). At this sampling point, the term 'positive regulation of endoplasmic reticulum stress-induced intrinsic apoptotic signalling pathway' was also significantly represented.

**Figure 5.** GO enrichment analysis (biological processes) of the neighbouring coding genes of the differentially modulated lncRNAs (FC > 2, FDR < 0.05) in head kidney after viral challenge. Only the 30 most significant terms were represented.

**Figure 6.** GO enrichment analysis (biological processes) of the neighbouring coding genes of the differentially modulated lncRNAs (FC > 2, FDR < 0.05) in the brain after viral challenge. Only the 30 most significant terms were represented.

In brain samples, the enriched terms were almost completely different from those observed in head kidney (Figure 6). In this case, although a biological process immune term was also found at 24 h ('positive regulation of isotype switching to IgA isotypes'), the representation of immune processes was lower compared to head kidney. As also occurred in the head kidney, some DNA damage terms were also observed ('nucleotide-excision repair, DNA gap filling', 'nucleotide-excision repair, pre-incision complex assembly', 'regulation of double-strand break repair via homologous recombination'), as well as numerous de-ubiquitination-related processes. Moreover, two biological processes related to calcium homeostasis were also represented ('induction of synaptic vesicle exocytosis by positive regulation

of presynaptic cytosolic calcium ion concentration' and 'positive regulation of high voltage-gated calcium channel activity'). After 72 h, the representation of immune terms increased in this tissue, with the biological process enriched terms 'regulation of toll-like receptor 9 signalling pathway', 'negative regulation of antigen receptor-mediated signalling pathway' and 'B cell differentiation'. Two stress-related biological processes were represented ('positive regulation of translation in response to stress' and 'response to isolation stress'), and as was observed in head kidney, an endoplasmic reticulum (ER) stress term ('regulation of response to endoplasmic reticulum stress').

The representation of immune biological processes in both tissues could be directly related to the NNV replication level, since the detection of the virus was higher in the head kidney at 24 hpi compared to the brain, but it remained practically unaltered after 72 h [15]. On the other hand, the NNV detection enormously increased in the brain at 72 hpi [15], which could explain the higher representation of immune-related terms at this sampling point in the brain.

#### *3.5. Expression Correlation of LncRNAs and Coding Genes*

To compare the magnitude of the modulation after nodavirus infection, we constructed scatter dot plots with the fold-change values of the significantly differentially expressed genes (DEGs) and the lncRNAs in the different samples (FC > 2, FDR > 0.05) (Figure S2). For the DEGs, only those that were successfully annotated were included [15]. In general terms, the fold-change variations of the DEGs were considerably more pronounced compared to those observed for the lncRNAs (Figure S2).

To correlate the expression of the lncRNAs and their flanking coding genes, we randomly selected four DEGs after nodavirus infection [15] with at least one adjacent lncRNA significantly modulated after the viral challenge. The *complement component C3* (*c3*), *NK-tumour recognition protein* (*nktr*), *cerebellin 1* (*cbln1*) and *beta-1,4-galactosyltransferase 1* (*b4galt1*), which were overexpressed in the brain after infection, were represented together with all the predicted lncRNAs flanking and/or overlapping those genes. The TPM values were used to construct the heat maps (Figure 7a) and to calculate the correlation values (Figure 7b). Only one potential lncRNA (Lnc\_contig0102476) was found near the *c3* gene, which was significantly upregulated (FC = 3.66) in the brain at 24 hpi. The expression of *c3* and the corresponding lncRNA were strongly correlated, since both were overexpressed in the brain after viral challenge (Figure 7a,b). For *nktr*, four lncRNAs were predicted to be located in the vicinity of the gene (Lnc\_contig0004499, Lnc\_contig0006737, Lnc\_contig0002850, Lnc\_contig0029684), and three of them were significantly upregulated in the brain at 24 and/or 72 h after infection. In this case, all the lncRNAs showed the same tendency as that observed for the *nktr* gene (Figure 7a,b). In the case of *cbln1*, four neighbouring lncRNAs were also identified (Lnc\_contig0006952, Lnc\_contig0221449, Lnc\_contig0041997, Lnc\_contig0114726), and whereas three of them showed the same pattern as cbln1, the expression of Lnc\_contig0221449 was inversely correlated with gene expression (Figure 7a,b). Finally, the lncRNA predicted to be positioned near *b4galt1* (Lnc\_contig0036634) was significantly overexpressed in the brain both at 24 and 72 h after NNV infection, and a notably high correlation with gene expression was observed (Figure 7a,b).

**Figure 7.** Correlation between differentially modulated coding genes after NNV infection and their flanking lncRNAs. (**a**) Heat map representing the TPM values of four genes and their neighbouring lncRNAs across the different head kidney and brain samples. Expression levels are represented as row-normalised values on a blue–yellow colour scale. (**b**) Correlation values (Pearson's correlation coefficient and Spearman's rank correlation coefficient) between the lncRNAs and their nearby coding genes. \* represents significant correlation and ns non-significant correlation.

#### *3.6. LncRNAs Could Mediate Calcium Homeostasis*

In a previous analysis of the coding transcriptome, we found a strong regulation of genes involved in calcium homeostasis in the brain, and the concentration of this cation seems to be highly altered during NNV infection and is crucial for infectivity [15]. In this work, the biological process terms 'induction of synaptic vesicle exocytosis by positive regulation of presynaptic cytosolic calcium ion concentration' and 'positive regulation of high voltage-gated calcium channel activity' were also enriched in the brain at 24 hpi (Figure 6), as well as other calcium-related terms not included in the 30 most significantly enriched GO terms but significantly enriched at 24 h and/or at 72 h post-challenge. To illustrate the connections among the different lncRNA neighbouring coding genes involved in calcium homeostasis, a protein–protein network interaction was constructed only for the neighbouring genes of those lncRNAs significantly modulated after viral infection (Figure 8a). Most of the proteins encoded

by these genes were interconnected, and the main function of these genes is to directly mediate calcium transport across the cellular, mitochondrial and ER membranes. Among the molecules represented, *sarcoplasmic endoplasmic reticulum calcium atpase 1-like* (*atp2a1* or *serca1*) was the most modulated gene in the coding-transcriptome analysis [15]. On the other hand, the *regulator of g-protein signalling 6-like* (*rgs6*) was slightly modulated [15]. To exemplify the potential involvement of lncRNAs in the modulation of calcium homeostasis-related genes, these two genes were selected, and their expression was correlated to that observed for their neighbouring lncRNAs (Figure 8b). Positive or negative correlations between each coding gene and the corresponding lncRNAs were observed, and these correlations were significant for the *atp21a* gene and the lncRNA Lnc\_contig0024843 and for the *rgs6* gene and Lnc\_contig0052951. The three lncRNAs were included in the validation of the RNA-Seq results by qPCR.

**Figure 8.** Calcium homeostasis-related genes and their relationship with lncRNAs. (**a**) Protein–protein interaction network of the lncRNAs neighbouring coding genes involved in calcium homeostasis. Numerous lncRNAs significantly modulated in the brain after nodavirus infection were flanked by genes directly related to calcium homeostasis. (**b**) Gene representation of two genes mediating cellular calcium concentration after NNV challenge and genomic location of their neighbouring lncRNAs. Heap maps represent the TPM values of the different contigs. Expression levels are represented as row-normalised values on a blue–yellow colour scale. \* represents significant correlation and ns non-significant correlation.

#### *3.7. qPCR Validation of LncRNA Expression*

RNA-Seq results were validated by qPCR by analysing the expression of 12 putative lncRNAs significantly modulated in the RNA-Seq results (10 for brain and 2 for head kidney). Although in general terms the magnitude of the modulation was higher in the RNA-Seq data, the qPCR fold-changes followed the same tendency (Figure S3). Indeed, a positive correlation was obtained between the RNA-Seq and qPCR fold-change values by using Pearson's correlation coefficient (r = 0.84).

#### **4. Discussion**

In mammals, lncRNAs have been linked to a variety of immune processes, including inflammation [32,33], T cell development, differentiation and migration [34], B cell maturation [35], interferon response [36], dendritic cell differentiation [37] and granulocyte maturation [38]. The lncRNA profile has been shown to be modulated after different viral infections in several species [39–42]. Moreover, for several lncRNAs, their specific function in viral infection has been elucidated [43].

Massive analysis of zebrafish (*Danio rerio*) lncRNAs enabled the identification of numerous conserved zebrafish lncRNAs with a putative orthologue in mammals [44], which could be indicative of functional conservation. This analysis of the lncRNA repertoire in zebrafish has derived in the creation of a refined database [45], which could help to facilitate the functional analyses in this model species. However, to the best of our knowledge, only one publication has reported the specific immune function of a lncRNA in fish. In that work, the authors found that the PU.1 gene, which is involved in the expression of adaptive immune genes, is negatively regulated by its antisense lncRNA [46]. In the case of zebrafish, the existence of numerous mutant lines could facilitate the identification of lncRNAs related to specific functions. This finding was observed for the heterozygous *rag1*+/<sup>−</sup> zebrafish, which is partially deficient in the Rag1 protein, an endonuclease involved in the assembly of immunoglobulins and T cell receptor (TCR) genes [47]. When wild-type and *rag1*+/<sup>−</sup> zebrafish were infected with the spring viremia of carp virus (SVCV), those animals partially deficient in Rag1 showed a modulation of lncRNAs surrounded by genes involved in adaptive immunity, which was not observed in wild-type fish [48]. A plausible explanation is that those animals deficient in adaptive mechanisms potentiated this response to combat the infection compared to the full competent animals [48].

In the past several years, some publications have reported the identification and modulation of lncRNAs in aquacultured fish against a variety of pathogens, especially in salmonids [21,49–52]. However, to date, no lncRNA analyses have been conducted in European sea bass. We first investigated the modulation of the coding transcripts in head kidney and brain from *D. labrax* infected intramuscularly with nodavirus. In that work, we observed a strong modulation of the stress axis during infection with the virus but a slight regulation of immune-related genes [15].

From the Illumina sequencing of the *D. labrax* transcriptome and de novo assembly [15], we selected those contigs that passed all the filters to be considered potential lncRNAs. We obtained a contig list of 12,158 lncRNAs, and 95.96% of them mapped to the genome. Analysis of the lncRNA abundance per chromosome revealed a homogeneous distribution with a strong positive correlation between the number of putative lncRNAs and chromosome length.

RNA-Seq analysis of these contigs in the different samples showed that some lncRNAs were tissue-specific. This tissue specificity was previously reported for different vertebrate species [53,54], including zebrafish [55] and other teleost species [49]. Most lncRNAs influence the expression of their nearby genes, acting as local regulators; therefore, lncRNA expression is often correlated with the expression of their adjacent coding genes [16]. These tissue-specific lncRNAs were flanked by protein-coding genes involved in biological processes closely linked to the functionality of that tissue. Because the head kidney is a major haematopoietic and immune tissue in fish [56], numerous immune-related GO terms were only enriched in this organ. In the brain, almost all the enriched GO terms were directly related to the maintenance of neuronal functions and homeostasis, neurotransmitters and behaviour.

Nodavirus infection significantly modulated the expression of different lncRNAs in both the head kidney and brain. The brain showed a higher number of modulated lncRNAs (931 and 342 at 24 and 72 h, respectively) compared to the head kidney (204 and 93), probably because of the neurotropic nature of the virus. As was also reported for *S. salar* after infectious salmon anaemia virus (ISAV) infection [49] or *Caligus rogercresseyi* infestation [52], lncRNAs were expressed in a temporal-specific manner with a very low percentage of shared lncRNAs between 24 h and 72 h post-challenge in both tissues. Therefore, the lncRNA transcriptome profile changes as the disease progresses with high spatiotemporal specificity.

GO enrichment analyses of the neighbouring coding genes of lncRNAs modulated after NNV challenge in the head kidney revealed a large number of immune-related terms, especially after 24 h. These terms were mainly related to the activation and proliferation of immune cells, antigen presentation, immunoglobulin production and inflammation. However, the analysis of the transcriptome revealed a practically absent immune response in this tissue, and the modulated genes were mainly related to cortisol synthesis and reactive oxygen species (ROS) production [15]. It is worth mentioning that different DNA damage-, cell cycle- and ER stress-related terms were also enriched in this tissue. It is known that oxidative stress caused by ROS production can induce DNA damage and ER stress, which could lead to cell cycle arrest [57–59]. Nevertheless, viruses can manipulate DNA repair pathways and cell cycle control mechanisms to facilitate their own replication [60,61]. On the other hand, ER stress is also intimately related to virus activity. Viruses hijack the host translation machinery to massively produce viral proteins, affecting ER homeostasis and causing ER stress [62]. Moreover, ER stress also induces the production of ROS [63], and ROS are an important component of the immune defence because they can kill pathogens directly by causing oxidative damage or modulating different immune mechanisms [64]. It has been previously described that NNV can induce ROS in European sea bass [15] and oxidative stress-mediated cell death in fish cells [65]. Because ROS are also harmful to the host, the activity of antioxidant molecules, such as glutathione, is also needed to control the damage. The term 'positive regulation of glutathione biosynthetic process' was also enriched among those genes flanking the lncRNAs enriched in head kidney at 72 hpi. A previous analysis of the coding genes differentially modulated in sea bass head kidney after NNV infection revealed enrichment in the oxidation-reduction process [15], which could indicate that these are mechanisms controlled by lncRNAs.

In the brain, we found that those genes located at less than 10 kb from differentially modulated lncRNAs after NNV challenge were also enriched in immune terms both at 24 and 72 hpi. At 24 h, only one immune term was enriched in this tissue ('positive regulation of isotype switching to IgA isotypes'), which was related to antibody production. It has been described in teleosts that host antibody production is an important response to nodavirus infection even at early stages of infection [11,66–68], and the expression level of these immunoglobulins could be related to a higher resistance to the virus [69]. Indeed, antibody production is one of the main immune mechanisms protecting the brain against neurotropic viruses [70]. However, in the transcriptome analyses, the production of antibodies seemed to be downregulated at this early sampling point [15]. According to this finding, although more investigations are needed, inhibition of antibody production could be regulated by lncRNAs.

We previously observed that the main response induced by NNV in European sea bass at these early sampling points is the activation of the hypothalamic-pituitary-interrenal (HPI) axis, which is the stress response [15]. According to this study, some stress-related terms were also enriched in the brain for the neighbouring genes of the lncRNAs modulated at 72 hpi. The activation of the stress axis observed in these same samples could lead to calcium influx into the neurons, generating excitotoxicity, and as a consequence, neuronal damage [71,72]. Indeed, calcium cellular homeostasis was found to be highly altered in the brain during NNV infection [15]. For the neighbouring genes of the lncRNAs significantly modulated in this tissue, different calcium-related biological process terms were also enriched, and as occurred in head kidney, we also found lncRNAs flanked by genes related to the DNA damage response, which could be a consequence of excitotoxicity. Although calcium homeostasis

modulations in the brain could be a direct consequence of the stress response activated after NNV challenge, viruses can also perturb it and utilise calcium and calcium-related proteins to benefit their own replication [73]. In fact, nodaviruses require the incorporation of calcium ions into the viral capsid for a correct assembly process and integrity [74,75]. We observed that RGNNV, the genotype infecting European sea bass, needs calcium to replicate correctly [15].

Although the number of annotated DEGs in the different tissues at 24 h and 72 hpi was lower compared to the number of differentially modulated lncRNAs, in general terms, the magnitude of these modulations was higher for the DEGs. This observation could suggest that small variations in lncRNA levels can have a high impact on mRNA expression. Moreover, several lncRNAs can simultaneously affect the expression of the same coding gene, and consequently, their effect can be additive. To link the coding-lncRNA response, we randomly selected four genes that were significantly modulated in the brain after viral challenge and flanked by at least one lncRNA that was also significantly affected by the infection. We observed a strong correlation between the expression of the coding genes and all the closely positioned lncRNAs. Because lncRNAs can activate or repress gene expression [16], negative transcriptional correlations can also be observed. Because of the high modulation of calcium homeostasis-related coding genes and lncRNAs closely located to these genes, we suggest that the expression of these genes is likely affected by lncRNAs.

#### **5. Conclusions**

Taking all these observations into account, we can conclude that there exists a high parallelism between the protein coding genes modulated by NNV challenge and the genes located near lncRNAs affected by infection. Therefore, lncRNAs seem to strongly contribute to the response against nodavirus. Further functional studies between significantly correlated lncRNAs and coding genes will help to elucidate the mechanism of the interactions between lncRNAs and immune genes induced after NNV infection in European sea bass.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2079-7737/9/7/165/s1. Table S1: Primer pairs used to validate the lncRNA differential expression analysis, Table S2: LncRNA contig list, Table S3: LncRNA differential expression analyses in head kidney and brain at 24 and 72 h post-challenge with NNV, Table S4: Predicted neighbouring coding genes of the potential sea bass lncRNAs, Figure S1: Principal component analysis (PCA) of the lncRNAs in the different samples, Figure S2: Magnitude of transcriptome modulation in the head kidney and brain after NNV infection, Figure S3: Validation of differentially expressed lncRNAs through qPCR.

**Author Contributions:** Conceptualisation, P.P., B.N. and A.F.; Methodology, P.P., R.L., R.M. and A.F.; Software, P.P., R.L., R.M., V.V.-M., C.G.-E. and A.F.; Validation, R.L.; Formal analysis, P.P. and A.F.; Investigation, P.P. and R.L.; Data curation, P.P.; Writing—original draft preparation, P.P.; Writing—review and editing, P.P., R.L., B.N. and A.F.; Supervision, B.N. and A.F.; Funding acquisition, B.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was financially supported by the European Union through the funding programme Horizon 2020 (Performfish -727610) and Ministerio de Economía, Industria y Competitividad of Spain (BIO2017-82851-C3-2-R). Patricia Pereiro and Raquel Lama wish to thank the Axencia Galega de Innovación (GAIN, Xunta de Galicia) for their postdoctoral (IN606B-2018/010) and predoctoral contracts (IN606A-2017/011), respectively. Our laboratory is funded by EU Feder Programa Interreg España-Portugal 0474\_BLUEBIOLAB and IN607B 2019/01 from Consellería de Economía, Emprego e Industria (GAIN), Xunta de Galicia.

**Acknowledgments:** We thank the IIM-CSIC aquarium staff for their technical assistance.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

**Pedro Perdiguero, Esther Morel and Carolina Tafalla \***

Fish Immunology and Pathology Group, Animal Health Research Center (CISA-INIA), Valdeolmos, 28130 Madrid, Spain; perdiguero.pedro@inia.es (P.P.); morel.esther@inia.es (E.M.)

**\*** Correspondence: tafalla@inia.es; Tel.: +34-916202300

**Simple Summary:** Although evolutionarily jawed fish constitute the first group of animals in which a complete adaptive immune system based on immunoglobulins (Igs) is present, many structural immune differences between fish and mammals predict important functional and phenotypical differences between B cells in these two animal groups. However, to date, very few tools are available to study B cell heterogeneity and functionality in fish. Hence, thus far, antibodies targeting the different Igs have been almost exclusively applied as tools to investigate B cell functionality in fish. In the current study, we used the newly developed 10× Genomics single cell RNA sequencing technology and used it to analyze the transcriptional pattern of single B cells from peripheral blood. The results obtained provide us with a transcriptional profile at single cell level of what seem to correspond to different B cell subsets or B cells in different stages of maturation or differentiation. The information provided will not only help us understand the biology of teleost B cells, but also provides us with a repertoire of potential markers that could be used in the future to differentiate trout B cell subsets.

**Abstract:** Single-cell sequencing technologies capable of providing us with immune information from dozens to thousands of individual cells simultaneously have revolutionized the field of immunology these past years. However, to date, most of these novel technologies have not been broadly applied to non-model organisms such as teleost fish. In this study, we used the 10× Genomics single cell RNA sequencing technology and used it to analyze for the first time in teleost fish the transcriptional pattern of single B cells from peripheral blood. The analysis of the data obtained in rainbow trout revealed ten distinct cell clusters that seem to be associated with different subsets and/or maturation/differentiation stages of circulating B cells. The potential characteristics and functions of these different B cell subpopulations are discussed on the basis of their transcriptomic profile. The results obtained provide us with valuable information to understand the biology of teleost B cells and offer us a repertoire of potential markers that could be used in the future to differentiate trout B cell subsets.

**Keywords:** teleost; B cells; single cell transcriptomics; immunoglobulins; immune markers; transcription factors; long non-coding RNAs

#### **1. Introduction**

To date, most studies have agreed on the fact that adaptive immunity appeared during the early stages of vertebrate evolution, most likely in the disappeared placoderms [1]. Accordingly, the genes that define the adaptive immune system such as immunoglobulins (Igs), T cell receptors (TCR), major histocompatibility complex I (MHC I), MHC II, recombination activating gene 1 (RAG1), and RAG2 are present in gnathostomes (jawed vertebrates) including cartilaginous fish such as sharks (the most ancient jawed fish) and teleost fish. While the basic components of adaptive immunity are present, it must be taken into account that the adaptive branch of the immune system continued to evolve

**Citation:** Perdiguero, P.; Morel, E.; Tafalla, C. Diversity of Rainbow Trout Blood B Cells Revealed by Single Cell RNA Sequencing. *Biology* **2021**, *10*, 511. https://doi.org/10.3390/ biology10060511

Academic Editor: Patricia Pereiro

Received: 28 April 2021 Accepted: 6 June 2021 Published: 9 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

in tetrapods, reaching further degrees of specialization and sophistication in mammals. Consequently, there are important differences between the mammalian and teleost adaptive immune system that significantly condition the phenotype and functionality of B cells and how they respond to an antigen encounter. For instance, given the lack of bone marrow, the head kidney is the main hematopoietic organ in teleosts, and the main site for B cell differentiation. Similarly, fish do not have lymph nodes, being the spleen the main secondary immune organ. Within the spleen, the organization of lymphocytes is very primitive, with scattered B and T cells and no clearly defined regions as those found in mammals [2,3]. Thus, no cognate germinal centers (GCs) have ever been identified in the teleost spleen. GCs, formed in mammals during the immune response, promote the close collaboration between proliferating antigen-specific B cells, T follicular helper cells, and specialized follicular dendritic cells (DCs). In this environment, B cells divide in response to antigens and acquire the capacity to differentiate into antibody-secreting cells (ASCs), reaching a terminal state of plasma cells or memory B cells, both having the capacity to secrete high affinity antibodies. It is in these sites, that B cells undergo class switch recombination (CSR) and replace the heavy chain of IgM for IgG, IgA or IgE, antibodies with higher affinity and different effector functions. In the absence of GCs or specialized Igs, whether teleost B cells differentiate to plasma cells or memory B cells equivalent to those found in mammals is still a matter of debate.

Fish also have a more limited Ig array. Fish genomes encode only three classes of Igs, namely IgM, IgD, and the fish specific IgT/Z [4,5]. Thus, in contrast to mammals, no CSR has ever been reported in fish. IgM and IgD are co-expressed on the surface of naïve B cells. In this case, alternative splicing between the recombined variable region and constant regions of IgM and IgD render cells that co-express IgM and IgD of the same specificity. Upon activation, naïve B cells lose surface IgD to become IgM+IgD<sup>−</sup> B cells with a plasmablast profile [6,7]. Moreover, through some still unknown mechanisms, certain cells lose surface IgM and become IgD+IgM<sup>−</sup> B cells. These cells have been identified in catfish blood [8] as well as in rainbow trout gills [9] and gut [10] and have yet unknown functions. Finally, fish B cells expressing IgT cells constitute an independent B cell linage in which IgM and IgD are not expressed [11]. Most teleost species express more than one IgT, three in the case of salmonids [12], although whether individual cells express one or more IgT is still not clear.

To date, antibodies targeting the different Igs have been almost exclusively applied as tools to investigate B cell functionality in fish. In some cases, antibodies recognizing key transcription factors involved in B cell development in mammals (Pax5, Blimp1) have been combined with antibodies recognizing species-specific immunoglobulins (IgM, IgT, and IgD), allowing the identification of B cells in different maturation/differentiation stages by flow cytometry [13,14]. Other studies have focused on identifying in fish orthologue genes of cluster of differentiation (CD) molecules previously characterized and assigned in mammals to specific B cell subsets. However, it should be taken into account that many of these orthologues to key mammalian markers are absent in the teleost genomes or in other cases show a high divergence compared to mammalian proteins, questioning whether they have a similar function. Despite these efforts, the notorious lack of fish-specific markers that differentiate B cell subsets, maturation stages, or differentiated cells have considerably hindered our understanding of teleost B cell functionality.

The recent development of sophisticated tools that permit the analysis of transcriptomes at single cell resolution is allowing a fast progression of our current knowledge regarding the phenotype and function of different leukocyte subsets. Thus, for example, single cell RNA sequencing has been used in Atlantic cod (*Gadus morhua*) spleen cells and peripheral blood leukocytes, identifying the major cell subsets including cytotoxic T cells, B cells, erythrocytes, thrombocytes, neutrophils, and macrophages [15]. Focusing the analysis on one specific immune cell subset, Niu et al. defined different subsets of non-specific cytotoxic cells in Nile tilapia (*Oreochromis niloticus*) [16]. In the present work, we have taken advantage of the recent single cell genomic tools developed to perform an

in depth analysis of teleost B cell transcriptomes at single cell resolution. For this study, we used rainbow trout (*Oncorhynchus mykiss*) as a model species and blood as a B cell source given that this is where the higher percentage of B cells are found in homeostasis [17]. To avoid cross-linking of the B cell receptor (BCR) and the subsequent cell activation in response to anti-IgM, we sorted lymphoid (small cells with low complexity) MHC II<sup>+</sup> cells, which exclusively corresponded to B cells. Our results evidence the presence of different circulating B cell subpopulations with interesting differences at transcriptional level that seem to reflect different stages of maturation or diverse B cell subsets. These results provide valuable information to elucidate B cell functionality in teleosts. The identification of specific markers for each of these subpopulations will also be of great help to generate in the future novel antibodies that can be used to differentiate teleost B cell subsets.

#### **2. Materials and Methods**

#### *2.1. Experimental Fish*

Female rainbow trout (*Oncorhynchus mykiss*) of approximately 50–70 g (15–20 cm) were obtained from *Piscifactoria Cifuentes* (Guadalajara, Spain). Fish were maintained at the Animal Health Research Centre (CISA-INIA) laboratory at 16 ◦C with a re-circulating water system and 12:12 h light:dark photoperiod. The fish were fed a commercial diet twice a day (T4 Royal Optima, Skretting, Spain). The fish were acclimatized to laboratory conditions for at least 2 weeks prior to any experimental procedure. During this period, no clinical signs were ever observed. The procedures described comply with the Guidelines of the European Union Council (2010/63/EU) for the use of laboratory animals and were previously approved by the Ethics committee from the *Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria* (INIA; Code CEEA PROEX002/17).

#### *2.2. Peripheral Blood Leukocyte Isolation and Sorting*

Three individual rainbow trout were killed by benzocaine (Sigma) overdose. Blood was extracted with a heparinized needle from the caudal vein and diluted 10 times with Leibovitz medium (L-15, Thermo Fisher Scientific, Walthan, MA, USA) supplemented with 100 IU/mL penicillin and 100 μg/mL streptomycin (P/S, Thermo Fisher Scientific), 2% fetal calf serum (FCS, Thermo Fisher Scientific), and 10 IU/mL heparin (Sigma, St. Louis, MO, USA). Peripheral blood leukocytes (PBLs) were obtained by density gradient centrifugation (500× *g* for 30 min at 4 ◦C) of diluted blood on 51% continuous Percoll (GE Healthcare, North Richland Hills, TX, USA). The interface cells were collected, washed twice in L-15 containing antibiotics and 5% FCS and adjusted to 2 × <sup>10</sup><sup>6</sup> cells/mL.

To avoid the activation of B cells by BCR cross-linking and to be able to study different B cell subsets independently of their pattern of surface Ig expression, we decided to sort MHC II+ cells with small size and low complexity (within what has been previously designated as the lymphoid gate), as it was predicted that these cells would correspond mainly to B cells. For this, PBLs were washed in FACS staining buffer (phenol red-free L-15 medium supplemented with P/S and 2% FCS) and incubated with a monoclonal antibody specific for rainbow trout MHC II β chain (mAb mouse IgG1 coupled to allophycocyanin, 2 μg/mL) previously characterized [18]. After 30 min of incubation at 4 ◦C, the cells were washed with FACS staining buffer and YO-PRO dye (0.05 μM) added to the cell suspension for dead cell exclusion. Lymphoid (small, low complexity) MHC IIβ<sup>+</sup> YO-PRO<sup>−</sup> (live) cells were then isolated in a FACSAria™ III sorter (BD Biosciences, San Jose, CA, USA) equipped with BD FACSDiva™ software (BD Biosciences). The purity of the sorted population (above 98%) was confirmed in a FACS Celesta flow cytometer (BD Biosciences).

#### *2.3. Library Construction and Sequencing*

Isolated MHC II+ cells gently pipetted and diluted to a concentration of 700 cells/μL were used for cell isolation on a 10× Genomics Chromium Controller instrument [19]. All steps, including PBL isolation, sorting, and Chromium™ Single Cell isolation were carried out in the same morning to avoid cell death. A total of 2500 cells per donor were

loaded into the chips of the Chromium™ Single Cell 5 Gel Beads Kit (10× Genomics) and subjected to the Chromium Controller instrument to generate single cell Gel Bead-In Emulsions (GEMs) following the manufacturer's instructions. Next, GEMs were subjected to library construction using the Chromium™ Single Cell 5 Library Kit v1 (10× Genomics). As a first step, reverse transcription was performed, resulting in cDNA tagged with a cell-specific barcode and a unique molecular index (UMI) per transcript. Fragments were then size selected using SPRIselect magnetic beads (Beckman Coulter). Next, Illumina sequencing adapters were ligated to the size-selected fragments and cleaned up using SPRIselect magnetic beads (Beckman Coulter, Brea, CA, USA). Finally, sample indexes were incorporated and amplified, followed by a double-sided size selection using SPRIselect magnetic beads (Beckman Coulter). The quality of the final library was assessed using an Agilent 2100 Bioanalyzer (Agilent technologies, Amstelveen, The Netherlands). The samples were then sequenced using a NextSeq instrument (Illumina) with 150 PE chemistry.

#### *2.4. Alignment and Initial Processing of the Sequencing Data*

The Cell Ranger software (10× Genomics, v3.1) was used to process the sequenced libraries. A rainbow trout reference transcriptome was constructed from the RefSeq *Oncorhynchus mykiss* genome v1.0 using the "Cell Ranger mkref" tool. The sequences encoding the constant regions of rainbow trout Igs were added to the fasta file prior to the construction of the reference transcriptome. The complementary DNA reads from each donor were mapped against this reference using the "Cell Ranger count" tool. Through this system, filtered UMI expression matrices from each donor were generated. As a result, the raw expression data was obtained containing transcriptomes for single MHC II<sup>+</sup> cells from the three donor fish.

In accordance with published pipelines and quality control standards, abnormal cells in all datasets were uniformly filtered out based on their gene expression distribution using the Seurat package (version 3.1) [20,21]. Cells with at least 200 detected genes, and only those genes that appeared at least in three cells were included in the initial matrix from each fish. A cell was considered to be abnormal if any of the following criteria were met: (i) detected gene number >2500; (ii) detected count number >15,000 and (iii) >25% of reads mapping for mitochondrial genes or no reads mapping for mitochondrial genes.

#### *2.5. Data Integration and Cell Clustering*

The SCTransform method from the Seurat software was applied in order to normalize the three filtered single-cell datasets from different fish. The percentages of mitochondrial and ribosomal proteins previously calculated for each cell were included as variables to regress the data. Then, filtered and normalized datasets were integrated using the PrepSCTIntegration tool to avoid a batch effect, enabling a systematic comparison between the three fish. The merged data was subjected to dimensionality reduction using principal component analysis (PCA) followed by uniform manifold approximation and projection (UMAP) using the first 30 principal components (PCs). This setup was also used to define nearest neighbors among cells with the KNN method using the FindNeighbors function. To group the cells in different subsets according to expression levels, the FindCluster tool was applied using the Louvain algorithm with the resolution set as 0.5, which allowed the correct definition of clearly separated clusters.

#### *2.6. Marker Identification and Functional Analysis*

The identification of genes showing differential expression associated to a specific cluster was performed using the FindAllMarkers tool from the Seurat software, considering a significant association for those genes showing an adjusted *p* < 0.001 and logFC ≥ 0.25. The information relative to the gene description contained in the *O. mykiss* genome v1.0 was taken into account for gene name association. In order to obtain an actualized functional annotation, the nucleotide sequences from the genes identified as markers were compared with proteins from a set of model species (*Homo sapiens*, *Mus musculus*, *Danio rerio*, *Macata*

*mulata*, *Drosophila melanogaster*, and *Xenopus tropicalis*) using the Blastx software applying as threshold a minimum E value of 10<sup>−</sup>5. The blast results were subjected to the Blast2GO software for GO term mapping. Sequences were also compared against domain databases using the InterProScan tool implemented in Blast2GO. GO term annotations were inferred for rainbow trout transcripts. Single enrichment analysis was performed by comparing the functions associated to genes from each cluster taking into account differences with an adjusted *p* < 0.05.

#### **3. Results**

#### *3.1. Single-Cell Transcriptomic Analysis of Rainbow Trout MHC II+ Lymphoid Cells from Peripheral Blood*

To acquire a transcriptomic profile of rainbow trout peripheral blood B cells at singlecell resolution, leucocyte isolation following sorting of lymphoid (small size and low complexity) MHC II+ cells was carried out using blood obtained from three independent unstimulated fish. Cell viability was checked after sorting and confirmed to be higher than 90% in all samples. Single cell cDNA libraries were sequenced using Illumina HiSeq 150PE obtaining a total number of 73,934,335, 83,626,235 and 76,465,953 raw reads per fish. After quality control and mapping using the Cell Ranger software, transcriptomes of 1078, 2178, and 1488 single cells from each fish were acquired, detecting a median of 762, 694, and 1027 genes per cell, respectively.

Single cell sequencing data from each fish was then analyzed to determine the distribution of genes, read counts, percentage of reads mapping mitochondrial genes as well as percentage of reads mapping ribosomal proteins (Figure 1a). The distribution of genes and reads was found to be similar in all fish, with approximately 2500 genes per cell or 15,000 counts per cell (Figure 1a). Some cells, mainly identified in Fish 3, showed a higher number of genes or reads. These cells were considered abnormal cells or potential doublets and were filtered out for successive analysis. Regarding the percentage of mitochondrial reads, a wide distribution was observed in all fish reaching values of ~50% (Figure 1a). Commonly, a high percentage of mitochondrial reads is associated with abnormal cells or cells that have been damaged during isolation. For this reason, cells showing a percentage of mitochondrial reads higher than 25% were considered abnormal and excluded from subsequent analysis. Similarly, a reduced number of cells with no mitochondrial reads were filtered out at this step. After filtering a total of 843, 1814, and 1327 cells were retained for successive analysis.

Once the data was normalized and integrated, the cells from each of the three fish appeared to be equally distributed along the cell projection (Figure 1b). After cell clustering, a global view was generated to illustrate the potential subpopulations of rainbow trout peripheral blood MHC II+ B cells. It should be noted that 99.28% of the cells analyzed transcribed some type of B cell marker, either genes encoding Ig heavy or light chains or isoforms of CD79, previously identified as a B cell specific marker in Atlantic salmon (*Salmo salar*) [22], confirming that almost all MHC II+ lymphoid cells in peripheral blood correspond to B cells. The UMAP reduction generates a clear cell clustering highlighting 10 distinct cell populations based on their gene expression profiles (Figure 1c). All the clusters identified were shared by the three fish analyzed (Figure 1b). According to the global expression analysis, the clusters located at the central region of the cell projection (Clusters 0, 1, and 6) seem to contain cells showing a lower number of reads (Figure 1d) as well as a lower number of expressed genes per cell (Figure 1e). In contrast, the clusters located in the periphery of the cell projection show cells with higher number of reads and genes expressed, especially noticeable for clusters 2, 4, 5, and 7 (Figure 1d,e).

**Figure 1.** Single cell sequencing quality check and cell projection analysis. (**a**) Quality check of single cell sequencing results. The figure shows the total number of genes (nFeature\_RNA), the total number of reads (nCount\_RNA), the percentage of reads mapping mitochondrial genes (percent.mt) and the percentage of reads mapping ribosomal proteins (percent.RB) per cell in each individual fish. UMAP visualization of cells integrating data from the three individual fish (**b**); defining clusters of cells with similar transcriptional identities (**c**); showing the total number of reads identified for each cell (**d**) or the total number of genes identified for each cell (**e**).

#### *3.2. Characterization of Marker Transcripts Associated with Identified Cell Clusters*

Following the identification of cell clusters of B cells according to their transcriptional profile, a new analysis was performed to identify those genes that were most specific to a given cluster versus the rest. These specific marker transcripts identified for each cluster were retained when the adjusted *p* value was lower than 0.001 (Table S1). These transcripts that showed significant differences in transcription levels among different B cell subsets, included protein-coding genes as well as genes encoding ribosomal proteins, long non-coding RNAs (lncRNAs), and also mitochondrial genes (Figure 2a,b). Specifically, the gene encoding the 60S ribosomal protein L18a (LOC110516869) was the gene that most significantly represented cluster 0 (Figure 2c). Genes encoding the heat shock protein 70b (hsp70b), the ferritin heavy subunit (frih), an early growth response protein 1 (LOC110488587), and the heat shock protein HSP 90-alpha-like protein (LOC110529844) were the genes that most significantly differentiated clusters 1, 2, 7, and 8 respectively (Figure 2c). Interestingly, clusters 3, 4, 5, and 9 had one lncRNA as their most significant marker transcript (Figure 2c). Finally, cells included in cluster 6 showed a remarkable expression of the mitochondrial 12S ribosomal gene (Figure 2c).

**Figure 2.** Identification of transcripts specific for each rainbow trout B cell subset. (**a**) Table including the number of transcripts identified for each cell cluster showing significant differences in expression levels (adjusted to *p* < 0.001) differentiating among protein coding genes (Pcg), long non-coding RNAs (Lnc), genes coding for ribosomal proteins (Rpr) and mitochondrial genes (Mtc). (**b**) Heatmap showing expression levels from the top five most significant subset-specific transcripts for each B cell subset (**c**) Violin plots showing normalized expression levels in all B cell subsets for the most significant marker transcript in each B cell subset.

#### 3.2.1. Description of Molecular Markers Associated with Different B Cell Subsets

Marker transcripts for each B cell subset as well as the potential functions associated to each of these genes were further analyzed using the GO term single enrichment analysis. Several of these genes for which significant differences in expression levels were identified among the different clusters included genes encoding important enzymes, genes related to immune functions, as well as several transcription factors involved in important physiological processes (Table S1). Interesting functionalities were also associated with each B cell subset through this analysis (Table S2). The most relevant findings are summarized below.

#### Cluster 0

Cluster 0 is the largest cluster in number of cells, with 1098 cells in total (taking into account all three fish), and it is located at the center of the cell projection (Figure 3a). According to the transcription of the constant regions of Ig genes around 50% of cells within this cluster are IgM+IgD+ B cells (Figure 3b). A significant enrichment of GO terms "response to chemokines" (GO:1990868) or "cellular response to increased oxygen levels" (GO:0036295) was observed in this cluster within the biological process category (Figure 3c). A gene encoding an interferon-induced protein with tetratricopeptide repeats 5-like (LOC110491392) or a transmembrane protein 42-like (LOC110516550) were found within the genes showing the highest differences in expression levels for this cluster (Table S1). Some interesting immune genes were also found transcribed at significantly higher levels in this cluster. These included genes coding for the constant region of the immunoglobulin mu heavy chain and a C-X-C chemokine receptor type 4-like (LOC110520024 and LOC110501543), which also showed higher expression in Cluster 8 (Figure 4).

**Figure 3.** Cluster definition including immunoglobulin profile of cells and functional analysis of marker transcripts. (**a**) UMAP visualization of cells in each cluster. (**b**) Percentage of cells within each cluster assigned to common heavy chain Ig subsets (IgM+IgD+, IgM+, IgD+ and IgT+) according to their transcriptional analysis. (**c**) Single enrichment analysis along the different cell clusters. Graphs show the percentage of genes within the most significant GO terms at level six within the biological process category for each specific cluster (colored bars) versus the percentage of genes from this category found in the rest of clusters (grey bars).


**Figure 4.** Dot plot analysis showing the marker transcripts identified for each cell cluster related with immune function. For each gene, the average normalized expression (dot color) together with the percentage of cells expressing the gene (dot size) are shown for each cluster.

Cluster 1 in formed by 613 cells and is located in the upper region of cell projection (Figure 3a). This B cell subset is mainly represented by IgT+ B cells (Figure 3b). Intriguingly, this cluster is characterized by a group of markers related with "definitive hemopoiesis" (GO:0060216), "primitive erythrocyte differentiation" (GO:0060319) and "positive regulation of transcription of Notch receptor target" (GO:0007221) (Figure 3c). Regarding gene expression, this cluster is mainly represented by cells expressing immunoglobulin tau-1 and tau-3 heavy chains. Several genes coding for homologues of heat shock protein 70 also appeared among the most significant markers associated with this cluster (*hsp70b*, LOC110486254, LOC110485160, and LOC110486256) (Supplementary Table S1). Other immune genes transcribed at significantly higher levels by several cells within this cluster include genes coding for CD18 (*cd18*), interleukin 15 (*il15*), interleukin 31 receptor subunit alpha-like (LOC110523479), interleukin 3 receptor class 2 subunit beta-like (LOC110538012), tumor necrosis factor superfamily member 13b (*tnfsf13b*), as well as a gene encoding a CXC chemokine receptor type 4-like (LOC110516585) (Figure 4). Some receptors related to innate immune responses were also highly transcribed in cells from this cluster, for example, toll-like receptor 12 (LOC110508481) or 13 (LOC110496442) (Table S1). Among genes that encode transcription factors, these cells are characterized by the transcription of two genes encoding GATA-binding factors 2-like proteins (LOC110494514 and LOC110527950), involved in primitive hematopoiesis in mammals. As mentioned above, some genes related with regulation of the Notch pathway were also up-regulated in this cell cluster. These included two genes encoding pre-B-cell leukemia transcription factors 1-like proteins (LOC110508658 and LOC110520992), and a gene encoding a signal transducer and activator of transcription 1-alpha/beta-like protein (LOC110520020) (Figure 5). Analyzing the GO terms within the molecular function category associated with this cluster, we found "signaling receptor binding" (GO:0005102) and "calcium ion binding" (GO:0005509) (Table S2). In correlation, a representative group of genes highly expressed in cells from this cluster are related with calcium regulation, for example genes encoding ictacalcin (*s10i*), an ictacalcin-like protein (LOC110520890), a C-type lectin domain containing 14A (*clec14a*), and a E3 ubiquitin-protein ligase CBL-like protein (LOC110533571) (Table S1).


**Figure 5.** Dot plot analysis including those marker transcripts identified for each cell cluster related to the activity of transcription factors. For each gene, the average normalized expression (dot color) together with the percentage of cells expressing the gene (dot size) are shown for each cluster.

Cluster 2, located in the left part of the cell projection, is formed by a total of 578 cells (Figure 3a). The cluster is mostly composed of IgM+IgD+ B cells, which constitute approximately 87% of these cells according to the levels of transcription of Ig constant regions (Figure 3b). Cluster 2 seems to be the most transcriptionally active cell cluster according to number of reads and genes transcribed per cell (Figure 1e). Genes identified as marker genes specific for this cluster are related to several functions including "peptide transport" (GO:0015833), "negative regulation of apoptotic process" (GO:0043066), "cellular response to cytokine stimulus" (GO:0071345), or "positive regulation of protein binding" (GO:0032092) (Figure 3c, Table S2), among others functionalities (Table S2). Among the genes most significantly expressed by this cluster we found some interesting immune genes, such as two genes encoding CC chemokine receptor type 9 (*ccr9* and LOC110509110), different components of the MHC II complex, including the MHC class II beta chain (*oncmyk-dab*) and the BOLA class I histocompatibility antigen, an alpha chain BL3-6-like (LOC110496224), and several homologues of beta-2-microglobulins (LOC110491889, LOC110491891 and LOC110491893) (Figure 4). In addition, among genes associated with the activity of transcription factors, we found a strong transcription of the factor BTF3 (LOC110523362), annexin A4-like (LOC110523711) and the cyclic AMP-dependent transcription factor ATF-4 (LOC110494903) (Figure 5). Finally, beta thymosin (LOC100136027), a gene associated with plasma cell differentiation, was also identified among the most significant marker transcripts for this cluster (Table S1).

#### Cluster 3

A total of 400 cells were identified in cluster 3, located at the bottom left region of the cell projection (Figure 3a). According to the levels of transcription of Ig constant regions, this cluster included IgM+IgD+ cells, but also IgT+, IgD+ or IgM<sup>+</sup> cells (Figure 3b). The numbers of marker transcripts identified for this cluster were lower than for other clusters but were enriched in functions such as "negative regulation of organelle assembly" (GO:1902116), "regulation of ubiquitin-protein transferase activity" (GO:0051438) and "regulation of gene expression, epigenetic" (GO:0040029) (Figure 3c, Table S2). A gene coding for the PTEN induced putative kinase 1 (*pink1*) was transcribed at significantly higher levels in cells from this cluster, despite the fact that this gene was already highly transcribed in all the cells analyzed. A gene encoding a lysosomal acid phosphataselike protein (LOC110506315) was also preferentially transcribed in this cluster (Table S1). Concerning immune genes, some cells from this cluster highly expressed a gene coding for a tumor necrosis factor receptor superfamily member 19-like protein (LOC110504009) and a DENN domain-containing 1B-like protein (LOC110524795) (Figure 4). Additionally, a few genes related to transcription factor activity were identified with higher expression in this cluster, such as genes coding for a CREB/ATF bZIP transcription factor-like protein (LOC110533414) and a REST corepressor 1-like protein (LOC110505760) (Figure 5).

#### Cluster 4

A group of 253 cells localized at the right region of cell projection was designated as cluster 4 (Figure 3a). According to the levels of transcription of Ig constant regions, although the majority of cells from this cluster were IgM+IgD+ cells, IgT+, IgD+, or IgM+ cells were also included (Figure 3b). Interestingly, marker transcripts for this cluster indicate a significant enrichment in functions related to "glycoprotein biosynthetic process" (GO:0009101) as well as "protein N-" or "O-linked glycosylation" (GO:0006487 and GO:0006493, respectively) (Figure 3c). Among the genes related to these functions that were transcribed at significantly higher levels in this cluster, we found genes coding for several enzymes, such as for example beta-1,3-galactosyltransferase 6-like protein (LOC110527712), beta-1,4 glucuronyltransferase 1 (*b4gat1*), probable C-mannosyltransferase DPY19L3 (LOC110506384), and UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 9-like protein (LOC110494808) (Table S1).

A group of 242 cells localized at the bottom right section of the cell projection was annotated as cluster 5 (Figure 3a). The cells from this cluster were mostly IgM+IgD+ cells but cells transcribing only IgT or IgM were also identified, according to the levels of transcription of Ig constant regions (Figure 3b). The genes identified in this cluster are enriched in functions such as "protein processing" (GO:0016485) or "regulation of inflammatory response" (GO:0050727). (Figure 3c, Table S2). Interestingly, the cells included in this cluster exhibited higher transcription levels of the constant region of immunoglobulin light chain kappa G1. Other genes related with immune functions identified with higher transcription levels in cells from this cluster, included, for instance, genes coding for the IL-6R alpha precursor (*il6ra*), CXC chemokine receptor type 4-like (LOC110530627), mucin-22-like (LOC110505032), or cysteine-rich protein 1 (*crip1*) (Figure 4). Within transcription factors, different zinc finger proteins (585-LOC110515698, 271-LOC110516917, and 689- LOC110533183) were identified with higher expression than other clusters as well as a nuclear factor erythroid 2-related factor 1-like (LOC110538084) (Figure 5).

#### Cluster 6

A total of 231 cells were included in Cluster 6, situated at the left part of cell projection and near cluster 0 (Figure 3a). Cells from this cluster were identified as a mixture of IgM+IgD+ cells and cells exclusively expressing IgT, IgM or IgD (Figure 3b). Transcripts in this cluster were enriched in a lower number of functionalities than those of other clusters, identifying genes related to "melanocyte differentiation" (GO:0030318) as well as "positive regulation of microtubule polymerization or depolymerization" (GO:0031112) (Figure 3c, Table S2). Cluster 6 is mainly characterized by a high expression of the ribosomal 12S gene mitochondrial gene (Figure 2c). Interestingly, no nuclear genes are clearly associated with cells included in this cluster.

#### Cluster 7

Cluster 7 is represented by 213 cells located at the upper and right part of the cell projection (Figure 3a). This cluster is mostly composed of IgM+IgD+ B cells, which constitute approximately 63% of these cells according to the levels of transcription of Ig constant regions (Figure 3b). Functions related with "regulation of steroid metabolic process" (GO:0019218) or "transmembrane receptor protein serine/threonine kinase signaling pathway" (GO:0007178) were found within the most significant functions from the biological process category (Figure 3c). This cluster contains several genes associated to important immune system functions, such as "regulation of cell population proliferation" (GO:0042127), "positive regulation of cytokine production" (GO:0001819), "lymphocyte activation" (GO:0046649), and "cell migration" (GO:0016477) (Table S2). Also, significant enrichments in other GO terms from the molecular function category, such as "signaling receptor activator activity" (GO:0030546) or "DNA-binding transcription activator activity" (GO:0001216) were also detected (Table S2). The analysis of cluster 7 highlighted interesting markers related to immune functions. Thus, genes encoding two CD83 antigen-like proteins (LOC110486829 and LOC110486775), CCL4 protein (*ccl4*), three early growth response protein 1-like (LOC110488587, LOC110533389, and LOC110537802) and other early growth response protein 3-like (LOC110536011), the immediate early response 2 protein (*ier2*), as well as several nuclear receptor subfamily 4 genes (LOC110534168, LOC110508134, and LOC110494169) were identified within the most significant markers for this cluster (Figure 4). Additional immune related genes were expressed in several cells from this cluster, for instance, a C-C motif chemokine 4-like protein (LOC110494096), two genes encoding interferon regulatory factor 4-like proteins (LOC110524663 and LOC110521762), an interferon regulatory factor 8-like protein (LOC110526480), a tumor necrosis factor receptor superfamily member 9-like protein (LOC110492146) and a cytotoxic and regulatory T cell molecule (*crtam*) (Figure 4).

A total of 190 cells were grouped in cluster 8, located at the center of the cell projection near cluster 0 (Figure 3a). According to the levels of transcription of Ig constant regions, although the majority of cells from this cluster were IgM+IgD+ cells, IgT+, IgD+, or IgM+ cells were also present in this cluster (Figure 3b). Genes with higher transcription levels in cluster 8 than in other clusters appeared associated to "cellular response to stress" (GO:0033554) and "programmed cell death" (GO:0012501) (Table S2). Within the molecular function category, a significant enrichment in genes identified in the GO terms "heat shock protein binding" (GO:0031072) and "ubiquitin-like protein ligase binding" (GO:0044389) were also detected (Table S2). These enrichments were reflected in the genes that were identified within the most significant marker transcripts for this cluster, which included genes encoding several heat shock proteins as well as several ubiquitins. Among the heat shock proteins, a gene coding for a HSP90-alpha-like protein (LOC110529844) was the most significant marker showing high transcription levels in all cells included in this cluster (Figure 2c). In addition, a group of genes coding for homologues of the HSP70 kDa protein (LOC110485160, LOC110533353 and LOC110486254) also showed remarkable transcription levels in cells from this cluster. Within ubiquitins, genes encoding polyubiquitin-B (LOC110533627), polyubiquitin (LOC110533628), ubiquitin B (*ubb*), or E3 ubiquitin-protein ligase RNF220-like (LOC110508723) also showed significantly higher mRNA levels in cells from this cluster when compared to cells from other clusters (Table S1). Some genes related to immune functions were also identified as marker transcripts for cluster 8, such as genes coding for a B-cell CLL/lymphoma 6 member B-like protein (LOC110530070), a TNF receptor superfamily member 5A precursor (*tnfrsf5a*), and a TNF receptor-associated factor 6-like protein (LOC110506988) (Figure 4).

#### Cluster 9

A total of 166 cells were grouped in cluster 9, located at the bottom of the cell projection (Figure 3a). Cells in this cluster corresponded to either IgM+IgD+, IgT+, IgM+ or IgD+ B cells according to their transcriptional Ig profile (Figure 3b). The marker transcripts identified in this subpopulation were mainly associated with functions related to "lipid storage" and "post-translational protein modification" (GO:0043687) (Table S2). This cluster was mainly characterized by the transcription of a gene encoding a glycine-rich RNA-binding -like protein (LOC110508373) (Table S1). Among immune genes, cells from this cluster exhibited higher transcription levels of a cullin-associated NEDD8-dissociated protein 1 (LOC110513018) (Figure 4). Some transcription factors such as the zinc finger and BTB domain containing 24 protein (*zbtb24*) or the metal regulatory transcription factor 1-like protein (LOC110489178) were also transcribed at significantly higher levels by cells in this cluster (Figure 5).

#### 3.2.2. Long Non-Coding RNAs Associated to B Cell Subsets

Several lncRNAs showed differential transcription levels along the different B cell clusters identified among trout peripheral blood B cells (Figure 6a). Remarkably, some of these lncRNAs were found to be widely and strongly expressed all B cells (Figure 6b).

The lncRNAs LOC110490088, LOC110494519, LOC110521442, and LOC110513619 were identified as the most significant marker transcripts for clusters 3, 4, 5, and 9, respectively (Figure 2c). LOC110504686, LOC110520552, and LOC110495721 were also expressed at slightly higher levels in cells from clusters 0, 1 and 2, respectively (Figure 6a,b). In clusters 3 and 4, in addition to the LOC110490088 and LOC110494519 lncRNAs previously mentioned, several other lncRNAs showed strong expression levels. Thus, LOC110499522, LOC110512049, LOC110510226, and LOC110535625 showed higher transcription levels in cells from cluster 3 whereas LOC110494523, LOC110514167, LOC110514392, and LOC110538866 showed higher transcription levels in cluster 4 (Figure 6a,b).

**Figure 6.** Long non-coding RNAs (lncRNAs) identified as markers for different cell clusters. (**a**) Heatmap showing the most significant lncRNAs with differential expression between clusters. (**b**) Ridge plot showing normalized expression in each cluster of lncRNAs identified as markers for specific subsets.

#### **4. Discussion**

Immune studies in mammalian models, especially those undertaken in human and mouse, have identified specific surface markers and/or transcription factors that have allowed the classification of B cells into different subsets and/or different stages of maturation/differentiation. Interestingly, remarkable differences have been observed between human and mouse markers throughout the differentiation process, often sharing just two or three markers between the two species within a specific B cell state [23,24]. In addition, several key markers that define B cell subsets in mammals such as CD19, CD23, or CD24 have no orthologues within the fish genomes, evidencing important differences in B cell biology between mammals and fish. Hence, the analysis of rainbow trout genes encoding homologues to different CD previously described along different states of human and mouse B cells evidenced no expression or residual expression of most of these markers in rainbow trout B cells, with the exception of the different genes encoding CXCR4-like molecules. Surprisingly, CXCR4 was the only marker gene identified in rainbow trout B cells, which can also be considered a marker gene for specific B cell subsets in humans. Other genes identified in our studies as marker genes, have not been identified as such in mammals. Instead, many of these genes correspond to proteins identified in a study recently performed by Peñaranda and collaborators in which a surface proteome of salmon (*Salmo salar*) IgM<sup>+</sup> B cell was performed [22]. These observations evidence the great differences between markers of B cell subsets between fish and mammals, highlighting the need for high throughput studies that perform an unbiased search for these proteins. Of course, studies such as the one performed here, are not only useful to render a panel of molecules that could be used as markers for specific B cell populations, but also provide further insights on the functionality of teleost B cells which are further discussed.

#### *4.1. Cluster 2 May Represent a Subpopulation of Mature B Cells with High Antigen Presenting Capacities*

Cluster 2 was the subpopulation expressing the highest number of genes per cell. This cluster was clearly enriched in IgM+IgD+ B cells (~86%). In addition, this cluster transcribed several components of the MHC II complex and associated microglobulin genes as the most significant marker genes. A significant enrichment in genes involved in peptide or amide transport, also seems to indicate that this B cell subpopulation is associated with a high antigen presenting capacity. Two genes encoding orthologues of mammalian CCR9 are also interesting markers for this subpopulation. In mammals, CCR9 has been shown to regulate naïve B cell migration [25] and localization of plasma cells in mucosal tissues [26]. In addition, high CCR9 expression has also been revealed in mammalian memory B cells [27]. To date, the function and ligands of fish CCR9 are still unknown.

Our results also identified a group of transcription factors among the most significant markers of this subpopulation. These include BTF3 as well as the cyclic AMPdependent transcription factor ATF4. Studies in mammals suggest these transcription factors play different roles associated with the endoplasmic reticulum biology. Thus, BTF3 transcription factor in association with nascent polypeptide-associated complex subunit alpha (LOC110492373), also identified as a marker for this cell cluster, form the nascent polypeptide-associated complex (NAC) which prevents the inappropriate targeting of non-secretory polypeptides to the endoplasmic reticulum. ATF4, on the other hand, has been related with the activation of genes during endoplasmic reticulum stress, a process that regulates activation, B cell differentiation to plasma cell, and cytokine expression mediated by unfolded protein response (UPR) [28].

#### *4.2. Cluster 7 May Represent a Subpopulation of Activated B Cells with Innate Properties*

Teleost B cells retain several innate functions such as a strong phagocytic capacity, ability to produce pro-inflammatory cytokines or antimicrobial peptides in response to a non-specific stimulation [29–31]. However, whether all teleost B cells or only a specific cell subset has these capacities is currently unknown. In the current study, we found that Cluster 7 had among its marker genes, some that could be catalogued as innate genes, commonly found in DCs or macrophages. Hence, two genes encoding orthologues of CD83 were identified in this subpopulation as marker genes. Interestingly, CD83 has been previously associated to a population of small mononuclear blood cells from Atlantic salmon which also expressed high MHC-II levels, showed a high phagocytic capacity and the ability to differentiate into DC-like cells [32]. However, in Atlantic salmon, no B cell markers were identified in this subpopulation, whereas in our study, the transcription of different immunoglobulins and CD79 unequivocally indicates that these cells correspond to a B cell subset. Other important markers that link this B cell subpopulation with innate responses are the genes annotated in the rainbow trout genome as CCL4 and *tnfrsf9* (CD137), both known to play important roles in monocyte activation. Interestingly, a significant increase in the levels of transcription of both of these genes had also been reported in rainbow trout IgM+ B cells after stimulation with either T-dependent or Tindependent antigens [33]. Rainbow trout CCL4 was originally identified as a highly inducible chemokine produced by differentiated rainbow trout macrophages after LPS induction [34]. However, it is important to mention that after this first description, posterior studies focused at establishing phylogenic relationships between mammalian and fish CC chemokines renamed this gene as CK5B [35] and established a close relation with mammalian CCL5 [35,36]. In mammals, CCL5 is known to attract T cells, specifically CD4<sup>+</sup> Th1 cells [37]. In rainbow trout, a previous study established that upon viral infection, CpG or poly I:C stimulation, IgM<sup>+</sup> B cells increased CK5B transcription. This was interpreted at the time as a mechanism of B cells to attract and obtain co-stimulatory signals from T helper cells [38]. Therefore, it might be possible that this is the case for this B cell subset identified.

Cells from cluster 7 also showed a differential expression of several transcription factors such as *irf4* and *irf8* associated in mammals with different functions throughout the B cell differentiation process. For instance, the combined loss of IRF4 and IRF8 in mammals resulted in a blockage of germline κ and λ gene transcription as well as DNA recombination at the pre-B cell stage [39]. In addition, IRF4 is a common marker for activated B cells known to promote Ig CSR as well as plasma cell differentiation by positive regulation of the *Aicda* and *Blimp1* genes, respectively [40]. Other transcription factors identified as markers for this cluster also point these cells having experienced some kind of activation by either an antigen or another type of stimuli. For instance, an increased transcription of different Egr-1 homologues had also been observed in mature mammalian B cells which undergo proliferation as a consequence of B cell receptor (BCR) cross-linking [41]. The participation of this transcription factor in the differentiation program of B cells into plasma cells has also been evidenced [42]. Egr-3, on the other hand, has been identified as an activator of suppressor of cytokine signaling-1 (SOCS1) and SOCS3, inhibitors of STAT1 and STAT3, which regulate B cell function in adaptive immune responses and homeostasis by promoting antigen receptor signaling [43]. Fra-2, also identified as a key transcription factor in this cluster, is involved in the proliferation and differentiation of B cells acting as an upstream regulator of Foxo1 and Irf4 expression [44]. Thus, all the marker genes identified in the complex regulatory network identified among the markers of cluster 7 seem to indicate that these CD83<sup>+</sup> cells characterized by the expression of genes commonly related with macrophages experienced activation.

#### *4.3. Cells in Cluster 1 Correspond to IgT+ B Cells*

Cluster 1 corresponds to approximately 85% of cells with a confirmed IgT transcriptional profile. As previously established, these cells constitute an independent B cell linage with no IgM or IgD [29]. Interestingly, the functionalities and marker genes that were associated with this cluster seemed to indicate an initial stage of maturation. For instance, a significantly higher expression of two genes encoding GATA-binding factors 2-like were found in this cell subset when compared to others, and these factors have been commonly related with primitive hematopoiesis in mammals [45]. In mammals, this transcription factor is expressed in hematopoietic progenitor cells (HPCs) and is considered a key component of the transcriptional program required for early hematopoietic development [45,46]. Similarly, the identification as markers of two genes encoding the pre-B cell leukemia transcription factor 1 (*xbp1*), also support that these cells are at an initial stage of maturation, as this is one of the earliest-acting transcription factors that regulates de novo B-lineage lymphopoiesis, with critical functions during the intermediate stage between hematopoietic stem cell development and B cell commitment [47]. As IgT+ cells are known to preferentially colonize mucosal surfaces [48], it might be possible that these cells are still in an immature stage in peripheral blood thus completing their maturation in mucosal tissues.

Cluster 1 also included several HSPs among its most significant markers. In mammals, the interaction of HSPs with antigen presenting cells (APCs) such as macrophages or DCs, through receptors involved in the innate immune response, such as CD40 or Toll-like receptors 2 and 4 (TLR2/4), leads to several non-antigen specific reactions that promote the stimulation of the innate immune system [49,50]. Thus, the strong overrepresentation of these genes in this cluster may indicate that HSPs play a specific role in the maturation and/or differentiation of IgT+ B cells. Remarkably, several innate receptors such as TLR12 and TLR13 were also identified as markers for this subpopulation, being these the only TLRs identified with significant differential expression between cell clusters. Although no homologues have been identified for these genes in humans, both present homologues in mice. Murine TLR12 and TLR13 are known to activate innate immune responses via MYD88 and TRAF6, leading to NF-kappa-B activation, cytokine secretion and an inflammatory response [51,52]. Thus, the expression of these TLRs in IgT<sup>+</sup> B cells commonly related with mucosal responses [48], suggests that these cells are traveling from central immune organs to mucosal peripheral tissues where they might be activated through the action of these TLRs.

Several cells within this cluster showed a significant higher expression of two interleukin receptors, *il3r2* and *il31r*. Additionally, some cells from this cluster produced different cytokines that might have effects on other immune cells such as *il15* or *tnfsf13b* (BAFF). Studies in mice have determined that IL-15 together with recombinant CD40 ligand is able to potently induce polyclonal IgM, IgG1, and IgA secretion [53] and together with CpG oligonucleotides to induce the proliferation of class switched human memory B cells [54]. In the case of BAFF, this cytokine is a key factor for B cell survival and maturation, illustrated by the absence of mature B cells in BAFF-deficient mice [55]. Previous work performed in rainbow trout showed the regulation of splenic B cell functionality by BAFF [7] and differential effects on peritoneal IgM+ B cell subpopulations [6]. Furthermore, these studies revealed the production of BAFF by a subset of splenic IgM+ B cells pointing to an auto-regulatory mechanism of B cell survival in this organ [6]. In this case, the expression

of these molecules in peripheral blood IgT<sup>+</sup> B cells may indicate the capacity of these cells to regulate other B cell subsets or themselves in an autocrine fashion.

#### *4.4. CXCR4-Like 4 Genes as Key Markers for Fish B Cells*

Mammalian CXCR4 has been suggested as a marker for an initial common lymphoid progenitor in mice [56] and for differentiated plasmablasts or plasma cells both in mice and humans [57,58]. Several studies have analyzed the evolution and function of *cxcr4* genes in teleost fish showing the presence of either two paralog genes designated as *cxcr4a and cxcr4b* [59–61] or even four gene copies in cyprinids [62]. Our results identified four different genes annotated as rainbow trout CXCR4-like, which all seemed expressed in B cells. Interestingly, all of them were identified as potential markers for different B cell subpopulations. On one hand, two genes encoding CXCR4 (LOC110520024 and LOC110501543), which seem to be homologues to the molecule designated as CXCR4b in previous studies, were identified as shared markers for cluster 0 and cluster 8. On the other hand, the other two genes encoding homologues of the molecule previously designated as CXCR4a (LOC110516585 and LOC110530627) showed differential expression in cluster 1 and cluster 5, respectively. A previous study in fish pointed to the duplication of CXCR4 genes in teleosts evolving to a gene subfunctionalization that may stabilize the immune system controlling teleost hematopoietic stem/progenitor cell (HSPC) homeostasis [61]. In addition, these authors suggested that HSPCs expressing CXCR4a preferentially bind LPS, whereas those expressing CXCR4b preferentially bind SDF-1 reflecting different pathways of immune cell differentiation [61]. Our analysis focused in B cells seems to indicate that CXCR4 relates to an advanced state of maturation, but with different rainbow trout CXCR4 molecules carrying out a differential regulation of specific B cell subpopulations.

#### *4.5. Long Non-Coding RNAs Seem to Regulate Different B Cell Transitional Stages in Fish*

LncRNAs carry out an epigenetic regulation through chromatin modification of differentiating enhancer-associated lncRNAs (eRNAs), acting in cis, and promoter-associated lncRNAs (pRNAs), acting in trans [63]. The potential regulatory function of lncRNAs during B cell differentiation and development has recently emerged [64]. In humans, a set of over 3000 lncRNAs were detected analyzing bone marrow and thymic progenitors spanning the earliest stages of B and T lymphoid specification [65]. The expression patterns observed along these lncRNAs identified were shown to be highly stage-specific and more lineage-specific than protein-coding genes [65]. More recently, RNAseq or microarray studies using isolated B cell subpopulations highlighted equivalent expression patterns between coding genes and lncRNAs throughout B cell differentiation [66–68]. Similarly, in mouse, a repertoire of 4516 lncRNAs were identified with expression along 11 mouse B cell subpopulations, again suggesting that lncRNAs displayed a greater cell-type restriction than coding genes, showing very restricted spatiotemporal roles during B cell development [67]. Interestingly, the expression of lncRNA loci is controlled by transcription factors specific of B cell lineage, such as for example PAX5, known to regulate several lncRNAs in both pro-B and mature B cells [67].

Our results identified 60 transcripts, annotated in the *O. mykiss* genome as lncRNAs, which showed significant differences in expression levels between cell clusters. Thus, four clusters had a lncRNA as its most significant marker, in agreement with the cell-type restriction previously described in mammals.

Interestingly seven lncRNAs were identified as markers for cluster 3 in correlation with a significant enrichment in genes related with "regulation of gene expression, epigenetic". In addition, several functions showing significant enrichment in this cluster were related to the modification or assembly of different cellular organelles. For instance, the *pink1* gene was identified among the most significant markers encoding a protein in this cluster. In mammals, PINK1 modulates mitochondrial trafficking and is involved in the clearance of damaged mitochondria via selective mitophagy [69]. Recent studies in mammals have evidenced important links between mitophagy and immunity [70–72]. Mitochondrial stress

in response to pathogens and danger signals induces mitophagy through the PINK1/Parkin pathway [73]. Interestingly, an important role has been attributed to PINK1 in integrating extracellular signals with the metabolic state, thus determining T cell fate [74]. Although there are no evidence in mammals that connect PINK1 with B cell fate, our results suggest a potential connection in fish defining a specific subpopulation of peripheral blood B cells from rainbow trout.

Cluster 4 also has a high number of lncRNAs among their most significant markers. In this cluster, several enzymes related to different mechanisms of glycosylation were also identified as markers. Both N- and O- glycosylations have emerged as important translational modifications of key immune proteins including Igs [75–77]. These modifications have also been observed in the variable regions of these Igs, affecting both antibody specificity and stability [78,79]. In human blood B cells, IgG1 Fc-glycosylation with all-trans retinoic acid has been described upon stimulation [80]. Hence, our results might indicate that B cells in Cluster 4 have already encountered some kind of stimuli that has triggered a glycosylation process.

In what concerns lncRNAs, it should be mentioned that cells from cluster 5, which seemed associated with CXCR4a expression, were also highly represented by the expression of the lncRNA LOC110521442. Finally, cluster 9 is also highly represented by the lncRNA LOC110513619. Interestingly, a significant enrichment in functions related with fatty acid oxidation was observed in this cluster reflecting a specific metabolic state of this B cell subpopulation. Recent results highlighted fatty acid oxidation as a predominant energy source for *ex vivo bona fide* GC B cell growth [81]. Although no GCs have been described in fish, our results suggest that fatty acid metabolism may represent an important function in specific B cell subpopulations, opening a new field to further explore B cell immunometabolism in fish.

Altogether, our results suggest for the first time that fish lncRNAs also carry out key roles in B cell functionality, showing a very restricted spatiotemporal expression. Although the specific functions carried out by these regulators remain to be further analyzed, the correlation with B cell subpopulations associated to important functions suggests a role of lncRNAs in maintaining these different B cell states.

#### **5. Conclusions**

In summary, we have taken advantage of the recently developed 10× Genomics single cell sequencing techniques to analyze the transcriptome of 3984 MHC II+ lymphoid cells from rainbow trout peripheral blood. These cells, obtained from three independent fish, mostly corresponded to B cells as established by Ig transcription levels or expression of the B cell marker CD79. This methodology allowed us to establish 10 different B cell clusters, characterized by a specific gene expression profile. These fish were obtained from a fish farm as adults, thus throughout their development they might have encountered different stimuli, microorganisms or even pathogenic agents. For this reason, the B cell population analyzed will surely include cells in different stages of activation. However, whether the different cell clusters identified correspond to cells in different stages of activation/differentiation or in fact are diverse naïve B cell subpopulations that carry out different immune functions should be further explored. In any case, our studies highlight the diversity of the B cell response in teleost and provide further insight on key genes that mediate important processes throughout the B cell maturation/differentiation process in these species, also providing us with a panel of potential markers that might be used in the future to differentiate and further explore the functionality of these B cell subsets.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/ 10.3390/biology10060511/s1, Table S1. Results from marker identification analysis. Rainbow trout genes showing significant differential expression (adjusted *p* < 0.001) between cell clusters (identified with different colors). Table S2. Results from GO term single enrichment analysis. GO terms significantly enriched (adjusted *p* < 0.05) within markers from each cell cluster for each Gene Ontology

category (Biological process, Molecular function and Cellular component). Different clusters are represented by different colors.

**Author Contributions:** P.P.: Investigation, Formal analysis, Writing—original draft preparation. E.M.: Investigation, Resources. C.T.: Conceptualization, Supervision, Funding acquisition, Writing reviewing and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the European Research Council (ERC Consolidator Grant 2016 725061 TEMUBLYM).

**Institutional Review Board Statement:** Procedures described comply with the Guidelines of the European Union Council (2010/63/EU) for the use of laboratory animals and were previously approved by the Ethics committee from the *Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria* (INIA; Code CEEA PROEX002/17).

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus [82] and are accessible through GEO Series accession number GSE158102 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158102; Public since 10 May 2021).

**Acknowledgments:** Lucía González is greatly acknowledged for technical assistance. We also thank Patricia Díaz-Rosales for critically reviewing the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

