**Exploring the lncRNAs Related to Skeletal Muscle Fiber Types and Meat Quality Traits in Pigs**

#### **Rongyang Li 1 , Bojiang Li 2 , Aiwen Jiang 1 , Yan Cao 1 , Liming Hou 1 , Zengkai Zhang 1 , Xiying Zhang 1 , Honglin Liu 1 , Kee-Hong Kim <sup>3</sup> and Wangjun Wu 1, \***


Received: 2 July 2020; Accepted: 31 July 2020; Published: 4 August 2020

**Abstract:** The alteration in skeletal muscle fiber is a critical factor affecting livestock meat quality traits and human metabolic diseases. Long non-coding RNAs (lncRNAs) are a diverse class of non-coding RNAs with a length of more than 200 nucleotides. However, the mechanisms underlying the regulation of lncRNAs in skeletal muscle fibers remain elusive. To understand the genetic basis of lncRNA-regulated skeletal muscle fiber development, we performed a transcriptome analysis to identify the key lncRNAs affecting skeletal muscle fiber and meat quality traits on a pig model. We generated the lncRNA expression profiles of fast-twitch *Biceps femoris* (Bf) and slow-twitch *Soleus* (Sol) muscles and identified the differentially expressed (DE) lncRNAs using RNA-seq and performed bioinformatics analyses. This allowed us to identify 4581 lncRNA genes among six RNA libraries and 92 DE lncRNAs between Bf and Sol which are the key candidates for the conversion of skeletal muscle fiber types. Moreover, we detected the expression patterns of lncRNA *MSTRG.42019* in different tissues and skeletal muscles of various development stages. In addition, we performed a correlation analyses between the expression of DE lncRNA *MSTRG.42019* and meat quality traits. Notably, we found that DE lncRNA *MSTRG.42019* was highly expressed in skeletal muscle and its expression was significantly higher in Sol than in Bf, with a positive correlation with the expression of *Myosin heavy chain 7* (*MYH7*) (*r* = 0.6597, *p* = 0.0016) and a negative correlation with meat quality traits glycolytic potential (*r* = −0.5447, *p* = 0.0130), as well as drip loss (*r* = −0.5085, *p* = 0.0221). Moreover, we constructed the lncRNA *MSTRG.42019*–mRNAs regulatory network for a better understanding of a possible mechanism regulating skeletal muscle fiber formation. Our data provide the groundwork for studying the lncRNA regulatory mechanisms of skeletal muscle fiber conversion, and given the importance of skeletal muscle fiber types in muscle-related diseases, our data may provide insight into the treatment of muscular diseases in humans.

**Keywords:** pig; skeletal muscle fiber; meat quality; metabolic diseases; lncRNA; RNA-seq

#### **1. Introduction**

Skeletal muscle is the major component of body mass accounting for approximately 50% of body mass in a mammal, and its growth and development are critical for maintaining skeletal muscle function. Skeletal muscle is composed of various muscle fibers that exhibit different physiological and metabolic properties, such as glycolysis, oxidative metabolism, and contraction [1]. Notably, dysfunctions of skeletal muscle fiber types are known to many human diseases, such as muscular dystrophy, cardiomyopathic disease, and type 2 diabetes [2–5]. Moreover, the differences in skeletal muscle fiber types directly affect meat quality postmortem in livestock, such as pH, meat color, and drip loss [6]. Therefore, elucidating the underlying mechanisms of skeletal muscle growth and development and skeletal muscle fibers formation will be useful for the treatment of human muscle diseases and the improvement of production traits of livestock.

Myogenesis is a complex process that is controlled by a series of factors, such as myogenic regulatory factors (MRFs) [7,8] and non-coding small molecule RNA microRNAs (miRNAs) [9,10]. In recent years, emerging researches found a key role of long non-coding RNAs (lncRNAs) in skeletal muscle growth and development and found that they are closely related to muscle diseases [11,12]. LncRNAs are a class of non-coding RNAs with a length of more than 200 nt and mainly transcribed by RNA polymerase II in eukaryotic organisms [13]. For example, Cesana et al. found that lncRNA linc-MD1 is expressed explicitly in the differentiating myoblasts and exerts its function through miR-133 and miR-135-regulated expression of muscle-specific transcriptional regulators, MAML1 and MEF2C [14]. Dey et al. found that lncRNA lncRNA-H19 controls skeletal muscle differentiation and regeneration by two miRNAs generated from its own exon 1 [15]. Moreover, it has been reported that lncRNAs affect skeletal muscle growth and development by regulating DNA methylation [16]. Intriguingly, researchers recently found that a skeletal muscle-specific RNA, annotated as a putative the lncRNA RNA, exerts its role in skeletal muscle physiology via encoding a conserved micropeptide [17].

Notably, the types of skeletal muscle fiber are critical factors related to various metabolic diseases in humans and affecting meat quality traits in livestock, but the researches on the genetic basis of skeletal muscle fiber formation involved in lncRNAs are still limited. In this study, we performed a comparative analysis of the whole transcriptomes between *Biceps femoris* (Bf) (fast muscle or white muscle) and *Soleus* (Sol) (slow muscle or red muscle) muscles characterized by noticeable muscle fiber type differences using RNA-seq. We further identified a series of differentially expressed (DE) lncRNAs between Bf and Sol muscles, which represent potential candidate lncRNAs affecting the conversion of skeletal muscle fiber types. Furthermore, we found several important Gene Ontology (GO) terms and metabolic pathways associated with skeletal muscle fiber formation. We proposed that lncRNA *MSTRG.42019* may be a key lncRNAs affecting skeletal muscle fiber types based on its expression profiles and relationship with porcine meat quality traits and its network interaction map. Overall, our data provide the groundwork for an understanding of the lncRNA-regulated skeletal muscle growth and development and muscle fiber conversion.

#### **2. Materials and Methods**

#### *2.1. Ethics Statement*

All experimental procedures on the pigs were conducted according to the Guidelines of the Institutional Animal Care and Use Committee of Nanjing Agricultural University, Nanjing, China (SYXK2011-0036).

#### *2.2. Animals and Samples Collection*

Three full-sibling Duroc × Meishan female pigs derived from the offspring of a Duroc boar crossing with eight Meishan sows were used in this study and the experimental pig population was constructed by Li et al. [18]. Two types of skeletal muscles with different meat color, Bf (Bf28, Bf35, and Bf36) and Sol (Sol28, Sol35, and Sol36) muscles, were dissociated from these three full-sibling Duroc × Meishan pigs. Previously, we have only identified the DE coding genes due to the limitation of the library construction method and sequencing depth [18]. In this study, we reconstructed the RNA sequencing library to identify the DE coding and non-coding genes using the Bf (Bf28, Bf35, and Bf36) and the Sol (Sol28, Sol35, and Sol36) muscles. Moreover, twenty *Longissimus dorsi* muscles from a 279 commercial hybrid pig population [Pietrain (P) × Duroc (D)] × [Landrace (L) × Yorkshire

(Y)] were selected randomly to detect the correlation between the expression of DE lncRNAs and meat quality traits, and the experimental pigs and the production traits were described previously by Dong et al. [19].

#### *2.3. RNA Isolation and LncRNA Library Construction*

Total RNA was extracted using the Trizol® reagent (Invitrogen, Carlsbad, CA, USA), and the quality of the isolated RNA was evaluated by 1% agarose gels. The purity of total RNA was initially detected using the Kaiao K5500® Spectrophotometer (Kaiao, Beijing, China). Next, RNA integrity and concentration were precisely assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). LncRNA libraries were constructed using 3 µg total RNA per sample. Ribo-ZeroTM Gold Kits (Ribo, Guangzhou, China) was used to remove the ribosomal RNA (rRNA) from the sample, and the different indexes were used to build the libraries according to the NEBNext® UltraTM Directional RNA Library Prep Kit for Illumina® (NEB, Ipswich, MA, USA). The specific steps for the library construction were as follows: firstly, rRNA was removed from total RNA and the first-strand cDNA was synthesized with fragmented RNA using a six-base random primer. Next, the second-strand cDNA was synthesized by adding buffer, dNTPs, RNase H, and DNA Polymerase I. Subsequently, the library fragments were recovered by agarose gel electrophoresis after purification using the QIAQuick PCR kit (Qiagen, New York, NY, USA), followed by the end-repair, adding base A and adapter ligation. Finally, the double cDNA strand was digested with uracil-N-glycosylase (UNG) enzyme and subjected to PCR amplification, and the 300 bp target fragments were recovered by gel electrophoresis to obtain the sequencing libraries. The RNA concentration of libraries was measured using the Qubit® RNA Assay Kit (Thermo Fisher, New York, NY, USA) in Qubit® 2.0 (Thermo Fisher, New York, NY, USA), and the size of the inserted fragment was assessed using Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). After cluster generation, sequencing was performed on an Illumina Hiseq 4000 sequencing platform with 150 bp paired-end reads. The raw data of the transcriptome sequence data have been deposited in NCBI SRA (accession codes PRJNA597666).

#### *2.4. Quality Control and Genomic Alignment*

Raw data were processed with Perl scripts to obtain high-quality clean data for further analyses. The filtering criteria were as follows: (1) remove the adaptor-polluted reads (reads containing more than 5 adapter-polluted bases were regarded as adaptor-polluted reads and were filtered out); (2) remove the low-quality reads (reads with a Phred quality value of less than 19 accounting for more than 15% of total bases are regarded as low-quality reads); (3) remove reads whose number of N bases account for more than 5%. As for paired-end sequencing data, both reads were filtered out if any reads of the paired-end reads were adaptor-polluted. The pig reference genome (*Sscrofa11.1*) and the annotation file were downloaded from the Ensembl database (http://www.ensembl.org/index.html). Bowtie2 v2.2.3 was used for building the genome index [20], and clean data were mapped to the reference genome using HISAT2 v2.0.5 [21].

#### *2.5. Identification of LncRNAs*

Novel transcripts were reconstructed using the StringTie software (v1.3.3b) with the default parameters using the mapped clean reads [22], and GffCompare was used to screen out known mRNAs and other non-coding RNAs (rRNA, tRNA, snoRNA, snRNA, known lncRNA, etc.). Furthermore, the known lncRNAs were also figured out through comparative analysis. Subsequently, the novel potential lncRNAs transcripts were identified according to the transcript length (≥200 bp), the exon number (≥2), and the read coverage of each transcript (≥5). Then, we used four kinds of tools to predict the protein-coding potential of novel transcripts to confirm novel lncRNAs, including the Coding-Non-Coding Index (CNCI) [23], Coding Potential Calculator (CPC) [24], PFAM protein domain analysis [25,26], and the Coding Potential Assessment Tool (CPAT) [27]. Finally, the co-existing

transcripts were designated as novel lncRNAs. Moreover, we performed structure and expression characteristic analyses of the novel lncRNAs and known lncRNAs.

#### *2.6. Di*ff*erentially Expressed Gene Analyses*

HTseq v0.6.0 [28] was utilized to calculate the read count for each gene, and RPKM (Reads Per Kilobase Millon Mapped Reads) was used to estimate the expression level of genes in each sample. The formula is shown as

$$\text{RPKM} = \frac{10^6 \times R}{\text{NL}/10^3}. \tag{1}$$

*R* represents the number of reads assigned to a specific gene in each sample, *N* represents the total number of mapped reads in each sample, and L represents the length of a specific gene. Furthermore, DEseq2 [29] was used to identify the differentially expressed genes (DEGs), including protein-coding and non-coding genes, between Bf and Sol groups. DEGSeq (v1.16) [30] was used to determine the DEGs between two paired samples (Bf28 vs. Sol28; Bf35 vs. Sol35; Bf36 vs. Sol36). The *p*-value was adjusted by the Benjamini and Hochberg's approach to control the false discovery rate, and *q* ≤ 0.05 and |log2\_ratio| ≥ 1 were identified as DEGs.

Furthermore, nine lncRNAs were selected for validating the RNA-seq results using qRT-PCR. cDNA was synthesized with the total RNA derived from the same muscle samples for RNA-seq with the Takara PrimeScriptTM II 1st Strand cDNA Synthesis Kit (TaKaRa, Dalian, China), and qRT-PCR was performed on a Step-One Plus Real-Time PCR System (Applied Biosystems, Carlsbad, CA, USA) using the AceQ qPCR SYBR Green Master Mix (Vazyme, Nanjing, China). The primers for quantitative analysis were shown in Additional File 5: Table S1. Relative expression levels were calculated using the 2−∆∆ct value method [31] and porcine *GAPDH* was used for normalization of lncRNAs expression as an endogenous reference gene.

#### *2.7. Target Prediction of DE LncRNAs and GO and KEGG Enrichment Analyses*

The Spearman correlation coefficient between the expression of mRNAs and DE lncRNAs was calculated and the mRNAs with high a Spearman correlation coefficient (r ≥ 0.90 or ≤ −0.90) were selected as the trans-targets of DE lncRNAs, while the mRNAs with a distance of less than 50 kb to DE lncRNAs were selected as the cis-targets based on the possible region of target genes of lncRNAs [32,33]. To understand the biological functions of the target genes of DE lncRNAs, we performed the GO (Gene Ontology, http://geneontology.org/) and KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.kegg.jp/) enrichment analyses. The GO and KEGG enrichment of target genes of DE lncRNAs was implemented by the hypergeometric test in which the *p*-value was adjusted by multiple comparisons as *q*-value. The GO and KEGG terms with *p* < 0.05 were considered to be significantly enriched.

#### *2.8. Expression Patterns of LncRNA MSTRG.42019*

Among these DE lncRNAs, we found that lncRNA *MSTRG.42019* was a promising DE lncRNA affecting skeletal muscle growth and development and muscle fiber types. To validate its expression pattern, we detected its expression levels in Bf and Sol and analyzed its expression patterns in different tissues of adult pigs and 70-day-old fetuses (P70) and *Longissimus dorsi* muscles derived from different developmental stages, including 110-day-old fetuses (P110), the day of birth (D0), and 70 days after birth (D70). Relative expression levels of lncRNA *MSTRG.42019* were calculated using the 2−∆∆ct value method [31] and normalized using porcine *GAPDH* as an endogenous reference gene.

#### *2.9. Correlation Analyses of LncRNAs MSTRG.42019 and Meat Quality Traits*

The skeletal muscle mass and muscle fiber types are closely related to porcine meat quality. To confirm the correlation between lncRNA *MSTRG.42019* and meat quality traits, we randomly selected 20 samples from 279 [(P) × (D)] × [(L) × (Y)] commercial pig populations and detected

the expression levels of *Myosin heavy chain 7* (*MYH7*), *Myosin heavy chain 4* (*MYH4*), and *Myoglobin* (*MyoB*) using qRT-PCR and performed the correlation analyses between the expression of lncRNA *MSTRG.42019* and *MYH7*, *MYH4*, and *MyoB*. Furthermore, we performed the correlation analyses between the expression levels of lncRNA *MSTRG.42019* and meat quality traits, including carcass weight, back fat, intramuscular fat, pH45min, *L*24h, *a*24h, *b*24h, glycolytic potential, and drip loss.

#### *2.10. Construction of LncRNA MSTRG.42019–mRNA Network*

DEGs and the *trans*- or *cis*-targets of DE lncRNAs were derived from the RNA-seq data in this study, and the network, including DEGs and *trans*- or *cis*-targets of DE lncRNAs *MSTRG.42019*, was constructed using the Cytoscape software. The visualized network was edited based on the attribution of each gene [34]. Furthermore, we performed a pathway search for the DE target genes of lncRNA *MSTRG.42019* using the KEGG Mapper tool (https://www.kegg.jp/kegg/tool/map\_pathway2.html).

#### *2.11. Statistical Analyses*

Statistical analyses were performed using the Prism 7 software (GraphPad Software, San Diego, CA, USA), and data were expressed as mean ± SE unless otherwise noted. Differences were tested using a two-tailed unpaired Student's *t*-test or a one-way analysis of variance (ANOVA) with Duncan's test. A *p*-value of <0.05 was considered significantly different.

#### **3. Results**

#### *3.1. Generation of Transcriptome Data*

In this study, we generated six RNA-seq libraries, namely, Bf28, Bf35, Bf36, Sol28, Sol35, and Sol36. The alignment results showed that the average ratio of rRNA mapping was less than 0.20%, and the ratio of clean Q30 bases (those with a base quality of >30 and an error rate of <0.001) in each library was higher than 94%, indicating high-quality raw reads from each library. After strict filtering, more than 15 Gb clean bases data corresponding to more than 100 million reads were obtained from each library after strick filtering, and the ratio of clean reads in each library reached more than 93%. The transcriptome sequence data generated from this study were deposited in NCBI SRA (accession code: PRJNA597666).

#### *3.2. Novel LncRNAs Prediction and Characteristics Analyses*

In this study, 5864 overlapped novel lncRNA transcripts without coding potential were identified using four kinds of predicted tools (Additional File 1: Figure S1). Detailed information for the coding potential of novel predicted lncRNAs are listed in Additional File 6: Table S2. Additional File 6: Table S2-1, Table S2-2, Table S2-3, and Table S2-4 represent the predicted results from CNCI, PFAM protein domain analysis, CPAT, and CPC, respectively. Moreover, our results showed that the exon number of lncRNAs was gathered at 2–4, while the exon number of mRNA was distributed in 1–29 (Additional File 2: Figure S2A). The length distribution showed that the length of lncRNAs was shorter than mRNA and mainly distributed in 200 bp and 2900 bp (Additional File 2: Figure S2B). In addition, our results indicated that the relative expression level of lncRNAs was lower than that of mRNA (Additional File 2: Figure S2C).

#### *3.3. Identification of DE LncRNAs and Target Genes Prediction*

Genomic expression analysis revealed that 4362 novel and 219 known lncRNA genes were detectable from Bf vs. Sol. In these, 3786 novel and 125 known lncRNA genes were detectable from Bf28 vs. Sol28, and 3741 novel and 126 known lncRNA genes were detectable from Bf35 and Sol35, and 3787 novel and 120 known lncRNA genes were detectable from Bf36 and Sol36 (Additional File 7: Table S3). Furthermore, 92 DE lncRNA genes were identified between Bf and Sol, including 35 up-regulated and 57 down-regulated lncRNAs (Figure 1A, Additional File 8: Table S4). Hierarchical clustering showed

that lncRNA expression patterns between Bf and Sol were distinguishable (Figure 1B), indicating a substantial difference exists between the Bf and Sol groups, while a small variation exists among three biological replicates. Moreover, 600 up-regulated and 775 down-regulated lncRNA genes were identified in Bf28 vs. Sol28; 1077 up-regulated and 349 down-regulated lncRNA genes were identified in Bf35 vs. Sol35; and 618 up-regulated and 602 down-regulated lncRNA genes were identified in Bf36 vs. Sol36 (Additional File 8: Table S4). In addition, 246 DE coding genes were identified between Bf and Sol (Additional File 9: Table S5).

**Figure 1.** Statistics and heat map analyses of differentially expressed (DE) long non-coding RNAs (lncRNAs). (**A**) Statistics of DE lncRNAs. The X-axis represents the different compared groups, *Biceps femoris* (Bf) vs. *Soleus* (Sol) indicates the DE lncRNAs from the DEseq2 method; Bf28 vs. Sol28, Bf35 vs. Sol35, and Bf36 vs. Sol36 indicate DE lncRNAs from the DEGseq method. The Y-axis shows the number of DE lncRNAs. (**B**) Heat map analysis of DE lncRNAs between Bf and Sol muscles. Heat map analysis was conducted with 92 overlapped DE lncRNAs among three different comparative groups (Bf28 vs. Sol28, Bf35 vs. Sol35, and Bf36 vs. Sol36). Each column represents a sample and each row represents a DE lncRNA. Yellow and blue gradients indicate an increase and decrease in gene expression abundance, respectively.

To validate the RNA-Seq results, nine DE lncRNAs, including two known lncRNAs and seven novel lncRNAs, were selected to confirm their expression differences between Bf and Sol by qRT-PCR assay. We found that the expression patterns of these lncRNAs were in agreement with that in RNA-Seq (Figure 2A) and the correlation coefficient of two methods was 0.8957 (*p* = 0.0011) (Figure 2B). These results indicate that DE lncRNAs identified from RNA-Seq were reliable. In addition, since lncRNAs exert their functions mainly via interaction with protein-coding genes through *cis* or *trans*, *cis* and *trans* target genes prediction of DE lncRNAs from Bf vs. Sol was performed. A total of 111 *cis* target genes and 9,050 *trans* target genes were predicted (Additional File 10: Table S6).

−△△ **Figure 2.** Validation of DE lncRNAs and correlation analysis. (**A**) Validation of DE lncRNAs by qRT-PCR (*n* = 3). Relative expression levels were calculated using the 2−∆∆ct value method and porcine *GAPDH* was used for normalization of lncRNAs expression levels as an endogenous reference gene. The X-axis represents DE lncRNAs and the Y-axis represents the log<sup>2</sup> (fold change) for qRT-PCR and RNA-Seq. (**B**) Correlation analysis of the expression of DE lncRNAs between qRT-PCR and RNA-Seq. The X and Y-axis represent the log<sup>2</sup> (fold change) measured by qRT-PCR and RNA-Seq, respectively.

#### *3.4. GO and KEGG Pathway Enrichment Analyses of Target Genes of DE LncRNAs*

To understand the function of these DE lncRNAs, we performed the functional enrichment analyses for their target genes. GO analysis results showed that some interesting GO terms related to skeletal muscle fiber properties, such as lipid metabolic process, muscle system process, muscle contraction, muscle structure development, and ATP metabolic process, were significantly enriched in Biological Process (BP); and the mitochondrial part, mitochondrion, mitochondrial membrane part, respiratory chain complex, and mitochondrial membrane were significantly enriched in Cellular Component (CC); the catalytic activity, protein serine/threonine kinase activity, kinase activity, protein kinase activity, and calcium ion binding were significantly enriched in Molecular Function (MF) (Additional File 11: Table S7). KEGG pathway analysis showed that Alzheimer's disease, Parkinson's disease, Huntington's disease, non-alcoholic fatty liver disease, and thermogenesis are the five most significantly enriched pathways; moreover, glycolysis/gluconeogenesis, cardiac muscle contraction, fatty acid metabolism, fatty acid degradation, and pyruvate metabolism pathways were also significantly enriched (Additional file 11: Table S7).

#### *3.5. DE LncRNAs A*ff*ecting Muscle Fiber Types*

In this study, 53 overlapped DE lncRNAs were filtered out from Bf28 vs. Sol28, Bf35 vs. Sol35, Bf36 vs. Sol36, and Bf and Sol (Figure 3A and Additional File 12: Table S8), and the heat map of 53 DE lncRNAs with distinguishable expression patterns is shown in Figure 3B. Among these DE lncRNAs, we noted that the lncRNA *MSTRG.42019* expression level was significantly higher in Sol muscle than that in Bf muscle (Figures 3B and 4A) and was relatively highly expressed in skeletal muscle than that in other tissues in adult pig (Figure 4B and Figure S3A) and fetuses (Figure 4C). In addition, the expression patterns of lncRNA *MSTRG.42019* were shown to up-regulate before birth from 110 days of pregnancy (P110) to the day of birth (D0) and then down-regulated after birth (Figure 4D and Figure S3B).

**Figure 3.** Overlapped and heat map analyses of DE lncRNAs. (**A**) Venn diagram of DE lncRNAs. A total of 53 overlapped DE lncRNAs were obtained from DEseq2 and DEGseq methods. Different color represents a different combination, and the number in the overlapped region represents the overlapped DE lncRNAs number. (**B**) Heat map analysis of 53 DE lncRNAs. Heat map analysis was conducted with 53 overlapped DE lncRNAs from four different comparative groups (Bf28 vs. Sol28, Bf35 vs. Sol35, Bf36 vs. Sol36, and Bf vs. Sol). Each column represents a sample and each row represents a DE lncRNA. Red and green gradients indicate an increase and decrease in gene expression abundance, respectively.

#### *3.6. Correlation between LncRNA MSTRG.42019 Expression and Meat Quality Traits*

To elucidate the relationship between lncRNA *MSTRG.42019* and meat quality traits, we detected the expression levels of lncRNA *MSTRG.42019* and *MYH7*, *MYH4*, and *MyoB* in 20 *Longissimus dorsi* muscles derived from a 279 [(P) × (D)] × [(L) × (Y)] commercial hybrid pig population and performed correlation analyses between the expression of lncRNA *MSTRG.42019* and *MYH7*, *MYH4*, *MyoB*, and meat quality traits. Our results showed that expression levels of lncRNA *MSTRG.42019* were positively correlated with the mRNA expression level of *MYH7* (*r*=0.659, *p*=0.0016) (Figure 5A), with no correlation with the expression levels of *MyoB* and *MYH4* (Additional File 4: Figure S4A,B). Additionally, the expression level of lncRNA *MSTRG.42019* was negatively correlated with glycolytic potential (*r* = −0.5447, *p* = 0.013) (Figure 5B) and drip loss (*r* = −0.5085, *p* = 0.0221) (Figure 5C), respectively. Furthermore, no correlation was observed between the expression of lncRNA *MSTRG.42019* and the carcass weight (Additional File 4: Figure S4C), back fat (Additional File 4: Figure S4D), pH45min (Additional File 4: Figure S4E), *L*24h (Additional File 4: Figure S4F), *a*24h (Additional File 4: Figure S4G), *b*24h (Additional File 4: Figure S4H), and intramuscular fat (Additional File 4: Figure S4I).

−△△ ≤ ≤ **Figure 4.** Expression patterns of lncRNA *MSTRG.42019*. (**A**) Expression of lncRNA *MSTRG.42019* in Bf and Sol muscles. (**B**) Expression profile of lncRNA *MSTRG.42019* in different tissues of adult pigs, *n* = 3. (**C**) Expression profile of lncRNA *MSTRG.42019* in different tissues of 70-day-old fetuses, *n* = 3. (**D**) Expression patterns of lncRNA *MSTRG.42019* in *Longissimus dorsi* muscles derived from different developmental stages, *n* = 3. Expression levels were determined by qRT-PCR and relative expression levels were calculated using the 2−∆∆ct method and normalized to the expression of *GAPDH*. The expression of lncRNA *MSTRG.42019* in the sample of the first column was determined as control and normalized to 1. All data are presented as mean ± SE, and an unpaired Student's *t*-test in the Prism 7 software was performed to evaluate significant differences between Bf and Sol. ANOVA with Duncan's test was used to evaluate significant differences between groups P70 (70 days of pregnancy), P110 (110 days of pregnancy), D0 (the day of birth), and D70 (70 days after birth). \* *p* ≤ 0.05, different letters above the bars indicate significant differences; *p* < 0.01.

**Figure 5.** Correlation analyses between the expression of lncRNA *MSTRG.42019* and muscle fiber-related genes and meat quality traits. (**A**) Correlation between the expression of lncRNA *MSTRG.42019* and *Myosin heavy chain 7* (*MYH7*); (**B**) correlation between the expression of lncRNA *MSTRG.42019* and glycolytic potential of *Longissimus dorsi* muscles; (**C**) correlation between the expression of lncRNA *MSTRG.42019* and drip loss of *Longissimus dorsi* muscles. Twenty *Longissimus dorsi* muscles were derived from a 279 [(P) × (D)] × [(L) × (Y)] commercial hybrid pig population.

#### *3.7. LncRNA MSTRG.42019–mRNA Interaction Network*

To further explore the underlying function of lncRNA *MSTRG.42019*, we constructed a network map of lncRNA *MSTRG.42019* and its target genes and performed the KEGG pathway search for the DE target genes of lncRNA *MSTRG.42019* (Figure 6). The results showed that the *parvalbumin* (*PVALB*) was the down-regulated gene targeted by lncRNA *MSTRG.42019*. Moreover, the up-regulated expressed genes, including *Glycerol-3-phosphate acyltransferase* (*GPAT*), *Tropomyosin-3* (*TPM3*), *R-spondin 3* (*RSPO3*), *Heat shock protein family B member 6* (*HSPB6*), *Leucine-rich glioma inactivated 1* (*LGI1*), *Cholinergic receptor nicotinic alpha 6 subunit* (*CHRNA6*), *Solute carrier family 35 member F1* (*SLC35F1*), *Acyl-CoA dehydrogenase very long chain* (*ACADVL*), and *Microtubule*-*associated serine*/*threonine kinase 2* (*MAST2*) were also predicted as the targets of lncRNA *MSTRG.42019*. In addition, many non-DE genes were targeted by lncRNA *MSTRG.42019*. Furthermore, the KEGG pathway search showed that the DEGs *TPM3*, *RSPO3*, *ACADVL*, and *CHRNA6* were mapped to specific pathways.

**Figure 6.** LncRNA *MSTRG.42019*–target mRNA interaction network and KEGG pathway search. Cytoscape was used to construct the interaction network between lncRNA *MSTRG.42019* and target mRNAs. Red and green represent down-regulated and up-regulated target genes from Bf and Sol, respectively, and gray indicates non-DE target genes. The KEGG pathway search for DE target genes was conducted using the website tool KEGG Mapper (https://www.kegg.jp/kegg/tool/map\_pathway2.html).

#### **4. Discussion**

In this study, we predicted 5864 novel lncRNA transcripts from the six skeletal muscle libraries, and our data showed that the characteristics of exon number, length distribution, and genomic expression level of these lncRNAs are similar to the previous report [35] and distinguished with the mRNAs. These predicted novel lncRNAs may play an important role in skeletal muscle growth and development, but the specific function still needs to be further explored. Moreover, 92 DE lncRNAs were identified between Bf and Sol from the RNA-seq analyses. Bf mainly composed of type IIB muscle fiber is a type of fast-twitch muscle, whereas Sol enriched in type I muscle fiber is a type of slow-twitch muscle [18]. Thus, these DE lncRNAs are the key candidates for the regulation of skeletal muscle fiber types. Certain types of skeletal muscle fiber are known to play a critical role in determining the meat quality postmortem in livestock, such as pH, meat color, and drip loss [6]. Thus, the DE lncRNAs identified in this study are the promising candidates influencing meat quality traits. Moreover, it is known that the alteration of muscle fiber types leads to the disorder of muscle physiological and

metabolic properties, and thereby may cause many human diseases, such as muscular dystrophy, cardiomyopathic disease, and type 2 diabetes [2–5]. Thus, our data might provide some insights into human muscle diseases.

To understand the potential function of the identified lncRNAs, the target genes of DE lncRNAs were subjected to functional enrichment analyses. In the BP category, we found that the ATP metabolic process is one of the significantly enriched GO terms. ATP is the energy currency of living cells. Many biological processes are depended on the energy of ATP hydrolysis, for instance, the chromatin-structure dynamics critical for the regulation of gene expression and the chromosome function relies on ATP-dependent chromatin-remodeling complexes [36,37]. Skeletal muscle growth and development are also ATP-dependent, where the ATP induces muscle hypertrophy in slow muscle Sol via activating mTOR signaling pathway [38]. Furthermore, ATP is involved in the regulation of muscle fiber types and glucose metabolism by activation of P2Y receptors, phosphatidylinositol 3-kinase, Akt, and AS160 [39]. However, the regulatory mechanisms of lncRNAs on ATP metabolism and the effect of the lncRNA-ATP signaling pathway on skeletal muscle growth and muscle fiber conversion are still unknown.

Our KEGG pathway analysis showed that the cardiac muscle contraction pathway is significantly enriched. This result suggests that the DE lncRNAs identified in the skeletal muscle also play an important role in myocardial development. In addition, the glycolysis/gluconeogenesis, closely related to skeletal muscle fiber types, is significantly enriched. Moreover, fatty acid metabolism and fatty acid degradation pathways are also significantly enriched, while recent studies found that the change of fatty acid composition was accompanied by the transformation of skeletal muscle fiber types [40,41]. Thus, it will be interesting to investigate the specific regulatory mechanisms of the skeletal muscle fiber types and fatty acid composition mediated by the DE lncRNAs identified in this study. Different skeletal muscle fibers exhibit different glucose metabolic properties, while the pyruvate metabolism pathway critical for glucose metabolism is significantly enriched. The significantly enriched oxidative phosphorylation is the major source of ATP, while ATP is essential for many biological processes, including the growth and development and the fiber types of skeletal muscle [38,39]. Therefore, it is intriguing to investigate the potential mechanisms of the oxidative phosphorylation pathway regulating muscle fiber types which are mediated by the DE lncRNAs identified in this study. Furthermore, Alzheimer's disease, Huntington's disease, and Parkinson's disease are three of the most significantly enriched pathways, thus the relationship between the skeletal muscle fiber types and these diseases is worth exploring.

To screen the key candidate lncRNAs affecting skeletal muscle fiber types, we performed the overlapped analysis of DE lncRNAs from Bf28 vs. Sol28, Bf35 vs. Sol35, Bf36 vs. Sol36, and Bf vs. Sol and identified 53 overlapped DE lncRNAs. Sol muscle (slow muscle) and Bf muscles (fast muscle) are two different skeletal muscles with various muscle fibers and have different physiological and metabolic properties, such as glycolytic potential. Therefore, these overlapped DE lncRNAs are most likely to regulate the formation of different muscle fiber types and participate in the regulation of glucose metabolism in skeletal muscle. Intriguingly, among these DE lncRNAs, we found that lncRNA *MSTRG.42019* is highly expressed in skeletal muscle and its expression is significantly higher in Sol muscle than in Bf muscle, suggesting that lncRNA *MSTRG.42019* may contribute to skeletal muscle fiber type conversion. Moreover, we found that the expression patterns of lncRNA *MSTRG.42019* were shown to up-regulate before birth from 110 days of pregnancy (P110) to the day of birth (D0) and then down-regulate after birth. However, the complete expression pattern of lncRNA *MSTRG.42019* at different developmental stages needs to be verified using more stages. Overall, our data suggest that lncRNA *MSTRG.42019* may participate in the regulation of skeletal muscle growth and development and are involved in skeletal muscle fibers conversion. However, understanding of the specific regulatory mechanisms warrants further study.

The above studies suggest that lncRNA *MSTRG.42019* is a promising lncRNAs affecting skeletal muscle fiber types, while skeletal muscle fibers are closely related to meat quality traits [42]. Therefore, we subsequently performed a correlation analysis between the lncRNA *MSTRG.42019* expression and meat quality traits and skeletal muscle fiber-related marker genes, *MYH7*, *MYH4*, and *MyoB*. Our results showed that lncRNA *MSTRG.42019* expression is significantly positively correlated with the *MYH7* expression, which is consistent with the expression of lncRNA *MSTRG.42019* higher in the Sol muscle (slow muscle) than that in the Bf muscle (fast muscle). These results suggest that lncRNA *MSTRG.42019* may regulate the formation of slow muscle fiber. Furthermore, we observed that the lncRNA *MSTRG.42019* expression is significantly correlated with glycolytic potential and drip loss. The slow muscle fibers (type I) display stronger oxidative metabolism capacity. On the other hand, fast muscle fibers (type IIb) exhibit higher glycolytic capacity, which is in accordance with the result of the lncRNA *MSTRG.42019* expression level significantly negatively correlated with glycolytic potential. The glycolytic potential is critical for pH decline postmortem, in turn, the change of pH is closely related to drip loss [43,44]. The change of pH in slow muscle with lower glycolytic potential postmortem is relatively moderate when compared with fast muscle with higher glycolytic potential, and the drip loss in slow muscle is relatively lower. The lncRNA *MSTRG.42019* expression level is higher in slow muscle than fast muscle, which is consistent with the result of its expression negatively correlated with the drip loss.

The predicted interaction network is very helpful for further study of the potential regulatory mechanism. In this study, we constructed a network map using the lncRNA *MSTRG.42019* and its target genes. *Parvalbumin* (*PVALB*) is the only down-regulated gene targeted by lncRNA *MSTRG.42019* in the interaction network. PVALB, a Ca2<sup>+</sup> binding protein, is known to play a key role in muscle relaxation and it acts as an intracellular Ca2<sup>+</sup> buffer to determine the duration and magnitude of the activating Ca2<sup>+</sup> signal and the force and duration of contraction [45,46]. *PVALB* is positively associated with relaxation speed in skeletal muscle, but less PVALB expressed in the slow-twitch skeletal muscles, cardiac, or smooth muscle [47]. Moreover, ectopic expression of *PVALB* by injection of its cDNA into the slow-twitch muscle of rats leads to an increased relaxation rate [48]. Conversely, the fast-twitch muscle of the *PVALB*-knockout mouse showed a slower Ca2<sup>+</sup> transient and relaxation rate than the wild-type mouse [49]. Since our result showed that lncRNA *MSTRG.42019* is highly expressed in Sol, we speculated that lncRNA *MSTRG.42019* could promote the formation of slow muscle fiber via Ca2<sup>+</sup> signaling pathway mediated by *PVLAB*. Moreover, Glycerol-3-phosphate acyltransferase (GPAT) is a limited enzyme that can acylate the glycerol 3-phosphate in the first committed step in triacylglycerol (TAG) synthesis phosphate pathway [50]. Therefore, lncRNA *MSTRG.42019* may affect the intramuscular fat content and fatty acid composition by up-regulating *GPAT* expression. *Tropomyosin-3* (*TPM3*) is required for the early steps of myofibril formation in zebrafish [51] and our result showed that porcine lncRNA *MSTRG.42019* is highly expressed in P110, which suggests that the effects of lncRNA *MSTRG.42019* on embryonic or fetal skeletal muscle development might be associated with the *TPM3*. In addition, lncRNA *MSTRG.42019* also may up-regulate the expression of *RSPO3*, *HSPB6*, *LGI1*, *CHRNA6*, *SL35F1*, *ACADVL*, and *MAST2*. Furthermore, the KEGG pathway search showed that the DEGs *TPM3* involves in cardiomyopathy, *RSPO3* participates in Wnt signaling pathway, and *ACADVL* is critical for fatty acid metabolism. However, the relationship between lncRNA *MSTRG.42019* and its target genes, the exact mechanisms of lncRNA *MSTRG.42019*-related growth and development, and the conversion of the fiber types of skeletal muscle still need to be further explored.

In addition, many non-DE genes are targeted by lncRNA *MSTRG.42019*, such as *Cytochrome C Oxidase Subunit 5A* (*COX5A*). *COX5A* encodes a subunit of the respiratory chain complex IV (cytochrome c oxidase) [52] and its expression is associated with niacin-induced changes in fiber type distribution and expression of MHC isoforms. Moreover, we found that lncRNA *MSTRG.42019* targets several ATPase and ATP synthetase genes, including *ATP1A2*, *ATP2B2*, *ATP5E*, *ATP5I*, and *ATP5O*. These results were consistent with the results that the ATP metabolic process is significantly enriched in GO analysis, suggesting that lncRNA *MSTRG.42019* may play a pivotal role in energy metabolism. Collectively, our data provide preliminary evidence that lncRNA *MSTRG.42019* may play an important role in muscle fiber type specification, mitochondrial respiration activity, and energy metabolism.

#### **5. Conclusions**

We identified 92 DE lncRNAs between Bf and Sol which are promising candidates regulating skeletal muscle fibers and muscle fiber-related metabolic processes and found some potential signaling pathways related to skeletal muscle fiber formation and metabolic properties mediated by these DE lncRNAs, such as ATP metabolic process pathway, fatty acid metabolic pathway, and pyruvate metabolism. Furthermore, we identified that a skeletal muscle highly expressed DE lncRNA *MSTRG.42019* and documented that it is closely associated with meat quality traits. Overall, our data provide a solid foundation for an in-depth investigation of the lncRNA regulatory mechanisms of skeletal muscle growth and development and skeletal muscle fiber conversion.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/8/883/s1, Figure S1: Venn diagram of non-coding potential novel lncRNAs. The coding potential of novel lncRNAs was predicted using CNCI, CPC, PFAM, CPAT analysis. Figure S2: Characteristics of lncRNAs and mRNAs. (A) Length distribution of lncRNAs and mRNAs. (B) The number of lncRNAs and mRNAs exon. (C) Distribution of lncRNAs and mRNAs expression. Table S1: Primers used for qRT-PCR analysis. Figure S3: Expression patterns of lncRNA *MSTRG.42019* in Bf and Sol muscles, different tissues of adult pigs, and *Longissimus dorsi* muscles derived from different developmental stages, *n* = 3. Expression levels were determined by qRT-PCR and relative expression levels were calculated using the 2−∆∆ct method and normalized to the expression of *HPRT*. The expression of lncRNA *MSTRG.42019* in the sample of the first column was determined as control and normalized to 1. All data are presented as mean ± SE. Figure S4: Correlation analyses between the expression of lncRNA *MSTRG.42019* and genes and meat quality traits. Correlation between the expression of lncRNA *MSTRG.42019* and *MyoB* (A), *MYH4* (B), carcass weight (C), back fat (D), pH45min (E), *L*24h (F), *a*24h (G), *b*24h (H), and intramuscular fat (I), respectively. Twenty *Longissimus dorsi* muscles were derived from a 279 [(P) × (D)] × [(L) × (Y)] commercial hybrid pig population. Table S2: Novel predicted lncRNA transcripts; Table S3: Genomic expression analysis of all lncRNA genes; Table S4: DE lncRNAs from Bf and Sol groups; Table S5: DE mRNAs from Bf and Sol groups; Table S6: Target gene prediction of lncRNAs; Table S7: GO and KEGG enrichment analyses; Table S8: 53 DE lncRNAs.

**Author Contributions:** W.W. conceived the experiments; R.L., B.L., and A.J. performed the experiments and analyzed the data. Y.C., Z.Z., X.Z., R.L., and W.W. wrote and revised the paper. L.H. and K.-H.K. revised the paper. H.L. reviewed the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation (31591920) and Nanjing Agricultural University Research Start-up Funding (060804008).

**Conflicts of Interest:** The authors declare that they have no competing interests.

#### **Abbreviations**

Bf: *Biceps femoris*; Sol: *Soleus*; DE: differentially expressed; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; lncRNAs: long non-coding RNAs. *MYH4*: *Myosin heavy chain 4*; *MYH7*: *Myosin heavy chain 7*.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Evaluation of the CRISPR**/**Cas9 Genetic Constructs in E**ffi**cient Disruption of Porcine Genes for Xenotransplantation Purposes Along with an Assessment of the O**ff**-Target Mutation Formation**

#### **Natalia Ryczek 1, \* , Magdalena Hryhorowicz 1 , Daniel Lipi ´nski 1 , Joanna Zeyland <sup>1</sup> and Ryszard Słomski 2**


Received: 21 May 2020; Accepted: 24 June 2020; Published: 26 June 2020

**Abstract:** The increasing life expectancy of humans has led to an increase in the number of patients with chronic diseases and organ failure. However, the imbalance between the supply and the demand for human organs is a serious problem in modern transplantology. One of many solutions to overcome this problem is the use of xenotransplantation. The domestic pig (*Sus scrofa domestica*) is currently considered as the most suitable for human organ procurement. However, there are discrepancies between pigs and humans that lead to the creation of immunological barriers preventing the direct xenograft. The introduction of appropriate modifications to the pig genome to prevent xenograft rejection is crucial in xenotransplantation studies. In this study, porcine *GGTA1*, *CMAH*, β*4GalNT2*, *vWF*, *ASGR1* genes were selected to introduce genetic modifications. The evaluation of three selected gRNAs within each gene was obtained, which enabled the selection of the best site for efficient introduction of changes. Modifications were examined after nucleofection of porcine primary kidney fibroblasts with CRISPR/Cas9 system genetic constructs, followed by the tracking of indels by decomposition (TIDE) analysis. In addition, off-target analysis was carried out for selected best gRNAs using the TIDE tool, which is new in the research conducted so far and shows the utility of this tool in these studies.

**Keywords:** xenoantigen; coagulation system dysregulation; CRISPR/Cas9 system; genome modifications; non-homologous DNA ends joining (NHEJ); TIDE analysis; off-target

#### **1. Introduction**

The increasing life expectancy of humans has led to an increase in the number of patients with chronic diseases and organ failure. Organ transplantation is an effective approach in the treatment of the end-stage organ failure. However, the imbalance between the supply and the demand for human organs is a serious problem in modern transplantology. In view of the above, it is assumed that alternative approaches would reduce or eliminate this problem. One of many solutions is the use of xenotransplantation [1,2].

Xenotransplantation is any procedure that involves the transplantation, implantation, or infusion of a recipient (in this case a human) with zoonotic cells, tissues, or organs. In addition, therapies using human body fluids, tissues, organs, or cells that have had ex vivo contact with animal organs, tissues, or cells are also subject to this term [3]. The domestic pig (*S. s. domestica*) is currently considered as the most suitable species for human organ procurement. The reasons for choosing the pig as a donor animal include its relatively large litter size and short puberty, the size and physiological similarity of its organs to human organs, and the low risk of xenozoonosis transmission [4]. However, there are discrepancies between pigs and humans that lead to the creation of immunological barriers preventing the direct xenograft. These differences cause xenograft rejection [5]. The introduction of appropriate modifications into the porcine genome to prevent xenograft rejection is crucial in xenotransplantation studies. There are three main types of xenograft rejection consecutively—hyperacute rejection (HAR), acute humoral xenograft rejection (AHXR), and acute cellular rejection (ACR) [6–8]. In addition, dysregulation of the recipient's coagulation system is a barrier in xenotransplantation, which appears in parallel with HAR, AXHR, and ACR [9,10]. Accordingly, this study focuses on preventing HAR and coagulation dysregulation in xenotransplantation.

HAR is a process that occurs within a few minutes to several hours after xenotransplantation. HAR is a type of humoral rejection mediated by IgM antibodies naturally occurring in the recipient. The association of recipient antibodies with epitopes present on the porcine endothelial cells activates the complement system [11]. To date, three epitopes have been described that constitute a barrier to xenotransplantation and are responsible for HAR. Galactose-α1,3-galactose (α-Gal) is the major xenoantigen involved in xenograft hyperacute rejection. This epitope is synthesized by α-1,3-galactosyltransferase encoded by the porcine *GGTA1* gene [11,12]. The second significant epitope is Neu5Gc, which is formed in an enzymatic reaction involving cytidine monophospho-N-acetylneuraminic acid hydroxylase encoded by the porcine *CMAH* gene [13,14]. β-1,4 N-acetylgalactosaminyltransferase 2 encoded by the porcine β*4GalNT2* gene is involved in the synthesis of Sd(a) antigen [15–17].

Dysregulation of the recipient's coagulation system is one of the main barriers in xenotransplantation. It causes the development of thrombotic microangiopathy in xenograft. Features of thrombotic microangiopathy include fibrin deposition and platelet aggregation, which causes thrombosis within the transplant blood vessels and ultimately ischemic damage [18]. With the development of disorders of the coagulation system, systemic consumption coagulopathy is often observed in the recipient, which can lead to his death [19]. There are also factors expressed in specific organs that pose a problem in xenotransplantation. One of them is the von Willebrand factor encoded by the porcine *vWF* gene, which is involved in the pathogenesis of transplant failure in lung xenotransplantation [20,21]. Fatal thrombocytopenia accompanying liver xenotransplantation is another barrier resulting from differences in the coagulation system. Human platelets are bound by asialoglycoprotein receptor (ASGR) encoded by porcine *ASGR1* and *ASGR2* genes. They are expressed in liver sinusoidal endothelial cells (LSEC) [22,23].

The described research focuses on introducing changes within genes that are involved in the immune response that is the biggest barrier in xenotransplantation. Porcine *GGTA1*, *CMAH*, and β*4GalNT2* genes involved in the synthesis of epitopes responsible for the xenograft hyperacute rejection. Additionally, two genes responsible for the synthesis of porcine proteins causing dysregulation of the recipient's coagulation system have been chosen—the porcine *vWF* gene and porcine *ASGR1* gene.

The use of the CRISPR/Cas9 system for genome modification has led to enormous progress in the field of animal transgenesis [24]. Through projectable short gRNA, it is possible to precisely generate double-strand DNA breaks (DSBs). DSBs of DNA generated by the CRISPR/Cas9 system are repaired by NHEJ or by homologous directed repair (HDR) after delivery of properly designed donor DNA [25,26]. There are, however, some restrictions related to the use of the CRISPR/Cas9 system. One of them is off-target effect [27]. In this study, the usefulness of the TIDE online tool has been confirmed not only to analyze indel-type mutations arising as a result of DSBs repair via NHEJ but also to analyze additional Cas9 hydrolysis sites linked to a specific gRNA—off-target sites.

#### **2. Materials and Methods**

#### *2.1. Selection of Short Oligonucleotides*

Short oligonucleotides (gRNA) and PCR primers were selected using the Benchling platform, San Francisco, CA, USA (www.benchling.com). Nucleotides have been added to the selected gRNA sequences at the 5' and 3' ends for molecular cloning. The designed oligonucleotides were ordered from the Institute of Biochemistry and Biophysics Polish Academy of Sciences.

#### *2.2. Preparation of Genetic Constructions in the CRISPR*/*Cas9 System*

The constructions were prepared using the vector *pSpCas9(BB)-2A-Puro (PX459) V2.0*, which was a gift from Feng Zhang (Addgene plasmid #62988; http://n2t.net/addgene:62988; RRID: Addgene\_62988). For each pair of oligonucleotides, a hybridization mixture containing 100 µM F and R gRNA sequences was prepared. The mixtures were incubated 5 min at 95 ◦C and then incubated at room temperature for 10 min. Plasmid DNA was hydrolyzed with the restriction enzyme *Bbs*I-HF® (New England Biolabs, Ipswich, MA, USA) for 1.5 h at 37 ◦C. To combine oligonucleotides after hybridization with plasmids in linear form, ligation was performed using T4 DNA Ligase (New England Biolabs, Ipswich, MA, USA). The mixtures were incubated for 18 h at 16 ◦C, followed by additional hydrolysis with the use of enzyme *Bbs*I-HF® for 30 min at 37 ◦C. Then *Escherichia coli* bacterial cells were transformed with purified constructions and positive clones were selected. Plasmid DNA isolation was performed using the Plasmid Maxi Kit (Qiagen, Germantown, MD, USA).

#### *2.3. Isolation and Culture of Porcine Primary Kidney Cells In Vitro*

For the isolation of porcine primary kidney fibroblasts, a 2 × 2 × 2 cm kidney cortical tissue was excised. Then, after tissue fragmentation, enzymatic-mechanical disintegration was carried out using collagenase II (Sigma-Aldrich, Darmstadt, Germany) and a magnetic stirrer. Then the incubation was carried out for 60 min while heating the mixture to 37 ◦C. The mixture was then filtered through a filter (0.5 mm pore diameter) and washed several times in the culture medium (DMEM (Sigma-Aldrich, Darmstadt, Germany), 20% FBS (Sigma-Aldrich, Darmstadt, Germany), 1% MEM Non-Essential Amino Acid Solution (100×) (Sigma-Aldrich, Darmstadt, Germany)), 1% Antibiotic Antimycotic Solution (100×) (Sigma-Aldrich, Darmstadt, Germany), 1% Sodium pyruvate solution (100 mM) (Sigma-Aldrich, Darmstadt, Germany), 1% L-Glutamine solution (200 mM) (Sigma-Aldrich, Darmstadt, Germany)). In vitro cell culture was carried out under standard aseptic conditions (5% CO2, 37 ◦C).

#### *2.4. Nucleofection*

Primary porcine kidney fibroblasts were detached from the surface of T-flasks by Accutase® solution (Sigma-Aldrich, Darmstadt, Germany) and counted using a ScepterTM Handheld Automated Cell Counter, version 2.0 (Merck, Darmstadt, Germany). One million cells were used for the transfection. Nucleofection was carried out using the Mouse Embryonic Fibroblast Nucleofector™ Kit 1 (Lonza, Basel, Switzerland) and program T-007 on Amaxa™ Nucleofector™ II device (Lonza, Basel, Switzerland).

#### *2.5. Antibiotic Selection*

After 24 h since performing nucleofection, the culture medium has been changed to selective culture medium with the addition of puromycin at a concentration of 1 µg/mL. Incubation was then carried out for 48 h, after which the selection medium was removed, and standard medium was added again. The culture was carried out until the expected cell confluency was obtained.

#### *2.6. Analysis of Introduced Genetic Modifications*

Prior to analysis of the introduced genetic modifications, DNA isolation from modified primary porcine kidney fibroblasts was performed using the DNeasy Blood & Tissue Kit (Qiagen, Germantown, MD, USA). Then PCR reactions were performed using the StartWarm HS-PCR Mix (A&A Biotechnology, Gdynia, Poland) to amplify DNA fragments that include modified *loci* and DNA fragments containing potential off-target sites. The purified PCR products were sequenced in the Sequencing Laboratory of the Faculty of Biology at the Adam Mickiewicz University in Poznan.

#### *2.7. Sequencing Analysis Using the TIDE Tool*

The presence of modifications at the target locus and off-target sites after nucleofection of primary porcine kidney fibroblasts was identified using the TIDE online tool, version 2.0.1 by The Netherlands Cancer Institute, Amsterdam, Netherlands (https://tide.deskgen.com/).

#### **3. Results**

#### *3.1. Selection of the Potential Modifications Location in Porcine Genome*

The first stage of research was to find the best bioinformatically selected gRNA using the Benchling platform. The choice was made based on several guidelines: the localization of the introduced modifications, the values of the on-target score (from 0 to 100—the higher score, the better) and the off-targets score (from 0 to 100—the higher the score, the lower the chance of additional genomic cleavage sites). The localization of the three selected gRNAs for each of the tested *loci* in the porcine genome is shown in Table 1.


**Table 1.** Localization of the potential modification *loci* in porcine genome.

<sup>1</sup> Based on NCBI: Sus scrofa isolate TJ Tabasco breed Duroc chromosome 12, Sscrofa11.1, whole genome shotgun sequence, GCF\_000003025.6.

#### *3.2. Analysis of On-Target Modification Sites*

#### 3.2.1. Nucleofection Efficiency

To obtain genetically modified primary porcine renal fibroblasts, nucleofections were performed. Transfection efficiency for isolated cells was checked using the *pmaxGFPTM Vector*from the nucleofection kit and visualized using a ZOE Fluorescent Cell Imager (Figure 1). After counting 10,000 cells from multiple views, transfection efficiency was estimated to be around 65–70%.

(**a**) (**b**)

(**c**)

μ **Figure 1.** Nucleofection efficiency obtained on porcine primary kidney fibroblasts. Cells visualization was performed using a ZOE Fluorescent Cell Imager 24 h after nucleofection—(**a**) brightfield; (**b**) fluorescence detection lamp; (**c**) merge of view (**a**,**b**). The scale is 100 µm.

3.2.2. The Efficiency of Introducing Modifications within the Examined Porcine Genes Using Genetic Constructions Selected as the Best

PCR products obtained on DNA template isolated from modified and unmodified porcine primary kidney fibroblasts were purified using a CleanUp kit (A&A Biotechnology, Gdynia, Poland). Then, sequencing was commissioned, the results of which were analyzed using the TIDE online tool, version 2.0.1. One, best gRNA from the three directed at each of the examined genes was selected.

The gGGTA1 F1/R1 was chosen as the best for the disruption of the porcine *GGTA1* gene. The gGGTA1 F1/R1, when combined with Cas9, enabled on modification of porcine primary kidney fibroblasts with total efficiency of 70.1%. A statistically significant mutation occurring with the highest frequency was a single nucleotide deletion in 51.1% sequences. It was determined that the insertion of one nucleotide occurred at 11.6% (adenosine nucleotide was incorporated in 47.6% sequences).

From the genetic constructs of the CRISPR/Cas9 system directed to the porcine *CMAH* gene, the one containing gCMAH F3/R3 was chosen as the best, as it enabled obtaining 61.2% total efficiency of modification. The most common mutation was the insertion of a single nucleotide, occurring in 57% of the tested sequences. In 93.6% of sequences, the inserted nucleotide was the cytidine.

*β* β The next gene tested was porcine β*4GalNT2*. For genetic construction containing gβ4GalNT2 F3/R3, a total modification efficiency of 45.2% was obtained. The most common mutation was a deletion of one nucleotide, which occurred in 28.8% sequences. The second most common mutation was the insertion of a single nucleotide, occurring in 8.4% of cases. Adenosine nucleotide was inserted in 79.8% of sequences.

Another analysis was performed for genetic constructions directed at the porcine *vWF* gene. A total efficiency of 85.1% was obtained for the plasmid containing gvWF F2/R2. The most common deletion of one nucleotide occurred with a frequency of 39.9%. The second most common mutation was the insertion of one nucleotide, which occurred in 34.3% of sequences. An adenosine nucleotide was inserted in 83.6% of sequences.

The last gene tested was porcine *ASGR1*. A total efficiency of 80.5% was obtained for the plasmid containing gASGR1 F3/R3. The most common deletion of one nucleotide occurred in 58.6% of sequences. The second most common mutation was the deletion of two nucleotides.

Detailed results of indel spectrum and inserted nucleotide probability for all chosen as the best gRNAs are presented in Figure 2.

**Figure 2.** *Cont.*

(**e**) gASGR1 F3/R3

β **Figure 2.** The indel spectrum and inserted nucleotide probability results for the CRISPR/Cas9 genetic constructs containing gRNA chosen as the best for disruption of tested porcine genes. Results obtained after use the plasmids with (**a**) gGGTA1 F1/R1; (**b**) gCMAH F3/R3; (**c**) gβ4GalNT2 F3/R3; (**d**) gvWF F2/R2; (**e**) gASGR1 F3/R3.

The indel spectrum and inserted nucleotide probability results, alignment, and decomposition quality controls for the other CRISPR/Cas9 genetic constructs containing gRNA tested for disruption of the porcine genes are present in the Supplement (Figure S1). The alignment and decomposition quality controls for the CRISPR/Cas9 genetic constructs containing gRNA chosen as the best for the disruption of the porcine genes are present in the Supplement (Figure S2).

3.2.3. Comparison of Bioinformatically Predicted DNA Diruption Efficiency Results with Those Obtained in the In Vitro Cultured Cells

The total efficiency results of modifications obtained using the CRISPR/Cas9 system predicted during bioinformatics analysis in silico were compared with those obtained in in vitro cultured porcine primary kidney fibroblasts. Comparison of the results for all tested construction variants is shown in Table 2.

**Table 2.** Comparison of the results of total efficiency predicted in silico with the results obtained in the in vitro cultured cells.



**Table 2.** *Cont.*

\* the gRNA together with total efficiency was marked, which was chosen as the best for disruption of the studied genes.

#### *3.3. Analysis of the Potential O*ff*-Target Sites in Porcine Genome*

#### 3.3.1. Selection of the Potential Off-Target Sites Location

After selecting the best genetic constructs containing gRNAs that mediate in the efficient disruption of studied porcine loci, off-target sites were predicted. For each targeted modification site, a minimum of three loci have been selected that can lead to mutations outside the target locus using the Benchling internet platform. The main criterion considered when choosing the tested off-target sites was the score showing the probability of a DNA break at a given off-target site by the testes construct is equal to or higher than 1.0. In addition, all off-target loci predicted in the coding sequences for selected constructs were checked. The list of tested potential off-target sites is presented in Table 3.


**Table 3.** A summary of bioinformatically predicted potential off-target sites for selected genetic constructs.


**Table 3.** *Cont.*

Nucleotides not complementary to the target template for a given gRNA were determined with the bold font. \* The number of the selected off-target site.

#### 3.3.2. TIDE Analysis of the Chosen Potential Off-Target Sites

The presence of unwanted changes—the indel mutations, in eight off-target sites from the 19 tested loci was confirmed using the TIDE tool. The off-target sites were confirmed in *loci* numbers 1, 4, 5, 10, 11, 14, 18, and 19. Only in one locus the occurred modification was statistically significant at the level *p* < 0.001 and it was site number 1. It was determined that the off-target site number 1 for the genetic construct of gGGTA1 F1/R1 was cut with a total efficiency of 3.9%. A mutation arising with the statistical significance was the insertion of one nucleotide, which occurred in 1.7% sequences. Guanosine nucleotide was inserted in the 50.8% sequences. The indel spectrum and inserted nucleotide probability for the number 1 off-target locus is presented in Figure 3.

**Figure 3.** The indel spectrum and inserted nucleotide probability result for the number 1 off-target locus after the use of CRISPR/Cas9 genetic construct containing gGGTA1 F1/R1 chosen as the best for disruption of porcine *GGTA1* gene.

3.3.3. Comparison of Bioinformatically Predicted DNA Diruption Efficiency Results in the Off-Target Sites with Those Obtained in the In Vitro Cultured Cells

The efficiency results of off-target mutation using the CRISPR/Cas9 system obtained during bioinformatics analysis were compared with those obtained in in vitro cultures. The results are summarized in Table 4.


**Table 4.** Comparison of bioinformatic analysis with the results of laboratory analyses—analysis of the off-target sites presence.

\* The number of the selected off-target site.

#### **4. Discussion**

The research presented in this paper shows the importance of the verification of the bioinformatically selected gRNAs in the context of the modifications obtaining efficiency. To compare the efficiency of individual gRNAs in modifications obtaining in cells cultured in vitro, it was necessary to use the third-generation CRISPR/Cas9 system, which enables to carry out antibiotic selection using puromycin after transfection. This process eliminates the impact of the transfection efficiency associated with the chosen method on the efficiency of obtaining modifications via individual gRNAs [28–30]. After antibiotic selection, a population of positively transfected cells after nucleofection was examined. In this way, it was possible to determine the efficiency of indel mutation formation at target sites for specific gRNAs. It was shown that the results obtained in the cells prepared in this way slightly coincided with bioinformatics predictions.

Therefore, the selection of gRNA itself, which would mediate the disruption of specific genomic site should be given the great importance. There are many different online tools that enable gRNA design. The limiting factor are the available databases related to the genomes of various organisms. Thus, the selection of the appropriate tool for prediction of Cas9 DNA hydrolysis target sites is based on the chosen research model [31,32].

The results of sequencing of PCR products that were amplified within the target sites of designed gRNAs were analyzed using the TIDE tool. This method makes it easy to determine, based on the results of sequencing, the indel mutation profile that occurs in a population of genetically modified cells using the CRISPR/Cas9 system. The indel detection by amplicon analysis (IDAA) method is also used for this purpose. It is based on three-primer amplicon labeling and detection by capillary electrophoresis [33]. The sensitivity and precision of both methods to determine the insertion-deletion profile of the modification is similar. The choice of method used depends on the researcher's preferences and available infrastructure [34,35].

Based on the results obtained in this study, the best targeted sites were selected within the genes tested, thanks to which it is possible to obtain modifications efficiently. The assessment of individual gRNAs is very important in the context of obtaining simultaneous multimodification of the porcine genome for xenotransplantation purposes. This is the latest goal of researchers in this field. Research related to this work is part of this trend. Modifications used in the research are aimed at counteracting the two immune barriers existing in xenotransplantation. The knock-out of the porcine *GGTA1*, *CMAH* and β*4GalNT2* genes aims at the elimination of the hyperacute xenograft rejection. Prevention of the coagulation dysregulation can be achieved by the disruption of the porcine *vWF* gene for lung xenotransplantation and knock-out of the porcine *ASGR1* gene for liver xenotransplantation.

An important aspect related to these results is the ethical effect. Preliminary evaluation of bioinformatic data allows to limit the participation of animals in research. This is in line with the Polish recommendations of the National Ethics Committee for Animal Experiments described in Resolution No. 14/2017, as well as with international guidelines and trends.

This study also analyzes the off-target sites generated by Cas9 nuclease in combination with the best gRNAs selected. The formation of additional DNA hydrolysis sites by Cas9 is a common problem observed in CRISPR/Cas9-related studies. Off-target mutations are the biggest threat associated with the use of the new genome editing technology. In total, 19 potential off-target sites were tested for genetic constructs selected during experiments (Figure 3, Figures S3–S7). Off-target mutations were confirmed at eight loci. Only in one of them the changes occurred with statistical significance at the level of *p* < 0.001. This was site No. 1 (for genetic construction containing gGGTA1 F1/R1). Eight potential off-target sites were in the coding sequences of the porcine genome. Experimentally, in genetically modified porcine primary kidney fibroblast cultures, the presence of the off-target mutations in two gene coding sequences was confirmed. The first locus in which off-target mutation has been reviled after the use of the genetic construction with gvWF F2/R2 lies within the porcine *COMT* gene. Another one confirmed off-target site for gASGR1 F3/R3 containing constructs is within the porcine *C1orf210* gene. These loci must be checked after receiving genetically modified animals using obtained using these two genetic constructs.

The obtained results correlate with the research that describe the influence of the non-complementary nucleotides (mismatch nucleotides) position between gRNA and the sequence of the potential off-target site. The efficiency of the off-target mutation formation is higher when incomplementarity occurs in the eighth nucleotide and in the first three nucleotides of the gRNA sequence (5′→3 ′ ). For nucleotide in the eighth position (5′→3 ′ ) such a relationship was confirmed for two out of eight off-target sites (numbers 1 and 19) for the examined gRNAs in this study. In addition, as many as seven out of eight off-target sites (numbers 1, 4, 5, 10, 11, 18, and 19) had a mismatch of at least one nucleotide in the first, second, or third position of the gRNA sequence (5′→3 ′ ) [36]. Interestingly, this study also revealed that in three out of eight confirmed off-target sites, complementarity was found for the seventh position nucleotide. Studies show that the CRISPR/Cas9 system tolerates three to five mismatches in the distal gRNA region from the protospacer adjacent motif (PAM) sequence. In addition, it has been reported that the complementary alignment of the ten nucleotides of the proximal gRNA region from the PAM sequence is sufficient to mediate the Cas9 nuclease DNA hydrolysis [37]. It has also been shown that the occurrence of nucleotide incomplementarity at positions 15, 16, and 17 of the gRNA sequence (5′→3 ′ ) to a potential off-target site abolishes CRISPR/Cas9 system activity [38].

The research presented above proved the usefulness of the TIDE online tool for off-target sites checking. There are other, more accurate methods used for this purpose. One of them is DISCOVER-Seq (discovery of in situ Cas off-targets and verification by sequencing), which uses recruitment of the factors involved in DNA repair. To this end, the binding of double-strand break DNA repair protein (MRE11) is tracked. Thanks to this it is possible to check the double-strand DNA breaks in the whole genome [39]. Another method is CIRCLE-seq (circularization for in vitro reporting of cleavage effects by sequencing), it consists of next generation sequencing and leads to the analysis of the presence of off-target sites throughout the genome [40]. Both above methods have many advantages, so it would be worth to additionally perform these analyzes for the best genetic constructions selected in this paper. However, their use is only possible in in vitro cell culture studies and it is not possible to evaluate off-target sites in genetically modified animals. In contrast, the TIDE tool enables quick and cheap recognition of basic off-target sites, and its use can apply to both cellular and model animals research.

New approaches have emerged to reduce the risk of off-target mutations after using the CRISPR/Cas9 system. One of them is the use of new algorithms in bioinformatics tools, thanks to which it is possible to more accurately assess the efficiency of modifications within individual off-target sites [41–44]. Another approach is to use modified Cas9 with nickase activity. It has been proven that introducing modifications with the CRISPR/Cas9 system modified in such a way reduces the number of unwanted mutations [45,46]. The second approach is to provide Cas9 protein and gRNA in the form of a ribonucleoprotein complex (RNP). It has been shown that this method of delivery of the CRISPR/Cas9 system enabled obtaining the modifications at the target site with high efficiency. The number of the off-target mutations that have arisen has significantly decreased. However, the use of the RNP complex has a major drawback. There is a problem with the number of complexes delivered. Very high concentrations are used because only such concentrations can guarantee that some of them will function in the cell nuclei of genetically modified cell cultures in vitro. High concentration of the RNP complexes may have a cytotoxic effect on some cell lines [47–51]. Accordingly, ribonucleoprotein complexes can only be used in some experiments.

#### **5. Conclusions**

The research presented above enabled the usefulness assessment of the CRISPR/Cas9 system genetic constructions containing gRNAs to obtain modifications for xenotransplantation purposes. Selected gRNAs, which in combination with Cas9 nuclease enable for the efficient disruption of the studied genes, can be used to obtain genetically modified pigs. To obtain an effect associated with counteracting the immune response in the recipient—the primate animal models or human, all the modifications tested should be present in the porcine genome simultaneously. The genetically modified porcine primary kidney fibroblasts cells nuclei for the SCNT procedure or verified genetic constructs for microinjection of the porcine zygotes can be used to obtain the desired porcine genome modifications. After receiving genetically modified animals, it is necessary to check off-target sites, which presence has been confirmed by the above analyzes.

**Supplementary Materials:** The indel spectrum results, insertion nucleotides probability results, alignment and decomposition quality controls for studied genetic constructs are available online at http://www.mdpi.com/ 2073-4425/11/6/713/s1, Figure S1: The indel spectrum and inserted nucleotide probability results, alignment, and decomposition quality controls for the other CRISPR/Cas9 genetic constructs containing gRNA tested for disruption of the porcine genes.; Figure S2: The alignment and decomposition quality controls for the CRISPR/Cas9 genetic constructs containing gRNA chosen as the best for the disruption of the porcine genes; Figure S3: The indel spectrum for the numbers 2–4 off-target loci after the use of CRISPR/Cas9 genetic construct containing gGGTA1 F1/R1 chosen as the best for disruption of porcine *GGTA1* gene; Figure S4: The indel spectrum for the numbers 5–7 off-target loci after the use of CRISPR/Cas9 genetic construct containing gCMAH F3/R3 chosen as the best for disruption of porcine *CMAH* gene; Figure S5: The indel spectrum for the number 8–10 off-target loci after the use of CRISPR/Cas9 genetic construct containing gβ4GalNT2 F3/R3 chosen as the best for disruption of porcine β*4GalNT2* gene; Figure S6: The indel spectrum for the numbers 11–14 off-target loci after the use of CRISPR/Cas9 genetic construct containing gvWF F2/R2 chosen as the best for disruption of porcine *vWF* gene; Figure S7: The indel spectrum for the numbers 15–19 off-target loci after the use of CRISPR/Cas9 genetic construct containing gASGR1 F3/R3 chosen as the best for disruption of porcine *ASGR1* gene.

**Author Contributions:** Investigation and writing—original draft, review and editing, N.R.; supervision and conceptualization, M.H., D.L.; supervision, J.Z.; funding acquisition and supervision, R.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by the National Centre for Research and Development (Grant No. INNOMED/I/17/NCBR/2014) within the framework of the INNOMED programme entitled "Development of an Innovative Technology Using Transgenic Porcine Tissues for Bio-Medical Purposes," acronym: MEDPIG. The publication was co-financed within the framework of a Ministry of Science and Higher Education program as "Regional Initiative Excellence" in the years 2019-2022, Project No. 005/RID/2018/19.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Review*
