**Pig Genomics and Genetics**

Editors

**Katarzyna Pi ´orkowska Katarzyna Ropka-Molik**

MDPI ' Basel ' Beijing ' Wuhan ' Barcelona ' Belgrade ' Manchester ' Tokyo ' Cluj ' Tianjin

*Editors* Katarzyna Piorkowska ´ Animal Molecular Biology National Research Institute of Animal Production Cracow Poland

Katarzyna Ropka-Molik Animal Molecular Biology National Research Institute of Animal Production Cracow Poland

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Genes* (ISSN 2073-4425) (available at: www.mdpi.com/journal/genes/special issues/Pig Genomics Genetics).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-5358-0 (Hbk) ISBN 978-3-0365-5357-3 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**


Generation of Marker-Free *pbd-2* Knock-in Pigs Using the CRISPR/Cas9 and Cre/loxP Systems Reprinted from: *Genes* **2020**, *11*, 951, doi:10.3390/genes11080951 . . . . . . . . . . . . . . . . . . . **153**


### *Editorial* **Pig Genomics and Genetics**

**Katarzyna Piórkowska \* and Katarzyna Ropka-Molik**

National Research Institute of Animal Production, Animal Molecular Biology, 31-047 Cracow, Poland; katarzyna.ropka@iz.edu.pl

**\*** Correspondence: katarzyna.piorkowska@iz.edu.pl

The pig (Sus scrofa) is the most popular large farm animal in the world. They are frequently used as animal models for human medical research due to high biological similarity to humans, such as body proportions [1,2], metabolic process [3,4], adipose tissue distribution and adipocyte size [5]. In addition, both species reveal also a high genetic analogy: the human genome is composed of 3.5 billion bp, the pig genome of 3.0 billion bp; 21,630 protein-coding genes were identified in pigs, while in humans, this is 20,310 [6,7].

On the other hand, molecular biology methods assist agricultural progress, for example, in pig production and breeding. In addition, since the reference genome sequence of the domestic pig was assembled in 2012, the identification processes of crucial phenotypic traits and search of genetic markers for selection have been significantly refined, including the newest wide-range high-throughput techniques. The use of these new genomic tools has the advantage of generating information about multiple genes and gene products in parallel, which makes it possible to identify pathways and gene interactions [8,9]. This approach provides insight into the epistatic effects of genes that could improve understanding of the genetic component of pig phenotype. At first, DNA microarray that is broadly used to date, supports livestock production by predicting the potential genetic breeding value of farm animals [10]. Microarray approach also serves as a research tool in pig breeding, as well. For example, Lee et al. [11] used it to prove that the porcine immune system was affected by different breeding environments, suggesting the importance of controlling microbes in the animal room for qualified research. Another kind of microarray is used to identification of gene expression. In the Sun et al. [12] study, the authors applied cDNA microarray to identify differentially expressed genes (DEGs) between two Chinese pig breeds, pinpointing the association between BAX and BMPR1B genes with litter size. Such methodology allows to highlight potential genetic markers which can used in the pig industry. However, this method, due to numerous limitations in data analysis [13] and the possibility to identify only profiles predefined transcripts/genes through hybridization [14], is eagerly replenished by 'omic' approaches. Omic methods integrate structural and functional genomics and relate them with phenotypic data for farm animals, including pigs [8]. They offer the comprehensive detection of the whole transcriptome, genome, proteome, etc. [15]. Next-generation sequencing (NGS) methods using high throughput platforms identify genetic and transcriptomic components by sequencing long hundrednucleotide reads and then mapping them to the reference genome [15]. Using this approach, Piórkowska et al. [16] pinpointed a new gene cluster involved in porcine meat quality determination via regulating cell proliferation and differentiation and calcium-binding. In turn, more advanced tool PacBio sequencing platform providing ultra-long sequencing reads, allow in the more precise manner identifying gene mutations, new transcripts and gene candidates throughout the whole genome, transcriptome, or epigenome and estimating quantitative traits important for breeding as well as the genetic backgrounds of inherited diseases. However, PacBio is a very expensive method and for now it is applied mainly to improve genome reference, also included pig genome [17].

In this Special Issue, we will present the state of the art in the field of pig genetics and genomics, including the identification of gene candidates linked to important pig

**Citation:** Piórkowska, K.; Ropka-Molik, K. Pig Genomics and Genetics. *Genes* **2021**, *12*, 1692. https://doi.org/10.3390/ genes12111692

Received: 15 October 2021 Accepted: 21 October 2021 Published: 25 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

traits and to nutritional modifications, with the aim of collecting the most recent advances. Manuscripts focusing on high-throughput methodologies, such as RNA sequencing, ATACseq, MACE-seq, chip-seq and RRBS and covering other fields of pig genetics are included.

**Conflicts of Interest:** None of the authors has a financial or other relationship with other people or organizations that may inappropriately influence this work.

#### **References**


### *Article* **Phenotypic and Genomic Analysis of Cystic Hygroma in Pigs**

**Anna Letko <sup>1</sup> , Alexandria Marie Schauer 2 , Martijn F. L. Derks 3,4 , Llorenç Grau-Roma 2 , Cord Drögemüller 1 and Alexander Grahofer 5, \***


**Abstract:** Cystic hygroma is a malformation of the lymphatic and vascular system and is recognized as a benign congenital tumor that affects humans and animals in the perinatal period. This congenital disorder is rarely described in animals, and until today, cystic hygroma in pigs has not been described in the literature. In a purebred Piètrain litter with twelve live-born piglets, cystic hygroma was noticed on the rump of two male pigs within the first week of life. In addition, a third case of a crossbred weaner (Large White × Landrace) was detected during a herd examination. To rule out common differential diagnoses, e.g., abscess or hematoma, further clinical and pathological investigations were conducted. During clinical examination, a painless and soft mass, which was compressible, was detected on the rump of all affected animals. The ultra-sonographic examination revealed a fluid-filled and cavernous subcutaneous structure. In addi-tion, a puncture of the cyst was conducted, revealing a serosanguinous fluid with negative bacte-riological culture. In all cases, a necropsy was performed, showing that the animals had fluid-filled cysts lined by well-differentiated lymphatic endothelium. Based on the clinicopathological examination, cystic hygroma was diagnosed. Furthermore, SNP array genotyping and whole-genome sequencing was performed and provided no evidence for a chromosomal disorder. In the Piètrain family, several genome regions were homozygous in both affected piglets. None-theless, a dominant acting de novo germline variant could not be ruled out, and therefore differ-ent filtering strategies were used to find pathogenic variants. The herein presented lists of pri-vate variants after filtering against hundreds of control genomes provide no plausible candidate and no shared variants among the two sequenced cases. Therefore, further studies are needed to evaluate possible genetic etiology. In general, systematic surveillance is needed to identify ge-netic defects as early as possible and to avoid the occurrence of losses in the pig population.

**Keywords:** *Sus scrofa*; precision medicine; lymphatic system; whole-genome sequencing; SNP array genotyping

#### **1. Introduction**

Cystic hygroma, often also referred to as 'cystic lymphangioma', is one of the most commonly presenting lymphangioma in human medicine. It is a well-known congenital malformation of the lymphatic system characterized as single or multiloculated fluid-filled cavities due to a lack of communication between the lymphatic and venous systems [1–4]. Cystic hygroma occurs with an incidence of ~1:1000–6000 births and 1:750 miscarriages in humans [5,6]. Even though cases of cystic hygroma are rarely described in animals [7–11], an estimation of the prevalence of this malformation in animals is still missing. Cystic hygromas can manifest anywhere in the body but are often found in the neck, clavicle,

**Citation:** Letko, A.; Schauer, A.M.; Derks, M.F.L.; Grau-Roma, L.; Drögemüller, C.; Grahofer, A. Phenotypic and Genomic Analysis of Cystic Hygroma in Pigs. *Genes* **2021**, *12*, 207. https://doi.org/10.3390/ genes12020207

Academic Editors: Katarzyna Piórkowska and Katarzyna Ropka-Molik Received: 22 December 2020 Accepted: 28 January 2021 Published: 31 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

and axillary regions in humans [1–4,12]. In approximately half of the reported cases, cystic hygromas are present directly after birth, whereas the other cases occur within the age of two years [13]. Until today, the exact etiology of cystic hygroma in humans and animals has been unclear, but an association with chromosomal aberrations and genetic syndromes, such as the Noonan syndrome (OMIM PS163950), has been described [1–3,12]. Abnormal karyotype was found in 29% to 60% of the cases [14], whereas congenital disorders with normal karyotype ranged from 25% to 53% [15]. However, submicroscopic chromosomal abnormalities that are missed by conventional karyotyping are also described in cystic hygroma [2]. In addition, cases of familial cystic hygroma with normal karyotype have been described and suggest that both recessively as well as dominantly inherited genetic variants are involved in the phenotype [2,16–19].

Until now, no information regarding the occurrence, the etiology, and the genetic background of cystic hygroma in the pig population has been available. The clinical phenotype of cystic hygroma in pigs resembles the human condition as well as reports of similarly affected individuals of other domestic animal species. To the authors' knowledge, this is the first report of the hygroma cyst in pigs. Therefore, this report describes the phenotypic findings of hygroma cysts in three pigs of different breeds and the subsequent preliminary genomic analysis, including SNP genotyping and whole-genome sequencing, to evaluate a possible inherited cause.

#### **2. Materials and Methods**

#### *2.1. Ethics Statement*

All animal experiments were performed according to the local regulations. The study was approved by the Cantonal Committee for Animal Experiments (Canton of Bern; permit 109/18) at the University of Bern.

#### *2.2. Animals and DNA Samples*

Three male cases with cystic hygroma on the rump were used in this study. Two cases were littermates from a purebred Piètrain litter. The third case, a crossbred weaner (Large White × Landrace), was observed during a herd examination in a fattening herd. Blood samples were obtained from all cases as well as from the Piètrain sow, boar, and all 20 healthy siblings from two independent litters after repeated mating for further genetic investigations. Genomic DNA was isolated from EDTA blood samples using the Maxwell RSC Whole Blood DNA Kit (Promega AG, Dübendorf, Switzerland).

#### *2.3. Clinical and Further Examination*

A total of three male cases with cystic hygroma on the rump were examined in this study. Two cases (case 1 and 2) were littermates from a purebred Piètrain litter born at the Clinic for Swine in Bern. The pregnant Piètrain sow, artificially inseminated from a Piètrain boar of a boar study, was bought from a nucleus farm in Switzerland and farrowed at the Clinic for Swine. The sow was raised under conventional conditions. At the beginning of gestation, the sow was kept in a group house system with straw as a bedding material according to legal requirements and received a conventional feed diet and water ad libitum. At the Clinic for Swine, the animal was housed in a single pen with contact to other pigs. The pen was interspersed with straw and sawdust. In addition to the conventional feed, the sow also received hay. Within the first week after farrowing, cystic hygroma occurred in two (male) out of eleven (3 female, 8 male) piglets. A rebreeding of the sire and dam under experimental conditions was conducted to investigate a possible genetic effect. The litter size of the second litter was twelve (7 female, 5 male), but no more affected piglets were observed during the lactation period of four weeks. Due to leg weakness, no further breeding with the sow was possible, and therefore, no further rebreeding was conducted.

The third case (case 3) was observed during a herd examination in a fattening herd. The farmer reported that this weaner (Large White × Landrace, male castrated) already arrived with the cystic hygroma to his farm from a piglet producer. No information about

the mother and father could be obtained. However, this animal was referred to the Clinic for Swine for further investigation. A clinical examination was conducted in all affected animals. Further investigations, including ultrasonography and cytology of the fluid in the cyst, were conducted to clarify the phenotype of the cystic hygroma and rule out differential diagnoses.

#### *2.4. Postmortem Examination, Histology, and Bacteriology*

A full postmortem was performed immediately after euthanasia on the two affected Piètrain littermates (cases 1 and 2), and one crossbred weaner pig (case 3). Tissue samples from the skin, subcutaneous lesions, skeletal muscle, superficial inguinal lymph nodes, and internal organs including lung, liver, and kidney from all pigs and from a mass observed within the radius in case 1 were fixed in 10% buffered formalin, pH 7.2, overnight. All tissues were routinely processed for histopathology and stained with hematoxylin and eosin (H&E). Immunohistochemical examination was performed using rabbit polyclonal antibodies against LYVE1 (Abcam, Cambridge, UK), at the dilution of 1:3000, and counterstained with hematoxylin. Inguinal lymph nodes were included as an internal positive control.

Sterile samples for bacteriological investigations of the fluid present within the cystic masses were collected before sectioning the tissues in all three cases. The fluid samples were cultured aerobically on blood agar with and without ammonia and on MacConkey agar for 48 h. As part of the national surveillance system, blood samples were tested for the presence of antibodies against African swine fever virus (ASFV), Classical swine fever virus (CSFV), and Porcine reproductive and respiratory syndrome virus (PRRSV) by ELISA.

#### *2.5. Genetic Analyses*

From both of the purebred Piètrain litters, 24 animals (dam, sire, 2 affected piglets, and 20 apparently normal littermates) were genotyped using the Illumina PorcineSNP60 BeadChip (Illumina, San Diego, CA, USA) containing 50,915 SNPs. Basic quality control filtering steps of the SNP array genotyping data and parentage confirmation were carried out using PLINK v1.9 [20]. Markers with call rates <90% were excluded, and all individuals had call rates >90%. The dataset was additionally pruned for low minor allele frequency (0.05) and failure to meet Hardy–Weinberg equilibrium (0.0001), resulting in 31,054 markers. The dataset was also scanned for Mendelian errors using the –mendel option of PLINK to reveal any deviations from expected values based on per-individual, per-family, and per-SNP error rates.

Merlin software [21] was used to test for cosegregation of any chromosomal regions and the cystic hygroma phenotype in one complete family representing the first litter (sire, dam, two affected, and eight unaffected offspring) by performing parametric linkage analysis under a fully penetrant, recessive model of inheritance. Assuming identity-bydescent (IBD), the autozygosity mapping approach in PLINK v1.9 [20] was used to discover homozygous intervals with alleles shared by both affected piglets (using –homozyg-match 0.95 for allelic matching between both cases). Additionally, individual homozygous intervals were determined in the 22 control animals for comparison. All plots were constructed in R environment v3.6.0 [22].

#### *2.6. Whole-Genome Sequencing*

Whole-genome sequence (WGS) data were obtained from five pigs including four animals of the purebred Piètrain litter (case 1, its dam, sire, and one normal littermate), as well as the unrelated affected crossbred pig (case 3), after preparation of PCR-free fragment libraries with approximately 400 bp inserts that were sequenced for paired-end reads of 2 × 150 bp length. All five animals were sequenced on the Illumina NovaSeq 6000 System at an average coverage of 21× (Supplementary Table S1). The obtained reads were mapped to the pig reference genome assembly Sscrofa11.1 using the Burrows-Wheeler Aligner v0.7.15 [23] with default settings. Picard v2.9 [24] was used to sort the mapped reads by the sequence coordinates and to label the read duplicates. Genome Analysis Toolkit v3.8 (GATK) [25] was used to perform local realignment and to produce a cleaned BAM file. The single nucleotide variants (SNVs) and small indels were identified using genotypeGVCFs of GATK, and the prediction of their functional effects was performed with SnpEff v4.3 [26] using the NCBI Annotation Release 106. The generated files were merged with 10 other unrelated pig genomes (Supplementary Table S1), which were generated previously during the course of different studies and are publicly available, into a final variant call format (VCF) file, including all individual variants and their functional annotations. The VCF file was then used for determining private variants of the affected animals.

As an independent validation, the obtained variants were additionally searched in a cohort of 756 commercial pig genomes from four breeds (Duroc, Large White, Landrace, Piètrain). This cohort of sequenced animals was analyzed according to Derks et al., 2019 [27]. In short, sequence reads were mapped using Burrows–Wheeler aligner against the Sscrofa11.1 reference genome, and SAMtools [23] was used to sort, merge, and index BAM files. We performed variant calling using FreeBayes with the setting: –min-basequality 10 –min-alternate-fraction 0.2 –haplotype-length 0 –min-alternate-count 2 [28]. We discarded variants with a Phred quality <20. The resulting sequence variants were functionally annotated using the Ensembl Variant Effect Predictor pipeline (v99) [29].

The Integrative Genomics Viewer (IGV) [30] was used for visual inspection and screening for structural variants in the regions of interest in the WGS of the affected pigs. Additionally, coverage for the five animals was calculated using the function bedcov of the program Samtools [23] by a sliding window approach with the window size of 300 kb moving for half the window size and including reads with mapping quality greater than 15. The average coverage over every window was plotted for each pig with the qqman package [31] in R environment v3.6.0 [22].

#### *2.7. Availability of Data and Material*

All positions refer to the pig reference genome assembly Sscrofa11.1 and NCBI Annotation Release 106. The herein generated WGS data are freely available at the European Nucleotide Archive (ENA) under study accession number PRJEB29465, and individual sample accession numbers are available in Supplementary Table S1. The genotypes of the commercial pig genomes used as the validation cohort within the candidate regions are available on reasonable request.

#### **3. Results**

#### *3.1. Clinical and Further Examination*

A clinical examination of all three cases revealed that the animals were alert and in good body condition. The vital signs were within the reference values, and no locomotion or neurological disorders were determined. In all three animals, a painless, soft, and compressible mass on the rump was detected (Figure 1, Supplementary Video S1, and Supplementary Video S2); and an ultrasonographic examination revealed a cystic or multicystic lesion with internal septations (Figure 2). In addition, after sterile preparation, an aspiration of the fluid was conducted in all three animals and a cytological investigation was performed. Results from the cytological examination are listed in Table 1.

Based on the findings of the clinical examination, the diagnosis of cystic hygroma was made in all three affected pigs. In addition, a clinical examination of all controls revealed that the animals were alert and in good body condition. The vital parameters were all within the reference range, and examination of the locomotion and neurological system revealed no pathological findings. Based on the results of the clinical examination, all controls were healthy.

**Figure 1.** Phenotypic appearance of cystic hygroma in pigs. A soft and compressible mass on the rump of two cases is visible ((**A**) = case 1; (**B**) = case 2).

**Figure 2.** Ultrasonographic imaging of cystic hygroma in pigs. Fluid-filled cysts ((**A**) = case 1; (**B**) = case 2; (**C**) = case 3) with cavernous structure (**C**) could be detected.



#### *3.2. Postmortem Examination, Histology, and Bacteriology*

The gross and histological appearance of the masses was similar in all three pigs. The masses were grossly palpable, fluctuant, well-demarcated, and located subcutaneously at the level of the rump. The masses were 9 × 7 × 3 cm, 10 × 8 × 4 cm, and 17 × 7 × 2 cm in cases 1 to 3, respectively. In the cut sections, the masses were surrounded by a thick, fibrous capsule with a width of 0.3–0.5 cm and contained multiple cystic cavities filled with serosanguineous fluid and a few aggregates of coagulated protein (Figure 3).

–

**Figure 3.** Porcine cystic hygroma (case 1). Note the well-demarcated, subcutaneous mass located at the level of the left rump (arrowheads). Inset: The mass is surrounded by a thick, fibrous capsule and contains a central cystic cavity filled with serosanguineous fluid.

Histologically, the masses were limited to the subcutis, and the cystic cavities were surrounded by a thick capsule comprised of mature fibrous connective tissue. The innermost layer of the capsule was lined by a single layer of cells that were mostly flat but multifocally plump to cuboidal. Cells contained a moderate amount of pale, eosinophilic cytoplasm and one central, round to ovoid nucleus. The central cavity was filled with pale, extracellular eosinophilic material, a low amount of fibrin, and a few scattered neutrophils and lymphocytes. Immunohistochemistry for LYVE-1 demonstrated diffuse and strongly positive staining within the cells lining the cystic masses, which was consistent with lymphatic endothelium (Figure 4). The endothelial cells lining the afferent and efferent lymphatic vessels of the lymph node as well as the subcapsular and peritrabecular sinuses showed similar positive staining, indicating the success of the positive control.

No other relevant lesions were observed in the three affected pigs. No bacterial growth was obtained in the bacteriological investigations of any of the collected samples. The serology for ASFV, CSFV, and PRRSV was negative.

#### *3.3. Genetic Analyses*

A total of 24 pigs from two Piètrain litters were genotyped using the SNP array. The same parentage of both litters was confirmed by the IBD estimates for all pairs of individuals. No increased number of Mendelian errors between the parents and the two affected offspring (case 1 and case 2) was observed, indicating no larger structural variants in the genome of the affected piglets.

Based on the pedigree structure, initially, an autosomal recessive inheritance was suspected. Parametric linkage analysis for a recessive trait in the first litter resulted in ten genome regions with a positive logarithm of the odds (LOD) score (Supplementary Table S2). Moreover, ten runs of homozygosity shared by the two Piètrain cases were detected on nine autosomes (chr 1, 3, 5, 6, 11, 12, 14, 17, and 18). Only one homozygous segment overlapped with a linked interval on chromosome 12, forming a ~4.6 Mb (chr 12: 178,275–4,826,688) region of interest (Supplementary Table S2). However, from 9% to 100% of the 22 control pigs were also homozygous over the ten detected homozygous regions (Supplementary Figure S1, Supplementary Table S2).

**Figure 4.** Histological phenotype of porcine cystic hygroma (case 1). Detail of the subcutaneous cystic mass surrounded by a thick capsule of mature connective tissue and lined by a single layer of well-differentiated cells with a squamous to cuboidal morphology (arrows). Inset: Immunohistochemistry for LYVE-1. Most of the epithelial cells lining the cystic cavity are diffusely and strongly positively stained with hematoxylin and eosin (arrowhead).

> As both affected piglets of the Piètrain litter were males, an X-linked recessive inheritance could also be assumed. SNP genotyping showed that both cases received three IBD segments on chromosome X (Supplementary Table S2). Subsequently, we noticed that these haplotypes also occurred in several unaffected littermates as in these three segments, the unaffected male siblings received the identical maternal X chromosomes (Supplementary Table S3).

#### *3.4. Whole-Genome Sequencing*

The genomes of one affected Piètrain piglet (case 1), both parents, one healthy littermate, and the unrelated case 3 were sequenced. All five cleaned BAM files were inspected for coverage differences, and no evidence for chromosomal imbalance was detected (Supplementary Figure S2).

– Subsequently, we focused genome-wide on variants in annotated genes and loci, assuming a protein-changing variant was causing the observed phenotype. For the sequenced Piètrain family consisting of four animals, we applied three different strategies to filter for disease-associated variants assuming various modes of inheritance, including autosomal recessive (AR), X-linked recessive (XR), and trio filtering for possible autosomal dominant (AD) de novo variants (Table 2). For the crossbred case 3, we searched for private heterozygous and homozygous variants that were absent in all other available genomes (Table 2).

Filtering for SNVs and small indels was performed against 10 unrelated control pigs (Supplementary Table S1) and revealed 104 coding variants detected in the purebred Piètrain case 1 based on different filtering strategies (Table 2, Supplementary Table S4). After a second round of filtering using the independent validation cohort of 756 commercial pig genomes, 22 coding variants of which 8 were predicted to be protein-changing remained. These were present only in the genome of the affected piglet from the Piètrain litter and absent in all controls (Table 2). In light of the outcome of the previously performed genetic analyses, only a single private protein-changing variant in *LOC110255918* mapped to the IBD genome region on chromosome 12 (Supplementary Table S4).



**Table 2.**Results of different filtering strategies.

AR denotes autosomal recessive, XR denotes X-linked recessive, and AD denotes autosomal dominant modes of inheritance.

In the WGS data of case 3, we found 2585 variants; after the second round of filtering, this list was reduced to 500 private SNVs and small indels, including 2 nonsense, 6 frameshift, 184 missense, 216 synonymous, and 86 intronic variants, as well as 3 inframe insertions, 2 inframe deletions, and 1 bidirectional gene fusion (Table 2, Supplementary Table S5).

No private variants shared between the two sequenced affected pigs were found. Two genes located on chromosome X were shared in case 1 and case 3, harboring independent private coding variants. The two different variants found in the chloride voltage-gated channel 4 (*CLCN4*) gene located in a homozygous region (Supplementary Table S4) were synonymous in both sequenced cases. In addition, the two different variants in the myotubularin related protein 1 (*MTMR1*) gene located in a linked region (Supplementary Table S4) were silent in case 1 and only predicted to alter the protein sequence in case 3.

#### **4. Discussion**

This is the first case report of cystic hygroma in pigs of different breeds in combination with further downstream analyses, including a comprehensive but still preliminary genomic analysis based on short-read whole-genome sequencing. Hence, a novel congenital disorder in pigs was described. In comparison with similar cases described in humans showing cystic hygroma in the neck region [1–4,12], in all three herein described porcine cases, the cystic hygroma was located at the level of the hind limb. Interestingly, no breathing problems, which are described in infants, could be observed in the pigs because the cystic hygroma was only located on the hind limb. However, all the pigs showed a higher risk of injuries on the rump, and therefore, this congenital disorder has implications for animal welfare. Moreover, the cystic hygroma on the limb tremendously influenced the meat quality, especially the dry ham production [32]. If the incidence of cystic hygroma in the pig population increased due to the high use of carrier boars, it would have a significant impact on the pig industry economy. Therefore, the results of this preliminary study are of major importance.

Due to the high use of semen from boar studs for artificial insemination, there is an increased risk of distributing a hereditary disease in the pig population [33–35]. Therefore, a thorough diagnostic and rapid analysis of this genetic disorder was conducted to avoid spreading of the disease in the pig population. The outcome of genetic analysis depends on the number of cases with a confirmed phenotype and the causative genetic variant. Hence, the diagnosis of the phenotype is of major importance, although systematic surveillance in the pig production is mostly lacking. In this study, all affected animals were confirmed by clinical examination and further pathological investigations to receive a valid phenotype for the subsequent genomic analysis. The clinical examination revealed all typical signs for hygroma cysts, including a painless, compressible, and soft mass on the rump [1,2]. To confirm this diagnosis, further investigations, including ultrasonographic examination, aspiration of the mass, and pathological examination were conducted. Findings showed cystic or multicystic lesions with internal septations and cloudy, reddish fluid with a low content of proteins, which was in line with the diagnosis of cystic hygroma [1,2]. Although half of the cases in human medicine are described at birth [13], all three herein described porcine cases developed the malformation within several days after birth, indicating a congenital condition.

As a familial occurrence of cystic hygroma is hypothesized in human medicine [16–18], we hypothesized a possible genetic origin for the observed cases in pigs. As the mode of inheritance was unclear, we evaluated different possible scenarios, such as monogenic recessive, X-linked, and dominant inheritance. As all three cases were male, a sex-linked inheritance seemed to be likely, and therefore, a second litter of the Piètrain sow with the same boar was produced under experimental conditions but revealed no further affected piglets. As all piglets of the second litter were apparently showing no signs of cystic hygroma until weaning, it might be speculated that due to the low number of male offspring, no further cases occurred. However, a resolution of the malformation on the

lymphatic tissue has been proven by ultrasonographic examination during pregnancy [17]. This could not be ruled out in this study because the examination of all piglets in utero with ultrasonography is not possible in sows.

The outcome of the genetic and genomic analyses were inconclusive. There was at least a single genome region linked to the phenotype and showing shared homozygosity within the Piètrain family, and a single private protein-changing variant was found in that region on chromosome 12. Additionally, the condition may not be fully penetrant as the possibility that two normal littermates were identically homozygous for that genome region could not be ruled out. Using MutPred2 [36], the silico prediction of possible deleterious consequences for this missense variant in LOC110255918 revealed a score of 0.046, indicating no pathogenic effect. Nonetheless, the impact of this missense mutation in the uncharacterized locus remains unclear.

Furthermore, we opted for a possible sex-linked inheritance approach, which showed three shared IBD regions on chromosome X. However, the unaffected male littermates also shared these haplotypes. Only a single normal littermate shared the identical haplotypes in all three regions of the X chromosome, indicating that this offspring received the identical nonrecombinant copy of the maternal chromosome. Nonetheless, several further normal piglets received recombinant versions of the maternal X chromosome in the three distant chromosomal segments, resulting in several normal littermates sharing the respective haplotypes seen in both affected piglets. Therefore, the X-recessive mode of inheritance seems to be less likely. Furthermore, we could not rule out a possible paternal mosaicism of the X chromosome as an explanation. Filtering in the three regions revealed a variant in the CLCN4 gene that is known to be associated with the X-linked dominant inherited Raynaud-Claes syndrome, a very rare neurodevelopmental disease in humans (OMIM 300114). In addition to the fact that this synonymous variant does not alter the encoded protein, we postulate that the variant also found in the dam of the cystic hygroma-affected Piètrain piglet is most likely not causative. Indeed, sequencing the genome of the second affected littermate would have been beneficial in the identification of a possible shared causal variant. Recently it was evaluated that sequencing of each additional family member helped to narrow down the number of variants by 50%–75% [37]

In addition, the sequence analysis performed for the independent crossbred pig revealed no plausible candidate variant for the observed cystic hygroma phenotype that was highly similar to the two Piètrain cases. The private missense variant in MTMR1 located on the X chromosome seems to be less likely causative as the closely related MTM1 gene is known to be associated with forms of X-linked inherited myopathies (OMIM 300415). Furthermore, the long list of remaining private variants after filtering against hundreds of control genomes clearly illustrates that without sequencing of close relatives, ideally parents, as done for the Piètrain case, it is nearly impossible to limit the number of relevant variants if no obvious candidate gene is known. Furthermore, the chosen shortread genome sequencing method is known to be limited for the detection of structural variants. Therefore, long-read sequencing might be interesting in future cases to identify these kinds of genetic variations. Finally, the generated genome data revealed no indication of the presence of submicroscopic chromosomal abnormalities that were described in human cystic hygroma.

#### **5. Conclusions**

For the first time, this report provides a comprehensive description of cystic hygroma in pigs, including a preliminary genomic evaluation of possible inherited causes. It could be assumed that the genetic origin is heterogeneous as no shared variants across the two whole-genome sequenced affected pigs are found. Further targeted matings might help to elucidate the mode of inheritance. Systematic surveillance is needed to identify congenital defects as early as possible and to avoid the occurrence of further losses in the pig population.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2073-4 425/12/2/207/s1, Figure S1: Runs of homozygosity across the autosomes of 24 analyzed Piètrain pigs, including two cases (red lines) and 22 controls (blue lines). Figure S2: Average coverage over 300 kb sliding windows as determined by whole-genome sequencing (WGS) of the analyzed Piètrain family and the unrelated case 3; note that the red lines indicate an individual's average coverage over the whole genome. Table S1: Sample designations and breed information of the whole-genome sequenced pigs. Table S2: Genomic regions of interest identified by homozygosity and linkage analyses in the Piètrain family. Table S3: Haplotype analysis for three segments on chromosome X. Table S4: Genomic variants detected in the WGS of the Piètrain case 1 based on different filtering strategies. Table S5: Genomic variants detected in the WGS of the unrelated case 3. Video S1: Video illustrating the clinical phenotype of case 2. Video S2: Video illustrating the clinical phenotype of case 3.

**Author Contributions:** Conceptualization, A.G. and C.D.; methodology, A.G., A.L. and L.G.-R.; formal analysis, A.G. and A.L.; investigation, A.L., A.M.S. and L.G.-R.; data curation, A.L.; software, A.L.; validation, M.F.L.D.; writing—original draft preparation, A.G., A.L., A.M.S., L.G.-R. and C.D.; writing—review and editing, A.G., A.L., M.F.L.D., A.M.S., L.G.-R. and C.D.; visualization, A.G., A.L., A.M.S. and L.G.-R.; supervision, A.G. and C.D.; funding acquisition, C.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** Authors have no external funding to report.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data generated or analyzed during this study are available from the corresponding author on reasonable request.

**Acknowledgments:** The Next Generation Sequencing Platform and the Interfaculty Bioinformatics Unit of the University of Bern are acknowledged for performing the WGS and providing highperformance computational infrastructure.

**Conflicts of Interest:** The authors declare that they have no competing interests.

#### **References**


### *Article* **Genes and SNPs Involved with Scrotal and Umbilical Hernia in Pigs**

**Ariene Fernanda Grando Rodrigues 1 , Adriana Mércia Guaratini Ibelli 2,3 , Jane de Oliveira Peixoto 2,3 , Maurício Egídio Cantão 2 , Haniel Cedraz de Oliveira 4 , Igor Ricardo Savoldi 1 , Mayla Regina Souza 1,5 , Marcos Antônio Zanella Mores 2 , Luis Orlando Duitama Carreño <sup>6</sup> and Mônica Corrêa Ledur 1,2, \***


**Abstract:** Hernia is one of the most common defects in pigs. The most prevalent are the scrotal (SH), inguinal (IH) and umbilical (UH) hernias. We compared the inguinal ring transcriptome of normal and SH-affected pigs with the umbilical ring transcriptome of normal and UH-affected pigs to discover genes and pathways involved with the development of both types of hernia. A total of 13,307 transcripts was expressed in the inguinal and 13,302 in the umbilical ring tissues with 94.91% of them present in both tissues. From those, 35 genes were differentially expressed in both groups, participating in 108 biological processes. A total of 67 polymorphisms was identified in the inguinal ring and 76 in the umbilical ring tissue, of which 11 and 14 were novel, respectively. A single nucleotide polymorphism (SNP) with deleterious function was identified in the integrin α M (*ITGAM*) gene. The microtubule associated protein 1 light chain 3 γ (*MAP1LC3C*), vitrin (*VIT*), aggrecan (*ACAN*), alkaline ceramidase 2 (*ACER2*), potassium calcium-activated channel subfamily M α 1 (*KCNMA1*) and synaptopodin 2 (*SYNPO2*) genes are highlighted as candidates to trigger both types of hernia. We generated the first comparative study of the pig umbilical and inguinal ring transcriptomes, contributing to the understanding of the genetic mechanism involved with these two types of hernia in pigs and probably in other mammals.

**Keywords:** gene expression; congenital defects; RNA-Seq; transcriptomics; swine

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Pig production is one of the most important livestock activities in the world and its evolution and expansion are mainly due to the development of technologies that combine genetics, management, nutrition and well-being [1], which increase productivity and bring the final product closer to what the consumer idealizes. Meat quality and feed efficiency are traits that have been prioritized by genetic breeding programs [2–4]. However, in recent years, studies have also been carried out to improve our knowledge on diseases that persist in production, which bring losses to the entire chain [5–9]. Among these, scrotal (SH)/inguinal (IH) and umbilical hernias (UH) are birth defects often found in pigs [10],

**Citation:** Rodrigues, A.F.G.; Ibelli, A.M.G.; Peixoto, J.d.O.; Cantão, M.E.; Oliveira, H.C.d.; Savoldi, I.R.; Souza, M.R.; Mores, M.A.Z.; Carreño, L.O.D.; Ledur, M.C. Genes and SNPs Involved with Scrotal and Umbilical Hernia in Pigs. *Genes* **2021**, *12*, 166. https://doi.org/10.3390/genes12020166

Academic Editor: Katarzyna Piórkowska Received: 18 December 2020 Accepted: 22 January 2021 Published: 27 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

causing pain and discomfort to the animals and, consequently, economic losses related to reduced performance [11,12] and increased risk of death [13].

Scrotal hernia is mainly characterized by the displacement of intestinal loops to the scrotal sac, resulting from an abnormality in the inguinal ring [11]. Failure to obliterate the vaginal process [14], impairment of the innervations that act on the site [15] or the failed involution of the internal inguinal ring [16] are processes associated to the manifestation of this defect. Moreover, SH can arise as a result of low resistance of the inguinal region [17] caused by disturbances in the metabolism and hydrolysis of the extracellular matrix components, such as collagen and muscle fiber structures [18], compromising the repair of connective tissue [19]. The incidence of SH in pigs is influenced by genetics and environmental factors. Sevillano et al. [20] have shown that the incidence of SH/IH in Landrace and Large White pigs was 0.34% and 0.42%, respectively.

Umbilical hernia is characterized by the passage of abdominal contents (mainly intestine) to the hernial sac, in the umbilical region [12]. The discomfort and pain can be aggravated when secondary factors are associated with the defect, and the pig welfare is compromised [12]. UH can be related to genetic and non-genetic factors such as navel infections, lesions at the site, obesity and incorrect umbilical cord cutting [21,22]. This defect has been associated with a failure in the process of umbilical ring closure [23,24]. The incidence of UH ranges from 0.55% [25] to 1.5% [23] and it varies according to the management, breed line and production lot [26]. Usually, the UH is not observed at birth, and this defect appears when the pigs are already in the growing period [27]. This demonstrates the difficulty that breeding companies and pig farmers have to eliminate such defect from their herds.

The heritability for SH/IH and UH were estimated in 0.31 [20] and 0.25 [28], respectively, which shows moderate genetic influence in the appearance of these defects. The knowledge of the genetic mechanisms associated with the formation of these anomalies is important to understand their underlying causes, regardless of the type of hernia studied. In pigs, quantitative trait loci (QTL) were detected for the occurrence of SH in pig chromosomes (SSC) 2, 4, 8 (locus SW 933), 13 and 16 [29]. In addition, suggestive QTL for IH and SH were found in seven chromosomes (SSC1, 2, 5, 6, 15, 17 and X) when comparing healthy and herniated pigs [11]. Moreover, candidate genes involved with the manifestation of SH, such as Insulin-like receptor 3 (*INSL3*), Müllerian inhibiting substance (*MIS*) and Calcitonin gene-related peptide (*CGRP*) were also identified [11]. Twenty-two single nucleotide polymorphisms (SNPs) located on chromosomes 1, 2, 4, 10 and 13 in Landrace pigs, and 10 SNPs on SSC 3, 5, 7, 8 and 13 of Large White pigs, were associated with the incidence of SH/IH in these populations [20]. To date, there is a record of 116 QTL/associations related to the appearance of SH/IH in pigs in the Animal QTL database [30].

Regarding UH, there are 55 QTL/associations related to its manifestation in the Pig QTL database (accessed in 17th December 2020) [30]. A SNP in the *CAPN9* gene (Calpain 9) on SSC14 significantly associated with UH has already been detected [31]. In commercial pigs, four candidate genes were identified in QTL regions associated with UH, namely *TBX15* (T-Box 15) and *WARS2* (Tryptophanyl TRNA Synthetase 2, Mitochondrial) in chromosome 4, and *LIPI* (Lipase I) and *RBM11* (RNA Binding Motif Protein 11) in chromosome 13 [32]. A QTL on chromosome 14 of the Landrace breed was highly correlated with UH, where the *LIF* (Leukemia inhibitory factor) and *OSM* (Oncostatin M) genes were identified as candidates for this defect [25].

Another approach that has recently been used to investigate the genetic mechanisms involved with these defects is the whole transcriptome study. Functional candidate genes were prospected for scrotal [6] and umbilical [7] hernias based on differentially expressed (DE) genes between affected and unaffected pigs in the inguinal and umbilical ring tissues, respectively, using RNA sequencing (RNA-Seq). Information about gene expression in those tissues is scarce and there are still many gaps to be elucidated. Interestingly, these previous studies have shown some genes and pathways that seem to be shared between SH and UH, many of them involved with muscle. Therefore, new and systematic analyses

on tissues characterization and on common processes involved with both types of hernia are needed.

Although several genetic studies have been performed, no comparison between largescale gene expression profile of pigs affected with SH from those affected with UH was reported to date. Since SH and UH are related to some extent to muscle dysfunction, our hypothesis is that some muscle-related genes could be involved in the development of both types of hernia. Therefore, the objective of this study was to investigate the common molecular mechanisms and genes involved with these two types of hernia by comparing the SH and UH transcriptomes, as well as to identify and characterize polymorphisms in those transcriptomes.

#### **2. Materials and Methods**

This study was performed with the approval of the Embrapa Swine and Poultry National Research Center Ethical Committee of Animal Use (CEUA) under the protocol number 011/2014. A diagram summarizing the experiment and the analyses performed can be seen in the Supplementary Materials: Figure S1.

#### *2.1. Animals and Sample Collection*

Eighteen pigs were selected from a Landrace female line from the same nucleus farm, located in Santa Catarina State, SC, Brazil. From those, five were females affected (case) with UH, five healthy (control) females for UH, four males with SH and four control males for SH, as described by Souza et al. [7] and Romano et al. [6]. Control animals were healthy pigs, without any type of hernia, came from hernia-free litter and were from the same management group of their respective cases. The animals were selected at approximately 60 days of age for SH and 90 days of age for UH. At the Embrapa Swine and Poultry National Research Center, located in Concórdia, SC, Brazil, the pigs were euthanized by electrocution stunning for ten seconds, followed by bleeding, in accordance with the practices recommended by the Ethics Committee. The necropsy was performed for general evaluation and to confirm the both types of hernia. In the pathological analysis, the two groups of animals were confirmed: hernia-affected or without hernias (Figure 1). Tissue samples from the inguinal and umbilical rings were collected for investigating the scrotal and umbilical hernias, respectively (Figure 1). Samples were immediately placed in liquid nitrogen and stored in ultra-freezer (−80 ◦C) for RNA extraction. Samples from those tissues were also collected and placed in 4% paraformaldehyde for histopathological analysis.

#### *2.2. Histopathological Analysis of the Inguinal and Umbilical Ring Tissues*

The samples previously collected were routinely processed for histopathology as described by Souza et al. [7]. Briefly, sample tissues were dehydrated in increasing concentrations of ethanol, diaphanized with xylol and embedded in paraffin. Tissue sections were cut with an automatic microtome, stained using the hematoxylin and eosin (HE) and analyzed by optical microscopy.

#### *2.3. RNA Extraction, RNA-Seq Libraries Preparation and Sequencing*

Total RNA extraction was initiated using the Trizol reagent (Invitrogen, Carlsbac, CA, USA) according to the manufacturer's instructions. A 100 mg of the tissues were macerated in liquid nitrogen with a mortar and added to a polypropylene tube with 1 mL of the Trizol reagent, mixed using the vortex, and then incubated for five minutes at room temperature (RT, 25 ◦C). Then, 200 µL of chloroform were added, shaked vigorously and incubated at RT for another five minutes, followed by centrifugation. Approximately 600 µL of the aqueous phase were transferred to a new tube and 600 µL of 70% ethanol were added and homogenized by inversion. This volume was then added to a mini RNeasy silica column (Qiagen, Düsseldorf, Germany) following the manufacturers' protocol. The quantity and quality of total RNA were assessed by quantification in a Biodrop spectrophotometer (Biodrop, Cambridge, UK), in a 1% agarose gel and in the Bioanalyzer Agilent 2100 (Agilent

Technologies, Santa Clara, CA, USA). Samples with an RNA integrity number (RIN) > 8.0 were used to prepare the RNA-Seq libraries.

**Figure 1.** Pathological analysis. Legend: (**A**) swine affected with scrotal hernia. (**B**) Region affected with scrotal hernia (the white arrow shows the inguinal ring). (**C**) Swine affected with umbilical hernia. (**D**) Region affected with umbilical hernia (the white arrow shows the umbilical ring).

> The RNA of four normal and four SH-inguinal ring and five normal and five UHumbilical ring tissues were submitted to library preparation with the TruSeq mRNA Stranded Sample Preparation kit (Illumina, Inc., San Diego, CA, USA). For this, 2 µg of total RNA was purified using magnetic microspheres for poly-A selection. The libraries were quantified by qPCR and evaluated in Bioanalyzer (Agilent, Santa Clara, CA, USA), prior to sequencing. Libraries passing the quality control were sent for RNA sequencing in HiSeq 2500 equipment (Illumina, San Diego, CA, USA) at the Functional Genomics Center of ESALQ/USP, in a paired-end protocol (2 × 100 bp). All samples of each hernia and their respective controls were placed in the same lane. Additional information can be found in Romano et al. [6] and Souza et al. [7]. The transcriptome sequences for the scrotal and umbilical hernias used in this study were previously deposited in the SRA database with BioProject numbers PRJNA350530 and PRJNA445856, respectively.

#### *2.4. Quality Control and Differentially Expressed Genes*

The quality control analysis and mapping were performed using the Bioinformatics Analysis for Quality Control and Mapping (BAQCOM) pipeline available in the Github repository [33]. BAQCOM pipeline uses Trimmomatic tool [34] to remove short reads (<70 bp), reads with low quality (QPhred < 24) and adapter sequences. The sequences were mapped against the swine reference genome (Sus scrofa, assembly 11.1) available on the Ensembl [35] version 95 using the STAR (version 2.7.0) program [36]. To verify the consistency of the expression pattern between the sample groups, principal component analysis (PCA) plots were performed in RStudio [37] (version 1.1.463) from R language [38] (version 3.5.3). The EdgeR package [39] from R was used for the differentially expressed analysis. Differentially expressed (DE) genes in the analyzed tissues (case and control for each hernia) were selected based on the level of false discovery rate (FDR < 0.05) after the Benjamini–Hochberg (BH) method for multiple correction tests [40].

#### *2.5. Transcriptomes Characterization of Scrotal and Umbilical Hernia*

Initially, SH and UH-related transcriptomes were characterized as the total number of transcripts, number of protein-encoding genes, miRNAs, lncRNAs and non-characterized genes in the swine genome using the Biomart tool available in Ensembl 95 [35]. An annotation of non-characterized genes in the porcine genome was performed using DAVID 6.8 database [41]. The comparison between both transcriptomes was carried out to identify genes expressed in both types of hernia, SH and UH.

#### *2.6. Comparison of Differentially Expressed Genes and In Silico Functional Analysis*

The classification of DE genes was performed according to the database available in Ensembl 95 and further enrichment in DAVID 6.8 database [41]. A comparison of DE genes between both conditions was performed to verify if the genes were involved in the manifestation of the two types of hernia, and whether there was agreement or not between gene expression profiles in both conditions. In order to verify if DE transcripts named as uncharacterized proteins in Ensembl 95 were similar to genes known in other genomes, the sequences were aligned against the UniProt database [42] using BLASTp tool from the NCBI [43,44]. A gene interaction network was built with the DE genes common to both types of hernia using the STRING database [45,46]. The gene ontology (GO) was evaluated in the Panther [47] and DAVID 6.8 [41] databases, followed by clusterization in the REVIGO tool [48]. Furthermore, it was verified if the DE genes were located in QTL regions previously reported in the Pig QTL database [30].

#### *2.7. Identification of Polymorphisms*

For the polymorphisms discovery between the transcriptome of animals affected with each type of hernia, the Genome Analysis Tool Kit (GATK) program (version 3.8) [49] was used with the Picard (version 2.5) toolkit [50]. The search for SNPs and insertions or deletions (InDels) was carried out following the filtering parameters and sequence quality suggested by the best practices protocol [51,52]. The polymorphisms annotation for the two hernias studied was performed using the variant effect predictor (VEP) tool (version 3.8) [53] with standard parameters available in the Ensembl 95 database and using the KEGG Pathway database [54]. Therefore, this annotation allowed the discovery of new polymorphisms, as well as to verify their location and possible function in the genome. Using the Biomart data mining tool [35], miRNAs were observed, and a manual comparison was made to identify common miRNAs between SH and UH. Subsequently, a search in the miRBase database (version 22) [55,56] was performed to obtain individual information for each miRNA.

#### **3. Results**

#### *3.1. Histopathological Analysis of the Inguinal and Umbilical Ring Tissues*

In the microscopic evaluation, the group affected with SH showed a larger number of connective tissue fibers compared to the control group (Figure 2A,B). In the UH-affected group, the connective tissue was denser than in the control group (Figure 2C,D) as shown by Souza et al. [7].

#### *3.2. Sequencing and Mapping*

The RNA sequencing of all samples (*n* = 18) generated approximately 465 million paired-end reads and, after the quality control analyses, 13.84% of these were removed, resulting in approximately 400 million reads (86.16%) (Additional file 1: Table S1). The PCA plots show the separation between the affected and control samples from the two evaluated types of hernia (Supplementary Materials 2: Figure S2A–C). Around 93.50% of reads were mapped against the swine reference genome (Sus scrofa 11.1) (Ensembl 95), with individual samples ranging from 87.73% to 96.05%, distributed between the groups of healthy and affected by SH or UH. From those, 78.77% of all reads were mapped in genes. From the 25,880 annotated genes in the swine reference genome (Sus scrofa 11.1, Ensembl

95) [35], 13,307 (51.42%) genes were expressed in the inguinal ring and 13,302 (51.40%) in the umbilical ring.

**Figure 2.** Histopathological slide stained with hematoxylin and eosin (HE). Legend: (**A**) sample from the scrotal hernia (SH)-control group and (**B**) sample from the SH-affected group. A larger number of connective fibers is observed in the sample of the SH-affected group than in the sample from the SH-control group. (**C**) Sample from the umbilical hernia (UH)-control group and (**D**) sample from the UH-affected group. Connective tissue interspersed with adipose tissue is observed in the sample of the UH-control group, while in the sample from the UH-affected group, only proliferated connective tissue is observed.

#### *3.3. Characterization of the Scrotal and Umbilical Hernia Transcriptomes*

The transcripts from the inguinal and the umbilical ring tissues were classified according to the Ensembl 95 database [35] (Table 1). After comparing the two transcriptomes (SH and UH), 94.91% of the genes were identified in both groups (Figure 3). The Venn diagram also presents the number of transcripts expressed exclusively in each type of tissue.

#### *3.4. Differentially Expressed Genes*

In the pig inguinal ring transcriptome, 627 genes were differentially expressed (FDR < 0.05) between the control and the SH-affected group. Out of those, 435 genes (69.38%) were downregulated and 192 (30.62%) were upregulated in the SH-affected pigs compared to the normal animals. Regarding the genes expressed in the umbilical ring, 199 were DE between normal and UH-affected pigs. From those, 129 were downregulated (64.82%) and 70 (35.18%) upregulated in the UH-affected pigs when compared to the normal ones. In the samples from the SH group, 98.09% of the DE genes were characterized as protein coding genes, 0.64% as lncRNA, 0.32% as pseudogenes, 0.32% as C immunoglobulins, 0.32% as miscRNA, 0.16% as encoding immunoglobulins V and 0.16% as ribozyme. In the

UH transcriptome, 92.46% were protein coding genes, 3.52% immunoglobulin C coding genes, 1.51% pseudogenes, 1.01% miscRNAs, 1.01% mitochondrial ribosomal RNA and 0.50% lncRNA.


**Table 1.** Characterization of the transcripts identified in the inguinal and umbilical ring samples.

**Figure 3.** Distribution of transcripts identified in the pig inguinal and umbilical ring tissue samples. Legend: SH—scrotal hernia group; UH—umbilical hernia group. For the SH, the inguinal ring tissue was evaluated, and for the UH, the umbilical ring tissue was analyzed.

### *3.5. Differentially Expressed Genes Common to Both SH and UH Transcriptomes*

Comparing the DE genes identified in the SH and UH groups, 35 DE genes were present in both transcriptomes (Table 2).


**Table 2.** Differentially expressed genes identified in both scrotal (SH) and umbilical hernia (UH) groups.

From the 35 DE genes found in both tissues (inguinal and umbilical ring), 34 were protein coding and one was an immunoglobulin C coding gene. Moreover, eight transcripts (22.86%) were uncharacterized proteins (Table 3), of which six were similar to the amino acid sequences of the pig immunoglobulin and other was similar to another predicted protein in pigs (Table 3). When the relative expression of the 35 common DE genes from each group that represents a type of hernia was compared based on the log2 fold-change (logFC), 26 of these genes had a similar expression profile in the two types of hernia (Figure 4A), and nine had opposite expression profiles considering both types of hernia (Figure 4B).


**Table 3.** Differentially expressed genes in the inguinal and umbilical ring annotated as uncharacterized protein.

**Figure 4.** Common differentially expressed genes for scrotal and umbilical hernias and their respective control groups. Legend: (**A**) Genes with similar expression profile and (**B**) with opposite expression profile in the two types of hernia based on the Log2 Fold Change (log2FC).

From the 35 genes DE in both types of hernia, a network with 27 of them was built and the *MAP1LC3C* and *MUC16* genes grouped the two largest clusters of the network (Figure 5). One cluster was related to macroautophagy including the *MAP1LC3C*, *ATG3*, *ATG5* and *ATG12* genes (Figure 5) and the other cluster was composed by the mucin gene family (*MUC4*, *MUC6*, *MUC16* and *MUC20*) (Figure 5), which plays an important role protecting against environmental stress. A third group was related to the complement and coagulation cascade composed of genes *C3*, *CFH* and *CFI* (Figure 5).

**Figure 5.** Gene interaction network with differentially expressed genes common to both scrotal and umbilical hernias. Legend: gene network built with 27 of the 35 differentially expressed genes common to both types of hernia obtained with the STRING database using information from *Sus scrofa* proteins.

α Four metabolic pathways were enriched with the 35 genes DE in both types of hernia using the PANTHER database [47]: Huntington's disease (P00029) (*ARL4A*); muscarinic receptor signaling pathway 1 and 3 of acetylcholine (P00042) (*BCHE*); acetylcholine muscarinic receptor 2 and 4 signaling pathway (P00043) (*BCHE*) and acetylcholine receptor nicotinic signaling pathway (P00044) (*BCHE*). The enrichment of this set of 35 DE genes using the DAVID 6.8 database [41] indicated that those genes participate in 108 biological processes (BP) (Additional file 1: Table S2). The *KCNMA1* gene (potassium calciumactivated channel subfamily M α 1) was the most enriched in BP, appearing in 18 of them (Supplementary Materials 1: Table S2). These BP were clustered in nine macro biological processes (superclusters) using the REVIGO tool [48] (Table 4).

The 26 DE genes with similar expression profile enriched 99 BP (Supplementary Materials 1: Table S3). Considering the molecular function, the set of 35 genes was present in 57 different molecular functions mainly comprising binding, catalytic activity, molecular function regulator, structural molecule activity and transport activity. Using the Pig QTL database [30], two DE genes in both groups of hernias studied here were located in QTL regions already identified as being associated to SH hernia in pigs: the *ACAN* and *BCHE* genes were mapped, respectively, in the QTLs 55892 (SSC7) and 8794 (SSC13).


**Table 4.** Macro biological processes (superclusters) enriched with the 35 differentially expressed genes common to both types of hernia.

#### *3.6. Identification of Polymorphisms*

Using the GATK program with all sequences obtained from the RNA-Seq analyses, 67 polymorphisms were identified in the inguinal ring tissue between SH-affected and unaffected samples (Supplementary Materials 1: Table S4) and 76 in the umbilical ring tissue between UH-affected and unaffected samples (Supplementary Materials 1: Table S5). Comparing the transcriptomes of pigs affected with each type of hernia, the polymorphisms were then classified (Table 5). From the 67 polymorphisms related to scrotal hernia, 56 (83.58%) have already been described in VEP tool and 11 (16.42%) are considered new (Supplementary Materials 1: Table S4). Of the 76 polymorphisms referring to umbilical hernia, 62 (81.58%) have been previously described in VEP tool and 14 (18.42%) are new (Supplementary Materials 1: Table S5).

Considering the whole transcriptome of the two tissues, the variants detected for SH and UH were classified according to the functional region indicating their possible consequences in gene regulation (Table 6). Most of the SNPs in the SH group (37.74%) were classified as synonymous variants (Additional file 1: Table S4), and in the UH group, most were of the UTR3′ type (44.78%) (Supplementary Materials 1: Table S5). In the SH group, two observed variants had calculated SIFT (sorting intolerant from tolerant) score classified as tolerant (SIFT score > 0.05) (Table 7). One of them has already been described in the dbSNP database [35] and the other was classified as new. From the variants belonging to the UH group, six had the calculated tolerance prediction score (SIFT) detected, one of

them being deleterious (SIFT ≤ 0.05) and five tolerant (SIFT > 0.05) (Table 7), all of which were already present in the dbSNP database [35]. These six variants belong to six genes, two of which were enriched for metabolic pathways in the KEGG Pathway database [54] (Table 8). The frameshift type variants were located in two genes (*NCOA7* and *SEC62*), of which one was enriched for a metabolic pathway in the same database [54] (Table 8). The SNPs of the SH group were observed in 17 different genes, which enriched nine BP (Table 9) in the DAVID 6.8 database [41]. The SNPs found in the UH group were mapped in 24 genes, which enriched six biological processes (Table 10).

**Table 5.** Classification of polymorphisms found in samples from the inguinal and umbilical ring tissues from normal, and scrotal and umbilical hernia-affected pigs, respectively.


**Table 6.** Variants annotated in different functional classes in samples from inguinal and umbilical ring tissues.


**Table 7.** Missense variants observed in groups with sorting intolerant from tolerant (SIFT) score calculated in the dbSNP database (Ensembl).



**Table 8.** Genes with elevated impact variants enriched in metabolic pathways with the KEGG Pathway Database.

<sup>1</sup> Metabolic pathway identifying code described for Sus scrofa by the KEGG Pathway.

**Table 9.** Biological processes enriched with genes harboring SNPs in the scrotal hernia group.


The genes in bold were upregulated in the scrotal hernia-affected group.

In the SH group, the genes corresponding to exonic regions, in which the variants were observed, were not enriched by the KEGG Pathway database [54]. Considering the type of impact caused by the variants, the results were distributed as shown in Figure 6A,B for the scrotal and umbilical hernias, respectively. These figures show that more than 60% of the variants represent variations of modifying impact for both types of hernia.

Among the transcripts present in the analyzed samples, five miRNAs were identified in the SH transcriptome and four in the UH transcriptome (Table 11). From these, three were expressed in both types of hernia. No DE miRNAs belonging to the evaluated samples were identified.


**Table 10.** Biological processes enriched with genes harboring single nucleotide polymorphisms (SNPs) in the umbilical hernia group.

The genes in bold were upregulated in the umbilical hernia-affected group.

**Figure 6.** Impact caused by variants and its frequency. Legend: (**A**) samples from the scrotal hernia group. (**B**) Samples from the umbilical hernia group.

**Table 11.** miRNAs identified in the transcriptomes of the pig inguinal and umbilical ring tissues.


SH stands for scrotal hernia and UH for umbilical hernia.

#### **4. Discussion**

Some studies investigating genes involved with the occurrence of hernias have been performed using candidate genes and GWAS approaches [11,25,29,31,32,57–60]. More recently, functional candidate genes were prospected for scrotal [6] and umbilical [7] hernias by our group using RNA-Seq approach. Nevertheless, since the molecular mechanisms involved with these anomalies are not yet completely understood, a comparison between the transcriptome of umbilical and scrotal hernias was performed here, allowing the identification of several common genes differentially expressed in both conditions. Moreover, several SNPs involved with these conditions were also identified and characterized.

#### *4.1. Transcriptome Characterization*

Gene expression studies obtained from samples of the inguinal and the umbilical ring tissue are quite recent. Lorenzetti et al. [5] and Romano et al. [6] performed gene expression analyses from the pig inguinal ring and Souza et al. [7] performed analyses with umbilical ring samples to investigate scrotal and umbilical hernias, respectively. Information about gene expression in those tissues is scarce and there are still many gaps to be elucidated in this field. Those authors found DE genes and pathways in each type of hernia. Since some genes and biological processes seem to be shared between SH and UH, new analyses were performed with focus on tissues characterization and on common processes involved with both types of hernia.

From the transcripts characterization of the two tissues (Table 1), a great similarity between the groups of both types of hernia was observed when comparing the number of each class of transcripts, implying that the appearance of both hernias may be related to the same set of genes or family of genes. This large number of transcripts that are expressed in both groups can also be seen in the Venn diagram (Figure 3). With the exception of processed pseudogenes and snRNA, which were not identified in the UH group, the percentage of each type of transcript was similar. Thus, the expression profile of the genes in the inguinal ring tissue was very similar to the profile found in the umbilical ring, being compatible with the histopathological composition of these two tissues (Figure 2).

#### *4.2. Common Differentially Expressed Genes in Scrotal and Umbilical Hernias*

From the DE genes observed in each type of hernia, 35 were common to both groups. Among these, nine genes (*CD2, GPT2, MOXD1*, ENSSSCG00000031037, EN-SSSCG00000032582, ENSSSCG00000036224, ENSSSCG00000036983, ENSSSCG00000037009 and ENSSSCG00000039111) had different expression profiles when comparing both types of hernia. This behavior may have occurred due to the expression be tissue specific (inguinal ring and umbilical ring) for those genes. Other reasons could be the differences in sex and age between the groups evaluated for the two types of hernia. The other 26 DE genes have shown similar expression in both types of hernia, of which 14 genes were downregulated and 12 were upregulated in pigs affected by both types of hernia.

From the gene interaction network (Figure 5), three DE genes were enriched in both types of hernia. The *MAP1LC3C* (microtubule associated protein 1 light chain 3 γ) interacted in the group of the macroautophagy BP (GO: 0016236) [61]. Macroautophagy is the main path involved in inducing general renewal of cytoplasmic constituents in eukaryotic cells and is essential for cell survival, development, differentiation and homeostasis [62–66]. The Gene Ontology (GO) annotations related to this gene include the assembly and maturation of the autophagosome (Supplementary Materials 1: Table S4). Marcelino et al. [67] indicated the *MAP1LC3C* gene as a candidate for the formation of UH in pigs since this gene was upregulated in the affected compared to the normal pigs. In our research, the *MAP1LC3C* gene exhibited the same behavior, being upregulated in affected animals of both types of hernia when compared to the control groups (Figure 4A). Moreover, this expression profile can be one of the causes of the hernia onset, since the high expression of this gene can cause excessive autophagy and interfere with normal tissue development [68].

The *CFI* gene (Complement Factor I) was grouped in the cluster of the coagulation cascade metabolic pathway and complement system (Figure 5). The coagulation cascade is a sequence of interconnected reactions in order to clot the local blood when a blood vessel injury occurs [69]. The complement system is a proteolytic cascade in blood plasma and a mediator of innate immunity [70]. The GO annotations related to this gene include endocytosis (content absorption through membrane invagination process) and proteolysis (protein degradation process) (Supplementary Materials 1: Table S2). The CFI gene, like the *MAP1LC3C*, was upregulated in animals with hernia when compared to the control group (Figure 4A). The *CFI* encodes the trypsin-like protein serine-protease [42], which plays an essential role in regulating the immune response, controlling all the complement pathways [71]. The participation of the *CFI* in these pathways and processes, taken together with its expression profile, suggests that this gene could be involved with the consequence of these disorders.

The *MUC16* (Mucin-16) gene encodes a protein of the mucin family, which are Oglycosylated proteins found in the apical surfaces of the epithelium and play an important role in the formation of a protective mucous barrier [72]. This gene was enriched in the gene network (Figure 5) as a participant in processing O-glycan BP (GO: 16266) [61]. This process is related to the gradual addition of carbohydrate residues or carbohydrate derivatives to form the O-glycan structure [73]. The *MUC16* gene was also enriched as an integral cell membrane component BP (GO: 16021) [61]. According to Blalock et al. [74], the MUC16 build a protective barrier to the epithelial cell surface, where binding proteins are associated with its tail, linking it to the actin cytoskeleton. This gene was upregulated in the affected group of both types of hernia compared to their respective control groups (Figure 4A), thus configuring a defense system that might have arisen as a consequence of the hernias formation.

#### *4.3. Enriched Biological Processes*

When the 99 BP enriched by the 26 DE genes with an equivalent profile in both types of hernia (Supplementary Materials 1: Table S3) were evaluated, the BP of cell adhesion, apoptosis, organization of the actin cytoskeleton and organization of collagen fibrils can be highlighted, because they are generally linked to the formation of hernias [5–7]. The enriched genes for cell adhesion, *VIT* (Vitrin), *ACER2* (Alkaline Ceramidase 2) and *CHL1* (Cell Adhesion Molecule L1 Like) were downregulated in the affected animals compared to the control groups and *ACAN* (Aggrecan) was upregulated in the affected animals. The cell adhesion BP allows the interaction among cells, and between cells and the extracellular matrix [75]. This BP has already been related to tissue maintenance and cell differentiation [76,77]. The *CHL1* gene was enriched with the process of homophilic cell adhesion via plasma membrane adhesion molecules. *ACER2* participates in the specific BP of negative regulation of cell adhesion mediated by integrin and negative regulation of cell matrix adhesion. The *VIT* gene enriched the process of positive regulation of cell substrate adhesion. Thus, the reduced expression of these genes that actively participate in cell adhesion interferes with the integrity of tissues, which can be determinant for the appearance of both types of hernia.

The *KCNMA1* gene (Potassium Calcium—Activated Channel Subfamily M α 1), which was upregulated in animals affected with hernia, and *ACER2* (downregulated) were enriched in the apoptosis BP. This process is related to the regulation of programmed cell death, which is extremely important for the maintenance of the development of living beings [78]. The overexpression of the *KCNMA1* gene can compromise the tissue as a result of an accumulation of immature cells in the region, which can influence the appearance of hernias, especially when associated with unfavorable environmental factors. *ACER2* was enriched with the specific process of activating cysteine-type endopeptidase activity, involved in the apoptotic process, and the *KCNMA1* was enriched for positive regulation of the apoptotic process. This last gene was also enriched for the relaxation process of the vascular smooth muscle that is related to the negative regulation of the contraction

of this muscle. The relaxation is mediated by a decrease in the phosphorylation state of the myosin light chain [79]. As the expression of this gene was higher in herniated than in normal pigs, the *KCNMA1* can be pointed out as a candidate gene for the formation of umbilical and scrotal hernia, since the lack of local muscle contraction facilitates the passage of the abdominal content through the rings.

Biological processes that regulate the activities of collagen and its structures have been indicated in the enrichment of the *ACAN* and *VIT* genes. The first gene was related to the condensation of mesenchymal cells that differentiate into chondrocytes, organization of collagen fibrils and the development of chondrocytes [61]. The *VIT* gene, on the other hand, was related to the morphogenesis of chondrocytes in the cartilage of the growth plate, in which the structures of a chondrocyte are generated and organized [80]. The *ACAN* gene was upregulated in animals affected with hernia, which is in accordance with the histopathological analyses that evidenced a larger amount of collagen compared to normal pigs. Moreover, *ACAN* upregulation in animals affected with hernia can generate an exaggerated collagen production, which has already been related to hernia previously [81].

Regarding the organization of the cytoskeleton, especially those processes related to actin, two genes were enriched, *SYNPO2* (Synaptopodin 2) and *ENSSSCG00000037142*. *SYNPO2* has been enriched specifically for the process of positive regulation of the actin filament bundles set. The organization of the actin cytoskeleton is carried out at the cellular level and results in the assembly, disposition of the constituent parts or disassembly of the structures, including filaments and their associated proteins [61]. In our study, this gene was downregulated in animals with hernia. This negative regulation can be a predisposing factor to hernia, since the non-assembly and organization of the structures that constitute the tissue can make it less resistant [18].

#### *4.4. DE Genes Located in QTL Regions for Hernias and Polymorphisms Characterization*

Several studies have been carried out to identify QTL regions related to umbilical and scrotal hernia [11,20,29,81,82]. Among the DE genes in the two types of hernia, *ACAN* and *BCHE* (Butyrylcholinesterase) are highlight since they have already been located in QTL regions associated to scrotal/inguinal hernia [20,29]. Even with scientific reports relating these two genes only with QTL regions for scrotal hernia, in our study, the expression profile of these two genes was equivalent in both types of hernia, being upregulated in the affected animals. Souza et al. [7] have recently indicated *ACAN* as a strong candidate gene for triggering umbilical hernias in pigs.

Our results have shown that variations in the transcripts may be related to the manifestation of the different types of hernia. In both groups, most of the polymorphisms detected were SNPs, followed by insertions and deletions (Supplementary Materials 1: Tables S4 and S5). In the SH group, a new SNP was identified on chromosome 13 (13: 34083960-34083960), which is located within a QTL region (QTL ID 55898) associated with scrotal hernia [11,20,29]. This SNP was mapped in the *PARP3* gene (member of the poly ADP-ribose 3 polymerase family), which acts in the repair pathways by base excision, apoptosis and necroptosis, participating in biological processes of DNA repair [4]. Moreover, Piórkowska et al. [83] carried out research with Polish Landrace and Pulawska pigs and pointed out the participation of the PARP3 gene in the regulation of the actin cytoskeleton BP. The muscle tissues belonging to the regions where the hernias occur are classified as skeletal striatum, which are formed by myofibrils composed by actin and myosin. As mentioned by Bendavid [18], disturbances in the structures of muscle fibers cause low resistance in the inguinal region, which can lead to scrotal hernia.

The 53 SNPs observed in the SH group were located in 17 genes (Additional file 1: Table S4), which have been enriched in nine biological processes (Table 9). Most of these BP were related to homeostasis, which are processes that maintain the stability of the structure of the analyzed tissue (GO:0042592) [61]. The *ACACA* gene, enriched in these BP, participates in processes that maintain the stability of anatomical structures of the site [43]. *ACACA* was downregulated in the SH-affected animals (Table 9) indicating

the development of an unstable structure of the inguinal ring, which can influence the development of hernia. In humans, this gene participates in the fatty acid synthesis BP [43], which reinforces the histopathological findings that showed greater amount of adipose tissue in normal than in SH-affected pigs (Figure 2A,B). From the SNPs found in the SH group, all variants were tolerated (Table 7). According to the SIFT score, two had a moderate impact classification, so they can alter the effectiveness of the encoded protein. This means that the function of the proteins resulting from these sites has not been altered, since the SIFT score is a tool that predicts whether the variant affects the function of the protein or not [35]. These SNPs were located in two genes, *RAI14* and *RALGAPA1*; the first has already been annotated and the second has no identification in the VEP tool. No high impact polymorphisms were identified in the SH group.

The 67 SNPs found in the UH group were mapped in 24 genes (Supplementary Materials 1: Table S5). These genes were enriched in six biological processes (Table 10) [41], all of which were related to some type of regulation, mainly metabolic. The *EPHB2* and *VIM* genes were enriched in BP that interrupts the processes of cellular projections formation (GO: 0031345) [61]. These two genes were upregulated in animals affected with UH when compared to the control group (Table 10). The *VIM* gene encodes an intermediate filament protein that is part of the cytoskeleton [43]. Lazarides [84] reported that high amount of this filament is observed in the early stages of myogenesis in humans, and is hardly identified in adult muscles. Thus, the levels of this protein indicate functionality feature. The upregulation of *VIM* in the umbilical ring tissue of the UH-affected animals suggests that this gene may be involved with a consequence of UH since Miller et al. [12] reported that the appearance of hernia can be a consequence of a muscular defect.

Polymorphisms that had a high impact rating in the UH group (Supplementary Materials 1: Table S5) were identified in two genes (*NCOA7* and *SEC62*). These variants still do not have identification in the tool used, but they were classified as insertions of the Frameshift type. Therefore, they can cause an interruption in the translation reading frame, because the number of inserted nucleotides is not multiple of three [35]. The *NCOA7* (Nuclear receptor coactivator 7) is involved in the biological process of RNA polymerase II transcription and negative regulation of the cellular response to oxidative stress. The *SEC62* (Preprotein translocation factor) is related to the regulation of post-translational protein transport to the membrane BP and was mapped in a QTL region for stillborn pigs [85]. The detection of these polymorphisms is important because they can alter not only processes related to hernias, but all important processes for biological maintenance, possibly resulting in transcription failures or disruption in the transport of translated proteins by lack of regulation.

The SNPs classified as having moderate impact for the UH group were found in six genes (Supplementary Materials 1: Table S5), with the SNP rs327289001 being highlighted due to its deleterious SIFT score. This SNP is located in the *ITGAM* gene that participates in the biological process of ectodermal cell differentiation [35]. This process is related to the specialization of previously non-specialized cells, which acquire structure and functioning of ectodermic cells. This differentiation integrates the processes involved in the commitment of a cell to its specific purpose (GO: 0010668) [61]. In the embryonic gastrulation phase, the formation of germ layers (ectoderm, mesoderm and endoderm) occurs, which will give rise to specific tissues and organs [86]. The ectoderm is the external layer of a developing embryo and gives rise to epidermis, hair, nails, cutaneous and mammary glands, tooth enamel, inner ear, lens, and the anterior part of the pituitary gland, besides others related to the neural tube and neural crest [86]. A SNP with deleterious SIFT score indicates that the function of the protein can be altered due to the polymorphism, which in this case can result in non-differentiated cells, compromising the formation of resistant tissues, which, when associated with environmental factors such as obesity, can lead to hernia. SNPs located in QTL regions associated with UH were not found in the current study.

The SNP rs339972872 from the SH group and the SNPs rs324236192 and rs340781986 from the UH group were located in the same gene (*ACACA*) (Supplementary Materials 1: Tables S4 and S5). These are synonym SNPs and were classified as low impact. According to Stachowiak et al. [87], the *ACACA* gene is involved with performance traits in pigs.

From the expressed miRNAs, three were identified in the groups of both hernias. One of them, ssc-mir-214, plays an important role in the regulation of ovarian function and in the induction of granular ovarian cells to induce follicular development [88]. The ssc-mir-145, which was identified only in samples from the SH group, is involved in the development of adipose tissue [89].

We conducted the first comparative study of the pig inguinal and umbilical ring tissue transcriptomes. The results demonstrated similarities related to the expression profile of the whole transcriptome and DE genes in both types of hernia. The *ACAN* gene, which had already been associated to the appearance of scrotal hernia, showed similar behavior in the data obtained from the umbilical hernia group. Moreover, the *MAP1LC3C*, *VIT*, *ACER2*, *KCNMA1* and *SYNPO2* genes were highlighted as candidates for the formation of the two types of hernias evaluated in our studied for presenting equivalent expression in both hernias and for being involved in biological processes such as cell adhesion, cytoskeleton organization, collagen production, muscle relaxation and autophagy. Furthermore, the differential expression of some of those genes, such as *MAP1LC3C*, *VIT*, *ACER2* and *ACAN*, has already been confirmed using qPCR [6,7]. However, further studies are needed to identify the expression profile of these same genes in younger animals to improve our interpretation of the gene regulation mechanisms triggering the formation of hernias. The knowledge of the genetic factors that control the manifestation of both scrotal and umbilical hernia brings possibilities to the pig production chain to develop actions to reduce the appearance of these defects in their herds, aiming to reduce economical losses and favoring the animal welfare.

#### **5. Conclusions**

The expression profile of the inguinal and umbilical ring transcriptomes showed great similarity. Thirty-five differentially expressed genes between normal and affected samples were common to both types of hernia. The *MAP1LC3C*, *ACAN*, *VIT*, *ACER2*, *KCNMA1* and *SYNPO2* genes are indicated as strong candidates for the appearance of both defects. A total of 11 and 14 new SNPs were identified in the samples related to the scrotal hernia and umbilical hernia, respectively. Moreover, a SNP with predicted deleterious function was identified in the *ITGAM* gene, which might be related to the appearance of umbilical hernia in pigs. Finally, the expression profile of these genes possibly interferes with the normal development of the tissues, causing weakness and decreasing the resistance of the site, which can lead to the formation of both types of hernia in pigs.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2073-442 5/12/2/166/s1, Table S1: average of reads sequenced, removed in the quality control analysis and mapped in each group of samples, Table S2: biological processes of the 35 differentially expressed genes between normal and affected pigs common to both scrotal and umbilical hernias using DAVID database. Table S3: enrichment for biological process of the 26 DE genes with equivalent expression profile between both types of hernias using DAVID database. Table S4: polymorphisms identified in samples of the pig inguinal ring. Table S5: polymorphisms identified in samples of the pig umbilical ring. Figure S1: diagram summarizing the experiment and analyses performed. Figure S2: principal component analysis (PCA) plot S2A showing the separation of control (c) and affected (a) samples used to generate the transcriptome of the inguinal ring for scrotal hernia (SH), S2B showing the separation of control (c) and affected (a) samples used to generate the transcriptome of the umbilical ring for umbilical hernia (UH), and S2C with all samples together showing the separation of samples from both SH and UH transcriptomes.

**Author Contributions:** Conceptualization, A.F.G.R., A.M.G.I., J.d.O.P., M.E.C. and M.C.L.; methodology, A.M.G.I., J.d.O.P. and M.C.L.; software, M.E.C. and A.M.G.I.; validation, M.E.C., A.M.G.I., J.d.O.P. and M.C.L.; formal analysis, A.F.G.R., A.M.G.I., J.d.O.P., M.E.C., L.O.D.C., I.R.S., M.R.S., M.A.Z.M., H.C.d.O. and M.C.L.; investigation, A.F.G.R., A.M.G.I., J.d.O.P., M.E.C. and M.C.L.; resources, M.C.L.; data curation, A.F.G.R., A.M.G.I., J.d.O.P., M.E.C. and M.C.L.; writing—original draft preparation, A.F.G.R., A.M.G.I., J.d.O.P., L.O.D.C. and M.A.Z.M.; writing—review and editing, A.F.G.R., A.M.G.I., J.d.O.P., I.R.S., M.R.S., M.E.C., M.A.Z.M., H.C.d.O., L.O.D.C. and M.C.L.; visualization, A.F.G.R., A.M.G.I., J.d.O.P. and M.C.L.; supervision, M.C.L.; project administration, M.C.L.; A.M.G.I., funding acquisition, M.C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Council of Scientific and Technological Development (CNPq), from the Brazilian Government, grant number #476146/2013.

**Institutional Review Board Statement:** This study was performed with the approval of the Embrapa Swine and Poultry National Research Center Ethical Committee of Animal Use (CEUA) under the protocol number 011/2014 in 07/11/2014.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The datasets used or analyzed during the current study are available from the corresponding author on reasonable request. The transcriptome sequences for the scrotal and umbilical hernias are available in the SRA database with BioProject numbers PRJNA350530 and PRJNA445856, respectively. The SNP information is available in the EVA database with Project number PRJEB42670, Analyses number ERZ1737910.

**Acknowledgments:** The authors are grateful to Alexandre L. Tessmann for technical assistance. I.R. Savoldi was sponsored by a PROMOP/Udesc scholarship and M.R. Souza by a CAPES/FAPESC scholarship. MCL is recipient of a productivity fellowship from the National Council for Scientific and Technological Development (CNPq).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


### *Article* **Weighted Single-Step GWAS Identified Candidate Genes Associated with Growth Traits in a Duroc Pig Population**

**Donglin Ruan 1,2,† , Zhanwei Zhuang 1,2,† , Rongrong Ding 1,2 , Yibin Qiu 1,2 , Shenping Zhou 1,2 , Jie Wu 1,2 , Cineng Xu 1,2, Linjun Hong 1,2 , Sixiu Huang 1,2 , Enqin Zheng 1,2 , Gengyuan Cai 1 , Zhenfang Wu 1,2, \* and Jie Yang 1,2,\***


**Abstract:** Growth traits are important economic traits of pigs that are controlled by several major genes and multiple minor genes. To better understand the genetic architecture of growth traits, we performed a weighted single-step genome-wide association study (wssGWAS) to identify genomic regions and candidate genes that are associated with days to 100 kg (AGE), average daily gain (ADG), backfat thickness (BF) and lean meat percentage (LMP) in a Duroc pig population. In this study, 3945 individuals with phenotypic and genealogical information, of which 2084 pigs were genotyped with a 50 K single-nucleotide polymorphism (SNP) array, were used for association analyses. We found that the most significant regions explained 2.56–3.07% of genetic variance for four traits, and the detected significant regions (>1%) explained 17.07%, 18.59%, 23.87% and 21.94% for four traits. Finally, 21 genes that have been reported to be associated with metabolism, bone growth, and fat deposition were treated as candidate genes for growth traits in pigs. Moreover, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses implied that the identified genes took part in bone formation, the immune system, and digestion. In conclusion, such full use of phenotypic, genotypic, and genealogical information will accelerate the genetic improvement of growth traits in pigs.

**Keywords:** Duroc pigs; growth traits; weighted single-step GWAS; SNP

#### **1. Introduction**

Pork is the primary source of protein for humans, with global pork consumption exceeding 110 metric kilotons per year [1]. Growth traits are economically important traits in porcine breeding programs, as accelerating the genetic process of growth-related traits can increase the supply of pork. At present, the age to 100 kg, average daily gain, backfat thickness, and lean meat percentage for a specific stage are vital indicators to measure the growth rate and carcass fat content of pigs due to their significant impact on production efficiency [2]. Furthermore, both genetic and non-genetic effects can affect growth traits, including pig breed, feeding behavior, and nutrition level. However, the above four traits have moderate heritability [3], suggesting that they could be improved by the genetic method.

Since the first genome-wide association study (GWAS) for age-related macular degeneration was published in 2005, GWAS has been widely used to identify quantitative trait loci (QTL) and to map candidate genes for complex traits in humans [4] and domestic animals [5]. Until now, 2036 QTL for growth traits have been reported in the pig QTL database

**Citation:** Ruan, D.; Zhuang, Z.; Ding, R.; Qiu, Y.; Zhou, S.; Wu, J.; Xu, C.; Hong, L.; Huang, S.; Zheng, E.; et al. Weighted Single-Step GWAS Identified Candidate Genes Associated with Growth Traits in a Duroc Pig Population. *Genes* **2021**, *12*, 117. https://doi.org/10.3390/ genes12010117

Received: 23 October 2020 Accepted: 11 January 2021 Published: 19 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

(https://www.animalgenome.org/cgi-bin/QTLdb/SS/summary, release 27 August 2020). These findings have provided a certain number of molecular markers to porcine breeding for growth traits—for instance, Jiang et al. [6] performed a GWAS in a total of 2025 American and British Yorkshire pigs using PorcineSNP80 bead chip and detected five significant SNPs for days to 100 kg and the other five significant SNPs for 10th rib backfat thickness. Qiao et al. [7] found 14 QTL significantly associated with growth-related traits for White Duroc × Erhualian F2 and Sutai (Chinese Taihu × Western Duroc) populations. Although many studies have contributed to complex quantitative traits by GWAS, the genetic mechanisms of growth traits in pigs remain unclear. Additionally, some single marker GWAS analyses result in a weak power for QTLs detection and low accuracy for mapping. Moreover, most studies on GWAS for growth traits used the limited population size of genotyped animals and neglected the pedigree relationship. To overcome the limitation of the traditional GWAS approach, the weighted single-step GWAS (wssGWAS) proposed by Wang et al. [8] is preferable for livestock breeding, for which phenotypic and genealogical information is available for the vast majority of individuals and the small size of individuals genotyped.

The GWAS under the single-step genomic best linear unbiased prediction (ssGBLUP) framework is called ssGWAS, which intermixes genotypes, pedigree, and phenotypes data in a single analysis without creating pseudo-phenotypes [9]. However, when some traits are affected by significant QTL in practice, it is improper to account for all SNPs to explain the same proportion of genetic variance in ssGBLUP [10]. In that case, the wssGWAS can be adopted, which weighs SNPs according to their effects that were calculated genomic estimated breeding values (GEBVs) via ssGBLUP. The wssGWAS method has been successfully applied to detect supplementary QTLs and candidate genes in domestic and aquaculture animals, such as carcass traits in Nellore cattle [11], growth and carcass traits in rainbow trout [12], and reproductive traits in pigs [13]. However, to our knowledge, few wssGWASs have been performed to study growth traits in purebred Duroc pigs. Therefore, this study aims to identify genomic regions and candidate genes associated with growth traits such as days adjusted to 100 kg (AGE), average daily gain adjusted to 100 kg (ADG), backfat thickness (BF) and predicted lean meat percentage (LMP) adjusted to 100 kg in a Duroc pig population using the wssGWAS methodology. Then, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis facilitate further understanding of biological processes and functional terms of candidate genes for growth traits.

#### **2. Materials and Methods**

#### *2.1. Ethics Statement*

All animals used in this study were used according to the guidelines for the care and use of experimental animals established by the Ministry of Agriculture and Rural Affairs of China. The ethics committee of South China Agricultural University (SCAU, Guangzhou, China) approved the entire study. No experimental animals were anesthetized or euthanized in this study.

#### *2.2. Animals, Phenotypes, and Pedigree*

The animals used in this study were raised in two core farms of the Wens Foodstuff Group CO., Ltd. (Guangdong, China) with uniform standards. In brief, a total of 3945 Canadian Duroc pigs (1966 males and 1979 females) born between 2015 and 2017 were used in this study. Among them, 2084 individuals had genotypes and four growth-trait phenotypes in the pedigree, while 1843 ungenotyped individuals in the pedigree had phenotypes of AGE and ADG, and 1825 ungenotyped individuals in the pedigree had phenotypes of BF and LMP. Furthermore, the complete pedigree could be traced back 3 generations, with 5204 pigs in the full pedigree (2103 males and 3101 females).

Days to 100 kg and ADG were measured from 30 to 115 kg and then adjusted to 100 kg. AGE was adjusted to 100 kg using the formula below [14]:

$$AGE \text{ adjusted to } 100 \text{ kg} = \text{Measured age} - \left(\frac{\text{Measured weight} - 100 \text{ kg}}{\text{Correction factor } 1}\right) \tag{1}$$

where the correction factor 1 of sire and dam are different, as follows:

$$\text{Sire} : \text{Correction factor 1} = \frac{\text{Measured weight}}{\text{Measured age}} \times 1.826\tag{2}$$

$$\text{Dam} : \text{Correction factor 1} = \frac{\text{Measured weight}}{\text{Measured age}} \times 1.715 \tag{3}$$

*ADG* was adjusted to 100 kg by following formula [14]:

$$\text{ADG adjusted to 100 kg} = \frac{100 \text{ kg}}{\text{AGE adjusted to 100 kg}}\tag{4}$$

Adjusting LMP to 100 kg, phenotypes of BF and loin muscle depth (LMD) was measured between the last 3rd and 4th rib of Duroc pigs at the weight of 100 ± 5 kg by an Aloka 500 V SSD B ultrasound (Coromertics Medical Systems, USA) [15]. BF and LMD adjusted to 100 kg were calculated as reported by the Canadian Center for Swine Improvement (http://www.ccsi.ca/Reports/Reports\_2007/Update\_of\_weight\_adjustment\_ factors\_for\_fat\_and\_lean\_depth.pdf):

$$\text{BF adjusted to 100 kg} = \text{Measured BF} \times \frac{A}{A + \left[\text{B} \times (\text{Measured Weight} - 100)\right]} \tag{5}$$

where *A* and *B* are different for sire and dam, as follows:

$$\text{Sire} : A = 13.47; \ B = 0.1115 \tag{6}$$

$$\text{Dam}: A = 15.65; \text{ B = 0.1566}\tag{7}$$

*LMD* adjusted to 100 kg was calculated by the following equation [16]:

$$\text{LMD adjusted to 100 kg} = \text{Measured LMD} \times \left[ \frac{a}{a + b \times (Measured Weight - 100)} \right] \tag{8}$$

where *a* and *b* are gender-specific, and

$$Size: a = 50.52; \; b = 0.228\tag{9}$$

$$\text{Dam}: a = \text{52.01}; b = 0.\text{228} \tag{10}$$

*LMP* was adjusted to 100 kg using the formula below [16]:

$$\text{LMP adjusted to 100 kg} = 61.21920 - 0.77665 \times \text{BF} + 0.15239 \times \text{LMD} \tag{11}$$

Overall, 3927 individuals were used in wssGWAS for ADG and AGE; 3909 individuals were used in wssGWAS for BF and LMP.

#### *2.3. Genotyping and Quality Control (QC)*

DNA was extracted from ear tissue of 2084 Duroc pigs following the standard phenol/chloroform method, then quantified and diluted to 50 ng/µL. All DNA samples were genotyped by GeneSeek porcine 50 K SNP chip from Illumina (Neogen, Lincoln, NE, USA), including 50,649 SNPs mapped to Sus scrofa11.1 (https://www.ensembl.org/biomart) in total. Quality control was performed by PLINK v1.09 (Boston, MA, USA) [17] in which

SNPs were excluded when individuals call rate was <90%, SNPs call rate was <90%, Hardy– Weinberg equilibrium *p*-value was <10−<sup>6</sup> , minor allele frequency was <0.01, and SNPs were located in sex chromosomes and unmapped. After QC, a final set of 35,851 high-quality SNPs for 2084 Duroc pigs remained for subsequent analyses.

#### *2.4. Statistical Analyses*

Variance components for AGE, ADG, BF, and LMP traits were estimated with two methods using the average information restricted maximum-likelihood (AIREML), including pedigree-based Best Linear Unbiased Prediction (BLUP) and ssGBLUP. The four traits were analyzed using the same single-trait animal model, as described below:

$$Y = Xb + Za + e\tag{12}$$

where *Y* was the vector of phenotypic values; *X* was the incidence matrix of fix effect for relating phenotypes; *b* was the vector of fixed effect, including birth year, sex, and farm; *Z* was the incidence matrix of random effect; *a* was the vector of additive genetic effects, and *e* was the vector of residuals. Narrow sense heritability was estimated as *h* <sup>2</sup> = *σ* 2 *a σ* 2 *<sup>a</sup>* +*σ* 2 *e* , where *σ* 2 *<sup>a</sup>* and *σ* 2 *<sup>e</sup>* were additive genetic variance and residual variance, respectively.

Additionally, the GEBVs of all individuals were estimated via the same single-trait model as described previously using the ssGBLUP [18] approach, and marker effects were calculated from the GEBVs. Comparing with the regular BLUP approach, ssGBLUP replaces the inverse of the pedigree relationship matrix (*A* −1 ) with the matrix *H*−<sup>1</sup> , for which the matrix *H* combined the pedigree and the genomic relationship matrices [19]. The inverse of matrix *H* was represented as follows:

$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A\_{22}^{-1} \end{bmatrix} \tag{13}$$

where *A* −1 <sup>22</sup> was the inverse matrix of the numerator relationship matrix considering genotyped animals and *G* <sup>−</sup><sup>1</sup> was the inverse matrix of the genomic relationship matrix [20]. The genomic matrix *G* can be created as follows [21]:

$$G = \frac{Z D Z'}{\sum\_{i=1}^{N} 2p\_i (1 - p\_i)} \tag{14}$$

where *Z* was a centered matrix of SNP genotypes (aa = 0, Aa = 1 and AA = 2), *D* was a matrix of weights for SNP variances, *n* was the number of SNPs and *p<sup>i</sup>* was the minor allele frequency of the i-th SNP [8].

The wssGWAS of SNP effects and weights were calculated following by Wang et al. [8]:


$$d\_{i(t)} = \mathfrak{M}\_{i(t)}^2 p\_i (1 - p\_i) \tag{15}$$

where *i* was the *i*-th SNP;

6. Normalize SNP weights to keep total genetic variance constant via

$$D\_{(t+1)} = \frac{\operatorname{tr}\left(D\_{(t)}\right) \times D\_{(t+1)}}{\operatorname{tr}\left(D\_{(t+1)}\right)}\tag{16}$$

7. Set *t* = *t* + 1, then loop to step 2.

The procedure was run for three iterations, as suggested by Wang et al. [8], which reached a high accuracy of GEBVs. In this study, SNPs located within 0.8 Mb (according to the linkage disequilibrium decay of this population [22]) were grouped in a window, and the percentage of genetic variance explained by each 0.8 Mb window was calculated following as below [8]:

$$\frac{\text{Var}(a\_i)}{\sigma\_a^2} \times 100\% = \frac{\text{Var}(\sum\_{j=1}^{x} Z\_j \mathbf{g}\_j)}{\sigma\_a^2} \tag{17}$$

where *a<sup>i</sup>* was the genetic value of the *i*-th region consisting of *x* = 0.8 Mb.

The procedures mentioned above were run with BLUPF90 software family programs [23] iteratively. The RENUMF90 module was used to obtain the required parameter file formats; the AIREMLF90 module was used for variance components estimation, the BLUPF90 module for GEBVs calculation, and the postGSF90 module for association analysis.

#### *2.5. Identification of Candidate Genes and Functional Enrichment Analysis*

Genomic windows that explained higher than 1.0% of the total genetic variance were selected as candidate QTL regions associated with growth traits in this study, which was also used in previous studies [8,13]. Since the 0.8 Mb window explained on–average 0.0473% (100% divided by 2115 genomic regions) of the genetic variance, the 1% threshold is over 20 times the expected average genetic variance explained by the 0.8 Mb window. The first three windows that explained the largest proportion of genetic variance for each trait were extended to 0.4 Mb flanking on either side of the regions. For the identified QTL regions, genes were searched using the Ensemble Sus scrofa 11.1 (https://www.ensembl. org/biomart) database within significant windows. To better understand the biological processes, GO and KEGG analyses were performed based on genes within significant regions using the database for annotation, visualization, and integrated discovery (DAVID v6.8, https://david.ncifcrf.gov/). A *p*-value of <0.05 was the threshold for significantly enriched GO terms and KEGG pathways.

#### **3. Results and Discussion**

#### *3.1. Descriptive Statistics and Heritability for the Growth Traits*

Descriptive statistics of the phenotypes are presented in Table 1. Previous studies reported that the average AGE phenotype of Duroc and other western commercial pig breeds was between 150 and 162 days, ADG was between 610 and 820 g/day, BF was between 11.69 and 18.19 mm, and LMP was between 56% and 62% [6,14,24–26]. The phenotypic averages for AGE, ADG, BF, and LMP in this study were similar to previous studies. The coefficients of variation (CV) for AGE, ADG, BF, and LMP were 7.30%, 7.25%, 17.86%, and 2.83%, respectively. The results indicated substantial phenotypic variation in these traits, except LMP. Since Duroc pigs are the terminal male parent of the Duroc × (Landrace × Yorkshire) pigs (DLY), the LMP of Duroc pigs receives long-term positive selection [27]. In other words, the lower CV of LMP indicates that the selection prior to the LMP was effective in this core Duroc population.

**Table 1.** Descriptive statistics of growth traits in the Duroc pig population.


<sup>a</sup> AGE, days to 100 kg, ADG: average daily gain adjusted to 100 kg, BF, backfat thickness adjusted to 100 kg; LMP, predicted lean meat percentage adjusted to 100 kg; <sup>b</sup> SD, standard deviation; <sup>c</sup> CV, coefficient of variation.

To better understand the genetic background of growth traits, we estimated the genetic variance (*σ* 2 *a* ), residual variance (*σ* 2 *e* ), and heritability (*h* 2 ) by different methods, including BLUP and ssGBLUP. The heritability estimated by BLUP and ssGBLUP were 0.507 and 0.343, 0.508 and 0.333, 0.512 and 0.315, and 0.554 and 0.332 for AGE, ADG, BF, and LMP, respectively (Table 2). There were differences in the heritability estimated by the two methods in this study, and the previous study showed that common environmental components lead to a possible overestimation of genetic variance in the pedigree-based BLUP method of estimating heritability [28]. Compared with the BLUP method, the ssGBLUP method has a lower standard error. The ssGBLUP method uses both pedigrees and genotyped information, and the estimated genetic parameters are theoretically more accurate [29]. Furthermore, the results from the two methods indicated that these traits were moderate heritability traits and could be genetically improved by genetic techniques.

**Table 2.** Variance components and heritability estimates of growth traits.


<sup>a</sup> AGE, days to 100 kg, ADG: average daily gain adjusted to 100 kg, BF, backfat thickness adjusted to 100 kg; LMP, predicted lean meat percentage adjusted to 100 kg; **\*** *σ* 2 *a* , genetic variance, *σ* 2 *e* , residual variance, *σ* 2 *p* , phenotypic variance, *h* 2 , heritability; SE, standard error.

#### *3.2. Summary of wssGWAS*

Most important economic traits of livestock are quantitative traits with complicated genetic architectures. Therefore, uncovering the candidate genes underlying these traits has been a crucial goal in livestock breeding programs. In particular, growth rate and carcass fat content comprise the essential measuring basis of production performance in pigs, influencing the economic benefit directly. In this study, genetic variance explained by 0.8 Mb windows for each trait was achieved by wssGWAS. The first three most important QTL regions and the candidate genes are shown in Table 3. Overall, the first three QTL regions totally explained 5.96%–7.25% of the genetic variance of these traits under study. For each trait, the most significant windows explained approximately 2.56%–3.07% of the total genetic variance. Additionally, the identified windows (>1%) explained 17.07%, 18.59%, 23.87%, and 21.94% for AGE, ADG, BF, and LMP, respectively (Supplementary File, Tables S1–S5). Previous GWAS research reported that the candidate QTL regions of ADG on Sus scrofa chromosome (SSC) 1, 3, 6, 8, 13 and the candidate QTL regions of AGE on SSC 1, 3, 6, 8, 10, explaining a total of 8,09% and 4.08% of genetic variance [14], respectively. Due to LD, the wssGWAS method using the SNP window for analysis probably better identifies unknown QTL than the traditional GWAS, avoiding overestimation of the detected QTL number and false-positives [30,31]. Moreover, iterative weighting for SNPs could highlight QTL with larger effects [8]. Comparing with the results of the ssGWAS in ADG and BF by Matteo et al. [32], and our results identify the most significant QTL regions explaining greater genetic variance. Figure 1 shows the proportion of variance explained by each 0.8 Mb window for the studied traits, suggesting the polygenic genetic architecture of these traits.


**Table 3.** First three most important quantitative trait loci (QTL) regions and candidate genes for growth traits.

<sup>a</sup> AGE, days to 100 kg, ADG: average daily gain adjusted to 100 kg, BF, backfat thickness adjusted to 100 kg; LMP, predicted lean meat percentage adjusted to 100 kg; <sup>b</sup> Chr, chromosome; <sup>c</sup> gVar (%) represents the proportion of genetic variance explained by 0.8 Mb. For each trait, the genomic regions are sorted in descending order according to the proportion of genetic variance explained.

**Figure 1.** The proportion of genetic variances of the growth traits is explained by 0.8 Mb windows. gVar (%) represents the proportion of genetic variance explained by 0.8 Mb windows; 100 kg AGE, days to 100 kg; 100 kg ADG: average daily gain adjusted to 100 kg; 100 kg BF, backfat thickness adjusted to 100 kg; 100 kg LMP, predicted lean meat percentage adjusted to 100 kg.

#### *3.3. wssGWAS for AGE and ADG*

For AGE, 11 relevant QTL regions located on SSC 1, 2, 3, 4, 5, 9, 11, and 14 were identified (Supplementary Materials, Tables S1–S3). These regions explained 1.13–3.07% of total genetic variance for AGE, and 73 genes were annotated in these genomic regions. For ADG, 13 relevant QTL regions located on SSC1, 2, 3, 4, 5, 9, 11, 12, and 14 were identified, where 104 genes are located in these genomic regions (Supplementary Material, Tables S1 and S3). These regions explained total genetic variance ranged from 1.06% to 2.56% for ADG.

For the identified significant regions, there were 10 overlapped windows for AGE and ADG, which explained different proportions of genetic variance in these two traits. For complex quantitative traits, it was assumed that the linear effects of genes fitted the average of traits completely. However, the effects of genes are not always linear for the traits in practice, and the nonlinear assumption is more appropriate [14], which means that genes contributed differently and pleiotropic effects of the QTL between traits. QTLs with pleiotropic effects are common in the pig genome. For instance, Yang et al. [33] reported that a pleiotropic QTL on SSC 7 was associated with the vertebral number, carcass length, and teat number. In the present study, the region with the largest explained genetic variance for AGE and ADG, located in the region of 4.38–5.98 Mb on SSC4, seemingly had pleiotropic effects on meat and carcass traits in pigs [34]. Considering the duplication of identified windows and the strong genetic relationship of AGE and ADG, the genes identified by these two traits as common candidate genes are acceptable.

Among the significant windows of these two traits, the most important region (4.38–5.98 Mb on SSC4) harbored the *Family with Sequence Similarity 135 Member B* (*FAM135B*). The expression of *FAM135B* promotes granulin (GRN) secretion, and GRN is a secreted growth factor with high expression in epithelial, immune, chondrocytes, and neuronal cells [35]. Furthermore, *FAM135B* was reported as a candidate gene related to growth traits in beef cattle [36] and reproductive traits in Duroc pigs [37]. The *Zinc Finger And AT-Hook Domain Containing* (*ZFAT*) located in the region of 6.75–8.35 Mb on SSC4, and its mutation would lead to abnormal human body development and thyroid hormone secretion that played a key role in growth and metabolism [38].

The *Nuclear Factor, Interleukin 3 Regulated* (*NFIL3*) and the *Receptor Tyrosine Kinase-Like Orphan Receptor 2* (*ROR2*) were located in the regions of 1.63–3.23 Mb on SSC14. Wang et al. [39] reported that *NFIL3* affected the circadian lipid metabolism program, lipid–absorption, and export of intestinal epithelial through mouse experiments. The mice knocked out *ROR2* resulted in shortened or deformed bones and neurodevelopmental dysplasia [40].

The *Solute Carrier Family 27 Member 6* (*SLC27A6*) gene is located in the region of 130.75–132.35 Mb on SSC9. The *SLC27A6* gene had high expression in fat and muscle tissue and worked on lipid metabolism in pigs [41]. The *Adrenoceptor β 2* (*ADRB2*) gene, located in the region of 149.94–151.54 Mb on SSC2, encoded the β-adrenergic receptor that played an essential role in regulating metabolic level [42]. Furthermore, Bachman et al. [43] found that the knockout mice *ADRB*s have a reduced metabolic rate and accelerated fat deposition. The members of the *Tumor Necrosis Factor Receptor Superfamily* (*TNFS*), among which *TNFS11* was identified in the region of 24.24–25.04 Mb on SSC11, were responsible for bone growth in mice [44], and the variation of *TNFS11* led to the low level of serum insulin-like growth factor 1 (*IGF1*) influencing growth rate [45].

#### *3.4. wssGWAS for BF*

A total of 17 relevant QTL regions on SSC2, 3, 4, 6, 7, 10, 12, 13, 14, and 15 were identified for BF (Supplementary Material, Tables S1 and S4), where 99 genes were targeted in these genomic regions. These genomic regions explained 1.02–2.97% of the total genetic variance for BF.

The most significant window was located in the region of 29.34–30.94 Mb on SSC7, where four genes were targeted and were related to BF. In previous studies, *Death Domain*

*Associated Protein* (*DAXX*) was reported to affect fat deposition and fatty acid synthesis via regulating the transcriptional activity of the androgen receptor negatively [46,47]. For *Inositol 1,4,5-Trisphosphate Receptor Type 3* (*ITPR3*), another gene located in the most important window, it was confirmed that mutations could cause taste disorders in mice [48]. Nonetheless, the *Inositol Hexakisphosphate Kinase 3* (*IP6K3*) gene was located in the same region. The mice without this gene resulted in a lower growth rate and metabolism and a shorter lifespan [49]. *Protein–Kinase C and Casein Kinase–Substrate In Neurons 1* (*PACSIN1*), a fourth gene located in the region of 29.34–30.94 Mb, was identified concerning the bodyweight [50] and loin muscle area [51] in pigs.

*CYP7A1*, a member of Cytochrome P450 Family 7 Subfamily A, was identified in the region of 74.12–74.92 Mb on SSC4. The *CYP7A1* gene-encoded enzyme cholesterol 7α-hydroxylase mainly catalyzes the decomposition of cholesterol and synthesis of cholic acid [52]. The *SECIS-Binding Protein 2* (*SECISBP2*) was located in the region of 0.43–1.22 Mb on SSC14, and its mutation brought about abnormal thyroid hormone metabolism in humans [53].

#### *3.5. wssGWAS for LMP*

Altogether, 15 relevant regions on SSC2, 3, 4, 5, 6, 10, 11, 12, 17 and 18 were identified for LMP. These regions explained 1.00–2.68% of total genetic variance for LMP and 115 genes located in these genomic regions (Supplementary Materials, Tables S1 and S5). The *N-α-Acetyltransferase 40* (*NAA40*) gene and *Galectin 12* (*LGALS12*) gene were located in the region of 8.11–9.71 Mb on SSC2 with the highest percentage of total genetic variance. Liu et al. [54] demonstrated that knockout male rats of the *NAA40* gene exhibited abnormal lipid metabolism and reduced fat mass. In addition, *NAA40* was identified to be associated with the metabolism/transport of fatty acids or lipids in pigs [55]. For *LGALS12*, this gene was preferentially expressed in adipocytes, and mice lacking *LGALS12* resulted in increased mitochondrial respiration, reduced adiposity and decreased insulin resistance/glucose tolerance [56]. Furthermore, *LGALS12* has been identified to be associated with intramuscular and subcutaneous fat in pigs [57].

The *Corticotropin-Releasing Hormone Receptor 2* (*CRHR2*) gene, located in the region of 42.05–42.83 Mb on 18, was highly expressed in adipose tissue, which was involved in the regulation of energy homeostasis and the anorexia effect of fat levels in the corticotropinreleasing hormone (CRH) system [58]. For the region of 41.40–42.12 Mb on SSC2, *Peroxisomal Biogenesis Factor 16* (*PEX16*) and *Cryptochrome Circadian Regulator 2* (*CRY2*) were associated with LMP. Hofer et al. [59] found that the silence of *PEX6* affects adipocyte differentiation and increases peroxisomal fatty acid oxidation–reduction. For the *CRY2* gene, Mármol-Sánchez et al. [60] reported that the polymorphism of *CRY2* was significantly associated with stearic acid content in the longissimus dorsi muscle in Duroc pigs. The *Acyl-CoA Thioesterase 8* (*ACOT8*) gene is located in the region of 47.67–48.83 Mb on SSC17. The protein encoded by this gene is an acyl-CoA thioesterase enzyme that influences the thyroid hormone to regulate lipid storage and utilization according to metabolic demands [61].

#### *3.6. BF and LMP Overlap Regions*

In the present study, six genomic regions were found to be associated with both BF and LMP, including 41.40–42.12 Mb on SSC2,117.76–119.36 Mb on SSC3, 67.38–68.18 Mb, and 155.99–156.71 Mb on SSC6, and 38.67–40.27 Mb and 55.95–57.55 Mb on SSC10. Notably, BF and LMP were used as an important indicator of carcass fat content in production. Moreover, the genetic correlation of lipid deposition with growth rate and feed efficiency traits were positively high and negatively moderate, respectively [62]. Therefore, these overlap and pleiotropic regions were valuable for growth traits in pigs.

*Potassium Inwardly Rectifying Channel Subfamily J Member 11* (*KCNJ11*), located in the region of 41.40–42.12 Mb on SSC2, was associated with type 2 diabetes in humans [63]. The region of 117.76–119.36 Mb on SSC3 was the second most important window for BF and LMP, which explained 1.94% and 2.08% of the additive genetic variance, respectively, and

the *Syndecan 1* (*SDC1*) gene was detected. The *SDC1* gene has been proved to consume the intradermal fat layer, improve glucose tolerance, and significantly reduce body fat content in knockout mice [64]. Two genomic regions stood out on SSC6, which explained 1.05% and 1.42% of additive genetic variance for BF, and 1.26% and 1.19% of additive genetic variance for LMP, respectively. However, the annotated genes in one of these regions are not reported to be associated with growth traits, and no genes are described in the other region on SSC6, pending further studies.

The *Neuropilin 1* (*NRP1*) gene is located in the region of 55.95–57.55 Mb on SCC10, and several studies have exhibited its function in regulating fat cell–activity [65] and reducing dietary insulin resistance [66]. For the region of 38.67–40.27 Mb on SSC10, three genes were identified to be associated with BF and LMP. The *MOB Kinase Activator 3B* (*MOB3B*) gene was significantly associated with intramuscular fat and residual feed intake in cattle [67]. *Ras-Related Protein Rab-18* (*RAB18*), another gene located in the region of 38.67–40.27 Mb on SSC10, encoded a crucial Rab guanosine triphosphatase that controls the growth and maturation of lipid droplet, which lipid droplet was an intracellular organelle to stores triglycerides and cholesterol [68]. Still, in the same region, the *Membrane Palmitoylated Protein 7* (*MPP7*) gene was detected, and Bhoj et al. [69] reported that differences in *MPP7* gene expression affected glucose metabolism in the body.

#### *3.7. GO and KEGG Analysis*

In the current study, gene set enrichment analyses revealed that several terms might be related to growth traits. Among them, seven biological processes, two cellular components, one molecular function, and four KEGG pathways were targeted significantly (Table 4).


**Table 4.** Significant gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with growth traits in Duroc pigs (*p* < 0.05).

<sup>a</sup> GO, gene ontology, KEGG, Kyoto Encyclopedia of Genes and Genomes pathway.

The positive regulation of bone mineralization (GO:0030501) is a key biological process of bone formation, which promotes the deposition of inorganic minerals in the organic– matter of the bone. Bone mineralization affects the strength and density of bone, enabling it to bear the body weight. Shim et al. [70] found that rapid weight gains were correlated with bone mineralization in broilers.

The positive regulation of lipophagy (GO:1904504) is an autophagic process that promotes cells to activate autophagy-related molecules to degrade lipids and regulate intracellular lipid content. Excessive fat deposition in pigs reduces feed conversion rate and affects growth rate, but also affects the quality of animal products [71]. Hence, the function of lipophagy in preventing excess fat deposition may improve the growth traits of pigs. Moreover, the PPAR signaling pathway (ssc03320) is the main pathway associated with lipid metabolism in pigs [72]. Free fat acid from lipophagy is a well-characterized ligand for PPARγ (peroxisome proliferator-activated receptor γ) [68], which activated the PPAR signaling to induce agouti-related peptide expression (AgRP). Sandoval et al. [73] found that AgRP co-expressed neuropeptide Y stimulated food intake and reduced energy expenditure.

Potassium ion import (GO:0010107) mediates the transmembrane transport of ions and plays a key role in material exchange, energy transfer, and signal transduction. In particular, resting potassium currents make sour taste cells particularly sensitive to changes in intracellular pH, thereby affecting sour taste transduction [74]. Besides this, the taste transduction (ssc04742) pathway is the biological process by which the taste receptors of the organism detect and encode taste information through various transduction mechanisms. Several studies have shown that taste affects appetite and feed intake, and leads to a decrease in growth traits, such as body weight [75]. Moreover, the taste transduction pathway stimulates cephalic phase responses [76], promoting the process of salivary, gastric acid, and cephalic insulin secretion. Moreover, the insulin secretion (ssc04911) pathway was related to feeding intake, which promotes digestive metabolism and nutrient absorption and thus improves the growth trait.

#### **4. Conclusions**

In conclusion, we indicated 41 genomic regions to be associated with four growth traits (AGE, ADG, BF, and LMP) in a Canadian Duroc pig population using the wssGWAS method. The identified windows explained 1.00 to 3.07% of the genetic variance. Furthermore, 21 genes with related functional validation in previous studies were highlighted as candidate genes for growth traits in pigs. Moreover, GO, and KEGG enrichment analyses implied that the identified genes took part in bone formation, the immune system, and digestion, which were associated with growth traits. Such a full use of phenotypic and genotypic data and genealogical information will further advance our understanding of the genetic architecture and accelerate the genetic improvement of these economically important traits in pigs. In addition, the SNPs within identified regions may be useful for marker-assisted selection or genomic selection in future pig breeding.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2073-442 5/12/1/117/s1, Table S1: Genomic regions of 0.8 Mb explained more than 1% of genetic variance for growth traits in Duroc pigs; Table S2: The explained genetic variance of SNPs within significant windows for AGE; Table S3: The explained genetic variance of SNPs within significant windows for ADG; Table S4: The explained genetic variance of SNPs within significant windows for BF; Table S5: The explained genetic variance of SNPs within significant windows for LMP.

**Author Contributions:** J.Y. and Z.W. conceived and designed the experiment. D.R., Z.Z., and R.D. performed the experiments. Y.Q., S.Z., J.W., C.X., L.H., S.H., G.C., and E.Z. collected the samples and recorded the phenotypes. D.R., Z.Z., and J.Y. analyzed the data and wrote the manuscript. Z.W. contributed to the materials. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Local Innovative and Research Teams Project of Guangdong Province (2019BT02N630), the National Modern Agricultural Industry Science and Technology Innovation Center Creation Project of Guangzhou (2018KCZX01), the Natural Science Foundation of Guangdong Province (2018B030315007) and the Pearl River S and T Nova Program of Guangzhou (201906010011). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Institutional Review Board Statement:** This study was approved by the Ethics Committee of South China Agricultural University (SCAU, Guangzhou, China and Approval number SCAU#0017).

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The datasets generated and/or analyzed during the current study are not publicly available since the studied population is consisted of the nucleus herd of Wens Foodstuff Group Co., Ltd., but are available from the corresponding author on reasonable request.

**Acknowledgments:** The authors would like to thank Wens Foodstuff Group Co., Ltd. (Guangdong, China) for providing all phenotypic data, pedigree information, and ear tissue samples.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Identification of Circular RNAs in Hypothalamus of Gilts during the Onset of Puberty**

**Qingnan Li 1,†, Xiangchun Pan 2,† , Nian Li 2 , Wentao Gong 2 , Yaosheng Chen 1, \* and Xiaolong Yuan 2,3, \***


**Abstract:** The disorders of puberty have shown negative outcomes on health of mammals, and the hypothalamus is thought to be the main regulator of puberty by releasing GnRH. Many studies show that the circular RNAs (circRNAs) might be implicated in the timing of puberty in mammals. However, the circRNAs in the hypothalamus of gilts have not been explored. To profile the changes and biological functions of circRNAs in the hypothalamus during the onset of puberty, RNA-seq was utilized to establish pre-, in-, and post-pubertal hypothalamic circRNAs profiles. In this study, the functions of hypothalamic circRNAs were enriched in the signaling pathway of neurotrophin, progesterone-mediated oocyte maturation, oocyte meiosis, insulin, ErbB, and mTOR, which have been highly suggested to be involved in the timing of puberty. Furthermore, 53 circRNAs were identified to be putative hypothalamus-specific expressed circRNAs, and some of them were exclusively expressed in the one of three pubertal stages. Moreover, 22 differentially expressed circRNAs were identified and chosen to construct the circRNA-miRNA-gene network. Moreover, 10 circRNAs were found to be driven by six puberty-related genes (*ESR1*, *NF1*, *APP*, *ENPP2*, *ARNT*, and *DICER1*). Subsequently, the expression changes of several circRNAs were confirmed by RT-qPCR. Collectively, the preliminary results of hypothalamic circRNAs provided useful information for the investigation of the molecular mechanism for the timing of puberty in gilts.

**Keywords:** hypothalamus; puberty; circRNAs; pubertal genes

### **1. Introduction**

In female pigs, puberty is widely defined as the emergence of the first estrous and capable of reproduction [1]. There are more evidences demonstrated that gilts having an earlier age at puberty can shorten the generation interval of livestock [2,3] and farrow multiple litters [4]. Nevertheless, the basic molecular mechanisms that regulate the onset of puberty have not been largely explored in gilts. Generally, the onset of puberty is controlled and driven by hypothalamic-pituitary-gonadal (HPG) axis. The release of gonadotropin-releasing hormone (GnRH) from the hypothalamus leads to the release of FSH and LH from the pituitary [5], and the FSH and LH act on the folliculogenesis, oogenesis, and sex steroid of the gonads to arouse the timing of puberty in mammals [6]. Lomniczi, A. et al. showed that disrupting the release of pulsatile GnRH in hypothalamus delayed puberty [7]. Pandolfi, E.C. et al. demonstrated that the deletion of homeodomain protein sine oculis-related homeobox 6 (Six6) in hypothalamic GnRH neuron can leads to infertility [8]. These demonstrations indicate that the hypothalamus plays an essential role in the onset of puberty.

**Citation:** Li, Q.; Pan, X.; Li, N.; Gong, W.; Chen, Y.; Yuan, X. Identification of Circular RNAs in Hypothalamus of Gilts during the Onset of Puberty. *Genes* **2021**, *12*, 84. https://doi.org/ 10.3390/genes12010084

Received: 1 December 2020 Accepted: 8 January 2021 Published: 12 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Circular RNAs (circRNAs) are covalently closed transcript generated by the backsplicing. This back-splicing jointed a canonical 5′ splice site sequence to an upstream 3 ′ splice site sequence to produce the only region of a circRNA (BMJ, back-spliced junction) [9]. Multiple circRNAs have been showed to be generated by a single gene through alternative splicing [10]. Recently, next-generation sequencing has shown that circRNAs are widespread expression in mammals [11–13]. Moreover, circRNAs have been suggested to be stage-specific, cell- specific, and tissue-specific in the development of mammals [10,14]. Furthermore, most circRNAs are consisted of exons, while a few numbers of circRNAs are formed by the exon-intronic or intronic RNA in mammals [13]. It has been shown that the exonic circRNA may induce active DNA methylation through recruit specific protein, such as the FLI1 exonic circRNA recruits the methylcytosine dioxygenase TET1 to the promoter region of its parental gene [15].

Recently, several studies have showed that circRNAs could regulate the transcription of genes [16–18]. Specifically, circRNAs can be used as sponge for microRNAs (miRNAs). For example, Hall et al. showed that circ\_Lrp6 was the sponge for circ\_Lrp6 to counterbalance functions of the miRNA in functions of the miRNA in VSMCs [19]. Jost et al. produced the artificial circRNAs to inhibit the viral protein production by acting as sponges for the miRNA relevant in human disease [20]. Moreover, increasing evidence has shown that circRNAs are significantly enriched in mammalian brain and are related to physiological development of the brain [21]. Study has shown that the unique patterns of circRNAs across tissues and development stages seemed to reflect the reproductively capable individuals [22]. In addition, recent research indicated that circRNAs were closely connected to development in pig's brain. For instance, M.T. et al. identified large amounts of circRNAs in fetal brain of pig, and indicated that circRNA was significant impacted gilts' brain development [23]. These results indicated that circRNAs might play an indispensable role in multiple critical biological process in pigs. However, circRNAs has rarely been studies in onset of puberty of gilts.

In this study, the hypothalamus of pre-, in-, and post-pubertal gilts were utilized for RNA-seq analysis to explore the expression of circRNAs driven by the pubertal genes, and then investigate a circRNA-miRNA-gene network. It is hoped that the results of this study will provide insight into the potential function of circRNA in gilts during the onset of puberty and help in identifying circRNAs that play pivotal role in this process.

#### **2. Materials and Methods**

#### *2.1. Ethics Statement*

All animal experiments were approved by the Animal Care and Use Committee of the South China Agricultural University, Guangzhou, China (permit number: SCAU#2013-10), and conducted with the Regulations for the Administration of Affairs Concerning Experimental Animals (Ministry of Science and Technology, China; revised in June 2004).

#### *2.2. Animals*

All the experimental Landrace × Yorkshire crossbred gilts were monitored periodically for signs of puberty, including body weight, days of age, change of vulva, and the reaction to the boars, and the onset of puberty was identified by looking at this information. After that, three stages during the onset of puberty (pre-, in-, and post-puberty) were used. Thereinto, three gilts were designated as pre-puberty gilts (160 days old) without any pubertal signs (weight = 81.38 ± 2.40 kg); three gilts were selected as the in-pubertal gilts which exhibited first pubertal signs (weight = 110.00 ± 2.00 kg); three gilts (14 days old) were served as the post-pubertal gilts beyond the pubertal phase (weight = 122.82 ± 9.11 kg). After euthanasia, the hypothalamuses of gilts were immediately removed, placed in the liquid nitrogen, and then stored it at −80 ◦C until further use. In addition, refer to other researchers' experimental studies on circRNAs, three replicates in each group were used in this study [24].

#### *2.3. RNA Sequencing and the Transcriptome Assembly*

Total RNA from the pre-, in- and post-pubertal hypothalamuses of gilts was isolated with the Trizol agent (Invitrogen, Carlsbad, CA, USA). After quality testing of total RNA using the Agilent Bioanalyzer 2100 System (Agilent Technologies, Santa Clara, CA, USA), RNA samples with RNA integrity value of greater than 7.0 were left behind. Subsequently, rRNA was removed using the Epicentre Ribo-zero rRNA removal kits (Epicentre, Madison, WI, USA). Then we used the rRNA-depleted RNAs to compose double-stranded cDNA with the mRNA-Seq Sample Preparation Kit (Illumina, San Diego, CA, USA). Each sample was sequenced using the HiSeq 3000 sequencer according to the manufacturer's instructions for 5 µg cDNA and generated 150 bp paired-end reads. These raw reads were subjected to quality control using the Cutadapt software [25] to remove the 3′ adaptor-trimming, the low-quality reads which had >10% of unknown bases or >50% of the low-mass bases. Remaining reads after quality control were clean reads, which will be mapped onto the pig reference genome Sus scrofa11.1 by BWA [26] and bowtie2 [27] software.

#### *2.4. circRNA Identification and Data Analysis*

CIRI2 [28] was used to identify circRNA after BWA, and find\_circ [29] was used to identify circRNAs after bowtie2, which based on the reference genome alignment. We screened the number of unique junctions read to be at least 2, removed RNA with unclear breakpoints, and filtered out RNA with a length greater than 100 kb (genome length) as potential circRNA. Analysis included three replicates for each stage. Finally, circRNAs in pre-, in- and post-pubertal hypothalamuses of gilts were identified. Subsequently, the two software identified the intersection of circRNAs as the candidate circRNAs, and the annotation of circRNAs was proposed by CIRI for further study, which was based on the annotation file from Ensembl release 95. Furthermore, circRNAs originating from exons was used for further analyses. Length of circRNA is the sum of the lengths of the exons that form the circRNA. Besides, we obtained the circRNAs expression with BSJ reads, and used EBSeq package to calculate RPM [30]. In addition, the screening criteria for differential expression were FDR < 0.05, log2 − fold − change −≥1. Furthermore, the screening criteria for stage-specific circRNAs were as follows: circRNA detecting only in a unique stage was judged as stage-specific circRNA. The tissue-specific screening criteria were as follows: identification of circRNAs in this study were matched with the known pig' circRNAs through the starting and ending genomic positions of circRNAs, and the new circRNAs that were not matched in the database were regarded as the presumed tissue-specific circRNAs. In this study, the ggsignif package was used to perform statistical tests for differences between groups (Welch two-sample *t*-test).

#### *2.5. Pathway Analysis and circRNA-miRNA-mRNA Network Construction*

The parental genes of circRNA were used for Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis that the cutoff criterion was *p* < 0.05 and the results were conducted with KOBAS 3.0 online software ( http://kobas.cbi.pku.edu.cn/) [31]. In addition, the differentially expressed genes were screened under the condition of FDR < 0.05, log2 − fold − change −≥3. Furthermore, miRanda software [32] was used to predict circRNA-miRNA connections, miRanda match score ≥120. Then miRanda was as well as used to predict differentially expressed target genes of these miRNA, miRanda match score ≥200. Finally, cytoscape software [33] was used to draw a network interaction map between circRNA-miRNA-gene. Moreover, this analysis is based on the part of the transcript containing only exons.

#### *2.6. circRNA Validation by RT-qPCR*

We used RT and quantitative PCR (RT-qPCR) assays to validate the reliability of the high-throughput RNA sequencing data with divergent primers flanking the BSJ [9]. Used primeScript RT Reagent Kit (TaKaRa, Osaka, Japan) in a Mx3005P real-time PCR System (Stratagene, La Jolla, CA, USA) for qPCR according to the manufacturer's protocol.

Furthermore, the divergent primers of 5 circRNAs were designed to verify the accuracy of the RNA-seq. In order to normalize the expression of circRNAs, GAPDH was served as an internal reference. The PCR standard procedure were denaturation 94 ◦C (5 min), 40 cycles at 94 ◦C (10 s), 52 to 62 ◦C (15 s), and 72 ◦C (30 s). We used the 2−∆∆Ct method to analyze the RT-qPCR data. The pre-, in- and post-pubertal hypothalamuses were come from three gilts. Moreover, three biological replicates were carried out in each qRT-PCR. The Student's t test was used to assess the differences between any two pubertal groups of gilts, and the screening criteria for statistically significant were *p* < 0.05.

#### **3. Results**

#### *3.1. Identification of Hypothalamus-Derivced circRNAs during the Onset of Puberty*

Totally, 2582 circRNAs candidates were identified by CIRI2 and find\_circ software (Figure 1a, Supplementary Table S1). Respectively, 1619, 1273, and 1936 circRNAs were identified during the pre-, in- and post-puberty stages (Figure 1b). And the average circR-NAs expression were highest in in-puberty compared with other two stages (Figure 1c). Moreover, circRNAs split into three categories: 2388 exonic circRNAs, 65 intronic circRNAs and 129 intergenic region circRNAs. Furthermore, these 2582 circRNAs were derived from 1487 genes, of which 1461 genes identified as able to produce 2388 exonic circRNA and 57 genes identified as able to produce 65 intronic circRNA (Figure 1d). Furthermore, the 2388 exonic circRNAs were used for subsequent analyses.

**Figure 1.** Overview of the identified circRNAs by RNA-seq analyses in ovaries of gilts. (**a**) The circRNAs were identified by two algorithms (CIRI and find\_circ). (**b**) The number of unique and common circRNAs during pre-, in- and post-puberty. (**c**) Expression level of circRNAs in the pre-, in- and post-puberty stages. (**d**) The number of three types of circRNA and the number of corresponding parental genes. \*\*\* *p* < 0.001.

#### *3.2. Key Pathways of cirRNAs in Pubertal Transition*

To further explored the circRNA involved in pubertal hypothalamus, the KEGG analysis was used to perform the parental genes of all circRNAs (Supplementary Table S2). Notably, the functional pathways that were significantly overrepresented in pubertal hypothalamus included ras signaling pathway, insulin signaling pathway, ErbB signaling pathway, mTOR signaling pathway, neurotrophin signaling pathway, progesteronemediated oocyte maturation and oocyte meiosis signaling pathway (Figure 2a). For the ras signaling pathway, *NF1* drive to "circ 12:43516178-43526438" which was uniquely expressed in the in-puberty, and drive to "circ 12:43673069-43691261" which was expressed in pre- and post-puberty; *EXOC2* drive to "circ 7:229160-252991" which was uniquely expressed in in-puberty and post-puberty, and drive to "circ 7:306094-311618" which was expressed in pre-puberty (Figure 2b, Supplementary Table S2). For the insulin signaling pathway, *PPP1CB* drive to "circ 3:110442415-110455791" which was expressed in pre- and post-puberty, but not expressed in post-puberty (Figure 2b, Supplementary Table S2). For the ErbB signaling pathway, *MAP2K4* drive to "circ 12:56438413-56490424" which was expressed in pre- and post-puberty, and drive to "circ 12:56464310-56490424" which was uniquely expressed in post-puberty (Figure 2b, Supplementary Table S2). For the mTOR signaling pathway, *RICTOR* drive to "circ 16:24175667-24179590" which was expressed in pre- and post-puberty, and drive to "circ 16:24193110-24213797" which was expressed in inand post-puberty (Figure 2b, Supplementary Table S2). For the Neurotrophin signaling pathway, *PRDM4* drive to "circ 5:12644891-12657528" which was uniquely expressed in in-puberty (Figure 2b, Supplementary Table S2). For the progesterone-mediated oocyte maturation, *MAPK10* drive to "circ 8:132670606-132709741" which was expressed in inand post-puberty, drive to "circ 8:132704592-132733488" which was uniquely expressed in in-puberty (Figure 2b, Supplementary Table S2). For the oocyte meiosis signaling pathway, *PPP3CB* drive to "circ 14:76278036-76289229" which was uniquely expressed in-puberty (Figure 2b, Supplementary Table S2). The detailed information of these circRNA is shown in Supplementary Table S3.

#### *3.3. The Stage-Specific circRNAs in the Pubertal Transition*

To explore the expression changes of circRNA expressed in all stages, circRNAs were used for differentially expressed analysis except for the stage-specific circRNAs. 367, 168, and 575 putative stage-specific circRNAs were exclusively identified during the pre-, in- and post-puberty stages, respectively (Figure 1b). Furthermore, the expression levels of specific post-puberty circRNAs were significantly lower than specific pre-puberty circRNAs (*t*-test, *p*-value < 9.7 × 10−<sup>5</sup> ), as well as significantly lower than specific in-puberty circRNAs (*t*-test, *p*-value < 0.025) (Figure 3a). In addition, the KEGG enrichment with the parental genes of stage-specific circRNAs were showed in Figure 3b, neurotrophin signaling pathway and ErbB signaling pathway were enriched in the pre- and post-puberty stages; axon guidance pathway was enriched in the in-puberty stage; insulin signaling pathway as well as progesterone-mediated oocyte maturation were enriched in the post-puberty stage (Supplementary Table S4). Moreover, 111 genes generated stage-specific and non-specific circRNAs, of which 100 genes generated pre-specific and non-specific circRNAs, and 11 genes generated in-specific and non-specific circRNAs (Supplementary Table S5).

#### *3.4. Potentially Regulated Network of Differentially Expressed circRNAs*

In order to explore the putative functions of differentially expressed circRNAs, we identified a total of 22 differentially expressed circRNAs (Supplementary Table S6) and showed the expression in Figure 4a. Thereinto, 11 differentially up-regulated circRNAs and three differentially down-regulated circRNAs were identified in the pre- vs. in-puberty group; two differentially up-regulated circRNAs and four differentially down-regulated circRNA were identified in the pre- vs. post-puberty group; two differentially up-regulated circRNAs and five differentially down-regulated circRNA were identified in the in- vs. post-puberty group (Supplementary Table S6). Later, circRNAs mentioned above were used

to predict the binging sites, and the top three possible miRNA targets were listed in Table 1. After that, differentially expressed genes were used to predict the circRNA-miRNA-gene regulatory network (Figure 4b). Noticeably, we highlight *FSTL4*, *TSHR*, *SULT1E1*, *NPFFR2*, *RGCC*, and *ADAMTS4* genes, which were associated with puberty [34–39] (Supplementary Table S7). Interestingly, one of these differentially expressed circRNAs, "circ 11:4104218- 4118265" that interacted with *FSTL4* via ssc-miR-34a, was down-regulated in the prevs. in-puberty groups, as well as down-regulated in the pre- vs. post-puberty groups (Supplementary Tables S6 and S7). In addition, "circ 3:103726106-103773127" was upregulated in the pre- vs. in-puberty groups but down-regulated in the in- vs. post-puberty groups (Supplementary Table S6), and this circRNAs interacted with *SULT1E1* via sscmiR-4331-3p (Supplementary Table S7). According to this result, we found that some differential expression of circRNAs interacted with differentially expressed genes via miRNAs, whereafter they potentially regulate the onset of puberty.

**Figure 2.** The Key signaling pathway of CirRNAs in pubertal transition. (**a**) KEGG analysis of all identified circRNAs (*p* < 0.05). (**b**) Expression level of circRNAs involved in pubertal key pathways in pre-, in- and post-puberty.

−

**Figure 3.** Analysis results of stage-specific circRNAs. (**a**) Expression level of stage-specific circRNAs during Pre-, In-, Post-puberty. (**b**) KEGG analysis with parental genes of stage-specific circRNAs during pre-, in-, post-puberty (*p* < 0.05). \* *p* < 0.05, \*\*\* *p* < 0.001.

**Figure 4.** Analysis of differentially regulated circRNAs. (**a**) Heatmap of differentially expressed circRNAs in pubertal transition. (**b**) Differentially expressed circRNAs interact with differentially expressed genes via miRNAs and the differentially regulated status was show in Tables S6 and S7. The red circle represented circRNAs, the yellow triangle represented miRNAs, the green diamond represented genes.

*Genes***2021**, *12*, 84


**Table 1.** The differentially regulated circRNAs in this study of hypothalamus of gilts.

Positive strand; − Negative strand.

+

#### *3.5. The Hypothalamus-Specific circRNAs in Puberty*

The known circRNA of pig come from circAtlas 2.0, which was includes thousands of known circRNAs in nine porcine tissue types (brain, heart, kidney, liver, lung, skeletal muscle, spleen, testis, and retina) [40]. In order to investigate the specific circRNAs in hypothalamus tissue, 2518 circRNAs which overlapped in known circAtlas 2.0 were excluded, and leaving another 53 circRNAs as being putative hypothalamus-specific circR-NAs. Moreover, the 53 putative hypothalamus-specific circRNAs were significantly shorter than that of the known circRNAs (*t*-test, *p*-value = 0.00082) (Figure 5a). Furthermore, the expression of these putative hypothalamus-specific circRNAs was significantly lower than that of the known circRNAs during the onset of puberty (*t*-test, *p*-value < 0.001) (Figure 5b). In addition, the expression of these 53 hypothalamus-specific circRNAs was shown in Figure 5c, some of which were only expressed in one of three stages, and 5 genes were able to produce circRNAs at all stages without variation. Interestingly, the circRNA "circ 1:68439845-68491357" was only expressed in pre-puberty and "circ 1:68645763-68678869" was only expressed in post-puberty (Supplementary Table S8), both of which were derived from *GRIK2* associated with excitatory neurotransmission in the mammalian central nervous system [41]. The parental genes of these putative hypothalamus-specific circRNAs were enriched in "ssc04360: axon guidance" and "ssc04015: rap1 signaling pathway" pathways (Supplementary Table S9**)**. Meanwhile, these parental genes of hypothalamus-specific circRNAs were related to "GO 0048666: neuron development" and "GO 0030182 neuron differentiation" terms (Supplementary Table S9).

**Figure 5.** Analysis results of hypothalamus-specific circRNAs. (**a**) Length of tissue-specific circRNAs and known circRNAs. (**b**) Significant difference analysis between hypothalamus-specific circRNAs and known circRNAs. (**c**) The expression of hypothylamus-specific circRNAs in three stages, the blue box represents circRNAs and its parental genes without variation. \*\*\* *p* < 0.001.

#### *3.6. circRNAs in Pubertal Genes*

To further explore the function of circRNAs in puberty, the 20 pubertal genes were selected and investigated through reviewing the literature and databases by hand (Supplementary Table S10). Subsequently, we found that 16 circRNAs were driven by 6 pubertal genes (Supplementary Table S10). Thereinto, *ESR1* drive to circRNAs "circ 1:14416335- 14457143", which was differentially expressed during pubertal hypothalamus; *APP* drive to four circRNAs ("circ 13:189505179-189528484", "circ13:189505179-189544139", "circ 13:189523366-189528484" and "circ 13:189597995-189600156"); *NF1* drive to two circRNAs ("circ 12:43516178-43526438" and "circ 12:43673069-43691261"); *ENPP2* and *ARNT* respectively drive to circRNAs "circ 4:19360870-19367922" and circRNAs "4:98369520-98372553", which were uniquely expressed in pre-pubertal hypothalamus; *DICER1* drive to circRNAs "circ 1:14416335-14457143", which was always expressed during pubertal hypothalamus (Supplementary Table S10). These results will become the focus for further analysis.

#### *3.7. Validation of circRNAs by RT-qPCR*

In order to verify the accuracy of RNA-seq data, five circRNAs were randomly selected for validation experiments. Thereinto, circRNA "circ 1:11656690-11658857", "circ 1:87134227-87153004", "circ 2:141219340-141222143", "circ 3:26701499-87153004" were differential expression and circRNA "circ 6:9159375-91605991" was no differential expression. First, the divergent primers were used to this study, then the RT-qPCR were used to verified (see Section 2 for detail). Accordingly, the RT-qPCR assay results showed a similar tendency of expression with our RNA-seq data (Figure 6), further confirming the reliability of sequencing.

**Figure 6.** RT-qPCR validation of circRNAs. Five circRNAs were randomly selected for RT-qPCR validation, of which four circRNAs (**a**–**d**) were differential expression and one circRNA (**e**) was no significant difference. The primer informations were listed in Table S11. \* *p* < 0.05.

#### **4. Discussion**

Puberty is a complex physiological process regulated by multiple pathways. Hypothalamus, the main female puberty organs, directly mediate the pulsatile release of GnRH, which play crucial roles during onset of puberty [7,42]. Due to the delay in puberty, about 30% of gilts has been culled, which has obviously harmed the financial stake of the modern commercial farms [43]. CircRNAs were found to be function in many biological processes and widely expressed in mammal [44,45]. With the development of next-generation sequencing technology, research on the regulation of circRNA in animal puberty has made small progress year by year. However, puberty-associated circRNA expression remains unclear in gilts. Thus, our study focused on exploring the potential role of circRNAs in pubertal hypothalamus of gilts. A total of 2582 circRNAs were identified, of which 1110 were putative stage-specific circRNAs, 53 were putative hypothalamus-specific circRNAs and 22 were differentially expressed circRNAs.

For all identified circRNAs, 2388 exonic circRNAs were generated from 1461 parental genes (Figure 1d). This result may be explained by the fact that one gene could produce different circRNAs through different splicing forms. Furthermore, we uncovered several genes in some of the key pathways associated with the timing of puberty that could drive expression of differential circRNAs in differential pubertal stage. Neurofibromin 1 (*NF1*) is the main Ras regulator and plays an important role in neurons [46]. For the ras signaling pathway, "circ 12:43516178-43526438" driven by *NF1* was only expressed in in-puberty, while "circ 12:43516178-43526438", driven by *NF1*, was not expressed in in-puberty but expressed in pre- and post-puberty, indicating that these two *NF1*-driven circRNAs have different roles during the onset of puberty. When entering in-puberty, *NF1* might specifically splicing "circ 12:43516178-43526438" to play a crucial role. *MAP2K4* as the direct upstream activator of NH2-terminal kinase pathway, which plays an important role in regulating neuron survival and apoptosis in response to cerebral ischemia [47,48]. Similarly, for the ErbB signaling pathway, the two circRNAs driven by *MAP2K4* might have different effects. When entering pre-puberty, *MAP2K4* spliced "circ 12:56438413- 56490424", and after in-puberty, *MAP2K4* re-spliced "circ 12:56438413-56490424" and spliced "circ 12:56464310-56490424". In addition, previous report has shown that *MAPK10* could block the hypothalamic-pituitary-thyroid axis, thereby reducing energy expenditure and promoting obesity [49]. In this study, *MAPK10* drive to the specific expression of circRNA "circ 8:132670606-132709741" during in- and post-puberty in the progesteronemediated oocyte maturation signaling pathway. These results suggested that the circRNAs identified on the relevant signaling pathway might play a crucial role in pubertal transition.

Moreover, 367, 168, and 575 circRNAs were uniquely expressed in pre-, in-, postpuberty, respectively (Figure 3b), among which the uniquely expressed circRNAs in postpuberty had the highest average expression (Figure 3a). Importantly, parental genes of stage-specific circRNAs were involved in neurotrophin signaling pathway, ErbB signaling pathway, axon guidance pathway, insulin signaling pathway as well as progesteronemediated oocyte maturation; these processed have been reported to regulate the puberty [50–55]. In addition, circRNA-miRNA-gene network was used to predict the relationship between differential circRNA and differential genes. Interestingly, differentially expressed circRNAs "circ 11:4104218-4118265" that downregulated in the pre- vs. postpuberty groups interacted with *FSTL4* that downregulated in the pre- vs. post-puberty groups (Tables S6 and S7). This result suggested that circRNAs "circ 11:4104218-4118265" might be the sponge for ssc-miR-34, thereby promote the expression of *FSTL4*. The formation of circRNA is affected by alternative splicing and methylation [9]. Moreover, our previous studies have found that there had differential methylation pattern in genes during the onset of puberty in gilts [56]. It is possible, therefore, that the parental genes might be influenced by other epigenetic regulation to produce stage-specific exonic circRNA.

In addition, the circRNA "circ 7:19147980-19162903" that upregulated in the pre- vs. in-puberty groups interacted with *SULT1E1* and *NPFFR2* that up-regulated in the pre- vs. in-puberty groups (Supplementary Tables S6 and S7). This result suggested that circRNAs

"circ 7:19147980-19162903" might be the sponge for ssc-miR-4331-3p and ssc-miR-145-5p, thereby promote the expression of *SULT1E1* and *NPFFR2*, respectively. These results showed that there might be a competitive binding of miRNA by circRNA to affect gene expression in pubertal hypothalamus of gilts. Subsequently, we identified 53 hypothetical hypothalamic-specific circRNAs, which were involved in axon guidance and rap1 signaling pathway pathways, neuron development and neuron differentiation. Previous study reported that there were complex changes in the central nervous system in the pubertal hypothalamus [57]. Other study showed that proper axon guidance is essential for the migration of GnRH neurons in the brain [53]. Another study demonstrated that the rap1 plays a crucial role in mediating cAMP-induced MAPK activation of specific cell types [58]. It may be the case therefore that hypothalamus-specific circRNAs affect the development of neurons in hypothalamus and subsequently affect the onset of puberty. Interestingly, we found that two tissue-specific circRNA (circ 1:68439845-68491357, circ 1:68645763-68678869) were derived from the gene associated with excitatory neurotransmission in the mammalian central nervous system, suggesting that these two circRNAs might play a vital role in the specific differentiation of the hypothalamus.

Moreover, 10 pubertal genes-driven circRNAs were found in this study (Supplementary Table S10). Thereinto, *APP*, which implicated in neural development and reproduction [59], drive to four circRNAs ("circ 13:189505179-189528484", "circ 13:189505179- 189544139", "circ 13:189523366-189528484" and "circ 13:189597995-189600156"); *ESR1*, which associated with the timing of puberty [60], drive to circRNAs "circ 1:14416335- 14457143", of which "circ 1:14416335-14457143" was up-regulated in pre- vs. post-puberty group (Supplementary Table S6). Moreover, *DICER1*, which was essential for the normal development of the reproductive system [61], drive to circRNA "circ 7:116399251-116408577". In addition, these circRNAs was constituted in multiple exons except "circ 4:47041713- 47042593" and "circ 7:116399251-116408577". Our results provide new insight into the existence of hypothalamus-derived circRNAs in gilts. However, the underlying mechanism of these circRNAs during gilts' pubertal onset still requires carefully elucidation and verification.

#### **5. Conclusions**

During pubertal transition, 2582 circRNAs were identified hypothalamus, of which 1110 circRNAs were putative stagce-specific circRNAs, 53 circRNAs were putative hypothalamusspecific expressed circRNAs, and 22 circRNAs were significantly differentially expressed. These cirRNA were mostly enriched in neurotrophin signaling pathway, progesteronemediated oocyte maturation, ras signaling pathway, insulin signaling pathway, ErbB signaling pathway, mTOR signaling pathway and oocyte meiosis signaling pathway, which had been highly implicated in puberty. Moreover, 16 circRNAs were driven by six genes, i.e., *ESR1*, *NF1*, *APP*, *ENPP2*, *ARNT*, and *DICER1*. These preliminary results indicated circRNAs involved in the timing of puberty at the hypothalamus level in gilts, and provided useful information for the investigation of the molecular mechanism of pubertal onset in mammals.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2073-4 425/12/1/84/s1, Table S1: Information of all identified circRNAs. Table S2: The KEGG pathways enriched using parental genes of all CircRNAs. Table S3: Key pathways related to the timing of puberty in parental genes of circRNAs. Table S4: The KEGG pathways enriched using parental genes of stage-specific CircRNAs. Table S5: Parental Genes That Are Capable of Producing Stagespecific And Non-specific CircRNAss. Table S6: The differentially regulated circRNAs. Table S7: The differentially expressed genes associated with puberty our development of ovary. Table S8: Tissue-specific CircRNAs. Table S9: The KEGG pathways and GO term enriched using parental genes of tissue-specific CircRNAs. Table S10: CircRNAs in Pubertal Genes. Table S11: Primers used for qRT-PCR.

**Author Contributions:** Data curation: Q.L., X.P., N.L., and W.G.; funding acquisition: Y.C. and X.Y.; supervision: Y.C. and X.Y.; writing—original draft: Q.L. and X.P.; writing—review and editing: Y.C. and X.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key R&D Program of China (grant number: 2018YFD0501200), the Special Fund for Science and Technology Innovation of Guangdong Province (grant number: 2018B020203003), the National Natural Science Foundation of China (grant number: 31902131), the earmarked fund for the China Agriculture Research System (grant number: CARS-35), the National Natural Science Foundation of Guangdong Province (grant number: 2019A1515010676), the Youth Innovative fund of Guangdong Education Department (grant number: 2018KQNCX019), and China Postdoctoral Science Foundation (grant number: 2020M672556).

**Institutional Review Board Statement:** The animal study was reviewed and approved by The Animal Care and Use Committee of the South China Agricultural University, Guangzhou, China (permit number: SCAU#2013-10).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The datasets used in this study have been submitted to the European Nucleotide Archive under accession number PRJEB39729.

**Conflicts of Interest:** The authors claim that there is no conflict of interest.

#### **References**


*Article*

## **Profiling Novel Alternative Splicing within Multiple Tissues Provides Useful Insights into Porcine Genome Annotation**

#### **Wen Feng, Pengju Zhao, Xianrui Zheng, Zhengzheng Hu and Jianfeng Liu \***

Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China; wfeng@cau.edu.cn (W.F.); zhaopengju2014@gmail.com (P.Z.); zxr07sk1@163.com (X.Z.); zhengzheng0517@163.com (Z.H.)

**\*** Correspondence: liujf@cau.edu.cn; Tel.: +86-(010)-62731921

Received: 13 November 2020; Accepted: 24 November 2020; Published: 26 November 2020

**Abstract:** Alternative splicing (AS) is a process during gene expression that results in a single gene coding for different protein variants. AS contributes to transcriptome and proteome diversity. In order to characterize AS in pigs, genome-wide transcripts and AS events were detected using RNA sequencing of 34 different tissues in Duroc pigs. In total, 138,403 AS events and 29,270 expressed genes were identified. An alternative donor site was the most common AS form and accounted for 44% of the total AS events. The percentage of the other three AS forms (exon skipping, alternative acceptor site, and intron retention) was approximately 19%. The results showed that the most common AS events involving alternative donor sites could produce different transcripts or proteins that affect the biological processes. The expression of genes with tissue-specific AS events showed that gene functions were consistent with tissue functions. AS increased proteome diversity and resulted in novel proteins that gained or lost important functional domains. In summary, these findings extend porcine genome annotation and highlight roles that AS could play in determining tissue identity.

**Keywords:** alternative splicing; transcript; protein; domain; single nucleotide polymorphism

#### **1. Introduction**

Alternative splicing (AS) is a regulated process that generates multiple transcripts from a single gene. It therefore plays an important role in expanding protein diversity. AS affects 95% of multi-exonic genes in humans, and occurs in high proportions in other animals [1–3]. AS has four basic modes, including exon skipping, alternative donor sites, alternative acceptor sites, and intron retention (IR) [4]. In addition to these, multiple promoters [5] and multiple polyadenylation sites are two other mechanisms by which different mRNAs may be generated from the same gene [6].

Many alternatively spliced isoforms play important roles in the biological timing and development of tissues [7–10]. It is becoming clear that a large number of AS events contributes to the acquisition of adult tissue functions and identity in human tissue development [11]. It has been revealed that RNA mis-splicing underlies a growing number of human diseases [12]. Protein coding genes always have several alternatively spliced isoforms, emphasizing the importance of AS in gene expression [13]. Proteins generated by different AS also have the potential to gain or lose domains. This can substantially change the protein products, which results in similar or even opposite biological functions, which in turn can affect phenotypes [14]. Point mutations, such as single nucleotide polymorphisms (SNPs), have been shown to have substantial phenotypic variation and affect pre-mRNA splicing [15–20]. AS leads to the early termination of translation by the introducing premature stop codons [21]. Regulation of AS plays roles in multiple eukaryotic biological processes, including cell growth [22], chromatin modification [23], and tissue development [24].

As the best medical models for human diseases, pigs have similar anatomical and physiological structures as humans [25]. AS in pigs plays an essential role in the regulation of gene expression in genital tissues [26,27]. However, compared to the relevant studies in humans, it has still left a gap in our knowledge of the features of pig AS in multiple pig tissues. Various research groups have attempted to understand the role of AS in pigs by using expressed sequence tags (ESTs). A total of 1223 genes, with an average of 2.8 splicing variants per gene, have been detected among 16,540 unique genes using the EST data [28]. However, it has been reported that more AS could be detected using RNA sequencing (RNA-seq) than via EST technology in humans and plants [1,29]. However, to date, only one study has cataloged AS in a cross-breed at the pig genome-wide level, using RNA-seq [30]. In this previous study by Beiki et al., they mainly focus on transcripts and transcript structures. In our study, we further detect what affects the generation of AS, and how AS affects downstream progress.

In this study, a profile of AS events within multiple tissues of the Duroc pigs was constructed to extend the porcine AS genome annotation. Genome-wide transcript identification was performed in 34 normal tissues of the pig, using over 340 Gb of sequence data generated from 116 RNA sequencing libraries. In total, 2,486,239 known transcripts and 2,430,911 novel transcripts were identified. Tissue-specific expression patterns of novel and known transcripts were examined, and then the effects of the divergence of novel isoforms on their translation were analyzed. AS variations regulated by SNPs were also analyzed. The data reported here provide a valuable resource for enhancing the understanding and utilization of pig AS.

#### **2. Materials and Methods**

#### *2.1. Sample Collection and Sequencing*

PBMCs (peripheral blood mononuclear cells) and 33 different tissues were removed from nine unrelated, healthy Duroc pigs from Shenzhen Jinxinnong Technology Co., Ltd. (Shenzhen, China) in this study (Table S1). These nine Durocs consist of three infant pigs at the age of 3 days, three adult pigs at the age of 3 months, and three adult pigs at the age of 1 year old. For tissues collected from infant pigs, the samples are referred to tissuename\_I (e.g., Brain\_I), while from adult pigs they are called tissuename\_A (e.g., Brain\_A). The sample collection and treatment were fully conducted in strict accordance with the protocol approved by the Institutional Animal Care and Use Committee (IACUC) of China Agricultural University (no. DK2017/163) and Shenzhen Jinxinnong Technology Co., Ltd. The tissues and cells were then used for RNA and protein extraction.

#### *2.2. Separation of RNA from Tissues*

According to the standard protocols of Trizol method (Invitrogen, Carlsbad, CA, United States), the total RNA was isolated from mixture of equally unrelated pig pool tissues. RNA degradation and contamination were monitored on 1% agarose gels. The purity and contamination of total RNA was measured using a NanoPhotometer trophotometer (IMPLEN, Westlake Village, CA, United States) and Qubit RNA Assay Kit in a Qubit 2.0 Flurometer (Life Technologies, Carlsbad, CA, United States). The RNA's integrity was measured using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA). RNA samples that met the criteria of having an RNA integrity number (RIN) value of 7.0 or higher and a total RNA amount of 5 µg or higher were included and batched for RNA sequencing.

#### *2.3. Library Construction and RNA Sequencing*

The total RNA of the samples meeting the quality control (QC) criteria were ribosomal RNA-depleted and depleted QC, using the RiboMinus Eukaryote System v2 (Thermo Fisher Scientific, Waltham, MA, USA) and RNA 6000 Pico chip (Agilent Technologies) according to the manufacturer's protocol. RNA sequencing libraries were constructed using the NEBNext Ultra RNA Library Prep Kit (Illumina, Santiago, CA, United States), with 3 µg rRNA-depleted RNA, according to the manufacturer's

recommendation. RNA-seq library preparations were clustered on a cBot Cluster Generation System using a HiSeq PE Cluster Kit v4 cBot (Illumina), and sequenced using the Illumina Hiseq 2500 platform according to the manufacturer's instructions, with a data size of per sample of a minimum of 10 G clean reads (corresponding to 126 bp paired-end reads). The sequenced RNA-Seq raw data for 34 pig tissues is available from NCBI Sequences Read Archive with the BioProject number PRJNA392949.

#### *2.4. Read Alignment to the Reference Sus Scrofa 11.1 Genome*

The RNA-Seq raw data were trimmed based on the quality control for downstream analyses by following steps: BBmap [31] automatically detected the adapter sequence of reads and removed those reads containing Illumina adapters; the Q20, Q30, and GC content of the clean data were calculated by FASTQC [32] for quality control and filtering; homopolymer trimming to 3′ end of fragments and removal of the N bases of 3′ end were carried out by Fastx toolkit v0.014 [33]. The resulting sequences then were mapped to a reference *Sus scrofa* 11.1 genome by Hisat 0.1.6-beta 64-bit [34]. Ensemble *Sus scrofa* 11.1 version 91 [35] annotation was used as the transcript model reference for the alignment and splice junction findings, as well as for all protein-coding genes and isoform expression-level quantifications. In addition, StringTie 1.0.4 [36] calculated the FPKM (fragments per kilobase of exon model per million mapped reads (controlling for fragment length and sequencing depth)) values. A gene or transcript was defined as expressed when its expression was measured above 0.1 FPKM in all tissues [37].

#### *2.5. Alternative Splicing Events in Pig Transcriptome*

Asprofile (v1.0.4) software (https://ccb.jhu.edu/software/ASprofile/) was used to classify and count the AS events in each sample [38]. Asprofile counts twelve AS events types in total. In our analysis, only exon skipping, alternative donor site, alternative acceptor site, and intron retention were included. Tissue-specific AS events in pig transcriptome were detected by Splicing Express software [39].

#### *2.6. Comparison of Novel and Known Protein Domains*

Translated DNA sequences of novel transcripts were aligned to UniProt database by DIAMOND software (v4.4.0) (https://www.crystalimpact.com/diamond/) [39]. HMMER3 [40] was used to determine conserved protein domains for novel isoforms and their most similar known transcripts. The domain changes in novel and known proteins were carried out in a pairwise manner. Protein conformation was predicted by the Swiss model [41].

#### *2.7. SNP Compared with Alternative Splicing Variations*

Here we mainly used Pvaas software [42] to detect the single nucleotide1 variant (SNV) mutation associated with the novel AS, which is mainly to determine the correlation between the SNV and AS by Fisher exact test. Its reliability was assessed by the *p*-value after correcting by the Benjamini–Hochberg procedure. Further filtering was performed in order to ensure the accuracy of SNV: the minor mutation frequency should be greater than 5% (the minor mutation reads number accounts for more than 5% of AS); the number of mutant reads is ≥5; the *p*-value after correction <0.001; and the reads number of the new AS is ≥10.

#### *2.8. Known and Novel AS Validation Using Quantitative Reverse Transcription PCR*

RNAs from six pig tissues, including heart, liver, lung, kidney, brain, and lymph were transcribed into cDNA using PrimeScript RT reagent Kit with gDNA Eraser (TaKaRa Bio, Beijing, China) for PCR reaction. Two reverse-transcribed reaction systems were conducted. For reverse-transcribed reaction system I, a DNA-free master mix of each reaction was prepared, with 2.0 µL 5× gDNAEraser buffer, 1.0 µL gDNA Eraser, 7 µL RNase-free dH2O, and 1.0 µg total RNA. Reverse-transcribed reaction system I was standing for 5 min. Reverse-transcribed reaction system II consisted of 4.0 µL 5× PrimeScript

Buffer 2, 1.0 µL PrimeScript RT Enzyme Mix I, 1.0 µL RT Primer Mix, 4.0 µL RNase-free dH2O, and 10.0 µL reverse-transcribed reaction system I. Reverse-transcribed reaction system II reacting was performed at 37 ◦C for 15 min, followed by 85 ◦C for 5 s.

The primers for qPCR amplification were designed by primer-blast and confirmed by Oligo 7.0. The details about the primers are listed in Table S2. The selected internal reference gene was β-actin, which is commonly used in swine tissue as an internal reference gene. RT-qPCR was performed by the LightCycler 480 SYBR Green I Master kit (Roche Applied Sciences, Indianapolis, IN, United States). A total of 20 µL volumes consisted of 2 µL cDNA, 10 µL SYBR green master mix, 1 µL forward primer and 1 µL reverse primer (10 µmol/L), and 6 µL nuclease water. RT-qPCR conditions were as follows: pre-incubation for 5 min at 95 ◦C, and amplification for 40 cycles of 10 s at 95 ◦C, 20 s at 60 ◦C, and 20 s at 72 ◦C. After this, a high-resolution melting curve was generated, using the following protocol: 5 s at 95 ◦C and 1 min at 65 ◦C, followed by a gradual increase in temperature from 60 ◦C to 97 ◦C, using a ramp rate of 0.02 ◦C per second. Results were analyzed with the standard Light- Cycler 480 software, version 1.5 (Roche), using the 2-∆∆Ct method [43] to calculate the relative expression level of the target gene for each sample.

#### *2.9. Protein Extraction and Western Blotting*

Proteins were isolated from the heart and kidney of the Durocs. Fresh frozen tissue was thawed, cut into small pieces, and extensively washed with pre-cooled PBS (Gibco, Rockville, MD, USA). Tissues were suspended in 100 µL RIPA Lysis Buffer (Beyotime, Nanjing, China) and supplemented with 1% proteinase inhibitor (PMSF; Beyotime, Nanjing, China). The supernatant was collected and determined with a BCA (Bicinchoninic acid) assay kit (Thermo Fisher Scientific). The protein concentration of all samples were adjusted to 2 ug/ul with RIPA buffer (Beyotime, Nanjing, China). Samples containing 30 µg protein were separated on 9% sodium dodecyl sulfate–polyacrylamide gels (SDS-PAGE), and then electrotransferred onto a nitrocellulose membrane for 1 h using Bio-Rad Trans-Blot. The membrane was blocked with 5% non-fat milk in Tris-buffered saline (20 mM Tris-HCl, pH 7.6, 137 mM NaCl) containing 0.1% tris-buffered saline +Tween-20 (TBST, Gibco) for 30 min at room temperature, and incubated at 4 ◦C overnight with the following primary antibodies: Anti- Immunoglobulin Binding Protein 1(IGBP1) antibodies (ab70545, Abcam, Cambrige, UK). The membranes were washed three times with TBST for 10 min and incubated with goat anti-rabbit IgG (Heavy + Light) secondary antibody, an Horseradish peroxidase (HRP) conjugate (Thermo Fisher Scientific), for 1 h at room temperature. The antibody–antigen complexes were detected using Western blotting (WB) luminal reagent. The bands on the developed film were quantified with Quantity One v4.6.2 software (Bio-Rad, Hercules, CA, United States). The β-actin was used as a loading control for normalization.

#### **3. Results**

#### *3.1. Identification of Novel Transcripts in 34 Di*ff*erent Pig Tissues*

To discover and map novel transcripts, RNA-seq of 34 different pig tissues was performed. On average, 48.48 million reads per tissue were sequenced from 116 strand-specific and paired-end cDNA libraries (Table S3). Of these sequences, an average of 43.97 million reads (90.7%) per sample passed the strict quality control (QC). A total of 1495 million high-quality reads (376.7 Gb, 135-fold genome coverage) were aligned to the porcine reference genome (*Sus scrofa* 11.1), and 1223 million mapped fragments (310.1 Gb, 110.8-fold genome coverage) with an average alignment rate of 88.29% were recovered (Table S1). These mapped reads were then assembled and quantified as candidate transcripts using Stringtie software. This step produced a total of 2,486,239 transcripts from 29,270 genes across all tissues; of these, 60,578 transcripts (23,887 loci) were annotated in the pig Ensemble database (https://www.ensembl.org/), and 144,134 transcripts from 2424 non-coding genes, as well as 15,385 coding genes, were considered as potential AS. The remaining 2,281,529 novel

transcripts were from 26,493 loci without any annotated information. Details for each tissue are available in Table S4. These novel transcripts enhanced the pig genome annotation and increased the number of average transcripts for each gene in pigs.

#### *3.2. Classification of Alternative Splicing Types*

AS events were classified into 12 types, including the four basic types of AS events (exon skipping, alternative donor site, IR, and alternative acceptor site) (Figure 1A). A total of 138,403 AS events and 29,270 genes across all 34 tissues were detected. A gene had 4.73 AS events on an average. Alternative donor sites accounted for the most common AS event type (60,851; 44%), whereas the other three types shared similar percentages (exon skipping, 19%; alternative acceptor site, 19%; IR, 18%) (Figure 1A). The most significant increase was evident in the number of novel alternative donor sites, which accounted for 44% of all novel AS events (Figure 1B). The average length of IR was found to be the longest, with exon skipping the shortest, which is consistent with the definition of each AS type (Figure 1B).

**Figure 1.** Four modes of alternative splicing (AS). (**A**) The proportion of exon skipping, alternative donor sites, alternative acceptor sites, and intron retention. (**B**) Left *y*-axis: the number of known (blue bars) and novel (red bars) alternative splicing events. Right *y*-axis: average length of alternative splicing events (black polyline).

The novel AS events were categorized into two types, novel transcripts of known genes and novel transcripts of unknown genes. The number of novel transcripts in different tissues is shown in Tables S5 and S6. More novel transcripts of unknown genes than known genes were detected. The correlation between the number of previously annotated isoforms and novel isoforms of annotated genes was analyzed (Figure 2A). Transcripts with ensemble IDs were considered to be annotated: the greater the number of annotated isoforms per gene, the more the novel isoforms were detected (Figure 2A). Comparing to annotated isoforms, novel isoforms did not increase significantly, perhaps due to the limited isoforms per gene, which would not change even with improved annotation methods.

The number of AS events related to different genes varied. The number of AS events of the AHNAK gene was 391. However, thousands of genes only had a single AS event. To explain the variation in the number of AS events of the genes, a cluster analysis was performed. Genes with more than 10 AS events were grouped together; those with between 1 and 10 events were grouped together, and those with one AS event were grouped together. A Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of these three groups (Figure 2B, Table S7) determined that for genes with one AS event, a total of 24 pathways were significantly enriched, including the phosphatidylinositol signaling system (*p*-value = 6.30 × 10 −9 ; number of genes = 59), inositol phosphate metabolism (*p*-value = 2.80 × 10 −6 ; number of genes = 43), glycerophospholipid metabolism (*p*-value = 1.30 × 10 −4 ; number of genes = 49). These genes were mostly involved in lipid metabolism. Other substantial pathways were mainly related to cancer and reproduction. For genes with 1 < AS events ≤ 10, altogether 70 pathways were significantly enriched. Forty-six genes were involved in the measles pathway (*p*-value = 3.9 × 10 −8 ; number of genes = 46). Genes with 1 < AS events ≤ 10 were mostly involved in diseases and

signaling pathways, such as the forkhead box O (FoxO) and hypoxia-inducible factors (HIF)-1 signaling pathways. However, for genes with >10 AS events, only seven pathways were significantly enriched. These pathways were related to neuroactive and immune-related diseases. The second significant pathway of the seven was the olfactory transduction pathway (*p*-value = 4.9 × 10 −145 ), enriched with 639 genes.

≤ **Figure 2.** The difference in the number of alternative splicing-related genes. (**A**) The relationship between the number of annotated isoforms and novel isoforms per gene. (**B**) Top 10 enriched Kyoto Encyclopedia of Genes and Genome (KEGG) pathways of genes with different numbers of AS events (ASEs). Blue bars represent genes with one ASE, yellow bars represent genes with 1 < ASEs ≤ 10, and red bars represent genes with >10 ASEs.

#### *3.3. Tissue Specificity of Alternative Splicing in Di*ff*erent Tissues*

− − − were mainly related to cancer and reproduction. For genes with 1 < AS events ≤ 10, The tissue-specific AS events in the pig transcriptome were detected with Jekroll splicing express [39]. As reported in a previous study [37], transcripts with an expression level above 0.1 FPKM in only one tissue and an expression level less than 0.1 FPKM in all other tissues were defined to be tissue-specific transcripts. The number of tissue-specific transcripts varied substantially in different tissues, with the peripheral blood mononuclear cells possessing the most unique isoforms (1633) and the pancreas possessing the least (133; Table 1). Further examination of these tissue-specific transcripts showed that 86% of the genes that encode them were expressed in a single tissue at levels above 0.1 FPKM, revealing that the tissue specificity of these transcripts occurs at the transcriptional level. The remaining 14% of the tissue-specific transcripts had other isoforms expressed in other tissues, indicating that their tissue specificity likely stems from AS (Figure 3A).

<sup>−</sup> AS events ≤ 10 were mostly involved in diseases

−

**Figure 3.** (**A**) Single tissue-specific transcripts. The number of known and novel transcripts that are expressed at or above 0.1 FPKM (fragments per kilobase of exon model per million mapped reads) in only one tissue and less than FPKM in all others. Blue areas represent the transcripts and related genes expressed only in one tissue (gene expression dependent). Red areas represent the transcripts expressed in one tissue, with other isoforms present in other tissues (alternative splicing dependent). (**B**) The expression of a gene with different numbers of alternative splicing. The *y*-axis represents the ratio = log10\* (the highest FPKM/lowest FPKM) of a gene with multiple AS events, which means the greater the ratio, the greater the difference between the highest and lowest expression of a different transcript per gene. The plot represents this ratio for a gene.


**Table 1.** Distribution of tissue ASE types in different tissues.

To identify developmentally regulated changes in AS, StringTie [36] was used to quantify the expression of novel and known transcripts in different tissues (Table S8). The average level of expression of novel transcripts in the pancreas, liver, longissimus dorsi, spleen, prostate, adrenal gland, and breast was higher (FPKM > 10) than that of known transcripts. The difference in the expression level of transcripts between the highest expression transcript and the lowest expression transcript was compared (the ratio of major FPKM and minor FPKM) to the gene whose number of AS events was greater than 1. This revealed that more isoforms of a gene are associated with a greater difference in its highest and lowest expression (Figure 3B). A total of 116,439 genes were tissue-specific, of which 109,773 genes were newly detected. A total of 5,683 genes were expressed across all 34 tissues, suggesting that many genes are expressed in multiple tissues simultaneously (Table S9). These results are useful resources for improving the transcript annotations of the pig genome.

#### *3.4. Potential E*ff*ect of Novel Alternative Splicing on the Pig Proteome*

Translated DNA sequences of novel transcripts were aligned to the UniProt database using DIAMOND software [44]. To determine the effect of AS on the pig proteome, computationally predicted proteins encoded by the novel isoforms of known genes were compared to their respective known annotated proteins. A total of 1184 alternatively spliced transcripts were mapped to 361 new proteins (Table S10). These results, therefore, improved the annotation of unknown proteins in the pig genome.

HMMER3 (biosequence analysis using profile hidden Markov models) was used to identify conserved domains within the novel and known proteins, which were then compared in a pairwise manner [40]. A novel transcript of apolipoprotein B MRNA editing enzyme catalytic subunit 3B (APOBEC3B) in uterine tissue was mapped to a protein with the UniProt identifier D3U1S2\_PIG. Compared to a similar known transcript (UniProt identifier: F1SNY0\_PIG), the novel transcript (3072 bp) was 11 bp shorter. The annotated protein contained two APOBEC-like N-terminal domains, whereas the new isoform lacked an APOBEC-like N-terminal domain, potentially altering its function substantially (Figure 4A). The novel transcript had three coding exons, which was fewer than the known transcript (Figure 4B). It has been shown that DNA ligase 3 (LIG3) proteins activate leukemia via a transcriptional error [45]. The current results reveal a novel transcript (UniProt identifier: I3L9T2\_PIG) of LIG3 that had gained an additional domain (DNA ligase 3 BRCA1 C-terminal domain) not present in the known transcript (UniProt identifier: I3LN18\_PIG; Figure 4C). This novel transcript had one more coding exon than the known transcript (Figure 4D). The protein conformations of these four transcripts were predicted (Figure 4E,F), and the loss of the APOBEC3B domain and the gain of the LIG3 domain were clearly demonstrated. Although AS may cause changes in protein conformation, further experiments are needed to understand expression and function of the predicted proteins.

**Figure 4.** Examples of proteins encoded by known transcripts compared with those encoded by novel transcripts. (**A**) Apolipoprotein B MRNA editing enzyme catalytic subunit 3b (APOBEC3B)'s loss of an APOBEC-like N-terminal domain. (**B**) DNA ligase 3 (LIG3)'s gain of a DNA ligase 3 BRCA1 C-terminal (BRCT) domain. (**C**,**D**) The number of coding exons of APOBEC3B and LIG3. (**E**,**F**) The predicted protein conformations of APOBEC3B and LIG3.

#### *3.5. Conserved Alternative Splicing between Pig and Human*

Pigs are generally considered a promising medical model for human diseases, and pig orthologs for many human disease-associated genes have been identified [25]. To investigate the orthologous relationship between humans and pigs at the protein level, pig proteins were aligned with human proteins using Basic Local Alignment Search Tool (BLAST; https://blast.ncbi.nlm.nih.gov/), with the criteria of minimum length ≥ 50 bp and identity ≥ 80%. As such, 4168 (56.97%) pig proteins that are orthologous to human proteins were identified.

≥ ≥ A similar study reported a global analysis of AS transitions between human infants and adult hearts [46]. Infant and adult pig hearts were included in the current study. Similar to human infant hearts, IR and exon skipping occurred more frequently in the infant pig. In contrast, exon skipping occurred more frequently in the adult pig hearts, but least frequently in the adult human hearts. The specific AS genes that were differentially expressed in infant human and pig hearts were compared. Two genes, sperm-associated antigen 5 (SPAG5) and LIG3, were found to be shared between human and pig. Pig proteins of SPAG5 and LIG3 were aligned to their human proteins (Figure 5A). For two proteins, the identity between the pig and human was 100% and 99.5%. These conserved proteins indicate genome evolution and facilitate further exploration of the potential functional similarity between human and pig genomes.

–

–

– –

—

**Figure 5.** (**A**) Example of a homologous protein related to an AS gene between human and pig (downloaded from NCBI Multiple Sequence Alignment Viewer, Version 1.15.0). (**B**) Effect of single nucleotide polymorphism (SNP) mutation of synapse-associated protein 1 (SYAP1) on alternative splicing. (**C**) Protein conformation of canonical and novel isoforms as predicted by the Swiss model.

#### *3.6. E*ff*ect of Single Nucleotide Polymorphism on Pig Alternative Splicing*

Deep RNA sequencing can detect sequence variations associated with AS. AS events were grouped into three types: canonical splicing with a GT–AG intron (i.e., GT and AG splicing signals at donor and acceptor sites, respectively), semi-canonical splicing with GC–AG or AT–AC introns, and novel splicing without GT–AG introns [47]. Semi-canonical and novel splicing are also called aberrant splicing. SNPs occur in different regions of a gene. They change the expression of transcripts and therefore the protein abundance, especially when present in the exonic regions (Table S11). The changes in exons owning to AS may cause synonymous or non-synonymous mutation in the proteins. Some aberrant splicing of the exons led to non-synonymous mutations in the codons, and caused the premature termination of translation in some tissues (Table S12). Related SNPs, genes, and codons when an aberrant splicing occurred in more than 20 tissues are described in Table 2. Six novel splicing events were shown to cause non-synonymous mutations in genes (Table 2)—in particular, a missense SNV (exon8:c.G970T:p.D295Y) within the synapse-associated protein 1 (SYAP1) gene was found that generated a serine to threonine substitution (Figure 5B), leading to a change in the SYAP1 protein conformation (Figure 5C).



 **2.**Statistics of different SNP mutation types in different tissues.

**Table**

#### *3.7. Validation of Known and Novel Transcripts Using RT-qPCR and Western Blotting*

The reliability of transcript identification in this study was verified with RT-qPCR of one known and five novel transcripts from six tissues. Of these transcripts, TCONS\_01698330 and TCONS\_01698333 are known and novel transcripts of DDX17 gene, respectively. TCONS\_00393548, TCONS\_00028775, TCONS\_01245051, and TCONS\_01413087 are novel transcripts of MICU2, IGBP1, TCIRG1, and NFATC2IP genes, respectively. These six transcripts that were detected with RT-qPCR and RNA-seq showed consistent expression patterns (Figure 6A). Gene expression levels measured using these two methods were highly correlated (*R* <sup>2</sup> = 0.92). These results confirm the accuracy of the identification and characterization of pig transcripts.

**Figure 6.** (**A**) Validation of transcripts using RT-qPCR. Blue bars and red bars represent gene expression measured with RT-qPCR and RNA-seq, respectively. (**B**) The protein level of Immunoglobulin Binding Protein 1 (IGBP) using western blotting.

To determine whether AS was also reflected at the protein level, Western blotting was performed. Among the five genes used for RT-qPCR, only the antibody of IGBP1 (Abcam) for pig was available. β-actin was used as house-keeping gene. Fortunately, only one AS type and transcript of the IGBP1 gene was detected. We can see that the mRNA of IGBP1 is translated to similar changes at protein levels.

#### **4. Discussion**

In the present study, novel AS from 34 different tissues in nine Duroc pigs was profiled via deep RNA sequencing, providing important insights into porcine genome annotation. A total of 138,403 AS events and 29,270 expressed genes were detected, with 4.73 AS events per gene on average. The average number of AS events per gene has improved to 1.93 splicing variants in this study compared with the 2.8 splicing variants per gene by EST data [28,30]. Previous studies have shown that exon skipping, alternative donor sites, alternative acceptor splice sites, and IR are the four basic AS types. In the current study, the alternative donor site was the most common AS form, accounting for 44% of the total AS events. The percentage of the other three AS forms was approximately 19%. These results suggest that AS events involving different alternative splicing types can produce different transcripts or proteins that affect biological processes. In a previous study to detect AS events in nine pig tissues using Iso-seq and RNA-seq data, alternative acceptor splice site was shown to be the most prevalent AS type in each tissue [30]. The study by Beiki et al. [30] shared four tissues in common with the current study, including the brain, liver, spleen, and thymus. In the current study, exon skipping was the predominant splicing event in the brain, liver, and spleen (Table 1), which is consistent with a study where exon skipping is the most prevalent AS type in animals [4]. In addition, IR was the most frequent of AS events in the thymus, which is similar to that reported in humans [48]. Nine Duroc pigs, three infants and six adults, were used to detect AS events in the present study, whereas a single cross-bred pig was used for AS analysis in the previous study [30]. Compared to results in humans, we can see it is common that AS show different trends in different situation. Even we do not share the exact same results, both of our results make sense. The different breeds used in these two studies could have produced different results, but regardless of the splicing mechanism, this large increase in new, alternatively spliced transcripts improves our knowledge of the diversity of the pig proteome dramatically.

Genes with only one AS event were predominantly located in the pathway of lipid metabolism. Genes with more than 10 AS events were associated with pregnancy or disease-related signaling pathways. Whether genes belonging to a pathway are related in terms of their AS events requires further investigation. The functions of genes with tissue-specific AS were found to be nearly consistent with their tissue of expression. In addition, the expression of a transcript in different tissues could be different. For example, the expression of the gene glutathione peroxidase 4 (GPX4) ranges from 20 to over 400 FPKM, and its transcript expression in the adult testes was much higher than in other tissues (Figure S1). The GPX4 gene translates into specific enzymes in the testes of adult rats, which is an important structural protein in mature sperm [49,50].

Genes expressed in specific tissues are reported as tissue-enriched genes with tissue-specific functions in multicellular organisms (Figure S2). Results shows that 8482 well-annotated protein isoforms demonstrated tissue-specific expression characteristics. The expression level of 16,356 well-annotated protein isoforms in a specific tissue is at least five times higher FPKM compared to its second-highest expression level in the tissue. In the present study, tissues with complex biological processes usually had more tissue-enriched genes that are closely related to the function of the corresponding tissue. For example, the rhodopsin (RHO) gene, enriched in the retina, plays an important role in the deposition of retinal pigments [51]. Therefore, these specific, tissue-enriched genes can not only confirm the biological properties of known genes, but also predict the potential function of unknown genes in the pig genome. Besides these tissue-specific genes, there are still some genes that can be highly expressed in some functionally related tissues/organs, and these genes usually have similar functions. A total of 1318 well-annotated protein isoforms could be divided into seven

types (Figure S3). These results show that most of the enriched genes belong to the brain system (72.7%), followed by the muscle system, adrenal and thymic system (6.6%), and liver and gallbladder system (4.5%).

In general, the majority of novel splicing events resulted in new proteins. Novel alternatively spliced transcripts often contain domain losses or gains relative to their most similar known isoform, and even led to opposite functions [14]. In the current analysis, proteins encoded by novel transcripts were predicted in silico. The 1184 alternatively spliced transcripts mapped to 361 new proteins. Novel and known transcripts of APOBEC3B and LIG3 were compared as examples and showed changes in transcript length. The novel transcripts of APOBEC3B and LIG3 showed the loss and gain of a domain, respectively, as predicted by HMMER3 (Figure 4A,D). However, additional experiments are still necessary to confirm the expression and function of new proteins.

SNPs affect protein-binding sites (or cause mutations in the binding proteins themselves) and contribute to aberrant splicing. Some non-synonymous mutations also cause premature translational termination. A missense SNV (exon3:c.T376A:p.S126T) within SYAP1 was found that generated a serine to threonine substitution, which caused changes in protein conformation of these two isoforms, as predicted by the Swiss model.

#### **5. Conclusions**

In summary, the current analysis greatly expands the pig transcriptome to include more than 2,281,529 newly annotated transcripts. It was found that if genes had more AS events, the expression difference between the highest and lowest expressions would be greater. The expression of genes with tissue-specific AS events is consistent with their tissue-specific functions. This expansion of isoforms increases the known proteome diversity and results in novel proteins that gain and lose important functional domains. Point mutations, such as SNPs, in exons could lead to new AS. Taken together, these findings extend pig genome annotation and highlight the roles that AS plays in tissue identity in organisms.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/12/1405/s1, Figure S1: The specific AS events of the GPX4 gene in the testis of adult pigs. Figure S2: Numbers of tissue-enriched isoforms for known and unknown proteins, Figure S3: Numbers of group-enriched isoforms in different tissue groups, Table S1: Alignment information of 34 pig tissues, Table S2: The details about the primers, Table S3: Information of sequence data from 34 pig tissues, Table S4: Statistics of predicted transcripts and related genes (FPKM > 0.1), Table S5: Statistics of novel AS that are known genes, Table S6: Statistics of novel transcripts of unknown genes, Table S7: Enriched KEGG terms of genes with different number of AS events, Table S8: The average expression of novel and known transcripts (/FPKM), Table S9: Statistic of novel and known genes expressed in different number tissues, Table S10: Statistic of new proteins that AS were alligned successfully, Table S11: Specific SNP in different Tissues (Gene level), Table S12: Specific SNP indifferent Tissues (Exonic level).

**Author Contributions:** W.F. and J.L. designed the experiments. P.Z. performed population analyses. W.F. contributed to computational analyses. X.Z. collected samples and prepared them for sequencing. Z.H. designed the primers for RT-qPCR validation. W.F. wrote, J.L. revised the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundations of China (31661143013, 31790414), National Key R&D Program of China (2018YFD0501200),the Beijing Natural Science Foundation (65199011). The funders had no role in study design, data collection and analysis, and interpretation of decision to publish, nor the preparation of the manuscript.

**Acknowledgments:** We thank Shenzhen Jinxinnong Technology Co., Ltd. for offering nine Duroc pigs.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

1. Pan, Q.; Shai, O.; Lee, L.J.; Frey, B.J.; Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. *Nat. Genet.* **2008**, *40*, 1413–1415. [CrossRef] [PubMed]


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
