**Improvement of Disease Resistance in Livestock: Application of Immunogenomics and CRISPR**/**Cas9 Technology**

#### **Md. Aminul Islam 1,2,3, Sharmin Aqter Rony 4, Mohammad Bozlur Rahman 5, Mehmet Ulas Cinar 6,7, Julio Villena 2,8, Muhammad Jasim Uddin 1,9,\* and Haruki Kitazawa 2,3,\***


Received: 20 October 2020; Accepted: 26 November 2020; Published: 28 November 2020

**Simple Summary:** Disease resistance is the ability of animals to inhibit the growth of invading pathogens within the body, which is influenced by the interaction of the host immune system, host genetics, and the pathogens. Resistant animals can be produced by molecular breeding by introducing the genomic marker responsible for disease resistance or immunocompetence. Immunogenomics is an information science that enables the genome-scale investigation of host immune response to pathogenic infection thereby identification of the genomic marker for disease resistance. Once the genomic marker is determined, it could be implemented in producing disease resistance animals by applying the advanced reproductive biotechnology like genome editing. The technical ease and decreasing cost over time might enhance the application of genome editing techniques for producing disease resistance livestock.

**Abstract:** Disease occurrence adversely affects livestock production and animal welfare, and have an impact on both human health and public perception of food–animals production. Combined efforts from farmers, animal scientists, and veterinarians have been continuing to explore the effective disease control approaches for the production of safe animal-originated food. Implementing the immunogenomics, along with genome editing technology, has been considering as the key approach for safe food–animal production through the improvement of the host genetic resistance. Next-generation sequencing, as a cutting-edge technique, enables the production of high throughput transcriptomic and genomic profiles resulted from host-pathogen interactions. Immunogenomics combine the transcriptomic and genomic data that links to host resistance to disease, and predict the potential candidate genes and their genomic locations. Genome editing, which involves insertion, deletion, or modification of one or more genes in the DNA sequence, is advancing rapidly and

may be poised to become a commercial reality faster than it has thought. The clustered regulatory interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) [CRISPR/Cas9] system has recently emerged as a powerful tool for genome editing in agricultural food production including livestock disease management. CRISPR/Cas9 mediated insertion of *NRAMP1* gene for producing tuberculosis resistant cattle, and deletion of *CD163* gene for producing porcine reproductive and respiratory syndrome (PRRS) resistant pigs are two groundbreaking applications of genome editing in livestock. In this review, we have highlighted the technological advances of livestock immunogenomics and the principles and scopes of application of CRISPR/Cas9-mediated targeted genome editing in animal breeding for disease resistance.

**Keywords:** next generation sequencing; transcriptomics; bioinformatics; genome editing; disease resistance; livestock

#### **1. Introduction**

Foods from the livestock are a vital source of high-quality protein. Efficient production of animal-originated food is one of the major issues for global food safety, a long desire for a healthy population. The occurrence of infectious diseases in livestock affects not only the farm production, economics, and animal health-welfare; but it also increases the risk of zoonoses. Therefore, the outbreak of infectious diseases in food animals is a major threat to food safety and public health. The emergence of antimicrobial drug resistance along with the unavailability of effective vaccines and spontaneous genetic mutation of infectious pathogens are the major perils for breaking down the disease control strategies [1]. On the other hand, there is a growing consumer demand to produce organic animal food by securing animal welfare standards. Raising disease-free healthy livestock is the key to produce safe meat and milk. Good husbandry practices including optimum housing, ration, use of probiotics, biosecurity, and vaccination should be taken into account for the maintenance of health and production of farmed animals [2]. However, the improvement of host genetic resistance to disease may potentially contribute to the profitability through improved animal welfare and reduced antibiotic usage in livestock production [3].

Disease resistance is the host's ability to restrict or inhibit the establishment of infections and/or pathological processes of infectious pathogens [4]. Interactions among the host immune system and invading pathogens, and host genome determine the fate of infection and disease pathogenesis (Figure 1). Innate immunity is the frontline host defense against invading pathogens, and genes associated with the induction of innate immune responses are considered as the potential candidates for disease resistance [5]. Innate immune responses provide immediate and non-specific defense against a wider range of pathogens; therefore, innate immune response traits are the first choice to be incorporated in the disease-resistant breeding plans. Notably, the heritability of the innate immune response is medium to moderate [6]. The contribution of host genetic factors to disease occurrence is one of the fundamental issues in understanding disease pathogenesis and host resistance. The variation in the genetic resistance to disease is due largely to the variability in the host's immune response to infection. Thus, information on immunology and genomics together would better characterize the disease phenotype [4]. The application of genomic technologies to understand the immunology is termed as immunogenomics. Immunogenomics include integrated analysis of immunologic and genomic data on host response to infectious pathogens, and thereby contribute to identifying potential candidate genes for disease resistance in livestock [7]. The single nucleotide polymorphisms (SNPs) within the candidate gene associated with host immunocompetence to infection can subsequently be considered as a DNA marker for disease resistance.

Genome editing is a bio-engineering technology that involves insertion, deletion, or modification of a specific section of DNA sequence in the genome. The genome-editing technology encompasses nuclease enzyme for cutting DNA sequence, in addition to a targeting mechanism that guides the enzyme to a particular site on the genome [8]. The clustered regulatory interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) [CRISPR/Cas9] system is one of the latest genome-editing tools that has been widely used over the last couple of years (reviewed by Pellagatti et al. [9]). Genome editing in livestock has become a commercial reality due to the emergence of CRISPR/Cas9 technology. Advance use of CRISPR/Cas9 could facilitate the improvement of disease resistance in livestock through (a) enhancing the frequency of favorable trait-associated alleles, (b) introgression of favorable alleles from other breeds/species, and (c) by generating de novo favorable alleles [10]. However, the main challenge is the identification of genome-editing targets for a disease-resistant trait, which will require a combination of high-quality annotated livestock genomes, well-powered genome-wide association studies, and robust knowledge of molecular genetics of pathogen-host immune system interactions. In this review, we have systematically explained how the gift of immunogenomics could be applied to grasp the disease resistance candidate genes and use of the biological scissor, CRISPR/Cas9 technology, for insertion or deletion of desired genes in the host genome (Figure 1). The prospects, regulations, and social acceptance of genome editing technology concerning improving the livestock health and production are also discussed.

**Figure 1.** Schematic diagram showing the applications of immunogenomics and genome editing to produce disease-resistant animals. Severity and pathogenesis of disease depend on the interaction of the host immune system and the invading pathogens, where host genetic has potential influence. Immunogenomics employ the integrated bioinformatics tools to explore the influence of host genetic on the interaction between the host immune system and invading pathogens and subsequently identify the candidate gene (s) for disease resistance. The CRISPR/Cas9 mediated genome editing technology could subsequently be employed for targeted modification of the host genome to produce disease-resistant animals.

#### **2. Disease Resistance: The Phenotype**

Host-pathogen interactions result in either death or survival of the affected animal. Survivors either remain healthy and unaffected by the pathogens or experience a course of a disease that recover afterward. The immune system plays a crucial role in maintaining a balance between health and disease pathogenesis. In general, disease resistance demonstrates the host's ability to limit or prevent the replication of invading pathogens [11,12], and the relationship between the host and pathogens is better understood from an ecological point of view [13]. This concept includes several ways through which a host becomes comparatively more robust. For example, to display less possibility of infection, to have a reduced proliferation rate once infected, to possess a reduced rate of shedding or transmission [4]. It is not easy to measure the disease resistance trait, which is a major challenge for the investigator. It is possible that phenotype of disease resistance is relative rather than absolute and altered resistance impacts on the population as a whole because few attributes favor the individual host in certain cases, but other attributes (such as lower rate of transmission and infection) favor other population members. There is another associated phenotype called disease resilience, the capacity of the host to suppress the establishment or development of infection. Disease resilience is a physiological state in which an infected animal is capable of sustaining an acceptable degree of efficiency despite the burden of the pathogen [14]. The prospects and possibilities of using disease resilience in animal breeding programs are reviewed by Berghof et al. [15]. In addition, tolerance is another closely related phenotype that indicates the ability of the host to maintain the body homeostasis in the presence of replicating pathogens, with limited pathological consequences [12]. In a mixed population, the presence of asymptomatic and disease tolerant individuals may increase the genetic resistance of individual hosts, but there is a risk of increasing the prevalence of disease on the farm. Therefore, the disease resistance phenotype could consider to be incorporated into the host genetic improvement strategy.

Absolute quantification of the disease resistance phenotype under field condition is very difficult due to logistic and economic constraints, which is one of the major rate-limiting steps in animal breeding for disease resistance. Therefore, targeting immunocompetence trait is the indirect approach for improving disease resistance in livestock, and thereby producing safe milk and meat in the age of antibiotic resistance. Immunocompetent animals possess higher metabolic function and resilience to a wider range of infectious diseases, thereby improving the production performance in terms of quality and quantity. The host resistance to infectious diseases in livestock could be enhanced by incorporating resistance genes in the host genetics. The most economically important diseases in different livestock species could be considered first in the breeding goals for more sustainable production of disease-resistant animals [3]. Some infectious diseases of economic importance are mastitis, foot and mouth disease (FMD), tuberculosis, John's disease, and tick-borne diseases in cattle, Peste des petits ruminant (PPR) in goats, porcine reproductive and respiratory syndrome (PRRS), African swine fever in pigs, and gastrointestinal (GI) parasite infection in cattle and sheep [16].

Breeds of different animal species have a great influence on the innate resistance to infectious disease. Native breeds usually exert a higher degree of resistance to pathogens than the high-yielding exotic breed of the same species, which is believed to be due to the baseline immune competence, mainly innate immunity which could be transmitted down the progeny via genetic information and colostrum. Indigenous breeds of cattle (*Bos indicus*) were found to have a higher resistance to tick infestation and tick-borne diseases compared to high-yielding crossbred cattle as evidenced by higher hemolytic complement activity [17]. Native breed's resilience to local pathogens could be gained through evolution and adaptability when raised in the extensive farming system [18]. On contrary, intensive farming may weaken hosts' innate immunity. The relative difference in immunocompetence to infectious disease was reported among porcine breeds [19,20]. The genetic components associated with disease resistance should incorporate into the breeding program. Advances in tools and techniques for studying immunology and genetics of livestock and data analyses by immunogenomics enabled animal scientists to implement the molecular breeding technique including marker-assisted selection, genomic selection, and targeted genome editing for sustainable livestock production.

#### **3. Advances of Immunogenomics**

Immunogenomics is an information science that deals with big data from immunology and genomics. Immunogenomics integrates the molecular interactions among the host genome, immune system, and the invading pathogens. Understanding the function of the mammalian immune system has been broadening since the intersection of immunology and high-throughput sequencing technologies. Immunogenomics combines the transcriptome, DNA variants, polymorphisms/SNPs, and quantitative trait loci (QTL) mapping data resulted from host-pathogen interactions through a series of integrated

bioinformatics [7]. Thus, immunogenomics have been considered as an efficient omics tool for identifying disease resistance genes or DNA markers [7]. Transcriptome profile provides an overview of expressed genes associated with particular phenotypes and is used as guidelines for subsequent analyses by proteomics, metabolomics, epigenomics, and other omics approaches. Since transcriptome can explore the relationship between genotype and phenotype of an organism, it has been considered as the most informative assay to start with for the functional genomics. Microarray and next-generation sequencing (NGS) are the cutting-edge technologies for transcriptomics or immunogenomics in many areas of the life sciences [21]. In parallel to the revolutionary progress of NGS technology, some advanced tools for cell biology studies are equally impactful. For example, flow cytometry and mass spectrometry/cytometry can provide a better picture of the phenotypic diversity among immune cell subsets [22]. Besides, the development of the induced pluripotent stem cell (iPSC) model from peripheral T cells has opened up another exciting avenue of immunology research [23].

#### *3.1. Sequencing Technology*

As a cutting-edge tool for immunogenomics, the NGS platform has rapidly evolved over the past 15 years, and exponentially increased amounts of sequence data generated per instrument run at ever-decreasing costs [24]. NGS-based RNA sequencing (RNA-Seq) can quantify all sorts of RNA species including messenger RNA (mRNA), microRNA, small interfering RNA, and long non-coding RNA, which enables the researcher to discover novel RNA forms and variants. The NGS methodology has been extended through the development of single-cell RNA sequencing (scRNA-seq) and in-situ RNA sequencing [24–26]. A single cell is the smallest structural and functional unit of a living organism, as such a particular cell represents a specific unit of molecular coding across the DNA, RNA, and protein expression [27]. Therefore, the omics-based investigations are highly expected to be carried out at the single-cell level for more precise results. The scRNA-seq has tremendous progress in the last couple of years owing to overcoming the difficulties of isolation of a single cell population [28]. The technological progress and application of the single-cell sequencing platform about cancer research have been well-summarized in Huang et al. [28]. The scRNA-seq has recently advanced along with the development of whole-genome sequencing such as scRNA methylation sequencing and single-cell assay for transpose-accessible chromatin sequencing (Single-cell ATAC-seq) [28]. Recently, the direct RNA sequencing using high throughput nanopore sequencing technology has emerged as the latest state-of-the-art RNA-Seq technique [29]. However, the single-molecule, long-read sequencing-based NGS technology is coming soon which may replace the ongoing platforms [30].

Currently, Illumina and Thermo Fisher's (Ion Torrent) standard and commonly used NGS platforms are based on short-read sequencing technologies [31]. To build sequencing ready libraries with an average length of 300 bp (ranging from 200–700 bp), short-read RNA-seq requires either fragmentation followed by reverse transcription or full-length cDNA synthesis followed by fragmentation. Since most mammalian RNA transcripts are 1–2 kb in length [32], getting complete RNA sequencing information relies on agreement with the annotated whole transcriptomic sequence or de novo transcriptome assembly approaches. In addition, genes have more than a transcriptional isoform that confronts the performance of the NGS in accurate gene expression quantification. The mRNA molecules transcribed from the same locus are referred to as transcriptional isoforms because mRNA can be produced from different transcriptional start sites, terminated at different polyadenylation sites, or as a result of alternative splicing [33]. Due to the limitations of short-read sequencing, it is difficult to assemble all expressed isoform for each gene and quantify the expression of all the isoforms with currently available bioinformatics tools [34,35]. To address these limitations, two commercial companies have recently launched the single-molecule, long-read sequencing technology-based NGS platform: Pacific Bioinformatics (PacBio), and Oxford Nanopore Technologies (ONT). With these technologies, the read length achieved (~15 kb for PacBio and >30 kb for ONT) exceeds the length of most mammalian transcripts. Combined with the benefit of full-length cDNA synthesis [36]. Especially SMARer (Switching Mechanism at RNA Termini) technology, long-read technologies are commercially available

from Clontech (Mountain view, CA, USA), makes full-length mRNA sequencing possible with more precise transcriptomic studies. While a very powerful approach to unraveling the full spectrum of gene expression profiles is represented by long-read technologies, the relatively high costs of these technologies have prevented the broader spectrum of use. Oikinomopoulos et al. [30] reviewed the recent scientific and methodological developments in transcriptome profiling using the newly introduced single-molecule, long-read sequencing technology.

#### *3.2. Bioinformatics Tools*

While in the NGS platform, sequencing has become relatively straightforward with overcoming the technological limitations, the processing of upstream samples and downstream data analyses are still labor-intensive. To make a meaningful biological analysis, big data obtained from microarray and RNA-seq experiments requires high-power statistics and rigorous bioinformatics. Although most of the RNA-Seq data analysis algorithms can be run either in a Unix environment or inside the R/Bioconductor environment [37] from a command-line interface, some web-based menu-driven interfaces (e.g., Galaxy (www.usegalaxy.org), Geneious (www.geneious.com)) also support NGS data analysis. For the identification of DNA variants/SNPs associated with features like disease resistance, NGS sequence reading has been an area of rapid development [38]. Data from DNA variants (SNPs) or genome-wide association studies can be incorporated to explore the association between genomic architecture and the traits of interest once the potential candidate genes have become available from transcriptome analysis. The Animal QTL database (QTLdb) has been developed to bridge genotypes and phenotypes by repositing all publicly accessible QTLs and SNP/gene association data on animal species [39]. Approximately 191,422 QTLs have been curated to date, including 142,261 for cattle, 30,580 for pigs, 12,246 for chicken, 3305 for sheep and 2446 for horses (https://www.animalgenome.org/cgi-bin/QTLdb/index, Release 41, 26 April 2020). In order to analyze, annotate and visualize such genomic data for complex phenotype, such as immune repertoires, including disease resistance, bioinformatics software tools are essential components [40].

The keystone of immunogenomics is data integration. Accordingly, the scientific community can benefit from data sharing strategies that facilitate the integration of datasets among research groups. However, reliable methods for data integration are needed and require a broad range of expertise such as in mathematical and statistical models, computational methods, visualization strategies, and deep understanding of complex phenotypes. The commonly used tools and databases for immunogenomics of animals are summarized in Table 1.


**Table 1.** Bioinformatic tools or database commonly used for transcriptomics or immunogenomics studies aiming to identify the key regulatory genes in animals.


**Table 1.** *Cont*.

While in silico genomics tools and databases for the human genome have been developed well, those for the genomes of livestock are still growing. Thus, an ID conversion tool, such as bioDBnet [46], is required to convert gene ID to human orthologous identifiers to perform the downstream functional analysis using human database. BovineMine, for instance, is a useful database and web portal for data mining for the annotation of bovine genes and genomes [45]. Standardize laboratory workflows, raw data formats, experimental designs, and methods of biostatistical analyses are, however, required. The technological shortcomings of immunogenomics are being resolved and are likely to be put in the life sciences among the largest 'big data' enterprises [51]. The Functional Annotation of ANimal Genomes (FAANG www.faang.org) was developed in this regard to support the standard protocol for core research, data research and meta-data analysis of standards for swine immunogenomics studies [52]. In addition, the 1000 Bull Genome Project has provided the bovine research community with a huge volume of data on bovine variants that will be useful for GWAS and the identification of causal mutations (http://1000bullgenomes.com). These initiatives pave the way for a systemic incorporation of the findings of bovine immunogenomics studies and for making them available online.

#### **4. Applications of Immunogenomics in Livestock Disease Management**

Sustainable management of livestock diseases requires breeding techniques that protect the environment, animal welfare, and public health, as well as providing adequate financial rewards for farmers. Several efforts are in progress to develop a disease-resistant stock through molecular breeding. By dissecting the genetic makeup, a typical first stage of molecular breeding by marker-assisted selection, genomic selection, or genome editing is to determine the extent in genetic variation on the individual trait. Simple understanding of host immunology and genetics better characterize the disease-resistant phenotype. The advent of immunogenomics provides more accurate identification of candidate biomarkers for disease resistance, which makes it easier to enhance disease resistance by genome editing techniques.

In a real-life situation, the direct estimation of disease resistance phenotype is very difficult because a precise animal identification and disease regression method and infection challenge routines are required [53]. The disease resistance phenotype is typically quantified in the direct disease model by calculating the magnitude of the infection, such as burden of the inside-host pathogen (e.g., viral load), and these are technically difficult to incorporate in the intensive commercial production system. In addition, the direct measurement techniques target the host's resistance to a particular infection. However, when introduced by molecular breeding, it can result in increased susceptibility to other infections [54]. Appropriate indirect methods are therefore highly desirable for estimating disease resistance. One method may be to estimate the phenotype of disease resistance by defining the genomic marker associated with the immunocompetence of the host produced from vaccination that is inheritable and linked to improved health and production performance [55,56]. Vaccination does not cause disease, but it allows memory T-lymphocytes, B-lymphocytes, and antibodies to be formed by the host immune system, enabling hosts to defend the subsequent infection [57]. The most suitable candidates for disease resistance traits are possibly genes regulating the first few hours of the host's response during innate immunity to infection or vaccination [5,58]. Due to the rapid onset and wider range of defense, innate immune traits have the potentials to be used in the selection of livestock for disease resistance [53], and the innate immune traits display significant genetic variation among the breeds of livestock species [6,59]. Thus, as a possible indirect indicator of disease resistance, enhancing innate immunocompetence is of great importance.

To identify the potential biomarkers for immunocompetence as indirect measures of disease resistance in livestock, the study population could be divided into two contrast phenotypes correlated to disease resistance (Figure 2). The contrast phenotypes may be achieved by either taking animals with the extreme immune response phenotype following vaccination (extremely high antibody titre vs. extremely low antibody titre) or vaccinated vs. control animals. In order to obtain adequate statistical power from the experiment, the use of an optimum number of animals as biological replicates for each group should be considered. Schurch et al. [60] proposed that for the detection of substantially differentially expressed gene between two contrast phenotypes using RNA sequencing technology, at least six biological replicates should be used. The single population of target primary cells should be separated once the experimental group is fixed in order to prevent cell-type-specific gene expression. Then, a pure single-cell population should be subjected to the extraction of total RNA and genomic DNA separately. To scan the full spectrum of gene expression between the contrast phenotypes, the quality-assured total RNA samples can be subjected to holistic transcriptome profiling by RNA-seq. The RNA samples could be used for profiling for proteomics and metabolomics. On the other hand, SNP sequencing and genotyping, quantitative trait loci (QTL) mapping, and genome-wide association study (GWAS) may also be subjected to DNA samples, and epigenomics analysis targeting the phenotype of disease resistance. Rigorous integrated bioinformatics framework on all sets of omics data together helps us to identify the molecular biomarkers for intended immunocompetence trait. Those could be suggested as the targets for CRISPR/Cas9 mediated genome editing technology after functional validation of the identified biomarkers in the independent population.

Applications of immunogenomics in human disease particularly in cancer have much progress such as the in-silico prediction of human leukocyte antigen (HLA) gene has been accelerated through a combination of high-throughput sequencing technology and specialized computational approaches [61]. Immunogenomics have been employed to explore the porcine resistance to Gram-negative bacterial infection in porcine [7] and gastrointestinal nematode infection in ruminant [62]. Knowledge of immunogenomics has been applied in the identification of several immune response genes in livestock based on their association with resistance or susceptibility in several diseases and their proven function in disease pathogenesis [16]. Several GWAS have been performed aiming to identify the QTLs or SNP profiles associated with resistance or susceptibility to bovine mastitis [63], and milk somatic cell count as an indicator of mastitis [64,65]. Many immune response genes are associated with mastitis resistance including cytokines IL-4, IL-8, IL-13, and IL-17 [66,67]. Nevertheless, further fine-tuning of

the genomics tools and in silico omics software recourses in near future would enable the identification of disease resistance candidate biomarkers in a more precise way.

**Figure 2.** A working pipeline of (in vivo, in vitro, and in silico) immunogenomics for identification of disease resistance candidate gene/marker as prospective targets for genome editing. Isolation of single-cell population of a target from both phenotypic groups followed by RNA and DNA extraction separately. The RNA samples could be employed for proteomics and metabolomics profiling. On the other hand, DNA samples could also be subjected to single nucleotide polymorphisms (NSP) sequencing and genotyping, quantitative trait loci (QTL) mapping, and genome-wide association study, and epigenomics study targeting the disease resistance phenotype. Rigorous integrated bioinformatics application on all sets of omics data together enables us to identify the molecular biomarker for the target immunocompetence trait. After functional validation of the identified biomarkers in the independent population, those could be recommended as the targets for CRISPR/Cas9 mediated genome editing technology.

#### **5. Advances of Genome Editing Technology**

The premier seed for the possibility of a desired improvement of the mammalian genome has sown by Palmiter and Brinster in 1988 [68]. The active insertion into the mouse embryos by pronuclear microinjection of a foreign DNA fragment, a growth hormone gene called metallothionein-I, resulted in the rapid growth of the animals as targeted [68]. This technique of genome engineering based on cells was restricted to injecting plasmids or gene fragments into the embryos pronucleus. Subsequent advances in genome engineering have been achieved by the introduction of transposons or retroviral vectors [69], followed by the development of homing endonuclease (HEs), natural meganuclease capable of introducing double-strand breaks (DSBs) at 14–40 bp target sites [70]. Nevertheless, in vivo application of HE-based genome modification has been limited due to their off-target cutting propensity. In general, modern genome editing relies on DNA insertion, deletion, or substitution in the genome of a living organism using programmable nuclease-based editors (Figure 3). Zinc finger nuclease (ZFNs) [71,72] and transcription activator-like effector nuclease (TALENs) [73] were the most widely used genome editors until recently. Zinc fingers (ZFs) are among the most well-characterized DNA-binding protein domains found in the nature that have enhanced the programmed modification

repertoire of enable any target genome to be accurately cut and repair [72]. A second group of naturally occurring proteins containing a DNA-binding domain and formed by proteobacteria of the genus Xanthomonas is transcription activator-like effector (TALE) [73]. A pair of ZFNs must bind regions flanking the target locus for genome editing to form a FokI dimer, which is required to induce DSBs (Figure 3A) [74]. Similarly, TALENs are also modular proteins that contain two domains: a customizable DNA-binding domain (TALE) and the FokI nuclease domain. The TALE-binding DNA sequence is cut by dimerization and thus produces DSBs in a similar manner to ZFNs (Figure 3B) [75]. However, both ZFNs and TALENs are limited by targeting multiple sites in the same genome. The comprehensive interaction of ZFNs with protein-DNA and the highly repetitive nature of TALENs warrant the advent of new instruments.

**Figure 3.** Nuclease-based genome editors. (**A**). Zinc Finger Nuclease (**B**). Transcription-Activator Like Effector Nuclease (TALEN). (**C**). Schematic diagram showing genome editing using CRISPR/Cas9 system. The Cas9 induces DNA double-strand break (DSB) which are repaired either by imperfect nonhomologous end-joining (NHEJ) to generate insertion or deletion (indels) or if a repair is provided, by homology-directed repair (HDR) (Adapted from Moore et al. [71], and Pellagatti et al. [9].

The clustered regulatory interspaced short palindromic repeats (CRISPR)/Cas9 system, which was built from an inherent antiviral mechanism found in bacteria [76], is the latest addition to the genome toolbox. In comparison to a classical genetic modification that involves moving genes from species to another, the CRISPR/Cas9 system relies on the use of molecular 'scissors' to introduce changes in existing DNA sequences [75,77]. The CRISPR/Cas9 system uses a single guided RNA (sgRNA) to help the Cas9 nuclease to classify the particular genomic sequence, unlike the ZFNs and TALENs, which use proteins to recognize specific sequences in the genome (Figure 3C). In the presence of sgRNA complementary sequence and the Porto-spacer adjacent motif (PAM) sequence [9], the Cas9 protein binds to the sgRNA scaffold and generates a DSB. The DSB activates the machinery for the repair of endogenous cellular DNA that catalyzes non-homologous end joining (NHEJ) and homology direct repair (HDR) (Figure 3C). The NEHJ pathway is preferably used by most cell types, which is possibly an error-prone mechanism that generally results in minor insertions or deletions at the site of repair. NEHJ also produces a mutation and induces encoded protein fragmentation or functional gene knockout by producing DNA break in the coding sequence of a gene. On the other hand, the HDR pathway can be activated through flooding the target cell with a DNA repair template, which allows the implementation of specific sequence changes proximal to the cut site ranging from single base changes to the insertion of transgenes. In order to make minor change, a synthetic single-stranded oligodeoxynucleotide is used, whereas larger modification requires the plasmid/dsDNA template. Finally, two simultaneous breaks are produced and transcriptional profiles are altered either by

removing regulatory elements or by deleting exons. Protein domains are subsequently deleted, leaving the remaining reading frame intact [78]. One of the CRISPR/Cas9 system's key advantage is that the Cas9 nuclease is not covalently fused to a DSB, so it is possible to use same protein to attack multiple target sites by combining it with a different sgRNA package. Besides, most of the necessary reagents can be made in the molecular biology laboratory, while Cas9 nuclease and sgRNA molecules can be purchased commercially. The CRISPR/Cas9 system has made the genome-editing technique a practical reality in recent years due to its methodological simplicity, performance, precision, flexibility, and a greater degree of accuracy.

A variety of techniques are available to produce the genetically modified animals depending on the species and cell type used to deliver the Cas9 genome-editor (reviewed by Harwood et al. [79]). The technique of sperm mediated gene transfer (SMGT) is used to deliver the genome-edited reagents into a zygote. Manipulation of spermatogonial stem cells (SSCs) or primordial germ cells (PGCs) is often used for the development of genome-edited organisms. For the development of genome-manipulated animals, the pronuclear and intracytoplasmic injection of genome editors into zygotes accompanied by either direct transfer to surrogates or in vitro maturation and embryo transfer could be applied. In addition, somatic cell nuclear transfer (SCNT) is another approach that enables precise edits to be selected in somatic cells before nuclear transfer to surrogates or in vitro maturation. The injection of edited embryonic or iPSC into blastocyst may also be used for the development of chimeric animals examined by Harwood et al. [79]. Like the genome-editor tool use, the techniques follow is also important to achieve the intended efficiency and precision in generating genome-edited animal species. The introduction of genome editing in the research animal model for the treatment of human disease has also made tremendous progress, in addition to the development of CRISPR/Cas9 techniques in animal disease management. Recent reviews have summarized the methodological advances of CRISPR/Cas-9 mediated genome editing in large animals (pigs, monkeys, dogs, rabbits, mice, rats) with a focus on the creation of model animals for studying human diseases [80,81].

#### **6. Applications of Genome Editing in Livestock Disease Management**

The CRISPR/Cas9-based genome editing has been applied in livestock for several specific (but not limited to) purposes: (a) for inactivation or alteration of expression of targeted genes in the model animals to confirm their functions (b) for producing research animals for studying human disease pathogenesis in controlled setup and evaluate potential therapies, (c) and producing genetically modified animals for industrial, pharmaceutical, and biotechnological implications [82]. Nevertheless, the success of the gene-editing system in controlling animal diseases would be affected by several factors. For instance, the proportion of gene-edited animals and how they are distributed within and across the population. According to epidemiological theory, a certain proportion of gene-edited animals required to achieve herd immunity, and to prevent disease transmission within the population [83]. Disease-specific epidemiological modeling could provide information on the required number of genome-edited animals for preventing certain disease/infectious agents. Such modeling should consider the influence of population structure, demographic characteristics, diverse environmental factors, and management strategies that affect the disease transmission dynamics to estimate the size of the population for genome editing. The latest advance in genome editing with programmable nucleases, such as CRISPR/Cas9, has opened up new avenues for animal breeding targeted with disease resistance. However, the efficiency of genome editing varies due to the variation in reproductive physiology among different livestock species. Genome editing in cattle accompanied by major challenges due to high market value, a small number of offspring, and longer gestation period (9 months gestation, 12–18 months to reach puberty). While pigs, sheep, and goats have several advantages as they are smaller, cheaper, and produce more offspring at a time, shorter gestation lengths, and shorter ages of puberty. Over time, genome editing has been successfully applied in livestock species to improve different traits like growth, production performance, and resistance to certain diseases.

Myostatin (*MSTN)* gene is known to be associated with growth and skeletal muscle development. *MSTN* was targeted in the earlier attempts of genetic manipulation in farm animals because the disruption of this single gene has significant effects on meat production, an economically important trait. To date, genome editing has been successfully applied to knockout the *MSTN* from the genomes of cattle, pigs, sheep, and goats (reviewed by Petersen [8]). In cattle, the introduction of antibiotic lysostaphin by SCNT resulted in secretion of milk protein that has bactericidal activity against mastitis-causing bacteria *Staphylococcus aureus* [84]. The gene encoded with bovine whey protein ß-lactoglobulin (BLG), which is a major milk protein and a dominant allergen, was successfully knocked out in the bovine genome using ZFNs technique [85]. Another genome editing study has described the production of swine with mutation RELA (p56) gene using ZFNs, which confer the tolerance of African swine fever [86]. Production of a hornless strain of dairy cattle (Holstein Frisian) by inserting the pooled gene (P) of beef cattle (Angus) through the HDR pathway of gene editing has also been reported [87]. With the evolution of CRISPR/Cas9-based genome editing and a deeper understating of how to gear up its potential, it is now possible to introduce extremely precise changes to the genome with better accuracy and efficiency than any previous attempts. The CRISPR/Cas9 mediated insertion of *NRAMP1* gene for producing tuberculosis resistant cattle and deletion of *CD163* gene for producing porcine reproductive and respiratory syndrome (PRRS) resistant pigs are two groundbreaking application of genome editing technique in livestock. Hereinbelow, we summarized some prominent examples of implication of CRISPR/Cas9-based genome editing to produce disease-resistant livestock (Table 2).


**Table 2.** Reported application of genome editing techniques for disease resistance livestock breeding.

#### *6.1. Porcine Reproductive and Respiratory Syndrome (PRRS) in Pigs*

PRRS is considered the most economically important infectious disease of the swine industry worldwide affecting the production, reproduction, health, and welfare of pigs. The global transcriptome profiling of peripheral blood mononuclear cells revealed that pigs can induce innate and subsequent adaptive immune response to PRRS virus infection or vaccination [92,93]. The host transcriptomic response to PRRSV has been found to have substantial genetic variation [19,20]. The cluster of differentiation 163 (CD163) is a cell surface receptor gene, which is a member of scavenger receptor cysteine-rich superfamily (SRCR) having one intracellular domain and nine extracellular SRCR domains [94]. The CD163 facilitates the PRRS virus to enter into the pulmonary alveolar macrophage through endocytosis, where the virus replicates and induces disease pathogenesis [94,95]. Whitworth et al. [96] reported for the first time that CRISPR/Cas9 mediated CD163-knockout pigs were fully protected against the clinical outcome of PRRS virus infection with a single isolate [96]. A subsequent experiment demonstrated that the SRCR5 domain is crucial for establishing viral infection [97]. The implication of reproductive biotechnology for the production of genome-modified pigs might therefore significantly reduce the PRRS-associated economic losses in the pork industry.

#### *6.2. African Swine Fever Resistance in Pig*

Another economically relevant infectious disease caused by the African swine fever virus (ASFV) in pigs is African swine fever (ASF). Many areas of sub-Saharan Africa are endemic to the ASF. It has been detected recently in Eastern Europe, from where it is rapidly spreading to both Western Europe and China. Native wild swine breeds including the Warthog, have been reported to be resilient to ASFV infection, whereas domestic pigs experience cytokine storm-related lethal hemorrhagic fever. Significant variation in the expression of the RELA gene between resilient and susceptible pigs is associated with host response to ASFV infection [98]. The RELA is a part of the Nuclear Factor kappa Beta (NFkB) transcription factor, considered to play an important role in stress management and immune defense. By using the ZFN-based genome editing technique [99], the sequence of RELA gene in domestic pigs could be translated to that of Warthog pigs. However, the phenotypic data supporting the genetic resistance to ASFV infection yet to be reported.

#### *6.3. Tuberculosis Resistance in Cattle*

Bovine tuberculosis (bTB) is a chronic bacterial disease that is crippling and caused by *Mycobacterium bovis* in cattle. With a wide host range, *M. bovis* infection create considerable hardship for livestock producers with estimates of more than 50 million cattle infected worldwide [100]. This zoonotic infection can be transmitted to humans mainly through the ingestion of milk products that are not pasteurized, resulting in a 10–15% prevalence of human TB [99]. Compulsory testing accompanied by the slaughter of test-positive animals, which accounts for as huge economic loss, is an important means of bTB regulation. The bTB is as a crucial target for genome editing for producing tuberculosis resistant cattle, due to its economic and zoonotic importance, endemic existence, and failure of conventional control strategies. As a strong candidate gene for tuberculosis resistance, natural resistance to infection with intracellular pathogens 1 (*NRAMP1*) gene has been reported by several studies. In a recent study, Cas9 nickase (nCas9) was used by scientist to insert the *NRAMP1* into the genome of the bovine fetal fibroblast [90]. These engineered fibroblast cells were then used as donor cells in somatic cell nuclear transfer, where the *NRAMP1*-containing donor cell nucleus was inserted into the cow's ovum. Before being transferred to recipient cows following a physiologically natural estrous cycle, ova were then nurtured in the laboratory up to embryos. The inserted *NRAMP1* gene was correctly expressed and provided cattle by showing a higher degree of resistance to *M. bovis* infection [90]. It has also been reported that resistance against *M. bovis* infection could be achieved in cattle through TALEN-mediated insertion of mouse SP110 gene into an intergenic region of the bovine genome [101].

#### *6.4. Enzootic Pneumonia Resistance in Cattle*

Pasteurellosis in cattle also called shipping fever or enzootic pneumonia is a respiratory disease complex mostly found in recently weaned, single-sucked beef calves after housing or transport to a new house. Following infection, *P. hameolytica* secretes a leukotoxin that is cytotoxic and it binds to the uncleaved signal peptide of the CD18 protein on the surface of leukocytes. However, the mature CD18 lacks the signal peptide in the leukocytes of other species (e.g., mouse and human) that do not suffer from this disease. ZFNs have been used to introduce a single amino acid change in the bovine CD18 protein and leukocytes from the resultant cattle were able to inhibit the *P. hameolytica* leukotoxin-induced cytotoxicity [102].

#### **7. Ethics, Regulations, and Social Acceptance of Genome-Edited Livestock Products**

Manipulation of genomes in farm animals is becoming a lucrative and materialistic alternative. However, problems such as regulatory legislation, market pay off, and performance acceptability of users are still unresolved. The societal attitude towards genome-edited animal products worldwide will depend on whether the handling was carried out with due regard to the ethical value of target customer community and the issue of animal welfare [103]. The resulting foods after genome-edited

livestock should appear on the commercial market ready to be distributed to consumers immediately, with the required legislative approval of a country. However, the contested factors linked to genetically modified (GM) as well as cloned animals suggest that food derived from genome-edited animals is socially rejected or unable to accept it. Psychological studies have shown that several factors such as the consumer's perceived risks of the consumer, the recognized benefits and/or the confidence in regulatory legislation, can cause the acceptability of GM animals by researchers [104]. The public's skeptical view of GMOs (GM organisms) is partly related to the degree of skepticism of both researchers and state policymakers [105]. Off-target mutagenesis seems to be a big problem due to the widespread use of the CRISPR/Cas9 tool. In relation to the breeding of disease resistant animals, the problem is of more serious concern. The debate on closed animals, stimulated by so many tests, prompted the study of off-target mutation. Exploration of off-target mutations tends to be crucial from the point of view of animal health and ethics to the broader implications of genome-editing techniques in animal breeding [103]. Therefore, in addition to scientific ethics, ethical concerns of animal health and welfare to minimize the imminence of off-target mutations are required to enhance public understanding. This could advocate the social acceptance of genome-edited livestock products gradually.

#### **8. Potentials and Prospects of CRISPR**/**Cas9 Technology in Livestock Production**

The CRISPR/Cas9 exhibits the potentials for substantial improvement over the gene-editing technologies in the ease of application, speed, efficiency, and cost involved. The genome-editing technique has been extensively used for elucidating the gene function in disease pathogenesis and host immune responses. Indeed, the CRISPR/Cas9-based correction of gene mutation in a mouse model of human disease and the primary adult stem cells derived from patients suffering monogenic hereditary defects are being considered the cornerstones for future gene therapy technique [9]. The CRISPR/Cas9-mediated knockout libraries could also potentially be applied to target regions of interest in the noncoding regulatory segment of the genome, such as promoter and enhancer. Moreover, the application of CRISPR/Cas9 in genome-wide studies will facilitate the holistic screening of disease resistance markers. The CRISPR/Cas9 technique would also enable a much wider range of modifications, for instance, gene knockout, base-pair substitution, targeted insertion/deletion of larger genomic regions, and modulation of gene expression [8]. Despite many challenges, remarkable advancements in the field of gene therapy and CRISPR/Cas9-based genome editing technique have been observed in recent years, which paves the way for the development of sustainable disease control strategies for humans, crops, fish, and livestock. Though unlikely in crops, where the whole population can rapidly be replaced, the application of genome editing in the livestock population is a more complex and time-consuming process. Moreover, increasing the efficiency of the CRISPR/Cas9-based repairing process, particularly to increase the rate of gene correction and reduce resultant off-target effects and the development of more effective delivery methods would require for its wider application.

#### **9. Conclusions**

Raising immunocompetent healthy livestock is crucial for the sustainable production of safe food. Understanding the genetics behind the host immunocompetence to infectious pathogens is the key to improve the level of disease resistance through molecular breeding approaches. Exploring the variation of host innate immunocompetence to economically important infectious diseases among indigenous, exotic, and cross-bred animals of the same species can be of important starting point toward estimating the genetic components of disease resistance phenotype. The cutting-edge techniques for immunological and molecular genetics study may create a direct linkage between disease-resistant phenotype and the host genotype. Continuous advancement of open-source in silico omics tools will identify the potential genomic marker, the target for genome editing for disease resistance in livestock. The application of modern reproductive biotechnology, such as CRISPR/Cas9 mediated genome editing, is a breakthrough tool for improving disease resistance in livestock due to its high

precision. Minimizing the risk of off-target mutations would restore the animal welfare standard and increase consumer acceptance of food products derived from genome-edited livestock.

**Author Contributions:** M.A.I., M.J.U., and H.K. designed the study. M.A.I., S.A.R., M.B.R. collected the literature. M.A.I. wrote the original draft. S.A.R., M.B.R., M.U.C., J.V., M.J.U., and H.K. critically reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by a Grand-in-Aid for Scientific Research (A) (19H00965), (B) (21380164, 24380146 and 16H05019) and Challenging Exploratory Research (23658216) and Open Partnership Joint Project of JSPS Bilateral Joint Research Projects from the Japan Society for the Promotion of Science (JSPS), and by grants from the project of NARO Bio-oriented Technology Research Advancement Institution (research program on the development of innovative technology, No.01002A) to H.K. This study was also supported by ANPCyT-FONCyT Grant PICT-2016-0410 to J.V., and by Japan Racing Association, and by JSPS Core-to-Core program, A. Advanced Research Networks entitled Establishment of international agricultural immunology research-core for a quantum improvement in food safety.

**Acknowledgments:** Authors are thankful to the Karl Schellander of the Institute of Animal Science, University of Bonn, Bonn, Germany for his insightful advice in planning this work.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Identification of SNPs Associated with Somatic Cell Score in Candidate Genes in Italian Holstein Friesian Bulls**

**Riccardo Moretti 1,†, Dominga Soglia 1,†, Stefania Chessa 1,\*, Stefano Sartore 1, Raffaella Finocchiaro 2, Roberto Rasero <sup>1</sup> and Paola Sacchi <sup>1</sup>**


**Simple Summary:** Mastitis is a worldwide diffused disease usually treated with an excessive use of antibiotics. Therefore, antimicrobial resistance is an important issue to be addressed by scientists. One of the possible solutions to decrease the use of drugs is genetic selection of resistant animals, that is, individuals that can be more resistant to mastitis. In our survey we analyzed Single Nucleotide Polymorphisms (SNPs) in genes known to be involved in both infection resistance and immune system activity. We found a group of SNPs that can be associated to mastitis related phenotypes (namely SCS) and that can be used for selecting resistant animals. An efficient selection is able to improve both animal welfare and quality and safety of animal products

**Abstract:** Mastitis is an infectious disease affecting the mammary gland, leading to inflammatory reactions and to heavy economic losses due to milk production decrease. One possible way to tackle the antimicrobial resistance issue stemming from antimicrobial therapy is to select animals with a genetic resistance to this disease. Therefore, aim of this study was to analyze the genetic variability of the SNPs found in candidate genes related to mastitis resistance in Holstein Friesian bulls. Target regions were amplified, sequenced by Next-Generation Sequencing technology on the Illumina® MiSeq, and then analyzed to find correlation with mastitis related phenotypes in 95 Italian Holstein bulls chosen with the aid of a selective genotyping approach. On a total of 557 detected mutations, 61 showed different genotype distribution in the tails of the deregressed EBVs for SCS and 15 were identified as significantly associated with the phenotype using two different approaches. The significant SNPs were identified in intergenic or intronic regions of six genes, known to be key components in the immune system (namely *CXCR1*, *DCK*, *NOD2*, *MBL2*, *MBL1* and *M-SAA3.2*). These SNPs could be considered as candidates for a future genetic selection for mastitis resistance, although further studies are required to assess their presence in other dairy cattle breeds and their possible negative correlation with other traits.

**Keywords:** Holstein Friesian cattle; mastitis resistance; candidate genes; SNP selection; nextgeneration sequencing

#### **1. Introduction**

Mastitis is an infectious disease that affects the mammary gland of cattle, leading to an inflammatory reaction and therefore to negative economic consequences due to a marked decrease in milk production [1]. This infection is usually caused by microorganisms penetrating the mammary gland via teat canal [2]: pathogens can be transmitted either between cows (e.g., *Staphylococcus aureus*) or picked up from the environment (e.g., *Escherichia coli*) [3]. Bovine mastitis is considered as one of the costliest diseases affecting dairy cattle

**Citation:** Moretti, R.; Soglia, D.; Chessa, S.; Sartore, S.; Finocchiaro, R.; Rasero, R.; Sacchi, P. Identification of SNPs Associated with Somatic Cell Score in Candidate Genes in Italian Holstein Friesian Bulls. *Animals* **2021**, *11*, 366. https://doi.org/10.3390/ ani11020366


Academic Editor: Bianca Castiglioni Received: 10 December 2020 Accepted: 29 January 2021 Published: 1 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

worldwide, with antimicrobial therapy representing the major impact on sustained costs [4]. Furthermore, given the high mastitis frequency worldwide, the subsequent antibiotic use in dairy cows is under constant control due to its association with antimicrobial resistance increase [5].

One possible way to tackle the antimicrobial resistance issue is to select animal with a higher genetic resistance to this disease [6]. Starting from the 50's, along with its genetic relationship with additional infection-related phenotypes, the possible inheritance of genetic resistance to mastitis was studied through the years [7,8]. However, the first traits to be selected, like mammary gland characteristics and somatic cell count (SCC), revealed to have low to moderate heritability [9]. Therefore, new genetic approaches to find markers able to allow a faster and more accurate selection are requested, with two potential available candidates: genome scanning and single nucleotide polymorphisms (SNPs) in candidate genes [6]. A holistic approach such as genome scanning and/or a Genome Wide Association Study (GWAS) requires a large effect and/or a large number of animals to detect loci associated with traits of interest, while complex diseases are determined by many loci or genes with a small or almost negligible effect [10]. Therefore, the SNP approach in candidate genes involved in organism recognition, leukocyte recruitment, pathogen elimination and resolution seem to be more direct and reliable.

At first, we selected nine genes that are all involved in the immune response to mastitis infection according to literature. Pentraxin 3 (*PTX3*) gene maps on *Bos taurus* autosome 1 (BTA1) and its encoded 45 kDalton glycosylated protein is expressed by mononuclear phagocytes, dendritic and endothelial cells in response to primary inflammatory signals, due to complement activation and pathogen recognition [11]. Chemokine (C-X-C motif) receptors 1 and 2 (*CXCR1* and *CXCR2*, respectively) are paralogous genes coding for major proinflammatory cytokine receptors [12]. They both map on BTA2 and are separated by a 23 Kb fragment. *CXCR1* and *CXCR2* have been considered as prospective genetic markers for mastitis resistance in dairy cows [13,14]. Deoxycytidine kinase (*DCK*) maps on BTA6 and has a functional role in drug resistance and sensitivity [15]. Toll like receptor 4 (*TLR4*) maps on BTA8 and is associated with the early innate immune response. Specifically, *TLR4* is recognized as the key transmembrane receptor for the detection of gram-negative bacteria [16]. Nucleotide binding oligomerization domain containing 2 (*NOD2*) maps on BTA18 and is a key component in the innate immune system, inducing the activation of proinflammatory signaling pathways [17]. Mannose binding lectin 1 and 2 (*MBL1* and *MBL2*, respectively) map on BTA28 and BTA26, respectively. MBLs are collagenous lectins involved in the innate immune response to various microbial pathogens and are potential candidate gene to mastitis resistance [18]. Lastly, mammary Serum amyloid A3.2 (*M-SAA3.2*) maps on BTA29 and belongs to a superfamily of apolipoproteins expressed in bovine mammary gland as a response to pathogens associated to mastitis [19]. Another candidate gene found in a genomic region close to *DCK*, namely the immunoglobulin J (joining) chain gene (*JCHAIN*), was then included in the re-sequencing to find new possible candidate SNPs: this gene plays an important role in the assembly of polymeric immunoglobulins (dimeric IgA and pentameric IgM) and in their selective transport across epithelial cell layers [20].

The aim of this research was to analyze the genetic variability of these genes in Italian Holstein bulls and to study the effects of their SNPs in relation to resistance to mastitis [21].

#### **2. Materials and Methods**

#### *2.1. Animal Data*

The deregressed Estimated Breeding Values for Somatic Cell Score (dEBVs) were obtained by the National Breeding Association (ANAFIJ) for a total of 15,562 Holstein bulls. The dEBVs were calculated as for the genomic national evaluation with the algorithm developed by ANAFIJ. We chose to analyze individuals within the two extreme tails of the dEBVs distribution (±1 SD) of bulls born between 2002 and 2012. Given the 11 years range of bulls' birth year, dEBVs were expressed as deviation from the mean dEBVs updated in

April 2014. Only bulls with a reliability index > 90 and with availability of semen were retained, resulting in the two following subsets: 37 bulls with low dEBVs (group L, <95) and 58 bulls with high dEBVs (group H, >105). Group L dEBVs ranged from 85 to 94 (92.0 ± 2.3, mean ± SD), while group H was composed by bulls with dEBVs values ranging from 106 to 112 (107.9 ± 1.8, mean ± SD). Reliability of the two groups was (mean ± SD) 92.4 ± 0.2 and 92.6 ± 0.2 (L and H, respectively). The mean number of daughters for each bull was 310 ± 74 and 300 ± 57 (L and H, respectively). Semen doses of the selected bulls were obtained from specialized laboratories for semen cryo-conservation.

#### *2.2. Genes Data and Re-Sequencing*

The DNA of the 95 bulls was re-sequenced in order to detect all the SNPs present in the sequences of the investigated genes. Information were obtained from NCBI database [22], "Gene" section (available from https://www.ncbi.nlm.nih.gov/gene/).

The assembly of the UMD Bovine Genome 3.1.1 was taken as a reference for all the genes (Table 1). The *PTX3* gene mapped on the strand AC\_000158.1, corresponding to the sequence of BTA1 in position 1: 111,027,803–111,033,868. The *CXCR1* and *CXCR2* genes both mapped on the strand AC\_000159.1, corresponding to the sequence of BTA2 in position 106,936,887–106,938,583 and 106,900,465–106,915,876, respectively. *JCHAIN* was located upstream *DCK* in position 87,759,435–87,768,832 on BTA 6, while *DCK* gene mapped on the strand AC\_000163.1 in position 88,049,498–88,077,488. The *TLR4* gene mapped on the strand AC\_000165.1, corresponding to the sequence of BTA8 in position 108,828,899– 108,839,913. The *NOD2* gene mapped on the strand AC\_000175.1, corresponding to the sequence of BTA18 in position 19,177,563–19,212,607. The *MBL1* and *MBL2* genes mapped on the strands AC\_000185.1 (BTA28) and AC\_000183.1 (BTA26), respectively, in position 35,840,848–35,846,070 and 6,343,615–6,348,912, respectively. Lastly, *M-SAA3.2* gene mapped on the large strand AC\_000186.1, corresponding to the sequence of BTA29 in position 26,755,567–26,759,547.


**Table 1.** Position of the selected genes on UMD bovine genome 3.1.1, target region selected for resequencing, genes' upstream and downstream regions sequenced, and gene sequencing coverage (COV).

<sup>1</sup> reverse oriented gene.

Genomic DNA was extracted from semen by using NucleoSpins Tissue kit (Macherey-Nagel, Düren, Germany). The primers were designed using the Design Studio web application by Illumina® to sequence the entire genes and about 10,000 bp of the upstream regions to search for polymorphisms that could be responsible of the gene expression and be related with resistance to mastitis also in the 5- UTR regulatory regions. The maximum length of the amplicons for each gene was 450 bp, trying to maximize the coverage of the target region. The primers generated by the software were included in a TruSeq® kit custom amplicon. All the obtained amplicons were sequenced by Next-Generation Sequencing technology on the Illumina® MiSeq platform at the IGA technology Services (Udine, Italy), which also performed the output data processing the variant and genotype call and generated a Variant Call File (VCF) for each gene. Polymorphisms were filtered on the base of locus GQX (genotype quality assuming position, <10,000), GQ (genotype

quality, <30.00), R8 (indel repeat length, >8), and MQ (mapping quality, <0.00). Other considered parameters were indel, site conflicts, and read depth. The genotype table was set up in R environment inferring all the identified allelic variants from the VCF files. Allelic frequencies were calculated and mutations with Minor Allele Frequency (MAF) < 0.05 were removed from the dataset.

#### *2.3. SNP Analysis*

An investigation in the National Center for Biotechnology Information (NCBI, https: //www.ncbi.nlm.nih.gov) was carried out on the SNPs with MAF ≥ 0.05 to verify if they were already present in available databases, to define their gene position and, if detected in exon regions, to evaluate their effect on protein translation. The Wilcoxon-Mann-Whitney (WMW) non-parametric test was used to check if there was a difference between the two groups in terms of genotype distribution for all the loci. Subsequently, the generalized heteroscedastic effects regression model (HEM) was used to estimate the contribution of each SNP to the differentiation of the individuals into the two tails of the distribution of the dEBVs [23]. All the SNPs were also simultaneously tested with a multiple gene approach (MG), where association analysis under an additive model was performed considering the phenotype as binary trait (high or low dEBVs) and using the GRAMMAR approach as implemented within the GenABEL package v.1.8 [24] for R [25]. Since the analyzed SNPs were not actually scattered all over the genome, but in 10 selected genes (considering the introduction of *JCHAIN*), polymorphisms with a correlation higher than 0.80 with any others were excluded. Also, the SNPs not in Hardy-Weinberg Equilibrium (HWE) were excluded. Moreover, four individuals having identity by state (IBS) > 0.95, as revealed by the genomic relationship matrix calculated using the entire dataset, were not included in the following analysis. Then, a polygenic analysis was conducted using a genomic kinship matrix based on SNP genotypes to account for relationship between individuals. Residuals from the polygenic analysis were then used as dependent, quantitative variables in single marker, linear regression analyses to test the significance of marker effects.

In order to understand the possible role of the significantly associated SNPs a first check was performed on NCBI database to see if the SNPs fell within regions from which an RNA transcription through RNAseq analysis was obtained so far. Then, short interspersed nuclear elements (SINE), long terminal repeat elements (LTR), and RNA repeats were analyzed using UCSC Genome Browser [26]. Finally, RNA Central was used to search for potential similarity with non-coding RNA [27], while miRbase was used to search for micro-RNA (miRNA) [28].

#### **3. Results**

The final re-sequenced regions are listed in Table 1. Only for *TLR4*, *NOD2*, *MBL2*, and *MBL1* we obtained sequences for about 10,000 bp upstream the selected genes, whereas for *M-SAA3.2* and *DCK* we obtained reads for about 5600 bp upstream. In the remaining genes, namely *PTX3*, *CXCR1*, *CXCR2*, and *JCHAIN*, we could re-sequence about 1000 bp of the upstream region. For *PTX3* gene we could not obtain the last 1000 bp of the gene. This occurred mainly because some regions were highly repeated while for other regions the software failed in finding specific primers. For the same reasons we could not obtain the 100% coverage, but only a mean gene coverage of 78.6% of the selected regions, with *PTX3* and *MBL2* sequenced with 64% and *CXCR1* and *MBL1* with 88% of coverage respectively. Despite gene selection, other genes are included in the sequenced regions of the bovine genome assembly: *PTX3* gene is included in a region within the ventricular zone expressed PH domain homolog 1 gene (*VEPH1*) and the collectin surfactant protein A (SP-A) gene (*SFPTA1*) is located downstream *MBL1*. Moreover, one significant SNP in *M-SAA3.2* was mapped in an intergenic region that seems to be closer to *SAA4* than to *M-SAA3.2*, about 14,000 bp far from the re-sequencing chosen region. Considering the high similarity of the SAA superfamily, further work is needed to verify if there were mapping errors or we obtained re-sequencing of non-specific products. Therefore, we considered the discovered

SNPs as belonging to the selected genes and specified in the annotation column of Table 2 their position referring to other genes.

**Table 2.** SNPs resulted significantly different in frequency in the two tail of the deregressed EBVs for SCS as calculated with Wilcoxon-Mann-Whitney (WMW) test. The SNP resulted significantly associated with the phenotype with the Heteroscedastic Effects regression Mode (HEM) and the multiple gene approach (MG) are also reported together with the SNP effect.



**Table 2.** *Cont.*

<sup>1</sup> reverse oriented gene; <sup>2</sup> NA = not available; <sup>3</sup> the SNP falls in intron 2 of the collectin surfactant protein A gene *SFPTA1*; <sup>4</sup> the SNP falls in an intergenic region mapped near *SAA4* gene; <sup>5</sup> *p*-values: \* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001, ns = not significant.

> In the sequenced region the total number of called variants was 25,200, but after quality filtering 2585 SNPs were retained. Out of these 2585 SNPs only 557 had a MAF > 0.05 and therefore were used to analyze their associations with the selected phenotype: 20 SNPs within *PTX3* (BTA1-Supplementary Table S1), three of which located in intron 2 and not found on NCBI SNPs databases (1: 111,030,410, 1: 111,030,420, and 1: 111,030,423); 91 SNPs in *CXCR1* and *CXCR2* (BTA2-Supplementary Table S2), three not found on NCBI SNPs databases (2: 106,913,554, 2: 106,913,717, and 2: 106,913,727) and supposed to lead to missense (V125A), synonymous (L179L), and missense (V183I) mutations, respectively; 76 SNPs within *JCHAIN* and *DCK* (BTA6-Supplementary Table S3), nine not found on NCBI SNPs databases (6: 88,044,038, 6: 88,044,298, 6: 88,044,381, 6: 88,044,420, 6: 88,044,746, 6: 88,064,952, 6: 88,069,416, 6: 88,069,428, and 6: 88,069,437), the first five located in intergenic region and the last four in intron regions; 81 SNPs within *TLR4* (BTA8-Supplementary Table S4); 43 SNPs in *NOD2* (BTA18-Supplementary Table S5); 54 SNPs in *MBL2* (BTA26- Supplementary Table S6); 64 in *MBL1* (BTA28-Supplementary Table S7); 128 SNPs in *M-SAA3.2* (BTA29-Supplementary Table S8). After pruning for correlation and HWE, 182 of these 557 polymorphisms were tested for associations with the MG approach.

> WMW test revealed 61 SNPs with a significant different genotype distribution in the two tails of the dEBVs: 12 SNPs out of 20 in *PTX3*, 1 SNP out of the 91 on BTA2 in *CXCR1*, 12 SNPs (three in intron 2 of *JCHAIN*, three in intergenic regions of *DCK*, and six in *DCK* introns) out of 76 on BTA6, six SNPs out of 81 in *TLR4*, 18 SNPs out of 54 in *MBL2*, 3 SNPs out of 64 in *MBL1*, four SNPs out of 128 in M-SSA3.2. Both the HEM and the MG analysis showed nine SNPs significantly associated with the phenotype each. Since two SNPs obtained by HEM and one SNP obtained by MG approach had no significant differences in genotype distribution by WMW, the total number of SNPs included in Table 2 is 64. The SNPs position refers to the UMD Bovine Genome 3.1.1 assembly and, where available, the SNPs rs are reported together with the description of the type of variant, the level of significance of the three tests and the effect of the SNP for HEM and MG approaches. The HEM approach revealed significant SNPs in *DCK* (4), *NOD2* (2), *MBL2* (2), *MBL1* (1), whereas the MG approach revealed significant SNPs in *CXCR1* (1), *DCK* (3), *NOD2* (3), and *M-SAA3.2* (2). None of the SNPs in *PTX3*, *CXCR2*, *JCHAIN* and *TLR4* resulted significantly associated with the SCS levels in the analyzed population. Nevertheless, one SNP in *PTX3* mapped within 3- UTR and others two corresponded to the non-synonymous mutations responsible for the aminoacidic exchanges D341E and E347K, therefore they could lead to changes in the expressed protein and be matter of interest although they were not significantly associated with the SCS levels in the analyzed population.

#### *3.1. CXCR1*

The only SNP showing a different genotype distribution in L and H groups by WNW test, rs109694601, also resulted significantly associated with SCS levels by MG approach. This SNP, located within intron region in *CXCR1*, is mapped 24 pb upstream a Mammalianwide Interspersed Repeats (MIR) element of the MIR3 subfamily. MIRs represent an ancient family of tRNA-derived Short INterspersed Elements (SINEs) found in all mammalian genomes, whose core region may serve some general function, although they have ceased to amplify by retro-transposition. Despite their massive presence in mammalian genomes, their contribution to the transcriptome is still largely unexplored even in humans, and in particular, elements controlling their transcription have never been systematically studied [29], thus we cannot even hypothesize a role of this SNP yet.

#### *3.2. DCK*

None of the SNPs located in *JCHAIN* showed significant associations with the considered phenotype, whereas a total of five SNPs in *DCK* were found significantly associated with dEBVs levels with HEM (four SNPs) and MG (three SNPs) approach. It has to be noted that despite the pruning of the high correlated SNPs and the removal of four individuals because of high IBS by MG approach, which could lead to slightly different results with respect to HEM, the two SNPs not found on NCBI databases (6: 88,044,420 and 6: 88,069,428) resulted significant with both HEM and MG approach (Table 2). The intergenic region comprising the SNP at position 88,044,420 and SNP rs1115177107 includes a transcribed RNA, as shown by NCBI database, and could therefore have some not yet described regulatory effects. In *DCK* intron 1, where two significant SNPs (rs43472176 and rs43472180) were described, two ncRNA-like sequences were also found. Moreover, in-between rs43472177 and rs43472180 we found a sequence showing high homology with BTA-mir-2452, a miRNA found expressed upon induced viral infection [30], indicating that it should have a role in the immune response.

#### *3.3. NOD2*

Four out of the six SNPs with significant genotype distribution in H and L groups, were also found significantly associated with dEBVs levels using MG (rs109352180, rs209159307, and rs110918103) and HEM (rs110918103, and rs210362219) approaches. As for *DCK* also for *NOD2* there was at least one SNP in common between the two approaches (rs110918103). The intergenic region comprising rs109352180 shows the presence of transcribed RNA and could therefore have regulatory effects. Moreover, rs109352180 flanks a region partially aligning with three human miRNAs and four bovine lncRNAs, three of which are precursors of miR-185, a miRNA that could influence the expression of different genes [31,32]. The region is also rich in SINEs, whereas rs210362219 is in a sequence aligning with 3 bovine lncRNA and one human miRNA (68% identity).

#### *3.4. MBL2*

None of the 18 SNPs that could be of interest on the basis of WMW test resulted significantly associated with SCS levels by MG approach, neither the three mapped within exon regions (rs209940244, rs210820536, rs463533307), all synonymous mutations, nor the three in proximity of 5- UTR (Table 2). With HEM approach two SNPs resulted significantly associated with the phenotype: rs110884426, mapped in the intergenic region and rs136687134, located about 1000 bp downstream the 3- UTR. Both SNPs are found in a transcribed RNA, as shown by NCBI database. Moreover, rs136687134 is included in a region of Long INterspersed Elements (LINEs), transposable elements that take a large proportion of eukaryotic genomes, once regarded as nonfunctional sequences, and now considered to play pivotal roles in gene regulation [33,34].

#### *3.5. MBL1*

None of the 3 SNPs that could be of interest on the basis of WMW test resulted significantly associated with SCS levels by MG approach, whereas rs211629255, the SNP that falls into intron 2 of *SFPTA1*, resulted significant by the HEM approach (Table 2).

#### *3.6. M-SAA3.2*

The WMW test showed that genotypes distribution in the two sample groups was significantly different for three SNPs, namely rs136687125, rs137746604, and rs210417381. Of the three SNPs, only rs210417381, with also rs42175273, was found to be significantly associated with the phenotype in the MG approach, (Table 2). It has to be considered that the three SNPs located near the 5- UTR of *M-SAA3.2* have a mean correlation of about 0.77 and were therefore all included in the MG analysis, but due to the high correlation among them it is difficult to define which SNPs has actually an effect or if they are all involved in the response to mastitis resistance. The 3 SNPs are included in an RNA transcribed region containing 4 ncRNAs and also a long terminal repeated (LTR) element that should belong to endogenous retrovirus KERVK family. There are evidences that LTR sequences derived from distantly related endogenous retroviruses (ERVs) act as regulatory sequences for many host genes in a wide range of cell types throughout mammalian evolution [35,36]. Re-activation of ERVs is often associated with inflammatory diseases, thus the region could actually be related with mastitis resistance/susceptibility.

#### **4. Discussion**

Starting from the knowledge of the correlation between SCC and mastitis [37], the aim of this study was to analyze the genetic variability of candidate genes and to investigate the association of the identified SNPs with SCS EBVs. As reported by Koeck et al. [38], SCS EBVs are a valuable alternative predictor for mastitis resistance. Candidate genes selected for this study were already known in literature to be related to immune response to mastitis infections [11–19]. To maximize the effects of the different SNPs, bulls to be sequenced were selected between the low and high tails of distribution for SCS. We chose to use the selective genotyping approach: only individuals from the high and low extremes of the trait distribution are selected for genotyping, reducing genotyping work and costs maintaining nearly equivalent efficiency to complete genotyping [39,40].

An overall total of 64 SNPs showed significantly different genotype distributions in the H and L groups or were identified as significantly associated with dEBVs for SCS. Among these SNPs, 15 resulted significantly associated with HM or MG approach and 3 of them with both approaches. The main difference between the two approaches used in this study lies in MG that takes into account the genomic relationship matrix and the correlation between SNPs. Indeed, when correlation is considered, several SNPs associations could actually belong to correlated SNPs that were excluded from the analysis rather than to the reported SNP. Moreover, with the MG approach, given the general small allele substitution effect when considering simultaneously SNPs in different genes, and using a correction for the genomic relationship, a low number of significant SNPs is expected. The remaining SNPs should be, therefore, the ones where the association is stronger. Of the 15 SNPs identified by both approaches, the most significant were located inside intronic regions within *DCK* (rs43472176 and one at position 6: 88,069,428 not previously available on NCBI SNPs databases) and *NOD2* (rs110918103). The newly discovered SNP at position 6: 88,069,428, together with rs110918103, are two of the three SNPs resulted significantly associated with both HM and MG approach. No significantly associated SNPs were found in exon regions, while eight SNPs were located in the intergenic regions and could possibly be related to distal regulatory element. Six genes (*CXCR1*, *DCK*, *NOD2*, *MBL2*, *MBL1* and *M-SAA3.2*) included SNPs that were identified as significantly associated by HEM and/or MG and that are known to be key components in the immune system. As reported by Siebert et al. [41], several studies identified different SNPs in gene associated with mastitis and, furthermore, with SCS. Similarly, *DCK* gene was suggested as candidate gene

associated with mastitis [42]. On the other hand, the direct association between mastitis and *MBL1*, *MBL2*, *NOD2* and *M-SAA3.2* genes has not been fully explored so far. Nevertheless Wang et al. [43], using a GWAS approach found a significant SNP in *SAA2* gene, indicating an important role of the superfamily of these apolipoproteins. These results suggest that, just like *CXCR1* and *DCK* genes, also *MBL1*, *MBL2*, *NOD2* and *M-SAA3.2* genes could be eligible as candidate genes for genetic selection of mastitis resistant cows. Many studies demonstrated that SNPs associated with diseases are mainly located in non-coding regions, making it difficult to link them to specific biological pathways. A recent study, in which a genotyping-by-sequencing approach was used to find novel SNPs associated with milk traits, showed that the majority of identified SNPs were located within intergenic regions (69%), followed by intronic regions (25%), with only 3.46% of SNPs being coding variants [44]. Moreover, different studies found that conserved non-coding regions in introns and near genes show large allelic frequency shifts, similar in magnitude to missense variations, suggesting that they are critical for gene function regulation and evolution in many species [45,46]. Most of the significant SNPs detected in our investigation are located inside or in proximity to complete or partial lncRNA-like or miRNA-like sequences, or to repeated regions containing SINEs or LINEs, and it is now established that both intronic and LINE/SINEs repeats could lead to transcriptional regulation of the affected genes [47]. Thus, although the role of some of these elements, especially in the bovine species, has still to be verified, and further studies are needed to better understand the role of these SNPs, the results obtained here are a starting point. Contrary to the findings of Welderufael et al. [48], no SNPs were identified as significantly associated with dEBVs level within *PTX3* gene in our population. Similarly, no significant SNPs were detected within candidate genes *TLR4* [49].

#### **5. Conclusions**

In this study, next generation sequencing technology was used to discover new SNPs in candidate genes related to mastitis resistance and to identify those associated with dEBVs for SCS. We analyzed the genetic variability of several SNPs found in candidate genes and identified 15 that are associated with the phenotype. Two of them maps within *DCK* gene and were not previously available on NCBI database, whereas other SNPs strengthen the role of *CXCR1*, *NOD2*, *MBL1*, *MBL2*, and *M-SAA3.2* as candidate genes. These results suggest that the possibility to use SNPs as markers for genetic selection of mastitis resistant cattle is plausible.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2076-261 5/11/2/366/s1, Table S1: 20 SNPs found within *PTX3* gene, Table S2: 91 SNPs found within *CXCR1* and *CXCR2* genes, Table S3: 76 SNPs found within *DCK* gene, Table S4: 81 SNPs found within *TLR4* gene, Table S5: 43 SNPs found within *NOD2* gene, Table S6: 54 SNPs found within *MBL2* gene, Table S7: 64 SNPs found within *MBL1* gene, Table S8: 128 SNPs found within SAA3 gene.

**Author Contributions:** Conceptualization, P.S. and R.R.; methodology, P.S. and R.R.; software, D.S., S.C. and R.M.; validation, D.S., S.C. and R.M.; formal analysis, D.S.; investigation, R.M.; resources, P.S.; data curation, S.C.; writing—original draft preparation, R.M.; writing—review and editing, P.S., R.R., S.S., S.C., R.F. and D.S.; visualization, D.S. and R.M.; supervision, P.S.; project administration, P.S.; funding acquisition, P.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Fondazione Cassa di Risparmio di Cuneo (CRC), project "MIGLIORLAT–Miglioramento della qualità e dello sviluppo competitive della filiera latte piemontese" (Bando Ricerca Scientifica 2011) and by Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) under the programme "Dipartimenti di Eccellenza ex L.232/2016" to the Department of Veterinary Science, University of Turin.

**Institutional Review Board Statement:** Genetic material was obtained from semen doses from specialized laboratories.

**Data Availability Statement:** Data is contained within the article or as supplementary material.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Heritability of Teat Condition in Italian Holstein Friesian and Its Relationship with Milk Production and Somatic Cell Score**

#### **Francesco Tiezzi 1, Antonio Marco Maisano 2, Stefania Chessa 3, Mario Luini 2,4 and Stefano Bi**ff**ani 4,\***


Received: 9 October 2020; Accepted: 29 November 2020; Published: 2 December 2020

**Simple Summary:** Mammary infections in dairy cattle are still a major problem which impair animal health and jeopardize breeders' efforts to attain sustainable production. The first natural protection against pathogens' access to udder tissue is the teat canal. Teat canal morphology can be influenced by several environmental factors but the present study confirmed the existence of a genetic component. Moreover it was observed that the teat canal morphology is related to milk production and somatic cell count, the latter being an indirect indicator of mammary infections. Considering that the current selection objectives implemented in dairy cattle worldwide have shifted toward a more balanced breeding goal, with a larger emphasis on health traits, a further genetic deterioration in teat condition is not expected.

**Abstract:** In spite of the impressive advancements observed on both management and genetic factors, udder health still represents one of most demanding objectives to be attained in the dairy cattle industry. Udder morphology and especially teat condition might represent the first physical barrier to pathogens' access. The objectives of this study were to investigate the genetic component of teat condition and to elucidate its relationship with both milk yield and somatic cell scores in dairy cattle. Moreover, the effect of selection for both milk yield and somatic cell scores on teat condition was also investigated. A multivariate analysis was conducted on 10,776 teat score records and 30,160 production records from 2469 Italian Holstein cows. Three teat scoring traits were defined and included in the analysis. Heritability estimates for the teat score traits were moderate to low, ranging from 0.084 to 0.238. When teat score was based on a four-classes ordinal scoring, its genetic correlation with milk yields and somatic cell score were 0.862 and 0.439, respectively. The scale used to classify teat-end score has an impact on the magnitude of the estimates. Genetic correlations suggest that selection for milk yield could deteriorate teat health, unless more emphasis is given to somatic cell scores. Considering that both at national and international level, the current selection objectives are giving more emphasis to health traits, a further genetic deterioration in teat condition is not expected.

**Keywords:** dairy cattle; teat-end hyperkeratosis; udder health; somatic cell; genetic correlation; selection response

#### **1. Introduction**

Udder health still represents one of the most demanding objectives to be attained in the dairy cattle sector. In spite of the impressive advancements observed on both management and genetic factors, the diseases of the mammary gland in response to an infection, i.e., a clinical (CM) or a subclinical mastitis (SCM), is one of the most common and expensive diseases faced [1–6]. In 2015 the estimated overall cost per case of clinical mastitis in the first 30 days of lactation was \$444 [7]. Moreover, 71% of the costs of a case of clinical mastitis were indirect costs. Among the several factors which can mitigate or increase the occurrence of intramammary infections, udder and teat morphology play a substantial role [8–11]. Indeed, the most relevant pathogens affecting dairy cattle, e.g., *Staphylococcus aureus (S. aureus)* [12] gain access to udder tissues primarily via the teat canal. Teat-end hyperkeratosis (THK), i.e., the hyperplasia of the teat orifice's keratin layer [13], is considered one of the most common teat condition changes observed in dairy cattle [14]. THK is commonly evaluated using an ordinal scale ranging from 1 (no THK) to 4 (severe THK) [15]. Results presented in a recent review show that a condition of severe THK is an important risk factor for both CM and SCM, while a mild THK can mitigate possible SCM [11]. Alterations in teat condition across lactation and parities are expected as a consequence of machine milking [14] but the existence of a genetic component has also been investigated [16,17]. In a first study enrolling 1740 US Holstein cows from nine herds, Chrystal and colleagues [16] found that heritability estimates of teat end shape for first, second, and all lactations combined were 0.53, 0.44, and 0.56, respectively. Teat diameter heritability estimates were 0.23, 0.27, and 0.35, respectively across the three lactations. Teat-end shape and diameter repeatability estimates were 0.75 and 0.36, respectively. In a subsequent study with 1259 cows from a single herd, Chrystal et al [17] found heritability estimates of teat-end shape for first, second, and third and later lactations of 0.34, 0.21, and 0.13, respectively. Repeatability ranged from 0.34 to 0.46.

An additional and widely used phenotypic indicator of udder health is somatic cell count (SCC). After being transformed to somatic cell score (SCS) using a logarithmic transformation [18], it is used as a predictor in the selection for mastitis resistance [3,19]. Some studies [20–27] have investigated the relationship between THK and SCC and most of them reported positive associations, especially when THK is scored as severe. However THK scoring can be confounded with other factors, e.g., parity, and its relationship with SCC might not be clearly identified. Chrystal and colleagues investigated the relationship between teat end shape and SCS in two subsequent studies [16,17]. They used SCS as a dependent variable in a mixed model and included teat end shape as a covariate. In both cases, the effect of teat end shape was not significant. Among the possible explanations the authors hypothesized some genetic factors that would control SCS and prevail over the effect of teat end shape.

The main objectives of this work were to further investigate the genetic component of THK and to elucidate its relationship with both milk yield (MY) and SCS. Moreover, the genetic change in THK when selecting for both MY and SCS was also investigated.

#### **2. Materials and Methods**

#### *2.1. Dataset*

Animal welfare and use committee approval was not needed for this study as datasets were obtained from pre-existing databases based on routine animal recording procedures. Data used in the present study were collected between September 2011 and August 2012 and between March 2016 and January 2017 in 48 Italian Holstein dairy farms located in the Lombardy region (Northern Italy, 45.4791◦ N, 9.8452◦ E). The average herd size was 106 milking cows, ranging from 38 to 285 cows. The original data set included 3087 cows [28].

Teat condition score (TCS) was evaluated visually for each cow during milk sampling and assigned a score using the methodology proposed by Neijenhuis et al. [15]: an absent callosity was evaluated as TCS = 1, a smooth callous ring around the orifice was evaluated as TCS = 2; a rough and very rough callous rings were evaluated as TCS = 3 and 4, respectively. Scores were applied on each single teat, generating four records for each cow scored, each one was assigned to the respective teat position (FL: front left; FR: front right; RL: rear left; RR: rear right). In addition to scoring values, date of scoring and hygiene scoring were obtained. Hygiene of udder, flanks and legs was scored based on a 4-point scale system [29], from very clean (score 1) to very dirty skin (score 4). Both TCS and hygiene score were collected by a group composed of four technicians who were previously trained to harmonize the scoring procedure.

Production records for daily milk yield (MY) and SCC per ml were obtained by the Regional Breeders' Association (ARAL, Crema, Italy). The variable SCC was log-transformed into SCS by using the formula SCS = log2(SCC/100,000) + 3 [18]. Together with production records, calving dates, herd number and pedigree information were obtained. Using calving date, it was possible to extract the parity order and days in milk at teat scoring and production recording.

Teat scoring was organized in a dataset where the four quarters appeared as repeated information of the same phenotype. Each herd was visited and scored only once although the scoring was conducted over two consecutive days in two of the herds. Teat position information was assigned to each record, so that each cow showed four records with the respective position indicator. Only cows showing all four records were included in the analysis.

Production records from the same cows were retained and were organized as repeated measures on the same individuals. For each cow, only production records from the same lactation number of the scoring were retained. The median number of records per cow was 10, each record reported MY and SCS for that cow in that test-day.

The final dataset was built stacking the teat score and production datasets. Records that showed teat scoring did not show production records, and vice versa.

The final dataset included 10,776 scoring records and 30,160 production records, for a total of 40,936 records available for analyses. All records came from 2649 cows, daughters of 869 sires and 2408 dams, while 275 cows appeared also as dams. All animals were raised in a total of 48 herds.

The choice of building a dataset where test scores and production records were not aligned was dictated by practical modeling decisions. The alignment of the teat score records with the closest production record could have been done. However, the repeated nature of the teat score record would have forced the production record to be repeated four times, which was not appropriate. At the same time, repeated records for both teat scores and production records allowed us to disentangle the additive genetic and permanent environmental effects. By not aligning records, the residual covariance between the teat scores and the production records could not be estimated. Nevertheless, such covariance should be forced to zero, because the phenotypes came from different data capture and recording systems. The cow permanent environmental effect allowed us to estimate a non-genetic covariance between the two sets of traits.

#### *2.2. Statistical Analysis*

Three teat scoring traits were defined and included in the analysis. A four-classes ordinal scoring (1, 2, 3, 4) was maintained for the trait hereinafter called TS. In addition, two binary traits were created. Classes 2, 3 and 4 were combined into a single class creating the trait hereinafter called TSa; classes 1 and 2 vs. 3 and 4 were combined to create trait TSb. Relative frequencies for all trait classes are reported in Table 1.

A tri-variate threshold-linear model was used for analyzing traits combination such as TS-MY-SCS, TSa-MY-SCS, TSb-MY-SCS. Model specifications were:

$$y(\lambda) = \mathbf{X}b + \mathbf{Z}\_{\mathbb{H}}h + \mathbf{Z}\_{\mathbb{P}}p + \mathbf{Z}\_{\mathbb{S}}s + \varepsilon$$

where *y* is the phenotype vector for MY and SCS, λ is the underlying liability for TS, TSa or TSb, *X* and *b* are the incidence matrix and vector of solutions for fixed effects, *Zh* and *h* are the incidence matrix and vector of solutions for the herd environmental random effect, *Zp* and *p* are the incidence matrix and vector of solutions for the cow permanent environmental random effect, *Zs* and *s* are the incidence matrix and vector of solutions for the sire additive genetic random effect, *e* is the vector of residuals. Fixed effects common to all traits were parity by stage of lactation class (36 classes), defined by pasting the parity class (first, second and later lactations) with the stage of lactation month (last class including records until the 15th month of lactation). Additional fixed effects for TS, TSa and TSb were hygiene (4 classes) and teat position (4 classes). The vector of solutions for random effects were assumed to be normally and independently distributed. Specifically for the herd random effect:

$$
\begin{bmatrix} h\_1 \\ h\_2 \\ h\_3 \end{bmatrix} \sim N(0, \mathbf{I} \otimes \mathbf{H}),
$$

where **I** is an identity matrix and **H** is the following variance-covariance matrix:

$$
\begin{bmatrix}
\sigma\_{h11}^2 & \sigma\_{h12} & \sigma\_{h13} \\
\sigma\_{h12} & \sigma\_{h22}^2 & \sigma\_{h23} \\
\sigma\_{h13} & \sigma\_{h23} & \sigma\_{h33}^2
\end{bmatrix}
$$

Similarly, for the cow permanent environmental effect:

$$
\begin{bmatrix} p\_1 \\ p\_2 \\ p\_3 \end{bmatrix} \sim N(0, \mathbf{I} \otimes \mathbf{P}),
$$

where **I** is an identity matrix and **P** is the following variance-covariance matrix:

$$
\begin{bmatrix}
\sigma\_{p11}^2 & \sigma\_{p12} & \sigma\_{p13} \\
\sigma\_{p12} & \sigma\_{p22}^2 & \sigma\_{p23} \\
\sigma\_{p13} & \sigma\_{p23} & \sigma\_{p33}^2
\end{bmatrix}
$$

For the sire additive genetic effect:

$$
\begin{bmatrix} s\_1 \\ s\_2 \\ s\_3 \end{bmatrix} \sim N(0, \mathbf{A} \otimes \mathbf{S}),
$$

where **A** is the numerator relationship matrix constructed on the pedigree and **S** is the following variance-covariance matrix: ⎡

$$
\begin{bmatrix}
\sigma\_{s11}^2 & \sigma\_{s12} & \sigma\_{s13} \\
\sigma\_{s12} & \sigma\_{s22}^2 & \sigma\_{s23} \\
\sigma\_{s13} & \sigma\_{s23} & \sigma\_{s33}^2
\end{bmatrix}
$$

and for the residual effect:

$$
\begin{bmatrix} e\_1 \\ e\_2 \\ e\_3 \end{bmatrix} \sim N(0, \mathbf{I} \otimes \mathbf{R}),
$$

where **I** is an identity matrix and **R** is the following variance-covariance matrix:

⎢⎢⎢⎢⎢⎢⎢⎢⎣

$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & \sigma\_{\ell 22}^2 & \sigma\_{\ell 23} \\
0 & \sigma\_{\ell 23} & \sigma\_{\ell 33}^2
\end{bmatrix}
$$

Therefore, residual variance for any teat scoring trait was fixed to 1 for identifiability in the threshold liability model and the covariance between the teat scoring and production traits was fixed to 0, due to lack of records showing both traits.

Variance components estimates were obtained using software THRGIBBS1F90 version 3.1 [30]. Solutions for TSa and TSb were then obtained using the package 'MCMCglmm' implemented in R version 3.6.1 [31] by fixing the variance components to the estimated values. This package was chosen for easier manipulation of solutions in the R environment. In both cases, a total of 300,000 iterations were run, removing the first 50,000 as burn-in and thinning every 50 iterations. Convergence was assessed by visual inspection of trace plots and Geweke's test conducted using the 'coda' package in R [32].

**Table 1.** Frequency of observations for each teat score in the dataset (*n* = 10,776).


<sup>1</sup> TS = teat scoring trait based on a four-classes ordinal scoring (1 = absent callosity, 2 = smooth callous ring, 3 = rough callous ring, 4 = very rough callous ring). TSa: binary teat scoring trait where classes 2, 3 and 4 were combined into a single class. TSb: binary teat scoring trait where classes 1 and 2 vs. 3 and 4 were combined.

#### *2.3. Genetic Parameters*

Heritability was calculated as:

$$h^2 = \frac{4\sigma\_s^2}{\sigma\_s^2 + \sigma\_p^2 + \sigma\_h^2 + \sigma\_c^2}$$

Intra-herd heritability was calculated as:

$$h\_{IH}^2 = \frac{4\sigma\_s^2}{\sigma\_s^2 + \sigma\_p^2 + \sigma\_c^2}$$

Cow repeatability was calculated not considering the genetic effect as a source of cow repeatability, therefore:

$$r^2 = \frac{\sigma\_p^2}{\sigma\_s^2 + \sigma\_p^2 + \sigma\_h^2 + \sigma\_\varepsilon^2}$$

Herd repeatability was calculated as:

$$he^2 = \frac{\sigma\_{\rm li}^2}{\sigma\_s^2 + \sigma\_p^2 + \sigma\_{\rm li}^2 + \sigma\_e^2}$$

In all formulas, σ<sup>2</sup> *<sup>e</sup>* was '1' for the teat score traits. Genetics, cow and herd correlations were calculated by dividing the covariance by the product of the standard deviations. Parameters were calculated at every iteration and posterior mean and 95% empirical confidence intervals were used as estimates of their standard error.

Least square means for all effects were calculated on the liability scale at every iteration and transformed to the phenotypic (probability) scale. Posterior mean and 95% empirical confidence intervals were used as estimates and their error. Plots were created using package 'ggplot2' implemented in R [33].

#### *2.4. Genetic Change in Teat Score Traits*

Expected selection response on the teat score traits was calculated for economic indices that combined MY and SCS with different values of relative emphasis. One economic index for each of the three teat score traits was built, including three traits (the teat score traits, MY and SCS). Relative emphasis was applied on MY and SCS, starting with 100% emphasis on MY and 0% emphasis on SCS and ending with 0% emphasis on MY and 100% on SCS, with step increases of 1% unit. While MY was given positive emphasis, SCS was given negative emphasis. Following [34–36] the genetic response on the teat score trait was calculated as:

$$r = \frac{\mathbf{w}'\_{\mathbf{g}^\circ 1}}{\sqrt{\mathbf{w}' \mathbf{G} \mathbf{w}}}$$

where *r* is the genetic response for the teat score trait (TS, TSa or TSb); w is the vector of index weight; **G** is the variance-covariance between the teat score trait, MY and SCS; g◦<sup>1</sup> is the first column of **G**. The genetic response was then expressed in genetic standard deviation units for the respective teat score trait.

#### **3. Results**

#### *3.1. Data*

Descriptive statistics for the teat score traits are in Table 1. Mean values and SD for daily milk production (MY) and daily linear score (SCS) were 29.73 ± 9.05 for and 3.97 ± 1.99 (not reported in table). The observed phenotypic trend for MY and SCS by parity and month of lactation are in Figures 1 and 2, respectively.

**Figure 1.** Milk yield across parity and stage of lactation for cows with a teat-score evaluation.

**Figure 2.** Somatic cell score across parity and stage of lactation for cows with a teat-score evaluation.

#### *3.2. Heritability and Environmental E*ff*ects Estimates*

Estimates of variance components, heritability, intra-herd heritability, cow repeatability and herd repeatability for all traits are reported in Table 2. Estimates for MY and SCS came from the model where TS was the correlated trait, though posterior distribution estimates for production traits in the other models largely overlapped the values shown.


**Table 2.** Estimates (posterior mean and 95% highest probability density intervals) of variance components and genetic parameter estimates for the traits considered in this study.

<sup>1</sup> TS = teat scoring trait based on a four-classes ordinal scoring (1 = absent callosity, 2 = smooth callous ring, 3 = rough callous ring, 4 = very rough callous ring). TSa: binary teat scoring trait where classes 2, 3 and 4 were combined into a single class. TSb: binary teat scoring trait where classes 1 and 2 vs. 3 and 4 were combined. MY = daily milk yield. SCS = daily somatic cell score.

Heritability estimates for the teat score traits were moderate to low, with TS showing the largest value (0.238) followed by TSa with 0.162 and TSb with 0.084. A similar but weaker trend was found for the intra-herd heritability, TS showed an estimate of 0.289, TSa showed 0.248 and TSb showed 0.250. The weaker trend was due to the herd effect also decreasing from 0.221 for TS to 0.159 for TSa to 0.093 for TSb. Cow repeatability (free of genetic variance) was 0.554 for TS, 0.46 for TSa and 0.251 for TSb.

Heritability and intra-herd heritability estimates were larger for SCS than MY (0.146 vs. 0.065 and 0.234 vs. 0.126, respectively). Also cow repeatability was stronger for SCS than MY (0.368 vs. 0.257) while herd effect was weaker for SCS than MY (0.153 vs. 0.251, respectively).

#### *3.3. Genetic and Environmental Correlations*

Correlations between teat scoring and production traits for all random effects are reported in Table 3. Correlations were considered significant when the 95% empirical confidence intervals did not include the '0' value.


**Table 3.** Estimates (posterior mean and 95% highest probability density intervals) of genetic correlations between teat score and production traits.

<sup>1</sup> TS = teat scoring trait based on a four-classes ordinal scoring (1 = absent callosity, 2 = smooth callous ring, 3 rough callous ring, 4 = very rough callous ring). TSa: binary teat scoring trait where classes 2, 3 and 4 were combined into a single class. TSb: binary teat scoring trait where classes 1 and 2 vs. 3 and 4 were combined. MY = daily milk yield. SCS = daily somatic cell score.

Genetic correlations for TS and TSa with MY were positive and significant, showing values of 0.862 and 0.893, respectively. The correlation between MY and TSb was not significant. Only the genetic correlation between TS and SCS was significant showing a value of 0.439.

None of the cow permanent environmental correlations of teat scoring traits with MY were significant. All correlations with SCS were significant and positive but weak, showing values of 0.141, 0.125 and 0.152 for TS, TSa and TSb.

For the herd environmental correlations, the only significant estimate was found between TSb and MY with a value of 0.463. However, for all estimates the tendency was for a positive relationship between teat scoring traits and MY and a negative relationship with SCS.

#### *3.4. Influence of Fixed E*ff*ects on Teat Scores*

Least square mean estimates for the hygiene score are reported in Table 4 while estimates for the teat position effect are reported in Table 5. Estimates are expressed on the phenotypic scale (probability of a TS greater than 1 for TSa or TS greater than 2 for TSb).

The hygiene effect did not appear to be relevant for any traits, since the posterior distributions of the estimates overlapped.

A relationship was observed among teat positions and TSa. Indeed front quarters showed a higher probability of a TS greater than 1 or TS greater than 2. The same pattern was found for TSb, with probabilities for front left and front right quarters at 0.095 and probabilities for rear left and rear right quarters at 0.094. While the front-back contrast could not be declared significant at the chosen α threshold of 0.05, the trend appeared evident.


**Table 4.** Least square mean estimates (posterior mean and 95% highest probability density intervals) for the influence of Hygiene score on TSa and TSb traits. Estimates are expressed on the phenotypic scale (probability of a TS greater than 1 for TSa or TS greater than 2 for TSb).

<sup>1</sup> TSa: binary teat scoring trait where TS classes 2, 3 and 4 were combined into a single class. TSb: binary teat scoring trait where TS classes 1 and 2 vs. 3 and 4 were combined. <sup>2</sup> Hygiene of udder, flanks and legs was scored based on a 4-point scale system, from very clean (score 1) to very dirty skin (score 4).

**Table 5.** Least square mean estimates (posterior mean and 95% highest probability density intervals) for the influence of udder quarter on on TSa and TSb traits. Estimates are expressed on the phenotypic scale (probability of a TS greater than 1 for TSa or TS greater than 2 for TSb).


<sup>1</sup> TSa: binary teat scoring trait where TS classes 2, 3 and 4 were combined into a single class. TSb: binary teat scoring trait where TS classes 1 and 2 vs. 3 and 4 were combined.

Least square mean estimates expressed on the phenotypic scale (i.e., probability) for each parity by month of lactation effect are in Figure 3. Results for TSa are in the upper part while results for TSb in the lower part of the figure.

**Figure 3.** Least square mean estimates expressed on the phenotypic scale (posterior mean and 95% highest probability density intervals) for the lactation by stage of lactation effect on TSa (upper figures) and TSb (lower figures) traits. The dashed horizontal line represents the estimate for that lactation across stages.

Estimates for TSa in lactation 1 showed an increase in the higher scores towards the end of lactation with estimates going from below 0.25 to 0.30. Values similar to the end of lactation 1 were maintained in lactation 2 and subsequently, suggesting that the deterioration of teats is carried over (it is worth considering that this is a cross-sectional study, and not a longitudinal study). Estimates within lactations 2 and subsequently did not show as clear a pattern, though a small increase was shown until the peak of lactation. Lactation estimates across stages for TSa increased from 0.28 to 0.47 to 0.51 in lactation 1, 2 and 3, respectively. Similarly, lactation estimates across stages for TSb increased from 0.094 to 0.097 to 0.106 for lactation 1, 2 and subsequently. Conversely, estimates within-lactation for TSb did not show relevant differences.

#### *3.5. Genetic Response in Teat Score Traits*

The genetic response in teat score traits (TS, TSa and TSb) after selecting for both MY and SCS applying different relative emphasis is reported in Figure 4. The x-axis gives, from left to right, the relative emphasis as shifting from MY to SCS, the y-axis gives the genetic response in genetic standard deviation units for the respective trait given the overall relative emphasis. The values on the left side of the graph show the response when low emphasis is given to MY, which is positively (unfavorably) correlated with the three traits, and high emphasis is given to SCS, which is still positively correlated but it was given negative emphasis.

**Figure 4.** Expected genetic progress in Teat Score traits (y-axis) when shifting relative economic emphasis (x-axis) on milk yield (MY) vs. somatic cell score (SCS). Negative values mean an improvement in teat score (less teat-end hyperkeratosis).

When the emphasis on MY was below 12 (and subsequently the emphasis on SCS was over −88) genetic response for all three teat score traits was negative, i.e., teat score improved. The response for TS became positive at the value of MY emphasis of 17 (i.e., −83 for SCS), became positive for TSa at the MY emphasis value of 14 (i.e., −86 for SCS) and at the value of 22 for TSb (i.e., −78 for SCS). All the response curves reached a plateau around the value of 40 for emphasis on MY. At large values of MY emphasis, genetic response became strong and positive (unfavorable) for TS (0.56 genetic standard deviation units), moderately positive for TSa (0.32 genetic standard deviation units) and almost null

for TSb (0.09 units). The genetic response was therefore mostly unfavorable for all traits unless most of the emphasis was given to SCS (at least 90%). While genetic correlations with other productive and reproductive traits are missing, we could infer that strong index emphasis should be given to udder health if the deterioration of teat score at the genetic level needs to be avoided. The response was stronger, in either direction, for TS than TSa, probably due to the larger heritability since the genetic correlations with MY and SCS were about the same for both traits. The response was small to null for TSb.

#### **4. Discussion**

The results of the present study confirm that THK has a genetic component. Indeed, heritability ranged from 0.084 for TSb (binary trait) to 0.238 for TS (scored using a 1 to 4 scale). These results agree well with the estimates previously reported by Chrystal et al. [17] but are lower than those presented by Lojda et al. [37], Seykora and McDaniel [38], and Chrystal et al. [16]. An interesting aspect regards the scale used to score THK which might influence heritability estimates. Pantoja et al [11], in their systematic review of the association between THK and mastitis, support the idea that THK should be scored using a reduced number of categories. This approach will guarantee an adequate number of records per THK score, hence a better statistical power. However reducing the number of THK scores will partly jeopardize its actual variation and, as [17] suggested, possibly lead to lower heritability estimates. In the present study, the relative frequency of a TS = 4 (i.e., very rough callous ring) was below 2%, lower than that reported by Pantoja et al [11]. However, the 95% highest probability density intervals of the posterior mean heritability estimate for TS trait ranged from 0.116 to 0.364, suggesting that a 4-point scale is still a viable solution. A similar pattern was observed for cow repeatability, with larger estimates when THK was scored on a 4-point scale. This result is quite interesting because it suggests that reducing the scale and treating the THK as a binary trait could be a less accurate measure if recorded only once per cow.

Parity and DIM strongly affect THK [11] and the results found in the present study, even if modulated by the scoring scale, confirm this finding.

The relationship between THK and other breeding goals (e.g., milk yields and somatic cell score) has been mainly investigated considering the former as the dependent variable (Y) and the latter as possible factors affecting Y. The relationship, especially with SCS, was not always clear and significant [11,16,17]. In the present study a different approach has been applied. Indeed, previous studies, not only in dairy cattle [3] but also in other dairy species [39], have suggested a non-zero genetic correlation between udder traits and milk and/or SCS. This aspect is not negligible and should be taken into consideration when implementing a selection index. If the genetic correlation is statistically different from zero there could have been a correlated favorable or unfavorable response on THK. This is particularly true if we consider that selection on milk yield has been the primary objective in dairy cattle for many years [40]. The genetic correlation between THK and milk yield was large and positive, i.e., unfavorable, confirming previous findings [41] in dairy cattle. However, a recent and large study by Tribout [42] in dairy cattle did not confirm the existence of a single quantitative trait locus (QTL) with pleiotropic effect on both milk production and udder morphology suggesting instead the presence of neighboring QTLs that show linkage disequilibrium, eventually leading to a non-null genetic correlation. When using THK scored on 1–4 scale, the genetic correlation was positive and significant (95% empirical confidence intervals not including the '0' value). A positive genetic correlation means that an increase in SCS causes an increase in THK hence selection for lower **SCS** should have a positive impact on teat-end condition. When the relative emphasis on milk yield is below 15%, the response in teat score traits is favorable. Nowadays the focus of selection has moved away from being purely production oriented toward a more balanced breeding goal, even if milk yield has still a relative emphasis not less than 40–50% [40].

In this study we also estimated cow-level correlations as well as herd-test-day correlations. While the former expresses trait associations that are driven by any cow-related factor other than

additive genetics, the latter expresses an association at the herd-level, which is likely due to management alone. In management, we include any strategic choice made by the farmer as well as the equipment used for milking, which is known to affect the insurgence of THK.

Cow correlations were not significant when MY was the correlated trait, but were positive, though weak, when SCS was the correlated trait. This suggests that, beyond genetics, there is a common factor that drives SCS to increase and THK to appear more frequently. At the herd level, correlations were not significant when TS and TSa were the correlated traits. TSb was positively correlated with MY and negatively correlated with SCS. This suggest that herd management (and equipment) that leads to higher MY also leads to higher THK. Also, the herd management that leads to lower SCS also leads to higher THK (moving from classes 1 and 2 to classes 3 and 4, specifically).

The different direction of the correlation between TSb and SCS depending on the additive genetic, cow permanent or herd level suggests that there are counter-acting factors. While cows themselves could show a positive association for which higher SCS means higher THK, the herd management (and equipment) shows the opposite direction, i.e., herds with lower SCS have higher THK and vice versa. The milking practice, for example, could be a factor determining this association.

#### **5. Conclusions**

In the current study, multivariate analysis conducted on 10,776 scoring records and 30,160 production records from 2469 Italian Holstein cows enabled the estimation of the genetic parameters for teat-end score and its relationship with milk yields and SCS. Teat-end score has a genetic background and is genetically related to both production and SCS. The scale used to classify teat-end score has an impact on the magnitude of the estimates which however were always statistically different from zero. Teat-end score was also genetically related to SCS, reinforcing the idea that udder morphology is still a fundamental piece in the control of mammary infection. Finally, an unfavorable genetic correlation of teat score with milk yield was observed. However, considering that the current selection objectives implemented in dairy cattle worldwide have shifted toward a more balanced breeding goal a further genetic deterioration in teat-score is not expected. The importance of the negative effects of some environmental aspects, such as milking routine, should not be forgotten.

**Author Contributions:** Conceptualization, F.T. and S.B.; methodology, F.T. and S.B.; software, F.T. and S.B.; investigation, S.B. and S.C.; resources, S.B.; data curation, A.M.M. and M.L.; writing—original draft preparation, F.T. and S.B.; writing—review and editing, F.T., S.B., S.C., A.M.M. and M.L.; supervision, S.B.; funding acquisition, F.T. and S.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Short Term Mobility (STM) 2018 Program by the Italian National Research Council (CNR).

**Acknowledgments:** Data were provided by the Lombardy Region: Project N. 1745–MASTFIELD "Applicazione di sistemi molecolari innovativi per il controllo in campo delle mastiti bovine." The authors thank the agronomists and veterinarians involved in the project for their important contributions in sampling milk and recording management scores: Lucio Zanini, Rosangela Garlappi, Carla Cattaneo, Giorgio Oldani (ARAL, Associazione Regionale Allevatori Lombardia, Italy).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*
