**Genetic Diversity Assessment and Marker-Assisted Selection in Crops**

Editors

**Francesco Mercati Francesco Sunseri**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Francesco Mercati National Research Council Italy

Francesco Sunseri University Mediterranea of Reggio Calabria Italy

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Genes* (ISSN 2073-4425) (available at: https://www.mdpi.com/journal/genes/special issues/ marker-assisted selection).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-0854-2 (Hbk) ISBN 978-3-0365-0855-9 (PDF)**

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**



## **About the Editors**

**Francesco Mercati** was born in Milan, received a degree in Biological Science in 2003, and is a researcher at the Institute of Biosciences and BioResources (IBBR) UOS Palermo, CNR—National Research Council of Italy (e-mail: francesco.mercati@ibbr.cnr.it). He obtained his PhD in Biology Applied to Agri-Food and Forest systems at the University Mediterranea of Reggio Calabria. He has also been a visiting scientist at the Department of Plant Biology, University of Georgia (USA). The main research activities of Dr. Mercati are focused on biodiversity, marker assisted selection, and the study of genes associated to trait of interest. He is the author of approximately 80 scientific publications, the most important published in ISIS and Scopus indexed journals, available at https: //www.cnr.it/people/francesco.mercati.

#### Research Interest:

Genetic structure of plant biodiversity. Next-generation sequencing (NGS) technologies and their implications for plant genetics, DNA barcode, and structure population. Genotyping-by-Sequencing (GBS). Molecular Assisted Selection (MAS). Olive and grape cultivars (cultivated and wild populations) from Sicily genotyping by using different molecular markers (ISSR, AFLP, SSR). Molecular markers investigation for somatic protoplast fusion hybrids and cybrids.

**Francesco Sunseri** was born in Palermo, received a degree in Agricultural Science in 1983, and received a permanent position at the University Mediterranea as Associate Professor on Plant Genetics and Breeding in 2008. He is the leader of many projects focused on plant biodiversity conservation and exploitation, as well as molecular assisted selection on horticultural crops such as asparagus, tomato, and eggplant. A member of the Italian Society of Plant Genetics (S.I.G.A.) and the International Society Horticultural Science (I.S.H.S.). Associate Editor of BMC Plant Biology and Genes (MDPI) and occasional reviewer for several scientific journals devoted to Plant Genetics and Horticultural Science. Author of approximately 80 scientific publications indexed in ISI and Scopus databases.

#### Research Interest:

Genetic structure of plant biodiversity. Genomics and Transcriptomics. Genomic Wide Association Selection and Mapping (GWAS). Genotyping-by-Sequencing (GBS). Molecular Assisted Selection (MAS). QTL analysis and cloning. Plant abiotic stress with particular interest in Nitrogen Use Efficiency (NUE).

## *Editorial* **Genetic Diversity Assessment and Marker-Assisted Selection in Crops**

#### **Francesco Mercati 1,\* and Francesco Sunseri 2,\***


Received: 16 November 2020; Accepted: 7 December 2020; Published: 9 December 2020

Global warming is negatively impacting on crop yield and Earth's climate changes can bring possible negative effects on the growth and reproductive success of crops. Therefore, the exploitation of biodiversity is essential to select more resilient genotypes employable in more sustainable cropping systems.

The assessment of genetic diversity from the major crops and their wild relatives together with its exploitation have been always among the main challenges for plant breeding, as recently highlighted [1,2]. The wide utilization of molecular markers for mapping traits of agronomic interest in specific genomic regions appears to return back another pivotal effort for the future development of novel cultivars [3]. Indeed, the improvement of plant breeding efficacy has always gone through the construction of exotic genetic libraries, exploiting the genetic resources [4].

Nowadays, there is evidence that MAGIC and other exotic populations will play a major role in the coming years in allowing for impressive gains in plant breeding for developing new generations of improved cultivars [5].

This Special Issue focused on the application of such advanced technologies devoted to crop improvement, exploiting the available biodiversity in crops. In detail, next-generation sequencing (NGS) technologies supported the development of high-density genotyping arrays for different plants included in this issue.

By using a high throughput approach, here we report a new high-resolution eggplant (*Solanum melongena* L.) genetic map based on a RIL population and Genotyping by Sequencing (GBS) analysis by which 7249 SNPs were assigned to the 12 chromosomes spanning 2169.23 cM [6]. Afterwards, the phenotyping of the RIL population at three locations allowed us to elucidate the genetic bases of seven traits related to anthocyanin content in eggplant as well as seed vigor. Overall, between 7 and 17 QTLs (at least one major QTL) were identified for each trait [6]. Otherwise, a genome-wide association scan (GWAS) using 121 accessions and a 9K single nucleotide polymorphisms (SNPs) chip were also reported to clarify the genetic determinants underlying drought tolerance in barley (*Hordeum vulgare* L.) [7]. Overall, a total number of 101 significant SNPs, distributed over all seven barley chromosomes, were found to be highly associated with the studied traits, of which five genomic regions were associated with candidate genes at chromosomes 2 and 3 [7].

The limited availability of simple sequence repeats (SSR) in *Paeonia lactiflora*, a flowering crop with great economic value, triggered a study to develop a novel SSR panel with Illumina RNA sequencing for also assessing the role of these variants in gene regulation. The results showed that dinucleotides with AG/CT repeats were the most abundant type of repeat motif in *P. lactiflora* and were preferentially distributed in untranslated regions. Significant differences in SSR size were observed among motif types and locations [8]. This new set of SSRs will aid programs for accession identification, marker-trait association and molecular assisted breeding in *P. lactiflora* [8].

QTL-related Lethal Necrosis (LN) tolerance/resistance in maize (*Zea mays* L.) has been studied by using five hundred selected kompetitive allele specific PCR (KASP) SNPs and multiple mapping populations [9]. To understand the status of previously identified quantitative trait loci (QTL) in diverse genetic backgrounds, F3 progenies derived from seven bi-parental populations were genotyped and phenotyped under artificial LN inoculation for three seasons. Joint linkage association mapping revealed at least seven major QTL spread across the 7-biparetal populations, for resistance to LN infections potentially useful for marker-assisted breeding [9].

A particular resequencing approach was utilized for exploring the natural variation and the domestication selection of *ZmPGP1*, involved in the polar auxin transport and associated to plant height, leaf angle, yield traits, and root development in maize (*Z. mays* L.) [10]. Li et al. (2019) [10] re-sequenced this gene in 349 inbred lines, 68 landraces, and 32 teosintes. Sequence polymorphisms, nucleotide diversity, and neutral tests revealed that *ZmPGP1* might be selected during domestication and improvement processes. Marker–trait association analysis identified 11 variants significantly associated with 4 plant architecture and 5 ear traits, revealing that significant variants in *ZmPGP1* can be used to develop functional markers to improve plant architecture and ear traits in maize [10].

Another particular approach was reported for two popular fruit crops such as strawberry (*Fragaria* × *ananassa* Duchesne) and raspberry (*Rubus idaeus* L.). Here, Lebedev et al. (2020) [11] reported the potential transferability between the species of a large SSR panel for their employment in breeding programs assisted by functional DNA markers [11]. One hundred eighteen (118) microsatellite loci in the flavonoid biosynthesis were developed to assess the genetic diversity of 48 *Fragaria* and *Rubus* accessions, including wild species and rare cultivars, which differ in berry color, ploidy, and origin. SSR panel may be a useful molecular tool in strawberry and raspberry breeding programs for improvement anthocyanin related traits [11].

Informative molecular markers such as SSR were adopted also for detecting hybridity and homozygosity in breeding segregant populations in lettuce (*Lactuca sativa* L.). In this study, a panel of 16 SSR was used to genotype 71 putative parental lines and to plan 89 controlled crosses designed to maximize the genetic recombination [12]. Unexpected genotypes were detected (5%), consistent with this species' spontaneous out-pollination rate. Overall, the synergistic advantages of conventional and molecular selection applied in different steps of a breeding programs aimed at developing new varieties were demonstrated [12].

Two other manuscripts reported the usefulness of molecular analysis for analyzing germplasm collections [13,14]. In the first, an SSR panel was adopted to analyze the official Algerian olive (*Olea europea* L.) collection highlighting a biodiversity hotspot in the Mediterranean Basin [13]. The olive germplasm was characterized using 16 nuclear (nuSSR) and six chloroplast (cpSSR) microsatellites, useful to underline the presence of an exclusive genetic core represented by 13 cultivars located in a mountainous area in the North-East of Algeria, named Little Kabylie. The genetic relationship of Algerian and Mediterranean olive germplasm was assessed, suggesting possible events of secondary domestication and/or crossing and hybridization across the Mediterranean area [13]. The second manuscript described the genetic diversity in sweet potato (*Ipomea batatas* L.) genetic resources by morphological and molecular markers [14]. The EU market of this orphan crop has recently increased by 100%, and its cultivation in southern European countries is a new opportunity for the EU to exploit and introduce new genotypes. In this view, the origins of the principal Italian sweet potato clones, compared with a core collection of genotypes from Central and Southern America, were investigated by combining genetic analysis with morphological and chemical measurements [14]. Overall, these markers combination resulted as being effective to cluster the sweet potato clones in agreement with their geographical origin [14]. The last report included in this Special Issue focused on molecular markers supporting the in situ conservation of faba bean (*Vicia faba* L.) landraces in Tunisia [15]. The seed phenotypic features of the collected samples were analyzed, together with the genetic diversity and population structure, by using simple sequence repeat markers, highlighting

the genetic stability of the population under study. These findings suggested that farmers applied international best practices for the in situ conservation of agricultural crops [15].

In conclusion, the Special Issue focused on the development and application of such technologies associated with adaptation and functional crop improvement, exploiting the available biodiversity in very different crops, from vegetables and legumes (eggplant, lettuce, sweet potato and faba bean), through important cereals (barley and corn) to very important Mediterranean trees (olive). This issue has allowed a scientific journey through the use of consolidated molecular markers, such as SSR, as well as novel classes of molecular markers obtainable by the new technologies (NGS). These were applied at the genetic analysis of germplasm collections, but also to the findings of new markers and QTL for assisted breeding programs.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


15. Babay, E.; Khamassi, K.; Sabetta, W.; Miazzi, M.M.; Montemurro, C.; Pignone, D.; Danzi, D.; Finetti-Sialer, M.M.; Mangini, G. Serendipitous In Situ Conservation of Faba Bean Landraces in Tunisia: A Case Study. *Genes* **2020**, *11*, 236. [CrossRef] [PubMed]

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

## **A New Intra-Specific and High-Resolution Genetic Map of Eggplant Based on a RIL Population, and Location of QTLs Related to Plant Anthocyanin Pigmentation and Seed Vigour**

**Laura Toppino 1, Lorenzo Barchi 2,\*, Francesco Mercati 3, Nazzareno Acciarri 4, Domenico Perrone 5,6, Matteo Martina 2, Stefano Gattolin 7, Tea Sala 1, Stefano Fadda 1, Antonio Mauceri 8, Tommaso Ciriaci 4, Francesco Carimi 3, Ezio Portis 2, Francesco Sunseri 8, Sergio Lanteri <sup>2</sup> and Giuseppe Leonardo Rotino <sup>1</sup>**


Received: 8 June 2020; Accepted: 2 July 2020; Published: 4 July 2020

**Abstract:** Eggplant is the second most important solanaceous berry-producing crop after tomato. Despite mapping studies based on bi-parental progenies and GWAS approaches having been performed, an eggplant intraspecific high-resolution map is still lacking. We developed a RIL population from the intraspecific cross '305E40', (androgenetic introgressed line carrying the locus *Rfo-Sa1* conferring *Fusarium* resistance) x '67/3' (breeding line whose genome sequence was recently released). One hundred and sixty-three RILs were genotyped by a genotype-by-sequencing (GBS) approach, which allowed us to identify 10,361 polymorphic sites. Overall, 267 Gb of sequencing data were generated and ~773 M Illumina paired end (PE) reads were mapped against the reference sequence. A new linkage map was developed, including 7249 SNPs assigned to the 12 chromosomes and spanning 2169.23 cM, with iaci@liberoan average distance of 0.4 cM between adjacent markers. This was used to elucidate the genetic bases of seven traits related to anthocyanin content in different organs recorded in three locations as well as seed vigor. Overall, from 7 to 17 QTLs (at least one major QTL) were identified for each trait. These results demonstrate that our newly developed map supplies valuable information for QTL fine mapping, candidate gene identification, and the development of molecular markers for marker assisted selection (MAS) of favorable alleles.

**Keywords:** linkage map; RAD; QTL; *Solanum melongena*

#### **1. Introduction**

Eggplant (*Solanum melongena* L., 2n = 2x = 24) is a member of the Solanaceae, a large plant family comprising over 3000 species and including important crops such as tomato, potato, pepper and tobacco. Unlike most of the other solanaceous crops, which are native to the New World [1–3], eggplant has a phylogenetic uniqueness due to its exclusive Asian origin [4]. It has been reported that the species resulted from two or three independent domestication events [5,6], although a recent study suggested a single domestication event [7]. Eggplant worldwide production is estimated as about 54 Mt, with China, India and Indonesia being the major producing countries, while Egypt, Turkey and Italy represent the main producers in the Mediterranean region (FAO 2018; [8]). Breeding efforts in eggplant, like in most crops, have been focused on increasing yield, resistance/tolerance to biotic and abiotic stress, and fruit shelf-life, but also on improving some plant morphological distinguishing traits (reduced prickliness and leaf hairiness) as well as raising the content of health-promoting metabolites (e.g., anthocyanins and chlorogenic acid) or reducing the anti-nutritional content (e.g., steroidal glycoalkaloid, saponins) in the berries. Furthermore, studies have been carried out with the goal to improve seed germination and seedling emergence [9–11], which affect the crop performance.

In eggplant, several inter-specific genetic maps were developed by applying pre-next generation sequencing (NGS) techniques (RFLP, AFLP, RAPD, SSR, etc.). They were based on inter-specific crosses between cultivated *S. melongena* and *S. linnaeanum* (=*S. sodomaeum)* or *S. incanum* and used for Quantitative Trait Loci (QTL) analyses of domestication and morphological traits [12–16], as well as to locate genes involved in polyphenol biosynthesis [17] and resistance to *Verticillium* spp. [18]. Intra-specific maps were also built [19–22] and, more recently, Fukuoka et al. [23] generated two intra-specific genetic maps based on F2 populations, which were then combined into one on the basis of common markers for studying macro-syntenic relationships between eggplant and tomato, as well as for QTL analysis of parthenocarpy [24] and resistance to *Fusarium oxysporum* [25].

The advent of NGS-based marker technologies, by increasing the speed, throughput, and cost effectiveness of genotyping and providing genome-wide marker coverage, has allowed the development of the so-called 'second generation' maps. Barchi et al. [26], by applying the RAD-seq protocol from Baird et al. [27] on an intra-specific F2 population, identified 10,000 single nucleotide polymorphisms (SNPs) as well as nearly 1000 polymorphic indels, and more than 2000 SNPs were found of potential use for genotyping on the basis of a GoldenGate© assay. Afterwards, the first 'second generation genetic map' was developed [28], which included 415 markers assigned to the 12 chromosomes. The latter was used to identify the genetic bases of traits associated with anthocyanin content [28] and, more recently, for detecting QTL affecting key horticultural traits [29], fruit metabolic content [30] and resistance to soil-borne diseases [31]. Furthermore, the previously identified loci were validated, and new linked marker/trait associations were detected, through a genome-wide association (GWA) mapping approach [32,33]. Second generation intra-specific genetic maps were also generated [2] for anchoring the first draft genome sequence of eggplant and for mapping resistance QTLs to *Ralstonia* strains by SNPs developed through Illumina sequencing of the parents of a Recombinant inbreed line (RIL) mapping population as well as AFLP, SSR and SRAP markers [34].

Despite recent efforts, the linkage maps used for identifying the genetic basis of traits of breeding interest are still not saturated, hampering the fine mapping of QTL regions and the identification of candidate genes associated with the phenotypic traits. Up to now, the only available high-resolution SNP-based linkage map was developed on a F2 population from the inter-specific cross (*S. melongena* × *S. linneanum*) and was employed to highlight QTLs affecting stem height and fruit and leaf morphology [35].

We previously developed a RIL mapping population of 170 F6-F7 lines from the intra-specific cross between the breeding lines '305E40' (female parent) and '67/3' (male parent). Furthermore, the first high quality eggplant genome sequence of the breeding line '67/3' was released [36] and, through the resequencing of the female parent ('305E40') and a low coverage Illumina sequencing of each RIL, we constructed a first linkage map aimed at anchoring the scaffolds to the 12 chromosomes. The map also demonstrated efficient mapping metabolomic traits of interest related to the metabolomics composition of fruit flesh and peel [37].

The genetic basis of anthocyanin synthesis and accumulation has been widely studied in the Solanaceae [38–43]. In the last decade, QTL-related studies using family-based or GWA mapping approaches allowed us to shed light on the genetic bases of anthocyanin distribution in eggplant as well as to identify its syntenic relationships with tomato [28,30,32,44]. By contrast, no information is available on QTLs controlling seed vigor in term of speed of seedling emergence, which diversifies the parents of our RIL mapping population. Here, we propose a more breeder-friendly map developed through a genotype-by-sequencing (GBS) approach with our RIL mapping population, whose reliability for mapping studies has been proved by identifying QTLs related to plant anthocyanin pigmentation and seed vigor.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

A population of 163 F7 plants, previously obtained by the single seed descent approach from a cross between eggplant lines '305E40' and '67/3' [28,36], was employed. The two parental lines were contrasting for a wide number of key agronomic and metabolic traits [28–31], as well as for their seed vigor. The '305E40' line (female parent) is a double haploid derived from the inter-specific somatic hybrid [*Solanum aethiopicum* gr. gilo(+)*S. melongena* cv. Dourga], which was repeatedly backcrossed with the recurrent eggplant genotypes (lines DR2 and Tal1/1) prior to selfing and anther culture. This line carries the locus *Rfo-sa1* from *S. aethiopicum*, which confers complete resistance to the soil-borne fungus *Fusarium oxysporum f. sp. melongenae* (*Fom*) [45] and is partially resistant to *Verticillium dahliae* [31]. Plants of '305E40' display a slight anthocyanin overall pigmentation, produces pink flowers and long, highly pigmented dark purple fruits characterized by the presence of the anthocyanin delphinidin-3-rutinoside (D3R) as well as a higher glycoalkaloids and organic acid content than the ones of '67/3' [28–30]. The '67/3' line is an F8 selection from the intra-specific cross cv. 'Purpura' × cv. 'CIN2', which lacks the *Rfo-sa1* locus and is fully susceptible to *Verticillium* [31]. Its plants display higher anthocyanin pigmentation than '305E40' in leaves and stems and produce violet flowers and round, violet colored fruits with white peel colour both under and next to the calyx. The fruits are characterized by the presence of the anthocyanin nasunin in the peel, higher soluble solids, sugars and chlorogenic acid content in the flesh compared to 305E40.

The mapping population was sown, along with both parents and the F1 hybrid, in glasshouses at Montanaso Lombardo in 2012. The seeds were sown in plastic trays consisting of 104 holes (8 rows of 13 holes each) filled with peat and placed over an electric warmed carpet at 24 ◦C. For each RIL, we sowed 52 seeds split in two replicates of 26 seeds. Each replicate was sown in two adjacent randomly chosen rows containing 13 holes of the replicate-specific tray, using a single seed per hole; each replicate was kept under the same conditions but in a different glasshouse at Montanaso Lombardo. All plantlets were grown in heated glasshouses (minimum temperature of 15 ◦C ensured) until the 3rd–4th leaf, and then were transplanted in three field trials in northern (Montanaso Lombardo, ML, 45◦20 N, 9◦26 E), central (Monsampolo del Tronto, MT, 42◦53 N,13◦47 E) and southern (Battipaglia, BP, 40◦36 N, 14◦59 E) Italy. Mulched twin rows of 1.1 m width were arranged using plastic black PE (0.05 mm), and plantlets were transplanted at 45 cm between each other along the rows. A drip irrigation system was employed for watering and fertilizing, and local standard horticultural practices were applied. In each site, the material was transplanted in the field according to a randomized block design (3 replicate blocks, 4 plants per block) to score the phenotypic traits.

#### *2.2. Library Construction and Sequencing*

DNA from the RIL population, parental lines and F1 hybrid was extracted following a modified CTAB method [46] as indicated elsewhere [47]. Library construction was performed as proposed by Acquadro et al. [48] by using a HindIII-MseI enzyme combination and adding a final biotin/streptavidin-coated beads-based purification step. Quality, quantity and reproducibility of libraries were assessed with a Bioanalyzer instrument (DNA High Sensitivity chip) as well as qPCR using KAPA SYBR FAST Universal 2X qPCR Master Mix (Kapa Biosystems, Boston, MA, USA). On the basis of the quantitation, DNA libraries were pooled and sequenced on Illumina HiSeq 2500 platform (Illumina Inc., San Diego, CA, USA), following the manufacturer protocol using 150 PE chemistry at Biodiversa srl (Rovereto (TN), Italy).

#### *2.3. Sequence Analysis and Map Construction*

Raw reads were analyzed with Scythe (https://github.com/vsbuffalo/scythe) for filtering out contaminant substrings, and Sickle (https://github.com/najoshi/sickle), for removing reads with poor quality ends (Q < 30). Illumina reads were de-multiplexed on the basis of the Illumina TruSeq index using Stacks process rad tags. Alignment to the reference eggplant genome [36] was carried out using the Burrows-Wheeler Aligner (BWA) aligner [49] (i.e., mem command) with default parameters and avoiding multiple-mapping reads. BAM files were processed and use for the SNP calling using bcftools mpileup/call/norm utilities [50] with default parameters, except for the use of multiallelic calling model (-m option), minimum mapping quality (Q = 20) and filtering out multimapping events (−q > 1). Only SNPs with at most 20% of missing data and a mean-minDP of 20 were retained for linkage analyses. Polymorphic markers were grouped in linkage groups with "R/qtl" package [51], with minimum LOD = 8, rec ≤ 0.15. For each linkage group identified, identical loci were removed with the Exclude identical function and the remaining loci were ordered with Joinmap software (version 4, [52]), using a LOD of 8 and the Kosambi function to estimate distance and the Maximum-likelihood function to infer correct order. Markers exhibiting segregation distortion were identified applying the chi-square (X2)-goodness-of-fit test (*p* < 0.001) and also integrated into the map. The ordering step was iterated several times, each time by correcting genotype calls with the "SMOOTH.pl" script, which is a Perl implementation of the SMOOTH software, as developed by van Os et al. [53]. Finally, visual inspection of genotypes was applied to identify and correct the remaining genotype errors. Linkage groups were visualized in MapChart (version 2.32, [54]).

#### *2.4. Phenotypic Traits Evaluation*

The speed of emergence and hypocotyl anthocyanin distribution traits were assessed on 56 plantlets of each RIL, of the two parental lines and their F1 hybrids, which were obtained from as many seeds sown as previously described in Plant material.

The speed of emergence index (*sei*) was evaluated as the time, expressed in days, needed for the emergence of 50% + 1 plantlets from the soil. Hypocotyl anthocyanin (*hyan*) trait was assessed on plantlets at the second–third leaf stage (Figure 1d) according to a 0–5 scale, with "0" representing no visible anthocyanin coloration (i.e., completely green tissues) and "5" representing complete dark violet coloration.

**Figure 1.** Phenotypic trait evaluation. (**a**) *adlan* (range from 0 to 5); (**b**) *lvean* (0–5); (**c**) *stean* (1,3,5); (**d**) *hyan* (0, 5); (**e**) *corcol* (clockwise from top-right: light violet, light pink, dark pink, dark violet); (**f**) *flian* (1,3,5); (**g**) *toan* (violet-reddish); (**h**) *sei* (picture at 10 days after sowing).

In all the three field trials (ML, MT and BT), the material was arranged as a set of three randomized complete blocks with 4 replicate plants per entry per block. Phenotyping was based on the European Cooperative program for Plant Genetic Resource descriptors panel for Solanaceae (ECPGR, 2008) and the International Board for Plant Genetic Resource descriptors for eggplant (IBPGR, 1990). The traits assayed, reported in Figure 1 and detailed in Table 1, were: adaxial leaf lamina anthocyanin (*adlan*), corolla colour (*corcol*), flower anthocyanin intensity (*flian*), hypocotyl anthocyanin (*hyan*), leaf venation anthocyanin (*lvean*), stem anthocyanin (*stean*), anthocyanin tonality (*toan*).

The anthocyanin content of stems, leaves and flower calyxes was scored on a 0 (no visible coloration) to 5 (complete dark violet coloration) scale. Anthocyanin content in leaves and leaf venation was evaluated on 4 leaves per RIL in each block, chosen in the upper middle part of the plant. Stem anthocyanin content was measured as an average value based on 3 stems per block. Flower anthocyanin intensity resulted from averaging 5 flowers per block. Anthocyanin tonality was scored as "1" reddish, "3" intermediate, or "5" violet. Finally, for the corolla colour, the trait was coded as "1", pink; "2", dark pink; "3", light violet; "4", violet-pink and "5", violet.



#### *2.5. Statistical Analyses and QTL Detection*

Statistical analyses were performed using R software [55]. A conventional analysis of variance was applied to estimate genotype and environment effects based on the linear model Yij = μ + gi + bj + eij, where μ, g, b and e represent, respectively, the overall mean, the genotypic effect, the block effect and the error. Broad-sense heritability values were given by σ<sup>2</sup> G/([σ<sup>2</sup> <sup>G</sup> + σ<sup>2</sup> E]/n), where "σ2G" represents the genetic variance, "σ2E" the residual variance and "n" the number of blocks. Correlations between traits were estimated using the Spearman coefficient, and normality, kurtosis and skewness were assessed with the Shapiro–Wilks test (α = 0.05). Segregation was considered as transgressive when at least one individual RIL recorded a trait value higher or lower by at least two standard deviations than the higher or lower scoring parental line. QTL detection was performed considering each location independently and was based on the newly developed map using MQM [56,57] mapping, as implemented in MapQTL v4 software [58]. QTLs were initially identified using interval mapping, after which one linked marker per putative QTL was treated as a co-factor in the approximate multiple QTL model. Co-factor selection and MQM analysis were repeated until no new QTL could be identified. LOD thresholds for declaring a QTL to be significant at the 5% genome-wide probability level were established empirically by applying 1000 permutations per trait [59]. Additive and dominance genetic effects, as well as the percentage of the phenotypic variation (PVE) explained by each QTL, were obtained from the final multiple QTL model. Individual QTLs were prefixed by a trait abbreviation, followed by the relevant chromosome designation—BT, ML or MT—which was added as a suffix when a QTL was expressed in a site-specific manner. Confidence interval of the QTL was calculated at a LODmax−1 interval or at least by considering 0.5Mb upstream and downstream (if not explicitly reported in the text) the marker identified at the QTL. CMplot was used for drawing QTL results [60]. No site-specific suffix was added to the *hyan* and *sei* QTLs, as these two traits were assessed in a single environment.

#### **3. Results**

#### *3.1. Sequencing and Linkage Map Construction*

A total of 855 million paired-ends (PE) reads were produced, corresponding to about 257 Gb of data. After demultiplexing, cleaning and trimming, a total of 745 M Illumina PE reads were retained, corresponding to an average number of PE reads per sample of 4.48 M, with a standard deviation of 2.87 M (Figure S1). The sequence data were deposited into NCBI Short Read Archive under the Bioproject PRJNA635547.

Reads were then aligned to the reference eggplant genome [36]; close to 100% of reads were successfully mapped to single regions (no multiple mapping was permitted). A total of 10,316 polymorphic sites (i.e., markers) were identified after SNP calling using conservative filtering parameters, and were used for mapping purposes by applying a combination of R/qtl and Joinmap. At first, all markers were fed to R/qtl for linkage group identification and eventually ordered with Joinmap. Overall, 7249 markers were successfully retained and assigned to the 12 linkage groups (LG) corresponding to the haploid chromosome number of the species (Figure 2 and Figure S2**)**. A total of 1744 markers showed segregation distortion with *p* < 0.001, covering about 24% of the total mapped markers. Chromosome E02 showed the largest segregation distortion for 1423 markers, followed by E05, with 72 markers (Table 2).

**Figure 2.** Eggplant linkage map depicting the size of the chromosomes and markers distribution. Marker names and map distances (in cM) are detailed in Figure S2.

The marker names, chromosome, and genetic position of all markers on the map and RILs haplotypes are included in Table S1. The linkage map spans 2169.23 cM (Table 2), with E02 being the longest (326 cM) chromosome and containing the highest number of markers (i.e., 1454), while E09 was the shortest (107 cM) and contained 230 markers. Some markers belonging to contigs previously assigned to CH0 [36] were mapped to the 12 LGs, of which the majority were mapped on E06, while just 47 were mapped to other chromosomes. The genome-wide mean inter-locus separation was reduced (mean of 0.4 cM), with the highest value (0.7 cM) for E05 and E10. The mapping of the SNP markers on the eggplant genome sequence revealed a total coverage of 95.88% (1095.72 Mb) of the diploid genome (1142.80 Mb). The genome-wide mean inter-locus separation was 216.5 kb, with the highest value (411.2 kb) in E04. The largest gap was observed on E12 (22.45 cM), while almost all gaps were less than 5 cM. The recombination rate of different chromosomes was estimated as the quotient between the genetic distance (cM) covered by the corresponding LG and the size in Mb of the chromosome fragment covered with markers. This value ranged from 0.81 cM/Mb on E07 to 3.91 cM/Mb on E02.

To evaluate the quality of the map obtained, we used heat maps of recombinant frequency, which highlighted that the mapped markers were ordered correctly, as the pair-wise recombination rates were noticeably low between adjacent markers (diagonal distribution of the yellow color indicates the lowest recombination rate) in the heat map for each chromosome, except E02, E06, E08 and E12 (Figure S3). Similar results were obtained when collinearity between the genetic distances of mapped SNP on each linkage group and their corresponding physical position on the eggplant chromosomes was spotted (Figure S4).

#### *3.2. Phenotypic Variation and Inter-Trait Correlations*

A summary of the phenotypic performance for each trait in the parental lines, hybrid F1 and RILs, together with the skewness, kurtosis, broad sense heritability (*h2BS*) values and presence of transgressive genotypes for each trait, are listed in Table 2. As expected, the parental lines contrasted for each trait. The female line '305E40' had a delayed emergence from the soil compared to '67/3', as evidenced by its higher *sei* (6 days). The '305E40' line showed lower anthocyanin content in leaves, leaf venations, flower calyxes and stems; its hypocotyl was characterized by a reddish tonality and produced flowers with a pink corolla. Conversely, line '67/3' produced violet flowers. The F1 hybrid's phenotype was intermediate between the two parents for *adlan*, *toan* and *hyan*. For the remaining traits, the F1 hybrid was more similar to '67/3' in all environments. Transgressive genotypes among the RILs were limited and only toward '305E40'—more precisely, three RILs for *steanBT,* two RILs for *lveanML* and *adlanBT*) and one RIL for *lveanBT*, and *hyan*. An exception was the speed of plant emergence, which was delayed in 44 RILs with respect to the late female parent '305E40'. The *h2* was overall high, ranging from 0.86 (*lveanBT*) to 0.98 (*sei* and *hyan*) (Table 2).

Significant inter-trait correlations were detected within and across locations (Table 3), and the same traits appeared to be highly correlated in the three locations. No significant correlation was detected for *sei* with other traits as well as between *adlanMT* and traits such as *corcolBT*, *corcolML* or *hyan*. The least correlated traits were *adlan* with *hyan*, *toan* and *corcol* (in all environments), while the most highly correlated were *corcol* and *toan* in both the BT (+0.92), and ML (+0.91) environments.



a,b

14

*flianMT*

*sei* *hyan* *lveanBT* *lveanML* *lveanMT* *steanBT* *steanML* *steanMT*

*toanBT*

0.02

0.44 0.09

 0.07 0.52

0.43 0.83

0.59 0.52

0.78 0.54

0.59 0.88

0.88 0.91

0.59 0.62

0.65

0.84

0.62

0.70

0.68

0.62

0.35

0.36

0.83

0.83

0.55

0.57

0.83

0.82

0.80

0.52

0.50

0.27

0.82

0.78 −0.01

−0.02

 0.05 0.65

0.53

0.61

0.77

0.80 −0.01

 0.00 0.54

0.66

0.65

 0.07

0.80

0.45

0.46

−0.01

#### *3.3. QTL Analysis*

LOD score, percentage of variance explained (PVE), and confidence interval (CI) related to QTLs, are described in Table 4. QTL analyses on all traits and environments yielded a total of 23 major (PVE values >10%) and 11 minor QTLs (Figure 3).

QTL analyses for both *hyan* and *sei,* which were evaluated in a single location, highlighted one major and two minor QTLs. Separate analyses performed on each location for *adlan*, *lvean*, *stean*, *corcol* and *toan* resulted in the identification of a ratio between major and minor QTLs of 8/2, 7/4 and 6/1 in BT, ML and MT, respectively. The majority of QTLs identified can be considered stable, as they had the same genomic position across the three locations, with the exception of the minor QTLs *stean2.1*, confirmed both in MT and ML, but not in BT, and *corcol10.1*, demonstrating a major QTL in BT and a minor QTL in ML, and not detected in MT. Moreover, the major QTL *corcol5.1*, detected both in BT and ML, was mapped in a different position of E05 and was found as playing a minor effect in MT. For the anthocyanin-related QTLs, the positive alleles responsible for an increase in the anthocyanin content and for the presence of a violet vs. reddish pigmentation derived from '67/3'. Concerning the speed of emergence index, for all the QTLs but *sei2.2*, the allele increasing seed vigor derived from '67/3'. The largest single QTL effect was associated with *flian10.1\_MT* (69.3% of the PVE). The additive effects of all the QTLs were significant at *p* < 0.05.

All the identified QTLs were distributed over five chromosomes (Table 4), namely E02, E04, E05, E07, E10, and two evident clusters of QTLs were detected (Figure 3). One is located on E05, which also harbors two adjacent sub-clusters of QTLs conserved in the three locations. The former is comprised between 62.45 to 66.5 cM, and contained QTLs for *stean* and *toan*; the other is at 75.48 cM, and included coincident QTLs for *hyan, toan* and *corcol*. The second main cluster is on E10 at 231.77–236.98 cM and included the major QTLs *stean10.1*, *lvean10.1, adlan10.1* and *flian10.1*, and the minor QTLs *hyan10.1, corcol10.1,* and *toan10.1.*

**Figure 3.** QTLs identified for the traits in study. Blue dots represent markers within the confidence interval of the QTL (LODmax−1 interval), with LOD values plotted against genome locations. Red lines in the Manhattan plots indicate LOD significance threshold.



from

#### 3.3.1. QTL Affecting Plant Pigmentation in Eggplant

#### Adaxial Leaf Lamina Anthocyanin (*adlan*)

A single major QTL, *adlan10.1,* was mapped on E10 at 236.987 cM next to the marker CH10\_95003635 in all the three environments, and explains 21% of PVE in BT, 20.4% in ML and 17.7% in MT.

#### Stem Anthocyanins (*stean*)

Two major QTLs on E05 and E10 were detected in all the environments. The QTL *stean10.1* on E10 explains from 40% to 48% of the PVE and is located at 231.46 cM (proximal marker: CH10\_94779014). The second major QTL *stean5.1*, explains from 13% to 21% of the PVE and maps on E05, within the same confidence interval (62.45–66.51 cM) in all locations (proximal marker: CH05\_36124744). A third minor QTL was spotted in ML and MT, but not in BT, on E02 at 88.73 cM (proximal marker: CH02\_30555633), and explains 6% and 9% of PVE, respectively.

#### Leaf Venation Anthocyanins (*lvean*)

One major QTL determining anthocyanin pigmentation in leaf venation (*lvean10.1*), explaining from 29% to 64% of PVE, was mapped on E10 in all locations at 231.46 cM (proximal marker: CH10\_94779014).

#### Flower Anthocyanin Intensity (*flian*)

The unique major QTL *flian10.1* mapped, in all environments, on E10 at 231.46 cM (proximal marker: CH10\_94779014), explains from 59% to 69% of the PVE.

#### Anthocyanin Tonality (*toan*)

Data for *toan* were only available for BT and ML environments. A major QTL (*toan5.1)* explaining 36% and 47% in BT and ML, respectively, was spotted on E05 at 75.48 (proximal marker: CH05\_37533757). A minor QTL (*toan5.2*) was identified on E05 within the same CI in both locations at 61.55–66.39 cM. A third QTL with a minor effect (PVE explained from 6.8% to 9.3%) was spotted on E10 at 231.46 cM.

#### Hypocotyl Anthocyanins (*hyan*)

The QTL analysis for this trait was performed with data from one environment. A major and two minor QTLs were spotted on E05, E07 and E10, respectively. The largest effect locus (*hyan5.1*) explains 48% of PVE and is located on E05 at 75.48 cM (proximal marker: CH05\_37533757). A minor QTL (*hyan7.1*), explaining 6% of the PVE, maps on E07 at 83.95 cM (proximal marker: CH07\_132671839). The QTL with the lower effect (4.5% PVE explained) was located on E10 at 231.46 cM (proximal marker: CH10\_94779014).

#### Corolla Colour (*corcol*)

A major QTL (*corcol5.1*), explaining from 40% to 44% of the PVE, was spotted on E05 at 75.48 cM (proximal marker: CH05\_37533757) in BT and ML, while *corcol5.2* was located at 39.40 cM (proximal marker CH05\_17086140) and explains 19% of PVE in MT. A minor QTL (*corcol10.1*), explaining 7.7%−11.1% of the PVE, was spotted on E10 in BT and ML (not in MT) in the same position at 232.77 cM (proximal marker: CH10\_94275882).

#### 3.3.2. QTL Affecting Speed of Plant Emergence Index

On E02, we mapped the largest effect locus *(sei2.1*), explaining 10.4% of PVE and located at 204.18 cM (proximal marker: CH02\_63996392), together with a minor QTL (*sei 2.2*), at 176 cM (proximal marker: CH02\_54633733), which explains 8% of the PVE. A further minor QTL was located on E04, at 108.43 cM (proximal marker: CH04\_102121728), which explains 10% of PVE.

#### *3.4. Candidate Genes Identification*

To find out candidate genes at the identified QTLs in the confidence interval region, we exploited the annotation of the available eggplant genome sequence by searching for genes, including transcription factors, putatively involved in the genetic control of the traits in study. For each QTL, the position, best candidate genes ID, acronym (abbreviation) and predicted function are reported in Table 5.




#### **Table 5.** *Cont.*

#### *sei2.1*

The major QTL associated to *sei* lies on E02 at 204.18 cM and includes four markers in the confidence interval, whose physical positions are quite distant, i.e., 48–54 Mb and 9.2 Mb. The top-linked marker is located at around 54 Mb, in a region containing 15 genes, among which a Pectinesterase 2 and SNL2, a protein involved in response to hormonal stimulus, appeared to be good candidates. The region at 9.2 Mb contains two colocalizing markers: in the interval around 0.3Mb, five genes are present, including two putative NHL6, putatively involved in the response to abscisic acid (ABA).

#### *sei2.2*

The minor QTL *sei2.2* lies on E02 at 176 cM, in a region of six co-localizing markers located at 60–63 Mb. This region contains 70 genes, some of which may be eligible as candidates, such as the ones associated to the response to abscisic acid (GRDP1 and CAR2), two polyol transporters, a TCP1 and peptide methionine sulfoxide reductase. In the region beneath *sei2.1* and *sei2.2*, a laccase, a zinc finger and a G-box-binding factor 1 putatively involved in seed germination are also present.

#### *sei4.1*

In the interval around 0.5 Mb from *sei4.1,* several Serpins-ZX and NRT1/PTR genes protein were identified, as well as an Ent-kaurene oxidase and a B3 domain protein, both involved in gibberellin synthesis. By slightly relaxing the confidence interval (until around 1 M), other genes involved in the signaling pathway of ABA degradation (among which an abscisic acid hydroxylase) were found.

#### *stean10.1, lvean10.1, flian10.1, hyan10.1*

The major QTLs *stean10.1, lvean10.1, flian10.1,* identified at 231.47 cM on E10 in the three environments, as well as the minor QTL *hyan10*.1, lie at around 94.7 Mb, in the top part of the cluster of QTLs associated to anthocyanin amount/coloration intensity and very close to the QTLs for *corcol* and *toan*. In this region, 17 genetic markers are co-segregating, corresponding to a physical interval comprised between 94.54 and 94.88 Mb. The region includes 19 annotated genes, among which a BES1/BRZ1 transcription factor, a DREB2C (dehydration-responsive element-binding protein 2C), an Ankyrin repeat-containing protein and a PPC6-1 (putative protein phosphatase 2C) are eligible as candidates involved in the anthocyanin synthesis. By slightly relaxing the confidence interval, three proteins annotated as regulatory-associated protein of TOR (RAPTOR) are also present.

#### *corcol10.1-toan10.1*

The QTLs *corcol10.1* and L *toan10.1*, both identified in BT and ML in the cluster on E10 at 232.77cM lean in a region located on E10 containing ten co-localizing markers and physically located at 93.8-94.2 Mb. In this region, the most promising candidate genes are a putative MYB family transcription factor (SMEL\_010g352790) and a phosphoinositide phosphatase, SAC8 (SMEL\_010g352650), a new class of phosphatase playing a role in vacuolar trafficking. By slightly expanding the region of interest, five genes (SMEL\_010g352310, SMEL\_010g352320, SMEL\_010g352330, SMEL\_010g352490 and SMEL\_010g352500) were identified and annotated as having predicted 2-oxoglutarate/Fe(II)-dependent dioxygenase activity (ANS). In the same region, three sequences with homology with regulatory-associated protein of TOR (RAPTOR) are localized.

#### *adlan10.1*

The major QTL *adlan10.1* lies at 236.99 cM on E10, physically located at 94.9–95.08 Mb. A total of seven genes were identified as candidate, five of which were annotated as peroxidases or protein disulfide-isomerases.

#### *stean5.1, toan5.2*

The major QTL *stean5.1* identified in the three environments as well as *toan5.2*, specific to BT and ML, underlined a region on E5 in the interval 36.0–36.3 Mb, containing twelve genes. Among them, a class II heat shock protein and a putative BHLH could represent a good candidate. By increasing the confidence interval to the physical position of the marker 3311\_PstI\_L361, the candidate SMEL\_005g236240, annotated as an acetyl-CoA-benzylalcohol acetyltransferase, was found.

#### *toan5.1-corcol5.1-hyan5.1*

The major QTL *toan5.1* as well as *corcol5.1* identified in ML and BT and *hyan5.1* in ML lean on E05, in a small region at ~37.5 Mb. Among the seven annotated genes which lie in this interval, two genes annotated as BKI1 (BRI1 kinase inhibitor 1), are eligible as the best candidate. By slightly increasing the region analyzed, two scopoletin glucosyltransferases, as well as a cluster of three genes annotated as encoding cytochrome P450, two zinc fingers and a BHLH-like protein were spotted.

#### *hyan7.1*

The minor QTL for *hyan7.1* lies on E07 at 83.94 cM, whose confidence interval physical region spanned between 132.76 and 134.02 Mb. More than 80 annotated genes were found, among which the most interesting are a cluster of putative candidates annotated as similar to MYB14−15−58−102, a PPC6−6 and a F-box/kelch-repeat protein.

#### *stean2.1*

The minor QTL *tean2.1* lies on E02 at about 30.5 Mbp. A total of five annotated genes were spotted, among which a L10 interacting MYB domain protein, a Heat shock 70 kDa protein and an AKRP—ankyrin repeat domain-containing protein—are eligible as possible candidates.

#### **4. Discussion**

#### *4.1. Genetic Map Construction and Phenotyping*

Studies on eggplant genome organization have received an increasing amount of interest in the last few decades, turning it from a "genomic orphan species" to a crop with a high-quality genomic sequence available [36]. As in several other crops, the low level of polymorphism within the cultivated eggplant germplasm required huge efforts in detecting markers exploitable for linkage mapping purposes [61,62].

Several "first generation" inter-specific and intra-specific maps were developed, with the former exploiting a higher genetic polymorphism, but being of minor relevance for marker-assisted breeding [63]. QTLs associated to morphological and plant production traits, as well as parthenocarpy, resistances to fungal and bacterial wilts were identified through bi-parental approaches and genome-wide association (GWA) studies on the basis of the available linkage maps.

An intraspecific F2 segregating population, obtained from the same cross from which we developed the RIL population in the present study, proved to be a highly efficient tool for the detection of more than 140 QTLs associated to leaf, flower, plant and fruit traits, fruit biochemical composition and resistances to fungal wilts [22,28–31]. Our F7 RIL population was also previously used for anchoring the "67/3" eggplant genome sequence [36]. Here, we exploited the GBS-derived approach, as applied by Acquadro et al. [64] in eggplant, to develop a new high density linkage map including 7249 SNPs assigned to the 12 chromosomes and spanning 2169.23 cM. Its genetic length is longer than the previously published intra-specific maps [14,22,24,28], as well as the one recently made available by Salgon et al. [34], which spans about 1500 cM and includes 1170 markers.

The newly created map clearly represents a step forward compared to the one we developed for anchoring the genome [36]. This was generated using markers derived from an imputation-based method following low-coverage sequencing of the same mapping population. The pipeline took windows containing 100 SNPs along scaffolds to convert them into genetic markers, which were actually based on the haplotype of 100 SNPs.

This approach was useful in anchoring the scaffolds to pseudomolecules, but some drawbacks are present, especially for QTL analysis and candidate gene identification. Indeed, this contained three chromosomal locations (E02, E08 and E11), which were split into two different portions. The newly developed map actually includes 12 chromosomes, which in turn may increase the efficiency in identifying candidate regions during QTL analysis. Furthermore, the map developed in this study contains a slightly higher number of gaps shorter than 5cM than the previous one, but some chromosomes have a larger gap. On the other hand, the newly created map is shorter (~500 cM) and contains more markers (1285) (Table S2) than the previous one, resulting in a more dense and saturated map.

Finally, in the map used for anchoring the genome sequence, a marker is based on a window of 100 SNPs, whose size is dependent on the polymorphic level of that specific chromosome regions and whose coordinates in the genome are not well defined. On the contrary, in the new map, each marker is based on a single SNP, allowing us to know the precise position of each marker in the genome and make it possible for a breeder to identify the genes located in a QTL region on the basis the available annotation. Furthermore, GBS data provide information on the SNPs that generated the markers, which is of utility for more targeted analysis.

Our map contains about 24% distorted markers, presumably as a result of the genetic distance between the parents as well as possible preferential or gametic/zygotic selection occurring during the development of our RIL population. However, we included these markers in order to increase the genomic coverage of the genetic map, which reached about 96% of the physical sequence. Indeed, if properly handled, these markers do not cause detrimental effects and increase the potential of QTL mapping, as previously reported [65,66]. The breeding line '305E40', used as the female parent, contains scattered introgressed regions from *S. aethiopicum*, with a large portion on chromosome E02, which includes the locus Rfo-sa1 [29,45]. This may justify the reduced recombination observed not only on E02, but also E09 and E12.

All the anthocyanin-related traits showed a high *h2 BS* (lowest value of 86% for *lvean\_BT*) with a high correlation of their phenotypic value among different environments. Similar results were previously reported for some of the traits in the study in the F2 population developed from the same cross [28,30]. Transgressive genotypes were infrequent and always deviated towards the less pigmented parent '305E40'. The parental line '305E40' produced less vigorous seeds, but about 25% of the RIL population showed a further reduction in the speed of seedling emergence; this is presumably also because we were not able to identify all the QTLs affecting this trait, implying that some other QTLs still remain to be identified.

Conventionally, a 'major' QTL is defined as such when, in addition to justifying a PVE greater than 10% [67], it is conserved in multiple seasons/locations [68–70]. We identified at least one major QTL for all the traits in study including *hyan* and *sei*, although in this case the traits were evaluated only in one environment.

#### *4.2. QTLs and Underlying Candidate Genes*

The least and the most convincing LOD scores associated with the major QTLs were 3.89 (*sei2.2*) and 37.70 (*flian10.1\_MT*), respectively. The explained PVE varied from 10.4% (*sei2.1*) up to 69.3% (*flian10.1\_MT*), and most of the identified QTLs were stable across two/three environments, making them potentially useful for marker-assisted selection. On the other hand, some QTLs were identified in just one *(corcol5.2\_MT*) or two (like *corcol5.1* or *stean2.1*) environments. This suggests a strong environmental effect on their expression, but also that the rather limited genetic variation in the mapping population did not allow us to fully dissect the genetic bases of these traits [71].

#### 4.2.1. Seed Emergency Index

Seed germination is the switch from a dormant embryonic state to a highly active phase of growing. It is also defined as the sum of events that begin with seed imbibition and culminate in the emergence of the embryonic axis (usually the radicle) from the seed coat [72,73]. This progression is controlled by several internal factors, such as auxins, abscisic acid (ABA), cytokinins, ethylene, and GA content and balance, as well as environmental factors that include water availability, temperature, and light [74]. Eggplant, as with most of the *Solanum* species, is mainly propagated by seeds, whose vigor influences their germination and seedling emergence performance. Seed dormancy, low uniformity and poor germination rate have been documented in many eggplant accessions as well as in wild and allied species, including those employed as rootstock or those useful for introgression breeding [11,75–79].

Here, we reported, for the first time, three QTLs associated to seed vigor in eggplant, assessed by evaluating the speed of emergence index. Only one major QTL was spotted, suggesting that other key regions controlling this trait are still to be identified.

However, interestingly, both a major and minor QTL (i.e., *sei2.1* and *sei2.2*) were detected on chromosome E02, with *sei2.1* inherited from the female parent '305E40'. As previously pointed out, this breeding line harbors on chromosome E02 an introgressed fragment from *S. aethiopicum*, [28,36], associated to the *Fusarium oxysporum* resistance locus *Rfo-sa1* [45]. Thus, we could speculate that the introgressed portion of *S. aethiopicum* genome might be also involved in the genetic control of this trait, as this allied species usually displays a delayed germination with respect to eggplant.

The introgressed region may also be responsible of an inaccurate positioning of the genomic sequences in this region, and, consequently, in a reduction in the QTL mapping efficiency. Indeed, the candidate genes encompassing *sei2.1* are physically located both at 48–54 Mb and 9.2 Mb on E02. In the first large region, a paired amphipathic helix protein Sin3-like 2 could be a good candidate gene as it belongs to a class of proteins, involved in the response to hormonal stimuli and in the seed dormancy breakdown [80], while the NHL6 genes identified at 9 Mb have been reported to play an important role in the abiotic stress-induced abscisic acid (ABA) signaling and biosynthesis, acting as positive regulator of ABA-mediated seed germination inhibition [81].

*Sei2.2* overlies some candidates annotated as similar to previously described genes involved in the ABA response and seed dormancy breakdown: two membrane C2-domain abscisic acid-related proteins (CAR2) [82] and a GRDP1—glycine-rich domain-containing protein 1 [83]. Furthermore, two TCP1 encoding genes and a peptide methionine sulfoxide reductase should also be considered as similar genes are involved in the repair mechanism during seed dormancy release in *Arabidopsis* and the increase in *M. truncatula* seed longevity by reducing the protein oxidation damage [84,85], respectively.

Several Serpins-ZX and NRT1/ PTR coding-genes were identified in the *sei4.1* region. The Serpin gene family has gained attention in wheat and barley for its role in grain development, but these genes could play a possible role in the mobilization of sugars during germination by enhancing β-amylase enzymatic activity and preventing β-amylase aggregation during oxidative stress [86]. The cluster of NPF6/NRT1–1 nitrate or di/tripeptide transporters, also spotted in *sei4.1*, are potentially involved in nitrate sensing and signaling [87,88], and genes belonging to this class are reported to regulate seed development, germination and dormancy cycling in fava bean and *Arabidopsis* [89–91], with a possible involvement in the ABA transport [92]. Nitrate itself is reported as a signal molecule that controls several aspects of plant development including seed dormancy, as higher nitrate accumulation in mother plants leads to lower seed dormancy [93].

Good candidates for future studies might also be other genes in the same region encoding an Ent-kaurene oxidase and two B3 domain proteins, all involved in GA biosynthesis: the latter are regulating factors of the pathway in which the former is a key enzyme, and it is reported that suppressive mutations in the coding region of both genes cause a delay in seed germination and seedling development [94,95].

#### 4.2.2. Anthocyanins

Anthocyanins are among the most represented flavonoid compounds in plants and are responsible for the pigmentation of many flowers and fruits. They have an essential eco-physiological role in attracting pollinators and seed dispersers [96,97] and are also implicated in the response against biotic and abiotic stresses [98,99]

The genetic control of anthocyanin formation, distribution and accumulation has been widely studied in Solanaceae species [38–43,100,101]. This was long thought to be a complex trait in eggplant, involving several loci with assumed epistatic interactions and/or pleiotropic effects [102,103]. More recently, QTL-related studies allowed us to identify the chromosome regions involved in anthocyanin distribution in eggplant tissues and organs, highlighting their synteny with tomato [28,30,32,44] and, thanks to the recent availability of an high quality eggplant genome sequence coupled with metabolomic analyses [37], allowed us to localize putative candidate genes.

As previously observed [28,30,32], our ultra-high density genetic linkage map confirmed that the clusters on E10 and E05 are involved in the pigmentation of eggplant tissues, which may be associated with two different aspects of the anthocyanin synthesis among tissues, but likely control different processes linked to anthocyanin accumulation in diverse tissues. Indeed, the cluster on E10 is mainly prominent for anthocyanin production and accumulation in the vegetative plant organs, except in the corolla of the flower, whose pigmentation is governed by the major QTL on E5 with a smaller contribution by a minor QTL on E10. Conversely, the cluster on E05 contains QTLs more likely associated with the anthocyanin tonality (in flower, with *frucol*, and in general in all the vegetative tissues, represented by *toan*) and with the accumulation of anthocyanins in hypocotyl (*hyan).* Although the phenotypic data available for *hyan* were only collected in one environment, the combined results with *stean* QTLs seem to suggest that both 5 and 10 are involved, but with stronger specific effect on anthocyanin accumulation of *hyan5.1* in the hypocotyl at plantlet stage and of *stean10.1* in the stem of the fully developed plant. Overall, the joint effect of both E05 and E10 QTLs could impact on *hyan*, *stean*, *corcol* and *toan* through an interaction between genes, influencing both tonality and anthocyanin intensity.

#### 4.2.3. Anthocyanin Related Candidate Genes Identifications

The biosynthesis of anthocyanin is one of the most studied pathways in plants, with most of the genes encoding for enzymes and regulatory transcription factors (TFs) identified in several plant species, including Solanaceae [3,104]. The anthocyanin pathway is under the control of many early (EBGs) or late (LBGs) biosynthetic genes, with the former involved in the first steps of biosynthesis of flavonols and other flavonoid compounds and the latter involved in the ensuing steps of the pathway until the final steps of decoration, leading to different anthocyanin compounds. Each enzymatic step of this complex pathway is finely tuned by co-activators independent and functionally redundant R2R3-MYB regulatory proteins which regulate the expression of structural genes, alone or in complexes with other TFs belonging to the basic helix-loop-helix (BHLH) family. The control of the biosynthetic pathway is strongly dependent on tissue, developmental stage and environment, and has only been partially elucidated in eggplant with regard solely to the fruit peel coloration [36,103,105–108].

#### Cluster on Chromosome E10

The cluster identified on E10 lies in a region of 1.5 Mb, between 93.5 and 95 Mb, containing three clusters of colocalizing QTLs.

We spotted, next to the upper limit of the cluster and close to *toan*, five genes predicted as ANS, which might be involved in the oxidation of leucocyanidin, in the second to last step of anthocyanin biosynthesis [109]. A further characterization highlighted that these genes are located in a genomic region of the parental line '67–3' containing retrotransposon-like sequences, which could alter the expression pattern of nearby genes [110]. Within the QTL for *toan* at 232.77 cM, we identified a putative MYB transcription factor and a phosphoinositide phosphatase, belonging to a class of phosphatases that plays a key role in abiotic stress response, vacuolar trafficking and anthocyanin accumulation [111].

Lying within *stean10.1*, we identified an ankyrin repeat-containing protein coding gene together with a dehydration-responsive element-binding protein 2C and a PPC6–1 (protein phosphatase 2C). Ankyrin repeat proteins were reported to be involved in the anthocyanin synthesis pathway [112], while the other two candidates are putatively involved in the response to abiotic stresses and detoxification [113]. Finally, alongside *adlan10.1,* we identified five peroxidases coding genes which may be involved in the degradation of anthocyanin, influencing the overall coloration of the tissues where they are expressed. Indeed, enzymatic degradation has been considered to be responsible for anthocyanin breakdown in plants, leading to pigment concentration reduction and colour fading [114]. Recent studies have shown that PODs and laccases (LACs) are responsible for anthocyanin catalysis [115, 116], and also, in combination with some environmental factors, such as high temperature and low light density, were reported to enhance the peroxidase activity [104,117].

#### QTL cluster on Chromosome E05

The region identified on E05 which controls *stean*, *toan*, *hyan* and *corcol* contains two slightly separate clusters. The upper region, including the QTLs for *toan5.2* and *stean5.1*, was already spotted by Barchi et al. [28] as a genomic region involved in the control of several anthocyanin-related traits (such as *stean*) and the corolla color. In the same position, Toppino et al. [30] mapped QTLs associated to peel fruit color as well as to the presence and amount (determined by HPLC) of D3R and nasunin, the two different anthocyanins in the eggplant peel. Analogous QTLs in the distal portion of E05 were also previously identified by GWAS approaches [32,44].

Our candidate gene search highlighted the presence of an acetyl-CoA-benzylalcohol acetyltransferase (AAT), for which we speculate a function in the aromatic group decoration as the last step of the anthocyanin biosynthetic pathway. Furthermore, the distribution and dominance relationships strongly support the hypothesis that AAT is active in '67/3' and inactive in '305E40', and thus responsible of the acetylation of the D3R glucosidic group [118] and the subsequent conversion into nasunin. The comparison of the Illumina sequencing data available for the two parental lines [36] revealed a 1bp indel which could determine a loss-of-function mutation in the *305E40\_AAT* CDS sequence, opening the path to a deeper functional study of this gene.

The cluster of QTLs for *corcol, hyan* and *toan* on E5 is proximal to *toan5.1* and *stean*; in this region, two scopoletin glucosyltransferase coding genes, involved in the phenylpropanoid pathways [119], were spotted, alongside four genes annotated as cytochrome P450, known as playing pivotal roles in the biosynthesis of plant secondary metabolites, including phenylpropanoids and phytoalexins [120,121].

#### QTLs for *hyan7.1* and *stean2.1*

In the *hyan7.1* region, a cluster of MYB genes were identified, with homology to sequences known to be involved in the phenylpropanoid pathway and more specifically in the stilbene biosynthesis in *Vitis* [122], lignin in *Arabidopsis* [123] and anthocyanin in forage legumes [124]. In the same region, other interesting candidate genes are a PPC6−6 (probable protein phosphatase 2C) and a F-box/kelch-repeat protein, a class of regulators reported to be associated to phenylpropanoid pathway [125].

Finally, for *stean2.1,* a valid candidate gene was an AKRP—ankyrin repeat domain-containing protein—which was reported to be involved in the anthocyanin synthesis pathway [112].

#### **5. Conclusions**

Our results demonstrate that the newly developed map, supported by genome annotation, supplies a key tool to gather valuable information for QTL fine mapping, candidate gene identification, and for the development of molecular markers suitable for identifying favorable alleles, and thus increasing the precision and efficiency of selection in breeding. Our high-density intraspecific map made it possible not only to validate previously reported QTLs, but also to identify new ones associated with the plant anthocyanin pigmentation intensity and tonality, as well as to better define their underlying chromosomal regions. Thanks to the availability of genome annotation, it was also possible to provide a set of relevant candidate genes involved in the anthocyanin biosynthetic process and regulation, some of which are already the subject of ongoing studies.

Finally, the map allowed us to identify the first QTLs affecting seed vigor in eggplant, as measured by the speed of seedling emergence from soil. The identification of the genetic bases of this trait are of key importance, since seed germination and seedling emergence represent two of the most vulnerable phases of a crop cultivation cycle, and less vigorous seeds, other than reducing the crop competitiveness toward weeds, increase the exposure of seedlings to abiotic (drought, heat) and biotic (soil-borne pests) stresses. On the whole, the QTLs we detected provide important knowledge on the genomic region linked physiological and phenotypic properties in eggplant which may be usefully exploited in future breeding programs.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/7/745/s1, Figure S1: Distribution of sequenced reads, after quality cleaning and trimming procedures, across the parental lines and F1 (first three samples) as well as the RIL mapping population (in million reads); Figure S2: Eggplant linkage map. Marker names are shown to the right of each chromosome, with map distances (in cM) shown on the left; Figure S3: Heat map representing pairwise recombination fractions with LOD scores for each marker on all 12 chromosomes; Figure S4: Marey Map plots of SNPs mapped to positions on the 12 *S. melongena* chromosomes vs. their physical positions on the v3. eggplant pseudomolecules from Barchi et al. [36]; Table S1: Markers name, chromosome, genetic position of all markers on the map and RILS graphical haplotypes are reported. The color 'orange' represents homozygous alleles from the "305E40" parent, 'blue' represent homozygous alleles from "67/3" parent, 'green' as heterozygous alleles and white as missing data; Table S2: Parameters associated with the eggplant genetic map compared to the previous developed by Barchi et al. [36].

**Author Contributions:** Conceptualization, F.M., F.S., S.L. and G.L.R.; Data curation, L.B.; Formal analysis, L.B., M.M. and E.P.; Funding acquisition, L.T., F.C., E.P. and G.L.R.; Investigation, L.T., L.B., F.M., N.A., D.P., S.G., T.S., S.F., A.M., T.C. and G.L.R.; Visualization, L.T., L.B. and E.P.; Writing—original draft, L.T., L.B. and M.M.; Writing—review & editing, F.M., F.C., E.P., F.S., S.L. and G.L.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially funded by the European Union's Horizon 2020 Research and Innovation Program under the Grant Agreement number 677379 (G2P-SOL project: 'linking genetic resources, genomes and phenotypes of solanaceous crops') and by CARIPLO Foundation in the frame of the project Code 2016−0723 (WAKEAPT project: 'Seed Wake-up with Aptamers: a New Technology for Dormancy Release and Improved Seed Priming').

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


#### *Genes* **2020**, *11*, 745


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Natural Variation Uncovers Candidate Genes for Barley Spikelet Number and Grain Yield under Drought Stress**

#### **Samar G. Thabet 1, Yasser S. Moursi 1, Mohamed A. Karam 1, Andreas Börner <sup>2</sup> and Ahmad M. Alqudah 2,\***


Received: 22 March 2020; Accepted: 5 May 2020; Published: 11 May 2020

**Abstract:** Drought stress can occur at any growth stage and can affect crop productivity, which can result in large yield losses all over the world. In this respect, understanding the genetic architecture of agronomic traits under drought stress is essential for increasing crop yield potential and harvest. Barley is considered the most abiotic stress-tolerant cereal, particularly with respect to drought. In the present study, worldwide spring barley accessions were exposed to drought stress beginning from the early reproductive stage with 35% field capacity under field conditions. Drought stress had significantly reduced the agronomic and yield-related traits such as spike length, awn length, spikelet per spike, grains per spike and thousand kernel weight. To unravel the genetic factors underlying drought tolerance at the early reproductive stage, genome-wide association scan (GWAS) was performed using 121 spring barley accessions and a 9K single nucleotide polymorphisms (SNPs) chip. A total number of 101 significant SNPs, distributed over all seven barley chromosomes, were found to be highly associated with the studied traits, of which five genomic regions were associated with candidate genes at chromosomes 2 and 3. On chromosome 2H, the region between 6469300693-647258342 bp includes two candidate drought-specific genes (*HORVU2Hr1G091030* and *HORVU2Hr1G091170*), which are highly associated with spikelet and final grain number per spike under drought stress conditions. Interestingly, the gene expression profile shows that the candidate genes were highly expressed in spikelet, grain, spike and leaf organs, demonstrating their pivotal role in drought tolerance. To the best of our knowledge, we reported the first detailed study that used GWAS with bioinformatic analyses to define the causative alleles and putative candidate genes underlying grain yield-related traits under field drought conditions in diverse barley germplasm. The identified alleles and candidate genes represent valuable resources for future functional characterization towards the enhancement of barley cultivars for drought tolerance.

**Keywords:** GWAS; drought; barley; spikelet development; candidate gene

#### **1. Introduction**

Barley (*Hordeum vulgare* L.) ranks as one of the most important cereal crops worldwide. Globally, barley is the fourth most important cereal crop in terms of production after maize (*Zea mays* L.), rice (*Oryza sativa* L.), and wheat (*Triticum spp.*) (Faostat 2017, http://www.fao.org/faostat/en/#home). One limitation in achieving the production target is abiotic stress which limits the quality and nutritional value of the grain in cereal crops worldwide [1]. Among all abiotic stresses, drought is the most important environmental stress which limits crop production and yield [2–4], and is becoming more common particularly in the arid and semiarid regions [2].

Crops can be exposed to drought during their entire life cycle from vegetative to reproductive stages [5]. Drought stress affects crop growth and yield during all developmental stages [6]. Water shortage at early growth stages can cause severe problems for seedlings, restricting the emergence, growth and development of seedlings, and thus affecting grain yield [7]. Furthermore, the developing plants will have poor tillering capacity, leading to fewer tillers per unit area and thus lower yield potential. Moreover, drought in the period of stem elongation causes a decrease in the number of grains per unit area because it has a negative impact on floret formation and fertility [8].

At post-anthesis, water insufficiency reduces the grain filling rate and duration leading to shriveled grains [2]. Moreover, the effect of drought on yield is highly complex and involves processes as diverse as reproductive organs, gametogenesis, fertilization, embryogenesis and seed development [6,9]. Reproductive and seed development phases are especially sensitive to drought stress [2,10]. In barley, a reduction in the number of grains per spike, grain filling duration and dry matter accumulation have already been reported to decrease grain yield [2,10]. Several studies have reported that early growth stage parameters (e.g., tiller number, biomass formation, etc.) are highly correlated with yield potential and grain quality at harvest under both normal and drought stress conditions in various cereal crops, including barley [11,12]. Accordingly, understanding the genetic basis for drought tolerance in crop plants by identifying the genetic loci and the candidate genes associated with these traits is useful for developing new varieties with more drought-tolerant characters.

The Genome-Wide Association Scan (GWAS) approach is widely employed to reveal associations between genomic loci and advantageous traits in a given population based on linkage disequilibrium [13]. These loci then become targets for improving new genotypes by the breeder. The GWAS is very effective in identifying major candidate genes regulating mono- or oligogenic agricultural traits. Recently, GWAS has been successfully used to identify genes for yield-related traits [14,15]. Barley has a high-level population structure such as two-rowed and six-rowed cultivars, spring and winter barley [16]. This may lead to spurious marker–trait associations in GWAS [17]. Therefore, it is important to use strong statistical methods and strategies to control the population structure [17]. A mixed-linear model (MLM) approach has been developed to control spurious associations through account multiple levels of relatedness leading to better performance [18].

GWAS has successfully yielded genomic locations for quantitative trait loci (QTL) in crop cereals [13]. Identification of genomic locations for QTL using linked segregating markers is considered to be highly useful for marker-assisted breeding. Nevertheless, the ultimate goal of GWAS in crop species is to detect new QTL through genetic dissection of complex traits.

The present study aimed at identifying the genetic basis of drought tolerance at different developmental and growth phases in 121 spring barley accessions under field conditions using GWAS. In total, 101 SNPs showed association with different traits that were distributed across the seven chromosomes of barley. The identified QTL colocalized with several genes that are exclusively distributed on chromosomes 2H and 3H. The annotation and expression of these genes demonstrated their roles in drought tolerance.

#### **2. Materials and Methods**

#### *2.1. Experimental Setup and Phenotyping*

In total, 121 diverse spring barley accessions from different geographical origins were grown under the field conditions in the 2017/2018 growing season at the Experimental Station of University of Fayoum. The collection included 83 cultivars, 29 landraces, and 9 breeding lines. They originated from Europe (EU, 62), West Asia and North Africa (WANA, 24), East Asia (EA, 22), and the Americas (AM, 13). The row types were two-rowed (72) and six-rowed (49). The population structure and genetic diversity using the genotypic information of the accession in the collection are shown in Figure S1 that demonstrated there is no clear structure on our collection.

Five seeds from each accession were sown in plastic pots (40 cm × 26 cm × 26 cm) filled with field-soil on the 1st of December 2017 under field conditions. The soil texture was classified as clay-loam with pH = 8.0, total N% = 9.4 and available P = 58.0 ppm. Manual irrigation was performed as required, and 5 g (17:11:10/N:P:K) fertilizer was added to each pot. In field-grown plants, each accession was replicated in three pots of each treatment (control and drought). At the beginning of early reproductive (spikelet development phase) phase (~25 days after sowing), the plants were thinned into three plants per pot, standing with a border to eliminate positional and environmental effects on growth and development. Weeds were controlled manually.

Plants were irrigated until the onset of early reproductive, then the plants were then exposed to two watering treatments: (1) well-watered treatment (soil maintained at ~75% of field capacity (FC)); and (2) severe drought stress (at 35% FC). To maintain the targeted (~75% FC) and drought (~35% FC), eight randomly selected accessions were used as a reference. Before irrigation, eight reference pots were weighed and watered to adjust the corresponding field capacity, and the rest of the experiment was watered accordingly. Irrigation and drought treatment continued until maturity after that, irrigation withholds until harvest. Nine morphological, developmental and grain yield-related traits were measured from at least three biological replicates for each accession under each treatment. More information about the phenotypic trait measurements are explained in Table 1. Respective drought tolerance indices were calculated from the recorded data as described in (Table 1).


**Table 1.** The name and abbreviation of measured traits and respective description of measurements.

The average of temperature and humidity during the growing season of this experiment (2017–2018) was 16.62 ◦C and 55.16%, respectively. The maximum temperature was recorded in November and April (24.33 and 27.66 ◦C, respectively) with the minimum temperature from January to February (8.29 and 9.12 ◦C, respectively). The minimum humidity was 30.37% and 30.41% in November and February, respectively, whereas the maximum humidity was recorded in April (78.33%) (Figure S2).

#### *2.2. Data Analysis*

Analysis of variance [19] was conducted to compare the controlled and drought stress conditions at *p* < 0.05 for all measured traits using GENSTAT 18 [20]. Data were analyzed as a randomized complete block design (RCBD) with three replications. The treatments were considered as main plots and the accessions were considered as sub-plots. Broad-sense heritability (*H*2) for the measured traits under each condition separately was calculated using GENSTAT 18. The phenotypic data were subjected to Residual Maximum Likelihood (REML) to analyze it in mixed linear model (MLM). Mean estimation of each measured trait in each accession under each treatment was calculated as Best Linear Unbiased Estimates (BLUEs) using GENSTAT 18. Correlation matrix analysis among the traits in each treatment was separately calculated by GENSTAT 18. Comparison between treatments at *p* < 0.05 for each trait including boxplots were calculated using R-studio [21].

#### *2.3. Genome-Wide Marker–Trait Associations*

The accessions were genotyped by a 9K IlluminaTM SNPs chip. In the analysis, we only used the markers which passed the quality control as minor allele frequency (MAF) ≥ 0.05 with their physical positions. A mixed linear model (MLM) was performed to determine marker–trait associations between the estimated phenotypic traits (BLUEs) and genotypic data. Different statistical models, e.g., general-linear model (GLM), mixed linear model (MLM), and compressed MLM (CMLM)) were tested in GWAS using GAPIT R package [22]. Finally, we used MLM as a powerful model considering the population structure, including kinship and PCA, to control the population structure influence. False discovery rate (FDR) at 0.001 was calculated for each trait under each treatment separately and association signals passed the threshold of FDR at 0.001 (−log10 *p*-values ≥ FDR) were used in further analyses. To be sure that our associations were true, we followed the GWAS and post-GWAS protocol published recently [13].

#### *2.4. SNP-Gene-Based Association and Haplotype Analysis*

At each chromosome, linkage disequilibrium (LD, r2) among the significant SNPs within highly associated genomic region was calculated and presented as heatmap plot. This allowed us to define the most important physical position that had been used for candidate gene identification. The physical positions of SNPs exceeding FDR within the linkage disequilibrium interval were used in annotation for high-confidence (HC) candidate gene with other respective information using the barley genome explorer web-based with recent barley genome dataset (BARLEX; http://apex.ipk-gatersleben.de).

SNPs within the candidate gene physical position were used for further validation of SNP-Gene-based haplotype analyses and expression analyses. *T*-test at *p* < 0.05 was used to calculate the significant differences between alleles on the associated trait(s) [13]. RNA-seq datasets were derived from 16 different tissues of barley cv. 'Morex' cultivar, each with three biological replicates. In total 48 samples were used for generation of RNA-seq data.

From seven vegetative, six inflorescence, two developing grain and one germinating grain tissues, more details about RNA-seq experiments was published by Mascher et al. [23]. We used BARLEX; an expression database for barley that presented as FPPM (fragments per kilobase per million mapped reads).

#### **3. Results**

#### *3.1. Phenotypic Characteristics and Natural Variation*

In total, nine traits were recorded under control and drought treatments. Additionally, respective drought tolerance indices were calculated and used as a derived trait for GWAS. A wide range of phenotypic variation with normal distribution was detected for all traits (Figure 1 and Figure S3). The means of most of the studied traits showed a significant reduction under drought treatment compared to control conditions (Table 2 and Figure S3). There were no significant differences between the treatments in phase transition i.e., AT, SH and AE developmental stages (Table 2 and Figure S3). Drought treatment influenced significantly other developmental and yield traits such as plant height and spikelet number per spike (Table 2 and Figure S3). Notably, the genotypes showed wide range of variation in all drought tolerance indices (Table 2, Figure 2 and Figure S4).

Furthermore, heritability values were relatively high under drought, ranging from 0.64 for TGW to 0.84 for AT, whereas they ranged from 0.72 to 0.80 for AT and SH, respectively under control conditions. Additionally, the heritability values varied for tolerance indices from 0.68 for AE\_DTI and NGS\_DTI to 0.78 for AT\_DTI (Table 2).


**Table 2.** Analysis of variance and heritability for the measured traits under control and drought treatments.

*<sup>H</sup>*2—Heritability; T—Treatment; G—Genotype; T × G—Treatment by Genotype interaction; DTI—Drought Tolerant Index; The degree of significance is indicated as \* *p*, 0.05; \*\* *p*, 0.01; \*\*\* *p*, 0.001; ns: not significant.

**Figure 1.** Histogram of phenotypic values distribution analysis of the studied traits; (**a**) Awn Tipping, (**b**) Spike Heading, (**c**) Anther Extrusion, (**d**) Plant Height, (**e**) Spike length, (**f**) Awn Length, (**g**) Spikelet per Spike, (**h**) Grain per Spike and (**i**) Thousand Grain Weight (TGW) in 121 spring barley accessions under control and drought stress.

**Figure 2.** Histogram of phenotypic values distribution analysis of drought tolerance index variation of the studied traits in 121 spring barley accessions under control and drought stress. (**a**) Awn Tipping, (**b**) Spike Heading, (**c**) Anther Extrusion, (**d**) Plant Height, (**e**) Spike length, (**f**) Awn Length, (**g**) Spikelet per Spike, (**h**) Grain per Spike and (**i**) Thousand Grain Weight (TGW).

#### *3.2. Correlations Analysis*

Significant positive correlations were observed among various traits within both treatments. For example, AE showed a significantly high positive correlation with AT and SH (*r* = 0.98 \*\*\* and *r* = 1.00 \*\*\*, respectively) under control conditions and (*r* = 0.93 \*\*\* and *r* = 1.00 \*\*\*, respectively) under drought treatment (Figure 3A,B and Figure S5). On the contrary, some traits showed high significant negative correlations under both conditions. Interestingly, TGW showed negative correlation with NGS under control and drought conditions (*r* = −0.21 and −0.44 \*\*\*, respectively; Figure 3A,B).

For DTI, AT\_DTI showed high significant positive correlation with SH\_DTI and AE\_DTI (*r* = 0.92 \*\*\* and *r* = 0.91 \*\*\*, respectively). Additionally, SH\_DTI exhibited high positive correlation with AE\_DTI (*r* = 0.99 \*\*\*). The SL\_DTI with AL\_DTI showed a significant negative correlation (*r* = −0.27 \*\*\*; Figure S5).

**Figure 3.** Correlations matrix among the studied traits in barley genotypes (**A**) under control, and (**B**) under drought stress. The degree of significance is indicated as \* *p*, 0.05; \*\* *p*, 0.01; \*\*\* *p*, 0.001; ns: not significant.

#### *3.3. Natural Genetic Variation and Candidate Genes Potentially Underlying Drought Tolerance*

GWAS analysis of 121 selected accessions was performed to find out the natural genetic variation of the studied traits. We detected a total number of 101 significant marker–trait associations (with −log10 *p*-value ≥ 3) distributed over the seven barley chromosomes (Figure 4 and Table S1). There was plenty of natural genetic variation of all studied traits in this collection. Through GWAS analysis, we found twelve interesting genomic regions for genetic variation of all studied traits distributed on only two chromosomes (2H and 3H).

Highly significant LD was found among significant SNPs within these genomic regions (Figure 4b), indicating that these significant SNPs are potentially harboring important candidate genes in addition to their useful for marker-assisted selection. On chromosome 2H, two adaptive genes were identified, whereas on chromosome 3H, ten genes that represent a combination of both adaptive and constitutive genes were identified. Adaptive genes might be control-specific, i.e., genes that regulate trait variation under control only, or drought-specific (genes that regulate trait variation under drought only). Constitutive genes regulate trait variation under both control and drought conditions (Table 3).

In total, eight SNPs were associated (−log10 *p*-value ≥ 3) with AT parameters. Out of these, five SNPs were adaptive: three were control-specific and two were drought-specific; the remaining three SNPs were constitutive. The constitutive SNPs were identified on chromosomes 1H, 3H and 4H. The most significant one (with −log10 *p*-value = 8.7) was observed on chromosome 3H at 126.69 cM. Only one constitutive gene was identified *HORVU3Hr1G098200* (Chr. 3; 126.69 cM) (Table 3).

Sixteen SNPs were associated (−log10 *p*-value ≥ 3) with SH parameters. Out of these, seven SNPs were adaptive: four SNPs were control-specific and three drought-specific; the remaining nine SNPs were constitutive. The constitutive SNPs were identified on chromosomes 1H, 2H, 3H, 4H, 5H and 6H where the most significant one (with −log10 *p*-value = 4.7) was observed on chromosome 6H at 53.54 cM. Only seven SNPs showed association with candidate genes. Four genes were control-specific; *HORVU3Hr1G088300*, *HORVU3Hr1G089160*, *HORVU3Hr1G089080* and *HORVU3Hr1G098200*. Three constitutive genes were identified: *HORVU3Hr1G098200*, *HORVU3Hr1G116790* and *HORVU3Hr1G115810* (Table 3).

In total, 18 SNPs showed association (with −log10 *p*-value ≥ 3) with AE parameters. Ten SNPs were adaptive: six were control-specific and four were drought-specific; the remaining SNPs were constitutive. These constitutive SNPs were mapped on chromosomes 2H, 3H, 4H, 5H and 6H. The most significant one (with −log10 *p*-value = 4.6) was detected on chromosomes 6H at 53.54 cM. Out of these, twelve SNPs showed association with candidate genes. Eight genes were control-specific: *HORVU3Hr1G018650*, *HORVU3Hr1G020430*, *HORVU3Hr1G020660*, *HORVU3Hr1G019590*, *HORVU3Hr1G088300*, *HORVU3Hr1G089160*, *HORVU3Hr1G089080* and *HORVU3Hr1G098200* and four genes are constitutive: *HORVU3Hr1G116790*, *HORVU3Hr1G098200* and *HORVU3Hr1G115810* (Table 3).

For the trait PH, six SNPs were associated. Out of these, five SNPs were adaptive: four SNPs were control-specific and one SNP was drought-specific. Only one constitutive SNP was found on chromosome 7H at 131.59 cM. There were no SNPs associated with candidate genes for PH (Table 3).

GWAS analysis showed that six SNPs were associated with SL parameters. Only one SNP was associated with a candidate gene that was control-specific, *HORVU3Hr1G098200* (Table 3).

In total, eight SNPs showed association (*p*-value ≤ 0.001) with AL. Out of these, six SNPs were adaptive: two were control-specific and four were drought-specific; the remaining two SNPs were constitutive and mapped on chromosomes 2H and 3H. Six SNPs were associated with candidate genes. Only one gene was control-specific, *HORVU3Hr1G098200*; and one was constitutive, *HORVU3Hr1G098200* (Table 3).

For NSS, 15 SNPs showed significant association with −log10 *p*-value ≥ 3. Six of them were associated with adaptive candidate genes. Three SNPs were control-specific: *HORVU3Hr1G018650*, *HORVU3Hr1G020430* and *HORVU3Hr1G020660* and another three were drought-specific: *HORVU2Hr1G091030*, *HORVU2Hr1G091170* and *HORVU3Hr1G019590*. No constitutive genes were identified (Table 3).

In total, 15 SNPs were associated with NGS parameters. Seven SNPs showed association with candidate genes. Five genes were control-specific: *HORVU2Hr1G091030*, *HORVU3Hr1G018650*, *HORVU3Hr1G020430*, *HORVU3Hr1G020660* and *HORVU3Hr1G019590* and two drought-specific genes: *HORVU2Hr1G091030* and *HORVU2Hr1G091170* (Table 3).

For TGW parameters, seven SNPs showed significant association with −log10 *p*-value ≥ 3. Only two SNPs showed association with candidate genes. One was control-specific, *HORVU3Hr1G098200*, and the other drought-specific one was *HORVU3Hr1G098200* (Table 3).

Overall, all the identified genes revealed a pleiotropic effect, i.e., each gene controlled more than one trait. The gene *HORVU3Hr1G098200*, for instance, regulates the variation of ten traits (Table 3). On the contrary, they differ in their mode of action, and some of them are adaptive genes such as *HORVU2Hr1G091170* and *HORVU3Hr1G018650*. The first gene modulates the variation of NGS and NSS under drought; whereas the second controls them under control. Other genes are constitutive, such as *HORVU3Hr1G116790* and *HORVU3Hr1G115810* (Table 3). Surprisingly, the gene *HORVU3Hr1G098200* showed a constitutive/adaptive mode of action. It controls the variation of (TGW) constitutively, while regulating the variation of (AE, AL, SH and SH) in an adaptive manner, i.e., under control only.

42

**Figure 4.** (**a**) The significant SNPs (101 SNPs, −log10 ≥ 3) associated with all traits under control and drought stress conditions. The x-axis shows the chromosomes and the SNP positions. The y-axis shows the −log10 (*P*-value) for each SNP marker. (**b**) Heatmap linkage disequilibrium to detect the region of candidate genes, which show a consistent effect on traits and associated with SNPs passing ≥ FDR using their physical position.


*Genes* **2020** , *11*, 533

#### *3.4. SNP-Gene-Based Analysis*

Twelve SNPs at chromosomes 2H and 3H were physically co-located inside the candidate genes (Table 3). Two SNPs at 2H (SCRI\_RS\_166540 and SCRI\_RS\_157347) were detected within the genes *HORVU2Hr1G091030* and *HORVU2Hr1G091170*, respectively. Meanwhile, ten SNPs at 3H were co-located within the physical position of candidate genes. For example, SNP numbers BOPA1\_2391-566 and SCRI\_RS\_177313 were located within *HORVU3Hr1G019590* and *HORVU3Hr1G098200* genes, respectively. Interestingly, these SNPs were mostly associated with NGS, NSS, and TKW under drought stress. The rest of the genes at 3H were associated with traits under drought stress or with DTI for SH and AE.

The allelic analysis of the SNPs that are associated with traits under drought showed that alleles A, G and A from markers SCRI\_RS\_166540, SCRI\_RS\_157347 and BOPA1\_2391-566, respectively, have a highly significant impact on NSS (Figures 5a and 6). The genes *HORVU2Hr1G091030* and *HORVU2Hr1G091170* at 2H controlled NGS under drought via markers SCRI\_RS\_166540 and SCRI\_RS\_157347, where alleles G and A, respectively, increased NGS significantly (Figures 5b and 6). Only one marker (SCRI\_RS\_177313 from *HORVU3Hr1G098200* gene) showed a significant effect on TKW by allele A that increased the value under drought (Figure 5b).

**Figure 5.** SNP-gene-based analysis for (**a**) Spikelet number per spike, (**b**) Grain number per spike in barley genotype, and (**c**) TKW (gram). The degree of significance indicated as \* *p*, 0.05; \*\* *p*, 0.01; \*\*\* *p*, 0.001; \*\*\*\* *p*, 0.0001.

47

**Figure 6.**

Expression analysis of the candidate genes in different plant organs at different

developmental

 stages.

#### *3.5. Expression Analysis of Candidate Genes*

The expression analysis of candidate genes in different organs showed a wide range of expression for the genes (Figure 6). Notably, the associated genes with spikelet and grain number per spike under drought stress (Figure 5) showed a high expression for most of the organs (Figure 6). Gene *HORVU2Hr1G091170* at 2H in particular that had high impact on spikelet and grain numbers under drought showed highest expression in the respective grain organs, e.g., lodicule and rachis in addition to spike at 1–1.5 cm length (Figure 6). The second highest expressed gene from the highly associated ones was *HORVU2Hr1G091030*. This gene was highly expressed in developing grain and lodicule (Figure 6). The expression of these genes demonstrated their biological roles in the spikelet and grain development under drought conditions. Other genes, e.g., *HORVU3Hr1G020660* and *HORVU3Hr1G018650*, showed high expression particularly in senescing leaves, suggesting their roles in leaf development (Figure 6).

#### **4. Discussion**

Studying drought stress tolerance under field conditions in cereals such as barley is very limited, as it requires complex and laborious experiments for population characterization, in addition to being influenced by environmental factors [3,4]. Nevertheless, the present study focused on investigating the natural variation in diverse spring barley collections and on identifying candidate genes associated with the traits of interest under field conditions.

In the present study, there was a considerable reduction in most traits under drought stress compared to control conditions. These results indicated that drought stress reduced grain yield by decreasing NSS, NGS, and TGW. Our results are supported by the findings of [2,24], who examined the response of spring barley to pre- and post-anthesis drought and reported a yield reduction due to pre-anthesis water deficit on several fertile spikes and grains per plant. Our results are in agreement with those of Samarah, Alqudah, Amayreh and McAndrews [2], who found that drought stress reduced grain yield by reducing the number of tillers, spikes and grains per plant and individual grain weight in barley (*Hordeum vulgare L.*) as a result of early maturity and shortened grain filling duration at 25% field capacity compared to control. In conclusion, drought stress negatively influenced barley yield through impairing grain development, size and grain filling duration.

Agronomic traits such as grain yield and its components (NSS, NGS, and TGW) are the major selection criteria for drought tolerance in barley breeding [25]. Therefore, understanding the interplay among these parameters is of high importance. The correlation between TGW and NGS was always negative and significant under control and drought, respectively. The positive and significant correlation between TGW and SL indicates that yield mainly depends on spike length. Zhou et al. [26] suggested that the grain yield traits interacting with each other, increase in one of them (e.g., TGW) can be correlated with a reduction in another (e.g., NGS), which is in agreement with our results. In support of our findings, in a European spring barley collection [27], it was found that NGS showed negative correlations with all other yield parameters except grain length, concluding that the yield was mainly dependent on grain size and SL rather than NGS. Several reports showed that drought-tolerant genotypes implement high productivity under both stressed and unstressed conditions [2,28]. Therefore, the comparative analysis of the yield components under drought and well-watered conditions can be used for predicting stress tolerance of genotypes, and then in selection of more tolerant barley lines for future breeding purposes [24,29].

#### *The Role of Putative Candidate Genes in Drought Tolerance*

Promising genomic regions are located at position 75.56 cM on chromosome 2H, harboring two important candidate genes. The first one is *HORVU2Hr1G091030*, which encodes RNA polymerase II C-terminal domain phosphatase-like 1 (*CPL1*), for NGS\_C, NGS\_D, and NSS\_D. *CPL1* is a negative regulator of stress-responsive gene transcription, ABA, and stress responses [30]. In Arabidopsis, *CPL1* regulates gene expression under various osmotic stresses through ABA signaling [30]. Loss

of *CPL1* function in the mutants enhances tolerance to oxidative stress including drought and salt stresses [31]. In rice, *OsCPL1* is expressed in the young spikelets [32]. Most likely, this gene is expressed during the development of barley spikelets under drought. Additionally, it controls the NGS under drought and control conditions in a constitutive manner. The second candidate, *HORVU2Hr1G091170*, encodes expansin B3 for NSS\_D and NGS\_D. Expansins have been implicated in the responses of various plant species to water stress. In maize, increased expansin activity was found to be involved in maintaining the growth of primary roots at a low water potential [33]. The expression of a root β-expansin gene, *GmEXPB2*, is remarkably participated in root system architecture responses to several abiotic stresses, such as Fe, P, and water deficiency [34]. *RhEXPA4* is a rose expansin gene that is up-regulated in rose petals after dehydration [35], and it confers salt and drought tolerance to transgenic Arabidopsis [36]. In potato, most of expansin-like B genes have a potential role in multistress tolerance and upregulated under stresses including drought [37]. These two genes—*HORVU2Hr1G091030* and *HORVU2Hr1G091170*—showed different expression profiles (Figure 6). The noisy expression profile of *HORVU2Hr1G091170*, suggesting a stress-responsive gene, exclusively regulates the variation of NGS\_D and NSS\_D under drought (Table 3). While the slightly constant expression profile of *HORVU2Hr1G091030* indicates a constitutive gene that regulates the variation of NGS under control and drought. These findings are in accordance with the results of several authors who found that the stress-responsive genes exhibiting noisy expression profiles, while the constitutive ones showing constant expression patterns reviewed in Lopez-Maury et al. [38].

Notably, the allelic diversity analysis shows that allele A and G from *HORVU2Hr1G091030* and *HORVU2Hr1G091170* genes, respectively, have a significant positive impact on NSS and then NGS under drought stress. The expression of these genes during spike, spikelet and grain developmental stages demonstrates their influence on agronomic traits under drought stress conditions. In terms of molecular breeding, the above-mentioned alleles were the highest alleles explained the natural variation under drought stress suggesting their usefulness in breeding programs. Taken together, this provides evidence that these genes are drought-specific and involved in the drought stress-tolerance pathway. These findings indicate that both genes are of high importance for enhancing barley grain yield under drought stress.

Interestingly, ten candidate genes were detected on chromosome 3H. Out of these, three genes at positions 44.26 and 45.55 cM were identified as candidates for various traits such as AE, NGS, and NSS under control conditions. The first one is *HORVU3Hr1G018650* encoding pyruvate decarboxylase-2 (*PDC2*), which belongs to pyruvate decarboxylase (*PDC*) gene family. Pyruvate has been involved in the ethanolic fermentation pathway that is associated with flooding tolerance when plant cells switch from respiration to anaerobic fermentation [39]. Additionally, fermentation has important functions in the presence of oxygen, mainly in germinating pollen and during abiotic stress. This indicates the interdependency between pyruvate decarboxylase (PDC) and AE, NGS, and NSS under control conditions. Furthermore, PDC, which catalyzes the first step in this pathway, is thought to be a main regulatory enzyme [40]. In Arabidopsis, the expression of PDC genes during abiotic stresses has been reported [41]. In maize and Arabidopsis, strong induction of fermentation genes takes place in anaerobic conditions [42]. Thus, it is conceivable that ethanolic fermentation is part of a general response to environmental stress, e.g., drought stress.

For traits such as AE\_C, NGS\_C, and NSS\_D at position 46.25 cM, *HORVU3Hr1G019590* encodes myb domain protein 37. MYB (myeloblastosis) has a regulatory role in ABA signaling by activating some stress-inducible genes [43]. In Arabidopsis, MYB, namely *AtMYB60*, *AtMYB44*, and *AtMYB15*, have been implicated in the regulation of stomatal closure and ABA-mediated response to drought and salt stresses [44]. Agarwal et al. [45] detected that *AtMYB15* was expressed in both vegetative and reproductive organs and up-regulated by cold and salt stresses. The differential expression of numerous MYB TFs in the *Triticeae* was shown to be involved in the response of abiotic stress conditions such as drought and salt stresses [46,47]. These findings suggest that the genes in question are constitutive genes and may have a role in drought tolerance in barley during heading and maturation.

For SH\_C and AE\_C, *HORVU3Hr1G089160* encodes AP2-like ethylene-responsive transcription factor at position 104.32 cM. The AP2/ERF superfamily regulates diverse developmental responses such as flower pedicel abscission [48], leaf senescence [49], cell proliferation and shoot branching [50]. Houston et al. [51] observed that mutants of *HvAP2* internode lead to the reduction of elongation in both the culm and the spike in barley, suggesting that the *HvAP2* alleles increase grain yield by controlling spike density. In the current study, the allelic variation was only observed under control condition indicating that this gene might not be involved directly in drought tolerance.

Additionally, *HORVU3Hr1G116790* encodes Aquaporin-like superfamily protein which is a candidate for SH\_DTI and AE\_DTI. Aquaporins (AQPs) are a class of water channel proteins that belong to the major intrinsic protein (MIP) superfamily of membrane proteins [52]. These proteins regulate the movement of water and other small molecules across plant vacuolar and plasma membranes [53]. Aquaporins have also been suggested to have an essential role in plant tolerance of biotic and abiotic stresses [54], and extension growth [52]. Furthermore, it was reported in [55] that various aquaporin homologs are involved in plant stress responses against a variety of environmental stresses that disturb plant cell osmotic balance and nutrient homeostasis.

The most effective gene *HORVU3Hr1G098200* was mapped at 126.69 cM. This gene orchestrates the variation of ten traits both in constitutive and adaptive manner (Table 3). This explains the significant correlations between these traits, either under control or drought (Table 3 and Figure 3A,B). Shi et al. [56] reported that Chromosome 3B harbors genes that may be significant in controlling agronomical important traits, such as yield and resistance to biotic and abiotic stress in wheat. The QTL for these traits maps quite close to semidwarf1 (sdw1) on chromosome 3H at 126.69 cM. The *semi dwarf 1 (sdw1)* gene has previously been found to control the most desired agronomic traits barley reviewed by Hedden [57]. The pleiotropic effect of *sdw1* gene was evidenced in wheat [58,59].

Additionally, the gene *HORVU3Hr1G098200* showed an interplay between the constitutive and adaptive control pattern. For example, it constitutively controls the variation of thousand grain weight (TGW) under both control and drought stress. At the same time, it controls the spike heading only under control in an adaptive manner (Table 3). This pattern indicates that *HORVU3Hr1G098200* is partially constitutive and partially stress-responsive gene. Therefore, we considered it an adaptive gene when it controlled a trait(s) under control or drought. On the other hand, we considered it constitutive when it controlled trait(s) under both conditions. This gene, nevertheless, exhibited constant expression profile in different plant tissues, spanning the reproductive period from anther extrusion until the seed set, suggesting a key role in controlling different traits and more likely to be constitutive (Figure 6). This finding is consistent with the results of [60], who found that cells express some genes constantly to maintain the concentration of some proteins tuned with the cell physiological needs. The constant expression pattern characterizes the constitutive genes rather than the stress responsive (adaptive) ones (reviewed in Lopez-Maury, Marguerat and Bahler [38]. According to Blum [61], drought stress when expressed as a final yield is affected by constitutive and adaptive plant traits (genes). These constitutive QTL/genes represent an instrumental tool for selection as they show stability across different environments compared to the adaptive ones. Moreover, the selection for drought tolerance based on these QTL/genes does not require drought stress.

The last candidate gene is *HORVU3Hr1G115810*, which encodes Kinetochore protein spc25 and affects AE\_DTI and SH\_DTI. Kinetochore proteins may have a pivotal role for centromere and kinetochore functioning [62–64], and chromosome segregation mediating [65]. Specifically, kinetochore protein MIS12 is required for the co-orientation of sister kinetochores during meiosis I in maize [66]. NDC80 kinetochore protein serves as a contact point for chromosome–spindle interaction [67]. Interestingly, QTL at 3H 126 and 154 cM have previously been reported to be associated with grain number and yield in barley under drought stress conditions [4,68,69].

On chromosome 3H, three genes (*HORVU3Hr1G020660*, *HORVU3Hr1G088300*, and *HORVU3Hr1G098200*) are counterparts of the ortholog in Arabidopsis, *AT2G36270*. The corresponding genes are expressed during the reproductive stage in the different flower and seed organs (Figure 6), indicating their significance in flowering and seed set. Our results are in accordance with the findings of Klepikova et al. [70], who compared the spatiotemporal expression and stability of a lot of genes in 79 organs and developmental stages. Additionally, they found that *AT2G36270* had been significantly expressed in senescent organs (leaves and silique); this is similar to expression profile of the gene *HORVU3Hr1G020660* that highly expressed in senescent leave (Figure 6). Taking these findings together, the similarity of spatiotemporal expression patterns, as well as functions of these genes in both barley and Arabidopsis, suggests that they might be are homologous for AT2G36270.

The significant role of *AT2G36270* (*ABI5*, *AtABI5* and *BZIP39*) during different growth stages, as well as in drought tolerance, has been evidenced in several studies. Finkelstein et al. [71] reported that it encodes a member of the basic leucine zipper transcription factor family, involved in ABA-regulated gene expression during germination, seed development and subsequent reproductive stage. In particular, *ABI5* regulates a set of the late embryogenesis-abundant genes during both seed and ABA-inducible vegetative gene expression in wild-type and abi5-1 plants [71]. Mittal et al. [72]reported that overexpression of *AtABI5* in transgenic cotton (*Gossypium hirsutum*) showed resistance to the imposed drought stress through ROS scavenging and osmotic adjustment, enhancing photosynthesis, as well as traits of drought avoidance (bigger root and leaf systems) and tolerance (longer internode length and higher stem weight) leading to better establishment under water shortage. In rice (*Oryza sativa*), overexpression of *OsbZIP46CA1* significantly increased tolerance to drought and osmotic stresses at flowering stage, and suggested that *OsbZIP46* is a positive regulator of ABA signaling and drought stress tolerance by modulating many stress-related genes [73].

In summary, the present study showed the value of using field experiments to investigate natural phenotypic and genetic variation in a worldwide panel of barley accessions underlying agronomic traits such as grain yield and its components which can be exploited for crop improvement. Drought stress negatively influences most of the studied traits. In addition, we observed significant positive correlations among various traits within both control and drought treatments. Candidate genes associated with drought response were detected on two chromosomes, notably 2H and 3H. Interestingly, most the candidate genes are described to be involved in responses to abiotic stresses such as drought and salt. Interestingly, the genomic regions at 75 cM on 2H and 126.69 cM on 3H harbor three candidate gene *HORVU2Hr1G091030 HORVU2Hr1G091170* and *HORVU3Hr1G098200*, which are highly associated with spikelet and final grain number per spike, suggesting the crucial role in controlling grain yield under drought conditions. The discovered SNPs and candidate genes for drought response will be helpful for breeding drought tolerant barley cultivars.

#### **5. Conclusions**

Conclusively, the present study showed that drought negatively affects the yield-related traits. Despite of the reduction in most traits under drought, the heritability estimates for all respective traits were high, indicating the potential of this collection to conduct a GWAS analysis looking for drought-controlling alleles/genes. Additionally, the present study revealed that combining GWAS and bioinformatics is a very instrumental approach to identify candidate genes even for polygenic traits such as the yield-related components. Our results confirmed that the yield-related components are under polygenetic control; under contrasting growth conditions (control and drought). The candidate genes exhibited different patterns of traits control; some genes were adaptive (*HORVU2Hr1G091170*), while other genes were constitutive (*HORVU2Hr1G091030* and *HORVU3Hr1G098200*). The constitutive pleiotropic genes are of high importance to improve drought tolerance because they can be employed to improve several traits at a time with no need to test under drought. The causative genes showed different expression patterns; the constitutive genes showed constant expression profiles, while the adaptive genes showed a noisy expression profile. Most of the causative genes were expressed in spikelet organs (palea, lema and lodicules), as well as in grain, spike and leaf, indicating their potential role in drought tolerance, in particular, during the reproductive stage. To get more comprehensive and clear answers, a new detailed experiment is underway to study the gene expression of the candidate

genes identified in this study especially, the gene *HORVU3Hr1G098200* because it regulates the variation of ten traits, and because of its constitutive/adaptive mode of action.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/5/533/s1, Figure S1: Population structure based on their row type (two-rowed vs. six-rowed), geographical of origin; Figure S2: Weather data at Fayoum station that indicates Max-Min Temperature and Humidity; Figure S3: Boxplot analysis of variation of the traits; Figure S4: Boxplot analysis of drought tolerance index variation of the traits in barley genotypes under control and drought stress; Figure S5: Correlations matrix among drought tolerant index; Table S1: Summary of the most significant SNP associated with studied traits.

**Author Contributions:** Conceptualization, S.G.T., Y.S.M., M.A.K. and A.M.A.; methodology, S.G.T., Y.S.M., M.A.K. and A.M.A.; software, A.M.A.; validation, S.G.T., Y.S.M., M.A.K. and A.M.A.; formal analysis, A.M.A.; investigation, S.G.T., Y.S.M. and A.M.A.; resources, A.B.; data curation, S.G.T., Y.S.M. and A.M.A.; writing—original draft preparation, S.G.T.; writing—review and editing, S.G.T., Y.S.M. and A.M.A.; visualization, S.G.T., Y.S.M. and A.M.A.; supervision, Y.S.M., M.A.K. and A.M.A.; software, A.M.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors would like to thank all who contributed to this work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
