**The Cypriot Indigenous Grapevine Germplasm Is a Multi-Clonal Varietal Mixture**

#### **Apostolis Grigoriou 1, Georgios Tsaniklidis 2, Marianna Hagidimitriou <sup>3</sup> and Nikolaos Nikoloudakis 1,\***


Received: 16 May 2020; Accepted: 13 August 2020; Published: 14 August 2020

**Abstract:** Cypriot vineyards are considered as one among the earliest niches of viticulture and a pivotal hub for the domestication and dissemination of grapevine. The millennial presence of *Vitis* spp. in this Eastern Mediterranean island has given rise to a plethora of biotypes that have not been adequately characterized, despite their unique attributes and stress tolerance. This ancient germplasm also has an additional value since it survived the phylloxera outbreak; hence, it possesses a large amount of genetic diversity that has been unnoticed. In order to provide useful insights to the lineage of Cypriot vineyards, a two-year-spanning collection of centennial grapevine cultivars mostly regarded to belong to four indigenous variety clusters ("Mavro", "Xynisteri", "Maratheftiko", and "Veriko") was initiated. There were 164 accessions across the broader Commandaria wine zone sampled and characterized using a universal microsatellite primer set. Genetic analysis indicated that considered indigenous Cypriot germplasm has a polyclonal structure with a high level of heterozygosity. Moreover, several lineages or unexplored varieties may exist, since a larger than considered number of discrete genotypes was discovered. Furthermore, it was established that grapevine lineages in Cyprus were shaped across eras via clonal, as well as, sexual propagation. The special attributes of the Cypriot landscape are discussed.

**Keywords:** Cyprus; domestication; microsatellites; *Vitis vinifera* subsp. *sativa*; *Vitis vinifera* subsp. *Sylvestris*

#### **1. Introduction**

Grapes are the most emblematic deciduous woody vines. The imprint of viticulture to agriculture, society, culture, and even religion rituals has been unparalleled, and it can be stated without any reluctance that viticulture is intertwined with human history throughout eras since antiquity. At present, the total global grape production is 75 million tons and the surface of vineyards spans at around 7.5 million ha (FAO). In terms of revenue, the global wine market alone has perceived a stable expansion over the years and has developed as a multi-billion industry [1].

Taxonomically, grapes belong to the Vitaceae family which consists of 14 genera and about 900 species [2]. Based on seed morphological data and fossils, it has been determined that the Vitaceae cluster arose nearly 60 million years ago in the North American region [3]. Nowadays, cultivated species (classified as *Vitis vinifera* L. or *Vitis vinifera* subsp. *vinifera*) have a widespread distribution and can be found across all continents. Nonetheless, *Vitis* spp. progenitors can be exclusively found at specific areas; *V. vinifera* subsp. *sylvestris* (regarded as the Eurasian *Vitis* wild form) is found in North Africa and Eurasia, while *Vitis labrusca* is restricted to Eastern North America. The distinction of wild ancestors from cultivated forms can be challenging, since at least in the broader Mediterranean zone, a wide variety of feral forms and hybrids are present. These comprise cultivation escapees, as well as, seed-disseminated weedy types growing in local niches that occur primarily in surroundings areas of vineyards [4].

Currently, *V. vinifera* subsp. *sylvestris* is considered to be the wild relative of cultivated grapes (*Vitis vinifera* subsp. *vinifera*) and several botanical characteristics are used in order to differentiate the two taxa. One of the most important characters is the flower, as wild grapevines are dioecious, in contrast to virtually all cultivated genotypes that are hermaphrodites [5]. Due to the dioecious manner of the wild grapevine species and the high rates of heterozygosity, asexual propagation was frequently adopted in the grapevine lineage and domestication, in order to preserve desirable traits [6]. Selection of elite traits resulted to a domestication syndrome; thus, generating notable morphological variability in leaves and seeds, as well as a shift from dioecy to self-pollinating hermaphroditism and an increase in berry size and sugar content [7].

Increasing indications converge that the domestication of grapevines was disseminated alongside the advent of wine production from the center of domestication placed at the Near East and the Southern Caspian Belt [8]. The diffusion of grapevines from the principal center of domestication into adjacent areas of Northern Africa and Europe trailed through three main pathways. Firstly, towards Mesopotamia, the broader Eastern Mediterranean Basin and the Southmost Balkans (Pre-Bronze Age). Then, moving via Sicily to Western Europe, and lastly, domesticated grapevine forms reached to Central Europe 1000 years before the common era [9]. Concurrently, introgression of regional forms and secondary domestication events cannot be excluded [10–12]. This millennial ongoing process via asexual propagation, as well as trait introgression through inter- and intra-specific crossings resulted in a plethora of discrete and/or overlapping genotypes.

However, there were milestone eras and events that influenced the plethora of genetic grapevine diversity. Early Muslim conquests in the 7th century, and occupation of the primary as well as secondary domestication regions, introduced an extended period of gradual decline of wine production. This in succession could have forced a subsequent extinction of genotypes used for vinification and a turnover towards table grapes production [13]. A more recent substantial reduction of genetic resources in both wild and cultivated grapevines occurred when grape phylloxera outbroken in Europe at the late 1800s [14]. As a result, several American indigenous *Vitis* species were introduced as rootstocks, or used for incorporating disease resistant traits to hybrids [15]. Hence, varieties that were used as scions were mostly preserved but non-grafted genotypes were likely perished. Nowadays, a third wave of genetic erosion occurs via habitat fragmentation of wild and feral *Vitis* forms, supported by an augmented interest of global wine producers for only a handful of elite varieties [16].

Cyprus was on the map of ancient trade routes and, as a result, on the crossroads of the westward dissemination of grapevines to Europe. Cyprus and viticulture have been inextricably related since antiquity [17]. Regardless of its millennial vinification history [18], there is also one paramount reason why Cyprus is a grapevine hotspot of conserved diversity. The phylloxera pest occurs in all Eurasia, but Cyprus is regarded as a protected zone [19]. This constitutes Cyprus as an invaluable genetic resource reservoir of pivotal significance, since the present genotypes are living fossils and epochal remnants of grapevine domestication and distribution. Furthermore, wild grapevines can still be detected in fragmented populations or sporadically across the country; mainly located near small streams or forests. Despite its significance, up till now there has not been any widespread attempt to properly record the diversity of domestic grapevines.

Moreover, a precise number of varieties and clones is not clearly demarcated and there is quite uncertainty regarding the actual number since synonyms and homonyms can occur. Furthermore, ampelographic data are not largely available. Pierre Galet referred that "among the vines of Cyprus, there are (some) fifteen indigenous varieties": "Xynisteri", "Mavro", "Maratheftiko", "Lefkas", "Opthalmo", "Promara", "Spourtiko", "Flouriko", "Yiannoudhi", "Katomylitiko", "Kanella", "Morokanella", "Michalias", "Skouro mavro", "Rodhino rose", "Rozoudi rose", and "Maroucho

black" [20]. On the other hand, only 12 native cultivars were genotyped by Hvarleva and co-workers [21]. Still, the number of indigenous registered varieties in the national plant variety catalogue is further reduced, probably due to insufficient description.

One of the most emblematic (and globally recognized) dessert wines is Commandaria. Commandaria is an amber-colored traditional wine produced in Cyprus, and it was the first type of wine receiving the controlled appellation of origin certification amid Cypriot wines [22]. One of the aspects that gives Commandaria a unique status comes from the fact that it is one of infrequent cases of wines in production that essentially follow the same principles practiced for millennia. According to Cypriot regulation, Commandaria production is only permitted in an explicit zone of fourteen villages, situated at the slopes of the Troodos Mountain at a spanning elevation from 500 to 900 m.a.s.l. The contemporary vineyard zone covers circa 9000 ha with non-grafted plants. Moreover, Commandaria can be only produced via the vinification of sun-dried grape berries of the two local varieties; Mavro and Xynisteri [23]. Xynisteri is the major white grape Cypriot variety and accounts for 30% of Cypriot vineyards, while Mavro is the prominent black grape variety and occupies almost 40% of Cyprus vineyards. Besides Mavro and Xynisteri varieties that are used in Commandaria, Maratheftiko is also gaining recognition as the most auspicious variety to develop elegant and quality wines. Maratheftiko is a red grapevine variety, and even though it is a cultivated *Vitis* form, it lacks hermaphrodite flowers and is frequently co-planted with other cultivars in order to attain fruit-set and development [24].

Several aspects can affect the outcome of vinification. Among these, primarily the variety (or genotype) and *terroir* are among the most complex and debated issues in viniculture and oenology. The effect of the genotype and the existence of different clones within varieties is often overlooked in studies, hence making it difficult to drive meaningful conclusions (the possible diversity due to genetic background is not considered). Still, clonal selection is extensively practiced in viticulture, signifying that somatic alterations have a substantial unaccounted outcome on berry and wine attributes [25]. Since Cypriot genotypes have been locally cultivated for millennia, it is expected that genetic diversity within variety clusters exist, but largely remain uncharted and unaccredited.

Since a nationwide molecular characterization of the Cypriot germplasm has not been conducted so far, the size of genetic diversity in Cypriot grapevines must be largely underestimated (as indicated by the low and uncertain number of varieties reported, compared to other countries having similar size and geography). Nowadays newly established vineyards are propagated clones of the same genotype; hence, focusing on antique grapevines could in fact be more effective in demarking genetic diversity. The hypothesis when commencing this project was that relic genotypes that are currently considered as a variety (due to the lack of robust ampelographic description and identification) could in fact be landraces (populations), discrete clones, or different varieties that may have a potential in enriching the Cypriot grapevine germplasm. Towards attaining meaningful insights for the Cypriot grapevine landscape, 164 centennial genotypes (putatively belonging to four local Cypriot varieties; Mavro, Xynisteri, Maratheftiko, and Veriko) were sampled across the Commandaria zone and genotyped using 11 nuclear SSRs in order to reveal their genetic variability. This study is the starting point towards the prominence and safeguarding of local indigenous germplasm, and the highlighting of historical knowledge and heritage concerning the millennia viniculture of Cyprus.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

A total of 164 centennial grapevine accessions putatively attributed to the four mentioned populations, Maratheftiko, Mavro, Veriko, and Xynisteri (Table S1), were collected across Cypriot vineyards (Figure 1). Sampling was conducted across the seven wine-zone districts in the Commandaria region and the wine villages of Cyprus, during a spanning period of two years (2018–2019). Vines were carefully chosen through the support of local industry members and vineyard owners to certify a record of vine age, while distinctive phenotypic characters were considered according to the growth stage (Supplementary Data).

**Figure 1.** Map of Cyprus. Shaded area (brown) indicates the Commandaria zone and red rhombi mark the collection sites.

#### *2.2. DNA Extraction Protocol*

For DNA extraction, leaves were excised and kept between humid paper towels on ice, until storage at −80 ◦C. Leaf tissue (100–200 mg) was weighed in 2-mL rounded Eppendorf tubes, flash frozen in liquid nitrogen and freeze-dried overnight. Two stainless steel balls (3 mm) were added in each Eppendorf, samples were quickly frozen in liquid nitrogen, and tissues were crushed for 30 sec (at full speed) using a mixer mill MM 200 (Retsch). Samples were kept at −80◦C until DNA extraction. In preliminary tests, it was established that DNA extraction using standard DNA extraction procedures (CTAB or commercial DNA extraction kits) proved problematic (for multiplex PCR) due to the highly recalcitrant nature of vine tissues. Hence, before DNA extraction, crushed tissues were pre-washed twice with sorbitol wash buffer (Sorbitol wash buffer: 100 mM Tris-HCl pH 8.0, 0.35 M Sorbitol, 5 mM EDTA pH 8, 0.1% (w/v) Polyvinylpyrrolidone PVP-40, 0.1% β-mercaptoethanol) to remove interfering metabolites [26]. After the wash of the inhibitors, cell lysis was achieved using a high-salt CTAB extraction buffer (2% CTAB (hexadecyltrimethylammonium bromide), 100 mM TrisHCl pH = 8, 20 mM EDTA, 1.4 M NaCl, 0.1% (w/v) Polyvinylpyrrolidone PVP-40, 0.1% β-mercaptoethanol). Impurities were further removed using equal volumes of chloroform: Isoamyl alcohol (24:1) and DNA were precipitated with saturated NaCl (1/10th of the lysis buffer volume) and ice-cold ethanol (twice of the lysis buffer volume). The pellet was washed twice with 100 ul of 70% cold ethanol, air dried, and dissolved in 100 μL of dd H2O. DNA concentration and purity were estimated using nanodrop spectrophotometry.

#### *2.3. Microsatellite Genotyping*

Eleven SSR markers were selected for the analysis of the Cypriot germplasm (Table S2). Nine of them are suggested within the European projects Genres081 and GrapeGen06, proposing the use of a common set of microsatellite markers [27]. All forward primers employed were designed with a M13(-21) tail at the 5 -end and carried a fluorescent dye label (FAMTM, ROXTM, or TAMRATM). This enabled the run of a one-tube, single-reaction nested PCR, as previously described [28]. Based on expected allele sizes from databases (Italian Vitis Database, http://www.vitisdb.it/; Vitis International Variety Catalogue VIVC, http://www.vivc.de/) and pilot single locus PCR reactions, three different panels were selected (Table S2), in order to ensure non-overlapping allele sizes across loci. Furthermore, even though green fluorescent dye (JOETM) was initially tested for multiplexing, it was not finally employed, since it was determined that it interfered with red (ROXTM) and blue (FAMTM) spectra, causing minor peak pull-ups that could result to erroneous scoring.

For PCR reactions, the mix contained 50 ng of template DNA, 10 pmol of a labeled M13 tailed forward primer, 10 pmol of the reverse and 2.5 pmol of the forward primer, 0.2 mM dNTPs, 0.5 U KAPA Taq DNA Polymerase (Kapa Biosystems), and a 2.5 mM final concentration of MgCl2 in a 12 μL final reaction volume. Conditions for the PCR amplification were: 94 ◦C (5 min for initial denaturation), followed by 39 cycles at 94/56/72 ◦C (60 s), and a final extension at 72 ◦C for 30 min.

Amplification products were verified using a standard 2% agarose electrophoresis and diluted at a 1:40 ratio with dd H2O. One μL of the dilutions was added to 10 μL deionized formamide and 0.2 μL of DNA size standard (GeneScan 500-LIZ, Applied Biosystems, Foster City, CA, USA), before denaturing at 95 ◦C (5 min). Allele fragments were separated (in three discrete panels) by capillary electrophoresis using an Applied Biosystems 3130® Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Two control varieties (Cabernet Sauvignon and Merlot) and four Cypriot genotypes were replicated in every run, to normalize microsatellite allele sizes and associate genotypes across runs and databases. Allele scoring was performed by two researchers independently, and tandem software was utilized to verify/correct bins [29].

#### *2.4. Genetic Relationships and Analysis of Population Structure*

Microsatellite data curation and formatting was performed via the MS Excel add-in GENALEX v. 6.501 [30]. All genotypes were included for calculating allelic frequencies across loci studied. In order to assess the discriminating power among unique genetic profiles, a genotype accumulation curve was constructed. Additionally, the number of multi-locus genotypes (MLGs), as well as the number of expected MLG (eMLGs) were calculated. Moreover, genotypic diversity was assessed with several indexes (H: Shannon–Wiener Index of MLG diversity, G: Stoddart and Taylor's index of MLG diversity, lambda Simpson's index, E.5: Evenness of the alleles, and Hexp: Nei's unbiased gene diversity).

The same dataset was similarly used to test for linkage disequilibrium and Hardy–Weinberg equilibrium (HWE) in the grapevine accessions. Genetic relationships between individuals (MLGs) were assessed using the dissimilarity distance calculation and visualized as a minimum spanning network (MSN) and a discriminant analysis of principal components (DAPC). All of the above-mentioned statistics/analyses were performed using the Poppr (V. 2.8.5) package [31] and the RStudio suite (V 1.2.5033; R V 3.6.2).

A phylogenetic tree was also constructed using the binary template (converted from allele size) using the R package polysat [32]. An approximate likelihood-ratio test (aLRT) for branch support was achieved by means of the SH-like parameter as previously described [33]. The newick -formatted tree was displayed and manipulated using the iTOL v4 server [34].

Finally, a Bayesian statistic employing method for estimating genetic kinship was performed using Structure 2.3.4 [35]. The admixture model was selected and 20 independent repeats per K value (extending from 1 to 20) were run. Each run involved 100,000 iterations burning period and a post burning simulation of 1,000,000. Validation of the most probable number of clusters K and visualization was achieved using the Clumpak server (http://clumpak.tau.ac.il/).

#### **3. Results**

#### *3.1. Genetic A*ffi*liations across the Cypriot Germplasm*

From the broader Commandaria zone of Cyprus 164 vines putatively attributed to four varieties (Maratheftiko, Mavro, Veriko, and Xynisteri) were genotyped using a primer set of 11 SSR markers. The genotype accumulation curve (Figure S1) indicated that the 11 microsatellite loci were acceptable in order to delineate all the multi-locus genotypes (MLGs) present among cultivars (Table 1). The probability of identity (PI) of two samples to have the same genotype was also calculated for the dataset, and it was concluded that the cumulative capacity of the 11 loci resulted to a PI value of 6.2 <sup>×</sup> <sup>10</sup>−<sup>7</sup> to 7.8 <sup>×</sup> <sup>10</sup>−<sup>10</sup> across populations.

**Table 1.** Genetic variability across Cypriot grapevine populations. Total number of discrete genotypes and genetic diversity indices (clone corrected).


*MLG* = number of observed multi-locus genotypes, *N* = total number of samples, *eMLG* = expected multi-locus genotype, *H* = Shannon–Wiener index of MLG diversity, *Hexp* = Nei's 1978 expected heterozygosity, *Ia* = the index of association, *rbarD* = the standardized index of association \* *p* < 0.001, *Ho* = observed heterozygosity (calculated by GeneAlEx), *He* = expected heterozygosity (calculated by GeneAlEx).

All loci were found polymorphic (Table S3), and in total 102 alleles were detected, varying from six at loci VrZAG112 and VVMD25 up to 16 at locus VVMD28; with a mean of 9.27 alleles per locus (Table 2). Even though many discrete alleles were detected across loci, a few alleles were predominant, while the rest were noticed at lower frequencies (Table S4). After clone correction filtering (removal of redundant genotypes), several diversity indices were established across loci (Table 2).


**Table 2.** Allele summary statistics for the 11 analyzed loci of Cypriot grapevine populations.

*Alleles* = number of observed alleles, *1-D* = Simpson index, *Hexp* = Nei's 1978 gene diversity, *Evenness* = distribution of genotype abundances.

The Simpson index varied from 0.5 for locus VrZAG112, up to 0.83 for locus VVMD28 (mean 0.73); in general, being proportional to the number of alleles detected among discrete loci. Still, across loci the high percentages of heterozygosity and evenness of alleles, elevated the efficiency of these markers to reveal the genetic polymorphism. Indices indicated the high potential for this marker set to define variability in the Cypriot panel of grapevines.

Multi-locus genotype analysis across populations recorded 32 MLGs for Xynisteri, 41 for Mavro, eight for Maratheftiko, and two for Veriko (Table 1). The most frequent MLG across populations was MLG 76 (16 vines) and MLG 81 (14 vines) that were detected for Xynisteri (Table S3). Correspondingly, several accessions were also found to be common in Mavro, (MLG 38 in 14 vines) and Maratheftiko population (MLG 8 in 7 vines), but at a lesser extent.

Even though genotypic abundancy can be primarily inferred from the number of MLGs detected, unequal sample size can cause a partial bias; hence, an estimate of the quantity of genotypes that could be anticipated at the major common sample size established on rarefaction (eMLG) could be more suitable. The expected number of eMLGs was also computed and was found comparable among Xynisteri and Mavro populations (10; Table 1). Still, a somewhat higher amount of genetic diversity was established in the Mavro cluster (*H* = 3.714, *Hexp* = 0.667), even though many accessions shared a genotype.

In order to determine the mode of reproduction across these centennial varieties, a test of linkage disequilibrium was conducted for clone corrected genotypes. Consequently, disequilibrium indices were calculated (*Ia* and *rbarD*); it seems that populations in Cyprus were mostly clonally propagated. Moreover, we explored the probability that loci were under Hardy–Weinberg equilibrium (Table 1). It was established that several loci were in HWE (*p* < 0.5). This suggests that even though at a reduced rate, sexual propagation has also occurred in the lineage of Cypriot grapevine varieties.

#### *3.2. Population Structure of the Main Cypriot Grapevine Varieties*

Estimates of population differentiation across loci (Figure S2) were elevated suggesting that a great amount of genetic diversity still exists within each considered cluster. In order to infer the number of groups of genetically related accessions, a multivariate approach partitioning betweenand within-group components was used. The discriminant analysis of principal components (DAPC; Figure 2A) revealed that a clear genetic distinction across the Cypriot populations studied is evident. Moreover, it was established that significant within-variability also exists. The minimum spanning network (MSN) analysis (Figure 2B) depicted a structure having limited reticulation, indicative of clonal propagation where somatic mutations can result in novel genotypes that are highly affiliated to the core. Still, at low levels, reticulation was evident among genotypes. In general, the four main clusters of genotypes distinguished reflected the discrete genetic background of the four Cypriot strains studied. Interestingly, linear and reticulate relations were depicted within the germplasm of Xynisteri and Mavro populations, while the Maratheftiko cluster had a lower kinship to the core of genotypes.

The population structure among the Cypriot collection was also assessed using the Bayesian procedure STRUCTURE (Figure 3). The optimal for the ad hoc number, based on the second order rate of probability of the likelihood function respecting to Delta K, was attained for K = 2. At low cluster complexities, this analysis, indicated two discrete genetic groups (Delta K = 2229.2). The majority of putatively Mavro, Maratheftiko (black berry varieties), and Veriko (rose berry varieties) accessions were found highly constrained (all having a mean proportion of membership higher than 0.9, and were clustered in the first genetic category (Table S5). On the other hand, genotypes regarded as Xynisteri (white berry variety) had lower affinity to the first group and were affiliated with the second genetic ancestry. Within each population there were instances of admixture genotypes, characteristic of sexual propagation. As the level of K complexity increased (higher K clusters), a finer and a more detailed structure was portrayed (also differentiating Veriko genotypes; data not shown).

**Figure 2.** (**A**) Discriminant analysis of principal components (DAPC) depicting a clear cut-off across populations (DAPC cross-validation values indicated that the proportion of successful prediction (larger than 0.8) was optimum for 10 PCA axes). (**B**) Minimum spanning network (MSN) of the Cypriot grapevine varieties studied. Linear and reticulated affiliations are evident across Xynisteri (red), Mavro (green), Veriko (blue), and Maratheftiko (purple) clusters. Nodal size is proportional to the number of accessions sharing an MLG.

**Figure 3.** Structure analysis for Cypriot grapevine germplasm at K = 2 cluster and delta K probability for Cypriot genotypes. Numbers correspond to the analyzed samples.

Finally, a phylogenetic tree was constructed in order to also visualize the genetic relationships across genotypes using a hierarchical clustering approach with branch support (Figure S3). The dendrogram produced was analogous to the previous analyses, though with a notable distinction. The Maratheftiko cluster was not positioned as an outgroup as in Figure 2, but had a greater affinity to Mavro group, as it was placed among Mavro accessions. Moreover, Veriko accessions were highly affiliated to Mavro genotypes. Still, there were a few instances where several genotypes were sporadically clustered to different than expected groups, showing that misnaming is possible, or that these genotypes could in fact be different than registered varieties.

#### **4. Discussion**

The main scope of the current study was to evaluate the level of existing genetic diversity across Cypriot vineyards in the broader Commandaria area, and possibly identify novel genotypes that could provide evolutionary insights of the Cypriot grapevine germplasm structure. It was also intended to delineate the extent and imprint of genotype variability in one of the world's oldest niches of viticulture and pivotal landmark of grapevine domestication. Despite the fact that Cyprus is proximate to the grapevine center of domestication and a vital region of *Vitis* sp. westward dissemination, a very low number of native varieties has been reported so far [20,21]. Furthermore, it is speculated that the lack of robust ampelographic descriptors permits the unnoticed misuse of several "alike" forms as the same variety. As a result, there is the possibility that grapevine variability in the form of landraces, clones, or novel varieties remains largely unnoticed till today.

Towards that objective, two-year genetic resources collecting across the vineyards of Cyprus in the broader Commandaria zone was initiated and 164 centennial grapevines were obtained. The samples were genetically characterized using a universal set of microsatellite primers at eleven loci. Genotypic fingerprinting using microsatellite markers can deliver a valuable tool that allows for inter-accessions assessment of discrete collections, as well as, the pedigree analysis of hybrids and cultivar certification [36–38].

Among the 164 Cypriot accessions (registered and putatively assigned to the four clusters), a total of 83 multi-locus genotypes (MLGs) were identified (Table 1). Interestingly, 64 MLGs were unique and thus were detected only in one individual. As a consequence, 19 MLGs were found to be common (at different frequencies) among the remaining accessions. Hence, it was established that centennial grapevines of Cyprus are characterized by extensive genetic diversity and a small fraction of the sampled grapevine collection was composed of redundant germplasm. In a recent study, Drori and co-workers [13] collected and characterized grapevine genetic resources in Israel (a proximate to Cyprus region, having similar size and edaphoclimatic conditions). It was also concluded that a large proportion of the *V. vinifera* subsp. *sativa* and *V. vinifera* subsp. *sylvestris* genotypes had a unique genetic profile (about 40%) and did not correspond to known varieties/genotypes.

In viticulture, it is known that grapevine cultivars frequently consist of discrete clones, sharing mutual phenotypic traits and grouped as a variety cluster [39]. If, however, clones belonging to a cluster have traits discrete enough they are considered as different varieties. Nevertheless, several genetically affiliated varieties are very similar morphologically and hence hard to distinguish based on visual observation [40]. Conversely, clones of varieties can considerably differ in several characteristics without a significant change in genetic profiles [41]. Microsatellites can be proven a valuable tool delineating the above-mentioned dilemma, since such markers are extremely informative for attaining the level of heterozygosity across and within grapevine varieties. Across grapevine genotypes, a mean of nine alleles per locus [42] and a maximum of 23 alleles has been reported [41]. In the present study, 11 microsatellite loci were used, and a comparable level of allelic diversity was attained. An average of 9.27 alleles was obtained while in the case of the most polymorphic marker (VVMD28), 16 discrete alleles were recorded (Table 2). Moreover, several genotypes that were putatively assigned to the four clusters had extensive allelic discrepancies; hence, the possibility they are misidentified as the aforementioned cultivars and are in fact discrete varieties cannot be uncritically ruled out.

In genetic studies, when dealing with clonal taxa, analyses can be typically conducted using (or not) clonal correction. Since allele frequencies can be affected by the number of redundant genotypes, clone correction is often advised in order to better depict genetic relationships and kinships, as well as removing potential bias [31]. In the current survey, clone correction was performed in order to robustly estimate genetic indices (Tables 1 and 2). A substantial level of heterozygosity was revealed within the Cypriot varieties (*Hexp* = 0.732). The within cultivar heterozygosity is infrequently reported across studies, but in several cases, it seems that a moderate-to-high proportion of heterozygosity exists. Heterozygosity levels across grapevine clones generally vary from 0.47 for "Tannat" [43] to 0.87 for "Pinot Noir" and "Riesling" [44] cultivars, having a mean of 0.77 [40]. Despite the fact that data across studies are somewhat difficult to compare, since there are discrepancies in the number of loci and accessions employed, several common conclusions can still be attained [16]. In general, even though reports employ grapevine datasets of a few dozens to more than thousand accessions (focusing on eight to eleven loci), average to elevated heterozygosity values are reported [42,45–49]. Thus, it seems that diversity is present within cultivars, as well as across cultivars.

Since Cypriot grapevines have been cultivated for millennia, a high amount of diversity is expected among clones or varieties. Pinot is another characteristic cultivar where several clones are considered, and a great amount of heterogeneity exists. Unfortunately, the lack of precise ampelographic data on Cypriot germplasm complicates the cut-off line that defines the optimal taxonomic status. As a result, the existence of unidentified/misidentified varieties, hybrids, or feral forms considered as a "true-to-type" variety is a possibility. In the current study the predominant MLGs of Xynisteri cluster (MLG 76, MLG 81, and MLG 79) differed in one out of 11 loci, hence probably representing different clones of the same variety. This was also the case for the Mavro (MLG 38 and MLG 36) and Maratheftiko (MLG 7 and MLG 8) clusters, which were also identical in 10 out of 11 loci. Conversely, in the study of Hvarleva et al. [21], genotypes referring to discrete varieties differed at almost half of the analyzed loci in bilateral comparisons. In an equivalent study utilizing clones of Pinot, which is generally considered as one of the oldest varieties, significant discrepancies of SSR fragment lengths within clones of "Pinot gris" (loci VVS2, VMCNG1E1, VMC8A7, VMC7G3, VrZAG79, and VrZAG 25) and Pinot noir (loci VVS2, VVIM10, VMCNG1E1, VMC1F10, VMC2H4, VMV8A7, VMC7G3, VVMD28, and VrZAG 25) were established [50]. Still, in the case of Cypriot germplasm, a clear demarcation of what constitutes a variety, clone, or a landrace (population) is not an easy task, especially due to the lack of robust ampelographic data. This can be depicted in the DAPC (Figure 2A) that clearly demarks four clusters. Still, the within-group variation seems relatively small and a confident conclusion of the cut-off line separating clones from varieties cannot be established. In fact, one variety may consist of a smaller or larger number of similar and more or less related clones. Nevertheless, the findings presented here showcase beyond any doubt that a much larger than currently considered genetic variation is present across Cypriot vineyards and that several grapevines varieties are in fact misidentified by farmers.

The high amount of heterozygosity within Cypriot-cultivated grapevines could also be the result of a recurrent hybridization and parallel domestication events due to the long cultivation history and the proximity to the hotspot of grapevine domestication. In that direction, in our dataset several tri-allelic profiles across markers and genotypes were observed for markers VVMD32 and VVMD5, indicative of hybridization or chimerism. This phenomenon in clonally propagated grapevines such as monumental cultivars could participate in clonal variation and hamper proper variety identification or pedigree analysis [51]. Zarouri and co-workers [52] have also reported that amplification of multiple alleles per locus in one accession is possible. Feral forms and wild grapevines are still present in Cyprus, and sporadically, individual plants have been marked in remote regions near riverbeds.

Unfortunately, microsatellite profiles of Cypriot grapevine genotypes (using a universal primer set) are not available in public databases such as VIVC. Furthermore, partial comparison of the current dataset to the one previously described [21], revealed discrepancies in allele sizes or shifts. Such disparities are often reported in microsatellite markers studies, since inter-laboratory protocols can influence the outcome; the use of different DNA polymerase types, PCR elongation time, template

concentration, or different fluorochromes can affect the addition of an adenine nucleotide at the 3 end, resulting in allele size shifts due to the differences in molecular weight or even spectra pull-ups [53]. Hence, comparison across inter-laboratory studies must be cautiously regarded. Additionally, screening of the profiles attained in the current study to the VIVC database did not reveal similarities. The novelty of these centennial genotypes is further supported by the fact that foreign cultivars were only fairly recently (1970s) introduced into Cyprus [21].

Nonetheless, the genetic relationships of the current dataset largely correlates to previous studies that included Cypriot genotypes [21] even though a different primer set was used. Mavro and Maratheftiko populations seem to be genetically affiliated compared to Xynisteri that was the most genetically distant form, from the core cluster of Cypriot genotypes. Maratheftiko, or locally known as "Vamvakada" (due to a white coating at the back of its leaves resembling cotton; vamvaki is the Greek word for cotton) is an irregular occurrence of grape variety. Maratheftiko is one of the rare cases of grapevine cultivated forms that lacks hermaphrodite flowers. More specifically, its flowers present a well-developed pistil, but the stamens are reflexed; hence, Maratheftiko is incapable of self-pollination. This is the main problem hampering its cultivation in Cyprus despite being a drought tolerant variety and having exceptional vinification capabilities. Maratheftiko is considered among the most promising Cypriot varieties across local wineries since it produces red wines with an intense full body, having soft tannins and unique aromas when harvested and vinified correctly [54]. In the current survey it was clustered as an outgroup, having a limited genetic affiliation to the core of other Cypriot genotypes (Figure 2B). The Hardy–Weinberg equilibrium (HWE) indicates that at least in the case of the Maratheftiko cluster, widespread hybridizations have occurred throughout its lineage. In that direction, previous studies employing SNP markers have described a Levant domestication of the vinifera cluster and have displayed signs of introgression from local *sylvestris* forms as grapevines disseminated to Europe [12]. The extent to which local wild forms contributed to the establishment of the grapevine germplasm globally remains a disputable subject [9].

Still, the indices of association and significant *rbarD* values indicated that Cypriot genotypes were also clonally propagated since a considerable disequilibrium among loci was attained signifying conscious selection of elite genotypes during domestication. The clonal propagation of Cypriot genotypes can also be depicted from the network analysis since a limited reticulation can be observed among clusters (Figure 2), even though at slower rates compared to sexual propagation, clonal propagation can give rise to genetic variation [55]. Clonal polymorphism within perennial species has been mainly attributed to naturally occurring mutations throughout grapevine growth [39].

The "wild" characteristics of the Cypriot germplasm is also supported by vinification features of these genotypes. Xynisteri ('Xyno' is the Greek word for sour) is the predominant white variety and has a sharp taste due to the high level of acids [54]. Other traits indicating that Xynisteri retains some wild type features is the exceptional resilience against drought and elevated temperature [56], which are frequent features across wild *V. vinifera* spp. *sylvestris* forms [57]. In fact, even though domestication has resulted into higher yield, larger berries, and a higher sugar content, this selection was at the expense of biotic and abiotic stress capacity [9]. Recent studies have stressed the significance of wild forms for breeding purposes against mildew pathogens [58,59] or abiotic resilience and berry quality [60]. Past pedigree records and passport data (http://www.vivc.de/) reveal that Xynisteri has been broadly used in hybrid crosses in Israel during the 1930s, establishing a series of Hebron varieties used mainly as rootstocks. Nowadays, Xynisteri is largely tested outside the narrow range of Cyprus, at high temperate-arid regions of Australia, highlighting the ready-to-use potential of Cypriot germplasm for grapevine sustainability purposes [56].

In terms of conservation, several projects for the preservation and highlighting of local genetic resources have been directed in grapevine growing countries [16]. As a result, a significant number of minor varieties have been collected and characterized. Since several elite foreign varieties were only fairly recently (1970s) introduced into Cyprus [21], the Cypriot germplasm largely preserves its original structure throughout antiquity and is a remnant of the westward grapevine domestication and dissemination. In conclusion, the genetic characterization of such genetic resources is of paramount importance not only having a local interest but extendable to global viticulture. Furthermore, in the current study it was portrayed that the existence of a plethora of discrete genotypes is in fact an indication that more Cypriot varieties can exist and the need for proper ampelographic characterization is imperative.

**Supplementary Materials:** All relevant data are appended as supporting information. The following are available online at http://www.mdpi.com/2223-7747/9/8/1034/s1. Supplementary data: Morphological characters used for the discrimination of varieties/ Leaf diversity across accessions. Figure S1: Genotype accumulation curve depicting the efficiency of SSRs in delineating the Cypriot Vitis spp. Genotypes. Figure S2: Genetic variability estimates from clone corrected genotypes. Figure S3: Phylogenetic tree of Cypriot grapevine genotypes. Table S1: Sampling locations and germplasm information across collection sites within the Commandaria zone. Table S2: Primers and panels used for the amplification of alleles across 11 loci. Table S3: Alleles size and MLGs across the 164 Cypriot genotypes sampled and analyzed. Table S4: Frequency of alleles detected for every locus in the Cypriot grapevine collection. Table S5: Ancestry probability of STRUCTURE analysis across the 164 Cypriot genotypes.

**Author Contributions:** A.G. collected the plant material, performed the investigation process, and developed fingerprinting data. G.T. contributed in data acquisition, validation, and reviewed the manuscript. M.H. contributed in data acquisition, validation and analysis, contributed to the discussion of results, and reviewed the manuscript. N.N. conceptualized and supervised the study, curated data and analysis, wrote the final draft. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors would like to thank the Vlassides, Tsiakkas, and Santa Irene wineries for their help with providing the germplasm studied and the anonymous reviewers for their constructive comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **Genetic Structure of Wild Germplasm of Macadamia: Species Assignment, Diversity and Phylogeographic Relationships**

#### **Thuy Mai \*, Mobashwer Alam, Craig Hardner, Robert Henry and Bruce Topp**

Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Brisbane, QLD 4072, Australia; m.alam@uq.edu.au (M.A.); c.hardner@uq.edu.au (C.H.); robert.henry@uq.edu.au (R.H.); b.topp@uq.edu.au (B.T.)

**\*** Correspondence: thi.mai3@uqconnect.edu.au

Received: 15 May 2020; Accepted: 30 May 2020; Published: 3 June 2020

**Abstract:** Macadamia is an Australian native rainforest tree that has been domesticated and traded internationally for its premium nuts. Common cultivars rely upon a limited gene pool that has exploited only two of the four species. Introducing a more diverse germplasm will broaden the genetic base for future crop improvement and better adaptation for changing environments. This study investigated the genetic structure of 302 accessions of wild germplasm using 2872 SNP and 8415 silicoDArT markers. Structure analysis and principal coordinate analysis (PCoA) assigned the 302 accessions into four distinct groups: (i) *Macadamia integrifolia*, (ii) *M. tetraphylla*, and (iii) *M. jansenii* and *M. ternifolia*, and (iv) admixtures or hybrids. Assignment of the four species matched well with previous characterisations, except for one *M. integrifolia* and four *M. tetraphylla* accessions. Using SNP markers, 94 previously unidentified accessions were assigned into the four distinct groups. Finally, 287 accessions were identified as pure examples of one of the four species and 15 as hybrids of *M. integrifolia* and *M. tetraphylla*. The admixed accessions showed the highest genetic diversity followed by *M. integrifolia*, while *M. ternifolia* and *M. jansenii* accessions were the least diverse. Mantel test analysis showed a significant correlation between genetic and geographic distance for *M. integrifolia* (r = 0.51, *p* = 0.05) and a positive but not significant correlation for *M. tetraphylla* (r = 0.45, *p* = 0.06). This study provides a population genetics overview of macadamia germplasm as a background for a conservation strategy and provides directions for future macadamia breeding.

**Keywords:** genetic diversity; DArT markers; macadamia; dendrogram; principal coordinate analysis; population structure; population genetics; wild species

#### **1. Introduction**

The genus *Macadamia* belongs to the Proteaceae family and is composed of four species: *Macadamia integrifolia* Maiden and Betche, *M. jansenii* C.L. Gross and P.H. Weston, *M. ternifolia* F. Muell and *M. tetraphylla* L.A.S. Johnson. The natural distribution of the four species is in the subtropical rainforest from south-east Queensland (QLD) to north-east New South Wales (NSW), Australia [1,2]. *M. integrifolia* and *M. ternifolia* occur in south-east QLD, while *M. tetraphylla* is distributed mainly in northern NSW, with some extension into south-east QLD [3]. Overlapping zones exist between *M. integrifolia* and *M. tetraphylla* and between *M. integrifolia* and *M. ternifolia*, with natural hybridisation occurring in these zones [4]. The fourth species, *M. jansenii*, has been found only in a single location in Bulburin (QLD) that is 180 km north from the nearest population of *M. integrifolia* [5]. Of these four species, *M. integrifolia* and *M. tetraphylla* produce edible nuts, and hence, most of the commercial cultivars belong to either of these two species or their hybrids [6]. The other two species, *M. jansenii* and *M. ternifolia*, have not been used in directed breeding, due to their bitter inedible nuts containing

high levels of cyanogenic glycoside [7–9]. Most current commercial varieties appear to be two to four generations from the wild [1]. The majority of global macadamia production relies upon the cultivars from the Hawaiian breeding program, which is comprised mostly of a limited gene pool of *M. integrifolia* [10,11]. However, wild genetic resources have the potential to provide parents with desirable traits, including small tree size, nuts with thinner shells, resistance/ tolerance to biotic and abiotic factors, etc. [12] Exploring the genetic potential of the wild germplasm will facilitate further exploitation of these resources in trait improvement.

The study of genetic diversity and population structure can determine the potential of wild germplasm in future breeding. The genetic structure is formed over time due to the multiple actions of migration, selection, mutation, and genetic drift [13], as well as the mode and method of reproduction. A diverse species has the opportunity for selection of the fittest alleles while low diversity leads to the risk of extinction [14]. Broadening the genetic base of breeding material requires the identification of diverse parents for crossing with the cultivated crop [14]. Understanding the genetic relationship among the parents is essential to avoid inbreeding depression, particularly for the improvement of complex traits. Therefore, knowledge of the genetic divergence is a prerequisite to maximise heterosis in the breeding progeny [15]. The history of world macadamia breeding is very short, and the existing cultivars are only a few generations from the wild germplasm growing naturally in the rain forest. Recently, a chloroplast genome sequencing project on wild and cultivated germplasm indicated that all major Hawaiian cultivars share a single chlorotype probably derived from a small sample form single location [16]. As current macadamia plantations are mostly dependent on Hawaiian cultivars, this limited sampling suggests there is an opportunity for future genetic improvement by exploiting the diverse genetic resource. In addition, the genetic base can be expanded in breeding material by selecting diverse parents from wild germplasm for crossing with the cultivated crop.

Molecular markers are considered as the most suitable tool to estimate genetic diversity, due to their polymorphic nature and independence to environmental effects [17]. Several molecular marker systems have been developed to study in macadamia and only a few of them were used for the genetic characterisation of wild germplasm. An isozyme-based study was conducted by Aradhya et al. [18] to identify the genetic relationships among 40 cultivars (35 *M. integrifolia*, three *M. tetraphylla*, and two hybrids) and three *M. ternifolia* accessions. Mast et al. [19] studied the relationships among the four *Macadamia* species and their wild relatives, using three cpDNAs (*matK, atpB*, and *ndhF*), three nDNA (*waxy* loci *1* and *2*, and *PHYA*) genomic regions. However, this study used only one accession per species. The genetic structure of a large number of wild germplasm accessions was studied by Peace [4] using low throughput RAF (randomly amplified DNA fingerprinting, dominant) and RAMiFi (randomly amplified microsatellite fingerprinting, co-dominant) markers. All these marker systems have limitations, including a low total number of markers, low marker density, and low genome coverage, and hence, are seldom used in genomic studies. SSRs (simple sequence repeat markers or microsatellite) are considered as one of the best marker systems for genetic studies, with many advantages, such as stability, PCR-based amplification, and relatively low cost [17]. SSRs were used in the genetic diversity study of wild *M. integrifolia* [20] and wild *M. tetraphylla* populations [21,22]. However, there is a limited number of SSR primers available for macadamia, particularly those that successfully amplify across species [23], and as such, may not be effective for a large-scale genetic study of all four wild species.

The rapid advancement of next generation sequencing technology (NGS) enables the discovery of high-throughput and cost-effective molecular marker systems. Using NGS technology, Diversity Array Technology (DArT) developed a marker system that facilities affordable whole-genome level genetic characterisation. DArT has been successfully applied for the genetic diversity, population structure and genetic mapping studies of many crop species [24–26]. Recently, Alam et al. [27] used 11,526 silicoDArTs and 3956 SNPs to study the genetic diversity and population structure of 80 macadamia cultivars. O'Connor et al. [28] reported the genetic diversity, population structure and linkage disequilibrium of 295 seedling progenies from 29 selected parents, using 16,171 silicoDArTs and 4113 SNPs. These

studies suggest that DArTseq markers could be applied for genomic studies in the wild germplasm of macadamia.

In this study, for the first time, we used high-throughput DArTseq platforms for the genetic characterisation of a large number of wild accessions of macadamia. The aims were to: (1) assess the population structure of wild macadamia germplasm, (2) explore the genetic diversity among the accessions within species, and (3) determine the relationship between genetic and geographic distance within *M. integrifolia* and *M. tetraphylla*.

#### **2. Results**

#### *2.1. Quality of DArTseq Markers*

DArTseq platforms generated 13,221 SNP and 47,811 silicoDArT markers. The call rates of SNP markers varied from 0.20 to 1.00, with an average of 0.62 (Supplementary Materials Table S1). Of the 13,221 SNPs, the call rate of 4184 markers (32%) was > 0.80 (Figure 1a). The reproducibility of SNPs varied from 0.86 to 1.00, and most of them (98%) were over 0.95 (Figure 1a). The call rate of silicoDArT markers varied from 0.81 to 1.00 (Table S1). The average call rate was very high (>0.99), with 94% of markers having a call rate over 0.95 (Figure 1a). The range of reproducibility was 0.95 to 1.00, in which 98% of the markers had very high value (0.99) of reproducibility. Mean one ratio was higher in SNPs (0.32) than in silicoDArT markers (0.08) (Table S1). Most silicoDArTs (82%) had one ratio below 0.05, compared with only 28% of SNPs (Figure 1b). Considering the quality parameters: call rate (>0.80), reproducibility (>0.95) and one ratio (>0.05), 2872 SNPs and 8415 silicoDArTs were retained for further analysis (Tables S2 and S3). The remaining markers had PIC values from 0 to 0.5 for both SNPs and silicoDArTs (Figure 1c). Mean PIC was 0.26 for silicoDArTs and 0.22 for SNPs. Only 120 of the SNPs (4%) had low PIC (<0.05), compared with 825 (9.8%) of silicoDArTs (Figure 1c).

#### *2.2. Population Assignment*

The K and Q values from STRUCTURE analysis were used for the assignment of individual accessions in each species/hybrid group. Population clusters in PCoA validated the species representation.

The ΔK from the STRUCTURE analysis of SNP markers was significant when K = 2, 3 and 5, with a peak at K = 3 (Figure 2a). The optimal peak at K = 3 suggested that the 302 accessions in the germplasm were derived from three distinct clusters, as represented by different colours in the structure analysis (Figure 2b, K = 3), here named Cluster I (blue), Cluster II (green) and Cluster III (red). Cluster I was composed of 18 accessions, including eight previously labelled as *M. ternifolia*, two as *M. jansenii*, and eight as undefined species (Table S4). These undefined eight accessions, originally labelled as mixed/hybrid populations, were collected from the natural distribution of *M. ternifolia*. Considering this distribution and their morphological appearance (Thuy Mai, pers. observations), these accessions were classified as *M. ternifolia*. Cluster II contained 99 predefined *M. integrifolia* and 36 accessions of previously undefined species. Cluster III was comprised of 94 predefined *M. tetraphylla* and 38 undefined species. There were 17 accessions, including one predefined *M. integrifolia*, four predefined *M. tetraphylla*, and 12 accessions from planted/unknown/mixed populations that showed the genetic admixture (hybrid) of clusters. For example, the accession "M034", which was previously labelled as *M. integrifolia*, consisted of 6% of Cluster II (predominantly *M. integrifolia*), 16% of Cluster III (predominantly *M. tetraphylla*), but 78% of Cluster I (predominantly *M. jansenii* and *M. ternifolia*). Two previously labelled *M. tetraphylla* accessions ("M265" and "M266") were identified as admixtures of Cluster II & III. The accession "M265" was composed of 61% of Cluster II and 39% of Cluster III, and accession "M266" was composed of 52% of Cluster III.

The pattern displayed for K = 2 (Figure 2b) grouped the accessions of three species, *M. jansenii*, *M. ternifolia* and *M. tetraphylla*, into one cluster, and separated *M. integrifolia* accessions in another

cluster. The pattern displayed for K = 5 (Figure 2b) still did not separate the accessions of the two species *M. jansenii* and *M. ternifolia* but divided the accessions of *M. integrifolia* into smaller sub-clusters.

**Figure 1.** Distribution of SNP and silicoDArT marker data for several quality parameters. (**a**) call rate and reproducibility; (**b**) one ratio and (**c**) polymorphic information content (PIC) value.

**Figure 2.** Population structure of 302 accessions based on 2872 SNPs, as inferred by STRUCTURE. (**a**) Best value of K based on Evanno's ΔK; (**b**) Individual membership proportions (Q value) in two, three and five clusters, with each cluster represented by a colour block. Each vertical line represents one accession. The accessions were sorted on the x-axis by latitude from north to south.

Principal coordinate analyses (PCoA) of SNP markers via distance matrix with data standardization identified three distinct groups of the four species (Figure 3). This result was consistent with the result of STRUCTURE analysis at K = 3. The first two coordinates of PCoA explained 61.09% of total variation in SNPs. The accessions of two species *M. ternifolia* and *M. jansenii* formed Cluster I. Cluster II was formed by the accessions of *M. integrifolia* and Cluster III included the accessions of *M. tetraphylla*. The accession "M034", which was assigned as admixture in STRUCTURE analysis, was clustered in the *M. jansenii*/*M. ternifolia* group and shows a close relationship with two *M. jansenii* accessions. The accession "M160", which was also assigned as admixture in STRUCTURE analysis composing 78% of Cluster III, was clustered in the *M. tetraphylla* group.

**Figure 3.** Principal coordinate analysis (PCoA) of the 302 accessions based on 2872 SNP markers, showing the three distinct groups and the admixtures. The first two coordinates of PCoA explained 34.06% of the total variation.

Finally, based on the STRUCTURE analysis and PCoA, we assigned 302 wild accessions into 287 pure accessions, representing the four distinct species, and 15 admixtures. Pure accessions are composed of 135 *M. integrifolia*, 133 *M. tetraphylla*, and 19 *M. ternifolia*/*M. jansenii* (Table S4).

#### *2.3. Genetic Diversity*

We estimated the genetic diversity parameters among 302 accessions, using both SNP and silicoDArT markers (Table 1). For SNP markers, the number of effective alleles (*Ne*) ranged from 1.08 to 1.34, with the lowest in *M. ternifolia*/*M. jansenii* group and the highest in admixture. Shannon's index (*I*) ranged from 0.11 (*M. ternifolia*/*M. jansenii*) to 0.33 (admixture), with a mean of 0.23. In all clusters, the observed heterozygosity (*Ho*) was smaller than the expected heterozygosity (*He*). *He* was highest in admixture (0.21) and lowest in *M. ternifolia*/ *M. jansenii* (0.07), with a mean of 0.15. Interestingly, accessions from the *M. integrifolia* group showed the highest percentage (86.53%) of polymorphism (%P), followed by admixture (74.93%), *M. tetraphylla* (71.5%), *M. ternifolia*/*M. jansenii* (32%) groups. Similar results were also observed for silicoDArT markers (Table 1).

Nei's genetic distance (D), based on SNP markers (Table 2), ranged from 0.06 between admixtures and *M. integrifolia* to 0.27 between *M. integrifolia* and *M. ternifolia*/*M. jansenii* groups. *M. tetraphylla* accessions shows lower genetic distance with *M. integrifolia* (D = 0.2) than that of *M. ternifolia*/*M. jansenii* accessions (D = 0.23). Hence, the admixture accessions showed almost similar genetic distance with *M. integrifolia* (0.06) and *M. tetraphylla* (0.07) germplasm. Although the estimated value of genetic distance using silicoDArT markers was lower than that of SNPs, the genetic relationship between species was similar in both marker systems (Table 2).

The analysis of molecular variance (AMOVA) showed a higher proportion of variance detected within species than among clusters (Table 3). For SNPs, the percentage of genetic variation within species (55%) was higher than that among species (45%). A similar pattern of genetic variation was observed using silicoDArT markers (Table 3).


**Table 1.** Genetic diversity parameters for Macadamia accessions based on SNP and silicoDArT markers. N = number of accessions, Na = number of different alleles, Ne = number of effective alleles, I = Shannon's information index, Ho= observed heterozygosity, He = expected heterozygosity, %P = percentage of polymorphic loci, SE = standard error.

**Table 2.** Pairwise population matrix of Nei's genetic distance (D) among the clusters of Macadamia wild germplasm, using SNP and silicoDArT markers.


**Table 3.** Summary statistic of AMOVA analysis in Macadamia germplasm using SNP and silicoDArT markers. df = degrees of freedom, SS = sum of squared observations, MS = mean of squared observations, Est. Var = estimated variance, % Var. = percentage of total variance. PhiPT = var. among groups (species)/ (var. among groups + var. within group).


#### *2.4. Phylogeographic Relationships among the Accessions of M. integrifolia and M. tetraphylla*

We identified the genetic relationships within each species of *M. integrifolia* and *M. tetraphylla* presented in the dendrogram (Figure 4). Most accessions from the same locality grouped together, although some accessions from a locality clustered with accessions from other localities. For example, within *M. integrifolia*, the accession "M010" from Gundiah and four accessions from Nambour grouped with Beenleigh accessions (Figure 4a). Two accessions, "M027" and "M037", from Gympie grouped with accessions from Gundiah and Nambour, respectively. Within *M. tetraphylla*, the accession "M263" from Lismore grouped with Murwillumbah, while the accessions "M208" and "M209" from Murwillumbah grouped with Lismore (Figure 4b). Some of the accessions showed an unexpectedly variable branch length compared to other accessions from the same cluster. Accessions with the longest branches represent the most diverged accessions within the population. For example, accession "M148" had ~50% longer branch than other Beenleigh accessions. Similarly, accessions "M036", "M048", "M136" and "M008" of *M. integrifolia*, and "M227", and "M267" of *M. tetraphylla* had significantly longer branches.

**Figure 4.** Unweighted neighbour-joining dendrograms using 2872 SNP markers, showing the genetic relationships among (**a**) 99 *M. integrifolia* accessions and (**b**) 94 *M. tetraphylla* accessions. Localities are represented by different colours.

To explore the genetic basis of the geographic relationship we calculated the correlation between the genetic distance and geographic distances of the localities for each species. The pairwise genetic distance among the accessions of six localities of *M. integrifolia* (Table S5) varied from 0.019 to 0.055, with an average of 0.034. The genetic distance between Gundiah and Gympie was the closest (0.019), while it was the farthest (0.055) between the accessions of Numinbah and Caboolture. The pairwise genetic distances among the accessions of *M. tetraphylla* localities (Table S6) were higher than that of *M. integrifolia*. The range of variation in *M. tetraphylla* was 0.013 to 0.151, with a mean of 0.063. Accessions from locality Murwillumbah and Lismore showed the closest genetic distance (0.013), while the highest (0.151) was observed between the accessions of Beenleigh and Nimbin.

The Mantel test analysis showed a significant correlation between genetic distance and geographic distance (r = 0.51, *p* = 0.05) among *M. integrifolia* localities (Figure 5a). Meanwhile, the correlation between genetic and geographic distance among *M. tetraphylla* localities was positive but not significant (r = 0.45, *p* = 0.06) (Figure 5b).

**Figure 5.** Correlation between genetic and geographical distance among (**a**) six localities of *M. integrifolia*, and (**b**) eight localities of *M. tetraphylla*, based on a Mantel test at 999 random permutations.

#### **3. Discussion**

#### *3.1. Species Assignment of Wild Macadamia Germplasm*

Population structure and PCoA of DArTseq based SNP markers facilitated the assignment of individuals in corresponding species or hybrid groups. Our species classification of each accession matches well with the field note and previous DNA study [4]. Results from this study confirmed that most of the previous phenotypic characterisations were successful for species' identification of wild germplasm.

All but one *M. integrifolia* accessions were clustered together in the same group. "M034", an accession from the "Mooloo" region of the "Gympie" locality, was identified as *M. ternifolia*/*M jansenii*, although it was previously recorded as*M. integrifolia*. A co-investigation of wild germplasm using 15 SSR markers identified that the same accession "M034" is a clone of another accession (X-CANB896104) from Canberra Botanic Garden. Interestingly, accession "X-CANB896104" is recorded as a cutting from wild *M. ternifolia* from Mary Cairncross Park, Maleny (Cathy Nock., pers communication). Further phenotypic characterisation confirmed that "M034" is a *M. ternifolia*. Among the 99 accessions of *M. tetraphylla*, four accessions were assigned as hybrid of *M. tetraphylla* and *M. integrifolia*. Accessions "M265" and "M266" shared almost 50% from each species, whereas "M270" and "M277" had ~30% from *M. integrifolia* genotype and ~70% *M. tetraphylla* (Figure 3, Table S3).

This study clearly demonstrated that the accessions of *M. tetraphylla* and *M. integrifolia* formed two distinct populations. The accessions of *M. jansenii* and *M. ternifolia* clustered together, although they were collected from geographically distant locations. The close relationship between *M. jansenii* and *M. ternifolia* is supported by the previous molecular studies [29,30]. These findings indicate that the accessions of both *M. jansenii* and *M. ternifolia* may have the same genetic lineage, or that one may be an ancestor of the other. Possibly, these two small groups of populations may have separated due to past climatic extremes and adapted as small groups in two distinct locations. It is to be noted that there may be some sampling effect of the small number of accessions of *M. jansenii* (*n* = 2) and *M. ternifolia* (*n* = 16) in our study. Though *M. jansenii* and *M. ternifolia* differ in a few simple traits including leaf tip, leaf serration and flower colour, they are morphologically similar, both with small tree size, and small and bitter nuts [9,31]. However, there may be further debate on the species differentiation of these two small populations. Investigation on evolutionary genetics and time divergence on four *Macadamia* species can be used to confirm the speciation of wild macadamia germplasm.

The species status of previously unidentified accessions, including those of unknown origins, planted germplasm and mixed/hybrid populations were resolved. Out of 94 unidentified accessions, 36 were assigned as *M. integrifolia*, 39 as *M. tetraphylla*, 8 as *M. ternifolia* and 11 as hybrids/admixtures of *M. integrifolia* and *M. tetraphylla* (Table S5). The genotypic classification was consistent with our field observation on phenotypic characteristics of representative species (Thuy Mai, pers communication) The species composition of many modern and heritage cultivars is uncertain [1], but this finding supports the potential of SNP markers to resolve their species status.

Some accessions we identified as *M. tetraphylla* had been collected from further north than the accepted distribution of this species. These were the accessions "M056", and "M057" from population 16 (Palmwoods, Nambour QLD), accession "M054" from population 106 (Mapleton Kenilworth, Nambour QLD) and five accessions from population 36 (Mount Glorious, Caboolture QLD). However, these populations were noted at the time of collection as planted or uncertain populations, and it seems highly likely that their locations were the result of human activity.

#### *3.2. Genetic Diversity of the Four Macadamia Species*

In this study, we developed new knowledge on genetic diversity in wild accessions by using high-throughput silicoDArT and SNPs marker. In both types of markers, the average expected heterozygosity (*He*) across the wild germplasm and clusters forming "pure" species was lower than in previous genetic studies of macadamia cultivars [21–23,32]. However, the cluster of hybrid accessions from wild *M. integrifolia* and *M. tetraphylla*, which is composed of a small number of accessions (*n* = 15), showed greater gene diversity than previous studies on cultivars and pure wild species in the current study. This result indicates that crossing between two species can be conducted, to increase genetic diversity in the future breeding program.

Our results suggested that *M. integrifolia* and *M. tetraphylla* contained two- to three-fold greater diversity than the *M. jansenii* and *M. ternifolia* cluster. Population size has a significant effect on genetic diversity [33]. Generally, smaller population size leads to lower genetic diversity [34]. Extinction and contraction of species' distribution during successive ice ages has resulted in reduced population size and resultant diversity bottlenecks in other Australian flora, such as *Acacia, Banksia, Eucalyptus* etc. [35]. Certainly, *M. jansenii* formed a very small population, with less than 100 individuals comprising the whole species [5,9]. In this study, only a small number of *M. jansenii* (*n* = 2) and *M. ternifolia* (*n* = 17) have been tested. The lower diversity of *M. jansenii* and *M. ternifolia* in our study may be the result of small populations. An investigation with larger sample numbers from diverse distributions of *M. jansenii* and *M. ternifolia* should be conducted, to define their genetic diversity more completely and accurately.

#### *3.3. Phylogeographic Relationship among the Accessions of M. integrifolia and M. tetraphylla*

Geographical distance is one of the major contributing factors in species differentiation. Knowledge of the genetic structure of a species over its geographic distribution is important to develop an understanding of the evolutionary processes [36]. In this study, the neighbour-joining tree, based on

dissimilarity matrix (Figure 6), showed that the accessions from the same geographical area appeared to be grouped together. However, some accessions (e.g., "M010", "M027", "M037", "M263", "M208", "M209") were found to be clustered with accessions of different geographical regions (Figure 6). This result was supported by a previous study on chloroplast genome sequence, where Nock et al. [16] reported the relocation of some accessions within the *M. integrifolia* germplasm. Since the gene flow for both species is restricted within a short distance (~50 km) [1], the impact of environmental parameters, such as water, gravity, and animals like rodents [37], or a result of human activity on the seed transportation [16], could be considered as the cause of relocation.

**Figure 6.** Map showing origins of representative wild-germplasm accessions of *M. integrifolia* and *M. tetraphylla*. The black circles represent six localities of *M. integrifolia* population, and the red circles represent eight localities of *M. tetraphylla* populations.

Phylogeographic analysis of *M. integrifolia* accessions revealed a significant positive correlation between genetic and geographic distance (r = 0.51, *p* = 0.05). This correlation was higher than a previous study using RAF markers [4] (r = 0.16, *p* = 0.016), and in almond germplasm (r = 0.173, *p* = 0.226) [38]. Based on chloroplast genome study, Nock et al. [16] also found a latitudinal population structure among these accessions. We also observed a positive relationship between genetic and geographic distance among the accessions of wild *M. tetraphylla*. Although, this correlation (r = 0.45) is non-significant, it is higher than that found by Peace [4] (r = 0.13). The non-significant phylogeographic relationship in *M. tetraphylla* suggested that geographic distance may not be the main factor influencing the genetic distance between populations of this species, although geographical boundaries, low gene flow and genetic drift are typically key factors explaining genetic differentiation in fragmented populations [39].

The phylogenetic trees (Figure 4) of *M. integrifolia* and *M. tetraphylla* provided an indication of ancestral lineage of the accessions of each species. Interestingly, the root of most of the accessions of both species originated from the population of Numinbah (Figure 4). It is to be noted that Numinbah is an overlapping region of both species, and its surrounding regions are the sources of mixed/hybrid population. We hypothesise that there is a possibility of early divergence of these two species at Numinbah. Later, smooth leaved *M. integrifolia* may have adapted to the north and serrated leaved *M. tetraphylla* may have adapted to the southern regions. Further genomic investigation with more accessions from Numinbah can explain the origin of these two species.

#### **4. Materials and Methods**

#### *4.1. Collection of Plant Samples*

The germplasm field trials were established in 2000 and 2001 in Tiaro, Queensland (QLD) and Alstonville, New South Wales (NSW), Australia. Wild accessions were collected from multiple geographical regions (Figure 6), covering the natural distribution of the four species. Ramets of each accession were propagated clonally as rooted cuttings. During the collecting trip, the germplasm collector (S. Faulkner) classified the populations as "wild", "planted" (trees were cultivated from the local wild trees), "hybrid/mixed", or "uncertain". The accession(s) from each population may belong to more than one of these classifications; therefore, the original description for each accession was noted. Accessions of the rare species *M. jansenii* were added to the field trial in July 2011. From these collections, we studied 302 accessions, including 100 *M. integrifolia*, two *M. jansenii*, eight *M. ternifolia*, 98 *M. tetraphylla*, and 94 accessions of undefined species (from mixed/hybrid populations, uncertain origins or planted germplasm). These accessions originated from 75 populations (one to seven accessions per population, averaging 4.1) across 52 regions (one to three populations per region) from 14 localities (one to nine regions per locality) (Table S7). The accessions were ordered by latitude from north to south and coded from "M001" to "M302".

#### *4.2. DNA Extraction and Genotyping*

Newly flushed young leaves were collected from the ex situ trials at Tiaro and Alstonville in December 2017 and placed in zipped plastic bags inside a cool box with ice blocks. The materials were stored in a cold room at 4 ◦C before transfer to DArT Pty Ltd. (Canberra, Australian Capital Territory, Australia), to perform DNA extraction. A total of 2–3 mg of leaf tissue from each sample was sub-sampled into a 1.1 mL microtube containing a disposable steel ball bearing. Leaf samples were crushed using a QIAGEN Tissue Lyser (Qiagen, Germany). A Freedom EVO robotic Tecan 100 (Tecan Group, Switzerland) was used for DNA extraction following an existing protocol recommended by DArT (https://www.diversityarrays.com/). DNA samples were incubated with loading dye at 37 ◦C for two hours and then checked for quality control on 0.8% agarose electrophoresis gel for 30 min at 100 V.

Accessions were genotyped for dominant silicoDArT and co-dominant SNP markers following an established protocol developed by Kilian et al. [40] An appropriate method (*Pst*I + *Hha*I) of complexity reduction was selected to detect DArTseq-based markers using next generation sequencing (NGS) technology. The silicoDArT and SNP markers were scored as a binary data format in which the

score was "1" for presence, "0" for absence and "-", for failure to score of a marker in the genomic representation of each sample. The full details of methodology of developing DArTseq-based markers for macadamia were previously described by Alam et al. [27] and O'Connor et al. [28]

#### *4.3. Analysis of Processed Marker Data*

DArTseq platforms generated SNP and silicoDArT markers. DArTsoft v7.4 software was used to automatically identify and score the polymorphic markers, using a proprietary marker calling algorithms. The quality of the markers was tested for call rate (%), reproducibility (%), one ratio, and polymorphic information content (PIC). The call rate determines the success of reading the marker sequence across the samples, and was estimated from the percentage of samples for which the score was either "0" (absence of marker) or "1" (presence of marker). The scoring of reproducibility involves the proportion of 138 technical replicates for which the marker score exhibited consistency. One ratio was determined as the proportion of samples for which genotypes were scored as "1". PIC is the degree of diversity of the marker in the population and shows the usefulness of the marker for linkage analysis. Quality control and filtering were applied to both SNP and silicoDArT markers, including call rate (>80%), reproducibility (>95%), and one ratio (>0.05).

#### *4.4. Analysis of Population Structure and Genetic Diversity*

The population structure of the 302 accessions of four species was identified based on SNP markers by using the Bayesian model-based program STRUCTURE v2.3.4 [41–44]. A burn-in length of 10,000 cycles and Markov chain Monte Carlo (MCMC) of 50,000 runs were set for the structure analysis. Cluster values (K) ranging from one to 10 were performed, each with ten different iterations. Results from STRUCTURE were uploaded to Structure Harvester [45], through http://taylor0.biology.ucla.edu/ structureHarvester/, to determine the optimum number of populations (best value of K) using ΔK value, as described by Evanno et al. [46] The individual ancestry proportion (Q value) at the best K value was determined for each accession. Based on Q values, the accessions were identified as pure or admixed species: accessions with Q value more than 0.8 were considered as "pure" and accessions with Q value less than 0.8 were assigned as "admixture" [14,47].

The genetic diversity and principal coordinate analysis (PCoA) were performed using GenAlEx v6.5.2 software [48]. Both SNP and silicoDArT markers were used to calculate the diversity parameters, including the mean number of alleles per locus (Na), number of effective alleles (Ne), observed and expected heterozygosity (Ho and He), Shannon's information index of diversity (I) and the percentage of polymorphic loci (%P). Pairwise population matrix of Nei's genetic distance [49] and the analysis of molecular variance (AMOVA) were also estimated.

#### *4.5. Phylogeographic Relationships Analysis*

*M. integrifolia* and *M. tetraphylla* are the two major species used in current breeding programs. In this study, we have identified a large number of accessions of *M. integrifolia* and *M. tetraphylla*. Therefore, a detailed study was undertaken on the genetic relationships and geographical diversity of those species. Only accessions identified as "pure" representatives of each species that were confirmed by structure analysis, including 99 *M. integrifolia* accessions from six localities (Figure 6, black circles), and the 94 *M. tetraphylla* accessions from eight localities (Figure 6, red circles), were used. The "pure" accessions collected from planted wild germplasm or unknown origin were not included. DARwin v6.0 software [50] was used to estimate pairwise Jaccard's genetic dissimilarity indices using 2872 SNP markers. A dendrogram was constructed by clustering accessions, based on a dissimilarity matrix using the unweighted neighbour-joining method. Clade strength in the dendrogram was tested using 100 bootstraps.

The correlation between genetic distance and geographic distance was estimated. The Nei's genetic distance matrix and the pairwise geographic distance among the localities were calculated using GenAlEx v6.5.2 [48]. The Mantel test was used to determine the correlation between genetic and geographic distance, using 999 random permutations.

#### **5. Conclusions**

This is the first study to investigate the genetic structure of a large collection of wild macadamia germplasm using thousands of high-throughput molecular markers. A total of 302 wild accessions were characterised, using 2872 SNP and 8415 silicoDArT markers. Our population structure and principal co-ordinate analyses identified three distinct populations, in which *M. jansenii* and *M. ternifolia* formed a single cluster. The Nei's genetic distance analysis clearly demonstrated that *M. jansenii* and *M. ternifolia* are related and showed greater heterozygosity in *M. ternifolia* than in *M. jansenii*. However, the limited number of accessions available in this study from *M. jansenii* and *M. ternifolia* limits the strength of our conclusion on the diversity and population structure of these species. We suggest that further analysis with more accessions from these two species should be conducted, to increase understanding of genetic diversity and clarify their classification as distinct species. We observed the significant correlation between genetic and geographic distance among *M. integrifolia* populations. Additionally, we were able to confirm the species identity of unknown wild accessions and suggested the use of these markers to resolve the unclear species composition of domesticated cultivars.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/6/714/s1, Table S1. Summary of markers; Table S2. List of SNP markers; Table S3. List of silicoDArT markers; Table S4. Q value at K = 3, Table S5. Genetic distance of *M. integrifolia*; Table S6. Genetic distance of *M. tetraphylla*; Table S7. 302 accessions information.

**Author Contributions:** Conceptualization, B.T. and T.M.; Methodology, T.M. and M.A.; Software, T.M. and M.A.; Validation, T.M.; Formal Analysis, T.M.; Resources, T.M.; Data Curation, T.M. and M.A.; Writing—Original Draft Preparation, T.M.; Writing—Review and Editing, M.A., C.H., R.H., B.T. and T.M.; Visualization, T.M.; Supervision, B.T., M.A., R.H. and C.H.; Project Administration, B.T. and M.A.; Funding Acquisition, B.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research has been funded by Hort Innovation, using the Macadamia research and development levy and contributions from the Australian Government. Hort Innovation is the grower owned, not-for-profit research and development corporation for Australian horticulture. PhD scholarship for TM was provided by Vietnam International Education Development (The Key Biotechnology Program—Ministry of Agriculture and Rural Development) and The University of Queensland.

**Acknowledgments:** The authors acknowledge Julia Playford for initiating the germplasm collection. Thanks to Agnelo Furtado (QAAFI), Jodi Neal (DAF), Katie O'Connor (QAAFI), Cathy Nock (Southern Cross University), Diversity Arrays Technology staff, Macadamia team and DAF staff, for their guidance and comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Availability Statement:** The SNP and silicoDArT markers generated and analysed during this study are obtainable from The University of Queensland's Institutional Data Access/Ethics Committee, but restrictions apply to the availability of these data. The dataset "DArTseq markers of wild macadamia species" is available at https://doi.org/10.14264/uql.2018.395, for researchers who meet the criteria for access to confidential data. Contact data@library.uq.edu.au.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

### **Genetic Diversity and Population Structure of** *Rhododendron rex* **Subsp.** *rex* **Inferred from Microsatellite Markers and Chloroplast DNA Sequences**

#### **Xue Zhang 1,2,3,4,**†**, Yuan-Huan Liu 1,**†**, Yue-Hua Wang 1,2,3 and Shi-Kang Shen 1,2,3,\***


Received: 7 February 2020; Accepted: 5 March 2020; Published: 7 March 2020

**Abstract:** Genetic diversity is vital to the sustainable utilization and conservation of plant species. *Rhododendron rex* subsp. *rex* Lévl. is an endangered species endemic to the southwest of China. Although the natural populations of this species are facing continuous decline due to the high frequency of anthropogenic disturbance, the genetic information of *R. rex* subsp. *rex* is not yet elucidated. In the present study, 10 pairs of microsatellite markers (nSSRs) and three pairs of chloroplast DNA (cpDNAs) were used in the elucidation of the genetic diversity, population structure, and demographic history of 11 *R. rex* subsp. *rex* populations. A total of 236 alleles and 12 haplotypes were found. A moderate genetic diversity within populations (*HE* = 0.540 for nSSRs, *Hd* = 0.788 for cpDNA markers), high historical and low contemporary gene flows, and moderate genetic differentiation (nSSR: *FST* = 0.165\*\*\*; cpDNA: *FST* = 0.841\*\*\*) were detected among the *R. rex* subsp. *rex* populations. Genetic and geographic distances showed significant correlation (*p* < 0.05) determined by the Mantel test. The species exhibited a conspicuous phylogeographical structure among the populations. Using the Bayesian skyline plot and species distribution models, we found that *R. rex* subsp. *rex* underwent a population demography contraction approximately 50,000–100,000 years ago. However, the species did not experience a recent population expansion event. Thus, habitat loss and destruction, which result in a population decline and species inbreeding depression, should be considered in the management and conservation of *R. rex* subsp. *rex*.

**Keywords:** *Rhododendron*; conservation strategies; genetic differentiation; gene flow; populations contraction

#### **1. Introduction**

*Rhododendron* is the largest woody plant genus in Ericaceae, containing more than 1000 recognized species, of which 567 species representing six subgenera are known from China [1]. Wild *Rhododendron* species are the major components of alpine and subalpine vegetation and widely distributed in America, Europe, and Asia, which have tropical to polar climates [2,3]. Therefore, these species are potential genetic resources for the development of new cultivars that can adapt to diverse environmental conditions [4]. In addition, plants in the genus *Rhododendron* L. produce numerous chemical constituents and are recognized as an important source of bioactive phytochemicals [5]. Some *Rhododendron* species are used as traditional medicine in China, India, Europe, and North America against various diseases, such as inflammation, pain, skin ailments, common cold, and gastrointestinal disorders [5]. However, as an important natural resource for human daily life and ecosystem composition, most *Rhododendron* species are facing risk of extinction due to the high frequency of anthropogenic disturbance [6]. Thus, research on the population genetic information of *Rhododendron* species is undoubtedly beneficial for germplasm protection and sustainable utilization [6–9].

Inferring genetic information is recognized as the undisputed basis for the sustainable exploitation and conservation of plant diversity [10,11]. Different molecular markers are used in assessing genetic information and identifying distinct plant populations for management and conservation [12–14]. Microsatellite markers (SSRs) are used in revealing the genetic characteristics and related influence factors of plant species at individual and population levels due to their desirable advantages [13,15]. Chloroplast DNA (cpDNA), which is transmitted only through seeds in most angiosperms, is exceptionally conserved in gene content and organization, providing sufficient information for genome-wide evolutionary studies [16]. cpDNA can reveal a more highly geographical structure than a nuclear genome [17] and is generally used in the detection of phylogeographical patterns in plant species [18,19]. Thus, nSSRs and cpDNA were extensively and successfully documented to study the genetic diversity, variation, and population demographic of plant species [17,20–22].

Habitat loss and destruction are global problems that continue to threaten global biodiversity [23,24]. Firstly, habitat destruction and loss can cause a decline in the distribution range and population and limit the natural regeneration of a species. Secondly, habitat destruction and loss can increase selfing rates and decrease pollen diversity, thereby affecting a species's reproductive success [23,25]. Finally, habitat destruction and loss increase genetic drift and inbreeding and reduce gene flow in the fragmented populations of species and substantially decrease species genetic diversity and adaptation to the changing environment. Some studies suggested that woody plants are less likely to lose genetic diversity after habitat fragmentation and destruction than herbaceous species [26]; however, recent reports showed that habitat loss and fragmentation are associated with increased level of inbreeding, reduced gene flow, genetic variation, plant progeny quality, and genetic extinction debt in woody species [24,27]. Thus, understanding the current genetic information of endangered woody plants subjected to habitat loss and destruction is necessary for effective conservation and management.

*Rhododendron* species are not only popular woody ornamental plants but also play an important role in alpine and subalpine ecosystems. In addition, *R. rex* is an important wild germplasm source of the genus *Rhodendron* in China and an endangered plant endemic to the southwest of China [1]. Three subspecies (*R. rex* subsp. *rex*, *R. rex* subsp. *gratum*, and *R. rex* subsp. *fictolacteum*) are recognized in the *R. rex* complex. Recently, the wild populations of *R. rex* subsp. *rex* are facing continuous decline due to the high frequency of anthropogenic disturbance and forest destruction. Genetic information is important to the management and sustainable exploitation of species, particularly those threatened by habitat loss and destruction. However, the genetic diversity and structure of the wild populations of *R. rex* subsp. *rex* remain unexplored. In the present study, the genetic diversity and differentiation, population structure, and demographic history of 11 *R. rex* subsp. *rex* populations are inferred using 14 pairs of microsatellite markers and three cpDNA sequences. The following central questions are addressed: (1) What is the level of genetic diversity in *R. rex* subsp. *rex*? How does they apportion among/within the populations? (2) How is the genetic structure of the remnant population? Are they affected by historical and contemporary gene flows? (3) How is the phylogenetic relationship of haplotypes? Are they reflected by the demographic history in *R. rex* subsp. *rex*? This result is used to design optimum management strategies for *R. rex* subsp. *rex* conservation.

#### **2. Materials and Methods**

#### *2.1. Plant Material Sampling*

We collected 212 individuals of *R. rex* subsp. *rex* from 11 natural populations. Four of these populations (BJS, DLT, BCL, and JZS) with 63 individuals were distributed in Yunnan province, whereas seven populations (QLB1, QLB2, QLB3, GDX, LJS, LZS, and YS) with 149 individuals were distributed in Sichuan province, China (Table 1). Our sampling locations covered all the herbarium sampling sites and documented sites of *R. rex* subsp. *rex*. During field sampling, sampled site, sampled individuals, and altitude were recorded (Figure 1 and Table 1). Fresh leaves were collected from individuals of *R. rex* subsp. *rex* separated by a minimum distance of 15 m and then dried in silica gel immediately. The samplings were stored at −4 ◦C until DNA extraction.

**Figure 1.** Distribution of chloroplast DNA (cpDNA) haplotypes (**A**); map of the geographic distribution of nuclear microsatellite clusters when the assumed cluster numbers are (**B**) *K* = 3 and (**C**) *K* = 6 in 11 populations of *Rhododendron rex subsp. rex*.



#### *2.2. DNA Extraction, PCR Amplification, and Sequencing*

We extracted genomic DNA of *R. rex* subsp. *rex* from the silica-dried leaves through a modified cetyltrimethyl ammonium bromide (CTAB) method [28]. Purified DNA was amplified by three universal cpDNA sequences (*rbc*L, *mat*K, and *psb*A*-trn*H). A total of 14 SSR markers were selected from recently developed nuclear microsatellites in *Rhododendron* subg. *Hymenanthes* according to their clarity and reproducibility (Table S1) [29–31]. PCR amplification was performed in accordance with methods of Zhang et al. [1]. Forward SSR primers were labeled with a fluorescent dye (FAM, TAMRA, or HEX) and visualized by an ABI 3730xl automated sequencer at Sangon Biotech Services Company Ltd. (Shanghai, China). Fragment sizes were read with the GeneMapper version 4.0. CERVUS [32] was used in eliminating four loci as existing null alleles (*F*(Null) > 0.4) [33]. PCR products by three cpDNA intergenic spacers were sequenced in both directions by Sangon Biotech Services Company Ltd. (Shanghai, China).

#### *2.3. Data Analysis*

#### 2.3.1. Data Analysis of Microsatellite Markers

The dataset was edited and formatted with GenAlEx ver. 6.3 [34]. We used Genepop ver. 4.1.4 to test the Hardy–Weinberg equilibrium (HWE) for each locus and population [35]. The universal genetic diversity parameters were calculated using GenAlEx ver. 6.3 [34] and POPGENE ver. 1.32 [35]. Then, rarefied allelic richness (*Ra*), total diversity (*HT*), and the level of gene differentiation (*GST*) among *R. rex* subsp. *rex* populations were estimated by FSTAT ver. 2.9.3 [13,36]. Analysis of the molecular variance (AMOVA) was implemented in the estimation of genetic variation by using Arlequin ver. 3.11 [37,38], and *FST* values with 10<sup>3</sup> permutations were calculated for the assessment of genetic differentiation between the pairwise populations of *R. rex* subsp. *rex*.

The historical gene flow (*Nm*) between the pairs of *R. rex* subsp. *rex* populations was calculated using Wright's principles using formula *Nm* = (1 − *FST*)/4*FST* [39]. In addition, pollen to seed gene flow ratio (mp/ms) was calculated using the Ennos formula [40]. To estimate contemporary migration patterns, we estimated the contemporary inter-population migrations in *R. rex* subsp. *rex* using the BayesAss version 3.0 by 3 <sup>×</sup> <sup>10</sup><sup>6</sup> Markov chain Monte Carlo (MCMC) iterations, with a burn-in of 10<sup>6</sup> iterations and a sampling frequency of 2000 by setting delta at 0.15 (the default value) [41–43].

Isolation by distance was examined in GenAlEx ver. 6.3 on the basis of the correlation of a geographic distance for pairwise populations with *FST*/(1 − *FST*) value [34]. Population structure was accessed through unweighted pair group mean analysis (UPGMA) and principal coordinate analysis. TFPGA ver. 1.3 with 5000 permutations [44] and GenAlEx ver. 6.3 [34] were used, respectively. The Bayesian clustering analysis with an admixture model to understand the population structure of *R. rex* subsp. *rex* using STRUCTURE ver. 2.2 was also explored [22,45]. *K*-values in the model ranged from two to 15 with 20 independent variables for each set with a burn-in of 1 <sup>×</sup> <sup>10</sup><sup>5</sup> iterations and 1 <sup>×</sup> 10<sup>5</sup> subsequent Markov chain Monte Carlo steps [45]. The final best-fit number of the clusters was determined by Δ*K* values in STRUCTURE HARVESTER ver. 0.6.8 [46,47].

By performing a heterozygosity excess test, we explored the demographic history of the populations. We used two different models, namely, stepwise mutation and two-phased models, to construct the recent bottleneck statistic in BOTTLENECK ver. 1.2.02 (Sign and Wilcoxon tests) [48]. We further analyzed the genetic bottleneck with Garza–Williamson index (GWI) in Arlequin ver. 3.11 [38]. GWI lower than the critical *Mc* value of 0.68 indicated a reduction in population size [1,38,49].

#### 2.3.2. Data Analysis of cpDNA Sequences

We used SeqMan II [50] and Bioedit ver. 7.0.4.1 [51] to treat the raw sequencing data and manually edited and assembled these sequences [22]. Three cpDNA intergenic spacers of *R. rex* subsp. *rex* were combined using PAUP 4.0 [52].

The haplotypes and variable sites of combined cpDNA sequences were calculated by DnaSP ver. 5.0 [53]. *Nei*'s nucleotide diversity (*Pi*) and haplotype diversity (*Hd*) indices of *R. rex* subsp. *rex* were tested within a population and among populations. The haplotype distribution in each sampled population was plotted by ArcGIS ver. 10.2. In addition, we calculated *HT* and within-population gene diversity (*HS*) with Permut ver. 1.0 [22]. The two values of population differentiation *GST* and *NST* were computed in accordance with the methods described by Pons and Petit [54] and with the same software. AMOVA of cpDNA sequences was constructed with Arlequin ver. 3.11 [37,38].

A genealogical haplotype network was constructed by Network ver. 4.2.0.1 for the estimation of the relationship per haplotype, and an indel was treated as a single mutational event [55]. The phylogenetic relationships of the as-obtained haplotypes of *R. rex* subsp. *rex* were inferred by Bayesian methods and neighbor-joining method in MrBayes ver. 3.1.2 [56]. *Nerium oleander* (EU916729.1, GQ997664.1 and AY899942.1) was selected as the outgroup species.

The evolutionary rates of seed plants (1.01 <sup>×</sup> <sup>10</sup><sup>−</sup>9) were used for each Beast analysis in BEAST ver. 1.6.1 [57–59] with 10<sup>7</sup> iterations and a burn-in of 10<sup>6</sup> under the Hasegawa–Kishino–Yano (HKY) model and a strict clock. The most suitable model (HKY) was determined by Mega ver. 6.06 [60]. The results were visualized using the software FigTree ver. 1.4.2. The signatures of demographic changes in *R. rex* subsp. *rex* populations were assessed. We calculated pairwise mismatch distribution, neutrality tests (Tajima's D and Fu's FS), the sum of squared deviations and the raggedness index, and their *p*-values using DnaSP ver. 5.0 [53] and Arlequin ver. 3.11 [38].

#### 2.3.3. Analysis of Species Distribution Model

Species distribution models were constructed for the identification of the species' potential distribution during the last glacial maximum (LGM; ~21–18 ka) at present and in the future (model rcp45 for the years 2050, model rcp85 for the years 2070) by MAXENT v. 3.3.3k [61]. For each time period, models were run for 25 replicates, and default parameters were used. A total of 28 points comprised our 11 sampled sites and 17 records compiled in the Chinese Virtual Herbarium (www.cvh.org.cn), and 19 bioclimatic variables were obtained from the WorldClim database [62].

#### **3. Results**

#### *3.1. SSR Data*

We identified 169 alleles at 10 polymorphic loci among 11 *R. rex* subsp. *rex* populations, ranging from eight (R-40, R-49) to 30 (R-30), with an average of 16.9 alleles per locus (). All loci and populations conformed to HWE (*p* > 0.05; Table S3). At the locus level, genetic diversity and variation exhibited certain dissimilarities. However, no remarkable difference was detected between populations (Table 2). *NP* varied from 2 (BJS) to 12 (DLT and JZS), *Ra* varied from 3.071 (BJS) to 4.231 (JZS), *AE* varied from 2.011 (BJS) to 3.954 (YS), and *I* varied from 0.740 (BJS) to 1.319 (YS). The minimum values of *HO* (0.300) and *HE* (0.399) occurred in population BJS. The mean value of fixation indices (*Fis* = 0.171; Table 2) was positive for *R. rex* subsp. *rex* populations, suggesting a slightly increased level of inbreeding.

**Table 2.** Genetic diversity of populations in *R. rex* subsp. *rex.*


Note: *NA*, mean number of alleles; *AE*, number of effective alleles; *I*, Shannon's information index; *HO*, observed heterozygosity; *HE*, expected heterozygosity; *NP*, number of private alleles; *Ra*: rarefied allelic richness; *Fis*, fixation index; *PPB* (%), percentage of polymorphic loci.

AMOVA indicated that 83.53% genetic variation occurred within populations, whereas 16.47% variation was estimated among the populations (Table 3). Genetic differentiation was observed among populations (*FST* = 0.165, 0.15 < *FST* < 0.25).


**Table 3.** Analysis of molecular variance (AMOVA) based on 14 microsatellites and three cpDNA sequences in *R. rex* subsp. *rex.* d.f.: degrees of freedom.

Note: \*\*\* *p* < 0.001, most significant difference.

A high level of historical gene flow of pairwise populations was detected in *R. rex* subsp. *rex* (Table 4). The minimum gene flow was generated from populations BJS and QLB2 (0.307), whereas the maximum gene flow was generated from populations QLB2 and QLB3 (7.452). The relative contribution of mp/ms was 24.775, indicating that pollen dispersal played an important role in the gene flow of *R. rex* subsp. *rex*. Except for other populations migrating to LJS, a non-significant level of inter-population contemporary migration rate between the populations of *R. rex* subsp. *rex* was detected (*M* < 0.05, Table 5).

**Table 4.** Historical gene flows between 11 populations of *R. rex* subsp. *rex.*


**Table 5.** Contemporary migration rate between populations of *R. rex* subsp. *rex* by BayesAss with 95% confidence intervals.


Note: population->: population migration into the other populations.

The optimal *K* value was 3 with Δ*K* of 63.924, and the second fit value was 6 with Δ*K* of 16.473 according to STUCTURE analysis (Figure S1A,B). At *K* of 3, the populations GDX, YS, and JZS were similar, BJS and DLT were related, and the remaining populations BCL, LJS, LZS, QLB1, QLB2, and QLB3 comprised one group (Figure 1B; Figure S1A). At *K* = 6, the populations BCL and LZS were further distinguished from LJS, QLB1, QLB2, and QLB3, and JZS was further distinguished from GDX and YS (Figure 1C; Figure S1A). This result was in accordance with the conclusions of UPGMA (Figure S1C) and principal component analysis (PCA) at the population level (Figure 2A). In conclusion, the populations of *R. rex* subsp. *rex* should be grouped into three groups according to the genetic structure analysis results by using SSR data. A significant correlation between genetic and geographic distances was determined by Mantel test (*p* < 0.050; Figure 2A).

**Figure 2.** Principal coordinate analysis (**A**) and the plot of geographical distance against genetic distance (**B**) for *R. rex* subsp. *rex* by SSR data analysis.

As shown in Table 6, most of the probabilities of the Wilcoxon and Sign tests under both models in *R. rex* subsp. *rex* populations were non-significant (*p* > 0.05). In addition, the allele distribution per population was presented as a normal L-shaped distribution. The above-mentioned results indicated that the *R. rex* subsp. *rex* populations conformed to the mutation–drift equilibrium. However, GWI values were lower than the critical *M*c indices (0.68), implying that the populations of *R. rex* subsp. *rex* underwent a demographic reduction in history.


**Table 6.** Bottleneck analysis of 11 populations in *R. rex* subsp. *rex.*

Note: \*\* *p* < 0.01, significant difference.

#### *3.2. cpDNA Sequence*

The three cpDNAs, *mat*K, *psb*A-*trn*H, and *rbc*L were 828, 398, and 658 bp in length, respectively (GenBank accession numbers: MN228019-MN228483). The 1884-bp combined cpDNA sequences of *R. rex* subsp. *rex* had 18 polymorphic sites and 12 haplotypes (H1–H12) (Table 1). The detailed haplotype distribution per population is displayed in Figure 1.

The populations DLT (0.538 and 0.00031) and GDX (0.514 and 0.00115) exhibited additional maximum values of *Hd* and *Pi* per site, followed by QLB2 (*Hd* = 0.264, *Pi* = 0.00028) and QLB3 (*Hd* = 0.264, *Pi* = 0.00028), whereas no diversity was found in the seven remaining populations (Table 1). In summary, the total *Hd* and *Pi* for *R. rex* subsp. *rex* were 0.78768 and 0.00180, respectively. The value of *HT* (0.909) was higher than that of *HS* (0.337), and the value of *NST* (0. 0.774) was significantly higher than that of *GST* (0.629; *p* < 0.05; Table S4). These results indicated the remarkable phylogeographic structure among the populations of *R. rex* subsp. *rex*. AMOVA indicated that 84.07% genetic variation was partitioned among populations, whereas 15.93% was partitioned within populations (Table 3). This result was inconsistent with the result of nSSRs data. Moreover, significant genetic differentiation was observed among *R. rex* subsp. *rex* populations (*FST* = 0.841, *p* < 0.001).

The phylogenetic relationships of 12 cpDNA haplotypes are shown in Figure 3A. H7, H8, H9, and H12 were grouped into one clade. Among the remaining haplotypes, H1, H2, and H3 were grouped into one clade, while H4, H5, H10, and H11 were grouped into another clade. H6 was differentiated from the remaining others. However, the result of the haplotype network diagram shown that H6 distributed more closely to H5, and the 12 cpDNA haplotypes should be divided into three groups (Figure 3B).

**Figure 3.** Bayesian tree (**A**) and the network of haplotypes (**B**) based on combined cpDNA sequences. (**A**) The numbers on branches indicate the posterior probability; (**B**) the size of the circles corresponds to the frequency of each haplotype, and the vertical branches indicate mutational steps.

Only the Fu and Li'*D* yielded significantly positive values (*p* < 0.05; Table S5) according to the neutrality test. This result indicated that no recent population expansion in *R. rex* subsp.*rex* occurred, and this was supported by the effects of mismatch distributions shown in the multimodal graph (Figure 4A). Based on the Bayesian analysis, the skyline plot indicated that the historical demographic of *R. rex* subsp. *rex* populations experienced a contraction event approximately 50,000–100,000 years ago and had no recent expansion (Figure 4B).

**Figure 4.** Mismatch distribution (**A**) and Bayesian skyline plot based on combined cpDNA sequences (**B**). (**A**) The solid lines show expected values, whereas the dashed lines represent observed values under a model of sudden population expansion. (**B**) The black line indicates effective population size fluctuation throughout.

#### *3.3. Species Distribution Model*

According to predictions of *R. rex* subsp. *rex*'s past, present, and future potential distributions, the predicted current distributions showed a clear range contraction relative to the LGM distributions. Moreover, the moderate habitat suitability (>0.31) was slightly removed to the northeastern direction (Figure 5A,B). The potential distribution with moderate to high habitat suitability (>0.31) for the years 2050 and 2070 was slightly extended compared with the present-day model (Figure 5C,D).

**Figure 5.** Distribution dynamics of *R. rex* subsp. *rex* using MAXENT. Predicted distributions are shown for (**A**) the last glacial maximum (LGM), (**B**) the present, (**C**) the year 2050, and (**D**) the year 2070. Color-coded keys represent different habitat suitability.

#### **4. Discussion**

#### *4.1. Genetic Diversity in R. rex Subsp. rex Populations*

The genetic diversity of species reflects its long-term evolution and adaptation demographic history [63]. Based on nSSR data, *R. rex* subsp. *rex* has lower genetic diversity (*HE* = 0.540) than the other species of *Rhododendron*, such as *R. protistum* var. *giganteum* (*HE* = 0.602) [9], *R. jinggangshanicum* (*HE* = 0.642) [64], *R. simsii* (*HE* = 0.754) [65], *R. ripense* (*HE* = 0.800) [66], and *R. brachycarpum* (*HE* = 0.815) [67], but has higher genetic diversity than *R. ferrugineum* (*HE* = 0.500) [68]. Meanwhile, the genetic diversity of *R. rex* subsp. *rex* is evidently higher than that of the "narrow" species (*HE* = 0.420) and lower than that of the "widespread" species (*HE* = 0.620) [69]. Inconsistent with the results of microsatellite markers, the genetic diversity of *R. rex* subsp. *rex* (*Pi* = 0.00180) assessed by cpDNA shows a higher tendency toward high genetic diversity than the genetic diversities of 20 species of *Rhododendron* sect. *Brachycalyx* (insular species, *Pi* = 0.00040; continental species, *Pi* = 0.00160) in East Asia [70], but a lower tendency than the genetic diversity of bird-dispersed arctic–alpine plant *Vaccinium vitis-idaea* in Ericaceae (*Pi* = 0.00240) [71]. The value of *HT* estimated in *R. rex* subsp. *rex* (0.909) is higher than the mean value of *HT* (0.6747) in 170 plant species according to cpDNA [19,72]. Therefore, *R. rex* subsp. *rex* possesses a relatively moderate genetic diversity compared with the other species in *Rhododendron*. In general, life span, reproductive mode, and breeding system are the important factors in genetic diversity [6,22,69]. As in other outcrossing and long-lived species in *Rhododendron*, high historical gene flow among ancestral population mitigates the loss of genetic diversity and further results in a moderate or high genetic diversity in remnant populations [69,73]. Thus, the current levels of genetic diversity in *R. rex* subsp. *rex* may be attributed to the species' long-lived habit, which is similar to other perennial woody plants [9,22].

#### *4.2. Genetic Di*ff*erentiation and Structure among R. rex Subsp. rex Populations*

The *FST* value of *R. rex* subsp. *rex* indicated that a moderate genetic differentiation among populations occurred. A total of 83.75% genetic variation occurred within *R. rex* subsp. *rex* populations with regard to nSSR markers, whereas 83.53% variation was partitioned among populations with regard to cpDNA sequences. This discordance should be affected by dispersal mechanisms among populations of plant species in *Rhododendron* [74,75]. Insect visitors are the primary pollen dispersal vectors for *Rhododendron* species. Various insect vectors evolved longer dispersal distance for pollen, whereas the seeds dispersed by wind traveled less than 10 m albeit in open landscapes [74]. Meanwhile, this different consequence might be related to the type and evolutionary rates of different genome sequences [76]. In general, the evolutionary rate of nuclear genomes transmitted by parents was higher than that of maternally inherited chloroplast genomes [77]. Therefore, cpDNA variations reflected a past change, whereas nSSR variations inferred recent events in the population demographics of *R. rex* subsp. *rex*.

On the basis of genetic structure analysis by SSR data, the populations of *R. rex* subsp. *rex* were grouped into three groups, and the correlation between genetic and geographic distances was significant (*p* < 0.05). Phylogenetic trees and genealogical haplotype networks based on cpDNA sequences showed that three reciprocally lineages were detected. This species possessed unique genetic lineages and endemic cpDNA haplotypes in its separate refuge populations. The closely related haplotypes H1, H2, and H3 were distributed in populations BJS, BCL, and DLT; H4, H5, H6, H10, and H11 occurred primarily in populations LJS, JZS, LZS, QLB1, QLB2, and QLB3; H7, H8, H9, and H12 were only detected in populations GDX and YS.

Habitat dislocation and overexploitation accelerate the generation of genetic differentiation among populations [9,13]. In the sampled regions, large-scale land reclamation and unreasonable forest destruction can be observed, which resulted in habitat loss and fragmented distribution in *R. rex* subsp. *rex* natural populations. In addition, gene flow is a fundamental micro-evolutionary force, influencing genetic differentiation among populations [78,79]. The contemporary gene flow of *R. rex* subsp. *rex* is lower than that of the related species of *R. protistum* var. *giganteum* [9], which plays an important role in the formation of genetic structure and differentiation among *R. rex* subsp. *rex* populations. Moreover, breeding system is an important factor for the genetic differentiation and structure of a species [6,22,69]. Although both selfing and outcrossing are detected in *Rhododendron* species [80,81], the breeding system in *R. rex* subsp. *rex* is yet to be explored. Based on the positive value of fixation indices (*Fis*) and the phenomenon of all populations deviated from HWE in the present study, we can reasonably speculate that inbreeding is present in populations of *R. rex* subsp. *rex*. Hence, the mating system and its influences on genetic differentiation and structure must be further elucidated in *R. rex* subsp. *rex*.

#### *4.3. Population Demographic History of the R. rex Subsp. rex*

Exploring the historical demography of a species can facilitate our knowledge of its ancient evolutionary environment [58]. Quaternary glaciers profoundly affected the distribution and genetic variation of plant species. Tremendous global climatic oscillations during quaternary glaciations with several glacial–interglacial cycles caused the expansion and contraction of plant distribution [82]. Most plants were subjected to population demographic stability or expansion throughout the LGM [83,84]. The Bayesian skyline plot of cpDNA showed that *R. rex* subsp. *rex* experienced a notable reduction approximately 50,000–100,000 years ago. This supposition is supported by the GWI values, which are lower than the critical *M*c indices (0.68). Microsatellite-based bottleneck analysis indicated that no recent bottleneck event occurred in the natural populations of *R. rex* subsp. *rex*. Therefore, the population demographic contraction detected in the *R. rex* subsp. *rex* might have been the result of climate oscillations, and the finding is consistent with the results obtained from other species, such as *Cycas simplicipinna* [63]. Typically, rapid population expansion occurred in the post-glacial period because temperatures increased to warm conditions [85]. However, based on neutrality and mismatch distribution tests, no recent population expansion occurred in *R. rex* subsp. *rex*. We speculate that the populations of *R. rex* subsp. *rex* might have survived in situ rather than migrating long distances to suitable habitats and that evolutionary adaptation might have occurred in the cold environment. The current existing populations of *R. rex* subsp. *rex* were limited in distribution at 2400–3400 m elevation, and this condition might partly support our speculation. In addition, the complex topology of physical environmental condition in southwest China might cause geographical barriers between population migrations. This scenario was also found in the population demography of *Leucomeris decora* [86].

#### **5. Conclusions**

The present study firstly investigated the genetic diversity, population structure, and demographic history of 11 remnant *R. rex* subsp. *rex* populations. A moderate genetic diversity, a high genetic differentiation, and a conspicuous geographical structure were detected in *R. rex* subsp. *rex*. The species possessed unique genetic lineages and endemic cpDNA haplotypes in its separate refuge populations. In addition, we found that *R. rex* subsp. *rex* experienced a population contraction approximately 50,000–100,000 years ago based on the comprehensive analysis of demographic history. Furthermore, no recent population expansion occurred in this species. Hence, the conservation of *R. rex* subsp. *rex* should focus on habitat destruction and loss, resulting in a population decline and inbreeding depression within populations. Furthermore, all the remnant adult trees of *R. rex* subsp. *rex* should receive priority protection for the maintenance of its genetic diversity. This research exhibited tremendous ecological value for the future conservation and sustainable utilization of *R. rex* subsp. *rex* and other similar plants, which are subjected to climate oscillation, inbreeding depression, overexploration, and habitat destruction.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/3/338/s1: Figure S1. Bayesian inference of the number of clusters when *K* = 3 and *K* = 8 (A), delta *K* values obtained (B), and an unweighted pair-group method with arithmetic averages (UPGMA) phenogram of *R. rex* subsp. *rex* (C) based on nSSR; Table S1. The information of 14 microsatellite primers for the *R. rex* subsp. *rex*; Table S2. Summary of the 10 microsatellite loci used to the 11 populations of *R. rex* subsp. *rex;* Table S3. *p*-Value of Hardy–Weinberg equilibrium test for 11 populations of *R. rex* subsp. *rex;* Table S4. Genetic diversity and differentiation parameters for the combined cpDNA in 11 populations of *R. rex* subsp. *rex;* Table S5. Parameters of neutrality tests based on cpDNA of *R. rex* subsp. *rex.*

**Author Contributions:** S.-K.S., Y.-H.L., X.Z., and Y.-H.W. initiated and designed the research. S.-K.S. obtained funding for this study. S.-K.S., Y.-H.L., and X.Z. collected the materials and performed the experiments. S.-K.S., Y.-H.L., X.Z., and Y.-H.W. wrote and revised the paper. All authors read and approved the version to be published. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financially supported by the National Key Research and Development Project of China (2017YFC0505204), the National Natural Science Foundation of China (31560224, 31870529), the Young Academic and Technical Leader Raising Foundation of Yunnan Province (2018HB035), the Open Fund of Yunnan Key Laboratory for Plateau Mountain Ecology and Restoration of Degraded Environments (2018DG005), and the Program for Excellent Young Talents, Yunnan University.

**Acknowledgments:** The authors thank Xiu-Yan Feng from the Kunming Institute of Botany, Chinese Academy of Science for her construction suggestions. We also thank Fang-Li Liu, Xiong-Li Zhou, and Yue Zhang for their assistance with the field sampling.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Genetic Distinctiveness Highlights the Conservation Value of a Sicilian Manna Ash Germplasm Collection Assigned to** *Fraxinus angustifolia* **(Oleaceae)**

**Loredana Abbate 1,**†**, Francesco Mercati 1,\*,**†**, Giuseppe Di Noto 2, Myriam Heuertz 3, Francesco Carimi 1, Sergio Fatta del Bosco <sup>1</sup> and Rosario Schicchi <sup>2</sup>**


Received: 24 July 2020; Accepted: 12 August 2020; Published: 14 August 2020

**Abstract:** The cosmopolitan genus *Fraxinus* comprises about 40 species occupying several habitats in the Northern Hemisphere. With some species hybridizing and sharing genetic variants, questions remain on the species assignment of germplasm within the genus *Fraxinus* despite numerous species-specific assessments. A multidisciplinary approach was employed to provide a definitive insight into the genetics of an endangered *Fraxinus* "manna ash" collection, located in a rich plant biodiversity hotspot of the Madonie Mountains (Sicily). Although the collection size was small, genetic diversity, assessed by chloroplast (cpSSR) and nuclear (nSSR) microsatellites (SSR—Simple Sequence Repeats), allowed identifying three different chloroplast haplotypes, with one (H5) dominant, and several polymorphic loci, able to discriminate most of the local accessions studied. Molecular data were linked to cytofluorimetric and phenotypic evaluations and, contrary to popular belief that manna ash is *Fraxinus ornus* L., the germplasm currently used for manna production belongs to *Fraxinus angustifolia* Vahl. Interestingly, joint analysis of our genetic panel with a large European dataset of *Fraxinus* spp. suggested the presence of a possible glacial refuge in Sicily, confirming its importance as biodiversity source. Our results will be helpful for the design of long-term conservation programs for genetic resources, such as in situ and ex situ conservation, seed collection and tree reintroduction.

**Keywords:** *Fraxinus* spp.; manna; local varieties nSSR; cpSSR; cytometry; morphological traits

#### **1. Introduction**

The genus *Fraxinus* (Oleaceae) comprises 45–65 tree species and is represented across large areas of Europe, Asia and North America [1,2]. *Fraxinus* species show considerable variability in their flowering biology, ecological requirements and distribution ranges as a result of dispersal and vicariance processes as well as adaptive evolution underlying diversification in the genus [3].

In Europe, three ash species are present: the common ash, *Fraxinus excelsior* L., the flowering ash, *Fraxinus ornus* L., and the narrow-leaved ash, *Fraxinus angustifolia* Vahl [4]. *F. excelsior*, a polygamous species with male, female and hermaphrodite individuals, [5,6], is found throughout the continent except in the Mediterranean region, while *F. ornus* is androdiecious [3] and grows in relatively high

altitude areas of the eastern Mediterranean basin [7,8]. Finally, *F. angustifolia*, a close relative to *F. excelsior*, shows hermaphrodite flowers and is very common around the Mediterranean basin and to the west of the Black Sea in the Danube basin. Due to the level of local adaptation [9], different botanical names were assigned to *F. angustifolia.* Three groups can be identified at subspecies level, structured by geographical regions [7,10]: *F. angustifolia* ssp. *oxycarpa* (Bieb. ex Willd.) Franco and Rocha Afonso observed in the in East Central and Southeastern Europe, including the Balkans; *F. angustifolia* ssp. *syriaca* (Boiss.) Yalt. in the east of Europe, from Turkey to Pakistan; and *F. angustifolia* ssp. *angustifolia* in southwestern Europe. While *F. excelsior* grows on a wide range of soil types, preferring nutrient-rich substrates, *F. angustifolia* grows near surface and ground waters, and thus its habitat is more restricted [11,12]. Indeed, the distribution of this species in the Mediterranean region is irregular and limited to smaller and isolated populations on drier sites at higher altitudes or on wetland sites [13]. The levels and distribution of genetic diversity in *F. excelsior* and *F. angustifolia* reflect the differential climatic and ecological affinities of both species, as well as the signature of ancient and contemporary hybridization [4,14].

Sicily boasts several botanical highlights. Both trees and endemic herbs survived through the last glacial maximum, highlighting the role of Sicily as a refuge area for thermophilous European plants during glacial times [15]. In addition Sicily, the biggest region of Italy, has been identified as one of 52 putative glacial refuges based on phylogeographic data from trees and herbs, and lies within one of ten regional hotspots of plant species biodiversity in the Mediterranean basin [16].

The presence of the genus *Fraxinus* in Sicily is restricted mainly to scattered groves and natural populations of *F. angustifolia* and *F. ornus* spread in the hilly area of the Madonie and Nebrodi Regional Natural Parks, at altitudes between 100 and 900 m a.s.l. In this area, the relevance of *Fraxinus* spp. is extremely high, from economical, botanical, naturalistic and historical points of view [17,18]. In fact, the *F. ornus* cultivation, brought to Sicily at the time of the Arab invasion (ninth-eleventh century) and known as manna ash, is grown for the production of manna, a crystal coagulate exuding after cutting 5–10 cm long slits with a billhook along and inside the whole thickness of the bark. From these slits a purplish and bitter liquid flows out. After contact with the air, this liquid turns white and becomes sweet, rapidly coagulating and forming crystal-like layers, called manna. Then manna is harvested and placed in specific trays for drying. Its production varies from 0.2 to 1.5 kg per tree per year, depending on ash cultivar [18]. However specific pedoclimatic conditions (persistent dry, hot periods and steady ventilation during summertime) are mandatory for the bioaccumulation of osmotic sugars in the phloem of *Fraxinus* and to obtain the complete drying of the sap that exudes from cuttings. The most abundant constituent of manna is mannitol (around 50% of total chemical compounds), a hexavalent alcohol known also as "manna sugar"; monosaccharide, oligosaccharide, mineral and volatile constituents are also well represented [18,19]. Due to its characteristics, the manna is used in cosmetic, pharmaceutical and confectionary industries [20]. In Sicily, the presence of *Fraxinus* groves has given rise to a traditional agroforestry landscape of unique value and relevant economic importance until the late 1950s. In the last 50–60 years, after the introduction of synthetic mannitol, the integrity and the extension of the manna ash landscapes have been dramatically reduced. Today, because of habitat fragmentation, naturally poor regeneration and land use change, the maintenance of the crop is threatened. The present landscape of Sicilian manna *Fraxinus* is limited to a fragmented and isolated area of few hundreds of hectares in the territory around Castelbuono and Pollina (Palermo-Italy), in which natural populations, single and scattered trees are distributed. Large differences in several phenotypic traits, such as leaf morphology, inflorescence features and infructescence characteristics, are observed among the different plants. However, based on their distinctive morphological features, several cultivars (or varieties) were distinguished and identified with dialect names, since the nineteenth century [17,20–22]. The declining status of the manna ash germplasm has been confirmed by the last survey on the residual consistency of the crop in the Madonie Mountains [17,18]. In less than 150 years, about 50% of the germplasm of the manna ash cultivars has been lost; therefore, both the germplasm of the cultivated varieties and the ash cultivation are threatened with extinction. Previous studies,

based on morphological and chemical analysis, confirmed the presence of 16 cultivars, 13 of which putatively belonging to *F. angustifolia* and the remaining three putatively belonging to *F. ornus* [17]. However, the value of these approaches to characterize manna cultivars and varieties is limited due to their drawbacks, such as the influence of environment on trait expression, epistatic interactions and pleiotropic effects. Therefore, there is an urgent need to focus on the unknown genetic background, structure, relationships and diversity of the *Fraxinus* gene pool distributed in Sicily, to provide useful information aimed both to prevent biodiversity loss, safeguarding this endangered plant populations, and develop future breeding programs.

Microsatellite (SSR, simple sequence repeat) markers are codominant, highly polymorphic, reproducible, well distributed throughout the genome and suitable for automated analysis, and are; therefore, widely used markers for plant variety characterization [23–28]. SSRs have proved to be ideal tools, in conjunction with analysis of morphological characteristics, for analyzing genetic diversity and structure and evolutionary relationships between populations of rare, isolated and endangered species. Chloroplast DNA sequences are suitable tools to highlight the postglacial colonization routes [4,29,30] because they are usually non-recombinant in angiosperms and they are transmitted through seeds only [31,32], implying that the effective population size is more reduced for chloroplast DNA than for nuclear DNA. Nuclear and plastid DNA microsatellites have been extensively used to describe the phylogeography of *Fraxinus* spp. and clarify the hybridization process among ash species [4,14,33,34].

Nuclear DNA content has an important role in systematics, and it is a useful tool in biodiversity evaluation [35,36]. Flow cytometry is a quick and effective method to assess genome size (i.e., the amount of DNA in cell nuclei) [37,38]. Studies of genome size variation can shed light on the molecular mechanisms underlying this type of variation [39–41], which is a relevant indicator of genetic divergence and speciation processes [42,43].

In this study, we used morphological traits, genetic profiles (nuclear and plastid SSR markers) and genome size estimates (nuclear DNA content) to characterize the biodiversity within and among native accessions of *Fraxinus* in Sicily, belonging to a historical manna ash collection that, from previous studies [18], seems to be represented by varieties mainly characterized as *F. angustifolia*. We also present the first assessment of evolutionary relationships between the *F. angustifolia* germplasm from Sicily and *F. angustifolia* germplasm from throughout Europe. Knowledge of the extent of genetic diversity and characterization of population genetic structure of native *Fraxinus* spp. in Sicily will have important implications for the conservation of this biodiversity, to provide useful knowledge for the management of the manna ash germplasm collection, in-situ/ex-situ conservation and to save this precious cultural heritage.

This analysis is a first step towards selecting breeding material and establishing conservation strategies to sustain and potentially increase the overall production and productivity of manna ash and the continual use of this agroforestry tree resource through varietal improvement and suitable agronomic practices under southern Mediterranean conditions.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

Leaf samples of native manna ash (*Fraxinus* spp.) germplasm located in Madonie National Park (Palermo, Sicily; 37◦53 N 14◦01 E/37.883◦ N 14.017◦ E) were collected. Individual trees were selected according to specific criteria based on morphological features, historical notes and their use for manna production [17,18,21,22,44]. The list of accessions studied, and their dialectical name, is indicated in Table 1 and Table S1. At least three individual trees (replicates) were collected for each local variety studied, except for monumental trees. In total thirty-four (34) trees (accessions) were analyzed. Four *F. ornus* trees, collected in the same area, were also added in the study and used as reference. Young and fully expanded leaves were immediately frozen in liquid nitrogen and conserved at −80 ◦C

until used for DNA extraction. This sample set, comprising 38 accessions (Table S1) belonging to manna ash varieties and *F. ornus* (FOR) collected in Sicily, is named sample set #1 from here on.

To full investigate genetic relationships and population genetic structure among the analyzed manna ash accessions and natural populations of *Fraxinus* spp., our sample set (#1) was added to SSR profiles already available. To avoid redundancy in the results, this second sample set (#2) included only samples showing unique genetic profiles isolated from sample set #1, and SSR unique profiles of 475 individual ash trees described in Gérard et al. (2013) [14], belonging to *F. angustifolia* (148; FAN), *F. excelsior*(274; FEX) and hybrids between *F. angustifolia* and *F. excelsior*(53; HYB) collected in the natural, partially overlapping, distribution ranges of the two species across Europe (Table S2). This second sample set is named sample set #2 from here on.

#### *2.2. Morphological Trait Variation*

For each accession from set #1, including four *F. ornus* trees, a set of morphological traits were evaluated (Table S1). The most discriminant morphological features among *Fraxinus* species (e.g., the samara stalk length, the number of samaras/raceme, the colour of apical buds) were recorded; three replicates for each trait were evaluated.

#### *2.3. Genome Size and Ploidy Level*

The genome sizes of collected accessions (Table 1) were estimated by flow cytometry using the *F. ornus* (2C-value = 1.98 pg) as reference standard. Nuclei isolated from a single mature leaf were analyzed in three technical replicates for each accession of set #1. The analysis was carried out with a Partec PAS flow cytometer (Partec, http://www.partec.de/), equipped with a mercury lamp. Fully expanded leaves (1 cm2) were chopped in a glass Petri dish with 1 mL nuclei extraction buffer OTTO1 [45] and 3 drops of Tween 20. After 3 min, 1 mL of OTTO2 [45] supplemented with DAPI (4 μg/mL) was added. The solution was filtered through a 30 μm Cell-Trics disposable filter (Partec). The relative fluorescence intensity of stained nuclei was measured on a linear scale, and 4000–5000 nuclei for each sample were analyzed. DNA content histograms were generated using the Partec software package (Partec-FlowMax®, Münster, Germany).

#### *2.4. DNA Extraction and Genotyping*

Genomic DNA was extracted from 100 mg of powdered, frozen, young leaf tissue of each individual from sample set #1 using the QiagenDNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The purity and quantity of the DNA extracts were assessed with a NanoDrop 1000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). Molecular investigations were carried out by amplifying 6 plastid (chloroplast) microsatellite (SSR, simple sequence repeat) markers (cpSSR) (ccmp2, ccmp3, ccpm4, ccmp6, ccmp7, and ccmp10 [46]), 3 plastid DNA regions (atpB-rbcL, CPFRAX6 and matK [33]) and 11 nuclear microsatellite nSSR (FEMSATL4, FEMSATL11, FEMSATL12, FEMSATL16, FEMSATL19 [47], M230 [48], EST-SSR326, EST-SSR427, EST-SSR431, EST-SSR520, EST-SSR528 [49]) markers, respectively. PCR reactions were performed following the procedures reported in Garfì et al. (2013) [50], using primers fluorescently labelled with FAM, VIC, NED and PET. The fragments were separated by capillary electrophoresis using an ABI PRISM 3130 Genetic Analyzer (Applied Biosystems) to detect length polymorphism only. Fragments were sized and binned into alleles using Gene Mapper v. 4.1 software (Thermo Fisher Scientific, Waltham, MA, USA), (Applied Biosystems) (Table S2).

#### *2.5. Data Analysis*

Plastid DNA haplotypes were defined based on the concatenation of fragment length polymorphism at cpSSRs and amplified plastid DNA regions. To assign the species of origin of the sampled materials, the obtained haplotypes were compared visually with previously published cpSSR profiles from the three European ash species reported in Heuertz et al. (2006) [4].

Genetic relationships between nSSR profiles of both sample sets (set #1; set #2) were estimated using Bruvo's distance [51] in the poppr package [52] in R Core Team (2020; http://www.R-project.org). A dendrogram was computed from each distance matrix using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) implemented in the adegenet package [53] in R. Bootstrap analysis was performed based on 1000 replicate samples to assess the robustness of the inferred evolutionary relationships in the dendrogram.

After establishing that Sicilian manna ash is *F. angustifolia* based on the listed analysis, we assessed the genetic differentiation between Sicilian manna ash and the collection of European *F. angustifolia*, *F. excelsior* and their hybrids using Wright's fixation index (*Fst*) [54] and Nei's (1973) distance [55] computed through hierfstat [56] in R. The genetic variability across *F. angustifolia* (FAN accessions belonging to set #2 was estimated using the observed (*Ho*) and expected (*He*) heterozygosities [55], Shannon's index (*I*) and the inbreeding coefficient (*Fis*) for sampling locations in GenAlEx 6.5 [57]. Each locus and population was tested for Hardy-Weinberg equilibrium deviation with the exact test through Genepop v. 3.4 [58], with the default parameters (dememorization = 10,000, number of batches = 100, and number of iterations/batch = 5000). To compare the genetic diversity of population studied with different sample sizes the allelic richness (*AR*) within sampling locations was also evaluated though SPAGeDi software [59]. Population pairwise fixation index (*Fst*) values were then computed for only *F. angustifolia* accessions, grouped based on their sampling location or country and a neighbor-joining (NJ) tree using was developed using the adegenet package [53].

Finally, to evaluate the genetic structure of *F. angustifolia* distributed across Europe (belonging to set #2 a discriminant analysis of principal components (DAPCs) was employed. Samples belonging to *F. angustifolia* were grouped based on their geographic area of origin (in five main groups: Balkans, Eastern Europe, France, Italy and Portugal) and DAPC analysis, implemented in the adegenet package [53], was carried out to infer population subdivision of the germplasm studied. The number of principal components (PCs) retained was evaluated using the cross-validation procedure. The *K*-means algorithm "find.clusters" was used to independently verify the assignment of individuals to clusters.

#### **3. Results**

Ploidy level, DNA content and morphological trait variation of the manna ash collection DNA content of the 34 accessions belonging to the manna ash collection and of four *F. ornus* individual trees (set #1) was estimated based on the relative intensity of fluorescence using the known *F. ornus* genome size (1.98) as reference [60]. DNA content estimates were congruent with all individuals of set #1 being diploids (2*n* = 2 × = 46) [61]. Flow cytometry-derived genome sizes (pg/2C) for manna ash varieties ranged from 1.539 to 1.544 (Table 1).

Broad variability in several phenotypic traits related to leaf morphology, inflorescence and infructescence features were observed among the different local varieties belonging to the Sicilian manna ash germplasm collection (Table S1). Leaf shape, number of leaflets, leaflet margin, petiole length and rachis wings were highly variable (Table S1). Also the inflorescences showed a great range of variation. Differences between varieties were mainly observed in the structure of reproductive organs, including the terminal clustering of multiple carpels, the partial basal fusion of individual carpels and the attachment and orientation of the seeds (Table S1). The mean values of some morphological traits showed most high variability between the manna ash collection and the *F. ornus* reference trees (Table 1). Specifically, the largest phenotypic variability was found in the number of samaras/raceme (5.8 vs. 46.0), followed by samara stalk length (1.1 vs. 0.4) and apical leaf length/width ratio (0.28 vs. 0.52) (Table 1). Moreover, flowering time (November–January for manna ash vs. May–June for *F. ornus*) and the colour of apical buds (red in manna ash vs. white in *F. ornus*) were clearly distinct for manna ash and *F. ornus* (Table 1).


*Plants* **2020**, *9*, 1035

**Table 1.** Genome sizes and

morphological

 traits of studied samples, belonging to manna ash collection. *F. ornus* was used as reference. In bracket the number of

\* Mean values. \*\* Significantly different at the 0.01 probability level between local varieties and *F. ornus*.

#### *3.1. Genetic Variation of Plastid DNA Microsatellites (cpSSRs) and Amplified Regions in the Manna Ash Collection*

Five (ccmp3, ccmp4, ccmp7, atpB/rbcL and matK) out of nine plastid DNA markers were monomorphic in set #1, displaying fragment lengths of 97, 140, 117, 157 and 253 bp, respectively. The other loci showed low but significant levels of polymorphism (Table 2): Three size variants, with amplification fragment sizes of 364, 365 and 366 bp, were observed at CPFRAX6, whereas two distinct size variants separated by one and four nucleotides were showed at ccmp6 and ccmp2, respectively. Finally, ccmp10 displayed three specific size variants, with amplicons of 103, 104 and 106 bp (Table 2), respectively. The size variants combined into a total of 4 haplotypes (Table 2). Except for "Verdello", the samples belonging to each local variety showed one specific haplotype. The four haplotypes detected in set #1 represented three of the twenty-two previously characterized cpSSR-based haplotypes [4]: Manna ash accessions carried either a sub-variant of H5 (92% of accessions) or haplotype H10 (8% of accessions), these haplotypes having previously been observed in *F. angustifolia* and in *F. excelsior*; the *F. ornus* reference trees carried H19, a haplotype private to *F. ornus* (Table 2).


*Plants* **2020**, *9*, 1035

CPFRAX6, atpB/rbcL and matK profiles.

#### *3.2. Genetic Diversity of Manna Ash Germplasm*

The genetic diversity of the manna ash germplasm and *F. ornus* genotypes, the so called set #1, was investigated using 11 nSSR (Table S2). Phylogenetic analysis based on Bruvo's distance and the UPGMA algorithm generated a dendrogram that comprised four main clusters across set #1 (Figure 1). A total of 20 unique SSR profiles were detected (16 from manna ash collection and 4 for *F. ornus* accessions, respectively), with "Frassino monumentale" as the most distant sample among the manna ash genotypes. Except for varieties "Baciciu", "Macigna" and "Nsiriddu", all accessions were assigned to their expected local variety (Figure 1), showing "Verdello" as the local variety with the highest genetic variability. In addition, the accessions analyzed belonging to "Sarvaggio", "Nivuru", "Russo" and "Abbassa cappeddu" can be considered clones, respectively, showing the same genetic profile within the same variety. The four reference trees of *F. ornus* grouped together in a cluster that behaved like an outgroup.

**Figure 1.** Genetic relationship developed using the profiles obtained from eleven nSSR (Table S2) among studied local varieties (34 samples) collected in Sicily, belonging to the manna ash germplasm. Four genotypes belonging to *F. ornus* were used as references. The dendrogram was developed using the UPGMA and Bruvo's distance [51].

#### *3.3. Genetic Diversity of Fraxinus spp. in Europe*

In order to compare the genetic relationships and diversity between the manna ash collection (sample set #1) and the germplasm of *F. angustifolia* and *F. excelsior* throughout the partially overlapping natural distribution ranges of both species across Europe, we combined our SSR profiles of manna ash with the genotypic profiles described in Gérard et al. (2013) [14] into a second dataset called sample set #2, which included 495 unique profiles at 11 SSRs. UPGMA analysis based on Bruvo's distance [51] identified groups based on species (Figure 2; Table S2). Indeed, three main clusters were highlighted, with two branches represented almost exclusively by *F. angustifolia* and *F. excelsior* samples, respectively. As expected, samples belonging to *F. ornus* clustered as the most distant genotypes, behaving as outgroup, while the individuals morphologically classified as hybrids were distributed across the two main groups. Interestingly, *F. angustifolia* samples coming from Southern Italian regions from Gérard et al. (2013) [14] data clustered together with genotypes of our manna ash collection (Figure 2), highlighting a distinctive, basal position of Southern Italian *F. angustifolia* germplasm in comparison with the range-wide collection of the species.

**Figure 2.** Genetic relationships among *Fraxinus* spp. accessions-based UPGMA and Bruvo's distance, developed using unique nSSR profiles of sample belonging to set #2, including *F. angustifolia* (FAN), *Fraxinus excelsior* (FEX), and *F. angustifolia* x *F. excelsior* (HYB). *F. ornus* (FOR) was used as outgroup. Manna ash genetic profiles collected in the South of Italy are highlighted (in pink) in the phylogenetic tree.

Interestingly, within FAN, samples collected in Southern Italy (Sicily and Calabria) showed lower genetic distance (Nei, 1973 [55]) and lower *Fst* values with respect to the other FAN populations collected in Europe, whereas comparison to FEX populations resulted in higher values (Table 3). Hybrids of both ash species were closer both to FAN (*Fst* = 0.030; *Nei* = 0.159) and FEX (*Fst* = 0.028; *Nei* = 0.110) than to FAN (Italy) (*Fst* = 0.107; *Nei* = 0.536) (Table 3).

**Table 3.** *Fst* values (below diagonal) and *Nei* (1973) genetic distances (above diagonal) evaluated through nSSR on sample set #2. Each parameter was calculated for plants belonging to *F. angustifolia* collected in Italy—FAN (Italy), *F. angustifolia* sampled across Europe—FAN, *F. excelsior*—FEX, and hybrids—HYB (*F. angustifolia* × *F. excelsior*).


Different genetic parameters (I, He, Ho, Fis, AR and Fst) were also estimated for *F. angustifolia* sampling locations belonging to set #2 (Table 4). Hungary and France2 locations had, respectively, the lowest (0.504) and highest (0.680) values for genetic diversity (He). Overall, genetic diversity parameters (I, He, Ho) displayed similar values across the locations studied (Table 4). Sixteen out of twenty-one locations showed an excess of heterozygotes, showing a negative inbreeding coefficient (*Fis*). However, except Hungary population (−0.607), *Fis* was close to zero therefore all groups could be considered in equilibrium (Table 4).

**Table 4.** Summary of genetic variation statistics at 11 nSSR loci on FAN samples belonging to set #2. Individuals were grouped based on sampling location (see Table S2).


*n*. = number of samples for each group; *I* = Shannon's index; *Ho* = observed heterozygosity; *He* = expected; *Fis* = inbreeding coefficient; *AR* = allelic richness; \* *p* < 0.05; \*\* *p* < 0.01; \*\*\* *p* < 0.001.

Overall allelic richness (*AR*) ranged from 1.173 (SSR431) to 3.746 (Fem12) for SSR431 and Fem12 markers, respectively (Table S3). *AR* was higher in trees collected in Turkey (3.55) than the other populations. Macedonia, Montenegro, Portugal1, five out of seven French groups and the two population collected in Serbia showed similar *AR* values (ranging from 3.05 to 3.42), while population belonging to Hungary had the lowest value (2.28; Table 4).

The *Fst* values were also used to assess the genetic relationships across the population belonging to the *F. angustifolia* collection here studied (Figure 3; Table S4). Three main clusters were observed in the NJ-tree: six out of seven populations collected in France were gathered together with the two Italian groups and Portugal2 population; except the samples collected in Croatia, the other Balkan populations clustered with samples belonging to Turkey and Hungary; and, finally, the remaining populations of France (France7), Balkans (Croatia1 and 2) and Portugal (Portugal1) were linked to Bulgaria and Ukraine populations (Figure 3). On the other hand, comparing the *Fst* recorded on samples grouped by country, the greatest distance was observed between the pair Hungary–Bulgaria; except Croatia, the populations belonging to Balkan clustered close in the NJ-tree. Portugal and France were genetically separated, with the last population near the branch that gathered Turkey and Italy (Figure S1).

**Figure 3.** Neighbor-joining (NJ) tree based on population pairwise fixation index. Genetic distances were computed among populations grouped based on sampling locations.

DAPC identified five clusters corresponding to the five groups selected (Figure 4). Clusters including the samples belonging to France and Portugal were overlapping, as well as clusters containing Balkans and Eastern European genotypes. On the contrary all samples collected in Italy (from Calabria and Sicily) were separated from the other groups. Except for some samples, the plot of the first two principal components distinguished clearly three groups represented by genotypes from Eastern, Western and Mediterranean Europe, respectively (Figure 4).

**Figure 4.** DAPC clustering of European germplasm of *F. angustifolia* studied, using the first two principal components (*Y*-axis and *X*-axis, respectively). The samples are grouped in five main groups: Balkans (blue; Croatia, Macedonia, Montenegro, Serbia, and Turkey), Eastern Europe (cyan; Bulgaria, Hungary, and Ukraine), France (orange), Italy (green) and Portugal (pink).

#### **4. Discussion**

The genetics of endangered *Fraxinus* spp. populations is of great interest for both conservation and evolutionary aspects. Using a multidisciplinary approach, including molecular analysis, we demonstrated in this study that an important and residual manna ash germplasm collection located in the Madonie Mountains (Sicily, Italy), belongs to *F. angustifolia*, the narrow-leaved ash. We also showed that this population contained high genetic diversity and, together with a population from Calabria, was strongly differentiated from *F. angustifolia* populations from other locations in Europe (Balkans, Eastern Europe, France, and Portugal). The assessment of the genetic diversity is crucial to plan activities aimed to conservation of this important and endangered forest tree genetic resource.

Notwithstanding the limited number of individuals studied, plants belonging to the manna ash collection analyzed showed a wide variation in several phenotypic traits, such as leaf characteristics, inflorescence and infructescence features, plant vigor and morphology, showing a continuous and narrow range of values for individual traits. Leaf morphology is the strongest differentiating character within the local collection, although this trait is not successfully used to characterize the varieties, being strongly influenced by environmental factors. Indeed, it is well known that plants react to water stress, which is particularly strong in the Mediterranean area, by modifying their leaf traits. Considering that the trees collected come from a heterogeneous mountain environment in which neighboring individuals compete for the same ground and light resources, the observed phenotypic variations might be attributed to the impact of environmental factors on growth habit. Regarding fruit morphology, the shape and the length of the fruits of "manna" varieties were quite heterogeneous,

as well as the colour or samaras changing from light green of most cultivars to veined brown of "Nsiriddu". These traits; thus, showed a weak distinguishing power [62]. Nevertheless distinctive morphological traits separated all manna ash varieties from *F. ornus*. Specifically, in the local varieties the samara stalk length was greater (nearly three times) than in *F. ornus*, while the apical leaf length/width ratio and the number of samaras/raceme of manna ash samples were approximately two and a half and eight times lower, respectively, separating the two groups compared. Variety identification based on morphological data can be verified using molecular analysis, because the environmental effects, epistatic interactions and pleiotropic effects can interfere with morphological traits evaluation, as in many forest trees [63].

The genome size showed a small change in the 2C nuclear DNA content within the samples belonging to the manna ash collection (from 1.539 to 1.544 pg). The existence of intraspecific variation in genome size has been reported in several plant species [64–66]. However, due to the little shift here reported, the genome size, as well as the ploidy level, can be considered stable in our collection, as commonly observed within the species [67]. The variations recorded can be explained by technical issues, as reported for some angiosperms [68,69]. The 2C DNA values observed in the manna ash collection unambiguously assign the manna germplasm to *F. angustifolia* (1.540 pg [60]) and are different from both *F. excelsior* (1.68 pg; [60]), and *F. ornus* (1.98 pg; [60]), showing a larger genome size due to probably a high number of repetitive elements and/or several ribosomal gene repetitions [60].

Molecular analyses confirmed the differences highlighted through morphological and cytological approaches. In agreement with the low chloroplast DNA mutation rate detected in the Oleaceae [70], cpSSRs and plastid DNA regions identified a low genetic diversity in the manna ash collection, identifying three haplotypes, two sub-variants of H5 (covering 92% of samples) and H10 (the remaining 8%), respectively. The haplotype detected in the reference trees (H19) was previously found only in *F. ornus*, specifically in trees from Italy and Corsica. In agreement with the increased botanical similarity of *F. angustifolia* and *F. excelsior* and a partially overlapping phenology [6,71,72], an extensive sharing of cpDNA haplotypes between *F. angustifolia* and *F. excelsior* has been found previously but without common profiles between them and *F. ornus* [4]. Indeed, both H5 and H10 variants have been shown to be shared haplotypes between *F. angustifolia* and *F. excelsior* although in different geographic regions [4,73]. The majority of manna ash accessions carried haplotype H05, a haplotype observed in *F. angustifolia* throughout central and southern Italy and in *F. excelsior* in a restricted area of the Eastern Alps; this particular distribution could be related to two possible different and independent glacial refuges of ash species [73]. The other haplotype (H10) observed in "Abbassa cappeddu" variety and in some trees belonging to "Verdello" was detected in *F. angustifolia* trees collected in Portugal and Corsica [4] and only in the Czech Republic for *F. excelsior*.

Microsatellites are reported as very effective marker in terms of high information content and discrimination power owing to high allelic variation and allowing clear identification of populations or varieties in several plant species [74–77]. Nuclear SSR (nSSR) profiles showed a noteworthy rate of polymorphism and, except for "Baciciu" and "Cavolo", were able to group each accession with its own variety. The most represented local variety named "Verdello" was grouped in a main cluster, together with tree belonging to "Nsiriddu", showing a high variability, in agreement to the profiles obtained in natural populations of *F. angustifolia* in Greece [78]. Among studied varieties, "Verdello" and "Nsiriddu" showed an upright growth habit and are considered the most productive ones [18], yielding high quality manna with very similar chemical and organoleptic characteristics.

Although only a slight genetic differentiation between *F. angustifolia* and *F. excelsior* was reported [79,80], our findings highlighted a clear separation between trees belonging to two species collected throughout their natural distribution across Europe. Significant patterns of distance were found among species, with *F. ornus* clearly behaving like outgroup. Interestingly, a private branch grouping only *F. angustifolia* trees collected in the South of Italy, including the profiles recorded for manna ash collection, was highlighted. In agreement to cluster analysis, the pairwise fixation index (*Fst*) highlighted a clear distinctness for the populations from Southern Italy, close to Turkish and French

populations. In addition both cluster and *Fst* analysis allowed to separate the two Italian populations highlight the feature belonging to manna ash collection. Our results are consistent with *Fraxinus* spp. distribution, indeed common ash (*F. excelsior*) is found mainly in the north regions throughout Europe [4,81] and has expanded its range in the south probably during cooler climate episodes, maintaining the current relic populations in temperate locations such as the Elburz Mountains (Turkey), Calabria (Italy), and in Sicily, more specifically in the Nebrodi Mountains, where a small population (200 plants) of *F. excelsior* spp. *siciliensis* [82,83] is located in three different sites (Caronia, Longi and Alcara Li Fusi), showing specific features (e.g., reduced size and blooming at the same time of leaf emission; [84] due to the high isolation. On the contrary, *F. angustifolia* is mostly restricted to the Mediterranean region [81] and the favorable environmental conditions allowed this species to spread in the Sicilian Nebrodi Mountains, giving rise at the end of 19th century more than 16 varieties distinguished for both morphological traits and manna production [21,85]. The remarkable diversity and strong genetic structure in the Sicilian collection of *F. angustifolia* highlighted by both cp and nSSRs probably is related to the geographic isolation in which they have been found for about 10,000 years starting from the end of the last glaciation. This trend of speciation in *Fraxinus*, related to specific geographic zones, is in agreement to previous work [71], and would be driven by geological and climatic modifications [86] as reported for other species [87]. Phylogenetic analysis showed also a distribution of hybrids (HYB) within both *F. excelsior* and *F. angustifolia* groups, reflecting the sharing of haplotypes between the two species [4]. More interestingly, in the last group the hybrids and trees belonging to common ash were linked to *F. angustifolia* trees collected in Balkan and Eastern Europe, in agreement to the species distribution [4], whereas the samples from Southern Italy were the most genetically distant plants. Our finding can be explained by the marked different flowering time that characterizes the two species in Sicily making the interspecific cross unfavorable.

Finally, grouping the *F. angustifolia* trees studied on the grounds of their geographic area, discriminant analysis based on nuclear microsatellites underlined a gradient from west to east in *Y*-axis. This evidence is consistent with the classification of *F. angustifolia* that can be grouped into three geographic subspecies: (i) *F. angustifolia* Vahl ssp. *angustifolia* (Portugal and the Western Mediterranean), (ii) ssp. *oxycarpa* (M. Bieb. ex Willd.) Franco and Rocha Afonso (Northeast Spain to Turkey), and (iii) ssp. *syriaca* (Boiss.) Yalt (Turkey and Asia Minor) [6–8]. Additionally, samples belonging to Calabria and Sicily were separated from the others. The clear diversity harbored by *F. angustifolia* populations from South Italy, allow to hypothesize a possible glacial refuge of species in this area, according to a temperature increasing during the late glacial maximum (18,000 years bp) in the Mediterranean eastern compared to western one, as already reported for the Turkish populations [4,88].

#### **5. Conclusions**

In summary, using a multidisciplinary approach the present study was focused to characterize in depth a historical manna ash collection represented by local varieties collected in the Madonie area (Sicily, Italy), in order to safeguard an important cultural heritage. Our finding definitively clarifies that the local varieties actually used for manna production belong to *F. angustifolia*. This study provides useful information for germplasm management, finalized to improve the production and productivity of agroforestry species investigated. In addition, the evidences here reported suggest the presence of an additional glacial refuge for *F. angustifolia* in Italy, confirming the importance of Sicily as source of biodiversity. Furthermore, our study could represent an invitation for botanists to expand the historical knowledge of the collection. Due to the importance of ash germplasm studied, all local varieties were propagated in two repository fields, at the Professional Institute for Agriculture and the Environment (I.P.A.A.) "Luigi Failla Tedaldi" (Castelbuono, Italy) and at the private Schicchi's repository located in the Croce-Foresta district (Castelbuono, Italy) (info at rosario.schicchi@unipa.it) making available this genetic resources for future breeding programs and to develop future national/international collaborations.

*Plants* **2020**, *9*, 1035

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/8/1035/s1, Table S1: Morphological traits of studied samples, belonging to manna ash collection and samples belonging to *F. ornus*, used as reference. In brackets the number of samples analyzed belonging to each local variety/species. Three replicates for each trait were analyzed. Table S2: Nuclear SSR profiles of samples belonging to (A) set #1 and (B) set #2. Table S3: Allelic richness (*AR*) evaluated for FAN accessions grouped based on sampling location. In the table geographic coordinates (Long and Lat) and AR values for each nSSR were indicated. Table S4: Pairwise population Fst values. Figure S1: Neighbor-joining (NJ) tree based on population pairwise fixation index. Genetic distances were computed among populations grouped based on country of origin.

**Author Contributions:** Conceptualization, L.A., F.M., S.F.d.B. and R.S.; Data curation, L.A., F.M. and M.H.; Formal analysis, L.A. and F.M.; Funding acquisition, R.S.; Investigation, L.A. and F.M.; Resources, G.D.N. and R.S.; Supervision, F.M.; Writing-original draft, L.A., F.M., M.H. and S.F.d.B.; Writing-review & editing, L.A., F.M., G.D.N., M.H., F.C., S.F.d.B. and R.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors are grateful to Maria Luisa Virga (Assessorato dell'Agricoltura, dello Sviluppo Rurale e della Pesca Mediterranea-Dipartimento Regionale dell'Agricoltura-Regione Sicilia) for encouraging the meeting between the research groups.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Patterns of Genetic Diversity in Highly Invasive Species: Cogongrass (***Imperata cylindrica***) Expansion in the Invaded Range of the Southern United States (US)**

### **Rima D. Lucardi 1,\*, Lisa E. Wallace <sup>2</sup> and Gary N. Ervin <sup>3</sup>**


Received: 11 March 2020; Accepted: 25 March 2020; Published: 31 March 2020

**Abstract:** The spatial expansions of invasive organisms in the novel range are generally expected to follow an isolation-by-distance relationship (IBD) if the invasion is biologically driven; however, many invasions are facilitated anthropogenically. This research focused on the extant expansion patterns of cogongrass (*Imperata cylindrica*). Cogongrass is a widespread invasive species throughout the southern United States (US). Patterns of infestation vary among US states. Cogongrass is pyrogenic, and its invasion threatens softwood (*Pinus* spp.) plantations, a substantial economic market for this US region. Over 600 individuals were sampled from seven invaded US states, using amplified fragment length polymorphisms (AFLPs) to assess genetic diversity and population structure. We suspected that differences in historical management efforts among US states influenced differences in genetic diversity and structure. We detected two genetic lineages at the highest level of analysis. One genetic lineage was locally restricted, whereas the other was found throughout the study region. Admixed individuals were found in all US states and consistently co-occurred with the dominant lineage, suggesting that secondary contact and hybridization may have facilitated expansion. The widespread prevalence of only one of the two detected genetic lineages suggests a primary genetic lineage responsible for on-going population expansion in the US.

**Keywords:** AFLP; genetic diversity; invasive; Poaceae; population genetics; range expansion

#### **1. Introduction**

Biological invasions continue to increase during this era of increased global connectivity [1], while research has sought to understand the biological mechanisms which contribute to novel invasion success or the failure to establish or expand beyond incipient populations [2]. Ultimately, the success of a novel plant invasion can be constrained by its biological and evolutionary limits, such as its evolutionary history, geographic origin, propagule pressure and multiple introductions, or a species' ability to adapt to novel environments [3–6]. However, biology is not the only factor in successful plant invasions; the consequences of anthropogenic activity and vectoring are also influential [5,7,8]. It is known that changes in the structure and/or function of the recipient habitat may provide or limit opportunities for invasion success and can be possibly aided by human-assisted dispersal [9–13]. Disturbances can create new habitat resources (i.e., increased availability of canopy openings, or increased frequency of edge habitat) or facilitate the successful dispersal of invasive plant propagules (creation of "stepping-stone" patches, novel dispersal routes and/or mechanisms) in the spread and establishment of invasive

plants [14–16]. Habitat-modifying activities by humans are common in the regular maintenance and management of rights-of-way, roads, and in agricultural and silvicultural practices in the United States (US). For many weedy plant species, human activities influence inter-patch connectivity by creating new habitats and by enhancing dispersal [17–20]. Long-distance dispersal can increase the overall rates of spread, but also has the potential to influence the population's genetic structure by facilitating intraspecific hybridization among independent novel introductions or through the homogenization of a regional population of a particular plant invader.

Invasions, such as cogongrass (*Imperata cylindrica*), are of great concern to the agriculture of softwood timber plantations, primarily pines (*Pinus* spp.). Previous work on this species has shown that anthropogenic land use (e.g., the presence and maintenance of transportation corridors or forestry management practices) is an important driver of this species' distribution, and therefore of range expansion [21,22]. Cogongrass (*Imperata cylindrica*) has invaded seven southern US states and has negatively impacted both the economy and the ecology of these states and the agronomy of the region [23,24]. This invasive grass is a federally listed noxious weed [25] and is considered a global weed of substantial consequence [26,27]. In addition to its invasion across the US softwood timber-growing region, cogongrass is widespread throughout the tropical and sub-tropical regions of the Old World [26,28]. In the US, this plant invasion has benefitted from multiple introductions from previously isolated genetic lineages [29–31]. Previous research observed two genetic lineages of parental material introduced from East Asia, along with documented occurrences of multiple introductions that probably facilitated the establishment of cogongrass in the southern US [31]. The continued spread of cogongrass into neighboring states is a significant concern [23,24,32]. Cogongrass is reported to be an obligate-outcrosser that produces thousands of viable seeds per inflorescence [23,33]. Despite its impressive sexual reproductive capacity, the range expansion of this species has been primarily attributed to the human-aided movement of rhizomatous fragments [24,30]. Again, previous research found cogongrass populations to be surprisingly genetically diverse for a clonally reproductive grass; therefore, this reproductive flexibility of both sexual and asexual approaches has contributed to its fitness and its successful establishment and invasion over the last 100 years [31]. Previous research also allows us to exclude interspecific hybridization with a congeneric directly contributing to current range expansion [34–36] at this time. Human-aided spread, both purposeful and inadvertent, is the probable vector of erratic range expansion, especially during the mid-20th century when propagules were transported from Mississippi and Alabama into Florida and elsewhere [29,30]. Consequences of cogongrass, in both the native and invaded ranges, include reductions in biodiversity, monotypic stands, and timber loss; furthermore, cogongrass is pyrogenic, meaning this grass species is highly-flammable, contributing to alterations in fire regimes to more frequent and/or intense fire events [23,24,26,33]. In managed timber plantations, young *Pinus* spp. are especially susceptible to cogongrass fire events; after a fire, cogongrass rhizomes are generally the first to re-sprout and then dominate the landscape, excluding other plants and animals from recolonizing [24,26,34].

Cogongrass was sampled from, and in cooperation with, state and federal agencies in the following US states experiencing extant infestation and practicing any form of current management: Alabama, Florida, Georgia, Louisiana, Mississippi, South Carolina and Texas. An anonymous genome-wide scan provided by amplified fragment length polymorphism (AFLPs) markers is considered adequate for this analysis. The same AFLP markers have been previously utilized to assess genetic diversity and population structure in invasive US cogongrass populations [31,36–40]; however, previous published research was limited in spatial scale. In this analysis, the US state-level genetic diversity and population structure across the infested range were evaluated. Furthermore, this evaluation sought if anthropogenic activities influenced the extant patterns of genetic diversity and population structure during novel range expansion. One example of a major anthropogenic influence is the American Recovery and Reinvestment Act (ARRA) of 2008, which allocated funding from the US federal government to state agencies for the purpose of improving the control and management of this invasive grass. Alabama received \$6.3-mil (USD) and Georgia received \$1.8-mil (USD) earmarked for

cogongrass eradication. Mississippi received \$1.2-mil (USD) and South Carolina received \$700,000 (USD) for general invasive plant control and management. These widely variable financial allocations, and each state's use of the funding, may have contributed to differential patterns of genetic diversity or structure among infested US states. Furthermore, we expected genetic variance among US states due to the historical treatment and management of cogongrass in duration, chemical or physical management, and historical efforts. Though state political borders are unlikely barriers to dispersal for propagule movement within the southern US, the influence of differential funding availability and state-level management practices on genetic diversity and structure are variable due to jurisdictional boundaries; thus, we expected cogongrass in states at the expanding fronts of the invasion (South Carolina and Texas) to be less genetically diverse than in the states that received direct introductions and where invasion has been present for the most amount of time (Mississippi and Alabama). Alternatively, states that received the most funding (e.g., Alabama and Georgia) may be less genetically diverse than states that received substantially less funding (e.g., South Carolina).

#### **2. Results**

#### *2.1. Genetic Diversity*

The AFLP analysis resulted in 2057 polymorphic loci from 676 cogongrass individuals. We observed a good reproducibility for this method (SE = 0.004; 95% CI; <1 mismatch per individual per locus) among the samples analyzed. The state-level average percentage of polymorphic loci was 23% (SE ± 7%), ranging from 2% (GA) to 56% (SC). The mean Shannon's Information Index (I) was 0.030 (SE ± 0.001). Nei's gene diversity for all states analyzed ranged from 0.006 (GA) to 0.042 (SC) with a mean of 0.023 (Table 1). Heterozygosity values also trended similarly to Shannon's (I) and Nei's gene diversity: the mean heterozygosity (both He and UHe) was 0.016 (SE ± 0.001). Spearman's correlation coefficient (ρ) values between the sample size and genetic diversity were calculated for (1) all states, and (2) for states with more than 50 sampled individuals (AL, FL, LA, MS and SC). All relationships between the sample size and genetic diversity were not significant (*P* > 0.05), suggesting that unevenly sampling states did not unnecessarily bias results or interpretation.


**Table 1.** State-level sampling and location information with genetic and genotypic diversity estimates.

An overall reduction was observed in the clonal diversity analysis from 676 sampled individuals to 321 unique multi-locus genotypes (Table 1 and Figure 1). The mean genotype diversity (range: 0 to 1) for all states was 0.736 (SE ± 0.105). We observed the lowest genotype diversity in GA at 0.154, reducing the effective number of genotypes to 1.154. We observed the highest genotype diversity in MS and AL (>0.90), with FL and SC close behind (0.816 and 0.888, respectively; Table 1 and Figure 1).

**Figure 1.** Genotypic diversity relative to the number of individual samples in each state and overall. The effective number of genotypes for the US state of Georgia was <1; therefore, no symbol is present for that state.

#### *2.2. Population Structure*

Of the 676 individuals sampled, 485 were assigned with >90% posterior probability to a single lineage; 73 to the MS-type lineage (red, Figure 1) and 412 to the AL-type lineage (blue, Figure 2). Admixture was present in both clusters (mean α = 0.165), and 191 individuals could not be confidently assigned to either cluster with strong confidence. A Bayesian cluster analysis was conducted in the program STRUCTURE, which supported an inference of two distinct genetic clusters (*K* = 2; mean LnP(D) = −107,561; Figure 3) extant in the region examined: a MS-type lineage and an AL-type lineage (Figure 2). Individuals assigned to the MS-type lineage are only present in central MS, based on our sampling, and there is a single outlier individual in SC. It is in this geographic locale of central MS that both lineages are present and co-occur at the patch level, with varying proportions of admixed individuals. In all other states in the region, the AL-type lineage is dominant and co-occurs with admixed individuals, except in GA where only the AL-type lineage was found. A cluster analysis at the patch- and state-level in GA is consistent with the genetic diversity, supporting a low heterogeneity, and all individuals in GA were assigned to the AL-type lineage. Additional Bayesian cluster analyses were conducted on all individuals excluding the individuals strongly assigned to the MS-type lineage. No population substructure was observed in the AL-type and in ambiguous individuals, and this remained consistent with the initial analysis.

**Figure 2.** Map of the southern US with the proportion of individuals assigned to MS-type (red), AL-type (blue) or ambiguous (green) based on a 90% threshold of assignment from STRUCTURE.

**Figure 3.** Summary evaluation of the posterior probability values [LnP(D)] used for the determination of the most likely number of clusters (*K*) from STRUCTURE simulations.

Significant population pairwise FST values were observed between the two genetic groups within MS patches (FST = 0.330, *P* < 0.05; Table 2). Please note that the MS-central group was significantly dissimilar from all other groups tested (FST > 0.3); therefore, a pairwise FST analysis separated MS populations into two groups: 1) coastal Mississippi (MS-Coast) populations from the other 2) Mississippi lineage (MS-Central) (Table 2). The SC population was the least genetically differentiated from the FL populations (FST = 0.060), while the AL and SC populations were the least genetically differentiated from the TX population (FST = 0.094, 0.083, respectively; Table 2). Pairwise population FST values were similar between SC and most other states analyzed (FST < 0.2), excepting MS-central (FST = 0.314). Pairwise FST between TX and AL (FST = 0.090) was very similar to pairwise FST values

between TX and SC, and between TX and FL (both FST = 0.083). The greatest genetic dissimilarity existed between TX and GA (FST = 0.553).


**Table 2.** Seven groups analyzed for population pairwise FST1, where MS-type (central) was separated from coast (AL-type) for the state-level analysis.

<sup>1</sup> Significant values in bold, *P* = 0.05; \* indicates a value of 0, when pairwise comparison of the same group.

A principal coordinate analysis (PCoA) was consistent with the population structure as inferred by the Bayesian cluster analysis in STRUCTURE. The first two axes explained 61% of individual genetic variance (Figure 4). Two clusters were observed in the PCoA plot: one fairly contained cluster and the other being broad and loosely organized. Individuals sampled from MS were present in both clusters, where individuals sampled from central MS (MS-Central) appear localized to the bottom right quadrant (red diamonds, Figure 4) and individuals sampled from the MS Gulf Coast (MS-Coast) cluster with individuals from AL, FL and other states. Individuals sampled from AL are broadly distributed throughout the larger cluster, indicating a high degree of individual genetic variation. Individuals from FL also presented a similar pattern. Sampled individuals from TX (all collected from a single site) did not form a tight cluster, but grouped with samples from AL, FL, LA and SC. Georgia; these individuals from these states, however, form a very tight cluster within the broad cluster (Figure 4), overlapping within a subset of the AL, LA, MS and SC individuals, potentially indicating genetic relationships among geographically disparate populations.

**Figure 4.** Principle coordinates analysis (PCoA) of individual genetic covariance with data standardization (N = 676, 61% of variation accounted for by the first two axes). All seven US states are represented in this ordination.

The genetic differentiation was evaluated between the MS-type cluster and all other individuals based on inferences from STRUCTURE and the PCoA. Ambiguous individuals were lumped with the AL-type in an analysis of molecular variation (AMOVA; Table 3); this grouping was derived because of the low value of admixture from STRUCTURE (mean α = 0.165) and the high proportion of assignment to the AL-type in ambiguous individuals (<0.90, but >0.50). We observed a significant genetic differentiation (FST = 0.363, *P* < 0.001) between the groups as defined above.

**Table 3.** Analysis of molecular variance (AMOVA) where two groups were inferred from the STRUCTURE analysis.


#### **3. Discussion**

The detection of two distinct genetic lineages throughout the invaded region is consistent with other published research on the cogongrass invasion in the US [29–31,36]. This analysis, however, demonstrated that one of those two genetic lineages (the dominant AL-type) has been responsible for the majority of the spatial spread and persistence in the US. The dominant AL-type lineage is more heterogeneous, widespread, and is genetically distinct (FST = 0.363, *P* < 0.001) from the MS-type lineage, even when inclusive of ambiguously assigned individuals. A two-lineage scenario provided the only evidence of strong population structuring throughout the invaded US range. The majority of individuals sampled for this work were assigned with a 90% or greater probability to the AL-type lineage (Figure 2). The other detected genetic group (the MS-type) appears to be geographically constrained to central MS (Figure 2). This pattern may be due to the reduced competitive or dispersal ability of this lineage, lack of anthropogenic activity promoting its spread or other potentially heritable fitness factors restricting its ability to spread from this region.

The present analyses further demonstrated that, although the two genetic groups remain significantly differentiated at the scale of this study, admixture is present within individuals at the local, patch scale (mean α = 0.165). Admixture is also suggested to have increased upon regional evaluation as compared to previous research constrained to MS and AL (where mean α = 0.08) [31]. Similarly, recent admixture in the invasive weed, *Silene vulgaris*, was detected in the novel range (North America), and demonstrated a relationship between fitness benefits and increased heterozygosity in invasive populations [41]. Such results suggest the possibility that genetic mixing contributed to the successful spread of cogongrass across the southern US; however, measures of intraspecific phenotypic variation, and its relationship to genetic variation and/or hybridization, are needed to demonstrate whether significant genetic admixture has contributed to cogongrass' success [42,43]. Typically, multiple introductions increase the genetic diversity and probability of success in a wide array of invasive, colonizing organisms [44–48].

Although we found a significant genetic distance to exist among the states (Table 2), the overwhelming evidence favored a regional population dominated by the AL-type lineage, with a highly-restricted MS-type found mainly restricted in the central portion of Mississippi (Figure 2). There also appears to be an element of biological influence on the population structure, as the majority of the genetic variation in cogongrass is partitioned within sampled locations (64%). Similar to previous findings at smaller scales (in terms of sample size and geography), the majority of genetic variation being partitioned within patches continues to be unexpected for a species thought to expand locally via clonal propagation. The regional degree of genetic differentiation in cogongrass (FST = 0.363, *P* < 0.001) is similar to that of early successional plant species (mean FST = 0.37) [49]. Even considering the fact that the majority of genetic differentiation is partitioned within local patches, average heterozygosity for state-level cogongrass sampling (He and UHe = 0.016 ± 0.001) is still lower than such values in long-lived perennial plant species (He = 0.68) or those capable of selfing (He = 0.41) [49].

We expected that differential funding and historical management efforts among the seven US states would contribute to contemporary state-level genetic diversity differences across the region. Differing state-level management practices, efforts, and starting-acreages of infestation affected the relative genetic diversity (Table 1). The state of Alabama (AL) received the greatest allocation of cogongrass-specific funding from the American Recovery and Reinvestment Act (ARRA), \$6.3-mil (USD). Despite receiving the largest allocation of federal funding, AL cogongrass populations on public and private lands were not much reduced since 2008. Though no published data has resulted since then, we do know that AL chose to outsource the ARRA funds to a private invasive plant control consulting firm responsible for the detection, treatment and repeated visits/re-application on infested sites. Since a follow-up chemical application is necessary in any cogongrass management plan or strategy [24], Alabama's utilization of ARRA funds has been viewed as one of the greatest modern failures in cogongrass control efforts. Alabama continues to be one of the most infested states in the US, superseded only by Florida. When compared to the neighboring state of Georgia (GA), which received \$1.8-mil (USD) from the ARRA specifically for cogongrass eradication, the state of GA resulted in the lowest genetic and genotypic diversity (Figure 2), along with the least amount of acreage infested excluding the infestation site in Texas. Additional evidence as to why GA is so different from other states in terms of the distribution and diversity of cogongrass infestation is that the Georgia Forestry Commission began treating cogongrass in the late 1960s (Art Miller personal correspondence, GFC *retired*) with chemical herbicide and mechanical removal decades before all other southern US states. Furthermore, the Georgia Forestry Commission currently utilizes a chemical-application strategy that has been most effective [24], along with semi-annual revisits to document infested sites. Therefore, the state of Georgia has been able to stem the majority of cogongrass infestation to the southwestern corner of the state, which borders Gulf Coastal Alabama. The states of GA and TX resulted in the lowest overall genetic diversity. We considered that the smaller sample sizes in these two states may have contributed to this pattern, 13 and 10 individuals, respectively; however, Spearman's correlation tests did not find significant relationships between sample sizes and genetic/genotypic diversity outcomes (Figure 1, Table 1). The genotypic diversity in GA was slightly over 15%, whereas in TX it was over 77%. From our analysis, this suggested that cogongrass sites sampled in GA persist with both a low genetic and genotypic diversity. Furthermore, all individuals in GA were strongly assigned to the AL-type, also suggesting a superior ability to persist despite the fitness benefits expected to accompany a high genotypic diversity contributing to increased adaptability [3]. This pattern observed in GA, which is not mirrored in TX, indicates that the success of an invader can be based on the fitness of one to a few adapted genotypes [50,51].

Cogongrass possesses a broad global distribution [23,24], indicating a generalist phenotype, preferentially colonizing and positively associated with disturbance [21], and originating from potentially multiple points in Asia [29,30]. We consider cogongrass a suitable candidate for rapid adaptation and evolution in the invaded range due anthropogenic activities [52]. Furthermore, the chemical control of invasive plant populations is a human-induced selective pressure, unevenly applied and distributed across the landscape. These human activities directly and indirectly affect the population and genetic structure of secondarily invading populations, and will continue to do so.

The regional expansion of cogongrass since its initial introduction in 1919 has benefitted from the direct and purposeful anthropogenic transport of propagules, multiple introductions and secondary contact [29–31]. The majority of propagule spread had been previously attributed to the transport and establishment of rhizome fragments; however, the lack of a strong decay in isolation-by-distance (IBD) genetic-geographic relationships suggested this is not the case [31,36,40]. These data suggest that the cogongrass expansion throughout the South has probably benefitted from reproductive

flexibility, the on-going movement of viable propagules (both seeds and rhizome fragments) and potential adaptive persistence under active management by chemical control. Given the clear regional dominance by the AL-type lineage of plants, further investigation into the mechanisms responsible for the successful invasion of cogongrass might benefit from a functional genetic comparison between the AL- and MS-lineage's phenotypes and phenotypic responses, as well as intraspecific global references.

#### **4. Materials and Methods**

#### *4.1. Study Area and Sampling*

Live cogongrass leaf tissues were collected during 2008–2009 from seven states in the southern region of the US: Alabama (AL, n = 208, 10 patches), Florida (FL, n = 129, 14 patches), Georgia (GA, n = 13, 2 patches), Louisiana (LA, n = 62, 6 patches), Mississippi (MS, n = 180, 11 patches), South Carolina (SC, n = 74, 7 patches) and Texas (TX, n = 10, single patch site; Figure 2, Table 1, N = 676). Each tiller was assumed as being representative of an "individual" or a ramet in the patch/population (while acknowledging that individual patches may have arisen from only one to a few colonizing propagules resulting in few genets comprising a population). Leaf tissues were collected in the field from cogongrass populations. Patches were identified as contiguous sites of cogongrass, which often occur as circular patches in open areas or as long, narrow patches along roadside rights-of-way. Leaf tissues were placed into individually-labeled plastic bags and then stored in a cooler with 1–2 cups of ice (or ice substitute) per large cooler during collection and transport (for maximum 24–36 h, at ambient vehicle temperature). Silica gel containing color indicator was poured into individual bags to dry the leaf tissue in the lab upon unpacking of leaf tissues. Drying leaf tissues in silica gel were dried for at least 1 week at room temperature.

Cogongrass (*I. cylindrica*) is a listed Federal Noxious Weed; all sampling was conducted with approval by the U.S. Dept. of Agriculture, Animal and Plant Health Inspection Service, Plant Pest Quarantine (Permit#: P526P-12-00211, P526-080721-005). Additional permissions were required for access to specific areas, including a collection agreement with The Nature Conservancy (TNC) and Miami-Dade County Parks and Recreation (Permit #145). Most sampling was conducted on public land including National Forests (USDA Forest Service), interstate/highway rights-of-way and on private land with permission from state forestry agencies and associated landowners, as appropriate.

#### *4.2. Molecular Analysis*

Approximately 1-cm<sup>2</sup> of dried individual leaf tissue was aseptically transferred into a 2-mL microcentrifuge tube for each individual. The extraction of total DNA from leaf tissues, primers, reagents and PCR conditions were described previously in Lucardi et al. [31,36] based on Vos et al. [53]. Lack of reproducibility is considered one of the negative aspects of dominant markers, such as AFLPs; therefore, we accompanied all runs with positive and negative controls.

Fragment data were visualized in GeneMarker® (SoftGenetics, LLC, State College, PA, USA) and exported into a general text format. Detected fragments were sorted on migration size (basepairs) and objectively scored utilizing an independently developed procedure (Lucardi and Walker, unpublished methodology) using both Excel 2007 (Microsoft Corporation, Redmond, WA, USA) and PASW v.18.0 (SPSS, IBM Corporation, Armonk, NY, USA) as specified in Lucardi et al. [31,36].

Data conversions of presence-absence matrices were conducted in AFLPdat source script [54] in R v2.15.1. The genetic diversity was assessed using the expected heterozygosity (biased (He) and unbiased (UHe)) and Shannon's Information/Diversity Index (I), serving as a coefficient of similarity, for each state for state-level diversity estimates (GenAlEx 6.3) [55]. Population genetic diversity metrics, such as Shannon's Information Index [I], were estimated from allele frequencies inferred from dominant data and are subject to Hardy-Weinberg equilibrium assumptions. These assumptions can reduce the accuracy in allele frequency estimations from dominant data, but reliable results for a

comparative study can be achieved with adequate population sampling and a sufficient number of primer sets, which generate a large number of detected polymorphic loci [39,56,57].

Because of cogongrass' capacity to clonally reproduce *via* rhizomes, we conducted a clonal analysis using the "Clones" function within the AFLPdat package [54]. The standard error among positive control replicates (0.004) functioned as the error parameter for detecting identical multi-locus genotypes. The number of unique multi-locus genotypes per population contributes toward a more accurate assessment of the actual genetic diversity. Uneven sample sizes may bias the interpretation of results due to statistical dependence between variables. We used the Spearman's correlation coefficient to determine if relationships are present between sample size and genetic diversity. We tested the relationships between sample sizes and frequency-based estimates using Spearman's correlation method ("cor.s") in R v.2.15.1.

We evaluated the population structure with population pairwise FST (Arlequin v.3.5 [58]) and the Bayesian cluster analysis program, STRUCTURE v.2.3.3 [59], to infer the most likely number of clusters (or *K*) based on posterior log-likelihood probability values from each simulation. We utilized the Evanno et al. [60] method to detect the population structure when *K* ≥ 2. Multiple simulations were conducted for *K* = 1 through 7, based on the seven sampled US states, with the following parameters: admixture ancestry model (to infer α), burn-in 10,000 and 50,000 Markov Chain Monte Carlo (MCMC) [58]. The ad hoc statistic, Δ*K*, was plotted to determine the mode. Individuals were assigned to a single lineage if they contained a probability of membership of at least 0.90 (90% threshold) to one of the clusters; all other individuals that were not strongly assigned to one of the lineages were assigned as ambiguous. We further assessed the population structure with a principal coordinates analysis (PCoA, GenAlEx v.6.3) and analysis of molecular variation (AMOVA, [61]) performed in Arlequin v.3.5 [58] using genetic distances, testing the structure based on inferences made from multi-locus genetic information analyzed in a cluster analysis.

**Author Contributions:** R.D.L., L.E.W. and G.N.E. initiated the initial research design; R.D.L. collected field samples (with assistance), conducted the molecular research and data analyses, R.D.L. wrote the manuscript. L.E.W. and G.N.E. significantly contributed to the writing and interpretation. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by in part by the United States Geological Survey, Biological Resources Discipline (#04HQAG0135) and the United States Department of Agriculture (2006-03613 and 2008-35320-18679) to GNE.

**Acknowledgments:** We are very grateful to the Georgia Forestry Commission (M. McClure and C. Bates), the Texas Forest Service (M. Murphrey), our cooperators at Auburn University (N. Loewenstein), C. Matson and D. Tharp (The Nature Conservancy), S.C. Hughes, and K.A. Bradley (previously of the Institute for Regional Conservation, FL) for their assistance with sampling and field identifications. We also thank Drs. Brian Counterman and Diana Outlaw (Mississippi State University), Charles T. Bryson (retired, USDA ARS), and Jacob Walker (Arkansas Research Center) for reviewing an early draft of this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Accessibility:** Sampling coordinates are detailed in the Appendix section of Lucardi 2012, available freely here: [http://sun.library.msstate.edu/ETD-db/theses/available/etd-10192012-122638/unrestricted/LUCARDI\_Lib\_ v2.3\_291012.pdf].

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Review*

### **A Dual Strategy of Breeding for Drought Tolerance and Introducing Drought-Tolerant, Underutilized Crops into Production Systems to Enhance Their Resilience to Water Deficiency**

### **Amparo Rosero 1,\*, Leiter Granda 2, Jhon A. Berdugo-Cely 1, Olga Šamajová 3, Jozef Šamaj <sup>3</sup> and Radim Cerkal <sup>2</sup>**


Received: 31 July 2020; Accepted: 22 September 2020; Published: 24 September 2020

**Abstract:** Water scarcity is the primary constraint on crop productivity in arid and semiarid tropical areas suffering from climate alterations; in accordance, agricultural systems have to be optimized. Several concepts and strategies should be considered to improve crop yield and quality, particularly in vulnerable regions where such environmental changes cause a risk of food insecurity. In this work, we review two strategies aiming to increase drought stress tolerance: (i) the use of natural genes that have evolved over time and are preserved in crop wild relatives and landraces for drought tolerance breeding using conventional and molecular methods and (ii) exploiting the reservoir of neglected and underutilized species to identify those that are known to be more drought-tolerant than conventional staple crops while possessing other desired agronomic and nutritive characteristics, as well as introducing them into existing cropping systems to make them more resilient to water deficiency conditions. In the past, the existence of drought tolerance genes in crop wild relatives and landraces was either unknown or difficult to exploit using traditional breeding techniques to secure potential long-term solutions. Today, with the advances in genomics and phenomics, there are a number of new tools available that facilitate the discovery of drought resistance genes in crop wild relatives and landraces and their relatively easy transfer into advanced breeding lines, thus accelerating breeding progress and creating resilient varieties that can withstand prolonged drought periods. Among those tools are marker-assisted selection (MAS), genomic selection (GS), and targeted gene editing (clustered regularly interspaced short palindromic repeat (CRISPR) technology). The integration of these two major strategies, the advances in conventional and molecular breeding for the drought tolerance of conventional staple crops, and the introduction of drought-tolerant neglected and underutilized species into existing production systems has the potential to enhance the resilience of agricultural production under conditions of water scarcity.

**Keywords:** crop diversity; drought tolerance; genetic approaches; neglected and underutilized species

#### **1. Introduction**

Crops are dependent on rainfall, and so water scarcity is the primary productivity constraint in arid and semiarid tropical areas [1]. In these areas, water deficiency can last for periods longer than

four months. Additionally, when El Niño Southern Oscillation (ENSO) occurs, the amount of water available during the rainy season significantly drops, while rainfall is concentrated within a period of a few months. As a meteorological event, drought is a period in which the potential evaporation exceeds the rainfall. Agricultural drought is the result of water flow imbalance between the environmental demands of evapotranspiration and water transport in the soil-root system [2]. In this context, drought tolerance is described as the ability of a plant to live, grow, and reproduce successfully with a limited water supply or during periodic conditions of water deficit [3]. Regarding crops, a drought response is defined as a change in yield as a consequence of impaired plant development [4].

The challenge to produce under water scarcity conditions requires integrated actions and strategies to remodel the crop genetic background and the cropping systems. For it, wild relatives and landraces and neglected and underutilized species (NUS) are a rich source of genetic diversity. Thus, crop wild relatives and landraces due to their local adaptations are a vast resource of genetic diversity for developing more productive, nutritious, and resilient crop varieties [5–7]. In maize, landraces from dry habitats have been used successfully in breeding for water-limited environments, and wild species that are relatives of cultivated crops have been on the agenda as possible donors for drought tolerance [8]. Similarly, several NUS are drought and heat stress-tolerant, resistant to pest and diseases, and adapted to semi-arid and arid environments [9–11]. However, NUS are considered those species to which little attention is paid or that are entirely ignored by agricultural researchers, plant breeders, and policymakers [12]. Some NUS show potential to be introduced in cropping systems for crop diversification, e.g., quinoa has been accepted as an alternative crop in the northern latitudes of Europe [13]. The wider use of NUS would increase the agricultural biodiversity to buffer against crop vulnerability to water scarcity and would provide the quality of food and diverse food sources to address both food and nutritional security [14].

This review focuses on two strategies to increase drought stress tolerance: (i) the use of natural genes that have evolved over time and are preserved in crop wild relatives and landraces and (ii) exploiting the reservoir of neglected and underutilized species to identify those that are known to be more drought-tolerant than conventional staple crops for introducing them into existing cropping systems to make them more resilient to water deficiency conditions. We also highlight the use of phenomics and genomics as methods and approaches to characterize, identify, and use the desired traits related to drought tolerance (Figure 1).

**Figure 1.** Dual strategy of breeding for drought tolerance and the introduction of underutilized crops to make more resilient cropping systems to water deficiency conditions.

#### **2. General Overview of Physiological Responses of Plants to Drought Stress Conditions**

In arid environments, crops are exposed to extreme water-limiting conditions, which have become more extreme in recent decades, leading to reductions in yield or even total yield loss. Drought conditions trigger a progressive process in plants that begins with an early priming and preconditioning stage, followed by an intermediate stage in preparation for acclimation and a late stage of new homeostasis with reduced growth (Figure 2) [15]. Signal transduction pathways connecting the recognition of environmental stress factors and the initiation of plant responses often involve several intracellular changes, including variations in Ca2<sup>+</sup> concentration, reactive oxygen species (ROS)

accumulation, and cytosolic K+. Several proteins in the plasmalemma and tonoplast recognize these intracellular messengers, acting not only during signaling sensing but, also, in response to stress conditions [16].

**Figure 2.** General description of physiological responses of plants to drought stress conditions. ROS: reactive oxygen species and ABA: abscisic acid.

The cascade of morphophysiological responses to drought stress is primarily controlled by the early accumulation of abscisic acid (ABA), ion transport, and the induction of the associated signaling pathway genes [17]. Signal transduction that occurs in response to the early accumulation of ABA during drought stress is mediated by protein phosphorylation and ubiquitination. This post-translational modification of specific proteins triggers a cascade of physiological responses that include a decrease in stomatal conductance as an early avoidance response to drought stress, resulting in rapid stomatal closure [18–20]; furthermore, the regulation of several transcription factors (TFs) involved in osmotic stress and the increased expression of expansion genes involved in cell wall adjustments are a preparatory step towards drought acclimation [15,17,21]. TFs, as major regulators of plant responses to drought stress, affect the adaptation of plants to drought stress through their involvement in the transcriptional regulation of ABA- and drought stress-related gene expressions [21,22]. An elevated resistance to drought can also be achieved by the increased expression of cuticular wax biosynthesis genes leading to an enhanced cuticular wax accumulation in both leaves and stems [23].

ABA-dependent kinases related to stomatal closure are sucrose nonfermenting 1 (SNF1)-related protein kinase 2.6 (SnRK2.6, also known as OST1), which has the overall control of the stomata, and SnRK2.2 and SnRK2.3, which are only involved in the implementation of stress memory in guard cells during the subsequent dehydration process [24]. The OST1-dependent phosphorylation of the plasma membrane intrinsic protein 2;1 (PIP2;1) aquaporin produces an increase in the guard cell permeability to water and, possibly, hydrogen peroxide to trigger stomatal closure [25].

Drought-sensitive plants accumulate significantly more reactive oxygen species (ROS) and reactive nitrogen species (RNS) than tolerant genotypes [26,27]. Cellular redox homeostasis is disturbed as a consequence of extra ROS generation under drought stress. ABA induces the production of nitric oxide (NO) in guard cells, which, together with RNS, is a secondary messenger that modifies the enzyme activity and gene regulation [28,29]. At a specific level of NO and H2O2 treatments, the destruction of the mesophyll cell ultrastructure caused by drought stress is attenuated, increasing leaf chlorophyll, chlorophyll fluorescence values, and soluble carbohydrate and protein contents. Thus, endogenous NO and H2O2 may play crucial roles in rooting and the photosynthetic performance under drought conditions [30].

ABA is produced in the cytosol by the plastid-carotenoid pathway via the cleavage of xanthophyll precursors [20,31], and an impairment of the plastid-carotenoid pathway produces photo-oxidation and ABA-deficiency [31]. Genes involved in carotene biogenesis are not only rate-limiting for ABA synthesis but are also involved in plant responses to drought-stress conditions [20].

#### **3. Use of Crop Diversity in Plant Breeding for Drought-Tolerance Traits**

Valuable genes from natural inter- and intraspecific diversity can be used to take advantage of several mechanisms of survival and coadaptation in plants produced by natural selection [32]. Some of these genes are conserved by farmers (in landraces) or are present in crop wild relatives, and the narrow genetic base of modern cultivars is becoming a major bottleneck for crop improvement efforts; therefore, crop wild relatives have been extremely valuable in adapting crop varieties to changing climatic conditions [33,34]. *Helianthus anomalus*, a diploid annual sunflower species of hybrid origin that is endemic to active desert dunes, was successfully used in sunflower breeding with tolerance to drought stress [35]. In rice, *Oryza glaberrima* has been used in interspecific backcrossing to improve the drought resistance in *Oryza sativa* [36]. Similarly, the wild emmer wheat (*Triticum dicoccoides*) is highly tolerant to drought compared with its domesticated counterpart [37]. Additional examples of drought-tolerant varieties of major stables obtained through conventional breeding are presented in Table 1.

**Table 1.** Progress in the improvement of drought tolerance (DT) in major crops, number of drought-tolerant varieties or donors selected using conventional breeding, and the use of crop wild relatives as sources of drought-tolerant genes.



**Table 1.** *Cont.*

There is evidence of success in some crops that have been obtained by genetic introgression. Although, in some cases, it could be a time-demanding method, the introduction of new high-throughput "omics" technologies (some are described below) improves the efficiency for drought-tolerance traits in intra- and interspecific introgressions. In rice, a transcriptomic analysis established that drought-tolerant genotypes (drought-tolerant donor parent and progeny) were functionally enriched in oxidoreductase and lyase activities compared with a cultivated variety [61]. Thus, many traits related to mechanisms of drought tolerance have evolved over time and are present in wild relatives. The process of introducing genetic diversity from wild species into cultivars requires a significant amount of time, resources, and human capacity. The evident success of this strategy cannot be estimated only in terms of released varieties; for example, its contribution as additional genetic diversity in some crops should also be considered. However, in many species, wild relatives and landraces are poorly represented in gene banks, largely unavailable, and, therefore, underutilized [6]. For this reason, the ex situ conservation of genetic resources, especially wild relatives, should be a priority to guarantee their future availability [5,6]; however, this conservation is constrained for technical and funding challenges. Therefore, crop prioritization should be done for target wild relative conservations; ex situ conservations should be guaranteed especially for crops with importance to global food supplies and production systems worldwide [5]. Alternatively, the establishment of in situ conservation reserves could be used for major and minor crops to support the custodians of agrobiodiversity, the local communities [6]. Crops that grow in regions that currently are under high threats of water scarcity should consider seeking gene sources from wild relatives; for that, the characterizations of the wild relatives and landraces enable the detection of traits that can be used and introduced in improved varieties to provide greater adaptation and resilience to such restrictive environmental conditions.

#### **4. Introduction of Neglected and Underutilized Species into Cropping Systems**

The introduction of neglected and underutilized species (NUS) into the current cropping systems could help reduce food scarcity and diversify the homogeneous crops systems. This as a contribution to improve the human diet that currently has an overreliance on very limited numbers of major crops, mainly as sources of energy-dense foods but poor-quality nutrition [62,63]. As research on drought tolerance has mainly focused on major staple crops, the potential of some NUS being naturally more drought-tolerant than most staple crops has been overlooked. Moreover, NUS can be used in strategies to diversify cropping systems by introducing new species, thus increasing the genetic

diversity and resilience of production systems [63]. The domestication and breeding of new crops is a long-term solution for drought constraints [64]; however, this should be considered and could be carried out through (i) understanding the physiological, genetic, and molecular basis of natural mechanisms and the plasticity that allows their adaptations to drought stress, (ii) integrating new knowledge in breeding and field crop management in priority species, and (iii) articulating strategies and actionable recommendations to encourage their cultivation and make technologies and innovations widely available [64]. Crops that are naturally adapted to arid and semi-arid regions of the world exhibit several drought-tolerance mechanisms. Halophyte plants, such as quinoa (*Chenopodium quinoa*), have adaptations to drought stress that include an increase in Na<sup>+</sup> and K<sup>+</sup> transporters in cell and vacuole membranes, the rapid alteration of the plasma H+-ATPase activity, high contents of antioxidant compounds, vesicles for salt secretion, and lower stomatal density, among others [65,66]. Although, sweet potatoes (*Ipomoea batatas*) cannot be categorized as a NUS, in certain countries, this species is not cultivated and could have high potential, since it shows the ability to grow and produce under adverse conditions. In this species, the role of phytoene synthase (IbPSY1) regulated by the orange protein (IbOr) in abiotic stress tolerance has been confirmed [67,68]. Both proteins are related to carotenoid biosynthesis and accumulation. In buckwheat, FeBREB1 and FtMYB10 are the functional genes associated with the drought response, and the proteomic profile showed an overexpression of oxidoreductase activity, oxidation–reduction processes, xyloglucan:xyloglucosyl transferase activity, and apoplasts [69,70]. Thus, some NUS grow in marginal lands and extreme conditions (drought, salinity, heat, etc.) using versatile and adaptive mechanisms.

Several NUS are hardy, resilient, and long used for food by traditional communities, particularly in the primary regions of the diversity of each crop. Although the ongoing globalization of food systems worldwide has led to a uniformization of crops grown globally at the detriment of tradition, it also has contributed to crop introductions in countries from the primary regions of diversity of the crops [71]. Therefore, countries use introduced crops from regions of diversity other than their own ("foreign crops"), confirming that crop introduction is a process that has occurred throughout the history of agricultural crops. However, this process has been affected by crop homogeneity of the global food supply, which has limited the current agriculture to a focus on eleven species [62] (these species, consequently, have been the main focus of the research activities). Thus, research focused on NUS needs to be encouraged and is required to dissect their value and promote their use as alternative crops to create more resilient cropping systems. Thanks to dedicated research, previously neglected and underutilized species such as quinoa (*Chenopodium quinoa*), buckwheat (*Fagopyrum* sp.), millet (*Pennisetum glaucum*), cowpea (*Vigna unguiculata*), and sweet potato (*Ipomoea batatas*), among others, were shown to have adaptative capacities to water deficiency and have been successfully introduced as new commercial crops into production systems (Table 2). Global efforts relating to these NUS have allowed them to be explored and introduced into agricultural systems in different regions worldwide, confirming that this is a key strategy for crop diversification, nutritional enhancement, and adaptation to changing climates for future needs.

**Table 2.** Neglected and underutilized species successfully introduced to several countries as alternative crops for drought tolerance.



**Table 2.** *Cont.*

In parentheses is the number of registered varieties in each country. *Ah*: *Amaranthus hypochondriacus*, *Ac*: *Amaranthus caudatus*, and *Acr*: *Amaranthus cruentus*.

Exploiting the potential of NUS provides a highly diversified agricultural production system capable of sustaining food and nutritional security in water-deficient environments [81]. Species such as bambara groundnut (*Vigna subterranea*), taro (*Colocasia esculenta*), teff (*Eragrostis Tef*), yam (*Dioscorea esculenta*), moringa (*Moringa oleifera*), fonio (*Digitaria exilis*), safflower (*Carthamus tinctorius*), cañahua (*Chenopodium pallidicaule*), and tepary bean (*Phaseoulus acutifolius*), among others, could be alternative crops for several regions worldwide due to their natural adaptation to arid or semi-arid regions (center of origin or diversity) [71] to contribute to crop diversification and cropping systems that are more resilient to water-deficient conditions. However, more studies should be performed to understand the natural mechanisms and plasticity that allow their adaptation to current climate alterations, to identify priority species, to design their own field crop management, and to articulate strategies and actionable recommendations to encourage their cultivation and improvement.

#### **5. Methods and Approaches to Improve Crop Tolerance to Drought Stress**

Several methods and approaches could be used to characterize, identify, and apply desired traits related to drought tolerance in crops and their relatives and uncover their potential to promote more resilient cropping systems. This review is focused on both phenotyping and genomic approaches.

#### *5.1. Phenotyping Methods for Drought-Tolerance Trait Evaluations*

Plant growth and development change because of physiological alterations in response to water deficiency. These morphological traits are related to changes in metabolic patterns in source or sink organs [82]. Morphoanatomical and physiological adaptations can be determined by measuring certain traits, primarily those related to constitutive early vigor, starch storage, growth maintenance, and desiccation tolerance. These traits are important components of crop yield but can be expressed differently among genotypes. Consequently, levels of drought tolerance are expected to differ because of the phenotypic plasticity of the genotypes. In this context, phenotyping methods could complement and improve the efficiency and accuracy of field measurements that are, subsequently, to select desirable genotypes. For example, imaging techniques are suitable to measure the response to abiotic stress [83]. A decrease in stomatal conductance produces an increase in leaf or canopy temperature, which can easily be detected by thermal imaging, and changes in the assimilation rate, stomatal conductance, and intrinsic water use efficiency can be estimated [84,85]. Hyperspectral and near-infrared imaging are also used to correlate these parameters, which can easily be monitored without destructive sampling [86]. Fluorescence imaging is used to estimate the phenotypic parameters of the photosynthetic status, quantum yield, nonphotochemical quenching, and leaf health [83].

Morphology and color distributions can serve as indicators of developmental processes and stress responses of plants, and such alterations can be effectively detected by RGB color imaging. This is an effective, quantitative, and low-cost method to determine variations in several morphological traits and other yield-related parameters, such as growth, biomass, and architecture, among others, employing routines developed to convert pixels from red/green/blue images [87]. The use of RGB imaging for plant phenotyping is improving the quality of morphoagronomic characterizations, because it uses quantitative parameters as opposed to conventional qualitative parameters. Plant growth, morphology, and physiology can be automatically monitored via nondestructive analysis using RGB imaging software that is readily available. Thus, RGB is currently the most extensive imaging technology used [88]. Additionally, traits detected by imaging can be correlated with yield, resulting in cost-effective models that can identify target traits for breeding, including drought-tolerance traits and genotypes with higher plasticity.

In roots, hydrotropism is a phenotypic strategy in response to the water supply [89]. Similar to aboveground tissues, several mechanisms are related to morphological and physiological adaptations in response to drought stress. In soybeans, the plants with more lateral roots, a thicker lateral root system, and forks showed higher yields under water-deficient conditions in clay or sandy soils compared to those of the susceptible varieties [90]. Several phenotyping systems have been developed to evaluate the response in roots to drought conditions. Those systems use high-throughput analysis to determine the stem diameter, median and maximum root width, root top, and bottom angles (using Digital Imagen of Root Traits (DIRT) [91]), properties of the primary and lateral roots (employing RootGraph [92]), and other parameters. Although image analysis has improved with time, the extraction of roots from the soil produces damage that can affect the evaluation of the actual plant response. The 3D reconstruction systems that are currently under development perform nondestructive evaluations of the root system, as they do for aboveground organs, to address this problem [93].

All the previously described imaging methods could be useful tools to characterize wild species, cultivars, landraces, and orphan crops with mechanisms to tolerate drought stress conditions. The identification of these desired traits contributes to the efficient use of genotypes with higher tolerances to drought stress, expressed primarily in yields under water-deficient conditions.

#### *5.2. Potential of Genomic Approaches to Improve Crop Tolerances to Drought Stress*

For the development of genetic materials with drought tolerance, selection has focused on genotypes with high yield levels cultivated under dry conditions. For example, in wheat and maize, selections have been based on evaluating the plant phenotypes and physiological responses to drought stress [94]. Selection based on plant responses to drought stress is affected by low heritability, genetic interaction, environment-genotype interactions, and polygenic effects; therefore, the selection is slow, as massive phenotypic screening is required [95]. In this context, future breeding programs focusing on drought tolerance require the combination of plant breeding, genomics, statistics, experimental design, and genetic diversity management strategies. The combinations of these approaches can offer new opportunities in the genetic dissection of important quantitative traits

(e.g., drought tolerance) through the identification of quantitative trait loci (QTL), the implementation of marker-assisted selection (MAS) in breeding programs, and the cloning of these QTLs/genes and their editing using genetic-engineering strategies [94,96]. All approaches are studied using data from "omics" strategies, such as genomics, transcriptomics, proteomics, and metabolomics, which include large amounts of information; therefore, bioinformatics approaches are required for its analysis; hence, the importance of this area of computation in the analyses that integrate genotypic and phenotypic information, such as QTL mapping, single-nucleotide polymorphism (SNP), and gene discovery, as well as genomic selection, among others. Currently, the generation and use of bioinformatics tools, the open or closed code source (paid programs), and the use of molecular databases such as NCBI (National Center for Biotechnology Information) allows the analysis and interpretation of this information at different levels and varying degrees of complexity [97,98].

Genetic engineering has allowed the manipulation of the plant genome to study the gene structure and the function of candidate genes [95]. However, for drought stress, this strategy has not allowed the development of drought-tolerant varieties [94,99]. For example, maize (the highest-yielding cereal crop worldwide) is susceptible to drought during the flowering period, and the materials generated with genetic modifications to improve their drought tolerance presented a reduction in their yield traits. However, the overexpression of the trehalose-6-phosphate phosphatase (TPP) gene in maize plants has allowed the identification of materials that show an increase in yield of a similar proportion to the material generated without genetic modifications cultivated in mild and severe drought conditions [100]. Nonetheless, in Arabidopsis, the genetic basis of drought tolerance has been studied, and related homologous genes and their products have been identified and manipulated using genetic engineering, which demonstrated the feasibility for agriculturally important crop species. For example, ABA is a key player in drought tolerance and avoidance, as described above (Section 2). ABA regulates stomata closure in cases of water deficiency and plant desiccation and activates stress response genes by regulating diverse transcription factors. Therefore, if the primary aim is to increase plant sensitivity to ABA, this plant hormone is one of the main targets for genetic breeding regarding plant drought tolerance [96].

Several approaches are proposed to achieve plant tolerance to drought in transgenic crops. One approach is based on the manipulation of the ABA pathway, either by downregulating or by upregulating genes and the corresponding proteins involved in ABA signaling, biosynthesis, or degradation [7]. Farnesyltransferase ERA1 participates in ABA signaling, and plants transformed with antisense ERA1 constructed under a drought-inducible promoter showed higher tolerance to drought [101]. Another approach relies on the regulation of transcriptional activities of several genes by controlling the functions of the ABA-dependent transcription factors MYB2, MYC2, CBF, and AREB or the ABA-independent transcription factors such as DREB2, DREB3, and ZHDH [96,102,103]. ABA-independent transcription factors regulate the expression of stress-responsive genes; DREBs were first identified in Arabidopsis, but their role in stress tolerance has been demonstrated in several crops (e.g., maize, wheat, rice, rye, barley, soybean, and tomato). Hence, because DREBs have universal roles in abiotic stress responses in plants [104], they are ideal candidates for the genetic manipulation of the drought response in transgenic plant lines. Finally, but not less important, some signaling pathways that include reactive oxygen species (ROS), sucrose nonfermenting 1-related protein kinase (SnRKs), mitogen-activated protein kinases (MAPKs), calcium-dependent protein kinases (CDPKs) and phosphatases are involved in the regulation of plant responses to drought [105–107]; these pathways are also valuable targets in the genetic engineering of drought tolerance. The experimental results of genetic manipulation in crops are positive. In maize, the increased expression of the orthologous maize transcription factor (ZmNF-YBs) increased the tolerance to drought based on the responses of various stress-related parameters, including the chlorophyll content, stomatal conductance, leaf temperature, reduced wilting, and the maintenance of photosynthesis. These adaptations contributed to higher grain yields in water-limited environments [108]. The overexpression of phytochrome-interacting factor 3 (PIF3) increases the tolerance to dehydration and salt stress, and in ZmPIFs transgenic plants,

the relative water and chlorophyll contents and chlorophyll fluorescence increases, in addition to a significant increase in cell membrane stability under stress conditions [109,110].

Drought tolerance is a quantitative trait controlled by many genes, so the genetic manipulation to generate new cultivars with drought tolerance using the single genetic interventions strategy is difficult [111,112]. However, the understanding of the genetic bases of drought tolerance has increased [101], and many genes associated with this trait have been identified [113] and used in the implementation of gene-editing systems [114], gene silencing [115] and overexpression methods [116] to generate materials with possible drought tolerance [100]. Some examples of success using these strategies to produce new materials with possible drought tolerance in the main crops are presented in Table 3. As with the phenotypic selections, the genomic analysis for drought tolerance has been studied, measuring traits related to yield in dry conditions [113], because there are few reports about regions of the genome associated with specific drought-response components. Additionally, in species such as wheat, the QTLs associated with drought tolerance remain large, and their uses in breeding programs are limited [95].


**Table 3.** Examples of the contribution of genomic approaches in the breeding of major crops for drought tolerance (DT).

Genomic analyses for drought tolerance have allowed the identification of genes, transcription factors, miRNAs, hormones, and proteins involved in this type of stress [96], including loci related to ABA pathway signaling [124], leaf senescence [125], and other drought-related traits [96]. For example, for related ABA signaling, one of these QTLs contained a candidate gene coding for 9-cis-epoxycarotenoid dioxygenase 2, which is an essential enzyme during ABA synthesis that is expressed under drought stress [124]. QTLs related to plant growth and physiological parameters in wheat under water-deficit conditions were identified from a genetic map of 3200 SNP markers and 783 loci, and strong effects of these QTLs related to drought conditions were detected, providing evidence of their potential in plant breeding. Nevertheless, low heritability values showed that these traits are highly influenced by the environment [87]. Similarly, in barley, two primary QTLs associated with drought stress and leaf senescence were identified. Water deficiency negatively affected the biomass yield, leaf color, and electron transport rate values, whereas the osmolality, free proline content, and total content of soluble sugars increased under drought stress [125]. In rice, asymmetric root growth and an increase in the root growth angle were observed upon the introduction of the DEEPER ROOTING 1 gene (DRO1), which is a quantitative trait locus in rice that controls the root growth angle; therefore, the resulting line avoided the occurrence of drought by increasing deep rooting and maintained a high yield performance under drought conditions [126]. However, drought tolerance is a highly complex trait, so the cultivar development for this target has been achieved by classical breeding [95].

Due to the interest in drought tolerance, genomic tools such as genomic selection (GS) and marker-assisted selection (MAS) are required in new breeding programs [127]. To implement MAS, first, the identification of molecular markers and genes/QTLs that explain the phenotypic variance of drought tolerance is required, and due to drought stress in plants, many changes in gene expressions are observed. For this, the identification of candidate genes that are expressed in drought-stress conditions is a major strategy, and genomic technologies such as microarray and transcriptomic analyses have been useful in the identification of these genes [101,128]. These technologies include new strategies in the identification of candidate genes associated with important agronomic traits in species with genome sequencing that have been developed as novel in-silico platforms for gene discovery and that help scientists to identify candidate genes through the knowledge available in genetic databases and public information, allowing candidate gene prioritization [129]. Although the advance of genomic technology has allowed the identification of QTLs for drought tolerance, this information has been underutilized in the generation of new cultivars with drought tolerance [113]. To improve drought tolerance, QTLs associated with this trait have been introgressed into breeding populations and selected genotypes using MAS. For example, in rice, alleles of QTLs related to drought stress were transferred into different genetic backgrounds, and their effects were identified; moreover, the phenotypic evaluations confirmed the success of MAS for this trait [130].

Many favorable alleles for drought tolerance are present in crop wild relatives; thus, new alleles from those wild species could be included in breeding programs. These alleles should come mainly from genotypes adapted to target environments and traits. For this, it is necessary to include and evaluate these materials in all breeding programs, particularly in crops that show low levels of genetic diversity due to bottlenecks caused by selection and inbreeding [131]. In crops such as wheat and barley, QTLs related to drought have been identified and introgressed from wild relative species [132], allowing the inclusion of new favorable alleles in the breeding populations. Another approach to identifying regions that explain the phenotypic variation of traits (e.g., drought tolerance) in plants is the Genome-Wide Association Study (GWAS), where it is possible to associate a specific single-nucleotide polymorphism (SNP)/gene with a phenotypic variation of any trait [133]. The GWAS combines phenotypic and genomic information to identify statistical associations in order to explain the phenotypic variations of the target trait. This approach uses all allelic variations available and the possibility to identify marker-trait associations. On the other hand, resolution mapping depends on the linkage disequilibrium (LD) extension in the mapped population. These populations can be found in gene banks, core collections, and breeding populations, allowing the analysis of all historical recombination events in the used population. Additionally, the GWAS considers the relationship between the samples and their genetic structure in order to minimize false positives [134]. For drought stress, the GWAS has been used to identity SNPs related to high-temperature tolerance and their yield effects in crops such as wheat [135], cotton [136], rice [137], and maize [138], among other species.

Considering the above, new approaches in genomics strategies such as GS combined with high-throughput phenotyping are starting to be used to achieve different important target traits; this is because this approach allowed the identification of better lines through prediction models for complex traits such as stress (e.g., drought tolerance), employing the genomic estimated breeding value (GEBV) at the individual level [95]. GS is a form of MAS but shows new characteristics for identifying promising materials in comparison to conventional MAS that uses some markers/genes previously identified, with significant effects in the genetic government of the target traits, to improve their level [95]. In recent years, the most popular technologies to identify molecular markers in a partial representation of the genome (use of enzyme restriction) are genotyping strategies based on next-generation sequencing such as genotyping by sequencing (GBS), RAD-seq, diversity arrays technology (DArT), and the complexity reduction of polymorphic sequences (CRoPS), among others [139,140].

Genomic selection implemented prediction models that allowed selecting genotypes without phenotypic evaluation. In this sense, alleles with high and low effects are analyzed; the breeding of complex traits such as drought tolerance is well-supported by this approach. GS strategies in the plant breeding of maize and barley have reduced the selection time by almost half per cycle compared to the phenotypic selection [141]. GS in plant breeding has been used to improve different important traits such as grain yield in maize [142], soybean cyst nematode resistance [143], amylase activity in barley [144], and grain yield and plant height in rice [145], among others. To this end, genomic selection offers a new method to accelerate the breeding process [146]. The use of GS to improve drought tolerance has been reported in maize [147,148] and chickpea [149], establishing a low-to-medium prediction accuracy for yield and secondary traits related to drought stress. The implementation of GS has shown that the prediction accuracy is affected by the breeding population types, training population size, the complexity of the trait, and the number of markers used. Additionally, the inclusion of marker-trait associations identified by QTL and GWAS analyses in prediction models increases the prediction accuracy [150,151]. Thus, it is necessary to continue with the study of the genetic architecture of complex traits by GWAS and QTLs analyses.

Currently, it is possible to generate new alleles in known genes through second-generation gene-editing methods, the most popular being the zinc-finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), and the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated nuclease protein (CRISPR/Cas) system [152,153]. The CRISPR/Cas9 system is a technique to edit genes that function as an endonuclease that induces double-strand breaks (DSB) at specific genome sites, followed by DNA repair, which includes, in the gene sequences, some insertions and/or deletions [154]. This system for genome editing has been used in species such as Arabidopsis, tomato, rice, maize, and wheat [154–157], among others. This gene-editing system is known to be simple to implement, has design flexibility, is low-cost, and is highly efficient [158]. The CRISPR/Cas9 system for drought stress has been used to edit the genes ARGOS8 in maize [159]; SlMAPK3 and SlNPR1 in tomato [160,161]; MIR169a and OST2 in Arabidopsis [162]; and OsDERF1, OsPMS3, OsEPSPS, OsMSH1, and OsMYB5 in rice [163], among others.

Genetic information on drought tolerance pathways in crops has proven to be a promising approach to improve crops using genetic engineering, marker-assisted selection, and genomic selection, thus demonstrating the feasibility of integrating this approach into drought tolerance challenges in cropping systems.

#### **6. Concluding Remarks**

In this review, we described two strategies to improve drought-stress tolerance in crops: (i) the use natural genes for drought stress tolerance that have evolved over time and are present in crop wild relatives and landraces and (ii) exploiting the potential of neglected and underutilized species and introducing them into cropping systems to make them more resilient to water deficiency conditions. For both, the richness of genetic diversity represents an invaluable reserve for breeding, crop diversification, nutritional enhancement, and adaptation to changing climates, which should be recognized and conserved for future needs. The mechanisms of the drought tolerance of crop gene pools and neglected and underutilized species guarantee food security in environments where they grow naturally and/or are cultivated. However, despite the recent studies in this field, much information remains unknown. Future studies should continue and integrate several approaches (including both phenomics and genomics) to explore, characterize, identify, and use desired traits, contributing to the development of crops and cropping systems with tolerances to drought stress. An understanding of the morphoanatomical, physiological, and genetic mechanisms involved in the responses to drought stress in crop wild relatives and neglected and underutilized species is fundamental to recognize their potential for crop breeding or crop diversification. Therefore, the integration of crop breeding supported by phenomics and genomics is key for improving the tolerance to drought stress in crops, as has been reflected in species such as corn and other mentioned species. Likewise, for neglected and underutilized species such as quinoa, whose development as a crop in recent years can be considered as the result of the integration of these strategies and introduced into existing cropping systems. This is a practical direction for future research to make more resilient cropping systems to water deficiency conditions.

**Author Contributions:** All authors listed have made a substantial, direct, and intellectual contribution to this work; further. All authors have read and agreed to the published version of the manuscript.

**Funding:** The Internal Grant Agency, FA, Mendel University in Brno (No. IP 40/2015), NAAR (No. QK1910197) (both in the Czech Republic), and the Ministerio de Agricultura y Desarrollo Rural (MADR) of Colombia (Agreement No. 1828) funded this study.

**Acknowledgments:** To William Gomez for support and time for discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

### **The Role of Genetic Resources in Breeding for Climate Change: The Case of Public Breeding Programmes in Eighteen Developing Countries**

### **Gea Galluzzi 1,\*, Ase**ff**a Seyoum 2, Michael Halewood 1, Isabel López Noriega <sup>1</sup> and Eric W. Welch <sup>2</sup>**


Received: 29 July 2020; Accepted: 21 August 2020; Published: 31 August 2020

**Abstract:** The role of plant breeding in adapting crops to climate changes that affect food production in developing countries is recognized as extremely important and urgent, alongside other agronomic, socio-economic and policy adaptation pathways. To enhance plant breeders' capacity to respond to climate challenges, it is acknowledged that they need to be able to access and use as much genetic diversity as they can get. Through an analysis of data from a global survey, we explore if and how public breeders in selected developing countries are responding to climate challenges through a renewed or innovative use of plant genetic resources, particularly in terms of types of material incorporated into their breeding work as well as sources of such germplasm. It also looks at the possible limitations breeders encounter in their efforts towards exploring diversity for adaptation. Breeders are clearly considering climate challenges. In general, their efforts are aimed at intensifying their breeding work on traits that they were already working on before climate change was so widely discussed. Similarly, the kinds of germplasm they use, and the sources from which they obtain it, do not appear to have changed significantly over the course of recent years. The main challenges breeders faced in accessing germplasm were linked to administrative/legal factors, particularly related to obtaining genetic resources across national borders. They also underscore technical challenges such as a lack of appropriate technologies to exploit germplasm sets such as crop wild relatives and landraces. Addressing these limitations will be crucial to fully enhance the role of public sector breeders in helping to adapt vulnerable agricultural systems to the challenges of climate change.

**Keywords:** genetic resources; plant breeding; climate change adaptation; genebanks; policy; developing countries

#### **1. Introduction**

The International Panel on Climate Change (IPCC) [1] defines climate change as a change in the mean value and/or variability of the climate's properties, that persists for an extended period, usually for decades or longer. Agriculture is extremely vulnerable to climate change: increasing temperatures and declining precipitation over semi-arid regions are likely to reduce yields for a number of primary crops in the next two decades [2–7]; the intensity and distribution of pests and disease outbreaks may also become increasingly unpredictable, with serious impacts on agricultural productivity [8–10]. While the overall impacts of climate change on agricultural systems are expected to be negative [11], the effects may vary both in type and magnitude across geographical areas [12,13]: developing countries

are likely to be most affected, not only because of their predominantly low input, rain-fed cropping systems which rely on somewhat regular weather patterns, but also because of their more rapid population growth, which determines greater pressures on agricultural production and more serious food insecurity risks [2,14–16].

Adaptation to climate change is the practice and process of adjusting to climate-induced adverse effects [2,14,17]. In agriculture, adaptation to climate change is about farmers' and other stakeholders' responses to environmental disturbances that affect their cropping systems. This may involve a broad array of alterations from livelihood and agronomic strategies to policy changes [18,19]. Among those strategies that focus on the production system itself, climate change adaptation strategies are commonly divided in two pathways: agronomic management (which tends to be short term) and genetic improvement (longer term) [9,13]. Agronomic management strategies encompass changing cultivation practices (timing or location of cropping activities, techniques of land preparation, weed/pest/disease management [18]) and the adoption of new varieties or shifting to alternative crops or crop combinations [9,13,20–25]. Genetic improvement, on the contrary, involves the development and adoption of new, better adapted varieties of the crop of interest. While farmers have always conducted genetic improvement within their informal seed networks [26–30], a crucial role has been played over the last century by public (and, increasingly, private) plant breeding [31,32]. Indeed, "modern", scientific plant breeding has contributed to massive and rapid yield increases in many crops, as well as to increased tolerance to a variety of biotic and abiotic stresses; however, with more severe and frequent challenges from aggravated climate change, plant breeders are being called to place extra efforts in improving and accelerating the tools and working strategies they use, to timely provide farmers with adapted varieties [33–36], particularly in more climate vulnerable developing countries.

An essential building block for any innovation in plant breeding to occur is access to, and use of genetic diversity from existing wild or domesticated species [37–40], particularly when the genetic target dealt with is complex, as in the case of adaptive traits for climate stress responses. In developing new cultivars, breeders can start from a range of different germplasm types and germplasm sources. Among the former, landraces and crop wild relatives remain perhaps the largest reservoir of genetic diversity, including traits of tolerance or resistance to environmental stressors, even those associated with climate change [30,36,41–54]. On the other hand, advanced breeding or elite lines which have undergone pre-breeding efforts, may harbour less genetic diversity but be more "ready to use" materials for breeders, thanks to the useful information accumulated on their structure and properties [49,55–59]. In terms of sources, breeders can tap into the collections of national and international genebanks and germplasm research centres, which host a wide range of different materials for different crops [55,57,60], or search for the diversity they may need directly in natural or farmers' fields, particularly in areas of origin or domestication. Recent years have witnessed the establishment of national laws and regulations that have changed the rules of germplasm access, distribution and sharing, potentially affecting the use of genetic materials by breeders [57,61–65]. Additional factors, including human, technical and financial resources available in any breeding programme, as well as linkages and collaborations with other national and international institutions, are likely to contribute to shaping the way breeders use genetic resources in the context of climate change adaptation [66,67].

It is against the above context that we undertook a survey of plant breeders in 19 developing countries to see how they are perceiving climate change's impacts on their breeding objectives, and if these changes had knock-on effects on the kinds of genetic resources they use and where they access them. In particular, the objectives of the study are: (1) assessing breeders' perception of climate change and the traits they give priority to in breeding; (2) exploring trends in the use of genetic materials in breeding programs over the course of the last two to five years (in terms of types of materials used and sources of access); (3) assessing the association between changes in the way breeders use genetic resources and their perception of climate change priorities; and (4) examining if and how regulatory, technical, financial or other issues influence the extent to which breeders are capable of innovating with genetic resources.

Section 2 presents the results of the study, Section 3 provides the discussion and conclusions; data sources, data collection procedures and methods of analysis are presented in Section 4. We chose to focus on public sector breeders only, based on the consideration that in the developing world, private breeding continues to be modest for most crops, while public support to breeding is possibly still the major avenue for the development of new varieties [66], and for focusing on traits (such as environmental stability and sustainability) or crops which may be under-researched in private sector breeding [68].

#### **2. Results**

Complete responses were received from 200 breeders in 18 countries, who answered in one out of three languages (English, French and Spanish), as summarized in Table 1.


**Table 1.** Sample frame of the survey.

Around 57% of the respondents were from national agricultural research systems, while about 24% belonged to academic institutions. Although we had targeted breeders which to the best of our knowledge belonged to public institutions, we received a few responses from the private sector, which anyhow represented only 5%. Respondents from national non-governmental or non-profit organizations totaled 6%. The remaining 8% of respondents worked in other organizations including local, community or farmers' organizations as well as international non-governmental and non-profit organizations. Regarding gender, 146 breeders were male, while 32 were female. At the time of the survey, the average age of respondents was 49 years, and the average time they had worked as breeders was just over 17 years, of which, on average, 15 years were spent working in their current organization. Over half of the breeders (113) had a doctorate degree, while over half of the others (46) had a master's degree.

#### *2.1. Perception of Extreme Weather Patterns*

The plant breeders were asked about their perception of extreme weather events occurring in their region of work over the last five years. The results show that most breeders agreed on irregular drought periods and irregular rainfall patterns as being the most seriously increasing weather phenomena (see Figure 1). A considerable number of plant breeders also observed an increasing tendency toward a late onset of cropping seasons, a pattern that was strongly correlated to perceptions of irregular rainfall (*p*-value = 0.007864) as well as with irregular and excessive droughts (*p*-value = 0.00000002616 and 0.002548 respectively) (data not shown).

**Figure 1.** Plant breeders' perceptions about weather patterns observed in their regions. Numbers on or next to the bars represent percentages over the total responses.

The crops which breeders work on, do not seem to significantly influence their perception of climate change. To detect if there were significant differences across regions in terms of climate change perceptions, we subdivided our responses into six geographical regions, namely Central America (with five responses), East Africa (38), East Asia (43), North Africa and the Middle East (15), South America (29), South Asia (53) and West Africa (17). Ordinal means per region revealed that the most serious problem affecting all of the areas was increasing irregularity in rainfall and drought as well as from increasing drought periods. Inter-regional differences in perceived weather patterns were not statistically significant (based on an ANOVA analyses of variance) but may provide an indication of prevailing trends. Breeders in West Africa reported to be dealing with increasing unpredictability in the onset and end of rainy and dry seasons; in East and West Africa an increase in cold weather periods was reported, while Central America was the only region where a slight decrease in overall rainfall quantity was described (in all other regions, rainfall appears to be changing little or slightly increasing). Central America also scored highest in terms of increases in strong winds and hurricanes. Late onset of seasons, more than early, was reported as an issue in many regions, namely in Central America, East Africa, East Asia, North Africa and the Middle East, and South Asia.

#### *2.2. Changes in the Traits on Which Breeders Are Working*

The results presented in Figure 2 describe the changes in the importance of the traits on which breeders are working. Breeding for resistance to pests and diseases is the priority that has become most important over the past five years (87%), followed closely by breeding for drought tolerance (82%), shorter growth cycles (63%) and tolerance to high temperatures (54%). In another set of questions concerning the traits that have always been most important regardless of recent changes, breeders flagged the same traits in the same order of importance. Similarly, breeding for low temperature as well as for water logging tolerance have not consistently increased in importance, and, indeed, their

overall relevance even before the last five years was ranked quite low by respondents. Over 85% of breeders listed other important traits in addition to these priority ones, including quality traits (63%), traits related to tolerance to low soil fertility (20%) and yield (11%). In summary, recent years appear to have witnessed an increase in the urgency with which breeders are working on traits that already had high priority in their longer-term breeding work. The "other" traits category was chosen by a number of respondents, but 66% of these did not specify in the open text space what they meant. Those who did, flagged the importance of working for improving the quality of the crops' final product (15% of the total "other" responses), tolerance to limiting soil conditions (acidity, salinity, low nitrogen, aluminium, 7%) and yield improvements (5%).

**Figure 2.** Change in the importance of traits in breeders' work. Numbers on or next to the bars represent percentages over the total responses.

While our questions on changing trait importance required choosing among a list of single priority traits, the results we received reflect the complexity of breeders' work, which entails focusing on different but somehow related traits at the same time: focusing more on pest/disease resistance and drought tolerance was related to an increase in the importance of all other breeding objectives; strong associations were also found among breeding for tolerance to low temperatures and waterlogging and between breeding for high and low temperature tolerance, as if a more encompassing breeding objective was obtaining a general improvement in temperature stress response mechanisms. The many positive associations reported between the changing importance of different traits confirm the complex, multi-trait nature of a crop's adaptation to biotic and abiotic stresses. As expected, changes in the importance of traits during recent years showed a strong association with breeders' perception towards changes in some of the climate patterns (Table 2): plant breeders who observe an increase in irregular and excessive drought also increase their focus towards drought resistance and shorter growth cycles; those focusing more on pest and disease resistance are those who report a more prevalent shift in seasonality and drought occurrences; those detecting excessive and prolonged cold periods (although not common among the respondents) tend to breed more for traits of tolerance to low temperatures. In summary, the recent focus of breeders has not shifted significantly from the most important traits they had been working on routinely, but it does closely mirror their perceptions of climatic events and trends.


**Table 2.** Correlation between the change in importance of traits in breeding and perceived changes in climate.

\* Significant at the 0.1 probability level. \*\* Significant at the 0.05 probability level. \*\*\* Significant at the 0.01 probability level.

The majority of breeders reported that farmers' preferences for resistance to pests and diseases, drought tolerance and short-cycle varieties had increased the most, followed by high temperature tolerance; significant positive associations were detected between breeders' increasing attention to pest/disease resistance, drought tolerance, high and low temperature tolerance and farmers' increased preference for each of these traits. A caveat must be made here: farmers' preferences, as described in this article, were reported by breeders in the survey and not gathered among farmers directly. We cannot therefore be certain that the breeders' perceptions are not self-made or just intuited. In a different section of our survey we asked breeders if their institutions had specific partnerships with relevant field-level organizations that allowed them to interact with farmers and determine their needs. The majority of responses were positive; most breeders stated that their programme interacted with local, community or farmers' organizations (over 78%) as well as with national agricultural research organizations (72%). This points to an encouraging scenario of collaboration and dialogue between researchers and grassroots/community-level organizations to ensure that breeders' work is aligned with farmers' needs (see Table 3).



\* Significant at the 0.1 probability level. \*\* Significant at the 0.05 probability level. \*\*\* Significant at the 0.01 probability level.

#### *2.3. Changes in the Genetic Materials Used in Breeding Programs*

Having noted a positive association between breeders' perceptions of climate change and of farmers' needs, on the one hand, and the traits breeders' target, on the other hand, we analysed if an association existed between breeders' perceptions of the prevailing climate patterns and the changes in the types of genetic resources they use. First, breeders were asked to describe the relative proportion of crop wild relatives, landraces and advanced/elite lines in their routine work, and then to describe any recent (two to five years) increase or decrease of each germplasm type. Advanced or elite lines were reported as the prevalent material type in breeders' routine work (53%), followed by landraces (25%), other kinds of materials (15%) and crop wild relatives (7%). The use of crop wild relatives showed

a positive association with the use of landraces, suggesting that plant breeders who use a greater proportion of the one material type are also likely to be using more of the other. On the contrary, a trade-off between the use of elite lines and less advanced materials exists, indicating that the choice of advanced materials comes with a more likely reduction in the use of landraces and crop wild relatives (data not shown). In terms of recent change, the only category of germplasm that was reported to have increased among the majority of breeders (58%) was that of advanced/elite lines. Conversely, for landraces and crop wild relatives, the majority of responses indicated no significant change in their use (42% and 49% respectively) (see Figure 3).

**Figure 3.** Change in the proportion of germplasm types used by breeders. Numbers on or next to the bars represent percentages over the total responses.

The crops that breeders reported to be working on were grouped into categories: cereals and pseudo-cereals (114 responses), grain legumes (27 responses), roots and tubers (25 responses), fruits (16 responses) and other crops (18 responses). No significant differences were found between crop categories in terms of changes in the types of materials used.

A few elements of breeders' perception of climate change were related to recent changes in the types of materials they use. The significance of these relationships is limited to landraces and elite lines since no association exists with crop wild relatives (which, as we have seen, are more scarcely used). When breeders observe increasing rainfall quantity and irregularity, or colder weather, they tend to use more landraces in their breeding work. On the other hand, those plant breeders observing later onset of seasons, tend to use smaller proportions of landraces. Elite lines are prevalent among breeders concerned with increasing drought irregularity (see Table 4).


**Table 4.** Association between breeders' perception of climate change and changes in their use of germplasm.

\* Significant at the 0.1 probability level. \*\* Significant at the 0.05 probability level. \*\*\* Significant at the 0.01 probability level.

These associations are hard to interpret or even match in literature, since all these climate challenges can potentially be met by working with both advanced materials as well as with landraces and even crop wild relatives. However, using more landraces positively correlates with a higher number of climate challenges, suggesting greater climate-related efforts or successes among breeders who explore potentially more variable reservoirs of genetic diversity.

#### *2.4. Changing Sources of Breeding Materials*

The majority of breeders reported to be mostly using ex situ material (82%, if we combine all internal material and that coming from genebanks and universities, excluding community genebanks). Thirty-five percent of the ex situ germplasm comes from the breeder's own organization, and 23% comes from the CGIAR. The share of materials from the national genebank in the breeders' own country is rather low, probably due to the fact that most respondents belonged to the NARO, which often also includes or manages the national genebank, and hence referred to this source as their "own organisation". A less consistent, but still significant, proportion of materials also comes from farmers' fields and natural areas (11%) and community genebanks (5%), which we consider to be a conservation strategy closer to on farm than to ex situ [69,70] (see Figure 4).

**Figure 4.** Sources of germplasm used by breeders. CGIAR, Consultative Group on International Agricultural Research; CSB, community seed bank; gb, genebank; NARO, National Agriculture Research Organization; NGO, non-governmental organization.

Association analyses between changes in sources of germplasm and material types highlighted a few relevant relationships. Breeders who have been using more crop wild relatives over recent years, although few in number, increasingly sourced them from a variety of ex situ collections (both public, including the CGIAR, and private, and mostly outside their countries' borders) rather than in situ (i.e., in farmers' or natural fields). Increasing landrace use is indeed the only trend that is associated to greater on farm sourcing; other typical sources for those breeders using more landraces are a variety of (public) institutions, all within the national territory. Increased use of elite lines was strongly associated with foreign or international sources—that is, genebanks and universities, and the CGIAR (see Table 5).

These relationships mirror what was already happening in breeders' routine work, previous to recent years (data not shown), and are therefore not indicative of any recent change in behaviour or innovation.


**Table 5.** Association between changing material types and sources of breeding material.

\* Significant at the 0.1 probability level. \*\* Significant at the 0.05 probability level. \*\*\* Significant at the 0.01 probability level.

#### *2.5. Policy, Financial and Other Limitations*

To analyse whether other factors are related to breeders' behaviour with respect to germplasm use, we asked breeders about if and how (always/often, sometimes, rarely/never) they are subject to limitations that affect their capacity to access and use diverse sets of germplasm. The limitations that breeders were offered to choose among were of a legal, administrative, financial and technical nature (see Figure 5). The types of restrictions that the majority of breeders reported to be always or often affected by, were of an administrative and policy nature, the most serious being burdensome procedures imposed by providers (60%) and national rules or laws (58%). These were followed by restrictions to the further transmission of the material received (50%), difficulties related to the use of material transfer agreements (48%), intellectual property regimes (47%), international rules (45%), internal administrative procedures (43%) and an unwillingness by providers to share materials (43%). Restrictions of technical (breeders' ability to use the germplasm) and financial (payments for obtaining germplasm) nature were less frequently reported.

Some of these limitations were significantly related to breeders' efforts to incorporate a greater proportion of specific germplasm types into their work (Table 6). Using more advanced materials, which was a widespread trend in our sample population, was related to experiencing administrative limitations both by providers and by breeders' own organizations. The few breeders who are increasing their use of landraces and crop wild relatives experienced other limitations, including financial (for both landraces and crop wild relatives), technical capacity (crop wild relatives) and political (landraces) issues. The positive and significant association between the difficulty of using landraces and the "costs of shipment" variable is rather surprising, since respondents declared to be mostly and increasingly sourcing landraces from national or internal sources to their organization. It could be that breeders who are using more landraces are close to exhausting their "routine" sources of variation and are eager to obtain more samples of this germplasm category, which are unavailable within their closest collections. It may be that in seeking beyond these, they incur in shipping costs which they are unable to cover.


**Figure 5.** Administrative (adm)/legal and technical factors affecting breeders' work. Numbers on or next to the bars represent percentages over the total responses.


**Table 6.** Association between limiting factors and changes in the use of germplasm types.

\* Significant at the 0.1 probability level. \*\* Significant at the 0.05 probability level. \*\*\* Significant at the 0.01 probability level.

When prompted about the effect of access and benefit sharing policies on their work, most breeders reported that they were scarcely influenced by these policies (25%), although another 21% said their situation had worsened; few breeders (12%) responded that they did not know if and how these policies had affected their work. These percentages are in a way surprising if compared to the importance breeders had given to legal and administrative barriers in the previously analysed question, but may suggest that there are either other non-ABS barriers which we did not capture, or that there is not a widespread awareness and knowledge about national and international ABS frameworks and their inter-relations. Regarding in particular the effect of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA), 37% of the breeders stated that it had made little or no difference to their work, and 22% answered that they did not know. Among the surveyed countries, China (from where we received numerous responses) and Bolivia were not Parties to the International Treaty at the time of the survey (China is still not). When asked about the lack of specific technologies or tools, limited access to molecular tools and approaches (from sequencing to genomics and proteomics instruments, together with the capacity to use them) was the most urgent

factor (for 68% of respondents). Infrastructure for phenotyping, controlled trials, micro-propagation and characterization followed (24%). The availability of genetic materials or information about them was deemed to be a critical limitation by only around 6% of respondents. Breeders were also asked if the budget available for their breeding programme as well as international donor funding had changed. The majority of breeders reported that both had increased (57% and 39% respectively). Positive changes in financial availability were related to a greater use of crop wild relatives in breeding, which makes sense in light of the fact that the few breeders working more with wild material had reported financial limitations.

#### **3. Discussion**

Overall, our results suggest that the surveyed plant breeders are well aware of increasingly urgent climate patterns in their target regions, and that their response is to increase their focus on traits that were already the highest priority in earlier years. For the vast majority of breeders, regardless of geographical or crop focus, these traits included pest and disease resistance and drought tolerance, followed by shorter growth cycles, which they believe to also be the traits most desired by farmers. Our results confirm that recent climate changes have exacerbated the breeders' sense of urgency in addressing biotic and abiotic challenges that were already a high priority [71,72]: indeed, research has suggested that dry regions will become drier and wet regions wetter in response to global warming, a trend labelled as the "rich get richer" mechanism [73]. This tendency would naturally lead breeders to devote even more efforts to traits that have always received the highest importance in their work, such as drought tolerance, particularly in those developing countries which rely on mostly rain-fed cropping systems and are hence more heavily affected [71,74]. The prevailing tendency among the surveyed breeders was to increase their use of elite lines versus landraces and crop wild relatives. One might have expected instead to see increased reliance on more genetically diverse materials as potential sources of genetic traits adapted to changing climate changes, as highlighted in the introduction. However, increased use of landraces, albeit not widespread in our sample population, correlated with the highest number of climate challenges, suggesting that those relatively few breeders who are working more with this material type, are able to tackle specific climate change needs. However, financial, technical and policy related disincentives appear to be very influential in limiting a more widespread increase in the use of landraces, and even more so crop wild relatives. Of course, it is also possible that some of the advanced lines that breeders are increasingly using derive from introgression of traits from crop wild relatives or traditional varieties as a result of pre-breeding conducted elsewhere, thus guaranteeing the incorporation of new, potentially adaptive diversity anyway.

As far as sources of material are concerned, breeders prioritised materials coming from their own organization and secondly the international network of CGIAR Centers. A similar order of prevalence in germplasm sources has been observed in other studies. A survey among wheat breeders in developed and developing countries found that top priority was given by the majority of breeders to lines readily available within their own programmes; in developing countries, CGIAR lines and released varieties were the second most heavily used type of germplasm [75], as also emerging here. Indeed, it is widely acknowledged that the CGIAR has played a critical role in the provision of germplasm to developing countries since its inception, through its genebanks, breeding programmes and international nurseries [76–80]. Germplasm originally received from foreign or international programmes such as the CGIAR gradually becomes internalised into national programmes, likely making the dependence on continued international sourcing less prominent over time, compared to in-house sourcing. Indeed, international or foreign germplasm is often re-distributed by the original recipient to additional colleagues, particularly in developing countries [59]. In addition, some national programmes have consistently improved their capability of carrying out their own crosses and developing their own improved lines, with more targeted use of CGIAR improved material.

Perhaps the most interesting result of our survey is that sixty percent of the breeders confirmed that their work is always/often affected by burdensome procedures imposed by providers, and by national rules or laws regulating access and further transfer of materials. We recorded similarly high response rates with respect to the deleterious impacts of international rules and regulations, intellect property and licensing, even their own organization's administrative procedures. Overall, this points to the lack of a supportive policy environment as a greater limiting factor than their own technical capacity to use genetic resources, their ability to pay for them or budgetary limitations. We were not clearly able to pinpoint the effect of specific international ABS rules (either those under the CBD/Nagoya or the ITPGRFA) since breeders' responses to our questions on if/how these frameworks had affected their work (in general or with specific materials) suggested a limited effect or one of which they were not clearly aware.

Increasing the use of different material types is affected by policy and technical limitations in different ways. It is elite lines and landraces that are most affected by policy/legal issues, the former by administrative procedures imposed by providers, the latter by an unwillingness by providers to grant access to germplasm. Since breeders' increasing use of elite lines is related to using external, foreign sources of germplasm, there may indeed be more barriers for introducing these materials, including on the phytosanitary side: it has been noted that certain countries have adopted phytosanitary policies that have led to lower acceptance rates of international genetic materials, or that some countries may not have the capacity to carry out all the analyses that their phytosanitary policies require, resulting in decreasing requests [57]. The fact that landraces are the material type specifically related to the response on providers' "unwillingness" to share material, may be due to the long-lasting heritage of traditional and cultural knowledge and the strong sense of ownership that is associated to landraces, often instigated by civil society organizations or local and national governments [81]. Ownership and sovereignty issues become even more relevant in light of the fact that many of the breeders working with landraces are also using more materials collected from on farm sources. While on farm sourcing yields materials subject to ongoing evolutionary forces, hence potentially more adapted to climate change [57,82], it also makes it more likely to raise cultural identity and resource ownership issues. In addition to on farm sources, breeders who are using more landraces are also accessing more materials from ex situ institutions, all of which fall within the breeders' own institution or her/his national borders. The fact that they don't turn to the CGIAR as much, despite the centres' facilitated access system under the conditions of the International Treaty on Plant Genetic Resources [60,83], may be due to the higher transaction costs (increased landrace use positively correlated to increased shipping costs) and possible delays due to international phytosanitary issues [57]. Furthermore, in a scenario of increasingly cumbersome international exchanges, personal contacts and closer relationships of trust may be more effective in smoothening germplasm transactions, particularly those which involve culturally-relevant materials [57].

Technical and financial challenges outweighed political/legal or ownership issues when it came respondents' ability to access and use crop wild relatives. While there are a number of success stories around the introduction of useful traits from the highly variable pools of crop wild relatives [41,46–50], the transfer of alleles from wild populations tends to be slow, genetically tricky and expensive compared to when using advanced or elite lines [12,55,84]. Other limitations, such as the lack of genetic materials or information about them, only affected a minority of breeders working with wild relatives, a minority which seeks to mine an even greater variety of ex situ sources, including foreign ex situ institutions and collections from the private sector. The lack of significant in situ sourcing of crop wild relatives goes against the recommendation of using materials subject to natural climate change selective pressures and may be an issue if we also consider the far from complete CWR representation in genebanks [10,85]. However, the possibility of sourcing materials from the wild requires a knowledge of any existing in situ conservation strategies [86] as well as targeted collection missions, which many countries don't have the finances or the technical tools to carry out. The efficiency of in situ sourcing could also be improved by greater availability and capacity to use eco-geographical and climate modelling tools, which can significantly narrow down the collection areas to be surveyed while maximising the chance to find adapted materials [87–90].

While most of the surveyed breeders are continuing to pursue similar strategies working with genetic resources under increasing climate change awareness, our results offer some interesting insights from the minority of breeders who are diversifying their germplasm materials and sources. Overcoming the barriers they experience may encourage a broader diffusion of those diversity-based strategies that literature describes as essential in responding to climate change [36–50,91]. Though some argue this potential is overstated [92,93], introducing more advanced genomic/phenotyping tools into any breeding programme has the potential to improve the power and speed of exploring large pools of diversity [94,95]. Although many molecular and genomic tools are becoming increasingly affordable, they still require substantial investment and training to be fully integrated into breeding programmes worldwide, particularly in developing countries [96,97]. The widespread diffusion of high-throughput field phenotyping infrastructure and capacities appears to be even slower [66,98,99]. The technical, technological and digital divide affecting researchers in some countries has increasingly important implications in terms of access to and use of genetic resources in breeding for climate relevant objectives. For example the opportunities to link historical climate change data with genomic analysis of germplasm stored in long-established collections, could provide an additional tool for breeders to select wild varieties with potential adaptive traits, particularly in centres of crop diversity [52]. Eco-geographical modelling tools could aid the identification of sites where to conduct multi-environment trials allowing the evaluation of germplasm across a range of climate-relevant target environments [100,101]. Financial, technology transfer and capacity building bottlenecks remain very real, particularly for certain (minor) crops and contexts that offer limited incentives for private sector involvement, for instance through public-private partnerships or consortia. Thus, increasing support from national governments and networking with foreign or international institutions will be crucial.

Above all technical trends and limitations, the persistence of high-level policy and legal bottlenecks affecting access to and use of PGRFA would deserve a more detailed and dedicated investigation than the one made possible by our survey data. Much has been written on how to smoothen national and international rules and regulations, to achieve a more mutually supportive implementation of the numerous agreements on access to and use of germplasm (and most particularly of the Convention on Biological Diversity—CBD, its Nagoya Protocol, and the International Treaty on PGRFA). It would appear that substantial work is still required, since the many actors now involved in national policy development and implementation are uncertain how to do this in practice. This scenario risks negatively affecting researchers and crop breeders, who instead could and should be playing an important role, hand in hand with farmers, in adapting agriculture to future climate challenges.

#### **4. Materials and Methods**

Data for this study were collected in 2013 through a web-based survey designed by a research team from the Centre for Science, Technology and Environment Policy Studies of Arizona State University and Bioversity International. The study targets were plant breeders in national programmes dealing with food security crops in their countries. Target countries were 19 developing countries across Asia, Africa and Latin America (namely Bhutan, Bolivia, Brazil, Burkina Faso, China, Costa Rica, Ethiopia, Guatemala, India, Ivory Coast, Jordan, Kenya, Morocco, Nepal, Peru, Philippines, Rwanda, Uganda and Zambia). Many of these are countries where Bioversity International has been running projects and has contacts within the plant genetic resources community. In addition, the selection also took into account countries' human development index and their climate vulnerability.

The sample frame for the target population was prepared by collecting contacts of potential respondents from electronic sources, including country reports from the Global Partnership Initiative for Plant Breeding Capacity Building, the African Crop Science Society and a list of national and international workshop participants, and through Bioversity International's network of national project partners. Complete contact information for approximately 1092 potential respondents was collected within the 19 countries. The team checked for repetition of names or incorrect email addresses (for example, through pinging) before administering the survey. The survey was pre-tested by the study team members. The final version was translated to English, French and Spanish, written into the Sawtooth software© and administered online. Potential respondents received an email describing the project and requesting their participation in the survey, with an appropriate consent statement. The letter provided information about the survey and its purposes as well as the link to the instrument, a unique identification and an individual password. The survey contained 65 questions covering a wide range of issues (technical, policy, financial) regarding breeders' work; the questions we are analysing in this paper focused on breeders' perceptions of climate change and of farmers' climate-related needs over the past two-to-five years, the priority traits they work on, the types of germplasm they use and the sources of access for such germplasm. For priority traits, types and sources of germplasm, breeders were asked both about their business-as-usual behaviour and about any changes occurring over the last two-to-five years. Other questions analysed here concerned the institution breeders worked in, the resource allocation for their programme, as well as gender, age, academic qualifications and breeding experience. Except for age and years of work as breeders, all the other questions were multiple choice, with several pre-defined responses. Finally, we also looked into breeders' responses to an open ended question on what tools they would need to improve their breeding work and efficiency. All categorical responses were coded numerically for subsequent analyses, which employed basic descriptive statistics such as mean, frequency and association analyses. All data analysis was performed in R [102].

**Author Contributions:** Conceptualization, G.G., A.S., M.H., I.L.N. and E.W.W.; Formal analysis, G.G. and A.S.; Methodology, G.G., A.S. and E.W.W.; Software, G.G. and A.S.; Supervision, M.H., I.L.N. and E.W.W.; Writing-original draft, G.G. and A.S.; Writing-review & editing, G.G., A.S., M.H. and I.L.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was implemented as part of the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS), which is carried out with support from the CGIAR Trust Fund and through bilateral funding agreements. For details please visit https://ccafs.cgiar.org/donors. The views expressed in this document cannot be taken to reflect the official opinions of these organizations. Support was also received from the CGIAR Genebank Platform and from the Directorate-General for International Cooperation, Ministry of Foreign Affairs of the Netherlands through the grant "Strengthening national capacities to implement the International Treaty on Plant Genetic Resources for Food and Agriculture" (Grant contract number: 21820/DSO0113359).

**Acknowledgments:** The authors would like to thank all breeders who took the time to answer the survey, providing us with the basis for the present analysis.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Plants* Editorial Office E-mail: plants@mdpi.com www.mdpi.com/journal/plants

MDPI St. Alban-Anlage 66 4052 Basel Switzerland

Tel: +41 61 683 77 34 Fax: +41 61 302 89 18