**Molecular Analysis of the O**ffi**cial Algerian Olive Collection Highlighted a Hotspot of Biodiversity in the Central Mediterranean Basin**

## **Benalia Haddad 1, Alessandro Silvestre Gristina 2,\*, Francesco Mercati 2, Abd Elkader Saadi 3, Nassima Aiter 4,5, Adriana Martorana 2, Abdoallah Sharaf 2,6 and Francesco Carimi <sup>2</sup>**


Received: 30 January 2020; Accepted: 9 March 2020; Published: 13 March 2020

**Abstract:** Genetic diversity and population structure studies of local olive germplasm are important to safeguard biodiversity, for genetic resources management and to improve the knowledge on the distribution and evolution patterns of this species. In the present study Algerian olive germplasm was characterized using 16 nuclear (nuSSR) and six chloroplast (cpSSR) microsatellites. Algerian varieties, collected from the National Olive Germplasm Repository (ITAFV), 10 of which had never been genotyped before, were analyzed. Our results highlighted the presence of an exclusive genetic core represented by 13 cultivars located in a mountainous area in the North-East of Algeria, named Little Kabylie. Comparison with published datasets, representative of the Mediterranean genetic background, revealed that the most Algerian varieties showed affinity with Central and Eastern Mediterranean cultivars. Interestingly, cpSSR phylogenetic analysis supported results from nuSSRs, highlighting similarities between Algerian germplasm and wild olives from Greece, Italy, Spain and Morocco. This study sheds light on the genetic relationship of Algerian and Mediterranean olive germplasm suggesting possible events of secondary domestication and/or crossing and hybridization across the Mediterranean area. Our findings revealed a distinctive genetic background for cultivars from Little Kabylie and support the increasing awareness that North Africa represents a hotspot of diversity for crop varieties and crop wild relative species.

**Keywords:** *Olea europaea* L.; olive; cpSSR; nuSSR; genetic diversity; population structure; Mediterranean Region

#### **1. Introduction**

Olive (*Olea europaea* L.) is one of the most important fruit species of the Mediterranean region [1]. Ninety-eight percent of olive trees of the world are cultivated in this region [2], providing over 90% of World production [3]. Olive fruits and olive oil are central in the Mediterranean diet and symbols of the Mediterranean culture. It is commonly believed that olive domestication occurred in the Near East approximately 6000 years ago [4]. Phoenicians, Greeks and Romans later spread olive cultivation to the western Mediterranean region [5–8]. The hypothesis of a human-mediated diffusion of the olive tree from the eastern to western Mediterranean basin is supported by recent genetic studies [9], demonstrating that as many as 90% of current cultivars are characterized by the same chloroplast haplotype lineage [4,10]. Therefore, the spreading of the olive culture throughout the Mediterranean Basin by human migrations and commercial exchanges has played a key role in determining the pattern of olive germplasm [11,12]. The cultivated olive germplasm shows a high degree of diversity, with about 1250 recognized cultivars [13]. Olive cultivation in Algeria dates back to antiquity, and it has maintained great socio-economic importance until present days [14] and is mostly present along the Mediterranean coast. In this area, the mountainous region of Kabylie, geographically divided in two districts by the river Soummam, great Kabylie to the West, and little Kabylie to the East, can be considered an important reserve of local olive germplasm [14]. The olive sector is considered strategic for the Algerian economy and for this reason the Algerian Ministry of Agriculture and Rural Development recently set a strategy for the expansion of olive tree cultivation in different regions, aiming to reach one million hectares by 2019, using the local genetic resources. Therefore, the identification and characterization of local germplasm is a key step for future breeding programs, cultivar selection for new plantations and to preserve Algerian olive biodiversity from the risk of genetic erosion due to introduction of foreign cultivars. Despite that Hauville [15] reported 150 varieties in Algeria, according to the Algerian Ministry of Agriculture and Rural Development, 36 main varieties are officially recognized and cultivated in the experimental field of the Institut Technique de l'Arboriculture Fruitiere et de la Vigne (ITAFV, Takarietz, Bejaia). Previous studies on Algerian olive germplasm focused mainly on the genetic characterization of a subset of local cultivars [16,17], their population structure and their genetic relationship with wild olive trees [14]. In his study of the World Olive Germplasm Bank (WOGB) of Marrakech, Haouane et al. [18] analyzed some Algerian varieties with nuclear and chloroplast microsatellites (nuSSRs and cpSSRs), but only their cpSSRs profiles are publicly available.

The present study is the first genetic characterization of the official Algerian National Olive Germplasm Repository, using both nuSSRs and cpSSRs. Among the available molecular markers, nuSSRs were chosen for their highly reproducible and informative co-dominant and multi-allelic nature, which allowed to evaluate the genetic diversity in several plant crops, such as maize [19], rice [20], common bean [21], wheat [22], tomato [23,24], grape [25–28] and olive [29–33]. CpSSRs are maternally inherited in angiosperms, and they have been informative in unravelling the phylogenetic pattern in olive germplasm [4,34] but also in other crops such as grape [35,36].

In order to provide new insights in the origin and diffusion of olive cultivars around the Mediterranean basin, we analyzed the Algerian varieties with 16 nuSSR and six cpSSR markers and compared our results with the widest published datasets available, which includes olive varieties representative of Mediterranean Basin crop's biodiversity. The aims of this research were: (i) to evaluate the genetic diversity of the main Algerian olive varieties; (ii) to assess, for the first time, the genetic relationships between this germplasm and olive accessions from public datasets; (iii) to provide useful knowledge for future cultivation expansion and breeding programs.

#### **2. Materials and Methods**

#### *2.1. Plant Material and Sampling*

A total of 34 Algerian varieties from the ITAFV national olive germplasm collection (Table 1) were sampled for the genetic characterization. The ITAFV experimental field was created between 1947 and 1954, covering 0.95 ha. It is located 30 km off Bejaia Takarietz (latitude 36.24, longitude 6.57 and altitude 63.30), in the coastal area of the Sidi Aich district (Figure 1), an area with an arboriculture vocation, characterized by a Mediterranean climate [37]. Information on Algerian varieties including its Arabic name, meaning, synonyms, putative origin, diffusion and use are reported in Table S1, the catalogue illustrating the main features of each variety is presented in Table S2.

**Table 1.** List of Algerian varieties analyzed, collected at the Institut Technique de l'Arboriculture Fruitière et de la Vigne (ITAFV).


**Figure 1.** Geographic origin of the Algerian olive cultivars sampled. The yellow point indicates the ITAFV (Takarietz, Bejaia). In brackets, the region of diffusion of the characterized cultivars is indicated. The numbers highlight the origin of cultivars: (1) Abani; (2) Aberkane; (3) Aeleh; (4) Aghchrend'el Ousseur; (5) Aghchren de Titest; (6) Aghenfas; (7) Agrarez; (8) Aguenaou; (9) Aimel; (10) Akerma; (11) Azeradj; (12) Blanquette de Guelma; (13) Bouchouk Guergour; (14) Bouchouk Lafayette; (15) Bouchouk Soummam; (16) Boughenfous; (17) Bouichret; (18) Boukaila; (19) Bouricha; (20) Chemlal; (21) Ferkani; (22) Grosse du Hamma; (23) Hamra; (24) Limli; (25) Longue de Miliana; (26) Mekki; (27) Neb Djemel; (28) Ronde de Miliana; (29) Rougette de Mitidja; (30) Sigoise; (31) Souidi; (32) Tabelout; (33) Takesrit; (34) Tefah.

#### *2.2. Molecular Analyses (nuSSRs and cpSSRs)*

Total genomic DNA was extracted from 0.1 g of dry leaves following the Doyle and Doyle [38] CTAB (cetyl-trimethylammonium bromide) method. The extract was treated with DNase-free RNase (Roche Diagnostics, Mannheim, Germany) and the quality and concentration were checked by a NanoDrop Spectrophotometer (Thermo Scientific—Waltham, MA, USA).

Algerian varieties were analyzed by 16 nuSSRs available in current literature [28–31] (Table S3). The haplotype of each variety was evaluated by using six cpSSRs [34] (Table S3). Multiplexed amplification reactions were performed in 15 μL final volume reaction mixture as described by Garfì et al. [39]. The amplification products were solved on ABI PRISM 310 Genetic Analyzer (Applied Biosystems by Life Technologies, Foster City, CA, USA) and the alleles were sized by GENEMAPPER 4.0 (Applied Biosystems by Life Technologies).

Many articles analyzed large datasets of wild and cultivated olive nuSSR profiles, but unfortunately, only a few of them provide their genetic profiles. We compared our nuSSRs profiles with the largest

published dataset available from the WOGB of Cordoba [40], using a common subset of seven SSRs (Table S3). Normalization among datasets was achieved using the common variety Chemlal de Kabylie, present in our dataset with the synonym Chemlal. For all the analyses, WOGB profiles were grouped geographically as follow: Spain and Portugal (Iberian Peninsula—IB); France (FRA), Italy (ITA), Morocco and Tunisia (Maghreb—MAG); Croatia, Albania and Greece (Balcanic Peninsula—BAL), Turkey and Cyprus (Turkey—TUR); Iran, Israel, Lebanon, Syria and Egypt (East Mediterranean—East-M). Due to the reduced number of SSRs, WOGB dataset was reduced to 351 accessions to consider only unique genetic profiles, and some Algerian varieties were grouped in single genetic profiles for the Structure analyses because with seven SSRs they were not able to differentiate, namely: Aguenau, including Agrarez and Hamra, and Aimel, including Aberkane. cpSSRs profiles were also compared with the available published dataset [4,34].

#### *2.3. Data Analysis*

For each microsatellite we estimated the principal genetic parameters, i.e., number of alleles (Na), expected (He) and observed (Ho) heterozygosity and Polymorphic Information Content (PIC) by using PowerMarker [41], Haplotype analysis software version 1.05 [42] and FreeNA [43] software.

To identify the number of genetic groups in the Algerian germplasm, cluster analysis was carried out for nuSSR and cpSSR separately according to the UPGMA (Unweighted Pair-Group Method with Arithmetical Averages) algorithm and two phylogenetic trees were generated using the R package Adegenet [44]. The levels of support for the nodes were estimated by bootstrap analysis (1000 replicates). The number of genetic groups within the Algerian collection alone and among Algerian and other Mediterranean and Near East cultivars, was inferred by means of Principal Coordinates Analysis (PCoA) in GenAlEx v.6.51b2 [45] and of Bayesian analysis in STRUCTURE [46]. The most likely number of genetic groups (*K*) in STRUCTURE was calculated following Evanno et al. [47]. Twenty independent runs (100,000 burn-in, 1,000,000 Marchov Chain Monte Carlo) for each *K* were carried out using the admixture model with correlated marker frequency and default parameters. The runs were averaged using CLUMPP (CLUster Matching and Permutation Program) [48] and the histograms were shown using DISTRUCT program [49]. Individuals with ancestry value < 0.65 were considered mosaics (Tables S4 and S5), while those with higher values were assigned to the corresponding cluster. Using nuSSR profiles, main genetic parameters, including pairwise Gst values [50], among Mediterranean populations were calculated in GenAlEx v.6.51b2 [45]. For Gst values, the significance of the differentiation between pairs of selected populations was tested by permutation procedures (9999 replicates).

For the comparison with the WOGB dataset [40], three STRUCTURE analyses were performed. The first analysis tested whether the number of Mediterranean olive genetic clusters (West, Central and Eastern Mediterranean) changed by the reduction of the SSRs marker panel. The second analysis identified the ancestry of the Algerian germplasm. Finally, a hierarchical analysis [51] following the procedure described above, was carried out only with samples belonging to Central and Eastern Mediterranean pool, showing an ancestry value higher than 0.80 (Table S5).

#### **3. Results**

#### *3.1. Genetic Diversity Assessed by cp SSRs and nuSSRs*

The six cpSSR markers showed a total of 12 alleles, with an average of two alleles per locus and major allele frequency values ranging from 0.647 to 1.000 (Table S6). Five out of six cpSSR used were polymorphic, while trnT-L-polyT locus was monomorphic in the analyzed collection (Table S6). The phylogenetic tree obtained through cpSSRs (Figure 2) highlighted three chlorotype groups, corresponding to the wild and cultivated lineages identified in the Mediterranean (Table 2) [4].


**Table 2.** Comparison of chlorotype lineages of Algerian varieties identified in this study and in published literature.

**Figure 2.** UPGMA (Unweighted Pair-Group Method with Arithmetical Averages) tree of Algerian germplasm based on chloroplast simple sequence repeats (cpSSRs). Original haplotypes (CE1, CE2, COM1-COM2, CCK-CCK2) obtained with cpSSR from Besnard et al. [34] are reported together with the corresponding haplotype lineage: E1, E2 and E3 following Besnard et al. [4].

NuSSRs markers amplified a total of 140 alleles, ranging from five to 11 for EMO90 and DCA07, respectively, with an average of 7.2 alleles per marker (Table S6), which is in agreement with previous studies [14,52]. The average PIC value for nuSSRs (0.659) indicates that the analyzed markers are highly informative and useful for variety screening. Among nuSSR loci, 11 (69%) showed high polymorphism with PIC values exceeding 0.6 (Table S6). In agreement with the PIC value, the average He value was 0.716, while the Ho value was greater than 0.500 for 10 loci (Table S6), underlining a remarkable rate of heterozygosity among the studied cultivars.

The UPGMA phylogenetic tree based on nuSSRs underlined a main genetic group (A) of 21 varieties, 13 of which belonged to Little Kabylie (LK), split in four subclusters. A second group (B) of 13 cultivars separated in two subclusters (Figure 3a). Within group A, subcluster A1 included only cultivars native to LK; A2 consisted of two varieties from LK plus Sigoise and Neb Djemel, considered native to the Mascara Plain (West from LK) and Cherchar (South-East from LK), respectively; A3 showed three varieties from LK, Hamra from the nearby coastal area of Jijel, and Mekki from the Aurès mountain region (close to the Sahara desert); A4 included Chemlal, an important variety that covers 30% of the Algerian olive orchard, and three cultivars with local distribution, i.e., Bouricha, Grosse du Hamma and Rougette de Mitidja. In the B group, the two subclusters accounted six varieties each, with three cultivars considered native to Kabylie, i.e., Akerma, Bouchouk Soummam and Tabelout. The last remaining cultivar, considered native to Kabylie, Aghchren d'el Osseur, clustered alone as an outgroup.

STRUCTURE analysis clearly assigned 30 varieties (ancestry value > 0.65) to one of the six identified genetic groups (Figure 3b, Table S4), while the remaining four cultivars (Aeleh, Blanquette de Guelma, Bouricha, Neb Djemel) showed a mosaic genetic pattern. The six genetic groups were in agreement with phylogenetic analysis, with group K1, K2, K4 and K5 included in cluster A, counting the 13 varieties from LK, while K3 and K6 belonged to group B. In particular, K4 and K5 exactly corresponded to group A1 and A3, respectively; K1 included the variety belonging to A2, plus Grosse du Hamma; in K2, two varieties of A4 (Chemlal and Rougette de Mitidja) plus the cultivar Aghchren d'el Osseur were included. Finally, K3 and K6 account respectively for three and seven varieties, but they did not correspond to the subclusters B.

**Figure 3.** (**a**) UPGMA tree of 34 Algerian varieties based on nuclear simple sequence repeats (nuSSRs). Capital letters indicate the two main clusters (A and B) and relative subclusters; colored dots highlight the haplotype lineage of each variety, and colored rectangles indicate the genetic cluster identified by STRUCTURE analysis. Underlined varieties are native to Little Kabylie (LK). (**b**) STRUCTURE analysis of Algerian germplasm showing on the left side accessions from Little Kabylie and on the right cultivars from other Algerian regions; each color represents the identified genetic cluster and the length of the colored segment shows the estimated membership proportion of each sample to designed group.

We further analyzed the nuclear genetic profiles by PCoA (Figure S1). The resulting pattern reflected the genetic structure identified by UPGMA and STRUCTURE: the first axis separated most of the LK varieties with some other cultivars from group A. The second axis separated a group of five varieties, corresponding to A3.

#### *3.2. Relationship among Algerian and Mediterranean and Near East Germplasm*

In order to frame the genetic relationships of the Algerian cultivars within the three main Mediterranean lineages, we compared our profiles with a large dataset of cultivated accessions across the Mediterranean Basin and Near East [40] using hierarchical and Bayesian clustering by mean of a common set of seven nuSSRs. The genetic parameters for each population are shown in Table 3. We observed high values of genetic diversity for each population (ranging from 0.635—FRA—to 0.746—EAST-M, mean 0.698), and a mean of 0.772 for observed heterozygosity, with the Algerian cultivars that showed the lowest value (0.696). The inbreeding coefficient was negative for all populations, but can be considered in equilibrium.


**Table 3.** Genetic parameters of Algerian and Mediterranean germplasm obtained by nuSSR profiles.

Mean value over loci and standard errors for each population: N: Number of samples; Na: Number of different alleles; Ne: Number of effective alleles; I: Shannon's information index; He: Expected heterozygosity; Ho: Observed heterozygosity; F: Inbreeding coefficient. ALG: Algeria; IB: Iberian Peninsula—Spain and Portugal; FRA: France; ITA: Italy; MAG: Maghreb—Morocco and Tunisia; BAL: Balcanic Peninsula—Croatia, Albania and Greece; TUR: Turkey—Turkey and Cyprus; EAST-M: East Mediterranean—Iran, Israel, Lebanon, Syria and Egypt.

The pairwise Nei's genetic distances and relative Gst values (Table 4) indicated that the Algerian and Iberian germplasm were significantly (*p* < 0.01) the most unrelated, followed by Algeria vs. France and Algeria vs. Turkey. On the contrary, the most related cultivars were those from Turkey vs. East-Mediterranean, Italy vs. Balkan, Balkan vs. Turkey and Iberian vs. Maghreb. The PCoA analysis was able to separate a wide number of the Western varieties from the Central-Eastern group. The Algerian samples showed a bimodal distribution with a group of varieties located in the center of the graph with the Turkish and Near Eastern cultivars, and a second group clustering mainly with the central Mediterranean varieties from Italy and Balkan Peninsula (Figure 4a). The UPGMA phylogenetic tree differentiated Algerian germplasm in two subclusters (Figure S2), one with Western affinity consisting of Iberian Peninsula and Maghreb accessions and the other intermingled with mainly East Mediterranean cultivars and with a minor contribution of Iberian and Balkan varieties.


**Table 4.** Estimates of pairwise Gst values (below the diagonal) and Unbiased Nei's genetic distance (above the diagonal) among overall populations.

In bold significant values with *p* ≤ 0.01 calculated over 999 permutations.

**Figure 4.** Genetic relationship among Algerian and Mediterranean cultivars. (**a)** Principal Coordinates Analysis (PCoA) and (**b)** STRUCTURE analysis showing unique profiles of 31 Algerian and 351 Mediterranean accessions from Trujillo et al. [40]. (**c**) Hierarchical STRUCTURE analysis of cluster B accessions with ancestry value > 0.80 following Emanuelli et al. [51]. Each color represents the identified genetic cluster (cluster A = orange; cluster B = blue) and the length of the colored segment shows the estimated membership proportion of each sample to the designed group.

STRUCTURE analysis of the whole dataset revealed that the most likely number of clusters of Mediterranean varieties (*k* = 2) was in agreement with previous results [40]. In particular, as reported in other studies [4,9,53], two main pools were identified (Figure 4b), the first (A—orange) accounting for the most part of Western Mediterranean varieties, and the second (B—light blue) including mainly Central and Eastern Mediterranean cultivars. Algerian varieties clustered mostly with cluster B (59%), 9% was grouped in cluster A and the remaining cultivars (32%) were mosaics between the two groups (Table S5). Interestingly, all the cultivars belonging to cluster B were from LK, i.e., Aghenfas, Tefah and Bouchouk Guergour, the latter already differentiated by UPGMA analysis.

Finally, to have a higher resolution of group B, the more heterogeneous pool, which includes samples from all population studied, we ran a second round of STRUCTURE analysis using only the samples closely associated to it (ancestry >0.8). In total, 18 Algerian cultivars and 126 accessions from the entire Mediterranean region were investigated. The analysis allowed to identify four subclusters, including 84 varieties with strong association (ancestry value >0.65), while the remainders 60 cultivars were mosaics (Figure 4c, Table S5). Subcluster B1 included an assorted group consisting of samples from the entire Mediterranean Basin dominated by Central-Western Mediterranean cultivars including 100% of French accessions, 77% Italian, 57% Balkan and 44% Iberian, with a minor contribution of Turkish (31%), Maghreb (20%), East-Mediterranean (15%) and Algerian (6%). Subcluster B2 was well represented by North African cultivars accounting for 50% of the Algerian germplasm and 60% of Maghreb accessions, with a minor contribution of Turkey (15%), East-Mediterranean (15%), Balkan (13%) and Italy (8%). Subcluster B3 included mainly Eastern and Central Mediterranean varieties, accounting for the 41% of East-Mediterranean accessions, 33% Algerian, 23% Turkish, 15% Italian, 20% Maghreb and 4% Balkan. Subcluster B4 included almost exclusively Western and Eastern cultivars, accounting for 56% of Iberian varieties, 31% Turkish, 28% East Mediterranean and 26% Balkan, with a smaller contribution of central Mediterranean accessions (11% of Algerian accessions).

#### **4. Discussion**

For thousands of years, olive cultivation has been central in the culture and economy of many Mediterranean and Middle Eastern regions. The ancient civilizations that thrived in this wide geographical area selected and diffused countless varieties across the different countries facing the Mediterranean Sea. Due to these complex historical events, an endless debate arose among scholars about olive domestication, and in particular whether there has been a single or multiple independent domestication events [54,55].

In the last decades, the development of molecular markers such as nuclear and chloroplast SSRs have made it possible to investigate the genetic fingerprint of cultivated and wild olives and disentangle the clues left by migrations and crossing among varieties across the entire olive distribution area. Despite many recent studies provided nuclear and chloroplast SSRs genetic profiles for hundreds of wild and cultivated olive accessions, the germplasm from Central and Southern Mediterranean regions, especially from the Maghreb area, is highly underrepresented. Here, for the first time, we characterized the official Algerian collection of olive varieties from ITAFV by mean of both nuSSRs and cpSSRs. Our results filled the gaps left by previous studies [14,16,17] as we provided the nuSSR genetic profiles for 34 out of 36 official Algerian varieties, 10 of which have never been described (Aimel, Bouchouk Guergour, Bouchouk Lafayette, Boukaila, Grosse du Hamma, Hamra, Longue de Miliana, Mekki, Neb Djemel, Ronde de Miliana) and, for the first time, cpSSR profiles for seven varieties (Aeleh, Aghenfas, Bouchouk Guergour, Boughenfous, Bouichret, Tabelout, Tefah). Overall, high genetic diversity was observed, in agreement with the range obtained in previous studies [5,40,56,57], indicating that the Algerian olive germplasm collection represents an important genetic reservoir for the species. Compared with previous studies on Algerian germplasm [14,17], we found a lower number of alleles but a remarkably higher observed and expected heterozygosity. These discrepancies are probably due to the different panel of varieties analyzed and the presence of wild germplasm in previous studies, which contained private alleles [14,17]. The 16 nuSSRs used here allowed us to

discriminate all the Algerian cultivars and were powerful enough to resolve putative cases of synonymy. For example, we found that Agrarez and Azeradj had distinct profiles at eight loci and should not be considered as synonyms as previously suggested [14]. Most of the Algerian varieties (70%) belonged to the Mediterranean/Saharan Africa chlorotype olive lineage E1, widely represented in the cultivated and wild forms in the whole Mediterranean Basin. This cluster included two subclusters, CE1-CL1 and CE2 [34]. Subluster CE1-CL1 consisted of 13 varieties from LK region, Hamra from the nearby coastal area of Jijel, Longue de Miliana and Sigoise from the Central-West regions, Mekki, Ferkani and Souidi from Aurès. Subcluster CE2 accounted for one variety from LK (Boughenfous), and three varieties from the territory of Costantine (Grosse du Hamma), Guelma, (Blanquette de Guelma) and Aurès (Aeleh), respectively. Seven varieties, two from LK (Aghchren d'el Ousseur and Tabelout) and five from other regions (Abani, Limli, Neb Djemel, Ronde de Miliana, Rougette de Mitidja) grouped in the Central-Western Mediterranean lineage E2. Interestingly, this lineage is represented by wild olive mainly from Italy and Greece with a minor contribution from Spain and Morocco, and by few cultivars (*n* = 13) from different central Mediterranean regions (Corsica, France, Greece, Italy, Morocco, Sardinia, Spain and Tunisia) [58]. Chemlal, Akerma and Boukaila clustered in the other less common Western Mediterranean lineage E3, mainly found in wild olive from Spain and Morocco, and in cultivars from Maghreb except for three varieties from Corsica, France and Spain (i.e., Antonina, Olivière and Farga), respectively [58]. Our results mostly confirmed the chlorotypes identified in previous studies (*n* = 18), but highlighted some divergence (Table 2); in particular, we found that nine varieties chlorotypes were assigned differently compared to Besnard et al. [4], six when comparing Besnard et al. [4] and Haouane et al. [18], and two when comparing the three datasets. These results could be due to *i)* possible mislabeling errors in the WOGB collection of Marrakech; *ii)* errors in the published datasets, at least for the same six varieties coming from the above mentioned collection that showed different chlorotypes between the dataset of Besnard et al. [4] and Haouane et al. [18]; *iii)* different clones of the same varieties. The different clustering methods adopted in our study highlighted a clear genetic group mainly consisting of 13 LK cultivars, except for a few varieties (four) from other regions (emigrants). Conversely, a few cultivars (four) from LK clustered in other genetic groups (in-migrants). We can formulate some speculative hypotheses to explain these few exceptions to the general geographic and genetic division between LK region and other parts of Algeria. Emigrant varieties might share a wide genetic background with the LK group because they were selected in this region, but later, for some ecological/historical/agronomic reasons, their cultivation disappeared from the LK area and the knowledge of the original native region was lost. This could be the case of Mekki, Neb Djemel and Hamra, today cultivated only in the driest mountain area of the Aurès or Kenchela or in the coastal area, respectively, or in contrast diffuse throughout Algeria (Sigoise). It has been documented that the historical distribution of Mekki was connected to Phoenician, Greek and Roman dominations [37]. This variety share the chloroplast haplotype dominant in LK varieties, suggesting a common origin, thus, it is plausible that it was once distributed in this area and that it later disappeared in LK remaining confined in a restricted area near the old Roman town of Timgad. For in-migrant cultivars, we can speculate that the knowledge of the original native region was lost: these varieties could have been selected and genetically improved in LK by crossing with germplasm imported from other regions of Algeria/North Africa, thus explaining the genetic divergence from other LK varieties. In particular, nuSSR results were supported by cpSSR analysis indicating that *in-migrant* cultivars had different haplotypes as compared to other LK accessions. In a wider perspective, the combined use of nuSSR and cpSSR, that are differently affected by evolutionary processes (i.e., selection, mutation, recombination etc.), allowed us to investigate the genetic relationship of Algerian varieties with cultivars from other Mediterranean countries and shed light on the geographic origin of Algerian germplasm and relative patterns of crossing/migration across the Mediterranean Region. Our results are compatible with two different hypotheses: (i) local domestication from oleaster and Laperrine's olive; (ii) local selection by crossing of cultivars imported from other Mediterranean region with local germplasm. The gene flow between Algeria and the rest of the Mediterranean area has been probably

limited, suggesting that development of new cultivars possibly proceeded through crossing of few imported varieties with local germplasm, namely oleaster and Laperrine's olive, as testified by cpSSRs pattern of genetic diversity. We found that the highest proportion of Algerian varieties shared the same haplotype with the majority of Mediterranean cultivars (E1) and the Laperrine's olive. Interestingly, 30% of Algerian varieties belonged to the other two lineages E2 and E3, unravelling the contribution of the Western Mediterranean olive lineage to the origin of North African cultivars. This result provides new evidence on the role of Algeria as possible and important secondary domestication/selection center, considering that only 4.9% and 4.4% of the Mediterranean cultivars belong to E2 and E3 lineages, respectively [4]. In particular, we can speculate that few varieties with chlorotype E2 probably represent locally domesticated cultivars or cultivars imported from a central Mediterranean region such as Italy, e.g., during Roman domination, as this haplotype is only found in few cultivated varieties from Central Mediterranean region and Maghreb but it is common in wild olive from Italy and Greece. In addition, the three cultivars belonging to lineage E3 represent the most likely candidates for secondary domestication events in this area, given that E3 haplotype is rare and found in wild oleaster from Spain and Morocco and cultivated varieties from Morocco (*n* = 10), France (*n* = 2) and in one variety from Italy and Spain each.

Finally, regardless of the true origin of the Algerian germplasm, LK varieties can be considered an exclusive genetic core, selected and developed during the different historical periods by the civilizations that thrived in this area from Phoenicians to Arabs until the present. Genetic differentiation parameters and results of the hierarchical STRUCTURE provided evidence that Algerian varieties are more genetically related to Central-Eastern Mediterranean cultivars than to the West. In particular, the second round of STRUCTURE highlighted four main subclusters among the group of Eastern varieties. The Algerian germplasm grouped mostly in two subclusters, both with Central and Eastern affinity. Subcluster B1 was particularly interesting because it was dominant in Algeria and Maghreb, suggesting gene flow between Near East and North Africa. We speculate that the two STRUCTURE clusters could correspond to bottleneck events due to the arrival of different civilization, i.e., Phoenician and Romans. According to the chlorotype lineages identified, 67% of Algerian varieties showed affinity to the Eastern Mediterranean germplasm and 33% to the West. Nuclear DNA confirmed this pattern, with 80% of varieties showing affinity to the Eastern cluster and 20% to the Western cluster. These results might depend on recurrent reticulation events during the diffusion of olive culture [34] and reflect the predominant role of Eastern germplasm in the development of olive cultivars around the whole Mediterranean Basin [54,59]. However, our study revealed an important contribution of Central-Western Mediterranean germplasm in the development of olive varieties, supporting the hypothesis of the existence of an independent domestication center in the Central Mediterranean area [9,55]. This hypothesis is supported by the fact that the wild Laperrine's olive tree share its haplotype (E1) with most of the world's olive varieties, including 67% of the Algerian varieties, whereas the remaining 33% share their haplotypes (E2 and E3) with wild olives from Spain, Morocco, Italy and Greece. In particular, given that North African germplasm is highly underrepresented in current literature, the role of this area in the history of olive domestication and cultivar development should be reconsidered to evaluate its real contribution.

Our present data do not allow discriminating between the two different hypotheses, namely the occurrence of a secondary domestication event or introgression of imported cultivars with local germplasm, including natural populations (oleaster and Laperrine's olive). We can hypothesize that before foreign civilizations arrived in Algeria, wild olive tree populations consisted of two taxa, oleaster (*Olea europaea* subsp. *sylvestris*) and Laperrine's olive (*O. europaea* subsp. *laperrinei*), that were already exploited by local human population, laying the foundations for the development of olive cultivation. Subsequently, phoenicians introduced Eastern Mediterranean olive trees to North Africa, triggering gene flow with the local germplasm (oleaster and Laperrine's olive) and with cultivated olive coming from other Mediterranean areas. Local people and settlers from abroad (e.g., Romans, Arabs) eventually selected the cultivars better suited for the cultivation in the different environmental conditions of each Algerian region, thus originating the distribution pattern that we observe today.

#### **5. Conclusions**

Genetic studies of local olive varieties from different Mediterranean areas, in particular from North Africa, central Mediterranean area and Near East, can help to clarify the pathways of domestication and diffusion of this species along the history of civilizations. Due to its central position in the Mediterranean Basin, Algeria has played an important role for civilizations crossing the Mediterranean Basin, especially for Phoenicians, Romans and Arabs. To the best of our knowledge, this is the first study analyzing and characterizing the official Algerian olive germplasm collection by means of nuclear and chloroplast SSRs, comparing the results with available published datasets. An exclusive genetic group of 13 varieties from little Kabylie has been identified among the main Algerian olive cultivars and it can be considered a valuable genetic resource for future cultivation and breeding programs. Nuclear and chloroplast genetic profiles provided here will be useful for future program of plant material certification in Algeria. Bayesian and hierarchical cluster analyses allowed to develop inferences on the different patterns of genetic diversity observed. A detailed evolutionary view of Algerian germplasm has been defined, highlighting its genetic relationship with reference cultivars from the whole Mediterranean Basin and Near East. Our findings are compatible with the hypothesis of the existence of an independent olive domestication area in the center of the Mediterranean Basin, but further analysis with more extended datasets are needed to verify this hypothesis. Unfortunately, in contrast to other species, such as grapevine (Vitis International Variety Catalogue, European Vitis Database), there is no international database of olive varieties to use as a reference, and the genetic profiles are not always available. The creation of a public database for olive germplasm would greatly foster the elucidation of the history of domestication for this important crop species.

**Supplementary Materials:** The following are available online http://www.mdpi.com/2073-4425/11/3/303/s1, Figure S1: PCoA analysis of Algerian germplasm based on nuSSRs, Figure S2: UPGMA tree of the whole dataset covering the Mediterranean genetic background, Table S1: Name, synonym, Arabic name, meaning, geographic diffusion and origin, use of fruits and oil percentage of the analyzed Algerian varieties, Table S2: Description sheets of the analyzed Algerian varieties, Table S3: nuSSR and cpSSR markers used in the present study, genetic profiles for Algerian varieties, and the public Mediterranean dataset at seven SSRs (Trujillo et al. [40]), Table S4: Posterior membership coefficients following a STRUCTURE analysis and K = 6 for the studied germplasm, Table S5: Posterior membership coefficients following a STRUCTURE analysis and K = 2 for the studied germplasm, Table S6: Genetic parameters at 16 nuSSR and 6 cpSSR loci used to genotype the ITAFV Algerian germplasm collection.

**Author Contributions:** Conceptualization, B.H., A.S.G., F.M. and F.C.; Formal analysis, B.H., A.S.G., F.M., A.M. and A.S.; Investigation, B.H., A.S.G., F.M., A.M. and A.S.; Methodology, B.H., A.S.G., F.M. and F.C.; Project administration, B.H., A.S.G., F.M. and F.C.; Resources, B.H., N.A., A.E.S. and F.C.; Supervision, F.C.; Visualization, A.S.G. and F.M.; Writing—original draft, B.H., A.S.G., F.M. and F.C.; Writing—review and editing, B.H., A.S.G., F.M. and F.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially supported by the Italian Ministry of Foreign Affairs and International Cooperation (MAECI), project n B56J13000320006 'Scienze per la DIPLOMAzia'.

**Acknowledgments:** We are grateful to Roberto De Michele and Angela Carra for reviewing the first draft of the manuscript and providing useful suggestion and to Francesca La Bella for helping in the laboratory work.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


Late Quaternary diversification of Mediterranean lineages to primary domestication in the northern Levant. *Proc. R. Soc. B Biol. Sci.* **2013**, *280*, 20122833. [CrossRef]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Diversity Analysis of Sweet Potato Genetic Resources Using Morphological and Qualitative Traits and Molecular Markers**

#### **Fabio Palumbo, Aline Carolina Galvao, Carlo Nicoletto \*, Paolo Sambo and Gianni Barcaccia**

Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE) University of Padova, Agripolis Campus, Viale dell'Università, 16-35020 Legnaro, Italy; fabio.palumbo@unipd.it (F.P.); aline.galvao@unipd.it (A.C.G.); paolo.sambo@unipd.it (P.S.); gianni.barcaccia@unipd.it (G.B.) **\*** Correspondence: carlo.nicoletto@unipd.it; Tel.: +39-049-827-2826

Received: 24 September 2019; Accepted: 22 October 2019; Published: 24 October 2019

**Abstract:** The European Union (EU) market for sweet potatoes has increased by 100% over the last five years, and sweet potato cultivation in southern European countries is a new opportunity for the EU to exploit and introduce new genotypes. In view of this demand, the origins of the principal Italian sweet potato clones, compared with a core collection of genotypes from Central and Southern America, were investigated for the first time. This was accomplished by combining a genetic analysis, exploiting 14 hypervariable microsatellite markers, with morphological and chemical measurements based on 16 parameters. From the molecular analyses, Italian accessions were determined to be genetically very similar to the South American germplasm, but they were sub-clustered into two groups. This finding was subsequently confirmed by the morphological and chemical measurements. Moreover, the analysis of the genetic structure of the population suggested that one of the two groups of Italian genotypes may have descended from one of the South American accessions, as predicted on the basis of the shared morphological characteristics and molecular fingerprints. Overall, the combination of two different characterization methods, genetic markers and agronomic traits, was effective in differentiating or clustering the sweet potato genotypes, in agreement with their geographical origin or phenotypic descriptors. This information could be exploited by both breeders and farmers to detect and protect commercial varieties, and hence for traceability purposes.

**Keywords:** *Ipomoea batatas*; genetic diversity; SSR markers; qualitative traits

#### **1. Introduction**

Sweet potato (*Ipomoea batatas* Lam.) is a root crop of the Convolvulaceae family, originating in Central and South America, which spread through the world with great ease due to its prominent productive efficiency. This crop plays a vital role in food production because it is one of the most important root and tuber crops in the world. Its ability to produce energy is very efficient and it can provide a significant quantity of protein and sugars per hectare in a short time [1].

In the European context, this crop had an enormous rise in consumption and, according to the Center for the Promotion of Imports from Developing Countries (CBI) [2], its importation has doubled in recent years. The European Union (EU) market for sweet potatoes has increased by 100% over the last five years (CBI, 2015), and sweet potato cultivation in southern European countries presents a new opportunity for the EU to exploit and introduce new genotypes. In Italy, it is considered a niche and ethnic product, and the recent immigration flow has created a market with increasing domestic demand [3] and many future opportunities for growth and profitability.

Despite the historical and commercial importance of sweet potatoes, to date, no study has investigated the origin, the conservation, or the genetic background of this species in Italy. On a larger scale, several works have been published [4–8] on the genetic characterization of sweet potato accessions, mainly to investigate the dispersal of New World sweet potato landraces from the center of origin (Tropical America, [9]). One of the main obstacles to the understanding of the dispersal dynamics of sweet potato throughout the world is probably the genetics of this hexaploid species (2*n* = 6x = 90) [10], which severely complicates any genomic approach. In particular, sweet potato is an allohexaploid species (AABBBB), most likely derived from the interspecific hybridization between a diploid and tetraploid species followed by chromosomal doubling [11,12]. As a consequence, its inheritance model is admixed, including both disomic (AA) and tetrasomic (BBBB) pairings. On the other hand, it must be recognized that this polyploidy could represent an important source of genetic diversity [13]. According to Silva Ritschel and Huamán [14], the vast genetic diversity that characterizes the sweet potato germplasm is also due to sexual reproduction (i.e., genetic segregation and recombination) and asexual propagation (i.e., fixation of specific genetic combinations), as well as to the exchange and introduction of plants from all over the world. This diversity provides a valuable source for potentially useful traits and allows plant breeders and farmers to adapt the crop to heterogeneous and changing environments [15]. The evolving climate conditions and the staggering expansion of the world population together represent pressing challenges for agriculture.

As already seen in other crops, morphological, agronomic, and molecular marker approaches are often used in combination to complement the information provided singularly in order to investigate the heterogeneity described in a species [16]. Molecular markers such as microsatellites or SSRs play a central role in the assessment and conservation of genetic diversity due to their efficiency, reliability, and reproducibility. Several studies based on the application of SSRs have recently attempted to monitor and prevent genetic erosion of local crop varieties in Italian scenarios [15,17–19]. Estimating the allelic dosage at each locus represents a critical question in polyploid species, even when using co-dominant markers such as microsatellites [20]. For this reason, as has already been done in previous studies [21–23], SSRs were scored as dominant markers and organized in binary matrixes, similarly to the process with AFLP or RAPD markers. According to Silva Ritschel and Huamán [14], morphological and chemical characterization is an indirect measure of population genetic diversity. Morphological markers for sweet potatoes are accessible and easy to use, making the technique one of the most used for this kind of analysis [24–26].

In this study, the geographical and genetic origins of the principal Italian sweet potato accessions were compared with a core collection of accessions from Central and South America for the first time. As has already been achieved in previous works on sweet potato [27,28], this was achieved by combining the high polymorphism and reproducibility of SSR markers with the high information value of strategic morphological and qualitative traits.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

The cultivation was carried out at the experimental farm "L. Toniolo" of Padova University (45◦21 N; 11◦58 E; 8 m a.s.l.) in the 2016 spring/fall growing cycle. The propagation material used in the experiment was obtained from the germplasm bank of the Padova University (Table 1). The pedigree information of the plant materials derived from Central and Sothern America is unknown; as to the origin of Italian genotypes, we only know that they were introduced into Tuscany in 1630, cultivated until the end of the 1800s exclusively in botanical gardens, and only spread to the Northern Italy cultivation areas from 1880. In January 2016, sweet potato cuttings were produced in a glass greenhouse set with a temperature of 25 ◦C and 18 ◦C during the day and night, respectively. Thirty storage roots for each genotype, 40 mm to 80 mm in diameter, were placed in PVC pots (three roots per pot) filled with a peaty substrate (Klasmann Potgrond H) integrated with 20% perlite. In May, the cuttings were

suitable for transplanting (0.30–0.35 m tall). Before transplanting, the soil was plowed and fertilized with 80, 70, and 210 kg ha−<sup>1</sup> of N, P2O5, and K2O, respectively [29]. Cuttings were planted 0.10 m deep on the built-up rows, spacing the plants 0.35 m apart in the row and 0.80 m between rows. After transplanting, approximately 100 mL of water was provided for each cutting. The crop was irrigated three times during the growing cycle, at a rate of 30 mm m−<sup>2</sup> for each irrigation. Sweet potatoes were harvested at the end of September 2016.


**Table 1.** List of sweet potato genotypes used.

n.a.: not available.

#### *2.2. Molecular Analysis*

#### 2.2.1. Genomic DNA Isolation

Leaves were collected from 1 month old transplants, snap-frozen in liquid nitrogen upon harvesting, and stored at −20 ◦C until further processing. Approximately 100 mg of leaf tissue was employed for the isolation of genomic DNA using the DNeasy plant kit (Qiagen, Valencia, CA, USA), according to the manufacturer's instructions. Extracted DNA samples were run on 0.8% agarose/1× TAE gel containing 1× SYBR Safe DNA stain (Life Technologies, Carlsbad, CA, USA) to evaluate their integrity. Both the purity and quantity were assessed with a NanoDrop 2000c UV-Vis spectrophotometer (Thermo Scientific, Pitsburgh, PA, USA).

#### 2.2.2. SSR Genotyping

For the SSR analysis, microsatellite markers belonging to 14 distinct genomic loci from both coding regions (EST-SSR) and non-coding regions (nSSR) were obtained from different sources [4,5,10,30] (Table 2). These markers were chosen due to the high polymorphism they showed in the reference studies. In order to evaluate the efficiency and the polymorphism degree of this SSR set, a preliminary test was performed using three different clones randomly chosen from the sweet potato collection, and each was analyzed in two biological replicates (i.e., two distinct plants for each clone). Moreover, each biological replicate was, in turn, analyzed in two technical replicates, to evaluate the reproducibility of the SSR.



\* Tm of the forward primers does not take into account the tail sequence.

The PCRs were carried out via the three-primer strategy reported by Schuelke [31], with a major modification first described by Palumbo et al. [32]; Instead of using only M13, three additional universal sequences (designated as PAN1, PAN2, and PAN3) were used to tag the 5' end of the forward primer of each couple (colored sequences in Table 2) and adopted in combination with M13, PAN1, PAN2, and PAN3 fluorophore-labeled oligonucleotides. Fluorophores adopted were 6-FAM, VIC, NED, and PET, respectively. Due to the genetic complexity of the species and thus the possibility of obtaining up to six alleles per SSR locus, PCRs were performed in single reactions. Each reaction contained approximately 40 ng of genomic DNA template, 1<sup>×</sup> Platinum® Multiplex PCR Master Mix (Applied Biosystems, Carlsbad, CA, USA), GC enhancer 10% (Applied Biosystems), 0.05 μM tailed forward primer (Invitrogen Corporation, Carlsbad, CA, USA), 0.1 μM reverse primer (Invitrogen Corporation), 0.23 μM universal primer (Invitrogen Corporation), and sterile water to volume. Amplifications were performed in a 96 well plate using a 9600 thermal cycler (Applied Biosystems), adopting the following conditions: after initial denaturation for 2 min at 95 ◦C, a touch-down PCR was undertaken with six cycles consisting of 30 s denaturation at 95 ◦C, 1 min annealing at 60 ◦C decreasing by 1.0 ◦C with each cycle and 30 s elongation at 72 ◦C; then 35 cycles at 95 ◦C for 30 s, 55 ◦C for 60 s, and 72 ◦C for 30 s. A final extension at 60 ◦C for 30 min terminated the reaction and filled in any protruding ends of the newly synthesized strands. The amplicons were visualized and quantified by agarose gel electrophoresis (2% agarose/1× TAE gel containing 1× Sybr Safe DNA stain (Life Technologies)), and the gel pictures were acquired with an UVITEC UV Transilluminator (Cambridge, UK) equipped with a digital camera. Subsequently, 10 ng of each PCR product was pooled and organized according to the four multiplexes reported in Table 2 and subjected to capillary electrophoresis on an ABI PRISM 3130xl Genetic Analyzer (Thermo Fisher) using LIZ500 (Applied Biosystems) as molecular weight standard and G5 (Applied Biosystems) as filter. Peak Scanner software v. 2.0 (Applied Biosystems) was used to determine the size of each peak, and each SSR was handled as a dominant marker. From the total of 14 SSR primer pairs initially selected for sweet potato genome analysis and tested for polymorphisms, Ib318 [4], J263 [5], and GDAAS0156 [10] showed weak resolution or screening errors and were excluded from our study.

#### 2.2.3. Marker Data Analysis

Data were coded as (0,1) vectors, where 1 indicated the presence and 0 the absence of a peak/allele at a specific position in the electropherogram. The polymorphic information content (PIC) of each SSR locus over its *n* marker alleles was computed as [33],

$$\text{PIC} = 1 - \sum \mathbf{p}\_i^2 \tag{1}$$

where p*<sup>i</sup>* is the frequency of the marker allele *i*.

Genetic similarity between the clones was estimated by applying Dice's coefficient [34] in all possible pairwise comparisons, and a triangular similarity data matrix was generated. The first two principal components of the matrix were thus computed through a principal coordinate analysis (PCoA). All calculations were conducted using NTSYS-pc v. 2.21q software [35]. Taking advantage of the genetic similarity data, PAST software v. 3.14 [36] was used to construct a dendrogram through the unweighted pair group arithmetic average (UPGMA) method and by applying the Dice's coefficient [34]. To measure the stability of the computed branches, a statistical bootstrap analysis was conducted with 1000 resampling replicates. GenAlEX software v. 6.5 [37] estimated the number of observed alleles and the presence of private allele throughput in all the samples, purposely grouped as "Italian clones (*N* = 5)" and "foreign clones (*N* = 17)" according to their putative origin. Marker alleles were scored as "private" when shared by at least 60% of the individuals of one group and simultaneously absent from the other group.

A Bayesian clustering algorithm implemented in STRUCTURE v. 2.2 software [38] was used to model the genetic structure of the considered *I. batatas* accessions' haploid genotypes. The "admixture model" and "correlated allele frequencies model" were selected, because no prior knowledge about their origin was available (first model) and to guarantee the identification of a previously undetected correlation without affecting the results if no correlation existed [39] (second model). The number of founding groups ranged from 2 to 10, and 10 replicate simulations were performed for each K value, setting a burn-in of 2 <sup>×</sup> <sup>10</sup><sup>5</sup> and a final run of 10<sup>6</sup> Markov chain Monte Carlo (MCMC) steps [40]. Finally, the most likely estimation of K was decided by evaluating the rate of change in the log probability of data between successive K values (ΔK method), according to Evanno et al. [41]. In particular, one 2D Excel vertical histogram for each accession, conveniently divided into K colored segments, was used to represent the estimated membership in each hypothesized ancestral genotype. Each color correlated to a putative ancestor.

#### *2.3. Morphological and Chemical Analyses*

The morphological characterization was performed in August 2016. Root shape, root skin color, root flesh color, and the general outline of the leaf were scored according to the morphological descriptors available from International Board for Plant Genetic Resources (IBPGR) [42]. All the morphological traits considered for each genotype were transformed into numbers using the CIP scale [42] in order to process them with statistics. Three biological replicates were performed for the chemical analyses in order to recover representative data about the sweet potato samples.

#### 2.3.1. Extraction of Phenols for Analysis

Freeze-dried samples (1 g) were extracted in methanol (20 mL) with an Ultra Turrax T25 (IKA-Labortechnik, Staufen, Germany) at 1018 rpm until a uniform consistency was achieved. Samples were filtered (589 filter paper; Whatman, Germany) and appropriate aliquots of extracts were assayed by the Folin–Ciocalteu (FC) method [43] for total phenolic (TP) content, and by the ferric reducing antioxidant power (FRAP method) for antioxidant activity [44]. For HPLC analyses, extracts were further filtered with cellulose acetate syringe filters (0.45 μm porosity).

#### 2.3.2. Determination of TP Content by the FC Assay

The TP content was determined according to the FC assay, using gallic acid as a calibration standard and a UV-1800 spectrophotometer (Shimadzu, Columbia, MD, USA). The FC assay was carried out by putting 200 μL of sweet potato extract into a 10 mL test tube, followed by the addition of FC reagent (1 mL). The mixture was vortexed for 20–30 s and 800 μL of filtered 20% sodium carbonate solution was added 1–8 min after the FC reagent addition. The mixture was then vortexed for 20–30 s (time 0). The absorbance of the colored reaction product was measured at 765 nm after two hours at room temperature. The TP content in the extracts was calculated from a standard calibration curve obtained with different concentrations of gallic acid, ranging from 0 to 600 μg mL−<sup>1</sup> (coefficient of determination: *r*<sup>2</sup> = 0.9992). The results have been expressed as mg gallic acid equivalent (GAE) kg−<sup>1</sup> dry weight.

#### 2.3.3. Determination of Total Antioxidant Activity by FRAP

Freshly prepared FRAP reagent contained 1 mmol L−<sup>1</sup> 2,4,6-tripyridyl-2-triazine and 2 mmol L−<sup>1</sup> ferric chloride in 0.25 mol L−<sup>1</sup> sodium acetate (pH 3.6). A methanol extract aliquot (100 μL) was added to the FRAP reagent (1900 μL) and accurately mixed. Absorbance was determined at 593 nm after leaving the mixture at 20 ◦C for 4 min. The calibration was performed with a standard curve (0–1200 μg mL−<sup>1</sup> ferrous ion) (coefficient of determination: *r*<sup>2</sup> = 0.9985) obtained by the addition of freshly prepared ammonium ferrous sulfate. FRAP values were calculated as μg mL−<sup>1</sup> ferrous ion (ferric reducing power) from three determinations and have been reported as mg kg−<sup>1</sup> of Fe2<sup>+</sup> (ferrous ion equivalent) of dry matter.

#### 2.3.4. Quantitative Determination of Ions by IC and Organic Nitrogen

For the estimation of anions and cations, a freeze-dried sample (200 mg) was extracted in water (50 mL) and shaken at 150 rpm for 20 min. Samples were filtered in sequence through filter paper (589 Schleicher), and the extracts were further filtered through cellulose acetate syringe filters (0.20 mm) before analysis by ion chromatography (IC). The IC was performed using an ICS-900 ion chromatography system (Dionex Corporation) equipped with a dual piston pump, a model AS-DV autosampler, an isocratic column at room temperature, a DS5 conductivity detector, and an AMMS 300 suppressor (4 mm) for anions and CMMS 300 suppressor (4 mm) for cations. A Dionex Ion-Pac AS23 analytical column (4 × 250 mm) and a guard column (4 × 50 mm) were used for anion separations, whereas a Dionex IonPac CS12A analytical column (4 × 250 mm) and a guard column (4 × 50 mm) were used for cation separations. The eluent consisted of 4.5 mM sodium carbonate and 0.8 mM sodium bicarbonate at a flow rate of 1 mL min−<sup>1</sup> for anions, and 20 mM metansolfonic acid for cations at the same flow rate. Chromeleon 6.5 chromatography management software was used for system control and data processing. Anions and cations were quantified following a calibration method. Dionex solutions containing seven anions and five cations at different concentrations were taken as standards, and the calibration curves for anions and cations were generated with concentrations ranging from 0.4 mg L−<sup>1</sup> to 20 mg L−<sup>1</sup> and from 0.5 mg L−<sup>1</sup> to 50 mg L−<sup>1</sup> of standards, respectively. The Kjeldahl method (ISO 1656) was used for organic nitrogen.

#### 2.3.5. Brix Content

Approximately 0.5 mL of defrosting liquid of the product was used for the determination of the Brix content, carried out using a Hanna Instruments HI 96801 portable digital refractometer.

#### 2.3.6. Starch

Starch analysis was performed by chromatographic analysis according to AOAC Official Method 996.11 (University of Florida, IFAS, Bulletin 339-2000 "Starch Gelatinization & Hydrolysis Method" Boehringer Mannheim, Starch determination, cat. N◦ 207748).

#### 2.3.7. Quantitative Determination of Sugars by HPLC

Freeze-dried sweet potato root samples (0.2 g) were homogenized in demineralized water (20 mL) with an Ultra Turrax T25 until a uniform consistency was achieved at 1018 *g*. Samples were filtered in sequence through filter paper (589; Schleicher), and the extracts were further filtered through cellulose acetate syringe filters (0.45 mm) and analyzed by HPLC. The liquid chromatography apparatus utilized in this analysis was a Jasco X.LC system consisting of a model PU-2080 pump, a model RI-2031 refractive index detector, a model AS-2055 autosampler, and a model CO-2060 column. ChromNAV Chromatography Data System software was used for analysis of the results. The separation of sugars was achieved on a Hyper-Rez XP Carbohydrate Pb++ analytical column (7.7 <sup>×</sup> 300 mm; Thermo Scientific, Waltham, MA, USA), operating at 80 ◦C. Isocratic elution was effected using water at a flow rate of 0.6 mL min−1. D-(+)-glucose, D-(−)-fructose, and maltose were quantified by a calibration method. All standards utilized in the experiments were accurately weighed and dissolved in water; the calibration curves were generated with concentrations ranging from 100 to 1000 mg L−<sup>1</sup> of standards.

#### *2.4. Statistical Analysis*

Chemical and morphological data were finally used to construct a constrained UPGMA dendrogram using PAST software v. 3.14 [36], applying the Euclidean similarity index and keeping the position of the samples fixed throughout the tree according to the clustering resulting from the SSR-based dendrogram. To measure the stability of the branches, a statistical bootstrap analysis was conducted with 1000 resampling replicates.

The complete set of data for each variety was used for random combinations using the bootstrap method. For each variety, a set of 1000 combinations was produced, and the data were analyzed by the PCoA procedure using the software Statgraphics Centurion 18.1.06 (Statgraphics Technologies, Inc.). All qualitative trait data were processed by ANOVA and, in case of significant differences, average values were separated by Tukey HSD test (CoStat 6.400–CoHort Software, CA, USA).

#### **3. Results and Discussion**

Overall, 117 marker alleles were detected in 11 SSR loci analyzed throughout the accession pool, ranging from a minimum of 6 (J206A) to a maximum of 16 (GDAAS0757), with an average number equal to 10.5 per locus (Table 3 and supplementary Table S1). According to Botstein et al. [33], all examined marker loci were found to be highly informative and variable across the accessions, with a mean PIC value equal to 0.79, spanning from 0.61 (J206A) to 0.93 (GDAAS0615), as reported in Table 3. Since private polymorphisms are recognized as an efficient molecular tool for food traceability, the presence of marker alleles able to discriminate the accessions according to their putative origin was investigated. As many as 58 out of the total 117 marker alleles scored (Table 3, blue boxes) were exclusively identified only within the foreign accessions pool, but only two of them, i.e., loci IBSSR04 and J116a (Table 3, underlined percentages) were scored in at least 60% of the accessions. In contrast, six marker alleles were exclusively associated with the Italian pool (Table 3, red boxes) and only one, i.e., locus IBSSR27, was found in at least 60% of the Italian sweet potatoes (Table 3, underlined percentage).

According to the genetic similarity matrix calculated in all possible pairwise comparisons among the 22 accessions, Dice's coefficient ranged from 0.28 (between BR\_78 and BR\_30) to 0.97 (between IT\_41 and IT\_44, Table S2). US\_45 resulted to be the most divergent genotype: the average genetic similarity value calculated against the rest of the accessions was as low as 0.41, highlighting a clear-cut differentiation from the rest of the pool. The mean genetic similarity value calculated among the five Italian accessions was 0.70. The same value calculated in all pairwise comparisons within the Brazilian accessions was considerably lower (0.54), consistent with the great morphological variability observed within the South American core collection. The accession BR\_66 was the most closely related to the Italian clones, scoring an average genetic similarity value of 0.70 and a maximum of 0.74 when compared with IT\_44. From the principal coordinate analysis (PCoA), the first coordinate separated a group composed of eight Brazilian entries (namely, BR\_25, BR\_53, BR\_33, BR\_32, BR\_11, BR\_78, BR\_79, and BR\_80) from a group including all of the Italian clones (Figure 1).

The first principal coordinate also underlined the separation of the two US accessions (US\_45 and US\_85), according to their contrasting phenotype and in agreement with their low genetic similarity value (0.38). The main result of the variation explained by the second principal coordinate was the split of three Italian accessions showing a cream skin color (namely, IT\_44, IT\_81, and IT\_41) from two Italian accessions distinguishable for their pink skin color (IT\_49 and IT\_43). This finding was supported not only by a contrasting phenotype, but also by a relatively low estimate of mean genetic similarity (0.60) calculated between these two groups, suggesting a different origin of the Italian accessions. The UPGMA analysis confirmed the sharp detachment, already seen in the genetic similarity matrix (Table S2), of US\_45 from the rest of the genotypes (Figure 2A), supported by a bootstrap value of 100.



**Figure 1.** Principal coordinate analysis (PCoA) of the sweet potato core collection based on molecular markers: two-dimensional centroids derived from the genetic similarity estimates computed among accessions in all possible pairwise comparisons using the whole SSR marker data set. The first two coordinates were able to explain 54% of the total variation, accounting for 31% and 23% of the total, respectively. Four different colors have been used to distinguish the accessions based on their geographical origin: blue = Brazil, red = Honduras, green = Italy, and brown = USA.

Four main clustering patterns already highlighted with the PCoA (Figure 1) were also observed throughout the dendrogram (Figure 2 panel A), with bootstrap values always higher than 90%. In detail, BR\_79 and BR\_80 grouped together and scored the 91% of similarity; BR\_33, BR\_25, and BR\_53 shared a mean genetic similarity value of 82%; IT\_44, IT\_41, and IT\_81 showed, on average, 88% the genetic similarity and, finally, BR\_11 grouped with BR\_78 according to a genetic similarity of 78%. In particular, it is worth highlighting that although the five Italian accessions were all part of the same macrobranch of the tree, they subclustered into two groups, according to what was observed in the PCoA analysis. In fact, in the PCoA, the second coordinate alone explained 23% of the total variation, clearly separating IT\_44, IT\_41, and IT\_81 from a second group including IT\_43 and IT\_49 (Figure 1). A noteworthy consideration involving BR\_66 is the close association between this Brazilian accession and the first group of Italian accessions in both the PCoA and the UPGMA tree, with a similarity value of 0.74.

The ΔK criterion suggested by Evanno et al. [41] gave the highest value for the SSR analysis at three groups (for *K* = 3, ΔK resulted 58.53, Figure S1). According to this estimate, the whole group putatively originated from three ancestral genotypes (indicated in grey, orange, and blue in Figure 2B). One vertical histogram for each accession, conveniently divided into *K* = 3 colored segments, has been used to represent the estimated membership in each hypothesized ancestral genotype, and 90% was the threshold set for the admixed ancestry (Figure 2B). This finding seems to be in agreement with those reported by Roullier et al. ([45] Figure 2 in Appendix S1) and Wadl et al. [8], both supporting the *K* = 3 in the sweet potato clones from tropical America. Overall, one of the three ancestors (the one in grey) was predominant: 13 samples from USA, Honduras, Italy, and Brazil showed a membership to this ancestral genotype higher than 95%, suggesting a probable common origin. Five accessions (BR\_66, IT\_81, BR\_53, BR\_79, and BR\_80) were admixed and one of the three ancestors (the one in grey) was always recurrent; interestingly, no example of hybridization between the other two ancestors (blue and orange, Figure 2) was observed.

**Figure 2.** Genetic structure analysis of the sweet potato core collection: (**A**) The unweighted pair group method with arithmetic average means (UPGMA) tree of the genetic similarity estimates computed among pairwise comparisons of sweet potato accessions using the whole simple sequence repeat (SSR) marker data set, with nodes of the main subgroups supported by bootstrap values. The color scheme for this figure is the same as that used in Figure 1 (Blue = Brazil, red = Honduras, green = Italy, and brown = USA). (**B**) Population genetic structure of a core collection of *N* = 22 sweet potato accessions estimated using 11 microsatellite markers. Each sample is represented by a vertical bar partitioned into *K* = 3 colored segments representing the estimated membership. The proportion of ancestry (%) is reported on the ordinate axis, and the identification number of each accession is indicated below each histogram. (**C**) The unweighted pair group method with arithmetic average means (UPGMA)-constrained tree was built by applying the Euclidean similarity index and using the morpho-qualitative measurements of a subset of the *I. batatas* core collection. The positions of the samples throughout the dendrogram were kept fixed according to those ones resulting from the dendrogram in Figure 2A, and the bootstrap values supporting each node of the main subgroups were calculated.

In detail, two of three Italian accessions characterized by a cream skin color (IT\_44 and IT\_41) and strictly associated with the same branch of the UPGMA tree (similarity = 0.97) also shared the same marker allele cluster, both with accession scores of individual membership higher than 99%. Although these two Italian accessions were collected in two different areas, these results suggest a case of synonymy and we may suppose that the same genotype is cultivated in different regions with a different name. IT\_81 and BR\_66, although counted as admixed, shared the same ancestral genotype as IT\_44 and IT\_41, with percentages of 75% and 35%, respectively. Considering the PCoA and the UPGMA tree, it is probable that all four shared the same ancestral genotype (the one indicated in orange), and it is not to be ruled out that the three Italian accessions derived from BR\_66. However, it is possible that BR\_66 has undergone continuous events of hybridization with individuals descending from the "grey" progenitor, which would explain the admixed pattern of this Brazilian accession. In contrast, the three Italian accessions, whose memberships ranged from 75% (IT\_81) to 99% (IT\_44 and IT\_41) may have preserved their "ancestral purity" thanks to repeated crosses with accessions of the same lineage or to asexual propagation breeding schemes. A second cluster included BR\_25 and BR\_33 (membership > 97%) and, to a lesser extent, BR\_53 (membership = 59%), according to their sharp detachment from the rest of the pool highlighted by the PCoA analysis. The remaining accessions, including two Italian accessions (IT\_49 and IT\_43), were all part of a third cluster, except for BR\_79 and BR\_80, which were admixed (Figure 2B). The Euclidean-index-based UPGMA dendrogram emphasized the clear-cut detachment of US\_45 from the rest of the pool (Figure 2C), as previously established by the SSR-based UPGMA (Figure 2A). In this case, the bootstrap support was 100%. The other samples were all clustered in three main branches, with a bootstrap value of 80%. In the first branch, BR\_79, BR\_80, BR\_33, BR\_25, and BR\_53 clustered together with IT\_43, IT\_49, and BR\_1. This finding was different from that observed in the SSR-based UPGMA tree (Figure 2A), where these two subgroups clustered separately, and it was not consistent with the PCoA analysis and the morphological data (Figure 1). Nevertheless, it must be noted that in the morphological and chemical-marker-based dendrogram (Figure 2C), the bootstrap support of this specific node was quite low (45%). Moreover, despite the different clusterization highlighted in the two UGPMA trees (Figure 2A,C), the two subgroups maintained the same composition. In particular, all of the samples in the first subgroups (BR\_79, BR\_80, BR\_33, BR\_25, and BR\_53) were characterized by a typical purple color of flesh that was not detected in other samples. This observation, along with a full or partial membership to the same cluster (Figure 2 panel B), reinforces the hypothesis of a common ancestor for this group of samples, in agreement with their geography. Regarding the second subgroup (IT\_43, IT\_49, and BR\_1), it is very hard to make hypotheses about its origin, although genetic data would presuppose a certain degree of kinship with the other three Italian accessions. Considering the second main branch of the Euclidean-distance-based UPGMA, two considerations are needed. First, IT\_44, IT\_41, IT\_81, and BR\_66 grouped together according to all the previous analyses (bootstrap support 86%). The fact that the genetic and the morpho-agronomic markers led to identical results may suggest that this first group of Italian genotypes was descended from BR\_66, as initially predicted on the basis of their highly similar morphology (Figure 1). All the accessions were also analyzed by means of morpho-agronomic qualitative markers (Table 4).



In addition to genetic characteristics, quality traits are generally preferred for nutritive value and market demand [46–48]. With regard to the qualitative aspects, the dataset obtained for the different parameters is summarized in Figure 3 by multivariate analysis.

**Figure 3.** Principal components analysis (PCA) of the sweet potato core collection based on qualitative traits: (**A**) Score plot of the first two principal components (PC1 and PC2) for the 22 sweet potatoes. (**B**) Eigenvectors of the variables measured for the first two principal components. Loadings (eigenvalues) for the first and second principal components were equal to 25% and 17%, respectively. TP: total phenols; TAA: total antioxidant activity; dw: dry weight.

The first identified the different genotypes positioned on the basis of the measured qualitative characteristics, whereas the second vectorially highlighted the main qualitative traits that determined the genotype positioning. The results obtained from this elaboration allowed us to identify interesting cues regarding grouping of the different genotypes according to their qualitative peculiarities. In particular, it was possible to group the 22 genotypes into three macrogroups (A, B, and C). Group A comprised those genotypes characterized by a high anthocyanin content in both the skin and the flesh (US\_45, BR\_1, BR\_33, and BR\_25). The genotypes belonging to this group were characterized by high total antioxidant capacity and high total polyphenol content (Figure 3). The anthocyanins belong to the antioxidant family, in particular to the polyphenols, and have positive effects on human health [49]. Anthocyanin composition was determined in purple-fleshed sweet potatoes [50,51], highlighting that cyanidin and peonidin glycosides acylated with phenolic acids were the primary anthocyanin components. The second group, B, comprised genotypes phenotypically characterized by high beta-carotene content, thus featuring orange pulp, and by high simple sugar concentrations (BR\_51, BR\_30, IT\_49, US\_85, and HO\_86). Carotenoids are secondary plant compounds that form lipid-soluble yellow, orange, and red pigments. Carotene-rich vegetables are associated with decreased risk of chronic diseases related to vision, skin, infection, and reproduction, in addition to being active oxygen species scavengers [52]. The most abundant carotenoid in sweet potato roots is usually β-carotene, which comprises more than 77% of total carotenoid content and can reach more than 99% in sweet potatoes characterized by orange flesh [53]. The colored genotypes considered in this experiment were characterized by a β-carotene content ranging from 23.8 to 811 μg g−1. This range was wider than those measured by Simonne et al. [54] (1–190 μg g<sup>−</sup>1) and by Grace et al. [50] (1–253.3 μg g<sup>−</sup>1) in several sweet potato varieties. Finally, group C contained genotypes characterized by different flesh colors, but they shared a high content of sucrose and minerals (BR\_32, BR\_54, IT\_43, and IT\_81) or a high percentage of dry matter, soluble solids, and starch (BR\_53, BR\_78, BR\_80, IT\_41, BR\_79, and BR\_66). The high presence of starch and sucrose makes these varieties particularly sweet after slow cooking processes (i.e., boiling, oven, steam), following the production of maltose [3,55]. Genotypes belonging to group A and B may be more suitable for faster cooking methods (i.e., frying) because they are characterized by less starch, and they are more appealing for the consumer from a color point of view.

As shown from the comparison between grouping results of the core collection of sweet potato accessions (see Figures 1 and 3), some inconsistencies arose between the clustering based on molecular markers and that derived from qualitative parameters. This was understandable since both the molecular markers and qualitative parameters evaluated in this study not only represented a subset of the evaluable genotypic and phenotypic traits, but also because they did not show any known linkage. Consequently, a full overlap with what was detected by the principal components based on qualitative traits is not possible, as demonstrated by the centroids derived from the principal coordinates of molecular markers. An example of this behavior can be found between IT\_43 and IT\_49: these genotypes were grouped together based on molecular markers, whereas they differed according to the qualitative profiles (e.g., the former scored a higher concentration of potassium, whereas the latter a higher content of simple sugars).

#### **4. Conclusions**

We investigated, for the first time, the genetic structure and qualitative composition of the principal Italian sweet potato clones, along with their relatedness to a core collection of accessions from Central and Southern America. It is worth mentioning that the sweet potato accessions analyzed in this study represent the most cultivated clones in Italy and, to the best of our knowledge, no other locally adapted varieties are commercially available to Italian farmers. In fact, these materials, grown mainly in the Veneto region, are known to possess a high adaptation to the natural and anthropological environment in which they have been introduced and are still cultivated.

From the molecular analyses, Italian accessions were sub-clustered into two groups and were found to be genetically very similar to the South American germplasm. This finding was also supported by the morphological and chemical measurements affecting their principal qualitative traits.

Summarizing our results and considering both the morphological and qualitative and the genetic–molecular data, it is evident how the combination of these two different approaches was very effective in differentiating or clustering the different clonal genotypes, as expected by their geographical origin or their phenotypic characteristics. Moreover, because the molecular and the chemical results were often comparable, it was possible to make robust speculations on the common origins of sweet potato accessions. The experiment demonstrated not only a good relationship between genetic and morphological and qualitative aspects, but also allowed us to indirectly highlight the good level of adaptation of South American genotypes to European conditions. This last information allows us to suggest that breeders use South American germplasm, characterized above all by colored pulp, for the constitution of new genotypes in Europe, useful for the renewal and innovation of the European market as well as for providing new opportunities for farmers. On the whole, this information could be exploited by both breeders and farmers to detect and protect commercial varieties, and hence to certify the genetic identity of their propagation materials and overall quality of their food derivatives.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/10/11/840/s1, Figure S1: Definition of the number of ancestral sweet potato genotypes. Mean ΔK is calculated as L"(K)/(SD(L(K))) following Evanno et al. (Evanno et al. 2005), where mean LnP(D) ± SD over 10 runs is a function of K, being L'(K) = ΔLnP(D), Table S1: SSR data recorded for the different sweet potato genotypes, Table S2: Genetic similarity matrix for the sweet potato accessions.

**Author Contributions:** Conceptualization, C.N., G.B. and P.S.; methodology, A.C.G., F.P.; validation, C.N., G.B. and P.S.; formal analysis, A.C.G., F.P. and C.N.; investigation, A.C.G., F.P. and C.N.; resources, G.B. and P.S.; data curation, A.C.G., F.P., C.N., G.B. and P.S.; writing—original draft preparation, A.C.G., F.P. and C.N.; writing—review and editing, F.P. and C.N.; supervision, G.B. and P.S.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Serendipitous In Situ Conservation of Faba Bean Landraces in Tunisia: A Case Study**

## **Elyes Babay 1,2,**†**, Khalil Khamassi 3,**†**, Wilma Sabetta 4, Monica Marilena Miazzi 5, Cinzia Montemurro 5, Domenico Pignone 4, Donatella Danzi 4, Mariella Matilde Finetti-Sialer 4,\* and Giacomo Mangini 5,\***


Received: 21 January 2020; Accepted: 16 February 2020; Published: 24 February 2020

**Abstract:** Cultivation of faba bean (*Vicia faba* L.) in Tunisia is largely based on improved varieties of the crop. However, a few farmers continue to produce local cultivars or landraces. The National Gene Bank of Tunisia (NGBT) recently launched a collection project for faba bean landraces, with special focus on the regions of the North West, traditionally devoted to cultivating grain legumes, and where around 80% of the total national faba bean cultivation area is located. The seed phenotypic features of the collected samples were studied, and the genetic diversity and population structure analyzed using simple sequence repeat markers. The genetic constitution of the present samples was compared to that of faba bean samples collected by teams of the International Center for Agricultural Research in the Dry Areas (ICARDA) in the 1970s in the same region, and stored at the ICARDA gene bank. The results of the diversity analysis demonstrate that the recently collected samples and those stored at ICARDA largely overlap, thus demonstrating that over the past 50 years, little genetic change has occurred to the local faba bean populations examined. These findings suggest that farmers serendipitously applied international best practices for in situ conservation of agricultural crops.

**Keywords:** *Vicia faba* L., genetic diversity; SSR markers; in situ conservation

#### **1. Introduction**

Faba bean (*Vicia faba* L., 2n = 2x = 12) is a facultative cross-pollinating species with outcrossing rates varying between 1 and 55% depending on its environment; it belongs to the *Fabaceae* family, *Faboideae* subfamily, tribe of *Fabeae*, and is not interfertile with any other *Vicia* species [1]. The wild progenitor of *V. faba* is unknown, but recent archaeological excavations have allowed, in the Mont-Carmel (Mediterranean Levantine), the discovery of fossilized seeds that are compatible with a wild progenitor of this crop, dating as back as 14,000 ybp [2]. Considering other archaeological evidences, such as those relating to findings in Tel el Kerkh [3], northwest Syria, it is possible to

hypothesize that this species has been domesticated since the Neolithic era, and that the wild progenitor, possibly distributed in small habitats, was entirely domesticated and then became extinct [2–5]. According to Cole [6] and Cubero [7], the spread of faba bean from its center of origin to other countries could have involved five routes. In the Mediterranean, in particular, faba bean mainly spread through two routes: the first across Anatolia to Greece, the Illyric coast (possibly the Danubian regions), and then to Italy; the second, beginning at the Nile Delta, moving towards the West, along the North African Mediterranean coast, to the Maghreb and then to the Iberian Peninsula. It is worth mentioning that, in this regard, North Africa and Tunisia in particular constitute a center of primary and secondary diversification of several agricultural and wild species [8].

In Tunisia, faba bean covers more than 70% (59,583 ha) of the total area annually devoted to grain legume crops [9]. Approximately half of the area is cultivated with grain legume types meant for fresh pod consumption; the rest as forage (plant and seeds) is mainly based on small seeded types. The average productivity in Tunisia is 1.03 t/ha, 40% below the world average [10]. This is mainly due to the parasitic weed broomrape (*Orobanche crenata* Forssk. and *O. foetida* Poir.) and drought stress occurring in faba bean-growing areas [11]. Until the last century, most crops consisted of landraces, often named after the farmer who selected them or after areas where they were grown [12]. Some landraces ('Batata', 'Malti', 'Chemchali' or 'Masri local') are still grown by farmers and seeds can be bought from local informal markets.

In recent years, thanks to significant achievements in crop breeding, modern high-yielding varieties are widely used in cultivation and have almost completely replaced local populations and landraces [11]. The increase in yield was obtained mainly with breeding programs targeted at tolerance to abiotic (heat and drought) and biotic (foliar diseases and parasitic weeds) stresses. Moreover, recent breeding efforts are directed towards the development of new cultivars with low anti-nutritional compounds (vicine and convicine), to improve the quality and utilization efficiency of faba in human diet and for livestock feed [13]. Different molecular tools were used to investigate genetic diversity in grain legume species [14–17]. Some attempts to evaluate genetic variability have also been made for Tunisian faba bean germplasm. The analysis of isozyme [18] and sequence specific amplified polymorphism (SSAP) [19] markers analyzed in nine Tunisian *Vicia faba* collections has indicated a certain degree of genetic cohesiveness. Using simple sequence repeat (SSR) markers, 16 faba bean accessions, selected from 42 populations collected across eight southern oases arid agro-ecosystems, were analyzed, evidencing genetic cohesiveness among the studied samples [19–21], together with a low level of variability among accessions. In both reports, the authors stated that intense seed exchange among farmers had led to a leveled degree of genetic diversity among those populations.

In the 1960s, major concerns focused on the genetic erosion of biodiversity, eventually leading to fostering of ex situ conservation efforts and the creation of gene banks [22]. As a result, in the latest decades of the 20th century, several different research centers organized collection missions for crop diversity in order to secure the local germplasm before it was completely lost [23]. Within the frame of "emergency" collections, from the seventies until the early nineties, ICARDA, among others, carried on a series of faba bean collection missions in North Africa. During that period, several faba bean samples were retrieved from different regions, in particular from Tunisia. At the end of the 20th century, taking into account the evident loss of genetic diversity in the Mediterranean [24], specific attention was paid to the practice of in situ conservation of crops. This is the conservation of agricultural genetic resources on farms located in the same areas where local communities had developed them, with specific attention to neglected crops [25,26]. According to Duc et al. [23], in situ conservation of biodiversity may contribute to the development of the best-adapted materials for local agronomic practices and involve farmers in the selection process through participatory breeding [27]. Within this frame, NGBT started an ongoing program of genetic resource collection in different areas of Tunisia. The aim was to preserve Tunisian crop gene-pools from genetic erosion and characterize local germplasm, thanks to an integrated approach, including on-farm conservation of local germplasm and landraces.

In order to better plan an in situ conservation strategy for faba bean and to understand the possible loss of genetic diversity in Tunisian *Vicia faba* germplasm, the genetic structure of the samples collected in recent years was compared with those collected by ICARDA in the 1970s. This paper reports on the results of this comparison.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

The plant material used in the present study consisted of a collection of 51 Tunisian local faba bean samples (Table 1). It included 29 samples collected during 2016–2018 by the NGBT and 22 faba bean accessions collected by ICARDA, starting from the seventies until the early nineties. The NGBT samples were collected in the governorates of Beja and Jendouba (Figure S1) characterized by annual average rainfall of 800 and 600 mm, respectively. The passport data and ethno-botanical information of the NGBT samples are available at NGBT. The samples of ICARDA (labelled ICAR) were derived from collection missions conducted from the seventies until the early nineties in North Tunisia (Beja, Bizerte, and Siliana).


**Table 1.** List of faba bean samples included in the study.


**Table 1.** *Cont*.

n.a. data not available.

#### *2.2. Seed Phenotypic Traits*

Five seeds of each sample were randomly selected to determine average seed size. The three axial dimensions of seed length (L), width (W), and thickness (T) were measured using a Vernier caliper (Gilson Tools, Japan) with accuracy of 0.05 mm. The geometric mean diameter (Dg) was calculated by using the equation reported by Mohsenin [28]:

$$\mathbf{D}\mathbf{g} = (\mathbf{L} \times \mathbf{W} \times \mathbf{T})^{1/3} \tag{1}$$

The sphericity (ϕ) of faba bean seeds was calculated using the following formula:

$$\Phi = \left[ (\mathbf{L} \times \mathbf{W} \times \mathbf{T})^{1/3} / \mathbf{L} \right] \mathbf{^\ast 100} \tag{2}$$

#### *2.3. DNA Extraction and SSR Assays*

Total genomic DNA was extracted from fresh young leaves—five plants per sample—using the cetyltrimethyl ammonium bromide (CTAB) method, as described by Fulton et al. [29]. DNA concentration was determined using a NanoDropTM ND-2000 (Thermo Scientific, MA, USA) and the quality was verified by separation on 0.8% agarose gel. Equal DNA quantities of the five plants of the sample were then pooled, and all DNA samples were diluted to a standard working concentration of 50 ng/μl by adding ultrapure water (Gibco, Invitrogen, USA).

A set of 11 simple sequence repeats (SSRs) markers, retrieved from the literature [30–32] were used for this study (Table S1). A preliminary assay was carried out in order to evaluate the robustness of PCR reaction and the reproducibility of the fragments. In particular, different faba samples randomly chosen were analysed considering technical replicates. The PCR conditions for each SSR markers were set up at best conditions, considering an annealing temperature ranging from 45 to 60 ◦C. The fragments produced were separated on 2.0% agarose gel containing Nancy-520 DNA Gel Stain (Sigma Genosys, St. Louis, MO, USA), and visualized under UV light. The amplification reactions were performed in a final volume of 20 μl, containing the template DNA (50 ng), the 5 end of each forward primer with the M13 (21 bp) tail, the reverse primer (M13) labeled with fluorescent dye (FAM, VIC, PET, or NED). PCR reactions were performed in a thermal cycler (Bio-Rad Laboratories, Hercules, CA, USA) as follows: an initial denaturing step at 95 ◦C for 3 min, followed by 36 cycles of 94 ◦C for 20 s, 56 ◦C for 50 s, 72◦C for 1 min, and a final extension step at 72 ◦C for 7 min. The amplification products were detected by automatic capillary sequencer ABI PRISM 3100 Genetic Analyzer (Applied Biosystems, Waltham, MA, USA), and the fragments were analyzed with GeneMapper genotyping software version 5.0 (Thermo Fisher Scientific, Waltham, MA, USA). The internal molecular weight standard was GeneScanTM 600 LIZ dye Size Standard (Thermo Fisher Scientific, Waltham, MA, USA).

#### *2.4. Data Analysis*

Hierarchical ascending classification (HAC) clustering analysis based on dissimilarity matrix of morphometric seed data (L, W, T, Dg, and ϕ) was performed to evaluate the relationship among the faba samples using XLSTAT statistical software ver. 2016.2 (Addinsoft Inc, New York, USA).

The genetic indices, number of different alleles (*Na*), Shannon's information index (*I*), observed Heterozygosity (*Ho*), expected Heterozygosity (*He*), Fixation Index (*F*), and private alleles were calculated using GenAlEx version 6.5 [33]. The allelic data were used to obtain a similarity matrix, from which a dendrogram was constructed using the UPGMA algorithm with MEGA ver. 4 [34]. The molecular data were processed using STRUCTURE ver. 2.3.4 [35]. The number of sub-populations (K) was estimated by 10 independent runs for each K (from 1 to 10), applying the admixture model, 500000 Markov Chain Monte Carlo (MCMC) repetitions, and a 100000 burn-in period. The means of the log-likelihood estimates for each K were calculated. The true K was determined with the Evanno test [36] using STRUCTURE HARVESTER [37]. Analysis of molecular variance (AMOVA) was used to partition the genetic variation into inter- and intra-gene pool diversities in faba using GenAlEx program ver. 6.5, with 1000 permutations.

#### **3. Results**

#### *3.1. Variation in Seeds Phenotypic Traits*

Morphometric seed traits (L, W, T, Dg, and ϕ) were measured in the samples belonging to both the NGBT and ICARDA faba bean collection. The average mean of the three principal axial dimensions (L, W, and T) and Dg of the NGBT and ICAR groups are shown in Figure 1. No statistically significant differences for these morphometric seed traits were detected between the two groups. The values of faba bean sphericity (ϕ) were calculated by using the geometric mean diameter and length data. No significant difference was found for these traits when comparing the NGBT and ICAR groups.

**Figure 1.** Length (L), width (W), thickness (T), and geometric mean diameter (Dg) averages of samples collected by the National Gene Bank of Tunisia (NGBT) and Agricultural Research in the Dry Areas (ICAR). Lines represent the standard deviation.

The values of morphometric seed traits of each samples were used to calculate a dissimilarity matrix based on Euclidean distances. A dendrogram was obtained starting from the matrix, in which three main clusters are evidenced, corresponding to three seed types—large, medium (called also equina), and small (Figure 2). Each cluster included both NGBT and ICAR samples with no reference to the area of origin.

**Figure 2.** Dendrogram based on dissimilarity matrix calculated from morphometric seed traits in the faba bean collection split in samples collected by the National Gene Bank of Tunisia (NGBT) and Agricultural Research in the Dry Areas (ICAR). The colors orange, purple, and green were used to distinguish between the medium, large, and small seed type clusters, respectively.

#### *3.2. Molecular Variation of Faba Bean Collection*

In order to evaluate the genetic diversity of 51 faba bean samples, a set of 11 SSR markers was used (Table 1 and Table S1). A total number of 94 alleles were identified, ranging from 3 (loci M22 and M46) to 22 (locus VFG41), with an average of 10.2 alleles per locus (Table S2).

AMOVA did not show a molecular diversity among the 29 NGBT faba beans when these were grouped according to the collection sites (El Hamra, Oued Ghrib and Fouazia), suggesting that the NGBT faba bean samples belonged to one genetically cohesive population (Table S3).

According to these results, we investigated the genetic diversity among samples collected in recent years by NGBT and ICAR during the 1970s, in the same Tunisian areas. Similar values for the number of different alleles, Shannon's information index, Heterozygosity expected, Heterozygosity observed, Diversity index, and Fixation index were found between the two groups, although the ICAR samples appeared slightly more fixed (Table 2).

**Table 2.** Number of different alleles (*Na*), Shannon's information index (*I*), Heterozygosity observed (*Ho*), Heterozygosity expected (*He*), Fixation Index (*F*), and private alleles of faba bean collection split in samples recently collected by the National Gene Bank of Tunisia (NGBT) and Agricultural Research in the Dry Areas (ICARDA).


AMOVA indicated that the NGBT and ICAR groups have no statistical difference, at the molecular level (Table 3), as most of the diversity clearly appeared within groups, with only a limited variation occurring among them.

**Table 3.** Analysis of molecular variance (AMOVA) of faba bean collection split in samples collected by the National Gene Bank of Tunisia (NGBT) and Agricultural Research in the Dry Areas (ICARDA).


df = degree of freedom, SS = sum of squares, MS = mean squares, Est. Var. = estimate of variance, *p*-value.

To define the genetic relationships among the faba bean samples, the similarity matrix obtained was also used to produce a UPGMA dendrogram (Figure 3). The clustering showed that the NGBT samples were admixed with ICAR ones, thus suggesting that the two groups could be considered a unique meta-population. Total faba bean collection was also evaluated with Bayesian clustering modeling, performed using SSR allelic data generated according to 11 SSR markers. As the clustering model presumes the underlying existence of K clusters, an Evanno test [36] was performed that yielded K=3 as the highest log-likelihood (Figure S2). Nevertheless, each sample analyzed showed to belong to all three clusters identified, and none of them predominantly pertained to a specific cluster (Figure 4). These results seemingly indicate that the Tunisian faba collection was structured in three subpopulations that do not correspond to NGTB and ICAR samples' distinction.

**Figure 3.** Dendrogram of faba bean collection split in samples collected by the National Gene Bank of Tunisia (NGBT) and Agricultural Research in the Dry Areas (ICAR) resulting from the UPGMA cluster analysis based on similarity matrix obtained from 11 SSR allelic data.

**Figure 4.** Membership coefficient (Q) mean of the faba bean collection split in samples collected by the National Gene Bank of Tunisia (NGBT) and Agricultural Research in the Dry Areas (ICARDA). The different colors indicate the three subpopulations detected using a Bayesian approach (blue: subpopulation 1; red: subpopulation 2, and green: subpopulation 3).

#### **4. Discussion**

Faba bean is an important crop for sustainable agriculture in marginal areas and advanced agricultural systems, as it plays an important role in soil fertility and nitrogen fixation, being able to grow in diverse climatic and soil conditions. Although faba bean is less consumed in western countries as human food, it is considered one of the main sources of cheap protein and energy for many people in Africa, parts of Asia, and Latin America, where many people cannot afford to buy meat [38]. Seed size and shape are characters of polygenic control [39,40] and have undergone strong selection measures by farmers during evolution of the crop. Seed size is considered a key trait in the study of the historical evolution of this crop based on archaeological remains and findings [2]. This selection pressure can still be found in the habits of farmers, who manually select seeds to be sown in the next season [41], a habit that was followed in many Mediterranean regions until modern times and lost only on introduction of improved varieties. This manual selection procedure resulted in the formation of peculiar landraces, such as "Larga di Leonforte" [42]. However, this study noted that Tunisian farmers did not practice seed selection, and have not done so for the last 50 years.

The analysis of morphometric seed traits and molecular markers carried out in this work did not distinguish patterns in the distribution of morphological and genetic variations. All the samples collected by the NGBT teams appeared to belong to a single, genetically cohesive population. Similar results were observed for the faba bean samples obtained by ICARDA and collected from the same areas. When all faba samples were analyzed together, there were no differences between the NGBT and ICARDA groups. In fact, our molecular analyses and seed morphometric variation study demonstrated that the two groups belonged to a single meta population.

Seed size is one of the most important morphological traits responsible for yield, and a major target for breeding. Several studies have led to the mapping of QTLs/genes for seed weight/size in soybean [43], chickpea [44], and lentil [45]. In faba bean, the genetic control of these traits is still unclear, although consensus linkage maps have been produced [43,46].

Our data confirmed that morphometric seed traits were not associated to the markers used. This implies that any selection of lines based on seed traits still retains a variable level of genetic diversity potentially associated to other traits, such as adaptation or resistance, an issue to be taken into account in any faba bean-breeding program based on phenotypic data.

The absence of genetic differentiation in different collection sites might depend on many factors. The absence of human selection pressure does not force crop adaptation in a specific direction. The outbreeding habit of faba bean is a second factor; in fact, while the advanced breeding varieties favor inbreeding in search of higher stability, as requested by UPOV standards, the landraces are generally quite allogamous [47,48]. Recent studies have also demonstrated that the level of allogamy might depend on the species of pollinating insects [49]. A further, effective mechanism could be derived by the spontaneous seed exchange practice occurring among farmers. In informal seed markets, seed exchange by farmers favors the establishment of a landrace in a given environment with uniform agro-climatic characteristics. The farmers cultivating these landraces are actually the relics of a once wider cultivation area. All these factors act synergically to produce the observed genetic differentiation patterns.

The Evanno test yielded three subpopulations within the faba bean collection. In addition, each sample had a coefficient membership lower than 0.50, thus denying that it predominantly belonged to one of these subpopulations. This further supports the fact that the two meta-populations, NGBT and ICARDA, are subsamples of a unique population collected in the same area at different times.

The biological significance of the three subpopulations detected with STRUCTURE is unclear, and might depend on the presence of genetic signals that do not parallel clear-cut characteristics. Nevertheless, several authors have reported that hierarchical analyses, such as those based on STRUCTURE or similar software, relay on strict assumptions that might not completely apply to the case studies, thus resulting in incorrect evaluation of the population structure [50].

In conclusion, unintended conservation of ancient faba bean germplasm in Tunisian farms is witnessed, because the farmers cultivating faba bean landraces do not follow seed selection, as well as owing to concurring factors such as the high level of cross-pollination, and the consistent presence of pollinators (due to the limited use or absence of insecticides). This beneficial situation is an empirical application of best practices recommended by research institutions for on-farm conservation of plant genetic resources. In their quest to feed their families and their precious animals, the Tunisian farmers in the small villages mentioned above serendipitously stored and protected their faba bean genetic resources. These findings and implications should be further discussed in the broadest context possible. Future research directions should also be highlighted.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/2/236/s1, **Figure S1:** NGBT sampling **Figure S2:** Delta K for differing numbers of subpopulations (K) estimated for faba bean collection with 11 markers; **Table S1:** List of SSR markers used for genetic analysis of faba bean collection; **Table S2:** Number of different alleles (*Na*), Shannon's information index (*I*), Heterozygosity observed (*Ho*), Heterozygosity expected (*He*), and Fixation Index (*F*) of 11 SSR markers used in faba bean collection; **Table S3:** Analysis of molecular variance (AMOVA) of 29 NGTB faba bean split according to collection sites (El Hamra, Oued Ghrib, and Fouazia).

**Author Contributions:** Conceptualization, E.B., K.K., D.P., M.M.F.-S., and G.M.; Formal analysis, M.M.M., D.D., and G.M.; Funding acquisition, C.M. and D.P.; Investigation, E.B., K.K., W.S., and M.M.F.-S.; Methodology, D.P., M.M.F.-S., and G.M.; Resources, E.B. and K.K.; Supervision, D.P., M.M.F.-S., and G.M.; Writing—original draft, E.B., K.K., D.P., M.M.F.-S., and G.M.; Writing—review & editing, D.P., M.M.F.-S., and G.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was carried out within the framework of the project "Ressources phytogénétiques tunisiennes mieux conservées et valorisées", funded by the Italian Cooperation to the Tunisian Ministry of Local Affairs and Environment.

**Acknowledgments:** The authors wish to thank Amine Hmid, Anis Khlij, and Alberto Dragotta from the Mediterranean Agronomic Institute of Bari, Italy (CIHEAM-IAM); Olfa Saddoud and Mbarek Ben Naceur from the National Gene Bank of Tunisia for project management and administration. Special thanks are also given to Mr. Mostafa Khemiri and Mr. Ali Ayari, agronomic engineer-extensionists and head of CTV Fernana and CTV Amdoun, respectively, for facilitating contacts with farmers and help during the collection missions. We give special thanks to Nicoletta Rapanà for excellent technical assistance.

**Conflicts of Interest:** The authors declare that they have no competing interests for this research.

#### **References**

1. Leht, M.; Jaaska, V. Cladistic and phenetic analysis of relationships in *Vicia subgenus Vicia* (Fabaceae) by morphology and isozymes. *Plant Syst. Evol.* **2002**, *232*, 237–260. [CrossRef]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Genes* Editorial Office E-mail: genes@mdpi.com www.mdpi.com/journal/genes

MDPI St. Alban-Anlage 66 4052 Basel Switzerland

Tel: +41 61 683 77 34 Fax: +41 61 302 89 18