1. Introduction
Avocado (Persea americana L.) is one of the most economically important species within the Lauraceae family [1]. Its origin has been established in Central America, as avocado seeds found in Mexican excavations have been dated to 7000 B.C. [2]. Local selection of improved genotypes, and their subsequent fixation by vegetative propagation, has allowed the development of hundreds of cultivars, but only a few are currently exploited agronomically. Indeed, about 90% of worldwide production relies on the “Hass” cultivar, which originated decades ago in California [3]. Classically, avocado cultivars have been classified into three horticultural races, P. americana var. drymifolia (Mexican), P. americana var. guatemalensis (Guatemalan) and P. americana var. americana (Antillean or West Indian), according to morpho-physiological features of trees and fruits [2,4]. However, flowering in P. americana is mainly asynchronous (i.e., flowers open first in the female stage and later in the male stage), which usually favors cross-fertilization, causing wide genetic diversity in avocado progeny, as well as the continuous production of new hybrid cultivars [2].
Since 1961, efforts to meet global avocado demand have led to an increase of about 23% in worldwide production, reaching more than 25 million tons per year in 2018 [5]. The market value of avocado-derived products has also progressively increased [5], not only because of their excellent nutritional properties [6], but also as a consequence of value-added product development by the food, oil, and cosmetic industries [7,8]. Moreover, the anti-inflammatory and analgesic properties of phytochemical compounds found in avocado fruits are being exploited by the pharmaceutical industry [9]. As a consequence, global demand for this product cannot currently be met by the agronomic industry and, therefore, avocado is considered a highly profitable crop. Spain is the only European country with significant avocado production (11,000 ha in 2017) [10], located mainly in the south of the Iberian Peninsula, but also in the Canary Islands, where P. americana is a promising crop for strengthening the local agriculture-based economy [11,12]. Both the area dedicated to avocado cultivation and its production in the Canary Islands have doubled since 2012, reaching 1965.4 ha and 13,293 tons of fruit in 2020 [13]. However, agricultural soils on the islands show a high degree of degradation, especially due to salinization of irrigated soils, which reaches an average of 57% on the islands [14]. Avocado is one of the most salt-sensitive crops [15], and it is noteworthy that the physiological response of ‘Hass’ avocado to salinity is influenced by the rootstock [16]. In this regard, the West Indian rootstock is able to grow in saline environments [15] and is resistant to the phytopathogen Phytophthora cinnamomi [17], the two main reasons why the local administration recommends it for new plantations [18].
As the avocado market has expanded, interest in developing new molecular markers has grown during recent decades, especially to unequivocally characterize the best cultivars to improve yield, but also to identify cultivars adapted to specific geoclimatic conditions and to develop molecular-assisted breeding programs [2]. Accordingly, several publications have addressed the identification of molecular markers in P. americana, involving different classic methods such as Restriction Fragment Length Polymorphism (RFLP) [19,20], Amplified Fragment Length Polymorphism (AFLP) [21,22,23,24], Random Amplified Polymorphic DNA (RAPD) [25], Simple Sequence Repeats (SSRs) [26,27,28,29,30,31,32,33,34] or Single Nucleotide Polymorphisms (SNPs) [35,36,37,38,39,40].
Surprisingly, transposable elements have not yet been exploited to develop new P. americana molecular markers, unlike in other agronomically important plant species [41,42]. Different strategies make use of transposable elements to generate DNA fingerprints, such as Sequence-Specific Amplification Polymorphisms (S-SAP) [43], Inter-Retrotransposon Amplified Polymorphism (IRAP), or Retrotransposon-Microsatellite Amplified Polymorphism (REMAP) [44,45,46,47]. However, prior knowledge of nucleotide sequences from the target plant species is necessary to apply these strategies. This limitation was overcome by the development of the so-called inter-Primer Binding Site (iPBS) technique [48,49]. Most plant transposable elements are Class I retrotransposons, which usually contain Long Terminal Repeats (LTRs) as flanking sequences [50]. These elements show a “copy-paste” transposition mechanism in which the element is transcribed to an RNA intermediate, which is then reverse transcribed to cDNA and inserted at the target genomic location [51,52]. To initiate reverse transcription, this mechanism uses host cell tRNAs as primers, which recognize a Primer Binding Site (PBS) sequence located next to the 5′ LTR of the retrotransposon [53]. These PBS sequences are usually conserved among species and have been used to design nearly universal iPBS primers, which allow single-primer amplification of DNA fragments located between two LTR-retrotransposons in inverted orientation [48,49].
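The single-primer principle described above can be illustrated with a small in-silico sketch. It is not part of the original protocol: the primer and template below are hypothetical toy sequences, and the function names (`reverse_complement`, `find_all`, `single_primer_amplicons`) are introduced here only for illustration. The sketch assumes exact primer matches, whereas real PBS primers also anneal with some mismatches.

```python
# Toy in-silico model of single-primer amplification (iPBS-like).
# ASSUMPTIONS: exact matching only; primer/template are hypothetical examples.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_all(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def single_primer_amplicons(template, primer, max_len=3000):
    """Sizes of products expected when one primer has two sites in
    inverted orientation: the primer sequence on the top strand,
    followed downstream by its reverse complement."""
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    sizes = []
    for i in forward_sites:
        for j in reverse_sites:
            # The reverse site must lie downstream of, and not overlap,
            # the forward site.
            if j >= i + len(primer):
                length = j + len(primer) - i
                if length <= max_len:
                    sizes.append(length)
    return sorted(sizes)


primer = "ACGTTG"  # hypothetical; real PBS primers are longer
inverted = "TT" + primer + "A" * 10 + reverse_complement(primer) + "GG"
direct = primer + "A" * 10 + primer
print(single_primer_amplicons(inverted, primer))
print(single_primer_amplicons(direct, primer))
```

In this toy model, two sites in the same orientation yield no product, which mirrors why iPBS bands are expected only between retrotransposons inserted in opposite orientations.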
In the present work, molecular tools based on LTR-retrotransposons (iPBS and IRAP) were implemented for the first time in P. americana. Genetic diversity among 12 avocado cultivars was evaluated, and phylogenetic relationships were reconstructed in order to compare the results obtained with these two techniques.
2. Materials and Methods
2.1. Plant Samples and DNA Purification
Well-characterized Persea americana cultivars were obtained from the Instituto Canario de Investigaciones Agrarias (ICIA), as well as from several private collections located in Tenerife (Canary Islands, Spain) (Table 1).
Young leaves were collected from adult trees without symptoms of disease, chlorosis or wounds. Genomic DNA (gDNA) was purified from 0.1 g of fresh plant material, avoiding petioles and main veins, with the E.Z.N.A. SP Plant DNA kit (Omega BIO-TEK, Norcross, GA, USA) following the manufacturer’s instructions. As a first step, leaf samples were homogenized in 2 mL Lysing Matrix-A tubes (M.P. Biomedicals, Irvine, CA, USA) by vigorous shaking twice at 5 m/s for 30 s in a FastPrep-24 system (M.P. Biomedicals, Irvine, CA, USA), the second time in the presence of the lysis buffer from the kit. DNA concentration and purity were determined with a DeNovix DS-11 spectrophotometer (DeNovix, Wilmington, DE, USA), taking ranges of 1.7–1.9 and 1.8–2.0 for the 260/280 and 260/230 absorbance ratios, respectively, as references for adequate purity. Each DNA sample was diluted to a final concentration of 10 ng/µL in 10 mM Tris-HCl pH 8.0, and stored at −20 °C. From these stocks, different working dilutions were prepared in the same buffer, as indicated.
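As a worked illustration of the quality check and dilution step, the following minimal sketch applies the purity ranges given above and the relation C1·V1 = C2·V2. The function names and the example readings are hypothetical; only the ratio ranges and the 10 ng/µL target come from the text.

```python
# Purity check and dilution to a 10 ng/uL stock.
# ASSUMPTIONS: function names and example readings are hypothetical.

RATIO_260_280 = (1.7, 1.9)  # purity reference range from the text
RATIO_260_230 = (1.8, 2.0)  # purity reference range from the text
TARGET_NG_PER_UL = 10.0     # final stock concentration from the text


def passes_purity(a260_280, a260_230):
    """True when both absorbance ratios fall inside the reference ranges."""
    return (RATIO_260_280[0] <= a260_280 <= RATIO_260_280[1]
            and RATIO_260_230[0] <= a260_230 <= RATIO_260_230[1])


def dilution_volumes(measured_ng_per_ul, final_volume_ul,
                     target_ng_per_ul=TARGET_NG_PER_UL):
    """Return (DNA volume, buffer volume) in uL using C1*V1 = C2*V2."""
    if measured_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is already below the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / measured_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


# Hypothetical reading: 50 ng/uL, ratios 1.82 and 1.95, 100 uL wanted.
print(passes_purity(1.82, 1.95))
print(dilution_volumes(50.0, 100.0))
```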
2.2. The iPBS Analysis
The iPBS strategy was implemented essentially following the recommendations of Kalendar et al. [46], using a subset of 9 PBS primers (Table 2) [49]. After optimization, PCRs were carried out in a final volume of 20 µL, containing 2 ng of gDNA, 1X Phire Hot Start II Reaction Buffer (ThermoFisher Scientific, Bedford, MA, USA), 0.2 mM each dNTP (VWR, Radnor, PA, USA), 1 µM of one PBS primer, 0.2 µL of Phire Hot Start II DNA polymerase, 0.5 µg/µL BSA (VWR, Radnor, PA, USA), and a supplement of MgCl2 (0.5 mM). A ProFlex PCR System (Applied Biosystems, Waltham, MA, USA) was used to run the amplification reactions, which included an initial denaturation step (98 °C for 30 s), 30 amplification cycles (98 °C for 10 s; annealing temperature described in Table 2, for 30 s; 72 °C for 40 s), and a final extension step (72 °C for 2 min).
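The reaction composition above can be turned into pipetting volumes once stock concentrations are fixed. The sketch below is only an illustration: the final concentrations, the 0.2 µL of polymerase and the 20 µL volume come from the text, whereas every stock concentration and the 1 ng/µL gDNA working dilution are assumptions that should be replaced with the values actually used.

```python
# Per-reaction volumes for the 20 uL iPBS PCR.
# ASSUMPTIONS: all stock concentrations and the gDNA working dilution
# are hypothetical; only the final concentrations come from the text.

FINAL_VOLUME_UL = 20.0

# name: (final concentration, ASSUMED stock concentration), same units
COMPONENTS = {
    "reaction buffer (X)": (1.0, 5.0),
    "dNTPs (mM each)": (0.2, 10.0),
    "PBS primer (uM)": (1.0, 10.0),
    "BSA (ug/uL)": (0.5, 10.0),
    "MgCl2 supplement (mM)": (0.5, 25.0),
}

# Components added as fixed volumes.
FIXED_VOLUMES_UL = {
    "DNA polymerase": 0.2,            # volume stated in the text
    "gDNA (assumed 1 ng/uL)": 2.0,    # 2 ng from an assumed working dilution
}


def reaction_volumes(final_volume_ul=FINAL_VOLUME_UL):
    """Return a dict of volumes (uL); water fills up to the final volume."""
    volumes = {
        name: final / stock * final_volume_ul
        for name, (final, stock) in COMPONENTS.items()
    }
    volumes.update(FIXED_VOLUMES_UL)
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the final reaction volume")
    volumes["water"] = water
    return volumes


for name, volume in reaction_volumes().items():
    print(f"{name}: {volume:.2f} uL")
```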
PCR products (10 µL) were fractionated by agarose gel electrophoresis in 1X TBE buffer under two different conditions. Amplicons in the range of 100–800 bp were resolved in 2% agarose gels (10 cm long) at 60 V for 4 h, while larger PCR products (up to 2.5 kb) were better separated in 1.7% agarose gels (20 cm long) at 120 V for 10 h. The 100 bp DNA Step Ladder (Promega, Madison, WI, USA) was used as the molecular weight marker. Gels were immersed in 1X GelRed (Biotium, Fremont, CA, USA) for 1.5–2 h, and exposed to ultraviolet light in a ChemiDoc XRS+ (BioRad, Hercules, CA, USA) to visualize DNA fragments.
2.3. Cloning and Sequencing of iPBS Fragments
Several polymorphic iPBS bands were excised from agarose gels with sterile scalpels. Amplicons of identical length from different individuals were pooled before purification of DNA fragments with the E.Z.N.A. MicroElute Gel Extraction kit (Omega BIO-TEK, Norcross, GA, USA), following the manufacturer’s instructions. Concentration and purity of DNA preparations were determined spectrophotometrically, as described above.
Purified iPBS amplicons were cloned into the pJET1.2/blunt vector using the CloneJET PCR Cloning kit (Thermo Scientific, Bedford, MA, USA), following the manufacturer’s recommendations. E. coli TOP10 cells were transformed with the ligation mixtures following a CaCl2/heat shock transformation protocol. Transformant colonies were screened by PCR to determine insert lengths using the amplification primers supplied with the kit. Recombinant plasmids bearing the targeted iPBS amplicons were purified with the E.Z.N.A. Plasmid DNA Mini kit (Omega BIO-TEK, Norcross, GA, USA) following the manufacturer’s instructions. Purified plasmids were quantified spectrophotometrically as described above, and prepared for Sanger sequencing.
2.4. Identification of Potential LTRs and IRAP Analysis
To generate a unique consensus sequence for each iPBS fragment, plasmids from at least three positive E. coli clones were sequenced. To discard locus-specific sequences and identify all potential LTRs, the first 200 nucleotides at the 5′ end of both the forward and reverse DNA strands of all sequenced iPBS amplicons were aligned with ClustalW in MEGA X software [54]. This multiple alignment was used to construct a consensus UPGMA tree from 10,000 bootstrap replicates [55]. Tree branches reproduced in less than 80% of replicates were collapsed, and the remaining clusters were the starting point for the next analysis step. The complete iPBS sequences belonging to each cluster were retrieved and compared in search of potential LTRs following the recommendations of Kalendar et al. [49], which essentially involved identifying a conserved region at the 5′ end of the iPBS sequences that starts with a 5′-TG dinucleotide (up to 5 residues from the 3′ end of the PBS primer binding site) and ends with CA-3′. This conserved region (potential LTR) was presumed to be followed by a less conserved locus-specific region, as a consequence of LTR-retrotransposon integration at different genome loci (Figure S1).
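The TG...CA criterion described above can be sketched in a few lines. This is a deliberately simplified illustration, not the procedure used in this work: it assumes that the sequences of one cluster are already oriented with the PBS primer at their 5′ end and that the conserved region is an exact shared prefix, whereas real data require a mismatch-tolerant alignment. The function names and toy sequences are hypothetical.

```python
# Simplified search for a potential LTR (5'-TG ... CA-3') in a cluster.
# ASSUMPTIONS: sequences start at the PBS primer; the conserved region
# is an exact common prefix; names and sequences are hypothetical.


def common_prefix(sequences):
    """Longest prefix shared by all sequences."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return "".join(prefix)


def candidate_ltr(sequences, primer_len, max_gap=5):
    """Return the conserved TG...CA region, or None if the rule fails.

    The TG must start at most `max_gap` residues after the 3' end of
    the primer site; the region ends at the last CA of the shared prefix.
    """
    conserved = common_prefix(sequences)
    window = conserved[primer_len:primer_len + max_gap + 2]
    offset = window.find("TG")
    if offset == -1:
        return None
    start = primer_len + offset
    end = conserved.rfind("CA")
    if end < start + 2:
        return None
    return conserved[start:end + 2]


# Two toy reads: 4-nt primer site, shared region, different flanks.
reads = [
    "ACGT" + "A" + "TGCCGGTTCA" + "GGATCC",
    "ACGT" + "A" + "TGCCGGTTCA" + "TTAAGC",
]
print(candidate_ltr(reads, primer_len=4))
```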
Potential LTRs were then used to design primers for the IRAP technique. Oligonucleotide sequences were designed with the PRIMER3 application included in Gene Runner software [56], keeping their melting temperature (Tm) at about 65 °C. The IRAP technique was implemented using eight different single primers, which recognized potential avocado LTRs. Preparation of PCRs, amplification profiles and electrophoresis conditions were exactly the same as described for the iPBS analysis, with the exception of annealing temperatures (Ta), which were experimentally optimized for each oligonucleotide tested.
2.5. Phylogenetic Inferences
Band patterns obtained from IRAP or iPBS analysis were carefully inspected and converted into binary markers (presence of a band was coded as 1 and its absence as 0). The resulting data matrices were imported into DARwin software [57], which was used to estimate genetic distances between avocado cultivars, applying the Jaccard similarity index. From the distance matrix, phylogenetic trees were obtained using the weighted Neighbor-Joining algorithm implemented in the same software. The robustness of each tree node was assessed with 10,000 bootstrap replicates, and a consensus tree was generated in which branches reproduced in less than 50% of replicates were collapsed.
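To make the scoring step concrete, the sketch below computes Jaccard-based distances from binary band profiles. It is an illustration under stated assumptions rather than a description of DARwin's internals: the distance is taken as one minus the Jaccard similarity, and the band profiles, sample labels and function names are hypothetical.

```python
# Jaccard distance between binary band profiles (1 = band, 0 = no band).
# ASSUMPTIONS: distance = 1 - Jaccard similarity; profiles are hypothetical.

from itertools import combinations


def jaccard_distance(profile_a, profile_b):
    """1 - (shared bands / bands present in at least one profile)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    if either == 0:
        return 0.0  # no bands in either profile: treat as identical
    return 1.0 - shared / either


def distance_matrix(profiles):
    """Pairwise distances as a dict keyed by (name_a, name_b)."""
    return {
        (a, b): jaccard_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


# Hypothetical five-band profiles for three samples.
bands = {
    "sample_1": [1, 1, 0, 1, 0],
    "sample_2": [1, 0, 0, 1, 1],
    "sample_3": [1, 1, 0, 1, 0],
}
for pair, distance in distance_matrix(bands).items():
    print(pair, round(distance, 3))
```

Shared absences (0/0) do not contribute to the index, which suits dominant markers, where the absence of a band in two samples is weak evidence of similarity.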
4. Discussion
Transposable elements have been identified as one of the main sources of genetic variation in plants [50,61,62]. In land plants, retrotransposons are present as high copy number elements, making up a substantial part of their genomes [63], and are currently considered important drivers of evolution [64]. Accordingly, molecular techniques able to detect mobile element-based genetic variation have been validated as useful tools for generating molecular markers [48]. Unlike the IRAP technique, which has been well validated in the literature [44,45,46,47], the iPBS strategy has been described more recently [49]. As far as we know, only one previous retrotransposon-based study has been carried out in P. americana. That work compared the polymorphism detection and discrimination ability of the retrotransposon-based technique Inverse Sequence Tagged Repeat (ISTR) with those of SSR and AFLP analysis, concluding that the average number of polymorphic bands obtained by AFLP and ISTR analysis was higher than with SSR [65]. Nevertheless, iPBS has been used to characterize genetic variation in several genera of agronomic interest, such as Prunus [66], Vitis [67,68], Psidium [69], Phoenix [70], Nicotiana [71], Solanum [72], Allium [73,74], Gnetum [75] or Musa [76]. The clear advantage of the iPBS strategy is that prior knowledge of nucleotide sequences from the target species is not required, since a relatively small set of “universal” primers can be used to analyze any eukaryotic organism [46,49]. Moreover, co-dominant markers, such as microsatellites or SNPs, require allele dosage determination, which is especially difficult for partially heterozygous genotypes in polyploid species. This limitation is usually overcome by transforming co-dominant genotypes into dominant ones, which provides essentially the same binary information as iPBS or IRAP. Therefore, another important advantage arises when iPBS and IRAP are applied to polyploid species, as dominant multi-locus genotypes are directly obtained, which can be immediately analyzed with most available bioinformatic tools [46,49]. Finally, their simple set-up, low cost, and modest requirements for laboratory resources make iPBS and IRAP very attractive tools for population genetic analysis in the field of agronomy.
In the present work, the iPBS strategy was applied for the first time to analyze genetic variability in Persea americana cultivars of agronomic interest, showing adequate repeatability for three PBS primers and revealing a high proportion of polymorphic alleles (63.8 to 78.3%). The haploid genome size of the Hass cultivar is approximately 1.33–1.63 Gb [40], and draft versions of the P. americana var. drymifolia (Mexican horticultural race) and Hass (GxM hybrid) genomes have been published, confirming the presence of 12 chromosomes [77], but these nucleotide sequences were not considered in the present work because they did not include a reference West Indian genome. Selection and sequencing of iPBS fragments allowed us to characterize twelve different potential LTR clusters in P. americana, which were then used to design and validate IRAP primers. As far as we know, the present work also represents the first application of the IRAP technique to investigate genetic variability between P. americana purebreds and hybrid cultivars. To find out whether the genetic information obtained through IRAP analysis justifies the effort required for LTR characterization from iPBS fragments, the same set of avocado gDNA samples was analyzed with three iPBS or three IRAP primers, so that both techniques required a similar effort. With respect to the amount of genetic variation detected, the total number of alleles was about 12% lower for IRAP (89 alleles) than for iPBS (101 alleles), while the percentage of polymorphic alleles was similar (67.3% and 64.0% for iPBS and IRAP, respectively).
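The comparison of allele counts can be checked with a short calculation. The totals (101 and 89) and the percentages (67.3% and 64.0%) are taken from the text; the polymorphic allele counts are not reported in this section and are inferred here by rounding, so they should be read as an assumption. The function name is hypothetical.

```python
# Arithmetic behind the iPBS vs. IRAP allele comparison.
# ASSUMPTION: polymorphic counts are inferred by rounding, not reported.

IPBS_ALLELES = 101
IRAP_ALLELES = 89


def percent_difference(a, b, reference):
    """Difference between a and b as a percentage of `reference`."""
    return (a - b) / reference * 100.0


# Same 12-allele gap, expressed against each total.
gap_vs_ipbs = percent_difference(IPBS_ALLELES, IRAP_ALLELES, IPBS_ALLELES)
gap_vs_irap = percent_difference(IPBS_ALLELES, IRAP_ALLELES, IRAP_ALLELES)

# Polymorphic alleles implied by the reported percentages.
ipbs_polymorphic = round(0.673 * IPBS_ALLELES)
irap_polymorphic = round(0.640 * IRAP_ALLELES)

print(f"gap relative to iPBS total: {gap_vs_ipbs:.1f}%")
print(f"gap relative to IRAP total: {gap_vs_irap:.1f}%")
print(ipbs_polymorphic, irap_polymorphic)
```

The 12% figure corresponds to the gap expressed relative to the iPBS total; relative to the IRAP total, the same 12-allele gap is about 13.5%.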
Moreover, the usefulness of the data generated by both strategies was compared by phylogenetic analysis. Briefly, data provided by the iPBS and IRAP techniques clearly distinguished cultivars with a West Indian component (SS3, Julián and Choquette) from the rest (89% and 99% of bootstrap replicates for iPBS and IRAP, respectively). In addition, a clade formed by the GxM hybrids Hass and Lamb-Hass, as well as another cluster formed by the GxM cultivar Bacon and the Mexican purebred Thomas, were recovered by both techniques. However, the iPBS technique was better able to define genetic relationships among the rest of the cultivars. Overall, the two techniques produced concordant results, although the IRAP strategy showed less ability to discriminate between the Guatemalan purebred and several GxM hybrids.
It must be noted that the literature available on phylogenetic relationships between avocado cultivars is confusing and sometimes contradictory. Therefore, it was difficult to compare our results with other studies but, in general, we found that the iPBS and IRAP phylogenies reached plausible conclusions. For example, the SS3 and Julian cultivars, which are mother and descendant (ICIA, personal communication), were grouped into a well-differentiated clade. The same coherence was observed with the Choquette and Julian cultivars, both GxW hybrids, which are known to be closely related to the West Indian purebred [35]. However, an important discordance was found regarding the clade composed of the Hass and Lamb-Hass cultivars. This clade showed relatively high bootstrap values in both the iPBS and IRAP-based phylogenies, while a microsatellite-based phylogeny suggested a closer association between the Hass and Pinkerton cultivars [78]. Another surprising result concerned the Zutano and Bacon cultivars, which are closely related according to the literature [36,78]. Nevertheless, Bacon was found to be more closely related to Thomas in both the iPBS and IRAP phylogenies. These discordances could be explained by the fact that molecular markers with different rates of change were analyzed in these studies. In addition, botanical characterization of avocado horticultural races has usually been based on a limited set of phenotypic characters, which are not necessarily related to the large number of molecular markers analyzed. Another explanation could be the difficulty of certifying the authenticity of certain hybrid cultivars, due to frequent seed propagation without control of the male parent involved in fertilization, so that two cultivars with the same name could actually be genetically different. This genetic divergence would occur even if fertilization is controlled, since parent cultivars are usually not genetically pure.
In conclusion, the IRAP results were quite similar to those obtained by iPBS regarding the amount of genetic variation detected. However, the ability of our IRAP primers to resolve phylogenetic differences was lower. In addition, the time-consuming work required for LTR identification makes this technique less attractive than iPBS. A clear advantage of iPBS is that it generates potential LTR sequences, which could be used in combination with locus-specific primers to design Retrotransposon-Based Insertion Polymorphism (RBIP) molecular markers that allow a simple PCR characterization of P. americana cultivars and horticultural races, especially in those cultivars with a genetic component from the West Indian race.