1. Introduction
Recent advances in molecular techniques continue to offer new, unprecedented opportunities to increase our understanding of organisms without subjecting them to lethal harm [
1,
2,
3,
4,
5]. Noninvasive genetic sampling offers tremendous utility for conservation practitioners, helping to inform management and recovery decision-making by securing high-yield, high-quality DNA from rare or imperiled taxa. For many vertebrate classes, noninvasive genetic sampling has generally included collecting feces, urine, hair, shed skin, or molted feathers, among other sources [
6,
7,
8,
9,
10]. For many arthropods, and insects in particular, however, the predominant available options have often involved the removal of wing clips, palps, leg clips, or individual legs [
11,
12,
13,
14,
15,
16,
17]. Despite being nonlethal, such methods still require capture and handling of the target organism, which can inflict significant stress, inadvertent damage, or death, and may not be feasible for low-density or at-risk populations. Alternatively, researchers have also successfully secured DNA from frass, exuviae, and hemolymph in defensive secretions [
18,
19,
20,
21,
22]. However, these sources may not be broadly available for many taxa and pose significant challenges for reliable collection in the field. Here, we investigated the effectiveness of using the chorion from hatched ova of the federally endangered Miami blue butterfly (
Cyclargus thomasi bethunebakeri) (Lepidoptera: Lycaenidae) for DNA extraction and analysis. We additionally provide a validated protocol for field collection with the closely related
Callophrys irus irus. This taxon is univoltine throughout its range. In Florida,
C. irus irus is associated with fire-maintained sandhill pine and oak upland habitat that support its sole larval host,
Lupinus perennis.
2. Materials and Methods
2.1. Samples
As part of our proof-of-concept for noninvasive genetic sampling of rare populations, we took advantage of material readily available from a captive breeding population of
C. thomasi bethunebakeri maintained at the McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History in Gainesville, Florida, U.S.A. The primary objective was to determine whether sufficient genetic material for gene sequencing could be recovered from egg case debris left by recently hatched larvae. Secondarily, we were also interested in the effects of non-target material carryover, such as host plant tissue, when target DNA is rare or in low abundance. Non-target tissue carryover can reduce yields of rare target DNA, especially if the non-target tissue is fresher and relatively more abundant. Because these butterfly egg cases are small (~1 mm), delicate, and potentially of low quality, non-target tissue carryover is likely and possibly unavoidable. To address this concern, sampling conditions included variable source material: egg cases only (n = 15) or egg cases + host plant tissue (n = 15;
Figure 1). Treatments with host plant tissue received a hole-punch-sized circle of leaf approximately 1 cm in diameter. Five samples of each source material combination were randomly assigned to each storage treatment in lysis buffer at room temperature for 1, 5, or 14 days prior to DNA extraction (
Figure 1). When collecting egg case samples, time since hatching was not controlled for and ranged between 1 and 10 days. This uncertainty in time since hatching reflects that of field conditions where this variable is often unknown. Initial attempts to prepare egg debris stored in ethanol for DNA extraction were unsuccessful given the small size (<1 mm) and static charges. Therefore, for this proof-of-concept all samples were stored at 4 °C and directly in 180 µL of ATL lysis buffer (Cat. #1014758, QIAGEN, Hilden, Germany) used for DNA extractions. While sample storage in lysis buffer may not be ideal, its use represents a compromise between the challenges of field collection and working with the material. To ensure that storage in lysis buffer did not impact the sample and downstream genetic analyses, additional controls were included: host plant tissue only (n = 3), adult tissue (n = 3), and adult tissue + host plant (n = 3). All adults were freshly deceased from natural causes.
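The balanced design described above (two source materials, three storage times, five replicates per combination) can be sketched in a few lines. This is only an illustrative sketch in Python, not the authors' procedure: the sample labels, the seed, and the function name `assign_treatments` are assumptions introduced here.

```python
import random
from collections import Counter

TISSUES = ["egg_case", "egg_case+host_plant"]  # 15 samples of each (from the text)
STORAGE_DAYS = [1, 5, 14]                      # days in lysis buffer (from the text)
REPLICATES = 5                                 # samples per tissue x storage cell

def assign_treatments(seed=0):
    """Randomly assign the 15 samples of each tissue type to storage times.

    Each storage time receives exactly REPLICATES samples per tissue type.
    The seed and the sample labels are hypothetical.
    """
    rng = random.Random(seed)
    design = []
    for tissue in TISSUES:
        # Build 15 storage slots (5 per storage time) and shuffle them.
        slots = [days for days in STORAGE_DAYS for _ in range(REPLICATES)]
        rng.shuffle(slots)
        for i, days in enumerate(slots, start=1):
            design.append(
                {"sample": f"{tissue}_{i:02d}", "tissue": tissue, "storage_days": days}
            )
    return design

design = assign_treatments()
counts = Counter((row["tissue"], row["storage_days"]) for row in design)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

The nine control samples (host plant only, adult tissue, adult tissue + host plant) are left out of the sketch because the text does not state how they were distributed across storage times.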
2.2. DNA Extraction and Amplification
DNA was extracted from all samples per the manufacturer’s instructions for animal tissue using the QIAGEN DNeasy Blood & Tissue kit, with some modifications. Tissue in lysis buffer was not mechanically disrupted prior to the initial incubation. Incubation in lysis buffer and proteinase K was overnight (~12 h). To avoid clogging the silica membrane, only lysate was applied to spin columns, avoiding large tissue debris such as that from host plant material. A single final elution of 50 µL was performed. Total DNA in the resulting extracts was quantified using the high-sensitivity Qubit dsDNA assay (Invitrogen, Carlsbad, CA, USA) per the manufacturer’s instructions with 2 µL of extract. Aliquots of DNA were normalized to the lowest total DNA concentration with molecular-grade water prior to PCR, so that the starting material for amplification was as similar as possible across samples.
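Normalizing every extract to the lowest measured concentration is a C1V1 = C2V2 dilution. The sketch below shows the arithmetic, assuming a hypothetical 10 µL working aliquot and made-up concentrations for samples B and C; only the 1.1 ng/µL value appears in the text (as the normalized template concentration).

```python
def normalize_to_lowest(conc_ng_per_ul, final_volume_ul=10.0):
    """Return the target concentration and a per-sample dilution plan.

    conc_ng_per_ul: mapping of sample name -> measured dsDNA concentration.
    final_volume_ul: volume of each normalized aliquot (an assumed value).
    The plan maps each sample to (µL of extract, µL of water).
    """
    target = min(conc_ng_per_ul.values())
    if target <= 0:
        raise ValueError("cannot normalize to a zero or negative concentration")
    plan = {}
    for name, conc in conc_ng_per_ul.items():
        # C1 * V1 = C2 * V2  ->  V1 = V2 * (C2 / C1)
        extract_ul = final_volume_ul * (target / conc)
        plan[name] = (round(extract_ul, 2), round(final_volume_ul - extract_ul, 2))
    return target, plan

# Hypothetical Qubit readings in ng/µL.
readings = {"A": 1.1, "B": 2.2, "C": 5.5}
target, plan = normalize_to_lowest(readings)
print(target)   # 1.1
print(plan)     # {'A': (10.0, 0.0), 'B': (5.0, 5.0), 'C': (2.0, 8.0)}
```

The least concentrated extract is used undiluted, which is why the lowest yield sets the template concentration for every reaction.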
A 640 bp fragment of the barcoding gene COI was amplified by PCR using Lepidoptera-specific primers from Hebert et al. [
17]: LEP-F1, 5′-ATTCAACCAATCATAAAGATAT-3′; and LEP-R1, 5′-TAAACTTCTGGATGTCCAAAAA-3′. The COI barcoding gene was chosen because reference sequences were available for vouchered specimens of
C. thomasi bethunebakeri, enabling us to confidently confirm species identity via sequencing. Additionally, COI is one of the most commonly sequenced barcoding genes in animals, making it a versatile marker for similar research using tissue of possibly unknown origin where species identity needs to be confirmed. A master mix was prepared for 20 µL reactions, each containing 1 unit of Platinum Taq DNA Polymerase (Invitrogen), 1× PCR Buffer, 2.0 mM MgCl₂, 0.4 mM dNTPs, 0.2 µM of each primer, 13.4 µL of PCR-grade H₂O, and 2 µL of DNA template normalized to 1.1 ng/µL. Two reactions contained PCR-grade H₂O in lieu of template to serve as negative controls. Thermocycling conditions were modified from Hebert et al. [
17] and consisted of one cycle of 1 min at 94 °C, six cycles of 30 s at 94 °C, 40 s at 48 °C, and 1 min at 72 °C, followed by 42 cycles of 30 s at 94 °C, 40 s at 51 °C, and 1 min at 72 °C, with a final extension step of 1 min at 72 °C.
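The thermocycling program above can be written down as a small data structure, which makes the cycle count and the total hold time easy to check. This is a minimal sketch; the names `PROGRAM`, `total_hold_seconds`, and `amplification_cycles` are introduced here for illustration, and ramp times between temperatures are ignored.

```python
# Each stage: (number of cycles, [(temperature in °C, hold time in seconds), ...])
PROGRAM = [
    (1,  [(94, 60)]),                      # initial denaturation
    (6,  [(94, 30), (48, 40), (72, 60)]),  # low-stringency cycles
    (42, [(94, 30), (51, 40), (72, 60)]),  # main amplification cycles
    (1,  [(72, 60)]),                      # final extension
]

def total_hold_seconds(program):
    """Sum of all programmed hold times, excluding ramping between steps."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)

def amplification_cycles(program):
    """Count cycles in multi-step stages (denature, anneal, extend)."""
    return sum(cycles for cycles, steps in program if len(steps) > 1)

total = total_hold_seconds(PROGRAM)
print(amplification_cycles(PROGRAM))  # 48
print(total, total / 60)              # 6360 106.0
```

Under these conditions the run comprises 48 amplification cycles and about 106 min of programmed hold time.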
A 5 µL aliquot of each PCR product was combined with 1 µL of 5× loading dye (Bioline) for agarose gel electrophoresis. The product–dye mix was run for 120 min at 100 V alongside 5 µL of HyperLadder™ 50 bp (Bioline) on a 2% TAE agarose gel stained with ethidium bromide. Additionally, PCR products were quantified using the broad-range Qubit dsDNA assay (Invitrogen) per the manufacturer’s instructions with 2 µL of product.
2.3. Validation
To quantitatively assess DNA extraction and gene amplification yields and capture methodological variation, mean differences in dsDNA recovered and amplicon abundance were tested for using an analysis of variance (
aov {stats}) with Tukey’s Honest Significant Difference (
TukeyHSD {stats}) post hoc test in R version 3.4.1 [
23].
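For readers unfamiliar with what an analysis of variance computes, the sketch below works through the F statistic of a one-way ANOVA by hand. It is not the authors' analysis: they used `aov` and `TukeyHSD` in R, their model formula is not given in the text, and the yield values here are invented purely for illustration. The p-value and the Tukey post hoc comparisons are deliberately omitted.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of observations, one list per treatment level.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical dsDNA yields for three storage times (not data from this study).
yields = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(yields)
print(round(f_stat, 3), df_b, df_w)  # 13.0 2 6
```

A large F indicates that differences among treatment means are large relative to the scatter within treatments; the post hoc test then identifies which pairs of treatments differ.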
Identity of the resulting amplicons was validated via Sanger sequencing. A single exemplar PCR product from each treatment combination (tissue and storage time; n = 9;
Table S1) was sent to Genewiz (Plainfield, NJ, USA) for purification and sequencing in the forward and reverse directions. Consensus sequences were generated from successfully sequenced and quality trimmed amplicons using Geneious v. 11.1.5 [
24]. Gene and species identity were confirmed via NCBI’s BLASTn megablast [
25] and GenBank [
26], which contains COI sequences from vouchered specimens of
C. thomasi (KY412475.1).
2.4. Field Testing
The efficacy of these methods with natural samples was tested by sampling egg cases of frosted elfin butterfly (Callophrys irus irus) populations. Citizen scientists were recruited, trained, and subsequently conducted the collection of all field samples. Participating individuals were experienced and knowledgeable butterfly watchers. Field and classroom training was provided for egg detection, collection, and storage along with a simple written protocol. A total of 84 egg case samples were collected from 23 locations in Apalachicola National Forest and Blackwater River State Forest in Florida, USA during March 2018. Collection sites were distributed across six burn units and, if possible, egg case collections were done in multiple sites within a burn unit. Sampling across and within burn units was conducted because each unit has a different management history, potentially impacting organism abundance and gene flow. In most cases, egg cases were collected from multiple plants found in the same patch of host and combined into a single sample. Pooling of egg cases within a host plant patch was performed to increase target tissue abundance and the chances of recovering target DNA. The number of eggs in each sample ranged from 2 to 20. Upon collection, the eggs were immediately preserved in ATL lysis buffer (QIAGEN, Cat. #1014758) and stored at −20 °C until DNA extraction.
Extraction and COI gene amplification were carried out as described for the proof-of-concept with
C. thomasi bethunebakeri. Unlike for the proof-of-concept, no treatments for storage time or non-target tissue were carried out. Correlation between quantity of egg case starting material and gDNA yield was tested for using Pearson’s product-moment correlation as computed in R using
cor {stats} and
cor.test {stats}. For samples that failed to amplify following this methodology, additional amplification reactions were performed using the universal COI primers LCO1490, 5′-GGTCAACAAATCATAAAGATATTGG-3′, and HCO2198, 5′-TAAACTTCAGGGTGACCAAAAAATCA-3′, designed by Folmer et al. [
27] and the same cycling conditions described for the Lepidoptera-specific primers used for the proof-of-concept above. These primers were designed to amplify a 710-bp region of COI across invertebrates. All PCRs contained 2.1 µL of template DNA, 0.2 µM of each primer, 0.5 µg/µL of bovine serum albumin (BSA), and 10 µL of 2× OneTaq Hot Start Quick-Load Master Mix (New England BioLabs, Ipswich, MA, USA) in a total volume of 20 µL. PCR products were analyzed by gel electrophoresis, purified, and sequenced via Sanger sequencing at Eurofins Genomics (Louisville, KY, USA).
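The correlation between the number of egg cases in a sample and gDNA yield, mentioned earlier in this paragraph, was computed by the authors with `cor` and `cor.test` in R. As a hedged illustration of the underlying calculation, the sketch below computes Pearson's r and the associated t statistic by hand in Python. The egg counts and yields are invented, and no p-value is computed.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical values: egg cases per pooled sample and gDNA yield (ng/µL).
eggs = [2, 4, 6, 8, 10]
gdna = [1.0, 3.0, 2.0, 5.0, 4.0]
r = pearson_r(eggs, gdna)
t = t_statistic(r, len(eggs))
print(round(r, 3), round(t, 3))  # 0.8 2.309
```

`t_statistic` is undefined for a perfect correlation (r = ±1), which the sketch does not guard against.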
Alignment and editing of resulting sequence data were performed in Geneious v. 11.1.5. To confirm species identity and eliminate non-target sequences, sequences were queried against the GenBank nucleotide database. These validated sequences were retained for downstream analyses if there was a minimum of 99% similarity over at least 96% of the query sequence to the C. irus reference sequence.
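The retention rule above (at least 99% similarity over at least 96% of the query) amounts to a two-threshold filter on each database hit. The sketch below is one possible way to express it; the `Hit` record, its field names, and the example hits are hypothetical and do not reproduce any particular BLAST output format.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One database match for a query sequence (hypothetical record layout)."""
    query_id: str
    subject: str         # organism name of the matched reference
    pct_identity: float  # percent identical positions in the alignment
    query_length: int    # length of the query sequence in bp
    align_start: int     # first aligned query position (1-based, inclusive)
    align_end: int       # last aligned query position (inclusive)

def query_coverage(hit):
    """Percent of the query covered by the alignment."""
    return 100.0 * (hit.align_end - hit.align_start + 1) / hit.query_length

def passes_filter(hit, target, min_identity=99.0, min_coverage=96.0):
    """Keep a hit only if it matches the target taxon and both thresholds."""
    return (
        target in hit.subject
        and hit.pct_identity >= min_identity
        and query_coverage(hit) >= min_coverage
    )

hits = [
    Hit("s1", "Callophrys irus", 99.5, 640, 1, 640),    # full-length match
    Hit("s2", "Callophrys irus", 99.5, 640, 1, 600),    # coverage 93.75%
    Hit("s3", "Lupinus perennis", 100.0, 640, 1, 640),  # non-target (host plant)
]
kept = [h.query_id for h in hits if passes_filter(h, "Callophrys irus")]
print(kept)  # ['s1']
```

Keeping the thresholds as parameters lets the same function be reused when a different identity cutoff is appropriate for another locus or reference taxon.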
To assess population structure and connectivity, additional loci with polymorphic sites were needed. Therefore, the nuclear gene elongation factor 1 alpha (EF1) was also amplified for all samples with validated COI sequences. Using primers ef44 5′-GCYGARCGYGARCGTGGTATYAC-3′ and efrcM4 5′-ACAGCVACKGTYTGYCTCATRTC-3′ [
28] a 1064-bp fragment of EF1 was targeted. PCR for EF1 was carried out under the following conditions: 94 °C for 2 min; 40 cycles of 94 °C for 30 s, 58 °C for 40 s, and 68 °C for 90 s; and 68 °C for 5 min. Reactions were the same as described for COI PCR above. Because there was no reference sequence for
C. irus EF1, only sequences matching the closely related
Ahlbergia korea with a minimum of 98% similarity over at least 96% of the query sequence were retained for downstream analyses.
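The EF1 primers given above contain IUPAC ambiguity codes (Y, R, V, K), so each primer is really a pool of distinct oligonucleotides. The sketch below counts how many sequences each degenerate primer represents. The IUPAC code table is standard; the function name `degeneracy` is introduced here for illustration.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])  # raises KeyError on a non-IUPAC character
    return total

EF44 = "GCYGARCGYGARCGTGGTATYAC"    # forward primer from the text
EFRCM4 = "ACAGCVACKGTYTGYCTCATRTC"  # reverse primer from the text

print(len(EF44), degeneracy(EF44))      # 23 32
print(len(EFRCM4), degeneracy(EFRCM4))  # 23 48
```

Higher degeneracy broadens the range of templates a primer can bind but lowers the effective concentration of any single exact-match oligonucleotide, which is one reason PCR optimization may matter for this locus.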
Sequences passing our sequence similarity quality filters for both COI and EF1 were used to obtain a coarse overview of genetic variation and population structure by conducting an analysis of molecular variance (AMOVA) in R (poppr.amova {poppr}). Burn unit and then locality were defined as the nested hierarchy for the AMOVA. Statistical significance of variance components was computed with a permutation test using randtest {ade4} and 999 permutations.
4. Discussion
To our knowledge, this study is the first to demonstrate that it is possible to extract DNA of sufficiently high quantity and quality from residual butterfly egg debris for successful gene sequencing. It additionally describes a simple, low-cost, and reliable method of collecting and storing egg samples that can be adopted for field or laboratory work as well as deployed with projects that have a larger geographic scope and/or involve citizen scientists.
Although the methods used for field sample testing did not reveal statistically significant population structure (if any exists), additional sampling, additional loci, and genotyping technologies that can better identify heterozygosity would improve the method's utility. Even though resolving population structure would require an expanded study design, at a minimum the presence of a rare species can be confirmed using our sampling methods and COI DNA barcoding. Much of the variation observed in sequencing success rates and DNA extraction yields likely reflects how degraded the sample had become before it was collected and stored in lysis buffer. Sample DNA degradation could be evaluated in advance of PCR using agarose gel electrophoresis or spectrophotometry. However, DNA yields are often so low that gDNA gels would exhaust the available extract, and specialized equipment such as the NanoDrop spectrophotometer for evaluating degradation with less material might not be available. Proceeding directly to PCR and sequencing is therefore likely the most effective and accessible means of testing material quality. For example, EF1 sequencing of field-collected samples had poor success, likely due to degraded sample DNA. Both the large gene fragment size and the pooled nature of samples may also have contributed to poor sequencing success. However, for both genes, PCR optimization and collecting fewer egg cases per sample can likely improve sequencing success rates. When the age and level of sample degradation are unknown, our field testing shows that collecting usable genetic data is possible with as few as two egg cases. The proof-of-concept shows that sequencing can be successful from samples with gDNA concentrations as low as 1.1 ng/µL, even when carry-over non-target host plant tissue is present. Additionally, both the proof-of-concept and field testing demonstrate the usefulness and success of sample storage directly in lysis buffer.
Our results indicate that this procedure is effective and potentially broadly applicable to insect population and conservation research, not only for rare or at-risk (e.g., imperiled, threatened, or endangered) insect taxa but also for common, unthreatened species. Similarly, the use of residual egg debris has advantages over other documented noninvasive genetic sampling methods involving larval exuviae, frass, or the more traditional removal of tissue samples (e.g., palps, wing clippings, legs) from adult organisms. These can be logistically problematic, as they often require longer organism holding times, necessitate organism removal from the environment, or may yield a limited number of viable samples [
18,
19,
20,
21,
22]. For example, Saarinen et al. [
13] demonstrated that wing clips represent a viable, non-lethal method of obtaining DNA from a federally endangered butterfly. This technique however required temporary organism capture and handling as well as a sufficient number of adult organisms readily available for sampling. Similarly, Monroe et al. [
11] and Scriven et al. [
22] evaluated the effectiveness of using insect feces with somewhat mixed results. In both cases, the target organisms additionally needed to be captured and temporarily maintained in captivity to successfully obtain samples. While Hamm et al. [
14] demonstrated that the removal of small amounts of hind wing material had no significant impact on butterfly behavior or survival, the authors do suggest that wing clipping may not be appropriate for taxa with higher wing loading. Such methodologies may additionally be unavailable due to factors including permit restrictions that might be in place to limit organism handling or mitigate potential injury (i.e., mortality, damage, or stress). The collection of residual egg debris by contrast requires no living organism contact, and thereby would not potentially compromise small populations of rare or endangered species while at the same time allowing adequate sample sizes for population-level genetic analysis. It furthermore offers a potentially large pool of available samples that can be obtained over a somewhat longer time period than simply the phenology of a particular developmental stage or under weather conditions such as cloudy skies or cool temperatures that would typically not be optimal for adult activity and collection. This in turn would help provide added sample collection flexibility and/or opportunity.
The simplicity of the sample collection and storage protocol facilitates the use of trained citizen scientists to conduct field sampling as was done for this study, thereby significantly increasing both the project scope and the potential total number of samples collected. While volunteer training for field survey and collection protocols was necessary, the time commitment was minimal requiring just 1–1.5 days. This is substantially less in our opinion than what would be necessary to reach the appropriate skill level for adult collection, handling, and sampling while ensuring appropriate safeguards.
Residual egg debris sample collection is not without potential limitations. It is inherently labor-intensive owing to the small size of egg cases and their overall limited detectability in the larger landscape. Detailed ecological, life history, and behavioral knowledge of the target organism such as oviposition behavior and preference can help identify priority search areas and enhance detection probability in the field. In the case of Callophrys irus irus, the encounter rate of adult butterflies in Florida populations is generally low and quite limiting for nondestructive tissue sample collection. Egg and larval abundance by contrast tends to be locally high. Habitat access or impact may also be a factor. For egg debris collection to be effective, extensive larval host plant examination is required. This could potentially result in negative impacts to sensitive habitat areas such as trampling or disturbance to other rare taxa. All of these factors of course vary tremendously from one taxon to another, necessitating a highly targeted approach tailored to the specific organism and circumstance.
Sample degradation is also a potential concern. As the proof-of-concept indicates, samples known to be fresh when collected offer the best chance of generating genetic data. However, even when there is uncertainty surrounding tissue and collection conditions, this study demonstrates that generating usable genetic data is possible even from samples with DNA yields below detection limits. Much of the variation observed in the DNA extraction and sequencing success of field samples likely reflects how degraded the sample had become before it was collected and stored in lysis buffer. It is unknown how long residual egg debris can reside in the environment and remain viable for DNA extraction and analysis. This undoubtedly depends on many factors, including total time elapsed, time of year, geographic location, sample location, ambient temperature, and light intensity. Additional trials are needed to better understand the potential temporal limitations of this nondestructive genetic sampling technique and sample source.