### 1.2.2. Novel Computational Resources and Software

The Barcode of Life Data System (BOLD, https://www.boldsystems.org/, accessed on 26 January 2022 [18]) has been the core bioinformatics resource dedicated to hosting DNA barcode sequence data since it was launched in 2007. In addition, many computational resources and software packages have been developed to accommodate the expanding role of DNA barcodes. Some of these packages (e.g., MDOP [19]) help researchers to organize DNA barcoding data before uploading to databases such as BOLD and NCBI's GenBank, whereas others are designed to assess the quality of data that have already been made publicly available (e.g., BAGS [20] and MACER [21]). The quality of DNA barcode data can be impaired by a number of factors, including poor sequence annotation, a lack of physical specimen vouchers, poor sequence quality, and incorrect consensus sequence building. The last of these factors is especially problematic for DNA barcoding methods based on high-throughput sequence reads. Fortunately, several recent software packages have been developed to address challenges with consensus sequence building, such as PIPEBAR, OverlapPER [22], and NGSpeciesID [23].
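To make the idea of consensus sequence building concrete, the following is a minimal sketch of a majority-rule consensus over reads that are assumed to be already aligned and of equal length. It is not the algorithm used by PIPEBAR, OverlapPER, or NGSpeciesID; the function name `majority_consensus`, the `min_fraction` parameter, and the example reads are hypothetical and serve only to illustrate how a single erroneous read can be outvoted and how an unresolved position can be flagged rather than guessed.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_fraction=0.5, ambiguous="N"):
    """Return a majority-rule consensus for equal-length, pre-aligned reads.

    A base is called only if it accounts for more than `min_fraction` of the
    informative (non-gap, non-N) bases in its column; otherwise the position
    is reported as `ambiguous`.
    """
    if not aligned_reads:
        raise ValueError("no reads supplied")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Ignore gaps and ambiguous calls when counting votes.
        bases = [b.upper() for b in column if b not in "-Nn"]
        if not bases:
            consensus.append(ambiguous)
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / len(bases) > min_fraction else ambiguous)
    return "".join(consensus)


# Hypothetical aligned reads: two carry a single substitution, one has a gap.
reads = ["ACGTAC", "ACGTTC", "ACGAAC", "AC-TAC"]
consensus = majority_consensus(reads)
print(consensus)
```

Reporting tied or poorly supported positions as `N` is a deliberately conservative choice in this sketch: an ambiguous base in a reference barcode is generally less harmful than a confidently wrong one.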

Taxonomic assignment is key for downstream applications of DNA barcode sequences, and the accuracy of the approaches that assign sequences from unknown taxa to a recognized barcode sequence is critical [24]. Despite the development of several tools to accurately assign sequences to taxa represented in barcode sequence databases, comparison across software has demonstrated that it remains challenging to accurately assign sequences to taxa at or below the level of genus [25]. Taxonomic assignment methods are being developed and refined rapidly, with several options published in just the last four years. Among these are the QIIME2 feature classifier [26], IDTAXA [27], MeTaxa2 [28], and Basta [29]. Although the methodology to perform taxonomic assignment is quickly evolving, older methods such as Kraken2 [30], Protax [31], and the longstanding BLAST tool [32] remain accurate, still perform well, and continue to be used. Beyond these methods, other options are optimized for clade-based metabarcoding reference databases (e.g., Fungi: funbarRF [33]) or have been developed as part of custom pipelines that address more specific user needs (e.g., the Anacapa Toolkit [34]). Ultimately, the ability of any computational method to accurately match a sequence from an unknown species is dependent upon well-curated, annotated, and comprehensive reference sequence databases. Focus should remain on populating DNA barcode reference databases with high-quality sequence data from accurately identified and vouchered collections.
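The difficulty of assigning sequences at or below the genus level can be illustrated with a toy similarity-based classifier. The sketch below is not how the QIIME2 feature classifier, IDTAXA, Kraken2, Protax, or BLAST work internally; it assumes a simple k-mer Jaccard similarity, and the function names, the cutoff values (`species_cutoff`, `genus_cutoff`), and the reference sequences are all hypothetical. It shows only the general logic: a query is reported at the species level when it closely matches a reference, at the genus level when the match is weaker, and as unassigned otherwise.

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(query, references, k=4, species_cutoff=0.9, genus_cutoff=0.6):
    """Assign a query to the best-matching reference at a cautious rank.

    `references` maps 'Genus species' names to barcode sequences.
    Returns (rank, taxon, score); cutoffs are illustrative assumptions.
    """
    query_kmers = kmers(query, k)
    best_name, best_score = None, 0.0
    for name, seq in references.items():
        score = jaccard(query_kmers, kmers(seq, k))
        if score > best_score:
            best_name, best_score = name, score

    if best_name is None or best_score < genus_cutoff:
        return ("unassigned", None, best_score)
    if best_score >= species_cutoff:
        return ("species", best_name, best_score)
    # Similar enough for the genus, but not for a species-level call.
    return ("genus", best_name.split()[0], best_score)


# Hypothetical reference library (sequences invented for illustration).
refs = {
    "Quercus alba": "ACGTACGGTCAAGCTTAGGC",
    "Acer rubrum": "TTGACCGATAGGCTAACCGT",
}

exact = assign("ACGTACGGTCAAGCTTAGGC", refs)    # identical to a reference
variant = assign("ACGTACGGTCTAGCTTAGGC", refs)  # one substitution
unknown = assign("GGGGGGGGGG", refs)            # shares no k-mers
print(exact[:2], variant[:2], unknown[:2])
```

In this toy setting a single substitution in a short sequence is enough to drop the call from species to genus, which mirrors why comprehensive reference libraries with several sequences per species matter for fine-scale assignment.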

### 1.2.3. National and International Sequencing Consortia

The effort to contribute DNA barcode sequence data is coordinated worldwide through both national and international organizations. Coordination of international barcoding activities began in 2004 with the Consortium for the Barcode of Life, followed by the International Barcode of Life Project (iBOL, https://ibol.org/, accessed on 26 January 2022) in 2008. National efforts have also been launched in Austria (ABOL), Finland (FinBOL), Germany (GBOL), the Netherlands (NBOL), Norway (NorBOL), and Switzerland (Swiss-BOL) to name a few. Most recently, BIOSCAN [35], an international project organized by iBOL, was initiated and includes 1000 researchers in over 30 countries with the objective of generating DNA barcodes to discover species, to understand species interactions, and to monitor species in a global biological surveillance system. Once achieved, the collective goals of these organizations will result in a DNA barcode library for nearly all species on Earth.

In the nearly two decades since DNA barcodes were first proposed, other ambitious and sweeping networks have emerged that also reflect the fundamental goal of the DNA barcoding community: to leverage organismal DNA to understand life on Earth. One of these, the Global Genome Biodiversity Network (GGBN, https://www.ggbn.org/ggbn\_portal/, accessed on 26 January 2022 [36]) represents a network of well-curated tissue collections that seeks to develop standards, share collection information, and facilitate biodiversity genomics research. More recently, the Earth BioGenome Project (EBP; https://www.earthbiogenome.org/, accessed on 26 January 2022) was launched [37] as a "moonshot" [38] for biology that aims to sequence whole genomes of all eukaryotic species on Earth in ten years. Although not specifically aimed at DNA barcode loci, EBP will indirectly provide a wealth of sequence data for the major DNA barcode loci of plants, animals, and fungi. DNA barcoding, which was originally considered to be at one end of the sequence spectrum, is now converging with entire genomes [39]. These global efforts, which have been described as "networks of networks," build connections among more localized, often national endeavors.

The organization of DNA barcoding projects has often followed geopolitical boundaries, and the most common denominator for large sequencing programs reflects local, regional, or national funding structures. Some examples at regional and national levels include the African Centre for DNA Barcoding (https://www.acdb.co.za/, accessed on 26 January 2022 [40]), the Canadian Centre for DNA Barcoding (https://ccdb.ca/, accessed on 26 January 2022), and the China Plant BOL (Barcode of Life) Group [41]. In a similar way, the United Kingdom's Darwin Tree of Life Project (https://www.darwintreeoflife.org/, accessed on 26 January 2022 [42]) takes a geopolitical approach toward its goal of sequencing the whole genomes of all eukaryotic species in Britain and Ireland. These focused, localized research networks contribute to international goals that help support the shared priority of advancing a global understanding of biodiversity and facilitate the use of DNA barcodes and other genetic tools for broader ecological, evolutionary, and conservation purposes.

### 1.2.4. Building the Plant DNA Barcode Library

With more than half a million plant DNA barcode sequences available today in the Barcode of Life Data Systems (BOLD, Figure 1), continuing to populate the global library is a major effort of botanists. In addition to the national and multinational projects described above, building the plant DNA barcode library can be enhanced by taking advantage of a number of diverse efforts, such as forest monitoring plots, individual lineage-based taxonomic studies, and regional floristic efforts. Forest monitoring plots, such as the Smithsonian's Forest Global Earth Observatories (ForestGEO) and the National Science Foundation's Long Term Ecological Research (LTER) sites, are rich resources because they have well-verified identifications, vouchered collections, and individually tagged trees that can be revisited by botanists if necessary [43–46]. Even if no specific monitoring plots have been established, many studies have generated DNA barcode libraries for specific habitats [47], plant communities [48], or regional taxa [49–51] and are thereby expanding the global plant genetic library. Individual taxonomists are also generating DNA barcodes for specific groups of plants as either standard markers (e.g., [52–55]) or as an offshoot of their basic molecular phylogenetic investigations aimed at understanding evolutionary relationships. Preserved museum specimens can also be used to generate DNA barcodes [56]. It is significant that one recent study has encouraged a large-scale effort to sequence DNA barcodes from all types of specimens [57]. All of these DNA sequences add to the library of standard DNA barcode markers even if they do not carry the official GenBank DNA barcode designation.

Other efforts to generate DNA barcodes for entire regional floras are in some cases complete and in others just getting underway. One of the most impressive is the library that has been built for identifying the vascular plants of Canada [58], which includes sequence records (*rbcL*, *matK*, and ITS2) for 96% of the 5108 species known from that country. Another success story for plant DNA barcodes is the China Plant Barcode of Life [41]. This sixteen-year project has now generated and made available for use 120,000 DNA barcodes for 16,000 species, representing a significant sampling of the entire flora of China. Other examples are the recently completed DNA barcode library for the plants of the UK [59] and work that has started on the flora of the Arabian Peninsula [60].
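The coverage figures quoted for Canada and China can be turned into approximate counts with simple arithmetic. The short sketch below uses only the numbers given in the text; the variable names are hypothetical, the Canadian count is approximate because the 96% figure is rounded, and the Chinese value is a mean that may combine several barcode loci per species.

```python
# Figures as stated in the text for the Canadian vascular flora [58].
canada_species = 5108
canada_coverage = 0.96

# Approximate number of species with sequence records (96% is a rounded value).
canada_with_records = round(canada_species * canada_coverage)
canada_missing = canada_species - canada_with_records

# Figures as stated in the text for the China Plant Barcode of Life [41].
china_barcodes = 120_000
china_species = 16_000

# Mean number of barcodes per species; loci are not distinguished here.
china_mean_per_species = china_barcodes / china_species

print(canada_with_records, canada_missing, china_mean_per_species)
```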

### *1.3. The Purpose and Structure of This Review*

Today, more than ever, DNA barcodes are being used to advance our understanding of how species evolve, how they interact, and how we can slow down their extirpation and extinction (e.g., [61–63]). As sequencing technologies have improved and sequencing costs have declined, the use of DNA barcoding is skyrocketing and some of the most exciting prospects for using this new taxonomic tool are being realized. A number of comprehensive reviews of the application of plant DNA barcodes to the fields of ecology, evolution, and conservation have been provided in the past [5,64–68]. This review and the Special Issue of *Diversity* of which it is a part focus on current areas of research as well as new applications of DNA barcodes that are the direct result of the accumulation of barcode reference sequences, including past trials, experiments, and applications of this twenty-first century biological tool (Figure 2).

**Figure 2.** A graphical representation of DNA barcoding today. DNA barcode applications in ecology (left), evolution (top), and conservation (right) are supported by a foundation of collections, metadata, and informatics (bottom). These applications are facilitated by increasingly large DNA barcode reference databases (center circle) that are reciprocally built from and contribute to the major biological disciplines. National and international initiatives that support the growth of DNA barcode reference databases are core resources (green circle).
