**4. Discussion**

The traditional use of herbal medicine for disease treatment, and its role as a source of precursors for several important drugs [2,73], make the accurate identification of medicinal plants essential. The results of our study suggest that DNA barcoding can enhance the accurate identification of medicinally important species. Our study is among the first to use multiple DNA barcode markers for this purpose. It confirms the potential of the barcoding approach for accurately identifying medicinally useful plants from the UAE, which will help generate a reference dataset for research and other applications.

We investigated the efficacy of three DNA barcode regions (rbcL, matK, and ITS2) for discriminating selected medicinal plant species belonging to the order Caryophyllales. The first step in assessing the candidate barcodes was to estimate the universality of amplification and the sequencing success rate across the studied taxa. The matK region showed a lower amplification rate (60%) than rbcL and ITS2, even though two matK primer pairs were tested in several attempts under different conditions (Figure 2a). The matK (P1 and P2) pair achieved high amplification success, whereas the matK (P3 and P4) pair resulted in low recovery (only one sample was amplified successfully). Inconsistent success rates have been reported for matK before: several studies have indicated that the matK region amplifies less reliably than other regions in different angiosperms and gymnosperms, including some arid desert plants [74–76]. The universality issues of the matK primers could be attributed to nucleotide variation in the primer-binding sites, which can inhibit PCR amplification [74], or to the large amplicon size (≈900 bp), which makes amplification more susceptible to template DNA degradation [77]. Cräutlein et al. [78] suggested that further effort is needed to improve matK primer design and achieve higher efficiency. For sequencing, a higher proportion of good-quality sequences (80%) was obtained for rbcL than for the other two regions. This result is consistent with previous studies that compared three coding barcode loci (matK, rbcL, and rpoC1) for discriminating different plants of the UAE and concluded that rbcL was the most effective in discriminating between species [15,56,57].
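The primer-binding explanation above can be made concrete by counting mismatches between a primer and its binding site in a template. The sketch below is illustrative only: the function name `primer_mismatches`, the five-base 3' window, and the example sequences are hypothetical assumptions, not the study's primers or data. It treats IUPAC degenerate codes in the primer as matching any base they represent and reports 3'-terminal mismatches separately, because mismatches near the 3' end are generally the most disruptive to primer extension.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer: str, site: str, three_prime_window: int = 5) -> tuple[int, int]:
    """Count mismatches between a primer and its binding site.

    `site` is the template region written 5'->3' in the same orientation as
    the primer, so the primer's 3' end is its last base. Returns
    (total mismatches, mismatches within the last `three_prime_window` bases).
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    window_start = len(primer) - three_prime_window
    total = three_prime = 0
    for i, (p, s) in enumerate(zip(primer.upper(), site.upper())):
        if s not in IUPAC[p]:  # a degenerate primer base matches any base it encodes
            total += 1
            if i >= window_start:
                three_prime += 1
    return total, three_prime


# Hypothetical degenerate primer and template variant (not the study's primers).
print(primer_mismatches("ATGCRTACGT", "ATGCGTACGA"))  # (1, 1)
```

Here the degenerate `R` tolerates the A/G variant, but the terminal T/A difference falls inside the 3' window, which is the kind of binding-site variation that could reduce amplification success.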

Several approaches based on DNA barcoding have been proposed for assigning specimens to their relevant taxa [52,54,79,80]. Our analysis applied an integrative approach to species delimitation, combining the unsupervised "OTU picking" methods ABGD and ASAP, which use only pairwise genetic distances, with supervised methods to increase the reliability of the results. The ABGD method automatically identifies where the barcode gap lies in the distribution of pairwise distances. This gap marks the boundary between the maximum intraspecific and the minimum interspecific divergence, so its presence is crucial to the effectiveness of distance-based methods [51,81]. Our results showed that the recursive partitions in ABGD recognized more OTUs than the primary partitions, giving higher accuracy in species resolution, which corroborates previous observations [51,82,83]. Further, ASAP was performed to evaluate the relevance of the ABGD partitions, as any species partition must subsequently be tested against other evidence, as recommended in an integrative taxonomy approach [50].
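To make the barcode-gap idea concrete, the following sketch computes the maximum intraspecific and minimum interspecific distance from a pairwise distance matrix, then groups sequences by linking any pair closer than a chosen threshold. It is a simplified illustration in the spirit of ABGD's partitioning step, not ABGD itself: it omits ABGD's inference of the gap position from the distance distribution and its recursive step. The function names, the threshold, and the toy distances are hypothetical.

```python
from itertools import combinations


def barcode_gap(dist, labels):
    """Return (max intraspecific, min interspecific) pairwise distance.

    `dist` is a symmetric matrix (list of lists) and labels[i] is the species
    of sequence i. A barcode gap exists when the second value exceeds the first.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter) if inter else float("inf")
    return max_intra, min_inter


def partition_at_threshold(dist, threshold):
    """Group sequences into OTUs by linking every pair closer than `threshold`;
    OTUs are the connected components (single linkage, via union-find)."""
    parent = list(range(len(dist)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(dist)), 2):
        if dist[i][j] < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical p-distances for two species with two sequences each.
dist = [
    [0.0, 0.002, 0.030, 0.031],
    [0.002, 0.0, 0.029, 0.030],
    [0.030, 0.029, 0.0, 0.001],
    [0.031, 0.030, 0.001, 0.0],
]
labels = ["sp1", "sp1", "sp2", "sp2"]
print(barcode_gap(dist, labels))           # (0.002, 0.029)
print(partition_at_threshold(dist, 0.01))  # [[0, 1], [2, 3]]
```

Because the chosen threshold (0.01) falls inside the gap between 0.002 and 0.029, the partition recovers the two labelled species; a threshold above every interspecific distance would merge them, which is how missing gaps lead to merged taxa.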

Our results indicated that the unsupervised ABGD method produced taxonomic conflicts in rbcL between *Amaranthus* species (*A. hybridus* and *A. viridis*) and between *Paronychia arabica* and *Sclerocephalus arabicus*. Interestingly, these species differed morphologically and could be discriminated easily (Figure 3a). Merged taxa were also observed for the genus *Suaeda* (*S. aegyptiaca* and *S. vermiculata*) in the rbcL dataset using ABGD, and in the matK dataset using both the ABGD and ASAP methods (Figure 3a,b). In addition, low pairwise interspecific divergence was observed between the *Suaeda* species for rbcL (1.55%) and matK (1.21%), consistent with a close, monophyletic relationship. A similar result was reported by Kapralov et al. [80], who provided strong statistical support for this monophyly. Such taxonomic relationships can be obscured by the absence of a barcode gap, which can result from a limited number of sequences per species (i.e., fewer than 3–5) [51].
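Pairwise divergence values such as those reported for *Suaeda* are averages of distances between sequences of two species. The sketch below uses the uncorrected p-distance with pairwise deletion of gaps and ambiguous bases; whether the study used p-distance or a model-corrected distance such as Kimura two-parameter is not stated in this section, so that choice is an assumption, and the function names and toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion); returns differing sites / compared sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        differences += int(a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_interspecific(seqs_x, seqs_y):
    """Mean p-distance (in %) over all cross-species sequence pairs."""
    values = [p_distance(a, b) for a in seqs_x for b in seqs_y]
    return 100 * sum(values) / len(values)


print(p_distance("ACG-T", "ACGAA"))                    # 0.25
print(mean_interspecific(["AAAA"], ["AAAT", "AATT"]))  # 37.5
```

With only one or two sequences per species, the mean rests on very few pairs, which is one reason small samples make the barcode gap unreliable.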

Following the ABGD and ASAP analyses, species delimitation using character-based supervised machine learning methods was applied to further test the initial identifications [84]. Several studies have applied the character-based barcoding approach and shown that it can identify plant species better than conventional unsupervised methods [52–54,85]. In our analysis, the unsupervised ASAP method provided better resolution for the rbcL dataset than ABGD (Table 1). In addition, ASAP resolved two singleton species in the rbcL dataset that were not recognized by ABGD (Figure 3a). Compared with the unsupervised methods, the supervised SVM method correctly identified the highest number of species in both the rbcL and matK datasets (Table 1 and Figure 3a,b). In addition, *S. aegyptiaca* and *S. vermiculata* were recovered as separate clades, which suggests that intraspecific diversity may be hidden [34,86].
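As a rough illustration of character-based supervised assignment, the sketch below one-hot encodes aligned sequences (four binary features per site) and trains one linear SVM per species in a one-vs-rest scheme, using Pegasos-style subgradient descent on the regularized hinge loss. It is a minimal, dependency-free sketch under stated assumptions, not the SVM pipeline used in the study: it has no bias term, cycles through the samples in a fixed order (the original Pegasos samples at random), and the function names, hyperparameters, and toy sequences are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list[float]:
    """Encode an aligned sequence as 4 binary features per site.
    Gaps and ambiguous bases are encoded as all zeros."""
    vec: list[float] = []
    for ch in seq.upper():
        vec.extend(1.0 if ch == base else 0.0 for base in BASES)
    return vec


def train_linear_svm(X, y, epochs=200, lam=0.01):
    """Binary linear SVM (labels +1/-1, no bias term) minimising
    lam/2 * ||w||^2 + mean hinge loss with step size 1/(lam * t)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink weights: gradient step on the L2 regulariser.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: step toward classifying xi correctly.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(seqs, labels, **kwargs):
    """Train one binary SVM per species (that species vs. all others)."""
    X = [one_hot(s) for s in seqs]
    return {
        sp: train_linear_svm(X, [1 if lab == sp else -1 for lab in labels], **kwargs)
        for sp in sorted(set(labels))
    }


def predict(models, seq):
    """Assign seq to the species whose classifier gives the highest score."""
    x = one_hot(seq)
    return max(models, key=lambda sp: sum(wj * xj for wj, xj in zip(models[sp], x)))


# Toy aligned sequences (hypothetical): the first two sites are diagnostic.
seqs = ["AAAA", "AAAT", "CCAA", "CCAT", "GGAA", "GGAT"]
labels = ["sp_A", "sp_A", "sp_B", "sp_B", "sp_C", "sp_C"]
models = train_one_vs_rest(seqs, labels)
print([predict(models, s) for s in seqs])
print(predict(models, "CCAG"))  # an unseen variant at the last site
```

Unlike a distance threshold, the classifier learns which individual sites are diagnostic, which is why character-based methods can separate species whose overall distances overlap.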

It has been reported that the OTUs proposed by different methods can be inconsistent in distinguishing between members of closely related genera [49]. In our study, the *Amaranthus* species (*A. viridis* and *A. hybridus*) were discriminated only by ASAP, whereas the *Calligonum* species (*C. crinitum* and *C. comosum*) were distinguished only by SVM. This supports the importance of using more than one method, especially for closely related species that are difficult to discriminate morphologically, such as *C. crinitum* and *C. comosum*. Using more than one method can maximize the probability of identifying morphologically similar species and overcome the limitations of each individual method [50,87].
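The gain from combining methods can be expressed as a union of the species each method resolves, while recording which methods support each species. In this minimal sketch, the function name `combine_methods` is hypothetical, and the input is restricted to the four *Amaranthus* and *Calligonum* species discussed above; it encodes only which method discriminated them, as stated in the text, and omits all other species.

```python
def combine_methods(results):
    """Pool species resolutions across delimitation methods.

    `results` maps a method name to the set of species that method resolved.
    Returns the union of resolved species and, for each species, the sorted
    list of methods that resolved it.
    """
    resolved = set().union(*results.values())
    support = {
        species: sorted(m for m, found in results.items() if species in found)
        for species in resolved
    }
    return resolved, support


results = {
    "ABGD": set(),
    "ASAP": {"Amaranthus viridis", "Amaranthus hybridus"},
    "SVM": {"Calligonum crinitum", "Calligonum comosum"},
}
resolved, support = combine_methods(results)
print(len(resolved))                  # 4
print(support["Calligonum comosum"])  # ['SVM']
```

No single method resolves all four species here, but the union does, which is the practical argument for running several methods side by side.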

Overall, SVM outperformed ABGD and ASAP taxonomically in the rbcL dataset. SVM delivered the highest proportion of correct matches (55.0%) across the 20 species, compared with 35.0% and 45.0% for ABGD and ASAP, respectively (Table 1). In the matK dataset, ABGD performed similarly to ASAP (60.0%), and the proportion of correct matches rose to 73.33% with supervised learning. In the ITS2 dataset, however, all methods delivered a similar percentage of correct matches (Table 1). It is now well established that the combination of the two plastid markers accepted as the core barcoding regions [33], the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit gene (rbcL) and maturase K (matK), does not provide adequate coverage of plant species. These markers must therefore often be used together with other hypervariable sequences, such as the nuclear ITS or the plastid intergenic spacer trnH-psbA [88].
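As a worked check, assuming the rbcL percentages in Table 1 are the number of correctly delimited species divided by the 20 species in that dataset, they correspond to the following counts:

$$
\text{correct matches (\%)} = \frac{n_{\text{correct}}}{n_{\text{species}}} \times 100
$$

$$
\text{SVM: } \frac{11}{20}\times 100 = 55.0\%, \qquad \text{ASAP: } \frac{9}{20}\times 100 = 45.0\%, \qquad \text{ABGD: } \frac{7}{20}\times 100 = 35.0\%
$$

Under this reading, SVM correctly assigned two more rbcL species than ASAP and four more than ABGD.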

Moreover, the efficiency of the markers and methods depends on sample size, as singleton species or small sample sizes can lead to skewed results [21]. In our study, about 10 species were represented by single sequences; these were treated as singletons rather than as independent OTUs to reduce the probability of biased identification. An adequate sample size and proper implementation of DNA barcoding can therefore provide a scientific basis for the molecular identification and conservation of valuable medicinal species. Our study is among the first to use multiple DNA barcode markers and confirms the potential of DNA barcoding for the accurate identification of medicinally important plants from the UAE. The dataset generated in this study will assist in developing a reference library and will allow others to contribute to it and explore the genetic potential of the available germplasm for various applications.
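Flagging undersampled species before delimitation can be done by counting sequences per species. In this sketch, the helper `split_singletons` and its `min_sequences` parameter are hypothetical. The default of 2 isolates singletons, while a value of 3 to 5 would follow the per-species sample sizes mentioned earlier as needed for a reliable barcode gap [51]. The labels are toy values.

```python
from collections import Counter


def split_singletons(labels, min_sequences=2):
    """Separate species with enough sequences for delimitation from those
    below `min_sequences` (with the default of 2, the singletons)."""
    counts = Counter(labels)
    retained = sorted(sp for sp, n in counts.items() if n >= min_sequences)
    flagged = sorted(sp for sp, n in counts.items() if n < min_sequences)
    return retained, flagged


labels = ["sp1", "sp1", "sp2", "sp3", "sp3", "sp3"]  # hypothetical labels
print(split_singletons(labels))                   # (['sp1', 'sp3'], ['sp2'])
print(split_singletons(labels, min_sequences=3))  # (['sp3'], ['sp1', 'sp2'])
```

Flagged species can then be reported separately instead of being treated as independent OTUs.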
