**3. Results**

#### *3.1. Sequence Characteristics of CO I Gene Fragment*

A total of 401 *CO* I sequences were obtained, representing 44 species in 10 genera. All sequences were trimmed to a consensus length of 533 bp. The mean nucleotide composition of the complete data set was 22.7% adenine (A), 29.5% thymine (T), 29.5% cytosine (C), and 18.2% guanine (G). G + C content was highest at the first codon position (55.69%) and lowest at the second codon position (42.96%) (Table 2). Of the 533 sites in the complete data set, 327 (61.35%) were conserved and 204 (38.27%) were variable; the variable sites comprised 194 parsimony-informative sites (36.40%) and 10 singleton sites (1.88%). Transitional pairs (si = 458) outnumbered transversional pairs (sv = 52), and the si/sv ratio (R) for the data set was 21.00 (Table 2).
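The site categories above follow the standard definitions used by alignment software such as MEGA: a column is conserved if every sequence carries the same base and variable otherwise, and a variable column is parsimony-informative when at least two different bases each occur in at least two sequences (otherwise it is a singleton site). The sketch below illustrates those definitions on an already aligned set of sequences. It is not the procedure used in this study; the function name `classify_sites` is hypothetical, and skipping columns that contain gaps or ambiguity codes is an assumption.

```python
from collections import Counter


def classify_sites(alignment):
    """Count conserved, variable, parsimony-informative and singleton sites.

    Hypothetical helper: assumes equal-length aligned sequences and skips any
    column containing a character other than A, C, G or T (gaps, ambiguity
    codes), similar in spirit to a complete-deletion option.
    """
    alignment = [seq.upper() for seq in alignment]
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"conserved": 0, "variable": 0,
              "parsimony_informative": 0, "singleton": 0}
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if any(base not in "ACGT" for base in column):
            continue  # skip gapped or ambiguous columns
        freq = Counter(column)
        if len(freq) == 1:
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two bases, each shared by two or more sequences.
        if sum(1 for n in freq.values() if n >= 2) >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Toy alignment (not data from this study).
print(classify_sites(["ACGTA", "ACGTT", "ATGCA", "ATGCA"]))
# {'conserved': 2, 'variable': 3, 'parsimony_informative': 2, 'singleton': 1}
```

By construction, parsimony-informative and singleton sites together make up all variable sites, as in the data set above (194 + 10 = 204).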


**Table 2.** Sequence variation of the *CO* I gene and average nucleotide frequencies of *CO* I partial sequences of *Scaridae* (%).

Note: ii = Invariant pairs; si = Transitional pairs; sv = Transversional pairs; R = si/sv.

#### *3.2. Genetic Distance between Species and within Species*

Intraspecific K2P distances ranged from 0.000 to 0.015, and most were below 0.01; four species had intraspecific distances between 0.01 and 0.02 (Figure 1). The mean intraspecific distance was 0.003. Among the 44 species, the greatest interspecific distance (0.248) was between *Scarus flavipectoralis* and *Nicholsina usta*, and the smallest (0.002) was between *Chlorurus sordidus* and *C. spilurus*. Most interspecific distances exceeded 0.1. Overall, the mean interspecific distance was 0.159, about 53 times the mean intraspecific distance (Supplementary Table S1).
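The Kimura two-parameter (K2P) distance corrects the observed proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -0.5 ln(1 - 2P - Q) - 0.25 ln(1 - 2Q). The sketch below computes this distance and then pools all specimen pairs into mean intraspecific and interspecific distances. It is a minimal illustration assuming pairwise deletion of non-ACGT sites; the helper names are hypothetical, and programs such as MEGA may average distances differently (for example, per species pair), so values need not match those reported above exactly.

```python
import math
from itertools import combinations
from statistics import mean

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a non-ACGT character are ignored
    (pairwise deletion, an assumption of this sketch).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def mean_intra_inter(samples):
    """samples: list of (species, sequence). Returns (mean intra, mean inter).

    Pools every specimen pair; assumes at least one within-species pair and
    at least one between-species pair exist.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        d = k2p_distance(s1, s2)
        (intra if sp1 == sp2 else inter).append(d)
    return mean(intra), mean(inter)


# Toy specimens (not data from this study).
samples = [("sp_a", "ACGTACGTAC"), ("sp_a", "ACGTACGTAT"), ("sp_b", "GCGTACCTAC")]
intra, inter = mean_intra_inter(samples)
print(f"mean intra = {intra:.3f}, mean inter = {inter:.3f}")
```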

**Figure 1.** Intraspecific genetic distances of species in the family *Scaridae*. Note: The abscissa represents the species: 1. *Bolbometopon muricatum*, 2. *Cetoscarus ocellatus*, 3. *Hipposcarus longiceps*, 4. *Scarus iseri*, 5. *S. rubroviolaceus*, 6. *S. ghobban*, 7. *S. taeniopterus*, 8. *S. niger*, 9. *S. forsteni*, 10. *S. prasiognathos*, 11. *S. frenatus*, 12. *S. dimidiatus*, 13. *S. oviceps*, 14. *S. chameleon*, 15. *S. rivulatus*, 16. *S. globiceps*, 17. *S. quoyi*, 18. *S. flavipectoralis*, 19. *S. schlegeli*, 20. *S. fuscopurpureus*, 21. *S. psittacus*, 22. *S. spinus*, 23. *Chlorurus capistratoides*, 24. *C. japanensis*, 25. *C. bleekeri*, 26. *C. microrhinos*, 27. *C. frontalis*, 28. *C. spilurus*, 29. *C. sordidus*, 30. *Sparisoma radians*, 31. *S. aurofrenatum*, 32. *S. viride*, 33. *S. chrysopterum*, 34. *S. rubripinne*, 35. *S. rocha*, 36. *S. cretense*, 37. *S. atomarium*, 38. *Cryptotomus roseus*, 39. *Nicholsina denticulata*, 40. *N. usta*, 41. *Leptoscarus vaigiensis*, 42. *Calotomus carolinus*, 43. *C. viridescens*, 44. *C. spinidens*.

#### *3.3. Molecular Phylogenetic Tree*

The NJ tree clustered *C. sordidus* and *C. spilurus* together, while all other individuals clustered by species (Figure 2). Close relationships were recovered between *C. japanensis* and *C. capistratoides*; *C. carolinus* and *C. viridescens*; *S. rivulatus* and *S. globiceps*; *S. rubroviolaceus* and *S. ghobban*; and *S. schlegeli* and *S. ferrugineus*. Each of these pairs formed a cohesive group with bootstrap support above 80%. In addition, *Chlorurus*, *Cryptotomus*, *Nicholsina*, *Leptoscarus*, *Hipposcarus*, *Bolbometopon*, *Sparisoma*, *Calotomus*, *Cetoscarus*, and *Scarus* each formed a separate branch.

**Figure 2.** NJ tree of *Scaridae* species based on *CO* I sequences. Bootstrap values higher than 50% (1000 replicates) are shown on the branches. The scale bar represents a genetic distance of 0.01.

#### *3.4. New Country Records*

Based on the specimens collected, identified, and processed in this study, together with database searches in BOLD (Barcode of Life Data System) and GenBank (BLAST), three species, *Calotomus carolinus*, *Chlorurus japanensis*, and *Scarus rivulatus*, are reported from mainland China for the first time. These records add to the known parrotfish diversity of mainland China.

**4. Discussion**

*CO* I is widely used as a DNA barcode marker for animal species; intraspecific K2P distances are typically below 1% and rarely exceed 2% [36]. Hebert et al. proposed that effective species identification with *CO* I sequences requires interspecific distances to exceed intraspecific distances, ideally by about tenfold [36,37]. In the present data set, the mean intraspecific distance was 0.003, and the mean interspecific distance was 53 times greater, well above this threshold. In the NJ tree, *C. sordidus* and *C. spilurus* clustered together, and all other individuals clustered by species with high bootstrap support. These results indicate that *CO* I sequences can effectively identify species in the family *Scaridae*.
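The tenfold criterion can be checked with simple arithmetic on the two means reported for this data set. The sketch below assumes, as in the text, that the criterion is applied to mean distances; the function name `passes_tenfold_rule` and its default `threshold=10.0` are illustrative choices, not a fixed standard.

```python
def passes_tenfold_rule(mean_inter, mean_intra, threshold=10.0):
    """Return the inter/intra ratio and whether it meets the threshold.

    `threshold` (hypothetical parameter) defaults to the roughly tenfold
    difference suggested by Hebert et al.; it is a rule of thumb, not a
    strict cutoff.
    """
    if mean_intra <= 0:
        raise ValueError("mean intraspecific distance must be positive")
    ratio = mean_inter / mean_intra
    return ratio, ratio >= threshold


# Means reported in this study: interspecific 0.159, intraspecific 0.003.
ratio, ok = passes_tenfold_rule(0.159, 0.003)
print(f"ratio = {ratio:.0f}, meets threshold: {ok}")  # ratio = 53, meets threshold: True
```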

On average, the G + C content of the data set was 47.75%. The base composition of the *CO* I gene was consistent with that reported for other teleosts, in which G + C content is lower than A + T content [38,39]. The first codon position had the highest G + C content (55.69%), ranging from 53.39% to 56.74%; the second codon position had the lowest (42.96%), ranging from 41.57% to 44.94%; and the third codon position had a G + C content of 44.53%, ranging from 30.86% to 52.57%. In the *CO* I gene, si is usually greater than sv, and a smaller R indicates a faster evolutionary rate. In this study, the first and third codon positions had the largest and smallest R values, respectively, indicating that the third position evolved fastest and the first position slowest. One possible explanation is that the range of G + C variation at a codon position is related to its evolutionary rate: the wider the range, the faster the evolution.
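Position-specific G + C means and ranges like those above can be obtained by computing each sequence's G + C percentage at codon positions 1, 2 and 3 and then taking the mean, minimum and maximum across sequences. The sketch below assumes the alignment starts at codon position 1 (the `frame` offset is a hypothetical parameter for other cases) and excludes non-ACGT characters from the denominator; the exact settings used in this study are not stated here.

```python
from statistics import mean


def gc_by_codon_position(seq, frame=0):
    """G + C percentage at codon positions 1, 2 and 3 of one sequence.

    `frame` is a hypothetical 0-based offset of the first full codon
    (assumed 0, i.e. the alignment starts at codon position 1).
    Characters other than A, C, G, T are excluded from the denominator.
    """
    seq = seq.upper()[frame:]
    percentages = []
    for pos in range(3):
        bases = [b for b in seq[pos::3] if b in "ACGT"]
        gc = sum(1 for b in bases if b in "GC")
        percentages.append(100 * gc / len(bases) if bases else float("nan"))
    return percentages


def summarize_gc(sequences, frame=0):
    """Mean, minimum and maximum G + C percentage per codon position."""
    per_seq = [gc_by_codon_position(s, frame) for s in sequences]
    summary = {}
    for pos in range(3):
        values = [row[pos] for row in per_seq]
        summary[pos + 1] = (mean(values), min(values), max(values))
    return summary


# Toy sequences (not data from this study).
print(summarize_gc(["GCAGCA", "ACGACG"]))
# {1: (50.0, 0.0, 100.0), 2: (100.0, 100.0, 100.0), 3: (50.0, 0.0, 100.0)}
```

As a consistency check, averaging the three position-specific means (55.69%, 42.96% and 44.53%) gives about 47.73%, close to the overall 47.75%.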

With the exception of *C. sordidus* and *C. spilurus*, the greatest intraspecific distance was smaller than the smallest interspecific distance, producing a barcode gap, which is a key factor in the accurate identification of species from DNA barcodes. The intraspecific distances of *C. sordidus* and *C. spilurus* were 0.005 and 0.000, respectively, whereas the interspecific distance between the two species was 0.003, so no barcode gap was observed between them. In the NJ tree, *C. sordidus* and *C. spilurus* clustered together on a single branch with a bootstrap value of 88%. These results are therefore consistent with earlier studies supporting the view that *C. spilurus* and *C. sordidus* are the same species [16].
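A barcode gap between two species can be tested by comparing the largest intraspecific distance found in either species with the smallest distance between their specimens. The sketch below implements that pairwise comparison on a precomputed distance table. The function name, specimen IDs and between-species distances are hypothetical (only the within-species values 0.005 and 0.000 echo those reported above), and other studies instead define the gap per species against its nearest neighbour.

```python
import math
from collections import defaultdict
from itertools import combinations


def species_pair_gaps(species_of, dist):
    """Barcode-gap test for every pair of species (hypothetical helper).

    species_of: dict mapping specimen ID -> species name.
    dist: dict mapping frozenset({id1, id2}) -> pairwise distance (e.g. K2P).
    Returns {(species1, species2): (largest_intra, smallest_inter, has_gap)},
    where largest_intra is the larger of the two species' maximum
    within-species distances (0.0 for a species with one specimen).
    """
    max_intra = defaultdict(float)
    min_inter = {}
    for a, b in combinations(species_of, 2):
        d = dist[frozenset((a, b))]
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            max_intra[sa] = max(max_intra[sa], d)
        else:
            key = tuple(sorted((sa, sb)))
            min_inter[key] = min(min_inter.get(key, math.inf), d)
    result = {}
    for key, smallest in min_inter.items():
        largest = max(max_intra[sp] for sp in key)
        result[key] = (largest, smallest, largest < smallest)
    return result


# Hypothetical specimens and distances, loosely echoing the
# C. sordidus / C. spilurus case discussed above.
species_of = {"sor1": "C. sordidus", "sor2": "C. sordidus",
              "spi1": "C. spilurus", "spi2": "C. spilurus"}
dist = {frozenset(pair): d for pair, d in [
    (("sor1", "sor2"), 0.005), (("spi1", "spi2"), 0.000),
    (("sor1", "spi1"), 0.003), (("sor1", "spi2"), 0.003),
    (("sor2", "spi1"), 0.004), (("sor2", "spi2"), 0.004),
]}
print(species_pair_gaps(species_of, dist))
# {('C. sordidus', 'C. spilurus'): (0.005, 0.003, False)}
```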

*Chlorurus* and *Scarus* diverged relatively recently, about 6.0–7.4 million years ago (mya), and most of the diversification within these two genera occurred in the last 3–5 million years [40]. Nevertheless, Bellwood regarded *Chlorurus* and *Scarus* as two distinct monophyletic lineages [41]. Molecular tree topologies have reinforced this morphological diagnosis, recovering each genus as monophyletic and the two genera together as a well-supported clade [39]. Bayesian analyses have likewise provided strong support for *Chlorurus* and *Scarus* as distinct genera [42]. In our study, *Chlorurus* and *Scarus* clustered into two independent branches, consistent with the morphological diagnosis, and the species of each genus formed a separate branch in the phylogenetic tree. Therefore, *CO* I is also suitable for genus-level identification in the *Scaridae*.

The International Barcode of Life (iBOL, http://ibol.org/; accessed on 2 June 2022) is the leading global DNA barcoding initiative; it identifies species using DNA barcodes and shares the results freely [43]. Notably, DNA barcode libraries are built through community efforts, and the use of BOLD has led to DNA barcoding being regarded as a standard approach to species recognition [38]. In BOLD, barcode sequences are stored together with other taxonomic data (voucher images, location data, etc.) to improve the accuracy of species identification [44]. BOLD has accelerated exchanges between countries, making global resources interoperable and species identification more standardized, and it provides an accessible platform for analyzing and searching DNA barcode data [45]. Nevertheless, the international barcode databases have several shortcomings. (i) Data sharing is not timely: data submitted to BOLD by different countries may be made public only after two years, because researchers prefer to release data after analysis or article publication. (ii) Under the BOLD data-management standards, each DNA barcode must be accompanied by complete voucher specimen information, collection information, and the original sequencing trace files. However, many research groups and researchers in China cannot strictly follow these requirements, which greatly reduces data quality. (iii) Species names are not updated promptly, so sequences of the same species may still be listed under a previous name, and entries are therefore inconsistent. For example, *C. sordidus* and *C. spilurus* are congeners separated by a very small genetic distance, *C. spilurensis* syn. nov. has been synonymized with *C. sordidus* [16], and the two cluster together in the molecular tree; however, when their sequences are aligned in BOLD, they are still assigned different species names.
