1. Introduction
Since the International Board for Plant Genetic Resources (IBPGR) was established in 1974 to coordinate the global efforts to systematically collect and conserve the world’s threatened genetic plant diversity, many countries and organizations have founded gene banks, and millions of crop resources have been preserved [
1,
2]. As a result of the global efforts to conserve plant genetic resources for food and agriculture, the number and scale of ex situ germplasm collection has increased tremendously in the last 40 years [
3]. However, the large sizes of redundant collections, either individually or collectively, for particular species have become an obstacle to the characterization, evaluation, utilization, and maintenance of those species [
2,
3]. As one solution, the authors in [
4] proposed that collections could be pruned to core collections, which could “represent with a minimum of repetitiveness, the genetic diversity of a crop species and its relatives.” This core collection serves as a working collection that can be extensively examined, while the accessions excluded from such collections can be preserved as preliminary collections [
5]. Therefore, core collections can facilitate both the use of crop germplasm and the management of entire collections [
2].
Tea (
Camellia sinensis (L.) Kuntze) is a woody evergreen plant in the family Theaceae and is native to the region covering the northern part of Myanmar, as well as the provinces of Yunnan and Sichuan in China. The beverage made from its leaves is one of the most popular in the world and has become a daily drink for many people [
6]. To combat climate change, biological threats, and market fluctuations, the main tea-producing countries of China, Sri Lanka, and India have managed and preserved their tea genetic resources both in situ and ex situ [
7]. In addition, they have developed core collections or subsets of tea germplasm that maintain the original diversity of the collections but at a size that facilitates the evaluation, use, and conservation of the entire collections using geographical origin, phenotypic traits, and molecular markers [
8,
9,
10,
11,
12]. Knowledge and understanding of the genetic background, genetic diversity, relationships, and identification are important for the collection, preservation, characterization, and utilization of tea resources [
13]. The proper characterization and evaluation of genetic resources via systematic preservation and maintenance is the most important factor in utilizing such resources for improving crops [
14]. The characterization of germplasm can be carried out using morphological, biochemical, and molecular descriptors according to the standard criteria contained in the tea descriptors [
15]. Among the characteristics in tea descriptors, morphological traits and phytochemical content tend to be most affected by environmental factors. In addition, since phytochemical content can vary widely depending on the environment, these characteristics need to be evaluated over multiple years to obtain more precise data. On the other hand, molecular markers are rarely influenced by the environment and thus offer a direct observation of genomic diversity.
The phytochemical characterization of plant germplasm is an acceptable method to define biochemical diversity [
16]. The composition of phytochemicals in tea is important, as these chemicals contribute to tea’s quality and pharmacological properties [
15]. Tea consists of compounds rich in polyphenols, theanine, and caffeine, which not only determine the quality of tea but also provide tremendous health benefits [
17]. Among tea polyphenols, catechins account for 8% to 26% of the tea leaves’ dry weight [
18]. Previous studies reported that because each catechin monomer has a different chemical structure they each have unique bioactivity, bioavailability, and physiological pharmacokinetic properties [
19,
20]. In addition, the origin and growing conditions of the tea plant affect the contents of the tea’s phytochemicals, which changes bioactivity [
21,
22]. The leaves of the tea tree have been primarily cultivated as a source of tea beverages, in which phytochemicals such as catechin and caffeine are the main functional compounds. The development of a new variety that contains enhanced phytochemical contents (qualitatively or quantitatively) is the ultimate objective of tea breeding programs. Therefore, along with the evaluation of phytochemical diversity, the development of a core collection that can represent the diversity of the entire germplasm is very important not only for the conservation and management of germplasm but also for tea breeding programs.
To assess the genetic diversity and/or develop new cultivars in many countries, molecular markers such as restriction fragment length polymorphism (RFLP) [
23,
24], random amplified polymorphic DNA (RAPD) [
25,
26,
27], amplified fragment length polymorphism (AFLP) [
24], and simple sequence repeats (SSR) [
11,
28,
29,
30] were used. The authors in [
31] reported that morphological traits, despite their advantages, have drawbacks such as the influence of the environment on trait expression, epistatic interactions, and pleiotropic effects, among others. On the other hand, molecular markers are used because they are least affected by environmental factors and are almost unlimited in number. In addition, they offer the possibility of observing the genome directly and thus eliminate the shortcomings inherent in phenotypic observation [
32].
In our previous study, we analyzed the genetic diversity of tea accessions collected in Korea using 21 SSRs [
28]. In this study, we evaluated the content of eight phytochemicals over two years (2018 and 2019) and analyzed the genetic diversity of 462 tea accessions collected from Korea, China, Japan, and Indonesia using 33 SSR markers. In addition, a target-oriented core collection was developed using both the phytochemical content and genetic diversity. This core collection will be used to efficiently preserve, manage, and evaluate tea germplasm in the genebank of Korea and will be provided to the tea breeding program as breeding material.
4. Discussion
A vast collection consisting of 15,234 accessions of tea is available in 23 gene banks around the world [
7]. The biochemical characterization of tea germplasm in earlier studies demonstrated significant variability [
18,
45,
46,
47,
48]. Despite the substantial diversity of compounds in tea germplasm, the development of tea cultivars was limited due to bottlenecks in tea breeding, such as long gestation periods, high inbreeding depression, and self-incompatibility [
49]. In addition, the tea quality and yield in the main tea producing countries, such as China, India, Sri Lanka, Kenya, Japan, etc., were significantly improved with an increase in the ratio of clonal tea acreage [
50]. Breeding strategies often focus on a limited set of target traits, resulting in cultivars with a narrow genetic base. Yao et al. [
51] reported that the developed tea cultivars from China, Japan, and Kenya have a narrow genetic basis due to the popularity of only a few cultivars for breeding and planting. This has produced several problems, such as the spread of specific diseases and insects, the concentration of plucking time in the tea season, the non-uniformity of taste and flavor, and susceptibility to environmental changes [
40,
51]. Meegahakumbura et al. [
29] noted that a molecular analysis that can discern not only patterns of lineage but also the origin of tea germplasm is required because the morphological characteristics traditionally used to define cultivars are highly plastic and easily influenced by environmental conditions. The present study attempted to address this issue by generating a core collection of tea germplasm that includes data on the molecular variability of the crop in addition to its biochemical characterization.
4.1. Phytochemical Diversity of Tea Germplasm
Significant variation was observed among the 462 tea accessions for catechin and caffeine content in this study (
Table 1). In addition, significant differences between the two years were observed (
Table S3). Catechins and caffeine serve as secondary metabolite defense compounds in tea plants. They provide sessile plants with protection against pathogens and predators, oxidative stress, and other environmental variables. Thus, the content of catechins and caffeine varied in the tea samples based on environmental variability [
45]. Many previous studies reported a large variation in catechin and caffeine contents in tea accessions [
15,
18,
52,
53]. The authors in [
54] noted that a biochemical characterization with different proportions of total catechins and their components would be a useful tool for the development of quality-tea clones. The authors in [
55] reported that differences between locations were far larger than the variations among cultivars, implying that environmental effects should be taken into consideration when total catechin and its component contents are utilized as biochemical markers in tea breeding programs.
There are six major catechins in tea leaf: (+)-catechin (C), (−)-epicatechin (EC), (−)-epicatechin gallate (ECG), (+)-gallocatechin (GC), (−)-epigallocatechin (EGC), and (−)-epigallocatechin gallate (EGCG) [
56]. The catechins in tea have been ranked by concentration as follows: EGCG > ECG > EGC > EC > GC > C [
52,
53,
57,
58]. In addition, antioxidant activity has been reported in the following order: ECG > EGCG > EC > EGC [
59]. The variation of catechin contents in tea accessions depends on the condition of the tea germplasm, such as the number of samples and the origin of the tea accessions, in each study. The range of each catechin’s content in the previous studies was as follows: EGCG, 13.0 to 139.0 mg/g; ECG, 3.2 to 89.1 mg/g; EGC, 2.1 to 249 mg/g; EC, 2.0 to 54.5 mg/g; GC, 1.4 to 22.7 mg/g; and C, 0.3 to 30.9 mg/g [
52,
54,
57,
59,
60]. In this study, the 462 tea accessions also showed a similar level of catechin content to that in previous studies (
Table 1). The concentration of catechins in tea germplasm is important for tea quality. For instance, the ratio of (EGCG + ECG) × 100/EGC has been suggested as a quality index for measuring the difference in the catechin levels of fresh tea shoots across growing seasons [
60]. In addition, the catechin index (CI), defined as (EC + ECG)/(EGC + EGCG), has been used as a biochemical marker for studying the genetic diversity of tea germplasm [
54]. The tea accessions with desirable compositions of catechins in this study could be incorporated into breeding programs for crop improvement.
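The two ratios mentioned above can be computed directly from measured catechin contents. The following is a minimal sketch only; the function names and the sample values are hypothetical and are not taken from this study's data.

```python
def quality_index(egcg, ecg, egc):
    """Quality index (EGCG + ECG) x 100 / EGC, with contents in mg/g."""
    if egc <= 0:
        raise ValueError("EGC content must be positive")
    return (egcg + ecg) * 100.0 / egc


def catechin_index(ec, ecg, egc, egcg):
    """Catechin index (EC + ECG) / (EGC + EGCG), with contents in mg/g."""
    denominator = egc + egcg
    if denominator <= 0:
        raise ValueError("EGC + EGCG must be positive")
    return (ec + ecg) / denominator


# Hypothetical catechin contents (mg/g) for a single accession
sample = {"EGCG": 80.0, "ECG": 20.0, "EGC": 25.0, "EC": 10.0}

qi = quality_index(sample["EGCG"], sample["ECG"], sample["EGC"])
ci = catechin_index(sample["EC"], sample["ECG"], sample["EGC"], sample["EGCG"])
print(f"quality index = {qi:.1f}, catechin index = {ci:.3f}")
```

Because both ratios depend on contents that vary between years, they would presumably be computed per year rather than on pooled values.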
Caffeine is the most abundant alkaloid in tea, with content usually between 15 and 50 mg/g [
15]. In this study, the caffeine content of 462 tea accessions ranged from 0.4 to 36.6 mg/g (2018) and 0.4 to 28.8 mg/g (2019) (
Table 1). Kottawa-Arachchi et al. [
15] noted that various amounts of caffeine have been observed in different tea-growing countries. Due to the pharmacological effects of caffeine on the central nervous system, the demand for low-caffeine tea has increased greatly, from 2% of total tea consumption in 1980 to 15% in the early twenty-first century [
61]. Although many countries have invested in methods and techniques to make decaffeinated tea, such techniques can remove the tea’s unique aroma and taste, which will worsen the quality. It is thus important to develop low caffeine clones through breeding and selection, as such clones could be a solution to the problem of high caffeine levels and contribute tremendously to the provision of natural low-caffeine tea [
18]. The tea accessions with a lower caffeine content in this study could be used as naturally low-caffeine genetic resources for crossbreeding parents.
4.2. Genetic Diversity of Tea Germplasm
In our previous study, we analyzed the genetic diversity and population structures of 410 tea accessions collected from South Korea using 21 SSR markers and revealed the narrow genetic base of South Korean tea accessions [
28]. In the present study, the genetic diversity and population structure of 462 tea accessions from China, Japan, Indonesia, and Korea (conserved in NAC) were analyzed using 33 SSR markers. As shown in
Table 4, higher diversity was detected among the tea accessions in China (H = 1.91, I = 0.81) than among those in Korea (H = 1.73, I = 0.76), Japan (H = 1.42, I = 0.73), and Indonesia (H = 1.05, I = 0.76). Other studies similarly reported that the Chinese tea population exhibited a higher level of genetic diversity than tea populations from other countries [
24,
51]. In general, China is thought to be the origin of tea, so Chinese tea populations are the most likely to account for the largest proportion of diversity [
51]. Our previous study noted that Korean tea germplasm showed low genetic diversity because of limitations in the gene stock from China, political and religious reasons, and extreme environmental conditions [
45]. Tanaka et al. [
62] reported that the tea plant in Japan was first introduced from China about 1200 years ago and that the country’s original tea populations were established based on only a few seeds from a restricted source. In addition, the authors in [
23,
25] suggested that the low genetic diversity of tea accessions in Japan could be attributed to long and intensive selection and breeding from the genetically limited tea stock in Japan.
It is important to identify the correspondence between the genetic diversity of tea accessions and their origins. In this study, the different approaches (STRUCTURE and DAPC) used to analyze the population structures of the 462 tea accessions provided complementary information. However, the structuring of tea accessions at K = 2 (based on the estimated ΔK value in STRUCTURE) and K = 4 (based on the BIC and DAPC) clearly did not segregate the accessions based on geographical distinctions. The ΔK statistic of the Evanno method can be artificially maximal at K = 2 in some cases because it detects only the uppermost level of structure in the data by focusing on changes in slope [
39,
63]. Similar results were obtained in previous studies on tea germplasm structures based on SSR (K = 2) [
11,
30,
64,
65]. The DAPC method does not require that populations be in Hardy-Weinberg (HW) equilibrium and can handle large data sets without parallel processing software, so it provides an interesting alternative to the STRUCTURE software [
66]. In addition, the DAPC analysis provided more detailed clusters compared to the STRUCTURE analysis in previous analyses using SSR [
28,
66,
67]. Our results agree with those of previous studies in that the DAPC analysis (K = 4) provided more detail than STRUCTURE (K = 2). However, these groupings showed lower genetic differentiation (PhiPT: DAPC = 1.2%; clustering analysis of phytochemicals = 0.8%; STRUCTURE = 0.5%) than the grouping by collection area (5.6%). This might be due to an imbalance in the distribution of tea accessions used in this study, as 88.3% of the accessions were collected from South Korea. In our previous study, the genetic differentiation in the DAPC analysis of Korean tea germplasm was 1.4% [
23]. This imbalance likely contributed to the low genetic differentiation between the groups identified in the population structure analysis, although the genetic differentiation among tea origins was also low (5.6%).
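The ΔK criterion discussed above can be illustrated with a short calculation. This is a sketch under stated assumptions: the log-likelihood values are hypothetical, the function name is ours, and ΔK is computed from replicate means as |L(K+1) − 2L(K) + L(K−1)| divided by the standard deviation of L(K), which is a common way of applying the Evanno method rather than a reproduction of the analysis in this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for each interior K.

    lnp_by_k: dict mapping K -> list of ln P(D) values from replicate runs
    (at least two replicates per K are needed for the standard deviation).
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    sds = {k: stdev(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks:
        # Delta K needs both neighbours, so the smallest and largest K are skipped
        if (k - 1) in means and (k + 1) in means and sds[k] > 0:
            second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_difference / sds[k]
    return delta


# Hypothetical ln P(D) values from two replicate runs per K
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-880.0, -884.0],
    4: [-870.0, -878.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K:", best_k)
```

Since ΔK is undefined at the smallest K tested, K = 1 can never be selected by this criterion, and a large first jump in likelihood tends to favor K = 2. This is one reason to compare the result with an independent criterion such as BIC.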
4.3. Development of a Target-Oriented Core Collection
To develop core collections, various types of data, such as phenotypes, proteins, and molecular markers, have been used. However, there is no universally accepted method to construct a core collection because every method has advantages and disadvantages [
68]. Previous studies have proven that phenotypes are useful parameters for developing core collections [
2,
12,
69]. Kumar et al. [
70] reported that the use of molecular markers in the development of a core collection is more effective than the use of other data, such as morphological traits sensitive to environmental effects. In addition, molecular markers are more effective in identifying and minimizing redundancy. Le et al. [
71] suggested that the combined use of phenotypic and molecular data is more effective than the use of either alone when constructing a core collection. In this study, molecular markers and biochemical contents were used to construct a core collection of tea germplasm with the POWERCORE program, which has been successfully used to build core collections for various plant species, including olive [
69], safflower [
71], and tea [
9].
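The general idea of selecting a small set of accessions that still retains every observed allele or trait class can be illustrated with a greedy covering heuristic. This sketch is not the POWERCORE algorithm and makes no claim about its internals; the function name, accession identifiers, and class labels are hypothetical.

```python
def greedy_core(accession_classes):
    """Pick accessions until every allele or trait class is represented.

    accession_classes: dict mapping accession id -> set of classes it carries.
    Returns the selected accession ids in order of selection.
    """
    uncovered = set().union(*accession_classes.values())
    core = []
    while uncovered:
        # Choose the accession adding the most uncovered classes;
        # sorting the ids makes tie-breaking deterministic.
        best = max(
            sorted(accession_classes),
            key=lambda acc: len(accession_classes[acc] & uncovered),
        )
        core.append(best)
        uncovered -= accession_classes[best]
    return core


# Hypothetical accessions with the classes (e.g., SSR alleles) they carry
accessions = {
    "A1": {"a", "b"},
    "A2": {"b", "c"},
    "A3": {"c"},
    "A4": {"a", "b"},
}
core_set = greedy_core(accessions)
print(core_set)
```

In this toy example, A4 is redundant with A1 and A3 is redundant with A2, so two accessions are enough to retain all three classes.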
In this study, seasonal data sets were handled independently to develop the core collections because the MANOVA analysis presented noticeable genotype × environment interactions. In addition, the evaluation indices (MD%, VD%, VR%, CR%) used to validate the core collection were comparable and reflected its effectiveness in capturing diversity. MD%, VD%, and VR% were used to evaluate the statistical consistency between the core and entire collections [
42]. MD% represents the difference in trait means between the core and entire collections and should be <20% for a representative core collection. VD% indicates the variance captured by the core collection, and VR% compares the coefficients of variation in the core and entire collections. CR% indicates whether the distribution ranges of each variable in the core set are well represented when compared to the entire collection and should be greater than 80% [
12,
42,
43,
70]. In this study, the core collections yielded a CR% of more than 80% (97.43%) and an MD% of less than 20% (7.88%) (
Table 7). Similar results for other species were reported in core collections developed with a lower MD% or higher CR%, which were more representative of the entire collections [
72,
73]. In addition, the distributions of each phytochemical in the tea accessions showed similarities to those of the entire collection (
Figure S1). In general, the core collections can be classified into three types or categories: core collections representing (1) individual accessions, (2) extremes, and (3) the distribution of accessions in the entire collection [
3]. Odong et al. [
3] suggested that a core collection of type 3 (distribution of accessions) is only of interest if the aim is to provide an overview of the composition of the whole collection using only a part of the collection. The authors in [
23,
74] suggested that this type of core collection can be obtained by maximizing the representativeness of the pattern of trait variations in the whole collection. Considering these reports, the core collection developed in this study showed a pattern similar to type 3 and could therefore represent the entire collection.
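The evaluation indices described above can be sketched as follows. The formulas are assumptions based on commonly used definitions (per-trait values averaged across traits, with the core value in the denominator for MD% and VD%); the text itself does not give them, and the data are hypothetical.

```python
from statistics import mean, variance


def core_evaluation_indices(entire, core):
    """Return MD%, VD%, VR%, and CR% averaged over traits.

    entire, core: dict mapping trait name -> list of numeric values.
    The formulas below are assumed definitions, not taken from the text.
    """
    md = vd = vr = cr = 0.0
    for trait, e_vals in entire.items():
        c_vals = core[trait]
        me, mc = mean(e_vals), mean(c_vals)
        ve, vc = variance(e_vals), variance(c_vals)  # sample variances
        cv_e, cv_c = ve ** 0.5 / me, vc ** 0.5 / mc  # coefficients of variation
        md += abs(me - mc) / mc                      # mean difference
        vd += abs(ve - vc) / vc                      # variance difference
        vr += cv_c / cv_e                            # variable rate
        cr += (max(c_vals) - min(c_vals)) / (max(e_vals) - min(e_vals))
    n = len(entire)
    return {
        "MD%": 100 * md / n,
        "VD%": 100 * vd / n,
        "VR%": 100 * vr / n,
        "CR%": 100 * cr / n,
    }


# Hypothetical contents (mg/g) for a single trait
entire = {"EGCG": [10.0, 20.0, 30.0, 40.0, 50.0]}
core = {"EGCG": [10.0, 30.0, 50.0]}
indices = core_evaluation_indices(entire, core)
print(indices)
```

The thresholds stated above (MD% below 20 and CR% above 80) could then be checked against the returned values.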
By integrating genetic diversity and phytochemical content, we developed a target-oriented core collection, an approach we had not previously attempted in tea germplasm. The main targets for tea breeding and use are mostly related to catechin content; therefore, the phytochemical analysis and development of the TOCC allow us to extend the use of tea germplasm broadly. Furthermore, the TOCC retained the phytochemical and genetic diversity of ENC, as we selected the accessions after analyzing the variation in content over two years together with molecular marker data. The genetic diversity indices (I and H) and the distribution of accessions (NJ and PCoA) also indicate that the TOCC is well developed and reflects the whole diversity of ENC. Through this process, we developed a core collection with greater added value, which will not only provide useful materials to breeders but also aid in the efficient management of the genebank. This target-oriented core collection is distinguished from the previous core collection, in which accessions were selected based on their agronomic traits and molecular markers. Our upgraded core collection, which focuses on the phytochemical content of tea germplasm, suggests new directions for the use and conservation of tea germplasm.