1. Introduction
Germplasm resources are essential for genetic research, enabling the exploration and utilisation of economically and ecologically important genes and traits [
1]. Therefore, it is crucial to collect, preserve, evaluate, and utilise germplasm resources for breeding improved cultivars and new varieties [
2,
3]. Germplasm work can also be applied in theoretical studies of plant origin, evolution, and classification. However, managing and utilising germplasm requires substantial effort for appropriate assessment, particularly where collections contain redundant and duplicated accessions [
4,
5,
6]. Due to their large quantity, diverse structure, and incomplete information, the potential of collected germplasm may not be fully utilised. To tackle these issues, the concept of a “core collection” was further expanded in the early 1980s, introducing additional principles and methodologies for construction [
7]. A core collection is a subset of the entire germplasm resource, selected to represent the genetic diversity of the whole collection with the minimum number of accessions. Core collections facilitate the rapid identification of germplasm possessing target traits and promote the effective utilisation of these resources [
8,
9]. Core collections offer a novel approach to the utilisation and preservation of germplasm resources and perform a crucial function in the management of genetic resources [
10].
Initially, researchers used geographic origin and phenotypic characters to establish core collections, since these characters visually represent plant differences and are easy to measure. Examples include pomegranate (
Punica granatum L.) [
11],
Ziziphus mauritiana Lam. [
12],
Prunus persica L. Batsch [
13], etc. Nevertheless, this approach has limitations, such as the loss, incompleteness, and unreliability of germplasm and genetic data, as well as the sensitivity of phenotypic data to environmental conditions. As a result, a core collection may not correctly represent the diversity of the original population [
14,
15]. Microsatellites or simple sequence repeats (SSRs) are tandemly repeated motifs of 1–6 bases found in all prokaryotic and eukaryotic genomes analysed to date [
16]. They help to reveal population structure and are often used as a tool for core collection development, with the advantage of high polymorphism, which can yield population-specific alleles. Core collections have been developed using SSR markers in several species, such as
Corylus avellana L. [
17]. In practice, breeders often combine multiple data types to improve the quality of a core collection, which helps to prevent the loss of key germplasm and improves the accuracy and comprehensiveness of the collection. For example, Krichen et al. established a core collection of
Prunus armeniaca L. through molecular markers and morphological traits [
18]. Sun et al. developed a core collection of
Litchi chinensis Sonn. based on genotypic data and agronomic traits [
19]. Kumar et al. developed a core collection of
Carthamus tinctorius L. using molecular, phenotypic, and geographic diversity [
20]. Wang et al. used a combination of molecular markers and different phenotypic data to construct a core collection of
Pinus yunnanensis [
21].
Korean pine (
Pinus koraiensis) is a valuable tree species in Northeastern China, serving as both an excellent source of timber and a valuable edible dry fruit and oilseed tree species [
22]. The distribution area of Korean pine includes Northeastern China, the Korean Peninsula, and far southeastern Russia, with intermittent distribution in Honshu and Shikoku, Japan. In China, it is mainly found in Changbaishan, Zhangguangcailing, Laoyeling, Wandashan, and Xiaoxinganling [
23]. Korean pine is a tall and straight tree with a perfectly shaped stem, high longevity, good physical quality of wood, and high productivity levels [
24]. Additionally, the seeds, resin, and bark of Korean pine hold high economic value [
25]. Korean pine is one of the main pine nut-producing species, and its seeds have a nutrient composition similar to that of the seeds of other pines such as
Pinus pinea,
Pinus sibirica,
Pinus sinensis,
Pinus cembra,
Pinus edulis, and
Pinus monophylla. These seeds have significant food value and are highly beneficial to human health [
26]. Since the 1960s, Heilongjiang Province has focused on the production and utilisation of Korean pine resources. The establishment of numerous seed orchards has provided a strong foundation for the development of Korean pine nut resources and the cultivation of nut forests [
27]. Breeding research has been conducted based on Korean pine seed source zoning [
28], resulting in the accumulation of abundant germplasm resources and a theoretical basis [
29,
30,
31,
32]. However, the direction and method of seed orchard construction have mainly been determined by geographical origin and growth traits, while the genetic relationships and diversity of the germplasm have been overlooked. Repetitive and redundant materials in seed orchards have impeded the effective conservation of germplasm resources and may also have impeded their evaluation and efficient use. Therefore, it is necessary to establish a core collection of Korean pine seed orchard germplasm.
The aim of this study is to enhance the effective management and utilisation of Korean pine resources. To achieve this, we constructed a core collection from 314 clones of eight populations, based on 11 SSR markers and nine morphological and physiological traits, using various data types and strategies. The core collection was selected based on an analysis of genetic diversity parameters, and its characteristics are described and compared with those of the entire collection. The results of this study will lay the foundation for research on superior genes and molecular markers and provide the basis and valuable materials for the effective use and conservation of Korean pine resources. This study is of theoretical importance for the development of conservation strategies and breeding programmes for Korean pine germplasm.
4. Discussion
Germplasm resources are essential for breeding. It is crucial to understand the genetic diversity and variation of germplasm resources for the conservation and sustainable use of plant resources [
44,
45]. The wider the distribution range of a tree species, the greater its genetic, phenological, and physiological variation [
46,
47]. Among the traits, Chlb exhibited the highest CV. Among the populations, the average CV across traits was highest in SGL. The greater the CV of quantitative traits in germplasm resources, the higher the degree of genetic diversity. The results of PCoA indicate a high degree of overlap between different populations from the same orchard. Therefore, it is necessary to use materials from multiple sites to build the core collection.
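As an illustration of the CV comparison described above, the following is a minimal sketch, assuming the CV is computed as the sample standard deviation divided by the mean and expressed as a percentage. The population labels, trait names, and values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (%) of a sample: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100


def mean_cv_per_population(data):
    """Average the per-trait CVs within each population.

    data maps population name -> {trait name -> list of measurements}.
    """
    return {
        pop: mean(coefficient_of_variation(v) for v in traits.values())
        for pop, traits in data.items()
    }


# Hypothetical measurements (illustrative only, not from the study).
example = {
    "pop_A": {"trait_1": [10.0, 12.0, 14.0], "trait_2": [1.0, 2.0, 3.0]},
    "pop_B": {"trait_1": [11.0, 12.0, 13.0], "trait_2": [1.5, 2.0, 2.5]},
}

if __name__ == "__main__":
    for pop, cv in mean_cv_per_population(example).items():
        print(f"{pop}: mean CV = {cv:.2f}%")
```

In this toy example pop_A has the larger mean CV, which under the reasoning above would be read as the more phenotypically variable population.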
Molecular markers are based on DNA polymorphisms, which are not biologically active and are rarely affected by the growth stage of the plant or the external environment. They are therefore more suitable than morphological markers for constructing core collections and assessing genetic diversity. SSRs are co-dominant markers that provide richer allelic information than other molecular markers. In this study, Ne and I were higher than those reported by Feng et al. [
48]. This study found that SSR markers were more efficient than other types of molecular markers. The population with the highest genetic diversity was HL, while the lowest was SGL. The HB population had the lowest F (−0.002), a value close to zero that indicates a slight heterozygote excess and near-random mating. All other populations had an F greater than zero, suggesting heterozygote deficiency and some degree of inbreeding in the seed orchards. These deficiencies may be related to the size of the original population and the procedures and criteria used to select superior trees. AMOVA revealed that, despite the populations having different origins, interpopulation genetic variation accounted for only 3% of the total genetic variation. These results suggest that variation among populations is small and that there is no clear genetic differentiation. The greatest genetic distance was between HL and SGL.
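The diversity parameters discussed above can be illustrated for a single co-dominant locus. This is a minimal sketch under commonly used textbook definitions (Ne = 1/Σp², I = −Σp ln p, He = 1 − Σp², F = 1 − Ho/He); it is an assumption that these match the definitions used by the software in this study, and the genotypes are hypothetical. Under this definition, F above zero corresponds to fewer heterozygotes than expected and F below zero to more.

```python
from collections import Counter
from math import log


def locus_diversity(genotypes):
    """Compute diversity statistics for one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    Uses the uncorrected expected heterozygosity (an assumption).
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    na = len(freqs)                              # observed number of alleles
    ne = 1 / sum_p2                              # effective number of alleles
    shannon = -sum(p * log(p) for p in freqs)    # Shannon information index
    ho = sum(a != b for a, b in genotypes) / n   # observed heterozygosity
    he = 1 - sum_p2                              # expected heterozygosity
    f = 1 - ho / he if he > 0 else float("nan")  # fixation index
    return {"Na": na, "Ne": ne, "I": shannon, "Ho": ho, "He": he, "F": f}


# Hypothetical SSR genotypes (allele sizes in bp), not data from the study.
balanced = [(100, 100), (100, 104), (104, 104), (100, 104)]
all_heterozygous = [(100, 104), (100, 104)]

if __name__ == "__main__":
    print(locus_diversity(balanced))
    print(locus_diversity(all_heterozygous))
```

Averaging these values over loci gives population-level estimates; the second example shows how an excess of heterozygotes produces a negative F.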
The construction of a core collection typically involves four steps: data collection and organisation, grouping of materials, determination of the sampling strategy, and testing and evaluation of the core collection [
49]. Observing genetic diversity is crucial for conserving germplasm resources and for breeding work. Sets 1, 2, and 3 compared the characteristics of collections constructed using different strategies. Both the E-strategy and M-strategy were able to maintain the Na, but only the E-strategy significantly improved the Ne, I, Ho, and He of the core collection. The D-strategy was able to restore the Ho of the original germplasm with the smallest number of accessions, making it more suitable for the construction of breeding populations. Collections 4, 5, and 6 compared the quality of the core collection constructed using different data types. The results show that the core collection constructed using multiple data types retained not only the Na but also the morphological and physiological traits of the entire population. Previous studies have also demonstrated that constructing a core collection from multiple data types makes it more robust and reliable, helps to prevent the loss of critical germplasm, and improves its accuracy and comprehensiveness [
6]. Genetic analyses suggest that there is no clear genetic structure or significant genetic differentiation between geographic sources of Korean pine clones. Therefore, an appropriate sampling strategy is required to construct a core collection. The construction of a core collection depends on its purpose [
17]. In recent years, the M-strategy has been widely used for this purpose. The M-strategy selects materials with high allelic richness and low redundancy by maximising the number of alleles at each locus. This approach can preserve rare and local alleles [
50,
51]. At the same sample size, the core collection constructed using the M-strategy retained more alleles and preserved the level of genetic diversity of the original germplasm.
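The idea of maximising allele number can be sketched with a simple greedy heuristic: repeatedly add the accession that contributes the most alleles not yet represented. This is only an illustration of the principle, not the algorithm implemented in the software used in this study; the function name, accession names, and alleles are hypothetical.

```python
def greedy_allele_core(accessions, core_size):
    """Select up to core_size accessions, greedily maximising allele coverage.

    accessions maps accession name -> set of (locus, allele) pairs.
    Ties are broken by accession name so the result is deterministic.
    Returns (list of selected names, set of alleles covered).
    """
    remaining = dict(accessions)
    core, covered = [], set()
    while remaining and len(core) < core_size:
        # Pick the accession that adds the most alleles not yet covered.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        core.append(best)
        covered |= remaining.pop(best)
    return core, covered


# Hypothetical accessions with alleles at two loci (not data from the study).
example_accessions = {
    "c1": {("L1", 100), ("L1", 104), ("L2", 200)},
    "c2": {("L1", 100), ("L2", 200)},
    "c3": {("L1", 108), ("L2", 204)},
    "c4": {("L1", 104), ("L2", 204), ("L2", 208)},
}

if __name__ == "__main__":
    core, covered = greedy_allele_core(example_accessions, core_size=2)
    total = set().union(*example_accessions.values())
    print(core, f"{len(covered)}/{len(total)} alleles retained")
```

In this toy case the redundant accession c2 is never preferred, because it carries no alleles that c1 does not already provide, which mirrors the low-redundancy property attributed to the M-strategy.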
The sampling rate of the Korean pine core collection was 36.31%, which is higher than that of
Schima superba (15.3%), but with a comparable core collection size (115) [
52], and lower than that of
Juglans regia L. (44.23%) [
53]. These results suggest that the sampling rate of a core collection varies with the characteristics of the species. The evaluation of the core collection depends on various factors, including the size and accessibility of the original germplasm, the similarity of the germplasm, the number and relevance of traits investigated, and the sampling strategy [
54]. In addition to parameters such as MD, VD, Na, and Ne, the evaluation of morphological and physiological traits also included
H. The core collection was determined using the Brekin Comprehensive Evaluation Method, which calculates composite scores for multiple traits in superior germplasm [
43,
55]. This method was also found to be suitable for the comprehensive evaluation of the core set. To compare distributional characteristics between the core collection and the original population, PCoA was used [
56]. The distribution of molecular markers and of morphological and physiological traits was consistent between the core collection and the original population, indicating no significant differences between the two. The highest proportion of the core collection came from HB and the lowest from SGL, which is related to population size. The proportion of each original population retained in the core collection reflects the redundancy of that population. SGL, which has the smallest proportion, exhibits the lowest genetic diversity. The higher proportion observed in LSH may be attributed to its limited population size, in addition to its rich genetic variation.
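The sampling rate quoted in this section can be related to the collection sizes with simple arithmetic. The sketch below takes the 314 clones and the 36.31% rate from the text; the core size of 114 is inferred from those two figures and is an assumption, since it is not stated explicitly in this section.

```python
def sampling_rate(core_size, total_size):
    """Return the sampling rate of a core collection as a percentage."""
    return core_size / total_size * 100


TOTAL_CLONES = 314     # size of the original collection, from the text
REPORTED_RATE = 36.31  # sampling rate (%), from the text

# Inferred core size (assumption: not stated explicitly in this section).
implied_core_size = round(REPORTED_RATE / 100 * TOTAL_CLONES)

if __name__ == "__main__":
    print(f"Implied core size: {implied_core_size}")
    print(f"Sampling rate: {sampling_rate(implied_core_size, TOTAL_CLONES):.2f}%")
```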