**3. Discussion**

To understand the genetic differentiation of forest tree populations and contribute to the development of effective breeding strategies, comprehensive evaluations of natural germplasm resources of individual species are essential; such evaluations can accelerate breeding strategies and industrial development [36,37]. Naturally, *P. koraiensis* mainly grows in the cold temperate zone, especially in northeast China, and natural forests of this species have been shown to be sensitive to climate factors. Thus, to conserve genetic resources of this species, it is important to obtain data on its genetic diversity and population structure. In present study, we conducted a population genetic analysis using codominant molecular markers, representing the first such analysis in *P. koraiensis*. The results can help guide the genetic improvement and resource conservation of this important gymnosperm.

### *3.1. Genetic Diversity*

Genetic diversity has been increasingly evaluated in species lacking a reference genome, including some conifers [38], endemic species [39] and endangered plants [40]. Studies of genetic diversity can provide insight into speciation and genetic variation within and among populations and can aid the development of conservation strategies. However, transcriptome data and molecular markers remain lacking for *P. koraiensis*; the available genetic data provide few markers suitable for the study of population genetics in this species. Evaluating the germplasm resources of this species represents the first step towards understanding the genetics of natural *P. koraiensis* populations. A high level of genetic diversity in natural *P. koraiensis* populations was detected in this study, with mean values of 10.33 and 0.521 for Na and He, respectively. High genetic diversity was observed in the Heihe population in the northern Xiaoxinganling Mountains, possibly due to less human disturbance in this region than in other areas. According to a previous study, a PIC value equal to or more than 0.5 indicates high genetic information for genetic markers. In the present study, the PIC values obtained for the multiallelic EST-SSR markers ranged from 0.142 to 0.833, with a mean value of 0.461, indicating a high level of genetic information among the 480 *P. koraiensis* individuals from the 16 natural populations. The genetic diversity of *P. koraiensis* obtained in the present study is higher than that reported for *Pinus bungeana* (Na = 3.70, He = 0.36) [41], *Pinus dabeshanensis* (He = 0.36) [42] and *Pinus yunnanensis* (Na = 4.10, He = 0.43) [43] but lower than that reported for *Pinus tabulaeformis* (Na = 6.52, He = 0.68) [44]. The genetic diversity of a species may vary with characteristics such as adaptability, pollination mechanism and population size [45–47]. The observed genetic diversity in the present study might be attributable to the genetic background, life history

and population dynamics of *P. koraiensis*. Previous studies have found that *P. koraiensis* has a large population size, long life cycle, strong adaptability, long pollination distance and large genome size, and it has a complex genetic background, which allows it to generate high genetic diversity [48–52]. Natural selection under the changing environmental conditions is likely to lead to differences in genotype frequency among populations. In addition, previous studies have suggested that evaluations of genetic diversity are limited by low numbers of populations and molecular markers [53,54]. For instance, analyses of the genetic diversity of natural *P. koraiensis* populations have been conducted using a variety of molecular marker techniques. Kim et al. [53] analyzed allozyme loci variation and found a moderate level of genetic diversity among natural *P. koraiensis* populations in Korea. The genetic diversity of natural *P. koraiensis* populations detected by allozyme markers (He = 0.18) in Russia [54] was much lower than that identified using EST-SSRs markers in this study. These findings indicate that *P. koraiensis* maintains high genetic diversity worldwide. The level of genetic diversity detected in this study is similar to that detected based on nine EST-SSR markers in seven natural populations of *P. koraiensis* in northeast China (He = 0.610) [35]. Xiaoxinganling Mountain of China was considered as the distribution center for *P. koraiensis*, possessing abundant germplasm resources and ancient founding stocks and maintaining considerable numbers of individuals. In addition, the genetic diversity of *P. koraiensis* populations from Xiaoxinganling Mountains was higher than that of the Changbaishan Mountains populations, with high expected heterozygosity and abundant private alleles found for the former populations (Figure 4). All these results indicate that Xiaoxinganling Mountains may be the center of genetic diversity of *P. koraiensis* in China.

### *3.2. Population Genetic Differentiation*

Detection of genetic differentiation is a key process in the genetic improvement of forest trees. Regarding the estimation of genetic differentiation, past studies have considered an Fst value higher than 0.15 but lower than 0.25 to indicate significant divergence [55–57]. In the present study, the genetic differentiation assessed by EST-SSRs among *P. koraiensis* populations ranged from 0.014 to 0.348, with a mean value of 0.177, indicating significant differentiation among populations in China. However, previous studies reported low genetic differentiation among populations as assessed by allozyme loci variation in Korea (Fst = 0.06) [53] and Russia (Fst = 0.015) [54] and by EST-SSRs in China (Fst = 0.02) [35]. In addition, Kim et al. [31] studied the genetic variation of *P. koraiensis* in Korea, Russia and China using allozymes and RAPDs and detected small differences among the three regions. Different degrees of genetic differentiation were observed in natural *P. koraiensis* populations in these countries, with low Fst values. The main reason for these differences is that only limited numbers of natural populations and molecular markers were analyzed. The genetic differentiation index (Fst) is correlated with gene flow (Nm). Generally, the greater the degree of differentiation, the weaker the gene flow, i.e., a lower gene migration rate among populations [57–60]. Natural *P. koraiensis* forests originated in Siberia and in northeast Asia have undergone regeneration, succession and migration over millions of years [61,62]. After the Quaternary glaciation, many species died out, but the *P. koraiensis* forests persisted into the present and underwent a range of changes and varying degrees of differentiation. In natural *P. koraiensis* populations, low levels of genetic differentiation have been observed in Korea [31], whereas high genetic differentiation has occurred in northeast China, which may have contributed to the rich *P. koraiensis* germplasm resources (representing approximately 60% of the world's total) and broad distribution area (more than 3000 hectares) in this country. The mean He (0.521) across all loci was greater than Ho (0.374), indicating a high heterozygosity among the sampled populations of *P. koraiensis*. This high heterozygosity is attributable to the fact that Pinus species exhibit cross-pollination and wind pollination. Furthermore, the AMOVA suggested that most of the genetic variation (more than 60%) in *P. koraiensis* exists within populations, with a

small proportion occurring among populations; these findings are consistent with findings in other cross-pollinating species.

### *3.3. Population Structure and Gene Flow*

Analyses of population structure can provide insight into population size, breeding system, extent of isolation and population migration or gene flow [63,64]. Furthermore, such analyses can help reveal the relationships between genetic variation and environmental stresses and enhance our understanding of evolution. Evaluating population structure is a key component of genome-wide association analysis (GWAS) and marker-assisted selection (MAS) [65]. *P. koraiensis* is mainly distributed in Xiaoxinganling and Changbaishan Mountains in northeast China, areas with a humid climate. Due to the environmental conditions, the germplasm resources of *P. koraiensis* from different locations display high phenotypic and genetic variation. The STRUCTURE analysis of population structure identified two groups (optimal K = 2) from the 16 natural populations, with 5 populations in one group and the remainder in another group. Similar results were obtained in the PCA and dendrogram (neighbor-joining tree) analysis, indicating genetic differentiation of *P. koraiensis* populations in China. Interestingly, individuals from Xiaoxinganling Mountains were clustered into one group, occupying a northern area, which makes them more like an ancestral group. Furthermore, the samples from Changbaishan Mountains and the adjacent ridge region were clustered into the other group; the populations corresponding to these samples are distributed in a southern area and exhibit different degrees of genetic differentiation and gene flow. However, some of the individuals from Xiaoxinganling Mountains were clustered into cluster 2, although the majority were clustered into cluster 1 (Figure 3A,B). The main potential reasons for this finding are as follows: (1) These two mountain regions are close to each other, and some hybridization events may occur; (2) for populations separated by a short spatial distance, the probability of gene flow is high, which will affect population genetic structure; and (3) pollen and seed dispersal occur over long distances in this species, which promote gene flow. The genetic structure of the natural *P. koraiensis* populations in China determined in this study is consistent with the current geographical distribution of these populations. Furthermore, the findings are consistent with previous studies showing that populations in similar geographical locations or environments tend to cluster into the same group [66,67]. In this study, although some admixture was detected by the STRUCTURE and PCA analyses, the population dendrogram also suggested two subgroups, with 5 populations in cluster 1 and 11 populations in cluster 2.

Gene flow among populations is closely related to geographical distance and effective population size and can generate new genetic combinations, potentially enhancing species resilience and persistence [68–70]. In plants, migration, or gene flow, is achieved via seeds, pollen and other propagules, and influences the genetic diversity and differentiation among independent evolutionary units [71,72]. We found that two genetically distinct populations (Zhanhe and Wangqing) exhibited segregation from other populations, which may be related to their geographic distance from other populations (approximately 565 km), limiting the level of gene flow between them. These independent units play an important role in maintaining the genetic diversity of this species. This interpretation is consistent with previous studies demonstrating that isolated populations of plants with long-distance pollination may have higher levels of genetic diversity than large contiguous populations [73–75]. Moreover, high levels of gene flow were found among Helong, Maoershan and Fangzheng populations. Thus, it can reduce the effects of artificial selection or genetic drift and promote the maintenance of genetic information. Similar results were obtained for *Camelina sativa* accessions [65]. Previous studies have also found that extensive gene flow can alter the gene frequencies in populations to affect genetic diversity and structure. In our study, although a strong correlation between gene flow and geographic distance between populations was observed, some degree of gene flow was also evident between geographically distant populations. In addition, geographic distance was not correlated

with genetic distance in the natural *P. koraiensis* populations in this study, suggesting that geographic distribution may not be a determinant factor for the genetic structure of populations.

### *3.4. Conservation and Management Strategies*

Evaluations of germplasm resources are needed to maintain abundant genetic variation and high levels of genetic diversity of some species of interest and establish sound conservation strategies. Our population genetic analysis revealed that the populations distributed in the Xiaoxinganling Mountains (Zhanhe, Heihe, Liangshui, Tieli and Hegang) exhibit high levels of genetic diversity and moderate levels of gene flow (Figure 4). These populations represent the core populations and have stronger environmental adaptability and evolutionary potential than the other populations, and they can be considered independent genetic units. Hence, measures such as in situ conservation should be implemented for conserving natural *P. koraiensis* resources. In addition, the marginal populations represent special germplasm resources. They are characterized by low genetic diversity but have high levels of genetic differentiation relative to the other natural populations. Habitat fragmentation can reduce gene flow among populations, leading to a loss of genetic diversity. In this study, the Helong population, which occurs in a marginal area, should be targeted for conservation measures, such as ex situ measures. In addition, the greatest level of population differentiation was observed between Helong and Liangshui populations, indicating that these populations can be considered independent units. Therefore, regulations and managemen<sup>t</sup> strategies must be established to protect the natural habitat of this species and prohibit harvest. More importantly, a national-level core germplasm resource library of *P. koraiensis* should be established by the government, with the objectives of maintaining genetic variation, improving plant adaptability to environmental changes and developing new breeding materials. Under these measures, the existing natural *P. koraiensis* populations in China can be protected and be better used as a source of resources for genetic improvement in the future.

For forest managemen<sup>t</sup> perspectives, efforts should be made to regulate timber harvesting in these population in order to reduce loss of genetic diversity. Particularly, the marginal populations, such as the Helong population, that are characterized by low genetic diversity should be given the highest managemen<sup>t</sup> priority by enrichment planting of individuals from different populations to enhance the genetic diversity within this population. Forest managemen<sup>t</sup> should also focus on suppression of wildfire in these forests as population fragmentation driven by wildfire can reduce gene flow among populations, leading to a loss of genetic diversity. In addition, these forests should be protected from pest and disease, such as the white pine blister rust diseases that affect the trees.

### **4. Materials and Methods**

### *4.1. Plant Materials and Genomic DNA Extraction*

In this study, 16 populations of *P. koraiensis* from Jilin Province (J) and Heilongjiang Province (H) were considered. A total of 480 samples were collected from these populations, which occur throughout the natural distribution areas in northeastern China (Table 5, Figure 6). The sampled populations were selected to represent the main distribution region and included the Xiaoxinanling Mountains group (G1, including populations P4, P5, P6, P7 and P8) and the Changbai Mountains group (G2, including P1, P2, P3, P9, P10, P11, P12, P13, P14, P15 and P16). The populations were distributed across the northernmost (Heihe, 49◦2447", 126◦3647.16"), southernmost (Linjiang, 42◦036", 127◦1312") and easternmost regions (Hulin, 45◦4612", 132◦5848") of natural *P. koraiensis* distribution in northeast China. The average altitude of the sampled individuals was 320 m. To obtain representative samples of populations, sampling was conducted with at least 200 m between sampled individuals. The number of sampled individuals per population ranged from 29 to 31, with an average number of 30. Fresh needle samples of *P. koraiensis* with no signs of pests or disease were collected from the middle portion of the crown and immediately frozen in

liquid nitrogen and stored at −80 ◦C for subsequent genomic DNA extraction and PCR amplification. In addition, nucleic acids were extracted from needles using the improved cetyltrimethyl ammonium bromide (CTAB) method described by Li et al. [76]. DNA quality and concentration were evaluated using 1.0% agarose gel electrophoresis and the K5500 Plus microspectrophotometer (KAIAO Technology Development Co., Ltd., Beijing, China), respectively.


**Table 5.** Summary of *Pinus koraiensis* sampled populations in NE China.

**Figure 6.** Distribution of *Pinus koraiensis* populations sampled in northeast China. G1 represents Xiaoxinganling Mountains geographical groups. G2 represents Changbaishan Mountains geographical groups.

### *4.2. PCR Amplification and SSR Analysis*

Fifteen highly polymorphic and reproducible EST-SSR markers of *P. koraiensis* developed in our laboratory were selected in this study to detect polymorphisms in the 16 sampled *P. koraiensis* populations. The primers of *P. koraiensis* were developed as described by Li et at. [23]. Eight capillary electrophoresis templates were amplified with fifteen primers synthesized by Sangon Biotech (Shanghai, China), and universal M13 sequence (5-TGTAAAACGACGGCCAGT-3) labeled with four fluorescent dyes (TAMRA, FAM, HEX and ROX) was added at the 5 end of the forward primers. DNA was diluted to a working concentration of 25 ng/μL. To detect SSR loci, polymerase chain reaction (PCR) was performed in a total volume of 20 μL containing 10 μL 2× Super PCR Mix (Beijing Genomics Institute Tech Solutions (Bejing Liuhe) Co., Ltd., Beijing, China), 2 μL template DNA, 0.8 μL forward primer (1 μM), 3.2 μL reverse primer (1 μM), 1 μL M13 primer with fluorescent label and 3 μL ddH2O. The PCR amplification conditions were as follows: 94 ◦C for 5 min followed by 30 cycles at 94 ◦C for 30 s, 57 ◦C for 30 s, and 72 ◦C for 30 s; followed by 8 cycles at 94 ◦C for 30 s, 55 ◦C for 30 s, and 72 ◦C for 30 s; followed by final extension at 72 ◦C for 10 min. The PCR products were subjected to 1.0% agarose gel electrophoresis and then analyzed by high performance capillary electrophoresis (HPCE) using an ABI 3730XL DNA Sequencer (Applied Biosystems, Foster City, CA, USA) to detect fragment size. The original sequence data were analyzed using GeneMapper (version 4.1) software.

### *4.3. Data Analysis*

GeneMapper was used to obtain the microsatellite allele data, and the Microsatellite toolkit v 3.1.14 was used to convert the data into the necessary format for analysis. The genetic diversity analysis was conducted using GENALEX software version 6.50 [77] with the following parameters: number of alleles (Na), effective number of alleles (Ne), observed (Ho) and expected (He) heterozygosity, number of rare alleles (NRA), Shannon diversity index (I), Hardy–Weinberg equilibrium (HWE), F-statistics (Fis, Fit and Fst) and Nei's genetic distance. The TBtools software [78] was used to plot the heatmap of expected heterozygosity (He). In addition, we calculated the polymorphism information content (PIC) values of each SSR primer using the PICcalc program [79]. Gene flow (Nm) was calculated as Nm = (1 − Fst)/4 × Fst and used to measure the degree of gene exchange among or within the 16 populations. ALREQUIN software (version 3.5) [80] was used to analyze the level and sources of molecular genetic variation via AMOVA based on the evolutionary distances among and within the sampled populations and the observed genetic clusters. The total genetic variation was divided into three components: among groups, among populations within groups and within populations.

To evaluate the population genetic structure of *P. koraiensis*, a Bayesian clustering algorithm was performed in STRUCTURE software (version 2.3) [81] with the following settings: K-values from 1 to 10, with ten runs per K value and a burn-in period and number of Markov chain Monte Carlo (MCMC) reps after burn-in of 100,000 iterations and 100,000, respectively. The optimal K value for the number of populations was determined based on the delta-K values calculated by the Evanno method [82], using an algorithm of the online tool of STRUCTURE HARVESTER [83], where a clear peak was observed in the plot of delta K. In addition, principal component analysis (PCA) was performed to evaluate the genetic relationships among different populations using GENALEX software version 6.50. Based on the Nei's genetic distance (1983), a Neighbor-joining (NJ) phylogenetic tree of the populations was constructed using PowerMarker software (version 3.25) [84] and annotated and visualized using the online tool interactive Tree Of Life (iTOL) [85]. Geographic distance among populations was calculated as described in Li et al.'s study [76]. Finally, to detect the gene flow among the 16 populations, a relative migration network was constructed using the 'diveRsity' [86] package of R software (version 3.5.0) [87].
