1. Introduction
The Salicaceae family comprises over 300 species of trees and shrubs found across China and is generally classified into three genera, namely
Populus,
Salix, and
Chosenia [
1]. Interestingly, the genus
Chosenia is monotypic and contains only
Chosenia arbutifolia (Pall.) A. Skv., which is distributed in the Greater and Lesser Khingan Mountains, the Changbai Mountains, and the montane regions of eastern Liaoning Province [
1]. As an ancient tree species,
C. arbutifolia has been considered a transitional form between
Populus and
Salix and was segregated into a separate genus by some botanists [
2,
3]. In Northeast China,
C. arbutifolia serves primarily as a landscape tree, owing to characteristics such as fast growth, frost tolerance, majestic architecture, attractive shape, and scarlet branches. Moreover, as an important wood source,
C. arbutifolia is also utilized for construction, furniture, and paper production [
4]. The natural populations of
C. arbutifolia have drastically decreased due to their weak regenerative capacity, excessive logging, and over-exploitation, leading to the plant’s listing as a class II endangered species among the National Key Protected Wild Plants.
As the most important determinant of biodiversity, genetic diversity is driven by mutation, natural selection, and other factors that are fundamental to species evolution and adaptation [
5]. The level of genetic diversity reflects a species' ability to adapt to environmental changes and is a decisive factor for long-term survival, especially for tree species with long life spans [
6,
7]. Therefore, the evaluation of genetic diversity and population structure is of major theoretical and practical significance for the management, development, utilization, and conservation of plant germplasms. Thus far, various efforts have been made to determine the genetic diversity and population structure of many common tree species, such as eucalypts [
8], Chinese elm [
9], oak [
10], some fruit trees [
11,
12], and conifer species [
13,
14,
15]. Furthermore, Salicaceae species, including
Populus nigra L. [
16],
Populus tomentosa Carr. [
17],
Populus deltoides Bartr. [
18],
Populus euphratica Oliv. [
19],
Populus trichocarpa Torr. & Gray [
20],
Salix viminalis L. [
21],
Salix psammophila C. [
22],
Salix purpurea L. [
23,
24], and
Salix alba L. [
25], have been extensively characterized.
Despite its endangered status and environmental significance, genetic and genomic resources available for
C. arbutifolia are scarce, with previous research having mainly focused on biological habitat, growth traits, breeding techniques, and transcriptome [
26,
27,
28]. Furthermore, only one report exploring the phylogeography and population structure of
C. arbutifolia in Japan through the analysis of chloroplast DNA markers has been published [
29]. Therefore, the available genetic markers for
C. arbutifolia are still insufficient compared to those for other Salicaceae species. In this study, a large number of single nucleotide polymorphism (SNP) markers were identified through specific locus amplified fragment sequencing (SLAF-seq). Moreover, the genetic diversity and population structure of ten
C. arbutifolia populations were evaluated in order to determine the evolutionary history of
C. arbutifolia in Northeast China. Landscape genomics analysis was also performed to identify loci potentially under selection.
4. Discussion
As the most important determinant of biodiversity, genetic diversity can be studied at different levels, including population, individual, tissue and organ, or the molecular level. Highly variable genetic markers, such as SSRs and SNPs, facilitate the evaluation of genetic diversity within and between populations. However, population size is a critical factor that can have a significant influence on the accuracy of genetic parameters [
37]. Due to unreasonable deforestation, the distribution range of natural
C. arbutifolia in China is being continuously reduced, thus decreasing the number of individual plants available for collection, which in turn accounts for the small sample size in the current study (
Table 1). In general, the larger the sample size, the more rare alleles will be detected. For some tree species, previous reports have suggested that a sample size of 25–29 [
38] or fewer than 30 [
39] per population should be sufficient. In contrast, other studies have shown that genetic diversity parameters can be accurately estimated from small samples of fewer than 10 individuals per population, as previously reported for
Quercus suber L. [
40] and
Eucalyptus occidentalis Endl. [
41]. Taken together, due to species-specific biological characteristics and different sampling strategies, there is no definitive conclusion regarding the optimal population size [
42].
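The link between sample size and rare-allele detection can be illustrated with a short calculation. The sketch below is not part of the original analysis. It assumes diploid individuals sampled at random, so that n individuals contribute 2n independent gene copies, and the function name is hypothetical.

```python
def detection_probability(allele_freq: float, n_individuals: int, ploidy: int = 2) -> float:
    """Probability that an allele appears at least once in a random sample.

    Assumption (not from the source): gene copies are sampled independently,
    so the chance of missing the allele in every copy is (1 - p) ** (ploidy * n).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie between 0 and 1")
    if n_individuals < 0 or ploidy < 1:
        raise ValueError("n_individuals must be >= 0 and ploidy >= 1")
    n_copies = ploidy * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_copies


if __name__ == "__main__":
    # Compare several per-population sample sizes for an allele at 5% frequency.
    for n in (5, 10, 30):
        print(n, round(detection_probability(0.05, n), 3))
```

Under these assumptions, an allele at a frequency of 5% would be detected with a probability of about 64% in a sample of 10 individuals and about 95% in a sample of 30, which is consistent with the general expectation that larger samples recover more rare alleles.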
H
e is commonly considered an essential measure for genetic diversity, and the larger it is, the lower the population consistency, and the higher the genetic diversity. In the present study, the H
e value ranged from 0.3428 to 0.3559, with an average value of 0.3505, which is considerably lower than that of
Salix arbutifolia Pall. in Japan (mean H
e = 0.6026) [
29]. Moreover, the current H
e value was also lower than that obtained for
Salix species, such as
S. viminalis (mean H
e = 0.616) [
21],
S. psammophila (mean H
e = 0.689) [
22],
S. purpurea (mean H
e = 0.7365) [
43],
Salix eriocephala Michx. (mean H
e = 0.6857) [
43], as well as
Populus species, such as
P. tomentosa (mean H
e = 0.446) [
17],
P. deltoides (mean H
e = 0.487) [
18] and
Populus simonii Carr. (mean H
e = 0.677) [
44]. However, the H
e values in all of these reports were calculated using SSR markers, which are multi-allelic, highly variable, and widely dispersed across the genome. Compared with other studies that also used SNP markers, the H
e value estimated herein was much higher than that previously reported for
S. purpurea (mean H
e = 0.2301) [
23], while comparable to those for
Ulmus parvifolia Jacq. (mean H
e = 0.3315) [
9] and
Elaeis guineensis Jacq. (mean H
e from 0.29 to 0.33) [
45]. Furthermore, the Shannon-Wiener index (I) and polymorphism information content (PIC) values measured based on SNP markers in this study (mean I = 0.5258, mean PIC = 0.2810) were considerably lower than those determined for
S. psammophila (mean I = 1.345, mean PIC = 0.714) [
22] and
P. tomentosa (mean I = 0.80, mean PIC = 0.385) [
17] with SSR markers, yet consistent with values reported for
U. parvifolia (mean I = 0.5041, mean PIC = 0.2686) [
9] and
E. guineensis (PIC from 0.23 to 0.269) [
45], which were also based on SNP markers. Therefore, we believe that higher values of genetic parameters, such as the number of effective alleles, heterozygosity, and the Shannon-Wiener index, reflect greater allelic variation in the molecular markers used, indicating that marker type and informativeness strongly influence the estimation of genetic parameters. As a result, the true features of population genetic diversity might be most accurately revealed through the selection of more polymorphic markers.
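The dependence of these parameters on marker type can be made concrete with the standard per-locus formulas. The sketch below is illustrative only: it assumes the usual definitions (expected heterozygosity as one minus the sum of squared allele frequencies, the Shannon index with natural logarithms, and PIC as defined by Botstein et al.), without the sample-size corrections that the software used in a given study may apply. The function name and the ten-allele SSR locus are hypothetical.

```python
import math
from itertools import combinations


def diversity_indices(freqs):
    """Return (He, I, PIC) for one locus from its allele frequencies.

    He  = 1 - sum(p_i^2)
    I   = -sum(p_i * ln p_i)
    PIC = He - sum over i<j of 2 * p_i^2 * p_j^2
    """
    if any(f < 0 for f in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    he = 1.0 - sum(f * f for f in freqs)
    shannon = -sum(f * math.log(f) for f in freqs if f > 0)
    pic = he - sum(2.0 * (a * a) * (b * b) for a, b in combinations(freqs, 2))
    return he, shannon, pic


if __name__ == "__main__":
    # A biallelic SNP at its most informative state (both alleles at 50%).
    print("SNP :", diversity_indices([0.5, 0.5]))
    # A hypothetical SSR locus with ten equally frequent alleles.
    print("SSR :", diversity_indices([0.1] * 10))
```

For a biallelic SNP these definitions give upper bounds of 0.5 for He, about 0.693 for I, and 0.375 for PIC, whereas a multi-allelic SSR locus can approach much higher values. This is why SNP-based estimates are best compared with other SNP-based estimates rather than with SSR-based ones.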
Changes in population structure, which are mainly affected by mutation, mating system, genetic drift, and selection, can be reflected by F statistics, namely the inbreeding coefficient within populations (F
IS), the total inbreeding coefficient (F
IT), and the fixation index among populations (F
ST). In this study, all the F
IS values were negative, which is consistent with the report on
P. tomentosa [
17], indicating that a significant excess of heterozygotes exists among these populations across the entire natural distribution area in Northeast China. Similarly, negative F
IS values were observed in some populations of S.
purpurea [
23] and
S. arbutifolia [
29], which is more likely due to interspecific hybridization or hybridization of inbred cultivars. In contrast, some Japanese populations of
S. arbutifolia showed positive values of F
IS, suggesting a deficiency of heterozygotes, which might result from bi-parental inbreeding [
29]. It is plausible that the population structure is affected by the geographical disjunction and habitat quality of these populations in Japan. However, for willows, the vegetative reproductive strategy is another critical factor contributing to an excess of homozygotes, as observed for
S. purpurea and
S. eriocephala [
43]. Moreover, null alleles at SSR loci may also contribute to positive values of F
IS [
44]. The pairwise F
ST values in our study (from 0.0068 to 0.3063) indicated substantial differentiation among some provenances, which is comparable with data for S.
purpurea (from 0.064 to 0.420) [
23] and
S. arbutifolia (from 0.01 to 0.41) [
29], but much higher than that reported for
S. viminalis (from 0.040 to 0.119) [
21],
S. psammophila (from 0.008 to 0.016) [
22],
S. alba (F
ST = 0.07) [
25], and several
Populus species [
44]. As described by Wright [
46], F
ST > 0.25 represents an enormous genetic differentiation, 0.25 > F
ST > 0.15 indicates a comparatively large genetic differentiation, 0.15 > F
ST > 0.05 shows a moderate genetic differentiation, and F
ST < 0.05 is negligible. According to these criteria, large genetic differentiation was found for eight provenance combinations, among which were HZ and CB, as well as JGDQ and CB (
Table 4). Among the eight combinations, each of the six provenances from Heilongjiang Province was paired with CB, the geographically most distant provenance, located in Jilin Province (
Figure 1). Such high-level differentiation was also observed in the phylogenetic tree (
Figure 3a). The lowest F
ST value was detected between XL and HZ (0.0068), which belong to the same administrative division as Jiagedaqi. Similar results were also observed for S.
purpurea [
23]. This suggests that gene flow was impeded by pollen competition among the local male individuals. Taken together, this evidence indicates that the genetic differentiation of
C. arbutifolia in China is significantly associated with geographic distribution.
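The F statistics discussed above can be summarized in a brief sketch. It assumes the common definition of the within-population inbreeding coefficient as one minus the ratio of observed to expected heterozygosity, and it encodes the Wright thresholds exactly as quoted in the text. The function names are hypothetical, and because the quoted criteria use strict inequalities, assigning boundary values (0.05, 0.15, 0.25) to the lower class is an assumption.

```python
def inbreeding_coefficient(h_obs: float, h_exp: float) -> float:
    """F_IS = 1 - Ho / He.

    Negative values indicate an excess of heterozygotes,
    positive values a deficiency.
    """
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


def classify_fst(fst: float) -> str:
    """Label an F_ST value using the thresholds quoted from Wright in the text."""
    if fst > 0.25:
        return "enormous"
    if fst > 0.15:
        return "comparatively large"
    if fst > 0.05:
        return "moderate"
    return "negligible"  # boundary handling is an assumption, see lead-in


if __name__ == "__main__":
    # The two extremes of the pairwise F_ST range reported in this study.
    for value in (0.0068, 0.3063):
        print(value, classify_fst(value))
    # Hypothetical heterozygosities, chosen only to show a negative F_IS.
    print(round(inbreeding_coefficient(0.40, 0.35), 4))
```

Applied to the extremes of the range reported here, 0.0068 falls in the negligible class and 0.3063 in the highest class.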
Natural selection drives species adaptation to the environment through genetic changes between generations [
47]. The variation in gene frequency among populations facilitates phenotypic differentiation [
48]. Thus, the identification of genes or genomic loci associated with natural selection is of major relevance for protecting germplasm resources. In our study, out of the 105,857 efficient SNP markers, only 18 were related to environmental variables, of which five had specific functions, and one was uncharacterized. The Dicer protein was initially described in animals; its ortholog in plants is the Dicer-like (DCL) protein. Research on the function of DCL has focused on the presence and function of siRNAs, which are major regulators of growth, development, and resistance to biotic or abiotic stress [
49]. Similarly, as a prominent signaling molecule, mitogen-activated protein kinase (MAPK) participates in the transduction of developmental and environmental signals into programmed and adaptive responses in plants, thereby regulating gene expression [
50]. While sulfite oxidase genes were previously cloned in
Arabidopsis thaliana,
Solanum tuberosum, and
Populus, functional research has mostly concentrated on
A. thaliana, where sulfite oxidase has been reported to protect plants against sulfur dioxide [
51]. There are few reports on the function of prefoldin subunit 5 and hippocampus abundant transcript-like protein in plants, which should be addressed in further studies.