**Genetic and Phenotypic Variation in Tree Crops Biodiversity**

Editor

**Gaetano Distefano**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Gaetano Distefano Università degli Studi di Catania, Italy

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Forests* (ISSN 1999-4907) (available at: https://www.mdpi.com/journal/forests/special_issues/Tree_Crops_Biodiversity).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-0638-8 (Hbk) ISBN 978-3-0365-0639-5 (PDF)**

Cover image courtesy of Gaetano Distefano

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.




### **About the Editor**

**Gaetano Distefano** is a researcher at the Department of Agriculture, Food and Environment of the University of Catania (Italy). During his research career, he has collaborated with several research centers, including the CSIC of the University of Zaragoza (Spain), to study aspects of *Citrus* reproductive biology; the University of New England (Australia), to characterize fruit tree molecular markers by HRM (high-resolution melting) analysis; and Oxford Brookes University (UK) and Hunan Agricultural University (China), to study specific techniques of genetic transformation and gene expression analysis. He has been lecturing on the genetic improvement of fruit tree crops at the University of Catania since 2012. Most of his studies have focused on gene identification and expression quantification in *Citrus* reproductive biology and fruit ripening. He also has a solid background in histology for flower biology studies. He has also worked on biodiversity characterization and phyletic relationships among *Citrus* species and related genera, and on the molecular characterization (AFLP, SSR and SNP) of several Mediterranean fruit tree species, such as carob, pomegranate, olive, almond, pear, cactus pear and grape, developing new markers and new methods of analysis.

### *Editorial* **Research and Application of Molecular and Phenotypic Data for Tree Biodiversity Evaluation**

**Gaetano Distefano**

Department of Agriculture, Food and Environment, University of Catania, Via Valdisavoia 5, 95123 Catania, Italy; distefag@unict.it

The main challenges for tree crop improvement are linked to the sustainable development of agro-ecological habitats, to improving adaptability to limiting environmental factors and resistance to biotic stresses, and to promoting novel genotypes with improved agronomic traits.

The exploitation of plant genetic resources, together with their conservation and sustainable management, is key to guaranteeing environmental and food security for future generations, since it will determine the ability of crops to adapt to diverse conditions. Significant advances in the application of molecular tools to characterize and conserve genetic diversity in tree crops have taken place in recent decades.

Molecular marker technologies are being increasingly used to explore genetic structure and function, and to provide high-resolution profiling of nucleotide variation within tree crop germplasm collections. Advances in DNA-derived data and innovative phenotyping are bridging the genotype-to-phenotype gap in tree crop selection.

In this Special Issue, fourteen original articles and one review provide a brief overview of the latest advances in the genetic and phenotypic trait characterization of tree crops.

Genetic characterization of different tree crops was performed by studying S-allele segregation, epi-markers, and molecular markers.

The study by Bennici et al. [1] was conducted on an Italian pear (*Pyrus communis*) germplasm collection composed of varieties selected over the last two centuries for their traits of agronomic interest, complemented with wild related species (*P. pyrifolia*, *P. amygdaliformis*) for *S*-allele genotyping. The results shed light on the differences between the traditional varieties and the wild related species, with the local accessions showing a more heterogeneous distribution of *S*-alleles, likely reflecting a more complex history of hybridization.

Tuisima-Coral et al. [2] used amplified fragment length polymorphism (AFLP) fingerprints to estimate the genetic diversity of *Guazuma crinita*, a fast-growing timber tree species. The analysis of molecular variance showed higher genetic diversity within than among provenances. Cluster analysis and principal coordinate analysis revealed neither a correspondence between genetic and geographic distance nor significant genetic differentiation among population types.

Ma et al. [3] detected epimarkers using a methylation-sensitive amplification polymorphism (MSAP) method and performed epimarker–trait association analysis on the basis of nine growth and wood property traits within populations of 432 genotypes of *Populus tomentosa*. Tree height was positively correlated with relative full-methylation level, suggesting that changes in DNA methylation might contribute to regulating tree growth and wood property traits.

The study by Rollo et al. [4] assessed the genetic diversity and structure of the ice-cream bean (*Inga edulis* Mart.; Fabaceae) in wild and cultivated populations from the Peruvian Amazon. The average pod length of cultivated trees was significantly greater than that of wild trees. The expected genetic diversity and the average number of alleles were higher in wild than in cultivated populations, confirming a loss of genetic diversity in the cultivated populations.

**Citation:** Distefano, G. Research and Application of Molecular and Phenotypic Data for Tree Biodiversity Evaluation. *Forests* **2021**, *12*, 564. https://doi.org/10.3390/f12050564

Received: 14 April 2021 Accepted: 25 April 2021 Published: 30 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Lee et al. [5] characterized 410 tea accessions collected from South Korea using 21 simple sequence repeat (SSR) markers. The study revealed relatively low diversity and a simple population structure in Korean tea germplasm, suggesting that more attention should be paid to collecting and conserving new tea individuals in order to broaden the genetic variation of new cultivars in future tea breeding.

Particular interest was paid to novel plastid genome sequencing methods, combined with molecular markers or phenotypic traits, as a tool to better define the phylogenies and domestication processes of several forest tree species.

In particular, Su et al. [6] studied the phenotypic traits of four endemic *Ilex* species (*I. latifolia*, *I. suaveolens*, *I. viridis*, and *I. micrococca*) on Mount Huangshan, China. A comprehensive comparison of the plastomes of eight *Ilex* species revealed incongruence with the traditional taxonomy, while showing a strong association between clades and geographic distribution.

Zeng et al. [7] used chloroplast DNA sequences and 29 nuclear microsatellite markers to investigate *Castanopsis* × *kuchugouzhui*, confirming that it is a rare natural hybrid between *C. sclerophylla* and *C. tibetana*.

The study by Guo et al. [8] was based on the first collection of cultivated *Paeonia rockii* (flare tree peony, FTP) germplasm across its main distribution area. Using phenotypic traits, expressed sequence tag-simple sequence repeat (EST-SSR) markers, and chloroplast DNA (cpDNA) sequences, the authors showed that the selected accessions fully reflect the genetic background of FTP germplasm resources; their protection and utilization would therefore be of great significance for the genetic improvement of woody peonies.

Deng et al. [9] carried out two different studies on *Michelia shiluensis* (Magnoliaceae), a rare and endangered magnolia species found in South China. The first study examined the genetic diversity of *M. shiluensis*: high genetic diversity and low differentiation were detected using eight nuclear simple sequence repeat (nSSR) markers and two chloroplast DNA (cpDNA) markers.

The second study relied on the complete chloroplast genome sequencing of *M. shiluensis* [10]. The genome is 160,075 bp in length and comprises two inverted repeat regions (26,587 bp each), a large single-copy region (88,105 bp), and a small single-copy region (18,796 bp). The genomic information presented in this study is valuable for further classification and phylogenetic studies and supports ongoing conservation efforts.
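The reported region lengths can be checked against the total under the usual quadripartite plastome model (one large and one small single-copy region separated by two identical inverted repeats). This is a worked arithmetic check on the published figures, not part of the original study:

$$
\begin{aligned}
L_{\text{total}} &= L_{\text{LSC}} + L_{\text{SSC}} + 2\,L_{\text{IR}} \\
&= 88{,}105 + 18{,}796 + 2 \times 26{,}587 \\
&= 106{,}901 + 53{,}174 = 160{,}075\ \text{bp}
\end{aligned}
$$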

Yu et al. [11] reported the complete chloroplast genomes of five species of *Acer* sect. *Platanoidea*. The chloroplast genomes ranged in length from 156,262 bp to 157,349 bp, and structural variation was detected at the inverted repeat (IR) boundaries. The phylogeny of sect. *Platanoidea*, with high resolution for nearly all identified nodes, suggests a promising opportunity to resolve the infrasectional relationships of *Platanoidea*, the most species-rich section of *Acer*.

On the other hand, phenotypic traits were evaluated to select improved individuals or traits.

The study by Kim et al. [12] was conducted to select plus trees of two evergreen oaks, *Quercus salicina* and *Q. glauca*, in Korea. To select the candidate trees, they developed a subjective grading system with six characteristics in three categories and introduced a weighted generalized value (GVIw) to compare the superiority of the candidate trees. Through this process, 44 candidate trees of *Q. salicina* and 41 candidate trees of *Q. glauca* were selected as plus trees.

The study by Baniulis et al. [13] examined geographically distant Scots pine (*Pinus sylvestris* L.) populations adapted to specific photoperiods and temperature gradients, which vary markedly in the timing of growth and in adaptive traits. Constitutive differences in protein expression detected during active growth were associated with cell metabolism and stress response, and indicated population-specific adaptation to the distinct climatic conditions.

Dinulică et al. [14] measured the acoustic properties of wood to identify simple, valid diagnostic criteria; such diagnosis remains an exciting challenge when selecting materials for manufacturing musical instruments. The results showed that spruce trees whose acoustic and structural features match the requirements for violin manufacture have a bark phenotype distinguishable by color as well as by scale shape. The south-facing side of the trunk and the external side of the scales were best for identifying resonance trees by their bark. The authors concluded that the differences among bark phenotypes were noticeable to the naked eye.

Zhao et al. [15] reviewed leaf color mutants, which are ideal materials for studying pigment metabolism, chloroplast development and differentiation, photosynthesis, and other pathways, and which could also provide important information for improving varietal selection. The authors summarized research on leaf color mutants, including the functions and mechanisms of leaf color mutant-related genes affecting chlorophyll synthesis, chlorophyll degradation, chloroplast development, and anthocyanin metabolism. The review provides a reference for the future study and application of leaf color mutants.

The aims of this Special Issue were to contribute to the growth of this area of research and to stimulate research interest in biodiversity conservation and valorization, by adding genetic and phenotypic information scientifically substantiated with new data.

We would like to thank all authors and reviewers of the papers published in this Special Issue for their great contributions and efforts. We are also grateful to the editorial board members and to the staff of the Journal for their kind support in the preparation steps of this Special Issue.

**Funding:** This research received no external funding.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


### *Article* **Deciphering** *S***-RNase Allele Patterns in Cultivated and Wild Accessions of Italian Pear Germplasm**

**Stefania Bennici 1,†, Mario Di Guardo 1,†, Gaetano Distefano 1,\*, Giuseppina Las Casas 2, Filippo Ferlito 2, Paolo De Franceschi 3, Luca Dondini 3, Alessandra Gentile 1,4 and Stefano La Malfa 1**


Received: 4 November 2020; Accepted: 19 November 2020; Published: 22 November 2020

**Abstract:** The genus *Pyrus* is characterized by an *S*-RNase-based gametophytic self-incompatibility (GSI) system, a mechanism that promotes outbreeding and prevents self-fertilization. While the *S*-genotype of the most widely known pear cultivars has already been described, little is known about the *S*-allele variability within local accessions. The study was conducted on 86 accessions encompassing most of the local Sicilian varieties selected for their traits of agronomic interest, complemented with some accessions of related wild species (*P. pyrifolia* Nakai, *P. amygdaliformis* Vill.) and some national and international cultivars used as references. The use of consensus and specific primers enabled the detection of 24 *S*-alleles combined in 48 *S*-genotypes. Results shed light on the distribution of the *S*-alleles among accessions, with wild species and international cultivars characterized by high diversity and local accessions showing a more heterogeneous distribution of the *S*-alleles, likely reflecting a more complex history of hybridization. The *S*-allele distribution was largely in agreement with the genetic structure of the studied collection. In particular, the "wild" genetic background was often characterized by the same *S*-alleles detected in *P. pyrifolia* and *P. amygdaliformis*. The analysis of the *S*-allele distribution provided novel insight into the contribution of the wild and international cultivars to the genetic background of the local Sicilian or national accessions. Furthermore, these results provide information that can be readily employed by breeders for the set-up of novel mating schemes.

**Keywords:** *S*-genotyping; *S*-locus; *P. communis*; *P. pyrifolia*; *P. amygdaliformis*; genetic structure

#### **1. Introduction**

European pear (*Pyrus communis* L.) is an economically important fruit tree species belonging to the Rosaceae family. Like most Rosaceae, the genus *Pyrus* exhibits the *S*-RNase-based gametophytic self-incompatibility (GSI) system, which evolved in flowering plants to prevent self-fertilization and promote outbreeding [1]. The GSI system prevents self-fertilization through a specific pollen–pistil recognition that selectively inhibits the growth of pollen tubes recognized by the pistil as "self" (i.e., pollen from the same plant or from genetically related individuals) [2]. The GSI system is controlled by the single, multigenic, and multiallelic *S*-locus, which expresses a female determinant in the style, the stylar ribonuclease (*S*-RNase) [3], and a male determinant in the pollen, a pool of F-box proteins known as SFBB (*S*-locus F-box brothers) [4]. In the GSI system, a match between the male *S*-determinant carried by the haploid genome of the pollen grain and one of the two *S*-alleles carried by the diploid genome of the stylar tissue (self-recognition) arrests pollen tube growth; the arrest is triggered by the "self" *S*-RNase, which acts as a cytotoxin and possibly activates programmed cell death. Conversely, in non-self-recognition, the *S*-RNase is inactivated by a specific F-box protein through a proteolytic degradation mechanism [5]. Self-incompatibility is generally considered an undesired trait, especially in cultivated species in which successful fertilization is essential for fruit set. Knowledge of the *S*-genotype is therefore crucial for choosing pollinizer cultivars when establishing new orchards and for designing breeding programs. Traditionally, the degree of compatibility between cultivars was determined directly in the field through controlled crosses; however, this approach is time-consuming and often unreliable in discriminating between fully and semi-compatible combinations. The advent of molecular biology techniques enabled both the sequencing of the gene coding for the *S*-RNase and the development of molecular markers, allowing fast and relatively inexpensive screening of *S*-genotypes in germplasm of interest [6]. The *S*-RNase gene comprises five conserved regions (C1, C2, C3, RC4, and C5) and the highly conserved noncanonical hexapeptide (IIWPNW); moreover, the hypervariable region (RHV), which harbors a highly polymorphic intron, is located between C2 and C3.
This RHV region has been widely exploited to assess *S*-locus diversity in European pear by cloning and sequencing PCR products amplified with universal primers designed on conserved regions [6–16]. Sanzol [14] developed a PCR-based method for detecting 20 *S*-RNase alleles in European pear using consensus primers that simultaneously amplify a large number of alleles characterized by different intron sizes, plus a set of allele-specific primers. Subsequently, further primer pairs were developed by Nikzad Gharehaghaji and colleagues [16], allowing the detection of additional European pear alleles, some of which were highly similar to, and possibly derived from, other *Pyrus* species.
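The recognition rule described above (pollen is arrested when its single *S*-allele matches either stylar allele) is what lets a known *S*-genotype predict fully versus semi-compatible crosses without field trials. The sketch below encodes that rule under a deliberately simplified model (diploid, no self-compatible mutants or polyploidy); it is an illustration, not the authors' method, and the allele names serve only as labels.

```python
"""Sketch: classifying a cross from the parents' S-genotypes under GSI.

Simplified model (an assumption, not the paper's method): a haploid pollen
grain is rejected when its single S-allele matches either S-allele of the
diploid style. Allele names are used only as labels.
"""


def functional_pollen(female: tuple[str, str], male: tuple[str, str]) -> set[str]:
    """Return the male S-alleles whose pollen tubes escape "self" recognition."""
    style_alleles = set(female)
    return {allele for allele in male if allele not in style_alleles}


def classify_cross(female: tuple[str, str], male: tuple[str, str]) -> str:
    """Label a female x male cross as fully compatible, semi-compatible, or incompatible."""
    functional = functional_pollen(female, male)
    if not functional:
        return "incompatible"      # every pollen genotype is recognized as "self"
    if functional == set(male):
        return "fully compatible"  # no S-allele shared with the style
    return "semi-compatible"       # one shared allele: about half the pollen is arrested


print(classify_cross(("PcS101", "PcS103"), ("PcS104", "PcS105")))  # fully compatible
print(classify_cross(("PcS101", "PcS103"), ("PcS101", "PcS105")))  # semi-compatible
print(classify_cross(("PcS101", "PcS103"), ("PcS103", "PcS101")))  # incompatible
```

In this model, a semi-compatible cross still sets fruit, but only the pollen carrying the non-shared allele contributes, which is why field crosses alone struggle to separate the two compatible classes.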

Sicily is characterized by wide pear biodiversity; in this respect, Mount Etna represents an ideal reservoir of different local accessions owing to its different microclimates, soils, and orographic conditions, combined with an ancient history of cultivation and natural seed-based propagation. Moreover, the geographic location of Sicily in the Mediterranean Sea and its historical involvement in commercial exchanges may have favored the introgression of several traits of agronomic interest from different pear species. This wide pear biodiversity includes autochthonous and wild pears such as *P. amygdaliformis* Vill. and *P. pyraster* (L.) Burgsd. (*P. communis* ssp. *pyraster* L.), which were widely employed as rootstocks in past centuries to increase the hardiness and longevity of the trees [17]. *P. amygdaliformis* is native to the Mediterranean region and is highly tolerant to drought stress [18]. *P. pyraster* originates from the western Black Sea region, with a distribution area spanning from the British Isles to Latvia, and it is believed to be one of the most probable ancestors of European pear [19,20].

A previous study of genetic structure, conducted largely on the same accessions as the present study, revealed two subpopulations that can be attributed to a "wild" and a "cultivated" genetic status. Within the Sicilian local germplasm, only a small number of accessions showed a high admixture of the two subpopulations, while most were characterized by a clear prevalence of one of them [17]. Against this genetic background, identifying the *S*-allele distribution within the subpopulations can provide valuable insights into the relationship and gene flow between wild and cultivated Sicilian pears.

In the present work, the PCR-based *S*-genotyping method described by Sanzol [14] was used to (i) ascertain the *S*-RNase composition of local Sicilian pear varieties and native wild accessions collected from the Mount Etna area, in comparison with national and international varieties used as references; (ii) determine the distribution pattern of the *S*-alleles among the different groups of accessions; and (iii) compare the *S*-allele distribution in genotypes characterized by "wild" and "cultivated" genetic backgrounds.

#### **2. Materials and Methods**

#### *2.1. Plant Material and DNA Extraction*

The analyzed germplasm consisted of 86 accessions: 43 local varieties (LV), 16 individuals belonging to wild related species (RS) (nine *P. pyraster* and seven *P. amygdaliformis*, all collected from the Mount Etna area, Italy), 18 nationally cultivated varieties (NCV), and nine internationally cultivated varieties (ICV) (Table 1). Accessions were maintained in two ex situ germplasm collections in the Catania district (Sicily, southern Italy): the experimental farm of the University of Catania (UNICT, 10 m above sea level (a.s.l.)) and the Germplasm Bank of "Parco dell'Etna" (Mt. Etna, 850 m a.s.l.).

Genomic DNA was extracted from fresh leaves using ISOLATE II Plant DNA Kit (Bioline, Meridian Life Science, Memphis, TN, USA) following the protocol provided by the manufacturer. The concentration and the purity of the extracted DNA were assessed using a Nanodrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA) and agarose gel electrophoresis.

#### *2.2. S-Genotyping Assay*

The *S*-genotyping assay was performed according to the protocol described by Sanzol [14], using a pair of consensus primers, PycomC1F and PycomC5R [12], and 17 specific primers able to discriminate among 23 different *S*-RNase alleles (Table S1, Supplementary Materials). The assay was supplemented with additional allele-specific primer pairs for alleles PcS126 and PcS127 (Table S1, Supplementary Materials), which were developed ad hoc in this study from the genomic sequences of PcS126 (accession number KF588567) and PcS127 (accession number KF588568) using Primer-BLAST [21]. The nine ICVs and the NCVs "Bella di Giugno", "Coscia", and "Gentile", which have a known *S*-genotype, were used as controls (Table S2, Supplementary Materials).

Consensus PCR was performed in a 20 μL volume containing 40 ng of genomic DNA, 1× PCR buffer II, 2 mM magnesium chloride, 0.2 mM dNTPs, 0.6 μM each primer (PycomC1F and PycomC5R), and 1 U of MyTaq DNA polymerase (Bioline, Meridian Life Science, Memphis, TN, USA). Amplification was conducted using a program with an initial denaturation at 94 °C for 10 min, followed by 35 cycles at 94 °C for 30 s, 57 °C for 45 s, and 72 °C for 2 min, with a final cycle of 72 °C for 7 min.

Allele-specific PCR was performed in a 20 μL volume containing 40 ng of genomic DNA, 1× PCR buffer II, 2 mM magnesium chloride, 0.2 mM dNTPs, from 0.3 to 0.6 μM each primer (Table S1, Supplementary Materials), and 1 U of MyTaq DNA polymerase (Bioline, Meridian Life Science, Memphis, TN, USA). Amplification was conducted using a program with an initial denaturation at 94 °C for 10 min, followed by 35 cycles at 94 °C for 30 s, 58 °C for 45 s, and 72 °C for 1 min, with a final cycle of 72 °C for 7 min.
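The reaction compositions above give final concentrations only; the pipetting volumes depend on the stock concentrations actually on the bench. As a minimal sketch, the dilution relation C1·V1 = C2·V2 converts the consensus-PCR recipe into per-reaction volumes. Every stock concentration below (10× buffer, 25 mM MgCl2, 10 mM dNTPs, 10 μM primers, 5 U/μL polymerase, 20 ng/μL DNA) is a hypothetical placeholder, not a value from the paper.

```python
"""Sketch: per-reaction pipetting volumes for the 20 uL consensus PCR.

Uses C1 * V1 = C2 * V2. Every stock concentration below is a hypothetical
placeholder (not taken from the paper); replace with the real reagent labels.
"""

REACTION_UL = 20.0

# name: (final concentration, stock concentration) in matching units (x, mM or uM)
CONCENTRATIONS = {
    "PCR buffer II (assumed 10x stock)": (1.0, 10.0),
    "MgCl2 (assumed 25 mM stock)": (2.0, 25.0),
    "dNTPs (assumed 10 mM stock)": (0.2, 10.0),
    "forward primer (assumed 10 uM stock)": (0.6, 10.0),
    "reverse primer (assumed 10 uM stock)": (0.6, 10.0),
}

# name: (amount per reaction, stock amount per uL), e.g. U and U/uL, ng and ng/uL
AMOUNTS = {
    "DNA polymerase, 1 U (assumed 5 U/uL stock)": (1.0, 5.0),
    "genomic DNA, 40 ng (assumed 20 ng/uL dilution)": (40.0, 20.0),
}


def reaction_volumes(total_ul: float = REACTION_UL) -> dict[str, float]:
    """Return the volume (uL) of each component, topped up with water."""
    volumes = {name: final * total_ul / stock  # V1 = C2 * V2 / C1
               for name, (final, stock) in CONCENTRATIONS.items()}
    volumes.update({name: amount / per_ul for name, (amount, per_ul) in AMOUNTS.items()})
    used = sum(volumes.values())
    if used > total_ul:
        raise ValueError("stocks are too dilute for this reaction volume")
    volumes["nuclease-free water"] = total_ul - used
    return volumes


for component, ul in reaction_volumes().items():
    print(f"{component:48s} {ul:5.2f} uL")
```

For the allele-specific reactions, the same calculation applies with the primer final concentrations changed to the 0.3 to 0.6 μM listed per primer pair in Table S1.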

Amplicons were separated by gel electrophoresis in 1% agarose stained with SYBR Safe DNA gel stain (Invitrogen, Carlsbad, CA, USA). Image acquisition and fragment size estimation were performed using Image Lab™ software with the Gel Doc™ XR+ system (Bio-Rad Molecular Imager®, Hercules, CA, USA).
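The paper does not describe how the imaging software estimates fragment sizes. As a hedged illustration of the general principle only, the sketch below interpolates a band's size on a log10(bp) scale between the flanking bands of a DNA ladder (point-to-point semi-log interpolation), since fragment size is roughly log-linear in migration distance. The ladder sizes and distances are hypothetical placeholders, and this is not claimed to be Image Lab's algorithm.

```python
"""Sketch: estimating a band's size from a DNA ladder on an agarose gel.

Fragment size is roughly log-linear in migration distance, so an unknown band
is interpolated on a log10(bp) scale between the flanking ladder bands. The
ladder sizes and distances are hypothetical placeholders.
"""
import numpy as np

LADDER_BP = np.array([3000, 2000, 1500, 1000, 750, 500, 250], dtype=float)
LADDER_MM = np.array([12.0, 17.5, 21.5, 27.0, 31.0, 36.5, 45.0])  # must increase


def estimate_size_bp(distance_mm: float) -> float:
    """Point-to-point semi-log interpolation between ladder bands."""
    if not LADDER_MM[0] <= distance_mm <= LADDER_MM[-1]:
        raise ValueError("band lies outside the ladder range; extrapolation is unreliable")
    log_size = np.interp(distance_mm, LADDER_MM, np.log10(LADDER_BP))
    return float(10 ** log_size)


print(round(estimate_size_bp(27.0)))  # 1000: a ladder band is returned exactly
print(round(estimate_size_bp(29.0)))  # a size between 750 and 1000 bp
```

Interpolating between neighbouring bands, rather than fitting one line to the whole ladder, tolerates the curvature that agarose gels show at the extremes of the separation range.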

#### *2.3. DNA Sequencing and Allele Identification*

Consensus PCR products were excised from the agarose gel and purified using ISOLATE II Nucleic Acid Isolation Kits (Bioline, Meridian Life Science, Memphis, TN, USA) following the protocol provided by the manufacturer. Purified products were sequenced in both the forward and reverse directions, using primers PycomC1F1 and PycomC5R1, on an ABI310 genetic analyzer (Applied Biosystems, Foster City, CA, USA).


**Table 1.** *S*-Genotypes and features of accessions employed in this study.



Accessions showing a prevalence (≥0.8) of subpopulation Q1 or Q2 were classified as "cultivated" or "wild", respectively; accessions showing no clear predominance of either of the two subpopulations were classified as "admixed", while samples that were not genotyped were scored as not applicable (NA). \* Reference genotypes.

#### *2.4. Clustering*

*S*-RNase alleles identified for each genotype were converted into a binary matrix and used to compute a principal component analysis (PCA) on the basis of a dissimilarity matrix, performed using the statistical package R [22].
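The paper does not detail the R functions used for this step. One standard route, shown below as a minimal sketch, is to build the presence/absence matrix (accessions by alleles) and compute PCA through the singular value decomposition of the column-centred matrix; a principal coordinate analysis of a dissimilarity matrix would be an alternative. The accessions and their alleles are hypothetical, and the code is not necessarily the authors' workflow.

```python
"""Sketch: PCA of a binary S-allele presence/absence matrix.

Assumptions: the toy accessions below are hypothetical, and PCA is computed
via SVD of the column-centred matrix. This is one standard route, not
necessarily the R workflow used in the study.
"""
import numpy as np

ALLELES = ["PcS101", "PcS103", "PcS104", "PcS126", "PcS127"]
GENOTYPES = {  # hypothetical accession -> detected S-alleles
    "acc_1": {"PcS101", "PcS103"},
    "acc_2": {"PcS101", "PcS104"},
    "acc_3": {"PcS103", "PcS127"},
    "acc_4": {"PcS126", "PcS127"},
    "acc_5": {"PcS104", "PcS126"},
    "acc_6": {"PcS101"},  # only one allele detected
}


def presence_absence(genotypes: dict[str, set[str]], alleles: list[str]) -> np.ndarray:
    """Rows = accessions, columns = alleles; 1 if the allele was detected."""
    return np.array([[float(a in found) for a in alleles] for found in genotypes.values()])


def pca(x: np.ndarray, n_components: int = 2):
    """Return scores, loadings, and explained-variance ratios of the first PCs."""
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]  # accession coordinates (Dim1, Dim2)
    loadings = vt[:n_components].T                     # allele directions, as drawn in a biplot
    return scores, loadings, explained[:n_components]


x = presence_absence(GENOTYPES, ALLELES)
scores, loadings, explained = pca(x)
print(np.round(100 * explained, 1))  # % of total variance captured by Dim1 and Dim2
```

The explained-variance ratios are the quantities reported in the Results as the share of total variability captured by each PC.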

#### **3. Results**

The germplasm was genotyped using the consensus primers PycomC1F1 and PycomC5R [12]. The analysis of the PCR products allowed the identification of six alleles: PcS101 (1300 bp), PcS102 (1700 bp), PcS104 (750 bp), PcS110 (2200 bp), PcS113 (2000 bp), and PcS120 (800 bp), while 16 *S*-alleles were identified through the use of specific primers (Table S1, Supplementary Materials). The consensus PCR products and the amplification results for each *S*-RNase allele tested in every accession are shown in Table S2 (Supplementary Materials).

The accession "Iazzuleddu" showed a consensus amplicon of 1650 bp, positive to PcS103 primers and a new PCR product size of approximately 850 bp, which could not be identified by any of the tested allele-specific primers. The same amplicon was detected in the LVs "Azzone di Cassone", "Chiuzzu", "Faccibedda", "Franconello", "Ianculiddu", "Moscatello maiolino", "Pauluzzo", "Piru Pizzu", and "Tabaccaro" and in the two RS genotypes of *P. pyraster* (no. 4 and no. 8). The sequencing of the 850 bp amplicon of "Iazzuleddu" showed a 100% similarity to the *S*-RNase-PcS127 allele of *Pyrus communis* (Sequence ID: KF588568.1). A new pair of primers was designed to selectively amplify the *S*-RNase-PcS127 allele (Table S1, Supplementary Materials) and allowed its detection in all genotypes carrying the band of 850 bp producing an amplicon of 214 bp.

The LV "Pauluzzo", in addition to the band of 850 bp (PcS127 allele), was characterized by a smaller band of 680 bp that was not identified by any of the allele-specific primers. The sequencing of the 680 bp amplicon revealed a 99% similarity to the *S*-RNase-PcS126 allele of *Pyrus communis* (Sequence ID: KF588567.1). This amplicon was also detected in the LVs "Adamo", "Faccibedda", "Franconello", "Moscatello maiolino", and "Paradiso Confittaru", in the NCV "S. Pietro", and in the RS *P. amygdaliformis* (no. 2). A specific primer pair was designed to selectively amplify the *S*-RNase-PcS126 allele (Table S1, Supplementary Materials) producing an amplicon of 100 bp in all genotypes carrying the initial band of 680 bp.

In summary, the use of the consensus, the specific, and the two ad hoc designed primer pairs allowed the detection of 24 *S*-alleles; both *S*-alleles were detected for 72 accessions (resulting in 48 different *S*-genotypes), while only a single allele was detected for the remaining 14 accessions (Table 1). The relative frequencies of the *S*-RNase alleles identified in each group (ICV, NCV, LV, and RS) are shown in Table 2.

The *S*-allele with the highest absolute frequency in the germplasm was PcS103, detected in 23 accessions (Table 2). Regarding the distribution of the *S*-RNase alleles among the four pear groups, PcS103 was detected only in Italian varieties (14 LVs and nine NCVs) (Table 2). The *S*-RNase alleles PcS101, PcS104, PcS105, and PcS108 (identified in 21, 16, 11, and nine accessions, respectively) were detected, although with different frequencies, in all four groups; in contrast, five *S*-RNase alleles (PcS110, PcS113, PcS114, PcS115, and PcS121) were group-specific (PcS110 for RS, PcS113 and PcS114 for ICV, and PcS115 and PcS121 for LV) (Table 2). None of the *S*-RNase alleles detected in more than two samples was found in only one of the four classes (Table 2).
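The absolute and per-group relative frequencies behind Table 2 can be tallied directly from each accession's group and detected alleles. The minimal sketch below assumes the relative frequency is each group's share of the accessions carrying a given allele (the reading consistent with the percentages given later in the Discussion); the input reproduces only the PcS103 counts stated above (14 LVs, nine NCVs), with partner alleles omitted.

```python
"""Sketch: absolute and per-group relative S-allele frequencies.

Assumption: "relative frequency" = share of the accessions carrying an allele
that belong to each group. The input mirrors only the PcS103 counts given in
the text; the partner alleles are omitted for brevity.
"""
from collections import Counter, defaultdict


def allele_frequencies(records):
    """records: iterable of (group, alleles) pairs, one per accession.

    Returns {allele: (absolute_count, {group: relative share})}.
    """
    per_allele = defaultdict(Counter)
    for group, alleles in records:
        for allele in set(alleles):  # count each accession once per allele
            per_allele[allele][group] += 1
    table = {}
    for allele, by_group in per_allele.items():
        total = sum(by_group.values())
        table[allele] = (total, {g: n / total for g, n in by_group.items()})
    return table


records = [("LV", {"PcS103"})] * 14 + [("NCV", {"PcS103"})] * 9
total, shares = allele_frequencies(records)["PcS103"]
print(total, {g: round(100 * s) for g, s in shares.items()})
# 23 {'LV': 61, 'NCV': 39}
```

The rounded shares match the 61% LV and 39% NCV reported for PcS103 in the Discussion.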

Sixty-eight of the 86 accessions characterized here had previously been SSR-genotyped, and genetic structure analysis detected two subpopulations defined as "wild" and "cultivated" [17] (Table 1). The "wild" subpopulation largely characterized the RS group (contributing on average 93.4% of the genetic makeup of these accessions), while the "cultivated" subpopulation was predominantly detected in the ICVs (average of 87.4%). A more complex pattern was detected for the accessions classified as LVs or NCVs: in both groups, most accessions showed a clear prevalence (more than 80%) of one of the two subpopulations, with only five accessions showing a more balanced presence of the "wild" and "cultivated" subpopulations ("admixed" [17]). Figure 1 shows the relative frequency of the different *S*-alleles according to the structure analysis (subpopulations "wild", "cultivated", and "admixed").
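The grouping used for Figure 1 follows the threshold rule given in the Table 1 note: a membership coefficient of at least 0.8 for Q1 or Q2 makes an accession "cultivated" or "wild", anything else is "admixed", and non-genotyped samples are NA. A minimal sketch of that rule follows; the membership values are hypothetical and the function name is illustrative.

```python
"""Sketch: assigning accessions to "cultivated", "wild", or "admixed".

Follows the Table 1 rule (membership >= 0.8 in Q1 -> "cultivated", in Q2 ->
"wild", otherwise "admixed"; not genotyped -> "NA"). The example
coefficients are hypothetical.
"""
from typing import Optional


def classify_membership(q1: Optional[float], q2: Optional[float], threshold: float = 0.8) -> str:
    """q1 and q2 are membership coefficients for the two subpopulations (summing to ~1)."""
    if q1 is None or q2 is None:
        return "NA"  # accession not SSR-genotyped
    if abs(q1 + q2 - 1.0) > 0.01:
        raise ValueError("membership coefficients should sum to 1")
    if q1 >= threshold:
        return "cultivated"
    if q2 >= threshold:
        return "wild"
    return "admixed"


for q in [(0.95, 0.05), (0.10, 0.90), (0.55, 0.45), (None, None)]:
    print(q, classify_membership(*q))
```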


**Table 2.** *S*-Allele frequencies among analyzed accessions.

For each of the *S*-alleles detected, the absolute frequency is reported together with the relative frequency according to the four classes: RS (wild related species), LV (local varieties), NCV (nationally cultivated varieties), and ICV (internationally cultivated varieties).

**Figure 1.** *S*-Allele frequency among analyzed accessions according to their population structure information shown in Bennici et al. (2018) [17]. Samples without an assigned subpopulation are excluded.

Results indicated that the two most abundant *S*-alleles, PcS103 and PcS101, were largely detected in accessions characterized by a clear predominance of the "cultivated" subpopulation (60% and 56%, respectively; blue color in Figure 1), whereas the other *S*-alleles were mostly associated with individuals characterized by the "wild" subpopulation (red color in Figure 1). The alleles PcS116, PcS120, PcS122, PcS123, PcS124, and PcS126 (18 accessions in total) were detected only in samples showing a "wild" genetic background (Figure 1).

The results presented in Table S2 (Supplementary Materials) were converted into a binary matrix and used to compute a PCA (Figure 2).

**Figure 2.** Principal component analysis (PCA) depicting the distribution of accessions over the first two PCs (Dim1 and Dim2) performed on the genetic data scored for the presence/absence of each S-allele. (**A**) PCA biplot: different colors indicate the subpopulations detected with structure analysis in Bennici et al. (2018) [17]: "wild" (red); "cultivated" (blue); "admixed" (black); not available (gray). (**B**) Loading projections of the variables employed.

The first two PCs explained 27.3% of the total variability (14.2% and 13.1% for PC1 and PC2, respectively). The first two principal components (Dim1 and Dim2) defined four clusters composed of 5 (Dim1 > 0, Dim2 ∼ 0; *S*-genotype PcS101/PcS103), 16 (Dim1 > 0, Dim2 > 0; carrying PcS101), 18 (Dim1 ∼ 0, Dim2 < 0; carrying PcS103), and 47 (Dim1 < 0, Dim2 ∼ 0; other *S*-alleles) accessions. The five accessions with the PcS101/PcS103 *S*-genotype included the NCVs "Butirra", "Buona Luisa", and "Virgolese" and the LVs "Putiru d'Estate" and "Pergolesi" (Figure 2). Even though the PCA was computed on *S*-allele data alone, the first principal component (Dim1) was highly predictive of the genetic structure results: individuals plotted in the upper-right and lower-right quadrants (Dim1 > 0) showed a predominance of the "cultivated" subpopulation (blue), while samples with negative Dim1 values were largely "wild" (red).

#### **4. Discussion**

Despite its importance in breeding and as an agronomic trait, the genetic basis of GSI is known mainly for the most commonly used varieties. The germplasm collection analyzed here encompasses local cultivars selected over the last two centuries for traits of agronomic interest such as chilling requirement, resistance to biotic/abiotic stress, and fruit quality. The present work aimed to decipher the *S*-genotypes of these local varieties and to assess their similarities and differences with the closely related wild species *P. pyraster* and *P. amygdaliformis* and with some national and international cultivars of *P. communis*. Analyses were carried out with the PCR-based *S*-genotyping method described by Sanzol [14], resulting in the identification of 24 *S*-alleles [14,16]. The *S*-RNase alleles identified in the reference ICVs and in the NCVs "Bella di Giugno", "Coscia", and "Gentile" agreed with previous reports [6,8,10,11,14,23,24], confirming the reliability of the protocol for detecting known European pear *S*-RNases.

*S*-Allele genotyping defined the complete *S*-genotype for 84% of the accessions, whereas only one allele was identified for the remaining 14 samples. Given the forced heterozygosity at the *S*-locus, the detection of a single amplicon (Table S2, Supplementary Materials) implies the presence of an additional, undetected allele, suggesting sequence diversity and/or insertions/deletions (INDELs) at the probe site [25]. These 14 genotypes were mostly LVs and RSs, reinforcing the assumption that the pear gene pool of the Etna region differs, to some extent, from that of widely grown cultivars.

*S*-Alleles were not uniformly distributed across the germplasm: the most abundant were PcS103, detected in 23 accessions (61% LV; 39% NCV), and PcS101, detected in 21 accessions (10% RS; 48% LV; 19% NCV; 24% ICV) (Table 2). Fourteen *S*-alleles were each detected in five or fewer accessions (Table 2). This variability reflects the different histories and uses of the accessions: NCV and ICV cultivars widely employed both in cultivation and in breeding programs, such as "Coscia", "Gentile", "Cascade", "Dr. Jules Guyot", "Max Red Bartlett", "Old Home", and "Williams", contributed to the spread of either PcS103 or PcS101. Conversely, only two RSs (*P. amygdaliformis* n.3 and *P. pyraster* n.7) carried PcS101, and none carried PcS103 (Table 1).
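Per-allele counts and class breakdowns of the kind reported above (and in Table 2) can be tabulated directly from the genotype calls. The sketch below is a hypothetical helper, assuming each record holds an accession, its class (RS, LV, NCV, or ICV), and the alleles detected; the records are invented for illustration and are not the accessions of Table 1.

```python
from collections import Counter, defaultdict


def allele_summary(records):
    """Count accessions per S-allele and the % of those accessions in each class.

    records: iterable of (accession, class_label, detected_alleles).
    Accessions with a single detected allele contribute only that allele.
    """
    n_acc = Counter()
    by_class = defaultdict(Counter)
    for _accession, cls, alleles in records:
        for allele in set(alleles):          # count each accession once per allele
            n_acc[allele] += 1
            by_class[allele][cls] += 1
    return {
        allele: (n, {cls: round(100 * k / n) for cls, k in by_class[allele].items()})
        for allele, n in n_acc.items()
    }


# Hypothetical records, for illustration only
records = [
    ("acc1", "LV", ["PcS101", "PcS103"]),
    ("acc2", "NCV", ["PcS103", "PcS117"]),
    ("acc3", "ICV", ["PcS101", "PcS102"]),
    ("acc4", "RS", ["PcS101", "PcS126"]),
    ("acc5", "LV", ["PcS126"]),               # only one allele detected
]
summary = allele_summary(records)
print(summary["PcS101"])  # (3, {'LV': 33, 'ICV': 33, 'RS': 33})
```

Because each class share is rounded independently, the percentages for one allele need not add up to exactly 100 (here they sum to 99).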

None of the most abundant *S*-alleles was detected exclusively in one of the four classes into which the germplasm was subdivided according to its geographic spread, suggesting high heterogeneity within each group, with LV and NCV accessions being highly similar to either RSs or ICVs.

This result agrees with the structure analysis of Bennici et al. [17], which revealed two subpopulations ("wild" and "cultivated") that characterize the RSs and ICVs, respectively, but coexist within the LVs and NCVs. Within these two groups, accessions were either "wild" or "cultivated", with very few showing an admixed genetic configuration. The strong genetic differentiation between RSs and ICVs was confirmed here by the *S*-allele analysis: 10 *S*-alleles (PcS106, PcS109, PcS110, PcS111, PcS116, PcS120, PcS122, PcS123, PcS126, and PcS127), detected in at least one RS accession, were absent from the ICVs, and four *S*-alleles (PcS102, PcS107, PcS113, PcS114) were detected in ICVs but absent from RS accessions.

Interestingly, when the *S*-allele genotypic data were matched with the population structure results, the most frequent *S*-alleles, PcS103 and PcS101, were found largely in the "cultivated" group (Figure 1), whereas the remaining *S*-alleles were detected exclusively or predominantly in the "wild" accessions. The PCA confirmed the close relationship between *S*-alleles and the genetic stratification of the germplasm collection, with accessions largely clustering according to their "wild" or "cultivated" nature (Figure 2).

Among the *S*-alleles largely present in the "wild" subpopulation, PcS126 and PcS127 were detected in 20 accessions through the design of specific primers (Table 2); the same *S*-alleles had already been detected among Iranian local varieties [16]. Nikzad Gharehaghaji and colleagues [16] highlighted the high sequence homology of PcS126 and PcS127 with the *S*-alleles of the Asian pear species *P. korshinskyi* (S9) and *P.* × *bretschneideri* (S19), respectively, suggesting that these alleles might have been introgressed through hybridization. *P. korshinskyi* Litv. is native to Central Asia and has been proposed to originate from hybridization between *P. communis* and *P. regelii* Rehder, a species native to Afghanistan [26–28]. The Chinese white pear (*P*. × *bretschneideri* Rehder) is native to East Asia [28]. Molecular studies using RAPD and SSR markers suggested that *P*. × *bretschneideri* might share a common ancestor with *P. pyrifolia* [29,30], and a recent genotyping-by-sequencing (GBS) study proposed that the genomes of the Ussurian pear (*P. ussuriensis* Maxim.) and the Chinese sand pear (*P. pyrifolia* Nakai) could both have contributed to its origin [31]. Interestingly, four LVs ("Faccibedda", "Pauluzzo", "Moscatello Maiolino", and "Franconello") carried both PcS126 and PcS127, suggesting a possible contribution of Asian pear species to the genetic background of these accessions.

The *S*-RNase allele PcS117 (detected in the NCV "Bianchetto" and the LVs "Piccola Dolce", "Piridda", "Pistacchino", "Sciaduna", and "Spineddu", Table 1) was amplified using the primer pair developed for the PpS9 allele of the Japanese pear *P. pyrifolia*. PpS9 is one of the most common *S*-alleles in Japanese pear, and its high similarity to PcS117 from *P. communis* had already been described [16]. Hybridization between *P. communis* and other wild species is also supported by the high homology between the *S*-RNase alleles of *P. pyraster* and those of *P. communis* [20]. Furthermore, many plants identified as *P. pyraster* likely represent various stages of hybridization between *P. pyraster* and *P. communis* [32]. The close genetic proximity of the wild accessions to most of the LVs could also be explained by their wide use as rootstocks to propagate selected varieties and to increase plant vigor and adaptability to different pedoclimatic conditions [33].

Collectively, the *S*-genotyping results confirmed the genetic distinctness between the "wild" and "cultivated" subpopulations that emerged from previous SSR analyses. While natural and human selection shaped the population genetic structure differently, forced allogamy and insect-mediated pollination favored gene flow between wild and cultivated populations. It is reasonable to hypothesize that at least some of the ICVs never came into contact with Sicilian pear populations, preventing gene exchange with local genotypes; however, the cultivars and genotypes introduced into Sicily in historical times offered "new" *S*-alleles that had the chance to spread into the local gene pool. Unlike most other loci, the *S*-locus is subject to negative frequency-dependent (balancing) selection: pollen carrying rare alleles is more likely to be accepted by pistils than pollen carrying frequent ones [34], so the frequency of a rare allele increases across generations until an equilibrium is reached. In this scenario, an *S*-allele introduced into Sicily through foreign cultivars would not only spread rapidly in the local population (because its pollen is accepted by 100% of local pistils) but would also have a good chance of becoming a stable part of the local gene pool and being maintained for a long time, since frequency-dependent balancing selection makes the loss of *S*-alleles through random frequency fluctuations or genetic drift very unlikely [35]. Natural selection, therefore, might have favored the introgression of new *S*-alleles from cultivated into wild populations, whereas the opposite path (from wild to cultivated material) is theoretically less likely, as human selection tends to eliminate wild-related detrimental traits, which in most cases affect hybrid progenies.
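The balancing-selection argument can be made concrete with a deterministic toy model. This is a minimal sketch under stated assumptions, not the model of [34,35]: an infinite population; every pistil is heterozygous, so the fraction of pistils lacking (and therefore accepting) allele S_i is exactly 1 − 2p_i; ovules are unselected; and pollen success is taken as proportional to that fraction. The function names are hypothetical.

```python
def next_generation(freqs):
    """Advance S-allele frequencies by one generation (simplified GSI model).

    Assumes every pistil is heterozygous, so the fraction of pistils that do
    NOT carry allele i (and therefore accept pollen carrying i) is 1 - 2*p_i.
    Half of each offspring's alleles come from unselected ovules, half from
    pollen weighted by that compatibility fraction.
    """
    fitness = [1.0 - 2.0 * p for p in freqs]
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    return [0.5 * p + 0.5 * p * w / mean_fitness for p, w in zip(freqs, fitness)]


def simulate_introduction(n_resident=10, p_new=0.01, generations=200):
    """Introduce one new S-allele at low frequency into a population of
    n_resident equally frequent alleles and track the newcomer's frequency."""
    resident = (1.0 - p_new) / n_resident
    freqs = [p_new] + [resident] * n_resident
    trajectory = [p_new]
    for _ in range(generations):
        freqs = next_generation(freqs)
        trajectory.append(freqs[0])
    return freqs, trajectory


if __name__ == "__main__":
    final, traj = simulate_introduction()
    print(f"new allele: start {traj[0]:.3f}, after 10 generations {traj[10]:.3f}, "
          f"final {final[0]:.4f} (equal-frequency equilibrium 1/11 = {1 / 11:.4f})")
```

With ten resident alleles and one newcomer introduced at 1%, the newcomer's frequency rises toward the equal-frequency equilibrium of 1/11 instead of drifting away, which is the behavior the paragraph describes. Drift and finite population size are deliberately left out.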
On the basis of these assumptions, wild populations are expected to maintain greater allelic diversity at the *S*-locus than cultivated ones. The SSR-based population structure described previously, combined with the *S*-genotypes determined in this study, supports this hypothesis: when the "wild" and "cultivated" subpopulations defined by SSR data are analyzed separately, the former shows a higher number of *S*-alleles than the latter (19 vs. 11; Figure 1). Moreover, allele frequencies in the "wild" subgroup are less skewed, with no allele reaching 10%, whereas in the "cultivated" group only two alleles (PcS101 and PcS103) account for more than 50% of the allelic composition of the entire subpopulation (Figure 1). The *S*-allele composition of the "wild" group is therefore closer to an equilibrium state, in which balancing selection tends to maintain comparable frequencies for all the *S*-alleles present in the population.
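The contrast between a few dominant alleles and many evenly represented ones can be summarized by the effective number of alleles, n_e = 1/Σp_i², a standard diversity index that this section does not itself report. In the sketch below the function names are hypothetical, and the two frequency vectors are invented only to mimic the qualitative pattern described (19 alleles, all below 10%, versus 11 alleles, two of which together exceed 50%). They are not the frequencies behind Figure 1.

```python
def effective_number_of_alleles(freqs):
    """n_e = 1 / sum(p_i^2), after normalizing the frequencies to sum to 1."""
    total = float(sum(freqs))
    return 1.0 / sum((p / total) ** 2 for p in freqs)


def evenness(freqs):
    """n_e divided by the observed number of alleles (1.0 = perfectly even)."""
    return effective_number_of_alleles(freqs) / len(freqs)


# Hypothetical frequency vectors (NOT the values behind Figure 1)
wild_like = [1 / 19] * 19                          # 19 alleles, all equally frequent
cultivated_like = [0.27, 0.26] + [0.47 / 9] * 9    # 11 alleles, two dominant

for label, f in (("wild-like", wild_like), ("cultivated-like", cultivated_like)):
    print(label, len(f), round(effective_number_of_alleles(f), 2), round(evenness(f), 2))
```

At an equal-frequency equilibrium n_e equals the allele count, so evenness close to 1 corresponds to the "less distant from equilibrium" situation described for the "wild" group; in the cultivated-like vector the two dominant alleles pull n_e down to about 6 despite 11 alleles being present.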

#### **5. Conclusions**

The *S*-allele genotyping analysis was conducted on an ex situ collection encompassing most of the local Sicilian varieties selected for their traits of agronomic interest, complemented with national/international cultivars and related wild species. The results shed light on the distribution of *S*-alleles among accessions and revealed that RSs differ markedly from ICVs in *S*-allele composition. In contrast, LVs and NCVs showed a more heterogeneous distribution of *S*-alleles, as the result of a more complex history of hybridization. The analysis of the *S*-allele distribution provided novel insight into the contribution of RSs and ICVs to the genetic background of the LVs and NCVs. Furthermore, these results provide information that breeders can readily use to set up novel mating schemes for both rootstocks and varieties, and they complete the phenotypic and genotypic evaluation of the Mount Etna pear germplasm described by Ferlito et al. (under review) and Bennici et al. [17].

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/11/1228/s1: Table S1. Allele-specific primer pairs used for *S*-genotyping; Table S2. *S*-Genotypes assigned to the accessions in analysis combining consensus and allele-specific PCR primers.

**Author Contributions:** Conceptualization, G.D. and L.D.; methodology, G.D. and P.D.F.; validation, G.L.C. and P.D.F.; formal analysis, S.B. and M.D.G.; resources, F.F.; data curation, M.D.G.; writing—original draft preparation, S.B. and M.D.G.; writing—review and editing, G.D., A.G., and S.L.M. All authors read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Comparative Survey of Morphological Variations and Plastid Genome Sequencing Reveals Phylogenetic Divergence between Four Endemic** *Ilex* **Species**

### **Tao Su <sup>1,2</sup>, Mengru Zhang <sup>1,2</sup>, Zhenyu Shan <sup>1</sup>, Xiaodong Li <sup>1</sup>, Biyao Zhou <sup>1,2</sup>, Han Wu <sup>1</sup> and Mei Han <sup>1,\*</sup>**


Received: 3 August 2020; Accepted: 1 September 2020; Published: 3 September 2020

**Abstract:** Holly (*Ilex* L.), the only genus of the monogeneric family Aquifoliaceae, is a woody dioecious genus cultivated for pharmaceutical and culinary use, as ornamentals, and as a source of industrial materials. Although *Ilex* species differ in leaf morphology and habitat, their reproductive organs (flowers and fruits) are uniform, and the evolutionary relationships within the genus remain an enigma. To date, few comparative analyses of morphology and molecular patterns have been conducted in *Ilex*. Here, the phenotypic traits of four endemic *Ilex* species (*I. latifolia*, *I. suaveolens*, *I. viridis*, and *I. micrococca*) on Mount Huangshan, China, were surveyed through anatomical assays and DNA image cytometry, revealing no clear link between the examined morphology and the estimated nuclear genome size. Concurrently, the newly assembled plastid genomes of the four *Ilex* species range in length from 157,601 bp to 157,857 bp and contain a large single-copy (LSC, 87,020–87,255 bp) region, a small single-copy (SSC, 18,394–18,434 bp) region, and a pair of inverted repeat (IR, 26,065–26,102 bp) regions. Plastid genome annotation identified 89–95 protein-encoding genes, 37–40 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes. A comprehensive comparison of the plastomes of eight *Ilex* species revealed conserved coding regions but variability at the IR/SSC junctions, as well as divergent hotspot regions potentially useful as DNA markers. The phylogenetic topology of *Ilex* was incongruent with traditional taxonomy but showed a strong association between clades and geographic distribution. This work provides novel insight into morphological and phylogeographic variation in Aquifoliaceae and contributes to the understanding of genetic diversity and conservation of medicinal *Ilex* on Mount Huangshan.

**Keywords:** *Ilex* species; Aquifoliaceae; morphological traits; DNA C-value; plastid genome

#### **1. Introduction**

*Ilex* L. (holly), a genus of woody dioecious angiosperms comprising approximately 700 species, is the only genus of the monogeneric family Aquifoliaceae [1]. *Ilex* species are evergreen or deciduous trees, prostrate shrubs, and climbers, broadly distributed from tropical to temperate regions [2,3]. Over 200 species have been documented in East Asia and South America, the main centers of diversity, whereas several species grow in Europe, tropical Africa, and northern Australia [4]. The unexpected occurrence of patchy populations on oceanic islands prompted the assumption of unspecialized pollination and efficient seed dispersal by birds [5]. Recently, a phylogenetic study suggested that Aquifoliaceae originated in subtropical lowlands with a mesic climate, and that the presence of *Ilex* species in the humid, warm subtropical monsoon forests of southern China can be traced back to the middle Eocene [6,7].

For centuries, various *Ilex* species have been used as herbal infusions in traditional Chinese medicine. Some large-leaved species (e.g., *I. kudingcha* and *I. latifolia*) are brewed to produce a bitter-tasting "Kuding" tea, consumed for its nutritional and health-promoting properties and used in pharmaceutical preparations [8,9]. The traditional Chinese herb *I. pubescens* is used particularly to treat cardiovascular diseases [10]. Native to southern South America, *I. paraguariensis* is consumed commercially as a popular beverage known as "yerba mate" in the Americas and the Middle East, highly appreciated for its peculiar flavor and health-maintaining effects [11,12]. An alternative mate tea, *I. dumosa*, has a similar taste and mild effects and is offered to different consumers as a low-caffeine, xanthine-containing substitute [13]. As natural resources for pharmaceuticals and dietary foods, *Ilex* species have been the subject of phytochemical and pharmacological research, which has led to the discovery of terpenoids, saponins, flavonoids, glycosides, amino acids, and other bioactive compounds in many additional *Ilex* species, reflecting the worldwide economic, medicinal, and clinical value of the genus [13–15]. Some larger native *Ilex* species have been developed for timber production and garden use, and other holly species (e.g., *I. aquifolium*, *I. opaca*, and *I. crenata*), which retain their green foliage and bright red drupes, are grown as traditional Christmas decorations and ornamentals during the winter holidays [16,17].

By enabling the estimation of absolute genome size, flow cytometry (FCM) has provided relevant clues for taxonomy in various plant species [18]. Independent-contrast studies have suggested a significant relationship between phenotypic traits (e.g., cell size and stomatal density) and genome size in angiosperms [19]. However, the proposed correlation between cellular architecture and organelle DNA content remains poorly understood [20]. Because all plastids of a plant carry the same DNA and share a few functional features, comparisons of plastid genomes across plant lineages, integrating gene content and barcode regions, are highly relevant to understanding the evolution of plastid DNA and adaptation to ancient environments [21]. Plastid phylogeny offers an alternative strategy for phylogeographic investigation owing to the maternal inheritance, conserved circular structure, and small effective population size of the plastid genome [22]. Increasing evidence suggests that plastid genomes are well suited for species identification and conservation, significantly improving the resolution of branches within nuclear phylogenetic frameworks [23,24]. Advances in sequencing technologies have increased the number of plastid genomes available in GenBank (https://www.ncbi.nlm.nih.gov/genbank/) to approximately 3000. Therefore, comparative analyses of plastid and nuclear genomes using molecular genetic methods may provide a theoretical basis for phylogenetic reconstruction and for exploring genetic diversity and plant systematics.

The unique subtropical monsoon climate of Mount Huangshan (Anhui, China) has led to high forest diversity and a broad distribution of evergreen broad-leaved forest [25]. A dynamic field survey in a large-scale forest plot found a total of 12 documented endemic *Ilex* species (Aquifoliaceae) to be predominant [26]. However, comparative reports combining morphological traits with molecular and genomic patterns within the genus *Ilex* are lacking. In this work, four *Ilex* species were chosen for a comparative analysis of their phenotypic variation, DNA C-values, and newly assembled plastid genomes. The main objectives were (1) to investigate possible relationships between morphological traits and nuclear genome size among the four selected *Ilex* species, (2) to identify variable and highly divergent plastome regions that could be used to classify and identify *Ilex* species, and (3) to analyze plastid genome structures and reconstruct plastome-derived phylogenetic relationships of *Ilex* species in Aquifoliaceae.

#### **2. Materials and Methods**

#### *2.1. Plant Materials*

Organs (e.g., leaves and seeds) of *I. latifolia*, *I. suaveolens*, *I. viridis*, and *I. micrococca* were harvested from a 10.24 ha (320 m × 320 m) forest plot (30°8′26″ N, 118°6′38″ E) on Mount Huangshan, Anhui, China [26]. The plot ranges in altitude from 430 to 565 m, with an annual average temperature of 7.8 °C and annual precipitation of 2394.5 mm. Voucher specimens of the four *Ilex* species (accession numbers YL20190417014, YL20190417015, YL20190417016, and YL20190417017) were deposited in the herbarium of Nanjing Forestry University, Nanjing, China.

#### *2.2. Phenotype Quantification and Determination of DNA Content*

For each species, more than 60 healthy mature leaves and 300 mature seeds from five independent trees of similar age were randomly sampled for morphological analyses. In total, thirty anatomical sections of epidermis were prepared from leaf pieces (5 mm × 5 mm) macerated in H2O2-HAc solution [27]. To quantify the size of upper leaf epidermal cells (LEC), stomatal aperture (STA), and stomatal density (STD), twenty visual fields were captured at the same scale under an optical microscope. Leaf area (LA) was measured on 50 randomly selected leaves with ImageJ v1.53c (https://imagej.nih.gov/ij/) after scanning with an Expression 11000XL scanner (EPSON, Beijing, China). Specific leaf area (SLA) was calculated as the ratio of leaf area to leaf dry mass. The weight of 100 air-dried seeds was used as the seed weight (SW). The variance of three perpendicular seed dimensions (VSD) was calculated as the average of 50 seeds, following a previous report [28]; spherical seeds have a variance of 0, and elongated or flattened seeds have a variance of up to 0.33. Flower size (FS) was measured as the mean diameter of 12 female flowers. The significance of phenotypic variation among the four *Ilex* species was statistically analyzed with SPSS 24.0 (https://www.ibm.com) using ANOVA (*p* < 0.05) and Duncan's multiple range test.
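The 0 to 0.33 range quoted above is consistent with a seed-shape index in which each of the three dimensions is first divided by the largest one and the sample variance (n − 1 denominator) of the three ratios is taken. Because reference [28] is not reproduced here, the sketch below assumes that definition; averaging per-seed values over the sample is also an assumption, and the function names are hypothetical.

```python
from statistics import mean, variance


def seed_shape_variance(length, width, thickness):
    """Variance of the three perpendicular dimensions after dividing each by the largest.

    Uses the sample variance (n - 1 denominator), which gives 0 for a sphere and
    1/3 for an idealized needle (1, 0, 0) or disc (1, 1, 0).
    """
    largest = max(length, width, thickness)
    return variance([d / largest for d in (length, width, thickness)])


def mean_vsd(seeds):
    """Average per-seed value over a sample, e.g. 50 seeds given as (l, w, t) tuples."""
    return mean(seed_shape_variance(*s) for s in seeds)


print(seed_shape_variance(3, 3, 3))   # sphere-like seed -> 0.0
print(seed_shape_variance(1, 1, 0))   # idealized flat disc -> about 0.333
```

Dividing by the largest dimension makes the index independent of absolute seed size, so seeds of different species can be compared on shape alone.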

For the determination of DNA C-values, young leaves were chopped in Woody Plant Buffer to release crude nuclei, followed by RNase digestion. Propidium iodide (PI) staining and FCM analysis were performed according to a previous report [29]. Together with the internal standard (*Solanum lycopersicum*, 2C = 2.00 pg), the resulting suspensions were analyzed with a BD Influx™ cell sorter (BD, Piscataway, NJ, USA), and FCM histograms were generated with BD FACS™ software 1.0.0.650. DNA peaks with coefficients of variation (CV) below 5% were considered reliable. The chromosome numbers of the four diploid *Ilex* species were retrieved from the IPCN database (http://legacy.tropicos.org/Project/IPCN). The genome size (DNA C-value, or haploid DNA content) was calculated by multiplying the DNA content of the standard by the ratio of the mean fluorescence intensity of each sample to that of the standard [30]. DNA C-values are presented as means ± standard error (SE) of at least four independent biological replicates.
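The internal-standard calculation and the CV check can be sketched as follows. The peak-ratio formula and the 2.00 pg tomato standard come from the text above; the conversion factor of 978 Mb per pg is an assumption (a widely used value, not stated in this section), and the function names are hypothetical. With that factor, the 2C value reported for *I. micrococca* (3.053 pg) converts to about 1493 Mb for the haploid genome.

```python
from statistics import mean, stdev

MB_PER_PG = 978  # assumed conversion factor: 1 pg of DNA is roughly 978 Mb


def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg=2.00):
    """Sample 2C DNA content from the ratio of G0/G1 peak means (sample / standard)."""
    return standard_2c_pg * sample_peak / standard_peak


def peak_cv_percent(intensities):
    """Coefficient of variation of one fluorescence peak, in percent."""
    return 100.0 * stdev(intensities) / mean(intensities)


def one_c_genome_size_mb(two_c_pg):
    """Haploid (1C) genome size in Mb from a 2C value in pg."""
    return two_c_pg / 2 * MB_PER_PG


print(round(one_c_genome_size_mb(3.053)))  # I. micrococca 2C value from the Results
```

In practice the peak means and CVs are read from the cytometer software; a run would be kept only if `peak_cv_percent` stays below the 5% threshold stated above.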

#### *2.3. Plastome Sequencing, Assembly, Annotation, Codon Usage, and Repeat Analyses*

Fresh leaves of *I. suaveolens*, *I. viridis*, and *I. micrococca* were harvested and flash-frozen in liquid nitrogen, and DNA was extracted with a method modified from a previous report [31]. The plastid genome sequence of *I. latifolia* was obtained from the NCBI database (https://www.ncbi.nlm.nih.gov/) under the new accession MN688228 [32]. Next-generation sequencing of the whole plastid genomes of *I. suaveolens*, *I. viridis*, and *I. micrococca* was performed by Biodata Biotechnologies Inc. (Hefei, China) on the BGISEQ-500 platform (BGI, Shenzhen, China). Approximately 50 MB of high-quality clean paired-end reads were generated, and the filtered reads were assembled with SPAdes 3.14.1 (http://cab.spbu.ru/software/spades/) using default parameters [33]. The genomes were annotated with CPGAVAS (http://47.96.249.172:16014/analyzer/home) [34] and checked with DOGMA (https://dogma.ccbb.utexas.edu/), followed by BLASTN searches in NCBI against the reference genome to identify specific genes [1]. Circular plastome maps were drawn with OGDRAW v1.3.1 (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) [35]. The complete plastome sequences of *I. suaveolens*, *I. viridis*, and *I. micrococca* were submitted to GenBank under accession numbers MN830249, MN830250, and MN830251, respectively. The relative synonymous codon usage (RSCU) of all protein-encoding genes was analyzed with MEGA X (https://www.megasoftware.net/) [36]. Amino acid usage frequency was calculated as the percentage of codons encoding each amino acid among all codons. Simple sequence repeats (SSRs) were identified with MISA (http://pgrc.ipk-gatersleben.de/misa/) [37], with the minimum number of repeats for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs set to 8, 5, 4, 3, 3, and 3, respectively. Long repeat sequences, including forward, palindromic, reverse, and complement repeats, were analyzed with the online program REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer) using adjusted parameters [38].
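The MISA thresholds listed above can be illustrated with a regular-expression scan: a motif of 1 to 6 bases counts as an SSR when it occurs in at least the minimum number of tandem copies. This is a minimal sketch, not MISA's implementation. It ignores compound SSRs and discards motifs that are themselves repeats of a shorter unit (for example, ATAT), so each perfect run is reported under its shortest motif. The function names are hypothetical.

```python
import re

# Minimum number of tandem copies per motif length, as listed in the Methods
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, n_copies) for perfect SSRs, sorted by position."""
    seq = seq.upper()
    hits = []
    for unit, min_n in min_repeats.items():
        # One motif of `unit` bases followed by at least (min_n - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


demo = "GC" + "A" * 10 + "GC" + "AT" * 6 + "GC" + "AAT" * 4 + "GC"
print(find_ssrs(demo))  # [(2, 'A', 10), (14, 'AT', 6), (28, 'AAT', 4)]
```

Checking for primitive motifs prevents the same poly-A run from being counted again as an "AA" dinucleotide or "AAAA" tetranucleotide SSR.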

#### *2.4. Plastome Divergence and Phylogenetic Analyses*

Using the annotation of *I. suaveolens* as the reference, we compared the entire plastid genomes of eight *Ilex* species in Aquifoliaceae with GView (https://www.gview.ca/wiki/GView/WebHome) [39] and mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml) in Shuffle-LAGAN mode [40,41]. To assess inverted repeat (IR) expansion and contraction at the border genes, the eight *Ilex* plastomes were aligned and the variation at the LSC/IR/SSC junctions was analyzed with IRscope (https://irscope.shinyapps.io/irapp/) [42]. In total, nineteen plastid genome sequences (15 *Ilex* species) were retrieved from GenBank. *Populus trichocarpa*, *Populus deltoides*, *Quercus acutissima*, and *Helwingia himalaica* were used as outgroups. The sequences were aligned with MAFFT v7.471 (https://mafft.cbrc.jp/alignment/server/index.html) [43]. Phylogenetic trees were constructed in MEGA X by maximum likelihood (ML), using the Tamura-Nei nucleotide substitution model, and by maximum parsimony (MP). Bootstrap values based on 1000 replicates are shown on the branches of the phylogenetic tree.
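The ML tree search itself runs inside MEGA X, so it is not reproduced here. The sketch below only illustrates the Tamura-Nei (TN93) substitution model in its simplest form, the pairwise TN93 distance. This distance is computed from the proportions of purine transitions (P1), pyrimidine transitions (P2), and transversions (Q) together with the base frequencies pooled over both sequences. Skipping gapped or ambiguous sites is an assumption, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}


def tn93_distance(seq1, seq2):
    """Tamura-Nei (1993) distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    Assumes all four bases occur and that the log arguments stay positive.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    # Observed proportions of the three kinds of difference
    p1 = sum(1 for a, b in pairs if {a, b} == {"A", "G"}) / n   # purine transitions
    p2 = sum(1 for a, b in pairs if {a, b} == {"C", "T"}) / n   # pyrimidine transitions
    q = sum(1 for a, b in pairs if (a in PURINES) != (b in PURINES)) / n  # transversions
    # Base frequencies pooled over both sequences
    counts = dict.fromkeys("ACGT", 0)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    fa, fc, fg, ft = (counts[x] / (2 * n) for x in "ACGT")
    fr, fy = fa + fg, fc + ft
    d_purine = -(2 * fa * fg / fr) * math.log(1 - fr * p1 / (2 * fa * fg) - q / (2 * fr))
    d_pyrimidine = -(2 * fc * ft / fy) * math.log(1 - fy * p2 / (2 * fc * ft) - q / (2 * fy))
    d_transversion = -2 * (fr * fy - fa * fg * fy / fr - fc * ft * fr / fy) * math.log(
        1 - q / (2 * fr * fy))
    return d_purine + d_pyrimidine + d_transversion


s1 = "ACGT" * 25
s2 = "GTCT" + "ACGT" * 24   # one A<->G, one C<->T, one G<->C difference in 100 sites
print(round(tn93_distance(s1, s2), 4))
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.03 here) because it accounts for multiple substitutions at the same site. With equal base frequencies and equal P1 and P2, the formula reduces to the Kimura two-parameter distance.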

#### **3. Results**

#### *3.1. Variation of the Morphological Trait and Nuclear DNA Content*

To investigate morphological variation among the four *Ilex* species, we first compared phenotypic traits of the major vegetative and reproductive organs, including LA, LEC, SLA, STA, STD, SW, VSD, and FS (Table 1). LA varied significantly among all four species, with *I. latifolia* having the highest value (79.19 cm²) (Table 1). SLA also differed significantly, ranging from 40.88 cm²/g (*I. latifolia*) to 165.4 cm²/g (*I. micrococca*). The highest LEC value (938.65 μm²) was found in *I. suaveolens*, whereas *I. viridis* and *I. micrococca* had similar-sized LECs that did not differ significantly (Figure 1b). Image analyses of stomata-related traits revealed that STD differed markedly among the four species, with *I. micrococca* having the highest density (257.64/mm²; Table 1 and Figure 1c). The significant differences in SLA appeared to be related to STD, as both traits showed similar patterns across the four species (Table 1). VSD analysis indicated that *I. suaveolens* seeds are the closest to spherical (0.1137). However, no statistically significant differences among the four species were found for STA, VSD, and FS.
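The significance statements above rest on a one-way ANOVA, which the authors ran in SPSS. The sketch below illustrates only the F statistic with a hypothetical helper and invented numbers. The p-value would then come from the F distribution with (k − 1, N − k) degrees of freedom, for example via `scipy.stats.f.sf`. Duncan's multiple range test, which produces the letter groupings in Table 1, is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for two groups (not data from Table 1)
f_stat, df_b, df_w = one_way_anova_f([[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]])
print(round(f_stat, 1), df_b, df_w)  # 150.0 1 4
```

A large F indicates that group means differ more than within-group scatter would explain. That is why traits such as LA and SLA separate the species while STA, VSD, and FS do not.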

To determine the DNA C-values of the four *Ilex* species by FCM, *S. lycopersicum* was used as the internal standard for calculating genome size [44]. The DNA-associated fluorescence on the FCM histograms showed CV values between 2.92% and 4.67% for the G0/G1 peaks (Figure 1d); CVs below 5% indicate sufficient quality and are considered reliable for FCM assessments [45]. The FCM fluorescence peaks revealed that *I. micrococca* has the highest nuclear DNA content (3.053 pg), followed by *I. viridis* (2.519 pg), whereas *I. suaveolens* (2.242 pg) and *I. latifolia* (1.910 pg) showed similar DNA 2C-values. The corresponding nuclear genome sizes (NG) ranged from 955 Mb (*I. latifolia*) to 1493 Mb (*I. micrococca*) among the four *Ilex* species (Table 1).


**Table 1.** Comparison of the morphological traits between four subtropical *Ilex* species.

LA, leaf area; LEC, upper leaf epidermal cells; SLA, specific leaf area; STA, stomata aperture; STD, stomata density; SW, weight of 100 seeds; VSD, variance of seed dimensions; FS, flower size in diameter; NG, nuclear genome size. Data are mean values ± SE of independent biological replicates (see Materials and Methods). Different superscript letters indicate significant differences among the four *Ilex* species (ANOVA, *p* < 0.05; Duncan's multiple range test).

**Figure 1.** Analyses of the morphological traits and the fluorescent histograms of DNA content in four *Ilex* species. (**a**) The phenotypes of leaves and fruits; (**b**) LEC images; (**c**) STA and STD images; (**d**) flow cytometry (FCM) histograms obtained from leaves of four *Ilex* species. Nuclear DNA was stained with PI, using tomato (*S. lycopersicum*) as the internal standard.

#### *3.2. The Plastid Genome Features and Sequence Divergence*

De novo sequencing and reference-guided assembly of the plastid genomes of the four *Ilex* species revealed circular double-stranded DNA molecules ranging in length from 157,601 bp (*I. latifolia*) to 157,857 bp (*I. suaveolens*) (Figure 2). The plastomes show the typical quadripartite structure, with two inverted repeat (IR) regions, IRA and IRB (26,065–26,102 bp), separated by a large single-copy (LSC, 87,020–87,255 bp) and a small single-copy (SSC, 18,394–18,434 bp) region. Except for *I. latifolia*, the plastomes encode 134 genes, comprising 89 putative protein-encoding (PE) genes, 37 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes (Table 2). Among the annotated genes in the *Ilex* plastome, eight PE genes (*rps7*, *ndhB*, *ycf15*, *ycf2*, *rpl23*, *rpl2*, *rps12*, and *ycf1*), five tRNA genes (*trnV-GAC*, *trnI-GAU*, *trnA-UGC*, *trnR-ACG*, and *trnN-GUU*), and all four rRNA genes (*rrn4.5*, *rrn5*, *rrn23*, and *rrn16*) are located in the IR regions. Twelve PE genes and one tRNA gene (*trnL-UAG*) are located in the SSC region (Figure 2). Most genes are single-copy, whereas 18 genes are duplicated in the IR regions, including seven PE genes, four rRNA genes, and six tRNA genes (Table 2). Interestingly, *ycf1* is the only gene with an incomplete duplication, spanning the junction of the SSC and IRB regions (Figure 2 and Table 2). In *Arabidopsis*, the *Tic20*/*Tic214* (*ycf1*) translocon complex has been proposed as the central component of plastid protein import [46]; however, a recent report stated that *ycf1*/*Tic214* might not be involved in the general import machinery [47]. All four *Ilex* plastomes contain the duplicated gene *ycf2* in the IR regions, with a maximum length of 6894 bp, whereas the single-copy gene *rpoC2* in the LSC region is shorter (4155 bp). In addition, 19 genes, including 11 PE genes and 7 tRNA genes, contain two exons, while three PE genes (*clpP*, *ycf3*, and *rps12*) contain three exons. Comparative analyses revealed slight variation in GC content among the regions of the four *Ilex* plastomes (Table S1). The total number of codons in the coding genes was similar, ranging from 26,729 (*I. viridis*) to 27,121 (*I. latifolia*) (Table S2).
Based on the statistical analysis of codon usage, leucine (Leu) is the most abundant amino acid (10.5%), whereas cysteine (Cys) is the least abundant (1.1%). Overall, codon usage across all identified genes shows similar patterns in the four *Ilex* species. The nearly identical gene numbers, annotation, and plastid genome lengths prompted us to explore sequence variation and divergence further through analyses of SSRs and long repeat sequences.
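The codon-usage statistics above (RSCU and amino acid frequency) can be sketched as follows. This is an illustration, not MEGA X's code. It assumes the standard genetic code, whose sense assignments are identical to those of the plastid code (NCBI table 11). It also assumes that stop codons are excluded from the amino acid percentages. The function names are hypothetical. RSCU is the observed count of a codon divided by the mean count of all codons for the same amino acid, so 1.0 means no preference.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Amino acids in TCAG x TCAG x TCAG codon order (standard code; '*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codons(cds_list):
    """Count in-frame codons over coding sequences; incomplete trailing codons are ignored."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for pos in range(0, len(cds) - 2, 3):
            codon = cds[pos:pos + 3]
            if codon in CODON_TABLE:          # skips codons with ambiguous bases
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count of the synonymous codons for that amino acid."""
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)
    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total:
            expected = total / len(codons)
            for c in codons:
                values[c] = counts[c] / expected
    return values


def amino_acid_usage(counts):
    """Percentage of sense codons encoding each amino acid (stop codons excluded)."""
    per_aa = Counter()
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        if aa != "*":
            per_aa[aa] += n
    total = sum(per_aa.values())
    return {aa: 100.0 * n / total for aa, n in per_aa.items()}


counts = count_codons(["ATGCTTCTTCTGTAA"])   # toy CDS: Met-Leu-Leu-Leu-stop
print(rscu(counts)["CTT"], amino_acid_usage(counts)["L"])  # 4.0 75.0
```

Leucine has six synonymous codons, so its RSCU values can range from 0 to 6, whereas single-codon amino acids such as methionine always have an RSCU of 1.0.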

**Figure 2.** The plastome map of four *Ilex* species. Genes are shown as gray arrowheads. Genes inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Protein-encoding genes are marked in different colors. GC content is shown as dark gray bars toward the center of the diagram.


**Table 2.** List of annotated genes in the plastid genome of *Ilex*.

<sup>2</sup> Genes containing two exons; <sup>3</sup> genes containing three exons; <sup>d</sup> two gene copies in the IRs; <sup>d</sup><sup>+</sup> two gene copies with one shared exon 1 in the LSC, whereas exons 2 and 3 are in the IRs; \* pseudogene.

#### *3.3. Analyses of SSRs and Long Repeat Sequences*

Using the MISA program, a total of 221 SSRs (mono-/di-/trinucleotide repeats) were identified in the plastid genomes of *I. latifolia* (46/2/1), *I. suaveolens* (46/3/1), *I. viridis* (53/3/1), and *I. micrococca* (52/2/1). Mononucleotide repeats are the most abundant SSR type and are located in non-coding regions, which is related to the AT richness of the plastome. The proportion of A/T mononucleotide repeats is similar in the four *Ilex* species, varying from 92.00% (*I. suaveolens*) to 92.98% (*I. viridis*), whereas C/G repeats exist only in the plastid genomes of *I. latifolia* and *I. micrococca*. AT/AT dinucleotide repeats showed similar levels (1.75%–4.08%), whereas AAT/ATT trinucleotide repeats were more variable, ranging in proportion from 1.82% to 6.00% (Figure S1). Other SSR types (e.g., tetra- and pentanucleotide repeats) were not identified in any *Ilex* plastid genome.

Long repeats (forward, reverse, complement, and palindromic repeats) were analyzed with REPuter. A total of 196 unique long repeats were detected in *I. latifolia* (24/0/0/22), *I. suaveolens* (38/9/3/0), *I. viridis* (41/8/1/0), and *I. micrococca* (38/9/3/0) (Figure S2). Forward repeats are the most common type in the four plastid genomes. *I. latifolia* contains no reverse or complement repeats, but it is the only species with palindromic repeats (Figure S2a). Further analyses of the various types of long repeats showed that most are 20–30 bp long (Figures S2b–e). In the four plastid genomes, reverse and complement repeats are generally no longer than 30 bp, whereas forward and palindromic repeats can extend to 60 bp.
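A minimal Python sketch of perfect-SSR detection in the spirit of MISA, together with the four REPuter repeat categories, is given here. It is an illustration, not either program: the minimum repeat numbers (10 for mononucleotides, 5 for dinucleotides, 4 for trinucleotides) are assumed because the thresholds used in this study are not stated, and compound SSRs and mismatched long repeats, which MISA and REPuter can report, are not handled. Function names are hypothetical.

```python
import re

# Assumed minimum repeat numbers per motif length (not taken from the study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. 'AA')."""
    k = len(unit)
    return not any(k % p == 0 and unit == unit[:p] * (k // p) for p in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based)."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_repeats.items():
        # A k-mer followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # poly-A runs are reported once, as mononucleotides
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def classify_repeat_pair(first, second):
    """Name the REPuter-style relation between two repeat copies read 5'->3'."""
    if second == first:
        return "forward"
    if second == first.translate(COMPLEMENT)[::-1]:
        return "palindrome"  # reverse complement
    if second == first[::-1]:
        return "reverse"
    if second == first.translate(COMPLEMENT):
        return "complement"
    return None
```

Proportions such as the A/T share of all SSRs then follow by counting the returned motifs per genome.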

#### *3.4. Comparative Analyses of Complete Plastomes in Ilex Species*

Using *I. suaveolens* as the reference, a BLAST comparison of eight *Ilex* species showed that the entire plastid genomes are well conserved across the eight selected species. The LSC and SSC regions were more divergent than the IR regions (Table 3, Figure 3). The non-coding regions showed more variation than the coding regions, in which some genes are relatively conserved. The VISTA analysis identified 12 divergence hotspots as well as several variable genes (e.g., *rpoC1*, *rbcL*, *ndhF*, *clpP*, and *psbA*). The highly divergent hotspots are located mainly in intergenic rather than coding regions and include *trnH-psbA*, *matK-rps16*, *psbK-psbI-trnS-trnG*, *petN-psbM*, *trnE-psbD-trnT*, *trnS-psbZ-psaB*, *trnL-ycf3*, *rbcL-accD-ycf4*, *clpP-rpl33*, *rpl16-rpoA*, *rpl32-ccsA*, and *ycf15-rps12-rrn16* (Figure 4). These divergent sites may facilitate the development of DNA markers for species identification and phylogenetic reconstruction in the genus *Ilex*.
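The VISTA-style identity profile can be approximated by a sliding-window percent-identity scan of a pairwise alignment against the reference. The Python sketch is a simplification, not mVISTA's algorithm: it assumes the sequences are already aligned (e.g., with MAFFT), ignores columns where both sequences have a gap, counts a gap in only one sequence as a mismatch, and flags windows below a hypothetical identity threshold as candidate hotspots.

```python
def window_identity(ref, query, window=100, step=25):
    """Return (start, end, identity) for sliding windows over two aligned sequences."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, max(len(ref) - window, 0) + 1, step):
        end = min(start + window, len(ref))
        # Drop columns that are gaps in both sequences; single gaps count as mismatches.
        cols = [(r, q) for r, q in zip(ref[start:end], query[start:end])
                if not (r == "-" and q == "-")]
        if cols:
            identity = sum(r == q for r, q in cols) / len(cols)
            results.append((start, end, identity))
    return results


def divergent_windows(results, threshold):
    """Keep windows whose identity falls below the (assumed) threshold."""
    return [(s, e, i) for s, e, i in results if i < threshold]
```

Running the scan once per species against *I. suaveolens* and intersecting the low-identity windows gives a rough analogue of the shared hotspots labeled in the VISTA plot.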


**Table 3.** List of plastid genome features in *Ilex* species.

Further comparison of the quadripartite borders revealed that the IR regions are highly conserved in the eight *Ilex* species (Figure 5). The genes *rps19*, *rpl2*, *ycf1*, and *ndhF* are generally located near the IRs/SSC (JSA/B) and IRs/LSC (JLA/B) junctions. *rps19* lies within the LSC region, 8–13 bp from the JLB, in *I. latifolia*, *I. pubescens*, *I. wilsonii*, *I. szechwanensis*, and *I. micrococca*, whereas it spans the JLB, extending 4 bp into the IRB, in *I. paraguariensis*, *I. viridis*, and *I. suaveolens*. At the JLA, *rpl2* is located 55 bp from the LSC in all *Ilex* species. The tRNA gene *trnH* showed 11 bp shifts in the LSC region (Figure 5). The JLB and JLA are moderately conserved, whereas the JSA and JSB differ strikingly among the *Ilex* species. The short copy of *ycf1* in the IRB shows substantial length variation, from 1038 bp (*I. wilsonii*) to 1085 bp (*I. suaveolens*). The long copy of *ycf1* has an identical length of 5690 bp in most *Ilex* species, except *I. paraguariensis* (5693 bp), *I. wilsonii* (5684 bp), and *I. micrococca* (5684 bp). Within the SSC, *ndhF*, which is involved in photosynthesis, varies in length (2231, 2258, and 2270 bp), with shifts of 15 bp and 40 bp from the JSB for the latter two lengths. The short copy of *ycf1* also overlaps with *ndhF* in *I. latifolia*, *I. paraguariensis*, *I. micrococca*, *I. viridis*, and *I. suaveolens*. Other variable sequences were identified between *psbA* and the JLA (Figure 5). Overall, the comparison of the IRs/SC boundaries suggests that IR contraction and expansion follow relatively stable patterns in the eight *Ilex* plastid genomes.
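The junction distances reported for Figure 5 reduce to interval arithmetic on gene coordinates. The Python sketch classifies a gene as upstream of, downstream of, or spanning a junction; the coordinate convention (0-based, half-open) and the example coordinates are assumptions for illustration, not values from the *Ilex* annotations.

```python
def position_relative_to_junction(gene_start, gene_end, junction):
    """Describe a gene's placement relative to a region junction.

    Coordinates are 0-based and half-open: the gene covers gene_start..gene_end-1,
    and `junction` is the index of the first base of the downstream region.
    """
    if gene_end <= junction:
        # Gene ends before the boundary: report the gap to the junction.
        return {"side": "upstream", "gap_bp": junction - gene_end}
    if gene_start >= junction:
        # Gene starts after the boundary.
        return {"side": "downstream", "gap_bp": gene_start - junction}
    # Gene crosses the boundary: report how many bases fall on each side.
    return {"side": "spans",
            "upstream_bp": junction - gene_start,
            "downstream_bp": gene_end - junction}
```

For example, a hypothetical gene at positions 100 to 378 with the junction at 375 is reported as spanning it with 4 bp downstream, the situation described for *rps19* at the JLB in three species.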

**Figure 3.** Plastome comparison among eight *Ilex* species in GView. The eight outer circles represent the BLAST results for *I. suaveolens* vs. *I. latifolia*, *I. paraguariensis*, *I. pubescens*, *I. wilsonii*, *I. micrococca*, *I. viridis*, *I. szechwanensis*, and itself, respectively. The inner circle shows, clockwise, the CDS, ribosomal RNA (rRNA) genes, and transfer RNA (tRNA) genes in the plastome of *I. suaveolens*. GC skew (purple) indicates whether G > C or G < C, and GC content is shown in black.
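GC content and GC skew, as plotted by GView, are windowed base counts; GC skew is conventionally defined as (G - C)/(G + C). The window and step sizes in this Python sketch are assumed defaults, not the settings used for the figure, and windows without G or C are given a skew of 0.

```python
def gc_profile(seq, window=1000, step=500):
    """Return (start, gc_content, gc_skew) for each sliding window of `seq`."""
    seq = seq.upper()
    out = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / len(chunk) if chunk else 0.0
        skew = (g - c) / (g + c) if g + c else 0.0  # positive where G exceeds C
        out.append((start, gc_content, skew))
    return out
```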

**Figure 4.** VISTA visualization of the alignment of the eight *Ilex* plastomes. The plots show sequence identity with the *I. suaveolens* plastome, used as the reference. The vertical and horizontal axes represent percent identity (50%–100%) and sequence position, respectively. Divergent hotspot regions are labeled along the top of the alignment, where gray arrows indicate the orientation of the annotated genes. Red bars indicate non-coding sequences (NCS), and white peaks represent differences among the chloroplast genomes.

**Figure 5.** Comparative analyses of the SSC/IRs and LSC/IRs junctions among eight *Ilex* plastomes. Colored boxes above the scaled strips indicate genes. Distances (bp) between genes and junctions are indicated. JLA, the junction of IRa/LSC; JLB, the junction of IRb/LSC; JSA, the junction of IRa/SSC; JSB, the junction of IRb/SSC.

#### *3.5. Phylogenetic Analyses of Ilex in Aquifoliaceae*

To investigate phylogenetic relationships, the entire plastid genome sequences of 15 *Ilex* accessions were aligned using MAFFT. The phylogenetic tree was generated in MEGA X using the ML method with 1000 bootstrap replicates. *P. trichocarpa*, *P. deltoides*, *Q. acutissima*, and *H. himalaica* were used as outgroups. The phylogenetic tree resolves four clades among the examined *Ilex* species (Figure 6). *I. latifolia* and *I. integra* cluster together with high bootstrap support in clade I, which also includes three other *Ilex* species (*I. delavayi*, *I.* sp. XY-2016, and *I. cornuta*). All of them are evergreen trees with leathery leaves that are broadly distributed in subtropical Asia. The endemic *I. viridis* and *I. suaveolens* are closely related in clade III. *I. micrococca* clusters with *I. wilsonii* and *I. asprella* in clade IV; the latter two are deciduous shrubs or trees. Yerba mate (*I. paraguariensis*) and *I. dumosa*, both native to South America, are located in clade II. The plastid phylogeny suggests that these species might not have originated from a single ancestor in Aquifoliaceae. A phylogenetic tree constructed from the full coding sequences showed a distinct clade structure (Figure S3).
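The 1000 bootstrap replicates mentioned above follow the nonparametric bootstrap: alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and the support for a clade is the fraction of replicate trees that contain it. The Python sketch shows only the resampling and support-counting steps; tree inference (ML in MEGA X in this study) is omitted, and the helper names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement.

    alignment: dict taxon -> aligned sequence (all the same length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}


def clade_support(replicate_clades, clade):
    """Fraction of replicate trees (each given as a set of clades) containing `clade`."""
    clade = frozenset(clade)
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)
```

In practice each pseudo-replicate would be passed to the same tree-building method as the original alignment, and the clades of each resulting tree collected before computing support.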

**Figure 6.** The phylogenetic analysis of 15 *Ilex* species using entire plastid genomes. The evolutionary tree was constructed in MEGA X using the maximum likelihood (ML) method and the Tamura-Nei model. The percentage of trees in which the associated taxa clustered together is shown next to the branches. This analysis involved 19 complete plastomes: those of *I. latifolia*, *I. suaveolens*, *I. viridis*, *I. micrococca*, *I. paraguariensis* (KP016928), *I. dumosa* (KP016927), *I. integra* (MK335537), *I. cornuta* (MK335536), *I.* sp. XY-2016 (KX426469), *I. delavayi* (KX426470), *I. szechwanensis* (KX426466), *I. wilsonii* (KX426471), *I. polyneura* (KX426468), *I. asprella* (NC\_045274.1), and *I. pubescens* (KX426467), and those of the outgroups *P. trichocarpa* (NC\_009143), *P. deltoides* (MK267316), *Q. acutissima* (MF593895), and *H. himalaica* (KX434807).

#### **4. Discussion**

Hollies (*Ilex*) are distributed worldwide, mostly in mesic environments. A recent phylogenetic and biogeographic study suggested that *Ilex* originated in subtropical Asia [6] and subsequently colonized other areas (e.g., South and North America, Australia, Europe, Africa, and some oceanic islands), with divergence times of 4 to 30 million years ago [6]. Approximately 204 *Ilex* species (149 of them endemic) have been described in China [48]. Located in a transition zone between the northern and southern floras of eastern China, Mount Huangshan is considered a priority area for biodiversity conservation; all 20 *Ilex* species recorded there have medicinal properties and economic importance for horticulture and industry [8,49,50]. Among these species, *I. suaveolens* is the most abundant native species. The deciduous *I. micrococca* is grown locally as a popular ornamental tree and as a source of high-quality material for paper and tannin production, in addition to its use in herbal medicine. *I. latifolia*, the source of "Kuding" tea, and the medicinal *I. viridis*, both of which have potential clinical uses for clearing heat, reducing inflammation, and detoxification, are moderately abundant in the dynamic forest plot [26].

The plasticity and diversity of leaf traits, including venation, anatomy, stomatal distribution, and stomatal conductance, are strongly affected by both cellular factors and environmental cues [51–53]. Hence, extensive phenotypic surveys of vegetative and reproductive organs contribute to the understanding of plant physiological and ecological adaptation. In our work, initial comparative analyses of leaf phenotypes revealed significant differences in LA, SLA, STD, and SW among the four *Ilex* species, suggesting that these morphological traits could potentially be used to distinguish *Ilex* species (Table 1). The persistent fruits and distinctive leaf morphology of *Ilex* result from complex genetic variation and ecological adaptation to various environments; therefore, understanding the underlying molecular mechanisms requires extensive knowledge of genome architecture and genome size diversity [54]. A transcriptome assembly has previously been produced only for *I. paraguariensis* [13], and a draft nuclear genome sequence and annotation remain unavailable for *Ilex*. Using tomato as the internal standard, DNA image cytometry of the four *Ilex* species estimated an NG of approximately 955 Mb for *I. latifolia*, higher than that of *I. cornuta* (642 Mb) [55]. The NG values of *I. suaveolens* and *I. viridis* were similar to that of *I. mucronata* (1073 Mb) [56]. *I. micrococca* had the largest NG, slightly smaller than that of *I. paraguariensis* (1078 Mb) [29]. The DNA C-values (NG) of these four *Ilex* species had not previously been reported. We found that *I. micrococca* had the largest NG but the lowest SW among the four *Ilex* species. A similar pattern was reported in a recent study that proposed a negative association between seed mass and nuclear genome size in diploid *Aesculus* species [57]. Among the morphological traits examined, NG appeared to be linearly related to SLA and STD in the four *Ilex* species, which is inconsistent with the NG/STD relationship found in a large-scale comparative analysis of angiosperms [19]. We hypothesize that the statistical associations among NG, LEC, and STD could not be evaluated adequately with such a small sample size, which could explain this discordance. Alternatively, the relationships between NG and various leaf traits may not be conserved among angiosperms because of diversified environmental adaptation [58–60].
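The internal-standard method estimates a sample's DNA content from the ratio of its fluorescence (or integrated optical density) peak to that of the standard, and the widely used factor of 978 Mbp per pg converts picograms to megabases. The Python sketch also includes a Pearson correlation, because the discussion above turns on NG/trait relationships across only four species. The tomato DNA content is left as a parameter because its value, and whether NG here is a 1C or 2C value, are not stated in this section; the example numbers are hypothetical.

```python
import math

PG_TO_MBP = 978.0  # conversion factor: 1 pg of DNA is about 978 Mbp


def sample_dna_content_pg(sample_peak, standard_peak, standard_pg):
    """Internal-standard estimate: standard content scaled by the peak ratio."""
    return standard_pg * sample_peak / standard_peak


def pg_to_mbp(pg):
    """Convert a DNA content in picograms to megabase pairs."""
    return pg * PG_TO_MBP


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With n = 4 there are only two degrees of freedom, so |r| must exceed roughly 0.95 to reach two-sided significance at the 5% level, one reason apparent linear trends among four species should be interpreted cautiously.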

Advances in biotechnology, particularly in whole-genome sequencing, have created opportunities to explore phylogenies, species identification, and molecular markers among taxa [23,61]. In our work, using high-throughput sequencing combined with de novo and reference-guided assembly, we constructed the complete plastomes of four *Ilex* species, which show the conserved quadripartite circular structure comprising the LSC, the SSC, and two copies of the IR. A total of 144 putative genes (96 PE, 40 tRNA, and 8 rRNA genes) were previously annotated in *Ilex* plastid genomes [1]. The numbers of identified PE and tRNA genes appear to vary among *Ilex* species [62,63]. Further sequence analyses indicated that *ycf68* and *orf42* were located between the two exons of *trnA-UGC*, that *orf188* overlaps with one exon of *ndhA*, and that *trnP-GGG* overlaps with *trnP-UGG*. Thus, the total number of genes was revised to 134 (89 PE, 37 tRNA, and 8 rRNA genes) in the plastomes of *I. suaveolens*, *I. viridis*, and *I. micrococca*. The largest difference in plastome length among the *Ilex* species is 370 bp, whereas the maximum length difference in the LSC is 331 bp, suggesting that divergence in the LSC region may account for most of the variation in plastome length, which might depend on IR contraction and expansion [64]. Plastid genome sequences show extensive variation in the length, number, and distribution of SSRs, which potentially contributes to genomic diversity [65]. Plastid SSRs have been used extensively in studies of taxonomy, phylogeny, and maternal population structure, diversity, and differentiation [66]. In total, 221 SSRs and 196 unique long repeats were characterized in the four *Ilex* plastomes. 
Mononucleotide (A/T) repeats are the dominant SSR type and are distributed in non-coding regions, which is related to the AT-rich nucleotide composition. This typical pattern was also identified in previous reports [67–69]. Forward repeats and lengths of 20–30 bp were the most common features of the long repeats.

In our study, both GView and mVISTA analyses of the eight plastid genomes revealed numerous divergent hotspots located primarily in the single-copy regions, with more variable sites in intergenic non-coding sequences than in coding genes. These variable sequences could potentially be used to develop new molecular markers for identification and taxonomy in *Ilex*. The identification of these hotspots is consistent with a previous study that reported at least 11 divergent regions [1]. Although several divergent genes (*rbcL*, *atpB*, and *matK*) have informed phylogenetic reconstruction among distantly related species, they may not provide sufficient resolution within *Ilex* because they are less divergent than the hotspot regions [2,6,22,70]. Recently, *trnH-psbA*, *rbcL-accD*, and *trnS-trnG* have been developed as genetic markers in other species [71,72]. Plastome divergence hotspots are considered powerful for species-level identification. However, whether these hotspot regions can be widely used for classification and taxonomy in *Ilex* remains to be assessed experimentally.
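One common way to rank candidate marker regions, not used in this study, is nucleotide diversity (π), the mean proportion of differing sites between pairs of aligned sequences; DnaSP, for example, reports it. The Python sketch computes π per region and sorts regions by it. Region names and sequences are hypothetical, and sites with a gap in either sequence are skipped for that pair, which is one of several possible gap treatments.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise p-distance (pi) over aligned sequences; gapped sites skipped per pair."""
    dists = []
    for a, b in combinations(seqs, 2):
        sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if sites:
            dists.append(sum(x != y for x, y in sites) / len(sites))
    return sum(dists) / len(dists) if dists else 0.0


def rank_regions(regions):
    """Sort candidate marker regions (name -> aligned sequences) by decreasing pi."""
    return sorted(regions, key=lambda name: nucleotide_diversity(regions[name]), reverse=True)
```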

The comparison of sequence variability at the IRs/SSC and IRs/LSC junctions showed relatively fluid patterns in the eight *Ilex* species. In most species, *rps19* lies in the LSC region close to the JLB; however, it spans the JLB, extending 4 bp into the IRB, in *I. paraguariensis*, *I. viridis*, and *I. suaveolens*. Typical variability at the JSB was also observed for *ycf1* in the IRB region and *ndhF* in the SSC region. A similar extension of the IR regions has been recognized in other *Ilex* species [1]. Sequence variation at the four SC/IR junctions appears to occur frequently during genome evolution, resulting in alterations of plastome size [73]. Hence, contraction and expansion at the IR boundaries could explain the variability in sequence length among plastid genomes [74].

Although Aquifoliaceae has a good fossil record, several systematic studies revealed strong incongruence between nuclear and plastid trees, suggesting that the evolutionary and phylogenetic patterns of Aquifoliaceae remain to be elucidated [1,6,22]. Molecular genetic factors, including inter-lineage hybridization, incomplete lineage sorting, introgression, and gene duplication and loss, have been reported to contribute significantly to phylogenetic incongruence, which occurs broadly in angiosperms [75]. In contrast to nuclear trees, plastid trees strongly reflect the biogeographic distribution of extant species [22]. In our work, the phylogenetic topology revealed four clades within *Ilex*, in agreement with the fossil record [6] and with recent plastid phylogenetic analyses [1,32,63]. All five *Ilex* species in clade I belong to section *Ilex* and have leathery leaves, which is consistent with the traditional classification [48]. *I. paraguariensis* and *I. dumosa* are both located in clade II, consistent with their shared distribution in South America. However, *I. suaveolens* of section *Lioprinus* clusters with *I. szechwanensis* and *I. viridis* in clade III, whereas the latter two species belong to section *Paltoria*. The unexpectedly close relationship between evergreen and deciduous species in clade IV further reflects the incongruence between the plastid phylogeny and traditional taxonomy [1,3,67]. Nevertheless, the three endemic *Ilex* species in clade III show relatively similar leaf morphology and growth patterns (at 600–1600 m altitude). The subtropical deciduous *I. micrococca* and *I. asprella* in clade IV have very similar leaf margins and grow at altitudes of 400–1000 m. 
Additionally, the topology constructed from the entire plastomes improved the quality of the phylogeny compared with that constructed from the full coding sequences (Figure S3). Therefore, the plastid topology provides more reliable clues for inferring the species' geographic patterns and origins. Overall, as more complete plastome sequences of *Ilex* species become available, plastid trees combined with nuclear markers will help resolve the deeper branches of the phylogeny and biogeography of Aquifoliaceae.

#### **5. Conclusions**

Non-specific pollination and weak reproductive isolation result in frequent interspecific genetic exchange, confounding the phylogeny and biogeography of the genus *Ilex*. Dioecious *Ilex* species have similar flowers and fruits, but leaf morphology varies with climate and season, which hinders the identification and classification of *Ilex* specimens [22,75]. Until now, the anatomical features and plastid genomes of four endemic *Ilex* species on Mount Huangshan were undocumented. In this work, we performed comparative analyses of phenotypic traits (e.g., leaves, flowers, and seeds) and molecular profiles, including nuclear genome size, plastid genome structure, variable patterns, and phylogenetic relationships, in four *Ilex* species. The reconstruction of the plastid phylogeny demonstrated the value of complete plastomes for inferring the phylogeography and phylogenetic evolution of *Ilex*. However, phenotypic variation and variable molecular patterns might be closely related to ecological adaptation to different environments. The limited number of available plastome accessions restricted a more extensive exploration of *Ilex* phylogeny, and additional samples need to be sequenced. In summary, the morphological and molecular data in the present study provide informative resources for in-depth investigation of the phylogeny, biogeography, and genetic diversity of the family Aquifoliaceae.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/9/964/s1, Figure S1: Analysis of SSRs in four *Ilex* plastid genomes, Figure S2: Analysis of the long repeated sequences in four *Ilex* plastid genomes, Figure S3: The phylogenetic analysis of 15 *Ilex* species using the coding sequences, Table S1: Base composition and proportions of the four *Ilex* plastid genomes, Table S2: Statistical analysis of codon usage, RSCU values, codon-anticodon recognition patterns of four *Ilex* plastid genomes.

**Author Contributions:** T.S. and M.H. designed the experiment, analyzed all of the data, and prepared the initial draft. M.H. developed the concept and approved the final manuscript. M.Z. and Z.S. collected all genome sequencing data and performed the statistical analysis. Z.S. and X.L. prepared the sections, visualized the images, and performed the FCM assessment. B.Z. and H.W. assisted M.Z. with sequence comparison, experiments, and DNA calculations. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (NSFC) (31870589; 31700525); the Natural Science Foundation of Jiangsu Province (NSFJ) (BK20170921); the Scientific Research Foundation for High-Level Talents of Nanjing Forestry University (SRFNFU) (GXL2017011; GXL2017012); and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

**Acknowledgments:** The authors would like to thank the NSFC, NSFJ, and SRFNFU for funding this work and the Co-Innovation Center for Sustainable Forestry in Southern China and PAPD for the instrument use. Thanks go to Yao Li, who assisted with the species identification and Mingzhi Li for the technical support with sequence processing. We also appreciate the critical comments from Yanming Fang.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Identification of a Natural Hybrid between** *Castanopsis sclerophylla* **and** *Castanopsis tibetana* **(Fagaceae) Based on Chloroplast and Nuclear DNA Sequences**

#### **Xiaorong Zeng, Risheng Chen, Yunxin Bian, Xinsheng Qin, Zhuoxin Zhang and Ye Sun \***

Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China; zengxiaorong1211@126.com (X.Z.); sam.chengrisheng@gmail.com (R.C.); bianyunx@163.com (Y.B.); qinxinsheng@scau.edu.cn (X.Q.); zxzhang@scau.edu.cn (Z.Z.)

**\*** Correspondence: sun-ye@scau.edu.cn; Tel.: +86-136-4265-9676

Received: 30 June 2020; Accepted: 7 August 2020; Published: 11 August 2020

**Abstract:** *Castanopsis* × *kuchugouzhui* Huang et Y. T. Chang was recorded in Flora Reipublicae Popularis Sinicae (FRPS) as a hybrid species on Yuelushan mountain, but it is treated as a hybrid between *Castanopsis sclerophylla* (Lindl.) Schott. and *Castanopsis tibetana* Hance in Flora of China. After a thorough investigation of Yuelushan mountain, we found a population of *C. sclerophylla* and one individual of *C.* × *kuchugouzhui*, but no living individual of *C. tibetana*. We collected *C.* × *kuchugouzhui* and sampled 42 individuals of *C. sclerophylla* from Yuelushan and Xiushui and 43 individuals of *C. tibetana* from Liangyeshan and Xiushui. We used chloroplast DNA sequences and 29 nuclear microsatellite markers to investigate whether *C.* × *kuchugouzhui* is a natural hybrid between *C. sclerophylla* and *C. tibetana*. The chloroplast haplotype analysis showed that *C.* × *kuchugouzhui* shared haplotype H2 with *C. sclerophylla* on Yuelushan. The STRUCTURE analysis identified two distinct genetic pools that corresponded well to *C. sclerophylla* and *C. tibetana* and revealed genetic admixture in *C.* × *kuchugouzhui*. Furthermore, the NewHybrids analysis suggested that *C.* × *kuchugouzhui* is an F2 hybrid between *C. sclerophylla* and *C. tibetana*. Our results confirm that the *C.* × *kuchugouzhui* recorded in FRPS is a rare hybrid between *C. sclerophylla* and *C. tibetana*.

**Keywords:** *Castanopsis* × *kuchugouzhui*; natural hybrid; molecular identification; chloroplast DNA sequence; microsatellite

#### **1. Introduction**

Hybridization is the interbreeding of individuals from two genetically distinct populations or species [1]. It has been estimated that at least 25% of plant species, mostly young species, are involved in hybridization and potential introgression with other species [2]. Many studies have shown that natural hybridization is ubiquitous across taxa [3–6] and that it plays a significant role in generating genetic diversity and even in the origin of new ecotypes or species [7–10]. Natural hybridization has become a major research focus in plant systematics and evolution in recent years [11–13].

Hybrid identification is the first step in exploring the intricate evolutionary history of natural hybridization. Natural hybrids are usually morphologically intermediate between their parents or more similar to one of them, and they can form a gradual morphological transition that blurs the boundary between species [14]. Chloroplast DNA is uniparentally inherited (maternally in most angiosperms), whereas nuclear DNA is biparentally inherited; thus, a comparative analysis of nuclear and chloroplast DNA can provide complementary and often contrasting information on genetic structure and phylogeny. Interspecific hybrids are commonly identified by cytonuclear discordance, which may indicate different parental contributions to the hybrid genome [4,15–17].

*Castanopsis sclerophylla* (Lindl.) Schott. and *Castanopsis tibetana* Hance are dominant species in mid-subtropical evergreen broad-leaved forests in China. *Castanopsis sclerophylla* is mainly distributed north of the Nanling Mountains, south of the Yangtze River, and east of Sichuan and Guizhou provinces [18]. *Castanopsis tibetana* is widely distributed in subtropical China, and its range overlaps with that of *C. sclerophylla*. *Castanopsis tibetana* grows together with *C. sclerophylla* in a forest on Yuelushan mountain in Changsha City, Hunan Province, where, according to Flora Reipublicae Popularis Sinicae (FRPS), a specimen of *Castanopsis* × *kuchugouzhui* Huang et Y. T. Chang was collected [18]. The leaves and cupules of *C.* × *kuchugouzhui* are morphologically intermediate between those of *C. sclerophylla* and *C. tibetana*; thus, it is recognized as a hybrid species in FRPS [18]. However, Flora of China treats *C.* × *kuchugouzhui* as a putative natural hybrid between *C. sclerophylla* and *C. tibetana* [19]. To date, there has been no molecular evidence for this putative hybrid. The purpose of this study was to verify whether *C.* × *kuchugouzhui* is a natural hybrid between *C. sclerophylla* and *C. tibetana* by using chloroplast DNA sequences and nuclear microsatellite markers.

#### **2. Materials and Methods**

#### *2.1. Plant Materials and DNA Extraction*

After a thorough investigation of this forest on Yuelushan mountain in 2017–2019, we found a population of *C. sclerophylla* and one individual of *C.* × *kuchugouzhui*, but no living individual of *C. tibetana*. According to information from the Yuelushan Mountain Scenic Area Administration Bureau and Hunan Normal University, this *C.* × *kuchugouzhui* tree is more than 100 years old and is the same individual reported in FRPS. We sampled *C.* × *kuchugouzhui* and 20 individuals of *C. sclerophylla* on Yuelushan. We further sampled 22 individuals of *C. sclerophylla* and 19 individuals of *C. tibetana* from Xiushui, as well as 24 individuals of *C. tibetana* from Liangyeshan (Table 1). For each population, eight to ten fresh leaves per tree were collected and quickly dried with silica gel, and sampled individuals were at least 20 m apart. Voucher specimens of *C.* × *kuchugouzhui* and of each population were deposited in the Dendrological Herbarium of South China Agricultural University (CANT).


**Table 1.** Location, sample size, and genetic diversity of the investigated populations.

N: north, E: east, *A*: total number of alleles detected, *AR*: allelic richness for 19 diploid individuals, *H*: gene diversity, *FIS*: fixation index. \* Deviated from Hardy–Weinberg equilibrium significantly (*p* < 0.01).
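Allelic richness (*AR*) in Table 1 is standardized to 19 diploid individuals, that is, 38 gene copies, by rarefaction: the probability that allele i appears in a random subsample of g gene copies is 1 - C(N - N_i, g)/C(N, g), where N is the total number of gene copies and N_i the count of allele i, and these probabilities are summed over alleles. A Python sketch of this estimator is given here; it assumes allele counts for one locus in one population as input and is an illustration, not FSTAT's code.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Rarefied allelic richness for one locus in one population.

    allele_counts: number of gene copies carrying each allele.
    g: rarefaction size in gene copies (38 for 19 diploid individuals).
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("rarefaction size exceeds the number of gene copies")
    # comb(n - ni, g) is 0 when g > n - ni, i.e. the allele is always sampled.
    return sum(1 - comb(n - ni, g) / comb(n, g) for ni in allele_counts)
```

Averaging this value over loci gives a per-population *AR* that can be compared across populations of unequal size.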

Total genomic DNA was extracted using the DNeasy Plant Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. The quality and concentration of the genomic DNA were evaluated by electrophoresis on 1% agarose gels and with a NanoDrop™ 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA).

#### *2.2. Chloroplast DNA Sequencing and Nuclear Microsatellite Genotyping*

The chloroplast psbA-trnH and trnM-trnV intergenic spacers were amplified and sequenced with primers described in References [20,21]. Each polymerase chain reaction (PCR) had a total volume of 30 μL containing 1 × ES Taq Master Mix (Cwbiotech, Beijing, China), 0.2 μM each of the forward and reverse primers, and 20 ng of DNA. PCR amplification was conducted for 33 cycles in a TaKaRa PCR Thermal Cycler Dice™ Gradient (TP600) (TAKARA, Kyoto, Japan). Each cycle consisted of 45 s at 95 °C, 45 s at the annealing temperature, and 45 s at 72 °C. The program began with an initial denaturation of 5 min at 95 °C and ended with a final extension of 7 min at 72 °C.

Polymorphic nuclear microsatellites were screened from 173 simple sequence repeat (SSR) markers originally developed in *C. sclerophylla*, *Castanopsis fargesii*, *Castanopsis sieboldii*, *Castanopsis tribuloides*, *Castanea sativa*, and *Castanea mollissima* [22–28]. One sample from each population and *C.* × *kuchugouzhui* were used in a preliminary experiment to test SSR amplification. PCR was performed in a total volume of 10 μL consisting of 1 × ES Taq Master Mix, 0.5 μM each of the forward and reverse primers, and 20 ng of DNA. The PCR program was the same as above except for the annealing temperature. PCR products were separated by electrophoresis on 3% agarose gels.

A total of 75 primer pairs that generated a clear electrophoretic band were used in multiplex PCR, in which the forward primers were labeled with the fluorochromes TAMRA, HEX, 6-FAM, or ROX. Primers with different fluorescent labels were combined, and we aimed to keep their predicted products 30–50 bp apart. The Type-it Microsatellite PCR Kit (QIAGEN, Hilden, Germany) was used to prepare the multiplex PCR in a total volume of 10 μL containing 1 × PCR Master Mix, 1 × Q-Solution, 0.2 μM each of forward and reverse primers, and 20 ng of DNA. Two samples from each population and *C.* × *kuchugouzhui* were used in this experiment. The PCR program comprised an initial denaturation of 5 min at 95 °C, followed by 28 cycles of 30 s at 95 °C, 90 s at 57 °C, and 30 s at 72 °C, and a final extension of 30 min at 60 °C. PCR products were separated on an ABI-3730XL fluorescence sequencer (Applied Biosystems, Foster City, CA, USA) using LIZ500 as an internal size standard. Alleles (Table S1) were scored using GeneMarker v 2.2.0 [29]. Finally, 32 primer pairs with high polymorphism and stability were used to genotype all samples.
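The multiplex design rule (combine differently labeled primers and keep predicted products 30–50 bp apart) can be expressed as a simple panel-packing problem. The sketch below is a greedy first-fit heuristic written for illustration, not the procedure the authors used; it assumes the spacing constraint applies to loci sharing a dye, and the locus names and sizes are hypothetical.

```python
from collections import defaultdict

def build_panels(loci, min_gap=30):
    """Greedy first-fit grouping of fluorescently labeled SSR loci into panels.

    loci: list of (name, dye, expected_size_bp). Two loci may share a dye in the
    same panel only if their expected product sizes differ by >= min_gap bp;
    loci with different dyes are separated by colour and may overlap in size.
    """
    panels = []  # each panel maps dye -> list of (name, size)
    for name, dye, size in sorted(loci, key=lambda locus: locus[2]):
        for panel in panels:
            if all(abs(size - other) >= min_gap for _, other in panel[dye]):
                panel[dye].append((name, size))
                break
        else:  # no existing panel can take this locus, so open a new one
            panel = defaultdict(list)
            panel[dye].append((name, size))
            panels.append(panel)
    return panels

# Hypothetical loci and expected product sizes (not the study's markers)
loci = [("L01", "6-FAM", 150), ("L02", "6-FAM", 170), ("L03", "HEX", 160),
        ("L04", "6-FAM", 200), ("L05", "ROX", 155), ("L06", "TAMRA", 180),
        ("L07", "HEX", 175)]
panels = build_panels(loci)
for i, panel in enumerate(panels, start=1):
    print(f"panel {i}:", {dye: [n for n, _ in hits] for dye, hits in panel.items() if hits})
```

Sorting by size before packing keeps same-dye loci in ascending order within a panel; a real design would also account for the allele size range of each locus, not just its predicted size.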

#### *2.3. Data Analyses*

DNA sequences of psbA-trnH (GenBank accession numbers: MT635060-MT635092) and trnM-trnV (GenBank accession numbers: MT635093-MT635125) were manually checked using BioEdit [30]. Multiple alignments were carried out using MEGA v 7 [31] with *Castanopsis fabri* (GenBank accession numbers: MF592976, MF592882) as the outgroup. Haplotypes were retrieved using DnaSP v 6.12.03 [32], and a reduced median-joining network was constructed using NETWORK v 5.0 [33] to infer haplotype relationships.
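The haplotype-calling step amounts to finding the variable columns of the alignment and grouping sequences that are identical at those columns. The sketch below illustrates this idea on a hypothetical toy alignment; it is not DnaSP's implementation, and treating a gap as an additional character state is an assumption (other tools may exclude gapped sites instead).

```python
def call_haplotypes(alignment):
    """Collapse aligned sequences into haplotypes.

    alignment: dict sample_id -> aligned sequence of equal length.
    Returns (1-based variable positions, sample -> haplotype id,
    haplotype id -> states at the variable positions).
    Gaps ('-') are treated as a fifth character state (an assumption).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    variable = [i for i in range(length) if len({s[i] for s in seqs}) > 1]
    hap_ids, assignment = {}, {}
    for sample, seq in alignment.items():
        key = "".join(seq[i] for i in variable)  # states at variable sites only
        if key not in hap_ids:
            hap_ids[key] = f"H{len(hap_ids) + 1}"
        assignment[sample] = hap_ids[key]
    return [i + 1 for i in variable], assignment, {h: k for k, h in hap_ids.items()}

# Hypothetical toy alignment
aln = {"s1": "ACGTAC", "s2": "ACGTAC", "s3": "ACTTAC", "s4": "ACTTGC"}
variable, hap_of, hap_seq = call_haplotypes(aln)
print(variable, hap_of, hap_seq)
```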

Genetic diversity parameters including number of alleles detected (*A*), allelic richness (*AR*), gene diversity (*H*), observed heterozygosity (*HO*), gene diversity within populations (*HS*), gene diversity in total population (*HT*), fixation index (*FIS*), and genetic differentiation among populations (*FST*) under an infinite allele model were calculated per locus using FSTAT v 2.9.4 [34]. The Hardy–Weinberg equilibrium was tested by permuting alleles and comparing the fixation index calculated from randomized datasets to that obtained from the observed dataset; *p*-values were subjected to Bonferroni correction for multiple comparisons. Three loci significantly deviated from the Hardy–Weinberg equilibrium both in *C. sclerophylla* and in *C. tibetana*; they were excluded from all subsequent analyses.
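The Hardy–Weinberg test described above (alleles permuted among individuals, with the randomized *FIS* compared with the observed value) and the Bonferroni correction can be sketched as follows. This is a from-scratch illustration of the general procedure, not FSTAT's code: the one-sided test for heterozygote deficit, 1000 permutations, and the hypothetical loci are all assumptions.

```python
import random

def fis(genotypes):
    """F_IS = 1 - H_O / H_E at one locus; genotypes is a list of (allele1, allele2)."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    he = 1.0 - sum((alleles.count(a) / n) ** 2 for a in set(alleles))
    ho = sum(a1 != a2 for a1, a2 in genotypes) / len(genotypes)
    return 1.0 - ho / he if he > 0 else 0.0

def hwe_deficit_pvalue(genotypes, n_perm=1000, seed=1):
    """One-sided permutation test for heterozygote deficit: the share of
    datasets, with alleles shuffled among individuals, whose F_IS is at least
    as large as the observed F_IS."""
    rng = random.Random(seed)
    observed = fis(genotypes)
    alleles = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        shuffled = list(zip(alleles[0::2], alleles[1::2]))  # re-pair into diploids
        if fis(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # count the observed dataset itself

def bonferroni(pvalues, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction."""
    threshold = alpha / len(pvalues)
    return [p < threshold for p in pvalues]

# Hypothetical loci scored in 20 diploid individuals each
deficit = [(1, 1)] * 10 + [(2, 2)] * 10              # no heterozygotes
balanced = [(1, 2)] * 10 + [(1, 1)] * 5 + [(2, 2)] * 5
pvals = [hwe_deficit_pvalue(deficit), hwe_deficit_pvalue(balanced)]
print(pvals, bonferroni(pvals))
```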

Genetic differentiation and gene exchange between *C. sclerophylla* and *C. tibetana* were assessed with the model-based Bayesian clustering method implemented in STRUCTURE v 2.3.4 [35], using the admixture model with correlated allele frequencies among populations. Ten independent runs were conducted for each K value (from 1 to 5), each with 100,000 MCMC (Markov chain Monte Carlo) iterations after a burn-in period of 50,000 iterations. The optimal number of clusters (K) was determined with the ΔK statistic, which is based on the second-order rate of change in the log probability of the data between successive K values [36]. The average matrix of ancestry membership proportions over the 10 runs was calculated using CLUMPP v 1.1.2 [37].
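The ΔK statistic can be computed directly from the ln P(D) values of the replicate STRUCTURE runs. The sketch below uses the mean ln P(D) across runs for the second-order difference and divides by the standard deviation across runs at the same K; the three runs per K and their values are hypothetical (the study used ten runs per K).

```python
from statistics import mean, stdev

def evanno_delta_k(lnp):
    """Delta K from repeated STRUCTURE runs.

    lnp maps each K (consecutive integers) to the ln P(D) values of independent
    runs. Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using the mean L over
    runs; it is undefined for the smallest and largest K tested.
    """
    ks = sorted(lnp)
    means = {k: mean(v) for k, v in lnp.items()}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta

# Hypothetical ln P(D) values from three runs per K
lnp = {1: [-9000, -9002, -9001], 2: [-8000, -8003, -7997],
       3: [-7950, -7960, -7940], 4: [-7940, -7970, -7920],
       5: [-7935, -7990, -7900]}
delta = evanno_delta_k(lnp)
best_k = max(delta, key=delta.get)
print(delta, "best K =", best_k)
```

Dividing by the spread among runs penalizes K values whose likelihoods are unstable across replicates, which is why ΔK often peaks at the uppermost level of structure (here K = 2).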

Each individual was assigned to a genotype category with posterior probability by using the program NewHybrids v 1.1 beta [38]. We considered six genotype categories: Parent 1, Parent 2, F1 hybrids, F2 hybrids, backcross generation to Parent 1, and backcross generation to Parent 2 [39]. The analysis was run for 100,000 rounds after a burn-in of 50,000 iterations.
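To see what separates the six categories, it helps to write down the Mendelian expectation each one implies at a single locus: the probability of carrying 0, 1, or 2 alleles descended from Parent 1. The sketch below tabulates these standard expectations (collapsed over which parent transmitted which allele) together with the implied Parent-1 ancestry and interspecific heterozygosity. It illustrates the categories only and does not reproduce the NewHybrids MCMC.

```python
# Probability that a locus carries 0, 1, or 2 alleles descended from Parent 1
CLASSES = {
    "Parent 1": (0.0, 0.0, 1.0),
    "Parent 2": (1.0, 0.0, 0.0),
    "F1": (0.0, 1.0, 0.0),
    "F2": (0.25, 0.5, 0.25),
    "Backcross to Parent 1": (0.0, 0.5, 0.5),
    "Backcross to Parent 2": (0.5, 0.5, 0.0),
}

def expectations(probs):
    """Return (expected Parent-1 ancestry, expected interspecific heterozygosity)."""
    p0, p1, p2 = probs
    ancestry = (p1 * 1 + p2 * 2) / 2   # share of the two gene copies from Parent 1
    interspecific_het = p1             # loci with one allele from each parent
    return ancestry, interspecific_het

for name, probs in CLASSES.items():
    q, het = expectations(probs)
    print(f"{name:<22} ancestry = {q:.2f}  interspecific heterozygosity = {het:.2f}")
```

F1 and F2 hybrids share an expected ancestry of 0.5 but differ in interspecific heterozygosity (1.0 versus 0.5), which is the kind of information a genotype-category model can use beyond an admixture proportion alone.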

#### **3. Results**

#### *3.1. Chloroplast DNA Variation*

After alignment using *C. fabri* as the outgroup, the total length of the chloroplast sequences was 1158 bp, in which seven variable sites and six haplotypes were identified. The sequence variants and their positions are shown in Table 2. Within the *C. sclerophylla*/*C. tibetana* lineage, five variable sites and five haplotypes were identified. Haplotypes H1 and H2 were detected in populations of *C. sclerophylla*, while haplotypes H3, H4, and H5 were found only in *C. tibetana*. *Castanopsis* × *kuchugouzhui* shared haplotype H2 with *C. sclerophylla* on Yuelushan mountain. Population GK-XS possessed two haplotypes, H4 and H5, whereas each of the other populations was fixed for a single, unique haplotype. No haplotype was shared between *C. sclerophylla* and *C. tibetana*, or between *C.* × *kuchugouzhui* and *C. tibetana*. Relationships among haplotypes are shown in Figure 1. Haplotype H2 was closely related to H1, and both diverged from haplotypes H3, H4, and H5, which formed a haplogroup.



**Figure 1.** Haplotype relationships shown on median-joining network. The size of each circle is proportional to the haplotype frequency. Mutational steps between the haplotypes are indicated on the line. Population codes are the same as in Table 1.

#### *3.2. Genetic Diversity at Nuclear Microsatellite Loci*

In *C. sclerophylla*, a total of 195 alleles were revealed at 29 nuclear microsatellite loci (Table 3). The number of alleles per locus ranged from two to 15, and the observed heterozygosity (*HO*) varied from 0.143 to 0.932. The within-population gene diversity (*HS*) and the overall gene diversity (*HT*) ranged from 0.286 to 0.900 and from 0.312 to 0.907, respectively. Over the 29 loci, the values of *FIS* and *FST* were 0.080 and 0.053, respectively. Five of the 29 loci significantly deviated from the Hardy–Weinberg equilibrium (*p* < 0.01).


**Table 3.** Genetic diversity at 29 nuclear microsatellite loci in *Castanopsis sclerophylla*.

*A*: total number of alleles detected, *HO*: observed heterozygosity, *HS*: gene diversity within populations, *HT*: gene diversity in total population, *FIS*: fixation index (a coefficient based on the difference among observed and expected heterozygosity), *FST*: genetic differentiation among populations (a coefficient based on the difference among expected heterozygosity within populations and expected heterozygosity in the species). \* Deviated from Hardy–Weinberg equilibrium significantly (*p* < 0.01).
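The footnote defines *FIS* and *FST* as contrasts between observed and expected heterozygosities. The sketch below computes *HO*, *HS*, *HT*, *FIS* = 1 − *HO*/*HS*, and *FST* = 1 − *HS*/*HT* for one locus with simple, uncorrected estimators and equal population weights. These are assumptions for illustration: FSTAT applies sample-size corrections, so its values will differ somewhat. The genotypes are hypothetical.

```python
from collections import Counter

def allele_freqs(genotypes):
    counts = Counter(a for g in genotypes for a in g)
    n = sum(counts.values())
    return {a: c / n for a, c in counts.items()}

def diversity_stats(pops):
    """pops: dict population -> list of (allele1, allele2) genotypes at one locus.
    Simple (uncorrected) estimators with equal weight per population."""
    ho = sum(sum(a != b for a, b in g) / len(g) for g in pops.values()) / len(pops)
    freqs = [allele_freqs(g) for g in pops.values()]
    hs = sum(1 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
    alleles = {a for f in freqs for a in f}
    # H_T from allele frequencies averaged over populations
    ht = 1 - sum((sum(f.get(a, 0.0) for f in freqs) / len(freqs)) ** 2 for a in alleles)
    return {"HO": ho, "HS": hs, "HT": ht, "FIS": 1 - ho / hs, "FST": 1 - hs / ht}

# Hypothetical genotypes at one locus in two populations
pops = {"pop A": [(1, 1), (1, 2), (1, 2), (2, 2)],
        "pop B": [(3, 3), (3, 3), (3, 2), (3, 3)]}
print(diversity_stats(pops))
```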

In *C. tibetana*, a total of 115 alleles were revealed at 29 nuclear microsatellite markers (Table 4). Per locus, the number of alleles ranged from two to nine, and the observed heterozygosity (*HO*) varied from 0.021 to 0.921. The within-population gene diversity (*HS*) and the overall gene diversity (*HT*) ranged from 0.155–0.746 and from 0.171–0.786, respectively. Over the 29 loci, the values of *FIS* and *FST* were −0.061 and 0.204, respectively. Five of 29 loci significantly deviated from the Hardy–Weinberg equilibrium (*p* < 0.01).


**Table 4.** Genetic diversity at 29 nuclear microsatellite loci in *Castanopsis tibetana*.

*A*: total number of alleles detected, *HO*: observed heterozygosity, *HS*: gene diversity within populations, *HT*: gene diversity in total population, *FIS*: fixation index (a coefficient based on the difference among observed and expected heterozygosity), *FST*: genetic differentiation among populations (a coefficient based on the difference among expected heterozygosity within populations and expected heterozygosity in the species). \* Deviated from Hardy–Weinberg equilibrium significantly (*p* < 0.01).

Genetic diversity within populations is summarized in Table 1. *Castanopsis sclerophylla* exhibited a higher level of within-population genetic diversity than *C. tibetana*. At the population level, the average number of alleles (*A*), allelic richness (*AR*), and gene diversity (*H*) were 5.6, 5.428, and 0.605 in *C. sclerophylla*, but 3.1, 3.041, and 0.454 in *C. tibetana*. Population KZ-XS harbored the highest genetic diversity (*A* = 5.6, *AR* = 5.426, and *H* = 0.629), while population GK-XS showed the lowest (*A* = 2.9, *AR* = 2.862, and *H* = 0.427). The fixation index (*FIS*) was 0.078 in *C. sclerophylla*, but −0.063 in *C. tibetana*. Population KZ-XS deviated significantly (*p* < 0.01) from the Hardy–Weinberg equilibrium across the 29 loci.
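Allelic richness in Table 1 is standardized to 19 diploid individuals, that is, 38 gene copies, so that populations of different size can be compared. A common way to do this, assumed here as an illustration rather than taken from the text, is rarefaction: the expected number of distinct alleles in a random subsample of g gene copies drawn without replacement, AR = Σᵢ [1 − C(N − Nᵢ, g) / C(N, g)], where Nᵢ is the count of allele i and N the total number of gene copies. The allele counts below are hypothetical.

```python
from math import comb

def allelic_richness(allele_counts, g):
    """Rarefied allelic richness: expected number of distinct alleles in a
    random subsample of g gene copies drawn without replacement."""
    n = sum(allele_counts)
    if g > n:
        raise ValueError("rarefaction size exceeds the number of gene copies")
    total = comb(n, g)
    # an allele is missed only if all g copies come from the other n - ni copies
    return sum(1 - comb(n - ni, g) / total for ni in allele_counts)

# Hypothetical locus: 40 gene copies (20 diploids), rarefied to 38 copies
counts = [20, 10, 5, 3, 2]
print(allelic_richness(counts, 38))
```

Only the rare allele (two copies) has a chance of being missed in a subsample of 38, so the result is slightly below the five alleles observed.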

#### *3.3. STRUCTURE and NewHybrids Analyses Based on Nuclear Microsatellite Markers*

In the STRUCTURE analysis (Figure 2A), the optimal K value was 2, indicating that all sampled individuals were assigned to two genetic clusters. One cluster corresponded to *C. sclerophylla* (42 individuals with cluster membership >0.993) and the other to *C. tibetana* (43 individuals with cluster membership >0.996). *Castanopsis* × *kuchugouzhui* showed genetic admixture, with membership proportions of 0.606 and 0.394 in the two clusters, and appeared to be "heterozygous", carrying alleles inherited from both *C. sclerophylla* and *C. tibetana*; this suggests that it could be a hybrid offspring of the two species. In the NewHybrids analysis (Figure 2B), the 86 sampled individuals were clearly assigned to three genotype categories with high posterior probabilities (>0.999). Forty-two individuals of *C. sclerophylla* and 43 individuals of *C. tibetana* were assigned to Parent 1 and Parent 2, respectively, and *Castanopsis* × *kuchugouzhui* was classified as an F2 hybrid between *C. sclerophylla* and *C. tibetana*.

*Forests* **2020**, *11*, 873

**Figure 2.** (**A**) Genetic admixture coefficient of the samples; (**B**) posterior probability of genotype category evaluated based on nuclear microsatellite markers. Population codes are the same as in Table 1.

#### **4. Discussion**

#### *4.1. SSR Transferability among Castanopsis Species*

The transferability of SSRs among species is related to the level of divergence between them: the closer the relationship, the higher the transferability of the primers [40]. Ye et al. [23] screened 51 microsatellite markers originally developed from four *Castanopsis* species (*C. sclerophylla*, *Castanopsis chinensis*, *Castanopsis cuspidata*, and *C. sieboldii*) and found that 68.6% of the SSR primer pairs successfully cross-amplified and 31.4% were polymorphic in *C. tibetana*. Li et al. [41] tested 124 EST (expressed sequence tag)-SSRs originally developed from *Castanea mollissima* and found that 42.7% of them successfully cross-amplified and 56.6% showed polymorphism in *Castanopsis fargesii*. In this study, we screened 31 SSRs originally developed from *Castanopsis* species and 142 SSRs originally developed from *Castanea* species. We found that 80.6% of the *Castanopsis* SSRs successfully cross-amplified in both *C. tibetana* and *C. sclerophylla*, and 52% showed polymorphism. In contrast, 35.2% of the *Castanea* SSRs cross-amplified in both *C. tibetana* and *C. sclerophylla*, and 38% were polymorphic. These results are consistent with the expectation that cross-species amplification between closely related genera is much less successful than within a genus [40]. The moderate to very high cross-species transferability of SSRs among *Castanopsis* species implies that species in this genus have relatively similar genetic makeup and may not be completely reproductively isolated; thus, natural hybridization between species in this genus is possible.
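The reported percentages can be checked against whole-number marker counts. The counts below are back-calculated from the percentages and are an inference, not figures stated in the text; the further assumption that the polymorphism percentages refer to the cross-amplified markers is supported by the fact that it reproduces both the 75 primer pairs with clear bands and the 32 pairs used for genotyping reported in Section 2.2.

```python
def transfer_rates(screened, amplified, polymorphic):
    """Return genus -> (share cross-amplified, share of amplified that is polymorphic)."""
    return {g: (amplified[g] / screened[g], polymorphic[g] / amplified[g])
            for g in screened}

# Counts inferred from the reported percentages (assumption)
screened = {"Castanopsis": 31, "Castanea": 142}
amplified = {"Castanopsis": 25, "Castanea": 50}
polymorphic = {"Castanopsis": 13, "Castanea": 19}

for genus, (amp, poly) in transfer_rates(screened, amplified, polymorphic).items():
    print(f"{genus}: {amp:.1%} cross-amplified, {poly:.0%} of those polymorphic")
print("primer pairs with clear bands:", sum(amplified.values()))
print("primer pairs used for genotyping:", sum(polymorphic.values()))
```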

#### *4.2. Genetic Diversity of C. sclerophylla and C. tibetana*

Genetic diversity is essential for populations to adapt to environmental change. Large populations of naturally outbreeding species usually have extensive genetic diversity, but it is generally reduced in small populations and endangered species. Habitat fragmentation caused by human interference reduces population size and increases spatial isolation. Such changes are accompanied by an erosion of genetic variation and an increase in inter-population genetic divergence due to increased genetic drift, elevated inbreeding, and reduced gene flow [42].

In the present study, moderate genetic variation was found in *C. sclerophylla*: the average number of alleles per locus was 6.7, and the mean observed heterozygosity was 0.557. This level of genetic diversity was similar to that reported in closely related species such as *C. fargesii* (*A* = 6.7, *HO* = 0.690) [24], *Castanopsis acuminatissima* (*A* = 10.8, *HO* = 0.517) [43], and *C. sieboldii* (*A* = 5.2, *HO* = 0.563) [44]. Compared with these species, *C. tibetana* showed a lower level of genetic diversity (*A* = 4.0, *HO* = 0.482). The genetic diversity of *C. tibetana* may have been strongly affected by habitat fragmentation and human interference, given that *C. tibetana* has been extirpated from Yuelushan mountain and exhibited higher genetic differentiation (*FST* = 0.204) than *C. sclerophylla* (*FST* = 0.053). Expanded sampling of *C. tibetana* is required to examine how habitat fragmentation has affected the genetic diversity of this species.

#### *4.3. Molecular Evidence of Natural Hybrid and Taxonomic Status for C.* × *kuchugouzhui*

The genetic structure analyses based on nuclear microsatellite markers show a clear genetic differentiation between *C. sclerophylla* and *C. tibetana*, with the two genetic clusters corresponding well to the two species. The genetic admixture of *C.* × *kuchugouzhui*, with membership proportions of 0.606 and 0.394 in the two clusters, indicates that it is a hybrid between *C. sclerophylla* and *C. tibetana*. Natural hybridization between the two species could be attributed to the full overlap of their flowering phenology, as both usually flower from April to May. *Castanopsis sclerophylla* is inferred to be the maternal parent of *C.* × *kuchugouzhui* because the hybrid shares with *C. sclerophylla* the maternally inherited haplotype H2. *Castanopsis sclerophylla* inhabits lower elevations and would therefore have a good chance of receiving pollen from *C. tibetana*, which occupies higher elevations. In the NewHybrids analysis, *C.* × *kuchugouzhui* was assigned to the F2 hybrid category between *C. sclerophylla* and *C. tibetana* with a very high posterior probability. However, the possible hybrid categories that could occur after up to four generations of crossing between the two parental species were allowed [39]. The last specimen of *C. tibetana* on Yuelushan mountain was collected in 1977. Given that *C.* × *kuchugouzhui* is more than 100 years old, the hybridization event that formed it must have occurred before *C. tibetana* disappeared from Yuelushan mountain. At the time of the presumed F1 hybrid formation, the only other *Castanopsis* species present in the region was *C. fargesii*, and a preliminary test with four SSRs did not detect any gene flow from this species to *C.* × *kuchugouzhui* (data not shown). Because the seed and pollen dispersal distances of *Castanopsis* species are very limited [45–47], it is very unlikely that *C.* × *kuchugouzhui* arose from a hybrid seed dispersed from elsewhere or from *C. tibetana* pollen originating elsewhere.

The leaf and cupule morphologies of *C.* × *kuchugouzhui* are intermediate between those of *C. sclerophylla* and *C. tibetana* [18], and *C.* × *kuchugouzhui* shows mixed genetic characteristics of the two species. However, there is only one record of *C.* × *kuchugouzhui* to date, and we did not identify natural hybridization elsewhere, for example in Xiushui, where the two species coexist. These facts suggest that natural hybridization between *C. sclerophylla* and *C. tibetana* is a very rare event. This is consistent with our expectation, since hybrid offspring between two species are expected to suffer deleterious consequences termed outbreeding depression: hybrid offspring in the F1 and subsequent generations will be rapidly eliminated by natural selection because of their lower fitness [48]. Therefore, rather than being listed as a separate species, *C.* × *kuchugouzhui* should be treated as a natural hybrid between *C. sclerophylla* and *C. tibetana*.

In recent years, *C. tibetana* has disappeared from Yuelushan mountain owing to serious human disturbance such as tourism development. According to our investigation, *C. sclerophylla* inhabits lower elevations of approximately 200–1000 m and prefers abundant sunlight, whereas *C. tibetana* generally grows in humid conditions at slightly higher elevations. These facts indicate some differentiation in ecological niche between *C. sclerophylla* and *C. tibetana*. Individuals of the two species seldom grow side by side as they do on Yuelushan mountain and in Xiushui County, although both can be found in the same forest communities. In this study, only one hybrid individual was corroborated. The rarity of natural hybrids between *C. sclerophylla* and *C. tibetana* may imply that the two species have strong but incomplete reproductive isolation, which may be caused by their ecological differentiation. Natural hybridization generates new genotypes and increases genetic diversity, which is important for trees to adapt to new environments in the face of rapid global change; thus, the new genotype represented by *C.* × *kuchugouzhui* is worth conserving as important germplasm.

#### **5. Conclusions**

In this study, we provided compelling evidence, based on chloroplast DNA sequences and 29 nuclear microsatellite markers, that *C.* × *kuchugouzhui* is a natural hybrid. *Castanopsis* × *kuchugouzhui* is the product of a very rare natural hybridization event, in which introgression occurred between *C. sclerophylla* and *C. tibetana*. The genetic analysis of this rare natural hybrid helps us understand genetic differentiation and gene exchange between *Castanopsis* species.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/8/873/s1: Table S1. Alleles at 29 loci used for STRUCTURE and NewHybrids analysis.

**Author Contributions:** Conceptualization, Y.S., X.Q., and Z.Z.; methodology, X.Z., R.C., Y.B., and Y.S.; software, X.Z.; validation, X.Z., R.C., and Y.S.; formal analysis, X.Z.; investigation, X.Z., R.C., Y.B., and Y.S.; resources, Y.S., X.Q., and Z.Z.; data curation, X.Z.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z., R.C., and Y.S.; visualization, X.Z.; supervision, Y.S.; project administration, Y.S.; funding acquisition, Y.S. All authors read and agree to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, grant number 31770698.

**Acknowledgments:** We are grateful to the editor and three anonymous reviewers for their insightful comments and suggestions that greatly helped improve our manuscript.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

### **Genome Cytosine Methylation May Affect Growth and Wood Property Traits in Populations of** *Populus tomentosa*

**Kaifeng Ma 1,2,3, Yuepeng Song 1,2, Dong Ci 1,2, Daling Zhou 1,2, Min Tian 1,2 and Deqiang Zhang 1,2,4,\***


Received: 18 June 2020; Accepted: 23 July 2020; Published: 29 July 2020

**Abstract:** Growth and wood formation are crucial and complex biological processes during tree development. These regulatory processes are presumed to be controlled by DNA methylation. However, there is little direct evidence that genes taking part in wood regulation are affected by cytosine methylation, resulting in phenotypic variation. Here, we detected epimarkers using a methylation-sensitive amplification polymorphism (MSAP) method and performed epimarker–trait association analysis on the basis of nine growth and wood property traits within populations of 432 genotypes of *Populus tomentosa*. Tree height was positively correlated with the relative full-methylation level, and 1101 of 2393 polymorphic epimarkers were associated with phenotypic traits, explaining 1.1–7.8% of the phenotypic variation. In total, 116 epimarkers were successfully sequenced, and 96 of these sequences were linked to putative genes. Among them, 13 candidate genes were randomly selected for verification using quantitative real-time PCR (qRT-PCR), which showed that the expression of nine putative genes (*PtCYP450*, *PtCpn60*, *PtPME*, *PtSCP*, *PtGH*, *PtMYB*, *PtWRKY*, *PtSTP*, and *PtABC*) was negatively correlated with DNA methylation level. These results suggest that changes in DNA methylation might contribute to regulating tree growth and wood property traits.

**Keywords:** growth trait; wood property; cytosine methylation; epimarker; candidate gene; gene expression

#### **1. Introduction**

In plant species, phenotypic plasticity is the phenomenon in which one genotype displays alternative phenotypes induced by changing environments and epigenetic variation [1–4]. Epigenetic regulation is not based on alterations in DNA sequence [5,6]; rather, it gives rise to 'epialleles', that is, alleles with identical DNA sequences but different levels of gene expression resulting from differences in their epigenetic status [7,8]. Research shows that epigenetic changes are potentially reversible, often exist in metastable states [9,10], and can be passed from one generation to the next via mitosis or meiosis [11–13]. This heritability provides an opportunity to construct epigenetic linkage maps and to perform intraspecific association analysis between epigenetic alleles and traits [14–17].

Cytosine methylation, primarily the addition of a methyl group to the C5 position of a cytosine residue, is one of the best-known epigenetic modifications [18]. It participates in various important biological processes, e.g., repressing the expression of transposons and repetitive elements, responding to biotic/abiotic stress, and taking part in early embryogenesis, stem cell differentiation, X chromosome inactivation, and genomic imprinting [19–24]. Differences in the extent of DNA methylation can give rise to varied gene expression patterns, leading to phenotypic variation [25]; furthermore, specific sites showing different patterns of 5mC (either hypermethylation or demethylation) between individuals or tissues can affect gene transcription, resulting in morphological changes [26–28].

For some time, researchers focused on selected methylated genes and their functions in individual plants, such as the *Arabidopsis SUPERMAN* gene, whose cytosine methylation affects flower development [29]; the *PHOSPHORIBOSYLANTHRANILATE ISOMERASE* (*PAI*) genes, whose cytosine methylation results in insufficient PAI activity [30]; *DEMETER*, which imprints the *MEDEA Polycomb* gene by excising 5-methylcytosine [31]; and the *FLOWERING WAGENINGEN* (*FWA*) transcription factor gene, which is silenced by DNA methylation in vegetative tissues but demethylated in the central cell of the female gametophyte [13,32]. In recent years, epimarkers have been developed to explore epigenetic diversity and structure, and associated genes, on the basis of epialleles detected with the methylation-sensitive amplified polymorphism (MSAP) method in hybrids and wild populations [16,33,34]. Furthermore, these methods led to the notion that germplasm resources carrying epimutations associated with beneficial traits could be selected from plant populations [15–17].

Chinese white poplar (*Populus tomentosa* Carr.), a native tree species with 2*n* = 2*x* = 38 chromosomes (diploid; *n* and *x* denote the gametic and basic chromosome numbers, respectively) that is widely distributed in the Yellow River basin, plays an important role in ecological and environmental protection. The tree is also cultivated for commercial timber and pulpwood production because of its low content of fermentation-inhibiting extractives and its high biomass conversion efficiency [35–37]. It has also been used as a model species for perennial woody plants in physiological and biochemical research, genetic diversity and structure analysis, and the investigation of functional genes involved in lignocellulose biosynthesis and growth processes [38]. Previously, we explored the genetic/epigenetic diversity and structure of *P*. *tomentosa* and performed marker–trait association analysis [39,40]; we also showed that candidate wood-related genes with unique methylation patterns play critical roles in xylem biosynthesis [41]. However, epimarker–trait association analysis and the search for functional genes regulated by methylation status have not yet been performed in natural populations of *P*. *tomentosa*. Therefore, here we investigate the relationship between global DNA cytosine methylation markers (detected using MSAP) and variation in tree growth and wood property traits in order to elucidate the possible epigenetic regulation of xylem formation.

#### **2. Materials and Methods**

#### *2.1. Plant Materials*

Plant xylem materials were harvested from the germplasm bank of *P*. *tomentosa* located in Guanxian County (36°30′54″ N, 115°21′45″ E), Shandong Province, China. The germplasm bank, which includes 1047 genotypes collected from the region of China in which the species is naturally distributed, was established between 1982 and 1984 using root segment propagation techniques [42]. Each genotype was planted with at least three replicates at a spacing of 4 m between rows and 4 m between plants. In the present study, 432 unrelated genotypes (three clones per genotype) originating from Beijing (20), Hebei (114), Shandong (19), Henan (114), Shaanxi (64), Shanxi (80), Gansu (6), Anhui (10), and Jiangsu (5) were randomly selected for phenotype measurements and genotyping. For each of the 1296 trees, we first stripped off the bark (approximately 5 cm × 5 cm) at breast height and cut out xylem samples with a sharp blade. Each xylem sample was divided into two equal parts: one part was rapidly frozen in liquid nitrogen for nucleic acid extraction, and the other was stored in a plastic bag at room temperature for wood property determination.

#### *2.2. Phenotypic Data Collection*

Phenotypic data, including growth traits and wood properties, were collected and then used in statistical analysis. The growth traits tree height (H) and diameter at breast height (DBH) were measured with a hypsometer and by the enclosed ruler (diameter tape) method, respectively. The volume of timber (V) was estimated according to the following equation [43]: $V = \pi \times (DBH/2)^2 \cdot H \cdot f$, with $f = 0.488$.
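The volume equation can be evaluated directly. The text does not state units, so the sketch assumes DBH is recorded in centimetres (converted to metres) and H in metres, giving V in cubic metres; the example tree is hypothetical.

```python
import math

def timber_volume(dbh_cm, height_m, f=0.488):
    """V = pi * (DBH/2)^2 * H * f, with DBH converted from cm to m (assumed units)."""
    dbh_m = dbh_cm / 100.0
    return math.pi * (dbh_m / 2.0) ** 2 * height_m * f

# Hypothetical tree: DBH 30 cm, height 20 m
print(f"{timber_volume(30, 20):.4f} m^3")  # about 0.69 m^3
```

The form factor f scales the volume of a cylinder with the same basal area and height down to account for stem taper.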

The xylem materials stored in plastic bags were taken back to the laboratory and air-dried for determination of wood properties, including fiber length, fiber width, microfibril angle (MFA), and the contents of lignin, holocellulose, and α-cellulose. First, the MFA was measured using an X-ray diffractometer (PANalytical X'Pert Pro, Philips, Eindhoven, Netherlands) with the angle between the incident ray and the receiving optical path set to 22.4°. The main parameter settings were: tube voltage 40 kV; tube current 40 mA; scanning step 0.5°; and angle of rotation 0–360°. The MFA was calculated using the 0.6 T method [44,45]. Second, we cut a small amount of each xylem sample into small pieces and macerated the tissue in a solution of hydrogen peroxide (30%) and glacial acetic acid (1:1, *v*/*v*) at 70 °C for 48 h. For fiber length and fiber width determination, the fibrous material was washed with deionized water, stained with safranin, spread uniformly over a slide, and measured using a computer image analysis system with a color television video camera (VM-60N, Olympus, Japan) according to Hart et al. [46]. Then, the xylem samples were ground into a fine powder and passed through a 40–60 mesh screen, and the contents of lignin, holocellulose, and α-cellulose were determined using the wet chemistry techniques described by Porth et al. [47]. In parallel, near-infrared reflectance (NIR) spectroscopy was used to record the NIR absorption spectra of the powder samples, and calibration models were developed to predict the contents of holocellulose, α-cellulose, and lignin [48].

One-way ANOVA (analysis of variance) was used to test for significant differences (*p* < 0.05) in the phenotypic variables among the nine populations. Additionally, we used Duncan's multiple range test to identify significant differences (*p* < 0.05) between pairs of population means for each phenotypic parameter [49].

#### *2.3. Genotyping by Methylation-Sensitive Amplified Polymorphism*

The frozen xylem materials were ground into a fine powder in porcelain mortars with liquid nitrogen for DNA and RNA extraction. Half of each powder sample was used for DNA isolation by the cetyltrimethylammonium bromide (CTAB) method [50], and DNA was quantified using a NanoVue UV/visible spectrophotometer (GE Healthcare). We processed the DNA for methylation-sensitive amplified polymorphism (MSAP) detection, including double digestion, ligation, pre-amplification and selective amplification, and capillary electrophoresis with fluorescence detection, as described by Ma et al. [40]. Each DNA sample was digested separately with the restriction endonuclease combinations *Eco*RI–*Hpa*II and *Eco*RI–*Msp*I, and the electrophoresis results were analyzed using GeneMarker V1.7.1. Four genotyping patterns were generated (denoted '1,1'; '1,0'; '0,1'; and '0,0') according to the presence ('1') or absence ('0') of the relevant fragment in the final amplification products of the *Eco*RI–*Hpa*II and *Eco*RI–*Msp*I digests, respectively.

#### *2.4. Linear Correlation Analysis and Association Analysis*

Four genotyping patterns were generated by the MSAP procedure, and the percentage of each pattern relative to the total number of bands was defined as the relative level of hemi-methylation ('1,0'), full methylation ('0,1'), non-methylation ('1,1'), or uninformative sites ('0,0'). The total relative methylation level was defined as the sum of the hemi-methylation and full-methylation percentages. Pearson correlation analysis was carried out between relative methylation levels and traits (tree height, diameter at breast height, volume of timber, fiber length, fiber width, MFA, and contents of lignin, holocellulose, and α-cellulose) to investigate the relationship between methylation and phenotype. In addition, the relationship between gene expression level and DNA methylation level was estimated by linear correlation analysis. Raw percentage data were arcsine-square-root transformed before calculation, $x'_{ij} = \arcsin\sqrt{x_{ij}}$, where $x_{ij}$ is the $j$th observed value in the $i$th group.
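The calculations in this paragraph chain together: score each band as one of the four MSAP patterns, convert the pattern counts into relative methylation levels, arcsine-square-root transform those proportions, and correlate them with a trait. The sketch below follows that chain with a hand-written Pearson coefficient; the band calls, methylation levels, and tree heights are hypothetical.

```python
import math
from collections import Counter

# Interpretation of (EcoRI-HpaII, EcoRI-MspI) band presence, as defined in the text
PATTERNS = {(1, 1): "non", (1, 0): "hemi", (0, 1): "full", (0, 0): "uninformative"}

def methylation_levels(calls):
    """Relative methylation levels of one individual from its MSAP band calls."""
    counts = Counter(PATTERNS[call] for call in calls)
    n = len(calls)
    hemi, full = counts["hemi"] / n, counts["full"] / n
    return {"hemi": hemi, "full": full, "non": counts["non"] / n,
            "uninformative": counts["uninformative"] / n, "total": hemi + full}

def arcsine_sqrt(x):
    """x' = arcsin(sqrt(x)) for a proportion 0 <= x <= 1."""
    return math.asin(math.sqrt(x))

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(methylation_levels([(1, 1), (1, 0), (0, 1), (0, 1), (0, 0)]))

# Hypothetical full-methylation levels and tree heights (m) for five trees
full = [0.10, 0.15, 0.20, 0.25, 0.30]
height = [10, 12, 13, 15, 16]
r = pearson_r([arcsine_sqrt(v) for v in full], height)
print(f"r = {r:.3f}")
```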

To identify associations between phenotypic traits and MSAP markers, each epimarker locus (with its four possible patterns) was tested by single-marker association analysis using one-way ANOVA, with significance thresholds of *p* < 0.05 and *p* < 0.01. The contribution of each MSAP marker to trait variation was evaluated as the among-group sum of squared deviations expressed as a percentage of the total phenotypic variation [39]. The structural network of associated epimarkers and traits was constructed using Cytoscape V3.5.1 (https://cytoscape.org/what\_is\_cytoscape.html).
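The single-marker test and the variance-explained statistic can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' code: it assumes that a marker's contribution is the among-group sum of squares divided by the total sum of squares (×100), and it takes the p-value from SciPy's F distribution. `single_marker_anova` is a hypothetical helper.

```python
from collections import defaultdict

from scipy import stats


def single_marker_anova(patterns, trait_values):
    """One-way ANOVA of a trait grouped by the MSAP pattern at one locus.

    patterns: pattern label per individual, e.g. "1,1", "1,0", "0,1", "0,0"
    trait_values: phenotypic value per individual, in the same order
    Returns (F statistic, p-value, percent of total variation among groups).
    Requires at least two pattern groups and some within-group variation.
    """
    groups = defaultdict(list)
    for pattern, value in zip(patterns, trait_values):
        groups[pattern].append(value)

    n, k = len(trait_values), len(groups)
    grand_mean = sum(trait_values) / n
    # Total and among-group sums of squared deviations
    ss_total = sum((y - grand_mean) ** 2 for y in trait_values)
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = ss_total - ss_between

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper tail of F
    return f_stat, p_value, 100.0 * ss_between / ss_total
```

For example, trait values 10, 11, 12 for pattern '1,1' and 14, 15, 16 for pattern '1,0' give F = 24 and about 85.7% (24/28) of the variation among groups; real MSAP loci in this study explained far less, between 1.06% and 7.78%.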

#### *2.5. Candidate Gene Screening and Gene Expression*

The associated MSAP markers were separated from the selective amplification products by electrophoresis on 6% denaturing polyacrylamide gels and detected by silver staining [51]. The candidate marker fragments were then recovered from the gels with the Wizard SV Gel and PCR Clean-Up System (Promega, Madison, WI, USA), and the short PCR fragments were sequenced by Biomed Company (Beijing, China) after transformation and cloning [38]. Sequence homology analysis and function prediction were performed using web databases, including the National Center for Biotechnology Information (NCBI) and the Joint Genome Institute (JGI).

Total RNA was isolated from the powdered xylem (Section 2.3) using the RNeasy Plant Mini Kit (Qiagen, Shanghai, China), and first-strand cDNA synthesized with the Reverse Transcription System (Promega) was used for quantitative real-time PCR (qRT-PCR) as described by Song et al. and Ma et al. [52,53]. Primers were designed according to the functional annotation of the linked genes obtained from the homology sequence alignment. The specificity of each primer pair was checked by sequencing the PCR products, and 13 candidate genes were selected for verification, with *PtACTIN* as the internal control gene [54]. Primers for qRT-PCR were designed using Primer Express 3.0 software (Applied Biosystems) and are listed in Table S1. All reactions were performed in triplicate (technical and biological replicates) for 45 genotypes (135 trees or clones).
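The quantification model behind "expression relative to *PtACTIN*" is not stated. As an assumption only, the sketch below uses the common 2^(−ΔCt) approach, which presumes similar, near-100% amplification efficiency for the target and reference primer pairs; `relative_expression` is a hypothetical name and the Ct values are invented.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct):
    """Expression of a target gene relative to a reference gene as 2^(-dCt).

    target_ct, reference_ct: Ct values of the replicate reactions for one sample.
    Assumes equal, ~100% amplification efficiency for both primer pairs.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for one genotype (target gene vs. PtACTIN)
print(relative_expression([25.1, 24.9, 25.0], [20.0, 20.1, 19.9]))
```

A target with a mean Ct of 25 and a reference with a mean Ct of 20 gives 2^(−5) = 0.03125; a lower value means the target is expressed less than the reference.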

#### **3. Results**

#### *3.1. Variation in Growth and Wood Characteristics*

Tree phenotypes, including growth traits (diameter at breast height, tree height and volume of timber) and wood characteristics (fiber length, fiber width, microfibril angle and contents of lignin, holocellulose, and α-cellulose), were quantified in natural populations planted in the germplasm bank in Guanxian County. In total, phenotypic variation was analyzed in 432 genotypes (1296 clones) collected from nine geographic provenances, a range of genotypes expected to provide substantial material for selective breeding and genetic improvement. The contents of lignin, holocellulose, and α-cellulose were 20.87 ± 2.67% (mean ± standard deviation, SD), 72.59 ± 10.59%, and 40.08 ± 8.78%, respectively. Fiber length and width ranged from 0.866 to 1.512 mm and from 16.984 to 29.850 μm, with means of 1.169 ± 0.085 mm and 23.161 ± 1.973 μm, respectively. The mean microfibril angle was 17.815° ± 4.526°. The commercially important parameters diameter at breast height (21.38 ± 5.67 cm), tree height (14.57 ± 2.88 m), and volume of timber (0.29 ± 0.19 m³) were also measured.
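The coefficient of variation (CV = 100 × SD / mean) summarizes the spread of each trait independently of its unit. The short Python sketch below computes it from the rounded means and SDs reported in this paragraph; because the inputs are rounded summary values, the results are only approximate.

```python
# Means and SDs reported above for the 432 genotypes: (mean, SD)
summary = {
    "lignin (%)": (20.87, 2.67),
    "holocellulose (%)": (72.59, 10.59),
    "alpha-cellulose (%)": (40.08, 8.78),
    "fiber length (mm)": (1.169, 0.085),
    "fiber width (um)": (23.161, 1.973),
    "microfibril angle (deg)": (17.815, 4.526),
    "DBH (cm)": (21.38, 5.67),
    "tree height (m)": (14.57, 2.88),
    "timber volume (m3)": (0.29, 0.19),
}


def coefficient_of_variation(mean, sd):
    """CV (%) = 100 * SD / mean."""
    return 100.0 * sd / mean


cvs = {trait: coefficient_of_variation(m, s) for trait, (m, s) in summary.items()}
for trait, cv in sorted(cvs.items(), key=lambda kv: kv[1]):
    print(f"{trait:28s} CV = {cv:5.2f}%")
```

With these inputs, fiber length has the lowest CV (about 7.3%) and timber volume the highest (about 65.5%), close to the 7.24–66.91% range given in the Discussion, which was presumably computed from the unrounded genotype data.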

We performed one-way ANOVA to investigate the genetic variation in the growth and wood property traits among the nine natural populations (Figure 1). Three wood properties (holocellulose and α-cellulose contents, and fiber width) differed significantly among the populations, as did the growth traits diameter at breast height, tree height, and volume of timber (*p* < 0.05).

**Figure 1.** Differences in phenotypic traits among nine natural populations of *Populus tomentosa*. The *x*- and *y*-axes indicate provenance and phenotypic value, respectively. Bars show mean ± SD (standard deviation); within each panel, different letters above the bars indicate significant differences (*p* < 0.05) according to Duncan's multiple range test, so two means differ significantly only if they share no common letter.

#### *3.2. Linear Correlation between Phenotype and DNA Methylation Levels*

Linear correlation analysis was performed to determine trait–trait phenotypic correlations and trait–methylation level correlations among *P*. *tomentosa* genotypes (Table 1). Lignin content was negatively correlated with cellulose (holocellulose and α-cellulose) content. Three wood and growth parameters (fiber length, timber volume, and diameter at breast height) were significantly correlated with the other traits. For instance, fiber length was positively correlated with fiber width, diameter at breast height, tree height, and timber volume, but negatively correlated with microfibril angle (*p* < 0.01). Timber volume was positively correlated with lignin content, diameter at breast height, and tree height, but negatively correlated with α-cellulose content (*p* < 0.01). These results suggest that individuals with long fibers and large stem volume could be selected on the basis of diameter at breast height and tree height.

We also analyzed the relationships between phenotypic traits and relative methylation levels estimated from 2393 polymorphic MSAP markers (epimarkers) out of 2408 bands in our previous data [40]. The relative total methylation and non-methylation levels were 26.55% and 42.71%, respectively, and the relative hemi-methylation level (13.47%) was higher than the full-methylation level (13.10%) (*p* < 0.001) [40]. Here, tree height was significantly negatively correlated with the relative non-methylation level but positively correlated with the relative full-methylation level (*p* < 0.05) (Table 1).

#### *3.3. MSAP Markers Associated with Phenotypic Traits within the Populations*

Although we investigated the linear correlations between phenotypic traits and relative methylation levels, the relationship between the traits and each individual polymorphic epimarker remained unclear. On the basis of single-marker ANOVA, 1101 (630 trait-specific and 471 multi-function epimarkers) of the 2393 polymorphic MSAP markers were associated with the nine phenotypic traits (*p* < 0.05). We constructed structural networks (Figure 2) to illustrate these relationships: 125 (77 trait-specific), 127 (66 trait-specific), and 72 (30 trait-specific) epimarkers, explaining 1.19% (*p* = 0.034) to 4.69% (*p* < 0.001) of the variation, were associated with the contents of lignin, holocellulose, and α-cellulose, respectively (Table S2). Similarly, 1.06% (*p* = 0.043) to 7.15% (*p* < 0.001) of the variation in fiber length, fiber width, and microfibril angle was explained by 190 (91 trait-specific), 134 (75 trait-specific), and 140 (64 trait-specific) associated epimarkers, respectively. Finally, 312 (281 trait-specific), 414 (254 trait-specific), and 345 (309 trait-specific) epimarkers were associated with diameter at breast height, tree height, and volume of timber, respectively, explaining 1.13% (*p* = 0.035) to 7.78% (*p* < 0.001) of the variation in those traits (Figure 2, Table S2). These results suggest that the epimarkers might be involved in regulating complex quantitative traits, including wood properties and tree growth.
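The distinction between trait-specific and multi-function epimarkers, and the edge list behind a network such as Figure 2, can be sketched as below. This is a hypothetical illustration: the marker IDs are invented, and the edges are written in Cytoscape's simple interaction format (SIF), one `source<TAB>interaction<TAB>target` line per edge, which is one of several formats Cytoscape can import; the paper does not say which input format was used.

```python
from collections import defaultdict


def classify_epimarkers(associations):
    """Split significant epimarkers into trait-specific and multi-function sets.

    associations: dict mapping a trait abbreviation (e.g. "H", "DBH") to the set
    of marker IDs significantly associated with it.
    """
    traits_per_marker = defaultdict(set)
    for trait, markers in associations.items():
        for marker in markers:
            traits_per_marker[marker].add(trait)
    specific = {m for m, traits in traits_per_marker.items() if len(traits) == 1}
    multi = {m for m, traits in traits_per_marker.items() if len(traits) > 1}
    return specific, multi


def to_sif(associations, interaction="assoc"):
    """Return trait-marker edges as SIF lines: source<TAB>interaction<TAB>target."""
    lines = []
    for trait in sorted(associations):
        for marker in sorted(associations[trait]):
            lines.append(f"{trait}\t{interaction}\t{marker}")
    return "\n".join(lines)


# Toy example with hypothetical marker IDs
example = {"H": {"MSAP-1", "MSAP-2"}, "DBH": {"MSAP-2", "MSAP-3"}}
specific, multi = classify_epimarkers(example)
print(sorted(specific), sorted(multi))
print(to_sif(example))
```

A marker linked to a single trait is counted as trait-specific; a marker linked to two or more traits is counted once as multi-function, even though it contributes several edges to the network.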



Percentage data were transformed as $x'_{ij} = \arcsin\sqrt{x_{ij}}$ before calculation; \* *p* < 0.05 (two-tailed), \*\* *p* < 0.01 (two-tailed).

**Figure 2.** Relationships between all significantly associated epimarkers and phenotypic traits, represented as a structural network. The square nodes in the innermost circle represent fiber length (FL), fiber width (FW), microfibril angle (MFA), diameter at breast height (DBH), tree height (H), volume of timber (V), and the contents of lignin (L), holocellulose (HC) and α-cellulose (α-). Red nodes around the central circle represent multi-function associated epimarkers, and red nodes in the outermost circle represent trait-specific associated epimarkers. Green lines connecting traits with epimarkers represent the phenotypic variation explained by the associated epimarkers (*p* < 0.05).

#### *3.4. Sequencing and Functional Prediction by Homology Alignment*

To investigate the functions of genes linked to candidate epimarkers, the marker fragments were separated by denaturing polyacrylamide gel electrophoresis, visualized by silver staining, recovered from the gels, and transformed and cloned before sequencing. Of 180 randomly selected MSAP markers, 116 epimarker fragments were successfully sequenced (NCBI accession nos. MN757649–MN757764; Data S1). We aligned the sequences to the reference genome of *P. trichocarpa* and identified 96 markers linked to putative functional genes. The linked genes were predicted to encode, for example, cytochrome P450 family proteins (MSAP-602), MYB and WRKY transcription factors (MSAP-1222, MSAP-2105), a glycosyl hydrolase family 1 protein (MSAP-2313), and proteins involved in ATPase activity regulation (MSAP-803, MSAP-811) and transmembrane transport (MSAP-1250), among other functions (Table S3). These results suggest that the genes linked to epimarkers may be involved in plant development, transcriptional regulation, and energy metabolism in *P*. *tomentosa*.
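The alignment tool is not named in the text. Assuming a BLAST search with tabular output (`-outfmt 6`, whose twelve default columns end with the e-value and bit score), selecting the top hit per epimarker might look like the sketch below. The e-value cutoff of 1e-5 and the `best_hits` helper are assumptions, and the subject IDs in any test data are placeholders.

```python
def best_hits(blast_tab_lines, max_evalue=1e-5):
    """Keep the highest-bit-score hit per query from BLAST tabular (-outfmt 6) output.

    Default columns: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    Returns {query: (subject, evalue, bitscore)}.
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit too weak to suggest homology
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Ranking by bit score rather than percent identity favors longer, well-supported alignments, which matters for short MSAP fragments that can match many loci partially.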

#### *3.5. Quantitative Expression of Candidate Genes*

To examine whether the expression of putative genes was influenced by DNA methylation, we investigated the relationship between gene expression and relative DNA methylation levels. Thirteen candidate genes were randomly selected for verification by qRT-PCR in 45 genotypes of *P. tomentosa*. The genes *Potri.002G228400*, *Potri.010G254400*, and *Potri.018G070900* were expressed at high levels (Figure 3a). The expression levels of *Potri.001G223800*, *Potri.002G228400*, *Potri.004G133800*, *Potri.005G091700*, and *Potri.018G070900* were positively correlated with the relative non-methylation level (Figure 3b, in red), whereas the expression levels of *Potri.001G223800*, *Potri.005G091700*, and *Potri.018G070900* were negatively correlated with the relative hemi-methylation, full-methylation, and total-methylation levels, respectively. Negative correlations were also detected between the expression of some of the other genes and DNA methylation levels in different patterns (Figure 3b, Table S4). Homology analysis showed that the nine correlated putative genes were *PtCYTOCHROME P450* (*PtCYP450*, *Potri.018G070900*), *PtCHAPERONES* (*PtCpn60*, *Potri.004G133800*), *PtPECTINMETHYLESTERASE* (*PtPME*, *Potri.015G127500*), *PtSERINECARBOXYPEPTIDASE* (*PtSCP*, *Potri.005G091700*), *PtGLYCOSIDE HYDROLASE* (*PtGH*, *Potri.001G223800*), *PtMYB* (*Potri.015G075800*), *PtWRKY* (*Potri.002G228400*), *PtSUGAR TRANSPORT PROTEIN* (*PtSTP*, *Potri.002G096000*), and *PtATP*-*BINDING CASSETTE TRANSPORTER* (*PtABC*, *Potri.010G254400*) (Table S3).
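The per-gene correlations summarized in Figure 3b can be sketched with SciPy's `pearsonr`, which returns the correlation coefficient and a two-tailed p-value. The function name and the input layout (one expression value and one methylation level per genotype, aligned by position) are assumptions for illustration.

```python
from scipy import stats


def correlate_expression_with_methylation(expression, methylation_levels):
    """Pearson r and two-tailed p between one gene's expression and each methylation level.

    expression: relative expression values, one per genotype
    methylation_levels: e.g. {"Non": [...], "Hemi": [...], "Full": [...], "Total": [...]},
        each list aligned with `expression` (transformed percentages)
    """
    results = {}
    for pattern, values in methylation_levels.items():
        r, p = stats.pearsonr(expression, values)
        results[pattern] = (float(r), float(p))
    return results
```

With perfectly proportional toy data the function returns r = 1 (or r = −1 for an inversely ordered series); real input would contain 45 values per gene, one per genotype.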

**Figure 3.** Expression levels of candidate genes and their correlations with relative DNA methylation levels. (**a**) Gene expression level determined by quantitative real-time polymerase chain reaction (qRT-PCR). The *x*- and *y*-axes indicate, respectively, gene name and expression level relative to *PtACTIN*. (**b**) Linear correlations between gene expression levels and relative DNA methylation levels. Non-, Hemi-, Full-, and Total- indicate relative non-methylation, hemi-methylation, full-methylation, and total-methylation levels, respectively, estimated by the Pearson correlation coefficient (*r*, red-black-green color scale from positive to negative correlation). The white number (*p*-value) in each cell indicates the significance of difference.

On the basis of previous research, *Arabidopsis* CYP450 Reductase 2 (ATR2), which supplies electrons from NADPH to a large number of CYP450s [55], appears to be induced during lignin biosynthesis and under stress [56]. Cpn60 chaperonins are large double-ring assemblies that assist in the folding of key proteins in plant growth and development [57–59]. The roles of PME have not been fully elucidated; however, this enzyme may be involved in cell elongation by modifying cell wall pectin [60]. *PMEs* have long been proposed to regulate growth, including stem growth [61,62], and *PttPME1* was shown to be involved in determining fiber width and length in the wood cells of aspen trees [63]. The serine carboxypeptidase (SCP) family, also called the SCP-like (SCPL) family, plays a key role in plant growth, development and stress responses. Transgenic *Nicotiana tabacum* plants over-expressing *NtSCP1* show reduced cell elongation [64]. In *Triticum aestivum*, SCP regulates cell death during vascular tissue development [65]. Glycosyl hydrolase (GH) proteins are broadly distributed in organisms, and β-glucosidases belonging to GH family 1 have been implicated in several fundamental processes, including lignification [66,67].

MYB and WRKY are two transcription factor families involved in most biological processes. In *Oryza sativa*, MYB61, which is regulated by NAC29/31, binds to *CELLULOSE SYNTHASE* genes and activates their expression during secondary wall cellulose synthesis [68]. Similarly, *PtrMYB152* is over-expressed in secondary wall-forming cells, resulting in the specific activation of lignin biosynthetic genes in *P*. *trichocarpa* [69]. PtrWRKY19 and VvWRKY2 have also been reported to function as regulators of pith secondary wall formation in poplar and lignification in grapevine, respectively [70,71].

Sugar transport proteins (STPs) and ATP-binding cassette (ABC) transporters are important for transmembrane transport, an essential biological process in cells. STPs are high-affinity hexose transporters with sink-specific tissue expression [72–74]; constitutive over-expression of *STP13* resulted in seedlings with increased biomass when grown on media supplemented with sugar [75]. In *Manihot esculenta*, *MeSTPs* were suggested to play a role in early tuber growth, the period during which these genes were mainly expressed [76]. A breakthrough study demonstrated that an ABC transporter pumps the monolignol *p*-coumaryl alcohol, one of the three main monolignols synthesized in the cytosol, across the plasma membrane [77,78]. It was also reported that stem vascular morphology was slightly disorganized in *abcb14-1* mutants, with decreased phloem area in the vascular bundle and decreased xylem vessel lumen diameter [79]. Taken together, these findings suggest that cytosine methylation might take part in regulating the expression of the putative genes and thereby affect growth and wood property traits.

#### **4. Discussion**

Growth traits and wood properties, which determine commercial value and potential for energy production, are the most important phenotypes of commercial timber. *P*. *tomentosa* is a native species used mainly for timber across vast regions of North China; thus, it is imperative to investigate the mechanisms of growth regulation and wood formation to underpin the selection of germplasm resources and breeding for genetic improvement. In this paper, we measured nine phenotypic traits (diameter at breast height, tree height, and volume of timber; contents of lignin, holocellulose, and α-cellulose; fiber length, fiber width, and microfibril angle) in 1296 plants (432 genotypes) from natural tree populations. We then explored the relationships between these traits and variation in DNA methylation detected by MSAP, together with candidate gene expression measured by qRT-PCR.

Across genotypes, the coefficient of variation of the traits ranged from 7.24% to 66.91%, and the growth and wood property traits showed significant variation among the nine natural populations, with the exception of lignin content, fiber length and microfibril angle. This phenotypic variation, or plasticity, might be related to genomic DNA methylation, which plays an important role in genome stabilization and transposable element repression [19,22]. Previous evidence has shown that methylation level is correlated with phenotype [80]. For instance, a negative correlation was detected between genomic methylation level and gene expression in maize; similarly, negative correlations were reported between methylation level and both energy use efficiency and crop yield in *Brassica napus* [80,81]. However, in the woody ornamental plant mei, leaf length, width, and area were positively correlated with relative full and total methylation levels [17]. In poplar hybrids, a positive correlation was demonstrated between DNA methylation percentage and productivity [82]. We previously showed that net photosynthetic rate, tree height and diameter at breast height were positively correlated with relative total methylation and hemi-methylation levels in an intraspecific hybrid population of *P*. *tomentosa* [39]; here, we found a similar result among the natural populations, in which the relative full-methylation level was positively correlated with tree height.

Previous studies have shown that methylation analysis is valuable for gene discovery and epimarker-assisted analysis in plant populations [15,17,34]. For instance, *PtHT1.1*, *PtHT1.2*, *PtPsbK*, *PtPIN1.2*, *PtMYB60*, and *PtMYB61* were found to be modified by DNA methylation and are thought to play roles in leaf formation and regulation of photosynthesis in *P*. *simonii* [34]. In our research, we detected 1101 epimarkers associated with growth and wood property traits. From these, we identified 96 epimarker-linked putative genes, 13 of which were randomly selected for qRT-PCR analysis; the expression of *PtCYP450*, *PtCpn60*, *PtPME*, *PtSCP*, *PtGH*, *PtMYB*, *PtWRKY*, *PtSTP*, and *PtABC* was negatively correlated with relative DNA methylation levels. *PttPME1* has been shown to be involved in determining fiber width and length in the wood cells of aspen trees [63]. The transcription factor gene *PtrMYB152* is over-expressed in secondary wall-forming cells, resulting in the specific activation of lignin biosynthetic genes, and PtrWRKY19 may function as a regulator of pith secondary wall formation in *P*. *trichocarpa* [69,71]. In *Arabidopsis*, the *abcb14-1* mutation decreases the phloem area in the vascular bundle and the xylem vessel lumen diameter [79]. This evidence suggests that genes involved in regulating growth and wood development in *P*. *tomentosa* might be affected by cytosine methylation. Once validated, these epimarkers may also provide new insight for breeding programs in this commercially important tree species.

#### **5. Conclusions**

Quantitative growth and wood property traits are important for commercial timber tree species. The objective of the present study was to reveal the relationships between tree growth, wood property traits, and cytosine methylation. Tree height was positively correlated with the relative full-methylation level. In addition, 1101 trait-specific and multi-function MSAP markers, explaining 1.1–7.8% of the phenotypic variation, were associated with growth traits (diameter at breast height, tree height, and volume of timber) and wood characteristics (fiber length, fiber width, microfibril angle, and contents of lignin, holocellulose, and α-cellulose). Of the 116 successfully sequenced epimarkers, 96 were linked to putative genes. The expression levels of nine putative genes (*PtCYP450*, *PtCpn60*, *PtPME*, *PtSCP*, *PtGH*, *PtMYB*, *PtWRKY*, *PtSTP*, and *PtABC*) were negatively correlated with DNA methylation. Our results imply that widespread natural variation in DNA methylation might contribute to regulating tree growth and wood formation, and the findings will enhance our understanding of epigenetics in tree growth and xylem formation.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/8/828/s1, Table S1: Primer sequences for Real-time PCR, Table S2: MSAP markers associated with phenotypic traits and the variation explanation in natural population of *Populus tomentosa*, Table S3: Homologous alignment and gene function analysis, Table S4: The gene expression levels detected using qRT-PCR and relative cytosine methylation levels within 45 genotypes of *Populus tomentosa*. Data S1: Sequences for the 116 MSAP markers.

**Author Contributions:** D.Z. (Deqiang Zhang) designed the experiments. K.M., Y.S., and D.C. collected the plant materials and phenotypic data. K.M., D.Z. (Daling Zhou), and M.T. performed the gene cloning and gene expression quantification experiments. K.M. detected the molecular markers and analyzed all of the data. K.M. and D.Z. (Deqiang Zhang) wrote the manuscript. Y.S., D.C., D.Z. (Daling Zhou), and M.T. provided suggestions for manuscript revision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (Nos. 31872671 and 31670333), and the 111 Project (No. B20050).

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sequencing Data:** All of the sequences were submitted to NCBI (https://www.ncbi.nlm.nih.gov/) with accession numbers: MN757649–MN757764.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Preliminary Evidence for Domestication Effects on the Genetic Diversity of** *Guazuma crinita* **in the Peruvian Amazon**

### **Lady Laura Tuisima-Coral <sup>1,2,\*</sup>, Petra Hlásná Čepková <sup>3</sup>, John C. Weber <sup>4</sup> and Bohdan Lojka <sup>1,\*</sup>**


Received: 30 June 2020; Accepted: 20 July 2020; Published: 23 July 2020

**Abstract:** *Guazuma crinita*, a fast-growing timber tree species, was chosen for domestication in the Peruvian Amazon because it can be harvested at an early age and contributes to the livelihood of local farmers. Although the species is at an early stage of domestication, the impact of the domestication process on its genetic resources is unknown. Amplified fragment length polymorphism (AFLP) fingerprints were used to estimate the genetic diversity of *G. crinita* populations at different stages of domestication. Our objectives were (i) to estimate the level of genetic diversity in *G. crinita* using AFLP markers, (ii) to describe how the genetic diversity is distributed within and among populations and provenances, and (iii) to assess the genetic diversity in naturally regenerated, cultivated and semi-domesticated populations. We generated fingerprints for 58 leaf samples representing eight provenances and the three population types, using seven selective primer combinations. A total of 171 fragments were amplified, with 99.4% polymorphism at the species level. Nei's genetic diversity and Shannon's information index were slightly higher in the naturally regenerated population than in the cultivated and semi-domesticated populations (*He* = 0.10, 0.09 and 0.09; *I* = 0.19, 0.15 and 0.16, respectively). Analysis of molecular variance showed more genetic variation within provenances than among provenances (84% and 4%, respectively). Cluster analysis (unweighted pair group method with arithmetic mean) and principal coordinate analysis did not show correspondence between genetic and geographic distance. There was significant genetic differentiation among population types (Fst = 0.12, *p* < 0.001). The sample size was small, so the results are considered preliminary, pending further research with larger sample sizes. Nevertheless, these results suggest that domestication has a slight but significant effect on the diversity levels of *G. crinita*, and this should be considered when planning a domestication program.

**Keywords:** genetic diversity; genetic differentiation; natural regeneration; cultivated population; semi-domesticated population

#### **1. Introduction**

Tropical forests provide many valuable products, including rubber, fruits and nuts, medicinal herbs, lumber, firewood, and charcoal [1]. Natural forest populations typically possess considerable genetic variation [2]. However, deforestation due to slash-and-burn agriculture [3], over-harvesting and other unsustainable forestry practices are reducing tree genetic diversity in many areas in the tropics [1]. Tree domestication has been promoted as a strategy to conserve genetic resources for tropical species [1,4]. Domestication involves the selection and propagation of desirable trees, so we expect that genetic diversity is lower in domesticated populations compared with natural populations. However, very few studies have assessed the difference in genetic diversity between domesticated and natural forest tree populations in the tropics [5,6]. This information is necessary for planning tree domestication strategies that maintain high levels of genetic diversity in the domesticated population.

*Guazuma crinita* Mart. (Malvaceae) was identified as a priority timber species for tree domestication in the Peruvian Amazon [7]. It is a pioneer species in the Amazon basin of Peru, Ecuador and Brazil [8]. It can be intercropped with food crops because it has a small crown with thin branches, and the older branches naturally self-prune. It provides wood products at an early age, can be coppiced for successive harvests and contributes significantly to farmers' income [9,10]. Owing to its fast initial growth (up to 3 m per year), it has been promoted in reforestation programs and agroforestry systems [11,12]. In addition, it can be vegetatively propagated for commercial purposes [10]. It has promising national and international markets for lumber products [10,13].

*G. crinita* is a cross-pollinated species that produces fruit at an early age and can potentially disperse seeds over long distances by both wind and water, allowing it to colonize forest gaps and potentially form dense stands of natural regeneration [14]. These dispersal characteristics should produce extensive gene flow among populations, resulting in high levels of genetic diversity within populations and relatively low genetic differentiation among populations [15–17]. Local farmers manage *G. crinita* in its natural ecological niche for timber while also engaging in other agricultural activities.

The objectives of this study were (i) to estimate the level of genetic diversity in *G. crinita* using AFLP (amplified fragment length polymorphism) markers, (ii) to describe how the genetic diversity is distributed within and among populations and provenances, and (iii) to assess the genetic diversity in naturally regenerated, cultivated and semi-domesticated populations. Based on the reproductive characteristics of this species, we hypothesized that there would be (i) a high level of polymorphism within populations and provenances, (ii) relatively little genetic differentiation among populations, and (iii) greater genetic diversity in the naturally regenerated population compared with the cultivated and semi-domesticated populations.
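Objective (i) relies on standard diversity statistics for dominant markers, such as the percentage of polymorphic fragments, Nei's gene diversity (*He*) and Shannon's information index (*I*) reported in the Abstract. The sketch below shows one common way to compute them from a 0/1 band matrix, assuming Hardy–Weinberg equilibrium so that the band-absent allele frequency is q = √(1 − band frequency). It is an illustration, not the software used in the study, and `aflp_diversity` is a hypothetical name.

```python
import math


def aflp_diversity(band_matrix):
    """Percentage of polymorphic loci, mean Nei's He and mean Shannon's I.

    band_matrix: one list of 0/1 band scores per individual, same loci order.
    Dominant-marker allele frequencies assume Hardy-Weinberg equilibrium.
    """
    n_ind = len(band_matrix)
    n_loci = len(band_matrix[0])
    polymorphic, he_sum, i_sum = 0, 0.0, 0.0
    for locus in range(n_loci):
        presence = sum(ind[locus] for ind in band_matrix) / n_ind
        if 0 < presence < 1:
            polymorphic += 1
        q = math.sqrt(1.0 - presence)  # frequency of the band-absent allele
        p = 1.0 - q
        he_sum += 1.0 - (p * p + q * q)  # Nei's gene diversity at this locus
        i_sum -= sum(f * math.log(f) for f in (p, q) if f > 0)  # Shannon's I
    return 100.0 * polymorphic / n_loci, he_sum / n_loci, i_sum / n_loci
```

The percentage of polymorphic fragments is simply polymorphic/total × 100; for the 171 fragments in this study, 99.4% corresponds to 170 polymorphic fragments (170/171 ≈ 99.42%).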

#### **2. Materials and Methods**

#### *2.1. Sampling*

In this study, we analyzed the genetic diversity of *G. crinita* populations at three stages of tree domestication. The natural population consisted of wild, naturally regenerated trees from one provenance that farmers retained in their fields. For the cultivated population, we sampled trees from one provenance that farmers had planted in a home garden using seedlings produced in a home nursery. The semi-domesticated population included trees from six provenances in a clonal garden. Genotypes in the clonal garden were selected over a period of years from progeny trees originating from an extensive collection of 200 mother trees. The natural, cultivated and semi-domesticated populations correspond to the second, fourth and sixth stages, respectively, of the seven stages of domestication proposed by Vodouhe and Dansi [18].

A total of 84 individuals from the three population types (natural, cultivated and semi-domesticated) were sampled from eight *G. crinita* provenances in the Peruvian Amazon (Figure 1). Thirty individuals were randomly sampled from a naturally regenerated population in the village of Nuevo Piura, Campo Verde district, Ucayali region (150 m a.s.l.). In the city of Tingo Maria, Huanuco region, 30 cultivated individuals were sampled in a home garden (564 m a.s.l.). In addition, 24 vegetatively propagated trees were sampled from a clonal multiplication garden at the Peruvian Amazon Research Institute (IIAP), located 12.4 km from Pucallpa, Ucayali region (154 m a.s.l.). These trees represent selected genotypes from six provenances in two watersheds of the Peruvian Amazon. Young leaf tissue was collected from individual plants and dried in silica gel for DNA extraction.

**Figure 1.** Map of the geographic distribution of eight *Guazuma crinita* provenances sampled in Peru. NP = Nuevo Piura—naturally regenerated population. TM = Tingo Maria—cultivated population. NR = Nueva Requena, SA = San Alejandro, PI = Puerto Inca, CU = Curimana, MA = Macuya and TS = Tahuayo—semi-domesticated populations.

The sample size in this study was small, so we consider the results preliminary. Other studies of genetic diversity in tropical tree species have also used small sample sizes [17,19–22] and reported genetic diversity patterns consistent with those of studies based on large sample sizes.

#### *2.2. DNA Extraction*

DNA from the 84 leaf samples was extracted using the CTAB (cetyltrimethylammonium bromide) method [23] with a slight modification (addition of a trace of polyvinylpyrrolidone (PVP) and 5 μL of RNase). DNA quality was assessed by 0.8% agarose gel electrophoresis and with a NanoDrop spectrophotometer (Thermo Scientific, Delaware, USA). Genomic DNA of sufficient quality for amplification was obtained from only 58 of the 84 samples (Table 1). The DNA was diluted to 50 ng/μL and stored at −20 °C.

**Table 1.** Origin of the 58 *G. crinita* individuals analysed with amplified fragment length polymorphism (AFLP) markers.


<sup>1</sup> Samples cultivated in a home garden; <sup>2</sup> genotypes established in a clonal multiplication garden.

#### *2.3. AFLP Amplification*

AFLP markers were used because they require no prior genome information and allow a large number of polymorphic loci to be analysed simultaneously [24]. We therefore expected AFLP to be suitable for assessing the genetic relationships among *G. crinita* populations in the Peruvian Amazon.

Techniques for the AFLP analysis of *G. crinita* were adapted from those described by Vos et al. [25]. Commercial AFLP kits (Stratec Molecular, Berlin, Germany) were used for the restriction, ligation and pre-amplification steps.

An AFLP Core Plant Reagent Kit I (Stratec Molecular, Berlin, Germany) was used for restriction and ligation. The 5 μL restriction reaction contained 1 μL of 5× reaction buffer (50 mM Tris-HCl (pH 7.5), 50 mM Mg-acetate, 250 mM K-acetate); 0.4 μL of EcoRI/MseI enzyme mixture (1.25 U/μL each in 10 mM Tris-HCl (pH 7.4), 50 mM NaCl, 0.1 mM EDTA, 1 mM DTT, 0.1 mg/mL BSA, 50% glycerol (*v*/*v*), 0.1% Triton® X-100); 1.1 μL of sterile water; and 2.5 μL of DNA (50 ng/μL). The reaction was mixed and incubated in a thermocycler at 37 °C for 2 h. For adapter ligation, 4.8 μL of Adapter/Ligation Solution (EcoRI/MseI adapters, 0.4 mM ATP, 10 mM Tris-HCl (pH 7.5), 10 mM Mg-acetate, 50 mM K-acetate) and 0.2 μL of T4 DNA ligase (1 U/μL in 10 mM Tris-HCl (pH 7.5), 1 mM DTT, 50 mM KCl, 50% (*v*/*v*) glycerol) were added to the microtube containing the restriction products, and the reaction was incubated at 37 °C for 2 h.
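Because every restriction reaction uses the same per-sample volumes, a pipetting plan can be derived arithmetically. The sketch below checks that the listed components add up to the stated 5 μL and scales the shared components (everything except the template DNA) to a master mix. The 10% pipetting excess and the `master_mix` helper are assumptions for illustration, not part of the described protocol.

```python
# Per-reaction volumes (uL) of the 5 uL restriction reaction described above
RESTRICTION = {
    "5x reaction buffer": 1.0,
    "EcoRI/MseI enzyme mix": 0.4,
    "sterile water": 1.1,
    "DNA (50 ng/uL)": 2.5,
}


def master_mix(per_reaction, n_samples, excess=0.10, exclude=("DNA (50 ng/uL)",)):
    """Volumes (uL) of shared components for n_samples reactions plus an excess.

    Components listed in `exclude` (the template DNA) are pipetted per tube instead.
    """
    factor = n_samples * (1.0 + excess)
    return {name: round(volume * factor, 2)
            for name, volume in per_reaction.items() if name not in exclude}


total = sum(RESTRICTION.values())
print(f"reaction volume: {total:.1f} uL")  # the components should add up to 5 uL
print(master_mix(RESTRICTION, 58))
```

For the 58 samples retained for AFLP, this gives about 63.8 μL of buffer, 25.52 μL of enzyme mix and 70.18 μL of water, after which 2.5 μL of DNA is added to each tube.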

For pre-amplification, we used AFLP Pre-Amp Mix I (Stratec Molecular, Berlin, Germany). Each reaction contained 4.0 μL of pre-amplification mix, 0.5 μL of 10× buffer for RedTaq polymerase (100 mM Tris-HCl (pH 8.3), 500 mM KCl, 11 mM MgCl2 and 0.1% gelatin) (Sigma-Aldrich, Saint Louis, USA), 0.1 μL of RedTaq polymerase (Sigma-Aldrich, Saint Louis, USA) and 0.5 μL of the restriction and ligation product. The pre-amplification PCR profile was an initial step at 72 °C for 2 min, followed by 20 cycles of 94 °C for 10 s, 56 °C for 30 s and 72 °C for 2 min, and a final elongation at 60 °C for 30 min. The product was visualized on a 1.8% agarose gel in TBE buffer and then diluted by adding 15 μL of ddH2O.

Selective amplification was performed following the protocol described by Mikulášková et al. [26], with slight modifications, in a total volume of 9.8 μL comprising 2.3 μL of pre-amplified DNA, 5.1 μL of ddH2O, 1 μL of 10× polymerase buffer (100 mM Tris-HCl (pH 8.3), 500 mM KCl, 11 mM MgCl2 and 0.1% gelatin) (Sigma-Aldrich, Saint Louis, USA), 0.2 mM dNTPs (Thermo Scientific, USA), 0.5 pmol of fluorescent dye-labelled EcoRI primer (Applied Biosystems, Foster City, California, USA), 0.5 pmol of MseI primer (Generi Biotech, Hradec Králové, Czech Republic) and 0.2 U of RedTaq DNA polymerase (Sigma-Aldrich, Saint Louis, USA). Selective PCR amplification used the following cycle profile: 92 °C for 2 min, 65 °C for 30 s and 72 °C for 2 min, followed by eight touchdown cycles of 94 °C for 1 s, 64 °C for 30 s (decreasing by 1 °C in each cycle) and 72 °C for 60 s; then 23 cycles of 94 °C for 1 s, 56 °C for 30 s and 72 °C for 2 min; and a final elongation at 60 °C for 30 min.
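The selective-amplification program, including its touchdown phase, can be written out cycle by cycle to check that the annealing temperatures step down cleanly. The sketch below is a hypothetical representation (the `Step` class and `selective_profile` function are ours, not part of any thermocycler software); it assumes the opening 92/65/72 °C profile is run once, which the text implies but does not state, and it ignores ramp times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in °C
    seconds: int   # hold time at that temperature


def selective_profile() -> list[list[Step]]:
    """Selective-amplification program as a list of cycles, each a list of steps."""
    cycles: list[list[Step]] = []
    # Opening cycle as listed in the text (assumed to be run once)
    cycles.append([Step(92, 120), Step(65, 30), Step(72, 120)])
    # Touchdown: 8 cycles, annealing starts at 64 °C and falls by 1 °C per cycle
    for i in range(8):
        cycles.append([Step(94, 1), Step(64 - i, 30), Step(72, 60)])
    # 23 cycles at a fixed 56 °C annealing temperature
    for _ in range(23):
        cycles.append([Step(94, 1), Step(56, 30), Step(72, 120)])
    # Final elongation, 30 min at 60 °C
    cycles.append([Step(60, 30 * 60)])
    return cycles


if __name__ == "__main__":
    profile = selective_profile()
    touchdown = [cycle[1].temp_c for cycle in profile[1:9]]
    hold_s = sum(step.seconds for cycle in profile for step in cycle)
    print("touchdown annealing temperatures:", touchdown)
    print("total hold time (min, ramps excluded):", round(hold_s / 60, 1))
```

The touchdown cycles anneal at 64 down to 57 °C, so they hand over directly to the 56 °C cycles; the summed hold time is about 104.5 min before ramping is added.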

Eleven primer combinations were tested, but only seven were selected for the final analysis because they produced distinct polymorphic bands. All PCR amplifications were run on a T100™ Thermal Cycler (Bio-Rad Laboratories, California, USA). The final products of selective amplification were visualized on 1.8% agarose gels buffered in 1× TBE. After successful amplification, the AFLP products were prepared for analysis on a 3500 Genetic Analyzer automated sequencer (Applied Biosystems, Foster City, California, USA). Ten percent of the samples were analyzed twice to estimate the error rate.

#### *2.4. Data Analysis*

AFLP fragments were analyzed using GeneMarker v2.0.2 (SoftGenetics, USA). Strong polymorphic peaks were scored as present or absent and converted into a binary matrix. These data were used to calculate the percentage of polymorphic fragments, Nei's gene diversity (*He*) and Shannon's information index (*I*) using POPGENE v1.32 [27].
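Because AFLP markers are dominant, programs such as POPGENE typically estimate allele frequencies from band frequencies by assuming Hardy–Weinberg equilibrium. The sketch below illustrates one common way of obtaining the percentage of polymorphic fragments, *He* and *I* from the binary matrix under that assumption. It is not POPGENE's code, and POPGENE's exact estimators may differ; the function name and the toy matrix are hypothetical.

```python
import numpy as np


def dominant_diversity(matrix) -> dict[str, float]:
    """PPF, Nei's gene diversity (He) and Shannon's index (I) from a 0/1 band matrix.

    Rows are individuals, columns are loci (1 = band present). Allele frequencies
    are estimated assuming Hardy-Weinberg equilibrium: the null-allele frequency is
    q = sqrt(proportion of individuals lacking the band) and p = 1 - q.
    """
    m = np.asarray(matrix, dtype=float)
    q = np.sqrt(1.0 - m.mean(axis=0))       # null (band-absent) allele frequency
    p = 1.0 - q                             # band-present allele frequency
    # A locus is polymorphic if the band is neither fixed nor absent in the sample
    polymorphic = (m.min(axis=0) == 0) & (m.max(axis=0) == 1)
    he = 1.0 - p**2 - q**2                  # Nei's gene diversity per locus
    with np.errstate(divide="ignore", invalid="ignore"):
        # 0 * log(0) is taken as 0
        plogp = np.where(p > 0, p * np.log(p), 0.0)
        qlogq = np.where(q > 0, q * np.log(q), 0.0)
    shannon = -(plogp + qlogq)              # Shannon's information index per locus
    return {
        "PPF": float(100.0 * polymorphic.mean()),
        "He": float(he.mean()),
        "I": float(shannon.mean()),
    }


# Toy matrix: 4 individuals x 2 loci (the first locus is monomorphic)
toy = np.array([[1, 1], [1, 1], [1, 0], [1, 0]])
print(dominant_diversity(toy))
```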

Analysis of molecular variance (AMOVA) was carried out with GenAlEx v6 [28] to partition genetic variation within and among populations and to estimate genetic differentiation indices. Principal coordinate analysis (PCoA), also in GenAlEx v6, was used to assess genetic relationships among samples.
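PCoA projects a matrix of pairwise distances onto a few orthogonal axes that preserve as much of the distance structure as possible. A minimal sketch of classical (Gower) principal coordinate analysis with NumPy follows, assuming a symmetric matrix of pairwise genetic distances as input. GenAlEx offers its own distance and standardization options, so its output may differ from this generic version; the distance values are hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes: int = 2):
    """Classical (Gower) principal coordinate analysis of a symmetric distance matrix.

    Returns the sample coordinates on the first n_axes axes and the percentage of
    the positive-eigenvalue total that each retained axis explains.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n      # centring matrix J
    b = -0.5 * centre @ (d**2) @ centre           # double-centred matrix B
    eigvals, eigvecs = np.linalg.eigh(b)          # eigh: B is symmetric
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    kept = np.clip(eigvals[:n_axes], 0.0, None)   # guard against tiny negative values
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = 100.0 * kept / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical distances among four samples
dist = np.array([
    [0.00, 0.10, 0.40, 0.45],
    [0.10, 0.00, 0.35, 0.40],
    [0.40, 0.35, 0.00, 0.12],
    [0.45, 0.40, 0.12, 0.00],
])
coords, explained = pcoa(dist)
print(np.round(coords, 3))
print(np.round(explained, 1))
```

The signs of the axes are arbitrary (eigenvectors are defined only up to sign), so plots from different programs may be mirrored without being different.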

Patterns of genetic relationships among samples were also investigated using cluster analysis. A dendrogram was constructed from Jaccard's dissimilarity index with UPGMA using DARwin5 [29]. STRUCTURE v2.3.2.1 [30] was used to infer the most likely number of genetic clusters (*K*) and the proportion of membership of each population in each of the *K* clusters. The analysis was performed using the recessive allele model with burn-in and run lengths of 100,000 and 1,000,000 iterations, respectively. The number of clusters was determined following the guidelines of Pritchard and Wen [31] and Evanno et al. [32] using the online software Structure Harvester [33], and the results were visualized using DISTRUCT 1.1 [34]. AFLP reproducibility was calculated following Bonin et al. [35].
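The Jaccard/UPGMA step can be reproduced outside DARwin with SciPy, where UPGMA corresponds to average linkage. This is a sketch assuming the scored matrix is available as a 0/1 array with individuals in rows; the function name and sample labels are hypothetical, and bootstrap support for the branches is not included.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist, squareform


def jaccard_upgma(binary, labels):
    """Jaccard dissimilarity between individuals followed by UPGMA clustering.

    binary: 0/1 matrix with individuals in rows and AFLP loci in columns.
    Returns the square dissimilarity matrix, the SciPy linkage matrix and the
    leaf order of the resulting dendrogram.
    """
    x = np.asarray(binary, dtype=bool)
    # Jaccard dissimilarity: bands present in only one of the two individuals
    # divided by bands present in at least one of them (shared absences ignored)
    dist = pdist(x, metric="jaccard")
    tree = linkage(dist, method="average")   # average linkage = UPGMA
    order = [labels[i] for i in leaves_list(tree)]
    return squareform(dist), tree, order


profiles = np.array([[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]])
d, tree, order = jaccard_upgma(profiles, ["NP-1", "NP-2", "SA-1"])
print(np.round(d, 3))
print(order)
```

Ignoring shared band absences is the usual reason for preferring Jaccard with dominant markers: the absence of a band in two individuals need not reflect a shared allele.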

#### **3. Results**

#### *3.1. AFLP Fingerprint*

The seven primer combinations selected for the analysis revealed 10 to 35 fragments each in the 58 *G. crinita* samples, with a mean of 24 fragments per combination. Of the 171 fragments in total, 170 (99.4%) were polymorphic. Fragment sizes ranged from 52 to 336 bp (Table 2). The first primer combination (*Eco*RI-ACG/*Mse*I-CTT) was the most informative, contributing 20.5% of all fragments; the least informative was *Eco*RI-ACG/*Mse*I-CAG (5.8%).
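The percentages in this paragraph follow directly from the fragment counts. The check below assumes that the per-combination rates are each combination's share of the 171 scored fragments, an interpretation that reproduces the reported 20.5% and 5.8% from the 35- and 10-fragment extremes; it also computes the overall share of polymorphic fragments from the 170 polymorphic fragments reported in Section 3.2.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


TOTAL_FRAGMENTS = 171        # fragments scored with the seven primer combinations
POLYMORPHIC_FRAGMENTS = 170  # polymorphic fragments (Section 3.2)

print(percent(POLYMORPHIC_FRAGMENTS, TOTAL_FRAGMENTS))  # overall polymorphism
print(percent(35, TOTAL_FRAGMENTS))  # combination with the most fragments (35)
print(percent(10, TOTAL_FRAGMENTS))  # combination with the fewest fragments (10)
```

This prints 99.4, 20.5 and 5.8.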


**Table 2.** Description of the primer combinations.

Ten percent of the samples were independently replicated with the same primer combinations, giving a fragment reproducibility of 85% among replicated samples.
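Bonin et al. define the AFLP error rate as the number of mismatches (a band scored present in one replicate and absent in the other) divided by the number of comparisons between replicate profiles; reproducibility can then be taken as its complement. The sketch below implements that definition for matched original and replicate matrices. The function name and toy data are hypothetical, and the study's exact calculation may differ in detail.

```python
import numpy as np


def mismatch_error_rate(original, replicate) -> float:
    """Mismatch error rate between original and replicated AFLP profiles.

    Both arrays hold 0/1 band scores with the same individuals (rows, same order)
    and loci (columns). A mismatch is a band scored present in one run and absent
    in the other; the rate is mismatches divided by the number of comparisons.
    """
    a = np.asarray(original, dtype=bool)
    b = np.asarray(replicate, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("original and replicate matrices must have the same shape")
    return np.count_nonzero(a != b) / a.size


# Toy example: two replicated individuals scored at four loci
orig = np.array([[1, 0, 1, 1], [0, 1, 1, 0]])
rep = np.array([[1, 0, 0, 1], [0, 1, 1, 0]])
err = mismatch_error_rate(orig, rep)
print(f"error rate = {err:.3f}, reproducibility = {100 * (1 - err):.1f}%")
```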

#### *3.2. Genetic Diversity and Population Structure*

As a simple measure of intra-population diversity, the percentage of polymorphic fragments was highest in the samples from Nuevo Piura (natural population, 72.5%), followed by those from Tingo Maria (cultivated population, 42.2%). Diversity was lower in the individual provenances of the semi-domesticated population (20.7% on average) (Table 3). However, when the semi-domesticated provenances were considered as one population, the percentage of polymorphic fragments was 54.4%, Nei's genetic diversity was 0.09 and the Shannon index was 0.16.


**Table 3.** Measurements of genetic diversity in three types of *G. crinita* populations: natural regeneration, cultivated and semi-domesticated.

PF = polymorphic fragments; PPF = percentage of polymorphic fragments; NP = natural population, CP = cultivated population, SDP = semi-domesticated population.

Based on 170 polymorphic fragments from the 58 *G. crinita* samples, Nei's genetic diversity (*He*) ranged from 0.06 to 0.10 and Shannon's information index (*I*) ranged from 0.09 to 0.19 (Table 3). Among the three population types, all measures (polymorphic fragments (PF), percentage of polymorphic fragments (PPF), *He* and *I*) were slightly higher in the naturally regenerated population.

The coefficient of genetic differentiation (*Gst*) among the three population types was 0.10, indicating that 10% of the genetic diversity was distributed among population types. Nei's genetic distance between population types was lowest (0.011, i.e., highest genetic identity) between the natural and cultivated populations and highest (0.022, i.e., lowest identity) between the cultivated and semi-domesticated populations.
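Both statistics in this paragraph have simple closed forms: Nei's *G*<sub>ST</sub> = (*H*<sub>T</sub> − *H*<sub>S</sub>)/*H*<sub>T</sub>, where *H*<sub>S</sub> is the mean within-group gene diversity and *H*<sub>T</sub> the total gene diversity, and Nei's standard genetic distance *D* = −ln(*I*), where *I* is Nei's genetic identity. Because of the logarithm, identities close to 1 correspond to small distances. The diversity inputs below are illustrative only, not values taken from Table 3.

```python
import math


def gst(hs: float, ht: float) -> float:
    """Nei's coefficient of gene differentiation, G_ST = (H_T - H_S) / H_T."""
    return (ht - hs) / ht


def nei_distance(identity: float) -> float:
    """Nei's standard genetic distance from genetic identity, D = -ln(I)."""
    return -math.log(identity)


# Illustrative inputs only: H_S = 0.09 and H_T = 0.10 give G_ST = 0.10
print(round(gst(hs=0.09, ht=0.10), 2))
# An identity of 0.989 corresponds to a distance of about 0.011
print(round(nei_distance(0.989), 3))
```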

Pairwise genetic distances between provenances ranged from 0.011 to 0.063 (Table 4). Nuevo Piura and Tingo Maria were the most similar, with the minimum distance of 0.011, while the greatest genetic distance (0.063) was found between Nuevo Piura and Puerto Inca and between Tahuayo and San Alejandro.


**Table 4.** Nei's genetic identity (above diagonal) and genetic distance (below diagonal) among eight *G. crinita* provenances analysed by AFLP.

NR = Nueva Requena, TS = Tahuayo Stream, SA = San Alejandro, CR = Curimana River, NP = Nuevo Piura, TM = Tingo Maria, PI = Puerto Inca, MA = Macuya.

Analysis of molecular variance (AMOVA) showed that 12% of the variation was among population types, 4% among provenances and 84% within provenances (Table 5). Differentiation among provenances (Fst = 0.16) was higher than among population types (Fst = 0.12), both at *p* < 0.001.
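In a hierarchical AMOVA, the Φ statistics reported by programs such as GenAlEx are ratios of the estimated variance components. Assuming the percentages above are those components, the sketch below shows that the among-type share gives Φ<sub>RT</sub> = 0.12 and the combined among-type and among-provenance share gives Φ<sub>PT</sub> = 0.16, matching the two Fst values quoted. The R/P/T naming (regions, populations, total) is used here for population types, provenances and the total; the permutation test behind the *p* value is not included.

```python
def hierarchical_phi(among_types: float, among_prov: float, within_prov: float) -> dict[str, float]:
    """Phi statistics from hierarchical AMOVA variance components.

    R = population types (regions), P = provenances (populations), T = total.
    Components can be in any consistent unit, e.g. percentages of total variance.
    """
    total = among_types + among_prov + within_prov
    return {
        "PhiRT": among_types / total,                       # among types
        "PhiPR": among_prov / (among_prov + within_prov),   # among provenances within types
        "PhiPT": (among_types + among_prov) / total,        # among provenances overall
    }


phi = hierarchical_phi(among_types=12, among_prov=4, within_prov=84)
print({name: round(value, 3) for name, value in phi.items()})
```

With the rounded percentages this gives Φ<sub>RT</sub> = 0.12, Φ<sub>PR</sub> ≈ 0.045 and Φ<sub>PT</sub> = 0.16; the published values come from unrounded components, so small differences are expected.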


**Table 5.** Results of the analysis of molecular variance (AMOVA) of 58 *G. crinita* individuals representing three population types and eight provenances.

<sup>a</sup> Significance tests after 999 permutations.

Patterns of genetic relationships were visualized using principal coordinate analysis (PCoA) (Figure S2) and a dendrogram based on Jaccard's dissimilarity, which grouped the 58 samples into two main clusters with seven sub-clusters (Figure S1). STRUCTURE analysis indicated that *K* = 2 was optimal, as it had the largest ΔK value. Under the *K* = 2 model, provenances from the semi-domesticated population (NR, TS, PI, MA, SA and CR) included some individuals with mixed membership in cluster 1 (black bar) and cluster 2 (white bar) (Figure 2). The proportion of membership in the predominant cluster ranged from 56.6% (CR provenance, cluster 1) to 89.3% (SA provenance, cluster 1). In the natural and cultivated populations (NP and TM, respectively), membership in cluster 2 was 73.2% and 80.4%, respectively.
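The ΔK statistic of Evanno et al. used to choose *K* is the absolute second-order change in the mean log-likelihood ln P(D), divided by the standard deviation of ln P(D) across replicate runs at that *K*. The sketch below computes it from the mean ln P(D) per *K*, as summary tables commonly do. The run values are invented for illustration and are not this study's STRUCTURE output; at least two replicate runs per *K* are needed for the standard deviation.

```python
import statistics


def evanno_delta_k(lnp: dict[int, list[float]]) -> dict[int, float]:
    """Delta K (Evanno et al.) from replicate STRUCTURE runs.

    lnp maps each K to its replicate ln P(D) values (at least two runs per K).
    Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)); it is only
    defined for K values whose neighbours K-1 and K+1 were also run.
    """
    means = {k: statistics.mean(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])  # |L''(K)|
            delta[k] = second_diff / statistics.stdev(lnp[k])
    return delta


# Invented ln P(D) values for three runs at each K (not this study's output)
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4505.0, -4495.0],
    3: [-4450.0, -4460.0, -4440.0],
    4: [-4420.0, -4440.0, -4400.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K =", best_k)
```

ΔK cannot be computed for the smallest or largest *K* tested, which is one reason *K* = 1 cannot be selected by this criterion alone.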

**Figure 2.** Population structure of the 58 *G. crinita* samples from eight provenances at *K* = 2. Provenance codes are given above the plot and the numbers of samples per provenance below it.

#### **4. Discussion**

Studies of genetic variation in growth and wood traits of *Guazuma crinita* have been published [14,36,37], but genetic variation in morphological traits represents only a small part of the total genetic variation in a species [38]. This research assessed genetic diversity in *G. crinita* using amplified fragment length polymorphism (AFLP) markers and found 99.4% polymorphism. In another study involving eleven provenances of *G. crinita* in the Peruvian Amazon, inter-simple sequence repeat (ISSR) markers revealed 93.8% polymorphism [17]. Although the methods differed, both studies indicate high levels of genetic diversity in *G. crinita*. These high levels of diversity are probably related to the fact that *G. crinita* is a pioneer species with long-distance seed dispersal, which results in extensive gene flow [39,40].

We analyzed the genetic diversity of *G. crinita* in three population types (natural, cultivated and semi-domesticated). Based on genetic diversity parameters such as PPF, *He* and *I*, the naturally regenerated population had slightly greater genetic diversity than the cultivated and semi-domesticated populations. This suggests that artificial selection during domestication may have reduced the genetic diversity of *G. crinita*. Other studies have also found that wild populations usually maintain higher levels of genetic diversity than cultivated populations [5,6,41,42].

Higher diversity in natural populations is expected because they are not affected by artificial selection. Maintaining high genetic diversity in natural populations is important because it reduces the risk of local extinction under natural conditions [1,43]. Conserving cultivated populations is also important for genetic diversity, particularly cultivated populations that contain superior individuals. Genetic diversity parameters were slightly higher in the semi-domesticated population than in the cultivated population, probably because of the broader genetic base of the semi-domesticated population. The semi-domesticated population included six provenances with individuals selected from the offspring of 200 mother trees (details of the initial collection were reported by Rochon et al. [14]), whereas the cultivated population represented only one provenance and a few mother trees. The number of mother trees used to establish a population is a key factor affecting inbreeding and genetic diversity: a small number leads to inbreeding among progeny, while a large number increases genetic diversity and reduces differentiation among plantations [44].

In this study, three parameters were used to assess genetic differentiation among the three population types, and they gave similar results (AMOVA = 12%, *Gst* = 0.10 and Fst = 0.12). This indicates that roughly 10–12% of the variation was associated with the domestication stage. In contrast, genetic differentiation between naturally regenerated and managed stands of *Picea abies* (L.) Karst. in Europe is much lower (Fst = 0.012), suggesting that tree breeding activities have not greatly altered gene frequencies relative to natural populations of that species [45]. Geographical and climatic factors can also affect genetic differentiation, and their effects should be assessed in future studies of *G. crinita* [46,47].

The relatively low level of genetic differentiation (4%) among the eight *G. crinita* provenances may be explained by the high gene flow (*Nm* = 12.9) reported by Tuisima et al. [17], which probably reflects the long-distance dispersal of the species' small seeds by wind and water [37]. Low genetic differentiation is also expected in cross-pollinated species [48]. The high level of genetic diversity within provenances found in this study is consistent with reports based on phenotypic traits [14,37].

Other studies have also reported relatively low genetic differentiation among tree populations in the Amazon Basin. Using seven AFLP primer combinations, Russell et al. [49] found 9% of the variation among populations of *Calycophyllum spruceanum* Benth. from several watersheds in the Peruvian Amazon Basin. This species also produces small seeds that are dispersed over long distances by both wind and water. Based on allozymes, Nassar et al. [50] found low differentiation among populations of three native species from the Amazon Basin: 8%, 6% and 7% of the variation was among populations of *Samanea saman* (Jacq.) Merr. (Fabaceae), *Guazuma ulmifolia* (Malvaceae) and *Hura crepitans* L. (Euphorbiaceae), respectively.

Many tree species have adaptations that allow long-distance seed dispersal [48,51]. One might expect trees sampled over a relatively small geographical range (as in our study) to show low levels of variation among populations [15]. However, in some tropical studies, trees sampled over extensive geographical ranges still showed low differentiation among populations (e.g., *Swietenia macrophylla* King [52]; *Vitellaria paradoxa* Gaertn. [53]; *Inga edulis* Mart. [22]).

According to the STRUCTURE analysis, individuals within provenances had mixed membership in the two clusters, so provenances were not distinctly separated. This is consistent with the AMOVA, which showed much greater genetic variation within than among provenances. As a result, distinct genetic groups could not be reliably identified [54]. Nevertheless, we noted similarity among provenances from the semi-domesticated population (cluster 1) and between provenances from the naturally regenerated and cultivated populations (cluster 2).

#### **5. Conclusions**

AFLP markers were successful and effective for the assessment of the genetic diversity and structure of *G. crinita* populations in different stages of the domestication process. A high level of genetic diversity was observed at the species level, and this probably reflects extensive gene flow due to long-distance seed dispersal.

Genetic diversity appears to be slightly greater in the natural population than in the cultivated and semi-domesticated populations, while significant genetic differentiation was detected among the three population types. These results are preliminary, given the small sample size, and suggest a slight but significant genetic bottleneck in the cultivated and semi-domesticated populations. The semi-domesticated population appears to have slightly higher genetic diversity than the cultivated population.

There appears to be significant differentiation among the natural, cultivated and semi-domesticated populations, a result presented with caution given the small sample size employed. Future studies should include larger sample sizes in different domestication stages to confirm the results reported in this paper.

The in situ and *circa situ* conservation and sustainable management of naturally regenerated populations are recommended to maintain *G. crinita* genetic resources in order to cope with potential inbreeding depression and environmental changes.

To increase genetic variation in planted populations, we recommend further sampling, collecting *G. crinita* seeds over an extensive geographic range (including various natural stands), and establishing seedling and clonal seed orchards.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/8/795/s1, Figure S1: Dendrogram based on 171 AFLP loci for 58 samples of *G. crinita.* Samples in green, black and blue are from natural, cultivated and semi-domesticated populations, respectively. Figure S2: Principal coordinate analysis of the 58 samples belonging to eight provenances of *G. crinita*, based on AFLP analysis.

**Author Contributions:** Conceptualization, L.L.T.-C. and P.H.Č.; methodology, L.L.T.-C. and P.H.Č.; formal analysis, L.L.T.-C. and P.H.Č.; writing–original draft preparation, L.L.T.-C. and B.L.; writing–review and editing, P.H.Č., B.L., J.C.W.; supervision of the research, B.L., P.H.Č.; project administration, B.L.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research was funded by the Internal Grant Agency of the Czech University of Life Sciences Prague (CIGA) (Project No. 20205003).

**Acknowledgments:** The authors thank the Peruvian Amazon Research Institution (IIAP) for its collaboration, and Ing. Walter Rios Perez and Gabriel Morales Alejo for their help in collecting samples. We would like to thank Bohumil Mandák for support with the STRUCTURE analysis and Antony Del Aguila Heller for help with Figure 1.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Plus Tree Selection of** *Quercus salicina* **Blume and** *Q. glauca* **Thunb. and Its Implications in Evergreen Oaks Breeding in Korea**

#### **In Sik Kim \*, Kyung Mi Lee, Donghwan Shim, Jin Jung Kim and Hye-In Kang**

Division of Forest Tree Improvement, Department of Forest Bio-Resources, National Institute of Forest Science, 39 Onjeong-ro, Gwonseon-gu, Suwon 16331, Korea; kmile@korea.kr (K.M.L.); shim104@korea.kr (D.S.); acgwnd88@korea.kr (J.J.K.); hyeinkang@korea.kr (H.-I.K.)

**\*** Correspondence: kimis02@korea.kr

Received: 21 May 2020; Accepted: 2 July 2020; Published: 6 July 2020

**Abstract:** This study was conducted to select plus trees of two evergreen oaks, *Quercus salicina* and *Q. glauca*, in Korea. Evergreen oaks are distributed in the subtropical region of Korea and have recently emerged as alternative tree species for coping with climate change. Accordingly, a tree breeding program is under way to develop evergreen oaks as reforestation species for the future. Through an intensive survey of the distribution range, 15 stands (8 for *Q. salicina*, 3 for *Q. glauca*, and 4 for both species) were selected as base populations. To select candidate trees, we developed a subjective grading system with six characteristics in three categories and introduced a weighted generalized value (*GVIw*) to compare the superiority of candidate trees. The candidate trees were screened using a baseline value of 0: trees with *GVIw* > 0 were accepted and those with *GVIw* < 0 were rejected. The selection was then adjusted to avoid bias toward trees from particular locations. Through this process, 44 candidate trees of *Q. salicina* and 41 candidate trees of *Q. glauca* were selected as plus trees. Finally, the results and their implications for evergreen oak breeding in Korea are discussed.

**Keywords:** tree improvement; evergreen oak; phenotypic selection; selection criteria; seed orchard; generalized value; conservation

#### **1. Introduction**

The genus *Quercus* is one of the most important angiosperm genera in the Northern Hemisphere in terms of species diversity, ecological dominance, and economic value. Oaks are dominant members of a wide variety of habitats, including temperate deciduous forest, temperate and subtropical evergreen forest, subtropical and tropical savannah, and subtropical woodland [1,2].

In South Korea, six deciduous oaks (*Q. acutissima* Carruth., *Q. variabilis* Blume, *Q. mongolica* Fisch. ex Ledeb., *Q. serrata* Thunb. ex Murray, *Q. dentata* Thunb. and *Q. aliena* Blume) and five evergreen oaks (*Q. myrsinifolia* Blume, *Q. acuta* Thunb., *Q. glauca* Thunb., *Q. salicina* Blume, and *Q. gilva* Blume) are naturally distributed. While the deciduous oaks grow throughout the country, from temperate to subtropical regions, the evergreen oaks are restricted to the subtropical region in the southernmost part of Korea [3]. Traditionally, deciduous oaks have been used as a wood resource (timber, media for mushroom cultivation, charcoal, etc.), and their acorns are also used for starch production, e.g., acorn jelly. Evergreen oaks are also used for timber production in the subtropical region, but only to a very small extent. Accordingly, deciduous oaks have received more attention than evergreen oaks in tree breeding programs [4].

As global warming progresses, the distribution and composition of forest tree species are expected to change. According to climate change scenarios for Korea, the distribution range of conifers is predicted to decrease, while the range of subtropical tree species will expand northward [5,6]. Assisted migration and the development of alternative tree species have been suggested as countermeasures against global warming [7,8]. In Korea, evergreen oaks have recently emerged as one of the alternative tree species for coping with climate change. Moreover, reports that the leaves, branches, bark, and acorns of evergreen oaks contain useful substances for food or medicine have attracted further attention [9,10]. As a preliminary study, plus tree selection and the establishment of a seedling seed orchard of *Q. glauca* were conducted on a small scale, mainly on Jeju Island [11]. Subsequently, the Korea Forest Service established a plan for additional plus tree selection and the expansion of evergreen oak seed orchards, which was the main motivation for this study.

Although the utilization potential of evergreen oaks is highly valued, the available resources are relatively limited compared with more common species: evergreen oaks are restricted to a narrow area of the subtropical region, with small population sizes and low frequency of occurrence. Measures to conserve their genetic resources are therefore also needed, and the breeding plan should consider utilization and genetic resource conservation together. Accordingly, unlike a previous study that considered only growth characteristics [11], this study aimed to consider growth, adaptability, and seed production characteristics simultaneously.

Tree breeding is an integral part of modern silviculture, used to increase economic returns through enhanced wood production. In first-generation tree improvement programs, mass (phenotypic) selection is the first step in the breeding cycle [12]. The philosophy underlying plus tree selection is that the favorable deviation of a plus tree from the population mean is due at least partly to genetic rather than environmental or random effects [13]. There are several methods for plus tree selection, namely the ocular (subjective) method, the comparison tree method, the baseline method, and the regression method [14,15]. The comparison tree method is generally preferred and widely applied in even-aged, pure-species stands. In uneven-aged and mixed-species stands, which are most common for hardwoods, a grading system is required instead of the standard comparison tree method. The regression or baseline method adjusts tree scores for differences in age, competition and/or environmental gradients. The ocular or subjective method is used when environmental noise is too large to be overcome by regression methods or when visual inspection alone is effective at locating superior trees. The choice of the best selection technique depends on several factors, including species characteristics, past history, the present condition of the forest, the variability and inheritance pattern of important characteristics, and the objectives of the particular tree improvement program [12,16].

As deciduous oaks are dominant or co-dominant species occurring at high density in natural stands, the comparison tree method is usually used for their plus tree selection. However, evergreen oaks tend to be scattered at low density in mixed forests, which makes the comparison tree method difficult to apply. The regression or baseline method could be considered as an alternative, but the data needed to derive a baseline or regression equation for the target traits have not yet been accumulated. Thus, an ocular selection method had to be considered for plus tree selection in evergreen oaks. The ocular method relies on a subjective assessment in which trees that appear better than average are chosen as plus trees without measuring the candidate or neighboring trees [14,15]. Indeed, a simple direct (ocular) selection method was used for plus tree selection in a previous study on *Q. glauca* [11], which selected 35 plus trees in the lowlands and valleys of Mt. Halla on Jeju Island based only on growth traits such as height, diameter, stem straightness, and timber height. The ocular method appears to have achieved the selection objectives relatively easily; however, that study did not present detailed selection criteria and methods. In addition, characteristics important for improved seed production and reforestation, such as flowering, fruiting, resistance to pests or diseases, and adaptability to the environment, were overlooked. In another case, Stringer et al. [17] compared the growth and stem quality of 19-year-old progeny from superior and comparison trees of *Q. rubra* and suggested that selecting several phenotypically above-average candidate trees may be more effective than rigorously selecting a smaller number of phenotypically superior trees. Taking these two cases together, a modified selection method appeared necessary for this study, because selection had to be carried out in several stands with different environments across the entire distribution range, and the results had to be evaluated objectively against a common standard.

Accordingly, we first developed selection criteria and a selection method for evergreen oaks. We then applied the method to plus tree selection of *Q. salicina* and *Q. glauca* as a case study. Finally, the results and their implications are discussed in relation to evergreen oak breeding in Korea.

#### **2. Materials and Methods**

#### *2.1. Selection of Base Populations*

To screen candidate stands and select base populations of *Q. salicina* and *Q. glauca*, relevant literature was reviewed [3,18–22]. However, the literature provided limited information on the population size, density and growth performance of the target species, which were needed for selecting base populations. To supplement this, we interviewed relevant researchers and experts to obtain more information. In addition, care was taken not to overlap with the stands selected in the previous study [11].

Through this process, 25 stands (15 for *Q. salicina* and 10 for *Q. glauca*) were screened. A preliminary survey was then conducted to determine whether each stand was appropriate as a base population. In the field, we examined stand size, frequency and density of the target species, growth state, presence of candidate trees, and environmental conditions. These findings were used to select base populations and to set up the selection criteria and method.

In selecting base populations, priority was given to whether a stand had enough individuals for selection and whether it contained candidate trees that met the selection criteria. For example, a small stand with no suitable candidate trees was excluded. Finally, 15 stands were selected as base populations: 8 for *Q. salicina*, 3 for *Q. glauca*, and 4 for both species (Figure 1, Table 1).

**Figure 1.** Location map of base populations for candidate tree selection in *Quercus salicina* and *Q. glauca*. The numbers in the black circles are the stand numbers given in Table 1.

#### *2.2. Selection Criteria and Methods*

In Korea, selection criteria and methods for broad-leaved tree species are already established and have been used for plus tree selection in several species [23]. However, they are based on the comparison tree method, which is not suitable for evergreen oaks, as explained above. Through literature review and in-depth discussion, a subjective grading system was therefore developed and applied for plus tree selection in evergreen oaks.

First, dominant trees occupying the upper crown canopy of the stand were selected as candidate trees. Adjacent candidate trees were kept 30–50 m apart to avoid selecting relatives. In addition, candidate trees were selected from as many different locations as possible to avoid biased selection from a particular site.
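The 30–50 m spacing rule can be checked from the GPS positions recorded for each candidate. This sketch assumes projected coordinates in metres (for example UTM easting and northing, so that Euclidean distance is adequate within a stand) and treats 30 m as the minimum allowed spacing; the tree IDs and coordinates are hypothetical.

```python
import math
from itertools import combinations


def too_close(trees: dict[str, tuple[float, float]], min_dist_m: float = 30.0) -> list[tuple[str, str, float]]:
    """Return pairs of candidate trees closer together than min_dist_m.

    trees maps a candidate ID to projected (easting, northing) coordinates in metres.
    """
    flagged = []
    for (a, pa), (b, pb) in combinations(sorted(trees.items()), 2):
        d = math.dist(pa, pb)   # straight-line distance in metres
        if d < min_dist_m:
            flagged.append((a, b, round(d, 1)))
    return flagged


candidates = {"QS-01": (0.0, 0.0), "QS-02": (12.0, 16.0), "QS-03": (60.0, 80.0)}
print(too_close(candidates))
```

Here only the QS-01/QS-02 pair (20 m apart) is flagged; a team could then keep whichever of the two has the better grades.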


**Table 1.** List of the 15 candidate stands of *Q. salicina* and *Q. glauca* used in this study.

To select candidate trees, we developed a subjective grading system with six characteristics in three categories (Table 2). Each characteristic was rated on a five-grade scale: 5 (very good), 4 (good), 3 (moderate), 2 (poor), and 1 (very poor). Grades were assigned according to each tree's relative superiority compared with the stand average.

**Table 2.** Selection criteria and weights of characteristics used for candidate tree selection in *Quercus salicina* and *Q. glauca.*


The growth category evaluated the superiority of growth (SG) and the superiority of tree form (STF). SG was graded by considering the growth status of shoots, branches, crown, and stem; it is an indicator for selecting large, vigorously growing individuals. STF was graded by considering the number of stems, stem straightness, and timber height; it is an indicator for selecting individuals with straight stems and high timber height for long-log production.

The adaptability category evaluated adaptability to disturbance (AD) and adaptability to environment (AE). AD was graded by considering adaptability to biotic (pests or diseases) and abiotic (artificial disturbance) stresses. Symptoms of or damage by pests or diseases were identified directly by ocular observation. Artificial disturbance usually took the form of collecting leaves, small branches, or even stems for medicinal use, and its impact on tree growth could also be judged intuitively by ocular observation. The grade was scored by considering these two factors together, so AD is an indicator for selecting individuals with superior adaptability or resilience to disturbance. AE was graded by considering vitality and adaptability to the given environment. When a tree is poorly adapted to its environment, its vitality decreases, so adaptability can be identified indirectly from the growth state of the leaves, i.e., small and few leaves in the crown and/or abnormal leaf responses such as curling, bending, marginal browning, chlorosis, and shedding. The grade was scored considering all of these aspects, so AE is an indicator for selecting individuals with superior adaptability to the environment.

The seed production category evaluated the superiority of seed production (SSP) and the potential of seed production (PSP). SSP was graded by considering the current status and recent history of seed production. First, the average level of flowering or fruiting in the stand was determined, and each individual was graded according to its relative level of flowering or fruiting. Because the amount of flowering or fruiting usually varies from year to year, the grade was adjusted by considering the amount of seed buried in or lying on the ground under the crown of each tree. SSP is thus an indicator for selecting individuals with superior seed production. PSP was graded by considering seed production capacity, reflecting the advantages and disadvantages of the micro-environment for flowering or fruiting, such as spacing, light intensity, soil moisture, and humidity. The exact values of these factors cannot be known without measuring equipment, but experienced foresters can judge intuitively whether a site is favorable or unfavorable for flowering or fruiting. The PSP grade was scored by considering these judgements together with the SSP grade, so PSP is an indicator for selecting individuals with high seed production potential.

In a grading system, the weights of characteristics are generally determined by their heritability and economic worth [16]. However, since no information is available on the heritability of evergreen oaks, only the breeding goal and expected economic worth were considered in determining the weights of each category and characteristic in this study. The primary objective of this breeding program is to improve growth rate and stem straightness for timber production; thus, the highest weight was given to the growth category. Seed production was considered the second most important in terms of supplying materials for reforestation as well as food and medicine, so the second-highest weight was given to the seed production category. Adaptation is usually not included in a grading system, because it is automatically taken into account through common garden tests of selected trees [16]. However, since the genetic resources of evergreen oaks urgently need conservation, as mentioned before, an adaptability category was included in this study to account for adaptability to various environments for assisted migration or reforestation outside the current habitat range. Its current importance was judged to be relatively lower than that of the other categories, so the lowest weight was given to the adaptability category. The detailed weights of each category and characteristic were necessarily determined through discussion among relevant experts and researchers (Table 2).

In principle, individuals above the moderate grade in all characteristics were selected as candidates. Exceptionally, however, if the number of candidates to be selected was small and/or the growth-related characteristics were excellent, an individual with a lower-than-moderate grade in some characteristics was included among the candidates.

This subjective grading system is frequently used for hardwoods but is successful only if the grader is experienced and dedicated to finding the best possible tree [16,24]. To minimize deviations caused by technical errors or prejudice in candidate selection, the survey team was composed of three experienced researchers, and the research method was standardized through simulated selection according to the selection criteria before fieldwork began. Two investigation teams carried out candidate tree selection. Each team consisted of three people, i.e., a selector, an evaluator/recorder, and a GPS manager. The selector was primarily responsible for selecting and marking candidate trees. The evaluator/recorder was responsible for evaluating and recording the grade of each characteristic for each candidate tree. The GPS manager recorded and managed the positional information. Whenever a problem arose in the selection process, e.g., whether to select a tree or how to rank it, the three team members worked together to reach a conclusion. To reduce deviations in candidate tree selection, all members of each team met after work every day to evaluate and adjust the day's results.

#### *2.3. Data Analysis*

Since the primary goal of this study was to secure as many candidate trees as possible from various candidate stands, candidates were selected on a per-stand basis according to the selection criteria and methods. However, it was considered appropriate to combine geographically adjacent stands with similar environments into a group. If each stand were treated separately in the data analysis for plus tree selection, candidate trees selected under similar environmental conditions would be more likely to enter the breeding population; the reason becomes clear from the analysis method described later in this section. For *Q. salicina*, Sacheon1 and Sacheon2 were combined because they were adjacent to each other, separated by a ridge. Gamsan and Cheongsu, Seoho and Sanghyo, and Buhwang and Jeongja were also combined, respectively, because they are geographically adjacent and share similar environments. For *Q. glauca*, Cheongsu and Gamsan were combined for the same reason. The linear distance between combined stands ranged from 1.4 to 10.4 km. All subsequent analyses were based on these groups (Table 3).


**Table 3.** The adjusted groups and the number of candidate trees selected in each group in *Quercus salicina* and *Q. glauca*.

Because a grading system was used to select candidate trees, the scores of the measured characteristics were highly skewed toward grades 4 and 5. Prior to analysis, the data were therefore log-transformed to improve normality [25–27]. Even after transformation, normality did not improve significantly, although homoscedasticity was satisfied. In this case, the robustness of analysis of variance (ANOVA) would be weakened, and a non-parametric method such as the Kruskal-Wallis (KW) test is generally recommended as an alternative. However, for small sample sizes such as those in this study, ANOVA is a better option than the KW test, even for non-normal data [28]. Moreover, when rank data are used, the result of the KW test is very similar to that of the ANOVA F test. Since the grade scores measured in this study are essentially ranks within each characteristic, ANOVA and multiple comparison methods were considered appropriate. Thus, one-way ANOVA with a fixed-effect model and post hoc multiple comparisons were performed to examine differences among group means and specific differences between pairs of groups, respectively. For multiple comparisons, Fisher's least significant difference (LSD) with Bonferroni adjustment was applied. To understand the relationships among the six characteristics as well as height and diameter at breast height of candidate trees, correlation analysis was performed with Spearman rank correlation. All statistical analyses were performed using R software version 4.0.0.
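The analyses were run in R 4.0.0. Purely as an illustration, the following stdlib-only Python sketch shows the three computational steps described above: the log transformation, the fixed-effect one-way ANOVA F statistic, and the Bonferroni-adjusted per-comparison significance level for pairwise (LSD) tests. The natural logarithm, the function names, and the example grades are assumptions; p-values, which require the F distribution, are left to a statistics package.

```python
import math
from statistics import mean


def log_transform(grades):
    """Natural-log transform of grade scores (assumes every grade is > 0)."""
    return [math.log(g) for g in grades]


def one_way_anova_f(groups):
    """Fixed-effect one-way ANOVA: return (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviation of each value from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison significance level for all pairwise group comparisons."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


# Hypothetical grade scores of one characteristic in three groups.
groups = [[4, 5, 4, 5], [3, 4, 4, 3], [5, 5, 4, 5]]
f_stat, df1, df2 = one_way_anova_f([log_transform(g) for g in groups])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
print(f"Bonferroni per-pair alpha for 8 groups: {bonferroni_alpha(0.05, 8):.5f}")
```

With eight groups there are 28 pairwise comparisons, so each LSD test would be judged at 0.05/28 rather than 0.05.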

Although grade scores were obtained for each characteristic, their mean and distribution differed among characteristics as well as among base populations. If the original grade scores were used for candidate selection, individuals from a stand or group with overall high grades would be more likely to be selected. Such biased selection would defeat the purpose of this study, which was to select candidate trees as evenly as possible from various stands. Therefore, the data were normalized (rescaled) before applying the weights for each characteristic according to the selection criteria.

Several methods are typically used for normalization, e.g., scaling to a range, clipping, log scaling, and z-score. Researchers usually choose an appropriate method depending on the nature of the data and the purpose of the study [25–27]. In this study, we normalized the data by dividing each deviation from the mean by the mean. With this method, the mean of the rescaled data becomes 0, and each value is expressed relative to the original mean. This property was thought to suit our purpose better than other methods, because it allowed candidates to be selected more evenly from various stands or locations.
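A minimal sketch of this relative-deviation rescaling, (x − mean) / mean, is shown below; the function name and the two example stands are hypothetical. It illustrates why the method evens out selection across stands: the same raw grade can fall below the mean in a high-scoring stand and above it in a lower-scoring one.

```python
from statistics import mean


def relative_deviation(scores):
    """Rescale scores as (x - mean) / mean; the rescaled values average 0."""
    m = mean(scores)
    return [(x - m) / m for x in scores]


# Hypothetical grades of one characteristic in two stands.
stand_a = [4, 5, 5, 4, 5]  # overall high grades (mean 4.6)
stand_b = [3, 4, 4, 3, 4]  # overall lower grades (mean 3.6)
print([round(v, 3) for v in relative_deviation(stand_a)])
print([round(v, 3) for v in relative_deviation(stand_b)])
```

A grade of 4 becomes negative in stand A but positive in stand B, so strong individuals in a weaker stand are not automatically outranked.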

Two kinds of mean value could then be used for normalization, giving a value based on each population (*GVp*) and a value based on the total population (*GVt*), as defined below.

$$GV_{p,i} = \frac{X_i - \overline{X}_p}{\overline{X}_p} \tag{1}$$

where *GVp,i* is the generalized value of the *i*th individual, *Xi* is the grade score of the *i*th individual, and $\overline{X}_p$ is the mean value of the corresponding characteristic in each population.

$$GV_{t,i} = \frac{X_i - \overline{X}_t}{\overline{X}_t} \tag{2}$$

where *GVt,i* is the generalized value of the *i*th individual, *Xi* is the grade score of the *i*th individual, and $\overline{X}_t$ is the overall mean value of the corresponding characteristic in the total population.

The next question was which mean value was more effective for candidate selection. *GVt* assumes that environmental differences among stands are negligibly small. If only *GVt* were considered, it would be easy to compare and rank all selected candidates, but this could lead to biased selection from a specific stand with superior properties, without considering environmental differences. However, the environmental impact on the fitness and growth of provenances or families is already well known in several cases [16,29]. Thus, *GVp*, which accounts for environmental differences among stands, needs to be combined in candidate tree selection. Moreover, since no genetic test has been conducted, the GxE interaction effect cannot be estimated. To compensate for this problem, we decided to temporarily apply a weighted generalized value, i.e., *GVw* = 0.7 × *GVp* + 0.3 × *GVt*. These weights have no scientific basis; they were set conceptually by the researchers, reflecting the view that the population level was more important for achieving the purpose of this study.
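As a sketch only, the weighted generalized value of a single characteristic can be computed as follows, assuming the grades are held in a plain dict keyed by group. The 0.7/0.3 weights come from the text; the function name and example data are hypothetical.

```python
from statistics import mean


def weighted_generalized_values(scores_by_group, w_p=0.7, w_t=0.3):
    """Return {group: [GVw, ...]} for a single characteristic.

    GVp is normalized by the group (population) mean and GVt by the pooled
    mean of all groups; GVw = w_p * GVp + w_t * GVt.
    """
    pooled_mean = mean(x for xs in scores_by_group.values() for x in xs)
    result = {}
    for group, xs in scores_by_group.items():
        group_mean = mean(xs)
        result[group] = [
            w_p * (x - group_mean) / group_mean + w_t * (x - pooled_mean) / pooled_mean
            for x in xs
        ]
    return result


# Hypothetical grades: group "A" scores higher overall than group "B".
gv_w = weighted_generalized_values({"A": [4, 5, 5], "B": [3, 4, 3]})
for group, values in gv_w.items():
    print(group, [round(v, 3) for v in values])
```

Because 70% of the weight sits on the within-group term, a tree's standing relative to its own stand dominates, while the 30% pooled term still rewards stands that are better overall.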

In the actual data analysis, we first calculated *GVp* and *GVt* of each characteristic for each individual. To obtain a summation index of the generalized values of all characteristics for each individual, *GVIp* and *GVIt* were calculated by multiplying *GVp* and *GVt*, respectively, by the weight of each characteristic (Table 2) and summing over characteristics.

$$GVI_{p,i} = \sum_{j} \left( GV_{p,ij} \times C_j \right) \tag{3}$$

where *GVIp,i* is the generalized value index of the *i*th individual based on *GVp*, *GVp,ij* is the *GVp* of the *j*th characteristic in the *i*th individual, and *Cj* is the weight of the *j*th characteristic.

$$GVI_{t,i} = \sum_{j} \left( GV_{t,ij} \times C_j \right) \tag{4}$$

where *GVIt,i* is the generalized value index of the *i*th individual based on *GVt*, *GVt,ij* is the *GVt* of the *j*th characteristic in the *i*th individual, and *Cj* is the weight of the *j*th characteristic.
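The weighted sum in Equations (3) and (4) is sketched below. Table 2 is not reproduced in this section, so the weights here are hypothetical placeholders that only respect the stated ordering (growth > seed production > adaptability) and sum to 1; they are not the values used in the study.

```python
# Hypothetical weights C_j: NOT the values of Table 2, only placeholders that keep
# the stated order (growth > seed production > adaptability) and sum to 1.
PLACEHOLDER_WEIGHTS = {
    "SG": 0.25, "STF": 0.20,   # growth category
    "SSP": 0.15, "PSP": 0.15,  # seed production category
    "AD": 0.125, "AE": 0.125,  # adaptability category
}


def generalized_value_index(gv_by_trait, weights):
    """GVI_i = sum over characteristics j of GV_ij * C_j for one individual."""
    missing = set(weights) - set(gv_by_trait)
    if missing:
        raise KeyError(f"missing generalized values for: {sorted(missing)}")
    return sum(gv_by_trait[trait] * c for trait, c in weights.items())


# Hypothetical GVp and GVt values of one candidate tree for the six characteristics.
gv_p = {"SG": 0.08, "STF": 0.02, "SSP": -0.05, "PSP": 0.01, "AD": 0.03, "AE": -0.02}
gv_t = {"SG": 0.04, "STF": -0.01, "SSP": -0.07, "PSP": 0.00, "AD": 0.02, "AE": -0.03}
gvi_p = generalized_value_index(gv_p, PLACEHOLDER_WEIGHTS)
gvi_t = generalized_value_index(gv_t, PLACEHOLDER_WEIGHTS)
print(round(gvi_p, 4), round(gvi_t, 4))
```

The same function serves both equations; only the input (*GVp* or *GVt* values) changes.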

Then, the weighted generalized value of each individual (*GVIw*) was calculated as *GVIw* = 0.7 × *GVIp* + 0.3 × *GVIt*. Candidate trees were truncated using the *GVIw* of each individual with a baseline value of 0, i.e., accepted if *GVIw* > 0 and rejected if *GVIw* < 0. Finally, plus trees were selected through an adjustment process that avoided biasing the selection toward a particular stand or group.
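A minimal sketch of the truncation step is given below; the function name and the dict input are hypothetical. Assigning *GVIw* = 0 to the rejected list is an assumption, since the text only specifies the > 0 and < 0 cases. The "A-4" entry uses the *GVIp* and *GVIt* reported in Section 3.3 for candidate No. 4 of group A; "A-x" is invented.

```python
def truncate(candidates, w_p=0.7, w_t=0.3):
    """Split candidates into accepted/rejected lists by the sign of GVIw.

    candidates maps a tree label to its (GVIp, GVIt) pair. GVIw == 0 goes to
    the rejected list here, which is an assumption: the text only states the
    > 0 and < 0 cases.
    """
    accepted, rejected = [], []
    for tree_id, (gvi_p, gvi_t) in candidates.items():
        gvi_w = w_p * gvi_p + w_t * gvi_t
        (accepted if gvi_w > 0 else rejected).append((tree_id, gvi_w))
    return accepted, rejected


# "A-4": values reported for candidate No. 4 in group A; "A-x": hypothetical tree.
accepted, rejected = truncate({"A-4": (0.0242, -0.0200), "A-x": (-0.0100, -0.0200)})
print("accepted:", [(t, round(v, 4)) for t, v in accepted])
print("rejected:", [(t, round(v, 4)) for t, v in rejected])
```

The manual adjustment that follows truncation is a judgement step and is deliberately not modelled here.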

#### **3. Results and Discussion**

#### *3.1. Characteristics of Base Populations*

The base populations were distributed horizontally across the inland and island regions of the southern coast of Jeonnam-do and the southern part of Jeju-do. The latitudinal range was from 33.33° N (Sanghyo) to 34.47° N (Sacheon), and most stands were below 200 m above sea level, except for Seoguipo (465 m) and Seoho (602 m). The mean annual temperature (MAT), mean annual precipitation (MAP), and relative humidity (RH) of the base populations ranged 13.6~16.2 °C, 1453.4~1850.8 mm, and 69.7~78.2%, respectively. These areas belong to the warm temperate and subtropical regions of Korea, which are characterized by high temperature, high precipitation, and high humidity (Figure 1, Table 1).

Throughout the range of both species, most stands were discontinuously distributed, relatively small, and sparsely occurring, which would restrict genetic exchange and promote genetic differentiation. Flowering and fruiting were good in most stands; however, seedlings of the next generation were rarely observed within stands, except in some forest gaps. This phenomenon is commonly found in mature and old-growth forests with closed canopies [30,31]. In addition, both species were being outcompeted by companion species such as *Distylium racemosum*, *Acer palmatum*, *Ficus erecta*, *Machilus thunbergii*, *Litsea japonica*, and *Celtis sinensis*.

In most stands, except Seonhul and Cheongsu, inbreeding was estimated to be likely because the low dominance and frequency of the target species within a stand limit pollen-mediated gene flow. Morphological variation in leaves was also observed. For example, *Q. salicina* typically has leathery, narrow-lanceolate, taper-pointed leaves with mostly entire margins and a few marginal teeth near the apex. The upper surface of the leaves is glossy and the lower surface is grayish green, which makes the species easily distinguishable from other evergreen oaks. However, in Yesong, Daeya, and Sacheon, leaf size, shape, and lower-surface color varied from these typical characteristics. In relation to this, some varieties such as *Q. salicina* var. *latifolia* and *Q. glauca* var. *nudata* have been described as ecotypes [4]. Thus, it was difficult to determine whether some degree of genetic differentiation existed among stands of the target species.

While well-managed stands generally showed good growth performance, stands that had experienced illegal or destructive logging had many multi-stemmed or crooked trees owing to regeneration by sprouts [19]. Thus, the disturbance history of a stand also needed to be considered in candidate tree selection. For example, even a multi-stemmed individual could be selected as a candidate if each of its stems showed good growth characteristics.

Considering the investigation results, more active measures appear to be needed for the sustainable use of evergreen oaks. Although the distribution of evergreen oaks is expected to expand due to climate change [5,6], it may not expand as expected because of competition with other broad-leaved tree species. Even if new habitats are created by natural migration, genetic diversity may decrease due to founder effects, i.e., a limited number of individuals contribute to the next generation, and genetic drift may result in the loss or fixation of some alleles within stands [16]. Thus, assisted migration using seeds supplied from seed orchards with reasonable genetic diversity was suggested [7,8]. To this end, this study attempted to select plus trees from as many stands and individuals as possible, provided the selection criteria were met, to secure genetic diversity.

#### *3.2. Selection of Candidate Trees*

Following the selection criteria and methods, an intensive survey and selection were conducted in each base population. Since we tried to assess as many trees within a stand as possible, the number of assessed trees per site differed greatly depending on stand size as well as the density and frequency of the target species. For example, approximately 50~100 trees were assessed in small stands such as Yulpo, Goheung, and Gamsan, and approximately 250~300 in large stands such as Seonhul, Seoho, and Cheongsu. A total of 169 candidate trees were initially selected in the two target species (Table 3). While the selection intensity for *Pinus* species with large pure stands generally ranges 0.5~1.0% [23], it was approximately 5~8% in this study. Increment-borer sampling of the candidate trees showed that most belonged to age class IV (30~40 years old), indicating that they were mature enough to be selected. Although the optimum selection age may vary depending on the heritability and genetic correlations of the target traits, more than 50% of the rotation age is considered an appropriate selection age to reduce the risk of early selection [12,23].

In *Q. salicina*, 87 candidate trees were selected from 8 groups (12 stands). The largest number of candidate trees was selected in Sacheon (26 individuals) and the smallest in Yulpo (3 individuals). In *Q. glauca*, 82 candidate trees were selected from 6 groups (7 stands). The largest number was obtained in Cheongsu (33 individuals) and the smallest in Sayang (3 individuals).

Although comparisons among groups are limited because the values were measured only on selected candidate trees, significant differences among groups were observed in some characteristics (Table 4). In *Q. salicina*, the mean values of superiority of growth (SG), adaptability to disturbance (AD), superiority of seed production (SSP), and potential of seed production (PSP) differed significantly among groups, but those of superiority of tree form (STF) and adaptability to environment (AE) did not. In *Q. glauca*, all characteristics except adaptability to environment (AE) differed significantly among groups. These differences might reflect differences in local environment and growth conditions among groups.


**Table 4.** Multiple comparisons of group means among candidate trees in *Quercus salicina* and *Q. glauca* by Fisher's least significant difference (LSD) with Bonferroni adjustment.

Significant differences (*p* < 0.05) among groups are indicated by different letters. ns indicates not significant; \* and \*\* represent significance at the 95% and 99% probability levels, respectively.

Although the mean height (HT) and diameter at breast height (DBH) of candidate trees in each group are presented together (Table 4), candidate trees were selected using only the six characteristics. In other words, HT and DBH had no effect on candidate tree selection, because they were recorded after selection was completed. To understand the relationships among these variables, correlation analysis was conducted for each species (Table 5). Although the correlations were weak and their explanatory power therefore low, general tendencies among significantly correlated variables were examined.
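Table 5 was computed in R. As an illustration only, the stdlib-only Python sketch below computes Spearman's rank correlation as the Pearson correlation of ranks, giving tied values the average of their ranks; ties matter here because grades cluster at 4 and 5. Function names and example data are hypothetical, and significance tests are omitted.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical SG grades and heights (m) of six candidate trees.
sg = [5, 4, 5, 3, 4, 5]
ht = [14.2, 12.8, 15.1, 11.0, 13.5, 13.9]
print(round(spearman_rho(sg, ht), 3))
```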


**Table 5.** Spearman rank correlations among variables measured from candidate trees of *Quercus salicina* (upper diagonal) and *Q. glauca* (lower diagonal).

ns indicates not significant; \* and \*\* represent significance at the 95% and 99% probability levels, respectively.

In both species, SG and STF in the growth category were not correlated, nor were AD and AE in the adaptability category. However, SSP and PSP in the seed production category were significantly correlated. This means that candidate trees with higher current fruiting were also graded as having higher reproductive potential, although the evaluation criteria differed.

In *Q. salicina*, AD was significantly and positively correlated with SG and significantly and negatively correlated with SSP and PSP, indicating that candidate trees with high adaptability to disturbance also showed good growth and development but lower seed production. AE was positively correlated with STF, indicating that candidate trees with high adaptability to the environment had good tree form. HT of candidate trees was significantly and positively correlated with SG, AD, and PSP, and DBH was significantly and positively correlated with SG. This means that candidate trees with superior growth, high adaptability to disturbance, and high potential of seed production had greater HT and/or DBH.

In *Q. glauca*, AD was significantly and positively correlated with SG, as in *Q. salicina*. AE was significantly and positively correlated with SG, SSP, and PSP, indicating that candidate trees with higher adaptability showed better growth and seed production. Meanwhile, STF was negatively correlated with PSP, indicating that candidate trees with superior tree form had lower potential of seed production. HT was significantly and positively correlated with SG and AD. DBH was significantly and positively correlated with AD and PSP but negatively correlated with STF. This means that candidate trees with superior growth and high adaptability to disturbance had greater HT and/or DBH. However, unlike in *Q. salicina*, candidate trees with higher STF tended to have lower PSP and smaller DBH.

By analogy with the results of this survey and related research [32], these differences seem to be related to species characteristics. As a pioneer species, *Q. glauca* is favored during initial establishment but is disadvantaged in competition with other species once the stand stabilizes. *Q. salicina*, a generalist with higher adaptability across relatively broad environments, is estimated to be competitively advantageous once established.

#### *3.3. Data Analysis and Plus Tree Selection*

The weighted generalized value of each individual (*GVIw*) was calculated as *GVIw* = 0.7 × *GVIp* + 0.3 × *GVIt*. The candidate trees were truncated with a baseline value of 0, i.e., accepted if *GVIw* > 0 and rejected if *GVIw* < 0 (Figure 2).

**Figure 2.** Truncation selection of candidate trees based on *GVIw* values in *Quercus salicina* (top) and *Q. glauca* (bottom). Candidates were accepted if *GVIw* > 0 and rejected if *GVIw* < 0. Capital letters indicate the group to which each individual belongs.

In *Q. salicina*, 43 candidate trees were accepted and 44 were rejected; in *Q. glauca*, 42 were accepted and 40 were rejected (Table 6). In group A of *Q. salicina*, for example, the *GVIp* and *GVIt* of candidate tree No. 4 were 0.0242 and −0.0200, respectively. If only *GVIp* were considered, this tree would be accepted (>0), whereas if only *GVIt* were considered, it would be rejected (<0). With the *GVIw* proposed in this study (= 0.0109), it was classified as accepted (>0).
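As a check on the reported value, substituting the *GVIp* and *GVIt* of candidate tree No. 4 in group A into the weighting formula gives:

$$GVI_w = 0.7 \times 0.0242 + 0.3 \times (-0.0200) = 0.01694 - 0.00600 = 0.01094 \approx 0.0109$$

The positive within-group term outweighs the negative pooled term, which is why the tree is accepted under *GVIw*.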


**Table 6.** Final list of plus trees selected by truncation (*GVIw* > 0) and adjustment in *Quercus salicina* and *Q. glauca*.

The final selection was decided through an adjustment process that considered the imbalance in the number of individuals selected, stand-specific features identified during the investigation, and biased selection arising from differences in micro-environment within groups.

The major adjustments made in *Q. salicina* were as follows (Table 6). In group A, only 3 of 11 candidate trees were in the accepted group. To balance the number of candidate trees among groups, No. 6 (−0.0054) and No. 8 (−0.0179), which had higher *GVIw* values within the rejected group, were moved to the accepted group. In group C, the Seoho (602 m) and Sanghyo (465 m) stands were located at higher elevations than the other stands (below 200 m). To account for this stand specificity and to balance the number of candidate trees among groups, No. 8 (−0.0050) and No. 14 (−0.0050), which had higher *GVIw* values within the rejected group, were moved to the accepted group. In group D, the Sacheon1 and Sacheon2 stands were adjacent, separated by a ridge. However, their micro-environments differed slightly: soil depth, slope aspect, and topography were better in Sacheon1 than in Sacheon2. They were therefore initially treated as separate stands but were later combined into one group because of their geographic proximity. Consequently, 85.7% of the candidate trees in Sacheon1 (6 of 7) were accepted, whereas only 47.4% of those in Sacheon2 (9 of 19) were accepted. To balance the number of candidate trees within the group, No. 5 (0.0033) and No. 7 (0.0047), which had small *GVIw* values, were moved to the rejected group.

The following adjustments were made in *Q. glauca* (Table 6). In the Cheongsu stand of group B, candidate trees No. 8~13, 15, and 20~24 were selected from relatively good sites with deep soil and a low gravel ratio. All of them except No. 11 were included in the accepted group. To avoid biased selection toward a specific site condition, two candidate trees with lower *GVIw*, No. 10 (0.0172) and No. 14 (0.0188), were moved to the rejected group. The opposite case was found in group D: No. 3~5 were selected from a relatively poor site in the Jeongja stand, and all of them were rejected by *GVIw* truncation. To broaden genetic variability, No. 4 was moved to the accepted group. In the same sense, No. 3 (0.0208) in the Jangjoa stand of group E was moved to the rejected group, and No. 3 (−0.0228) and No. 11 (0.0067) in Seonhul of group A were moved to the accepted and rejected groups, respectively.

Although the primary goal of this study was to select plus trees, conservation was also considered. We therefore tried to include as many individuals as possible from stands in various environments; to this end, the selected stands were grouped and the adjustments described above were made. Nevertheless, we cannot be sure that sufficient genetic diversity will be captured in the breeding population. Additional studies are therefore required to evaluate genetic diversity, e.g., by comparing the genetic diversity of natural stands with that of the candidate trees, including rejected trees. If genetic diversity is lower than expected, adding some rejected trees to the breeding population may be an option.

Through the adjustment process, 44 candidate trees of *Q. salicina* and 41 of *Q. glauca* were finally selected as plus trees. To evaluate the selection effect, the mean values of the six characteristics, height, and diameter growth were compared between the accepted group (plus trees) and the rejected group (Table 7). On average, the accepted group outperformed the rejected group in all characteristics in both species (103.2~112.9% in *Q. salicina* and 105.0~112.8% in *Q. glauca*). The same tendency was found for HT and DBH, even though they were not directly considered at the time of selection. This was interpreted as the result of indirect selection for HT and DBH through selection on SG, AD, or STF, as shown in the correlation analysis (Table 5). Notably, selection effects similar to those of general selection methods were obtained without directly measuring or selecting growth-related traits.
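Table 7 is not reproduced in this section. Assuming the percentages express the accepted-group mean relative to the rejected-group mean (accepted / rejected × 100), the comparison can be sketched as follows; the function names and the example numbers are hypothetical.

```python
from statistics import mean


def relative_selection_effect(accepted, rejected):
    """Accepted-group mean as a percentage of the rejected-group mean."""
    return 100.0 * mean(accepted) / mean(rejected)


def selection_effects(accepted_by_trait, rejected_by_trait):
    """Apply relative_selection_effect to every trait present in both groups."""
    shared = accepted_by_trait.keys() & rejected_by_trait.keys()
    return {t: relative_selection_effect(accepted_by_trait[t], rejected_by_trait[t])
            for t in sorted(shared)}


# Hypothetical values for accepted (plus) and rejected trees.
accepted = {"SG": [5, 5, 4, 5], "HT": [14.0, 15.2, 13.8, 14.6]}
rejected = {"SG": [4, 4, 5, 4], "HT": [13.1, 12.9, 13.6, 13.0]}
for trait, pct in selection_effects(accepted, rejected).items():
    print(f"{trait}: {pct:.1f}%")
```

A value above 100% means the plus trees exceed the rejected trees for that trait.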


**Table 7.** Comparison of mean values of six characteristics, height, and diameter growth between accepted trees (plus trees) and rejected trees in *Quercus salicina* and *Q. glauca*.

#### *3.4. Implications for Evergreen Oak Breeding*

Evergreen oaks are distributed only in the southernmost part of the Korean peninsula, and their natural stands are small and discontinuously scattered [3,4]. Genetic conservation of evergreen oaks is also highly recommended [11,33]. Accordingly, it is difficult to secure large stands for seed production areas or seed stands. Thus, establishing multi-purpose seedling seed orchards (MPSSO) through mass selection, for both seed production and genetic conservation, was suggested as an alternative.

There are several selection methods, but the most important thing is to find the best method for the target species [16]. Comparison tree selection has been widely used for most conifer species as well as deciduous oak species in Korea [23].

In this study, modified selection criteria and methods were applied for plus tree selection. This was primarily due to the characteristics of the target species, as mentioned above. Another reason was the results of Stringer et al. [17], who suggested that, in *Quercus rubra*, selecting several phenotypically above-average candidate trees may be more effective than rigorously selecting a smaller number of phenotypically superior trees. Reflecting these aspects, candidate trees were truncated for plus tree selection using a baseline value of *GVIw* in this study.

Through this process, 44 candidate trees of *Q. salicina* and 41 of *Q. glauca* were finally selected as plus trees. Because this method provided a selection effect similar to that of direct selection on growth-related traits, it is expected to be applicable to other tree species if the conditions are met.

Most tree improvement programs are designed to provide continued gain through each of many cycles of improvement. A selection procedure that involves many cycles of selection and breeding is known as recurrent selection (RS). A number of recurrent selection schemes have been devised to utilize general combining ability (GCA) and, in some cases, specific combining ability (SCA) [12,16].

For *Q. salicina* and *Q. glauca*, simple recurrent selection (SRS) was suggested as the strategy for advanced generations. Although SRS is less efficient at achieving genetic gain than other forms of recurrent selection, i.e., RS for GCA and RS for SCA, it is considered a more reasonable approach that matches the breeding goal and current situation of *Q. salicina* and *Q. glauca*. It could serve both to increase the average genetic gain in each generation and to maintain as much genetic variation as possible in the selected population.

Another reason for this suggestion is forest policy in Korea. Reforestation tree species are classified into two categories, i.e., major and minor tree species. Major tree species are those with high demand and large planting areas, e.g., deciduous oaks, pines, and larches. Minor tree species are the opposite; evergreen oaks and most other broad-leaved trees belong to this category. Accordingly, RS-GCA and RS-SCA strategies are mainly applied to major tree species, and in practice it is difficult to invest much effort and expense in minor tree species [29,34].

Nevertheless, to compensate for the disadvantages of SRS and to use GCA information for advanced-generation breeding, operating several MPSSOs was suggested as an alternative. At present, discussion is underway to determine the locations and number of sites for establishing MPSSOs in Korea.

#### **4. Conclusions**

Although evergreen oaks, including *Q. salicina* and *Q. glauca*, are restricted to subtropical regions in the southernmost part of the Korean peninsula, they are considered an alternative species for adaptation to future climate change. Accordingly, the Korea Forest Service is pursuing expanded plus tree selection and seed orchard establishment to address both the conservation and the utilization of evergreen oaks. However, no selection methodology for evergreen oaks has yet been established to support this. In this study, we sought a suitable methodology for plus tree selection in evergreen oaks, and its usefulness was confirmed by applying it to plus tree selection in *Q. salicina* and *Q. glauca*. The processes and results presented here are therefore expected to serve as a guideline for plus tree selection as well as for the conservation of evergreen oaks.

**Author Contributions:** Conceptualization and methodology, I.S.K. and K.M.L.; investigation, K.M.L., J.J.K. and I.S.K.; data curation, J.J.K. and K.M.L.; formal analysis, D.S. and H.-I.K.; writing—original draft preparation, I.S.K. and K.M.L.; writing—review and editing, D.S., H.-I.K. and I.S.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We are greatly indebted to W.Y. Choi, J.C. Lee, and S.S. Chang for their valuable suggestions and help with the field investigation. This study was supported by the National Institute of Forest Science, Republic of Korea (FG0400-2019-01).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Genetic Diversity of** *Paeonia rockii* **(Flare Tree Peony) Germplasm Accessions Revealed by Phenotypic Traits, EST-SSR Markers and Chloroplast DNA Sequences**

#### **Xin Guo, Fangyun Cheng \* and Yuan Zhong**

Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Peony International Institute, Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing 100083, China; freshxiaoxinxin@163.com (X.G.); zhongyuanbjfu@126.com (Y.Z.) **\*** Correspondence: chengfy8@263.net; Tel.: +86-010-62338027

Received: 16 May 2020; Accepted: 10 June 2020; Published: 12 June 2020

**Abstract:** Research Highlights: This study is based on the first collection of cultivated *Paeonia rockii* (flare tree peony, FTP) germplasm assembled across the main distribution area according to our breeding objectives. It comprehensively evaluates these accessions using phenotypic traits, expressed sequence tag (EST)-simple sequence repeat (SSR) markers and chloroplast DNA sequences (cpDNA). The results show that the accessions we selectively collected can represent the genetic background of FTP as a tree crop germplasm. Background and Objectives: FTP has traditionally had high cultural, ornamental and medicinal value, and has recently gained importance as an emerging source of edible oil owing to the high α-linolenic acid content of its seeds. The objectives of this study are to characterize the genetic diversity of FTP and to provide scientific suggestions for its use in tree peony breeding and for the conservation of its germplasm resources. Materials and Methods: Using phenotypic traits, EST-SSR markers and chloroplast DNA sequence variation, we studied the diversity of a newly established population of 282 FTP accessions that we collected and propagated in our breeding project in recent years. Results: (1) The accessions showed abundant phenotypic variation, which was evenly distributed within the population without significant hierarchical structure; (2) the EST-SSR data showed that the 282 accessions had relatively high genetic diversity, with a total of 185 alleles detected by 34 primer pairs, and the accessions were divided into three distinct groups; and (3) the cpDNA data indicated that the genetic diversity of these accessions was higher than that of wild *P. rockii* at the population level but lower than at the species level, and that their spatial genetic structure can be divided into two branches.
Conclusions: Taken together, the three analyses show that these accessions can fully reflect the genetic background of FTP germplasm resources, so their protection and utilization will be of great significance for the genetic improvement of woody peonies.

**Keywords:** *Paeonia rockii* (flare tree peony) germplasm accessions; phenotypic traits; EST-SSR markers; chloroplast DNA sequences; genetic diversity

#### **1. Introduction**

Tree peony, or Mudan, known in China as "the king of flowers", has traditionally had high cultural, ornamental and medicinal value and has recently gained importance as an emerging source of edible oil, with seeds high in unsaturated fatty acids (>90%) and α-linolenic acid (>40%) [1,2]. Tree peonies belong to section *Moutan* of the genus *Paeonia*, and all wild species are endemic to China. Tree peony has been cultivated since the Tang Dynasty (618–907 AD) and is now widespread in temperate regions of China and other countries. At present, there are more than 1500 tree peony germplasm accessions worldwide, classified into about 17 cultivar groups. In China, long-term, repeated domestication and selective breeding have produced ten groups comprising about 1000 germplasm accessions with diverse genetic backgrounds [3].

*Paeonia suffruticosa* Andrews, the common tree peony (CTP), is the most widely cultivated tree peony and includes the germplasm accessions traditionally cultivated in China and Japan, whereas the accessions derived from *P. rockii*, the flare tree peony (FTP), form a group distinct from CTP and are well known for the colored flare at the base of the petals [3]. Compared with CTP, FTP is in most cases more fertile and more tolerant of stresses such as cold, drought and poor soil. FTP also differs from CTP in many other features, such as more fragrant blossoms and taller, sturdier and longer-lived plants (Figure 1) [1,4]. Since many studies [1,5–11] have confirmed that FTP is a valuable genetic resource of tree peonies, with great promise for peony breeding and industry, further in-depth study of FTP is clearly needed for its utilization and protection as a tree crop germplasm resource.

**Figure 1.** The morphological characteristics of *P. rockii* plants: (**a**) *P. rockii* has more fragrant blossoms and taller, sturdier and longer-lived plants; (**b**) there is a colored flare at the base of the petals; and (**c**) the plants are highly fertile, and the ripe follicles contain a large number of seeds.

The analysis of genetic diversity can provide useful data for germplasm collection, breeding applications and research on origin and evolution [12]. Phenotypic variation analysis is one of the most important methods for developing and applying plant germplasm resources: it is basic and intuitive, yet indispensable. In tree peonies (*Paeonia* Sect. *Moutan*), researchers have studied the genetic diversity of phenotypic traits in different species and cultivated germplasm, including *P. delavayi* [13], *P. ostii* [14,15], *P. delavayi* var. *lutea* [16] and *P. suffruticosa* [17–19]. In recent years, molecular markers have greatly accelerated the development of genetics and genomics and are expected to be a powerful tool for accelerating tree peony breeding research. Researchers have used random amplified polymorphic DNA (RAPD), inter-simple sequence repeat (ISSR), sequence-related amplified polymorphism (SRAP), conserved DNA-derived polymorphism (CDDP), inter-primer binding site (iPBS), target region amplified polymorphism (TRAP), expressed sequence tag microsatellite (EST-SSR) markers, chloroplast DNA sequences (cpDNA), and EST-SSR combined with cpDNA to analyze the genetic diversity of *P. suffruticosa* [20–26], *P.* x *yananensis* [27], *P. rockii* [28,29], *P. delavayi* [30], *P. ludlowii* [31], *Paeonia* subsect. *Delavayanae* [32,33], *P. decomposita* [34], *P. jishanensis* [35], *P. ostii* [36] and *P. qiui* [37]. In *P. rockii*, Yuan et al. [28,29] used cpDNA and SSR markers to analyze the genetic diversity of 20 wild populations and showed that wild *P. rockii* has high genetic diversity at the species level and can be divided into three genetically distinct clusters corresponding to three geographic regions. Xu et al. [37] used cpDNA and EST-SSR markers to analyze the genetic diversity of *P. rockii*, *P. jishanensis* and *P. qiui* and found that the genetic diversity of *P. rockii* was relatively low among the three species. In a population of 462 cultivated *P. rockii* seedlings, Wu et al. [38] used 40 EST-SSR markers to reveal medium genetic diversity and a genetic structure composed of three subgroups. These studies provide very useful references for further research on the genetic diversity of tree peony germplasm.

With their high polymorphism, codominance and ease of use, EST-SSR markers are increasingly applied in genetic diversity studies [39]. The chloroplast genome is maternally inherited and retains ancestral patterns of genetic diversity longer than nuclear DNA [40]. Moreover, in outcrossing breeding systems, the effective population size of cpDNA is only half that of nuclear DNA, which may lead to greater genetic differentiation and more pronounced genetic structure [41]. Non-coding regions of the chloroplast genome have higher mutation rates than coding regions and can therefore be used for population genetic research [42,43].

Based on phenotypic traits, EST-SSR markers and chloroplast DNA sequence variation, this paper reports the genetic diversity of a newly established population of 282 FTP accessions that we collected, propagated and, in most cases, named as cultivars in our breeding project in recent years. Our objectives are to characterize the genetic diversity of these accessions and to provide scientific suggestions for their use in peony breeding and for the conservation of FTP germplasm resources.

#### **2. Materials and Methods**

#### *2.1. Plant Materials and DNA Extraction*

Since 1990, FTP germplasm has been selectively collected, according to our breeding objectives, across the main cultivation area of FTP (Lanzhou city and Zhang, Longxi, Lintao and Linxia counties of Gansu province, Northwest China), and since 2011 it has been propagated by grafting in the Guose Peony Garden, Yan Qing district, Beijing, China (40°46′ N, 116°07′ E) (Figure 2). The plants of the 282 accessions used in this study are planted randomly in the same nursery field and maintained under the same water and fertilizer regime throughout the year.

In March 2018, young leaves of each accession were collected individually and dried in silica gel, and total genomic DNA was extracted with a DNAsecure plant kit (Tiangen Biotech, Beijing, China). DNA quality and concentration were checked by electrophoresis on 2% agarose gels and with a Unico UV–visible spectrophotometer (Agilent, Palo Alto, CA, USA), respectively. The samples were diluted to 20–30 ng/μL with deionized water and stored at −20 °C.

All 282 accessions were used for the EST-SSR analysis, whereas phenotypic traits were recorded only on fully mature, 15-year-old plants of 180 accessions. Flower color is genetically stable within each accession, and the studied accessions fall into five color groups: 89 white, 33 pink, 32 red, 116 purplish red and 12 purplish black. Sampling within these color groups, we selected 80 accessions for cpDNA sequencing: 24 white, 10 pink, 8 red, 35 purplish red and 3 purplish black, corresponding to 26.97%, 30.30%, 25.00%, 30.17% and 25.00% of each group, respectively (Supplementary Table S1).
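The stratified cpDNA sample above can be checked with a few lines of Python. This is a minimal sketch: the group sizes and sample counts are those stated in this section, while the dictionary and function names are our own illustrative choices. It confirms that the five color groups sum to 282 accessions, that 80 were sequenced, and that each group was sampled at roughly a quarter to a third of its size.

```python
# Color-group sizes among the 282 accessions and the number of accessions
# drawn from each group for cpDNA sequencing (figures from Section 2.1).
GROUP_SIZES = {
    "white": 89,
    "pink": 33,
    "red": 32,
    "purplish red": 116,
    "purplish black": 12,
}
SELECTED = {
    "white": 24,
    "pink": 10,
    "red": 8,
    "purplish red": 35,
    "purplish black": 3,
}


def sampling_proportions(sizes, selected):
    """Return the percentage of each color group that was sampled (2 dp)."""
    return {g: round(100 * selected[g] / sizes[g], 2) for g in sizes}


if __name__ == "__main__":
    print("total accessions:", sum(GROUP_SIZES.values()))
    print("sampled for cpDNA:", sum(SELECTED.values()))
    for group, pct in sampling_proportions(GROUP_SIZES, SELECTED).items():
        print(f"{group}: {pct:.2f}%")
```

Sampling within color groups, rather than at random from the whole collection, keeps rare colors such as purplish black from being missed entirely.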

**Figure 2.** Geographic origins of the flare tree peony (FTP) accessions used in this study. The accessions were selectively collected in the FTP cultivation center (dotted area) in central Gansu province, with the blue dots marking the collection sites, and were then propagated clonally and transplanted to the conservation site in Beijing (green dot). The red dots represent Luoyang and Heze, the two traditional cultivation centers of common tree peony (CTP) in China.

#### *2.2. Measurement of Phenotypic Traits*

We recorded 28 phenotypic traits according to the measurement standard (Table 1) during flowering (April to May) and seed ripening (August to September) in 2019. During flowering, three plants per accession were randomly selected as replicates, and three flowering branches were randomly selected on each plant. We first recorded the number of flowers, plant height, crown breadth and tiller number of the three plants; we then measured the diameter and counted the carpels of each of the nine flowers. One outer petal was randomly selected from each flower to measure petal length and width and flare length and width, and the number of petals of each flower was counted. Finally, the second or third compound leaf from the bottom of each branch was selected for leaf trait measurements. From August to September, we returned to the same three plants, and the same three branches per plant, that had been measured during flowering. First, we recorded the fruit number and fruit weight per plant for the three plants. We then measured individual fruit weight, number of seeds per fruit, seed weight per fruit and effective carpel number. Lastly, we randomly selected a single carpel from each follicle to measure fruit length, fruit width, fruit height and pericarp thickness.


**Table 1.** Traits investigated in the 180 accessions and their measurement standards.

#### *2.3. Microsatellites Markers*

From 72 EST-SSR primer pairs distributed across 5 linkage groups of the first high-density genetic map of *P. suffruticosa* [44,45], we selected 34 pairs with high amplification efficiency and high polymorphism. The primer sequences are listed in Table 2; each forward primer was labeled with one of 5-carboxyfluorescein (5-FAM), 5-hexachlorofluorescein (5-HEX), 5-carboxytetramethylrhodamine (5-TAMRA) or 5-carboxy-X-rhodamine (5-ROX). The polymerase chain reaction (PCR) was carried out in a 10 μL volume containing 5 μL of 2× Power Taq PCR Master MIX (Aidlab Biotechnologies, Beijing, China), 0.5 μL each of forward and reverse primer (10 μmol/L), 1 μL of genomic DNA (20–25 ng/μL) and 3 μL of ddH2O. The SSR-PCR amplification program followed Wu et al. [45]. Amplified fragments were detected by capillary electrophoresis on an ABI3730xl DNA Analyzer with a GeneScan-500LIZ size standard (Applied Biosystems, Carlsbad, CA, USA).
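The reaction recipe above can be sanity-checked with the dilution equation C1 × V1 = C2 × V2. The sketch below is illustrative only: the volumes and stock concentrations come from this section, while the helper names (`total_volume`, `final_concentration`) are hypothetical. It shows that the components add up to 10 μL, that each primer ends at 0.5 μmol/L, that the 2× master mix ends at 1×, and that each reaction receives about 20–25 ng of template.

```python
# Hypothetical helpers for checking a PCR recipe (not part of the original
# protocol): total volume and final concentration via C2 = C1 * V1 / V2.
SSR_MIX_UL = {  # volumes in microlitres, from Section 2.3
    "2x master mix": 5.0,
    "forward primer (10 umol/L)": 0.5,
    "reverse primer (10 umol/L)": 0.5,
    "genomic DNA (20-25 ng/uL)": 1.0,
    "ddH2O": 3.0,
}


def total_volume(mix):
    """Sum of all component volumes (uL)."""
    return sum(mix.values())


def final_concentration(stock_conc, stock_volume, total):
    """Dilution equation C2 = C1 * V1 / V2; units follow stock_conc."""
    return stock_conc * stock_volume / total


if __name__ == "__main__":
    vol = total_volume(SSR_MIX_UL)
    print(f"total volume: {vol} uL")
    print(f"each primer: {final_concentration(10.0, 0.5, vol)} umol/L")
    print(f"master mix: {final_concentration(2.0, 5.0, vol)}x")
    # 1 uL of a 20-25 ng/uL template gives 20-25 ng per reaction.
    print(f"template: {20 * 1.0}-{25 * 1.0} ng per reaction")
```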





#### *2.4. Chloroplast DNA Sequences*

Because the same chloroplast fragment evolves at different rates in different plants, the optimal fragments must be selected for each species [46–48]. From previously reported chloroplast fragments, we selected three intergenic spacers, *pet*B-*pet*D, *acc*D-*psa*I and *psb*E-*pet*L, that amplify stably and are highly polymorphic, using existing universal primers [26,49]. Primer information is shown in Table 3. These three primer pairs were used to assess the genetic diversity of 80 accessions representing the typical flower colors among the 282 accessions. PCR was carried out in a 50 μL volume containing 25 μL of 2× Power Taq PCR Master MIX (Aidlab Biotechnologies, Beijing, China), 2.5 μL each of forward and reverse primer (10 μmol/L), 2 μL of genomic DNA (20–25 ng/μL) and 18 μL of ddH2O. The PCR program was as follows: 5 min at 94 °C; 35 cycles of 30 s at 94 °C, 30 s at 52–54 °C and 50 s at 72 °C; and a final 7 min at 72 °C. Amplified fragments were detected by capillary electrophoresis on an ABI3730xl DNA Analyzer (Applied Biosystems, Carlsbad, CA, USA).
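The cpDNA thermal-cycling program above can be written down as data, which makes the cycle structure explicit and gives a rough run-time estimate. This is a sketch under a stated assumption: temperature ramp times between steps are ignored, so the figure is total hold time only and a real run will take somewhat longer. The `Step` class and function name are hypothetical.

```python
# Sketch of the cpDNA PCR programme (Section 2.4) as data. ASSUMPTION:
# ramp times are ignored, so this is hold time only.
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: str   # text, because annealing is given as a 52-54 C range
    seconds: int


INITIAL = [Step("initial denaturation", "94", 5 * 60)]
CYCLE = [
    Step("denaturation", "94", 30),
    Step("annealing", "52-54", 30),
    Step("extension", "72", 50),
]
FINAL = [Step("final extension", "72", 7 * 60)]
N_CYCLES = 35


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Initial holds + n_cycles * one cycle + final holds, in seconds."""
    return (
        sum(s.seconds for s in initial)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in final)
    )


if __name__ == "__main__":
    total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
    print(f"{total} s = {total / 60:.1f} min of hold time")
```

Running it gives 4570 s (about 76 min) of hold time: 300 s + 35 × 110 s + 420 s.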



#### *2.5. Statistical Analyses*

#### 2.5.1. Analysis of Phenotypic Data

Per-plant mean values were used in the statistical analysis, and the data were analyzed with SPSS version 18.0 (IBM Inc., Chicago, IL, USA) [50]. One-way analysis of variance (ANOVA) with Duncan's multiple range post hoc test (*p* < 0.05 and *p* < 0.01) was used to analyze variation in the 28 traits, and Pearson's correlations between traits were calculated. Before the ANOVA and correlation analyses, all data were tested for normality with the Shapiro–Wilk W test and for homogeneity of variance with Levene's test, and non-normal data were log-transformed. The *p*-values of Pearson's correlations were adjusted with the Benjamini–Hochberg (BH) false discovery rate (FDR) correction. The coefficient of variation (%) was calculated as (standard deviation/mean) × 100. Principal component analysis (PCA) was performed on the 28 traits, with factors rotated by the varimax (maximum variance) method, and principal components were extracted according to the Kaiser criterion (eigenvalue > 1). Then, using the first (PC-1) and second (PC-2) principal components as axes, the 28 quantitative traits and the 180 accessions were each plotted.
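Two of the calculations named above, the coefficient of variation and the Benjamini–Hochberg adjustment, are simple enough to sketch in plain Python. This is an illustration, not the SPSS procedure the authors used: we assume the sample standard deviation (n − 1 denominator) in the CV, and the function names are our own. With 28 traits, the BH correction spans 28 × 27 / 2 = 378 trait pairs, the same number of pairs reported in Section 3.1.3.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = standard deviation / mean * 100, using the sample (n - 1) SD."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    With raw p-values sorted as p_(1) <= ... <= p_(m), the adjusted value
    for rank i is min over k >= i of (m / k) * p_(k), capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of its own (m / rank) * p and all larger-rank values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# 28 traits give 28 * 27 / 2 pairwise correlations to correct jointly.
N_TRAIT_PAIRS = 28 * 27 // 2

if __name__ == "__main__":
    print(N_TRAIT_PAIRS)
    print(round(coefficient_of_variation([1.0, 2.0, 3.0]), 2))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The step-up walk from the largest p-value is what keeps the adjusted values monotone in the raw p-values, which a naive `p * m / rank` would not guarantee.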

#### 2.5.2. Analysis of EST-SSR Data

GeneMarker V2.2.0 was used to analyze the capillary electrophoresis data. GenAlEx version 6.5 [51] was used to calculate the number of different alleles (Na), number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), inbreeding coefficient (FIS) and Nei's genetic diversity (GD). The polymorphism information content (PIC) of each locus was calculated with the Microsatellite Toolkit. GENEPOP version 4.2 [52] was used to test whether microsatellite loci deviated from Hardy–Weinberg equilibrium (HWE). A Neighbor-Joining phylogenetic tree based on Nei's unbiased genetic distance was constructed with MEGA-X [53]. Finally, principal coordinates analysis (PCoA) based on Nei's genetic distance among all accessions was carried out in GenAlEx version 6.5.
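The per-locus indices listed above follow standard population-genetic definitions, sketched below for a single diploid locus. This is a minimal illustration, not GenAlEx or Microsatellite Toolkit code: it uses the uncorrected He = 1 − Σp², Ne = 1/Σp², I = −Σp ln p, F = (He − Ho)/He and the usual PIC formula, and the function name and toy genotypes are hypothetical. The toy locus, in which every accession is heterozygous, yields a negative F, the heterozygote-excess pattern reported in Section 3.2.1.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Diversity statistics for one diploid SSR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per accession.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    he = 1 - sum_p2                                   # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / n   # observed heterozygosity
    # PIC = He - sum over allele pairs i < j of 2 * p_i^2 * p_j^2
    pic = he - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "Na": len(counts),
        "Ne": 1 / sum_p2,
        "I": -sum(p * math.log(p) for p in freqs),
        "Ho": ho,
        "He": he,
        "F": (he - ho) / he if he > 0 else float("nan"),
        "PIC": pic,
    }


if __name__ == "__main__":
    # Hypothetical toy locus: four accessions, all heterozygous 150/152 bp.
    toy = [(150, 152)] * 4
    for key, value in locus_stats(toy).items():
        print(key, round(value, 3))
```

Because PIC subtracts a non-negative term from He, PIC can never exceed He at any locus, which is a useful consistency check when tabulating these indices.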

#### 2.5.3. Analysis of Chloroplast DNA Sequences

MAFFT version 7 [54] was used for multiple sequence alignment and adjustment. SequenceMatrix 1.7.8 [55] was used to concatenate the three chloroplast non-coding regions, *acc*D-*psa*I, *psb*E-*pet*L and *pet*B-*pet*D, into a single sequence. Haplotype diversity (Hd), haplotype number (Hap) and nucleotide diversity (Pi) were calculated with DnaSP 5.0 [56]. The haplotype network of the chloroplast DNA haplotypes was constructed in PopART [57] with the median-joining (MJ) algorithm. MEGA-X [53] was used to construct a phylogenetic tree of the 3 haplotypes with the maximum likelihood (ML) method. In addition, phylogenetic trees of the sequences of the 80 accessions were constructed with the maximum parsimony (MP) method in PAUP 4.0 [58]. The consistency index, retention index and composite index were all 1.000000, both for all sites and for parsimony-informative sites.

#### **3. Results**

#### *3.1. Genetic Diversity Based on Phenotypic Traits*

#### 3.1.1. Phenotypic Traits Variation

ANOVA showed that, except for the number of flowers per plant, the remaining 27 quantitative traits differed very significantly among the 180 accessions (*p* < 0.01) (Table 4). Since all investigated plants were adults with stable characteristics grown under uniform cultivation conditions, the phenotypic variation can be attributed mainly to genotypic differences. Descriptive statistics at the whole-population level showed that the coefficients of variation of the flower, stem and leaf, and fruit traits ranged from 12.33% to 108.73%, indicating a very high degree of variation. The flower traits alone spanned this entire range: petal number varied most markedly (108.73%), whereas the other flower traits varied only weakly among accessions, with flower diameter (12.33%) and petal length (12.33%) showing the smallest coefficients of variation. Among the branch and leaf traits, the coefficients of variation of tiller number (90.53%) and of fruit number (66.86%) and flower number (57.43%) per plant were relatively high. The coefficients of variation of the fruit traits ranged from 17.84% to 99.93% (average 48.50%), with seed weight per fruit showing the largest (99.93%), followed by seed number per fruit (94.30%). The substantial phenotypic variation among the FTP accessions thus provides valuable germplasm resources for the selective breeding of tree peonies.

#### 3.1.2. Distribution of Phenotypic Traits Variation in the Population

PCA of the 28 quantitative traits showed that the first eight principal components together accounted for 70.516% of the total variance (Table 5). The first principal component explained 17.722%. The traits that determined the first principal component were, in descending order, seed weight per fruit, individual fruit weight, number of seeds per fruit, fruit width, fruit height, fruit length and fruit weight per plant, all of which are fruit traits; the first principal component therefore represents variation in fruit traits and can be defined as the fruit character factor. The traits that determined the second principal component were fruit number, flower number, east-west crown breadth, north-south crown breadth and fruit weight per plant. The third principal component consisted mainly of petal length, petal width, flower diameter, flare length and flare width, all of which are floral traits, and can be defined as the flower character factor. The fourth principal component consisted of compound leaf length, petiole length, compound leaf width, terminal leaflet length and terminal leaflet width, all of which are stem and leaf traits. The fifth principal component was influenced mainly by the length and width of the terminal leaflet. The fourth and fifth principal components can together be defined as the branch and leaf character factor. Thus, the PCA of quantitative traits describes how phenotypic variation is distributed across the whole population.
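The percentages above relate to eigenvalues in a simple way. Assuming, as is typical in SPSS, that the PCA was run on the correlation matrix of standardized traits, the eigenvalues sum to the number of traits (28), so the 17.722% explained by PC-1 corresponds to an eigenvalue of about 0.17722 × 28 ≈ 4.96, and the Kaiser criterion keeps every component whose eigenvalue exceeds 1. The pure-Python sketch below computes eigenvalues of a symmetric matrix by cyclic Jacobi rotations and then applies the Kaiser rule; it does not reproduce the varimax rotation, and the three-trait correlation matrix is a hypothetical toy, not data from this study.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_summary(eigenvalues):
    """For eigenvalues sorted in descending order: % variance per component,
    number kept by the Kaiser rule (eigenvalue > 1), and their cumulative %."""
    total = sum(eigenvalues)  # equals the number of traits for a correlation matrix
    contrib = [100 * ev / total for ev in eigenvalues]
    kept = sum(1 for ev in eigenvalues if ev > 1)
    return contrib, kept, sum(contrib[:kept])


if __name__ == "__main__":
    # Hypothetical toy correlation matrix for three traits.
    R = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    eig = jacobi_eigenvalues(R)
    contrib, kept, cum = kaiser_summary(eig)
    print([round(e, 3) for e in eig], kept, round(cum, 2))
```

For the toy matrix the eigenvalues are 2, 0.5 and 0.5, so the Kaiser rule keeps one component explaining two thirds of the variance. In practice a library routine such as a symmetric eigensolver would replace the hand-written Jacobi loop.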


**Table 4.** Descriptive statistics for quantitative traits measured in 180 accessions.

Note: *p* < 0.05: Significant difference; *p* < 0.01: Very significant difference. The traits corresponding to the serial number are shown in Table 1.

In addition, in the plot of all quantitative traits on the first two principal components, fruit traits were distributed along the horizontal axis, stem and leaf traits along the vertical axis, and flower traits near the origin (Figure 3a). The 180 accessions were evenly scattered across the plane defined by the first and second principal components (Figure 3b), indicating that the variation in these phenotypic traits was evenly distributed in the population and formed no obvious hierarchical structure.


**Table 5.** The principal component analysis (PCA) of 28 quantitative traits.

Note: The traits corresponding to the serial number are shown in Table 1.

**Figure 3.** Principal component (PC) coordinate distribution of traits and accessions: (**a**) distribution of the 28 quantitative traits of the population on the PC-1 and PC-2 axes; the traits corresponding to the serial numbers are shown in Table 1. (**b**) Distribution of the 180 accessions on the PC-1 and PC-2 axes.

#### 3.1.3. Correlation Analysis of Quantitative Traits

Correlation analysis of the 28 quantitative traits showed that the traits were correlated to varying degrees (Table 6). Of the 378 trait pairs, 205 showed significant correlations (*p* < 0.05), of which 191 were very significant (*p* < 0.01). In breeding tree peonies for ornamental use, we focus mainly on improving flower diameter and petal number. Seventeen traits were very significantly and positively correlated with flower diameter, including all floral traits except petal number and all fruit traits except effective carpel number. Petal number was negatively correlated with flower diameter and positively correlated only with effective carpel number. This indicates that flower and fruit traits are closely related and that the two ornamental traits we want to improve are negatively correlated. For oil tree peony breeding, we need accessions with more fruits per plant and greater seed weight per fruit. Table 6 shows that fruit weight per plant was very significantly and positively correlated with both of these traits. Conventional breeders therefore do not need to remove the pericarp to weigh the seeds; simply weighing the fruit per plant allows a preliminary selection of high-seed-yield accessions.

#### *3.2. Genetic Diversity Based on the EST-SSR*

#### 3.2.1. EST-SSR Polymorphism

Amplification of the 282 accessions with 34 SSR primer pairs generated 185 alleles; the number of alleles per locus ranged from two to thirteen, with an average of 5.441 (Table 2). Observed heterozygosity (Ho) and expected heterozygosity (He) ranged from 0.004 to 0.993 and from 0.004 to 0.815, with means of 0.537 and 0.489, respectively. Of the 34 SSR loci, 22 (PS004, PS026, PS047, PS068, PS073, PS074, PS095, PS119, PS139, PS157, PS166, PS187, PS221, PS260, PS265, PS296, PS309, PS311, PS337, PS345, PS356 and PS367) had lower He than Ho. Overall, mean Ho was higher than mean He, indicating a heterozygote excess in these accessions. The effective number of alleles (Ne) was 2.291 ± 0.166, the Shannon information index (I) was 0.908 ± 0.072, and PIC across the 34 loci of the 282 accessions ranged from 0.004 to 0.792. According to Shannon's information index, PS095 had the highest genetic diversity (I = 1.857), followed by PS356 (I = 1.476), PS074 (I = 1.457) and PS166 (I = 1.455), whereas PS323 had the lowest. The mean PIC of the SSR markers in this study was 0.611. In addition, 24 SSR loci deviated significantly from HWE at *p* < 0.001 and 2 at *p* < 0.01.

#### 3.2.2. Genetic Relationships among 282 Accessions

In the NJ tree, the 282 accessions were first divided into 3 major branches (Figure 4a) comprising 175, 87 and 20 accessions. Branch I included 61 white, 25 pink, 21 red, 59 purplish red and 9 purplish black accessions; branch II included 23 white, 8 pink, 9 red, 44 purplish red and 3 purplish black accessions; and branch III included 5 white, 2 red and 13 purplish red accessions. Accessions of each color were distributed relatively evenly among the clusters without any regular pattern, consistent with the PCoA results (Figure 4b). Some accessions with similar traits were genetically distant, whereas some with very different phenotypes were genetically close. For example, accessions '73' and '74', both with single white flowers, were very distantly related, whereas accessions '40' and '75', one with purplish red double flowers and the other with white single flowers, were particularly closely related. This also indicates that there is no direct relationship between genetic distance and flower color among the accessions.


**Table 6.** Pearson correlation coefficients of phenotypic traits of 180 accessions.

**Figure 4.** Relationships among 282 accessions: (**a**) Neighbor-Joining (NJ) phylogenetic tree based on the data of 34 EST-SSR markers, (**b**) Principal coordinates analysis (PCoA) based on Nei's unbiased genetic distance.

#### *3.3. Genetic Diversity Based on the cpDNA*

The three chloroplast non-coding regions *acc*D-*psa*I, *psb*E-*pet*L and *pet*B-*pet*D were sequenced in 80 accessions. The combined length of the three fragments was 1580 bp. A total of 4 variable sites were detected in the combined fragment, all single-base substitutions, defining three haplotypes. The *pet*B-*pet*D fragment contained 2 variable sites, and each of the other two fragments contained one (Table 7). According to DnaSP 5.0 calculations for the three combined fragments, the total Hd was 0.164 and Pi was 0.28 × 10<sup>−3</sup>.

**Table 7.** The variations in *acc*D-*psa*I, *psb*E-*pet*L and *pet*B-*pet*D regions among 80 accessions.


Note: A, T, C and G denote the nucleotide bases.

In the haplotype network (Figure 5a), Haplotype 1 was located at the center of the network and was clearly the most widespread haplotype; it occurred in 73 accessions and may be the ancestral haplotype. Haplotypes 2 and 3 occurred in 6 and 1 accessions, namely '98', '120', '140', '146', '150' and '187', and '128', respectively. Figure 5a shows that Haplotype 1 was closer to Haplotype 3 than to Haplotype 2, consistent with the phylogenetic relationships among the 3 haplotypes (Figure 5b). In the tree, the three haplotypes clustered into two branches, with Haplotype 2 forming a branch of its own.
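The haplotype statistics reported in this section can be illustrated with Nei's estimators, Hd = n/(n − 1) × (1 − Σx<sub>i</sub><sup>2</sup>) and π = n/(n − 1) × Σ<sub>i≠j</sub> x<sub>i</sub>x<sub>j</sub>d<sub>ij</sub> / L. The haplotype counts (73, 6 and 1 of 80) and the 1580 bp length are taken from the text, and with them the sketch reproduces the reported Hd of 0.164. The pairwise site differences, however, are a loudly labeled assumption: we suppose that Haplotype 2 carries three private substitutions and Haplotype 3 one, which fits the four variable sites and the network topology described above, but the actual site pattern is in Table 7 and is not reproduced here. This is not DnaSP code, and the function names are ours.

```python
from itertools import combinations


def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum(x_i^2))."""
    n = sum(counts.values())
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(counts, pairwise_diffs, length):
    """Nei's pi = n/(n-1) * sum over i != j of x_i * x_j * d_ij / length.

    pairwise_diffs maps frozenset({hap_a, hap_b}) to the number of sites
    at which the two haplotypes differ.
    """
    n = sum(counts.values())
    freqs = {h: c / n for h, c in counts.items()}
    total = 0.0
    for a, b in combinations(counts, 2):
        # Each unordered pair occurs twice in the sum over ordered pairs i != j.
        total += 2 * freqs[a] * freqs[b] * pairwise_diffs[frozenset((a, b))]
    return n / (n - 1) * total / length


# Haplotype counts among the 80 sequenced accessions (Section 3.3).
HAP_COUNTS = {"Hap1": 73, "Hap2": 6, "Hap3": 1}
# ASSUMPTION (hypothetical): Hap2 has three private substitutions and Hap3
# one, giving the four variable sites; the real pattern is in Table 7.
HYPOTHETICAL_DIFFS = {
    frozenset(("Hap1", "Hap2")): 3,
    frozenset(("Hap1", "Hap3")): 1,
    frozenset(("Hap2", "Hap3")): 4,
}
ALIGNED_LENGTH_BP = 1580

if __name__ == "__main__":
    hd = haplotype_diversity(HAP_COUNTS)
    pi = nucleotide_diversity(HAP_COUNTS, HYPOTHETICAL_DIFFS, ALIGNED_LENGTH_BP)
    print(f"Hd = {hd:.3f}, Pi = {pi * 1e3:.2f} x 10^-3")
```

Running it prints Hd = 0.164, which depends only on the haplotype counts. The Pi line (0.28 × 10<sup>−3</sup> under the assumed site pattern) is only as good as that assumption; with a different distribution of the four substitutions the value would change.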

**Figure 5.** Analysis of cpDNA haplotypes: (**a**) Network of 3 cpDNA haplotypes (Hap 1–Hap 3), (**b**) Phylogenetic relationship of 3 haplotypes based on the Maximum Parsimony (MP) method analysis.

Phylogenetic trees were also constructed for the sequences of the 80 accessions, which were divided into two distinct branches (Figure 6). The first branch consisted of the 6 Haplotype 2 accessions ('98', '120', '140', '146', '150' and '187'), and the second branch consisted of the 74 accessions carrying Haplotype 1 or Haplotype 3, again indicating that Haplotypes 1 and 3 are more closely related to each other than to Haplotype 2.

**Figure 6.** Strict consensus tree of 80 accessions using the Maximum Parsimony (MP) method (the numbers in the figure represent the number of the accessions).

#### **4. Discussion**

#### *4.1. Diversity of Phenotypic Traits*

Because the phenotype results from the interaction between genotype and environment, and phenotypic diversity is the morphological manifestation of genotypic differences, phenotypic variation in this population depends mainly on genotypic differences, since all accessions in this study were grown under the same conditions. In FTP, Pang et al. [59] reported phenotypic coefficients of variation of 10–30% for 32 traits in 150 accessions, whereas Wu et al. [60], studying 29 quantitative traits in 462 accessions, found coefficients of variation of 9.52–112.1%. For 28 quantitative traits in 180 accessions, this study found coefficients of variation of 12.33–108.73%, similar to those of Wu et al. [60]. Because the plants used both by Wu et al. and in this study were widely selected and collected from Gansu province, the main cultivation area of FTP (Figure 2), we consider the collection in this study large enough to represent the overall situation of FTP as a germplasm resource. Moreover, every accession collected in this study has been clonally propagated, most have been named as cultivars (Supplementary Table S1), and they can be released to the industry as crop germplasm resources.

Studies of phenotypic variation not only improve our understanding of the biological basis of plants but also have considerable breeding value [61]. In the FTP accessions used in this study, petal number ranged from 9 to 238.33 (Table 4), indicating diverse flower forms, from single through semi-double to double [1]. Such diverse flower forms are valuable for breeding new cultivars of ornamental plants such as tree peonies. Moreover, tree peonies have recently been developed rapidly as an emerging oil crop, making the selection of high-yield accessions crucial for expanding cultivation. Among the accessions used in this study, those with single flowers are in most cases fertile and set fruits (seeds), and can be screened as high-yield oil germplasm; the maximum fruit weight per plant reached 2661.64 g (Table 4).

#### *4.2. Genetic Diversity Based on SSR Markers*

The primary task in evaluating germplasm collections is to determine the genetic constitution of the samples held in the repository. Newly collected cultivated or wild plants from various locations can also be compared with existing germplasm accessions to facilitate identification. With the advent of molecular technologies, it has become more efficient and rigorous to use genome-level genetic information to analyze differences among accessions and to compare them with germplasm collections elsewhere [62].

This study detected a total of 185 alleles with 34 EST-SSR primer pairs in 282 FTP accessions, and the average number of different alleles (Na) was 5.441 (Table 2). This was lower than in 20 populations (Na = 9.15) of 335 wild *P. rockii* individuals [29] but higher than that obtained with 40 primer pairs in 462 *P. rockii* seedlings (Na = 4.5) [38]. These differences might be caused by the different primers used in the different studies, but it is clear that wild *P. rockii* germplasm has more abundant allelic variation than cultivated FTP. The mean inbreeding coefficient (FIS) was negative, and 22 loci had lower expected heterozygosity (He) than observed heterozygosity (Ho) (Table 2), indicating a surplus of heterozygotes in this FTP population [29]. Yuan et al. [63] considered higher observed heterozygosity to indicate higher population diversity; their results showed that wild *P. rockii* (Ho = 0.475) differed significantly in diversity from CTP (Ho = 0.61) and FTP (Ho = 0.58). In the FTP accessions used in this study, observed heterozygosity was 0.537, higher than that of wild *P. rockii* (Ho = 0.475), lower than that of CTP (Ho = 0.61), and closest to the FTP value of Yuan et al. (Ho = 0.58). The Neighbor-Joining tree based on the microsatellite data divided the 282 FTP accessions into three distinct groups, and genetic distance was not directly related to flower color. This grouping is consistent with studies of 335 wild *P. rockii* individuals from 20 populations and of 462 *P. rockii* seedlings, which were also divided into three groups [29,38]. Guo et al. [64] used SRAP markers to classify 16 accessions of *P. suffruticosa* and likewise found no correlation between genetic distance and color: white, green, yellow and black accessions formed one group, while blue, pink and/or multi-colored accessions formed another.

To sum up, the SSR data showed that the number of alleles in the FTP used in this study was lower than that of wild *P. rockii* but higher than that of cultivated *P. rockii* seedlings. The heterozygosity of FTP was lower than that of CTP, higher than that of wild *P. rockii*, and close to that of cultivated *P. rockii* seedlings. This pattern suggests that wild plants have abundant allelic variation but low population heterozygosity, whereas cultivated germplasm has less allelic variation but high population heterozygosity, which indicates that gene exchange is more frequent and more heterozygosity is artificially retained under cultivation [65]. Most woody perennial species are obligately outcrossing, resulting in high heterozygosity [66]. The longer the cultivation history of tree peony, the higher its heterozygosity, and cultivated individuals maintain high heterozygosity through asexual reproduction or clonal propagation. The FTP accessions were divided into three groups, consistent with wild *P. rockii* and cultivated *P. rockii* seedlings. We therefore speculate that the FTP germplasm in this study can, to a large extent, represent the genetic background of the species tree peony *P. rockii*.

#### *4.3. Genetic Diversity Based on the cpDNA*

According to Yuan et al. [28], the mean haplotype diversity (Hd) and mean nucleotide diversity (Pi) of the chloroplast DNA (cpDNA) of 335 wild *P. rockii* individuals were 0.0686 and 0.765 × 10<sup>−4</sup>, respectively, at the population level, whereas at the species level Hd and Pi were 0.887 and 0.185 × 10<sup>−2</sup>, significantly higher than at the population level. Based on the analysis of three cpDNA sequences, Xu et al. [37] measured the diversity of 214 individuals in 24 populations of *P. rockii* and obtained a similar pattern: Hd and Pi at the species level were 0.874 and 0.294 × 10<sup>−2</sup>, significantly higher than Hd (0.1097) and Pi (0.383 × 10<sup>−4</sup>) at the population level. Thus, wild *P. rockii* has low genetic diversity at the population level and higher genetic diversity at the species level. This pattern is consistent with other tree peony species, *P. delavayi* [33], *P. ludlowii* [33], *P. qiui* [37] and *P. jishanensis* [37]. By contrast, Hd and Pi of the FTP accessions in this study were 0.164 and 0.28 × 10<sup>−3</sup>, higher than the population-level values and lower than the species-level values of wild *P. rockii*.

Treating these accessions as a single population and comparing them with wild *P. rockii*, we found that FTP under cultivation differs clearly from its ancestral species in its natural habitats in how genetic diversity has formed and evolved. This is probably because wild *P. rockii* plants reproduce by seed within the same population, while gene exchange between populations is prevented by geographical isolation. Chloroplast diversity is therefore low among individuals of the same population but high at the species level, because each population carries distinct haplotypes. In this study, the FTP accessions were selected and collected from various places in Gansu province without geographic isolation, and gene exchange may have occurred during their formation through repeated hybridization, which eventually resulted in higher chloroplast diversity at the population level after long cultivation.
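The contrast between population-level and species-level values described above can be reproduced with a toy calculation. The sketch below computes Pi as the mean number of pairwise differences per site among aligned sequences; the sequences are hypothetical, and real cpDNA analyses also have to handle gaps and missing data, which this sketch deliberately ignores. When each population is fixed for a different haplotype, Pi is zero within every population but positive once the populations are pooled, which mirrors the pattern reported for wild *P. rockii*.

```python
from itertools import combinations

def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise nucleotide differences per site (Pi).

    Assumes aligned sequences of equal length with no gaps or missing data.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

# Two hypothetical populations, each fixed for its own haplotype
pop1 = ["AAAA", "AAAA"]
pop2 = ["TTAA", "TTAA"]
print(nucleotide_diversity(pop1), nucleotide_diversity(pop2))  # 0.0 0.0
# Pooled ("species level"): 8 differing sites over 6 pairs x 4 sites
print(nucleotide_diversity(pop1 + pop2))
```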

Meanwhile, based on the phylogenetic tree of chloroplast gene fragments (Figure 6) and the haplotype analysis (Figure 5b), the FTP accessions can be divided into two branches, which is consistent with Yuan et al. [27], who used chloroplast fragments to divide the genetic structure of wild *P. rockii* into two branches.

#### *4.4. Conservation and Utilization of FTP Germplasm Accessions*

Research on genetic diversity and structure provides the theoretical basis for the protection and utilization of a species. One goal of species protection is to conserve as much of its genetic diversity as possible [67,68]. The accessions we investigated show abundant phenotypic variation and are representative of FTP germplasm; from them we have selected and named several cultivars for ornamental and oil uses. The next step is to establish multi-site repositories and a multi-year, comprehensive phenotypic database of FTP germplasm resources. Germplasm accessions are essential for studying key characteristics such as plant resistance and flowering time through multi-year, multi-site observations. Similarly, studying phenotypic variation in these germplasms may help in understanding phenotypic plasticity in response to changing climates and different growing environments.

The SSR and cpDNA results show that the FTP accessions can represent the genetic background of *P. rockii* germplasm resources and have relatively high genetic diversity. To protect this diversity, a germplasm resource bank should be established to conserve as many accessions as possible. Analysis of chloroplast sequence variation revealed three haplotypes among 80 accessions: one haplotype was shared by 73 accessions, whereas the other two were found in only six accessions and one accession, respectively. These seven accessions therefore contribute disproportionately to the genetic diversity of *P. rockii* germplasm, and their disappearance would cause an irreversible loss of diversity. Special protection should therefore be given to these unique accessions.
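As a cross-check, the haplotype counts given above (73, 6 and 1 of 80 accessions) can be plugged into Nei's unbiased estimator of haplotype diversity, Hd = n/(n − 1) × (1 − Σ p<sub>i</sub><sup>2</sup>). This gives Hd ≈ 0.164, which agrees with the value reported for the FTP accessions in Section 4.3, assuming that estimator was applied to these 80 accessions (the text does not name it). The same function also illustrates why the seven rare accessions matter: without them only one haplotype remains and Hd falls to zero.

```python
def haplotype_diversity(counts: list[int]) -> float:
    """Nei's unbiased haplotype diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sequences")
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts))

# Haplotype counts reported for the 80 FTP accessions
print(round(haplotype_diversity([73, 6, 1]), 3))  # 0.164

# Losing the seven accessions that carry the two rare haplotypes
print(haplotype_diversity([73]))  # 0.0
```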

Because rapid global climate change driven by human activities is increasing the selection pressure on commercialized woody ornamental plants, the diversity of existing germplasm resources has greater breeding value [69]. As a high-quality germplasm resource with strong resistance, high ornamental value and great potential for oil use, *P. rockii* accessions play an important role in the improvement of tree peony. At the DNA level, the cultivated *P. rockii* population we have established can represent the basic level of *P. rockii* germplasm resources, and its phenotypic variation is also extremely rich for every trait. These accessions are therefore particularly valuable for the breeding and development of tree peonies. The germplasm also contains diverse plant genetic resources of great significance for basic research in plant biology. To uncover potentially excellent traits within these accessions, we need to continue studying their phenotypes and genotypes and to ensure their preservation. In this study, we transplanted these accessions from the traditional cultivation area in Gansu province to Beijing, and their traits must be observed and measured continuously to assess the adaptability of FTP to a different environment. Ultimately, germplasm collections should be viewed not simply as tools for conservation but as resources with unique benefits that hold immense value for sustaining the future of woody perennials and their wild relatives. The comprehensive evaluation and utilization of these cultivated accessions will serve to protect the germplasm of *P. rockii* tree peony and its derivatives.

#### **5. Conclusions**

This study is the first to analyze the genetic diversity of the FTP accessions that we newly selected, using phenotypic traits, EST-SSR markers and chloroplast DNA sequences. The phenotypic data indicated that these accessions have abundant phenotypic variation and are representative of the woody peony *P. rockii* germplasm. The SSR data showed that the FTP accessions had relatively high genetic diversity and comprised three distinct groups, like their wild relatives. The cpDNA data indicated that the chloroplast genetic diversity of the FTP accessions was higher than that of wild *P. rockii* at the population level and lower than that at the species level, and that the accessions can be divided into two branches. The FTP accessions largely reflect the genetic diversity and current status of the species tree peony *P. rockii* germplasm, so they can be regarded as core germplasm resources; their protection and utilization will be of great significance for the genetic improvement and biological study of tree peonies and for the sustainable development of the peony industry.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/6/672/s1: Table S1: The numbers, names, flower colors and grouping results of 282 accessions.

**Author Contributions:** Conceptualization, X.G. and F.C.; validation, X.G. and F.C.; formal analysis, X.G.; investigation, X.G.; data curation, X.G.; writing—original draft preparation, X.G.; writing—review and editing, X.G., F.C. and Y.Z.; visualization, X.G.; supervision, F.C.; project administration, F.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National High Technology Research and Development Program of China, grant number 2011AA100207 and the Science and Technology Project of Beijing, grant number Z181100002518001.

**Acknowledgments:** We would like to thank Xinyun Cheng and Xiwen Tao for their efforts in maintaining living plant materials for this study.

**Conflicts of Interest:** The authors declare no conflict of interest. In this study, Beijing Guose Peony Technologies Co., Ltd. carried out the collection and management of plant materials.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Genetic Structure and Pod Morphology of** *Inga edulis* **Cultivated vs. Wild Populations from the Peruvian Amazon**

**Alexandr Rollo 1,2,**†**, Maria M. Ribeiro 3,4,5,**†**, Rita L. Costa 6, Carmen Santos 7, Zoyla M. Clavo P. 8, Bohumil Mandák 9, Marie Kalousová 1,10, Hana Vebrová 1,10, Edilberto Chuqulin 11, Sergio G. Torres 11,12, Roel M. V. Aguilar 13, Tomáš Hlavsa <sup>14</sup> and Bohdan Lojka 1,\***


Received: 5 May 2020; Accepted: 4 June 2020; Published: 8 June 2020

**Abstract:** *Research Highlights*: This study assesses the genetic diversity and structure of the ice-cream-bean (*Inga edulis* Mart.; Fabaceae) in wild and cultivated populations from the Peruvian Amazon. It also highlights the importance of protecting forest biodiversity in the Peruvian Amazon to preserve the species' genetic resources and allow further genetic improvement. *Background and Objectives*: Ice-cream-bean is one of the most commonly used species in the Amazon region, both for its fruits and as a shade tree for other crops (e.g., cocoa and coffee plantations). Comprehensive studies of the impact of domestication on this species' genetic diversity are needed to find the best conservation and improvement strategies. *Materials and Methods*: The genetic structure and diversity were assessed by genotyping, with microsatellite markers, 259 trees sampled in five wild and 22 cultivated *I. edulis* populations in the Peruvian Amazon. Pod length was measured in wild and cultivated trees. *Results*: The average pod length of cultivated trees was significantly greater than that of wild trees. The expected genetic diversity and the average number of alleles were higher in the wild than in the cultivated populations; thus, a loss of genetic diversity was confirmed in the cultivated populations. The cultivated trees in the Loreto region had the longest pods and the lowest allelic richness; nevertheless, the genetic structure of the wild populations was not significantly differentiated from that of the cultivated populations. *Conclusions*: A loss of genetic diversity was confirmed in the cultivated populations. The species may have been domesticated simultaneously in multiple locations, usually from local germplasm. The original Amazonian *I. edulis* germplasm should be maintained, and new germplasm from wild populations should be introduced into cultivated populations to increase their genetic diversity.

**Keywords:** agroforestry; domestication; *Inga edulis*; Amazon forest; microsatellite markers; genetic diversity

#### **1. Introduction**

The Peruvian rainforest, owing to its large and relatively continuous area of primary forest, is a worldwide biodiversity hotspot that is suffering intense disturbance and deforestation from human exploitation and global change [1]. Amazonian inhabitants have used these natural resources for millennia and modified the natural environment, but it is not known how human management practices led to the domestication of Amazonian forests, and in particular what the germplasm sources were [2]. Moreover, species' gene pools may have been reduced by farmers' selection, so strategies for genetic resource conservation and management are needed [3]. Because of its biodiversity and its rates of disturbance and deforestation, the Peruvian rainforest has major conservation value and is considered a priority in nearly all global biodiversity inventories [4,5]. Despite this international recognition of its uniqueness and global importance, the impacts of human activities throughout the region remain poorly understood [5]. Information about species' genetic structure will assist tree breeding programmes and conservation strategies, particularly for tropical trees, and will also help to study the implications of human impact on genetic resources [6,7].

The genus *Inga* (Fabaceae) comprises ca. 300 species of neotropical rainforest trees [8], and earlier studies suggest that the diversification of *Inga* species in Amazonia is recent, having occurred during the past 2–10 million years [9,10]. Ice-cream-bean (*Inga edulis* Mart.; Fabaceae) is a light-demanding lowland rainforest species distributed in Colombia and in tropical South America east of the Andes, extending south to north-western Argentina (Figure 1). Its natural altitudinal range is mostly below 750 m a.s.l., although it has occasionally been recorded at 1200 m in Roraima, Brazil [8]. The flowers are hermaphrodite and are pollinated by hawkmoths, bats and hummingbirds, which may carry pollen across large areas [11]. The species is diploid, 2n = 26 [12], and is believed to be self-incompatible [13]. Trees begin fruiting at about three years of age, producing a long pod that contains recalcitrant seeds covered by a white, fleshy and slightly sweet edible sarcotesta [14]. Seeds are dispersed by mammals and birds after they eat the sarcotesta [11,15]. The species is widely cultivated for its edible fruit throughout South and Central America [16] and is one of the most widely distributed and economically useful species in the whole Amazon region [14,17]. In Amazonian Peru, the fruits of cultivated trees may exceed 2 m in length and 5–6 cm in diameter, whereas wild trees have smaller pods that rarely exceed 50 cm [8]. This fast-growing, symbiotic nitrogen-fixing tree with an umbrella-like canopy is commonly used as a shade tree for cocoa, coffee, coca and tea plantations, in agroforestry systems and in "home grown" multi-purpose cultivation [8].

**Figure 1.** Distribution map of *Inga edulis*. The green dots represent the 1686 occurrences, and the red dots the trees sampled in the current study (259 occurrences). Source: GBIF.org (10 October 2018) GBIF Occurrence Download https://doi.org/10.15468/dl.ik3uki.

Amazonia was a major centre of crop domestication, with at least 83 native species, including members of the genus *Inga*, containing populations domesticated to some degree; these expanded rapidly in the Mid-Holocene [18]. Historical records show that *Inga edulis* has been cultivated in Peru for its edible fruit since pre-Columbian times and has become a commonly used tree species in the Amazon region [19]. The origin of the cultivated populations of *I. edulis* is uncertain [8]; however, León [17] and Clement [20] proposed western Amazonia as the probable origin. The species' genetic structure has not been studied in detail, yet a reduction of allelic richness in cultivated relative to natural populations has been found in *I. edulis* from the Peruvian Amazon [13,21]. *Inga edulis* has become a model species for evaluating the maintenance of genetic resources in agroforestry systems and the putative reduction of genetic diversity associated with domestication [22]. More recently, Cruz-Neto et al. [23], using microsatellite markers, observed high levels of genetic diversity within *I. vera* populations from the Atlantic forest of north-eastern Brazil, but concluded that cultivated populations displayed reduced genetic diversity compared with natural populations. Maintaining high levels of genetic variation within agroforestry trees is important for two main reasons: genetic variation in agricultural landscapes helps farmers to manage their inputs more efficiently, and it enables tree species to adjust to new environments, such as shifting climate and weather conditions, allowing local adaptation and the migration of better-suited provenances along ecological gradients [3].
In addition, a stronger emphasis on the genetic quality of the trees planted by smallholders is needed, which means paying attention both to domestication and to the systems by which improved germplasm is delivered to farmers for the management of tree genetic resources and the livelihoods of rural communities in the tropics [24].

In the present study, the objectives were to (i) explore differences in pod length between wild trees and cultivated *I. edulis* trees from different geographical regions in the Peruvian Amazon; (ii) compare the wild and cultivated *I. edulis* populations' genetic structure using microsatellite markers; and (iii) determine if the cultivated populations' genetic structure reflects the different uses and cultivation practices throughout the species' use history, to help design practical measures to preserve *I. edulis* genetic resources.

#### **2. Materials and Methods**

#### *2.1. Plant Material Sampling*

The leaves and mature fruits of 259 individuals from 27 populations of *I. edulis* were sampled in the Peruvian Amazon between 2009 and 2012. Each tree was identified according to the morphological traits detailed by Rollo et al. [25], with additional help from local people. Young leaves of each sampled tree were preserved in micro test tubes with silica gel for subsequent DNA extraction. One to ten mature pods were sampled per tree, from opposite sides and different heights of the crown, depending on the availability of mature fruits on the tree. Pod length was measured from the base to the tip of the pod apex. The mature fruits had the following phenological characteristics: seeds ranging from creamy white to purple-black, up to vivipary; and a sarcotesta ranging from membranous and creamy to, more commonly, fleshy white, watery, soft and slightly sweet [8]. A total of 448 mature pods were measured from the 259 trees of the 27 *I. edulis* populations. We chose to study pod length because domesticated trees are expected to have been selected for longer pods, which yields a testable hypothesis (H0: the pod length of domesticated trees does not differ from that of wild trees; H1: domesticated trees have longer pods than wild trees). Moreover, this trait is economically important for the species' domestication and agroforestry value.

Each sampled tree's geographical coordinates were recorded, and the minimum distance between any two trees was 200 m. Voucher specimens were kept in the Regional Herbarium of Ucayali IVITA-Pucallpa, Peru, with the code AR1-384. Each population was numbered from 1 to 27 and coded, e.g., 1 SRc, 23 RPw (the two capital letters are taken from the initial letters of the geographic origin of the population, e.g., San Ramón or River Pacaya, and the third letter meaning either c = cultivated—managed by humans or w = wild—growing spontaneously) (Table 1).


**Table 1.** The sampling region (Site), population code (Pop.), sample size (N), geographic location (GPS coordinates in WGS84; latitude S and longitude W) and altitude (in metres above sea level) of the 27 sampled *Inga edulis* populations (cultivated and wild).

Cultivated trees were sampled in 22 geographically different populations, in home gardens and agricultural landscapes surrounding the urban areas in the Selva Central, Ucayali and Loreto regions. Wild trees were sampled in five geographically different populations, in lowland forests; four populations (23 RPw, 24 RSw, 26 MAw, 27 SDw) in protected natural areas in original forest vegetation and one (25 RUw) in secondary forest as described in Rollo et al. [25]. Details on the sampled populations are displayed in Table S1. Finally, no *I. edulis* wild trees were found in the Selva Central region's original vegetation, since the species is a lowland rainforest species and has been only occasionally recorded above 750 m [8].

#### *2.2. DNA Extraction and Amplification*

Total DNA was extracted from dried young leaves using the Invisorb® Spin Plant Mini Kit (Invitek) following the manufacturer's instructions. Four microsatellite markers were used to genotype all individuals: Pel5 [26] and Inga03, Inga08 and Inga33 [21]. For microsatellite detection, each forward primer was fluorescently labelled at the 5′ end (6-FAM, NED or VIC). Amplification was performed under the conditions described by Rollo et al. [25]. The amplified products were separated on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) and run according to the manufacturer's protocol. Fragment sizes were determined using the ROX500 internal size standard and the Global Southern algorithm implemented in ABI PRISM GeneMapper® software version 4.0 (Applied Biosystems).

#### *2.3. Data Analysis*

#### 2.3.1. Morphological Data

Pod length was measured on the 448 pods sampled from the 259 trees of the *I. edulis* populations from the Peruvian Amazon, and all individual values were used in the subsequent analyses (values were not averaged per tree). Normality of pod length in each group (the cultivated populations from the Selva Central, Ucayali and Loreto regions, and the wild populations) was tested using the Kolmogorov–Smirnov test; all groups were normally distributed except the wild trees. A non-parametric Kruskal–Wallis test for k independent samples was therefore used to test for significant differences among groups, followed by pairwise non-parametric Mann–Whitney U tests as a post hoc procedure [27]. The statistical analyses were performed using IBM SPSS Statistics v.22.
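For readers who want to reproduce this workflow outside SPSS, the Python sketch below implements the tie-corrected Kruskal–Wallis H statistic with its large-sample chi-square p-value (df = k − 1) and the Mann–Whitney U statistic used for pairwise post hoc comparisons. It is a minimal illustration, not the SPSS procedure: exact small-sample p-values and any correction for multiple pairwise comparisons are not included, and the pod lengths are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer degrees of freedom."""
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    k = 1
    while k < df:
        total += term
        k += 2
        term *= x / k
    return total

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = k - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)

def mann_whitney_u(x, y):
    """Smaller of the two Mann-Whitney U statistics for samples x and y."""
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2
    return min(u_x, len(x) * len(y) - u_x)

# Hypothetical pod lengths (cm) for three groups
h, p = kruskal_wallis([62, 70, 75], [78, 81, 85], [88, 95, 101])
print(round(h, 2), round(p, 4))  # 7.2 0.0273
print(mann_whitney_u([62, 70, 75], [78, 81, 85]))  # 0.0
```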

#### 2.3.2. Molecular Data

The estimated genetic diversity parameters included the average number of alleles per locus (*A*), the effective number of alleles (*N*e), the number of private alleles (Pa), the mean allelic richness (*R*S) that uses a rarefaction index to consider differences in sample size [28], the observed (*H*O) and expected (*H*E) heterozygosities [29] and the fixation index *F*IS. Comparisons of the genetic diversity parameters between groups (i.e., cultivated and wild populations) were performed with 10,000 permutations. The estimates were made using Fstat 2.9.3.2 [30] and GenAlEx v. 6.501 [31]. Genepop 4.3 software [32] was used to test the heterozygote deficiency for each population and to compute the average frequency of null alleles.
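Two of these parameters are easy to misread. The effective number of alleles is Ne = 1/Σ p<sub>i</sub><sup>2</sup>, and allelic richness (RS) is the expected number of distinct alleles in a random subsample of a fixed number g of gene copies, so that populations of different sizes can be compared. The sketch below uses a Hurlbert-type rarefaction formula of the kind that, to our understanding, underlies allelic richness in Fstat; the allele counts and function names are hypothetical, and Fstat's own implementation details (such as how g is chosen) should be checked in its documentation.

```python
from math import comb

def effective_alleles(counts: list[int]) -> float:
    """Effective number of alleles, Ne = 1 / sum(p_i ** 2)."""
    n = sum(counts)
    return 1.0 / sum((c / n) ** 2 for c in counts)

def allelic_richness(counts: list[int], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies.

    counts: number of copies of each allele at one locus in one population.
    """
    n = sum(counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the number of gene copies")
    # An allele is missed only if all g draws come from the other n - c copies
    return sum(1 - comb(n - c, g) / comb(n, g) for c in counts)

# Hypothetical locus: a larger and a smaller population
large = [10, 6, 2, 1, 1]   # 20 gene copies, 5 alleles observed
small = [4, 3, 1]          # 8 gene copies, 3 alleles observed
g = sum(small)             # rarefy both to the smaller sample
print(allelic_richness(large, g), allelic_richness(small, g))
print(effective_alleles(large), effective_alleles(small))
```

Rarefying both populations to the same g removes the advantage the larger sample has in simply detecting more rare alleles, which is why RS and A can rank populations differently.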

Genetic variation at the level of populations and groups (i.e., cultivated and wild populations) was investigated with a hierarchical analysis of molecular variance (AMOVA) in the Arlequin software [33], which partitions the total variance into covariance components due to differences among groups, among populations within groups and within populations. Levels of significance were determined by computing 1000 random permutation replicates.
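The logic of the permutation test can be illustrated with a simplified, single-level AMOVA: variance is partitioned into among-population and within-population components using squared Euclidean distances between individuals, Φ<sub>ST</sub> is the among-population share, and its significance is the proportion of random reassignments of individuals to populations that give a Φ<sub>ST</sub> at least as large as the observed one. This Python sketch is an illustration under those assumptions, not Arlequin's hierarchical implementation: it omits the group level, and the one-dimensional toy data are hypothetical.

```python
import random

def phi_st(data, labels):
    """Single-level AMOVA Phi_ST from squared Euclidean distances.

    data: one numeric vector per individual; labels: population of each individual.
    """
    def ssd(idx):
        # Squared distances summed over unordered pairs, divided by the set size
        total = 0.0
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                total += sum((x - y) ** 2 for x, y in zip(data[idx[a]], data[idx[b]]))
        return total / len(idx)

    pops = {}
    for i, lab in enumerate(labels):
        pops.setdefault(lab, []).append(i)
    n, p = len(labels), len(pops)
    ssd_within = sum(ssd(idx) for idx in pops.values())
    ssd_among = ssd(list(range(n))) - ssd_within
    ms_among = ssd_among / (p - 1)
    ms_within = ssd_within / (n - p)
    # Weighted average sample size (n_c) for unbalanced designs
    n_c = (n - sum(len(idx) ** 2 for idx in pops.values()) / n) / (p - 1)
    sigma_among = (ms_among - ms_within) / n_c
    return sigma_among / (sigma_among + ms_within)

def permutation_test(data, labels, n_perm=999, seed=1):
    """Observed Phi_ST and the share of permutations reaching at least that value."""
    rng = random.Random(seed)
    observed = phi_st(data, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(data, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

data = [[0.0], [0.0], [1.0], [10.0], [10.0], [11.0]]
labels = ["pop1"] * 3 + ["pop2"] * 3
print(permutation_test(data, labels, n_perm=199))
```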

A Bayesian clustering method implemented in STRUCTURE v.2.3.4 [34] was used to infer population genetic structure. The number of genetic clusters (*K*) was estimated, and the individuals sampled from cultivated and wild populations were fractionally assigned to the inferred groups. The allele frequencies in each of the *K* groups were then estimated, together with the proportion of each tree's genome derived from each group. We applied the model allowing population admixture and correlated allele frequencies [34]. However, because of the weak population structure found in the *I. edulis* populations, we used a model that incorporates a priori sampling-location information [35], i.e., the "locprior" model. This model allows cryptic structure to be detected at lower levels of divergence without biasing towards the spurious detection of structure when none is present, which is helpful when the standard STRUCTURE models do not provide a clear signal of structure [35]. Two groups of populations were used as priors, i.e., cultivated and wild populations (see Table 1). The alternative ancestry prior 1/*K* was used because of the unbalanced population sampling [36]. *K* was set from one through twenty-seven, and the simulation was run ten times at each *K* value to confirm the repeatability of the results. Each run comprised a burn-in period of 25,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) steps. The Δ*K* statistic of Evanno et al. [37] was used to determine the most appropriate number of genetic clusters; to this end, the STRUCTURE output was parsed with STRUCTURE HARVESTER [38]. Cluster assignments across replicate runs were then aligned in CLUMPP 1.1.2 [39] and visualized using STRUCTURE PLOT [40]. The results of Bayesian clustering were further mapped in ArcGIS® Desktop version 10.2 [41].
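The Δ*K* statistic of Evanno et al. [37] is Δ*K* = m(|L″(*K*)|)/s[L(*K*)], where L(*K*) is the log probability of the data, ln P(D), at a given *K*, L″(*K*) = L(*K* + 1) − 2L(*K*) + L(*K* − 1) is its second-order rate of change, and s[L(*K*)] is the standard deviation of ln P(D) across replicate runs. The Python sketch below computes it from replicate means; implementations differ in how replicate values are combined, so results may not match STRUCTURE HARVESTER exactly, and the ln P(D) values are hypothetical. Note that Δ*K* cannot be computed for the smallest or largest *K* tested.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp: dict[int, list[float]]) -> dict[int, float]:
    """Delta K for every K that has a neighbour on both sides.

    lnp maps each K (consecutive integers) to ln P(D) values from replicate runs.
    """
    ks = sorted(lnp)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive integers")
    l_mean = {k: mean(v) for k, v in lnp.items()}
    delta = {}
    for k in ks[1:-1]:
        # Second-order rate of change of the mean ln P(D)
        second = abs(l_mean[k + 1] - 2 * l_mean[k] + l_mean[k - 1])
        delta[k] = second / stdev(lnp[k])
    return delta

# Hypothetical ln P(D) values from three replicate runs per K
lnp = {
    1: [-3000.0, -3002.0, -3001.0],
    2: [-2700.0, -2705.0, -2695.0],
    3: [-2690.0, -2680.0, -2700.0],
    4: [-2685.0, -2695.0, -2690.0],
}
delta = evanno_delta_k(lnp)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```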

#### **3. Results**

#### *3.1. Pod Length*

From the 259 trees, 448 fruits were collected and measured: 329 fruits from 197 cultivated trees and 119 fruits from 62 wild trees. The longest pod (148 cm) was found in 18 EDc, a cultivated population in the Loreto region around El Dorado village (Table S2). The Kruskal–Wallis test, using the four groups of populations (the three cultivated regions and the wild group), showed significant differences in pod length among groups (Figure 2). The Mann–Whitney U post hoc tests produced three homogeneous groups: the average pod length in the Selva Central region (78 cm) was not significantly different, at the 0.05 level, from the Ucayali average (80 cm), and both were significantly different from the Loreto average (90 cm) (Figure 2).

**Figure 2.** Pod length comparison among the cultivated and wild populations. From Selva Central, 94 mature pods were collected from 45 trees from 5 populations; from Ucayali, 120 mature pods were collected from 72 trees from 7 populations; and from Loreto, 115 mature pods were collected from 80 trees from 10 populations. In the wild populations, 119 mature pods were collected from 62 trees from 5 populations. Significantly different means are followed by different letters (*p* < 0.05).

The Selva Central and Ucayali regions could be one group, considering the cultivated trees' average pod length. The Loreto region's cultivated trees produced the highest pod length average. The average pod length of 83 ± 1.17 cm (mean ± standard error) in the cultivated trees was significantly higher than the 39 ± 0.95 cm pod length average in the wild trees.

#### *3.2. Genetic Diversity*

We identified a total of 71 alleles using the four microsatellite markers after genotyping all the individuals from the 27 populations. The average *A* was 5.7, the *R*<sup>S</sup> was 4.4, the *H*<sup>O</sup> was 0.59, and *H*<sup>E</sup> was 0.69. The overall inbreeding coefficient *F*IS was 0.11 (Table 2).

**Table 2.** Summary of the genetic diversity of the 27 *I. edulis* populations. Sample size (N), average number of alleles per locus (*A*), allelic richness (*R*s), effective number of alleles (*N*e), expected heterozygosity (*H*E), observed heterozygosity (*H*O) and fixation index (*F*IS) averaged over loci. Sig. refers to the significance of the heterozygote deficiency test (because of the low number of individuals per population, a conservative threshold of *p* < 0.01 was used: NS, not significant; \*\* *p* < 0.01 and \*\*\* *p* < 0.001, significant). *F-null* refers to the estimated null allele frequency averaged over loci. Standard errors in brackets.


In both cultivated and wild populations, the population with the highest expected heterozygosity also had the highest allelic richness, and the population with the lowest expected heterozygosity had the lowest. Allelic richness correlated well with the other genetic diversity parameters, which is expected because, unlike *A*, it is corrected for the uneven number of individuals sampled per population. Among cultivated populations, the highest *H*<sup>E</sup> was found in 2 VRc (0.83), from the Selva Central region, and the lowest in 21 INc (0.50), from the Loreto region; these populations also had the highest and lowest *R*<sup>S</sup>, 5.5 and 3.3, respectively. Among wild populations, 24 RSw and 27 SDw had the highest and lowest *H*<sup>E</sup> values, 0.79 and 0.60, and likewise the highest and lowest *R*<sup>S</sup> values, 5.8 and 4.0. Interestingly, the population with the highest *A* and *N*e was 26 MAw, which could be partly explained by its having the largest number of sampled individuals (27). The cultivated populations 8 CTc and 13 BRc also had many sampled individuals, which was likewise reflected in their *A* and *N*e values. The average expected genetic diversity was slightly higher in the wild than in the cultivated populations, 0.72 and 0.69, respectively. The average number of alleles was much higher in the wild group (7.4) than in the cultivated group (5.3), but these differences are reduced when allelic richness and the effective number of alleles are considered (Table 2).

Six of the 22 cultivated populations showed significant heterozygote deficiency, whereas none of the wild populations did (Table 2). None of the cultivated populations from the Selva Central region had a significant inbreeding coefficient, whereas two populations in the Loreto region and more than half of the populations in the Ucayali region showed heterozygote deficiency. Positive and significant *F*IS values reflect differences between observed and expected heterozygosity, attributable to a putative loss of heterozygosity caused by non-random mating of the parents. It should also be emphasized that the cultivated populations are assemblies of individuals and should not be expected to be in Hardy–Weinberg equilibrium. The presence of null alleles is an unlikely explanation, since their estimated frequency is very low across populations (Table 2). Nevertheless, no significant difference in the overall inbreeding coefficient was found between the wild and cultivated populations. Conversely, allelic richness and observed heterozygosity were significantly lower in the cultivated populations (Table S3).

Seven private alleles (Pa) were identified in three wild populations: three in 26 MAw, the highest number per population, and two each in 23 RPw and 25 RUw. Only one Pa was identified in each of four cultivated *I. edulis* populations (2 VRc, 9 CVc, 14 JHc and 22 MZc). Locus Inga08 had the most private alleles (7 across all populations), whereas Inga33 and Pel5 had only one each (data not shown).

Compared with the wild populations, the cultivated populations possessed 13 exclusive alleles, only two of which had a frequency below 5%. The region with the highest proportion of cultivated populations carrying exclusive alleles was Selva Central (80%), followed by Ucayali (60%), whereas Loreto had the lowest proportion (40%) (data not shown).
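The two counts above rest on different definitions: a private allele occurs in a single population, whereas an exclusive allele occurs in at least one cultivated population but in none of the wild ones. A minimal set-based sketch of both follows. The population codes and allele sizes are hypothetical placeholders, and the frequency filter (such as the 5% threshold mentioned above) is omitted for brevity.

```python
def private_alleles(pop_alleles):
    """Alleles that occur in exactly one population.

    pop_alleles: dict mapping population code -> set of (locus, allele) tuples.
    """
    result = {}
    for pop, alleles in pop_alleles.items():
        others = set().union(*(a for p, a in pop_alleles.items() if p != pop))
        result[pop] = alleles - others
    return result


def exclusive_alleles(group, other_group):
    """Alleles present somewhere in `group` but absent from every population of `other_group`."""
    return set().union(*group.values()) - set().union(*other_group.values())


# Hypothetical data: allele sizes are placeholders, not values from this study
wild = {"W1": {("Inga08", 200), ("Inga08", 204)},
        "W2": {("Inga08", 200), ("Inga33", 150)}}
cultivated = {"C1": {("Inga08", 200), ("Pel5", 98)},
              "C2": {("Inga08", 200), ("Pel5", 98), ("Pel5", 102)}}

print(private_alleles({**wild, **cultivated}))
print(exclusive_alleles(cultivated, wild))
```

In this toy example ("Pel5", 98) is exclusive to the cultivated group but not private, because it occurs in two cultivated populations; an allele can therefore count towards one statistic without counting towards the other.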

#### *3.3. Population Structure*

The population genetic structure was investigated by a hierarchical analysis of molecular variance (AMOVA), which revealed that most of the genetic diversity resided within populations (92%). Differentiation between the cultivated and wild groups (Φ<sub>CT</sub> = 0.010) was low (~1%) and not significant (*p* < 0.0958), whereas the variation among populations within groups was appreciable (ca. 7%; Φ<sub>SC</sub> = 0.073) and significant (*p* < 0.0001) (Table 3).

**Table 3.** Hierarchical AMOVA between the cultivated and wild population groups, among populations within the cultivated and wild population groups, and within *I. edulis* populations. df = degrees of freedom; SS = sum of squared deviations; Φ statistics = fixation indices; P = probability of obtaining a more extreme component estimate by chance alone. The significance of the variance components was tested by a permutation test.
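In a hierarchical AMOVA the Φ statistics are ratios of the estimated variance components: Φ<sub>CT</sub> = σ<sub>a</sub>²/σ<sub>T</sub>², Φ<sub>SC</sub> = σ<sub>b</sub>²/(σ<sub>b</sub>² + σ<sub>c</sub>²) and Φ<sub>ST</sub> = (σ<sub>a</sub>² + σ<sub>b</sub>²)/σ<sub>T</sub>², where a, b and c denote the among-group, among-population-within-group and within-population levels. The sketch below only converts components into Φ values; it does not estimate the components from sums of squares or run the permutation test. Entering the rounded percentages from the text (1%, 7%, 92%) gives Φ<sub>CT</sub> = 0.010 and Φ<sub>SC</sub> ≈ 0.071, close to the reported 0.073; the small gap is expected from the rounding.

```python
def phi_statistics(var_among_groups, var_among_pops_within_groups, var_within_pops):
    """Fixation indices of a three-level AMOVA from its variance components."""
    total = var_among_groups + var_among_pops_within_groups + var_within_pops
    phi_ct = var_among_groups / total
    phi_sc = var_among_pops_within_groups / (var_among_pops_within_groups + var_within_pops)
    phi_st = (var_among_groups + var_among_pops_within_groups) / total
    return phi_ct, phi_sc, phi_st


# Percentages of variation reported above, used as relative variance components
ct, sc, st = phi_statistics(1.0, 7.0, 92.0)
print(f"PhiCT={ct:.3f} PhiSC={sc:.3f} PhiST={st:.3f}")  # PhiCT=0.010 PhiSC=0.071 PhiST=0.080
# The three indices are linked: (1 - PhiST) = (1 - PhiSC) * (1 - PhiCT)
assert abs((1 - st) - (1 - sc) * (1 - ct)) < 1e-12
```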


The *I. edulis* genetic structure was further examined using a Bayesian clustering approach. Following the method of Evanno et al. [37], the most appropriate number of genetic clusters (*K*) was 2, referred to here as red and green (Figures 3 and 4 and Figure S1). The red cluster predominated in the wild populations and in the cultivated populations of the northernmost region (Loreto), whereas the green cluster predominated in the southernmost region (Selva Central). The cultivated populations of the Ucayali region showed a mixture of both clusters, probably reflecting germplasm from both the southern and the northern regions (Figures 3 and 4).
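The Evanno et al. [37] criterion selects the *K* at which the second-order rate of change of the log-likelihood, scaled by its standard deviation across replicate runs, peaks: Δ*K* = |*L*(*K*+1) − 2*L*(*K*) + *L*(*K*−1)| / s[*L*(*K*)]. A minimal sketch, assuming replicate ln P(D) values are available for consecutive *K*; it takes the second difference on replicate means, which is one common way to implement the statistic. The log-likelihoods below are invented for illustration and are not this study's runs.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for each interior K.

    lnp_by_k: dict mapping K -> list of ln P(D) values from replicate runs.
    The K values must be consecutive integers, with at least two replicates each.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in ks[1:-1]:  # undefined for the smallest and largest K tested
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_diff / stdev(lnp_by_k[k])
    return delta


# Hypothetical replicate log-likelihoods (three runs per K)
runs = {
    1: [-1000.0, -1001.0, -999.0],
    2: [-900.0, -902.0, -898.0],
    3: [-895.0, -897.0, -893.0],
    4: [-893.0, -896.0, -890.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)  # {2: 47.5, 3: 1.5} 2
```

The sharp drop in the gain of log-likelihood after *K* = 2 is what produces the Δ*K* peak, even though ln P(D) itself keeps increasing slightly at higher *K*.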

**Figure 3.** Proportion of genotype membership q (y-axis) based on STRUCTURE cluster analysis. Plots of proportional group membership for the 259 trees for *K* = 2. Each tree is represented by a single vertical line, which is divided into different colours based on the genotype affinities to each *K* cluster (red and green). Divisions between populations are made with black lines.

**Figure 4.** *Inga edulis* populations investigated in this study plotted on the map of Peru. Bayesian clustering for *K* = 2. Proportions of assignment to the two clusters (red and green) are shown for the *I. edulis* wild populations (larger pie charts outlined in black) and cultivated populations (smaller pie charts).

For *K* = 2, the highest proportion of the red cluster was observed in cultivated populations along the navigable river watersheds of the Loreto and Ucayali regions (e.g., 6 ATc, 11 YAc, 14 JHc, 15 LAc, 16 NAc, 19 MAc, 20 SCc, 21 INc and 22 MZc). In contrast, the green cluster prevailed in populations cultivated on the Andean foothills and "terra firme" in the Selva Central and Ucayali regions (e.g., 2 VRc, 3 PIc, 4 SAc, 7 VHc and 10 ARc) (Figure 4).

#### **4. Discussion**

#### *4.1. Influence of Domestication on Fruit Length*

Although the history of *I. edulis* cultivation is not well documented, a crop domestication study suggested that humans have domesticated this species over a considerable period of time [20]. Indeed, Amazonia is a major world centre of plant domestication, where selection began in the Late Pleistocene to Early Holocene in peripheral parts of the basin [18]. The origin of cultivated *I. edulis* trees is uncertain, though probably Amazonian [8]; nevertheless, some authors have suggested that its cultivation was started by European settlers in western Amazonia [17,20]. Because this tree was cultivated mainly for fruit production, domestication is expected to have increased pod length [8,16,20]. To our knowledge, no previous study has compared pod length between cultivated and wild populations. The longer pods found in cultivated trees compared with wild trees support the domestication of *I. edulis* for food supply. Plant domestication is a long-term process in which natural selection interacts with human selection, driving changes that increase usefulness to humans and adaptation to domesticated landscapes [18].

In the current study, maximum pod length in the wild and cultivated populations was 73 and 148 cm, respectively, in agreement with Pennington [8], who reported that pods of wild trees rarely exceed 50 cm, whereas cultivated trees can exceptionally produce pods longer than 2 m. Average pod length was greater in the cultivated trees of the Loreto region than in those of the Ucayali and Selva Central regions, and the smallest fruits were observed in Selva Central. Differences in how the species is cultivated and used, as well as in ecological conditions, could explain these results. In Selva Central, the species was mainly used to shade coffee or cocoa rather than to produce large fruits [42,43], and farmers focused on the cash-crop yield rather than on the fruit yield of the shade trees. Additionally, large fruits could attract trespassers, who might damage the cash crop while collecting *Inga* fruit. Local names provide a further supporting argument. In Selva Central, the cultivated *I. edulis* is called "pacay soga", whereas in the Ucayali and Loreto regions the name "guaba" is used for the cultivated type and "guabilla" or "guabilla del monte" for the wild tree (A. Rollo, pers. communication). These differences in local names might be related to the abundance of the species in its wild and cultivated forms. Locals in Selva Central told us that *I. edulis* was hard to find in the surrounding wild vegetation; indeed, the species is rarely seen above 750 m [8]. We were also unable to find and sample wild trees in the Selva Central region.

#### *4.2. Genetic Diversity of Wild and Cultivated Populations of I. edulis in the Peruvian Amazon*

The overall *H*<sub>E</sub> (0.69) was higher than the overall *H*<sub>O</sub> (0.59), resulting in an overall inbreeding coefficient of 11%. A meta-analysis of microsatellite data for outcrossing species reported a similar *H*<sub>E</sub> (0.65) but a slightly higher *H*<sub>O</sub> (0.63) [44].

The results of our study further indicate that all the genetic diversity estimates, as well as the average inbreeding coefficient, were lower in the cultivated populations than in the wild ones. These results confirm a loss of genetic diversity in the cultivated populations, in agreement with the studies of Hollingsworth et al. [21] and Dawson et al. [13] on the same species. These authors concluded that cultivated stands possessed lower total allelic richness than neighbouring wild populations, whereas expected heterozygosity remained unchanged, indicating that domestication reduced the number of alleles. Both studies stated that their wild material was collected near cultivated populations, in old-growth primary forest; however, given (i) the long history of the species' use, (ii) slash-and-burn practices in primary forest and (iii) gene flow among nearby stands, the wildness of those trees could be questioned [2,18]. Nevertheless, Dawson et al. [13] observed marked differences in haplotype composition between natural and cultivated stands. In our case, the wild material was sampled from natural vegetation in protected areas and secondary forest, so, unless extensive long-distance gene flow occurred, the two types can be distinguished unambiguously. In addition, the pod-length results clearly distinguish the wild from the cultivated material. We found a marked effect of domestication on the genetic resources of the species, which is expected when a species is used by humans [23,45]. In some cases, expected heterozygosity can be similar or even higher in cultivated than in wild populations, owing to a putative "melting pot" effect in the former (alleles introduced from different origins).
Nevertheless, in our study, allelic richness and observed heterozygosity were lower in the cultivated than in the wild populations, indicating the loss of rare alleles during selection, as observed by other authors [23,46]. Some cultivated populations, particularly in the Ucayali region, had a significant heterozygote deficit. In fruit trees such as *I. edulis*, inbreeding may reduce fruit production through inbreeding depression, which would directly affect farmers' yields [15,23]. Because the species is self-incompatible [13], self-pollination can be excluded as a cause of the heterozygote deficiency; more probably, related trees were introduced into these populations, and the values reflect biparental inbreeding.

#### *4.3. Population Structure*

The partitioning of genetic variance in our study (92% of the variance within populations and low genetic structure, 7%, among populations) is typical of outcrossing tropical forest tree species with high levels of gene flow [6]. The hierarchical AMOVA showed that Φ<sub>CT</sub> between wild and cultivated populations was low (1%) and not significant.

Similarly to our results, Dawson et al. [13] found low genetic structure between natural and cultivated stands of *I. edulis* with nuclear, but not with chloroplast, microsatellite data. However, the authors used only two chloroplast loci, which might have biased the results, since the smaller effective population size of the chloroplast genome makes it more susceptible to genetic drift and population differentiation [47]. Conversely, high genetic structure was found between natural and cultivated stands of *I. vera*; the authors reasoned that the cultivated populations were derived from seeds of different mother trees, although a different geographic origin was also possible [23].

In the current study, for *K* = 2 (Figures 3 and 4), the wild populations displayed a uniform composition, dominated by the red cluster. This uniformity could be due to the relatively recent speciation of the genus [9] and to the regional scale of wild-population sampling [8]. The red cluster prevailed in the northern cultivated populations, i.e., in the Loreto region and along the Ucayali River in the Ucayali region, which could reflect large human population centres on the margins of the main rivers with extensive trade networks [43]. A small proportion of the green cluster is present in the wild populations and in the cultivated populations of the Loreto region, whereas the green cluster is substantial in the Selva Central and Ucayali cultivated populations. Its proportion increases in the sub-Andean Selva Central region and at the higher-elevation sites of the Ucayali region, where *I. edulis* trees were traditionally used in coca fields before the Conquest and for shading cocoa, coffee and tea plantations after the Spanish settlement [43]. In the Loreto region, the cultivated population 13 BRc had a higher proportion of the green cluster than the other populations of this region. This population is near the village of Bretaña, which was named after the Europeans who arrived from the Andes and the coastal regions of Peru during the rubber boom at the end of the 19th century [48].

Iquitos, in the Loreto region, is regarded as a centre of crop domestication in Amazonia that arose as human populations expanded, providing strong evidence that pre-Conquest populations intensively transformed their plant resources [18,20]. *Inga edulis* was probably domesticated by selection from local wild populations, possibly starting in the Loreto region, since the genetic structure of the cultivated populations from this region does not differ much from that of the wild ones. Moreover, these populations have larger pods and lower allelic richness than the other cultivated populations, which could indicate that selection intensity was higher there. Indeed, some authors place the possible origin of *I. edulis* domestication in this region, where other species were also domesticated [18,22]. The crop also appears to be recently domesticated, because no clear genetic structuring occurs in the initial stages of domestication, as in Brazil nut [22]. Genetic differentiation between the wild and cultivated populations is low and accompanied by admixture, suggesting that the cultivated populations originated from the wild ones. Conversely, the chloroplast haplotype composition reported by Dawson et al. [13] showed a completely different pattern between natural and cultivated populations, which the authors explained by a non-local origin of the cultivated *I. edulis* material. Our results do not support this explanation. Instead, we infer that the cultivated populations derived from local germplasm, albeit from a non-representative sample of it, as expected if only a few trees were selected from nearby wild populations. Consequently, a genetic drift effect (a change in allele frequencies in a population due to the random sampling of individuals) is expected in the cultivated populations.
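Why founding from a few selected trees should remove rare alleles much faster than it lowers expected heterozygosity (the pattern of reduced allelic richness but similar *H*<sub>E</sub> discussed above) follows from simple sampling arithmetic. The sketch below assumes *n* unrelated diploid founders drawn at random from a large wild population, so that each of the 2*n* gene copies is an independent draw; the function names and allele frequencies are hypothetical. An allele of frequency *p* is retained with probability 1 − (1 − *p*)<sup>2*n*</sup>, while expected heterozygosity is reduced only by a factor of 1 − 1/(2*n*).

```python
def expected_alleles_retained(freqs, n_founders):
    """Expected number of distinct alleles among 2n gene copies drawn at random."""
    copies = 2 * n_founders
    return sum(1 - (1 - p) ** copies for p in freqs)


def expected_he_after_founding(he_source, n_founders):
    """Expected (uncorrected) gene diversity in a sample of n diploid founders."""
    return he_source * (1 - 1 / (2 * n_founders))


# Hypothetical wild gene pool: 4 common alleles (20% each) and 10 rare ones (2% each)
wild_freqs = [0.2] * 4 + [0.02] * 10
he_wild = 1 - sum(p * p for p in wild_freqs)

for n in (5, 20, 100):
    kept = expected_alleles_retained(wild_freqs, n)
    he = expected_he_after_founding(he_wild, n)
    print(f"{n:>3} founders: {kept:.1f} of {len(wild_freqs)} alleles, "
          f"He {he:.3f} vs {he_wild:.3f} in the wild")
```

With five founders, fewer than half of the alleles are expected to survive while *H*<sub>E</sub> drops by only 10%, which is consistent with the idea that a small founding sample can leave a clear signature in allelic richness but a weak one in expected heterozygosity.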

#### *4.4. Practical Measures to Maintain I. edulis Genetic Resources*

Management of *I. edulis* germplasm should focus on both wild and cultivated stands. For wild material, protecting the remnants of the original Amazonian vegetation is key to maintaining the species' genetic resources in the region. In modern-day Amazonia, increasing deforestation for pasture establishment has become a global concern because of its impacts on biodiversity [43]. For cultivated stands, villages and indigenous settlements are the units of interest because they are the keepers of domesticated plant populations; consequently, the fate of each village will determine whether its crop genetic resources are maintained. For example, the post-Columbian population collapse (a ca. 90–95% decline in human numbers), which led to the loss of many villages, was quickly reflected in a loss of crop diversity [20,43]. Cultivated populations with low genetic diversity and/or high inbreeding estimates (e.g., 7 VHc, 10 ARc, 12 SSc, 14 JHc, 17 EPc, 18 EDc, 19 MAc and 21 INc) should be supplemented with new germplasm from wild populations to reduce the risk of biparental inbreeding and diversity loss, which might otherwise lower the future value of the crop (through inbreeding depression, flower abortion and crop yield failure).

#### **5. Conclusions**

The results of the current study show that average pod length is significantly greater in cultivated than in wild *I. edulis* trees. Large-scale introduction of wild germplasm into farms could therefore reduce fruit size and weaken domestication efforts over time. Additionally, the Loreto region had the highest average pod length, as well as cultivated populations with lower allelic richness than those of the other regions.

Because the Loreto region has the populations with the lowest allelic richness, the cultivated stands in the Selva Central and Ucayali regions could serve as an additional germplasm source and provide a long-term safeguard for on-farm conservation. Hybridization programmes using such germplasm and local wild material, combined with backward selection, could help increase crop yield and genetic diversity in the cultivated populations. New selection should also consider the needs of modern agriculture and forest management practices, as well as global warming.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/6/655/s1, Figure S1. Representation of the ΔK distribution statistic. The method of Evanno et al. [37] was used to determine the most appropriate number of genetic clusters (K); Table S1. The 27 *Inga edulis* populations (cultivated and wild), with region (Geographic region), sampling region (Site), population code (Pop.) and sampling details (Sampling description); Table S2. Average fruit length per population, with the minimum and maximum pod length (in brackets) and the number of pods (Nl) per population, for a total of 448 pods; Table S3. Comparison of diversity parameters between cultivated and wild populations: allelic richness (*R*<sub>S</sub>), observed heterozygosity (*H*<sub>O</sub>) and inbreeding coefficient (*F*<sub>IS</sub>). P = probability values for differences between groups from a two-sided t-test after 1000 permutations; \* = significant test (*p* < 0.05).

**Author Contributions:** Conceptualization, A.R., B.L. and M.M.R.; field data collection, A.R., Z.M.C.P., H.V., E.C., S.G.T. and R.M.V.A.; methodology, A.R., M.M.R., R.L.C., C.S.; formal analysis, writing—original draft preparation, M.M.R., A.R., B.M., M.K. and T.H.; writing—review and editing, M.M.R., A.R., B.L., B.M., M.K. and T.H.; funding acquisition, A.R., B.L. and M.M.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was funded by the Bilateral Project 'Morphological and genetic diversity of indigenous tropical trees in the Amazon: model study of Inga edulis Mart. in Peruvian Amazon', Czech Academy of Sciences and CONCYTEC, Peru 2011–2012; the Internal Grant Agency of CULS Prague (No. 20185004, No. 20205003); the Scholarship National University of Ucayali, Peru; the European Union Lifelong Learning Programme Erasmus Consortium, Practical Placement Scholarship (Certificate No. CZ-01-2009); and the Foundation Nadace Nadání Josefa, Marie a Zdeňky Hlávkových, Czech Republic. The study was also supported by the Ministry of Agriculture of the Czech Republic, institutional support MZE-RO0418; the Foundation for Science and Technology, Portugal, UIDB/00239/2020 and UIDB/00681/2020 supported MMR.

**Acknowledgments:** We thank the Servicio Nacional de Áreas Naturales Protegidas and José Grocio Gil Navarro, director of the Pacaya Samiria National Reserve, for the research authorization (N° 004-2012-SERNANP-RNPS-J). Thanks are extended to N. Roque for help with Figure 1, to D. Petrus for help with Figure 4, and to I. Salavessa for editing the English of the manuscript. We thank two anonymous reviewers for their very helpful comments and suggestions, which considerably improved the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Biogeography, and Conservation*; Pennington, R.T., Lewis, G.P., Ratter, J.A., Eds.; CRC Press-Taylor & Francis Group: Boca Raton, FL, USA, 2006; pp. 433–447.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **High Genetic Diversity and Low Differentiation in** *Michelia shiluensis***, an Endangered Magnolia Species in South China**

**Yanwen Deng 1,2, Tingting Liu 1,2, Yuqing Xie 1,2, Yaqing Wei 3,4, Zicai Xie 2, Youhai Shi 3,4,\* and Xiaomei Deng 1,2,\***


Received: 26 March 2020; Accepted: 17 April 2020; Published: 21 April 2020

**Abstract:** *Research Highlights*: This study is the first to examine the genetic diversity of *Michelia shiluensis* (Magnoliaceae). High genetic diversity and low differentiation were detected in this species. Based on these results, we discuss feasible protection measures to provide a basis for the conservation and utilization of *M. shiluensis*. *Background and Objectives*: *Michelia shiluensis* is distributed in Hainan and Guangdong provinces, China. Because of human disturbance, its populations have decreased sharply, and there is thus an urgent need to evaluate genetic variation within this species in order to identify an optimal conservation strategy. *Materials and Methods*: We used eight nuclear simple sequence repeat (nSSR) markers and two chloroplast DNA (cpDNA) markers to assess the genetic diversity, population structure, and dynamics of 78 samples collected from six populations. *Results*: The average observed heterozygosity (Ho), expected heterozygosity (He), and percentage of polymorphic loci (PPL) per population of *M. shiluensis*, based on nSSR markers, were 0.686, 0.718, and 97.92%, respectively. For cpDNA markers, the overall haplotype diversity (Hd) was 0.674 and the nucleotide diversity was 0.220. An analysis of molecular variance (AMOVA) showed that the proportion of genetic variation among populations was much lower for nSSR than for cpDNA markers (10.18% vs. 77.56%). Population structure analyses based on both markers showed that one population (DL) is very different from the other five. *Conclusions*: The high genetic diversity and low population differentiation of *M. shiluensis* might result from rich ancestral genetic variation. The current population decline may therefore be due to human disturbance rather than to inbreeding or genetic drift.
Management and conservation strategies should focus on maintaining the genetic diversity in situ, and on the cultivation of seedlings ex situ for transplanting back to their original habitat.

**Keywords:** nSSR; cpDNA; Magnoliaceae; conservation genetics; fragmentation

#### **1. Introduction**

Habitat destruction and environmental pollution caused by human disturbance [1] have limited the distribution and thus promoted the spatial isolation of many wildlife species, threatening their reproduction and development [2,3]. In addition to human disturbance, genetic factors can also represent threats to wild species [4]; processes such as mating systems, genetic drift, gene flow, evolution, and life history greatly affect the genetic diversity and spatial structure of plant populations [5–9]. Under the combined effects of human and genetic factors, gene flow in natural populations is reduced in the short term, resulting in increased genetic differentiation among populations and stronger genetic drift within populations, which together reduce genetic variation [10,11]. Over extended temporal scales, low genetic variation coupled with high differentiation puts species at risk of extinction by reducing their evolutionary potential [12]. To maintain the adaptability and long-term survival of species [13–15], it is necessary to understand the genetic diversity and population structure across a species' distribution area [16] and thereby develop scientifically grounded conservation strategies [17].

*Michelia shiluensis* (Magnoliaceae) is an evergreen tree that has been placed on the Red List of Magnoliaceae, which identifies magnolia species at risk of extinction worldwide [18]. This tree is scattered throughout Hainan [19] and southern Guangdong [20] provinces and is characterized by a straight trunk, bright leaves, elegant and aromatic flowers, and dense wood [21,22]. Because it is a desirable ornamental garden plant and an important timber species, the wild resources of *M. shiluensis* have been heavily exploited and plundered. In addition, the development of scenic tourism in recent years has damaged its habitats, seriously threatening the natural survival and reproduction of *M. shiluensis* [23]. Therefore, there is an urgent need to understand the genetic diversity and population structure of *M. shiluensis* in order to plan conservation actions on its behalf.

Molecular markers have been widely used to study magnolia species. Simple sequence repeat (SSR) markers, also known as microsatellite DNA, are widely distributed throughout the nuclear genome [24]. These markers can display high polymorphism and reproducibility [25], reflect gene flow from seed and pollen [26], and depict fine genetic structure [27]. Unlike the bi-parentally inherited nuclear genome, the chloroplast genome is inherited exclusively from the maternal line in most angiosperms [28,29], and is therefore widely used as a marker for inferring species phylogeography, origin, and historical dynamics [30–32]. The combined analysis of the two markers can be used to comprehensively assess the importance of each population and provide a theoretical basis for conservation strategies [33].

In this study, we used eight nuclear simple sequence repeat (nSSR) loci and two cpDNA intergenic spacers to study the genetic diversity and population structure of 78 *M. shiluensis* individuals from six populations. The objectives of this study were to: (1) assess genetic diversity at both the population and species levels, (2) explore the correlation between genetic diversity and population size, (3) determine the level of population differentiation and the population dynamics, and (4) discuss possible effective protection strategies. This research will help the development of conservation and breeding plans for *M. shiluensis*.

#### **2. Materials and Methods**

#### *2.1. Plant Material and DNA Extraction*

Based on a previous investigation and information from the National Plant Specimen Resource Center (NPSRC), we identified seven possible wild populations for sampling: one in Guangdong province (YC, YangChun E'HuangZhang Nature Reserve) and six in Hainan province (DL, DiaoLuoShan Nature Reserve; BS, BaiSha County; YG, YingGeLing Nature Reserve; WZ, WuZhiShan Nature Reserve; CJ, ChangJiang County; and WN, WanNing County). However, we failed to locate any individuals of the WN population. We therefore sampled 78 individuals from six populations (DL, BS, YG, WZ, CJ, and YC) in two provinces (Table S1). We were unable to locate seedlings in any population except DL; therefore, we sampled all individuals in the BS, YG, WZ, CJ, and YC populations and collected samples randomly from DL (with at least 15 m between individuals). Leaves from each individual were placed in Ziploc bags with silica gel, brought back to the laboratory, and stored at 4 °C until DNA extraction.

Total genomic DNA was extracted from dried leaves following the protocol supplied with the Plant Genomic DNA Kit (DP305, TianGen Biotech Co., Ltd., Beijing, China). DNA concentration was measured with a NanoDrop™ 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA), and working solutions were prepared at approximately 10 ng/μL.
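Bringing each extract to a working concentration of about 10 ng/μL from its spectrophotometer reading is a C<sub>1</sub>V<sub>1</sub> = C<sub>2</sub>V<sub>2</sub> calculation. A minimal helper follows; the 50 μL final volume and the example reading are hypothetical, since the text gives neither.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=10.0, final_volume_ul=50.0):
    """Return (stock DNA volume, water volume) in microlitres, using C1*V1 = C2*V2."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target; use it undiluted")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Example with a hypothetical reading of 85.3 ng/uL
dna_ul, water_ul = dilution_volumes(85.3)
print(f"{dna_ul:.2f} uL DNA + {water_ul:.2f} uL water")
```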

#### *2.2. Primers and Fragment Amplification*

In total, 117 nSSR primer pairs were chosen from previous studies on Magnoliaceae; all were screened using eight samples from six populations. Screening revealed 16 pairs of primers that were able to amplify specific bands from selected samples. Among these, eight pairs showed high peak fluorescence signals and rich polymorphism in genotyping results. A total of 10 pairs of universal cpDNA primers were selected for screening; only two displayed rich polymorphisms in sequencing results. All primers (Table S2) were manufactured by Tsingke (Tsingke Biotech Co., Beijing, China).

Two PCR rounds were conducted for nSSR amplification, following the method of Schuelke (2000) [34]. The first-round reaction had a final volume of 10 μL and contained 1 μL of genomic DNA (about 10 ng), 5 μL of 2X Es Taq Master Mix (Cwbiotech, Beijing, China), 1 μL each of forward and reverse primers, and ddH2O to make up the remaining volume. In the second round, 5 μL of first-round PCR product was used as a template in a final volume of 30 μL, which contained 15 μL of 2X Es Taq Master Mix (Cwbiotech, Beijing, China), 3 μL of fluorescent dye primer, and ddH2O to make up the remaining volume.
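The ddH2O that fills the remainder of each reaction can be made explicit: 2 μL in the 10 μL first round and 7 μL in the 30 μL second round, with the 2X master mix diluted to 1X in both. The sketch below computes this and scales a shared master mix for several samples; the function names, the "template" key and the 10% pipetting overage are assumptions for illustration, not part of the published protocol.

```python
def water_to_add(final_volume_ul, components_ul):
    """ddH2O (uL) needed to bring a reaction to its final volume."""
    used = sum(components_ul.values())
    if used > final_volume_ul:
        raise ValueError("components exceed the final reaction volume")
    return final_volume_ul - used


def scale_master_mix(components_ul, final_volume_ul, n_reactions, overage=0.10):
    """Batch volumes for n reactions, with a fractional overage for pipetting loss.

    The "template" entry (genomic DNA or first-round product) is excluded because
    it is added to each tube separately.
    """
    shared = {k: v for k, v in components_ul.items() if k != "template"}
    shared["ddH2O"] = water_to_add(final_volume_ul, components_ul)
    factor = n_reactions * (1 + overage)
    return {k: round(v * factor, 2) for k, v in shared.items()}


# Per-reaction recipes from the text (uL)
first_round = {"template": 1, "2X Es Taq Master Mix": 5, "forward primer": 1, "reverse primer": 1}
second_round = {"template": 5, "2X Es Taq Master Mix": 15, "fluorescent dye primer": 3}

print(water_to_add(10, first_round), water_to_add(30, second_round))  # 2 7
print(scale_master_mix(first_round, 10, n_reactions=78))
```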

The nSSR amplification protocol was as follows: initial denaturation at 94 °C for 5 min, followed by cycles of 30 s at 94 °C, 30 s at the locus-specific annealing temperature (Ta; Table S2) and 30 s at 72 °C, with a final extension at 72 °C for 7 min. Twenty cycles were run in the first PCR round and 35 in the second. The SSR-PCR products were run on an ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA) using GS-500 as an internal size standard, and allele sizes were scored using GeneMarker 2.2.0 [35]. For cpDNA amplification, the cycling parameters were as follows: initial denaturation at 94 °C for 3 min; 35 cycles of denaturation at 94 °C for 1 min, annealing at 56 °C for 35 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 5 min. The cpDNA amplification products were sequenced on an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA).

#### *2.3. Data Analysis*

#### 2.3.1. nSSR Analysis

Genetic diversity parameters were calculated using GenAlEx 6.5.2 [36]: the number of observed alleles (Na), number of effective alleles (Ne), observed (Ho) and expected (He) heterozygosity, Shannon's diversity index (I), percentage of polymorphic loci (PPL), and number of private alleles (Np). A Mantel test was also carried out in GenAlEx 6.5.2 [36]. PowerMarker v3.25 [37] was used to calculate polymorphism information content (PIC) and linkage disequilibrium (LD). Population genetic structure was analysed with STRUCTURE 2.3.4 [38] using the admixture model with correlated allele frequencies. The STRUCTURE parameters were set as follows: K = 1–6, number of iterations = 10, burn-in period = 1,000,000, and Markov chain Monte Carlo (MCMC) replications = 200,000. The STRUCTURE results were compressed and uploaded to the Structure Harvester online service, which was used to determine the most likely K from ΔK and the mean log-likelihood L(K) following Evanno's method [39]. The optimal alignment of replicate clustering runs was summarized using CLUMPP v1.1.2 [40]. A neighbor-joining (NJ) tree was constructed using MEGA 7 [41] and edited on iTOL [42]. Additionally, gene flow between populations (Nm), the inbreeding coefficient (FIS), the global inbreeding coefficient (FIT), and the coefficient of genetic differentiation (FST) were calculated using Popgene 32 [43]. Arlequin v3.5 [44] was used to calculate pairwise FST between populations and for the analysis of molecular variance (AMOVA).
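Several of the per-locus statistics listed above are simple functions of the allele frequencies p<sub>i</sub>: Ne = 1/Σp<sub>i</sub>², He = 1 − Σp<sub>i</sub>², I = −Σp<sub>i</sub> ln p<sub>i</sub>, and the polymorphism information content of Botstein et al. (1980), PIC = He − Σ<sub>i<j</sub> 2p<sub>i</sub>²p<sub>j</sub>². The sketch below implements these textbook formulas; it is not the code of GenAlEx or PowerMarker, whose options (for example, an unbiased He) may differ, and the example frequencies are hypothetical.

```python
import math


def locus_diversity(freqs):
    """Per-locus diversity indices from allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in freqs)
    na = sum(1 for p in freqs if p > 0)
    ne = 1.0 / sum_sq                    # effective number of alleles
    he = 1.0 - sum_sq                    # expected heterozygosity (no sample-size correction)
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    # Botstein et al. (1980) polymorphism information content
    pic = he - sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                   for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return {"Na": na, "Ne": ne, "He": he, "I": shannon, "PIC": pic}


# Hypothetical locus with three alleles
print(locus_diversity([0.5, 0.25, 0.25]))
```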

#### 2.3.2. cpDNA Analysis

Sequencing results were checked, and misidentified base calls were corrected manually using BioEdit 7.0.5.3 [45]. Haplotype diversity (Hd) and nucleotide diversity (Pi) were calculated, polymorphic sites identified, and a mismatch analysis performed using DnaSP 6.12.03 [46]. The haplotype network was visualized using TCS 1.21 [47]. Sequence alignment, trimming, and Maximum Likelihood (ML) tree construction were performed using MEGA 7 [41], with *Michelia odora* as an outgroup. We used PERMUT 1.2.1 [48] to calculate HS, HT, GST, and NST, along with their P values. Arlequin 3.5.2.2 [44] was used to calculate pairwise FST, AMOVA, and neutrality tests. A Mantel test was conducted using GenAlEx 6.502 [36].
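Haplotype diversity and nucleotide diversity have compact definitions: Hd = n/(n − 1)·(1 − Σx<sub>i</sub>²), where x<sub>i</sub> is the frequency of haplotype i among n sequences, and Pi is the mean number of pairwise nucleotide differences divided by the number of sites. A minimal sketch for aligned, equal-length sequences without gaps or missing data follows (DnaSP applies its own rules to such sites); the sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Nei's unbiased haplotype diversity, Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(sequences):
    """Pi: mean pairwise differences per site for aligned, gap-free sequences of equal length."""
    n = len(sequences)
    length = len(sequences[0])
    if n < 2 or any(len(s) != length for s in sequences):
        raise ValueError("need two or more aligned sequences of equal length")
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(sequences, 2))
    n_pairs = n * (n - 1) / 2
    return diffs / n_pairs / length


# Hypothetical 4-bp alignment of four individuals (three haplotypes)
seqs = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(seqs), 3), round(nucleotide_diversity(seqs), 3))  # 0.833 0.292
```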

#### **3. Results**

#### *3.1. Genetic Diversity and Population Structure from nSSR Markers*

A total of 115 alleles were detected at the eight loci (Table S3). Na per locus ranged from 7 to 21, with an average of 14.375 per locus. The average I was 1.506, ranging from 0.820 (WS18) to 1.767 (LT116). PIC values ranged from 0.531 to 0.909, with an average of 0.809 per locus. Ho ranged from 0.482 (WS18) to 0.780 (MA3-7), with an average of 0.686. He ranged from 0.445 (WS18) to 0.797 (LT116), with an average of 0.718, and four of the eight loci displayed higher He than Ho. There was no evidence of significant LD. The average estimated FIS, FIT, FST, and Nm were 0.042, 0.176, 0.139, and 1.546, respectively (Table S4).
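Two of the averages reported above can be cross-checked against the others with standard relations: Wright's island-model approximation Nm ≈ (1 − FST)/(4FST) and the identity (1 − FIT) = (1 − FIS)(1 − FST). Assuming the Popgene Nm follows the island-model formula, FST = 0.139 gives Nm ≈ 1.55, and FIS = 0.042 gives FIT ≈ 0.175, close to the reported 1.546 and 0.176; small differences are expected because the inputs are rounded averages. This is a consistency check, not a re-analysis, and the function names are illustrative.

```python
def gene_flow_from_fst(fst):
    """Wright's island-model estimate of migrants per generation, Nm = (1 - Fst) / (4 Fst)."""
    if not 0 < fst < 1:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


def f_it_from_components(f_is, f_st):
    """Total inbreeding from its components: 1 - F_IT = (1 - F_IS)(1 - F_ST)."""
    return 1 - (1 - f_is) * (1 - f_st)


print(round(gene_flow_from_fst(0.139), 3))           # 1.549
print(round(f_it_from_components(0.042, 0.139), 3))  # 0.175
```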

At the population level, the average Na, Ne, I, Ho, and He were 6.021 (4.625–9.500), 4.236 (3.598–5.174), 1.506 (1.316–1.784), 0.686 (0.563–0.786), and 0.718 (0.650–0.773), respectively (Table 1). PPL was 87.50% in WZ and 100% in all other populations, with an average of 97.92%. All populations contained private alleles: 24 in DL, seven in YC, four in YG, three each in BS and WZ, and two in CJ. The inbreeding coefficient (FIS) was 0.207 for CJ and below 0.1 for all other populations, indicating a degree of inbreeding within the CJ population. Na, Ne, I, and Np were significantly correlated with population size, whereas Ho and He were not (Figure 1). At the species level, Na, Ne, I, Ho, and He were 14.375, 7.019, 2.120, 0.689, and 0.825, respectively. These parameters indicate high genetic diversity in *M. shiluensis* at both the species and population levels.

Analysis of nSSR genetic structure showed that ΔK peaked at K = 2 (ΔK = 167.38). The 78 individuals were therefore assigned to two clusters (Figure 2), which separate individuals from DL from those of the other populations, with evidence of minor genetic admixture between DL and the other populations.
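The ΔK statistic of Evanno et al. can be sketched as follows, computed from the mean log-likelihood L(K) of replicate STRUCTURE runs: ΔK = |L(K + 1) − 2L(K) + L(K − 1)| divided by the standard deviation of L(K) across replicates. This mirrors how Structure Harvester tabulates ΔK, but it is an illustrative reimplementation; the function name and the example values used to exercise it are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Evanno et al. (2005) Delta K from replicate STRUCTURE runs.

    lnp_by_k maps each K (consecutive integers, e.g. 1..6) to the list of
    ln P(D) values from its replicate runs. Delta K is only defined for
    K values that have both K - 1 and K + 1, i.e. not the smallest or largest K.
    """
    ks = sorted(lnp_by_k)
    m = {k: mean(lnp_by_k[k]) for k in ks}  # mean L(K) over replicates
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)|
        delta[k] = second_diff / stdev(lnp_by_k[k])       # scaled by sd of L(K)
    return delta
```

With K = 1–6 and 10 runs per K, ΔK is defined for K = 2–5, and the K with the largest ΔK is taken as the most likely number of clusters.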


**Table 1.** Genetic diversity parameters among six populations of *Michelia shiluensis*.

Notes: DL, DiaoLuoShan Nature Reserve; BS, BaiSha County; YG, YingGeLing Nature Reserve; WZ, WuZhiShan Nature Reserve; CJ, ChangJiang County; YC, YangChun E'HuangZhang Nature Reserve; N, number of samples; Na, number of alleles; Ne, number of effective alleles; I, Shannon index; Ho, observed heterozygosity; He, expected heterozygosity; Np, number of private alleles; FIS, population inbreeding coefficient; PPL, percentage of polymorphic loci.

**Figure 1.** The correlation coefficient between population genetic diversity parameters and population size of *Michelia shiluensis*.

**Figure 2.** Genetic relation among six populations of *Michelia shiluensis* analyzed by STRUCTURE based on nuclear simple sequence repeat (nSSR) markers.

The AMOVA of the *M. shiluensis* populations revealed that most variance occurred within populations (89.92%, *p* < 0.001), with a smaller proportion among populations (10.18%, *p* < 0.001; Table 2).


**Table 2.** AMOVA result for six populations from nSSR markers and cpDNA haplotypes.

The NJ tree divided the 78 individuals from six accessions into two clusters (Figure 3). Forty-six individuals from DL and one from BS formed the green cluster, while three individuals from DL and all remaining individuals formed the red cluster. This grouping closely matches the STRUCTURE result.

**Figure 3.** Neighbor joining (NJ) tree among individuals was constructed by MEGA 7.0 using nSSR data.

The Mantel test (Figure 4) showed a significant correlation between genetic and geographic distance (*p* = 0.04, R = 0.66), indicating that genetic differentiation among populations increases with geographic distance. However, this correlation became non-significant when the YC population, which lies far from the main distribution area, was removed (*p* = 0.20, R = 0.33).
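The logic of a Mantel test can be sketched in a few lines of Python: correlate the upper-triangle cells of the genetic and geographic distance matrices, then recompute the correlation after jointly permuting the rows and columns of one matrix to build a null distribution. This is a generic permutation sketch with a one-sided p value, not GenAlEx's implementation, whose options and p-value conventions may differ; the function names are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=1):
    """Mantel test between two symmetric distance matrices (lists of lists).

    Returns (r, p): Pearson r over the upper-triangle cells and a one-sided
    p value from permuting the rows and columns of `gen` together.
    """
    n = len(gen)
    cells = list(combinations(range(n), 2))
    y = [geo[i][j] for i, j in cells]

    def r_for(order):
        return pearson([gen[order[i]][order[j]] for i, j in cells], y)

    r_obs = r_for(list(range(n)))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        if r_for(order) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

With only six populations there are 720 distinct permutations, so the attainable p values are coarse, which is one reason a single outlying population such as YC can change the significance of the result.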

**Figure 4.** Mantel test among populations of *Michelia shiluensis* from nSSR markers. (**a**) Includes YC. (**b**) Excludes YC.

#### *3.2. Genetic Diversity and Population Structure from cpDNA Marker*

The aligned sequences consist of 1142 bp from two chloroplast DNA regions: *trnH-psbA* (400 bp) and *trnK5′-matK* (742 bp). A total of 12 nucleotide substitutions were observed across these regions, defining six haplotypes (Table S5). Haplotype diversity was highest in YG, followed by YC and DL, and lowest in BS, WZ, and CJ. Within populations, Hd ranged from 0 to 0.800, and the overall Hd was 0.674. YG showed moderate nucleotide diversity (0.00333), while DL (0.00074) and YC (0.00117) showed lower values. Both Hd and Pi were zero in three populations (BS, WZ, and CJ), reflecting the absence of cpDNA variation among individuals within each of these populations (Table 3).


**Table 3.** Distribution of cpDNA haplotypes, Hd, and Pi among six populations of *Michelia shiluensis*.

Haplotype H3 was shared by all populations (Figure 5), whereas each of the remaining haplotypes was private to a single population: H1 and H2 to DL, H4 and H5 to YG, and H6 to YC. H3 occupied the central position in the haplotype network and was connected to each of the other five haplotypes by base substitutions. Based on its presence in all populations, we surmise that H3 may represent the ancestral haplotype of *M. shiluensis*.

The analysis of the ML tree (Figure 6) shows that the haplotypes H3, H4, and H5 comprised one cluster; H1 and H2 comprised another cluster; and H6 comprised a third cluster.

Total genetic diversity (HT = 0.856) was higher than the average within-population diversity (HS = 0.137). The permutation test showed that NST (0.728) was higher than GST (0.534; P < 0.05), indicating a clear phylogeographic structure among the populations.

**Figure 5.** Location of the study populations and chloroplast haplotype relationships and distribution in six accessions of *Michelia shiluensis*. The radius of the pie charts is proportional to the number of individual haplotypes in each population.

**Figure 6.** Maximum Likelihood (ML) tree based on six haplotypes of the cpDNA fragment with *Michelia odora* as an outgroup. The numbers are percentage values over 1000 bootstrap replicates. Only bootstrap values over 50% are shown.

AMOVA (Table 2) showed that 77.56% of cpDNA variation occurred among populations and 22.44% within populations.

The neutrality tests (Table 4) showed that both Tajima's D and Fu's FS were positive but not significant (*p* = 0.525 and 0.904, respectively). This result suggests that chloroplast DNA diversity was less affected by selection than nuclear DNA diversity. The mismatch distribution was multimodal (Figure 7) and did not support a recent population expansion, suggesting that the population may be in demographic equilibrium. The sum of squared deviations (SSD) and Harpending's raggedness index (HRag) were 0.059 (*p* = 0.25) and 0.190 (*p* = 0.41), respectively.
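Tajima's D compares two estimates of diversity: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a₁ = Σ 1/i. A positive D, as found here, indicates more intermediate-frequency variants than expected under neutrality and constant population size. The sketch below follows the standard formula for aligned, gap-free sequences; it is illustrative only (the study used Arlequin), and Fu's FS is not reproduced.

```python
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's (1989) D for aligned, gap-free sequences.

    Requires at least 4 sequences and at least one segregating site
    (the variance terms vanish for n <= 3).
    """
    n = len(seqs)
    seg_sites = sum(len(set(col)) > 1 for col in zip(*seqs))  # S
    pairs = list(combinations(seqs, 2))
    # mean number of pairwise differences (pi, per sequence, not per site)
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / variance ** 0.5
```

Sequences split evenly between two haplotypes give D > 0, whereas an excess of singleton variants gives D < 0.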


**Table 4.** Neutrality test and mismatch distribution from cpDNA fragments of *Michelia shiluensis*.

**Figure 7.** Multimodal plots from mismatch distribution of *Michelia shiluensis* analyzed by DnaSP.

The Mantel test (Figure 8) showed no significant correlation between genetic and geographic distance among *M. shiluensis* populations at the cpDNA level, either with YC included (R = 0.01, *p* = 0.96) or excluded (R = 0.63, *p* = 0.13).

**Figure 8.** Mantel test among six populations of *Michelia shiluensis* from cpDNA markers. (**a**) Includes YC. (**b**) Excludes YC.

#### **4. Discussion**

#### *4.1. High Level of Genetic Diversity*

The level of genetic diversity plays an important role in the adaptive evolution and long-term survival of species and populations [13–15]. Generally, widely distributed species harbor higher genetic diversity than those with restricted ranges [49,50]. However, many studies have shown that endangered species can maintain high genetic diversity despite having dispersed populations [17,51]. In this study, a relatively high degree of genetic variation was found in the sampled populations of *M. shiluensis*. The nSSR data showed a population-level He of 0.718, which is higher than that found in *Magnolia tomentosa* (He = 0.675) [52], *Michelia coriacea* (He = 0.47) [53], or *Magnolia wufengensis* (He = 0.184) [54], but lower than that of *Magnolia stellata* (He = 0.773) [9]. Notably, the He found in this study is considerably higher than that reported for narrow-ranging species (He = 0.420) and for widely distributed species (He = 0.620) [55]. Studies have shown that genetic diversity is positively correlated with population size [56–58]. Surprisingly, in this study, neither Ho (*p* = 0.953) nor He (*p* = 0.166) was significantly related to population size (Figure 1). Such an outcome has been reported in some special cases [59,60]. However, because population sizes in this study were relatively uniform, this result should be interpreted with caution. The cpDNA data show that overall nucleotide diversity was relatively low (0.00220), but haplotype diversity (Hd) was 0.674 (Table 3), higher than that of both *M. stellata* (Hd = 0.527) [9] and *Michelia maudiae* (Hd = 0.44) [27], but lower than that of *Michelia formosana* (Hd = 0.953) [61]. The high haplotype diversity further indicates that *M. shiluensis* is highly genetically diverse and has not been strongly affected by genetic drift [9]. The low nucleotide diversity may be due to highly conserved sequences and low substitution rates within Magnoliaceae [62].

Although the six sampled *M. shiluensis* groups form a single large population comprising several smaller subpopulations [63], genetic diversity within each subpopulation was relatively high. Possible reasons for this high genetic diversity are as follows. First, we speculate that gene exchange between *M. shiluensis* populations has historically been frequent, so the existing populations have inherited rich ancestral genetic variation [64]. Second, *M. shiluensis* is a long-lived perennial, so the recent sharp decline in the number of individuals has not yet increased inbreeding, which would reduce genetic diversity [65]. Third, *Michelia* species tend to cross-pollinate, and the resulting extensive gene flow can reduce the loss of genetic diversity [55,66,67].

#### *4.2. Lack of Genetic Differentiation between Populations*

The results of the analysis of molecular variance (AMOVA; Table 2) showed that genetic variation among *M. shiluensis* populations was 10.18% based on nSSR markers but 77.56% based on cpDNA markers. This discrepancy may be related to the different properties of the two marker types or to high levels of gene flow [68]. Comparing bi-parentally inherited markers with maternally inherited markers can provide comprehensive insights into population dynamics, because cpDNA mutations reflect past changes, whereas nSSR mutations support inferences about recent population events [69]. Thus, differences in inheritance patterns and rates of evolution often produce large contrasts between nuclear and organelle genetic diversity when populations are genetically differentiated [28].

According to Wright [70], genetic drift and differentiation may occur when Nm is less than 1. In this study, Nm was 1.546 (Table S4), suggesting that the *M. shiluensis* population has not yet undergone substantial genetic drift or any apparent degree of genetic differentiation. In terms of haplotype distribution (Figure 5), haplotype H3 occurs in all populations, which greatly reduces the level of differentiation between populations. In addition, the characteristic red seeds of Magnoliaceae are conspicuous and readily eaten by birds [71], so the seeds can be dispersed over long distances by flying birds, increasing the range of gene flow. For these reasons, we infer that extensive gene flow may account for the current low level of genetic differentiation and indistinct geographic structure in the *M. shiluensis* population [72].
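Wright's threshold can be made concrete with the island-model relation commonly used to convert FST into the number of migrants per generation, Nm = (1 − FST)/(4FST). Assuming that POPGENE's estimate follows this relation, the average FST of 0.139 reproduces the reported Nm to within rounding:

```python
def nm_from_fst(fst):
    """Migrants per generation under Wright's island model: Nm = (1 - FST) / (4 * FST)."""
    return (1.0 - fst) / (4.0 * fst)


nm = nm_from_fst(0.139)  # average FST across nSSR loci
print(round(nm, 3))      # prints 1.549, vs. 1.546 reported
print(nm > 1)            # prints True: above the one-migrant threshold
```

Because the relation rests on idealized island-model assumptions (equal population sizes, migration–drift equilibrium), such Nm values are best read as indicative rather than literal migration rates.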

Generally, the level of genetic differentiation between populations increases with geographic distance [73]. However, the two markers showed different patterns in the geographically isolated *M. shiluensis* accessions (Figures 4 and 8). The nSSR results showed that genetic distance increased significantly with geographic distance (*p* = 0.04), but this relationship was no longer significant after removal of the YC population (*p* = 0.20). The YC population is far from the main distribution area and isolated by a strait (Figure 5); we speculate that this isolation underlies the significant correlation we found. The cpDNA results showed no correlation between genetic and geographic distance, regardless of whether YC was included (*p* = 0.96) or removed (*p* = 0.13). This Mantel test result may reflect the fact that, within the primary distribution range of *M. shiluensis* on Hainan Island [23], groups are separated by less than 80 km, a distance insufficient to limit bird activity; the resulting pattern is consistent with a lack of geographical isolation [9]. In addition to bird-mediated seed dispersal, gravity is another agent of seed dispersal, which results in a spatially restricted genetic structure [9].

#### *4.3. Demographic History and Population Structure*

Population genetic structure reflects the genetic relationships within and between populations. The haplotype network suggests that *M. shiluensis* likely originated in central Hainan. Haplotypes H1 and H2 probably arose after colonization of DL, and haplotypes H4 and H5 after colonization of YG. Given the large number of individuals and the relatively complete age structure of the DL population, we conclude that H1 and H2 are dominant haplotypes; however, because few individuals remain in YG, it is impossible to determine which of its haplotypes is dominant. The nSSR results confirm the genetic distinctiveness of DL from the other Hainan groups, but suggest that the YG population may be very similar to BS, WZ, and CJ. The large number of private alleles in the DL population may reflect adaptation to an atypical environment [74] or a response to climate change; it may be related to local adaptation that this species developed in response to the low temperatures of the Cenozoic Era [75,76]. Notably, *M. shiluensis* has also been found outside Hainan Province. Studies have shown that Hainan Island was connected to Guangxi Province 65 million years ago [77]. In this study, STRUCTURE analysis of nSSR markers indicates that the YC population shares the same gene pool as BS, YG, WZ, and CJ, and analysis of cpDNA markers indicates that all sampled populations share the H3 haplotype. We therefore speculate that residual *M. shiluensis* may persist in the area around the border between Guangdong and Guangxi.

#### *4.4. Implication for Conservation*

One of the main goals in protecting endangered species is to maximize existing genetic variation [17], as genetic variation can greatly promote adaptation to a changing environment [78,79]. Because all of the sampled *M. shiluensis* groups showed similar levels of genetic diversity, all of these populations contribute importantly to the viability and protection of the species. Strengthening in situ conservation is an ideal approach, as it allows existing genetic resources to continue developing in an already suitable environment. According to our field investigation, nature reserves are well established for the DL, WZ, YG, and YC populations. Although no nature reserves have been established in BS and CJ, these populations have also received protective support from the local government. Protecting existing genetic resources is only a short-term goal; the medium-term goal is to establish a new generation to improve genetic diversity. Except in DL, the individuals in the remaining populations are mature trees with a high level of genetic diversity, which gives each individual conservation value. One strategy for propagating future generations is to collect seeds to raise seedlings and establish ex situ breeding populations [11]. In addition, germplasm can be propagated by grafting, and genetically dissimilar individuals can be crossed by artificial pollination to produce highly heterozygous seedlings. Great care is needed to avoid genetic contamination when transplanting seedlings back to their native habitat [80], as this could allow foreign genes to infiltrate the original population and potentially eliminate it through a lack of competitiveness. Finally, future research could build a niche model to explore potential habitats of *M. shiluensis*, with the aim of expanding its distribution range and generating more genetic variation through local adaptation.

#### **5. Conclusions**

In summary, our nSSR and cpDNA results indicate that high levels of population genetic diversity and low levels of genetic differentiation persist in the endangered *M. shiluensis*. We therefore infer that the fragmentation and isolation of *M. shiluensis* populations are more likely due to recent human disturbance than to inbreeding or genetic drift. Based on these data, we have proposed several conservation strategies for *M. shiluensis*.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1999-4907/11/4/469/s1, Table S1. Geographical and climatic information of sampled *Michelia shiluensis* populations; Table S2. Characterization of the eight SSR and two cpDNA primers used in this study; Table S3. Genetic diversity of eight SSR markers within six *Michelia shiluensis* populations; Table S4. F-statistics and Nm at nSSR loci in six *Michelia shiluensis* populations; Table S5. Variable sites of the cpDNA fragment in six *Michelia shiluensis* populations.

**Author Contributions:** Conceptualization, Y.D.; methodology, Y.D. and T.L.; software, Y.X.; validation, Y.D., T.L., Y.X., Y.W., M.G., Z.X., Y.S., and X.D.; formal analysis, T.L. and Y.X.; resources, Y.D., Y.W., Z.X., Y.S., and X.D.; writing—original draft preparation, Y.D.; writing—review and editing, Y.D.; supervision, X.D.; project administration, X.D.; funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by the Forestry Public Welfare Industry Research of China (Grant Number: 201404116), the National Natural Science Foundation of China (Grant Number: 31670601), and the Forestry Science and Technology Innovation of Guangdong Province grant programs (Grant Numbers: 2014KJCX006, 2017KJCX023).

**Acknowledgments:** We appreciate the help from Xinsheng Qin and forest rangers in each location during sampling. We are also grateful to Ye Sun and Xinsheng Hu for their invaluable guidance during the manuscript preparation.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Comparative Plastome Analyses and Phylogenetic Applications of the** *Acer* **Section** *Platanoidea*

### **Tao Yu 1, Jian Gao 2, Bing-Hong Huang 3, Buddhi Dayananda 4, Wen-Bao Ma 5, Yu-Yang Zhang 1, Pei-Chun Liao 3,\* and Jun-Qing Li 1,\***


Received: 17 March 2020; Accepted: 16 April 2020; Published: 19 April 2020

**Abstract:** *Acer* L. (Sapindaceae) is one of the most diverse and widespread genera in the Northern Hemisphere. Section *Platanoidea* harbours high genetic and morphological diversity and shows phylogenetic conflict between *A. catalpifolium* and *A. amplum*. Chloroplast (cp) genome sequencing is an efficient way to improve the understanding of phylogenetic relationships and to support taxonomic revision. Here, we report the complete cp genomes of five species of *Acer* sect. *Platanoidea*. These cp genomes ranged from 156,262 bp to 157,349 bp in length, and structural variation was detected at the inverted repeat (IR) boundaries. A sliding window analysis identified five relatively highly variable regions (*trnH*-*psbA*, *psbN*-*trnD*, *psaA*-*ycf3*, *petA*-*psbJ* and the *ndhA* intron) with high potential for developing effective genetic markers. Moreover, with the addition of eight plastomes obtained from GenBank, we reconstructed a robust phylogenetic tree of *Acer* sect. *Platanoidea* with high resolution at nearly all nodes, suggesting a promising opportunity to resolve infrasectional relationships in *Platanoidea*, the most species-rich section of *Acer*.

**Keywords:** *Acer*; sect. *Platanoidea*; chloroplast genome; sequence divergence; structural variation; phylogenetics

#### **1. Introduction**

Chloroplasts (cp) are essential organelles in plant cells for photosynthesis and carbon fixation [1]. They are uniparentally inherited, and their genome structure is highly conserved in most land plants [2]. Generally, cp genomes in most angiosperms are circular DNA molecules composed of four parts, namely two inverted repeats (IRs) of approximately 20–28 kb, a large single-copy region (LSC) of 80–90 kb and a small single-copy region (SSC) of 16–27 kb [3]. Angiosperm cp genomes are relatively conserved in composition and encode four ribosomal RNAs (rRNAs), roughly 30 transfer RNAs (tRNAs) and approximately 80 single-copy genes [4]. With the rapid development of next-generation sequencing (NGS) and other methods for obtaining cp genome sequences, the availability of cp genome sequences has increased dramatically for land plants, offering opportunities for comprehensive structural comparison, improvement of horticultural plant breeding [5,6] and reconstruction of evolutionary relationships [7,8]. In most angiosperms, the cp genome is maternally inherited and exhibits little or no recombination [9]. Because of these relatively conserved features, cp sequences are commonly used as DNA barcodes for genetic identification, plant systematic studies, and research into plant biodiversity, biogeography, adaptation, etc. [10,11].

*Acer* L. (maple), a diverse genus of the family Sapindaceae, contains more than 124 species [12]. Most extant species of the genus are native to Asia, whereas others occur in North America, Europe and North Africa [12–14]. Most *Acer* species are well-known ornamental plants [13], and some are also used in pharmaceutical and chemical products [15]. To date, 17 species of *Acer* section *Platanoidea*, a section with high genetic and morphological diversity, have been recognised in China [12]. Among them, four species are widespread across various vegetation regions (*A. amplum*, *A. longipes*, *A. mono* and *A. truncatum*) [16], and some are endangered, such as *A. catalpifolium*, *A. miaotaiense* and *A. yangjuechi*.

However, some species with diverse leaf morphology remain taxonomically controversial because their phylogenetic relationships are unresolved (for example, *A. longipes* and *A. amplum*), and they require further study and clarification [17]. Comparative plastome analysis provides detailed insights into the phylogenetic placement of these plants and is useful for species identification, verification of taxonomic ranks and identification of phylogenetic relationships [8,18]. Recently, the cp genomes of *A. miaotaiense* and *A. truncatum* have been reported, but only the sequences were provided, without further analyses [19,20]. Here, we compare these two published cp genomes with five newly generated plastomes of sect. *Platanoidea* to address questions of phylogenetic and taxonomic validity.

First, we report newly completed cp genomes of five species of *Acer* sect. *Platanoidea* (*A. catalpifolium*, *A. amplum*, *A. longipes*, *A. yangjuechi* and *A. mono*). We then compare their gene content and plastome organisation with the two published cp genomes in sect. *Platanoidea* to identify variable loci that can be applied to species- or population-level studies of *Acer*. The aims of this study are: (i) to deepen our understanding of the genetic and structural diversity within sect. *Platanoidea*, (ii) to increase our understanding of the phylogenetic relationships of species within sect. *Platanoidea*, and (iii) to reconstruct a phylogenetic tree based on these plastomes. Our study also provides genetic resources for future research on this genus.

#### **2. Materials and Methods**

#### *2.1. Sampling and DNA Extraction*

Young leaves of five *Acer* species (*A. catalpifolium*, *A. amplum*, *A. longipes*, *A. yangjuechi* and *A. mono*) were collected and immediately dried in silica gel for preservation; for each species, leaves were collected from one healthy plant. Collection information for the plant materials is shown in Table S1. Voucher specimens were taxonomically identified by the Beijing Forestry University herbarium and deposited at the College of Forestry, Beijing Forestry University, China. Total genomic DNA was isolated from the silica gel-dried leaves using a modified CTAB method [21].

#### *2.2. Chloroplast Genome Sequencing, Assembling and Annotation*

Purified genomic DNA was sequenced on an Illumina MiSeq sequencer at Shanghai OE Biotech. Co., Ltd. A paired-end library with an insert size of 300 bp was constructed for each species, yielding at least 8 Gb of 150-bp paired-end reads. Clean reads were obtained with NGS QC Toolkit v2.3.3 [22] using the following settings: cut-off read length for high-quality (HQ) bases = 70%, cut-off quality score = 20, and trimming of 3 bases from the 5′ end and 7 bases from the 3′ end of each read. We used MITObim v. 1.8 [23] to assemble the five new *Acer* cp genomes, with the cp genomes of *A*. *miaotaiense* (KX098452) [19], *A*. *davidii* (KU977442) [24] and *A*. *morrisonense* (KT970611) [25] as references. Genes were annotated using DOGMA [26], and protein-coding genes (PCGs), tRNAs and rRNAs were identified and verified with BLAST searches (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
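The read-cleaning settings above can be illustrated with a small Python sketch. It is a hypothetical reimplementation of the parameter values, not NGS QC Toolkit code: it assumes Phred+33 quality encoding and assumes that the 70% high-quality filter is applied before the fixed 5′ and 3′ trimming; the toolkit's exact order of operations and edge-case handling may differ.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def qc_read(seq, qual, min_q=20, min_hq_fraction=0.70, trim5=3, trim3=7):
    """Return the trimmed (seq, qual) pair if the read passes, else None.

    Hypothetical sketch: keep the read only if at least 70% of its bases have
    Phred >= 20, then remove 3 bases from the 5' end and 7 from the 3' end.
    """
    scores = phred_scores(qual)
    high_quality = sum(q >= min_q for q in scores)
    if high_quality < min_hq_fraction * len(scores):
        return None
    end = len(seq) - trim3
    return seq[trim5:end], qual[trim5:end]
```

Applied to a 150-bp read that passes the filter, this leaves 140 bp for assembly.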

#### *2.3. Divergence Hotspot Identification*

To determine the level of divergence, the multiple sequence alignment program MAFFT [27] was used to align the cp sequences of seven *Acer* sect. *Platanoidea* species, and a sliding-window analysis of nucleotide variability (Pi) was then conducted in DnaSP 5.0 with a 600-bp window length and a 200-bp step size [28].
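A sliding-window scan of this kind can be sketched in Python: slide a 600-bp window along the alignment in 200-bp steps and compute Pi (mean pairwise differences per site) within each window. This is an illustrative sketch rather than DnaSP's algorithm; in particular, it treats gap characters as an ordinary state, whereas DnaSP handles gaps separately.

```python
from itertools import combinations


def sliding_window_pi(alignment, window=600, step=200):
    """Nucleotide diversity (Pi) in sliding windows along an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Returns (start, end, pi) tuples with 1-based inclusive window coordinates.
    Simplification: gap characters are compared like any other state.
    """
    length = len(alignment[0])
    pairs = list(combinations(range(len(alignment)), 2))
    results = []
    for start in range(0, length - window + 1, step):
        win = [seq[start:start + window] for seq in alignment]
        diffs = sum(sum(a != b for a, b in zip(win[i], win[j])) for i, j in pairs)
        results.append((start + 1, start + window, diffs / len(pairs) / window))
    return results
```

Plotting Pi against window position and flagging the peaks is how hotspots such as *trnH*-*psbA* stand out in a scan like Figure 3.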

#### *2.4. Phylogenomic Reconstruction*

To determine the phylogenetic relationships of *Acer* sect. *Platanoidea*, we performed phylogenetic analyses using 13 cp genome sequences: five plastomes generated in this study, six *Acer* plastomes obtained from GenBank, and two *Dipteronia* plastomes as an outgroup (Table S2). The sequences were aligned using MAFFT [27], giving a total alignment length of 160,886 bp. The best-fitting substitution model (GTR + I + G) was selected using Modeltest 3.7 [29]. Phylogenomic relationships were then reconstructed with Bayesian Inference (BI) and Maximum Likelihood (ML) using MrBayes 3.2 [30] and PhyML v3.0 [31], respectively. For the BI tree, two parallel Markov chain Monte Carlo (MCMC) runs of 10 million generations were performed, sampling every 1000 generations; the first 25% of sampled trees were discarded as burn-in, and the remainder were used to generate a consensus tree. For the ML tree, 1000 bootstrap replicates were performed to evaluate node support.
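The ML bootstrap rests on a simple resampling step, sketched below under the usual nonparametric scheme: each pseudo-replicate alignment is built by drawing alignment columns with replacement until it has the original length. Each replicate would then be analysed with the same ML settings, and a node's support is the percentage of replicate trees containing it. This sketch shows only the resampling and uses a toy alignment; it is not PhyML's internal code.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap pseudo-replicate: resample alignment
    columns with replacement, keeping the original alignment length."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


rng = random.Random(42)
alignment = ["ACGTACGT", "ACGTTCGT", "ACCTACGA"]  # toy alignment, not study data
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
```

Because whole columns are resampled, each pseudo-replicate keeps the site patterns of the real data and varies only their frequencies.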

#### **3. Results and Discussion**

#### *3.1. Chloroplast Genome Organisation of the Acer sect. Platanoidea*

The seven *Acer* sect. *Platanoidea* cp genomes ranged from 156,262 bp to 157,349 bp in length (Figure 1, Table 1), similar to other reported *Acer* cp genomes [19,25]. Variation in cp genome length is mainly due to differences in LSC length. These cp genomes share the quadripartite structure found in most angiosperms, comprising an LSC region, an SSC region and two inverted repeat regions (IRa and IRb) [32]. The overall GC content of these cp genomes is 37.9%, and the GC content of the IR regions (42.8%) is higher than that of the LSC (35.9%) and the SSC (32.3%). The new sequences contain 117 genes, including four unique rRNAs, 31 tRNAs and 82 PCGs (Table 2). Most of these genes are single-copy, while 23 genes are duplicated: four rRNAs (*4.5S*, *5S*, *16S* and *23S* rRNA), nine tRNAs (*trnA-UGC*, *trnI-CAU*, *trnI-GAU*, *trnL-CAA*, *trnM-CAU*, *trnN-GUU*, *trnR-ACG*, *trnT-GGU* and *trnV-GAC*) and 10 PCGs (*ndhB*, *rpl2*, *rps12*, *rpl23*, *rps19*, *rps7*, *ycf1*, *orf42*, *ycf2* and *ycf15*). Additionally, a total of 18 genes contain introns, and three genes (*ycf3*, *clpP* and *rps12*) contain two introns.
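Regional GC content is a straightforward calculation once the region boundaries are known. The sketch below assumes the conventional LSC | IRb | SSC | IRa layout and takes the boundary coordinates as hypothetical inputs (the text does not state which tool computed GC content). The higher IR value is commonly attributed to the GC-rich rRNA genes located in the IRs.

```python
def gc_content(seq):
    """Fraction of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt


def regional_gc(genome, lsc_end, ssc_start, ssc_end):
    """GC content per region of a quadripartite plastome laid out as
    LSC | IRb | SSC | IRa, with 0-based, end-exclusive boundaries."""
    lsc = genome[:lsc_end]
    irb = genome[lsc_end:ssc_start]
    ssc = genome[ssc_start:ssc_end]
    ira = genome[ssc_end:]
    return {
        "LSC": gc_content(lsc),
        "IR": gc_content(irb + ira),  # both IR copies pooled
        "SSC": gc_content(ssc),
        "overall": gc_content(genome),
    }
```

Because both IR copies enter the overall figure, the GC-rich IRs pull the whole-genome value (37.9%) above that of the LSC (35.9%) and SSC (32.3%).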


**Table 1.** General features of the *Acer* sect. *Platanoidea* chloroplast genomes compared in this study.

**Figure 1.** Merged gene map of the complete chloroplast genomes of five *Acer* sect. *Platanoidea* species. Genes belonging to different functional groups are colour-coded. The genes drawn inside the circle are transcribed clockwise, while those outside are transcribed counter-clockwise. Darker grey in the inner circle corresponds to the GC content of the chloroplast genome.

**Table 2.** Genes presented in the *Acer* sect. *Platanoidea* chloroplast genome.


Note: a single asterisk (\*) after a gene name indicates an intron-containing gene, and a double asterisk (\*\*) indicates a gene with two introns.

#### *3.2. Comparative Analysis of the Genomic Structure*

Contraction and expansion of the IRs, LSC and SSC are important in cp genome evolution [33,34] and are the leading cause of changes in gene order and cp genome size [35]. Detailed structural comparisons among the 13 cp genomes of the *Acer* species are presented in Figure 2. In most of these cp genomes, including those of *A. mono*, *A. amplum*, *A. catalpifolium*, *A. yangjuechi*, *A. longipes*, *A. morrisonense*, *A. davidii* and *A. griseum*, *rpl22* was located at the LSC/IR boundary and *rps19* was the last gene. By contrast, in *A. truncatum*, *A. miaotaiense* and *A. buergerianum*, *rps19* was located at the LSC/IR boundary and *rpl2* was the last gene. Changes in the order of *rpl22* and *rps19* at the IRa/LSC border have also been observed among Arecoideae species, where they have become the most varied rearrangement in that subfamily [36]. Similarly, this rearrangement has been reported in Apiales species, which also show two structural types of *rpl23* and *rps19* at the LSC/IR boundary [37]. Length variation among the *Acer* cp genomes results from contraction and expansion of the LSC and IRs: LSC length varies from 85,379 bp to 86,327 bp, and IR length from 26,085 bp to 26,769 bp (Figure 2). Moreover, the cp genomes with the distinct border structure (*A. truncatum*, *A. miaotaiense* and *A. buergerianum*) also show contracted IRs, which suggests that variation in boundary genes may be caused by variation in region length.
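Identifying which gene straddles a region junction, as in the *rpl22* versus *rps19* comparison above, reduces to an interval test on annotation coordinates. The sketch below uses hypothetical gene coordinates purely for illustration; it ignores strand and the circular wrap-around of the genome.

```python
def genes_at_junction(genes, junction):
    """Return the names of genes whose span crosses a region junction.

    genes: list of (name, start, end) with 1-based inclusive coordinates.
    junction: position of the last base of the upstream region (e.g. the LSC end),
    so a gene crosses the junction if start <= junction < end.
    """
    return [name for name, start, end in genes if start <= junction < end]


# Hypothetical annotation near an LSC/IRb junction (coordinates invented)
genes = [("rpl22", 85000, 85400), ("rps19", 85420, 85700), ("rpl2", 85750, 87250)]
```

Here `genes_at_junction(genes, 85379)` returns `["rpl22"]`; if the IR expanded so that the junction moved to position 85600, the same call would return `["rps19"]`, mirroring the two boundary types described for *Acer*.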

**Figure 2.** Comparison of the junction sites of the LSC, IRs and SSC regions among 13 Acer sect. Platanoidea species chloroplast genomes. Coloured boxes indicate genes; boxes above the genome lines are transcribed in the forward direction, and boxes below the lines in the reverse direction. The lengths of the LSC, SSC and IR regions are shown in red, the distances of genes from the boundaries in green, and gene lengths in blue; P denotes a pseudogene.

#### *3.3. Divergence Hotspot of the Acer sect. Platanoidea Species*

Sliding window analysis of the whole cp genome was performed to identify divergence hotspots among the *Acer* sect. *Platanoidea* species. As shown in Figure 3, nucleotide variability was higher in *trnH*-*psbA*, *psbN-trnD*, *psaA-ycf3*, *petA-psbJ* and the *ndhA* intron than in other regions. Most divergent hotspot loci are located in the LSC region, which facilitates the design of genetic markers. Only one hotspot, the *ndhA* intron, was located in the SSC region, and the IR regions were much more conserved. This result is similar to those reported for other cp genomes [37,38]. The widely used barcode *trnH*-*psbA* shows extreme variation in many plant groups [39,40]; as the most variable region here, it has potential for DNA barcoding in the *Acer* sect. *Platanoidea*. Additionally, the evolutionary history of *A. mono* has been inferred using *psbA-trnH*, *trnL*-*trnF* and an intron of *rpl16* [16]. The *psaA-ycf3* and *petA-psbJ* regions and the *ndhA* intron have also been reported as highly variable in previous studies. In witch-hazel (genus *Hamamelis* L., Hamamelidaceae), variation in *psaA-ycf3* was used to reconstruct phylogenetic relationships [41]. In *Scutellaria*, *petA-psbJ* was one of six fast-evolving DNA sequences in the cp genome [42], while a systematic study showed that Muhlenbergiinae has high variation at the *ndhA* intron [43].
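The idea behind a sliding-window diversity scan such as the one in Figure 3 can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and computes Nei's nucleotide diversity (mean pairwise differences per site) in each window. The function names, the default window length and step, and the choice to skip any column containing a gap or ambiguous base are all assumptions made for illustration; they are not the settings or software used in this study.

```python
from itertools import combinations

VALID = set("ACGT")


def nucleotide_diversity(seqs):
    """Nei's pi: mean number of pairwise differences per site.

    Only alignment columns in which every sequence has A, C, G or T are used;
    skipping gapped or ambiguous columns is a simplifying assumption.
    """
    seqs = [s.upper() for s in seqs]
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    usable = [i for i in range(len(seqs[0])) if all(s[i] in VALID for s in seqs)]
    if not usable:
        return 0.0
    diffs = sum(
        sum(1 for i in usable if a[i] != b[i]) for a, b in combinations(seqs, 2)
    )
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    return diffs / (n_pairs * len(usable))


def sliding_window_pi(seqs, window=600, step=200):
    """Return (start, end, pi) per window (0-based, end exclusive).

    Windows start every `step` sites; a trailing stretch shorter than `window`
    is not scanned separately. Default sizes are illustrative only.
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start, end, nucleotide_diversity([s[start:end] for s in seqs])))
    return results


toy = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAC"]  # synthetic alignment, not real data
for start, end, pi in sliding_window_pi(toy, window=5, step=5):
    print(start, end, round(pi, 4))  # each window: pi = 2/15
```

Windows whose pi stands well above the genome-wide background would correspond to the peaks reported as hotspots. Dedicated tools handle gaps, missing data and window placement in their own ways.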

**Figure 3.** Sliding-window analysis of the cp genomes of seven *Acer* sect. *Platanoidea* species.

The endangered *A. catalpifolium*, *A. yangjuechi* and *A. miaotaiense* in the *Acer* sect. *Platanoidea* have small population sizes [44]. Population genetic studies of these species remain scarce, and conservation goals are limited [45,46]. The highly variable regions identified here can serve as candidate loci for subsequent studies and will contribute to the conservation of these endangered plants.

#### *3.4. Phylogenetic Analysis*

The phylogenomic inference based on cp genomes shows high bootstrap support at most nodes, providing a robust placement of the *Acer* species and their relationships (Figure 4). The seven sampled species of the *Acer* sect. *Platanoidea* formed a single clade, which is consistent with previous studies of *Acer* phylogeny [14,47]. Within this clade, *A. catalpifolium*, *A. mono*, *A. miaotaiense* and *A. yangjuechi* were most closely related and formed one subclade, while *A. truncatum*, *A. longipes* and *A. amplum* formed another. Previous phylogenetic inferences of *Acer* did not include species with small population sizes in the sect. *Platanoidea* (such as *A. catalpifolium* and *A. yangjuechi*) [14,47]. The phylogenetic analysis in this study addresses the issue of the *A. longipes* and *A. amplum* complex in the sect. *Platanoidea* raised earlier [17], and also redefines the phylogenetic position of *A. catalpifolium*.

It is somewhat surprising that *A. mono* is sister not to *A. truncatum* but to *A. yangjuechi*, which differs from some published studies [13]. Since *A. mono* is widespread in Asia and comprises multiple local varieties, we cannot rule out the possibility that the grouping of *A. mono* and *A. yangjuechi* reflects the adjacent sampling localities of the two species in Lin'an, Zhejiang Province, China, the only extant habitat of *A. yangjuechi* [48]. Liu et al. [49] showed that the composition of the Lin'an population differs significantly from that of neighbouring *A. mono* populations. The clustering of *A. mono* and *A. yangjuechi* inferred in this study may therefore reflect geographic genetic divergence within *A. mono*, and may also imply chloroplast capture through ancient hybridisation events between the two species in Lin'an.

**Figure 4.** Phylogenetic tree of 13 *Acer* species inferred by Maximum Likelihood (ML) and Bayesian Inference (BI) methods, based on the whole cp genome sequences. The numbers above the branches are the bootstrap values of ML methods and the posterior probabilities of BI.

#### **4. Conclusions**

In this study, we report for the first time the complete cp genomes of five *Acer* sect. *Platanoidea* species (*A. catalpifolium*, *A. amplum*, *A. longipes*, *A. yangjuechi* and *A. mono*), obtained using next-generation sequencing (NGS). Comparison with other *Acer* species published in NCBI showed that *Acer* species have similar cp genome structure and gene content. The divergence hotspots identified in the cp genomes of the *Acer* sect. *Platanoidea* could be used to develop molecular markers for further population genetic studies. High variation at the IR/LSC and IR/SSC boundaries was also observed. The phylogenetic analysis strongly supported *A. catalpifolium* as most closely related to *A. miaotaiense*, followed by *A. mono* and *A. yangjuechi*, and confirmed the species-complex relationship of *A. longipes* and *A. amplum*. The genomic data presented in this paper provide a basis for further research on the evolutionary history and conservation genetics of endangered species of the genus *Acer*.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/4/462/s1, Table S1: General features of the *Acer* sect. *Platanoidea* chloroplast genomes compared in this study. Table S2: Acer taxa sampled in this study.

**Author Contributions:** T.Y. and J.G. conceived and designed the work; J.G. and Y.-Y.Z. collected samples; T.Y., J.G. and B.-H.H. performed the experiments and analysed the data; T.Y. and J.G. wrote the manuscript; P.-C.L., W.-B.M., B.D. and J.-Q.L. critically reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by the program "Reintroduction Technologies and Demonstration of Extremely Rare Wild Plant Population" of National Key Research and Development Program (2016YFC0503106) to J.Q.L., the Ministry of Science and Technology of Taiwan (grant number: 108-2628-B-003-001) and National Taiwan Normal University (NTNU) to P.C.L. and "The biogeographical feature and competitive hybridization of Maple (*Acer* L.) in East Asia" of National Natural Science Foundation of China (41901063) to J.G.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Complete Chloroplast Genome of** *Michelia shiluensis* **and a Comparative Analysis with Four Magnoliaceae Species**

**Yanwen Deng <sup>1,2</sup>, Yiyang Luo <sup>1,2</sup>, Yu He <sup>1,2</sup>, Xinsheng Qin <sup>2</sup>, Chonggao Li <sup>3</sup> and Xiaomei Deng <sup>1,2,\*</sup>**


Received: 7 February 2020; Accepted: 25 February 2020; Published: 27 February 2020

**Abstract:** *Michelia shiluensis* is a rare and endangered magnolia species found in South China. This species produces beautiful flowers and is thus widely used in landscape gardening; its timber is also used for furniture production. As a result of low rates of natural reproduction and increasing levels of human impact, wild *M. shiluensis* populations have become fragmented, and the species is now classified as endangered by the IUCN. In the present study, we characterized the complete chloroplast genome of *M. shiluensis* and found it to be 160,075 bp in length, with two inverted repeat regions (26,587 bp each), a large single-copy region (88,105 bp), and a small single-copy region (18,796 bp). The genome contained 131 genes, including 86 protein-coding genes, 37 tRNAs, and 8 rRNAs. The guanine-cytosine content represented 39.26% of the overall genome. Comparative analysis revealed high similarity between the *M. shiluensis* chloroplast genome and those of four closely related species: *Michelia odora*, *Magnolia laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*. Phylogenetic analysis showed that *M. shiluensis* is most closely related to *M. odora*. The genomic information presented in this study is valuable for further classification and phylogenetic studies and supports ongoing conservation efforts.

**Keywords:** Hainan Province; endemic species; conservation; codon usage; sequence divergence; phylogeny

#### **1. Introduction**

*Michelia shiluensis* Chun and Y. F. Wu (Magnoliaceae) is an endangered flowering plant that is sparsely distributed throughout Hainan Province, China [1]. It is characterized by leafy branches and beautiful flowers and is therefore widely used in landscape gardening [2]. This species is also a source of excellent-quality wood that is in demand for furniture production [3]. In recent decades, wild populations of this species have declined seriously as a result of illegal harvesting to supply both the timber and horticultural markets [4]. Moreover, this species naturally has a low seeding rate and its wild populations are declining [5]. Consequently, *M. shiluensis* is categorized as a Class II National Key Protected Species in China [6] and is considered endangered (EN) by the International Union for Conservation of Nature [7]. Currently, most studies on *M. shiluensis* have focused on its use in landscape gardening and its protection in China [5]; however, evolutionary and phylogenetic research on this species is lacking.

The chloroplast is an important plant organelle with its own genome (hereafter, cp genome) that participates in photosynthesis and other functions [8]. The cp genome of most land plants has a circular structure comprising four segments: a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeats (IRs) [9]. Although the cp genome is generally conserved, it has undergone intra- and inter-species rearrangement during evolution [10,11], including IR expansion and contraction. Information obtained from sequence rearrangements can be applied in phylogenetic analyses to solve taxonomic problems, such as low-level classifications, using genome comparison [12–17]. In the section *Michelia*, complete cp genomes have been reported only for *Magnolia alba* (NC037005), *Magnolia laevifolia* (NC035956), and *Michelia odora* (NC023239). Because morphology is similar among Magnoliaceae species [18], analysis of the cp genomes of other *Michelia* plants is needed.

In the present study, we characterized the cp genome of *M. shiluensis* and compared its sequence features with those of four closely related species (*M. odora*, *M. laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*). The phylogenetic relationships among 28 Magnoliaceae species were reconstructed based on 79 protein-coding gene (PCG) sequences and show that *M. shiluensis* is most closely related to *M. odora*.

#### **2. Materials and Methods**

#### *2.1. Plant Material and DNA Extraction*

Fresh leaves of *M. shiluensis* were collected in the South China Botanical Garden (113°21′ E, 23°10′ N), China, and transported to the laboratory at the South China Agricultural University. Total genomic DNA was isolated from the leaves using the CTAB method [19].

#### *2.2. Genome Sequencing and Annotation*

An Illumina shotgun library was established according to the manufacturer's protocol, and high-throughput sequencing was conducted using the HiSeq X TEN platform (Illumina, San Diego, CA, USA). After filtration using SOAPnuke [20], 4.93 GB of clean data were generated. Filtered reads were assembled de novo using SPAdes (version 3.10.1) [21] by referencing them against the cp genome sequence of *M. odora* (NC037005.1) using BLAST v2.2.30 (National Center for Biotechnology Information, Bethesda, MD, USA). Gene annotation was performed using GeSeq [22]. The cp genome map was generated using Organellar Genome DRAW (version 1.2) [23]. The annotated sequence was submitted to GenBank (accession number MN418056).

#### *2.3. Sequence and Repeat Analysis*

We used the Editseq v7.1.0 [24] software to calculate the guanine-cytosine (GC) content. MEGA v7.0.26 [25] was used to generate the relative synonymous codon usage (RSCU) values based on 79 PCGs. RNA editing sites in PCGs were predicted using the PREP suite [26] with the default settings.
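For readers unfamiliar with RSCU, the standard definition divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally. An RSCU of 1 therefore means no bias. The Python sketch below is only an illustration of that definition and is not MEGA's implementation. It assumes CDS strings that are already in frame and uses the standard genetic code (written with T in place of U). The function name `rscu` and the toy input are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), built from the compact TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(cds_list):
    """RSCU per codon = observed count / mean count of that amino acid's codons."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for p in range(0, len(cds) - 2, 3):
            codon = cds[p:p + 3]
            if codon in CODON_TABLE:  # skip codons containing ambiguous bases
                counts[codon] += 1

    synonyms = defaultdict(list)  # amino acid ('*' = stop) -> its codons
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)

    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined
        expected = total / len(codons)  # count expected under uniform usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


values = rscu(["ATGTTATTATTGTAA"])  # toy CDS, not real data
print(values["TTA"], values["TTG"], values["ATG"])  # 4.0 2.0 1.0
```

In the toy example the three leucine codons are two TTA and one TTG. The expected count per leucine codon is therefore 3/6 = 0.5, which gives RSCU values of 4.0 and 2.0. The single-codon amino acid methionine (ATG) always scores 1.0, which matches the observation in the results that AUG shows no codon bias.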

The REPuter [27] online service was used to identify repeats (forward, reverse, complement, and palindromic) in the cp genome with default parameters. Chloroplast simple sequence repeats (cpSSRs) were identified using MISA-web [28] with minimal repeat numbers of 8, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats, respectively.
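The SSR search criterion stated above can be made concrete with a regular-expression sketch. A motif of length 1 to 6 counts as an SSR when it occurs in tandem at least 8, 5, 4, 3, 3 or 3 times, respectively. This is a simplification written for illustration, not MISA-web's code. It ignores compound SSRs and MISA's interruption rules. It also discards motifs that are themselves repeats of a shorter unit (for example, AT-AT is reported as an AT repeat, not a tetranucleotide). The function names are hypothetical.

```python
import re

# Minimum number of tandem units per motif length, as stated in the text.
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True unless `unit` is a repeat of a shorter motif (e.g. 'ATAT' is not)."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, n_units) tuples, 0-based, end exclusive."""
    seq = seq.upper()
    hits = []
    for k, n_min in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n_min - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if not is_primitive(unit):
                continue  # already reported under its shorter motif
            hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // k))
    return sorted(hits)


toy = "GC" + "A" * 10 + "GC" + "AT" * 6 + "GC" + "AAG" * 4 + "C"  # synthetic sequence
for start, end, unit, n in find_ssrs(toy):
    print(start, end, unit, n)
```

On the synthetic sequence this reports an (A)10 mononucleotide, an (AT)6 dinucleotide and an (AAG)4 trinucleotide repeat. The length and base-composition summaries given in the results can then be tallied from such hits.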

#### *2.4. Genome Comparison and Sequence Divergence*

Comparisons among the five Magnoliaceae cp genomes were visualized using the online mVISTA software [29] in Shuffle-LAGAN mode, with the annotation of *M. shiluensis* as the reference. The borders between the four regions of the five Magnoliaceae cp genomes were visualized using IRscope [30]. Nucleotide diversity (Pi) and the rates of nonsynonymous (dN) and synonymous (dS) substitutions were calculated using DnaSP v6.12.03 [31] to assess sequence divergence and to identify genes potentially under selection pressure.
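The two quantities estimated with DnaSP can be summarized by their textbook definitions. These are given here only as a reference sketch: the text does not state how DnaSP treats gaps and missing data, or which dN/dS estimator it applies. For $n$ aligned sequences, let $d_{ij}$ be the number of nucleotide differences between sequences $i$ and $j$ over $L$ compared sites. Nei's nucleotide diversity and the selection ratio $\omega$ are then

$$
\pi = \binom{n}{2}^{-1} \sum_{i<j} \frac{d_{ij}}{L}, \qquad \omega = \frac{d_N}{d_S},
$$

where $\omega < 1$ is conventionally read as purifying (negative) selection, $\omega \approx 1$ as neutral evolution, and $\omega > 1$ as positive selection.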

#### *2.5. Phylogenetic Analysis*

To investigate the phylogenetic relationships among Magnoliaceae species, a maximum likelihood tree was constructed using RAxML [32] with 1000 bootstrap replicates, based on the PCG sequences of 28 Magnoliaceae cp genomes. All 28 Magnoliaceae cp genome sequences were downloaded from the NCBI nucleotide database.

#### **3. Results**

#### *3.1. Structures and Features of M. shiluensis Chloroplast Genome*

The complete cp genome of *M. shiluensis* was 160,075 bp in length and comprised two IR regions of 26,587 bp each, separated by an LSC region of 88,105 bp and an SSC region of 18,796 bp. The cp genome had the following base proportions: adenine (A), 29.99%; thymine (T), 30.75%; cytosine (C), 19.98%; and guanine (G), 19.28%. The total GC content was therefore 39.26%. The GC content of the LSC, SSC, and IR regions was 37.95%, 34.28%, and 43.20%, respectively (Table 1).
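The reported figures are internally consistent, and this can be checked with a few lines of Python. In a quadripartite genome the total length is LSC + SSC + 2 × IR, and GC content is the G plus C share of the bases. The helper names below are hypothetical. `gc_percent` counts only unambiguous A/C/G/T bases, which is an assumption for illustration rather than a description of EditSeq's behaviour.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def gc_percent(seq):
    """GC content (%) over unambiguous bases only (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Values reported above for M. shiluensis.
print(quadripartite_length(lsc=88_105, ssc=18_796, ir=26_587))  # 160075
base_pct = {"A": 29.99, "T": 30.75, "C": 19.98, "G": 19.28}
print(round(base_pct["G"] + base_pct["C"], 2))  # 39.26
print(round(sum(base_pct.values()), 2))  # 100.0
```

The sum 88,105 + 18,796 + 2 × 26,587 reproduces the 160,075 bp total. The G and C proportions add up to the stated 39.26%.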


**Table 1.** Summary of the chloroplast genomes of *Michelia shiluensis* and four closely related species (*Michelia odora*, *Magnolia laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*).

Notes: GC: Guanine-cytosine; LSC: Large single-copy; IR: Inverted repeat; SSC: Small single-copy; PCG: Protein-coding gene.

The cp genome of *M. shiluensis* contained 113 unique genes, including 79 PCGs, 30 tRNAs, and four rRNAs (Table 1, Figure 1). A total of 58 genes were involved in self-replication: 12 genes encoded small ribosomal subunit proteins, eight encoded large ribosomal subunit proteins, 30 encoded tRNAs, four encoded rRNAs, and four encoded RNA polymerase subunits. A total of 44 genes were involved in photosynthesis, including six genes for ATP synthase, 11 for NADH dehydrogenase, six for the cytochrome b/f complex, five for photosystem I, 15 for photosystem II, and one for the large chain of Rubisco. In total, 18 genes were duplicated in the cp genome of *M. shiluensis*, including seven PCGs, seven tRNA genes, and four rRNA genes, all of which were located in the IR region (Table 2). None of the genes contained internal stop codons in their coding sequences; therefore, no pseudogenes were detected.

**Figure 1.** Gene map of the *Michelia shiluensis* chloroplast genome. Genes on the outside of the circle are transcribed counter-clockwise, while genes on the inside are transcribed clockwise. Different colors represent different kinds of functional genes. The guanine-cytosine content is indicated by darker gray and the adenine-thymine content is indicated by light gray.

A total of 16 genes contained introns, including 10 PCGs and six tRNA genes. Of these, *clpP*, *trnA-UGC*, *trnI-GAU*, and *ycf3* had two introns, whereas *atpF*, *ndhA*, *ndhB*, *petB*, *rpl2*, *rpoC1*, *rps12*, *rps16*, *trnG-UCC*, *trnK-UUU*, *trnL-UAA*, and *trnV-UAC* had one intron. The *rps12* gene, which encodes the 30S ribosomal protein S12, was trans-spliced, with one exon located in the LSC region and the other two exons located in the IR region. The largest intron was in the *trnK* gene (2490 bp) and contains the *matK* gene; *trnL-UAA* had the smallest intron (491 bp) (Table 3).

We compared the basic cp genome features of *M. shiluensis* with those of four Magnoliaceae species. The cp genomes of *M. laevifolia* and *M. insignis* were 45 and 42 bp longer than that of *M. shiluensis*, respectively, while those of *M. odora* and *M. cathcartii* were five and 125 bp shorter, respectively. Compared with *M. shiluensis*, the differences in the lengths of the LSC, SSC, and IR regions ranged from 7 to 90, 3 to 14, and 1 to 78 bp, respectively. In addition, the GC content of the whole genome and of each region of *M. shiluensis* was highly similar to that of the other four species. Moreover, there was no variation in the total number of genes, PCGs, tRNA genes, rRNA genes, or genes with introns (Table 1).


**Table 2.** List of the annotated genes in the *Michelia shiluensis* chloroplast genome.

Note: "\*" indicates duplicated genes.

**Table 3.** Characteristics of the genes that contain introns in the cp genome of *Michelia shiluensis*.


#### *3.2. Codon Usage and RNA Editing Analysis*

Based on the PCGs, 22,791 codons were detected (excluding stop codons) (Table 4). The three most abundant amino acids were leucine (2423 codons), isoleucine (2085 codons), and serine (1719 codons), and the three least abundant were cysteine (314 codons), tryptophan (427 codons), and methionine (602 codons) (Figure S1). Of the 30 preferred codons (RSCU > 1), all but UUG end with A or U; UUG ends with G. In contrast, most of the 32 less frequently used codons (RSCU < 1) end with C or G. In addition, two codons, AUG and UGG, show no codon bias (RSCU = 1).

The PREP suite was used to predict RNA editing sites in the PCGs of *M. shiluensis* (Table S1). A total of 106 RNA editing sites were predicted, and the most common amino acid change was from serine to leucine. The gene with the most editing sites was *ndhB* (14 sites), followed by *ndhD* (11 sites) and *matK* (nine sites). Most conversions were from a polar to a nonpolar amino acid; only two sites changed from a nonpolar to a polar amino acid (proline to serine), one in the *psbE* gene and the other in the *ccsA* gene.
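How a single C-to-U edit changes the encoded amino acid can be shown by translating a codon before and after the edit. C-to-U is the predominant editing type in plant chloroplasts. The sketch below is an illustration of the genetic-code logic only and does not describe how PREP-Cp makes its predictions. The function name `c_to_u_effect` is hypothetical. Codons are accepted in RNA (U) or DNA (T) notation.

```python
# Standard genetic code (NCBI table 1), built from the compact TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def c_to_u_effect(codon, position):
    """Return (amino acid before, amino acid after) a C-to-U edit at codon position 1-3."""
    codon = codon.upper().replace("U", "T")
    idx = position - 1
    if len(codon) != 3 or not 0 <= idx < 3:
        raise ValueError("expected a 3-nt codon and a position of 1, 2 or 3")
    if codon[idx] != "C":
        raise ValueError(f"position {position} of {codon} is not C; nothing to edit")
    edited = codon[:idx] + "T" + codon[idx + 1:]  # C -> T in DNA notation (U in RNA)
    return CODON_TABLE[codon], CODON_TABLE[edited]


for codon, pos in [("UCA", 2), ("CCA", 1), ("CCA", 2)]:
    before, after = c_to_u_effect(codon, pos)
    print(f"{codon} position {pos}: {before} -> {after}")
```

The output shows that a serine-to-leucine change (UCA to UUA) requires an edit at the second codon position. A proline-to-serine change (CCA to UCA) arises from an edit at the first position.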


**Table 4.** Relative synonymous codon usage (RSCU) in the chloroplast genome of *Michelia shiluensis*.

Note: "\*" indicates the stop codon.

#### *3.3. Repeat Sequence Analysis*

The REPuter results show that the *M. shiluensis* cp genome contains a total of 49 repeats: 23 palindromic, 18 forward, and eight reverse repeats (Figure 2). The repeat size ranged from 18 to 33 bp. The most abundant repeats were 18 bp (12 sites) followed by 20 bp (10 sites) (Figure S2).
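The two most common repeat classes reported here are forward repeats (the same sequence occurring twice) and palindromic repeats (a sequence whose reverse complement occurs elsewhere). Both can be found by indexing all k-mers and chaining matching seeds into maximal repeats. The Python sketch below handles exact matches only. REPuter itself uses suffix-tree algorithms, also reports reverse and complement repeats, and allows mismatches. The function names and the minimum length `k` are illustrative assumptions.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def _runs(seeds, dj):
    """Chain seeds (i, j), (i+1, j+dj), ... into maximal runs; yield (i, j, run_length)."""
    for i, j in sorted(seeds):
        if (i - 1, j - dj) in seeds:
            continue  # (i, j) extends an earlier seed, so it is not a run start
        r = 1
        while (i + r, j + r * dj) in seeds:
            r += 1
        yield i, j, r


def exact_repeats(seq, k=18):
    """Find maximal exact forward and palindromic (reverse-complement) repeats >= k bp.

    Returns sorted tuples (kind, start_of_copy1, start_of_copy2, length), 0-based.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    forward, palindromic = set(), set()
    for kmer, positions in index.items():
        # Forward: the same k-mer occurs at two positions.
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                forward.add((i, j))
        # Palindromic: the k-mer's reverse complement occurs further along.
        for j in index.get(revcomp(kmer), []):
            for i in positions:
                if i < j:
                    palindromic.add((i, j))

    hits = []
    for i, j, r in _runs(forward, dj=1):
        hits.append(("forward", i, j, k + r - 1))
    for i, j, r in _runs(palindromic, dj=-1):
        # The second copy grows leftwards as the first copy grows rightwards.
        hits.append(("palindromic", i, j - (r - 1), k + r - 1))
    return sorted(hits)


demo = "GATTACAGG" + "TTT" + "GATTACAGG" + "A"  # synthetic sequence, not real data
print(exact_repeats(demo, k=6))  # [('forward', 0, 12, 9)]
```

Chaining seeds along diagonals means that a 20-bp repeat is reported once rather than as several overlapping k-mer hits. Length distributions like the one summarized in Figure S2 can then be tallied from such hits.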

**Figure 2.** Comparison of repeats among five Magnoliaceae species: *Michelia shiluensis*, *Michelia odora*, *Magnolia laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*. (F: Forward; R: Reverse; C: Complement; and P: Palindromic).

In terms of location, 46.9% of repeats were detected in intergenic spacers (IGSs), 34.7% in PCGs, and 18.4% in tRNA genes. Among the PCGs, *ycf2* had the most repeats, with five forward and four palindromic repeats (Table S2). Comparison of repeat types with the other four species revealed no substantial variation among the five species (Figure 2). *Michelia shiluensis* had the highest number of palindromic repeats (23), while *M. laevifolia* had the lowest (21). *Magnolia cathcartii*, *M. odora*, and *M. shiluensis* had the same number of forward repeats (18), while *M. shiluensis* and *M. cathcartii* each had eight reverse repeats. In addition, only one complement repeat was found in the genomes of *M. laevifolia* and *M. odora*, whereas no complement repeats were identified in the cp genomes of *M. shiluensis* and *M. insignis*.

A total of 141 cpSSRs were found in the cp genome of *M. shiluensis* (Table S3). The majority were mononucleotide repeats (118), followed by tetranucleotide (9) and dinucleotide (8) repeats (Figure 3). No pentanucleotide repeats were detected in the cp genome of *M. shiluensis*. The longest SSR was 18 bp and the shortest 8 bp. Noncoding regions, including IGSs (97) and introns (19), contained most of the SSRs, while only 25 were located in coding regions, including *cemA*, *ndhD*, *ndhF*, *psbC*, *rpoB*, *rpoC1*, *rpoC2*, *rps19*, *rps3*, *ycf1*, *ycf2*, and *ycf4* (Table 5). The cpSSRs were mainly distributed in the LSC region (72.34%), followed by the SSC region (17.73%), with just 4.96% in each IR. The cpSSRs in *M. shiluensis* were biased towards A/T bases: 113 SSRs consisted of A or T bases, accounting for 80.14% of the total (Figure 3). Comparison among the five Magnoliaceae species showed high similarity in SSR type and distribution. Among the five species, the differences in the total number of SSRs and in the numbers of mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats were 5, 4, 0, 2, 0, 1, and 1, respectively (Figure 3). The number of SSRs in the IR was the same in all five species, and the counts in different locations and regions were highly conserved.


**Table 5.** Distribution of simple sequence repeats in different locations and regions among five Magnoliaceae species: *Michelia shiluensis*, *Michelia odora*, *Magnolia laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*.

Notes: LSC: Large single copy; IR: Inverted repeat; SSC: Small single copy; CDS: Coding sequence; IGS: Intergenic spacer.

**Figure 3.** Simple sequence repeats (SSRs) in the chloroplast genome of *Michelia shiluensis*. (**a**) Comparison of the SSRs among five Magnoliaceae species (*M. shiluensis*, *Michelia odora*, *Magnolia laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*); (**b**) base composition of SSRs in the cp genome of *M. shiluensis*.

#### *3.4. Genome Comparison and Sequence Divergence*

The mVISTA online software was used to compare variation across the whole cp genome among the five species (Figure 4). The alignments indicated that the whole cp genomes of the five species were highly conserved, especially in the IR region. Noncoding sequences showed more divergence than coding sequences; the most divergent noncoding sequences were *rps16-trnQ*, *atpH-atpI*, *trnT-psbD*, *petA-psbJ*, and *ndhF-trnL*. Among the coding sequences, only *ycf1* showed relatively more variation than the other genes. No obvious insertions were found among the five species.

The four junctions between the regions of the cp genomes of the five species were visualized using IRscope (Figure 5). The structure at each border was conserved, with slight differences in distance among the five species. The *rps19* gene was fully located in the LSC at a distance of 1–6 bp from the LSC/IRb border, while *rpl2* was fully located in the IRb. The *ndhF* gene was located in the SSC region, 61 bp from the IRb/SSC border in *M. odora*, *M. laevifolia*, and *M. insignis*; this distance was 21 bp greater in *M. shiluensis* and 7 bp smaller in *M. cathcartii*. The SSC/IRa border was located within the *ycf1* gene in all five species. Compared with *M. shiluensis* and *M. odora*, the part of *ycf1* in the SSC region of *M. laevifolia* and *M. insignis* was almost 20 bp longer, which accounts for the differences in gene length. However, the *ycf1* gene of *M. cathcartii* was almost 30 bp shorter on both sides of the SSC/IRa border; thus, it was almost 60 bp shorter than those of the other four species. The distance from the *trnH* gene to the IRa/LSC border was 11 bp in *M. shiluensis*, *M. odora*, and *M. laevifolia*, 16 bp in *M. insignis*, and 9 bp in *M. cathcartii*. Because of its shorter IR regions, the *M. cathcartii* cp genome was the shortest of the five.

To detect the selective pressures on the PCGs in the *M. shiluensis* cp genome, the rates of nonsynonymous (dN) and synonymous (dS) substitutions and their ratio (dN/dS) were calculated based on the 79 PCG sequences of the five Magnoliaceae species (Figure 6). Only four pairwise comparisons yielded a dN/dS ratio greater than 1 (*accD* in *M. insignis* vs. *M. cathcartii*, 1.14; *ndhD* in *M. shiluensis* vs. *M. cathcartii*, 1.29; *ndhF* in *M. odora* vs. *M. cathcartii*, 1.89; and *rpoC2* in *M. laevifolia* vs. *M. cathcartii*, 2.50). This suggests that most genes are under negative (purifying) selection, whereas only a few genes may be under positive selection.

**Figure 4.** Sequence alignment of five whole chloroplast genomes in Magnoliaceae (*Michelia shiluensis*, *Michelia odora*, *Magnolia laevifolia*, *Magnolia insignis*, and *Magnolia cathcartii*) using *M. shiluensis* as a reference in *mVISTA*.

**Figure 5.** The four junctions of the regions in the chloroplast genomes of the five *Magnoliaceae* species, determined using *IRscope*.

**Figure 6.** The synonymous substitution rate (dS) and the nonsynonymous-to-synonymous substitution ratio (dN/dS) of 79 protein-coding genes from five Magnoliaceae chloroplast genomes (Ms: *Michelia shiluensis*; Mo: *Michelia odora*; Ml: *Magnolia laevifolia*; Mi: *Magnolia insignis*; Mc: *Magnolia cathcartii*).

In the coding regions, the mean Pi of the PCGs was 0.00117 (range 0 to 0.00606), and the mean Pi values in the LSC, IR, and SSC regions were 0.001192, 0.000186, and 0.001634, respectively (Figure 7). In the IGSs, the mean Pi was 0.00295 (range 0 to 0.02416), and the mean Pi values in the LSC, IR, and SSC regions were 0.00297, 0.00045, and 0.00731, respectively. Thus, Pi is lower in the coding regions than in the IGSs. The results also demonstrate that the IR region is the most conserved region among the five species, followed by the LSC and SSC regions. In total, 20 highly variable regions (Pi > 0.005) were identified, 19 in IGSs and one in a PCG. The variable IGSs were *trnH-psbA*, *psbK-psbI*, *atpA-atpF*, *rps2-rpoC2*, *trnT-psbD*, *ycf3-trnS*, *ndhJ-ndhK*, *ndhK-ndhC*, *accD-psaI*, *psbL-psbF*, *petL-petG*, *trnW-trnP*, *trnP-psaJ*, *rpl18-rpl20*, *ndhF-trnL*, *ccsA-ndhD*, *ndhD-psaC*, *ndhG-ndhI*, and *ndhI-ndhA*. The only PCG with a Pi value greater than 0.005 was *psaJ*.

**Figure 7.** Nucleotide diversity (Pi) in the chloroplast genome of five Magnoliaceae species (*Michelia shiluensis*; *Michelia odora*; *Magnolia laevifolia*; *Magnolia insignis*; and *Magnolia cathcartii*).

#### *3.5. Phylogenetic Analysis*

To reveal the evolutionary relationships between the investigated species and to enable comparison with traditional phylogenies, a maximum likelihood phylogenetic tree was constructed using RAxML (with 1000 bootstrap replicates) based on the PCG sequences of 28 Magnoliaceae cp genomes (Figure 8). The tree contained 25 nodes, most of which had 100% bootstrap support, and strongly supports *M. shiluensis* being most closely related to *M. odora*.

**Figure 8.** Maximum likelihood tree with 1000 bootstrap replicates constructed using RAxML based on chloroplast genomes of 28 Magnoliaceae species (26 species of *Magnolia* and *Michelia*, and two species of *Liriodendron* as outgroups). Bootstrap values (%) are shown above branches. Accession numbers: *Magnolia alba* NC037005, *Magnolia liliiflora* NC023238, *Magnolia denudata* NC018357, *Magnolia sprengeri* NC023242, *Magnolia salicifolia* NC023240, *Magnolia biondii* KY085894, *Magnolia kobus* NC023237, *Michelia odora* NC023239, *Michelia shiluensis* MN418056, *Magnolia laevifolia* NC035956, *Magnolia insignis* NC035657, *Magnolia cathcartii* NC023234, *Magnolia yunnanensis* NC024545, *Magnolia sinica* NC023241, *Magnolia kwangsiensis* NC015892, *Magnolia conifera* NC037001, *Magnolia dandyi* NC037004, *Magnolia aromatica* NC037000, *Magnolia fordiana* MF990562, *Magnolia glaucifolia* NC037003, *Magnolia duclouxii* NC037002, *Magnolia tripetala* NC024027, *Magnolia officinalis* NC020316, *Magnolia grandiflora* NC020318, *Magnolia pyramidata* NC023236, *Magnolia dealbata* NC023235, *Liriodendron chinense* NC030504, and *Liriodendron tulipifera* NC008326.

#### **4. Discussion**

In this study, we characterized the complete cp genome of *M. shiluensis*, an endangered and valuable species (Figure 1). Comparison of five closely related species showed that gene content, order, structure, and other features were highly conserved among them (Figures 3–5). *Michelia shiluensis* was shown to be most closely related to *M. odora* (Figure 8). These findings improve our understanding of the *M. shiluensis* cp genome and provide information on the evolution, population genetics, and phylogeny of this species.

Normally, the cp genome of higher plants is about 120–160 kb in length, with a stable structure and conserved sequence [8,33]. The cp genome of *M. shiluensis* displayed a typical quadripartite structure, with an LSC and an SSC separated by two IR regions (Figure 1). The genome was 160,075 bp long, had a GC content of 39.26%, and contained 113 unique genes, 16 of which had one or two introns (Tables 1–3). In a previous analysis of 26 Magnoliaceae species [34], cp genome length ranged from 158,177 to 160,183 bp and GC content from 39.15% to 39.30%. These genomes shared 112 common genes, including 79 PCGs, 29 tRNA, and four rRNA genes, and 16 of these genes contained one or two introns. The results for the *M. shiluensis* cp genome were consistent with this analysis, except that one additional tRNA gene (*trnV-GAC*) was detected in *M. shiluensis*. As in other angiosperms, a high GC content was detected in the IR region of *M. shiluensis*, which may result from the high-GC rRNA sequences located there [9,35–37]. Introns play a vital role in alternative splicing [38]. However, introns have been lost in some species during evolution [39,40]. In this study, no intron loss was detected in the cp genome of *M. shiluensis*, which reflects the high conservation of cp genomes in Magnoliaceae [34].

In total, 22,791 codons were found in the cp genome of *M. shiluensis* (Table 4), of which the codons for leucine were the most abundant (10.63%). The same pattern has been observed in *Ailanthus altissima* [41] and *Justicia flava* [42]. All preferred codons (RSCU > 1) except UUG ended with A or U. This is not unique to the *M. shiluensis* cp genome; similar findings have been reported for *Papaver rhoeas* and *P. orientale* [36], *Ageratina adenophora* [43], and *Oryza sativa* [44]. In the PCGs of the *M. shiluensis* cp genome, 106 possible RNA editing sites were detected (Table S1). The most frequent amino acid conversion was from serine to leucine, and the *ndhB* gene accounted for the largest number of editing sites (14 of the 106). Similar results have been obtained for *Forsythia suspensa* [45] and *Sanionia uncinata* [46].

Repeat sequences play an important role in genomic structural variation, expansion, and rearrangement [8,40]. Previous research has indicated that most repeat sequences are located in IGS regions, followed by coding regions [47,48]. A similar result was found in this study, with 46.9% of repeats detected in IGS regions, 34.7% in coding regions, and the remainder in tRNA genes (Table S2). The cpSSR is an effective marker [49,50] that is widely used in population genetics, biogeographic studies, and phylogenetic evaluation [51,52]. In the cp genome of *M. shiluensis*, over 80% of the SSRs consisted of A or T bases and over 80% were mononucleotide repeats, as observed in other studies [48,50,53]. The majority of SSRs are generally found in the SSC and LSC regions [50], and *M. shiluensis* is no exception (Table 5): these two regions accounted for 90.07% of the SSRs, and only seven SSRs were found in each IR.

Although the cp genome of angiosperms is relatively conserved in structure and size [54,55], expansion and contraction of the IR region during evolution have caused minor changes in the IR boundaries and in genome size [39,56], thereby increasing the chloroplast genetic diversity of angiosperms [57,58]. In this study, a comparative analysis of five Magnoliaceae species revealed that IR lengths were similar in *M. shiluensis*, *M. odora*, and *M. laevifolia* (Figure 5). However, the IR region of *M. insignis* had lost 11 bp in the *rps12-trnV* and *rrn23*-*rrn4.5* IGSs, while that of *M. cathcartii* had lost 5 bp in the *rpl2* intron, 6 bp in the *rps12* intron, 26 bp in *ycf1*, and 41 bp in the *trnN-ndhF* IGS. These deletions account for the differences in IR length among the five species.

DNA barcoding is a technique widely applied in plant identification studies [59,60]. However, only a few regions have been used for DNA barcoding of Magnoliaceae [61–63]. Comparison of the genomes of five Magnoliaceae species with mVISTA revealed that the IR regions are more highly conserved than the LSC and SSC regions, and that coding regions are more highly conserved than noncoding regions (Figure 4), consistent with findings in other angiosperms [9,38]. Five regions in the *M. shiluensis* cp genome showed high levels of variation (four in IGSs and one in a PCG). Nucleotide diversity (Pi) was also calculated for the 79 PCGs and 125 IGSs (Figure 7), and only 20 regions had a Pi value greater than 0.005, which is consistent with the low base substitution rate in Magnoliaceae [64]. These highly variable regions could be used to develop high-resolution DNA barcodes for species identification.
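Nucleotide diversity (Pi) is the average proportion of differing sites over all pairs of aligned sequences. Computing it per region and keeping regions above 0.005 is the screening step described above. A minimal sketch follows. It assumes one alignment per PCG or IGS and skips gaps and ambiguous bases pairwise; the function names and toy alignments are hypothetical, and dedicated tools (for example DnaSP) may handle gaps differently.

```python
from itertools import combinations

VALID = set("ACGT")


def nucleotide_diversity(alignment):
    """Pi: mean over all sequence pairs of the proportion of compared sites that differ.
    Sites with a gap or ambiguous base in either sequence of a pair are skipped
    for that pair (pairwise deletion, an assumption of this sketch)."""
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = differing = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in VALID and y in VALID:
                compared += 1
                differing += x != y
        if compared:
            per_pair.append(differing / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def hypervariable_regions(regions, threshold=0.005):
    """regions maps a region name (PCG or IGS) to its aligned sequences;
    returns {name: Pi} for regions whose Pi exceeds the threshold."""
    result = {}
    for name, alignment in regions.items():
        pi = nucleotide_diversity(alignment)
        if pi > threshold:
            result[name] = pi
    return result


# Toy alignments of two hypothetical regions across three taxa:
print(hypervariable_regions({"igs_x": ["AAAA", "AAAT", "AAAT"], "pcg_y": ["ACGT"] * 3}))
```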

Owing to the high morphological similarity among Magnoliaceae species [18,65], the classification of the family has long been difficult and controversial [66–72]. The complete cp genome contains abundant phylogenetic information and has been shown to be more effective than individual cpDNA fragments for clarifying low-level phylogenetic relationships in plants [53,73]. In this study, the phylogeny of 28 Magnoliaceae taxa based on PCG sequences revealed that *M. shiluensis* is most closely related to *M. odora* (Figure 8), which is consistent with phylogenetic results based on the *ndhF* sequence [70]. According to traditional morphological classification, *M. insignis* has been placed in the subgenus *Manglietia*, and *M. alba* in the section *Michelia* [69,74]. However, the cp genome phylogeny places *M. insignis* in the section *Michelia* clade and *M. alba* in the subgenus *Yulania* clade. This result differs from both the traditional morphological classification [69,74] and the results based on three nuclear gene sequences [75]. These findings indicate that even a complete cp genome may not distinguish species in young evolutionary lineages, and that robust phylogenetic conclusions may require consideration of features of the nuclear genome [76].

#### **5. Conclusions**

The complete cp genome reported in this study can support in-depth genetic research on *M. shiluensis* and other Magnoliaceae species, and may also inform the development of new conservation and management strategies for the species.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1999-4907/11/3/267/s1. Figure S1, Amino acid frequency among 79 protein-coding genes in the *Michelia shiluensis* chloroplast genome; Table S1, Possible RNA editing sites in the chloroplast genome of *Michelia shiluensis*; Table S2, Repeats in the chloroplast genome of *Michelia shiluensis*; Figure S2, Different lengths of repeats in the *Michelia shiluensis* cp genome; Table S3, Simple sequence repeats in the chloroplast genome of *Michelia shiluensis*.

**Author Contributions:** Conceptualization, Y.D.; methodology, Y.D. and Y.H.; software, Y.L. and C.L.; validation, Y.D., Y.L., Y.H., C.L., and X.D.; formal analysis, Y.L. and Y.H.; resources, X.Q. and X.D.; writing—original draft preparation, Y.D.; writing—review and editing, Y.D.; supervision, X.D.; project administration, X.D.; funding acquisition, X.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by the Forestry Public Welfare Industry Research of China (Grant Number: 201404116), the National Natural Science Foundation of China (Grant Number: 31670601), and the Forestry Science and Technology Innovation of Guangdong Province grant programs (Grant Numbers: 2014KJCX006 and 2017KJCX023).

**Acknowledgments:** We are sincerely grateful to Xiaomei Deng (supervisor to Yanwen Deng) for her support during this study, which included assistance with funding, materials, resources, and consultations with other academics. We are also grateful to Xinsheng Hu for his invaluable guidance during the manuscript preparation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Constitutive and Cold Acclimation-Regulated Protein Expression Profiles of Scots Pine Seedlings Reveal Potential for Adaptive Capacity of Geographically Distant Populations**

### **Danas Baniulis <sup>1,\*</sup>, Monika Sirgedienė <sup>2</sup>, Perttu Haimi <sup>1</sup>, Inga Tamošiūnė <sup>1</sup> and Darius Danusevičius <sup>2</sup>**


Received: 5 December 2019; Accepted: 8 January 2020; Published: 10 January 2020

**Abstract:** Geographically distant Scots pine (*Pinus sylvestris* L.) populations are adapted to specific photoperiods and temperature gradients, and vary markedly in the timing of growth and in adaptive traits. To understand the variability of adaptive capacity within a species, the molecular mechanisms that govern the physiological aspects of phenotypic plasticity should be addressed. Protein expression analysis can depict molecular events closely linked to phenotype formation. Therefore, in this study, we used comparative proteomic analysis to differentiate Scots pine genotypes originating from geographically distant populations in Europe that show distinct growth and cold adaptation phenotypes. Needles were collected from 3-month-old seedlings originating from populations in Spain, Lithuania and Finland. Under active growth-promoting conditions and upon acclimation treatment, 65 and 53 differentially expressed proteins were identified, respectively. Constitutive protein expression differences detected during active growth were associated with cell metabolism and stress response, and reflected population-specific adaptation to distinct climatic conditions. Acclimation-induced protein expression patterns suggested a similar cold adaptation mechanism among the populations. Variation in adaptive capacity among the genotypes was potentially reflected in the constitutively low expression of the Ser/Thr-protein phosphatase, a negative regulator of the adaptive response. An overall less pronounced acclimation-induced response was also observed in seedlings from the Spanish population. Thus, our study demonstrates that comparative proteomic analysis of young conifer seedlings can provide insights into adaptation processes at the cellular level, which could help to infer variability of adaptive capacity within plant species.

**Keywords:** conifer adaptation; phenotypic plasticity; comparative proteomics; stress response

#### **1. Introduction**

Sustainability of future forests is threatened by anthropogenic impacts, mainly pollution and climate change [1,2]. Therefore, special attention is being given to understanding the molecular basis of adaptive traits in tree species (e.g., [3,4]). High genetic diversity, phenotypic plasticity and epigenetic mechanisms are the key factors of the adaptive capacity of forest tree populations (e.g., [5,6]). Trees have lower rates of nucleotide substitution and diversification than herbaceous species [7]. On the other hand, tree populations are larger, less genetically structured and harbor greater genetic diversity [7,8]. Most forest tree species show substantial variability in adaptive traits, at levels much higher than observed at neutral genetic markers, and form collections of highly differentiated populations with contrasting adaptations to local climates [9]. Intraspecific genetic variation in tree species is linked to their high adaptability under constantly changing environments, as it provides the diversity and plasticity of genetic variants on which evolution acts [3]. Phenotypic plasticity is the capacity of a genotype to produce different phenotypes in response to environmental conditions that vary both temporally and geographically. Among the main evolutionary forces shaping variability patterns, natural selection primarily acts to increase genetic differentiation among populations, while gene flow and phenotypic plasticity tend to reduce it [10]. In widespread, wind-pollinated conifers such as Scots pine, the effects of genetic drift and mutation are weak. Although natural variation in epigenetic marks and its relation to phenotypic traits in trees remain under-explored, epigenetic variation has been suggested to contribute to the phenotypic plasticity and adaptive potential of forest trees [11].

Gene expression is a dynamic process that depends on the specific tissue, developmental stage and environmental signals. Gene expression analysis at both the transcriptome and proteome levels can provide details on population variability and on the mechanisms involved in evolution and adaptation [12,13]. Protein expression is the final step of the gene expression process; thus, proteomic analysis bridges the gap between genotype and phenotype, depicting the molecular events most directly linked to the phenotype at the cellular or organism level. Quantitative variation in protein abundance results from a complex network of regulatory mechanisms that integrate genetic variation and the interaction of individual plants with their environment [14,15]. A study on *Arabidopsis* ecotypes showed that proteomics can be a powerful tool for mining biodiversity among ecotypes of a single plant species [16]. Proteomics has also been used to study the phenotypic and molecular responses of two *Eucalyptus* genotypes to water deficit [17] and adaptive variation in *Picea abies* (L.) H. Karst. ecotypes [18,19].

Scots pine (*Pinus sylvestris* L.) is the most widespread coniferous tree species, occupying an extensive area in Eurasia. The complex biogeographic history of Scots pine in Europe has been well studied [20–22]. In the neutral part of the genome of open-pollinated, widespread conifers such as Scots pine, phylogenetic structure is weak and population differentiation is low (e.g., [23,24]). Based on maternally inherited mitochondrial DNA markers, the European range of Scots pine falls into three major haplotype groups: south-western (refugia in the Pyrenees), central (refugia in the Balkans and Italy) and south-eastern (refugium at the southern edge of the Ural Mountains) [20,25–27]. Studies using neutral DNA markers have identified gene flow and isolation by distance as other major factors affecting population differentiation in Scots pine [28]. Numerous indoor and field trials have revealed significant clinal variation among Scots pine populations along latitudinal gradients in adaptive traits such as phenology [29,30].

The timing of cold acclimation is of adaptive significance for northern conifers such as Scots pine [31–33]. Seedlings cease active growth and enter endodormancy in response to lengthening nights toward autumn [34], followed by the development of shoot hardiness in response to chilling temperatures in early autumn [35,36]. In trees, cold acclimation results in complex structural, biochemical and genetic shifts from elongating meristems to completely dormant, frost-hardy tissues [37]. In brief, it causes protoplasmic dehydration, solute concentration, reduced osmotic potential, upregulation of abscisic acid, and changes in carbohydrate metabolism, membrane permeability and fatty acid composition [38,39]. The most prominent changes occur in the chloroplasts and manifest as structural reorganization of the organellar membranes [40,41]. At the protein level, increased amounts of antioxidant enzymes [42–44] and the accumulation of dehydrins, mostly localized in the chloroplast and mitochondrial membrane systems, have been reported during acclimation [45]. Gene expression studies have shown upregulation of enzymes involved in the synthesis of sugars and sugar derivatives, as well as of dehydration-stress and other stress-tolerance proteins such as heat-shock proteins [46].

Owing to photoperiod- and temperature-triggered adaptation in northern regions, conifers have population-specific requirements for the critical night length that induces bud set and for the chilling temperatures that induce variable levels of shoot frost hardiness in early autumn [36]. When populations are grown at a single location, the critical night length for bud set and the development of shoot hardiness vary markedly between geographically distant populations along a latitudinal gradient [29,32,47]. Therefore, markedly different gene expression profiles could be expected after cold acclimation treatment in seedlings from geographically distant Scots pine populations.

To probe adaptive capacity within Scots pine, our study assessed phenotypic traits and quantitative and qualitative protein expression patterns in seedlings originating from geographically distant populations in northern and southern Europe. Protein expression patterns were assessed by two-dimensional gel electrophoresis. First, differences among the three populations were assessed during the active growth period. Then, an acclimation treatment was applied to extend the analysis of protein expression profiles, with specific emphasis on cold adaptation processes in the distinct Scots pine genotypes.

#### **2. Materials and Methods**

#### *2.1. Seedling Production and Cold Acclimation Treatment*

We used seeds of Scots pine collected from three autochthonous populations representing the Mediterranean, temperate and boreal climatic zones in Europe, respectively: central Spain (Valsain Forest mountains, Sierra de Guadarrama province, 40°49′ lat., 3°57′ long., 1828 m a.s.l.), central Lithuania (Kazlu Ruda forest tract, 54°46′ lat., 23°32′ long., 79 m a.s.l.) and northern Finland (Suomussalmi forest, 65°07′ lat., 28°49′ long., 233 m a.s.l.) (Figure 1). The seeds were collected in native Scots pine stands that have regenerated naturally over generations within large forest tracts, where the influence of non-autochthonous sources is negligible. These populations represent a broad latitudinal gradient of gene pools adapted to distinct photoperiod and temperature requirements for initiating dormancy and cold hardiness, processes that have a major effect on the timing of cold acclimation in seedlings. Each seed lot was a mixture from more than 20 trees within a forest stand and was considered representative of a population.

**Figure 1.** Location of the Finnish (1), Lithuanian (2) and Spanish (3) populations used in the study.

Seeds were soaked overnight in sterile deionized water and surface-sterilized for 30 min with 30% hydrogen peroxide. Floating seeds were discarded, and the remaining seeds were sown in sterile plastic containers (50 cm long × 17 cm wide × 15 cm high) on sterilized peat treated with chalk and soaked with sterile nutrient solution. Seeds were germinated and seedlings were grown for 3 months in an LT-36VL climatic growth chamber (Percival). A constant temperature of 24 °C, 100% humidity and a long-day photoperiod (16 h; 30 μmol m<sup>−2</sup> s<sup>−1</sup>; 5500 K fluorescent light) were used to promote active growth. Seedlings were watered every two days, and a fertilizer containing 44 g/L nitrogen, 11 g/L P<sub>2</sub>O<sub>5</sub>, 33 g/L K<sub>2</sub>O and 2.2–3.3 g/L MgO was applied every second week. Two containers were used for each population, and the containers were randomly relocated within the chamber once per week to reduce undesired positional effects. After germination, seedlings were thinned to maintain comparable plant density (approx. 100 seedlings per container). Three months after germination, samples for the active growth experimental group were collected, and the remaining seedlings were exposed to a 2-week cold acclimation treatment in the same growth chamber: for one week, the plants were maintained at 10 °C, 100% humidity and an 8 h photoperiod, followed by one week of darkness at the same temperature and humidity.

#### *2.2. Seedling Morphology Measurements*

Before the cold acclimation treatment, morphometric traits of 40 seedlings per population were measured to the nearest millimeter. To assess growth cessation during the acclimation treatment, needle weight was measured for pooled samples (four repeats) of the active growth and acclimated experimental groups of each population. Each pooled sample was prepared by combining equal amounts of material from the first crown of needles of 25 seedlings.

#### *2.3. Two-Dimensional Gel Electrophoresis and Protein Identification*

Four biological repeats representing the active growth and acclimated experimental groups of each population were prepared as pooled needle samples, as described in the previous section, frozen in liquid nitrogen and stored at −70 °C. Protein samples were prepared using phenol extraction and ammonium acetate precipitation, as described previously [48]. Briefly, samples were ground in liquid nitrogen and suspended in 500 μL of extraction buffer (0.5 M Tris-HCl, pH 7.5, 0.1 M KCl, 0.05 M ethylenediaminetetraacetic acid, 0.7 M sucrose, 2% polyvinylpolypyrrolidone, 2% β-mercaptoethanol, 1 mM phenylmethanesulfonyl fluoride). An equal volume of Tris-buffered phenol (pH 7.5) was added, and the mixture was incubated with shaking for 30 min at 4 °C. The tubes were centrifuged for 15 min at 15,000× *g* at 4 °C, the upper phase was recovered, and the phenol extraction was repeated. Proteins were precipitated by adding an equal volume of 3 M ammonium acetate in methanol, incubating at −20 °C overnight and centrifuging as before. Protein pellets were washed twice by resuspension in cold methanol and once in cold acetone, each wash followed by centrifugation for 5 min as before. After aspiration of the acetone, protein pellets were dried for a few minutes and solubilized in two-dimensional electrophoresis lysis buffer (6 M urea, 2 M thiourea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, 10 mM Tris-HCl, pH 8.5). Protein concentrations were measured using the Bradford assay [49]. Four biological repeats were prepared for each treatment. Internal standards were prepared from a pooled mixture of all protein extracts.

Protein separation and detection were performed using a differential gel electrophoresis procedure as described previously [50]. Briefly, sample aliquots of 50 μg were labeled with Cy3 and Cy5 fluorescent dyes, and the internal standard was labeled with Cy2 dye (Lumiprobe). For the preparative gel, 500 μg of unlabeled internal standard was mixed with 50 μg of the labeled internal standard. Proteins were applied to 24 cm immobilized pH gradient strips with a linear pH 4–7 gradient, and isoelectric focusing was performed with an Ettan IPGphor (GE Healthcare, Chicago, IL, USA). The proteins were then separated on 1 mm thick 12.5% polyacrylamide gels with an Ettan DALTsix electrophoresis apparatus (GE Healthcare), and the gels were scanned using an FLA 9000 fluorescence scanner (GE Healthcare). Relative protein quantification was performed using DeCyder 2-D Differential Analysis Software, v.7.0 (GE Healthcare).
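In this design, every gel carries two samples (Cy3, Cy5) plus the Cy2-labeled pooled internal standard. Expressing each spot relative to the internal standard on the same gel removes much of the gel-to-gel variation before groups are compared. The sketch below illustrates that general principle only. It is not DeCyder's algorithm, which also performs in-gel normalization and spot matching. The function names, the use of log2 ratios and the signed fold-change convention (ratios below 1 reported as negative reciprocals) are assumptions.

```python
import math
from statistics import mean


def standardized_log_abundance(sample_volume, standard_volume):
    """log2 ratio of a spot's volume in a sample channel (Cy3 or Cy5) to the
    volume of the same spot in the Cy2 internal standard of the same gel."""
    return math.log2(sample_volume / standard_volume)


def average_ratio(group_a, group_b):
    """Signed fold change of one spot from group_a to group_b.

    Each group is a list of (sample_volume, standard_volume) pairs, one pair
    per biological repeat (i.e. per gel). Ratios below 1 are returned as
    negative reciprocals (0.5 -> -2.0), a convention assumed here."""
    mean_a = mean(standardized_log_abundance(s, c) for s, c in group_a)
    mean_b = mean(standardized_log_abundance(s, c) for s, c in group_b)
    ratio = 2 ** (mean_b - mean_a)
    return ratio if ratio >= 1 else -1 / ratio


# Hypothetical spot volumes for two repeats per group:
non_acclimated = [(100, 100), (200, 200)]
acclimated = [(400, 200), (200, 100)]
print(average_ratio(non_acclimated, acclimated))
```

Because raw volumes differ twofold between the two non-acclimated gels while the standardized values do not, the example shows why the internal standard is useful.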

The preparative gel was fixed in 50% methanol and 10% acetic acid. Protein spots were excised manually and digested with trypsin according to a previously described method [51]. Protein digests were loaded and desalted on a 100 μm × 20 mm Acclaim PepMap C18 trap column and separated on a 75 μm × 150 mm Acclaim PepMap C18 column using an Ultimate 3000 rapid separation liquid chromatography system (Thermo Scientific, Waltham, MA, USA) coupled to a Maxis G4 Q-TOF mass spectrometer with a Captive Spray nano-electrospray ionization source (Bruker Daltonics, Bremen, Germany). Peptides were identified using the MASCOT server (Matrix Science, Boston, MA, USA) against the *Pinus taeda* L. genome database [52]. The mass error tolerance for peptide matching was limited to 5 ppm. Proteins were considered identified if they had a Mascot score >50 and at least two peptides. Where more than one protein was identified in the same spot, the exponentially modified protein abundance index (emPAI) [53] was used for relative quantitation of the proteins. Spots were designated as protein mixtures if several of the identified proteins had similar emPAI values.
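emPAI is conventionally defined as 10^(N_observed / N_observable) − 1, where N_observed is the number of observed peptides and N_observable is the number of theoretically observable peptides of a protein. Normalizing emPAI values within a spot gives each protein's approximate molar share. The sketch below also shows the ppm mass-error check behind the 5 ppm tolerance. The rule used to call a spot a mixture is hypothetical, because the text only says the emPAI values were "similar"; the function names and peptide counts are likewise illustrative.

```python
def empai(n_observed, n_observable):
    """emPAI = 10 ** (N_observed / N_observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def spot_composition(proteins):
    """proteins maps a protein name to (n_observed, n_observable) for one gel spot;
    returns each protein's share (%) of the summed emPAI in that spot."""
    scores = {name: empai(obs, total) for name, (obs, total) in proteins.items()}
    summed = sum(scores.values())
    if summed == 0:
        raise ValueError("no peptides observed in this spot")
    return {name: 100 * score / summed for name, score in scores.items()}


def is_mixture(composition, dominance=2.0):
    """Hypothetical rule: call the spot a mixture unless the top protein's share
    is at least `dominance` times that of the runner-up."""
    shares = sorted(composition.values(), reverse=True)
    return len(shares) > 1 and shares[0] < dominance * shares[1]


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million (to compare against the 5 ppm tolerance)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Two hypothetical proteins with equal peptide coverage in one spot:
composition = spot_composition({"protein_1": (5, 10), "protein_2": (5, 10)})
print(composition, is_mixture(composition))
```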

#### *2.4. Data Analysis*

A *t*-test was used to assess differences in seedling morphology traits among the populations (40 seedlings per population).

The Biological Variation Analysis module of the DeCyder software was used to match protein spots of the four biological repeats across gels and to perform statistical analysis of differential protein expression. A two-way analysis of variance (ANOVA) was used to assess statistically significant (*p* ≤ 0.01) population and cold acclimation treatment effects, and a threshold of at least a 2-fold difference in protein abundance was applied.
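The two selection criteria can be combined as a simple filter over spots, as sketched below. The sketch assumes that the two-way ANOVA p-values (population, treatment) have already been computed for each spot. It also assumes that the 2-fold criterion is evaluated as the ratio between the highest and lowest group mean abundance (on a linear scale) across the six experimental groups. The text does not state how the fold difference was computed, so this choice, the data layout and the function name are hypothetical.

```python
def select_differential_spots(spots, p_max=0.01, min_fold=2.0):
    """Return the IDs of spots that pass both criteria.

    spots maps a spot ID to a dict with:
      "p_values":    ANOVA p-values for the tested effects (computed elsewhere)
      "group_means": mean linear abundance in each experimental group
    A spot passes if any effect has p <= p_max and the highest group mean is
    at least min_fold times the lowest (an assumed definition of fold change).
    """
    selected = []
    for spot_id, record in spots.items():
        significant = min(record["p_values"]) <= p_max
        means = record["group_means"]
        fold = max(means) / min(means)
        if significant and fold >= min_fold:
            selected.append(spot_id)
    return selected


# Hypothetical spots: only s1 is both significant and at least 2-fold variable.
example = {
    "s1": {"p_values": [0.001, 0.5], "group_means": [1.0, 2.5, 1.2]},
    "s2": {"p_values": [0.001, 0.2], "group_means": [1.0, 1.5]},
    "s3": {"p_values": [0.05, 0.02], "group_means": [1.0, 4.0]},
}
print(select_differential_spots(example))
```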

To find patterns in the expression profiles, cluster analysis was performed in the Extended Data Analysis module of the DeCyder software using a Euclidean distance matrix and the complete linkage method.
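Complete-linkage hierarchical clustering repeatedly merges the two clusters whose farthest members are closest, using Euclidean distance between expression profiles (one vector of abundances per spot). The naive sketch below stops when a requested number of clusters remains. It also standardizes each spot's profile before clustering, which is an assumption suggested by the standardized abundance scale in the cluster figure, not a documented DeCyder setting. The function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def zscore_rows(profiles):
    """Standardize each spot's profile to mean 0 and (population) SD 1 (assumed step)."""
    out = []
    for row in profiles:
        m = sum(row) / len(row)
        sd = math.sqrt(sum((x - m) ** 2 for x in row) / len(row))
        out.append([(x - m) / sd if sd else 0.0 for x in row])
    return out


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering with complete linkage; returns sorted index lists."""
    clusters = [[i] for i in range(len(profiles))]
    dist = {(i, j): euclidean(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def cluster_distance(c1, c2):
        # complete linkage: distance between the two farthest members
        return max(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Toy two-group profiles: spots 0/1 and 2/3 are similar, spot 4 stands apart.
toy = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
print(complete_linkage(toy, n_clusters=3))
```

Recomputing cluster distances on every merge keeps the code short. For a few dozen spots, such as the 76 analyzed here, this is adequate, while larger datasets would call for an optimized library routine.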

Blast2GO software [54] was used for annotation and gene ontology analysis of the identified protein sequences. The identified *Pinus taeda* sequences were searched against the Swissprot database using the Basic Local Alignment Search Tool (BLAST), and the protein sequences were annotated with Gene Ontology (GO) terms.

#### **3. Results**

#### *3.1. Seedling Morphology*

After 3 months of cultivation under growth-promoting conditions (16 h light, high humidity and a constant temperature of 24 °C), seedlings of the Finnish population had the shortest stems and cotyledons (Table 1). There was no significant difference in seedling height between the Lithuanian and Spanish populations. Seedlings of the Spanish population had more cotyledons and produced the largest needle biomass, as estimated by the fresh weight of the sampled needles. During the acclimation treatment, needle fresh weight did not change significantly in seedlings from the Spanish population, indicating growth cessation. Meanwhile, needle mass increased approximately 1.5-fold in the two northern populations.


**Table 1.** Quantitative traits of pine seedlings before and after the cold acclimation treatment.

Seedling height and cotyledon number were measured on 40 seedlings per population. Needle weight represents fresh weight of four pooled samples of 25 seedlings per population. The data are presented as mean and standard error of the mean. Means followed by the same letter are not significantly different (*p* < 0.05 by Tukey least significant difference (LSD) test).

#### *3.2. Differentially Expressed Proteins*

On average, 1464 ± 145 protein spots were detected per gel after alignment (Figure S1). Analysis of biological variance revealed statistically significant, >2-fold variation in the abundance of 76 proteoforms among the six experimental groups (non-acclimated and acclimated seedlings from the Spanish, Lithuanian and Finnish populations) (Figures 2 and 3). The Finnish population had the largest number of unique differentially expressed protein spots (25 and 20 in the non-acclimated and acclimated experimental groups, respectively), followed by the Spanish population (15 and 19). The Lithuanian population had the lowest number of unique spots (2 and 1) (Figure 2a,b). The number of proteoforms differentially expressed between populations was similar for the non-acclimated and acclimated experimental groups. The abundance of 53 proteoforms changed after cold acclimation treatment (Figure 2c), and approx. 72% of these changes were independent of seedling origin.

**Figure 2.** Quantitative distribution of differentially expressed protein spots identified by two-dimensional gel electrophoresis analysis of needle protein samples isolated from seedlings originating from the Spanish (SP), Finnish (FI) and Lithuanian (LT) Scots pine populations. Numbers of protein spots differentially expressed between the populations under growth-promoting conditions (**a**) and upon acclimation (**b**) are shown. Numbers of upregulated and downregulated protein spots upon acclimation treatment are separated by a slash (**c**). Areas where two or all three circles overlap indicate the number of protein spots differentially expressed between the two or among all three populations, respectively.

**Figure 3.** Quantitative differences in protein abundance between non-acclimated and acclimated Scots pine seedlings originating from the geographically separated populations. Representative images of ten identified spots are shown. The complete gel image is presented in Figure S1.

Through liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis of trypsin-digested peptides, 26 proteoforms were unequivocally identified and annotated (Table A1). Among the identified proteins, 14 were associated with cellular metabolic process (GO:0044237), 17 with the related term metabolic process (GO:0008152) and 15 with response to stress (GO:0006950). At a lower GO hierarchy level, the terms included small molecule metabolic process (GO:0044281), cellular protein metabolic process (GO:0044267), cellular component organization (GO:0016043), regulation of biological process (GO:0050789), oxidation-reduction process (GO:0055114), defense response (GO:0006952) and response to salt stress (GO:0009651), as indicated in Figure 4. Three proteins, annotated as thaumatin-like protein (PITA\_000066768-RA), chloroplastic TIC 62 protein (PITA\_000059572-RA) and NAD-binding Rossmann-fold superfamily protein (PITA\_000059572-RA) (Table S1), had no GO terms assigned, but they are known to be involved in stress response processes (Albrecht and Bowman, 2012; Dutta et al., 2009; Rahjam et al., 2007). Another five proteoforms were identified as proteins encoded by genes of unknown function, and 29 spots contained a mixture of 2–4 proteins (Table S1).

#### *3.3. Protein Expression Patterns*

Cluster analysis of the abundance variation of the 76 differentially expressed protein spots revealed protein expression patterns associated with seedling origin and/or response to acclimation treatment (Figure 4). Cold acclimation treatment had little or no significant effect on the expression of proteins assigned to clusters 4–5, 7, 9 and 11–12, which mostly reflected population-dependent expression differences. The most prominent difference among the three populations was observed for the two proteoforms assigned to cluster 12. Lactoylglutathione lyase was approximately 5- to 10-fold more abundant in seedlings from the Spanish population than in the other two populations. Four proteoforms in cluster 11, none of which could be identified, were more abundant in the Spanish population. Furthermore, six proteoforms (including tubulin beta chain-like and tau class glutathione S-transferase (GST)) were assigned to cluster 9 and were more abundant in the Spanish and Lithuanian populations.

**Figure 4.** Cluster analysis of the abundance of the 76 protein spots differentially expressed in needles of Scots pine seedlings among the three populations and/or under cold acclimation treatment. The standardized abundance scale is shown in the lower left corner. Thirteen major clusters are marked by numbers. Asterisks indicate spots identified as mixtures of 2–4 proteins (listed in Table S1). GO terms associated with the identified unique protein sequences are shown on the right (the terms corresponding to the indicated numbers are listed in the Results section).

In contrast, a group of 22 proteins (including calreticulin-3-like isoform X1, malate dehydrogenase (MDH), proteasome subunit beta type-5 (PSMB5), alcohol dehydrogenase, 40S ribosomal protein SA-like, chaperone protein 1 and the Ser/Thr-protein phosphatase 2A (PP2A) catalytic subunit) was more abundant either in the Finnish population or in both northern populations; these proteins were assigned to clusters 4, 5 and 7.

The expression patterns of the remaining clusters 1–3, 6, 8, 10 and 13 reflected population- and/or acclimation-dependent differences (Figure 4). Before acclimation, the majority of the 18 proteoforms in clusters 1, 3 and 13 (including CP29-like, thaumatin-like protein (TLP), cyanate hydratase and two proteoforms of pathogenesis-related 10 (PR-10)) were significantly more abundant in seedlings of the Finnish population or of both northern populations. These proteins were significantly upregulated in acclimated seedlings from all three populations; however, for cluster 3, the response was more prominent in the Finnish population and/or both northern populations. Cyanate hydratase (cluster 13) was upregulated approximately 10-fold in acclimated seedlings of the Finnish population compared with the non-acclimated control, representing the strongest response to the acclimation treatment.

Among the 10 proteoforms in cluster 2, only two (unidentified proteoforms 7 and 8) differed significantly among the populations before the acclimation treatment. All proteins of this cluster were upregulated by the acclimation treatment, and the response was more prominent in seedlings of the northern populations, leading to additional significant differences among the populations (protein disulfide-isomerase (PDI), nucleoside diphosphate kinase 1 (NDK1) and three unidentified proteoforms).

Furthermore, the abundance of 14 proteoforms in clusters 6, 8 and 10 decreased markedly upon the acclimation treatment. The acclimation response was similar in all three populations; however, the protein abundance patterns differed. Among the three proteoforms of cluster 6, guanosine-diphosphate (GDP)-mannose epimerase 2 and proteoform 49 were more abundant in the northern populations. Meanwhile, the remaining proteins (clusters 8 and 10), including probable linoleate 9S-LOX 5, aconitase, TIC 62, oxygen-evolving enhancer protein (OEE), nicotinamide adenine dinucleotide (NAD)-binding RF superfamily protein, abscisic acid, water deficit stress and ripening-like (ASR-like) protein and water deficit inducible lipoprotein 3 (LP3)-like protein, were more abundant in samples from the Spanish population.

Notably, only a few proteoforms (including four identified proteins) showed strictly population-specific regulation upon the acclimation treatment. Malate dehydrogenase was upregulated in both northern populations, although the acclimation-induced expression level in the Lithuanian population remained below the constitutive expression level in the Spanish population. Calreticulin-3-like isoform and alcohol dehydrogenase were upregulated in the Finnish and Lithuanian populations, respectively. Meanwhile, the catalytic subunit of Ser/Thr-PP2A was downregulated in the northern populations, to a level comparable to its constitutive expression in the Spanish population.

#### **4. Discussion**

The adaptive capacity of a species in response to specific environmental cues can be inferred from mechanistic information at the molecular level and its integration with ecologically relevant phenotypes [55]. A previous study in conifers [19] demonstrated that proteomic analysis of young seedlings can reveal variation in gene expression that may be important for adaptation to contrasting altitudes in closely related Norway spruce ecotypes, and that this adaptive capacity can be explored further by exposing the seedlings to heat stress [18]. Our study revealed a similar quantitative variation in protein expression profiles in the needles of young Scots pine seedlings from geographically distant populations, although this variation reflects different, possibly species-specific physiological traits of potential adaptive significance. Conifer seedlings develop acclimation capacity in response to a short photoperiod and low temperatures at an early stage of development [56]; therefore, cold acclimation treatment of the seedlings revealed additional differences in protein expression profiles that reflect variation in the cold adaptation capacity of the pine genotypes.

#### *4.1. Stress Response Proteins Upregulated in the Northern Populations under Active Growth Conditions*

Under conditions promoting active growth, both northern populations, and in particular the Finnish population, had higher abundance of proteins involved in protein metabolism (synthesis or degradation), such as heat shock protein (HSP), calreticulin, and structural constituents of 40S ribosomes and the 26S proteasome (Figure 4), which are also known constituents of stress tolerance mechanisms. Chaperone protein 1 is a 100 kDa HSP that is critical for thermotolerance [57] and plant development [58]. Studies with *Arabidopsis* mutants revealed that constitutive expression of Hsp100 leads to a preformed adaptation that ensures effective protection under heat stress [59]. Calreticulin is another chaperone protein; it folds newly synthesized proteins and is also involved in the regulation of Ca<sup>2+</sup> homeostasis in the endoplasmic reticulum [60]. Upregulation of calreticulin has been observed under salt and osmotic stress in *Arabidopsis* [61], and a role for the protein as a signaling component in the cold stress response has been proposed in rice [62]. The proteasome subunit beta type-5 is a component of the 26S proteasome, which mediates ubiquitin-dependent proteolysis and plays an important role in the stress-induced degradation of misfolded proteins as well as in the regulation of stress response signaling [63]. The 40S ribosomal protein SA-like is a structural constituent of ribosomes that regulates protein synthesis. The p40 protein homologue is crucial for active tissue growth [64], and its transcriptional upregulation under salt stress has been described in *Arabidopsis* [65].

Furthermore, upregulation of TLP and PR-10 was unique to the Finnish population. TLPs belong to the PR-5 protein family and are induced under biotic, osmotic or cold stress in a variety of plants [66–68]. It is also well established that PR-10 proteins are involved in biotic and abiotic stress responses [69]. In perennial plants such as mulberry, PR-10 expression is induced during winter dormancy, where the protein either provides cryoprotection for freeze-labile enzymes or serves as a nitrogen-storage protein [70].

In addition, seedlings from the Finnish population had higher abundance of alcohol dehydrogenase, malate dehydrogenase, GDP-mannose epimerase 2 and cyanate hydratase, enzymes that are involved in primary cell metabolism but also play a role in stress tolerance. Upregulation of alcohol dehydrogenase is essential for anaerobic respiration under cold stress conditions [71,72]. Malate is an important intermediate and regulator of metabolic processes. Plants have several malate dehydrogenases that catalyze the interconversion of malate and oxaloacetate coupled to the reduction or oxidation of the NAD pool, and this function has been linked to abiotic stress tolerance in alfalfa and apple [73,74]. Upregulation of cyanate hydratase upon salt stress was observed in tomato [75] and *Suaeda aegyptiaca* (Hasselq.) Zohary [76]. Accumulation of cyanide compounds is associated with elevated ethylene synthesis as part of the plant stress response; therefore, cyanate hydratase is considered a key enzyme for cyanide detoxification [77,78]. GDP-mannose epimerase converts GDP-D-mannose to GDP-L-galactose, a precursor of ascorbate and cell wall polysaccharides [79]. This branch point of metabolism plays a dual role, in neutralizing reactive oxygen species [80] and in plant development [81].

Under active growth conditions, seedlings of the northern populations were shorter and accumulated less needle biomass than those of the Spanish population (Table 1). It could be presumed that the reduced growth and the upregulation of stress response proteins indicate the onset of cold acclimation, triggered by the photoperiod used in the active growth experiments (16 h light/8 h dark), because the critical night length for growth cessation is short in the northern populations [34]. However, the seedlings had a relatively high level of the catalytic subunit of PP2A under active growth conditions, and this level was effectively downregulated by the acclimation treatment. PP2A is involved in cold stress signaling as a negative modulator of adaptive responses [82], and PP2A transcript levels are reduced in tomato and alfalfa upon cold stress treatment [83,84]. This suggests that seedlings of the northern populations maintained active growth under the experimental conditions, and that the observed differences in growth and protein expression were rather a consequence of physiological differences among the genotypes. The higher abundance of proteins involved in stress tolerance could be linked to a constitutive cold adaptation mechanism characteristic of the northern genotypes.

#### *4.2. Stress Response Proteins Upregulated in the Lithuanian and/or Spanish Populations under Active Growth Conditions*

Clusters 9 through 12 (Figure 4) include 20 proteoforms that were upregulated mostly in the Lithuanian and/or Spanish populations under active growth conditions. Lactoylglutathione lyase, also known as glyoxalase I, is highly expressed in the Spanish population (approximately 10- and 5-fold higher than in the Lithuanian and Finnish populations, respectively). The enzyme detoxifies methylglyoxal, a cytotoxic compound that is produced from glycolytic intermediates and generates free radicals [85]. Methylglyoxal levels have been shown to rise under salinity stress and are regulated by glutathione and glyoxalase I [86].

As could be expected from its geographical origin, the Lithuanian population tends to follow a protein expression pattern similar to that of the Finnish population located further north. However, expression of the proteins assigned to clusters 9 and 10 (including tubulin beta chain-like, tau class GST, chloroplastic OEE, ASR-like and LP3-like) in the Lithuanian population is similar to that in the Spanish population. Tau class GST is involved in reactive oxygen species (ROS) detoxification reactions that conjugate reduced glutathione to a variety of target compounds. Expression of GSTs in plants responds to both biotic and abiotic stresses [87], and overexpression of the enzyme confers chilling and osmotic stress tolerance in tobacco seedlings [88]. Beta-tubulin is one of the two structural components of microtubules and is intimately involved in the regulation of cell morphogenesis. In addition, ROS signaling-mediated rearrangement of the tubulin cytoskeleton has been shown to play a crucial part in cell adaptation to stress [89]. Further, the abscisic acid-induced ASR-like protein plays an important role in drought and salinity stress [90], and overexpression of *ASR* genes increases cold tolerance [91,92]. LP3-like protein is a water-deficit-induced protein homologous to ASR that accumulates in loblolly pine roots under water-deficit stress [93]. OEE is involved in the assembly of Photosystem II and has a protective effect on the manganese cluster [94]. Increased expression of OEE1 has been shown under salt or water-deficit stress [95,96].

The observed variation in stress-related protein expression among populations from the southern and northern boundaries of the continent indicates the presence of constitutive stress tolerance mechanisms suited to the climatic conditions of these geographically distant habitats. The Lithuanian population shows a mixture of these traits.

#### *4.3. Effect of Acclimation Treatment on Protein Expression*

Cold acclimation treatment revealed additional protein expression differences that fall into three major groups: proteins that were (i) upregulated or (ii) downregulated in seedlings of all three populations, or (iii) regulated in a population-specific manner. These differences suggest the presence of distinct elements of the cold acclimation mechanism and/or variation in their regulation among the geographically distant pine populations.

The functions of the first group of proteins are linked to stress response. HSC70-2 is a protein folding catalyst involved in protein stabilization under abiotic stress conditions [97]. PDI belongs to another well-known family of chaperones that mediates the formation, isomerization and reduction/oxidation of disulfide bonds during protein synthesis [98]. Ribonucleoprotein chloroplastic-like is required for normal chloroplast development under cold stress conditions, which it supports by stabilizing mRNA transcripts [99]. Nucleoside diphosphate kinase 1 is a cytosolic enzyme essential for the homeostasis of cellular nucleoside triphosphate pools [100], and its role under cold stress conditions has been described [101]. The stress response-related proteins PR-10, TLP and cyanate hydratase were also strongly upregulated upon acclimation treatment; however, their abundance in the Spanish and Lithuanian populations did not exceed the level observed in the Finnish population under active growth conditions.

The acclimation treatment downregulated chloroplastic Tic62, a component of the Tic translocon that mediates translocation of nuclear-encoded precursor proteins across the inner envelope of chloroplasts, as well as chloroplastic OEE. The downregulation of several further enzymes, including cytoplasmic aconitase involved in the Krebs cycle, probable linoleate 9S-lipoxygenase 5 of the oxylipin synthesis pathway and GDP-mannose epimerase 2 involved in ascorbate and cell wall polysaccharide synthesis, might result from reduced metabolic and photosynthetic activity in seedlings maintained under the low-temperature, reduced-light acclimation conditions. However, seasonal changes in chloroplast ultrastructure [102] and depression of photosynthesis [103] are also part of the winter adaptation process in conifers.

The abundance of ASR-like protein and the homologous LP3-like protein also decreased in the seedling needle samples. This may appear to contradict previous studies demonstrating that ASR and LP3 play an essential role in stress response [90,91,93]. However, Padmanabhan et al. [93] showed that the LP3 transcript is preferentially induced in roots, with a constitutive basal level of expression in stems and needles, which would explain the low expression levels in our experimental setup.

Among the proteins regulated in a population-specific manner, PP2A, the negative modulator of adaptive responses [82], was downregulated, and MDH and calreticulin-3-like isoforms were upregulated in the northern populations upon the acclimation treatment. In the Spanish population, the response to the acclimation treatment was distinctly less intense than in the northern populations.

#### **5. Conclusions**

Our study demonstrates that comparative proteomic analysis of young Scots pine seedlings can reflect differences in adaptive capacity among geographically distant populations. Under active growth conditions, quantitative protein expression differences were related to biological processes of cell metabolism and stress response and could represent elements of adaptive mechanisms tailored to the specific growth environment. Furthermore, the acclimation treatment revealed an additional layer of protein expression differences that could provide insight into the molecular basis of cold adaptation capacity among the genotypes. As expected, the acclimation-induced protein expression patterns confirmed that the pine populations share a similar cold acclimation mechanism; however, the analysis also revealed discrete variation in adaptive traits. The most prominent differences were the constitutively low level of PP2A expression and the overall less pronounced response to the acclimation treatment in seedlings from the Spanish population. The proteins identified in the constitutive and acclimation-induced expression profiles are interesting candidates for further studies of cold adaptation physiology in pine.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/11/1/89/s1: Figure S1: Two-dimensional electrophoresis gel of the needle protein samples isolated from Scots pine seedlings; Table S1: Proteins differentially expressed in seedlings of the different populations of Scots pine under active growth conditions and upon acclimation treatment that were identified as unknown protein sequences of the *Pinus taeda* genome or contained several protein mixtures.

**Author Contributions:** Conceptualization, D.B. and D.D.; investigation, D.B., M.S., P.H., I.T.; writing—original draft preparation, D.B., D.D.; writing—review and editing, M.S., P.H., I.T.; funding acquisition, D.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Science and Education of Lithuania, grant number VP1-3.1-ŠMM-08-K-01-025.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**


**Table A1.** Proteins differentially expressed in seedlings of the different populations of Scots pine under active growth conditions (white bars) and upon acclimation treatment (grey bars).


<sup>a</sup> Sequences selected as the most abundant sequences in a protein mixture using the emPAI index. Data are presented in the graphs as the log mean and standard error of the mean of standardized abundance; the same letters indicate statistically significant (*p* < 0.01) differences between the three populations, and the same number of stars indicates a significant difference between non-acclimated and acclimated experimental groups within each population. Abbreviations: p. no.: peptide number; SC: sequence coverage; Theoretical MW/pI: theoretical molecular weight and pI values; SP, LT, FI: Scots pine populations from Spain, Lithuania and Finland.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Bark Features for Identifying Resonance Spruce Standing Timber**

#### **Florin Dinulică <sup>1</sup>, Cristian-Teofil Albu <sup>2</sup>, Maria Magdalena Vasilescu <sup>1,\*</sup> and Mariana Domnica Stanciu <sup>3</sup>**


Received: 15 August 2019; Accepted: 6 September 2019; Published: 12 September 2019

**Abstract:** Measuring the acoustic properties of wood is not feasible for most luthiers, so identifying simple, valid diagnostic criteria remains an exciting challenge when selecting materials for manufacturing musical instruments. This article aims to verify whether bark traits are indeed useful markers of resonance wood. The morphometric and colour traits (in CIELab space) of the bark scales were compared with the structural (width and regularity of the growth rings and of the latewood) and acoustic features (transverse sound velocity, radiation ratio, impedance, and wood basic density) of the wood from 145 standing and 10 felled spruce trees, which are considered a resource of resonance wood in the Romanian Carpathians. Spruce trees whose acoustic and structural features match the requirements for violin manufacture have a bark phenotype distinguishable by colour (higher redness, lower yellowness and lower brightness) as well as by scale shape (higher slenderness and width). The south-facing side of the trunk and the external side of the scale are best for identifying resonance trees by their bark. Additionally, mature bark phenotypes show topoclinal variation and do not depend on tree age. Moreover, the differences among bark phenotypes are noticeable to the naked eye.

**Keywords:** bark phenotype; bark scale; Norway spruce; resonance wood; sonic tomography

#### **1. Introduction**

There are 1.43 million ha of Norway spruce (*Picea abies* (L.) Karst.) in the Romanian Carpathians [1], which host one of the most important resources of resonance wood in Europe in terms of value and volume [2,3]. Although it is continuously decreasing, the resonance wood resource still satisfies the demand of the local musical instrument industry, whose products are shipped to all continents [4].

Recognizing standing trees that contain resonance wood has always been a challenge for luthiers. Long-time observations revealed the distinct physiognomy of resonance trees [5–7], but the topic remained somewhat unresolved until some of the physiognomic features were acoustically verified [8]. The literature provides a few morphological descriptors of the stem and crown of resonance trees [9,10] and of raw resonance wood [11–15]. However, other authors have placed little trust in these descriptors [16], labelling them as folkloric [17]. Given the indicative value of some tree traits for the acoustic quality of the material, we hereafter call them phenotypic markers or morphological descriptors of resonance spruce. Some luthiers empirically associate resonance spruce with trees that have smooth or thinly fissured bark with small, soft, rounded scales grouped vertically [8,9,18], unlike regular spruce trees, which have thicker and deeply furrowed bark at the age at which resonance wood is harvested [4]. Furthermore, spruce with indented rings is sought after by violin makers [19], specifically when the indentations imprint the inner side of the bark [20]. In any case, the bark descriptors of resonance wood have not yet been verified acoustically and statistically [21].

Bark width normally reflects tree growth traits [22] and, like them, is age- and site-related [23]. Qualitative features of bark, such as relief and colour, are predominantly hereditary [24,25] and have a certain taxonomic value, so that bark texture can allow the digital identification of species [26,27].

Besides its ecological and functional significance [28,29], bark morphology can be a good indicator of wood properties [30]. For instance, among fir trees of similar age, those with persistently smooth bark have lighter wood with less cellulose than trees that develop rough bark early [31]. In European and Chinese pear trees, trees that develop rough bark early contain more lignin in the wood and have a low carbon use ratio [32]. In Scots pine, trees with deeply fissured bark in thin, square plates that are smooth and light in colour have red heartwood and a larger amount of heartwood and resin [33].

Sound velocity, wood density, dynamic modulus of elasticity and derived indices such as radiation ratio, specific modulus, characteristic impedance, and acoustic converting efficiency are preferred for expressing the suitability of wood for stringed instruments [14,33–36]. High values of specific modulus of elasticity, sound velocity and radiation ratio, together with low values of impedance, internal friction and density, are recommended when choosing material for soundboards [14,37–41]. Sound propagation velocity sets the clarity of the sound emitted by the musical instrument, acoustic radiation is a measure of acoustic power (in particular, of sound loudness), and acoustic impedance expresses the sound sprinting [42,43].

Vibrational methods have already become common for identifying damage in standing trees [44–47] and assessing tree stiffness [48,49], but their use in assessing the suitability of trees for musical instrument manufacture is still at an early stage [8]. New advanced methods, such as X-ray light microtomography coupled with scanning electron microscopy, are being used to describe the acoustical behaviour of wood [50]. Our aim was to test the hypothesized link between bark features and the acoustic qualities of wood from stands that supply raw materials for violin manufacturing, specifically whether resonance wood can be diagnosed objectively from the bark. For this purpose, we: (1) identified the sources of variation in the bark phenotype; (2) checked the connection between bark features and wood structure; and (3) verified the relation between bark features and the acoustic properties of the wood.

#### **2. Materials and Methods**

#### *2.1. Sampled Area*

The materials originate from the Gurghiu Mountains, which today host the largest concentration of resonance wood in the Romanian Carpathians (Figure 1). The relief and the local volcanic substratum favoured the selection and settlement of the resonance spruce phenotype [4]. The resonance spruce trees are located inside the former volcanic basin that today forms the Gurghiu Mountains. The volcanic bowl protects the trees from excessive wind, thus reducing the occurrence of compression wood, which compromises the acoustic value of the wood [51]. The mean annual temperature is around 5.2 °C at 1200 to 1300 m elevation. The bedrock is of andesitic origin, and the area is well supplied with rainwater (850–1100 mm·year<sup>−1</sup>). The soils are deep, loose, and of moderate fertility.

**Figure 1.** Location of the sampled resource of resonance spruce wood.

The local stands containing resonance wood are mixed, relatively uneven-aged or two-storied stands consisting of spruce (*Picea abies* Karst.), beech (*Fagus sylvatica* L.), fir (*Abies alba* Mill.) and sometimes sycamore (*Acer pseudoplatanus* L.). The resonance spruce trees occupy the middle third of moderately to steeply inclined slopes, avoiding the cold air of the valleys and the peaks.

#### *2.2. Sampling Design*

To check the link between the bark features, the wood structure, and the acoustic properties of the wood, four sample plots were established (Figure 2) in stands that had the highest occurrence of resonance spruce (identified by habitus [5,15,18]).

**Figure 2.** Scheme of the sampling design and outlook of one of the sample sites.

In the first three plots, spruce trees with diameters larger than 20 cm were cored at breast height along two directions: one from the self-pruned sector of the bole and the other from the non-pruned sector. An increment borer 400 mm long with an inner diameter of 5 mm was used (Table 1, denoted a). In the fourth sample plot (GUR4), which had undergone logging operations, 10 resonance trees later used by the Gliga Company for manufacturing violins were selected. These 10 trees yielded 18 resonance logs, from which discs were cut every 2 m, resulting in the 65 discs used in the present study (Table 1, denoted b).



\* Cmeu: Eutric Cambisols; Cmdy: Dystric Cambisols; PZrs: Rustic Podzols; BOis: Lepti-dystric Cambisols [52]; \*\* BC: European beech; NS: Norway spruce; SF: Silver fir; \*\*\* Range (first quartile–third quartile) and median.

<sup>a</sup> cores; <sup>b</sup> discs.

Bark samples (six bark scales) were taken at breast height from each tree and disc: three facing North and three facing South. This was done to check the influence of the various insolation conditions on the bark features [53].

#### *2.3. Processing the Material*

#### 2.3.1. Bark Measurements

The bark measurements were performed in the laboratory after at least three months of seasoning. At the time of measurement, the moisture content of the bark scales was 11%, as determined with an Ohaus MB45 halogen moisture analyzer [54] by drying at 104 ± 1 °C. The length LS and width WS of each scale were measured, and the bark scale slenderness index (LS/WS ratio) was calculated. To quantify colour, the CIELab chromatic system [55] was employed using a CR-400 portable colorimeter [56]. In this system, colour is rendered in a space with three coordinate axes whose values are provided by the colour meter: *L*\*, colour brightness or whiteness (%); *a*\*, colour redness/greenness; and *b*\*, colour yellowness/blueness. Both sides of each scale were scanned three times, and a total of 1884 scales were measured. Whether colour differences between bark samples can be perceived visually was assessed using the colour difference index Δ*E*\* calculated with Equation (1), whose values were compared with the perception scale proposed by Minemura and Umehara [57]:

$$
\Delta E^{*} = \sqrt{(\Delta L^{*})^{2} + (\Delta a^{*})^{2} + (\Delta b^{*})^{2}}. \tag{1}
$$
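Equation (1) is the Euclidean distance between two points in CIELab space. The sketch below illustrates it in Python, assuming that the three replicate scans per scale side are averaged before the difference is taken (the text does not say how replicates were combined). The function names and readings are hypothetical, and the Minemura and Umehara perception thresholds are not reproduced here, so the sketch stops at the Δ*E*\* value.

```python
import math


def mean_lab(readings):
    """Average replicate (L*, a*, b*) readings, e.g. three scans of one scale side."""
    n = len(readings)
    return tuple(sum(r[i] for r in readings) / n for i in range(3))


def delta_e_ab(lab1, lab2):
    """Colour difference Delta E* between two CIELab triplets, as in Equation (1)."""
    dl = lab1[0] - lab2[0]
    da = lab1[1] - lab2[1]
    db = lab1[2] - lab2[2]
    return math.sqrt(dl ** 2 + da ** 2 + db ** 2)


# Hypothetical readings for a grey and a brown bark scale (not data from this study).
grey_scale = mean_lab([(48.0, 2.1, 9.8), (47.5, 2.4, 10.1), (48.3, 2.0, 9.9)])
brown_scale = mean_lab([(39.2, 6.5, 12.0), (40.1, 6.9, 12.4), (39.6, 6.2, 11.8)])
print(round(delta_e_ab(grey_scale, brown_scale), 2))
```

The resulting value would then be placed on the chosen perception scale to judge whether the two barks differ visibly.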

#### 2.3.2. Wood Structural Data Acquisition

After seasoning, the cores were mounted on boards. The samples from the GUR2 plot, which were used for the destructive determination of wood density, were not glued to boards; their rings were measured after the surface of the cores had been smoothed. The discs and mounted cores were sanded to meet the requirements of a 1200 dpi scanning resolution. The individual growth rings were digitized using WinDENDRO Density equipment (Régent Instruments Inc., Québec City, Canada) [58], and the measurements were carried out on the scanned images. For each disc, four radii oriented towards the cardinal points were measured.

The digitized growth-ring data consist of ring width TRW, earlywood width, latewood width LWW, earlywood proportion, and latewood proportion LWP. The raw data were stacked into chronological series. From these raw variables, two more indices were derived for identifying resonance wood: the difference in width between consecutive rings, DBR (mm), and the ring width regularity index RI [13,59]. For the discs, the ring circumferential regularity index CI was calculated as the difference between the largest and smallest widths of a ring around the girth, expressed as a percentage of its average width:

$$CI_i = \frac{\max_k TR_{ik} - \min_k TR_{ik}}{\operatorname{mean}_k TR_{ik}} \cdot 100\ [\%], \tag{2}$$

where *CI<sub>i</sub>* is the circumferential irregularity of ring *i* and *TR<sub>ik</sub>* is the width of ring *i* on radius *k* of the disc.
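Equation (2) can be applied ring by ring once the widths of each ring on the disc radii are available. The following is a minimal Python sketch, assuming the widths are supplied as one list per ring, one value per measured radius (here the four cardinal radii). The function names and the example widths are hypothetical.

```python
def circumferential_irregularity(widths):
    """CI_i in percent for one ring, from its widths TR_ik on the k radii (Equation (2))."""
    if not widths:
        raise ValueError("at least one radius is required")
    mean_width = sum(widths) / len(widths)
    if mean_width <= 0:
        raise ValueError("mean ring width must be positive")
    # Range of the ring's widths around the girth, relative to its mean width.
    return (max(widths) - min(widths)) / mean_width * 100.0


def disc_irregularity(rings):
    """Apply Equation (2) to every ring; rings[i] holds the widths of ring i on each radius."""
    return [circumferential_irregularity(w) for w in rings]


# Hypothetical ring widths (mm) on the N, E, S and W radii of one disc.
disc = [
    [1.10, 1.05, 1.20, 1.00],
    [0.95, 0.90, 1.15, 0.92],
]
print([round(ci, 1) for ci in disc_irregularity(disc)])
```

A perfectly concentric ring gives CI = 0%, and larger values indicate a more eccentric ring.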

The width of the zone with resonance wood (LRE) was established from the fluctuations of TRW, LWP, and DBR along the radius [59]. The values of these features enabled the structural classification of the sampled trees (Table 2). For trees from which two cores had been extracted, the average structural quality class (SQC) was calculated.


**Table 2.** The structural-qualitative classification of raw material intended for violin making [4].

\* In the case of the first three classes, a maximum of seven growth rings exceeding the limits of the criteria for resonance wood by one standard deviation are admitted.

#### 2.3.3. Acoustic Measurements

The transverse sound velocity (m·s<sup>−1</sup>) was measured in the standing trees and discs using the Arbotom sonic tomograph produced by Rinntech [60]. Although the longitudinal direction is preferred in acoustic tests, the vibratory properties of wood across the fibre are proportional to those along the fibre [61]. The acoustic measurements in the standing trees were made outside the growing season, when the wood had an average gravimetric moisture content of 39% in heartwood and 98% in sapwood and the air temperature was around 4 °C. The acoustic measurements of the discs were made in the laboratory at a wood moisture content of 10%. The 15 sensors were placed around the trunk at breast height, 15 to 20 cm apart (Figure 3). The sensor positions were chosen according to the directions along which the standing trees had been cored or along which the disc rings had been measured. Each sensor was hammered five times with the same force. The software associated with the Arbotom equipment generated the sound velocity matrices and the sonic tomograms.

**Figure 3.** Sonic tomography measurement in the field.

To calculate the radiation ratio and the specific acoustic resistance (impedance), the wood basic density of the cores from the GUR2 plot was measured using the saturation method developed by Keylwerth [62]. The resin was first removed from the cores by classic Soxhlet extraction [63], because resin biases both acoustic radiation [64] and wood density values.

Then, the cores were divided into sections according to the structural quality of the wood: wood of structural quality suitable for manufacturing musical instruments and wood without these structural qualities (Table 2). The solvent was a 2:1 mixture of benzene and absolute ethyl alcohol [65], and the extraction lasted 12 h for each sample. Saturation was obtained by boiling in distilled water for 12 h [66]. After boiling, the samples were weighed to obtain the mass *m*<sub>max</sub> and then oven-dried at 103 °C to constant mass [67]. After cooling in a desiccator, they were weighed again (mass *m*<sub>0</sub>). The basic density (ρ<sub>c</sub>) was calculated as follows [68]:

$$\rho_c = \frac{1}{\dfrac{m_{\max}}{m_0} - 0.3464}\ \left[\mathrm{g \cdot cm^{-3}}\right]. \tag{3}$$

The radiation ratio R was calculated by dividing the sound velocity by the basic density of the core along the coring direction [38]. The impedance was calculated by multiplying the sound velocity by the wood basic density [38].
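Equation (3) and the two acoustic indices chain together naturally, as the Python sketch below shows. The constant 0.3464 equals 1 − 1/1.53, where 1.53 g·cm<sup>−3</sup> is the cell-wall substance density conventionally assumed by the maximum-moisture method. The text does not state units for R and Z, so the sketch assumes velocity in m·s<sup>−1</sup> and converts ρ<sub>c</sub> to kg·m<sup>−3</sup>. This gives R in m<sup>4</sup>·kg<sup>−1</sup>·s<sup>−1</sup> and Z in kg·m<sup>−2</sup>·s<sup>−1</sup>. The function names and sample masses are hypothetical.

```python
def basic_density(m_max, m_0):
    """Basic density rho_c in g/cm^3 from saturated and oven-dry masses (Equation (3))."""
    return 1.0 / (m_max / m_0 - 0.3464)


def radiation_ratio(velocity, rho_c):
    """R = v / rho, with v in m/s and rho_c converted from g/cm^3 to kg/m^3."""
    return velocity / (rho_c * 1000.0)


def impedance(velocity, rho_c):
    """Z = v * rho, in kg/(m^2 s) when v is in m/s and rho_c is converted to kg/m^3."""
    return velocity * rho_c * 1000.0


# Hypothetical core section: saturated and oven-dry masses in grams, transverse velocity in m/s.
rho = basic_density(m_max=1.452, m_0=0.510)
v = 1250.0
print(round(rho, 3), round(radiation_ratio(v, rho), 2), round(impedance(v, rho)))
```

Because R divides by density and Z multiplies by it, a light core with high velocity scores well on both criteria recommended for soundboards.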

#### *2.4. Data Processing*

For split discs, only the values along directions that did not cross the shake were kept in the sound velocity matrix. Sound velocity values lower than 600 m·s<sup>−1</sup> were excluded from the statistical processing. The raw data were imported and processed using STATISTICA 8.0 [69].

Simple, multiple, and partial correlation analyses were employed to quantify the dependence between variables [70]. Multiple correlation was used to identify the explanatory variables and estimate their influence. Only variables with a statistically significant partial regression coefficient (as confirmed by the *t* test) were considered. Some of the relationships were expressed as regressions, and the regression model with the highest coefficient of determination was adopted. To avoid multicollinearity, the relationships among predictors were checked beforehand, and only mutually independent explanatory variables were retained in the model.
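The analyses were run in STATISTICA, but the idea behind a partial correlation is compact enough to show directly. The first-order partial correlation of *x* and *y* controlling for *z* is (*r<sub>xy</sub>* − *r<sub>xz</sub>r<sub>yz</sub>*) / √((1 − *r<sub>xz</sub>*²)(1 − *r<sub>yz</sub>*²)). The standard-library Python sketch below implements this formula. It is an illustration only, not the authors' procedure, and the variables in the usage lines are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, removing the linear effect of z."""
    rxy = pearson_r(x, y)
    rxz = pearson_r(x, z)
    ryz = pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values for six trees (not data from this study).
redness = [4.1, 5.3, 3.8, 6.0, 5.1, 4.6]            # a* of bark scales
velocity = [1180, 1260, 1150, 1320, 1240, 1200]     # transverse sound velocity, m/s
age = [142, 150, 138, 165, 149, 151]                # tree age, years
print(round(partial_r(redness, velocity, age), 3))
```

The result equals the correlation between the residuals of *x* and *y* after each is regressed linearly on *z*, which is why it isolates the bark and wood link from the shared effect of the control variable.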

The structural quality class of the wood in the basal area was adopted as a criterion for stratifying the morphological tree features. Only the variables whose link with SQC had previously been confirmed by a significance test were stratified. To select this test, the normality of the distributions was checked using the Shapiro-Wilk test. ANOVA was used for normally distributed variables, and the non-parametric Kruskal-Wallis test or rank tests were used for the other variables [70].
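For the non-normal bark variables, the Kruskal-Wallis test compares groups (for example, SQC classes) using ranks instead of raw values. The standard-library Python sketch below computes the tie-corrected *H* statistic and its degrees of freedom (*k* − 1). It omits the p-value, which comes from the chi-square distribution, for instance via `scipy.stats.chi2.sf(h, df)`. SciPy also provides `scipy.stats.kruskal` and `scipy.stats.shapiro` for the full test and the normality check. This is an illustration, not the STATISTICA routine used in the study, and the example values are hypothetical.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom (k - 1)."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank total)^2 / group size.
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over runs of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical bark redness values in three structural quality classes.
h, df = kruskal_wallis_h([2.1, 3.4, 2.8], [4.0, 4.6, 3.9, 5.2], [5.8, 6.1, 5.5])
print(round(h, 3), df)
```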

#### **3. Results**

#### *3.1. The Variability of the Bark Features*

The examined bark features (scale size and colour) were moderately variable (coefficients of variation between samples from 15.7% to 32.9%), except for scale colour redness/greenness, which was highly variable (coefficient of variation of 82.6% and the largest amplitude). None of these variables was normally distributed (*W* > 0.850, *p* < 0.0001). All of them varied continuously, some with a high percent relative range, such as colour redness/greenness and scale length (376% and 235%, respectively, excluding outliers), and others with a low percent relative range, such as colour brightness (91%, excluding outliers). Although continuous, the bark chromatic values tended to separate into classes that distinguish grey-barked from brown-barked trees (Figure 4).
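The two dispersion measures quoted above can be computed as follows. We assume that the coefficient of variation uses the sample standard deviation and that the percent relative range is (max − min)/mean × 100. The section does not spell out either definition, and the outlier screening it mentions is not reproduced in this sketch.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def percent_relative_range(values):
    """(max - min) as a percentage of the mean (assumed definition)."""
    return (max(values) - min(values)) / statistics.mean(values) * 100


# Hypothetical bark scale lengths (mm) from one sample plot.
scale_length = [18.2, 22.5, 30.1, 15.4, 26.8, 21.0]
print(f"CV = {coefficient_of_variation(scale_length):.1f}%")
print(f"relative range = {percent_relative_range(scale_length):.0f}%")
```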

**Figure 4.** The spectrum of bark colours in the measured spruce trees.

No negative values of *b*\* were recorded, so blue was not detected in the colour of the bark scales. Negative values of *a*\* (scale greenness) occurred with a frequency of 7.3% and were found only on the external side of the scales, especially on the south-facing side of the trunk.

The bark features are stable among trees within a sample plot (Table 3). The sample plot is the main source of variation for scale size, slenderness, and brightness, but not for bark scale redness/greenness and yellowness. Redness/greenness, yellowness, and scale width differ between the northern and southern sides of the trunk, whereas scale slenderness does not (Table 3). Both colour variables have higher values on the southern side, and their values are more stable from plot to plot on the northern side.

**Table 3.** The statistical significance of the influence of some factors on the analyzed bark features, using the Kruskal-Wallis test.


LS: bark scale length; WS: bark scale width; SS: bark scale slenderness; *a*\*: bark colour redness/greenness; *b*\*: bark colour yellowness/blueness.

Scale length and width together account for 94% of the variation in scale slenderness (multiple *R* ≤ 0.590, *F* = 7475, *p* < 0.001). The relationship between scale shape and colour is weak: Spearman rank-order correlations *R* range from −0.156 (*p* < 0.001) to 0.048 (*p* = 0.15).

Tree age contributes up to 6.8% of the variation in the bark phenotype (Spearman rank-order correlations *R* with age between −0.180 and +0.261, *p* = 0.01–0.98). The magnitude and range of the scale slenderness index decrease with tree age (*H* = 10.40, *p* = 0.03). The influence of tree age on bark colour was detected only on the external side of the north-facing scales (*p* = 0.01–0.05, compared to 0.06–0.98 on the southern side). On the internal side of the scales, only a slight decrease in colour brightness with age was detectable (*R* = −0.176, *p* = 0.05). The oldest trees (older than 300 years) differed from younger trees by a lower yellowness on the external side of the north-facing scales (*H* = 9.41, *p* = 0.05). The bark colour differences (Δ*E*\*) between the northern and southern sides of the stem, as well as between the internal and external sides of the scales, were not influenced by tree age (*H* = 1.99, *p* = 0.74 and *H* = 2.60, *p* = 0.63, respectively).

The variation in bark colour from one sample plot to another is also explained by altitude. Altitude accounts for up to 13% of the variation in scale features: the Spearman rank-order correlation *R* is +0.364 (*p* < 0.001) with bark yellowness on the northern external side of the scale; −0.288 and −0.264 (*p* < 0.001), respectively, with bark redness on the northern and southern external sides of the scale; and −0.159 (*p* = 0.04) with scale length.
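Spearman's rank-order correlation, used throughout this section, is the Pearson correlation computed on average ranks. The "share" attributed to altitude or age appears to be the squared coefficient: 0.364² ≈ 0.13 matches the 13% quoted above, and 0.261² ≈ 0.068 matches the 6.8% for tree age. This reading is our inference. A self-contained sketch with hypothetical helper names and data:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical plot altitudes (m) and mean bark yellowness (b*).
altitude = [1215, 1290, 1350, 1420, 1480, 1580]
yellowness = [14.1, 15.0, 14.8, 16.2, 16.9, 17.3]
rho = spearman(altitude, yellowness)
print(f"Spearman R = {rho:.3f}, share of variation = {rho * rho:.0%}")
```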

All the colour differences Δ*E*\* between the scales were, without exception, visible to the naked eye, at least at a "light" level of visual assessment (Δ*E*\* > 0.70), according to Minemura and Umehara's classification [48]. The colour differences between the two trunk sides (northern and southern) reached "considerable" and "important" levels (3.9–9.9), while the differences between the scale sides were "very important" (10.2–13.6). The colour differences between the scale sides are considerably larger than those between the trunk sides and are due to redness/greenness and yellowness (*H* = 1279.45, *p* < 0.0001 and *H* = 970.15, *p* < 0.0001, respectively). More precisely, the internal side of the scales is more reddish than the external side.
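The colour difference Δ*E*\* compares two points in CIELAB space. Assuming the CIE 1976 formula, the Euclidean distance over *L*\*, *a*\*, and *b*\* (the section does not state the formula explicitly), a minimal sketch is given below. It does not map the result onto Minemura and Umehara's verbal levels [48], because only some of their thresholds are quoted here.

```python
import math


def delta_e_ab(lab1, lab2):
    """CIE 1976 colour difference between two (L*, a*, b*) triples."""
    dl = lab1[0] - lab2[0]
    da = lab1[1] - lab2[1]
    db = lab1[2] - lab2[2]
    return math.sqrt(dl * dl + da * da + db * db)


# Hypothetical readings on the internal and external side of one scale.
internal = (42.0, 9.5, 18.0)
external = (48.5, 4.0, 16.0)
print(f"dE*ab = {delta_e_ab(internal, external):.2f}")
```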

#### *3.2. Linking Bark Features to the Wood Structure*

The qualities of bark as a marker of wood structure were analyzed using correlation and statistical significance tests.

The variation in scale length and width is independent of wood structure (Spearman rank-order correlation ≤ 0.16, *p* > 0.09). By contrast, the scale slenderness index is strongly connected to ring structure regularity (Table 4). The link is even stronger when tree age is added as a second predictor (multiple *R*). When the influence of age is partialled out, the scale slenderness index accounts for at least 73% of the circumferential irregularity (squared partial *r*). DBH does not play a determining role in the link between scale slenderness and wood structure (Table 4). The wood structure descriptor most closely linked to the bark is the circumferential irregularity of latewood width (Table 4). A logistic growth function best describes the relationship between these two variables (Figure 5). Similarly, the relationship between scale slenderness and the circumferential irregularity of ring width is better expressed by a polynomial model. All these links indicate that trees with more regular annual rings have more slender bark scales. If the latewood irregularity threshold of 70% [13,59] is applied to Figure 5, the scale slenderness index must be at least 1.9 for resonance spruce trees.

**Table 4.** Bark as predictor of spruce wood structure from the breast height (only the correlations with *R* > 0.5 are shown).


• *p* from t test for the significance of the partial regression coefficient associated with the tree age (TA) and the DBH (diameter at breast height) variable, respectively. If *p* > 0.05, then the influence of the tree age or the diameter on the predictor is not statistically proven.

**Figure 5.** The regression of circumferential irregularity of latewood thickness with the bark scale slenderness index in trees sampled with discs.

Multiple regression analysis indicates that the width of the resonance zone at breast height is well predicted by the brightness and redness of the bark scales (*F* = 7.23, *p* = 0.0002).

*Forests* **2019**, *10*, 799

The trees with scales that were redder and darker on the north-facing side of the trunk have a wider resonance zone.

Some of the tree variables were grouped according to the structural quality of the wood; Figure 6 only shows the variables of interest for this research, as well as the variables of bark colour for which *p* ≤ 0.05. The bark scale slenderness index and length do not appear to be connected with the wood structure quality (Figure 6). However, the bark scales of resonance structural trees are somewhat wider.

Furthermore, the values of the chromatic components separated according to structural quality class (Figure 6). However, the differences between structural classes in bark scale colour are not major and concern only one side of the trunk, either the northern or the southern side. After the colour components were stratified by cardinal direction and by scale side, four chromatic markers of SQC became apparent (Figure 6): bark brightness on the north side, redness on the back of the south-facing scales, and yellowness on both sides of the bark. The contribution of tree age was negligible: the correlation coefficients between the chromatic coordinates and tree age are below 0.200 (*p* > 0.04). Tree diameter contributes up to 7.3% of the colour variation.

**Figure 6.** The stratification \* of some morphological traits according to the structural quality class of the trees (*p* from Kruskal-Wallis test or ANOVA).

Even where the differences in bark colour between trees are not statistically significant, they are visible to the naked eye, as shown by the Δ*E*\* values (Table 5). The northern side of the trunk is the most suitable for this purpose. The largest differences occur between the extreme quality classes. Trees with a mature (SQC ≤ 3) or developing (SQC = 4) structural quality can be easily distinguished (at the "considerable" and "important" levels of visual assessment) from trees with no such qualities.


**Table 5.** The matrix of median differences in bark colour between the structural classes of standing spruce trees (Δ*E*\* on the northern side of the breast height section/Δ*E*\* on the southern side).

#### *3.3. Acoustic Screening of Bark Markers*

The measured acoustic features of the wood show low variability (Table 6). All the acoustic variables examined are normally distributed (*W* from the Shapiro-Wilk test > 0.93, *p* > 0.07), which justified the use of ANOVA as the significance test. The sound velocity values can be grouped according to the sampled sources of variation. Of particular interest is the association of structural quality class with transverse sound velocity and radiation ratio: the highest values of these acoustic variables occur in quality class 3 and the lowest in quality class 5, in both standing and felled trees.
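For these Gaussian variables, the one-way ANOVA *F* statistic compares the between-class and within-class mean squares. The sketch below shows only the statistic and its degrees of freedom; it is not the STATISTICA output behind Table 6. A p-value could be obtained from `scipy.stats.f_oneway`. The data are hypothetical.

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical transverse sound velocities (m/s) per structural quality class.
class3 = [1720, 1685, 1750, 1702]
class4 = [1640, 1610, 1668, 1655]
class5 = [1560, 1598, 1575, 1541]
f, df_b, df_w = one_way_anova(class3, class4, class5)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```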

**Table 6.** Variation sources of acoustic features of spruce wood in stands with resonance wood from the Gurghiu Mountains (the Romanian Carpathians).


\* 0.05 is the threshold value for statistical significance; \*\* moisture content (MC) of 39% in heartwood and 98% in sapwood; \*\*\* moisture content of 10%.

Among the measured bark features, only colour offers clues about the acoustic properties of the wood. The basic density of the wood (extractive-free) cannot be explained by any of the measured bark features. The benzene-alcohol extractive content of the measured spruce cores is due to resin and averaged 5.26% of dry weight.

The transverse sound velocity is explained by bark redness: trees that transmit sound better have scales that are redder on the outside and less red on the inside (Table 7). Trees whose scales are yellower on the outside have wood with a lower sound velocity and therefore a lower capacity to transmit sonic energy (Table 7).


**Table 7.** Bark as predictor of the acoustic properties of spruce wood (only regressions with *R* > 0.5 are shown).

The relationship between bark scale colour and wood acoustic impedance is shown in Figure 7, which indicates that trees with better acoustic emission have redder and less yellow scales.

**Figure 7.** 3D contour plot of acoustic impedance against bark scale redness and yellowness.

#### **4. Discussion**

Linking the morphological features of trees to the relationship between wood structure and acoustic properties simplifies the identification of trees that supply material for manufacturing musical instruments and offers luthiers rapid criteria. These criteria should not replace the acoustic testing of the raw material, which most luthiers perform only at a rudimentary level, by striking the trunk and listening to the emitted sound [42,71], or, more reliably, by checking the longitudinal sound velocity once the flitches are dry [4].

The relevance of bark and crown morphology to the acoustic properties of wood [8] lies in the contribution made to the elastic-mechanical properties of wood by crown metrics [72], stem knottiness [73], the repeated swaying of the trees, and the factors that generate compression wood [51,74] and fibre twisting [12].

The mechanism behind the relationship between bark features and wood acoustics is unclear. One possible explanation is that this relationship is mediated by wood structure, as a consequence of a common genetic control, similar to the control that jointly governs flowering and growth cessation in spruce [75–77]. The comparison is not far-fetched, since bud phenology and wood structure share a common heredity [78], and resonance spruce is a late-bud phenotype [5] with weak growth [59] and a tendency toward green female strobili [6]. In spruce plantations outside the natural range, the bark features proved notably stable, which was attributed to the predominant contribution of heredity to their expression [22]. The genetic control of spruce bark features is polygenic, with at least two to three pairs of non-allelic genes for bark colour [79]. The weak correlation between bark colour and scale shape [79], confirmed by our analysis, suggests that these bark features are controlled by neighbouring genes on the same chromosome, with linked transmission [79].

The shoot colour in spruce shows topoclinal variation [80], in which lowland populations exhibit brownish bark and highland populations grey bark [81,82]. In multisite comparative trials in Romania, spruce trees with dark bark that becomes rough early originated from altitudes below 1000 m and belonged to traditional populations containing resonance wood [83]. The bark phenotype of resonance spruce is characterized by darker scale hues (Figure 6), namely more redness and less yellowness (Table 7, Figure 6). Our data show that even over a narrow altitude range (1215–1580 m), bark yellowness increases and redness decreases with altitude. Consequently, resonance trees are largely confined to medium and lower altitudes within the spruce range. In the Romanian Carpathians, resonance spruce has been found at altitudes between 700 and 1000 m [84], in the Metaliferi Mountains between 650 and 900 m [21], in the Jura Mountains between 800 and 1000 m [53], and in the Alps between 1500 and 1900 m [10,21]. In fact, the distribution of resonance spruce depends on the ability of the site to ensure stable soil moisture and balanced nutrition and to protect the trees from climatic extremes that could acoustically damage the wood structure [7,84,85], conditions that are not met at high altitudes [85].

The bark redness is consistent with the assignment of resonance wood to the variety *Europaea* (Teplouchoff) Schrotter, whose bark is brown-reddish [4]. Leaf and soil analyses in natural spruce stands revealed significantly lower amounts of nitrogen, phosphorus, and magnesium in the needles of brown-barked spruce trees than in those of grey-barked trees [86]. This indicates a high efficiency in metabolizing these elements and explains the modest wood bioaccumulation in resonance wood [87].

The shape of spruce bark scales shows ecoclinal variation, conditioned by the amount of light radiation, which is responsible for the early onset of rough bark [53]. Thus, phenotypic differences in bark relief can result from site elevation as well as from the trees' social status and spacing. In Scots pine, bark relief (longitudinally versus panel-like cracked) has been associated with crown shape (conical and paraboloidal, respectively) [88]. In general, resonance trees are dominant and under less competitive pressure at maturity [4,9]. The canopy position of resonance spruce trees is not the result of faster growth but of holding that position tenaciously, as these trees can reach a considerable age, as shown in Table 1 [5]. In poplar, MOE has been found to increase as trees grow taller [89]; extrapolating to spruce, it can be assumed that a dominant position makes an additional contribution to the elastic superiority of resonance wood. Our results show that scale shape does not appear to be relevant to the acoustic properties of the wood; however, it is an indicator of its structural quality (Table 4, Figure 6). The most important clues relate to latewood regularity (Figure 5), which is in fact a selection criterion for the raw material for violins [90]. Moreover, latewood content has a greater diagnostic value than growth ring width for acoustic use [91]. The regularity of the anatomical structures is a key feature of suitability for musical instruments [13,92].

Structural features take priority when choosing logs for musical instruments, even though they are occasionally viewed with reluctance [11,14,59]. At the macroscale, wood structure is a marker of acoustic features through wood density [90] and its direct influence on MOE [65]. In general, resonance spruce is a xylotype with a lower density than common spruce [39]. The wood density values in our sample (Table 6) are lower than those reported in the literature [14]. One explanation is the removal of extractives from the wood and the way basic density is calculated (minimum mass per maximum volume). Additionally, resonance wood in Romania is lighter than in other regions of Central and Eastern Europe [11], and, in the classification of Romanian spruce seed sources, the local population (Gurghiu) falls in the last class when ranked by basic density and latewood ratio [93]. In our sample, wood density varies independently of both the bark features and the wood structure. In some exotic species, wood density is not relevant to wave velocity [94].

The other acoustic properties of wood are influenced by the moisture content of the material. For sound velocity, the ratio between green and dry wood was 0.85. In Scots pine, this ratio was 0.92 for tangential and 0.82 for radial ultrasound velocity, at moisture contents similar to those of the material examined here [95]. This confirms the detrimental influence of moisture content on the acoustic properties of wood [96]. These values refer to transverse sound propagation, which is nonetheless relevant to longitudinal propagation [49], since wood is a highly anisotropic material [97]. Additionally, some authors hold that the cross-grain elastic properties are more important than those along the grain in the acoustic behaviour of wood [34]. Although measured on green wood (Table 6), the mean transverse sound velocity and radiation ratio are similar to literature data for European spruce selected for musical instruments, which refer to dry wood [42]. Still, the transverse sound velocities we measured in the discs, harvested from trees later used to manufacture violins, are about 600 m·s<sup>−1</sup> lower than those of the excellence class of spruce tonewood [36]. The radiation ratio has been considered the most important criterion for diagnosing the suitability of wood for soundboards [36].

The link between bark features and the acoustic properties of wood changes from the northern to the southern side of the trunk (Table 7), most likely because the acoustic properties themselves vary around the stem circumference [98]. The closest relationships are found on the southern side (Table 7), which explains some luthiers' preference for the sunny side of the tree [34].

The stability of the bark features within the sample plot (Table 3) suggests that the phenotype of resonance trees, which are few [84], is in fact a mark of the population they belong to, which is itself an elite population with respect to other features, such as tree pruning, branchiness, and the rarity of wood defects [99].

As a precaution, we recommend that resonance wood be diagnosed only after all the phenotypic features that distinguish it have been examined and compared and, where possible, after the wood has been acoustically tested, even though the bark is a highly reliable marker of trees with acoustic properties. Consequently, recognizing and classifying resonance wood should be a multi-criteria analysis of the trees' phenotype [39].

#### **5. Conclusions**

In the spruce stands that supply material for manufacturing violins, the bark features show considerable phenotypic variability between sites. Scale colour and length show topoclinal variation, related to some extent to altitude. Of the sources of variation analyzed, the sample plot location and the position around the stem circumference have the greatest impact on bark morphometry and colour. The differences in bark among trees are due only to a small degree to age and are consistent only in relation to the sides of the trunk, the south-facing side being the most relevant for diagnosing resonance wood. On the north-facing side of the trunk, the differences in scale colour from plot to plot are insignificant. On the south-facing side of the trunk, tree age has no detectable influence, which increases the value of this side for characterizing the spruce bark phenotype. The most significant colour differences are recorded between the external and internal sides of the bark scales, and these differences are stable with age. The differences between phenotypes in bark colour are large enough to be visible to the naked eye.

Bark scale dimensions are not informative about the annual ring structure specific to resonance wood, but their ratio (the slenderness index) is a valuable marker of the regularity of this structure, especially of latewood width regularity. This relationship shows that trees with elongated bark scales have more regular rings around the circumference. Bark colour provides valuable clues for identifying structures with acoustic value. Trees of resonance structural quality have bark with less brightness and yellowness but more redness. The differences between wood structure qualities are recorded on the external side of the north-facing scales and on the internal side of the south-facing scales.

Bark scale size is not relevant to the acoustic properties of the wood; scale colour, on the other hand, is an important marker of the acoustic quality of the material. Bark redness and yellowness on the external side of the south-facing scales have a high diagnostic value for the acoustic performance of the material, expressed by sound velocity, acoustic impedance, and radiation ratio, all in the transverse direction. The bark offers no clues regarding wood density. The phenotype of trees with acoustic properties is signalled by brown bark with high redness and low yellowness on the external side of the scales, especially on the south-facing side of the trunk.

At least by analyzing scale colour and slenderness, the bark can be used as a criterion for diagnosing standing resonance wood.

**Author Contributions:** Conceptualization, F.D. and C.T.A.; methodology, F.D.; software, F.D.; validation, F.D. and M.D.S.; investigation, C.T.A.; data curation, F.D. and M.M.V.; writing—original draft preparation, F.D.; writing—review and editing, M.M.V.; visualization, M.D.S.; supervision, F.D.

**Funding:** The authors received no specific funding for this work.

**Acknowledgments:** The authors thank Petru Maran, who assisted us in the map analysis, and our colleague Aureliu Florin Hălălișan, who helped us with the bark scale analysis. We are also grateful to Alexandra Stan for her contribution to English editing.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Assessment of Genetic Diversity of Tea Germplasm for Its Management and Sustainable Use in Korea Genebank**

#### **Kyung Jun Lee, Jung-Ro Lee, Raveendar Sebastin, Myoung-Jae Shin, Seong-Hoon Kim, Gyu-Taek Cho and Do Yoon Hyun \***

National Agrobiodiversity Center, National Institute of Agricultural Sciences (NAS), RDA, Jeonju 54874, Korea **\*** Correspondence: dyhyun@korea.kr; Tel.: +82-63-238-4912

Received: 24 July 2019; Accepted: 4 September 2019; Published: 8 September 2019

**Abstract:** Tea (*Camellia sinensis* (L.) O. Kuntze) is cultivated in many developing countries in Asia, Africa, and South America, and is the most widely consumed beverage in the world. Understanding the genetic diversity and population structure of tea germplasm is critical for its effective collection, conservation, and utilization. In this study, 410 tea accessions collected in South Korea were analyzed using 21 simple sequence repeat (SSR) markers. Of these 410 accessions, 85.4% (350 accessions) were collected from Jeollanam-do. A total of 286 alleles were observed, and the average genetic diversity and evenness across all tested samples were 0.79 and 0.61, respectively. Discriminant analysis of principal components detected four clusters among the 410 tea accessions, of which cluster 1 showed a higher frequency of rare alleles (frequency below 1%). Based on the index of association and the rbarD value, each cluster showed a clonal mode of reproduction. Analysis of molecular variance (AMOVA) showed that most of the observed variation was within populations (99%) rather than among populations (1%). The present study revealed relatively low diversity and a simple population structure in Korean tea germplasm. Consequently, more attention should be paid to collecting and conserving new tea individuals to broaden the genetic variation available for breeding new tea cultivars.

**Keywords:** *Camellia sinensis*; genetic diversity; population structure; SSR

#### **1. Introduction**

Tea (*Camellia sinensis* (L.) O. Kuntze, 2*n* = 2*x* = 30) is one of the most popular non-alcoholic beverages worldwide and is consumed by approximately 70% of the world's population for its refreshing taste, attractive aroma, therapeutic uses, and mildly stimulating properties [1]. It is an economically important tree crop, grown in over 52 countries in Asia, Africa, and South America [2,3]. Tea is a woody evergreen perennial reported to be native to Yunnan and Sichuan provinces in China and the northern part of Myanmar [4]. In Korea, although tea was introduced from China as early as the seventh century, the tea industry developed slowly and production remained small [5].

The importance of using genetic resources in breeding programs to enhance crop genetic potential is well recognized [6]. Many germplasm appraisal methods, such as morphological, biochemical, molecular marker, and sensory evaluation, have been used to evaluate tea germplasm resources [7–9]. Phenotypic evaluation can serve as a useful standard for tea germplasm because it relies simply on morphological traits to assess genetic diversity [10]. Recently, molecular markers have proven to be one of the most effective tools for identifying different tea varieties [2,7,11–14].

Tea is an outcrossing species, and selected elite genotypes are propagated vegetatively and released as clonal varieties [13,15,16]. Clonal identification has traditionally been based on morphological descriptors such as plant shape, stem width, leaf shape, young leaf type, and fruit shape [15,17]. However, as in many outcrossing crops, tea is highly heterozygous, and most of its morphological, physiological, and biochemical descriptors show continuous variation and high plasticity [18,19]. Korir et al. reported that, despite their advantages, morphological traits have drawbacks such as environmental influences on trait expression, epistatic interactions, and pleiotropic effects [17]. In contrast, molecular markers are minimally affected by environmental factors and are available in virtually unlimited numbers. In addition, they make it possible to observe the genome directly and thus eliminate the shortcomings inherent in phenotypic observation [15]. In previous studies, the genetic diversity, discrimination, and differentiation of tea germplasm have been assessed using different DNA markers, such as restriction fragment length polymorphism (RFLP), randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), inter-simple sequence repeat (ISSR), and simple sequence repeat (SSR) markers [3,5,7,11–15,20–22].

In Korea, national research institutes have collected and conserved excellent tea individuals and investigated their morphological characteristics [23]. Some studies have also analyzed the genetic diversity of Korean tea germplasm using RFLP and RAPD [5,14,16]. However, these analyses were insufficient, as they included very small populations (approximately 20–50 individuals). In the present study, 21 SSR primer pairs were used to analyze 410 tea accessions from Korea with two aims: (1) to evaluate the genetic diversity and population structure of Korean tea accessions, and (2) to estimate the genetic differentiation and sources of variation among the inferred populations. The results are expected to provide a deeper understanding of the genetic diversity, population structure, and differentiation of tea germplasm and to guide the effective collection, conservation, and use of tea genetic resources in Korea.

#### **2. Materials and Methods**

#### *2.1. Plant Materials*

A total of 410 tea accessions were obtained from the National Agrobiodiversity Center (NAC) at the Rural Development Administration in South Korea. Of these, 400 accessions were collected from 31 cities in three provinces of South Korea, while collection-site data were missing for the remaining ten accessions (Table S1).

#### *2.2. DNA Extraction*

Genomic DNA was extracted from tea leaves using a Qiagen DNA extraction kit (Qiagen, Hilden, Germany). DNA quality and quantity were assessed by 1% (w/v) agarose gel electrophoresis and spectrophotometry (Epoch, BioTek, Winooski, VT, USA). The extracted DNA was diluted to 30 ng/μL and stored at −20 °C until PCR amplification.

#### *2.3. SSR Genotyping*

For SSR analysis, 21 SSR primer pairs fluorescently labeled with 6-FAM, HEX, or NED were used to detect amplification products (Table 1 and Table S2). PCR reactions were carried out in a 25 μL reaction mixture containing 30 ng template DNA, 1.5 mM MgCl<sub>2</sub>, 0.2 mM of each dNTP, 0.5 μM of each primer, and 1 U Taq polymerase (Inclone, Korea). Amplification was performed under the following cycling conditions: initial denaturation at 94 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. Amplicons were resolved on an ABI PRISM 3500 DNA sequencer (ABI3500, Thermo Fisher Scientific Inc., Wilmington, DE, USA) and scored using GeneMapper software (version 4.0, Thermo Fisher Scientific Inc.).


**Table 1.** List of 21 simple sequence repeat (SSR) primers used in this study.

#### *2.4. Population Structure and Genetic Diversity*

The number of alleles (Na), Shannon's index (I), Nei's unbiased gene diversity (GD), and evenness were calculated using the *poppr* package for *R* [24]. The analysis of molecular variance (AMOVA) and the coefficient of genetic differentiation among populations (PhiPT) were computed in GenAlEx 6.5 with 999 permutations [25].
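The per-locus statistics can be sketched from allele frequencies. This sketch rests on our own assumptions, not on *poppr*'s source: GD is taken as Nei's unbiased form N/(N − 1)·(1 − Σp²), with N the number of allele copies, and evenness as the E5 index (N2 − 1)/(N1 − 1), with N2 = 1/Σp² and N1 = exp(I). Results may therefore differ slightly from *poppr*, for example in the handling of missing data. The function name and data are hypothetical.

```python
import math
from collections import Counter


def locus_diversity(alleles):
    """Diversity statistics for one SSR locus.

    alleles: every allele copy scored at the locus (two per diploid
    individual), e.g. fragment sizes in bp, with missing data removed.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    freqs = [c / n for c in Counter(alleles).values()]
    simpson = sum(p * p for p in freqs)              # lambda = sum of p^2
    shannon = -sum(p * math.log(p) for p in freqs)   # Shannon's index I
    gd = n / (n - 1) * (1.0 - simpson)               # Nei's unbiased GD
    if len(freqs) > 1:
        # E5 evenness: (N2 - 1) / (N1 - 1), N2 = 1/lambda, N1 = exp(I)
        evenness = (1.0 / simpson - 1.0) / (math.exp(shannon) - 1.0)
    else:
        evenness = float("nan")  # undefined for a monomorphic locus
    return {"Na": len(freqs), "I": shannon, "GD": gd, "Evenness": evenness}


# Hypothetical diploid genotypes (fragment sizes, bp) at one SSR locus.
genotypes = [(152, 156), (152, 152), (156, 160), (152, 160), (164, 156)]
alleles = [a for pair in genotypes for a in pair]
print(locus_diversity(alleles))
```

Averaging these values over the 21 loci would give dataset-level figures comparable in kind to the mean genetic diversity and evenness reported in the Abstract.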

The population structure was analyzed by discriminant analysis of principal components (DAPC) using the *adegenet* package for *R* [26,27]. The *find.clusters* function was used to detect the number of clusters in the population. It runs K-means clustering, which decomposes the total variance of a variable into between-group and within-group components, for increasing numbers of clusters; the best number of subpopulations is the one with the lowest associated Bayesian Information Criterion (BIC). The cross-validation function *xvalDapc* was used to confirm the number of principal components (PCs) to be retained. In this analysis, the data are divided into a training set (90% of the data) and a validation set (10% of the data). The members of each set are selected by stratified random sampling, which ensures that at least one member of each group or population in the original data is represented in both the training and validation sets. DAPC is carried out on the training set while retaining varying numbers of PCs, and the accuracy with which it predicts the group membership of the excluded individuals (those in the validation set) is used to identify the optimal number of PCs to retain. At each level of PC retention, the sampling and DAPC procedures are repeated many times [28]. The optimal number of PCs is the one associated with the lowest root mean square error. The resulting clusters were plotted in a scatter plot of the first and second linear discriminants of DAPC.
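The cluster-number search described above can be sketched as K-means run for k = 1, 2, ..., with each solution scored by BIC. The code below is a minimal pure-Python illustration, not adegenet's implementation: it clusters raw coordinates rather than retained PCs, and it assumes the common score BIC = n ln(WSS/n) + k ln(n), where WSS is the within-group sum of squares. The three-group data set is simulated and hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns the lowest
    within-group sum of squares (WSS) found over n_init restarts."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        # k-means++ seeding: later centres are drawn far from earlier ones
        centres = [list(rng.choice(points))]
        while len(centres) < k:
            d2 = [min(sq_dist(p, c) for c in centres) for p in points]
            r = rng.uniform(0, sum(d2))
            acc = 0.0
            for p, w in zip(points, d2):
                acc += w
                if acc >= r:
                    centres.append(list(p))
                    break
            else:  # guard against floating-point round-off
                centres.append(list(points[-1]))
        labels = None
        for _ in range(max_iter):
            new = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
            if new == labels:
                break  # assignments are stable: converged
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centre
                    centres[j] = [sum(col) / len(members) for col in zip(*members)]
        wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        best = min(best, wss)
    return best


def bic_by_k(points, k_max):
    """BIC = n * ln(WSS / n) + k * ln(n) for k = 1..k_max (lower is better)."""
    n = len(points)
    return {k: n * math.log(kmeans_wss(points, k) / n) + k * math.log(n)
            for k in range(1, k_max + 1)}


# Hypothetical demo: three well-separated groups of 20 points in 50 dimensions
rng = random.Random(42)
demo = [[centre + rng.gauss(0, 0.5)] + [rng.gauss(0, 0.5) for _ in range(49)]
        for centre in (0.0, 20.0, 40.0) for _ in range(20)]
scores = bic_by_k(demo, 6)
best_k = min(scores, key=scores.get)
print(best_k)
```

Because WSS always falls as k grows, the k ln(n) penalty is what stops the search from simply favoring more clusters; on this simulated data the minimum falls at k = 3.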

#### *2.5. Estimation of Reproduction Mode among 410 Korean Tea Accessions*

Linkage disequilibrium was calculated to test for evidence of sexual reproduction. Linkage among loci can be caused by clonal reproduction and selection; as linkage increases, populations move into linkage disequilibrium, whereas recombination through sexual reproduction breaks up linkage among loci and generates linkage equilibrium. To quantify linkage, the *poppr* package calculates the index of association (Ia) and the standardized index of association (rbarD). High values of Ia, i.e., values that differ strongly from 0, can be interpreted as evidence of strong linkage and linkage disequilibrium [29]. rbarD has been shown to be a more reliable estimator of linkage equilibrium than Ia because it is not influenced by the number of loci [30], but both metrics were calculated for completeness. Significance was tested against a null distribution generated by 999 random permutations; if the observed rbarD lies outside this null distribution, the null hypothesis of no linkage among loci is rejected [24,29].
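The two linkage statistics can be computed directly from pairwise allelic distances. The sketch below follows the usual definitions, Ia = V_D/Σ var_j − 1 and rbarD = (V_D − Σ var_j)/(2 Σ_{j<k} √(var_j · var_k)), where var_j is the variance of the distances at locus j over all pairs of individuals and V_D is the variance of their sum over loci. It is an illustration, not poppr's code: distances count unshared alleles (0, 1 or 2) at each diploid locus, missing data are not handled, and the permutation test shuffles whole genotypes at each locus independently. poppr provides more than one permutation scheme, and the scheme used here may differ from the authors'.

```python
import itertools
import math
import random
from collections import Counter


def locus_distance(g1, g2):
    """Number of alleles not shared between two diploid genotypes (0, 1 or 2)."""
    shared = sum((Counter(g1) & Counter(g2)).values())
    return len(g1) - shared


def _variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def ia_rbard(data):
    """Return (Ia, rbarD); data is a list of individuals, each a list of
    per-locus genotype tuples in the same locus order."""
    n_loci = len(data[0])
    pairs = list(itertools.combinations(range(len(data)), 2))
    # dist[j] holds the distance at locus j for every pair of individuals
    dist = [[locus_distance(data[a][j], data[b][j]) for a, b in pairs]
            for j in range(n_loci)]
    var_loci = [_variance(d) for d in dist]
    v_d = _variance([sum(col) for col in zip(*dist)])  # variance of summed distance D
    sum_var = sum(var_loci)
    denom = 2 * sum(math.sqrt(var_loci[j] * var_loci[k])
                    for j, k in itertools.combinations(range(n_loci), 2))
    if sum_var == 0 or denom == 0:
        raise ValueError("need at least two polymorphic loci")
    return v_d / sum_var - 1, (v_d - sum_var) / denom


def rbard_pvalue(data, n_perm=999, seed=0):
    """One-sided permutation p-value for rbarD: shuffling each locus
    independently across individuals breaks any linkage between loci."""
    rng = random.Random(seed)
    observed = ia_rbard(data)[1]
    columns = [[ind[j] for ind in data] for j in range(len(data[0]))]
    hits = 0
    for _ in range(n_perm):
        for col in columns:
            rng.shuffle(col)
        if ia_rbard([list(ind) for ind in zip(*columns)])[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two loci carrying identical genotypes (complete linkage)
linked = [[(1, 1), (1, 1)], [(1, 2), (1, 2)], [(2, 2), (2, 2)], [(1, 3), (1, 3)]]
print(ia_rbard(linked))  # Ia = 1 and rbarD = 1 (up to rounding) for two perfectly linked loci
```

With 999 permutations the smallest attainable p-value is (0 + 1)/(999 + 1) = 0.001, which is why strongly linked data sets report p = 0.001 rather than a smaller value.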

#### **3. Results**

#### *3.1. Regional Distribution of 410 Tea Accessions in South Korea*

The 410 tea accessions were collected from seven cities in Gyeongsangnam-do (GN), 19 in Jeollanam-do (JN), and five in Jeollabuk-do (JB) (Table 2). Of the 410 accessions, 350 (85.4%) were collected from JN, 29 (7.1%) from GN, and 21 (5.1%) from JB. Among the 31 cities, the largest numbers of accessions came from Boseong (18.8%) and Suncheon (13.9%), and each of the remaining cities contributed less than 10%.


**Table 2.** Distribution of collected locations and clusters from discriminant analysis of principal components (DAPC) analysis in 410 tea accessions.

<sup>1</sup> Clusters were from the result of DAPC analysis, <sup>2</sup> GN—Gyeongsangnam-do; JN—Jeollanam-do; JB—Jeollabuk-do.

#### *3.2. Population Structure and Mode of Reproduction of 410 Tea Accessions*

To understand the genetic relationships among the 410 tea accessions, DAPC was performed (Figure 1). Using the *find.clusters* function, four clusters were detected, corresponding to the lowest BIC value, and DAPC was carried out with this number of clusters. Specifically, the first 50 PCs of the PCA (retaining 68.4% of the variance) and three discriminant eigenvalues were retained; these values were confirmed by cross-validation. Clusters 1 to 4 contained 138, 75, 108, and 88 accessions, respectively (Table 2).
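The cross-validation described in Section 2.4 hinges on a stratified training/validation split. A minimal sketch of one such split is shown below; it is not the *xvalDapc* code, and the rounding rule (about 10% of each group, but at least one individual in each set) is our assumption. The group sizes in the example are the reported cluster sizes.

```python
import random
from collections import defaultdict


def stratified_split(labels, valid_frac=0.1, seed=None):
    """Split individual indices into (train, valid) so that every group
    appears in both sets. labels[i] is the group of individual i."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    train, valid = [], []
    for lab, members in groups.items():
        if len(members) < 2:
            raise ValueError(f"group {lab!r} needs at least two members")
        rng.shuffle(members)
        # about valid_frac of the group, but never 0 and never the whole group
        n_valid = min(max(1, round(valid_frac * len(members))), len(members) - 1)
        valid.extend(members[:n_valid])
        train.extend(members[n_valid:])
    return sorted(train), sorted(valid)


# Reported DAPC cluster sizes (clusters 1 to 4)
sizes = {1: 138, 2: 75, 3: 108, 4: 88}
labels = [c for c, n in sizes.items() for _ in range(n)]
train, valid = stratified_split(labels, seed=0)
print(len(train), len(valid))
```

In a full cross-validation this split would be redrawn many times for each candidate number of retained PCs, and the prediction error on the validation set would be averaged across repetitions.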

Cluster 1 accounted for 36% of the JN accessions and for similar shares of the JB and GN accessions (21% and 24%, respectively). Unlike cluster 1, cluster 2 accounted for the lowest share, 17% of the JN and JB accessions and 24% of the GN accessions. Cluster 3 had its highest share in JB (38%), followed by GN (31%) and JN (25%). Cluster 4 accounted for similar shares in GN and JN (24% and 22%, respectively) but only 14% in JB. Of the ten accessions with unknown origin, five (50%) were in cluster 3, followed by cluster 2 (30%).

**Figure 1.** Discriminant analysis of principal components (DAPC) for 410 tea accessions. The axes represent the first two Linear Discriminants (LD). Each circle represents a cluster and each dot represents an individual. Numbers represent the different subpopulations identified by DAPC analysis.

Among the 410 tea accessions, 409 multilocus SSR genotypes were found (Table 3). Simpson's Dominance (λ), Simpson's diversity (1-λ), and Nei's gene diversity (GD) were 0.998, 0.002, and 0.792, respectively, across the 410 accessions. Among the four clusters, 1-λ ranged from 0.007 (C1) to 0.013 (C2) and GD from 0.695 (C4) to 0.805 (C2). Each cluster was tested for linkage disequilibrium as evidence of the mode of reproduction (Table 3 and Figure S3). Across all accessions, the index of association (Ia) was 0.792 and rbarD was 0.0583; among the four clusters, Ia ranged from 0.123 (C1) to 0.999 (C4) and rbarD from 0.0062 (C1) to 0.0508 (C4). The hypothesis of no linkage among markers was rejected for the full set of accessions (*p* = 0.001) and for each cluster (C1, *p* = 0.018; C2, C3, and C4, *p* = 0.001), supporting a clonal mode of reproduction.


**Table 3.** Genetic diversity and linkage of 410 tea accessions.

<sup>a</sup> N, number of accessions; MLG, number of multilocus genotypes; λ, Simpson's index; GD, Nei's unbiased gene diversity; Ia, index of association; rbarD, standardized index of association.

#### *3.3. Genetic Diversity*

A total of 286 alleles were detected among the 410 tea accessions with the 21 polymorphic SSR markers (Table S3). Each marker amplified 13.6 alleles on average, ranging from five (TM604) to 20 (MSE0083 and MSG0699). The Shannon index (I) varied from 0.98 (MSE0237) to 2.31 (MSE0083) among the 21 markers. The average gene diversity (GD) was 0.79, ranging from 0.52 (MSE0237) to 0.87 (MSE0083 and MSG0361). The average evenness (E) was 0.75, ranging from 0.61 (MSG0429) to 0.87 (TM604).

Among the four clusters, the lowest number of alleles was recorded in cluster 3 (190; mean = 9.1 per marker) and the highest in cluster 1 (247; mean = 11.8 per marker) (Table 4). The average I, GD, and E ranged from 1.50 (C4) to 1.89 (C2), 0.69 (C4) to 0.81 (C2), and 0.68 (C4) to 0.76 (C2), respectively.

Seventy-six rare alleles, defined as alleles with a frequency below 1%, were observed across the 21 SSR markers (Table 4). All rare alleles were found in 119 accessions from the four clusters. Of these rare alleles, 53.9% were observed in 41 accessions from cluster 1, followed by 39.5% in 30 accessions from cluster 2. Unique private alleles were found in 35 accessions, 40% of which belonged to cluster 1.
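The 1% threshold translates into a concrete allele count. Assuming frequencies are computed over gene copies (the text does not say whether copies or accessions are counted), 410 diploid accessions carry 820 copies per locus, so an allele is rare if it occurs at most 8 times (8/820 ≈ 0.98%), whereas 9 copies (≈ 1.10%) already exceed the cut-off. A minimal sketch, with a hypothetical locus:

```python
from collections import Counter


def rare_alleles(genotypes, threshold=0.01):
    """Alleles at one locus whose frequency among all gene copies is below threshold.

    genotypes: list of (allele1, allele2) tuples; None marks a missing allele.
    """
    counts = Counter(a for g in genotypes for a in g if a is not None)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items() if c / total < threshold}


# Hypothetical locus in 410 diploid accessions: allele 200 carried on 8 copies
locus = [(198, 198)] * 406 + [(200, 200)] * 4
print(rare_alleles(locus))
```

If missing alleles are excluded, the denominator shrinks and the copy-count cut-off shifts accordingly, so the threshold is best applied to frequencies rather than to a fixed count.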


**Table 4.** Distribution of rare alleles observed in four clusters.

#### *3.4. Gene Flow*

AMOVA was used to partition the genetic variation among and within the inferred clusters. Only 1% of the variation was attributable to differentiation among clusters, and 99% to differentiation within clusters (Table 5). For the 410 tea accessions, PhiPT was 0.014 (*p* < 0.001) and gene flow (Nm) was 36.156. Pairwise PhiPT values among the four clusters ranged from 0.01 (C2–C4) to 0.021 (C2–C3) (Table 6), and pairwise estimates of Nm ranged from 23.717 to 47.872 migrants per generation.
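Gene flow is commonly derived from PhiPT. In the minimal sketch below, the relation Nm = (1/PhiPT − 1)/2 is our inference rather than a formula stated in the text; it was chosen because it reproduces the reported figures. The overall Nm of 36.156 implies PhiPT ≈ 0.0136 (reported, rounded, as 0.014), and the pairwise extremes of 47.872 and 23.717 imply PhiPT ≈ 0.010 and 0.021, matching the reported pairwise range. The variance components passed to `phipt` in the example are hypothetical.

```python
def phipt(var_among, var_within):
    """PhiPT: proportion of the molecular variance that lies among populations."""
    return var_among / (var_among + var_within)


def nm_from_phipt(phi, divisor=2.0):
    """Gene-flow estimate Nm = (1/PhiPT - 1) / divisor.

    divisor=2 is an inference: it reproduces the Nm values reported in the text.
    """
    return (1.0 / phi - 1.0) / divisor


def phipt_from_nm(nm, divisor=2.0):
    """Inverse of nm_from_phipt: PhiPT = 1 / (divisor * Nm + 1)."""
    return 1.0 / (divisor * nm + 1.0)


# Hypothetical variance components: 1 unit among clusters, 99 within
print(phipt(1.0, 99.0), nm_from_phipt(phipt(1.0, 99.0)))

# Back-calculate PhiPT from the reported Nm values
for label, nm in [("overall", 36.156), ("highest pairwise", 47.872), ("lowest pairwise", 23.717)]:
    print(f"{label:16s} Nm = {nm:7.3f} -> PhiPT = {phipt_from_nm(nm):.4f}")
```

Because Nm grows as 1/PhiPT, very small PhiPT values such as those reported here translate into large Nm estimates, and small rounding differences in PhiPT shift Nm noticeably.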


**Table 5.** Analysis of molecular variance (AMOVA) of 410 tea accessions.

**Table 6.** Pairwise population PhiPT values (below diagonal) and Nm values (above diagonal) from AMOVA based on 999 permutations. All PhiPT values were significantly greater than 0 (*p* < 0.0001).


#### **4. Discussion**

Erosion of plant genetic diversity is a serious problem caused by agricultural modernization and the replacement of wild plants and landraces with a few elite varieties [31,32]. The collection and preservation of plant genetic resources are therefore of immense importance for crop breeding to meet the demands of a growing human population. Effective management and utilization of plant genetic resources require information on the origin of strains, their phenotypic traits, and their genetic diversity as identified by molecular techniques [33]. In this study, the genetic diversity of 410 tea accessions collected and conserved in the Korean genebank was analyzed. Genetic diversity provides assurance of future genetic progress and insurance against unforeseen threats to agricultural production, such as disease epidemics or climate change. The fate of genetic diversity in these gene pools is thus of utmost importance if plant breeding is to continue to address pressing societal needs, such as increased yield, genetic resistance to diseases and pests, improved nutritional and processing quality of crop products, and reduced environmental impact [34].

In the present study, about 85% of the tea accessions were collected from Jeollanam-do (JN), with Boseong and Suncheon contributing the largest numbers (Table 2). According to Eom and Kim, tea seeds obtained from China were first cultivated on Mount Jiri in JN, so the majority of tea plants are found in the Honam region (Jeollanam-do and Jeollabuk-do) [35]. In addition, the Boseong experiment station (BES) and the Mokpo experiment station (MES; Mokpo is a city close to Suncheon) have collected tea accessions since the late 1990s [18]. The two stations probably collected accessions around their own locations, which would explain why the largest numbers of accessions came from this region. The accessions of the two research institutes have been managed as registered tea germplasm of the NAC, and this appears to account for the regional imbalance in the Korean tea collection.

In this study, the mean Nei's gene diversity across the 21 SSR markers (GD = 0.792) was higher than in other studies: 0.652 in 280 tea accessions using 23 SSR markers [7], 0.640 in 450 tea accessions using 96 EST-SSR markers [22], 0.543 in 185 Chinese tea cultivars using 48 SSR markers [13], and 0.680 in 64 Sri Lankan tea cultivars using 33 EST- or genomic-SSR markers [3]. The gene diversity of a locus, also known as expected heterozygosity, is a fundamental measure of genetic variation in a population and describes the proportion of heterozygotes expected under Hardy–Weinberg equilibrium [36]. Because tea is an open-pollinated plant, it is highly heterogeneous and consequently shows broad genetic variation [13]. Our results likewise showed high gene diversity, consistent with previously reported data. However, Yao et al. noted that comparing the degree of genetic diversity between studies is difficult, because the analysis may be affected by factors such as the sampling scheme, the number of SSR markers, the size of SSR repeats, and the location of SSRs in the genome [22].

In contrast to the high gene diversity, the 410 tea accessions in this study were characterized by an extreme dearth of genetic diversity, as revealed by an overall Simpson's Dominance (λ) of 0.998. Furthermore, the AMOVA attributed only 1% of the molecular variance to differences among clusters, suggesting weak genetic differentiation across the entire collection region. In addition, the standardized index of association (rbarD = 0.0583, *p* = 0.001) supported the hypothesis of a clonal population structure in the linkage disequilibrium tests, in which the null hypothesis of random mating was rejected for all populations. Under clonal propagation, heterozygosity and allelic diversity at each locus are expected to increase [37,38]. While high levels of clonality tend to increase genetic variation within populations, the opposite effect is expected on genetic differentiation among populations and on genotypic diversity, both of which decrease as the rate of clonal reproduction increases [37,39]. Indeed, the 410 tea accessions in this study were landraces and were likely collected from private farms. Because breeding a reliable cultivar for a private farm is nearly impossible, almost all tea gardens consist of seedling tea plants of local and wild origin with great morphological variation [23]. BES and MES also collected tea germplasm and investigated its morphological characteristics, and considerable variation was observed in the number of stems, stem length, leaf area, and leaf color within the scope of that morphological survey [40]. Owing to the lack of studies on the genetic diversity of Korean tea germplasm, only a few researchers have emphasized the importance of collecting and preserving tea accessions [14,21].

Previous studies analyzed the genetic diversity of tea accessions using molecular markers such as RFLP, RAPD, and SSR [3,5,21,22], and the STRUCTURE software has been used to analyze the population structure of tea germplasm [2,3,13,22]. In this study, 21 SSR markers and DAPC were used to analyze the genetic diversity and population structure of Korean tea accessions. DAPC is an attractive alternative to STRUCTURE because it does not require populations to be in Hardy–Weinberg equilibrium and can handle large data sets without parallel processing software [41]. In a previous study, DAPC divided the population into well-defined clusters associated with the provenance, ploidy, taxonomy, and breeding program of the genotypes and related to their genetic structure [42]. According to Rosyara et al., STRUCTURE, EIGENSTRAT, and DAPC can all control for population structure in association mapping studies [43]; EIGENSTRAT and DAPC performed slightly better than STRUCTURE, and DAPC gave better separation among populations. In this study, DAPC (four clusters) provided more detailed clustering of the tea accessions than STRUCTURE (two populations) (Figure S1). Similarly, Campoy et al. reported that STRUCTURE and DAPC analyses of population structure in sweet cherry were highly consistent, with DAPC providing more detailed clustering of the populations [41].

According to the DAPC results, the 410 tea accessions were divided into four clusters (Figure 1). Clusters 1 and 2 showed higher frequencies of rare alleles and higher genetic diversity, and gene flow between these two clusters was high (Nm = 45.120). Yao et al. reported that most rare SSR alleles and the highest diversity were found in tea accessions from Yunnan and its neighboring provinces, which are considered the center of origin of the tea plant in China [22]. They also reported that the number of alleles, genetic diversity, and PIC values of tea germplasm decreased significantly with distance from this center of origin. Although no particular collection area can be designated as the origin, the accessions in clusters 1 and 2 may represent the source of Korean tea germplasm, given their proportion of rare alleles and higher genetic diversity.

Kaundun et al. reported that tea accessions collected from Korea showed higher genetic diversity than those from Taiwan and Japan [21]. In contrast, Jeong and Park reported that the genetic variation in Korean tea populations is smaller than in Chinese or Japanese wild tea populations [18]. The results of the present study confirm that the 410 tea accessions collected and conserved in the Korean genebank exhibit narrow genetic variation. Park et al. suggested that the low genetic diversity of Korean tea originated from a limited gene stock introduced from China [14]. The short history of cultivation and the relatively homogeneous environment of the southwestern part of the country, where tea was introduced, did not favor population differentiation. In addition, the loss of diversity was exacerbated by the mass destruction of tea plantations in the fourteenth century for political and religious reasons [44]. Consequently, because tea is a highly outcrossing species, variation is expected to lie mostly within rather than between populations, as predicted by Hamrick [45].

#### **5. Conclusions**

In this study, the genetic diversity and population structure of 410 Korean tea accessions collected and conserved in the Korean genebank were analyzed using 21 SSR markers. The results provide molecular evidence for the narrow genetic base of Korean tea accessions. According to the NAC database, 4223 tea accessions were initially collected and conserved (http://genebank.rda.go.kr). However, only 510 tea accessions are now conserved in the NAC, because many were destroyed by extreme cold in winter. Of these, 427 were collected in Korea, 56 in China, 22 in Japan, and five in Indonesia. There is therefore an urgent need to broaden the genetic base of the tea accessions in the Korean genebank, both by collecting tea plants in Korea and by introducing tea germplasm from other countries. In addition, the biochemical components of tea accessions should be analyzed to understand their effects on the quality characteristics of tea varieties and to promote the use of tea germplasm in tea breeding.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/10/9/780/s1, Table S1: List of 410 tea accessions used in this study; Table S2: Repeat motif and product size of 21 SSR primers used in this study; Table S3: Genetic diversity of 21 SSR markers in each cluster; Figure S1: (A) Relationship between delta K and K as revealed by STRUCTURE harvester. (B) Population structure analysis of 410 tea accessions inferred using STRUCTURE software based on 21 SSR markers for delta K = 2.

**Author Contributions:** Conceptualization, K.J.L.; Data curation, K.J.L., R.S., and M.-J.S.; Formal analysis, K.J.L., R.S., and M.-J.S.; Investigation, J.-R.L., S.-H.K., and G.-T.C.; Project administration, D.Y.H.; Resources, J.-R.L., S.-H.K., and G.-T.C.; Writing—original draft, K.J.L.; Writing—review and editing, D.Y.H.

**Funding:** This research was funded by the Research Program for Agricultural Science & Technology Development, National Institute of Agricultural Sciences, Rural Development Administration, Republic of Korea, grant number PJ01355702.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Mutation Mechanism of Leaf Color in Plants: A Review**

#### **Ming-Hui Zhao, Xiang Li, Xin-Xin Zhang, Heng Zhang and Xi-Yang Zhao \***

State Key Laboratory of Tree Genetics and Breeding, School of Forestry, Northeast Forestry University, Harbin 150040, China; zhaominghui66@163.com (M.-H.Z.); lx2016bjfu@163.com (X.L.); zhangxinxin@nefu.edu.cn (X.-X.Z.); zhangheng815@nefu.edu.cn (H.Z.) **\*** Correspondence: zhaoxyphd@163.com; Tel.: +86-0451-8219-2225

Received: 3 July 2020; Accepted: 3 August 2020; Published: 6 August 2020

**Abstract:** Color mutation is a common, easily identifiable phenomenon in higher plants. Color mutations usually affect the photosynthetic efficiency of plants, resulting in poor growth and economic losses, and leaf color mutants have therefore often been eliminated in the past. Recently, however, leaf color mutants have found increasingly widespread application. Leaf color mutants are ideal materials for studying pigment metabolism, chloroplast development and differentiation, photosynthesis, and other pathways, and they can also provide important information for improving varietal selection. In this review, we summarize research on leaf color mutants, such as the functions and mechanisms of leaf color mutant-related genes that affect chlorophyll synthesis, chlorophyll degradation, chloroplast development, and anthocyanin metabolism. We also summarize two common methods for mapping and cloning leaf color mutation genes, map-based cloning and RNA-seq, and we discuss existing problems and propose future research directions, providing a reference for the future study and application of leaf color mutants.

**Keywords:** color mutation; pigment metabolism; chlorophyll; anthocyanin; mutation mechanism; RNA-seq

#### **1. Introduction**

As the main photosynthetic organs, leaves play an important role in plant growth and development. Leaf color mutation is a high-frequency, easily recognized trait variation that is ubiquitous in higher plants [1]. Leaf color mutations usually affect a plant's photosynthetic efficiency, resulting in stunted growth and even death, and researchers therefore long regarded them as harmful mutations with no practical value [2,3]. Since Granick used the *Chlorella vulgaris* green mutant *W5* in 1948 to show that protoporphyrin IX is a precursor of chlorophyll, research on leaf color mutants, especially research on chlorophyll synthesis, has gradually gained attention [4,5]. Physical and chemical mutagenesis and tissue culture are commonly used to induce leaf color mutations and obtain leaf color mutants [6]. In general, the mutated genes of leaf color mutants directly or indirectly affect the synthesis, degradation, content, and proportions of pigments (such as chlorophyll and anthocyanin), which can block photosynthesis and lead to abnormal leaf color. Leaf color mutations are generally expressed at the seedling stage and can be divided into eight types: albino, greenish-white, white emerald, light green, greenish-yellow, etiolation, yellow-green, and striped [1]. Leaf color mutants can also be divided into four types: total chlorophyll increased, total chlorophyll deficient, chlorophyll a deficient, and chlorophyll b deficient [7]. As ideal materials, leaf color mutants play an important role in research on photosynthetic mechanisms, the chlorophyll biosynthesis pathway, chloroplast development, and genetic control mechanisms [8,9]. Leaf color mutants have been obtained in *Zea mays* [10,11], *Populus* L. [12,13], *Nicotiana tabacum* [14–16], *Arabidopsis thaliana* [17–20], *Oryza sativa* [21–24], *Rosa multiflora* [25,26], and *Cucumis melo* [27], and many studies have been conducted in these species.

Genetic changes in plant cells are usually the cause of leaf color mutations, although the underlying mechanisms are complicated. More than 700 loci in higher plants are involved in leaf color mutations, all of which participate in the development, metabolism, or signal transduction underlying leaf color formation. These mutants can therefore be used to analyze and identify gene functions and to understand gene interactions [28]. This paper reviews studies of leaf color mutants, describing genes that control or affect pigment metabolism, chloroplast development and differentiation, and photosynthesis. We summarize two methods for mapping and identifying leaf color mutation genes and offer a preliminary account of how leaf color mutations form. We also present the achievements and challenges of leaf color mutant research and future research directions, to provide a reference for the future study and application of leaf color mutants.

#### **2. Genetic Model of Plant Leaf Color Mutants**

Many studies have shown that the inheritance of plant leaf color mutants falls mainly into three modes: nuclear inheritance, cytoplasmic inheritance, and inheritance through nuclear-cytoplasmic gene interaction, of which nuclear inheritance is the most important. Studies of leaf color mutants mainly focus on recessive mutants controlled by nuclear genes. Such mutants follow Mendelian inheritance, including single-gene and multigene inheritance, and single-gene recessive inheritance accounts for the majority of leaf color mutants. A series of white leaf color mutants (*virescent1*, *virescent3*) and the yellow-green leaf mutant *YSA* were found in rice, all of which were shown to be controlled by a pair of recessive nuclear genes [28–30]. Leaf color mutations controlled by a single recessive gene have also been found in species such as *Capsicum annuum*, *Cucumis sativus*, and *C. melo* [27,31,32]. Moreover, the *Sesamum indicum* yellow-green mutation was found to be controlled by an incompletely dominant nuclear gene, *Siyl-1* [33]. In contrast, few studies have reported leaf color mutations caused by mutations of cytoplasmic genes or by nuclear-cytoplasmic gene interactions, in species such as *A. thaliana*, *N. tabacum*, and *Lycopersicon esculentum* [34–36]. Such mutations may be related to the fact that plant cells contain multiple organelles (e.g., chloroplasts and mitochondria) with their own DNA molecules.

#### **3. Molecular Mechanisms of Plant Leaf Color Mutations**

The molecular mechanisms of leaf color mutations are complex. The mutated genes can directly or indirectly interfere with pigment synthesis and stability, giving rise to leaf color mutations in various ways. Here, we summarize the molecular mechanisms underlying the formation of leaf color mutants from the following aspects.

#### *3.1. Abnormal Chlorophyll Metabolism Pathway*

#### 3.1.1. Mutations of Genes Related to the Chlorophyll Synthesis Pathway

Chlorophyll in higher plants comprises chlorophyll a and chlorophyll b. Chlorophyll biosynthesis begins with L-glutamyl-tRNA and proceeds to chlorophyll a; chlorophyllide a oxygenase then oxidizes chlorophyllide a to chlorophyllide b, linking chlorophyll a to chlorophyll b synthesis in the chlorophyll cycle. The whole process comprises fifteen steps catalyzed by fifteen enzymes, which are encoded by twenty-seven genes (Figure 1 and Table 1) [37,38]. Any obstruction in this synthetic process affects chlorophyll biosynthesis and can give rise to leaf color mutants.

**Figure 1.** Biosynthetic pathway of angiosperm chlorophyll.

The chlorophyll synthesis process is divided into two main parts: the synthesis of protoporphyrin IX (proto IX) from L-glutamyl-tRNA, and the biosynthesis of chlorophyll from proto IX. These two parts can be further divided into three stages: from L-glutamyl-tRNA to 5-aminolevulinic acid (ALA), from ALA to proto IX, and from proto IX to chlorophyll. Furthermore, chlorophyll synthesis is localized at three sites: the synthesis of proto IX from ALA is completed in the chloroplast stroma, the synthesis of chlorophyllide from proto IX occurs on the chloroplast membrane, and the synthesis of chlorophyll a and chlorophyll b is completed on the thylakoid membrane [39]. Defects in any of the genes in Table 1 may alter enzyme activity and function in the chlorophyll synthesis process, leading to the accumulation of excess intermediates, disrupting normal chlorophyll metabolism, and changing the proportions of pigments in chloroplasts; this can cause oxidative damage, producing leaf color mutations in different plants or even plant death [40]. For example, when glutamyl-tRNA synthetase is silenced by virus-induced gene silencing, the mutant phenotype is extremely yellow [41]. When chlorophyll a oxygenase function is abnormal, chlorophyll b synthesis is reduced or abolished [42]. Across the fifteen steps of the chlorophyll metabolism pathway, the earlier a mutation occurs, the more pronounced the leaf color mutation, which generally appears as yellow or white leaves. If the mutation occurs in the later stages of chlorophyll synthesis, it usually results only in patches, stripes, and similar patterns [43].

In the chlorophyll biosynthesis pathway, the synthesis of ALA and the insertion of the magnesium ion into proto IX are the two main control points that directly affect chlorophyll synthesis [44]. ALA synthesis, catalyzed by glutamyl-tRNA reductase (GluTR) and glutamate-1-semialdehyde 2,1-aminomutase (GSA-AM), controls the rate of chlorophyll and heme synthesis; it is the rate-limiting step of the tetrapyrrole synthesis pathway and plays a key role in chlorophyll synthesis [45]. For example, overexpression of the *HEMA1* gene in yellowing plants leads to increased protochlorophyllide content [46]. GluTR is encoded by *HEMA* genes, of which plants contain at least two (*HEMA1* and *HEMA2*) [47,48]. When an antisense *HEMA1* gene was transferred into *A. thaliana*, it inhibited ALA formation, resulting in decreased protochlorophyllide synthesis and chlorophyll content [49]. GluTR activity is also regulated by heme, an end product of the pathway, through feedback inhibition. For example, the *A. thaliana 'ulf'* mutant was mapped to the *Hy1* locus, which encodes heme oxygenase; reduced heme oxygenase activity in the mutant led to heme accumulation, which inhibited GluTR activity [50].


**Table 1.** Genes and enzymes involved in the chlorophyll synthesis pathway in *A. thaliana*.

Magnesium chelatase is key to chlorophyll biosynthesis: it catalyzes the formation of Mg-protoporphyrin IX by inserting a magnesium ion into protoporphyrin IX [51]. The insertion of a metal ion into protoporphyrin IX is the branch point for the synthesis of chlorophyll, heme, and plant pigments. Magnesium chelatase inserts a magnesium ion into protoporphyrin IX to form the chlorophyll branch, while ferrochelatase inserts a ferrous ion into protoporphyrin IX to form the heme and plant pigment branches; the two enzymes thus compete for protoporphyrin IX at the branch point [45,52]. Studies have shown that defects in magnesium chelatase, which prevent chlorophyll synthesis from proceeding normally, are the main cause of chlorophyll mutants [53]. Decreased magnesium chelatase activity lowers the chlorophyll content of the mutant, and the mutant phenotype also depends on the enzyme activity [54]. Magnesium chelatase is a complex enzyme composed of the *D*, *I*, and *H* subunits, and its function depends on the coordinated action of all three. For example, in *Arabidopsis*, a single base mutation in the third exon of the *GUN5* gene, which encodes the magnesium chelatase *H* subunit, causes albinism [55]. If the *CHL1* gene encoding the magnesium chelatase *D* subunit is mutated, the mutant is light yellow-green at the seedling stage, and the expression of the nuclear gene *LHCP II* is also affected through a feedback regulation mechanism [56]. In addition, chlorophyll synthesis may be regulated by factors other than the expression of the twenty-seven enzyme-encoding genes. For example, in a study of pak choi yellow leaf mutants, the expression of *CHLG*, which encodes chlorophyll synthase, was significantly decreased, but the *CHLG* sequences of the wild type and the mutant did not differ, indicating that chlorophyll synthase activity is also controlled by other regulatory factors [57].

#### 3.1.2. Mutations of Genes Related to the Chlorophyll Degradation Pathway

In leaves without color mutations, chlorophyll degradation and chlorophyll synthesis occur simultaneously and are kept in dynamic balance. Chlorophyll is degraded as chloroplasts are transformed into chromoplasts, which marks the onset of leaf senescence [58]. Chlorophyll degradation proceeds in two main stages. In the first stage, chlorophyll is degraded to the primary fluorescent chlorophyll catabolite (pFCC). In the second stage, pFCC-derived catabolites are converted in the vacuole into nonfluorescent chlorophyll catabolites (NCCs), which are finally oxidized to pyrrole degradation products (Figure 2) [59]. Because this pathway involves a series of complex reactions, an interruption at any step may change chlorophyll content and produce leaf color mutations.

**Figure 2.** Biodegradation pathway of chlorophyll in higher plants. (1) Chl b reductase; (2) Chlorophyllase; (3) Metal chelating substance; (4) Pheide a oxygenase; (5) RCC reductase; (6) Catabolite transporter; (7) ABC transporter. Chl a, Chlorophyll a; Chl b, Chlorophyll b; Chlide a, chlorophyllide a; Pheide a, Pheophorbide a; RCC, red Chl catabolite; pFCC, primary fluorescent Chl catabolite; FCCs, fluorescent Chl catabolites; NCCs, nonfluorescent Chl catabolites.
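The pathway in Figure 2 can be read as an ordered chain in which each step consumes the product of the previous one, which is why an interruption at any step can change chlorophyll content. The Python sketch below illustrates only that reasoning, under loudly stated assumptions: the pathway is treated as strictly linear, each enzyme or transporter is either fully active or completely lost, and the step order simply follows the numbered enzymes of Figure 2. The names `DEGRADATION_STEPS`, `furthest_product` and `retains_chlorophyll` are hypothetical and are not taken from any published tool.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical, simplified linear reading of Figure 2: (substrate, enzyme/transporter, product).
# Assumption: modifications of pFCC and the FCC-to-NCC conversion in the vacuole are
# collapsed into the two transporter steps; real degradation is not this tidy.
DEGRADATION_STEPS: List[Tuple[str, str, str]] = [
    ("Chl b", "Chl b reductase", "Chl a"),
    ("Chl a", "Chlorophyllase", "Chlide a"),
    ("Chlide a", "Metal chelating substance", "Pheide a"),
    ("Pheide a", "Pheide a oxygenase", "RCC"),
    ("RCC", "RCC reductase", "pFCC"),
    ("pFCC", "Catabolite transporter", "FCCs"),
    ("FCCs", "ABC transporter", "NCCs"),
]


def furthest_product(lost: FrozenSet[str] = frozenset()) -> str:
    """Return the compound at which degradation stops when the enzymes in `lost` are absent."""
    compound = DEGRADATION_STEPS[0][0]
    for _substrate, enzyme, product in DEGRADATION_STEPS:
        if enzyme in lost:
            break  # the chain halts here, so the current compound accumulates
        compound = product
    return compound


def retains_chlorophyll(lost: FrozenSet[str] = frozenset()) -> bool:
    """True when degradation halts before chlorophyll itself is converted (stay-green-like)."""
    return furthest_product(lost) in {"Chl a", "Chl b"}


if __name__ == "__main__":
    print(furthest_product())                                # intact pathway runs to the end
    print(furthest_product(frozenset({"Chlorophyllase"})))   # chlorophyll a is left over
    print(retains_chlorophyll(frozenset({"RCC reductase"})))  # blocked after chlorophyll is gone
```

The model deliberately ignores the accelerated-degradation case described for *Cymbidium sinense*; it only shows how a single blocked step determines where catabolites pile up.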

Mutations in the chlorophyll degradation pathway prevent chlorophyll from being degraded, so that plant leaves stay green; such mutations are called stay-green mutations. In the *Z. mays* 'fs854' mutant, inhibition of the chlorophyll degradation pathway produced a stay-green phenotype, which increased maize yield owing to higher photosynthesis [60]. Conversely, accelerated chlorophyll degradation can also produce leaf color mutants. In *Cymbidium sinense* mutants, expression of *CHL2* and *RCCR*, which encode key chlorophyll degradation enzymes, was higher than in the wild type, resulting in lower chlorophyll content; this may explain the yellow leaf color mutation of Moran [61]. Stay-green mutations have a distinct phenotype that can delay leaf senescence and increase crop yield. They therefore have much greater potential application value than yellow-green seedling mutations and have attracted considerable research attention.

#### 3.1.3. Mutations of Genes Related to the Heme Metabolism Pathway

Tetrapyrroles form the skeletal structure and common precursors of chlorophyll and heme in plants. At protoporphyrin IX, their biosynthetic pathway splits into two branches: the chlorophyll biosynthetic pathway and the heme biosynthetic pathway. The first branch chelates protoporphyrin IX with a magnesium ion to produce Mg-protoporphyrin IX, which ultimately forms chlorophyll; the second branch chelates protoporphyrin IX with a ferrous ion to form heme [40]. Heme is an iron-containing cyclic tetrapyrrole and an intermediate in the synthesis of the phytochrome chromophore and phycobilins [62]. After a series of reactions, heme is finally converted into the phytochrome chromophore (Figure 3) [40]. Because heme and chlorophyll share the same precursor and part of the same pathway, they are branch products of tetrapyrrole biosynthesis, and chlorophyll synthesis is therefore subject to feedback inhibition by intracellular heme. When the heme metabolism branch is disturbed, heme may accumulate in the cells. High heme concentrations inhibit, through feedback regulation, the synthesis of ALA, the common precursor of chlorophyll and heme, thereby inhibiting chlorophyll synthesis and causing variations in leaf color [63].

**Figure 3.** Heme metabolism pathway in higher plants. The 3Z-phytochrome chromophore can be converted into the 3E-phytochrome chromophore either spontaneously or by enzyme catalysis.

In higher plants, heme oxygenase (HO) catalyzes the degradation of heme to form phytochrome chromophore precursors and thereby controls the rate of heme degradation. Leaf color mutants of both pea and *Arabidopsis* are caused by *HO1* deficiency [64]. The *Arabidopsis 'ulf'* mutation is located at the *Hy1* locus, which encodes a heme oxygenase. Reduced heme oxygenase activity in the mutant results in heme accumulation, and the accumulated heme inhibits glutamyl-tRNA reductase activity, indicating that heme provides feedback regulation of glutamyl-tRNA reductase [50]. In the rice *YELLOW-GREEN LEAF2* (*ygl2*) mutant, a 7-kb insertion was found in the first exon of *YGL2*/*HO1*, which markedly reduced *YGL2* expression in the mutant, hindering chlorophyll synthesis and leading to the yellow leaf phenotype [65]. In another study, map-based cloning revealed a 45-bp insertion in a gene encoding heme oxygenase, which inhibited its expression and caused a leaf color mutation [66]. In addition, leaf color mutants lacking phytochrome chromophores can be used to study how heme regulates chlorophyll synthesis, and they may also serve as ideal models for studying photomorphogenesis in higher plants [67].
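The feedback loop described in this subsection (lower HO activity, more heme, less ALA) can be made concrete with a toy steady-state model. The following Python sketch is an illustration only, built on assumptions that do not come from the cited studies: heme inhibits ALA synthesis hyperbolically with an inhibition constant `k_i`, a fixed share `heme_share` of the ALA flux becomes heme, and HO removes heme at a rate proportional to a hypothetical activity constant `k_ho`. All quantities are in arbitrary units, and the function names are hypothetical.

```python
import math


def ala_synthesis_rate(heme: float, v_max: float = 1.0, k_i: float = 1.0) -> float:
    """ALA synthesis rate under assumed hyperbolic feedback inhibition by heme."""
    if heme < 0 or k_i <= 0:
        raise ValueError("heme must be >= 0 and k_i must be > 0")
    return v_max / (1.0 + heme / k_i)


def steady_state_heme(k_ho: float, v_max: float = 1.0, k_i: float = 1.0,
                      heme_share: float = 1.0) -> float:
    """Heme level at which heme supply equals heme oxygenase (HO) turnover.

    Solves heme_share * v_max / (1 + h / k_i) = k_ho * h for h >= 0, i.e. the
    positive root of (k_ho / k_i) * h**2 + k_ho * h - heme_share * v_max = 0.
    """
    if k_ho <= 0 or k_i <= 0 or v_max < 0 or heme_share < 0:
        raise ValueError("k_ho and k_i must be positive; v_max and heme_share non-negative")
    a = k_ho / k_i
    b = k_ho
    c = -heme_share * v_max
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


if __name__ == "__main__":
    # Hypothetical comparison: wild type (k_ho = 1.0) versus an HO-deficient mutant (k_ho = 0.25).
    for label, k_ho in (("wild type", 1.0), ("HO-deficient", 0.25)):
        heme = steady_state_heme(k_ho)
        print(f"{label}: heme={heme:.3f}, ALA rate={ala_synthesis_rate(heme):.3f}")
```

Lowering `k_ho` raises the steady-state heme level and lowers the ALA synthesis rate, which mirrors qualitatively the behaviour reported for HO-deficient mutants; the specific inhibition law is a modelling choice, not a measured property.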

#### *3.2. Abnormal Chloroplast Development and Differentiation*

Chloroplasts develop from proplastids and are semi-autonomous organelles in plant cells. Their differentiation and development are co-regulated by nuclear genes encoding chloroplast-related proteins and by plastid genes, through processes including gene transcription, RNA processing, and protein translation, folding and transport [68]. Chloroplast development can be divided into seven steps: nuclear gene transcription, chloroplast protein import and processing, chloroplast gene transcription and translation, thylakoid formation, pigment synthesis, plastid-nuclear signal transduction, and chloroplast division [69]. Because chloroplast development is closely related to chlorophyll content, a block at any point in this process impairs chloroplast development, thereby affecting chlorophyll content and causing leaf color mutations.

The regulatory pathway of the sesame yellow leaf mutation was first analyzed in the yellow-green sesame mutant *'Siyl-1'*. The number and morphology of chloroplasts changed significantly in the mutant, and its chlorophyll content also decreased significantly [33]. Leaf color variation is also closely related to obstruction of the plastid-nuclear signal transduction pathway. Nuclear genes encode chloroplast proteins, regulate the metabolic state of chloroplasts, and regulate the transcription and translation of chloroplast genes; conversely, plastids regulate nuclear gene expression via retrograde signaling pathways [70]. *Arabidopsis 'cue'* mutants show visible leaf phenotypes (virescent, yellow-green, or pale), defective chloroplast development, delayed chloroplast differentiation, and defects in mesophyll structure. These results indicate complex gene interactions within the cell, with nuclear and cytoplasmic genes coordinating plant growth and development through signaling molecules [71]. The dynamic balance among chloroplast development, protein synthesis, and protein degradation is also an important factor in leaf color mutation [72]. Studies of rice pale-green mutants showed that the decrease in leaf chlorophyll content was mainly caused by mutation of *CSP41b*, which encodes a protein involved in chloroplast development [73].

In addition to being an important part of chloroplast structure, the thylakoid plays an important role in chloroplast function. One study linked the *Arabidopsis 'vipp1'* mutant to altered thylakoid and vesicle structure [74]. In the barley yellow-green leaf mutant *'ygl9'*, grana and stroma thylakoids were arranged in an abnormally linear fashion, grana could not stack normally, and the chloroplast ultrastructure was damaged in leaves at the seedling stage [75]. In a watermelon *yellow leaf (YL)* mutant, compared with the control ZK, the chloroplasts of YL plants developed incompletely and contained few, disordered grana thylakoids, and cell metabolism was weak; these changes ultimately reduced the light-harvesting ability of the plants [76].

#### *3.3. Abnormal Carotenoid Metabolism Pathway*

Carotenoids are mainly divided into carotenes, such as orange β-carotene, and xanthophylls, such as yellow lutein. Carotenoids absorb and transfer light energy in plants and play an important role in protecting chlorophyll [77]. Carotenoid biosynthesis is regulated mainly at the transcriptional level: the expression of genes encoding the biosynthetic enzymes controls both the types and the amounts of carotenoids produced [78].

The main genes involved in plant carotenoid synthesis include those encoding phytoene synthase (PSY), phytoene desaturase (PDS), ζ-carotene desaturase (ZDS), lycopene β-cyclase (LCYB) and lycopene ε-cyclase (LCYE). Phytoene synthase is considered the most important regulatory enzyme in the carotenoid synthesis pathway [79]. In *Arabidopsis*, overexpression of *PSY* significantly increases β-carotene content in leaves, with a corresponding increase in total carotenoids, which facilitates greening when etiolated seedlings emerge from the soil [80]. Ectopic expression of *PSY1* in tobacco causes abnormal leaf pigmentation; very young leaves are sometimes bright orange before rapidly turning green [81]. The *immutans (im)* variegation mutant of *Arabidopsis* has a light-dependent variegated phenotype (green and white leaf sectors): the green sectors contain normal chloroplasts, while the white sectors contain abnormal chloroplasts that lack carotenoids owing to a defect in phytoene desaturase activity [82,83]. In rice *PHS* (pre-harvest sprouting) mutants, genes encoding the main enzymes of carotene and lutein biosynthesis are mutated, resulting in photooxidative damage to photosystem II (PSII) in leaves. The PSII core proteins CP43, CP47, and D1 were all reduced, and reactive oxygen species (ROS) accumulated in the plants. These results showed that impaired carotenoid biosynthesis causes photooxidative damage and abscisic acid (ABA) deficiency, leading to pre-harvest sprouting and albino leaves [84]. Park et al. [85] showed that carotenoids exert feedback regulation on plastids, such that changes in carotenoid synthesis can regulate plastid development.

#### *3.4. Abnormal Anthocyanin Metabolism Pathway*

Anthocyanins are a class of water-soluble pigments that are widely present in plants and belong to the flavonoids produced by plant secondary metabolism. They accumulate mainly as glycosides in plant vacuoles. In the broad sense, the term anthocyanin refers to all anthocyanidin glycosides [86]. The anthocyanin biosynthesis pathway is an important branch of the flavonoid metabolism pathway and is closely related to plant coloration [87]. It has been well characterized through research on model plants such as *Arabidopsis*, maize, and petunia.

#### 3.4.1. Mutations of Structural Genes Related to the Anthocyanin Synthesis Pathway

Anthocyanins are synthesized in the cytoplasm from phenylalanine, their direct precursor, through a series of reactions catalyzed by enzymes encoded by structural genes. After various modifications, they are transported to the vacuole and other compartments for storage. The synthetic pathway from phenylalanine to anthocyanins can be divided into three stages: (1) phenylalanine is converted to 4-coumaroyl-CoA, a process initiated by phenylalanine ammonia-lyase; (2) 4-coumaroyl-CoA is converted into colorless leucoanthocyanidins by enzymes such as chalcone synthase; and (3) leucoanthocyanidins are converted into colored anthocyanins by a series of enzymes in the late stage of the pathway. In this process, a series of structural, modification and transport-related genes are directly and positively regulated by a complex (the MBW complex) composed of MYB (v-myb avian myeloblastosis viral oncogene homolog), bHLH (basic helix-loop-helix) and WDR (WD repeat) regulatory factors [88,89]. The genes that affect anthocyanin metabolism can therefore be divided into two categories: structural genes and regulatory genes. Anthocyanin synthesis is strongly influenced by the structural genes because they directly encode the enzymes of the biosynthetic pathway, including phenylalanine ammonia-lyase (PAL), chalcone synthase (CHS), chalcone isomerase (CHI), flavonoid 3′,5′-hydroxylase (F3′5′H) and anthocyanidin synthase (ANS) (Figure 4) [90,91]. Variations in these structural or regulatory genes can lead to various leaf color mutations.

**Figure 4.** Biosynthesis pathway of anthocyanins in plants. PAL, Phenylalanine ammonia lyase; C4H, Cinnamate 4-hydroxylase; 4CL, 4-coumarate CoA ligase; CHS, Chalcone synthase; CHI, Chalcone isomerase; F3H, Flavanone 3-hydroxylase; F3′H, Flavonoid 3′-hydroxylase; F3′5′H, Flavonoid 3′,5′-hydroxylase; DFR, Dihydroflavonol 4-reductase; ANS, Anthocyanidin synthase; LDOX, leucoanthocyanidin dioxygenase; UFGT, Flavonoid 3-O-glucosyltransferase; 5GT, Anthocyanin 5-O-glucosyltransferase; 7GT, Flavonoid 7-O-glucosyltransferase; MT, Methyl transferase; AT, Anthocyanin acyltransferase; GST, Glutathione S-transferase; AVIs, Anthocyanic vacuolar inclusions.
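The structural-gene logic of Figure 4 can be summarized as a small decision function: a blocked core step abolishes colored pigment, while B-ring hydroxylation by F3′H or F3′5′H decides which anthocyanidin class is made. The Python sketch below is deliberately simplified and is not a predictive model. It assumes that every core enzyme is essential, that F3′5′H takes precedence over F3′H when both are present, and it ignores DFR substrate specificity, spontaneous chalcone isomerization, glycosylation, methylation and vacuolar transport. The function name `predicted_anthocyanidin` is hypothetical, and the ASCII spelling `F3'5'H` stands for F3′5′H.

```python
from typing import AbstractSet, Optional

# Enzymes of the shared, unbranched part of the pathway, abbreviated as in Figure 4.
CORE_ENZYMES = ("PAL", "C4H", "4CL", "CHS", "CHI", "F3H", "DFR", "ANS")


def predicted_anthocyanidin(active: AbstractSet[str]) -> Optional[str]:
    """Return the main anthocyanidin expected from the set of active enzymes, or None.

    Simplifying assumptions: every core enzyme is essential, and F3'5'H takes
    precedence over F3'H when both are present.
    """
    if any(enzyme not in active for enzyme in CORE_ENZYMES):
        return None  # a blocked structural step: no colored anthocyanidin is formed
    if "F3'5'H" in active:
        return "delphinidin"  # 3',4',5'-hydroxylated B-ring, associated with blue hues
    if "F3'H" in active:
        return "cyanidin"  # 3',4'-hydroxylated B-ring, associated with red to magenta hues
    return "pelargonidin"  # only the 4'-hydroxyl, associated with orange to red hues


if __name__ == "__main__":
    base = set(CORE_ENZYMES) | {"F3'H"}
    print(predicted_anthocyanidin(base))               # F3'H only
    print(predicted_anthocyanidin(base | {"F3'5'H"}))  # F3'5'H added, as in blue-hued flowers
    print(predicted_anthocyanidin(base - {"CHS"}))     # CHS suppressed, as in white T. hybrida
```

The two usage cases at the end correspond loosely to the CHS suppression and F3′5′H overexpression examples discussed in this section.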

In the anthocyanin metabolism pathway, PAL is the first enzyme of anthocyanin synthesis and an important regulatory point. Its expression is regulated by both developmental and environmental factors. In purple-foliage plum, PAL activity and anthocyanin content increased under light induction, and the leaves gradually turned purplish red [92]. CHS is another key enzyme of the anthocyanin biosynthetic pathway. Suppressing *Torenia hybrida CHS* by RNA interference transformed blue *T. hybrida*, which is rich in malvidin and peonidin, into white, anthocyanin-deficient *T. hybrida* [93]. When *Freesia hybrida CHS1* was introduced into *Petunia hybrida*, its color changed from white to pink [94]. CHI is also related to leaf color. Inhibiting its expression or reducing its enzyme activity causes large amounts of 4′-hydroxychalcone to accumulate in the anthocyanin synthesis pathway, which affects the rate of anthocyanin synthesis [95]. For example, *CHI* silencing can turn carnation and tobacco yellow [96]. ANS catalyzes the oxidation of colorless leucoanthocyanidins into colored anthocyanidins. In a study of the onion mutant *ANSPS*, the gene encoding ANS was found to carry a base insertion mutation, which gave the mutant a yellow color, unlike the wild type [97]. Overexpression of *F3′5′H* in roses and chrysanthemums can lead to the synthesis of delphinidin in the flowers, producing blue-hued flowers [98,99]. Disorders of the anthocyanin metabolic pathway are therefore an important mechanism underlying leaf color mutations.

#### 3.4.2. Abnormal Regulatory Factors Related to the Anthocyanin Synthesis Pathway

Regulatory genes of anthocyanin synthesis play an important role in plant metabolism. Although they do not encode biosynthetic enzymes, they control the expression of the structural genes. These transcription factors bind to cis-acting elements in the promoters of structural genes and thereby regulate the expression of one or more genes in the biosynthetic pathway. There are three types: MYB (v-myb avian myeloblastosis viral oncogene homolog), bHLH (basic helix-loop-helix), and WDR (WD40 repeat proteins). Together they form the MBW (MYB-bHLH-WD40) protein complex, which directly controls the expression of structural genes. The specific combination of MYB, bHLH and WDR proteins in the complex determines which target genes it regulates and how strongly [100,101].

In anthocyanin biosynthesis, most MYB transcription factors act as positive regulators. They usually occur as gene families in plant genomes and contain either two repeats (*R2* and *R3*) or a single *R3* repeat. The helix-helix-turn-helix structure at the C-terminal end of the conserved domain specifically recognizes and binds DNA, while the highly conserved C-terminal region is important for anthocyanin regulation [102,103]. *PtrMYB119* functions as a transcriptional activator of anthocyanin accumulation in both *Arabidopsis* and poplar. Overexpression of *PtrMYB119* in hybrid poplar elevated anthocyanin accumulation throughout the plant, producing strong red pigmentation, especially in the leaves, relative to non-transformed control plants [12]. In addition, *PAP1* (*PRODUCTION OF ANTHOCYANIN PIGMENT 1*) is a key MYB transcription factor that regulates anthocyanin synthesis and affects the expression of structural genes such as *CHS*, *CHI* and *ANS*. For example, activation-tagged overexpression of *PAP1* in *Arabidopsis* induces massive anthocyanin accumulation, which makes the leaves dark purple [104]. In blood-fleshed peach, BL, which acts upstream of MYB, activates transcription of the anthocyanin-related MYB gene, resulting in anthocyanin accumulation in the flesh, which becomes blood-colored [105].

bHLH transcription factors are involved in regulating various physiological pathways, and the regulation of flavonoid and anthocyanin synthesis is one of their most important functions [106]. In *Setaria italica*, *PPLS1* is a bHLH transcription factor that regulates the purple color of the pulvinus and leaf sheath in foxtail millet. Co-expression of *PPLS1* and *SiMYB85* in tobacco leaves increased anthocyanin accumulation and the expression of anthocyanin biosynthesis genes (*NtF3H*, *NtDFR* and *Nt3GT*), resulting in purple leaves [107]. Moreover, bHLH transcription factors can co-regulate target genes together with MYB. For example, expression of *Medicago truncatula MtTT8* in the *A. thaliana tt8* mutant restores anthocyanin production and rescues the mutant's anthocyanin-deficient phenotype [108]. The R2R3-MYB transcription factor *AmRosea1* and the bHLH transcription factor *AmDelila* of *Antirrhinum majus* have been studied extensively. Simultaneous expression of *AmRosea1* and *AmDelila* activated anthocyanin production in orange carrot; unlike non-transformed plants, the transgenic plants had dark purple young leaves, which later matured into dark green leaves [109]. In addition, the chrysanthemum *CmbHLH* transcription factor can activate *DFR* expression and promote anthocyanin synthesis when co-expressed with the *CmMYB6* transcription factor [110].

All WDR proteins studied so far appear to function mainly as platforms for protein-protein interactions, and they may act as intermediaries that connect MYB and bHLH proteins to form the complex [111]. For example, in *Arabidopsis*, the bHLH transcription factors *TT8* and *EGL3* and the WD40 repeat protein *TTG1* form MBW complexes that regulate the expression of *DFR*, *ANS* and *TT19*, thereby affecting anthocyanin biosynthesis [112]. WD40 proteins can also stabilize the MBW complex and directly regulate transcription in the anthocyanin synthesis pathway [113]. For example, a study of *M. truncatula* found that silencing *MYB5* and *MYB1* in *Arabidopsis* directly changed the expression pattern of the structural genes and decreased anthocyanin content [114].

#### **4. Mapping and Cloning of Leaf Color Mutational Genes**

With the rapid development of next-generation sequencing technologies, genome, pan-genome and transcriptome analyses are increasingly used to reveal genetic variation and dissect the regulatory mechanisms of key genes in plants. Current research focuses on the mechanisms underlying growth, fruit quality, stress responses and mutant traits, as well as sex determination and flowering [115–119]. At present, map-based cloning and RNA-seq are the main methods used to mine genes related to leaf color mutations. Analyzing genomic and transcriptomic data generated by next-generation sequencing can greatly improve the efficiency and precision of identifying genes related to leaf color mutations. The identification, cloning and functional study of these genes play an important role in elucidating the molecular mechanisms of leaf color mutation, photosynthesis, and pigment synthesis.

#### *4.1. Using Map-Based Cloning to Discover Mutant Genes*

Map-based cloning, also known as positional cloning, is a forward-genetics approach for identifying and isolating genes. It locates and clones the gene of interest step by step, based on the phenotype of the organism and the chromosomal positions of closely linked molecular markers, without requiring prior knowledge of the gene's sequence. Map-based cloning involves three steps: (1) constructing a mapping population through inbreeding, hybridization, backcrossing, or similar crosses; (2) identifying molecular markers linked to the gene of interest and building a genetic map; and (3) predicting and screening candidate genes and confirming the function of the gene of interest using molecular biology methods [120].
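The second step, placing markers relative to the gene of interest, rests on recombination frequency: markers that rarely recombine with the mutant phenotype lie close to the gene. A minimal Python sketch follows, assuming a backcross mapping population in which each plant is scored as recombinant or non-recombinant between a marker and the mutant locus. Recombination fractions are converted to map distances with the Haldane mapping function, one of several standard choices (Kosambi's is another). The function names, marker names and counts are hypothetical and invented for illustration.

```python
import math
from typing import Dict


def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of scored plants in which the marker and the mutant locus recombined."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return recombinants / total


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def marker_distances(recombinant_counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Map distance (cM) between the mutant locus and each scored marker."""
    return {
        marker: haldane_cm(recombination_fraction(count, total))
        for marker, count in recombinant_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical recombinant counts among 200 backcross plants.
    counts = {"marker_A": 24, "marker_B": 6, "marker_C": 40}
    distances = marker_distances(counts, total=200)
    for marker, d in sorted(distances.items(), key=lambda item: item[1]):
        print(f"{marker}: {d:.2f} cM")
    print("closest marker:", min(distances, key=distances.get))
```

In practice, fine mapping repeats this with more markers and larger populations until the candidate interval is small enough for the third step, candidate gene screening.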

Genes controlling leaf color in plants are usually involved in chlorophyll metabolism or chloroplast development. For instance, yellow-green leaf genes in rice mutants were mapped by map-based cloning, which showed that an A-to-T mutation in the gene encoding the chloroplast signal recognition particle changed leaf color [121]. In tomato mutants, map-based cloning revealed a single-base mutation in the first intron that downregulated *WV* expression; this inhibited the expression of chloroplast-encoded genes and blocked chloroplast formation and chlorophyll synthesis, producing leaf color mutants [122]. Map-based cloning is increasingly used to study mutants, and many genes related to chlorophyll synthesis and chloroplast development have been cloned in rice, barley, pear, *Brassica oleracea*, and cucumber [23,123–126].

#### *4.2. Using RNA-Seq to Discover Mutant Genes*

RNA-seq, also known as mRNA-seq, is used to study changes in gene expression at the mRNA level and allows transcriptome analysis even in species without reference genome information. It is currently considered an effective method for discovering new genes and for annotating coding and noncoding genes. In studies of plant leaf color mutants, differentially expressed genes (DEGs) are identified by comparing transcript expression levels among samples [127]. Functional annotation and enrichment analysis of these DEGs can then reveal key candidate genes, which helps clarify the molecular mechanisms of leaf color mutation [128].
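The core idea of DEG screening, comparing normalized expression of each transcript between mutant and wild type, can be sketched as follows. This Python example is only an illustration under stated assumptions: it uses a single count table per genotype, counts-per-million (CPM) normalization, a pseudocount of 1 and a |log2 fold change| ≥ 1 cut-off, all of which are illustrative choices. Published analyses instead use biological replicates and statistical testing (for example with the Bioconductor packages DESeq2 or edgeR) before annotation and enrichment analysis. The function names, gene identifiers and counts are hypothetical.

```python
import math
from typing import Dict, List


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that the library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_changes(mutant: Dict[str, int], wild_type: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(mutant / wild type) per gene on CPM values; the pseudocount avoids log(0)."""
    if mutant.keys() != wild_type.keys():
        raise ValueError("both libraries must contain the same genes")
    m, w = counts_per_million(mutant), counts_per_million(wild_type)
    return {g: math.log2((m[g] + pseudocount) / (w[g] + pseudocount)) for g in m}


def candidate_degs(mutant: Dict[str, int], wild_type: Dict[str, int],
                   min_abs_lfc: float = 1.0) -> List[str]:
    """Genes whose absolute log2 fold change reaches the cut-off, sorted by name."""
    lfc = log2_fold_changes(mutant, wild_type)
    return sorted(g for g, v in lfc.items() if abs(v) >= min_abs_lfc)


if __name__ == "__main__":
    # Invented counts: gene_1 down-regulated, gene_2 up-regulated, gene_3 unchanged.
    wild_type = {"gene_1": 400, "gene_2": 100, "gene_3": 500}
    mutant = {"gene_1": 100, "gene_2": 400, "gene_3": 500}
    print(candidate_degs(mutant, wild_type))
```

A fold-change filter alone does not account for biological variability, which is why replicate-aware statistical tools are preferred for real leaf color studies.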

RNA-seq is widely used to study leaf color mutants of ornamental plants. For example, in etiolated, rubescent and albino mutants of *Anthurium andraeanum* 'Sonate', chloroplast ultrastructure in mutant leaves was disrupted, and very few intact chloroplasts were observed. The carotenoid/total Chl ratio in all three mutants was higher than in the wild type; anthocyanin content was similar in leaves of the etiolated mutant and the wild type, while albino leaves had the lowest anthocyanin content. RNA sequencing showed that most genes related to chlorophyll synthesis, anthocyanin transport and pigment biosynthesis were down-regulated in the mutants compared with the wild type, indicating that the abnormal leaf color of the mutants was caused by altered expression of genes responsible for pigment biosynthesis [129]. Similarly, in anthocyanin mutants, defects in chloroplast development and division and mutations in the chlorophyll and anthocyanin synthesis pathways affected the synthesis of chlorophyll and anthocyanin and ultimately led to abnormal leaf color [130]. In addition, transcriptome analyses of cucumber [32], tomato [131], barley [132], wheat [133], birch [134], and other species have identified the main DEGs related to chlorophyll metabolism, chloroplast development and the photosynthetic system.

#### **5. Conclusions**

A growing number of genes that control or influence pigment metabolism, chloroplast development and differentiation, and photosynthesis have been identified from a variety of leaf color mutants. Studies of their functions and interactions have clarified how leaf color mutants form. At present, research on leaf color mutants has focused on the functions of genes and transcription factors involved in chlorophyll metabolism, chloroplast development and anthocyanin metabolism, while few studies have examined nuclear-cytoplasmic gene interactions, transcription factors, regulatory elements, or the metabolism of other pigments. Moreover, gene transcription and translation are influenced not only by transcription factors but also by non-coding RNAs (such as miRNAs and lncRNAs), epigenetic modifications (such as DNA methylation and histone acetylation), and protein spatial structure, yet few studies have addressed these influences. Future studies should therefore strengthen research on other pigment molecules, nuclear-plastid signal transduction, and transcriptional regulation, as well as on non-coding RNAs, DNA methylation, and related mechanisms. This will provide more information on the regulation of pigment metabolism and help to analyze the molecular regulatory mechanisms of plant leaf color mutations at the level of systems networks.

**Author Contributions:** Conceptualization, X.-Y.Z. and M.-H.Z.; methodology, X.L., X.-X.Z. and H.Z.; validation, M.-H.Z., X.L. and X.-X.Z.; resources, M.-H.Z. and H.Z.; writing—original draft preparation, M.-H.Z.; writing—review and editing, R.R.-S. and X.-Y.Z.; supervision, X.-Y.Z.; project administration, X.-Y.Z.; funding acquisition, X.-Y.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Fundamental Research Funds for the Central Universities, grant number 2572017DA02.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Forests* Editorial Office E-mail: forests@mdpi.com www.mdpi.com/journal/forests
