**Genetic and Morphological Variation in Tropical and Temperate Plant Species**

Editors

**W. John Kress Nancai Pei**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* W. John Kress Department of Botany, National Museum of Natural History, Smithsonian Institution USA

Nancai Pei Research Institute of Tropical Forestry, Chinese Academy of Forestry China

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Forests* (ISSN 1999-4907) (available at: https://www.mdpi.com/journal/forests/special_issues/Plant_Genetic_Morphological).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03936-756-6 (Hbk) ISBN 978-3-03936-757-3 (PDF)**

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.



### **About the Editors**

**W. John Kress**, Ph.D., Visiting Scholar, Dartmouth College and The Arnold Arboretum of Harvard University Distinguished Scientist and Curator Emeritus National Museum of Natural History Smithsonian Institution, P.O. Box 37012 Washington, DC 20013-7012; Tel: 202-633-0939 (DC Office); 202-372-7745 (Mobile); Email: kressj@si.edu; kressjohn@gmail.com; Research Interests: systematic biology; evolutionary biology; conservation biology.

**Nancai Pei**, Dr. and Associate Research Professor in Research Institute of Tropical Forestry, Chinese Academy of Forestry; Add: No. 682, Guangshan Road 1, Tianhe District, Guangzhou 510520, P. R. China; E-Mail: nancai.pei@gmail.com; Research Interests: forest biology; plant DNA barcoding; urban forestry.

### *Editorial* **Research in Forest Biology in the Era of Climate Change and Rapid Urbanization**

**Nancai Pei 1,\* and W. John Kress 2,\***


Received: 16 December 2019; Accepted: 21 December 2019; Published: 23 December 2019

**Abstract:** Green plants provide the foundation for the structure, function, and interactions among organisms in both tropical and temperate zones. To date, many investigations have revealed patterns and mechanisms that generate plant diversity at various scales and from diverse ecological perspectives. However, in the era of climate change, anthropogenic disturbance, and rapid urbanization, new insights are needed to understand how plant species in these forest habitats are changing and adapting. Here, we recognize four themes that link studies from Asia and Europe presented in this Special Issue: (1) genetic analyses of diverse plant species; (2) above- and below-ground forest biodiversity; (3) trait expression and biological mechanisms; and (4) interactions of woody plants within a changing environment. These investigations enlarge our understanding of the origins of diversity, trait variation and heritability, and plant–environment interactions from diverse perspectives.

**Keywords:** climate change; forest biodiversity; plant–environment interactions; plant traits; urbanization

#### **1. Introduction**

Investigations of plants are needed both in little-disturbed, more natural environments and in urban areas, where crucial green infrastructure is ever more important for sustaining complex human societies. Recently, numerous studies have addressed fundamental issues in plant evolution and community phylogenetics by exploring patterns and mechanisms at diverse organismal levels (e.g., molecular, population, species, community, landscape, and ecosystem) [1,2], plant functional traits (e.g., nutrient traits and reproductive traits) [3,4], and the interaction of plant species with changing environments (e.g., water, atmosphere, soil, human activities) [5,6]. The aim of this Special Issue is to help fill gaps in current research by focusing on diversity, traits, and plant–environment interactions within the context of forest ecosystems. The 19 papers encompassed here can best be linked under four basic themes: (1) genetic analyses of diverse plant species; (2) above- and below-ground forest biodiversity; (3) trait expression and biological mechanisms; and (4) interactions of woody plants within a changing environment.

Together, the guest editors of this Special Issue conceptualized these four themes as a means to advance an open discussion of forest biology, including plants and diverse environments. Our proposal for this Special Issue coincided with the common interests of the ecological and forestry research communities. This Special Issue includes research performed mostly in Asia and Europe, with studies originating from Belgium, China, Croatia, India, Poland, and Spain. We could not have hoped to create a more internationally inclusive and relevant Special Issue, and as guest editors we are very proud to present this collection of forest biology studies.

#### **2. Theme 1: Genetic Analyses of Diverse Plant Species**

This theme includes 7 papers presenting physiological and transcriptome analyses of 6 woody species and 1 herbaceous species from 7 plant families. By studying the differentially expressed genes (DEGs) between a yellow-green leaf mutant (*yl*) and control plants in birch (Betulaceae), Gang et al. find that 1163 genes and 930 genes are differentially expressed in *yl* compared with wild type birch (WT) and a *BpCCR1* overexpression line (C11), respectively. The KEGG pathway enrichment analysis for DEGs reveals that photosynthesis antenna proteins represent the most significantly enriched pathway. The expression of photosynthesis antenna proteins is crucial to leaf color formation in *yl*. They also report that Chl accumulation, leaf anatomical structure, photosynthesis, and growth are affected in *yl*. This study documents the differences in phenotypic, physiological, and gene expression characteristics in leaves between the *yl* mutant and control plants, and presents a new insight into the mutation underlying the chlorotic leaf phenotype in birch [7].

Kaushik and Kumar report expressed transcripts in the leaves of *Aegle marmelos*, a medicinal and horticultural tree species from Rutaceae. They find that 133,616 contigs are assembled to 46,335 unigenes with minimum and maximum lengths of 201 bp and 14,853 bp, respectively. A total of 482 transcripts are annotated as cytochrome p450s, and 314 transcripts are annotated as glucosyltransferases. They suggest that the monoterpenoid biosynthesis pathway in leaves is predominant [8].

Liu et al. evaluate the genetic diversity of 42 wild individuals from seven populations of *Dalbergia odorifera*, a semi-deciduous, commercially important, and threatened tree species from Fabaceae, indigenous to Hainan Island in tropical China. They find that 19 SSR markers harbor 54 alleles across the 42 samples, from which a medium level of genetic diversity is inferred. Among the 7 wild populations, the expected heterozygosity varies from 0.31 to 0.40. The AMOVA analysis shows that only 3% of genetic variation exists among populations. Moderate population differentiation among the investigated populations is indicated by pairwise Fst. Structure analysis suggests two clusters for the 42 samples. These findings provide a preliminary genetic basis for the conservation, management, and restoration of this endemic species [9].

Lu et al. evaluate the genetic diversity and structure of *Eucalyptus urophylla*, an important commercial tropical plantation species from Myrtaceae. They find that significant deviations from the Hardy–Weinberg equilibrium are recorded at all 16 loci in the populations, revealing reasonably high levels of genetic diversity. The genetic differentiation coefficient reveals low differentiation among pairs of provenances comprising the first cycle population. They also find that the majority of molecular genetic variation exists among individuals rather than among provenances for the first cycle population and among individuals rather than among field trial sources in the third cycle population [10].

Mo, Feng et al. construct four small RNA libraries from the graft union of pecan, a high-value fruit tree from Juglandaceae. They identify 47 conserved miRNAs belonging to 31 families and 39 novel miRNAs. For the identified miRNAs, 584 target genes are bioinformatically predicted, and 266 of them are annotated. Meanwhile, 29 miRNAs (including 16 conserved and 13 novel miRNAs) are differentially expressed during the graft process. The expression profiles of 12 miRNAs are further validated by qRT-PCR. They also find that miRS26 may be involved in callus formation, while miR156, miR160, miR164, miR166, and miRS10 may be associated with vascular bundle formation. These results indicate that miRNA-mediated gene regulation plays an important role in the graft union development of pecan [11].

Wei et al. perform a genome-wide identification and analysis of members of the HbMADS-box gene family associated with floral organ and inflorescence development in *Hevea brasiliensis*, a rubber tree species from Euphorbiaceae. They find that 20 MADS-box genes are newly identified in the *H. brasiliensis* genome. Expression profiling reveals that HbMADS-box genes are differentially expressed in various tissues, which indicates that HbMADS-box genes may exert different functions throughout the life cycle. Additionally, 12 genes are found to be associated with the differentiation of flower buds and may be involved in flower development. All of these floral-enriched HbMADS-box genes are regulated by hormone, salt, cold, high-temperature, and drought stresses. This study demonstrates that HbMADS-box genes may be multifunctional regulators, and are mainly involved in the maintenance of floral organ and inflorescence development [12].

In order to understand the function of heat shock transcription factors (Hsfs) in moso bamboo in the family Poaceae, Xie et al. identify 22 non-redundant Hsf genes in the moso bamboo genome. They find that members of the PheHsf family can be clustered into three classes, containing stress-, hormone-, and development-related cis-acting elements. They also find that most PheHsfs participate in rapid shoot growth and flower development in moso bamboo, and that PheHsfA1a is expressed mainly during moso bamboo development. Two hub genes are involved in a complex protein interaction network, and five PheHsfAs are predicted to play an important role in flower and shoot development and in the abiotic stress response of moso bamboo. This study provides an overview of the complexity of the PheHsf gene family and sets a basis for analyzing the functions of PheHsf genes [13].

#### **3. Theme 2: Above- and Below-Ground Forest Biodiversity**

Within this theme, five papers explore the dimensions of plant DNA barcoding, reproductive biology, and edible plant and fungal resources in forest diversity. Wu, Li, Liao et al. evaluate the effectiveness of DNA barcoding in identifying 23 mangrove species in Guangdong Province (GP), southern China. They find that the success rates for PCR amplification of *rbc*L, *mat*K, *trn*H-*psb*A, and ITS are 100%, 80.29%, 99.38%, and 97.18%, respectively, and the success rates of DNA sequencing are 100%, 75.04%, 94.57%, and 83.35%, respectively. These results suggest that both *rbc*L and *trn*H-*psb*A are universal in mangrove species in the sampled sites. The highest success rate for species identification is 84.48% for *trn*H-*psb*A, followed by *rbc*L (82.16%), ITS (66.48%), and *mat*K (65.09%); the rate for *trn*H-*psb*A increases to 91.25% with the addition of *rbc*L. They suggest that *rbc*L and *trn*H-*psb*A are the most suitable DNA barcode fragments for species identification in mangrove plants, and the combination of *mat*K + *rbc*L + *trn*H-*psb*A + ITS is optimal when constructing the phylogenetic tree in mangrove communities [14]. In addition, also in GP, Liao et al. obtain diverse datasets of edible plants and macro-fungi from field collections, historical publications, and community surveys across seven cities. This work follows the "Observation Methodology for Long-term Forest Ecosystem Research" of the National Standards of the People's Republic of China (GB/T 33027-2016). They find that at least 100 plant species (with 64 plant species producing fruit) and 20 macro-fungi are commonly used as edible forest products in subtropical GP. There are 55 and 57 species providing edible parts in summer and autumn, respectively. Many edible plants have multiple uses. They suggest that edible plants and macro-fungi can enrich the food supply for residents in rural and urban areas by acting as supplemental resources to support the increasing demand for food in the era of rapid urbanization and global change [15].

Mao et al. describe the flowering phenology pattern of *Cyclocarya paliurus*, a monoecious species with a heterodichogamous mating system, in a juvenile plantation at the individual and population levels for 5 consecutive years. They find that four flowering phenotypes and strongly skewed ratios of protandry/protogyny and male/female occur in the juvenile population. Sexual type and ratio change significantly with the growth of the population over the years, showing an increasing monoecious group and a decreasing unisexual group, as well as a tendency for the sexual ratio to move towards equilibrium. Two flowering phases and bimodality in gender are displayed, as in other heterodichogamous species, thereby verifying the presence of heterodichogamy in *C. paliurus* [16].

In order to detect the reason for impaired cone maturation in the Pinaceae, Mo, Xu et al. compare transcriptome libraries of cones of *Pinus massoniana* and Z pine (a natural introgression hybrid) at seven successive growth stages. They find that several genes indeed relate to reproductive processes. At every growth stage, these genes are expressed at a higher level in *P. massoniana* than in the Z pine. These data provide insights into which molecular mechanisms are altered between *P. massoniana* and the Z pine and might cause changes in the reproductive process [17].

To examine the diversity and antimicrobial activities of endophytic fungi in *Litsea cubeba*, a medicinal plant from Lauraceae, Wu, Yang et al. obtain 970 isolates from the root, stem, leaf, and fruit segments. They find that the fungal endophytes belong to the phylum Ascomycota and can be classified into 3 taxonomic classes, 9 orders, 12 families, and 17 genera. *Colletotrichum boninense* is the dominant species. As for antimicrobial activities, 17 isolates inhibit the growth of plant pathogenic fungi, while the extracts of 6 endophytes show antimicrobial activity against all the tested pathogenic fungi [18].

#### **4. Theme 3: Trait Expression and Biological Mechanisms**

Theme three includes three papers that investigate physiological and morphological traits and the possible biological mechanisms that generate and maintain particular patterns in trait expression. In order to interpret the patterns of genetic variation in photosynthesis and the relationships with growth traits within gene resources of teak (*Tectona grandis*), a commercially important tree species in the plant family Lamiaceae found in tropical regions, Huang et al. measure gas exchange, chlorophyll fluorescence parameters, and growth traits of plants in nursery and field trials for 20 teak clones originating from different countries. They report abundant genetic variation in gas exchange, chlorophyll fluorescence, and growth among the teak clones. The measured traits are found to have generally high heritability. The net photosynthetic rate, seedling height, and individual volume of wood are significantly correlated with each other, and seedling height is significantly correlated with plant height [19].

Zheng et al. screen for creeping genes in crape myrtle (*Lagerstroemia indica*), a species in the plant family Lythraceae that shows substantial polymorphism. They detect two SSR markers, with genetic distances of 23.49 cM and 25.86 cM from the loci controlling the plant opening angle trait and the branching angle trait, respectively. The accuracy rates for phenotypic verification are 76.51% and 74.14%, respectively, which provides basic information for molecular marker-assisted selective breeding and for cloning of the creeping gene to improve architectural diversity in the breeding of crape myrtle [20].

To describe the spatial arrangement of shoot tissues (rectangular vs. cylindrical) and allometric relationships in two contrasting species of *Polygonatum* from the plant family Asparagaceae, Tulik et al. measure the mass and length of the aerial shoots of individual plants. They find that the two species differ significantly with respect to the length, diameter, and thickness of the outer zone of parenchyma. Allometric relationships are stronger for *P. multiflorum* [21].

#### **5. Theme 4: Interactions of Woody Plants within a Changing Environment**

The four papers addressing this theme describe investigations of the adaptations and responses of plants to the changing environment and provide feedback on forest management at specific sites. To determine the impact of drought on leaf phenology and spring frost susceptibility in nine provenances of *Quercus robur*, an economically and ecologically important tree species in the plant family Fagaceae, Čehulić et al. expose one-year-old saplings to experimental drought, re-water them, and then score leaf phenology and frost injury. They find that leaf phenology from most provenances is significantly influenced by the drought treatment. Drought induces a carry-over effect on flushing phenology. In contrast to flushing, autumn leaf phenology is unambiguously delayed following the drought treatments for all studied provenances. This higher susceptibility to spring frost is most likely caused by the advanced flushing phenology, which results from the drought treatment in the previous year [22].

To explore the effect of heritable phenotypic plasticity in the adaptation of woody species to a quickly changing environment, Vander Mijnsbrugge and Janssens observe the timing of bud burst, flower opening, leaf senescence, and leaf fall in two successive years in a common garden of *Crataegus monogyna* from Rosaceae. They find that a strong auto-correlation is present among the spring phenophases as well as among the autumnal phenophases, with spring phenophases being negatively correlated with fall phenophases. The strongest between-provenance differentiation is found for the timing of bud burst in spring. Warmer spring temperatures advance the timing of bud burst. However, advancement is non-linear among the provenances. It can be hypothesized that non-local provenances display larger temporal phenotypic plastic responses in the timing of their spring phenophases compared to local provenances when temperatures in the common garden deviate more from their home-sites [23].

The impacts of lime application, understory removal, and their interactive effects on soil microbial communities are tested by Wan et al., who conduct a lime application experiment combined with understory removal in a subtropical plantation of *Eucalyptus* (Myrtaceae). They find that lime application significantly decreases both fungal and bacterial phospholipid fatty acids (PLFAs). Understory removal reduces the fungal PLFAs but has no effect on the bacterial PLFAs. Changes in soil microbial communities caused by the lime application are mainly attributed to increases in soil pH and NO₃⁻-N contents, while changes caused by understory removal are mainly due to the indirect effects on soil microclimate and the decreased soil-dissolved carbon contents. Furthermore, both lime application and understory removal significantly reduce the litter decomposition rates, which may impact the microbe-mediated soil ecological process. They suggest that lime applications may not be suitable for the management of subtropical *Eucalyptus* plantations [24].

Wu, Li, Chen et al. examine the effects of two native phosphate solubilizing bacteria (PSB), and a mixture of both strains on the growth of seedlings of *Camellia oleifera* (Theaceae). They report a significant promotion of the growth of *C. oleifera* plants by three inoculation treatments. All the PSB inoculation treatments can improve the leaf N and P content and have positive effects on the available N, P, and K content of the rhizosphere soil. A co-inoculation of the two native PSB strains causes a synergistic effect and achieves the best benefit. PSB can convert the insoluble phosphates into plant-available forms and may have the potential for use in sustainable agricultural practices [25].

#### **6. Summary and Future Directions**

We are pleased to present this Special Issue and believe that many of the studies included here from across the world will make a lasting contribution to biology, ecology, and forestry at diverse scales in the era of changing climate and rapid urbanization. All of the case studies here highlight the important role of emerging techniques, new methods, and novel theories to promote the development of forest biology and plant ecology. We expect that subsequent contributions to this field might consider plant biology from the perspectives of morphology, genetics, trait function, and plant–environment interactions with biotic and abiotic factors [1,2,5,26–30]. Such studies may provide novel insights and new knowledge on quantitative evaluation and description of interactions of plants with animals and microbes, both in natural and urban environments, including terrestrial and aquatic systems.

**Author Contributions:** N.P. and W.J.K. proposed and guest-edited the Special Issue and wrote this editorial together. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (31570594), Fundamental Research Funds of CAF (CAFYBB2017QB002), Research Funds of Guangdong Academy of Sciences (2020GDASYL-20200401001), and CFERN & BEIJING TECHNO SOLUTIONS Award Funds on excellent academic achievements.

**Acknowledgments:** We would like to acknowledge the contributions made by the authors and all reviewers of the 19 manuscripts in this *Genetic and Morphological Variation in Tropical and Temperate Plant Species* Special Issue.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Physiological and Transcriptome Analysis of a Yellow-Green Leaf Mutant in Birch (***Betula platyphylla × B. pendula***)**

#### **Huixin Gang, Guifeng Liu, Su Chen and Jing Jiang \***

State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, 26 Hexing Road, Harbin 150040, China; gang\_em@163.com (H.G.); liuguifeng@126.com (G.L.); chensunefu@163.com (S.C.)

**\*** Correspondence: jiangjing@nefu.edu.cn; Tel.: +86-139-4602-6246

Received: 19 December 2018; Accepted: 1 February 2019; Published: 2 February 2019

**Abstract:** Chlorophyll (Chl)-deficient mutants are ideal materials for the study of Chl biosynthesis, chloroplast development, and photosynthesis. Although the genes encoding key enzymes related to Chl biosynthesis have been well-characterized in herbaceous plants such as rice (*Oryza sativa* L.), Arabidopsis (*Arabidopsis thaliana*), and maize (*Zea mays* L.), yellow-green leaf mutants have not yet been fully studied in tree species. In this work, we explored the molecular mechanism of leaf color formation in a yellow-green leaf mutant (*yl*). We investigated the differentially expressed genes (DEGs) between *yl* and control plants (wild type birch (WT) and *BpCCR1* overexpression line 11 (C11)) by transcriptome sequencing. A total of 1163 genes (874 down-regulated and 289 up-regulated) and 930 genes (755 down-regulated and 175 up-regulated) were found to be differentially expressed in *yl* compared with WT and C11, respectively. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for DEGs revealed that photosynthesis antenna proteins represent the most significantly enriched pathway. The expression of photosynthesis antenna proteins is crucial to leaf color formation in *yl*. We also found that Chl accumulation, leaf anatomical structure, photosynthesis, and growth were affected in *yl*. Taken together, our results not only document the differences in phenotypic, physiological, and gene expression characteristics in leaves between the *yl* mutant and control plants, but also provide a new insight into the mutation underlying the chlorotic leaf phenotype in birch.

**Keywords:** yellow-green leaf mutant; transcriptome; antenna protein; photosynthesis; birch

#### **1. Introduction**

Birch (*Betula*), a genus of tall deciduous trees in the family Betulaceae, contains approximately 20 taxa. Birch is an ecologically important tree native to parts of China, Siberia, Korea, Japan, and Russia that has been introduced to many northern areas of the world [1]. It grows fast and has a high tolerance that allows it to be used for revegetation and reforestation [2]. Birch also plays an important role in the forestry industry as a source of timber, fuelwood, plywood, pulpwood, and furniture [3–6]. Recently, there has been an increased interest in plants with colored leaves. *Betula pendula* 'Purple Rain', an intraspecific variety of *B. pendula*, has been used as a decorative plant for its purple leaves [7]. Previously, a yellow-green leaf mutant (*yl*) was derived from cinnamoyl-CoA reductase (*BpCCR1*)-overexpressing transgenic birch plants. The *yl* mutant displayed a distinct yellow-green phenotype, while the leaves of all the other *BpCCR1*-overexpressing lines had a normal color, the same as the wild type birch (WT). One of the *BpCCR1*-overexpressing lines, C11, which has green leaves, was used as a control [8]. The *yl* mutant is a very valuable resource because a yellow-green leaf is one of the most popular traits in landscape greening.

Leaf color variation is one of the most commonly observed mutant traits in higher plants because it is easily discovered. To date, many Chl-deficient mutants have been identified in herbaceous and woody plants, including Arabidopsis [9], rice [10], maize [11], wheat (*Triticum aestivum* L.) [12], cotton (*Gossypium barbadense* L.) [13], and tea (*Camellia sinensis* (L.) O. Kuntze) [14]. It has been reported that a yellow-green leaf mutant (*siygl1*) of foxtail millet (*Setaria italica* L.), isolated following ethyl methanesulfonate (EMS) treatment, was due to the loss of function of the *SiYGL1* gene. The identification of the *SiYGL1* gene, which encodes Mg-chelatase ATPase subunit D, facilitated the understanding of the biological processes of chlorophyll (Chl) biosynthesis in millet [15]. Another yellow-green leaf mutant (*ygl8*) was shown to be controlled by the *Ygl8* gene, which encodes a chloroplast-targeted uridine monophosphate (UMP) kinase and affects chloroplast development in rice [16]. Therefore, Chl-deficient mutants are valuable genetic materials for exploring the molecular mechanisms of Chl biosynthesis and regulation, chloroplast development, plastid-to-nucleus signal transduction, and photosynthesis.

Next-generation sequencing (NGS) technologies are powerful tools for advanced research in many areas, for example, genome and transcriptome sequencing of animals, plants, and microbes, where they yield high-throughput, high-speed, and high-accuracy sequencing data [17–19]. Using transcriptome sequencing, Wang analyzed a Chl-deficient chlorina tea plant cultivar and reported the molecular mechanisms of the chlorina tea phenotype [14]. A study of a *Lagerstroemia indica* yellow leaf mutant using transcriptome analysis revealed the formation pathway of the yellow leaf mutant and discovered novel candidate genes related to leaf color [20]. The development of NGS technologies has increased the rate and efficiency of gene discovery and permitted a deeper understanding of gene expression networks.

In this work, we performed transcriptome sequencing for *yl* with a yellow-green leaf phenotype, *BpCCR1* overexpression line 11 (C11), and wild type birch (WT) with a normal green leaf phenotype to analyze gene expression in these plants and elucidate the molecular mechanisms related to the different phenotypes. In addition, comparative physiological studies were conducted to investigate the phenotypic differences between *yl* and control (WT and C11) plants. This study improved our understanding of the yellow-green phenotype in birch.

#### **2. Materials and Methods**

#### *2.1. Materials*

A birch (*Betula platyphylla* × *B. pendula*) Chl-deficient mutant *yl*, *BpCCR1* overexpression line 11 (C11), and wild type birch (WT) were used as the experimental materials. The *yl* mutant was derived from *BpCCR1* overexpression lines. All the plants were grown in pots with dimensions of 8 × 8 cm and substrata of 9 cm under natural conditions and were well watered at the birch breeding base, Harbin, China. Mature leaves from the new stems were collected in the spring. Immediately after harvest, samples were frozen in liquid nitrogen and stored at −80 °C for RNA extraction.

#### *2.2. Methods*

#### 2.2.1. Measurement of Growth Traits and Pigment Content

Forty plants of each line (WT, C11, and *yl*) were used for the measurement of plant height. Each value was the average of the measurements.

Fresh leaves of WT, C11, and *yl* plants (from the first to the sixth leaf) were collected during the growing season and used for the measurement of pigment contents, according to the method of Lichtenthaler [21]. The first leaf was the youngest and the sixth leaf was the oldest on the main stem. Chl and carotenoid (Car) were extracted with 80% acetone at 4 °C for 24 h in the dark, and their concentrations were then calculated from the absorbance at 470 nm, 646 nm, and 663 nm measured in a UV-Vis spectrophotometer (TU-1901, Persee, China). Contents of Chl a (mg/g), Chl b (mg/g), and Car (mg/g) were calculated as follows, where *V* is the extract volume and *M*<sub>fresh</sub> is the fresh mass of the leaf sample:

$$C_{\text{Chl a}}\ (\text{mg/L}) = 12.21\,A_{663} - 2.81\,A_{646} \tag{1}$$

$$C_{\text{Chl b}}\ (\text{mg/L}) = 20.13\,A_{646} - 5.03\,A_{663} \tag{2}$$

$$C_{\text{Car}}\ (\text{mg/L}) = 4.37\,A_{470} + 2.11\,A_{663} - 9.10\,A_{646} \tag{3}$$

$$\text{Chl a}\ (\text{mg/g}) = C_{\text{Chl a}}\ (\text{mg/L}) \times V\ (\text{L}) \,/\, M_{\text{fresh}}\ (\text{g}) \tag{4}$$

$$\text{Chl b}\ (\text{mg/g}) = C_{\text{Chl b}}\ (\text{mg/L}) \times V\ (\text{L}) \,/\, M_{\text{fresh}}\ (\text{g}) \tag{5}$$

$$\text{Car}\ (\text{mg/g}) = C_{\text{Car}}\ (\text{mg/L}) \times V\ (\text{L}) \,/\, M_{\text{fresh}}\ (\text{g}) \tag{6}$$

$$\text{Chl a/b ratio} = \text{Chl a} \,/\, \text{Chl b} \tag{7}$$
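Equations (1)–(7) can be applied in a few lines of code. The following is a minimal sketch, not the authors' own script; the function name `pigment_contents` and its argument names are hypothetical, and the absorbance values in the usage note are made up for illustration only.

```python
def pigment_contents(a470, a646, a663, volume_l, fresh_mass_g):
    """Apply Equations (1)-(7) to one leaf extract.

    a470, a646, a663: absorbance readings at 470, 646, and 663 nm.
    volume_l: extract volume V in litres.
    fresh_mass_g: fresh mass of the leaf sample in grams.
    Returns a dict with Chl a, Chl b, Car (mg/g fresh mass) and the Chl a/b ratio.
    """
    # Equations (1)-(3): pigment concentrations in the extract (mg/L)
    c_chl_a = 12.21 * a663 - 2.81 * a646
    c_chl_b = 20.13 * a646 - 5.03 * a663
    c_car = 4.37 * a470 + 2.11 * a663 - 9.10 * a646

    # Equations (4)-(6): convert mg/L to mg per gram of fresh leaf
    scale = volume_l / fresh_mass_g
    chl_a = c_chl_a * scale
    chl_b = c_chl_b * scale
    car = c_car * scale

    # Equation (7): Chl a/b ratio
    return {
        "chl_a": chl_a,
        "chl_b": chl_b,
        "car": car,
        "chl_a_b_ratio": chl_a / chl_b,
    }
```

For example, `pigment_contents(0.5, 0.3, 0.8, volume_l=0.01, fresh_mass_g=0.1)` converts three hypothetical absorbance readings from a 10 mL extract of 0.1 g of leaf tissue into pigment contents per gram of fresh mass.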

#### 2.2.2. Light Microscopy

The fourth leaves of WT, C11, and *yl* were used as samples and fixed in FAA (formaldehyde, glacial acetic acid and 50% ethyl alcohol, V:V:V = 1:1:18) for 24 h, dehydrated in a graded ethanol series and xylene, and then embedded in paraffin wax. Sections (10 μm thick) were stained with safranine and fast green dyes. The cell structures of the samples were examined and photographed using an Olympus DP26 digital camera (Olympus, Tokyo, Japan). Five leaf anatomical features, including lamina thickness (LT), adaxial epidermis thickness (UE), abaxial epidermis thickness (LE), palisade parenchyma thickness (PT), and spongy parenchyma thickness (ST), were examined with the cellSens Entry software. Additionally, the palisade parenchyma/mesophyll ratio was calculated. Four positions of each section were measured and each value of leaf anatomical feature was the average of 20 measurements from five individual plants.
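The averaging scheme described above (four positions per section, five plants, 20 measurements per feature) can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and it assumes that mesophyll thickness equals palisade plus spongy parenchyma thickness (PT + ST), which the text does not state explicitly.

```python
from statistics import mean


def summarize_leaf_anatomy(pt_um, st_um):
    """Average palisade (PT) and spongy (ST) parenchyma thickness.

    pt_um, st_um: one inner list per plant, one value (in micrometres)
    per measured position, e.g. 5 plants x 4 positions = 20 values.
    Returns (mean PT, mean ST, palisade/mesophyll ratio).
    """
    # Pool all positions from all plants into one list per feature
    pt_all = [value for plant in pt_um for value in plant]
    st_all = [value for plant in st_um for value in plant]

    pt_mean = mean(pt_all)
    st_mean = mean(st_all)

    # Assumption: mesophyll thickness = PT + ST
    ratio = pt_mean / (pt_mean + st_mean)
    return pt_mean, st_mean, ratio
```

Pooling all 20 values before averaging gives the same result as averaging the per-plant means here, because every plant contributes the same number of positions.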

#### 2.2.3. Leaf Gas-Exchange Measurement

The net photosynthetic rate (Pn) of the fourth leaves from WT, C11, and *yl* was measured using a Li-6400 portable photosynthesis system (LI-COR Inc., Lincoln, NE, USA) between 9:00 and 11:00 am on sunny days. The CO2 concentration was controlled at 400 μmol mol<sup>−1</sup>. Relative air humidity was about 50% and leaf temperature was about 28 °C. The default red/blue LED light source (LI6400-02B) was used. Light-response curves were measured at light intensities of 2000, 1800, 1500, 1200, 1000, 800, 600, 400, 200, 100, 50, 20, and 0 μmol photons m<sup>−2</sup> s<sup>−1</sup>. Leaves were light-adapted for about 15 min at the initial light step before CO2 measurement, and values were then recorded once they had stabilized at each light step. The measurements of WT, C11, and *yl* were made under the same conditions (including the time interval of illumination). The averaged values of each light step for each plant were used in the light-response curve.

#### 2.2.4. RNA Extraction, Library Construction, and RNA-seq

The fourth leaves of WT, C11, and *yl* were used as samples. Total RNA was extracted from leaf samples using the CTAB (cetyltrimethylammonium bromide) method [22]. In brief, the samples were individually ground in a mortar with liquid nitrogen and then incubated with 2% CTAB (supplemented with 2% β-mercaptoethanol) at 65 °C in a water bath for 5 min. The samples were centrifuged at 13,400 × *g* for 10 min and an equal volume of chloroform was added to the supernatant. After centrifugation at 13,400 × *g* and 4 °C for 10 min, a half volume of ethyl alcohol and 0.8 volumes of 5 mol/L LiCl were added to the supernatant. After standing for 10 min, the samples were centrifuged at 13,400 × *g* and 4 °C for 20 min. The precipitates were then washed in 70% ethanol and dried. RNA was dissolved in diethylpyrocarbonate (DEPC)-treated water and treated with DNaseI. RNA quality, purity, and integrity were assessed by 1% agarose gel electrophoresis, a NanoDrop2000 microvolume spectrophotometer (Thermo, Waltham, MA, USA), and an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA, USA), respectively. RNAs from three independent replicates were mixed in equal volumes. Poly(A) mRNA was enriched using oligo(dT) magnetic beads and cleaved into short fragments with fragmentation buffer. These short fragments were used as templates to synthesize the first-strand cDNA. Second-strand cDNA was synthesized using buffer, dNTPs, RNaseH, and DNA polymerase I. Purified cDNA was used as a template for PCR amplification and library construction. Lastly, the library was sequenced on an Illumina HiSeq™ 2500 platform by Biomarker Technology Company (Beijing, China). CTAB, β-mercaptoethanol, chloroform, ethyl alcohol, LiCl, and DEPC were purchased from Sigma-Aldrich (St. Louis, MO, USA). DNaseI, dNTPs, RNaseH, and DNA polymerase I were purchased from Promega (Madison, WI, USA).

#### 2.2.5. Gene Annotation and DEG Analysis

To elucidate the reason for the distinct phenotype observed in *yl*, we explored the gene expression of WT, C11, and *yl* at the molecular level. After removing adapters and low-quality sequences, transcriptome sequencing yielded an average of approximately 3.46 Gb of RNA-seq data per sample, with 91.28% Q30 bases and 46.28% GC content. More than 82.21% of the clean reads were mapped to the birch reference genome [23] using TopHat2 [24]. Gene functions were annotated using the Nr, Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), Eukaryotic Orthologous Groups (KOG), Clusters of Orthologous Groups (COG), and Gene Ontology (GO) databases. The expression level of each gene was calculated as FPKM (fragments per kilobase per million mapped reads). Differentially expressed genes (DEGs) in each pairwise comparison were defined using a fold change ≥2 and an FDR (false discovery rate) <0.01 as thresholds, based on the statistical analysis performed with EBSeq. The percentages of DEGs in the GO classification were calculated as follows:

$$\text{Percentage of genes} = \frac{\text{Number of DEGs in a specific term}}{\text{Number of all DEGs}} \times 100\% \tag{8}$$
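The DEG thresholds and Equation (8) can be sketched as follows. This is a minimal illustration, assuming that the fold-change criterion applies in both directions (at least 2-fold up or down); the function names and example numbers are hypothetical, and the sketch does not reproduce the EBSeq statistics that generate the FDR values.

```python
def is_deg(fold_change, fdr, min_fold_change=2.0, max_fdr=0.01):
    """Return True if a gene passes the fold-change and FDR thresholds.

    fold_change is the expression ratio between the two samples (> 0).
    Assumption: a gene counts as a DEG when it changes at least
    min_fold_change-fold in either direction.
    """
    changed = fold_change >= min_fold_change or fold_change <= 1.0 / min_fold_change
    return changed and fdr < max_fdr


def go_term_percentage(n_degs_in_term, n_all_degs):
    """Eq. (8): share of all DEGs that fall into one GO term, in percent."""
    return n_degs_in_term / n_all_degs * 100.0


# Hypothetical example: 50 of 200 DEGs annotated with a given GO term
share = go_term_percentage(50, 200)
```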

KEGG pathway terms with corrected enrichment *p* values less than 0.05 (Fisher's exact test) were considered to be significantly enriched. The genes involved in photosynthesis-antenna proteins were extracted according to the functional annotation information of the genes.
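For readers unfamiliar with pathway enrichment testing, the sketch below computes a one-sided enrichment *p* value from the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the corresponding 2 × 2 table. It is an assumption-laden illustration: the function name and the toy counts are hypothetical, and the multiple-testing correction mentioned above is not implemented because the correction procedure is not specified here.

```python
from math import comb


def enrichment_p_value(degs_in_pathway, n_degs, genes_in_pathway, n_genes):
    """P(X >= degs_in_pathway) when n_degs genes are drawn without replacement
    from n_genes annotated genes, genes_in_pathway of which belong to the pathway."""
    upper = min(n_degs, genes_in_pathway)
    total = comb(n_genes, n_degs)
    favourable = sum(
        comb(genes_in_pathway, i) * comb(n_genes - genes_in_pathway, n_degs - i)
        for i in range(degs_in_pathway, upper + 1)
    )
    return favourable / total


# Toy example: 10 annotated genes, 4 in the pathway, 3 DEGs, 2 of them in the pathway
p = enrichment_p_value(2, 3, 4, 10)
```

A small *p* value indicates that the pathway contains more DEGs than expected if DEGs were drawn at random from the annotated genes.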

#### 2.2.6. RNA Extraction and Quantitative RT-PCR

Total RNA was extracted from the functional leaves of the WT, C11, and *yl* lines as described in Section 2.2.4 and treated with DNaseI (Promega, Madison, WI, USA). cDNA was synthesized from 1 μg of RNA from each of WT, C11, and *yl* using a ReverTra Ace® qPCR RT Kit (Toyobo, Osaka, Japan), according to the manufacturer's instructions. The procedure was as follows: RNA was incubated with 5× RT Master Mix at 37 °C for 15 min, 50 °C for 5 min, and 98 °C for 5 min, and then diluted 10-fold with nuclease-free water. Quantitative (q)RT-PCR was performed on a 7500 real-time PCR system (Applied Biosystems, Darmstadt, Germany) using SYBR® Green PCR master mix (Toyobo, Osaka, Japan). Each qRT-PCR reaction (20 μL in total) contained 2 μL of cDNA, 10 μL of 2× SYBR Green Real-time PCR Master mix, 0.5 μL of each PCR primer, and 7 μL of nuclease-free water. qRT-PCR conditions were as follows: 95 °C for 30 s, followed by 45 cycles of 95 °C for 15 s and 60 °C for 45 s, and a final extension at 72 °C for 30 s. The results were calculated using the 2<sup>−ΔΔCt</sup> method [25], and *Bp18S rRNA* was selected as the internal control gene [26]. Primer sequences are listed in Table 1.
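The 2<sup>−ΔΔCt</sup> calculation can be sketched as below. This is an illustrative sketch only; the function name, argument names, and the Ct values in the example are hypothetical, with the reference gene playing the role of *Bp18S rRNA* and the control sample the role of WT.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) method."""
    # dCt: normalize the target gene to the reference gene within each sample
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # ddCt: compare the sample of interest with the control sample
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target amplifies 3 cycles later in the mutant
fold = relative_expression(ct_target_sample=27.0, ct_ref_sample=12.0,
                           ct_target_control=24.0, ct_ref_control=12.0)
```

In this hypothetical case ΔΔCt is 3, so the target transcript is estimated at one eighth of the control level.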

#### 2.2.7. Statistical Analysis

Data were analyzed using SPSS statistics software, version 19.0 (International Business Machines, Armonk, NY, USA). Differences between the means of each line in leaf anatomical features and plant height were determined using one-way analysis of variance and the Duncan multiple comparison procedure. A *p* value less than 0.05 was considered statistically significant; means that differed significantly were labeled with different letters, and means that did not differ significantly were labeled with the same letter. The correlations between pigment contents and Pn, stomatal conductance (Gs) and Pn, Gs and transpiration rate (Tr), and Tr and Pn were evaluated using Pearson's correlation coefficients. The relationship between Tr and Pn was tested for linear, exponential, and logarithmic functions, and the best-fit regressions were selected. Coefficients of determination (*R*<sup>2</sup>) and equations were obtained from nonlinear regression analysis of Tr and Pn using Origin Pro software, version 8.1 (OriginLab, Northampton, MA, USA).
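For reference, Pearson's correlation coefficient can be computed as in the following sketch. The analyses in this study were run in SPSS and Origin Pro; the function below is only a plain-Python illustration, and the paired values in the example are hypothetical rather than measured data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical paired readings (e.g., Gs and Pn at four light steps)
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```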


**Table 1.** The primer sequences used in qRT-PCR.

#### **3. Results**

#### *3.1. Pigment Contents and Leaf Anatomical Structure of Chl-Deficient Mutant yl*

Previously, we transformed the *BpCCR1* gene into a hybrid birch, WT (*Betula platyphylla* × *B. pendula*), using *Agrobacterium tumefaciens* and obtained 19 *BpCCR1*-overexpressing transgenic lines. Among them, one transgenic line (*yl*) displayed a yellow-green leaf phenotype that was distinct from the other transgenic lines, including C11. During the growing season, the *yl* mutant exhibited yellow-green leaves, while C11 and WT exhibited green leaves (Figure 1a,b). To investigate this difference, we measured the pigment contents and leaf anatomical structure of *yl*. Chl and Car contents were measured from the first leaf to the sixth leaf. The pigment contents, including Chl a, Chl b, and Car, increased from the first leaf to the sixth leaf in all samples (Figure 1c). However, all pigment levels were lower in the leaves of *yl* than in those of WT and C11. Chl a in *yl* was 44–65% lower than in WT and 41–57% lower than in C11. Chl b in *yl* was 62–79% lower than in WT and 61–73% lower than in C11. Similarly, Car in *yl* was 40–55% lower than in WT and 30–54% lower than in C11. The ratio of Chl a to Chl b was increased in all leaves of *yl*.

We then investigated the leaf anatomical structure of WT, C11, and *yl*, as leaves are important organs for photosynthesis, and their anatomical structures could affect photosynthetic and physiological activities. A significant reduction in the lamina thickness of *yl* leaves compared with the leaves of WT and C11 was observed in transverse sections of the leaf blades (Figure 2a). Also, adaxial epidermis, abaxial epidermis, palisade parenchyma, and spongy parenchyma were reduced in thickness in *yl* leaves (Figure 2b). However, there was no difference in the ratio of palisade to spongy parenchyma.

#### *3.2. Changes in Photosynthesis and Growth*

To examine the effect of low pigment contents on photosynthesis in the *yl* mutant, we measured the photosynthetic rate of WT, C11, and *yl*. The *yl* line had a lower net photosynthetic rate and a lower transpiration rate than WT and C11 at all light intensities tested (Figure 3a,b).

We then investigated the correlations between pigment contents, transpiration rate, stomatal conductance, and Pn. The Pn values were strongly correlated with total Chl, Chl a, Chl b, and Car under high light intensity (*p* < 0.05). The correlations between pigment contents and Pn were weaker under low light intensity, suggesting that not all pigments participated in photosynthesis under low light intensity (Table S1). In addition, Pn and Tr values displayed a significant positive correlation (*p* < 0.01) with Gs in WT, C11, and *yl* plants. Similarly, there was a highly significant correlation (*p* < 0.01) between Pn and Tr in WT, C11, and *yl* plants (Table 2). We then analyzed the scatter plots of Pn against Tr and found that the exponential decay function gave the best fit. The curve could be divided into two stages. In the first stage, Pn increased with rising Tr and the curve was approximately linear; Gs may be the primary limiting factor in this stage. In the second stage, Pn increased slowly or remained constant as Tr increased; Gs may not be the primary limiting factor in this stage (Figure 4a–c).

To explore the effect of low photosynthesis on growth, we measured the height of WT, C11, and *yl* plants. The heights of one-year-old WT and C11 plants were 36.7 cm and 37.0 cm, respectively, while the *yl* mutant was 30.9 cm, about 84% and 83% of WT and C11, respectively. These results show that *yl* grew more slowly than WT and C11 (Figure 3c).

**Figure 1.** Growth performance and pigment content of wild type birch (WT), *BpCCR1* overexpression line 11 (C11) and yellow-green leaf mutant (*yl*) lines. (**a**) Growth performance of one-year-old WT and *yl* plants. (**b**) The leaves from first to sixth of WT, C11, and *yl* lines. (**c**) Chl a, Chl b, Car, and Chl a/b in first to sixth leaves of WT, C11, and *yl* lines. Error bars represent the standard deviation (SD) of three independent experiments.

**Figure 2.** Leaf anatomical characteristics of WT, C11, and *yl*. (**a**) Leaf transections of WT, C11, and *yl* lines. (**b**) Leaf anatomical structure of WT, C11, and *yl*. Lamina thickness, LT. Adaxial epidermis thickness, UE. Abaxial epidermis thickness, LE. Palisade parenchyma thickness, PT. Spongy parenchyma thickness, ST. Error bars represent the standard deviation (SD) of 20 measurements from five individual plants.

**Figure 3.** Photosynthetic and growth performance of WT, C11, and *yl* lines. Photosynthetic rate (**a**) and transpiration rate (**b**) of WT, C11, and *yl*. Error bars represent the SD of three measurements. (**c**) Plant height of WT, C11, and *yl*. Error bars represent the SD of 40 measurements.

**Table 2.** The correlation analysis of transpiration rate (Tr), stomatal conductance (Gs), and Pn in WT, C11, and *yl*. \*\* Correlation is significant at the 0.01 level (2-tailed).


**Figure 4.** The scatter plots between Pn and Tr in WT, C11, and *yl*.

#### *3.3. Differentially Expressed Genes between yl and Control Plants*

To identify DEGs, two transcriptome comparisons were carried out: C11 vs. *yl* and WT vs. *yl*. As a result, 1163 genes (874 down-regulated and 289 up-regulated) were found to be differentially expressed in *yl* compared with C11, and 930 genes (755 down-regulated and 175 up-regulated) were differentially expressed in *yl* compared with WT (Figure 5a).

GO categories were then assigned to evaluate the potential functions of these DEGs according to the biological process, cellular component, and molecular function ontologies. For biological processes, the DEGs were classified into twenty categories, and the three most overrepresented terms were cellular process, single-organism process, and metabolic process. For cellular components, many DEGs were assigned to cells, cell parts, and organelles. For molecular functions, most of the DEGs participated in binding and catalytic activity, as shown in Figure 5b.

**Figure 5.** Differentially expressed genes (DEGs) analysis based on Gene ontology. (**a**) Number of DEGs in C11 vs. *yl* and WT vs. *yl*. (**b**) Gene ontology classification of DEGs in C11 vs. *yl* and WT vs. *yl*.

#### *3.4. KEGG Pathway Analysis of DEGs*

The DEGs were further subjected to KEGG pathway analysis to identify the enriched biological pathways. The DEGs of C11 vs. *yl* and WT vs. *yl* were mapped to 81 and 85 KEGG pathways, respectively. Among them, two pathways, photosynthesis-antenna proteins and phenylalanine metabolism, were significantly enriched at a cut-off of *p* < 0.05 and FDR < 0.05 in both C11 vs. *yl* and WT vs. *yl*. Photosynthesis-antenna proteins was the most significantly enriched pathway in the DEGs of *yl* compared to WT and C11 (Figure 6a,b).

**Figure 6.** KEGG-based pathway enrichment of DEGs in C11 vs. *yl* and WT vs. *yl*.

We then explored the photosynthesis-antenna proteins pathway in more detail. A total of twenty-one *Lhc* (light-harvesting complex) genes related to the antenna proteins were found in the birch genome. Of these, seven were differentially expressed in *yl* compared to control plants, and six displayed low or undetectable levels of expression in all samples. One gene (Bpev01.c0243.g0056.m0001) annotated as the light-harvesting Chl a/b-binding protein Lhca3 of Photosystem I, three genes (Bpev01.c0362.g0012.m0001, Bpev01.c0264.g0036.m0001, Bpev01.c1767.g0010.m0001) annotated as the light-harvesting Chl a/b-binding protein Lhcb1, and one gene (Bpev01.c1040.g0049.m0001) annotated as the light-harvesting Chl a/b-binding protein Lhcb2 of Photosystem II showed significantly reduced transcript levels in *yl*. Two genes (Bpev01.c0190.g0044.m0001, Bpev01.c0841.g0007.m0001) were annotated as the light-harvesting Chl a/b-binding protein Lhcb4 of Photosystem II; Bpev01.c0190.g0044.m0001 was down-regulated, whereas Bpev01.c0841.g0007.m0001 was up-regulated (Figure 7a,b). These results suggested that the changes in the photosynthesis-antenna proteins pathway were important to the unique phenotype of *yl*.

**Figure 7.** Expression pattern of genes involved in light-harvesting Chl complex in WT, C11, and *yl*. (**a**) DEGs in the pathway of light-harvesting Chl complex. Green box represents down-regulated genes. Blue box represents down-regulated and up-regulated genes. (**b**) Expression of all genes involved in light-harvesting Chl complex in WT, C11, and *yl*.

#### *3.5. qRT-PCR Verification of RNA-seq*

To test the reliability of RNA-Seq, we selected 12 functionally important and representative genes for validation using qRT-PCR, including two non-DEGs, five down-regulated genes, and five up-regulated genes (Table 3). The expression of all these genes obtained via qRT-PCR showed a similar pattern to that detected by transcriptome sequencing (Figure 8).

**Table 3.** The genes used for qRT-PCR.

**Figure 8.** Quantitative real-time PCR verification of RNA-seq. Error bars represent the SD of three measurements.

#### **4. Discussion**

In this study, we reported a Chl-deficient mutant *yl* that produced yellow-green leaves in birch. The mutant was isolated from *BpCCR1* transgenic lines in birch plant breeding. Physiological analysis and gene expression characterization of *yl* were performed to investigate the difference between *yl* and control plants (WT and C11).

Chl and Car are the main pigments that trap light energy in leaf tissue. It has been demonstrated that the leaves of Chl-deficient mutants always contain less Chl and Car. Some mutants also showed a change in the ratio of Chl a/b [16,27,28]. Consistent with previous research, pigment analysis of *yl* showed that Chl a, Chl b, and Car were reduced in both young and mature leaves compared to WT and C11. We also found that the ratio of Chl a/b was increased in the *yl* plants (Figure 1). Chl b is thought to be essential to the stability of the light-harvesting Chl a/b protein complex [29]. Thus, the decreased Chl content and increased Chl a/b ratio in *yl* indicated that there might be fewer light-harvesting antenna complexes than in the WT and C11 controls. Numerous Chl-deficient mutants have shown reduced amounts of light-harvesting complex (LHC) proteins in the thylakoid membranes of the chloroplast [30,31]. Andersson reported that Arabidopsis plants lacking the Lhcb1 and Lhcb2 proteins showed reduced Chl levels and an increased Chl a/b ratio [32]. According to the transcriptomic analysis, the DEGs of *yl* compared with both WT and C11 were significantly enriched in photosynthesis-antenna proteins (Figures 6 and 7). These results suggest that the expression changes of genes involved in photosynthesis-antenna proteins play an important role in the formation of the yellow-green leaf phenotype in *yl*.

Green plants absorb light energy to convert CO2 and water into carbohydrates and oxygen through photosynthesis. Photosynthesis is the key process that provides energy for catabolic processes and growth in plants. It has been reported that photosynthesis can be influenced by many environmental factors, such as the intensity, spectrum, and duration of illumination, mechanical wounding, and heating [33,34]. These stimuli induce the generation and propagation of variation potential [35]. Electron flow connected to pH is then changed and the light-harvesting complex is transferred to photosystem II (PSII). As a result, photosynthesis is decreased in plants [36]. Photosynthesis is a complex process. Changes in pigment contents, stomatal conductance, and gene expression may also affect photosynthesis [37]. Due to their reduced pigment contents, most chlorina mutants have a poorer photosynthetic performance than that observed in wild-type specimens [9,38]. However, this is not true for all chlorina mutants. For example, a chlorina rice mutant *Huangyu B* was found to have a higher photosynthetic efficiency than its wild type [39], and the photosynthetic rate of a Chl-deficient mutant *siygl1* in foxtail millet was even higher than that of Yugu1 plants during the reproductive growth stage [15]. In this study, *yl* showed reduced pigment contents, including Chl a, Chl b, and Car (Figure 1). The photosynthetic rate, stomatal conductance, and transpiration rate of the *yl* line were lower than those of WT and C11 under all light intensities tested (Figure 3). Additionally, the measurement of photosynthesis in WT, C11, and *yl* was performed under the same conditions. We also found significant positive correlations between stomatal conductance, pigment contents, and photosynthetic rate. RNA-seq results showed that many genes related to photosynthesis-antenna proteins were down-regulated in *yl* compared to WT and C11 (Figure 7).
As a result, the energy absorbed, trapped, and transferred in photosystem I (PSI) and PSII would probably be affected. Taken together, the decreased photosynthesis in *yl* was probably mainly due to the low expression level of antenna protein genes and the reduced photosynthetic pigment contents. The difference in Pn may underlie the retarded growth of one-year-old *yl* mutant plants (Figures 1 and 3).

The molecular mechanism of leaf color mutation is complex. Mutations in genes related to chloroplast development, or blocks in photosynthetic pigment biosynthesis, chloroplast protein transport, or phytochrome regulation, can lead to the formation of a yellow-green mutant [26–31]. Studies have shown that many genes are related to the yellow-green leaf phenotype in plants. Examples are cytokinin-responsive GATA transcription factor 1, *Cga1* [40]; chaperone protein ClpC, *ClpC1* [41]; signal recognition particle 43 kDa protein, *cpSRP43* [42]; chloroplast signal recognition particle subunit, *cpSRP54* [43]; metallo-beta-lactamase, *GRY79* [44]; nuclear transcription factor Y, *HAP3A* [45]; NADPH-dependent thioredoxin reductase C, *NTRC* [46]; protein stay-green, *SGR* [47]; and YbeY endoribonuclease, *YbeY* [38]. However, the expression of none of these genes changed in the *yl* transcriptome (Table 4).

**Table 4.** The expression of well-known genes related to leaf color at the transcriptional level in the *yl* mutant.


Plastid-to-nucleus retrograde signaling is considered to coordinate nuclear gene expression. Nott has summarized three independent retrograde signaling pathways from previous studies, including signals generated by Mg-Protoporphyrin IX (Mg-Proto IX), chloroplast gene expression, and the redox state of photosynthetic electron transport components [48]. One possibly important function of retrograde signaling is to coordinate the biosynthesis of Chl with the expression of genes for nuclear-encoded Chl-binding proteins, such as the Lhca and Lhcb proteins. Here, RNA-seq and qRT-PCR analysis showed that the transcription of Chl-binding protein genes (*Lhc* gene family) was down-regulated (Figures 7 and 8). Members of the *Golden2-like* (*GLK*) gene family have been reported to regulate chloroplast development in diverse plant species [49,50]. *GLK* genes are sensitive to retrograde signaling from the chloroplast and may therefore operate downstream of genes for plastid retrograde signaling [51]. The expression of *GLK* (Bp023762) in *yl* was only 0.7% of that in WT and 0.9% of that in C11 (Figure 8). In addition, the *PSRP1* gene (BP012524), encoding a ribosome-binding factor (plastid-specific ribosomal protein 1) that inhibits plastid translation by blocking tRNA-binding sites on ribosomes, was up-regulated in the *yl* mutant. These results suggest that plastid-to-nucleus retrograde signaling triggered in *yl* may regulate nuclear gene expression.

#### **5. Conclusions**

In this study, 1163 DEGs and 930 DEGs were obtained in *yl* compared with C11 and WT, respectively. The DEGs related to the photosynthesis-antenna proteins pathway were significantly enriched. In addition, the analysis of physiological characteristics showed that the yellow-green leaf mutant *yl* had reduced amounts of Chl, an increased Chl a/b ratio, and reduced leaf tissue thickness compared to control plants. Based on these results, we conclude that the altered expression of genes involved in the photosynthesis-antenna proteins pathway might be responsible for the lower pigment contents and the altered Chl a/b ratio, resulting in yellow-green leaves in *yl*.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/10/2/120/s1, Table S1: The correlation analysis between pigment contents and Pn.

**Author Contributions:** J.J. and S.C. designed the experiments; H.G. performed the experiments; S.C. analyzed the data; H.G. wrote the manuscript; J.J., G.L., and S.C. revised the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (No. 31570647) and the 111 Project (B16010).

**Acknowledgments:** We are grateful to Rui Wei and Wenbo Zhang for generating the transgenic material *yl*.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Transcriptome Analysis of Bael (***Aegle marmelos* **(L.) Corr.) a Member of Family Rutaceae**

#### **Prashant Kaushik <sup>1</sup> and Shashi Kumar <sup>2,</sup>\***


Received: 13 June 2018; Accepted: 10 July 2018; Published: 26 July 2018

**Abstract:** *Aegle marmelos* (L.) Corr. is a medicinally and horticulturally important tree member of the family Rutaceae. It is native to India, where it is also known as Bael. Despite its importance, the genomic resources of this plant are scarce. This study presented the first-ever report of expressed transcripts in the leaves of *Aegle marmelos*. A total of 133,616 contigs were assembled to 46,335 unigenes with minimum and maximum lengths of 201 bp and 14,853 bp, respectively. There were 7002 transcription factors and 94,479 simple sequence repeat (SSR) markers. The *A. marmelos* transcripts were also annotated based on information from other members of Rutaceae; namely *Citrus clementina* and *Citrus sinensis*. A total of 482 transcripts were annotated as cytochrome p450s (CYPs), and 314 transcripts were annotated as glucosyltransferases (GTs). In the *A. marmelos* leaves, the monoterpenoid biosynthesis pathway was predominant. This study provides an important genomic resource along with useful information about *A. marmelos*.

**Keywords:** *Aegle marmelos* (L.) Corr.; transcripts; transcriptome assembly; simple sequence repeats; transcription factors; cytochrome p450; glycotransferases; metabolic pathway

#### **1. Introduction**

*Aegle marmelos* (L.) Corr. (2n = 18), or Bael, is an underexploited member of the family Rutaceae. Believed to be native to the Indian subcontinent, it is well distributed throughout the tropical and subtropical belts of southeast Asia [1,2]. Botanically, *A. marmelos* is a deciduous tree growing up to 10 m in height that flowers during May–June [3,4]. It is also commonly grown as a horticultural plant in India, and its fruits are processed as juice or candies, as well as eaten fresh. During the past few decades, a spike in its cultivation as a horticultural plant has been attributed to its medicinal properties, along with a hardy nature that allows it to be cultivated on marginal lands with acidic or alkaline soils [5,6].

The traditional medicine system of Ayurveda in India routinely uses every part of *A. marmelos* as a therapy for medical conditions [7,8]. The leaves are the most easily accessible part, and are therefore used for treatments more regularly than any other plant part. *A. marmelos* leaves are used to treat jaundice and help in wound healing when applied as a paste on a wound surface [9]. Moreover, *A. marmelos* leaf extracts have been shown to be a better cure for gastrointestinal and hematopoietic damage than its fruits [10]. The leaf extract of *A. marmelos* is used as a medication against a number of chronic diseases such as diabetes, pancreatic cancer, and arthritis [11–14]. All of these medicinal properties of *A. marmelos* leaves are attributed to various phytochemicals present in the leaves, such as aeglin, rutin, γ-sitosterol, β-sitosterol, eugenol, marmesinin, glycoside, and skimmianine. Broadly, these phytochemicals can be divided into three main classes: alkaloids, phenylpropanoids, and terpenoids [15,16]. However, there is no genomic data-based information about the pathways of these important metabolic compounds that are present in the *A. marmelos* leaf. Information regarding the biosynthetic pathways and the enzymes encoded in the *A. marmelos* leaf will be highly useful for functional genomics in *A. marmelos* via transgenic and metabolic engineering approaches. Furthermore, *A. marmelos* leaf extract is used for the green synthesis of gold and silver nanoparticles [17,18].

The sum total of all of the transcripts captured in the cell of an individual organism is called its transcriptome [19]. There are two ways to capture the expressed transcripts: either by microarray, which is limited to predefined sequences, or by performing RNA-Seq using second-generation sequencing technologies [20]. This kind of sequencing has revolutionized the understanding of non-model organisms, and has evolved as one of the first choices of methods to apply to gene discovery and the expression profiling of non-model organisms [21,22]. The availability of well-defined computational tools, along with a well-applied methodology, has further demonstrated the effectiveness of de novo transcriptome assemblies in organisms even without a reference genome [23,24].

Genomic resources in *A. marmelos* are scarce compared with other members of Rutaceae, such as *Citrus sinensis* (Sweet Orange) and *Citrus clementina* (Clementine), both of which have well-annotated genomes [25,26]. Moreover, the unavailability of molecular markers based on genomic information has further slowed molecular breeding efforts in *A. marmelos*. Earlier, a diversity study was carried out using only 12 random amplified polymorphic DNA (RAPD) markers [1]. This limitation can be overcome by developing an appropriate resource of genomic information-based molecular markers using a next-generation sequencing (NGS)-based approach such as transcriptomics [20,22]. To the best of our knowledge, this is the first detailed report on the transcriptome of this medicinally important plant. Moreover, only six expressed sequence tags (ESTs) are available in the National Center for Biotechnology Information (NCBI) database (accessed on 25 May 2018) [27]. An investigation into the leaf transcriptome of *A. marmelos* can help answer key questions about genes and their functions through the pathways involved in the formation of metabolic compounds. Therefore, we used RNA sequencing followed by the de novo transcriptome assembly of *A. marmelos* leaves to identify the transcription factors, simple sequence repeats (SSRs), and transcripts related to important metabolic pathways in the leaves of *A. marmelos*. We also identified the cytochrome P450s (CYPs) and glucosyltransferases (GTs) expressed in the leaf of *A. marmelos*.

#### **2. Materials and Methods**

#### *2.1. RNA Isolation and Sequencing*

Young and tender leaves from three mature and healthy plants of *A. marmelos* variety "Kaghzi" (~five years old) were collected from the Government Garden Nursery (29°58′06.9″ N, 76°52′50.8″ E) in Haryana, India. The sampled leaf tissues were stored in RNAlater (Life Technologies, Carlsbad, CA, USA) until further use. RNA was extracted with a TRIZOL reagent (Life Technologies Corporation, Carlsbad, CA, USA)-based RNA extraction protocol for plant leaves [28,29]. The quality of the extracted RNA was checked on a 1% formaldehyde denaturing agarose gel, and the RNA was quantified using a Nanodrop ND-1000 spectrophotometer (Nanodrop Technologies, Montchanin, DE, USA). A pooled sample of RNA from the three selected plants was used for a single cDNA library preparation. The library was prepared with a TruSeq RNA Library Prep Kit v2 from Illumina® (Illumina Inc., San Diego, CA, USA), and the library was quantified using a Qubit Fluorometer (Qubit™ dsDNA HS Assay Kit, Life Technologies Corporation, Carlsbad, CA, USA) and an Agilent D1000 ScreenTape system (Agilent Technologies, Santa Clara, CA, USA). The library was then sequenced on the Illumina HiSeq 2500 (2 × 150 bp) platform (Illumina, Dedham, MA, USA).

#### *2.2. De Novo Assembly and Identification of Coding Sequences*

The cleaned reads were assembled using Trinity software (version 2.4.0). TransDecoder v. 3.0.1 (http://transdecoder.sourceforge.net/) [30] was then used to identify candidate coding regions within the generated transcripts; only open reading frames (ORFs) at least 100 amino acids long were retained in order to decrease the chance of false positives.
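To illustrate the effect of a 100-amino-acid length filter, the following sketch scans the three forward reading frames of a transcript for ATG-to-stop ORFs. It is a deliberately simplified stand-in and not TransDecoder's algorithm, which also considers the reverse strand and scores coding potential; the function name and the synthetic test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Return (start, end, frame) for forward-strand ORFs of at least min_aa codons.

    start is the 0-based index of the ATG; end is exclusive and includes
    the stop codon. The stop codon is not counted towards the ORF length.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (pos - start) // 3
                if n_aa >= min_aa:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Synthetic transcript: ATG + 99 alanine codons + stop = 100 amino acids
orfs = find_orfs("ATG" + "GCT" * 99 + "TAA")
```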

#### *2.3. Gene Function Annotation*

The transcripts with ORFs were annotated with BLASTX (default parameters, *e*-value cut-off of 1 × 10<sup>−5</sup>) by similarity search against the NR (non-redundant protein sequences database of NCBI), protein family (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) [31], and cluster of orthologous groups (COG) (https://www.ncbi.nlm.nih.gov/COG/) [32] databases. The gVolante server (https://gvolante.riken.jp) [33] was used to assess the completeness of the transcriptome assembly via BUSCO\_v3, selecting the plant ortholog set. Only transcripts pertaining to plant species were extracted and used for gene ontology. Pfam annotation was done with Hmmerscan, while Blast2GO was used for Gene Ontology (GO) annotation [34,35]. KEGG orthologies were assigned using the KEGG Automated Annotation Server (KAAS) with the single-directional best hit method (http://www.genome.jp/kegg/kaas/) [36].

#### *2.4. Identification of Transcription Factors*

Transcription factor families present in the leaves of *A. marmelos* were identified by searching the coding sequences identified by TransDecoder against the plant transcription factor database (PlnTFDB) (http://plntfdb.bio.uni-potsdam.de/v3.0/downloads.php) [37] with an *e*-value cut-off of < 1 × 10<sup>−10</sup>.

#### *2.5. Identification of Simple Sequence Repeats (SSRs)*

The presence of SSRs was determined by using the MIcroSAtellite identification tool v1.0 (MISA) (http://pgrc.ipk-gatersleben.de/misa/) [38]. Briefly, the minimum number of repeat units required to call an SSR was 10 for monorepeats, six for direpeats, and five for tri-, tetra-, penta-, and hexarepeats.
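The repeat thresholds above can be expressed as a short regular-expression search. The following Python sketch is only an illustration of the thresholding idea and is not the MISA implementation; in particular, it does not merge neighbouring repeats into compound SSRs, and the name `find_ssrs` is hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str):
    """Return a list of (motif_length, motif, repeat_count, start) tuples
    for perfect tandem repeats that meet the thresholds in MIN_REPEATS."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each SSR is reported at its true unit size.
            if any(unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len)):
                continue
            hits.append((unit_len, motif, len(match.group(0)) // unit_len, match.start()))
    return hits
```

For example, a run of ten adenines is reported as a monorepeat, whereas `AT` repeated five times falls below the direpeat threshold of six and is ignored.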

#### **3. Results**

#### *3.1. De Novo Assembly, Gene Prediction, and Functional Annotation*

RNA-Seq targeting expressed coding sequences has been used successfully in many medicinal and non-model plant species that do not have a reference genome (e.g., *Prosopis cineraria* L. [39], *Andrographis paniculata* Burm.f. [40], *Phyllanthus emblica* L. [41], *Picrorhiza kurroa* [42], and *Azadirachta indica* Royle ex Benth. [43]). Moreover, being a tree, *A. marmelos* can have a large genome size, which further restricts genome sequencing efforts [44].

The pooled RNA sample of *A. marmelos* leaves, with RIN values around 8.0, generated a total of 115.92 million high-quality paired reads (Phred score > 30). After adapter trimming, the reads were assembled with the Trinity assembler into a total of 133,616 contigs (only those of 200 bp and above in length), which clustered into 46,345 unigenes (Table 1). The raw sequencing data were submitted to NCBI BioProject (PRJNA433585). The assembly completeness report from gVolante estimated that the transcriptome assembly was 90.15% complete (Figure S1). Only ORFs that were at least 100 amino acids long were retained in order to decrease the chance of false positives during ORF prediction. The annotated transcripts with ORFs are listed in Table 2. A total of 90,525 transcripts were annotated to GO terms (Table S1). The transcripts related to plant species were extracted and used for gene ontology (Table 2).


**Table 1.** Assembly statistics of the leaf transcriptome.

**Table 2.** Annotation summary of *A. marmelos* leaf transcripts. COG: cluster of orthologous groups, GO: gene ontology, KEGG: Kyoto Encyclopedia of Genes and Genomes, ORFs: open reading frames, Pfam: protein family.


#### *3.2. GO Annotation*

In total, 600,642 Gene Ontology (GO) terms were mapped to the *A. marmelos* leaf contigs, belonging to all three GO classes, i.e., biological process (227,921 transcripts), cellular component (188,465 transcripts), and molecular function (184,256 transcripts) (Figure 1). The breakdown of the proteins associated with the various biological processes, cellular components, and molecular functions is illustrated in Figure 2. "Integral component of membrane" (GO:0016021) among cellular components, "transcription, DNA-templated" (GO:0006351) among biological processes, and "ATP binding" (GO:0005524) among molecular functions were the most frequently mapped terms in their respective categories (Figure 2).

GO terms describe a gene product in three categories: biological process, cellular component, and molecular function. This is achieved by associating a gene with its ontologies [45,46]. Earlier studies have pointed out a higher metabolic activity in the leaves of *A. marmelos*, which is attributed to the presence of phytochemicals such as alkaloids, flavonoids, and phenols [47,48]. We have identified a number of GO terms in the leaves of *A. marmelos*; this information could lead to the identification of important pathways of metabolic compounds in *A. marmelos* [49].

**Figure 1.** Genes associated with the biological process, cellular components, and molecular functions in the *A. marmelos* leaf transcriptome assembly.

**Figure 2.** Gene Ontology (GO) classification of *A. marmelos* transcripts. GO terms are divided into three main categories: biological process (**a**), cellular component (**b**), and molecular function (**c**).

#### *3.3. Citrus Database Annotation*

The *A. marmelos* transcripts were also annotated via Phytozome (https://phytozome.jgi.doe.gov/) with reference to the *Citrus clementina* and *Citrus sinensis* genomes. This resulted in the mapping of 78.44% of the transcripts to the *Citrus clementina* genome and 79.85% to the *Citrus sinensis* genome (Table 3). Similar numbers of transcripts were annotated with GO terms and KEGG annotations against the two genomes (Tables S2 and S3). Notably, a recent study based on the whole genome sequences of 60 members of the genus *Citrus* (family Rutaceae) found extensive relatedness within the genus; the authors even pointed out the need for a reformulation of the genus [50].


**Table 3.** Annotation summary of *A. marmelos* leaf transcripts with the *Citrus sinensis* and *Citrus clementina* genomes.

#### *3.4. Simple Sequence Repeats (SSRs) Prediction*

Simple sequence repeats (SSRs), also called short tandem repeats or microsatellites, are short repeat motifs that show length polymorphism due to insertion or deletion mutations of one or more repeat units [51]. We analyzed the abundance of SSRs in the annotated plant transcripts of the *A. marmelos* leaf using the MISA tool, and the statistics of the predicted SSRs are shown in Figure 3. There were 58,354 transcripts that contained SSRs, and among these, 23,034 had more than one SSR (Table S4). In total, 94,479 SSRs were identified, of which 65.27% were monorepeats, 19.78% were direpeats, and 13.40% were trirepeats (Figure 3). Tetra-, penta-, and hexarepeats made up 1.01%, 0.26%, and 0.24% of the total, respectively (Figure 3). Out of the total of 94,479 identified SSRs, 11,400 (12.06%) were present in compound formation.

**Figure 3.** Simple sequence repeats (SSRs) classes identified in the leaf transcripts of *A. marmelos*.

SSRs are codominant markers that are well dispersed throughout plant genomes. They are widely used for marker-assisted selection, fingerprinting, diversity assessment, and quantitative trait loci (QTLs) identification [52]. SSRs are routinely identified in medicinal plants via transcriptome assemblies, because they are more robust and can also be transferred among different species within the same genus. The SSRs identified here can also be used for marker-assisted breeding in *A. marmelos*, i.e., to breed this tree for a particular environment or condition. Until recently, only diversity-related studies were conducted in *A. marmelos* using universal primers, and researchers were limited to only 12 RAPDs and 16 universal ISSRs to assess diversity among their *A. marmelos* genotype collections [1,53]. Furthermore, these genomic information-based SSRs can help to identify and differentiate between homozygous and heterozygous individuals. SSRs are also commonly used for the map-based cloning of genes; a close association between genes and their SSRs is crucial in the context of genotyping and haplotyping [51,52].

#### *3.5. Transcriptional Factors Identification*

Gene expression patterns are regulated by transcription factors, which in turn determine the different biological processes [54]. A total of 7002 transcription factors were identified using PlnTFDB. Of these, 6122 belonged to families represented by more than 100 unigenes, and 52 transcription factors were unique to the *A. marmelos* leaves (Table S5). The most abundant families were auxin response factors (ARFs) (717), myeloblastosis-related (MYB-related) (562), basic domain/leucine zipper (bZIP) (437), and basic helix–loop–helix (bHLH) (417), whereas HB-Other (132) and CAMTA (109) were the least abundant (Figure 4).

**Figure 4.** Top 21 families of transcription factors identified in the *A. marmelos* leaf.

Auxin is the plant hormone that regulates different plant processes from growth to senescence. Auxin response factors are necessary for the plant to respond to auxin stimuli; they channel the response via auxin response DNA elements that are present in the primary auxin response genes. ARFs switch the auxin response genes on and off via their transcriptional activation domain or transcriptional repression domain [55,56]. MYB-related transcription factors play many roles, such as protection against biotic and abiotic stresses. MYB transcription factors also regulate the metabolism of the phenylpropanoid pathway, and are well studied with respect to the regulation of primary and secondary metabolism in plants [57,58]. Likewise, bZIP and bHLH transcription factors are also involved in metabolic biosynthesis in plants, especially through the activation of phenylpropanoid genes [59,60].

#### *3.6. Transcripts Encoding Cytochrome P450s (CYPs) and Glucosyltransferases (GTs)*

CYPs contribute to the primary and secondary metabolism of plants by catalyzing monooxygenation reactions. These cytochromes assist in the diversification of metabolic pathways in plants. Currently, they are potential targets for metabolic engineering for the overproduction of metabolites of interest [61]. There were 477 transcripts in total that were annotated as cytochrome P450s (Table S6). Considering their vital role in metabolic pathways, we further analyzed the abundance of SSRs within these cytochrome P450 transcripts (Table 4). Among the 128 identified SSRs, 85 were monorepeats, seven were direpeats, and 36 were trirepeats (Table S7).


**Table 4.** Prediction of simple sequence repeats (SSRs) for the transcripts annotated as cytochrome P450s (CYPs) and glucosyltransferases (GTs).

The last step in the production of plant secondary metabolites is glycosylation, which is carried out by glycosyltransferases (GTs) [62–64]. A total of 314 transcripts were annotated as glucosyltransferases (Table S8). We analyzed the abundance of SSRs present in these transcripts (Table S9), and among the 247 identified, 109 were monorepeats, 79 were direpeats, 58 were trirepeats, and only one was a tetrarepeat (Table 4). The SSRs identified in the CYP and GT transcripts have considerable potential for assessing genetic diversity among different *A. marmelos* accessions with divergent metabolic profiles.

#### *3.7. Identification of Biosynthetic Pathways in A. marmelos Leaf*

*A. marmelos* leaves are used for the treatment of several medical conditions in the Ayurveda and Yunani systems of medicine [65]. Transcript abundance was quantified with RNA-Seq by Expectation Maximization (RSEM) on the assembly, and the transcripts with the highest fragments per kilobase per million mapped reads (FPKM) values in the RSEM output were extracted from the annotation file along with their Kyoto Encyclopedia of Genes and Genomes (KEGG) IDs. Using the KEGG IDs, pathways were identified (Table S10). RSEM is commonly used to obtain information regarding transcript abundance from the RNA-Seq data of an organism, even without a reference genome [66]. The pathway analysis identified monoterpenoid biosynthesis and thiamine metabolism as the two most highly expressed pathways in the *A. marmelos* leaves (Figure 5).
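As a reminder of what the FPKM value expresses, the following Python sketch computes it from its textbook definition and ranks transcripts by it. This is a simplified illustration: RSEM reports its own FPKM column based on an effective transcript length, and the function names and the tuple layout used here are hypothetical.

```python
def fpkm(fragments: float, length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def top_expressed(rows, n=15):
    """rows: iterable of (transcript_id, kegg_id, fpkm_value) tuples.
    Returns the n rows with the highest FPKM, highest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]
```

Under this definition, a 1 kb transcript supported by 100 fragments in a library of one million mapped fragments has an FPKM of 100.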

**Figure 5.** The top 15 pathways in the *A. marmelos* leaf.

*A. marmelos* leaves have been reported to contain monoterpenoids as the principal metabolites, at levels as high as 93.9% [67,68]. Moreover, this monoterpenoid content is not affected by the geographic location of the plant, as it remains unaffected by changes in altitude, unlike many other metabolites [69]. Thiamine is a sulfur-containing, water-soluble compound that is naturally produced in plants. *A. marmelos* contains thiamine; its concentration is higher in the fruits than in the leaves, and the fruit is among those with the highest thiamine content [70,71]. Detailed information regarding the routes, reactions, and encoded enzymes of the two most highly expressed pathways identified in the leaf transcriptome of *A. marmelos* is provided in Figures 6 and 7. This information on the monoterpenoid and thiamine biosynthetic pathways will be useful for further genetic analysis of the production of these important metabolites and for pathway engineering.

**Figure 6.** *A. marmelos* leaf transcriptome encoded enzymes (highlighted) involved in the monoterpenoid biosynthetic pathway identified in the leaf.

**Figure 7.** *A. marmelos* leaf transcriptome encoded enzymes (highlighted) involved in the thiamine biosynthetic pathway identified in the leaf.

#### **4. Conclusions**

In underexploited plant species, there is often not enough genomic information available to proceed with their genetic improvement, or to subsequently transfer important genes from them to cultivated crops. Transcriptome assembly is a cost-effective alternative to genome sequencing for obtaining information on expressed genes and for assisting in the more effective development of underexploited crops and medicinal plants. RNA-Seq sheds light on genes and their functions, as well as the pathways that are present, and can subsequently support evolutionary studies via molecular markers. We have successfully performed the first de novo transcriptome assembly of *A. marmelos*, which is a plant with religious, medicinal, and horticultural importance. This is the first transcriptome information reported for this plant; it will be of considerable value for evolutionary studies and represents a valuable resource for *A. marmelos*. Also, once a transcriptome reference is available, anchored-based transcriptome assemblies and different types of evolutionary studies can be performed within family Rutaceae involving genus *Aegle*.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1999-4907/9/8/450/s1. Figure S1. Completeness assessment of the *A. marmelos* leaf transcriptome assembly using the gVolante server. Table S1. *A. marmelos* detailed annotation results. Table S2. Detailed list of *A. marmelos* transcript annotations against the *Citrus sinensis* genome. Table S3. Detailed results of *A. marmelos* transcript annotation against the *Citrus clementina* genome. Table S4. Details of SSRs in the *A. marmelos* leaf transcriptome. Table S5. Transcription factor families identified in the *A. marmelos* leaf. Table S6. Detailed list of cytochrome P450s (CYPs) annotation results. Table S7. SSRs identified in the cytochrome P450 transcripts. Table S8. Detailed list of glycosyltransferases (GTs) annotation results. Table S9. SSRs identified in glycosyltransferase (GT) transcripts. Table S10. FPKM-based top KEGG IDs in the *A. marmelos* leaf transcriptome.

**Author Contributions:** P.K. and S.K. conceived and designed the project. P.K. performed the experiments. P.K. analyzed the data. P.K. and S.K. wrote the paper. Both authors read and approved the final manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We thank the anonymous reviewers for their careful reading of the manuscript and for their insightful suggestions. P.K. would like to thank Bengaluru Genomics Centre Pvt. Limited, Bengaluru, India for letting him use their facility.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Genetic Diversity of the Endangered** *Dalbergia odorifera* **Revealed by SSR Markers**

**Fumei Liu <sup>1,2,3</sup>, Zhou Hong <sup>1</sup>, Daping Xu <sup>1</sup>, Hongyan Jia <sup>2</sup>, Ningnan Zhang <sup>1</sup>, Xiaojin Liu <sup>1</sup>, Zengjiang Yang <sup>1</sup> and Mengzhu Lu <sup>3,</sup>\***


Received: 1 February 2019; Accepted: 25 February 2019; Published: 3 March 2019

**Abstract:** *Dalbergia odorifera* T. Chen (Fabaceae) is a semi-deciduous tree species indigenous to Hainan Island in China. Due to its precious heartwood "Hualimu (Chinese)" and Chinese medicinal components "Jiangxiang", *D. odorifera* is seriously threatened by long-term overexploitation and has been listed on the IUCN (International Union for Conservation of Nature) Red List since 1998. Therefore, the elucidation of its genetic diversity is imperative for conservation and breeding purposes. In this study, we evaluated the genetic diversity of 42 wild *D. odorifera* trees from seven populations covering its whole native distribution. In total, 19 SSR (simple sequence repeat) markers harbored 54 alleles across the 42 samples, and a medium level of genetic diversity was inferred from Nei's gene diversity (0.36), observed heterozygosity (0.28), and expected heterozygosity (0.37). Among the seven wild populations, the expected heterozygosity (He) varied from 0.31 (HNQS) to 0.40 (HNCJ). The analysis of molecular variance (AMOVA) showed that only 3% of the genetic variation existed among populations. Moderate differentiation among the investigated populations was indicated by pairwise Fst (0.042–0.115). Structure analysis suggested two clusters for the 42 samples. Moreover, the seven populations were clearly separated into two clusters by both the principal coordinate analysis (PCoA) and the neighbor-joining (NJ) analysis. Populations from Haikou city (HNHK), Baisha autonomous county (HNBS), Ledong autonomous county (HNLD), and Dongfang city (HNDF) comprised cluster I, while cluster II comprised the populations from Wenchang city and Sansha city (HNQS), Changjiang autonomous county (HNCJ), and Wuzhisan city (HNWZS). The findings of this study provide a preliminary genetic basis for the conservation, management, and restoration of this endemic species.

**Keywords:** *Dalbergia odorifera* T. Chen; genetic diversity; population structure; EST-SSR marker; microsatellite marker; rosewood; conservation

#### **1. Introduction**

*Dalbergia odorifera* T. Chen, formerly named *Dalbergia hainanensis* Merr. et Chun, is endemic to Hainan province, southern China. It is a predominantly outcrossing, semi-deciduous perennial tree species (diploid) in the Fabaceae family and one of the most valuable timber species in China. *Dalbergia odorifera* is restricted to relatively narrow tropical geographic areas in Hainan Island at altitudes below 600 m. The development of *D. odorifera* plantations is required to alleviate the demand for this valuable wood, but so far no breeding systems have been established. In the 1950s, it was introduced to the subtropical areas of Guangdong, Guangxi, and Fujian provinces in China [1]. After several decades, the introduced trees now exhibit a satisfactory growth performance, and have even formed valuable heartwood at most sites [1].

The heartwood of this species, locally known as "Hualimu" or "Huanghuli" (Chinese name), takes more than 50 years to mature. It is one of the most precious fragrant rosewoods, with a high value on the furniture and craft markets (especially for luxury furniture and crafts) in China. As a source of traditional Chinese medicine, it is also known as "Jiangxiang", and contains a series of chemical components, such as flavonoid [2], phenolic [3], and sesquiterpene derivatives [4–6], which play important roles in the pharmaceutical industry for the treatment of cardiovascular diseases [7], cancer, diabetes [8], blood disorders, ischemia, swelling, and rheumatic pain [9,10]. Due to its high medicinal and commercial value, *D. odorifera* has been overexploited for a long time and has been listed on the IUCN (International Union for Conservation of Nature) Red List by the World Conservation Monitoring Centre (WCMC) since 1998 [11]. As a result, the species has become rare: only limited numbers of individuals are found in parts of its original habitat, which is highly fragmented within the remaining forests on Hainan Island [12]. Therefore, a comprehensive survey is urgently needed to obtain information on the levels and patterns of genetic variation in *D. odorifera*. Such information is imperative for establishing an effective strategy for conservation and breeding purposes.

Molecular markers are often used to elucidate genetic variation in tree species [13]. However, very few studies using DNA molecular markers have been conducted in *D. odorifera* [12,14]. Compared to other molecular markers, microsatellite (simple sequence repeat, SSR) markers are the ideal choice for studying the genetic composition of wild populations because of their co-dominant character and high variability [15,16]. The use of microsatellite markers to analyze the genetic diversity of *D. odorifera* can provide an invaluable means for the conservation and protection of this endangered species. Moreover, the development of SSR markers has been advanced by next-generation sequencing of transcriptomes (RNA-seq), especially for species without a reference genome [17–19]. This approach has been applied for SSR identification, development, and association studies in many tree species [20–22]. In the present study, we applied this approach and developed 19 polymorphic SSR markers specific for *D. odorifera*. The main objectives of this study were to use these SSR markers to evaluate the genetic diversity of wild *D. odorifera* populations, and to identify the causes of the endangered and fragmented status of this species. The findings of this study will provide useful genetic information for conservation and breeding strategies in *D. odorifera*.

#### **2. Materials and Methods**

#### *2.1. Plant Materials and DNA Extraction*

In total, 42 wild individuals representing seven *D. odorifera* populations were sampled from across Hainan Island, China (Figure 1, Table 1). We sampled all the trees with a diameter at breast height (DBH) larger than 8 cm, and the 42 individuals were the last remaining resources. Ten leaves were collected from each individual and sealed in plastic bags with desiccants. Total genomic DNA was extracted from each sample using the Hi-DNAsecure Plant Kit (Tiangen, Beijing, China) according to the manufacturer's instructions. The quality and quantity of the DNA were determined with a NanoDrop 2000 (Thermo Scientific, Wilmington, DE, USA).


**Table 1.** Geographical location of seven investigated *D. odorifera* populations in Hainan Island of China.

**Figure 1.** Geographic location of seven investigated *D. odorifera* populations collected from Hainan Island in China. P-HNQS—population of Wenchang city and Sansha city in Hainan province, P-HNHK—population of Haikou city, P-HNBS—population of Baisha city, P-HNDF—population of Dongfang city, P-HNLD—population of Ledong autonomous county, P-HNCJ—population of Changjiang autonomous county, P-HNWZS—population of Wuzhisan city. The pie charts show the estimated genetic structure of the seven populations based on STRUCTURE analysis with a cluster number of two; in each chart, each color represents the proportion of one cluster in that population.

#### *2.2. RNA Sequencing and Data Deposition*

To develop protocols, three leaves from three trees (H27, H98, and H100) were collected from three different populations, Haikou city (HNHK), Dongfang city (HNDF), and Changjiang autonomous county (HNCJ), respectively, and immediately put into liquid nitrogen. RNA extraction and sequencing were done by Beijing Novogene Biological Information Technology Co., Ltd., Beijing, China (http://www.novogene.com/). The sequence data were deposited in the Sequence Read Archive (SRA) database at the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/) under accession number SRP175426; SRR8398210, SRR8398212, and SRR8398211 are the three biosample accession numbers for H27, H98, and H100, respectively [23].

#### *2.3. SSR Identification and Marker Development*

The MISA software (MIcro SAtellite; http://pgrc.ipk-gatersleben.de/misa) was employed to detect, locate, and identify SSR loci. The minimum number of repeats used to select an SSR was ten for mono-nucleotide motifs, six for di-nucleotide motifs, and five for tri-, tetra-, penta-, and hexa-nucleotide motifs. Primers were designed using Primer 3.0 software [24] with default settings and the following criteria: predicted primer lengths of 18–24 bases, GC content of 40%–60%, annealing temperature of 56–62 °C, and predicted product sizes of 150–300 bp.
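The primer selection criteria listed above amount to a set of range checks. A minimal Python sketch of such a filter follows; it is not part of Primer3, the annealing temperatures are assumed to be supplied by the design software rather than computed here, and the `PrimerPair` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    forward: str          # forward primer sequence
    reverse: str          # reverse primer sequence
    tm_forward: float     # annealing temperature in degrees C (assumed supplied)
    tm_reverse: float
    product_size: int     # predicted amplicon length in bp


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def meets_criteria(pair: PrimerPair) -> bool:
    """Apply the ranges stated in the text to both primers and the product."""
    for primer, tm in ((pair.forward, pair.tm_forward), (pair.reverse, pair.tm_reverse)):
        if not 18 <= len(primer) <= 24:
            return False
        if not 40.0 <= gc_percent(primer) <= 60.0:
            return False
        if not 56.0 <= tm <= 62.0:
            return False
    return 150 <= pair.product_size <= 300
```

A pair with two 20-base primers at 50% GC, annealing temperatures of 58 and 59 °C, and a 200 bp product would pass; the same pair with a 400 bp product would not.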

#### *2.4. Validation of SSR Marker by PCR and Capillary Electrophoresis*

Subsequently, DNA from three samples (randomly selected from the 42 individuals) was used to validate the 192 randomly selected SSR loci (excluding mononucleotide repeats) with the designed primers. PCR reactions were performed in a 15 μL final volume, containing 10.25 μL water, 1.5 μL 10× DNA polymerase buffer, 1.5 μL MgCl2 (25 mM), 0.3 μL dNTPs (10 mM each), 0.15 μL of each primer at 10 μM, 0.3 μL Taq polymerase at 5 units/μL (TaqUBA), and 1 μL of genomic DNA (40–50 ng). Following pre-denaturation at 94 °C for 3 min, a total of 35 cycles of 94 °C for 15 s, the appropriate annealing temperature for 15 s, and 72 °C for 30 s were performed. PCR products showing clear, stable, and specific bands of the expected length (100–350 bp) were considered successful amplifications. All the PCR reactions were repeated at least once. Finally, 22 SSR markers were randomly selected from the successful ones and used to analyze the 42 samples. Their diluted PCR products, mixed with 12.5 μL Hi-Di formamide and 0.25 μL size standard (Shanghai Generay Biotech Co., Ltd., Shanghai, China), were separated by capillary electrophoresis and genotyped with an ABI 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) at Shanghai Generay Biotech Co., Ltd., Shanghai, China (http://www.generay.com.cn). Peak identification and fragment sizing were done using GeneMapper v4.0 (Applied Biosystems, Foster City, CA, USA) with default settings.

#### *2.5. Statistical Analysis*

The frequency of null alleles (FNA) and scoring errors were estimated using the Micro-checker software 2.2.3 [25]. POPGENE v1.3.1 software [26] was used to estimate the following genetic diversity parameters: allele frequency, observed number of alleles (Na), effective number of alleles (Ne), expected and observed heterozygosities (He and Ho, respectively), Nei's gene diversity (GD), the percentage of polymorphic loci (PPB), Wright's fixation index (F), and gene flow (Nm). The polymorphism information content (PIC) was calculated for each locus using the online program PICcalc [27]. F-statistics, including the inbreeding coefficient within individuals (FIS) and genetic differentiation among populations (FST), as well as pairwise Fst and pairwise G'ST (Hedrick's standardized genetic differentiation index, adjusted for bias), were computed using GenAlEx version 6.5 [28]. Hardy-Weinberg equilibrium (HWE) was evaluated using chi-squared tests for each population at individual loci [26]. The Ewens-Watterson test for neutrality at each locus was performed using POPGENE v1.3.1 [26]. Hierarchical analyses of molecular variance (AMOVA) were conducted using GenAlEx version 6.5 [28].
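For readers unfamiliar with these statistics, the Python sketch below computes several of them for a single locus from diploid genotypes, using common textbook definitions. It is an illustration rather than a reimplementation of POPGENE, PICcalc, or GenAlEx: the small-sample correction applied to He and the use of the Botstein formula for PIC are assumptions about how those programs define the quantities, and the function names are hypothetical. The gene flow formula is the one given in the footnote to Table 3.

```python
def allele_frequencies(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid individual."""
    counts = {}
    for a, b in genotypes:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    total = 2 * len(genotypes)
    return {allele: count / total for allele, count in counts.items()}


def locus_stats(genotypes):
    """Per-locus diversity statistics from diploid genotypes."""
    n = len(genotypes)
    freqs = list(allele_frequencies(genotypes).values())
    homozygosity = sum(p * p for p in freqs)
    gd = 1.0 - homozygosity                       # Nei's gene diversity
    pic = gd - sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                   for i in range(len(freqs))
                   for j in range(i + 1, len(freqs)))
    return {
        "Na": len(freqs),                         # observed number of alleles
        "Ne": 1.0 / homozygosity,                 # effective number of alleles
        "Ho": sum(1 for a, b in genotypes if a != b) / n,
        "He": gd * 2 * n / (2 * n - 1),           # assumed small-sample correction
        "GD": gd,
        "PIC": pic,
    }


def gene_flow(fst: float) -> float:
    """Nm = [(1/Fst) - 1] / 4."""
    return (1.0 / fst - 1.0) / 4.0
```

With these definitions, He is always slightly larger than GD for a finite sample, which matches the pattern in the reported values (for example, He of 0.65 against GD of 0.64 at locus S21).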

The genetic structure of the investigated populations was analyzed using STRUCTURE 2.0 [29]. The number of clusters (K) was tested from one to seven with 20 replicates each; both the length of the burn-in period and the number of MCMC (Markov chain Monte Carlo) iterations were set to 100,000 [30]. The most likely number of clusters (K) was determined online (http://taylor0.biology.ucla.edu/struct\_harvest/) according to the highest mean of estimated lnP(D) (log probability of data) and the lnP(D)-derived delta K value [31]. Repeated sampling analysis and the genetic structure plot were performed by CLUMPAK [32]. To summarize the patterns of variation in the multi-locus dataset, principal coordinate analysis (PCoA) was performed using GenAlEx version 6.5 software based on the pairwise G'ST matrix. Next, Mantel tests were carried out between the pairwise G'ST matrix and the geographic distance and genetic distance (Nei's unbiased genetic distance) matrices, respectively, using GenAlEx version 6.5 software. Additionally, a Neighbor-Joining (NJ) tree based on Nei's unbiased genetic distance was drawn in MEGA X [33].
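The delta K criterion mentioned above can be sketched in a few lines of Python. The version below follows the commonly used form in which the second-order difference is taken on the mean lnP(D) for each K and divided by the standard deviation of lnP(D) across replicate runs at that K; whether the online tool computes it in exactly this way is an assumption, and the function name `delta_k` is hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """lnp_by_k: dict mapping K to a list of lnP(D) values from replicate runs.
    Returns a dict mapping K to delta K for every K that has both neighbours."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined for the smallest and largest K
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero when all replicates agree exactly
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result
```

Because delta K needs both K − 1 and K + 1, a run covering K from one to seven yields delta K values only for K from two to six; the K with the largest delta K is then taken as the best-supported number of clusters.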

#### **3. Results**

#### *3.1. Distribution of SSR Loci in D. odorifera*

In total, 35,774 potential SSR loci were identified in 26,880 unigenes, of which 6629 (24.7%) contained more than one SSR locus (Table S1). The SSR loci occurred in the leaf transcriptome at a frequency of one per 2.18 kb. The repeat types were unevenly distributed (Figure S1): mono-nucleotide repeat motifs were the most frequent (21,623, 60.44%), followed by di- (7612, 21.28%) and tri-nucleotide (6112, 17.09%) repeat motifs. These three motif types represented 98.81% of all loci, whereas only 40 penta- and 14 hexa-nucleotide repeat motifs were found.

#### *3.2. Development of Polymorphic SSR Markers*

We randomly selected 192 SSR loci and designed primers to test the specificity of amplification in three samples and the informativeness of these SSR markers. Of these, 104 pairs of primers (54.2%) either did not give any amplification products or gave unexpected products, while 88 (45.8%) produced clear amplicons with the expected size of 100–350 bp. Next, 22 of the 88 primers were randomly selected for polymorphism detection and 19 (86.4%) showed polymorphism (Table 2). Further information on these 88 validated SSR markers, including ID of cDNA sequence, SSR type, repeat motif, position in template sequence, primer sequence, annealing temperature, and expected amplicon length (for developing alternative primers if desired) is available in Table S2. Among the polymorphic SSR loci, three (15.79%) were confirmed to be located in coding sequences (CDSs), six (31.58%) in 5′-untranslated regions (5′ UTRs), and three (15.79%) in 3′-untranslated regions (3′ UTRs).

#### *3.3. Polymorphism of 19 SSR Loci*

In total, 19 SSR loci harbored 54 alleles across the 42 *D. odorifera* samples (File S1). The number of alleles detected per locus ranged from two to five, with allele frequencies ranging from 0.01 to 0.99 (Table 3, Table S3). The largest number of alleles (five) was detected at locus S21, which also had the largest effective number of alleles (Ne, 2.79), expected heterozygosity (He, 0.65), Nei's gene diversity (GD, 0.64), and polymorphism information content (PIC, 0.60). In terms of PIC, both S09 and S21 were highly informative with PIC values higher than 0.50, while S02, S12, S23, S26, and S27 were less informative with PIC values smaller than 0.25, and the remaining 12 loci were moderately informative with PIC values between 0.25 and 0.50. The average Wright's fixation index (F) was 0.16, ranging from −0.19 (S22) to 0.44 (S24). Furthermore, null alleles were found at loci S04, S09, S21, S24, and S29. Six loci (S04, S08, S09, S24, S27, and S29) showed significant deviations from Hardy-Weinberg equilibrium across the 42 *D. odorifera* individuals. Additionally, all 19 SSR loci were selectively neutral according to the Ewens-Watterson test for neutrality (Table S4).


**Table 2.** Characteristics of 19 SSR (simple sequence repeat) markers developed for *D. odorifera*.


**Table 3.** Diversity statistics of the 19 SSR loci across 42 *D. odorifera* samples.

Na—observed number of alleles, Ne—effective number of alleles, Ho—observed heterozygosity, He—expected heterozygosity, GD—Nei's gene diversity, PIC—polymorphic information content, FST—genetic differentiation coefficient, Nm—Gene flow, estimated from Fst, Nm = [(1/Fst) − 1]/4, F Wright's (1978) fixation index, FNA—frequency of null alleles, \* *p* < 0.05, likely contained null alleles, PHWE <sup>a</sup> *p*-value for deviation from Hardy-Weinberg equilibrium: ns not significant, \* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001.
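The relation between gene flow and differentiation given in the footnote, Nm = [(1/Fst) − 1]/4, can be sketched in a few lines of Python. The function names and the Fst value of 0.2 are illustrative assumptions and are not taken from the study.

```python
def gene_flow_from_fst(fst):
    """Estimate gene flow Nm from Fst using Nm = [(1/Fst) - 1] / 4."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_gene_flow(nm):
    """Invert the same relation: Fst = 1 / (4 * Nm + 1)."""
    if nm <= 0.0:
        raise ValueError("Nm must be positive")
    return 1.0 / (4.0 * nm + 1.0)


# Hypothetical value: Fst = 0.2 corresponds to one migrant per generation.
print(gene_flow_from_fst(0.2))
print(fst_from_gene_flow(1.0))
```

Under this relation Nm falls steeply as Fst rises, so small changes in a low Fst estimate translate into large changes in the inferred gene flow.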

#### *3.4. Genetic Diversity in D. odorifera*

Among the seven populations investigated, the number of polymorphic loci varied from 13 to 17, and the percentage of polymorphic loci (PPB) from 68.42% to 89.47% (Table 4). Population HNLD, which had the largest PPB, also had the largest total number of alleles (46), whereas HNHK, with the smallest PPB, had the smallest (35). In total, eight private alleles appeared among the investigated populations: three in HNDF, two in HNLD, and one each in HNQS, HNBS, and HNWZS. The observed heterozygosity (Ho) ranged from 0.24 (HNWZS) to 0.38 (HNDF) and the expected heterozygosity (He) from 0.31 (HNQS) to 0.40 (HNCJ), with averages of 0.28 and 0.37, respectively. At the population and species levels, the expected heterozygosity (He) was 0.36 and 0.37, respectively. Additionally, population HNCJ, which had the highest genetic diversity (He, 0.40), also showed the largest value of Nei's gene diversity (0.36).


**Table 4.** Summary of different *D. odorifera* population diversity statistics averaged over the 19 SSR loci.

Population see Table 1, Size—number of sampled individuals, Alleles—total number of detected alleles, Na—observed mean number of alleles, Ne—mean effective number of alleles, Np—number of private alleles, Ho—observed heterozygosity, He—expected heterozygosity, GD—Nei's gene diversity, FIS—inbreeding coefficient, PPB %—the percentage of polymorphic loci, <sup>a</sup> diversity indices averaged over the 19 loci across all *D. odorifera* populations, <sup>b</sup> total number of sampled individuals.

Both AMOVA and pairwise Fst analyses were performed to investigate genetic variation among these populations. The AMOVA was conducted without grouping the investigated populations (population HNHK was excluded because it had fewer than five individuals). The results showed that only 3% of the total genetic variation occurred among populations, and 20% of the within-population variation was due to the heterozygosity of the individuals within each population (Table 5). The overall FST was very small (0.03, Table 5), and the overall gene flow (Nm) estimated among all these populations was 2.58 (Table 3). Furthermore, the pairwise Fst ranged from 0.042 to 0.115 (Table 6). The highest value was between populations HNQS and HNDF (0.115), whereas the lowest was between HNLD and HNCJ (0.042).


**Table 5.** Analysis of molecular variance (AMOVA) for six populations of *D. odorifera*.

*d.f.*: degrees of freedom; population HNHK was excluded from this analysis because it had fewer than five individuals; FST and FIS are based on standard permutation across the full data set; \* *p* < 0.05, \*\*\* *p* < 0.001.


**Table 6.** Pairwise Genetic Differentiation Index (Fst) between the seven populations.

Population see Table 1.

#### *3.5. Population Structure of D. odorifera*

An admixture model-based approach was implemented to evaluate the population structure of the 42 *D. odorifera* individuals. The optimum cluster number (K) for the investigated populations was two, which had the largest values of both ln P(D) (log probability of data, −978) and delta K (15), as obtained from the STRUCTURE HARVESTER website (Figure 2a,b). Based on K = 2, the estimated membership coefficients of each individual are shown in Figure 2c. Each individual is represented by a vertical line, and each color shows the proportion of its membership in each of the two clusters. An individual with a membership probability higher than 0.75 was considered pure, and one with a probability lower than 0.75 was considered admixed. In this analysis, the yellow cluster included 17 individuals (14 pure and 3 admixed), while the blue cluster included 25 individuals (14 pure and 11 admixed). However, only HNBS and HNDF consisted entirely of individuals from the blue cluster; the other populations consisted of individuals from both clusters.
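The assignment rule described above (an individual belongs to the cluster with its largest membership coefficient and counts as pure only above 0.75) can be sketched as follows. The function name and the example coefficients are hypothetical. The text does not say how a value of exactly 0.75 is treated, so this sketch assumes it counts as admixed.

```python
def classify_individual(q_values, threshold=0.75):
    """Return (cluster index, label) from one individual's membership coefficients."""
    if abs(sum(q_values) - 1.0) > 1e-6:
        raise ValueError("membership coefficients must sum to 1")
    # The assigned cluster is the one with the largest coefficient.
    best = max(range(len(q_values)), key=lambda k: q_values[k])
    # Assumption: a coefficient exactly at the threshold is treated as admixed.
    label = "pure" if q_values[best] > threshold else "admixed"
    return best, label


# Hypothetical membership coefficients for K = 2.
for q in ([0.90, 0.10], [0.40, 0.60], [0.20, 0.80]):
    print(q, classify_individual(q))
```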

The pairwise G'ST matrix was used for the principal coordinate analysis (PCoA). The first and second axes explained 63.25% and 22.13% of the variance within the molecular data, respectively (Figure 3a). Two clusters were clearly distinguished by PCoA: populations HNHK, HNBS, HNLD and HNDF were grouped as cluster I, and the other three populations (HNQS, HNCJ and HNWZS) as cluster II. Moreover, the neighbor-joining (NJ) dendrogram based on Nei's unbiased genetic distance among the investigated populations showed similar results (Figure 3b).

**Figure 2.** Results of STRUCTURE analysis for 42 *D. odorifera* individuals based on microsatellite data. (**a**) Estimation of population using mean of estimated lnP(D) (log probability of data) with cluster number (K) ranged from one to seven. (**b**) Estimation of population using lnP(D)-derived delta K with cluster number (K) ranged from one to seven. (**c**) Estimated genetic structure of the seven populations based on STRUCTURE analysis with cluster number (K) of two. In each plot, different color represents a different cluster and black segments separate the populations.

**Figure 3.** Relationships among the seven wild *D. odorifera* populations in Hainan Island. (**a**) Principal coordinate analysis (PCoA) based on pairwise G'ST (Hedrick's standardized Gst, analog of Fst, adjusted for bias), Coord.1 (63.25%): The first principal coordinate, explained 63.25% of variation; Coord.2 (22.13%): The second principal coordinate, explained 22.13% of variation. (**b**) Neighbour-joining (NJ) tree based on Nei's unbiased genetic distance among seven populations of *D. odorifera* in Hainan Island.

Subsequently, Mantel tests were carried out between the matrix of pairwise G'ST and the matrices of geographic distance (Figure 4a) and genetic distance (Figure 4b). The results showed that genetic differentiation among the investigated populations was associated more strongly with genetic distance (91.2%, Figure 4b) than with geographic distance (27.7%, Figure 4a). Hence, there was no clear geographic origin-based structuring or predominant isolation by distance among the investigated populations.
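For readers unfamiliar with the Mantel test, the following is a minimal permutation sketch in plain Python. It correlates the upper triangles of two square distance matrices and estimates a one-sided *p*-value by jointly permuting the rows and columns of one matrix. The function names, the number of permutations, and the toy matrices are assumptions for illustration; this is not the software or the settings used in the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, permutations=999, seed=0):
    """Return (r, one-sided p) for two symmetric distance matrices."""
    identity = list(range(len(a)))
    x = upper_triangle(a, identity)
    r_obs = pearson(x, upper_triangle(b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute populations, not individual matrix cells
        if pearson(x, upper_triangle(b, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: four hypothetical sites on a line; the second matrix is a rescaled copy.
sites = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(p - q) for q in sites] for p in sites]
gen = [[2.0 * d for d in row] for row in geo]
print(mantel(geo, gen, permutations=199))
```

Permuting whole rows and columns together, rather than shuffling cells, preserves the dependence among the pairwise distances that share a population.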

**Figure 4.** Mantel tests for pairwise G'ST matrix correspondence on relationships between geographic and genetic distance for *D. odorifera* populations in Hainan Island. (**a**) Relationships between pairwise G'ST and geographic distance. There is a positive relationship between the two elements (Rxy: 27.7%, *p* < 0.05). (**b**) Relationships between pairwise G'ST and Nei's unbiased genetic distance. There is a positive relationship between the two elements (Rxy: 91.2%, *p* < 0.05).

#### **4. Discussion**

#### *4.1. Development of SSR Marker for D. odorifera*

Measuring levels of genetic diversity within and among populations is essential for understanding how a species adapts to its environment, in particular in conservation studies that explore why plant species are rare and/or endangered [34]. However, reports on the genetic diversity of *D. odorifera* are scarce, because the available molecular markers have been restricted to six RAPD (random amplified polymorphic DNA) [12] and 25 SRAP (sequence-related amplified polymorphism) [14] loci. The use of such dominant markers could give a biased estimate of genetic variation when populations are not in Hardy-Weinberg equilibrium [35], which may be true for the fragmented *D. odorifera* populations. In the present study, six loci showed significant deviations from Hardy-Weinberg equilibrium. Therefore, the development of co-dominant SSR markers for *D. odorifera* is of great use for genetic studies. In this study, we identified 35,774 putative SSR loci from the leaf transcriptome dataset, substantially more than reported for other legume species, such as 5956 in *Prosopis alba* Griseb. [36], 5710 in *Millettia pinnata* (L.) Panigrahi [18], and 7493 in *Mucuna pruriens* (L.) DC. [37]. In addition, the dominant repeats and motif types of SSRs also vary among the transcriptomes of different species. These differences may be attributable to differences in genome structure and composition among these species.

The effectiveness and success of SSR development rely considerably on the quality and accuracy of the sequence data [38]. Therefore, the identified SSR loci need to be further validated. Of the 192 primer pairs selected, 45.8% (88) yielded the expected amplicons for each locus, indicating that no introns were present within the amplified regions. From those validated SSR markers, 22 were randomly selected for polymorphism detection, of which 86.4% (19) exhibited polymorphism among the 42 wild *D. odorifera* trees. The PIC value provides an estimate of the information content of a locus. The average PIC value of these newly developed SSR markers is 0.31, which is comparable to or lower than that in other legumes, such as *Vigna umbellata* (Thunb.) Ohwi & Ohashi (0.2898) [39], *Mucuna pruriens* (L.) DC. (0.24) [37], and *Melilotus* species (0.79) [40], but relatively higher than that based on ISSR (inter simple sequence repeat) and RAPD markers in other *Dalbergia* species, such as *Dalbergia cochinchinensis* Pierre ex Laness (ISSR 0.101; RAPD 0.088) [41] and *Dalbergia oliveri* Prain (ISSR 0.147; RAPD 0.116) [42]. Markers with both high and low PIC values are useful in genetic diversity studies because using both avoids a biased estimate [43,44]. Therefore, the SSR markers developed in the present study appear to be useful for genetic studies of *D. odorifera* populations.

#### *4.2. Genetic Diversity of D. odorifera*

Genetic diversity is essential to the long-term survival of species and plays an important role in genetic improvement through breeding programs. However, limited information on the genetic diversity of *D. odorifera* is available. Prior to the present study, only one study had been conducted, using six RAPD markers; it indicated a medium level of genetic diversity across six populations, inferred from a percentage of polymorphic loci (PPB) of 54.55% and a Nei's gene diversity (GD) of 0.21 [12]. In comparison, our results showed a higher level of genetic diversity, with values of 100% (PPB) and 0.36 (GD) using the 19 newly developed SSR markers. These differences may be due to the different numbers [45] and types of molecular markers [16] used in the studies or, alternatively, to the different population sizes in the two studies.

Genetic diversity in wild plant species is often related to geographic range, population size, longevity, mating system, migration, and balancing selection [34,45,46]. Higher genetic diversity is expected to reflect better adaptation of a species to its environment [47]. However, the observed and expected heterozygosity of 0.28 and 0.37, respectively, indicated a medium level of genetic diversity in *D. odorifera*. Most studies have been concordant with the general prediction that species with narrow or endemic distributions maintain significantly lower levels of genetic diversity than species with widespread distributions [48–51]. Notably, the native habitat of *D. odorifera* is restricted to small regions of Hainan Island. It is therefore unsurprising that the genetic diversity values in the present study are much smaller than those in widespread tropical tree species such as *Olea europaea* Linn. (12 SSR markers, Ho = 0.75, He = 0.6) [20], *Prunus africana* (Hook.f.) Kalkman (6 SSR markers, Ho = 0.68, He = 0.73) [52], and *Eugenia dysenterica* DC. (9 SSR markers, Ho = 0.545, He = 0.62) [53]. However, the diversity of *D. odorifera* is even lower than in some rare and endemic tree species such as *Boswellia papyrifera* (Del. ex Caill.) Hochst (He = 0.69) [54], *D. cochinchinensis* (He = 0.55), and *D. oliveri* (He = 0.75) [55]. Similar observations were reported for *Ottelia acuminata* (He = 0.35, endemic to southwestern China) [56] and *Dipterocarpus alatus* Roxb. ex G.Don (He = 0.22, endemic to southeastern Vietnam) [57], resulting from the extensive reduction in population sizes caused by human disturbance. Similarly, owing to long-term over-logging for its valuable fragrant heartwood, the distribution of *D. odorifera* populations in Hainan Island has been dramatically reduced over the past thirty years. The present populations are highly fragmented into subpopulations, each composed of only a few individuals, and large trees are rare [12]. This is consistent with the suggestion that distribution-restricted plant species are associated with relatively low genetic diversity, primarily owing to over-exploitation of their resources.

In the present study, the mean observed heterozygosity (0.28) was much lower than the mean expected heterozygosity (0.37), and Wright's (1978) fixation index (F) was 0.16 across the 42 wild trees (Table 3), indicating a modest heterozygote deficiency within the entire wild distribution range of *D. odorifera*. This result may be attributed to the botanical characteristics of *D. odorifera*, more specifically to its complicated reproductive system, which causes a relatively high inbreeding coefficient (Table 5) [58]. *Dalbergia odorifera* is a predominantly outcrossing species [12]. Its flowers are entomophilous, pollinated by small insects, and its fruits, flattened seedpods, are dispersed by wind [59,60], which limits long-distance dispersal. Moreover, *D. odorifera* is capable of coppice regeneration, especially when stimulated by trunk injuries [12]. These characters are consistent with a predominantly outcrossing mating system that includes at least some inbreeding. Alternatively, the deficiency may be due to the small population sizes, in which mating between relatives occurs more frequently than in large populations [61].

#### *4.3. Genetic Differentiation and Population Structure*

Predominantly outcrossing woody species tend to have less differentiation among populations and high variation within populations [34]. Similarly, our AMOVA showed that most of the genetic variation was within the investigated populations of *D. odorifera*, while only 3% of the genetic variation existed among populations, which is much lower than in other *Dalbergia* species (0.236, *D. cochinchinensis*; 0.126, *D. oliveri*) [55]. Genetic differentiation among populations is strongly influenced by gene flow (Nm) and genetic drift [62]. For neutral genes, an Nm value below one indicates that genetic drift is the predominant factor affecting population structure, whereas a value above four indicates that gene flow can override the effect of genetic drift [48,63]. In the present study, the 19 SSR markers, which were selectively neutral according to the Ewens-Watterson test (Table S4), are well suited to investigating the effects of gene flow and genetic drift, and showed an overall gene flow of 2.58 (Table 3). This relatively high gene flow could curtail part of the dispersive effect of genetic drift, reducing the genetic variation among populations while increasing the diversity within populations. However, the frequent migration indicated by the relatively high gene flow is at odds with the fragmented distribution of the investigated populations; a putative explanation is frequent human activity, primarily overexploitation and illegal logging [12]. Similar observations were also made for *Acer miaotaiense* (P. C. Tsoong) [34] and *Plectranthus edulis* (Vatke) Agnew [47]. Additionally, genetic drift cannot be ignored, since the population sizes are so small that any further reduction in size could result in genetic drift.

Pairwise Fst ranged from 0.042 to 0.115 (Table 6), suggesting moderate differentiation among these wild populations. The highest level of genetic differentiation (0.115) was found between populations HNDF and HNQS, which are about 220 km apart; this is consistent with the suggestion that long-term isolation may limit gene flow between two populations [34]. However, the genetic differentiation between HNLD and HNHK was only 0.055 even though the distance between them was also 220 km, which contradicts this suggestion. An admixture model-based approach was implemented to evaluate the population structure and suggested that two clusters best described the 42 *D. odorifera* trees. Similar results were generated from both the neighbor-joining and PCoA analyses. Both divided the investigated populations into two clusters: cluster I consisted of populations from Haikou city (HNHK), Baisha autonomous county (HNBS), Ledong autonomous county (HNLD), and Dongfang city (HNDF), while cluster II consisted of populations from Wenchang city and Sansha city (HNQS), Changjiang autonomous county (HNCJ), and Wuzhishan city (HNWZS). Moreover, the Mantel tests showed that genetic differentiation among the investigated populations was positively related to both geographic distance (27.7%, Figure 4a) and genetic distance (91.2%, Figure 4b), with a stronger relationship to genetic distance than to geographic distance. Across all these analyses, no clear geographic origin-based structuring or predominant signs of "isolation by distance" were found; the present population structure of *D. odorifera* was more likely influenced by human activities.

#### *4.4. Conservation*

The main goal of conservation is to establish a suitable strategy for maintaining current genetic diversity and ensuring the long-term evolution of an endangered species [64]. The current state of *D. odorifera* is a medium level of genetic diversity with a modest heterozygote deficiency, low genetic differentiation, and very small population sizes. This pattern mainly results from extensive human activities, primarily over-logging. Several necessary measures have been implemented by the Chinese government: (1) *Dalbergia odorifera* has been promoted to a second-grade state-protected species, and exploitation of its natural resources is forbidden; (2) national parks and sanctuaries have been established for in situ conservation, covering almost every habitat in Hainan Island, such as Hainan Jianfengling national reservation, Bawangling national park, and Wangning Botany Park. However, the population size of *D. odorifera* is still decreasing because of illegal logging. Therefore, effective ex situ conservation strategies should be the best choice to avoid the loss of genetic diversity due to illegal logging and to increase the variability of progenies by "outcrossing" the available trees.

#### **5. Conclusions**

The present study provides an initial assessment of the genetic diversity and structure of *D. odorifera* using 19 SSR markers. Medium genetic diversity at the species level and low genetic differentiation among populations were found in this endangered endemic tree species. This pattern of genetic variation may be primarily caused by extensive human activities, and this information could be used in establishing conservation strategies for this endangered species. In addition, the large number of SSR loci may serve as tools for assisting breeding programs in future studies.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/10/3/225/s1, Figure S1: SSR motifs distribution, Table S1: Summary of SSR identified from the transcriptome. Table S2: Details of 88 validate SSR markers. Table S3: Allele frequency distribution across 42 *Dalbergia odorifera*. Table S4: The Ewens-Watterson test for neutrality across 19 microsatellite loci in *Dalbergia odorifera*. File S1: Raw data. Zip, contains three files: file a: 1-Allele-PDF, Allele reports, captured all the peaks in \*\*.pdf. (\*\* locus code, S01-S30); file b: 2-Raw and Statistical data-Excel, the 42 samples are identified to the "SAMPLE" labeled in the Allele reports. Scored raw data in "Raw-S01-S30. xlsx", then corrected the wrong captures and defined the allele series as integers to generate "S01-S30.xlsx" for statistics; file c: 3-Status of 42 wild *D*. *odorifera* trees.

**Author Contributions:** Data curation, Z.H. and N.Z.; formal analysis, F.L., H.J.; funding acquisition, Z.H., D.X. and N.Z.; investigation, N.Z. and Z.Y.; methodology, Z.H., D.X., X.L., Z.Y. and M.L.; project administration, Z.H. and D.X.; resources, X.L., H.J.; supervision, D.X.; writing—original draft, F.L.; writing—review & editing, F.L. and M.L.

**Funding:** This research was funded by Research Funds for the Central Non-profit Research Institution of Chinese Academy of Forestry (CAFYBB2017ZX001-4), National Natural Science Foundation of China (31500537), and Science Innovation Projects of Guangdong Province (2016KJCX009).

**Acknowledgments:** The authors are very grateful to Szmidt, A.E. (Department of Biology, Kyushu University) for the helpful comments on this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Genetic Diversity and Structure through Three Cycles of a** *Eucalyptus urophylla* **S.T.Blake Breeding Program**

#### **Wanhong Lu 1, Roger J. Arnold 1,\*, Lei Zhang <sup>2</sup> and Jianzhong Luo <sup>1</sup>**


**\*** Correspondence: roger.arnold@y7mail.com; Tel.: +86-158-1170-6013

Received: 17 April 2018; Accepted: 15 June 2018; Published: 21 June 2018

**Abstract:** *Eucalyptus urophylla* S.T.Blake is an important commercial tropical plantation species worldwide. In China, a breeding program for this species has progressed through three cycles but genetic diversity and structure in the breeding populations are uncertain. A sampling of field trials from these populations was carried out to evaluate their genetic diversity and structure using 16 microsatellite loci. Significant deviations from Hardy-Weinberg equilibrium were recorded at all 16 loci in the populations. Overall expected and observed heterozygosity (He and Ho) estimates of 0.87 and 0.59 respectively for the first cycle population, and 0.88 and 0.60 respectively for the third cycle population, revealed reasonably high levels of genetic diversity. The genetic differentiation coefficient (Fst) revealed low differentiation among pairs of provenances (from the species' native range) comprising the first cycle population (range: 0.012–0.108), and AMOVA results showed that the majority of molecular genetic variation existed among individuals rather than among provenances for the first cycle population and among individuals rather than among field trial sources in the third cycle population. Levels of genetic diversity appeared to remain unchanged from the first to third cycle populations, and the results indicate prospects for maintaining if not increasing diversity through recurrent breeding. Likely effects of artificial directional selection, prior to sampling, on both populations examined are discussed along with implications for future *E. urophylla* breeding.

**Keywords:** microsatellite locus; Hardy-Weinberg equilibrium; genetic differentiation; breeding population; artificial selection

#### **1. Introduction**

*Eucalyptus urophylla* S.T.Blake is a tall forest tree that has a natural distribution spanning seven of the Lesser Sunda Islands in eastern Indonesia, where it is mostly found growing on volcanically derived soils, and it also extends into East Timor. Across this natural range, the species can vary from a tall forest tree up to 45 m high to a shrub-like form of less than 2 m [1,2]. Cross-pollination in the species is mostly effected by insects and birds [3] and though self-compatible, it is predominantly outcrossing but with a mixed mating system in natural stands [4]. On lower slopes it often co-occurs with *Eucalyptus alba* Reinw. ex Blume in mosaic stands [1] and it was only in 1977 that *E. urophylla* was described as a species separate from *E. alba* [3].

As an exotic forest plantation species *E. urophylla* is now one of the most commercially important hardwood species worldwide. Both the pure species and hybrid varieties involving this species (most commonly with *Eucalyptus grandis* W.Hill ex Maiden) are the foundation of substantial areas of commercial plantations in tropical and warmer subtropical regions for the production of pulpwood, fuelwood, poles, veneer logs and even saw logs [2,5,6]. It was first introduced to China in 1971 [7] and by the mid-1990s the species and its hybrids had become leading genetic material for commercial plantations established in tropical and warmer sub-tropical areas of southern China [8,9]. Today in this country, there are over three million hectares of plantations established with hybrid varieties of this species, and this resource provides livelihoods for hundreds of thousands of people [10].

Following the phenotypic diversity observed in the species for growth and stem-form through its natural range, high levels of genetic diversity have been recorded across this range through designed genetic field trials. Phenotypic observations/measurements on quantitative traits have been carried out in various statistically designed provenance/family cum progeny field trials of this species to examine genetic variability and diversity in adaptive and economically important growth, stem form and wood quality traits. Examples of such work with *E. urophylla* are provided by Hodge and Dvorak [11], who reported results for 65 provenances originating from the seven Indonesian islands where the species occurs naturally, that were tested in a series of 125 provenance/progeny trials planted in five countries, and by Kien et al. [12] who reported results on 144 families, representing 9 provenances, tested in two field trials located in northern Vietnam.

Genetic variation and relationships among and within natural populations of *E. urophylla* have also been examined in a number of molecular genetic studies. House and Bell [4] examined material from across the species' full natural range by using isozymes and found that most of the genetic diversity of the species was attributed to variation within populations. In contrast, they found that genetic differences between populations, at least for the isozymes examined, appeared to be small, while no striking patterns relating to geography were detected. Similarly, Payn et al. [13] and Payn et al. [14] investigated genetic diversity and geographical distribution of chloroplast DNA variations in the species and found moderate to high levels of genetic diversity throughout its geographic range (He = 0.70–0.78). However, the latter of these studies also found relatively low genetic differentiation among populations (Fst = 0.03), which the authors took to indicate low levels of recurrent gene flow among the Indonesian islands of the species occurrence.

In a separate study Tripiana et al. [15] used 10 microsatellite markers to study 360 seedlings, representing 49 provenances (referred to by them as "subpopulations"), spanning *E. urophylla*'s natural range in Indonesia and East Timor, which they grouped into 17 "natural populations", in order to assess the species' genetic diversity and structure. They found that microsatellite heterozygosity was moderate to high within populations (Ho = 0.51–0.72) based on the loci they examined. They also found that the index of fixation was significantly different from zero for all populations (FIS = 0.13–0.31), whilst the differentiation among populations was low (Fst = 0.04) and not significantly different from zero, due to extensive gene flow across the species' natural range via pollen flow. They also suggested that the FIS values observed might have been due to a Wahlund effect. The latter effect arises when two genetically distinct groups are (inadvertently or intentionally) lumped into a single sampling unit, either because they co-occur but rarely interbreed, or because the spatial scale chosen for sampling is larger than the true scale of a population (or of a subpopulation, depending on the definitions used for these terms). This Wahlund effect (substructure within populations) can lead to heterozygote deficits and deviations from Hardy-Weinberg equilibrium (HWE) [16,17].
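The Wahlund effect described above can be made concrete with a small numerical sketch for a single biallelic locus. It assumes two subpopulations that are each in Hardy-Weinberg equilibrium and compares the heterozygosity actually present in the pooled sample with the heterozygosity expected from the pooled allele frequency. The function name, the allele frequencies, and the equal mixing proportions are hypothetical.

```python
def wahlund(p1, p2, w1=0.5):
    """Return (observed H, expected H, F) when two HWE subpopulations are pooled.

    p1, p2: frequency of one allele in each subpopulation.
    w1: proportion of the pooled sample drawn from subpopulation 1.
    """
    w2 = 1.0 - w1
    p_bar = w1 * p1 + w2 * p2
    # Heterozygotes actually present: weighted mean of within-group 2pq.
    h_obs = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    # Heterozygotes expected if the pooled sample were one panmictic unit.
    h_exp = 2.0 * p_bar * (1.0 - p_bar)
    if h_exp == 0.0:
        raise ValueError("pooled sample is monomorphic; F is undefined")
    return h_obs, h_exp, 1.0 - h_obs / h_exp


# Hypothetical allele frequencies of 0.2 and 0.8 in two equally sampled groups.
print(wahlund(0.2, 0.8))
# With identical allele frequencies there is no apparent deficit.
print(wahlund(0.5, 0.5))
```

The apparent deficit grows with the variance in allele frequency between the pooled groups, which is why hidden substructure alone can produce positive FIS values and deviations from HWE.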

However, despite the convincing evidence presented by such molecular genetic studies on *E. urophylla*, it is noteworthy that House and Bell [4] saw a paradox in that an apparent lack of, or at least relatively minor, differentiation between populations of *E. urophylla* from its natural range found in isozyme, and subsequently also in molecular genetics studies, seemed somewhat contradictory compared to a high degree of population differentiation for morphological/adaptive traits. Pryor et al.'s [1] detailed study on morphological features from 23 populations across *E. urophylla*'s natural range supported the identification of three separate taxa from within the species: *E. urophylla*, *Eucalyptus orophila* L.D.Pryor and *Eucalyptus wetarensis* L.D.Pryor. Indeed House and Bell's [4] study identified a large degree of allelic diversity between the genetic material originating from the island of Wetar and that originating from other islands; the former populations having been classified as *E. wetarensis* by Pryor et al. [1]. Although such division into separate species has generally not yet been accepted either by eucalypt growers or researchers, a recently published taxonomic classification of *Eucalyptus* species does identify *E. urophylla*, *E. orophila* and *E. wetarensis* as separate species [18].

Though studies such as those cited above [1,4,11,12,14,15] have provided excellent insights into the natural populations of *E. urophylla*, today most commercial growers of the species no longer focus on genetic material collected directly from natural populations. Worldwide, many *E. urophylla* improvement programs have already captured in their breeding populations diverse genetic material originating from the species' natural range, and out of the initial breeding populations most of them have by now progressed through several cycles of selection and breeding.

During the 1980s and 1990s genetically diverse seedlots of *E. urophylla* were imported to China for the establishment of a base population to support ongoing genetic improvement [9,19]. With this material a first cycle breeding population of *E. urophylla* was established as a series of field trials during the period 1988 to 1998 and included over 400 open-pollinated families representing 30 provenances from the species' natural range along with families and seedlots from various planted stands/exotic seed sources. Then in 2004 a second cycle breeding population of *E. urophylla* was established in China as a single field trial, with open-pollinated (OP) seed collected from selected plus-trees of the first cycle population (i.e., OP families). In 2006, a Chinese cooperative tree improvement program was initiated involving commercial growers and a number of government research institutes. Through this cooperative program, a third cycle breeding population for *E. urophylla* was established in 2010, with material selected from both the first and the second cycle populations.

How much genetic diversity is currently present in the third cycle breeding population of *E. urophylla* in China, and how this has been affected by the preceding cycles of selection, are critical questions. Maintaining broad genetic diversity through the successive cycles of breeding is essential for achieving genetic gains from both the current and future cycles of this species. Variation is needed in key economic traits, so that artificial selection can ultimately result in heritable genetic improvements [2]. The genetic diversity—quantifiable, unquantifiable and/or "cryptic"—will serve as the primary basis for adaptation to future biotic and abiotic challenges, and selection for traits not currently seen or linked to economic values, e.g. adaptation to future climatic shifts and/or emergence of new pathogens [15].

Thus, the present study was initiated to examine the first and third cycle *E. urophylla* breeding populations in China using molecular genetic markers, with the following specific objective: to evaluate the potential loss of genetic diversity through breeding cycles by comparing the genetic diversity, as assessed by molecular markers, between the founding (first cycle) and the descendant (third cycle) breeding populations. The parameters estimated in this study will also provide a benchmark for comparison with future, successive breeding populations.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

The natural stand origins of some of the genetic materials included in the breeding populations and examined in this study are known to be on the island of Wetar, Indonesia, and thus could be classified as *E. wetarensis* according to Pryor et al. [1] and Nicole [18]. However, for the purpose of the study reported here, the taxonomic classification of *E. urophylla* according to Brooker [20], is followed and all material involved is referred to as *E. urophylla*.

Samples were obtained from the first cycle breeding population of *E. urophylla* in China; see Table 1, Figure 1, and Supplementary Material Table S1 and Figure S1. In mid-2016 three out of the five field trials comprising this population (T46, T77 and T94) were sampled. Regarding the two other trials of the same cycle that were not sampled, trial T54 was terminated some years ago and therefore not available for sampling, and trial TJJ was just a duplicate of trial T94.


**Table 1.** Details of the samples obtained from 3 field trials (T46, T77 and T94) of the first cycle breeding population of *E. urophylla*.

Note: \* Key to abbreviations: Mt = Mount; DMFF = Dongmen State Forest Farm, Guangxi, China; Ind. = Indonesia. \*\* numerical ID's of provenances relate to locations indicated by numbered blue dots in Figure 2; \*\*\* The Mt Egon provenance was included in both T46 and T77 trials, but was represented by different families each trial (i.e., families in the 2 trials were mutually exclusive).

**Figure 1.** Development of three successive cycles of *Eucalyptus urophylla* S.T.Blake breeding in China. Trials shaded in grey were not sampled for this study; circled numbers indicate number of families contributing to the succeeding trial indicated by the associated arrow.

From the first cycle breeding population, 202 families were sampled (Table 1). Of these families, 170 were first generation progeny from mother trees originating from 20 natural stand provenances (i.e., provenances from the species' native range) from 4 Indonesian islands (see Figure 2), and 32 families were progeny from plus-trees selected in earlier trials at Dongmen Forest Farm in China and presumed to be second generation genetic material. Unfortunately, the origins of the latter material are not known, as clear records are not available. The provenance categorization across the native range of the species was based on designations provided originally by the Australian Tree Seed Centre of CSIRO (Commonwealth Scientific and Industrial Research Organization), which was the supplier of the seedlots.

The single trial that comprised the second cycle breeding population (T135) had been intensively thinned and then suffered wind/typhoon damage over 5 years ago and was unsuitable for sampling. From the descendant third cycle breeding population, 125 families were sampled from one field trial (T164), planted in 2010, that initially included an almost comprehensive set of the seed sources and families of the third cycle breeding population (Table 2 and Supplementary Material Table S1, Table S2 and Figure S1). Though this third cycle population comprised multiple field trials, each contained about the same set of families, so just one trial was sampled for this study. Of the third cycle families sampled: 91 were progeny from mother trees selected in 4 trials of the first cycle breeding population (their mother trees represented 20 provenances from 4 Indonesian islands of the species' natural range); 20 were progeny from mother trees selected in the first cycle breeding population which themselves were progeny of plus-trees selected in earlier trials at Dongmen Forest Farm; and 14 were progeny from mother trees selected in the trial (T135) that comprised the second cycle breeding population, and were maternal descendants of first cycle families not related (maternally) to other families sampled from the third cycle population.

**Figure 2.** Geographic locations of *E. urophylla* natural stand provenances in Indonesia from which the genetic material included in the trials of the first cycle breeding population originated—the numbers linked to blue dots indicate the locations of the numbered provenances listed in Table 1.

Whilst each cycle of the breeding population might 'nominally' be considered as a generation, there is some variation among the families comprising each cycle with respect to the number of generations, at least on the maternal sides, and their descent from mother trees in natural stands in Indonesia and East Timor. The number of families sampled from the first and third cycles were a compromise between: (1) balancing the number of samples to represent the provenances/seed sources and their respective sizes (i.e., number of families from them) in each population; (2) trees available for sampling (considerations mostly for first cycle populations where thinning and some typhoon damage had reduced representation); and, (3) resources available for this study.

From each family sampled in each breeding cycle, fresh leaf tissue was collected from one tree (i.e., samples obtained from one tree per family, each family having originated from a different mother tree) for DNA extraction, in accordance with the methodologies used by Payn et al. [13], Payn et al. [14] and Tripiana et al. [15]. The one tree sampled per family in each cycle, was selected randomly, being the first tree found within each target family of the respective trials, starting from replicate 1 and working methodically through the replicates in numerical sequence.


**Table 2.** Details of the samples obtained from one of the field trials (T164), planted in 2010, of the third cycle breeding population of *E. urophylla*.

Note: \* Key to abbreviations: Mt = Mount; DMFF = Dongmen State Forest Farm, Guangxi, China; Ind. = Indonesia. \*\* numerical ID's of provenances relate to locations indicated by numbered blue dots in Figure 2.

#### *2.2. DNA Isolation and Microsatellite PCR Amplification*

Total genomic DNA was extracted from 300 mg of fresh leaf (from each tree sampled) using the modified cetyltrimethyl ammonium bromide (CTAB) method, following the methodology described by Wang [21]. DNA quality and quantity were determined by agarose gel electrophoresis and spectrophotometry, using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA).

Sixteen microsatellite loci were used in the study, all of which had been previously described and used in *Eucalyptus* species (Table 3). These loci were selected from among a total of 608 published microsatellite loci, as described by He et al. [22] and Brondani et al. [23], for their high polymorphism and allelic frequency differences between pools selected for genotyping a 'discovery' population following procedures described by Wang et al. [24].


**Table 3.** Details of 16 microsatellite loci examined in this study.

Note: EUCeSSR805 is an expressed sequence tag marker while all others are neutral markers across the whole genome.

Polymerase chain reactions (PCRs) were performed in a total volume of 10 μL, following a touchdown PCR procedure, as described by Li and Gan [25]. This procedure involved: incubation at 94 °C for 1 min; 20 cycles of incubation at 94 °C for 20 s, 66 °C for 30 s with a decrease of 0.5 °C per cycle, and 72 °C for 1 min; 25 cycles of 94 °C for 20 s, 56 °C for 30 s, 72 °C for 30 s; then, a final extension at 72 °C for 10 min. Fluorescein-12-dUTP (1 mM aqueous solution; MBI Fermentas Inc., Burlington, ON, Canada) was added to facilitate subsequent detection of PCR products using an Applied Biosystems 3130xl Series Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).
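The touchdown schedule described above can be written out cycle by cycle. The following is a minimal sketch only: it assumes that the first touchdown cycle anneals at 66 °C and that each later touchdown cycle is 0.5 °C cooler, which is one reading of the protocol text. The function name and its parameters are illustrative and do not come from Li and Gan [25].

```python
def touchdown_schedule(start_anneal=66.0, step=0.5, touchdown_cycles=20,
                       final_anneal=56.0, final_cycles=25):
    """Return the annealing temperature (deg C) assumed for each PCR cycle."""
    # Touchdown phase: annealing temperature drops by `step` after every cycle.
    touchdown = [start_anneal - step * i for i in range(touchdown_cycles)]
    # Second phase: constant annealing temperature.
    plateau = [final_anneal] * final_cycles
    return touchdown + plateau


schedule = touchdown_schedule()
print(len(schedule))                              # total amplification cycles
print(schedule[0], schedule[19], schedule[20])    # first, last touchdown, first plateau
```

Under this reading the last touchdown cycle anneals at 56.5 °C, so the switch to the fixed 56 °C phase continues the same 0.5 °C step.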

#### *2.3. Statistical Analyses*

GenePop v4.2 [26] software was used to test the breeding populations studied for Hardy-Weinberg equilibrium, heterozygote deficits and for heterozygote excesses based on the Markov chain method using 10,000 dememorizations, 20 batches and 5000 iterations per batch. The above provided probability test parameters for Hardy-Weinberg equilibrium (*P*HWE) separately for each one of the 16 microsatellite loci. Analyses with this software also provided estimates of the frequencies of null alleles at each locus.

The polymorphism information content (PIC) for every microsatellite locus was estimated separately for each breeding population by using the software PowerMarker v3.25 [27]. The number of alleles (Na), effective number of alleles (Ne), Shannon's Information Index (I), observed heterozygosity (Ho), expected heterozygosity (He), and Wright's fixation index (FIS) were also calculated separately for each locus (by population) using the software GenAlEx v6.4.1 [28], and then averaged across loci for each population. The same software was also used to determine counts of the number of private alleles (Npa) for each provenance and seed source [29]. Meanwhile, FSTAT v2.9.3.2 [30] software was used to evaluate genetic differentiation index (Fst) values on a pairwise basis between provenances of the first cycle breeding population.
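For readers unfamiliar with these indices, the sketch below computes them for a single locus from diploid genotypes using the standard textbook definitions: Ne = 1/Σp², I = −Σp ln p, He = 1 − Σp², FIS = (He − Ho)/He, and PIC = 1 − Σp² − ΣΣ2pᵢ²pⱼ². It is an illustration only. PowerMarker and GenAlEx may apply sample-size corrections or other options that are not reproduced here, and the function name and example genotypes are hypothetical.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Single-locus diversity indices from diploid genotypes.

    genotypes: list of (allele_1, allele_2) tuples, one tuple per sampled tree.
    """
    n = len(genotypes)
    allele_counts = Counter(a for pair in genotypes for a in pair)
    freqs = [c / (2 * n) for c in allele_counts.values()]
    sum_p2 = sum(p * p for p in freqs)

    na = len(freqs)                                  # observed number of alleles
    ne = 1.0 / sum_p2                                # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs)   # Shannon's information index
    ho = sum(a != b for a, b in genotypes) / n       # observed heterozygosity
    he = 1.0 - sum_p2                                # expected heterozygosity
    fis = (he - ho) / he if he > 0 else float("nan")
    # Polymorphism information content: subtract the pairwise correction term.
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(na) for j in range(i + 1, na))
    pic = 1.0 - sum_p2 - cross
    return {"Na": na, "Ne": ne, "I": shannon, "Ho": ho,
            "He": he, "FIS": fis, "PIC": pic}


# Hypothetical genotypes (allele sizes in bp) for four trees at one locus.
example = [(180, 180), (180, 184), (184, 184), (180, 184)]
result = locus_diversity(example)
print(result)
```

In this toy data set the two alleles are equally frequent and half of the trees are heterozygous, so Ho equals He and FIS is zero. A positive FIS, as reported in this study, arises when Ho falls below He.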

#### *2.4. Population Genetic Structure*

Analyses of molecular variance (AMOVA) were carried out separately for the first and third cycle populations using GenAlEx v6.4.1 to apportion genetic variance within each population, based on 999 permutations. For the first cycle population, data from provenances represented by four or fewer families was omitted (i.e., 5 provenances omitted), and across the other 15 provenances the genetic variance was apportioned into that attributed to variation among provenances, where 'provenance' refers to both natural stand provenances and exotic seed sources (i.e., collections from trials at Dongmen), and to variation among individuals within provenances. As origins of two provenances included in the first cycle—DMFF I and DMFF II—were somewhat uncertain, the AMOVA was then repeated for the first cycle population without these two provenances. For the third cycle population, genetic variance was attributed to variation among seed sources, with each trial contributing material designated as a separate seed source, and to variation among individuals within seed sources.

Nei's genetic distance between the 15 provenances that contributed to the first cycle population (represented by ≥5 families in our sample) was estimated using PowerMarker v3.25. These estimates were then used to create a neighbor-joining (NJ) dendrogram by applying cluster analysis using the unweighted pair group method with arithmetic means (UPGMA) and a bootstrap resampling number of 1000. MEGA v7 software [31] was used to edit the UPGMA-NJ dendrogram.

The number of genetically homogeneous clusters (K) in both the first and third cycle populations was estimated using the software STRUCTURE v2.3.4 [32], which uses a Bayesian model-based clustering method that does not require prior information on either the number of sampling sites or the locations from which the individuals were sampled. The program parameters in STRUCTURE v2.3.4 were set as recommended by Pritchard et al. [33], including the assumption of admixture among populations and correlated allele frequencies. A burn-in period of 50,000 iterations was followed by 100,000 iterations of the Markov Chain Monte Carlo model (MCMC). The model was run for a range of K values varying from 2 to 16, with 5 replicate runs for each K value. The optimal K value supported by the data was assessed according to the recommendations of Evanno et al. [34], whereby the statistic ΔK was calculated based on the rate of change in the log probability of data between successive K values. The optimal K value was taken as that with the highest ΔK, and this was determined using the software STRUCTURE Harvester v0.6 [35].
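The ΔK calculation can be sketched as follows. This is a simplified illustration, not the code of STRUCTURE Harvester: it takes the absolute second-order difference of the mean log probabilities, |L(K+1) − 2L(K) + L(K−1)|, and divides it by the standard deviation of L(K) across replicate runs. Implementations can differ in whether the absolute value is taken before or after averaging over replicates. The function name and the log-probability values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """lnp_by_k: dict mapping K -> list of ln P(D) values from replicate runs.

    Returns a dict K -> delta K for every K with both neighbours K-1 and K+1.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Hypothetical ln P(D) values, three replicate runs per K.
runs = {
    2: [-1000.0, -1002.0, -998.0],
    3: [-900.0, -901.0, -899.0],
    4: [-890.0, -892.0, -888.0],
    5: [-885.0, -887.0, -883.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because the statistic needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested (here, K = 2 and K = 16 in the range used for this study).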

#### **3. Results**

#### *3.1. Microsatellite Loci Diversity and Polymorphism*

In total, 459 and 428 alleles were identified across the 16 microsatellite loci examined in the first and third cycle populations respectively (Table 4). The average number of alleles per locus (Na) and the average effective number of alleles per locus (Ne) were similar in both populations at 28.7 and 10.6 respectively for the first cycle, and at 26.8 and 10.4 respectively for the third cycle. Averaged across loci, both the polymorphism information content (PIC) and Shannon's information index (I) values for the third cycle population, being 0.87 and 2.58 respectively, showed little change from the first cycle population (0.86 and 2.56 respectively). Similarly, the values of average observed heterozygosity (Ho) and average expected heterozygosity (He) remained similar from the first to third cycle population, as did the values of Wright's fixation index (FIS) (0.32 and 0.31 respectively) (Table 4); values which suggest an excess of homozygotes in both populations.

**Table 4.** Loci genetic diversity indices for the first and third cycle breeding populations of *E. urophylla*; except for 'N' and 'Na total' the numbers for each trait in each population represent means across the 16 loci analyzed with standard deviations of these means given in brackets.


Note: N, number of trees and families sampled; Na Total, total number of alleles observed across all loci; Na, observed number of alleles per locus averaged across the 16 loci for each population; Ne, effective number of alleles per locus averaged across loci; PIC, polymorphism information content, averaged across loci; I, Shannon's information index, averaged across loci; Ho, observed heterozygosity averaged across loci; He, expected heterozygosity, averaged across loci; FIS, Wright's fixation index averaged across loci.

#### *3.2. Hardy-Weinberg Equilibrium and Null Alleles*

Both the first and third cycle populations showed marked deviations from Hardy-Weinberg equilibrium: at all 16 loci analyzed, the probability tests rejected equilibrium, with the test probabilities (*P*HWE) being less than 0.01 at each locus. These results are in agreement with the excess of homozygotes suggested by some of the indices presented for both populations in Table 4.

Together with the indications of deviations from equilibrium at each locus, relatively high estimated frequencies of null alleles (from over 0.10 up to 0.36) were found at some loci in both populations studied (9 loci in both the first and third cycle populations). Although such null alleles were likely to have biased homozygote frequencies and hence the magnitudes of departure from the Hardy–Weinberg equilibrium, it is worth noting that all loci with low frequencies of null alleles (<0.05) also showed a departure from Hardy–Weinberg equilibrium.

#### *3.3. Population Diversity and Variation*

The portion of the first cycle population sampled in this study comprised predominantly first generation progeny (170 out of 202 families, representing 20 provenances) from the species' native range. The other families sampled (32 out of 202) were from mother trees selected at Dongmen Forest Farm (DMFF) and assumed to be second generation progeny. Among the 15 provenances represented by ≥5 families, Na by provenance (over all loci) ranged from 5.7 (Ilwaki) to 16.0 (Mt Egon) and Ne ranged from 3.8 (Ulanu River) to 8.3 (Mt Egon) and Shannon's information index (I) ranged from 1.44 (Ulanu River) to 2.25 (Mt Egon) (Table 5). Three of these 15 provenances—Andalan, Bangat and Wukoh—had no private alleles (i.e., Npa = 0) whilst in the other 12 of these provenances Npa varied from 2 (Jawaghar and Mt Lewotobi I) up to 18 (Mt Egon). Expected heterozygosity (He) by provenance ranged from 0.71 (Ulanu River) to 0.88 (Andalan) and observed heterozygosity (Ho) ranged from 0.33 (Ulanu River) to 0.69 (Jawaghar). The average of Wright's fixation index (FIS) across the 15 provenances was 0.28, and by provenance this parameter ranged from 0.13 (Jawaghar) to 0.54 (Ulanu River) indicating significant heterozygotic deficits and excesses of homozygotes for most provenances.
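As a rough arithmetic check on how these indices relate to one another, the fixation index can be recomputed from the heterozygosities reported above for Ulanu River using FIS = 1 − Ho/He. This is a sketch under a stated assumption: the published values are averages of per-locus estimates, so a ratio of the averaged Ho and He need not reproduce the averaged FIS exactly. The function name is illustrative.

```python
def fixation_index(ho, he):
    """Wright's fixation index from observed and expected heterozygosity."""
    return 1.0 - ho / he


# Ulanu River: Ho = 0.33 and He = 0.71, as reported in the text.
ulanu_fis = fixation_index(0.33, 0.71)
print(round(ulanu_fis, 2))  # 0.54, in line with the reported FIS for this provenance
```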


**Table 5.** Genetic diversity indices for the provenances comprising the first cycle breeding population of *E. urophylla*—results presented only for the 15 provenances which had 5 or more families sampled.

Note: N, number of trees and families sampled; Na, number of alleles per locus averaged across the 16 loci; Ne, number of effective alleles per locus averaged across loci; I, Shannon's information index, averaged across loci; Npa, total number of private alleles across all 16 loci; Ho, observed heterozygosity averaged across loci; He, expected heterozygosity averaged across loci; FIS, Wright's fixation index averaged across loci; Mt: Mount; Std. dev.: standard deviation.

For all the above genetic diversity indices, the two second generation seed sources were not exceptional; though DMFF I indices values were generally above those of DMFF II, both of them were only slightly above average. However, differences between these two sources may have been affected somewhat by the sample sizes; 23 families sampled represented DMFF I, but only 9 families represented DMFF II.

An AMOVA of hierarchical genetic variance across the 15 provenances of the first cycle breeding population represented by 5 or more families, revealed that only 3.4% of the total variance was attributed to variation among provenances, whilst most of the variance was due to variation among individuals within provenances (Table 6). When the AMOVA for the first cycle population was repeated after the two provenances of uncertain origins were removed—DMFF I and DMFF II—it yielded almost identical results to the previous one, indicating that those two sources, despite being potentially homogeneous with respect to provenance origins, had little effect on the AMOVA estimate regarding the level of among provenance variation. A separate AMOVA, based on the same loci, was carried out for the third cycle population and revealed similar results; genetic variation among individuals within seed sources accounted for almost 99% of the total molecular variance, whilst only 1% was attributed to variation among the seed sources (i.e., the five field trials of the first and second cycles from which the families were sourced).

**Table 6.** Analyses of molecular variance (AMOVA) based on 16 microsatellite loci across: 15 provenances of the first cycle breeding population of *E. urophylla* that were represented by 5 or more families; and, across the five field trial sources that contributed to the third cycle breeding population of *E. urophylla*.


<sup>1</sup> Significance levels of variance components were based on 999 permutations; \* indicates significant at *p* < 0.001.

#### *3.4. Genetic Structure of Populations*

The average genetic differentiation index (Fst) value across all pairwise provenance comparisons in the first cycle population was 0.044, and between any two provenances this index ranged from 0.012 (Egon vs. DMFF I) to 0.108 (Ulanu River vs. Jawaghar) (Table 7), suggesting a low degree of genetic differentiation among these *E. urophylla* provenances. The highest value for this index was between two provenances from geographically distant origins: Ulanu River (Alor) and Jawaghar (Flores), which are separated by a straight line distance of approximately 220 km. Similarly, the Fst values of Ulanu River vs. Iling Gele, Ulanu River vs. Mandiri, and Ulanu River vs. Wukoh were also higher than other pairwise provenance comparisons, and in each of these pairs the provenances were separated by straight line distances varying from around 184 to 222 km.


*Forests* **2018** , *9*, 372

#### *3.5. Structure*

Relationships between the 15 provenances from the first cycle population (that had 5 or more families sampled for this study) are summarized in an unrooted neighbor-joining dendrogram, which is based on Nei's genetic distance estimates, as shown in Figure 3. While the dendrogram generally does not show a strong connection to provenance geographic origins, it must be noted that many of the clusters and nodes identified in it are poorly supported (i.e., bootstrap values <50; see [36]). On account of this there can be little confidence in the patterns and genetic associations observed in the dendrogram. Of the two seed sources originating from previous trials (of exotic genetic material), DMFF I was closest to the provenances of Andalan, Bangat and Iling Gele from the island of Flores, whilst DMFF II was closest to the provenance Ulanu River from Alor Island, but these associations are uncertain on account of the low bootstrap values associated with their branches.

**Figure 3.** Unrooted neighbor joining (NJ) dendrogram for the 15 *E. urophylla* provenances (from which 5 or more families were sampled) represented in the first cycle breeding population in China, numbers placed at the head of branches are bootstrap values (based on 1000 iterations).

Bayesian cluster analysis performed using STRUCTURE software on the first cycle population (all provenances and families sampled) initially suggested the existence of 12 genetically homogeneous clusters within this population; log-likelihoods of the number of clusters plateaued at K = 12. However, the methodology of Evanno et al. [34] strongly supported K = 4 as the correct number of clusters within this population. As STRUCTURE has been found to work "extremely well for inferring the number of clusters" even with Fst values down as low as 0.02 [37], there can be reasonable confidence in these cluster numbers; in the present study only 5 of the 231 relevant pairwise provenance comparisons (first cycle population) had Fst values of less than 0.02.
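The figure of 231 pairwise comparisons is consistent with all sampled sources being compared with one another. The short check below assumes that the comparisons covered the 20 natural stand provenances plus the 2 Dongmen seed sources sampled from the first cycle; that reading is an inference from the sample description rather than something stated explicitly.

```python
from math import comb

n_sources = 20 + 2            # natural stand provenances + DMFF seed sources (assumed)
n_pairs = comb(n_sources, 2)  # number of unordered pairs of sources
print(n_pairs)                # 231
```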

Whilst the 12 clusters indicated by the initial analysis showed no discernible associations with geographic origins, the subsequently indicated 4 clusters aligned somewhat weakly with geographic provenance origins (Figure 4a). Individuals originating from earlier trials in China (i.e., DMFF I and DMFF II) were predominantly allocated to cluster 2, individuals from Flores Island were predominantly allocated to cluster 4 and those from Alor and Wetar were predominantly allocated to cluster 3. The members of cluster 1 were mixed, with only 34% or less of individuals originating from any particular island/seed source being allocated to this cluster. However, it must be noted that some individuals might have been assigned to the wrong cluster, as Fst needs to be over 0.05 to achieve an assignment accuracy of 95% or more [37]; in the present study, the average of Fst across all relevant pairwise provenance comparisons was 0.044.

Similar cluster analyses conducted on the third cycle population suggested that it could be separated into just three genetically homogeneous clusters, i.e., K = 3 from the methodology of Evanno et al. [34] (Figure 4b). Weak patterns were also evident in the membership of these three clusters, with progeny from trials E94, TJJ and E135 being predominantly assigned to cluster 1; the former two of these trials comprised predominantly provenances from the islands of Wetar and Alor, whilst the latter trial itself comprised the second cycle population (and was predominantly second generation trees). Progeny from trials E46 and E77 were predominantly allocated to cluster 3; both of these trials contained predominantly progeny from provenances originating from the island of Flores. For cluster 2, membership was mixed with only 33% or less of individuals from any source (field trial) being allocated to this cluster.

**Figure 4.** Proportions from each source group belonging to genetically homogenous clusters, determined by Bayesian cluster analyses with number of clusters determined by methods of Evanno et al. [34], for: (**a**) the first cycle population (presented by source groups comprising island/country of origin); and, (**b**) the third cycle population (presented by source groups comprising field trials of origin) of *E. urophylla*. Categories on X-axes represent geographic origins (parent locations), and these are for: (**a**) DMFF = Dongmen Forest Farm, China; Flores = Flores Island, Indonesia; Alor = Alor Island, Indonesia; Wetar = Wetar Island, Indonesia; for (**b**) E46, E77, E94 and TJJ are first cycle family trials, and E135 is the second cycle family trial.

#### **4. Discussion**

#### *4.1. First Cycle Population*

The first cycle breeding population, which included over 25 *E. urophylla* provenances originating from its natural distribution and 5 or possibly more exotic seed sources (though only 20 provenances and 2 exotic seed sources were sampled in this study), was established to provide a foundation for long term breeding through recurrent cycles of selection and inter-mating. The intentionally broad selection of provenances, representing 5 of the 7 islands where the species grows naturally in Indonesia and East Timor, was anticipated to provide broad genetic diversity and hence a solid foundation for long term genetic gains.

The diversity indices estimated in this study for the provenances included in the first cycle population, Na = 5.7–16.0 alleles/locus and average Ho and He of 0.58 and 0.84 respectively, indicate that it harbors reasonable levels of genetic diversity. While these results provide clear insights on the breeding population, it must be emphasized that they do not necessarily reflect the population genetic parameters of the 20 provenances of the natural range, nor when considered collectively do they reflect the genetic diversity within the species' entire natural range. Five of the provenances were represented by four or fewer families (thus DNA samples were only collected from four or fewer trees per provenance), and the trees that were sampled as representatives of those provenances had been subject to intensive artificial selection well before the sampling for this study was carried out.

Despite the above limitations, the genetic diversity parameters estimated for the first cycle breeding population were similar to those found by previous studies carried out on natural populations of this species not subjected to prior selection. The values estimated were also higher than those estimated for a number of other *Eucalyptus* species. For example, Jones et al. [38] obtained a He estimate of 0.62 from an *E. globulus* study that included 158 trees from four natural populations of the species, and Elliot and Byrne [39] reported He estimates, by population, in *E. occidentalis* ranging from 0.30 to 0.41.

The relatively low genetic differentiation observed for the first cycle population, as indicated by the pairwise Fst values that ranged from 0.008 to 0.108, was not unexpected. Similar values (for Fst) have been reported for natural populations of a range of other *Eucalyptus* species; i.e., Fst = 0.03 in *E. populnea* [40], Fst = 0.045 in *E. marginata* [41], Fst = 0.044–0.065 in *E. camaldulensis* [42], and Fst = 0.08 in *E. globulus* [38]. In the case of the species involved in the current study, House and Bell [4] suggested that pollen flow among populations and even among islands, mediated by birds and bats, may have contributed to the low differentiation between its geographically disparate populations.

The diversity indices estimated in this study from material sampled from the first cycle population (Table 5) were generally slightly lower than those found for "populations" of *E. urophylla*'s native range by two previous studies, though most "populations" in those previous studies encompassed multiple natural stand provenances. Payn et al.'s [13] study of 357 families from 19 populations (encompassing 45 natural stand provenances), representing all 7 islands, reported an average number of alleles (Na) per locus per population of 7.7 to 12.0, expected heterozygosities (He) per population of 0.44 to 0.90 and observed heterozygosities (Ho) per population of 0.44 to 0.78. Tripiana et al.'s [15] study of 17 populations (encompassing 49 natural stand provenances) found Ho values per population of 0.51 to 0.72 and average Na values per marker locus, by population, of 5.2 to 10.6. As well as sampling differences arising from examination of populations (each comprising multiple provenances) vs. individual provenances, the lower indices of the present study might also be due to the use of microsatellite markers different from those used by the earlier studies and/or the effects of artificial selection (prior to sampling) on the provenances examined in this present study.

Notable differences between this current study and both Payn et al.'s [13] and Tripiana et al.'s [15] studies regarding the indices estimated, were for Wright's fixation index (FIS). In the current study the estimates for this index for the first cycle population ranged from 0.13 to 0.54 at a provenance level (while when averaged across the 15 provenances represented by ≥5 families it was 0.28), and were generally higher than those reported by Payn et al. [13], who found values (per population) of just 0.017 to 0.150 and a little higher than those of Tripiana et al. [15], who reported values (per population) of 0.13 to 0.31. It is noteworthy that the latter authors suggested that the fixation index (FIS) values they reported could have been overestimated, as their DNA samples were extracted from non-selected seedlings that might have also included some seedlings originating from selfed seed. Given that intensive selection for growth and form had been a factor in the breeding populations sampled in this study, such a factor is quite unlikely to have contributed to our results.

Several factors might have contributed to the elevated fixation index values of the current study, compared to natural stand populations not subjected to artificial selection. In the first cycle population a number of provenances had relatively high and positive fixation index (FIS) values (>0.30), suggesting possible inbreeding; the provenance with the highest FIS value (Ulanu River, FIS = 0.54) had the lowest observed heterozygosity (Ho = 0.33).

The positive and relatively high fixation index values (FIS) found in the first cycle population of this study indicate a marked deviation from Hardy-Weinberg expectations, corroborating the Hardy-Weinberg test results obtained for individual loci. The latter result was not unexpected; both Tripiana et al. [15] and Faria et al. [43] had previously observed deviations from the Hardy-Weinberg equilibrium in natural stand genetic material of this species, a situation they attributed to an excess of homozygotes across most microsatellite loci they examined. At least three factors may have contributed to this situation in the first cycle population studied here.

Firstly, the presence of directional selection: a key requirement for equilibrium to be reached is the absence of directional selection [44], yet when sampled for this study, heavy selective thinning had already been performed on the population, leaving less than 15% of the trees originally planted (and 75% of families). But while such selection may have contributed to deviation from the Hardy-Weinberg equilibrium, it is hard to explain how selection for traits of relatively low heritability (Kien et al. [12] and Hodge and Dvorak [11] reported within-provenance narrow-sense heritabilities for growth traits of mostly less than 0.25) could have resulted in marked selective pressure on the alleles of the 16 microsatellite loci examined in this study, as most or all of these loci were likely neutral for the traits under selection (growth and form).

Secondly, the Wahlund effect; such an effect was suggested by Tripiana et al. [15] based on results of their study of 17 "natural populations" of *E. urophylla*, which included 49 provenances. Their aggregation of provenances resulted in their "populations" (within the total population they studied) having originated from wide geographical ranges, and each likely "consisted of several possibly differentiated subpopulations". For the present study, however, any Wahlund effect could not have come from lumping of geographically distinct groups into single sampling units: we examined heterozygosity (in the first cycle) by provenance (Table 5) and all but two of the provenances were natural stand provenances. Rather than being due to lumping, it is possible that variable levels of inbreeding and/or hybrid introgression within provenances resulted in some unknown substructuring within provenances, producing an apparent Wahlund effect. On all of the islands of *E. urophylla*'s natural range except Timor, the species and *E. alba* are sympatric at elevations between 400 and 800 m, and occasional natural hybrids between these two species have been recorded in their natural ranges [3,45]. On the island of Timor the two species occasionally co-occur [45]. Dvorak et al. [46] suggested that natural introgression with *E. alba* may have had great influence on the genetic architecture of *E. urophylla*, as natural hybridization and introgression are often apparent in field trials. Indeed, Hodge and Dvorak [11] noted that some *E. urophylla* provenances in their extensive trials had up to 50% of progeny being white-barked trees, which they considered as indicating high levels of introgression with *E. alba*. Bark characteristics of the two 'pure' species are distinct, with *E. alba* having a smooth white bark and *E. urophylla* having rough brown fibrous bark that varies from a short basal stocking through to covering the trunk and extending to small branches [3,46]. Besides such introgression, inbreeding might also have been a factor contributing to an apparent Wahlund effect in the first cycle *E. urophylla* population (as well as directly contributing to a violation of Hardy-Weinberg equilibrium through creating homozygote excesses). While inbreeding due to selfing was unlikely, as House and Bell [4] found *E. urophylla* to be predominantly outcrossing (at least in natural stands) with mean multi-locus outcrossing rates (t) of around 0.90 and low variation between individual trees in outcrossing rates, the species' mixed mating system could have contributed to inbreeding. House and Bell [4] had found high levels of Wright's fixation index (FIS), as is the case in the current study, which they attributed to prevalent breeding among close relatives (individuals with high coancestry) in natural stands, resulting in a lower level of inbreeding than actual selfing would produce.

Thirdly, the amplification failure of certain alleles at individual loci resulted in some null alleles and likely led to some heterozygotes being genotyped as homozygotes [17]. We estimated that null alleles occurred at frequencies (per locus) of 0.00 to 0.36 across the 16 microsatellite loci examined in the first cycle population. However, it must be noted that even at the seven loci where the frequencies of null alleles were low (below 0.05), significant deviations from the Hardy-Weinberg equilibrium still occurred.
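The Wahlund effect discussed above can be illustrated with a minimal numerical sketch. It assumes a single biallelic locus and two hypothetical subgroups that are each in Hardy-Weinberg equilibrium internally but differ in allele frequency; the function names and the frequencies used are illustrative assumptions, not values from this study.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus under Hardy-Weinberg."""
    return 2.0 * p * (1.0 - p)


def wahlund_fis(p1, p2, w1=0.5):
    """Apparent F_IS when two internally panmictic subgroups are pooled.

    p1, p2: frequency of the same allele in each subgroup (hypothetical values).
    w1: share of the pooled sample drawn from subgroup 1.
    """
    w2 = 1.0 - w1
    # Observed heterozygosity of the pooled sample is the weighted mean of the
    # heterozygosities within each subgroup.
    ho = w1 * expected_heterozygosity(p1) + w2 * expected_heterozygosity(p2)
    # Expected heterozygosity is computed from the pooled allele frequency,
    # as it would be if the substructure were unknown.
    p_pooled = w1 * p1 + w2 * p2
    he = expected_heterozygosity(p_pooled)
    return 1.0 - ho / he


if __name__ == "__main__":
    # Differentiated subgroups give a positive apparent F_IS.
    print(round(wahlund_fis(0.2, 0.8), 3))
    # Identical subgroups give no heterozygote deficit.
    print(round(wahlund_fis(0.5, 0.5), 3))
```

In this toy case, the heterozygote deficit arises purely from hidden substructure, without any inbreeding within either subgroup.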

Out of the total molecular genetic variance recorded in the first cycle population, the vast majority (96.6%) was associated with variation among individuals within provenances. Despite the artificial selection that this population had been subject to, the percentage of variance attributed to variation among individuals was almost the same as that reported by Payn et al. [13], whose AMOVA showed that variation among unselected individuals within populations (most of which incorporated a range of geographically proximal provenances) also accounted for around 96.6% of the total molecular variance.

Even so, artificial selection probably played some role in the generally poor support (low bootstrap values), and hence lack of clear patterns, in the topology of the dendrogram illustrating genetic distances and clustering of the provenances, that was generated for the first cycle population. This result was contrary to results obtained by Payn et al. [13], whose dendrogram generated for 19 *E. urophylla* 'populations' of the species' native range coincided to a large extent with the geographic origins; their populations were generally clustered by island of origin. The latter authors identified two genetically homogenous groups (clusters), with strong geographic patterns; one of the clusters was clearly dominated by populations from the western islands (i.e., Flores, Pantar, Alor, Lomblen and Adanara) and the other one by populations from the eastern islands (Wetar and Timor). A number of factors in addition to artificial selection likely contributed to the differences between Payn et al.'s [13] cluster analysis results and those obtained from the present study (for the first cycle population), including sampling differences and loci differences, besides the fact that alignment of genetic differentiation with geographic origins is generally weak for *E. urophylla* for a variety of traits [4,13].

#### *4.2. Third Cycle Population*

The third cycle *E. urophylla* population involved in this study included families selected from the populations of both preceding cycles. This strategy was implemented in order to capture more genetic diversity than would likely have been available from just the single, somewhat limited trial that comprised the second cycle population (it omitted selections from two of the first cycle trials). The genetic diversity indices estimated for the third cycle population, with a total number of alleles (Na) of 26.8, an observed heterozygosity (Ho) of 0.60 and an expected heterozygosity (He) of 0.88, indicate that it does indeed contain a reasonable amount of genetic diversity, very close to that found in the first cycle population. This result was obtained even though the third cycle population (and the sample obtained from it) was smaller, with respect to the number of families, than the first cycle population (and the sample obtained from it).

Departure from Hardy-Weinberg equilibrium persisted in the third cycle too. This was indicated by the expected and observed heterozygosity estimates, the value estimated for Wright's fixation index (FIS), and the Hardy-Weinberg equilibrium probability test parameters for each of the 16 microsatellite loci. Had panmixia been achieved among the individuals selected for retention in the trials of the first and second cycle populations, the descendant third cycle population would have been expected to approximate equilibrium somewhat more closely. This was not the case, and the reasons for this outcome need to be considered.

One factor contributing to the third cycle population's deviation from Hardy-Weinberg equilibrium would have been artificial selection. Parents of all families included were superior trees selected within the field trials comprising the first and second cycle populations. Then, within the third cycle population, additional selection had been carried out prior to sampling: at the time of sampling for this study only 25% of trees originally planted remained in the trials, representing 158 out of the 195 families initially included.

Could the Wahlund effect, whatever its actual cause, also have persisted through to the third cycle and contributed to the observed departure from Equilibrium? This population comprised families that were second and third generation descendants from a wide range of provenances that exhibited generally limited genetic differentiation, while a range of factors discussed above could have contributed to a substructuring of the population groups examined into gene pools differing in their allele frequencies. Substructuring might have arisen due to provenance origins, with effects persisting through the cycles on account of inadequate panmixia, and/or due to some individuals having originated through hybridization with *E. alba*, and/or having varying levels of inbreeding.

Panmixia might not have been achieved in the first and second cycle populations due to a combination of differences in phenology associated with geographic ancestral origins (i.e., between islands of origin and even between provenances within islands) and/or spatial separation of the separate field trials comprising the first cycle population. Swain et al. [47] found that ancestral provenance origins had a significant influence on genetic parameters into at least the second generation of an *E. nitens* breeding population (i.e., grand-maternal provenance effects were evident). Differences in phenology could have been a factor in the maintenance of provenance effects through generations of open pollinated breeding.

Differences between the field trials of origin (i.e., immediate parents) with regard to average effective outcrossing rates between the selected parental plus-trees may also have contributed to substructuring. It is well known that outcrossing rates can affect the genetic quality of seed from *Eucalyptus* species [48]. Although the overall genetic diversity of the third cycle population seemed reasonable (compared to the first cycle population), outcrossing rates were not examined specifically.

Also, even into the third cycle population differential introgression (from *E. alba*) might still be leading to some substructuring. Parent tree selection in the first and second cycles was based on growth and form, irrespective of bark characteristics and/or leaf morphology, and thus some of the parent trees selected could well have carried significant amounts of *E. alba* alleles. In an intensive, well advanced breeding program (>3 generations) for *E. urophylla* in Indonesia, it was found that even after several generations of selection as *E. urophylla*, distinct *E. alba* traits were still present in some individuals [49].

Results from the cluster analyses of the third cycle population, when considered together with those from the first cycle population, accord somewhat with results from the studies of House and Bell [4] and Payn et al. [13], in that populations from the island Wetar exhibited a level of genetic differentiation from populations on Flores and closely adjacent islands, i.e., on Payn et al.'s [13] "western islands". However, the cluster analysis together with the AMOVA carried out in the current study also generally concur with House and Bell's [4] conclusions in that genetic differences and differentiation between populations originating across *E. urophylla's* natural range are generally small.

While the observed heterozygosity was around 30% below the expected heterozygosity in the third cycle population, the fixation index was almost unchanged from that estimated for the first cycle population. This lack of change suggests no change in genetic variability between the studied populations. Reasons for the discrepancy between the observed and expected heterozygosity are uncertain. Even if inbreeding might have been expected to contribute to such discrepancies in the first cycle population (as it mainly comprised natural stand progeny), lower rates of inbreeding were expected in the third cycle population due to the crossing between unrelated individuals in the preceding one to two generations.
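The relation between the heterozygosity estimates and the fixation index can be checked with a short sketch. It assumes the basic single-population form F = 1 − Ho/He; the estimator actually used for the FIS values reported in this study may differ (for example, a multi-locus estimator), so the result is indicative only. The function name is a hypothetical helper.

```python
def fixation_index(ho, he):
    """Basic fixation index F = 1 - Ho/He (an assumed, simplified estimator)."""
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


if __name__ == "__main__":
    # Third cycle population estimates quoted in the text: Ho = 0.60, He = 0.88.
    print(round(fixation_index(0.60, 0.88), 3))
```

With the third cycle values, the shortfall of Ho relative to He is a little over 30%, in line with the discrepancy described above.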

#### *4.3. Practical Implications*

Domesticated populations of many forest tree species have been found to show relatively little reduction in overall expected heterozygosity compared to the native distributions from which they were derived [50]. The current study shows that this also holds for the *E. urophylla* breeding populations in China, at least up to the third cycle. Lefevre [50] also suggested that apparent genetic diversity in the breeding population of a domesticated tree species might be increased, relative to the original populations, due to crossing between differentiated native populations. However, results from House and Bell [4], Tripiana et al. [15] and Payn et al. [13], as well as the current study, suggest that the differentiation between the native populations of *E. urophylla* is relatively minor. Payn et al. [13] found that significant gene flow among *E. urophylla*'s geographically separate island populations has had continuous influence on the genetic diversity of the species in its natural range. Thus, crossing, within a diverse breeding population of this species, between geographically disparate native provenances, might be unlikely to result in much change to genetic diversity. Indeed the results obtained from the third cycle population of this current study support this.

Even so, evidence from other studies attests to real prospects for increasing diversity somewhat through future *E. urophylla* breeding cycles in China. Across four generations of *E. urophylla* breeding in Brazil, Pigato and Lopes [51] found genetic distances between individuals to be markedly higher in their third and fourth generations than in their first and second generations, which they took as indicating an increase in genetic variability with the advance of their program. In an *E. regnans* program in New Zealand that had progressed through three cycles of breeding, Suontama et al. [52] found that some of their third cycle field trials provided the largest heritabilities and additive genetic variances (for height at age 3 years).

In order to better understand the genetic architecture of the current and future cycles of the *E. urophylla* breeding program in China, it would be of value in future assessments to score trees selected (as parents for a subsequent cycle) for *E. alba* characteristics. Such characteristics include large and roundish leaves with a blunt tip and/or smooth white bark on the mature tree stems, compared to *E. urophylla* with broad-lanceolate leaves narrowing abruptly to a short point and a variable stocking of rough, sub-fibrous bark [3,45]. Such phenotypic data would enable some examination of the proportion of individuals that potentially express some level of introgression, and hence enable a better understanding of sources of variation within the breeding population.

The question of whether panmixia can be achieved in the *E. urophylla* breeding populations studied is also of critical importance; results from the current study suggest that it has not been achieved in at least the first and/or second cycles. An assessment of phenology to understand any temporal differences on account of ancestral geographical origins (i.e., island and provenances within island origins) is needed to understand if there is asynchronous flowering within the third cycle population, a process that might be creating unwanted substructure within the population. If real substructures and/or barriers to free interbreeding do exist, then measures such as controlled pollinations and/or sublining of the main population might be required to manage barriers to panmixia and prevent unwanted substructures limiting the potential for future genetic gains.

That null alleles were recorded at relatively high frequencies at some of the loci examined in both the first and third cycle populations indicates a possible shortcoming in the current study's methodology. Such null alleles might be biasing some parameter estimates, as they can result in inflated heterozygote deficits and decreased estimates of Ho, He, and even genetic diversity [53]. No repeated marker amplifications were carried out in the current study, either for all samples or for a random subset of them, so estimation of error rates on locus scoring was not possible. Although this shortcoming is common among the majority of published studies on microsatellites in eucalypts, as well as in other plants (see [17]), it would be better if it were avoided. In any future work on *E. urophylla* and/or other species, we would undertake repeated marker amplification of at least a random subset (10–15%) of samples, to enable error rates to be calculated from the number of inconsistent genotypes between the first and second amplification attempts [54].
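The error-rate calculation proposed above could be sketched as follows. This is an illustrative outline only: the data layout (dictionaries keyed by sample and locus), the function name, and the sample and locus labels are all assumptions, and genotypes are compared without regard to allele order.

```python
def genotyping_error_rate(first_run, second_run):
    """Share of repeated genotypes that disagree between two amplifications.

    first_run, second_run: dicts mapping (sample_id, locus) to a tuple of two
    allele sizes. Only sample/locus pairs present in both runs are compared.
    """
    shared = first_run.keys() & second_run.keys()
    if not shared:
        raise ValueError("no genotypes were scored in both runs")
    mismatches = sum(
        1 for key in shared
        if sorted(first_run[key]) != sorted(second_run[key])
    )
    return mismatches / len(shared)


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for two samples at two placeholder loci.
    run1 = {("tree01", "locusA"): (120, 124), ("tree01", "locusB"): (200, 200),
            ("tree02", "locusA"): (118, 124), ("tree02", "locusB"): (198, 204)}
    run2 = {("tree01", "locusA"): (124, 120), ("tree01", "locusB"): (200, 200),
            ("tree02", "locusA"): (124, 124), ("tree02", "locusB"): (198, 204)}
    print(genotyping_error_rate(run1, run2))
```

In the example, one of four repeated genotypes differs (a heterozygote rescored as a homozygote, the pattern expected from allelic dropout), so the per-genotype error rate is 0.25.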

#### **5. Conclusions**

On account of a number of factors, including the origin of genetic materials used to develop the founding (first cycle) population for the *E. urophylla* breeding program in China, deficits of heterozygotes were found for the 16 microsatellite loci examined in both the founding (first cycle) and the descendant (third cycle) breeding populations of *E. urophylla*. Even so, the high allelic diversity observed in the founding population was maintained through cycles of intensive selection into the descendant third cycle population.

Most of the genetic variation within the two populations examined in this study existed among individuals, rather than between provenances or seed sources. This finding reaffirms that the number of (unrelated) individuals included in the populations was the key to capturing adequate genetic variation, rather than the number of seed sources/provenances represented by such individuals. Similarly, the results suggested that maintaining a high number of unrelated individuals in descendant populations should contribute to maintenance of genetic variation.

In general, the level of genetic diversity was maintained through the successive cycles observed in the current study. This indicates good prospects for maintaining if not increasing diversity through future descendant cycles of breeding *E. urophylla.*

#### **6. Data Archiving**

The data obtained in the course of this study, consisting of genotypic data for 16 microsatellite loci from 202 individuals of the first cycle population and 125 individuals from the third cycle population along with the necessary pedigree data, will be submitted to an appropriate online data repository upon article acceptance.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/9/7/372/s1, Table S1: Details of the *E. urophylla* trials comprising the first, second and third cycle breeding populations for this species in China. Table S2: Additional details on Trial T164 (third cycle breeding population). Figure S1: Locations of the *E. urophylla* trials comprising the first, second and third cycle breeding populations for this species in China.

**Author Contributions:** W.L. and J.L. conceived and designed the study; W.L. and L.Z. organized and conducted the trial sampling and managed the samples; W.L. carried out the laboratory analyses; W.L., L.Z., J.L. and R.J.A. collated, managed and analyzed data; W.L. and R.J.A. wrote the paper.

**Funding:** This research was funded by the Fundamental Research Funds of the Chinese Academy of Forestry (project number CAFYBB2017MA022) and the National Natural Science Foundation of China (project number 31700599).

**Acknowledgments:** We are grateful to Lan Jun from Dongmen State Forest Farm for the assistance in accessing the field trials sampled in this study, and to Paul Macdonell from Queensland, Australia, for preparation of Figure 1. Two anonymous reviewers provided numerous suggestions for significantly improving the content of this paper and we greatly appreciate their guidance.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Identification of miRNAs Associated with Graft Union Development in Pecan [***Carya illinoinensis* **(Wangenh.) K. Koch]**

#### **Zhenghai Mo <sup>1,2,3</sup>, Gang Feng <sup>1</sup>, Wenchuan Su <sup>1</sup>, Zhuangzhuang Liu <sup>1</sup> and Fangren Peng <sup>1,\*</sup>**


Received: 23 May 2018; Accepted: 31 July 2018; Published: 3 August 2018

**Abstract:** Pecan [*Carya illinoinensis* (Wangenh.) K. Koch] is a high-value fruit tree with a long juvenile period. The fruiting process of pecan seedlings can be largely accelerated through grafting. As non-coding small RNAs, plant miRNAs participate in various biological processes through negative regulation of gene expression. To reveal the roles of miRNAs in the graft union development of pecan, four small RNA libraries were constructed from the graft union at days 0, 8, 15, and 30 after grafting. A total of 47 conserved miRNAs belonging to 31 families and 39 novel miRNAs were identified. For identified miRNAs, 584 target genes were bioinformatically predicted, and 266 of them were annotated; 29 miRNAs (including 16 conserved and 13 novel miRNAs) were differentially expressed during the graft process. The expression profiles of 12 miRNAs were further validated by quantitative reverse transcription PCR (qRT-PCR). In addition, qRT-PCR revealed that the expression levels of 3 target genes were negatively correlated with their corresponding miRNAs. We found that miRS26 might be involved in callus formation; miR156, miR160, miR164, miR166, and miRS10 might be associated with vascular bundle formation. These results indicate that miRNA-mediated gene regulation plays an important role in the graft union development of pecan.

**Keywords:** grafting; pecan; miRNA; graft union; sequencing

#### **1. Introduction**

Grafting, as an asexual propagation technology, has been applied extensively in fruit trees to aid the adaptation of scion cultivars to potentially disadvantageous soil and climatic conditions, avoid the juvenile period, increase productivity, and improve quality [1]. Successful grafting is a complicated process that involves the initial adhesion of rootstock and scion, callus formation, and vascular connection at the graft union [2]. It has been reported that phytohormones (especially auxin) and antioxidant enzymes are important players during graft union development [3–6]. At the molecular level, a successful graft is controlled by numerous genes in plants, especially those involved in hormone signaling. cDNA amplified fragment length polymorphism (AFLP) analysis of the graft union in hickory [*Carya tomentosa* (Lam.) Nutt.] indicated that some genes related to signal transduction, metabolism, auxin transportation, wound response, cell cycle, and cell wall synthesis were responsive to grafting [7]. In *Arabidopsis*, genes involved in hormone signaling, wounding, and cellular debris clearing were induced during graft union development [8]. In grapevine, graft union formation activated the differential expression of genes participating in secondary metabolism, cell wall modification, and signaling [9]. Transcriptomic analysis of the graft union in *Litchi chinensis* Sonn. revealed that 9 unigenes annotated in auxin signaling had higher expression levels in the compatible grafts compared with the incompatible ones [10].

miRNAs, a category of non-coding RNAs of approximately 22 nucleotides (nt), are critical regulatory molecules of gene expression; they induce either post-transcriptional degradation or translational inhibition of their target mRNAs [11]. In plants, miRNAs bind to their target mRNA sequences with perfect or near-perfect complementarity, and negatively regulate gene expression mainly via targeted cleavage [12]. The binding sites of plant miRNAs are almost exclusively located within the open reading frames of their target genes [13]. In recent years, with the development of second-generation sequencing technology, it has become possible to identify miRNAs in non-model plants [14,15]. Numerous studies have suggested that miRNAs play regulatory roles in plant resistance to biotic and abiotic stresses, such as cold [16], heat [17], and virus infection [18]. In addition, miRNAs have been confirmed to participate in various developmental processes [19–21]. In grafted plants, miRNAs have been reported to be involved in the regulation of scion and rootstock interaction. In watermelon cultivation, grafting is commonly used to increase resistance to environmental stresses. With high-throughput sequencing, Liu et al. [22] found that miRNAs were differentially expressed in grafted watermelon to regulate plant adaptation to stresses. Li et al. [23] identified graft-responsive miRNAs in cucumber/pumpkin and pumpkin/cucumber heterografts, and found that miRNAs were involved in regulating physiological processes of heterografts. Khaldun et al. [24] investigated the expression profiles of miRNAs within a distant grafting of tomato/goji, and the results showed that, compared with tomato autografts, tomato/goji heterografts had 43 and 163 differentially expressed miRNAs in shoot and fruit, respectively.
Although mounting evidence indicates the involvement of miRNAs in scion-rootstock interactions, only one published report, in hickory, has addressed the functions of miRNAs that participate in the graft process [25].

Pecan [*Carya illinoinensis* (Wangenh.) K. Koch] is an economically important nut tree which belongs to the family Juglandaceae and genus *Carya*. It has been widely planted in China in recent years. As a woody plant, pecan has a very long juvenile phase, lasting about 10 years in seedlings. To accelerate the fruit bearing process, grafting is widely used in the cultivation of pecan, allowing trees to begin producing fruit in 5–7 years. In industrial pecan cultivation, grafting success rate is very low; 75% grafting success is considered to be good [26]. Nowadays, in China, the graft technique of patch budding can sometimes achieve a grafting success of 90% for some cultivars of pecan, such as 'Pawnee', 'Stuart', and 'Shaoxing'. However, a low grafting success rate still exists in some cultivars, such as 'Mahan' and 'Jinhua'. To improve the graft survival rate of industrial pecan, a better understanding of the mechanism associated with graft union development is needed. In our previous studies, morphological, proteomic, and transcriptomic analyses were conducted on the graft process of pecan [27,28]. In this work, we investigated miRNA expression during the graft process of pecan using RNA-sequencing technology. Four small RNA libraries from the graft union collected at different time points (days 0, 8, 15, and 30 after grafting) were constructed, and the differentially expressed miRNAs were analyzed. Subsequently, the potential roles of these miRNAs and their target genes were discussed.

#### **2. Materials and Methods**

#### *2.1. Plant Materials*

Pecan homografting was performed through patch grafting in August at the experimental farm at Nanjing Forestry University. The pecan cultivar 'Pawnee' was used as scion, and one-year-old seedlings propagated from pecan seeds were used as rootstock. Based on our morphological observation of graft union development, samples from the graft unions (approximately 5 mm in length, the budding segment that includes the tissues of scion, and the developing xylem of rootstock) were collected at day 0 (ungrafted material, used as the control), day 8 (the stage of initial callus proliferation), day 15 (the stage of massive callus proliferation along with cambium establishment), and day 30 (the stage of vascular bundle formation). For each sample, three different graft unions were pooled and frozen in liquid nitrogen immediately, and then stored at −80 °C until required for use.

#### *2.2. RNA Extraction and Deep Sequencing of Small RNA*

Total RNA was isolated from graft unions at four time points using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), following the manufacturer's instructions, and then digested with RNase-free DNase I (Takara, Kyoto, Japan) to degrade genomic DNA. Sequencing libraries were constructed with the NEBNext® Ultra™ small RNA Sample Library Prep Kit for Illumina® (NEB, Boston, MA, USA) according to the protocol. Briefly, for each sample, approximately 1.5 μg RNA was ligated to 5′ and 3′ adapters by T4 RNA ligase. Next, first-strand cDNA was synthesized by reverse transcription, followed by PCR amplification. The resulting PCR products were subjected to polyacrylamide gel electrophoresis, and the 140–160 bp fragments were selected for sequencing. The raw sequencing data were deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP131300.

#### *2.3. Sequence Analysis and Target Prediction of Pecan miRNA*

Following sequencing, raw reads of the four libraries were processed through in-house Perl scripts. In this step, clean reads were obtained by removing low-quality reads and trimming adapter sequences. Reads shorter than 18 nt or longer than 30 nt were also discarded. Using Bowtie, clean reads of 18–30 nt in length were subsequently aligned against the Rfam (http://www.sanger.ac.uk/software/Rfam) and Repbase (http://www.girinst.org/) databases to filter out rRNA, tRNA, snRNA, snoRNA, other ncRNA and repeats. The remaining sequences were aligned with the miRBase 21.0 database (http://www.mirbase.org/index.shtml) to identify putative conserved miRNAs, allowing no more than two mismatches. The remaining non-annotated reads were mapped to the transcriptome data of pecan graft union development (accession numbers SRP118757 and GGRT00000000 in the NCBI database) to predict potential novel miRNAs with miRDeep2. The criteria for novel miRNA identification were as follows: (1) miRNA precursors could form hairpin-like structures; (2) the miRNA should have a corresponding miRNA\* in the sequencing data, and the two could form a duplex with 2 nt 3′ overhangs; (3) in miRNA\*-deficient cases, candidate miRNAs should derive from multiple and independent libraries [29]. The secondary structures of novel miRNAs were predicted by Randfold software. Putative targets of miRNAs were predicted by TargetFinder, and then annotated based on the Nr (NCBI non-redundant protein sequences), Protein family (Pfam) and GO (Gene Ontology) databases. The expression values of putative target genes were obtained from the supplementary materials of our previously published paper (http://www.mdpi.com/2073-4425/9/2/71/s1) [27].
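The length filter and the two-mismatch criterion described above can be sketched in a few lines. This is not the in-house Perl pipeline or the Bowtie/miRBase alignment used in the study: it is a simplified stand-in that compares equal-length sequences position by position, and the function names and example sequences are hypothetical.

```python
def filter_by_length(reads, min_len=18, max_len=30):
    """Keep reads whose length lies between min_len and max_len nt, inclusive."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def count_mismatches(read, reference):
    """Count mismatched positions between two equal-length sequences."""
    if len(read) != len(reference):
        raise ValueError("sequences must have equal length")
    return sum(1 for a, b in zip(read, reference) if a != b)


def matching_mirnas(read, known_mirnas, max_mismatches=2):
    """Names of known miRNAs that the read matches within the mismatch limit.

    known_mirnas: dict of name -> mature sequence (placeholder data here).
    Only equal-length comparisons are made, a simplification of real alignment.
    """
    return [
        name
        for name, seq in known_mirnas.items()
        if len(seq) == len(read) and count_mismatches(read, seq) <= max_mismatches
    ]


if __name__ == "__main__":
    known = {"example_miRNA": "UGACAGAAGAGAGUGAGCAC"}  # placeholder entry
    reads = ["ACGU", "UGACAGAAGAGAGUGAGCAA", "A" * 31]
    for read in filter_by_length(reads):
        print(read, matching_mirnas(read, known))
```

A real aligner also handles length differences and reports alignment positions, which this sketch deliberately omits.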

#### *2.4. Analysis of Differentially Expressed miRNAs*

To calculate the expression levels of miRNAs in the four libraries, miRNA counts were first normalized as transcripts per million (TPM) using the following formula: TPM = mapped read count/total reads × 10<sup>6</sup>. Fold changes of miRNAs in three comparisons (day 8/day 0, day 15/day 0, and day 30/day 0) were analyzed by IDEG6, and miRNAs were considered to be differentially expressed with a corrected *p* value (*q* value) < 0.05 and an absolute log<sub>2</sub> fold change > 1.
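The normalization and thresholding described above can be expressed as a short sketch. The statistical testing performed by IDEG6 is not reproduced here: the *q* value is taken as a given input, and the pseudocount used to avoid division by zero is an assumption of this sketch rather than a detail reported in the text.

```python
import math


def tpm(mapped_read_count, total_reads):
    """Transcripts per million: mapped read count / total reads x 10^6."""
    return mapped_read_count * 1_000_000 / total_reads


def log2_fold_change(tpm_treated, tpm_control, pseudocount=0.01):
    """log2 ratio of two TPM values; the pseudocount is an assumed safeguard."""
    return math.log2((tpm_treated + pseudocount) / (tpm_control + pseudocount))


def is_differentially_expressed(log2fc, q_value, q_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the thresholds from the text: q < 0.05 and |log2 fold change| > 1."""
    return q_value < q_cutoff and abs(log2fc) > lfc_cutoff


if __name__ == "__main__":
    # Hypothetical counts for one miRNA at day 0 and day 8.
    day0 = tpm(mapped_read_count=100, total_reads=2_000_000)
    day8 = tpm(mapped_read_count=500, total_reads=2_500_000)
    lfc = log2_fold_change(day8, day0)
    print(day0, day8, round(lfc, 2), is_differentially_expressed(lfc, q_value=0.01))
```

Normalizing before taking the ratio matters because the four libraries differ in sequencing depth.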

#### *2.5. Quantitative Real-Time PCR (qRT-PCR)*

To validate the expression profiles of miRNAs, graft unions were collected at days 0, 8, 15, and 30 after grafting, with three biological replicates. miRNAs were isolated with the Universal Plant microRNA Kit (BioTeke, Beijing, China). The subsequent reverse transcription and real-time PCR were carried out using the BioTeke miRNA First Strand cDNA synthesis kit (BioTeke, Beijing, China) and the BioTeke miRNA qPCR Detection Kit (BioTeke, Beijing, China), respectively. For target gene detection, total RNAs were extracted from the same samples. First-strand cDNA synthesis and the following real-time qPCR were conducted with the Prime-Script™ II First Strand cDNA synthesis kit (Takara, Dalian, China) and the SYBR Premix Ex Taq™ II kit (Takara, Dalian, China), respectively. Primers were designed based on the sequences of the corresponding miRNAs and mRNAs, and are listed in Table S1. 5.8S rRNA was chosen as the internal reference for miRNA normalization, while Actin was used as the endogenous reference for mRNA analysis. All qPCR reactions were run in three technical replicates. The relative expression levels of miRNAs and mRNAs were calculated using the comparative 2<sup>−ΔΔCt</sup> method.
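The comparative 2<sup>−ΔΔCt</sup> calculation can be written out as a short Python sketch. The function names and the Ct values in the usage note are hypothetical; the sketch assumes that technical replicates are averaged before the calculation and that the day 0 sample serves as the calibrator, in line with the figure legends that set day 0 to 1.

```python
from statistics import mean
from typing import Sequence


def mean_ct(technical_replicates: Sequence[float]) -> float:
    """Average the Ct values of technical replicates (assumed preprocessing)."""
    return mean(technical_replicates)


def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float,
                        ct_ref_calibrator: float) -> float:
    """Comparative 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference), computed for sample and calibrator
    ddCt = dCt(sample) - dCt(calibrator)
    The reference is 5.8S rRNA for miRNAs or Actin for mRNAs in the text.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

With illustrative Ct values, `relative_expression(24.0, 18.0, 26.0, 18.0)` gives 4.0: the target crosses the threshold two cycles earlier than in the calibrator, which corresponds to a four-fold increase. The calibrator compared with itself gives 1.0.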

To explore tissue-specific expression, miRNAs and total RNAs were extracted from different organs, including wound-induced calluses, xylem, phloem, and leaves. The qPCR primers are listed in Table S1.

#### **3. Results**

#### *3.1. Analysis of Small RNA Sequencing*

To identify miRNAs associated with graft union development in pecan, four small RNA libraries were constructed from the graft unions harvested at days 0, 8, 15, and 30 after grafting. Deep sequencing produced 20,691,228, 21,849,708, 22,850,876, and 34,439,863 raw reads for the four libraries, respectively (Table 1). After removing low-quality reads, 17,060,180 (day 0), 19,032,782 (day 8), 19,780,849 (day 15), and 28,632,161 (day 30) clean reads were obtained. Among the clean reads, 6,579,996 (day 0), 6,865,905 (day 8), 6,935,788 (day 15), and 7,815,688 (day 30) reads were unique, and 1,506,852 (day 0), 2,537,261 (day 8), 1,759,479 (day 15), and 2,290,283 (day 30) reads could be mapped to the reference unigenes (accession number GGRT00000000 in the NCBI database). By alignment to the Rfam and Repbase databases, clean reads were classified into rRNA, snRNA, snoRNA, tRNA, and repeat-associated sRNA in every library; the only exception was day 0, which had no snRNA. The remaining unannotated reads were used for conserved miRNA identification and novel miRNA prediction.

The length distribution of unique clean reads ranging from 18 nt to 30 nt is summarized in Figure 1. The most abundant class was the 24 nt sRNAs, which is consistent with previous studies in hickory [25,30,31]. The second most abundant class was the 23 nt sRNAs, and the majority of the sRNAs were distributed between 21 and 24 nt.


**Table 1.** Analysis of small RNAs from libraries of days 0, 8, 15, and 30 in pecan.

Note: Raw reads, reads generated from the sequencing platform; Clean reads, reads after quality control; Unique reads, clean reads after clustering; rRNA, ribosomal ribonucleic acid; snRNA, small nuclear ribonucleic acid; tRNA, transfer RNA; Repbase, repeat sequence; Unannotated reads, reads that cannot be aligned to the Rfam and Repbase databases; Mapped reads, the unannotated reads that can be mapped to reference unigenes.

**Figure 1.** Size distribution of sRNAs from the libraries of days 0, 8, 15, and 30 in pecan. For each library, sRNAs were based on the total unique clean reads.

#### *3.2. Identification of Conserved miRNAs in Pecan*

To obtain conserved miRNAs in pecan, all reads left unannotated by Rfam and Repbase were pooled and searched against miRBase, allowing two mismatches. Based on the miRBase results and hairpin prediction, a total of 47 conserved miRNAs with their corresponding star strands were identified from the four libraries (Table S2). These 47 conserved miRNAs were classified into 31 miRNA families. Among them, the miR482 family had the most members (four), followed by miR166 and miR396, while the remaining families had only one or two members. The 47 miRNAs differed greatly in expression level: miR159a–b, miR166a–c, and miR319a–b had relatively high expression levels, whereas members such as miR4998, miR5998, miR6135, miR7504, and miR7717 had low expression levels.

#### *3.3. Identification of Novel miRNAs in Pecan*

To identify novel miRNAs, all the remaining unannotated sRNAs were mapped against our transcriptome data. In total, 39 novel miRNAs corresponding to 39 distinct precursor sequences were obtained from the four libraries (Table S3), and all the precursors of these candidate miRNAs were found to have typical stem-loop structures (Figure S1). Star sequences were detected for all the novel miRNAs, which is important evidence that they are bona fide miRNAs [29]. The most common first nucleotide of the novel miRNAs was uracil (U), a pattern also observed in other studies [25,32]. The length of these mature miRNAs ranged from 18 nt to 25 nt, and the most common length was 24 nt. The minimal free energy (MFE) of the novel miRNA precursors ranged from −96.9 to −31.8 kcal/mol, with an average of −69.0 kcal/mol. The expression levels of most novel miRNAs were generally low (TPM < 100), while some miRNAs, such as miRS19 and miRS33, were highly expressed, with TPM > 1000.

#### *3.4. Prediction and Functional Annotation of Target Genes of miRNAs*

We searched for putative targets by comparing the miRNAs against our transcriptome sequences, requiring perfect or near-perfect complementarity. As a result, a total of 584 targets were predicted for the 86 miRNAs (an average of 6.8 targets per miRNA), and 266 of them were annotated (Table S4). For functional classification, these targets were subjected to GO (Gene Ontology) analysis. As shown in Figure 2, targets of miRNAs fell into 17 biological processes, with the three most abundant being metabolic process, cellular process, and single-organism process. Targets in the cellular component category were classified into 11 terms, with the three most frequent being cell, cell part, and organelle. With respect to molecular function, targets were assigned to 10 terms, with the two most frequent being binding and catalytic activity.

**Figure 2.** GO annotation of targets of identified miRNAs. Targets were functionally categorized by biological process, cellular component and molecular function according to the ontological definitions of the GO terms.

#### *3.5. Differentially Expressed miRNAs during the Graft Process of Pecan*

To gain insight into the possible roles of miRNAs in graft union development of pecan, differential expression was analyzed by comparing days 8, 15, and 30 with day 0, using the criteria of absolute log2 fold change > 1 and *q* value < 0.05. In total, 29 miRNAs (16 conserved and 13 novel) were considered differentially expressed in the three comparisons (Table 2). Of these, 10 miRNAs were differentially expressed in the day 8/day 0 comparison, with 7 down-regulated and 3 up-regulated. Fourteen differentially expressed miRNAs were identified in the day 15/day 0 comparison, with 4 down-regulated and 10 up-regulated. In the day 30/day 0 comparison, 23 differentially expressed miRNAs were found, with 19 down-regulated and 4 up-regulated. Ten miRNAs changed significantly in two comparisons, and 4 miRNAs changed significantly in all three comparisons. We compared the expression changes of miRNAs and their targets using our transcriptome data and found that miRNAs were generally negatively correlated with their corresponding targets (Figure 3).



**Table 2.** [Table body and caption not recovered; the footnote ends "... tryptophan-aspartic acid."]

*Forests* **2018**, *9*, 472

**Figure 3.** Expression profile of some miRNAs and their targets in the graft process. Columns in the heatmap represent different comparisons (experiment/control: day 8/day 0, day 15/day 0, and day 30/day 0). Comparisons were made to calculate expression changes (fold change). Rows in the heatmap represent miRNAs or target genes. The data in the heatmap are log2 (fold change) values. Red and green indicate up-regulation and down-regulation, respectively.

#### *3.6. qRT-PCR Validation of Differentially Expressed miRNAs and Their Targets*

To validate the dynamic expression of miRNAs at different time points after grafting obtained by sequencing, 12 miRNAs (8 conserved and 4 novel) were chosen for qRT-PCR analysis (Figure 4). Most of the expression profiles based on qRT-PCR were similar to those detected by high-throughput sequencing, except for miR394: its expression at day 30 was down-regulated according to high-throughput sequencing but up-regulated according to qRT-PCR. Also, at specific time points after grafting, the relative expression levels of miRNAs detected by the two methods did not match exactly. For instance, sequencing data indicated that the day 15/day 0 ratio of miRS23 was 0.11, whereas the corresponding qRT-PCR data gave 0.60. This inconsistency might result from differences in the data normalization protocols of sequencing and qRT-PCR. The sequencing data were normalized to the global abundance of mapped reads sequenced by Illumina, while qRT-PCR was normalized to the level of 5.8S rRNA. A correlation analysis of the fold changes of miRNA expression between sequencing and qRT-PCR showed a significant similarity, with *R*<sup>2</sup> = 0.84 (Figure S2), supporting the reliability of the sequencing results. Additionally, to further validate the dynamic correlation between miRNAs and their targets, the expression of potential targets was also subjected to qRT-PCR assay. Results showed that all three targets had an inverse expression profile to their corresponding miRNAs (Figure 5).
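The agreement between the two platforms can be quantified with a squared Pearson correlation, sketched below in plain Python. This is an illustration only: the function names are hypothetical, and whether the original analysis correlated raw or log2-transformed fold changes is not stated, so the choice of input scale is left to the caller.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, the R^2 of a simple linear fit."""
    return pearson_r(x, y) ** 2
```

On the log2 scale, the miRS23 ratios quoted above (0.11 by sequencing, 0.60 by qRT-PCR) both indicate down-regulation, but only the sequencing value exceeds an absolute log2 fold change of 1, which illustrates how the two methods can agree in direction while differing in magnitude.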

#### *3.7. Expression Patterns of miRNAs and Their Targets in Different Tissues of Pecan*

To understand the main roles of miRNAs and their targets, we analyzed the tissue-specific expression profiles of miRNAs and mRNAs in different organs of pecan. Generally, miRNAs and their targets were negatively correlated and were preferentially expressed in specific tissues (Figure 6). miR156 showed lower expression levels in xylem and phloem, while its target had higher expression values in those tissues. miR160, miR164, miR166, and miRS10 exhibited low expression levels in xylem, while their corresponding targets, except NAC, were highly expressed in xylem. The target of miRS26 displayed its highest expression in callus.

**Figure 4.** qRT-PCR validation of miRNAs in the graft process of pecan. The histograms and lines indicate miRNA expression results obtained by sequencing and qRT-PCR, respectively. The x-axis represents samples collected at different time points after grafting, while the y-axis represents the relative expression level of miRNAs. The expression levels of miRNAs are normalized to the level of 5.8S rRNA. The normalized miRNA levels at day 0 are arbitrarily set to 1. Data from qRT-PCR are means of three replicates and bars represent SE (standard error).

**Figure 5.** The expression of miRNAs and their targets. The relative expression levels of miRNAs and their corresponding target genes are shown in grey and green histograms, respectively. The x-axis represents samples collected at different time points after grafting, while the y-axis represents the relative expression level of miRNAs and their target genes. The expression levels of miRNAs and target genes are normalized to the levels of 5.8S rRNA and the Actin gene, respectively. For each miRNA and target gene, the expression levels at day 0 are arbitrarily set to 1. Data from qRT-PCR are means of three replicates and bars represent SE.

**Figure 6.** Expression of miRNAs and their targets in different tissues of pecan. The relative expression levels of miRNAs and their corresponding target genes are shown in blue and green histograms, respectively. The x-axis represents different tissues collected from pecan, while the y-axis represents the relative expression level of miRNAs and their target genes. The expression levels of miRNAs and target genes are normalized to the levels of 5.8S rRNA and the Actin gene, respectively. For each miRNA and target gene, the expression levels at day 0 are arbitrarily set to 1. Data are means of three replicates and bars represent SE. SPL, squamosa promoter-binding protein-like; ARF, auxin response factor; NAC, NAC transcription factor; HD-ZIP, homeobox-leucine zipper; CCR, cinnamoyl-CoA reductase; CYCD, D-type cyclin.

#### **4. Discussion**

Although grafting has been extensively used in horticulture, our knowledge of the molecular mechanisms underlying successful grafting remains limited. Plant miRNAs are non-coding RNAs that play important roles in various biological processes at the post-transcriptional level. In this study, we used high-throughput sequencing to explore the conserved and novel miRNAs in pecan, and then analyzed the differentially expressed miRNAs to better understand the function of miRNAs in successful grafting.

miRNAs are reported to be widely distributed throughout almost all eukaryotes, and some miRNAs are deeply conserved in the plant kingdom [33]. In our work, 47 conserved miRNAs belonging to 31 miRNA families were identified. Of those, miR156, miR159, miR160, miR164, miR166, miR167, miR171, miR172, miR390, miR393, miR394, miR396, miR399, and miR403 have been confirmed to be well conserved in both monocot and dicot model plants [33]. We obtained a total of 39 novel miRNAs in the graft process of pecan. These newly identified miRNAs might be pecan-specific. We found that the novel miRNAs generally exhibited lower expression levels than the conserved ones, which is consistent with previous literature [34,35].

A total of 16 conserved and 13 novel miRNAs were differentially expressed during the graft process. Since successful grafting is a developmental process involving callus formation and vascular bundle formation, the differential expression of a cascade of miRNAs related to those processes might suggest their involvement in the graft process as well. Previously, miR159, miR169, miR171, and miR172 were identified as being responsive to embryogenic callus formation in *Larix leptolepis* Gordon [36]. miR396 expressed at a high level attenuates cell proliferation in the developing leaves of *Arabidopsis thaliana* [37]. miR166 has been reported to be involved in xylem development of *Acacia mangium* (Willd.) [38]. In *Arabidopsis*, miR166 was found to be involved in vascular development through negatively regulating the expression of *ATHB15*, a *class III homeodomain-leucine zipper* (*HD-ZIP III*) gene [39]. miRNAs including miR156, miR159, miR160, miR172, miR390, and miR482 have been confirmed to participate in the graft process of hickory [25]. Consistent with this previous research, miR156, miR160, miR166, miR171, miR390, miR396, and miR482 showed significant differential expression in our research, suggesting that they might function in graft union development.

The putative target of the differentially expressed miR156, *squamosa promoter-binding protein-like* (*SPL*), encodes a plant-specific transcription factor that functions in multiple biological processes, including plant architecture, leaf development, juvenile-to-adult transition, flower and fruit development, as well as gibberellin (GA) signaling [40]. Among its diverse functions, SPL responds to GA signaling by affecting the genes involved in GA biosynthesis. Studies have verified that GAs are important regulators of xylem differentiation [41]. In our study, miR156 was significantly down-regulated at day 30, which might induce the up-regulation of *SPL* during the stage of vasculature formation. We presume that the miR156-*SPL* interaction might be involved in vascular bundle formation by indirectly regulating GA signaling.

A putative target of miR160 is *auxin response factor* (*ARF*). In the graft process, auxin has been confirmed to be critical in regulating callus formation and vascular development [8,42]. Auxin signaling is transduced via ARFs to regulate the expression of genes containing auxin response elements (AuxREs) in their promoter areas [43]. In *Arabidopsis*, *ARF6* and *ARF8* mutants reduced cell proliferation in response to cutting [42], and *ARF5* mutants showed abnormality in vascular development [44]. In the present study, we hypothesized that the down-regulated miR160a-b at day 30 may induce the accumulation of *ARF*, resulting in vascular connection.

A putative gene targeted by miR164 was the NAC transcription factor, which is in accordance with findings in *Arabidopsis* [45], *Medicago truncatula* Gaertn [46], and *Triticum aestivum* L. [34]. NAC transcription factors are master regulators of secondary cell wall formation [47], and overexpression of *NAC1* in *Arabidopsis* was shown to produce thicker stems than in the untransformed control plants [48]. Previous studies have reported that secondary cell wall formation is indispensable for vascular system development [49]; thus, the down-regulated miR164b at day 30 in this work might stimulate *NAC1* expression to regulate vascular development.

miR166b belongs to the miR166 family and targets the *homeobox*-*leucine zipper* (*HD*-*ZIP*) gene. Members of the HD-ZIP gene family have been reported to function in various stress conditions, such as drought, salinity, and wounding [50,51]. The class III HD-ZIP gene family plays important roles in vascular bundle development and has been reported to be highly expressed in cambium tissue [52,53]. In *Arabidopsis*, the class III HD-ZIP proteins were also shown to control cambium activity by inducing axial cell elongation and xylem differentiation [54]. Overexpressing a *Populus class III HD*-*ZIP* gene leads to ectopic formation of vascular cambium within cortical parenchyma in poplar [55]. In successful grafting, the formation of vascular bundles results from the promotion of vascular cambium activity. In our study, miR166 was down-regulated at day 15, suggesting that *class III HD*-*ZIP* might be up-regulated at the stage of new cambium establishment. We speculate that the increased *class III HD*-*ZIP* may stimulate cambium activity for the subsequent xylem formation. Interestingly, we found that miR166 was significantly down-regulated at day 8 as well, indicating that the initial xylem differentiation might happen before new cambium establishment, as reported by Pina [2].

A putative target of miRS10, *cinnamoyl*-*CoA reductase* (*CCR*), is a gene dedicated to monolignol biosynthesis. Down-regulation of *CCR* in poplar exhibited up to 50% reduced lignin level in outer xylem [56]. Since lignin is essential for vascular development, the down-regulated miRS10 at day 30 might induce the up-regulation of *CCR*, and then lead to the lignification of vasculature during the graft process of pecan.

A predicted target of miRS26 was *D-type cyclin* (*CYCD*). CYCD is a critical regulator that promotes the progression of the cell cycle by binding to cyclin-dependent kinase A, which plays a vital role in cell proliferation [57]. It was found to be induced by auxin and cytokinin [58]. *Arabidopsis* hypocotyl explants overexpressing *CYCD4* showed faster induction of callus than the control explants on a medium with a lower auxin concentration [59]. In this study, miRS26 was down-regulated at day 8, a stage of initial callus formation, and then up-regulated at the following time points, while *CYCD* was up-regulated throughout the grafting process, indicating that no negative correlation between miRS26 and *CYCD* existed at days 15 and 30. However, the expression of miRS26 and *CYCD* in different tissues indicated that they were negatively correlated. Considering the negative interaction existing between miRS26 and *CYCD* during the stage of initial callus formation, and that *CYCD* displayed its highest expression abundance in callus tissue, we deduced that miRS26 might play a vital role in stimulating callus proliferation during graft union development.

In our study, the tissue-specific expression profiles of miRNAs and their targets might indirectly suggest their specific roles in graft union development. The low expression of miRNAs such as miR156, miR160, miR164, miR166, and miRS10 in xylem tissues might be indicative of their specific roles during vascular development. miRS26 showed low expression in callus tissues, suggesting its possible involvement in callus formation for a successful graft.

#### **5. Conclusions**

Our study constructed four sRNA libraries from the graft unions of pecan collected at days 0, 8, 15, and 30 after grafting. We identified a total of 47 conserved miRNAs belonging to 31 families and 39 novel miRNAs. Among them, 29 miRNAs (16 conserved and 13 novel) were differentially expressed in the graft process, suggesting that they play important roles in successful grafting. In particular, for the graft union development of pecan, miRS26 might play an important role in callus formation, while miR166, miR156, miR160, miR164, and miRS10 might contribute to vascular bundle formation (Figure 7).

**Figure 7.** Putative regulatory mechanism involving differentially expressed miRNAs and their targets in graft union formation of pecan. The upper arrow indicates upregulation, and the down arrow represents downregulation. CYCD, D-type cyclin; SPL, squamosa promoter-binding protein-like; ARF, auxin response factor; NAC, NAC transcription factor; CCR, cinnamoyl-CoA reductase; HD-ZIP, homeobox-leucine zipper.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/9/8/472/s1, Figure S1: The hairpin structures of novel miRNAs. The mature miRNAs are in red, and the miRNA \*s are in blue, Figure S2: Correlation analysis between sequencing and qRT-PCR. Scatter plots show fold-change measured by sequencing and qRT-PCR, Table S1: Primers used in this study, Table S2: Conserved miRNAs identified in pecan, Table S3: Novel miRNAs identified in pecan, Table S4: Target genes of identified miRNAs in pecan.

**Author Contributions:** F.P. conceived and designed the study. Z.M. performed the data analysis and wrote the manuscript. G.F. and W.S. carried out qRT-PCR. Z.L. was involved in sample collection. All authors read and approved the final manuscript.

**Funding:** The authors appreciate the financial support from the SanXin project of Jiangsu province (LYSX(2016)44), the state bureau of forestry 948 project (2015-4-16) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

**Conflicts of Interest:** The authors declare that they have no competing interests.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Genome-Wide Identification and Characterization of MADS-box Family Genes Related to Floral Organ Development and Stress Resistance in** *Hevea brasiliensis* **Müll. Arg.**

### **Mingming Wei <sup>1</sup>, Yajie Wang <sup>2</sup>, Ranran Pan <sup>2</sup> and Weiguo Li <sup>1,</sup>\***


Received: 26 April 2018; Accepted: 28 May 2018; Published: 29 May 2018

**Abstract:** Elucidating the genetic mechanisms associated with the transition from the vegetative to the reproductive phase in the rubber tree is of great importance for both theoretical guidance and practical application to the genetic improvement of yield. At present, many transcription factors, including those that belong to the MADS-box gene family, have been revealed to have roles in regulating the transition from vegetative growth to reproductive growth. However, to the best of our knowledge, the MADS-box gene family from *H. brasiliensis* Müll. Arg. has not been characterized in detail. To investigate members of the HbMADS-box gene family associated with floral organ and inflorescence development in *H. brasiliensis*, we performed genome-wide identification and analysis of the MADS-box gene family related to flower development in *H. brasiliensis*, and a total of 20 MADS-box genes were newly identified in the *H*. *brasiliensis* genome. Expression profiling revealed that HbMADS-box genes were differentially expressed in various tissues, which indicated that HbMADS-box genes may exert different functions throughout the life cycle. Additionally, 12 genes (HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAGL6, HbAGL8, HbAP1, HbAG, HbDEFL, HbTT16, and HbPADS2) were found to be associated with the differentiation of flower buds and may be involved in flower development in *H. brasiliensis*. All of these floral-enriched HbMADS-box genes were regulated by hormone, salt, cold, high-temperature, and drought stresses. The present study is the first to carry out the genome-wide identification and analysis of the MADS-box gene family related to flower development in *H. brasiliensis*, and 20 new HbMADS-box genes were identified in *H. brasiliensis*. Most of the newly identified HbMADS-box genes were found to be associated with the differentiation of flower buds and may be involved in flower development in *H. brasiliensis*. Our results demonstrated that HbMADS-box genes may be multifunctional regulators that have roles in distinct aspects of development, and are mainly involved in the maintenance of floral organ and inflorescence development.

**Keywords:** *Hevea brasiliensis* Müll. Arg.; HbMADS-box genes; conserved domains; gene structures; expression profiles; stress treatments

#### **1. Introduction**

Natural rubber (NR) is an important industrial and strategic raw material, and has been applied to many aspects of social production [1]. Although more than 2000 plant species in the world are considered to be latex producers, the rubber tree (*Hevea brasiliensis* Müll. Arg.) is the only commercial source of NR because of its high yield and the excellent physical properties of its rubber products [2], and it supplied 92% of the 10.2 million tons of NR consumed worldwide in 2016 [3].

With the rapid development of the world economy, the consumption of NR in major industrial countries is increasing year by year. To meet the growing demand for NR, it is necessary to expand the planting area of rubber trees. However, the rubber tree originates from the Amazon rainforest, so the planting area must be located in sub-tropical to tropical zones [4]. At present, 95% of the world's rubber trees are cultivated in South-East Asia, and the regions suitable for planting rubber trees are very limited. Therefore, there is an urgent need to improve the rubber yield per unit area.

To the best of our knowledge, breeding new varieties is one of the most effective approaches to increasing rubber yield per hectare. However, breeding experiments to yield genetic improvement of rubber trees are very inefficient and time-consuming, mainly because of the rubber tree's long life cycle of more than 30–35 years; it is immature for five to eight years until the rubber tree reaches the age of commercial productivity [5,6], and takes more than three decades to breed and select new clones for commercial production [7]. Furthermore, due to the low rate of success for controlled pollinations, genetic improvements of *H. brasiliensis* are very difficult and slow [8]. Therefore, research on the genetic mechanisms that affect the transition from the vegetative to reproductive phase in *H. brasiliensis* can provide insight for producing advantageous genetic improvement methods for controlling reproduction by genetic engineering and accelerate *Hevea* breeding.

In flowering plants, the transition from vegetative growth to reproductive growth is an important developmental process that involves many gene regulatory processes [9]. To date, many researchers have attempted to elucidate the functional genes associated with the transition from vegetative growth to reproductive growth in plants. It is worth noting that many transcription factors (TFs), including those that belong to the MADS-box gene family, have been demonstrated to have roles in regulating the transition from vegetative growth to reproductive growth [10]. However, to the best of our knowledge, the rubber tree MADS-box gene family has not been characterized in detail.

As a floral homeotic gene family, the MADS-box gene family was previously identified and investigated in the model plants *Arabidopsis* Heynh. in Holl & Heynh. and *Nicotiana tabacum* L. [11–13], and has evolutionarily conserved DNA-binding domains, called the MADS-box [14]. Typically, the MADS-box protein sequences can be divided into four characteristic domains from the N to the C terminus: the MADS-box (M), intervening (I), keratin-like (K), and C-terminal (C) domains [15]. In plants, based on structural features, MADS-box TFs fall into two main groups, type I (M-type) and type II (MIKC-type) genes [16], and the type II genes can be further categorized into MIKCc- and MIKC\*-type [17]. Previous studies have reported that this superfamily encodes transcriptional regulators that are involved in various processes, including floral organ development [18,19], root development [20], leaf development [21], fruit development and maturation [22–24], and embryonic development [25,26]. In addition to growth and development-related functions, some MADS-box genes also play important roles in response to stress stimuli [27,28]. For instance, MADS-box genes have been shown to play important roles under low-temperature stress in tomato plants [29], while several MADS-box genes have been demonstrated to take part in cold, salt, and drought responses in rice [30]. Furthermore, a few MADS-box genes have been shown to be affected by the application of hormones, exhibiting differential expression in response to cytokinin, gibberellin [31], ethylene [32], and auxin [33] application in other plants.

Despite the fact that MADS-box genes play a major role in plant growth and development, only a few MADS-box genes have been identified and characterized in the rubber tree to date [34]. For example, previous studies have found that HbAGL62 is specifically expressed in flowers and embryos, and is highly expressed at the flower bud differentiation stage, which indicates that HbAGL62 might play an important role in flowering regulation of the rubber tree [35]; MADS27 is highly expressed in the flower buds of rubber trees and may participate in the flowering process of rubber trees [36]; HbMADS1 and HbMADS3 are highly transcribed in laticifer cells and during somatic embryogenesis, and their transcription is induced in the laticifer cells by jasmonic acid, which indicates that HbMADS1 and HbMADS3 may be important in natural rubber biosynthesis and somatic embryogenesis in the rubber tree [37]. In the present research, we newly identified 20 MADS-box genes in the rubber tree, and analyzed their gene structure and phylogenetic relationships. To identify the differential expression patterns of MADS-box genes in various tissues, the 20 MADS-box genes of *H. brasiliensis* were detected and analyzed using real-time quantitative PCR (RT-qPCR). Furthermore, to understand the responses of HbMADS-box genes to various stresses, the expression profiles of 12 floral organ-specific HbMADS-box genes were examined in leaves of *Hevea* seedlings after hormone, salt, cold, high-temperature, and drought stress treatments.

#### **2. Materials and Methods**

#### *2.1. Plant Materials and Treatments*

Twelve-year-old rubber tree clones CATAS 7-33-97 were used as the experimental material in this study. The rubber trees were grown under normal field conditions at the Experimental Station of the Rubber Research Institute, the Chinese Academy of Tropical Agricultural Sciences (Danzhou, Hainan, China). The fresh tissues and organs (including roots, stem, stem tips, leaves, labeled bark, xylem, latex, fruits, inflorescence, male flowers, and female flowers) were collected from 12-year-old mature trees of CATAS 7-33-97 during the flowering period. Each sample was harvested on average from five trees, and three biological replicates were taken for each sample (images of some samples are shown in Figure S1). The prepared samples were then frozen in liquid nitrogen and transferred to a −80 °C freezer for RNA isolation.

The tissue-cultured seedlings of CATAS 7-33-97 were treated with cold, high-temperature, and drought stress. Each treatment was set up with three replicates, and each replicate consisted of five seedlings. Under the cold stress condition, the seedlings were grown in a culture incubator set at 5 °C with continuous illumination. For the high-temperature stress treatment, the tissue-cultured seedlings were grown at 40 °C and maintained at a relative humidity of 80% in the incubator. Leaf samples were collected for RNA extraction after 0, 3, 6, 12, and 24 h of low- and high-temperature stress. For drought stress treatments, the tissue-cultured plants were grown in Hoagland nutrient solution [38] containing 20% PEG6000 and incubated for different durations (0 h, 3 h, 6 h, 12 h, 1 day, 3 days, 4 days, and 7 days). The leaf samples from each drought time point were then collected for RNA extraction, and samples from untreated plants were used as controls.

For hormone and salt treatments, the tissue-cultured seedlings were treated with abscisic acid (ABA; 200 μmol/L), gibberellin (GA; 100 μmol/L), or high salt (1 M NaCl). ABA and GA were diluted in distilled water containing 0.05% (*v*/*v*) ethanol. The ABA, GA, and NaCl solutions were sprayed on the leaves and stems of the seedlings until runoff occurred. Control plants were sprayed with distilled water containing 0.05% (*v*/*v*) ethanol. Leaf samples were harvested 0, 0.5, 2, 6, 12, 24, and 48 h after treatment. In all treatments, one leaf was taken from each of the five plants, and the leaves were pooled for RNA extraction.

#### *2.2. RNA Isolation and First-Strand cDNA Synthesis*

Total RNA was isolated from the collected samples as previously described [39], and the extracted RNA was digested with DNase (Promega, Madison, WI, USA) to remove genomic DNA contamination. The integrity and concentration of the RNA samples were assessed by 1.5% agarose gel electrophoresis and with a NanoDrop 2000 (Thermo Scientific Inc., Waltham, MA, USA), respectively. The RNA samples were then reverse transcribed into first-strand cDNA with the RevertAid™ First Strand cDNA Synthesis Kit (TaKaRa, Shiga, Japan).

#### *2.3. Identification and Isolation of MADS-box Genes in H. brasiliensis*

Twenty full-length cDNA sequences of *H. brasiliensis* MADS-box genes were obtained from RNA sequencing. These cDNA sequences were compared with the Transcriptome Shotgun Assembly (TSA) and Expressed Sequence Tag (EST) sequences of *H. brasiliensis* in the NCBI database (http://www.ncbi.nlm.nih.gov/) or searched against the *Hevea* genome database. The NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and Softberry (http://linux1.softberry.com/) were then used to determine the open reading frames (ORFs) of candidate mRNA or genomic DNA sequences. To confirm the presence of the MADS-box domain, all candidate HbMADS-box genes were further validated by conserved domain searches using CDD (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and InterProScan (http://www.ebi.ac.uk/interpro/scan.html). After the similarity comparison, ORF prediction, and conserved domain searches, redundant sequences were removed and the final set of HbMADS-box gene sequences was obtained.
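
The ORF-determination step can be illustrated with a minimal sketch. The function `longest_orf` below is a hypothetical helper, not the algorithm used by NCBI ORF Finder or Softberry. It assumes a forward-strand cDNA, ATG as the only start codon, and the three standard stop codons, and the input is a toy sequence rather than an HbMADS-box cDNA.

```python
# Assumption: only the forward strand is scanned and ATG is the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> str:
    """Return the longest forward-strand ORF, from ATG through the stop codon."""
    seq = cdna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close the ORF and keep scanning this frame
    return best


# Toy sequence for illustration only.
toy = "GGATGGGTAGAGGTAAGTAACC"
orf = longest_orf(toy)
print(orf, len(orf))
```

A real pipeline would also scan the reverse strand and apply a minimum ORF length, which this sketch omits for brevity.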

Gene-specific primers for amplifying the full-length cDNA sequences of the HbMADS-box genes were designed with Primer 3.0 (http://primer3.ut.ee/). The primer pairs for all HbMADS-box genes are listed in Table S1. RT-PCR amplification of the HbMADS-box genes was conducted using Pyrobest™ DNA polymerase (TaKaRa, Japan) according to the manufacturer's instructions.

#### *2.4. Protein Properties and Gene Structure Analysis of HbMADS-box Genes*

We used the ProtParam tool (http://web.expasy.org/protparam/) to predict the theoretical molecular weight (Mw) and isoelectric point (PI) of the HbMADS-box proteins. To analyze the structural diversity of the HbMADS-box genes, their exon–intron structures were identified by comparing each coding sequence with its corresponding genomic sequence using the FGENESH-C tool (http://linux1.softberry.com), as previously described [40].
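
A theoretical Mw of this kind is conventionally computed as the sum of average residue masses plus one water molecule. The sketch below assumes that convention and uses standard average residue masses; it is an illustration rather than the ProtParam implementation, the function name is hypothetical, and the peptide is a toy example.

```python
# Average residue masses in daltons (amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight_kda(protein: str) -> float:
    """Theoretical Mw in kDa of an unmodified protein sequence."""
    residues = protein.upper()
    return (sum(RESIDUE_MASS[aa] for aa in residues) + WATER) / 1000.0


# Toy peptide for illustration, not a full-length HbMADS-box protein.
print(round(molecular_weight_kda("MGRGK"), 4))
```

The isoelectric point requires an iterative charge calculation over pK values and is not sketched here.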

#### *2.5. Multiple Sequence Alignments and Phylogenetic Analysis*

Amino acid sequence identities and multiple alignments of the 26 HbMADS-box proteins were calculated using DNAMAN6.0. To examine the evolutionary history and phylogenetic relationships of the HbMADS-box genes, we constructed a phylogenetic tree using MEGA6.0 (http://www.megasoftware.net/) based on a multiple sequence alignment of MADS-box TFs from *Arabidopsis*, *Oryza sativa* L., *Vitis vinifera* L., *Jatropha curcas* L., and *H. brasiliensis*.

#### *2.6. Quantitative Real-Time PCR (qRT-PCR) Analysis*

Real-time quantitative RT-PCR (qRT-PCR) was performed with the following program: 94 °C for 30 s, followed by 40 cycles of 94 °C for 5 s, 60 °C for 15 s, and 72 °C for 10 s. The 20 μL reaction contained 60 ng of cDNA, 1× SYBR® Premix Ex Taq™ (TaKaRa, Shiga, Japan), and 0.4 μM of each primer. The reaction was carried out in 96-well plates using the CFX96™ Real-Time System (Bio-Rad, Hercules, CA, USA). The 18S rRNA gene (GenBank accession No.: AB268099) was used as the reference gene in the qRT-PCR reaction [41]. After the reaction, Bio-Rad CFX Manager Software 3.0 (Bio-Rad, Hercules, CA, USA) was used to analyze and visualize the data as previously described [42]. All primers used for qRT-PCR analysis were designed with Primer3.0 (http://frodo.wi.mit.edu/primer3). The primer sequences are given in Table S2.

#### *2.7. Statistical Analysis*

Data and graphical analyses were performed with SigmaPlot 12 software (Systat Software Inc., San Jose, CA, USA). The 2<sup>−ΔΔCT</sup> method was used to calculate the relative expression levels of all HbMADS-box genes [43]. Data are presented as the mean ± SD (standard deviation) of three biological replicates. Statistical significance was determined by a *t*-test.
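
The relative-expression calculation can be sketched as follows. This is a minimal illustration of the 2<sup>−ΔΔCT</sup> formula with 18S rRNA as the reference gene; the function name and all CT values are hypothetical, and the calibrator stands for whichever sample serves as the baseline (for example, the root or the untreated control).

```python
from statistics import mean, stdev


def fold_change(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ΔΔCT method."""
    delta_ct_sample = ct_target - ct_ref              # ΔCT in the sample
    delta_ct_calibrator = ct_target_cal - ct_ref_cal  # ΔCT in the calibrator
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


# Hypothetical CT values given as (target gene, 18S rRNA).
calibrator = (28.0, 12.0)
replicates = [(25.0, 12.0), (25.2, 12.1), (24.9, 12.0)]  # three biological replicates

folds = [fold_change(t, r, *calibrator) for t, r in replicates]
print(f"{mean(folds):.2f} ± {stdev(folds):.2f}")
```

A target CT that is three cycles lower than in the calibrator, at equal reference CT, corresponds to an eight-fold higher relative expression.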

#### **3. Results**

#### *3.1. Identification and Characterization of HbMADS-box Genes in H. brasiliensis*

The 76 candidate HbMADS-box genes were checked with CDD and InterProScan to confirm the presence of the MADS-box domain. After the redundant sequences were removed, a total of 26 non-redundant HbMADS-box genes (designated as HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAGL6, HbAGL8, HbAP1, HbAGL12, HbAG, HbAGL11, HbAGL15, HbSVP1, HbSVP2, HbTT16, HbDEFL, HbPMADS2, HbAGL30, HbAGL61, HbAGL65, HbMADS1, HbMADS2, HbMADS3, HbMADS4, HbMADS5, and HbMADS27) with complete open reading frames (ORFs) were identified in the rubber tree (Table 1), including six previously reported HbMADS-box TFs (HbMADS1, HbMADS2, HbMADS3, HbMADS4, HbMADS5, and HbMADS27).


**Table 1.** Characteristics of the HbMADS-box gene family in *Hevea brasiliensis* Müll. Arg.

ORF, open reading frame; bp, base pair; aa, amino acids; Mw, molecular weight; PI, isoelectric point.

To confirm the putative HbMADS-box genes, their complete ORF sequences were isolated through RNA-sequencing and PCR-based approaches. The verified sequences were submitted to GenBank, and the accession numbers are listed in Table 1. The ORF lengths of the HbMADS-box genes ranged from 609 bp (HbAGL12) to 1116 bp (HbAGL30), encoding polypeptides of 202 to 371 amino acids (Table 1). The corresponding Mw ranged from 23.09 to 41.86 kDa, and the predicted PI varied from 6.06 (HbSVP2) to 9.34 (HbAGL11). The distribution of PI was similar to that of the AtMADS-box TFs; however, the length and Mw of the HbMADS-box TFs were slightly lower than those of the AtMADS-box TFs (Table S3).
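
The reported ORF and polypeptide lengths are mutually consistent if each ORF length includes the stop codon. The short check below makes that assumption explicit; `protein_length` is a hypothetical helper.

```python
def protein_length(orf_bp: int) -> int:
    """Number of amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // 3 - 1  # the stop codon encodes no residue


# Shortest and longest ORFs reported in the text.
for name, orf_bp in [("HbAGL12", 609), ("HbAGL30", 1116)]:
    print(name, orf_bp, protein_length(orf_bp))
```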

Pairwise sequence comparisons were performed to determine the sequence identities among the HbMADS-box proteins. The identities between pairs of HbMADS-box proteins ranged from 13.67% to 98.57% (Table S4), with an average of 39.99%. The highest sequence identity was observed between HbMADS4 and HbSVP1 (98.57%), whereas HbAGL30 and HbMADS3 showed the lowest (13.67%).
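
As a rough illustration of how a percent identity matrix such as Table S4 can be built, the sketch below computes identity over the aligned, ungapped columns of two pre-aligned sequences. The exact formula used by DNAMAN is not stated in the text, so the choice of denominator is an assumption, and the sequences are toy examples.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring columns that contain a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned fragments, not real HbMADS-box sequences.
print(percent_identity("MGRGKIEIKR", "MGRGKVEIKR"))

# Number of distinct protein pairs in a 26 x 26 identity matrix.
n_pairs = len(list(combinations(range(26), 2)))
print(n_pairs)
```

With 26 proteins, the reported average is therefore taken over 325 distinct pairs, assuming every pair was compared.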

#### *3.2. Analysis of Conserved Domains and Structural Features of HbMADS-box Proteins*

Analysis of conserved domains showed that all of the deduced HbMADS-box proteins contained a highly conserved MADS-box domain of approximately 50 amino acid residues, a less conserved I domain, a semi-conserved K domain with an obvious coiled-coil region, and a non-conserved C domain (Figure 1), which indicates that they belong to the MADS-box TF family. Within the MADS-box domain, 13 sites were conserved across the HbMADS-box proteins: amino acids 3 (R), 17 (R), 20 (T), 23–24 (KR), 27 (G), 30–31 (KK), 34 (E), 38–39 (LC), 48 (F), and 52 (G) (Figure 1). This indicates that the deduced amino acid sequences of the HbMADS-box proteins share significant homology with each other.
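
Conserved sites of this kind can be read off a multiple sequence alignment column by column. The following sketch assumes the sequences are already aligned and reports the 1-based positions at which every sequence carries the same residue; the function name and the toy alignment are hypothetical.

```python
def conserved_positions(aligned):
    """Return (position, residue) for every fully conserved, non-gap alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return [
        (i + 1, column[0])                      # 1-based position, shared residue
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Toy alignment for illustration only.
toy_alignment = ["MGRGK", "MARGK", "MGRAK"]
print(conserved_positions(toy_alignment))
```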


**Figure 1.** Phylogenetic analysis of HbMADS-box proteins. The MADS-box, I, K, and C domains are marked; the coiled-coil domain is boxed.

Analysis of conserved motifs and structural characteristics identified four conserved motifs in the HbMADS-box proteins: motifs 1 and 3 belonged to the MADS-box domain, whereas motifs 2 and 4 belonged to the K-box domain. Most HbMADS-box proteins (except HbAGL30 and HbAGL61) contained both the MADS-box and K-box domains (Figure S2), which is similar to the conserved MADS-box motifs previously reported in other species [44].

#### *3.3. Gene Structures and Sequence Characteristics of HbMADS-box Genes*

Sequence analysis revealed that all of the deduced *H. brasiliensis* MADS-box genes belong to the type II (MIKC-type) MADS-box genes. These HbMADS-box genes usually contained multiple introns and exons, with a maximum of 11 exons (Figure S3). According to their predicted structures, the 26 candidate HbMADS-box genes could be divided into eight groups (Table S5). The first group consisted of eight genes (HbSEP, HbAGL9.2, HbCMB1, HbCMB1-L, HbSVP1, HbMADS4, HbAGL65, and HbMADS5) with eight exons interrupted by seven introns. The second and third groups each included five genes, with six exons interrupted by five introns (HbAGL6, HbMADS2, HbMADS27, HbSVP2, and HbTT16) and seven exons interrupted by six introns (HbAP1, HbMADS3, HbAGL12, HbAG, and HbDEFL), respectively. The fourth and fifth groups each consisted of two genes, with 10 exons interrupted by nine introns (HbAGL9.1 and HbAGL8) and five exons interrupted by four introns (HbAGL11 and HbPMADS2), respectively. The sixth group included two genes (HbAGL30 and HbMADS1) with 11 exons interrupted by 10 introns. The seventh and eighth groups each contained one gene (HbAGL15 and HbAGL61), with four exons interrupted by three introns and a single exon, respectively. These results revealed that most of the HbMADS-box genes (69.2%, in the first, second, and third groups) shared conserved exon–intron structures within their groups, whereas the remaining genes (30.8%, in the fourth to eighth groups) had different exon–intron structures and may belong to distinct types of HbMADS-box genes in the rubber tree (Table S5). These characteristics are consistent with the features of MADS-box genes in other flowering plants, such as *Arabidopsis*, *Oryza sativa*, and *Vitis vinifera* [45–47].
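
The group sizes and percentages above can be checked with a short tally. The group membership and exon counts are taken from the text; the dictionary layout and variable names are only an illustrative choice.

```python
# Exon-intron groups as listed in the text: group number -> (exon count, member genes).
groups = {
    1: (8, ["HbSEP", "HbAGL9.2", "HbCMB1", "HbCMB1-L", "HbSVP1", "HbMADS4", "HbAGL65", "HbMADS5"]),
    2: (6, ["HbAGL6", "HbMADS2", "HbMADS27", "HbSVP2", "HbTT16"]),
    3: (7, ["HbAP1", "HbMADS3", "HbAGL12", "HbAG", "HbDEFL"]),
    4: (10, ["HbAGL9.1", "HbAGL8"]),
    5: (5, ["HbAGL11", "HbPMADS2"]),
    6: (11, ["HbAGL30", "HbMADS1"]),
    7: (4, ["HbAGL15"]),
    8: (1, ["HbAGL61"]),
}

total = sum(len(genes) for _, genes in groups.values())
first_three = sum(len(groups[g][1]) for g in (1, 2, 3))
share_first_three = round(100 * first_three / total, 1)
share_rest = round(100 * (total - first_three) / total, 1)

# A gene with n exons has n - 1 introns.
introns = {g: exons - 1 for g, (exons, _) in groups.items()}

print(total, share_first_three, share_rest, max(introns.values()))
```

The tally reproduces the 26 genes and the 69.2% and 30.8% shares, and it gives a maximum of 10 introns for the 11-exon genes.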

#### *3.4. Phylogenetic Analysis of HbMADS-box TFs*

Based on the phylogenetic tree, the 26 HbMADS-box genes could be divided into three clades (I, II, and III), which contained 11 main branches: the SEP, AGL6, FUL/AP1, AGL12, AG, SOC1, AGL15, AGL17, SVP, BS, and AP3/PI subfamilies (Figure 2). Clade I contained 12 type II HbMADS-box genes and could be further divided into two subclades of nine and three members. Clade II contained six type II HbMADS-box genes in two subclades of one and five members. Clade III contained eight type II HbMADS-box genes in two subclades of three and five members.

**Figure 2.** Phylogenetic analysis of HbMADS-box genes with other MADS-box genes. The multiple sequence alignment and the phylogenetic tree were generated with MEGA6.06 using the neighbor-joining (NJ) method with 1000 bootstrap replicates. The proteins were clustered into three distinct sub-families, which were further divided into 12 sub-groups.

#### *3.5. Expression Profiles of HbMADS-box Genes in Various Developmental Stages during Rubber Tree Reproductive Development*

The expression patterns of the HbMADS-box genes in different tissues and organs (roots, stem, stem tips, leaves, labeled bark, xylem, latex, fruits, inflorescence, male flowers, and female flowers) of *H. brasiliensis* were analyzed by qRT-PCR. The results showed that most of the deduced HbMADS-box genes might be primarily involved in floral organ differentiation and inflorescence development. For example, HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAGL6, HbAG, HbDEFL, and HbPMADS2 had significantly higher expression in stem tips and floral organs than in other tissues; HbAP1 showed its highest expression in stem tips; HbAGL8 was expressed at significantly higher levels in stem tips and leaves than in other tissues; HbAGL15, HbTT16, HbMADS2, and HbAG were more highly expressed in fruits than in other tissues; and HbAGL9.2 and HbAGL6 were specifically expressed in floral organs, with higher expression in male flowers than in other tissues (Figure 3). The higher expression of these HbMADS-box genes in floral organs and stem tips suggests that they may play specific roles in the corresponding tissues. In contrast, some MADS-box genes displayed different tissue expression patterns: HbAGL15, HbAGL30, HbAGL61, and HbAGL65 were almost constitutively expressed in all tested tissues (Figure 3). In addition, some HbMADS-box genes showed higher expression in other specific tissues; for example, HbMADS1 and HbMADS3 were more highly expressed in latex than in other tissues (Figure 3), which is consistent with a previous report that HbMADS1 and HbMADS3 were highly expressed in laticifer cells.

**Figure 3.** Relative expression levels of HbMADS-box genes were determined by qRT-PCR and normalized to 18S rRNA gene expression. For each gene, the transcript level in the root was used to normalize the transcript levels in other tissues. Values are means ± SD (standard deviation) of three biological replicates. 1–14 represent roots; stem; stem tips; leaves; labeled bark; xylem; latex; fruits; 3 cm inflorescence; 6 cm inflorescence; 9 cm inflorescence; 12 cm inflorescence; male flowers; and female flowers, respectively.

#### *3.6. Expression Patterns of HbMADS-box Genes in Response to Abiotic Stress*

The expression patterns of 12 floral-enriched HbMADS-box genes under cold, high-temperature, drought, and salt stresses were analyzed by qRT-PCR. The expression of all 12 genes was affected by low-temperature stress (Figure S4). Most of them (HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAGL6, HbAG, HbDEFL, and HbPMADS2) were rapidly induced by cold stress, reached their highest levels at 3 or 6 h after treatment, and then declined at 12 or 24 h. Only three genes, HbAGL8, HbAP1, and HbTT16, were down-regulated at all analyzed time points in response to cold stress.

Under high-temperature stress, only HbDEFL expression did not change significantly (Figure S5). Of the 11 high-temperature-responsive HbMADS-box genes, two (HbSEP and HbCMB1) were significantly up-regulated at all time points, and seven (HbAGL9.1, HbAGL9.2, HbCMB1-L, HbAGL6, HbAP1, HbAG, and HbTT16) were significantly up-regulated at one or more time points (Figure S5). By contrast, HbAGL8 and HbPMADS2 were significantly down-regulated at all analyzed time points.

Under polyethylene glycol-induced drought stress, the expression patterns of all 12 floral-enriched HbMADS-box genes were markedly affected (Figure S6). Ten genes (HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAP1, HbAG, HbTT16, HbDEFL, and HbPMADS2) were significantly up-regulated at one or more time points after drought stress (Figure S6). Among them, HbAGL9.1, HbCMB1-L, HbAG, HbTT16, and HbDEFL were strongly affected, reaching maximum levels of 2-, 2.5-, 3.7-, 4-, and 5.8-fold relative to the control at 6 h, 12 h, 24 h, 7 days, and 4 days after treatment, respectively. Interestingly, the expression of HbAGL6 and HbAGL8 increased continuously after drought stress, resulting in 120- and 11-fold increases, respectively, at 7 days after treatment.

Under salt stress, the expression of all 12 floral-enriched HbMADS-box genes was regulated. Eight genes (HbAGL9.1, HbAGL9.2, HbAGL6, HbAGL8, HbAP1, HbAG, HbDEFL, and HbPMADS2) were significantly up-regulated at all time points (Figure S7); their expression rose rapidly and peaked at 2, 48, 48, 12, 24, 48, 2, and 48 h of treatment, respectively. Four genes (HbSEP, HbCMB1, HbCMB1-L, and HbTT16) were significantly up-regulated at one or more time points (Figure S7), and their transcripts reached maximum levels that were 2-, 2.9-, 12-, and 24-fold higher than the control at 12, 2, 2, and 2 h after treatment, respectively.

#### *3.7. Expression Patterns of HbMADS-box Genes in Response to Phytohormones*

After ABA treatment, HbAGL9.1, HbAGL6, HbAP1, and HbPMADS2 were clearly up-regulated across all time points (Figure S8); their expression rose rapidly and peaked at 48, 48, 12, and 48 h after treatment, respectively. HbAGL9.2 and HbCMB1-L showed similar expression patterns: their transcripts were down-regulated at the 6 h time point but significantly up-regulated at the other time points. HbSEP, HbCMB1, HbAGL8, HbAG, and HbDEFL were down-regulated relative to the control at 24, 12, 0.5, 24, and 24 h, respectively, but were clearly up-regulated at the other time points. In comparison, HbTT16 showed an irregular expression pattern under ABA treatment; it was up-regulated at 48 h but down-regulated at 0.5, 2, 12, and 24 h.

After GA3 treatment, HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbAGL6, HbAGL8, HbAP1, HbTT16, and HbDEFL were clearly up-regulated across all time points (Figure S9). Their expression rose rapidly under GA3 treatment and reached maximum levels at 0.5, 12, 6, 48, 12, 48, 6, and 48 h after treatment, respectively. However, HbAG and HbPMADS2 were slightly down-regulated at the 24 h time point but clearly up-regulated at the other time points. By contrast, HbCMB1-L displayed an irregular expression pattern; it was strongly up-regulated at 0.5, 2, and 48 h but down-regulated at 6, 12, and 24 h.

#### **4. Discussion**

Understanding the flowering regulation of *H. brasiliensis* is important for accelerating the breeding of this species. Based on studies of *Arabidopsis* and other plants, extensive efforts have been devoted to clarifying the molecular mechanisms of reproductive development in plants [48,49], and techniques such as qRT-PCR and high-throughput RNA sequencing have been used in many transcript-level studies to identify genes that regulate plant reproductive development. However, the available information about TFs related to reproductive development in *H. brasiliensis* is still limited. To date, no study has determined the expression patterns of HbMADS-box genes in relation to rubber tree reproductive development.

As an important gene family, MADS-box genes are widespread in the plant kingdom. The number of MADS-box genes varies considerably among the genomes of different species, from 20 in *Physcomitrella patens* (Hedw.) Bruch & Schimp. to 167 in *Brassica rapa* L. [50,51]. In this study, we performed genome-wide identification and analysis of the MADS-box gene family related to flower development in *H. brasiliensis*. A total of 20 MADS-box genes were newly identified in the *H. brasiliensis* genome, which is fewer than the number of MADS-box genes in the woody plant *Malus pumila* Mill. [52]. A domain search using EMBL with the corresponding *H. brasiliensis* candidate protein sequences confirmed that 26 of the sequences contained a 'MADS' domain. We classified all 26 putative *H. brasiliensis* MADS-box proteins into three clades, which is consistent with the classification of MADS-box gene family members in other flowering plants [53]. Sequence analysis of the 26 MADS-box genes revealed ORFs ranging from 609 to 1116 bp and predicted proteins ranging from 202 to 371 amino acids (Table 1). Subsequent gene structure analysis showed that most of the HbMADS-box genes contained multiple introns, with a maximum of 11 exons interrupted by 10 introns; the exception was HbAGL61, which did not have any introns. Interestingly, we observed that only MIKC-type (no M-type) MADS-box genes existed in *H. brasiliensis*. In contrast, M-type MADS-box genes outnumber MIKC<sup>C</sup>-type genes in *Arabidopsis* [54]. Furthermore, we found that the expansion of MIKC<sup>C</sup>-type and MIKC\*-type MADS-box genes was disproportionate: there were more MIKC<sup>C</sup>-type (21 members) than MIKC\*-type (five members) MADS-box genes in *H. brasiliensis*, which might be because MIKC<sup>C</sup>-type genes act as functional genes that perform more complex functions in *H. brasiliensis* flower organogenesis.

Most functional studies of MADS-box genes have been conducted in model plants [55–57], and they have contributed substantially to revealing the diverse functions of MADS-box genes. To examine the expression of HbMADS-box genes in different tissues, we analyzed their expression patterns in 11 tissues and four stages of inflorescence development by qRT-PCR. The results revealed that the close orthologs of SEP and AGL6 in the rubber tree (HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, and HbAGL6) showed higher expression during stem tip and floral organ development, as did the close orthologs of FUL-AP1 and AG (HbAGL8, HbAP1, and HbAG) (Figure 3). The same expression pattern was also observed in subclade AP3-PI: HbDEFL and HbPMADS2 were highly expressed during stem tip and floral organ development, which indicates an association with the development of reproductive organs. These findings are consistent with previous results showing that MIKC-type MADS-box genes are floral homeotic genes that play a dominant role in floral organ development [58].

Interestingly, HbAGL6 expression increased progressively across the four stages of inflorescence development, and its expression level in male flowers was almost four-fold higher than that in female flowers, which indicates that HbAGL6 may be involved in the development of inflorescences and floral organs, especially male flower organs (Figure 3). In addition, HbDEFL also displayed strong expression in inflorescences and was almost four-fold more abundant in male flowers than in female flowers, which indicates that HbDEFL may play an important role in the development of male flowers. In contrast, HbTT16 showed strong expression in fruits and female flowers, and its expression level in female flowers was almost 40-fold higher than that in male flowers, which indicates an association with the development of female organs in the rubber tree. In other plant species, TT16 homologs have also been shown to mediate the crosstalk between the endothelium and nucellus during the development of female organs [59,60].

Besides regulating floral organ development, MADS-box genes also play an essential role in the response to various stresses and exhibit differential expression patterns under abiotic stress, as many studies have shown [61,62]. Rubber trees originated in the Amazon Basin of South America but are now widely planted at the northern margin of the tropics, in regions such as Thailand, Vietnam, and southern China, which often suffer from cold, drought, typhoons, and other abiotic stresses.

To understand the responses of HbMADS-box genes to various stresses, the expression profiles of 12 floral-enriched HbMADS-box genes were examined in leaves of *Hevea* seedlings after hormone, salt, cold, high-temperature, and drought treatments. Almost all of these genes changed their expression in response to the hormone, salt, cold, high-temperature, and drought treatments (Figures S4–S9). Our results show that different HbMADS-box genes responded differently to abiotic stress, which indicates that the stress responses of *H. brasiliensis* are regulated by many factors and involve a variety of signaling pathways. In addition, the responses of the 12 floral-enriched HbMADS-box genes to hormone and salt treatments were more intense than their responses to temperature and drought stresses. Among them, HbAGL9.2, HbAGL6, and HbAP1 were strongly induced by ABA; HbAGL9.1, HbAGL9.2, HbCMB1-L, HbAGL6, HbAGL8, HbAP1, HbTT16, and HbPMADS2 were strongly induced by GA3; and HbAGL9.2, HbCMB1-L, HbAGL6, HbAGL8, HbAP1, HbTT16, and HbPMADS2 were strongly induced by salt treatment, whereas their expression was only slightly affected by cold, high-temperature, and drought stresses. HbAGL9.2, HbAGL6, and HbAP1 were strongly induced by both ABA and GA3, which suggests that these genes might play a crucial role in hormone signaling in *H. brasiliensis*.

Overall, we characterized HbMADS-box genes as multifunctional regulators in *H. brasiliensis* based on tissue-specific expression analysis and responses to various stresses. Our results revealed that HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAGL6, HbAGL8, HbAP1, HbAG, HbDEFL, and HbPMADS2 may mainly regulate the differentiation of flower buds and could help regulate reproduction in *H. brasiliensis*, whereas HbTT16 may mainly regulate the development of fruits and female organs in the rubber tree. Expression profiling revealed that different HbMADS-box genes responded differently to abiotic stresses, which indicates that responses to abiotic stresses in *H. brasiliensis* are regulated by many factors and various signaling pathways. Our findings may support the development of genetic engineering technology for controlling the reproduction of *H. brasiliensis*. In future work, the roles of these HbMADS-box genes in the transition from vegetative to reproductive development can be verified using transgenic *H. brasiliensis* as well as transgenic *Arabidopsis*.

#### **5. Conclusions**

In this study, 20 new HbMADS-box genes were identified in the rubber tree. The bioinformatic characteristics, sequence identities, conserved domains, gene structures, and phylogenetic relationships of these genes were then systematically analyzed. Expression profiling revealed that HbMADS-box genes were differentially expressed in various tissues, which indicates that they may exert different functions throughout the life cycle. Additionally, HbSEP, HbAGL9.1, HbAGL9.2, HbCMB1, HbCMB1-L, HbAGL6, HbAGL8, HbAP1, HbAG, HbDEFL, HbTT16, and HbPMADS2 were found to be associated with the differentiation of flower buds and may be involved in flower development. All of these floral-enriched HbMADS-box genes were regulated by hormone, salt, cold, high-temperature, and drought treatments, which indicates that responses to abiotic stresses in *H. brasiliensis* are regulated by many factors and various signaling pathways. Our results provide a comprehensive overview of the HbMADS-box gene family related to floral development and lay the foundation for further functional characterization of this gene family in *H. brasiliensis*.

**Supplementary Materials:** The following are available online at www.mdpi.com/1999-4907/9/6/304/s1, Figure S1: Tissue samples of rubber tree flowers, Figure S2: The conserved domain logo and motif composition of HbMADS-box genes, Figure S3: Exon-intron structures of HbMADS-box genes, Figure S4: Expression profiles of the floral-enriched HbMADS-box genes under cold stress, Figure S5: Expression profiles of the floral-enriched HbMADS-box genes under high temperature stress, Figure S6: Expression profiles of the floral-enriched HbMADS-box genes under drought stress, Figure S7: Expression profiles of the floral-enriched HbMADS-box genes under salt stress, Figure S8: Expression profiles of the floral-enriched HbMADS-box genes responding to ABA treatment, Figure S9: Expression profiles of the floral-enriched HbMADS-box genes responding to GA3 treatment, Table S1: List of primer sequences used for HbMADS-box TFs amplification, Table S2: List of primer sequences used for qRT-PCR analysis, Table S3: Sequence features of AtMADSs in *A. thaliana* (L.) Heynh. Gene IDs, protein length, PI, and molecular weight of corresponding AtMADSs in *A. thaliana* are shown, Table S4: Percent identity matrix of HbMADS-box TF proteins, Table S5: Statistics of exon and intron number distribution among HbMADS-box TFs.

**Author Contributions:** M.W. and W.L. designed the experiments. R.P. performed the tissue and organ collection. M.W. and Y.W. performed the experiments. All authors read and approved the final manuscript.

**Funding:** This research was funded by the Central Public-interest Scientific Institution Basal Research Fund for Chinese Academy of Tropical Agricultural Sciences (No. 1630022017010) and the earmarked Fund for Modern Agro-industry Technology Research System (CARS-34).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Genome-Wide Analysis and Expression Profiling of the Heat Shock Factor Gene Family in** *Phyllostachys edulis* **during Development and in Response to Abiotic Stresses**

#### **Lihua Xie, Xiangyu Li, Dan Hou, Zhanchao Cheng, Jun Liu, Juan Li, Shaohua Mu and Jian Gao \***

Key Laboratory of Bamboo and Rattan Science and Technology, International Center for Bamboo and Rattan, State Forestry and Grassland Administration, Beijing 100102, China; xielihua0227@163.com (L.X.); leerduo727@163.com (X.L.); hou120314@foxmail.com (D.H.); chengzhan\_chao@126.com (Z.C.); liujun\_0325@163.com (J.L.); ljgx2003@126.com (J.L.); mush@icbr.ac.cn (S.M.) **\*** Correspondence: gaojian@icbr.ac.cn or gaojianicbr@163.com; Tel.: +86-010-8478-9801

Received: 3 December 2018; Accepted: 23 January 2019; Published: 26 January 2019

**Abstract:** Heat shock transcription factors (Hsfs) play crucial roles in regulating plant responses to heat and other stresses, as well as in plant development. Moso bamboo (*Phyllostachys edulis*) is the largest monopodial bamboo species in the world, and its ability to adapt to various stresses under global climate change is very important for the sustainable development of bamboo forests. However, our understanding of the function of Hsfs in moso bamboo is limited. In this study, a total of 22 non-redundant Hsf genes were identified in the moso bamboo genome. Structural characteristics and phylogenetic analysis revealed that members of the PheHsf family can be clustered into three classes (A, B and C). Furthermore, *PheHsf* promoters contained a number of stress-, hormone- and development-related cis-acting elements. Transcriptome analysis indicated that most *PheHsfs* participate in rapid shoot growth and flower development in moso bamboo. Moreover, the expression patterns of all 12 members of class A were analyzed under various stresses (heat, drought, salt and cold treatment) through real-time quantitative polymerase chain reaction (qRT-PCR). Within the class A *PheHsf* members, *PheHsfA1a* was expressed mainly during moso bamboo development. Expression of four *PheHsfA4s* and one *PheHsfA2* (*PheHsfA4a-1*, *PheHsfA4a-2*, *PheHsfA4d-1*, *PheHsfA4d-2*, and *PheHsfA2a-2*) was up-regulated in response to various stresses. *PheHsfA2a-2*, *PheHsfA4d-1* and *PheHsfA4d-2* were strongly induced by heat, drought and NaCl stress, respectively. Through co-expression analysis, we found that two hub genes, *PheHsfA4a-2* and *PheHsfA4a-1*, were involved in a complex protein interaction network. Based on the prediction of protein interaction networks, five *PheHsfAs* (*PheHsfA4a-1*, *PheHsfA4a-2*, *PheHsfA4d-1*, *PheHsfA4d-2*, and *PheHsfA2a-2*) were predicted to play an important role in flower and shoot development and in the abiotic stress response of moso bamboo. This study provides an overview of the complexity of the PheHsf gene family and a basis for analyzing the functions of PheHsf genes of interest.

**Keywords:** moso bamboo; heat shock factor gene; abiotic stresses; co-expression

#### **1. Introduction**

Moso bamboo (*Phyllostachys edulis* (Carrière) J. Houzeau, synonym *Phyllostachys heterocycla* (Carrière)) is a large woody bamboo of high ecological, economic and cultural value in Asia. Under suitable spring conditions, its shoot can grow from 0 to 20 m in 45–60 days [1]. Moso bamboo forest covers an area of 3.87 million hm<sup>2</sup>, accounting for up to 70% of the Chinese bamboo forest area [2,3]. Because of its rapid growth and highly lignified culms, the annual economic value of moso bamboo production, including timber and wood production, reaches 184 billion dollars [4]. Moreover, carbon sequestration in moso bamboo is two to four times greater than that of Chinese fir, making it an important global non-timber forest resource [5]. The growth of bamboo is dependent on natural precipitation and is vulnerable to high temperature and drought. Liu et al. [6] have shown that temperatures >40 °C and drought for >10 days during August result in severe losses in moso bamboo forests. Drought during spring can reduce moso bamboo shoot growth, yield and quality. From July to September, high temperatures and drought affect the sprouting phase of bamboo. These stresses affect the yield and quality of winter shoots, as well as new bamboo yield in the following year and the yield of wood during subsequent years [7]. Climate change has also been associated with more frequent high temperatures and drought, which in turn reduce the ecological and economic value of moso bamboo. Therefore, it is essential to elucidate the molecular mechanisms involved in heat and drought stress responses to improve stress tolerance in moso bamboo.

To survive high temperature and other stresses, plants have evolved a series of defense strategies [8,9]. Heat shock proteins (HSPs) act as molecular chaperones that protect cells against heat and other stress damage by preventing protein aggregation [10,11]. As the terminal components of the stress signal transduction chain, heat shock transcription factors (Hsfs) bind to the promoter regions of HSP genes to regulate transcription in response to stress [12,13], particularly heat stress [14]. Hsfs contain a highly conserved DNA-binding domain (DBD) at their N-terminus and an oligomerization domain (OD or HR-A/B region) composed of two hydrophobic heptad repeats. Based on the amino acid sequence of the HR-A/B region, plant Hsfs are grouped into three main classes (HsfA, HsfB and HsfC) [15]. Certain Hsfs also contain a nuclear localization signal (NLS), a nuclear export signal (NES) and an activator motif (AHA). The AHA motif located at the C-terminus in class A Hsfs confers transcriptional activation. Moreover, a repressor domain (RD) that contains the tetrapeptide LFGV occurs at the C-terminus of class B Hsfs.

Recent studies have revealed that plant Hsfs play important roles in generating responses to heat and other stimuli, as well as in organ development [16]. Class A HsfA1a is regarded as a master regulator and has a unique role in eliciting heat stress responses in tomato (*Solanum lycopersicum* L.) [17]. HsfA2 is functionally similar to HsfA1 in regulating thermotolerance, as well as serving as a key regulator in osmotic and oxidative stress responses [18–20]. The expression of *HsfA3* is induced in *Arabidopsis* by heat and drought stress, indicating that *HsfA3* might play a role in drought and heat stress signaling [21,22]. Thermotolerance was decreased in *Arabidopsis* *hsfA3* T-DNA insertion mutants [23]. Moreover, the ectopic overexpression of *SlHsfA3* increases thermotolerance and salt hypersensitivity during germination in transgenic *Arabidopsis* [24]. The *Arabidopsis* mutant *athsfa4a* was more sensitive to dehydration. Furthermore, desiccation tolerance was rescued in *athsfa4a/BnHSFA4a* seeds to levels similar to those of Col-0 [25]. Transgenic chrysanthemum overexpressing *CmHSFA4* displayed enhanced salinity tolerance, partly due to enhanced Na+/K+ ion and ROS homeostasis [26]. Interestingly, in rice, wheat and *Sedum alfredii*, HsfA4a possibly confers cadmium tolerance [27,28]. In addition, AtHsfA9 plays a role in embryonic development and seed maturation in *Arabidopsis* [29]. In group B, the majority of HsfBs act as repressors due to the RD region [15]. However, AtHsfB1 acts as a repressor of the heat shock response under non-heat-stress conditions, but as a positive regulator of heat shock proteins under heat-stress conditions [30]. In group C, *FaHsfC1b* from *Festuca arundinacea* confers heat tolerance in *Arabidopsis* [31].

Although several Hsf family genes in *Arabidopsis* and other plant species have been characterized, functional analysis of those in moso bamboo has been limited. The completion of the draft genome sequence of moso bamboo has greatly facilitated the identification of the Hsf family at the whole-genome level [32].

In this study, we describe the genome-wide identification and analysis of the PheHsf family of moso bamboo for the first time. In addition, expression patterns of *PheHsf* genes during development, as well as in response to various abiotic stresses, were also investigated. Our results will provide a foundation and valuable information for future functional analysis of PheHsfs.

#### **2. Materials and Methods**

#### *2.1. Database Searches for Hsf Genes in Moso Bamboo and Analyses of Physicochemical Characteristics*

For more accurate identification of Hsf genes in moso bamboo, multiple database searches were performed according to Hou et al. [33]. First, Hsf mRNA sequences of *Oryza sativa* and *Brachypodium distachyon*, obtained from the NCBI Nucleotide database (https://www.ncbi.nlm.nih.gov/), were used as query sequences in BLAST searches against the moso bamboo genome database. For the filtration step of the BLAST process, Hsf genes were obtained by searching the bamboo transcriptome with Hsf mRNAs identified in other species as queries, using a loose *e*-value of <0.00001. Next, the protein sequences of the putative genes obtained from the first step were searched against the NCBI non-redundant protein database with an *e*-value of <0.0000000001 using Blast2GO to confirm the identification. The putative genes described as Hsf proteins or proteins belonging to the Hsf family were kept, and genes described as proteins of other families were discarded. Then, the HSF domain (PF00447) was searched for in these putative Hsf proteins to reconfirm our data using the Pfam database (http://pfam.sanger.ac.uk/) and the Conserved Domains Database (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) with *e*-values <0.001. Only the gene models containing the HSF domain were considered to belong to the Hsf family. Finally, the selected Hsfs were further screened using the full-length non-chimeric (FLNC) reads (http://www.forestrylab.org/db/PhePacBio/) [34]. The moso bamboo Hsf genome sequences, coding sequences, protein sequences, and putative novel or mis-annotated *Hsf* genes were obtained from the moso bamboo genome database. The amino acid sequences of *Arabidopsis*, rice (*Oryza sativa*), and *B. distachyon* Hsf proteins were downloaded from the Plant Transcription Factor Database v4.0 (PlantTFDB 4.0, http://planttfdb.cbi.pku.edu.cn/) and Heatster (http://www.cibiv.at/services/Hsf/) [35,36]. Molecular weights and theoretical isoelectric points (pI) were determined using ExPASy (http://web.expasy.org/compute\_pi/). The CELLO v2.5 server (http://cello.life.nctu.edu.tw/) was used to predict the subcellular locations of the candidate PheHsfs [37].
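The successive *e*-value filters described above can be illustrated with a minimal Python sketch. It assumes that BLAST hits have been exported in the standard 12-column tabular format, in which the second column is the subject ID and the eleventh is the e-value. The function name `filter_hits` and the example rows are hypothetical and are not taken from the study.

```python
def filter_hits(lines, max_evalue):
    """Return the subject IDs of hits whose e-value is below max_evalue.

    Assumes BLAST tabular output: 12 tab-separated columns, with the
    subject ID in column 2 and the e-value in column 11.
    """
    kept = set()
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue < max_evalue:
            kept.add(subject_id)
    return kept


# Hypothetical rows (made-up IDs and scores, for illustration only).
example = [
    "query1\tgeneX\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "query1\tgeneY\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.002\t35",
]

# Loose first-pass threshold, as in the transcriptome search step.
candidates = filter_hits(example, max_evalue=1e-5)
```

Applying the same function with a stricter threshold (for example `1e-10`) would mirror the confirmation step against the non-redundant protein database.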

#### *2.2. Phylogenetic Analysis*

*Arabidopsis* and rice Hsf gene datasets [38] were used to classify the moso bamboo Hsf genes and predict their functional roles. Multiple sequence alignments of full-length Hsf proteins from *Arabidopsis*, rice, *B. distachyon*, and moso bamboo were performed using ClustalX 1.83 (http://www.clustal.org/) and two online programs (Clustal Omega and MUSCLE) [27,39–41]. An unrooted neighbor-joining (NJ) phylogenetic tree was constructed using MEGA7.0 with 1000 bootstrap replicates. A second phylogenetic tree containing only the moso bamboo Hsfs was constructed from the amino acid sequences using the same method.

#### *2.3. Structural and Motif Analyses of PheHsf Genes*

To confirm subgroup designation through phylogenetic analysis, the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn/) was used to illustrate exon-intron organization by aligning each cDNA sequence with its corresponding genomic DNA sequence [42]. The conserved motifs in the candidate PheHsf sequences were defined by MEME version 4.12.0 (http://meme-suite.org/tools/meme) [43] using the following parameters: number of repetitions = any, maximum number of motifs = 30, minimum width ≥4, and maximum width ≤200. Only motifs with an e-value <0.01 were retained for further analysis. The DBD (DNA-binding domain) and HR-A/B domains were identified using Heatster (http://www.cibiv.at/services/Hsf/). NES domains in the PheHsfs were predicted with the NetNES 1.1 server (http://www.cbs.dtu.dk/services/NetNES/). NLS domains were predicted using cNLS Mapper (http://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS\_Mapper\_form.cgi). AHA domains were predicted according to the conserved motif sequence FWxxF/L, F/I/L [16].

#### *2.4. Cis-Regulatory Element Analysis of PheHsf Genes*

The 1500-bp sequence upstream from the initiation codon of each *PheHsf* gene was obtained from the moso bamboo genome database. These sequences were used to identify *cis*-acting regulatory elements with the online program PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/).
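How such an upstream region can be cut from a scaffold is sketched below in Python. This is only an illustration under stated assumptions: the function names, the 0-based coordinate convention, and the toy sequences are hypothetical, and the study itself retrieved the sequences from the moso bamboo genome database. The sketch shows why strand must be taken into account: for a gene on the minus strand, the promoter lies to the right of the start codon on the forward sequence and must be reverse-complemented.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def upstream_region(scaffold, atg_pos, strand, length=1500):
    """Return up to `length` bases upstream of the initiation codon.

    scaffold: forward-strand scaffold sequence.
    atg_pos:  0-based forward-strand index of the base that carries the
              'A' of the ATG (for a minus-strand gene, this is the
              position paired with that 'A').
    strand:   "+" or "-".
    The result is truncated if the gene lies near a scaffold end.
    """
    if strand == "+":
        return scaffold[max(0, atg_pos - length):atg_pos].upper()
    if strand == "-":
        return reverse_complement(scaffold[atg_pos + 1:atg_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")


# Toy scaffolds (hypothetical), using a 3-bp window for readability.
plus_example = upstream_region("AAACCCATGGGG", 6, "+", length=3)
minus_example = upstream_region("TTTCATGGG", 5, "-", length=3)
```

In both toy cases the returned sequence is written 5' to 3' relative to the gene, which is the orientation expected when a promoter is scanned for *cis*-elements.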

#### *2.5. Plant Material*

After surface sterilization with 1% formaldehyde, moso bamboo seeds were germinated in Petri dishes (12-cm diameter) lined with filter paper and containing 10 mL of sterile water. After 4 days, the germinated seedlings were planted in vermiculite and watered weekly with 1/2 Hoagland's nutrient medium (all plants were grown under 16 h day/8 h night at 22 °C). Two-month-old seedlings were used for abiotic stress treatments. According to Cheng et al. [44], drought and salt stress were imposed by incubating seedlings with 20% (*m*/*v*) PEG6000 and 200 mM NaCl, respectively. Heat and low-temperature treatments were imposed by placing seedlings in a lighted growth chamber at 42 °C and 4 °C, respectively, according to Liu et al. [45]. The control seedlings were grown without any stress treatment. The second or third mature leaves were collected at 0 h, 15 min, 30 min, 1 h, 3 h, 6 h, 12 h, and 24 h after the start of the abiotic stress treatments. These materials were immediately frozen and stored in liquid nitrogen until total RNA extraction and real-time quantitative polymerase chain reaction (qRT-PCR).

#### *2.6. RNA Isolation and Relative Expression Level Analysis of PheHsfs*

For the tissue-specific expression analysis, the RPKM (reads per kilobase of exon model per million mapped reads) values of *PheHsfs* were retrieved from transcriptome sequencing of developing flowers (four stages of flowering and leaves) and shoots of moso bamboo (winter shoots, six shoot heights, and culms) [1,46]. RPKM values were used to analyze the relative expression levels of the *PheHsf* genes. For the flower development samples, four developmental stages (F1: the floral buds began to form; F2: the floral organs gradually matured; F3: the flowers were in full blossom; and F4: the embryo formation stage) were defined based on the anatomical structure of floral organs by Gao et al. [46]. Buds were collected for F1 and F2, and spikelets were collected for F3 and F4. Leaves collected from non-flowering moso bamboo were defined as CK1. For the shoot development samples, four developmental stages of shoots were defined based on continuous measurement of bamboo shoot height and anatomical changes by Li et al. [1]. The eight samples corresponding to the four developmental stages of moso bamboo shoots were as follows: S1 (winter shoots), S2–S5 (0.5 m, 1 m, 3 m, and 6 m, early growth period), S6–S7 (9 m and 12 m, late growth period), and CK2 (culm after leaf expansion, mature period). For S1–S7, the shoot tips at the different heights were collected. For CK2, each top internode was cut from the top to 1/2, then each top internode was divided into basal, middle and top parts. After that, the samples were cut from the tissue located in the top part of the three internodes above and then mixed.
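For readers unfamiliar with the unit, the RPKM definition given above can be written as a one-line calculation. The Python sketch below is illustrative only; the function name and the example numbers are hypothetical and are not values from the transcriptome data sets used here.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = reads / (exon length in kb * total mapped reads in millions)
         = reads * 1e9 / (exon length in bp * total mapped reads)
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2000-bp exon model in a library
# of 25 million mapped reads.
value = rpkm(500, 2000, 25_000_000)
```

Because the measure normalizes for both transcript length and sequencing depth, values from different genes and libraries can be compared on a common scale.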

For qRT-PCR, *PheHsf* primers were designed using Primer 3.0 (http://primer3.ut.ee/). Primer sequences, amplicon length, amplification efficiency, and correlation coefficients are listed in Table S1, and primer specificity was verified using the online tool Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/index.cgi) and the melting curves of the PCR products. For every primer pair, a standard curve was constructed to calculate the gene-specific PCR efficiency from a 10-fold dilution series of the mixed cDNA template. The R<sup>2</sup> (correlation coefficient) and slope values were obtained from the standard curve. The following formula was used to calculate the corresponding PCR amplification efficiencies (E): E = (10<sup>−1/slope</sup> − 1) × 100 [47]. The tonoplast intrinsic protein 41 gene (*TIP41*) was used as an internal control [48]. The qRT-PCR reactions were conducted using a SYBR Green I master mix (Roche, Mannheim, Germany). The qRT-PCR conditions were as follows: 45 cycles of 95 °C for 10 s, 60 °C for 10 s, and 72 °C for 20 s. Three replicates were performed for each gene. Gene expression was evaluated using the 2<sup>−ΔΔCt</sup> method [49].
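The two calculations named in this paragraph, the amplification efficiency from the standard-curve slope and the relative expression from Ct values, can be sketched in Python as follows. The function names and the Ct values are hypothetical, chosen only to make the arithmetic easy to follow. The second function assumes the usual form of the 2<sup>−ΔΔCt</sup> method, in which the target gene is first normalized to the reference gene and the treated sample is then compared with the control.

```python
import math


def amplification_efficiency(slope):
    """PCR efficiency in percent: E = (10^(-1/slope) - 1) * 100."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    # Normalize the target gene to the reference gene in each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control sample.
    return 2.0 ** (-(delta_treated - delta_control))


# A slope of -1/log10(2) (about -3.32) corresponds to a perfect
# doubling of product per cycle, i.e., an efficiency of 100%.
ideal_slope = -1.0 / math.log10(2)
efficiency = amplification_efficiency(ideal_slope)

# Hypothetical Ct values: the target is detected 3 cycles earlier
# (relative to the reference gene) in the treated sample.
fold_change = relative_expression(22.0, 20.0, 25.0, 20.0)
```

With these illustrative inputs the efficiency is 100% and the fold change is 8, since a ΔΔCt of −3 corresponds to 2<sup>3</sup>. The 2<sup>−ΔΔCt</sup> calculation is generally taken to assume that target and reference genes amplify with efficiencies close to 100%, which is why the efficiency is checked for every primer pair.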

#### *2.7. Co-Expression Network and Protein Interactions of PheHsfs*

The expression correlation of the PheHsfs was calculated in R as the Pearson correlation coefficient (PCC; R-value) using gene expression RPKM values from the high-throughput transcriptome data. The expression correlation data were used to build the correlation network, and co-expressed gene pairs were filtered with a PCC cut-off of 0.85 as previously described [50]. Cytoscape version 3.4.0 was used to analyze and visualize the network [51].
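The network construction described above was carried out in R and Cytoscape; the short Python sketch below only illustrates the underlying logic. It computes pairwise Pearson correlations, keeps gene pairs whose PCC magnitude exceeds the cut-off, and counts the connections of each gene, which is one simple way to rank candidate hub genes. Gene names and expression values are hypothetical toy data, and applying the threshold to the absolute value of the PCC is an assumption of this sketch.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation is undefined
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, cutoff=0.85):
    """Return (gene1, gene2, pcc) for pairs with |PCC| above the cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) > cutoff:
            edges.append((g1, g2, r))
    return edges


def node_degrees(edges):
    """Count the number of co-expression partners of each gene."""
    degree = {}
    for g1, g2, _ in edges:
        degree[g1] = degree.get(g1, 0) + 1
        degree[g2] = degree.get(g2, 0) + 1
    return degree


# Toy expression profiles across four samples (hypothetical values).
toy = {
    "gene1": [1, 2, 3, 4],
    "gene2": [2, 4, 6, 8],
    "gene3": [4, 1, 3, 2],
    "gene4": [8, 6, 4, 2],
}
edges = coexpression_edges(toy)
degrees = node_degrees(edges)
```

In the toy data, `gene3` correlates only weakly with the others and therefore does not appear in the network, whereas the remaining three genes are all connected to each other.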

For the protein interaction networks, interaction networks of the homologous Hsf proteins in rice were constructed with STRING (http://stringdb.org/) using an option value >0.7. The homologs of the interacting rice proteins were then identified in moso bamboo by BLASTP analysis.

#### **3. Results**

#### *3.1. The Hsf Family Genes in Moso Bamboo*

To identify Hsf genes in moso bamboo, a total of 41 candidate PheHsf genes were retrieved from the annotation in the Bamboo Genome Database. Of these, 19 candidate PheHsf genes with incomplete HSF domains were considered PheHsf-like genes and were not selected for subsequent analysis in this study. Twenty-two putative moso bamboo PheHsf genes containing full HSF domains (PF00447) were confirmed by searching the Pfam and the Conserved Domain Databases. However, the CDS sequences of PheHsfA5, PheHsfB4a-1 and PheHsfB4c-1 contained inserts of 42 bases (from base 268 to 309), 102 bases (233 to 336) and 123 bases (256 to 377), respectively, when compared to the cloned cDNA sequences (Fasta S3). For this analysis, the CDS and amino acid sequences of these three genes are based on the cDNA sequences. The CDS and amino acid sequences are listed in the supplementary files (Fasta S1 and S2). The 22 identified PheHsf genes were distributed among 22 scaffolds (Table 1). The 22 PheHsf genes were named according to their correspondence with the Hsf genes of rice and *B. distachyon*. The length of the PheHsf proteins ranged from 247 amino acids (aa) (PheHsfC1b-1) to 679 aa (PheHsfA1a), the predicted isoelectric points (pI) varied between 4.70 (PheHsfA6a) and 9.81 (PheHsfB4c-1), and the molecular weight (MW) ranged from 26.77 kDa (PheHsfC1b-1) to 75.05 kDa (PheHsfA1a) (Table 1).

#### *3.2. Phylogenetic Relationships and Multiple Sequence Alignment of PheHsf Genes*

To predict the potential functions of the PheHsfs, an unrooted phylogenetic tree was constructed from an alignment of 94 full-length Hsf protein sequences from different species (23 AtHsf, 25 OsHsf, 24 BdHsf, and 22 PheHsf) (Figure 1). Because the alignment results of Hsf proteins in moso bamboo using ClustalX 1.83, MUSCLE, and Clustal Omega were similar, we employed the results of ClustalX 1.83 and MEGA7.0 to illustrate the phylogenetic relationships of the PheHsf family. The PheHsf genes were grouped into three classes: class A (PheHsfA1a, PheHsfA2a-1, PheHsfA2a-2, PheHsfA4a-1, PheHsfA4a-2, PheHsfA4d-1, PheHsfA4d-2, PheHsfA6b-2, PheHsfA6b-1, PheHsfA6a, PheHsfA5, and PheHsfA7a), class B (PheHsfB1, PheHsfB2a, PheHsfB2c, PheHsfB4c-1, PheHsfB4c-2, PheHsfB4a-2, and PheHsfB4a-1), and class C (PheHsfC1b-1, PheHsfC1b-2, and PheHsfC2b). This grouping is similar to that described in *Arabidopsis* [52], rice [38], and *B. distachyon* [53]. In moso bamboo, class A was the largest and consisted of 12 members from six subclasses; three subclasses (A3, A8, and A9) of this class were not detected. There were seven members in class B from subclasses B1, B2, and B4, but no members in subclass B3. Class C was the smallest, with only three members.


**Table 1.** Overview of *PheHsf* genes in moso bamboo.

Gene ID: refers to the Bamboo Genome Database (http://202.127.18.221/bamboo/index.php); MW: predicted molecular weight of PheHsf proteins; pI: predicted isoelectric point of PheHsf proteins.

#### *3.3. Structure and Motif Analyses of PheHsf Genes*

To better understand the gene structure diversity of PheHsfs, we compared the intron-exon arrangements and the conserved motifs (Figure S1). The number of introns in the Hsf genes of moso bamboo ranged from zero to three. Most of the PheHsfs (19/22) contained one to two introns (Figure S1b). Three introns were found in PheHsf4, whereas none were detected in PheHsfB4a-1 and PheHsfB4c-1.

Based on the known information on functional domains of Hsfs in some model plants [15,54], the sequences and positions of similar domains were identified in the PheHsfs by sequence alignment. As shown in Table 2, five conserved domains (DBD, HR-A/B, NLS, NES, and AHA) were identified. The DBD, comprising three α-helices (α1–3) and four β-sheets (β1–4), was found in all PheHsfs (Figure S2a). The HR-A/B domain is critical for the interaction of one Hsf with other Hsfs to form a trimer [15]. All class A PheHsfs have a 21 amino-acid (aa) insertion between the HR-A and HR-B regions, class C PheHsfs have a seven-aa insertion, and class B PheHsfs have no insertion (Figure S2b). The NLS and NES domains function in the assembly of a nuclear import complex and in receptor-mediated export in complex with the NES receptor, respectively [15]. The majority of the PheHsfs showed the presence of an NES and/or NLS domain. In addition, the activation domain AHA was found in all class A members but not in classes B and C (Table 2).

**Figure 1.** Phylogenetic relationship of PheHsf, AtHsf and OsHsf proteins. Neighbor-joining method and MEGA 7.0 software were used for phylogenetic analysis of Hsf proteins from *Phyllostachys edulis* (22 PheHsfs), *Arabidopsis thaliana* (23 AtHsfs), *Oryza sativa* (25 OsHsfs), and *Brachypodium distachyon* (24 BdHsfs). The names of subclass are shown outside of the circle. Branch lines of subclass are colored, indicating different Hsf subclasses.

A MEME motif search revealed a total of 19 motifs in the PheHsfs (Figure S1c). Three motifs (1, 2, and 4) constituting the DBD were identified. Motifs 3 and 5 of the OD were detected in classes A and C, and motif 6 of the OD was observed in class B. Motifs 6 and 11, motif 8, and motif 17 (NLS) were observed in classes A, B and C, respectively. In general, the structure of the PheHsf proteins is conserved in moso bamboo. Furthermore, motif 7 represented the AHA domain close to the C-terminus of the PheHsfAs (Figure S1c and Table 2).


#### **Table 2.** Functional domains of PheHsfs.

DBD: DNA-binding domain; HR-A/B: OD (oligomerization domain); NLS: nuclear localization signal; AHA: activator motifs; RD: tetrapeptide motif LFGV as core of the repressor domain; NES: nuclear export signal; nd: no motifs detectable by sequence similarity search.

#### *3.4. Cis-Regulatory Element Analysis in Promoters of PheHsfs*

To predict the biological functions of the PheHsfs, the 1500-bp sequences upstream of the translation start sites of the PheHsf genes were analyzed using the PlantCARE database. The results show that the promoter of each PheHsf has several *cis*-regulatory elements, including phytohormone- (abscisic acid, jasmonic acid and gibberellic acid), abiotic stress- (low temperature, heat stress, drought, and fungal elicitor), and developmental process-related elements. Figure S3 shows that the ABA-responsive element (ABRE), the MeJA-responsive element (CGTCA-motif), and the SA-responsive element (TCA-element) were found in the promoters of 17, 16, and 11 PheHsf genes, respectively. The promoters of 12 and 10 PheHsf genes contained the HSE and LTR, respectively. MYB-binding sites involved in drought inducibility (MBS), fungal elicitor-responsive elements (Box-W1) and defense- and stress-responsive elements (TC-rich) were found in 17, 15 and 8 PheHsf genes, respectively. Additionally, meristem expression (CAT-box), meristem-specific activation (CCGTCC-box), and endosperm expression (Skn-1\_motif) motifs were found in 13, 10 and 18 PheHsf genes, respectively. These findings indicate that PheHsfs might be associated with different transcriptional regulatory mechanisms for developmental, hormone and stress processes.

#### *3.5. Expression Pattern of the PheHsf Genes in Shoot and Flower Development*

Based on the RNA-Seq data of different flowering developmental stages [43] and the internodes of shoots at different heights [1], a heat map was constructed according to the RPKM of the 22 PheHsfs (Figure 2). During the four flowering developmental stages of moso bamboo, *PheHsf* genes could be classified into four groups (A, B, C, and D) according to their relative expression levels (Figure 2a). Most of the *PheHsfs* were highly expressed (RPKM > 10) in at least one stage, and only four PheHsf genes were expressed at low levels (RPKM < 2) in at least two stages during floral development (Table S2). Four members (*PheHsfB4a-1*, PheHsf15, *PheHsf12*, and *PheHsfB4c-1*) of group C had higher transcript accumulation during the two earlier stages (F1 and F2) and were downregulated at the two later stages (F3 and F4). Group D consisted of 10 PheHsf genes (*PheHsfA4a-1*, *PheHsfB2c*, *PheHsfA1a*, *PheHsfA4a-2*, *PheHsfC2b*, *PheHsfA7a*, *PheHsfB2a*, *PheHsfA2a-2*, *PheHsfA6b-1*, and *PheHsfB1*), whose expression levels steadily increased at the F3 and F4 stages. Group B comprised six *PheHsfs* (*PheHsfA4d-1*, *PheHsfA4d-2*, *PheHsfA5*, *PheHsfA6b-2*, *PheHsfA6a*, and *PheHsfA2a-1*), showing higher transcript accumulation at the two later stages (F3 and F4) and in the leaves. Group A had only two genes, *PheHsfC1b-1* and *PheHsfC1b-2*, with expression levels six times and three times higher in leaves than in the four stages of floral development, respectively. During shoot growth, most *PheHsfs* (19/22) showed very low expression levels (RPKM < 8) at all seven stages, and only two genes (*PheHsfA1a* and *PheHsfA6b-2*) had higher expression levels (RPKM > 20) in at least one stage of bamboo shoot growth (Table S3). Based on the expression profiles, the *PheHsfs* were classified into four groups (Figure 2b). Among these, *PheHsfA1a* and *PheHsfA2a-2* were clustered into the same group, with continuously down-regulated expression from the S1 to the S7 stage. In contrast, *PheHsfA6b-2* and *PheHsfB2c* were clustered in the same group, which showed two expression peaks, at the S2 and S5 stages.

**Figure 2.** Expression pattern of *PheHsfs* in developing flowers and shoots of moso bamboo. (**a**) The expression profile of *PheHsfs* in different stages of flowering. F1–F4: the four stages of flower development (the floral buds began to form, the floral organs gradually matured but did not undergo flowering, the flowers were in full blossom, and the embryo formation stage); CK1: leaves. (**b**) The expression profile of *PheHsfs* in different stages of shoots. S1: winter shoot; S2–S7: different heights of shoots (0.5 m, 1 m, 3 m, 6 m, 9 m and 12 m); CK2: culms after leaf expansion. All the samples had one repeat. The heat map was drawn using R based on the RPKM of the *PheHsfs* in each sample. Details of the RPKM are shown in Tables S2 and S3.

Across the different periods of flower and shoot development in moso bamboo, the *PheHsfA1a* and *PheHsfA6b-2* genes showed high transcript accumulation in all 11 stages, whereas four genes (*PheHsfB4a-1*, *PheHsfA4d-1*, *PheHsfA7a*, and *PheHsfB4a-2*) exhibited extremely low transcript accumulation at all stages. Moreover, the other 15 *PheHsfs* showed significantly higher transcript levels during flower development than during shoot growth. These findings indicate that these genes have different regulatory roles in moso bamboo flower development and shoot growth.

#### *3.6. PheHsfAs Expression in Moso Bamboo in Response to Various Stresses*

The expression of the 12 PheHsfAs was analyzed by qRT-PCR under four abiotic stresses: high temperature, cold, drought and salt (Figure 3). The expression levels of all members of class A were upregulated (>2-fold) after at least one treatment (Table S4, Figure 3b). For heat stress, eight PheHsfAs were upregulated (>2-fold). PheHsfA4d-1, PheHsfA2a-2, and PheHsfA6b-1 responded rapidly to high temperature, showing upregulation of up to ~nine-fold within 0–1 h of the 42 °C treatment (Table S4). Among these, PheHsfA2a-2 (~22-fold higher than the control) was the most strongly induced gene. For cold stress, the expression levels of seven PheHsfAs (PheHsfA4a-1, PheHsfA1a, PheHsfA4a-2, PheHsfA4d-2, PheHsfA4d-2, PheHsfA5 and PheHsfA2a-2) were at least two-fold greater than the control at one or more time points. For drought and salt stress, eight PheHsfAs (PheHsfA4a-1, PheHsfA4a-2, PheHsfA4d-1, PheHsfA4d-2, PheHsfA5, PheHsfA7a, and PheHsfA6a) were upregulated. The expression level of PheHsfA4d-2 was ~20-fold higher than the control after 3 h of 20% PEG treatment. The expression level of PheHsfA4d-1 was ~70-fold higher than the control after 3 h of 200 mM NaCl treatment (Table S4). Among the PheHsfA genes, five (PheHsfA4a-1, PheHsfA4a-2, PheHsfA4d-1, PheHsfA4d-2, and PheHsfA2a-2) were upregulated relative to the control under various stress treatments (Figure 3b). These findings indicate that these genes might play vital roles in different stress response pathways.

**Figure 3.** Expression analysis of PheHsfAs under different abiotic stress treatments in moso bamboo. (**a**) Heat map representation for the expression patterns of PheHsfAs after 15 min, 30 min, 1 h, 3 h, 6 h, 12 h, and 24 h of heat, cold, drought, and salt stresses: expression levels under stress vs. control; the expression results were obtained by qRT-PCR. The different colors correspond to log2 transformed value. (**b**) Venn diagram showing the number of overlapping PheHsfAs that are up-regulated > two-fold under abiotic stress: heat, cold, salt, and drought. Details of the expression data are shown in Table S4.
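The two operations behind Figure 3, the log2 transformation of fold changes for the heat map and the selection of genes upregulated more than two-fold for the Venn diagram, can be sketched in Python as follows. The gene names and fold-change values are hypothetical toy data, not the values in Table S4, and treating a gene as upregulated when it exceeds the threshold at one or more time points is an assumption of this sketch.

```python
import math


def log2_fold_changes(fold_changes):
    """Log2-transform a list of fold changes (stress vs. control)."""
    return [math.log2(fc) for fc in fold_changes]


def upregulated_genes(expression, threshold=2.0):
    """Genes whose fold change exceeds the threshold at any time point."""
    return {gene for gene, values in expression.items()
            if max(values) > threshold}


# Hypothetical fold changes at successive time points under two stresses.
toy = {
    "heat": {"geneA": [1.0, 4.0, 8.0], "geneB": [1.0, 1.5, 1.2]},
    "drought": {"geneA": [0.5, 2.5, 3.0], "geneB": [3.0, 2.0, 1.0]},
}

# One set of upregulated genes per stress, then their overlap,
# which corresponds to the central region of a Venn diagram.
per_stress = {stress: upregulated_genes(values)
              for stress, values in toy.items()}
shared = set.intersection(*per_stress.values())
heat_map_row = log2_fold_changes(toy["heat"]["geneA"])
```

On the log2 scale a value of 1 corresponds to the two-fold threshold, so induction and repression of equal magnitude appear symmetrically around zero in the heat map.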

#### *3.7. Expression Correlation and Interaction Networks*

To further investigate the PheHsf proteins and how they interact with each other, a co-expression network was constructed using the expression values of PheHsf genes during shoot and flower development. Gene pairs with a PCC magnitude >0.85 were considered strongly co-expressed [36]. With this PCC cut-off, a co-expression network of 18 PheHsf genes and 38 correlations was obtained (Figure 4a). Genes with stronger correlations might act as interacting partners in similar biological pathways. Based on these results, *PheHsfA4a-1* and *PheHsfA4a-2* were recognized as hub genes, with 10 and 9 connections in the whole network, respectively, more than any other PheHsf gene. Both *PheHsfA4a-1* and *PheHsfA4a-2* were up-regulated in response to different stress treatments (heat, cold, drought and salt). These results indicate that the two *PheHsfA4as*, which have a prominent role in shoot and flower development, also have an important role in stress responses.

To identify the proteins and protein complexes associated with the two PheHsfA4as, prediction networks were built with STRING (http://www.string-db.org/) based on the interaction network of the rice orthologous genes (Figure 4b). Because the rice ortholog of both *PheHsfA4a-1* and *PheHsfA4a-2* was *OsHsfA4a*, the moso bamboo proteins predicted to participate in the interaction networks of *PheHsfA4a-1* and *PheHsfA4a-2* were the same. They interacted directly with 10 identified proteins, including five HSP70 proteins, one HSP90 protein, and four MAPK proteins. Because *PheHsfA2a-2* (~22-fold higher than the control), *PheHsfA4d-2* (~20-fold higher than the control) and *PheHsfA4d-1* (~79-fold higher than the control) were strongly induced by heat, drought and NaCl stress, respectively, we also identified the proteins and protein complexes associated with the two PheHsfA4ds and *PheHsfA2a-2*. The PheHsfA4ds and *PheHsfA2a-2* were also predicted to interact with HSP70 proteins, an HSP90 protein, and MAPK proteins (Figure 4c,d).

**Figure 4.** Co-expression network and interaction network of selected PheHsf genes in moso bamboo. (**a**) The model was built based on RPKM of RNA-seq of shoot and flower development. (**b**–**d**) Interaction network of PheHsfA4as, PheHsfA4ds and PheHsfA2a-2 in moso bamboo, respectively. Colored balls (protein nodes) in the network were used as a visual aid to indicate different input proteins and predicted interactors. Protein nodes which are enlarged indicate the availability of 3D protein structure information. Gray lines connect proteins which are associated by recurring textmining evidence. Line thickness indicates the strength of data support.

#### **4. Discussion**

#### *4.1. Characterization of the Moso Bamboo Hsf Genes Family*

Hsf genes play essential roles in plant development and in responding to various stress conditions [55,56]. To explore the functions of *PheHsfs* in moso bamboo for the first time, the present study identified a total of 22 PheHsf genes according to the moso bamboo genome database and the FLNC reads database [32,34,57].

We found that moso bamboo has a similar number of *Hsfs* to rice, *B. distachyon*, maize, and *A. thaliana* (22–25). This similarity is consistent with the conservation of the Hsf family during evolution. Phylogenetic analysis of Hsfs in moso bamboo, *O. sativa*, *B. distachyon*, and *A. thaliana* indicated that PheHsfs have a higher degree of sequence similarity with OsHsfs and BdHsfs than with AtHsfs, which coincides with the evolutionary relationships among the four species. All three *Hsf* classes (A, B, and C) were identified in the three monocots and the one dicot, implying that the Hsf genes originated prior to the divergence of monocots and dicots.

In the investigation of conserved Hsf domains and intron-exon structures, all 22 PheHsfs were found to contain the necessary domains (DBD and OD) and/or specific protein domains (NLS, NES, and AHA). The hydrophobic core of the DBD ensures precise positioning and highly selective interaction with heat stress promoter elements [15]. The OD of plant Hsfs confers distinct patterns of specificity for hetero-oligomerization [15]. The AHA motifs, which are located at the C-terminus, are characterized by aromatic (W, F, Y), large hydrophobic (L, I, V) and acidic (E, D) amino acid residues [15]. These domains might be essential for functional conservation [56]. Twenty of the twenty-two PheHsf genes have one intron in their DBD (Figure S1), which is an evolutionarily conserved intron [13]. However, no intron was identified in *PheHsfB4a-1* and *PheHsfB4c-1* of subclass B4, which differs from subclass B4 of rice and *B. distachyon* [53].

#### *4.2. Cis-Regulatory Element Analysis in the Promoters of PheHsfs*

Previous studies have illustrated the key roles of *Hsfs* in developmental processes and stress tolerance through their regulation of target genes [58]. The number and form of *cis*-elements in the promoter region might play an essential role in the regulation of gene expression in relation to metabolic pathways [59]. The in silico survey of putative *cis*-elements using PlantCARE showed that 12 of the 22 *PheHsf* promoters contained HSEs. This implies that *PheHsfs* might regulate themselves [55]. Additionally, the promoter region of *PheHsfA1a* contained more types of development-related *cis*-elements (Figure S3), which is consistent with its constitutive expression during shoot and flower development in moso bamboo (Figure 2).

#### *4.3. PheHsfAs Involvement in Development Processes*

The 22 *PheHsfs* exhibited diverse expression patterns during shoot and flower development of moso bamboo under normal conditions. *PheHsfA1a* was upregulated during shoot and flower development and was constitutively expressed in different tissues (Figure 2). These findings are similar to those in *Arabidopsis* [60] and *Salix suchowensis* [61]. Under normal conditions, class A1 *Hsfs* of *Arabidopsis* are involved in housekeeping processes, and *SsuHsf-A1a* of *Salix suchowensis* is constitutively expressed in different tissues [60,61]. In rice, nearly all class A members showed high expression levels in all tissues [45,62]. In this study, nearly all the *PheHsfAs* (except *PheHsfA7a*) showed high transcription levels in the leaves, culms, and the four stages of flower development, similar to the pattern in rice. Only two *PheHsfAs* (*PheHsfA1a* and *PheHsfA6b-2*) had high transcription levels in the seven stages of shoot development. This indicates that the function of *PheHsfAs* is conserved and/or specific in regulating flower development and shoot growth in moso bamboo.

#### *4.4. PheHsfAs are Involved in Stress Responses*

Under heat or other stress conditions, plant Hsfs regulate the transcription of target genes (*Hsps* and other stress-inducible genes) to enhance stress resistance. Recent genome-wide expression profile analyses showed that most Hsf genes are upregulated after heat, cold, drought, and salt stress [56,58,60]. An increase in the number of Hsf genes has been shown to improve plant stress tolerance [54,63]. In moso bamboo, Zhao et al. identified seven and two *PheHsfs* that were upregulated after 0.5 h and 8 h of high light stress (1200 μmol·m<sup>−2</sup>·s<sup>−1</sup>), respectively, suggesting that these genes play vital roles in the response to both short-term (0.5 h) and mid-term (8 h) high light [64]. However, the regulatory roles of Hsfs in response to other abiotic stresses are unclear. Classes B and C do not harbor AHA motifs, which are essential for the activity of class A Hsfs [16]. Therefore, class A PheHsf genes were selected to further study their response patterns under stress and hormone treatments.

Under heat treatment, 9 of 12 *OsHsfAs* and 9 of 13 *BdHsfAs* were upregulated in rice and *B. distachyon*, respectively [59,62]. In our study, 8 of 12 *PheHsfAs* were found to be induced by high temperature. Of these, *PheHsfA4d-1*, *PheHsfA2a-2*, and *PheHsfA6b* responded more quickly to heat stress than the others. *PheHsfA2a-2* was more strongly induced under heat stress conditions, which is similar to its homologous genes in rice, *Arabidopsis*, and *B. distachyon* [18,45,53]. These findings suggest that *PheHsfA2a-2* plays an important role in the response of moso bamboo to heat stress. However, *PheHsfA2a-1*, which is the most similar to *PheHsfA2a-2* and also belongs to the A2a-type PheHsfs, was only slightly upregulated 1 h after heat stress application. *PheHsfA4d-2*, which is the most similar to *PheHsfA4d-1*, was less upregulated by heat and salt stress than *PheHsfA4d-1*, but showed higher relative expression levels than *PheHsfA4d-1* under cold and drought stress.

HsfA4a of wheat (*Triticum aestivum*) and rice conferred cadmium tolerance in yeast and plants, but other Hsfs with similar structure (OsHsfA4d, AtHsfA4a, and AtHsfA4c) did not [27]. HsfA4a of *Arabidopsis* and chrysanthemum (chrysanthemum cultivar 'Jinba') confers salt and oxidative stress tolerance [26,65]. In this study, *PheHsfA4a-1*, *PheHsfA4a-2*, *PheHsfA4d-1*, and *PheHsfA4d-2* were upregulated in response to four abiotic stresses (heat, cold, drought, and salt). These findings indicate that these four *PheHsfA4s* could act as potential "nodes" connecting the above four abiotic stresses.

#### *4.5. Expression Correlation and Interaction Networks*

In addition, the expression values of PheHsf genes in shoot and flower developmental stages were used to identify potential underlying co-expression networks using Cytoscape. Based on their degree of connectivity, *PheHsfA4a-1* and *PheHsfA4a-2* were identified as hub genes that play a regulatory role and are correlated with other PheHsfs in the complex feedback network. According to the predicted interaction networks of five PheHsf proteins (four PheHsfA4s and PheHsfA2a-2), the interacting proteins might include MAPKs and heat shock proteins. The MPK5 protein in rice acts as a positive regulator of drought, salt, and cold tolerance; is involved in disease resistance and abiotic stress tolerance signaling pathways; and also negatively modulates pathogenesis-related (PR) gene expression and broad-spectrum disease resistance [66]. The MPK1 protein in rice acts downstream of the heterotrimeric G protein alpha subunit and the small GTPase RAC1 and may regulate the expression of various genes involved in biotic and abiotic stress responses [67]. Based on the above information and our results, the four PheHsfA4s and PheHsfA2a-2 may play a very important role in shoot and flower development and in the stress response.

#### **5. Conclusions**

In this study, 22 PheHsf genes in moso bamboo were identified for the first time. These genes could be classified into three classes (A, B, and C) according to the comparison of the phylogenetic relationships with *O. sativa*, *B. distachyon*, and *A. thaliana* Hsf genes. Expression analyses revealed that two hub genes, *PheHsfA4a-1* and *PheHsfA4a-2*, might act as potential "nodes" for crosstalk between developmental processes and abiotic stress responses. Furthermore, *PheHsfA2a-2*, *PheHsfA4d-1*, and *PheHsfA4d-2* might also play essential roles in the stress response. These results provide insights into the responses of *PheHsfAs* to abiotic stress treatments, although the underlying molecular mechanism requires further study.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/10/2/100/s1, Figure S1: Phylogenetic tree and gene structure of PheHsf genes, Figure S2: Multiple sequence alignment of the DBD domains and OD region of the PheHsf proteins, Figure S3: Various cis-acting elements in PheHsf gene promoter regions, Table S1: Primer sequences used in gene expression with qRT-PCR experiments, Table S2: The relative level of gene expression of PheHsf genes in developing flowers (F1 to F4) and leaves (CK) (RPKM), Table S3: The relative level of gene expression of PheHsf genes in growing shoots (S1–S7) and culm (CK) (RPKM), Table S4: Expression analysis of *PheHsfAs* under different abiotic stresses in moso bamboo, Fasta S1: PheHsfs and PheHsfs-like CDS sequence, Fasta S2: PheHsfs and PheHsfs-like protein sequence, Fasta S3: The cloned cDNA sequence of PheHsfA5, PheHsfB4a-1 and PheHsfB4c-1.

**Author Contributions:** X.L. and J.G. designed the experiments; X.L. and J.L. (Juan Li) performed the tissue and organ collection; L.X. and X.L. performed the experiments; Z.C. formal analysis; J.L. (Jun Liu) and S.M. assisted in equipment maintenance and sample collection; L.X. writing—original draft preparation; X.L. and D.H. writing—review and editing; J.G. review and funding acquisition. L.X. and X.L. contributed equally in this work and should be considered first co-authors.

**Funding:** This project was supported by National Keypoint Research and Invention Program in 13th Five-Year (Grant No. 2018YFD0600100).

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **DNA Barcoding Analysis and Phylogenetic Relation of Mangroves in Guangdong Province, China**

**Feng Wu 1,2, Mei Li 1, Baowen Liao 1,\*, Xin Shi 1 and Yong Xu 3,4**


Received: 9 December 2018; Accepted: 4 January 2019; Published: 12 January 2019

**Abstract:** Mangroves are distributed in the transition zone between sea and land, mostly in tropical and subtropical areas. They provide important ecosystem services and are therefore economically valuable. DNA barcoding is a useful tool for species identification and phylogenetic reconstruction. To evaluate the effectiveness of DNA barcoding in identifying mangrove species, we sampled 135 individuals representing 23 species, 22 genera, and 17 families from Zhanjiang, Shenzhen, Huizhou, and Shantou in the Guangdong province, China. We tested the universality of four DNA barcodes, namely *rbcL*, *matK*, *trnH-psbA*, and the internal transcribed spacer of nuclear ribosomal DNA (ITS), and examined their efficacy for species identification and the phylogenetic reconstruction of mangroves. The success rates for PCR amplification of *rbcL*, *matK*, *trnH-psbA*, and ITS were 100%, 80.29% ± 8.48%, 99.38% ± 1.25%, and 97.18% ± 3.25%, respectively, and the rates of DNA sequencing were 100%, 75.04% ± 6.26%, 94.57% ± 5.06%, and 83.35% ± 4.05%, respectively. These results suggest that both *rbcL* and *trnH-psbA* are universal in mangrove species from the Guangdong province. The highest success rate for species identification was 84.48% ± 12.09% with *trnH-psbA*, followed by *rbcL* (82.16% ± 9.68%), ITS (66.48% ± 5.97%), and *matK* (65.09% ± 6.00%), which increased to 91.25% ± 9.78% with the addition of *rbcL*. Additionally, the identification rate of mangroves was not significantly different between *rbcL* + *trnH-psbA* and other random fragment combinations. In conclusion, *rbcL* and *trnH-psbA* were the most suitable DNA barcode fragments for species identification in mangrove plants. When the phylogenetic relationships were constructed with random fragment combinations, the optimal evolutionary tree with high supporting values (86.33% ± 4.16%) was established using the combination of *matK* + *rbcL* + *trnH-psbA* + ITS in mangroves. In total, the 476 newly acquired sequences in this study lay the foundation for a DNA barcode database of mangroves.

**Keywords:** mangroves; DNA barcoding; species identification; phylogenetic relation

#### **1. Introduction**

In 2003, Hebert [1] proposed a novel DNA barcoding technology to expedite the process of species identification. Around 2005, the concept of DNA barcoding was introduced into botanical research [2,3]. In 2009, the CBOL (Consortium for the Barcode of Life) Plant Working Group initially identified and recommended the use of the chloroplast-derived DNA barcode fragments *rbcL* and *matK* [4]. In addition, *trnH-psbA*, an intergenic spacer region of the chloroplast genome, and the ITS region of the nuclear genome were also investigated [5–7]. Further research is required to compare DNA barcode fragments and test their efficacy for species identification [8–11]. Previous studies on the DNA barcoding of plants have mainly focused on tropical and subtropical forests [12–15]. In addition, sequences obtained from DNA barcode fragments can also be used to reconstruct the phylogenetic relationships of specific biological groups, and this has become a new research hotspot in recent years [16–18]. This research promotes the integration of phylogenetic analysis, ecology, and barcoding technology and develops our understanding of evolutionary biology and other related disciplines [19–22].

Mangroves, in the broad sense, are woody plant communities growing in tropical and subtropical intertidal zones, which play an important role in maintaining the ecological balance of coastal zones [23,24]. There are 84 species of 24 genera and 16 families of mangrove plants in the world, including 70 species of 16 genera and 11 families of true mangrove plants and 14 species of eight genera and five families of semi-mangrove plants [25]. There are 25 species of mangrove plants in Guangdong, including 16 true mangroves and 9 semi-mangroves [26]. Many mangrove species are similar to one another, and their distribution areas overlap. It is difficult to distinguish similar mangrove species by external morphology, for example *Sonneratia alba* J. Smith, *Sonneratia caseolaris* (L.) Engl., *Sonneratia hainanensis* Ko, E. Y. Chen et W. Y. Chen, and *Sonneratia paracaseolaris* W. C. Ko et al., or *Bruguiera sexangula* (Lour.) Poir and *Bruguiera sexangula* (Lour.) Poir. var. rhynchopetala Ko. Moreover, it is hard to understand the evolutionary relationships between mangrove species with traditional classification methods. DNA barcoding in animals [27,28], insects [29,30], tropical and subtropical plants [19,31], and microorganisms [32,33] has achieved reliable reconstructions of evolutionary relationships, successfully identified species of the same genus, and discovered new species or cryptic species. In this study, we aimed to investigate the universality of DNA barcoding in the mangrove flora, which is in the transition zone between land and water, and to construct phylogenetic trees of the mangrove flora, to provide a scientific basis for the conservation of mangrove biodiversity.

#### **2. Materials and Methods**

#### *2.1. Plant Material*

In this study, the main distribution areas of mangroves in Guangdong were selected, namely the mangrove protection areas in Shenzhen, Huizhou, and Shantou in Eastern Guangdong and Zhanjiang in Western Guangdong. Specific sampling location information is shown in Table 1. According to the DNA barcode sample collection specifications of Gao [34], two to three individuals of each mangrove species were sampled. Fresh leaves and buds were taken to facilitate DNA extraction. Sampled individuals were more than 20 m apart. The material was dried with silica gel after collection. A total of 135 individuals of mangrove plants were collected. Based on expert (LIAO and LI) identification, the individuals were classified into 23 species (including two mangrove companion species) of 22 genera belonging to 17 families. The true mangrove and semi-mangrove species sampled in this study accounted for 84% of the 25 species of mangrove plants in Guangdong province [26]. Among the remaining species, *B. sexangula* (Lour.) Poir and *B. sexangula* (Lour.) Poir. var. rhynchopetala Ko, which were introduced by humans, are almost extinct, so they were not sampled.



#### *2.2. DNA Extraction and Sequence Analysis*

The DNA of mangrove plants was extracted from silica gel-dried leaf material following a modified version of the cetyltrimethyl ammonium bromide (CTAB) protocol of Doyle and Doyle [35]. According to the recommendation of the Consortium for the Barcode of Life [4] and previous studies [13,36,37] on regional plant DNA barcoding, four markers were selected as amplification fragments: chloroplast *rbcL*, *matK*, and *trnH-psbA*, and the nuclear ITS. The PCR system recommended by the CBOL Plant Working Group was used as a reference and was optimized and adjusted. Primer information and amplification procedures are provided in Table 2. After detection by gel electrophoresis, all amplification products were sent to Guangzhou for sequencing; BLAST (Basic Local Alignment Search Tool) searches against GenBank were performed for the sequences obtained after bidirectional sequencing of the four fragments. If significant inconsistencies were found between a sequence and the original species, the cause was investigated and the sample was re-extracted or reconfirmed by consulting experts, until the BLAST result and the original species belonged to the same genus or family. SeqMan 5.00 (DNASTAR package, Madison, WI, USA) was used to splice and proofread the obtained sequences. The sequences were aligned in Geneious 11.1.3 (Biomatters Ltd, Auckland, New Zealand) using the MAFFT (Multiple Alignment using Fast Fourier Transform) algorithm with the default parameters.


**Table 2.** The primers used to amplify DNA barcodes and the amplification protocol.

ITS: The internal transcribed spacer of nuclear ribosomal DNA.

#### *2.3. Statistical Analysis*

PCR amplification success rates and sequencing success rates were calculated following Kress [19]. The success rate of PCR amplification for a fragment is the percentage of individuals that were successfully amplified among all individuals tested for that fragment. BLAST was used to evaluate the efficacy of species identification. First, a local database was established for the four DNA fragments in Geneious 11.1.3 [41], and all sequence alignments were saved as \*.fasta files, with the sequence direction adjusted and gaps removed. BLAST 2.7.1+ (National Center for Biotechnology Information, USA; https://www.ncbi.nlm.nih.gov/) was used to compare each sequence with all sequences in the database, and the percentage of identical sites was used as the quantification standard. If the minimum percentage of identical sites among individuals of the same species was greater than the values between that species and individuals of all other species, the species was considered accurately identified. The success rate of identification was determined by multiplying the percentage of species successfully identified by the sequencing success rate of the fragment. Identification with combined fragments accumulates the results of the single fragments [13,41].
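The identification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the dictionary layout of pairwise identities, and the toy values are hypothetical, and a species represented by a single sequence is assumed to have a within-species identity of 100%.

```python
from itertools import combinations


def identified_species(labels, identity):
    """Return the species whose lowest within-species identity exceeds
    their highest identity to any sequence of another species.

    labels   -- dict mapping sequence id to species name
    identity -- dict mapping frozenset({id_a, id_b}) to % identical sites
    """
    within = {}   # species -> lowest identity among its own sequences
    between = {}  # species -> highest identity to any other species
    for a, b in combinations(labels, 2):
        pct = identity[frozenset((a, b))]
        sp_a, sp_b = labels[a], labels[b]
        if sp_a == sp_b:
            within[sp_a] = min(within.get(sp_a, 100.0), pct)
        else:
            for sp in (sp_a, sp_b):
                between[sp] = max(between.get(sp, 0.0), pct)
    # Assumption: a species with one sequence has within-identity 100.
    return {sp for sp in set(labels.values())
            if within.get(sp, 100.0) > between.get(sp, 0.0)}


def identification_success(labels, identity, sequencing_rate):
    """Percent of species identified, scaled by the sequencing success
    rate of the fragment (given as a fraction between 0 and 1)."""
    species = set(labels.values())
    share = len(identified_species(labels, identity)) / len(species)
    return 100.0 * share * sequencing_rate


# Hypothetical toy data: four sequences from three species.
labels = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
identity = {
    frozenset(("s1", "s2")): 99.5,
    frozenset(("s1", "s3")): 97.0,
    frozenset(("s2", "s3")): 96.5,
    frozenset(("s1", "s4")): 90.0,
    frozenset(("s2", "s4")): 90.0,
    frozenset(("s3", "s4")): 100.0,  # B and C cannot be told apart
}
print(sorted(identified_species(labels, identity)))
print(identification_success(labels, identity, sequencing_rate=0.9))
```

In the toy data only species A passes the rule, because the sequences of B and C are identical, so one of three species is identified before scaling by the sequencing rate.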

#### **3. Results**

#### *3.1. Sequence Universality*

Sequence statistics were calculated for 135 individuals of 23 species of mangrove plants. The results (Table 3) showed that a total of 476 DNA barcode fragments were obtained, with a sequence acquisition rate of 88.15% (476 divided by 540). Among them, a total of 118 sequences of mangrove plants from the Zhanjiang mangrove reserve were obtained. The highest success rates for PCR amplification were with the *rbcL* and *trnH-psbA* fragments, followed by ITS and *matK*. The highest sequencing success rate was 100% with the *rbcL* fragment, followed by *trnH-psbA*, ITS, and *matK*.

**Table 3.** The success rates of PCR amplification and sequencing of the four barcoding fragments in the four mangrove forests.


A total of 131 sequences of mangrove plants from the mangrove reserve of Shenzhen were obtained. The highest amplification success rates were with both *rbcL* and *trnH-psbA*, followed by ITS and *matK*. The highest sequencing success rate was 100% with *rbcL*, followed by *trnH-psbA*, ITS, and *matK*. A total of 141 sequences of mangrove plants from the Huizhou mangrove reserve were obtained. The highest amplification success rate was with *rbcL*, followed by ITS, *trnH-psbA*, and *matK*. The highest sequencing success rate was 100% with *rbcL*, followed by *trnH-psbA*, ITS, and *matK*. A total of 86 sequences of mangrove plants from the Shantou mangrove reserve were obtained. The amplification success rates were 100% with *rbcL*, *trnH-psbA*, and ITS, and 91.67% with *matK*. The highest sequencing success rate was 100% for *rbcL*, followed by *trnH-psbA*, ITS, and *matK*.
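The sequence totals in this section can be cross-checked with simple arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative, and the denominator assumes that each of the 135 individuals was attempted for all four fragments, which is consistent with the stated 540.

```python
# Sequences obtained per reserve, as reported in Section 3.1.
sequences_by_reserve = {
    "Zhanjiang": 118,
    "Shenzhen": 131,
    "Huizhou": 141,
    "Shantou": 86,
}
individuals = 135
fragments = 4  # rbcL, matK, trnH-psbA, ITS

total_obtained = sum(sequences_by_reserve.values())
total_possible = individuals * fragments  # one attempt per individual and fragment
acquisition_rate = 100 * total_obtained / total_possible

print(total_obtained, total_possible, f"{acquisition_rate:.2f}%")
# prints: 476 540 88.15%
```

The four regional counts sum to 476, which matches the numerator of the reported acquisition rate and the number of newly acquired sequences given in the abstract.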

#### *3.2. Species Delimitation*

In Guangdong, each genus of true mangroves and semi-mangroves is represented by a single species, except *Sonneratia* and *Bruguiera*. Therefore, only the success rates of species-level identification are discussed here. BLAST results showed that the highest success rate of species identification was 84.48% ± 12.09% with *trnH-psbA*, followed by 82.16% ± 9.68% with *rbcL*, 66.48% ± 5.97% with ITS, and 65.09% ± 6.00% with *matK* (Figure 1). However, more species were successfully identified with *rbcL* fragments than with *trnH-psbA* fragments in every region except Shantou. The main difference was that *trnH-psbA* could accurately distinguish *S. caseolaris* (L.) Engl. and *Sonneratia apetala* Buch.-Ham, but *rbcL* could not.

According to the statistical analysis of different combinations of multiple fragments, the identification success rate of *rbcL* + *trnH-psbA* was 91.25% ± 9.78%, which increased to 91.88% ± 8.62% with the addition of ITS. The success rate of species identification with all fragments was consistent with that of *rbcL* + *trnH-psbA* + *matK* (94.38% ± 4.48%). These results suggest that the combination of certain fragments can increase the success rate of barcoding for species identification.

**Figure 1.** Mangrove species discrimination rate of all single and multi-DNA barcoding fragments. R, *rbcL*; M, *matK*; T, *trnH-psbA*; I, ITS; All, the database consisted of all species.

#### *3.3. Phylogenetic Trees*

Phylogenetic trees were constructed in MEGA6.0 (Tamura, Stecher, Peterson, Filipski, and Kumar) using the neighbor-joining (NJ) method based on the Kimura 2-parameter model. Trees were constructed with individual fragments or random combinations of fragments, and the average node support rate was calculated. The highest average node support rate, 86.33% ± 4.16%, was obtained with the combination *rbcL* + *matK* + *trnH-psbA* + ITS (Figures 2–5). The phylogenetic trees were fan-shaped, with individuals of the same or similar species grouped on one branch. The average node support rate for mangrove phylogenetic trees in the four regions was 89.66% ± 18.50% in Zhanjiang, 88.49% ± 17.25% in Huizhou, 86.85% ± 15.60% in Shenzhen, and 80.33% ± 19.89% in Shantou.
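For readers unfamiliar with the Kimura 2-parameter model, the pairwise distance on which an NJ tree is built can be sketched as follows. This is a minimal illustration of the standard formula, d = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transitions and transversions; it is not a reproduction of MEGA's implementation. The function name is hypothetical, and skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    # math.log raises ValueError for saturated pairs (1 - 2p - q <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites, then one transversion in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
print(k2p_distance("AAAAAAAAAA", "CAAAAAAAAA"))
```

A matrix of such distances over all sequence pairs is the input that the NJ algorithm clusters into a tree.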

**Figure 2.** The phylogenetic tree of mangroves in Zhanjiang using the combined fragments *matK* + *rbcL* + *trnH-psbA* + ITS.

**Figure 3.** The phylogenetic tree of mangroves in Shenzhen using the combined fragments *matK* + *rbcL* + *trnH-psbA* + ITS.

**Figure 4.** The phylogenetic tree of mangroves in Huizhou using the combined fragments *matK* + *rbcL* + *trnH-psbA* + ITS.

**Figure 5.** The phylogenetic tree of mangroves in Shantou using the combined fragments *matK* + *rbcL* + *trnH-psbA* + ITS.
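The overall node support reported in Section 3.3 (86.33% ± 4.16%) appears to be consistent with the mean and sample standard deviation of the four regional averages. This is an inference from the published numbers rather than something the authors state, and the short check below only illustrates the arithmetic; the variable names are illustrative.

```python
from statistics import mean, stdev

# Regional average node support (%), as reported in Section 3.3.
node_support = {
    "Zhanjiang": 89.66,
    "Huizhou": 88.49,
    "Shenzhen": 86.85,
    "Shantou": 80.33,
}

values = list(node_support.values())
overall_mean = mean(values)
overall_sd = stdev(values)  # sample standard deviation (n - 1 denominator)

print(f"{overall_mean:.2f}% ± {overall_sd:.2f}%")
# prints: 86.33% ± 4.16%
```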

#### **4. Discussion**

Our study investigated mangrove plants in the Guangdong province using DNA barcoding technology. The purpose of this study was to evaluate the performance of DNA barcoding in terms of primer universality, successful identification rate, and phylogenetic tree construction.

#### *4.1. The Universality of DNA Barcoding in Mangrove Communities*

The success rates of PCR amplification and sequencing of the core barcode *rbcL* in mangrove DNA samples reached 100%. Compared with Kress [19], our results showed higher universality and success rates, and were the same as those of Pei [42], with 90%–100% in forest plant communities in tropical and subtropical regions. This indicates that *rbcL* has the fewest variable sites and that the selected primer sequence has strong universality. Therefore, *rbcL* is recommended as an effective fragment for DNA barcoding in mangroves. The success rate of another core fragment, *matK*, was the lowest of the four fragments. With this fragment, the amplification or sequencing of *Ceriops tagal* (Perr.) C. B. Rob. and *Kandelia candel* (L.) Druce was not successful, and only a small number of individuals were successfully sequenced in, e.g., *Bruguiera gymnorrhiza* (L.) Poir., *Acrostichum aureum* L., and *Rhizophora stylosa* Griff., indicating a lack of universality for this fragment compared to *rbcL*. This may be due to the existence of single nucleotide repeat sequences. The amplification rate of *matK* fragments in previous studies was also low, e.g., 68.18% [43], as was the success rate, e.g., 64% [14]. Similarly low results were reported in other studies by Liu [13], Lu [44], and Wei [45]. In such cases, additional primers are generally used and the procedure is repeated multiple times.

The barcode *trnH-psbA* has high universality: with only one pair of primers, its amplification and sequencing success rates reached 99.38% ± 1.25% and 94.57% ± 5.06%, respectively, second only to the *rbcL* fragment. This fragment can also successfully distinguish *Sonneratia* species. The amplification success rate of the nuclear ITS region was 97.18% ± 3.25%; the amplifications of *A. aureum*, *B. gymnorrhiza*, *C. tagal*, and *R. stylosa* were not successful. The sequencing success rate of ITS was 83.35% ± 4.05%, and only a few individuals were successfully sequenced in *K. candel* and *Acanthus ilicifolius* L. Previous studies, such as that of Tripathi [46], showed that the sequencing success rate of ITS sequences in tropical forest species in India was 62.0%. Kang [47] showed that the success rate of ITS sequencing in Hainan tropical cloud forests was 47.20% ± 5.76%. In conclusion, the *rbcL* and *trnH-psbA* fragments are recommended for the amplification and sequencing of mangrove plants.

#### *4.2. Species Identification Ability*

The CBOL Plant Working Group recommends *rbcL* and *matK* as the core DNA barcode for plants. Burgess [41] found that the core barcode can successfully identify 93% of species in the temperate flora of Canada, and de Vere [48] found that *rbcL* + *matK* can identify 69.4%–74.1% of flowering plants in Wales, UK. Kress [19] studied 296 woody species in Panama and found that the species identification rate of *matK* + *rbcL* was as high as 98%. In this study, the species identification rate of the core barcode combination *rbcL* + *matK* reached 90.56% ± 6.94%. This indicates that the core barcode is effective for species identification and is consistent with previous research results. However, the identification rate of mangroves decreased when only one of the two fragments was used. Either *rbcL* or *trnH-psbA* alone had a higher identification rate of mangroves than *matK*. Additionally, the rate for *rbcL* + *trnH-psbA* was higher than that for *rbcL* + *matK*. Furthermore, the identification rate of mangroves did not differ significantly between *rbcL* + *trnH-psbA* and other random fragment combinations.

The length of *trnH-psbA* varies greatly among plant groups, and a large number of insertions and deletions, together with single-nucleotide repeats and other special structures in some groups, can make sequencing difficult. Nevertheless, in this study, the sequencing success rate of *trnH-psbA* was second only to that of *rbcL*. At the same time, this fragment was able to successfully identify *S. caseolaris* and *S. apetala*. The species identification rate of *trnH-psbA* was the highest among the four single fragments; Gonzalez [12] and Tripathi [46] also showed that *trnH-psbA* is one of the most promising barcodes for species identification. This suggests that *trnH-psbA* may act as a complementary fragment to *rbcL*.

Although studies by Kress [19], Sass [49], and Li [7] support the incorporation of ITS into the core barcode of plant DNA, the supplementary ITS fragment was found to be of low universality in the present study, and its species identification rate, when used alone, was 66.48% ± 5.97%. However, the ITS fragment could identify *Laguncularia racemosa* C.F.Gaertn, whereas *rbcL* and *trnH-psbA* could not. In addition, the ITS region of the nuclear genome can provide more genetic information from the parents than the chloroplast genes. Given the above, *rbcL* and *trnH-psbA* were effective fragments for mangrove identification, and the ITS fragment could be used for specific mangroves.

#### *4.3. Phylogenetic Trees*

*rbcL* + *matK* + *trnH-psbA* fragments have been used to construct phylogenetic trees in different localities, e.g., Barro Colorado Island [19], the Dinghu mountain forest [50], the Ailao mountain forest [44], and tropical cloud forests in Hainan [47]. In this study, the average node support for phylogenetic trees of mangrove species in the four subtropical regions constructed with *rbcL* + *matK* + *trnH-psbA* + ITS was 5.59% higher than that with *rbcL* + *matK* + *trnH-psbA*, although the difference was not significant. In addition, in Figures 2–5, most individuals of the same species were clustered into one branch, also indicating that DNA barcoding can be used to identify species. Because the mangroves are in different geographical locations, their environmental conditions also differ (Table 1), which leads to differences in selective pressure. This could lead to differences in the evolutionary trajectory of mangrove species in the four regions. For example, community composition was observed to differ among locations, whereby the number of true mangrove species gradually decreased from south to north.

#### **5. Conclusions**

The results of the present study suggest that *rbcL* and *trnH-psbA* have high success rates for amplification and sequencing, indicating that these two barcodes are universal in mangrove species. In terms of species identification, these fragments were relatively successful compared to the other fragments tested. The phylogenetic trees of mangrove plants constructed with a combination of *rbcL* + *matK* + *trnH-psbA* + ITS had the highest node support rate. The *matK* fragment showed low efficiency for amplification and identification in mangrove plants. The identification success rate of *rbcL* was higher than that of *trnH-psbA* except in the Shantou region, and *trnH-psbA* can be used to identify specific species that cannot be identified by *rbcL*. Thus, this study concluded that *rbcL* and *trnH-psbA* were the most suitable DNA barcode fragments for species identification in mangrove plants.

Data collection for mangrove DNA barcoding in the Guangdong province is ongoing. A total of 476 sequences were obtained, from 135 individuals of 23 species of mangrove plants, accounting for 55.26% of the mangrove plants in China (21/38). In future research, the sampling range can be further expanded to include the DNA barcoding of other mangrove tree species and mangrove companion plants, to build a complete and high-coverage mangrove plant DNA barcoding database.

**Author Contributions:** Conceptualization, B.L. and M.L.; methodology, F.W.; software, F.W., Y.X.; formal analysis, F.W., Y.X.; investigation, F.W., B.L., M.L., X.S.; supervision, B.L., M.L.; writing, F.W.

**Funding:** This research was funded by the Ministry of Science and Technology of China (No. 2017FY100705 and No. 2017FY100700), and the National Natural Science Foundation of China (No. 31570594).

**Acknowledgments:** The authors would like to thank Jiang, Z.M. and Xu, Y.W. for collecting material, and additional help. Our sincere thanks are extended to Yan, H.F. for providing a scientific research platform.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Diversity and Utilization of Edible Plants and Macro-Fungi in Subtropical Guangdong Province, Southern China**

**Juyang Liao 1,2, Linping Zhang 3, Yan Liu 2, Qiaoyun Li 2, Danxia Chen 2, Qiang Zhang <sup>4</sup> and Jianrong Li 5,\***


Received: 17 September 2018; Accepted: 22 October 2018; Published: 25 October 2018

**Abstract:** Food supply from forests is a fundamental component of forest ecosystem services, but information relating to suitability for human consumption and sustainable utilization of non-timber forest products (NTFPs) in developing countries is lacking. To address this gap in knowledge, diverse datasets of edible plants and macro-fungi were obtained from field collections, historical publications, and community surveys across seven cities in Guangdong Province (GP), southern China. Seven edible parts and five food categories of plant species were classified according to usage and specific nutrient components. Edible plant species were also categorized into different seasons and life forms. Our results show that at least 100 plant species (with 64 plant species producing fruit) and 20 macro-fungi were commonly used as edible forest products in subtropical GP. There were 55 and 57 species providing edible parts in summer and autumn, respectively. Many edible plants had multiple uses. Tree and herbaceous species made up the majority of the total. Our study provides evidence that both edible plants and macro-fungi can enrich the food supply for residents in rural and urban areas by acting as supplemental resources. We therefore suggest that, in spite of the prevalence of imported foods due to modern infrastructure, edible NTFPs from subtropical forests might be leveraged to support the increasing demand for food in an era of rapid urbanization and global change.

**Keywords:** edible forest product; forest biology; macro-fungi; non-timber forest products (NTFPs); Pan-Pearl River Delta

#### **1. Introduction**

Forests and trees are principal components of terrestrial ecosystems, providing the earth with a vast array of socio-ecological benefits [1–3]. Among these benefits, forest biodiversity is a crucial dimension, important for valuing ecosystem services, and has been attracting growing attention from municipal authorities, research agencies, and the public. The diversity of edible plants and macro-fungi is a special category of forest biodiversity. From a global perspective, food security is a major concern and will remain a great challenge for the rest of the 21st century, since crop yields have fallen in many countries and regions due to insufficient investments, irregular climate, and intense disturbance from agroforestry [4–7]. Forests, including natural and urban forests, are highly valuable to world food security, annually providing livelihoods and food for over 300 million people [8–11]. Moreover, urban green spaces contribute to and preserve a considerable proportion of terrestrial biodiversity [12]. Investigation and evaluation of edible species diversity from natural forests and urban green spaces can increase our understanding of the food supply required to meet the demands of a growing population. Furthermore, efforts to describe edible species diversity are of theoretical significance for understanding diversity patterns of local food species, in addition to increasing citizens' awareness of biodiversity protection. Currently, edible fungi, wheat, rice, corn, sorghum, cassava, and potato are the top seven major food crops in the world [9,13]. Edible forest products from diverse ecosystems are important supplements for the global food supply. As the largest developing country in the world and a major economic entity, China has the largest population in Asia (more than 1.41 billion as of 2016) and has diverse forest types spanning tropical, subtropical, and temperate zones. Therefore, enriching the food supply and achieving food security are important tasks for the international community, regional governments, and local authorities.

Food diversity from forests (i.e., edible plants and macro-fungi) is described in order to reveal a vital component of forest biodiversity in subtropical Guangdong Province (GP), southern China. The objectives of the present study were threefold: (1) to obtain a comprehensive species list of edible plants and macro-fungi commonly used in the region; (2) to explore general patterns of edible plants among different edible parts, categories, seasons, and life forms; and (3) to evaluate the utilization and conservation of edible forest products. Overall, exploring the diversity of edible forest products adds to our understanding of the socio-ecological benefits of forests. These efforts assess the suitability of non-timber forest products (NTFPs) for human consumption, as well as the necessity and feasibility of green infrastructure construction in urbanizing regions.

#### **2. Materials and Methods**

#### *2.1. Research Area*

Guangdong Province (20°09′–25°31′ N, 109°45′–117°20′ E) is located in southern China (Figure 1), geographically adjacent to Hong Kong SAR, Macao SAR, and 4 other provinces. GP consists of 21 prefecture-level cities and covers an area of 179,700 km², with urbanized areas in the southeast and relatively rural areas in the northwest. The typical vegetation is subtropical evergreen broadleaved forest, with natural habitats mainly distributed in the northwest, while there is a large proportion of urban forests and green spaces in the southeast. GP has a massive pool of forest resources, estimated to exceed 7400 vascular plant species and 1100 known macro-fungi [14,15]. Several major socioeconomic indicators were retrieved from the Guangdong Statistical Yearbook (2003–2016) (http://www.gdstats.gov.cn/) (Table 1). As a highly urbanized region and major municipality in southern China, GP is representative of China's modernization process and forest biodiversity patterns. According to the official statistics, as of 2016 GP hosted ~110 million residents and thus requires a large amount of imported food.

**Figure 1.** Location of Guangdong Province, southern China (highlighted in black).

#### *2.2. Edible Species Diversity and Taxonomic Information*

Records of edible species diversity (plants and fungi) and their illustrations were obtained from field observations and historical publications focusing on provincial forests (mostly natural reserves and forest parks), accompanied by community surveys at fruit stores and supermarkets. Sampling sites were located in 4 cities (Foshan, Guangzhou, Shenzhen, and Zhongshan) in highly urbanized areas of GP and in another 3 cities (Shaoguan, Zhanjiang, and Zhaoqing) in relatively rural areas. Taxonomic information on all plant species (Supplementary Materials) follows the English revision of Flora of China (http://foc.eflora.cn/). Specifically, species name, family name, lifestyle, life form, and harvest time were collected. The names of fungi and their taxonomy follow MycoBank (http://www.mycobank.org/).

Each edible plant species was classified into 1 or more of the following 7 edible-part categories: root (tuber), stalk (bark and/or shoot), leaf, flower, fruit, seed, and whole plant. For fungi, the whole sporocarp was considered the edible part; sporocarps are usually used as vegetables and medicinal supplements. Four seasons of food harvest were classified according to the climatic conditions in Guangdong Province as follows: March to May, spring; June to August, summer; September to November, autumn; and December to February of the following year, winter. Five food categories were included in this study according to their specific nutrient components: cereals and saccharides (providing carbohydrate and starch), fruits (including all fruit types, providing vitamins and protein), vegetables (providing vitamins), oil plants (including oil crops and products, providing vegetable fat), and medicinal and spice plants (providing primary healthcare and flavoring agents, respectively). Field surveys and sampling were conducted according to the "Observation Methodology for Long-term Forest Ecosystem Research" of the National Standards of the People's Republic of China (GB/T 33027-2016). All figures were produced using Adobe Photoshop CC 2015 (Adobe Systems Inc, San Jose, CA, USA). Statistical differences among the 6 edible parts, 4 seasons, 5 categories, and 4 life forms were assessed with one-sample tests in SPSS (Standard version 13.0; SPSS, Chicago, IL, USA).
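The seasonal scheme described above can be expressed as a small lookup. The following is a minimal sketch, not the authors' procedure: the function names `season_of` and `harvest_seasons` are hypothetical and introduced only for illustration, and months are assumed to be given as integers from 1 to 12.

```python
# Month-to-season mapping as defined in the text for Guangdong Province.
# Function names are hypothetical; they do not come from the original study.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

SEASON_ORDER = ["spring", "summer", "autumn", "winter"]


def season_of(month: int) -> str:
    """Return the harvest season for a calendar month (1-12)."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be in 1..12, got {month}")
    return SEASON_BY_MONTH[month]


def harvest_seasons(months: list[int]) -> list[str]:
    """Return the distinct seasons covered by a list of harvest months."""
    covered = {season_of(m) for m in months}
    return [s for s in SEASON_ORDER if s in covered]
```

A species harvested from August to October would, under this mapping, count toward both summer and autumn, which is how a single species can span multiple seasons in the tallies.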


**Table 1.** General statistics of edible plants in Guangdong Province (2003–2016).

*Forests* **2018** , *9*, 666

#### **3. Results**

#### *3.1. Major Crops in Guangdong Province*

In recent years, social and economic growth has increased in Guangdong Province (Table 1). Provincial gross domestic product (GDP) and primary industry increased by 490% and 250% from 2003 to 2016, respectively. Primary industry accounted for a relatively low and stable proportion of GDP in GP, ranging from 4.59% in 2015 to 7.82% in 2003. The permanent resident population increased by 38% (30.45 million), whereas the proportion of urban population increased by 6%, to nearly 70%, from 2006 to 2016. Cereals decreased by 8% from 2003 to 2016, although yield remained relatively stable. Saccharides, oils, and vegetables all increased by one-third. Fruits increased by 120%; interestingly, the famous local lychee fruit tree (*Litchi chinensis* Sonn.) contributed slightly over 10% of the total fruit yield.

#### *3.2. Patterns of Species Diversity and Edible Parts*

Species diversity of edible plants was found to be relatively high in the study region. As many as 100 plant species (including var. and cv.) belonging to 88 genera in 51 families were identified (Supplementary Materials). Interestingly, 37% of edible plants were from several common families: nine species from Rosaceae; six species from Fabaceae, Myrtaceae, and Rutaceae; and five species from Poaceae. Species producing fruit parts contributed >60% of the total species, and some plants had multiple edible parts. Specifically, 64 species were identified with edible fresh or dried fruits, 11 species with edible roots or tubers, 10 species with edible seeds, 7 species with edible leaves, 6 species with edible flowers or whole plant, and 3 species with edible bark, stalks, or shoots. The difference among the six edible parts was not statistically significant (t = 1.771, df (degrees of freedom) = 5, *p* = 0.137; t = 2.402, *p* = 0.061 when the six species with an edible whole plant were added to each part accordingly) (Figure 2A).

**Figure 2.** Categories of edible plant species commonly found in Guangdong Province, southern China: (**A**) edible parts; (**B**) food categories; (**C**) seasonal patterns; and (**D**) life forms.

#### *3.3. Seasonal Patterns and Food Categories of Edible Forest Products*

Over half of the total plant species provided edible parts in the summer and autumn, with some plants spanning multiple seasons (Supplementary Materials). Specifically, 57 species were available in autumn, 55 species in summer, 20 species in spring, and 18 species in winter. Furthermore, eight species were available across all four seasons. The difference among the four seasons was statistically significant (t = 3.506, df = 3, *p* = 0.039; t = 4.254, *p* = 0.024 when the eight species available during the full year were added to each season accordingly) (Figure 2C). Many edible plants had multiple uses. Specifically, 53 species were used as fruits, 27 species as medicinal or spice plants, 23 species as vegetables, 14 species as cereals or saccharides, and 12 species as oil plants. The difference among the five categories was statistically significant (t = 3.513, df = 4, *p* = 0.025) (Figure 2B). Tree and herbaceous species made up approximately half and one-third of the total, respectively, while shrub species accounted for only 5% of the total. The difference among the four life forms was not statistically significant (t = 2.579, df = 3, *p* = 0.082) (Figure 2D).
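The reported t statistics can be checked from the species counts given in Sections 3.2 and 3.3. The sketch below assumes that the one-sample test compared the mean count per group against a test value of zero; that test value is an inference from the reported numbers, not something the text states, and the helper name `one_sample_t` is hypothetical. Only the t statistic and degrees of freedom are computed, since *p*-values require a t-distribution that the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(counts, mu0=0.0):
    """Return (t, df) for a one-sample t-test of the mean of counts against mu0.

    mu0 = 0 is an assumption made for illustration; the source does not
    state the test value used in SPSS.
    """
    n = len(counts)
    standard_error = stdev(counts) / sqrt(n)  # sample SD with n - 1 denominator
    return (mean(counts) - mu0) / standard_error, n - 1


# Species counts as listed in the Results.
edible_parts = [64, 11, 10, 7, 6, 3]      # fruit, root, seed, leaf, flower/whole plant, stalk
seasons = [57, 55, 20, 18]                # autumn, summer, spring, winter
categories = [53, 27, 23, 14, 12]         # fruit, medicinal/spice, vegetable, cereal, oil

for label, counts in [("parts", edible_parts),
                      ("seasons", seasons),
                      ("categories", categories)]:
    t, df = one_sample_t(counts)
    print(f"{label}: t = {t:.3f}, df = {df}")
# parts: t = 1.771, df = 5
# seasons: t = 3.506, df = 3
# categories: t = 3.513, df = 4
```

Under this assumption the computed values agree with those reported. Adding the eight year-round species to every season (`[c + 8 for c in seasons]`) shifts the mean without changing the spread and gives t = 4.254, which also matches the text.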

Furthermore, 20 common edible macro-fungi were recorded, belonging to 16 families (Supplementary Materials). Three out of the 20 species were from the family Pleurotaceae. The majority of fungus species were cultivated, accompanied by some wild species found in forests. For instance, the wild species *Boletus speciosus* Frost and *Russula vinosa* Lindblad usually appeared in the summer and autumn. The whole sporocarp was edible, mostly used as a vegetable. Moreover, about two-thirds of the species pool could be used as medicinal supplements.

#### **4. Discussion**

The benefits of forest ecosystems are extremely diverse. The composition of food consumption, from the global perspective, is trending toward better, healthier, and more diverse foods [4,7,16,17]. Edible plants and macro-fungi are important because they provide livelihoods for people [11,18,19]. A growing number of Chinese people pursue a high-quality lifestyle (e.g., realizing the importance of a vegetarian diet) [16,20]. The reported 100 edible plants and 20 macro-fungi were identified based on a preliminary investigation within Guangdong Province, southern China. Particularly, biodiversity conservation and food supply in the region were investigated in order to understand how they might address poverty alleviation and sustainable utilization, which are increasingly important due to global changes and rapid urbanization. Our findings might also be utilized by local authorities and stakeholders adjacent to GP (such as Hunan and Jiangxi Provinces and the Pan-Pearl River Delta).

#### *4.1. Biodiversity Conservation of Forest Resources*

Species diversity is the basis of biodiversity conservation. High species diversity can help counteract homogeneous food resources in the region [6,13,21]. In light of the more than 7400 woody species and 300 macro-fungi already described in GP, additional species are likely to be edible [15,22]. In particular, the presence of endemic species (e.g., fruit species *Dimocarpus longan* Lour., *Litchi chinensis* Sonn., and *Mangifera indica* L.; vegetable species *Hylocereus undatus* Haw., *Lycium chinense* Mill. *var. chinense*, and *Sechium edule* (Jacq.) Swartz; and medicinal species *Ficus hirta* Vahl., *Archidendron clypearia* Jack., and *Plumeria rubra* L.) demonstrates the value of preserving regional biodiversity. Genetic diversity serves as another important component of biodiversity. High genetic diversity is important to maintain phylogenetic community structure [23–26]. Interestingly, numerous studies have reported that hidden/cryptic genetic biodiversity can be revealed through modern taxonomic methods such as DNA barcoding [27,28]. Our results show that there is a moderate proportion of wild plants (14% wild, i.e., the species seldom exists in the form of a cultivar; and another 23% partially wild, i.e., the species has the potential to be domesticated) (Supplementary Materials), which suggests that the genetic diversity of edible plants needs to be improved and that these wild plants may function as a reservoir of biodiversity in the region. In terms of landscape diversity, the third component of biodiversity, the present study provides very little information. However, we agree with previous findings that actions and policies that protect living habitats and landscape components for wild plants, animals, and fungi are vital to regional biodiversity conservation in natural forests and urban green spaces [29,30].

Forests provide humans with multiple benefits, from subsistence to safety nets and cash income. In most cases, however, trade-offs may exist between exploitation and protection of forest resources. People could largely depend on local forest products, but deforestation and overexploitation might prevail in the absence of adequate protection, especially in the face of strong global demand [31,32]. We suggest that one way to protect forest-based livelihoods while avoiding overexploitation would be to adopt alternative solutions that synergize social and ecological benefits [33–35], with goals for sustainable forestry practices and win-win outcomes. Future efforts might include the regions adjacent to GP, such as Hunan and Jiangxi Provinces, which geographically share the Nanling Mountains, a global biodiversity hotspot [36,37].

#### *4.2. Food Supply in Different Seasons*

Urban landscapes (e.g., ground-level and green rooftop gardens) have the potential to produce a large proportion of food crops (e.g., fresh fruits and vegetables) for a dense population, developing a local urban food system and providing eco-environmental benefits [38–40]. Due to the rapid socioeconomic development in China, attributed to the reform and opening-up policy (since 1978) and entry into the World Trade Organization (WTO) (since 2001), the dietary structure of residents has dramatically changed in GP with the availability of a diverse food supply throughout the year. It is now easy to obtain fresh fruits and vegetables via novel technologies like soilless culture, greenhouse cropping, and convenient transportation. However, we should also respect the natural life cycle of plants. Our results show that only a small proportion (~20%) of edible plant species were available in GP in the winter (such as *Chrysanthemum morifolium* cv. Hangju, *Garcinia mangostana* L., *Manihot esculenta* Crantz, *Trachycarpus fortunei* (Hook.) H. Wendl, *Pueraria montana* var. *thomsonii* (Willd.), etc.) or spring (such as *Ananas comosus* L., *Bombax ceiba* L., *Houttuynia cordata* Thunb., etc.), though not for regular everyday consumption; this echoes previous studies calling for sustainable agroforestry and uninterrupted supplies of edible crops [9,19,41,42].

Edible forest products could be treated as an alternative defense against crises that affect food supply. Cereals are the most important food type supporting the current and projected global human population [43]. However, the yield of cereals largely depends on nutrient investments, climate change, and human activities, which might result in an imbalance of food supply at a particular time or place [44]. Our results show that there is tremendous potential for fruit trees, garden vegetables, and edible macro-fungi from nature to supplement the human food supply in the future. These edible forest products, though they are inflexible and sometimes have low returns, may still ensure sustainable food security for communities and generate as much income as cultivating crops in the long term [16,23,45].

#### *4.3. Estimation of Livelihoods from Forests*

Foods from forests and urban green spaces contribute a small proportion of everyday food consumption. In Guangdong Province, *Flammulina velutipes* (Curtis) Singer and *Pleurotus ostreatus* (Jacq.) P. Kumm. were the top two edible fungi, with yields of 1.71 × 10<sup>8</sup> kg and 1.36 × 10<sup>8</sup> kg, respectively, according to a public report released by the Edible Fungus Index (http://www.mushroommarket.net/datas/). Among the species pool of 100 edible plants in this study, at least one-third were wild or partially wild species, which are not yet domesticated and exploited as are other major cultivars (e.g., lychee). Compared to the high regional species diversity, the proportions of edible plant diversity (~1.35%; 100/7400) and known macro-fungi (~1.82%; 20/1100) were relatively low [46], which indicates that wild plant and fungus resources were not overexploited and in situ conservation generally works well in GP. Meanwhile, many more wild species are likely to be edible and are expected to enrich the species diversity of forest food in this region. However, it should be noted that many wild plants and fungi are poisonous (sometimes fatal) if eaten by mistake, which calls for the involvement of knowledgeable professionals.
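The two proportions quoted above follow directly from the species counts. A minimal sketch of the arithmetic, using only the figures given in the text (the helper name `percent_of_pool` is hypothetical):

```python
def percent_of_pool(edible_species: int, regional_species: int) -> float:
    """Share of the regional species pool recorded as edible, in percent."""
    return 100 * edible_species / regional_species


# Figures from the text: 100 edible plants out of ~7400 vascular plant species,
# and 20 edible macro-fungi out of ~1100 known macro-fungi.
plant_share = percent_of_pool(100, 7400)
fungus_share = percent_of_pool(20, 1100)

print(f"edible plants: {plant_share:.2f}%")   # edible plants: 1.35%
print(f"edible fungi:  {fungus_share:.2f}%")  # edible fungi:  1.82%
```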

Currently, poverty remains an important problem for over 1.2 billion people, mostly in developing countries and especially in rural areas [9,11,18,47]. Three principal ideas may achieve forest-based poverty alleviation: prevent forest resources from shrinking, if they are necessary to maintain well-being (protect the pie); make forests accessible and redistribute resources and rents (divide the pie differently); and increase the value of forest production (enlarge the pie) [48]. It is estimated that the livelihoods of ~20% of the global population are supported by forest products, serving as subsistence, safety nets, and pathways to prosperity [11]. Environmental income (e.g., transfer of ownership of natural forests and other urban green spaces to local communities, coupled with payments for improved ecosystem services) accounts for one-third of total household income [18,49]. In addition to plants and macro-fungi, animal proteins provided by wildlife and insects could benefit human nutrition and provide special livelihoods in some rural areas [17,50,51].

#### **5. Conclusions**

This study reports state-of-the-art knowledge on common forest products (i.e., edible plants and macro-fungi) in lower subtropical forests in southern China. Efforts to protect natural forests and urban green spaces should continue in order to prevent biodiversity loss. Domestication of edible wild plants could be reinforced to enrich livelihoods for highly urbanized regions. Exploration of more edible plants, macro-fungi, and insects might be expected to add to the food supply. We show that endeavors to explore the diversity of edible forest products could strengthen our understanding of the socio-ecological benefits of subtropical forests.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/9/11/666/s1.

**Author Contributions:** J.L. (Jianrong Li), J.L. (Juyang Liao) and L.Z. conceived the idea, compiled the datasets, and conducted analyses. J.L. (Juyang Liao), L.Z., Y.L., Q.L., Q.Z., and J.L. (Jianrong Li) established the direction. J.L. (Juyang Liao) and J.L. (Jianrong Li) wrote the first draft of the manuscript. J.L. (Juyang Liao), L.Z., Y.L., Q.L., D.C., Q.Z., and J.L. (Jianrong Li) contributed with suggestions and corrections, and approved the final manuscript.

**Funding:** This study was funded by the Fundamental Research Funds of CAF (CAFYBB2017QB002), Science and Technology Key R & D Program of Hunan Province (2017SK2332), National Natural Science Foundation of China (31660189), Pearl River S&T Nova Program of Guangzhou (201610010001), and CFERN & BEIJING TECHNO SOLUTIONS Award Funds on excellent academic achievements.

**Acknowledgments:** We are grateful to three anonymous reviewers for valuable ideas and/or comments on the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Heterodichogamy, Pollen Viability, and Seed Set in a Population of Polyploidy** *Cyclocarya Paliurus* **(Batal) Iljinskaja (Juglandaceae)**

#### **Xia Mao, Xiang-Xiang Fu \*, Peng Huang, Xiao-Ling Chen and Yin-Quan Qu**

Co-Innovation Centre for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing 210037, China; 15380924779@163.com (X.M.); nanjinghp111@163.com (P.H.); chenxl0811@163.com (X.-L.C.); 18761808762@163.com (Y.-Q.Q.)

**\*** Correspondence: xxfu@njfu.edu.cn; Tel.: +86-025-85427403; Fax: +86-025-85427402

Received: 21 February 2019; Accepted: 17 April 2019; Published: 19 April 2019

**Abstract:** *Research Highlights: Cyclocarya paliurus*, native to the subtropical region of China, is a monoecious species with a heterodichogamous mating system. Its flowering phenology and low seed success characteristics differ from other typical heterodichogamous Juglandaceae species. This could be caused by the existence of polyploidy in the population. *Background and Objectives: C. paliurus* has been attracting more attention as a result of its medicinal value. To meet the needs for leaf harvest, cultivation expansion is required, but this is limited by a shortage of seeds. This study aims to profile the flowering phenology and the efficacy of pollen dispersal as well as elucidate the mechanism of low seed success in the population. *Materials and Methods:* The flowering phenology pattern of *C. paliurus* was observed in a juvenile plantation containing 835 individuals of 53 families from 8 provenances at the individual (protandry, PA and protogyny, PG) and population levels for 5 consecutive years (2014–2018). Slides with a culture medium of 10% sucrose and 0.01% boric acid were used to estimate pollen density and viability in the population, and seeds were collected from 20 randomly selected PA and PG individuals to assess seed success during 2017–2018. *Results:* Four flowering phenotypes and strongly skewed ratios of PA/PG and male/female occurred in the juvenile population. Sexual type and ratio changed significantly with the growth of the population over the years, showing an increasing monoecious group (11.1% to 57.2%) and a decreasing unisexual group (33.6% to 16.3%), as well as a tendency for the sexual ratio to move towards equilibrium (5.42:1 to 1.39:1 for PG:PA). Two flowering phases and bimodality in gender were displayed, as in other heterodichogamous species. However, the high overlap of inter-phases and within individuals was quite different from many previous reports. Owing to the low pollen viability of *C. paliurus* (~30%), low seed success was observed in the plantation, as well as in the investigated natural populations. *Conclusions:* Female-bias (PG and F) and a skewed ratio of mating types corresponded to nutrient accumulation in the juvenile population. Heterodichogamy in *C. paliurus* was verified, but was shown to be different from other documented species in Juglandaceae. The latest finding of major tetraploidy in a natural population could explain the characteristics of the flowering phenology and seed set of *C. paliurus* and also give rise to more questions to be answered.

**Keywords:** protogyny (PG); protandry (PA); pollen viability; seed success; polyploidy

#### **1. Introduction**

Heterodichogamy, a transitional type from dichogamy to dioecy, is a polymorphic phenological sexual system [1,2]. It is defined as two complementary morphs, protogyny (PG, female function before male) and protandry (PA, male function before female), that function synchronously and reciprocally to one another at a population level [3]. Generally, disassortative mating between two morphs, regarded as the main pattern for heterodichogamous species, promotes proficient inter-morph cross-pollination, avoids selfing, and reduces intra-morph inbreeding through temporal separation of the male and female functions in flowers [1,4]. To avoid frequency-dependent selection, the population has evolved to have a 1:1 morph ratio [5]. Based on this equilibrium ratio, genetic theory suggests that heterodichogamy is simply inherited through a single diallelic Mendelian locus [1], which has been confirmed in both *Juglans* and *Carya* genera from the Juglandaceae family [6,7].
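To illustrate how a single diallelic locus combined with disassortative mating can maintain a 1:1 morph ratio, the following is a minimal sketch of a Mendelian cross. It rests on assumptions made purely for illustration: the allele symbols `G` and `g` are hypothetical, the protogynous morph is taken to be the dominant heterozygote (`Gg`) and the protandrous morph the recessive homozygote (`gg`), and all function names are invented here. The paragraph above does not state which allele is dominant.

```python
from collections import Counter
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict[str, float]:
    """Return offspring genotype frequencies for a single-locus cross.

    Each parent is a two-letter genotype such as "Gg"; every gamete
    is assumed to be equally likely.
    """
    offspring = Counter(
        "".join(sorted(a + b)) for a, b in product(parent_a, parent_b)
    )
    total = sum(offspring.values())
    return {genotype: count / total for genotype, count in offspring.items()}


def morph(genotype: str, dominant: str = "G") -> str:
    """Map a genotype to a morph, ASSUMING protogyny (PG) is dominant."""
    return "PG" if dominant in genotype else "PA"


def morph_ratio(parent_a: str, parent_b: str) -> dict[str, float]:
    """Return morph frequencies among the offspring of a cross."""
    ratio = Counter()
    for genotype, freq in cross(parent_a, parent_b).items():
        ratio[morph(genotype)] += freq
    return dict(ratio)


# Disassortative mating: a PG heterozygote crossed with a PA homozygote.
print(morph_ratio("Gg", "gg"))  # {'PG': 0.5, 'PA': 0.5}
```

Under these assumptions every inter-morph cross returns the two morphs in equal proportion, which is the equilibrium ratio the text refers to.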

Hitherto, plants with heterodichogamy have been documented in 13 families and 21 genera of flowering plants [8,9]. Heterodichogamy is found particularly in Juglandaceae genera, including *Juglans*, *Carya*, *Pterocarya*, *Platycarya*, and *Cyclocarya* [1,9,10]. Renner [1] reported that about half of the heterodichogamous taxa are self-incompatible, but the Juglandaceae family is completely self-compatible [6,7]. However, Bai [11] demonstrated that the mating system of *J. mandshurica* is completely out-crossing.

Walnut and pecan species are regarded as a group of economically useful plants whose heterodichogamous features have gained extensive attention. This group includes *J. regia*, *J. mandshurica*, *J. ailanthifolia*, *J. nigra*, *J. hindsii*, *J. cinerea*, and *J. cordiformis* in *Juglans*; *C. illinoensis*, *C. ovata*, *C. tomentosa*, and *C. laciniosa* in *Carya*; and some inter-specific hybrids in this family [7,8]. Unlike juglandaceous plants used for nuts and/or timber, *C. paliurus* is valued for its leaves. In the past two decades, considerable progress has been made in determining the medicinal functions of this species. Publications have mostly stressed the actions of bioactive components, including cyclocariosides, cyclocaric acids, flavonoids, and steroids. These leaf metabolites have been verified to exert important medicinal and health functions, including hypoglycemic, hypolipidemic, antidiabetic, and antioxidant activities [12–16]. However, reliance on leaf resources from natural stands alone has limited wide application. Therefore, the cultivation of *C. paliurus* with improved accumulation of bioactive metabolites for leaf harvest is of growing interest [17,18]. To date, sexual rather than vegetative propagation has been successful for extensive cultivation. Importantly, the release of deep seed dormancy has greatly driven the development of plantations [19]. However, seed shortages resulting from low plumpness are now the main issue for large-scale plantations. Therefore, the following questions need to be answered: Why does such low seed plumpness occur in populations of *C. paliurus*? Is this affected by genetic heterodichogamy or yearly changing climatic factors? Is the flowering phenology in *C. paliurus* similar to that of other recorded heterodichogamous Juglandaceae species? The aims of this study are therefore the following: (1) to profile the flowering patterns of the two mating types of *C. paliurus* (PA and PG) and the population as a whole using consecutive 5-year observations on the juvenile plantation; (2) to monitor the density and viability of pollen at the population level during the flowering season; (3) to analyze and hypothesize the reasons for the low seed success in the population using the chromosomal ploidy level of *C. paliurus*.

#### **2. Materials and Methods**

#### *2.1. Study Species*

*Cyclocarya paliurus*, the sole species of a monotypic genus in Juglandaceae, is endemic to the subtropical region of China (from 24°16′12″ N to 33°22′12″ N and from 103°28′12″ E to 119°22′48″ E). Often, small populations (<25 individuals) grow in moist valleys in mountain regions at an altitude of 390–1836 m, and mature trees reach a height of 10–30 (40) m because of their heliophilous nature [20].

A single female inflorescence of *C. paliurus* is found at the apex of the growing shoot (Figure 1A); rarely, a single male instead of a female inflorescence is observed at this position (Figure 1C). A cluster containing 2–4 catkins is found at the lateral short branch (Figure 1A). Besides unisexual inflorescences, a small number of female inflorescences mixed with male flower(s) are observed in some individuals in certain years (Figure 1D). However, similar to maples, the male flowers in female inflorescences are abortive [21]. Previous studies revealed that *C. paliurus* is a typically heterodichogamous species that undergoes wind-pollination [10,22].

**Figure 1.** Expressions of male and female inflorescences in *Cyclocarya paliurus*. (**A**) Male and female inflorescences of the protogyny (PG) type. The female flowers are mature, while the male flowers are still enclosed. (**B**) Protandry (PA) inflorescences at the elongation stage, showing the developmental differences between the two sexual inflorescences. Abnormal inflorescences include (**C**) a single male instead of a female inflorescence at the apex of growing shoot and (**D**) a female inflorescence mixed with a male flower.

#### *2.2. Plantation of C. paliurus*

The plantation of *C. paliurus* used in this study is located on Hongya Mountain in Chuzhou City, Anhui Province, China (32°21′ N, 117°58′ E), where the climate is classified as northern subtropical humid monsoon, with an annual mean temperature of 15.5 °C, an annual rainfall of 1038 mm, and a frost-free period of 210 days. The site is dominated by gravelly mountain terrain with a slight slope (5°–6°).

The plantation was established in 2008, containing 835 individuals of 53 families from 8 provenances (Lushan, Jiangxi; Hefeng, Hubei; Shucheng, Anhui; Jianhe, Guizhou). The plant spacing was 3 m × 4 m, and the stand was not closed until 2018. A small number of flowering plants (mostly with only one sexual inflorescence) were observed in 2012; subsequently, an increasing number of plants exhibited sexual polymorphism.

#### *2.3. Investigation of Flowering Phenology*

Phenological monitoring was performed for a 5-year consecutive period (2014–2018), and observations were recorded from 2015, when at least one sexual flower (male/female) occurred in 3/8 individuals in the population. Characteristics recorded included flowering phenotype, flowering progress, and sex expression for all individuals. We also recorded whether each individual flowered, as well as the dates of onset, duration, and termination of male and female flowering each year.

The female flowering period was defined as lasting from the time the two feathery stigmas spread to an angle of 120° until the stigma color changed from green to brown (withering); the male flowering period lasted from the onset to the completion of pollen shedding. The full-bloom stage for the population was defined as 50%–75% of flowers being in each flowering phase for monoecious individuals. Observations were made on male and female inflorescences tagged in the middle section of each plant.

#### *2.4. Pollen Dispersal*

To monitor the density and viability of pollen within the flowering season, pollen was collected from 5 trapping sites on stocks at a height of 1.5 m along an S-shaped line in the stand. Six slides covered with a culture medium of 10% sucrose and 0.01% boric acid were placed on each stock. Three slides collected every two days, each exposed for 24 h, were used for pollen density counting, while the remaining 3 slides were collected from 9:00–11:00 (when the highest pollen viability is observed) each day (except for rainy days) during the flowering period and used for pollen viability testing.

The slides were examined by (1) counting the number of pollen grains adhering to the surface of the culture medium under a microscope at ×50 magnification, and (2) culturing pollen in an incubator at 25 °C for 8 h. A pollen grain was considered viable when its pollen tube had grown to at least the length of the grain itself. The mean and standard error of the pollen density and viability were calculated over ten visual fields for each slide.
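The per-slide summary described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `mean_and_se` and all counts are hypothetical, and converting counts per visual field into grains/cm<sup>2</sup> would additionally require the field area, which is not given here.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error) for a list of per-field measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two visual fields")
    mean = statistics.mean(values)
    # Standard error = sample standard deviation / sqrt(n)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


# Hypothetical pollen grain counts in ten visual fields of one slide
grain_counts = [12, 15, 9, 14, 11, 13, 10, 16, 12, 8]
density_mean, density_se = mean_and_se(grain_counts)

# Hypothetical numbers of germinated grains in the same ten fields,
# converted to a per-field viability percentage
germinated = [3, 4, 2, 4, 3, 3, 2, 5, 3, 2]
viability = [100 * g / c for g, c in zip(germinated, grain_counts)]
viability_mean, viability_se = mean_and_se(viability)

print(f"density: {density_mean:.1f} ± {density_se:.2f} grains per field")
print(f"viability: {viability_mean:.1f} ± {viability_se:.2f} %")
```

Computing viability per field before averaging, as done here, is one possible choice; pooling all grains on a slide would give a single percentage without a field-level standard error.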

#### *2.5. Seed Collection*

In both 2017 and 2018, 20 PA and 20 PG plants were randomly selected, and a total of 300–500 seeds were collected from the middle section of each selected tree. Seed plumpness was judged by cutting each seed along the line of the hilum.

Values of pollen density and viability and seed plumpness are presented as mean ± SE, and an ANOVA was conducted to determine differences among years and between mating types using SAS 18.1.
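For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be sketched in plain Python as below. This is only an illustration of the arithmetic and is not the SAS procedure used in the study; the function name `one_way_anova_f` and the plumpness values are hypothetical, and the p-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical seed plumpness (%) of individual PA and PG trees
plumpness_pa = [25.0, 18.0, 30.0, 21.0, 16.0]
plumpness_pg = [10.0, 14.0, 9.0, 17.0, 12.0]
f_stat, df_b, df_w = one_way_anova_f([plumpness_pa, plumpness_pg])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```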

#### **3. Results**

#### *3.1. Sex Expression of the Juvenile Population*

As reported previously, the flowering season of *C. paliurus* lasted about one month, from mid-April to the end of May. Two distinct flowering phases occurred in the population: one from late April to mid-May, and the other across all of May.

#### 3.1.1. Sexual Diversity

A total of five phenotypes from three groups were recognized for individuals in the population: The monoecious group (MO) containing individuals with both male and female inflorescences, including protogyny (PG) and protandry (PA); the unisexual group (UN) including individuals with only one sexual inflorescence, either male (M) or female (F); and unflowering trees (UF). MO and UN were both classified as flowering plants (FP).
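The classification above can be expressed as a small decision rule. The sketch below is an assumption-laden illustration: the function names, the use of the first flowering day of each sex as input, and the "undetermined" label for simultaneous onset are all hypothetical and are not taken from the study's protocol.

```python
from typing import Optional

# Mapping of the five phenotypes to the three groups named in the text
GROUPS = {"PG": "MO", "PA": "MO", "F": "UN", "M": "UN", "UF": "UF"}


def classify_phenotype(female_onset: Optional[int],
                       male_onset: Optional[int]) -> str:
    """Classify one tree in one year.

    Each argument is the day of year on which flowering of that sex
    began, or None if no inflorescence of that sex was produced.
    """
    if female_onset is None and male_onset is None:
        return "UF"          # unflowering
    if male_onset is None:
        return "F"           # female inflorescences only
    if female_onset is None:
        return "M"           # male inflorescences only
    if female_onset < male_onset:
        return "PG"          # protogyny: female function first
    if male_onset < female_onset:
        return "PA"          # protandry: male function first
    return "undetermined"    # assumption: simultaneous onset is left unclassified


def group_of(phenotype: str) -> str:
    """Return MO, UN, or UF for a phenotype label."""
    return GROUPS.get(phenotype, "undetermined")


def is_flowering_plant(phenotype: str) -> bool:
    """MO and UN together form the flowering plants (FP)."""
    return group_of(phenotype) in ("MO", "UN")
```

For example, `classify_phenotype(112, 120)` returns `"PG"`, which `group_of` maps to `"MO"`.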

#### 3.1.2. Expression Features of Various Flowering Phenotypes

In a natural population, initial flowering usually occurs at the age of 10–15 years. However, the onset of flowering in a plantation is significantly earlier than in a natural population. In the 5-year-old plantation (in 2013), the F type was predominant among FP plants, up to 75.7% and 53.8% in 2014 and 2015, respectively; the second most dominant type was the PG type. Comparatively few plants were of the PA type, and the M type was the rarest (Table 1).


**Table 1.** Annual expression of various flowering phenotypes in the juvenile plantation of *C. paliurus*.

\* Data changed annually because of the death of individuals. PA: protandry, PG: protogyny; MO: monoecious group, FP: flowering plants, UN: Unisexual group; F: females, M: males; T: total individuals in population.

With the growth of the population, more individuals flowered. The FP/T ratio tended to increase, from 43.2% in 2014 to 73.5% in 2018 (Table 1). More significantly, the proportion MO/T increased rapidly from 11.1% (in 2014) to 57.2% (in 2018), whereas UN/T declined from 33.6% (2014) to 16.3% (2018).

These data suggest that the ages of 6–10 years (defined as the juvenile population) could be regarded as the transitional stage towards maturation for *C. paliurus*.

#### 3.1.3. Changing Patterns of Mating Types across Years

Although there was a tendency for the number of flowering individuals to increase (Table 1), the changing trends were divergent for various flowering phenotypes (Figure 2). In 2016, there were rapid increments of PA and PG morphs, accompanied by a rapid decrease in individuals of the F type. Of the four flowering phenotypes, similar increasing tendencies were observed in PA and PG types, but the proportion of PA was always less than that of PG. In addition, only a slight increment was observed in the M type, rising from 0.9% to 3.2%. On the contrary, the proportion of F type individuals dropped significantly from 32.71% to 12.68% across the five-year period (Figure 2).

Meanwhile, strongly skewed ratios of mating types, namely, PG/PA or F/M, always existed in the juvenile population (Table 1). Notably, the PG/PA ratio fell from 5.42:1 in 2014 to 1.39:1 in 2018. Further, the F/M ratio fell from 37.43:1 in 2014 to 3.5:1 in 2018. A *χ*<sup>2</sup> test indicated that the protogyny-biased ratio in the population during 2014 to 2018 (PG/PA: 65/12, *χ*<sup>2</sup> = 36.5, d.f. = 1, *p* < 0.01 in 2014; 86/40, *χ*<sup>2</sup> = 16.8, d.f. = 1, *p* < 0.01 in 2015; 173/127, *χ*<sup>2</sup> = 7.2, d.f. = 1, *p* < 0.01 in 2016; 200/151, *χ*<sup>2</sup> = 7.7, d.f. = 1, *p* < 0.01 in 2017; and 257/185, *χ*<sup>2</sup> = 11.7, d.f. = 1, *p* < 0.01 in 2018) deviated significantly from the equilibrium ratio of 1:1, but showed a tendency towards relative equilibrium.
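A goodness-of-fit test against a 1:1 morph ratio can be sketched as follows, using the PG/PA counts quoted above. This assumes a plain Pearson *χ*<sup>2</sup> statistic without continuity correction, which may not be the exact variant the authors used, so recomputed values need not match the published ones to the last digit. The function name `chi_square_equal_ratio` is hypothetical.

```python
import math


def chi_square_equal_ratio(n_pg, n_pa):
    """Pearson chi-square test of observed PG/PA counts against 1:1.

    Returns (chi2, p) with one degree of freedom.
    """
    expected = (n_pg + n_pa) / 2
    chi2 = ((n_pg - expected) ** 2 + (n_pa - expected) ** 2) / expected
    # With 1 d.f., the chi-square upper tail equals erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# PG and PA counts per year as reported in the text
counts = {
    2014: (65, 12),
    2015: (86, 40),
    2016: (173, 127),
    2017: (200, 151),
    2018: (257, 185),
}

for year, (pg, pa) in counts.items():
    chi2, p = chi_square_equal_ratio(pg, pa)
    print(f"{year}: PG/PA = {pg / pa:.2f}:1, chi2 = {chi2:.2f}, p = {p:.3g}")
```

The closed-form tail probability works only for one degree of freedom; a general implementation would use a statistics library instead.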

**Figure 2.** Changing trends of flowering phenotypes with the growth of the *C. paliurus* population. \* Number of plants of each flowering phenotype/total plants in the population. F: females, M: males.

#### 3.1.4. Reciprocal Transitions between Mating Types

Owing to the immaturity of the juvenile population, plants were sensitive to nutrient shortage and unstable environmental factors. Therefore, reciprocal transitions among five phenotypes happened frequently between adjacent years (Table 2). As a whole, with the growth of population, the ratio of transition from UF to FP showed a decreasing trend from 68.8% to 51.2%, whereas there was an opposite tendency for the transition from UN to MO (Table 2).

It seems that the F and PG types played the core roles in all transitions. As shown in Table 2, reciprocal transitions happened more often between UF↔F, UF↔PG, and F↔PG/PA, and scarcely between M↔F and PA↔PG. Remarkably, observed transitions between PA and PG were usually ambiguous and mainly happened in individuals with a high overlap of male and female flowering (Figure 3).


**Table 2.** Transitions between flowering phenotypes in the juvenile plantation of *C. paliurus*.

#### *3.2. Flowering Phenology of PG and PA Types across Years*

Over the five-year period, 21 PA and 30 PG plants were screened via consecutive observations. As shown in Figure 3, for all marked individuals, a longer flowering duration was observed in females than in males; phenology also coincided with the sequence of flowering expression, but differed in terms of the separation of male and female functions across years. Such labile phenological characters could be mainly affected by environmental factors (e.g., temperature, rain, and wind).

Within each PA individual, the separation between two sexual functions gradually shortened and tended to overlap over the years. In 2015, separation occurred in 17 PA trees; however, it only occurred in four trees in 2018. Although overlap was common within PG individuals, it varied across years. This is in accordance with the statistical data for all monoecious plants, which displayed more overlap in the PG type (50%–90%) than in the PA type (14.3%–47.6%). Therefore, the potential mating probability within PA/PG individuals seems to be dependent on the overlap degree of two sexual functions.
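The within-individual overlap discussed above can be quantified as the number of days shared by the male and female flowering periods of one tree. The sketch below is illustrative only: the function name `overlap_days` and all dates are hypothetical, and both periods are treated as inclusive date ranges.

```python
from datetime import date


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of calendar days shared by two inclusive date ranges."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    # A negative span means the two periods are separated
    return max(0, (earliest_end - latest_start).days + 1)


# Hypothetical PA trees: (male start, male end, female start, female end)
trees = {
    "PA-01": (date(2018, 4, 24), date(2018, 4, 30),
              date(2018, 4, 28), date(2018, 5, 15)),
    "PA-02": (date(2018, 4, 23), date(2018, 4, 29),
              date(2018, 5, 2), date(2018, 5, 20)),
}

overlaps = {name: overlap_days(*periods) for name, periods in trees.items()}
share_with_overlap = sum(1 for d in overlaps.values() if d > 0) / len(overlaps)

for name, days in overlaps.items():
    print(f"{name}: {days} day(s) of male/female overlap")
print(f"trees with overlap: {share_with_overlap:.0%}")
```

The share of trees with a non-zero overlap corresponds to the kind of percentage reported for the PG and PA types, although the study's exact criterion for counting overlap is not stated here.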

**Figure 3.** Flowering phenology of males and females within individuals based on 21 PA and 30 PG trees of *C. paliurus* during 2015–2018.

#### *3.3. Phenological Characteristics of the Population*

PA and PG morphs are the two main types in heterodichogamous species. However, in the juvenile population of *C. paliurus*, monoecious plants did not exceed half of the total until 2018 (57.2%) (Table 1). Moreover, some monoecious plants could shift to other phenotypes, such as F or unflowering ones (Table 2). Here, all individuals of PA or PG morphs were pooled to describe the flowering characteristics of the population.

Figure 4 illustrates that two incompletely separated flowering phases existed in the population. In the first phase, from 22–26 April to 12–18 May, PG females and PA males were in bloom, while during the second phase, from 28 April–1 May to 20–26 May, PG males and PA females were in bloom. Each phase lasted 20–25 days, and the obvious overlap between phases (7–10 days) varied across years. Generally, the female flowering duration (>25% individuals in the population in bloom) for the two morphs was significantly longer than that of the males (one week).

Comparatively, the male flowering duration was shorter by 4–5 days in the first phase than in the second one. In the first phase, the peak of male flowering was reached earlier than that of female flowering, which lowered the opportunity for inter-morph pollination. In contrast, the flowering peaks of the complementary sexes were rather well synchronized in the second phase (Figure 4). This difference in inter-morph synchronism between phases could result in higher seed success in PA (pollinated in the second phase) than in PG (pollinated in the first phase), and in a lower chance for PA males than for PG males to pollinate the PG females. In addition, the overlap between phases mainly happened between PG males and PG females, increasing the intra-morph inbreeding probability (Figure 4).

**Figure 4.** Dynamics of the number of PA and PG flowering individuals in the population of *C. paliurus* during 2015–2018. The black lines describe the first flowering phase, while the blue lines describe the second phase. High overlap was obvious between the two phases and the two sexual functions within each phase.

We found that the flowering expression of population was more complicated than the assembly of PA/PG individuals. Separation of the two sexual functions mainly existed within individuals rather than in the population (Figures 3 and 4). Rather than selfing within individuals, inter-morph and intra-morph pollination were dominant at the population level.

#### *3.4. Pollen Dispersal Characteristics of the Population*

A sufficient quantity and high viability of pollen are prerequisites for seed success. To illustrate the effect of pollen on seed bearing, pollen density and viability during flowering in the population were surveyed for three consecutive years.

Generally, pollen density displayed a clearly rising trend with the maturation of the population. In the first phase, the maximum pollen density in 2018 reached 305 ± 56 grains/cm<sup>2</sup>, far above that of 2016 (113 ± 19.7 grains/cm<sup>2</sup>) and 2017; in the second phase, a similar maximum was observed over the three years. The pollen density differed significantly between the two phases in 2016 and 2017 but was similar in 2018. This indicates that pollen dispersal is affected not only by nutrient conditions, but also by year-to-year climatic variation.

In addition, the pollen viability kept rising with the growth of the population, showing maximums of 27.6% and 31.1% in 2018, higher than that of 16.4% and 28.6% in 2017 for the two flowering phases, respectively. Overall, the low pollen viability (30% or so) did not show a positive correlation with pollen density; moreover, irregular dynamics were monitored in the population over the three years (Table 3).


**Table 3.** Pollen density and viability during the flowering period in 2017 and 2018.

Note: data are listed as mean ± SE; '/' means no data are available because of rainfall; data in red are not from the same day but from the previous/next day, to show the changing tendency.

#### **4. Discussion**

Usually, mature trees (flowering individuals) of *C. paliurus* are up to 10–30 (40) m tall, and both male and female inflorescences cover the outer crown above the canopy of the forest. It is difficult to find natural populations of a sufficient size to monitor flowering phenology; thus, plantation could be a better option for phenological observation [20]. Plants in our plantation included 53 families from eight provenances, representing the majority of the natural germplasm.

#### *4.1. Sexual Polymorphism and Bias Associated with Nutrient Conditions and Climatic Factors*

In natural populations, several heterodichogamous species (e.g., *A. japonicum*) were reported to include not only monoecious trees (PA/PG), but also unisexual trees (M/F) [23–25]. Similar findings were also observed in the *C. paliurus* population. For the first time, we showed that the F type is overwhelmingly dominant in juvenile stands (2014), followed by the PG type. Opposite trends (an increase in the PG type and a reduction in the F type) were then observed over 5 years. Meanwhile, the number of PA trees tended to increase, though at a slower rate than the PG trees (Figure 2).

Levy and Dean [26] pointed out that most woody plants can flower only when they reach a certain age or size; flowering expression can be linked to factors such as nutrient supply and adversity. A low nutrient supply may support the formation of female flowers for *C. paliurus*. Therefore, the female superiority (F/PG) could be associated with the nutrient conditions in the juvenile population. This hypothesis is supported by our previous findings that flowering expressions are significantly related to the plant size of *C. paliurus* [10]. Alternatively, an adaptive interpretation holds that female bias favors the production of more seeds for species survival under adverse conditions, like nutrient shortage and changing climatic factors (dry, cold, etc.). However, more extensive studies for longer periods need to be conducted to confirm this.

#### *4.2. Morph Ratio and Transition*

Generally, an equal morph ratio (PA:PG = 1:1) should occur in a heterodichogamous population, as shown in *Juglans* species, such as *J. regia*, *J. ailanthifolia*, and *J. mandshurica*, as well as in some *Acer* species (e.g., *A. opalus* and *A. japonicum*) (reviewed by Liu [8]). However, deviations from this equilibrium have been found in some species (e.g., *A. pseudoplatanus*) [24] and under certain situations (e.g., small population or human disturbance, like intentional selection) [27]. Similarly, a distinctive bias was found in our study. We speculate that the "juvenile" state of the population is the main reason for this deviation, and the change in the morph ratio (PG/PA) from 5.42:1 in 2014 to 1.39:1 in 2018 predicts a state of equilibrium in the mature population.

Labile sex expression was documented in some heterodichogamous trees across years, like in *Acer pseudoplatanus* [24,28]. It is reported that the sexual phenotype changed mainly from unisexual to the monoecious type, with few or no changes between PA and PG [8,9]. We found that unisexuality is a temporary trait based on the reduction of the ratio of M and F types with the maturation of the stand (Table 2). Similar findings were reported in *J. ailanthifolia* and *J. mandshurica* [9,29]. Researchers believe that the unisexual trait (F and/or M) is an environmental and/or nutrient (young age) response phenotype rather than a separate morph [9,24,25].

Morphs of PA and PG are expected to be stable according to the theory that sex morphs are genetically determined [6,7], which has been demonstrated by some observations in *J. regia* and *P. strobilacea* [9,29]. However, reports of transitions between the two morphs in *C. ovata*, *C. tomentosa*, and *A. japonicum* have been published [24,25,30]. Such a transition in *C. paliurus* was also recorded in several individuals. On these plants, highly overlapping flowering durations of the two sexual flowers led to the ambiguous judgment of mating type in 2018 (Figure 3). Thus, further observation needs to be made to confirm such transitions.

#### *4.3. Flowering Phenology of the Individual and Population*

Heterodichogamous populations showed two flowering phases and bimodality in gender for two distinct sexual morphs. This has been described in *J. regia*, *J. hindsii*, *J. ailanthifolia*, *J. mandshurica*, and *Platycarya strobilacea* [4,6,9,29,31]. Moreover, sufficiently long separation between individuals can avoid selfing but does not prevent intra-morph inbreeding [4]. Therefore, flowering separation and the overlap of two sex inter-phases as well as within-phase and within-individual are the key factors affecting mating patterns. As described by numerous publications, both features are divergent among heterodichogamous species, varying from complete separation to high overlap. In contrast to the pattern of no overlap within individuals and inter-phases in *J. mandshurica* [4] and a short period of overlap in the PG type of *J. ailanthifolia* [29], high overlap occurs extensively, not only inter-phase but also intra-morph and within individuals in *C. paliurus*. High overlap within individuals and intra-morphs seems to favor selfing and inbreeding but is contrary to the mechanism of avoiding assortative mating for heterodichogamous plants [4,32].

Mating fitness also depends on the synchronism of each sexual flower; very early or very late individuals would suffer low reproductive success. Bai et al. [4] observed that female flowering peaks were earlier than male peaks for two periods in *J. mandshurica*, and the PG morphs had a higher level of assortative mating than PA ones. We found that the flourish flowering period of the male preceded that of the female in the first stage, but the female flowering period covered the complete duration in the second stage for four consecutive years (Figure 5), suggesting that seed set could be higher in PA than in PG. This prediction was further supported by the result that seed plumpness in PA (22% ± 8.2%) was significantly higher than in PG (12% ± 6.8%) (*p* < 0.05).

**Figure 5.** Flourish flowering duration (>50% individuals of PA and PG in bloom) of two stages in the population of *C. paliurus*.

#### *4.4. Reasons for Low Seed Success in* C. paliurus

Heterodichogamy is recognized as a key factor affecting seed production, and high fruit production has been reported in many species, such as *J. regia* and *C. illinoinensis* [33,34]. Strangely, seeds with low plumpness of 10%–30%, and rarely 50%, were collected from natural populations of *C. paliurus* for nearly ten years. The low seed success (0–30%) in our plantation during 2016–2018 was also in accordance with the natural situation.

Low seed success in a small population can result from the bias of two mating types, like in *A. pseudoplatanus* [28]. Limited by the habitat of *C. paliurus* or by human disturbance, the size of more than two-thirds of the natural populations investigated was less than 25 individuals [20]. This might be one of the reasons for the low seed set in the natural population; however, it cannot be the explanation for our plantation. Furthermore, the overlap of the flowering phenology in *C. paliurus* seems to provide a better chance of pollination, including crossing, inbreeding, and even selfing, and a high seed success should be expected, according to the 100% rate of self-compatibility in *Juglandaceae* species [1]. However, the investigated data did not support these speculations.

#### *4.5. Polyploid C. paliurus Relating to Characters of Flowering and Seed Success*

Different from documented heterodichogamous species in *Juglandaceae*, observations over five years demonstrated a high overlap between the two flowering phases and within individuals, low pollen viability rather than low pollen density in the population (Table 3), and low seed plumpness not only in the natural stands and plantation, but also under controlled pollination (12%–38%, data unpublished). This implies that unknown reason(s) could be responsible for these phenomena.

Unexpectedly, a genomic survey revealed the existence of a tetraploid individual. Subsequently, by screening 1087 individuals collected in our germplasm from 42 provenances in 13 provinces using a flow cytometer, we discovered that tetraploid plants accounted for about 95%, and diploid and triploid (uncertain) ones for about 5%. In short, diploid and tetraploid plants coexist in natural populations. *C. paliurus* is the first species found to have both polyploidy and heterodichogamy characteristics.

Based on these findings, many differences between *C. paliurus* and other heterodichogamous species can be explained. Multi-allelic rather than di-allelic control of heterodichogamy in polyploid *C. paliurus* might trigger the overlap of flowering between phases as well as within individuals. Low pollen viability resulting from polyploidy [35] could lead to low seed success. However, based on the theory that heterodichogamy is simply inherited with a single diallelic Mendelian locus in diploid species [1], it is unclear how heterodichogamy is determined in polyploid *C. paliurus*. Further, it is unknown how *C. paliurus*, with its low seed success rate, ensures its succession. Thus, further research on *C. paliurus* is needed to answer these questions.

#### **5. Conclusions**

The female bias (PG and F) and skewed ratio of mating types that occur in *C. paliurus* correspond to the nutrient accumulation in the juvenile population. Heterodichogamy in *C. paliurus* was demonstrated, but with substantial differences from other recorded species. Low seed success in the population mainly results from low pollen viability. However, the latest finding of a polyploid majority in the natural population could explain the flowering phenology and seed set characteristics of *C. paliurus*, but also raises more questions that need to be answered.

**Author Contributions:** This work was designed, directed, and coordinated by X.-X.F., who provided technical guidance for all aspects of the project and wrote the manuscript. X.M. was the principal investigator, contributed to the fieldwork and data analysis, performed the literature search, and helped with the writing of the manuscript. P.H. monitored the density and viability of pollen. X.-L.C. and Y.-Q.Q. assisted with the fieldwork and analysis of polyploidy for all individuals.

**Funding:** This research was funded by the National Natural Science Foundation of China (No. 31470637) and Priority Academic Programme Development of Jiangsu Higher Education Institutions, PAPD.

**Acknowledgments:** We also thank Jiaqiu Yuan, JingJing Liu, Qiang Lu, and Biqing Chen for help with fieldwork, and anonymous referees and the editor for valuable comments on the manuscript.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* *Pinus massoniana* **Introgression Hybrids Display Differential Expression of Reproductive Genes**

**Jiaxing Mo 1,2, Jin Xu 1,2,\*, Yuting Cao 1,2, Liwei Yang 1,2, Tongming Yin 1,2, Hui Hua 1,2, Hui Zhao 1,2, Zhenhao Guo 1,2, Junjie Yang 1,2 and Jisen Shi 1,2**


Received: 23 January 2019; Accepted: 28 February 2019; Published: 5 March 2019

**Abstract:** *Pinus massoniana* and *P. hwangshanensis* are two conifer species located in southern China, which are of both economic and ornamental value. Around the middle and lower reaches of the Yangtze River, *P. massoniana* occurs mainly at altitudes below 700 m, while *P. hwangshanensis* can be found above 900 m. At altitudes where the distribution of both pines overlaps, a natural introgression hybrid exists, which we will further refer to as the Z pine. This pine has a morphological character that shares attributes of both *P. massoniana* and *P. hwangshanensis*. However, compared to the other two pines, its reproductive structure, the pinecone, has an ultra-low ripening rate with seeds that germinate poorly. In this study, we aimed to find the reason for the impaired cone maturation by comparing transcriptome libraries of *P. massoniana* and Z pine cones at seven successive growth stages. After sequencing and assembly, we obtained unigenes and then annotated them against NCBI's non-redundant nucleotide and protein sequences, Swiss-Prot, Clusters of Orthologous Groups, Gene Ontology and KEGG Orthology databases. Gene expression levels were estimated and differentially expressed genes (DEGs) of the two pines were mined and analyzed. We found that several of them indeed relate to reproductive process. At every growth stage, these genes are expressed at a higher level in *P. massoniana* than in the Z pine. These data provide insight into understanding which molecular mechanisms are altered between *P. massoniana* and the Z pine that might cause changes in the reproductive process.

**Keywords:** *Pinus massoniana*; introgression hybrid; RNA sequencing; DEGs; reproduction

#### **1. Introduction**

Gymnosperms have their own unique way of reproduction. Microspores grow into pollen carrying sperm cells, while megaspores develop into the megagametophyte. The archegonium forms during development of the megagametophyte. Then, pollen enters the ovule through the micropyle and reaches the egg by extending a pollen tube, after which fertilization occurs. Finally, the embryo is formed and develops into a gymnosperm seed.

Conifers possess a series of properties that make exploring their molecular biology through a genomics approach challenging, such as a long life cycle, a reproductive process lasting months or even years, and a gigantic genome size [1,2]. We aimed to explore the molecular mechanism of conifer reproduction by generating RNA-seq transcriptome data for successive stages of the developing pinecone. RNA-seq technology has advanced to the stage where it is highly efficient, sensitive and accurate.

By studying the expression dynamics of differentially expressed genes (DEGs) in pinecones, we sought to gain insight into the relevant genes that control the reproductive process of conifers. Previous empirical studies have suggested several genes linked to reproduction, such as *DAL* [3], *MADS-box* [4,5], *MYB* [6] and *MSI* [7] and so forth. However, studies of the determinants of development and regulation of reproduction have concentrated on model angiosperm species so far, while gymnosperms remain largely understudied.

We centered our studies around two types of conifers, naturally occurring in China. First, *Pinus massoniana*, also known as the masson pine, is an economically important conifer species. It is mainly distributed in various southern Chinese provinces, at altitudes below 700 m around the middle and lower reaches of the Yangtze River. It offers wood and pulp for manufacturing furniture and paper and also supplies natural resin, which can be further processed into rosin, a crucial product used in the maintenance of instrument strings and as an ingredient in medicine. Furthermore, *P. massoniana* fulfills a significant ecological role by compensating for natural forest destruction owing to its fast growth and abundant biomass.

The second, *Pinus hwangshanensis*, primarily grows in southeastern China. It grows most abundantly at an altitude above 900 m around the middle and lower reaches of the Yangtze River. Growing at a higher altitude limits its speed of growth, as well as accumulation of biomass and the resin compared to *P. massoniana*. *P. hwangshanensis* is often viewed as luxurious and graceful, making it an ideal ornamental tree.

Mountain Lushan (Figure 1a) is located within the distribution area of both conifer species. Its peak has an altitude of 1474 m and it supports both *P. massoniana* and *P. hwangshanensis* vegetation at the aforementioned altitudes. Where the distributions of *P. massoniana* and *P. hwangshanensis* overlap, a natural introgression hybrid of both species occurs, sharing phenotypic characters of both its parents (Figure 1b) [8,9]. Because the hybrid has not yet been named, we refer to it as the 'Z pine' in this article. The Z pine has an extremely low germination and ripening rate compared to both *P. massoniana* and *P. hwangshanensis* [10]. These characters could indicate that the Z pine displays genetic incompatibilities during fertilization and/or embryonic development. What causes this phenomenon is still unknown.

In this study, we collected open-pollinated cones of *P. massoniana* and the Z pine at seven successive developmental stages. We then characterized the transcriptomes of these cones using Illumina high-throughput sequencing technology, constructing forty-two cDNA libraries. A series of experiments was performed to mine candidate genes, focusing on differential expression patterns between the two taxa. Moreover, differentially expressed genes related to fertilization and embryonic development were identified and analyzed in both taxa. This study could help explain the unusually low ripening and germination rates of the Z pine compared with *P. massoniana* and may provide an approach to understanding differences between a species and its introgressive hybrid at the transcriptome level.


**Figure 1.** (**a**) Location of Mountain Lushan. Mt. Lushan lies in Jiujiang City, Jiangxi Province, China. The Great Han Yang Peak, the highest point of the mountain, is 1474 m high. The local climate is humid, subtropical; (**b**) Schematic diagram of main distribution area of *P. hwangshanensis*, the Z pine and *P. massoniana* on Mt. Lushan.

#### **2. Materials and Methods**

#### *2.1. Sample Collection*

Differently staged, openly pollinated cones of *P. massoniana* and the Z pine were collected on Mt. Lushan, Jiujiang, China (Table 1, Figure 2). Owing to the complex forest environment and the wind pollination of pines, sampled individuals (especially of the Z pine) may have different levels of introgressive background. To minimize the differences among them, we designated five maternal trees of each taxon (*P. massoniana* or the Z pine) in its sample plot. In our experience, *P. massoniana* is pollinated around 10 April and the Z pine around 20 April at the sample plots. We therefore conducted our first sample collection on 27 April (Table 1), when both *P. massoniana* and the Z pine had already been pollinated. The cones were wrapped in aluminum foil shortly after collection and then immediately submerged in liquid nitrogen, after which they were stored in a −80 °C freezer until RNA extraction.


**Table 1.** Information on the geographical sites of sample collection of *P. massoniana* and the Z pine.

**Figure 2.** Successively staged pine cones of *P. massoniana* (**a**) and the Z pine (**c**). Details of mature cones at the MG (**b**) and ZG (**d**) stages are shown. The scale shields of *P. massoniana* are flat or slightly bulged, the transverse ridge is not very obvious, and the scale umbilical has no thorn. The scale shield of the Z pine is bulged, the transverse ridge is obvious, and the scale umbilical is thorny. Codenames for collected cones are explained in Table 1. (Scale bar = 10 mm.)

#### *2.2. RNA Extraction and Sequencing*

For each sample code, we randomly selected three to six cones for RNA isolation to ensure that the cones and their RNA were representative. Cones were taken from the −80 °C freezer, briefly re-frozen in liquid nitrogen to further weaken the tissue and cut into pieces, after which sections containing ovules (or seeds) were collected and crushed. RNA was extracted from each sample using the Bioteke Plant RNA Extraction Kit (Beijing, China). Three replicate RNA extractions were performed for each sample code. The purity and integrity of the RNA samples were checked, respectively, by measuring 260 nm/280 nm UV absorption ratios with a Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA) and by examining the RIN (RNA Integrity Number) with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).

After RNA extraction, magnetic oligo (dT) beads were used to purify mRNA, which was then collected using RNeasy RNA reagent. The mRNA was fragmented using the RNA Fragment Reagent (Illumina, San Diego, CA, USA) and subsequently cleaned using an RNeasy RNA Cleaning Kit (Qiagen, Germany). First-strand cDNA was synthesized using MMLV reverse transcriptase (Takara, Japan), while second-strand cDNA synthesis was performed using DNA Polymerase I and RNase H. The cDNA was finally sequenced on an Illumina HiSeq X Ten (Illumina, USA). The raw sequencing data were submitted to the NCBI Sequence Read Archive (SRA) database under the BioProject accession number PRJNA482692.

#### *2.3. Data Processing and Assembly*

The raw RNA-seq data were filtered by removing adaptor and low-quality sequences using Trimmomatic [11]. Qualified reads were assembled into non-redundant transcripts by Trinity [12,13] using the following parameters: --min\_contig\_length 200 --min\_kmer\_cov 2 --min\_glue 3 --seqType fq; all other parameters were left at their default settings. The contigs assembled by Trinity were then clustered and processed by TGICL [14] with the following parameters: -l 40 -c 10 -v 25 -O -repeat\_stringency 0.95 -minmatch 35 -minscore 35, after which unigenes were collected.
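To show how the Trinity parameters listed above fit into a single invocation, the following Python sketch only builds the command line and prints it; it does not run the assembler. The read file names, the output directory and the `--max_memory` value are hypothetical placeholders, and the paired-end layout (`--left`/`--right`) is an assumption, since the text does not state the read layout.

```python
import shlex

def trinity_command(left_fq, right_fq, out_dir, max_memory="50G"):
    """Build a Trinity argument list using the assembly parameters from the text.

    left_fq, right_fq, out_dir and max_memory are illustrative assumptions;
    only the four assembly parameters below come from the Methods.
    """
    return [
        "Trinity",
        "--seqType", "fq",              # input reads are FASTQ
        "--max_memory", max_memory,     # required by Trinity; value is a placeholder
        "--left", left_fq,              # assumed paired-end input
        "--right", right_fq,
        "--min_contig_length", "200",
        "--min_kmer_cov", "2",
        "--min_glue", "3",
        "--output", out_dir,
    ]

# Hypothetical file names for one library
cmd = trinity_command("MA_1.clean.fq", "MA_2.clean.fq", "trinity_MA")
print(shlex.join(cmd))
```

Keeping the arguments as a list rather than a single string makes it straightforward to pass them to `subprocess.run` later without shell-quoting problems.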

#### *2.4. Functional Annotation*

Unigenes were annotated by aligning them against the SwissProt (Release-2016\_07) [15], NCBI non-redundant protein sequences (Nr, Release-20160314), NCBI non-redundant nucleotide sequences (Nt, Release-20140514) [16], Kyoto Encyclopedia of Genes and Genomes (KEGG, Release 59.3) [17,18], Cluster of Orthologous Groups of proteins (COG, Release-20090331) [19] and Gene Ontology (GO) [20] databases, selecting the most likely annotations. Blast2GO (v2.5.0) [21] was used as the GO annotation tool against the GO database (Release-201604) under default settings. Furthermore, GO function and KEGG pathway analyses of differentially expressed genes were performed.

#### *2.5. Differentially Expressed Genes (DEGs) and Gene Expression Pattern Analysis*

Unigene read counts were calculated using RNA-Seq by Expectation-Maximization (RSEM) software [22]. RSEM results were transformed into FPKM [23] values (expected number of fragments per kilobase of transcript sequence per million base pairs sequenced), which are commonly used for measuring gene expression levels. DESeq was used to identify differentially expressed genes between transcript libraries [24]. Genes were considered differentially expressed at thresholds of FDR (false discovery rate) ≤ 0.001 and |log2Ratio| ≥ 2. Gene expression patterns of *P. massoniana* and the Z pine were clustered using the Short Time-series Expression Miner (STEM) [25].
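As a minimal sketch of the two calculations described above, the Python snippet below computes FPKM under its conventional definition (fragments per kilobase of transcript per million mapped fragments) and applies the stated thresholds (FDR ≤ 0.001 and |log2Ratio| ≥ 2). The function names, the pseudocount, the direction of the ratio (*P. massoniana* relative to the Z pine) and the toy numbers are illustrative assumptions; in the study itself, RSEM and DESeq perform the estimation and testing.

```python
import math

def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

def classify_deg(fpkm_m, fpkm_z, fdr, fdr_max=0.001, min_abs_log2=2.0, pseudo=1e-3):
    """Label a unigene 'up', 'down' or 'ns' (not significant).

    The ratio is taken as M over Z (an assumption), and a small pseudocount
    (also an assumption) avoids division by zero for unexpressed unigenes.
    """
    log2_ratio = math.log2((fpkm_m + pseudo) / (fpkm_z + pseudo))
    if fdr <= fdr_max and abs(log2_ratio) >= min_abs_log2:
        return "up" if log2_ratio > 0 else "down"
    return "ns"

# Toy example: 500 fragments on a 2000 bp unigene in a library of 25 million fragments
example_fpkm = fpkm(500, 2000, 25_000_000)
print(example_fpkm)                      # 10.0
print(classify_deg(40.0, 8.0, 1e-5))     # five-fold higher in M, FDR passes
print(classify_deg(40.0, 8.0, 0.01))     # same fold change, FDR fails
```

Both conditions must hold at once: a large fold change with a weak FDR, or a strong FDR with a small fold change, is not counted as a DEG.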

#### *2.6. Validation by Quantitative Real-Time PCR (qRT-PCR)*

Quantitative real-time PCR was used to validate differentially expressed genes detected by our RNA-seq analysis. Primers were designed using the NCBI Primer-Blast Tool [26] and synthesized by Generay Biotech Co., Ltd. (Shanghai, China). cDNA samples for qRT-PCR were synthesized using the Vazyme HiScript II Q RT SuperMix for qPCR (Nanjing, China). qRT-PCR was carried out on an Applied Biosystems 7500 PCR cycler (Thermo Fisher Scientific Corporation, CA, USA) with Vazyme ChamQ SYBR qPCR Master Mix (Nanjing, China) as the reaction reagent kit. Each sample was run in triplicate in a final volume of 20 μL containing 10 μL of ChamQ SYBR qPCR Master Mix (2×), 0.4 μL of each primer, 2 μL of cDNA and 7.2 μL of ddH2O. The reaction program followed the standard product instructions. An *Actin* gene identified from the RNA-seq data (Unigene69821\_All) was used as the reference gene. The qRT-PCR data were analyzed with the 2<sup>−ΔΔCt</sup> method [27].
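The 2<sup>−ΔΔCt</sup> calculation mentioned above can be sketched in a few lines of Python. The Ct values below are invented for illustration, and the choice of which sample serves as the calibrator is an assumption; the text specifies only the reference gene (*Actin*) and the triplicate design.

```python
def mean_ct(ct_values):
    """Average the Ct values of technical replicates (triplicates in the text)."""
    return sum(ct_values) / len(ct_values)

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)

# Hypothetical triplicate Ct values: target gene and Actin in two samples
target_sample = mean_ct([24.0, 24.5, 23.5])      # 24.0
actin_sample = mean_ct([18.0, 18.0, 18.0])       # 18.0
target_calibrator = mean_ct([26.0, 26.0, 26.0])  # 26.0
actin_calibrator = mean_ct([18.0, 18.0, 18.0])   # 18.0

fold = relative_expression(target_sample, actin_sample,
                           target_calibrator, actin_calibrator)
print(fold)  # 4.0: the target crosses threshold two cycles earlier than in the calibrator
```

Because each cycle roughly doubles the product, a ΔΔCt of −2 corresponds to about four-fold higher expression, under the usual assumption of near-100% amplification efficiency.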

#### **3. Results**

#### *3.1. Illumina Sequencing and Assembly*

A total of 160 Gb of raw data was obtained, with an average of 53 million reads per library. We evaluated the quality of the original and clean sequencing data of all samples. The Q30 value ranged from 88.49% to 92.33% for the original sequencing data (Table S1) and from 91.75% to 94.31% for the clean data (Table S2), indicating that the data set was suitable for assembly. All transcripts obtained from the staged cones of the two taxa were assembled into 93,291 unigenes (see Materials and Methods for details; Table 2). About 39.88% of them exceeded 2 kb in length, while 37.02% were between 1 kb and 2 kb and 23.1% were between 100 bp and 1 kb (Figure S1). The average unigene length was 1987 nt and the N50 was 2494 nt.


**Table 2.** Number and length of unigenes.

<sup>1,2</sup> When all samples are assembled together, the pooled assembly captures more transcripts than any single sample; therefore, the values for 'All' are usually higher than the others.

#### *3.2. Functional Annotation of P. massoniana and Z Pine Unigenes*

The assembled unigenes were annotated against the SwissProt, NCBI non-redundant protein and nucleotide sequences (Nr and Nt), Kyoto Encyclopedia of Genes and Genomes (KEGG), Cluster of Orthologous Groups of proteins (COG) and Gene Ontology (GO) databases. A total of 86,006 unigenes were annotated in *P. massoniana* and the Z pine, of which 25,150 unigenes were annotated against all six databases (Figure S2).

We were able to annotate most genes using the Nr database, using sequences from *Picea sitchensis* for the bulk of the annotation (30,943), after which *Amborella trichopoda* (7055) and Indian lotus (4504) provided most annotations, suggesting that these sequenced species are most closely related to *P. massoniana* and the Z pine (Figure S3).

We then used the COG, GO and KEGG databases for unigene functional prediction. Using the COG database, 30,017 unigenes could be annotated and were classified into 24 functional categories. The 'general function prediction only' category was the most abundant, followed by 'transcription', 'replication, recombination and repair', 'function unknown' and 'signal transduction mechanisms' (Figure 3).

A total of 229,950 redundant unigenes (with 38,619 nonredundant unigenes) were annotated into 56 sub-categories under three primary GO categories: biological process, cellular component and molecular function (Figure 4). The top three sub-categories were metabolic process (23,854 unigenes), cellular process (22,439 unigenes) and catalytic activity (20,324 unigenes).

The KEGG classification placed 41,931 unigenes into five pathway functional categories (Figure 5): organismal systems (5474 unigenes), metabolism (37,045 unigenes), genetic information processing (9684 unigenes), environmental information processing (2871 unigenes) and cellular processes (1487 unigenes). The top three sub-categories out of a total of 18 were 'global map', 'environmental adaptation' and 'carbohydrate metabolism', which contain 14,748, 5350 and 4965 unigenes, respectively.

**Figure 3.** COG functional classification of *P. massoniana* and the Z pine unigenes.

**Figure 4.** Gene ontology (GO) functional classification of *P. massoniana* and the Z pine unigenes.

**Figure 5.** Histogram of the KEGG Pathway classification of *P. massoniana* and the Z pine unigenes.

#### *3.3. Analysis of Expected Number of Fragments per Kilobase of Transcript Sequence per Million Base Pairs Sequenced (FPKM)*

FPKM values were calculated using RSEM software. The general density distribution of expression quantity (Figure 6) was analyzed and showed that the average total of expressed mRNAs across all unigenes of *P. massoniana* and the Z pine varies between species and stages.

**Figure 6.** (**a**) General density distribution of *P. massoniana* and Z pine unigenes by FPKM analysis; (**b**) Pairwise comparison of general density distribution of same stage *P. massoniana* and Z pine unigenes by FPKM analysis, M: *P. massoniana* (red curve), Z: Z pine (green curve), A~G: the seven successive stages of pinecone development. X-axis: logarithm to base 10 of FPKM, y-axis: density of distribution.

#### *3.4. GO Classification and KEGG Enrichment Assessment of Differentially Expressed Genes (DEGs) at Successive Pinecone Stages*

We determined and compared the number of up- and down-regulated genes between the two pines at the seven developmental stages of the collected cones (Figure 7 and Figure S4). In comparison groups A, C and G, more down-regulated genes were found than up-regulated ones, while in groups B, D, E and F, there were more up-regulated genes than down-regulated ones (Figure 7).

**Figure 7.** Numbers of differentially expressed genes between the two pines at the same stage (FDR ≤ 0.001 and |log2Ratio| ≥ 2). Red indicates up-regulated genes and green indicates down-regulated genes. The first two bars show that, compared to ZA, MA has more down-regulated genes than up-regulated ones; the remaining bars can be read in the same manner.

To gain more insight into the differential regulation of genes related to the pinecone reproductive process, we performed GO classification of all DEGs at every developmental stage (Figure S5). The number of differentially expressed unigenes classified in the 'reproduction' category under 'biological process' at each stage is shown in Figure S6. Three groups displayed more up- than down-regulated genes (ZA-MA, ZD-MD and ZF-MF), while the other four groups showed more down- than up-regulated genes (ZB-MB, ZC-MC, ZE-ME and ZG-MG). The ZG-MG group contains 101 down-regulated genes, almost double the number of up-regulated genes (51 genes).

We then determined whether specific cellular processes are differentially affected at each pinecone stage by performing a KEGG pathway enrichment analysis (Figure S7).

#### *3.5. Quantitative Real-Time PCR Validation*

We randomly selected five unigenes to validate the accuracy of our RNA-seq data set using qRT-PCR: Unigene12135\_All, Unigene31229\_All, Unigene5965\_All, Unigene69986\_All and Unigene71003\_All. We tested cDNA derived from four samples (MA, ME, ZA and ZE), which were collected in different years from the two different taxa. Details of the sequences and primers are listed in Tables S3 and S4, respectively. The validation results show a reliable correlation between RNA-seq and qRT-PCR (Figure S8).

#### *3.6. Temporal Gene Expression Profiles of P. massoniana and the Z Pine*

We analyzed gene expression dynamics of all unigenes across pinecone developmental stages for both species and clustered these into 49 unique expression profiles. The eighteen most frequently occurring profiles for each species are shown in Figure 8a. Within *P. massoniana*, the top five expression profile types are 16, 10, 31, 34 and 40, with respectively 8416, 7967, 5532, 3291 and 3092 genes showing these expression dynamics. In the Z pine the most frequently occurring profiles are profile types 40, 10, 27, 29 and 44, represented by 5610, 4336, 3441, 3371 and 3361 genes. *P. massoniana* and the Z pine shared fourteen profile types among their respective top 18 profiles (Figure 8a). In addition, we carried out a profile comparison between *P. massoniana* and the Z pine. Every single profile of *P. massoniana* is listed, with similar profiles of the Z pine placed to its right, in Figure 8b.


**Figure 8.** (**a**) The eighteen expression patterns of *P. massoniana* (M) and the Z pine (Z) that contain the highest numbers of genes. The number on the upper left of each square indicates the profile type, the number on the lower left indicates the number of genes within the profile, and the line shows the expression pattern; colored squares are significant profiles while white ones are not significant; (**b**) All *P. massoniana* (M) patterns compared to similar patterns in the Z pine (Z). The leftmost column of each part shows the profiles of *P. massoniana*, with their similar Z pine counterparts listed on the right; significant profiles of M are marked with profile numbers and colors.

#### *3.7. Reproductive Genes Are Differentially Expressed between P. massoniana and the Z Pine*

Next, we examined whether genes related to reproduction are differentially expressed between *P. massoniana* and the Z pine, which could potentially explain the reproductive problems that the Z pine experiences. Following recent reports, we looked for DEGs involved in processes such as pollen development, pollen exine formation, pollen tube growth and development of the female gametophyte, endosperm, embryo and/or embryo sac (Table S5). Some of these genes show consistently higher expression in *P. massoniana* than in the Z pine, including *ACA7* (Ca<sup>2+</sup>-ATPase), *MPK4* (mitogen-activated protein kinase), *QRT2* (polygalacturonase), *TKPR1* (tetraketide alpha-pyrone reductase 1), *PI5K* (phosphatidylinositol 4-phosphate 5-kinase), *PMEs* (pectin methylesterase), *SEC6* (exocyst complex component 6), *SEC15* (exocyst complex component 15), *SWK2* (slow walker 2), *PPR* (pentatricopeptide repeat-containing protein), *EMB* (embryo), *LEA* (late embryogenesis abundant protein), *SERK* (somatic embryogenesis receptor kinase) and *BLH* (*BEL1-like* homeodomain protein) (Figure 9). Other genes related to reproduction show a similar expression level in both pines, such as *SHT* (spermidine hydroxycinnamoyl transferase), *SEC5* (exocyst complex component 5), *SEC8* (exocyst complex component 8), *EYE* (embryo yellow), *EDD1* (embryo defective development 1) and *EDA* (embryo sac development arrest).


**Figure 9.** Expression levels of DEGs related to the reproductive process at successive pinecone developmental stages of *P. massoniana* and the Z pine. Expression level values in this figure have been transformed to a log10(FPKM+1) value. (**a**) pollen development; (**b**) pollen exine formation; (**c**) pollen tube growth; (**d**) female gametophyte; (**e**) embryo development; (**f**) embryo sac.

*ACA7* belongs to the auto-regulated Ca<sup>2+</sup>-ATPase family, which is exclusively detected in developing flowers of *Arabidopsis* and participates in the regulation of Ca<sup>2+</sup> homeostasis [28]. *MPK4* plays an important role in plant growth, development and male fertility [29]. *QRT2* is necessary for pollen grain separation and is also involved in pollen development [30]. *TKPR1* takes part in a biosynthetic pathway leading to hydroxylated α-pyrone compounds [31]. *SHT* encodes an acyltransferase that conjugates spermidine to hydroxycinnamic acids, affecting the composition of the *Arabidopsis* pollen wall [32,33]. *NPG1* in *Arabidopsis* is specifically required for pollen germination [34] but not for pollen development [35]. A type B *PI5K* mediates *Arabidopsis* and *Nicotiana* pollen tube growth by regulating apical pectin secretion [36]. *PMEs* and their pro-regions modulate the cell wall dynamics of growing pollen tubes in *Nicotiana tabacum* [37]. The exocyst contributes to the morphogenesis of polarized cells in many eukaryotes; for example, *SEC8* facilitates the initiation and maintenance of polarized growth of pollen tubes [38]. *SWK2* has an essential role in the coordinated mitotic progression of the female gametophyte in *Arabidopsis* [39]. Absence of *CRINKLY4* can inhibit aleurone differentiation, disrupting its normal progression over the endosperm surface [40]. *PPR* is required for embryo and seed viability in *Arabidopsis*, and its absence leads to embryo abortion [41,42]. *EYE* controls Golgi-localized proteins that have an important role in cell and organ expansion [43]. *EDD1* encodes a protein targeted to plastids and mitochondria, and loss-of-function mutation of *EDD1* causes embryo lethality [44]. *LEA* and *SERK* play key roles during embryogenesis, and *SERK* is essential for embryogenic competence [45,46]. Misexpression of *BLH1* leads to a cell-fate switch from synergid to egg cell in the embryo sac of the *Arabidopsis* *eostre* mutant [47].

#### **4. Discussion**

#### *4.1. Gymnosperm Gene Annotation Using Transcriptome Data*

Transcriptome analysis based on RNA sequencing is an effective way to explore the huge genomes of plants like gymnosperms. Several RNA sequencing studies related to gymnosperms have previously been reported [48–51]. Yet until now few studies have focused on the development of *Pinus* reproductive organs. In *Pinus tabuliformis*, unusual bisexual cones were found; here, the gene expression pattern of *MADS-box* transcription factors, *FT*/*TFL1-like* and *LFY*/*NDLY* genes was compared between unisexual and bisexual cones [52]. In *Pinus bungeana*, 39.62 Gb of RNA sequencing data was analyzed from two kinds of sexual cones, obtaining 85,305 unigenes, 53,944 (63.23%) of which were annotated in public databases [53].

In this study, we collected a total of 160 Gb of RNA sequence data from *P. massoniana* and its introgression hybrid at seven different stages of pinecone development. N50 is a key parameter in genome or transcriptome assembly. With contigs sorted from longest to shortest, it is the length of the contig at which the cumulative length first reaches 50% of the total assembly length. In principle, the higher the N50 value, the more contiguous the assembly. We obtained an N50 of 2494 bp for the combined ('All') unigene set, compared to previously reported values of 551 bp for *Picea abies* [48] and 1942 bp for *P. bungeana* [53], which suggests that our assembly improves on previously available data.
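The N50 statistic discussed above can be computed directly from a list of contig (or unigene) lengths. The following Python sketch uses invented lengths purely for illustration; the function name is our own and is not part of any assembler's interface.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

# Toy assembly: total length 30, so the threshold is 15
toy_lengths = [2, 3, 4, 5, 6, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)  # 6, because 10 + 6 = 16 is the first running total >= 15
```

Because N50 depends on the length distribution rather than on base-level accuracy, it is best read as a measure of contiguity and compared only between assemblies built with similar length cut-offs.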

A total of 30,943 genes (47.05%, rank 1) were annotated to *Picea sitchensis* through the Nr database, with further annotations being 1174 (1.79%, rank 7) to *Pinus tabuliformis*, 1074 (1.63%, rank 8) to *Pinus taeda*, 697 (1.06%, rank 12) to *Pinus monticola*, 401 (0.61%, rank 18) to *Pinus radiata* and 376 (0.57%, rank 19) to *Picea abies*; all these species are conifers and belong to the Pinaceae family. Of these, two (*Pinus taeda* and *Picea abies*) have had their genomes sequenced [1,54]. The genome sizes of *Pinus taeda* and *Picea abies* are 21.6 Gb and 19.7 Gb, respectively. Pines have an estimated genome size ranging from 18 Gb to 40 Gb [55–57]. This indicates that many genes in *P. massoniana* and the Z pine remain completely unknown, and more are expected to be discovered in the future.

#### *4.2. Impact of Introgression in Expression Levels*

Introgressive hybridization implies repeated backcrossing of hybrids with parental species [58]. Hybridization between pines occurs frequently in nature [59,60]. As pine pollen is mainly dispersed by wind, it can spread over a vast area. As a result, individuals within a population may carry varying proportions of the parental genomes. Such differences between individuals could lead to variation in gene expression, particularly in genes related to reproduction. During sample collection, we therefore pooled cones for each sample code, and also pooled material for RNA isolation, to reduce possible expression bias in the subsequent analysis.

#### *4.3. Differential Expression of Reproductive Genes Could Relate to Delayed Maturation of Z Pinecones*

Within the *Pinus* genus, some female cones take 1.5 to 3 years to mature after pollination, while for *P. massoniana* and the Z pine specifically, maturation takes around 1.5 years. Around the middle and lower reaches of the Yangtze River, these two pines are often pollinated in April and mature cones emerge in November of the next year, a long time compared to most angiosperms. The structure of a pinecone is rather complex compared to an angiosperm flower. Therefore, more genes related to reproduction may be active in cones than in flowers, and more pathways involving these genes may also operate during this process.

Genes directly related to reproduction in gymnosperms have only rarely been reported. One such group is the *MADS* genes, which are relatively well studied, for example in *Gnetum* spp. [61,62], *Ginkgo biloba* [63], *Picea abies* [64] and *Cryptomeria japonica* [4]. *LEAFY* is also a crucial gene involved in the reproductive process in *Welwitschia mirabilis* [65] and *Pinus caribaea* var. *caribaea* [66], as is *NEEDLY* in *Pinus radiata* [67]. We collected several such genes from model plants (e.g., *Arabidopsis thaliana*) and analyzed their expression levels in the two pines. We found that *ACA7*, *MPK4*, *QRT2*, *TKPR1*, *PI5K*, *PMEs*, *SEC6*, *SEC15*, *SWK2*, *PPR*, *EMB*, *LEA*, *SERK* and *BLH1* showed higher expression levels in *P. massoniana* than in the Z pine. This result indicates that the Z pine may have lowered expression of genes related to pollen development, pollen exine formation, pollen tube growth and female gametophyte, embryo and/or embryo sac development, compared to *P. massoniana*. This outcome improves our understanding of a possible molecular mechanism responsible for the altered reproductive process of the Z pine in comparison to *P. massoniana*.

#### **5. Conclusions**

*P. massoniana* and *P. hwangshanensis* mainly grow in southern China and produce an introgression hybrid, which we here temporarily named 'the Z pine,' on Mt. Lushan, where both species can be found. The Z pine has morphological characters derived from both parent species, yet has extremely low germination and ripening rates. To understand the molecular mechanism that might be causing this delayed reproduction, we collected cones of *P. massoniana* and the Z pine at seven successive developmental stages and sequenced their transcriptomes, with the aim of discovering differentially expressed genes underlying the observed reproductive delay. We obtained 93,291 unigenes with an average length of 1987 bp and an N50 of 2494 bp. We identified significantly differentially expressed genes (DEGs) at all seven cone growth stages. We screened for DEGs related to reproduction, including pollen tube growth and the development of the female gametophyte and embryo. Several potentially vital genes were identified, and their expression levels in the two pines were compared and analyzed. These results may offer insight into the molecular mechanisms underlying the reproductive differences between the two pines, and between other plants that show a similar pattern of divergence.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4907/10/3/230/s1. Table S1: Quality of original sequencing data of *P. massoniana* and the Z pine; Table S2: Quality of clean data of *P. massoniana* and the Z pine; Table S3: Sequences of unigenes for qRT-PCR validation; Table S4: Primers for qRT-PCR; Table S5: Genes related to the reproductive process in *P. massoniana* and the Z pine; Figure S1: Length distribution of all unigene sequences; Figure S2: Venn diagram of annotation against five databases (NCBI Nr, NCBI Nt, SwissProt, COG and KEGG) for all unigenes; Figure S3: Species distribution of *P. massoniana* and Z pine unigenes against the NCBI Nr database; Figure S4: Differentially expressed gene analysis between *P. massoniana* and the Z pine at each stage; Figure S5: Gene ontology (GO) functional classification of Z pine versus *P. massoniana* DEGs at seven stages; Figure S6: DEGs related to reproduction in the GO classification of Z pine versus *P. massoniana* at seven stages; Figure S7: KEGG pathway enrichment analysis of Z pine versus *P. massoniana* at seven stages; Figure S8: Comparison of unigene expression results between RNA sequencing (FPKM) and qRT-PCR.

**Author Contributions:** Conceptualization, J.X., T.Y. and J.S.; Formal analysis, J.M., Y.C., L.Y., H.H., H.Z., Z.G. and J.Y.; Funding acquisition, J.X.; Investigation, J.M., Y.C., L.Y., H.H., H.Z., Z.G. and J.Y.; Project administration, J.X.; Writing—original draft, J.M.; Writing—review & editing, J.M.

**Funding:** This research was funded by National Natural Science Foundation of China (31270661) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

**Acknowledgments:** We would like to thank Youxin Du, Qiang Huang and Benzhong Zhou from Lushan Botanical Garden, Jiangxi Province and Chinese Academy of Sciences, for collecting samples on Mt. Lushan. We also thank two reviewers for insightful comments on this article. Special thanks go to the editors for their help in formulating the revisions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Diversity Estimation and Antimicrobial Activity of Culturable Endophytic Fungi from** *Litsea cubeba* **(Lour.) Pers. in China**

**Fei Wu <sup>1,†</sup>, Dingchao Yang <sup>1,†</sup>, Linping Zhang <sup>1,\*</sup>, Yanliu Chen <sup>1</sup>, Xiaokang Hu <sup>2</sup>, Lei Li <sup>3</sup> and Junsheng Liang <sup>3</sup>**


Received: 6 December 2018; Accepted: 31 December 2018; Published: 6 January 2019

**Abstract:** Endophytes are important components of forest ecosystems, and have potential use in the development of medical drugs and the conservation of wild medicinal plants. This study aimed to examine the diversity and antimicrobial activities of endophytic fungi from a medicinal plant, *Litsea cubeba* (Lour.) Pers. A total of 970 isolates were obtained from root, stem, leaf, and fruit segments of *L*. *cubeba*. All the fungal endophytes belonged to the phylum Ascomycota and could be classified into three taxonomic classes, nine orders, twelve families, and seventeen genera. SF15 (*Colletotrichum boninense*) was the dominant species in *L*. *cubeba*. Leaves harbored a greater number of fungal endophytes but lower diversity, while roots harbored the maximum species diversity of endophytic fungi. In terms of antimicrobial activity, seventeen isolates could inhibit the growth of plant pathogenic fungi, while the extracts of six endophytes showed antimicrobial activity against all the tested pathogenic fungi. Among these endophytes, SF22 (*Chaetomium globosum*) and SF14 (*Penicillium minioluteum*) were particularly effective in inhibiting the growth of seven plant pathogenic fungi and could be further explored for their potential use in biotechnology, medicine, and agriculture.

**Keywords:** endophytes; medicinal plants; pathogen; molecular identification; plant-microbe interaction

#### **1. Introduction**

The demand for new and useful compounds for disease prevention and control is ever growing [1]. Antibiotic resistance, the increasing incidence of fungal diseases, and the development of superbugs cause biodiversity loss and constantly bring challenges to the field of medicine [2,3]. Thus, there is an urgent need to find new antibiotics that are more effective, have lower toxicity, and a smaller environmental impact.

Forest ecosystems cover an area of approximately 38 million square kilometers and contain substantial resources [4,5]. Endophytes are an important component of the forest ecosystem; they inhabit the internal tissues of plants, have no detrimental effects on plants, and can sometimes improve plant growth performance [6,7]. Most of the natural compounds produced by endophytes have exhibited antimicrobial activity and, in many cases, these are related to the protection of the host from phytopathogenic microorganisms [8]. The endophyte *Beauveria bassiana* has been shown to inhibit fungal pathogens through the production of bioactive metabolites [9]. The endophytic fungus *Gliocladium catenulatum* can reduce the incidence of witches' broom disease in cacao by up to 70% [10]. Furthermore, some endophytic fungi can produce the same chemical compounds as the host, such as the paclitaxel-producing fungus *Taxomyces andreanae* from *Taxus brevifolia* [11,12], and the podophyllotoxin-generating fungus *Fusarium oxysporum* from *Juniperus recurva* [13]. Over 8600 bioactive metabolites of fungal origin have been discovered [14]. It is estimated that there are approximately 1 million species of endophytic fungi in nature [15], whereas only a small percentage of endophytes have been discovered [16]. The enormous biodiversity and abundance of fungal endophytes in plant tissues point to the potential role of endophytes in the production of novel natural antimicrobial compounds.

*Litsea cubeba* (Lour.) Pers. (Lauraceae) is a woody species native to China, Indonesia, and other countries in Southeast Asia [17]. It is a valuable traditional Chinese medicinal plant that has been used to treat rheumatic diseases, stomach aches, and the common cold for thousands of years [18,19]. The active components of *L*. *cubeba* were reported to be antibacterial [20], anticancer [21], and anti-inflammatory [19]. Intercropping of *L*. *cubeba* and *Camellia oleifera* Abel. can reduce the incidence of *C*. *oleifera* disease, suggesting a role for *L*. *cubeba* in protecting economic plants from diseases. *Colletotrichum gloeosporioides* (Penz.) Penz. & Sacc. [22], *Fusarium andiyazi* Marasas, Rheeder, Lampr., K.A. Zeller & J.F. Leslie [23], *Alternaria alternata* (Fr.) Keissl. [24], *Phomopsis* sp. [25], *Ceratosphaeria phyllostachydis* Zhang [26], *Rhizoctonia solani* Kühn [27], and *Phytophthora capsici* Leonian [28] cause diseases in the main economic crops in South China, leading to a heavy decline in crop yield and quality. Currently, the microflora associated with medicinal plants is receiving increasing attention as a source of antimicrobial drugs [29]. However, to our knowledge, there are no reports on the biodiversity and bioactivity of endophytic fungi in *L*. *cubeba*. This study aimed to investigate the diversity and antimicrobial activities of endophytic fungi of *L*. *cubeba*, and, further, to screen them as potential biocontrol agents against seven plant pathogens.

#### **2. Materials and Methods**

#### *2.1. Collection of Samples and Isolation of Endophytic Fungi*

The leaves, branches, roots, and fruits of *Litsea cubeba* were collected from a planting base in Lichuan county of Jiangxi Province, China, in May 2016. The leaf and fruit samples were cut into small pieces of about 0.5 × 0.5 cm using a sterile knife, and the branch and root samples were cut into small segments 1 cm in length. These fragments were surface sterilized with 70% (v/v) ethanol for 3 min and 3% (v/v) NaClO for 3–5 min, and then rinsed with sterile water four times. Excess moisture was blotted with sterile filter paper [30]. The fragments were then cultured on potato dextrose agar (PDA) medium supplemented with streptomycin (50 U/mL) and penicillin (30 U/mL) at 25 °C in the dark for 7–15 days. Pure fungal cultures were obtained by picking hyphal tips of the developing fungal colonies. The acquired isolates were preserved on PDA slants and stored at 4 °C for identification.

#### *2.2. Genomic DNA Extraction, PCR Amplification and Molecular Identification*

The isolates were first identified based on the morphological characteristics of the colony culture and spores. Fungal genomic DNA was extracted from the mycelia using an Ezup Column Fungi Genomic DNA Purification Kit (Sangon Biotech, Inc., Shanghai, China) according to the manufacturer's protocol. The internal transcribed spacer (ITS) regions were amplified using the universal primers ITS1 (5′-TCCGTAGGTGAACCTGCGC-3′) and ITS4 (5′-TCCTCCGCTTATTGATATGC-3′) [31]. The reaction mixtures (50 μL) contained 25 μL of 2 × Taq PCR Master mixture (Sangon Biotech, Inc., Shanghai, China), 2 μL of ITS4, 2 μL of ITS5, 2 μL of template DNA, and 19 μL of ddH2O. The reaction conditions were 94 °C for 5 min, 30 cycles of 94 °C for 50 s, 52 °C for 50 s, and 72 °C for 1 min, and a final extension at 72 °C for 7 min. The PCR products were examined by electrophoresis in 1% (w/v) agarose gels and then purified using the Agarose Gel DNA Extraction Kit (Takara, Japan) and sequenced.
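As a quick consistency check on the amplification protocol above, the following minimal Python sketch encodes the reaction mix and the thermal program and sums the volumes and programmed hold times. The names `REACTION_MIX_UL`, `Step`, `total_volume_ul`, and `programmed_seconds` are hypothetical, the two primers are labeled generically, and the step labels (denaturation, annealing, extension) are conventional PCR terms assumed here rather than taken from the text. Ramp times between temperatures are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in the thermal program (label, temperature in deg C, duration in s)."""
    name: str
    temp_c: float
    seconds: int


# Component volumes in microlitres, as listed for the 50 uL reaction.
# Primer labels are generic placeholders (assumption, for illustration only).
REACTION_MIX_UL = {
    "2x Taq PCR master mix": 25.0,
    "primer A": 2.0,
    "primer B": 2.0,
    "template DNA": 2.0,
    "ddH2O": 19.0,
}

INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = [
    Step("denaturation", 94, 50),
    Step("annealing", 52, 50),
    Step("extension", 72, 60),
]
N_CYCLES = 30
FINAL = Step("final extension", 72, 7 * 60)


def total_volume_ul(mix):
    """Sum of all component volumes in microlitres."""
    return sum(mix.values())


def programmed_seconds():
    """Total programmed hold time in seconds, excluding ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


if __name__ == "__main__":
    print(f"Reaction volume: {total_volume_ul(REACTION_MIX_UL)} uL")
    print(f"Programmed time: {programmed_seconds() / 60:.0f} min")
```

The listed components add up to the stated 50 μL, and the program amounts to 92 min of hold time before ramping is taken into account.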

The resultant sequences were compared with sequences previously deposited in GenBank, NCBI (http://www.ncbi.nlm.nih.gov) using the Basic Local Alignment Search Tool (BLAST). Sequence alignment and phylogenetic analysis were conducted using MEGA version 7 [32]. Phylogenetic trees were constructed using the neighbor-joining method. The ITS gene sequences of the potential novel isolates were deposited in GenBank under the accession numbers MF962537–MF962573.

#### *2.3. Estimation and Quantification of Fungal Diversity*

Fungal diversity and richness in different plant tissues were measured and quantified using several indices: the colonization rate (*CR*), the isolation rate (*IR*), the Shannon-Wiener diversity index (*H'*), Simpson's diversity index (*Ds*), and the evenness index (*E*). The calculations were as follows.

$$CR = N_f / N_t \times 100,\tag{1}$$

$$IR = N_g / N_t \times 100,\tag{2}$$

$$H' = -\sum P_i \ln P_i,\tag{3}$$

$$D_s = 1 - \sum P_i^2,\tag{4}$$

$$E = H' / \ln S,\tag{5}$$

where *Nf* is the number of fragments with fungal growth, *Nt* is the total number of fragments, and *Ng* is the number of isolates of a given type [33]. *Pi* = *ni*/*N* is the relative abundance of an endophytic fungal species, where *ni* is the number of isolates of that species and *N* is the total number of isolates [34,35]. *S* is the total number of taxa (ITS genotypes) present within each sample [16].
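The indices in Equations (1)–(5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the example counts are hypothetical, and returning an evenness of 0 for a single-taxon sample (where ln *S* = 0) is a convention assumed here.

```python
import math


def colonization_rate(n_colonized, n_total):
    """CR (%): fragments with fungal growth over total fragments, Eq. (1)."""
    return n_colonized / n_total * 100


def isolation_rate(n_isolates, n_total):
    """IR (%): isolates of a given type over total fragments, Eq. (2)."""
    return n_isolates / n_total * 100


def diversity_indices(counts):
    """Return (H', Ds, E) from a mapping of taxon -> number of isolates.

    H' follows Eq. (3), Ds follows Eq. (4), and E follows Eq. (5).
    """
    values = [c for c in counts.values() if c > 0]
    total = sum(values)
    proportions = [c / total for c in values]  # Pi = ni / N

    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1 - sum(p * p for p in proportions)
    richness = len(proportions)  # S, the number of taxa in the sample
    # Assumed convention: evenness is 0 when only one taxon is present.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return shannon, simpson, evenness


if __name__ == "__main__":
    # Hypothetical isolate counts for one tissue sample (not data from the study).
    sample = {"taxon_a": 50, "taxon_b": 30, "taxon_c": 20}
    h, ds, e = diversity_indices(sample)
    print(f"H' = {h:.2f}, Ds = {ds:.2f}, E = {e:.2f}")
```

A sample dominated by one taxon yields low *H'*, *Ds*, and *E*, whereas equal counts across taxa give *E* = 1.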

#### *2.4. Antimicrobial Activity of Endophytic Fungi*

The indicator strains included the following plant pathogens: the fungi *Colletotrichum gloeosporioides*, *F*. *andiyazi*, *A*. *alternata*, *Phomopsis* sp., *Ceratosphaeria phyllostachydis*, and *R*. *solani*, and the chromist *Phytophthora capsici*, all provided by the Plant Pathology Laboratory, College of Forestry, Jiangxi Agricultural University, China.

A dual culture technique was applied to examine the antimicrobial activity of endophytic fungi from *L*. *cubeba* against fungal pathogens [36]. Mycelial discs (6 mm in diameter) of actively growing endophytes were placed at the periphery of the PDA plate. Mycelial discs of the pathogen were placed on the other side of the PDA plate, 4 cm away from the endophyte disc. A plate with only the pathogen was used as a control. Each treatment was replicated three times. The dual culture plates were incubated for 3–8 days at 25 °C. The inhibition rate against pathogens was calculated according to the formula below.

$$\text{Inhibition rate (\%)} = (R_1 - R_2)/(R_1 - 0.6) \times 100,\tag{6}$$

where R1 is the colony diameter of the control, R2 is the colony diameter under the experimental treatment, and 0.6 is the diameter of the mycelial disc (6 mm, i.e., 0.6 cm); all diameters are therefore expressed in centimetres.
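Formula (6) can be expressed as a small Python function. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and all diameters are assumed to be in centimetres so that the 6 mm mycelial disc corresponds to the 0.6 term. The example measurements are invented for illustration and are not data from the study.

```python
def inhibition_rate(control_diameter_cm, treated_diameter_cm, disc_diameter_cm=0.6):
    """Inhibition rate (%) following Formula (6).

    control_diameter_cm: colony diameter of the pathogen on the control plate (R1)
    treated_diameter_cm: colony diameter of the pathogen under treatment (R2)
    disc_diameter_cm:    diameter of the inoculated mycelial disc (assumed 0.6 cm)
    """
    net_control_growth = control_diameter_cm - disc_diameter_cm
    if net_control_growth <= 0:
        # No growth beyond the disc on the control plate: the rate is undefined.
        raise ValueError("control colony must be larger than the mycelial disc")
    return (control_diameter_cm - treated_diameter_cm) / net_control_growth * 100


if __name__ == "__main__":
    # Hypothetical diameters: 6.0 cm on the control plate, 2.0 cm under treatment.
    print(f"{inhibition_rate(6.0, 2.0):.2f}%")
```

Subtracting the disc diameter in the denominator means that a treated colony that never grows beyond the inoculated disc scores 100%, and a treated colony as large as the control scores 0%.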

The endophytes with high antimicrobial activity were selected, and the in vitro antimicrobial activity of their extracts was investigated. Each endophyte was cultured separately in 200 mL of PDA liquid medium at 25 °C with shaking at 150 rpm for 8–12 days. The culture broth was collected by filtration and extracted three times with an equal volume of ethyl acetate. The organic phase was evaporated to dryness using a rotary evaporator. The dry extract was dissolved in 3 mL of methanol and formulated into 15 μg/mL of mycelia broth.

In vitro antimicrobial tests were conducted by measuring the growth of the pathogenic fungi. Mycelial discs (6 mm in diameter) of the pathogen were placed in the center of a PDA plate containing 1.5 mL of mycelia broth. A PDA plate without mycelia broth (containing only 1.5 mL of methanol) was used as the control. The tested plates were cultured at 25 °C for 3–7 days. The inhibition rate was calculated using Formula (6).

#### *2.5. Statistical Analyses*

Statistical tests were performed using SPSS 13.0 (SPSS Inc., Chicago, IL, USA). Tukey's multiple range test was used for pairwise multiple comparisons between treatments.

#### **3. Results**

#### *3.1. Identification and Composition of Endophyte Assemblage*

A total of 970 isolates were obtained from root, stem, leaf, and fruit segments of *L*. *cubeba* (Table 1). The maximum number of isolates was obtained from the leaves (438 isolates), followed by stems (241 isolates), fruits (149 isolates), and roots (142 isolates). Molecular identification of the isolates was conducted based on a comparative analysis of ITS gene sequences and their similarity to reference sequences (Figure 1). The results showed that the isolated endophytic fungi could be allocated to 36 operational taxonomic units (OTUs). All of them belonged to the Ascomycota phylum and were classified into three taxonomic classes (Eurotiomycetes, Dothideomycetes, and Sordariomycetes), nine orders (Eurotiales, Botryosphaeriales, Pleosporales, Hypocreales, Chaetosphaeriales, Sordariales, Diaporthales, Xylariales, and an unassigned order), twelve families and seventeen genera. Twenty-three fungal morphotypic groups were taxonomically assigned to species, and the other 13 were classified at the genus level (Table 1). SF15 (*Colletotrichum boninense*) accounted for 39.79% of the total isolates and was the dominant species in the whole fungal endophytic community, followed by SF4 (*Botryosphaeria dothidea*) (6.60%).

#### *3.2. Diversity Estimation of Endophytic Fungi*

The biodiversity of endophytic fungi in *L*. *cubeba* was quantitatively investigated in terms of the colonization rate (*CR*), isolation rate (*IR*), Shannon-Wiener (*H'*), and Simpson's (*Ds*) diversity index and evenness index (*E*) (Table 2). The total *H'* and *Ds* were 2.52 and 0.82, respectively. The highest biodiversity of endophytic fungi was observed in roots (*H'* = 2.74, *Ds* = 0.90), followed by stems (*H'* = 2.56, *Ds* = 0.90), fruits (*H'* = 1.99, *Ds* = 0.76), and leaves (*H'* = 1.43, *Ds* = 0.56). The leaf samples had the highest endophytic fungi colonization rate but the lowest species evenness (*E* = 0.51) compared to the other plant parts.

#### *3.3. In Vitro Antimicrobial Activity of Endophytic Fungi*

The results of the dual culture experiments showed that 17 isolates inhibited the growth of pathogenic fungi, as manifested by the occurrence of an inhibition zone or by mycelial atrophy of the pathogens (Table 3). Among them, 10 isolates exhibited antibiotic effects on all the tested pathogenic microbes. SF22 (*Chaetomium globosum*) showed strong activity against *Ceratosphaeria phyllostachydis*, *Phomopsis* sp., and *Alternaria alternata*, with inhibition rates of 78.43, 73.20, and 70.23%, respectively.

The antimicrobial tests on the fermentation products showed that the fermentation products of SF14, SF22, SF23, SF27, SF29, and SF32 had antimicrobial activity against all the tested pathogenic fungi (Table 4). The antimicrobial activity of the fermentation products was stronger than that of the endophytic fungi themselves. The inhibition rate of SF22 (*Chaetomium globosum*) extracts against *Ceratosphaeria phyllostachydis* was 93.24%. The inhibition rate of SF14 (*Penicillium minioluteum*) extracts against *Phomopsis* sp. was 87.87%. The inhibition rates of the fermentation products of these two isolates against the other six pathogens were over 60%.



#### *Forests* **2019**, *10*, 33


**Table 3.** Antimicrobial activities of endophytic fungi from *Litsea cubeba* (Lour.) Pers.



w: Pathogen hyphae shrink; –: No inhibition.


**Table 4.** Antimicrobial activity of the metabolites of endophytic fungi from *Litsea cubeba* (Lour.) Pers.

at *p* ≤ 0.05 according to Tukey's test. 1-*Colletotrichum gloeosporioides* (Penz.) Penz. & Sacc.; 2-*Phytophthora capsici* Leonian; 3-*Fusarium andiyazi* Marasas, Rheeder, Lampr., K.A. Zeller & J.F. Leslie; 4-*Alternaria alternata* (Fr.) Keissl.; 5-*Phomopsis* sp.; 6-*Ceratosphaeria phyllostachydis* Zhang; 7-*Rhizoctonia solani* Kühn.

**Figure 1.** Neighbor-joining phylogenetic tree based on internal transcribed spacer (ITS)–rDNA gene sequences of endophytic fungi associated with *Litsea cubeba* (Lour.) Pers. Bootstrap percentages (>50) after 1000 replications are shown.

#### **4. Discussion**

Medicinal plants are legitimate targets for the isolation of endophytic fungi because of the role of these fungi in producing pharmacologically important secondary metabolites [37]. These fungal endophytes can be used to treat plant diseases. This is the first study to demonstrate the diversity, phylogeny, and bioactive potential of endophytic fungi associated with the medicinal plant *L*. *cubeba*. In this study, all the fungal isolates were identified as Ascomycota, which is consistent with previous findings on *Ophiopogon japonicus* [38], *Calotropis procera* [39], and *Cannabis sativa* [35]. It is estimated that the phylum Ascomycota covers about 8% of the Earth's land and is among the most prevalent and diverse phyla of eukaryotes [37,40]. Endophytic fungi are ubiquitously distributed throughout various classes of Ascomycota, including Eurotiomycetes, Dothideomycetes, Leotiomycetes, Pezizomycetes, and Sordariomycetes [6,41]. Katoch et al. [37] observed that the endophytic fungi in *Monarda citriodora*, a medicinal plant, were mainly distributed in the class Sordariomycetes, followed by Eurotiomycetes and Dothideomycetes. A similar representation of classes was found in this study, indicating that the endophytic fungi isolated here were cosmopolitan endophytes.

The fungal endophytes discovered in *L*. *cubeba* in this study were not identical to those reported in other studies. Ho et al. (2012) [42] isolated endophytic fungi from twigs of seven medicinal herbs belonging to the Lauraceae family (including *L*. *cubeba*) and found that the endophytes from *L*. *cubeba* belonged to six genera (*Pestalotiopsis*, *Arthrinium*, *Diaporthe*, *Xylaria*, *Hypoxylon*, and *Pyrenochaeta*). Only two genera (*Pestalotiopsis* and *Diaporthe*) were consistent with the results of the present study. This may be due to differences in sites, seasons, and climates [6].

Variation in endophytic communities was also found in their spatial distribution. The endophytic community in *L*. *cubeba* exhibited tissue specificity. A similar phenomenon was observed in *Dendrobium officinale* [16], which may be caused by different external environments or by biological differences among tissues and organs [6]. Microorganisms in the environment usually show lower diversity and abundance than those in the soil [43]. The results of the present study support this point: roots harbored the maximum species diversity of endophytic fungi. Leaves harbored a greater number of fungal endophytes but a lower diversity than the other plant samples. This may be because the large surface area and the presence of stomata in leaves exposed to the external environment provide access for the entry of fungal mycelium, so that leaves may harbor a greater number of endophytic fungi [36]. However, the substantial organic compounds in leaves are largely inaccessible to foliar microorganisms, and microorganisms may be present in the leaves in the form of co-metabolism, thus limiting the diversity of endophytic fungi in leaves [4,44,45].

Some fungal endophytes have been considered beneficial mutualists that protect the host from pathogens [46]. In this study, the fungal endophytes were investigated for antifungal activity using a dual culture method. The results showed that 17 isolates inhibited the growth of plant pathogenic fungi. SF22 (*Chaetomium globosum*) showed the strongest anti-pathogen activity. Previous studies demonstrated that some endophytic fungi could produce metabolites with antimicrobial function [6,37]. The endophytic extracts were screened for antifungal activity, and the results indicate that six endophytes exhibited strong anti-pathogen activity. The extracts of SF22 (*C*. *globosum*) and SF14 (*Penicillium minioluteum*) were particularly effective in inhibiting pathogen growth. The dominant fungus, SF15 (*Colletotrichum boninense*), was less efficacious, though previous studies reported that *Colletotrichum* sp. showed a broad range of antifungal activity [47]. This finding suggests that there was no direct relationship between antifungal activity and fungal colonization rate [36]. *Chaetomium globosum* was reported to have disease control capacity by producing chaetoviridins and chaetoglobosin [48,49]. The application of culture filtrates of *C*. *globosum* to maize showed efficacy in the inhibition of northern corn leaf blight [48]. *Penicillium* sp. was also reported to be efficacious against plant pathogenic fungi [50] and, interestingly, *P*. *minioluteum* has attracted more attention for its beneficial effects on plant stress tolerance [51]. The growth inhibitory activity of these endophytes against plant pathogenic fungi indicates that endophytic fungi have the potential to be used as biocontrol agents in the future.

#### **5. Conclusions**

This study is the first to investigate the diversity of endophytic fungi in *L*. *cubeba*. The results demonstrated that *L*. *cubeba* harbors a rich fungal endophytic community with antimicrobial activities. SF22 (*C*. *globosum*) and SF14 (*P*. *minioluteum*) were found to be active against pathogenic fungi and, thus, could be sources of novel natural antimicrobial compounds. The results also highlight the potential use of endophytes in the development of drugs and the conservation of medicinal plants.

**Author Contributions:** L.Z. designed the study; D.Y. and Y.C. carried out the experiment and analyzed the data. F.W. wrote the first draft of the manuscript; F.W., D.Y., L.Z., Y.C., X.H., L.L. and J.L. contributed with suggestions and corrections, and approved the final manuscript. F.W. and D.Y. contributed equally to this work.

**Funding:** This work was supported by the National Natural Science Foundation of China [grant numbers 31660189, 31570594], and Hunan Provincial Natural Science Foundation of China (2018JJ2217, 2018JJ3281).

**Acknowledgments:** The authors thank the Key Laboratory of State Forestry Administration on Forest Ecosystem Protection and Restoration of Poyang Lake Watershed (JXAU) for providing experimental equipment support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*
