**1. Introduction**

The previously defined Salicaceae *sensu stricto* (*s.s.*) included only *Salix* and *Populus* [1], but more than 50 genera, containing over 1000 species, were later classified into Salicaceae *sensu lato* (*s.l.*) [2]. Members of Salicaceae *s.l.* are woody trees and shrubs, ranging in height from only a few centimeters to tens of meters. The species in Salicaceae *s.l.* are primarily distributed in cold, tropical, and warm temperate regions and occupy extremely varied habitats [2,3]. Sexual systems in this family are highly diverse: most genera are dioecious, whereas some are monoecious. Both XY and ZW sex determination systems have been reported in the dioecious species, indicating a remarkably diverse history of sex determination [4,5]. The species of this family have many uses and produce a variety of chemicals. An abundant oil rich in unsaturated fatty acids can be obtained from *Idesia* fruits [6]. The precursor of the early medicine aspirin was first isolated from willow and poplar bark, and willow species have been developed as bioenergy crops [7]. *Populus* species have become model organisms for basic research in molecular biology and genetics because of their small genome size and fast growth rates. Moreover, they play important roles in ecosystems, plant domestication, and conservation, and are among the most economically important groups of forest trees [8].

The plastid genome is widely used in plant population genetics and phylogenetic analysis because of its slow rate of nucleotide substitution in protein-coding genes and its relatively conserved gene structure and content, which can also increase phylogenetic resolution at lower taxonomic levels [9–14]. The plastid genome usually encodes about 80 unique proteins, 30 transfer RNAs (tRNAs), and four ribosomal RNAs (rRNAs) [15,16]. In photosynthetic land plants, the plastid genome is 120–220 kb in size and typically contains two inverted repeats (IRs) of 20–28 kb each, separated by a small single-copy (SSC) region of 16–27 kb and a large single-copy (LSC) region of 80–90 kb [17,18]. The size of the IR varies widely among groups at the family, genus, and species levels [13,19,20]. The two IR copies recombine with each other, which helps maintain the stability of the rest of the plastome [17,18,21]. With the development of high-throughput sequencing technology in recent years, the number of completely sequenced plastid genomes has increased rapidly [22]. The whole plastid genomes of 61 species in Salicaceae *s.l.* have been sequenced and deposited in GenBank.

Previous studies [23–26] mainly focused on the relationships among the main subclades within the genera *Salix* and *Populus*, because the delimitation of species in these genera remains controversial. The genus *Populus* contains 29–70 species, which have been grouped on the basis of morphological characteristics into the following six sections: *Abaso*, *Turanga*, *Populus*, *Leucoides*, *Aigeiros*, and *Tacamahaca* [27]. For the genus *Salix*, about 450 species have been described and two main subclades have been identified [28–30]. The most recent study used the complete plastomes of 42 species from six genera to examine phylogenetic relationships in Salicaceae [31]. Although several other genera of Salicaceae were included in that study, its main purpose was to determine the relationships among subclades in *Salix* and *Populus*. Little is known about the phylogenetic relationships of the other 48 genera in Salicaceae *s.l.*

In this study, we sequenced 20 plastid genomes in the family and added the following four previously published plastid genomes: *Populus euphratica*, *Salix interior*, *Idesia polycarpa* Maxim., and *Poliothyrsis sinensis*. In total, the dataset comprised 24 species from 18 genera of Salicaceae *s.l.*, as well as two outgroups, *Passiflora laurifolia* and *Passiflora ligularis*. Our main aims were to: (1) determine the repeat sequence variations of plastid genomes, (2) examine structural changes in the plastomes of Salicaceae *s.l.*, and (3) delimit intergeneric relationships within Salicaceae *s.l.*

## **2. Results**

#### *2.1. Characteristics of the Plastid Genomes*

Twenty complete plastid genomes belonging to fourteen genera of the family Salicaceae *s.l.* were newly generated in this study (Table 1). All of the genome sequences have been submitted to GenBank. To display the characteristics of the plastid genomes more fully, we also collected the sequences of four other species in Salicaceae *s.l.* from NCBI GenBank. In total, our subsequent comparative analysis included 24 species representing 18 genera of the family Salicaceae *s.l.* (Table 1). The total length of the chloroplast genome sequences ranged from 155,144 bp in *Flacourtia ramontchii* to 158,605 bp in *Bennettiodendron brevipes*. The genomes displayed a typical quadripartite structure, with a pair of inverted-repeat (IR) regions of 27,168–27,926 bp, separated by a large single-copy (LSC) region of 83,391–86,350 bp and a small single-copy (SSC) region of 16,305–17,889 bp. The LSC regions exhibited the greatest standard deviation in sequence length (s.d. = 872.58 bp), followed by the SSC regions (s.d. = 470.12 bp) and the IR regions (s.d. = 217.50 bp). Nucleotide composition was nearly identical in all plastid genomes, with an overall GC content of 36.8%.
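The relationship between the region lengths and the total genome length in a quadripartite plastome can be illustrated with a minimal Python sketch. The function names and the example values are hypothetical and are not taken from Table 1; the sketch also assumes the sample standard deviation, since the text does not state which estimator was used.

```python
from statistics import stdev


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def region_spread(lengths):
    """Return (min, max, sample standard deviation) for a list of lengths in bp."""
    return min(lengths), max(lengths), stdev(lengths)


# Hypothetical region lengths in bp, for illustration only (not data from this study).
example_genomes = [
    {"lsc": 84000, "ssc": 16500, "ir": 27500},
    {"lsc": 85000, "ssc": 17000, "ir": 27700},
    {"lsc": 86000, "ssc": 17500, "ir": 27900},
]

for genome in example_genomes:
    print(plastome_length(genome["lsc"], genome["ssc"], genome["ir"]))

for region in ("lsc", "ssc", "ir"):
    print(region, region_spread([g[region] for g in example_genomes]))
```

Because the IR is counted twice, a change of a given size in the IR length shifts the total genome length by twice that amount, whereas the same change in the LSC or SSC shifts it only once.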



A total of 131 functional genes with the same order were annotated in each of the newly sequenced plastomes, of which 112 were unique genes, including 78 protein-coding genes, 30 tRNA genes, and four rRNA genes (Figure 1, Table 2). Most of these genes occurred as a single copy, while 19 genes were duplicated in the IR regions (Table 1). The gene *cemA* contained premature termination codons in *Casearia decandra* Jacq. and *Casearia velutina*, while the gene *atpF* contained premature termination codons in *Olmediella betschleriana*.
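One way to tally total, unique, and duplicated genes from an annotated plastome is sketched below in Python. The function name and the short gene list are hypothetical and serve only to illustrate the counting; they do not reproduce the annotation pipeline used in this study.

```python
from collections import Counter


def summarize_gene_copies(gene_names):
    """Count total gene copies, unique genes, and genes present more than once."""
    counts = Counter(gene_names)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return {
        "total": len(gene_names),   # every annotated copy
        "unique": len(counts),      # distinct gene names
        "duplicated": duplicated,   # genes with two or more copies
    }


# Hypothetical annotation: two genes appear twice, as they would if located in the IRs.
annotation = ["psbA", "rbcL", "rrn16", "rrn16", "trnA-UGC", "trnA-UGC"]
summary = summarize_gene_copies(annotation)
print(summary)
```

When every duplicated gene has exactly two copies, the total equals the number of unique genes plus the number of duplicated genes, which gives a quick consistency check on reported gene counts.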

**Figure 1.** Gene map of 20 Salicaceae *s.l.* chloroplast genomes. Functional categories of genes are color-coded. Genes inside the circle are transcribed clockwise and genes outside the circle are transcribed counterclockwise. The dashed area in the inner circle indicates the GC content of the plastid genome.
