*2.1. Assembly and Annotation of the DNA Sequence of the Complete Mitochondrial Genome of F. sylvatica L.*

The reference specimen (FASYL\_29) sequenced in this study was selected from a set of genotyped beech trees in a provenance trial that was also used in an earlier study [54]. It originates from the German population Gransee/Brandenburg, which lies in the center of the natural distribution range of *F. sylvatica* (see also Section 4.1).

The complete DNA sequence of the mitochondrial genome of FASYL\_29 was assembled from Illumina MiSeq reads (2 × 300 bp; 24×) and validated using long reads from nanopore MinION sequencing (3.2×; MiSeq and MinION reads are accessible at SRA PRJNA648273). To validate the mtDNA sequence, the nanopore reads were mapped to the assembled sequence. The mapping result presented in Figure S1 shows that the mtDNA sequence is completely covered by overlapping nanopore long reads.

The mitochondrial genome of the *F. sylvatica* individual FASYL\_29 was assembled into a single DNA sequence with a total length of 504,715 bp and an average GC content of 45.8% (GenBank MT446430; Figure 1). The assembly may best be displayed as a circular map (Figure 1). This display does not correspond to the physical structure of the genome in vivo, where it more likely exists in different conformations ([14,26,33] among others; see Introduction).
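The GC content reported above is a simple base-composition ratio. The following minimal sketch shows one way such a value could be computed from a sequence string. It is not the tool used in this study; the function name `gc_content` and the choice to exclude ambiguous bases (e.g., N) from the denominator are assumptions made for illustration.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA sequence in percent.

    Assumption (not from the study): only unambiguous bases (A, C, G, T)
    are counted; ambiguity codes such as N are ignored.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc_bases = sum(base in "GC" for base in counted)
    return 100.0 * gc_bases / len(counted)


# Toy input for illustration only (not part of the F. sylvatica genome).
toy_gc = gc_content("GGCCAATT")
print(f"GC content: {toy_gc:.1f}%")
```

Applied to the full sequence of MT446430, a calculation of this kind would be expected to give a value close to the reported 45.8%, depending on how ambiguous bases are handled.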

Furthermore, the *F. sylvatica* mitochondrial genome contains 32 interspersed repeats greater than 50 bp (Table S1), including two repeats greater than 300 bp: one inverted repeat of 918 bp and one direct repeat of 316 bp (Figure 1). One copy of the inverted repeat is located near an ancestral gene cluster consisting of the genes *rpl16*, *rps3*, and *rps19* [18] (Figure 1). In comparison, the mtDNA sequence of *Quercus variabilis* [9] contains 17 repeats greater than 50 bp and three repeats greater than 300 bp; its largest repeat is about 17.3 kbp in size. The mitochondrial genome of *Betula pendula* [10] contains 133 repeats greater than 50 bp and two repeats greater than 300 bp; its largest repeat is 474 bp long. Various fragments of the largest repeat of *Quercus variabilis* are included with high identity in the mtDNA sequence of *F. sylvatica* (52% of the repeat included with about 97% identity; File S1) and of *Betula pendula* (38% of the repeat included with about 97% identity; File S2). A comparison of all identified *F. sylvatica* repeats with the repeats in *Quercus variabilis* and *Betula pendula* showed that several repeats are identical and many have high similarity (summary in Table S2; BlastN results in Files S3 and S4). One of the *F. sylvatica* repeats greater than 50 bp, namely repeat\_11 (81 bp in length), is 100% identical to a *Quercus variabilis* repeat of the same length (repeat\_12).

Chloroplast-like DNA sequence regions with more than 90% similarity to the *F. sylvatica* chloroplast genome sequence of the same individual, FASYL\_29 (NC\_041437.1) [4], account for about 0.72% of the FASYL\_29 mitochondrial genome and are distributed among three distinct regions in the genome (region 1: 70,870–71,106 bp with 98.3% identity; region 2: 159,645–160,312 bp with 98.8% identity; region 3: 199,831–202,538 bp with 95.2% identity). These three regions also showed increased coverage (compared to all other regions) when all available trimmed Illumina reads from DNA sequencing of FASYL\_29 were mapped to the related mitochondrial genome sequence (data not shown). The increased coverage is due to nonspecific mapping of chloroplast DNA-derived reads, in addition to mtDNA-derived reads, to these regions.
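The stated proportion of about 0.72% can be checked from the three region coordinates and the total genome length of 504,715 bp. The sketch below does this arithmetic, assuming that the coordinates are 1-based and inclusive (an assumption; the text does not state the convention). The variable and function names are illustrative.

```python
# Total length of the assembled mitochondrial genome (from the text).
GENOME_LENGTH_BP = 504_715

# Chloroplast-like regions as (start, end) pairs (from the text).
# Assumption: coordinates are 1-based and inclusive.
CP_LIKE_REGIONS = [
    (70_870, 71_106),    # region 1
    (159_645, 160_312),  # region 2
    (199_831, 202_538),  # region 3
]


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


cp_like_total_bp = sum(region_length(s, e) for s, e in CP_LIKE_REGIONS)
cp_like_percent = 100.0 * cp_like_total_bp / GENOME_LENGTH_BP

print(f"chloroplast-like sequence: {cp_like_total_bp} bp")
print(f"share of mitochondrial genome: {cp_like_percent:.2f}%")
```

Under this convention the three regions are 237 bp, 668 bp, and 2708 bp long, for a total of 3613 bp, which corresponds to roughly 0.72% of the genome.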

**Figure 1.** Circular graphical display of the assembled mitochondrial genome sequence of the *F. sylvatica* individual FASYL\_29 (GenBank MT446430). The display does not correspond to the physical structure of the genome in vivo where it more likely exists in different conformations (see main text). Pairs of interspersed direct (DR) and inverted (IR) repeats longer than 300 bp and with ≥99% sequence identity are numbered (one pair each). In addition to protein-coding and structural RNA genes of predicted function, 23 potential CDS regions (indicated as "ORFs") of unknown function with support from RNA-Seq data were predicted and mapped to the genome sequence. The grey arrows indicate the direction of transcription of the two DNA strands. A GC content graph is depicted within the inner circle. The circle inside the GC content graph marks the 50% threshold. The map was created using OrganellarGenomeDraw [55,56].

In total, 58 genes with predicted function were annotated, including 35 protein-coding genes, 20 tRNA genes, and 3 rRNA genes. The gene *mttb* is probably a pseudogene (MT446430). All of the known genes coding for subunits of proteins of the respiratory chain were identified, including *sdh3* and *sdh4* (Figure 1). Several genes coding for proteins of the small or large ribosomal subunit are missing, i.e., *rps2*, *rps7*, *rps10*, *rps11*, *rps13*, *rpl2*, and *rpl15*. The genes *nad1*, *nad2*, and *nad5* were predicted to be fragmented into five exons each, belonging to more than one distinct transcription unit (MT446430). The maturation of these genes requires cis- as well as trans-splicing events. For the following genes, more than one exon within a single transcription unit was predicted: *ccmFc*, *nad4*, *nad7*, *cox2*, and *rps3* (MT446430). The start codons of *nad1*, *nad4L*, and *cox1* are potentially created by RNA editing, as indicated by mapping RNA-Seq data from two individuals of *F. sylvatica* (RNA-Seq reads accessible at SRA PRJNA648273) to the related regions of the annotated mitochondrial genome sequence of *F. sylvatica* (Figure S2). Additionally, 23 potentially protein-coding genes of unknown function were annotated based on ORF prediction from assembled RNA-Seq data (ORF1–23 in Figure 1).

In Figure S3, the gene order of potential protein-coding genes annotated in the mitochondrial genome of *Liriodendron tulipifera* [18] was compared with that of *F. sylvatica* and *Quercus variabilis* [9] as another Fagales species (*Betula pendula* was not included in the global comparison because the mitochondrial genome has not been annotated so far). As expected, there is no conservation of synteny between *Liriodendron tulipifera* and *F. sylvatica*. Although *F. sylvatica* and *Quercus variabilis* are members of the same family (Fagaceae; in different subfamilies), no larger syntenic gene groups could be identified. However, several small collinear gene clusters inferred by Richardson et al. [18] as ancestral angiosperm gene clusters in *Liriodendron tulipifera* were also identified in the *F. sylvatica* mitochondrial genome (Figure S3). Ancestral gene clusters identified in *F. sylvatica* include among others the *sdh4*/*cox3*/*atp8*-cluster (cluster also in *Quercus variabilis*), the *atp4*/*nad4L*-cluster (not in *Quercus variabilis*), the *cob*/*rps15*/*rpl5*-cluster including *ccmFc* (physical separation of *ccmFc* from *cob*/*rps15* in *Quercus variabilis*), the *rps12*/*nad3*-cluster (also in *Quercus variabilis*), and the *rpl16*/*rps3*/*rps19*/*rpl2*-cluster without *rpl2* (*rpl2* is absent in *F. sylvatica*; cluster not in *Quercus variabilis*; Figure S3).

In the comparison of the gene order (Figure S3), two gene clusters that are not present in *Liriodendron tulipifera* were identified in both *F. sylvatica* and *Quercus variabilis*: the clusters *ccmB*/*rpl10* and *cox1*/*sdh3*. The *ccmB*/*rpl10*-cluster was also identified in the mitochondrial genome of another Fagales member in the Betulaceae family, *Betula pendula* (Figure S4), whereas *cox1* and *sdh3* are physically separated in *Betula pendula* (draft annotation of *cox1* in File S5; *sdh3* annotation by Blast analysis). In contrast to *Betula pendula*, *F. sylvatica* and *Quercus variabilis* include the tRNA gene *trnK* (UUU) upstream of the *ccmB*/*rpl10*-cluster (at a distance of about 2500 bp from *ccmB*; Figure S4). Interestingly, the *ccmB*/*rpl10*-cluster is not present in some non-Fagales members of the fabids analyzed, such as *Populus tremula* (NC\_028096, Malpighiales), *Vicia faba* (KC189947, Fabales; *rpl10* is not annotated), *Malus x domestica* (NC\_018554, Rosales; *rpl10* is not annotated), and *Citrullus lanatus* (NC\_014043, Cucurbitales; *rpl10* is not annotated).
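The idea of a shared collinear gene cluster can be illustrated with a small sketch that reports gene pairs lying next to each other in two gene orders. This is not the procedure used for Figures S3 and S4. The two gene lists below are toy orders, built only to mirror the presence or absence of clusters stated in the text; they are not the real gene orders of either genome, and the function name `adjacent_pairs` is hypothetical. Strand orientation and circularity are ignored for simplicity.

```python
def adjacent_pairs(gene_order: list[str]) -> set[frozenset[str]]:
    """Return the set of unordered neighbouring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


# TOY gene orders for illustration only (assumption, not real annotation data).
fagus_toy = ["sdh4", "cox3", "atp8", "atp4", "nad4L",
             "rps12", "nad3", "ccmB", "rpl10", "cox1", "sdh3"]
quercus_toy = ["cox1", "sdh3", "atp4", "rps12", "nad3",
               "nad4L", "ccmB", "rpl10", "sdh4", "cox3", "atp8"]

# Neighbouring pairs that occur in both toy orders.
shared_pairs = adjacent_pairs(fagus_toy) & adjacent_pairs(quercus_toy)

for pair in sorted(sorted(p) for p in shared_pairs):
    print("/".join(pair))
```

In these toy orders, the pairs from the *sdh4*/*cox3*/*atp8*, *rps12*/*nad3*, *ccmB*/*rpl10*, and *cox1*/*sdh3* clusters are reported as shared, whereas *atp4*/*nad4L* is not, in line with the description above. A real analysis would start from the annotated gene coordinates and would also need to consider strand and intervening genes.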
