#### *3.1. Contrasting Evolutionary Trajectories of Trifolium Organelle Genomes*

*Trifolium* mitogenomes (294,911 to 348,724 bp) (Table 1) are similar in size to that of *Medicago* (271,618 bp), another Trifolieae genus, which has the smallest papilionoid mitogenome sequenced to date [8]. Mitogenomes of *Trifolium* have relatively little repetitive DNA (6.6–8.6%) (Table 2) compared to mitogenomes of other Papilionoideae species (2.9–60.6%) [8]. This low repeat content in the mitogenome contrasts with the plastome of some *Trifolium* species. The acquisition of numerous, novel repeat sequences and drastic rearrangement in the plastome of *T. subterraneum* and related species has been reported [31,33,35]. Increased taxon sampling by Sveinsson and Cronk [34] revealed that plastome expansion is shared by five sections, referred to as the "refractory clade" in subgenus *Trifolium* (*Lupinaster*, *Trichocephalum*, *Trifolium*, *Vesicastrum* and *Trifoliastrum*). The distinct evolutionary trajectory of organelle genomes in the genus is particularly evident in *T. pratense*, which has the lowest percentage of repetitive DNA in the mitogenome and the highest in the plastome, as well as the most highly rearranged plastome structure (Table 2 and Figure 3). In plant mitogenomes, accumulation of repeats, genome expansions and rearrangements may be a consequence of error-prone DNA repair mechanisms such as nonhomologous end-joining or break-induced replication [48–50]. In Geraniaceae, a correlation between nonsynonymous substitution rates for DNA replication, recombination and repair (DNA-RRR) genes and plastome complexity was reported [51]. The plastome-specific increase in repeat complexity in the *Trifolium* refractory clade may be the result of disruption of 'plastid-specific' DNA-RRR protein genes, some of which are targeted to both mitochondria and plastids [7]. More comprehensive taxon sampling that includes data from all three plant genomic compartments of *Trifolium* is required to test this hypothesis.

#### *3.2. Multiple Functional Transfers of the Mitochondrial rps1 Gene to the Nucleus in Papilionoideae*

An earlier investigation reported the functional transfer of mitochondrial *rps1* to the nucleus in three genera of Trifolieae (*Trigonella*, *Melilotus* and *Medicago*) [39]. In the current study, the complete deletion of the *rps1* gene from the mitogenomes of four *Trifolium* species was detected (Figure S1), a loss that is shared by the distantly related genus *Lotus*, a member of the tribe Loteae (Figure S2). There are two possible explanations for the phylogenetic distribution of the loss/transfer. The loss of mitochondrial *rps1* could be due to a single IGT in a common ancestor with differential resolution in descendant lineages, that is, acquisition of functional signals (or not) to stabilize the transfer. Alternatively, there may have been independent functional transfers from an ancestor in each of the two unrelated lineages. To examine these alternatives, a maximum likelihood (ML) analysis was conducted using expanded taxon sampling of nuclear and mitochondrial *rps1* sequences. The resulting tree (Figure 5) included some long branches, which may be affected by the well-known phenomenon of long-branch attraction [52]. Nuclear *rps1* sequences from *Lotus* and Trifolieae species were split into two independent clades, with intact and pseudogenized mitochondrial *rps1* placed between them. This pattern supports the explanation that functional transfers of *rps1* occurred at least twice in Papilionoideae, once in *Lotus* and once in the ancestor of the Trifolieae clade that includes *Trigonella*, *Melilotus*, *Medicago* and *Trifolium*. The functional transfer of *rps1* in Trifolieae likely occurred after the divergence of *Ononis* (Figure 5), which only has a mitochondrial copy [39].

Despite the putative functional replacement by nuclear *rps1*, the mitochondrial *rps1* in three genera (*Trigonella*, *Melilotus* and *Medicago*) was retained with limited sequence divergence (Figure 5), whereas it is completely and precisely deleted in *Trifolium* (Figure S1). Coding regions of plant mitogenomes are conserved by an accurate long homology-based repair mechanism, while non-coding regions are not conserved and are repaired by error-prone mechanisms [50]. Differential selection on mitogenomic molecules, which reduces harmful mutations in coding regions after double strand breaks (DSBs), was proposed to explain this [48,49]. Pseudogenized copies of mitochondrial *rps1* in the three genera *Trigonella*, *Melilotus* and *Medicago* are located adjacent to *nad5* exon1 (ca. 200 bp apart) [39]. Mutations in the 5′ region of *nad5* exon1 that do not disturb transcription or translation of the functional gene and only affect pseudogenized *rps1* can be inherited by selection after DSBs. Thus, the location of mitochondrial *rps1* adjacent to *nad5* exon1 may enable retention of high sequence identity after functional replacement by sharing the benefit of accurate repair. A similar situation is known for the *rps14* pseudogene that is adjacent to *rpl5* in grasses [53]. Conservation of non-coding regions adjacent to coding regions is also present in mitogenome-wide sequence divergence comparisons across Fabaceae [8].

#### *3.3. Shared DNA Among Genomes of Trifolium*

Comparative analyses of the three genomic compartments (nuclear, mitochondrial and plastid) in *T. pratense* revealed a substantial amount of shared DNA between nuclear and organelle genomes, most of which consisted of short fragments (Figure 4, Table 3). The DNA shared between the nuclear and mitochondrial genomes totaled 135.4 kb (Figure 4) and had a GC content more similar to that of mitogenomes (Tables 1 and 3), suggesting that most IGT was unidirectional (i.e., mitochondrion to nucleus) and that the nuclear genome of *T. pratense* includes numerous NUMTs. These NUMTs may integrate into the nuclear genome of *T. pratense* as short fragments. Alternatively, these short fragments may be the consequence of post-IGT mutational decay and rearrangement of longer NUMT sequences [54].
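
The GC-content comparison used above to infer the direction of transfer can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, and the reference GC values in the usage example are placeholders rather than the values in Tables 1 and 3. The sketch computes the GC percentage of a shared fragment and reports which compartment's genome-wide GC content it most closely matches.

```python
def gc_content(seq):
    """Return the GC percentage of a DNA sequence, ignoring non-ACGT symbols."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def closest_compartment(fragment_gc, reference_gc):
    """Return the compartment whose reference GC percentage is nearest.

    reference_gc maps a compartment name to its genome-wide GC percentage.
    """
    return min(reference_gc, key=lambda name: abs(reference_gc[name] - fragment_gc))


if __name__ == "__main__":
    # Placeholder reference values for illustration only (assumptions,
    # not the values reported in Tables 1 and 3).
    reference = {"mitochondrial": 45.0, "nuclear": 33.0}
    fragment = "GGCATCGATTAGCCGTAAGC"  # hypothetical shared fragment
    print(closest_compartment(gc_content(fragment), reference))
```

GC similarity alone is only suggestive, since short fragments have noisy base composition; it complements rather than replaces sequence-identity evidence.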

The discovery of a long stretch of NUMTs (spanning 348.5 kb; GC: 44.3%) in chromosome 4 of *T. repens* (Figure S3) supports a recent genomic scale IGT event. This type of large IGT was identified in *Arabidopsis thaliana* (Brassicaceae), in which ~270 kb of the 367 kb mitogenome was transferred to the nucleus [55] and covers an ~620 kb region of the nuclear genome [56]. To estimate the amount of NUMTs in *T. repens*, a mitogenome sequence from the same DNA source (white clover cv 'Crau' derivative) [46,57] is necessary. Large NUMTs were reported for animal nuclear genomes (little brown bat and fugu); however, these were later shown to represent artifacts of genome assembly [58,59]. The nuclear genomes of *Trifolium* species are drafts with many gaps [43–46]. Verification of long putative NUMTs in *Trifolium* is needed to confirm genomic scale IGT events from the mitochondrial to the nuclear genome.

#### *3.4. Multiple Fissions of ccmF in Land Plants and a Novel Event in Trifolium*

The first fission of mitochondrial *ccmF* dates back to the early evolution of land plants and split the gene into N-terminal (*ccmFn*) and C-terminal (*ccmFc*) coding regions [60]. In Marchantiales, the ORFs are closely adjacent (Figure S5). The mitogenome study of *Marchantia paleacea* (misidentified as *M. polymorpha* [61]) from the early 1990s [22] reported a fission of *ccmFc* (i.e., *ccmFc1* and *ccmFc2*) due to a single nucleotide deletion. This fission event was accepted in several subsequent papers [3,21,60]; however, mitogenome sequences of two other *Marchantia* species (*M. inflexa* and *M. polymorpha* subsp. *ruderalis*) did not show the single nucleotide deletion, consistent with the other two available mitogenomes of Marchantiales (Figure S5). The initial report of a *ccmFc* fission in *Marchantia* should be re-examined to determine whether it is specific to *M. paleacea* or the result of sequencing error.

In angiosperms, two independent fissions of *ccmFn* have been reported in *Allium* (Amaryllidaceae) [25] and Brassicaceae [24,62]. In both cases, *ccmFn1* and *ccmFn2* are distant from each other in the mitogenome and they share a similar breakpoint for the fission (Figure 6). The phylogenetic distribution of the fission in Amaryllidaceae was investigated by polymerase chain reaction using four genera in the family (*Narcissus*, *Tulbaghia*, *Ipheion* and *Allium*) and revealed that the separation of the two sequences is restricted to *Allium* [25]. However, the absence of physical separation of *ccmFn* sequences in the other three genera does not guarantee that the gene is not split, because there are cases of gene fission where the two new genes occupy a single locus, for example, fission of *ccmF* (into *ccmFn* and *ccmFc*) in Marchantiales (Figure S5) and of *ccmFn* (into *ccmFn1* and *ccmFn2*) in *Trifolium* (Figure 2). The distribution and status of *ccmFn* fission in Amaryllidaceae need further investigation, including broad taxon sampling as well as confirmation with additional sequencing.

In Brassicaceae, it was argued that the fission is shared by all members of the family because it is present in five complete or draft mitochondrial genomes covering the earliest diverging genus (*Aethionema*) and other core genera (*Arabidopsis*, *Brassica*, *Raphanus*), whereas the mitogenome of the sister family Cleomaceae does not have the fission [62]. Further investigation, including additional published mitogenomes and assembled mitochondrial contigs for *ccmF* genes (Table S2), indicates that three species of *Aethionema* do not have the fission of *ccmFn* (Figure 6b). This discrepancy could be due to an assembly error, since the *Aethionema* data in the previous study came from a draft mitogenome [62]. Whatever the cause of the discrepancy, it is clear that the fission of *ccmFn* is shared by many but not all Brassicaceae. The fission occurred after the divergence of *Aethionema* (Figure 6b); however, it is unknown whether there was an intermediate stage that had experienced the fission but not the physical separation of *ccmFn1* and *ccmFn2*.

The independent fission of *ccmFn* in *Trifolium* represents a novel event. The fission was caused by a deletion of 59 bp resulting in a frame shift and premature stop codon (Figure 2). An alternative outcome of this deletion may be pseudogenization of *ccmFn*. Mutational decay and deletion of pseudogenized mitochondrial genes can be delayed by proximity to functional genes (e.g., *rps1* in some Trifolieae genera and *rps14* in grasses, see Section 3.2). However, the gene that is consistently adjacent to *ccmFn* (*ccmFn1* and *ccmFn2*) is *ccmC*, which is ca. 8 kb away from *ccmFn* in the four *Trifolium* species (Figure 1). Moreover, the expanded *ccmFn* sequence sampling confirms that the two ORFs (*ccmFn1* and *ccmFn2*) are conserved in eight *Trifolium* species with only a limited amount of sequence variation in coding regions (Figure S4). The fission breakpoint in *Trifolium* is different from that of other angiosperms that express cytochrome c maturation protein from two ORFs, yet the conserved domains of the product remain intact (Figure 6c). Hence, the two ORFs of *ccmFn* are regarded as functional. The fission occurred after the divergence of the genera *Trigonella* and *Melilotus* in the Trifolieae. The conserved adjacency of the two ORFs (*ccmFn1* and *ccmFn2*) may represent an early stage of the fission, as in *ccmFn* and *ccmFc* in Marchantiales (Figure S5).
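
Why a 59 bp deletion splits an ORF can be shown with a toy Python sketch. Because 59 is not a multiple of three, the downstream sequence is read in a shifted frame, which can expose a premature stop codon. The sequence below is synthetic and chosen only for illustration (it is not the *Trifolium* *ccmFn* sequence), and the helper names are hypothetical; the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed here)


def shifts_frame(deletion_length):
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


def delete(seq, pos, length):
    """Return seq with `length` bases removed starting at 0-based index `pos`."""
    return seq[:pos] + seq[pos + length:]


def first_stop_index(seq):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Synthetic ORF: start codon, 40 AAT codons, terminal stop codon (126 bp).
toy_orf = "ATG" + "AAT" * 40 + "TAA"

if __name__ == "__main__":
    print(shifts_frame(59))                           # frame-shifting deletion
    print(first_stop_index(toy_orf))                  # stop at the end of the intact ORF
    print(first_stop_index(delete(toy_orf, 30, 59)))  # premature stop after a 59 bp deletion
    print(first_stop_index(delete(toy_orf, 30, 60)))  # in-frame 60 bp deletion keeps the terminal stop
```

In the toy ORF the frame-shifted reading of the AAT repeats yields TAA immediately after the deletion point, whereas a 60 bp deletion at the same position leaves the frame, and thus the terminal stop, unchanged.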

The fission of *ccmFn* in *Trifolium* leads to another question: is this event related to "intercompartmental piecewise gene transfer" [21]? To explore this question, we searched for ORFs of *ccmFn* in draft nuclear genomes of four *Trifolium* species (*T. subterraneum*, *T. pratense*, *T. pallescens* and *T. repens*). Both *T. pallescens* and *T. repens* (Figure S4) contained *ccmFn* NUMTs; however, these were not restricted to a single ORF but included a locus covering both ORFs (*ccmFn1* and *ccmFn2*) and their flanking regions. The NUMTs were identical to their counterparts in the mitogenome, suggesting that the transfer was a recent event (or an artifact of nuclear genome assembly, see Section 3.3). Furthermore, there was no post-IGT sequence modification to suggest a functional transfer. The evidence did not support a relationship between fission of the mitochondrial gene *ccmFn* and piecewise or functional transfer in *Trifolium* species.
