**Structure–Function Analysis Reveals the Singularity of Plant Mitochondrial DNA Replication Components: A Mosaic and Redundant System**

#### **Luis Gabriel Brieba**

Laboratorio Nacional de Genómica para la Biodiversidad, Centro de Investigación y de Estudios Avanzados del IPN, Apartado Postal 629, Irapuato, Guanajuato C.P. 36821, Mexico; luis.brieba@cinvestav.mx

Received: 24 October 2019; Accepted: 19 November 2019; Published: 21 November 2019

**Abstract:** Plants are sessile organisms, and their DNA is particularly exposed to damaging agents. The integrity of plant mitochondrial and plastid genomes is necessary for cell survival. During evolution, plants have developed mechanisms to replicate their mitochondrial genomes while minimizing the effects of DNA damaging agents. The recombinogenic character of plant mitochondrial DNA, the absence of defined origins of replication, and its linear structure suggest that mitochondrial DNA replication is achieved by a recombination-dependent replication mechanism. Here, I review the mitochondrial proteins possibly involved in mitochondrial DNA replication from a structural point of view. An examination of these proteins supports the idea that mitochondrial DNA could be replicated by several processes. The analysis indicates that DNA replication in plant mitochondria could be achieved by a recombination-dependent replication mechanism, but also by a replisome in which primers are synthesized by three different enzymes: Mitochondrial RNA polymerase, Primase-Helicase, and Primase-Polymerase. The recombination-dependent replication model and primers synthesized by the Primase-Polymerase may be responsible for the presence of genomic rearrangements in plant mitochondria.

**Keywords:** DNA replication; evolution; replisome; recombination-dependent replication

#### **1. Introduction**

#### *1.1. Plant Mitochondria Genomes*

Mitochondria arose from a monophyletic endosymbiotic event between an archaeon and an α-proteobacterium approximately two billion years ago [1]. During the evolution of eukaryotes, mitochondrial genomes have evolved in size and complexity. For instance, mitochondrial genomes vary in size by more than three orders of magnitude and they exist as circular, linear, linear-branched, linear-fragmented, and mixtures of maxi and mini-circles [2]. In general, metazoan mitochondrial genomes are circular molecules that vary in size between 10 and 30 kb [3]. In contrast, plant mitochondrial genomes are predominantly large linear DNA molecules (up to 11 Mb in angiosperms from the genus *Silene*). Besides the differences between the physical structure of the plant and metazoan genomes (linear versus circular), the most remarkable characteristics of plant mitochondrial genomes are their ability to rearrange, their low nucleotide substitution rate, and the evolution of new mitochondrial open reading frames. For instance, almost all vertebrates exhibit a similar organization in their mitochondrial genome arrangement [4], whereas the mitochondrial genomic organization in plants is different even between ecotypes of the same species [5]. The abundance of noncoding sequences severely complicates alignments of mitochondrial genomes from different plant families [6]. A comparison between the mitochondrial genomes of the Col-0 and C24 ecotypes of *Arabidopsis thaliana*, which diverged 200,000 years ago, shows that both genomes exhibit different configurations because of a large inverted repeat [5,7–9]. Even though plant mitochondrial genomes rearrange, the substitution rate in their coding regions is almost negligible, in contrast with the highly mutable human mitochondrial genome [10,11].
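As a rough check of the size range mentioned above, the span between the smallest and largest genome sizes quoted in this paragraph can be expressed in orders of magnitude. The Python sketch below uses only the two figures given in the text (10 kb and 11 Mb); pairing these two values, and the variable names, are assumptions made for illustration and not a calculation taken from the cited works.

```python
import math

# Sizes quoted in the paragraph above, in base pairs.
smallest_bp = 10_000        # lower bound given for metazoan mitochondrial genomes (10 kb)
largest_bp = 11_000_000     # upper bound given for Silene mitochondrial genomes (11 Mb)

# Fold difference between the two sizes and its base-10 logarithm.
fold_range = largest_bp / smallest_bp
orders_of_magnitude = math.log10(fold_range)

print(f"{fold_range:.0f}-fold range, about {orders_of_magnitude:.2f} orders of magnitude")
```

With these two values alone the range already slightly exceeds three orders of magnitude.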

#### *1.2. Replication in Mammalian Mitochondria*

Due to their bacterial origin, the mechanisms involved in mitochondrial and plastid DNA replication are expected to be related to those of bacteria. Yet mitochondrial DNA replication in metazoans is achieved by a replisome that is phylogenetically related to the bacteriophage T7 replisome [12,13]. In mitochondrial replisomes from metazoans, a bacteriophage-related RNA polymerase synthesizes RNA primers to start replication at the heavy and light chains of the circular mitochondrial DNA molecule, a hexameric helicase unwinds double-stranded DNA, and a trailing mitochondrial DNA polymerase synthesizes DNA. Human mitochondrial DNA replication starts by a strand-displacement model of replication in which human mitochondrial RNA polymerase (RNAP) transcribes the heavy-strand promoter, generating a primer that is processed and passed on to the mitochondrial DNA polymerase (DNAP); DNA replication then proceeds interruptedly to copy a new heavy-strand [14]. During this process, the replication fork replicates the light strand origin of replication. This DNA sequence folds into a stem–loop structure that allows primer synthesis by the mitochondrial RNAP, and these primers are elongated by the mitochondrial DNA polymerase [15]. Elongation of the heavy and light chains continues asynchronously until the two chains are completely copied. Although the strand-displacement model is generally accepted as the mechanism for mitochondrial DNA replication, there are discrepancies regarding how it proceeds. To date, two alternative models explain strand-asynchronous replication in mitochondria. One model proposes that long RNA molecules hybridize to the single-stranded heavy-strand [16]. These transcripts, known as RITOLS (ribonucleotide incorporation throughout the lagging strand), are continuously hybridized as replication continues [17]. The second model proposes that single-stranded DNA binding proteins coat the lagging-strand template [18]. As an alternative to the strand-displacement model, coupled leading and lagging-strand DNA synthesis can occur bidirectionally in mitochondria [19,20], and recent work established that cells can shift between strand-asynchronous and coupled leading and lagging-strand DNA synthesis depending on the amount of transcripts [21].

#### **2. Enzymes Involved in Organelle DNA Replication in Plants Can Be Grouped into Bacteriophage-Related, Recombination-Dependent Replication and Unique Enzymes**

The main difference between the mitochondrial metazoan and bacteriophage T7 replisomes is that the T7 primase-helicase harbors an active primase module that synthesizes primers for lagging strand synthesis, whereas the primase module of metazoan primase-helicases is inactive and primer synthesis depends solely on the mitochondrial RNA polymerase [22,23]. Thus, metazoan primase-helicases harbor a primase module that has lost its priming activity. The similarities between the metazoan replicative mitochondrial DNA primase-helicase and the primase-helicase of bacteriophage T7 led to the name TWINKLE (T7 gp4-like protein with intra-mitochondrial nucleoid localization) for this protein [24].

#### *2.1. A T7-Like Replisome in Plant Organelles*

In this review, we focus on the proteins from the model plant *Arabidopsis thaliana* as a representative of flowering plants. Like their metazoan counterparts, plant organelles harbor enzymes related to the T7 replisome (Table 1). Of the four enzymes involved in DNA replication in bacteriophage T7 and metazoan mitochondria, land plants have conserved three: (a) The primase-helicase, (b) the RNA polymerase, and (c) the single-stranded DNA binding protein (Table 1). The presence of these proteins suggests that plant mitochondrial DNA replication is executed in part by a mechanism that resembles the coordinated leading and lagging-strand replication model of bacteriophage T7 [22]. In this model, a central primase-helicase unwinds dsDNA in the 3'-5' direction followed by a processive DNA polymerase in the leading strand. The primase module of the primase-helicase uses the unwound single-stranded regions to recognize a sequence to start the synthesis of very short ribonucleotides that are handed off to the active site of the lagging strand DNA polymerase. The single-stranded DNA regions generated during this trombone mechanism are coated by the single-stranded binding proteins [22,25].


**Table 1.** Proteins related to bacteriophage T7 proteins present in plant mitochondria.

#### 2.1.1. Plant Organellar Primase-Helicase (AtTwinkle)

Primase-helicases are the central component of replisomes [26,27]. These enzymes unwind double-stranded DNA segments using NTP hydrolysis for translocation and primer synthesis, using their helicase and primase modules, respectively [22,27]. The organellar primase-helicase in *A. thaliana* (dubbed AtTwinkle) is a 709 amino acid protein with mitochondrial and chloroplast localization [28] (Figure 1A). AtTwinkle, as predicted for all plant primase-helicases, harbors both primase and helicase activities [28–30]. Structural studies of primase-helicases show that these enzymes assemble as heptamers or hexamers in which the helicase modules form a compact oligomeric ring to which the primase modules attach [31,32] (Figure 1B,C). The primase module of AtTwinkle contains six conserved motifs [30]. Motif I corresponds to the zinc binding domain (ZBD) necessary for template recognition, whereas motifs II to VI assemble the RNA polymerase domain (Figure 1D). In contrast to all previously characterized primase-helicases, AtTwinkle recognizes two cryptic nucleotides within the ssDNA template [29], a biochemical property that may reduce the length of the Okazaki fragments during plant mitochondrial replication. The helicase module of AtTwinkle shares high amino acid identity with the helicase module of the T7 primase-helicase and harbors the five conserved motifs [33], including a Walker motif necessary for nucleotide hydrolysis. The presence of an active AtTwinkle protein in Arabidopsis suggests the presence of a plant mitochondrial replisome in which a DNA polymerase replicates DNA following the unwinding of the double helix and exposing the leading-strand for continuous synthesis [34]. The primase activity suggests that a trailing DNA polymerase synthesizes the lagging-strand using primers synthesized by the primase module of AtTwinkle [29]. This model of coordinated leading and lagging strand synthesis occurs in bacteriophages T4 and T7, but not in mitochondria from metazoans and yeast [22,23,26]. Interestingly, Arabidopsis harbors a protein, dubbed AtTwinky, that contains the zinc finger and the RNA polymerase module of AtTwinkle [28]. This module by itself is functional in vitro [29]. An Arabidopsis insertional line in AtTwinkle shows no apparent phenotype, possibly because the T-DNA insertion occurs in an intron or because of redundant mechanisms for primer synthesis and DNA unwinding [34].

**Figure 1.** AtTwinkle is a homolog of bacteriophage T7 primase-helicase and mitochondrial Twinkle. (**A**) Schematic representation of the bifunctional T7 primase-helicase in comparison to AtTwinkle and human Twinkle. T7 primase-helicase and AtTwinkle contain the conserved motifs necessary for primase and helicase activities, whereas human Twinkle is inactive as a primase. (**B**) Homology model of AtTwinkle showing its RNA polymerase domain and helicase modules, based on the crystal structure of the heptameric T7 primase-helicase [31]. (**C**) Close-up view of a monomeric module of the RNAP and helicase of AtTwinkle. (**D**) Close-up view of the primase module composed of the zinc binding domain (ZBD) and RNAP domain. The conserved cysteines that coordinate the zinc atom are colored in red and magenta.

#### 2.1.2. Bacteriophage-Type Plant Organellar RNA Polymerases

In yeast and metazoan mitochondria, transcription is carried out by a single RNA polymerase (mtRNAP) homologous to T7 RNA polymerase [35]. In contrast to metazoans, which harbor one nuclear-encoded mtRNAP, flowering plants encode three bacteriophage-type RNA polymerases [36,37]. One is localized to the mitochondria (RpoTm), one to the chloroplast (RpoTp), and the third has dual mitochondrial and plastid localization (RpoTmp). In Arabidopsis, RpoTm and RpoTp start transcription at a specific set of promoters. However, RpoTmp is unable to start transcription by itself [38]. These enzymes are closely related to bacteriophage T7 RNAP and, due to sequence similarity, are expected to fold into two conserved domains: An N-terminal domain, possibly involved in RNA binding, and a C-terminal or polymerization domain. The C-terminal domain is structurally divided into three subdomains, dubbed palm, fingers, and thumb (Figure 2). Yeast and metazoan mitochondrial RNAPs are only active by themselves on supercoiled templates; on linearized templates, they need an associated transcription factor to start transcription [39,40]. Likewise, plant mitochondrial RNAPs are only active on supercoiled templates [36], suggesting that they also need an unidentified plant mitochondrial transcription factor for efficient promoter melting. Mitochondrial RNAPs from metazoans and yeast contain an N-terminal pentatricopeptide repeat (PPR) domain not present in plant mitochondrial RNAPs and T7 RNAP (Figure 2). Thus, plant mitochondrial RNAPs are more compact than yeast and metazoan mitochondrial RNAPs.

In bacteriophage T7 and metazoan mitochondria, RNAPs synthesize long RNA chains at defined sequences that mark their origins of replication [15,41–43]. It is unknown if plant mitochondrial RNAPs play a role in synthesizing RNA primers during mitochondrial or plastid replication. However, plant mitochondrial genomes are proposed to exist as a multitude of linear fragments, each carrying only partial segments of the genome [44–46]. The presence of numerous promoter DNA sequences in plant mitochondria makes the existence of multiple replication initiation sites in mitochondrial DNA possible.

During metazoan mitochondrial DNA replication, the RNA primers generated by the mitochondrial RNA polymerase are removed by a specific set of nucleases. In humans, five different nucleases participate in this process [47–52]. Of those enzymes, RNase H1 plays a predominant role by degrading the RNA primer until only a few nucleotides remain. These last two to three ribonucleotides can be removed by the flap-specific nucleases FEN1, DNA2, and MGME1 or by the selective 5'-3' exonuclease EXOG [48,51]. Arabidopsis encodes three proteins highly homologous to RNase H1, dubbed AtRNH1A (At3g01410), AtRNH1B (At5g51080), and AtRNH1C (At1g24090) [47]. AtRNH1A is localized to the nucleus, whereas AtRNH1B and AtRNH1C are imported into mitochondria and chloroplasts, respectively. AtRNH1C prevents R-loop accumulation in the chloroplast, especially at highly transcribed regions and putative origins of replication [47]. AtRNH1C is involved in assuring genome stability in the chloroplast, suggesting the possibility that AtRNH1B may contribute to the removal of RNA primers in plant mitochondria.

**Figure 2.** Bacteriophage-type plant organellar RNA polymerases. (**A**) Domain organization of bacteriophage-related RNAPs. These enzymes share a C-terminal or polymerization domain that is divided into three subdomains: Fingers, palm, and thumb, and an N-terminal domain involved in promoter opening and RNA binding. The N-terminal domain is colored orange and the fingers, thumb, and palm subdomains are colored blue, green, and red, respectively. mtHsRNAP associates with two accessory subunits (TFB2M and TFAM) to open double-stranded DNA and contains an N-terminal pentatricopeptide repeat (PPR) domain and a tether helix not present in plant mitochondrial RNAPs. (**B**) Structural model of the mtAtRNAP compared to bacteriophage T7 RNAP and human mtRNAP during transcription initiation [40,53].

#### 2.1.3. Plant Organellar Single-Stranded DNA Binding Proteins

All replisomes contain single-stranded DNA binding proteins (SSBs) that coat the lagging-strand DNA chain and exert a multitude of interactions with DNA polymerases, DNA helicases, and other proteins involved in DNA metabolism. Flowering plants encode two canonical single-stranded DNA binding proteins that are targeted to mitochondria (AtmtSSB1 and AtmtSSB2) [54,55]. Like all SSBs, these proteins harbor an oligonucleotide/oligosaccharide-binding (OB)-fold domain and share a conserved set of aromatic amino acids that in other bacterial and mitochondrial SSBs are important for binding to single-stranded DNA. Among these amino acids, residues W54 and F60, which are determinant for binding to ssDNA in bacteria, are conserved in AtmtSSB1 and AtmtSSB2 [56–58] (Figure 3). AtmtSSB1 assembles as a tetramer, binds single-stranded DNA in the nanomolar range, and interacts with plant mitochondrial DNA polymerases from Arabidopsis [59]. A recent proteomic analysis indicates that both AtmtSSB1 and AtmtSSB2 are highly abundant proteins, suggesting that a great portion of the mitochondrial single-stranded DNA is coated with them [55]. The last nine amino acids of *E. coli* SSB are responsible for mediating protein–protein interactions [60,61]. AtmtSSB1 contains a predominantly acidic tail while AtmtSSB2 harbors an aromatic tail (Figure 3), suggesting the possibility that the two SSBs exert differential protein–protein interactions.

**Figure 3.** Homology model of tetrameric AtmtSSB1. (**A**) Homology model of AtmtSSB1 illustrating its oligonucleotide/oligosaccharide-binding (OB)-fold and an acidic C-terminal tail. (**B**) An amino acid sequence alignment illustrates that the C-terminal tail of AtmtSSB2 is composed of two aromatic amino acids, whereas that of AtmtSSB1 is acidic.

#### *2.2. A Putative Recombination-Dependent Replication System in Plant Mitochondria*

One of the main differences between plant and human mitochondrial genomes resides in the presence of highly abundant repeats of different lengths in plant mitochondria [62,63]. These repeats are classified by Gualberto and Newton as large repeats (>500 base pairs); intermediate-sized repeats (50–500 base pairs); and small repeats (<50 base pairs) [64,65]. Seminal studies deduced that the recombinogenic character of large repeats is responsible for plant mitochondrial DNA genomic configurations [62,66,67]. Thus, it is generally accepted that recombination at large repeats results in the presence of multiple mitochondrial genome conformations, whereas recombination at intermediate-size repeats is not as frequent [5,68]. The low-frequency recombination at intermediate-size repeats leads to changes in the stoichiometry of the mitochondrial genomes [69,70]. Finally, recombination at small repeats drives the appearance of new open reading frames associated with traits like cytoplasmic male sterility [71,72]. The notion that recombination is dependent on the length of the repeat is challenged by comparing new mitochondrial DNA sequences between domesticated and wild-type cultivars and by following the evolutionary history between species [73,74].
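The size classes above can be written as a small lookup. The following Python sketch encodes only the thresholds quoted in the text; the function name and the example lengths are hypothetical, and, as noted above, repeat length alone may not predict recombination behavior.

```python
def classify_repeat(length_bp: int) -> str:
    """Assign a repeat to one of the size classes quoted in the text.

    large: > 500 bp; intermediate: 50-500 bp (inclusive); small: < 50 bp.
    """
    if length_bp <= 0:
        raise ValueError("repeat length must be a positive number of base pairs")
    if length_bp > 500:
        return "large"
    if length_bp >= 50:
        return "intermediate"
    return "small"


# Hypothetical repeat lengths, chosen only to exercise each class and boundary.
example_lengths = [32, 50, 249, 500, 6500]
classes = {length: classify_repeat(length) for length in example_lengths}
print(classes)
```

The boundaries at exactly 50 and 500 base pairs are placed in the intermediate class, following the ranges as they are written in the text.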

The recombinogenic character of the mitochondrial genome is reminiscent of the bacteriophage T4 genome, which uses a recombination-dependent replication (RDR) mechanism [46,75]. Furthermore, seminal studies have shown the presence of linear molecules, head-to-tail concatemers, branched, and rosette-like structures during plant mitochondrial replication, suggesting that free single-stranded DNA ends direct primer formation [45,46,76,77]. In contrast to metazoan mitochondria, plant mitochondria harbor a complete set of enzymes involved in homologous recombination (HR). In bacteriophage T4, RDR starts with the coating of single-stranded DNA by a recombinase dubbed UvsX, a protein homologous to bacterial RecA or eukaryotic Rad51. Like all recombinases, this protein uses ATP to catalyze the exchange of the single-stranded DNA into double-stranded DNA. This initial step creates a triple-stranded DNA region in which T4 DNA polymerase assembles to initiate replication. A replicative helicase loads onto the displaced DNA strand; this enzyme translocates in the 5' to 3' direction, unwinding DNA and generating a template for the trailing polymerase. The helicase associates with a primase that recognizes single-stranded sequences in the 3'-5' direction and generates primers used by a second DNA polymerase during replisome assembly. Although this system is relatively simple, it needs the presence of several mediator proteins that coordinate protein loading. In Arabidopsis mitochondria, several homologs of the battery of T4 enzymes involved in RDR are present, suggesting the possibility that RDR is a functional mechanism in plants (Table 2).


**Table 2.** Plant mitochondrial proteins related to bacteriophage T4 recombination-dependent replication proteins.

#### 2.2.1. AtRecA

RecA and its homologs Rad51 and BRCA are the central components of homologous recombination. RecA is an archetypical bacterial recombinase that loads onto resected single-stranded DNA in an ATP-dependent reaction. It assembles a nucleic acid-protein filament that navigates the double-stranded genome in search of a homologous sequence, and when a region of homology is encountered, this filament perfectly pairs with its homologous partner (located within a dsDNA region) and generates a heteroduplex or D-loop intermediate [78]. HR by Rad51/RecA is abrogated in the presence of mismatches, and bacterial RecA needs at least eight nucleotides of perfect complementarity to form a stable D-loop, although the efficiency of heteroduplex formation increases with the length of the perfectly complementary region [79–81]. In bacteria, the RecA monomer consists of a central or core domain of approximately 230 amino acids. This domain folds into a single β-sheet and six α-helices [82]. This core domain is flanked by N- and C-terminal domains of approximately 30 and 60 amino acids, respectively [82]. The crystal structure of the bacterial RecA–ssDNA filament illustrates how RecA assembles onto ssDNA and how Watson–Crick pairing is assured during the homology search [83] (Figure 4A).
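The eight-nucleotide requirement mentioned above can be illustrated with a toy sequence check. The Python sketch below is a deliberate simplification and not a model of the RecA reaction: it assumes both sequences are written 5' to 3' on the same strand (so perfect complementarity to the opposite strand becomes exact identity), it ignores filament structure and kinetics, and the function names and example sequences are hypothetical.

```python
MIN_PERFECT_RUN = 8  # minimal perfect match quoted in the text for bacterial RecA


def longest_perfect_match(ssdna: str, target: str) -> int:
    """Return the length of the longest uninterrupted stretch shared by both sequences."""
    best = 0
    # prev[j] holds the length of the shared run ending at the previous ssdna base
    # and at target[j - 1] (classic longest-common-substring dynamic programming).
    prev = [0] * (len(target) + 1)
    for base_a in ssdna:
        curr = [0] * (len(target) + 1)
        for j, base_b in enumerate(target, start=1):
            if base_a == base_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def meets_minimal_homology(ssdna: str, target: str, min_run: int = MIN_PERFECT_RUN) -> bool:
    """True if the two sequences share a perfect run of at least min_run nucleotides."""
    return longest_perfect_match(ssdna.upper(), target.upper()) >= min_run


# Hypothetical sequences: the first pair shares an 8-nt run, the second carries a mismatch.
print(meets_minimal_homology("ACGTACGTAA", "TTACGTACGTGG"))
print(meets_minimal_homology("ACGTTCGTAA", "TTACGTACGTGG"))
```

A single mismatch inside the shared stretch splits it into shorter runs, which is why the second pair falls below the threshold in this simplified picture.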

Unlike metazoan mitochondria, which are devoid of RecA homologs, plant mitochondria harbor orthologues of the recombinase RecA/Rad51 gene family [69,84–86]. These proteins are conserved from algae to flowering plants. Genetic studies in *Physcomitrella patens* and *Arabidopsis* demonstrate the role of RecA in preventing illegitimate recombination events at small repeats in *P. patens* and at intermediate-size repeats in Arabidopsis [69,86,87]. *A. thaliana* harbors three RecA genes. RecA1 is targeted to the chloroplast, RecA2 is targeted to plastids and mitochondria, whereas RecA3 is only targeted to mitochondria [69,88]. AtRecA1 is an essential gene, whereas AtRecA2 is only necessary after the seedling stage [69,86]. AtRecA2 and AtRecA3 share 53% and 41% amino acid identity with *E. coli* RecA, respectively. This similarity suggests that HR in plant mitochondria may follow a mechanism similar to that of bacteria. Interestingly, AtRecA3 lacks the last 22 amino acids of its C-terminal domain in comparison to *E. coli* RecA. In bacteria, these residues have a highly acidic composition, and a RecA variant lacking 17 of these amino acids is more efficient in displacing bacterial SSB from ssDNA; thus, the C-terminal extension negatively modulates RecA activity [89] (Figure 4). Plants mutated in AtRecA3 are phenotypically normal. However, they are sensitive to genotoxic treatments [69]. The loss of RecA2 and RecA3 promotes rearrangements at intermediate-size repeats [86]. These repeats are not perfect and lead to homeologous recombinant products (illegitimate recombination products). The increase of illegitimate recombination products in the absence of AtRecA2 or AtRecA3 suggests that less stringent RecA-independent pathways take over in their absence. One possible pathway is the single-strand annealing recombination pathway (SSA) under the control of specialized SSBs with annealing capabilities, as is the case in *Deinococcus radiodurans* [90]. Recent proteomic studies indicate that RecA2 is one of the most abundant DNA binding proteins in plant mitochondria [55].
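Amino acid identity values such as those quoted above are derived from pairwise alignments. The Python sketch below shows one common way to compute percent identity from two already-aligned sequences. It is an illustration only: the text does not state which alignment program or which denominator was used for the reported values, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are excluded from the
    denominator. Other conventions (for example, dividing by the full
    alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (not real RecA sequence): 4 identical residues out of 5 compared columns.
print(percent_identity("MKV-LA", "MRVQLA"))
```

Because the choice of denominator changes the result, identity values from different studies are only comparable when the same convention is used.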

**Figure 4.** Structural conservation of plant and bacterial RecAs. (**A**) Crystal structure of the bacterial RecA postsynaptic nucleoprotein filament determined by Chen, Yang, and Pavletich [83]. Each of the five RecA monomers is individually colored and labeled with numbers. The search strand is colored in yellow and the complementary strand in red. The crystal structure comprises solely the RecA fold and the C-terminal domain is not present in the initial construct. (**B**) Domain organization of AtRecA2 and AtRecA3 in comparison to bacterial RecA. AtRecA3 lacks the C-terminal regulatory domain.

#### 2.2.2. AtRecX

In bacteria, RecA can be inhibited by an interaction with a small protein (approximately 20 kDa) dubbed RecX [91]. RecX proteins bind to RecA monomers and DNA [92]. Bacterial RecX proteins are composed of nine α-helices that arrange into three three-helix bundles [93,94] (Figure 5A). RecX binds to RecA filaments, promoting their dissociation from single-stranded DNA and impeding homologous recombination [95,96]. *A. thaliana* encodes a protein of 382 amino acids, orthologous to bacterial RecX, with a predicted mitochondrial localization signal in its first 25 amino acids, a domain of unknown function, and a C-terminal segment that presents 30% amino acid identity with *E. coli* RecX (Figure 5B). The presence of this RecX ortholog (AtRecX) suggests the possibility that RecA activities are subject to regulation in plants. The presence of three RecA genes in flowering plants also suggests that these proteins may be subject to a gradient of regulation by RecX in vivo. In the moss *Physcomitrella patens*, RECX-overexpressing mutants exhibit increased recombination products at short dispersed repeats in mitochondria [97], suggesting that RecX modulates RecA activity and that, when RecA is not functionally active, less accurate DNA repair routes gain access to ssDNA with a concomitant appearance of illegitimate recombination products.

**Figure 5.** Structural organization of AtRecX. (**A**) Crystal structure of RecX from *E. coli* (PDB: 3c1d). RecX is composed of three repeats of a three-helix motif. (**B**) Modular organization of AtRecX in comparison to bacterial RecX. Plant RecX harbors a mitochondrial targeting sequence (MTS) and an N-terminal domain of unknown function. AtRecX shares more than 30% amino acid identity with bacterial RecXs.

#### 2.2.3. Organellar DNA-Binding Proteins (ODBs)

Upon the formation of single-stranded breaks, canonical SSBs bind to ssDNA, blocking its access to other binding proteins. In order for RecA to bind ssDNA, SSBs have to be removed from ssDNA. In bacteria, a protein named RecO (or its functional homolog in yeast, Rad52) interacts with the C-terminal tails of SSBs, creating space for RecA binding [98]. Via proteomic studies, the Gualberto group identified that Arabidopsis contains two organellar DNA-binding proteins (ODBs), one located in the mitochondria (AtODB1) and the other in the chloroplast (AtODB2) [99]. AtODBs are homologous to Rad52 and the yeast mitochondrial nucleoid protein Mgm101 [100]. Mgm101 assembles an oligomeric ring structure and preferentially binds single-stranded DNA, suggesting a role in stabilizing and annealing DNA segments [101,102]. Likewise, Rad52 induces the displacement of human replication protein A (RPA) from ssDNA, anneals complementary ssDNA strands, and promotes strand exchange between ssDNA and dsDNA [103]. Thus, Rad52 promotes HR by displacing RPA, and promotes the coating of Rad51 by directing single-stranded annealing. Crystal structures of human Rad52 in complex with ssDNA depict this molecule as an undecameric ring in which two Rad52 oligomers could mediate HR in trans [104–106]. AtODB1 comprises 177 amino acids and shares extensive homology with the N-terminal domain of Rad52 (which contains the DNA binding and oligomerization regions). However, AtODB1 lacks a C-terminal domain containing the interacting motifs for RPA and Rad51, which are involved in their displacement from ssDNA [107,108]. AtODB1 is 41 amino acids shorter than the construct of 212 amino acids used to crystallize human Rad52. Interestingly, the last 41 amino acids of human Rad52 fold into an alpha-helix (named helix 5) that intercalates with the first alpha-helix of the structure, stabilizing the oligomeric assembly [104] (Figure 6).

Because of the reduced size of AtODBs, it is unknown if these proteins interact with SSBs from plant mitochondria like AtmtSSBs, AtWhirlies, AtRecA, or AtOSBs. Arabidopsis odb1 insertional mutants present no variation in phenotype; however, upon genotoxic stress, they show reduced homologous recombination potential and increased microhomology-mediated end joining (MMEJ) [100]. This suggests that plant ODBs may function as mediator proteins that promote the annealing of plant RecAs onto single-stranded DNA. Recombinantly expressed plant ODB1 can anneal short DNA sequences [100]. The increase in MMEJ in plants lacking AtODB1 may be related to a role of this protein in a single-strand annealing recombination pathway, since human Rad52 proteins promote this route [109,110].

**Figure 6.** AtODB1 resembles human Rad52. (**A**) Structural domain organization of AtODB1 in comparison to human Rad52. AtODB1 lacks the C-terminal domain necessary to interact with RPA and Rad51. (**B**) Crystal structure of the undecameric ring of human Rad52. The undecameric structure is stabilized by alpha-helix 5, which interacts with alpha-helix 1 of the neighboring molecule. Each subunit (residues 1 to 172) is individually colored and the C-terminal residues (172 to 212) are colored in red. (**C**) Model of AtODB1 as an undecameric ring lacking alpha-helix 5 of human Rad52.

#### 2.2.4. AtRadA

Bacterial RadA promotes single-stranded strand exchange similar to RecA, and was initially suggested to be orthologous to RecA [111]. Bacterial RadAs have a conserved domain organization composed of: (a) A putative zinc finger (ZnF), (b) a RecA-like ATPase domain with a unique KNRFG motif, and (c) a region homologous to the Lon protease. Gualberto and Newton have identified the presence of a RadA-like gene in plant organelles [64] (At5g50340.1). This protein harbors a dual organellar targeting sequence in its first 88 amino acids and has 63% amino acid similarity with RadA from *Streptococcus pneumoniae* [112–114]. Bacterial RadA assembles as a hexameric ring, resembling the structural organization of replicative DnaB helicases [112] (Figure 7). Bacterial RadA interacts with RecA and unwinds dsDNA in the 3'-5' direction. These biochemical properties suggest that RadA promotes the extension of ssDNA after RecA-mediated homologous recombination, similar to the extension of bacterial origins of replication mediated by DnaB [112].

Because of the conserved domain organization of AtRadA, it is plausible that this protein is involved in a recombination-dependent replication mechanism. The appearance of multiple origins of replication in plant mitochondria by electron microscopy suggests the possibility that the unwinding ability of AtRadA is a key element for break-induced replication, by stabilizing a D-loop in synchrony with AtRecAs in which AtPolIs could be loaded. An interaction between RecA and RadA promotes D-loop extension in bacteria [115], suggesting that a similar mechanism could exist in plant mitochondria.

**Figure 7.** Plant RadA resembles the bacterial enzyme. (**A**) Structural organization of AtRadA in comparison to bacterial RadA. AtRadA shares 63% amino acid similarity with RadA from *S. pneumoniae* and complete amino acid identity in the catalytic amino acids. Bacterial RadA harbors a zinc finger (ZnF), a RecA-like ATPase domain with a unique KNRFG motif, and a region homologous to the Lon protease. (**B**) Crystal structure of the RecA-like ATPase and Lon protease domains of RadA from *S. pneumoniae* showing its resemblance to a hexameric helicase. The ZnF domain is not present in the crystal structure.

#### 2.2.5. AtRecG

DNA lesions like thymine-dimers or abasic sites, that potentially block replicative DNA helicases and DNA polymerases, are expected to be predominant in plant mitochondria. Thus, it is expected that plant mitochondria have developed mechanisms to avoid replication roadblocks that lead to replication fork collapse. Stalled replication forks can be resolved via the formation of four-strand Holliday junctions. In bacteria and bacteriophage T4, the helicases RecG and UvsW execute this process [75,116–118]. Bacterial RecGs are loaded in a stalled replication fork where they catalyze replication fork reversal by "pushing" a halted three-strand fork and convert this three-strand fork into a four-strand junction or Holliday junction [117–120]. The Holliday junction structure functions as a starting point for replication fork restart.

Flowering plants encode a RecG homolog that is conserved from green algae [121]. In Arabidopsis, this protein consists of 957 amino acids, of which the first 57 correspond to an organellar targeting sequence. AtRecG shares 34% amino acid identity with RecG from *Thermotoga maritima* and is expected to have a similar structure (Figure 8). Arabidopsis plants compromised in their RecG activity are prone to recombination events at intermediate-size repeats, and this phenomenon increases in plants deficient in AtRecA3 [121]. Although the precise role of AtRecG is unknown, this protein may be involved in processing Holliday junction structures and in avoiding replication fork collapse or promoting DNA double-strand break repair.

**Figure 8.** Plants harbor a RecG ortholog. (**A**) AtRecG presents the same domain organization as bacterial RecG, plus an N-terminal organellar targeting sequence. (**B**) RecG remodels halted replication forks by promoting fork regression (chicken foot structure) that is converted to a Holliday junction. (**C**) Crystal structure of *T. maritima* RecG illustrating its modular assembly.

#### *2.3. Unique Proteins in Flowering Plant Mitochondria*

Flowering plant mitochondria have unique proteins. These proteins include: (i) replicative DNA polymerases solely encoded by protists and plants, (ii) a modified family of single-stranded binding proteins, dubbed organellar single-stranded DNA binding proteins (OSBs), in which the OB-fold has undergone extensive modifications, (iii) an associated motif dubbed PDF that plays a role in binding to ssDNA, (iv) a protein that resembles MutS from bacteria, dubbed Msh1, that is only found in plants and corals, and (v) a distinctive family of proteins dubbed whirly (Table 3) [65,122–128]. Both Msh1 and whirlies are proposed to play a dual role in DNA metabolism and as sensor proteins via retrograde signaling from chloroplast to nucleus [129,130].


**Table 3.** Unique proteins involved in DNA metabolism in flowering plant mitochondria.

#### 2.3.1. Plant Organellar DNA Polymerases (POPs)

DNA polymerases in metazoan mitochondria are related to bacteriophage T-odd DNA polymerases [12,131]. Pioneering studies by the groups of Professors Sakaguchi and Sato revealed that plant organellar DNA polymerases have a different evolutionary history than phage and mitochondrial DNAPs from metazoans [122–125]. POPs belong to family A of DNA polymerases; however, they did not evolve from bacteriophage T-odd DNAPs. Flowering plants harbor two paralogous POP genes with chloroplast and mitochondrial localization. In Arabidopsis, one POP is a high-fidelity DNAP (AtPolIA), whereas the other, AtPolIB, is a low-fidelity enzyme [132]. From a structural point of view, the most distinctive elements in POPs are three unique insertions in their polymerization domain: two of those insertions are located in the thumb subdomain (Ins1 and Ins2), whereas the third insertion (Ins3) is placed in the fingers subdomain [122–125]. Ins1 and Ins3 are involved in lyase, strand-displacement, and MMEJ activities [59,133,134] (Figure 9). AtPolIA and AtPolIB interact with AtTwinkle and extend primers synthesized by its primase module [29,34]. The physical interaction of AtPolIs with AtTwinkle and AtSSB1 suggests the presence of a functional plant mitochondrial replisome [34]. Biochemical and functional evidence suggests that AtPolIA plays a predominant role in DNA replication, whereas the AtPolIB paralog plays a role in DNA repair [132,135,136]. The gene duplication event in POP evolution suggests a possible event of specialization. This situation resembles the presence of duplicated copies of the replicative DNA polymerase in *Mycobacterium*, in which one copy contributes to drug resistance because of its low nucleotide incorporation fidelity [137]. In this scenario, AtPolIB could be in the process of becoming a DNAP specialized in translesion synthesis or in other DNA repair pathways. Although AtPolIA and AtPolIB share more than 70% amino acid identity, a single amino acid change in homologous DNA polymerases can provide translesion DNA synthesis capabilities [138].

**Figure 9.** Structural comparison between AtPolIB and bacterial DNAPs. (**A**) Domain organization of both DNAPs. The polymerization domains are colored in black and the 3′-5′ exonuclease domains in orange. The unique amino acid insertions in AtPolIB in comparison to bacterial DNAPs I are depicted in a ball-stick representation and colored in red, green, and cyan. AtPolIs contain an N-terminal DTS and a disordered region not present in the structural model. (**B**) Homology model of AtPolIB compared with the crystal structure of the Klenow fragment from *E. coli* DNAP I. In both models, the dsDNA from *Bacillus* DNAP I is superimposed.

#### 2.3.2. AtWhirlies

The most iconic family of single-stranded binding proteins in plant mitochondria is a family dubbed whirly. Whirlies are oligomeric proteins unique to plants. In contrast to the majority of organellar DNA binding proteins, whirlies are encoded in the nucleus and were initially identified as nuclear transcription factors [139]. Whirlies assemble as tetramers; however, upon binding to long stretches of ssDNA they form a 24-mer assembly [140,141]. Arabidopsis harbors three members of the Whirly family. AtWhy2 localizes to mitochondria and, as a monomer, is the most abundant DNA binding protein in plant mitochondria [55], whereas AtWhy1 and AtWhy3 translocate into chloroplasts [55,142]. T-DNA insertion lines of Arabidopsis that knock out AtWhy1 and AtWhy3 accumulate DNA rearrangements at microhomologous repeats in the chloroplast [143]. However, Arabidopsis plants devoid of AtWhy2 present a wild-type phenotype, do not accumulate MMEJ products in the absence of agents that induce DSBs [135,144], and show only a small increase in MMEJ products in the presence of ciprofloxacin [135].

Whirly proteins bind ssDNA with nanomolar affinity and exhibit a novel protein fold in which each whirly monomer consists of two antiparallel beta sheets organized along two alpha-helices, an arrangement that resembles a whirligig [128,141]. The whirly domain comprises between 150 and 200 amino acids and contains an acidic/aromatic C-terminal end that is disordered in crystal structures. The residues involved in ssDNA binding are distributed along the two antiparallel beta sheets, and whirlies interact with ssDNA via hydrophobic residues and hydrogen bonds mediated by polar amino acids [140] (Figure 10). Whirlies harbor a conserved KGKAAL motif, located in the second beta strand of the first β-sheet, whose integrity is necessary for the 24-mer assembly [140]. Although mutations in this motif do not affect binding to short ssDNA segments, Arabidopsis plants complemented with a Why construct in which the second lysine of the KGKAAL motif is mutated to alanine are unable to reduce the appearance of microhomology-mediated rearrangements [140]. The latter suggests that the functional oligomeric state of Whirlies in vivo is a 24-mer. The solvent-exposed localization of the unstructured C-terminal tail in whirlies suggests that it may mediate protein–protein interactions, analogous to the C-terminal tail of bacterial SSB.

**Figure 10.** Structural organization of Whirlies. (**A**) Crystal structure of AtWhy2 (PDB ID: 4kop) with model ssDNA from Solanum whirly. The crystal structure represents residues 45 to 212. The second lysine of the KGKAAL motif is in a ball-stick representation. The C-terminal 3₁₀ helix is in red. (**B**) Structural organization of AtWhy2. The disordered C-terminal tail is indicated in the diagram.

#### 2.3.3. Organellar Single-Stranded DNA Binding Proteins (OSBs)

The groups of Gualberto and Imbault identified a unique family of single-stranded DNA binding proteins conserved from green algae to flowering plants [126]. These proteins harbor an N-terminal OB-fold domain linked to a motif of 50 amino acids dubbed the PDF motif, because of a conserved signature of Pro, Asp, and Phe. Those researchers coined the name "Organellar Single-stranded DNA Binding proteins (OSB)" for members of this protein family. In OSBs, the PDF motif can be arranged as one or multiple copies (Figure 11). Arabidopsis contains four OSB proteins, dubbed AtOSB1 to AtOSB4. AtOSB1 and AtOSB2 are targeted exclusively to mitochondria and chloroplasts, respectively, whereas AtOSB3 presents dual-target localization. Quantitative proteomic analysis showed that AtOSB4 and AtOSB3 are highly abundant proteins in mitochondria, whereas AtOSB1 is present at very low concentrations [55]. Remarkably, T-DNA insertion lines of AtOSB1 generate homologous recombination products at repeats that are not commonly used [126].

**Figure 11.** Structural organization of OSBs. (**A**) Structural model of AtOSB1 showing its predicted OB-fold and PDF motif domains. (**B**) Modular organization of mitochondrial OSBs in *Arabidopsis*. AtOSBs consist of an OB-like fold followed by one to three PDF motifs (54). Although AtOSB1 is depicted as a monomer, AtOSB2 in solution assembles as a tetramer.

AtOSB2 assembles as a tetramer and binds ssDNA with nanomolar affinity [59]. The PDF motif of AtOSB1 is sufficient for binding to ssDNA, whereas its OB-fold appears to have lost its ability to bind ssDNA [126]. AtOSB2 does not interact with AtPolIs, suggesting that in contrast to other single-stranded binding proteins, its role is not to avoid the formation of secondary structure elements that halt replicative DNA polymerases [59]. The high-affinity of AtOSBs for single-stranded DNA regions and their high abundance within mitochondrial DNA suggest that they coat single-stranded regions of DNA. This coating correlates with the increase of non-canonical homologous recombination products in plants lacking AtOSB1 [126].

#### 2.3.4. AtMsh1

George P. Rédei discovered that the CHLOROPLAST MUTATOR (chm) locus induces plant variegation and impaired fertility, and that both traits are inherited maternally [145,146]. The chm locus regulates the formation of rearrangements in plastids and mitochondria [147] and encodes a protein with resemblance to bacterial MutS; therefore, it was named Msh1 [65]. In bacteria, MutS and MutL are conserved elements of the DNA mismatch repair pathway. Within this pathway, MutS recognizes a mismatch and recruits the MutL endonuclease. Discrimination of the newly synthesized DNA strand that carries the mismatch is mediated by hemimethylation, which is recognized by MutH [148]. The MSH1 gene is only present in corals and plants and encodes a multidomain protein harboring domains with homology to bacterial MutS and the GIY-YIG endonuclease [65,127,149,150]. Plants harboring deletions of this gene exhibit increased recombination frequencies at intermediate-size repeats. It is clear that Msh1 guards organellar genomes against aberrant or infrequent recombination events, and the roles of Msh1 appear to be related to the suppression of homeologous recombination [5,68]. Thus, Msh1 resembles a minimal MutS/MutL complex, in which the GIY-YIG endonuclease may play the same role as the MutL endonuclease. In spite of its prevalent role in keeping a pristine plant mitochondrial genome, the only functional study of this protein comes from the characterization of its GIY-YIG domain. By itself this domain binds to branched DNA structures; however, the individual domain is not active as an endonuclease [151]. The proposed role of Msh1 in suppressing homeologous recombination resembles the role of MutS2 in *Helicobacter pylori*, which harbors an Smr domain that is a non-specific endonuclease [152,153].

#### *2.4. The Bacterial Gyrase, the Eukaryotic DNA Ligase, and the Archaeo-Eukaryotic PrimPol*

#### 2.4.1. The Bacterial-Like Plant Organellar Gyrase

Topoisomerases are divided into two types: type I topoisomerases transiently introduce ssDNA breaks, and type II topoisomerases transiently generate dsDNA breaks. DNA gyrase is a type II topoisomerase typically present in bacteria. This enzyme is a tetramer composed of two subunits each of the GyrA and GyrB proteins. Bacterial gyrases use ATP to introduce negative supercoils in DNA. Wall and coworkers discovered that flowering plants encode one gene for gyrA (At3g10690) and two functional genes for gyrB (At3g10270 and At5g04130) [154,155]. AtGyrA is targeted to mitochondria and chloroplasts, whereas the product of At5g04130 is targeted to mitochondria and was dubbed AtmtGyrB [154]. Both AtGyrA and the two AtmtGyrBs have a clear cyanobacterial origin [154].

Structural studies of bacterial gyrases show the coordination between GyrA and GyrB that drives cleavage of the DNA strands, strand passage between subunits, and ligation [156–158]. Heterologously purified AtGyrA/AtmtGyrB present supercoiling activity [155], and the bacterial origin of the plant organellar AtGyrA/AtmtGyrB makes them a target for the development of new herbicides based on quinolones [155]. Ciprofloxacin, a quinolone drug, is commonly used to induce specific DSBs in plant organelles, as the gyrase catalytic cycle is not completed [135,159]. However, bacterial DNA gyrases in complex with quinolone drugs pose a barrier for replication and transcription when bound to DNA, and it is possible that the DSBs result from the collision of replication forks [160]. As replication induces the formation of positive supercoils ahead of replication forks [161], the plant organellar DNA gyrase may control the formation of origins of replication and the rate of transcription.

#### 2.4.2. Nuclear DNA Ligase I Is Targeted to Organelles

*Arabidopsis thaliana* encodes three ATP-dependent DNA ligases, dubbed DNA ligase I, IV, and VI. Of these, DNA ligase I is located in the nucleus and mitochondria, DNA ligase IV is solely nuclear, and DNA ligase VI is possibly targeted to both nucleus and chloroplast [162,163]. Thus, in flowering plants, DNA ligase I (At1g08130.1) is the only ligase known to be targeted to mitochondria [163]. DNA ligase I from Arabidopsis (AtDNAligI) shares 46% amino acid identity with DNA ligase I from humans, and its mitochondrial targeting sequence is predicted to involve the first 53 amino acids [113]. The unique role of DNA ligase I in plants contrasts with the situation in metazoans, in which a specific DNA ligase, dubbed DNA ligase III, is the main DNA ligase in human mitochondria. However, this scenario appears to be specific to vertebrates; in lower eukaryotes, DNA ligase I is both a nuclear and a mitochondrial ligase [164,165]. DNA ligases I are structurally divided into three conserved domains: DNA binding, adenylation, and OB-fold. They also contain an N-terminal PCNA interaction motif, as the interaction between DNA ligase I and PCNA is crucial for efficient nick-sealing. Human DNA ligase I has a toroidal structure in which PCNA could be accommodated [166].

The ligase active site is assembled between amino acids from the DNA binding and adenylation domains. Those domains harbor six conserved motifs (I, III, IIIa, IV, V, and VI), including the active site lysine involved in the formation of the ligase–AMP intermediate [166,167]. As flowering plants appear to only have DNA ligase I in their mitochondria, this ligase is predicted to execute all nick-sealing reactions. ATLIG1 is an essential gene and, besides its role in DNA replication, it is involved in repairing single-strand breaks and DSBs [162]. A homology-based model of *A. thaliana* DNA ligase I using human DNA ligase I shows the predicted fold conservation between both proteins (Figure 12). The PCNA-interacting peptide (PIP box) motif, located at the N-terminal region of DNA ligases, is predicted to be absent in the mitochondrial isoform after its import into mitochondria (Figure 12). Although it is plausible that Arabidopsis DNA ligase I establishes a set of specific protein–protein interactions with protein partners in mitochondria, it is also possible that Arabidopsis DNA ligase I in mitochondria executes nick-sealing without the assistance of accessory proteins. Supporting this scenario, human mitochondrial DNA ligase III can be substituted by bacterial and viral ligases [168].

**Figure 12.** Structural comparison between HsDNAligI and AtDNAligI. (**A**) AtDNAligI has a shorter N-terminal region. However, the core structure that harbors the DNA binding domain (red), the adenylation domain (cyan), and the OB-fold domain (orange) is conserved between both ligases. (**B**) Homology model of AtDNAligI based on the crystal structure of human DNA ligase I (PDB ID: 1X9N).

#### 2.4.3. Plant PrimPol

Three independent groups discovered that eukaryotic cells harbor a novel primase from the archaeo-eukaryotic primase (AEP) superfamily [169–171]. This enzyme is homologous to eukaryotic primases, but harbors both primase and polymerase activities in a single polypeptide and therefore was dubbed PrimPol [169–171]. PrimPol contains independent AEP and zinc finger domains; the first domain is responsible for template-dependent nucleotide incorporation and the second domain provides a mechanism to recognize single-stranded DNA templates [170,172–174]. Human PrimPol localizes to the nucleus and mitochondria [170]. In human mitochondria, this enzyme is not involved in primer synthesis during mitochondrial replication, but in negotiating DNA lesions by repriming and translesion DNA synthesis [169,175]. *Arabidopsis thaliana* harbors a PrimPol ortholog (AtPrimPol, At5g52800). This enzyme is potentially a translesion synthesis DNA polymerase capable of primer synthesis at specific single-stranded DNA sequences (Figure 13). This enzyme harbors localization signals for the nucleus, the mitochondria, and the chloroplast, suggesting that it may play a role in translesion DNA synthesis in each genome.

**Figure 13.** AtPrimPol resembles HsPrimPol. (**A**) Both AtPrimPol and HsPrimPol share a modular organization. AtPrimPol contains an N-terminal sequence for dual organellar targeting. (**B**) Structural model of the archaeo-eukaryotic primase (AEP) domain of AtPrimPol. The structural model was built based on the crystal structure of the AEP domain of HsPrimPol.

#### **3. Known Unknowns in Plant Mitochondrial Replication**

#### *3.1. Mitochondrial DNA Replication Is Mosaic and Redundant*

Plant mitochondrial DNA replication is carried out by mosaic and redundant elements (Tables 1–3). For instance, two DNA polymerases (AtPolIA and AtPolIB) are capable of executing DNA replication; at least three different processes may exist for DNA unwinding: (a) direct unwinding by AtTwinkle, (b) direct unwinding by RadA, and (c) intrinsic unwinding by AtPolIs due to their strong strand-displacement activities; and five different processes (double-stranded breaks, abortive transcription by mitochondrial RNA polymerases, and primer synthesis by AtTwinkle, AtTwinky, and AtPrimPol) could generate the 3′-OHs needed to start replication. Thus, it is not surprising that few genes involved in mitochondrial DNA replication are essential.
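The degree of redundancy described above can be made concrete with a purely combinatorial sketch. The snippet below simply enumerates every pairing of the polymerases, unwinding processes, and 3′-OH sources listed in the text. The variable names are illustrative, and the count is an upper bound on formal combinations, not a claim that every combination operates in vivo.

```python
from itertools import product

# Components as listed in the text; the labels are illustrative only.
POLYMERASES = ["AtPolIA", "AtPolIB"]
UNWINDING = ["AtTwinkle", "RadA", "AtPolI strand displacement"]
PRIMING = [
    "double-stranded break",
    "abortive transcription (mitochondrial RNA polymerase)",
    "AtTwinkle primase",
    "AtTwinky primase",
    "AtPrimPol primase",
]

# Every formal (polymerase, unwinding, 3'-OH source) combination.
# Assumption: the three choices are treated as independent, which the
# text does not establish; some pairings may be biologically irrelevant.
combinations = list(product(POLYMERASES, UNWINDING, PRIMING))

print(f"Formal combinations: {len(combinations)}")
for polymerase, unwinding, priming in combinations[:3]:
    print(f"{polymerase} + {unwinding} + {priming}")
```

With two polymerases, three unwinding processes, and five 3′-OH sources, losing any single component still leaves many formal routes available, which is consistent with the observation that few of these genes are essential.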

In the coordinated leading- and lagging-strand DNA synthesis model, an RNA polymerase synthesizes long RNA primers at unknown replication origins, AtTwinkle assembles at the single-stranded region, and these RNA primers are extended by a leading-strand AtPolI. AtTwinkle coordinates leading- and lagging-strand synthesis by its primase activity. In the recombination-dependent replication system, a double-stranded break is resected and could be coated with AtRecAs. AtRecA would be responsible for finding a homologous region in a double-stranded DNA segment. During AtRecA binding, the plant helicase AtRadA may bind to the single-stranded DNA, assembling a replisome upon interaction with AtPolIA or AtPolIB (Figure 14).

**Figure 14.** Putative models for DNA replication in plant mitochondria. (**A**) Leading- and lagging-strand DNA synthesis. (**B**) Recombination-dependent replication systems in plant mitochondria.

In contrast to metazoan mitochondria, in which the four enzymes responsible for its replication are clearly related to enzymes from T-odd bacteriophages, plant mitochondria harbor enzymes with clear bacterial origin (DNA gyrase), proteins solely present in plant mitochondria (Msh1, OSBs, Why), and enzymes related to bacteriophages (AtTwinkle). This redundant and mosaic system may be responsible for the peculiarities present in plant mitochondrial genomes.

The study of DNA metabolism in plant mitochondria is in its infancy. We do not know how DNA replication in plant mitochondria starts or whether plant mitochondrial genomes need an origin of replication, and our knowledge of the physical interactions between the proteins involved in mitochondrial DNA metabolism is practically null. The classic view of the need for an origin of replication comes from the study of DNA replication in *E. coli*, where the initiator protein DnaA binds to specific sequences to drive replication initiation. In metazoan mitochondria, the mitochondrial RNA polymerase synthesizes RNA primers for the heavy and light chains, and it is generally accepted that yeast mitochondria start their replication at double-stranded breaks.

#### *3.2. How Is the Accessibility to Single-Stranded DNA Regulated?*

A recent proteomic analysis shows that AtRecA2, AtSSB1, AtSSB2, AtWhy2, AtOSB3, and AtOSB4 are among the most abundant proteins in plant mitochondria [55]. In solution, AtSSB1, AtWhy2, and AtOSB2 assemble as tetramers, although AtOSB2 readily forms higher-order complexes (possibly 8-mers or 16-mers) [59]. Surprisingly, AtWhy2 assembles as 24-mers in the presence of long segments of ssDNA (more than 7 kb) [140]. The careful study by Fuchs and coworkers reveals that plant mitochondria contain approximately 140 tetramers of AtSSBs, 45 tetramers of AtOSB3 or AtOSB4, and 240 tetramers of AtWhy2 [59,128]. The abundance of AtWhy2 correlates with the fact that plants devoid of this protein accumulate DNA rearrangements mediated by microhomologous regions in the presence of agents that create DSBs [135,140]. Although no cellular studies using AtOSB2 or AtOSB3 have been carried out to date, AtOSB1 mutants accumulate homologous recombination products at repeats that are not commonly used [126]. Given that single-stranded regions of mitochondrial DNA are coated with AtSSB2s, AtWhy2, AtOSB2, and AtOSB3, it is unknown how these proteins are removed. A possible mechanism involves AtODB1; however, AtODB1 lacks the C-terminal domain involved in protein–protein interactions. Thus, it is unknown if AtODB1 is able to displace ssDNA binding proteins like AtWhy2, AtSSBs, or AtOSBs from ssDNA or if AtSSBs interact with AtRecA2 to promote filament assembly.
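As a small worked example of the stoichiometry quoted above, the following sketch converts the approximate tetramer counts into monomer counts and estimates how many 24-mer assemblies the AtWhy2 pool could form at most. The dictionary keys and variable names are illustrative, and the calculation assumes that every subunit is available for assembly, which the source does not state.

```python
# Approximate tetramer counts quoted in the text.
TETRAMER_COUNTS = {
    "AtSSBs": 140,
    "AtOSB3 or AtOSB4": 45,
    "AtWhy2": 240,
}

SUBUNITS_PER_TETRAMER = 4
SUBUNITS_PER_WHIRLY_24MER = 24  # oligomeric state reported on long ssDNA


def monomers_from_tetramers(tetramers: int) -> int:
    """Convert a number of tetramers into the number of monomers."""
    return tetramers * SUBUNITS_PER_TETRAMER


monomer_counts = {
    name: monomers_from_tetramers(count)
    for name, count in TETRAMER_COUNTS.items()
}

# Upper bound on 24-mers, assuming (hypothetically) that all AtWhy2
# subunits are free to assemble on ssDNA.
max_why2_24mers = monomer_counts["AtWhy2"] // SUBUNITS_PER_WHIRLY_24MER

for name, count in monomer_counts.items():
    print(f"{name}: about {count} monomers")
print(f"AtWhy2 could form at most {max_why2_24mers} 24-mers")
```

Under these assumptions the AtWhy2 pool corresponds to roughly 960 monomers, or at most 40 24-mer assemblies, which gives a sense of how few higher-order complexes would be available to coat long ssDNA regions.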

#### *3.3. Open Questions in Plant Mitochondrial DNA Replication*

It is puzzling how the open reading frames in plant mitochondria exhibit low substitution rates, while their non-coding regions are highly variable [6,9]. Mitochondrial DNA in land plants exists as linear molecules and it is proposed that neighboring DNA molecules can act as a template to avoid mutations [6]. If this is the case, it is unknown how the correct sequence is selected, given that plant mitochondrial DNA is not methylated. Furthermore, plant organellar DNA polymerases in Arabidopsis present a gradient of almost 10-fold in replication fidelity [134], and it is unknown if post-translational modifications can affect their interaction with other proteins and their biochemical properties.

Several studies indicate the presence of non-homologous end joining (NHEJ) repair signatures in plant mitochondria. However, the key components of this route, the Artemis and Ku proteins, are not targeted to plant mitochondria, and the mechanisms by which an NHEJ-like route operates in plant mitochondria are unknown. Recent work using hybrid mitochondrial cell lines discovered that changes in the human epigenome are driven by modifications in the mitochondrial genome [176]. Does the highly recombinogenic nature of plant mitochondrial DNA confer an evolutionary advantage for flowering plants as a hub for adaptation?

**Funding:** Work in the L.G.B. laboratory is supported by grants SEP-CINVESTAV-63 and CONACYT-253737.

**Acknowledgments:** Cei Abreu for critical reading and Víctor Juárez for Figure 14.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Factors Affecting Organelle Genome Stability in** *Physcomitrella patens*

#### **Masaki Odahara**

Biomacromolecules Research Team, RIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan; masaki.odahara@riken.jp; Tel.: +81-48-462-1111

Received: 16 December 2019; Accepted: 21 January 2020; Published: 23 January 2020

**Abstract:** Organelle genomes are essential for plants; however, the mechanisms underlying the maintenance of organelle genomes are incompletely understood. Using the basal land plant *Physcomitrella patens* as a model, nuclear-encoded homologs of bacterial-type homologous recombination repair (HRR) factors have been shown to play an important role in the maintenance of organelle genome stability by suppressing recombination between short dispersed repeats. In this review, I summarize the factors and pathways involved in the maintenance of genome stability, as well as the repeats that cause genomic instability in organelles in *P. patens*, and compare them with findings in other plant species. I also discuss the relationship between HRR factors and organelle genome structure from the evolutionary standpoint.

**Keywords:** chloroplast; mitochondrion; genome stability; homologous recombination repair; repeated sequence; *Physcomitrella patens*

#### **1. Introduction**

*Physcomitrella patens* is a moss (bryophyte) that has been used as a model species for studying cell growth and differentiation [1]. Additionally, *P. patens* is recognized as a model for land plants because it is located at the base of the land plant lineage [2]. The life cycle of *P. patens* is simple and mostly haploid. Germinated spores of *P. patens* produce filamentous protonemal cells comprising chloronemal and caulonemal cells, which subsequently produce gametophores with leafy shoots. Sporophyte, the only diploid phase in the life cycle of *P. patens*, is developed from zygotes, archegonia, and antheridia, which are formed at the top of gametophores. Nuclear DNA of *P. patens* shows exceptionally high activity of homologous recombination, which enables its use for gene targeting in combination with polyethylene glycol-mediated protoplast transformation [3]. This feature, together with its haploid vegetative growth phase and recent advances in nuclear genome analysis, has accelerated reverse genetic analyses in *P. patens* [2,4].

Each *P. patens* cell harbors ≈50 large spindle-shaped chloroplasts and many rod- or sphere-shaped mitochondria. Chloroplasts and mitochondria in *P. patens*, as in other plant species and algae, possess their own DNA, which associates with proteins to form nucleoids. The mitochondrial DNA (mtDNA) of *P. patens* is 105 kb in size and harbors genes encoding transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), and proteins that regulate gene expression and oxidative phosphorylation [5]. The mapped mitochondrial genomes of angiosperms are larger than that of *P. patens*; however, they are shown to form complicated structures including linear, branched, and circular structures [6]. Moreover, homologous recombination between repeats longer than 1 kb, which are frequently observed in angiosperm mtDNA, makes them a more complicated structure. By contrast, *P. patens* mtDNA forms a single circular structure because of the absence of repeats longer than 80 bp [5,7–9]. The chloroplast DNA (cpDNA) of *P. patens* is 123 kb in size and contains genes encoding tRNAs, rRNAs, and proteins including subunits of RNA polymerase- and photosynthesis-related proteins [10]. The cpDNA of *P. patens* exhibits a typical circular structure with large single-copy (LSC) and small single-copy (SSC) regions separated by a pair of large inverted repeat (IR) regions [10]. Except for the large IR regions (9.6 kb each), the longest dispersed repeat in *P. patens* cpDNA is 63 bp in size, with a 3 bp mismatch [11]. Notably, neither mtDNA nor cpDNA encodes proteins that are involved in DNA replication, recombination, and repair; instead, proteins involved in these processes are encoded by nuclear DNA, similar to a large number of proteins that function in chloroplasts and mitochondria.

#### **2. Plant Homologs of Bacterial Proteins and Their Localization**

Because chloroplasts and mitochondria are derived from bacteria, the internal contents of these organelles resemble those of prokaryotes. Although orthologs of bacterial proteins function in chloroplasts and mitochondria, most of the chloroplast and mitochondrial proteins are encoded by nuclear DNA because of gene transfer during evolution. In bacteria, homologous recombination repair (HRR) proteins repair DNA double-strand breaks and collapsed or stalled replication forks. Homologs of bacterial HRR factors are also found in the nuclear genome of *P. patens* and that of other plant species. The N-termini of these HRR factors contain signal peptides that target the proteins to chloroplasts and/or mitochondria. Interestingly, such bacterial-type HRR factors have not been found in animal or yeast nuclear genomes [8,12–14], implying the existence of plant-specific mechanisms underlying organelle DNA maintenance by HRR. Table 1 summarizes plant homologs of bacterial HRR factors and MutS homolog 1 (MSH1; involved in organelle genome stabilization) in *P. patens* and other plant species, including *Chlamydomonas reinhardtii* and *Arabidopsis thaliana*, which are representative models of green algae and angiosperms, respectively. Nuclear genomes of *P. patens* and other plant species encode several homologs of bacterial HRR factors, although some homologs have not been identified in the genomes of *P. patens* and other plant species on the basis of sequence similarity.


**Table 1.** Summary of homologous recombination repair (HRR) factors and MutS homolog 1 (MSH1) in *Escherichia coli* and their plant homologs.

RecA is a key factor in HRR, as it binds to single-stranded DNA (ssDNA) and identifies homologous sequences to perform strand exchange between them [27]. Nuclear DNA of *P. patens* encodes two types of RecA homologs, RECA1 and RECA2, which show moderate sequence similarity. Phylogenetic analysis shows that these two RECA proteins cluster with either cyanobacterial RecA or proteobacterial RecA in separate clades, suggesting that these proteins have different origins, that is, RECA1 from α-proteobacteria and RECA2 from cyanobacteria [13]. Products of the *RECA1* and *RECA2* genes expressed from the nuclear DNA are predominantly localized to mitochondria and chloroplasts, respectively, thus reflecting their predicted origins [13,18]. When full-length RECA1 and RECA2 proteins are transiently produced in protoplasts, they form granular structures that associate with organelle nucleoids [8,18], indicating that these proteins constantly associate with and/or act on nucleoids. Consistent with this hypothesis, chloroplast RecA was shown to associate with the chloroplast nucleoid in a nucleoid-enriched proteome analysis in maize [28]. Interestingly, although HRR factors are generally encoded by single conserved genes in plants, the copy number of *RECA* varies among plant species. Although *A. thaliana* and other flowering plants harbor multiple copies of the *RECA* gene, and the encoded proteins localize to chloroplasts and/or mitochondria, algae, including *C. reinhardtii*, harbor a single *RECA* gene copy, and the encoded RecA homolog localizes to chloroplasts [12] (Table 1).

RecG, a DNA helicase/translocase, functions in the rescue of branched DNA structures including stalled replication forks [29]. The nuclear genome of *P. patens* harbors a single copy of the *RECG* gene [14]. Phylogenetic analysis shows that plant RecG homologs, including *P. patens* RECG, are closely related to cyanobacterial RecG, suggesting that these proteins originated from cyanobacteria [23]. The RECG protein of *P. patens* harbors an ambiguous N-terminal signal peptide but localizes to both chloroplasts and mitochondria, similar to the *A. thaliana* RecG homolog, RECG1 [14,23]. Moreover, full-length *P. patens* RECG protein localizes to nucleoids of both organelles [14].

Unlike RecA and RecG, RecX does not act directly on DNA but participates in HRR by directly regulating RecA activity [30]. Although RecX is absent from several bacterial classes including α-proteobacteria and cyanobacteria [31], it is encoded by single-copy genes present in the nuclear genomes of diverse plants ranging from green algae to angiosperms [8]. Because the evolutionary origin of plant RecX homologs is difficult to analyze, it is unclear whether α-proteobacteria and cyanobacteria lost their RecX or plants acquired RecX via horizontal gene transfer. In protoplasts, a fluorescent protein-tagged RecX homolog of *P. patens*, RECX, localizes to mitochondrial and chloroplast nucleoids, thereby co-localizing with RECA1 and RECA2, respectively [8].

MSH is a eukaryotic homolog of bacterial MutS. Among several types of MSH proteins, MSH1 is the only protein that localizes to organelles [32,33]. MSH1 was originally identified in *A. thaliana* as a chloroplast mutator (CHM) protein because of the variegated phenotype of the mutant [34,35]. MSH1 is distinct from other MSH proteins and MutS because of the presence of a GIY-YIG endonuclease domain at its C-terminal end [21]. The nuclear genome of *P. patens* harbors two *MSH1* genes, *MSH1A* and *MSH1B*, although nuclear genomes of other plants carry only one *MSH1* gene copy. Because MSH1A lacks the C-terminal endonuclease domain, the *P. patens MSH1* genes are thought to be derived by gene duplication, with loss of the C-terminal endonuclease domain from one copy after the duplication event [25]. Both *P. patens* MSH1 proteins (MSH1A and MSH1B) localize to organelle nucleoids by forming granular structures [25], similar to the MSH1 localization pattern in *A. thaliana* [26].

#### **3. Maintenance of Mitochondrial Genome Stability by HRR and MSH1**

#### *3.1. RECA*

*P. patens* mitochondrial *RECA1* knockout (KO) mutants generated by targeted gene disruption show severe defects in protonema cells, with less-developed gametophores and defective mitochondria characterized by an enlarged shape, disorganized cristae, and lower matrix electron density [7], indicating that *RECA1* is required for normal growth. The mitochondrial genome of the *P. patens RECA1* KO mutant is destabilized by the accumulation of products derived from aberrant recombination between short repeats dispersed throughout the mtDNA [7]. Most of the 24 pairs of repeats (≥30 bp) identified in *P. patens* mtDNA are involved in recombination in *RECA1* KO plants [8], occasionally leading to the generation of subgenomes [7]. Interestingly, because most of the repeats are located in introns of genes in the direct orientation, recombination between them leads to the loss of genes and generation of subgenomes, which may be subsequently lost, as these are not replicated. Thus, copy number variation of loci resulting from the loss of subgenomes is associated with instability of mtDNA in the *RECA1* KO mutant [14]. Collectively, these findings show the role of RECA1 in maintaining mtDNA stability by suppressing aberrant recombination between short dispersed repeats (SDRs) in *P. patens*. Additionally, defects in the recovery of mtDNA damaged by methyl methanesulfonate (MMS) in *RECA1* KO plants suggest the involvement of RECA1 in the repair of exogenously damaged mtDNA [13].
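The outcome of a crossover between direct repeats can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the genome is treated as a single circular molecule written as a string, exactly two copies of the repeat are present, and the function name and example sequence are hypothetical. It is not the analysis used in the cited studies.

```python
def recombine_direct_repeats(genome: str, repeat: str) -> tuple[str, str]:
    """Resolve one crossover between two direct repeat copies on a circle.

    Assumes `genome` is a circular molecule written from an arbitrary start
    and that `repeat` occurs exactly twice in the same orientation
    (hypothetical toy input).
    """
    first = genome.find(repeat)
    second = genome.find(repeat, first + len(repeat))
    if first == -1 or second == -1:
        raise ValueError("two direct copies of the repeat are required")
    # Subcircle A: one repeat copy plus the segment lying between the copies.
    circle_a = genome[first:second]
    # Subcircle B: the other repeat copy plus the remainder of the molecule.
    circle_b = genome[second:] + genome[:first]
    return circle_a, circle_b


# Toy example: a 16 bp "genome" with two direct copies of GATC.
sub_a, sub_b = recombine_direct_repeats("AAGATCTTTTGATCCC", "GATC")
print(sub_a, sub_b)
```

Each resulting subcircle retains one repeat copy, and sequences that were contiguous across a repeat in the parental molecule end up on different subcircles. If one subcircle is not replicated, the loci it carries are lost, which corresponds to the copy number variation described above.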

In *A. thaliana*, two RecA homologs, RECA2 and RECA3, localize to mitochondria (Table 1). In comparison with RECA2, RECA3 is more diverged from other RECAs and has a truncated C-terminus, which is considered unusual because the C-terminus of RecA is important for its function [21,36]. Consistent with the gene structure, *A. thaliana RECA2* mutants are seedling-lethal, thus indicating the importance of RECA2 for normal plant growth; by contrast, *RECA3* mutants are almost indistinguishable from the wild type [21]. Both *RECA2* and *RECA3* mutants accumulate products derived from recombination between intermediate-sized (100–300 bp) repeats in mtDNA, and the number of repeats involved in recombination in *RECA2* mutants exceeds that in *RECA3* mutants [36]. Although recombination between shorter repeats (<100 bp) has not been tested in *A. thaliana RECA2* and *RECA3* mutants, the aforementioned findings suggest a fundamental role of plant mitochondrial RecA homologs in maintaining mitochondrial genome stability by suppressing aberrant recombination between short repeats.

#### *3.2. RECG*

KO mutation of the *P. patens RECG* gene leads to growth and morphological defects that are similar to but milder than those caused by the KO mutation of *RECA1* [14]. The *RECG* KO mutant plants exhibit abnormal mitochondria, with disorganized cristae and lower matrix density. Moreover, mtDNA of the *RECG* KO mutant is destabilized by SDR-mediated recombination, similar to the mtDNA of the *RECA1* KO mutant, and the length of repeats involved in recombination is also similar between *RECA1* and *RECG* KO mutants [14]. However, the repeats involved differ in part between *RECA1* and *RECG* KO mutants; for example, at the mitochondrial *atp9* locus, recombination between *ccmF* and *atp9* mediated by 47 bp repeats leads to product accumulation in mitochondria of the *RECG* KO mutant, whereas recombination between *nad2* and *atp9* mediated by 60 bp repeats, which is a hallmark of recombination induced by the *RECA1* KO mutation [7], does not lead to product accumulation in mitochondria of the *RECG* KO mutant [14]. Furthermore, the increase in copy numbers of all tested loci in the *RECG* KO mutant differs from that in the *RECA1* KO mutant. These differences suggest that RECG of *P. patens* plays a somewhat different role from that of RECA1 in the maintenance of mtDNA stability. Because the amount of mitochondrial recombination products often shows a direct correlation with the heterogeneous growth defects of *RECG* KO plants, recombination between mitochondrial SDRs is considered the cause of the morphological phenotypes [14]. As a result of the mtDNA rearrangements induced by the KO mutation of *RECG*, recombination between repeats located in introns of mitochondrial genes decreases the level of mitochondrial transcripts [14]. Although *A. thaliana RECG1* mutants are morphologically indistinguishable from wild-type plants under normal growth conditions, they show mtDNA instability because of aberrant recombination between intermediate-sized repeats (100–500 bp in length) [23]. Thus, RECG1 participates in the suppression of recombination between intermediate-sized repeats, and the loss of *RECG1* leads to the accumulation of recombination products. Although recombination between shorter repeats has not been analyzed in *A. thaliana RECG1* mutants, the available data indicate that RecG homologs act in recombination surveillance, suppressing aberrant recombination between short and/or imperfect repeats in plant mitochondria.

#### *3.3. RECX*

KO mutation of *P. patens RECX*, which leads to no significant morphological phenotypes, results in a minor but reproducible increase in products derived from recombination between several pairs of mitochondrial SDRs [8], suggesting the involvement of RECX in the maintenance of mtDNA stability. Overexpression (OEX) of *P. patens RECX* in plants leads to mtDNA instability because of the induction of recombination between many pairs of SDRs, sometimes at a level comparable to the mtDNA instability in the *RECA1* KO mutant [8]. Taking into account the protein–protein interaction between *P. patens* RECX and RECA1, as revealed by yeast two-hybrid assays, RECX is believed to modulate the function of RECA1 by directly binding to it, thereby maintaining mtDNA stability rather than inducing mtDNA instability in the wild type. The involvement of *RECX* in the maintenance of mtDNA stability is also supported by the positive correlation between the expression of *RECX* and other mtDNA stabilizing genes, including *RECA1* and *RECG*, in several tissues of *P. patens* [8]. Interestingly, the expression of *RECX*, *RECA1*, *RECG*, and *MSH1B* is highly increased in *P. patens* spores, suggesting roles for these genes in mtDNA maintenance during transmission to progeny.

#### *3.4. MSH1*

Because *P. patens* unusually possesses two *MSH1* genes, single and double KO mutants of *MSH1* genes were generated. Although the single and double *MSH1* mutants showed no significant phenotypes compared with the wild type, comparison among the mutants shows an involvement of *MSH1B* in the maintenance of mtDNA [25]. In the single *MSH1B* KO mutant and the *MSH1A MSH1B* double KO mutant, mtDNA is similarly destabilized by the induction of recombination between mitochondrial repeats (21–69 bp in length) that overlap with those in *P. patens RECA1* or *RECG* KO mitochondria. On the other hand, at the mitochondrial *atp9* locus, the *MSH1B* mutant accumulates products derived from recombination between *nad2* and *atp9* (a hallmark of the *RECA1* KO mutant) rather than products derived from recombination between *ccmF* and *atp9* (a hallmark of the *RECG* KO mutant), suggesting a similar mechanism of mtDNA stabilization between MSH1B and RECA1; the *MSH1 RECA1* double KO mutant is likely lethal [25]. Genetic interaction between *P. patens MSH1B* and *RECG* loci, as shown by epistatic analysis of the suppression of recombination, suggests that MSH1B and RECG act in distinct pathways that converge at a node in mitochondria [25]. The importance of the GIY-YIG endonuclease domain of MSH1 for the suppression of recombination is indicated by its deletion mutants; on the other hand, no significant phenotypes are observed in the KO mutant of *MSH1A*, which lacks the endonuclease domain [25]. The instability of mtDNA in *A. thaliana MSH1* mutants is well characterized; in these mutants, recombination is observed between 50–556 bp repeats, and the length of these repeats overlaps with that of repeats responsible for mtDNA instability in the *P. patens MSH1B* KO mutant [21,32,37]. Moreover, the difference in mtDNA rearrangements between *A. thaliana MSH1* mutants and *RECA3* mutants, as well as the highly pronounced phenotypes of the *MSH1 RECA3* double KO mutants, suggest that these genes act in distinct but overlapping pathways [21]. Recent biochemical characterization of the GIY-YIG domain of *A. thaliana* MSH1 shows that it binds to a branched DNA structure, suggesting a mechanism for the suppression of recombination between repeats [38].

#### **4. Maintenance of Chloroplast Genome Stability by HRR Proteins and MSH1**

#### *4.1. RECA*

KO mutation of *P. patens RECA2* results in modest growth inhibition under glucose-deficient conditions and increased sensitivity to MMS or ultraviolet (UV) radiation, both of which cause DNA damage [11]. These phenotypes of the *RECA2* KO mutant are in contrast to those of the *RECA1* KO mutant of *P. patens*, which shows severe growth defects under normal conditions. However, despite the slight effect of *RECA2* KO mutation on the morphology of *P. patens*, the cpDNA of the *RECA2* KO mutant is destabilized by the induction of recombination between SDRs (13–63 bp in length) [11]. This shows that RECA2 is involved in the maintenance of chloroplast genome stability by suppressing recombination between SDRs. Moreover, roles of RECA1 and RECA2 in mitochondria and chloroplasts suggest the common role of RecA homologs in maintaining organelle genome stability by suppressing aberrant recombination between SDRs. Because *P. patens* cpDNA has fewer relatively long (>35 bp) repeats, the lack of RecA homologs may lead to a slight effect on the stability of cpDNA compared with that of mtDNA. Impaired recovery of damaged cpDNA, but not that of nuclear DNA or mtDNA, in *P. patens RECA2* KO mutants suggests another role of RECA2 in the maintenance of cpDNA stability by promoting recovery from DNA damage [11]. In contrast to the modest phenotypes of *P. patens* lacking chloroplast RECA, the deficiency of chloroplast RECA (RECA1) in *A. thaliana* plants (Table 1) is lethal [21]. *A. thaliana* T-DNA insertion *RECA1* mutants in which the level of *RECA1* transcripts is decreased to 15% of that in the wild type suggest that RECA1 is involved in the maintenance of cpDNA integrity by maintaining the quantity and multimeric structure of cpDNA [39]. *A. thaliana* RECA1 also maintains cpDNA stability by preventing cpDNA rearrangements in plants carrying a mutation in *Whirly* genes, which encode a family of ssDNA-binding proteins that suppress cpDNA rearrangements [40,41]. Chloroplast RECA in *C. reinhardtii* (Table 1) is also involved in the maintenance of chloroplast genome stability by suppressing aberrant recombination between SDRs, and it regulates the dynamics of the chloroplast nucleoid, including segregation [42].

#### *4.2. RECG*

Because the morphological defects of *RECG* KO mutant plants are similar to those of *RECA1* KO mutant plants, the defects of *RECG* KO plants are mainly attributed to defects in mtDNA. However, KO mutation of *RECG* leads to abnormal chloroplasts that over-accumulate starch and possess less-developed thylakoids, implying defects in chloroplast function [14]. Indeed, both cpDNA and mtDNA of the *RECG* KO mutant are destabilized by the induction of recombination between SDRs. The repeats involved in recombination are largely shared between the cpDNA of *RECG* and *RECA2* KO mutants, although the accumulation of recombination products is higher in the *RECG* KO mutant than in the *RECA2* KO mutant [14]. These results suggest that RECG maintains chloroplast genome stability by suppressing recombination between a broad range of repeats in cpDNA. Both synergistic and suppressive relationships are observed between *RECG* and *RECA2*, with respect to the suppression of recombination between chloroplast repeats, depending on the type of repeats [25], suggesting a complex relationship between these genes. Thus, *RECG* and *RECA2* may act in distinct pathways or in the same pathway, depending on the repeats, to suppress recombination. *A. thaliana* RECG1 localizes to chloroplasts; however, evidence indicating the involvement of RECG1 in the maintenance of chloroplast genome stability is lacking [23].

#### *4.3. RECX*

Although RECX localizes to chloroplast nucleoids, significant phenotypes have not been observed in the chloroplasts of *P. patens RECX* KO mutants and OEX plants. These KO and OEX plants show a basal level of products derived from recombination between chloroplast SDRs, in contrast to *P. patens RECA2* KO plants, which accumulate these recombinant products to high levels [8]. However, yeast two-hybrid assays show protein–protein interaction between *P. patens* RECX and RECA2, which is stronger than that between RECX and RECA1 [8]. This implies that RECX may interact with RECA2 and modulate its activity to maintain chloroplast genome stability, and the effect of *RECX* KO mutation or OEX was not evident probably because of the moderate effect of RECA2 inhibition on cpDNA.

#### *4.4. MSH1*

Similar to its mitochondrial genome, the chloroplast genome of the *P. patens MSH1B* KO mutant is destabilized by recombination between 28–63 bp SDRs [25]. KO mutation of the *MSH1A* gene does not increase the abundance of recombination products in either the wild-type or the *MSH1B* KO background, indicating that *MSH1B* plays a predominant role in the suppression of recombination between SDRs in chloroplasts and mitochondria [25]. Interestingly, the level of recombination products in chloroplasts varies among the *P. patens MSH1B*, *RECA2*, and *RECG* KO mutant plants, depending on the type of repeats. Among these KO mutants, the level of products resulting from recombination between direct repeat-1 (DR-1) is the highest in *RECG* KO mutants, whereas the level of products resulting from recombination between inverted repeat-1 (IR-1) is the highest in *MSH1B* KO mutant plants [25]. This suggests a complicated regulation of recombination in chloroplasts. Similar complexity is also observed in the genetic interactions between genes: synergistic relationships between *MSH1B* and *RECG* and between *MSH1B* and *RECA2* have been observed for DR-1 but not for IR-1 [25]. Figure 1 summarizes the factors affecting organelle genome stability and their relationships in *P. patens*. In *A. thaliana MSH1* mutants, cpDNA rearrangements at a locus containing a number of small repeats (<15 bp) indicate the involvement of MSH1 in maintaining chloroplast genome stability, although the details of these rearrangements remain unclear [26].

**Figure 1.** Factors affecting organelle genome stability in *P. patens*. Factors involved in organelle genome stability are summarized together with their relationships. The localization of each protein is shown by color: green (chloroplasts), red (mitochondria), and white (chloroplasts and mitochondria). Suppression and genetic relationships are shown by solid and dashed lines, respectively. RECX shows protein–protein interaction with RECA2, but its involvement in chloroplast genome stability remains unclear.

#### **5. Organelle Genome Structure, Repeats, and HRR Proteins**

Recent evidence in various plant species suggests that HRR factors in chloroplasts and mitochondria function exclusively in the maintenance of genome stability by suppressing recombination between ectopic loci containing repeats, as summarized above. Because the phenomena of genome destabilization are common between mutants of organelle HRR factors, these factors likely function in the same suppression pathway. However, epistatic analyses of recombination suppression sometimes show that these factors act in distinct pathways [25]. Plant organelle HRR factors are thought to function in the repair of stalled or collapsed replication forks, which are prone to rearrangements in mutants [7]. Because such stalling and collapse of replication forks are caused by various types of DNA damage, the pathways of suppression in organelles may be regulated in a complicated manner. On the other hand, as shown in Table 1, not all HRR factors are conserved in plants, and some are absent in organelles of certain plant species; for example, mitochondrial RecA homologs are absent in some algae including *C. reinhardtii*, whereas copy numbers of mitochondrial RecA homologs are increased in various angiosperms including *A. thaliana* (Table 1) [12,13]. By contrast, chloroplast RecA copy numbers are conserved in plants (Table 1). Interestingly, the size and shape of mitochondrial genomes vary among plant species: *C. reinhardtii* possesses a 16 kb linear mitochondrial genome, whereas *A. thaliana* harbors a 368 kb multi-chromosome circular mitochondrial genome (Table 2). Moreover, the number of short repeats, which may cause organelle genome instability when HRR is lost, correlates with the size of the mitochondrial genome (Table 2). The presence/absence of RecA homologs may be correlated with the number and characteristics of repeats. Mitochondrial RecA homologs may be absent in algae because their mtDNA lacks significant repeats, whereas those in angiosperms may have been duplicated and functionally diverged to regulate recombination between more numerous and more divergent repeats; alternatively, duplication of mitochondrial RecA homologs may have enabled the increase in the number of repeats in angiosperms. Recent advances in genome sequencing of various plant species provide an opportunity for exploring the relationship between HRR factors and organelle genome structure.



Repeats were identified as direct or inverted repeats of ≥20 bp without mismatches by using REPuter [47].
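To make the repeat criterion concrete, the following is a minimal sketch of how exact direct and inverted repeats of a given minimum length can be located in a sequence. It is a simplified stand-in and not REPuter's algorithm: it only reports pairs of positions that share an exact `min_len`-mer (seed matches) and does not extend them to maximal repeats. The function names and the short test sequence are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_repeats(seq: str, min_len: int = 20):
    """Return (direct, inverted) lists of start-position pairs.

    A direct pair shares an identical min_len-mer; an inverted pair consists
    of a min_len-mer and its reverse complement. No mismatches are allowed.
    Palindromic k-mers (equal to their own reverse complement) are skipped.
    """
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    direct, inverted = [], []
    for kmer, positions in index.items():
        # Direct repeats: the same k-mer at two different positions.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                direct.append((positions[a], positions[b]))
        # Inverted repeats: report each k-mer / reverse-complement pair once.
        rc = reverse_complement(kmer)
        if kmer < rc and rc in index:
            for i in positions:
                for j in index[rc]:
                    inverted.append((i, j))
    return sorted(direct), sorted(inverted)


# Toy example with a 4 bp threshold instead of 20 bp.
direct, inverted = find_exact_repeats("ACGGATACGGCACCGT", min_len=4)
print(direct, inverted)
```

With the 20 bp default, a repeat longer than 20 bp is reported as a run of consecutive seed pairs; merging such runs would be needed to count repeats in the way a dedicated tool does.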

**Funding:** This work was funded by SUMITOMO Foundation (170946) and the Japan Society for the Promotion of Science (19K22405).

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Mitochondrial DNA Repair in an** *Arabidopsis thaliana* **Uracil N-Glycosylase Mutant**

#### **Emily Wynn 1,2, Emma Purfeerst 1,3 and Alan Christensen 1,\***


**\*** Correspondence: achristensen2@unl.edu; Tel.: +1-402-472-0681; Fax: +1-402-472-8722

Received: 19 December 2019; Accepted: 16 February 2020; Published: 18 February 2020

**Abstract:** Substitution rates in plant mitochondrial genes are extremely low, indicating strong selective pressure as well as efficient repair. Plant mitochondria possess base excision repair pathways; however, many repair pathways such as nucleotide excision repair and mismatch repair appear to be absent. In the absence of these pathways, many DNA lesions must be repaired by a different mechanism. To test the hypothesis that double-strand break repair (DSBR) is that mechanism, we maintained independent self-crossing lineages of plants deficient in uracil-N-glycosylase (UNG) for 11 generations to determine the repair outcomes when that pathway is missing. Surprisingly, no single nucleotide polymorphisms (SNPs) were fixed in any line in generation 11. The pattern of heteroplasmic SNPs was also unaltered through 11 generations. When the rate of cytosine deamination was increased by mitochondrial expression of the cytosine deaminase APOBEC3G, there was an increase in heteroplasmic SNPs but only in mature leaves. Clearly, DNA maintenance in reproductive meristem mitochondria is very effective in the absence of UNG while mitochondrial genomes in differentiated tissue are maintained through a different mechanism or not at all. Several genes involved in DSBR are upregulated in the absence of UNG, indicating that double-strand break repair is a general system of repair in plant mitochondria. It is important to note that the developmental stage of tissues is critically important for these types of experiments.

**Keywords:** mitochondria; DNA repair; double-strand break repair; uracil-N-glycosylase

#### **1. Introduction**

Plant mitochondrial genomes have very low base substitution rates but expand and rearrange rapidly [1–5]. The low substitution rate and the high rearrangement rate of plant mitochondria can be explained by selection and the specific DNA damage-repair mechanisms available. These mechanisms can also account for the genome expansions often found in land plant mitochondria [6]. The low nonsynonymous substitution rates in protein coding genes indicate that selective pressure to maintain the genes is high, and the low synonymous substitution rates indicate that the DNA-repair mechanisms are very accurate [7,8]. Despite the low mutation rate of mitochondrial genes over evolutionary time, mitochondrial genomes in mature cells accumulate DNA damage that is not repaired [9]. This indicates that there are fundamental differences between DNA maintenance in genomes meant to be passed on to the next generation and genomes that are not. In meristematic cells, mitochondria fuse together to form a large mitochondrion [10]. This fusion brings mitochondrial genomes together for genome replication but also ensures that there is a homologous template available for DNA repair. These meristematic cells eventually produce the reproductive tissue of a plant; from embryogenesis to egg cell production, the mitochondrial genomes inherited from parents and passed down to offspring will have homologous templates available to them [11].

Much less is known about the multiple pathways of DNA repair in plant mitochondria than in other systems, such as the nucleus. So far, there is no evidence of nucleotide excision repair (NER) or mismatch repair (MMR) in plant mitochondria [12,13]. It has been hypothesized that, in plant mitochondria, the types of DNA damage that are usually repaired through NER and MMR are repaired through double-strand break repair (DSBR) [14,15]. Plant mitochondria do have the nuclear-encoded base excision repair (BER) pathway enzyme Uracil DNA glycosylase (UNG) [12]. UNG is an enzyme that can recognize and bind to uracil in DNA and that can begin the process of base excision repair by enzymatically excising uracil (U) from single-stranded or double-stranded DNA [16]. Uracil can appear in a DNA strand due to the spontaneous deamination of cytosine or by the misincorporation of dUTP during replication [17]. Unrepaired uracil in DNA can lead to G-C to A-T transitions within the genome.

In light of the apparent absences of NER and MMR in plant mitochondria, it is possible that many lesions, including mismatches, are repaired by creating double-strand breaks and by using a template to repair both strands. Our hypothesis is that DSBR accounts for most of the repair in meristematic plant mitochondria and that both error-prone and accurate subtypes of DSBR lead to the observed patterns of genome evolution [18]. One way of testing this is to eliminate the pathway of uracil base excision repair and to ask if the G-U mispairs that occur by spontaneous deamination are repaired and, if so, are instead repaired by DSBR. In this work, we examine an *Arabidopsis thaliana UNG* knockout line and investigate the effects on the mitochondrial genome over many generations. To disrupt the genome further, we express the cytidine deaminase APOBEC3G in the *Arabidopsis* mitochondria (MTP-A3G) to increase the rate of cytosine deamination and to accelerate DNA damage.

One of the hallmarks of DSBR in plant mitochondria is the effect on the non-tandem repeats that exist in virtually all plant mitochondria [19]. The *Arabidopsis thaliana* mitochondrial genome contains two pairs of very large repeats (4.2 and 6.6 kb) that commonly undergo recombination [20–22], producing multiple isoforms of the genome. The mitochondrial genome also contains many non-tandem repeats between 50 and 1000 base pairs [19,22–24]. In wild type plants, these repeats recombine at very low rates, but they have been shown to recombine with ectopic repeat copies at higher rates in several mutants in DSBR-related genes, such as *msh1* and *reca3* [25–27]. Thus, genome dynamics around non-tandem repeats can be an indicator of increased DSBs. In this work, we show that a loss of uracil base excision repair leads to alterations in repeat dynamics, allowing us to observe an increase in genome abandonment in older leaves.

Numerous proteins known to be involved in the processing of plant mitochondrial DSBs have been characterized. Plants lacking the activity of mitochondrially targeted *recA* homologs have been shown to be deficient in DSBR [26,28]. In addition, it has been hypothesized that the plant MSH1 protein may be involved in binding to DNA lesions and in initiating DSBs [14,15]. The MSH1 protein contains a mismatch binding domain fused to a GIY-YIG type endonuclease domain which may be able to make DSBs [29,30], although an in vitro assay with a C-terminal fragment of the protein had no detectable endonuclease activity [31]. In this work, we provide evidence that, in the absence of mitochondrial UNG activity, several genes involved in DSBR, including *MSH1*, are transcriptionally upregulated, providing a possible explanation for the increased DSBR. We also provide additional evidence to support the hypothesis that mitochondrial DNA maintenance is abandoned in non-meristematic tissue [32], calling attention to the need to closely control for age and developmental state in experiments involving the mitochondrial genome.

#### **2. Results**

#### *2.1. Lack of UNG Activity in Mutants*

It has previously been reported that cell extracts of the *Arabidopsis thaliana* UNG T-DNA insertion strain used in this experiment, GK-440E07 (ABRC seed stock CS308282), show no uracil glycosylase activity [12]. To increase the rate of cytosine deamination in the mitochondrial genome and to show that effects of the UNG knockout on mitochondrial mutation rates could be detected, the catalytic domain of the human APOBEC3G–CTD 2K3A cytidine deaminase (A3G) [33] was expressed under the control of the ubiquitin-10 promoter [34] in both wild-type and UNG *Arabidopsis thaliana* lines and targeted to the mitochondria by an amino-terminal fusion of the 62 amino acid mitochondrial targeting peptide (MTP) from the alternative oxidase 1A protein. Fluorescence microscopy of *Arabidopsis thaliana* expressing an MTP-A3G-GFP fusion shows that the MTP-A3G construct is expressed and targeted to the mitochondria (Figure S1).

We expected that, in the absence of UNG, there would be an increase in G-C to A-T substitution mutations. To test this prediction, we sequenced a wild-type Arabidopsis plant (Col-0), a wild-type Arabidopsis plant expressing the MTP-A3G construct (Col-0 MTP-A3G), and a UNG plant expressing the MTP-A3G construct (UNG MTP-A3G) using an Illumina HiSeq 4000 system. Mitochondrial sequence reads from these plants were aligned to the Columbia-0 reference genome (modified as described in the Materials and Methods section) using BWA-MEM [35], and single nucleotide polymorphisms were identified using VarDict [36]. VarDict was chosen due to its high sensitivity and accuracy compared with other low-frequency variant callers when analyzing Illumina HiSeq data [37].

There were no SNPs that reached fixation (an allele frequency of 1) in any plant. Mitochondrial genomes are not diploid; each cell can have many copies of the mitochondrial genome. Therefore, it is possible that an individual plant could accumulate low-frequency mutations in some of the mitochondrial genomes in the cell. VarDict was used to detect heteroplasmic SNPs at allele frequencies as low as 0.01. VarDict's sensitivity in calling low-frequency SNPs scales with depth of coverage and quality of the sample, so it is not possible to directly compare heteroplasmic mutation rates in samples with different depths of coverage. However, because the activity of the UNG protein is specific to uracil, the absence of the UNG protein should not have any effect on mutation rates other than G-C to A-T transitions. Comparing the numbers of G-C to A-T transitions to all other substitutions should reveal if the rate of mutations that can be repaired by UNG is elevated compared to the background rate. If the UNG MTP-A3G line is accumulating G-C to A-T transitions at a faster rate than the Col-0 MTP-A3G line, we would expect to see an increased ratio of G-C to A-T transitions compared to other mutation types. Complicating the analysis, significant portions of the *A. thaliana* mitochondrial genome have been duplicated in the nucleus, forming regions called NuMTs, an abbreviation of Nuclear Mitochondrial DNA [38–40]. Mutations in the NuMTs might appear to be low-frequency SNPs in the mitochondrial genome, confounding the results. However, these mutations are likely to be shared in the common nuclear background of all our lines. To avoid attributing SNPs in NuMTs to the mitochondrial genome, only those SNPs unique to individual plant lines were used in this comparison. In addition, many of the shared SNPs were flanked by a number of paired-end reads with one end in the mitochondrial genome and the other in the nuclear genome, additional evidence that they are NuMTs. 
The Col-0 plant had a heteroplasmic GC-AT/total SNPs ratio of 0, the Col-0 MTP-A3G plant had a heteroplasmic GC-AT/total SNPs ratio of 0.47, while the UNG MTP-A3G plant had a heteroplasmic GC-AT/total SNPs ratio of 0.92 (Table 1). Therefore, when the rate of cytosine deamination is increased by the activity of APOBEC3G, Arabidopsis plants accumulate GC-AT SNPs and our computational pipeline is able to detect this increase.
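
The comparison described above (discard SNPs shared between lines as likely NuMTs, then take the fraction of the remaining SNPs that are G-C to A-T transitions) can be illustrated with a minimal Python sketch. The function names, the `(position, ref, alt)` tuple layout, and the example calls are hypothetical and are not taken from the actual analysis pipeline.

```python
from collections import Counter

# G-C to A-T transitions, as seen on either strand.
GC_TO_AT = {("G", "A"), ("C", "T")}

def unique_snps(calls_by_line):
    """Keep only SNPs seen in exactly one plant line; shared calls are treated as likely NuMTs."""
    seen = Counter(snp for calls in calls_by_line.values() for snp in set(calls))
    return {line: [s for s in calls if seen[s] == 1]
            for line, calls in calls_by_line.items()}

def gc_at_ratio(snps):
    """Fraction of SNPs that are G-C to A-T transitions; None when there are no SNPs."""
    if not snps:
        return None
    hits = sum((ref, alt) in GC_TO_AT for _pos, ref, alt in snps)
    return hits / len(snps)

# Hypothetical calls: (position, reference base, alternate base).
calls = {
    "Col-0": [(100, "A", "G"), (5000, "G", "A")],
    "Col-0 MTP-A3G": [(5000, "G", "A"), (720, "C", "T"), (810, "T", "C")],
    "UNG MTP-A3G": [(5000, "G", "A"), (1500, "G", "A"), (1600, "C", "T")],
}
unique = unique_snps(calls)  # the shared call at position 5000 is dropped everywhere
ratios = {line: gc_at_ratio(snps) for line, snps in unique.items()}
```

Because the ratio is computed within each sample, it does not depend directly on the absolute number of SNPs called, which is the reason given above for preferring ratios over raw counts when coverage differs.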

**Table 1.** Heteroplasmic mitochondrial single nucleotide polymorphisms (SNPs) in Col-0 wild-type, generation 10 *uracil DNA glycosylase* (*UNG*) mutant lines, Col-0 MTP-A3G, and *UNG* MTP-A3G: SNPs were called using VarDict as described in the Methods section. SNP counts are shown for the entire mitochondrial genome. For the full spectrum of SNP types, including allele frequencies, see Supplementary File 2.


#### *2.2. Mutation Accumulation in the Absence of UNG*

To determine the effects of the *UNG* knockout across multiple generations, we performed a mutation accumulation study [41]. We chose 23 different *UNG* homozygous plants derived from one hemizygous parent. These 23 plants were designated as generation 1 *UNG* and were allowed to self-cross. The next generation was derived by single-seed descents from each line, and this was repeated until generation 10 *UNG* plants were obtained. Leaf tissue and progeny seeds from each line were kept at each generation.

The leaf tissue from generation 10 of the *UNG* mutation accumulation lines and a wild-type Col-0 were sequenced and analyzed with VarDict as described above. Similar to the MTP-A3G plants, there were no SNPs in any of our *UNG* mutation accumulation lines that had reached fixation (an allele frequency of one). In contrast, there was no relative increase in the ratios of GC-AT/total SNPs between the *UNG* lines and Col-0 (see Table 1). Because detection of low-frequency SNPs depends on read depth, we only report the 7 *UNG* samples with an average mitochondrial read depth above 125× for this comparison. In the absence of a functional UNG protein and under normal greenhouse physiological conditions, plant mitochondria do not accumulate cytosine deamination mutations at an increased rate.

#### *2.3. Nuclear Mutation Accumulation*

UNG is the only uracil-N-glycosylase in *Arabidopsis thaliana* and may be active in the nucleus as well as the mitochondria [12]. To test for nuclear mutations due to the absence of UNG, sequences were aligned to the Columbia-0 reference genome using BWA-MEM and single nucleotide polymorphisms were identified using Bcftools Call [42]. The *UNG* mutation accumulation lines do not have an elevated G-C to A-T mutation rate compared to wild-type (Table 2).

#### *2.4. Alternative Repair Pathway Genes*

Because the *UNG* mutants show increased double-strand break repair but not an increase of G-C to A-T transition mutations, we infer that uracil inevitably arising in the DNA is handled by converting a G-U pair to a double-strand break, which is then efficiently repaired by the DSBR pathway. If this is true, genes involved in the DSBR processes of breakage, homology surveillance, and strand invasion in mitochondria will be upregulated in *UNG* mutants. To test this hypothesis, we assayed transcript levels of several candidate genes known to be involved in DSBR [13,23,25–28,43–46] in *UNG* lines compared to wild-type using RT-PCR. *MSH1* and *RECA2* were significantly upregulated in *UNG* lines (*MSH1*: 5.60-fold increase, unpaired t-test *p* < 0.05; *RECA2*: 3.19-fold increase, unpaired t-test *p* < 0.05; see Figure 1). The single-strand binding protein gene *OSB1* was also measurably upregulated in *UNG* lines, although this fell just short of significance (3.07-fold increase, unpaired t-test *p* = 0.053). *RECA3*, *SSB*, and *WHY2* showed no significant differential expression compared to wild-type (unpaired t-test *p* > 0.05).


**Table 2.** Nuclear SNPs in Col-0 wild-type, *UNG* mutant lines, Col-0 MTP-A3G, and *UNG* MTP-A3G: SNPs were called using Bcftools Call as described in the Methods section. SNP counts are for each chromosome, excluding chromosome 2. For individual data on each chromosome, see Supplementary File 2.

#### *2.5. Increased Mitochondrial Genome Abandonment*

If most DNA damage in plant mitochondria is repaired by double-strand break repair (DSBR), supplemented by base excision repair [12], then in the absence of the Uracil-N-glycosylase (UNG) pathway, we predict an increase in DSBR. To find evidence of this, we used quantitative PCR (qPCR) to assay crossing over between identical non-tandem repeats because changes in the dynamics around these repeats are indicative of changes in DNA processing at double-strand breaks [26,27,46]. Different combinations of primers in the unique sequences flanking the repeats allow us to determine the relative copy numbers of parental-type repeats and low-frequency recombinants (Figure 2a). The mitochondrial genes *cox2* and *rrn18* were used to standardize relative amplification between lines. We and others [24,46] have found that some of the non-tandem repeats are well suited for qPCR analysis and are sensitive indicators of ectopic recombination, increasing in repair-defective mutants. We analyzed the three repeats known as repeats B, D, and L [23] in both young leaves and mature leaves. In young leaves, there is no significant difference in the amounts of parental or recombinant forms between *UNG* lines and Col-0 (Figure 2b). In mature leaves, all three repeats show significant reductions in the parental 2/2 form while repeat B also shows a reduction in the parental 1/1 form (unpaired t-test *p* < 0.05; Figure 2c). There is a difference in genome dynamics around non-tandem repeats in young leaves compared to old leaves, indicating a difference in the way these genomes are maintained.

**Figure 1.** Quantitative RT-PCR assays of enzymes involved in double-strand break repair (DSBR) in *UNG* lines relative to wild-type: Fold change in transcript level is shown on the Y-axis. Error bars are standard deviation of three biological replicates. *MSH1* and *RECA2* are significantly transcriptionally upregulated in *UNG* lines relative to wild-type (5.60-fold increase and 3.19-fold increase, respectively; unpaired, 2-tailed Student's t-test, \* indicates *p* < 0.05). *OSB1* is nearly significantly upregulated in *UNG* lines relative to wild-type (3.07-fold increase; unpaired t-test *p* = 0.053).

#### *2.6. Transmission of SNPs Across Generations*

To determine if any heteroplasmic SNPs are passed on to the next generation, two progenies of each of the wild-type, *UNG*, MTP-A3G, and *UNG* MTP-A3G plants that were sequenced above were planted. Leaves were collected from each plant when it was 17 days old (young leaf) and again when it was 36 days old (mature leaf). Both the young and mature leaves of each plant were sequenced and analyzed as described above. Only 1 heteroplasmic SNP could be traced from a parent plant to both progeny, and 7 heteroplasmic SNPs could be traced from a parent plant to one progeny (Supplementary File 2). Interestingly, 117 heteroplasmic SNPs were detected in both offspring but not the parent plant. It is possible that heteroplasmic mutations that occur in reproductive tissue after the parental tissue had been collected could be passed on to the progeny. However, only 3 of these heteroplasmic SNPs are found in the mature tissue of both progeny, indicating that, even if a heteroplasmic SNP is passed on to a future generation, it is likely to be removed from the mitochondrial population before reproduction by genetic drift or gene conversion. In fact, of the 2792 heteroplasmic SNPs that were detected in young tissue across all samples, only 4 were detected in the mature tissue of the same plant. The overwhelming majority of heteroplasmic SNPs arose in mitochondria in non-meristematic differentiated tissue.
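
The bookkeeping behind these counts amounts to set comparisons between the SNP calls of a parent and its two progeny. The following Python sketch shows one way to express it; the function name, the category labels, and the example SNPs are hypothetical and only illustrate the logic.

```python
def trace_transmission(parent, progeny_a, progeny_b):
    """Classify heteroplasmic SNPs by where they were detected (inputs are sets of SNPs)."""
    in_both_progeny = progeny_a & progeny_b
    in_one_progeny = progeny_a ^ progeny_b  # symmetric difference: exactly one progeny
    return {
        "parent_to_both": parent & in_both_progeny,
        "parent_to_one": parent & in_one_progeny,
        "both_progeny_not_parent": in_both_progeny - parent,
    }

# Hypothetical SNPs: (position, reference base, alternate base).
parent = {(100, "G", "A"), (200, "C", "T"), (300, "A", "G")}
progeny_a = {(100, "G", "A"), (200, "C", "T"), (400, "G", "A")}
progeny_b = {(100, "G", "A"), (400, "G", "A"), (500, "T", "C")}

result = trace_transmission(parent, progeny_a, progeny_b)
```

The same intersection can be applied to the young-leaf and mature-leaf call sets of a single plant to count SNPs that persist as the tissue ages.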

**Figure 2.** qPCR analysis of intermediate repeat recombination in *UNG* lines compared to wild-type: Recombination at intermediate repeats is an indicator of increased double-strand breaks in plant mitochondrial genomes. (**a**) Primer scheme for detecting parental and recombinant repeats: Using different combinations of primers that anneal to the unique sequence flanking the repeats, either parental type (1/1 and 2/2) or recombinant type (1/2 and 2/1) repeats can be amplified. (**b**) Fold change of intermediate repeats in young leaves of *UNG* lines relative to wild-type: Error bars are standard deviation of three biological replicates. (**c**) Fold change of intermediate repeats in mature leaves of *UNG* lines relative to wild-type: Error bars are standard deviation of three biological replicates. B1/1, B2/2, D2/2, and L2/2 show significant reduction in copy number (unpaired, 2-tailed Student's t-test, \* indicates *p* < 0.05).

#### *2.7. SNP Accumulation in Young vs. Mature Leaves*

To confirm that the effects of the *UNG* knockout and the expression of APOBEC3G are consistent, the progenies of the wild-type, *UNG*, MTP-A3G, and *UNG* MTP-A3G plants were analyzed and the ratio of heteroplasmic GC-AT to total heteroplasmic SNPs was compared as described above. In mature leaves, the results were similar to the previous generation: both the *UNG* MTP-A3G and Col-0 MTP-A3G samples had increased GC-AT SNPs compared to the *UNG* and Col-0 samples. Interestingly, in young leaves, neither the *UNG* MTP-A3G nor the Col-0 MTP-A3G samples had increased GC-AT SNPs (see Table 3). This indicates that the processes of mitochondrial genome maintenance are more efficient at repairing DNA damage in young leaves.

#### *2.8. Quality Control of DNA Library Preparation*

A common source of error when calling low-frequency SNPs is oxidative damage during library preparation [47]. This oxidative damage affects guanines, and the effects of this damage can be measured by comparing the ratio of G to T mutations between the R1 paired-end read and the R2 paired-end read. A Global Imbalance Value (GIV) above 1.5 indicates DNA damage during library preparation, while a GIV below 1.5 indicates little damage during library preparation. None of the samples used in this study had a GIV above 1.5 (see Supplementary File 2), indicating that DNA damage during library prep is not a significant source of false SNP calls.
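
As a rough illustration of the imbalance check, the Python sketch below computes the ratio of a variant type's frequency in R1 reads to its frequency in R2 reads and compares it with the 1.5 cutoff. This is a simplified stand-in for the published Damage-Estimator calculation, not a reimplementation of it; the function names and the example counts are assumptions.

```python
GIV_THRESHOLD = 1.5  # cutoff quoted in the text

def giv(r1_variant_count, r1_total, r2_variant_count, r2_total):
    """Ratio of a variant type's frequency in R1 reads to its frequency in R2 reads."""
    if r2_variant_count == 0:
        raise ValueError("R2 variant count is zero; the ratio is undefined")
    r1_freq = r1_variant_count / r1_total
    r2_freq = r2_variant_count / r2_total
    return r1_freq / r2_freq

def library_looks_damaged(score):
    """True when the imbalance exceeds the cutoff."""
    return score > GIV_THRESHOLD

# Hypothetical G-to-T counts per one million sequenced bases in each read of the pair.
balanced = giv(120, 1_000_000, 100, 1_000_000)   # about 1.2
imbalanced = giv(300, 1_000_000, 100, 1_000_000)  # about 3.0
```

A value near 1 means both reads of a pair report the variant type at similar rates, as expected for true variants, whereas damage introduced during library preparation skews the ratio.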



#### *Plants* **2020**, *9*, 261

#### **3. Discussion**

In mitochondria as well as in the nucleus and chloroplast, cytosine is subject to deamination to uracil. This could potentially lead to transition mutations and is dealt with by a specialized base excision repair pathway. The first step in this pathway is hydrolysis of the glycosidic bond by the enzyme Uracil-N-glycosylase (UNG), leaving behind an abasic site [16]. An AP (apurinic/apyrimidinic) endonuclease can then cut the DNA backbone, producing a 3′-OH and a 5′-dRP (5′-deoxyribose-5-phosphate). Both DNA polymerases found in *A. thaliana* mitochondria, POL1A and POL1B, exhibit 5′-dRP lyase activity, allowing them to remove the 5′-dRP and to polymerize a new nucleotide replacing the uracil [48]. In the absence of functional UNG protein, cytosine will still be deaminated in plant mitochondrial genomes, so efficient removal of uracil must be through a different repair mechanism, most likely DSBR [14,15]. We have found that, in *UNG* mutant lines, there is an increase in the expression of genes known to be involved in DSBR and significant changes in the relative abundance of parental and recombinant forms of intermediate repeats, consistent with this hypothesis.

We have shown that, when cytosine deamination is increased by the expression of the APOBEC3G cytidine deaminase in plant mitochondria, *UNG* lines accumulate more G-C to A-T transitions in mature leaves than does wild-type. Surprisingly, we have also found that, under normal cellular conditions, without the added deamination activity of APOBEC3G, *UNG* lines do not accumulate G-C to A-T transition mutations at a higher rate than wild-type. This finding is particularly surprising given the presumed bottlenecking of mitochondrial genomes during female gametogenesis and given the deliberate bottleneck in the experimental design of single-seed descent for 11 generations. This finding supports the hypothesis that plant mitochondria have a very efficient alternative damage surveillance system that can prevent G-C to A-T transitions from becoming fixed in the meristematic mitochondrial population. The use of the cytidine deaminase allows us to specifically alter the mutation rate, which helps us disentangle mutation from repair, selection and drift—common complications in mutation accumulation experiments [49].

The angiosperm MSH1 protein consists of a DNA mismatch-binding domain fused to a double-stranded DNA endonuclease domain [1,21]. Although mainly characterized for its role in recombination surveillance [36], MSH1 is a good candidate for a protein that may be able to recognize and bind to various DNA lesions and to make DSBs near the site of the lesion, thus funneling these types of damage into the DSBR pathway. With many mitochondria and many mitochondrial genomes in each cell, there are numerous available templates for accurate repair of DSBs through homologous recombination, making this a plausible mechanism of genome maintenance. Here, we show that, in *UNG* lines, *MSH1* is transcriptionally upregulated more than 5-fold compared to wild-type. This is consistent with the hypothesis that MSH1 initiates repair in plant mitochondria by creating a double-strand break at G-U pairs and possibly other mismatches and damaged bases.

Several other proteins involved in processing plant mitochondrial DSBs have been characterized. The RECA homologs RECA2 and RECA3 are homology search and strand invasion proteins [26–28,45,50–52]. The two mitochondrial RECAs share much sequence similarity; however, RECA2 is dual targeted to both the mitochondria and the plastids, while RECA3 is found only in the mitochondria [26,27]. RECA3 also lacks a C-terminal motif present on RECA2 and most other homologs. This motif has been shown to modulate the ability of RECA proteins to displace competing ssDNA binding proteins in *E. coli* [53]. Arabidopsis *reca2* mutants are seedling lethal, and both *reca2* and *reca3* lines show increased ectopic recombination at intermediate repeats [26]. Arabidopsis RECA2 has functional properties that RECA3 cannot perform, such as complementing a bacterial *recA* mutant during the repair of UV-C-induced DNA lesions [20]. Here, we show that, in *UNG* lines, *RECA2* is transcriptionally upregulated more than 3-fold compared to the wild-type. However, *RECA3* is not upregulated in *UNG* lines. Responding to MSH1-initiated DSBs may be one of the functions unique to RECA2. The increased expression of *RECA2* in the absence of a functional UNG protein is further evidence that uracil arising in DNA may be repaired through the mitochondrial DSBR pathway.

The transcript of the ssDNA binding protein OSB1 is upregulated over 3-fold. At a double-strand break, OSB1 competitively binds to ssDNA and recruits the RECA proteins to promote the repair of a double-strand break by a homologous template and to avoid the error-prone microhomology-mediated end-joining pathway [54].

We also tested the differential expression of other genes known to be involved in processing mitochondrial DSBs. The single-stranded binding protein genes *WHY2* and *SSB* were not found to be differentially expressed at the transcript level compared to wild-type. The presence of different ssDNA binding proteins influences which pathway of DSBR a break is repaired by [54]. Increased amounts of WHY2 and SSB may not be needed for accurate repair of induced DSBs in the *UNG* lines.

At intermediate repeats, the maintenance of the mitochondrial genome is different between wild-type, *UNG* mutants, and DSBR mutants. In *msh1* lines, there is an increase in repeat recombination likely due to relaxed homology surveillance in the absence of the MSH1 protein [27]. In mutant lines of ssDNA binding proteins involved in DSBR, such as *reca2*, *reca3*, and *osb1* [26,55], there is an increase in repeat recombination due to differences in the way DNA ends are handled in the absence of these ssDNA binding proteins. In young leaves, there is no significant difference in recombination at intermediate repeats between *UNG* lines and wild-type, while in mature leaves, *UNG* lines show a reduction in parental type repeats compared to wild-type. In *UNG* lines, the mitochondrial recombination machinery is still intact, so any differences in genome dynamics at intermediate repeats are not due to differences in processing the DSBs. Instead, they could indicate an increase in double-strand breaks and in attempted DSBR by break-induced replication at intermediate repeats, or they could reflect degradation of mtDNA as differentiated tissue ages.

Plant mitochondrial genomes likely replicate by recombination-dependent replication (RDR) [56]. Most organellar genome replication occurs in meristematic tissue, where mitochondria fuse together to form a large, reticulate mitochondrion [10]. This mitochondrial fusion provides a means to homogenize mtDNA by gene conversion and to repair lesions through homologous recombination [57]. Accurate repair of uracil by homologous recombination would not be expected to change repeat dynamics. As cells differentiate and age, organellar genomes degrade [32]. Organellar genomes in nonreproductive tissue can be "abandoned" rather than repaired, reducing the metabolic cost of DNA repair [32]. In a mature cell, an attempt to repair uracil in the mitochondrial genome could lead to degradation of the DNA and changes in repeat dynamics if a double-strand break is initiated without a homologous template available. There is a difference in mitochondrial DNA maintenance in mature cells compared to young cells, due to either a lack of DNA repair in mature mitochondria or a difference in DNA-repair mechanism.

To determine the outcomes of genomic uracil in the absence of a functional UNG protein, we sequenced the genomes of several *UNG* lines. No fixed mutations of any kind were found in *UNG* lines, even after 11 generations of self-crossing. Low-frequency heteroplasmic SNPs were found in both wild-type and *UNG* lines, but *UNG* lines showed no difference in the ratio of G-C to A-T transitions to other mutation types when compared to wild-type. When the rate of cytosine deamination was increased with the expression of the APOBEC3G deaminase, there was an increase in G-C to A-T transitions but only in mature leaves. This is consistent with the idea of abandonment and is evidence that, in mitochondrial genomes that have not been abandoned, there is an efficient and accurate system of nonspecific repair.

Clearly, plant mitochondria can repair uracil in DNA sufficiently to prevent mutation accumulation in the absence of the UNG protein. Why then has the BER pathway been conserved in plant mitochondria while NER and MMR have apparently been lost? DSBR may be able to protect the genome efficiently from mutations being inherited by the next generation (see Table 3). There may still be selection to maintain mitochondrial BER to reduce the rate of mitochondrial genome abandonment and degradation in aging tissues. Throughout the evolutionary history of *Arabidopsis thaliana* and into the present, wild growing plants are exposed to a range of growth conditions and stresses that experimental plants in a greenhouse avoid. The rate of spontaneous cytosine deamination increases with increasing temperature [58,59], so DSBR alone may not be able to repair the extent of uracil found in DNA across the range of temperatures a wild plant would experience, providing the selective pressure to maintain a distinct BER pathway in plant mitochondria. If DSBR activity is reduced or lost as leaf tissue ages, there may also be a selective advantage to the plant of maintaining BER in mature leaves so they can continue to perform intermediary metabolism even as they age.

Here, we have provided evidence that, in the absence of a dedicated BER pathway, plants growing in greenhouse growth chamber conditions do not accumulate mitochondrial SNPs at an increased rate. Instead, DNA damage is accurately repaired by double-strand break repair, which also causes an increase in ectopic recombination at identical non-tandem repeats. It has recently been shown that mice lacking a different mitochondrial BER protein, oxoguanine glycosylase, also do not accumulate mitochondrial SNPs [60]. Here, we show that, in plants, base-excision repair by UNG is similarly unnecessary to prevent mitochondrial mutations in growth chamber conditions. The presence of the UNG pathway reduces ectopic recombination slightly and can successfully repair uracil in DNA even if the rate of cytosine deamination is increased. We have also found that, in mature leaves, uracil mutations do occur, further confirming the hypothesis that organellar genomes are abandoned in terminally differentiated tissues [32] and emphasizing the need for considering the tissue age and type when interpreting experimental results on DNA replication, repair, and recombination. Double-strand break repair and recombination are important mechanisms in the evolution of plant mitochondrial genomes, but many key enzymes and steps in the repair pathway are still unknown. Further identification and characterization of these missing steps is sure to provide additional insight into the unique evolutionary dynamics of plant mitochondrial genomes.

#### **4. Materials and Methods**

#### *4.1. Plant Growth Conditions*

*Arabidopsis thaliana* Columbia-0 (Col-0) seeds were obtained from Lehle Seeds (Round Rock, TX, USA). UNG (AT3G18630) T-DNA insertion hemizygous lines were obtained from the Arabidopsis Biological Resource Center, line number CS308282. Hemizygous T-DNA lines were self-crossed to obtain homozygous lines (Genotyping primers: wild-type 5′-TGTCAAAGTCCTGCAATTCTTCTCACA-3′ and 5′-TCGTGCCATATCTTGCAGACCACA-3′, and *UNG* 5′-ATAATAACGCTGCGGACATCTACATTTT-3′ and 5′-ACTTGGAGAAGGTAAAGCAATTCA-3′). All plants were grown in walk-in growth chambers under a 16:8 light:dark schedule at 22 °C. Plants grown on agar were surface sterilized and grown on 1× Murashige and Skoog Basal Medium (MSA) with Gamborg's vitamins (Sigma, St. Louis, MO, USA) with 5 μg/mL Nystatin Dihydrate to prevent fungal contamination.

#### *4.2. Vector Construction*

The APOBEC3G gene [61] was synthesized by Life Technologies Gene Strings using *Arabidopsis thaliana*-preferred codons and including the 62 amino acid mitochondrial targeting peptide (MTP) from alternative oxidase on the N-terminus of the translated protein. The MTP-A3G construct was cloned into the vector pUB-DEST (NCBI:taxid1298537) driven by the ubiquitin (UBQ10) promoter and transformed into wild-type and *UNG Arabidopsis thaliana* plants by the *Agrobacterium* floral dip method [62]. To ensure proper mitochondrial targeting of the MTP-A3G construct, the construct was cloned into pK7FWG2 with a C-terminal GFP fusion [63]. *Arabidopsis thaliana* plants were again transformed by the *Agrobacterium* floral dip method, and mitochondrial fluorescence was confirmed with confocal fluorescence microscopy.

#### *4.3. RT-PCR*

RNA was extracted from young leaves of plants grown in parallel on MSA during *UNG* generation ten [64]. Reverse transcription using Bio-Rad iScript was performed, and the resulting cDNA was used as a template for qPCR to measure relative transcript amounts. Quantitative RT-PCR data was normalized using *UBQ10* and *GAPDH* as housekeeping gene controls. Reactions were performed in a Bio-Rad CFX96 thermocycler using 96-well plates and a reaction volume of 20 μL/well. SYBRGreen mastermix (Bio-Rad, Hercules, CA, USA) was used in all reactions. Three biological replicates from different *UNG* MA lines and three technical replicates were used for each amplification. Primers are listed in Table S1. The MIQE guidelines were followed [65], and primer efficiencies are listed in Table S2. The thermocycling program for all RT-qPCR was a ten-minute denaturing step at 95 °C followed by 45 cycles of 10 s at 95 °C, 15 s at 60 °C, and 13 s at 72 °C. Following amplification, melt curve analysis was done on all reactions to ensure target specificity. The melt curve program for all RT-qPCR was from 65 °C to 95 °C at 0.5 °C increments for 5 s each.
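
The exact calculation used to turn Ct values into fold changes is not spelled out here. One common approach, shown below as a minimal Python sketch, is the 2^-ΔΔCt method with the target normalized to the mean Ct of the reference genes. It assumes 100% primer efficiency, which may differ from the efficiency-aware analysis implied by Table S2, and all Ct values and function names are hypothetical.

```python
from statistics import mean

def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference genes."""
    return target_ct - mean(reference_cts)

def fold_change(mutant, wild_type):
    """2^-ddCt fold change of mutant relative to wild type (assumes 100% primer efficiency)."""
    ddct = delta_ct(*mutant) - delta_ct(*wild_type)
    return 2 ** (-ddct)

# Hypothetical Ct values: (target gene Ct, [UBQ10 Ct, GAPDH Ct]).
mutant = (24.0, [20.0, 22.0])     # dCt = 3.0
wild_type = (26.0, [20.0, 22.0])  # dCt = 5.0
fc = fold_change(mutant, wild_type)  # ddCt = -2.0, so the fold change is 4.0
```

A lower Ct in the mutant at equal reference-gene Ct values corresponds to more starting template, hence a fold change above 1.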

#### *4.4. Repeat Recombination qPCR*

DNA was collected from young and mature leaves of Columbia-0 and generation ten *UNG* plants grown in parallel using the CTAB DNA extraction method [66]. qPCR was performed using primers from the flanking sequences of the intermediate repeats. Primers are listed in Table S1. Using different combinations of forward and reverse primers, either the parental or recombinant forms of the repeat can be selectively amplified (see Figure 2a). The mitochondrially encoded *cox2* and *rrn18* genes were used as standards for analysis. Reactions were performed in a Bio-Rad CFX96 thermocycler using 96-well plates with a reaction volume of 20 μL/well. SYBRGreen mastermix (Bio-Rad) was used in all reactions. Three biological and three technical replicates were used for each reaction. The thermocycling program for all repeat recombination qPCR was a ten-minute denaturing step at 95 °C followed by 45 cycles of 10 s at 95 °C, 15 s at 60 °C, and a primer-specific amount of time at 72 °C (extension times for each primer pair can be found in Table S3). Following amplification, melt curve analysis was done on all reactions to ensure target specificity. The melt curve program for all qPCR was from 65 °C to 95 °C at 0.5 °C increments for 5 s each.

#### *4.5. DNA Sequencing*

DNA extraction from frozen mature leaves of Columbia-0, generation 10 *UNG*, and MTP-A3G plants and again from young and mature leaves of the progeny of these plants was done by a modification of the SPRI (Solid Phase Reversible Immobilization) magnetic beads method of Rowan et al. [67,68]. Genomic libraries for paired-end sequencing were prepared using a modification of the Nextera protocol [69] and modified for smaller volumes following Baym et al. [70]. Following treatment with the Nextera Tn5 transpososome, 14 cycles of amplification were done. Libraries were size-selected to be between 400 and 800 bp in length using SPRI beads [68]. Libraries were sequenced with 150 bp paired-end reads on an Illumina HiSeq 4000 by the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley. The raw data files are deposited with the Sequence Read Archive at ncbi.nlm.nih.gov under BioProject number PRJNA492503.

Reads were aligned using BWA-MEM v0.7.12-r1039 [35]. The reference sequence used for alignment was a file containing the improved Columbia-0 mitochondrial genome (accession BK010421.1) [71] as well as the TAIR 10 *Arabidopsis thaliana* nuclear chromosomes and chloroplast genome sequences [72]. A large portion of the mitochondrial genome has been duplicated into chromosome 2 [40]. To prevent reads from mapping to both locations, this large NuMT region was deleted from chromosome 2. Using Samtools v1.3.1 [73], bam files were sorted for uniquely mapped reads for downstream analysis. MarkDuplicates from the Genome Analysis ToolKit (GATK) was used to remove duplicate reads due to PCR during library prep [74].

Organellar variants were called using VarDict [36]. To minimize the effects of sequencing errors and to reduce false positives, SNPs called by VarDict were filtered by the stringent quality parameters of Qmean ≥ 30, MQ ≥ 30, NM ≤ 3, Pmean ≥ 8, Pstd = 1, AltFwdReads ≥ 3, and AltRevReads ≥ 3. When calling low-frequency SNPs, it is difficult to remove all false positives without also removing some true positives. By treating all samples to the same sequence analysis pipeline, all samples will have a similar spectrum of false positives. By analyzing the ratios of different SNP types rather than raw SNP numbers, we further isolate biological effects from computational noise. VarDict was chosen because it is more sensitive to low allele-frequency variants [37].
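
The quality thresholds listed above can be expressed as a single predicate. The Python sketch below assumes each call has already been parsed into a dictionary keyed by the parameter names used in the text; that layout and the example records are illustrative assumptions, not the actual filtering script.

```python
def passes_vardict_filters(v):
    """Apply the organellar SNP quality thresholds quoted in the text to one parsed call."""
    return (
        v["Qmean"] >= 30            # mean base quality
        and v["MQ"] >= 30           # mean mapping quality
        and v["NM"] <= 3            # mean mismatches in supporting reads
        and v["Pmean"] >= 8         # mean position of the variant within reads
        and v["Pstd"] == 1          # variant position varies between reads
        and v["AltFwdReads"] >= 3   # supporting reads on the forward strand
        and v["AltRevReads"] >= 3   # supporting reads on the reverse strand
    )

# Hypothetical parsed calls.
good = {"Qmean": 36.2, "MQ": 60, "NM": 1.1, "Pmean": 42.0, "Pstd": 1,
        "AltFwdReads": 5, "AltRevReads": 4}
strand_biased = dict(good, AltRevReads=0)

kept = [v for v in (good, strand_biased) if passes_vardict_filters(v)]
```

Requiring support on both strands and a non-constant read position removes many artifacts that appear on only one strand or always at the same offset from a read end.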

DNA damage during library preparation was measured by individually analyzing the paired ends of Illumina paired-end sequencing and by looking for imbalances in mutations between the paired ends [47]. Mapped bam files were split into separate pairs, and GIV scores were calculated for each SNP type using the Damage-Estimator with mapping and base quality cutoffs set to 30.

Nuclear variants were called using Samtools mpileup (v. 1.3.1) and Bcftools call (v. 1.2) and were filtered for SNPgap of 3, Indelgap of 10, RPB > 0.1 and QUAL > 15, at least 3 high quality ALT reads (DP4(2) + DP4(3) ≥ 3), at least one high quality ALT read per strand (DP4(2) ≥ 1 and DP4(3) ≥ 1), and a high-quality ALT allele frequency ≥ 0.3. Chromosome 2 was excluded from this analysis to avoid false positives resulting from the presence of the large NuMT that has been duplicated and repeated there.
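
The read-support part of the nuclear filter can be sketched in Python as follows. It covers only the QUAL, RPB, and DP4-based criteria; the SNPgap and Indelgap filters depend on neighboring variants and are omitted. The function signature and example values are hypothetical, and DP4 is taken in the bcftools order of ref-forward, ref-reverse, alt-forward, and alt-reverse counts.

```python
def passes_nuclear_filters(qual, rpb, dp4, min_alt_fraction=0.3):
    """Check QUAL, RPB, and DP4-based read support for one nuclear SNP call."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    alt = alt_fwd + alt_rev
    depth = ref_fwd + ref_rev + alt
    return (
        qual > 15
        and rpb > 0.1
        and alt >= 3                      # DP4(2) + DP4(3) >= 3
        and alt_fwd >= 1 and alt_rev >= 1  # at least one ALT read per strand
        and depth > 0
        and alt / depth >= min_alt_fraction
    )

# Hypothetical calls.
supported = passes_nuclear_filters(qual=40, rpb=0.5, dp4=(6, 5, 4, 3))      # 7/18 ALT
one_strand = passes_nuclear_filters(qual=40, rpb=0.5, dp4=(6, 5, 7, 0))     # no reverse ALT
low_fraction = passes_nuclear_filters(qual=40, rpb=0.5, dp4=(10, 8, 4, 3))  # 7/25 ALT
```

The 0.3 allele-frequency floor is far higher than the 0.01 used for organellar calls, which fits a nuclear genome where variants are expected to be heterozygous or homozygous rather than heteroplasmic.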

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2223-7747/9/2/261/s1, Figure S1: Mitochondrial targeting of a GFP labeled MTP-APOBEC3G construct, Table S1: Primers for RT-PCR, Table S2: qPCR primer efficiency, Table S3: Primers for ROUS recombination assay, Supplementary File 2: SNP analysis tables.

**Author Contributions:** Conceptualization, E.W. and A.C.; methodology, E.W., E.P., and A.C.; software, E.W.; validation, E.W., E.P., and A.C.; formal analysis, E.W.; investigation, E.W., E.P. and A.C.; resources, A.C.; data curation, A.C.; writing—original draft preparation, E.W. and A.C.; writing—review and editing, E.W., E.P., and A.C.; visualization, E.W.; supervision, A.C.; project administration, A.C.; funding acquisition, A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Science Foundation (USA), grants MCB-1413152 and MCB-1933590 to A.C.C.

**Acknowledgments:** Conversations with Arnie Bendich about organelle DNA replication and repair in meristem and vegetative cells were interesting and illuminating. We are grateful to Emily Jezewski for finding time in her busy golf schedule to do some of the qPCR experiments. Beth Rowan provided advice on Illumina library preparations from Arabidopsis leaves and many insights and helpful conversations about plant mitochondrial genomes. Daniel Sloan also provided insights and useful discussions. We thank Christian Elowski and the Nebraska Center for Biotechnology Core Research Facility for Microscopy for confocal fluorescent microscopy. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 OD018174 Instrumentation Grant. Daniel Schachtman helped with disposal of leaf tissues from generations 2–9. The use of product and company names is necessary to accurately report the methods and results; however, the United States Department of Agriculture (USDA) neither guarantees nor warrants the standard of the products, and the use of names by the USDA implies no approval of the product to the exclusion of others that may also be suitable. The USDA is an equal opportunity provider and employer.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
