## *3.5. Analysis of Repeat Sequences in the P. vulgaris Mitogenome*

The vast majority of the variance in genome size among plant mitogenomes can be explained by differences in the sizes of repeat sequences, which comprise SSRs, tandem repeats, and dispersed repeats. Plant mitogenomes, particularly those of angiosperms, were well known for their sizeable fractions of repetitive sequences even before any complete mitogenomes were available. SSRs are DNA tracts of tandemly repeated motifs of one to six bases and are useful molecular markers for studying genetic diversity and identifying species [21]. In this study, a total of 314 perfect SSRs were identified in the *P. vulgaris* mitogenome, including 139 mono-, 140 di-, 5 tri-, 22 tetra-, 3 penta-, and 5 hexa-nucleotide repeats (Table 4). Mononucleotide A/T repeats (129 repeats) were more prevalent than any other repeat type. The dinucleotide repeats TA/AT were the second most numerous (50 repeats), while tri-, tetra-, penta-, and hexa-nucleotide repeats were fewer in number and were observed only in intronic or intergenic regions. As shown in Table 5, seven tandem repeats with lengths ranging from 13 bp to 57 bp were also detected in the *P. vulgaris* mitogenome. Of these seven tandem repeats, only one is located in a coding region (*rrnL*), while the others are all found in intergenic spacers.

In addition to SSRs and tandem repeats, 143 dispersed repeats with lengths > 30 bp (total length: 35,000 bp; 8.85% of the genome) were identified in the *P. vulgaris* mitogenome (Figure 7; Table S4). Most of the repeats (77 repeats, 53.85%) are 30 bp to 59 bp long, 25 repeats are longer than 100 bp, and only two are longer than 1 kb (R1: 4866 bp; R2: 3529 bp). Previous studies have documented the importance of large repeats (>1 kb) in genomic structural changes: recombination between a pair of large direct repeats may split the genome into two smaller subgenomic conformations, whereas recombination between a pair of large inverted repeats may produce isomeric conformations. As shown in Figure 1, the largest repeat was assembled as Contig15 and the second largest as Contig40; both are inverted repeats. By aligning the PacBio long reads to both ends of the two large repeats, we constructed the master circle and two isomeric molecules (Figure 1). Repeats are common in plant mitogenomes but are poorly conserved across species, even within the same family. As shown in Figure 7 and Table S4, the total number of repeats ranges from 59 in *V. angularis* to 215 in *M. pinnata*, and the total length of repeats ranges from 9224 bp (2.28% of the whole genome) in *V. angularis* to 411,265 bp (69.94% of the whole genome) in *V. faba*. Mitogenome enlargement in *V. faba* is caused mainly by the expansion of repeated sequences: thirteen large (>1 kb) repeats cover 398.8 kb, or 68% of the whole mitogenome [17]. However, when all but single copies of the large repeat sequences are excluded, the *V. faba* mitogenome size is 388.6 kb, which is similar to that of other Papilionoideae mitogenomes [18]. These highly variable repeat patterns likely account for much of the variation in plant mitogenome size. However, genome size is by no means determined by repeat content alone. The mitochondrial genome of *Vitis vinifera* contains only 7% repeats despite a genome size of nearly 773 kb [73], while the moderately sized (404.5 kb) *V. angularis* genome has fewer and smaller repeats than the much smaller genomes of *Brassica napus* (222 kb) and *Silene latifolia* (253 kb; Table S2) [10,74].
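The structural consequences of large repeat pairs noted above (subgenomic circles from direct repeats, isomers from inverted repeats) can be illustrated with a toy model. The Python sketch below treats a circular genome as a linearized string and is only a conceptual illustration under stated assumptions: the two copies do not overlap, the first copy starts before the second, and neither copy spans the point where the circle was linearized. The function names are hypothetical, and this is not the procedure used to build the conformations in Figure 1.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def recombine_direct(genome, a, b, length):
    """Recombination between direct repeat copies starting at a and b (a < b).

    Returns two subgenomic circles (as linearized strings); each keeps
    one copy of the repeat, and their lengths sum to len(genome).
    """
    if genome[a:a + length] != genome[b:b + length]:
        raise ValueError("copies at a and b are not direct repeats")
    return genome[a:b], genome[b:] + genome[:a]


def recombine_inverted(genome, a, b, length):
    """Recombination between inverted repeat copies starting at a and b (a < b).

    Returns the isomer in which the segment between the two copies is
    reverse-complemented; the total length is unchanged.
    """
    if genome[b:b + length] != revcomp(genome[a:a + length]):
        raise ValueError("copy at b is not the reverse complement of copy at a")
    return genome[:a + length] + revcomp(genome[a + length:b]) + genome[b:]


if __name__ == "__main__":
    # Toy sequences only; the repeat is 4 bp here purely for readability.
    direct = "TT" + "ACGA" + "GGG" + "ACGA" + "CC"
    print(recombine_direct(direct, 2, 9, 4))
    inverted = "TT" + "AACG" + "GGA" + "CGTT" + "CC"
    print(recombine_inverted(inverted, 2, 9, 4))
```

With two inverted repeat pairs, as reported here for R1 and R2, applying the inversion across each pair separately is one way to picture how a master circle and two isomeric molecules could arise.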



**Table 5.** Distribution of tandem repeats in the *P. vulgaris* mitogenome.


**Figure 7.** Frequency distribution of dispersed repeats in the *P. vulgaris* mitogenome compared with five other leguminous plants. The numbers of dispersed repeats in the *Phaseolus vulgaris*, *Glycine max*, *Vicia faba*, *Vigna angularis*, *Lotus japonicus*, and *Millettia pinnata* mitogenomes are shown in blue, orange, gray, yellow, blue, and green, respectively.

## *3.6. Phylogenetic Analyses and Multiple Losses of PCGs during Evolution*

With rapid developments in sequencing technology and assembly methods, an increasing number of complete plant mitogenomes have been assembled, providing an important opportunity for phylogenetic analyses using mitogenomes. In this study, to determine the phylogenetic position of *P. vulgaris*, we downloaded 23 plant mitogenomes from the GenBank database (https://www.ncbi.nlm.nih.gov/genome/browse/), including 19 species of Fabales, two species of Solanales, and two species of Malpighiales. A set of 26 conserved single-copy orthologous genes (*atp1*, *atp4*, *atp6*, *atp8*, *atp9*, *ccmB*, *ccmC*, *ccmFC*, *ccmFN*, *cob*, *cox1*, *cox3*, *matR*, *nad1*, *nad2*, *nad3*, *nad4*, *nad4L*, *nad5*, *nad6*, *nad7*, *nad9*, *rps3*, *rps4*, and *rps12*) present in all 23 analyzed mitogenomes was used to construct the phylogenetic tree, and the species from Solanales and Malpighiales were designated as the outgroup. As shown in Figure 8, every node has a bootstrap support value of at least 70%, and 15 nodes have 100% support. The ML phylogenetic tree strongly supports a close evolutionary relationship between *P. vulgaris* and the clade formed by the two *Vigna* species. The tree also strongly supports the separation of Fabales from the clade composed of Solanales and Malpighiales (100% bootstrap value), as well as the separation of Papilionoideae from the clade composed of Cercidoideae, Detarioideae, and Caesalpinioideae (100%). The bootstrap value for the separation of Detarioideae and Caesalpinioideae is 80%, and the value for the separation of Cercidoideae from the clade composed of Detarioideae and Caesalpinioideae is 70%.
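The gene-selection step described above (keeping only genes that occur as a single copy in every analyzed mitogenome) can be sketched in Python as follows. This is a minimal illustration under assumptions: the input format (a dictionary from species name to a list of annotated gene names, with duplicates standing for extra copies), the function name `shared_single_copy_genes`, and the placeholder species labels are all hypothetical and do not reflect the authors' actual pipeline.

```python
from collections import Counter


def shared_single_copy_genes(annotations):
    """Return the sorted genes present exactly once in every species.

    annotations maps a species name to a list of annotated gene names;
    a name that appears more than once counts as a multi-copy gene.
    """
    shared = None
    for genes in annotations.values():
        counts = Counter(genes)
        single = {gene for gene, n in counts.items() if n == 1}
        # Intersect with the genes that were single-copy in earlier species.
        shared = single if shared is None else shared & single
    return sorted(shared or [])


if __name__ == "__main__":
    # Toy annotations with placeholder species labels (not real data).
    toy = {
        "species_A": ["atp1", "cob", "cox2", "nad1"],
        "species_B": ["atp1", "cob", "nad1", "nad1"],  # nad1 duplicated
        "species_C": ["atp1", "cob", "cox2", "nad1"],
    }
    print(shared_single_copy_genes(toy))
```

In this toy input, a gene missing from one species or duplicated in another is dropped, which mirrors the requirement that the genes used for the tree be present in all 23 mitogenomes.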

As described by Richardson et al. [75], the mitochondrial genomes of higher plants vary significantly in genome size, gene content, and gene order. Losses of PCGs occurred frequently during the evolution of higher plants. The phylogenetic tree provides a backdrop for further analysis of gene loss during evolution, and the gene contents of all observed species are summarized in Figure 9. Most PCGs are conserved across plant mitogenomes, especially the genes encoding Complex I, Complex III, Complex V, cytochrome *c* biogenesis proteins, maturases, and transport membrane proteins [13]. The conservation of these genes suggests that they play crucial roles in mitochondrial function. However, the ribosomal protein and succinate dehydrogenase genes are highly variable. As shown in Figure 9, the *cox2* gene was lost only in the subtribe Phaseolinae (*V. angularis*, *V. radiata*, and *P. vulgaris*) but retained in other leguminous plants, suggesting that this gene was lost after separation from the subtribe Glycininae. The *rpl2* gene was lost in most leguminous plants but regained in *A. ligulata*, *L. trichandra*, *H. brasiletto*, and *L. coriaria*, suggesting that this gene was lost before the emergence of Fabales but could be regained in some leguminous plants. Similar patterns were found for several ribosomal protein genes (*rpl10*, *rpl16*, *rps7*, *rps10*, and *rps19*). Additionally, the *rpl6* and *rps8* genes were lost from liverworts (*M. polymorpha*) during evolution [76], the *rps11* gene was lost from gymnosperms (*G. biloba*) and liverworts during the divergence of the angiosperms and gymnosperms [77], and the *rpl10* gene was lost in monocots and gymnosperms but regained in dicots [33,78]. The frequent loss of ribosomal protein genes from plant mitogenomes indicates that mitochondrial ribosomal proteins are encoded partly by native mitochondrial genes and partly by nuclear genes, owing to gene transfer between the mitochondrion and the nucleus [79–81].
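The kind of clade-specific loss described above for *cox2* can be read off a gene presence/absence table with a simple set comparison. The Python sketch below is illustrative only: the function name `clade_specific_losses` is hypothetical, and the toy table lists just three genes for five species, with values chosen to mirror the *cox2* pattern reported in the text rather than to reproduce the full data in Figure 9.

```python
def clade_specific_losses(presence, clade):
    """Return sorted genes absent from every clade member but present elsewhere.

    presence maps a species name to the set of genes found in its
    mitogenome; clade is the collection of species names to test.
    """
    clade = set(clade)
    inside = [set(genes) for sp, genes in presence.items() if sp in clade]
    outside = [set(genes) for sp, genes in presence.items() if sp not in clade]
    if not inside or not outside:
        raise ValueError("need at least one species inside and outside the clade")
    kept_outside = set.intersection(*outside)   # retained by all other species
    present_inside = set.union(*inside)         # found in any clade member
    return sorted(kept_outside - present_inside)


if __name__ == "__main__":
    # Toy table (assumption): a three-gene subset, not the Figure 9 data.
    toy = {
        "P. vulgaris": {"atp1", "cob"},
        "V. angularis": {"atp1", "cob"},
        "V. radiata": {"atp1", "cob"},
        "G. max": {"atp1", "cob", "cox2"},
        "L. japonicus": {"atp1", "cob", "cox2"},
    }
    phaseolinae = {"P. vulgaris", "V. angularis", "V. radiata"}
    print(clade_specific_losses(toy, phaseolinae))
```

A strict all-or-none criterion like this flags only clean patterns; genes that were lost and later regained in individual species, such as *rpl2*, would need a tree-aware method instead.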

**Figure 8.** Maximum likelihood phylogeny of *P. vulgaris* within Fabaceae. Relationships were inferred using 26 conserved PCGs from 23 plant mitogenomes. Numbers at each node are bootstrap support values. NCBI accession numbers are listed in Table S2. The scale bar indicates the number of nucleotide substitutions per site.

**Figure 9.** Distribution of PCGs in plant mitogenomes. White boxes indicate that the gene is not present in the mitogenome. The colors of genes indicate their corresponding categories. The colors of species represent the classes of rosids (orange), asterids (pink), monocotyledons (gold), gymnosperms (light blue), and liverworts (green).
