#### *3.4. Data Sets*

The GDS [20] includes the reference genome sequences, gene annotation files in general feature format (GFF), coding sequence (CDS) and protein sequence (PEP) files, and expression data. Table 1 summarizes the genomic data currently available in the GDS. The genome versions, listed in the same order as the rows of Table 1, are v1.0a2, r1.1, v1.0, r1.1, r1.1, r1.1, v4.0a2, and v1.0, respectively [21–25].


**Table 1.** Statistics of the genome features for eight strawberry species.
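Table 1 reports feature statistics for each genome. Purely as an illustration, counts of this kind can be derived from the GFF and PEP files with a few lines of Python. The function names and the toy input below are hypothetical and are not part of the GDS; the sketch only assumes standard nine-column GFF3 feature lines and ordinary FASTA headers.

```python
def count_gff_features(gff_lines):
    """Count feature types (GFF3 column 3), skipping comments and non-feature lines."""
    counts = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, ## directives and # comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a 9-column feature line (e.g. an embedded FASTA section)
        counts[fields[2]] = counts.get(fields[2], 0) + 1
    return counts


def fasta_lengths(fasta_lines):
    """Return {sequence ID: length}, taking the ID as the first word of each header."""
    lengths = {}
    seq_id = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            lengths[seq_id] = 0
        elif seq_id is not None:
            lengths[seq_id] += len(line)
    return lengths


# Toy input (hypothetical, not taken from the GDS files).
gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr1\tsrc\tgene\t1200\t2000\t.\t-\t.\tID=g2",
]
pep = [">g1.t1 hypothetical protein", "MKV", "LLA", ">g2.t1", "MA"]

print(count_gff_features(gff)["gene"])  # 2
lengths = fasta_lengths(pep)
print(len(lengths), sum(lengths.values()) / len(lengths))  # 2 4.0
```

Real annotation files may use other feature-type names or multi-line conventions, so any such summary should be checked against the specific GFF release used.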

#### *3.5. Completeness of the Genomes*

BUSCO quantitatively assesses the completeness of genome assemblies, gene sets, and transcriptomes based on evolutionarily informed expectations of gene content, using near-universal single-copy orthologs selected from OrthoDB [26]. On the species introduction page, we have integrated BUSCO v5 results for the eight species, reporting complete and single-copy, complete and duplicated, fragmented, and missing orthologs. The results, presented as a bar graph (Figure 2), show that *F.* ×*ananassa* currently has the most complete assembly of the eight strawberry species.

**Figure 2.** Genomic completeness of eight strawberry species evaluated using BUSCO V5 software. Fana, *Fragaria* ×*ananassa*; Fiin, *Fragaria iinumae*; Fnil, *Fragaria nilgerrensis*; Fnip, *Fragaria nipponica*; Fnub, *Fragaria nubicola*; Fori, *Fragaria orientalis*; Fves, *Fragaria vesca*; and Fvir, *Fragaria viridis*.
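The four BUSCO categories shown in Figure 2 are mutually exclusive, and "complete" is the sum of single-copy and duplicated. The following minimal sketch turns raw category counts into a one-line summary that mirrors the C/S/D/F/M/n notation of BUSCO's short summary, and ranks species by the complete fraction. Both functions are hypothetical helpers, not part of BUSCO, and the counts are made-up placeholders rather than GDS results.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format BUSCO category counts as a one-line summary string.

    The four categories are mutually exclusive, and
    complete (C) = single-copy (S) + duplicated (D).
    """
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO groups counted")

    def pct(count):
        return 100.0 * count / n

    return (
        f"C:{pct(single + duplicated):.1f}%"
        f"[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )


def rank_by_completeness(results):
    """Sort species by fraction of complete (S + D) BUSCOs, highest first."""
    def complete_fraction(counts):
        s, d, f, m = counts
        return (s + d) / (s + d + f + m)

    return sorted(results, key=lambda sp: complete_fraction(results[sp]), reverse=True)


# Hypothetical counts (single, duplicated, fragmented, missing); not real results.
toy = {"sp_A": (900, 50, 20, 30), "sp_B": (850, 40, 50, 60)}
print(busco_summary(*toy["sp_A"]))  # C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:1000
print(rank_by_completeness(toy))    # ['sp_A', 'sp_B']
```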

#### *3.6. Phylogenomic Relationships among the Strawberry Genomes*

OrthoFinder [27,28] is a fast, accurate, and comprehensive platform for comparative genomics. It identifies orthogroups and orthologs, infers rooted gene trees for all orthogroups, and identifies all gene duplication events in those gene trees. To analyze the relationships among the eight strawberry genomes, we constructed a phylogenetic tree using OrthoFinder v2.5.2, including two additional species, *Rosa chinensis* and *Arabidopsis thaliana* (Figure 3a).

**Figure 3.** Phylogenomic analysis of eight strawberry species. (**a**) A phylogenomic species tree of eight strawberry species; (**b**) *F.* ×*ananassa*; (**c**) *F. iinumae*; (**d**) *F. nilgerrensis*; (**e**) *F. nipponica*; (**f**) *F. nubicola*; (**g**) *F. orientalis*; (**h**) *F. vesca*; and (**i**) *F. viridis.* Picture (**c**) is from https://en.wikipedia.org/wiki/File:Fragaria\_iinumae\_(fruits).jpg (accessed on 10 November 2021); picture (**i**) is from https://en.wikipedia.org/wiki/File:%D0%9A%D0%BB%D1%83%D0%B1%D0%BD%D0%B8%D0%BA%D0%B0\_(Fragaria\_viridis).jpeg (accessed on 10 November 2021); picture (**f**) is from http://www.fpcn.net/uploads/allimg/131107/2-13110G30231V8.JPG (accessed on 10 November 2021).
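OrthoFinder summarizes, for every orthogroup, how many genes each species contributes. A common criterion in phylogenomic work is to pick out orthogroups with exactly one gene in every species. The sketch below illustrates that criterion, assuming a tab-separated gene-count table laid out like OrthoFinder's `Orthogroups.GeneCount.tsv` (an `Orthogroup` column, one column per species, then a `Total` column). The function names, species codes, and toy rows are hypothetical, and this is not a description of how OrthoFinder itself infers the species tree.

```python
import csv
import io


def read_gene_counts(tsv_text):
    """Parse a gene-count table into {orthogroup: {species: gene count}}.

    Assumes a header row of: Orthogroup, one column per species, Total.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    table = {}
    for row in reader:
        og = row.pop("Orthogroup")
        row.pop("Total", None)  # the per-orthogroup total is not needed here
        table[og] = {species: int(count) for species, count in row.items()}
    return table


def single_copy_orthogroups(table):
    """Orthogroups with exactly one gene in every species."""
    return sorted(
        og for og, counts in table.items()
        if all(count == 1 for count in counts.values())
    )


# Toy table; species codes and orthogroup IDs are hypothetical.
tsv = (
    "Orthogroup\tFves\tFiin\tRchi\tTotal\n"
    "OG0000000\t1\t1\t1\t3\n"
    "OG0000001\t2\t1\t1\t4\n"
    "OG0000002\t1\t0\t1\t2\n"
)
print(single_copy_orthogroups(read_gene_counts(tsv)))  # ['OG0000000']
```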

*Fragaria virginiana* and *Fragaria chiloensis* are the progenitor species of *Fragaria* ×*ananassa*. However, the identity of its diploid ancestors has been debated for more than half a century and remains unresolved. In 2019, Edger et al. proposed that the cultivated strawberry has four different diploid ancestors: *F. vesca*, *F. iinumae*, *F. viridis*, and *F. nipponica* [11]. Just a few months later, Liston et al. reanalyzed the same data set and reached a completely different conclusion: that the octoploid strawberry has only two extant diploid ancestors, *F. vesca* and *F. iinumae* [29]. Edger et al. subsequently defended their original conclusion [30]. Our phylogenetic tree [31] places *F. vesca* closest to *F.* ×*ananassa*. However, because of limitations in sequencing and assembly technology, as well as possible gene introgression, there is still no direct evidence establishing the origin of the cultivated strawberry.

The pictures below the tree (Figure 3b–i) show *F.* ×*ananassa*, *F. iinumae*, *F. nilgerrensis*, *F. nipponica*, *F. nubicola*, *F. orientalis*, *F. vesca*, and *F. viridis*. Photographs b, d, e, g, and h were provided by our colleague Dr. Qiao; the others were obtained from Wikipedia or Baidu.

#### *3.7. Genomic Comparison of Gene Orthogroups*

To provide an overview of the similarities and differences among these strawberry genomes, we compared the numbers of gene orthogroups identified by OrthoFinder. We uploaded the orthogroup data to an online Venn diagram tool (https://www.vandepeerlab.org/?q=tools/venn-diagrams, accessed on 10 November 2021) to generate a Venn diagram showing the shared and unique gene orthogroups of *F. iinumae*, *F. nilgerrensis*, *F. nipponica*, *F. vesca*, and *F. viridis*. The number of orthogroups in each segment is shown in the diagram (Figure 4): 13,766 orthogroups were shared among all five species, and 13,380 appeared to be unique to *F. nipponica*.

**Figure 4.** Venn diagram of gene orthogroups in five wild diploid *Fragaria* species. Comparison of the numbers of shared gene families among the five diploid strawberries *F. iinumae*, *F. nilgerrensis*, *F. nipponica*, *F. vesca*, and *F. viridis*.
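Each segment of a Venn diagram counts the orthogroups present in exactly that combination of species. The online tool computes this internally. A minimal Python sketch of the same bookkeeping follows, assuming presence/absence per species has already been extracted from the OrthoFinder output. The function name, species codes, and orthogroup IDs are hypothetical.

```python
from collections import Counter


def venn_regions(presence):
    """Count orthogroups in each exclusive Venn-diagram region.

    presence maps species -> set of orthogroup IDs with at least one gene
    from that species. Returns a Counter keyed by the frozenset of species
    that share an orthogroup; missing keys count as zero.
    """
    members = {}
    for species, orthogroups in presence.items():
        for og in orthogroups:
            members.setdefault(og, set()).add(species)
    return Counter(frozenset(sp) for sp in members.values())


# Toy presence sets; species codes and orthogroup IDs are hypothetical.
presence = {
    "Fiin": {"OG1", "OG2", "OG3"},
    "Fnip": {"OG1", "OG2", "OG4", "OG5"},
    "Fves": {"OG1", "OG3"},
}
regions = venn_regions(presence)
print(regions[frozenset(presence)])  # 1 (shared by all three)
print(regions[frozenset({"Fnip"})])  # 2 (unique to Fnip)
print(regions[frozenset({"Fiin"})])  # 0
```

With five species there are 2^5 - 1 = 31 such regions. The "shared" and "unique" figures quoted in the text correspond to the all-species region and the single-species regions, respectively.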

The number of predicted genes was considerably higher in *F. nipponica* than in the other four species, and the number of species-specific genes in *F. nipponica* was also extremely high (13,380). The large number of genes predicted in *F. nipponica* stems from its 2014 annotation, which was based solely on Illumina sequencing and contains more than 80,000 annotated proteins. As sequencing technology improves and this genome is resequenced, this problem is expected to disappear.
