## *2.1. Chloroplast Genome Size and Organization*

Sequencing of the nine tomato genotypes produced between 6.2 M (ves2001) and more than 11.5 M (pol) high-quality paired-end reads.

The cultivated accessions cor, pds, pol, ves2001, vfr, and vpz had exactly the same plastome size (155,435 bp), whereas pgl was one bp shorter. Among the wild species, the plastome was slightly larger in *S. neorickii* 1 (155,515 bp) and slightly smaller in *S. pimpinellifolium* 1 (155,420 bp; Table 1). All genotypes exhibited the quadripartite structure typical of angiosperm plastomes, with a pair of inverted repeats (IRs) separated by a large single copy (LSC) region and a small single copy (SSC) region (Table 1).


**Table 1.** Plastome features of the sequenced tomato genotypes.

## *2.2. Genetic Variability and Phylogenetic Analyses*

Comparative analyses were performed to identify patterns of nucleotide variability among the tomato plastomes (the nine genotypes sequenced in this work and the twelve genotypes retrieved from GenBank). An overview of the nucleotide variability is shown in Figure S1. When cultivated and wild plastomes were compared with the reference genome IPA-6, the number of SNPs ranged from 9 to 290 (Figure 1). In cultivated tomatoes the number of SNPs was markedly low (9 to 17), with the notable exception of cer1, which differed from IPA-6 by 74 SNPs, a difference comparable to that of wild *S. pimpinellifolium*. All local accessions showed identical plastome sequences, except for cor, which differed by one point mutation in exon 2 of the *rpoC1* gene, and pgl, which was one bp shorter.
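As an illustration of the kind of comparison summarized above, the following minimal sketch counts substitutions between a reference and a query that have already been aligned. It is not the SNP-calling procedure used in this work; the function name `count_snps`, the toy sequences, and the choice to skip gap and `N` columns are assumptions made for the example.

```python
def count_snps(reference: str, query: str) -> int:
    """Count substitutions between two pre-aligned sequences of equal length.

    Columns with a gap ('-') or an ambiguous base ('N') in either sequence
    are skipped, so indels are not counted as SNPs (an assumption of this sketch).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for r, q in zip(reference.upper(), query.upper())
        if r != q and r not in skip and q not in skip
    )


# Toy aligned pair (hypothetical data): two substitutions and one gap column.
ref = "ATGCCGTA-ACGT"
qry = "ATGTCGTAGACGA"
print(count_snps(ref, qry))  # 2
```

Skipping gap columns keeps single-base length differences, such as the one reported for pgl, separate from point mutations.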

Given the low variability detected, we asked whether the SNPs identified in cultivated genotypes were ancestral or *de novo* mutations, that is, whether they likely arose before or after domestication. To this end, a comparative analysis was performed on the investigated tomato genotypes clustered into five groups: (1) the *S. lycopersicum* var. *lycopersicum* commercial varieties IPA-6 and M82; (2) the seven local accessions from the Campania region; (3) the *S. lycopersicum* var. *cerasiforme* accessions cer1 and cer2; (4) *S. pimpinellifolium* 1 and *S. pimpinellifolium* 2; and (5) a wild group including *S. habrochaites*, *S. cheesmaniae*, and *S. galapagense*, which are phylogenetically closer to cultivated tomato than the other wild species (Figure 2).

**Figure 1.** Stacked bar chart showing the distribution of single nucleotide polymorphisms (SNPs) that fall within coding sequences of genes, introns, and intergenic regions of the nine tomato plastomes sequenced in this work and in those of eleven species retrieved from GenBank. The plastome of IPA-6 (AM087200) was used as reference for SNP calling.

**Figure 2.** Hierarchical clustered heatmap representing color-coded SNP alleles as scored across 5 different groups of genotypes, i.e., var. *cerasiforme*; var. *lycopersicum*; local accessions; *pimpinellifolium*; wild species (including *S. habrochaites*, *S. cheesmaniae*, and *S. galapagense*). Numbers at the base of the tree indicate the SNP(s) that fall into each group. Blue: reference allele; green: alternative allele; yellow: reference or alternative allele.

Notably, a high number of SNPs (271) was shared among all five groups and differed from the distantly related wild species; these are therefore ancestral mutations that evolved in the phyletic lineages including cultivated tomatoes. Ninety SNPs were shared between the cultivated tomatoes (*S. lycopersicum* var. *cerasiforme*, *S. lycopersicum* var. *lycopersicum*, and local accessions) and the *S. pimpinellifolium* group, whereas the other wild species showed either the reference or the alternative allele. It is very likely that these 90 SNPs were fixed only in the phyletic lineage of wild *S. pimpinellifolium* and cultivated tomatoes. Only two SNPs distinguished the local accessions from the remaining groups, but 38 SNPs (invariable among the cultivated genotypes) differed between cultivated tomatoes and *S. pimpinellifolium*. Notably, the *S. lycopersicum* var. *cerasiforme* group (cer1 and cer2) showed either the reference or the alternative allele at these 38 SNPs, with cer1 sharing the *S. pimpinellifolium* allele and cer2 the cultivated one. This result suggests that these 38 SNPs evolved as *de novo* mutations after the separation of cultivated forms from wild *S. pimpinellifolium*, but were already present in the ancestral domesticated gene pool (including the *S. lycopersicum* var. *cerasiforme* group) and were only subsequently fixed in the cultivated *S. lycopersicum* var. *lycopersicum* and local accessions groups. A further five SNPs were shared among the local accessions, *S. pimpinellifolium*, and wild groups, while the *S. lycopersicum* var. *lycopersicum* group showed the reference allele and the *S. lycopersicum* var. *cerasiforme* group showed either the reference or the alternative allele. Excluding cer1, seven SNPs (these five and the two point mutations exclusive to local accessions) represent the only differences between the plastomes of *S. lycopersicum* var. *lycopersicum* and those of the local accessions.
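The group-level scoring used in this comparison (reference allele, alternative allele, or both within a group) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used to produce Figure 2: the function names, the `"ref"`/`"alt"` labels, and the toy calls are hypothetical. The example mirrors the pattern described for the 38 SNPs; the wild group is left out because its allele state at these sites is not given in the text.

```python
from typing import Dict, List


def group_state(alleles: List[str]) -> str:
    """Summarize one group's calls at a single SNP.

    Returns 'ref' if every genotype carries the reference allele,
    'alt' if every genotype carries the alternative allele,
    and 'mixed' if both alleles occur within the group.
    """
    if not alleles:
        raise ValueError("a group needs at least one genotype call")
    states = set(alleles)
    if states == {"ref"}:
        return "ref"
    if states == {"alt"}:
        return "alt"
    return "mixed"


def classify_snp(calls: Dict[str, List[str]]) -> Dict[str, str]:
    """Apply group_state to every group at one SNP position."""
    return {group: group_state(alleles) for group, alleles in calls.items()}


# Toy SNP (hypothetical calls) following the 38-SNP pattern described above.
snp = {
    "lycopersicum": ["ref", "ref"],      # IPA-6, M82
    "local": ["ref"] * 7,                # seven local accessions
    "cerasiforme": ["alt", "ref"],       # cer1, cer2
    "pimpinellifolium": ["alt", "alt"],  # accessions 1 and 2
}
print(classify_snp(snp))
# {'lycopersicum': 'ref', 'local': 'ref', 'cerasiforme': 'mixed', 'pimpinellifolium': 'alt'}
```

In the color scheme of Figure 2, the three states would correspond to blue (reference), green (alternative), and yellow (reference or alternative).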

As expected, wild species showed the highest number of SNPs regardless of their phylogenetic distance from the reference genome (Figures 1 and 2), with variation detected even between accessions of the same species: ten SNPs differed between the two accessions of *S. pimpinellifolium*. Considering the distribution of SNPs across coding sequences, introns, and intergenic regions, the highest number of SNPs was scored in intergenic regions, ranging from 9 to 13 in cultivated tomatoes and cer2, 40 in cer1, and 40 to 170 in the wild relatives. The same trend was observed for SNPs in coding sequences (Figure 1). In wild species, coding SNPs ranged from 25 to 94 and were dispersed as one or two variants per gene in most genes, whereas in cultivated genotypes up to four SNPs in local accessions were located in the *matK*, *rpoC1* (exon 2), and *ycf1* coding sequences, one of which caused an amino acid change. Once again, in contrast to all other cultivated genotypes, cer1 showed a number and distribution of SNPs similar to those found in wild *S. pimpinellifolium*.

The most variable genes, especially among wild species, were *ndhH* and *ycf1*, with 9 and 42 SNPs, respectively (Figures S2 and S3). The mutations observed in the *ndhH* gene were synonymous (i.e., they did not change the amino acid sequence), whereas the nucleotide variability observed in *ycf1* was also reflected at the amino acid level. Interestingly, one SNP produced an amino acid change between var. *lycopersicum* and the local accessions (Figure S3).

One hundred and fourteen simple sequence repeats (SSRs) were identified. Mononucleotide repeats (adenine or thymine) were the most common type of microsatellite. Only four wild genotypes showed dinucleotide repeats (*S. neorickii* 1 and 2, *S. peruvianum*, and *S. chilense*). As observed for SNPs, the clustered heatmap of SSRs across grouped genotypes revealed a very low level of polymorphism (Figure 3).
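For readers unfamiliar with in silico SSR detection, a minimal sketch of a mononucleotide-repeat scan is given below. It is not the tool or parameter set used in this work; the function name `find_mono_ssrs`, the default threshold of 10 repeat units, and the toy sequence are assumptions made for illustration.

```python
import re
from typing import List, Tuple


def find_mono_ssrs(seq: str, min_repeats: int = 10) -> List[Tuple[int, str, int]]:
    """Find mononucleotide runs of at least `min_repeats` units.

    Returns (0-based start, base, run length) for each run.
    The threshold is an assumed value, not one taken from the paper.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # One base followed by at least (min_repeats - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_repeats - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence (hypothetical): an A12 run, a T10 run, and a short A run below threshold.
seq = "GGC" + "A" * 12 + "CGC" + "T" * 10 + "GCA" + "A" * 5
print(find_mono_ssrs(seq))  # [(3, 'A', 12), (18, 'T', 10)]
```

Comparing the run length reported at the same locus across genotypes gives the per-locus repeat-unit differences of the kind summarized in Figure 3.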

Sixty-seven SSRs were ancestral (same number of repeat units), being shared by all the grouped genotypes as compared with the most distantly related wild species not included in the wild group. Six SSRs had the same number of repeat units in both the *S. lycopersicum* var. *lycopersicum* and local accessions groups, whereas the *S. pimpinellifolium* and wild groups displayed a different number of repeat units, and *S. lycopersicum* var. *cerasiforme* displayed both. One SSR, located in the *atpB-rbcL* intergenic region, had 13 repeat units in both the *S. lycopersicum* var. *lycopersicum* and local accessions groups. A number of repeat units exclusive to the local accessions group was detected in the *ndhC-tRNA-Val* (UAC) intergenic region, while a number of repeat units exclusive to the *S. lycopersicum* var. *lycopersicum* group was found in the *psbE-petL* intergenic region (Figure 3, Table S2). Interestingly, one SSR in the *atpH-atpI* intergenic region had the same number of repeat units in all cultivated genotypes and in the wild group, while *S. pimpinellifolium* displayed a different number (Table S2). A complete description of SSR variability is shown in Figure S4a. As already observed for SNPs, SSRs were mainly located in intergenic regions (58%), and most of them fell within the LSC (75%; Figure S4b).

**Figure 3.** Hierarchical clustered heatmap representing color-coded simple sequence repeat (SSR) alleles as scored across 5 different groups of genotypes, i.e., var. *lycopersicum*; local accessions; *S. pimpinellifolium*; var. *cerasiforme*; wild species (including *S. habrochaites*, *S. cheesmaniae*, and *S. galapagense*). Numbers at the base of the tree indicate the SSR(s) that fall into each group. Blue: reference allele; green: alternative allele; yellow: reference or alternative allele.

Among the microsatellites identified in silico, eight SSR loci with small variation in the number of repeat units were tested experimentally to verify that their length had been estimated correctly. No variation in the number of repeat units was detected, either in silico or in the electrophoresis profiles, in representatives of the nine genotypes sequenced in this work or in a large dataset including additional local accessions and processed/fresh market tomatoes (e.g., Acampora, Lucariello, San Marzano, and Sorrento), confirming the absence of SSR variation within and among cultivated tomatoes. A notable exception was the one-base difference found in the microsatellite located in the *ndhF-rpl32* intergenic region, which distinguished the local accessions group from the other tomato landraces and was also confirmed by the electrophoresis profiles (data not shown).

Additionally, 17 perfect tandem repeats (TRs) were found, with cultivated species displaying fewer TRs than wild species (Figure 4a). The identified TRs were mainly located in the LSC and in intergenic regions (70 and 82%, respectively); two TRs found in all genotypes were in the coding regions of the *rps16* and *rps4* genes (Figure 4b). The TR period size ranged from 13 to 26 bp (Figure 4c). The TRs confirmed the low variability among the analyzed tomato genotypes. No TR was specific to any cultivated tomato, nor could any *de novo* TRs be identified. A TR located in the *tRNA-Gln (UUG)-psbK* intergenic region was the only one found to vary among species (Table S3). In particular, local accessions and *S. pimpinellifolium* 1 had one copy, *S. neorickii* 1 and 2 had three copies, while *S. lycopersicum* var. *lycopersicum* (IPA-6 and M82), *S. lycopersicum* var. *cerasiforme* (cer1 and cer2), *S. pimpinellifolium* 2, and the remaining wild species had two copies (Table S3). Interestingly, a *de novo* duplication of a four-base motif, (ATAA)2, exclusive to the local accessions, was detected by multiple sequence alignment (MSA; Figure S5).
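Copy-number differences of a perfect tandem repeat, such as the one-, two-, and three-copy states reported above or the (ATAA)2 duplication, can be illustrated with a short sketch. This is not the TR-detection software used in this work; the function name `max_tandem_copies` and the flanking bases in the toy sequences are hypothetical.

```python
import re


def max_tandem_copies(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`.

    A lookahead is used so that runs starting at every position are examined,
    which matters for motifs that can overlap themselves.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    motif = motif.upper()
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(motif))
    return max(
        (len(m.group(1)) // len(motif) for m in pattern.finditer(seq.upper())),
        default=0,
    )


# Toy sequences (hypothetical flanks) around the ATAA motif.
single = "GGCATAACCG"      # one copy
duplicated = "GGCATAAATAACCG"  # (ATAA)2
print(max_tandem_copies(single, "ATAA"), max_tandem_copies(duplicated, "ATAA"))  # 1 2
```

Applied to the same locus in each genotype of an alignment, the returned count gives the per-genotype copy number that a table such as Table S3 would report.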

The phylogenetic tree inferred from the complete plastomes of the twenty-one tomato genotypes, using the potato chloroplast genome (*S. tuberosum* cv. Désirée, DQ386163) as an outgroup, showed two main clades with strong bootstrap support (100%; Figure 5). One clade included some wild species (*S. pennellii*, *S. neorickii* 1 and 2, *S. peruvianum*, and *S. chilense*), with *S. pennellii* as the basal species. The other clade was further separated into several subclades. In particular, the group that included the seven local accessions from the Campania region was closely related to a cluster formed by the other cultivated varieties (IPA-6, M82, and cer2). As expected, all cultivated genotypes were more closely related to the clade comprising the two *S. pimpinellifolium* accessions and cer1. The remaining wild species (*S. galapagense*, *S. cheesmaniae*, and *S. habrochaites*) were in a separate clade. Finally, the phylogenetic analysis confirmed the admixed nature of *S. lycopersicum* var. *cerasiforme*, as cer1 was closely related to the wild species (*S. pimpinellifolium* 1 and 2), while cer2 was part of the clade of cultivated genotypes (M82 and IPA-6).

**Figure 4.** Perfect tandem repeats (TRs) in the nine plastomes sequenced in this work and in the plastome sequence of eleven species available in GenBank. The plastome of IPA-6 (AM087200) was used as reference. (**a**) Bar chart reporting the total number of TRs in each genotype. (**b**) Pie charts describing the percentage of TRs located in the coding sequences of genes, introns, and intergenic regions and in the large single copy (LSC), small single copy (SSC), and inverted repeat b (IR) regions. (**c**) Pie chart describing the percentage of TRs with a specific period size.

**Figure 5.** Phylogenetic tree of cultivated and wild tomato genotypes. Phylogram of the best maximum-likelihood (ML) tree on the complete plastome dataset using *Solanum tuberosum* cv. Désirée (DQ386163) as the outgroup. Numbers associated with branches are ML bootstrap support values. Bootstraps higher than 70% are reported on the nodes.
