### 3.2.2. Sequencing and Bioinformatic Analysis

Using the Illumina HiSeq paired-end sequencing platform, we obtained 52−71 Gb of data per sample, corresponding to >20× coverage of the maize genome (2300 Mb) (Table 3). Reads were mapped to the B73 reference genome and variants were called.


**Table 3.** Sequencing data summary.

To identify EMS-induced segregating variants in the F2 for mapping, we used only variants that were fixed between the M5 mutant (having an alternative allele frequency of 1) and the ML10 wild-type samples (having an alternative allele frequency of 0), and had an allele frequency between 0.25 and 0.75 in the F2 wild-type pool samples. This allele-frequency filter was used to exclude EMS-induced SNPs from E-ML10 that were only present in individual F1 plants from which part of the F2 was derived. Of note, because the F2 was derived from more than one selfed F1 plant, this filter means that we only based the mapping on EMS-induced variants present in the E1-9 line versus the unmutagenized ML10 and discarded other induced SNPs from E-ML10, as their expected frequencies could not be determined due to the pedigrees used (Figure 2B). We then plotted the allele frequencies of these filtered variants along the 10 chromosomes of maize to identify variants enriched in the F2-mutant pool, but not in the F2-wt pool (Figure 2C). This identified a broad region of distorted segregation in the mutant pool on chromosome 4, with the strongest signal at the very top of this chromosome. In addition, we observed clusters containing a very high number of variants on chromosomes 1, 2, 3, 5 and 6; these likely reflect residual heterozygosity in the starting ML10 material.
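
The allele-frequency filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this study: the `Variant` record, its field names, the function `is_mapping_marker`, the inclusive treatment of the 0.25 and 0.75 bounds, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record holding alternative-allele frequencies per sample."""
    chrom: str
    pos: int
    af_mutant: float    # M5 mutant sample
    af_wildtype: float  # unmutagenized ML10 sample
    af_f2_wt: float     # F2 wild-type pool
    af_f2_mut: float    # F2 mutant pool


def is_mapping_marker(v: Variant, low: float = 0.25, high: float = 0.75) -> bool:
    """Keep variants fixed between mutant and wild type that segregate in the F2.

    Assumption: the bounds are treated as inclusive; the text only says
    "between 0.25 and 0.75".
    """
    fixed_between_parents = v.af_mutant == 1.0 and v.af_wildtype == 0.0
    segregating_in_f2_wt = low <= v.af_f2_wt <= high
    return fixed_between_parents and segregating_in_f2_wt


# Made-up example values, for illustration only.
variants = [
    Variant("4", 1_000, 1.0, 0.0, 0.35, 1.00),  # kept: fixed and segregating
    Variant("4", 2_000, 1.0, 0.0, 0.10, 0.20),  # dropped: too rare in F2-wt pool
    Variant("1", 3_000, 1.0, 1.0, 1.00, 1.00),  # dropped: not different from ML10
    Variant("2", 4_000, 0.5, 0.0, 0.50, 0.50),  # dropped: not fixed in the mutant
    Variant("5", 5_000, 1.0, 0.0, 0.50, 0.55),  # kept: unlinked segregating marker
]

markers = [v for v in variants if is_mapping_marker(v)]
for v in markers:
    print(v.chrom, v.pos, v.af_f2_wt, v.af_f2_mut)
```

In this sketch, a marker linked to the causal mutation would show an F2-mutant-pool frequency near 1, whereas an unlinked marker would stay near 0.5 in both pools.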

**Figure 2.** A modified Mutmap to map the E1-9 fasciation mutant by crossing with the "Evil" ML10. (**A**) Ears of E1-9 fasciation mutant and the wild-type ML10. (**B**) Crossing scheme and material preparation for bulk-sequencing. (**C**) Allele frequencies of filtered mutant vs wild-type markers for F2 mutant (purple) and F2 wild-type (blue) pools shown as dots. Lines represent Loess smoothing through allele frequency averages over 1 Mb windows. Vertical gray bar shows *CLE7* position. In the clusters with high variant densities on chromosomes 1, 2, 3, 5 and 6 the dots for the wild-type pools are obscured by the dots for the mutant pools. (**D**) SSR marker analysis in E1-9/BL10 F2 families confirms that the E1-9 mutation is linked to the top of chromosome 4. DNA samples are E1-9 and BL10 parents and their F2 pools of plants with mutant (F2-mut) and with wild-type phenotypes (F2-wt). Below is the genetic map of Bins on chromosome 4 (adapted from [6]). *CLE7* is in bin 4.02.

Similar to these observations, Klein et al. [28] also found that the B73 line used for their mutagenized population contained segments that differed from the B73 reference genome. Liang et al. [40] sequenced several B73 stocks from different laboratories and also reported the presence of clearly defined genomic blocks containing haplotypes that differ from the published B73 reference genome. Similarly, such residual heterozygosity in our ML10 material would explain why the E1-9 mutant contained genomic segments that differed from the E-ML10 used for crossing and the sequenced unmutagenized ML10 individuals. Thus, using our pipeline, the E1-9 mutation was mapped to the top of chromosome 4.

### 3.2.3. Linkage Confirmation by PCR Analysis and Identification of Causal Mutation

To confirm the location of the mutation, bulk linkage analysis by SSR markers was performed in an independent F2 mapping population (E1-9 crossed with BL10). Results showed that the E1-9 mutation was linked to marker *phi021* in bin 4.03 on top of chromosome 4 (Figure 2D). These results showed that our map-by-sequencing approach allowed the rapid identification of the putative region harboring the causal mutation.

To identify the causal mutation in this region, we further looked at the above-filtered SNPs that were homozygous in the F2-mutant-pool sample and found only one such SNP at position Chr4: 8,764,229 (C in ML10 and T in E1-9). However, this SNP was in an intergenic region, 3 kb downstream from the closest gene, LOC103655072, which encodes a *Pectinesterase Inhibitor 38*. Hence, this SNP was unlikely to be the causal mutation in E1-9.

Therefore, we also checked the nucleotide sequences of candidate genes whose mutations are known to cause ear fasciation including *CT2* [41], *FEA2* [42], *TD1* [43], Zm*GB1* [44], *FEA3* [45], *FEA4* [46], and *CLE7* [47,48]. No differences were found in the coding region of these genes (Supplementary data) between ML10 and E1-9. However, we found a fixed 376 bp deletion (Chr4: 8,337,361−8,337,738) in the promoter of *CLE7* in E1-9 and F2-mutant-pool, but not in ML10, and segregating in the F2-wt pool (Figure 3A,B, Figure S1).

*CLE7* lies in bin 4.02 on top of chromosome 4, very close to the region identified in our whole genome sequencing analysis. As CRISPR-Cas9 derived *cle7* maize mutants have recently been shown to display ear fasciation [48], this makes the promoter deletion in *cle7* in our E1-9 mutant the prime candidate for the causal mutation. The *cle7-a1* and *cle7-a2* CRISPR-Cas9 derived alleles carry a deletion of one nucleotide at +10 and +11 after the start of translation, respectively, resulting in a frameshift after the third amino acid in the protein and causing a premature stop codon in these likely null alleles. Our mutation is a 376 bp deletion at position −2 relative to the ATG start codon, which is likely to result in a strong, close-to-null allele. The large size of the promoter deletion explains why we did not find this variant in our bioinformatic analysis of the Mapping-by-Sequencing data, but only identified the closely linked marker SNP. Although the mechanism is unclear, EMS has been shown before to cause insertions/deletions [8,49].

To test for co-segregation of mutant phenotype and the promoter deletion in *CLE7*, we designed a PCR marker for the deletion (Figure 3B) and tested it in two different F2 populations. In one F2 (E1-9 crossed with B73), where a 1:3 segregation ratio of the mutant phenotype was observed (33 mutants: 95 wt plants), there was perfect co-segregation of the mutant phenotype with the homozygous deletion in all 33 mutant plants (Figure 3C). This result supports the hypothesis that the deletion in *CLE7* was the causal mutation in E1-9. In a different F2 (E1-9 crossed with BL10), we observed a much lower frequency of the mutant phenotype (18 mutants: 96 wt plants), significantly different from a 1:3 ratio (Chi-square test, *p* = 0.02). Although all of the 18 plants with the mutant phenotype were homozygous for the promoter deletion, confirming co-segregation, three plants, whose ears looked normal, were homozygous for the promoter deletion (Figure 3D). These three plants were genotypically mutant, but phenotypically wild-type, suggesting that the mutant phenotype was suppressed in these plants.
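
The segregation test reported above can be reproduced with a short goodness-of-fit calculation. The sketch below is a minimal illustration and not necessarily the exact procedure used here: it assumes a Pearson chi-square test with one degree of freedom and no continuity correction, and the function name `chi_square_1_to_3` is hypothetical. For one degree of freedom, the upper-tail probability of the chi-square distribution equals `erfc(sqrt(chi2 / 2))`, so only the standard library is needed.

```python
import math


def chi_square_1_to_3(mutants: int, wild_type: int) -> tuple[float, float]:
    """Pearson goodness-of-fit test against a 1:3 (mutant:wild-type) ratio.

    Assumption: no continuity correction is applied.
    Returns the chi-square statistic and its p-value (1 degree of freedom).
    """
    total = mutants + wild_type
    expected_mut = total * 0.25
    expected_wt = total * 0.75
    chi2 = ((mutants - expected_mut) ** 2 / expected_mut
            + (wild_type - expected_wt) ** 2 / expected_wt)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the two F2 populations described in the text.
chi2_b73, p_b73 = chi_square_1_to_3(33, 95)    # E1-9 x B73
chi2_bl10, p_bl10 = chi_square_1_to_3(18, 96)  # E1-9 x BL10

print(f"B73 cross:  chi2 = {chi2_b73:.3f}, p = {p_b73:.3f}")
print(f"BL10 cross: chi2 = {chi2_bl10:.3f}, p = {p_bl10:.3f}")
```

Under these assumptions, the BL10 cross gives a p-value that rounds to the reported 0.02, while the B73 cross shows no significant deviation from 1:3.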

Taken together, these results strongly suggest that E1-9 was a *cle7* mutant; interestingly, they also suggest that there was a suppressor of the *cle7* mutant phenotype in the BL10 background.

**Figure 3.** Identification of a promoter deletion in *CLE7* in E1-9. (**A**) IGV screenshot showing the alignments of reads to the B73 reference genome for ML10, E1-9, F2-wt pool and F2-mutant pool at the *CLE7* locus. (**B**) Illustration of the *CLE7* gene and the position of the promoter deletion in the E1-9 mutant. Genotyping primers were designed such that the wild-type allele gives a product of 537 bp, while the mutant allele gives a product of 161 bp. (**C**) and (**D**) Genotyping results for the phenotypically mutant and wild-type plants from two F2 populations made from crosses between the homozygous E1-9 mutant to B73 (**C**) and to BL10 (**D**), respectively. (**D**) In the F2 with BL10, arrows point at samples from plants with wild-type looking ears that had mutant genotypes, suggesting that BL10 contains a suppressor(s) of the *cle7* mutant phenotype.
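
The product sizes given in Figure 3B follow directly from the deletion length: 537 bp − 376 bp = 161 bp. As a hedged illustration of how such a marker could be scored, the sketch below calls a genotype from observed band sizes. The function `call_genotype` and the 15 bp sizing tolerance are assumptions made for this example and are not part of the genotyping protocol described here.

```python
WT_PRODUCT_BP = 537   # wild-type amplicon (Figure 3B)
DELETION_BP = 376     # size of the CLE7 promoter deletion
MUT_PRODUCT_BP = WT_PRODUCT_BP - DELETION_BP  # expected mutant amplicon


def call_genotype(band_sizes_bp: list[int], tolerance_bp: int = 15) -> str:
    """Call a genotype from estimated gel band sizes.

    Assumption: a band within `tolerance_bp` of an expected product size
    counts as that product; the tolerance value is arbitrary.
    """
    has_wt = any(abs(b - WT_PRODUCT_BP) <= tolerance_bp for b in band_sizes_bp)
    has_mut = any(abs(b - MUT_PRODUCT_BP) <= tolerance_bp for b in band_sizes_bp)
    if has_wt and has_mut:
        return "heterozygous"
    if has_mut:
        return "homozygous deletion"
    if has_wt:
        return "homozygous wild-type"
    return "no call"


print(MUT_PRODUCT_BP)
print(call_genotype([540]))
print(call_genotype([535, 165]))
print(call_genotype([160]))
```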
