*2.3. Rarefaction and Sequencing Analysis*

The raw paired-end FASTQ reads were demultiplexed using idemp (https://github.com/yhwu/idemp/blob/master/idemp.cp) and imported into the Quantitative Insights Into Microbial Ecology 2 program (QIIME2, ver. 2017.9.0, https://qiime2.org/). Raw reads were subsequently deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database under the SRA accession SRP145097. The Divisive Amplicon Denoising Algorithm 2 (DADA2) was used to quality filter, trim, denoise, and merge the data. Chimeric sequences were removed using the consensus method. A feature classifier in QIIME2, trained on the SILVA 99% operational taxonomic unit (OTU) database trimmed to the V4 region of the 16S rRNA gene, was used to assign taxonomy to all ribosomal sequence variants. Contaminating mitochondrial and chloroplast sequences were filtered out of the resulting feature table. The remaining representative sequences were aligned with MAFFT and used for phylogenetic reconstruction in FastTree. Finally, diversity metrics were calculated using the QIIME2 diversity plugin and visualized with Prism (ver. 7.0a, GraphPad, La Jolla, CA, USA).
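The organelle-filtering step can be illustrated with a minimal Python sketch. This is not the QIIME2 implementation used in the study. It assumes a feature table held as a plain dictionary and semicolon-delimited taxonomy strings; the names `drop_organelle_features`, `is_organelle`, `toy_table`, and `toy_taxonomy`, as well as all counts and taxonomy labels, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable

# Hypothetical in-memory layout: feature ID -> {sample ID: read count}.
FeatureTable = Dict[str, Dict[str, int]]

# Terms whose presence in a taxonomy string marks a feature as organellar.
EXCLUDE_TERMS = ("mitochondria", "chloroplast")


def is_organelle(taxonomy: str, terms: Iterable[str] = EXCLUDE_TERMS) -> bool:
    """Return True if any exclusion term occurs in the taxonomy string (case-insensitive)."""
    lowered = taxonomy.lower()
    return any(term in lowered for term in terms)


def drop_organelle_features(table: FeatureTable, taxonomy: Dict[str, str]) -> FeatureTable:
    """Keep only features not annotated as mitochondria or chloroplast.

    Assumption: features with no taxonomy entry are retained.
    """
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if not is_organelle(taxonomy.get(feature_id, ""))
    }


# Toy data, invented for illustration only.
toy_table: FeatureTable = {
    "sv1": {"A": 900, "B": 40},
    "sv2": {"A": 15, "B": 700},
    "sv3": {"A": 5, "B": 3},
}
toy_taxonomy = {
    "sv1": "Bacteria;Cyanobacteria;Chloroplast",
    "sv2": "Bacteria;Firmicutes;Bacilli;Lactobacillales",
    "sv3": "Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Mitochondria",
}

filtered = drop_organelle_features(toy_table, toy_taxonomy)
print(sorted(filtered))
```

Matching on substrings of the taxonomy string keeps the sketch short; a stricter approach would match the exact rank at which the organelle label appears.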

After quality filtering and preprocessing, 8 of our 37 sequenced samples had fewer than 650 reads, which we deemed insufficient for statistically powerful diversity analysis and thus a potential source of bias. We therefore removed these read-poor samples from downstream alpha and beta diversity analyses. Five of the discarded samples were distributed, one each, across different ingredient and environmental sample types. Given the low variation between the remaining two replicates in each of these sample types, we consider the two replicates sufficient. The other three read-poor samples were the triplicate fermenting samples from Day 0. These samples were dominated by contaminating chloroplast reads, which were computationally removed, and the remaining bacterial reads were too few to support alpha and beta diversity measurements. The low number of bacterial reads in the Day 0 samples and in our other read-poor samples likely reflects the intrinsically low bacterial abundance of those communities. While this limits the potential scope of our conclusions, it is an inevitable result of working with low-abundance communities, and it is the reason Day 0 is absent from Figures 1 and 2. To visualize the bacterial community at the Day 0 time point, we used a less restrictive cutoff of 250 reads for sample inclusion in our taxa bar plots (Figure 3). This allowed us to recapture all three replicates from Day 0 and gain insight into the structure of these communities in the absence of diversity analysis.
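The two read-depth cutoffs described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually run in QIIME2: the function `samples_at_or_above`, the sample names, and the read counts in `toy_depths` are hypothetical, and the sketch assumes that the 250-read cutoff, like the 650-read cutoff, retains samples with at least that many reads.

```python
from typing import Dict, List

# Cutoffs taken from the text: samples with fewer than 650 reads were excluded
# from diversity analyses; a 250-read cutoff was used for the taxa bar plots.
DIVERSITY_MIN_READS = 650
BARPLOT_MIN_READS = 250


def samples_at_or_above(depths: Dict[str, int], min_reads: int) -> List[str]:
    """Return the sorted IDs of samples whose total read count is >= min_reads."""
    return sorted(sample for sample, depth in depths.items() if depth >= min_reads)


# Toy per-sample read totals after organelle removal (invented values).
toy_depths = {
    "day0_rep1": 310,
    "day0_rep2": 270,
    "day7_rep1": 5400,
    "ingredient_rep1": 120,
    "ingredient_rep2": 2100,
}

diversity_samples = samples_at_or_above(toy_depths, DIVERSITY_MIN_READS)
barplot_samples = samples_at_or_above(toy_depths, BARPLOT_MIN_READS)

print("diversity:", diversity_samples)
print("bar plots:", barplot_samples)
```

In this toy data the Day 0 replicates fall between the two cutoffs, so they drop out of the diversity set but remain in the bar-plot set, mirroring the situation described in the text.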
