*4.3. Clustering, Sequencing, and Quality Control*

Clustering of the index-coded samples was performed on a cBot Cluster Generation System using the TruSeq PE Cluster Kit v4-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the libraries were sequenced on an Illumina HiSeq platform to generate paired-end reads. Raw data (raw reads) in FASTQ format were first processed with in-house Perl scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapter sequences, reads containing poly-N, and low-quality reads from the raw data. The Q20 and Q30 values (the percentages of bases with a Phred quality score of at least 20 and 30, respectively), GC content, and sequence duplication level of the clean data were also calculated. All downstream analyses were based on these high-quality clean data (supplied by BioMarker, Beijing, China).
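The in-house Perl scripts are not described further, so the Python sketch below only illustrates the kinds of filtering and summary statistics named above; it is not the provider's pipeline. It assumes Phred+33 quality encoding and detects adapters by exact substring match against the Illumina TruSeq adapter prefix `AGATCGGAAGAGC`, which is a simplification (real trimmers tolerate mismatches and partial matches). The thresholds in the hypothetical `is_clean` function (more than 10% N bases, or more than 50% of bases with Phred ≤ 10) are illustrative values, not those used in this study. Duplication level is reduced to the percentage of reads that exactly repeat an earlier read, which is simpler than what tools such as FastQC report.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Read:
    seq: str
    qual: str  # ASCII quality string; Phred+33 encoding is an assumption


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def is_clean(read: Read,
             adapter: str = "AGATCGGAAGAGC",  # assumed TruSeq adapter prefix
             max_n_frac: float = 0.10,        # hypothetical poly-N threshold
             low_q: int = 10,                 # hypothetical low-quality cutoff
             max_low_q_frac: float = 0.50) -> bool:
    """Hypothetical filter: reject reads with an exact adapter match,
    too many N bases, or too many low-quality bases."""
    if not read.seq:
        return False
    if adapter in read.seq:
        return False
    if read.seq.count("N") / len(read.seq) > max_n_frac:
        return False
    scores = phred_scores(read.qual)
    n_low = sum(1 for q in scores if q <= low_q)
    return n_low / len(scores) <= max_low_q_frac


def qc_stats(reads: List[Read]) -> Dict[str, float]:
    """Q20/Q30: % of bases with Phred >= 20/30; GC: % of bases that are G or C;
    duplication: % of reads whose sequence repeats an earlier read exactly."""
    total = q20 = q30 = gc = 0
    for r in reads:
        for base, q in zip(r.seq, phred_scores(r.qual)):
            total += 1
            q20 += int(q >= 20)
            q30 += int(q >= 30)
            gc += int(base in "GCgc")
    if total == 0:
        return {"Q20": 0.0, "Q30": 0.0, "GC": 0.0, "duplication": 0.0}
    n_dup = len(reads) - len({r.seq for r in reads})
    return {
        "Q20": 100 * q20 / total,
        "Q30": 100 * q30 / total,
        "GC": 100 * gc / total,
        "duplication": 100 * n_dup / len(reads),
    }


if __name__ == "__main__":
    raw = [
        Read("ACGTACGTGC", "IIIIIIIIII"),           # 'I' = Phred 40
        Read("NNNNNACGTA", "IIIIIIIIII"),           # 50% N -> removed
        Read("ACGTAGATCGGAAGAGCTT", "I" * 19),      # adapter -> removed
        Read("ACGTACGTGC", "5555555555"),           # '5' = Phred 20
    ]
    clean = [r for r in raw if is_clean(r)]
    print(qc_stats(clean))
    # {'Q20': 100.0, 'Q30': 50.0, 'GC': 60.0, 'duplication': 50.0}
```

Keeping filtering and statistics as separate functions mirrors the order described in the text: reads are filtered first, and the summary metrics are then computed on the clean data only.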
