#### *2.2. RNA Extraction and RNA-Seq*

RNA was extracted from the berry skins of the different grape varieties using an RNA extraction kit (Solebao Biotechnology Co., Ltd., Shanghai, China). RNA integrity was checked by agarose gel electrophoresis, purity and concentration were measured with a NanoDrop-2000 instrument (Thermo Scientific, Waltham, MA, USA), and the RNA quality number (RQN) was determined with Agilent 5300 software. Samples were used for follow-up experiments only when the RNA was free of contaminants such as pigment, protein, and sugar, and met all of the following criteria: RQN ≥ 7; the 28S/23S band brighter than the 18S/16S band; RNA concentration ≥ 100 ng/μL; OD260/280 between 1.8 and 2.2; OD260/230 ≥ 2; and a total RNA yield (>1 μg) sufficient for constructing two RNA libraries.
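The numeric acceptance criteria above can be written as a simple checklist. The following is a minimal Python sketch, not the authors' procedure: the `RnaSample` class, its field names, and the example values are hypothetical, and the gel-based criteria (band brightness, absence of visible contaminants) are left out because they are assessed by eye.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    """Hypothetical record of the measurements taken for one RNA sample."""
    name: str
    rqn: float             # RNA quality number
    conc_ng_per_ul: float  # concentration in ng/uL
    od260_280: float       # purity ratio OD260/280
    od260_230: float       # purity ratio OD260/230
    total_ug: float        # total RNA yield in ug


def failed_criteria(sample: RnaSample) -> list[str]:
    """Return the labels of the numeric criteria that the sample fails."""
    checks = {
        "RQN >= 7": sample.rqn >= 7,
        "concentration >= 100 ng/uL": sample.conc_ng_per_ul >= 100,
        "OD260/280 in 1.8-2.2": 1.8 <= sample.od260_280 <= 2.2,
        "OD260/230 >= 2": sample.od260_230 >= 2.0,
        "total RNA > 1 ug": sample.total_ug > 1.0,
    }
    return [label for label, passed in checks.items() if not passed]


# Illustrative values only; these are not measurements from the study.
example = RnaSample("sample_1", rqn=8.2, conc_ng_per_ul=150.0,
                    od260_280=2.0, od260_230=2.1, total_ug=3.0)
print(failed_criteria(example))  # an empty list means all numeric criteria pass
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to see why a sample would need re-extraction.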

A Takara RT reagent kit (Takara, Shanghai, China) was used for cDNA and double-stranded cDNA synthesis. RNA-Seq libraries were constructed using a TruSeq RNA sample prep kit v2 (Illumina, San Diego, CA, USA). Sequencing was performed on an Illumina HiSeq 4000 system with an SBS kit (300 cycles) (Shanghai Majorbio Bio-pharm Biotechnology Co., Shanghai, China).

**Figure 2.** Fruit clusters of six varieties during the véraison period (10 wpf and 11 wpf). (**A**) 'Italia' berries at 10 and 11 wpf; (**B**) 'Benitaka' berries at 10 and 11 wpf; (**C**) 'Muscat of Alexandria' berries at 10 and 11 wpf; (**D**) 'Flame Muscat' berries at 10 and 11 wpf; (**E**) 'Rosario Bianco' berries at 10 and 11 wpf; (**F**) 'Rosario Rosso' berries at 10 and 11 wpf.

#### *2.3. Transcriptome Sequencing and Analysis*

SeqPrep (https://github.com/jstjohn/SeqPrep, accessed on 12 February 2022) and Sickle (https://github.com/najoshi/sickle, accessed on 12 February 2022) were used to trim adaptors from the raw reads and to perform quality control, yielding high-quality clean reads. The clean reads were aligned to the reference genome (version 12X.v2; source: https://urgi.versailles.inra.fr/Species/Vitis/Data-Sequences/Genome-sequences, accessed on 14 February 2022) with HISAT2 (http://ccb.jhu.edu/software/hisat2/index.shtml, accessed on 14 February 2022), and the mapped reads of each sample were assembled with StringTie (https://ccb.jhu.edu/software/stringtie/index.shtml?t=example, accessed on 15 February 2022). To identify differentially expressed genes (DEGs) between two samples, the expression level of each transcript was calculated using the transcripts per million reads (TPM) method, and RSEM (http://deweylab.biostat.wisc.edu/rsem/) was used to quantify gene abundances. The DEG analysis was performed using DESeq2, DEGseq, or EdgeR. Genes with |log2FC| > 1 and a Q value (adjusted *p*-value) ≤ 0.05 (DESeq2 or EdgeR) or ≤ 0.001 (DEGseq) were considered significantly differentially expressed. The normalized TPM values were generated and the DEG analysis was performed on the Majorbio cloud platform (Shanghai Majorbio Bio-Pharm Technology Co., Ltd.).
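To make the TPM normalization and the DEG thresholds concrete, the sketch below shows both in plain Python. It is an illustration under stated assumptions, not the Majorbio pipeline: the function names are hypothetical, and the TPM function uses raw counts with plain transcript lengths, whereas RSEM works with expected counts and effective lengths.

```python
def tpm(counts, lengths_bp):
    """Convert read counts to transcripts per million (TPM).

    Simplified textbook form: each count is divided by its transcript
    length to give a rate, and the rates are rescaled to sum to 1e6.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    rates = [count / length for count, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return [rate / total * 1e6 for rate in rates]


def is_deg(log2fc, q_value, q_cutoff=0.05):
    """Apply the thresholds from the text: |log2FC| > 1 and Q <= cutoff.

    Use q_cutoff=0.05 for DESeq2 or EdgeR results and 0.001 for DEGseq.
    """
    return abs(log2fc) > 1 and q_value <= q_cutoff


# Illustrative numbers only; these are not values from the study.
values = tpm(counts=[100, 200, 300], lengths_bp=[1000, 2000, 3000])
print([round(v, 2) for v in values])
print(is_deg(log2fc=-2.0, q_value=0.04))
```

Because TPM values are rescaled to a fixed total within each sample, they are convenient for comparing relative expression across samples, while the DEG tools themselves operate on counts.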

#### *2.4. Statistical Analysis*

A correlation analysis between *VvMYBA1*, *VvMYBA2*, *VvMYB5a*, *VvMYB5b*, *VvMYBPA1*, and the anthocyanin synthesis genes of the grape pericarp was performed using the transcriptome TPM values. Expression levels were considered significantly correlated at |r| > 0.7 and *p* < 0.05. Pearson's correlation coefficient was used to measure the correlation between two random variables. The closer the Pearson coefficient is to 1, the higher the similarity of gene expression between samples and the stronger the correlation between them.
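The correlation criterion (|r| > 0.7 and *p* < 0.05) can be illustrated with a short, dependency-free Python sketch. This is not the SPSS procedure used in the study: the function names are hypothetical, and the two-sided *p*-value is obtained here by numerically integrating the Student *t* density with Simpson's rule, which is adequate for small sample sizes but is only an approximation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=2000):
    """Approximate two-sided p-value for r, using t = r*sqrt(df/(1-r^2))."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # integral of the density over [0, t]
    return max(0.0, 1 - 2 * area)


def significantly_correlated(x, y, r_cutoff=0.7, p_cutoff=0.05):
    """Apply the thresholds from the text: |r| > 0.7 and p < 0.05."""
    r = pearson_r(x, y)
    p = two_sided_p(r, len(x))
    return abs(r) > r_cutoff and p < p_cutoff


# Illustrative TPM-like values only; these are not data from the study.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
print(significantly_correlated(gene_a, gene_b))
```

Requiring both thresholds matters with few samples: a pair can exceed |r| > 0.7 and still fail *p* < 0.05 when the number of samples is small.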

SPSS v26.0 (Chicago, IL, USA) was used for the significance and correlation analyses of two data sets: the *MYB*-related regulatory genes versus the anthocyanin synthesis structural genes, and the anthocyanin synthesis structural genes versus the *VvMYBA1* and *VvMYBPA1* regulatory genes in the two bud sport groups.

#### **3. Results**

#### *3.1. Quality Control Data Statistics*

The total number of raw sequencing reads per sample ranged from 41,748,704 to 48,476,130. After the low-quality reads were removed, the average base error rate of the clean reads was less than 0.026%. Across all the sequence data, the Q20 percentage exceeded 97.74% and the Q30 percentage exceeded 93.32%. The GC content ranged from 45.96% to 47.01% of the total bases. The alignment rates of the clean reads to the reference genome ranged from 78.27% to 93.11% (Table 1).


**Table 1.** RNA-Seq data quality of all 12 varieties.

(1) Raw reads: the total number of reads in the raw sequencing data; (3) clean reads: the total number of reads remaining after quality filtering; (4) error rate (%): the average base error rate of the quality-filtered data, usually below 0.1%; (5) Q20 (%) and Q30 (%): base quality assessment parameters, referring to the percentage of total bases with a base-call accuracy above 99% and 99.9%, respectively; Q20 is usually above 85% and Q30 above 80%; (6) GC content (%): the G and C bases in the quality-filtered data as a percentage of the total bases; (7) total mapped: the number of clean reads that can be mapped to the genome.
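The quality metrics defined in the note above follow from the Phred scale, where a score Q corresponds to a base-calling error probability of 10^(−Q/10), so Q20 and Q30 mean 99% and 99.9% accuracy. The Python sketch below computes them from reads. It is illustrative only: the function names are hypothetical, the reads are invented, and the quality strings are assumed to use the Phred+33 encoding.

```python
def phred_to_error_prob(q):
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_quality_stats(reads):
    """Summarize (sequence, quality_string) pairs.

    Assumes Phred+33 quality encoding. Returns percentages of bases at
    Q20 and Q30 or better, GC content, and the mean per-base error rate.
    """
    total = q20 = q30 = gc = 0
    error_sum = 0.0
    for sequence, quality in reads:
        for base, symbol in zip(sequence.upper(), quality):
            score = ord(symbol) - 33  # decode the Phred+33 character
            total += 1
            q20 += score >= 20
            q30 += score >= 30
            gc += base in "GC"
            error_sum += phred_to_error_prob(score)
    if total == 0:
        raise ValueError("no bases to summarize")
    return {
        "q20_pct": 100 * q20 / total,
        "q30_pct": 100 * q30 / total,
        "gc_pct": 100 * gc / total,
        "error_rate_pct": 100 * error_sum / total,
    }


# Invented read: 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10.
stats = read_quality_stats([("GCAT", "I5+I")])
print(stats)
```

Averaging the per-base error probabilities, as done here, is one common way to report an error rate; the exact definition used by a given sequencing platform's report may differ.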
