*3.6. Selection of Transcripts Related to Disease Resistance through DEG Analysis and GO Annotation*

A DEG analysis between the resistant and susceptible lines was performed to develop molecular markers related to disease resistance. From the sequences generated by Trinity de novo assembly, TransDecoder was used to select 109,521 sequences with a length of 100 or more amino acids. Transcript expression levels were quantified as read counts using the StringTie program; the overall average mapping rate was 85%, and read counts were obtained for a total of 87,427 transcripts. Based on the transcript read counts of each line, the onion lines were compared using DESeq2. First, the raw read count data were normalized using size factors and dispersion estimates. A correlation analysis between the lines was then performed on the normalized values. Next, the DEGs between the resistant and susceptible groups were analyzed from the RNA-seq results. Using the DEGseq R package, it was confirmed that the difference between the resistant and susceptible groups was statistically significant. Significantly expressed genes were verified using the MA plot (Figure S3), in which transcripts with a false discovery rate (FDR) of less than 0.05 are shown in red. The significantly expressed genes identified in this way were judged sufficient for the subsequent resistance-related gene analysis. In addition, a heatmap was constructed from z-scores to compare expression in each line across the group of significantly expressed genes (Figure 4). Lines and genes in the heatmap were hierarchically clustered using the average linkage method, and the Pearson correlation coefficients for each line and gene were then compared. A volcano plot of the DEGs was also obtained (Figure S4). In the volcano plot, transcripts are colored according to the following conditions: red, FDR < 0.01 and |log2 fold change| ≤ 2; green, FDR < 0.01 and |log2 fold change| > 2; and orange, FDR ≥ 0.01 and |log2 fold change| > 2. These visualized DEG results confirmed that there were transcripts with significantly different expression between the gray mold-resistant and -susceptible groups.
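The color rules stated for the volcano plot can be written as a small classifier. The following is a minimal sketch, not the plotting code used in this study: the function name and the `gray` label for transcripts that meet neither criterion are assumptions, since the text does not name a color for that class.

```python
# Thresholds taken from the volcano plot description in the text.
FDR_CUTOFF = 0.01
LOG2FC_CUTOFF = 2.0


def volcano_color(fdr: float, log2fc: float) -> str:
    """Return the color class of one transcript in the volcano plot.

    'gray' is a hypothetical label for transcripts that are neither
    significant nor strongly changed; the text does not specify it.
    """
    significant = fdr < FDR_CUTOFF
    large_change = abs(log2fc) > LOG2FC_CUTOFF
    if significant and large_change:
        return "green"   # FDR < 0.01 and |log2 fold change| > 2
    if significant:
        return "red"     # FDR < 0.01 and |log2 fold change| <= 2
    if large_change:
        return "orange"  # FDR >= 0.01 and |log2 fold change| > 2
    return "gray"


# Illustrative (made-up) values, one per class.
examples = [(0.001, 3.5), (0.001, -2.0), (0.2, -4.0), (0.2, 0.5)]
colors = [volcano_color(fdr, lfc) for fdr, lfc in examples]
```

Under these rules, only the green class satisfies both the significance and the fold-change criterion at the same time.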

**Figure 4.** Heatmap constructed from z-scores showing the differences in expression among the lines for the group of significantly expressed genes.
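The heatmap preparation (row-wise z-scores, Pearson correlation, average-linkage hierarchical clustering) can be illustrated with a small pure-Python sketch. It is not the software used to draw Figure 4. It assumes a distance of 1 − r between profiles and a population standard deviation for the z-score, neither of which is stated in the text, and the expression values are made up.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Scale each gene (row) to mean 0 and unit standard deviation across lines."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A gene with constant expression has no pattern; map it to zeros.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def pearson(a, b):
    """Pearson correlation coefficient (undefined if a vector is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def average_linkage(vectors):
    """Agglomerative clustering with distance 1 - r (assumed) and average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of original row indices.
    """
    def dist(i, j):
        return 1.0 - pearson(vectors[i], vectors[j])

    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                # Average linkage: mean of all pairwise member distances.
                d = sum(dist(i, j) for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges


# Toy data: rows are genes, columns are lines (values are illustrative).
expr = [
    [10.0, 12.0, 2.0, 1.0],  # higher in the first two lines
    [20.0, 22.0, 5.0, 4.0],  # same pattern as the first gene
    [1.0, 2.0, 9.0, 11.0],   # opposite pattern
]
z = zscore_rows(expr)
gene_merges = average_linkage(z)
```

In this toy input the two genes with the same pattern are merged first, and the gene with the opposite pattern joins last. Clustering the lines instead of the genes would apply the same function to the transposed matrix.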

The differences between the resistant and susceptible lines of 'Seeds & People' Co. were statistically significant, which confirmed that an analysis of resistance-related mechanisms was feasible. Among the 87,427 transcripts obtained, those with significantly different expression levels between the resistant and susceptible lines were selected using the following conditions: adjusted p-value (PADJ) < 0.05 and |log2 fold change| ≥ 2. A total of 1636 transcripts were selected, of which 320 showed higher expression in the resistant group and 1316 showed higher expression in the susceptible group. Among the 320 transcripts with higher expression in the resistant group, only 182 matched a TAIR ID, while 138 did not, suggesting that the latter may be unknown onion genes. Similarly, among the 1316 transcripts with higher expression in the susceptible group, only 897 matched a TAIR ID, while 419 did not. The matched transcripts were analyzed by GO annotation of cellular components, molecular functions, and biological processes using their TAIR IDs.
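The selection criteria above (PADJ < 0.05 and |log2 fold change| ≥ 2), followed by the split by direction and by TAIR match, can be sketched as follows. This is an illustrative outline under stated assumptions rather than the pipeline used here: the `Transcript` record, the function names, the sign convention (positive log2 fold change meaning higher expression in the resistant group), and all names and values in the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    name: str
    log2fc: float            # assumed: positive = higher in the resistant group
    padj: Optional[float]    # adjusted p-value; DESeq2 can report NA for some transcripts
    tair_id: Optional[str]   # matched TAIR identifier, or None if unmatched


def select_degs(transcripts, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Keep transcripts with padj < cutoff and |log2FC| >= cutoff, split by direction."""
    up, down = [], []
    for t in transcripts:
        if t.padj is None or t.padj >= padj_cutoff or abs(t.log2fc) < lfc_cutoff:
            continue
        (up if t.log2fc > 0 else down).append(t)
    return up, down


def split_by_annotation(group):
    """Separate transcripts that matched a TAIR ID from those that did not."""
    matched = [t for t in group if t.tair_id]
    unmatched = [t for t in group if not t.tair_id]
    return matched, unmatched


# Toy data with placeholder names and identifiers.
transcripts = [
    Transcript("tx1", 3.1, 0.001, "TAIR_A"),
    Transcript("tx2", 2.0, 0.04, None),
    Transcript("tx3", -2.7, 0.01, "TAIR_B"),
    Transcript("tx4", -5.0, 0.20, "TAIR_C"),  # large change, not significant
    Transcript("tx5", 1.2, 0.0001, None),     # significant, small change
    Transcript("tx6", -3.3, None, None),      # no adjusted p-value
]
up, down = select_degs(transcripts)
up_matched, up_unmatched = split_by_annotation(up)
```

Applied to the real table, the same two steps would correspond to the counts reported in the text: 1636 selected transcripts split into 320 and 1316, and the 320 split into 182 matched and 138 unmatched.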

To select genes whose increased expression is related to resistance, the 182 TAIR-matched transcripts with increased expression levels were examined by GO annotation, which identified 22 transcripts related to 'response to stress' and seven transcripts related to 'response to biotic stimuli' (Figure S5). In total, 29 transcripts were confirmed to be related to disease resistance, and variants were also observed between the resistant and susceptible groups. In addition, the gene functions of the identified transcripts were analyzed using TAIR ID.
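Selecting transcripts by GO category, as described above, amounts to a set lookup. The sketch below is only an illustration: the category labels follow the wording of the text, while the function name, transcript names, and annotations are hypothetical.

```python
# Category labels as worded in the text.
TARGET_CATEGORIES = ("response to stress", "response to biotic stimuli")


def select_by_category(annotations, targets=TARGET_CATEGORIES):
    """Map each target category to the set of transcripts annotated with it."""
    hits = {target: set() for target in targets}
    for transcript, categories in annotations.items():
        for target in targets:
            if target in categories:
                hits[target].add(transcript)
    return hits


# Hypothetical transcript names and GO categories.
annotations = {
    "tx_a": {"response to stress", "transport"},
    "tx_b": {"response to biotic stimuli"},
    "tx_c": {"response to stress", "response to biotic stimuli"},
    "tx_d": {"developmental processes"},
}
hits = select_by_category(annotations)
selected = set().union(*hits.values())
```

Taking the union counts a transcript annotated with both categories only once. The reported total of 29 equals 22 + 7, which is consistent with the two sets not overlapping in this dataset.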
