#### *2.6. Analysis of Public Microarray Data*

We downloaded gene expression profile data (series accession numbers GSE146446 [26] and GSE45468 [27]) from the Gene Expression Omnibus database using the Biobase and GEOquery packages in R. Both datasets were generated on the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array; Affymetrix, Santa Clara, CA, USA). We identified the Affymetrix probe IDs for each gene of interest by searching for the gene name, and the expression values of the matched probes were used for subsequent statistical analysis.

#### *2.7. Batch Mean-Centering Correction, Missing Data Imputation, and Normalization*

Samples were assigned to three batches based on sample preparation date [28]: batch 1 comprised S15 (non-responder) and S29 (responder); batch 2 comprised S54 (non-responder) and S52 (responder); and batch 3 comprised S6 (non-responder), S11 (non-responder), S32 (responder), S34 (non-responder), S38 (responder), and S46 (responder). Mean-centering correction per protein was applied to the raw data from 104 LC-MS/MS analyses to reduce batch effects [29,30].
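As an illustration of per-protein batch mean-centering, the following minimal sketch subtracts each batch's mean from the values of one protein. It is a sketch under stated assumptions, not the exact procedure of [29,30]: the function name `batch_mean_center`, the toy values, and the choice to center on the batch mean without adding back a global mean are all assumptions made for the example.

```python
from statistics import mean

def batch_mean_center(values, batches):
    """Subtract the batch mean from each value of a single protein.

    values  -- abundances of one protein, one entry per LC-MS/MS run
    batches -- batch label of each run, same length as values
    """
    grouped = {}
    for value, batch in zip(values, batches):
        grouped.setdefault(batch, []).append(value)
    # One center per batch, computed for this protein only.
    centers = {batch: mean(vals) for batch, vals in grouped.items()}
    return [value - centers[batch] for value, batch in zip(values, batches)]

# Toy data (hypothetical): one protein, six runs, two batches.
abundance = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
batch = [1, 1, 1, 2, 2, 2]
centered = batch_mean_center(abundance, batch)
```

After centering, both batches in the toy data have mean zero, so the offset between batch 1 and batch 2 no longer contributes to the variance of this protein.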

Missing data were then imputed. Of the 316 quantified proteins measured at one time in each individual sample, 180 were completely quantified, whereas missing values for the remaining 136 proteins were estimated by a local least-squares imputation method [31]. In this method, the 180 completely quantified proteins were clustered into 15 groups by Pearson's correlation analysis, and missing values were estimated by an optimal linear combination of the 15 selected clusters.
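The core idea of local least-squares imputation can be sketched as follows. This is a simplified illustration, not the implementation of [31]: it ranks completely quantified proteins directly by absolute Pearson correlation with the target protein instead of clustering them first, fits a no-intercept least-squares model through the normal equations, and uses hypothetical names (`pearson`, `solve`, `lls_impute`) and toy data with `k = 2` rather than 15.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lls_impute(target, complete, k):
    """Fill None entries of target using the k most correlated complete proteins."""
    obs = [i for i, v in enumerate(target) if v is not None]
    mis = [i for i, v in enumerate(target) if v is None]
    t_obs = [target[i] for i in obs]
    # Rank complete proteins by |r| with the target over the observed samples.
    ranked = sorted(complete, key=lambda row: -abs(pearson([row[i] for i in obs], t_obs)))
    neighbours = ranked[:k]
    # Normal equations (A^T A) w = A^T y over the observed samples.
    ata = [[sum(p[i] * q[i] for i in obs) for q in neighbours] for p in neighbours]
    aty = [sum(p[i] * target[i] for i in obs) for p in neighbours]
    w = solve(ata, aty)
    filled = list(target)
    for i in mis:
        filled[i] = sum(wj * p[i] for wj, p in zip(w, neighbours))
    return filled

# Toy data (hypothetical): the target equals p1 + p2; its last value is missing.
p1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
p3 = [3.0, 5.0, 1.0, 4.0, 2.0, 6.0]
target = [3.0, 3.0, 7.0, 7.0, 11.0, None]
imputed = lls_impute(target, [p1, p2, p3], k=2)
```

In the toy data the two best-correlated proteins reproduce the target exactly, so the missing entry is recovered from their values in the same sample. With real data the fit is approximate, and the number of neighbours must not exceed what the observed samples can support, otherwise the normal equations become singular.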

These data were normalized relative to endogenous normalizing proteins without spike-in standards [32]. From the complete data, six of 210 proteins were finally selected as suitable for LFQ normalization based on the following criteria: (1) their plasma concentrations remained nearly constant in all samples, as determined by their NormFinder stability value [33]; (2) their plasma concentrations did not differ significantly between the five responders and five non-responders, as shown by LMM analysis (*p*-value > 0.05); and (3) there were no reports of their association with depression. The raw abundance of each of the six selected normalizing proteins (BTD, C8B, C1S, ITIH2, IGFALS, and SERPINA3) in each sample was divided by the median of its raw abundances in all samples. The geometric mean of these six ratios in a sample was defined as the normalization scaling factor (NSF) for that sample. The NSF for sample *s* can be calculated using the following equation:

$$NSF_s = \text{geometric mean}\left(\frac{N_{1,s}}{\hat{N}_1}, \frac{N_{2,s}}{\hat{N}_2}, \dots, \frac{N_{6,s}}{\hat{N}_6}\right)$$

where $N_{i,s}$ is the raw protein abundance of normalizing protein *i* in sample *s*, and $\hat{N}_i$ is the median abundance of protein *i* in all the samples. The normalized abundance of each biomarker candidate in a sample was calculated by dividing its raw abundance by the NSF:

$$\breve{PA}_{j,s} = \frac{PA_{j,s}}{NSF_s}$$

where $\breve{PA}_{j,s}$ is the normalized abundance of the *j*-th biomarker candidate in sample *s*, and $PA_{j,s}$ is the raw abundance of the corresponding protein.
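The two normalization equations can be illustrated with a short sketch. It follows the equations as written, taking the reference abundance of each normalizing protein to be its median across samples and the NSF to be the geometric mean of the per-protein ratios. The function names `scaling_factors` and `normalize`, the protein labels, and the toy abundances (two normalizing proteins instead of six) are hypothetical.

```python
from statistics import geometric_mean, median

def scaling_factors(norm_abundance):
    """Return the NSF of each sample.

    norm_abundance -- dict mapping each normalizing protein to its list of
                      raw abundances, one entry per sample
    """
    # Reference abundance of each normalizing protein: median across samples.
    reference = {p: median(vals) for p, vals in norm_abundance.items()}
    n_samples = len(next(iter(norm_abundance.values())))
    return [
        geometric_mean([norm_abundance[p][s] / reference[p] for p in norm_abundance])
        for s in range(n_samples)
    ]

def normalize(raw, nsf):
    """Divide the raw abundance of one candidate by the NSF of each sample."""
    return [x / f for x, f in zip(raw, nsf)]

# Toy data (hypothetical): two normalizing proteins, three samples.
normalizers = {
    "protein_A": [100.0, 200.0, 400.0],
    "protein_B": [10.0, 20.0, 40.0],
}
nsf = scaling_factors(normalizers)
candidate_raw = [50.0, 100.0, 200.0]
candidate_norm = normalize(candidate_raw, nsf)
```

In this toy case the candidate protein scales with the normalizing proteins across samples, so dividing by the NSF removes the sample-to-sample loading difference and yields the same normalized abundance in every sample.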
