*2.14. Statistical Analysis of Expression Data*

Data preprocessing and all subsequent analyses were performed in the statistical programming language R, version 3.5.0 (R Development Core Team 2018). Raw microarray data (CEL files) were normalized with the robust multi-array average (RMA) method as implemented in the R package oligo; normalization was performed on the full set of 18 CEL files. Differentially expressed genes were identified with the R package limma [36], and *p*-values were adjusted for multiple testing with the method of Benjamini and Hochberg (FDR, false discovery rate) [37]. A gene was called differentially expressed if its adjusted *p*-value was <0.05 and its log2 fold change was <−1.5 (downregulated) or >1.5 (upregulated). A volcano plot was generated with the log2 fold change on the x-axis and statistical significance (−log10 of the FDR-adjusted *p*-value) on the y-axis. Heatmaps were used to visualize z-scores (expression values standardized per gene to mean 0 and standard deviation 1), with genes and experiments each ordered by average-linkage hierarchical clustering. Gene ontology enrichment analysis was performed on probe set IDs with the topGO package [38], using Fisher's exact test and the elim method. Only results from the biological process ontology were considered, and the cutoff for the enrichment *p*-value was set to 0.05.
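The decision rule and the plotted quantities described above can be illustrated with a minimal sketch. It is written in Python with NumPy for illustration only and is not the R code used in this study. The function names (`bh_adjust`, `call_genes`, `zscore_rows`) and the five example genes are hypothetical, and the use of the sample standard deviation for the z-scores is an assumption.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values for a 1-D array of raw p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest rank downwards, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # restore the original gene order
    return adjusted

def call_genes(log2fc, adj_p, lfc_cutoff=1.5, alpha=0.05):
    """Label genes 'up', 'down' or 'unchanged' using the thresholds in the text."""
    log2fc = np.asarray(log2fc, dtype=float)
    adj_p = np.asarray(adj_p, dtype=float)
    calls = np.full(log2fc.shape, "unchanged", dtype=object)
    significant = adj_p < alpha
    calls[significant & (log2fc > lfc_cutoff)] = "up"
    calls[significant & (log2fc < -lfc_cutoff)] = "down"
    return calls

def zscore_rows(expr):
    """Standardize each row (gene) to mean 0 and standard deviation 1.

    Assumes the sample standard deviation (ddof=1) and no constant rows.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    return (expr - mean) / sd

# Hypothetical values for five genes (not data from this study).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
log2fc = [2.1, -1.8, 3.0, -0.4, 0.2]

adj = bh_adjust(raw_p)
calls = call_genes(log2fc, adj)
volcano_y = -np.log10(adj)  # y-axis of the volcano plot; x-axis is log2fc

print(list(calls))  # ['up', 'down', 'unchanged', 'unchanged', 'unchanged']
```

In this example the third gene has a large fold change and a raw *p*-value below 0.05, but its adjusted *p*-value exceeds 0.05, so it is not called differentially expressed.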
