4.2.4. Multivariate Data Analysis

The integral of each included bin was normalized to the summed integral of all included bins. The normalized data were then used for multivariate data analysis in SIMCA-P<sup>+</sup> (V11.0 and 13.0, Umetrics AB, Umeå, Sweden). First, to obtain an overview of the data distribution and to detect possible outliers, Principal Component Analysis (PCA) was performed on mean-centered data using two principal components. Then, Projection to Latent Structures Discriminant Analysis (PLS-DA) and Orthogonal Projection to Latent Structures Discriminant Analysis (OPLS-DA) were performed on unit-variance scaled data, using the grouping information as the Y-matrix [59]. Two PLS components were calculated for the PLS-DA models, while one PLS component and one orthogonal component were used for the OPLS-DA models. Both supervised models were validated using 7-fold cross-validation [59]. Model quality was further assessed with a permutation test (200 permutations) for the PLS-DA models [60] and with ANOVA of the cross-validated residuals (CV-ANOVA) for the OPLS-DA models [61].
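The preprocessing described above (normalization of each bin to the total integral, followed by mean-centering for PCA or unit-variance scaling for PLS-DA and OPLS-DA) can be illustrated with a minimal Python sketch. This is not the SIMCA-P<sup>+</sup> implementation; the function names and the bin integrals are hypothetical and serve only to show the arithmetic.

```python
import math

def normalize_to_total(integrals):
    """Divide each bin integral by the summed integral of all included bins."""
    total = sum(integrals)
    if total == 0:
        raise ValueError("spectrum has zero total integral")
    return [v / total for v in integrals]

def scale_columns(rows, unit_variance=False):
    """Mean-center each variable (column); optionally divide by its sample SD."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        centered = [v - mean for v in col]
        if unit_variance:
            sd = math.sqrt(sum(c * c for c in centered) / (n - 1))
            centered = [c / sd if sd > 0 else 0.0 for c in centered]
        scaled_cols.append(centered)
    # Transpose back to one row per spectrum.
    return [list(r) for r in zip(*scaled_cols)]

# Hypothetical bin integrals: 3 spectra x 4 bins (illustrative values only).
spectra = [
    [2.0, 1.0, 4.0, 3.0],
    [1.0, 1.0, 5.0, 3.0],
    [3.0, 2.0, 3.0, 2.0],
]
normalized = [normalize_to_total(s) for s in spectra]
centered = scale_columns(normalized)                        # mean-centered, as for PCA
uv_scaled = scale_columns(normalized, unit_variance=True)   # unit-variance, as for PLS-DA/OPLS-DA
```

After normalization every spectrum sums to 1, so differences in overall signal intensity between samples no longer dominate the models; unit-variance scaling then gives each bin equal weight regardless of its absolute intensity.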

The OPLS-DA models were interpreted using back-transformed loadings plots color-coded by correlation coefficient [62] (MATLAB 7.0, The MathWorks Inc., Natick, MA, USA), in which the color indicates the significance of each differentiating metabolite, with warm colors (e.g., red) denoting greater significance than cool colors (e.g., blue). The cutoffs for the correlation coefficients were chosen on the basis of discrimination significance (*p* < 0.05); e.g., a cutoff value (|r|) of 0.602 corresponded to a sample number (n) of 10. Differentiating metabolites were also summarized in a heat map color-coded with the Pearson correlation coefficients from the OPLS-DA models (MeV version 4.9.0).
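The link between a significance level and a correlation cutoff can be made explicit with the standard relation r = t / sqrt(t² + df), where t is the two-tailed critical value of Student's t distribution. The sketch below is an illustration only and is not the authors' MATLAB script: the function names are hypothetical, the t quantile is obtained by simple numerical integration and bisection, and the degrees of freedom are left as an input because the text does not state how they were derived from n. With df = 9 the sketch returns approximately 0.602.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus 0.5 (symmetry about 0)."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def critical_r(df, alpha=0.05):
    """Smallest |r| that is significant at two-tailed level alpha for df."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection for the t quantile
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    t_crit = (lo + hi) / 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# df = 9 is an assumption chosen to illustrate the quoted cutoff.
cutoff = critical_r(9)
print(round(cutoff, 3))
```

Metabolites whose absolute correlation coefficient exceeds the cutoff would then be treated as significantly differentiating at the chosen level.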
