**2. Results**

#### *2.1. Analysis of Experimental Metadata*

Descriptive statistics of the datasets used in this experiment are shown in Table 1. BHBA concentrations were significantly higher in Dataset 1 than in Dataset 2 (*p* < 0.001). The differences in all other parameters were not statistically significant (*p* > 0.05). The correlation between BHBA and NEFA concentrations was 0.45 in Dataset 1 and 0.40 in Dataset 2.


**Table 1.** Descriptive statistics of the datasets used in this experiment, including number of animals (N), stage of lactation defined as days in milk (DIM), age in years, and β-hydroxybutyrate (BHBA) and non-esterified fatty acid (NEFA) concentrations (mmol/L) in the serum obtained from clinically healthy dairy cows.

1 Statistical significance of the differences between Datasets 1 and 2 was determined using a paired *t*-test for DIM and a paired Wilcoxon signed-rank test for age, BHBA and NEFA.

#### *2.2. 1H NMR Spectra*

Twenty-four metabolites could be clearly identified from the 1H NMR spectra. Two metabolites, cholate and 3-phenyllactate, were tentatively identified. Figure 1 shows representative spectra from animals in Dataset 1 with (a) elevated BHBA concentration, (b) elevated NEFA concentration and (c) normal BHBA and NEFA concentrations. Upfield regions of the spectra were dominated by branched-chain amino acids (leucine, isoleucine and valine), organic acids (BHBA, lactate, acetate) and the methyl and methylene groups of low-density (LDL) and very low-density (VLDL) lipoproteins at δ 0.86 ppm and δ 1.25 ppm, respectively [32]. We also observed a prominent peak at δ 2.03 ppm, which was consistent with the N-acetyl groups of glycoproteins [33]. The singlet at δ 3.14 ppm was identified as dimethyl sulfone (DMSO2) [34,35]. The middle of the spectrum was complex and dominated by glucose. Signal overlap and weak 2D signal strength meant that hippurate was the only compound that could be clearly identified in the downfield region. Relative chemical shifts and the multiplicity of identified peaks are available in the supplementary material (Table S1).

Unsupervised analysis of the data using PCA showed no obvious clustering of samples by dataset. Results of ANOVA-simultaneous component analysis showed that fixed effects (cow age, herd of origin and days in milk (DIM)) explained only 13.94% of the spectral variation (Table S2). Only the effect of age was statistically significant (*p* < 0.05). This suggests that most spectral variation is due to differences between individual animals.
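As a rough illustration of what "percentage of spectral variation explained" means, the sketch below computes, for a single factor, the share of the total mean-centred sum of squares that is captured by the factor-level means. This is the variance partition that ANOVA-simultaneous component analysis builds on, reduced here to one factor. The function name, the toy spectra and the group labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def effect_variation(spectra, levels):
    """Fraction of the mean-centred sum of squares explained by one factor.

    spectra: list of rows (one per sample), each a list of intensities.
    levels:  factor level of each sample (e.g. an age class), same length.
    """
    n_samples = len(spectra)
    n_vars = len(spectra[0])

    # Centre every spectral variable on its overall mean.
    grand_mean = [sum(row[j] for row in spectra) / n_samples for j in range(n_vars)]
    centred = [[row[j] - grand_mean[j] for j in range(n_vars)] for row in spectra]
    total_ss = sum(v * v for row in centred for v in row)

    # Group the centred rows by factor level.
    groups = defaultdict(list)
    for row, level in zip(centred, levels):
        groups[level].append(row)

    # The effect matrix repeats each level mean once per sample in that level,
    # so its sum of squares is n_level * ||level mean||^2 summed over levels.
    effect_ss = 0.0
    for rows in groups.values():
        level_mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]
        effect_ss += len(rows) * sum(m * m for m in level_mean)

    return effect_ss / total_ss


# Hypothetical toy data: four samples, two spectral variables, two age classes.
toy_spectra = [[1.0, 0.2], [1.2, 0.1], [2.9, 0.3], [3.1, 0.2]]
toy_age = ["young", "young", "old", "old"]
share = effect_variation(toy_spectra, toy_age)
```

A value of `share` close to 1 means that the factor accounts for nearly all variation, while a small value leaves most of the variation in the residual, that is, between individual samples.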

**Figure 1.** Representative 700 MHz 1H nuclear magnetic resonance spectra of serum samples from early lactation dairy cows with (**a**) elevated β-hydroxybutyrate (BHBA), (**b**) elevated non-esterified fatty acid (NEFA), and (**c**) normal BHBA and NEFA concentrations. Downfield regions were vertically expanded 32 times for clarity. Legend: 1, cholate; 2, very low density lipoprotein/low density lipoprotein; 3, leucine; 4, isoleucine; 5, valine; 6, β-hydroxybutyrate; 7, lactate; 8, alanine; 9, acetate; 10, N-acetyl glycoprotein; 11, pyruvate; 12, citrate; 13, creatine; 14, creatine phosphate; 15, dimethyl sulfone (DMSO2); 16, choline; 17, phosphocholine; 18, betaine; 19, methanol; 20, glucose; 21, glycine; 22, β-Glu; 23, α-Glu; 24, 3-phenyllactate; 25, hippurate; 26, formate. \* = tentative identification.

#### *2.3. Accuracy and Robustness of Prediction Models*

The robustness of the orthogonal partial least squares (OPLS) regression models built using data from Dataset 1 was assessed using (1) 10-fold cross-validation (Figure 2a,c) and (2) external validation with data from Dataset 2 (Figure 2b,d). Prediction accuracies derived from external validation were high for BHBA (R<sup>2</sup> = 0.88), and moderately high for NEFA (R<sup>2</sup> = 0.75). BHBA models were remarkably robust, with external validation R<sup>2</sup> and RMSE results almost identical to cross-validation results. Models predicting serum NEFA concentration were less accurate than those predicting BHBA (NRMSE 0.50 and 0.32, respectively), but external validation results indicated that these models were still quite robust. *p*-values derived from permutation testing were < 0.001 for all models, indicating that models were not over-fitted.
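For readers less familiar with these accuracy metrics, the sketch below shows one common way to compute R<sup>2</sup>, RMSE and NRMSE from measured and predicted concentrations. Two points are assumptions rather than details confirmed by this section: R<sup>2</sup> is taken as the coefficient of determination, and NRMSE is taken as RMSE divided by the standard deviation of the measured values (other normalisations, such as the mean or the range, are also in use). The example values are invented.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean squared error, in the units of the measured values."""
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))
    return math.sqrt(ss_res / len(measured))


def nrmse(measured, predicted):
    """RMSE divided by the population SD of the measured values (assumed choice)."""
    mean_y = sum(measured) / len(measured)
    sd = math.sqrt(sum((y - mean_y) ** 2 for y in measured) / len(measured))
    return rmse(measured, predicted) / sd


# Invented example values (mmol/L), for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(measured, predicted)
error = rmse(measured, predicted)
normalised_error = nrmse(measured, predicted)
```

Under these two assumptions the metrics are linked by NRMSE<sup>2</sup> = 1 − R<sup>2</sup>, so a model with a lower R<sup>2</sup> necessarily has a higher NRMSE on the same data.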

**Figure 2.** Accuracy of orthogonal partial least squares (OPLS) regression models predicting serum β-hydroxybutyrate (BHBA) and non-esterified fatty acid (NEFA) concentrations from 1H NMR spectra, built using data from Dataset 1 (*N* = 248); (**a**) 10-fold cross-validation (CV)-predicted BHBA vs. measured BHBA; (**b**) external validation (*N* = 50)-predicted BHBA vs. measured BHBA; (**c**) CV-predicted NEFA vs. measured NEFA; (**d**) external validation-predicted NEFA vs. measured NEFA.

#### *2.4. Metabolomic Fingerprints of BHBA and NEFA*

The metabolomic fingerprints associated with BHBA and NEFA were investigated using OPLS regression. Larger scores on the first latent variable (LV1) correspond to higher concentrations of both BHBA and NEFA (Figure 3a,b). LV1 loadings plots were used to identify which spectral features contributed most to the variation in the reference biomarker concentrations [36] (Figure 3c,d). Spectral features with positive loadings correspond to metabolites that are positively correlated with reference biomarker concentrations, and vice versa. Peaks with a variable importance in projection (VIP) score greater than one were considered statistically significant [37] (Figure S2).
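The threshold of one follows from how VIP scores are scaled. The sketch below implements the widely used PLS definition of VIP, in which each variable's squared, normalised weight on a component is weighted by the response sum of squares that the component explains. It is offered as an illustration only: OPLS software may use a variant of this formula, and the function name and input values here are hypothetical.

```python
import math


def vip_scores(weights, ssy):
    """VIP score of every variable, using the standard PLS definition.

    weights: one list per latent variable, holding the weight of each
             spectral variable on that latent variable.
    ssy:     response sum of squares explained by each latent variable.
    """
    n_vars = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)      # squared length of the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq  # contribution of this latent variable
        scores.append(math.sqrt(n_vars * acc / total_ssy))
    return scores


# Hypothetical weights for three spectral variables on two latent variables.
toy_weights = [[0.8, 0.1, 0.1], [0.2, 0.9, 0.3]]
toy_ssy = [3.0, 1.0]
toy_vip = vip_scores(toy_weights, toy_ssy)
important = [j for j, v in enumerate(toy_vip) if v > 1.0]
```

With this definition the mean of the squared VIP scores is exactly one, so a score above one marks a variable that contributes more than the average variable to the model.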

**Figure 3.** Results of the orthogonal partial least squares (OPLS) regression models predicting serum BHBA and NEFA concentrations from 1H NMR spectra; (**a**) First latent variable (LV1) vs. second latent variable (LV2) scores for the BHBA prediction model; (**b**) LV1 vs. LV2 scores for the NEFA prediction model; (**c**) LV1 loadings for the BHBA prediction model; (**d**) LV1 loadings for the NEFA prediction model. Scores plots are color-coded by reference biomarker concentration, and loadings plots by VIP score. α-Glu = α-glucose, β-Glu = β-glucose, Ace = acetate, Ala = alanine, Bet = betaine, BHBA = β-hydroxybutyrate, Cr = creatine, DMSO2 = dimethyl sulfone, Glu = glucose, Gly = glycine, Ile = isoleucine, Lac = lactate, Leu = leucine, NAG = N-acetyl glycoprotein, ChoP = phosphocholine, Pyr = pyruvate, Val = valine, LDL = low density lipoprotein, VLDL = very low density lipoprotein.

#### 2.4.1. Commonalities in the Metabolomic Fingerprints of BHBA and NEFA

Several metabolites showed similar covariance with both BHBA and NEFA concentrations. The largest effect we observed was from peaks assigned to glucose, which were negatively correlated with both biomarkers. Other metabolites with common covariance included lactate, valine and alanine (negatively correlated), and glycine and phosphocholine (positively correlated). Spectral regions attributed to lipoproteins (LDL and VLDL) and glycoproteins were positively correlated with both BHBA and NEFA concentrations.

#### 2.4.2. Differences between the Metabolomic Fingerprints of BHBA and NEFA

Figure 4 highlights the differences we observed between the metabolomic fingerprints of BHBA and NEFA. Acetate and creatine were positively correlated with BHBA, and negatively correlated with NEFA. A small number of metabolites showed significant co-variance with only one of the biomarkers. BHBA concentration was positively correlated with betaine, and negatively correlated with dimethyl sulfone (DMSO2), while NEFA concentration was positively correlated with isoleucine and negatively correlated with leucine.
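The comparison of the two fingerprints can be summarised as a small bookkeeping exercise. The sketch below encodes only the direction of each association reported in Sections 2.4.1 and 2.4.2 (+1 for positive, -1 for negative) and sorts the metabolites into shared, opposite and biomarker-specific groups. The sign encoding, the dictionary names and the function are illustrative devices, not the loadings themselves or the authors' procedure.

```python
# Direction of association with each biomarker, as reported in the text.
BHBA_SIGNS = {
    "glucose": -1, "lactate": -1, "valine": -1, "alanine": -1,
    "glycine": 1, "phosphocholine": 1, "lipoproteins": 1, "glycoproteins": 1,
    "acetate": 1, "creatine": 1, "betaine": 1, "dimethyl sulfone": -1,
}
NEFA_SIGNS = {
    "glucose": -1, "lactate": -1, "valine": -1, "alanine": -1,
    "glycine": 1, "phosphocholine": 1, "lipoproteins": 1, "glycoproteins": 1,
    "acetate": -1, "creatine": -1, "isoleucine": 1, "leucine": -1,
}


def compare_fingerprints(first, second):
    """Group metabolites by how their association differs between two biomarkers."""
    groups = {"shared": [], "opposite": [], "first_only": [], "second_only": []}
    for name in sorted(set(first) | set(second)):
        if name in first and name in second:
            key = "shared" if first[name] == second[name] else "opposite"
        elif name in first:
            key = "first_only"
        else:
            key = "second_only"
        groups[key].append(name)
    return groups


fingerprint_groups = compare_fingerprints(BHBA_SIGNS, NEFA_SIGNS)
```

Here `fingerprint_groups["opposite"]` holds the metabolites whose association changes sign between BHBA and NEFA, and the `first_only` and `second_only` entries hold those associated with BHBA or NEFA alone.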

**Figure 4.** Loadings on the first latent variable (LV1) derived from orthogonal partial least squares (OPLS) regression of 1H NMR spectra against serum BHBA (blue) and NEFA (red) concentrations in early lactation dairy cows. Spectral regions between (**a**) δ 0.2 ppm to 2.9 ppm and (**b**) δ 2.9 ppm to 5.5 ppm are shown. Figure (**b**) has been for clarity purposes. Ace = acetate, Bet = betaine, ChoP = phosphocholine, Cr = creatine, DMSO2 = dimethyl sulfone, Ile = isoleucine, Leu = leucine, LDL/VLDL = low/very low-density lipoprotein, NAG = N-acetyl glycoprotein, Pyr = pyruvate.
