*4.3. Chemicals*

Methanol (>99.9% pure) and dipotassium hydrogen phosphate (anhydrous) were purchased from Fisher Chemical (Fair Lawn, NJ, USA). Sodium 2,2-dimethyl- 2-silapentane-5-sulfonate (DSS-d6, 98%) and deuterium oxide (D2O, 98%) were purchased from Cambridge Isotope Laboratories, Inc. (Tewksbury, MA, USA).

#### *4.4. Sample Preparation for NMR Spectroscopy*

Serum samples were thawed at room temperature for one hour and were prepared for NMR spectroscopy using a methanol protein precipitation method described by Nagana Gowda and Raftery [52]. Briefly, 300 μL of serum was mixed with 600 μL of methanol, vortexed (Ratek multi tube vortex mixer, MTV1), incubated at –20 ◦C for 20 min, then centrifuged to pellet proteins (11,360 g, 21 ◦C, 30 min). A 600 μL aliquot of supernatant was then transferred to a clean 2 mL microcentrifuge tube and dried under vacuum at 21 ◦C overnight using a SpeedVac Savant SPD 2010 Concentrator (Thermo Fisher Scientific, Waltham, MA, USA). Dried extracts were then reconstituted in a D2O phosphate bu ffer solution (100 mM K2HPO4) containing 0.25 mM DSS-d6 as an internal standard. A 550 μL aliquot was transferred to 5 mm NMR tube for analysis.

#### *4.5. 1H NMR Data Acquisition*

Routine 1D proton spectra were obtained on a Bruker Ascend 700 MHz spectrometer equipped with cryoprobe and SampleJet automatic sample changer (Bruker Biospin, Rheinstetten, Germany). A Bruker noesypr1d pulse sequence was used over –0.76 ppm to 10.32 ppm spectral range with 256 scans collected after eight dummy scans at 298K, with a total acquisition time of 2.11 seconds per increment and a relaxation delay (D1) of 2.00 seconds. The overall number of data points was 32,768. A line broadening of 0.3 Hz was applied to all spectra prior to Fourier transformation. Spectra were manually phased then baseline corrected in Topspin v.3.6.1 (Bruker Biospin, Rheinstetten, Germany). Samples were referenced to the internal standard (DSS-d6) at δ 0.00.

#### *4.6. 1H NMR Spectral Processing & Multivariate Statistical Analysis*

NMR spectra were imported into MatLab v.R2017b (Mathworks, Natick, WA, USA) using the ProMetab v.1.1 script [53]. Each raw spectrum consisted of 31,313 data points between −0.60 and 10.00 ppm.

Statistical analyses were performed in MatLab utilizing the PLS Toolbox v. 8.5.2 (Eigenvector Research Inc., Manson, WA, USA). The spectral region containing the residual water peak (δ 4.68–5.00) was removed. Spectra were aligned using the correlation optimized warping algorithm [54] to account for chemical shift drift, then normalized to total signal area to account for inherent concentration differences between samples. After normalization, spectral regions containing methanol (δ 3.32–3.36) and DSS-d6 (δ 0.4–−0.60) peaks, and the non-informative region beyond 9.00 ppm were removed. Finally, spectra were baseline corrected using automatic weighted least squares, and scaled by mean centering. After editing, a total of 24,349 chemical shift datapoints were included in subsequent statistical analyses.

For multivariate analyses, unsupervised principal component analysis (PCA) was used. Peaks of interest were identified using the Chenomx NMR suite software v.8.4 (Chenomx Inc., Edmonton, AB, Canada), comparison to the literature, and 2D NMR analysis.

#### *4.7. Correction of 1H NMR Spectra for the E*ff*ects of Systematic Environemtal and Physiological E*ff*ects*

In order to investigate the effects on spectra of systematic environmental effects (also known as fixed effects) spectra were "corrected" using linear regression models. When correcting for a single categorical fixed effect, this is equivalent to scaling data using the "class centering" pre-processing step. Rather than mean centering, which involves subtracting the global mean from each variable, class centering subtracts the mean of each class. This allows investigation of intra-class variation by removing the effects of inter-class variation [55]. The advantage of using linear models rather than class centering is that the effect of multiple fixed effects or classes can be modelled simultaneously.

The approach we took was based on the principals of quantitative genetic models, where

$$\text{Phenotypic observation} = \text{environment effects} + \text{generic effects} + \text{residual effects} \tag{1}$$

In this study, we only want to remove the effect of environmental factors (as it is the variation in NMR spectra under genetic influence that we are interested in), so the equation can be further simplified to

$$\text{Phenotypic observation} = \text{greattic effects} + \text{residual effects} \tag{2}$$

The "corrected phenotype" (i.e., the phenotypic observation with the effects of the environmental effects removed) is defined as the residuals from the above model. For the purposes of this study each chemical shift was treated as a separate phenotype, with the signal intensity at each chemical shift being an individual phenotypic observation. The "corrected spectra" was a matrix of the residuals of each model.

A 707 × 24,349 matrix of signal intensities of pre-processed spectra was imported into the R statistical software package v 3.6.2 [56]. Each row in the matrix represented a single sample, and each column represented 1 of the 24,349 chemical shifts between δ 0.40 and δ 8.99 that made up an individual spectrum. The following 4 linear models were applied to each of the 24,349 columns in the matrix (i.e., the signal intensity at each chemical shift was treated as the response variable in a separate regression model):

$$\mathbf{y}\_{\text{il}} = \mu + \text{VIM}\_{\text{i}} + \text{e}\_{\text{il}} \text{ (Model 1)} \tag{3}$$

$$\mathbf{y}\_{\text{jl}} = \mu + \mathbf{P}\_{\text{j}} + \text{ej}\_{\text{l}} \text{ (Model 2)} \tag{4}$$

$$\mathbf{y}\_{\rm kl} = \boldsymbol{\mu} + \mathbf{H}\_{\rm k} + \mathbf{e}\_{\rm kl} \text{ (Model 3)}\tag{5}$$

$$\mathbf{y}\_{\text{ijkl}} = \mu + \text{WIM}\_{\text{i}} + \mathbf{P}\_{\text{j}} + \mathbf{H}\_{\text{k}} + \mathbf{e}\_{\text{ijkl}} \text{ (Model 4)} \tag{6}$$

where y is the signal intensity at a given chemical shift, μ is the mean, WIM is weeks in milk (4 levels, defined as 1, 2, 3, or 4), P is parity (4 levels, defined as 1, 2, 3, or ≥ 4), H is the effect of herd (13 levels, with a range of 9 to 248 cows per herd), and e is the random error term. This resulted in four separate 707 × 24,349 matrices containing spectra corrected for the effects of WIM, parity, herd, and all fixed effects, respectively.

The R<sup>2</sup> values from each regression model were stored in a separate vector. This resulted in four vectors each containing 24,349 R<sup>2</sup> values; each value representing the percentage of variation in signal intensity explained by the fixed e ffect(s) at a given chemical shift.

#### *4.8. Quantifying the E*ff*ect of Stage of Lactation, Parity and Herd on 1H NMR Spectra*

A separate PCA was performed on each of the 4 corrected spectral datasets (as described in 4.7). Scores of the first three PCs were extracted for each model, and for the PCA model constructed using uncorrected data. We then calculated Pearson's correlations between scores derived from the 5 PCAs using the corrplot package [57] in R v 3.6.2 [56]. This resulted in three correlation matrices (one for each PC). The lower the Pearson's correlation coe fficient, the greater the di fferences between PC scores, the greater the di fferences between the two spectral datasets and therefore the greater the significance of the fixed e ffect(s).

An alternative approach to investigating the influence of fixed e ffects is to use multiple linear regression on PC scores from uncorrected spectra. The advantage of this approach is that all fixed effects can be fitted simultaneously, and the statistical significance of each fixed e ffect can be calculated. The model used was

$$\mathbf{y}\_{\text{ijkl}} = \boldsymbol{\mu} + \mathbf{W}\mathbf{I}\mathbf{M}\_{\text{i}} + \mathbf{P}\_{\text{j}} + \mathbf{H}\_{\text{k}} + \mathbf{e}\_{\text{ijkl}} \text{ (Model 5)}\tag{7}$$

where y is the PC score (on either PC1, PC2, or PC3) and μ, WIM, P, H, and e are the mean, fixed e ffect, and error terms described previously. The statistical significance of each fixed e ffect was determined using conditional Wald F statistics in ASReml v 4.2 (VSN International Ltd., Hemel Hempstead, UK). Conditional F statistics are used in multiple linear regression to infer the significance of a given fixed effect assuming that the e ffect of remaining predictor variables have been accounted for [58].

Finally, we validated our results using the analysis of variance (ANOVA) simultaneous component analysis (ASCA) method in the PLS Toolbox [55]. ASCA is a generalization of ANOVA used to quantify the variation induced by fixed experimental design factors on complex multivariate datasets [59]. ASCA was performed on all spectral datasets (corrected and uncorrected). Statistical significance was determined using permutation testing (50 iterations).

#### *4.9. The Relationships between 1H NMR Spectra and Existing Energy Balance Biomarker Concentrations*

In order to assess the utility of large and diverse datasets in livestock metabolomics studies, we used orthogonal partial least squares (OPLS) regression to compare 1H NMR spectra to serum BHBA concentrations determined by colorimetric assay. The aims of this analysis were (1) to assess the robustness of OPLS models built using uncorrected data and (2) investigate the influence of systematic environmental e ffects on the interpretation of 1H NMR spectra when used for untargeted metabolomic analyses.

#### 4.9.1. Robustness of OPLS Models to Predict External Phenotypes Using Uncorrected Data

The robustness of OPLS models constructed using large and diverse datasets was assessed using a leave-one-farm-out external validation. This involved setting aside data from one farm, training OPLS models using data from the remaining 12 farms, then using the withheld data for external validation. This process was repeated until data from each farm was used as an external validation set once. Model performance was assessed using the R<sup>2</sup> and RMSE of calibration, cross validation (venetian blind CV with 10 data splits, and one sample per split), and external validation. The statistical significance of OPLS models was determined using permutation testing (cross validated, Wilcoxon test). Only uncorrected data were used for this part of the analysis.

#### 4.9.2. Influence of Fixed E ffects on Interpretation of 1H NMR Metabolomic Data

To assess the impact of fixed e ffects on the results of untargeted metabolomic analyses we compared the results of OPLS models constructed from (1) uncorrected data from Farm 1 only (N = 129), (2) uncorrected data from all farms, and (3) data from all farms corrected for all fixed e ffects (Model 4). Farm 1 data was used to simulate a more "typical" metabolomics experiment in which confounding from environmental e ffects is controlled through experimental design.

When corrected spectra were used, reference BHBA concentrations were corrected for the same fixed e ffects (Model 4). The residuals of this model represent the "corrected BHBA" concentration, which is the expected BHBA concentration of an individual accounting for di fferences in WIM, Parity and Herd. This poses some challenges in terms of interpretation, as negative residual values (i.e., negative BHBA concentrations) are possible. However, for the purposes of genetic evaluations, the ranking of an animal, or the relative phenotypic value, is of more interest than an absolute value. The corrected value can therefore be considered a "corrected phenotypic ranking."

The impact of fixed e ffects on the ability of NMR spectra to predict external phenotypes (i.e., to classify animals or predict biomarker concentrations for managemen<sup>t</sup> purposes) was assessed by comparing the predictive ability of OPLS models. The influences of fixed e ffects on biomarker discovery were investigated using scores and loadings on LV1 which show the magnitude and direction of relationships between BHBA concentration and 1H NMR spectral features. Variable importance of projection (VIP) scores were used to identify the most statistically significant spectral features in each model. Variables with VIP scores greater than one were considered significant [60].
