**1. Introduction**

Modern metabolomic techniques such as proton nuclear magnetic resonance (1H NMR) spectroscopy allow high-throughput, synchronous characterization of the small metabolites present in biological matrices [1]. In dairy cows, the metabolome gives a snapshot of the complex interactions between host genetics, the rumen microbiome, and the environment at a given time point. 1H NMR-based metabolomics therefore offers exciting opportunities to better understand and characterize the complex physiological and biochemical challenges facing cows in the transition period (defined as the three weeks before and after calving [2,3]) which is the period of greatest disease risk [4]. This in turn can facilitate identification of new molecular phenotypes (metabotypes) for genetic selection for improved animal health. These "intermediate phenotypes," so-termed because they sit between the genome and external phenotype [5], can then be integrated with genomic data to improve genomic prediction accuracies of complex traits [6,7]. The aim of metabotype identification is therefore to identify biomarkers that represent inter-animal variation free of confounding environmental factors.

Another aim of dairy cattle metabolomic studies is to identify biomarkers which enable early identification of health disorders in the transition period such as ketosis [8,9], hypocalcemia [10] and displaced abomasa [11]. Of particular interest are studies that have identified biomarkers that are predictive of transition period disorders, such as that by Hailemariam et al. [12], who identified a panel of three metabolites that could predict the occurrence of peri-parturient disease up to four weeks before calving. If robust, such predictive biomarkers would enable producers and veterinarians to implement preventive nutritional, management and/or veterinary interventions before the onset of disease. Unlike metabotype biomarkers used for genetic selection, the aim of biomarkers used for management purposes is to predict the external phenotype, and these must therefore capture all sources of phenotypic variation (i.e., host genetics, rumen microbiome, and the environment).

To date, most serum 1H NMR-based metabolomic studies of livestock have involved relatively small numbers of animals, often of a single breed, and often located on a single farm. In their review, Goldansaz et al. [13] identified limited sample size and diversity as limitations of many livestock metabolomics studies and highlighted the need for larger and more diverse datasets to ensure models and biomarkers are robust. This must, however, be balanced against the need for careful experimental design to account for potential confounding from systematic environmental effects such as diet/nutritional management, parity and stage of lactation, which are known to affect the metabolic status of cows [13]. Achieving large datasets may nevertheless require samples from multiple farms, especially when the prevalence of the condition being investigated is low (e.g., displaced abomasa). Previous studies have reported differences in the milk metabolome of animals from different geographical regions [14], farms [15], and of different breeds [16]. However, given that there is not a strong relationship between blood and milk metabolomes [17,18], these findings cannot be extrapolated to the blood serum/plasma metabolome. More information is therefore needed on the impact of systematic environmental effects on the serum metabolome of livestock.

Linear models are routinely used by quantitative geneticists to account for systematic environmental effects (also known as fixed effects) that are known to contribute significantly to phenotypic variation [19], and thus to disentangle genetic from non-genetic effects. Frequently used fixed effects include stage of lactation, parity, and herd-year-season. Similar approaches have recently been applied to metabolomic data, for example by Wanichthanarak et al. [20], who used linear mixed-effects models and patient metadata to account for biological variation in metabolomics data, and by Laine et al. [21], who used linear models to study the effect of pregnancy on mid-infrared spectral data derived from cows' milk.

The aim of this study was therefore to investigate the feasibility of using large and diverse datasets in livestock metabolomics studies by examining the influence of fixed environmental and physiological effects on the 1H NMR serum metabolome of clinically healthy dairy cows in early lactation. We propose a method that uses linear models to correct spectra for fixed effects and demonstrate its potential utility by quantifying the relationship between 1H NMR spectra and the current gold-standard serum biomarker of energy balance, β-hydroxybutyrate (BHBA) [22,23].
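
To make the general idea of correcting spectra for fixed effects concrete, the following is a minimal sketch in Python and not the model fitted in this study. It assumes, purely for illustration, that herd and parity are categorical fixed effects and days in milk is a covariate, fits one ordinary least-squares model per spectral bin, and keeps the residuals (re-centred on the bin mean) as the corrected spectrum. The function names `design_matrix` and `correct_spectra`, the simulated data, and the choice of effects are all hypothetical.

```python
import numpy as np

def design_matrix(herd, parity, dim):
    """Intercept + dummy-coded herd and parity (first level dropped) + days in milk."""
    cols = [np.ones(len(herd))]
    for factor in (herd, parity):
        levels = np.unique(factor)
        cols += [(factor == lv).astype(float) for lv in levels[1:]]
    cols.append(np.asarray(dim, dtype=float))
    return np.column_stack(cols)

def correct_spectra(spectra, X):
    """Fit one OLS model per spectral bin; return residuals plus the bin mean."""
    beta, *_ = np.linalg.lstsq(X, spectra, rcond=None)
    residuals = spectra - X @ beta
    return residuals + spectra.mean(axis=0)

# Simulated example (assumed values, not real data): 60 cows, 5 spectral bins.
rng = np.random.default_rng(0)
n, p = 60, 5
herd = np.arange(n) % 3                 # three hypothetical herds
parity = (np.arange(n) // 3) % 3 + 1    # parities 1 to 3
dim = rng.uniform(5, 30, size=n)        # days in milk
herd_shift = np.array([0.0, 2.0, -1.0])[herd]
spectra = (10 + herd_shift[:, None] + 0.05 * dim[:, None]
           + rng.normal(0, 0.5, size=(n, p)))

X = design_matrix(herd, parity, dim)
corrected = correct_spectra(spectra, X)
```

Because the model contains an intercept and herd indicators, the corrected intensities have the same mean in every herd, so any remaining between-animal variation is not attributable to the modelled effects. A mixed-model or herd-year-season formulation would follow the same pattern with a different design matrix.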
