*2.6. Statistical Analyses*

The distance from the start of each visual analogue scale to the observer's mark was measured (mm), and these measurements were entered into a separate Excel file for each observer (Microsoft Excel 2003, North Ryde, NSW, Australia). These data were analysed with Generalised Procrustes Analysis (GPA) as part of a specialised software package written for Françoise Wemelsfelder (Genstat 2008, VSN International, Hemel Hempstead, Hertfordshire, UK; Wemelsfelder et al., 2000). Each Study (treatment comparison) was analysed separately, making a total of six independent analyses.

For a detailed description of GPA procedures, see Wemelsfelder et al. [12]. In brief, GPA calculates a consensus or 'best fit' profile between observer assessments through complex pattern matching. GPA provides a statistic (the Procrustes Statistic) that indicates the level of consensus achieved (i.e., the percentage of variation explained between observers). The statistical process by which this best-fit pattern, termed the consensus profile, is identified takes place independently of the meaning of the individual terms used by observers. Whether this consensus is a significant feature of the data set or an artefact of the Procrustean calculation procedures is determined through a randomisation test [23]. This procedure rearranges each observer's scores at random and produces new permuted data matrices. By applying GPA to these permuted matrices, a 'randomised' profile is calculated. This procedure is repeated 100 times, providing a distribution of the Procrustes Statistic that indicates how likely it is to find an observer consensus by chance alone. Subsequently, a one-way *t*-test is used to determine whether the actual observer consensus profile falls significantly outside the distribution of randomised profiles.
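The randomisation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Genstat routine used in this study: the GPA step only centres, zero-pads, scales and rotates the observer matrices (no isotropic scaling step), the permutation shuffles the rows (video clips) of each observer's matrix independently, and the *t* statistic is one reading of the test, comparing the observed statistic with the mean of the randomised ones. All function names, the fixed iteration count and the simulated data are hypothetical.

```python
import numpy as np


def prepare(matrices):
    """Centre each observer's columns, zero-pad to a common width, scale to unit size."""
    width = max(m.shape[1] for m in matrices)
    prepared = []
    for m in matrices:
        m = np.asarray(m, dtype=float)
        m = m - m.mean(axis=0)
        m = np.pad(m, ((0, 0), (0, width - m.shape[1])))
        prepared.append(m / np.linalg.norm(m))
    return prepared


def gpa(matrices, n_iter=20):
    """Return (consensus, percentage of variation explained by the consensus)."""
    configs = prepare(matrices)
    consensus = configs[0]  # starting target: the first observer's configuration
    for _ in range(n_iter):  # fixed number of sweeps (assumed sufficient here)
        rotated = []
        for m in configs:
            # Orthogonal Procrustes rotation of m towards the current consensus.
            u, _, vt = np.linalg.svd(m.T @ consensus)
            rotated.append(m @ (u @ vt))
        configs = rotated
        consensus = np.mean(configs, axis=0)
    total = sum(np.sum(m ** 2) for m in configs)
    explained = len(configs) * np.sum(consensus ** 2)
    return consensus, 100.0 * explained / total


def randomisation_test(matrices, n_perm=100, seed=0):
    """Compare the observed statistic with statistics from row-permuted data."""
    rng = np.random.default_rng(seed)
    _, observed = gpa(matrices)
    randomised = np.array([
        gpa([m[rng.permutation(m.shape[0])] for m in matrices])[1]
        for _ in range(n_perm)
    ])
    # Assumed form of the t statistic: observed value vs. mean of randomised values.
    t = (observed - randomised.mean()) / (randomised.std(ddof=1) / np.sqrt(n_perm))
    return observed, randomised, t


def simulate_observers(n_clips=12, term_counts=(4, 5, 6, 4, 5), noise=0.1, seed=1):
    """Hypothetical data: observers score the same clips using their own terms."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_clips, 2))  # shared underlying expression of each clip
    return [
        latent @ rng.normal(size=(2, p)) + noise * rng.normal(size=(n_clips, p))
        for p in term_counts
    ]


if __name__ == "__main__":
    observed, randomised, t = randomisation_test(simulate_observers())
    print(f"observed {observed:.1f}%, randomised mean {randomised.mean():.1f}%, t = {t:.1f}")
```

Because each observer's rows are shuffled independently, any agreement left in the permuted data can only arise from the fitting procedure itself, which is what the comparison is meant to rule out.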

Through Principal Components Analysis (PCA), the number of dimensions of the consensus profile is reduced to several main dimensions (usually 2 or 3) explaining the variation between animals. GPA dimensions are interpreted by correlating the animals' scores with the observers' individual scoring patterns, producing individual observer word charts that describe the consensus dimensions through their association with each individual observer's terms. These word charts can then be compared for linguistic consistency. From these word charts, a list of terms describing the consensus dimensions was produced by selecting, for each observer, the terms that correlated strongly with those dimensions (i.e., the more highly correlated an individual term is with a dimension end, the more weight it has as a positive or negative descriptor for that dimension). Each video clip of animals receives a quantitative score on each of these dimensions, so that the transport trip's position in the consensus profile can be graphically represented in two- or three-dimensional plots. Each plot shows the transport trips in the respective treatments, where the position of a transport trip indicates its scores on each GPA axis.
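As a rough illustration of these two steps, the sketch below reduces a consensus matrix to its main dimensions with PCA (via a singular value decomposition) and then correlates one observer's terms with the resulting clip scores, which is the information a word chart displays. It assumes the consensus is available as a clips × columns NumPy array; the function names and the simulated input are hypothetical and do not reproduce the Genstat output.

```python
import numpy as np


def consensus_dimensions(consensus, n_dims=2):
    """PCA of the consensus: clip scores and proportion of variation per dimension."""
    centred = consensus - consensus.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_dims] * s[:n_dims]   # one row of dimension scores per clip
    explained = s ** 2 / np.sum(s ** 2)   # share of variation on each dimension
    return scores, explained[:n_dims]


def term_correlations(observer_matrix, scores):
    """Pearson correlation of each observer term with each consensus dimension."""
    n_terms, n_dims = observer_matrix.shape[1], scores.shape[1]
    corr = np.empty((n_terms, n_dims))
    for i in range(n_terms):
        for j in range(n_dims):
            corr[i, j] = np.corrcoef(observer_matrix[:, i], scores[:, j])[0, 1]
    return corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    consensus = rng.normal(size=(10, 4))      # hypothetical consensus, 10 clips
    observer = rng.normal(size=(10, 3))       # hypothetical scores on 3 terms
    scores, explained = consensus_dimensions(consensus)
    print(np.round(explained, 2))
    print(np.round(term_correlations(observer, scores), 2))
```

Terms with large positive or negative correlations on a dimension would be the ones selected as descriptors of its two ends.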

To investigate treatment effects, the observer scores for each GPA dimension (response variables) were analysed using mixed-model ANOVAs, with driver ID and observer ID included as random factors to account for differences in driver behaviour and truck design and for repeated observations. For Study B3, crate and deck were included as two independent factors. Analyses were carried out using Statistica software, version 9 (StatSoft Inc., Tulsa, OK, USA).
