#### *2.2. Determination of Tissue Pixel Intensities*

Pixel intensities corresponding to respective tissues of fat, muscle and bone were calculated based upon comparison with dissected tissue as explored by Bunger et al. [12]. Pixel intensity windows for fat, muscle and bone were 800–1000, 1000–1100 and 1100–1750, respectively. This group's previous research allowed us to incorporate set pixel intensity windows for each tissue type into the CV pipeline easily.

#### *2.3. Manual Image Processing and Phenotype Analysis*

All images had been previously labelled by manual phenotype extraction. Parts of the image superfluous for downstream phenotype determination including scanning cradle and testes were removed using STAR software routines [12] using the method described by Glasbey et al. [35]. Images produced from this processing are considered as "ground truth". From the ground truth images, tissue phenotype could then be extracted by calculating tissue distribution within the experimentally determined windows. Other phenotypes such as gigot length were measured manually by measuring the distance from the centre of the ischium bone cross-section to that of the femur bone cross-section in a "click and drag" fashion. Processing the images in this fashion took approximately 30 min.

#### *2.4. GAN Model*

GANs are two-component systems which have a generator component *G* to generate images and a discriminator component *D* to determine if the image is real or fake. The generator *G* takes an input image to translate into an output image *y* and can operate in either an unconditional fashion where random noise *z* is supplied or in a conditional fashion where an input image *x* or random noise *z* is supplied, *G*: {*x* or *z*} → *y.* The discriminator *D* determines if the image produced is "real" or "fake" and helps train the generator *G* to produce images which can pass as "real". GANs thus attempt to optimize the following function [36]:

$$\min_{G} \max_{D} \; V(G, D) = \mathbb{E}_{x, y}\left[\log D(x, y)\right] + \mathbb{E}_{x, z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \tag{1}$$

Further improvement of the generator *G* can be incorporated by including a function to minimise the absolute pixel differences between "real" and "fake" images [25].

$$\min_{G} \; L_{L1}(G) = \mathbb{E}_{x, y, z}\left[\lVert y - G(x, z) \rVert_{1}\right] \tag{2}$$

This results in the following final model:

$$G^{*} = \arg\min_{G} \max_{D} \; V(G, D) + L_{L1}(G) \tag{3}$$
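As a rough numerical illustration of Equations (1)–(3), the sketch below evaluates the adversarial value and the L1 term on plain arrays. It is not the training code from Supplementary File S2: the function names are hypothetical, the discriminator outputs are assumed to be probabilities in (0, 1), and the L1 term is averaged per pixel, which is a scaled version of the norm in Equation (2).

```python
import numpy as np

def adversarial_value(d_real, d_fake, eps=1e-12):
    """V(G, D): mean log D(x, y) plus mean log(1 - D(x, G(x, z)))."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # eps guards against log(0) when the discriminator is fully confident.
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

def l1_loss(y, g_out):
    """Mean absolute pixel difference between the target y and the generator output."""
    y = np.asarray(y, dtype=float)
    g_out = np.asarray(g_out, dtype=float)
    return float(np.mean(np.abs(y - g_out)))

def generator_objective(d_fake, y, g_out, eps=1e-12):
    """The terms of Equation (3) that depend on G: the fake-image term plus L1."""
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(1.0 - d_fake + eps))) + l1_loss(y, g_out)
```

The generator objective falls both when the discriminator is fooled (its output for generated images approaches 1) and when the generated pixels move closer to the ground truth.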

#### 2.4.1. GAN Training

The GAN trained in this study is an implementation of AUTOMAP [37] and Pix2Pix [25] which has been optimised for use with paired image datasets [38]. This particular GAN was chosen as it was designed from the ground up to process paired sets of images, such as those commonly found in the medical field, where an image can be altered to produce a "before" and "after" whilst maintaining the same subject ID and type, e.g., sheep–sheep, human–human, in a conditional synthesis process. This is in contrast to other popular GANs, such as CycleGAN and DCGAN, which perform unconditional synthesis by capturing key style concepts from large batches of example images to translate images between two highly different abstract style concepts such as horse-to-zebra, photograph-to-Van Gogh or sketch-to-cat [39,40].

A dataset containing 126 raw and ground truth image pairs of mixed breed ovine CT scans taken from 2019–2020 was used for GAN training (Supplementary File S1). DICOM pairs were first split into training (n = 101) and validation (n = 25) datasets (80% and 20%, respectively). The raw and ground truth DICOM filename IDs in each pair were given a suffix of "\_0" or "\_1", respectively, to act as identifiers. All file extensions were then modified to ensure compatibility with the DICOM processing libraries used in this study. The script used to train the GAN, along with the full list of GAN settings used for this study, is available within Supplementary File S2. Key settings for training the GAN were as follows: random translation = 0, epochs = 100, weight for L1 reconstruction loss = 0, weight for L2 reconstruction loss = 10.0, weight for softmax focal reconstruction loss = 1.0, weight for total variation = 10<sup>−3</sup>. Following training, both the L1 (absolute pixel difference) and L2 (mean squared error) losses were approaching stable values (Supplementary Figure S1).
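The split and naming scheme described above can be sketched as follows. This is an illustration only, not the script in Supplementary File S2: the helper names, the fixed random seed and the `.dcm` extension are assumptions, since the text does not state how pairs were shuffled or which extension was used.

```python
import random

def split_pairs(scan_ids, train_fraction=0.8, seed=0):
    """Shuffle scan IDs reproducibly and split them into training and validation lists."""
    ids = sorted(scan_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

def pair_filenames(scan_id, extension=".dcm"):
    """Return (raw, ground truth) filenames using the "_0" / "_1" suffix convention."""
    return f"{scan_id}_0{extension}", f"{scan_id}_1{extension}"

# Hypothetical IDs standing in for the 126 image pairs.
train_ids, val_ids = split_pairs(range(126))
print(len(train_ids), len(val_ids))
```

With 126 pairs and an 80% training fraction, rounding gives the 101 and 25 pairs reported above.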

#### 2.4.2. Image Processing Using Trained GAN on Unseen Data

Thirty-two raw CT scans (Supplementary File S3) taken from 2018–2019 and belonging to the breed Charollais were passed through the trained GAN model to produce "predicted" images that were given a suffix of "\_2" to clearly differentiate between the raw and ground truth counterparts (Supplementary File S2).

#### *2.5. CT Scan Similarity Comparison*

#### 2.5.1. CT Scan Histogram Comparison

Alternative image manipulation techniques, such as removing pixels above or below certain intensities, were not suitable for processing the CT scans in the DICOM format as the pixel intensities of image objects needing to be removed overlapped with that of the subject's tissue. Furthermore, pixels in certain areas could not be removed since subject orientation was not constant. Due to the large irregular pixel area changes needed to process the images, a deep neural network that can perform image-to-image translations was deemed to be of potential use. This can be visualised by comparing the pixel intensity histograms of both the raw and ground truth images below in Figure 2 (generated as part of the computer vision pipeline in Supplementary File S4).

#### 2.5.2. Calculation of Image Similarity

Mean squared error (MSE) and structural similarity index (SSI) metrics were used to compare the raw and ground truth images with the resulting predicted images. Mean squared error is a full-reference, pixel-wise metric with values closer to zero being better; it is the mean of the squared differences at each pixel location between a pair of images. This technique, however, is extremely sensitive to position, and seemingly large amounts of MSE can be accumulated by shifts in the image that appear very minor to the human eye, such as slight rotations or horizontal and vertical translations [41]. A newer, more holistic approach which avoids the extreme position sensitivity of MSE is calculating the SSI, which analyses local similarities in structure, luminance and contrast to more closely mimic how the human eye perceives similar images [42]. Both MSE and SSI were calculated for each pairwise comparison of image classes (raw, ground truth or predicted in this study) using the SciKit Image Python image processing library as documented in Supplementary File S4 [43].
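To make the two metrics concrete, the following sketch computes MSE and a simplified, single-window form of the structural similarity formula using NumPy only. It is not the SciKit Image implementation used in Supplementary File S4: the library computes SSI over local sliding windows and averages the result, whereas `global_ssim` below (a hypothetical name) applies the formula once to the whole image, with the commonly used constants 0.01 and 0.03.

```python
import numpy as np

def mse(a, b):
    """Mean of the squared pixel-wise differences between two images."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def global_ssim(a, b, data_range):
    """Structural similarity evaluated once over the whole image (no sliding window)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)

# A bright square shifted four pixels sideways: visually minor, numerically large.
image = np.zeros((32, 32))
image[8:16, 8:16] = 1000.0
shifted = np.roll(image, 4, axis=1)
print(mse(image, shifted), global_ssim(image, shifted, data_range=1000.0))
```

Identical images give an MSE of zero and a similarity of one; the shifted copy shows how a small translation alone produces a large MSE.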

#### *2.6. Phenotype Measurement Using Computer Vision*

Automated phenotype extraction from ground truth and predicted (processed) images was performed using a pipeline which incorporated known pixel intensity thresholds, based upon manual dissection, for each tissue type of the carcass. Geometric phenotypes were computed predominantly using the area, contour and perimeter functions within the CV library SciKit Image [43]. In addition, a set of bespoke functions was written to detect probable tissue pixel intensity windows of fat, muscle and bone if no known set values were available, or if the images being analysed were from different sources. All steps of phenotype extraction using computer vision are documented in Supplementary File S4.

**Figure 2.** A representative pixel intensity histogram of raw and ground truth image shows large variance. By comparing the raw and ground truth pixel intensity histograms it can be visualised that they (a) share certain areas of similarity (as seen at the peak between 1000 and 1250) but also (b) contain regions which have different non-zero abundances (within the peak between 750 and 1000). As there is no intensity range that is present in one image and entirely absent from the other, images cannot be processed by simply flattening pixel intensities which lie between certain values. This type of non-linear transformation is a task in which neural networks perform well.

#### 2.6.1. Tissue Distribution

The areas of all tissues within the ground truth and predicted images were calculated using the SciKit Image contour function for later use in determining percentage tissue composition. Tissue masks for each image were applied by first setting pixel intensity values outside the respective tissue window (fat, muscle or bone) to zero and then setting values within the window to max (2550). Fat, muscle and bone % of each image were determined by comparing the number of pixels that fell within each of the respective tissue masks to that of the area of all tissue. By visualising each of the tissue masks independently, muscle and fat distribution could be observed in addition to locations of key physical features such as bones for further geometric phenotype analysis.
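The masking and percentage calculation described above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the pipeline in Supplementary File S4: the intensity windows are those given in Section 2.2, but the half-open treatment of the shared window boundaries and the use of the summed mask pixels as the total tissue area (instead of the contour-based area) are assumptions.

```python
import numpy as np

# Pixel intensity windows from Section 2.2; boundary handling is an assumption.
TISSUE_WINDOWS = {"fat": (800, 1000), "muscle": (1000, 1100), "bone": (1100, 1750)}

def tissue_mask(image, low, high):
    """Boolean mask of pixels with low <= intensity < high."""
    image = np.asarray(image)
    return (image >= low) & (image < high)

def tissue_composition(image, windows=TISSUE_WINDOWS):
    """Return per-tissue pixel counts and percentages of the total tissue pixels."""
    counts = {name: int(tissue_mask(image, low, high).sum())
              for name, (low, high) in windows.items()}
    total = sum(counts.values())
    percentages = {name: (100.0 * count / total if total else 0.0)
                   for name, count in counts.items()}
    return counts, percentages

# Tiny hypothetical "scan": 0 and 2000 fall outside every tissue window.
scan = np.array([[0, 850, 900], [1050, 1050, 1200], [1000, 1100, 2000]])
counts, percentages = tissue_composition(scan)
print(counts, percentages)
```

Each mask can also be displayed on its own to inspect where a given tissue lies within the image.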

#### 2.6.2. Skeleton Geometry

One key phenotype used for estimation of muscularity is the ratio of the length and width of the gigot muscle. These dimensions are typically measured by hand from the CT scan image but, by using CV models, we can extract this information automatically from the bone tissue mask image by implementing SciKit Image area and crofton perimeter functions [44]. Since small pieces of grit and sand may appear in the bone mask, due to high density as detected by X-rays, only bone mask objects over 200 pixels in both area and perimeter are referenced. Then, to avoid including spinal bone tissue, the four largest objects in the most +Y direction are assumed to be the features of interest and are placed into pairs according to their position along the X axis. The distance in pixels is then calculated between each pair of bones to determine gigot length. A line perpendicular to that between the bone pairs is then used to find the furthest non-zero positions within the muscle tissue mask and thus determine gigot width.
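The object filtering, pairing and length steps above might look like the sketch below. It is an interpretation, not the code in Supplementary File S4: it assumes each bone mask object has already been reduced to an area, a perimeter and an (x, y) centroid, and it reads "the four largest objects in the most +Y direction" as the four qualifying objects with the greatest centroid y value. The function names and the dictionary layout are hypothetical, and the width measurement is omitted.

```python
import math

def select_leg_bones(objects, min_size=200):
    """Drop small objects, keep the four with the greatest y, and pair them by x."""
    candidates = [o for o in objects
                  if o["area"] > min_size and o["perimeter"] > min_size]
    if len(candidates) < 4:
        raise ValueError("fewer than four bone objects passed the size filter")
    # Assumption: objects further along +Y are leg bones, not spine.
    top_four = sorted(candidates, key=lambda o: o["centroid"][1], reverse=True)[:4]
    by_x = sorted(top_four, key=lambda o: o["centroid"][0])
    return by_x[:2], by_x[2:]  # (left pair, right pair) by x position

def gigot_length(pair):
    """Euclidean distance in pixels between the centroids of a bone pair."""
    (x1, y1), (x2, y2) = pair[0]["centroid"], pair[1]["centroid"]
    return math.hypot(x2 - x1, y2 - y1)

# Hypothetical objects: grit, a spinal cross-section and four leg bone cross-sections.
objects = [
    {"area": 50, "perimeter": 30, "centroid": (220, 300)},   # grit, filtered out
    {"area": 900, "perimeter": 300, "centroid": (250, 50)},  # spine, lowest y
    {"area": 600, "perimeter": 250, "centroid": (100, 200)},
    {"area": 700, "perimeter": 260, "centroid": (160, 280)},
    {"area": 600, "perimeter": 250, "centroid": (400, 200)},
    {"area": 700, "perimeter": 260, "centroid": (340, 280)},
]
left, right = select_leg_bones(objects)
print(gigot_length(left), gigot_length(right))
```

Since 1 pixel corresponds to 1 mm<sup>2</sup> in these images, the pixel distance can be read directly in millimetres.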

#### *2.7. Computing Hardware and Software*

The training of machine learning models can be an intensive computational task which typically requires powerful graphics processing units (GPUs). As such, all computation was performed on an NVIDIA DGX Station workgroup server [45]. The DGX workstation provided supercomputing performance with one out of a total of four TESLA V100 GPUs being used for computations underpinned by an Ubuntu operating system. All code was run within a Compute Unified Device Architecture (CUDA) 10.1 docker container which allows parallelisation of general-purpose processing to be applied to the powerful GPUs. Within this container, the open source learning framework Chainer was used to accelerate creation of the neural networks [46]. The GAN trained in this study is an implementation of AUTOMAP [37] and Pix2Pix [25] which has been optimised for use with paired image datasets [38]. Predicted images produced by the GAN were then processed using a bespoke python script run within a Jupyter notebook (Supplementary File S4). The notebook contains code within cells which can either (a) run individual steps and generate intermediary output figures (slower) or (b) calculate metrics and compare images without visualising any medical images (faster).

#### **3. Results**

The trained model was able to transform the raw images with a high degree of accuracy and perform the large image area manipulations, such as scanning cradle and testicle removal, needed to produce images similar to the manually processed ground truth images. The accuracy of these transformations was confirmed by visual inspection of predicted images and measurement of image similarity metrics including MSE and SSI. Phenotypic traits such as fat, muscle and bone tissue distribution and both gigot length and width were then automatically extracted from the predicted (transformed) images using CV techniques. All values calculated using this pipeline are recorded in an output file (Supplementary File S5).

#### *3.1. CT Scan Processing Using Trained GAN*

Raw CT scans not previously seen by the GAN were processed using the trained model at a speed of 0.11 s per scan. Predicted and ground truth images and pixel intensity histograms were first compared visually to initially assess GAN suitability and ensure that they were visually similar (Figure 3). Quantitative metrics such as MSE and SSI were further determined to accurately assess the success of the GAN for processing the CT scans (Figure 4).

#### 3.1.1. Images Produced from Trained Model

The trained model was able to perform the major structural alterations within the image dataset needed to transform the raw CT scans into something which, by eye, strongly resembled the ground truth images as shown below in Figure 3. Image IDs 1732, 9638 and 8353 were chosen to illustrate this transformation since, on visual inspection, they contained the largest area of features needing to be removed (large testes and a large scanning cradle).


**Figure 3.** Representative comparison of raw, ground truth and predicted CT scan images. A trained generative adversarial network (GAN) was used to process raw CT images (**left** column) into something resembling manually processed ground truth images (**middle** column). Non-quantitative visual inspection of predicted results (**right** column) indicated that images produced by this GAN are similar to ground truth counterparts. The GAN showed good capabilities in automatically handling the large image transformations needed to remove image objects such as testes and scanning cradle.

**Figure 4.** Quantifying the high degree of similarity between ground truth and predicted images. Comparing a representative pixel intensity histogram of a ground truth and predicted image (**left**) showed a high degree of overlay and that peaks were present in similar areas at similar amplitudes, indicating a similar distribution of pixel intensities within each image. Structural components of image groups were compared (**right**) using mean squared error (MSE) and structural similarity indexes (SSIs), which revealed (a) high average MSE (58,674 ± 17,766 and 58,008 ± 17,319, n = 32) with low average SSI (0.49 ± 0.025 and 0.48 ± 0.024, n = 32) between raw vs. ground truth and raw vs. predicted image groups, respectively, and (b) low average MSE (1028 ± 1201) and high average SSI (0.98 ± 0.0035) when comparing ground truth vs. predicted images. These high SSI and low MSE values confirm the suitability of a trained generative adversarial network to perform highly accurate ovine CT image processing.


#### 3.1.2. Image Similarity Metrics Confirm a High Degree of Similarity

As with the raw and ground truth histograms compared previously, the ground truth and predicted image histograms were compared, revealing a large proportion of overlap and a high degree of similarity on visual inspection. The likeness of the raw, ground truth and predicted image sets (n = 32) was compared pairwise using MSE and SSI. Both raw vs. ground truth and raw vs. predicted comparisons showed the lowest image similarity, with average MSEs of 58,674 ± 17,766 and 58,008 ± 17,319 and average SSIs of 0.49 ± 0.025 and 0.48 ± 0.024, respectively, indicating a high degree of image dissimilarity. In contrast, comparing images in the ground truth and predicted datasets showed a much lower average MSE (1028 ± 1201) and a far higher average SSI (0.98 ± 0.0035), indicating far greater similarity and high accuracy of the trained model in mimicking the manual processing of CT scan images.

#### *3.2. Automated Phenotype Extraction*

The image processing library SciKit Image was successfully implemented to provide CV capabilities in the automated phenotype extraction pipeline. In this study, phenotypes of interest included fat, muscle and bone tissue abundance as well as leg geometry such as length and width (Supplementary File S4).

#### 3.2.1. Leg Tissue Composition

Tissue abundance and distribution of fat, muscle and bone, within the single 2D image analysed, were calculated by counting pixels which fell within experimentally determined tissue pixel intensity windows compared to the total tissue area. Binary visualisation of these tissue value windows allowed rapid profiling of tissue distribution as seen below in Figure 5. Using this method, tissue abundances were calculated for each medical image in terms of both area and percentage composition (Figure 6). On average, the area of bone, muscle and fat across the dataset was 6488 ± 533, 44,274 ± 4051 and 5712 ± 1377 mm<sup>2</sup>. Carcass tissue composition percentage-wise for bone, muscle and fat was 11.52 ± 0.78, 78.41 ± 1.90 and 10.07 ± 2.03%.

#### 3.2.2. Gigot Length and Width Phenotype Extraction

By applying CV functions from the SciKit Image library such as area, perimeter and location restraints to objects in the bone tissue mask, the position and centre of key features were detected, and gigot length and width determined automatically as part of the CV script (Supplementary File S4). This process is visualised below in Figure 7. Left and right gigot lengths were 164.45 ± 8.72 mm and 166.38 ± 9.71 mm, with widths being 137.55 ± 10.53 mm and 143.99 ± 12.42 mm, respectively.

#### 3.2.3. Phenotype Extraction Accuracy

Phenotypes from both predicted and ground truth datasets were extracted using the computer vision pipeline and compared to determine the suitability of predicted images for phenotype determination as seen below in Figure 8. Across all phenotypes, predicted values were on average 101.44% of the ground truth value with a standard deviation of 12.90% (n = 32). Muscle % was the most accurate predicted phenotype with estimated values between 93.67 and 106.65%. On the other hand, calculated fat area was the least accurate predicted phenotype with estimated values between 42.50 and 156.18%. Following incomplete ovine testes removal from image ID 8346, fat-related phenotypes for this image were not included in accuracy calculations, as testes are calculated as fatty tissue. All other phenotypes for this image, such as muscle area, bone area and gigot geometry, were recorded normally.
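The accuracy figures above express each predicted phenotype as a percentage of its ground truth value. A minimal sketch of that calculation is given below; the function names are hypothetical, the example values are invented for illustration, and the use of the sample standard deviation is an assumption, as the text does not state which estimator was used.

```python
import statistics

def percent_of_ground_truth(predicted, ground_truth):
    """Express each predicted value as a percentage of its ground truth value."""
    return [100.0 * p / g for p, g in zip(predicted, ground_truth)]

def summarise(percentages):
    """Mean and sample standard deviation of the percentage values."""
    return statistics.mean(percentages), statistics.stdev(percentages)

# Invented values for three images of one phenotype, for illustration only.
predicted = [110.0, 90.0, 100.0]
ground_truth = [100.0, 100.0, 100.0]
mean_pct, sd_pct = summarise(percent_of_ground_truth(predicted, ground_truth))
print(mean_pct, sd_pct)
```

A mean near 100% with a small standard deviation indicates that the predicted images yield phenotype values close to those from the manually processed images.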


**Figure 5.** Representative tissue distribution of fat, muscle and bone within tissue area of predicted images. The results of the trained generative adversarial network were analysed by examining total area (**top left**) and by applying pixel intensity threshold windows to separately visualise fat (**top right**), muscle (**bottom left**) and bone (**bottom right**). The total number of pixels that fell within these pixel intensity windows determined the volume of the respective tissue types in the sample since 1 pixel = 1 mm<sup>2</sup>.

*Sensors* **2021**, *21*, x FOR PEER REVIEW 12 of 18

**Figure 6.** Area and percentage composition of tissue types within predicted ovine medical images. Using the threshold windows for each of the respective tissue types, the total area occupied was calculated for each tissue type (**left**) and what percentage this represented within each individual CT scan (**right**). On average, the area of bone, muscle and fat across the dataset was 6488 ± 533, 44,274 ± 4051 and 5712 ± 1377 mm<sup>2</sup>. Carcass tissue composition percentage-wise for bone, muscle and fat was 11.52 ± 0.78, 78.41 ± 1.90 and 10.07 ± 2.03%.


**Figure 7.** Automated identification of key features and determination of gigot length and width from predicted images. Using the bone tissue mask (**top left**) from the predicted image generated by the GAN, key features can be identified (**top right**). The centroid of each object within each feature area can be calculated (**bottom left**) to then measure distances and determine gigot length (LL, RL, **bottom right**). By taking the perpendicular equation of the line which connects the two pairs of bones, the width of the gigot (LW, RW, **bottom right**) can be calculated by discovering the first and last non-zero values of these positions within the muscle tissue mask. Left and right gigot lengths on average were 164.45 ± 8.72 mm and 166.38 ± 9.71 mm, with widths being 137.55 ± 10.53 mm and 143.99 ± 12.42 mm, respectively.


**Figure 8.** Comparing values of predicted and ground truth phenotypes. Prediction estimation accuracy was determined by comparing phenotype values generated from both predicted and ground truth datasets using the computer vision pipeline. Across all phenotypes, predicted values for each image were on average 101.44% that of the ground truth value with a standard deviation of 12.90% (n = 32).

#### **4. Discussion**

Continued reduction in DNA genotyping cost over time has resulted in mainstream integration of genomic selection into genetic improvement programmes for a number of domesticated animals. The increase in availability of genotypes leads to the need to identify the correlated phenotypes, as subtle or rare as they may be [10]. One technology which shows great promise in detecting these subtle phenotypes is the use of trained neural networks and CV. The processing and extraction of key data from medical images in the past have been typically performed manually by trained and experienced professionals. However, more recently, emergence of trained artificial intelligence networks has contributed to increased analysis throughput and accuracy of phenotype determination, such as the increased use and accuracy of neural networks for cancer and disease detection compared to the results of medical professionals [15,16,47,48]. By implementing similar techniques in the field of animal breeding, we hope to enhance the speed and accuracy of phenotype detection to streamline swift integration into genetic improvement programmes.

As part of this automated pipeline, a generative adversarial network was first trained to perform the image-to-image translation required for automatically processing previously unseen CT scan images for subsequent phenotype extraction using CV. Processing took 0.11 s per image, far faster than the approximately 30 min required to process an image manually. The resultant images had a structural similarity index (SSI) of 0.98 ± 0.0035 when compared to the manually processed ground truths and were visually indistinguishable from them. Automated phenotype extraction from predicted CT images was then performed by subdividing each image into the respective tissue masks to display the fat, muscle and bone volume and distribution.
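As an illustration of the tissue mask step, the following is a minimal sketch and not the notebook supplied as File S4. It uses the pixel intensity windows given in Section 2.2; the function names, the treatment of window boundaries (lower bound inclusive, upper bound exclusive) and the per-tissue fraction summary are assumptions made for this example.

```python
import numpy as np

# Pixel intensity windows reported in Section 2.2, as (lower, upper).
# Assumption: lower bound inclusive, upper bound exclusive, so that no
# pixel can be assigned to two tissue classes.
TISSUE_WINDOWS = {
    "fat": (800, 1000),
    "muscle": (1000, 1100),
    "bone": (1100, 1750),
}


def tissue_masks(image):
    """Return one boolean mask per tissue for a 2D array of pixel intensities."""
    image = np.asarray(image)
    return {
        name: (image >= low) & (image < high)
        for name, (low, high) in TISSUE_WINDOWS.items()
    }


def tissue_fractions(image):
    """Share of the classified pixels that falls into each tissue window.

    This summary is a hypothetical stand-in for the tissue distribution
    phenotype; pixels outside every window are ignored.
    """
    counts = {name: int(mask.sum()) for name, mask in tissue_masks(image).items()}
    total = sum(counts.values())
    return {name: (count / total if total else 0.0) for name, count in counts.items()}
```

Calling `tissue_fractions` on a predicted image and on its ground truth gives two sets of values that can be compared directly, which is the kind of comparison summarised in the accuracy figures reported here.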

By using key feature detection within the bone image mask, distances between ischium and femur bone cross-sections were calculated to determine the geometric phenotypes of gigot length and width. Phenotype values determined using the computer vision pipeline were on average 101.4% of the ground truth value with a standard deviation of 12.90% (n = 32), indicating a high level of accuracy across the population.
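The geometric measurement and the accuracy summary can be sketched as follows. This is an illustrative example only: it assumes the ischium and femur cross-sections have already been isolated as separate boolean masks (the feature detection step itself is not shown), `pixel_size_mm` is a hypothetical scale factor, and the use of the sample standard deviation is an assumption.

```python
import numpy as np


def centroid(mask):
    """Centre (row, col) of the True pixels in a boolean mask."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no pixels")
    return np.array([rows.mean(), cols.mean()])


def landmark_distance(mask_a, mask_b, pixel_size_mm=1.0):
    """Euclidean distance between the centres of two bone cross-sections.

    pixel_size_mm is a hypothetical scale factor; the true value depends
    on the scanner settings.
    """
    offset = centroid(mask_a) - centroid(mask_b)
    return float(np.linalg.norm(offset) * pixel_size_mm)


def percent_of_ground_truth(predicted, ground_truth):
    """Mean and sample standard deviation of predicted / ground truth, in percent."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    ratio = 100.0 * predicted / ground_truth
    return float(ratio.mean()), float(ratio.std(ddof=1))
```

A mean close to 100% with a small standard deviation indicates that the pipeline values track the manual measurements closely.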

One of the potential limitations of this study was the small training dataset (m = 126) as development of neural networks typically uses datasets numbering in the thousands. However, this limited dataset did not cause any major issues in accuracy as ground truth and predicted images showed an SSI of 0.98 ± 0.0035 and were indistinguishable by eye. One possible reason for such high accuracy with this limited dataset was that all subjects within the CT scans were constrained to similar postures. This hypothesis was later confirmed by re-introducing artificial random movement (such as rotation or vertical/horizontal shifts) into the images used for GAN training, resulting in a higher validation loss, poorer network performance and blurry resultant images (Supplementary Figures S1–S9).
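The kind of artificial movement described above can be sketched as a random shift applied to a training pair. This is a simplified illustration, not the training script supplied as File S2: it covers vertical/horizontal shifts only (rotation is omitted), the function names and the `max_shift` parameter are hypothetical, and applying the same shift to the raw image and its ground truth is an assumption.

```python
import numpy as np


def _overlap(n, d):
    """Source and destination slices for shifting an axis of length n by d pixels."""
    d = max(-n, min(n, d))  # a shift beyond the image size leaves nothing visible
    src = slice(max(0, -d), n - max(0, d))
    dst = slice(max(0, d), n + min(0, d))
    return src, dst


def shift_image(image, dy, dx, fill=0):
    """Shift a 2D image by (dy, dx) pixels, filling exposed pixels with `fill`."""
    image = np.asarray(image)
    out = np.full_like(image, fill)
    src_y, dst_y = _overlap(image.shape[0], dy)
    src_x, dst_x = _overlap(image.shape[1], dx)
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def random_paired_shift(raw, target, max_shift, rng):
    """Apply one random shift to both images of a (raw, ground truth) pair."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift_image(raw, dy, dx), shift_image(target, dy, dx)
```

For example, `random_paired_shift(raw, truth, max_shift=20, rng=np.random.default_rng(0))` returns a shifted pair; increasing `max_shift` corresponds to a greater degree of random translation.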

Unfortunately, with this limited dataset, one of the unseen images retained a small amount of testis tissue after processing with the GAN, and this tissue was then incorrectly quantified as fat. As more images are integrated into the model, we expect the accuracy of the GAN to improve, which should directly improve the precision of the CV phenotype determination pipeline.

#### **5. Conclusions**

In summary, we believe this research represents the first case of using an automated phenotype detection pipeline on agricultural animal medical images. This was achieved by using a combined GAN-CV pipeline to analyse agricultural medical images in a fully automated fashion. By feeding a paired image dataset into a GAN, we were able to perform the various image processing steps needed to produce a predicted image containing only the relevant tissues. These predicted images had a structural similarity of 98% to manually processed images, rivalling manual processing at a fraction of the cost. Phenotypes were then extracted or calculated from these predicted images by applying CV techniques as part of an automated pipeline.

We hope to immediately expand this highly accurate GAN-CV pipeline to process and extract phenotypes from other key CT scan sections such as the 8th thoracic vertebra and 5th lumbar vertebra positions. Further on, we hope to develop a pipeline to process a complete set of layered CT images to produce an accurate 3D model from which a multitude of phenotypes can then be extracted, such as spine length and vertebra number, and detect phenotypes which are best explored in 3D space such as organ morphology [49,50]. By continuing this research we will further expand the automated extraction of phenotypes from agricultural medical imaging data and use the findings to guide genetic and genomic breeding programmes.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/s21217268/s1, Performance of generator and discriminator networks under varying degrees of random translation (Figures S1–S6), Raw, ground truth and predicted images produced from the networks with varying degrees of random translation (Figures S7–S9), Training images (File S1), Script for GAN training (File S2), Unseen images (File S3), Computer vision python notebook (File S4), Results table (File S5), Trained GAN model (File S6).

**Author Contributions:** Conceptualization, M.C.; methodology, J.F.R.; software, J.F.R.; validation, M.C., S.J.D.; formal analysis, J.F.R.; investigation, J.F.R.; resources, M.C., S.J.D.; data curation, M.C.; writing—original draft preparation, J.F.R.; writing—review and editing, M.C., S.J.D.; visualization, J.F.R.; supervision, M.C.; project administration, M.C.; funding acquisition, M.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Edinburgh Genetic Evaluations Services (EGENES) using data collected under a number of other funded projects (RESAS and AHDB Beef and Lamb) as part of CBS 1033316.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki. All 2019–2020 procedures involving animals were approved by the SRUC Animal Ethics Committee and were performed under UK Home Office license (PPL P90111799), following the regulations of the Animals (Scientific Procedures) Act 1986.

**Data Availability Statement:** All raw and ground truth training images used in this study are included as part of Supplementary File S1. All raw, ground truth and predicted images from the unseen medical images are included as part of Supplementary File S3. The trained model is provided as Supplementary File S6.

**Acknowledgments:** This work was supported by Nicola Lambe, Kirsty McLean and John Gordon of the SRUC CT scanning and ovine research department, who collected the excellent paired and annotated training data. NVIDIA are acknowledged for their technical support of the DGX Station and software advice.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

