#### *2.4. Analysis*

To prepare the data for processing by the CNNs, all images were resized (Figure 2C). The dimensions of each image were rescaled to the input shape expected by the training algorithm (224 × 224 pixels for the bespoke CNN, VGG-16, and ResNet-50, and 299 × 299 pixels for InceptionV3). Resizing images in this manner also ensures that the magnification, resolution, or quality of the photos supplied to the tool does not affect classification.

Before transfer learning was undertaken, a benchmark was established using a bespoke CNN built in TensorFlow. The architecture comprised a ReLU-activated 3 × 3 convolutional input layer with 16 nodes, a (1, 1) stride, and 'same' padding (so that the output size equalled the input size). This input layer fed into three 3 × 3 ReLU-activated hidden convolutional layers with the same stride and padding, each doubling the number of nodes of the previous layer (i.e., 16, 32, 64, and 128). Each convolutional layer fed into a 2 × 2 pooling layer with a (2, 2) stride to reduce the feature-map dimensions and help prevent overfitting. The final hidden layer fed into a densely connected softmax output layer with two nodes, capturing either fertile (0) or infertile (1). As the model was a binary classifier, it was compiled using the 'Sparse Categorical Cross-Entropy' cost function, the 'Root-Mean-Squared Propagation' (RMSProp) optimiser, and the 'Binary Accuracy' metric [41]. The model was trained against two training sets to generate two classifiers: the first was trained against the original, pre-augmentation training set (367 images) and the second against the full training set including data augmentation (6973 images). During fitting, experimentation found that five epochs and a batch size of 32 produced the optimal performance. These two models provided benchmarks showing the impact of both data augmentation and transfer learning.
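A minimal Keras sketch of the bespoke architecture described above is given below. The filter counts, kernel sizes, strides, padding, output layer, and compilation settings follow the text; the exact `Sequential` layout and the use of a `Flatten` layer before the output are assumptions, as the original notebook is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the bespoke benchmark CNN: four 3x3 ReLU convolutional layers
# (16, 32, 64, 128 nodes), each followed by a 2x2 pooling layer, ending in
# a two-node softmax layer for fertile (0) vs. infertile (1).
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, (3, 3), strides=(1, 1), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),
    layers.Conv2D(32, (3, 3), strides=(1, 1), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),
    layers.Conv2D(64, (3, 3), strides=(1, 1), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),
    layers.Conv2D(128, (3, 3), strides=(1, 1), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),  # fertile (0) vs. infertile (1)
])

# Compilation settings as stated in the text [41].
model.compile(
    optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["binary_accuracy"],
)

# Fitting with the hyperparameters reported above (training data not shown):
# model.fit(train_images, train_labels, epochs=5, batch_size=32)
```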

Once the benchmarks were established, transfer learning was undertaken. The VGG-16 [26], ResNet-50 [27], and InceptionV3 [29] architectures, with parameters pretrained on the ImageNet dataset, were repurposed using the full training set (6973 images). The architectures' layers were frozen to preserve their ImageNet weights, but each was slightly altered for its new purpose: the output layer was replaced with a densely connected softmax layer with two outputs to accommodate the binary classification of 'fertile' or 'infertile'. These altered models were then compiled and fit to the training set. As each model is a deep network classifying a binary problem, all three were compiled using the 'Sparse Categorical Cross-Entropy' cost function, the 'Adam' optimiser, and the 'Binary Accuracy' metric [41]. The training data were then used to refine the architectures' predictive functions to detect and classify fertility status. When fitting, manual fine-tuning of the hyperparameters found that five epochs and a batch size of 32 maximised performance.
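The transfer-learning procedure can be sketched as below, using VGG-16 as the example (ResNet-50 and InceptionV3 follow the same pattern via their respective `keras.applications` constructors). The function name and the `Flatten` layer before the new output head are assumptions; the frozen ImageNet base, the two-output softmax replacement head, and the compilation settings follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_transfer_model(weights="imagenet"):
    """Repurpose a pretrained VGG-16 for binary fertile/infertile classification."""
    # Drop the original 1000-class ImageNet head (include_top=False) and
    # freeze the convolutional base to preserve the pretrained weights.
    base = VGG16(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False

    # Replace the output with a densely connected two-node softmax layer.
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),  # 'fertile' vs. 'infertile'
    ])

    # Compilation settings as stated in the text [41].
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["binary_accuracy"],
    )
    return model

# As used in the study (downloads ImageNet weights on first call):
# model = build_transfer_model("imagenet")
# model.fit(train_images, train_labels, epochs=5, batch_size=32)
```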

#### *2.5. Resources and Requirements*

Image pre-processing, data augmentation, and analysis were performed using the TensorFlow 2.4.1 library in Python through a Jupyter notebook created for this project by the lead author. All analyses were performed on a desktop computer with an Intel 2.20 GHz 10-core Xeon Silver 4114 CPU and 25.8 GB of RAM.
