**4. Discussion**

This study aimed to use deep learning, data augmentation, and transfer learning to develop an automatic method for the classification of mosquito fecundity. It was determined that, for a solution to this problem to be appropriate, it must (1) require no, or limited, expert knowledge to categorise an image, (2) achieve close to the human accuracy rate of 99–100%, (3) be in an easily distributable, non-proprietary, and low-cost format, and (4) classify an image faster than the estimated 2 s taken by human experts.

As such, we propose that a ResNet-50 CNN architecture [27], pre-trained on the ImageNet database, be repurposed and fine-tuned to classify the fertility status ('fertile' or 'infertile') of *Anopheles* mosquito ovaries. Classification was based on Christophers' stages of egg development [15], with eggs in stage V classed as 'fertile' and eggs remaining in stages I–IV labelled as 'infertile.' Here, we show that such a model is capable of automatically classifying 157 images with a 94% accuracy rate in less than 40 s, or roughly 0.25 s per image. Furthermore, as the model is built using TensorFlow 2.4.1, it uses a freely available, accessible, and robust open-source technology that is easily distributable via the web or mobile phones [41]. Consequently, the approach detailed in this study meets three of its four aims: it does not require an expert to categorise an image, it is easily distributable in a free format, and it can classify images faster than an expert. The accuracy of the model does not match that of a human expert, but at 94% it is only 5–6 percentage points below the 99–100% achieved by trained experts. Furthermore, it is likely that this accuracy rate can be raised as more data become available.
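The comparison between the reported figures and the two quantitative aims (speed and accuracy) can be made explicit with a little arithmetic. The sketch below is illustrative only: the constants are the values quoted in the text, 40 s is treated as an upper bound on total classification time, and the function names are our own rather than part of the published code.

```python
# Figures quoted in the text; 40 s is an upper bound ("less than 40 s").
N_IMAGES = 157
TOTAL_SECONDS = 40.0
MODEL_ACCURACY = 0.94
HUMAN_ACCURACY_RANGE = (0.99, 1.00)
HUMAN_SECONDS_PER_IMAGE = 2.0  # estimated time taken by a human expert


def per_image_seconds(total_seconds: float, n_images: int) -> float:
    """Average classification time per image, in seconds."""
    return total_seconds / n_images


def accuracy_gap_points(model_acc: float, human_acc: float) -> float:
    """Gap between human and model accuracy, in percentage points."""
    return round((human_acc - model_acc) * 100, 1)


if __name__ == "__main__":
    t = per_image_seconds(TOTAL_SECONDS, N_IMAGES)
    print(f"Model: at most {t:.2f} s per image "
          f"(human estimate: {HUMAN_SECONDS_PER_IMAGE:.0f} s)")
    print(f"Speed aim met: {t < HUMAN_SECONDS_PER_IMAGE}")

    low, high = (accuracy_gap_points(MODEL_ACCURACY, h)
                 for h in HUMAN_ACCURACY_RANGE)
    print(f"Accuracy gap: {low} to {high} percentage points")
```

Expressing the gap in percentage points rather than as a relative percentage avoids ambiguity about what the shortfall is measured against.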

Such a model is useful when assessing the efficacy of PPF-based tools through measurements of induced sterility in laboratory-reared and field-collected populations of mosquitoes [12], and it could be particularly useful for the large-volume bioassays performed to monitor the durability of the bio-efficacy of PPF-treated ITNs distributed in disease-endemic communities over time. It can also be used in bioassays performed during resistance monitoring, whereby field-collected females are exposed to a discriminating concentration of PPF to measure induced sterility [8]. This is a practical and accessible tool available to all researchers studying the efficacy of PPF or other insecticides with a similar mode of action.

Although it offers several advancements over the existing manual method for classifying ovary status via dissection and examination, the model presented here is subject to its own limitations. First, machine learning will not remove the need for trained technicians to dissect ovaries, only the need for them to assess fertility status. Consequently, some equipment and expertise to dissect samples and to take digital colour images are still required to use the model. However, as taking photos of dissected ovaries is standard practice for record keeping and quality control, this model's need for images should not create additional work; rather, it should increase objectivity and reproducibility while removing the need for a second trained technician to confirm classification. A second limitation of the current model comes from the dataset used in its training. As only pyrethroid-resistant *Anopheles* mosquito ovaries exposed to PPF were included in this study, its results are not generalisable to other species, arthropods, or insecticides. Thirdly, as there are no established dissection and imaging guidelines for capturing mosquito ovaries, there may be considerable divergence between the methods and tools employed at different sites. This may mean that the model is currently only generalisable to those locations that use techniques similar to those detailed in this paper's methods. However, the scale of this divergence, if any, is not currently known. Lastly, although a distributable application of the ResNet-50 model is currently in development, a version of the tool accessible via the internet is not yet available. Consequently, some knowledge of Python is currently necessary to employ the classifier.

It is likely that further developments can improve performance and accessibility. For example, to increase the accuracy and applicability of the classification tool, the training set could be expanded to include samples exposed to other growth regulators or insecticides of interest, images from a broader range of sites, or other species of mosquito (including all cryptic subspecies of the *An. gambiae* complex). Additionally, accuracy and generalisability may be increased through the use of a fuzzy image classifier or classification using fuzzy logic, rather than a CNN. This alternative approach may improve performance, as it could account for any ambiguity in the image dataset [42]. Furthermore, as the current model is limited to a binary classification of 'fertile' or 'infertile', it could be developed to capture the five Christophers' stages of egg development or to count the number of eggs in the dissected ovaries. Moreover, although the use of images from multiple locations in this study should ensure that the model is robust enough to deal with differences in the dissection and imaging of samples, standard operating procedures concerning dissection and imaging need to be developed to support the use of the classification tool. Lastly, the ResNet-50 model is currently only available via a Jupyter notebook; however, a version that can be accessed via the web is in development, which would make a free and easy-to-use version of the model widely available.
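If the classifier were extended to predict the five stages of egg development, its output could still be reduced to the binary scheme used in this study, since stage V maps to 'fertile' and stages I–IV map to 'infertile'. The following is a minimal sketch of that reduction, assuming a hypothetical five-stage model that returns one score per stage; the function name, the dictionary format, and the 0.5 threshold are all illustrative assumptions, not part of the present work.

```python
from typing import Dict

STAGES = ("I", "II", "III", "IV", "V")


def collapse_to_binary(stage_scores: Dict[str, float],
                       threshold: float = 0.5) -> str:
    """Reduce hypothetical per-stage scores to 'fertile' or 'infertile'.

    Stage V counts as fertile; stages I-IV count as infertile,
    following the labelling rule used in this study.
    """
    missing = set(STAGES) - set(stage_scores)
    if missing:
        raise ValueError(f"missing stages: {sorted(missing)}")

    total = sum(stage_scores[s] for s in STAGES)
    if total <= 0:
        raise ValueError("stage scores must sum to a positive value")

    # Normalise so the scores behave like probabilities.
    p_fertile = stage_scores["V"] / total
    return "fertile" if p_fertile >= threshold else "infertile"


if __name__ == "__main__":
    # Illustrative scores only; not real model output.
    example = {"I": 0.05, "II": 0.05, "III": 0.10, "IV": 0.10, "V": 0.70}
    print(collapse_to_binary(example))
```

A reduction of this kind would allow results from a staged classifier to be compared directly with the binary results reported here.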
