**2. Related Work**

The research presented in this article concerns the use of data augmentation to create synthetic image datasets and of object detection models to identify mummy berry disease, which affects wild blueberry productivity. The literature review in this section is therefore divided into two subsections. The first lists the techniques reported in the literature for using data augmentation to create synthetic image datasets, while the second details the various machine learning and deep learning algorithms reported for plant disease identification.

#### *2.1. Data Augmentation*

To build robust deep-learning models, it is important to ensure that the validation error decreases along with the training error, i.e., that the gap between the two remains small. Data augmentation is the approach most widely reported in the literature to achieve this [30].

Recently, cut-and-paste data augmentation has become popular in object detection [31] and instance segmentation. Khoreva et al. [32] used the cut-and-paste method to generate pairs of synthetic images for video object segmentation. However, object positions are sampled uniformly and the changes between image pairs must be kept small, which does not work for image-level instance segmentation. Ghiasi et al. [33] proposed a copy–paste method that randomly selects a segmented object and pastes it at a random location onto a background image without considering its visual state; the high performance and efficiency of this method was verified experimentally. The authors of [34] presented a simple yet effective approach that took the VOC2007 object detection dataset, cut out objects according to their ground-truth labels, and pasted them onto images with different backgrounds. With this naive approach, the authors showed a significant improvement in object detection models such as YOLO [35] and SSD [36]. Khalil et al. [37] proposed a new method for augmenting annotated training datasets for object detection that relocates objects, based on their segmentation masks, to new backgrounds, introducing changes in object properties such as spatial location, surrounding context, and scale. In [31], the authors proposed a context model that places segmented objects onto backgrounds with the proper context, and demonstrated that this technique can improve object detection on the Pascal VOC dataset; however, the method requires extra model training and off-line data preprocessing. The authors of [38] explored augmenting the training dataset with annotated instance masks placed according to a location probability map, which effectively improved the generalization ability of the resulting models. Abayomi-Alli et al. [39] proposed a novel histogram transformation approach that generates synthetic images from low-quality test images by applying Gaussian blurring, motion blurring, resolution down-sampling, and overexposure; enlarging a cassava leaf disease dataset in this way improved the accuracy of a modified MobileNetV2 model. Nair and Hinton [40] expanded and enriched their training data by random cropping and horizontal reflection. They also applied principal component analysis (PCA) to the color space to change the intensities of the RGB (red, green, blue) channels, and performed additional geometric and color transformations on the dataset. However, this method is based on simple transformations and cannot simulate the higher levels of complexity inherent in field environments.
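The copy- and cut-and-paste strategies surveyed above share one core operation: transferring the masked pixels of a segmented object onto a new background at a sampled location. The following is a minimal NumPy sketch of that idea under uniform placement; it is an illustration, not the implementation of any cited work, and the function name and arguments are our own assumptions:

```python
import numpy as np

def copy_paste(background, obj_patch, obj_mask, rng=None):
    """Paste a segmented object at a random location on a background image.

    background: (H, W, 3) uint8 image
    obj_patch:  (h, w, 3) uint8 crop containing the object
    obj_mask:   (h, w) boolean segmentation mask of the object
    Returns the augmented image and the pasted bounding box (x, y, w, h),
    which can serve as a new ground-truth annotation.
    """
    rng = rng if rng is not None else np.random.default_rng()
    H, W = background.shape[:2]
    h, w = obj_mask.shape
    # Sample a top-left corner uniformly so the object fits inside the image.
    y = int(rng.integers(0, H - h + 1))
    x = int(rng.integers(0, W - w + 1))
    out = background.copy()
    region = out[y:y + h, x:x + w]
    # Copy only the masked (object) pixels; unmasked background pixels survive.
    region[obj_mask] = obj_patch[obj_mask]
    return out, (x, y, w, h)
```

Context-aware variants such as [31] or [44] would replace the uniform sampling of `(x, y)` with a learned placement distribution or scene-geometry constraints.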

Other recent works on image analysis [41,42] built and trained models purely on synthetically rendered 2D and 3D scenes. However, it is difficult to guarantee that models trained on synthetic images will generalize well to real field-collected data, as rendering introduces significant changes in image statistics. To address this problem, Gupta et al. [43] adopted a different approach that embeds real segmented objects into natural images, which reduces the presence of artifacts. The authors of [44] estimated scene geometry and spatial orientation before synthetically placing objects, generating realistic training examples for the task of object instance detection.

#### *2.2. Deep Learning for Plant Disease Detection*

With the aim of developing effective plant disease detection systems, an increasing number of research studies have focused on plant disease classification and detection in recent years. Qu and Sun [15] proposed a lightweight deep learning model that can be deployed on embedded devices to detect mummy berry disease in a real environment. The model uses MobileNetV1 as the main network and adopts multi-scale feature extraction that combines dilated and depth-wise convolutions in parallel. In addition, at the end of the model, a feature filtering module based on a channel attention mechanism is employed to improve classification performance. Fuentes et al. [45] presented a method for detecting and identifying diseases and pests of tomatoes captured by camera equipment at different levels of resolution. To find a suitable deep learning architecture, their study combined three main families of detectors, the fast region-based convolutional neural network (Fast R-CNN), the region-based fully convolutional network (R-FCN), and the single shot multibox detector (SSD), with VGG Net and residual net backbones to effectively identify nine different types of diseases and pests. Roy and Bhaduri [46] developed a deep learning based multi-class apple plant disease detection method that achieved a mean average precision of 91.2% and an F1-score of 95.9%. The model was modified to optimize accuracy and was validated by detecting diseases under complex orchard scenarios. Qi et al. [20] proposed a method for recognizing tomato virus disease based on an improved SE-Yolov5 network model, in which a squeeze-and-excitation (SE) module was added to a Yolov5 model to focus the network on the salient visual features of tomato virus disease. This approach improved the performance of the network.
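To make the SE mechanism mentioned above concrete, the block "squeezes" each channel to a scalar by global average pooling, passes the result through a small bottleneck network, and uses the sigmoid outputs to reweight the channels. The NumPy sketch below illustrates this computation only; it is not the SE-Yolov5 implementation of [20], and the weight shapes and reduction ratio are illustrative assumptions:

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation channel reweighting (illustrative sketch).

    x:  (C, H, W) feature map
    w1: (C // r, C) reduction weights of the bottleneck (r = reduction ratio)
    w2: (C, C // r) expansion weights of the bottleneck
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gating -> (C,) in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Scale: reweight each channel of the input feature map.
    return x * s[:, None, None]
```

Because the gates depend on the pooled channel statistics, the network can learn to amplify channels that respond to disease-relevant features and suppress the rest.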

#### **3. Materials and Methods**

In this section, we briefly present the field-collected image dataset that was used for model training and evaluation, as well as for generating synthetic images. We then introduce the synthetic dataset generation method for object detection tasks. The section concludes with a description of an improved Yolov5 model based on an attention mechanism and of the evaluation metrics.

#### *3.1. Data Source*

The first step in developing a deep learning model is to prepare a dataset. As the primary source of data in this study, images of healthy and diseased flowers, fruits, and leaves of the blueberry crop in a field environment with complex backgrounds were obtained from the University of Maine wild blueberry experimental fields at Blueberry Hill Farm, Jonesboro, ME, USA [47]. However, the total number of field images collected was not adequate for training a deep learning network. Therefore, to achieve high performance and reduce the risk of overfitting a predictive model for mummy berry disease detection, we first produced annotated synthetic images with complex backgrounds that mimicked real field situations. We then collected images of blueberries with mummy berry disease from online sources such as the National Ecological Observatory Network (www.bugwood.org, accessed on 23 April 2022) and Google Images (www.google.com, accessed on 2 May 2022) to add variety to the training images, as deep learning models show better results and higher generalization ability when a large dataset is available. A total of 459 field images of blueberries with mummy berry disease were obtained from the University of Maine wild blueberry experimental fields and online sources. Based on these field images, a total of 1661 annotated images were produced by the synthetic data generation method (Table A1 in Appendix A).
