*2.3. Data Pre-Processing*

Following the field data acquisition by the UAV and its imaging components, data pre-processing was performed on remote computational units. The first step of the processing phase was to generate an orthomosaic map from each flight. For this process, the photogrammetric software Pix4D Mapper (Pix4D SA) was used, generating a total of three (3) RGB orthomosaics with a final GSD of 0.25 cm. In the following step, the single best orthomosaic was selected to ensure that the final dataset presented to the models contained no duplicates of the same crops, as duplicates would increase the initial bias of the experiment. After close inspection, the mosaic of the second flight was selected, as it was of slightly higher quality (fewer blurry spots and no holes), potentially indicating that flight conditions were better during that time window, allowing the UAV to complete its flight with fewer disruptions.

Once the mosaic was selected, the next step was to create the dataset that would be fed to the models. As the generated mosaic was georeferenced, an initial crop with a vector layer was performed in a GIS (QGIS 3.10) to eliminate the majority of the black, zero-valued pixels created during the mosaicking process (the exported mosaic is written using a minimum-bounding-box method, surrounding the mosaic map with black pixels to form a rectangle, where all of the bands are assigned a zero value for pixels that contain no data). This cropping served another purpose, as the next pre-processing step involved "cutting" the mosaic into smaller images so that they could be used as input for the models. This was performed using a Python script that iterated over the entire mosaic, copying the first X pixels in one direction and Y pixels in the other for all of the bands of the initial mosaic. In our case, the desired image dimensions were 500 × 500 pixels; therefore, the step of each loop (one per axis, as the rectangular mosaic is scanned) was set to 500, resulting in a dataset of square RGB images with a uniform resolution.
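The tiling loop described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: it assumes the cropped mosaic is already loaded as a NumPy array (e.g., via rasterio or Pillow), and it discards edge remainders narrower than the tile size, consistent with the fixed step of 500 pixels per axis.

```python
import numpy as np

def tile_mosaic(mosaic, tile=500):
    """Split an H x W x C mosaic array into non-overlapping tile x tile patches.

    The loop step equals the tile size, so patches never overlap; edge strips
    smaller than `tile` pixels are simply skipped.
    """
    h, w = mosaic.shape[:2]
    patches = []
    for y in range(0, h - tile + 1, tile):        # scan rows
        for x in range(0, w - tile + 1, tile):    # scan columns
            patches.append(mosaic[y:y + tile, x:x + tile])
    return patches

# Toy example: a 1200 x 1700 px 3-band "mosaic" yields 2 x 3 = 6 full tiles.
demo = np.zeros((1200, 1700, 3), dtype=np.uint8)
patches = tile_mosaic(demo)
```

Each resulting patch is a uniform 500 × 500 × 3 array that can be written out as an individual RGB image for annotation.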

The final step of the pre-processing involved object labelling on the individual images. In this phase, the generated dataset was imported into the Computer Vision Annotation Tool (CVAT), and the images were annotated using the ground truth labels as a basis (Figure 4). Finally, the annotations were exported in the PASCAL Visual Object Classes (VOC) format [35], as this bounding-box annotation format is required by TensorFlow. PASCAL VOC is an XML annotation format with pixel-based positioning: once drawn, each annotation is exported as an XML file containing the four coordinates (xmin, ymin, xmax, ymax) of each bounding box within the image. The annotations consisted of rectangular boxes, each assigned the respective maturity class of the broccoli head it contained. The final dataset contained a total of 288 images with over 640 annotations, where most of the bounding boxes had a square shape (Figure 6).
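The structure of a PASCAL VOC annotation file can be illustrated with a short sketch. The filename, class label, and box coordinates below are hypothetical examples, and only the core elements of the format (image size plus one `object` entry per bounding box) are shown:

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, boxes):
    """Build a minimal PASCAL VOC XML tree for one image.

    `boxes` is a list of (label, xmin, ymin, xmax, ymax) tuples, where the
    label would be the maturity class assigned to each broccoli head.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"          # RGB image
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label      # maturity class
        box = ET.SubElement(obj, "bndbox")
        ET.SubElement(box, "xmin").text = str(xmin)
        ET.SubElement(box, "ymin").text = str(ymin)
        ET.SubElement(box, "xmax").text = str(xmax)
        ET.SubElement(box, "ymax").text = str(ymax)
    return ET.ElementTree(root)

# Hypothetical tile with a single annotated broccoli head.
tree = voc_annotation("tile_0_0.jpg", 500, 500,
                      [("mature", 120, 80, 210, 170)])
```

Writing `tree` to disk (one XML file per image) yields annotations in the same layout that CVAT exports when the VOC format is selected.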

**Figure 6.** The dataset class representation (**a**) and annotation size plot (**b**).
