**Citation:** Psiroukis, V.; Espejo-Garcia, B.; Chitos, A.; Dedousis, A.; Karantzalos, K.; Fountas, S. Assessment of Different Object Detectors for the Maturity Level Classification of Broccoli Crops Using UAV Imagery. *Remote Sens.* **2022**, *14*, 731. https://doi.org/10.3390/rs14030731

Academic Editor: Xanthoula Eirini Pantazi

Received: 14 January 2022; Accepted: 2 February 2022; Published: 4 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**1. Introduction**

The Brassicaceae are a family of flowering plants widely known for the multiple health benefits associated with their consumption [1]. Broccoli (*Brassica oleracea* L. var. *italica* Plenck) is one of the most popular crops of this family; its global production reached 25 million tons in 2020, approximately 10% of which was produced within Europe (Eurostat, 2020; FAOSTAT, 2020). Organic broccoli is an example of a high-value crop that requires delicate handling both throughout the growing season and post-harvest. In conventional farming, broccoli is mainly harvested mechanically, as the produce is typically intended for the processing market (deep freezing). Organic broccoli, in contrast, is targeted at the fresh market, and because the heads are easily damaged, resulting in visible stains, it is still harvested 'on sight' by hand using handheld knives. This also imposes a very strict time window of "optimal maturity" within which high-end quality broccoli heads must be harvested, before they remain exposed for too long in high-humidity conditions and become susceptible to fungal infections and quality degradation. Even slight delays beyond this time window can result in major losses in the final production (Figure 1). However, manual harvesting is a very laborious task, not only because of the harvesting process itself, but also because of the scouting required to identify the field segments where broccoli plants have reached this maturity level. Moreover, scouting is performed on foot, as agricultural vehicles crossing the fields cause soil compaction, which is highly undesirable in horticulture, especially in organic systems [2].

**Figure 1.** Examples of broccoli fields in full bloom, illustrating yield losses due to quality degradation.

This case creates a very interesting challenge. First, the scouting process can be automated using machine learning, drastically increasing overall efficiency and reducing the human effort required. At the same time, UAVs can act as a double-benefit factor: they can rapidly survey large areas while eliminating potential soil-compaction problems. There is a growing need for automated horticultural operations, both due to increasing uncertainty in the reliability of labour and to allow for more targeted, data-driven harvesting [3]. To this end, images captured from Unmanned Aerial Vehicles (UAVs) could replace the manual labour of field observation. UAVs are widely used in precision agriculture for image capture and the detection of specific conditions in the field [4]. Compared to the time-consuming work that a group of people must perform to find a potential problem in a crop, UAVs can quickly provide a high-resolution image of the field. This image, combined with computer vision techniques, can reveal the potential problem or condition that needs to be addressed [5].

Object detection is a primary field of computer vision, concerned with determining the location of certain objects in an image and then classifying those objects [6]. The first methods used to address this problem consisted of two stages: (1) a feature extraction stage, in which candidate areas in the image are identified using sliding windows of different sizes, and (2) a classification stage, in which the classes of the detected objects are estimated. A common implementation of the classification step is the sliding-window approach, where the classifier runs at evenly spaced locations over the entire image. Object detection algorithms are evaluated on both the speed and the accuracy they demonstrate, but optimising for both simultaneously can be a very challenging task [6].
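
The sliding-window idea described above can be sketched in a few lines. The following minimal illustration (function names are illustrative, not from the cited works) enumerates evenly spaced crop coordinates for several window sizes; a classifier would then score each crop:

```python
def sliding_windows(img_w, img_h, win_sizes, stride):
    """Yield (x, y, w, h) crop coordinates at evenly spaced locations
    for each requested window size, covering the whole image."""
    for w, h in win_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)

# In a two-stage pipeline, a (hypothetical) classifier scores each crop:
# for x, y, w, h in sliding_windows(640, 480, [(64, 64), (128, 128)], 32):
#     score = classifier(image[y:y + h, x:x + w])
```

The nested loops make the cost explicit: the number of classifier evaluations grows with image size, the number of window scales, and the inverse of the stride, which is precisely why this approach is slow compared to the detectors discussed next.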

Object detection techniques apply classifiers or localizers to perform detections on images at multiple locations and scales. Approaches like the R-CNN and its variations use region proposal methods to first generate several potential bounding boxes across the image, and then run the classifier on these proposed boxes. During training with these approaches, post-processing steps refine the bounding boxes after each classification, typically by increasing the scores of the best-performing bounding boxes and decreasing those of the worse ones, ultimately eliminating potential duplicate detections [7]. Faster R-CNN [8] is one of the most widely used two-stage object detectors. Its first stage uses a region proposal network, an attention mechanism developed as an alternative to the earlier sliding-window-based approaches. In the second stage, bounding box regression and object classification are performed. Faster R-CNN is widely recognized as a successful architecture for object detection, but it is not the only meta-architecture able to reach state-of-the-art results [9]. In contrast, single-shot detectors, such as SSD [10] and RetinaNet [11], integrate the entire object detection process into a single neural network to generate each bounding box prediction.
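
The duplicate-elimination step mentioned above is commonly implemented as greedy non-maximum suppression (NMS) over the intersection-over-union (IoU) of box pairs. A minimal sketch (not the exact post-processing of any specific detector cited here):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any remaining box
    that overlaps it above the threshold, and repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two near-identical detections of the same broccoli head (high mutual IoU) collapse to the single higher-scoring box, while a distant detection survives.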

Automation in agriculture presents a more challenging situation than industrial automation due to field conditions and the outdoor environment in general [12,13]. Fundamentally, most tasks demand highly accurate crop detection and localization, as both are critical components of any automated task in agriculture [14]. The steady decline in the available agricultural labour force [15] compounds this problem and makes the automation of several aspects of production a necessity. Accurate crop detection and classification are essential for several applications [5], including crop/fruit counting and yield estimation. Crop detection is often the preliminary step, followed by a classification operation, such as the quantification of infestation levels through the identification of disease symptoms [16], or, as in the present paper, maturity detection for the automation of crop surveying. At the same time, it is the single most crucial component of automated real-time actuation tasks, such as targeted spraying applications or robotic harvesting.
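
To make the counting/yield-estimation use case concrete, detector output can be aggregated per maturity class after confidence filtering. The sketch below assumes a hypothetical detector output format of `(class_label, score)` pairs; the labels and threshold are illustrative:

```python
from collections import Counter

def count_by_class(detections, min_score=0.5):
    """Count detections per maturity class, keeping only
    predictions at or above the confidence threshold."""
    return Counter(label for label, score in detections if score >= min_score)

# Hypothetical detections over a field image:
dets = [("mature", 0.92), ("immature", 0.81), ("mature", 0.43), ("mature", 0.77)]
# count_by_class(dets) -> Counter({'mature': 2, 'immature': 1})
```

Summed over all images of a survey flight, such counts give a per-class estimate of how many heads are harvest-ready, which is the quantity a scouting operation ultimately needs.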

Focusing on horticultural crops, automated growth-stage identification has been an open challenge for multiple decades due to the very nature of these crops, which are mostly high-value and demand timely interventions to maintain top yield quality. Different approaches have therefore been implemented to automatically map crop growth across larger fields and to assist harvesting, either by correlating the images' frequency bands with broccoli head sizes for maturity detection [17–19] or by combining image analysis techniques and neural networks to identify broccoli quality parameters [20].

As developments in computer vision allowed research to move from simple image analysis frameworks to more complex and automated pipelines, interest shifted towards Artificial Intelligence. Commercial RGB cameras and machine learning algorithms can provide affordable and versatile solutions for crop detection. Computer vision systems based on deep CNNs [21] are robust to variations in illumination and large inter-class variability [22], both of which have posed challenges in agricultural imaging in the past, thus achieving robust recognition of targets in open-field conditions. Recent research [23–25] has shown that the Faster R-CNN (region-based convolutional neural network) architecture [8,26] and different YOLO model versions [27–29] can produce accurate results for a large set of horticultural crops and fruit orchards. Moreover, comparisons of different computer vision techniques, such as object detection and object segmentation [30], have provided significant improvements in crop detection. In the recent horticultural research literature, several studies have also applied deep learning techniques to localize broccoli heads, without any evaluation of their maturity [31–34].
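
When detectors such as Faster R-CNN or YOLO are compared, accuracy is typically scored by matching predicted boxes to ground-truth boxes at a fixed IoU threshold. The following is a simplified sketch of that matching (a full mAP computation additionally sweeps confidence thresholds and averages over classes, which is omitted here):

```python
def precision_recall(pred_boxes, gt_boxes, iou_thr=0.5):
    """Greedily match predictions (assumed sorted by descending confidence)
    to ground-truth boxes at a fixed IoU threshold; each ground truth can
    be matched at most once. Boxes are (x1, y1, x2, y2)."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    matched, tp = set(), 0
    for p in pred_boxes:
        best_j, best_iou = None, iou_thr
        for j, g in enumerate(gt_boxes):
            if j not in matched and iou(p, g) >= best_iou:
                best_j, best_iou = j, iou(p, g)
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall
```

A prediction overlapping a ground-truth head above the threshold counts as a true positive; unmatched predictions lower precision, and unmatched ground truths lower recall.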

The objective of this study was to compare state-of-the-art object detection architectures and data augmentation techniques, and to assess their potential for the maturity classification of open-field broccoli crops, using a high-resolution (low Ground Sampling Distance, GSD) RGB image dataset collected from low-altitude UAV flights. The best-performing architecture and trained model could then be tested as a prototype for real-time UAV detections, in order to assist on-field broccoli maturity detection.
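
For readers unfamiliar with the term, GSD ties flight altitude to image resolution: for a nadir-pointing camera, the ground distance covered by one pixel is proportional to altitude and inversely proportional to focal length. A short sketch with illustrative sensor parameters (not those of the platform used in this study):

```python
def gsd_cm_per_px(altitude_m, focal_mm, sensor_width_mm, image_width_px):
    """Ground Sampling Distance (cm per pixel) for a nadir-pointing camera:
    GSD = (sensor width * altitude) / (focal length * image width)."""
    return (sensor_width_mm * altitude_m * 100.0) / (focal_mm * image_width_px)

# Hypothetical RGB sensor: 13.2 mm wide, 8.8 mm focal length, 5472 px across.
# At 20 m altitude the GSD is roughly half a centimetre per pixel,
# fine enough to resolve individual broccoli heads; doubling the
# altitude would double (coarsen) the GSD.
```

This is why low-altitude flights are the prerequisite for the fine-GSD dataset described above: resolution degrades linearly with altitude.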
