**Comparison of Object Detection and Patch-Based Classification Deep Learning Models on Mid- to Late-Season Weed Detection in UAV Imagery**

**Arun Narenthiran Veeranampalayam Sivakumar <sup>1</sup>, Jiating Li <sup>1</sup>, Stephen Scott <sup>2</sup>, Eric Psota <sup>3</sup>, Amit J. Jhala <sup>4</sup>, Joe D. Luck <sup>1</sup> and Yeyin Shi <sup>1,</sup>\***


Received: 30 April 2020; Accepted: 29 June 2020; Published: 3 July 2020

**Abstract:** Mid- to late-season weeds that escape routine early-season weed management threaten agricultural production by producing a large number of seeds for several future growing seasons. Rapid and accurate detection of weed patches in the field is the first step of site-specific weed management. In this study, object detection-based convolutional neural network models were trained and evaluated on low-altitude unmanned aerial vehicle (UAV) imagery for mid- to late-season weed detection in soybean fields. The performance of two object detection models, Faster RCNN and the Single Shot Detector (SSD), was evaluated and compared in terms of weed detection performance, using mean Intersection over Union (IoU) and inference speed. The Faster RCNN model with 200 box proposals performed similarly well to the SSD model in terms of precision, recall, F1 score, and IoU, with a similar inference time. The precision, recall, F1 score, and IoU were 0.65, 0.68, 0.66, and 0.85 for Faster RCNN with 200 proposals, and 0.66, 0.68, 0.67, and 0.84 for SSD, respectively. However, the optimal confidence threshold of the SSD model was much lower than that of the Faster RCNN model, indicating that SSD may have lower generalization performance than Faster RCNN for mid- to late-season weed detection in soybean fields using UAV imagery. The performance of the object detection models was also compared with a patch-based CNN model. The Faster RCNN model yielded better weed detection performance than the patch-based CNN both with and without overlap. The inference time of Faster RCNN was similar to that of the patch-based CNN without overlap, but significantly less than that of the patch-based CNN with overlap. Hence, Faster RCNN was found to be the best model, in terms of both weed detection performance and inference time, among the models compared in this study. This work is important for understanding the potential of, and identifying suitable algorithms for, on-farm, near real-time weed detection and management.

**Keywords:** CNN; Faster RCNN; SSD; Inception v2; patch-based CNN; MobileNet v2; detection performance; inference time

### **1. Introduction**

Weeds are unwanted plants that grow in the field and compete with crops for water, light, nutrients, and space. If uncontrolled, weeds can have several negative consequences, such as crop yield loss, the production of a large number of seeds that creates a weed seed bank in the field, and the contamination of grain during harvesting [1,2]. Traditionally, weed management programs control weeds through chemical or mechanical means, such as the uniform application of herbicides throughout the field. However, the spatial density of weeds is not uniform across the field, leading to the overuse of chemicals, which in turn results in environmental concerns and the evolution of herbicide-resistant weeds. To overcome this issue, the concept of site-specific weed management (SSWM), which refers to detecting weed patches and spot-spraying or removing them by mechanical means, was proposed in the early 1990s [3–5]. Weed control early in the season is critical, since otherwise weeds would compete with crops for resources during the crops' critical growth stage, resulting in possible yield loss [6,7]. Therefore, in addition to the application of pre-emergence herbicides, the early application of post-emergence herbicides is preferred for effective weed control and to reduce damage to crops. The effectiveness of weed control from post-emergence herbicides depends on the timing of application [8,9]. Accurate and timely detection of early season weeds helps in the creation of prescription maps for the site-specific application of post-emergence herbicides [10–12]. Prescription maps for post-emergence application can also be created from late-season weeds detected during previous seasons [13–17]. Compared to early season weeds, late-season weeds do not directly affect crop yield, since they are not competing for resources during the crop's critical growth period. However, if left unattended, late-season weeds can produce large numbers of seeds, creating problems in subsequent growing seasons. Therefore, the detection and control of late-season weeds can be complementary to early season weed control.

Earlier studies on weed detection often used Color Co-occurrence Matrix-based texture analysis of digital images [18,19]. Subsequently, several studies combined optical sensing, image processing algorithms, and variable-rate application implements for real-time site-specific herbicide application on weeds. However, the speed of these systems was limited by the computational power available for real-time detection, which in turn limited their ability to cover large field areas [20]. Unmanned aerial vehicles (UAVs), with their ability to cover large areas in a short amount of time and their payload capacity to carry optical sensors, provide an alternative. UAVs have been studied for various applications in precision farming, such as the detection of weeds, diseases, pests, and biotic and abiotic stresses using high-resolution aerial imagery [21–24]. Several studies have investigated the potential of using remote sensing to discriminate between crops and weeds for weed mapping at different phenological stages and found that results differ based on phenology [2,10,25–33]. The similar spectral signatures of crops and weeds, the occurrence of weeds as small patches, and the interference of soil pixels are the major challenges for remote sensing in early season weed detection [2,12]. A common approach is to use vegetation indices to segment the vegetation pixels from the soil pixels, followed by crop row detection for weed classification using techniques such as object-based image analysis (OBIA) and the Hough transform [29,32,34]. However, crop row detection-based approaches cannot detect intra-row weeds. Hence, machine learning-based classifiers using features computed from OBIA were also used to detect intra-row weeds [10]. However, the performance of OBIA is sensitive to segmentation accuracy, so optimal parameters for the segmentation step in OBIA have to be found for different crops and field conditions [35].
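To illustrate the vegetation-index segmentation step mentioned above, the following is a minimal sketch using the widely used Excess Green index (ExG = 2g − r − b on chromatic coordinates); the threshold value and the choice of ExG are illustrative assumptions, not the exact settings of the cited studies:

```python
import numpy as np

def segment_vegetation(rgb, threshold=0.1):
    """Mask vegetation pixels using the Excess Green index (ExG = 2g - r - b).

    The threshold is an illustrative assumption; in practice it is often
    chosen automatically, e.g., by Otsu's method.
    """
    img = rgb.astype(np.float64)
    total = img.sum(axis=2) + 1e-8                       # avoid division by zero
    r, g, b = (img[..., i] / total for i in range(3))    # chromatic coordinates
    exg = 2 * g - r - b
    return exg > threshold                               # boolean vegetation mask

# Synthetic 2x2 patch: left column green (vegetation), right column soil-colored
patch = np.array([[[40, 180, 40], [120, 100, 80]],
                  [[30, 160, 50], [130, 110, 90]]], dtype=np.uint8)
mask = segment_vegetation(patch)  # True for green pixels, False for soil pixels
```

Such a mask is what crop-row detection and OBIA pipelines typically operate on; as noted later, it is unavailable after canopy closure, when no soil pixels remain.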

With advancements in parallel computing and the availability of large datasets, convolutional neural networks (CNNs) were found to perform very well in computer vision tasks such as classification, prediction, and object detection [36]. In addition to performance, another principal advantage of CNNs is that the network learns the features by itself during training, so manual feature engineering is not necessary. CNNs have been studied for various image-based applications in agriculture, such as weed detection, disease detection, fruit counting, crop yield estimation, obstacle detection for autonomous farm machines, and soil moisture estimation [37–41]. CNNs have been used for weed detection with data obtained in three different ways: from UAVs, from autonomous ground robots, and from high-resolution images collected manually in the field. A simple CNN binary classifier was trained to classify manually collected small high-resolution images of maize and weeds [42,43]. The performance of the classifier with transfer learning on various pre-trained networks such as LeNet and AlexNet was compared, but this study was limited in the variability of the dataset and in the evaluation of the classification approach on large images. Dyrmann et al. [23] used a pre-trained VGG-16 network and replaced the fully connected layer with a deconvolution layer to output a pixel-wise classification map of maize, weeds, and soil. The training images were simulated by overlaying a small number of available images of soil, maize, and weeds at various sizes and orientations. The use of an encoder-decoder architecture for the real-time output of pixel-wise classification maps for site-specific spraying was also studied. It was found that adding hand-crafted features such as vegetation indices, different color spaces, and edges as input channels to the CNN improved the model's ability to generalize to different locations and to different growth stages of the crop [44–46]. Furthermore, to improve the generalization performance of CNN-based weed detection, Lottes et al. [25] studied a fully-convolutional DenseNet with spatiotemporal fusion and a spatiotemporal decoder over sequential images to learn the local geometry of crops planted in straight lines along the path of a ground robot. For overlapping crop and weed objects, Lottes et al. [15] proposed a keypoint-based feature extraction approach to detect weed objects that overlap with the crop. In addition to weed detection, the effective removal of weeds by mechanical or laser-based methods requires detecting the stem locations of weeds prior to actuation. A fully-convolutional DenseNet was trained to output stem locations as well as a pixel-wise segmentation map of crops and weeds [47,48].
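The idea of supplying hand-crafted features as extra input channels can be sketched as follows; the specific channels chosen here (an ExG vegetation index and a simple gradient-magnitude edge map) are illustrative assumptions, not the exact feature set of the cited studies:

```python
import numpy as np

def build_input_tensor(rgb):
    """Stack RGB with hand-crafted channels as extra CNN inputs.

    Illustrative sketch: the extra channels (ExG and a gradient-magnitude
    edge map) stand in for the vegetation indices, color spaces, and edges
    described in the text; the exact set varies by study.
    """
    img = rgb.astype(np.float64) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    exg = 2 * g - r - b                 # vegetation-index channel
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)          # image gradients along rows/columns
    edges = np.hypot(gx, gy)            # crude edge-strength channel
    # Stack along the channel axis: 3 color + 2 engineered = 5 channels
    return np.dstack([img, exg, edges])

x = build_input_tensor(np.zeros((64, 64, 3), dtype=np.uint8))
# x has shape (64, 64, 5); the first convolutional layer of the CNN is
# simply configured to accept 5 input channels instead of 3.
```

The appeal of this design is that the network still learns its own features, while the fixed channels inject domain knowledge that transfers across locations and growth stages.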

In the case of weed detection using UAV imagery, similar to the OBIA approaches mentioned above, dos Santos Ferreira et al. [3] used a superpixel segmentation algorithm to segment objects and trained a CNN to classify the resulting clusters, then compared its performance with other machine learning classifiers that use handcrafted features. Sa et al. [27] studied the use of an encoder-decoder architecture, SegNet, for the pixel-wise classification of multispectral imagery and followed up with a performance evaluation of this detection system using different UAV platforms and multispectral cameras [49–51]. Bah et al. [29] used the Hough transform along with a patch-based CNN to detect weeds from UAV imagery and found that overlapping weed and crop objects led to some errors in this approach. It should be noted that, in this approach, the patches are sliced from the large image in a non-overlapping manner. Huang et al. [30] studied the performance of various deep learning architectures for the pixel-wise classification of rice and weeds and found that a fully-convolutional network architecture outperformed the others. Yu et al. [52] studied the use of CNNs for multispecies weed detection in ryegrass.
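The distinction between non-overlapping and overlapping patch slicing, which matters both for the boundary errors noted above and for the inference-time comparison in this study, can be sketched as follows; patch size and stride values are illustrative assumptions:

```python
import numpy as np

def slice_patches(image, patch, stride):
    """Slice a large image into square patches for a patch-based classifier.

    stride == patch yields non-overlapping tiles (as in the Bah et al.
    approach described above); stride < patch yields overlapping patches,
    which reduces boundary errors at the cost of classifying more patches.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch])
    return patches

img = np.zeros((256, 256, 3), dtype=np.uint8)   # stand-in for a UAV image tile
no_overlap = slice_patches(img, patch=128, stride=128)   # 2 x 2 = 4 patches
half_overlap = slice_patches(img, patch=128, stride=64)  # 3 x 3 = 9 patches
```

The more-than-doubled patch count with 50% overlap is what drives the higher inference time of the overlapping patch-based CNN reported in the abstract.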

From the literature reviewed, it can be seen that automated weed detection has primarily focused on early season weeds, since that is the critical period for weed management and for preventing crop yield loss. However, mid- to late-season weeds that escape routine early-season management also threaten production by producing large numbers of seeds, which creates problems for several future growing seasons. With herbicide resistance, escaped weeds can proliferate and become difficult to manage. Studies on early season weeds can use vegetation segmentation as a preprocessing step to reduce memory requirements; however, this does not apply to mid- to late-season imagery, which contains no soil pixels due to canopy closure. Furthermore, because of the significant overlap between crops and weeds, it is challenging to find the optimal scale and other segmentation parameters in OBIA to achieve maximum performance. With deep learning-based object detection methods proving successful for tasks such as fruit counting, another situation with a cluttered background, it is hypothesized that such methods would be able to detect mid- to late-season weeds in UAV imagery. Hence, the objective of this study was to evaluate deep learning-based object detection models for detecting mid- to late-season weeds and to compare their performance with a patch-based CNN method for near real-time weed detection. Near real-time refers to the on-farm processing of the aerial imagery on an edge device as it is collected; we call this near real-time rather than real-time because no real-time control output is generated from the collected imagery, so processing completes shortly after data collection rather than during it. The specific objectives of the study are:

