Article

Weed Detection in Maize Fields by UAV Images Based on Crop Row Preprocessing and Improved YOLOv4

1 Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
2 Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
3 Science Island Branch of Graduate School, University of Science and Technology of China, Hefei 230026, China
4 Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013, China
* Author to whom correspondence should be addressed.
Agriculture 2022, 12(7), 975; https://doi.org/10.3390/agriculture12070975
Submission received: 19 May 2022 / Revised: 21 June 2022 / Accepted: 4 July 2022 / Published: 6 July 2022
(This article belongs to the Section Digital Agriculture)

Abstract

Effective maize and weed detection plays an important role in farmland management, which helps to improve yield and save herbicide resources. Due to their convenience and high resolution, Unmanned Aerial Vehicles (UAVs) are widely used in weed detection. However, weed detection faces several challenges: (i) the cost of labeling is high, because each image contains many plants and annotation is time-consuming and labor-intensive; (ii) the number of maize plants in the field is much larger than the number of weeds, and this sample imbalance lowers recognition accuracy; and (iii) maize and weeds have similar colors, textures, and shapes, which are difficult to distinguish when a UAV flies at a comparatively high altitude. To solve these problems, we propose a new weed detection framework in this paper. First, to balance the samples and reduce the cost of labeling, a lightweight model, YOLOv4-Tiny, was exploited to detect and mask the maize rows so that it was only necessary to label weeds on the masked images. Second, an improved YOLOv4 was used as the weed detection model: we introduced the Meta-ACON activation function, added the Convolutional Block Attention Module (CBAM), and replaced Non-Maximum Suppression (NMS) with Soft Non-Maximum Suppression (Soft-NMS). Moreover, the distribution and counts of weeds were analyzed, which is useful for variable-rate herbicide spraying. The results showed that the total number of labels for 1000 images decreased by half, from 33,572 to 17,126, and the improved YOLOv4 achieved a mean average precision (mAP) of 86.89%.

1. Introduction

Maize plays an important role in agriculture because of its nutritional value and high consumption [1]. Excessive weeds in maize fields affect the growth and yield of maize. Weeds not only compete with crops for living space but also lead to the spread of diseases and pests, resulting in crop failure [2,3,4]. Therefore, we need to identify weeds in maize fields and ascertain their distribution and quantity. At present, the main method of weed control is large-scale herbicide spraying, which damages the ecological environment, consumes resources, and affects food safety [5]. The application of UAV technology in agriculture has developed rapidly [6,7,8], as UAVs provide capabilities that cannot be achieved from the ground [9], thus facilitating the detection of weeds in maize fields. UAVs can obtain complete field images with high resolution [10,11]. The advantage of UAV images is that they cover a wide area and can quickly provide complete field data for statistics and analysis [12]. The analysis of farmland images and the implementation of precise spraying are of great significance for controlling the growth of weeds and improving grain yield [13]. To implement precise spraying, the first problem is to identify the distribution of crops and weeds in any given area of farmland. Many researchers have studied weed detection and achieved notable results.
Kamath et al. [14] examined how texture features extracted with Laws’ texture masks can be used to distinguish carrot crops from weeds in digital photographs. Based on the analysis of photos obtained by UAVs, Lottes et al. [15] proposed a system that performs vegetation detection, plant-tailored feature extraction, and classification, thereby estimating the distribution of crops and weeds in fields. Using UAV photos gathered from a chili crop field in Australia, Islam et al. [16] assessed the performance of several machine learning methods, including random forest, Support Vector Machine (SVM), and k-nearest neighbor, for weed detection. Although traditional machine learning methods have achieved certain results in weed recognition, they still exhibit shortcomings, such as difficult feature extraction and poor robustness to interference. Since the development of deep learning, convolutional neural networks (CNNs) have been widely used in agriculture. Compared with traditional machine learning methods, CNNs have strong feature extraction and autonomous learning abilities, thereby achieving outstanding results and making them more useful in the detection and localization of weeds [17]. Jabir and Falih [18] proposed ResNeXt-SVM, a hybrid deep learning model for weed detection; the framework combines a ResNeXt network with an SVM to better exploit the structured characteristics of images and the understanding of their content. Hu et al. [19] introduced the Graph Weeds Net (GWN), a graph-based deep learning architecture aimed at recognizing numerous weed species from standard RGB photos taken in complex rangelands. We experimented and tested our data with different object detection algorithms: the single-stage algorithms included YOLOv3 [20], YOLOv4 [21], and SSD [22], and the two-stage algorithm was Faster R-CNN [23]. We found that YOLOv4 achieved the highest accuracy, so we chose it as the baseline and optimized it for the weed detection model.
However, there are still some problems in the existing research: (i) the cost of labeling crops is high (an image contains many maize plants that need to be labeled), and annotation of the images is time-consuming and labor-intensive; (ii) the sample size is seriously unbalanced (more weed samples are needed, but labeled crop samples vastly outnumber the weed samples, which can lead to decreased recognition accuracy); and (iii) maize and weeds have similar colors, textures, and shapes, which renders them difficult to identify when a UAV flies at a comparatively high altitude.
To solve the above problems, we designed a new weed detection framework, shown in Figure 1. In the first stage, it was necessary to detect crop rows. The traditional method of crop row detection is semantic segmentation, which requires pixel-level labels and therefore cannot meet the requirement of simpler labeled content, so an object detection method was adopted instead. We labeled a small maize row dataset, and a lightweight model, YOLOv4-Tiny, was exploited to detect and mask the maize rows so that we only needed to label weeds on the masked images. Labeling the masked images in this way was convenient, and it also reduced the cost of labeling. In the second stage, we generated the datasets by selecting images from the masked images to label the weeds and selecting images from the original images to label the maize and weeds. Then, we considered the features of the detected images to optimize the model. We introduced the Meta-ACON [24] activation function, which can improve generalization and transfer performance, and added the CBAM [25] to allow our model to adaptively focus on more important areas. Because the images are captured at a relatively high altitude, maize plants and weeds are often adjacent, so we used Soft-NMS [26] to avoid adjacent bounding boxes being removed. Finally, the optimized YOLOv4 model was used for weed detection.
The main contributions of this paper are as follows: (i) a new weed detection framework is proposed to deal with the high cost of labeling and the serious imbalance of samples; (ii) an optimized weed detection model with good recognition accuracy is proposed; and (iii) high-resolution UAV images of farmland are analyzed to calculate the quantity, distribution, and density of maize and weeds over large areas.

2. Materials and Methods

2.1. Data Collection

We used a DJI Phantom 4 RTK for image collection at the experimental farm of the Yellow River Delta Agricultural Hi-Tech Industry Demonstration Zone in Dongying City, Shandong Province, China. The Phantom 4 RTK camera uses a 1-inch CMOS sensor to capture 20-megapixel imagery with a resolution of 4864 × 3648 pixels. During flight, a 3-axis gimbal on the Phantom 4 RTK provides a steady platform that keeps the attached camera pointed close to the nadir. The capture area is 40 mu (about 2.7 ha). The images were captured at the seedling stage (V1), and the specific acquisition time was May 2021. We planned the flight route of the UAV according to the image resolution and its flight efficiency, setting the flying altitude to 25 m and the Ground Sampling Distance (GSD) to 0.685 cm/pixel. Finally, the visible photos were stitched into a single farmland image through two-dimensional reconstruction using Pix4D [27] software.
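For readers who want to relate the flight altitude to the reported GSD, the sketch below applies the standard GSD formula. The sensor width, focal length, and native image width used here are nominal Phantom 4 RTK values taken as assumptions; they are not figures reported in this paper.

```python
# Rough ground sampling distance (GSD) check for the flight plan described above.
# GSD (cm/pixel) = (sensor width * flight height * 100) / (focal length * image width)

def gsd_cm_per_px(sensor_width_mm, flight_height_m, focal_length_mm, image_width_px):
    return (sensor_width_mm * flight_height_m * 100.0) / (focal_length_mm * image_width_px)

# Assumed nominal parameters: 1-inch sensor ~13.2 mm wide, 8.8 mm focal length,
# 5472-pixel native image width; flight altitude 25 m as set in this study.
print(round(gsd_cm_per_px(13.2, 25, 8.8, 5472), 3))  # ~0.685 cm/pixel
```

With these nominal parameters the result is close to the 0.685 cm/pixel reported above.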
The image collected by the UAV is shown in Figure 2. We cropped the edge of the non-farmland area to obtain an image of the whole farmland, as shown in Figure 3a. Finally, we cropped the image to a resolution suitable for the identification of maize and weeds, 416 × 416 pixels, as shown in Figure 3b.
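A minimal sketch of this tiling step is shown below, assuming OpenCV is available and the stitched orthomosaic fits in memory; the file paths are illustrative only, and edge remainders smaller than a full tile are simply dropped.

```python
# Crop the stitched farmland image into 416 x 416 tiles for detection.
import os
import cv2

def crop_to_tiles(image_path, tile=416):
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(img[y:y + tile, x:x + tile].copy())
    return tiles

os.makedirs("tiles", exist_ok=True)
for i, t in enumerate(crop_to_tiles("farmland_orthomosaic.png")):
    cv2.imwrite(f"tiles/tile_{i:05d}.png", t)
```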

2.2. YOLOv4 and YOLOv4-Tiny

Convolutional Neural Networks have been widely employed in object detection and have shown promising results [28]. YOLO is a one-stage object detection network that formulates object detection as a regression problem rather than the proposal-plus-classification pipeline of typical two-stage detectors. By processing images with a single CNN, YOLO can directly output the target’s categories and position coordinates. YOLO has the advantages of fast detection speed, a small model size, and convenient deployment. The increase in efficiency and accuracy of YOLOv4 compared with YOLOv3 arises mainly from several improvements incorporated into the model: (i) the backbone extraction network is upgraded from Darknet53 to CSPDarknet53; (ii) the spatial pyramid pooling (SPP) module is introduced to significantly increase the receptive field; (iii) the Path Aggregation Network (PANet) is used as the parameter aggregation method; and (iv) Mosaic data augmentation and the Mish activation function are used to further improve accuracy.
YOLOv4-Tiny is a simplified version of YOLOv4 that greatly improves training and detection speeds by reducing the amount of network computation [29]. YOLOv4-Tiny omits the spatial pyramid pooling (SPP) module and the Path Aggregation Network (PANet) and uses the CSPDarknet53-Tiny network as its backbone in place of the CSPDarknet53 network used in YOLOv4 [30]. For feature fusion, YOLOv4-Tiny uses a feature pyramid network to extract and fuse features at different scales, which improves the accuracy of object detection. In addition, other lightweight versions of YOLOv4 include YOLOv4-Mobilenetv1 [31], YOLOv4-Mobilenetv3 [32], and YOLOv4-Ghost [33]. In YOLOv4-Mobilenetv1 and YOLOv4-Mobilenetv3, the original CSPDarknet53 backbone is replaced by Mobilenetv1 and Mobilenetv3, respectively. In YOLOv4-Ghost, the core concept is to use cheap operations to generate redundant feature maps.

2.3. Crop Row Detection and Mask

Accounting for the fact that maize is grown in crop rows, we covered the maize by masking the crop rows to obtain images containing only inter-row weeds. Then, we selected an appropriate number of original images and masked images and labeled them using LabelImg [34], which addresses the problem of imbalanced samples. Moreover, reducing the annotation of maize, for which the demand for samples is low, saved a great deal of time. Therefore, the approach brings a net benefit only if the training time of our crop row detection model satisfies Equation (1).
$T_{train} + T_M + T_N < T_{M+N}$ (1)
where $T_{train}$ is the time required to train the crop row detection model, $T_M$ is the time required to label M original images, $T_N$ is the time required to label N masked images, and $T_{M+N}$ is the time required to label all M + N original images.
This meant that it was necessary to obtain the crop row dataset and train the crop row detection model quickly. Therefore, the requirements for the crop row detection model were as follows: (i) fewer labeled samples, (ii) simpler labeled content, (iii) shorter training time, and (iv) a lightweight model. The traditional method of crop row detection is semantic segmentation, which requires pixel-level labels and thus cannot meet the requirement of simpler labeled content. Therefore, an object detection method was adopted instead.
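As a purely illustrative check of Equation (1), the toy calculation below plugs in hypothetical per-image labeling times and a hypothetical training time; none of these numbers were measured in this study.

```python
# Illustrative check of Equation (1) with assumed timings (minutes).
M, N = 300, 700                      # original / masked images, as in Section 2.4
t_original, t_masked = 2.0, 0.7      # assumed minutes to label one original / masked image
T_train = 60.0                       # assumed minutes to train the crop row detector

lhs = T_train + M * t_original + N * t_masked   # T_train + T_M + T_N
rhs = (M + N) * t_original                      # T_(M+N): labeling all 1000 originals
print(lhs, rhs, lhs < rhs)           # 1150.0 2000.0 True -> the framework saves time
```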

2.3.1. Crop Row Detection Model Dataset

To meet the requirement of fewer labeled samples, only 150 images were randomly taken for labeling. We used the LabelImg tool to label crop rows, as shown in Figure 4. There are only 2–4 labels in any given image, and it took about 30 min to label 150 images. Finally, we augmented the dataset to 750 images.

2.3.2. Crop Row Detection Model

Since we needed to quickly obtain the best model to save time when outputting masked images, the model training time was a key factor when choosing the model. We experimented with several lightweight models of YOLOv4, such as YOLOv4-Tiny, YOLOv4-Ghost, YOLOv4-Mobilenetv1, and YOLOv4-Mobilenetv3. Considering the model performance and training time, YOLOv4-Tiny is an ideal model for crop row detection.
Finally, according to the detection results from the crop row detection model, we located the crop row coordinates, drew the bounding box, and masked the bounding box so that we obtained images only with inter-row weeds. The outputted masked image is shown in Figure 5.
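A minimal sketch of this masking step is given below: the bounding boxes returned by the crop row detector are filled so that only inter-row weeds remain visible for labeling. The function `detect_crop_rows` is a hypothetical placeholder for the trained YOLOv4-Tiny inference call, and the paths are illustrative.

```python
import cv2

def mask_crop_rows(image, row_boxes, fill=(0, 0, 0)):
    """row_boxes: list of (x1, y1, x2, y2) crop row detections in pixel coordinates."""
    masked = image.copy()
    for x1, y1, x2, y2 in row_boxes:
        # thickness=-1 fills the rectangle, hiding the in-row maize
        cv2.rectangle(masked, (int(x1), int(y1)), (int(x2), int(y2)), fill, thickness=-1)
    return masked

# Example usage (illustrative):
# img = cv2.imread("tiles/tile_00001.png")
# boxes = detect_crop_rows(img)            # hypothetical YOLOv4-Tiny wrapper
# cv2.imwrite("masked/tile_00001.png", mask_crop_rows(img, boxes))
```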

2.4. Weed Detection Model Dataset

A total of 1000 images were selected for annotation. We randomly selected 300 images from the original images to label both maize and weeds, and 700 images from the masked images to label only weeds. Labeling the original and masked images in this 3:7 ratio greatly reduced the number of maize labels with similar features and helped to balance the samples. In total, 7700 maize labels and 9426 weed labels were created using this method. Based on the label counts for the 300 original images, we estimated that labeling 1000 original images directly would have required 25,690 maize labels and 7882 weed labels. The total number of labels for 1000 images therefore decreased by about half, from 33,572 to 17,126, and the reduction came entirely from maize labels, for which the demand for samples is low. The proposed method not only saves a large amount of labeling time but also achieves sample balance.
Before network training, data augmentation was performed on the dataset, including brightness enhancement and reduction, contrast enhancement and reduction, Gaussian noise addition, vertical flipping of 30% of the images, mirror (horizontal) flipping of 30% of the images, translation, rotation, and zooming. The purpose was to enrich the training set, allow the image features to be extracted effectively, and avoid overfitting. After data augmentation, the 3000 images yielded a total of 23,100 maize labels and 28,278 weed labels, which were used for network training and parameter adjustment.
The method of labeling the original image and masked image is shown in Figure 6a,b. Data augmentation of the original image and masked image are shown in Figure 6c,d. We randomly selected 70% of images as the training set, 10% of images as the validation set, and 20% of images as the test set.
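The snippet below sketches the augmentation operations listed above using OpenCV and NumPy. The strength parameters are illustrative rather than the exact settings used in this study, and the corresponding bounding boxes would also have to be transformed for the geometric operations, which is omitted here.

```python
import cv2
import numpy as np

def adjust_brightness_contrast(img, alpha=1.2, beta=20):
    # alpha scales contrast, beta shifts brightness
    return cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

def add_gaussian_noise(img, sigma=10):
    noise = np.random.normal(0, sigma, img.shape).astype(np.float32)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def flip_vertical(img):
    return cv2.flip(img, 0)

def flip_mirror(img):
    return cv2.flip(img, 1)

def translate(img, dx=20, dy=10):
    h, w = img.shape[:2]
    m = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(img, m, (w, h))

def rotate_and_zoom(img, angle=10, zoom=1.1):
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, zoom)
    return cv2.warpAffine(img, m, (w, h))
```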

2.5. Weed Detection Model

In this paper, we optimized the YOLOv4 model. The structure of YOLOv4 is composed of three parts: (i) the backbone, (ii) the neck, and (iii) the YOLO head. First, a new activation function, Meta-ACON, was introduced into our model, which can adaptively determine the upper and lower bounds and dynamically control the degree of linearity and nonlinearity. Second, we added the CBAM and optimized it by changing the activation function of its channel attention module to Meta-ACON. Finally, the traditional NMS [35] function was replaced with the Soft-NMS function. The optimized model improves recognition accuracy and demonstrates the applicability of the Meta-ACON activation function, the optimized CBAM, and Soft-NMS in a weed detection model. In addition, we drew a distribution map of maize and weeds over the whole farmland, and the numbers of maize plants and weeds were calculated to provide a basis for variable-rate spraying. The structure of the improved YOLOv4 is shown in Figure 7.

2.5.1. CBA Module

We introduced the Meta-ACON activation function into our model, and the CBL module (conv, BN, and Leaky ReLU) in the neck was changed to the CBA module (conv, BN, and Meta-ACON), which improves the performance of the model. Meta-ACON is a new activation function obtained by extending the maxout form of ReLU with a smooth maximum approximation; it can adaptively determine the upper and lower bounds of the first derivative of the function and dynamically control the degree of linearity and nonlinearity of the network. This customized activation behavior helps to improve generalization and transfer performance. The calculation formula of Meta-ACON is shown in Appendix A(a).

2.5.2. CBAM

The attention module can effectively suppress interference factors, allowing the network to focus on the areas that need more attention. We added the CBAM to the model and introduced the Meta-ACON activation function in its channel attention part, changing the activation function from ReLU to Meta-ACON to improve the detection effect. To facilitate comparison between our optimized attention module and the original CBAM, we named the attention module with Meta-ACON attached A_CBAM. As a lightweight convolutional block attention module, A_CBAM infers attention maps along two independent dimensions (channel and spatial) and then multiplies the attention maps by the input feature map for adaptive feature refinement, improving the performance of the model through end-to-end training. The process and formulas of CBAM are shown in Appendix A(b).

2.5.3. Soft-NMS

The traditional NMS sorts all bounding boxes according to their scores. It selects the bounding box $M$ with the highest score and suppresses all other bounding boxes $b_i$ that significantly overlap with $M$. The problem with NMS is that it sets the scores of these adjacent bounding boxes to zero, so if another object lies within the overlap threshold, it can be missed. In farmland, maize plants or weeds are often adjacent, and the bounding boxes of maize or weeds with lower scores can be missed. Soft-NMS considers both the score and the degree of overlap when performing non-maximum suppression: the detection scores of the other bounding boxes are decayed as a continuous function of their overlap with $M$, rather than being set directly to zero as in NMS. Therefore, no objects are eliminated in this process. The comparison between Soft-NMS and NMS is shown in Figure 8. The calculation formula of Soft-NMS is shown in Appendix A(c).

2.6. Methods Evaluation Indicator

Both precision and recall should be addressed when developing a detection model, so measures such as precision, recall, F1-score, AP, and mAP were used to test the model’s performance and evaluate the detection results in this study. Precision, recall, and F1-score can be calculated with Formulas (2)–(4).
$P_r = \frac{TP}{TP + FP}$ (2)
$R_e = \frac{TP}{TP + FN}$ (3)
$F_1 = \frac{2 P_r R_e}{P_r + R_e}$ (4)
where $P_r$ represents precision, $R_e$ represents recall, and $F_1$ represents the F1-score. TP (true positive) is the number of positive samples correctly classified, and TN (true negative) is the number of negative samples correctly classified. FP (false positive) is the number of negative samples incorrectly classified as positive, and FN (false negative) is the number of positive samples incorrectly classified as negative.
The PR curve is drawn from the precision and recall values obtained at different thresholds. The area under the PR curve is defined as the AP, and the mean of the AP values over all detection categories is the mAP. AP and mAP can be calculated with Formulas (5) and (6).
$AP = \int_0^1 p(r)\,dr$ (5)
$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i$ (6)
where $p(r)$ represents the precision–recall function and $AP_i$ represents the AP value of category $i$.
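The helper functions below mirror Formulas (2)–(6) as a small sketch; AP is approximated by trapezoidal integration over sampled (recall, precision) points, and the example counts are illustrative only.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    pr = tp / (tp + fp)
    re = tp / (tp + fn)
    f1 = 2 * pr * re / (pr + re)
    return pr, re, f1

def average_precision(recalls, precisions):
    # area under the PR curve via the trapezoidal rule (Formula (5))
    r = np.asarray(recalls, dtype=float)
    p = np.asarray(precisions, dtype=float)
    order = np.argsort(r)
    r, p = r[order], p[order]
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(r)))

def mean_average_precision(ap_values):
    # Formula (6): mean of per-class AP values
    return sum(ap_values) / len(ap_values)

print(precision_recall_f1(tp=90, fp=10, fn=20))          # illustrative counts
print(average_precision([0.0, 0.5, 1.0], [1.0, 0.9, 0.6]))  # 0.85
print(mean_average_precision([0.8749, 0.8628]))          # e.g., maize and weed AP
```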

3. Results and Discussion

The tests were carried out on a system with an Intel Xeon Gold 5220 CPU and an NVIDIA Tesla V100 GPU; cuDNN 7.6.5 and CUDA 10.2 were used for acceleration. The crop row detection network and weed detection network were trained and tested on the Ubuntu 18.04 operating system, with Python 3.7.0 as the programming language.

3.1. Crop Row Detection Model Experiment

We experimented with several lightweight models of YOLOv4, such as YOLOv4-Tiny, YOLOv4-Ghost, YOLOv4-Mobilenetv1, and YOLOv4-Mobilenetv3, as well as the full YOLOv4. We conducted nine experiments for each model: with all other parameters held constant, the batch size was set to 4, 8, or 16, and the number of epochs to 50, 100, or 200. We compared the training time of each model on the same device. The evaluation indicators of each model and the training times under the same conditions are shown in Table 1 and Table 2.
It can be seen that each model achieved good results. YOLOv4 achieved the highest AP (93.15%) and the highest recall (90.77%), but its parameter size and training time were much larger than those of the lightweight models. YOLOv4-Tiny achieved the second-highest AP (92.97%) and the highest precision (99.28%), with the smallest number of parameters and the fastest detection speed. According to Table 2, YOLOv4-Tiny also had the shortest training time and could therefore produce a usable model the fastest. Combining the results of Table 1 and Table 2, it can be concluded that YOLOv4-Tiny is the ideal model.

3.2. Weed Detection Model Ablation Experiment

To prove that the CBA module can effectively improve the performance of the model, we compared the evaluation indicators with the original YOLOv4. The comparison is shown in Table 3.
From the above table, we can see that the mAP of the improved model increased by 0.79% compared with the original YOLOv4. The AP of maize increased by 0.7%, reaching 86.67%, and the AP of weed increased by 0.89%, reaching 84.83%. Its recall improved considerably, especially that of weed (by 4.15 percentage points). Since Meta-ACON can dynamically control the degree of linearity and nonlinearity of the activation function, the performance improved significantly.
We added the CBAM to the model and introduced the Meta-ACON activation function in the channel attention part. As the attention module is plug-and-play, we studied the effect of adding it at different positions. We conducted three ablation studies on CBAM and A_CBAM to evaluate the benefits of adding them after the three effective feature layers extracted from the backbone network, after upsampling, and at both positions. The comparison is shown in Table 4.
It can be seen from the above table that adding the module after upsampling works better than adding it after the effective feature layers, and adding it at both positions works best. The mAP of A_CBAM was 0.31% higher than that of CBAM and 1.12% higher than that of the original YOLOv4. The AP of maize increased by 1.11%, reaching 87.08%, and the AP of weed increased by 1.14%, reaching 85.08%. These results show that the attention module focuses on important information and suppresses irrelevant details through its weight parameters, which can effectively improve the performance of the model.

3.3. Weed Detection Model Comparison Experiment

To further analyze the performance, we compared our method with several other types of object detection models, such as YOLOv3, SSD, and YOLOv4-Tiny. We used the same training set, validation set, and test set to train and test these networks. Table 5 shows the comparisons made for each of these methods.
It can be seen from Table 5 that the AP, recall, precision, and F1-score of the improved YOLOv4 were all higher than those of the other detection models. The mAP of our model was 1.93%, 5.13%, 6.34%, and 13.65% higher than those of the original YOLOv4, SSD, YOLOv3, and YOLOv4-Tiny, respectively. In addition, YOLOv4-Tiny performed worst, indicating that the lightweight model is not suitable for identifying complex targets such as maize and weeds in these images. The mAPs of YOLOv3 and SSD were 4.41% and 3.2% lower than that of the original YOLOv4, respectively, confirming that YOLOv4 was the right choice of baseline. Thus, our improved model achieved the best results: the mAP was 86.89%, the AP of maize was 87.49%, and the AP of weed was 86.28%.
The PR curves of different models are shown in Figure 9. Our proposed model had the best PR curve, especially the PR curve of weed.
The detection results of different models are shown in Figure 10. Other models have the problem of missing detection, and some even recognize maize as weed. The findings show that our suggested algorithm accurately and quickly detected maize and weed in images collected in natural situations.

3.4. Maize and Weed Distribution and Counts

We input the image to be predicted into the weed detection model, determined whether the object in the image was maize or weed, and drew bounding boxes for them. We counted the different bounding boxes separately to ascertain the amount of maize and weed, a method which can help estimate the yield of maize and the number of weeds. We spliced all the output images into a complete farmland image, thus obtaining the distribution of maize and weed for the whole farmland, as shown in Figure 11. The number of maize in the experimental field was 88,845, and the number of weeds was 16,976.
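A minimal counting sketch is shown below; the detection tuples and their values are illustrative only, and the detector call itself is omitted.

```python
from collections import Counter

def count_classes(all_tile_detections):
    """all_tile_detections: list of per-tile lists of (class_name, confidence, box) tuples."""
    counts = Counter()
    for detections in all_tile_detections:
        for class_name, confidence, box in detections:
            counts[class_name] += 1
    return counts

# Example with two dummy tiles (values are illustrative only):
tiles = [
    [("maize", 0.93, (10, 12, 40, 60)), ("weed", 0.71, (200, 180, 230, 210))],
    [("maize", 0.88, (50, 66, 90, 120))],
]
print(count_classes(tiles))   # Counter({'maize': 2, 'weed': 1})
```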

3.5. Regional Data Analysis

We divided the whole farmland into several areas, calculated the number and proportion of maize and weeds in each area, and plotted the data on the original map. The results are shown in Figure 12, where m represents the number of maize plants, w represents the number of weeds, and r represents the ratio of the number of weeds to the number of maize plants. The green areas indicate a ratio of less than 10%, the blue areas a ratio between 10% and 20%, the yellow areas a ratio between 20% and 30%, and the red areas a ratio greater than 30%. The green areas account for 22% of the field, the blue for 44%, the yellow for 25%, and the red for 9%. For the green areas, we believe that weeds do not affect the living space of maize, so no herbicide is sprayed there. For the blue, yellow, and red areas, the herbicide rate is increased in turn. Through variable-rate spraying of herbicides, herbicide resources are saved and the environment is protected. Furthermore, the yield of maize is affected by many factors; several studies have shown that plant density is related to yield in the field [36,37]. By analyzing the maize statistics in each area, the plant density can be understood and adjusted, which also provides data support for experts.
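A small sketch of the thresholding behind the spray map is given below. The threshold values follow the text, while the per-cell counts are illustrative and not taken from the experiment.

```python
def spray_level(maize_count, weed_count):
    """Map a cell's weed-to-maize ratio r = w / m to the colour/spray level of Figure 12."""
    r = weed_count / maize_count if maize_count else float("inf")
    if r < 0.10:
        return "green: no herbicide"
    if r < 0.20:
        return "blue: low herbicide rate"
    if r < 0.30:
        return "yellow: medium herbicide rate"
    return "red: high herbicide rate"

for m, w in [(950, 60), (900, 140), (880, 230), (760, 280)]:   # illustrative cells
    print(m, w, spray_level(m, w))
```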

4. Conclusions

In this paper, we identified weeds in maize fields from UAV images and achieved good results. A new weed detection framework was proposed. First, a lightweight model, YOLOv4-Tiny, was exploited to detect and mask the maize rows so that we only needed to label weeds on the masked images, a process that deals with the serious imbalance of samples and the high cost of labeling. Second, to improve recognition accuracy, the Meta-ACON activation function was used to change the CBL module to the CBA module, and the A_CBAM was added to the network. Furthermore, the NMS function was replaced with the Soft-NMS function. The results showed that the proposed model had a maize AP of 87.49%, a weed AP of 86.28%, and an mAP of 86.89%, and it can thus accurately identify maize and weeds. We drew a distribution map of maize and weeds, which provides data support for variable herbicide spraying and yield estimation. In future work, we will explore newer models, such as YOLOv5 [38], to further improve the performance of weed detection.

Author Contributions

Conceptualization, H.P., H.H. and Y.S.; methodology, H.P., W.Z. and J.S.; software, H.P.; validation, H.P., Z.Z., W.Z. and J.S.; investigation, H.P., H.H. and Z.Z.; resources, Y.S. and H.H.; data curation, H.P.; writing—original draft preparation, H.P.; writing—review and editing, H.P., H.H. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant 2021YFD200060102) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDA28120400).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

(a)
Meta-ACON
The calculation formula of Meta-ACON is shown in Formula (A1).
$f(x) = (p_1 - p_2)\,x \cdot \sigma(\beta (p_1 - p_2)\,x) + p_2\,x$ (A1)
where $p_1$ and $p_2$ are two learnable parameters used for adaptive adjustment to control the upper and lower bounds of the function, and $\beta$ dynamically controls the linearity or nonlinearity of the activation function. Specifically, when $\beta \to \infty$, $f(x) \to \max(p_1 x, p_2 x)$; when $\beta \to 0$, $f(x) \to \mathrm{mean}(p_1 x, p_2 x)$. $\sigma$ denotes the sigmoid function.
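A compact PyTorch sketch of Formula (A1) is given below. The meta branch layout and the channel reduction ratio (r = 16) follow the original Meta-ACON paper and are assumptions here, since this article does not report those settings.

```python
import torch
import torch.nn as nn

class MetaACON(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        hidden = max(channels // r, 1)
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        # meta branch: global average pool -> 1x1 conv -> 1x1 conv -> sigmoid gives beta per channel
        self.fc1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.fc2 = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        beta = torch.sigmoid(self.fc2(self.fc1(x.mean(dim=(2, 3), keepdim=True))))
        dpx = (self.p1 - self.p2) * x
        # f(x) = (p1 - p2) x * sigmoid(beta (p1 - p2) x) + p2 x
        return dpx * torch.sigmoid(beta * dpx) + self.p2 * x

y = MetaACON(channels=64)(torch.randn(2, 64, 32, 32))   # quick shape check
print(y.shape)   # torch.Size([2, 64, 32, 32])
```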
(b)
CBAM
In the channel attention module, we first use global average-pooling and global max-pooling to aggregate the input feature map. Then, we merge the two pooled descriptors through a shared multi-layer perceptron (MLP). Finally, the channel attention map $M_c(F)$ is obtained with the sigmoid function. The channel attention process can be summarized by Formulas (A2) and (A3).
$M_c(F) = \sigma(\mathrm{MLP}(F^c_{avg}) + \mathrm{MLP}(F^c_{max}))$ (A2)
$F' = M_c(F) \otimes F$ (A3)
where $F^c_{avg}$ and $F^c_{max}$ denote the average-pooled and max-pooled features, $\sigma$ denotes the sigmoid function, and $\otimes$ denotes element-wise multiplication. $F'$ is the output feature map of the channel attention module.
In the spatial attention module, we first apply average-pooling and max-pooling operations along the channel axis and concatenate the results. Then, we apply a convolution layer and the sigmoid function to generate a spatial attention map $M_s(F')$. The spatial attention process can be summarized by Formulas (A4) and (A5):
$M_s(F') = \sigma(f^{7 \times 7}([F^s_{avg}; F^s_{max}]))$ (A4)
$F'' = M_s(F') \otimes F'$ (A5)
where $F^s_{avg}$ and $F^s_{max}$ denote the average-pooled and max-pooled features along the channel axis, $f^{7 \times 7}$ represents a convolution operation with a 7 × 7 filter, $\sigma$ denotes the sigmoid function, $\otimes$ denotes element-wise multiplication, and $F''$ is the final output feature map.
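Below is a minimal PyTorch sketch of Formulas (A2)–(A5). The reduction ratio of 16 is taken from the original CBAM paper as an assumption; in the A_CBAM variant described in Section 2.5.2, the ReLU inside the shared MLP would be replaced by Meta-ACON.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for both pooled vectors
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # channel attention: M_c(F) = sigmoid(MLP(F_avg) + MLP(F_max)); F' = M_c(F) * F
        mc = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = mc * x
        # spatial attention: M_s(F') = sigmoid(conv7x7([avg; max])); F'' = M_s(F') * F'
        ms = torch.sigmoid(self.spatial(torch.cat([x.mean(dim=1, keepdim=True),
                                                   x.max(dim=1, keepdim=True).values], dim=1)))
        return ms * x

print(CBAM(64)(torch.randn(2, 64, 32, 32)).shape)   # torch.Size([2, 64, 32, 32])
```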
(c)
Soft-NMS
The calculation formula of the decayed detection score $S_i$ is shown in Formula (A6).
$S_i = S_i \, e^{-\frac{iou(M,\, b_i)^2}{\sigma}}$ (A6)
where $M$ is the bounding box with the highest score, $b_i$ is a bounding box that significantly overlaps with $M$, $iou(M, b_i)$ is their intersection over union, and $\sigma$ is the hyperparameter of the Gaussian penalty function.
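A NumPy sketch of the Gaussian-penalty Soft-NMS in Formula (A6) is shown below; each remaining box's score is decayed instead of being zeroed. The value sigma = 0.5 is a common default and an assumption here.

```python
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    boxes, scores = boxes.astype(float).copy(), scores.astype(float).copy()
    keep, idx = [], np.arange(len(scores))
    while len(idx) > 0:
        m = idx[np.argmax(scores[idx])]          # box M with the highest remaining score
        keep.append(int(m))
        idx = idx[idx != m]
        if len(idx) == 0:
            break
        scores[idx] *= np.exp(-(iou(boxes[m], boxes[idx]) ** 2) / sigma)  # Gaussian decay
        idx = idx[scores[idx] > score_thresh]    # drop boxes whose score fell too low
    return keep

boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62], [200, 200, 250, 250]], dtype=float)
scores = np.array([0.95, 0.90, 0.80])
print(soft_nms(boxes, scores))   # [0, 2, 1]: the overlapping box is kept, only down-weighted
```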

References

  1. Zheng, Y.; Zhu, Q.; Huang, M.; Guo, Y.; Qin, J. Maize and weed classification using color indices with support vector data description in outdoor fields. Comput. Electron. Agric. 2017, 141, 215–222. [Google Scholar] [CrossRef]
  2. Hasan, A.S.M.M.; Sohel, F.; Diepeveen, D.; Laga, H.; Jones, M.G. A survey of deep learning techniques for weed detection from images. Comput. Electron. Agric. 2021, 184, 106067. [Google Scholar] [CrossRef]
  3. Mohidem, N.A.; Che’Ya, N.N.; Juraimi, A.S.; Ilahi, W.F.F.; Roslim, M.H.M.; Sulaiman, N.; Saberioon, M.; Noor, N.M. How Can Unmanned Aerial Vehicles Be Used for Detecting Weeds in Agricultural Fields? Agriculture 2021, 11, 1004. [Google Scholar] [CrossRef]
  4. Ramirez, W.; Achanccaray, P.; Mendoza, L.F.; Pacheco, M.A.C. Deep convolutional neural networks for weed detection in agricultural crops using optical aerial images. In Proceedings of the 2020 IEEE Latin American GRSS & ISPRS Remote Sensing Conference (LAGIRS), Santiago, Chile, 22–26 March 2020. [Google Scholar] [CrossRef]
  5. Etienne, A.; Ahmad, A.; Aggarwal, V.; Saraswat, D. Deep Learning-Based Object Detection System for Identifying Weeds Using UAS Imagery. Remote Sens. 2021, 13, 5182. [Google Scholar] [CrossRef]
  6. Ahmad, A.; Ordoñez, J.; Cartujo, P.; Martos, V. Remotely Piloted Aircraft (RPA) in Agriculture: A Pursuit of Sustainability. Agronomy 2020, 11, 7. [Google Scholar] [CrossRef]
  7. Olson, D.; Anderson, J. Review on unmanned aerial vehicles, remote sensors, imagery processing, and their applications in agriculture. Agron. J. 2021, 113, 971–992. [Google Scholar] [CrossRef]
  8. Ganesan, R.; Raajini, X.M.; Nayyar, A.; Sanjeevikumar, P.; Hossain, E.; Ertas, A.H. BOLD: Bio-Inspired Optimized Leader Election for Multiple Drones. Sensors 2020, 20, 3134. [Google Scholar] [CrossRef]
  9. Yayli, U.C.; Kimet, C.; Duru, A.; Cetir, O.; Torun, U.; Aydogan, A.C.; Padmanaban, S.; Ertas, A.H. Design optimization of a fixed wing aircraft. Adv. Aircr. Spacecr. Sci. 2017, 4, 65–80. [Google Scholar] [CrossRef]
  10. De Castro, A.; Shi, Y.; Maja, J.; Peña, J. UAVs for Vegetation Monitoring: Overview and Recent Scientific Contributions. Remote Sens. 2021, 13, 2139. [Google Scholar] [CrossRef]
  11. Guo, X.; Liu, Q.; Sharma, R.P.; Chen, Q.; Ye, Q.; Tang, S.; Fu, L. Tree Recognition on the Plantation Using UAV Images with Ultrahigh Spatial Resolution in a Complex Environment. Remote Sens. 2021, 13, 4122. [Google Scholar] [CrossRef]
  12. Huang, Y.; Reddy, K.N.; Fletcher, R.S.; Pennington, D. UAV low-altitude remote sensing for precision weed management. Weed Technol. 2017, 32, 2–6. [Google Scholar] [CrossRef]
  13. Somerville, G.J.; Mathiassen, S.K.; Melander, B.; Bøjer, O.M.; Jørgensen, R.N. Analysing the number of images needed to create robust variable spray maps. Precis. Agric. 2021, 22, 1377–1396. [Google Scholar] [CrossRef]
  14. Kamath, R.; Balachandra, M.; Prabhu, S. Crop and weed discrimination using laws’ texture masks. Int. J. Agric. Biol. Eng. 2020, 13, 191–197. [Google Scholar] [CrossRef]
  15. Lottes, P.; Khanna, R.; Pfeifer, J.; Siegwart, R.; Stachniss, C. UAV-based crop and weed classification for smart farming. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 3 June 2017. [Google Scholar]
  16. Islam, N.; Rashid, M.; Wibowo, S.; Xu, C.-Y.; Morshed, A.; Wasimi, S.; Moore, S.; Rahman, S. Early Weed Detection Using Image Processing and Machine Learning Techniques in an Australian Chilli Farm. Agriculture 2021, 11, 387. [Google Scholar] [CrossRef]
  17. Wang, D.; He, D. Channel pruned YOLO V5s-based deep learning approach for rapid and accurate apple fruitlet detection before fruit thinning. Biosyst. Eng. 2021, 210, 271–281. [Google Scholar] [CrossRef]
  18. Jabir, B.; Falih, N. A New Hybrid Model of Deep Learning ResNeXt-SVM for Weed Detection. Int. J. Intell. Inf. Technol. 2022, 18, 1–18. [Google Scholar] [CrossRef]
  19. Hu, K.; Coleman, G.; Zeng, S.; Wang, Z.; Walsh, M. Graph weeds net: A graph-based deep learning method for weed recognition. Comput. Electron. Agric. 2020, 174, 105520. [Google Scholar] [CrossRef]
  20. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  21. Bochkovskiy, A.; Wang, C.Y.; Liao, H. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  22. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv 2015, arXiv:1512.02325. [Google Scholar]
  23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
  24. Sun, J. Activate or Not: Learning Customized Activation [DB/OL]. 2020. Available online: https://doc.paperpass.com/foreign/arXiv200904759.html (accessed on 26 April 2022).
  25. Kweon, I.S. CBAM: Convolutional Block Attention Module [DB/OL]. 2018. Available online: https://doc.paperpass.com/foreign/arXiv180706521.html (accessed on 26 April 2022).
  26. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS—Improving Object Detection with One Line of Code. In Proceeding of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570. [Google Scholar]
  27. Pix4d. Available online: https://www.pix4d.com/ (accessed on 26 April 2022).
  28. Junsong, R.; Yi, W. Overview of Object Detection Algorithms Using Convolutional Neural Networks. J. Comput. Commun. 2022, 10, 115–132. [Google Scholar] [CrossRef]
  29. Li, X.; Du, Y.; Yao, L.; Wu, J.; Liu, L. Design and Experiment of a Broken Corn Kernel Detection Device Based on the Yolov4-Tiny Algorithm. Agriculture 2021, 11, 1238. [Google Scholar] [CrossRef]
  30. Li, X.; Pan, J.; Xie, F.; Zeng, J.; Li, Q.; Huang, X.; Liu, D.; Wang, X. Fast and accurate green pepper detection in complex backgrounds via an improved Yolov4-tiny model. Comput. Electron. Agric. 2021, 191, 106503. [Google Scholar] [CrossRef]
  31. Howard, A.G. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [DB/OL]. 2017. Available online: https://doc.paperpass.com/foreign/arXiv170404861.html (accessed on 26 April 2022).
  32. Howard, A.; Sandler, M.; Chu, G.; Chen, L.; Chen, B.; Tan, M.; Adam, H. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 December 2019. [Google Scholar]
  33. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar] [CrossRef]
  34. LabelImg. Available online: https://github.com/tzutalin/labelImg (accessed on 26 April 2022).
  35. Neubeck, A.; Van Gool, L. Efficient Non-Maximum Suppression. International Conference on Pattern Recognition. IEEE Comput. Soc. 2006, 3, 850–855. [Google Scholar]
  36. Djaman, K.; Allen, S.; Djaman, D.S.; Koudahe, K.; Irmak, S.; Puppala, N.; Darapuneni, M.K.; Angadi, S.V. Planting date and plant density effects on maize growth, yield and water use efficiency. Environ. Chall. 2021, 6, 100417. [Google Scholar] [CrossRef]
  37. Kiss, T.; Balla, K.; Bányai, J.; Veisz, O.; Karsai, I. Associations between plant density and yield components using different sowing times in wheat (Triticum aestivum L.). Cereal Res. Commun. 2018, 46, 211–220. [Google Scholar] [CrossRef]
  38. Jocher, G.; Stoken, A.; Borovec, J.; Liu, C.; Hogan, A.; Diaconu, L.; Poznanski, J.; Ferriday, R.; Sullivan, T.; Wang, X.; et al. Ultralytics/yolov5: v4.0. 2020. Available online: https://zenodo.org/record/3983579#.YsVRg4RBxPY (accessed on 26 April 2022).
Figure 1. Framework used in this paper.
Figure 2. Image collected by the UAV in the Yellow River Delta Agricultural Hi-Tech Industry Demonstration Zone, Dongying City, Shandong Province, China.
Figure 3. Steps of image processing: (a) remove the edge, and (b) image cropping.
Figure 4. Labeling method of crop row dataset.
Figure 5. (a) The image inputted into the crop row detection model, and (b) the masked image outputted by the crop row detection model.
Figure 6. Examples of different forms of sample labeling, where the green label box is maize and the red label box is weed. (a) The original image was labeled with maize and weed; (b) the masked image was labeled with weed; (c) data augmentation of the original image, and (d) data augmentation of the masked image.
Figure 7. Improved YOLOv4 network structure diagram.
Figure 8. Comparison of the detection results between NMS and Soft-NMS. (a1) NMS cannot detect overlapping maize; (a2) Soft-NMS can detect overlapping maize; (b1) NMS cannot detect overlapping weed; and (b2) Soft-NMS can detect overlapping weed.
Figure 9. The PR curves of different models. The curve area from largest to smallest is: improved YOLOv4, YOLOv4, SSD, YOLOv3, and YOLOv4-Tiny. (a) PR curve of maize, and (b) PR curve of weed.
Figure 10. Comparison of prediction maps from different models. The red labels are weeds, and the blue and green labels are maize.
Figure 11. Distribution map of maize and weed.
Figure 12. The image is divided into small areas for data analysis.
Table 1. Comparison of the performance of different models.

| Model | AP | Precision | Recall | FPS | Weight | Parameters |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv4-Mobilenet v1 | 90.19% | 98.03% | 89.52% | 20.78 | 53.6 MB | 48.42 MB |
| YOLOv4-Mobilenet v3 | 92.81% | 98.12% | 89.16% | 18.91 | 56.4 MB | 44.74 MB |
| YOLOv4-Ghost | 92.56% | 98.59% | 90.28% | 16.04 | 44.5 MB | 43.60 MB |
| YOLOv4 | 93.15% | 98.06% | 90.77% | 18.53 | 256.3 MB | 245.53 MB |
| YOLOv4-Tiny | 92.97% | 99.28% | 89.74% | 42.24 | 23.6 MB | 23.10 MB |
Table 2. Comparison of the training time of different models.

| Parameters | YOLOv4-Mobilenet v1 | YOLOv4-Mobilenet v3 | YOLOv4-Ghost | YOLOv4 | YOLOv4-Tiny |
| --- | --- | --- | --- | --- | --- |
| Batch size = 4, Epochs = 50 | 39 m 48 s | 49 m 48 s | 56 m 48 s | 92 m 08 s | 10 m 45 s |
| Batch size = 4, Epochs = 100 | 78 m 37 s | 102 m 31 s | 113 m 26 s | 181 m 19 s | 21 m 29 s |
| Batch size = 4, Epochs = 200 | 154 m 47 s | 204 m 48 s | 215 m 39 s | 349 m 48 s | 42 m 23 s |
| Batch size = 8, Epochs = 50 | 11 m 30 s | 13 m 17 s | 14 m 45 s | 19 m 43 s | 8 m 36 s |
| Batch size = 8, Epochs = 100 | 22 m 31 s | 25 m 28 s | 29 m 30 s | 38 m 35 s | 17 m 08 s |
| Batch size = 8, Epochs = 200 | 48 m 29 s | 48 m 09 s | 48 m 14 s | 75 m 57 s | 33 m 05 s |
| Batch size = 16, Epochs = 50 | 11 m 05 s | 11 m 30 s | 12 m 08 s | 17 m 10 s | 8 m 18 s |
| Batch size = 16, Epochs = 100 | 22 m 13 s | 22 m 30 s | 24 m 01 s | 34 m 15 s | 16 m 33 s |
| Batch size = 16, Epochs = 200 | 48 m 30 s | 47 m 45 s | 47 m 20 s | 67 m 33 s | 32 m 31 s |
Table 3. CBA module ablation experiment.

| Model | mAP | AP (Maize) | AP (Weed) | Recall (Maize) | Recall (Weed) | Precision (Maize) | Precision (Weed) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Original YOLOv4 | 84.96% | 85.97% | 83.94% | 81.49% | 73.80% | 92.60% | 92.71% |
| YOLOv4 + CBA | 85.75% | 86.67% | 84.83% | 83.25% | 77.95% | 92.51% | 92.69% |
Table 4. Attention module ablation experiment.

| Model | Position | mAP | AP (Maize) | AP (Weed) | Recall (Maize) | Recall (Weed) | Precision (Maize) | Precision (Weed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original YOLOv4 |  | 84.96% | 85.97% | 83.94% | 81.49% | 73.80% | 92.60% | 92.71% |
| CBAM | effective feature layers | 85.10% | 86.13% | 84.07% | 81.53% | 74.69% | 92.59% | 92.53% |
| A_CBAM | effective feature layers | 85.31% | 86.18% | 84.45% | 82.42% | 74.58% | 90.97% | 92.78% |
| CBAM | upsampling | 85.32% | 86.52% | 84.12% | 81.87% | 72.03% | 92.09% | 93.86% |
| A_CBAM | upsampling | 85.49% | 86.44% | 84.55% | 82.23% | 75.59% | 92.17% | 93.46% |
| CBAM | effective feature layers + upsampling | 85.77% | 86.82% | 84.72% | 81.44% | 72.09% | 93.73% | 94.63% |
| A_CBAM | effective feature layers + upsampling | 86.08% | 87.08% | 85.08% | 82.37% | 73.77% | 93.15% | 94.25% |
Table 5. Comparison of the performance of different object detection models.

| Model | mAP | AP (Maize) | AP (Weed) | Recall (Maize) | Recall (Weed) | Precision (Maize) | Precision (Weed) | F1 (Maize) | F1 (Weed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original YOLOv4 | 84.96% | 85.97% | 83.94% | 81.49% | 73.80% | 92.60% | 92.71% | 87% | 82% |
| SSD | 81.76% | 83.42% | 80.10% | 78.59% | 69.36% | 87.55% | 85.07% | 83% | 76% |
| YOLOv3 | 80.55% | 82.32% | 78.79% | 72.77% | 63.05% | 89.13% | 87.53% | 80% | 73% |
| YOLOv4-Tiny | 73.24% | 74.56% | 71.93% | 67.97% | 56.02% | 79.39% | 83.98% | 73% | 67% |
| Improved YOLOv4 | 86.89% | 87.49% | 86.28% | 83.55% | 78.02% | 93.50% | 93.98% | 88% | 85% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
