Article

Pests Identification of IP102 by YOLOv5 Embedded with the Novel Lightweight Module

1 College of Internet of Things Engineering, Wuxi University, Wuxi 214105, China
2 School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
* Author to whom correspondence should be addressed.
Agronomy 2023, 13(6), 1583; https://doi.org/10.3390/agronomy13061583
Submission received: 23 May 2023 / Revised: 9 June 2023 / Accepted: 9 June 2023 / Published: 12 June 2023
(This article belongs to the Special Issue AI, Sensors and Robotics for Smart Agriculture)

Abstract

The development of the agricultural economy is hindered by various pest-related problems. Most pest detection studies focus on only a single pest category, which is not suitable for practical application scenarios. This paper presents a deep learning algorithm based on YOLOv5 that aims to help agricultural workers efficiently diagnose information related to 102 types of pests. To achieve this, we propose a new lightweight convolutional module called C3M, inspired by the MobileNetV3 network. Compared to the original convolution module C3, C3M occupies less computing memory and yields a faster inference speed, with the detection precision improved by 4.6%. In addition, the GAM (Global Attention Mechanism) is introduced into the neck of YOLOv5, which further improves the detection capability of the model. The experimental results indicate that the C3M-YOLO algorithm performs better than YOLOv5 on IP102, a public dataset covering 102 pest classes. Specifically, the detection precision P is 2.4% higher than that of the original model, mAP0.75 increases by 1.7%, and the F1-score improves by 1.8%. Furthermore, the mAP0.5 and mAP0.75 of C3M-YOLO are higher than those of the YOLOX detection model by 5.1% and 6.2%, respectively.

1. Introduction

The world’s population is vast and continues to grow, and with it the demand for food [1]. The management of pest-related issues affecting food crops has always been a significant concern in agriculture [2]. Therefore, how to adopt effective pest control methods to raise crop yields and reduce losses in the agricultural economy is a central concern for the industry [3,4]. Section 2 introduces experimental studies on pest detection in the field of computer vision, from machine learning to deep learning techniques.
The main work of the paper is illustrated in Figure 1. First, we performed data augmentation preprocessing on the IP102 dataset and then used GAM to obtain feature information processed by the YOLOv5 backbone. These features were further extracted by the neck network with the added C3M module. Finally, the head network obtained detection image results with anchor boxes at three scales.
In summary, the work presented in this paper has three main aspects:
  • The model is trained on the IP102 dataset [5], which contains nearly 19,000 images of agricultural pests covering 102 pest species across 8 crops (for example, rice, corn, and wheat);
  • We adopt YOLOv5 version 6.0 [6] as the baseline and integrate the lightweight convolutional structure proposed in MobileNetV3 [7] into the model, yielding a new module, C3M, which computes faster and achieves higher precision than the C3 module;
  • The GAM attention mechanism [8] is introduced to expand the receptive field, so the model receives more of the image feature points extracted by the backbone. The experiments confirm that the improved algorithm detects pest images better than YOLOv5.

2. Related Work

2.1. Machine Learning

In the past, agricultural experts had to inspect pests manually, which was time-consuming and inefficient. The development of artificial intelligence technology has since provided great convenience for identifying agricultural pests. An object detection task based on machine learning can be divided into three basic steps: data acquisition, data preprocessing, and classification by an algorithmic model [9]. For agricultural pest prediction, a real-time judgement system can be constructed using Gaussian Naive Bayes and Fast Association Rule Mining algorithms to help farmers identify pests [10]. Other detection experiments first extract the size, color, and texture characteristics of insects through the HOG (Histogram of Oriented Gradient) [11] or through chromatic aberration denoising [12], and then train SVMs (Support Vector Machines) on these features to identify pests. Machine learning-based detection experiments have also been carried out for common wheat diseases such as Fusarium head blight and the associated deoxynivalenol contamination: multiple linear regression, ridge regression, and random forest regression, each combined with neural network techniques, yield identification models for these specific diseases [13,14,15], providing agricultural workers with an auxiliary basis for quickly and effectively diagnosing crop diseases and pests.

2.2. Deep Learning

Currently, deep learning methods demonstrate exceptional efficacy in pest detection. Compared to classical machine learning, deep learning is more robust and does not rely heavily on hand-crafted image features, resulting in better data-fitting ability. He et al. utilized the SSD (Single Shot MultiBox Detector) detection model with an inception layer to detect rapeseed pests; a Dropout layer was added to the proposed network to balance precision against the time complexity of identification and to prevent overfitting to the image data [16]. In the end, 12 typical rapeseed pests were tested under different lighting environments and backgrounds, and the experiment achieved an mAP of 77.14%. Furthermore, PestNet [17] does not use a fully connected layer; instead, it uses position-sensitive score maps and achieves a 75.46% mAP on a dataset of 16 pests. DeepPestNet is a notable breakthrough, reporting 100% accuracy in identifying major pests such as aphids, nocturnal moths, and bollworms [18].

2.3. YOLOv5

The experiments above share a common problem: the pest datasets used do not cover enough sample types or sample volumes, so in actual deployment the identification accuracy is not high and some kinds of pests cannot be identified at all. The SSD described above is a typical one-stage detection model. Within the one-stage family, the YOLO series [19,20,21] occupies an indispensable position, and this article chooses the YOLOv5 deep learning model as the basic model. YOLOv5 can detect in real time in a GPU environment, is open source and convenient to use, and offers multi-scale prediction. Its backbone adopts the lightweight CSPDarknet53 [22] network structure, which reduces the computation and memory footprint of the model to a certain extent. The backbone is mainly composed of a standard convolution Conv module, a C3 module, and an SPPF module. The neck of the model applies an FPN (Feature Pyramid Network) [23], which fuses different levels of image features into richer multi-scale features through upsampling and downsampling layers and thereby improves the precision of the final detection. The head has three detection scales, which process the features of the corresponding scales produced by the CSPDarknet53 backbone.

3. Materials and Methods

In this section, we first explain the data augmentation method, followed by a detailed description of the YOLOv5 architecture and its component blocks, as well as the lightweight convolution module we propose. Finally, we give the formulas for common object detection evaluation metrics.

3.1. Data Augmentation

The Mosaic data augmentation algorithm [24] can effectively improve the diversity of datasets and produce image data with richer semantic features, thereby improving the detection accuracy of the model. The Mosaic augmentation pipeline includes Mixup, Cutout, and CutMix operations, among others: Mixup blends two randomly chosen images in a given proportion and distributes the classification label in the same proportion; Cutout randomly fills part of an image with zero-valued pixels without changing the label; CutMix fills part of the original image with a region taken from another image in the same dataset, and the label is distributed in proportion to the filled area. The specific implementation details are shown in Figure 2. During training, four images are randomly selected for Mosaic enhancement, so a single composite image carries more target feature information and the model can learn richer semantic features.
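To make the stitching step concrete, the following is a minimal Python sketch of 4-image Mosaic composition (using OpenCV and NumPy); the canvas size, the grey fill value, and the omission of bounding-box remapping are simplifications for illustration rather than the exact implementation used by YOLOv5.

```python
import random

import cv2
import numpy as np


def mosaic4(images, size=640):
    """Simplified Mosaic: paste four randomly chosen images into the four
    quadrants of a canvas split at a randomly jittered centre point.
    Bounding-box remapping is omitted for brevity."""
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)   # grey letterbox fill
    xc = int(random.uniform(0.25 * size, 0.75 * size))       # jittered centre x
    yc = int(random.uniform(0.25 * size, 0.75 * size))       # jittered centre y
    quadrants = [(0, 0, xc, yc), (xc, 0, size, yc),
                 (0, yc, xc, size), (xc, yc, size, size)]
    for img, (x1, y1, x2, y2) in zip(random.sample(images, 4), quadrants):
        # Resize each source image to fit its quadrant and paste it in.
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas
```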

3.2. YOLOv5 Network Model

As a typical representative of one-stage detectors, YOLOv5 is a lightweight detection architecture designed for real-time detection while meeting the requirements of high detection accuracy and fast inference speed. YOLOv5 provides five models of different sizes, sorted from smallest to largest: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The model size can be changed conveniently by adjusting the width and depth scale factors in the model configuration file. YOLOv5 iterates rapidly across versions; in this article, we use version 6.0 as the baseline, and the overall architecture is shown in Figure 3. The model can be divided into three parts, the backbone, neck, and head, and its basic modules, such as Conv, C3, and SPPF, are introduced below.

3.2.1. Conv

The Conv module combines an ordinary 2D convolution, a BN (Batch Normalization) layer, and the SiLU (Sigmoid Linear Unit) activation function into a single standard convolution block, as shown in Figure 4.
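As a reference, a minimal PyTorch sketch of this Conv block is given below; it mirrors the structure described above (Conv2d, then BatchNorm, then SiLU), while the argument names and the default padding rule are only illustrative.

```python
import torch.nn as nn


class Conv(nn.Module):
    """Standard convolution block: 2D convolution -> BatchNorm -> SiLU."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```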

3.2.2. C3

In the C3 module shown in Figure 5, the input is first processed by the first Conv and a stack of bottleneck modules; these features are concatenated with the output of the second, parallel Conv, and the third Conv fuses the concatenation into the final output.
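A compact sketch of this data flow follows; it re-declares the Conv block from Section 3.2.1 so the snippet is self-contained, and the hidden-channel ratio and bottleneck count are illustrative defaults rather than the exact YOLOv5 configuration.

```python
import torch
import torch.nn as nn


def conv(c_in, c_out, k=1, s=1):
    # Conv block from Section 3.2.1: Conv2d -> BatchNorm -> SiLU.
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())


class Bottleneck(nn.Module):
    # 1x1 conv followed by 3x3 conv, with an optional residual addition.
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1, self.cv2, self.add = conv(c, c, 1), conv(c, c, 3), shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


class C3(nn.Module):
    # Two parallel 1x1 conv branches; one passes through n bottlenecks, and
    # the concatenation of both branches is fused by a final 1x1 conv.
    def __init__(self, c_in, c_out, n=1, shortcut=True):
        super().__init__()
        c_h = c_out // 2
        self.cv1, self.cv2 = conv(c_in, c_h, 1), conv(c_in, c_h, 1)
        self.cv3 = conv(2 * c_h, c_out, 1)
        self.m = nn.Sequential(*(Bottleneck(c_h, shortcut) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```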

3.2.3. SPPF

SPPF is an upgraded version of SPP that replaces the 5 × 5, 9 × 9, and 13 × 13 pooling kernels of the three max-pooling layers with a uniform 5 × 5 kernel, which improves the efficiency of the pooling stage in handling features. SPPF processes features more efficiently than SPP, and its concrete architecture is described in Figure 6.
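A minimal sketch of SPPF is shown below: three sequential 5 × 5 max-pooling layers reproduce the 5/9/13 receptive fields of SPP at a lower cost. The channel-halving 1 × 1 convolutions mirror the usual YOLOv5 layout, but the exact channel ratios are illustrative.

```python
import torch
import torch.nn as nn


def conv(c_in, c_out, k=1, s=1):
    # Conv block from Section 3.2.1: Conv2d -> BatchNorm -> SiLU.
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())


class SPPF(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_h = c_in // 2
        self.cv1 = conv(c_in, c_h, 1)
        self.cv2 = conv(4 * c_h, c_out, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)    # effective 5 x 5 receptive field
        y2 = self.pool(y1)   # two stacked pools approximate a 9 x 9 field
        y3 = self.pool(y2)   # three stacked pools approximate a 13 x 13 field
        return self.cv2(torch.cat((x, y1, y2, y3), dim=1))
```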

3.3. C3M-YOLO Model

The architecture of the C3M-YOLO model, which is an improvement on YOLOv5, is shown in Figure 7. We replaced the first C3 module of the original neck model with a C3M module. This allows the model to process features in a more flexible and diverse manner, resulting in improved detection performance. Additionally, to enable the network to recognize a wider field of view and capture more feature information from the backbone, we introduced the GAM attention mechanism.

3.3.1. C3M

The C3M architecture mainly follows the convolution structure of the C3 module, with an optimized convolution block in the bottleneck section. Figure 8 shows the new bottleneck convolution structure, called MBottleneck, which processes image features faster and more efficiently. MBottleneck draws on the convolution characteristics of MobileNetV3 to construct a new convolution module, MConv, whose detailed implementation is described in the next subsection. MBottleneck introduces a Boolean variable called “Shortcut” that determines whether the input is concatenated with the features obtained after several MConv convolutions, which enables deep network learning while avoiding the model degradation problem.

3.3.2. MConv

Figure 9 shows the architecture of MConv, which uses an “Identity” parameter to indicate whether the input and hidden layer widths of MConv are equal. If they are equal, the input is first processed by a depthwise separable convolution, then by an SE (Squeeze-and-Excitation) layer with an attention mechanism, and finally the result is obtained through a pointwise convolution. If the numbers of input and hidden channels differ, the features pass through four modules in sequence: pointwise convolution, depthwise convolution, SELayer, and pointwise convolution. Both variants have a residual parameter that determines whether the output is produced through a residual connection.
Pointwise and depthwise convolutions are a pair of lightweight convolution methods. The former only changes the number of channels of the input without changing the size of the feature map; the latter does the opposite, transforming the size of the feature map without changing the number of channels. The SELayer enhances the sensitivity of the model to features in different channels. By fusing pointwise convolution, depthwise convolution, and the SELayer, the features obtained from these three dimensions are richer and more representative, so MConv is more flexible than YOLOv5’s standard convolutions in processing image features.
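Because the exact layer hyperparameters of MConv are not spelled out in the text, the following is a hedged PyTorch reconstruction of MConv (and of the MBottleneck that stacks it) based on the description above and on MobileNetV3’s inverted-residual design; the channel counts, SE reduction ratio, activation choices, and the additive form of the shortcut are assumptions.

```python
import torch.nn as nn


class SELayer(nn.Module):
    # Squeeze-and-Excitation: global average pooling, two FC layers, channel re-weighting.
    def __init__(self, c, r=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                nn.Linear(c // r, c), nn.Hardsigmoid(inplace=True))

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w


class MConv(nn.Module):
    # Assumed reconstruction: optional pointwise expansion (skipped when the input
    # and hidden widths are equal, i.e. "Identity"), depthwise 3x3 convolution,
    # SE layer, pointwise projection, and an optional residual connection.
    def __init__(self, c_in, c_hidden, c_out, stride=1, residual=True):
        super().__init__()
        layers = []
        if c_in != c_hidden:                                    # "Identity" is False
            layers += [nn.Conv2d(c_in, c_hidden, 1, bias=False),
                       nn.BatchNorm2d(c_hidden), nn.SiLU()]     # pointwise expansion
        layers += [nn.Conv2d(c_hidden, c_hidden, 3, stride, 1,
                             groups=c_hidden, bias=False),      # depthwise convolution
                   nn.BatchNorm2d(c_hidden), nn.SiLU(),
                   SELayer(c_hidden),
                   nn.Conv2d(c_hidden, c_out, 1, bias=False),   # pointwise projection
                   nn.BatchNorm2d(c_out)]
        self.block = nn.Sequential(*layers)
        self.use_res = residual and stride == 1 and c_in == c_out

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_res else y


class MBottleneck(nn.Module):
    # Assumed reconstruction: n MConv blocks; the "Shortcut" flag combines the input
    # with the processed features (modelled here as an additive residual).
    def __init__(self, c, n=2, shortcut=True):
        super().__init__()
        self.m = nn.Sequential(*(MConv(c, 2 * c, c) for _ in range(n)))
        self.shortcut = shortcut

    def forward(self, x):
        y = self.m(x)
        return x + y if self.shortcut else y
```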

3.3.3. GAM

The GAM attention mechanism effectively amplifies cross-dimensional receptive field feature information and stably improves performance across different deep learning network architectures by combining the strengths of CA (Channel Attention) and SA (Spatial Attention). Figure 10 shows the implementation principle of GAM: the input features are processed successively by CA and SA, each combined with its input through element-wise multiplication. GAM has better data scalability and robustness than other common attention mechanisms, such as CBAM [25], ABN [26], and TAM [27]. It is therefore added at the backbone to reinforce the scalability of the algorithm and obtain more specific pest image features.
GAM is implemented as in Equations (1) and (2), where $F_{input}$ represents the input image features, $F_{output}$ the output image features, and $F'$ the intermediate features; $M_c$ and $M_s$ are the attention operations of CA and SA, respectively, and $\otimes$ denotes element-wise multiplication.

$$F' = M_c(F_{input}) \otimes F_{input} \tag{1}$$

$$F_{output} = M_s(F') \otimes F' \tag{2}$$
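A hedged PyTorch sketch of Equations (1) and (2) is given below, following the GAM paper’s design: a permute-and-MLP channel attention stage followed by a two-layer 7 × 7 convolutional spatial attention stage, each applied through element-wise multiplication; the reduction ratio and kernel size are illustrative.

```python
import torch.nn as nn


class GAM(nn.Module):
    def __init__(self, c, r=4):
        super().__init__()
        # Channel attention M_c: an MLP applied along the channel dimension.
        self.channel_mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(inplace=True),
                                         nn.Linear(c // r, c))
        # Spatial attention M_s: two 7x7 convolutions over the spatial dimensions.
        self.spatial = nn.Sequential(
            nn.Conv2d(c, c // r, 7, padding=3), nn.BatchNorm2d(c // r), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, 7, padding=3), nn.BatchNorm2d(c))

    def forward(self, x):
        # Equation (1): F' = M_c(F_input) * F_input
        att_c = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2).sigmoid()
        f = x * att_c
        # Equation (2): F_output = M_s(F') * F'
        att_s = self.spatial(f).sigmoid()
        return f * att_s
```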

3.4. Evaluation Metrics

The object detection evaluation metrics used in this paper are P (Precision), R (Recall), the F1-score, and mAP (mean Average Precision). The F1-score is more rigorous than P or R alone and better reflects the detection performance of the model. The AP (Average Precision) is the area enclosed by the P-R curve, with P on the vertical axis and R on the horizontal axis; the mAP is then obtained by averaging the AP over all classes in the dataset and is a particularly important metric in object detection. In this paper, the detection ability of the model is tested with three mAP variants: mAP0.5 is the value at an IoU threshold of 0.5, mAP0.75 is the average AP over IoU thresholds from 0.5 to 0.75 with a stride of 0.05, and mAP0.5–0.95 is the average AP over IoU thresholds from 0.5 to 0.95 with a stride of 0.05. The formulas for these metrics are given in (3)–(7):
$$P = \frac{TP}{TP + FP} \tag{3}$$

$$R = \frac{TP}{TP + FN} \tag{4}$$

$$F1 = \frac{2PR}{P + R} \tag{5}$$

$$AP = \int_0^1 P(r)\,dr \tag{6}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \tag{7}$$
The meanings of TP, TN, FP, and FN are described in Table 1. A prediction marked True means the model’s judgment is correct: the image is indeed of the category in question (TP) or indeed not of that category (TN). False indicates that the model’s judgment does not match reality.
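As a small illustration of Equations (3)–(7), the following Python helpers compute precision, recall, and F1 from the confusion counts and approximate the AP integral numerically from a sampled P-R curve; the trapezoidal integration is a simplification of the interpolation schemes used by standard detection toolkits.

```python
import numpy as np


def precision_recall_f1(tp, fp, fn):
    # Equations (3)-(5).
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def average_precision(precisions, recalls):
    # Equation (6): area under the P-R curve, approximated with the trapezoidal rule.
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))


def mean_average_precision(ap_per_class):
    # Equation (7): mean of the per-class AP values.
    return float(np.mean(ap_per_class))
```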

4. Results and Discussion

In this section, we introduce the experimental environment for the model training and the distribution of the labels in the dataset samples. We also present demonstration experiments for our proposed modules, which include ablation experiments as well as comparative experiments.

4.1. Experiment Environment Configuration

The model is implemented with the PyTorch deep learning framework in an Anaconda environment. Training was carried out on an Ubuntu 20.04 system with an RTX 3090 GPU, and testing on a Windows 10 system with an RTX 3060 GPU; the GPU is used to accelerate the training process of the model.

4.2. Label Distribution of the Training Set

The training and testing data of the IP102 pest detection dataset are split at a ratio of 7:3, and the dataset can be obtained from https://github.com/xpwu95/IP102, accessed on 20 June 2019. Figure 11 illustrates all the true bounding boxes of pests in the training set. Figure 11a shows the number of instances of each pest type. Because the number of images per pest type is unevenly distributed, it is difficult for the model to accurately identify all types of pests, which undoubtedly increases the difficulty of training.
Figure 11b shows the distribution of all the true bounding boxes of the pest images in the training set, obtained using the K-means clustering algorithm, including the distribution range of the center point coordinates (x, y) and the bounding box widths and heights. According to the color depth, where darker areas indicate a higher concentration, most of the pest targets to be detected in the training set are located in the center of the original image. Moreover, the scatter plot with width as the x-axis and height as the y-axis describes their correlation; most of the bounding boxes’ widths and heights lie above the diagonal line, showing a proportional relationship.
Based on the distribution of these bounding box sizes, the model will select three sets of initial bounding box values that are closest to the size values in the dataset, which correspond to the three scale detection heads of YOLOv5. By using these three sets of anchor box values to detect pests of different sizes, only slight modifications to the bounding box size are needed, which effectively reduces the loss of bounding boxes and improves the detection efficiency of the model.
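The anchor selection described above can be sketched as K-means clustering over the training-box widths and heights, for example as below; the use of SciPy’s kmeans and the sorting of centroids by area are illustrative choices rather than the exact YOLOv5 autoanchor routine.

```python
import numpy as np
from scipy.cluster.vq import kmeans


def cluster_anchors(box_wh, n_anchors=9):
    """Cluster (width, height) pairs of the training boxes into n_anchors
    centroids and sort them by area; YOLOv5 assigns three anchors to each
    of its three detection scales."""
    wh = np.asarray(box_wh, dtype=np.float64)
    std = wh.std(0)
    centroids, _ = kmeans(wh / std, n_anchors, iter=30)   # whitened k-means
    anchors = centroids * std
    return anchors[np.argsort(anchors.prod(axis=1))]      # smallest to largest
```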

4.3. Experimental Hyperparameter Setting

Table 2 lists the specific experimental parameters used to train the proposed network. We set the number of epochs to 300 and the batch size to 16, and resized the IP102 input images to 640 × 640. We used SGD as the optimizer with an initial learning rate of 0.01; the momentum and weight decay were set to 0.937 and 0.0005, respectively. This optimizer effectively accelerated the model’s training process and achieved optimal detection performance.
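For reference, the optimizer settings in Table 2 correspond to the following PyTorch construction; the function name and the cosine learning-rate decay over the 300 training epochs are assumptions for illustration (YOLOv5 configures its own schedule through its hyperparameter file).

```python
import torch


def build_optimizer(model: torch.nn.Module, epochs: int = 300):
    # Table 2: SGD with learning rate 0.01, momentum 0.937, weight decay 0.0005.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.937, weight_decay=0.0005)
    # Assumed cosine decay of the learning rate over the training epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```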

4.4. Training Results of the C3M-YOLO

Figure 12 and Figure 13 show the relationships of P and R, respectively, with the confidence score. Each black line represents the change in detection accuracy for one pest class in the IP102 dataset, while the blue line represents the average over all pests. Overall, the precision P of all pests increases with the confidence, while the recall R decreases with it, and the blue curve is relatively smooth. The maximum precision obtained by the improved model is 0.909, and the corresponding maximum recall value is 0.880.
Figure 14 and Figure 15 show the P-R curve plots for the original and improved models of YOLOv5. The results indicate that the improved model performs better than the original model on the training set, with an increase of 0.9% in mAP0.5. However, we noticed that some pests in the training results exhibit severe fluctuations in the detection accuracy, possibly due to insufficient samples for these pests. As a result, the detection model is unable to learn sufficient feature information, leading to misidentification.
Figure 16 shows the changing trend of the training loss and detection accuracy of C3M-YOLO for each epoch on the training and validation datasets. The figures indicate that the boundary loss (box_loss), confidence score loss (obj_loss), and classification loss (cls_loss) decrease gradually with an increase in the training epochs. Moreover, the corresponding detection precision, recall, and mAP values show significant improvement within the first 100 epochs, followed by slow growth from 100 epochs to 250 epochs, finally achieving convergence.

4.5. Ablation Experiments

The ablation experiments in Table 3 show how the detection metrics mAP, P, and F1-score of the original model change after the new modules are introduced. The results indicate that both the C3M module, with its more flexible convolutions, and the GAM module, with its expanded receptive field, effectively improve the accuracy of the model. Introducing both modules together achieves the best detection performance on the IP102 dataset, which supports the rationality of combining them.

4.6. Parameter Comparison Experiment

The experiments in Table 4 show how the model parameters change after the C3M module is introduced. With C3M, the model’s detection precision improves by 4.6% while the inference speed also increases. Compared with the original model, C3M reduces the number of network layers and cuts the parameter volume by 2.3% and the GFLOPs by 3%.

4.7. Comparative Experiments with Other Models

Table 5 presents a comparison of our proposed algorithm with several current object detection algorithms, including FPN [23], TOOD [28], SSD300 [29], PAA [30], Dynamic R-CNN [31], Sparse R-CNN [32], YOLOv3 [21], and YOLOX [33]. Compared to YOLOX, C3M-YOLO’s detection metrics are undoubtedly outstanding. Specifically, the mAP0.5–0.95 has increased by 3.8%, while the mAP0.5 and mAP0.75 have increased by 5.1% and 6.2%, respectively, as shown in the numerical results. Therefore, the improved model has achieved the best detection accuracy, enabling it to better recognize pest images in IP102.

4.8. Presentation of Detection Results

Figure 17 shows the validation detection results of the model on the training set. The first row presents the true labels of the images, the second row displays the detection results of YOLOv5s, and the last row shows the recognition results of C3M-YOLO. Compared to the original model, the improved model detects the pest images more accurately: the detection confidence score for several pests, such as black cutworms, army worms, mole crickets, and blister beetles, increases by 0.2, resulting in higher detection precision.
Figure 18 compares the detection results of different models on several original insect images. For each image, the left side shows the test results of the original model, and the right side presents the results of the improved model. It can be observed that compared to YOLOv5s, C3M-YOLO has slightly increased the detection confidence for Cicadellidae, rice leaf roller, and other pests, especially for the corn borer, with a significant improvement of 0.22 in the confidence score.

5. Conclusions

First, we carried out data augmentation on the IP102 dataset using Mosaic enhancement so that the model could extract more detailed feature information and perform better in various real-world scenarios. Second, our proposed C3M module flexibly handles image features while also improving the model’s inference speed. Third, the introduction of the GAM attention mechanism expands the model’s receptive field, enabling it to effectively learn the feature information processed by the backbone in the neck of the model. Subsequent ablation experiments verified the rationality of our improvement strategy.
To address the issue of imbalanced sample sizes in the IP102 pest dataset, we can extract effective feature information by capturing global image features or by using more versatile convolutional methods to improve the model’s detection accuracy. Additionally, we found in the experiments that the detection model produced inaccurate anchor boxes and even lost detection confidence for small pest categories, such as Unaspis yanonensis and Aleurocanthus spiniferus. Therefore, in future work we will optimize the model’s detection performance for small target pests in complex backgrounds to better fit practical application scenarios.

Author Contributions

Methodology, L.Z. and C.Z.; Dataset preparation, D.L. and Y.F.; Experiments, L.Z. and C.Z.; Original draft, Y.F. and C.Z.; Visualization, L.Z. and D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the NSFC, grant number 61806024; JLEE, grant number 202107; the Jilin Province Science and Technology Development Plan Key Research and Development Project, grant number 20210204050YY; and the Wuxi University Research Start-up Fund for Introduced Talents, grant numbers 2023r004 and 2023r006.

Data Availability Statement

All the data mentioned in the paper are available through the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Davis, K.F.; Gephart, J.A.; Emery, K.A.; Leach, A.M.; Galloway, J.N.; D’Odorico, P. Meeting Future Food Demand with Current Agricultural Resources. Glob. Environ. Chang. 2016, 39, 125–132. [Google Scholar] [CrossRef]
  2. Singh, A.; Dhiman, N.; Kar, A.K.; Singh, D.; Purohit, M.P.; Ghosh, D.; Patnaik, S. Advances in Controlled Release Pesticide Formulations: Prospects to Safer Integrated Pest Management and Sustainable Agriculture. J. Hazard. Mater. 2020, 385, 121525. [Google Scholar] [CrossRef] [PubMed]
  3. Abate, T.; van Huis, A.; Ampofo, J.K.O. Pest Management Strategies in Traditional Agriculture: An African Perspective. Annu. Rev. Entomol. 2000, 45, 631–659. [Google Scholar] [CrossRef] [PubMed]
  4. Gonzalez-de-Santos, P.; Ribeiro, A.; Fernandez-Quintanilla, C.; Lopez-Granados, F.; Brandstoetter, M.; Tomic, S.; Pedrazzi, S.; Peruzzi, A.; Pajares, G.; Kaplanis, G. Fleets of Robots for Environmentally-Safe Pest Control in Agriculture. Precis. Agric. 2017, 18, 574–614. [Google Scholar] [CrossRef] [Green Version]
  5. Wu, X.; Zhan, C.; Lai, Y.-K.; Cheng, M.-M.; Yang, J. Ip102: A Large-Scale Benchmark Dataset for Insect Pest Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8787–8796. [Google Scholar]
  6. Glenn, J.; Alex, S.; Ayush, C.; Jirka, B. Ultralytics/Yolov5: V6.0—YOLOv5n “Nano” Models, Roboflow Integration, TensorFlow Export, OpenCV DNN Support, Version 6.0; Zenodo: Honolulu, HI, USA, 2021. [CrossRef]
  7. Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27–28 October 2019; pp. 1314–1324. [Google Scholar]
  8. Liu, Y.; Shao, Z.; Hoffmann, N. Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions. arXiv 2021, arXiv:2112.05561. [Google Scholar]
  9. Durgabai, R.P.L.; Bhargavi, P. Pest Management Using Machine Learning Algorithms: A Review. Int. J. Comput. Sci. Eng. Inf. Technol. Res. 2018, 8, 13–22. [Google Scholar]
  10. Tripathy, A.K.; Adinarayana, J.; Sudharsan, D.; Merchant, S.N.; Desai, U.B.; Vijayalakshmi, K.; Reddy, D.R.; Sreenivas, G.; Ninomiya, S.; Hirafuji, M. Data Mining and Wireless Sensor Network for Agriculture Pest/Disease Predictions. In Proceedings of the 2011 World Congress on Information and Communication Technologies, Mumbai, India, 11–14 December 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1229–1234. [Google Scholar]
  11. Liu, T.; Chen, W.; Wu, W.; Sun, C.; Guo, W.; Zhu, X. Detection of Aphids in Wheat Fields Using a Computer Vision Technique. Biosyst. Eng. 2016, 141, 82–93. [Google Scholar] [CrossRef]
  12. Yao, Q.; Lv, J.; Liu, Q.; Diao, G.; Yang, B.; Chen, H.; Tang, J. An Insect Imaging System to Automate Rice Light-Trap Pest Identification. J. Integr. Agric. 2012, 11, 978–985. [Google Scholar] [CrossRef]
  13. Hooker, D.C.; Schaafsma, A.W.; Tamburic-Ilincic, L. Using Weather Variables Pre-and Post-Heading to Predict Deoxynivalenol Content in Winter Wheat. Plant Dis. 2002, 86, 611–619. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Váňová, M.; Klem, K.; Matušinský, P.; Trnka, M. Prediction Model for Deoxynivalenol in Wheat Grain Based on Weather Conditions. Plant Prot. Sci. 2009, 45, S33–S37. [Google Scholar] [CrossRef] [Green Version]
  15. Rutkoski, J.; Benson, J.; Jia, Y.; Brown-Guedira, G.; Jannink, J.-L.; Sorrells, M. Evaluation of Genomic Prediction Methods for Fusarium Head Blight Resistance in Wheat. Plant Genome 2012, 5. [Google Scholar] [CrossRef] [Green Version]
  16. He, Y.; Zeng, H.; Fan, Y.; Ji, S.; Wu, J. Application of Deep Learning in Integrated Pest Management: A Real-Time System for Detection and Diagnosis of Oilseed Rape Pests. Mob. Inf. Syst. 2019, 2019, 4570808. [Google Scholar] [CrossRef] [Green Version]
  17. Liu, L.; Wang, R.; Xie, C.; Yang, P.; Wang, F.; Sudirman, S.; Liu, W. PestNet: An End-to-End Deep Learning Approach for Large-Scale Multi-Class Pest Detection and Classification. IEEE Access 2019, 7, 45301–45312. [Google Scholar] [CrossRef]
  18. Ullah, N.; Khan, J.A.; Alharbi, L.A.; Raza, A.; Khan, W.; Ahmad, I. An Efficient Approach for Crops Pests Recognition and Classification Based on Novel DeepPestNet Deep Learning Model. IEEE Access 2022, 10, 73019–73032. [Google Scholar] [CrossRef]
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  20. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  21. Redmon, J.; Farhadi, A. Yolov3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  22. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  23. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  24. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  25. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  26. Fukui, H.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. Attention Branch Network: Learning of Attention Mechanism for Visual Explanation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10705–10714. [Google Scholar]
  27. Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to Attend: Convolutional Triplet Attention Module. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; pp. 3139–3148. [Google Scholar]
  28. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
  29. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Volume 9905, pp. 21–37. [Google Scholar]
  30. Kim, K.; Lee, H.S. Probabilistic Anchor Assignment with IoU Prediction for Object Detection. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020. [Google Scholar]
  31. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020. [Google Scholar]
  32. Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  33. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
Figure 1. The main work of the article.
Figure 2. Sixteen examples with some interference characteristics of Mosaic data enhancement.
Figure 3. YOLOv5s model structure diagram.
Figure 4. Conv standard convolutional structure diagram.
Figure 5. C3 module structure diagram.
Figure 6. SPPF module structure diagram.
Figure 7. C3M-YOLO module structure diagram.
Figure 8. C3M structure diagram.
Figure 9. MConv convolutional architecture diagram.
Figure 10. GAM attention mechanism schematic.
Figure 11. (a) The quantity of each pest type in the IP102 dataset of the training set. (b) Use the K-means clustering method to calculate the center point coordinates, width, height, and their correlations of all target pests in the dataset.
Figure 12. Graph of the relationship between Precision and Confidence score for all pests.
Figure 13. Graph of the relationship between Recall and Confidence score for all pests.
Figure 14. P-R diagram of YOLOv5s for all pests.
Figure 15. P-R diagram of C3M-YOLO for all pests.
Figure 16. Summary of C3M-YOLO training results.
Figure 17. Comparison of image results during training.
Figure 18. Detection results of different pest images.
Table 1. Determination of the relationship between the predicted situation and the real situation.

Real Situation    Prediction: True    Prediction: False
Positive          TP                  FP
Negative          TN                  FN
Table 2. Training hyperparameter information.

Parameter        Value or Name
Epochs           300
Batch size       16
Input size       640 × 640
Optimizer        SGD
Learning rate    0.01
Momentum         0.937
Weight decay     0.0005
Table 3. Ablation experiments.

Method             mAP0.5   mAP0.75   mAP0.5–0.95   P      F1-Score
YOLOv5s            56.2     36.8      34.1          55.0   55.7
YOLOv5s + GAM      56.1     37.3      34.3          56.2   57.0
YOLOv5s + C3M      57.0     37.4      34.5          59.6   56.6
C3M-YOLO (ours)    57.2     38.5      34.9          57.4   57.5
Table 4. The magnitude and inference speed change with C3M.

Model       Layers   Parameters   GFLOPs   FPS
YOLOv5s     157      7,285,219    16.6     92
With C3M    152      7,121,123    16.1     97
Table 5. Comparative experiments.

Model                 mAP0.5   mAP0.75   mAP0.5–0.95
FPN [23]              54.9     23.3      28.1
TOOD [28]             43.9     28.7      26.5
SSD300 [29]           47.2     16.6      21.5
PAA [30]              42.7     26.1      25.2
Dynamic R-CNN [31]    50.7     30.3      29.4
Sparse R-CNN [32]     33.2     23.8      21.1
YOLOv3 [21]           50.6     21.8      25.7
YOLOX [33]            52.1     32.3      31.1
C3M-YOLO (ours)       57.2     38.5      34.9

Share and Cite

MDPI and ACS Style

Zhang, L.; Zhao, C.; Feng, Y.; Li, D. Pests Identification of IP102 by YOLOv5 Embedded with the Novel Lightweight Module. Agronomy 2023, 13, 1583. https://doi.org/10.3390/agronomy13061583
