Article

A Lightweight Crop Pest Detection Method Based on Improved RTMDet

Wanqing Wang 1,* and Haoyue Fu 2,*
1 College of Life Sciences, Northwest Normal University, Lanzhou 730070, China
2 College of Mathematics, Northeastern University, Shenyang 110819, China
* Authors to whom correspondence should be addressed.
Information 2024, 15(9), 519; https://doi.org/10.3390/info15090519
Submission received: 11 July 2024 / Revised: 13 August 2024 / Accepted: 21 August 2024 / Published: 26 August 2024

Abstract
To address the issues of low detection accuracy and large model size in crop pest detection in natural scenes, this study improves a deep learning object detection model and proposes RTMDet++, a lightweight and accurate method for crop pest detection. First, the real-time object detection network RTMDet is used to design the pest detection model. Then, the backbone and neck structures are pruned to reduce the number of parameters and the computation. Subsequently, a shortcut connection module is added to each of the classification and regression branches to enhance feature learning and thereby improve accuracy. Experimental results show that, compared to the original RTMDet, the improved RTMDet++ reduces the number of parameters by 15.5% and the computation by 25.0% while improving the mean average precision by 0.3% on the crop pest dataset IP102. RTMDet++ achieves a mAP of 94.1%, a precision of 92.5%, and a recall of 92.7% with 4.117M parameters and 3.130G FLOPs, outperforming other object detection methods. The proposed RTMDet++ thus achieves higher performance with fewer parameters and less computation, making it applicable to crop pest detection in practice and useful for pest control research.

1. Introduction

As the global population continues to grow, increasing food production is essential to sustain human life. However, the limited availability of arable land complicates this challenge, making the sustainability and scientific precision of crop protection crucial for future agricultural advancement [1]. In modern agriculture, effective pest and disease management is key to improving and maintaining crop yields [2]. While pesticide application plays a significant role in mitigating crop losses, pesticide residues and the resulting environmental pollution are often underestimated [3,4,5]. To meet increased food production needs while protecting natural habitats and minimizing environmental pollution, automated precision pesticide application is a promising method for optimizing pesticide use and ensuring soil safety [6]. Accurate detection of pest and disease locations is a prerequisite for achieving this goal.
Crop pests and diseases are among the main negative factors affecting agricultural production, causing enormous economic losses every year. According to the Food and Agriculture Organization of the United Nations, global crop losses due to pests and diseases can reach about 40% of total production. Moreover, plant diseases cause economic losses exceeding USD 220 billion annually, and invasive insects cause damages of at least USD 70 billion [7]. Therefore, the research and application of precise and efficient pest and disease detection technologies are not only urgently needed to improve agricultural production efficiency but are also key to ensuring global food security and ecological security.
Initially, crop pest and disease detection relied on manual patrols, empirical judgments, or the extensive use of chemical pesticides. These methods are not only inefficient and costly, but also lead to environmental pollution, decreased biodiversity, and increased pest resistance when there is an over-reliance on chemical pesticides. With the advancement of technology, especially in information technology, biotechnology, and intelligent equipment, modern pest and disease detection is gradually shifting towards more precise, eco-friendly, and intelligent approaches.
Subsequently, research primarily focused on pest and disease identification based on traditional image processing techniques. These methods use shallow features such as color, texture, and shape, together with traditional algorithms, to identify pests and diseases. For example, Shi Fengmei et al. proposed a color image segmentation method for rice blast disease based on Support Vector Machines (SVMs); by selecting positive and negative training samples and extracting RGB color component feature vectors for training, the method achieved classification and segmentation of image pixels [8]. M.A. Ebrahimi et al. used an SVM with a difference kernel function for automated thrips detection in strawberry greenhouse monitoring; by integrating the ratio of major to minor diameter as a regional feature and intensity as a color feature into the SVM model, they kept the mean error below 2.25% [9]. Zhu Juanhua et al. applied digital image processing and pattern recognition techniques to the automated diagnosis of corn leaf diseases, covering image preprocessing, lesion segmentation, and feature extraction, with identification based on shape parameters; the model achieved an 80% diagnosis accuracy [10]. Zhang Yongling et al. proposed a method for identifying rice insect images based on multi-feature fusion and sparse representation; the method fused HSV, HOG, Gabor, and LBP features and constructed an over-complete dictionary for the sparse representation of test images, reaching a highest recognition rate of 90.1% and a lowest false detection rate of 5.2% [11]. Although these methods can achieve accurate identification by customizing specific disease features, they rely heavily on manually designed features. Due to the limitations of feature selection, they generalize poorly to diverse pest and disease morphologies and complex environmental conditions, and are easily affected by background interference and changes in lighting.
In recent years, the rapid development of deep learning, particularly the advent of Convolutional Neural Networks (CNNs) [12,13] and object detection algorithms such as the YOLO series [14,15,16,17,18], has provided powerful technical support for the automated detection of crop pests and diseases. These algorithms automatically learn features for identifying and localizing pests and diseases from images, enabling fast and accurate diagnosis and significantly improving detection efficiency and accuracy. For example, José G.M. et al. used CNNs to segment and classify different types of leaf spot diseases to assess the degree of biotic damage in coffee leaves and developed an Android app [19]. Zhang Meng et al. introduced a clustering method based on the Rao-1 algorithm to optimize the anchor box sizes of Yolov3, constructing a target function using the intersection-over-union (IoU) to generate the most representative anchor boxes and improving the discrimination between normal and damaged apples [20]. Park, H.M. et al. applied a YOLOv4-Tiny model with a circular bounding box to accurately determine the optimal chrysanthemum harvest time, significantly outperforming traditional rectangular boxes in classifying flower bloom stages and detecting circular objects and demonstrating excellent scalability and cross-domain potential [21]. Fu Xueqian et al. introduced an improved method for recognizing crop pest and disease images, effectively selecting the most salient regions by partitioning blocks and using a self-attention mechanism to better mine features of non-obvious diseased areas [22]. Chinna G.S. et al. proposed a pest detection algorithm combining foreground extraction with contour recognition, which can effectively identify insects against complex backgrounds; using nine-fold cross-validation to optimize model performance, the highest classification accuracies for 9 and 24 classes of pests reached 91.5% and 90%, respectively [23]. He Yiting et al. proposed an improved coffee leaf disease detection algorithm based on YOLOv5; by incorporating the ConvNeXt network and the ECA attention mechanism, they strengthened the model's feature extraction and reduced missed detections of occluded and small targets, reaching a mean average precision of 94.13% with a model size of 17.2 MB [24]. Zhang Lijuan et al. enhanced the C2F module and introduced DCF for extracting vital features based on Yolov8, while employing the Mish activation function to improve non-linear learning; the improved method boosts mAP, precision, and recall by 2%, 1.3%, and 3.7%, respectively [25].
Despite significant research breakthroughs, existing technologies still face many challenges in practical application, including but not limited to high model complexity that makes deployment difficult in resource-constrained agricultural environments, the impact of changing lighting conditions and complex backgrounds on detection accuracy, and the need to handle diverse pest and disease types with significant morphological variation. Therefore, developing a lightweight, high-precision pest and disease detection system that can adapt to complex natural environments has become a pressing scientific issue.
In light of this, the aim of this study is to improve the RTMDet object detection model, designing and implementing a lightweight, efficient detection system for crop pests. The main contributions are summarized as follows:
  • We propose RTMDet++, a lightweight and accurate pest detection model built by improving RTMDet.
  • We adopt a pruning strategy to optimize the RTMDet structure and reduce model complexity, and introduce a shortcut connection module to enhance the model’s feature extraction capability and improve detection accuracy.
  • We conduct experiments on the IP102 dataset, which contains pest data collected in natural environments, to evaluate the proposed RTMDet++, as shown in Figure 1, ensuring that the method is applicable to real-world crop pest detection tasks.

2. Materials and Methods

2.1. RTMDet Model

RTMDet [26] is a high-precision, low-latency single-stage object detector with an overall architecture similar to YOLOX [27], as shown in Figure 2. RTMDet uses CSPNeXt as the backbone, CSPNeXtPAFPN as the neck, and SepBNHead as the detection head. SepBNHead comprises classification and regression heads whose convolutional weights are shared across the pyramid feature maps, while batch normalization is computed separately for each map. Specifically, the backbone CSPNeXt extracts three feature maps C3, C4, C5 from the input image. These are sent to the CSPNeXtPAFPN, which includes top-down and bottom-up processes and produces feature maps M3, M4, M5 at the intermediate CSPLayers. Next, M3, M4, M5 are processed by a convolutional module to generate pyramid feature maps P3, P4, P5 with the same number of channels, which are shared by the classification and regression prediction heads. Each head uses two stacked convolutional layers to extract features for object classification and bounding box prediction, respectively. The detailed structure and parameters of the model can be found in [28].
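As a structural aid, the following PyTorch sketch traces this backbone → neck → head data flow. The submodules are placeholders passed in by the caller; only the wiring reflects the description above, not the authors' implementation.

```python
import torch
from torch import nn

class RTMDetSketch(nn.Module):
    """Structural sketch of RTMDet's data flow; the submodules are stand-ins."""

    def __init__(self, backbone, neck, cls_head, reg_head):
        super().__init__()
        self.backbone = backbone  # CSPNeXt: image -> (C3, C4, C5)
        self.neck = neck          # CSPNeXtPAFPN: (C3, C4, C5) -> (P3, P4, P5)
        self.cls_head = cls_head  # conv weights shared across pyramid levels,
        self.reg_head = reg_head  # batch norm computed separately per level

    def forward(self, image):
        c3, c4, c5 = self.backbone(image)
        p3, p4, p5 = self.neck((c3, c4, c5))
        # the same prediction heads run on every pyramid feature map
        cls_outs = [self.cls_head(p) for p in (p3, p4, p5)]
        reg_outs = [self.reg_head(p) for p in (p3, p4, p5)]
        return cls_outs, reg_outs
```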

2.2. Pruning the RTMDet Model

In the original RTMDet structure, the backbone CSPNeXt has a total of five layers, consisting of one Stem layer and four Stage layers. The Stem layer includes three 3 × 3 convolutional modules and each module consists of a 3 × 3 convolutional layer, a batch normalization layer, and a SiLU activation layer. The structure of the Stem layer is shown in Figure 3a.
The Stage layer consists of one 3 × 3 convolutional module and one CSPLayer, with the structure of the convolutional module being the same as that of the Stem layer. The CSPLayer consists of three 1 × 1 convolutional modules, three serial CSPNeXt modules, and a channel attention module. The structure of the Stage layer is shown in Figure 3b.
Each CSPNeXt module consists of one 3 × 3 convolutional module and one 5 × 5 depthwise separable convolutional module, with the structure of the depthwise separable convolutional module being the same as that of the convolutional module, except for the use of depthwise separable convolutional layers. ChannelAtt consists of a global average pooling layer, a 1 × 1 convolutional layer, and a HardSigmoid activation layer. Their structures are shown in Figure 4a and Figure 4b, respectively.
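To make these building blocks concrete, here is a hedged PyTorch sketch of the conv module, the CSPNeXt block, and ChannelAtt as described above. The residual add inside the block and the purely depthwise 5 × 5 convolution are simplifying assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

def conv_module(c_in, c_out, k, depthwise=False):
    """Conv + BatchNorm + SiLU, as described for the Stem/Stage conv modules."""
    groups = c_in if depthwise else 1
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=groups, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class CSPNeXtBlock(nn.Module):
    """A 3x3 conv module followed by a 5x5 depthwise conv module; the
    residual add is an assumption made for this sketch."""
    def __init__(self, channels):
        super().__init__()
        self.conv = conv_module(channels, channels, 3)
        self.dwconv = conv_module(channels, channels, 5, depthwise=True)

    def forward(self, x):
        return x + self.dwconv(self.conv(x))

class ChannelAtt(nn.Module):
    """Global average pooling -> 1x1 conv -> HardSigmoid channel gating."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        w = F.hardsigmoid(self.fc(x.mean(dim=(2, 3), keepdim=True)))
        return x * w
```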
In the original RTMDet’s neck structure, the feature maps C3, C4, C5 output from the last three Stage layers of the backbone network are used as inputs, and after the feature fusion process of the neck’s CSPNeXtPAFPN, which includes top-down and bottom-up operations, the feature maps M3, M4, M5 are output at the intermediate three CSPLayers. Then, M3, M4, M5 are processed by a 3 × 3 convolutional module to output the pyramid feature maps P3, P4, P5.
Considering that the CSPLayers in the backbone and neck of RTMDet serially connect too many CSPNeXt modules, and that the detection head repetitively stacks convolutional layers, the model carries an unnecessarily large number of parameters and computation. We therefore prune the model: the three CSPNeXt modules in each CSPLayer are reduced to one, and the detection head retains a single convolutional layer each for classification and regression prediction. Subsequent experiments verify that this pruning significantly reduces the number of parameters and computation while only slightly reducing performance.
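In mmyolo, this kind of pruning can be expressed through the model config. The fragment below is illustrative only: the key names (deepen_factor, stacked_convs) follow mmdet/mmyolo conventions, but the specific values, and whether the authors pruned via these exact knobs, are assumptions on our part.

```python
# Hedged, illustrative mmyolo-style config fragment for the pruning step.
model = dict(
    backbone=dict(
        type='CSPNeXt',
        # shrink each CSPLayer from three serial CSPNeXt blocks toward one,
        # e.g. by lowering the depth multiplier (value here is a placeholder)
        deepen_factor=0.33,
    ),
    neck=dict(
        type='CSPNeXtPAFPN',
        deepen_factor=0.33,  # the neck CSPLayers are pruned the same way
    ),
    bbox_head=dict(
        head_module=dict(
            stacked_convs=1,  # keep a single conv layer per prediction branch
        ),
    ),
)
```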

2.3. Shortcut Connection Module

In the pruned RTMDet detection head structure, the classification and regression prediction heads retain only one 3 × 3 convolutional layer each for extracting classification and regression features, respectively. Inspired by [29], we added shortcut connections to the classification and regression branches of the detection head to enhance the learning of classification and regression features, as shown by the red lines in Figure 5, which can be formulated as
$$F_i = f(P_i) + P_i,$$
$$G_i = g(P_i) + P_i,$$
where $f$ and $g$ are the operations of the 3 × 3 convolutional layers of the classification and regression heads, respectively, and $P_i$ is the pyramid feature map of the $i$-th stage, $i = 3, 4, 5$.
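A minimal PyTorch sketch of the modified branches is shown below; it implements exactly $F_i = f(P_i) + P_i$ and $G_i = g(P_i) + P_i$ with one Conv+BN+SiLU module per branch, and omits the final prediction convolutions.

```python
import torch
from torch import nn

def branch_conv(channels):
    """Single 3x3 conv module (Conv + BN + SiLU) left after pruning."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.SiLU(inplace=True),
    )

class ShortcutHead(nn.Module):
    """Pruned head with residual shortcuts on both branches."""

    def __init__(self, channels):
        super().__init__()
        self.f = branch_conv(channels)  # classification branch
        self.g = branch_conv(channels)  # regression branch

    def forward(self, p):
        cls_feat = self.f(p) + p  # shortcut keeps the raw pyramid features
        reg_feat = self.g(p) + p
        return cls_feat, reg_feat
```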
The total loss used for training RTMDet is
$$L_{total} = w_1 L_{cls} + w_2 L_{reg},$$
where $L_{cls} = \mathrm{QualityFocalLoss}(p, q)$ and $L_{reg} = \mathrm{GIoULoss}(p, q)$ are the classification and regression loss functions, $p$ and $q$ are the predicted and true values, and $w_1$ and $w_2$ are the weights of the classification and regression losses, respectively.
During training, model parameters are updated through back-propagation to minimize the loss function, reducing the discrepancy between predictions and the ground truth. During testing, detection boxes with classification scores lower than 0.1 are first removed. The remaining boxes are then sorted in descending order of classification score; the box with the highest score is retained, and its intersection-over-union (IoU) with the other boxes is calculated. Boxes whose IoU exceeds 0.5 are removed. This process repeats until all detection boxes have been traversed.
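This test-time procedure is score filtering followed by greedy non-maximum suppression. Below is a self-contained sketch using torchvision's box_iou; the function name and tensor conventions are ours, not part of RTMDet.

```python
import torch
from torchvision.ops import box_iou

def filter_and_nms(boxes, scores, score_thr=0.1, iou_thr=0.5):
    """Drop low-score boxes, then greedily keep the highest-scoring box and
    suppress any remaining box whose IoU with it exceeds iou_thr.
    boxes: (N, 4) tensor in (x1, y1, x2, y2) format; scores: (N,) tensor."""
    keep = scores >= score_thr                 # remove scores lower than 0.1
    boxes, scores = boxes[keep], scores[keep]
    order = scores.argsort(descending=True)    # sort by descending score
    kept = []
    while order.numel() > 0:
        i = order[0]
        kept.append(i.item())                  # retain the top-scoring box
        if order.numel() == 1:
            break
        ious = box_iou(boxes[i].unsqueeze(0), boxes[order[1:]]).squeeze(0)
        order = order[1:][ious <= iou_thr]     # discard boxes with IoU > 0.5
    return boxes[kept], scores[kept]
```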

2.4. Dataset Preparation

We conducted experiments using the open-source crop pest and disease dataset IP102 [30]. The dataset is intended for classification and detection of crop pests and diseases, primarily containing pest images for crops such as rice, corn, wheat, pepper, alfalfa, grapes, citrus, and bananas. It includes over 75,000 images spanning 102 pest categories, of which approximately 19,000 are annotated with bounding boxes for object detection. We divided the images with bounding boxes into training, validation, and test sets at a ratio of 6:1:1, containing 14,231, 2372, and 2372 images, respectively. IP102 exhibits a natural long-tail distribution, reflecting the uneven distribution of pest types in the real world, which increases the challenge of identification. The dataset has a hierarchical structure in which each subclass is grouped into a superclass according to the main crop it harms; in our experiments, we detected pests at the superclass level. Sample images from the IP102 dataset are shown in Figure 6.
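A simple way to realize this 6:1:1 split is sketched below. The random shuffling and seed are our assumptions, though rounding one eighth of the 18,975 annotated images for each of the validation and test sets reproduces the reported 14,231/2372/2372 counts.

```python
import random

def split_ids(ids, seed=0):
    """Shuffle image ids and split them 6:1:1 into train/val/test."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_val = n_test = round(n / 8)    # one eighth each for val and test
    n_train = n - n_val - n_test     # remaining six eighths for training
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

# With n = 18975 this yields 14231 / 2372 / 2372 images.
```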

2.5. Training and Testing

We built the model on the open-source projects mmdetection [31] and mmyolo [32], using the PyTorch deep learning framework and the Python programming language for our experiments. The experiments were conducted on a Linux server equipped with a GPU, using the CUDA parallel computing architecture and the cuDNN acceleration library for training and testing [33]. Platform details are shown in Table 1, and the training and testing hyperparameters are shown in Table 2; other settings are consistent with mmyolo. The variation in training losses and validation mAP over the iterations is shown in Figure 7, indicating that the model is learning and that the bias between predictions and the ground truth decreases.
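For orientation, the Table 2 settings map onto plain PyTorch as in the sketch below; the stand-in model is a placeholder, and in our experiments these values are actually supplied through mmdetection/mmyolo configs rather than set by hand.

```python
import torch
from torch import nn

model = nn.Conv2d(3, 16, 3)  # stand-in for the RTMDet++ network

# Hyperparameters taken from Table 2; scheduler and augmentation details
# follow mmyolo defaults and are not reproduced here.
optimizer = torch.optim.AdamW(model.parameters(), lr=0.004, weight_decay=0.05)
INPUT_SIZE = (512, 512)
BATCH_SIZE = 32
EPOCHS = 150
SCORE_THR = 0.1   # test-time score threshold
IOU_THR = 0.5     # NMS / matching IoU threshold
```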
We used the mean average precision (mAP), precision, recall, and F1 score as metrics to evaluate the model’s performance [34]. A detection is counted as correct when the intersection over union (IoU) of the predicted bounding box with a ground-truth bounding box of the corresponding category is 0.5 or higher. The metrics are defined as follows:
Precision denotes the proportion of samples that are truly positive within those predicted as positive. It measures the accuracy of the model’s predictions for positive samples, which can be formulated as
$$P = \frac{TP}{TP + FP},$$
where $TP$ is the number of true positives and $FP$ is the number of false positives.
Recall denotes the proportion of samples that are correctly classified as positive within those that are truly positive. It measures the model’s ability to identify all positive samples, which can be represented as
$$R = \frac{TP}{TP + FN},$$
where $TP$ is the number of true positives and $FN$ is the number of false negatives.
F1 score denotes the harmonic mean of precision and recall, aiming to provide a comprehensive evaluation, suitable for scenarios with imbalanced samples. The calculation method can be written as
$$F1 = \frac{2PR}{P + R}.$$
Mean average precision (mAP) denotes the average of the average precision (AP) across various categories, used to measure the detection performance of the algorithm across all categories. The calculation method is shown as
$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i,$$
where $n$ is the number of target categories and $AP_i$ is the average precision for the $i$-th category, equivalent to the area under the precision–recall (PR) curve.
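These four formulas translate directly into code. The sketch below assumes the TP/FP/FN counting (IoU ≥ 0.5 matching) and per-class AP computation have already been done upstream.

```python
def detection_metrics(tp, fp, fn, ap_per_class):
    """Compute precision, recall, F1, and mAP from raw counts and per-class APs."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mean_ap = sum(ap_per_class) / len(ap_per_class)
    return precision, recall, f1, mean_ap

# Sanity check against the headline numbers: P = 0.925 and R = 0.927
# give F1 = 2 * 0.925 * 0.927 / (0.925 + 0.927) ≈ 0.926.
```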

3. Results

We compared other models with the improved model RTMDet++ on various metrics. The experimental results are shown in Table 3.
Performance: Comparing the improved model RTMDet++ with other models on the various metrics shows that RTMDet++ generally outperforms models such as SSD [35], YoloX, and Yolov7, reaching 94.1% mAP, 92.5% precision, 92.7% recall, and 92.6% F1 score.
The improved model RTMDet++ maintains a relatively low parameter count (M) and computation (GFLOPs) while keeping high precision. This is particularly evident when compared to Faster R-CNN, which has similar accuracy but far more parameters and computation than RTMDet++. The improved model thus achieves a better balance between accuracy, parameters, and computation.
Improvement: Compared with the original RTMDet, the improved model RTMDet++ reduces the parameters by 15.5% and the computation by 25.0%, while improving mAP, precision, recall, and F1 by 0.3%, 0.6%, 0.6%, and 0.6%, respectively, indicating that the improvement strategy not only reduces the parameters and computation but also effectively improves detection performance. The visualized pest detection results of RTMDet++ are shown in Figure 8.
By optimizing the model design, the improved model RTMDet++ effectively controls the parameters and computation while ensuring detection accuracy, achieving good performance compared to existing mainstream models, suitable for scenarios with high requirements for detection accuracy and computational complexity.
Ablation experiment: We conducted an ablation experiment to assess the impact of each improvement module (pruning and shortcut connections) on model performance, as shown in Table 4.
Pruning: By comparing the first two rows, it can be seen that the pruning module causes a slight drop in performance but significantly reduces the parameters and computation. mAP, precision, recall, and F1 scores decreased by 0.2%, 0.6%, 1.0%, and 0.8%, respectively, while the parameters decreased from 4.873M to 4.117M, and the computation (Flops) decreased from 4.173G to 3.129G, representing reductions of 15.5% and 25.0%, respectively.
Shortcut Connection: By comparing the second and third rows, it can be seen that the shortcut connection module improves model performance without increasing the parameters and only slightly increasing the computation. mAP, precision, recall, and F1 scores are improved by 0.5%, 1.2%, 1.6%, and 1.4%, respectively. This indicates that the shortcut connection module enhances the model’s feature learning ability, thereby improving its performance.

4. Discussion

In response to the challenges of insufficient detection accuracy and large model sizes for pest and disease detection in natural environments, this research optimized and innovatively applied the RTMDet model. By integrating model pruning and shortcut connections, we constructed a lightweight and accurate detection model RTMDet++. In terms of innovation, we first optimized the model structure using network pruning technology, which not only significantly reduced the number of parameters and computation but also ensured efficient operation, facilitating practical application in resource-limited scenarios such as real-time monitoring in farm fields. According to the comparative experimental results, the improved model reduced the parameters by 15.5% and the computation by 25.0%, with only a slight decrease in detection performance. Secondly, we added a shortcut connection module to the classification and regression branches of the model, and this innovative module significantly strengthened the model’s ability to learn features, allowing the model to better retain and transmit feature information during the training process, improving the detection effect under complex backgrounds.
Furthermore, this research conducted experiments on the crop pest and disease dataset IP102 and provided comprehensive performance assessments, including average precision, precision, recall, and F1 scores. The results showed that the improved model achieved 94.1% average precision, 92.5% precision, 92.7% recall, and 92.6% F1 score with relatively small parameters and computation, outperforming other target detection methods. This provides new directions and ideas for the development of future crop pest and disease detection systems.

5. Conclusions

To address the problems of low detection accuracy and high model complexity in crop pest and disease detection in natural scenes, this paper has made innovative improvements to the RTMDet model, proposing RTMDet++, a lightweight and accurate crop pest detection model. Experiments show that the improved model reduces parameters and computation while also enhancing detection accuracy. The main results are as follows:
  • We provided a useful method RTMDet++ for the real-time monitoring and control of crop pests and diseases in practice, which holds important theoretical and practical value.
  • We made the RTMDet model lightweight through pruning technology, reducing the number of parameters by 15.5% and the computation by 25.0%, significantly lowering the model’s complexity.
  • We introduced a shortcut connection module, which enhanced the RTMDet model’s feature learning capability, resulting in a 0.3% improvement in average precision, reaching 94.1%. This increased the detection accuracy while keeping the model lightweight.
Pest and disease detection in particularly complex backgrounds requires further in-depth research. Future work will focus on optimizing the model structure to improve stability and accuracy in complex and changeable environments. In addition, although this paper’s primary objective is pest detection, we will also consider enhancing the generalization capability of our model on other relevant datasets, such as the CCMT dataset, which contains images of diseased plant leaves. At the same time, more lightweight technologies will be explored to meet the needs of mobile devices and edge computing, promoting the intelligent development of modern agriculture.

Author Contributions

Conceptualization, W.W.; methodology, W.W. and H.F.; software, W.W. and H.F.; validation, W.W. and H.F.; formal analysis, H.F.; investigation, W.W. and H.F.; resources, W.W.; data curation, W.W. and H.F.; writing—original draft preparation, W.W.; writing—review and editing, H.F.; visualization, W.W. and H.F.; supervision, H.F.; project administration, W.W. and H.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset analyzed during the current study is accessed on 12 February 2024 and publicly available at https://github.com/xpwu95/IP102.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Godfray, H.C.J.; Beddington, J.R.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Pretty, J.; Robinson, S.; Thomas, S.M.; Toulmin, C. Food Security: The Challenge of Feeding 9 Billion People. Science 2010, 327, 812–818. [Google Scholar] [CrossRef] [PubMed]
  2. Oerke, E.C. Crop losses to pests. J. Agric. Sci. 2006, 144, 31–43. [Google Scholar] [CrossRef]
  3. Skovgaard, M.; Renjel Encinas, S.; Jensen, O.C.; Andersen, J.H.; Condarco, G.; Jørs, E. Pesticide Residues in Commercial Lettuce, Onion, and Potato Samples From Bolivia—A Threat to Public Health? Environ. Health Insights 2017, 11, 1178630217704194. [Google Scholar] [CrossRef] [PubMed]
  4. Xu, J.; Zhu, J.H.; Yang, Y.L.; Tang, H.; Lü, H.P.; Fan, M.S.; Shi, Y.; Dong, D.F.; Wang, G.J.; Wang, W.X.; et al. Status of Major Diseases and Insect Pests of Potato and Pesticide Usage in China. Sci. Agric. Sin. 2019, 52, 2800–2808. [Google Scholar] [CrossRef]
  5. Editorial Committee of China Agricultural Yearbook. Chinese Agriculture Yearbook; China Agriculture Press: Beijing, China, 2017. [Google Scholar]
  6. Zhang, F.; Chen, X.; Vitousek, P. An experiment for the world. Nature 2013, 497, 33–35. [Google Scholar] [CrossRef]
  7. Gullino, M.; Albajes, R.; Al-Jboory, I.; Angelotti, F.; Chakraborty, S.; Garrett, K.; Hurley, B.; Juroszek, P.; Makkouk, K.; Pan, X.; et al. Scientific Review of the Impact of Climate Change on Plant Pests: A Global Challenge to Prevent and Mitigate Plant-Pest Risks in Agriculture, Forestry and Ecosystems; Food and Agriculture Organization of the United Nations: Rome, Italy, 2021. [Google Scholar]
  8. Shi, F.; Zhao, K.; Meng, Q.; Ma, L. Research on Image Segmentation of Rice Blast Based on Support Vector Machine. J. Northeast. Agric. Univ. 2013, 44, 128–135. [Google Scholar] [CrossRef]
  9. Ebrahimi, M.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-based pest detection based on SVM classification method. Comput. Electron. Agric. 2017, 137, 52–58. [Google Scholar] [CrossRef]
  10. Zhu, J.H.; Wu, A.; Li, P. Corn leaf diseases diagnostic techniques based on image recognition. In Proceedings of Communications and Information Processing: International Conference, ICCIP 2012, Aveiro, Portugal, 7–11 March 2012; Revised Selected Papers, Part I; Springer: Berlin/Heidelberg, Germany, 2012; pp. 334–341. [Google Scholar] [CrossRef]
  11. Zhang, Y.; Jiang, M.; Yu, P.; Yao, Q.; Yang, B.; Tang, J. Agricultural pest identification based on multi-feature fusion and sparse representation. Sci. Agric. Sin. 2018, 51, 2084–2093. [Google Scholar] [CrossRef]
  12. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  13. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  15. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  16. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Wong, C.; Yifu, Z.; Montes, D.; et al. ultralytics/yolov5: v6.2-YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations. Zenodo 2022. [Google Scholar] [CrossRef]
  17. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar] [CrossRef]
  18. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-time flying object detection with YOLOv8. arXiv 2023, arXiv:2305.09972. [Google Scholar]
  19. Esgario, J.G.; de Castro, P.B.; Tassis, L.M.; Krohling, R.A. An app to assist farmers in the identification of diseases and pests of coffee leaves using deep learning. Inf. Process. Agric. 2022, 9, 38–47. [Google Scholar] [CrossRef]
  20. Zhang, M.; Liang, H.; Wang, Z.; Wang, L.; Huang, C.; Luo, X. Damaged apple detection with a hybrid YOLOv3 algorithm. Inf. Process. Agric. 2022, 11, 163–171. [Google Scholar] [CrossRef]
  21. Park, H.M.; Park, J.H. YOLO Network with a Circular Bounding Box to Classify the Flowering Degree of Chrysanthemum. AgriEngineering 2023, 5, 1530–1543. [Google Scholar] [CrossRef]
  22. Fu, X.; Ma, Q.; Yang, F.; Zhang, C.; Zhao, X.; Chang, F.; Han, L. Crop pest image recognition based on the improved ViT method. Inf. Process. Agric. 2023, 11, 249–259. [Google Scholar] [CrossRef]
  23. Simhadri, C.G.; Kondaveeti, H.K.; Vatsavayi, V.K.; Mitra, A.; Ananthachari, P. Deep learning for rice leaf disease detection: A systematic literature review on emerging trends, methodologies and techniques. Inf. Process. Agric. 2024. [Google Scholar] [CrossRef]
  24. He, Y.T.; Lin, Y.; Zeng, Y.L. Improved detection of coffee leaf diseases and insect pests based on YOLOv5. J. Anhui Agric. Sci. 2023, 51, 221–226. [Google Scholar] [CrossRef]
  25. Zhang, L.; Ding, G.; Li, C.; Li, D. DCF-Yolov8: An Improved Algorithm for Aggregating Low-Level Features to Detect Agricultural Pests and Diseases. Agronomy 2023, 13, 2012. [Google Scholar] [CrossRef]
  26. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784. [Google Scholar]
  27. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  28. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet Configs: An Empirical Study of Designing Real-Time Object Detectors 2022. Available online: https://github.com/open-mmlab/mmyolo/tree/main/configs/rtmdet (accessed on 10 August 2023).
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  30. Wu, X.; Zhan, C.; Lai, Y.K.; Cheng, M.M.; Yang, J. Ip102: A large-scale benchmark dataset for insect pest recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8787–8796. [Google Scholar] [CrossRef]
  31. Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open mmlab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155. [Google Scholar]
  32. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. MMYOLO: OpenMMLab YOLO Series Toolbox and Benchmark. 2022. Available online: https://github.com/open-mmlab/mmyolo/tree/main (accessed on 10 August 2023).
  33. Chetlur, S.; Woolley, C.; Vandermersch, P.; Cohen, J.; Tran, J.; Catanzaro, B.; Shelhamer, E. cuDNN: Efficient Primitives for Deep Learning. 2014. Available online: https://developer.nvidia.com/cudnn (accessed on 20 January 2023).
  34. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar]
  35. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar] [CrossRef]
Figure 1. Comparisons of computations (GFLOPs), parameters (M), and mean average precision (mAP) on the IP102 dataset.
Figure 2. The architecture of the RTMDet model. The red box marks a pest detected by RTMDet.
Figure 3. The structures of Stem and Stage. (a) represents Stem and (b) represents Stage.
Figure 4. The structures of the CSPNeXt Block and the ChannelAtt Block. (a) shows CSPNeXt and (b) shows ChannelAtt.
Figure 5. The shortcut connection module in the detection head.
Figure 6. Sample images with different categories and conditions from the IP102 dataset.
Figure 7. The variation in training losses and validation mAP with the iterations during training.
Figure 8. The visualized pest detection results (red boxes) of RTMDet++ for the IP102 sample images.
Table 1. The information of the experiment platform.

Platform   Version
System     Ubuntu 20.04
CUDA       11.3
cuDNN      8.2
Python     3.8
PyTorch    1.10.1
GPU        Nvidia RTX 4090
Table 2. The hyperparameter configuration for the experiment.

Hyperparameter    Configuration
input size        512 × 512
batch size        32
optimizer         AdamW
learning rate     0.004
weight decay      0.05
score threshold   0.1
train epochs      150
IoU threshold     0.5
Table 3. Comparison experiment results on the IP102 dataset.

Method        mAP (%)   P (%)   R (%)   F1 (%)   Params (M)   FLOPs (G)
SSD           83.6      81.7    81.9    81.8     2.124        4.119
Yolov3        87.8      90.5    90.3    90.4     2.765        2.521
YoloX         91.4      90.8    90.7    90.7     5.033        3.937
Yolov7        91.8      92.1    92.4    92.3     6.015        3.406
Faster-RCNN   92.1      92.4    92.5    92.4     28.279       40.751
RTMDet        93.8      91.9    92.1    92.0     4.873        4.173
RTMDet++      94.1      92.5    92.7    92.6     4.117        3.130
Table 4. Ablation experiment results on the IP102 dataset.

Pruning   Shortcut   mAP (%)   P (%)   R (%)   F1 (%)   Params    FLOPs
–         –          93.8      91.9    92.1    92.0     4.873M    4.173G
✓         –          93.6      91.3    91.1    91.2     4.117M    3.129G
✓         ✓          94.1      92.5    92.7    92.6     4.117M    3.130G