Overview of Pest Detection and Recognition Algorithms
Abstract
1. Introduction
2. Literature Collection Methods
1. Initial search: we conducted a comprehensive search of multiple academic databases, including Google Scholar, IEEE Xplore, and arXiv, using keywords related to pest detection, pest recognition, and smart agriculture.
2. Manual search: to ensure the inclusion of pioneering works and of the latest publications not yet indexed in the databases, we manually reviewed the references of key articles and checked relevant conference proceedings and journals.
3. Screening process: the initial search yielded numerous articles, which we screened by title and abstract to exclude irrelevant studies. The full texts of the remaining articles were then reviewed against our inclusion criteria: peer-reviewed journal articles and conference papers published between 2020 and 2024, written in English, and providing substantial insights or advancements in pest detection or pest recognition using algorithms. A toy sketch of such a screening filter appears after this list.
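In effect, the inclusion criteria above amount to a simple filter over search results. The sketch below is only illustrative; the record fields are hypothetical stand-ins for whatever a database export actually provides.

```python
# Hypothetical records as they might come from a database export.
records = [
    {"title": "Pest detection with improved YOLOv7", "year": 2023,
     "language": "en", "peer_reviewed": True},
    {"title": "Weed mapping from UAV imagery", "year": 2019,
     "language": "en", "peer_reviewed": True},
]

def passes_screening(r: dict) -> bool:
    """Inclusion criteria: peer-reviewed, 2020-2024, English, on-topic."""
    on_topic = any(k in r["title"].lower()
                   for k in ("pest detection", "pest recognition"))
    return (r["peer_reviewed"] and 2020 <= r["year"] <= 2024
            and r["language"] == "en" and on_topic)

included = [r for r in records if passes_screening(r)]  # keeps only the first record
```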
3. Background
3.1. CNN Architecture
3.1.1. ResNet
3.1.2. YOLO
3.2. Transformer Architecture
3.2.1. Vision Transformer
3.2.2. Detection Transformer
3.3. Comparison of CNN and Transformer
3.4. Evaluation Metrics
3.4.1. Evaluation Metrics for Recognition
- Accuracy
- Precision
- Recall
- F1-Score
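For reference, with TP, FP, TN, and FN denoting true positives, false positives, true negatives, and false negatives, these four metrics follow their standard definitions:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$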
3.4.2. Evaluation Metrics for Detection
- Average Precision
- mean Average Precision
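For reference, the average precision (AP) of a class is the area under its precision–recall curve, and mAP averages AP over all N classes; detection additionally requires a box-overlap criterion, typically IoU ≥ 0.5:

$$AP = \int_0^1 p(r)\,dr, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$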
4. Datasets
4.1. IP102 Dataset
4.2. Pest24 Dataset
5. Review of Algorithms
5.1. Pest Detection Algorithms
5.1.1. Algorithms on the IP102 Dataset
5.1.2. Algorithms on the Pest24 Dataset
5.1.3. Algorithms on Self-Collected Datasets
5.2. Pest Recognition Algorithms
5.2.1. Algorithms on the IP102 Dataset
5.2.2. Algorithms on Self-Collected Datasets
6. Challenges and Future Research Directions
6.1. Challenges
6.1.1. Complex Agricultural Environments
6.1.2. Variability in Pest Appearance
6.1.3. Small and Densely Distributed Pests
6.2. Future Research Directions
6.2.1. Directions in Pest Detection
- Optimization of deep learning models: the challenges of complex agricultural environments and variability in pest appearance in detection tasks can be addressed by continuously optimizing deep learning models and enhancing their feature extraction capabilities. Ref. [96] proposed an improved YOLOv7 model in which the original ELAN and ELAN-W modules were replaced by the CSPResNeXt-50 module and the VoVGSCSP module, respectively. Compared to Faster R-CNN, YOLOv3, YOLOv5-X, YOLOv7, and YOLOv7-X, its mAP is higher by 21.3%, 4.6%, 10.8%, 5.5%, and 4.8%, respectively, demonstrating the significant gains available from optimizing deep learning models.
- Integration of attention mechanisms: the challenge of small and densely distributed pests in detection tasks can be addressed with attention mechanisms. Experimental results from [97,98,99] show that the model from [99], which integrates a channel attention mechanism, outperforms the models from [97,98], and the results from [103,104,105,106,107] likewise demonstrate the excellent performance achieved by models incorporating attention mechanisms. By employing attention, a model can focus on key areas within the image, such as specific parts of the pests, thereby improving detection accuracy on small targets (a minimal channel-attention sketch follows this list).
- Hybrid architecture networks: a hybrid architecture network typically combines the strengths of several network structures and techniques to overcome the limitations a single network may encounter. Ref. [108] proposed a hybrid architecture combining a CNN and a Transformer; compared to Faster R-CNN, SSD, YOLOv5, YOLOv7, and YOLOv8, its average accuracy is higher by 17.04%, 11.23%, 5.78%, 3.75%, and 2.71%, respectively, demonstrating that the hybrid model integrates the fast inference speed of YOLO with the high accuracy of the Transformer (a toy hybrid backbone is also sketched after this list).
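As a concrete illustration of the channel-attention idea, here is a minimal squeeze-and-excitation-style module in PyTorch. It is a generic sketch of the technique, not the specific attention modules used in [97,98,99]; the class name and reduction ratio are our own choices.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative sketch)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average per channel
        self.fc = nn.Sequential(             # excitation: learn per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                   # emphasize informative channels

# Example: reweight a 64-channel feature map from a detector backbone.
features = torch.randn(2, 64, 32, 32)
attended = ChannelAttention(64)(features)   # same shape, channels reweighted
```

Inserting such a block after a backbone stage lets a detector amplify channels that respond to small pest targets while suppressing background clutter.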
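The hybrid CNN–Transformer idea can likewise be sketched at a high level: convolutions supply local features cheaply, and a transformer encoder then models global context over the resulting feature map. This toy backbone is our own construction under that assumption, not the architecture of [108].

```python
import torch
import torch.nn as nn

class HybridBackbone(nn.Module):
    """Toy CNN + Transformer hybrid backbone (illustrative sketch)."""

    def __init__(self, channels: int = 64, depth: int = 2, heads: int = 4):
        super().__init__()
        self.stem = nn.Sequential(  # CNN stage: local feature extraction
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)  # global stage

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.stem(x)                       # (B, C, H, W) local features
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        return self.encoder(tokens)            # globally contextualized tokens

tokens = HybridBackbone()(torch.randn(1, 3, 64, 64))  # -> (1, 256, 64)
```

A detection head (class and box predictors) would consume the output tokens; position encodings, multi-scale features, and the head design are where real systems differ.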
6.2.2. Directions in Pest Recognition
- Multi-image fusion: the challenges of complex agricultural environments and variability in pest appearance in recognition tasks can be mitigated by fusing information from multiple images. Ref. [115] proposed a multi-image fusion recognition approach based on ResNet-50; it achieved accuracies of 96.1% and 100% when fusing five images on the IP102 dataset and two images on the D0 dataset, respectively, illustrating the great potential of multi-image fusion methods (a simplified score-level fusion sketch appears after this list).
- Multimodal feature fusion: experimental results from [121,122] demonstrate that Transformer models using multimodal feature fusion achieve outstanding recognition accuracy on self-collected image–text multimodal datasets. Beyond visual data, therefore, integrating other modalities such as text and audio into pest detection and recognition models should also be considered: multimodal datasets provide more comprehensive pest feature information, enhancing an algorithm's generalization ability and accuracy.
- Applying transfer learning with pre-trained models: deep learning models typically require substantial annotated data for training, and transfer learning offers an effective remedy. Leveraging models pre-trained on extensive datasets allows rapid adaptation to novel pest recognition tasks even under data constraints. Experimental results from [116], as shown in Table 6, demonstrate that an ensemble of CNNs trained via transfer learning achieves excellent results on a self-collected dataset (a minimal transfer-learning sketch follows this list).
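A much-simplified view of multi-image fusion is score-level averaging: run the classifier on several images of the same specimen and average the predicted class probabilities. This sketch illustrates only that fusion step; [115] additionally localizes and adaptively filters features, which is not reproduced here.

```python
import torch

@torch.no_grad()
def fused_prediction(model: torch.nn.Module, images: list[torch.Tensor]) -> int:
    """Average class probabilities over several images of the same pest."""
    probs = torch.stack([model(img.unsqueeze(0)).softmax(dim=-1) for img in images])
    return int(probs.mean(dim=0).argmax(dim=-1))  # fused class index
```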
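The transfer-learning recipe itself is standard. Below is a minimal sketch with torchvision, assuming an ImageNet-pre-trained ResNet-50; the ensemble training of [116] is not shown, and the class count and freezing policy are illustrative choices.

```python
import torch.nn as nn
from torchvision import models

def build_pest_classifier(num_classes: int, freeze_backbone: bool = True) -> nn.Module:
    """Re-head a pre-trained ResNet-50 for pest recognition."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    if freeze_backbone:
        for p in model.parameters():
            p.requires_grad = False  # keep ImageNet features fixed at first
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head
    return model

model = build_pest_classifier(num_classes=102)  # e.g., the 102 classes of IP102
```

Only the new head is trained initially; once it converges, the backbone can be unfrozen with a small learning rate for fine-tuning.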
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Sajitha, P.; Andrushia, A.D.; Anand, N.; Naser, M. A Review on Machine Learning and Deep Learning Image-based Plant Disease Classification for Industrial Farming Systems. J. Ind. Inf. Integr. 2024, 38, 100572.
2. Ebrahimi, M.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-based pest detection based on SVM classification method. Comput. Electron. Agric. 2017, 137, 52–58.
3. Rajan, P.; Radhakrishnan, B.; Suresh, L.P. Detection and classification of pests from crop images using support vector machine. In Proceedings of the 2016 International Conference on Emerging Technological Trends (ICETT), Kollam, India, 21–22 October 2016; pp. 1–6.
4. Sethy, P.K.; Bhoi, C.; Barpanda, N.K.; Panda, S.; Behera, S.K.; Rath, A.K. Pest Detection and Recognition in Rice Crop Using SVM in Approach of Bag-Of-Words. In Proceedings of the International Conference on Software and System Processes, Paris, France, 5–7 July 2017.
5. Ashok, P.; Jayachandran, J.; Gomathi, S.S.; Jayaprakasan, M. Pest detection and identification by applying color histogram and contour detection by SVM model. Int. J. Eng. Adv. Technol. 2019, 8, 463–467.
6. Kasinathan, T.; Uyyala, S.R. Machine learning ensemble with image processing for pest identification and classification in field crops. Neural Comput. Appl. 2021, 33, 7491–7504.
7. Kasinathan, T.; Singaraju, D.; Uyyala, S.R. Insect classification and detection in field crops using modern machine learning techniques. Inf. Process. Agric. 2021, 8, 446–457.
8. Pattnaik, G.; Parvathy, K. Machine learning-based approaches for tomato pest classification. TELKOMNIKA Telecommun. Comput. Electron. Control 2022, 20, 321–328.
9. Kakulapati, V.; Saiteja, S.; Raviteja, S.; Reddy, K.R. A Novel Approach Of Pest Recognition By Analyzing Ensemble Modeling. Solid State Technol. 2020, 63, 1696–1704.
10. Yang, Z.; Li, W.; Li, M.; Yang, X. Automatic greenhouse pest recognition based on multiple color space features. Int. J. Agric. Biol. Eng. 2021, 14, 188–195.
11. Luo, Q.; Xin, W.; Qiming, X. Identification of pests and diseases of Dalbergia hainanensis based on EVI time series and classification of decision tree. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2017; Volume 69, p. 012162.
12. Banlawe, I.A.P.; Cruz, J.C.D.; Gaspar, J.C.P.; Gutierrez, E.J.I. Decision tree learning algorithm and naïve Bayes classifier algorithm comparative classification for mango pulp weevil mating activity. In Proceedings of the 2021 IEEE International Conference on Automatic Control & Intelligent Systems (I2CACIS), Online, 26 June 2021; pp. 317–322.
13. Sangeetha, T.; Lavanya, G.; Jeyabharathi, D.; Kumar, T.R.; Mythili, K. Detection of pest and disease in banana leaf using convolution Random Forest. Test Eng. Manag. 2020, 83, 3727–3735.
14. Sharma, S.; Kumar, V.; Sood, S. Pest Detection Using Random Forest. In Proceedings of the 2023 International Conference on IoT, Communication and Automation Technology (ICICAT), Gorakhpur, India, 23–24 June 2023; pp. 1–8.
15. Pusadan, M.Y.; Abdullah, A.I. k-Nearest Neighbor and Feature Extraction on Detection of Pest and Diseases of Cocoa. J. RESTI Rekayasa Sist. Dan Teknol. Inf. 2022, 6, 471–480.
16. Li, Y.; Ercisli, S. Data-efficient crop pest recognition based on KNN distance entropy. Sustain. Comput. Inform. Syst. 2023, 38, 100860.
17. Resti, Y.; Irsan, C.; Putri, M.T.; Yani, I.; Ansyori, A.; Suprihatin, B. Identification of corn plant diseases and pests based on digital images using multinomial naïve bayes and k-nearest neighbor. Sci. Technol. Indones. 2022, 7, 29–35.
18. Chen, J.W.; Lin, W.J.; Cheng, H.J.; Hung, C.L.; Lin, C.Y.; Chen, S.P. A smartphone-based application for scale pest detection using multiple-object detection methods. Electronics 2021, 10, 372.
19. Süto, J. Embedded system-based sticky paper trap with deep learning-based insect-counting algorithm. Electronics 2021, 10, 1754.
20. Góral, P.; Pawłowski, P.; Piniarski, K.; Dąbrowski, A. Multi-Agent Vision System for Supporting Autonomous Orchard Spraying. Electronics 2024, 13, 494.
21. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
22. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
23. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
24. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
25. Wagle, S.A.; Varadarajan, V.; Kotecha, K. A new compact method based on a convolutional neural network for classification and validation of tomato plant disease. Electronics 2022, 11, 2994.
26. Yi, S.L.; Qin, S.L.; She, F.R.; Wang, T.W. RED-CNN: The multi-classification network for pulmonary diseases. Electronics 2022, 11, 2896.
27. Zhu, Z.; Wang, S.; Zhang, Y. ROENet: A ResNet-based output ensemble for malaria parasite classification. Electronics 2022, 11, 2040.
28. Fu’adah, Y.N.; Lim, K.M. Classification of Atrial Fibrillation and Congestive Heart Failure Using Convolutional Neural Network with Electrocardiogram. Electronics 2022, 11, 2456.
29. Rajeena P.P., F.; Orban, R.; Vadivel, K.S.; Subramanian, M.; Muthusamy, S.; Elminaam, D.S.A.; Nabil, A.; Abulaigh, L.; Ahmadi, M.; Ali, M.A. A novel method for the classification of butterfly species using pre-trained CNN models. Electronics 2022, 11, 2016.
30. Amin, R.; Reza, M.S.; Okuyama, Y.; Tomioka, Y.; Shin, J. A Fine-Tuned Hybrid Stacked CNN to Improve Bengali Handwritten Digit Recognition. Electronics 2023, 12, 3337.
31. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
32. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015.
34. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
35. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
36. Akhtar, M.J.; Mahum, R.; Butt, F.S.; Amin, R.; El-Sherbeeny, A.M.; Lee, S.M.; Shaikh, S. A robust framework for object detection in a traffic surveillance system. Electronics 2022, 11, 3425.
37. Cong, P.; Lv, K.; Feng, H.; Zhou, J. Improved YOLOv3 model for workpiece stud leakage detection. Electronics 2022, 11, 3430.
38. Amran, G.A.; Alsharam, M.S.; Blajam, A.O.A.; Hasan, A.A.; Alfaifi, M.Y.; Amran, M.H.; Gumaei, A.; Eldin, S.M. Brain tumor classification and detection using hybrid deep tumor network. Electronics 2022, 11, 3457.
39. Dai, J.; Li, T.; Xuan, Z.; Feng, Z. Automated defect analysis system for industrial computerized tomography images of solid rocket motor grains based on YOLO-v4 model. Electronics 2022, 11, 3215.
40. Gu, Z.; Zhu, K.; You, S. YOLO-SSFS: A Method Combining SPD-Conv/STDL/IM-FPN/SIoU for Outdoor Small Target Vehicle Detection. Electronics 2023, 12, 3744.
41. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
42. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
43. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
44. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
45. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
46. Bhan, A.; Mangipudi, P.; Goyal, A. Deep Learning Approach for Automatic Segmentation and Functional Assessment of LV in Cardiac MRI. Electronics 2022, 11, 3594.
47. Gargari, M.S.; Seyedi, M.H.; Alilou, M. Segmentation of Retinal Blood Vessels Using U-Net++ Architecture and Disease Prediction. Electronics 2022, 11, 3516.
48. Yang, D.; Wang, C.; Cheng, C.; Pan, G.; Zhang, F. Semantic segmentation of side-scan sonar images with few samples. Electronics 2022, 11, 3002.
49. Xu, F.; Huang, J.; Wu, J.; Jiang, L. Active mask-box scoring R-CNN for sonar image instance segmentation. Electronics 2022, 11, 2048.
50. Xie, X.; Bai, L.; Huang, X. Real-time LiDAR point cloud semantic segmentation for autonomous driving. Electronics 2021, 11, 11.
51. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
52. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
53. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318.
54. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
55. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
56. Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814.
57. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
58. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
59. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
60. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
61. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014.
62. Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014, arXiv:1409.1259.
63. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
64. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025.
65. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
66. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
67. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
68. Lample, G.; Conneau, A.; Denoyer, L.; Ranzato, M. Unsupervised machine translation using monolingual corpora only. arXiv 2017, arXiv:1711.00043.
69. See, A.; Liu, P.J.; Manning, C.D. Get to the point: Summarization with pointer-generator networks. arXiv 2017, arXiv:1704.04368.
70. Liu, Y.; Lapata, M. Text summarization with pretrained encoders. arXiv 2019, arXiv:1908.08345.
71. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
72. Henaff, O. Data-efficient image recognition with contrastive predictive coding. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; pp. 4182–4192.
73. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 10347–10357.
74. Zhang, Q.; Yang, Y.B. ResT: An efficient transformer for visual recognition. Adv. Neural Inf. Process. Syst. 2021, 34, 15475–15485.
75. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500.
76. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229.
77. Sun, Z.; Cao, S.; Yang, Y.; Kitani, K.M. Rethinking transformer-based set prediction for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 3611–3620.
78. Li, Y.; Mao, H.; Girshick, R.; He, K. Exploring plain vision transformer backbones for object detection. In Proceedings of the European Conference on Computer Vision 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 280–296.
79. Xia, L.; Cao, S.; Cheng, Y.; Niu, L.; Zhang, J.; Bao, H. Rotating Object Detection for Cranes in Transmission Line Scenarios. Electronics 2023, 12, 5046.
80. Huo, L.; Guo, K.; Wang, W. An Adaptive Multi-Content Complementary Network for Salient Object Detection. Electronics 2023, 12, 4600.
81. Wang, Y.; Xu, Z.; Wang, X.; Shen, C.; Cheng, B.; Shen, H.; Xia, H. End-to-end video instance segmentation with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 8741–8750.
82. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 19–25 June 2021; pp. 6881–6890.
83. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
84. Jiao, C.; Yang, T.; Yan, Y.; Yang, A. RFTNet: Region–Attention Fusion Network Combined with Dual-Branch Vision Transformer for Multimodal Brain Tumor Image Segmentation. Electronics 2023, 13, 77.
85. Baek, J.H.; Lee, H.K.; Choo, H.G.; Jung, S.h.; Koh, Y.J. Center-Guided Transformer for Panoptic Segmentation. Electronics 2023, 12, 4801.
86. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. ViViT: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 6836–6846.
87. Neimark, D.; Bar, O.; Zohar, M.; Asselmann, D. Video transformer network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 3163–3172.
88. Yang, J.; Dong, X.; Liu, L.; Zhang, C.; Shen, J.; Yu, D. Recurring the transformer for video action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 14063–14073.
89. Ranasinghe, K.; Naseer, M.; Khan, S.; Khan, F.S.; Ryoo, M.S. Self-supervised video transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 2874–2884.
90. Liang, J.; Cao, J.; Fan, Y.; Zhang, K.; Ranjan, R.; Li, Y.; Timofte, R.; Van Gool, L. VRT: A video restoration transformer. IEEE Trans. Image Process. 2024, 33, 2171–2182.
91. Wu, X.; Zhan, C.; Lai, Y.K.; Cheng, M.M.; Yang, J. IP102: A large-scale benchmark dataset for insect pest recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 8787–8796.
92. Xie, C.; Wang, R.; Zhang, J.; Chen, P.; Dong, W.; Li, R.; Chen, T.; Chen, H. Multi-level learning features for automatic classification of field crop pests. Comput. Electron. Agric. 2018, 152, 233–241.
93. Wang, Q.J.; Zhang, S.Y.; Dong, S.F.; Zhang, G.C.; Yang, J.; Li, R.; Wang, H.Q. Pest24: A large-scale very small object data set of agricultural pests for multi-target detection. Comput. Electron. Agric. 2020, 175, 105585.
94. Nanni, L.; Manfè, A.; Maguolo, G.; Lumini, A.; Brahnam, S. High performing ensemble of convolutional neural networks for insect pest image detection. Ecol. Inform. 2022, 67, 101515.
95. Chen, M.; Chen, Y.; Guo, M.; Wang, J. Pest Detection and Identification Guided by Feature Maps. In Proceedings of the 2023 Twelfth International Conference on Image Processing Theory, Tools and Applications (IPTA), Paris, France, 16–19 October 2023; pp. 1–6.
96. Yang, S.; Xing, Z.; Wang, H.; Dong, X.; Gao, X.; Liu, Z.; Zhang, X.; Li, S.; Zhao, Y. Maize-YOLO: A new high-precision and real-time method for maize pest detection. Insects 2023, 14, 278.
97. Tang, Z.; Chen, Z.; Qi, F.; Zhang, L.; Chen, S. Pest-YOLO: Deep image mining and multi-feature fusion for real-time agriculture pest detection. In Proceedings of the 2021 IEEE International Conference on Data Mining (ICDM), Auckland, New Zealand, 7–10 December 2021; pp. 1348–1353.
98. Tang, Z.; Lu, J.; Chen, Z.; Qi, F.; Zhang, L. Improved Pest-YOLO: Real-time pest detection based on efficient channel attention mechanism and transformer encoder. Ecol. Inform. 2023, 78, 102340.
99. Qi, F.; Wang, Y.; Tang, Z.; Chen, S. Real-time and effective detection of agricultural pest using an improved YOLOv5 network. J. Real-Time Image Process. 2023, 20, 33.
100. Wang, F.; Wang, R.; Xie, C.; Yang, P.; Liu, L. Fusing multi-scale context-aware information representation for automatic in-field pest detection and recognition. Comput. Electron. Agric. 2020, 169, 105222.
101. Jiao, L.; Li, G.; Chen, P.; Wang, R.; Du, J.; Liu, H.; Dong, S. Global context-aware-based deformable residual network module for precise pest recognition and detection. Front. Plant Sci. 2022, 13, 895944.
102. Dai, M.; Dorjoy, M.M.H.; Miao, H.; Zhang, S. A new pest detection method based on improved YOLOv5m. Insects 2023, 14, 54.
103. Yang, Z.; Feng, H.; Ruan, Y.; Weng, X. Tea tree pest detection algorithm based on improved YOLOv7-Tiny. Agriculture 2023, 13, 1031.
104. Tian, Y.; Wang, S.; Li, E.; Yang, G.; Liang, Z.; Tan, M. MD-YOLO: Multi-scale Dense YOLO for small target pest detection. Comput. Electron. Agric. 2023, 213, 108233.
105. Chu, J.; Li, Y.; Feng, H.; Weng, X.; Ruan, Y. Research on multi-scale pest detection and identification method in granary based on improved YOLOv5. Agriculture 2023, 13, 364.
106. Li, K.; Wang, J.; Jalil, H.; Wang, H. A fast and lightweight detection algorithm for passion fruit pests based on improved YOLOv5. Comput. Electron. Agric. 2023, 204, 107534.
107. Chen, J.; Chen, W.; Nanehkaran, Y.; Suzauddola, M. MAM-IncNet: An end-to-end deep learning detector for Camellia pest recognition. Multimed. Tools Appl. 2024, 83, 31379–31394.
108. Ye, R.; Gao, Q.; Qian, Y.; Sun, J.; Li, T. Improved YOLOv8 and SAHI Model for the Collaborative Detection of Small Targets at the Micro Scale: A Case Study of Pest Detection in Tea. Agronomy 2024, 14, 1034.
109. Liu, W.; Wu, G.; Ren, F.; Kang, X. DFF-ResNet: An insect pest recognition model based on residual networks. Big Data Min. Anal. 2020, 3, 300–310.
110. Ayan, E.; Erbay, H.; Varçın, F. Crop pest classification with a genetic algorithm-based weighted ensemble of deep convolutional neural networks. Comput. Electron. Agric. 2020, 179, 105809.
111. Feng, F.; Dong, H.; Zhang, Y.; Zhang, Y.; Li, B. MS-ALN: Multiscale attention learning network for pest recognition. IEEE Access 2022, 10, 40888–40898.
112. Zheng, T.; Yang, X.; Lv, J.; Li, M.; Wang, S.; Li, W. An efficient mobile model for insect image classification in the field pest management. Eng. Sci. Technol. Int. J. 2023, 39, 101335.
113. Devi, R.; Kumar, V.; Sivakumar, P. EfficientNetV2 Model for Plant Disease Classification and Pest Recognition. Comput. Syst. Sci. Eng. 2023, 45, 2249–2263.
114. Anwar, Z.; Masood, S. Exploring Deep Ensemble Model for Insect and Pest Detection from Images. Procedia Comput. Sci. 2023, 218, 2328–2337.
115. Chen, Y.; Chen, M.; Guo, M.; Wang, J.; Zheng, N. Pest recognition based on multi-image feature localization and adaptive filtering fusion. Front. Plant Sci. 2023, 14, 1282212.
116. Nandhini, C.; Brindha, M. Visual regenerative fusion network for pest recognition. Neural Comput. Appl. 2024, 36, 2867–2882.
117. Li, Y.; Wang, H.; Dang, L.M.; Sadeghi-Niaraki, A.; Moon, H. Crop pest recognition in natural scenes using convolutional neural networks. Comput. Electron. Agric. 2020, 169, 105174.
118. Chen, J.; Chen, W.; Zeb, A.; Zhang, D.; Nanehkaran, Y.A. Crop pest recognition using attention-embedded lightweight network under field conditions. Appl. Entomol. Zool. 2021, 56, 427–442.
119. Xu, C.; Yu, C.; Zhang, S.; Wang, X. Multi-scale convolution-capsule network for crop insect pest recognition. Electronics 2022, 11, 1630.
120. Zhao, S.; Liu, J.; Bai, Z.; Hu, C.; Jin, Y. Crop pest recognition in real agricultural environment using convolutional neural networks by a parallel attention mechanism. Front. Plant Sci. 2022, 13, 839572.
121. Dai, G.; Fan, J.; Dewi, C. ITF-WPI: Image and text based cross-modal feature fusion model for wolfberry pest recognition. Comput. Electron. Agric. 2023, 212, 108129.
122. Zhang, Y.; Chen, L.; Yuan, Y. Multimodal fine-grained transformer model for pest recognition. Electronics 2023, 12, 2620.
123. Hassan, S.M.; Maji, A.K. Pest Identification based on fusion of Self-Attention with ResNet. IEEE Access 2024, 12, 6036–6050.
| | CNN | Transformer |
|---|---|---|
| Strengths | Local connectivity and weight sharing | Global feature extraction |
| | Translation invariance | Parallel computing |
| | Mature models and techniques | Unified architecture |
| Weaknesses | Limited receptive field | High computational and memory costs |
| | More complexity for more layers | Requires large-scale data for training |
| | Limited multi-scale feature handling | Lack of intrinsic spatial encoding |
| Paper | Year | Methods | Dataset | Metrics |
|---|---|---|---|---|
| [94] | 2022 | Ensemble Learning | IP102 Dataset | mAP: 74.1%; F1-Score: 73.0% |
| [95] | 2023 | ResNet-50, CAM | IP102 Dataset | mAP: 74.27% |
| [96] | 2023 | Improved YOLOv7, CSPResNeXt-50, VoVGSCSP | IP102 Dataset | mAP: 76.3%; Precision: 73.1%; Recall: 77.3%; F1-Score: 75.2% |
| [97] | 2021 | ResNet-50, Feature Fusion, YOLOv3 | Pest24 Dataset | mAP: 71.6%; Recall: 83.5% |
| [98] | 2022 | ResNet-50, Transformer, Feature Fusion, YOLOv3 | Pest24 Dataset | mAP: 73.4%; Recall: 83.9% |
| [99] | 2023 | GhostNet, Improved YOLOv5, Channel Attention | Pest24 Dataset | mAP: 74.1% |
| [94] | 2022 | Ensemble Learning | D0 Dataset | mAP: 99.8%; F1-Score: 99.7% |
| [100] | 2020 | ResNet-50, Context-Aware Attention | 17,192 collected images | mAP: 74.3% |
| [101] | 2022 | DBR-Net, FPN, Residual Learning | 24,412 collected images | mAP: 77.8% |
| [102] | 2023 | Improved YOLOv5m, Transformer | 1309 collected images | mAP: 96.4%; Precision: 95.7%; Recall: 93.1%; F1-Score: 94.38% |
| [103] | 2023 | Improved YOLOv7-Tiny, BiFormer Dynamic Attention | 782 collected images | mAP: 93.23%; Recall: 90.81% |
| [104] | 2023 | DenseNet, Adaptive Attention, YOLOv3 | 289 collected images | mAP: 86.2%; F1-Score: 79.1% |
| [105] | 2023 | Improved YOLOv5, Channel Attention, FPN | 5231 collected images | mAP: 98.2%; Accuracy: 97.20%; Recall: 96.85% |
| [106] | 2023 | Improved YOLOv5, Adaptive Attention | 6000 collected images | mAP: 96.51%; F1-Score: 96.54% |
| [107] | 2024 | VGG-16, Channel and Spatial Attention | 1035 collected images | mAP: 95.87%; Recall: 81.44%; Precision: 97.53%; F1-Score: 88.76% |
| [108] | 2024 | Improved YOLOv8, Transformer | 2864 collected images | mAP: 98.17%; Precision: 96.32%; Recall: 97.95%; F1-Score: 97.13% |
| Paper | Pros | Cons |
|---|---|---|
| [94] | High training efficiency; high robustness | High computational complexity; long training time; complex management of models |
| [95] | Strong algorithm generality; low dependency on annotated data | High demand for computational resources |
| [96] | High detection accuracy; high computational efficiency | Poor generalization ability; complex model |
| Paper | Pros | Cons |
|---|---|---|
| [97] | Excellent performance on small objects | Complex model; high data dependency |
| [98] | Excellent performance on small objects; strong global feature capture ability | High computational complexity; large model size |
| [99] | Good real-time performance; high accuracy; small model footprint | Poor generalization ability; high demand for computational resources |
| Paper | Pros | Cons |
|---|---|---|
| [100] | Multi-scale information fusion; suitable for large-scale datasets | Complex practical application; long training time |
| [101] | Good real-time performance; suitable for large-scale datasets | High data dependency; high demand for computational resources |
| [102] | High detection accuracy; good robustness; high computational efficiency | Complex model |
| [103] | Strong feature extraction capability; high detection accuracy | Poor generalization ability; complex model |
| [104] | Multi-scale detection capability; good real-time performance | Long training time; poor generalization ability; complex model |
| [105] | High detection accuracy; excellent performance on small objects; good robustness | Complex model |
| [106] | High detection accuracy; good real-time performance; small model footprint | Complex model |
| [107] | High detection accuracy | Poor generalization ability; complex model |
| [108] | Strong feature extraction capability; high detection accuracy; good real-time performance | Long training time; complex model; high demand for computational resources |
| Paper | Year | Method | Dataset | Metrics |
|---|---|---|---|---|
| [109] | 2020 | ResNet, Feature Fusion | IP102 Dataset | Accuracy: 55.43%; F1-Score: 54.18% |
| [110] | 2020 | Ensemble Learning | IP102 Dataset | Accuracy: 67.13%; Precision: 67.17%; Recall: 67.13%; F1-Score: 65.76% |
| [111] | 2022 | ResNet-50, Attention | IP102 Dataset | Accuracy: 74.61%; F1-Score: 67.83% |
| [112] | 2023 | EfficientNetV2, Coordinate Attention, Feature Fusion | IP102 Dataset | Accuracy: 73.7% |
| [113] | 2023 | EfficientNetV2, Transfer Learning | IP102 Dataset | Accuracy: 80.1% |
| [114] | 2023 | Ensemble Learning, Transfer Learning | IP102 Dataset | Accuracy: 82.5% |
| [115] | 2023 | ResNet-50, Multi-Image Fusion | IP102 Dataset | Accuracy: 96.1%; F1-Score: 95.9% |
| [116] | 2024 | ResNet, Feature Fusion | IP102 Dataset | Accuracy: 68.34%; Precision: 68.37%; Recall: 68.33%; F1-Score: 68.34% |
| [110] | 2020 | Ensemble Learning | D0 Dataset | Accuracy: 98.81%; Precision: 98.88%; Recall: 98.81%; F1-Score: 98.81% |
| [115] | 2023 | ResNet-50, Multi-Image Fusion | D0 Dataset | Accuracy: 100%; F1-Score: 100% |
| [116] | 2024 | ResNet, Feature Fusion | D0 Dataset | Accuracy: 99.12%; Precision: 99.84%; Recall: 99.12%; F1-Score: 99.13% |
| [117] | 2020 | GoogleNet | 5629 collected images | Accuracy: 96.67% |
| [118] | 2021 | MobileNetV2, CAM | 5629 collected images | Accuracy: 99.14% |
| [119] | 2022 | CNN, CapsNet | Subset of IP102 Dataset | Accuracy: 91.4% |
| [120] | 2022 | ResNet-50, Parallel Attention | 5245 collected images | Accuracy: 98.17% |
| [121] | 2022 | Transformer, Cross-modal Feature Fusion | 10,598 collected images | Accuracy: 97.98% |
| [122] | 2023 | Transformer, Cross-modal Feature Fusion | 1902 collected images | Accuracy: 98.12%; Precision: 99.07%; Recall: 98.56%; F1-Score: 98.50% |
| [123] | 2024 | ResNet-50 | 3150 collected images | Accuracy: 99.80% |
| Paper | Pros | Cons |
|---|---|---|
| [109] | Strong generalization ability | Low recognition accuracy; high demand for computational resources |
| [110] | High recognition accuracy | High computational complexity; long training time; complex management of models |
| [111] | Strong feature extraction capability; high recognition accuracy | Large model footprint; long training time; complex model |
| [112] | Small model footprint; strong generalization ability; high recognition accuracy | Long training time; complex model |
| [113] | Strong model learning capability; high recognition accuracy | Static network topology; high demand for computational resources |
| [114] | High recognition accuracy; strong generalization ability; good robustness | Static network topology; high computational complexity; complex management of models |
| [115] | High recognition accuracy | Long training time; complex model |
| [116] | High recognition accuracy; good robustness | High computational complexity; poor generalization ability |
| Paper | Pros | Cons |
|---|---|---|
| [117] | High recognition accuracy; good robustness | Complex model; high demand for computational resources |
| [118] | High recognition accuracy; small model footprint | Poor generalization ability |
| [119] | Multi-scale feature extraction; high recognition accuracy | High training difficulty; complex model |
| [120] | High recognition accuracy; good real-time performance | Poor performance on small objects; complex model |
| [121] | Cross-modal feature fusion; high recognition accuracy; suitable for large-scale datasets | Complex practical application; poor generalization ability; complex model |
| [122] | Cross-modal feature fusion; high recognition accuracy | Complex practical application; poor generalization ability; complex model |
| [123] | Strong feature extraction capability; high recognition accuracy | High computational complexity; complex model |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).