HCFormer: A Lightweight Pest Detection Model Combining CNN and ViT
Abstract
1. Introduction
- In pest detection, limited image resolution restricts feature extraction, making it difficult to fully capture the characteristics of an image;
- In pest object detection tasks, complex environmental factors such as crop backgrounds, lighting conditions, and occlusions make targets harder to detect and place higher demands on detection accuracy;
- Object detection models often accumulate large numbers of parameters in the pursuit of higher accuracy, so striking a balance between accuracy and computational resources remains a significant challenge.
- To address the insufficient feature extraction capability of pest object detection networks, this paper proposes HCFormer, a parallel feature extraction network for pest classification and recognition that combines the strengths of CNNs and ViTs to overcome the shortcomings of traditional image recognition networks. The hybrid design pairs the CNN’s robust local feature extraction with the ViT’s ability to model long-range dependencies and capture global features, allowing features to be extracted more comprehensively across scales and regions and improving recognition accuracy. In addition, overlapping convolutions with different kernel sizes extract multi-scale local features, improving the model’s adaptability to pests of varying shapes and sizes (a sketch of this parallel structure is given after this list);
- To address the large parameter counts, slow computation, and low computational efficiency typical of large-model training, this paper introduces a lightweight attention mechanism combined with down-sampling. Convolutional layers are also incorporated to compensate for the accuracy loss caused by the reduced feature resolution, thereby shrinking the model;
- We validated our model on pest photos collected from Flickr, a popular photo-sharing platform, and compared it against several types of deep-learning pest recognition networks. To increase the diversity and scale of the training data and thereby improve the model’s generalization and performance, we applied several image augmentation techniques, including rotation, local zoom, brightness adjustment, and saturation modification (a pipeline sketch also follows this list). These augmentations kept performance consistent across varying lighting conditions and improved the model’s adaptability to color variations.
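The sketch below illustrates the parallel feature-extraction idea described in the first contribution. It is a minimal PyTorch illustration written for this summary, not the authors’ released implementation; the module names, channel widths, and kernel sizes are assumptions. Overlapping 3×3/5×5/7×7 convolutions capture multi-scale local features, a small transformer encoder models global context on the same input, and a 1×1 convolution fuses the two streams.

```python
# Minimal sketch (not the authors' code) of a parallel multi-scale CNN + ViT block.
import torch
import torch.nn as nn


class MultiScaleConvBranch(nn.Module):
    """Parallel 3x3/5x5/7x7 convolutions with overlapping receptive fields."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2),
                          nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for k in (3, 5, 7)
        ])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)  # (B, 3*out_ch, H, W)


class ParallelCNNViTBlock(nn.Module):
    """Fuse local (multi-scale CNN) and global (transformer) features of one input."""

    def __init__(self, in_ch: int, dim: int = 64, heads: int = 4):
        super().__init__()
        self.local = MultiScaleConvBranch(in_ch, dim)
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=1)  # pixel tokens for the ViT branch
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=1)
        self.fuse = nn.Conv2d(3 * dim + dim, dim, kernel_size=1)  # 1x1 fusion of both streams

    def forward(self, x):
        b, _, h, w = x.shape
        local = self.local(x)                                   # (B, 3*dim, H, W)
        tokens = self.proj(x).flatten(2).transpose(1, 2)        # (B, H*W, dim)
        global_feat = self.global_enc(tokens).transpose(1, 2).reshape(b, -1, h, w)
        return self.fuse(torch.cat([local, global_feat], dim=1))


if __name__ == "__main__":
    block = ParallelCNNViTBlock(in_ch=3)
    print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```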
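The following is a minimal sketch of the four augmentations named in the last contribution, using torchvision; the parameter ranges (rotation angle, zoom scale, jitter strength) are assumptions rather than values reported in the paper. Applied to roughly a quarter of each class’s original images, this kind of pipeline yields per-class counts like those in the dataset table of Section 2.2.

```python
# Illustrative augmentation pipeline: rotation, local zoom, brightness, saturation.
from torchvision import transforms as T

augmentations = {
    "rotation":   T.RandomRotation(degrees=30),
    "local_zoom": T.RandomResizedCrop(size=224, scale=(0.6, 1.0)),
    "brightness": T.ColorJitter(brightness=0.4),
    "saturation": T.ColorJitter(saturation=0.4),
}


def augment(image):
    """Return one augmented copy per technique for a PIL image."""
    return {name: tf(image) for name, tf in augmentations.items()}
```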
2. Materials and Methods
2.1. Related Work on Pest Detection
2.2. Pest Dataset
2.3. Methods
2.3.1. Input Block
2.3.2. HCFormer Block
2.3.3. Improved Attention Mechanism
2.3.4. Output Module
3. Results
3.1. Evaluation Metrics
3.2. Experimental Environment Parameter Settings
4. Discussion
4.1. Experimental Results and Analysis
4.2. Ablation Study
5. Conclusions
- To address the limited detection accuracy and the restricted local and global receptive fields of current pest detection networks in complex environments, we propose a parallel feature extraction architecture that combines a multi-scale CNN with an optimized ViT network. This architecture extracts and integrates both local and global image features, enhancing detection performance. The backbone further adopts a pyramid structure, which accelerates computation and improves robustness. This provides a practical new approach to pest object detection: it can be applied in agriculture through automated monitoring systems [53], intelligent spraying, drone surveillance [54], and farm management platforms [55], improving detection accuracy, reducing pesticide use, and increasing crop yield and quality;
- We redesigned the attention mechanism in the transformer, proposing a new attention network that addresses the loss of local feature accuracy when the input resolution is reduced in the pyramid structure. We also introduced down-sampling to reduce the model’s weight while maintaining accuracy, significantly cutting the number of parameters and increasing the model’s speed (a sketch of down-sampled attention follows this list);
- We trained our model on a dataset collected from Flickr and benchmarked it against other mainstream object detection models. In both single-target and multi-target scenarios across different environments, our model performed well. The experimental results show that HCFormer achieved an accuracy of 98.17%, a recall of 91.98%, and an mAP of 90.57%, meeting the requirements for lightweight, high-precision pest detection. The pest detection network, built on a hybrid of multi-scale CNN and ViT networks, can efficiently complete pest detection tasks in complex environments, extracting pest image features quickly and accurately. The model therefore holds significant research importance and practical value in the field of pest detection.
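As referenced in the second conclusion above, the sketch below shows one plausible reading of the "down-sampling" in the improved attention, under the assumption that it resembles spatial-reduction attention as used in pyramid vision transformers: keys and values are computed on a strided-convolution-reduced feature map, cutting the attention cost from O(N²) to roughly O(N·N/r²). The class name, head count, and reduction ratio are illustrative, not taken from the paper.

```python
# Minimal sketch of attention with down-sampled keys/values (spatial-reduction style).
import torch
import torch.nn as nn


class DownsampledAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4, reduction: int = 2):
        super().__init__()
        # Strided convolution shrinks the feature map used for keys and values.
        self.reduce = nn.Conv2d(dim, dim, kernel_size=reduction, stride=reduction)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)                       # (B, H*W, C) full-resolution queries
        kv = self.reduce(x).flatten(2).transpose(1, 2)         # (B, H*W/r^2, C) reduced keys/values
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    attn = DownsampledAttention(dim=64)
    print(attn(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```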
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lee, S.; Yun, C.M. A Deep Learning Model for Predicting Risks of Crop Pests and Diseases from Sequential Environmental Data. Plant Methods 2023, 19, 145.
- Domingues, T.; Brandão, T.; Ferreira, J.C. Machine Learning for Detection and Prediction of Crop Diseases and Pests: A Comprehensive Survey. Agriculture 2022, 12, 1350.
- FAOSTAT. Available online: https://www.fao.org/faostat/en/#data/RP/visualize (accessed on 11 May 2024).
- Khan, B.A.; Nadeem, M.A.; Nawaz, H.; Amin, M.M.; Abbasi, G.H.; Nadeem, M.; Ali, M.; Ameen, M.; Javaid, M.M.; Maqbool, R.; et al. Pesticides: Impacts on Agriculture Productivity, Environment, and Management Strategies. In Emerging Contaminants and Plants: Interactions, Adaptations and Remediation Technologies; Aftab, T., Ed.; Springer International Publishing: Cham, Switzerland, 2023; pp. 109–134. ISBN 978-3-031-22269-6.
- Deutsch, C.A.; Tewksbury, J.J.; Tigchelaar, M.; Battisti, D.S.; Merrill, S.C.; Huey, R.B.; Naylor, R.L. Increase in Crop Losses to Insect Pests in a Warming Climate. Science 2018, 361, 916–919.
- Yang, B.; Zhang, Z.; Yang, C.-Q.; Wang, Y.; Orr, M.C.; Wang, H.; Zhang, A.-B. Identification of Species by Combining Molecular and Morphological Data Using Convolutional Neural Networks. Syst. Biol. 2022, 71, 690–705.
- Almeida-Silva, D.; Vera Candioti, F. Shape Evolution in Two Acts: Morphological Diversity of Larval and Adult Neoaustraranan Frogs. Animals 2024, 14, 1406.
- Hu, Z.; Xiang, Y.; Li, Y.; Long, Z.; Liu, A.; Dai, X.; Lei, X.; Tang, Z. Research on Identification Technology of Field Pests with Protective Color Characteristics. Appl. Sci. 2022, 12, 3810.
- Xiao, Z.; Yin, K.; Geng, L.; Wu, J.; Zhang, F.; Liu, Y. Pest Identification via Hyperspectral Image and Deep Learning. Signal Image Video Process. 2022, 16, 873–880.
- Ai, Y.; Sun, C.; Tie, J.; Cai, X. Research on Recognition Model of Crop Diseases and Insect Pests Based on Deep Learning in Harsh Environments. IEEE Access 2020, 8, 171686–171693.
- Jafar, A.; Bibi, N.; Naqvi, R.A.; Sadeghi-Niaraki, A.; Jeong, D. Revolutionizing Agriculture with Artificial Intelligence: Plant Disease Detection Methods, Applications, and Their Limitations. Front. Plant Sci. 2024, 15, 1356260.
- Ngugi, L.C.; Abelwahab, M.; Abo-Zahhad, M. Recent Advances in Image Processing Techniques for Automated Leaf Pest and Disease Recognition—A Review. Inf. Process. Agric. 2021, 8, 27–51.
- Liu, J.; Wang, X. Plant Diseases and Pests Detection Based on Deep Learning: A Review. Plant Methods 2021, 17, 22.
- Cai, G.; Qian, J.; Song, T.; Zhang, Q.; Liu, B. A Deep Learning-Based Algorithm for Crop Disease Identification Positioning Using Computer Vision. Int. J. Comput. Sci. Inf. Technol. 2023, 1, 85–92.
- Preti, M.; Verheggen, F.; Angeli, S. Insect Pest Monitoring with Camera-Equipped Traps: Strengths and Limitations. J. Pest Sci. 2021, 94, 203–217.
- Fu, X.; Ma, Q.; Yang, F.; Zhang, C.; Zhao, X.; Chang, F.; Han, L. Crop Pest Image Recognition Based on the Improved ViT Method. Inf. Process. Agric. 2024, 11, 249–259.
- Malek, M.A.; Reya, S.S.; Hasan, M.Z.; Hossain, S. A Crop Pest Classification Model Using Deep Learning Techniques. In Proceedings of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 5–7 January 2021; pp. 367–371.
- Rajalakshmi, D.D.; Monishkumar, V.; Balasainarayana, S.; Prasad, M.S.R. Deep Learning Based Multi Class Wild Pest Identification and Solving Approach Using CNN. Ann. Rom. Soc. Cell Biol. 2021, 25, 16439–16450.
- Wang, S.; Zeng, Q.; Ni, W.; Cheng, C.; Wang, Y. ODP-Transformer: Interpretation of Pest Classification Results Using Image Caption Generation Techniques. Comput. Electron. Agric. 2023, 209, 107863.
- Zhan, B.; Li, M.; Luo, W.; Li, P.; Li, X.; Zhang, H. Study on the Tea Pest Classification Model Using a Convolutional and Embedded Iterative Region of Interest Encoding Transformer. Biology 2023, 12, 1017.
- Kalaydjian, C.T. An Application of Vision Transformer (ViT) for Image-Based Plant Disease Classification; UCLA: Los Angeles, CA, USA, 2023.
- Remya, S.; Anjali, T.; Abhishek, S.; Ramasubbareddy, S.; Cho, Y. The Power of Vision Transformers and Acoustic Sensors for Cotton Pest Detection. IEEE Open J. Comput. Soc. 2024, 5, 356–367.
- Li, G.; Wang, Y.; Zhao, Q.; Yuan, P.; Chang, B. PMVT: A Lightweight Vision Transformer for Plant Disease Identification on Mobile Devices. Front. Plant Sci. 2023, 14, 1256773.
- Ye, R.; Gao, Q.; Qian, Y.; Sun, J.; Li, T. Improved YOLOv8 and SAHI Model for the Collaborative Detection of Small Targets at the Micro Scale: A Case Study of Pest Detection in Tea. Agronomy 2024, 14, 1034.
- Tian, Y.; Wang, S.; Li, E.; Yang, G.; Liang, Z.; Tan, M. MD-YOLO: Multi-Scale Dense YOLO for Small Target Pest Detection. Comput. Electron. Agric. 2023, 213, 108233.
- Wang, F.; Wang, R.; Xie, C.; Zhang, J.; Li, R.; Liu, L. Convolutional Neural Network Based Automatic Pest Monitoring System Using Hand-Held Mobile Image Analysis towards Non-Site-Specific Wild Environment. Comput. Electron. Agric. 2021, 187, 106268.
- Kang, C.; Jiao, L.; Wang, R.; Liu, Z.; Du, J.; Hu, H. Attention-Based Multiscale Feature Pyramid Network for Corn Pest Detection under Wild Environment. Insects 2022, 13, 978.
- Larios, N.; Deng, H.; Zhang, W.; Sarpola, M.; Yuen, J.; Paasch, R.; Moldenke, A.; Lytle, D.A.; Correa, S.R.; Mortensen, E.N.; et al. Automated Insect Identification through Concatenated Histograms of Local Appearance Features: Feature Vector Generation and Region Detection for Deformable Objects. Mach. Vis. Appl. 2008, 19, 105–123.
- Heo, G.; Klette, R.; Woo, Y.W.; Kim, K.-B.; Kim, N.H. Fuzzy Support Vector Machine with a Fuzzy Nearest Neighbor Classifier for Insect Footprint Classification. In Proceedings of the International Conference on Fuzzy Systems, Barcelona, Spain, 18–23 July 2010; pp. 1–6.
- Nanni, L.; Manfè, A.; Maguolo, G.; Lumini, A.; Brahnam, S. High Performing Ensemble of Convolutional Neural Networks for Insect Pest Image Detection. Ecol. Inform. 2022, 67, 101515.
- Kuzuhara, H.; Takimoto, H.; Sato, Y.; Kanagawa, A. Insect Pest Detection and Identification Method Based on Deep Learning for Realizing a Pest Control System. In Proceedings of the 2020 59th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), Chiang Mai, Thailand, 23–26 September 2020; pp. 709–714.
- Patel, D.; Bhatt, N. Improved Accuracy of Pest Detection Using Augmentation Approach with Faster R-CNN. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1042, 012020.
- Venkatasaichandrakanth, P.; Iyapparaja, M. GNViT—An Enhanced Image-Based Groundnut Pest Classification Using Vision Transformer (ViT) Model. PLoS ONE 2024, 19, e0301174.
- Zhang, L.; Du, J.; Wang, R. FE-VIT: A Faster and Extensible Vision Transformer Based on Self Pre-Training for Pest Recognition. In Proceedings of the International Conference on Agri-Photonics and Smart Agricultural Sensing Technologies (ICASAST 2022), Zhengzhou, China, 18 October 2022; SPIE: Bellingham, WA, USA, 2022; Volume 12349, pp. 35–42.
- Gulzar, Y.; Ünal, Z.; Ayoub, S.; Reegu, F.A.; Altulihan, A. Adaptability of Deep Learning: Datasets and Strategies in Fruit Classification. BIO Web Conf. 2024, 85, 01020.
- Alkanan, M.; Gulzar, Y. Enhanced Corn Seed Disease Classification: Leveraging MobileNetV2 with Feature Augmentation and Transfer Learning. Front. Appl. Math. Stat. 2024, 9, 1320177.
- Agarwal, N.; Kalita, T.; Dubey, A.K. Classification of Insect Pest Species Using CNN Based Models. In Proceedings of the 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 28–30 April 2023; pp. 862–866.
- Zhang, L.; Yin, L.; Liu, L.; Zhuo, R.; Zhuo, Y. Forestry Pests Identification and Classification Based on Improved YOLO V5s. In Proceedings of the 2021 International Conference on Electronic Information Engineering and Computer Science (EIECS), Changchun, China, 23–26 September 2021; pp. 670–673.
- Song, Y.; Duan, X.; Ren, Y.; Xu, J.; Luo, L.; Li, D. Identification of the Agricultural Pests Based on Deep Learning Models. In Proceedings of the 2019 International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 8–10 November 2019; pp. 195–198.
- Ullah, N.; Khan, J.A.; Alharbi, L.A.; Raza, A.; Khan, W.; Ahmad, I. An Efficient Approach for Crops Pests Recognition and Classification Based on Novel DeepPestNet Deep Learning Model. IEEE Access 2022, 10, 73019–73032.
- Patole, S.S. Review on Beetles (Coleopteran): An Agricultural Major Crop Pests of the World. Int. J. Life-Sci. Sci. Res. 2017, 3, 1424–1432.
- Szwejda, J.H. Butterfly Pests (Lepidoptera) Occurring on Vegetable Crops in Poland. J. Hortic. Res. 2022, 30, 67–86.
- Das, P.P.G.; Bhattacharyya, B.; Bhagawati, S.; Devi, E.B.; Manpoong, N.S.; Bhairavi, K.S. Slug: An Emerging Menace in Agriculture: A Review. J. Entomol. Zool. Stud. 2020, 8, 1–6.
- Smith, L.N. Cyclical Learning Rates for Training Neural Networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 464–472.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Yifu, Z.; Wong, C.; Montes, D.; et al. Ultralytics/Yolov5: V7.0—YOLOv5 SOTA Realtime Instance Segmentation. Zenodo 2022. Available online: https://zenodo.org/records/7347926 (accessed on 11 May 2024).
- Ultralytics YOLOv8. Available online: https://docs.ultralytics.com/models/yolov8 (accessed on 11 May 2024).
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Mehta, S.; Rastegari, M. MobileViT: Light-Weight, General-Purpose, and Mobile-Friendly Vision Transformer. arXiv 2021, arXiv:2110.02178.
- Chen, C.-F.R.; Fan, Q.; Panda, R. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 347–356.
- Sciarretta, A.; Calabrese, P. Development of Automated Devices for the Monitoring of Insect Pests. Curr. Agric. Res. J. 2019, 7, 19–25.
- Chen, C.-J.; Huang, Y.-Y.; Li, Y.-S.; Chen, Y.-C.; Chang, C.-Y.; Huang, Y.-M. Identification of Fruit Tree Pests with Deep Learning on Embedded Drone to Achieve Accurate Pesticide Spraying. IEEE Access 2021, 9, 21986–21997.
- Awuor, F.; Otanga, S.; Kimeli, V.; Rambim, D.; Abuya, T. E-Pest Surveillance: Large Scale Crop Pest Surveillance and Control. In Proceedings of the 2019 IST-Africa Week Conference (IST-Africa), Nairobi, Kenya, 8–10 May 2019; pp. 1–8.
| Pest | Original | Rotation | Local Zoom | Brightness | Saturation | Total |
|---|---|---|---|---|---|---|
| Ants | 251 | 62 | 62 | 62 | 62 | 499 |
| Bees | 252 | 62 | 62 | 62 | 62 | 500 |
| Beetles | 212 | 52 | 52 | 52 | 52 | 420 |
| Caterpillars | 227 | 57 | 57 | 57 | 57 | 455 |
| Earthworms | 160 | 40 | 40 | 40 | 40 | 320 |
| Earwigs | 234 | 58 | 58 | 58 | 58 | 466 |
| Grasshoppers | 244 | 60 | 60 | 60 | 60 | 484 |
| Moths | 248 | 62 | 62 | 62 | 62 | 496 |
| Slugs | 194 | 49 | 49 | 49 | 49 | 390 |
| Snails | 252 | 62 | 62 | 62 | 62 | 500 |
| Wasps | 334 | 84 | 84 | 84 | 84 | 670 |
| Weevils | 152 | 37 | 37 | 37 | 37 | 300 |
| Parameter | Setting |
|---|---|
| GPU | GeForce RTX 3090 |
| Operating system | Windows 10 |
| GPU accelerated environment | CUDA 11.1 |
| Framework | PyTorch |
| Memory | 64 GB |
| Parameter | Value |
|---|---|
| Learning rate | 0.001 |
| Batch size | 32 |
| Epochs | 300 |
| Optimization algorithm | SGDM |
| Train size | 0.9 |
| Test size | 0.1 |
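The following is a minimal sketch of a training setup matching the hyperparameters in the table above (SGDM, learning rate 0.001, batch size 32, 300 epochs, 90/10 train/test split). The helper name, the momentum value, and the dataset/model placeholders are assumptions, not artifacts from the paper.

```python
# Illustrative training configuration matching the reported hyperparameters.
import torch
from torch.utils.data import DataLoader, random_split


def build_training(model: torch.nn.Module, dataset):
    """Split the data 90/10 and build loaders plus the SGDM optimizer."""
    n_train = int(0.9 * len(dataset))
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=32)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # momentum value assumed
    return train_loader, test_loader, optimizer  # train for 300 epochs
```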
| S. No | Method | Accuracy (%) | mAP (%) | Recall (%) | Class | Key Structure |
|---|---|---|---|---|---|---|
| 1 | VGGNet [45] | 85.60 | 77.31 | 78.31 | CNN | Multi-segment CNN |
| 2 | ResNet [46] | 87.75 | 79.64 | 80.60 | CNN | Residual block and CNN |
| 3 | SENet [47] | 90.32 | 82.19 | 82.33 | CNN | Channel attention mechanism |
| 4 | YOLOv5 [48] | 90.67 | 82.43 | 83.19 | YOLO | CBL, Res unit, CSPX, and SPP |
| 5 | YOLOv8 [49] | 94.62 | 87.78 | 88.56 | YOLO | |
| 6 | ViT [50] | 95.03 | 88.09 | 89.11 | ViT | Transformer encoder |
| 7 | MobileViT [51] | 95.57 | 88.33 | 89.37 | ViT | Lightweight ViT |
| 8 | CrossViT [52] | 96.16 | 89.12 | 90.35 | ViT | Dual-branch transformer |
| 9 | HCFormer | 98.17 | 90.57 | 91.98 | CNN and ViT | Fusing global and local features |
| No. | Net Name | Accuracy (%) | mAP (%) | Recall (%) | Params (M) |
|---|---|---|---|---|---|
| 1 | CNN + ViT | 89.34 | 85.73 | 86.49 | 34.3 |
| 2 | ViT | 95.03 | 88.09 | 89.11 | 31.6 |
| 3 | Multi-CNN + ViT | 96.30 | 89.04 | 90.78 | 37.7 |
| 4 | HCFormer | 98.17 | 90.57 | 91.98 | 26.5 |