Rep-ViG-Apple: A CNN-GCN Hybrid Model for Apple Detection in Complex Orchard Environments
Abstract
1. Introduction
- We developed a novel dataset for Aksu apple detection in complex environments, since no publicly available dataset exists. The dataset was built through manual image collection and expanded with an offline data augmentation algorithm (weather_aug) that simulates real-world conditions such as rain, fog, snow, and overcast skies, as well as low-light scenes, improving the model's generalization across diverse orchard conditions (an illustrative augmentation sketch follows this list).
- We proposed the RepIRD Block and introduced the sparse vision graph attention (SVGA) module to address insufficient apple feature information and environmental interference, combining them into a new CNN-GCN feature extraction architecture, designated Rep-Vision-GCN. The architecture captures both multi-scale local features and global contextual information, enhancing apple detection under complex environmental conditions (see the SVGA sketch after this list).
- We implemented the RepConvsBlock re-parameterization module and constructed the Rep-FPN-PAN feature fusion network to address inadequate feature fusion. The network handles the large variation in apple size caused by different shooting distances, improving feature integration and detection accuracy (a branch-fusion sketch follows this list).
- We adopted a channel pruning algorithm based on LAMP scores to curb the computational and parameter overhead introduced by the enhancements in Rep-ViG-Apple. The algorithm prunes redundant feature maps, compressing the model while maintaining its accuracy (the scoring rule is sketched after this list).
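To make the augmentation step concrete, the following is a minimal offline sketch using the imgaug library that weather_aug draws on. The specific augmenters and parameter values are illustrative assumptions, not the authors' exact configuration; because these effects are purely photometric, existing bounding-box labels carry over unchanged.

```python
# Hypothetical offline weather augmentation in the spirit of weather_aug;
# augmenter choices and magnitudes are assumptions, not the paper's settings.
import os
import imageio.v2 as imageio
import imgaug.augmenters as iaa

# One augmenter per simulated condition from the dataset table.
CONDITIONS = {
    "contrast_enhance": iaa.LinearContrast(1.5),
    "low_light": iaa.Multiply(0.4),   # darken to mimic dusk
    "snowy": iaa.Snowflakes(),
    "sunny": iaa.Multiply(1.4),       # brighten to mimic strong sunlight
    "overcast": iaa.Clouds(),
    "foggy": iaa.Fog(),
    "rainy": iaa.Rain(),
}

def augment_dataset(src_dir: str, dst_dir: str) -> None:
    """Write one augmented copy of every source image per condition."""
    for name, aug in CONDITIONS.items():
        out_dir = os.path.join(dst_dir, name)
        os.makedirs(out_dir, exist_ok=True)
        for fname in os.listdir(src_dir):
            if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
                continue
            image = imageio.imread(os.path.join(src_dir, fname))
            imageio.imwrite(os.path.join(out_dir, fname), aug(image=image))

augment_dataset("data/origin", "data/augmented")  # hypothetical paths
```

Each of the seven simulated conditions yields one copy per original image, which matches the eightfold expansion from 510 to 4080 images in the dataset table.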
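The sketch below illustrates the mechanism behind SVGA on a feature map: every spatial token is connected to tokens a fixed number of hops away along its own row and column, and the neighborhood is aggregated with a max-relative graph convolution. This is a minimal PyTorch sketch under stated assumptions (the `hop` spacing and the 1x1 fusion convolution are illustrative), not the paper's implementation.

```python
# Minimal sketch of sparse vision graph attention (SVGA); hyperparameters
# such as `hop` are assumptions for illustration.
import torch
import torch.nn as nn

class SVGASketch(nn.Module):
    def __init__(self, dim: int, hop: int = 2):
        super().__init__()
        self.hop = hop
        # 1x1 conv fuses each token with its aggregated graph feature.
        self.fc = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, H, W = x.shape
        rel = torch.zeros_like(x)
        # Max-relative aggregation over neighbors every `hop` rows/columns.
        for shift in range(self.hop, H, self.hop):
            rel = torch.maximum(rel, torch.roll(x, shift, dims=2) - x)
        for shift in range(self.hop, W, self.hop):
            rel = torch.maximum(rel, torch.roll(x, shift, dims=3) - x)
        return self.fc(torch.cat([x, rel], dim=1)) + x  # residual update

# At the last backbone stage this would act on the (256, 20, 20) map
# listed in the architecture table.
y = SVGASketch(256)(torch.randn(1, 256, 20, 20))
```

Because the graph is static and sparse, the cost grows with the number of row/column hops rather than with all pairwise token interactions, which is what lets the block sit at the low-resolution end of the backbone.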
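RepConvsBlock relies on structural re-parameterization in the RepVGG style: parallel 3x3, 1x1, and identity branches used during training are folded into a single 3x3 convolution for inference, which is why the inference rows of the re-parameterization table report fewer parameters and GFLOPs at identical mAP. Below is a minimal sketch of the fusion; BatchNorm folding is omitted for brevity (in practice each branch's BN is folded into its convolution first).

```python
# Sketch of RepVGG-style branch fusion; assumes stride 1, equal channel
# counts, and convs with bias (BatchNorm folding omitted for brevity).
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def fuse_rep_branches(conv3: nn.Conv2d, conv1: nn.Conv2d) -> nn.Conv2d:
    """Fold parallel 3x3, 1x1, and identity branches into one 3x3 conv."""
    C = conv3.out_channels
    fused = nn.Conv2d(C, C, kernel_size=3, padding=1)
    # Pad the 1x1 kernel to 3x3 so all branches share one kernel shape.
    k1 = F.pad(conv1.weight, [1, 1, 1, 1])
    # The identity branch equals a 3x3 kernel with 1 at each channel's center.
    kid = torch.zeros_like(conv3.weight)
    kid[range(C), range(C), 1, 1] = 1.0
    fused.weight.copy_(conv3.weight + k1 + kid)
    fused.bias.copy_(conv3.bias + conv1.bias)
    return fused

# Equivalence check: training-time sum of branches == fused conv.
x = torch.randn(1, 8, 16, 16)
c3, c1 = nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 1)
assert torch.allclose(c3(x) + c1(x) + x, fuse_rep_branches(c3, c1)(x), atol=1e-5)
```

The multi-branch form enriches gradients during training, while the fused form keeps inference as cheap as a plain 3x3 convolution.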
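For pruning, LAMP gives each weight a layer-adaptive score: its squared magnitude divided by the sum of squared magnitudes of all weights in the same layer with magnitude at least as large, after which the lowest-scoring fraction is removed globally. The sketch below shows the scoring rule on flattened weights; since the paper performs structured channel pruning, treat this as an illustration of the criterion itself rather than the authors' pruning pipeline.

```python
# Sketch of LAMP scoring and a global pruning mask; unstructured for
# clarity, whereas the paper prunes whole channels with this criterion.
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """Score each weight as w^2 / sum of w^2 over same-layer weights with
    magnitude >= |w| (layer-adaptive magnitude pruning)."""
    w2 = weight.detach().flatten() ** 2
    sorted_w2, idx = torch.sort(w2)  # ascending magnitude
    # suffix[i] = sum of squares of all weights at least as large as i-th.
    suffix = torch.flip(torch.cumsum(torch.flip(sorted_w2, [0]), 0), [0])
    scores = torch.empty_like(w2)
    scores[idx] = sorted_w2 / suffix
    return scores.view_as(weight)

def global_prune_masks(weights: list, sparsity: float = 0.5) -> list:
    """Binary keep-masks retaining the top (1 - sparsity) LAMP scores
    across all layers at once."""
    scores = [lamp_scores(w) for w in weights]
    flat = torch.cat([s.flatten() for s in scores])
    k = max(int(flat.numel() * sparsity), 1)
    threshold = flat.kthvalue(k).values
    return [(s > threshold).float() for s in scores]
```

Because scores are normalized within each layer before global ranking, no single layer is pruned away wholesale; in the pruning table, speed-up ratios of 1.3 and 1.5 leave mAP@0.5 essentially unchanged, while 1.8 degrades recall sharply.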
2. Materials and Methods
2.1. Dataset Construction
2.2. Rep-ViG-Apple Network Architecture
2.2.1. Rep-Vision-GCN Feature Extraction Network
2.2.2. Rep-FPN-PAN Feature Fusion Network
2.2.3. Model Pruning
3. Experiments and Results
3.1. Experimental Environment and Parameter Settings
3.2. Evaluation Metrics
3.3. Comparative Experiments of SVGA and Different Attention Mechanisms
3.4. Comparative Experiments of Rep-Vision-GCN and Different Types of Feature Extraction Networks
3.5. Ablation Study of the Improvement Process
3.6. Comparison Experiments before and after Model Re-Parameterization
3.7. Comparison Experiments of Different Detection Models
3.8. Pruning Comparison Experiments for the Rep-ViG-Apple Model
3.9. Comparative Experimental Results of Different Detection Models
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Condition | Train Set | Val Set | Test Set | Totals |
---|---|---|---|---|
Origin Image | 408 | 51 | 51 | 510 |
Contrast Enhance | 408 | 51 | 51 | 510 |
Low Light | 408 | 51 | 51 | 510 |
Snowy Weather | 408 | 51 | 51 | 510 |
Sunny Weather | 408 | 51 | 51 | 510 |
Overcast Weather | 408 | 51 | 51 | 510 |
Foggy Weather | 408 | 51 | 51 | 510 |
Rainy Weather | 408 | 51 | 51 | 510 |
Totals | 3264 | 408 | 408 | 4080 |
Layer Number | Module Name | Parameters | Quantity | Input Size | Output Size | Feature Output |
---|---|---|---|---|---|---|
1 | Stem | 8694 | 1 | (3, 640, 640) | (42, 160, 160) | × |
2 | RepIRD Block | 34,188 | 2 | (42, 160, 160) | (42, 160, 160) | × |
3 | Downsample | 32,004 | 1 | (42, 160, 160) | (84, 80, 80) | × |
4 | RepIRD Block | 124,824 | 2 | (84, 80, 80) | (84, 80, 80) | ✓ |
5 | Downsample | 127,512 | 1 | (84, 80, 80) | (168, 40, 40) | × |
6 | RepIRD Block | 1,426,320 | 6 | (168, 40, 40) | (168, 40, 40) | ✓ |
7 | Downsample | 387,840 | 1 | (168, 40, 40) | (256, 20, 20) | × |
8 | SVGA Block | 1,979,904 | 1 | (256, 20, 20) | (256, 20, 20) | × |
9 | SPPF | 164,608 | 1 | (256, 20, 20) | (256, 20, 20) | ✓ |
Attention Mechanism Name | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% |
---|---|---|---|---|
MLCA | 93.8 | 81.7 | 91.8 | 77.9 |
CA | 89.6 | 83.9 | 91.7 | 76.8 |
EMA | 90.2 | 84.7 | 91.8 | 76.7 |
SVGA | 93.4 | 84.9 | 91.9 | 78.1 |
Model Name | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | GFLOPs | Size/MB |
---|---|---|---|---|---|---|
ResNet18 | 92.7 | 81.4 | 90.9 | 77.6 | 35.1 | 26.9 |
GhostNet | 89.6 | 83.9 | 91.1 | 76.8 | 6.8 | 5.9 |
MobileNetv3 | 93.9 | 80.5 | 90.2 | 72.0 | 5.4 | 4.6 |
ResNeXt | 93.1 | 84.6 | 92.1 | 77.8 | 38.2 | 25.8 |
ResNeSt | 91.7 | 85.6 | 92.3 | 78.6 | 53.5 | 27.5 |
LcNet | 92.2 | 82.2 | 90.2 | 73.9 | 5.4 | 4.2 |
YOLOv8n-Backbone | 91.0 | 83.5 | 91.3 | 77.0 | 8.2 | 6.3 |
LSKNet | 91.1 | 82.7 | 89.9 | 74.2 | 19.7 | 11.6 |
FasterNet | 91.4 | 84.4 | 91.9 | 76.2 | 10.7 | 8.6 |
EfficientViT | 92.4 | 80.5 | 90.3 | 75.3 | 9.4 | 8.7 |
Ours (Rep-Vision-GCN) | 91.8 | 86.0 | 92.4 | 77.8 | 16.5 | 13.7 |
| Rep-Vision-GCN | Rep-FPN-PAN (RepConvsBlock) | Rep-FPN-PAN-Plus (RepConvsBlockPlus) | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | GFLOPs | Size/MB |
|---|---|---|---|---|---|---|---|---|
| | | | 91.0 | 83.5 | 91.3 | 77.0 | 8.1 | 6.0 |
| ✓ | | | 91.8 | 86.0 | 92.4 | 77.8 | 16.5 | 13.7 |
| | ✓ | | 93.1 | 82.8 | 91.4 | 77.5 | 12.0 | 9.6 |
| | | ✓ | 91.2 | 85.4 | 91.7 | 77.3 | 13.4 | 11.8 |
| ✓ | ✓ | | 94.2 | 83.8 | 92.7 | 78.4 | 21.2 | 15.8 |
| ✓ | | ✓ | 89.5 | 84.3 | 91.4 | 78.2 | 22.1 | 17.5 |
| Model Name | Train | Inference | Params/M | GFLOPs | mAP@0.5/% |
|---|---|---|---|---|---|
| Rep-Vision-GCN + FPN-PAN | ✓ | | 6.69 | 16.5 | 92.4 |
| Rep-Vision-GCN + FPN-PAN | | ✓ | 6.66 | 15.9 | 92.4 |
| YOLOv8n-Backbone + Rep-FPN-PAN | ✓ | | 4.91 | 12.7 | 91.4 |
| YOLOv8n-Backbone + Rep-FPN-PAN | | ✓ | 4.64 | 12.0 | 91.4 |
| Ours (Rep-ViG-Apple) | ✓ | | 8.06 | 21.2 | 92.7 |
| Ours (Rep-ViG-Apple) | | ✓ | 7.78 | 20.1 | 92.7 |
Model Name | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | GFLOPs | Size/MB |
---|---|---|---|---|---|---|
YOLOv8n | 91.0 | 83.5 | 91.3 | 77.0 | 8.1 | 6.3 |
YOLOv8s | 93.5 | 81.6 | 91.8 | 80.2 | 28.4 | 22.5 |
YOLOv8m | 92.7 | 83.5 | 91.7 | 80.9 | 78.7 | 52.1 |
YOLOv6n | 93.5 | 82.1 | 91.7 | 77.7 | 11.8 | 8.7 |
YOLOv6s | 91.7 | 81.5 | 91.5 | 79.0 | 44.0 | 32.9 |
YOLOv5n | 93.7 | 83.3 | 91.2 | 77.6 | 7.1 | 5.3 |
YOLOv5s | 93.3 | 82.4 | 91.7 | 79.4 | 23.8 | 18.5 |
YOLOv5m | 93.5 | 83.8 | 92.1 | 81.0 | 64.0 | 50.5 |
YOLOv3-tiny | 91.3 | 79.2 | 90.1 | 74.8 | 18.9 | 24.4 |
YOLOX | 88.0 | 83.6 | 89.4 | 60.6 | - | 61.1 |
YOLOF | 88.7 | 84.7 | 89.9 | 67.8 | - | 322.6 |
SSD | 79.2 | 86.7 | 84.9 | 63.1 | - | 181.9 |
RetinaNet | 89.1 | 80.0 | 91.1 | 69.5 | - | 277.0 |
FCOS | 84.0 | 91.4 | 88.7 | 63.7 | - | 244.7 |
Faster R-CNN | 83.8 | 88.5 | 84.3 | 70.4 | - | 321.2 |
Cascade R-CNN | 86.2 | 89.3 | 86.8 | 71.2 | - | 527.8 |
Ours (Rep-ViG-Apple) | 94.2 | 83.8 | 92.7 | 78.4 | 21.2 | 15.8 |
| Global | Speed Up | P/% | R/% | mAP@0.5/% | GFLOPs | Size/MB |
|---|---|---|---|---|---|---|
| | 1.3 | 92.5 | 85.0 | 93.3 | 16.5 | 12.4 |
| ✓ | 1.3 | 92.4 | 86.4 | 93.0 | 16.5 | 12.4 |
| | 1.5 | 91.5 | 82.3 | 92.6 | 14.3 | 10.9 |
| ✓ | 1.5 | 92.2 | 86.6 | 92.8 | 14.3 | 10.9 |
| | 1.8 | 90.1 | 73.2 | 82.5 | 12.8 | 9.8 |
| ✓ | 1.8 | 90.5 | 71.5 | 82.3 | 12.8 | 9.8 |