An Unstructured Orchard Grape Detection Method Utilizing YOLOv5s
Abstract
1. Introduction
- (1) A new attention mechanism, DCFE, is proposed in this work. DCFE combines convolution and multi-head attention, allowing it to capture both local and global features effectively (an illustrative sketch of such a hybrid block follows this list).
- (2) The DCFE attention mechanism and DS-Conv are incorporated into the YOLOv5 network to improve the model's capacity to extract features from unstructured vineyards, reducing missed grape detections and improving detection accuracy.
- (3) Furthermore, we deployed the algorithm on a grape-picking robot for harvesting experiments, providing solid evidence of the practical usefulness of the approach presented in this paper.
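The DCFE module is only characterized at a high level in this summary. As an illustration, below is a minimal PyTorch sketch of a block that fuses a local depthwise-convolution branch with global multi-head self-attention, in the spirit of that description; the layer choices, fusion strategy, and names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Illustrative hybrid block: a local convolution branch plus global
    multi-head self-attention, fused by a 1x1 convolution. This mirrors
    the DCFE description only loosely and is not the paper's code."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depthwise 3x3 convolution captures neighborhood detail.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )
        # Global branch: self-attention over the flattened spatial positions.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution fuses the concatenated branches back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)    # (B, H*W, C)
        global_branch = attended.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, global_branch], dim=1))

# Shape check on a feature map sized like one from the YOLOv5 neck.
if __name__ == "__main__":
    block = ConvAttentionBlock(128)
    print(block(torch.randn(1, 128, 20, 20)).shape)  # torch.Size([1, 128, 20, 20])
```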
2. Materials and Methods
2.1. Data Acquisition and Processing
2.1.1. Data Acquisition
2.1.2. Dataset Annotation and Partition
2.2. GDN (Grape Detection Model)
2.3. Network Architecture of YOLOv5
2.4. Network Architecture of GDN
2.4.1. Improvements Based on DCFE Attention
2.4.2. Improvements Based on DS-Conv
2.5. Model Training
2.6. Evaluation Metrics
3. Results
3.1. Visualization of DS-Conv Feature Maps
3.2. Ablation Experiments
3.3. Performance Comparison of Different Models
Analysis of Detection Results for YOLOv5s and GDN
3.4. Harvesting Experiments
4. Discussion
5. Conclusions
- (1) Ablation experiments indicate that DCFE enhances the model's robustness, significantly improving recall in complex scenes. The addition of DS-Conv strengthens feature extraction, thereby improving the model's precision. Combining the two yields the largest performance gain, with mAP0.5:0.95 increasing by 2.2% and mAP0.5 by 2.5%.
- (2) A comparison with other object detection networks, SSD, YOLOv3-tiny, and YOLOv5s, shows that GDN performs considerably better across a range of metrics. We also analyzed the GDN model's performance in different scenarios, including well-lit conditions, occlusion, and poor lighting with dense fruit distribution. The results indicate that, in complex scenes, GDN is more robust than YOLOv5s.
- (3) Grape-picking experiments demonstrate that the proposed algorithm has an average inference time of 1.04 s and a harvesting success rate of 90%, meeting the requirements of the harvesting robot and confirming its practicality.
- (4) In conclusion, the GDN grape detection model achieves high accuracy and recall in complex environments, making it suitable for practical grape harvesting by robots, and it offers useful insights for future research on deep learning-based harvesting algorithms.
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Time | Variety | Sensor | Resolution | Number | Train | Val | Depth |
---|---|---|---|---|---|---|---|
11 July 2021 | Shine Muscat | D435i | 1280 × 720 | 580 | 464 | 116 | N |
6 July 2023 | Kyoho | D435i | 1280 × 720 | 680 | 544 | 136 | Y |
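Both acquisition sessions use the same 80/20 train/validation split (464/116 and 544/136). A minimal sketch of such a partition, assuming a flat image directory (the paths and seed are illustrative, not the authors' script):

```python
import random
from pathlib import Path

# Shuffle the image list reproducibly, then take the first 80% for
# training and the remainder for validation (matching the table's ratios).
random.seed(0)
images = sorted(Path("grapes/images").glob("*.jpg"))
random.shuffle(images)
cut = int(0.8 * len(images))
train_set, val_set = images[:cut], images[cut:]
print(len(train_set), len(val_set))  # e.g. 544 136 for the 2023 session
```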
Environment | Details |
---|---|
GPU | NVIDIA A100 × 2 |
CPU | Intel Xeon Gold 6226R |
Ubuntu | 20.04 |
Python | 3.8 |
CUDA | 10.2 |
Hyperparameters | Details |
---|---|
Epochs | 400 |
Image Size | 640 × 640 |
Batch Size | 32 |
Optimizer | SGD |
Initial LR | 0.01 |
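These settings correspond to a fairly standard YOLOv5 training run. Assuming the stock ultralytics/yolov5 repository and a hypothetical dataset config `grapes.yaml`, an equivalent invocation would be `python train.py --img 640 --batch-size 32 --epochs 400 --optimizer SGD --data grapes.yaml --weights yolov5s.pt`; the initial learning rate of 0.01 matches the repository's default `lr0` in its scratch hyperparameter file.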
Method | mAP0.5:0.95 | mAP0.5 | P | R | F1 |
---|---|---|---|---|---|
YOLOv5s | 0.671 | 0.912 | 0.904 | 0.869 | 0.886 |
YOLOv5s + DCFE | 0.673 | 0.929 | 0.897 | 0.893 | 0.895 |
YOLOv5s + DS-Conv | 0.682 | 0.924 | 0.913 | 0.864 | 0.888 |
GDN | 0.693 | 0.937 | 0.918 | 0.885 | 0.901 |
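For reference, the P, R, and F1 columns follow the standard precision/recall definitions; the table's F1 values are consistent with these, e.g. 2 × 0.918 × 0.885 / (0.918 + 0.885) ≈ 0.901 for GDN:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
$$

mAP0.5 averages AP at an IoU threshold of 0.5 over classes, while mAP0.5:0.95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05, following the COCO convention.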
Method | mAP0.5:0.95 | mAP0.5 | P | R | F1 | Size |
---|---|---|---|---|---|---|
SSD_VGG16 | 0.417 | 0.811 | 0.805 | 0.512 | 0.626 | 93.3 MB |
YOLOv3_tiny | 0.596 | 0.887 | 0.893 | 0.825 | 0.858 | 17.4 MB |
YOLOv5s | 0.671 | 0.912 | 0.904 | 0.869 | 0.886 | 14.4 MB |
GDN | 0.693 | 0.937 | 0.918 | 0.885 | 0.901 | 24.9 MB |
Experiment Number | Inference Time (s) | Total Time (s) | Harvesting Result | Experiment Number | Inference Time (s) | Total Time (s) | Harvesting Result |
---|---|---|---|---|---|---|---|
1 | 0.98 | 10.65 | √ | 11 | 1.00 | 12.57 | √ |
2 | 1.14 | 11.27 | √ | 12 | 0.91 | 14.57 | √ |
3 | 0.93 | 10.70 | √ | 13 | 0.92 | 11.59 | √ |
4 | 1.04 | 11.24 | √ | 14 | 0.91 | 12.26 | √ |
5 | 1.16 | 10.92 | √ | 15 | 0.99 | 11.68 | √ |
6 | 0.91 | 11.09 | × | 16 | 1.02 | 13.60 | √ |
7 | 0.91 | 12.12 | √ | 17 | 1.22 | 12.64 | √ |
8 | 1.17 | 10.94 | √ | 18 | 1.13 | 12.79 | √ |
9 | 1.25 | 11.82 | × | 19 | 1.01 | 11.66 | √ |
10 | 1.26 | 11.59 | √ | 20 | 0.95 | 11.09 | √ |
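The summary figures quoted in Section 3.4 can be recomputed directly from this table (a quick check, using the values as listed):

```python
# Per-trial detection inference times (s) from the 20 harvesting trials above.
times = [0.98, 1.14, 0.93, 1.04, 1.16, 0.91, 0.91, 1.17, 1.25, 1.26,
         1.00, 0.91, 0.92, 0.91, 0.99, 1.02, 1.22, 1.13, 1.01, 0.95]
failures = 2  # trials 6 and 9 are marked as failed harvests

print(f"mean inference time: {sum(times) / len(times):.2f} s")              # 1.04 s
print(f"harvest success rate: {(len(times) - failures) / len(times):.0%}")  # 90%
```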
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).