Apple Pose Estimation Based on SCH-YOLO11s Segmentation
Abstract
1. Introduction
- The proposed SCH-YOLO11s network combines SimAM with C3k2 to form the C3k2_SimAM module, replaces the traditional Conv in the backbone with the CMUNeXt Block, and substitutes the HTB for the traditional PSABlock inside the C2PSA module, improving apple and calyx-basin segmentation accuracy.
- The developed apple pose estimation technique. The point clouds of the apple and the calyx basin are segmented using the proposed network and an RGB-D camera. The apple center is determined by least-squares sphere fitting to the apple point cloud, while the center of the calyx-basin point cloud is computed as its mean; segmenting out the concave calyx basin prevents it from distorting the sphere fit. The vector joining the two centers is defined as the pose direction of the apple.
- A validation scheme designed for a laboratory environment. Ground-truth apple pose data were collected manually with an RGB-D camera and compared against the estimated poses to verify the effectiveness of the proposed pose estimation method.
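As a concrete illustration of the attention component in the first bullet, below is a minimal NumPy sketch of the parameter-free SimAM weighting applied to a single feature map. In SCH-YOLO11s this operation sits inside the C3k2_SimAM module rather than standing alone; the function name and the value of the energy regularizer `lam` are illustrative assumptions.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention over a feature map x of shape (C, H, W).

    Each position is weighted by an energy term measuring how much it
    deviates from the other positions in its channel; no learned
    parameters are involved.
    """
    c, h, w = x.shape
    n = h * w - 1                                  # positions excluding the target
    mu = x.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    d = (x - mu) ** 2                              # squared deviation per position
    v = d.sum(axis=(1, 2), keepdims=True) / n      # per-channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5              # inverse energy per position
    return x / (1 + np.exp(-e_inv))                # x * sigmoid(e_inv)
```

Positions that deviate strongly from their channel mean receive larger weights, which is how the module highlights salient regions without adding parameters.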
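The pose pipeline in the bullets above can be sketched in a few lines of NumPy. This is a hypothetical sketch, not the paper's implementation: the function names are invented, and the sphere fit uses a simple algebraic least-squares formulation (solving the linearized sphere equation x² + y² + z² = 2ax + 2by + 2cz + d) as a stand-in for the direct least-squares fitting the method relies on.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit to an (N, 3) point cloud.

    Linearizes |p - c|^2 = r^2 into A @ [a, b, c, d] = |p|^2 and solves
    for the center (a, b, c), with r^2 = d + |center|^2.
    """
    A = np.column_stack([2 * points, np.ones(len(points))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)
    return center, radius

def apple_pose(apple_pts, calyx_pts):
    """Unit vector from the fitted apple center toward the calyx-basin centroid."""
    center, _ = fit_sphere(apple_pts)       # sphere fit on the apple surface only
    calyx_center = calyx_pts.mean(axis=0)   # mean of the calyx-basin point cloud
    direction = calyx_center - center
    return direction / np.linalg.norm(direction)
```

Because the concave calyx-basin points are segmented out before `fit_sphere` is called, they cannot pull the fitted center off the true apple center, which is the motivation for segmenting the two regions separately.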
2. Materials and Methods
2.1. Dataset Production and Data Augmentation
2.2. Model Construction
2.2.1. SCH-YOLO11s Model Construction
2.2.2. C3k2_SimAM
2.2.3. CMUNeXt Block
2.2.4. C2PSA_HTB
2.3. Apple Pose Estimation
2.3.1. Holistic Approach
2.3.2. Point Cloud Acquisition
2.3.3. Pose Estimation
3. Experimental Design and Evaluation Indicators
3.1. Network Modeling Experimental Design
3.2. Indicators for the Assessment of Network Models
3.3. Acquisition of Pose Truth Data
4. Experimental Results
4.1. Comparative Experiments with Different Models
4.2. Ablation Experiment
4.3. Attitude Estimation Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| AP | Average Precision |
| Conv | Convolution |
| DGFF | Dual-Scale Gated Feed Forward |
| DHSA | Dynamic Range Histogram Self-Attention |
| FN | False Negative |
| FP | False Positive |
| FPS | Frames Per Second |
| GELU | Gaussian Error Linear Unit |
| HTB | Histogram Transformer Block |
| IoU | Intersection over Union |
| mAP | Mean Average Precision |
| RGB-D | Red, Green, Blue-Depth |
| SimAM | Simple Attention Mechanism |
| SOR | Statistical Outlier Removal |
| TP | True Positive |
| YOLO | You Only Look Once |
| Network | AP50 | mAP | FPS | GFLOPs |
|---|---|---|---|---|
| YOLOv8s | 94.5 | 67.4 | 3.4 | 42.4 |
| YOLOv9c | 94.7 | 68.2 | 6.3 | 157.6 |
| YOLO11s | 94.5 | 67.5 | 2.8 | 35.3 |
| YOLO11m | 94.1 | 68.1 | 4.3 | 123.0 |
| SCH-YOLO11s | 95.9 | 69.7 | 3.0 | 55.5 |
| Model | C3k2_SimAM | CMUNeXt Block | C2PSA_HTB | AP50 (box) apple | AP50 (box) calyx | mAP (box) apple | mAP (box) calyx | AP50 (seg) apple | AP50 (seg) calyx | mAP (seg) apple | mAP (seg) calyx |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLO11s | × | × | × | 96.3 | 96.0 | 92.8 | 55.3 | 95.3 | 93.7 | 83.9 | 51.0 |
| | √ | × | × | 95.9 | 96.1 | 93.1 | 56.9 | 95.9 | 94.3 | 83.9 | 52.2 |
| | × | √ | × | 96.8 | 96.4 | 93.7 | 55.5 | 96.8 | 94.5 | 84.3 | 52.4 |
| | × | × | √ | 96.5 | 96.0 | 93.1 | 57.0 | 96.5 | 94.0 | 84.4 | 52.9 |
| | √ | √ | √ | 97.1 | 96.9 | 94.4 | 58.3 | 97.1 | 94.7 | 85.7 | 53.7 |
| Error Range (degrees) | Relative Frequency | Mean Error (degrees) | Minimum Error (degrees) | Maximum Error (degrees) |
|---|---|---|---|---|
| Total | 1.000 | 12.3 | 2.63 | 24.9 |
| 0–5 | 0.225 | 3.7 | - | - |
| 5–10 | 0.075 | 5.9 | - | - |
| 10–15 | 0.425 | 12.9 | - | - |
| 15–20 | 0.125 | 17.73 | - | - |
| 20–25 | 0.150 | 22.45 | - | - |
| Calyx Basin Visibility | Relative Frequency | Mean Error (degrees) | Minimum Error (degrees) | Maximum Error (degrees) |
|---|---|---|---|---|
| Fully visible | 0.7 | 9.71 | 2.63 | 18.48 |
| Partially visible | 0.3 | 18.48 | 13.02 | 24.9 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Niu, J.; Bi, M.; Yu, Q. Apple Pose Estimation Based on SCH-YOLO11s Segmentation. Agronomy 2025, 15, 900. https://doi.org/10.3390/agronomy15040900