Individual Segmentation of Intertwined Apple Trees in a Row via Prompt Engineering †
Abstract
1. Introduction
2. Materials and Methods
2.1. Orchard Description, Data Acquisition and Annotation
2.2. Trunk Detection
2.3. Prompt Engineering
Algorithm 1 Prompt engineering.

Input:
- A: instance mask of k labeled trunks
- if mode = supervised:
  - h: approximate height of tree (pixels)
- else:
  - G: RGB image of mask A

Output:
- Z: dictionary of k tuples
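The input/output contract above can be sketched in code. This is a minimal illustration, not the authors' exact procedure: it assumes the supervised mode places one prompt point on each trunk centroid and a second point h pixels above the trunk top (toward the canopy), with image row 0 at the top of the frame; all names are illustrative.

```python
# Hedged sketch of Algorithm 1 (prompt engineering), assuming point prompts
# are derived from trunk centroids. A is an instance mask given as a nested
# list: A[y][x] = 0 for background, or a trunk label in 1..k.

def engineer_prompts(A, mode="supervised", h=None):
    """Return Z: dict mapping trunk label -> tuple of (x, y) point prompts."""
    # Collect pixel coordinates per trunk label.
    pixels = {}
    for y, row in enumerate(A):
        for x, label in enumerate(row):
            if label > 0:
                pixels.setdefault(label, []).append((x, y))

    Z = {}
    for label, pts in pixels.items():
        cx = sum(p[0] for p in pts) / len(pts)   # centroid column
        cy = sum(p[1] for p in pts) / len(pts)   # centroid row
        top_y = min(p[1] for p in pts)           # highest trunk pixel
        if mode == "supervised" and h is not None:
            # One prompt on the trunk, one in the presumed canopy above it.
            canopy_y = max(0, top_y - h)
            Z[label] = ((cx, cy), (cx, canopy_y))
        else:
            # Unsupervised mode: trunk centroid only; in the paper the extra
            # points would instead be derived from the RGB image G (omitted).
            Z[label] = ((cx, cy),)
    return Z
```

In practice, the resulting (x, y) tuples in Z would be passed as point prompts to a SAM-family model, one tree at a time.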
2.4. Tree Segmentation
2.5. Metrics
3. Statistical Analysis
4. Results
4.1. Trunk Segmentation and Detection
4.2. Tree Segmentation
5. Discussion
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| LiDAR | Light Detection and Ranging |
| CLAHE | Contrast Limited Adaptive Histogram Equalization |
| FPN | Feature Pyramid Network |
| YOLO | You Only Look Once |
| VLM | Visual Language Model |
| VFM | Visual Foundation Model |
| MOLMO | Multimodal Open Language Model |
| SAM | Segment Anything Model |
| SAM1 | Segment Anything Model, version 1 |
| SAM2 | Segment Anything Model, version 2 |
| SAMHQ2 | Segment Anything Model in High Quality, version 2 |
| FastSAM | Fast Segment Anything Model |
| RobustSAM | Robust Segment Anything Model |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
| DINO | Distillation with No Labels |
| ViT | Vision Transformer |
| ViT-B | ViT Base |
| ViT-L | ViT Large |
| AP | Average Precision |
| DSC | Dice–Sørensen Coefficient |
| ME | Mean Error |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |
Technical Terms
| Term | Definition |
|---|---|
| RGB cameras or images | Synonym of colour cameras or images; RGB stands for Red, Green and Blue. |
| Stereovision techniques | Computer vision methods that use at least two RGB cameras from different viewpoints to reconstruct the scene in 3D space. |
| Prompt | Structured input (text, points, boxes or masks) used to guide a vision model toward specific objects or regions. |
| Prompt engineering | Methodological process of designing and optimizing inputs (i.e., prompts) to effectively guide a model toward a specific output. |
| Row | A set of trees planted in a line. |
| Napari software | Image processing tool for multidimensional data. |
| Instance segmentation | Computer vision task of labelling individual objects within the same class. |
| Object detection | Computer vision task of drawing a bounding box around each object. |
| YOLOv5s, YOLOv8, YOLOv11 | Different versions of the YOLO series. |
| SOLOv2 | Segmenting Objects by LOcations, version 2. |
| Mask R-CNN | Mask Region-based Convolutional Neural Network. |
| Fine-tuning | Process of training a pre-trained model on a smaller or task-specific dataset. |
| Training dataset | Data used to fit the model. |
| Validation dataset | Data used to tune hyperparameters and evaluate performance during training. |
| Test dataset | Data used to assess the model's final performance on unseen data. |
| Data augmentation | Techniques that increase the diversity of data by artificially applying transformations to the original data. |
| CLAHE algorithm | Technique that enhances local image contrast while limiting noise amplification. |
| Grafting point | Anatomical interface between the scion and rootstock in a grafted tree. |
| Genotype | Individual distinguished by its specific combination of genetic markers or alleles. |
| Image registration | Method to align two or more images of the same scene, taken at different times, from different viewpoints, or with different sensors, by transforming them into a common coordinate system. |
| Pseudo-label mask | Mask automatically generated by a semi-supervised model and used as a substitute for ground-truth labels. |
| Teacher–assistant model | An intermediate model that helps transfer knowledge from a large teacher model to a smaller student model during distillation. |
| Grounding approach | Approach linking text, used as a prompt or description, to localized objects or regions in an image. |
| Latent embedding | Low-dimensional vector representation at the end of a neural network that captures the key features of the input data. |
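Several of the metric abbreviations above (DSC, TP, FP, FN) have standard set-based definitions on binary masks. The sketch below illustrates those textbook definitions; whether the paper aggregates them per tree or per image is described in Section 2.5, and the representation of masks as coordinate sets is a simplification for clarity.

```python
# Standard segmentation metrics from the abbreviation list (DSC, precision,
# recall), computed on masks represented as sets of (x, y) pixel coordinates.

def mask_metrics(pred, truth):
    """Return (dice, precision, recall) for two pixel-coordinate sets."""
    tp = len(pred & truth)   # true positives: pixels present in both masks
    fp = len(pred - truth)   # false positives: predicted but not ground truth
    fn = len(truth - pred)   # false negatives: ground truth pixels missed
    dice = 2 * tp / (2 * tp + fp + fn) if (pred or truth) else 1.0
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return dice, precision, recall
```

For example, with `pred = {(0, 0), (0, 1), (1, 0)}` and `truth = {(0, 0), (0, 1), (1, 1)}`, there are 2 true positives, 1 false positive and 1 false negative, giving Dice = precision = recall = 2/3.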
References
- Qi, H.; Huang, Z.; Jin, B.; Tang, Q.; Jia, L.; Zhao, G.; Cao, D.; Sun, Z.; Zhang, C. SAM-GAN: An improved DCGAN for rice seed viability determination using near-infrared hyperspectral imaging. Comput. Electron. Agric. 2024, 216, 108473. [Google Scholar] [CrossRef]
- Yang, L.; Zhao, J.; Ying, X.; Lu, C.; Zhou, X.; Gao, Y.; Wang, L.; Liu, H.; Song, H. Utilization of deep learning models to predict calving time in dairy cattle from tail acceleration data. Comput. Electron. Agric. 2024, 225, 109253. [Google Scholar] [CrossRef]
- Rayamajhi, A.; Jahanifar, H.; Mahmud, M.S. Measuring ornamental tree canopy attributes for precision spraying using drone technology and self-supervised segmentation. Comput. Electron. Agric. 2024, 225, 109359. [Google Scholar] [CrossRef]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643. [Google Scholar]
- Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Jiang, Q.; Li, C.; Yang, J.; Su, H.; et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Cham, Switzerland, 2024; pp. 38–55. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. ICLR 2022, 1, 3. [Google Scholar]
- Oh, I.S. Review of Fruit Tree Image Segmentation. arXiv 2024, arXiv:2412.14631. [Google Scholar]
- Comesaña-Cebral, L.; Martínez-Sánchez, J.; Lorenzo, H.; Arias, P. Individual tree segmentation method based on mobile backpack LiDAR point clouds. Sensors 2021, 21, 6007. [Google Scholar] [CrossRef]
- Zine-El-Abidine, M.; Dutagaci, H.; Galopin, G.; Rousseau, D. Assigning apples to individual trees in dense orchards using 3D colour point clouds. Biosyst. Eng. 2021, 209, 30–52. [Google Scholar] [CrossRef]
- Chen, Q.; Luo, H.; Cheng, Y.; Xie, M.; Nan, D. An individual tree detection and segmentation method from TLS and MLS point clouds based on improved seed points. Forests 2024, 15, 1083. [Google Scholar] [CrossRef]
- Underwood, J.P.; Jagbrant, G.; Nieto, J.I.; Sukkarieh, S. Lidar-based tree recognition and platform localization in orchards. J. Field Robot. 2015, 32, 1056–1074. [Google Scholar] [CrossRef]
- Nielsen, M.; Slaughter, D.C.; Gliever, C.; Upadhyaya, S. Orchard and tree mapping and description using stereo vision and lidar. In Proceedings of the International Conference of Agricultural Engineering, Valencia, Spain, 8–12 July 2012; p. 1380. [Google Scholar]
- La, Y.J.; Seo, D.; Kang, J.; Kim, M.; Yoo, T.W.; Oh, I.S. Deep Learning-Based Segmentation of Intertwined Fruit Trees for Agricultural Tasks. Agriculture 2023, 13, 2097. [Google Scholar] [CrossRef]
- Huang, J.; Jiang, K.; Zhang, J.; Qiu, H.; Lu, L.; Lu, S.; Xing, E. Learning to prompt segment anything models. arXiv 2024, arXiv:2401.04651. [Google Scholar]
- Zhou, K.; Yang, J.; Loy, C.C.; Liu, Z. Learning to prompt for vision-language models. Int. J. Comput. Vis. 2022, 130, 2337–2348. [Google Scholar] [CrossRef]
- Wang, J.; Liu, Z.; Zhao, L.; Wu, Z.; Ma, C.; Yu, S.; Dai, H.; Yang, Q.; Liu, Y.; Zhang, S.; et al. Review of large vision models and visual prompt engineering. Meta-Radiology 2023, 1, 100047. [Google Scholar] [CrossRef]
- Shtedritski, A.; Rupprecht, C.; Vedaldi, A. What does clip know about a red circle? visual prompt engineering for vlms. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 11987–11997. [Google Scholar]
- Zhang, C.; Puspitasari, F.D.; Zheng, S.; Li, C.; Qiao, Y.; Kang, T.; Shan, X.; Zhang, C.; Qin, C.; Rameau, F.; et al. A survey on segment anything model (sam): Vision foundation model meets prompt engineering. arXiv 2023, arXiv:2306.06211. [Google Scholar]
- Gu, J.; Han, Z.; Chen, S.; Beirami, A.; He, B.; Zhang, G.; Liao, R.; Qin, Y.; Tresp, V.; Torr, P. A systematic survey of prompt engineering on vision-language foundation models. arXiv 2023, arXiv:2307.12980. [Google Scholar]
- Ali, H.; Bulbul, M.F.; Shah, Z. Prompt Engineering in Medical Image Segmentation: An Overview of the Paradigm Shift. In Proceedings of the 2023 IEEE International Conference on Artificial Intelligence, Blockchain, and Internet of Things (AIBThings), Mount Pleasant, MI, USA, 16–17 September 2023; pp. 1–4. [Google Scholar]
- Carraro, A.; Sozzi, M.; Marinello, F. The Segment Anything Model (SAM) for accelerating the smart farming revolution. Smart Agric. Technol. 2023, 6, 100367. [Google Scholar] [CrossRef]
- Swartz, L.G.; Liu, S.; Cozatl, D.M.; Palaniappan, K. Segmentation of Arabidopsis thaliana Using Segment-Anything. In Proceedings of the 2023 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), St. Louis, MO, USA, 27–29 September 2023; pp. 1–5. [Google Scholar]
- Chen, Y.; Yang, Z.; Bian, W.; Serikawa, S.; Zhang, L. Extraction Study of Leaf Area and Plant Height of Radish Seedlings Based on SAM. In Networking and Parallel/Distributed Computing Systems: Volume 18; Springer: Cham, Switzerland, 2024; pp. 69–83. [Google Scholar]
- Sun, J.; Yan, S.; Alexandridis, T.; Yao, X.; Zhou, H.; Gao, B.; Huang, J.; Yang, J.; Li, Y. Enhancing Crop Mapping through Automated Sample Generation Based on Segment Anything Model with Medium-Resolution Satellite Imagery. Remote. Sens. 2024, 16, 1505. [Google Scholar] [CrossRef]
- Osco, L.P.; Wu, Q.; de Lemos, E.L.; Gonçalves, W.N.; Ramos, A.P.M.; Li, J.; Junior, J.M. The segment anything model (sam) for remote sensing applications: From zero to one shot. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103540. [Google Scholar] [CrossRef]
- Torres-Lomas, E.; Lado-Jimena, J.; Garcia-Zamora, G.; Diaz-Garcia, L. Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties. arXiv 2024, arXiv:2403.12935. [Google Scholar] [CrossRef]
- Zhang, W.; Dang, L.M.; Nguyen, L.Q.; Alam, N.; Bui, N.D.; Park, H.Y.; Moon, H. Adapting the Segment Anything Model for Plant Recognition and Automated Phenotypic Parameter Measurement. Horticulturae 2024, 10, 398. [Google Scholar] [CrossRef]
- Jung, M.; Roth, M.; Aranzana, M.J.; Auwerkerken, A.; Bink, M.; Denancé, C.; Dujak, C.; Durel, C.E.; Font i Forcada, C.; Cantin, C.M.; et al. The apple REFPOP—A reference population for genomics-assisted breeding in apple. Hortic. Res. 2020, 7, 189. [Google Scholar] [CrossRef]
- Zhao, G.; Yang, R.; Jing, X.; Zhang, H.; Wu, Z.; Sun, X.; Jiang, H.; Li, R.; Wei, X.; Fountas, S.; et al. Phenotyping of individual apple tree in modern orchard with novel smartphone-based heterogeneous binocular vision and YOLOv5s. Comput. Electron. Agric. 2023, 209, 107814. [Google Scholar] [CrossRef]
- Sun, X.; Fang, W.; Gao, C.; Fu, L.; Majeed, Y.; Liu, X.; Gao, F.; Yang, R.; Li, R. Remote estimation of grafted apple tree trunk diameter in modern orchard with RGB and point cloud based on SOLOv2. Comput. Electron. Agric. 2022, 199, 107209. [Google Scholar] [CrossRef]
- Sapkota, R.; Ahmed, D.; Karkee, M. Comparing YOLOv8 and Mask R-CNN for instance segmentation in complex orchard environments. Artif. Intell. Agric. 2024, 13, 84–99. [Google Scholar] [CrossRef]
- Hamuda, E.; Glavin, M.; Jones, E. A survey of image processing techniques for plant extraction and segmentation in the field. Comput. Electron. Agric. 2016, 125, 184–199. [Google Scholar] [CrossRef]
- Chen, Y.; Huang, Y.; Zhang, Z.; Wang, Z.; Liu, B.; Liu, C.; Huang, C.; Dong, S.; Pu, X.; Wan, F.; et al. Plant image recognition with deep learning: A review. Comput. Electron. Agric. 2023, 212, 108072. [Google Scholar] [CrossRef]
- Upadhyay, A.; Chandel, N.S.; Singh, K.P.; Chakraborty, S.K.; Nandede, B.M.; Kumar, M.; Subeesh, A.; Upendar, K.; Salem, A.; Elbeltagi, A. Deep learning and computer vision in plant disease detection: A comprehensive review of techniques, models, and trends in precision agriculture. Artif. Intell. Rev. 2025, 58, 1–64. [Google Scholar] [CrossRef]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
- Reza, A.M. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. Vlsi Signal Process. Syst. Signal Image Video Technol. 2004, 38, 35–44. [Google Scholar] [CrossRef]
- Deitke, M.; Clark, C.; Lee, S.; Tripathi, R.; Yang, Y.; Park, J.S.; Salehi, M.; Muennighoff, N.; Lo, K.; Soldaini, L.; et al. Molmo and pixmo: Open weights and open data for state-of-the-art multimodal models. arXiv 2024, arXiv:2409.17146. [Google Scholar]
- Sahoo, P.; Singh, A.K.; Saha, S.; Jain, V.; Mondal, S.; Chadha, A. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv 2024, arXiv:2402.07927. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Part V 13. Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
- Metuarea, H.; Garnier, J.; Guerif, K.; Didelot, F.; Laurens, F.; Bervas, L.; Rasti, P.; Dutagaci, H.; Rousseau, D. Leveraging on foundation deep neural models for individual apple tree segmentation in dense orchards via prompt engineering in RGB images. In Proceedings of the Computer Vision For Plant Phenotyping and Agriculture 2024 ECCV Workshop, Milan, Italy, 29 September–4 October 2024; pp. 1–2. [Google Scholar]
- Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714. [Google Scholar]
- Chen, Y.; Ivanova, A.; Saeed, S.U.; Hargunani, R.; Huang, J.; Liu, C.; Hu, Y. Segmentation by registration-enabled sam prompt engineering using five reference images. In Proceedings of the International Workshop on Biomedical Image Registration, 2024, Marrakesh, Morocco, 6 October 2024; Springer: Cham, Switzerland, 2024; pp. 241–252. [Google Scholar]
- Wang, Z.; Zhang, Y.; Zhang, Z.; Jiang, Z.; Yu, Y.; Li, L.; Li, L. Exploring Semantic Prompts in the Segment Anything Model for Domain Adaptation. Remote. Sens. 2024, 16, 758. [Google Scholar] [CrossRef]
- Dapena-Fuente, E.; Blázquez Nogueiro, M.D. Descripción de Las Variedades de Manzana de la DOP Sidra de Asturias; Servicio Regional de Investigacion y Desarrollo Agroalimentario (SERIDA): Villaviciosa, Asturias, Spain, 2009. [Google Scholar]
- Lespinasse, J.; Delort, J. Apple tree management in vertical axis: Appraisal after ten years of experiments. In Proceedings of the III International Symposium on Research and Development on Orchard and Plantation Systems 160, Montpellier, France, 21–26 May 1984; pp. 139–156. [Google Scholar]
- Sestras, R.E.; Sestras, A.F. Quantitative traits of interest in apple breeding and their implications for selection. Plants 2023, 12, 903. [Google Scholar] [CrossRef]
- Gallais, A.; Bannerot, H. Amélioration des Espèces végétales cultivées. Objectifs et critères de Sélection; Inra: Paris, France, 1992. [Google Scholar]
- Cheng, D.; Qin, Z.; Jiang, Z.; Zhang, S.; Lao, Q.; Li, K. Sam on medical images: A comprehensive study on three prompt modes. arXiv 2023, arXiv:2305.00035. [Google Scholar]
- Huang, M.; Xu, G.; Li, J.; Huang, J. A method for segmenting disease lesions of maize leaves in real time using attention YOLACT++. Agriculture 2021, 11, 1216. [Google Scholar] [CrossRef]
- Li, K.; Gong, W.; Shi, Y.; Li, L.; He, Z.; Ding, X.; Wang, Y.; Ma, L.; Hao, W.; Yang, Z.; et al. Predicting positions and orientations of individual kiwifruit flowers and clusters in natural environments. Comput. Electron. Agric. 2023, 211, 108039. [Google Scholar] [CrossRef]
- Picon, A.; Eguskiza, I.; Galan, P.; Gomez-Zamanillo, L.; Romero, J.; Klukas, C.; Bereciartua-Perez, A.; Scharner, M.; Navarra-Mestre, R. Crop-conditional semantic segmentation for efficient agricultural disease assessment. Artif. Intell. Agric. 2025, 15, 79–87. [Google Scholar] [CrossRef]
- Gai, R.; Chen, N.; Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 2023, 35, 13895–13906. [Google Scholar] [CrossRef]
- Mirhaji, H.; Soleymani, M.; Asakereh, A.; Mehdizadeh, S.A. Fruit detection and load estimation of an orange orchard using the YOLO models through simple approaches in different imaging and illumination conditions. Comput. Electron. Agric. 2021, 191, 106533. [Google Scholar] [CrossRef]
- Yang, S.; Zhang, J.; Yuan, J. A High-Accuracy Contour Segmentation and Reconstruction of a Dense Cluster of Mushrooms Based on Improved SOLOv2. Agriculture 2024, 14, 1646. [Google Scholar] [CrossRef]
- Crespo, A.; Moncada, C.; Crespo, F.; Morocho-Cayamcela, M.E. An efficient strawberry segmentation model based on Mask R-CNN and TensorRT. Artif. Intell. Agric. 2025, 15, 327–337. [Google Scholar] [CrossRef]
- Li, H.; Mo, Y.; Chen, J.; Chen, J.; Li, J. Accurate Orah fruit detection method using lightweight improved YOLOv8n model verified by optimized deployment on edge device. Artif. Intell. Agric. 2025, 15, 707–723. [Google Scholar] [CrossRef]
- Zhu, K.; Li, J.; Zhang, K.; Arunachalam, C.; Bhattacharya, S.; Lu, R.; Li, Z. Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting. arXiv 2025, arXiv:2502.01850. [Google Scholar] [CrossRef]
- Sapkota, R.; Paudel, A.; Karkee, M. Zero-shot automatic annotation and instance segmentation using llm-generated datasets: Eliminating field imaging and manual annotation for deep learning model development. arXiv 2024, arXiv:2411.11285. [Google Scholar]
- El Akrouchi, M.; Mhada, M.; Bayad, M.; Hawkesford, M.J.; Gérard, B. AI-Based Framework for Early Detection and Segmentation of Green Citrus fruits in Orchards. Smart Agric. Technol. 2025, 10, 100834. [Google Scholar] [CrossRef]
- Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef]
- Ke, L.; Ye, M.; Danelljan, M.; Liu, Y.; Tai, Y.W.; Tang, C.K.; Yu, F. Segment Anything in High Quality. In Proceedings of the NeurIPS, New Orleans, LA, USA, 10–16 December 2023; pp. 29914–29934. [Google Scholar]
- Zhao, X.; Ding, W.; An, Y.; Du, Y.; Yu, T.; Li, M.; Tang, M.; Wang, J. Fast Segment Anything. arXiv 2023, arXiv:2306.12156. [Google Scholar]
- Chen, W.T.; Vong, Y.J.; Kuo, S.Y.; Ma, S.; Wang, J. RobustSAM: Segment Anything Robustly on Degraded Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 4081–4091. [Google Scholar]
- Maier-Hein, L.; Reinke, A.; Godau, P.; Tizabi, M.D.; Buettner, F.; Christodoulou, E.; Glocker, B.; Isensee, F.; Kleesiek, J.; Kozubek, M.; et al. Metrics reloaded: Pitfalls and recommendations for image analysis validation. arXiv 2022, arXiv:2206.01653. [Google Scholar] [CrossRef]
- Saporta, G. Probabilités, Analyse des Données et Statistique; Editions Technip: Paris, France, 2006; pp. 1–622. [Google Scholar]
- Xiao, B.; Wu, H.; Xu, W.; Dai, X.; Hu, H.; Lu, Y.; Zeng, M.; Liu, C.; Yuan, L. Florence-2: Advancing a unified representation for a variety of vision tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 4818–4829. [Google Scholar]
- Zou, X.; Yang, J.; Zhang, H.; Li, F.; Li, L.; Wang, J.; Wang, L.; Gao, J.; Lee, Y.J. Segment everything everywhere all at once. Adv. Neural Inf. Process. Syst. 2023, 36, 19769–19782. [Google Scholar]
- Yan, S.; Hou, W.; Rao, Y.; Jiang, D.; Jin, X.; Wang, T.; Wang, Y.; Liu, L.; Zhang, T.; Genis, A. Multi-scale cross-modal feature fusion and cost-sensitive loss function for differential detection of occluded bagging pears in practical orchards. Artif. Intell. Agric. 2025, 15, 573–589. [Google Scholar] [CrossRef]
| Dataset | Images | Trees | Segmentation | Location |
|---|---|---|---|---|
| REFPOP | 275 | 697 | Instance | France |
| | 275 | 944 | Instance | Spain |
| | 275 | 543 | Instance | Italy |
| | 275 | 841 | Instance | Switzerland |
| | 275 | 809 | Instance | Belgium |
| La et al. [13] | 150 | 150 | Semantic | South Korea |
| Split | Dataset | Location | Images | Total |
|---|---|---|---|---|
| Train | REFPOP | France | 55 | 275 |
| | | Spain | 55 | |
| | | Italy | 55 | |
| | | Switzerland | 55 | |
| | | Belgium | 55 | |
| Validation | REFPOP | France | 30 | 150 |
| | | Spain | 30 | |
| | | Italy | 30 | |
| | | Switzerland | 30 | |
| | | Belgium | 30 | |
| Test | REFPOP | France | 30 | 150 |
| | | Spain | 30 | |
| | | Italy | 30 | |
| | | Switzerland | 30 | |
| | | Belgium | 30 | |
| | La et al. [13] | South Korea | 30 | 30 |
| Method | Arch. | Param. | Training Set |
|---|---|---|---|
| Supervised | | | |
| YOLOv11 [37] | CSPDarknet53 | 2.67M | MS COCO [41] |
| YOLOv8 [36] | CSPDarknet53 | 5.35M | MS COCO [41] |
| FPN [35] | VGG16 | 1.58M | INRAe [42] |
| Zero-shot | | | |
| MOLMO [39] + SAM2 [43] | Molmo 7B + ViT-L | 8.2B + 639M | PixMo |
| DINO [5] + SAM2 [43] | SwinT + ViT-L | 310M + 639M | Grounding-20M |
| Approach | Method | Arch. | Param. | Training Set |
|---|---|---|---|---|
| Empirical | Supervised prompt | . | . | . |
| | Unsupervised prompt | . | . | . |
| Grounding | Molmo [39] | Molmo 7B | 8.2B | PixMo |
| | DINO [5] | SwinT | 310M | Grounding-20M |
| Method | Arch. | Param. | Training Set |
|---|---|---|---|
| Zero-shot | | | |
| SAM1 [4] | ViT-B | 93.7M | SA-1B |
| SAM2 [43] | ViT-B | 80.8M | SA-V |
| SAMHQ2 [63] | ViT-L | 224M | HQSeg-44K |
| FastSAM [64] | FastSAM-x | 68M | SA-1B |
| RobustSAM [65] | ViT-B | 153M | LVIS, MSRA10K, ThinObject-5k |
| Supervised | | | |
| YOLOv8 [13] | CSPDarknet53 | 71.75M | [13] |
| Models (↓) \ Dataset (→) | REFPOP | La et al. [13] |
|---|---|---|
| YOLOv11 | * | * |
| YOLOv8 | * | 0.53 ± 0.09 |
| FPN | 0.55 ± 0.05 | * |
| Molmo+SAM2 | * | * |
| DINO+SAM2 | * | * |
| Models (↓) \ Dataset (→) | REFPOP | La et al. [13] |
|---|---|---|
| YOLOv11 | 0.97 ± 0.03 | 0.97 ± 0.05 |
| YOLOv8 | * | * |
| FPN | * | * |
| Molmo+SAM2 | * | * |
| DINO+SAM2 | * | * |
| Models (↓) \ Dataset (→) | REFPOP | La et al. [13] |
|---|---|---|
| Zero-shot | | |
| SAM1 | * | * |
| SAM2 | * | * |
| SAMHQ2 | 0.70 ± 0.03 | 0.84 ± 0.03 |
| FastSAM | * | * |
| RobustSAM | * | * |
| Supervised | | |
| YOLOv8 [13] with La et al. | * | |
| YOLOv8 [13] with REFPOP | * | * |
| Approach | SAM1 | SAM2 | SAMHQ2 | FastSAM | RobustSAM |
|---|---|---|---|---|---|
| No prompt | * | * | * | * | * |
| | 0.66 ± 0.02 | * | * | | |
| | * | 0.65 ± 0.03 | 0.70 ± 0.03 | * | 0.63 ± 0.04 |
| | * | * | * | * | * |
| | * | * | * | 0.14 ± 0.13 | * |
| Location (↓) \ Metrics (→) | Dice | Precision | Recall | Mean Error |
|---|---|---|---|---|
| Switzerland | | | | |
| Spain | | | | |
| Italy | | | | |
| France | | | | |
| Belgium | | | | |
| South Korea [13] | | | | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Metuarea, H.; Laurens, F.; Guerra, W.; Lozano, L.; Patocchi, A.; Van Hoye, S.; Dutagaci, H.; Labrosse, J.; Rasti, P.; Rousseau, D. Individual Segmentation of Intertwined Apple Trees in a Row via Prompt Engineering. Sensors 2025, 25, 4721. https://doi.org/10.3390/s25154721