SAMNet++: A Segment Anything Model for Supervised 3D Point Cloud Semantic Segmentation
Abstract
1. Introduction
- Develop a novel hybrid segmentation model that leverages unsupervised and supervised segmentation techniques for improved 3D segmentation accuracy and efficiency.
- Demonstrate the advantages of a dual-stage segmentation pipeline, where SAM LiDAR performs coarse segmentation and PointNet++ refines it.
- Evaluate SAMNet++ on real-world UAV-collected LiDAR datasets, demonstrating its effectiveness against state-of-the-art methods.
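The dual-stage design in the second bullet can be sketched structurally. The sketch below is a minimal stand-in under stated assumptions, not the authors' implementation: a voxel-grid spatial hash stands in for SAM LiDAR's image-space coarse masks, and a simple height rule stands in for the trained PointNet++ head; all function names are hypothetical.

```python
import numpy as np

def coarse_segments(points):
    """Stage 1 stand-in: group points into coarse segments.
    (In SAMNet++ this role is played by SAM LiDAR on the colored cloud;
    here a 2 m voxel-grid hash stands in so the sketch is runnable.)"""
    k = np.floor(points[:, :3] / 2.0).astype(np.int64)
    key = (k[:, 0] * 73856093) ^ (k[:, 1] * 19349663) ^ (k[:, 2] * 83492791)
    _, seg_ids = np.unique(key, return_inverse=True)
    return seg_ids

def pointwise_classes(points):
    """Stage 2 stand-in: per-point semantic labels.
    (In SAMNet++ this is a trained PointNet++; here a height threshold rule
    producing 4 toy classes.)"""
    return np.digitize(points[:, 2], bins=[1.0, 3.0, 10.0])

def refine_within_segments(labels, seg_ids):
    """Fuse the two stages: majority vote of point labels inside each
    coarse segment, so the coarse geometry constrains the fine labels."""
    refined = labels.copy()
    for s in np.unique(seg_ids):
        mask = seg_ids == s
        refined[mask] = np.bincount(labels[mask]).argmax()
    return refined

# Toy cloud: N x 6 array (XYZ + RGB), not real UAV data.
cloud = np.random.default_rng(0).uniform(0.0, 20.0, size=(1000, 6))
pred = refine_within_segments(pointwise_classes(cloud), coarse_segments(cloud))
print(pred.shape)  # one semantic label per point
```

The design point illustrated: the coarse stage partitions space cheaply, and the fine stage only has to be consistent within each partition, which is what makes the two-stage pipeline both faster and more accurate than a pointwise classifier alone.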
2. Proposed Approach
2.1. Data Fusion and Generating Colored Point Cloud
2.2. SAMNet++
2.2.1. SAM LiDAR Segmentation
2.2.2. PointNet++ Segmentation
2.2.3. Training and Validation
2.2.4. Testing the Model
2.3. Point Translation
3. Data Collection and Preprocessing
3.1. Sensor Fusion
3.2. Annotation
4. Results
4.1. Evaluation on Our Experimental Datasets
4.2. Evaluation on a Public Dataset
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Mo, Y.; Wu, Y.; Yang, X.; Liu, F.; Liao, Y. Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 2022, 493, 626–646. [Google Scholar] [CrossRef]
- Paparoditis, N.; Cord, M.; Jordan, M.; Cocquerez, J.P. Building Detection and Reconstruction from Mid- and High-Resolution Aerial Imagery. Comput. Vis. Image Underst. 1998, 72, 122–142. [Google Scholar] [CrossRef]
- Hinz, S.; Baumgartner, A. Automatic extraction of urban road networks from multi-view aerial imagery. ISPRS J. Photogramm. Remote Sens. 2003, 58, 83–98. [Google Scholar] [CrossRef]
- Bhadauria, A.; Bhadauria, H.; Kumar, A. Building extraction from satellite images. IOSR J. Comput. Eng. 2013, 12, 76–81. [Google Scholar]
- Klonus, S.; Tomowski, D.; Ehlers, M.; Reinartz, P.; Michel, U. Combined Edge Segment Texture Analysis for the Detection of Damaged Buildings in Crisis Areas. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1118–1128. [Google Scholar] [CrossRef]
- Lohani, B.; Ghosh, S. Airborne LiDAR Technology: A Review of Data Collection and Processing Systems. Natl. Acad. Sci. India. Proc. Sect. A Phys. Sci. 2017, 87, 567–579. [Google Scholar] [CrossRef]
- Elamin, A.; El-Rabbany, A. UAV-Based Multi-Sensor Data Fusion for Urban Land Cover Mapping Using a Deep Convolutional Neural Network. Remote Sens. 2022, 14, 4298. [Google Scholar] [CrossRef]
- Lyu, Y.; Huang, X.; Zhang, Z. Learning to Segment 3D Point Clouds in 2D Image Space. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 12252–12261. [Google Scholar]
- Kuçak, R.A.; Özdemir, E.; Erol, S. The Segmentation of Point Clouds with K-Means and ANN (Artificial Neural Network). In The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; Copernicus GmbH: Gottingen, Germany, 2017; pp. 595–598. [Google Scholar]
- Zhao, J.; Li, C.; Tian, L.; Zhu, J.; Zhou, J.; Verikas, A.; Radeva, P.; Nikolaev, D. FPFH-based graph matching for 3D point cloud registration. In Proceedings of the Tenth International Conference on Machine Vision (ICMV 2017), Vienna, Austria, 13–15 November 2017; SPIE: Bellingham, WA, USA, 2018. [Google Scholar] [CrossRef]
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364. [Google Scholar] [CrossRef]
- Muhammad Yasir, S.; Ahn, H. Deep Learning-Based 3D Instance and Semantic Segmentation: A Review. J. Artif. Intell. 2022, 4, 99–114. [Google Scholar] [CrossRef]
- Zhang, R.; Wu, Y.; Jin, W.; Meng, X. Deep-Learning-Based Point Cloud Semantic Segmentation: A Survey. Electronics 2023, 12, 3642. [Google Scholar] [CrossRef]
- Zhang, Z.; Yang, B.; Wang, B.; Li, B. GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; IEEE: New York, NY, USA, 2023; pp. 17619–17629. [Google Scholar]
- Poux, F.; Mattes, C.; Kobbelt, L. Unsupervised Segmentation of Indoor 3D Point Cloud: Application to Object-Based Classification. In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; Copernicus GmbH: Gottingen, Germany, 2020; Volume XLIV-4/W1-2020, pp. 111–118. [Google Scholar] [CrossRef]
- Charles, R.Q.; Hao, S.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 77–85. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar] [CrossRef]
- Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.A.A.K.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar] [CrossRef]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3D Semantic Parsing of Large-Scale Indoor Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 1534–1543. [Google Scholar]
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 9296–9306. [Google Scholar]
- Hou, Y.; Zhu, X.; Ma, Y.; Loy, C.C.; Li, Y. Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 8469–8478. [Google Scholar]
- El Madawi, K.; Rashed, H.; El Sallab, A.; Nasr, O.; Kamel, H.; Yogamani, S. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; IEEE: New York, NY, USA, 2019; pp. 7–12. [Google Scholar]
- Strom, J.; Richardson, A.; Olson, E. Graph-based segmentation for colored 3D laser point clouds. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 18–22 October 2010; IEEE: New York, NY, USA, 2010; pp. 2131–2136. [Google Scholar]
- Krispel, G.; Opitz, M.; Waltner, G.; Possegger, H.; Bischof, H. FuseSeg: LiDAR Point Cloud Segmentation Fusing Multi-Modal Data. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; IEEE: New York, NY, USA, 2020; pp. 1863–1872. [Google Scholar]
- Zhuang, Z.; Li, R.; Jia, K.; Wang, Q.; Li, Y.; Tan, M. Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 16260–16270. [Google Scholar]
- Yarroudh, A. LiDAR Automatic Unsupervised Segmentation Using Segment-Anything Model (SAM) from Meta AI. Available online: https://github.com/Yarroudh/segment-lidar (accessed on 1 August 2024).
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643. [Google Scholar] [CrossRef]
- Zenmuse L1 Specification. Available online: https://enterprise.dji.com/zenmuse-l1/specs (accessed on 20 May 2021).
- Cristóvão, M.P.; Portugal, D.; Carvalho, A.E.; Ferreira, J.F. A LiDAR-Camera-Inertial-GNSS Apparatus for 3D Multimodal Dataset Collection in Woodland Scenarios. Sensors 2023, 23, 6676. [Google Scholar] [CrossRef]
- Osco, L.P.; Wu, Q.; de Lemos, E.L.; Gonçalves, W.N.; Ramos, A.P.M.; Li, J.; Marcato, J. The Segment Anything Model (SAM) for remote sensing applications: From zero to one shot. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103540. [Google Scholar] [CrossRef]
- Wu, Q.; Osco, L.P. samgeo: A Python package for segmenting geospatial data with the Segment Anything Model (SAM). J. Open Source Softw. 2023, 8, 5663. [Google Scholar] [CrossRef]
- Graham, L. The LAS 1.4 Specification. Photogramm. Eng. Remote Sens. 2012, 78, 93–102. [Google Scholar]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Bridle, J.S. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neurocomputing: Algorithms, Architectures and Applications; Springer: Berlin/Heidelberg, Germany, 1990; pp. 227–236. [Google Scholar]
- Automatic Mixed Precision Package—Torch.Amp. Available online: https://pytorch.org/docs/stable/amp.html (accessed on 1 November 2024).
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar] [CrossRef]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- Stone, M. Cross-Validatory Choice and Assessment of Statistical Predictions. J. R. Stat. Society. Ser. B Methodol. 1974, 36, 111–147. [Google Scholar] [CrossRef]
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
- Tan, W.; Qin, N.; Ma, L.; Li, Y.; Du, J.; Cai, G.; Yang, K.; Li, J. Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 797–806. [Google Scholar]
- DJI Terra Software (Version 4.0.0). Available online: https://enterprise.dji.com/dji-terra (accessed on 30 April 2024).
- Girardeau-Montaut, D. CloudCompare: 3D Point Cloud and Mesh Processing Software, Open Source Project; Version 2.13.1; Telecom ParisTech: Paris, France, 2024; Available online: https://www.cloudcompare.org/ (accessed on 1 November 2024).
- Precision 5820 Tower Workstation. Available online: https://www.dell.com/en-ca/shop/workstations/precision-5820-tower-workstation/spd/precision-5820-workstation (accessed on 1 November 2024).
- NVIDIA® T1000, 8 GB GDDR6, full height, PCIe 3.0x16, 4 mDP Graphics Card. Available online: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/productspage/quadro/quadro-desktop/nvidia-t1000-datasheet-1987414-r4.pdf (accessed on 1 November 2024).
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 146. [Google Scholar] [CrossRef]
- Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 6410–6419. [Google Scholar]
- Ma, L.; Li, Y.; Li, J.; Tan, W.; Yu, Y.; Chapman, M.A. Multi-Scale Point-Wise Convolutional Neural Networks for 3D Object Segmentation From LiDAR Point Clouds in Large-Scale Environments. IEEE Trans. Intell. Transp. Syst. 2021, 22, 821–836. [Google Scholar] [CrossRef]
- Li, Y.; Ma, L.; Zhong, Z.; Cao, D.; Li, J. TGNet: Geometric Graph CNN on 3-D Point Cloud Segmentation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3588–3600. [Google Scholar] [CrossRef]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Learning Semantic Segmentation of Large-Scale Point Clouds With Random Sampling. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8338–8354. [Google Scholar] [CrossRef]
- Yan, K.; Hu, Q.; Wang, H.; Huang, X.; Li, L.; Ji, S. Continuous Mapping Convolution for Large-Scale Point Clouds Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6502505. [Google Scholar] [CrossRef]
- Yoo, S.; Jeong, Y.; Jameela, M.; Sohn, G. Human Vision Based 3D Point Cloud Semantic Segmentation of Large-Scale Outdoor Scenes. In Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–27 June 2023; IEEE: New York, NY, USA, 2023; pp. 6577–6586. [Google Scholar]
- Zeng, Z.; Xu, Y.; Xie, Z.; Tang, W.; Wan, J.; Wu, W. Large-scale point cloud semantic segmentation via local perception and global descriptor vector. Expert Syst. Appl. 2024, 246, 123269. [Google Scholar] [CrossRef]
- Li, X.; Zhang, Z.; Li, Y.; Huang, M.; Zhang, J. SFL-NET: Slight Filter Learning Network for Point Cloud Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5703914. [Google Scholar] [CrossRef]
- Du, J.; Cai, G.; Wang, Z.; Huang, S.; Su, J.; Marcato Junior, J.; Smit, J.; Li, J. ResDLPS-Net: Joint residual-dense optimization for large-scale point cloud semantic segmentation. ISPRS J. Photogramm. Remote Sens. 2021, 182, 37–51. [Google Scholar] [CrossRef]
- Nurunnabi, A.; Teferle, F.N.; Li, J.; Lindenbergh, R.C.; Parvaz, S. Investigation of Pointnet for Semantic Segmentation of Large-Scale Outdoor Point Clouds. In The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; Copernicus GmbH: Gottingen, Germany, 2021; Volume XLVI-4/W5, pp. 397–404. [Google Scholar] [CrossRef]
- Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull. 1945, 1, 80–83. [Google Scholar] [CrossRef]
Dataset | Method | OA | Precision (Micro) | Precision (Macro) | Recall (Micro) | Recall (Macro) | F1-Score (Micro) | F1-Score (Macro)
---|---|---|---|---|---|---|---|---
Dataset 1 | PointNet | 14.96% | 14.95% | 9.86% | 14.95% | 7.47% | 14.95% | 8.50%
Dataset 1 | PointNet++ | 84.61% | 84.61% | 73.62% | 84.61% | 70.58% | 84.61% | 66.65%
Dataset 1 | SAMNet++ | 97.68% | 97.87% | 91.84% | 97.87% | 98.44% | 97.87% | 94.74%
Dataset 2 | PointNet | 20.41% | 20.41% | 13.01% | 20.41% | 10.20% | 20.41% | 11.44%
Dataset 2 | PointNet++ | 89.49% | 89.49% | 78.98% | 89.49% | 63.78% | 89.49% | 66.25%
Dataset 2 | SAMNet++ | 98.86% | 98.86% | 93.96% | 98.86% | 98.86% | 98.86% | 96.26%
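A note on how the micro and macro columns relate: for single-label multiclass prediction over one fixed point set, micro-averaged precision, recall, and F1 all pool true/false positives across classes and therefore reduce to overall accuracy, which is why the micro columns match OA in the PointNet and PointNet++ rows. The toy check below uses scikit-learn (which the paper's references already include); the label vectors are illustrative, not taken from the datasets.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]  # toy ground-truth classes
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 2, 0]  # toy predictions

oa = accuracy_score(y_true, y_pred)
# Micro averaging pools TP/FP/FN over all classes before computing the score;
# macro averaging computes the score per class and then takes the unweighted mean.
p_micro = precision_score(y_true, y_pred, average="micro")
p_macro = precision_score(y_true, y_pred, average="macro")
r_macro = recall_score(y_true, y_pred, average="macro")
f_macro = f1_score(y_true, y_pred, average="macro")

print(oa, p_micro)        # micro precision equals OA for single-label multiclass
print(round(p_macro, 4))  # mean of per-class precisions (1/2, 2/3, 4/5)
```

Macro scores penalize poor minority-class performance that a high OA can hide, which is why both averages are reported in the table.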
Dataset | PointNet | PointNet++ | SAMNet++
---|---|---|---
Dataset 1 | 956.73 min | 120.70 min | 59.42 min
Dataset 2 | 3001.73 min | 337.38 min | 159.33 min
Method | OA | mIoU | Road | Road Markings | Nature | Building | Utility Line | Pole | Car | Fence |
---|---|---|---|---|---|---|---|---|---|---|
PointNet++ [17] | 84.88 | 41.81 | 89.27 | 0.00 | 69.0 | 54.1 | 43.7 | 23.3 | 52.0 | 3.0 |
DGCNN [46] | 94.24 | 61.79 | 93.88 | 0.00 | 91.25 | 80.39 | 62.40 | 62.32 | 88.26 | 15.81 |
KPFCNN [47] | 95.39 | 69.11 | 94.62 | 0.06 | 96.07 | 91.51 | 87.68 | 81.56 | 85.66 | 15.72 |
MS-PCNN [48] | 90.03 | 65.89 | 93.84 | 3.83 | 93.46 | 82.59 | 67.80 | 71.95 | 91.12 | 22.50 |
TGNet [49] | 94.08 | 61.34 | 93.54 | 0.00 | 90.83 | 81.57 | 65.26 | 62.98 | 88.73 | 7.85 |
MS-TGNet [41] | 95.71 | 70.50 | 94.41 | 17.19 | 95.72 | 88.83 | 76.01 | 73.97 | 94.24 | 23.64 |
RandLA-Net [50] | 92.95 | 77.71 | 94.61 | 42.62 | 96.89 | 93.01 | 86.51 | 78.07 | 92.85 | 37.12 |
MappingConvSeg [51] | 93.17 | 77.57 | 95.02 | 39.27 | 96.77 | 93.32 | 86.37 | 79.11 | 89.81 | 40.89 |
EyeNet [52] | 94.63 | 81.13 | 96.98 | 65.02 | 97.83 | 93.51 | 86.77 | 84.86 | 94.02 | 30.01 |
LACV-Net [53] | 95.8 | 78.5 | 94.8 | 42.7 | 96.7 | 91.4 | 88.2 | 79.6 | 93.9 | 40.6 |
SFL-Net * [54] | 96.0 | 78.1 | 94.2 | 34.0 | 96.9 | 93.8 | 87.1 | 85.7 | 93.5 | 39.7 |
RandLA-Net * [50] | 94.37 | 81.77 | 96.69 | 64.21 | 96.92 | 94.24 | 88.06 | 77.84 | 93.37 | 42.86 |
MappingConvSeg * [51] | 94.72 | 82.89 | 97.15 | 67.87 | 97.55 | 93.75 | 86.88 | 82.12 | 93.72 | 44.11 |
ResDLPS-Net * [55] | 96.49 | 80.27 | 95.82 | 59.80 | 96.10 | 90.96 | 86.82 | 79.95 | 89.41 | 43.31 |
LACV-Net * [53] | 97.4 | 82.7 | 97.1 | 66.9 | 97.3 | 93.0 | 87.3 | 83.4 | 93.4 | 43.1 |
SFL-Net * [54] | 97.9 | 81.9 | 97.7 | 70.7 | 95.8 | 91.7 | 87.4 | 78.8 | 92.3 | 40.8 |
SAMNet++ * (ours) | 96.90 | 86.93 | 95.37 | 53.62 | 98.99 | 99.10 | 91.26 | 92.06 | 97.86 | 77.67 |
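The Toronto-3D comparison above reports OA alongside mIoU, where each per-class column is the intersection-over-union IoU = TP / (TP + FP + FN) and mIoU is their unweighted mean. A self-contained sketch of that computation from a confusion matrix follows; the label vectors are toy values, not Toronto-3D data.

```python
import numpy as np

def miou_from_labels(y_true, y_pred, n_classes):
    """Per-class IoU and mean IoU from integer label vectors."""
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(conf, (y_true, y_pred), 1)  # rows: true class, cols: predicted
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp            # predicted as c, but true class differs
    fn = conf.sum(axis=1) - tp            # true class c, but predicted differently
    iou = tp / (tp + fp + fn)             # assumes every class occurs at least once
    return iou, iou.mean()

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2, 0])
iou, miou = miou_from_labels(y_true, y_pred, n_classes=3)
print(iou)   # per-class IoU: [1/3, 1/2, 2/3]
print(miou)  # mean IoU: 0.5
```

Because mIoU weights every class equally, a model can post a lower OA yet a higher mIoU than a rival by handling rare classes (e.g., Fence, Road Markings) better, which is the pattern visible in the table.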
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shahraki, M.; Elamin, A.; El-Rabbany, A. SAMNet++: A Segment Anything Model for Supervised 3D Point Cloud Semantic Segmentation. Remote Sens. 2025, 17, 1256. https://doi.org/10.3390/rs17071256