CLOUDSPAM: Contrastive Learning On Unlabeled Data for Segmentation and Pre-Training Using Aggregated Point Clouds and MoCo
Abstract
1. Introduction
- We adapt a contrastive learning approach, namely SegContrast, to address the challenges of large-scale mobile mapping 3D LiDAR point clouds;
- We design a data augmentation approach for mobile mapping point clouds;
- We leverage merged heterogeneous mobile mapping datasets during the pre-training phase of self-supervised learning to provide enough positive and negative pairs for contrastive learning, thereby improving accuracy and generalizability (the contrastive objective is sketched just below).
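To make the contrastive objective behind these contributions concrete, the following is a minimal PyTorch sketch of a MoCo-style [11] InfoNCE loss over segment embeddings, in the spirit of SegContrast [13]. The function name, tensor shapes, queue handling, and temperature are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def moco_infonce(q, k, queue, temperature=0.07):
    """MoCo-style InfoNCE loss over segment embeddings (sketch).

    q:     (N, D) queries from the online encoder.
    k:     (N, D) keys of the same segments under a different
           augmentation, produced by the momentum encoder.
    queue: (K, D) embeddings of past segments acting as negatives.
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1).detach()            # no gradient through keys
    neg = F.normalize(queue, dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)      # (N, 1) positive logits
    l_neg = q @ neg.t()                           # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)        # the positive is class 0
```

In MoCo, the key encoder is updated as an exponential moving average of the query encoder, and the queue is refreshed with each batch of keys; merging heterogeneous datasets enlarges the pool of segments feeding this queue.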
1.1. Literature Review
2. Materials and Methods
2.1. Revisiting the SegContrast Pre-Training Pipeline
2.2. Adapting the SegContrast Pre-Training Pipeline for Mobile Mapping Point Clouds
2.2.1. Dedicated Data Augmentation Approach
2.2.2. Heterogeneous Dataset Merging
1. Baseline: a classical supervised baseline with MinkUNet [28] trained on the original data.
2. DA-supervised: a classical supervised baseline with MinkUNet [28] trained on the augmented partitions.
3. CLOUDSPAM: self-supervised pre-training with MoCo [11] using unlabeled partitions from the merged heterogeneous datasets, followed by supervised fine-tuning on labeled partitions from the targeted dataset (a minimal fine-tuning sketch follows this list).
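To illustrate setup 3, the following is a minimal PyTorch sketch of the fine-tuning stage, assuming a MoCo pre-trained backbone checkpoint. `build_minkunet`, the checkpoint path, and `labeled_loader` are hypothetical placeholders; the actual pipeline depends on the sparse-convolution framework used for MinkUNet [28].

```python
import torch
import torch.nn.functional as F

# `build_minkunet` and `labeled_loader` are hypothetical placeholders.
model = build_minkunet(num_classes=15)               # e.g., KITTI-360 eval classes
state = torch.load("cloudspam_moco_pretrained.pth")  # assumed checkpoint path
# strict=False: the MoCo projection head has no counterpart in the
# segmentation model, so mismatched keys are simply skipped.
model.load_state_dict(state, strict=False)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
model.train()
for points, labels in labeled_loader:                # labeled partitions only
    logits = model(points)                           # per-point class scores
    loss = F.cross_entropy(logits, labels, ignore_index=-1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```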
2.3. Data and Experimental Configuration
- KITTI-360: This dataset covers 73.7 km of medium-population-density streets in Karlsruhe, Germany. It consists of 9 labeled sequences containing over 1.2 billion points for training and more than 340 million points for validation, and encompasses 46 classes grouped into 7 categories. The point clouds were post-processed and their point density was uniformized. A single aggregated point cloud from this dataset contains on average 2.5 million points, with a density of around 500 pts/m².
- Paris-Lille-3D: This dataset covers 1.9 km of urban streets in Paris and Lille, France, and contains 119.8 million points. It encompasses 50 classes grouped into 9 categories and is split into 4 point clouds, with point densities ranging from 1000 pts/m² to 2000 pts/m². The test sets were published as an add-on and cover different locations: 1 point cloud was acquired in Dijon and 2 in Ajaccio, both in France. Each consists of exactly 10 million points.
- Toronto-3D: This dataset covers a 1 km stretch of road in a dense suburban area of Toronto, Canada. It contains 78.3 million points split into 8 classes, divided into 4 overlapping sections, each spanning a driving distance of roughly 250 m. The second section is held out as the test set and contains 6.7 million points. Toronto-3D differs from the other two datasets in its significant disparity in point density: every point within a 100 m LiDAR range is kept, whereas a 20 m range is used in the other datasets. Moreover, no post-processing trimming or downsampling was applied to this dataset. (One way to estimate such pts/m² figures is sketched after this list.)
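The pts/m² figures above can be made concrete with a simple occupancy-grid estimate. The following is a minimal sketch, assuming a 1 m × 1 m ground grid restricted to occupied cells; it is our illustration, not the published methodology of any of these datasets.

```python
import numpy as np

def point_density(xyz: np.ndarray, cell: float = 1.0) -> float:
    """Estimate the average point density (pts/m^2) of an aggregated
    mobile mapping cloud by counting points per occupied ground cell.

    xyz:  (N, 3) point coordinates in metres.
    cell: edge length of the 2D grid cells, in metres (assumed 1 m).
    """
    ij = np.floor(xyz[:, :2] / cell).astype(np.int64)   # 2D grid indices
    _, counts = np.unique(ij, axis=0, return_counts=True)
    return float(counts.mean()) / (cell * cell)         # avg pts per m^2
```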
3. Results
4. Discussion
4.1. DA-Supervised
4.2. CLOUDSPAM
4.3. Impact of Pre-Training
4.4. Impact of Data Augmentation
4.5. Comparison Against the State of the Art
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Griffiths, D.; Boehm, J. A Review on Deep Learning Techniques for 3D Sensed Data Classification. Remote Sens. 2019, 11, 1499.
2. Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4338–4364.
3. Xie, Y.; Tian, J.; Zhu, X.X. Linking Points with Labels in 3D: A Review of Point Cloud Semantic Segmentation. IEEE Geosci. Remote Sens. Mag. 2020, 8, 38–59.
4. Xiao, A.; Huang, J.; Guan, D.; Zhang, X.; Lu, S.; Shao, L. Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11321–11339.
5. Arora, S.; Khandeparkar, H.; Khodak, M.; Plevrakis, O.; Saunshi, N. A Theoretical Analysis of Contrastive Unsupervised Representation Learning. arXiv 2019, arXiv:1902.09229.
6. Xiao, A.; Zhang, X.; Shao, L.; Lu, S. A Survey of Label-Efficient Deep Learning for 3D Point Clouds. arXiv 2023, arXiv:2305.19812.
7. Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
8. Liao, Y.; Xie, J.; Geiger, A. KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3292–3310.
9. Gui, J.; Chen, T.; Cao, Q.; Sun, Z.; Luo, H.; Tao, D. A Survey of Self-Supervised Learning from Multiple Perspectives: Algorithms, Theory, Applications and Future Trends. arXiv 2023, arXiv:2301.05712.
10. Hou, J.; Graham, B.; Nießner, M.; Xie, S. Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts. arXiv 2021, arXiv:2012.09165.
11. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. arXiv 2020, arXiv:1911.05722.
12. Xie, S.; Gu, J.; Guo, D.; Qi, C.R.; Guibas, L.J.; Litany, O. PointContrast: Unsupervised Pre-Training for 3D Point Cloud Understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
13. Nunes, L.; Marcuzzi, R.; Chen, X.; Behley, J.; Stachniss, C. SegContrast: 3D Point Cloud Feature Representation Learning through Self-Supervised Segment Discrimination. IEEE Robot. Autom. Lett. 2022, 7, 2116–2123.
14. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284.
15. Jiang, L.; Shi, S.; Tian, Z.; Lai, X.; Liu, S.; Fu, C.W.; Jia, J. Guided Point Contrastive Learning for Semi-Supervised Point Cloud Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6423–6432.
16. Li, L.; Shum, H.P.; Breckon, T.P. Less Is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9361–9371.
17. Fei, B.; Yang, W.; Liu, L.; Luo, T.; Zhang, R.; Li, Y.; He, Y. Self-Supervised Learning for Pre-Training 3D Point Clouds: A Survey. arXiv 2023, arXiv:2305.04691.
18. Tan, W.; Qin, N.; Ma, L.; Li, Y.; Du, J.; Cai, G.; Yang, K.; Li, J. Toronto-3D: A Large-Scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 202–203.
19. Roynard, X.; Deschaud, J.E.; Goulette, F. Paris-Lille-3D: A Large and High-Quality Ground-Truth Urban Point Cloud Dataset for Automatic Segmentation and Classification. Int. J. Robot. Res. 2018, 37, 545–557.
20. van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748.
21. Belongie, S.; Malik, J.; Puzicha, J. Shape Matching and Object Recognition Using Shape Contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522.
22. Körtgen, M.; Park, G.J.; Novotni, M.; Klein, R. 3D Shape Matching with 3D Shape Contexts. In Proceedings of the 7th Central European Seminar on Computer Graphics, Vienna, Austria, 7–9 May 2003.
23. Xie, S.; Liu, S.; Chen, Z.; Tu, Z. Attentional ShapeContextNet for Point Cloud Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
24. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
25. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
26. Narksri, P.; Takeuchi, E.; Ninomiya, Y.; Morales, Y.; Akai, N.; Kawaguchi, N. A Slope-Robust Cascaded Ground Segmentation in 3D Point Cloud for Autonomous Vehicles. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 497–504.
27. Mahmoudi Kouhi, R.; Daniel, S.; Giguère, P. Data Preparation Impact on Semantic Segmentation of 3D Mobile LiDAR Point Clouds Using Deep Neural Networks. Remote Sens. 2023, 15, 74.
28. Choy, C.; Gwak, J.; Savarese, S. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3075–3084.
29. Robert, D.; Vallet, B.; Landrieu, L. Learning Multi-View Aggregation in the Wild for Large-Scale 3D Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 5575–5584.
30. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
31. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12.
32. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
33. Ma, L.; Li, Y.; Li, J.; Tan, W.; Yu, Y.; Chapman, M.A. Multi-Scale Point-Wise Convolutional Neural Networks for 3D Object Segmentation from LiDAR Point Clouds in Large-Scale Environments. IEEE Trans. Intell. Transp. Syst. 2019, 22, 821–836.
34. Li, Y.; Ma, L.; Zhong, Z.; Cao, D.; Li, J. TGNet: Geometric Graph CNN on 3-D Point Cloud Segmentation. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3588–3600.
35. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8338–8354.
36. Thomas, H.; Goulette, F.; Deschaud, J.E.; Marcotegui, B.; LeGall, Y. Semantic Classification of 3D Point Clouds with Multiscale Spherical Neighborhoods. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 390–398.
37. Roynard, X.; Deschaud, J.E.; Goulette, F. Classification of Point Cloud for Road Scene Understanding with Multiscale Voxel Deep Network. In Proceedings of the 10th Workshop on Planning, Perception and Navigation for Intelligent Vehicles (PPNIV 2018), Madrid, Spain, 1–5 October 2018.
38. Liang, Z.; Yang, M.; Deng, L.; Wang, C.; Wang, B. Hierarchical Depthwise Graph Convolutional Neural Network for 3D Semantic Segmentation of Point Clouds. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 8152–8158.
39. Luo, H.; Chen, C.; Fang, L.; Khoshelham, K.; Shen, G. MS-RRFSegNet: Multiscale Regional Relation Feature Segmentation Network for Semantic Segmentation of Urban Scene Point Clouds. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8301–8315.
40. Boulch, A.; Le Saux, B.; Audebert, N. Unstructured Point Cloud Semantic Labeling Using Deep Segmentation Networks. 3DOR@Eurographics 2017, 3, 1–8.
41. Boulch, A.; Puy, G.; Marlet, R. FKAConv: Feature-Kernel Alignment for Point Cloud Convolution. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020; pp. 381–399.
Segmentation performance (mIoU) for varying fractions of labeled data:

| Labeled Dataset | Method | 1% | 2% | 10% | 20% | 50% | 100% |
|---|---|---|---|---|---|---|---|
| KITTI-360 | Baseline | 23.1% | 29.1% | 37.9% | 39.1% | 41.3% | 51.0% |
| KITTI-360 | DA-supervised | 38.1% | 42.4% | 52.4% | 58.3% | 61.9% | 64.1% |
| KITTI-360 | CLOUDSPAM | 41.3% | 46.3% | 53.3% | 59.0% | 61.9% | 63.6% |
| Toronto-3D | Baseline | 27.7% | 29.8% | 38.4% | 39.9% | 41.9% | 57.0% |
| Toronto-3D | DA-supervised | 47.8% | 54.4% | 59.2% | 66.0% | 69.7% | 69.3% |
| Toronto-3D | CLOUDSPAM | 49.3% | 65.1% | 62.7% | 70.4% | 71.3% | 71.8% |
| Paris-Lille-3D | Baseline | 32.7% | 45.9% | 52.1% | 57.2% | 69.1% | 68.9% |
| Paris-Lille-3D | DA-supervised | 33.4% | 44.6% | 52.9% | 55.2% | 66.5% | 63.8% |
| Paris-Lille-3D | CLOUDSPAM | 44.1% | 55.5% | 60.1% | 66.7% | 70.8% | 73.8% |
Dataset statistics before (Original) and after (Ours) partitioning:

| | KITTI-360 (Original) | KITTI-360 (Ours) | Toronto-3D (Original) | Toronto-3D (Ours) | Paris-Lille-3D (Original) | Paris-Lille-3D (Ours) |
|---|---|---|---|---|---|---|
| # of segments | 11,950 | 552,012 | 1348 | 46,426 | 2405 | 48,532 |
| # of clouds | 239 | 14,340 | 3 | 1231 | 4 | 1580 |
| Avg. # of pts per cloud | 2,689,600 | 131,072 | 12,866,207 | 131,072 | 29,945,846 | 131,072 |
| Avg. radius of clouds (m) | 113 | 20 | 125 | 25 | 300 | 12 |
| Total # of points (millions) | 1200.0 | 1879.5 | 78.3 | 161.3 | 119.8 | 205.9 |
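The "Ours" columns correspond to aggregated clouds split into fixed-size partitions of 131,072 points within dataset-specific radii (12–25 m). The following is a minimal sketch of such a partitioning step, assuming random seed points and rejection of overly sparse regions; the paper's exact partitioning procedure may differ.

```python
import numpy as np

def partition_cloud(xyz, n_pts=131_072, radius=20.0, seed=0):
    """Split an aggregated cloud into fixed-size partitions (here
    131,072 points within a ~20 m radius, as in the KITTI-360 column).

    Returns a list of index arrays, one per partition.
    """
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(xyz))
    partitions = []
    while len(remaining) >= n_pts:
        center = xyz[rng.choice(remaining), :2]              # random seed point
        d = np.linalg.norm(xyz[remaining, :2] - center, axis=1)
        inside = remaining[d < radius]
        if len(inside) < n_pts:                              # region too sparse:
            remaining = remaining[d >= radius]               # discard and retry
            continue
        pick = rng.choice(inside, size=n_pts, replace=False) # fixed point count
        partitions.append(pick)
        remaining = np.setdiff1d(remaining, pick)
    return partitions
```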
Per-class IoU (%) on KITTI-360:

| Method | mIoU | Road | Sidewalk | Building | Wall | Fence | Pole | Traffic Light | Traffic Sign | Vegetation | Terrain | Person | Car | Truck | Motorcycle | Bicycle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MinkUNet* | 54.2 | 90.6 | 74.4 | 84.5 | 45.3 | 42.9 | 52.7 | 0.5 | 38.6 | 87.6 | 70.3 | 26.9 | 87.3 | 66.0 | 28.2 | 17.2 |
| DeepViewAgg | 57.8 | 93.5 | 77.5 | 89.3 | 53.5 | 47.1 | 55.6 | 18.0 | 44.5 | 91.8 | 71.8 | 40.2 | 87.8 | 30.8 | 39.6 | 26.1 |
| SPT | 63.5 | 93.3 | 79.3 | 90.8 | 56.2 | 45.7 | 52.8 | 20.4 | 51.4 | 89.8 | 73.6 | 61.6 | 95.1 | 79.0 | 53.1 | 10.9 |
| DA-supervised | 64.1 | 95.6 | 83.3 | 90.4 | 56.2 | 50.2 | 60.9 | 0.0 | 53.7 | 90.7 | 75.7 | 73.4 | 96.4 | 82.5 | 47.5 | 4.5 |
| CLOUDSPAM | 63.6 | 95.6 | 83.4 | 90.4 | 56.2 | 48.7 | 60.6 | 10.4 | 52.7 | 90.7 | 75.5 | 62.0 | 96.3 | 75.6 | 49.3 | 6.9 |
Per-class IoU (%) on Toronto-3D:

| Method | mIoU | Road | Road Mark | Natural | Building | Utility Line | Pole | Car | Fence |
|---|---|---|---|---|---|---|---|---|---|
| PointNet++ [30] | 56.5 | 91.4 | 7.6 | 89.8 | 74.0 | 68.6 | 59.5 | 54.0 | 7.5 |
| PointNet++(MSG) [30] | 53.1 | 90.7 | 0.0 | 86.7 | 75.8 | 56.2 | 60.9 | 44.5 | 10.2 |
| DGCNN [31] | 49.6 | 90.6 | 0.4 | 81.2 | 93.9 | 47.0 | 56.9 | 49.3 | 7.3 |
| KPConv [32] | 60.3 | 90.2 | 0.0 | 86.8 | 86.8 | 81.1 | 73.1 | 42.8 | 21.6 |
| MS-PCNN [33] | 58.0 | 91.2 | 3.5 | 90.5 | 77.3 | 62.3 | 68.5 | 53.6 | 17.1 |
| TGNet [34] | 58.3 | 91.4 | 10.6 | 91.0 | 76.9 | 68.3 | 66.2 | 54.1 | 8.2 |
| MS-TGNet [18] | 61.0 | 90.9 | 18.8 | 92.2 | 80.6 | 69.4 | 71.2 | 51.0 | 13.6 |
| RandLA-Net [35] | 77.7 | 94.6 | 42.6 | 96.9 | 93.0 | 86.5 | 78.1 | 92.8 | 37.1 |
| DA-supervised | 69.3 | 94.9 | 0.0 | 94.9 | 90.0 | 84.4 | 73.8 | 89.7 | 26.5 |
| CLOUDSPAM | 71.8 | 95.0 | 0.0 | 95.7 | 90.5 | 85.7 | 77.1 | 91.7 | 38.7 |
Per-class IoU (%) on Paris-Lille-3D:

| Method | mIoU | Ground | Building | Pole | Bollard | Trash Can | Barrier | Pedestrian | Car | Nature |
|---|---|---|---|---|---|---|---|---|---|---|
| RF_MSSF [36] | 56.3 | 99.3 | 88.6 | 47.8 | 67.3 | 2.3 | 27.1 | 20.6 | 74.8 | 78.8 |
| MS3_DVS [37] | 66.9 | 99.0 | 94.8 | 52.4 | 38.1 | 36.0 | 49.3 | 52.6 | 91.3 | 88.6 |
| HDGCN [38] | 68.3 | 99.4 | 93.0 | 67.7 | 75.7 | 25.7 | 44.7 | 37.1 | 81.9 | 89.6 |
| MS-RRFSegNet [39] | 79.2 | 98.6 | 98.0 | 79.7 | 74.3 | 75.1 | 57.9 | 55.9 | 82.0 | 91.4 |
| ConvPoint [40] | 75.9 | 99.5 | 95.1 | 71.6 | 88.7 | 46.7 | 52.9 | 53.5 | 89.4 | 85.4 |
| KPConv [32] | 82.0 | 99.5 | 94.0 | 71.3 | 83.1 | 78.7 | 47.7 | 78.2 | 94.4 | 91.4 |
| FKAConv [41] | 82.7 | 99.6 | 98.1 | 77.2 | 91.1 | 64.7 | 66.5 | 58.1 | 95.6 | 93.9 |
| RandLA-Net [35] | 78.5 | 99.5 | 97.0 | 71.0 | 86.7 | 50.5 | 65.5 | 49.1 | 95.3 | 91.7 |
| DA-supervised | 63.8 | 99.1 | 95.8 | 55.8 | 48.6 | 35.4 | 37.9 | 23.7 | 86.3 | 91.8 |
| CLOUDSPAM | 73.8 | 99.4 | 95.7 | 56.7 | 66.4 | 64.4 | 58.0 | 39.8 | 92.5 | 91.0 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).