RaSS: 4D mm-Wave Radar Point Cloud Semantic Segmentation with Cross-Modal Knowledge Distillation
Abstract
1. Introduction
- We present ZJUSSet, a new dataset designed specifically for 4D radar semantic segmentation. To our knowledge, it is the first dataset offering point-level annotations for 4D radar point clouds across 10 categories.
- We develop a cross-modal knowledge distillation framework for semantic segmentation of 4D radar point clouds. Leveraging spatially aligned feature maps from a LiDAR-based teacher model, the radar-only student model learns to extract more discriminative features for the segmentation task.
- We evaluate our RaSS framework alongside other state-of-the-art approaches on both ZJUSSet and VoD [6]. The results confirm that our model outperforms competing methods and highlight the promise of 4D radar-based semantic segmentation.
2. Related Work
2.1. Radar Segmentation Datasets
2.2. Radar-Based Semantic Segmentation Methods
2.3. Knowledge Distillation
3. 4D Radar Semantic Segmentation Dataset
3.1. Sensor Setup
3.2. Calibration
3.3. Data Annotation
3.4. Statistical Analysis
4. Method
4.1. Framework
4.2. Adaptive Doppler Compensation (ADC)
- For simplicity, we assume that the ego-vehicle moves only on the x-y plane and heads along the x direction. Let $v_{r,i}^{j}$ denote the raw Doppler (radial) velocity of the $i$th radar point in the $j$th frame, and let $\theta_{i}^{j}$ and $\phi_{i}^{j}$ be its azimuth and elevation angles. The compensation proceeds in four steps (a code sketch follows the list):
- (1) Project the raw Doppler onto the x-y plane, resulting in the projected speed
$$v_{xy,i}^{j} = \frac{v_{r,i}^{j}}{\cos\phi_{i}^{j}}. \tag{1}$$
- (2) Project $v_{xy,i}^{j}$ onto the x-axis to obtain the relative speed along the vehicle's motion direction,
$$v_{x,i}^{j} = \frac{v_{xy,i}^{j}}{\cos\theta_{i}^{j}} = \frac{v_{r,i}^{j}}{\cos\theta_{i}^{j}\,\cos\phi_{i}^{j}}. \tag{2}$$
- (3) Assuming that the majority of points in the scene come from the static background, we discretize $v_{x,i}^{j}$ and define a voting function $H^{j}(\cdot)$ that counts the votes within the frame. The bin with the highest count is regarded as the estimated ego-vehicle velocity $\hat{v}_{ego}^{j}$ for the current $j$th frame, i.e.,
$$\hat{v}_{ego}^{j} = \arg\max_{v} H^{j}(v). \tag{3}$$
- (4) For the $i$th radar point in the $j$th frame, the compensated velocity is obtained by
$$\tilde{v}_{r,i}^{j} = v_{r,i}^{j} - \hat{v}_{ego}^{j}\,\cos\theta_{i}^{j}\,\cos\phi_{i}^{j}. \tag{4}$$
Finally, we use $\tilde{v}_{r,i}^{j}$ instead of the original $v_{r,i}^{j}$ and feed it into the student network as the initial feature.
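To make the voting scheme concrete, here is a minimal NumPy sketch of the four steps above. It is an illustration under the stated assumptions (planar ego-motion along x); the function name, the default bin width, and the numerical guard are our choices, not from the paper.

```python
import numpy as np

def adaptive_doppler_compensation(doppler, azimuth, elevation, bin_width=0.1):
    """Voting-based ego-velocity estimation and Doppler compensation.

    doppler   : (N,) raw radial (Doppler) velocities v_r  [m/s]
    azimuth   : (N,) azimuth angles theta                  [rad]
    elevation : (N,) elevation angles phi                  [rad]
    """
    # Steps (1)-(2): recover each point's relative speed along the x-axis,
    # assuming its Doppler reading is the radial component of a velocity
    # parallel to the ego-vehicle's heading.
    cos_proj = np.cos(azimuth) * np.cos(elevation)
    valid = np.abs(cos_proj) > 1e-3            # skip near-degenerate geometry
    v_x = doppler[valid] / cos_proj[valid]

    # Step (3): discretize the relative speeds and vote; the densest bin is
    # attributed to the static background, i.e., the ego-motion component.
    bins = np.round(v_x / bin_width).astype(np.int64)
    values, counts = np.unique(bins, return_counts=True)
    v_static = values[np.argmax(counts)] * bin_width

    # Step (4): subtract the static-background component from every point.
    v_comp = doppler - v_static * cos_proj
    return v_comp, v_static
```

After compensation, static points have near-zero velocity by construction, so the Doppler channel fed to the student network encodes object motion rather than a mixture of ego-motion and object motion.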
4.3. Radar Feature Knowledge Distillation (RFKD)
4.4. Loss Functions
5. Experiments
5.1. Experiment Setups
5.1.1. Dataset
5.1.2. Evaluation Metrics
5.1.3. Network Setups
5.1.4. Training and Inference Details
5.2. Experimental Results
5.2.1. Results on ZJUSSet
Method | Input | mIoU | Building | Fence | Vegetation | Car | Cyclist | Pedestrian | Truck | Bus | Tricycle |
---|---|---|---|---|---|---|---|---|---|---|
Cylinder3D [35] | L | 69.02 | 72.36 | 63.17 | 91.95 | 95.52 | 69.39 | 68.30 | 44.41 | 84.71 | 31.33 |
PT-V2 [36] | L | 72.47 | 80.46 | 62.45 | 93.60 | 96.66 | 79.65 | 77.34 | 48.66 | 84.46 | 28.95 |
KPConv [18] | R | 29.20 | 55.70 | 55.90 | 42.90 | 68.40 | 8.30 | 15.40 | 3.20 | 12.90 | 0.10 |
PolarNet [37] | R | 31.57 | 71.56 | 46.08 | 57.04 | 73.41 | 1.73 | 0.68 | 0.09 | 33.49 | 0.01 |
Minkowski [38] | R | 33.42 | 61.06 | 58.56 | 51.92 | 75.10 | 7.13 | 15.73 | 2.81 | 25.25 | 3.19 |
Cylinder3D [35] | R | 36.15 | 73.82 | 58.61 | 58.64 | 69.35 | 5.62 | 29.75 | 3.11 | 26.34 | 0.14 |
PT-V2 [36] | R | 36.78 | 75.53 | 62.32 | 63.81 | 62.07 | 3.69 | 18.56 | 4.76 | 40.05 | 0.25 |
RaSS(C) | R | 41.57 | 73.86 | 63.30 | 61.08 | 75.83 | 16.20 | 34.30 | 9.64 | 38.82 | 1.11 |
RaSS(P) | R | 42.35 | 74.09 | 63.46 | 61.99 | 69.06 | 9.66 | 51.74 | 7.82 | 42.37 | 0.94 |
5.2.2. Results on VoD Dataset
5.3. Failure Cases Analysis
5.4. Ablation Study
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
mm-Wave | millimeter-wave |
PT-V2 | PointTransformer-V2 |
RaSS | Radar Point Cloud Semantic Segmentation |
VoD | The View of Delft dataset |
ADC | Adaptive Doppler Compensation |
RFKD | Radar Feature Knowledge Distillation |
BEV | bird’s-eye-view |
References
- Zheng, L.; Li, S.; Tan, B.; Yang, L.; Chen, S.; Huang, L.; Bai, J.; Zhu, X.; Ma, Z. Rcfusion: Fusing 4-d radar and camera with bird’s-eye view features for 3-d object detection. IEEE Trans. Instrum. Meas. 2023, 72, 8503814. [Google Scholar] [CrossRef]
- Wu, Z.; Chen, G.; Gan, Y.; Wang, L.; Pu, J. Mvfusion: Multi-view 3d object detection with semantic-aligned radar and camera fusion. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 2766–2773. [Google Scholar]
- Xu, R.; Xiang, Z.; Zhang, C.; Zhong, H.; Zhao, X.; Dang, R.; Xu, P.; Pu, T.; Liu, E. SCKD: Semi-supervised cross-modality knowledge distillation for 4d radar object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 8933–8941. [Google Scholar]
- Pan, Z.; Ding, F.; Zhong, H.; Lu, C.X. Ratrack: Moving object detection and tracking with 4d radar point cloud. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 4480–4487. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631. [Google Scholar]
- Palffy, A.; Pool, E.; Baratam, S.; Kooij, J.F.; Gavrila, D.M. Multi-class road user detection with 3+1D radar in the View-of-Delft dataset. IEEE Robot. Autom. Lett. 2022, 7, 4961–4968. [Google Scholar] [CrossRef]
- Ouaknine, A.; Newson, A.; Rebut, J.; Tupin, F.; Pérez, P. Carrada dataset: Camera and automotive radar with range-angle-doppler annotations. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5068–5075. [Google Scholar]
- Rebut, J.; Ouaknine, A.; Malik, W.; Pérez, P. Raw high-definition radar for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17021–17030. [Google Scholar]
- Schumann, O.; Hahn, M.; Scheiner, N.; Weishaupt, F.; Tilly, J.F.; Dickmann, J.; Wöhler, C. Radarscenes: A real-world radar point cloud data set for automotive applications. In Proceedings of the 2021 IEEE 24th International Conference on Information Fusion (FUSION), Sun City, South Africa, 1–4 November 2021; pp. 1–8. [Google Scholar]
- Orr, I.; Cohen, M.; Zalevsky, Z. High-resolution radar road segmentation using weakly supervised learning. Nat. Mach. Intell. 2021, 3, 239–246. [Google Scholar] [CrossRef]
- Kaul, P.; De Martini, D.; Gadd, M.; Newman, P. Rss-net: Weakly-supervised multi-class semantic segmentation with fmcw radar. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 431–436. [Google Scholar]
- Liu, J.; Xiong, W.; Bai, L.; Xia, Y.; Huang, T.; Ouyang, W.; Zhu, B. Deep instance segmentation with automotive radar detection points. IEEE Trans. Intell. Veh. 2022, 8, 84–94. [Google Scholar] [CrossRef]
- Nobis, F.; Fent, F.; Betz, J.; Lienkamp, M. Kernel point convolution LSTM networks for radar point cloud segmentation. Appl. Sci. 2021, 11, 2599. [Google Scholar] [CrossRef]
- Prophet, R.; Deligiannis, A.; Fuentes-Michel, J.C.; Weber, I.; Vossiek, M. Semantic segmentation on 3D occupancy grids for automotive radar. IEEE Access 2020, 8, 197917–197930. [Google Scholar] [CrossRef]
- Prophet, R.; Li, G.; Sturm, C.; Vossiek, M. Semantic segmentation on automotive radar maps. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 756–763. [Google Scholar]
- Feng, Z.; Zhang, S.; Kunert, M.; Wiesbeck, W. Point cloud segmentation with a high-resolution automotive radar. In Proceedings of the AmE 2019-Automotive Meets Electronics; 10th GMM-Symposium, Dortmund, Germany, 12–13 March 2019; pp. 1–5. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
- Lombacher, J.; Laudt, K.; Hahn, M.; Dickmann, J.; Wöhler, C. Semantic radar grids. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1170–1175. [Google Scholar]
- Zeller, M.; Behley, J.; Heidingsfeld, M.; Stachniss, C. Gaussian radar transformer for semantic segmentation in noisy radar data. IEEE Robot. Autom. Lett. 2022, 8, 344–351. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. 2015. Available online: https://arxiv.org/abs/1503.02531 (accessed on 15 August 2025).
- Chen, D.; Mei, J.P.; Zhang, Y.; Wang, C.; Wang, Z.; Feng, Y.; Chen, C. Cross-layer distillation with semantic calibration. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 7028–7036. [Google Scholar]
- Park, W.; Kim, D.; Lu, Y.; Cho, M. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3967–3976. [Google Scholar]
- Hou, Y.; Zhu, X.; Ma, Y.; Loy, C.C.; Li, Y. Point-to-voxel knowledge distillation for lidar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8479–8488. [Google Scholar]
- Jiang, F.; Gao, H.; Qiu, S.; Zhang, H.; Wan, R.; Pu, J. Knowledge distillation from 3d to bird’s-eye-view for lidar semantic segmentation. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 402–407. [Google Scholar]
- Yan, X.; Gao, J.; Zheng, C.; Zheng, C.; Zhang, R.; Cui, S.; Li, Z. 2dpass: 2d priors assisted semantic segmentation on lidar point clouds. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 677–695. [Google Scholar]
- Zhang, Z.; Yang, X.; Zhang, W.; Jin, C. ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation. 2024. Available online: https://arxiv.org/abs/2405.04121 (accessed on 15 August 2025).
- Unal, O.; Dai, D.; Hoyer, L.; Can, Y.B.; Van Gool, L. 2D feature distillation for weakly-and semi-supervised 3D semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 7336–7345. [Google Scholar]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4015–4026. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
- Besl, P.J.; McKay, N.D. Method for registration of 3-D shapes. In Proceedings of the Sensor Fusion IV: Control Paradigms and Data Structures, Boston, MA, USA, 12–15 November 1991; pp. 586–606. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Berman, M.; Triki, A.R.; Blaschko, M.B. The lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4413–4421. [Google Scholar]
- Zhu, X.; Zhou, H.; Wang, T.; Hong, F.; Li, W.; Ma, Y.; Li, H.; Yang, R.; Lin, D. Cylindrical and asymmetrical 3d convolution networks for lidar-based perception. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6807–6822. [Google Scholar] [CrossRef] [PubMed]
- Wu, X.; Lao, Y.; Jiang, L.; Liu, X.; Zhao, H. Point transformer v2: Grouped vector attention and partition-based pooling. Adv. Neural Inf. Process. Syst. 2022, 35, 33330–33342. [Google Scholar]
- Nowruzi, F.E.; Kolhatkar, D.; Kapoor, P.; Heravi, E.J.; Hassanat, F.A.; Laganiere, R.; Rebut, J.; Malik, W. Polarnet: Accelerated Deep Open Space Segmentation Using Automotive Radar in Polar Domain. 2021. Available online: https://arxiv.org/abs/2103.03387 (accessed on 18 August 2025).
- Choy, C.; Gwak, J.; Savarese, S. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3075–3084. [Google Scholar]
Dataset | Year | Radar | Camera | LiDAR | Annotation | Categories |
---|---|---|---|---|---|---|
CARRADA [7] | 2020 | 3D | RGB | – | 2D Box, 2D Pixel | 4 |
RADIal [8] | 2021 | 3D | RGB | 16-line | 2D Box, 2D point | 1 |
RadarScenes [9] | 2021 | 3D | RGB | – | 2D Point | 6 |
ZJUSSet (Ours) | 2024 | 4D | RGB | Livox (∼128-line) | 3D Point | 10 |
Metric | Building | Fence | Vegetation | Car | Cyclist | Pedestrian | Truck | Bus | Tricycle |
---|---|---|---|---|---|---|---|---|---|
IoU | 74.09 | 63.46 | 61.99 | 69.06 | 9.66 | 51.74 | 7.82 | 42.37 | 0.94 |
Precision | 79.20 | 76.82 | 80.90 | 80.95 | 13.44 | 85.79 | 21.56 | 79.10 | 2.57 |
Recall | 91.99 | 78.49 | 72.62 | 82.46 | 25.57 | 56.59 | 10.93 | 47.71 | 1.46 |
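The per-class numbers above follow the standard point-wise definitions of IoU, precision, and recall. As a reference, here is a minimal sketch computing them from a confusion matrix; the helper name and matrix layout are our assumptions, not code from the paper.

```python
import numpy as np

def per_class_metrics(conf):
    """Per-class IoU, precision, and recall from a confusion matrix,
    where conf[t, p] counts points of true class t predicted as class p."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class c, but wrong
    fn = conf.sum(axis=1) - tp          # true class c, missed
    iou = tp / np.maximum(tp + fp + fn, 1.0)
    precision = tp / np.maximum(tp + fp, 1.0)
    recall = tp / np.maximum(tp + fn, 1.0)
    return iou, precision, recall

# mIoU, as reported in the results tables, is the mean of the per-class IoU.
```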
Method | Input | mIoU | Car | Pedestrian | Cyclist | Motor | Truck | Bicycle |
---|---|---|---|---|---|---|---|---|
Cylinder3D | L | 74.82 | 89.93 | 87.19 | 79.62 | 61.51 | 43.80 | 86.89 |
PT-V2 | L | 74.71 | 77.09 | 89.81 | 83.51 | 87.68 | 21.12 | 89.05 |
Cylinder3D | R | 61.22 | 87.81 | 71.97 | 77.02 | 1.50 | 71.10 | 57.92 |
PT-V2 | R | 62.65 | 85.49 | 72.49 | 80.91 | 0.00 | 84.55 | 52.44 |
RaSS(C) | R | 65.73 | 87.63 | 75.17 | 82.17 | 6.64 | 86.64 | 56.15 |
RaSS(P) | R | 64.49 | 82.40 | 69.26 | 78.67 | 38.81 | 65.08 | 52.71 |
Network | ADC | Logits-KD | Feats-KD | K | Selected | mIoU |
---|---|---|---|---|---|---|
Baseline | | | | 0 | | 36.15 |
RaSS | ✓ | | | 0 | | 37.39 |
 | ✓ | ✓ | | 0 | | 38.46 |
 | ✓ | ✓ | ✓ | 4 | ✓ | 40.58 |
 | ✓ | | ✓ | 4 | | 40.29 |
 | ✓ | | ✓ | 4 | ✓ | 41.57 |
 | ✓ | | ✓ | 2 | ✓ | 38.36 |
 | ✓ | | ✓ | 8 | ✓ | 39.77 |