Learning Omni-Dimensional Spatio-Temporal Dependencies for Millimeter-Wave Radar Perception
Abstract
1. Introduction
- We propose a new radar perception framework named U-MLPNet. It combines the global modeling capability of the MetaFormer architecture with the local feature-extraction strengths of 3D CNNs to achieve an effective omni-dimensional spatio-temporal representation (a conceptual sketch follows this list).
- We design U-MLP, a plug-and-play multi-scale feature-fusion module that operates in the latent space of radar signals, offering new insights into radar perception tasks.
- Our approach effectively integrates the range-Doppler (RD) and range-angle (RA) views while preserving their complementary information, which is crucial for a comprehensive evaluation of object attributes.
- Experimental results demonstrate that our proposed approach achieves competitive performance compared to state-of-the-art (SOTA) methods without increasing computational complexity.
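To make the combination of global MetaFormer-style mixing and local 3D convolution concrete, the sketch below shows one way such a spatio-temporal block could be assembled in PyTorch. It is an illustrative assumption rather than the exact U-MLPNet block: the module name `SpatioTemporalMixerBlock`, the kernel size, the normalization choice, and the channel dimensions are placeholders.

```python
import torch
import torch.nn as nn

class SpatioTemporalMixerBlock(nn.Module):
    """MetaFormer-style block on 5D radar tensors (B, C, T, H, W).

    Token mixing is done by a depthwise 3D convolution (local spatio-temporal
    context); channel mixing by a pointwise MLP, following the
    norm -> mixer -> residual template popularized by MetaFormer.
    """

    def __init__(self, channels: int, kernel_size: int = 3, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, channels)            # layer-norm-like over channels
        self.token_mixer = nn.Conv3d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels)     # depthwise: per-channel 3D mixing
        self.norm2 = nn.GroupNorm(1, channels)
        hidden = channels * mlp_ratio
        self.channel_mlp = nn.Sequential(                  # pointwise MLP over channels
            nn.Conv3d(channels, hidden, 1), nn.GELU(),
            nn.Conv3d(hidden, channels, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.channel_mlp(self.norm2(x))
        return x

# Toy usage: 2 sequences of 4 frames of 64x64 radar maps with 32 feature channels.
if __name__ == "__main__":
    block = SpatioTemporalMixerBlock(channels=32)
    y = block(torch.randn(2, 32, 4, 64, 64))
    print(y.shape)  # torch.Size([2, 32, 4, 64, 64])
```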
2. Related Works
2.1. CNN-Based Tasks
2.2. Attention-Based Tasks
2.3. MLP-Based Tasks
3. Proposed Method
3.1. Overall Framework
3.2. Encoder and Decoder Architectures
3.3. U-MLP
3.4. Loss Function
3.4.1. Object-Centric (OC) Focal Loss
3.4.2. Class-Agnostic Object Localization (CL) Loss
3.4.3. Soft Dice (SD) Loss
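For reference, a soft Dice loss of the kind this subsection refers to commonly takes the following form. This is a minimal sketch; the class weighting, smoothing constant, and reduction actually used in U-MLPNet are assumptions here.

```python
import torch

def soft_dice_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for multi-class segmentation.

    probs:  (B, C, H, W) softmax probabilities.
    target: (B, C, H, W) one-hot ground-truth masks.
    """
    dims = (0, 2, 3)                                    # sum over batch and spatial dims
    intersection = (probs * target).sum(dims)
    cardinality = probs.sum(dims) + target.sum(dims)
    dice_per_class = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice_per_class.mean()                  # average over classes
```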
3.4.4. Multi-View (MV) Range-Matching Loss
4. Experiments
4.1. Datasets and Evaluation Metrics
4.1.1. CARRADA
4.1.2. CRUW
4.2. Implementation Details
4.2.1. CARRADA
4.2.2. CRUW
4.3. Comparisons with SOTA Models
4.3.1. CARRADA
4.3.2. CRUW
4.4. Ablation Studies
4.4.1. Effectiveness of Cross-Dilated Receptive Field
4.4.2. Effectiveness of Multi-Scale Fusion
4.4.3. Effectiveness of Multi-Scale, Multi-View Fusion
4.4.4. Effectiveness of Skip Connections
4.4.5. Effectiveness of Dilation Factor
4.5. Nighttime Test
4.6. Model Complexity Analysis
5. Conclusions
- Expanded Datasets: We plan to test U-MLPNet across a wider range of radar tasks to fully evaluate its scalability.
- Real-world Testing: We will collaborate with industry partners to test U-MLPNet on private radar datasets to promote its application in real-world scenarios.
- Generalization Enhancement: We will introduce techniques such as domain adaptation and transfer learning to improve U-MLPNet's generalization across different application scenarios.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Sun, R.; Suzuki, K.; Owada, Y.; Takeda, S.; Umehira, M.; Wang, X.; Kuroda, H. A millimeter-wave automotive radar with high angular resolution for identification of closely spaced on-road obstacles. Sci. Rep. 2023, 13, 3233.
2. Yao, S.; Guan, R.; Huang, X.; Li, Z.; Sha, X.; Yue, Y.; Lim, E.G.; Seo, H.; Man, K.L.; Zhu, X.; et al. Radar-camera fusion for object detection and semantic segmentation in autonomous driving: A comprehensive review. IEEE Trans. Intell. Veh. 2023, 9, 2094–2128.
3. Li, P.; Wang, P.; Berntorp, K.; Liu, H. Exploiting temporal relations on radar perception for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17071–17080.
4. Yoneda, K.; Suganuma, N.; Yanase, R.; Aldibaja, M. Automated driving recognition technologies for adverse weather conditions. IATSS Res. 2019, 43, 253–262.
5. Tait, P. Introduction to Radar Target Recognition; IET: London, UK, 2005; Volume 18.
6. Cao, X.; Yi, J.; Gong, Z.; Wan, X. Automatic target recognition based on RCS and angular diversity for multistatic passive radar. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 4226–4240.
7. Zhang, Z.; Wang, X.; Huang, D.; Fang, X.; Zhou, M.; Zhang, Y. MRPT: Millimeter-wave radar-based pedestrian trajectory tracking for autonomous urban driving. IEEE Trans. Instrum. Meas. 2021, 71, 1–17.
8. Richards, M.A. Fundamentals of Radar Signal Processing; McGraw-Hill: New York, NY, USA, 2005; Volume 1.
9. Wang, Y.; Wang, W.; Zhou, M.; Ren, A.; Tian, Z. Remote monitoring of human vital signs based on 77-GHz mm-wave FMCW radar. Sensors 2020, 20, 2999.
10. Scharf, L.; Demeure, C. Statistical Signal Processing: Detection, Estimation, and Time Series Analysis; Addison-Wesley Series in Electrical and Computer Engineering; Addison-Wesley Publishing Company: Boston, MA, USA, 1991.
11. Chen, V.C.; Li, F.; Ho, S.S.; Wechsler, H. Analysis of micro-Doppler signatures. IEE Proc.-Radar, Sonar Navig. 2003, 150, 271–276.
12. Zhou, H.; Jiang, T. Decision tree based sea-surface weak target detection with false alarm rate controllable. IEEE Signal Process. Lett. 2019, 26, 793–797.
13. Li, Y.; Xie, P.; Tang, Z.; Jiang, T.; Qi, P. SVM-based sea-surface small target detection: A false-alarm-rate-controllable approach. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1225–1229.
14. Guo, Z.X.; Shui, P.L. Anomaly based sea-surface small target detection using K-nearest neighbor classification. IEEE Trans. Aerosp. Electron. Syst. 2020, 56, 4947–4964.
15. Du, L.; Liu, H.; Bao, Z. Radar HRRP statistical recognition: Parametric model and model selection. IEEE Trans. Signal Process. 2008, 56, 1931–1944.
16. Feng, D.; Harakeh, A.; Waslander, S.L.; Dietmayer, K. A review and comparative study on probabilistic object detection in autonomous driving. IEEE Trans. Intell. Transp. Syst. 2021, 23, 9961–9980.
17. Paek, D.H.; Kong, S.H.; Wijaya, K.T. Enhanced k-radar: Optimal density reduction to improve detection performance and accessibility of 4d radar tensor-based object detection. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–6.
18. Venon, A.; Dupuis, Y.; Vasseur, P.; Merriaux, P. Millimeter Wave FMCW RADARs for Perception, Recognition and Localization in Automotive Applications: A Survey. IEEE Trans. Intell. Veh. 2022, 7, 533–555.
19. Wang, C.X.; Chen, X.; Zou, H.Y.; He, S.; Tang, X. Automatic target recognition of millimeter-wave radar based on deep learning. J. Phys. Conf. Ser. 2021, 2031, 12031.
20. Orr, I.; Cohen, M.; Zalevsky, Z. High-resolution radar road segmentation using weakly supervised learning. Nat. Mach. Intell. 2021, 3, 239–246.
21. Angelov, A.; Robertson, A.; Murray-Smith, R.; Fioranelli, F. Practical classification of different moving targets using automotive radar and deep neural networks. IET Radar Sonar Navig. 2018, 12, 1082–1089.
22. Wang, J.; Guo, J.; Shao, X.; Wang, K.; Fang, X. Road targets recognition based on deep learning and micro-Doppler features. In Proceedings of the 2018 International Conference on Sensor Networks and Signal Processing (SNSP), Xi’an, China, 28–31 October 2018; pp. 271–276.
23. Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231.
24. Wang, Y.; Jiang, Z.; Gao, X.; Hwang, J.N.; Xing, G.; Liu, H. Rodnet: Radar object detection using cross-modal supervision. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 504–513.
25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
26. Li, X.; Ding, H.; Yuan, H.; Zhang, W.; Pang, J.; Cheng, G.; Chen, K.; Liu, Z.; Loy, C.C. Transformer-based visual segmentation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10138–10163.
27. Li, P.; Zhang, Y.; Yuan, L.; Xu, X. Fully transformer-equipped architecture for end-to-end referring video object segmentation. Inf. Process. Manag. 2024, 61, 103566.
28. Liang, J.; Cao, J.; Fan, Y.; Zhang, K.; Ranjan, R.; Li, Y.; Timofte, R.; Van Gool, L. Vrt: A video restoration transformer. IEEE Trans. Image Process. 2024, 33, 2171–2182.
29. Li, D.; Shi, X.; Zhang, Y.; Cheung, K.C.; See, S.; Wang, X.; Qin, H.; Li, H. A simple baseline for video restoration with grouped spatial-temporal shift. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9822–9832.
30. Xu, K.; Xu, L.; He, G.; Yu, W.; Li, Y. Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer. arXiv 2024, arXiv:2404.13640.
31. Jiao, L.; Zhang, X.; Liu, X.; Liu, F.; Yang, S.; Ma, W.; Li, L.; Chen, P.; Feng, Z.; Guo, Y.; et al. Transformer meets remote sensing video detection and tracking: A comprehensive survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1–45.
32. Zeng, F.; Dong, B.; Zhang, Y.; Wang, T.; Zhang, X.; Wei, Y. Motr: End-to-end multiple-object tracking with transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 659–675.
33. Xie, F.; Chu, L.; Li, J.; Lu, Y.; Ma, C. Videotrack: Learning to track objects via video transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22826–22835.
34. Zhuang, L.; Jiang, T.; Wang, J.; An, Q.; Xiao, K.; Wang, A. Effective mmWave Radar Object Detection Pre-Training Based on Masked Image Modeling. IEEE Sens. J. 2023, 24, 3999–4010.
35. Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10819–10829.
36. Wang, J.; Zhang, S.; Liu, Y.; Wu, T.; Yang, Y.; Liu, X.; Chen, K.; Luo, P.; Lin, D. Riformer: Keep your vision backbone effective but removing token mixer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14443–14452.
37. Kang, B.; Moon, S.; Cho, Y.; Yu, H.; Kang, S.J. MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 434–443.
38. Lu, Z.; Kang, L.; Huang, J. Depthwise Convolution with Channel Mixer: Rethinking MLP in MetaFormer for Faster and More Accurate Vehicle Detection. In Proceedings of the International Conference on Artificial Neural Networks, Heraklion, Crete, Greece, 26–29 September 2023; Proceedings, Part X. Springer: Cham, Switzerland, 2023; pp. 136–147.
39. Chen, J.; Luo, R. MetaCNN: A New Hybrid Deep Learning Image-based Approach for Vehicle Classification Using Transformer-like Framework. In Proceedings of the 5th International Conference on Computer Science and Software Engineering, Guilin, China, 21–23 October 2022; pp. 517–521.
40. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. Mlp-mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
41. Bozic, V.; Dordevic, D.; Coppola, D.; Thommes, J.; Singh, S.P. Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers. arXiv 2023, arXiv:2311.10642.
42. Cenkeramaddi, L.R.; Rai, P.K.; Dayal, A.; Bhatia, J.; Pandya, A.; Soumya, J.; Kumar, A.; Jha, A. A novel angle estimation for mmWave FMCW radars using machine learning. IEEE Sens. J. 2021, 21, 9833–9843.
43. Gupta, S.; Rai, P.K.; Kumar, A.; Yalavarthy, P.K.; Cenkeramaddi, L.R. Target classification by mmWave FMCW radars using machine learning on range-angle images. IEEE Sens. J. 2021, 21, 19993–20001.
44. Bi, X. Environmental Perception Technology for Unmanned Systems; Springer: Berlin/Heidelberg, Germany, 2021.
45. Nguyen, M.Q.; Feger, R.; Wagner, T.; Stelzer, A. High Angular Resolution Method Based on Deep Learning for FMCW MIMO Radar. IEEE Trans. Microw. Theory Tech. 2023, 71, 5413–5427.
46. Zhang, L.; Zhang, X.; Zhang, Y.; Guo, Y.; Chen, Y.; Huang, X.; Ma, Z. Peakconv: Learning peak receptive field for radar semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 17577–17586.
47. Lian, D.; Yu, Z.; Sun, X.; Gao, S. As-mlp: An axial shifted mlp architecture for vision. arXiv 2021, arXiv:2107.08391.
48. Zhang, A.; Nowruzi, F.E.; Laganiere, R. Raddet: Range-azimuth-doppler based radar object detection for dynamic road users. In Proceedings of the 2021 18th Conference on Robots and Vision (CRV), Burnaby, BC, Canada, 26–28 May 2021; pp. 95–102.
49. Abdu, F.J.; Zhang, Y.; Fu, M.; Li, Y.; Deng, Z. Application of deep learning on millimeter-wave radar signals: A review. Sensors 2021, 21, 1951.
50. Jiang, W.; Wang, Y.; Li, Y.; Lin, Y.; Shen, W. Radar target characterization and deep learning in radar automatic target recognition: A review. Remote Sens. 2023, 15, 3742.
51. van Berlo, B.; Elkelany, A.; Ozcelebi, T.; Meratnia, N. Millimeter wave sensing: A review of application pipelines and building blocks. IEEE Sens. J. 2021, 21, 10332–10368.
52. Kaul, P.; De Martini, D.; Gadd, M.; Newman, P. Rss-net: Weakly-supervised multi-class semantic segmentation with fmcw radar. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 431–436.
53. Dong, X.; Wang, P.; Zhang, P.; Liu, L. Probabilistic oriented object detection in automotive radar. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 102–103.
54. Patel, K.; Rambach, K.; Visentin, T.; Rusev, D.; Pfeiffer, M.; Yang, B. Deep learning-based object classification on automotive radar spectra. In Proceedings of the 2019 IEEE Radar Conference (RadarConf), Boston, MA, USA, 22–26 April 2019; pp. 1–6.
55. Palffy, A.; Dong, J.; Kooij, J.F.; Gavrila, D.M. CNN based road user detection using the 3D radar cube. IEEE Robot. Autom. Lett. 2020, 5, 1263–1270.
56. Wang, Y.; Jiang, Z.; Li, Y.; Hwang, J.N.; Xing, G.; Liu, H. RODNet: A real-time radar object detection network cross-supervised by camera-radar fused object 3D localization. IEEE J. Sel. Top. Signal Process. 2021, 15, 954–967.
57. Gao, X.; Xing, G.; Roy, S.; Liu, H. Ramp-cnn: A novel neural network for enhanced automotive radar object recognition. IEEE Sens. J. 2020, 21, 5119–5132.
58. Ouaknine, A.; Newson, A.; Pérez, P.; Tupin, F.; Rebut, J. Multi-view radar semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 15671–15680.
59. Ouaknine, A.; Newson, A.; Rebut, J.; Tupin, F.; Pérez, P. Carrada dataset: Camera and automotive radar with range-angle-doppler annotations. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5068–5075.
60. Yu, Y.; Wang, C.; Fu, Q.; Kou, R.; Huang, F.; Yang, B.; Yang, T.; Gao, M. Techniques and challenges of image segmentation: A review. Electronics 2023, 12, 1199.
61. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
62. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
63. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
64. Dalbah, Y.; Lahoud, J.; Cholakkal, H. RadarFormer: Lightweight and accurate real-time radar object detection model. In Proceedings of the Scandinavian Conference on Image Analysis, Sirkka, Finland, 18–21 April 2023; Proceedings, Part I. Springer: Berlin/Heidelberg, Germany, 2023; pp. 341–358.
65. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
66. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
67. Kothari, R.; Kariminezhad, A.; Mayr, C.; Zhang, H. Object detection and heading forecasting by fusing raw radar data using cross attention. arXiv 2022, arXiv:2205.08406.
68. Zhuang, L.; Jiang, T.; Jiang, H.; Wang, A.; Huang, Z. LQCANet: Learnable-Query-Guided Multi-Scale Fusion Network based on Cross-Attention for Radar Semantic Segmentation. IEEE Trans. Intell. Veh. 2023, 9, 3330–3344.
69. Ho, J.; Kalchbrenner, N.; Weissenborn, D.; Salimans, T. Axial attention in multidimensional transformers. arXiv 2019, arXiv:1912.12180.
70. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxvit: Multi-axis vision transformer. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 459–479.
71. Jiang, T.; Zhuang, L.; An, Q.; Wang, J.; Xiao, K.; Wang, A. T-rodnet: Transformer for vehicular millimeter-wave radar object detection. IEEE Trans. Instrum. Meas. 2022, 72, 1–12.
72. Dalbah, Y.; Lahoud, J.; Cholakkal, H. TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024; pp. 353–362.
73. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxim: Multi-axis mlp for image processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5769–5780.
74. Yu, T.; Li, X.; Cai, Y.; Sun, M.; Li, P. S2-MLPv2: Improved spatial-shift MLP architecture for vision. arXiv 2021, arXiv:2108.01072.
75. Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
76. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456.
77. Cheng, H.; Zhang, M.; Shi, J.Q. A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10558–10578.
78. Kuzmin, A.; Nagel, M.; Van Baalen, M.; Behboodi, A.; Blankevoort, T. Pruning vs. quantization: Which is better? In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 7–14 December 2024; Volume 36.
79. Tang, H.; Sun, Y.; Wu, D.; Liu, K.; Zhu, J.; Kang, Z. Easyquant: An efficient data-free quantization algorithm for llms. arXiv 2024, arXiv:2403.02775.
80. Sun, S.; Ren, W.; Li, J.; Wang, R.; Cao, X. Logit standardization in knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 15731–15740.
81. Pham, C.; Nguyen, V.A.; Le, T.; Phung, D.; Carneiro, G.; Do, T.T. Frequency attention for knowledge distillation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 2277–2286.
82. Wang, J.; Chen, Y.; Zheng, Z.; Li, X.; Cheng, M.M.; Hou, Q. CrossKD: Cross-head knowledge distillation for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16520–16530.
Dataset | CARRADA | CRUW |
---|---|---|
Frequency | 77 GHz | 77 GHz |
Sweep Bandwidth | 4 GHz | - |
Maximum Range | 50 m | - |
Range Resolution | 0.20 m | 0.23 m |
Maximum Radial Velocity | 13.43 m/s | - |
Radial Velocity Resolution | 0.42 m/s | - |
Field of View | 180° | 180° |
Angle Resolution | 0.70° | ∼15° |
Number of Chirps per Frame | 64 | 255 |
Number of Samples per Chirp | 256 | - |
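The RD and RA views used throughout stem from standard FMCW processing of frames like those parameterized above. As a rough illustration only, a range-Doppler map can be formed from one frame of raw chirp samples with two FFTs, as sketched below; the window choice, calibration, and the exact CARRADA/CRUW processing pipelines are not reproduced here.

```python
import numpy as np

def range_doppler_map(adc_frame: np.ndarray) -> np.ndarray:
    """Form a range-Doppler magnitude map from one radar frame.

    adc_frame: complex array of shape (num_chirps, num_samples),
               e.g. (64, 256) for a CARRADA-like frame.
    """
    win_r = np.hanning(adc_frame.shape[1])              # range window (fast time)
    win_d = np.hanning(adc_frame.shape[0])[:, None]     # Doppler window (slow time)
    range_fft = np.fft.fft(adc_frame * win_r, axis=1)   # range FFT per chirp
    rd = np.fft.fftshift(np.fft.fft(range_fft * win_d, axis=0), axes=0)  # Doppler FFT
    return 20.0 * np.log10(np.abs(rd) + 1e-12)           # magnitude in dB

rd_map = range_doppler_map(np.random.randn(64, 256) + 1j * np.random.randn(64, 256))
print(rd_map.shape)  # (64, 256): Doppler bins x range bins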
View | Method | Params (M) | IoU Bkg. (%) | IoU Ped. (%) | IoU Cycl. (%) | IoU Car (%) | mIoU (%) | Dice Bkg. (%) | Dice Ped. (%) | Dice Cycl. (%) | Dice Car (%) | mDice (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
RD | FCN-8s [63] | 134.3 | 99.7 | 47.7 | 18.7 | 52.9 | 54.7 | 99.8 | 24.8 | 16.5 | 26.9 | 66.3 |
RD | U-Net [62] | 17.3 | 99.7 | 51.1 | 33.4 | 37.7 | 55.4 | 99.8 | 67.5 | 50.0 | 54.7 | 68.0 |
RD | DeepLabv3+ [61] | 59.3 | 99.7 | 43.2 | 11.2 | 49.2 | 50.8 | 99.9 | 60.3 | 20.2 | 66.0 | 61.6 |
RD | RSS-Net [52] | 10.1 | 99.3 | 0.1 | 4.1 | 25.0 | 32.1 | 99.7 | 0.2 | 7.9 | 40.0 | 36.9 |
RD | RAMP-CNN [57] | 106.4 | 99.7 | 48.8 | 23.2 | 54.7 | 56.6 | 99.9 | 65.6 | 37.7 | 70.8 | 68.5 |
RD | MVNet [58] | 2.4 | 98.0 | 0.0 | 3.8 | 14.1 | 29.0 | 99.0 | 0.0 | 7.3 | 24.8 | 32.8 |
RD | TMVA-Net [58] | 5.6 | 99.7 | 52.6 | 29.0 | 53.4 | 58.7 | 99.8 | 68.9 | 45.0 | 69.6 | 70.9 |
RD | PeakConv [46] | 6.3 | 99.7 | 55.0 | 28.6 | 58.3 | 60.4 | 99.9 | 71.0 | 44.4 | 73.6 | 72.2 |
RD | TransRadar [72] | 4.8 | 99.8 | 56.5 | 28.7 | 57.0 | 60.5 | 99.9 | 72.2 | 44.6 | 72.6 | 72.3 |
RD | U-MLPNet | 17.9 | 99.8 | 59.3 | 32.4 | 62.4 | 63.5 | 99.9 | 74.5 | 49.0 | 76.9 | 75.0 |
RA | FCN-8s [63] | 134.3 | 99.8 | 14.8 | 0.0 | 23.3 | 34.5 | 99.9 | 25.8 | 0.0 | 37.8 | 40.9 |
RA | U-Net [62] | 17.3 | 99.8 | 22.4 | 8.8 | 0.0 | 32.8 | 99.9 | 25.8 | 0.0 | 37.8 | 40.9 |
RA | DeepLabv3+ [61] | 59.3 | 99.9 | 3.4 | 5.9 | 21.8 | 32.7 | 99.9 | 6.5 | 11.1 | 35.7 | 38.3 |
RA | RSS-Net [52] | 10.1 | 99.5 | 7.3 | 5.6 | 15.8 | 32.1 | 99.8 | 13.7 | 10.5 | 27.4 | 37.8 |
RA | RAMP-CNN [57] | 106.4 | 99.8 | 1.7 | 2.6 | 7.2 | 27.9 | 99.9 | 3.4 | 5.1 | 13.5 | 30.5 |
RA | MVNet [58] | 2.4 | 98.8 | 0.1 | 1.1 | 6.2 | 26.8 | 99.0 | 0.0 | 7.3 | 24.8 | 28.5 |
RA | TMVA-Net [58] | 5.6 | 99.8 | 26.0 | 8.6 | 30.7 | 41.3 | 99.9 | 41.3 | 15.9 | 47.0 | 51.0 |
RA | T-RODNet [71] | 162.0 | 99.9 | 25.4 | 9.5 | 39.4 | 43.5 | 99.9 | 40.5 | 17.4 | 56.6 | 53.6 |
RA | PeakConv [46] | 6.3 | 99.8 | 24.3 | 11.8 | 35.5 | 42.8 | 99.9 | 39.1 | 21.1 | 52.4 | 53.1 |
RA | SS-RODNet [34] | 33.1 | 99.9 | 26.7 | 8.9 | 37.2 | 43.2 | 99.9 | 42.2 | 16.3 | 54.2 | 53.2 |
RA | LQCANet [68] | 148.3 | 99.9 | 25.3 | 11.3 | 39.5 | 44.0 | 99.9 | 40.4 | 20.5 | 56.6 | 54.4 |
RA | TransRadar [72] | 4.8 | 99.9 | 28.9 | 14.3 | 34.9 | 44.5 | 99.9 | 44.9 | 25.0 | 51.8 | 55.4 |
RA | U-MLPNet | 17.9 | 99.9 | 28.2 | 17.5 | 32.4 | 44.5 | 99.9 | 44.0 | 29.8 | 48.9 | 55.7 |
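For clarity on how the IoU and Dice numbers above are typically obtained, the sketch below computes per-class scores from predicted and ground-truth label maps. The averaging over frames and any background handling in the benchmark's official evaluation may differ; the class layout used here is only an example.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Per-class IoU and Dice (in %) from integer label maps of equal shape."""
    ious, dices = [], []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(100.0 * inter / union if union else float("nan"))
        denom = p.sum() + g.sum()
        dices.append(100.0 * 2 * inter / denom if denom else float("nan"))
    return ious, dices

# Example with 4 classes: background, pedestrian, cyclist, car.
pred = np.random.randint(0, 4, size=(256, 256))
gt = np.random.randint(0, 4, size=(256, 256))
per_class_iou, per_class_dice = iou_and_dice(pred, gt, num_classes=4)
print(np.nanmean(per_class_iou), np.nanmean(per_class_dice))  # mIoU, mDice
```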
View | Metric | MVNet | TMVA-Net | PeakConv | TransRadar | U-MLPNet |
---|---|---|---|---|---|---|
RD | Precision (%) | 29.6 | 65.0 | 66.7 | 69.4 | 71.4 |
RD | Recall (%) | 65.3 | 78.4 | 79.8 | 77.6 | 80.1 |
RA | Precision (%) | 56.4 | 48.1 | 50.5 | 58.9 | 58.3 |
RA | Recall (%) | 26.9 | 55.1 | 56.4 | 55.3 | 58.6 |
Global | Precision (%) | 43.0 | 56.5 | 58.6 | 64.1 | 64.9 |
Global | Recall (%) | 46.1 | 66.8 | 68.1 | 66.4 | 69.4 |
Model | All AP (%) | All AR (%) | Pedestrian AP (%) | Pedestrian AR (%) | Cyclist AP (%) | Cyclist AR (%) | Car AP (%) | Car AR (%) |
---|---|---|---|---|---|---|---|---|
RODNet-CDC [56] | 75.20 | 77.84 | 76.13 | 77.98 | 67.38 | 68.05 | 82.46 | 88.59 |
RODNet-HG [56] | 77.04 | 79.50 | 77.93 | 79.75 | 68.49 | 69.06 | 85.18 | 90.79 |
RODNet-HWGI [56] | 78.06 | 81.07 | 79.47 | 81.85 | 70.35 | 71.40 | 84.39 | 90.65 |
SS-RODNet [34] | 83.07 | 86.43 | 81.37 | 84.61 | 83.34 | 84.34 | 85.55 | 90.86 |
T-RODNet [71] | 80.74 | 86.12 | 79.76 | 83.59 | 79.87 | 85.18 | 83.29 | 91.27 |
RadarFormer [64] | 82.63 | 86.56 | 83.08 | 86.55 | 82.52 | 83.54 | 82.03 | 89.94 |
U-MLPNet | 84.84 | 88.59 | 85.41 | 88.30 | 85.44 | 86.96 | 83.22 | 90.89 |
Cross-Dilated Receptive Fields | Multi-Scale Fusion | Multi-Scale, Multi-View Fusion | RD mIoU (%) | RA mIoU (%) |
---|---|---|---|---|
✗ | ✓ | ✓ | 61.6 | 42.2 |
✓ | ✗ | ✓ | 62.8 | 43.1 |
✓ | ✓ | ✗ | 61.5 | 41.6 |
✓ | ✓ | ✓ | 63.5 | 44.5 |
RA-Skip | RD-Skip | RD mIoU (%) | RA mIoU (%) |
---|---|---|---|
✗ | ✗ | 61.5 | 44.2 |
✗ | ✓ | 64.5 | 43.6 |
✓ | ✗ | 59.9 | 43.1 |
✓ | ✓ | 63.5 | 44.5 |
Dilation Factor | 1 | 2 | 3 |
---|---|---|---|
RD mIoU (%) | 61.6 | 60.0 | 63.5 |
RA mIoU (%) | 42.2 | 39.8 | 44.5 |
Model | MVNet | TMVA-Net | PeakConv | TransRadar | U-MLPNet |
---|---|---|---|---|---|
FLOPs (G) | 53.57 | 96.11 | 112.21 | 91.08 | 108.17 |
Infer. Time (ms) | 146.04 | 149.82 | 2718.56 | 25.93 | 23.68 |
Model | RODNet-CDC | RODNet-HG | T-RODNet | RadarFormer | U-MLPNet |
---|---|---|---|---|---|
FLOPs (G) | 280.03 | 129.19 | 182.53 | 150.12 | 138.58 |
Infer. Time (ms) | 7.78 | 119.56 | 27.71 | 106.05 | 44.35 |
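As an illustration of how the inference-time figures above are commonly measured, the following sketch times repeated forward passes with proper GPU synchronization. The batch size, input shape, warm-up count, and the placeholder model are assumptions, and FLOPs would additionally require a profiler such as fvcore or thop.

```python
import time
import torch

@torch.no_grad()
def measure_latency_ms(model: torch.nn.Module, example: torch.Tensor,
                       warmup: int = 10, iters: int = 100) -> float:
    """Average per-forward latency in milliseconds."""
    device = next(model.parameters()).device
    example = example.to(device)
    model.eval()
    for _ in range(warmup):                      # warm-up to stabilize clocks/caches
        model(example)
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(example)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return 1000.0 * (time.perf_counter() - start) / iters

# Hypothetical usage with a placeholder model and a 5-frame single-channel radar input.
model = torch.nn.Conv3d(1, 8, 3, padding=1)
print(f"{measure_latency_ms(model, torch.randn(1, 1, 5, 256, 256)):.2f} ms")
```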