An Asymmetric Selective Kernel Network for Drone-Based Vehicle Detection to Build a High-Accuracy Vehicle Trajectory Dataset
Abstract
1. Introduction
- We present an open-source vehicle detection dataset of imagery from a drone’s top-down view aimed at extracting high-accuracy vehicle trajectory data, addressing the current gap in such publicly available datasets.
- We designed an Asymmetric Selective Kernel Network that enhances feature extraction along the vehicle’s longitudinal edges based on the distribution patterns of OBBs. Additionally, we modified the current vehicle detection dataset’s annotation method to single-label annotation, thereby improving the regression precision of vehicle detection boxes.
- We devised a method for vehicle height estimation based on high-precision vehicle detection results, further enhancing the accuracy of vehicle trajectory data.
2. Materials and Methods
2.1. FRVehicle Dataset
2.2. Asymmetric Selective Kernel Network
- The input feature map X (D, H, W) is processed sequentially by a 3 × 5 depthwise separable convolution and then by a 3 × 7 dilated depthwise separable convolution, yielding two spatial feature maps, each with dimensions (D, H, W).
- A 1 × 1 convolution is applied to each spatial feature map to halve its channel dimension. The two reduced spatial feature maps, each (D/2, H, W), are then concatenated along the channel dimension to obtain a spatial concatenated feature map (D, H, W).
- Maximum pooling and average pooling are performed along the channel dimension of the concatenated feature map, producing two single-channel spatial pooled feature maps, each (1, H, W).
- The two pooled maps are concatenated into a spatial pooled feature map (2, H, W), which is processed by a 3 × 7 convolution kernel for attention feature extraction and then by a sigmoid activation to generate an attention weight feature map (2, H, W).
- Each channel-reduced spatial feature map (D/2, H, W) is multiplied by its corresponding single-channel weight feature map, and the two products are summed to obtain a channel-reduced weighted attention feature map (D/2, H, W).
- A 1 × 1 convolution restores the channel dimension, giving an attention feature map S (D, H, W) with the same dimensions as the input feature map X (D, H, W).
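The six steps above can be traced end to end in a small, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`conv2d`, `pointwise`, `weights`, `ask_block`) are hypothetical, random values stand in for learned weights, the depthwise separable convolutions are simplified to plain depthwise convolutions, the dilation of 3 is assumed to apply along the width only, and D is assumed to be even.

```python
import math
import random


def conv2d(x, w, dil=(1, 1)):
    """Zero-padded 'same' 2-D cross-correlation of one H x W map with kernel w."""
    H, W = len(x), len(x[0])
    kh, kw = len(w), len(w[0])
    ph, pw = dil[0] * (kh - 1) // 2, dil[1] * (kw - 1) // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a * dil[0] - ph, j + b * dil[1] - pw
                    if 0 <= r < H and 0 <= c < W:
                        out[i][j] += w[a][b] * x[r][c]
    return out


def pointwise(x, m):
    """1 x 1 convolution; m is a (C_out x C_in) weight matrix."""
    H, W = len(x[0]), len(x[0][0])
    return [[[sum(row[c] * x[c][i][j] for c in range(len(x)))
              for j in range(W)] for i in range(H)] for row in m]


def weights(rng, rows, cols):
    """Random stand-in for learned weights, scaled to keep activations small."""
    return [[rng.uniform(-1, 1) / (rows * cols) for _ in range(cols)]
            for _ in range(rows)]


def ask_block(x, rng):
    """Map a (D, H, W) nested list to a (D, H, W) attention feature map."""
    D, H, W = len(x), len(x[0]), len(x[0][0])
    half = D // 2
    # Step 1: 3 x 5 depthwise conv, then 3 x 7 depthwise conv dilated along width.
    u1 = [conv2d(ch, weights(rng, 3, 5)) for ch in x]
    u2 = [conv2d(ch, weights(rng, 3, 7), dil=(1, 3)) for ch in u1]
    # Step 2: 1 x 1 convs halve the channels; concatenate back to D channels.
    r1 = pointwise(u1, weights(rng, half, D))
    r2 = pointwise(u2, weights(rng, half, D))
    cat = r1 + r2
    # Step 3: average and max pooling along the channel dimension.
    avg = [[sum(ch[i][j] for ch in cat) / len(cat) for j in range(W)]
           for i in range(H)]
    mx = [[max(ch[i][j] for ch in cat) for j in range(W)] for i in range(H)]
    # Step 4: 3 x 7 conv from 2 pooled channels to 2 attention channels + sigmoid.
    att = []
    for _ in range(2):
        a = conv2d(avg, weights(rng, 3, 7))
        m = conv2d(mx, weights(rng, 3, 7))
        att.append([[1.0 / (1.0 + math.exp(-(a[i][j] + m[i][j])))
                     for j in range(W)] for i in range(H)])
    # Step 5: weight each reduced map by its attention channel and sum.
    s = [[[r1[c][i][j] * att[0][i][j] + r2[c][i][j] * att[1][i][j]
           for j in range(W)] for i in range(H)] for c in range(half)]
    # Step 6: 1 x 1 conv restores the D channels.
    return pointwise(s, weights(rng, D, half))
```

Calling `ask_block` on a random (4, 5, 6) input returns a (4, 5, 6) output, which matches the statement that S has the same dimensions as X. In a real model the loops would be replaced by framework convolution layers; the nested lists are used here only to keep the data flow explicit.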
2.3. Vehicle Height Estimation
3. Results
3.1. Vehicle Labels
3.2. Comparative Study
3.2.1. Results on FRVehicle Dataset
3.2.2. Results on DroneVehicle Dataset
4. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
OBB | Oriented bounding box |
HBB | Horizontal bounding box |
IoU | Intersection over Union |
ASKNet | Asymmetric Selective Kernel Network |
LSKNet | Large Selective Kernel Network |
(k, d) Sequence | ASKNet | LSKNet |
---|---|---|
Height (k, d) | (3, 1) | (5, 1) |
Width (k, d) | (5, 1) | (5, 1) |
Height (k, d) | (3, 1) | (7, 3) |
Width (k, d) | (7, 3) | (7, 3) |
Height (k, d) | (3, 1) | (7, 1) |
Width (k, d) | (7, 1) | (7, 1) |
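To make the asymmetry in the table concrete, the following short Python sketch computes the receptive field and the "same" padding implied by the (kernel size, dilation) pairs. It rests on two assumptions that the table itself does not state: that the first two pairs on each axis are the two sequential depthwise convolutions, and that the receptive field of stacked stride-1 convolutions is 1 + Σ d·(k − 1). The function and dictionary names are hypothetical.

```python
def receptive_field(seq):
    """Receptive field along one axis for stacked stride-1 convolutions.

    seq is a list of (kernel_size, dilation) pairs applied in order.
    """
    rf = 1
    for k, d in seq:
        rf += d * (k - 1)
    return rf


def same_padding(k, d):
    """Zero padding per side that keeps the output size equal to the input."""
    return d * (k - 1) // 2


# (k, d) pairs of the two sequential convolutions, read from the table above.
ASK = {"height": [(3, 1), (3, 1)], "width": [(5, 1), (7, 3)]}
LSK = {"height": [(5, 1), (7, 3)], "width": [(5, 1), (7, 3)]}

for name, cfg in (("ASKNet", ASK), ("LSKNet", LSK)):
    print(name, receptive_field(cfg["height"]), receptive_field(cfg["width"]))
# prints "ASKNet 5 23" then "LSKNet 23 23"
```

Under these assumptions the asymmetric sequence keeps the 23-pixel reach along the width while narrowing the height reach to 5 pixels, whereas the symmetric sequence covers 23 pixels along both axes.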
Methods | ASKNet | LSKNet |
---|---|---|
R3Det | 0.494 | 0.506 ↑ |
S2A-Net | 0.706 ↑ | 0.637 |
Oriented RCNN | 0.739 ↑ | 0.733 |
RoI Transformer | 0.759 ↑ | 0.757 |
Types | 3 × 5 | 5 × 5 | 3 × 7 | 7 × 7 | | FPS |
---|---|---|---|---|---|---|
Unselective | ✓ | ✕ | ✕ | ✕ | 0.732 | 21.2 |
Unselective | ✕ | ✕ | ✓ | ✕ | 0.738 | 21 |
Hybrid | ✓ | ✕ | ✕ | ✓ | 0.737 | 20.4 |
Hybrid | ✕ | ✓ | ✓ | ✕ | 0.737 | 20.5 |
Symmetric | ✕ | ✓ | ✕ | ✓ | 0.733 | 20.2 |
Asymmetric | ✓ | ✕ | ✓ | ✕ | 0.739 ↑ | 20.8 |
Methods | |
---|---|
RTMDet | 0.729 |
ARC | 0.742 |
PKINet | 0.755 |
RoI Transformer + ASKNet | 0.759 ↑ |
Methods | ASKNet (Original Data) | LSKNet (Original Data) | ASKNet (Rotated Data) | LSKNet (Rotated Data) |
---|---|---|---|---|
R3Det | 0.580 | 0.585 ↑ | 0.576 | 0.580 |
S2A-Net | 0.573 | 0.570 | 0.574 ↑ | 0.570 |
Oriented RCNN | 0.666 ↑ | 0.665 | 0.666 ↑ | 0.661 |
RoI Transformer | 0.659 | 0.654 | 0.669 ↑ | 0.664 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, Z.; Xiong, L.; Yu, Z. An Asymmetric Selective Kernel Network for Drone-Based Vehicle Detection to Build a High-Accuracy Vehicle Trajectory Dataset. Remote Sens. 2025, 17, 407. https://doi.org/10.3390/rs17030407