AMFEF-DETR: An End-to-End Adaptive Multi-Scale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images
Abstract
1. Introduction
2. Materials and Methods
2.1. The AMFEF-DETR Model Architecture
2.2. Frequency-Adaptive Dilated Feature Extraction Network
2.3. Feature Interaction Utilizing the HLIFI Module
2.4. Improved Cross-Scale Feature Fusion Network
2.5. The Inner-Shape-IoU Loss Function
2.6. Datasets
2.7. Evaluation Indicators
3. Experiments and Results
3.1. Experimental Environment and Parameter Configuration
3.2. Feature Extraction Network Comparison Experiment
3.3. Analyzing the Performance of the HLIFI Module
3.4. Verifying the Effectiveness of the Adaptive Feature Fusion Network
3.5. Comparative Experiments of Different Loss Functions
3.6. Ablation Study
3.7. Comparative Experiments between the AMFEF-DETR Model and Other Advanced Models
3.8. Comparative Analysis of Different Detection Models
3.9. Visual Analysis
4. Extended Experiments
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Split | Images | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning Tricycle | Bus | Motor
---|---|---|---|---|---|---|---|---|---|---|---
Training | 6471 | 109,185 | 38,560 | 13,069 | 187,004 | 32,702 | 16,284 | 6387 | 4377 | 9117 | 40,377
Validation | 548 | 8844 | 5125 | 1287 | 14,064 | 1975 | 750 | 1045 | 532 | 251 | 4886
Test | 1610 | 21,006 | 6376 | 1302 | 28,074 | 5771 | 2659 | 530 | 599 | 2940 | 5845
Total | 8629 | 139,035 | 50,061 | 15,658 | 229,142 | 40,448 | 19,693 | 7962 | 5508 | 12,308 | 51,108
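As a quick consistency check (not part of the paper), the per-class instance counts of the three splits can be summed and compared against the Total row:

```python
# Column order follows the table: images, then the ten object categories.
splits = {
    "Training":   [6471, 109185, 38560, 13069, 187004, 32702, 16284, 6387, 4377, 9117, 40377],
    "Validation": [548, 8844, 5125, 1287, 14064, 1975, 750, 1045, 532, 251, 4886],
    "Test":       [1610, 21006, 6376, 1302, 28074, 5771, 2659, 530, 599, 2940, 5845],
}
totals = [8629, 139035, 50061, 15658, 229142, 40448, 19693, 7962, 5508, 12308, 51108]

# Each column of the three splits should sum to the corresponding Total entry.
column_sums = [sum(col) for col in zip(*splits.values())]
assert column_sums == totals
```

All eleven columns check out, so the Total row is internally consistent with the split counts.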
Component | Version | Parameter | Value
---|---|---|---
GPU | RTX 4090 | Optimizer | AdamW
Python | 3.8.0 | Batch size | 4
PyTorch | 2.0.0 | Learning rate | 1 × 10^-4
CUDA | 11.8 | Momentum | 0.9
Model | P (%) | R (%) | mAP50 (%) | mAP50:95 (%) | Param (M) | GFLOPs
---|---|---|---|---|---|---
BasicBlock | 54.76 | 38.31 | 37.34 | 21.69 | 19.97 | 57.3 |
AKConv-Block | 55.38 | 38.72 | 37.78 | 22.19 | 15.63 | 51.8 |
DualConv-Block | 57.81 | 39.67 | 38.88 | 22.73 | 16.20 | 52.5 |
DySnakeConv-Block | 55.10 | 39.30 | 37.89 | 22.06 | 29.98 | 65.1 |
PConv-Block | 56.99 | 39.33 | 38.50 | 22.41 | 14.45 | 50.2 |
iRMB-Block | 55.92 | 39.97 | 38.64 | 22.65 | 16.72 | 53.4 |
FADC-Block | 57.50 | 40.08 | 39.01 | 22.85 | 20.09 | 49.8 |
Model | Val P (%) | Val R (%) | Val mAP50 (%) | Val mAP50:95 (%) | Test P (%) | Test R (%) | Test mAP50 (%) | Test mAP50:95 (%)
---|---|---|---|---|---|---|---|---
1. PAFPN (base) | 60.83 | 47.22 | 48.43 | 29.96 | 57.04 | 39.95 | 38.82 | 22.65 |
2. BiFPN | 63.35 | 48.69 | 50.50 | 31.28 | 57.29 | 41.72 | 40.11 | 23.46 |
3. BiFPN + Weighted Fusion | 63.39 | 48.75 | 50.82 | 31.30 | 57.07 | 41.05 | 40.23 | 23.55 |
4. BiFPN + Concatenation Fusion | 63.71 | 48.83 | 50.46 | 31.39 | 57.41 | 40.96 | 40.13 | 23.22 |
5. BiFPN + Adaptive Fusion (BAFPN) | 63.48 | 49.08 | 50.99 | 31.74 | 57.88 | 41.36 | 40.29 | 23.74 |
Loss Function | Val P (%) | Val R (%) | Val mAP50 (%) | Val mAP50:95 (%) | Test P (%) | Test R (%) | Test mAP50 (%) | Test mAP50:95 (%)
---|---|---|---|---|---|---|---|---
GIoU | 63.48 | 49.08 | 50.99 | 31.74 | 57.88 | 41.36 | 40.29 | 23.74 |
DIoU | 63.01 | 49.96 | 51.37 | 32.17 | 57.68 | 41.64 | 40.44 | 23.94 |
CIoU | 63.68 | 49.77 | 51.07 | 31.76 | 56.77 | 41.77 | 40.59 | 23.93 |
SIoU | 63.28 | 49.84 | 51.56 | 32.12 | 57.69 | 42.03 | 40.85 | 24.09 |
Shape-IoU (scale = 0.0) | 63.80 | 48.98 | 50.26 | 30.62 | 57.25 | 41.04 | 39.88 | 23.02 |
Shape-IoU (scale = 0.5) | 63.47 | 50.09 | 51.79 | 32.19 | 58.01 | 41.60 | 40.82 | 23.90 |
Shape-IoU (scale = 1.0) | 63.57 | 49.33 | 51.18 | 31.45 | 56.71 | 42.09 | 40.31 | 23.37 |
Shape-IoU (scale = 1.5) | 63.32 | 49.90 | 51.79 | 32.18 | 56.48 | 41.91 | 40.67 | 23.96 |
Inner-Shape-IoU (ratio = 0.70) | 64.05 | 49.42 | 51.42 | 32.23 | 57.24 | 41.82 | 40.86 | 24.09 |
Inner-Shape-IoU (ratio = 0.75) | 63.68 | 50.01 | 51.87 | 32.27 | 59.62 | 41.66 | 41.36 | 24.28 |
Inner-Shape-IoU (ratio = 0.80) | 63.70 | 49.77 | 51.22 | 31.75 | 56.98 | 41.37 | 39.96 | 23.37 |
Inner-Shape-IoU (ratio = 1.10) | 64.32 | 49.43 | 51.56 | 32.26 | 57.94 | 42.14 | 40.93 | 24.06 |
Inner-Shape-IoU (ratio = 1.13) | 64.17 | 49.04 | 51.50 | 32.10 | 58.20 | 41.15 | 40.91 | 24.06 |
Inner-Shape-IoU (ratio = 1.15) | 64.50 | 49.04 | 51.10 | 31.83 | 58.85 | 42.09 | 41.01 | 24.25 |
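For reference, the ratio parameter in the Inner-Shape-IoU rows scales an auxiliary bounding box about each box's centre, following the Inner-IoU formulation; a minimal sketch of that ratio mechanism (the shape and distance terms that Shape-IoU contributes to the combined Inner-Shape-IoU loss are omitted here):

```python
def inner_iou(box1, box2, ratio=0.75):
    """IoU computed over auxiliary boxes, as in Inner-IoU.

    Boxes are (xc, yc, w, h). The auxiliary boxes keep the original
    centres but scale width/height by `ratio` (ratio < 1 shrinks the
    boxes, ratio > 1 enlarges them).
    """
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box1, box2
    # Half-extents of the auxiliary boxes.
    hw1, hh1 = w1 * ratio / 2, h1 * ratio / 2
    hw2, hh2 = w2 * ratio / 2, h2 * ratio / 2
    # Overlap of the auxiliary boxes along each axis.
    inter_w = max(0.0, min(x1 + hw1, x2 + hw2) - max(x1 - hw1, x2 - hw2))
    inter_h = max(0.0, min(y1 + hh1, y2 + hh2) - max(y1 - hh1, y2 - hh2))
    inter = inter_w * inter_h
    union = (w1 * h1 + w2 * h2) * ratio ** 2 - inter
    return inter / union if union > 0 else 0.0
```

With ratio < 1 the auxiliary boxes overlap only when the originals overlap substantially, which sharpens the gradient for high-IoU samples; ratio > 1 does the opposite, which is consistent with the best results above clustering around ratio = 0.75 and ratio ≈ 1.1.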
Methods | FADC-ResNet | HLIFI | BAFPN | Inner-Shape-IoU | P (%) | R (%) | mAP50 (%) | mAP50:95 (%) | F1 (%)
---|---|---|---|---|---|---|---|---|---
1. Base | | | | | 54.76 | 38.31 | 37.34 | 21.69 | 44
2 | √ | | | | 57.50 | 40.08 | 39.01 | 22.85 | 46
3 | | √ | | | 56.75 | 39.32 | 38.46 | 22.45 | 46
4 | | | √ | | 57.76 | 41.42 | 40.56 | 23.88 | 47
5 | | | | √ | 57.35 | 40.22 | 39.47 | 23.27 | 46
6 | √ | √ | | | 57.04 | 39.95 | 38.82 | 22.65 | 46
7 | √ | √ | √ | | 57.88 | 41.36 | 40.29 | 23.74 | 48
8. Ours | √ | √ | √ | √ | 59.62 | 41.66 | 41.36 | 24.28 | 48
Class | Val P (%) | Val R (%) | Val mAP50 (%) | Val mAP50:95 (%) | Test P (%) | Test R (%) | Test mAP50 (%) | Test mAP50:95 (%)
---|---|---|---|---|---|---|---|---
All | 63.68 | 50.01 | 51.87 | 32.27 | 59.62 | 41.66 | 41.36 | 24.28 |
Pedestrian | 68.83 | 53.31 | 58.67 | 29.28 | 61.86 | 38.37 | 41.15 | 17.34 |
People | 64.70 | 49.91 | 51.10 | 22.63 | 61.56 | 26.44 | 29.86 | 11.35 |
Bicycle | 45.01 | 27.97 | 25.82 | 12.40 | 46.56 | 18.36 | 16.38 | 7.07 |
Car | 81.48 | 84.44 | 86.81 | 63.75 | 79.32 | 77.05 | 78.98 | 51.54 |
Van | 70.11 | 48.41 | 53.80 | 40.86 | 57.59 | 43.55 | 40.52 | 29.21 |
Truck | 65.61 | 44.27 | 46.75 | 32.20 | 60.39 | 49.49 | 49.23 | 32.18 |
Tricycle | 54.81 | 42.01 | 41.28 | 24.28 | 40.44 | 35.47 | 26.67 | 14.82 |
Awning Tricycle | 42.06 | 20.49 | 20.87 | 13.15 | 54.03 | 25.21 | 25.46 | 16.15 |
Bus | 80.22 | 65.34 | 70.02 | 52.44 | 78.08 | 55.23 | 60.59 | 44.03 |
Motor | 63.96 | 63.84 | 63.58 | 31.67 | 56.34 | 47.42 | 44.76 | 19.15 |
Model | Val P (%) | Val R (%) | Val mAP50 (%) | Val mAP50:95 (%) | Test P (%) | Test R (%) | Test mAP50 (%) | Test mAP50:95 (%) | Param (M) | GFLOPs | FPS
---|---|---|---|---|---|---|---|---|---|---|---
YOLOv5-L | 55.25 | 43.45 | 44.48 | 27.43 | 48.53 | 37.31 | 35.32 | 20.71 | 46.51 | 109 | 101 |
YOLOv6-L | 54.09 | 40.82 | 42.64 | 26.38 | 47.70 | 36.38 | 34.80 | 20.33 | 59.60 | 151 | 47 |
YOLOv8-M | 54.45 | 41.01 | 42.78 | 26.17 | 46.12 | 36.43 | 34.61 | 20.24 | 25.85 | 79 | 135 |
YOLOv8-L | 56.33 | 42.58 | 44.75 | 27.71 | 49.62 | 37.02 | 35.87 | 21.19 | 43.61 | 165 | 111 |
YOLOv9 | 57.01 | 43.88 | 45.89 | 28.61 | 50.56 | 39.92 | 38.57 | 23.15 | 57.30 | 189 | 112 |
YOLOv10-L | 56.31 | 42.94 | 44.55 | 27.47 | 51.79 | 37.12 | 36.75 | 21.41 | 24.37 | 120 | 63 |
QueryDet | 60.69 | 45.85 | 48.12 | 29.79 | 52.21 | 36.98 | 38.08 | 23.03 | - | - | 21.6 |
TOOD | 55.22 | 40.18 | 41.92 | 25.58 | 46.85 | 34.03 | 33.92 | 20.19 | 32.04 | 199 | 34.9 |
RTMDet | 55.74 | 41.26 | 43.18 | 26.36 | 48.15 | 34.73 | 35.36 | 21.16 | 52.30 | 80 | 37.7 |
Efficient DETR | 58.75 | 44.02 | 46.08 | 28.49 | 49.54 | 36.18 | 36.76 | 22.07 | 32.01 | 159 | - |
RT-DETR-R18 | 61.34 | 45.40 | 47.02 | 28.80 | 54.76 | 38.31 | 37.34 | 21.69 | 20.18 | 58 | 72.4 |
RT-DETR-R34 | 60.76 | 44.35 | 46.21 | 28.29 | 53.63 | 38.52 | 37.41 | 21.96 | 31.44 | 90 | 60.2 |
RT-DETR-R50 | 63.60 | 48.94 | 50.31 | 31.25 | 56.95 | 41.52 | 40.24 | 23.55 | 42.94 | 135 | 53.5 |
RT-DETR-L | 63.27 | 46.42 | 48.38 | 29.62 | 56.88 | 39.26 | 38.49 | 22.38 | 32.01 | 108 | 59.2 |
AMFEF-DETR | 63.68 | 50.01 | 51.87 | 32.27 | 59.62 | 41.66 | 41.36 | 24.28 | 35.81 | 142 | 84.5 |
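FPS figures like those in the table are typically obtained by timing repeated forward passes after a warm-up phase; the sketch below illustrates that generic protocol (the paper's exact measurement setup, batch size, and synchronization details are not specified here, and a dummy workload stands in for a real detector):

```python
import time

def measure_fps(infer, n_warmup=10, n_iters=100):
    """Rough frames-per-second estimate for a single-image inference callable."""
    for _ in range(n_warmup):
        infer()  # warm-up: excludes one-time setup (allocation, JIT, cache fill)
    t0 = time.perf_counter()
    for _ in range(n_iters):
        infer()
    elapsed = time.perf_counter() - t0
    return n_iters / elapsed

# Dummy workload in place of a detector's forward pass:
fps = measure_fps(lambda: sum(i * i for i in range(10000)))
```

On GPU, an accurate version of this loop would additionally synchronize the device before reading the clock, since kernel launches are asynchronous.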
Model | Person (%) | Car (%) | Bicycle (%) | OtherVehicle (%) | DontCare (%) | mAP50 (%) | mAP50:95 (%)
---|---|---|---|---|---|---|---
YOLOv5 | 92.58 | 98.05 | 90.10 | 73.31 | 23.30 | 75.47 | 48.18 |
YOLOv6 | 94.17 | 96.48 | 91.37 | 52.69 | 57.16 | 78.38 | 49.73 |
YOLOv8 | 94.47 | 96.59 | 91.41 | 57.78 | 59.30 | 79.91 | 51.04 |
YOLOv9 | 92.29 | 98.87 | 92.80 | 77.26 | 43.12 | 80.89 | 52.49 |
YOLOv10 | 88.01 | 96.80 | 84.49 | 66.10 | 52.48 | 77.70 | 47.39 |
RT-DETR | 93.67 | 97.37 | 90.08 | 59.59 | 53.16 | 78.77 | 49.59 |
RT-DETR-R34 | 92.28 | 96.46 | 88.88 | 50.61 | 47.43 | 75.13 | 46.91 |
RT-DETR-R50 | 93.43 | 96.85 | 90.63 | 55.94 | 64.82 | 80.33 | 51.27 |
RT-DETR-L | 94.15 | 96.87 | 90.72 | 57.14 | 56.49 | 79.07 | 49.65 |
AMFEF-DETR | 94.17 | 96.07 | 91.01 | 58.46 | 67.51 | 81.45 | 53.19 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, S.; Jiang, H.; Yang, J.; Ma, X.; Chen, J. AMFEF-DETR: An End-to-End Adaptive Multi-Scale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images. Drones 2024, 8, 523. https://doi.org/10.3390/drones8100523