An Efficient Rep-Style Gaussian–Wasserstein Network: Improved UAV Infrared Small Object Detection for Urban Road Surveillance and Safety
Abstract
1. Introduction
- The backbone and neck are redesigned, which reduces the number of network parameters while improving target detection accuracy.
- A new loss function, LGWPIoU, is proposed to address the drawbacks of existing loss functions in small target recognition and to improve target detection accuracy.
- To the best of our knowledge, this is the first work to consider up to five small target detection categories using only UAV infrared images.
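The definition of LGWPIoU is not reproduced in this section, so the following is not the authors' loss. It is only a minimal sketch of the normalized Gaussian Wasserstein distance from the cited tiny-object work (Xu et al., 2022), shown to illustrate why a Gaussian–Wasserstein term can help with small targets: two small boxes that do not overlap have an IoU of zero, while the Wasserstein-based similarity still varies smoothly with their separation. The function names and the constant `c` are illustrative assumptions.

```python
import math


def wasserstein2_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between two boxes modelled as Gaussians.

    Each box is (cx, cy, w, h) and is modelled as a 2D Gaussian with mean
    (cx, cy) and covariance diag(w^2 / 4, h^2 / 4).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein similarity in (0, 1].

    c is a dataset-dependent scale; the default here is only an
    illustrative assumption, not a value taken from this paper.
    """
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)


def iou(box_a, box_b):
    """Plain IoU for boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Two 4x4 boxes whose centres are 5 pixels apart: no overlap at all.
pred = (10.0, 10.0, 4.0, 4.0)
target = (15.0, 10.0, 4.0, 4.0)
print("IoU:", iou(pred, target))
print("NWD:", nwd(pred, target))
```

In this example the IoU carries no gradient signal because the boxes are disjoint, whereas the Wasserstein-based similarity remains strictly between 0 and 1 and increases as the boxes move closer.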
2. Materials and Methods
2.1. Datasets
2.1.1. HIT-UAV
2.1.2. DroneVehicle
2.2. Methods
2.2.1. Backbone
2.2.2. Neck
2.2.3. Loss Function of the Head
3. Results
3.1. Evaluation Metrics
3.1.1. Precision
3.1.2. Average Precision (AP)
3.1.3. Mean Average Precision (mAP)
3.1.4. mAP50
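The metric definitions are not reproduced in this section. As a hedged illustration of how these quantities are commonly computed, the sketch below derives precision and AP from a ranked list of detections for one class, using all-point interpolation of the precision-recall curve. Whether the paper uses this exact interpolation is an assumption; the function names and the toy detections are hypothetical. For mAP50, a detection counts as a true positive when its IoU with a ground-truth box is at least 0.5, and mAP is the mean of the per-class AP values.

```python
def precision(tp, fp):
    """Fraction of detections that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve (all-point interpolation).

    scores: confidence of each detection for one class
    is_tp:  True if the detection matched a ground-truth box (e.g. IoU >= 0.5)
    num_gt: number of ground-truth boxes of that class
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(precision(tp, fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Unweighted mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Toy example: four detections, four ground-truth boxes, one false positive.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], num_gt=4)
print("AP:", ap)
```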
3.2. Comparative Experiments
3.2.1. Results on the HIT-UAV Dataset
3.2.2. Results on the DroneVehicle Dataset
3.3. Ablation Experiment
4. Discussion
- During model training, the training curves fluctuated more on the HIT-UAV dataset than on the DroneVehicle dataset, suggesting that the larger dataset is better suited to training.
- The mAP50 of the proposed algorithm exceeded 80% on both datasets, but there is still room for improvement, which we plan to pursue in future work.
- To understand which image regions drive the predictions, we used class activation maps (CAM) [43,44,45], which help to mitigate the black-box nature of deep learning models. The CAMs of different classes from the DroneVehicle dataset are shown in Figure 16. Redder colors indicate a higher classification contribution, and bluer colors indicate a lower one.
- The CAM results on the DroneVehicle dataset show that ERGW-net accurately localizes small road targets in different infrared aerial images and classifies them efficiently.
- To compare the CAM results of ERGW-net across datasets, Figure 17 shows the CAM visualizations of the algorithm on the HIT-UAV dataset for different classes.
- Figure 17 illustrates that the algorithm’s CAM results on the HIT-UAV dataset are not as good as those on the DroneVehicle dataset. This is because the HIT-UAV dataset contains few samples and the features of small targets such as bicycles and people are not distinct in it.
- To examine how often ERGW-net confuses one class with another during target detection, the confusion matrices of the algorithm on the two datasets are given in Figure 18.
- Figure 18a shows that background objects such as streetlights or small thermal targets are relatively likely to be recognized as people, with a rate of up to 0.46.
- Because “DontCare” does not correspond to any real-world object and the HIT-UAV dataset contains very few such samples, the mAP50 of “DontCare” is quite low.
- Because some traffic facilities and small houses have imaging features similar to those of cars in infrared images, Figure 18b shows a relatively high probability that background objects are detected as cars.
- Figure 18b also shows that a truck is sometimes recognized as a freight_car and vice versa, because the two classes have relatively similar features apart from their shape ratio.
- Compared with similar research, our method detects more classes from UAV infrared images than the other methods, and the proposed loss function may be of value for subsequent small object detection research.
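The CAM bullets above cite Grad-CAM, Grad-CAM++, and the original CAM, without stating which variant produced Figures 16 and 17. As a hedged illustration only, the sketch below shows the core Grad-CAM computation: each feature map is weighted by the spatial mean of its gradient with respect to the class score, the weighted maps are summed, negative values are clipped, and the result is scaled to [0, 1] so it can be rendered from blue (low) to red (high). The function name and the toy inputs are hypothetical, and plain nested lists stand in for the tensors a deep learning framework would provide.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from feature maps and their gradients.

    activations: K feature maps, each an H x W nested list
    gradients:   gradients of the class score w.r.t. those maps, same shape
    Returns an H x W nested list scaled to [0, 1].
    """
    num_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weights: global average of each gradient map.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]
    # Weighted sum over channels, followed by ReLU.
    cam = [[max(0.0, sum(weights[k] * activations[k][y][x]
                         for k in range(num_maps)))
            for x in range(width)]
           for y in range(height)]
    # Scale to [0, 1] for display; leave an all-zero map untouched.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: channel 0 supports the class, channel 1 argues against it.
acts = [
    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
grads = [
    [[1.0] * 3 for _ in range(3)],
    [[-1.0] * 3 for _ in range(3)],
]
heat = grad_cam(acts, grads)
print(heat)
```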
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cheng, J.R.; Gen, M. Accelerating genetic algorithms with GPU computing: A selective overview. Comput. Ind. Eng. 2019, 128, 514–525.
- Pennisi, S. The Integrated Circuit Industry at a Crossroads: Threats and Opportunities. Chips 2022, 1, 150–171.
- Hao, Y.; Xiang, S.; Han, G.; Zhang, J.; Ma, X.; Zhu, Z.; Guo, X.; Zhang, Y.; Han, Y.; Song, Z.; et al. Recent progress of integrated circuits and optoelectronic chips. Sci. China Inf. Sci. 2021, 64, 201401.
- Lee, C.Y.; Lin, H.J.; Yeh, M.Y.; Ling, J. Effective Remote Sensing from the Internet of Drones through Flying Control with Lightweight Multitask Learning. Appl. Sci. 2022, 12, 4657.
- Ecke, S.; Dempewolf, J.; Frey, J.; Schwaller, A.; Endres, E.; Klemmt, H.J.; Tiede, D.; Seifert, T. UAV-Based Forest Health Monitoring: A Systematic Review. Remote Sens. 2022, 14, 3205.
- Zhang, J.Z.; Guo, W.; Zhou, B.; Okin, G.S. Drone-Based Remote Sensing for Research on Wind Erosion in Drylands: Possible Applications. Remote Sens. 2021, 13, 283.
- Wavrek, M.T.; Carr, E.; Jean-Philippe, S.; McKinney, M.L. Drone remote sensing in urban forest management: A case study. Urban For. Urban Green. 2023, 86, 127978.
- Wang, X.T.; Pan, Z.J.; Gao, H.; He, N.X.; Gao, T.G. An efficient model for real-time wildfire detection in complex scenarios based on multi-head attention mechanism. J. Real Time Image Process. 2023, 20, 4.
- Liu, H.M.; Jin, F.; Zeng, H.; Pu, H.Y.; Fan, B. Image Enhancement Guided Object Detection in Visually Degraded Scenes. IEEE Trans. Neural Netw. Learn. Syst. 2023.
- Zhang, T. Target Detection for Motion Images Using the Improved YOLO Algorithm. J. Database Manag. 2023, 34, 3.
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Deep learning techniques to classify agricultural crops through UAV imagery: A review. Neural Comput. Appl. 2022, 34, 9511–9536.
- La Salandra, M.; Colacicco, R.; Dellino, P.; Capolongo, D. An Effective Approach for Automatic River Features Extraction Using High-Resolution UAV Imagery. Drones 2023, 7, 70.
- Fakhri, S.A.; Satari Abrovi, M.; Zakeri, H.; Safdarinezhad, A.; Fakhri, S.A. Pavement crack detection through a deep-learned asymmetric encoder-decoder convolutional neural network. Int. J. Pavement Eng. 2023, 24, 2255359.
- Perz, R.; Wronowski, K.; Domanski, R.; Dąbrowski, I. Case study of detection and monitoring of wildlife by UAVs equipped with RGB camera and TIR camera. Aircr. Eng. Aerosp. Technol. 2023, 95, 1461–1469.
- Zhang, J.Y.; Rao, Y. A Target Recognition Method Based on Multiview Infrared Images. Sci. Program. 2022, 2022, 1358586.
- Iwasaki, Y.; Kawata, S. A Robust Method for Detecting Vehicle Positions and Their Movements Even in Bad Weather Using Infrared Thermal Images. In Technological Developments in Education and Automation; Springer: Dordrecht, The Netherlands, 2010; pp. 213–217.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1; Curran Associates Inc.: Lake Tahoe, NV, USA, 2012; pp. 1097–1105.
- Zhang, X.; Zhu, X. Vehicle Detection in the Aerial Infrared Images via an Improved Yolov3 Network. In Proceedings of the IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 19–21 July 2019; pp. 372–376.
- Ren, K.; Gao, Y.; Wan, M.; Gu, G.; Chen, Q. Infrared small target detection via region super resolution generative adversarial network. Appl. Intell. 2022, 52, 11725–11737.
- Alhammadi, S.A.; Alhameli, S.A.; Almaazmi, F.A.; Almazrouei, B.H.; Almessabi, H.A.; Abu-Kheil, Y. Thermal-Based Vehicle Detection System using Deep Transfer Learning under Extreme Weather Conditions. In Proceedings of the 8th International Conference on Information Technology Trends (ITT), Dubai, United Arab Emirates, 25–26 May 2022; pp. 119–123.
- Zhang, X.X.; Zhu, X. Moving vehicle detection in aerial infrared image sequences via fast image registration and improved YOLOv3 network. Int. J. Remote Sens. 2020, 41, 4312–4335.
- Bhadoriya, A.S.; Vegamoor, V.; Rathinam, S. Vehicle Detection and Tracking Using Thermal Cameras in Adverse Visibility Conditions. Sensors 2022, 22, 4567.
- Tichý, T.; Švorc, D.; Růžička, M.; Bělinová, Z. Thermal Feature Detection of Vehicle Categories in the Urban Area. Sustainability 2021, 13, 6873.
- Sun, Y.; Cao, B.; Zhu, P.; Hu, Q. Drone-Based RGB-Infrared Cross-Modality Vehicle Detection Via Uncertainty-Aware Learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6700–6713.
- Suo, J.; Wang, T.; Zhang, X.; Chen, H.; Zhou, W.; Shi, W. HIT-UAV: A high-altitude infrared thermal dataset for Unmanned Aerial Vehicle-based object detection. Sci. Data 2023, 10, 227.
- Li, Z.H.; Hou, B.; Wu, Z.T.; Ren, B.; Ren, Z.L.; Jiao, L.C. Gaussian Synthesis for High-Precision Location in Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5619612.
- Wen, L.; Cheng, Y.; Fang, Y.; Li, X. A comprehensive survey of oriented object detection in remote sensing images. Expert Syst. Appl. 2023, 224, 119960.
- Liu, C.; Sui, X.; Kuang, X.; Liu, Y.; Gu, G.; Chen, Q. Adaptive Contrast Enhancement for Infrared Images Based on the Neighborhood Conditional Histogram. Remote Sens. 2019, 11, 1381.
- Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Weng, K.; Chu, X.; Xu, X.; Huang, J.; Wei, X. EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design. arXiv 2023, arXiv:2302.00386.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 448–456.
- Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108.
- Xu, C.; Wang, J.; Yang, W.; Yu, H.; Yu, L.; Xia, G.-S. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2022, 190, 79–93.
- Ma, S.; Xu, Y. MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression. arXiv 2023, arXiv:2307.07662.
- Lan, J.H.; Zhang, C.; Lu, W.J.; Gu, N.W. Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images. J. Indian Soc. Remote Sens. 2023, 51, 1427–1439.
- Padilla, R.; Netto, S.L.; Silva, E.A.B.d. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; pp. 237–242.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Mahaur, B.; Mishra, K.K. Small-object detection based on YOLOv5 in autonomous driving systems. Pattern Recognit. Lett. 2023, 168, 115–122.
- Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677.
- Kim, J.H.; Kim, N.; Won, C.S. High-Speed Drone Detection Based On Yolo-V8. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–2.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
Detection results (%) of different methods on the HIT-UAV dataset, per class and overall (mAP50).
Method | Person | Car | Bicycle | Other Vehicles | DontCare | mAP50 |
---|---|---|---|---|---|---|
Faster RCNN | 20.5 | 80.4 | 51.0 | 35.6 | 40.0 | 45.5 |
YOLOv5 | 91.2 | 98.0 | 85.9 | 77.1 | 37.1 | 78.0 |
YOLOv7 | 90.4 | 97.5 | 92.1 | 78.1 | 26.3 | 77.5 |
YOLOv8 | 90.4 | 97.9 | 87.9 | 71.7 | 29.7 | 75.5 |
Ours | 91.2 | 98.0 | 88.9 | 83.4 | 45.7 | 81.5 |
Detection results (%) of different methods on the DroneVehicle dataset, per class and overall (mAP50).
Method | Car | Truck | Bus | Van | Freight Car | mAP50 |
---|---|---|---|---|---|---|
Faster RCNN | 80.4 | 53.2 | 73.5 | 47.5 | 46.9 | 60.3 |
YOLOv5 | 96.0 | 73.0 | 94.1 | 59.3 | 71.4 | 78.8 |
YOLOv7 | 96.7 | 73.4 | 95.0 | 64.4 | 70.0 | 79.8 |
YOLOv8 | 96.5 | 73.6 | 94.8 | 64.4 | 72.5 | 80.4 |
Ours | 96.9 | 77.9 | 96.1 | 66.8 | 74.8 | 82.5 |
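As a quick consistency check on the table above, the sketch below recomputes the last column under the assumption that mAP50 is the unweighted mean of the five per-class values. The numbers are copied from the table; the variable and function names are illustrative.

```python
# Per-class values and reported mAP50 (%), copied from the table above.
rows = {
    "Faster RCNN": ([80.4, 53.2, 73.5, 47.5, 46.9], 60.3),
    "YOLOv5": ([96.0, 73.0, 94.1, 59.3, 71.4], 78.8),
    "YOLOv7": ([96.7, 73.4, 95.0, 64.4, 70.0], 79.8),
    "YOLOv8": ([96.5, 73.6, 94.8, 64.4, 72.5], 80.4),
    "Ours": ([96.9, 77.9, 96.1, 66.8, 74.8], 82.5),
}


def mean_ap50(per_class):
    """Unweighted mean of the per-class values."""
    return sum(per_class) / len(per_class)


for name, (per_class, reported) in rows.items():
    recomputed = mean_ap50(per_class)
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported}")
```

Under that assumption, the recomputed means agree with the reported column to within about 0.1 percentage points, which is consistent with rounding of the per-class values.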
iRepblock | ERC Block | mAP50 on DroneVehicle | |
---|---|---|---|
✓ | ✓ | 80.1 | |
✓ | ✓ | 79.2 | |
✓ | ✓ | 78.4 | |
✓ | ✓ | ✓ | 82.5 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Aibibu, T.; Lan, J.; Zeng, Y.; Lu, W.; Gu, N. An Efficient Rep-Style Gaussian–Wasserstein Network: Improved UAV Infrared Small Object Detection for Urban Road Surveillance and Safety. Remote Sens. 2024, 16, 25. https://doi.org/10.3390/rs16010025