Res-SwinTransformer with Local Contrast Attention for Infrared Small Target Detection
Abstract
1. Introduction
- (1) We designed a ResSwin backbone based on a residual structure and the Swin Transformer. Through self-attention computation and residual connections, it improves the interaction of global information while fully preserving the shallow detail features of small infrared targets (a minimal sketch of this residual design follows this list).
- (2) We proposed a plug-and-play attention module, the LCA Block, based on local contrast calculation. It enhances the feature representation of infrared small targets and helps the network locate and identify them more accurately.
- (3) We built an air-to-ground multi-scene infrared vehicle dataset using a UAV. The dataset covers diverse scenes and environments, and can support both testing of infrared target detection models and studies of infrared target characteristics for aerial remote sensing. Experiments on our dataset and on other infrared datasets such as DroneVehicle show that our method achieves state-of-the-art performance while remaining suitable for real-time application.
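As referenced in contribution (1), the sketch below illustrates the residual idea behind the ResSwin backbone in PyTorch. It is a minimal approximation, not the paper's implementation: the shifted-window attention of the Swin Transformer is replaced here by ordinary multi-head self-attention over flattened tokens, and all module and parameter names are ours. The elementwise skip connection corresponds to the "Add" connection mode compared against "Concat" in the ablation study (Section 4.4.1).

```python
import torch
import torch.nn as nn

class ResidualAttentionStage(nn.Module):
    """Sketch of a residual transformer stage: the (shallow) input feature
    is added back to the attention output so fine detail is preserved."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        # Stand-in for shifted-window attention; real Swin attention is windowed.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # [b, h*w, c]
        t = self.norm(tokens)
        attn_out, _ = self.attn(t, t, t)            # token-wise self-attention
        attn_map = attn_out.transpose(1, 2).reshape(b, c, h, w)
        # "Add" connection: shallow input features are preserved via the skip.
        return self.proj(attn_map) + x

if __name__ == "__main__":
    feat = torch.randn(2, 96, 32, 32)               # [b, c, h, w]
    out = ResidualAttentionStage(96)(feat)
    print(out.shape)                                # torch.Size([2, 96, 32, 32])
```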
2. Materials
2.1. Motivation
2.2. Dataset Introduction
3. Methods
3.1. RSLCANet
3.2. ResSwin Backbone
3.3. LCA Block
Algorithm 1: Local Contrast Calculation
Input: input feature F of dimension [b, c, h, w]; conversion factor p.
Output: enhanced feature.
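Below is a hedged PyTorch sketch of how the local contrast calculation in Algorithm 1 could be realized as an attention module. Only the input/output shapes and the conversion factor p come from the algorithm header above; the neighborhood averaging, the subtraction-based contrast, and the sigmoid gating are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalContrastAttention(nn.Module):
    """Illustrative local-contrast attention: contrast = center response minus
    local neighborhood mean, scaled by a conversion factor p and used as a gate."""

    def __init__(self, window_size: int = 3, p: float = 1.0):
        super().__init__()
        self.window_size = window_size
        self.p = p  # conversion factor from Algorithm 1 (its exact role is assumed)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: [b, c, h, w]
        pad = self.window_size // 2
        # Mean of the local neighborhood around every pixel.
        neighborhood_mean = F.avg_pool2d(
            feat, kernel_size=self.window_size, stride=1, padding=pad
        )
        # Local contrast: how much each location stands out from its surroundings.
        contrast = self.p * (feat - neighborhood_mean)
        attention = torch.sigmoid(contrast)
        # Enhanced feature: input rescaled by the contrast-derived attention.
        return feat * attention

if __name__ == "__main__":
    x = torch.randn(1, 64, 40, 40)
    enhanced = LocalContrastAttention(window_size=3, p=2.0)(x)
    print(enhanced.shape)  # torch.Size([1, 64, 40, 40])
```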
4. Experiment and Results
4.1. Evaluation Metrics
4.1.1. Precision and Recall
4.1.2. F1 Score
4.1.3. Mean Average Precision
4.1.4. Frames per Second (FPS)
4.2. Experimental Details
4.3. Comparison Experiments
4.3.1. Comparison Experiments on the Dim-Small Aircraft Targets Dataset
4.3.2. Comparison Experiments on the DroneVehicle Dataset
4.3.3. Comparison Experiments on Our Captured Multi-Scene Infrared Vehicle Dataset
4.4. Ablation Experiments
4.4.1. The Design of ResSwin Backbone
4.4.2. The Design of LCA Block
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Number | Time | Location | Weather | Height (m) |
---|---|---|---|---|
1 | 6:30 | Highway | Sunny | 80 |
2 | 15:30 | Highway | Sunny | 80 |
3 | 21:00 | Parking | Heavy fog | 50 |
4 | 20:00 | Parking | Heavy fog | 80 |
5 | 21:00 | Highway | Cloudy | 100 |
6 | 19:30 | Mall | Light Fog | 100 |
7 | 20:30 | Highway | Light Fog | 80 |
8 | 19:30 | Mall | Sunny | 100 |
9 | 10:00 | Crossroad | Light Fog | 30–200 |
10 | 11:00 | Crossroad | Light Fog | 200 |
11 | 12:30 | Crossroad | Sunny | 250 |
12 | 12:40 | Crossroad | Sunny | 80–250 |
13 | 17:00 | Crossroad | Sunny | 100 |
14 | 20:30 | Highway | Sunny | 30–100 |
15 | 20:30 | Highway | Sunny | 100 |
16 | 21:30 | Highway | Sunny | 200 |
17 | 20:30 | Highway | Sunny | 100–200 |
18 | 20:30 | Highway | Sunny | 300 |
19 | 6:30 | Highway | Sunny | 300–100 |
20 | 15:30 | Highway | Sunny | 70 |
References
- Ren, K.; Sun, W.; Meng, X.; Yang, G.; Peng, J.; Huang, J. A locally optimized model for hyperspectral and multispectral images fusion. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5519015. [Google Scholar] [CrossRef]
- Zhou, J.; Sun, W.; Meng, X.; Yang, G.; Ren, K.; Peng, J. Generalized linear spectral mixing model for spatial–temporal–spectral fusion. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5533216. [Google Scholar] [CrossRef]
- Sun, W.; Ren, K.; Meng, X.; Yang, G.; Xiao, C.; Peng, J.; Huang, J. MLR-DBPFN: A multi-scale low rank deep back projection fusion network for anti-noise hyperspectral and multispectral image fusion. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522914. [Google Scholar] [CrossRef]
- Hou, T.; Sun, W.; Chen, C.; Yang, G.; Meng, X.; Peng, J. Marine floating raft aquaculture extraction of hyperspectral remote sensing images based decision tree algorithm. Int. J. Appl. Earth Obs. Geoinf. 2022, 111, 102846. [Google Scholar] [CrossRef]
- Sun, W.; Liu, K.; Ren, G.; Liu, W.; Yang, G.; Meng, X.; Peng, J. A simple and effective spectral-spatial method for mapping large-scale coastal wetlands using China ZY1-02D satellite hyperspectral images. Int. J. Appl. Earth Obs. Geoinf. 2021, 104, 102572. [Google Scholar] [CrossRef]
- Ma, J.; Guo, H.; Rong, S.; Feng, J.; He, B. Infrared Dim and Small Target Detection Based on Background Prediction. Remote Sens. 2023, 15, 3749. [Google Scholar] [CrossRef]
- Toth, C.; Jozkow, G. Remote sensing platforms and sensors: A survey. ISPRS J. Photogramm. Remote Sens. 2016, 115, 22–36. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. pp. 740–755. [Google Scholar]
- Henini, M.; Razeghi, M. Handbook of Infrared Detection Technologies; Elsevier: Amsterdam, The Netherlands, 2002. [Google Scholar]
- Razeghi, M.; Nguyen, B.-M. Advances in mid-infrared detection and imaging: A key issues review. Rep. Prog. Phys. 2014, 77, 082401. [Google Scholar] [CrossRef]
- Li, X.; Sun, S.; Gu, L.; Liu, X. Infrared scene prediction of night unmanned vehicles based on multi-scale feature maps. Infrared Phys. Technol. 2021, 118, 103897. [Google Scholar] [CrossRef]
- Qiu, G.Y.; Wang, B.; Li, T.; Zhang, X.; Zou, Z.; Yan, C. Estimation of the transpiration of urban shrubs using the modified three-dimensional three-temperature model and infrared remote sensing. J. Hydrol. 2021, 594, 125940. [Google Scholar] [CrossRef]
- Ren, H.; Ye, X.; Nie, J.; Meng, J.; Fan, W.; Qin, Q.; Liang, Y.; Liu, H. Retrieval of land surface temperature, emissivity, and atmospheric parameters from hyperspectral thermal infrared image using a feature-band linear-format hybrid algorithm. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4401015. [Google Scholar] [CrossRef]
- Zhang, J.; Liu, C.; Wang, B.; Chen, C.; He, J.; Zhou, Y.; Li, J. An infrared pedestrian detection method based on segmentation and domain adaptation learning. Comput. Electr. Eng. 2022, 99, 107781. [Google Scholar] [CrossRef]
- Deshpande, S.D.; Er, M.H.; Venkateswarlu, R.; Chan, P. Max-mean and max-median filters for detection of small targets. In Proceedings of the Signal and Data Processing of Small Targets 1999, Denver, CO, USA, 18–23 July 1999; pp. 74–83. [Google Scholar]
- Chen, C.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A local contrast method for small infrared target detection. IEEE Trans. Geosci. Remote Sens. 2013, 52, 574–581. [Google Scholar] [CrossRef]
- Han, J.; Ma, Y.; Zhou, B.; Fan, F.; Liang, K.; Fang, Y. A robust infrared small target detection algorithm based on human visual system. IEEE Geosci. Remote Sens. Lett. 2014, 11, 2168–2172. [Google Scholar]
- Han, J.; Liang, K.; Zhou, B.; Zhu, X.; Zhao, J.; Zhao, L. Infrared small target detection utilizing the multiscale relative local contrast measure. IEEE Geosci. Remote Sens. Lett. 2018, 15, 612–616. [Google Scholar] [CrossRef]
- Gao, C.; Meng, D.; Yang, Y.; Wang, Y.; Zhou, X.; Hauptmann, A.G. Infrared patch-image model for small target detection in a single image. IEEE Trans. Image Process. 2013, 22, 4996–5009. [Google Scholar] [CrossRef]
- Shi, M.; Wang, H. Infrared dim and small target detection based on denoising autoencoder network. Mob. Netw. Appl. 2020, 25, 1469–1483. [Google Scholar] [CrossRef]
- Zheng, G.; Wu, X.; Hu, Y.; Liu, X. Object detection for low-resolution infrared image in land battlefield based on deep learning. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8649–8652. [Google Scholar]
- Du, S.; Zhang, P.; Zhang, B.; Xu, H. Weak and occluded vehicle detection in complex infrared environment based on improved YOLOv4. IEEE Access 2021, 9, 25671–25680. [Google Scholar] [CrossRef]
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Attentional local contrast networks for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9813–9824. [Google Scholar] [CrossRef]
- Zhu, R.; Zhuang, L. Unsupervised Infrared Small-Object-Detection Approach of Spatial–Temporal Patch Tensor and Object Selection. Remote Sens. 2022, 14, 1612. [Google Scholar] [CrossRef]
- Wang, Q.; Chi, Y.; Shen, T.; Song, J.; Zhang, Z.; Zhu, Y. Improving RGB-infrared object detection by reducing cross-modality redundancy. Remote Sens. 2022, 14, 2020. [Google Scholar] [CrossRef]
- Dang, L.M.; Wang, H.; Li, Y.; Min, K.; Kwak, J.T.; Lee, O.N.; Park, H.; Moon, H. Fusarium wilt of radish detection using RGB and near infrared images from Unmanned Aerial Vehicles. Remote Sens. 2020, 12, 2863. [Google Scholar] [CrossRef]
- Wu, J.; Shen, T.; Wang, Q.; Tao, Z.; Zeng, K.; Song, J. Local Adaptive Illumination-Driven Input-Level Fusion for Infrared and Visible Object Detection. Remote Sens. 2023, 15, 660. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. pp. 213–229. [Google Scholar]
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
- Wang, Y.; Zhang, X.; Yang, T.; Sun, J. Anchor detr: Query design for transformer-based detector. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), Virtual, 22 February–1 March 2022; pp. 2567–2575. [Google Scholar]
- Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, H.; Pang, Y.; Han, J.; Mou, E.; Cao, E. An Infrared Small Target Detection Method Based on a Weighted Human Visual Comparison Mechanism for Safety Monitoring. Remote Sens. 2023, 15, 2922. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
- Jocher, G.; Stoken, A.; Borovec, J.; Christopher, S.; Laughing, L.C. Ultralytics/Yolov5: v6.0; Zenodo: Geneva, Switzerland, 2021. [Google Scholar]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
- Braun, M.; Krebs, S.; Flohr, F.; Gavrila, D.M. The eurocity persons dataset: A novel benchmark for object detection. arXiv 2018, arXiv:1805.07193. [Google Scholar]
- Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Hui, B.; Song, Z.; Fan, H.; Zhong, P.; Hu, W.; Zhang, X.; Ling, J.; Su, H.; Jin, W.; Zhang, Y. A dataset for infrared detection and tracking of dim-small aircraft targets under ground/air background. China Sci. Data 2020, 5, 291–302. [Google Scholar]
- Sun, Y.; Cao, B.; Zhu, P.; Hu, Q. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6700–6713. [Google Scholar] [CrossRef]
- Han, J.; Moradi, S.; Faramarzi, I.; Liu, C.; Zhang, H.; Zhao, Q. A local contrast method for infrared small-target detection utilizing a tri-layer window. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1822–1826. [Google Scholar] [CrossRef]
- Moradi, S.; Moallem, P.; Sabahi, M.F. Fast and robust small infrared target detection using absolute directional mean difference algorithm. Signal Process. 2020, 177, 107727. [Google Scholar] [CrossRef]
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Asymmetric contextual modulation for infrared small target detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 950–959. [Google Scholar]
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
- Devaguptapu, C.; Akolekar, N.; Sharma, M.M.; Balasubramanian, V.N. Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
Indicators | Thermal Camera | Visual Camera |
---|---|---|
Spectral Band | 8–14 μm | 0.38–0.7 μm |
Resolution | 640 × 512 | 3840 × 2160/1920 × 1080 |
Sensors | Uncooled VOx Microbolometer | 1/2 CMOS |
Ground Truth\Predicted Value | Positive | Negative |
---|---|---|
Positive | True Positive (TP) | False Negative (FN) |
Negative | False Positive (FP) | True Negative (TN) |
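From the confusion-matrix counts defined above, precision, recall, and F1 score (Sections 4.1.1 and 4.1.2) follow directly; the short sketch below computes them, using made-up counts purely for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example with made-up counts.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"P={p:.3f}, R={r:.3f}, F1={f1:.3f}")  # P=0.900, R=0.750, F1=0.818
```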
Method | P | R | F1 |
---|---|---|---|
RLCM [18] | 0.949 | 0.444 | 0.605 |
TLLICM [44] | 0.931 | 0.347 | 0.506 |
ADMD [45] | 0.754 | 0.455 | 0.567 |
ACM [46] | 0.523 | 0.242 | 0.331 |
CenterNet [47] | 0.876 | 0.728 | 0.795 |
YOLOv5 [38] | 0.978 | 0.291 | 0.449 |
YOLOX [39] | 0.970 | 0.790 | 0.871 |
Ours | 0.996 | 0.785 | 0.878 |
Method | Backbone | mAP@0.5 | mAP@0.5:0.95 | FPS | Parameters (M) |
---|---|---|---|---|---|
Faster R-CNN [48] | resnet50 | 0.744 | 0.479 | 26 | 42.0 |
DETR [31] | resnet50 | 0.874 | 0.531 | 27 | 41.3 |
Anchor DETR [33] | resnet50 | 0.890 | 0.568 | 37 | 36.8 |
YOLOX [39] | Darknet53 | 0.855 | 0.642 | 55 | 8.95 |
Ours | ResSwin | 0.898 | 0.673 | 41 | 28.5 |
Model | Dataset | mAP@0.5 | mAP@0.5:0.95 | Parameters |
---|---|---|---|---|
Darknet53YOLOX | VOC2007 | 0.513 | 0.286 | 8.95 M |
ResSwinYOLOX (3 layers) | VOC2007 | 0.528(+0.015) | 0.297(+0.011) | 28.54 M |
ResSwinYOLOX (4 layers) | VOC2007 | 0.534(+0.021) | 0.305(+0.019) | 42.74 M |
Darknet53YOLOX | DroneVehicle | 0.855 | 0.642 | 8.95 M |
ResSwinYOLOX (3 layers) | DroneVehicle | 0.871(+0.016) | 0.647(+0.005) | 28.54 M |
ResSwinYOLOX (4 layers) | DroneVehicle | 0.876(+0.021) | 0.648(+0.006) | 42.74 M |
Model | Connection Mode | Dataset | mAP@0.5 | mAP@0.5:0.95 | Parameters |
---|---|---|---|---|---|
Darknet53YOLOX | / | VOC2007 | 0.513 | 0.286 | 8.95 M |
ResSwinYOLOX (3 layers) | Add | VOC2007 | 0.528(+0.015) | 0.297(+0.011) | 28.54 M |
ResSwinYOLOX (3 layers) | Concat | VOC2007 | 0.523(+0.010) | 0.298(+0.013) | 28.93 M |
Darknet53YOLOX | / | DroneVehicle | 0.855 | 0.642 | 8.95 M |
ResSwinYOLOX (3 layers) | Add | DroneVehicle | 0.871(+0.016) | 0.647(+0.005) | 28.54 M |
ResSwinYOLOX (3 layers) | Concat | DroneVehicle | 0.869(+0.014) | 0.644(+0.002) | 28.93 M |
Model | LCA Block | Position | mAP@0.5 | mAP@0.5:0.95 | Parameters |
---|---|---|---|---|---|
ResSwinYOLOX (3 layers) | ✗ | / | 0.871 | 0.647 | 28.54 M |
ResSwinYOLOX (3 layers) | ✓ | Backbone | 0.892(+0.021) | 0.669(+0.022) | 28.54 M |
ResSwinYOLOX (3 layers) | ✓ | Neck | 0.898(+0.027) | 0.673(+0.026) | 28.54 M |