Fully Deformable Convolutional Network for Ship Detection in Remote Sensing Imagery
Abstract
1. Introduction
- We propose FD-Net, a novel fully deformable convolutional network for ship detection in high-resolution remote sensing imagery (HRSI).
- In FD-Net, we design an enhanced feature pyramid network (EFPN) that integrates deformable convolution throughout the network structure, improving the detection of ships with variable scale, orientation, and shape. We also design an adaptive balanced feature integrated (ABFI) module that models the scale-sensitive dependence among feature maps and highlights object features, strengthening the detection of small and densely packed objects. (Illustrative sketches of both ideas follow this list.)
- We propose a novel crop mosaic data augmentation method that increases the diversity of the dataset while preserving as much target information as possible (see the third sketch below).
- We evaluate the effect of several training methods on the accuracy of the proposed model. Experiments verify that, compared with other remote sensing ship detection methods, our method achieves higher detection accuracy on two public remote sensing datasets (DOTA and DIOR). Ablation experiments confirm that every component of our method contributes positively to the detection results.
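To make the EFPN idea concrete, the following is a minimal sketch of a 3×3 deformable convolution block of the kind that can replace the standard convolutions in a feature pyramid, in the spirit of deformable convolutional networks (Dai et al.; Zhu et al.). It assumes PyTorch with torchvision ≥ 0.8; `DeformableFPNBlock` and its wiring are illustrative, not the authors' exact implementation.

```python
# Sketch of a deformable 3x3 conv block for an FPN branch (illustrative).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableFPNBlock(nn.Module):
    """3x3 deformable conv whose sampling offsets are predicted per pixel."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # One (dy, dx) pair per kernel position -> 2 * k * k offset channels.
        self.offset_conv = nn.Conv2d(
            channels, 2 * kernel_size * kernel_size, kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(
            channels, channels, kernel_size, padding=pad)
        # Zero-init offsets so training starts from a regular convolution.
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset_conv(x)        # (N, 2*k*k, H, W)
        return self.deform_conv(x, offset)  # sampling grid bends toward the object


if __name__ == "__main__":
    feat = torch.randn(1, 256, 64, 64)      # a typical FPN level (C = 256)
    block = DeformableFPNBlock(256)
    print(block(feat).shape)                # torch.Size([1, 256, 64, 64])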
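The ABFI module models scale-sensitive dependence among pyramid levels and highlights object features. The next sketch illustrates one plausible reading of that idea, in the spirit of the balanced feature pyramid of Libra R-CNN (Pang et al.) combined with SE-style weighting (Hu et al.); `AdaptiveBalancedIntegration` and its details are assumptions, not the published module.

```python
# Sketch of adaptive balanced feature integration (illustrative, not exact).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveBalancedIntegration(nn.Module):
    """Resize all pyramid levels to one resolution, weight them adaptively,
    fuse, and redistribute the balanced feature back to every level."""

    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # Predict one scalar weight per level from globally pooled features.
        self.weight_fc = nn.Linear(channels * num_levels, num_levels)

    def forward(self, feats):
        # feats: list of (N, C, Hi, Wi) maps, finest level first.
        target = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="nearest") for f in feats]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in resized], dim=1)
        # Scale-sensitive weights: softmax over levels, per image.
        w = torch.softmax(self.weight_fc(pooled), dim=1)  # (N, L)
        fused = sum(w[:, i, None, None, None] * resized[i]
                    for i in range(len(resized)))
        # Residually redistribute the balanced feature to every level.
        return [f + F.interpolate(fused, size=f.shape[-2:], mode="nearest")
                for f in feats]


# Usage: fuse three 256-channel FPN levels.
# feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]
# outs = AdaptiveBalancedIntegration(256, 3)(feats)
```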
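Crop mosaic augmentation combines cropping with the mosaic tiling introduced in YOLOv4 (Bochkovskiy et al.): instead of tiling randomly resized images, each quadrant is cropped around the targets it contains, so object pixels survive the augmentation. The sketch below is a hypothetical stand-in for the paper's method, assuming uint8 HWC images, axis-aligned boxes in xyxy pixel coordinates, and at least one box per image.

```python
# Sketch of crop mosaic augmentation (illustrative, not the exact method).
import numpy as np


def crop_mosaic(samples, out_size=1024, seed=None):
    """Tile target-preserving crops of four images into one mosaic.

    `samples` is a list of four (image, boxes) pairs; boxes are (N, 4)
    arrays in xyxy pixel coordinates, N >= 1 per image.
    """
    rng = np.random.default_rng(seed)
    half = out_size // 2
    mosaic = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    all_boxes = []
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]  # (y, x) per quadrant
    for (img, boxes), (oy, ox) in zip(samples, offsets):
        h, w = img.shape[:2]
        # Center the crop window on a randomly chosen target, clipped to the image.
        bx = boxes[rng.integers(len(boxes))]
        cx = int(np.clip((bx[0] + bx[2]) / 2 - half / 2, 0, max(w - half, 0)))
        cy = int(np.clip((bx[1] + bx[3]) / 2 - half / 2, 0, max(h - half, 0)))
        crop = img[cy:cy + half, cx:cx + half]
        mosaic[oy:oy + crop.shape[0], ox:ox + crop.shape[1]] = crop
        # Shift boxes into crop coordinates and drop those that fall outside.
        shifted = boxes - np.array([cx, cy, cx, cy])
        shifted[:, [0, 2]] = shifted[:, [0, 2]].clip(0, crop.shape[1])
        shifted[:, [1, 3]] = shifted[:, [1, 3]].clip(0, crop.shape[0])
        keep = (shifted[:, 2] > shifted[:, 0]) & (shifted[:, 3] > shifted[:, 1])
        all_boxes.append(shifted[keep] + np.array([ox, oy, ox, oy]))
    return mosaic, np.concatenate(all_boxes)


# Usage (hypothetical): mosaic_img, mosaic_boxes = crop_mosaic(
#     [(img1, boxes1), (img2, boxes2), (img3, boxes3), (img4, boxes4)])
```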
2. Related Work
2.1. Multiple-Scale Object Detection Methods
2.2. Deformable Convolutional Networks
3. Proposed Method
3.1. Fully Deformable Convolution Network
3.2. Enhanced Feature Pyramid Network
3.3. Adaptive Balanced Feature Integrated Module
3.4. Crop Mosaic Data Augmentation
4. Experiments
4.1. Datasets
4.2. Implementation Details
4.3. Metrics
4.4. Ablation Experiments
4.5. Results of DOTA Dataset
4.6. Results of the DIOR Dataset
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhang, D.; Zhan, J.; Tan, L.; Gao, Y.; Župan, R. Comparison of two deep learning methods for ship target recognition with optical remotely sensed data. Neural Comput. Appl. 2021, 33, 4639–4649.
- Feng, Y.; Diao, W.; Sun, X.; Yan, M.; Gao, X. Towards automated ship detection and category recognition from high-resolution aerial images. Remote Sens. 2019, 11, 1901–1924.
- Lippitt, C.D.; Zhang, S. The impact of small unmanned airborne platforms on passive optical remote sensing: A conceptual perspective. Int. J. Remote Sens. 2018, 39, 4852–4868.
- Xu, J.; Fu, K.; Sun, X. An Invariant Generalized Hough Transform Based Method of Inshore Ships Detection. In Proceedings of the 2011 International Symposium on Image and Data Fusion (ISIDF), Tengchong, Yunnan, China, 9–11 August 2011; pp. 1–4.
- Weber, J.; Lefevre, S. A multivariate hit-or-miss transform for conjoint spatial and spectral template matching. In Proceedings of the International Conference on Image and Signal Processing, Cherbourg, France, 1–3 July 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 226–235.
- Corbane, C.; Najman, L.; Pecoudl, E.; Demagistrit, L.; Petit, M. A complete processing chain for ship detection using optical satellite imagery. Int. J. Remote Sens. 2010, 31, 5837–5854.
- Proia, N.; Pagé, V. Characterization of a Bayesian Ship Detection Method in Optical Satellite Images. IEEE Geosci. Remote Sens. Lett. 2010, 7, 226–230.
- Nie, T.; He, B.; Bi, G.; Zhang, Y.; Wang, W. A method of ship detection under complex background. ISPRS Int. J. Geo-Inf. 2017, 6, 159–177.
- Qi, S.; Ma, J.; Lin, J.; Li, Y.; Tian, J. Unsupervised ship detection based on saliency and S-HOG descriptor from optical satellite images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1451–1455.
- Dong, C.; Liu, J.; Xu, F. Ship detection in optical remote sensing images based on saliency and a rotation-invariant descriptor. Remote Sens. 2018, 10, 400–419.
- Su, X.; Yang, G.; Sang, H. Ship detection in polarimetric SAR based on support vector machine. Res. J. Appl. Sci. Eng. Technol. 2012, 4, 3448–3454.
- Yu, Y.; Ai, H.; He, X.; Yu, S.; Zhong, X.; Lu, M. Ship Detection in Optical Satellite Images Using Haar-like Features and Periphery-Cropped Neural Networks. IEEE Access 2018, 6, 71122–71131.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Dong, Y.; Chen, F.; Han, S.; Liu, H. Ship Object Detection of Remote Sensing Image Based on Visual Attention. Remote Sens. 2021, 13, 3192–3210.
- Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks. Remote Sens. 2018, 10, 132–146.
- Liu, W.; Ma, L.; Chen, H. Arbitrary-Oriented Ship Detection Framework in Optical Remote-Sensing Images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 937–941.
- Wang, C.; Bai, X.; Wang, S.; Zhou, J.; Ren, P. Multiscale Visual Attention Networks for Object Detection in VHR Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 310–314.
- Zhang, H.; Wang, Y.; Dayoub, F.; Sünderhauf, N. VarifocalNet: An IoU-aware Dense Object Detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 8514–8523.
- Zhang, H.; Wang, Y.; Dayoub, F.; Sünderhauf, N. SWA Object Detection. arXiv 2020, arXiv:2012.12645.
- Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.; Liao, H. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
- Lin, T.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
- Tan, M.; Pang, R.; Le, Q. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 10781–10790.
- Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 821–830.
- Tang, G.; Zhuge, Y.; Claramunt, C.; Men, S. N-YOLO: A SAR Ship Detection Using Noise-Classifying and Complete-Target Extraction. Remote Sens. 2021, 13, 871–887.
- Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 November 2021).
- Li, L.; Jiang, L.; Zhang, J.; Wang, S.; Chen, F. A Complete YOLO-Based Ship Detection Method for Thermal Infrared Remote Sensing Images under Complex Backgrounds. Remote Sens. 2022, 14, 1534–1547.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122.
- Zhu, M.; Hu, G.; Zhou, H.; Wang, S.; Feng, Z.; Yue, S. A Ship Detection Method via Redesigned FCOS in Large-Scale SAR Images. Remote Sens. 2022, 14, 1153–1171.
- Dong, C.; Liu, J.; Xu, F.; Liu, C. Ship Detection from Optical Remote Sensing Images Using Multi-Scale Analysis and Fourier HOG Descriptor. Remote Sens. 2019, 11, 1529–1548.
- Xu, X.; Zhang, X.; Zhang, T. Lite-YOLOv5: A Lightweight Deep Learning Detector for On-Board Ship Detection in Large-Scene Sentinel-1 SAR Images. Remote Sens. 2022, 14, 1018–1045.
- Liu, S.; Kong, W.; Chen, X.; Xu, M.; Yasir, M.; Zhao, L.; Li, J. Multi-Scale Ship Detection Algorithm Based on a Lightweight Neural Network for Spaceborne SAR Images. Remote Sens. 2022, 14, 1149–1169.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9308–9316.
- Deng, Z.; Sun, H.; Lei, L.; Zhou, S.; Zou, H. Object detection in remote sensing imagery with multi-scale deformable convolutional networks. Acta Geod. Cartogr. Sin. 2018, 47, 1216–1227.
- Ren, Y.; Zhu, C.; Xiao, S. Deformable Faster R-CNN with aggregating multi-layer features for partially occluded object detection in optical remote sensing images. Remote Sens. 2018, 10, 1470–1483.
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Lin, D. MMDetection: OpenMMLab Detection Toolbox and Benchmark. arXiv 2019, arXiv:1906.07155.
- Li, Y.; Huang, Q.; Pei, X.; Jiao, L.; Shang, R. RADet: Refine Feature Pyramid Network and Multi-Layer Attention Network for Arbitrary-Oriented Object Detection of Remote Sensing Images. Remote Sens. 2020, 12, 389–409.
- Wang, Y.; Jia, Y.; Gu, L. EFM-Net: Feature Extraction and Filtration with Mask Improvement Network for Object Detection in Remote Sensing Images. Remote Sens. 2021, 13, 4151–4169.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 9759–9768.
- Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 840–849.
- Li, X.; Wang, W.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss v2: Learning reliable localization quality estimation for dense object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 11632–11641.
- Kim, K.; Lee, H. Probabilistic anchor assignment with IoU prediction for object detection. In Proceedings of the European Conference on Computer Vision, Virtual, 23–28 August 2020; pp. 355–371.
- Li, B.; Liu, Y.; Wang, X. Gradient Harmonized Single-Stage Detector. arXiv 2018, arXiv:1811.05181.
Ablation experiments of FD-Net on the DOTA dataset. CM: crop mosaic; LSJ: large-scale jitter; SWA: stochastic weight averaging.

Method | Backbone | EFPN | ABFI | CM | LSJ | SWA | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---|---|---|---
VFNet | ResNet-50 | - | - | - | - | - | 0.486 | 0.664 | 0.595 | 0.289 | 0.632 | 0.788
Ours | ResNet-50 | √ | - | - | - | - | 0.487 | 0.672 | 0.599 | 0.320 | 0.611 | 0.701
Ours | ResNet-50 | - | √ | - | - | - | 0.484 | 0.671 | 0.593 | 0.314 | 0.611 | 0.720
Ours | ResNet-50 | - | - | √ | - | - | 0.510 | 0.677 | 0.625 | 0.322 | 0.646 | 0.824
Ours | ResNet-50 | - | - | - | √ | - | 0.509 | 0.681 | 0.626 | 0.327 | 0.644 | 0.803
Ours | ResNet-50 | - | - | - | - | √ | 0.508 | 0.675 | 0.615 | 0.297 | 0.660 | 0.839
Ours | ResNet-50 | √ | √ | √ | √ | √ | 0.536 | 0.737 | 0.701 | 0.402 | 0.706 | 0.804
Ablation experiments of FD-Net on the DIOR dataset. CM: crop mosaic; LSJ: large-scale jitter; SWA: stochastic weight averaging.

Method | Backbone | EFPN | ABFI | CM | LSJ | SWA | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---|---|---|---|---|---
VFNet | ResNet-50 | - | - | - | - | - | 0.528 | 0.783 | 0.626 | 0.428 | 0.705 | 0.868
Ours | ResNet-50 | √ | - | - | - | - | 0.539 | 0.789 | 0.648 | 0.447 | 0.706 | 0.847
Ours | ResNet-50 | - | √ | - | - | - | 0.535 | 0.789 | 0.642 | 0.444 | 0.698 | 0.825
Ours | ResNet-50 | - | - | √ | - | - | 0.529 | 0.787 | 0.630 | 0.435 | 0.694 | 0.848
Ours | ResNet-50 | - | - | - | √ | - | 0.551 | 0.794 | 0.659 | 0.459 | 0.711 | 0.875
Ours | ResNet-50 | - | - | - | - | √ | 0.531 | 0.782 | 0.633 | 0.434 | 0.704 | 0.872
Ours | ResNet-50 | √ | √ | √ | √ | √ | 0.561 | 0.833 | 0.709 | 0.507 | 0.725 | 0.876
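Of the two training techniques ablated above, LSJ jitters the input scale during training, while SWA is stochastic weight averaging (Zhang et al., SWA Object Detection): the detector is trained for additional epochs and the saved checkpoints are averaged into a single model. Below is a minimal sketch of the averaging step, assuming PyTorch checkpoints that each store a plain `state_dict`; the file names in the usage comment are hypothetical.

```python
# Sketch of the SWA checkpoint-averaging step (illustrative).
import torch


def average_checkpoints(paths):
    """Average the floating-point parameters of several saved checkpoints."""
    avg, n = None, 0
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.clone() for k, v in state.items()}
        else:
            for k, v in state.items():
                if v.is_floating_point():
                    avg[k] += v
        n += 1
    # Divide only float tensors; integer buffers (e.g., counters) are kept as-is.
    return {k: v / n if v.is_floating_point() else v for k, v in avg.items()}


# Usage (hypothetical file names):
# model.load_state_dict(average_checkpoints(
#     [f"epoch_{i}.pth" for i in range(13, 25)]))
```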
Detection results of different methods on the DOTA dataset.

Method | Backbone | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---
ATSS | ResNet-50 | 0.486 | 0.671 | 0.598 | 0.283 | 0.634 | 0.789
Faster R-CNN | ResNet-50 | 0.478 | 0.669 | 0.592 | 0.314 | 0.597 | 0.784
FSAF | ResNet-50 | 0.477 | 0.675 | 0.582 | 0.303 | 0.602 | 0.735
GFL | ResNet-50 | 0.453 | 0.648 | 0.548 | 0.279 | 0.584 | 0.776
PAA | ResNet-50 | 0.459 | 0.647 | 0.548 | 0.219 | 0.631 | 0.790
RetinaNet-GHM | ResNet-50 | 0.429 | 0.628 | 0.511 | 0.196 | 0.605 | 0.725
RetinaNet | ResNet-50 | 0.423 | 0.627 | 0.502 | 0.187 | 0.601 | 0.703
VFNet | ResNet-50 | 0.486 | 0.664 | 0.595 | 0.289 | 0.632 | 0.788
YOLOv5-s | - | 0.415 | 0.572 | - | - | - | -
YOLOv5-m | - | 0.528 | 0.735 | - | - | - | -
FD-Net | ResNet-50 | 0.536 | 0.737 | 0.701 | 0.402 | 0.706 | 0.804
Detection results of different methods on the DIOR dataset.

Method | Backbone | AP | AP50 | AP75 | APS | APM | APL
---|---|---|---|---|---|---|---
ATSS | ResNet-50 | 0.515 | 0.782 | 0.609 | 0.416 | 0.689 | 0.817
Faster R-CNN | ResNet-50 | 0.501 | 0.775 | 0.588 | 0.409 | 0.665 | 0.820
FSAF | ResNet-50 | 0.505 | 0.784 | 0.593 | 0.416 | 0.669 | 0.787
GFL | ResNet-50 | 0.522 | 0.783 | 0.621 | 0.425 | 0.694 | 0.850
PAA | ResNet-50 | 0.503 | 0.766 | 0.593 | 0.395 | 0.692 | 0.849
RetinaNet-GHM | ResNet-50 | 0.488 | 0.754 | 0.571 | 0.380 | 0.686 | 0.788
RetinaNet | ResNet-50 | 0.487 | 0.752 | 0.569 | 0.378 | 0.688 | 0.799
VFNet | ResNet-50 | 0.528 | 0.783 | 0.626 | 0.428 | 0.705 | 0.868
YOLOv5-s | - | 0.420 | 0.613 | - | - | - | -
YOLOv5-m | - | 0.539 | 0.805 | - | - | - | -
FD-Net | ResNet-50 | 0.561 | 0.833 | 0.709 | 0.507 | 0.725 | 0.876
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).