RADet: Refine Feature Pyramid Network and Multi-Layer Attention Network for Arbitrary-Oriented Object Detection of Remote Sensing Images
Abstract
1. Introduction
2. Proposed Methods
2.1. Rotation Bounding Box Prediction Based on Mask
2.1.1. Instance Label Generation
2.1.2. Rotation Bounding Box Prediction
2.2. Refine Feature Pyramid Network
2.3. Multi-Layer Attention Network
2.4. Loss Function
3. Experiments and Results
3.1. Datasets
3.2. Implementation Details
3.2.1. RPN
3.2.2. Training
3.2.3. Inference
3.2.4. Evaluation Indicators
3.3. Peer Methods Comparison
3.3.1. Results on DOTA
3.3.2. Results on NWPU VHR-10
3.4. Ablation Study
3.4.1. Quantitative Analysis
3.4.2. Qualitative Analysis
4. Discussion
4.1. Effectiveness of Refine Feature Pyramid Network and Multi-Layer Attention Network on Faster R-CNN
4.2. Sensitivity Analysis of NMS Threshold for RADet
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Braga, A.M.; Marques, R.C.; Rodrigues, F.A.; Medeiros, F.N. A median regularized level set for hierarchical segmentation of SAR images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1171–1175.
- Jin, R.; Yin, J.; Zhou, W.; Yang, J. Level set segmentation algorithm for high-resolution polarimetric SAR images based on a heterogeneous clutter model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4565–4579.
- Lang, F.; Yang, J.; Yan, S.; Qin, F. Superpixel Segmentation of Polarimetric Synthetic Aperture Radar (SAR) Images Based on Generalized Mean Shift. Remote Sens. 2018, 10, 1592.
- Ciecholewski, M. River channel segmentation in polarimetric SAR images: Watershed transform combined with average contrast maximisation. Expert Syst. Appl. 2017, 82, 196–215.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016; pp. 379–387.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Jiang, Y.; Zhu, X.; Wang, X.; Yang, S.; Li, W.; Wang, H.; Fu, P.; Luo, Z. R2CNN: Rotational region CNN for orientation robust scene text detection. arXiv 2017, arXiv:1706.09579.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983.
- Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 2018, 10, 132.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 845–853.
- Liu, W.; Rabinovich, A.; Berg, A.C. ParseNet: Looking wider to see better. arXiv 2015, arXiv:1506.04579.
- Bell, S.; Lawrence Zitnick, C.; Bala, K.; Girshick, R. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 2874–2883.
- Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 354–370.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212.
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montréal, QC, Canada, 8–13 December 2014; pp. 2204–2212.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Cheng, J.; Dong, L.; Lapata, M. Long short-term memory-networks for machine reading. arXiv 2016, arXiv:1601.06733.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–10 December 2017; pp. 5998–6008.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
- Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. arXiv 2018, arXiv:1805.08318.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
- Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Han, J.; Zhou, P.; Zhang, D.; Cheng, G.; Guo, L.; Liu, Z.; Bu, S.; Wu, J. Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding. ISPRS J. Photogramm. Remote Sens. 2014, 89, 37–48.
- Kang, M.; Ji, K.; Leng, X.; Lin, Z. Contextual region-based convolutional neural network with multilayer fusion for SAR ship detection. Remote Sens. 2017, 9, 860.
- Yang, X.; Sun, H.; Sun, X.; Yan, M.; Guo, Z.; Fu, K. Position detection and direction prediction for arbitrary-oriented ships via multitask rotation region convolutional neural network. IEEE Access 2018, 6, 50839–50849.
Types | Methods | Advantages | Disadvantages |
---|---|---|---|
Classical Detectors | Two-stage detectors (e.g., Faster R-CNN, R-FCN, FPN, Mask R-CNN) | High detection precision; low missed-detection rate | Non-real-time detection; locate objects with horizontal bounding boxes
Classical Detectors | One-stage detectors (e.g., YOLO, SSD, YOLOv2, YOLOv3) | Real-time detection; simple network structure | Lower detection precision; locate objects with horizontal bounding boxes; poor results for small and dense objects; prone to mislocalization
Rotation Detectors | e.g., R2CNN, RRPN, R-DFPN, FR-O | Locate objects with rotated bounding boxes; use rotation anchors | Heavy model computation; greatly affected by hand-designed factors; non-real-time detection; complex
Method | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FR-O [15] | 79.09 | 69.12 | 17.17 | 63.49 | 34.20 | 37.16 | 36.20 | 89.19 | 69.60 | 58.96 | 49.40 | 52.52 | 46.69 | 44.80 | 46.30 | 52.93 |
R-DFPN [16] | 80.92 | 65.82 | 33.77 | 58.94 | 55.77 | 50.94 | 54.78 | 90.33 | 66.34 | 68.66 | 48.73 | 51.76 | 55.10 | 51.32 | 35.88 | 57.94 |
R2CNN [13] | 80.94 | 65.67 | 35.34 | 67.44 | 59.52 | 50.91 | 55.81 | 90.67 | 66.92 | 72.39 | 55.06 | 52.23 | 55.14 | 53.35 | 48.22 | 60.67 |
RRPN [14] | 88.52 | 71.20 | 31.66 | 59.30 | 51.85 | 56.19 | 57.25 | 90.81 | 72.84 | 67.38 | 56.69 | 52.84 | 53.08 | 51.94 | 53.58 | 61.01 |
Yang et al. [36] | 81.25 | 71.41 | 36.53 | 67.44 | 61.16 | 50.91 | 56.60 | 90.67 | 68.09 | 72.39 | 55.06 | 55.60 | 62.44 | 53.35 | 51.47 | 62.29 |
Ours | 79.66 | 77.36 | 47.64 | 67.61 | 65.06 | 74.35 | 68.82 | 90.05 | 74.72 | 75.67 | 45.60 | 61.84 | 64.88 | 68.00 | 53.67 | 67.66 |
Ours * | 79.45 | 76.99 | 48.05 | 65.83 | 65.46 | 74.40 | 68.86 | 89.70 | 78.14 | 74.97 | 49.92 | 64.63 | 66.14 | 71.58 | 62.16 | 69.09 |
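Class abbreviations in the DOTA tables are the standard DOTA short names: PL (plane), BD (baseball diamond), BR (bridge), GTF (ground track field), SV (small vehicle), LV (large vehicle), SH (ship), TC (tennis court), BC (basketball court), ST (storage tank), SBF (soccer-ball field), RA (roundabout), HA (harbor), SP (swimming pool), and HC (helicopter). The mAP column is the arithmetic mean of the fifteen per-class APs; the sketch below shows only this averaging step in Python, using the "Ours" row above as input (the per-class APs themselves come from the usual VOC-style evaluation, which is not reproduced here).

```python
# Illustrative check (not the paper's evaluation code): the reported mAP is
# the arithmetic mean of the 15 per-class APs. Values are the "Ours" row of
# the DOTA table above.
per_class_ap = {
    "PL": 79.66, "BD": 77.36, "BR": 47.64, "GTF": 67.61, "SV": 65.06,
    "LV": 74.35, "SH": 68.82, "TC": 90.05, "BC": 74.72, "ST": 75.67,
    "SBF": 45.60, "RA": 61.84, "HA": 64.88, "SP": 68.00, "HC": 53.67,
}

mAP = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP = {mAP:.2f}")  # 67.66, matching the table
```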
Class | SSD512 [12] | Faster R-CNN [6] | FPN [7] | Ours
---|---|---|---|---|
Tennis-court | 33.85 | 79.77 | 86.69 | 90.74 |
Vehicle | 45.45 | 81.02 | 89.39 | 89.98 |
Harbor | 32.95 | 79.37 | 69.52 | 78.01 |
Basketball-court | 61.85 | 79.96 | 90.60 | 97.46 |
Ground-track-field | 99.31 | 90.67 | 90.42 | 99.53 |
Bridge | 45.45 | 59.93 | 79.49 | 77.05 |
Ship | 53.90 | 81.82 | 81.40 | 81.39 |
Airplane | 90.91 | 90.91 | 90.91 | 100.00 |
Storage-tank | 93.06 | 97.89 | 98.95 | 97.90 |
Baseball-diamond | 90.35 | 90.24 | 90.12 | 90.36 |
mAP | 64.71 | 83.25 | 86.75 | 90.24 |
Method | Deconvolution | Resize-Convolution | mAP
---|---|---|---|
RFPN with deconvolution | ✓ | | 64.86
RFPN with resize-convolution | | ✓ | 65.64
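The two component columns above refer to the upsampling operator used inside RFPN. As a rough illustration of what is being swapped (a minimal PyTorch sketch under assumed layer sizes of 256 channels and 3×3/4×4 kernels, not the paper's exact configuration), the choice is between a learned transposed convolution and nearest-neighbor resizing followed by a convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeconvUpsample(nn.Module):
    """2x upsampling with a learned transposed convolution."""
    def __init__(self, channels=256):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(channels, channels,
                                         kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        return self.deconv(x)

class ResizeConvUpsample(nn.Module):
    """2x upsampling by nearest-neighbor interpolation followed by a 3x3
    convolution, which avoids the checkerboard artifacts that transposed
    convolutions can introduce."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        return self.conv(x)

x = torch.randn(1, 256, 32, 32)
print(DeconvUpsample()(x).shape)      # torch.Size([1, 256, 64, 64])
print(ResizeConvUpsample()(x).shape)  # torch.Size([1, 256, 64, 64])
```

Both operators double the spatial resolution; the resize-convolution variant is consistent with the slightly higher mAP reported in the table.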
Method | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline | 79.78 | 74.88 | 44.13 | 63.22 | 63.60 | 67.25 | 68.56 | 90.01 | 69.44 | 67.76 | 44.65 | 64.09 | 63.84 | 64.89 | 50.34 | 65.10 |
+ RFPN | 79.84 | 75.91 | 43.08 | 65.22 | 65.11 | 72.93 | 69.09 | 90.69 | 68.97 | 68.86 | 43.58 | 63.10 | 64.62 | 67.59 | 46.02 | 65.64 (+0.54)
+ MANet | 80.03 | 75.32 | 43.58 | 62.47 | 64.13 | 72.77 | 68.74 | 90.19 | 70.29 | 73.51 | 51.26 | 61.24 | 64.44 | 68.04 | 43.75 | 65.98 (+0.88)
+ RFPN + MANet | 79.66 | 77.36 | 47.64 | 67.61 | 65.06 | 74.35 | 68.82 | 90.05 | 74.72 | 75.67 | 45.60 | 61.84 | 64.88 | 68.00 | 53.67 | 67.66 (+2.56)
Method | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Faster R-CNN [6] | 80.32 | 77.55 | 32.86 | 68.13 | 53.66 | 52.49 | 50.04 | 90.41 | 75.05 | 59.59 | 57.00 | 49.81 | 61.69 | 56.46 | 41.85 | 60.46 |
Faster R-CNN + RFPN | 79.29 | 75.95 | 47.97 | 58.54 | 54.88 | 50.10 | 52.14 | 79.93 | 59.96 | 68.77 | 42.63 | 64.04 | 66.04 | 69.14 | 57.44 | 61.79 |
Faster R-CNN + MANet | 79.65 | 74.29 | 49.63 | 55.59 | 55.02 | 50.16 | 52.29 | 79.60 | 66.57 | 68.59 | 39.50 | 63.15 | 65.50 | 71.87 | 55.84 | 61.82 |