Feature Enhanced Anchor-Free Network for School Detection in High Spatial Resolution Remote Sensing Images
Abstract
1. Introduction
- Deep CNNs can extract mid-level and high-level features. However, the feature representation that a single CNN learns from remote sensing images (RSIs) is limited by scale variation, complicated scenes, and noise.
- Deep learning methods rely heavily on large amounts of annotated training data. Despite the rapid growth of satellite imagery in both quality and quantity, much of that data still has to be labeled manually.
- Unlike objects in natural scene images, objects in RSIs are usually photographed from an overhead view, so details cannot be captured from multiple angles. In addition, illumination, shadow, scale variation, and resolution make objects in RSIs difficult to detect accurately when deep learning methods are applied directly.
- Many geospatial objects in RSIs, such as airports, schools, and thermal power plants, are combinations of several simpler objects. These composite objects contain several parts, and their feature representation is not fixed. They are challenging to detect because of their diverse appearance, irregular boundaries, and complex backgrounds.
- We propose a feature enhanced network (FENet) for detecting primary and secondary schools (PSSs). The proposed method improves PSS detection performance in RSIs and effectively avoids the influence of negative samples. Compared with other object detection methods, it locates objects precisely without the complex computation associated with anchors. This simple anchor-free method also offers a new approach to object detection.
- An enhanced feature module (EFM) is proposed to enlarge the receptive field in high-level layers and make features more discriminative. The EFM contains two parts: a multi-scale local attention (MSLA) module that extracts multi-scale features, and a channel attention-guided unit (CAU) that re-weights features to obtain global attention and improve the semantic consistency among multi-scale features. By extracting critical information from high-level layers and fusing features further, the EFM improves both classification and localization in PSS detection.
- A context-aware strategy and the complete IoU (CIoU) loss are introduced into our network to further optimize the predicted bounding boxes. The context-aware strategy makes full use of foreground information and generates more positive samples. The CIoU loss considers the relationship between predicted and ground-truth boxes across different configurations (overlap, center distance, and aspect ratio) and achieves faster bounding-box regression. These strategies suit anchor-free methods and can effectively predict positive samples while ignoring negative samples.
- We build a PSS dataset for composite object detection. The dataset is built from 2 m resolution imagery from GF satellites and contains 1685 annotated images. It provides a benchmark for future composite object detection.
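The channel re-weighting attributed to the CAU above can be illustrated with a squeeze-and-excitation-style gate. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function name `channel_attention`, the bottleneck weights `w1` and `w2`, and the toy feature map are all hypothetical and chosen only for illustration.

```python
import math


def channel_attention(features, w1, w2):
    """Re-weight the channels of a feature map given as features[c][h][w].

    w1 (hidden x C) and w2 (C x hidden) are hypothetical bottleneck weights.
    Returns the re-weighted feature map and the per-channel gates in (0, 1).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Toy input (hypothetical values): 2 channels, each 2 x 2.
features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[4.0, 4.0], [4.0, 4.0]],  # channel 1, mean 4.0
]
w1 = [[-1.0, 1.0]]             # bottleneck with a single hidden unit
w2 = [[1.0], [-1.0]]
scaled, gates = channel_attention(features, w1, w2)
print(gates)
```

With these toy weights the gate for channel 0 is close to 1 and the gate for channel 1 is close to 0, so the second channel is suppressed. In a trained network the weights would be learned rather than set by hand.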
2. Methods
2.1. Network Architecture
2.2. Enhanced Feature Module
2.3. Context-Aware Strategy
2.4. Multitask Loss Function
3. Experiments and Results
3.1. Datasets
3.2. Evaluation Metrics
3.3. Implementation Details
3.4. Ablation Studies on Different Structures
3.5. The Experiments on Different Radii in the Context-Aware Strategy
3.6. Comparison to State-of-the-Art Methods
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Stage | Backbone | Output |
---|---|---|
C2 | | 128 × 128, 256 |
C3 | | 64 × 64, 512 |
C4 | | 32 × 32, 1024 |
C5 | | 16 × 16, 2048 |
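The sizes listed for C2 to C5 in the table above are consistent with a 512 × 512 input passed through backbone stages with strides 4, 8, 16, and 32. Both the input size and the strides are assumptions inferred from the table, not values stated in this section. Under those assumptions, the sizes can be reproduced as follows (`stage_output_sizes` is a hypothetical helper name):

```python
def stage_output_sizes(input_size=512, strides=(4, 8, 16, 32),
                       channels=(256, 512, 1024, 2048)):
    """Map stage names C2..C5 to (height, width, channels).

    input_size and strides are assumed values, inferred from the table.
    """
    sizes = {}
    for index, (stride, channel) in enumerate(zip(strides, channels), start=2):
        side = input_size // stride  # each stage downsamples by its stride
        sizes[f"C{index}"] = (side, side, channel)
    return sizes


sizes = stage_output_sizes()
for stage, (height, width, channel) in sizes.items():
    print(f"{stage}: {height} x {width}, {channel}")
```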
Method | EFM | Context-Aware Strategy | CIoU Loss | AP |
---|---|---|---|---|
Baseline | | | | 74.0 |
Baseline-1 | √ | | | 76.2 |
Baseline-2 | | √ | | 75.9 |
Baseline-3 | | | √ | 75.4 |
FENet | √ | √ | √ | 78.7 |
Method | Regression | AP |
---|---|---|
FENet-1 | IoU | 76.5 |
FENet-2 | GIoU | 77.9 |
FENet-3 | CIoU | 78.7 |
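To make the three regression losses in the table above concrete, the sketch below computes the IoU, GIoU, and CIoU losses for a pair of axis-aligned boxes using their standard published definitions. It is an illustrative pure-Python version, not the training code behind the reported AP values. The function names and the `(x1, y1, x2, y2)` box format are assumptions, and boxes are assumed to have positive width and height.

```python
import math


def box_iou_terms(pred, gt):
    """Return (iou, union area, enclosing box) for boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    enclose = (min(pred[0], gt[0]), min(pred[1], gt[1]),
               max(pred[2], gt[2]), max(pred[3], gt[3]))
    return inter / union, union, enclose


def iou_loss(pred, gt):
    return 1.0 - box_iou_terms(pred, gt)[0]


def giou_loss(pred, gt):
    iou, union, (ex1, ey1, ex2, ey2) = box_iou_terms(pred, gt)
    enclose_area = (ex2 - ex1) * (ey2 - ey1)
    # Penalize the empty part of the smallest enclosing box.
    return 1.0 - iou + (enclose_area - union) / enclose_area


def ciou_loss(pred, gt, eps=1e-9):
    iou, _, (ex1, ey1, ex2, ey2) = box_iou_terms(pred, gt)
    # Squared center distance, normalized by the squared enclosing diagonal.
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2.0) ** 2 \
        + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2.0) ** 2
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v


gt = (0.0, 0.0, 4.0, 4.0)
near = (5.0, 0.0, 9.0, 4.0)    # no overlap, close to the ground truth
far = (10.0, 0.0, 14.0, 4.0)   # no overlap, farther away
for name, loss in (("IoU", iou_loss), ("GIoU", giou_loss), ("CIoU", ciou_loss)):
    print(name, round(loss(near, gt), 4), round(loss(far, gt), 4))
```

For the two non-overlapping boxes the plain IoU loss is identical, so it gives no signal about which prediction is closer, while the GIoU and CIoU losses both grow with distance.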
Radius | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | All Area |
---|---|---|---|---|---|---|---|
AP | 69.1 | 75.7 | 77.2 | 78.7 | 77.1 | 75.5 | 76.7 |
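The radius sweep above varies how large a region around each object counts as positive during training. One common way to implement a radius-limited positive region in anchor-free detectors is FCOS-style center sampling. The sketch below assumes that scheme and should not be read as the authors' exact context-aware strategy; the function name `positive_locations`, the stride, and the toy box are hypothetical.

```python
def positive_locations(box, stride, feature_size, radius=None):
    """Return the feature-map cells (row, col) treated as positive for one box.

    box: (x1, y1, x2, y2) in image pixels.
    radius: half-width of the sampling region in units of stride;
            None keeps every cell inside the box (the "all area" setting).
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    positives = []
    for row in range(feature_size):
        for col in range(feature_size):
            # Map the cell back to image coordinates at the cell center.
            x = col * stride + stride // 2
            y = row * stride + stride // 2
            inside_box = x1 < x < x2 and y1 < y < y2
            if radius is None:
                near_center = True
            else:
                limit = radius * stride
                near_center = abs(x - cx) <= limit and abs(y - cy) <= limit
            if inside_box and near_center:
                positives.append((row, col))
    return positives


box = (0, 0, 64, 64)
for radius in (0.5, 1.5, 2.5, None):
    cells = positive_locations(box, stride=8, feature_size=8, radius=radius)
    print(radius, len(cells))
```

On this toy 8 × 8 feature map the number of positive cells grows from 4 to 16 to 36 as the radius increases, and reaches 64 when the whole box is used. A small radius yields few positive samples, while a very large one admits locations near the box border that may cover background, which is one possible reading of why AP in the table peaks at an intermediate radius.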
Methods | AP |
---|---|
RetinaNet | 72.4 |
FPN | 73.2 |
ADNet | 79.2 |
CenterNet | 72.1 |
FCOS | 74.0 |
FENet | 78.7 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fu, H.; Fan, X.; Yan, Z.; Du, X.; Jian, H.; Xu, C. Feature Enhanced Anchor-Free Network for School Detection in High Spatial Resolution Remote Sensing Images. Appl. Sci. 2022, 12, 3114. https://doi.org/10.3390/app12063114