Improved Mask R-CNN for Rural Building Roof Type Recognition from UAV High-Resolution Images: A Case Study in Hunan Province, China
Abstract
1. Introduction
2. Study Area and Data
2.1. Study Area
2.2. Data Acquisition and Preprocessing
3. Methods
3.1. UAV Optical Image Visual Feature Extraction Methods
3.1.1. Calculation of Visible Difference Vegetation Index (VDVI)
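The VDVI is computed per pixel from the visible bands as VDVI = (2G − R − B) / (2G + V + B), with V = R + B collapsed, i.e., VDVI = (2G − R − B) / (2G + R + B). A minimal NumPy sketch of this calculation follows; the function and array names are illustrative, not the authors' released code.

```python
# Minimal sketch (not the authors' code): per-pixel VDVI from an RGB image,
# using the standard definition VDVI = (2G - R - B) / (2G + R + B).
import numpy as np

def vdvi(rgb: np.ndarray) -> np.ndarray:
    """rgb: H x W x 3 array with channels ordered R, G, B."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    num, denom = 2 * g - r - b, 2 * g + r + b
    out = np.zeros_like(denom)
    # Avoid division by zero on pure-black pixels.
    np.divide(num, denom, out=out, where=denom != 0)
    return out
```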
3.1.2. Calculation of Sobel Edge Detection Features
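Sobel edge features come from convolving the image with the standard 3 × 3 horizontal and vertical Sobel kernels and taking the gradient magnitude. A minimal sketch, assuming a single grayscale input band (not the authors' code):

```python
# Minimal sketch: Sobel edge-magnitude feature from a grayscale image
# using the standard 3x3 Sobel operators via scipy.ndimage.
import numpy as np
from scipy import ndimage

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """gray: H x W array; returns the per-pixel gradient magnitude."""
    g = gray.astype(np.float64)
    gx = ndimage.sobel(g, axis=1)  # horizontal gradient
    gy = ndimage.sobel(g, axis=0)  # vertical gradient
    return np.hypot(gx, gy)
```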
3.2. Improved Mask R-CNN for UAV Image Roof Type Recognition
3.2.1. Transfer Learning Deployment of ResNet152
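As a hedged sketch of one common transfer-learning deployment consistent with this subsection: load ImageNet-pretrained ResNet152 weights and freeze the shallow stages. Which layers the authors actually keep trainable is an assumption here, not something stated in this excerpt.

```python
# Hedged sketch of a transfer-learning setup for ResNet152 (assumed, not
# the authors' code): load ImageNet weights, freeze the early stages.
import torchvision

resnet = torchvision.models.resnet152(pretrained=True)  # torchvision <= 0.12 API
for name, p in resnet.named_parameters():
    # Keep only the deepest stage and the classifier head trainable
    # (which stages to freeze is our assumption).
    if not name.startswith(("layer4", "fc")):
        p.requires_grad = False
```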
3.2.2. Construction of Improved Mask R-CNN Model
- (1) Input a pre-processed UAV remote sensing image of a fixed size into the pre-trained ResNet152 network to obtain the corresponding feature maps.
- (2) Assign a fixed number of anchors to each point on the feature map, producing multiple candidate regions of interest (RoIs).
- (3) Pass the candidate RoIs to the RPN for binary classification (foreground vs. background) and bounding-box regression, which refines the location and size of each box so that it fits the target more tightly. At the same time, filter out redundant candidates with non-maximum suppression.
- (4) Apply the RoIAlign operation to the remaining RoIs: map each RoI's coordinates back onto the feature map and pool the covered region, via bilinear interpolation, into a fixed-size feature.
- (5) Finally, feed these aligned features to the head sub-networks for multi-class classification, bounding-box regression, and mask generation by an FCN. (A minimal sketch of assembling this pipeline appears after the list.)
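Steps (1)–(5) above can be assembled with torchvision's detection API, in which the RPN, NMS filtering, RoIAlign, and the FCN mask head are built in. The following is a minimal sketch under that assumption, not the authors' released implementation (API as in torchvision ≤ 0.12, where `pretrained=` is used instead of the newer `weights=` argument):

```python
# Minimal sketch of a Mask R-CNN with a pre-trained ResNet152-FPN backbone,
# mirroring steps (1)-(5); not the authors' released code.
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# (1) ImageNet-pretrained ResNet152 backbone producing multi-scale feature maps.
backbone = resnet_fpn_backbone("resnet152", pretrained=True)

# (2)-(3) Anchors at each feature-map location feed the RPN, which classifies
# foreground/background, regresses boxes, and applies NMS internally.
# (4)-(5) RoIAlign pools fixed-size features; the heads then perform
# multi-class classification, box regression, and FCN mask prediction.
model = MaskRCNN(backbone, num_classes=6)  # 5 roof types + background
```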
3.2.3. Model Implementation and Training
- (1) Software and hardware environment configuration: The computer used in this experiment is equipped with a 3.7 GHz ten-core Intel Core i9-10900K CPU, an NVIDIA GeForce RTX 2080 Super graphics card with 11 GB of memory, and 32 GB of RAM, running Windows 10. The neural network framework used in this paper is PyTorch.
- (2) Construction of a sample dataset of rural building roof types: We cropped the seven images, calculated the spectral and spatial visual features of the sample-area images according to the methods described in Section 3.1, and combined them with the original UAV visible-band images to form different feature stacks. Using ArcGIS Pro 2.8, we manually interpreted and labeled each representative roof type in these combined images, and the labels were cross-checked by multiple annotators to ensure their accuracy: the gabled type is labeled 1, the flat type 2, the hipped type 3, the complex type 4, and the mono-pitched type 5. The labeled images were converted to GeoTIFF format and used as the reference standard for training the deep learning model and verifying its accuracy. Because of hardware limitations when training the deep learning network, each image must be split into small tiles; following a random strategy [69], 224 × 224 windows are randomly cropped from the manually labeled sample areas as input images for the training model (see the cropping and augmentation sketch after this list).
- (3) Data augmentation and sample dataset assignment: To enlarge the training sample size of the UAV image dataset and avoid overfitting, we randomly select 50% of the images from the training dataset for augmentation, yielding 1.5 times the original amount of training data. The augmentation methods include rotation, cropping, brightness enhancement, contrast enhancement, and scaling. We use two regions, T1 and T2, as test regions and the remaining regions for training and validation, dividing the training, validation, and test sample sets in a ratio of 5:1:4. Note that hipped, complex, and mono-pitched roofs are comparatively rare among rural buildings, which easily unbalances the features the model extracts for the different roof types; we therefore run a separate secondary training pass on these three under-represented roof types and merge its classification results with those of the full-type training to produce the final output.
- (4) Feature combination training modes: Different visual feature images are combined with the UAV visible RGB bands and input to the model for training, so that its performance can be analyzed under several feature combinations. Four combinations are used as input layers: RGB, RGB + Sobel, RGB + VDVI, and RGB + VDVI + Sobel.
- (5) Model training parameter settings: After comparing experimental results across several parameter choices, the improved Mask R-CNN uses average binary cross-entropy as the mask loss, which generates a mask for each class independently, with no inter-class competition. The weight decay coefficient is 0.0001, the momentum coefficient is 0.9, the mask activation function is sigmoid, the batch size is 8, the number of epochs is 20, and the initial learning rate is 0.001; the optimizer is stochastic gradient descent (SGD), which accelerates the convergence of the network (see the training-configuration sketch after this list).
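As referenced in items (2) and (3), here is a minimal sketch of the random 224 × 224 cropping and joint image/label augmentation; the array names and the particular transforms applied jointly are our assumptions, not the authors' code:

```python
# Illustrative sketch: `image` is an aligned H x W x C feature stack and
# `label` an H x W label raster (assumptions, not the authors' code).
import random
import numpy as np

def random_crop(image: np.ndarray, label: np.ndarray, size: int = 224):
    """Randomly crop a size x size training window, per the random strategy."""
    h, w = label.shape
    y = random.randint(0, h - size)
    x = random.randint(0, w - size)
    return image[y:y + size, x:x + size], label[y:y + size, x:x + size]

def augment(image: np.ndarray, label: np.ndarray):
    """Geometric augmentation applied jointly to image and label: a random
    rotation by a multiple of 90 degrees plus an optional horizontal flip.
    (Brightness/contrast jitter and rescaling would modify the image only.)"""
    k = random.randint(0, 3)
    image, label = np.rot90(image, k), np.rot90(label, k)
    if random.random() < 0.5:
        image, label = np.fliplr(image), np.fliplr(label)
    return image.copy(), label.copy()
```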
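And, as referenced in item (5), a hedged sketch of the reported optimizer settings in a torchvision-style Mask R-CNN training loop; `model` and `data_loader` are assumed to exist (e.g., from the sketch after Section 3.2.2), and the loop itself is illustrative rather than the authors' code:

```python
# Hedged sketch of the reported training configuration; the loop is ours.
import torch

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.001,            # initial learning rate
    momentum=0.9,        # momentum coefficient
    weight_decay=0.0001, # weight decay coefficient
)

model.train()
for epoch in range(20):                  # 20 epochs per the paper
    for images, targets in data_loader:  # batch size 8 per the paper
        # In train mode, torchvision's MaskRCNN returns a dict of losses;
        # its mask term is an average binary cross-entropy over per-class
        # sigmoid masks, so there is no inter-class competition.
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```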
3.2.4. Accuracy Evaluation Method
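Assuming the standard confusion-matrix definitions for the metrics reported in Section 4 (Precision, Recall, F1-Score, Kappa coefficient KC, and overall accuracy OA), they are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{F1} = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}
                   {\mathrm{Precision} + \mathrm{Recall}},
```

```latex
\mathrm{OA} = p_o = \frac{\sum_i n_{ii}}{N}, \qquad
\mathrm{KC} = \frac{p_o - p_e}{1 - p_e}, \qquad
p_e = \frac{\sum_i n_{i+}\, n_{+i}}{N^2},
```

where \(n_{ii}\) are the diagonal entries of the confusion matrix, \(n_{i+}\) and \(n_{+i}\) its row and column sums, and \(N\) the total number of samples.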
4. Experimental Results
4.1. Accuracy Comparison of Roof Recognition with Different Feature Combinations Based on the Improved Mask R-CNN
4.1.1. Comparison of Accuracy of Roof Type Recognition Results of Single Rural Buildings with Different Feature Combinations
4.1.2. Comparison of the Overall Recognition Result Accuracy for Different Feature Combinations
4.2. Comparison of Roof Recognition Accuracy with Other Deep Learning Models
4.2.1. Comparison of Roof Type Recognition Results of Single Buildings with Different Deep Learning Models
4.2.2. Overall Recognition Accuracy of Roofs Compared to Other Deep Learning Models
5. Discussion
5.1. Sensitivity Analysis of Different Feature Combinations on the Training Results of the Improved Mask R-CNN
5.2. Effect of Different Feature Extraction Layers of ResNet on the Accuracy of Results
5.3. Analysis of the Limitations of Roof Type Identification Methods for Complex Rural Buildings
- (1) Uneven sample sizes across roof types
- (2) Limitations of the different visual feature extraction methods
- (3) Structural limitations of Mask R-CNN
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Chen, D.; Loboda, T.V.; Silva, J.A.; Tonellato, M.R. Characterizing Small-Town Development Using Very High Resolution Imagery within Remote Rural Settings of Mozambique. Remote Sens. 2021, 13, 3385.
2. Sun, L.; Tang, Y.; Zhang, L. Rural building detection in high-resolution imagery based on a two-stage CNN model. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1998–2002.
3. Varol, B.; Yılmaz, E.Ö.; Maktav, D.; Bayburt, S.; Gürdal, S. Detection of illegal constructions in urban cities: Comparing LIDAR data and stereo KOMPSAT-3 images with development plans. Eur. J. Remote Sens. 2019, 52, 335–344.
4. Song, X.; Huang, Y.; Zhao, C.; Liu, Y.; Lu, Y.; Chang, Y.; Yang, J. An approach for estimating solar photovoltaic potential based on rooftop retrieval from remote sensing images. Energies 2018, 11, 3172.
5. Tiwari, A.; Meir, I.A.; Karnieli, A. Object-based image procedures for assessing the solar energy photovoltaic potential of heterogeneous rooftops using airborne LiDAR and orthophoto. Remote Sens. 2020, 12, 223.
6. Tu, J.; Sui, H.; Feng, W.; Sun, K.; Hua, L. Detection of damaged rooftop areas from high-resolution aerial images based on visual bag-of-words model. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1817–1821.
7. He, H.; Zhou, J.; Chen, M.; Chen, T.; Li, D.; Cheng, P. Building extraction from UAV images jointly using 6D-SLIC and multiscale Siamese convolutional networks. Remote Sens. 2019, 11, 1040.
8. Guo, M.; Liu, H.; Xu, Y.; Huang, Y. Building extraction based on U-Net with an attention block and multiple losses. Remote Sens. 2020, 12, 1400.
9. Benarchid, O.; Raissouni, N.; El Adib, S.; Abbous, A.; Azyat, A.; Achhab, N.B.; Lahraoua, M.; Chahboun, A. Building extraction using object-based classification and shadow information in very high resolution multispectral images, a case study: Tetuan, Morocco. Can. J. Image Processing Comput. Vis. 2013, 4, 1–8.
10. Schuegraf, P.; Bittner, K. Automatic building footprint extraction from multi-resolution remote sensing images using a hybrid FCN. ISPRS Int. J. Geo-Inf. 2019, 8, 191.
11. Zhu, Q.; Li, Z.; Zhang, Y.; Guan, Q. Building extraction from high spatial resolution remote sensing images via multiscale-aware and segmentation-prior conditional random fields. Remote Sens. 2020, 12, 3983.
12. Liao, C.; Hu, H.; Li, H.; Ge, X.; Chen, M.; Li, C.; Zhu, Q. Joint Learning of Contour and Structure for Boundary-Preserved Building Extraction. Remote Sens. 2021, 13, 1049.
13. Nyandwi, E.; Koeva, M.; Kohli, D.; Bennett, R. Comparing human versus machine-driven cadastral boundary feature extraction. Remote Sens. 2019, 11, 1662.
14. Chen, R.; Li, X.; Li, J. Object-based features for house detection from RGB high-resolution images. Remote Sens. 2018, 10, 451.
15. Turker, M.; Koc-San, D. Building extraction from high-resolution optical spaceborne images using the integration of support vector machine (SVM) classification, Hough transformation and perceptual grouping. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 58–69.
16. Li, W.; He, C.; Fang, J.; Zheng, J.; Fu, H.; Yu, L. Semantic segmentation-based building footprint extraction using very high-resolution satellite images and multi-source GIS data. Remote Sens. 2019, 11, 403.
17. Zhang, C.; Jiao, J.-c.; Deng, Z.-l.; Cui, Y.-s. Individual Building Rooftop Segmentation from High-resolution Urban Single Multispectral Image Using Superpixels. DEStech Trans. Comput. Sci. Eng. 2019, 188–193.
18. Castagno, J.; Atkins, E. Roof shape classification from LiDAR and satellite image data fusion using supervised learning. Sensors 2018, 18, 3960.
19. Tan, Y.; Wang, S.; Xu, B.; Zhang, J. An improved progressive morphological filter for UAV-based photogrammetric point clouds in river bank monitoring. ISPRS J. Photogramm. Remote Sens. 2018, 146, 421–429.
20. Boonpook, W.; Tan, Y.; Ye, Y.; Torteeka, P.; Torsri, K.; Dong, S. A deep learning approach on building detection from unmanned aerial vehicle-based images in riverbank monitoring. Sensors 2018, 18, 3921.
21. Shao, H.; Song, P.; Mu, B.; Tian, G.; Chen, Q.; He, R.; Kim, G. Assessing city-scale green roof development potential using Unmanned Aerial Vehicle (UAV) imagery. Urban For. Urban Green. 2021, 57, 126954.
22. Liu, W.; Yang, M.; Xie, M.; Guo, Z.; Li, E.; Zhang, L.; Pei, T.; Wang, D. Accurate building extraction from fused DSM and UAV images using a chain fully convolutional neural network. Remote Sens. 2019, 11, 2912.
23. Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A fully convolutional neural network for automatic building extraction from high-resolution remote sensing images. Remote Sens. 2020, 12, 1050.
24. Singh, P.; Verma, A.; Chaudhari, N.S. Deep convolutional neural network classifier for handwritten Devanagari character recognition. In Information Systems Design and Intelligent Applications; Springer: Berlin/Heidelberg, Germany, 2016; pp. 551–561.
25. Yang, H.; Wu, P.; Yao, X.; Wu, Y.; Wang, B.; Xu, Y. Building extraction in very high resolution imagery by dense-attention networks. Remote Sens. 2018, 10, 1768.
26. Alidoost, F.; Arefi, H. A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image. PFG—J. Photogram. Remote Sens. Geoinfor. Sci. 2018, 86, 235–248.
27. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
28. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
29. Li, E.; Xia, J.; Du, P.; Lin, C.; Samat, A. Integrating multilayer features of convolutional neural networks for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5653–5665.
30. Arnab, A.; Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Larsson, M.; Kirillov, A.; Savchynskyy, B.; Rother, C.; Kahl, F.; Torr, P.H. Conditional random fields meet deep neural networks for semantic segmentation: Combining probabilistic graphical models with deep learning for structured prediction. IEEE Signal Processing Mag. 2018, 35, 37–52.
31. Pan, Z.; Xu, J.; Guo, Y.; Hu, Y.; Wang, G. Deep learning segmentation and classification for urban village using a worldview satellite image based on U-Net. Remote Sens. 2020, 12, 1574.
32. Wu, T.; Hu, Y.; Peng, L.; Chen, R. Improved Anchor-Free Instance Segmentation for Building Extraction from High-Resolution Remote Sensing Images. Remote Sens. 2020, 12, 2910.
33. Zhao, K.; Kang, J.; Jung, J.; Sohn, G. Building extraction from satellite images using Mask R-CNN with building boundary regularization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 247–251.
34. Ji, C.; Tang, H. Number of Building Stories Estimation from Monocular Satellite Image Using a Modified Mask R-CNN. Remote Sens. 2020, 12, 3833.
35. Stiller, D.; Stark, T.; Wurm, M.; Dech, S.; Taubenböck, H. Large-scale building extraction in very high-resolution aerial imagery using Mask R-CNN. In Proceedings of the 2019 Joint Urban Remote Sensing Event (JURSE), Vannes, France, 22–24 May 2019; pp. 1–4.
36. Chen, M.; Wu, J.; Liu, L.; Zhao, W.; Tian, F.; Shen, Q.; Zhao, B.; Du, R. DR-Net: An improved network for building extraction from high resolution remote sensing image. Remote Sens. 2021, 13, 294.
37. Zhong, Z.; Li, J.; Ma, L.; Jiang, H.; Zhao, H. Deep residual networks for hyperspectral image classification. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 1824–1827.
38. Hu, Y.; Guo, F. Building Extraction Using Mask Scoring R-CNN Network. In Proceedings of the 3rd International Conference on Computer Science and Application Engineering, Sanya, China, 22–24 October 2019; pp. 1–5.
39. Yang, F.; Li, W.; Hu, H.; Li, W.; Wang, P. Multi-scale feature integrated attention-based rotation network for object detection in VHR aerial images. Sensors 2020, 20, 1686.
40. Kumar, A.; Abhishek, K.; Kumar Singh, A.; Nerurkar, P.; Chandane, M.; Bhirud, S.; Patel, D.; Busnel, Y. Multilabel classification of remote sensed satellite imagery. Trans. Emerg. Telecommun. Technol. 2021, 4, 118–133.
41. Zhuo, X.; Fraundorfer, F.; Kurz, F.; Reinartz, P. Optimization of OpenStreetMap building footprints based on semantic information of oblique UAV images. Remote Sens. 2018, 10, 624.
42. Li, J.; Cai, X.; Qi, J. AMFNet: An attention-based multi-level feature fusion network for ground objects extraction from mining area's UAV-based RGB images and digital surface model. J. Appl. Remote Sens. 2021, 15, 036506.
43. Sun, W.; Wang, R. Fully convolutional networks for semantic segmentation of very high resolution remotely sensed images combined with DSM. IEEE Geosci. Remote Sens. Lett. 2018, 15, 474–478.
44. Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172.
45. Boonpook, W.; Tan, Y.; Xu, B. Deep learning-based multi-feature semantic segmentation in building extraction from images of UAV photogrammetry. Int. J. Remote Sens. 2021, 42, 1–19.
46. Zhang, L.; Wu, J.; Fan, Y.; Gao, H.; Shao, Y. An efficient building extraction method from high spatial resolution remote sensing images based on improved mask R-CNN. Sensors 2020, 20, 1465.
47. Ji, S.; Wei, S.; Lu, M. Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set. IEEE Trans. Geosci. Remote Sens. 2018, 57, 574–586.
48. Li, W.; Li, Y.; Gong, J.; Feng, Q.; Zhou, J.; Sun, J.; Shi, C.; Hu, W. Urban Water Extraction with UAV High-Resolution Remote Sensing Data Based on an Improved U-Net Model. Remote Sens. 2021, 13, 3165.
49. Zhang, X.; Fu, Y.; Zang, A.; Sigal, L.; Agam, G. Learning classifiers from synthetic data using a multichannel autoencoder. arXiv 2015, arXiv:1503.03163.
50. Yan, G.; Li, L.; Coy, A.; Mu, X.; Chen, S.; Xie, D.; Zhang, W.; Shen, Q.; Zhou, H. Improving the estimation of fractional vegetation cover from UAV RGB imagery by colour unmixing. ISPRS J. Photogramm. Remote Sens. 2019, 158, 23–34.
51. Jannoura, R.; Brinkmann, K.; Uteau, D.; Bruns, C.; Joergensen, R.G. Monitoring of crop biomass using true colour aerial photographs taken from a remote controlled hexacopter. Biosyst. Eng. 2015, 129, 341–351.
52. Xiaoqin, W.; Miaomiao, W.; Shaoqiang, W.; Yundong, W. Extraction of vegetation information from visible unmanned aerial vehicle images. Trans. Chin. Soc. Agric. Eng. 2015, 31, 152–159.
53. Zhang, Y.; Zhang, F.; Shakhsheer, Y.; Silver, J.D.; Klinefelter, A.; Nagaraju, M.; Boley, J.; Pandey, J.; Shrivastava, A.; Carlson, E.J. A batteryless 19 μW MICS/ISM-band energy harvesting body sensor node SoC for ExG applications. IEEE J. Solid-State Circuits 2012, 48, 199–213.
54. Yuan, H.; Liu, Z.; Cai, Y.; Zhao, B. Research on vegetation information extraction from visible UAV remote sensing images. In Proceedings of the 2018 Fifth International Workshop on Earth Observation and Remote Sensing Applications (EORSA), Xi'an, China, 18–20 June 2018; pp. 1–5.
55. Huang, Y.-h.; Chen, D.-w. Image fuzzy enhancement algorithm based on contourlet transform domain. Multimed. Tools Appl. 2020, 79, 35017–35032.
56. Vincent, O.R.; Folorunso, O. A descriptive algorithm for Sobel image edge detection. In Proceedings of the Informing Science & IT Education Conference (InSITE), Macon, GA, USA, 12–15 June 2009; pp. 97–107.
57. Ding, L.; Goshtasby, A. On the Canny edge detector. Pattern Recognit. 2001, 34, 721–725.
58. Burt, P.J.; Adelson, E.H. The Laplacian pyramid as a compact image code. In Readings in Computer Vision; Elsevier: Amsterdam, The Netherlands, 1987; pp. 671–679.
59. Xu, Y.; Wu, L.; Xie, Z.; Chen, Z. Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens. 2018, 10, 144.
60. Ma, G.; He, Q.; Shi, X.; Fan, X. Automatic Vectorization Extraction of Flat-Roofed Houses Using High-Resolution Remote Sensing Images. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 44–47.
61. Teng, L.; Xue, F.; Bai, Q. Remote sensing image enhancement via edge-preserving multiscale retinex. IEEE Photonics J. 2019, 11, 1–10.
62. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
63. Xu, L.; Chen, Q. Remote-sensing image usability assessment based on ResNet by combining edge and texture maps. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1825–1834.
64. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
65. Li, D.; Deng, L.; Lee, M.; Wang, H. IoT data feature extraction and intrusion detection system for smart cities based on deep migration learning. Int. J. Inf. Manag. 2019, 49, 533–545.
66. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
67. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
68. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Processing Syst. 2015, 28, 91–99.
69. Tuia, D.; Muñoz-Marí, J.; Camps-Valls, G. Remote sensing image segmentation by active queries. Pattern Recognit. 2012, 45, 2180–2192.
70. Li, M.; Wu, P.; Wang, B.; Park, H.; Yang, H.; Wu, Y. A Deep Learning Method of Water Body Extraction From High Resolution Remote Sensing Images With Multisensors. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3120–3132.
71. Guo, H.; Shi, Q.; Du, B.; Zhang, L.; Wang, D.; Ding, H. Scene-driven multitask parallel attention network for building extraction in high-resolution remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4287–4306.
72. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
73. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
74. Zhu, Y.; Liang, Z.; Yan, J.; Chen, G.; Wang, X. ED-Net: Automatic Building Extraction From High-Resolution Aerial Images With Boundary Information. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4595–4606.
75. Diamond, N.B.; Armson, M.J.; Levine, B. The truth is out there: Accuracy in recall of verifiable real-world events. Psychol. Sci. 2020, 31, 1544–1556.
76. Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 2018, 10, 132.
Sample | Gabled | Flat | Hipped | Complex | Mono-Pitched |
---|---|---|---|---|---|
Training data | 8.43/4252 | 8.74/3767 | 0.64/249 | 0.9/258 | 0.28/229 |
Test data | 7.26/3111 | 4.04/2166 | 0.21/39 | 1/209 | 0.12/122 |
Total | 15.69/7363 | 12.78/5933 | 0.85/288 | 1.9/467 | 0.4/351 |
Feature | Type | T1 Precision | T1 Recall | T1 F1-Score | T2 Precision | T2 Recall | T2 F1-Score
---|---|---|---|---|---|---|---
RGB | gabled | 83.5% | 76.4% | 0.798 | 95.6% | 93.9% | 0.947
RGB | flat | 86.4% | 98.1% | 0.919 | 92.5% | 90.0% | 0.912
RGB | hipped | 47.5% | 64.9% | 0.549 | 73.8% | 47.4% | 0.577
RGB | complex | 90.9% | 35.9% | 0.515 | 63.7% | 80.9% | 0.713
RGB | mono-pitched | 77.8% | 31.1% | 0.444 | 56.3% | 54.5% | 0.554
RGB + Sobel | gabled | 90.0% | 82.4% | 0.860 | 96.2% | 92.1% | 0.941
RGB + Sobel | flat | 90.3% | 98.0% | 0.940 | 88.5% | 91.6% | 0.900
RGB + Sobel | hipped | 56.3% | 94.7% | 0.706 | 69.7% | 74.3% | 0.719
RGB + Sobel | complex | 97.7% | 77.0% | 0.861 | 67.8% | 85.4% | 0.756
RGB + Sobel | mono-pitched | 37.5% | 66.7% | 0.480 | 54.2% | 46.4% | 0.500
RGB + VDVI | gabled | 84.5% | 81.0% | 0.827 | 95.0% | 81.3% | 0.876
RGB + VDVI | flat | 89.6% | 98.2% | 0.937 | 69.8% | 94.7% | 0.804
RGB + VDVI | hipped | 65.5% | 59.4% | 0.623 | 67.3% | 55.4% | 0.608
RGB + VDVI | complex | 74.3% | 34.9% | 0.475 | 63.6% | 81.0% | 0.713
RGB + VDVI | mono-pitched | 93.3% | 37.8% | 0.538 | 53.1% | 44.7% | 0.485
RGB + VDVI + Sobel | gabled | 86.0% | 47.0% | 0.608 | 93.1% | 85.8% | 0.893
RGB + VDVI + Sobel | flat | 80.6% | 99.1% | 0.889 | 62.9% | 94.8% | 0.756
RGB + VDVI + Sobel | hipped | 29.9% | 92.1% | 0.451 | 83.3% | 37.5% | 0.517
RGB + VDVI + Sobel | complex | 90.2% | 44.0% | 0.591 | 33.3% | 1.0% | 0.019
RGB + VDVI + Sobel | mono-pitched | 38.5% | 12.5% | 0.189 | 1.0% | 1.0% | 0.010
Feature | T1 Precision | T1 Recall | T1 F1-Score | T1 KC | T1 OA | T2 Precision | T2 Recall | T2 F1-Score | T2 KC | T2 OA
---|---|---|---|---|---|---|---|---|---|---
RGB | 77.2% | 61.3% | 0.683 | 0.716 | 0.842 | 69.8% | 71.4% | 0.706 | 0.696 | 0.832
RGB + Sobel | 74.4% | 83.8% | 0.788 | 0.831 | 0.903 | 75.3% | 78.0% | 0.766 | 0.811 | 0.907
RGB + VDVI | 81.4% | 62.3% | 0.706 | 0.758 | 0.870 | 76.4% | 73.3% | 0.748 | 0.793 | 0.910
RGB + VDVI + Sobel | 65.0% | 58.9% | 0.618 | 0.565 | 0.791 | 54.7% | 44.0% | 0.488 | 0.626 | 0.827
Model | Type | T1 Precision | T1 Recall | T1 F1-Score | T2 Precision | T2 Recall | T2 F1-Score
---|---|---|---|---|---|---|---
Mask R-CNN | gabled | 84.1% | 92.3% | 0.880 | 92.4% | 87.2% | 0.897
Mask R-CNN | flat | 74.4% | 71.1% | 0.727 | 73.5% | 92.6% | 0.820
Mask R-CNN | hipped | 40.0% | 50.0% | 0.444 | 61.2% | 30.0% | 0.403
Mask R-CNN | complex | 90.9% | 63.8% | 0.750 | 69.7% | 65.2% | 0.674
Mask R-CNN | mono-pitched | 65.4% | 68.0% | 0.667 | 66.7% | 31.3% | 0.426
U-Net | gabled | 70.2% | 93.0% | 0.800 | 85.2% | 95.1% | 0.899
U-Net | flat | 93.7% | 86.5% | 0.900 | 85.4% | 79.6% | 0.824
U-Net | hipped | 66.7% | 2.6% | 0.050 | 51.9% | 17.8% | 0.265
U-Net | complex | 81.0% | 18.4% | 0.300 | 45.1% | 18.0% | 0.257
U-Net | mono-pitched | 20.0% | 1.4% | 0.026 | 100.0% | 12.3% | 0.219
DeeplabV3 | gabled | 76.8% | 86.0% | 0.811 | 86.9% | 95.9% | 0.912
DeeplabV3 | flat | 95.4% | 90.0% | 0.926 | 89.0% | 79.6% | 0.840
DeeplabV3 | hipped | 18.5% | 41.9% | 0.257 | 75.0% | 8.2% | 0.148
DeeplabV3 | complex | 18.2% | 4.4% | 0.071 | 49.5% | 33.9% | 0.402
DeeplabV3 | mono-pitched | 2.3% | 3.9% | 0.029 | 72.7% | 16.0% | 0.262
PSPNet | gabled | 74.7% | 90.0% | 0.816 | 85.5% | 95.9% | 0.904
PSPNet | flat | 91.3% | 92.2% | 0.917 | 86.8% | 86.9% | 0.868
PSPNet | hipped | 34.2% | 15.3% | 0.211 | 44.3% | 17.3% | 0.249
PSPNet | complex | 40.6% | 9.2% | 0.150 | 48.1% | 8.1% | 0.139
PSPNet | mono-pitched | 16.7% | 1.5% | 0.028 | 20.0% | 1.8% | 0.033
Our Model | gabled | 90.0% | 82.4% | 0.860 | 96.2% | 92.1% | 0.941
Our Model | flat | 90.3% | 98.0% | 0.940 | 88.5% | 91.6% | 0.900
Our Model | hipped | 56.3% | 94.7% | 0.706 | 69.7% | 74.3% | 0.719
Our Model | complex | 97.7% | 77.0% | 0.861 | 67.8% | 85.4% | 0.756
Our Model | mono-pitched | 37.5% | 66.7% | 0.480 | 54.2% | 46.4% | 0.500
Model | T1 Precision | T1 Recall | T1 F1-Score | T1 KC | T1 OA | T2 Precision | T2 Recall | T2 F1-Score | T2 KC | T2 OA
---|---|---|---|---|---|---|---|---|---|---
Mask R-CNN | 71.0% | 69.0% | 0.700 | 0.688 | 0.807 | 72.7% | 61.3% | 0.665 | 0.705 | 0.851
U-Net | 66.3% | 40.4% | 0.502 | 0.660 | 0.807 | 73.5% | 44.6% | 0.555 | 0.626 | 0.838
DeeplabV3 | 42.2% | 45.2% | 0.437 | 0.650 | 0.791 | 74.6% | 46.7% | 0.575 | 0.663 | 0.854
PSPNet | 51.5% | 41.6% | 0.460 | 0.682 | 0.820 | 56.9% | 42.0% | 0.483 | 0.657 | 0.849
Our Model | 74.4% | 83.8% | 0.788 | 0.831 | 0.903 | 75.3% | 78.0% | 0.766 | 0.811 | 0.907
Model | T1 Precision | T1 Recall | T1 F1-Score | T1 KC | T1 OA | T2 Precision | T2 Recall | T2 F1-Score | T2 KC | T2 OA | Training Time (min)
---|---|---|---|---|---|---|---|---|---|---|---
ResNet18 | 39.6% | 48.2% | 0.435 | 0.443 | 0.596 | 77.0% | 62.7% | 0.691 | 0.588 | 0.733 | 187
ResNet34 | 53.4% | 49.9% | 0.516 | 0.512 | 0.633 | 57.2% | 59.3% | 0.583 | 0.615 | 0.752 | 198
ResNet50 | 71.0% | 69.0% | 0.700 | 0.688 | 0.807 | 72.7% | 61.3% | 0.665 | 0.705 | 0.851 | 211
ResNet101 | 69.8% | 83.9% | 0.762 | 0.808 | 0.884 | 77.1% | 70.6% | 0.737 | 0.779 | 0.913 | 301
ResNet152 | 74.4% | 83.8% | 0.788 | 0.831 | 0.903 | 75.3% | 78.0% | 0.766 | 0.811 | 0.907 | 220