SERNet: Squeeze and Excitation Residual Network for Semantic Segmentation of High-Resolution Remote Sensing Images
Abstract
1. Introduction
- We introduce two kinds of squeeze and excitation residual modules (SERMs) into the semantic segmentation network to adaptively recalibrate feature responses and to aggregate global information along the channel and spatial dimensions in parallel. A refine attention module (RAM) is embedded at the bottom of the network to focus on the most informative of the extracted features.
- We combine DSM data with IRRG data to target the segmentation of surface vegetation categories, which yields better segmentation results.
- We conduct multiple comparative experiments with different data combinations and different models on the ISPRS Vaihingen and Potsdam datasets [16] to demonstrate the effectiveness of the proposed model.
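The parallel channel/spatial recalibration described in the first contribution can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, weight shapes, reduction to two fully connected layers, and the residual summation are my own assumptions, following the standard channel (cSE) and spatial (sSE) squeeze-and-excitation formulations the paper builds on.

```python
import numpy as np

def channel_se(x, w1, w2):
    """Channel squeeze-and-excitation: global average pool over (H, W),
    two fully connected layers with ReLU, sigmoid gate, rescale channels.
    x: (C, H, W); w1: (C, C//r); w2: (C//r, C)."""
    z = x.mean(axis=(1, 2))                                    # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1, 0.0) @ w2)))  # excitation: (C,)
    return x * s[:, None, None]                                # recalibrate channels

def spatial_se(x, w):
    """Spatial squeeze-and-excitation: a 1x1 'convolution' (weighted sum
    over channels) followed by a sigmoid gate over spatial positions.
    x: (C, H, W); w: (C,)."""
    q = np.tensordot(w, x, axes=([0], [0]))                    # (H, W)
    s = 1.0 / (1.0 + np.exp(-q))
    return x * s[None, :, :]

def serm(x, w1, w2, w_sp):
    """Hypothetical SERM-style block: channel and spatial recalibration
    applied in parallel and merged with a residual connection."""
    return x + channel_se(x, w1, w2) + spatial_se(x, w_sp)
```

With zero weights both gates output 0.5 everywhere, so the block reduces to `x + 0.5x + 0.5x = 2x`, which makes the residual structure easy to sanity-check.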
2. Related Work
2.1. Semantic Segmentation Methods
2.2. Land Cover Segmentation of Remote Sensing Images
3. Methods
3.1. Network Architectures
3.1.1. Squeeze and Excitation Residual Network
3.1.2. Segmentation of Surface Vegetation
3.2. Squeeze and Excitation Residual Module
3.2.1. Channel Squeeze and Excitation Block
3.2.2. Spatial Squeeze and Excitation Block
3.2.3. Squeeze and Excitation Residual Block
3.3. Refine Attention Module
4. Experiments
4.1. Datasets
4.1.1. Original Datasets
4.1.2. Processed Datasets
4.2. Evaluation Metrics
Implementation Details
4.3. Analysis
4.3.1. Segmentation of Original Categories
4.3.2. Segmentation of Vegetation Categories
5. Discussion
5.1. Ablation Study
5.2. Improvements and Limitations
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, M.; Chen, Z. Superpixel-enhanced deep neural forest for remote sensing image semantic segmentation. ISPRS J. Photogramm. Remote Sens. 2020, 159, 140–152. [Google Scholar]
- Moser, G.; Serpico, S.B.; Benediktsson, J.A. Land-Cover mapping by Markov modeling of spatial–contextual information in very-High-Resolution remote sensing images. Proc. IEEE 2013, 101, 631–651. [Google Scholar] [CrossRef]
- Dechesne, C.; Mallet, C.; Le Bris, A.; Gouet-Brunet, V. Semantic segmentation of forest stands of pure species combining airborne lidar data and very high resolution multispectral imagery. ISPRS J. Photogramm. Remote Sens. 2017, 126, 129–145. [Google Scholar] [CrossRef]
- Kampffmeyer, M.; Salberg, A.B.; Jenssen, R. Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1–9. [Google Scholar]
- Liu, Y.; Minh Nguyen, D.; Deligiannis, N.; Ding, W.; Munteanu, A. Hourglass-ShapeNetwork Based Semantic Segmentation for High Resolution Aerial Imagery. Remote Sens. 2017, 9, 522. [Google Scholar] [CrossRef]
- Srivastava, S.; Volpi, M.; Tuia, D. Joint height estimation and semantic labeling of monocular aerial images with CNNS. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 5173–5176. [Google Scholar] [CrossRef]
- Zheng, Z.; Zhong, Y.; Wang, J. Pop-Net: Encoder-Dual Decoder for Semantic Segmentation and Single-View Height Estimation. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 4963–4966. [Google Scholar] [CrossRef]
- Qin, R.; Fang, W. A hierarchical building detection method for very high resolution remotely sensed images combined with DSM using graph cut optimization. Photogramm. Eng. Remote Sens. 2014, 80, 873–883. [Google Scholar] [CrossRef]
- Sun, W.; Wang, R. Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM. IEEE Geosci. Remote Sens. Lett. 2018, 15, 474–478. [Google Scholar] [CrossRef]
- Gedeon, T.; Parker, A.E.; Campion, C.; Aldworth, Z. Annealing and the normalized N-cut. Pattern Recognit. 2008, 41, 592–606. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
- Bell, S.; Zitnick, C.L.; Bala, K.; Girshick, R. Inside-Outside Net: Detecting Objects in Context With Skip Pooling and Recurrent Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2874–2883. [Google Scholar]
- Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 483–499. [Google Scholar]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Cramer, M. The DGPF Test on Digital Aerial Camera Evaluation—Overview and Test Design. Photogramm. Fernerkund. Geoinf. 2009, 11, 73–82. [Google Scholar]
- Yang, Y.; Hallman, S.; Ramanan, D.; Fowlkes, C.C. Layered object models for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1731–1743. [Google Scholar] [CrossRef]
- Al-Amri, S.S.; Salem Saleh, N.V.K.; Khamitkar, S.D. Image segmentation by using edge detection. Int. J. Comput. Sci. Eng. 2010, 2, 804–807. [Google Scholar]
- Zheng, X.; Lei, Q.; Yao, R.; Gong, Y.; Qian, Y. Image segmentation based on adaptive K-means algorithm. EURASIP J. Image Video Process. 2018, 2018, 1–10. [Google Scholar] [CrossRef]
- Sang, Q.; Zhuang, Y.; Dong, S.; Wang, G.; Chen, H.; Li, L. Improved land cover classification of VHR optical remote sensing imagery based upon detail injection procedure. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 18–31. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Noh, H.; Hong, S.; Han, B. Learning Deconvolution Network for Semantic Segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 11–18 December 2015; pp. 1520–1528. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Volpi, M.; Tuia, D. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 881–893. [Google Scholar] [CrossRef]
- Audebert, N.; Le Saux, B.; Lefèvre, S. Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS J. Photogramm. Remote Sens. 2018, 140, 20–32. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar]
- Yue, K.; Yang, L.; Li, R.; Hu, W.; Zhang, F.; Li, W. TreeUNet: Adaptive Tree convolutional neural networks for subdecimeter aerial image segmentation. ISPRS J. Photogramm. Remote Sens. 2019, 156, 1–13. [Google Scholar] [CrossRef]
- Ghiasi, G.; Fowlkes, C.C. Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 519–534. [Google Scholar]
- Bilinski, P.; Prisacariu, V. Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6596–6605. [Google Scholar]
- Nogueira, K.; Dalla Mura, M.; Chanussot, J.; Schwartz, W.R.; dos Santos, J.A. Dynamic Multicontext Segmentation of Remote Sensing Images Based on Convolutional Networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7503–7520. [Google Scholar] [CrossRef]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Simonyan, K.; Andrew, Z. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 630–645. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
- Li, H.; Xiong, P.; An, J.; Wang, L. Pyramid Attention Network for Semantic Segmentation. arXiv 2018, arXiv:1805.10180. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Li, H.; Qiu, K.; Chen, L.; Mei, X.; Hong, L.; Tao, C. SCAttNet: Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 905–909. [Google Scholar] [CrossRef]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–21 June 2019; pp. 3146–3154. [Google Scholar]
- Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164. [Google Scholar]
- Marcos, D.; Volpi, M.; Kellenberger, B.; Tuia, D. Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models. ISPRS J. Photogramm. Remote Sens. 2018, 145, 96–107. [Google Scholar] [CrossRef]
- Marmanis, D.; Schindler, K.; Wegner, J.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018, 135, 158–172. [Google Scholar] [CrossRef]
- Penatti, O.A.; Nogueira, K.; Dos Santos, J.A. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015; pp. 44–51. [Google Scholar]
- Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 172–181. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Neupane, B.; Horanont, T.; Aryal, J. Deep learning-based semantic segmentation of urban features in satellite images: A review and meta-analysis. Remote Sens. 2021, 13, 808. [Google Scholar] [CrossRef]
- Seferbekov, S.; Iglovikov, V.; Buslaev, A.; Shvets, A. Feature pyramid network for multi-class land segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 272–275. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Online, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 3109. [Google Scholar] [CrossRef]
- Coy, A.; Rankine, D.; Taylor, M.; Nielsen, D.C.; Cohen, J. Increasing the Accuracy and Automation of Fractional Vegetation Cover Estimation from Digital Photographs. Remote Sens. 2016, 8, 474. [Google Scholar] [CrossRef]
- Li, Y.; Cao, Z.; Xiao, Y.; Lu, H.; Zhu, Y. A novel denoising autoencoder assisted segmentation algorithm for cotton field. In Proceedings of the 2015 Chinese Automation Congress (CAC), Wuhan, China, 27–29 November 2015; pp. 588–593. [Google Scholar] [CrossRef]
- Liu, H.; Sun, H.; Li, M.; Iida, M. Application of Color Featuring and Deep Learning in Maize Plant Detection. Remote Sens. 2020, 12, 2229. [Google Scholar] [CrossRef]
- Xu, W.; Zhao, L.; Li, J.; Shang, S.; Ding, X.; Wang, T. Detection and classification of tea buds based on deep learning. Comput. Electron. Agric. 2022, 192, 106547. [Google Scholar] [CrossRef]
- Zhuang, S.; Wang, P.; Jiang, B. Segmentation of Green Vegetation in the Field Using Deep Neural Networks. In Proceedings of the 2018 13th World Congress on Intelligent Control and Automation (WCICA), Changsha, China, 4–8 July 2018; pp. 509–514. [Google Scholar] [CrossRef]
- Yang, L.; Chen, W.; Bi, P.; Tang, H.; Zhang, F.; Wang, Z. Improving vegetation segmentation with shadow effects based on double input networks using polarization images. Comput. Electron. Agric. 2022, 199, 107123. [Google Scholar] [CrossRef]
- Lemaire, C. Aspects of the DSM production with high resolution images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2008, XXXVII, 1143–1146. Available online: https://www.isprs.org/proceedings/XXXVII/congress/4_pdf/200.pdf (accessed on 18 September 2022). [Google Scholar]
- Kosov, S.; Rottensteiner, F.; Heipke, C.; Leitloff, J.; Hinz, S. 3D Classification of Crossroads from Multiple Aerial Images Using Markov Random Fields. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, XXXIX-B3, 479–484. [Google Scholar] [CrossRef]
- Taghanaki, S.A.; Zheng, Y.; Zhou, S.K.; Georgescu, B.; Sharma, P.; Xu, D.; Comaniciu, D.; Hamarneh, G. Combo loss: Handling input and output imbalance in multi-organ segmentation. Comput. Med. Imaging Graph. 2019, 75, 24–33. [Google Scholar] [CrossRef]
- Jha, D.; Smedsrud, P.H.; Riegler, M.A.; Johansen, D.; De Lange, T.; Halvorsen, P.; Johansen, H.D. Resunet++: An advanced architecture for medical image segmentation. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 225–2255. [Google Scholar]
Per-class cells report IoU/F1 (%); mIoU and AF are the class means of IoU and F1, respectively, and OA is overall pixel accuracy.

Method | Imp. Surface | Building | Low Veg. | Tree | Car | mIoU(%) | AF(%) | OA(%) |
---|---|---|---|---|---|---|---|---|
FCN-32s [23] | 68.24/81.14 | 70.55/82.71 | 58.19/72.26 | 61.05/75.59 | 27.35/39.02 | 57.08 | 70.14 | 78.13 |
FCN-8s [23] | 72.23/83.41 | 75.54/84.98 | 63.64/77.78 | 64.98/77.86 | 44.97/59.91 | 64.27 | 76.79 | 80.56 |
UNet [26] | 78.09/86.93 | 79.47/86.87 | 66.52/79.67 | 66.13/79.35 | 52.38/67.63 | 68.52 | 80.09 | 83.69 |
ResNet50 [38] | 78.71/87.16 | 80.43/87.37 | 66.71/79.94 | 67.72/81.17 | 54.21/70.15 | 69.56 | 81.16 | 84.45 |
RefineNet [28] | 78.56/87.58 | 80.31/88.13 | 64.76/78.98 | 65.36/79.23 | 56.98/73.13 | 69.19 | 81.41 | 84.85 |
CBAM [42] | 78.98/88.21 | 81.17/89.02 | 66.83/80.14 | 69.87/81.98 | 53.32/69.46 | 70.03 | 81.76 | 85.03 |
ResUNet++ [64] | 79.47/88.65 | 81.23/89.23 | 65.93/79.62 | 69.11/81.33 | 53.04/68.77 | 69.76 | 81.52 | 85.07 |
SCAttNet V2 [43] | 80.32/89.07 | 82.06/90.26 | 66.81/80.05 | 67.18/80.14 | 54.16/69.88 | 70.11 | 81.88 | 85.42 |
PSPNet [33] | 81.13/89.63 | 82.67/90.51 | 66.43/79.89 | 67.52/80.61 | 53.81/70.14 | 70.31 | 82.16 | 85.64 |
FPN [52] | 79.72/88.44 | 81.19/89.31 | 65.75/79.58 | 67.29/80.44 | 57.63/73.95 | 70.32 | 82.34 | 85.65 |
Deeplab v3+ [35] | 79.67/88.37 | 81.91/89.64 | 67.75/81.70 | 68.70/81.73 | 54.43/70.04 | 70.49 | 82.30 | 85.69 |
RAANet [54] | 79.49/88.26 | 83.42/91.06 | 67.93/82.01 | 67.35/80.51 | 55.29/71.56 | 70.50 | 82.37 | 85.73 |
SENet [40] | 80.57/89.01 | 82.11/90.02 | 67.03/80.79 | 68.44/81.43 | 54.81/70.63 | 70.59 | 82.38 | 85.75 |
SERNet w/o SE, RAM | 80.52/89.32 | 82.75/90.38 | 66.84/80.43 | 67.70/81.42 | 56.66/72.39 | 70.89 | 82.79 | 86.38 |
SERNet w/o RAM | 83.00/91.69 | 84.10/91.25 | 67.47/81.31 | 68.04/81.01 | 59.27/75.12 | 72.38 | 84.08 | 87.78 |
SERNet (ours) | 83.15/91.79 | 84.59/91.64 | 67.72/81.68 | 67.73/81.64 | 60.25/75.68 | 72.69 | 84.49 | 88.19 |
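The metrics in the tables (per-class IoU and F1, their class means mIoU and AF, and overall accuracy OA) can all be derived from a confusion matrix. A minimal sketch, where the function name and matrix layout are my own conventions:

```python
import numpy as np

def seg_metrics(conf):
    """Segmentation metrics from a confusion matrix where
    conf[i, j] = number of pixels of true class i predicted as class j.
    Returns per-class IoU and F1, plus mIoU, AF (mean F1), and OA."""
    tp = np.diag(conf).astype(float)      # correctly classified pixels per class
    fp = conf.sum(axis=0) - tp            # predicted as the class but wrong
    fn = conf.sum(axis=1) - tp            # pixels of the class that were missed
    iou = tp / (tp + fp + fn)             # intersection over union per class
    f1 = 2 * tp / (2 * tp + fp + fn)      # F1 (Dice) per class
    oa = tp.sum() / conf.sum()            # overall pixel accuracy
    return iou, f1, iou.mean(), f1.mean(), oa
```

As a consistency check, averaging the IoU values in the SERNet row of the first table (83.15, 84.59, 67.72, 67.73, 60.25) reproduces the reported mIoU of 72.69, and averaging the F1 values reproduces the AF of 84.49.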
Per-class cells report IoU/F1 (%).

Method | Imp. Surface | Building | Low Veg. | Tree | Car | mIoU(%) | AF(%) | OA(%) |
---|---|---|---|---|---|---|---|---|
FCN-32s [23] | 68.40/78.98 | 74.54/84.85 | 46.06/63.77 | 60.78/74.86 | 58.76/74.00 | 61.71 | 75.29 | 79.73 |
FCN-8s [23] | 71.65/83.16 | 72.58/83.47 | 50.16/65.95 | 62.11/75.06 | 58.73/73.81 | 63.05 | 76.29 | 82.17 |
UNet [26] | 74.69/85.15 | 85.01/91.53 | 49.47/66.11 | 62.62/76.05 | 59.76/74.64 | 66.31 | 78.70 | 84.09 |
ResNet50 [38] | 76.13/86.47 | 86.57/92.67 | 50.03/67.21 | 62.58/77.19 | 65.11/79.43 | 68.08 | 80.59 | 85.65 |
RefineNet [28] | 76.27/86.49 | 86.10/92.54 | 50.95/67.49 | 62.75/77.33 | 62.93/77.17 | 67.80 | 80.20 | 85.46 |
CBAM [42] | 76.23/85.72 | 84.17/90.92 | 49.65/66.58 | 62.78/77.40 | 76.87/86.62 | 69.94 | 81.45 | 85.99 |
ResUNet++ [64] | 77.91/87.56 | 87.36/91.29 | 50.26/66.79 | 64.27/78.34 | 70.43/81.14 | 70.05 | 81.02 | 86.39 |
SCAttNet V2 [43] | 80.44/89.18 | 88.89/94.20 | 56.17/71.91 | 63.45/77.63 | 77.80/87.36 | 73.35 | 84.06 | 87.58 |
PSPNet [33] | 79.78/88.75 | 87.91/93.60 | 54.43/70.48 | 64.12/78.08 | 78.97/88.23 | 73.04 | 83.83 | 87.45 |
FPN [52] | 77.49/87.33 | 86.98/92.88 | 56.28/72.05 | 64.71/78.26 | 81.34/91.71 | 73.36 | 84.45 | 87.67 |
Deeplab v3+ [35] | 78.75/88.06 | 87.72/93.44 | 58.49/74.52 | 65.86/79.37 | 77.15/87.09 | 73.59 | 84.50 | 87.23 |
RAANet [54] | 78.83/88.14 | 88.65/94.12 | 55.34/71.03 | 64.07/77.92 | 79.58/88.72 | 73.29 | 83.99 | 87.56 |
SENet [40] | 81.57/90.41 | 88.09/93.67 | 57.12/73.29 | 67.13/81.79 | 79.98/89.10 | 74.78 | 85.65 | 87.91 |
SERNet (ours) | 84.31/91.75 | 91.87/95.30 | 57.58/73.74 | 67.81/82.26 | 82.24/92.13 | 76.76 | 87.04 | 90.29 |
Per-class cells report IoU/F1 (%).

Method | Low Veg. | Tree | Background | mIoU(%) | AF(%) | OA(%) |
---|---|---|---|---|---|---|
FCN-32s [23] | 58.47/73.67 | 61.67/76.43 | 84.59/89.17 | 68.24 | 79.76 | 81.24 |
FCN-8s [23] | 64.31/78.05 | 65.80/79.37 | 87.74/91.02 | 72.62 | 82.81 | 83.76 |
UNet [26] | 65.97/79.02 | 65.81/79.04 | 86.21/90.45 | 72.66 | 82.84 | 86.57 |
ResNet50 [38] | 66.49/79.68 | 66.93/80.42 | 88.40/91.94 | 73.94 | 84.01 | 86.61 |
RefineNet [28] | 65.61/79.70 | 66.04/79.86 | 89.04/92.56 | 73.56 | 84.04 | 87.49 |
CBAM [42] | 67.11/80.35 | 69.98/82.02 | 88.39/92.03 | 75.16 | 84.80 | 89.16 |
ResUNet++ [64] | 65.38/79.23 | 68.46/80.79 | 88.13/91.78 | 73.99 | 83.93 | 87.99 |
SCAttNet V2 [43] | 67.04/80.11 | 67.13/80.14 | 89.21/93.04 | 74.46 | 84.43 | 88.53 |
PSPNet [33] | 67.17/80.20 | 68.15/81.01 | 88.46/92.15 | 74.59 | 84.45 | 88.38 |
FPN [52] | 66.55/79.76 | 68.67/81.59 | 89.32/92.81 | 74.85 | 84.72 | 89.38 |
Deeplab v3+ [35] | 68.21/81.54 | 68.92/81.80 | 88.01/91.76 | 75.05 | 85.03 | 89.67 |
RAANet [54] | 68.06/81.83 | 67.62/80.60 | 89.05/92.29 | 74.91 | 84.91 | 89.54 |
SENet [40] | 67.83/81.16 | 68.37/81.23 | 88.67/92.31 | 74.96 | 84.90 | 88.72 |
SERNet (a-DSM) | 50.34/62.71 | 48.47/61.23 | 67.59/71.86 | 55.47 | 65.27 | 71.83 |
SERNet (a) | 68.02/81.39 | 68.78/81.95 | 89.41/92.82 | 75.40 | 85.39 | 90.01 |
SERNet (b) | 69.71/83.42 | 70.69/83.87 | 90.14/94.57 | 76.85 | 87.29 | 91.45 |
Per-class cells report IoU/F1 (%).

Method | Low Veg. | Tree | Background | mIoU(%) | AF(%) | OA(%) |
---|---|---|---|---|---|---|
FCN-32s [23] | 46.08/63.72 | 61.29/74.98 | 74.99/84.91 | 60.79 | 74.54 | 82.74 |
FCN-8s [23] | 50.57/66.47 | 62.65/75.55 | 76.79/85.34 | 63.34 | 75.79 | 84.95 |
UNet [26] | 50.24/66.71 | 62.04/75.46 | 84.76/89.62 | 65.68 | 77.26 | 86.19 |
ResNet50 [38] | 50.01/66.43 | 63.98/77.59 | 87.47/93.30 | 67.15 | 79.11 | 88.91 |
RefineNet [28] | 50.74/67.59 | 63.11/77.32 | 86.12/92.19 | 66.66 | 79.03 | 87.78 |
CBAM [42] | 51.62/68.01 | 64.23/78.14 | 85.73/91.25 | 67.19 | 79.13 | 88.79 |
ResUNet++ [64] | 49.59/66.24 | 61.71/75.34 | 86.59/92.77 | 65.96 | 78.12 | 87.90 |
SCAttNet V2 [43] | 56.10/71.84 | 63.40/77.57 | 88.84/94.09 | 69.45 | 81.17 | 89.75 |
PSPNet [33] | 55.46/71.29 | 64.57/78.38 | 87.41/93.26 | 69.15 | 80.98 | 89.86 |
FPN [52] | 56.22/71.92 | 64.32/78.26 | 86.09/92.00 | 68.88 | 80.73 | 89.57 |
Deeplab v3+ [35] | 58.75/74.64 | 65.42/79.07 | 88.31/93.80 | 70.83 | 82.50 | 89.42 |
RAANet [54] | 57.38/73.33 | 65.10/78.87 | 88.67/92.36 | 70.38 | 81.52 | 89.93 |
SENet [40] | 57.43/73.51 | 67.66/82.14 | 90.29/94.67 | 71.79 | 83.44 | 90.01 |
SERNet (a-DSM) | 40.27/53.46 | 50.28/63.45 | 70.16/75.18 | 53.57 | 64.03 | 70.87 |
SERNet (a) | 58.17/74.09 | 68.34/83.21 | 91.05/94.84 | 72.52 | 84.05 | 91.03 |
SERNet (b) | 59.63/75.73 | 69.85/84.73 | 92.68/96.53 | 74.05 | 85.66 | 92.59 |
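The SERNet (a-DSM) rows in the vegetation tables show a large drop when the DSM input is removed, which suggests the height channel is fused directly with the IRRG imagery at the input. A minimal sketch of such a fusion, assuming min-max normalization of the DSM; the paper's exact preprocessing, normalization, and channel order are not specified here and may differ:

```python
import numpy as np

def fuse_irrg_dsm(irrg, dsm):
    """Stack an IRRG tile (H, W, 3) with a min-max normalized DSM
    height map (H, W) into a single 4-channel network input."""
    rng = dsm.max() - dsm.min()
    dsm_norm = ((dsm - dsm.min()) / rng if rng > 0
                else np.zeros_like(dsm, dtype=float))      # flat tile -> zeros
    return np.concatenate([irrg.astype(float),
                           dsm_norm[..., None]], axis=-1)  # (H, W, 4)
```

Training without the fourth channel (the a-DSM configuration) is then just dropping `dsm_norm` from the concatenation, which is what the ablation rows compare.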
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, X.; Li, L.; Di, D.; Wang, J.; Chen, G.; Jing, W.; Emam, M. SERNet: Squeeze and Excitation Residual Network for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 4770. https://doi.org/10.3390/rs14194770